Search for first duplicate in vectorized `unique` #5363

AlexGuteniev · 2025-03-25T19:35:58Z

🔍 Missing piece

Vectorized remove has carefully considered find before actual removal loop. But the similar adjacent_find was missed in unique.

The effect is at least performance. Find is faster as vectorization, and it also avoids extra writes. I doubt about whether they have correctness impact though.

I've added the missing piece in separately compiled code. This is not the same as in remove, where it is in the header. But unique has in-place search for duplicates in scalar part that is tightly coupled with the rest of it, so I thought it is better to avoid disrupting that.

🏁 Benchmark

The modified benchmark adds two regions without duplicates: at the beginning and at the end of the range. The beginning one is avoidable with adjacent_find.

I've ran the modified benchmark for the main before #5092, to have the updated non-vectorized baseline to compare against

⏱️ Benchmark results

Benchmark	Before vectorization	Before PR	After PR
u<alg_type::std_fn, std::uint8_t>	1181 ns	540 ns	217 ns
u<alg_type::std_fn, std::uint16_t>	1075 ns	649 ns	208 ns
u<alg_type::std_fn, std::uint32_t>	1204 ns	775 ns	285 ns
u<alg_type::std_fn, std::uint64_t>	1110 ns	1482 ns	551 ns
u<alg_type::rng, std::uint8_t>	1233 ns	540 ns	221 ns
u<alg_type::rng, std::uint16_t>	1059 ns	653 ns	207 ns
u<alg_type::rng, std::uint32_t>	1298 ns	785 ns	285 ns
u<alg_type::rng, std::uint64_t>	1216 ns	1490 ns	629 ns

💡 Results interpretation

The effect is way more than expected. in particular, the "Before PR" column has shown that vectorization performance has severely degraded after adding already-unique ranges parts to the input.

Apparently it is due to reading just written output. Before the first removed element we read back last element previously written along with newly read elements. The mix of cache data and store buffer data causes the CPU to stall until the store buffer entry reaches the cache. So when we skip to the first non-unique, it is no longer a problem.

StephanTLavavej · 2025-04-09T12:07:48Z

I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

StephanTLavavej · 2025-04-10T20:18:08Z

🏇 🏎️ 🚀

AlexGuteniev added 2 commits March 25, 2025 20:57

benchmark

8cf50ce

fixup

92bb791

AlexGuteniev requested a review from a team as a code owner March 25, 2025 19:35

github-project-automation bot added this to STL Code Reviews Mar 25, 2025

github-project-automation bot moved this to Initial Review in STL Code Reviews Mar 25, 2025

StephanTLavavej added the performance Must go faster label Mar 25, 2025

StephanTLavavej self-assigned this Mar 25, 2025

no longer requiring that

89baac9

StephanTLavavej approved these changes Apr 1, 2025

View reviewed changes

StephanTLavavej removed their assignment Apr 1, 2025

StephanTLavavej moved this from Initial Review to Ready To Merge in STL Code Reviews Apr 1, 2025

StephanTLavavej mentioned this pull request Apr 1, 2025

Maintainer priorities #4700

Open

StephanTLavavej moved this from Ready To Merge to Merging in STL Code Reviews Apr 9, 2025

StephanTLavavej self-assigned this Apr 9, 2025

StephanTLavavej merged commit d3c9bf9 into microsoft:main Apr 10, 2025
39 checks passed

github-project-automation bot moved this from Merging to Done in STL Code Reviews Apr 10, 2025

AlexGuteniev deleted the piano-in-the-bushes branch April 11, 2025 03:39

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Search for first duplicate in vectorized `unique` #5363

Search for first duplicate in vectorized `unique` #5363

AlexGuteniev commented Mar 25, 2025 •

edited

Loading

StephanTLavavej commented Apr 9, 2025

StephanTLavavej commented Apr 10, 2025

Search for first duplicate in vectorized unique #5363

Search for first duplicate in vectorized unique #5363

Conversation

AlexGuteniev commented Mar 25, 2025 • edited Loading

🔍 Missing piece

🏁 Benchmark

⏱️ Benchmark results

💡 Results interpretation

StephanTLavavej commented Apr 9, 2025

StephanTLavavej commented Apr 10, 2025

🏇 🏎️ 🚀

Search for first duplicate in vectorized `unique` #5363

Search for first duplicate in vectorized `unique` #5363

AlexGuteniev commented Mar 25, 2025 •

edited

Loading