`<bit>`: Improve `has_single_bit()` codegen #5367

pps83 · 2025-03-27T00:47:19Z

code avoids branches, provides 20% speed up on x64 targets.

x64 results

---------------------------------------
Benchmark                          Time
---------------------------------------
Rand16_has_single_bit            284 ns
Rand16_has_single_bit_new        232 ns
Rand32_has_single_bit            306 ns
Rand32_has_single_bit_new        246 ns
Rand64_has_single_bit            289 ns
Rand64_has_single_bit_new        233 ns

partially addresses #5359

pps83 · 2025-03-27T01:00:54Z

seems like code is better for x86/x64/arm/arm64 : https://godbolt.org/z/fjshqMf4d

the only potential drawback: old code would check for _Val = 0 and jump. New code has no jumps (except on arm)

stl/inc/bit

AlexGuteniev

Looks good to me, some nitpicks...

benchmarks/src/has_single_bit.cpp

pps83 · 2025-03-27T19:06:38Z

Can you please try this local change also:

_NODISCARD constexpr bool has_single_bit(const _Ty _Val) noexcept {
#if 1 // _POPCNT_INTRINSICS_ALWAYS_AVAILABLE
#if _HAS_CXX20
    if (_STD is_constant_evaluated())
        return 1 == _Popcount_fallback(_Val);
#endif // _HAS_CXX20
    return 1 == _Unchecked_popcount(_Val);
#else
    return (_Val ^ (_Val - 1)) > _Val - 1;
#endif
}

AlexGuteniev · 2025-03-27T19:41:49Z

Turns out this particular benchmark is very much affected by #4496 and popcnt version is slower.

That's why I've hidden the previous results, and obtained the updated results by changing also this line to say Release instead of RelWithDebInfo

STL/benchmarks/CMakeLists.txt

Line 62 in f2a2933

set(CMAKE_BUILD_TYPE RelWithDebInfo)

i5-1235U

x64

P-core

Benchmark	Before	After	popcnt
has_single_bit_if<uint16_t>	11.5 ns	10.7 ns	6.68 ns
has_single_bit_if<uint32_t>	11.3 ns	11.3 ns	10.6 ns
has_single_bit_if<uint64_t>	6.98 ns	11.7 ns	6.57 ns
has_single_bit<uint16_t>	6.36 ns	4.51 ns	3.51 ns
has_single_bit<uint32_t>	6.08 ns	4.71 ns	3.52 ns
has_single_bit<uint64_t>	5.27 ns	4.71 ns	3.53 ns

E-core

Benchmark	Before	After	popcnt
has_single_bit_if<uint16_t>	16.2 ns	14.1 ns	14.1 ns
has_single_bit_if<uint32_t>	16.7 ns	14.0 ns	14.0 ns
has_single_bit_if<uint64_t>	16.2 ns	16.2 ns	14.0 ns
has_single_bit<uint16_t>	12.2 ns	9.07 ns	7.24 ns
has_single_bit<uint32_t>	11.9 ns	9.07 ns	7.24 ns
has_single_bit<uint64_t>	11.6 ns	9.07 ns	7.25 ns

Apparently popcnt is a good idea. Even for the "if" case.

#4496 wasn't a severe obstacle for most previous optimization, but at such micro level it is really disrupting, we'll need to get into that.

AlexGuteniev · 2025-03-27T19:49:01Z

Can you please try this local change also:

Before you go with this in the PR:

We don't compare like 1 == function() in new code, although there are some relic precedents. In new code, we use natural order function() == 1
Always control flow braces {} even around single statement in if
You don't need to check #if _HAS_CXX20. It is C++20 header, and the check is already there at the very beginning for the whole content

pps83 · 2025-03-27T20:24:13Z

Before you go with this in the PR...

This PR doesn't have popcnt change. The popcnt change I suggest above doesn't use the dispatcher, it's the unchecked version that does the popcnt opcode directly and it should only be used when building for AVX+. The default (version with this PR) is better than using dispatcher imo. _Checked_popcount will use the dispatcher.

I can add popcnt it as a separate PR, i think it's better that way.

StephanTLavavej · 2025-03-27T22:30:25Z

For the record, ARM32 performance is indeed utterly irrelevant. Windows ripped out their ARM32 build, and will even be dropping support for ARM32 binaries running on ARM64 machines, see https://learn.microsoft.com/en-us/windows/arm/arm32-to-arm64 .

ARM32 still needs to behave correctly (until DevDiv/MSVC drops support for targeting it, which I am constantly asking about), but we don't want to waste any more time on it.

benchmarks/src/has_single_bit.cpp

StephanTLavavej · 2025-04-01T18:24:34Z

Thanks! I pushed some changes to the benchmark for consistency with repo conventions.

<bit>: Improve std::has_single_bit

365b42a

pps83 requested a review from a team as a code owner March 27, 2025 00:47

StephanTLavavej changed the title ~~<bit>: Improve std::has_single_bit~~ <bit>: Improve has_single_bit() codegen Mar 27, 2025

StephanTLavavej added the performance Must go faster label Mar 27, 2025

StephanTLavavej requested changes Mar 27, 2025

View reviewed changes

stl/inc/bit Show resolved Hide resolved

This comment was marked as resolved.

Sign in to view

Keep old code for 32-bit x86 (30% faster)

59a62b8

pps83 force-pushed the bithas_single_bit-faster branch 6 times, most recently from 0c35bcb to cae8296 Compare March 27, 2025 15:09

Add has_single_bit benchmark

b96db7a

pps83 force-pushed the bithas_single_bit-faster branch from cae8296 to b96db7a Compare March 27, 2025 15:18

AlexGuteniev reviewed Mar 27, 2025

View reviewed changes

benchmarks/src/has_single_bit.cpp Outdated Show resolved Hide resolved

benchmarks/src/has_single_bit.cpp Outdated Show resolved Hide resolved

benchmarks/src/has_single_bit.cpp Outdated Show resolved Hide resolved

pps83 force-pushed the bithas_single_bit-faster branch from 4c7e0e7 to 16d3c6a Compare March 27, 2025 15:35

update

f546b3b

pps83 force-pushed the bithas_single_bit-faster branch from 16d3c6a to f546b3b Compare March 27, 2025 15:47

test 1 per iteration

14656fc

pps83 force-pushed the bithas_single_bit-faster branch from 86f7638 to 14656fc Compare March 27, 2025 15:54

This comment was marked as outdated.

Sign in to view

StephanTLavavej self-assigned this Mar 27, 2025

Code review feedback.

283c320

StephanTLavavej reviewed Apr 1, 2025

View reviewed changes

StephanTLavavej approved these changes Apr 1, 2025

View reviewed changes

StephanTLavavej removed their assignment Apr 1, 2025

StephanTLavavej mentioned this pull request Apr 1, 2025

Maintainer priorities #4700

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`<bit>`: Improve `has_single_bit()` codegen #5367

`<bit>`: Improve `has_single_bit()` codegen #5367

pps83 commented Mar 27, 2025 •

edited

Loading

pps83 commented Mar 27, 2025

This comment was marked as resolved.

This comment was marked as resolved.

AlexGuteniev left a comment

This comment was marked as outdated.

pps83 commented Mar 27, 2025

AlexGuteniev commented Mar 27, 2025 •

edited

Loading

AlexGuteniev commented Mar 27, 2025

pps83 commented Mar 27, 2025 •

edited

Loading

StephanTLavavej commented Mar 27, 2025

StephanTLavavej commented Apr 1, 2025

<bit>: Improve has_single_bit() codegen #5367

Are you sure you want to change the base?

<bit>: Improve has_single_bit() codegen #5367

Conversation

pps83 commented Mar 27, 2025 • edited Loading

pps83 commented Mar 27, 2025

This comment was marked as resolved.

This comment was marked as resolved.

AlexGuteniev left a comment

Choose a reason for hiding this comment

This comment was marked as outdated.

pps83 commented Mar 27, 2025

AlexGuteniev commented Mar 27, 2025 • edited Loading

AlexGuteniev commented Mar 27, 2025

pps83 commented Mar 27, 2025 • edited Loading

StephanTLavavej commented Mar 27, 2025

StephanTLavavej commented Apr 1, 2025

`<bit>`: Improve `has_single_bit()` codegen #5367

`<bit>`: Improve `has_single_bit()` codegen #5367

pps83 commented Mar 27, 2025 •

edited

Loading

AlexGuteniev commented Mar 27, 2025 •

edited

Loading

pps83 commented Mar 27, 2025 •

edited

Loading