<bit>: Improve has_single_bit() codegen #5367
The generated code seems better on x86, x64, ARM, and ARM64 (https://godbolt.org/z/fjshqMf4d). The only potential drawback: the old code would check for _Val == 0 and jump, whereas the new code has no jumps (except on ARM).
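To make the comparison concrete, here is a minimal sketch of the two formulations. The XOR expression is the one quoted later in this thread; the "classic" version with an explicit zero check is an assumption about the general shape of the old code, not a copy of it. The names `single_bit_classic`, `single_bit_xor`, `formulations_agree_16bit`, and `xor_self_check` are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Assumed "classic" shape: explicit zero check (a likely branch),
// then test that clearing the lowest set bit leaves nothing behind.
template <class T>
constexpr bool single_bit_classic(const T val) noexcept {
    static_assert(std::is_unsigned<T>::value, "unsigned types only");
    return val != 0 && (val & static_cast<T>(val - 1)) == 0;
}

// Branch-free shape discussed in this PR. For val == 0, val - 1 wraps to
// all ones, so the XOR equals val - 1 and the strict comparison is false.
template <class T>
constexpr bool single_bit_xor(const T val) noexcept {
    static_assert(std::is_unsigned<T>::value, "unsigned types only");
    const T less_one = static_cast<T>(val - 1);
    return static_cast<T>(val ^ less_one) > less_one;
}

// Exhaustively compares both formulations over every 16-bit value.
inline bool formulations_agree_16bit() noexcept {
    for (std::uint32_t i = 0; i <= 0xFFFFu; ++i) {
        const auto v = static_cast<std::uint16_t>(i);
        if (single_bit_classic(v) != single_bit_xor(v)) {
            return false;
        }
    }
    return true;
}

// Spot checks on edge cases: zero, one, the top bit, and a two-bit value.
inline void xor_self_check() {
    assert(!single_bit_xor(std::uint32_t{0}));
    assert(single_bit_xor(std::uint32_t{1}));
    assert(single_bit_xor(std::uint64_t{1} << 63));
    assert(!single_bit_xor(std::uint32_t{6}));
}
```

The casts back to `T` guard against integer promotion for types narrower than `int`. Whether the compiler actually emits branch-free code depends on the target and optimization level, as the Godbolt link above shows.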
Looks good to me, some nitpicks...
Can you please try this local change also:

_NODISCARD constexpr bool has_single_bit(const _Ty _Val) noexcept {
#if 1 // _POPCNT_INTRINSICS_ALWAYS_AVAILABLE
#if _HAS_CXX20
    if (_STD is_constant_evaluated())
        return 1 == _Popcount_fallback(_Val);
#endif // _HAS_CXX20
    return 1 == _Unchecked_popcount(_Val);
#else
    return (_Val ^ (_Val - 1)) > _Val - 1;
#endif
}
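The popcount-based idea in the snippet above can be sketched outside the STL as follows. This is only an illustration under stated assumptions: `popcount_portable` is a hypothetical stand-in written as a simple bit-clearing loop, not the STL's internal helper, and it does not use the hardware popcnt instruction. `single_bit_popcount` and `popcount_self_check` are likewise made-up names.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical portable popcount: clears the lowest set bit until
// nothing is left, counting iterations (Kernighan's method).
template <class T>
constexpr int popcount_portable(T val) noexcept {
    static_assert(std::is_unsigned<T>::value, "unsigned types only");
    int count = 0;
    while (val != 0) {
        val = static_cast<T>(val & (val - 1)); // drop the lowest set bit
        ++count;
    }
    return count;
}

// A value has a single bit set exactly when its population count is 1.
template <class T>
constexpr bool single_bit_popcount(const T val) noexcept {
    return popcount_portable(val) == 1;
}

// Usable in constant expressions (requires C++14 or later).
static_assert(single_bit_popcount(std::uint32_t{64}), "64 is a power of two");
static_assert(!single_bit_popcount(std::uint32_t{0}), "zero has no bits set");
static_assert(!single_bit_popcount(std::uint32_t{96}), "96 has two bits set");

// Runtime spot checks.
inline void popcount_self_check() {
    assert(popcount_portable(std::uint16_t{0xFFFF}) == 16);
    assert(single_bit_popcount(std::uint8_t{128}));
    assert(!single_bit_popcount(std::uint64_t{3}));
}
```

With a hardware popcount available, the comparison against 1 would replace the loop with a single instruction, which is the case the suggestion above asks to benchmark.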
Turns out this particular benchmark is very much affected by #4496 and That's why I've hidden the previous results, and obtained the updated results by changing also this line to say Line 62 in f2a2933
i5-1235U x64 P-core
E-core
Apparently #4496 wasn't a severe obstacle for most previous optimizations, but at such a micro level it is really disruptive. We'll need to look into that.
Before you go with this in the PR:
This PR doesn't include the popcnt change. The popcnt change I suggest above doesn't use the dispatcher, it's the unchecked version that does the I can add popcnt as a separate PR; I think it's better that way.
For the record, ARM32 performance is indeed utterly irrelevant. Windows ripped out their ARM32 build, and will even be dropping support for ARM32 binaries running on ARM64 machines (see https://learn.microsoft.com/en-us/windows/arm/arm32-to-arm64). ARM32 still needs to behave correctly (until DevDiv/MSVC drops support for targeting it, which I am constantly asking about), but we don't want to waste any more time on it.
Thanks! I pushed some changes to the benchmark for consistency with repo conventions.
The new code avoids branches and provides a 20% speedup on x64 targets.
x64 results
partially addresses #5359