yanjiew1 commented Aug 12, 2024

This approach is from chapter 7 of Hacker's Delight. It removes the need for the POPCNT and BZHI instructions entirely, which improves portability.

GCC generates fewer instructions than it does for the original implementation.
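
For readers without the book at hand, here is a minimal sketch of the kind of routine chapter 7 describes, assuming the change concerns a bit-deposit (PDEP-style) operation, which the description above does not state explicitly. It is a 64-bit adaptation of the book's 32-bit `expand` function and is not the code from this pull request; the names `expand64` and `moves` are hypothetical. Stray bits are cleared by a final AND with the mask, so no population count or bit-zeroing step is needed.

```c
#include <stdint.h>

/* Sketch only: deposits the low bits of x into the positions of the set
 * bits of mask, using plain shifts, XOR, AND and OR.
 * expand64 and moves are illustrative names, not taken from the PR. */
uint64_t expand64(uint64_t x, uint64_t mask)
{
    uint64_t m = mask;
    uint64_t mk = ~m << 1;      /* marks the zeros to the right of each bit */
    uint64_t moves[6];          /* one move mask per power-of-two shift */

    for (int i = 0; i < 6; i++) {
        /* Parallel prefix XOR: bit k of mp is the parity of mk[0..k]. */
        uint64_t mp = mk ^ (mk << 1);
        mp ^= mp << 2;
        mp ^= mp << 4;
        mp ^= mp << 8;
        mp ^= mp << 16;
        mp ^= mp << 32;

        uint64_t mv = mp & m;               /* mask bits that move by 2^i */
        moves[i] = mv;
        m = (m ^ mv) | (mv >> (1u << i));   /* compress the mask itself */
        mk &= ~mp;
    }

    /* Replay the moves in reverse order to spread the bits of x out. */
    for (int i = 5; i >= 0; i--) {
        uint64_t mv = moves[i];
        uint64_t t = x << (1u << i);
        x = (x & ~mv) | (t & mv);
    }

    return x & mask;            /* clear bits left outside the mask */
}
```

For example, `expand64(0x5, 0x1A)` returns `0x12`: the three low bits of `0x5` land on bits 1, 3 and 4 of the result. A mask with all 64 bits set returns `x` unchanged without any special case.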

yanjiew1 commented Aug 14, 2024

The new implementation can be slower when compiled with the "-O3" flag because it cannot take advantage of the LEA instruction.
When this code is compiled with "-O3", GCC may inline the function and then optimize it so that the shifts of the ppp are computed only once for each mask. As a result, the number of instructions may not be reduced when "-O3" is used.
