@publixsubfan
Contributor

Summary

When available, uses SSE2 operations for GroupBucket::getEmptyBucket() and GroupBucket::visitHashBucket(). This should speed up lookup and non-batched insertion operations on the CPU.
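
For readers unfamiliar with the technique, here is a minimal sketch of how SSE2 can scan a group of one-byte metadata tags in a single comparison. It is not Axom's actual implementation: the group size of 16, the `EMPTY_TAG` sentinel value, and the names `matchTag` and `firstEmptySlot` are assumptions made for illustration. A scalar fallback is included so the sketch also builds where SSE2 is unavailable.

```cpp
#include <cstdint>

// Enable the SSE2 path only when the compiler reports SSE2 support.
#if defined(__SSE2__) || (defined(_M_X64) && !defined(_M_ARM64EC)) || \
  (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
  #include <emmintrin.h>
  #define SKETCH_HAS_SSE2 1
#else
  #define SKETCH_HAS_SSE2 0
#endif

constexpr int GROUP_SIZE = 16;          // assumed: one tag byte per slot, 16 slots per group
constexpr std::uint8_t EMPTY_TAG = 0;   // assumed sentinel marking an empty slot

// Returns a bitmask in which bit i is set when metadata[i] == tag.
inline std::uint32_t matchTag(const std::uint8_t* metadata, std::uint8_t tag)
{
#if SKETCH_HAS_SSE2
  // Load 16 tag bytes, broadcast the needle, and compare all lanes at once.
  const __m128i group = _mm_loadu_si128(reinterpret_cast<const __m128i*>(metadata));
  const __m128i needle = _mm_set1_epi8(static_cast<char>(tag));
  const __m128i equal = _mm_cmpeq_epi8(group, needle);
  // movemask packs the top bit of each byte lane into the low 16 bits.
  return static_cast<std::uint32_t>(_mm_movemask_epi8(equal));
#else
  // Scalar fallback producing the same mask one byte at a time.
  std::uint32_t mask = 0;
  for(int i = 0; i < GROUP_SIZE; ++i)
  {
    if(metadata[i] == tag)
    {
      mask |= (1u << i);
    }
  }
  return mask;
#endif
}

// Returns the index of the first empty slot in the group, or -1 if the group is full.
inline int firstEmptySlot(const std::uint8_t* metadata)
{
  const std::uint32_t mask = matchTag(metadata, EMPTY_TAG);
  if(mask == 0)
  {
    return -1;
  }
  int idx = 0;
  while(((mask >> idx) & 1u) == 0)
  {
    ++idx;
  }
  return idx;
}
```

A lookup would call `matchTag` with the tag derived from the key's hash and then check only the slots whose bits are set, while an insertion would call `firstEmptySlot` to find a free slot.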

Performance

We see a roughly 3x speedup at small element counts, which drops to 2x at 100k-900k elements, and to 1.3-1.5x at 1M-9M elements.

@publixsubfan added the Core and Performance labels Jul 16, 2025
#include "axom/core/ArrayView.hpp"
#include "axom/core/utilities/BitUtilities.hpp"

#if defined(_MSC_VER)
Member

Do we need to do anything special to get the intrinsics, e.g. compile with the -march flag?

Contributor Author

My understanding is that nothing special is required if Axom is compiled as a 64-bit library, since support for up to SSE2 is a part of the x86-64 spec. For 32-bit x86 systems, you would need to specify -march=....
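
As a rough illustration of that point, the following sketch shows one way to detect SSE2 at compile time. The macro `SKETCH_SSE2_AVAILABLE` and the helper functions are hypothetical names, not part of Axom. The predefined macros it tests are the standard ones: GCC and Clang define `__SSE2__` when SSE2 code generation is enabled (the default for x86-64 targets), and MSVC defines `_M_X64` for x64 targets or sets `_M_IX86_FP` to 2 on 32-bit x86 when SSE2 is enabled.

```cpp
#include <cstring>

// Hypothetical feature macro: 1 when SSE2 intrinsics can be used, 0 otherwise.
#if defined(__SSE2__)
  // GCC/Clang: set by default on x86-64, or via an explicit flag on 32-bit x86.
  #define SKETCH_SSE2_AVAILABLE 1
#elif defined(_MSC_VER) && defined(_M_X64) && !defined(_M_ARM64EC)
  // MSVC x64: SSE2 is part of the baseline. ARM64EC also defines _M_X64,
  // so it is excluded here to keep the sketch conservative.
  #define SKETCH_SSE2_AVAILABLE 1
#elif defined(_MSC_VER) && defined(_M_IX86_FP) && (_M_IX86_FP >= 2)
  // MSVC 32-bit x86 with SSE2 code generation enabled.
  #define SKETCH_SSE2_AVAILABLE 1
#else
  #define SKETCH_SSE2_AVAILABLE 0
#endif

// Reports whether the SSE2 path was selected at compile time.
constexpr bool sse2Enabled() { return SKETCH_SSE2_AVAILABLE != 0; }

// Returns a short label for the selected code path, e.g. for logging.
inline const char* simdPathName() { return sse2Enabled() ? "sse2" : "scalar"; }
```

Printing `simdPathName()` from a small test program is one quick way to confirm which path a given host-config selects.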

Member

I confirmed with my changes from #1614 that I am using the new SSE2 intrinsics with the rzwhippet-clang host-config.

Member

@kennyweiss left a comment

Thanks @publixsubfan

Contributor

@Arlie-Capps left a comment

Very cool, Max. Thanks!

@rhornung67
Member

@publixsubfan Nice work!

@publixsubfan merged commit d5e807f into develop Jul 16, 2025
15 checks passed