FlatMap: Use SSE2 intrinsics #1616

publixsubfan · 2025-07-16T00:53:52Z

Summary

When available, uses SSE2 operations for GroupBucket::getEmptyBucket() and GroupBucket::visitHashBucket(). This should speed up lookup and non-batched insertion operations on the CPU.
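For readers unfamiliar with the technique, here is a minimal sketch of how SSE2 can scan a group of metadata bytes in one comparison. It is not the code from this PR: the 16-byte group size, the `kEmpty` sentinel of 0, and the names `matchByte` and `firstEmptySlot` are all assumptions made for illustration, and Axom's actual `GroupBucket` layout may differ.

```cpp
#include <cassert>
#include <cstdint>

// Assumed detection logic; the PR's real macro names are not shown here.
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
  #include <emmintrin.h>
  #define SKETCH_HAVE_SSE2 1
#endif

constexpr int kGroupSize = 16;      // hypothetical: one metadata byte per slot
constexpr std::uint8_t kEmpty = 0;  // hypothetical: 0 marks an empty slot

// Returns a bitmask with bit i set when metadata[i] == value.
// metadata must point to at least kGroupSize readable bytes.
inline unsigned matchByte(const std::uint8_t* metadata, std::uint8_t value)
{
  assert(metadata != nullptr);
#if defined(SKETCH_HAVE_SSE2)
  // Compare all 16 bytes against the needle at once, then pack the
  // per-byte results into the low 16 bits of an integer.
  const __m128i group = _mm_loadu_si128(reinterpret_cast<const __m128i*>(metadata));
  const __m128i needle = _mm_set1_epi8(static_cast<char>(value));
  return static_cast<unsigned>(_mm_movemask_epi8(_mm_cmpeq_epi8(group, needle)));
#else
  // Scalar fallback producing the same bitmask.
  unsigned mask = 0;
  for(int i = 0; i < kGroupSize; ++i)
  {
    if(metadata[i] == value)
    {
      mask |= (1u << i);
    }
  }
  return mask;
#endif
}

// Returns the index of the first empty slot in the group, or -1 if none.
inline int firstEmptySlot(const std::uint8_t* metadata)
{
  const unsigned mask = matchByte(metadata, kEmpty);
  for(int i = 0; i < kGroupSize; ++i)
  {
    if(mask & (1u << i))
    {
      return i;
    }
  }
  return -1;
}
```

In this sketch a lookup would call `matchByte` with the reduced hash byte and visit only the slots whose bits are set, while an insertion would call `firstEmptySlot`. The scalar branch keeps the behavior identical on targets without SSE2.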

Performance

We see roughly a 3x speedup at small element counts, dropping to 2x at 100k-900k elements and to 1.3-1.5x at 1M-9M elements.

kennyweiss · 2025-07-16T00:57:10Z

src/axom/core/detail/FlatTable.hpp

 #include "axom/core/ArrayView.hpp"
 #include "axom/core/utilities/BitUtilities.hpp"

+#if defined(_MSC_VER)


Do we need to do anything special to get the intrinsics, e.g. compile with the -march flag?

My understanding is that nothing special is required if Axom is compiled as a 64-bit library, since support for up to SSE2 is a part of the x86-64 spec. For 32-bit x86 systems, you would need to specify -march=....

I confirmed with my changes from #1614 that I am using the new SSE2 intrinsics with the rzwhippet-clang host-config.
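As a rough illustration of the point about compiler flags, the sketch below shows one way SSE2 availability could be detected at compile time across compilers. It is an assumption-laden example, not the PR's implementation: the macro `SKETCH_USE_SSE2` and the functions `sse2Enabled` and `addFirstLane` are hypothetical names. GCC and Clang define `__SSE2__` when SSE2 code generation is enabled (the default for x86-64 targets), while MSVC defines `_M_X64` for x64 targets and sets `_M_IX86_FP` to 2 for 32-bit builds using `/arch:SSE2` or higher.

```cpp
#include <cassert>

// Hypothetical feature macro: 1 when SSE2 intrinsics may be used, else 0.
#if defined(__SSE2__)
  // GCC, Clang, and compatible compilers
  #define SKETCH_USE_SSE2 1
#elif defined(_MSC_VER) && (defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2))
  // MSVC: x64 always has SSE2; 32-bit x86 needs /arch:SSE2 or higher
  #define SKETCH_USE_SSE2 1
#else
  #define SKETCH_USE_SSE2 0
#endif

#if SKETCH_USE_SSE2
  #include <emmintrin.h>
#endif

// Reports which code path this translation unit was compiled with.
constexpr bool sse2Enabled() { return SKETCH_USE_SSE2 != 0; }

// Adds two integers, going through SSE2 registers when available.
// Only here to show that both branches compile and agree.
inline int addFirstLane(int a, int b)
{
#if SKETCH_USE_SSE2
  const __m128i va = _mm_cvtsi32_si128(a);
  const __m128i vb = _mm_cvtsi32_si128(b);
  return _mm_cvtsi128_si32(_mm_add_epi32(va, vb));
#else
  return a + b;
#endif
}
```

Calling `sse2Enabled()` in a test or at startup is one simple way to confirm which path a given host-config actually built.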

kennyweiss

Thanks @publixsubfan

Arlie-Capps

Very cool, Max. Thanks!

rhornung67 · 2025-07-16T15:50:25Z

@publixsubfan Nice work!

publixsubfan added 2 commits July 15, 2025 16:45

FlatMap: optimize with SSE intrinsics

eb3b4e3

Detect SSE2 support across different compilers

4cabe1b

publixsubfan requested review from Arlie-Capps, BradWhitlock, bmhan12, jcs15c, kennyweiss, rhornung67 and white238 July 16, 2025 00:53

publixsubfan added the Core (issues related to Axom's 'core' component) and Performance (issues related to code performance) labels Jul 16, 2025

kennyweiss reviewed Jul 16, 2025

kennyweiss mentioned this pull request Jul 16, 2025

Use axom::FlatMap in spin's Octree implementation #1614

Merged

kennyweiss approved these changes Jul 16, 2025

bmhan12 approved these changes Jul 16, 2025

Arlie-Capps approved these changes Jul 16, 2025

rhornung67 approved these changes Jul 16, 2025

publixsubfan merged commit d5e807f into develop Jul 16, 2025
15 checks passed
