@kennyweiss
Member

@kennyweiss kennyweiss commented Jul 15, 2025

Summary

  • This PR is progress towards a GPU-capable InOutOctree implementation (Port InOutOctree to GPU #993)
  • This first PR uses axom::FlatMap in the implementation of spin::SparseOctreeLevel
    • Everything is still single threaded on the CPU
  • It also cleans up and modernizes the Octree implementation a bit

Performance

Here is the performance difference I saw using axom::FlatMap instead of sparsehash in the implementation
in the InOutOctree containment_driver example.

                           Sphere (1K tris)                          BoxedSphere (65K tris)                         Plane (373K tris)
Config          Construct   Query $64^3$  Query $256^3$      Construct   Query $64^3$  Query $256^3$      Construct   Query $64^3$  Query $256^3$

Using FlatMap
Debug               0.153          0.856             --         25.145          0.700             --        116.298          0.394             --
Release             0.012          0.155          9.187          1.500          0.112          6.412          7.301          0.027          1.608

Using sparsehash
Debug               0.145          0.763             --         25.833          0.639             --        111.526          0.412             --
Release             0.009          0.126          7.405          1.163          0.086          4.903          5.608          0.027          1.364

Ratio FlatMap/sparsehash
Debug                1.06           1.12             --           0.97           1.10             --           1.04           0.96             --
Release              1.25           1.23           1.24           1.29           1.30           1.31           1.30           1.00           1.18

Notes/observations:

  • Each run was of the form
> ./examples/quest_containment_driver_ex -i /path/to/stl   --caliper report
  • Timings are in seconds
  • I only ran each query a single time, but didn't notice too much difference on repeated runs
  • For Debug configs, I only ran the query to $64^3$ query points
  • Overall, FlatMap worked as a drop-in replacement for std::unordered_map and sparsehash
  • In this first step, I am building up the FlatMap incrementally. I.e., I am not using the batched creation (FlatMap: add method for batched GPU construction #1610)
  • In debug configurations, the performance is within 5-10% for construction and queries
  • In release configurations, the FlatMap performance is about 25-30% slower than sparsehash for construction and queries
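
As a worked check of the ratio rows, here is a minimal sketch that recomputes a few Release entries from the timings in the table. The helper names `slowdownRatio` and `roundTo2` are hypothetical and not part of Axom. The posted ratios were presumably computed from unrounded timings, so recomputing from the three-decimal values can differ slightly for the smallest timings.

```cpp
#include <cassert>
#include <cmath>

// Ratio of FlatMap time to sparsehash time; values above 1 mean FlatMap is slower.
// (Hypothetical helper for illustration only.)
inline double slowdownRatio(double flatMapSeconds, double sparsehashSeconds)
{
  assert(sparsehashSeconds > 0.0);
  return flatMapSeconds / sparsehashSeconds;
}

// Round to two decimals, matching the precision used in the table.
inline double roundTo2(double x) { return std::round(x * 100.0) / 100.0; }
```

For example, the BoxedSphere construction entry is `roundTo2(slowdownRatio(1.500, 1.163))`, which gives 1.29, and the Plane construction entry is `roundTo2(slowdownRatio(7.301, 5.608))`, which gives 1.30.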

@kennyweiss kennyweiss self-assigned this Jul 15, 2025
@kennyweiss kennyweiss added the Quest, Spin, and maintenance labels Jul 15, 2025
@kennyweiss
Member Author

kennyweiss commented Jul 15, 2025

The table above didn't render with all the examples I ran, so I removed the golfball example.
Here's a screenshot from excel with golfball dataset (I forgot to run it w/ FlatMap in my Debug config, so it's missing that data):
[image: Excel screenshot of timings, including the golfball dataset]

- #else
- using MapType = std::unordered_map<RepresentationType, BroodDataType>;
- #endif
+ using MapType = axom::FlatMap<RepresentationType, BroodDataType>;
Member Author

Note: This is the main change in this PR -- using axom::FlatMap instead of google::dense_hash_map or std::unordered_map in SparseOctreeLevel
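
To illustrate why a change like this can stay small, here is a minimal sketch of the alias pattern, assuming the rest of the class only refers to `MapType`. The key and value types and the helpers `insertBlock` and `hasBlock` are hypothetical stand-ins, and `std::unordered_map` is used in place of `axom::FlatMap` so the example depends only on the standard library.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Stand-in key/value types; the PR's real types are RepresentationType and BroodDataType.
using KeyType = std::uint32_t;
using ValueType = std::string;

// Changing this one alias swaps the container for all code below,
// as long as the replacement offers the same find/end/operator[] interface.
using MapType = std::unordered_map<KeyType, ValueType>;

// Insert or overwrite the value stored for a key.
inline void insertBlock(MapType& m, KeyType key, const ValueType& value) { m[key] = value; }

// Check whether a key is present without inserting it.
inline bool hasBlock(const MapType& m, KeyType key) { return m.find(key) != m.end(); }
```

Code written against the alias compiles unchanged when the underlying container changes, which is the sense in which one hash map can act as a drop-in replacement for another.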

Comment on lines 151 to +155
  IteratorHelper(OctreeLevelType* octLevel, bool begin)
-   : m_offset(0)
+   : m_currentIter(begin ? octLevel->m_map.begin() : octLevel->m_map.end())
+   , m_offset(0)
    , m_isLevelZero(octLevel->level() == 0)
- {
-   m_currentIter = begin ? octLevel->m_map.begin() : octLevel->m_map.end();
- }
+ { }
Member Author

@publixsubfan -- FlatMap was mostly a drop-in replacement for the other hash maps.

The only part that needed adjustment was in this constructor. It seems that the iterator for FlatMap does not have a default constructor.

Without this change, I got lots of errors of the form

In file included from <axom>/src/tools/data_collection_util.cpp:12:
In file included from <axom>/build_axom/include/axom/quest.hpp:12:
In file included from <axom>/src/axom/quest/Delaunay.hpp:14:
In file included from <axom>/build_axom/include/axom/spin.hpp:17:
In file included from <axom>/src/axom/spin/OctreeBase.hpp:22:
<axom>/src/axom/spin/SparseOctreeLevel.hpp:157:5: error: constructor for 'axom::spin::SparseOctreeLevel<3, axom::quest::InOutBlockData, unsigned int>::IteratorHelper<const axom::spin::SparseOctreeLevel<3, axom::quest::InOutBlockData, unsigned int>, axom::FlatMap<unsigned int, axom::NumericArray<axom::quest::InOutBlockData, 8>>::IteratorImpl<true>, axom::spin::OctreeLevel<3, axom::quest::InOutBlockData>::ConstBlockIteratorHelper>' must explicitly initialize the member 'm_currentIter' which does not have a default constructor
    IteratorHelper(OctreeLevelType* octLevel, bool begin)
    ^
<axom>/src/axom/spin/SparseOctreeLevel.hpp:220:16: note: in instantiation of member function 'axom::spin::SparseOctreeLevel<3, axom::quest::InOutBlockData, unsigned int>::IteratorHelper<const axom::spin::SparseOctreeLevel<3, axom::quest::InOutBlockData, unsigned int>, axom::FlatMap<unsigned int, axom::NumericArray<axom::quest::InOutBlockData, 8>>::IteratorImpl<true>, axom::spin::OctreeLevel<3, axom::quest::InOutBlockData>::ConstBlockIteratorHelper>::IteratorHelper' requested here
    return new ConstIterHelper(this, begin);
               ^
<axom>/src/axom/spin/SparseOctreeLevel.hpp:202:3: note: in instantiation of member function 'axom::spin::SparseOctreeLevel<3, axom::quest::InOutBlockData, unsigned int>::getIteratorHelper' requested here
  SparseOctreeLevel(int level = -1) : Base(level) { BroodTraits::initializeMap(m_map); }
  ^
<axom>/src/axom/spin/OctreeBase.hpp:453:35: note: in instantiation of member function 'axom::spin::SparseOctreeLevel<3, axom::quest::InOutBlockData, unsigned int>::SparseOctreeLevel' requested here
        m_leavesLevelMap[i] = new Sparse32OctLevType(i);
                                  ^
<axom>/src/axom/spin/SpatialOctree.hpp:48:7: note: in instantiation of member function 'axom::spin::OctreeBase<3, axom::quest::InOutBlockData>::OctreeBase' requested here
    : BaseOctree()
      ^
<axom>/src/axom/quest/InOutOctree.hpp:166:7: note: in instantiation of member function 'axom::spin::SpatialOctree<3, axom::quest::InOutBlockData>::SpatialOctree' requested here
    : SpatialOctreeType(GeometricBoundingBox(bb).scale(DEFAULT_BOUNDING_BOX_SCALE_FACTOR))
      ^
<axom>/src/axom/quest/detail/shaping/InOutSampler.hpp:92:20: note: in instantiation of member function 'axom::quest::InOutOctree<3>::InOutOctree' requested here
    m_octree = new InOutOctreeType(m_bbox, m_surfaceMesh);
                   ^
<axom>/src/axom/quest/SamplingShaper.hpp:176:25: note: in instantiation of member function 'axom::quest::shaping::InOutSampler<3>::initSpatialIndex' requested here
      m_inoutSampler3D->initSpatialIndex(this->m_vertexWeldThreshold);
                        ^
<axom>/src/axom/spin/SparseOctreeLevel.hpp:195:21: note: member is declared here
    AdaptedIterType m_currentIter;
                    ^
<axom>/src/axom/core/FlatMap.hpp:671:42: note: 'axom::FlatMap<unsigned int, axom::NumericArray<axom::quest::InOutBlockData, 8>>::IteratorImpl<true>' declared here
class FlatMap<KeyType, ValueType, Hash>::IteratorImpl
                                         ^

i.e.: error: constructor for 'ConstBlockIteratorHelper' must explicitly initialize the member 'm_currentIter' which does not have a default constructor
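
The error can be reproduced in isolation. This is a minimal sketch with hypothetical types (`NoDefaultIter` and `Helper` are not Axom classes): a member whose type has no default constructor must be constructed in the member initializer list, because all members are initialized before the constructor body runs.

```cpp
#include <type_traits>

// Hypothetical iterator-like type with no default constructor.
struct NoDefaultIter
{
  explicit NoDefaultIter(int pos) : m_pos(pos) { }
  int m_pos;
};

static_assert(!std::is_default_constructible<NoDefaultIter>::value,
              "NoDefaultIter intentionally lacks a default constructor");

struct Helper
{
  // OK: the member is constructed directly in the initializer list.
  explicit Helper(bool begin) : m_iter(begin ? 0 : 10), m_offset(0) { }

  // Would not compile: m_iter would need to be default-constructed
  // before the assignment in the constructor body could run.
  // explicit Helper(bool begin) : m_offset(0) { m_iter = NoDefaultIter(begin ? 0 : 10); }

  NoDefaultIter m_iter;  // declared first, so it is initialized first
  int m_offset;
};
```

Moving the initialization into the initializer list, as the diff above does for `m_currentIter`, avoids the need for a default constructor.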

Contributor

Yeah, that looks like a bug in FlatMap: the LegacyForwardIterator concept also requires DefaultConstructible https://en.cppreference.com/w/cpp/named_req/ForwardIterator.html
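
A compile-time check makes the requirement concrete. The sketch below uses `std::unordered_map` as a reference point, since its iterators satisfy the forward iterator requirements. `LateInitHolder` is a hypothetical helper showing the pattern that depends on default constructibility; applying the same `static_assert` to a FlatMap iterator type is left as an assumption, since that type is not reproduced here.

```cpp
#include <type_traits>
#include <unordered_map>

// Iterator type of a standard hash map, used here as a reference point.
using StdMapIter = std::unordered_map<unsigned int, int>::iterator;

// Forward iterators must be default constructible, so this holds for the standard container.
static_assert(std::is_default_constructible<StdMapIter>::value,
              "standard forward iterators are default constructible");

// Hypothetical helper: compiles only when Iter can be default constructed,
// which is what allows the iterator to be assigned after construction.
template <typename Iter>
struct LateInitHolder
{
  static_assert(std::is_default_constructible<Iter>::value,
                "Iter must be default constructible to be assigned after construction");
  LateInitHolder() { }
  void set(Iter it) { m_iter = it; }
  Iter m_iter;
};
```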

Member Author

I created #1618 to track this.

struct PointHash
{
using MortonIndex = std::size_t;
using result_type = std::size_t;
Member Author

Note: This was the only other change to make axom::FlatMap work with the PointHash class!
(i.e. adding an expected typedef)
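
For readers unfamiliar with the pattern, here is a minimal sketch of a hash functor that exposes nested typedefs, and of generic code that reads them. `GridPoint`, `GridPointHash`, and `hashOf` are hypothetical names, and the exact typedef FlatMap requires is not stated in this thread beyond what the diff shows, so treat this as an illustration rather than a description of FlatMap's interface.

```cpp
#include <cstddef>
#include <unordered_map>

// Hypothetical key type for illustration.
struct GridPoint
{
  int x;
  int y;
  bool operator==(const GridPoint& other) const { return x == other.x && y == other.y; }
};

// Hash functor exposing the nested typedefs that some generic code expects.
struct GridPointHash
{
  using argument_type = GridPoint;
  using result_type = std::size_t;

  result_type operator()(const GridPoint& p) const noexcept
  {
    // Simple mix of the two coordinates; not tuned for quality.
    return (static_cast<std::size_t>(p.x) * 73856093u) ^ (static_cast<std::size_t>(p.y) * 19349663u);
  }
};

// Generic code that reads the typedefs; it fails to compile if the functor lacks them.
template <typename Hash>
typename Hash::result_type hashOf(const Hash& h, const typename Hash::argument_type& v)
{
  return h(v);
}
```

A functor without `result_type` would still work with `std::unordered_map`, but would fail in `hashOf`, which is the kind of mismatch a one-line typedef fixes.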

- /**
-  * \brief Generate the spatial index over the surface mesh
-  */
+ /// \brief Generate the spatial index over the surface mesh
Member Author

Sorry for all the noise in this PR -- I used it as an opportunity to clean up documentation in the files related to the InOutOctree

  //Returns 30 bit morton code for coordinates point is expected to be between [0,1]
  template <typename FloatType, int Dims>
- static inline AXOM_HOST_DEVICE std::int32_t morton32_encode(const primal::Vector<FloatType, Dims>& point)
+ static inline AXOM_HOST_DEVICE std::uint32_t morton32_encode(const primal::Vector<FloatType, Dims>& point)
Member Author

Another minor change -- the return type for these morton_encode functions should be unsigned since the MortonIndexType is unsigned.

For other recent changes to this function, see https://github.com/LLNL/axom/pull/1611/files#diff-17b5f2106d61093b9348d431dc0e84ce494f4551984015540561e7ebf53c0e32

While looking at this again, I decided to use axom::clampVal instead of fmin(fmax, ...)
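
For context, here is a minimal sketch of a 30-bit 3D Morton encoding with an unsigned return type. It is not the Axom implementation: `std::clamp` stands in for `axom::clampVal`, the names `expandBits10` and `morton30` are hypothetical, and the bit order (x in the highest interleaved position) is an assumption. The bit tricks rely on shifts and masks of unsigned values, which is one reason an unsigned return type is the natural fit.

```cpp
#include <algorithm>
#include <cstdint>

// Spread the low 10 bits of v so that two zero bits separate each original bit.
inline std::uint32_t expandBits10(std::uint32_t v)
{
  v &= 0x000003FFu;
  v = (v ^ (v << 16)) & 0xFF0000FFu;
  v = (v ^ (v << 8)) & 0x0300F00Fu;
  v = (v ^ (v << 4)) & 0x030C30C3u;
  v = (v ^ (v << 2)) & 0x09249249u;
  return v;
}

// Encode a point with coordinates expected in [0,1] as a 30-bit Morton code.
// Out-of-range coordinates are clamped to the nearest valid cell.
inline std::uint32_t morton30(float x, float y, float z)
{
  // Quantize each coordinate to a 10-bit integer cell index in [0, 1023].
  auto quantize = [](float c) {
    return static_cast<std::uint32_t>(std::clamp(c * 1024.0f, 0.0f, 1023.0f));
  };
  return (expandBits10(quantize(x)) << 2) | (expandBits10(quantize(y)) << 1) |
    expandBits10(quantize(z));
}
```

With this sketch, the origin maps to 0 and the point (1, 1, 1) maps to `0x3FFFFFFF`, the largest 30-bit value.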

@rhornung67
Member

@kennyweiss based on the performance results, is your plan to use sparsehash for the CPU implementation and FlatMap for the GPU? That would seem to be an easy configuration option since FlatMap is a drop-in replacement for sparsehash.

Would sparsehash need to be enhanced to be thread safe for OpenMP parallelism, for example?

@kennyweiss
Member Author

Thanks for the SSE improvements @publixsubfan!
I reran my experiments using the SSE2 intrinsics from #1616

Here are my updated results for release configs (note the different y-axis ranges):
[image: inout_octree_construct_times]
[image: inout_octree_query64_times]
[image: inout_octree_query256_times]

There's a definite improvement, but there's still a performance regression w.r.t. sparsehash.

Here are the comparisons


--- Construct ---
Model               FlatMap/sparsehash     FlatMap w/ SSE2/sparsehash
Sphere                            1.33                           1.11
BoxedSphere                       1.29                           1.20
Plane                             1.30                           1.21
Golf ball                         1.25                           1.17

--- Query 64³ ---
Model               FlatMap/sparsehash     FlatMap w/ SSE2/sparsehash
Sphere                            1.23                           1.22
BoxedSphere                       1.30                           1.24
Plane                             1.00                           0.89
Golf ball                         1.28                           1.16

--- Query 256³ ---
Model               FlatMap/sparsehash     FlatMap w/ SSE2/sparsehash
Sphere                            1.24                           1.23
BoxedSphere                       1.31                           1.27
Plane                             1.18                           1.07
Golf ball                         1.32                           1.23

@kennyweiss
Member Author

kennyweiss commented Jul 16, 2025

> @kennyweiss based on the performance results, is your plan to use sparsehash for CPU implementation and flatmap for GPU? That would seem to be an easy configuration option since flatmap is a drop-in replacement for sparsehash.
>
> Would sparsehash need to be enhanced to be thread safe for OpenMP parallelism, for example?

@rhornung67 -- While it would be relatively easy to add support back for the sparsehash in this PR, it will become much more difficult to support both once we start using the batch creation features from FlatMap (#1610), and I'm anticipating the latter (and associated algorithmic changes) to yield significant speedups in construction times. I spoke to a few key users of the InOutOctree and they are ok with the temporary performance regression as we port this to the GPU. For now, I'll note the regression in the RELEASE-NOTES.

@rhornung67
Member

@kennyweiss how does performance compare with sparsehash using @publixsubfan SIMD approach?

@kennyweiss
Member Author

> @kennyweiss how does performance compare with sparsehash using @publixsubfan SIMD approach?

I posted it in the charts above. The SSE intrinsics get us part of the way there. There's now about 10-20% slowdown in construction and about 20% slowdown in queries (with some exceptions).

Member

@BradWhitlock BradWhitlock left a comment

This PR seems fine as it is mostly comment reformatting, enum->constexpr changes, and minor changes needed to use FlatMap.

@kennyweiss kennyweiss merged commit e33582f into develop Jul 21, 2025
15 checks passed
@kennyweiss kennyweiss deleted the feature/kweiss/inout-gpu-prep branch July 21, 2025 18:41