Right now this is more of an experiment investigating the possibilities of integrating SIMD-enabled algorithms into the math library. Currently it contains only the simplest algorithms ever, packing and unpacking index arrays to larger/smaller representations (and I was not able to beat the compiler; my AVX2 version was only as fast as what GCC managed to optimize out of the scalar implementation). It also shows how GCC function multi-versioning could be used to have multiple implementations in the same binary instead of always compiling for only one target.
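A minimal sketch of what function multi-versioning could look like for the index packing case, assuming GCC on x86-64 Linux. `packIndices()` and `EXAMPLE_MULTIVERSIONED` are hypothetical names made up for illustration, not the code in this PR. The `target_clones` attribute asks GCC to compile the same scalar loop once per listed target and to select a clone at program startup based on the CPU:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Function multi-versioning is a GCC extension. On other compilers or
// platforms this (hypothetical) macro expands to nothing and only a single
// scalar version gets compiled.
#if defined(__GNUC__) && !defined(__clang__) && defined(__x86_64__) && defined(__linux__)
#define EXAMPLE_MULTIVERSIONED __attribute__((target_clones("avx2", "sse4.2", "default")))
#else
#define EXAMPLE_MULTIVERSIONED
#endif

// Packs 32-bit indices into 16-bit ones. The loop is plain scalar code, the
// compiler is left to vectorize each clone for its own target. Values above
// 65535 are truncated, the caller is expected to check the range beforehand.
EXAMPLE_MULTIVERSIONED
void packIndices(const std::uint32_t* in, std::uint16_t* out, std::size_t count) {
    assert(count == 0 || (in && out));
    for(std::size_t i = 0; i != count; ++i)
        out[i] = static_cast<std::uint16_t>(in[i]);
}
```

Calling `packIndices(in, out, count)` looks the same as calling an ordinary function; which clone actually runs is decided without any explicit dispatch code on the caller side.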
CPU feature detection and dispatch were moved to mosra/corrade#115; the following list contains just the remaining tasks related to actual SIMD code:
- investigate differences in intrinsics on MSVC vs GCC -- not that many, based on practical experience with tests in mosra/corrade#115 (Compile-time and runtime CPU feature (SIMD) detection and dispatch)
- Travis has ARM builds now: https://blog.travis-ci.com/2019-10-07-multi-cpu-architecture-support -- Travis is dead, having a CircleCI ARM build instead
- GCC needs `-m<arch>` in order to be able to use the intrinsics at all -- resolved in mosra/corrade#115
- MSVC can specify just `/arch:SSE2` and use all SSE3–SSE4.2 intrinsics, then `/arch:AVX` for all AVX1 code etc., which reduces the combinations a lot; technically one can use the `target` attribute that could solve this -- what's the support on GCC 4.8? What about Clang? Alternatives on MSVC?
- `NEVER_INLINE` on runtime dispatchers so LTO + speculative execution doesn't cause SIGILL? https://www.reddit.com/r/cpp/comments/eneib0/detecting_sse_features_at_runtime/fed6zvr?utm_source=share&utm_medium=web2x&context=3 -- investigate further, might be avoidable with the `target` attributes

Useful SIMD-able things to add
List will grow.
- `MeshTools::generateSmoothNormals()` / `generateFlatNormals()`, nicely self-contained and could have a big visible impact
- `Utility::copy()` (lots of potential for optimizations when the (sub)ranges are contiguous or evenly spaced)
- `Containers::BoolArray`, popcnt and other operations
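To illustrate the popcnt item together with the `target` attribute and the never-inlined runtime dispatcher mentioned in the task list, here is a rough sketch assuming GCC or Clang on x86-64. All names (`popcountScalar()`, `popcountPopcnt()`, `popcount()`, `EXAMPLE_HAS_POPCNT_TARGET`) are hypothetical and not part of Magnum or Corrade, and the actual dispatch in mosra/corrade#115 may work quite differently:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__GNUC__) && defined(__x86_64__)
#define EXAMPLE_HAS_POPCNT_TARGET
#endif

// Portable fallback: clears the lowest set bit until the word becomes zero
std::size_t popcountScalar(const std::uint64_t* data, std::size_t count) {
    assert(count == 0 || data);
    std::size_t bits = 0;
    for(std::size_t i = 0; i != count; ++i)
        for(std::uint64_t word = data[i]; word; word &= word - 1)
            ++bits;
    return bits;
}

#ifdef EXAMPLE_HAS_POPCNT_TARGET
// The target attribute enables POPCNT for this one function only, independent
// of the -m<arch> flags the rest of the file is compiled with. With it, the
// builtin can compile down to the POPCNT instruction.
__attribute__((target("popcnt")))
std::size_t popcountPopcnt(const std::uint64_t* data, std::size_t count) {
    std::size_t bits = 0;
    for(std::size_t i = 0; i != count; ++i)
        bits += static_cast<std::size_t>(__builtin_popcountll(data[i]));
    return bits;
}
#endif

// Runtime dispatcher. Marked noinline so the POPCNT code path stays behind
// the CPU feature check instead of being merged into the callers.
#if defined(__GNUC__)
__attribute__((noinline))
#endif
std::size_t popcount(const std::uint64_t* data, std::size_t count) {
#ifdef EXAMPLE_HAS_POPCNT_TARGET
    if(__builtin_cpu_supports("popcnt"))
        return popcountPopcnt(data, count);
#endif
    return popcountScalar(data, count);
}
```

Because only `popcountPopcnt()` is compiled with POPCNT enabled, the rest of the translation unit keeps targeting the baseline instruction set, which is the property the `target` attribute item is after.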