fix #1588 #1590
This avoids adding compiler flags to the CMake configuration, giving developers proper flexibility.
@arturbac, thanks for this pull request. I deleted the old
As I mentioned in issue #1591, you don't need explicit intrinsics at all, and you should generally avoid them. All you need is to write generic code using vector types and let the compiler vectorize it for the instruction set selected at compile time. As a last resort, use intrinsics in isolated places where auto-vectorization cannot generate the most performant instruction.
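To illustrate the generic vector-type approach described in the comment above, here is a minimal sketch, assuming the GCC/Clang `vector_size` extension. The names `float4`, `scale_add`, and `saxpy` are hypothetical and are not taken from Glaze or from the commenter's code. The arithmetic is written once on the vector type, and the compiler chooses SSE, AVX, NEON, or scalar instructions from the target flags.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// GCC/Clang vector extension: four floats handled as a single value.
// (Hypothetical alias, used only for this illustration.)
typedef float float4 __attribute__((vector_size(16)));

// Generic element-wise x * s + y; no intrinsics are named here.
inline float4 scale_add(float4 x, float4 y, float s) noexcept
{
    const float4 sv = {s, s, s, s}; // broadcast the scalar explicitly
    return x * sv + y;
}

// Computes y[i] = x[i] * s + y[i] for n elements.
inline void saxpy(float s, const float* x, float* y, std::size_t n) noexcept
{
    assert((x != nullptr && y != nullptr) || n == 0);
    std::size_t i = 0;
    // Process four elements at a time; memcpy keeps unaligned access well defined.
    for (; i + 4 <= n; i += 4) {
        float4 xv, yv;
        std::memcpy(&xv, x + i, sizeof xv);
        std::memcpy(&yv, y + i, sizeof yv);
        yv = scale_add(xv, yv, s);
        std::memcpy(y + i, &yv, sizeof yv);
    }
    // Scalar tail for the remaining elements.
    for (; i < n; ++i) {
        y[i] = x[i] * s + y[i];
    }
}
```

Compiling the same source with different target options (for example `-mavx2` on x86_64) changes the generated instructions without touching the code, which is the flexibility the comment argues for.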
There are places in Glaze where I know I can optimize code using BMI2 instructions that are not generated from generic vectorized code. But, I could add that as a separate CMake option. I think I'm going to follow your advice and use
In my production code I hid explicit intrinsic calls behind forced-inline functions written for generic, x86_64, and aarch64 variants. Examples:
template<int... args, typename vector_type>
constexpr vector_type shuffle_vector(vector_type a, vector_type b) noexcept
{
#if defined(__clang__)
return __builtin_shufflevector(a, b, args...);
#else
using element_type = typename std::remove_reference<typename std::remove_cv<decltype(a[0])>::type>::type;
return __builtin_shuffle(a, b, vector_type{static_cast<element_type>(args)...});
#endif
}
or
#if defined(__ARM_NEON)
[[nodiscard, gnu::always_inline, gnu::const]]
inline float64x2_t max_pd(float64x2_t a, float64x2_t b)
{
return vmaxq_f64(a, b); // element-wise max, matching _mm_max_pd; vpmaxq_f64 is the pairwise variant
}
#elif defined(__SSE2__)
using float64x2_t = __m128d;
[[nodiscard, gnu::always_inline, gnu::const]]
inline float64x2_t max_pd(float64x2_t a, float64x2_t b) noexcept
{
return _mm_max_pd(a, b);
}
#else
[[nodiscard, gnu::always_inline, gnu::const]]
inline float64x2_t max_pd(float64x2_t a, float64x2_t b) noexcept
{
return
and that way I was able to avoid using direct intrinsic
That's a good approach.
This pull request adds glaze_DISABLE_SIMD_WHEN_SUPPORTED, which is OFF by default, so the build uses AVX2 instructions when available. Turning it on lets developers disable AVX2 compilation when cross-compiling.
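As a rough sketch of how a build-level switch like this can gate an AVX2 code path, the following assumes a hypothetical preprocessor macro, `EXAMPLE_DISABLE_SIMD`, that the build system would define when SIMD is turned off. The macro and the function `count_byte` are illustrative only; the pull request description does not show how Glaze wires its option into the sources.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// EXAMPLE_DISABLE_SIMD is a hypothetical macro standing in for whatever the
// build system defines when the developer disables SIMD.
#if defined(__AVX2__) && !defined(EXAMPLE_DISABLE_SIMD)
#include <immintrin.h>
#define EXAMPLE_USE_AVX2 1
#else
#define EXAMPLE_USE_AVX2 0
#endif

// Counts how many of the first n bytes of data equal value.
inline std::size_t count_byte(const unsigned char* data, std::size_t n,
                              unsigned char value) noexcept
{
    assert(data != nullptr || n == 0);
    std::size_t count = 0;
    std::size_t i = 0;
#if EXAMPLE_USE_AVX2
    // AVX2 path: compare 32 bytes at once and count the matching lanes.
    const __m256i needle = _mm256_set1_epi8(static_cast<char>(value));
    for (; i + 32 <= n; i += 32) {
        const __m256i chunk =
            _mm256_loadu_si256(reinterpret_cast<const __m256i*>(data + i));
        const auto mask = static_cast<std::uint32_t>(
            _mm256_movemask_epi8(_mm256_cmpeq_epi8(chunk, needle)));
        count += static_cast<std::size_t>(__builtin_popcount(mask));
    }
#endif
    // Scalar path: handles the tail, or everything when AVX2 is unavailable
    // or has been disabled.
    for (; i < n; ++i) {
        count += (data[i] == value) ? 1u : 0u;
    }
    return count;
}
```

With this shape, a cross-compiling developer only needs the build to define the disabling macro; the scalar loop then handles all input and no AVX2 instructions are emitted.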