
fix #1588 #1590

Merged: 4 commits into stephenberry:main from arturbac:pr-vector-code, Feb 4, 2025

Conversation

@arturbac (Contributor) commented Feb 3, 2025

This pull request adds glaze_DISABLE_SIMD_WHEN_SUPPORTED, which is OFF by default, so the build still uses AVX2 instructions when available. It allows developers to turn off AVX2 compilation, for example when cross-compiling.
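
As a sketch of the intended usage (the exact invocation below is illustrative, not taken from the PR), a developer cross-compiling for a target without AVX2 could set the new cache option at configure time:

cmake -B build -Dglaze_DISABLE_SIMD_WHEN_SUPPORTED=ON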

@stephenberry (Owner)

This avoids adding compiler flags to the CMake configuration, giving developers proper flexibility.

@stephenberry (Owner)

@arturbac, thanks for this pull request. I deleted the old avx.cmake and made some minor changes. Do you think in the long run we'll want separate options for different instruction sets, such as what I've named glaze_DISABLE_AVX2_WHEN_SUPPORTED, or should Glaze just use a catch-all glaze_DISABLE_SIMD_WHEN_SUPPORTED? I'm thinking we might end up with a few different options for different instruction sets in the long run, except that your recent suggestion of using built-in vectorization types might make us want just one global option.

@arturbac (Contributor, Author) commented Feb 4, 2025

Do you think in the long run we'll want separate options for different instruction sets

As I mentioned in issue #1591, you don't need explicit intrinsics at all, and you should avoid them whenever possible. All you need is to write generic code using vector types and let the compiler vectorize it for whatever instruction set is enabled at compile time. As a last resort, for a few minor cases, you can use an intrinsic in a single place inside otherwise auto-vectorized code, where auto-vectorization cannot generate the most performant instruction.
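
A minimal sketch of that style (an illustration, not code from Glaze or the PR), using the gcc/clang vector_size extension so the compiler emits SSE, AVX2, or NEON purely from the target flags:

// hypothetical example: a portable 8-lane float vector
typedef float f32x8 __attribute__((vector_size(32)));

[[nodiscard, gnu::always_inline]]
inline f32x8 fma8(f32x8 a, f32x8 b, f32x8 c) noexcept
  {
  return a * b + c;  // plain operators, no intrinsics; the compiler vectorizes per the -m flags
  }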

@stephenberry (Owner)

There are places in Glaze where I know I can optimize code using BMI2 instructions that are not generated from generic vectorized code, but I could add that as a separate CMake option. I think I'm going to follow your advice and use glaze_DISABLE_SIMD_WHEN_SUPPORTED for general SIMD instruction sets.
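
As a rough illustration (a sketch, not code from Glaze), such a BMI2 use could still sit behind a wrapper with a portable fallback, so a CMake option would only toggle the fast path:

#include <cstdint>
#if defined(__BMI2__)
#include <immintrin.h>
#endif

// hypothetical wrapper: gather the bits of value selected by mask into the low bits
[[nodiscard]] inline std::uint64_t extract_bits(std::uint64_t value, std::uint64_t mask) noexcept
  {
#if defined(__BMI2__)
  return _pext_u64(value, mask);
#else
  std::uint64_t result = 0;
  std::uint64_t out_bit = 1;
  for (; mask != 0; mask &= mask - 1)  // walk the set bits of mask, lowest first
    {
    if (value & (mask & -mask))
      result |= out_bit;
    out_bit <<= 1;
    }
  return result;
#endif
  }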

stephenberry merged commit b92ddaf into stephenberry:main on Feb 4, 2025 (12 checks passed)
@arturbac (Contributor, Author) commented Feb 4, 2025

In my production code I used an approach that hides explicit intrinsic calls behind force-inlined functions with generic, x86_64, and aarch64 variants.
The shared code stayed generic, never infected with explicit intrinsic calls, which made it more readable, and the wrappers over the intrinsic sets could be tested on their own.
So in the case of BMI2 I would not use the intrinsics directly, but create three wrapper variants:
[bmi2, aarch64 neon, generic C++ code]

examples:

#include <type_traits>

template<int... args, typename vector_type>
constexpr vector_type shuffle_vector(vector_type a, vector_type b) noexcept
  {
#if defined(__clang__)
  return __builtin_shufflevector(a, b, args...);
#else
  // gcc: __builtin_shuffle takes the indices packed into a mask vector
  using element_type = typename std::remove_reference<typename std::remove_cv<decltype(a[0])>::type>::type;
  return __builtin_shuffle(a, b, vector_type{static_cast<element_type>(args)...});
#endif
  }

or

#if defined(__ARM_NEON)
#include <arm_neon.h>  // provides float64x2_t

[[nodiscard, gnu::always_inline, gnu::const]]
inline float64x2_t max_pd(float64x2_t a, float64x2_t b) noexcept
  {
  return vmaxq_f64(a, b);  // element-wise max, same semantics as _mm_max_pd
  }
#elif defined(__SSE2__)
#include <emmintrin.h>  // provides __m128d and _mm_max_pd
using float64x2_t = __m128d;

[[nodiscard, gnu::always_inline, gnu::const]]
inline float64x2_t max_pd(float64x2_t a, float64x2_t b) noexcept
  {
  return _mm_max_pd(a, b);
  }
#else
// generic fallback built on the gcc/clang vector extension
typedef double float64x2_t __attribute__((vector_size(16)));

[[nodiscard, gnu::always_inline, gnu::const]]
inline float64x2_t max_pd(float64x2_t a, float64x2_t b) noexcept
  {
  float64x2_t r = {a[0] > b[0] ? a[0] : b[0], a[1] > b[1] ? a[1] : b[1]};
  return r;
  }
#endif

That way I was able to avoid using intrinsics directly in the shared code.

@stephenberry (Owner)

That's a good approach.

arturbac deleted the pr-vector-code branch on February 4, 2025 at 20:41