fix #1588 #1590

arturbac · 2025-02-03T23:06:45Z

This pull request adds glaze_DISABLE_SIMD_WHEN_SUPPORTED, which is OFF by default and will build with AVX2 instructions when available. But, it allows developers to turn off AVX2 compilation when doing cross-compiles.

stephenberry · 2025-02-04T15:22:45Z

This avoids adding compiler flags to the CMake configuration, giving developers proper flexibility.

stephenberry · 2025-02-04T15:31:01Z

@arturbac, thanks for this pull request. I deleted the old avx.cmake and make some minor changes. Do you think in the long run we'll want separate options for different instruction sets, such as what I've named glaze_DISABLE_AVX2_WHEN_SUPPORTED, or should Glaze just use a catch all glaze_DISABLE_SIMD_WHEN_SUPPORTED? I'm thinking we might end up with a few different options for different instruction sets in the long run, except that your recent suggestion of using built in vectorization types might make us just want one global option.

arturbac · 2025-02-04T18:16:05Z

Do you think in the long run we'll want separate options for different instruction sets

I think what I mentioned in issue #1591, you don't need explicit intrinsics at all, You should avoid using them always. All You need is to write generic code using vector types and let compiler vectorize depending on instruction set being used at compile time. In last resort for some minor cases in auto vectorized code use in single places some intrinsics, in cases where auto vectorization can not generate most performant intrinsic.

stephenberry · 2025-02-04T18:49:08Z

There are places in Glaze where I know I can optimize code using BMI2 instructions that are not generated from generic vectorized code. But, I could add that as a separate CMake option. I think I'm going to follow your advice and use glaze_DISABLE_SIMD_WHEN_SUPPORTED for general SIMD instruction sets.

arturbac · 2025-02-04T19:42:44Z

In my production code I was using approach hiding explicit intrinsic calls with forced inline functions written for generic, x86_64 variants an aarch64
So in global code everything was generic and not infected with any explicit intrinsic calls, more readable and explicit wrappers over intrinsic sets were able to be testable.
So in case of BMI2 i wold not use them directly but create 3 wrapper functions
[bmi2, ararch64 neon, generic c++ code]

examples:

template<int... args, typename vector_type>
constexpr vector_type shuffle_vector(vector_type a, vector_type b) noexcept
  {
#if defined(__clang__)
  return __builtin_shufflevector(a, b, args...);
#else
  using element_type = typename std::remove_reference<typename std::remove_cv<decltype(a[0])>::type>::type;
  return __builtin_shuffle(a, b, vector_type{static_cast<element_type>(args)...});
#endif
  }

or

#if defined(__ARM_NEON)
[[nodiscard, gnu::always_inline, gnu::const]]
inline float64x2_t max_pd(float64x2_t a, float64x2_t b)
  {
  return vpmaxq_f64(a, b);
  }
#elif defined(__SSE2__)
using float64x2_t = __m128d;

[[nodiscard, gnu::always_inline, gnu::const]]
inline float64x2_t max_pd(float64x2_t a, float64x2_t b) noexcept
  {
  return _mm_max_pd(a, b);
  }
 #else 
 [[nodiscard, gnu::always_inline, gnu::const]]
inline float64x2_t max_pd(float64x2_t a, float64x2_t b) noexcept
{ 
return

and that way i was able to avoid using direct intrinsic

stephenberry · 2025-02-04T20:00:16Z

That's a good approach.

Artur Bać and others added 2 commits February 4, 2025 00:05

fix stephenberry#1588

b85f5d6

Fix spelling and allow glaze grouping in Cmake

fa6fd4a

Delete avx.cmake

927aaf1

glaze_DISABLE_AVX2_WHEN_SUPPORTED -> glaze_DISABLE_SIMD_WHEN_SUPPORTED

3b9588a

stephenberry merged commit b92ddaf into stephenberry:main Feb 4, 2025
12 checks passed

arturbac deleted the pr-vector-code branch February 4, 2025 20:41

BrewTestBot mentioned this pull request Feb 4, 2025

glaze 4.4.0 Homebrew/homebrew-core#206544

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix #1588 #1590

fix #1588 #1590

arturbac commented Feb 3, 2025 •

edited by stephenberry

Loading

stephenberry commented Feb 4, 2025

stephenberry commented Feb 4, 2025

arturbac commented Feb 4, 2025

stephenberry commented Feb 4, 2025

arturbac commented Feb 4, 2025

stephenberry commented Feb 4, 2025

fix #1588 #1590

fix #1588 #1590

Conversation

arturbac commented Feb 3, 2025 • edited by stephenberry Loading

stephenberry commented Feb 4, 2025

stephenberry commented Feb 4, 2025

arturbac commented Feb 4, 2025

stephenberry commented Feb 4, 2025

arturbac commented Feb 4, 2025

stephenberry commented Feb 4, 2025

arturbac commented Feb 3, 2025 •

edited by stephenberry

Loading