Add SVE, SVE2, SVEBITPERM, and FP16 feature detection on Windows/ARM64 #55

Draft
raneashay wants to merge 1 commit into microsoft:main from raneashay:ashay/fix-windows-arm-sve-support

This patch adds support for SVE, SVE2, SVEBITPERM, FPHP, and ASIMDHP on
supported ARM64 processors when running Windows. Since the MSVC
compiler does not support inline assembly on ARM64 processors, this
patch introduces a separate file that reads the vector length (VL)
using the `rdvl` assembly instruction. Windows does not expose a
mechanism for writing VL, so this patch makes
`set_and_get_current_sve_vector_length()` simply return the existing VL.
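
The Windows behaviour described above can be sketched roughly as follows.
This is a minimal illustration, not the patch's code:
`read_sve_vector_length()` is a hypothetical stand-in for the routine in
the separate file (stubbed to a fixed 16 bytes so the sketch runs on any
machine), and the `int` parameter is an assumption about the function's
signature.

```cpp
#include <cassert>

// Hypothetical stand-in for the routine that lives in the separate file.
// On SVE hardware it would execute `rdvl x0, #1`, which yields the vector
// length in bytes. It is stubbed to a fixed value here (assumption: 128-bit
// vectors) so that the sketch runs on any machine.
static int read_sve_vector_length() {
  return 16;
}

// Sketch of the Windows behaviour: the requested length is ignored because
// Windows exposes no mechanism for writing VL, so the current VL is returned.
// The parameter type is an assumption made for illustration.
int set_and_get_current_sve_vector_length(int requested_length) {
  (void)requested_length;  // cannot be applied on Windows
  int current = read_sve_vector_length();
  // SVE vector lengths are multiples of 128 bits (16 bytes).
  assert(current > 0 && current % 16 == 0);
  return current;
}
```

With this shape, a caller that asks for a different length still gets the
hardware's current VL back and has to cope with that value.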

This patch introduces a new test (TestSVEFeatureDetection.java) that
validates the SVE level and VL determined by the CPU feature detection
code. It also modifies two existing tests to decouple support for
SVE/SVE2 from support for FPHP and ASIMDHP. Specifically, in
TestFloat16VectorOperations.java, SVE alone is insufficient to expect
half-precision vector operations; instead, FPHP and ASIMDHP support
(which is already exercised by the test case) suffices. Along similar
lines, in TestReductions.java, we should expect a non-zero number of
vector operations when SVE is available, and the test should fail if
vector operations appear when SVE is unavailable.
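
The decoupled expectations can be written as two independent predicates.
The sketch below is only an illustration of that logic for ARM64: the
feature bits and function names are hypothetical, and the actual tests
express these conditions in Java rather than C++.

```cpp
#include <cstdint>

// Hypothetical feature bits. The names mirror the features discussed above;
// the bit values are arbitrary.
enum CpuFeature : std::uint32_t {
  SVE     = 1u << 0,
  SVE2    = 1u << 1,
  FPHP    = 1u << 2,
  ASIMDHP = 1u << 3,
};

// Half-precision vector operations are expected only when both FPHP and
// ASIMDHP are present. SVE and SVE2 do not enter the decision.
constexpr bool expects_fp16_vector_ops(std::uint32_t features) {
  return (features & FPHP) != 0 && (features & ASIMDHP) != 0;
}

// Vector reductions are expected when SVE is present and must be absent
// when it is not.
constexpr bool expects_vector_reductions(std::uint32_t features) {
  return (features & SVE) != 0;
}
```

For example, a processor reporting only `SVE | SVE2` would be expected to
vectorize the reductions but not the half-precision operations.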

Finally, this patch updates TestFloat16ScalarOperations.java to check
for constant-folding of FMA operations only on non-Windows platforms.
We do this because `FmaDNode::Value()`, `FmaFNode::Value()`, and
`FMAHFNode::Value()` fold FMA nodes only when `__STDC_IEC_559__` is
defined, which is not the case on Windows with either GCC or MSVC.
This discrepancy may have gone unnoticed because FMA support for
Windows (on ARM64) was disabled until this patch, so the tests that were
predicated on FPHP/ASIMDHP support never ran on Windows. Of course,
this doesn't explain why we never caught this problem on Windows/x86
machines that support FMAs, but that could be because processors that
support `avx512_fp16` are new and we haven't run CI on such machines.
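
The platform dependence comes down to a compile-time guard. The sketch
below shows the general pattern under the assumption that folding is
gated purely on `__STDC_IEC_559__`; `try_fold_fma()` and `kCanFoldFma`
are hypothetical names used for illustration and are not taken from the
compiler sources.

```cpp
#include <cmath>

// True when the toolchain promises IEC 60559 (IEEE 754) conformance.
#ifdef __STDC_IEC_559__
static const bool kCanFoldFma = true;
#else
static const bool kCanFoldFma = false;
#endif

// Attempts to fold a * b + c into a constant at "compile time". Returns true
// and stores the result in *folded only when the guard macro is defined;
// otherwise it reports that no folding took place and leaves *folded alone.
bool try_fold_fma(double a, double b, double c, double* folded) {
#ifdef __STDC_IEC_559__
  *folded = std::fma(a, b, c);
  return true;
#else
  (void)a; (void)b; (void)c; (void)folded;
  return false;
#endif
}
```

A test that asserts the folded constant is present will therefore pass or
fail depending on how the compiler was built, not on what the processor
supports, which is why the check is restricted to non-Windows platforms.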
