(WIP) Arm64/SVE: Implemented AddRotateComplex and AddSequentialAcross #104258

Closed
wants to merge 39 commits

Commits (39)
c738b77
Added ConverToInt32 and ConvertToUInt32 for float inputs.
ebepho Jun 5, 2024
35d39d9
Added flags to handle only low predicate registers.
ebepho Jun 5, 2024
7a781e1
Fixed merge conflicts.
ebepho Jun 5, 2024
1378d60
Fix whitespace
ebepho Jun 5, 2024
10c7a15
Remove special codegen flag
ebepho Jun 7, 2024
8004868
Added new test template for operations with different return types.
ebepho Jun 10, 2024
af7ccd4
Merge branch 'main' into ConvertToInt32
ebepho Jun 10, 2024
8cb76da
Add new test template.
ebepho Jun 11, 2024
abe25fc
Added api for ConvertToInt32 and ConvertToUInt 32 for double.
ebepho Jun 13, 2024
0f51f38
fix merge conflicts.
ebepho Jun 13, 2024
7fabb91
Merge branch 'dotnet:main' into main
ebepho Jun 14, 2024
d5374ca
Merge branch 'main' of github.com:ebepho/runtime
ebepho Jun 15, 2024
4aa224d
Merge branch 'main' of github.com:ebepho/runtime
ebepho Jun 15, 2024
49a6c85
Round SVE intrinsics for floats.
ebepho Jun 16, 2024
bd2702d
Completed Round SVE fp apis.
ebepho Jun 16, 2024
56601b4
Merge branch 'main' of github.com:ebepho/runtime
ebepho Jun 17, 2024
6ba83c3
Merge branch 'main' into round
ebepho Jun 17, 2024
ba922e7
Completed sve apis for scale and sqrt, added a new test template for …
ebepho Jun 18, 2024
04071a3
Merge branch 'main' of github.com:ebepho/runtime
ebepho Jun 18, 2024
9863b7c
Merge branch 'main' into scale+sqrt
ebepho Jun 18, 2024
c7fbb4d
Started implementation for AddRotateComplex and AddSequentialAcross S…
ebepho Jun 18, 2024
ffcd267
Merge branch 'main' of github.com:ebepho/runtime
ebepho Jun 18, 2024
5803fc2
modified test template for AddRotateComplex sve api.
ebepho Jun 19, 2024
33626b3
Merge branch 'main' of github.com:ebepho/runtime
ebepho Jun 19, 2024
6816686
AddRotateComplex helper for double variation.
ebepho Jun 21, 2024
da441d1
Merge branch 'main' of github.com:ebepho/runtime
ebepho Jun 21, 2024
637f059
Merge branch 'main' into add
ebepho Jun 21, 2024
f98fd84
Merge branch 'main' of github.com:ebepho/runtime
ebepho Jun 24, 2024
1e68ff6
Merge branch 'main' of github.com:ebepho/runtime
ebepho Jun 26, 2024
e6777ca
Merge branch 'main' into add
ebepho Jun 27, 2024
214cd60
AddRotateComplex and AddSequentialAcross (WIP).
ebepho Jul 1, 2024
44231c0
addrotatecomplex hwintrinsic tags
ebepho Jul 2, 2024
a35a74e
removed addroatecomplex from specialcodegen path
ebepho Jul 2, 2024
a5cad71
fixed path for addrotatecomplex
ebepho Jul 2, 2024
33e925b
Merge branch 'main' of github.com:ebepho/runtime
ebepho Jul 3, 2024
68b9204
fixed hwintrins tags for addseqacross
ebepho Jul 3, 2024
eb2822e
fixed instr(op1, op2, imm) test template naming
ebepho Jul 3, 2024
94956b3
Merge branch 'main' into add
ebepho Jul 3, 2024
3de605e
WIP addsequentialacross.
ebepho Jul 3, 2024
8 changes: 4 additions & 4 deletions src/coreclr/jit/codegenarm64test.cpp
@@ -8391,13 +8391,13 @@ void CodeGen::genArm64EmitterUnitTestsSve()
INS_OPTS_SCALABLE_D); // ST1B {<Zt>.D }, <Pg>, [<Xn|SP>, <Zm>.D]

// IF_SVE_GP_3A
- theEmitter->emitIns_R_R_R_I(INS_sve_fcadd, EA_SCALABLE, REG_V0, REG_P1, REG_V2, 90,
+ theEmitter->emitIns_R_R_R_I(INS_sve_fcadd, EA_SCALABLE, REG_V0, REG_P1, REG_V2, 0,
@amanasifkhalid (Member) commented on Jul 3, 2024:
@kunalspathak at the API level, do we want users to pass the actual angle value (90, 180, etc) for the immediate? If so, we might have to do some awkward transformations throughout the JIT's phases to get this to work:

  • If we need to generate a switch table of all immediate values (in case the user doesn't pass a constant), HWIntrinsicImmOpHelper expects the immediates to be contiguous, like [0, 3]. If the possible values are 90, 180, etc., we'll need some special handling there to pass the correct immediates to emitIns_R_R_R_I.
  • For FCADD, emitIns_R_R_R_I expects us to pass the immediate as an angle value; it then converts the value to its bitwise representation [0, 3] internally. If we streamline this so we can just pass the immediate in its bitwise form to emitIns_R_R_R_I, that might simplify the logic elsewhere in the JIT. For example, the bounds for this intrinsic in lookupImmBounds would be [0, 3], and we'd just have to transform the user's input to this form somewhere in the JIT -- perhaps during importation.
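The non-contiguous-immediate problem described above can be made concrete with a small sketch. The helper below is hypothetical (it is not an actual JIT function); it illustrates the angle-to-bitwise transformation that would let `lookupImmBounds` keep a contiguous [0, 1] range:

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical helper (not a real JIT function) mapping the user-facing
// FCADD rotation (90 or 270 degrees) to the contiguous bitwise encoding
// [0, 1] that the instruction and helpers like HWIntrinsicImmOpHelper expect.
static int EncodeFcaddRotation(int rotation)
{
    switch (rotation)
    {
        case 90:
            return 0; // rotate field 0 encodes 90 degrees
        case 270:
            return 1; // rotate field 1 encodes 270 degrees
        default:
            throw std::invalid_argument("rotation must be 90 or 270");
    }
}
```

With this transformation applied early (e.g. during importation, as suggested above), everything downstream sees a contiguous immediate range.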

A Member replied:
This one is a bit tricky. The only valid values for FCADD are 90 and 270, and those are what we expect the user to pass. All other values are invalid, and we should probably throw ArgumentOutOfRangeException. Also, since the values are not contiguous, we might not be able to use the generic table-generation logic; it is meant for contiguous values. For this API, we want something like this:

if (IsConstant(rot) && ((rot == 90) || (rot == 270)))
{
    fcadd ... // here we will embed 0 or 1, depending on whether rot is 90 or 270
}
else
{
    // generate fallback
}

// fallback codegen
rot = ... // either constant or from variable
if (rot == 90)
{
    fcadd ...0 // '0' to specify rotation is 90
}
else if (rot == 270)
{
    fcadd ...1 // '1' to specify rotation is 270
}
else
{
    throw ArgumentOutOfRangeException();
}

@tannergooding - I don't believe we have an API with such a restriction on the input value, do we? For example, I don't see that we have implemented AdvSimd's FcAdd.
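The two-path shape sketched in the comment above (embed the immediate when it is a known-valid constant, otherwise dispatch on the runtime value) can be illustrated with a runnable stand-in. `Emitter` and `EmitFcadd` here are hypothetical placeholders for the real instruction emission:

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for the emitter; records the rotate field instead of
// emitting a real FCADD instruction.
struct Emitter
{
    int lastRotField = -1;
    void EmitFcadd(int rotField) { lastRotField = rotField; }
};

// Sketch of the constant/fallback split described in the comment above.
static void GenAddRotateComplex(Emitter& e, bool rotIsConstant, int rot)
{
    if (rotIsConstant && ((rot == 90) || (rot == 270)))
    {
        // Fast path: the rotation is a known-valid constant, so the bitwise
        // form can be embedded directly.
        e.EmitFcadd((rot == 90) ? 0 : 1);
    }
    else
    {
        // Fallback: dispatch on the runtime value, rejecting anything other
        // than 90 or 270.
        if (rot == 90)
            e.EmitFcadd(0);
        else if (rot == 270)
            e.EmitFcadd(1);
        else
            throw std::invalid_argument("rotation must be 90 or 270");
    }
}
```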

A Member replied:
Sorry, I meant to put a comment here, but I spoke with @tannergooding offline, and the right thing to do here is to handle the fallback at the C# level, something like:

if (cns == 90) { AddRotateComplex(..., 90) }
else if (cns == 270) { .... }
else { throw }

and then, in the rationalizer, make sure the argument is indeed a constant and is "in bounds" before we rewrite it back to the call.

A Member commented:
@ebepho @amanasifkhalid - let me know if you need anything else to move this further.

INS_OPTS_SCALABLE_H); // FCADD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>, <const>
- theEmitter->emitIns_R_R_R_I(INS_sve_fcadd, EA_SCALABLE, REG_V0, REG_P1, REG_V2, 270,
+ theEmitter->emitIns_R_R_R_I(INS_sve_fcadd, EA_SCALABLE, REG_V0, REG_P1, REG_V2, 1,
INS_OPTS_SCALABLE_H); // FCADD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>, <const>
- theEmitter->emitIns_R_R_R_I(INS_sve_fcadd, EA_SCALABLE, REG_V0, REG_P1, REG_V2, 270,
+ theEmitter->emitIns_R_R_R_I(INS_sve_fcadd, EA_SCALABLE, REG_V0, REG_P1, REG_V2, 1,
INS_OPTS_SCALABLE_S); // FCADD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>, <const>
- theEmitter->emitIns_R_R_R_I(INS_sve_fcadd, EA_SCALABLE, REG_V0, REG_P1, REG_V2, 270,
+ theEmitter->emitIns_R_R_R_I(INS_sve_fcadd, EA_SCALABLE, REG_V0, REG_P1, REG_V2, 1,
INS_OPTS_SCALABLE_D); // FCADD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>, <const>

// IF_SVE_GT_4A
5 changes: 5 additions & 0 deletions src/coreclr/jit/hwintrinsicarm64.cpp
@@ -447,6 +447,11 @@ void HWIntrinsicInfo::lookupImmBounds(
immUpperBound = Compiler::getSIMDVectorLength(simdSize, baseType) - 1;
break;

case NI_Sve_AddRotateComplex:
immLowerBound = 0;
immUpperBound = 1;
break;

case NI_Sve_CreateTrueMaskByte:
case NI_Sve_CreateTrueMaskDouble:
case NI_Sve_CreateTrueMaskInt16:
310 changes: 169 additions & 141 deletions src/coreclr/jit/hwintrinsiccodegenarm64.cpp

Large diffs are not rendered by default.

2 changes: 2 additions & 0 deletions src/coreclr/jit/hwintrinsiclistarm64sve.h
@@ -20,7 +20,9 @@ HARDWARE_INTRINSIC(Sve, Abs,
HARDWARE_INTRINSIC(Sve, AbsoluteDifference, -1, -1, false, {INS_sve_sabd, INS_sve_uabd, INS_sve_sabd, INS_sve_uabd, INS_sve_sabd, INS_sve_uabd, INS_sve_sabd, INS_sve_uabd, INS_sve_fabd, INS_sve_fabd}, HW_Category_SIMD, HW_Flag_Scalable|HW_Flag_EmbeddedMaskedOperation|HW_Flag_HasRMWSemantics|HW_Flag_LowMaskedOperation)
HARDWARE_INTRINSIC(Sve, Add, -1, -1, false, {INS_sve_add, INS_sve_add, INS_sve_add, INS_sve_add, INS_sve_add, INS_sve_add, INS_sve_add, INS_sve_add, INS_sve_fadd, INS_sve_fadd}, HW_Category_SIMD, HW_Flag_Scalable|HW_Flag_OptionalEmbeddedMaskedOperation|HW_Flag_HasRMWSemantics|HW_Flag_LowMaskedOperation)
HARDWARE_INTRINSIC(Sve, AddAcross, -1, 1, true, {INS_sve_saddv, INS_sve_uaddv, INS_sve_saddv, INS_sve_uaddv, INS_sve_saddv, INS_sve_uaddv, INS_sve_uaddv, INS_sve_uaddv, INS_sve_faddv, INS_sve_faddv}, HW_Category_SIMD, HW_Flag_Scalable|HW_Flag_BaseTypeFromFirstArg|HW_Flag_EmbeddedMaskedOperation|HW_Flag_LowMaskedOperation)
HARDWARE_INTRINSIC(Sve, AddRotateComplex, -1, -1, false, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_sve_fcadd, INS_sve_fcadd}, HW_Category_SIMD, HW_Flag_Scalable|HW_Flag_HasImmediateOperand|HW_Flag_HasRMWSemantics|HW_Flag_EmbeddedMaskedOperation|HW_Flag_LowMaskedOperation)
HARDWARE_INTRINSIC(Sve, AddSaturate, -1, 2, true, {INS_sve_sqadd, INS_sve_uqadd, INS_sve_sqadd, INS_sve_uqadd, INS_sve_sqadd, INS_sve_uqadd, INS_sve_sqadd, INS_sve_uqadd, INS_invalid, INS_invalid}, HW_Category_SIMD, HW_Flag_Scalable|HW_Flag_OptionalEmbeddedMaskedOperation|HW_Flag_HasRMWSemantics|HW_Flag_LowMaskedOperation)
HARDWARE_INTRINSIC(Sve, AddSequentialAcross, -1, -1, false, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_sve_fadda, INS_sve_fadda}, HW_Category_SIMD, HW_Flag_Scalable|HW_Flag_HasRMWSemantics|HW_Flag_EmbeddedMaskedOperation|HW_Flag_LowMaskedOperation)
HARDWARE_INTRINSIC(Sve, And, -1, -1, false, {INS_sve_and, INS_sve_and, INS_sve_and, INS_sve_and, INS_sve_and, INS_sve_and, INS_sve_and, INS_sve_and, INS_invalid, INS_invalid}, HW_Category_SIMD, HW_Flag_Scalable|HW_Flag_OptionalEmbeddedMaskedOperation|HW_Flag_HasRMWSemantics|HW_Flag_LowMaskedOperation)
HARDWARE_INTRINSIC(Sve, AndAcross, -1, -1, false, {INS_sve_andv, INS_sve_andv, INS_sve_andv, INS_sve_andv, INS_sve_andv, INS_sve_andv, INS_sve_andv, INS_sve_andv, INS_invalid, INS_invalid}, HW_Category_SIMD, HW_Flag_Scalable|HW_Flag_EmbeddedMaskedOperation|HW_Flag_LowMaskedOperation)
HARDWARE_INTRINSIC(Sve, BitwiseClear, -1, -1, false, {INS_sve_bic, INS_sve_bic, INS_sve_bic, INS_sve_bic, INS_sve_bic, INS_sve_bic, INS_sve_bic, INS_sve_bic, INS_invalid, INS_invalid}, HW_Category_SIMD, HW_Flag_Scalable|HW_Flag_OptionalEmbeddedMaskedOperation|HW_Flag_HasRMWSemantics|HW_Flag_LowMaskedOperation)
1 change: 1 addition & 0 deletions src/coreclr/jit/lowerarmarch.cpp
@@ -3215,6 +3215,7 @@ void Lowering::ContainCheckHWIntrinsic(GenTreeHWIntrinsic* node)
case NI_Sve_PrefetchInt32:
case NI_Sve_PrefetchInt64:
case NI_Sve_ExtractVector:
case NI_Sve_AddRotateComplex:
assert(hasImmediateOperand);
assert(varTypeIsIntegral(intrin.op3));
if (intrin.op3->IsCnsIntOrI())
@@ -291,6 +291,28 @@ internal Arm64() { }
/// </summary>
public static unsafe Vector<ulong> AddAcross(Vector<ulong> value) { throw new PlatformNotSupportedException(); }


/// AddRotateComplex : Complex add with rotate

/// <summary>
/// svfloat64_t svcadd[_f64]_m(svbool_t pg, svfloat64_t op1, svfloat64_t op2, uint64_t imm_rotation)
/// FCADD Ztied1.D, Pg/M, Ztied1.D, Zop2.D, #imm_rotation
/// svfloat64_t svcadd[_f64]_x(svbool_t pg, svfloat64_t op1, svfloat64_t op2, uint64_t imm_rotation)
/// FCADD Ztied1.D, Pg/M, Ztied1.D, Zop2.D, #imm_rotation
/// svfloat64_t svcadd[_f64]_z(svbool_t pg, svfloat64_t op1, svfloat64_t op2, uint64_t imm_rotation)
/// </summary>
public static unsafe Vector<double> AddRotateComplex(Vector<double> left, Vector<double> right, [ConstantExpected] byte rotation) { throw new PlatformNotSupportedException(); }

/// <summary>
/// svfloat32_t svcadd[_f32]_m(svbool_t pg, svfloat32_t op1, svfloat32_t op2, uint64_t imm_rotation)
/// FCADD Ztied1.S, Pg/M, Ztied1.S, Zop2.S, #imm_rotation
/// svfloat32_t svcadd[_f32]_x(svbool_t pg, svfloat32_t op1, svfloat32_t op2, uint64_t imm_rotation)
/// FCADD Ztied1.S, Pg/M, Ztied1.S, Zop2.S, #imm_rotation
/// svfloat32_t svcadd[_f32]_z(svbool_t pg, svfloat32_t op1, svfloat32_t op2, uint64_t imm_rotation)
/// </summary>
public static unsafe Vector<float> AddRotateComplex(Vector<float> left, Vector<float> right, [ConstantExpected] byte rotation) { throw new PlatformNotSupportedException(); }


/// AddSaturate : Saturating add

/// <summary>
@@ -341,6 +363,22 @@ internal Arm64() { }
/// </summary>
public static unsafe Vector<ulong> AddSaturate(Vector<ulong> left, Vector<ulong> right) { throw new PlatformNotSupportedException(); }


/// AddSequentialAcross : Add reduction (strictly-ordered)

/// <summary>
/// float64_t svadda[_f64](svbool_t pg, float64_t initial, svfloat64_t op)
/// FADDA Dtied, Pg, Dtied, Zop.D
/// </summary>
public static unsafe Vector<double> AddSequentialAcross(Vector<double> initial, Vector<double> value) { throw new PlatformNotSupportedException(); }

/// <summary>
/// float32_t svadda[_f32](svbool_t pg, float32_t initial, svfloat32_t op)
/// FADDA Stied, Pg, Stied, Zop.S
/// </summary>
public static unsafe Vector<float> AddSequentialAcross(Vector<float> initial, Vector<float> value) { throw new PlatformNotSupportedException(); }
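AddSequentialAcross maps to FADDA, a strictly ordered reduction: floating-point addition is not associative, so reducing lanes left to right can give a different result from a tree-shaped, AddAcross-style reduction. A scalar model of that ordering (illustrative only; the real semantics are the per-lane FADDA instruction):

```cpp
#include <cassert>

// Scalar model of a strictly ordered (left-to-right) reduction, the ordering
// FADDA guarantees. Because floating-point addition is not associative, this
// can differ from a pairwise/tree reduction of the same lanes.
static float SequentialReduce(const float* lanes, int count, float initial)
{
    float acc = initial;
    for (int i = 0; i < count; i++)
    {
        acc += lanes[i]; // strictly in lane order
    }
    return acc;
}
```

For lanes {1.0f, 1e8f, -1e8f}, the ordered reduction loses the 1.0f (1.0f + 1e8f rounds back to 1e8f), while the reassociated sum 1.0f + (1e8f - 1e8f) keeps it; this non-associativity is why a strictly ordered reduction cannot be tree-reduced.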


/// And : Bitwise AND

/// <summary>
@@ -320,6 +320,28 @@ internal Arm64() { }
/// </summary>
public static unsafe Vector<ulong> AddAcross(Vector<ulong> value) => AddAcross(value);


/// AddRotateComplex : Complex add with rotate

/// <summary>
/// svfloat64_t svcadd[_f64]_m(svbool_t pg, svfloat64_t op1, svfloat64_t op2, uint64_t imm_rotation)
/// FCADD Ztied1.D, Pg/M, Ztied1.D, Zop2.D, #imm_rotation
/// svfloat64_t svcadd[_f64]_x(svbool_t pg, svfloat64_t op1, svfloat64_t op2, uint64_t imm_rotation)
/// FCADD Ztied1.D, Pg/M, Ztied1.D, Zop2.D, #imm_rotation
/// svfloat64_t svcadd[_f64]_z(svbool_t pg, svfloat64_t op1, svfloat64_t op2, uint64_t imm_rotation)
/// </summary>
public static unsafe Vector<double> AddRotateComplex(Vector<double> left, Vector<double> right, [ConstantExpected] byte rotation) => AddRotateComplex(left, right, rotation);

/// <summary>
/// svfloat32_t svcadd[_f32]_m(svbool_t pg, svfloat32_t op1, svfloat32_t op2, uint64_t imm_rotation)
/// FCADD Ztied1.S, Pg/M, Ztied1.S, Zop2.S, #imm_rotation
/// svfloat32_t svcadd[_f32]_x(svbool_t pg, svfloat32_t op1, svfloat32_t op2, uint64_t imm_rotation)
/// FCADD Ztied1.S, Pg/M, Ztied1.S, Zop2.S, #imm_rotation
/// svfloat32_t svcadd[_f32]_z(svbool_t pg, svfloat32_t op1, svfloat32_t op2, uint64_t imm_rotation)
/// </summary>
public static unsafe Vector<float> AddRotateComplex(Vector<float> left, Vector<float> right, [ConstantExpected] byte rotation) => AddRotateComplex(left, right, rotation);


/// AddSaturate : Saturating add

/// <summary>
@@ -370,6 +392,22 @@ internal Arm64() { }
/// </summary>
public static unsafe Vector<ulong> AddSaturate(Vector<ulong> left, Vector<ulong> right) => AddSaturate(left, right);


/// AddSequentialAcross : Add reduction (strictly-ordered)

/// <summary>
/// float64_t svadda[_f64](svbool_t pg, float64_t initial, svfloat64_t op)
/// FADDA Dtied, Pg, Dtied, Zop.D
/// </summary>
public static unsafe Vector<double> AddSequentialAcross(Vector<double> initial, Vector<double> value) => AddSequentialAcross(initial, value);

/// <summary>
/// float32_t svadda[_f32](svbool_t pg, float32_t initial, svfloat32_t op)
/// FADDA Stied, Pg, Stied, Zop.S
/// </summary>
public static unsafe Vector<float> AddSequentialAcross(Vector<float> initial, Vector<float> value) => AddSequentialAcross(initial, value);


/// And : Bitwise AND

/// <summary>
@@ -4235,6 +4235,9 @@ internal Arm64() { }
public static System.Numerics.Vector<ulong> AddAcross(System.Numerics.Vector<uint> value) { throw null; }
public static System.Numerics.Vector<ulong> AddAcross(System.Numerics.Vector<ulong> value) { throw null; }

public static System.Numerics.Vector<double> AddRotateComplex(System.Numerics.Vector<double> left, System.Numerics.Vector<double> right, [ConstantExpected] byte rotation) { throw null; }
public static System.Numerics.Vector<float> AddRotateComplex(System.Numerics.Vector<float> left, System.Numerics.Vector<float> right, [ConstantExpected] byte rotation) { throw null; }

public static System.Numerics.Vector<byte> AddSaturate(System.Numerics.Vector<byte> left, System.Numerics.Vector<byte> right) { throw null; }
public static System.Numerics.Vector<short> AddSaturate(System.Numerics.Vector<short> left, System.Numerics.Vector<short> right) { throw null; }
public static System.Numerics.Vector<int> AddSaturate(System.Numerics.Vector<int> left, System.Numerics.Vector<int> right) { throw null; }
@@ -4244,6 +4247,9 @@ internal Arm64() { }
public static System.Numerics.Vector<uint> AddSaturate(System.Numerics.Vector<uint> left, System.Numerics.Vector<uint> right) { throw null; }
public static System.Numerics.Vector<ulong> AddSaturate(System.Numerics.Vector<ulong> left, System.Numerics.Vector<ulong> right) { throw null; }

public static System.Numerics.Vector<double> AddSequentialAcross(System.Numerics.Vector<double> initial, System.Numerics.Vector<double> value) { throw null; }
public static System.Numerics.Vector<float> AddSequentialAcross(System.Numerics.Vector<float> initial, System.Numerics.Vector<float> value) { throw null; }

public static System.Numerics.Vector<byte> And(System.Numerics.Vector<byte> left, System.Numerics.Vector<byte> right) { throw null; }
public static System.Numerics.Vector<short> And(System.Numerics.Vector<short> left, System.Numerics.Vector<short> right) { throw null; }
public static System.Numerics.Vector<int> And(System.Numerics.Vector<int> left, System.Numerics.Vector<int> right) { throw null; }