-
Notifications
You must be signed in to change notification settings - Fork 814
SIMD vectorization of Array.sum<int>, etc #18509
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…ge, Array.sum and Array.average to take advantage of vectorization in System.Linq.Enumerable module.
❗ Release notes required
|
src/FSharp.Core/array.fs
Outdated
[<CompiledName("Sum")>] | ||
let inline sumFloat (array: float array) : float = | ||
System.Linq.Enumerable.Sum array | ||
|
||
[<CompiledName("Sum")>] | ||
let inline sumFloat32 (array: float32 array) : float32 = | ||
System.Linq.Enumerable.Sum array | ||
|
||
[<CompiledName("Sum")>] | ||
let inline sumInt (array: int array) : int = | ||
System.Linq.Enumerable.Sum array | ||
|
||
[<CompiledName("Sum")>] | ||
let inline sumInt64 (array: int64 array) : int64 = | ||
System.Linq.Enumerable.Sum array |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would think that this would be a reasonable place to use static optimization syntax to specify which types should delegate to the LINQ method and which to the existing code, e.g.,
let inline sum (array: ^T array) : ^T =
existingSumCode array
when ^T : int = System.Linq.Enumerable.Sum array
when ^T : int64 = System.Linq.Enumerable.Sum array
…
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. I expect "static optimization conditionals" are a compile-time thing and not runtime? Because I can't check easily with sharplab.io, it says "error FS0817: Static optimization conditionals are only for use within the F# library"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the appropriate implementation would be chosen at compile-time once the type parameter was resolved.
You could add some IL tests under https://github.com/dotnet/fsharp/tree/main/tests/FSharp.Compiler.ComponentTests/EmittedIL if you wanted.
It looks like the error case is now different for average over an empty collection. (we should not evaluate a seq before calling into Average, since this would be breaking in a different way) |
I did some initial tests to see if this makes sense at all or not. I used the current [<Benchmark>]
member x.ArraySum() = // Array sum
array
|> Array.sum
|> ignore
[<Benchmark>]
member x.ArrayAverage() = // There has an extra map, because average needs float
array
|> Array.map float
|> Array.average
|> ignore
[<Benchmark>]
member x.ArraySeqSum() = // Seq sum by using array as base data
array
|> Seq.sum
|> ignore
[<Benchmark>]
member x.ListSeqSum() = // Seq sum by using list as base data
list
|> Seq.sum
|> ignore And here are the results with current main:
Here are the results with this PR:
Edit: I used Net 9 but the FSharp.Core.dll netstandard2.0 version in both, which was probably a mistake because Spans are truely efficient on netstandard2.1 only (?) |
Hmm, I'm thinking if default FSI is compiled with debug mode and people use F# as a scripting language, then performance optimizations like this could be marked as I did run the same performance tests with FSharp.Core.dll netstandard2.1 version Main branch:
This PR
By quick look of this, the Sum seems to be improved but the Average not really. Should I revert the Average change? Edit: Average removed |
1/As per your benchmark, the cost for average might be dominated by doing the casts and additional array allocations. 2/The idea of using |
What are the numbers when running on the full clr? 4.6-4.8? Enumerable probably has different implementation there. |
Average will now throw different exception, so it's a breaking change at this point. |
@Thorium: |
I did check this, and the results were worse for Enumerable.Sum
This PR - with Enumerable.Sum
|
…nchmarks\CompiledCodeBenchmarks\MicroPerf\MicroPerf.fsproj
Because this is just a forwarding, behaviour difference is easy to explore already in FSI: [| 1.; Double.NaN |] |> Array.sum;;
//val it: float = nan
[| 1.; Double.NaN |] |> System.Linq.Enumerable.Sum;;
//val it: float = nan Seems to be the same with edge-cases like Double.MaxValue and infinity. |
I don't know how to easily run MicroPerf.fsproj in .NET, but I can run
// * Summary * BenchmarkDotNet v0.13.12, Windows 11 (10.0.26100.3915) Job=.NET Framework 4.8 Runtime=.NET Framework 4.8
Not exactly the results I was hoping for. |
Framework-dependent runtime switch would solve this (and since it is static, I would hope JIT would eliminate such branching), but is not something we are doing in FSharp.Core. On Desktop:
On Core:
It would be great if we could make use of https://learn.microsoft.com/en-us/dotnet/api/system.numerics.tensors.tensorprimitives.sum?view=net-9.0-pp#system-numerics-tensors-tensorprimitives-sum-1(system-readonlyspan((-0))) for some of those implementations. Right now, any such support depending on Tensors will have to be done as a separate lib outside of FSharp.Core . |
Yeah, that's what I suspected will happen :( Bcl in full framework doesn't have those functions vectorised. And the issue with fslib is that it ships as netstandard, so there's no way to tailor separate implementations for netfx and netcore. |
The
What do you mean by "On Desktop" and "On Core" ? I can see that in FSharp.Core.fsproj there is |
Reflection can likely neglect many optimizations (needs proving tho).
Desktop is usually referred when full framework is used, and core is when coreclr runtime and bcl are used.
These should already be defined. However they will be defined on both net48 and net9.0 |
I meant a statically stored flag based on something like: open System.Runtime.InteropServices
let runtimeDescription = RuntimeInformation.FrameworkDescription
let hasLinqAcceleration = runtimeDescription > "..." // string-based logic here And switching at runtime (so cannot be part of statically optimized e.g. top level function: |
Runtime check added. Compared to this This PR:
|
After |
src/FSharp.Core/array.fs
Outdated
let inline sum (array: ^T array) : ^T = | ||
classicSum array | ||
when ^T : float = | ||
if System.Runtime.InteropServices.RuntimeInformation.FrameworkDescription.StartsWith ".NET Framework" then classicSum array |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should work.
Can you store it in a static boolean flag somewhere, is it possible?
src/FSharp.Core/array.fs
Outdated
@@ -1578,8 +1578,7 @@ module Array = | |||
checkNonNull "array" array | |||
Microsoft.FSharp.Primitives.Basics.Array.permute indexMap array | |||
|
|||
[<CompiledName("Sum")>] | |||
let inline sum (array: ^T array) : ^T = | |||
let inline private classicSum (array: ^T array) : ^T = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would call it sumImpl
to indicate it is the sum implemented here. Or maybe fsharpSumImpl
.
Would rather avoid the term classic
Is it vectorized as well? Is it doing copies first? I am curious. |
Oh, you are right, it isn't. That's sad, makes this very marginal feature especially when tools like FSharpLint prefer sumBy over (map >> sum) |
.NET Framework is not .NET Standard 2.1 compatible, thus .NET Standard 2.1 version shouldn't need runtime check.
|
Will all netstandard2.1 implementations be vectorized? If yes, then it's safe enough (and also it should be defined already). |
Description
Specific overloads (float, float32, int, int64) of Seq.sum,
Seq.average,Array.sumand Array.averageto take advantage of vectorization in System.Linq.Enumerable module.This is potentially a naive first try to solve #16230 by the spirit of @T-Gro comment #16230 (comment)
Checklist