Add Statistics in KA, (only mean and var implemented) #64
Open
yolhan83 wants to merge 23 commits into JuliaGPU:main from yolhan83:main
Changes from 18 commits
Commits
c8b3fe3
Add Statistics in KA, (only mean and var implemented)
yolhan83 b0a8954
make Float32 the type integers promote to
yolhan83 1af77d6
use a kernel to substract the mean (perf boost and test to fix Metal)
yolhan83 c897aa7
fix
yolhan83 f31b232
make gpu friendly backends
yolhan83 e6428f2
use _ convention for kernels
yolhan83 fd17ece
f32 default on docs
yolhan83 3e57905
fix
yolhan83 072d55b
fix reducing multi-dim arrray to scalar
yolhan83 f6977d6
add kenel for multi dim var >3
yolhan83 59622e7
fix reducing multi-dim arrray to scalar
yolhan83 d7cc95d
kernel like cpu
yolhan83 3d01489
tiny perf
yolhan83 b8ca7c6
fix
yolhan83 009cfa9
add benchmarks
yolhan83 35b5322
little perf improvment
yolhan83 038a86e
rework var completly
yolhan83 bb68e51
add doc back
yolhan83 face574
avoid rm new line
yolhan83 f93a708
another new line
yolhan83 2e0967d
other lines
yolhan83 bc7869f
other lines
yolhan83 1b77f18
only keep mean and var for now
yolhan83
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,25 @@ | ||
| group = addgroup!(SUITE, "statistics") | ||
|
|
||
| n1d = 1_000_000 | ||
| n3d = 100 | ||
| for T in [UInt32, Int64, Float32] | ||
| local _group = addgroup!(group, "mean $T") | ||
|
|
||
| local randrange = T == Float32 ? T : T(1):T(100) | ||
|
|
||
| _group["mean1d_statistics"] = @benchmarkable @sb(Statistics.mean(v)) setup=(v = ArrayType(rand(rng, $randrange, n1d))) | ||
| _group["mean1d_ak"] = @benchmarkable @sb(AK.mean(v)) setup=(v = ArrayType(rand(rng, $randrange, n1d))) | ||
|
|
||
|
|
||
| _group["meannd_statistics"] = @benchmarkable @sb(Statistics.mean(v,dims=3)) setup=(v = ArrayType(rand(rng, $randrange, n3d,n3d,n3d))) | ||
| _group["meannd_ak"] = @benchmarkable @sb(AK.mean(v,dims=3)) setup=(v = ArrayType(rand(rng, $randrange, n3d,n3d,n3d))) | ||
|
|
||
| local _group = addgroup!(group, "var $T") | ||
| _group["var1d_statistics"] = @benchmarkable @sb(Statistics.var(v)) setup=(v = ArrayType(rand(rng, $randrange, n1d))) | ||
| _group["var1d_ak"] = @benchmarkable @sb(AK.var(v)) setup=(v = ArrayType(rand(rng, $randrange, n1d))) | ||
|
|
||
|
|
||
| _group["varnd_statistics"] = @benchmarkable @sb(Statistics.var(v,dims=3)) setup=(v = ArrayType(rand(rng, $randrange, n3d,n3d,n3d))) | ||
| _group["varnd_ak"] = @benchmarkable @sb(AK.var(v,dims=3)) setup=(v = ArrayType(rand(rng, $randrange, n3d,n3d,n3d))) | ||
|
|
||
| end |
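
The benchmark loop above picks its random inputs per element type: `Float32` inputs are drawn from the type itself, integer inputs from the range `T(1):T(100)`. A minimal sketch of that input construction on plain CPU arrays follows. The helper name `sample_input` is hypothetical, and the `rng` here is only a stand-in, since the suite's own `rng` and `ArrayType` are defined outside this diff.

```julia
using Random

# Stand-in for the suite's `rng`, which is defined outside this diff (assumption).
rng = Random.MersenneTwister(0)

# Hypothetical helper: draw `n` values for element type `T` the same way the
# benchmark loop does (the type itself for Float32, 1:100 for integer types).
function sample_input(rng, ::Type{T}, n::Int) where {T}
    randrange = T == Float32 ? T : (T(1):T(100))
    return rand(rng, randrange, n)
end

v_int = sample_input(rng, UInt32, 8)    # Vector{UInt32} with values in 1:100
v_f32 = sample_input(rng, Float32, 8)   # Vector{Float32} with values in [0, 1)
```

Keeping the integer inputs small means the benchmarked sums stay far away from integer overflow, so the timings are not affected by wrap-around.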
Empty file.
Empty file.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,96 @@ | ||
| """ | ||
| mean( | ||
| f, src::AbstractArray{T}, backend::Backend=get_backend(src); | ||
| dims::Union{Nothing, Int}=nothing, | ||
|
|
||
| # CPU settings | ||
| max_tasks::Int=Threads.nthreads(), | ||
| min_elems::Int=1, | ||
| prefer_threads::Bool=true, | ||
|
|
||
| # GPU settings | ||
| block_size::Int=256, | ||
| temp::Union{Nothing, AbstractArray}=nothing, | ||
| switch_below::Int=0, | ||
| ) where {T<:Real} | ||
|
|
||
| Compute the mean of `src` along dimensions `dims` after applying `f`. | ||
| If `dims` is `nothing`, reduce `src` to a scalar. If `dims` is an integer, reduce `src` along that | ||
| dimension. The return type will be the same as the element type of `src` if it is a float type, or `Float32` | ||
| if it is an integer type. | ||
| ## CPU settings | ||
| Use at most `max_tasks` threads with at least `min_elems` elements per task. For N-dimensional | ||
| arrays (`dims::Int`) multithreading currently only becomes faster for `max_tasks >= 4`; all other | ||
| cases scale linearly with the number of threads. | ||
|
|
||
| ## GPU settings | ||
| The `block_size` parameter controls the number of threads per block. | ||
|
|
||
| The `temp` parameter can be used to pass a pre-allocated temporary array. For reduction to a scalar | ||
| (`dims=nothing`), `length(temp) >= 2 * (length(src) + 2 * block_size - 1) ÷ (2 * block_size)` is | ||
| required. For reduction along a dimension (`dims` is an integer), `temp` is used as the destination | ||
| array, and thus must have the exact dimensions required - i.e. same dimensionwise sizes as `src`, | ||
| except for the reduced dimension which becomes 1; there are some corner cases when one dimension is | ||
| zero, check against `Base.reduce` for CPU arrays for exact behavior. | ||
|
|
||
| The `switch_below` parameter controls the threshold below which the reduction is performed on the | ||
| CPU and is only used for 1D reductions (i.e. `dims=nothing`). | ||
| """ | ||
| function mean( | ||
| f::Function,src::AbstractArray{T},backend::Backend=get_backend(src); | ||
| dims::Union{Nothing, Int}=nothing, | ||
| # CPU settings - forwarded to mapreduce | ||
| max_tasks::Int = Threads.nthreads(), | ||
| min_elems::Int = 1, | ||
| prefer_threads::Bool=true, | ||
| # GPU settings | ||
| block_size::Int = 256, | ||
| temp::Union{Nothing, AbstractArray} = nothing, | ||
| switch_below::Int=0, | ||
| ) where {T<:Real} | ||
| init = T<:Integer ? zero(Float32) : zero(T) | ||
| res = mapreduce(f,+,src,backend; | ||
| init=init, | ||
| dims=dims, | ||
| max_tasks=max_tasks, | ||
| min_elems=min_elems, | ||
| prefer_threads=prefer_threads, | ||
| block_size=block_size, | ||
| temp=temp, | ||
| switch_below=switch_below) | ||
| if isnothing(dims) | ||
| return res./length(src) | ||
| else | ||
| return res./size(src,dims) | ||
| end | ||
| end | ||
|
|
||
| function mean( | ||
| src::AbstractArray{T},backend::Backend=get_backend(src); | ||
| dims::Union{Nothing, Int}=nothing, | ||
| # CPU settings - forwarded to reduce | ||
| max_tasks::Int = Threads.nthreads(), | ||
| min_elems::Int = 1, | ||
| prefer_threads::Bool=true, | ||
|
|
||
| # GPU settings | ||
| block_size::Int = 256, | ||
| temp::Union{Nothing, AbstractArray} = nothing, | ||
| switch_below::Int=0, | ||
| ) where {T<:Real} | ||
| init = T<:Integer ? zero(Float32) : zero(T) | ||
| res = reduce(+,src,backend; | ||
| init=init, | ||
| dims=dims, | ||
| max_tasks=max_tasks, | ||
| min_elems=min_elems, | ||
| prefer_threads=prefer_threads, | ||
| block_size=block_size, | ||
| temp=temp, | ||
| switch_below=switch_below) | ||
| if isnothing(dims) | ||
| return res./length(src) | ||
| else | ||
| return res./size(src,dims) | ||
| end | ||
| end |
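
To make the promotion and `dims` behaviour of the two `mean` methods above easier to check, here is a CPU-only sketch built on `Base.mapreduce`. The name `reference_mean` is hypothetical and not part of AcceleratedKernels; the sketch only mirrors the logic visible in the diff (integer inputs accumulate in `Float32`, and the sum is divided by `length(src)` or by `size(src, dims)`).

```julia
# Hypothetical CPU reference for the `mean` methods in the diff; it uses only Base.
function reference_mean(f, src::AbstractArray{T};
                        dims::Union{Nothing,Int}=nothing) where {T<:Real}
    # Integer inputs accumulate in Float32, float inputs keep their own type.
    init = T <: Integer ? zero(Float32) : zero(T)
    if dims === nothing
        # Reduce everything to a scalar, then divide by the element count.
        return mapreduce(f, +, src; init=init) / length(src)
    else
        # Reduce along one dimension; the reduced dimension has size 1 in the result.
        return mapreduce(f, +, src; dims=dims, init=init) ./ size(src, dims)
    end
end

# Convenience method without a mapping function.
reference_mean(src::AbstractArray; kwargs...) = reference_mean(identity, src; kwargs...)

x = Int32[1, 2, 3, 4]
m = reference_mean(x)                 # a Float32 scalar

A = reshape(collect(1.0:24.0), 2, 3, 4)
mA = reference_mean(A; dims=3)        # a 2×3×1 Float64 array
```

One consequence of the `Float32` choice is worth keeping in mind: `Float32` has a 24-bit significand, so integer sums above `2^24` are no longer exactly representable and the mean of a large integer array can differ slightly from a `Float64` computation.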
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| # need to do the middle kernel first |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| # need to sort multi dimensional arrays first |
Empty file.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,10 @@ | ||
| include("median.jl") | ||
| include("mean.jl") | ||
| include("cor.jl") | ||
| include("cov.jl") | ||
| include("middle.jl") | ||
| include("quantile.jl") | ||
| include("std.jl") | ||
| include("stdm.jl") | ||
| include("var.jl") | ||
| include("varm.jl") |
Empty file.
Empty file.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,133 @@ | ||
| @inline function _chan_merge(a::Tuple{Int64,T,T}, b::Tuple{Int64,T,T}) where {T<:Real} | ||
| nA, mA, M2A = a | ||
| nB, mB, M2B = b | ||
| if nA == 0 | ||
| return b | ||
| elseif nB == 0 | ||
| return a | ||
| else | ||
| nAB = nA + nB | ||
| δ = mB - mA | ||
| invn = inv(T(nAB)) | ||
| mean = (nA*mA + nB*mB) * invn | ||
| cross = (δ*δ) * (nA*nB * invn) | ||
| return (nAB, mean, M2A + M2B + cross) | ||
| end | ||
| end | ||
| """ | ||
| var( | ||
| src::AbstractArray{T}, backend::Backend=get_backend(src); | ||
| dims::Union{Nothing, Int}=nothing, | ||
| corrected ::Bool = true, | ||
|
|
||
| # CPU settings | ||
| max_tasks::Int=Threads.nthreads(), | ||
| min_elems::Int=1, | ||
| prefer_threads::Bool=true, | ||
|
|
||
| # GPU settings | ||
| block_size::Int=256, | ||
| temp::Union{Nothing, AbstractArray}=nothing, | ||
| switch_below::Int=0, | ||
| ) where {T<:Real} | ||
|
|
||
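| Compute the variance of `src` along dimensions `dims`. If `corrected` is `true`, the sum of squared | ||
| deviations is divided by `n - 1`, otherwise by `n`. | ||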
| If `dims` is `nothing`, reduce `src` to a scalar. If `dims` is an integer, reduce `src` along that | ||
| dimension. The return type will be the same as the element type of `src` if it is a float type, or `Float32` | ||
| if it is an integer type. | ||
| ## CPU settings | ||
| Use at most `max_tasks` threads with at least `min_elems` elements per task. For N-dimensional | ||
| arrays (`dims::Int`) multithreading currently only becomes faster for `max_tasks >= 4`; all other | ||
| cases scale linearly with the number of threads. | ||
|
|
||
| ## GPU settings | ||
| The `block_size` parameter controls the number of threads per block. | ||
|
|
||
| The `temp` parameter is accepted but currently ignored; `nothing` is always forwarded to the | ||
| underlying `mapreduce`. | ||
|
|
||
| The `switch_below` parameter controls the threshold below which the reduction is performed on the | ||
| CPU and is only used for 1D reductions (i.e. `dims=nothing`). | ||
| """ | ||
| function var( | ||
| src::AbstractArray{T,N}, backend::Backend=get_backend(src); | ||
| dims::Union{Nothing,Int}=nothing, | ||
| corrected::Bool=true, | ||
| max_tasks::Int=Threads.nthreads(), | ||
| min_elems::Int=1, | ||
| prefer_threads::Bool=true, | ||
| block_size::Int=256, | ||
| temp::Union{Nothing,AbstractArray}=nothing, # ignored | ||
| switch_below::Int=0, | ||
| ) where {T<:Integer,N} | ||
|
|
||
| init = (0, 0f0, 0f0) | ||
| mapper = x -> (1, Float32(x), 0f0) | ||
|
|
||
| stats = mapreduce( | ||
| mapper, _chan_merge, src, backend; | ||
| init=init, neutral=init, | ||
| dims=dims, | ||
| max_tasks=max_tasks, min_elems=min_elems, prefer_threads=prefer_threads, | ||
| block_size=block_size, | ||
| temp=nothing, | ||
| switch_below=switch_below, | ||
| ) | ||
|
|
||
| if dims === nothing | ||
| n, _, M2 = stats | ||
| return M2 / Float32(n - ifelse(corrected , 1 , 0)) | ||
| else | ||
| out = similar(stats, Float32) | ||
| AcceleratedKernels.map!( | ||
| s -> @inbounds(s[3] / Float32(s[1] - ifelse(corrected , 1 , 0))), | ||
| out, stats, backend; | ||
| max_tasks=max_tasks, min_elems=min_elems, block_size=block_size, | ||
| ) | ||
| return out | ||
| end | ||
| end | ||
|
|
||
|
|
||
| function var( | ||
| src::AbstractArray{T,N}, backend::Backend=get_backend(src); | ||
| dims::Union{Nothing,Int}=nothing, | ||
| corrected::Bool=true, | ||
| max_tasks::Int=Threads.nthreads(), | ||
| min_elems::Int=1, | ||
| prefer_threads::Bool=true, | ||
| block_size::Int=256, | ||
| temp::Union{Nothing,AbstractArray}=nothing, # ignored | ||
| switch_below::Int=0, | ||
| ) where {T<:AbstractFloat,N} | ||
|
|
||
| init = (0, zero(T), zero(T)) | ||
| mapper = x -> (1, x, zero(typeof(x))) | ||
|
|
||
| stats = mapreduce( | ||
| mapper, _chan_merge, src, backend; | ||
| init=init, neutral=init, | ||
| dims=dims, | ||
| max_tasks=max_tasks, min_elems=min_elems, prefer_threads=prefer_threads, | ||
| block_size=block_size, | ||
| temp=nothing, | ||
| switch_below=switch_below, | ||
| ) | ||
|
|
||
| if dims === nothing | ||
| n, _, M2 = stats | ||
| return M2 / T(n - ifelse(corrected , 1 , 0)) | ||
| else | ||
| out = similar(stats, T) | ||
| AcceleratedKernels.map!( | ||
| s -> @inbounds(s[3] / (s[1] - ifelse(corrected , 1 , 0))), | ||
| out, stats, backend; | ||
| max_tasks=max_tasks, min_elems=min_elems, block_size=block_size, | ||
| ) | ||
| return out | ||
| end | ||
| end |
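
The `var` methods above reduce over `(count, mean, M2)` triples, where `M2` is the sum of squared deviations from the mean, and combine two partial triples with the pairwise update implemented in `_chan_merge`. A CPU-only sketch of the same idea on plain arrays follows. The names `chan_merge` and `reference_var` are hypothetical, the sketch handles only the `dims=nothing` case, and it uses `Base.mapreduce` rather than the AcceleratedKernels reduction.

```julia
# Merge two partial summaries (count, mean, M2). With δ = mB - mA and n = nA + nB:
#   mean = (nA*mA + nB*mB) / n
#   M2   = M2A + M2B + δ^2 * nA * nB / n
function chan_merge(a::Tuple{Int,T,T}, b::Tuple{Int,T,T}) where {T<:AbstractFloat}
    nA, mA, M2A = a
    nB, mB, M2B = b
    nA == 0 && return b          # an empty summary is the neutral element
    nB == 0 && return a
    n = nA + nB
    δ = mB - mA
    m = (nA * mA + nB * mB) / n
    M2 = M2A + M2B + δ^2 * nA * nB / n
    return (n, m, M2)
end

# Hypothetical scalar reference: map each element to a one-element summary,
# fold the summaries together, then divide M2 by n or n - 1.
function reference_var(src::AbstractArray{T}; corrected::Bool=true) where {T<:AbstractFloat}
    init = (0, zero(T), zero(T))
    n, _, M2 = mapreduce(x -> (1, x, zero(T)), chan_merge, src; init=init)
    return M2 / (n - (corrected ? 1 : 0))
end

x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
v_pop = reference_var(x; corrected=false)   # divides by n
v_smp = reference_var(x)                    # divides by n - 1
```

Because the merge never forms a raw sum of squares, partial results from different threads or blocks can be combined in any grouping, which is what makes this formulation a natural fit for a parallel `mapreduce`.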
Empty file.
Not removing the last newline character of the file (here and everywhere else) would be nice.
ah yeah
If you look at https://github.com/JuliaGPU/AcceleratedKernels.jl/pull/64/files you'll easily spot other files with a missing ending newline (GitHub highlights them with a special symbol). I believe VS Code by default does this idiotic thing of removing the last newline, which makes it very easy to guess that someone is using that editor. I hope there's an option to stop it doing stupid things.
No more symbols. I also removed the unimplemented statistics files for now; we will add them back as they are implemented, I think.
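
For anyone who wants to check this locally before pushing, here is a small sketch that lists source files whose last byte is not a newline. The function names and the default `.jl` extension filter are assumptions for illustration; empty files are treated as fine.

```julia
# Return true when the file is empty or its last byte is '\n'.
function ends_with_newline(path::AbstractString)
    n = filesize(path)
    n == 0 && return true
    return open(path, "r") do io
        seek(io, n - 1)                      # jump to the last byte
        read(io, UInt8) == UInt8('\n')
    end
end

# Walk `root` recursively and collect files with the given extension
# that lack a final newline.
function files_missing_final_newline(root::AbstractString; ext::AbstractString=".jl")
    missing_newline = String[]
    for (dir, _, files) in walkdir(root)
        for f in files
            endswith(f, ext) || continue
            p = joinpath(dir, f)
            ends_with_newline(p) || push!(missing_newline, p)
        end
    end
    return sort!(missing_newline)
end
```

Calling `files_missing_final_newline("src")` from the repository root would then return the offending paths. On the editor side, VS Code has a `files.insertFinalNewline` setting that adds the trailing newline on save.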