Use GPUArrays accumulation implementation #2813

Status: Open. Wants to merge 2 commits into base: master.


christiangnrd
Member

Opened to run benchmarks.

Todo:

  • Add compat bound when GPUArrays version released

Contributor

Your PR requires formatting changes to meet the project's style guidelines.
Please consider running Runic (git runic master) to apply these changes.

Suggested changes:
diff --git a/test/runtests.jl b/test/runtests.jl
index b6c479cce..89bf840c9 100644
--- a/test/runtests.jl
+++ b/test/runtests.jl
@@ -5,7 +5,7 @@ using Printf: @sprintf
 using Base.Filesystem: path_separator
 
 using Pkg
-Pkg.add(url="https://github.com/christiangnrd/GPUArrays.jl", rev="accumulatetests")
+Pkg.add(url = "https://github.com/christiangnrd/GPUArrays.jl", rev = "accumulatetests")
 
 # parse some command-line arguments
 function extract_flag!(args, flag, default=nothing; typ=typeof(default))

Contributor

github-actions bot left a comment

CUDA.jl Benchmarks

| Benchmark suite | Current: f8088b1 | Previous: e561e7a | Ratio |
|---|---|---|---|
| latency/precompile | 42896033326.5 ns | 43393378645 ns | 0.99 |
| latency/ttfp | 7099587815 ns | 7099882121 ns | 1.00 |
| latency/import | 3574268220 ns | 3463869374 ns | 1.03 |
| integration/volumerhs | 9621277 ns | 9623663 ns | 1.00 |
| integration/byval/slices=1 | 146915 ns | 146714 ns | 1.00 |
| integration/byval/slices=3 | 425748.5 ns | 425787 ns | 1.00 |
| integration/byval/reference | 144907 ns | 144967 ns | 1.00 |
| integration/byval/slices=2 | 286403 ns | 286209 ns | 1.00 |
| integration/cudadevrt | 103446 ns | 103426 ns | 1.00 |
| kernel/indexing | 14264 ns | 14196 ns | 1.00 |
| kernel/indexing_checked | 14951 ns | 14906 ns | 1.00 |
| kernel/occupancy | 672.626582278481 ns | 759.2189781021898 ns | 0.89 |
| kernel/launch | 2152.222222222222 ns | 2287.222222222222 ns | 0.94 |
| kernel/rand | 17637 ns | 15792 ns | 1.12 |
| array/reverse/1d | 20110.5 ns | 19624 ns | 1.02 |
| array/reverse/2d | 24609 ns | 24928.5 ns | 0.99 |
| array/reverse/1d_inplace | 10850 ns | 10448 ns | 1.04 |
| array/reverse/2d_inplace | 13297 ns | 12006 ns | 1.11 |
| array/copy | 20888 ns | 20990 ns | 1.00 |
| array/iteration/findall/int | 116428.5 ns | 159128.5 ns | 0.73 |
| array/iteration/findall/bool | 98530 ns | 139832 ns | 0.70 |
| array/iteration/findfirst/int | 161354.5 ns | 162546 ns | 0.99 |
| array/iteration/findfirst/bool | 163024 ns | 164393.5 ns | 0.99 |
| array/iteration/scalar | 71507 ns | 72740 ns | 0.98 |
| array/iteration/logical | 173144.5 ns | 216803.5 ns | 0.80 |
| array/iteration/findmin/1d | 46910 ns | 45968 ns | 1.02 |
| array/iteration/findmin/2d | 96034 ns | 96433 ns | 1.00 |
| array/reductions/reduce/Int64/1d | 45446 ns | 44555 ns | 1.02 |
| array/reductions/reduce/Int64/dims=1 | 51689 ns | 48607 ns | 1.06 |
| array/reductions/reduce/Int64/dims=2 | 62807 ns | 63682.5 ns | 0.99 |
| array/reductions/reduce/Int64/dims=1L | 88945 ns | 88842 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2L | 87955 ns | 89417.5 ns | 0.98 |
| array/reductions/reduce/Float32/1d | 34217 ns | 34490 ns | 0.99 |
| array/reductions/reduce/Float32/dims=1 | 50563 ns | 50554 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2 | 59393 ns | 59726 ns | 0.99 |
| array/reductions/reduce/Float32/dims=1L | 52272 ns | 52852 ns | 0.99 |
| array/reductions/reduce/Float32/dims=2L | 69662 ns | 70052.5 ns | 0.99 |
| array/reductions/mapreduce/Int64/1d | 44348.5 ns | 45547 ns | 0.97 |
| array/reductions/mapreduce/Int64/dims=1 | 53384.5 ns | 48423.5 ns | 1.10 |
| array/reductions/mapreduce/Int64/dims=2 | 61862 ns | 61443 ns | 1.01 |
| array/reductions/mapreduce/Int64/dims=1L | 88844 ns | 88888 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2L | 86736.5 ns | 87908.5 ns | 0.99 |
| array/reductions/mapreduce/Float32/1d | 34025 ns | 34245.5 ns | 0.99 |
| array/reductions/mapreduce/Float32/dims=1 | 41472 ns | 47287 ns | 0.88 |
| array/reductions/mapreduce/Float32/dims=2 | 59968 ns | 59743 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1L | 52682 ns | 53154 ns | 0.99 |
| array/reductions/mapreduce/Float32/dims=2L | 70380 ns | 70503 ns | 1.00 |
| array/broadcast | 20047 ns | 20866 ns | 0.96 |
| array/copyto!/gpu_to_gpu | 11206 ns | 12817 ns | 0.87 |
| array/copyto!/cpu_to_gpu | 214939.5 ns | 213873 ns | 1.00 |
| array/copyto!/gpu_to_cpu | 282695.5 ns | 284406 ns | 0.99 |
| array/accumulate/Int64/1d | 80469 ns | 125170 ns | 0.64 |
| array/accumulate/Int64/dims=1 | 220590 ns | 83519 ns | 2.64 |
| array/accumulate/Int64/dims=2 | 112517 ns | 158002 ns | 0.71 |
| array/accumulate/Int64/dims=1L | 409581.5 ns | 1709945.5 ns | 0.24 |
| array/accumulate/Int64/dims=2L | 5190679 ns | 966571 ns | 5.37 |
| array/accumulate/Float32/1d | 54935 ns | 109737 ns | 0.50 |
| array/accumulate/Float32/dims=1 | 201705 ns | 80823.5 ns | 2.50 |
| array/accumulate/Float32/dims=2 | 92304 ns | 147778 ns | 0.62 |
| array/accumulate/Float32/dims=1L | 245100 ns | 1619194 ns | 0.15 |
| array/accumulate/Float32/dims=2L | 3737008 ns | 698530 ns | 5.35 |
| array/construct | 1256.7 ns | 1279.85 ns | 0.98 |
| array/random/randn/Float32 | 43200 ns | 47253.5 ns | 0.91 |
| array/random/randn!/Float32 | 24962 ns | 24573 ns | 1.02 |
| array/random/rand!/Int64 | 27381 ns | 27294 ns | 1.00 |
| array/random/rand!/Float32 | 8756.333333333334 ns | 8724.333333333334 ns | 1.00 |
| array/random/rand/Int64 | 29945 ns | 29633 ns | 1.01 |
| array/random/rand/Float32 | 13068 ns | 12902 ns | 1.01 |
| array/permutedims/4d | 60355.5 ns | 61250.5 ns | 0.99 |
| array/permutedims/2d | 54052 ns | 54865 ns | 0.99 |
| array/permutedims/3d | 54989 ns | 55511 ns | 0.99 |
| array/sorting/1d | 2766160 ns | 2757710 ns | 1.00 |
| array/sorting/by | 3354758 ns | 3344132.5 ns | 1.00 |
| array/sorting/2d | 1084688 ns | 1080389 ns | 1.00 |
| cuda/synchronization/stream/auto | 1046 ns | 1015.8333333333334 ns | 1.03 |
| cuda/synchronization/stream/nonblocking | 7903.799999999999 ns | 7618.9 ns | 1.04 |
| cuda/synchronization/stream/blocking | 855.6 ns | 799.1530612244898 ns | 1.07 |
| cuda/synchronization/context/auto | 1161.7 ns | 1164.1 ns | 1.00 |
| cuda/synchronization/context/nonblocking | 8059.799999999999 ns | 7651.4 ns | 1.05 |
| cuda/synchronization/context/blocking | 925.8139534883721 ns | 895.8490566037735 ns | 1.03 |

This comment was automatically generated by workflow using github-action-benchmark.
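
For reading the accumulate rows, here is a minimal sketch in plain Julia. It assumes the Ratio column is the current time divided by the previous time, which is consistent with the values shown, so a ratio below 1.00 means this PR is faster. The helper name `ratio` is hypothetical, and `Base.accumulate` on a CPU array stands in for the GPU path only to illustrate what the `dims` keyword means; no CUDA.jl or GPUArrays call is made. Labels such as `dims=1L` are copied as-is, since their meaning is not stated in this thread.

```julia
# Hypothetical helper: ratio as (assumed) current / previous, rounded to 2 digits.
ratio(current, previous) = round(current / previous; digits = 2)

# (name, current ns, previous ns) copied from the Int64 accumulate rows of the table.
rows = [
    ("array/accumulate/Int64/1d",      80469.0,   125170.0),
    ("array/accumulate/Int64/dims=1",  220590.0,  83519.0),
    ("array/accumulate/Int64/dims=2",  112517.0,  158002.0),
    ("array/accumulate/Int64/dims=1L", 409581.5,  1709945.5),
    ("array/accumulate/Int64/dims=2L", 5190679.0, 966571.0),
]

for (name, current, previous) in rows
    r = ratio(current, previous)
    verdict = r < 1 ? "faster" : "slower"
    println(rpad(name, 32), " ratio = ", r, " (", verdict, " than previous)")
end

# What `dims` selects, shown on a small CPU matrix with Base.accumulate:
x = [1 2; 3 4]
down_columns = accumulate(+, x; dims = 1)  # running sum along dimension 1: [1 2; 4 6]
along_rows   = accumulate(+, x; dims = 2)  # running sum along dimension 2: [1 3; 3 7]
```

Under that reading, the 1d, `dims=2`, and `dims=1L` cases improve, while `dims=1` and `dims=2L` regress by roughly 2.5x and 5.4x respectively.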
