Use GPUArrays accumulation implementation #2813
Open
christiangnrd wants to merge 2 commits into JuliaGPU:master from christiangnrd:noaccum
+4 −268
Your PR requires formatting changes to meet the project's style guidelines. Click here to view the suggested changes.
diff --git a/test/runtests.jl b/test/runtests.jl
index b6c479cce..89bf840c9 100644
--- a/test/runtests.jl
+++ b/test/runtests.jl
@@ -5,7 +5,7 @@ using Printf: @sprintf
 using Base.Filesystem: path_separator
 using Pkg
-Pkg.add(url="https://github.com/christiangnrd/GPUArrays.jl", rev="accumulatetests")
+Pkg.add(url = "https://github.com/christiangnrd/GPUArrays.jl", rev = "accumulatetests")
 # parse some command-line arguments
 function extract_flag!(args, flag, default=nothing; typ=typeof(default))
CUDA.jl Benchmarks
Benchmark suite | Current: f8088b1 | Previous: e561e7a | Ratio |
---|---|---|---|
latency/precompile | 42896033326.5 ns | 43393378645 ns | 0.99 |
latency/ttfp | 7099587815 ns | 7099882121 ns | 1.00 |
latency/import | 3574268220 ns | 3463869374 ns | 1.03 |
integration/volumerhs | 9621277 ns | 9623663 ns | 1.00 |
integration/byval/slices=1 | 146915 ns | 146714 ns | 1.00 |
integration/byval/slices=3 | 425748.5 ns | 425787 ns | 1.00 |
integration/byval/reference | 144907 ns | 144967 ns | 1.00 |
integration/byval/slices=2 | 286403 ns | 286209 ns | 1.00 |
integration/cudadevrt | 103446 ns | 103426 ns | 1.00 |
kernel/indexing | 14264 ns | 14196 ns | 1.00 |
kernel/indexing_checked | 14951 ns | 14906 ns | 1.00 |
kernel/occupancy | 672.626582278481 ns | 759.2189781021898 ns | 0.89 |
kernel/launch | 2152.222222222222 ns | 2287.222222222222 ns | 0.94 |
kernel/rand | 17637 ns | 15792 ns | 1.12 |
array/reverse/1d | 20110.5 ns | 19624 ns | 1.02 |
array/reverse/2d | 24609 ns | 24928.5 ns | 0.99 |
array/reverse/1d_inplace | 10850 ns | 10448 ns | 1.04 |
array/reverse/2d_inplace | 13297 ns | 12006 ns | 1.11 |
array/copy | 20888 ns | 20990 ns | 1.00 |
array/iteration/findall/int | 116428.5 ns | 159128.5 ns | 0.73 |
array/iteration/findall/bool | 98530 ns | 139832 ns | 0.70 |
array/iteration/findfirst/int | 161354.5 ns | 162546 ns | 0.99 |
array/iteration/findfirst/bool | 163024 ns | 164393.5 ns | 0.99 |
array/iteration/scalar | 71507 ns | 72740 ns | 0.98 |
array/iteration/logical | 173144.5 ns | 216803.5 ns | 0.80 |
array/iteration/findmin/1d | 46910 ns | 45968 ns | 1.02 |
array/iteration/findmin/2d | 96034 ns | 96433 ns | 1.00 |
array/reductions/reduce/Int64/1d | 45446 ns | 44555 ns | 1.02 |
array/reductions/reduce/Int64/dims=1 | 51689 ns | 48607 ns | 1.06 |
array/reductions/reduce/Int64/dims=2 | 62807 ns | 63682.5 ns | 0.99 |
array/reductions/reduce/Int64/dims=1L | 88945 ns | 88842 ns | 1.00 |
array/reductions/reduce/Int64/dims=2L | 87955 ns | 89417.5 ns | 0.98 |
array/reductions/reduce/Float32/1d | 34217 ns | 34490 ns | 0.99 |
array/reductions/reduce/Float32/dims=1 | 50563 ns | 50554 ns | 1.00 |
array/reductions/reduce/Float32/dims=2 | 59393 ns | 59726 ns | 0.99 |
array/reductions/reduce/Float32/dims=1L | 52272 ns | 52852 ns | 0.99 |
array/reductions/reduce/Float32/dims=2L | 69662 ns | 70052.5 ns | 0.99 |
array/reductions/mapreduce/Int64/1d | 44348.5 ns | 45547 ns | 0.97 |
array/reductions/mapreduce/Int64/dims=1 | 53384.5 ns | 48423.5 ns | 1.10 |
array/reductions/mapreduce/Int64/dims=2 | 61862 ns | 61443 ns | 1.01 |
array/reductions/mapreduce/Int64/dims=1L | 88844 ns | 88888 ns | 1.00 |
array/reductions/mapreduce/Int64/dims=2L | 86736.5 ns | 87908.5 ns | 0.99 |
array/reductions/mapreduce/Float32/1d | 34025 ns | 34245.5 ns | 0.99 |
array/reductions/mapreduce/Float32/dims=1 | 41472 ns | 47287 ns | 0.88 |
array/reductions/mapreduce/Float32/dims=2 | 59968 ns | 59743 ns | 1.00 |
array/reductions/mapreduce/Float32/dims=1L | 52682 ns | 53154 ns | 0.99 |
array/reductions/mapreduce/Float32/dims=2L | 70380 ns | 70503 ns | 1.00 |
array/broadcast | 20047 ns | 20866 ns | 0.96 |
array/copyto!/gpu_to_gpu | 11206 ns | 12817 ns | 0.87 |
array/copyto!/cpu_to_gpu | 214939.5 ns | 213873 ns | 1.00 |
array/copyto!/gpu_to_cpu | 282695.5 ns | 284406 ns | 0.99 |
array/accumulate/Int64/1d | 80469 ns | 125170 ns | 0.64 |
array/accumulate/Int64/dims=1 | 220590 ns | 83519 ns | 2.64 |
array/accumulate/Int64/dims=2 | 112517 ns | 158002 ns | 0.71 |
array/accumulate/Int64/dims=1L | 409581.5 ns | 1709945.5 ns | 0.24 |
array/accumulate/Int64/dims=2L | 5190679 ns | 966571 ns | 5.37 |
array/accumulate/Float32/1d | 54935 ns | 109737 ns | 0.50 |
array/accumulate/Float32/dims=1 | 201705 ns | 80823.5 ns | 2.50 |
array/accumulate/Float32/dims=2 | 92304 ns | 147778 ns | 0.62 |
array/accumulate/Float32/dims=1L | 245100 ns | 1619194 ns | 0.15 |
array/accumulate/Float32/dims=2L | 3737008 ns | 698530 ns | 5.35 |
array/construct | 1256.7 ns | 1279.85 ns | 0.98 |
array/random/randn/Float32 | 43200 ns | 47253.5 ns | 0.91 |
array/random/randn!/Float32 | 24962 ns | 24573 ns | 1.02 |
array/random/rand!/Int64 | 27381 ns | 27294 ns | 1.00 |
array/random/rand!/Float32 | 8756.333333333334 ns | 8724.333333333334 ns | 1.00 |
array/random/rand/Int64 | 29945 ns | 29633 ns | 1.01 |
array/random/rand/Float32 | 13068 ns | 12902 ns | 1.01 |
array/permutedims/4d | 60355.5 ns | 61250.5 ns | 0.99 |
array/permutedims/2d | 54052 ns | 54865 ns | 0.99 |
array/permutedims/3d | 54989 ns | 55511 ns | 0.99 |
array/sorting/1d | 2766160 ns | 2757710 ns | 1.00 |
array/sorting/by | 3354758 ns | 3344132.5 ns | 1.00 |
array/sorting/2d | 1084688 ns | 1080389 ns | 1.00 |
cuda/synchronization/stream/auto | 1046 ns | 1015.8333333333334 ns | 1.03 |
cuda/synchronization/stream/nonblocking | 7903.799999999999 ns | 7618.9 ns | 1.04 |
cuda/synchronization/stream/blocking | 855.6 ns | 799.1530612244898 ns | 1.07 |
cuda/synchronization/context/auto | 1161.7 ns | 1164.1 ns | 1.00 |
cuda/synchronization/context/nonblocking | 8059.799999999999 ns | 7651.4 ns | 1.05 |
cuda/synchronization/context/blocking | 925.8139534883721 ns | 895.8490566037735 ns | 1.03 |
This comment was automatically generated by workflow using github-action-benchmark.
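For reading the Ratio column: the figures in the table are consistent with Ratio = Current / Previous, so values above 1 mean this PR is slower and values below 1 mean it is faster. A minimal sketch of that arithmetic in plain Julia follows; the `ratio` helper and the `rows` container are hypothetical names used only for illustration and are not part of CUDA.jl or the benchmark workflow.

```julia
# Hypothetical helper: ratio of the current timing to the previous one.
# A result above 1 means the current commit is slower.
ratio(current_ns, previous_ns) = current_ns / previous_ns

# Two accumulate rows copied from the table above (times in ns).
rows = [
    ("array/accumulate/Int64/dims=1", 220590.0, 83519.0),
    ("array/accumulate/Int64/dims=1L", 409581.5, 1709945.5),
]

for (name, current, previous) in rows
    r = round(ratio(current, previous); digits = 2)
    verdict = r > 1 ? "slower" : "faster"
    println(name, ": ", r, " (", verdict, ")")
end
```

Under this reading, the accumulate benchmarks move in both directions: `dims=1` and `dims=2L` regress, while `1d`, `dims=2`, and `dims=1L` improve.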
Opened to run benchmarks.
Todo: