bad scaling in map
#191
Comments
It looks like it actually scales
julia> @time map(x->sleep(0.005), a);
  0.098876 seconds (58.11 k allocations: 2.854 MiB)
julia> @time map(x->sleep(0.005), da);
  0.282395 seconds (689.12 k allocations: 35.036 MiB, 3.20% gc time)
julia> @time map(x->sleep(0.5), a);
  5.066354 seconds (58.11 k allocations: 2.854 MiB)
julia> @time map(x->sleep(0.5), da);
  2.763649 seconds (688.98 k allocations: 34.847 MiB, 0.15% gc time)
but that the latency is huge. I might try to bisect this later.
Yes, the latency has been growing, and I haven't figured out where it is coming from...
I think this is related to matrix multiplication using multithreaded BLAS on the master but single-threaded BLAS on the workers.
Julia 0.3 used to set the number of BLAS threads to 1 on the master too when multi was invoked (see https://github.com/JuliaLang/julia/blob/v0.3.0/base/multi.jl#L1233). Somewhere along the way this was changed so that the master process keeps the default BLAS settings.
Setting the number of BLAS threads to 1 on the master shows the distributed map scaling reasonably well for larger workloads.
Compilation runs omitted.
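For anyone wanting to reproduce this, a minimal sketch of pinning BLAS to one thread on the master process might look like the following. It assumes a recent Julia (1.6 or later for `BLAS.get_num_threads`) and is not the exact code used for the timings above, which were not preserved in this thread.

```julia
using LinearAlgebra

# Pin BLAS on the calling (master) process to a single thread, so that a
# serial `map` over matrix products does not get implicit BLAS parallelism
# that the single-threaded workers lack.
BLAS.set_num_threads(1)

# Read the setting back to confirm it took effect (Julia 1.6 or later).
nthreads_now = BLAS.get_num_threads()
```

With this in place, the serial and distributed timings compare like with like, since each process runs matrix multiplication on one thread.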
It doesn't explain my
julia> @time map(x->(sleep(0.005);x), a);
elapsed time: 0.071501629 seconds (5472 bytes allocated)
julia> @time map(x->(sleep(0.005);x), da);
elapsed time: 0.035031453 seconds (24552 bytes allocated)
so the latency is now much higher. In Julia 0.6, I'm getting
julia> @time map(x->(sleep(0.005);x), a);
  0.085523 seconds (9.70 k allocations: 530.142 KiB)
julia> @time map(x->(sleep(0.005);x), da);
  0.117121 seconds (66.14 k allocations: 3.522 MiB)
Initially, I thought this might just be overhead from
julia> function mymap(f, da::DArray)
           DArray(size(da), procs(da)) do I
               map(f, Array(da[I...]))
           end
       end
mymap (generic function with 1 method)
julia> @time mymap(x->(sleep(0.005);x), da);
  0.173570 seconds (434.61 k allocations: 21.887 MiB, 2.71% gc time)
Timing
The latency is in the REPL, though I don't understand why it is so much only for the
Good observation. After setting the number of threads to one on the master process, the original example becomes
julia> function f()
           a = fill(1000,10)
           da = distribute(a)
           @time map(t -> rand(t,t)^2, a)
           @time map(t -> rand(t,t)^2, da)
           return nothing
       end
f (generic function with 1 method)
julia> f()
  0.401364 seconds (42 allocations: 152.590 MiB)
  0.206807 seconds (371 allocations: 25.438 KiB)
after warm up. So the extra latency seems to be associated with compilation.
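To separate compilation cost from run time in measurements like these, one option is to call the function once before timing it. The helper below is a hypothetical sketch (`warm_time` is not part of Julia or DistributedArrays), shown with a plain `Array` so that it runs without any workers.

```julia
# Hypothetical helper: run `f` once to trigger compilation, then time a
# second call so the reported number reflects steady-state run time only.
function warm_time(f)
    f()                  # first call pays any compilation cost
    return @elapsed f()  # second call is timed, in seconds
end

# Usage: ten 5 ms sleeps mapped serially over a small array.
a = fill(1, 10)
t = warm_time(() -> map(x -> (sleep(0.005); x), a))
```

Because anonymous functions typed at the REPL are compiled afresh each time they are entered, wrapping the call in a named function or a helper like this keeps that cost out of the measurement.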
@JeffBezanson What are your thoughts here? It seems that the main difference between the 0.3 and 1.0 timings is due to more/slower compilation associated with the top-level execution of
(first-time compilation omitted)
Even though this is embarrassingly parallel, the distributed version consistently takes about the same time or longer. I tried this in Julia 0.3 and the distributed time is around 0.5 seconds, close to the expected ~2x speedup.