-
Notifications
You must be signed in to change notification settings - Fork 3
First draft for the section on atomics #34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
pzehner
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is better with the tables, indeed. I think that the layout of the slides should be modified to separate the code from its execution.
| % Trainee could play with the following program to check that it really present a race condition: | ||
| %#include <iostream> | ||
| %#include <Kokkos_Core.hpp> | ||
| % | ||
| %int main(int argc, char *argv[]) { | ||
| % Kokkos::initialize(argc, argv); | ||
| % { | ||
| % const int N = 10000; | ||
| % Kokkos::View<double*> v("v", N); | ||
| % Kokkos::deep_copy(v, 4); | ||
| % | ||
| % Kokkos::View<double> res("res", N); | ||
| % | ||
| % Kokkos::parallel_for(Kokkos::RangePolicy(0, N), | ||
| % KOKKOS_LAMBDA(int i) { | ||
| % //Kokkos::atomic_add(&res(), v(i)); | ||
| % res() = res() + v(i); | ||
| % }); | ||
| % | ||
| % double res_; | ||
| % | ||
| % deep_copy(res_, res); | ||
| % | ||
| % std::cout << "res_:" << res_ << std::endl; | ||
| % std::cout << "4*N:" << 4*N << std::endl; | ||
| % } | ||
| % Kokkos::finalize(); | ||
| %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a good idea for an exercise.
courses/02_intermediate/main.tex
Outdated
| \item they bypass and invalidate cache line. | ||
| \end{itemize} | ||
|
|
||
| => Atomics should be used with care and only when strictly necessary.\linebreak |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add that sometimes, developers should change an algorithm that depends on atomics.
This is especially true for algorithms that iterate over faces of a mesh, then over the two cells neighboring the face. This pattern is very common for unstructured CFD codes that run on CPU, because then you compute the flux between the two cells only once. This can be ported to GPU as is, but sometimes the best strategy is actually to rewrite the algorithm to iterate over the cells directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that you are right, but it deserves its own set of slides. I didn't had time today, I'll see later.
Thank you for the proposed example.
courses/02_intermediate/main.tex
Outdated
| For some of your needs, more performant alternative exist, like \texttt{parallel\_reduce} or \texttt{Kokkos::ScatterView}. | ||
| \end{frame} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Beware, because on GPU a ScatterView will not give you any performance gain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following a discussion I had this morning, maybe it's better to not talk about ScatterView in this tutorial. It's still experimental and maybe less suited for nowadays CPUs. Especially, since it relies on data duplication on CPU, it may be counterproductive on CPUs with a very large number of threads (say, more than 100).
Signed-off-by: Paul Gannay <[email protected]>
Co-authored-by: Paul Zehner <[email protected]>
pzehner
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's almost done. Just some fixes.
courses/02_intermediate/main.tex
Outdated
| \begin{column}{0.5\linewidth} | ||
| \begin{minted}{C++} | ||
| Kokkos::View<double*> histo(5); | ||
| Kokkos::deep_copy(histo, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you need to manually initialize to 0, the view does it by default.
(Same remark for the other slides.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure this is guaranteed and not a side effect of memory allocation?
I find the doc not very clear on this subject, all allocating constructor have this text:
The initialization is executed on the default instance of the execution space corresponding to memory_space and fences it.
but it doesn't explain what kind of initialisation takes place for default types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I asked Adrien (he worked on View initialisation), and he confirmed that you are right, I will delete the extra deep_copy.
courses/02_intermediate/main.tex
Outdated
| \colorlet{thread1}{gray!25} | ||
| \colorlet{thread6}{example!25} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would select two tones of gray instead
| \colorlet{thread1}{gray!25} | |
| \colorlet{thread6}{example!25} | |
| \colorlet{thread1}{gray!20} | |
| \colorlet{thread6}{gray!40} |
Or plainly use colors:
| \colorlet{thread1}{gray!25} | |
| \colorlet{thread6}{example!25} | |
| \colorlet{thread1}{lightalert} | |
| \colorlet{thread6}{lightexample} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I initially tried with the different levels of gray but found it hard to read, especially on slide 30.
The light red + light blue looks nice in colour but is harder to differentiate in B&W.
I'll do the change, we'll revert it if you think readability in B&W is important.
No description provided.