-
Notifications
You must be signed in to change notification settings - Fork 47
Remove runtime recursion from find_ntuplets() #239
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove runtime recursion from find_ntuplets() #239
Conversation
Originally by Vincenzo Innocente in cms-sw/cmssw#35473 Contains also cms-sw/cmssw#35542 by Andrea Bocci
Originally by Vincenzo Innocente in cms-sw/cmssw#35473 Contains also cms-sw/cmssw#35542 by Andrea Bocci
Originally by Vincenzo Innocente in cms-sw/cmssw#35473 Contains also cms-sw/cmssw#35542 by Andrea Bocci
Originally by Vincenzo Innocente in cms-sw/cmssw#35473 Contains also cms-sw/cmssw#35542 by Andrea Bocci
Originally by Vincenzo Innocente in cms-sw/cmssw#35473 Contains also cms-sw/cmssw#35542 by Andrea Bocci
Originally by Vincenzo Innocente in cms-sw/cmssw#35473 Contains also cms-sw/cmssw#35542 by Andrea Bocci
Originally by Vincenzo Innocente in cms-sw/cmssw#35473 Contains also cms-sw/cmssw#35542 by Andrea Bocci
| if constexpr (DEPTH == 0) { | ||
| printf("ERROR: GPUCACell::find_ntuplets reached full depth!\n"); | ||
| ALPAKA_ASSERT_OFFLOAD(false); | ||
| } else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fwyzard @VinInn For Alpaka I went with this instead of the specialization for DEPTH == 0 because partial specializations of functions are not allowed ("partial" caused by the additional T_Acc template argument). By quick test the throughput improvement is similar order (3-4 %) than in cuda without caching/async allocator and in kokkos (that both use the specialization as in the original PR). If you have any better suggestions, let me know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After seeing it, I actually like the approach with if constexpr better than the one with the specialisation for 0, as it keeps things more localised.
We should check that it works well also for the native CUDA and HIP cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be ok for you to do that in a subsequent PR?
|
Sure.
|
|
I opened an issue to remind about that #240. |
|
I consider "partial specializations of functions are not allowed" a defect in Alpaka. |
Well, maybe a defect of C++ :-/ ? |
|
let's rephrase: The need to specifically template in an intrusive fashion ALL functions with |
|
We don't need to follow the same approach in our code, but I don't think a template parameter is a bag choice. |
From cms-sw/cmssw#35473 and cms-sw/cmssw#35542 for all other versions than
hip(that was done in #233).