@makortel
Collaborator

From cms-sw/cmssw#35473 and cms-sw/cmssw#35542, for all versions other than hip (which was done in #233).

Originally by Vincenzo Innocente in cms-sw/cmssw#35473
Contains also cms-sw/cmssw#35542 by Andrea Bocci
if constexpr (DEPTH == 0) {
printf("ERROR: GPUCACell::find_ntuplets reached full depth!\n");
ALPAKA_ASSERT_OFFLOAD(false);
} else {
Collaborator Author

@fwyzard @VinInn For Alpaka I went with this instead of the specialization for DEPTH == 0, because partial specializations of functions are not allowed (the specialization would be "partial" because of the additional T_Acc template argument). In a quick test, the throughput improvement is of a similar order (3-4 %) as in cuda without the caching/async allocator and in kokkos (both of which use the specialization as in the original PR). If you have any better suggestions, let me know.
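
For readers following along, the pattern described above can be sketched in isolation. This is only an illustration under stated assumptions, not the actual GPUCACell code: `FakeAcc` and `countLevels` are hypothetical names, and `FakeAcc` merely stands in for an Alpaka accelerator type so that the function has a second template parameter. It requires C++17.

```cpp
// Hypothetical stand-in for an accelerator type; only its presence as an
// extra template parameter matters for this sketch.
struct FakeAcc {};

// Recurses DEPTH times and returns how many levels were visited.
// A full specialization for DEPTH == 0 is not possible here, because
// T_Acc would have to stay a template parameter (a partial specialization).
template <int DEPTH, typename T_Acc>
constexpr int countLevels([[maybe_unused]] const T_Acc& acc, int visited = 0) {
  if constexpr (DEPTH == 0) {
    // Terminal case: the else branch is discarded at compile time,
    // so countLevels<-1, T_Acc> is never instantiated.
    return visited;
  } else {
    return countLevels<DEPTH - 1>(acc, visited + 1);
  }
}

// Compile-time check that the recursion stops after exactly six levels.
static_assert(countLevels<6>(FakeAcc{}) == 6);
```

The termination condition sits next to the recursive call, which is the locality argument for `if constexpr` over a separate specialization.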

Contributor

After seeing it, I actually like the approach with if constexpr better than the one with the specialisation for 0, as it keeps things more localised.

We should check that it works well also for the native CUDA and HIP cases.

Collaborator Author

Would it be ok for you to do that in a subsequent PR?

@fwyzard
Contributor

fwyzard commented Oct 15, 2021 via email

@makortel
Collaborator Author

I opened an issue as a reminder about that: #240.

@makortel makortel merged commit 25ac585 into cms-patatrack:master Oct 15, 2021
@VinInn
Contributor

VinInn commented Oct 16, 2021

I consider "partial specializations of functions are not allowed" a defect in Alpaka.
The use of "if constexpr" instead of template specialization to terminate recursion is a more general coding pattern, reaching far beyond the case at hand.

@fwyzard
Contributor

fwyzard commented Oct 16, 2021

> I consider "partial specializations of functions are not allowed" a defect in Alpaka.

Well, maybe a defect of C++ :-/ ?
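
For context, the restriction comes from the language itself: C++ allows partial specialization of class templates but not of function templates. A common workaround, sketched here with hypothetical names (`DummyAcc`, `LevelCounter`, `countLevelsViaStruct`) that are not taken from the PR, is to move the recursion into a class template and keep a thin function wrapper. It requires C++17 for the message-less `static_assert`.

```cpp
// Hypothetical stand-in for an accelerator type.
struct DummyAcc {};

// Primary template: the general recursive step.
template <int DEPTH, typename T_Acc>
struct LevelCounter {
  static constexpr int run(const T_Acc& acc, int visited) {
    return LevelCounter<DEPTH - 1, T_Acc>::run(acc, visited + 1);
  }
};

// Partial specialization for DEPTH == 0, legal because this is a class
// template: T_Acc remains a free template parameter.
template <typename T_Acc>
struct LevelCounter<0, T_Acc> {
  static constexpr int run(const T_Acc&, int visited) { return visited; }
};

// Thin wrapper so call sites keep ordinary function syntax.
template <int DEPTH, typename T_Acc>
constexpr int countLevelsViaStruct(const T_Acc& acc) {
  return LevelCounter<DEPTH, T_Acc>::run(acc, 0);
}

// Compile-time check that the specialization terminates the recursion.
static_assert(countLevelsViaStruct<4>(DummyAcc{}) == 4);
```

This works, but the terminal case lives in a separate declaration, which is the trade-off against the `if constexpr` form.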

@VinInn
Contributor

VinInn commented Oct 16, 2021

Let's rephrase: the need to intrusively template ALL functions on T_ACC is a serious issue in Alpaka.

@fwyzard
Contributor

fwyzard commented Oct 16, 2021

We don't need to follow the same approach in our code, but I don't think a template parameter is a bad choice.
For example, after working on #241 I think that a template parameter is easier to deal with than a different namespace, especially if we don't want to replicate every symbol, object, etc.
