Skip to content

Conversation

@makortel
Copy link
Collaborator

@makortel makortel commented Oct 22, 2021

This PR PR is on top of #250 (to avoid conflicts, to be rebased after #250 is merged) and simplifies the Device management in ScopedContext and Product by using the Alpaka's Device abstraction instead of a mixture of it and an integer. ScopedContext no longer needs the Device to be given by outside, but takes it from the global vector using the same logic as in cuda.

The code could be simplified further by not storing the Device at all, since it can be obtained from Queue, although that would lead to Device objects be copied more. The cost of those copies appear to be at most the same as with std::shared_ptr, so maybe that acceptable at this point?

@makortel makortel requested a review from fwyzard October 22, 2021 01:27
@fwyzard
Copy link
Contributor

fwyzard commented Oct 24, 2021

Since ContextState is used to keep track of the "stream" and "device", and Alpaka's Queue already encapsulate both, would it work to simply drop the ContextState and use the Queue directly ?

here is a possible implementation of this idea.
I only checked that --cuda/--serial/--tbb build, run, give the correct results according to --validation, and are not slower than before.
And I pick roughly at random whether to pass std::shared_ptr<Queue> by value, reference or const reference :-/

While working on it, I though of one reason we may decide not to do this: if we ever want to support submitting a single collaborative kernel to multiple devices at the same time, having the device onlyas part of the queue may not be the best idea.
But, that doesn't seem like a realistic use case any time soon...

@makortel
Copy link
Collaborator Author

here is a possible implementation of this idea.

I have one comment on the changes themselves: The SiPixelRawToCluster::ctxState_ (or the shared_ptr<Queue> equivalent) would be needed in the (near?) future when SiPixelRawToCluster will be made EDProducerExternalWork to avoid the blocking synchronization.

While working on it, I though of one reason we may decide not to do this: if we ever want to support submitting a single collaborative kernel to multiple devices at the same time, having the device onlyas part of the queue may not be the best idea.
But, that doesn't seem like a realistic use case any time soon...

I agree that particular case (spreading work from one Event to many devices) may be quite far in the future (given that we have "hard time" in filling even one GPU with the work of one Event). Written that, if really needed, a multi-device multi-stream work could be launched already in today's system as long as the end result ends up in "the device", "the stream" is synchronized with all the other streams, and the "multi-device state" is internal to one module (i.e. does not have to be passed through a chain of modules).

But the same argument would hold if we realize later that we'd need some other information be passed from the "acquire context" to "produce context". Having a specific class for it would make the addition trivial, but having to go through all necessary EDModules would mean more work (although probably not much compared to other migrations we are doing or have done, but still additional work).

I'm leaning towards keeping the specific class (if the delivery of device+stream is deemed to be necessary, ref #224, but that's more for the eventual CMSSW deployment; in the alpaka program for time being I'd try to mimic the behavior of cuda).

@fwyzard fwyzard force-pushed the alpakaScopedContextDevice branch from 86abd75 to e3fdd33 Compare October 26, 2021 07:18
@fwyzard
Copy link
Contributor

fwyzard commented Oct 26, 2021

Rebased after merging #250 .

@fwyzard
Copy link
Contributor

fwyzard commented Oct 26, 2021

I'm leaning towards keeping the specific class (if the delivery of device+stream is deemed to be necessary, ref #224, but that's more for the eventual CMSSW deployment; in the alpaka program for time being I'd try to mimic the behavior of cuda).

OK, I'll prepare a set of less invasive changes: take the Device from the Queue, but keep the Queue in the ScopedContext.

@fwyzard
Copy link
Contributor

fwyzard commented Oct 26, 2021

@makortel what do you think of these changes ?

@makortel
Copy link
Collaborator Author

@makortel what do you think of these changes ?

Looks good!

@fwyzard fwyzard merged commit 535ed9e into cms-patatrack:master Oct 27, 2021
@makortel makortel deleted the alpakaScopedContextDevice branch October 27, 2021 15:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants