

@rchen20
Member

rchen20 commented Jun 19, 2020

No description provided.

@trws
Member

trws commented Jul 21, 2021

@rchen20 what's the status of this? Should we take this, close it, rework it..?

@rchen20
Member Author

rchen20 commented Jul 21, 2021

> @rchen20 what's the status of this? Should we take this, close it, rework it..?

@trws I don't think we resolved this yet. Let me take another look at it and get back to you. I don't think this is critical for the upcoming release(s) though.

@trws
Member

trws commented Jul 21, 2021

Good to know, thanks @rchen20. If we aren't going to use this implementation, I'd like to convert it to an issue (if we don't have one) and close this though. Let me know what makes sense so we can clean up.

@rchen20
Member Author

rchen20 commented Jul 26, 2021

Hi @trws, this PR came from a problem we were trying to solve a while ago with the omp-target resource device pointer not being mapped properly (see the email snippet below). I recall we tried mapping the device memory allocation directly, and using unified_address, but those techniques did not work. I'd like to keep this PR around so we can remember what we tried, but let me know if you want to close this and make it an issue instead.

Ok, so that’s not part of allocate. It could be part of memset (I’ll fix that in a second), but this is the core problem, and I’d like to hear what you all think about how we should handle it.

Currently, the resource allocates device memory and returns it, which makes it a work-alike of the CUDA and HIP versions. The problem you’re seeing arises because the target region implicitly maps the captured pointer, and apparently isn’t handling lambdas correctly yet, because it’s nulling that pointer. Someone would have the same issue with a bare target region unless they put use_device_ptr(p) on it or added a #pragma omp requires unified_address in the translation unit.
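For readers following along, here is a minimal sketch of the bare-target-region case described above. It is not the camp resource code: `device_alloc`, `device_free`, and `fill_and_sum` are hypothetical names made up for illustration. Note that on a `target` construct the OpenMP clause is spelled `is_device_ptr`; `use_device_ptr` belongs to `target data`. The sketch assumes a fallback to plain host memory when the build has no OpenMP or the runtime reports no offload device.

```cpp
#include <cstddef>
#include <cstdlib>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: raw device allocation, similar in spirit to what the
// resource does. Falls back to host memory if no offload device is available.
static void* device_alloc(std::size_t bytes) {
#ifdef _OPENMP
  if (omp_get_num_devices() > 0)
    return omp_target_alloc(bytes, omp_get_default_device());
#endif
  return std::malloc(bytes);
}

// Hypothetical helper: releases memory obtained from device_alloc.
static void device_free(void* p) {
#ifdef _OPENMP
  if (omp_get_num_devices() > 0) {
    omp_target_free(p, omp_get_default_device());
    return;
  }
#endif
  std::free(p);
}

// Fills p[i] = i on the device, sums the values there, and returns the sum.
long fill_and_sum(int n) {
  int* p = static_cast<int*>(device_alloc(static_cast<std::size_t>(n) * sizeof(int)));
  if (p == nullptr) return -1;
  long sum = 0;

  // is_device_ptr(p) says p already holds a device address, so the runtime
  // does not look for a host-to-device mapping (and does not pass NULL).
  #pragma omp target teams distribute parallel for is_device_ptr(p)
  for (int i = 0; i < n; ++i) p[i] = i;

  #pragma omp target teams distribute parallel for is_device_ptr(p) \
      map(tofrom: sum) reduction(+: sum)
  for (int i = 0; i < n; ++i) sum += p[i];

  device_free(p);
  return sum;
}
```

Calling `fill_and_sum(100)` sums the integers 0 through 99. Without the `is_device_ptr(p)` clause, the runtime treats `p` as a host pointer and tries to find a mapping for it, which is the situation the 1587-164 message in the log below reports.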

Alternatively, I can make a resource, possibly an extension of this one with a flag, that allocates host memory and maps it to the device. The mapping will then be found automatically and things work, but we are “wasting” host memory.
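A rough sketch of this second option, under the same caveats: `MappedBuffer` and `scale_and_sum` are hypothetical names, not anything in camp or RAJA. The buffer lives on the host and is mapped to the device for its whole lifetime, so a later target region that captures the host pointer finds the existing mapping on its own.

```cpp
#include <cstddef>

// Hypothetical resource-like wrapper: owns a host buffer and keeps a device
// copy mapped for as long as the object lives.
struct MappedBuffer {
  double* host;
  std::size_t n;

  explicit MappedBuffer(std::size_t count) : host(new double[count]), n(count) {
    double* h = host;
    std::size_t len = n;
    // Allocate the device-side storage and record the host-to-device mapping.
    #pragma omp target enter data map(alloc: h[0:len])
  }

  ~MappedBuffer() {
    double* h = host;
    std::size_t len = n;
    // Drop the mapping and the device storage, then free the host copy.
    #pragma omp target exit data map(delete: h[0:len])
    delete[] host;
  }

  MappedBuffer(const MappedBuffer&) = delete;
  MappedBuffer& operator=(const MappedBuffer&) = delete;
};

// Sets every element to 1.0, scales by factor on the device, returns the sum.
double scale_and_sum(int n, double factor) {
  MappedBuffer buf(static_cast<std::size_t>(n));
  double* p = buf.host;
  for (int i = 0; i < n; ++i) p[i] = 1.0;

  // Push the initialized host values to the mapped device copy.
  #pragma omp target update to(p[0:n])

  double sum = 0.0;
  // p has no explicit clause here: the implicit pointer mapping looks up the
  // existing entry and translates p to the matching device address.
  #pragma omp target teams distribute parallel for map(tofrom: sum) reduction(+: sum)
  for (int i = 0; i < n; ++i) {
    p[i] *= factor;
    sum += p[i];
  }
  return sum;
}
```

The cost is the one named above: every allocation carries a host-side buffer of the same size, even if the host never reads it.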

Thoughts?

-Tom

On 24 Apr 2020, at 10:54, Chen, Robert Chang Che wrote:

Hey Tom,

I’m trying out the register-singleton branch and it compiles, but runs into a segfault during v1::Omp::allocate() with xl/2020.03.18:

[----------] 3 tests from OpenMPTarget/ForallSegmentTest/0, where TypeParam = camp::list<long, camp::resources::v1::Omp, RAJA::policy::omp::omp_target_parallel_for_exec<8ul> >
[ RUN ] OpenMPTarget/ForallSegmentTest/0.RangeSegmentForall
1587-164 Encountered a zero-length array section that points to memory starting at address 0x200053200000. Because this memory is not currently mapped on the target device 0, a NULL pointer will be passed to the device.
1587-175 The underlying GPU runtime reported the following error "an illegal memory access was encountered".
1587-163 Error encountered while attempting to execute on the target device 0. The program will stop.

Side question: Will the ODR limit the singleton to one definition per translation unit, or per program?

Thanks,

Robert
