[SYCL][CUDA] Returns minimum mandated capabilities for atomic_fence device queries #8901

maarquitos14
Contributor

@maarquitos14 maarquitos14 commented Mar 31, 2023

Currently, we return an error because this query is unimplemented. I believe it makes more sense to return the minimum mandated capabilities, as we do in other backends (e.g. HIP).
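
As a rough illustration of the change described above, the sketch below models a handler that reports a fixed bitmask for the two atomic_fence queries instead of an error. It assumes the SYCL 2020 minimums are the orders relaxed, acquire, release and acq_rel, and the scopes work_item, sub_group and work_group. The enum names, bit values and function names are hypothetical and are not the plugin's actual identifiers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit flags for illustration; the real plugin defines its own.
enum OrderFlag : std::uint32_t {
  kRelaxed = 1u << 0,
  kAcquire = 1u << 1,
  kRelease = 1u << 2,
  kAcqRel = 1u << 3,
  kSeqCst = 1u << 4,
};

enum ScopeFlag : std::uint32_t {
  kWorkItem = 1u << 0,
  kSubGroup = 1u << 1,
  kWorkGroup = 1u << 2,
  kDevice = 1u << 3,
  kSystem = 1u << 4,
};

// ASSUMPTION: minimum mandated orders for atomic_fence per SYCL 2020.
constexpr std::uint32_t minFenceOrderCapabilities() {
  return kRelaxed | kAcquire | kRelease | kAcqRel;
}

// ASSUMPTION: minimum mandated scopes for atomic_fence per SYCL 2020.
constexpr std::uint32_t minFenceScopeCapabilities() {
  return kWorkItem | kSubGroup | kWorkGroup;
}

// True when `flag` is set in the capability bitmask `mask`.
constexpr bool hasCapability(std::uint32_t mask, std::uint32_t flag) {
  return (mask & flag) == flag;
}

// The minimum set deliberately stops short of seq_cst.
static_assert(!hasCapability(minFenceOrderCapabilities(), kSeqCst),
              "seq_cst is not part of the assumed minimum");
```

A caller would test individual bits with hasCapability, for example to check whether acq_rel fences are reported before relying on them.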

@maarquitos14 maarquitos14 requested a review from a team as a code owner March 31, 2023 08:11
@maarquitos14 maarquitos14 requested a review from npmiller March 31, 2023 08:11
@hdelan
Contributor

hdelan commented Apr 3, 2023

Unfortunately, the minimum mandated capabilities are not supported by all CUDA arches (especially acq_rel, which is not supported on the GTX 1050, for instance). In the PR for HIP atomics I didn't include acq_rel for the same reason (https://github.com/intel/llvm/pull/8003/files#diff-b88f648055cd14bce67433ad94075c14ff434a79d971857da7cf1a5daf596dd4R1855). Would you consider removing acq_rel?

@hdelan
Contributor

hdelan commented Apr 3, 2023

This is beyond the scope of this PR, but if we want this query to return useful and correct information about a particular device, we may have to implement some logic by hand to return the correct capabilities for each arch.

I am also unsure about sub-group level atomics. Do all CUDA arches support them (natively)?
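
The per-architecture logic mentioned above could take roughly the following shape. This is only a sketch: the flag names and the function are hypothetical, and both the cutoff (compute capability major version 7) and the set reported below it are assumptions made for illustration, not values taken from this PR. The only data point from the discussion is that the GTX 1050, a compute capability 6.1 device, lacks acq_rel.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit flags for illustration; the real plugin defines its own.
enum OrderFlag : std::uint32_t {
  kRelaxed = 1u << 0,
  kAcquire = 1u << 1,
  kRelease = 1u << 2,
  kAcqRel = 1u << 3,
};

// ASSUMPTION: illustrative cutoff only. The thread establishes that a
// compute capability 6.1 device lacks acq_rel, not where support begins.
constexpr int kAssumedAcqRelMinMajor = 7;

// Returns the memory orders to report for a device with the given compute
// capability major version.
// ASSUMPTION: devices below the cutoff report everything except acq_rel.
constexpr std::uint32_t memoryOrderCapabilitiesFor(int ccMajor) {
  return ccMajor >= kAssumedAcqRelMinMajor
             ? (kRelaxed | kAcquire | kRelease | kAcqRel)
             : (kRelaxed | kAcquire | kRelease);
}

// Whatever the architecture, relaxed must always be reported.
static_assert((memoryOrderCapabilitiesFor(0) & kRelaxed) != 0,
              "relaxed is always reported");
```

Keeping the cutoff in one named constant makes it easy to correct once the real per-arch support matrix is known.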

@maarquitos14
Contributor Author

> This is beyond the scope of this PR, but if we want this query to return useful and correct information about a particular device, we may have to implement some logic by hand to return the correct capabilities for each arch.
>
> I am also unsure about sub-group level atomics. Do all CUDA arches support them (natively)?

Totally agree. As soon as I get this merged, I will open an issue for that.

@maarquitos14
Contributor Author

> Unfortunately, the minimum mandated capabilities are not supported by all CUDA arches (especially acq_rel, which is not supported on the GTX 1050, for instance). In the PR for HIP atomics I didn't include acq_rel for the same reason (https://github.com/intel/llvm/pull/8003/files#diff-b88f648055cd14bce67433ad94075c14ff434a79d971857da7cf1a5daf596dd4R1855). Would you consider removing acq_rel?

I'm not sure we can return something including less than minimum mandated capabilities, to be honest. @gmlueck what do you think?

@gmlueck
Contributor

gmlueck commented Apr 3, 2023

> I'm not sure we can return something including less than minimum mandated capabilities, to be honest. @gmlueck what do you think?

No, you cannot. All SYCL devices must support at least the required minimum capabilities when the spec says this.

I think you are asking about info::device::atomic_fence_order_capabilities here, correct? If the hardware does not natively support an acq_rel fence operation, can't you implement it with separate acquire and release fences?
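
The suggestion above can be illustrated with a host-side analogy in standard C++. This is not how the CUDA backend lowers fences; it only sketches the idea that issuing a release fence and an acquire fence back to back should give the combined guarantees of a single acq_rel fence under the C++ memory model. The function names are hypothetical.

```cpp
#include <atomic>
#include <cassert>

// Emulates an acq_rel fence with two separate fences. The release fence
// orders earlier writes before later atomic stores; the acquire fence
// orders earlier atomic loads before later reads and writes.
inline void emulatedAcqRelFence() {
  std::atomic_thread_fence(std::memory_order_release);
  std::atomic_thread_fence(std::memory_order_acquire);
}

// Single-threaded smoke test of the usual publish/consume pattern. It only
// shows where the fence sits relative to the relaxed flag accesses; it
// cannot demonstrate cross-thread ordering.
inline int publishAndRead() {
  int payload = 0;
  std::atomic<int> flag{0};

  payload = 42;                              // write the data
  emulatedAcqRelFence();                     // acts as the release side
  flag.store(1, std::memory_order_relaxed);  // publish

  if (flag.load(std::memory_order_relaxed) == 1) {
    emulatedAcqRelFence();                   // acts as the acquire side
    return payload;
  }
  return -1;
}
```

In a real two-thread setting the writer and reader would each call the fence on their own side of the flag access.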

@maarquitos14
Contributor Author

> > I'm not sure we can return something including less than minimum mandated capabilities, to be honest. @gmlueck what do you think?
>
> No, you cannot. All SYCL devices must support at least the required minimum capabilities when the spec says this.
>
> I think you are asking about info::device::atomic_fence_order_capabilities here, correct? If the hardware does not natively support an acq_rel fence operation, can't you implement it with separate acquire and release fences?

Honestly, I'm not sure if @hdelan is talking about info::device::atomic_fence_order_capabilities. I just checked HIP code and the query for info::device::atomic_fence_order_capabilities is returning minimum mandated capabilities (including acq_rel), but info::device::atomic_memory_order_capabilities is the one not including acq_rel. @hdelan can you confirm?

@gmlueck
Contributor

gmlueck commented Apr 3, 2023

> but info::device::atomic_memory_order_capabilities is the one not including acq_rel.

The SYCL spec has extremely minimal requirements for this query. The only requirement is that the device must support the relaxed memory order (which is basically a no-op).
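
The difference between the two queries can be made concrete with a small validity check. The sketch below assumes that the fence-order query must report at least relaxed, acquire, release and acq_rel, while the memory-order query must report only relaxed, as stated above. All names and bit values are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit flags for illustration; the real plugin defines its own.
enum OrderFlag : std::uint32_t {
  kRelaxed = 1u << 0,
  kAcquire = 1u << 1,
  kRelease = 1u << 2,
  kAcqRel = 1u << 3,
};

enum class Query { AtomicFenceOrder, AtomicMemoryOrder };

// ASSUMPTION: required minimum per query, following the discussion above.
constexpr std::uint32_t requiredMinimum(Query q) {
  return q == Query::AtomicFenceOrder
             ? (kRelaxed | kAcquire | kRelease | kAcqRel)
             : static_cast<std::uint32_t>(kRelaxed);
}

// True when every required bit for query `q` is present in `reported`.
constexpr bool meetsMinimum(Query q, std::uint32_t reported) {
  return (reported & requiredMinimum(q)) == requiredMinimum(q);
}

// A mask without acq_rel is acceptable for the memory-order query only.
static_assert(meetsMinimum(Query::AtomicMemoryOrder,
                           kRelaxed | kAcquire | kRelease),
              "omitting acq_rel is fine for the memory-order query");
static_assert(!meetsMinimum(Query::AtomicFenceOrder,
                            kRelaxed | kAcquire | kRelease),
              "omitting acq_rel is not fine for the fence-order query");
```

This is why leaving out acq_rel was acceptable for the memory-order query in the HIP backend but would not be for the fence-order query in this PR.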

@hdelan
Contributor

hdelan commented Apr 3, 2023

Aha, sorry, I mistakenly conflated this with atomic_memory_order_capabilities. Apologies. LGTM

Contributor

@npmiller npmiller left a comment

LGTM

@bader bader merged commit 82ac98f into intel:sycl Apr 3, 2023
5 participants