[Pending on SYCL IPC] Add symmetric memory support on XPU device #2041
Open
zhangxiaoli73 wants to merge 11 commits into main from cherry/add-symm-xpu
Conversation
- Update UT result check with XML info
- Add reproduce command for UT
- Add UT test case number check
disable_e2e
Co-authored-by: mengfei25 <[email protected]>
1. Use cache in container for datasets and models
2. Fix np.bool8 issue in soft_actor_critic
3. Fix microbench test reference issue
4. Remove inductor test in nightly
5. Use nightly wheel in CI if a build is not necessary
disable_build
- Set a new max job count to accelerate the build
- Separate the UT test from the result check, aligning with the Linux test
disable_e2e disable_distribute
1. Enable tests in container
2. Use local Python instead of conda
3. Enable pytest parallel runs and continue-if-crash
4. Use pytest-xdist to parallelize tests instead of pytest-shard on an 8-card system
5. Run all tests on the rolling driver; test accelerate and transformers only
disable_build disable_ut disable_e2e disable_distributed
Follows #1883: shape [4096,256,6,6] in channels-last with output shape [6,6] in torchbench alexnet gets a ~4x improvement on BMG (see the sketch below).
Co-authored-by: Copilot <[email protected]>
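For context, a minimal reproduction sketch of that case. It assumes the op in question is the adaptive average pooling stage of torchvision's AlexNet (whose classifier consumes a 256×6×6 feature map); the op choice and the `"xpu"` device string are assumptions, not stated in the commit.

```python
import torch
import torch.nn.functional as F

# Hypothetical repro of the commit's case: batch 4096, 256 channels,
# 6x6 spatial, in channels-last layout. "xpu" assumes an XPU-enabled
# PyTorch build; the op (adaptive average pooling, as in AlexNet's
# avgpool) is inferred from the shapes given in the commit message.
x = torch.randn(4096, 256, 6, 6, device="xpu").to(
    memory_format=torch.channels_last
)
y = F.adaptive_avg_pool2d(x, output_size=(6, 6))  # output spatial shape [6, 6]
print(y.shape)  # torch.Size([4096, 256, 6, 6])
```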
Refer to #2019: support allow_inflight_collective_as_graph unregister.
Replace the deprecated `[[intel::reqd_sub_group_size(SgSize)]]` attribute with `[[sycl::reqd_sub_group_size(SIMD)]]` and remove unnecessary attributes (#1828)

### Summary
This PR updates the codebase to replace the deprecated `[[intel::reqd_sub_group_size(SgSize)]]` attribute with the new `[[sycl::reqd_sub_group_size(SIMD)]]` attribute. Additionally, the attribute has been removed from certain locations where it was deemed unnecessary. These changes also aim to reduce the number of warnings, thereby decreasing the log size.

### Changes
1. **Attribute Replacement**: Replaced all instances of `[[intel::reqd_sub_group_size(SgSize)]]` with `[[sycl::reqd_sub_group_size(SIMD)]]` to align with the latest SYCL specification and avoid using deprecated attributes.
2. **Attribute Removal**: Removed the `[[sycl::reqd_sub_group_size(SIMD)]]` attribute from functions and kernels where it was not necessary, simplifying the code and avoiding redundant specifications.

Co-authored-by: guangyey <[email protected]>
Co-authored-by: Yutao Xu <[email protected]>
Co-authored-by: Tomasz Socha <[email protected]>
PyTorch provides symmetric memory support on CUDA devices. Accordingly, we would like to provide a similar feature on XPU devices.
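As a rough sketch of what the user-facing flow could look like, modeled on PyTorch's existing CUDA symmetric-memory frontend (`torch.distributed._symmetric_memory`): the `"xccl"` backend name, the `xpu` device strings, and reuse of the same `empty()` / `rendezvous()` / `get_buffer()` entry points are assumptions, not confirmed by this PR.

```python
import torch
import torch.distributed as dist
import torch.distributed._symmetric_memory as symm_mem

# Illustrative setup: one rank per XPU device. The "xccl" backend name
# and XPU device strings are assumptions about the XPU stack.
dist.init_process_group(backend="xccl")
rank = dist.get_rank()
torch.xpu.set_device(rank)

# Allocate from the symmetric-memory allocator. On CUDA this is
# symm_mem.empty(...); we assume the XPU backend reuses the same
# frontend entry point.
t = symm_mem.empty(4096, dtype=torch.bfloat16, device=f"xpu:{rank}")

# rendezvous() exchanges IPC handles across ranks -- the step gated on
# SYCL IPC support in this PR's title -- so each rank can map its
# peers' buffers.
hdl = symm_mem.rendezvous(t, group=dist.group.WORLD)

# Peer buffers can then be read directly, e.g. from a custom kernel or
# a copy; the handle API is assumed to mirror the CUDA one.
peer = (rank + 1) % dist.get_world_size()
peer_buf = hdl.get_buffer(peer, t.shape, t.dtype)

dist.destroy_process_group()
```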