zhangxiaoli73
Contributor

PyTorch provides symmetric memory support on CUDA devices.

Accordingly, we would like to provide a similar feature on XPU devices.

…_bundle (#1935)" (#2026) (#2035)

Cherry-pick #2026 to solve #2025.
This reverts commit 83a1555 to quickly
bypass the performance regression caused by usage of
sycl::get_kernel_bundle.
@zhangxiaoli73 changed the title from "Add symmetric memory support on XPU device" to "[Pending on SYCL IPC] Add symmetric memory support on XPU device" on Sep 17, 2025
…ocessGroupXCCL (#2077)

See #2076; this is a cherry-pick for the 2.9 release.

This is a high-impact bug: in many distributed applications it causes a
large memory leak that results in OOM errors (see #2084).
RUIJIEZHONG66166 and others added 9 commits September 22, 2025 10:49
- Update the UT result check to use XML info
- Add a reproduce command for UT
- Add a check on the number of UT test cases
disable_e2e

---------

Co-authored-by: mengfei25 <[email protected]>
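The XML-based UT result check and the test case number check described in the commit above could look roughly like the following. This is a minimal sketch, not the script used in the PR: it assumes the reports are JUnit-style XML (for example, as written by `pytest --junitxml`), and the function name `summarize_junit`, the `min_expected_tests` parameter, and the sample report are all hypothetical.

```python
import xml.etree.ElementTree as ET


def summarize_junit(xml_text, min_expected_tests):
    """Sum the counters of a JUnit-style XML report and flag suspicious runs.

    Hypothetical helper: the name and the threshold parameter are illustrative.
    """
    root = ET.fromstring(xml_text)
    # Reports may have a <testsuites> wrapper or a bare <testsuite> root.
    suites = [root] if root.tag == "testsuite" else root.findall("testsuite")

    totals = {"tests": 0, "failures": 0, "errors": 0, "skipped": 0}
    for suite in suites:
        for key in totals:
            totals[key] += int(suite.get(key, 0))

    # The run passes only if nothing failed AND enough test cases were
    # collected; the second condition catches silently truncated runs.
    totals["ok"] = (
        totals["failures"] == 0
        and totals["errors"] == 0
        and totals["tests"] >= min_expected_tests
    )
    return totals


# Made-up sample report for illustration only.
SAMPLE = """<testsuites>
  <testsuite name="pytest" tests="3" failures="1" errors="0" skipped="1"/>
</testsuites>"""

summary = summarize_junit(SAMPLE, min_expected_tests=3)
print(summary)
```

Checking the case count alongside failures means a run that crashed early and reported only a handful of passing tests is not mistaken for a clean run.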
1. use cache in container for datasets and models
2. fix np.bool8 issue in soft_actor_critic
3. fix microbench test reference issue
4. remove inductor test in nightly
5. use nightly wheel in CI if build not necessary

disable_build
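For context on item 2 in the commit above: `np.bool8` was a deprecated alias that NumPy 2.0 removed, so code referring to it fails with an `AttributeError` on newer NumPy. The commit does not show the actual patch; the sketch below only illustrates the usual fix of switching to `np.bool_`, and the helper name `to_bool_array` is hypothetical.

```python
import numpy as np


def to_bool_array(values):
    """Convert values to a boolean array without using the removed np.bool8 alias.

    Hypothetical helper for illustration; np.bool_ is the canonical NumPy
    boolean scalar type and is available in both NumPy 1.x and 2.x.
    """
    return np.asarray(values, dtype=np.bool_)


# Example: episode "done" flags stored as integers.
done = to_bool_array([0, 1, 1])
print(done.dtype)
```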
- Set a new max job count to accelerate the build
- Separate the UT run from the result check, in line with the Linux tests

disable_e2e
disable_distribute
1. enable test in container
2. use local python instead of conda
3. enable pytest parallel run and continue-if-crash
4. use pytest-xdist to parallelize tests instead of pytest-shard on an
8-card system
5. all tests on rolling driver

test accelerate and transformers only
disable_build
disable_ut
disable_e2e
disable_distributed
Follows #1883. For shape [4096,256,6,6] in channels-last format with output
shape [6,6], as in torchbench alexnet, this can give a ~4x improvement on BMG.

---------

Co-authored-by: Copilot <[email protected]>
Refers to #2019: adds support for
allow_inflight_collective_as_graph unregister.
…ycl::reqd_sub_group_size(SIMD)]]` and remove unnecessary attributes (#1828)

### Summary

This PR updates the codebase to replace the deprecated
`[[intel::reqd_sub_group_size(SgSize)]]` attribute with the new
`[[sycl::reqd_sub_group_size(SIMD)]]` attribute. Additionally, the
attribute has been removed from certain locations where it was deemed
unnecessary. These changes also aim to reduce the number of warnings,
thereby decreasing the log size.

### Changes

1. **Attribute Replacement**:
- Replaced all instances of `[[intel::reqd_sub_group_size(SgSize)]]`
with `[[sycl::reqd_sub_group_size(SIMD)]]` to align with the latest SYCL
specification and avoid using deprecated attributes.

2. **Attribute Removal**:
- Removed the `[[sycl::reqd_sub_group_size(SIMD)]]` attribute from
functions and kernels where it was not necessary. This was done to
simplify the code and avoid redundant specifications.

Co-authored-by: guangyey <[email protected]>
Co-authored-by: Yutao Xu <[email protected]>
Co-authored-by: Tomasz Socha <[email protected]>