[Pending on SYCL IPC] Add symmetric memory support on XPU device #2041

zhangxiaoli73 · 2025-09-15T07:04:49Z

PyTorch provides symmetric memory support on CUDA devices.

Accordingly, we would like to provide a similar feature on XPU devices.

1. enable test in container
2. use local python instead of conda
3. enable pytest parallel run and continue-if-crash
4. use pytest-xdist to parallelize tests instead of pytest-shard on an 8-card system
5. all tests on rolling driver test accelerate and transformers only

disable_build disable_ut disable_e2e disable_distributed

…ycl::reqd_sub_group_size(SIMD)]]` and remove unnecessary attributes (#1828)

### Summary

This PR updates the codebase to replace the deprecated `[[intel::reqd_sub_group_size(SgSize)]]` attribute with the new `[[sycl::reqd_sub_group_size(SIMD)]]` attribute. Additionally, the attribute has been removed from certain locations where it was deemed unnecessary. These changes also aim to reduce the number of warnings, thereby decreasing the log size.

### Changes

1. **Attribute Replacement**:
   - Replaced all instances of `[[intel::reqd_sub_group_size(SgSize)]]` with `[[sycl::reqd_sub_group_size(SIMD)]]` to align with the latest SYCL specification and avoid using deprecated attributes.
2. **Attribute Removal**:
   - Removed the `[[sycl::reqd_sub_group_size(SIMD)]]` attribute from functions and kernels where it was not necessary. This was done to simplify the code and avoid redundant specifications.

Co-authored-by: guangyey <[email protected]>
Co-authored-by: Yutao Xu <[email protected]>
Co-authored-by: Tomasz Socha <[email protected]>

[Release/2.9] Revert "Roll back to original usage of sycl::get_kernel…

f8408a6

…_bundle (#1935)" (#2026) (#2035)

Cherry-pick #2026 to solve #2025. This reverts commit 83a1555 to quickly bypass the performance regression caused by usage of sycl::get_kernel_bundle.

zhangxiaoli73 requested review from Chao1Han and gujinghui September 15, 2025 07:05

zhangxiaoli73 changed the title ~~Add symmetric memory support on XPU device~~ [Pending on SYCL IPC] Add symmetric memory support on XPU device Sep 17, 2025

[release/2.9] Revert tracking of Work status for FlightRecorder in Pr…

789f59d

…ocessGroupXCCL (#2077)

See #2076; this is a cherry-pick for the 2.9 release. This is a high-impact bug: in many distributed applications it causes a large memory leak resulting in OOM errors (see #2084).

zhangxiaoli73 force-pushed the cherry/add-symm-xpu branch from 76b7465 to 2ac439c September 22, 2025 02:21

RUIJIEZHONG66166 and others added 9 commits September 22, 2025 10:49

[CI] Update UT result check and add Reproduce Command for UT (#1984)

ec5b214

- Update UT result check with xml info
- Add reproduce command for UT
- Add UT test case number check

disable_e2e

---------

Co-authored-by: mengfei25 <[email protected]>

[CI] Modify test workflows (#2018)

089aeac

1. use cache in container for datasets and models
2. fix np.bool8 issue in soft_actor_critic
3. fix microbench test reference issue
4. remove inductor test in nightly
5. use nightly wheel in CI if build not necessary

disable_build

[CI] Enhance Windows CI (#1990)

9fd8e44

- Set new max job for accelerating build
- Separate the UT test and result check, which aligns with the Linux test

disable_e2e disable_distribute

optimize adaptive avg pool (#2012)

032699a

Follows #1883. For input shape [4096,256,6,6] in channels-last format with output shape [6,6] (TorchBench AlexNet), this gives a ~4x improvement on BMG.

---------

Co-authored-by: Copilot <[email protected]>
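For context on the shape in this commit: assuming PyTorch's usual adaptive-pooling window rule, applied per spatial axis (start = floor(i * in / out), end = ceil((i + 1) * in / out)), a 6x6 input pooled to a 6x6 output gives every output element a window of exactly one input element, so the op reduces to a copy. The sketch below only illustrates that arithmetic; `adaptive_windows` and `adaptive_avg_pool_1d` are hypothetical helpers, not the XPU kernel changed in #2012.

```python
"""Adaptive average pooling window arithmetic (illustrative sketch only)."""


def adaptive_windows(in_size: int, out_size: int) -> list[tuple[int, int]]:
    """Return the half-open input range [start, end) pooled by each output index.

    Assumes the rule start = floor(i * in / out), end = ceil((i + 1) * in / out),
    applied independently to each spatial axis.
    """
    windows = []
    for i in range(out_size):
        start = (i * in_size) // out_size
        end = -((-(i + 1) * in_size) // out_size)  # integer ceiling division
        windows.append((start, end))
    return windows


def adaptive_avg_pool_1d(values: list[float], out_size: int) -> list[float]:
    """Hypothetical helper: average each window along one axis."""
    return [
        sum(values[start:end]) / (end - start)
        for start, end in adaptive_windows(len(values), out_size)
    ]


if __name__ == "__main__":
    # 6 -> 6: every window has width 1, so pooling is an identity copy.
    print(adaptive_windows(6, 6))
    # prints [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]

    # 13 -> 6: windows have width 3 and overlap, e.g. (0, 3) and (2, 5).
    print(adaptive_windows(13, 6))
```

Because the rule is applied to each axis independently, the same width-1 result holds for both 6-sized spatial dimensions of the [4096,256,6,6] tensor.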

add unregister wait_tensor (#2019)

d31563a

Refer to #2019; support allow_inflight_collective_as_graph unregister.

support symm memory on XPU devices

a8f5ba7

wa for d2d copy

2ac439c
