Status: Closed
Labels: bug (Something isn't working), module: distributed (For distributed feature issue)
Description
🐛 Describe the bug
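To reproduce, download and install the wheel from https://github.com/intel/torch-xpu-ops/actions/runs/16826215961
(or fetch it with the GitHub CLI as below), then run the test from the distributed_2.9 branch: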
gh run download 16826215961 --repo intel/torch-xpu-ops --name Torch-XPU-Wheel-1826 --dir path --pattern "*.zip"
git clone -b distributed_2.9 https://github.com/daisyden/pytorch.git
cd pytorch
pip install pytest expecttest zstandard
pip install -r requirements.txt
pytest -v test/distributed/checkpoint/test_utils.py::TestDistWrapper::test_barrier
Failing runs time out with:
Traceback (most recent call last):
  File "/home/jenkins/.conda/envs/xpu_op_/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 716, in wrapper
    self._join_processes(fn)
  File "/home/jenkins/.conda/envs/xpu_op_/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 980, in _join_processes
    self._check_return_codes(fn, elapsed_time)
  File "/home/jenkins/.conda/envs/xpu_op_/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1025, in _check_return_codes
    raise RuntimeError(
RuntimeError: Process 0 terminated or timed out after 300.01091861724854 seconds
The test is flaky: the pass rate is around 50%.
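To put a number on the flakiness, a small helper like the one below can rerun the test several times and report the observed pass rate. This is a sketch, not part of the original report: `pass_rate` and `TEST_ID` are hypothetical names, and it assumes pytest is installed in the same environment as the XPU wheel and that it is run from the root of the cloned pytorch checkout.

```python
import subprocess
import sys

# Hypothetical constant: the flaky test from this report.
TEST_ID = "test/distributed/checkpoint/test_utils.py::TestDistWrapper::test_barrier"


def pass_rate(cmd, runs):
    """Run `cmd` `runs` times and return the fraction of runs that exit with code 0."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    passed = 0
    for i in range(runs):
        # Discard output; only the exit code matters for the pass/fail tally.
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        ok = result.returncode == 0
        if ok:
            passed += 1
        print(f"run {i + 1}/{runs}: {'PASS' if ok else 'FAIL'}")
    return passed / runs
```

For example, `pass_rate([sys.executable, "-m", "pytest", "-q", TEST_ID], 20)` runs the test 20 times. Because each failing run blocks until the 300-second timeout, a batch of runs can take a long time.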
Versions
PyTorch: https://github.com/daisyden/pytorch/tree/distributed_2.9