Updated test logging and timeouts #697
Triggered via pull request, June 5, 2026 15:41
Status: Cancelled
Total duration: 3h 57m 35s
Artifacts: 7
rocm-ci-dispatch.yml
on: pull_request
determine_level: 5s
CI Level 3 / Select Docker Image: 5s
CI Level 3 / ... / Build ROCm Docker image and TransformerEngine wheels: 30m 43s
Matrix: dispatch / mgpu_tests
Matrix: dispatch / sgpu_tests
Annotations
15 errors, 8 warnings, and 1 notice
|
failed: tests.pytorch.attention.test_attention_with_cp
test_cp_with_fused_attention[False-None-False-False-False-p2p-thd-cp_1_1-bf16]::subprocess.CalledProcessError: Command '['python3', '-m', 'torch.distributed.launch', '--nproc-per-node=2', '/workspace/tests/pytorch/attention/run_attention_with_cp.py', 'dtype=bf16', 'model=cp_1_1', 'qkv_format=thd', 'kernel_backend=FusedAttention', 'cp_comm_type=p2p', 'fp8_bwd=False', 'fp8_dpa=False', 'fp8_mha=False', 'scaling_mode=None', 'f16_O=False', 'is_training=True', 'log_level=WARNING']' returned non-zero exit status 1.
|
|
failed: tests.pytorch.attention.test_attention_with_cp
test_cp_with_fused_attention[False-None-False-False-False-p2p-thd-cp_1_0-bf16]::subprocess.CalledProcessError: Command '['python3', '-m', 'torch.distributed.launch', '--nproc-per-node=2', '/workspace/tests/pytorch/attention/run_attention_with_cp.py', 'dtype=bf16', 'model=cp_1_0', 'qkv_format=thd', 'kernel_backend=FusedAttention', 'cp_comm_type=p2p', 'fp8_bwd=False', 'fp8_dpa=False', 'fp8_mha=False', 'scaling_mode=None', 'f16_O=False', 'is_training=True', 'log_level=WARNING']' returned non-zero exit status 1.
|
|
failed: tests.pytorch.attention.test_attention_with_cp
test_cp_with_fused_attention[False-None-False-False-False-p2p-sbhd-cp_3_4-bf16]::subprocess.CalledProcessError: Command '['python3', '-m', 'torch.distributed.launch', '--nproc-per-node=2', '/workspace/tests/pytorch/attention/run_attention_with_cp.py', 'dtype=bf16', 'model=cp_3_4', 'qkv_format=sbhd', 'kernel_backend=FusedAttention', 'cp_comm_type=p2p', 'fp8_bwd=False', 'fp8_dpa=False', 'fp8_mha=False', 'scaling_mode=None', 'f16_O=False', 'is_training=True', 'log_level=WARNING']' returned non-zero exit status 1.
|
|
failed: tests.pytorch.attention.test_attention_with_cp
test_cp_with_fused_attention[False-None-False-False-False-p2p-sbhd-cp_3_2-bf16]::subprocess.CalledProcessError: Command '['python3', '-m', 'torch.distributed.launch', '--nproc-per-node=2', '/workspace/tests/pytorch/attention/run_attention_with_cp.py', 'dtype=bf16', 'model=cp_3_2', 'qkv_format=sbhd', 'kernel_backend=FusedAttention', 'cp_comm_type=p2p', 'fp8_bwd=False', 'fp8_dpa=False', 'fp8_mha=False', 'scaling_mode=None', 'f16_O=False', 'is_training=True', 'log_level=WARNING']' returned non-zero exit status 1.
|
|
failed: tests.pytorch.attention.test_attention_with_cp
test_cp_with_fused_attention[False-None-False-False-False-p2p-sbhd-cp_2_3-bf16]::subprocess.CalledProcessError: Command '['python3', '-m', 'torch.distributed.launch', '--nproc-per-node=2', '/workspace/tests/pytorch/attention/run_attention_with_cp.py', 'dtype=bf16', 'model=cp_2_3', 'qkv_format=sbhd', 'kernel_backend=FusedAttention', 'cp_comm_type=p2p', 'fp8_bwd=False', 'fp8_dpa=False', 'fp8_mha=False', 'scaling_mode=None', 'f16_O=False', 'is_training=True', 'log_level=WARNING']' returned non-zero exit status 1.
|
|
failed: tests.pytorch.attention.test_attention_with_cp
test_cp_with_fused_attention[False-None-False-False-False-p2p-sbhd-cp_2_2-bf16]::subprocess.CalledProcessError: Command '['python3', '-m', 'torch.distributed.launch', '--nproc-per-node=2', '/workspace/tests/pytorch/attention/run_attention_with_cp.py', 'dtype=bf16', 'model=cp_2_2', 'qkv_format=sbhd', 'kernel_backend=FusedAttention', 'cp_comm_type=p2p', 'fp8_bwd=False', 'fp8_dpa=False', 'fp8_mha=False', 'scaling_mode=None', 'f16_O=False', 'is_training=True', 'log_level=WARNING']' returned non-zero exit status 1.
|
|
failed: tests.pytorch.attention.test_attention_with_cp
test_cp_with_fused_attention[False-None-False-False-False-p2p-sbhd-cp_2_0-bf16]::subprocess.CalledProcessError: Command '['python3', '-m', 'torch.distributed.launch', '--nproc-per-node=2', '/workspace/tests/pytorch/attention/run_attention_with_cp.py', 'dtype=bf16', 'model=cp_2_0', 'qkv_format=sbhd', 'kernel_backend=FusedAttention', 'cp_comm_type=p2p', 'fp8_bwd=False', 'fp8_dpa=False', 'fp8_mha=False', 'scaling_mode=None', 'f16_O=False', 'is_training=True', 'log_level=WARNING']' returned non-zero exit status 1.
|
|
failed: tests.pytorch.attention.test_attention_with_cp
test_cp_with_fused_attention[False-None-False-False-False-p2p-sbhd-cp_1_4-bf16]::subprocess.CalledProcessError: Command '['python3', '-m', 'torch.distributed.launch', '--nproc-per-node=2', '/workspace/tests/pytorch/attention/run_attention_with_cp.py', 'dtype=bf16', 'model=cp_1_4', 'qkv_format=sbhd', 'kernel_backend=FusedAttention', 'cp_comm_type=p2p', 'fp8_bwd=False', 'fp8_dpa=False', 'fp8_mha=False', 'scaling_mode=None', 'f16_O=False', 'is_training=True', 'log_level=WARNING']' returned non-zero exit status 1.
|
|
failed: tests.pytorch.attention.test_attention_with_cp
test_cp_with_fused_attention[False-None-False-False-False-p2p-sbhd-cp_1_1-bf16]::subprocess.CalledProcessError: Command '['python3', '-m', 'torch.distributed.launch', '--nproc-per-node=2', '/workspace/tests/pytorch/attention/run_attention_with_cp.py', 'dtype=bf16', 'model=cp_1_1', 'qkv_format=sbhd', 'kernel_backend=FusedAttention', 'cp_comm_type=p2p', 'fp8_bwd=False', 'fp8_dpa=False', 'fp8_mha=False', 'scaling_mode=None', 'f16_O=False', 'is_training=True', 'log_level=WARNING']' returned non-zero exit status 1.
|
|
failed: tests.pytorch.attention.test_attention_with_cp
test_cp_with_fused_attention[False-None-False-False-False-p2p-sbhd-cp_1_0-bf16]::subprocess.CalledProcessError: Command '['python3', '-m', 'torch.distributed.launch', '--nproc-per-node=2', '/workspace/tests/pytorch/attention/run_attention_with_cp.py', 'dtype=bf16', 'model=cp_1_0', 'qkv_format=sbhd', 'kernel_backend=FusedAttention', 'cp_comm_type=p2p', 'fp8_bwd=False', 'fp8_dpa=False', 'fp8_mha=False', 'scaling_mode=None', 'f16_O=False', 'is_training=True', 'log_level=WARNING']' returned non-zero exit status 1.
|
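The failure annotations above each embed the launched command as a Python-style list inside the `CalledProcessError` message. To rerun one case by hand, a minimal sketch such as the following could recover the argument list and the `key=value` overrides. The helper names (`parse_failed_command`, `extract_overrides`) are hypothetical and not part of the test suite, and the sample message is shortened from the first annotation for illustration.

```python
import ast
import re
import shlex

# Hypothetical helper pattern: matches the text produced by subprocess.CalledProcessError.
CMD_RE = re.compile(r"Command '(\[.*\])' returned non-zero exit status (\d+)")


def parse_failed_command(message):
    """Return (argv, exit_status) from an annotation message, or None if absent."""
    match = CMD_RE.search(message)
    if match is None:
        return None
    # The command is printed as a Python list literal, so literal_eval is enough.
    argv = ast.literal_eval(match.group(1))
    return argv, int(match.group(2))


def extract_overrides(argv):
    """Collect the key=value arguments (skipping --flags) into a dict."""
    return dict(
        arg.split("=", 1)
        for arg in argv
        if "=" in arg and not arg.startswith("-")
    )


# Shortened sample (assumption: fewer overrides than the real annotation).
example = (
    "subprocess.CalledProcessError: Command '['python3', '-m', "
    "'torch.distributed.launch', '--nproc-per-node=2', "
    "'/workspace/tests/pytorch/attention/run_attention_with_cp.py', "
    "'dtype=bf16', 'model=cp_1_1', 'qkv_format=thd', 'cp_comm_type=p2p']' "
    "returned non-zero exit status 1."
)

argv, status = parse_failed_command(example)
overrides = extract_overrides(argv)
# shlex.join produces a shell-safe command line for manual reproduction.
rerun_command = shlex.join(argv)
```

Grouping the parsed `overrides` across annotations (for example by `qkv_format` or `model`) may help show whether the failures share a configuration; all ten listed here use `cp_comm_type=p2p`, `kernel_backend=FusedAttention`, and `dtype=bf16`.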
|
CI Level 3 / mGPU Torch (mi35x)
Process completed with exit code 1.
|
|
CI Level 3 / mGPU Torch (mi35x)
PyTorch mGPU tests FAILED.
|
|
CI Level 3 / sGPU Tests (mi35x)
Canceling since a higher priority waiting request for PR Automatic CI-refs/pull/608/merge exists
|
|
CI Level 3 / sGPU Tests (mi35x)
The operation was canceled.
|
|
PR Automatic CI
Canceling since a higher priority waiting request for PR Automatic CI-refs/pull/608/merge exists
|
|
CI Level 3 / build / Build ROCm Docker image and TransformerEngine wheels
Node.js 20 actions are deprecated. The following actions are running on Node.js 20 and may not work as expected: actions/upload-artifact@v4. Actions will be forced to run with Node.js 24 by default starting June 16th, 2026. Node.js 20 will be removed from the runner on September 16th, 2026. Please check if updated versions of these actions are available that support Node.js 24. To opt into Node.js 24 now, set the FORCE_JAVASCRIPT_ACTIONS_TO_NODE24=true environment variable on the runner or in your workflow file. Once Node.js 24 becomes the default, you can temporarily opt out by setting ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION=true. For more information see: https://github.blog/changelog/2025-09-19-deprecation-of-node-20-on-github-actions-runners/
|
|
CI Level 3 / mGPU JAX (mi35x)
Node.js 20 actions are deprecated. The following actions are running on Node.js 20 and may not work as expected: actions/download-artifact@v4, actions/upload-artifact@v4. Actions will be forced to run with Node.js 24 by default starting June 16th, 2026. Node.js 20 will be removed from the runner on September 16th, 2026. Please check if updated versions of these actions are available that support Node.js 24. To opt into Node.js 24 now, set the FORCE_JAVASCRIPT_ACTIONS_TO_NODE24=true environment variable on the runner or in your workflow file. Once Node.js 24 becomes the default, you can temporarily opt out by setting ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION=true. For more information see: https://github.blog/changelog/2025-09-19-deprecation-of-node-20-on-github-actions-runners/
|
|
CI Level 3 / mGPU JAX (mi30x)
Node.js 20 actions are deprecated. The following actions are running on Node.js 20 and may not work as expected: actions/download-artifact@v4, actions/upload-artifact@v4. Actions will be forced to run with Node.js 24 by default starting June 16th, 2026. Node.js 20 will be removed from the runner on September 16th, 2026. Please check if updated versions of these actions are available that support Node.js 24. To opt into Node.js 24 now, set the FORCE_JAVASCRIPT_ACTIONS_TO_NODE24=true environment variable on the runner or in your workflow file. Once Node.js 24 becomes the default, you can temporarily opt out by setting ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION=true. For more information see: https://github.blog/changelog/2025-09-19-deprecation-of-node-20-on-github-actions-runners/
|
|
CI Level 3 / mGPU Torch (mi30x)
Node.js 20 actions are deprecated. The following actions are running on Node.js 20 and may not work as expected: actions/download-artifact@v4, actions/upload-artifact@v4. Actions will be forced to run with Node.js 24 by default starting June 16th, 2026. Node.js 20 will be removed from the runner on September 16th, 2026. Please check if updated versions of these actions are available that support Node.js 24. To opt into Node.js 24 now, set the FORCE_JAVASCRIPT_ACTIONS_TO_NODE24=true environment variable on the runner or in your workflow file. Once Node.js 24 becomes the default, you can temporarily opt out by setting ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION=true. For more information see: https://github.blog/changelog/2025-09-19-deprecation-of-node-20-on-github-actions-runners/
|
|
CI Level 3 / mGPU Torch (mi35x)
Node.js 20 actions are deprecated. The following actions are running on Node.js 20 and may not work as expected: actions/download-artifact@v4, actions/upload-artifact@v4. Actions will be forced to run with Node.js 24 by default starting June 16th, 2026. Node.js 20 will be removed from the runner on September 16th, 2026. Please check if updated versions of these actions are available that support Node.js 24. To opt into Node.js 24 now, set the FORCE_JAVASCRIPT_ACTIONS_TO_NODE24=true environment variable on the runner or in your workflow file. Once Node.js 24 becomes the default, you can temporarily opt out by setting ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION=true. For more information see: https://github.blog/changelog/2025-09-19-deprecation-of-node-20-on-github-actions-runners/
|
|
CI Level 3 / mGPU Torch (mi35x)
31 more failures omitted from annotations; see the job summary for the full list.
|
|
CI Level 3 / sGPU Tests (mi30x)
Node.js 20 actions are deprecated. The following actions are running on Node.js 20 and may not work as expected: actions/download-artifact@v4, actions/upload-artifact@v4. Actions will be forced to run with Node.js 24 by default starting June 16th, 2026. Node.js 20 will be removed from the runner on September 16th, 2026. Please check if updated versions of these actions are available that support Node.js 24. To opt into Node.js 24 now, set the FORCE_JAVASCRIPT_ACTIONS_TO_NODE24=true environment variable on the runner or in your workflow file. Once Node.js 24 becomes the default, you can temporarily opt out by setting ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION=true. For more information see: https://github.blog/changelog/2025-09-19-deprecation-of-node-20-on-github-actions-runners/
|
|
CI Level 3 / sGPU Tests (mi35x)
Node.js 20 actions are deprecated. The following actions are running on Node.js 20 and may not work as expected: actions/download-artifact@v4, actions/upload-artifact@v4. Actions will be forced to run with Node.js 24 by default starting June 16th, 2026. Node.js 20 will be removed from the runner on September 16th, 2026. Please check if updated versions of these actions are available that support Node.js 24. To opt into Node.js 24 now, set the FORCE_JAVASCRIPT_ACTIONS_TO_NODE24=true environment variable on the runner or in your workflow file. Once Node.js 24 becomes the default, you can temporarily opt out by setting ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION=true. For more information see: https://github.blog/changelog/2025-09-19-deprecation-of-node-20-on-github-actions-runners/
|
|
CI Level 3 / Select Docker Image
Using ci/ci_config.json from dev
|
Artifacts
Produced during runtime
| Name | Size | Digest |
|---|---|---|
| logs-mgpu-mi30x-jax (Expired) | 48.8 KB | sha256:37d4cdc0e4ddc5079e385e2d54c99c0d6ccbe6535686a1b4c69c45daf92bf150 |
| logs-mgpu-mi30x-pytorch (Expired) | 113 KB | sha256:12bed27efb59fedaa54b6cbf2a42e82b2c9a328cbcdc5cbfc43fda9911b7589e |
| logs-mgpu-mi35x-jax (Expired) | 53 KB | sha256:080befbd928585bdfdf04237b8153e42741c9cbfbae19150acca03d9cba5e8ea |
| logs-mgpu-mi35x-pytorch (Expired) | 137 KB | sha256:4ebb64d83ef85b57e50b649bef660af925032db4e3743116203dd8f727462f26 |
| logs-sgpu-mi30x (Expired) | 3.42 MB | sha256:a0337a44069940d31dde88467c7683bc82bb2af6997394c1bfb9d5e5b4ef3037 |
| logs-sgpu-mi35x (Expired) | 2.71 MB | sha256:974bb7e5ea0478f02e6fe25e2977aabe809d782cc17e4a9dde6c22228f65fd2b |
| te-rocm-wheels (Expired) | 722 MB | sha256:a4c4f3320051701647fb7b52f9072752d789dd6e0da4c9e09b4356edd09eaf87 |
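Each artifact above is listed with a `sha256:` digest. For a copy of an artifact saved before it expired, a check along these lines could confirm it matches the listed value. This is a sketch that assumes the digest is computed over the raw bytes of the downloaded archive; the function name `matches_digest` is hypothetical.

```python
import hashlib


def matches_digest(data: bytes, digest: str) -> bool:
    """Compare bytes against a digest string of the form 'sha256:<hex>'."""
    algorithm, _, expected = digest.partition(":")
    if algorithm != "sha256" or not expected:
        raise ValueError(f"unsupported digest format: {digest!r}")
    return hashlib.sha256(data).hexdigest() == expected.lower()


# Self-check with the well-known SHA-256 of empty input.
EMPTY_SHA256 = "sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
empty_ok = matches_digest(b"", EMPTY_SHA256)
```

For a large archive such as the 722 MB wheels artifact, reading the file in chunks and feeding each chunk to `hashlib.sha256().update()` would avoid loading it all into memory.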