Ipanfilo/ci test fixes #775
rocm-ci-dispatch.yml
on: pull_request
determine_level (4s)
CI Level 3 / Select Docker Image (5s)
CI Level 3 / ... / Build ROCm Docker image and TransformerEngine wheels (23m 27s)
Matrix: dispatch / mgpu_tests
Matrix: dispatch / sgpu_tests
Annotations
13 errors and 3 warnings
CI Level 3 / sGPU Tests (mi35x)
Process completed with exit code 1.
CI Level 3 / sGPU Tests (mi35x)
torch tests FAILED.
failed: tests.pytorch.triton_kernels.test_grouped_gemm
test_tgmm[rng77-tlhsT-obf16-ibf16-512-4096-2048-160-False-p]::RuntimeError: CUDA error: HIPBLAS_STATUS_ALLOC_FAILED when calling `hipblasCreate(handle)`
failed: tests.pytorch.triton_kernels.test_grouped_gemm
test_tgmm[rng77-tlhsT-obf16-ibf16-512-4096-2048-160-False-np]::RuntimeError: CUDA error: HIPBLAS_STATUS_ALLOC_FAILED when calling `hipblasCreate(handle)`
failed: tests.pytorch.triton_kernels.test_grouped_gemm
test_tgmm[rng77-tlhsT-obf16-ibf16-32-16-8-4-True-p]::RuntimeError: CUDA error: HIPBLAS_STATUS_ALLOC_FAILED when calling `hipblasCreate(handle)`
failed: tests.pytorch.triton_kernels.test_grouped_gemm
test_tgmm[rng77-tlhsT-obf16-ibf16-32-16-8-4-True-np]::RuntimeError: CUDA error: HIPBLAS_STATUS_ALLOC_FAILED when calling `hipblasCreate(handle)`
failed: tests.pytorch.triton_kernels.test_grouped_gemm
test_tgmm[rng77-tlhsT-obf16-ibf16-32-16-8-4-False-p]::RuntimeError: CUDA error: HIPBLAS_STATUS_ALLOC_FAILED when calling `hipblasCreate(handle)`
failed: tests.pytorch.triton_kernels.test_grouped_gemm
test_tgmm[rng77-tlhsT-obf16-ibf16-32-16-8-4-False-np]::RuntimeError: CUDA error: HIPBLAS_STATUS_ALLOC_FAILED when calling `hipblasCreate(handle)`
failed: tests.pytorch.triton_kernels.test_grouped_gemm
test_tgmm[rng77-tlhsT-obf16-ibf16-10-2-3-4-True-p]::RuntimeError: CUDA error: HIPBLAS_STATUS_ALLOC_FAILED when calling `hipblasCreate(handle)`
failed: tests.pytorch.triton_kernels.test_grouped_gemm
test_tgmm[rng77-tlhsT-obf16-ibf16-10-2-3-4-True-np]::RuntimeError: CUDA error: HIPBLAS_STATUS_ALLOC_FAILED when calling `hipblasCreate(handle)`
failed: tests.pytorch.triton_kernels.test_grouped_gemm
test_tgmm[rng77-tlhsT-obf16-ibf16-10-2-3-4-False-p]::RuntimeError: CUDA error: HIPBLAS_STATUS_ALLOC_FAILED when calling `hipblasCreate(handle)`
failed: tests.pytorch.triton_kernels.test_grouped_gemm
test_tgmm[rng77-tlhsT-obf16-ibf16-10-2-3-4-False-np]::RuntimeError: CUDA error: HIPBLAS_STATUS_ALLOC_FAILED when calling `hipblasCreate(handle)`
CI Level 3 / sGPU Tests (mi35x)
Process completed with exit code 1.
CI Level 3 / mGPU JAX (mi35x)
Node.js 20 actions are deprecated. The following actions are running on Node.js 20 and may not work as expected: actions/download-artifact@v4, actions/upload-artifact@v4. Actions will be forced to run with Node.js 24 by default starting June 16th, 2026. Node.js 20 will be removed from the runner on September 16th, 2026. Please check if updated versions of these actions are available that support Node.js 24. To opt into Node.js 24 now, set the FORCE_JAVASCRIPT_ACTIONS_TO_NODE24=true environment variable on the runner or in your workflow file. Once Node.js 24 becomes the default, you can temporarily opt out by setting ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION=true. For more information see: https://github.blog/changelog/2025-09-19-deprecation-of-node-20-on-github-actions-runners/
CI Level 3 / sGPU Tests (mi35x)
Node.js 20 actions are deprecated. The following actions are running on Node.js 20 and may not work as expected: actions/download-artifact@v4, actions/upload-artifact@v4. Actions will be forced to run with Node.js 24 by default starting June 16th, 2026. Node.js 20 will be removed from the runner on September 16th, 2026. Please check if updated versions of these actions are available that support Node.js 24. To opt into Node.js 24 now, set the FORCE_JAVASCRIPT_ACTIONS_TO_NODE24=true environment variable on the runner or in your workflow file. Once Node.js 24 becomes the default, you can temporarily opt out by setting ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION=true. For more information see: https://github.blog/changelog/2025-09-19-deprecation-of-node-20-on-github-actions-runners/
CI Level 3 / sGPU Tests (mi35x)
94 more failures omitted from annotations; see the job summary for the full list.
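Note on the Node.js 20 warnings above: the annotation text names `FORCE_JAVASCRIPT_ACTIONS_TO_NODE24=true` as the opt-in and says it can be set in the workflow file. A minimal sketch of what that might look like follows. The workflow-level `env:` placement is an assumption, and the actual contents of `rocm-ci-dispatch.yml` are not shown on this page.

```yaml
# Hypothetical fragment, not taken from rocm-ci-dispatch.yml.
# Assumes a workflow-level env block is an acceptable place for the variable.
env:
  # Opt JavaScript actions (e.g. actions/download-artifact@v4,
  # actions/upload-artifact@v4) into Node.js 24 before it becomes the default.
  FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: "true"
```

This only addresses the 3 warnings; it is unrelated to the `HIPBLAS_STATUS_ALLOC_FAILED` test failures.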
Artifacts
Produced during runtime
| Name | Size | Digest |
|---|---|---|
| logs-mgpu-mi35x-jax (Expired) | 53.2 KB | sha256:83b126402b3bc590bdec0e2f825101c687516c659473302d46e535696e4172dc |
| logs-sgpu-mi35x (Expired) | 3.5 MB | sha256:ab84f5dc8d82d8dfbbe91a890d836a52a1baa5e7b56841673bad72e9410ba13a |