Ipanfilo/ci test fixes #775

Re-run triggered June 12, 2026 15:04
Status Failure
Total duration 2h 25m 43s
Artifacts 9

rocm-ci-dispatch.yml

on: pull_request
determine_level: 4s
CI Level 3 / Select Docker Image: 5s
CI Level 3 / build / Build ROCm Docker image and TransformerEngine wheels: 23m 27s
Matrix: dispatch / mgpu_tests
Matrix: dispatch / sgpu_tests

Annotations

13 errors and 3 warnings
CI Level 3 / sGPU Tests (mi35x)
Process completed with exit code 1.
CI Level 3 / sGPU Tests (mi35x)
torch tests FAILED.
failed: tests.pytorch.triton_kernels.test_grouped_gemm
test_tgmm[rng77-tlhsT-obf16-ibf16-512-4096-2048-160-False-p]::RuntimeError: CUDA error: HIPBLAS_STATUS_ALLOC_FAILED when calling `hipblasCreate(handle)`
failed: tests.pytorch.triton_kernels.test_grouped_gemm
test_tgmm[rng77-tlhsT-obf16-ibf16-512-4096-2048-160-False-np]::RuntimeError: CUDA error: HIPBLAS_STATUS_ALLOC_FAILED when calling `hipblasCreate(handle)`
failed: tests.pytorch.triton_kernels.test_grouped_gemm
test_tgmm[rng77-tlhsT-obf16-ibf16-32-16-8-4-True-p]::RuntimeError: CUDA error: HIPBLAS_STATUS_ALLOC_FAILED when calling `hipblasCreate(handle)`
failed: tests.pytorch.triton_kernels.test_grouped_gemm
test_tgmm[rng77-tlhsT-obf16-ibf16-32-16-8-4-True-np]::RuntimeError: CUDA error: HIPBLAS_STATUS_ALLOC_FAILED when calling `hipblasCreate(handle)`
failed: tests.pytorch.triton_kernels.test_grouped_gemm
test_tgmm[rng77-tlhsT-obf16-ibf16-32-16-8-4-False-p]::RuntimeError: CUDA error: HIPBLAS_STATUS_ALLOC_FAILED when calling `hipblasCreate(handle)`
failed: tests.pytorch.triton_kernels.test_grouped_gemm
test_tgmm[rng77-tlhsT-obf16-ibf16-32-16-8-4-False-np]::RuntimeError: CUDA error: HIPBLAS_STATUS_ALLOC_FAILED when calling `hipblasCreate(handle)`
failed: tests.pytorch.triton_kernels.test_grouped_gemm
test_tgmm[rng77-tlhsT-obf16-ibf16-10-2-3-4-True-p]::RuntimeError: CUDA error: HIPBLAS_STATUS_ALLOC_FAILED when calling `hipblasCreate(handle)`
failed: tests.pytorch.triton_kernels.test_grouped_gemm
test_tgmm[rng77-tlhsT-obf16-ibf16-10-2-3-4-True-np]::RuntimeError: CUDA error: HIPBLAS_STATUS_ALLOC_FAILED when calling `hipblasCreate(handle)`
failed: tests.pytorch.triton_kernels.test_grouped_gemm
test_tgmm[rng77-tlhsT-obf16-ibf16-10-2-3-4-False-p]::RuntimeError: CUDA error: HIPBLAS_STATUS_ALLOC_FAILED when calling `hipblasCreate(handle)`
failed: tests.pytorch.triton_kernels.test_grouped_gemm
test_tgmm[rng77-tlhsT-obf16-ibf16-10-2-3-4-False-np]::RuntimeError: CUDA error: HIPBLAS_STATUS_ALLOC_FAILED when calling `hipblasCreate(handle)`
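All ten failures listed above come from the same module and report the same error, so a tally by module, test function, and error message makes the pattern easy to confirm against the full job summary. The following is a minimal sketch in Python, assuming the full list uses the same two-line layout as these annotations (a `failed: <module>` line followed by a `<test_id>::<error>` line). `tally_failures` and `ANNOTATIONS` are hypothetical names introduced for illustration and are not part of the CI tooling.

```python
from collections import Counter

# Two entries copied verbatim from the annotations above.
ANNOTATIONS = """\
failed: tests.pytorch.triton_kernels.test_grouped_gemm
test_tgmm[rng77-tlhsT-obf16-ibf16-512-4096-2048-160-False-p]::RuntimeError: CUDA error: HIPBLAS_STATUS_ALLOC_FAILED when calling `hipblasCreate(handle)`
failed: tests.pytorch.triton_kernels.test_grouped_gemm
test_tgmm[rng77-tlhsT-obf16-ibf16-32-16-8-4-True-np]::RuntimeError: CUDA error: HIPBLAS_STATUS_ALLOC_FAILED when calling `hipblasCreate(handle)`
"""


def tally_failures(text):
    """Count failures per (module, test function, error message)."""
    counts = Counter()
    module = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("failed: "):
            # First line of an entry: remember the module name.
            module = line[len("failed: "):]
        elif module is not None and "::" in line:
            # Second line of an entry: "<test_id>::<error>".
            test_id, error = line.split("::", 1)
            func = test_id.split("[", 1)[0]  # drop the parametrization suffix
            counts[(module, func, error)] += 1
            module = None
    return counts


if __name__ == "__main__":
    for (module, func, error), n in tally_failures(ANNOTATIONS).most_common():
        print(f"{n} x {module}::{func}: {error}")
```

A single dominant key in the output would suggest one shared cause rather than many independent test bugs; that reading is an inference from the tally, not something the run page states.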
CI Level 3 / sGPU Tests (mi35x)
Process completed with exit code 1.
CI Level 3 / mGPU JAX (mi35x)
Node.js 20 actions are deprecated. The following actions are running on Node.js 20 and may not work as expected: actions/download-artifact@v4, actions/upload-artifact@v4. Actions will be forced to run with Node.js 24 by default starting June 16th, 2026. Node.js 20 will be removed from the runner on September 16th, 2026. Please check if updated versions of these actions are available that support Node.js 24. To opt into Node.js 24 now, set the FORCE_JAVASCRIPT_ACTIONS_TO_NODE24=true environment variable on the runner or in your workflow file. Once Node.js 24 becomes the default, you can temporarily opt out by setting ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION=true. For more information see: https://github.blog/changelog/2025-09-19-deprecation-of-node-20-on-github-actions-runners/
CI Level 3 / sGPU Tests (mi35x)
Node.js 20 actions are deprecated. The following actions are running on Node.js 20 and may not work as expected: actions/download-artifact@v4, actions/upload-artifact@v4. Actions will be forced to run with Node.js 24 by default starting June 16th, 2026. Node.js 20 will be removed from the runner on September 16th, 2026. Please check if updated versions of these actions are available that support Node.js 24. To opt into Node.js 24 now, set the FORCE_JAVASCRIPT_ACTIONS_TO_NODE24=true environment variable on the runner or in your workflow file. Once Node.js 24 becomes the default, you can temporarily opt out by setting ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION=true. For more information see: https://github.blog/changelog/2025-09-19-deprecation-of-node-20-on-github-actions-runners/
CI Level 3 / sGPU Tests (mi35x)
94 more failures omitted from annotations; see the job summary for the full list.

Artifacts

Produced during runtime
Name                            Size     Digest
logs-mgpu-mi35x-jax (Expired)   53.2 KB  sha256:83b126402b3bc590bdec0e2f825101c687516c659473302d46e535696e4172dc
logs-sgpu-mi35x (Expired)       3.5 MB   sha256:ab84f5dc8d82d8dfbbe91a890d836a52a1baa5e7b56841673bad72e9410ba13a