gfx1201 gpu_navi4x runners have broken ROCm environments: rocminfo segfaults
Summary
Several recent CI CMake Linux gfx1201 package-test jobs on the gpu_navi4x
runner pool show the same runner-health failure: packaged rocminfo resolves
from the installed public dependency package, then immediately segfaults.
This is not a normal test failure. When the workflow temporarily treated
rocminfo as a warning and continued, the installed GPU suite produced a large
cascade of process segfaults, usually:
42% tests passed, 38 tests failed out of 66
Errors while running CTest
PR #52 changes the workflow to use rocminfo as a hard ROCm environment
canary again. While gfx1201 is marked experimental, the job can report the
broken runner and skip the installed suite instead of generating the cascade.
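The canary idea described above can be sketched roughly as follows. This is an illustration only, not the contents of `.github/scripts/check_rocm_environment.sh`, which is not reproduced in this issue. The function name `rocm_canary` and its optional probe argument are hypothetical; the `::error::` prefix is the standard GitHub Actions workflow command for surfacing an error annotation.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a ROCm environment canary.
# Not the actual check_rocm_environment.sh from PR #52.

# Run a probe command (rocminfo by default) and classify the result.
# Returns 0 when the probe succeeds and 1 otherwise.
rocm_canary() {
  local probe="${1:-rocminfo}"
  local status=0

  # Discard the probe's stdout; only its exit status matters here.
  "$probe" >/dev/null || status=$?

  if [ "$status" -eq 0 ]; then
    echo "ROCm canary passed: $probe exited 0"
    return 0
  fi

  # The shell reports a child killed by signal N as 128 + N.
  # SIGSEGV is signal 11 on Linux, so a segfault shows up as 139.
  if [ "$status" -eq 139 ]; then
    echo "::error::ROCm environment on this runner is broken; $probe segfaulted (exit 139)."
  else
    echo "::error::ROCm environment on this runner is broken; $probe exited $status."
  fi
  return 1
}
```

A workflow step could call `rocm_canary` before the installed GPU suite and skip the suite when it returns non-zero, which is the behavior this issue attributes to PR #52.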
Impacted Runners
Sampled the last 80 CI CMake Linux workflow runs on 2026-06-09.
Matching rocminfo segfaults were observed on:
| Runner label | Runner name | Matching jobs |
| --- | --- | --- |
| gpu_navi4x | CS-RORDMZ-DT145 | 7 |
| gpu_navi4x | CS-RORDMZ-DT147 | 4 |
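No matching rocminfo segfault was found on CS-RORDMZ-DT143 in this sampled
window. The latest PR #52 gfx1201 job on CS-RORDMZ-DT143 succeeded: rocminfo
printed ROCk/HSA information, and CTest reported 100% tests passed, 0 tests
failed out of 66.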
Failure Signature
Pre-PR #52 warning-mode example:
rocminfo path: /home/ubuntu/actions-runner/_work/hrx-system/hrx-system/build/linux-gpu/install/public-deps/bin/rocminfo
/home/ubuntu/actions-runner/_work/_temp/9ead8cff-499a-488f-a48b-fa504e5798ed.sh: line 4: 382707 Segmentation fault (core dumped) rocminfo
Warning: rocminfo failed; continuing to installed GPU tests
42% tests passed, 38 tests failed out of 66
Errors while running CTest
PR #52 canary-mode example:
rocminfo path: /home/ubuntu/actions-runner/_work/hrx-system/hrx-system/build/linux-gpu/install/public-deps/bin/rocminfo
.github/scripts/check_rocm_environment.sh: line 19: 347376 Segmentation fault (core dumped) rocminfo
Error: ROCm environment on this runner is broken; rocminfo failed before GPU tests.
Matching Jobs
| Started UTC | Workflow run | Job | Result | Runner | Symptom |
| --- | --- | --- | --- | --- | --- |
| 2026-06-09 06:36:42 | 27188339883 | 80262712962 | failure | CS-RORDMZ-DT145 | rocminfo segfaulted before installed tests |
| 2026-06-09 06:42:45 | 27188655682 | 80263583859 | failure | CS-RORDMZ-DT147 | rocminfo segfaulted before installed tests |
| 2026-06-09 15:43:43 | 27217788198 | 80364536623 | failure | CS-RORDMZ-DT145 | rocminfo segfaulted; workflow continued; 38/66 installed tests failed |
| 2026-06-09 16:39:42 | 27220546268 | 80376180903 | failure | CS-RORDMZ-DT145 | rocminfo segfaulted; workflow continued; 38/66 installed tests failed |
| 2026-06-09 16:54:59 | 27220546396 | 80379157442 | failure | CS-RORDMZ-DT145 | rocminfo segfaulted; workflow continued; 38/66 installed tests failed |
| 2026-06-09 17:00:14 | 27222201440 | 80380071263 | failure | CS-RORDMZ-DT145 | rocminfo segfaulted; workflow continued; 38/66 installed tests failed |
| 2026-06-09 17:04:45 | 27222459403 | 80381012028 | failure | CS-RORDMZ-DT147 | rocminfo segfaulted; workflow continued; 38/66 installed tests failed |
| 2026-06-09 17:07:41 | 27222637646 | 80381605883 | failure | CS-RORDMZ-DT145 | rocminfo segfaulted; workflow continued; 38/66 installed tests failed |
| 2026-06-09 17:18:41 | 27223228760 | 80383743191 | failure | CS-RORDMZ-DT145 | rocminfo segfaulted; workflow continued; 38/66 installed tests failed |
| 2026-06-09 17:27:34 | 27223713966 | 80385468837 | failure | CS-RORDMZ-DT147 | rocminfo segfaulted; workflow continued; 38/66 installed tests failed |
| 2026-06-09 17:29:49 | 27223853693 | 80385861126 | success due to experimental lane | CS-RORDMZ-DT147 | PR #52 canary detected rocminfo segfault and skipped installed tests |
Nearby Non-Matches
These failed gfx1201 jobs in the sampled window did not match the rocminfo
segfault pattern and look like real test/application failures instead of this
runner-health issue:
Request
Please inspect or recycle the affected gpu_navi4x runners:
CS-RORDMZ-DT145
CS-RORDMZ-DT147
The first health check should be running the packaged/public-deps rocminfo
path used by CI:
/home/ubuntu/actions-runner/_work/hrx-system/hrx-system/build/linux-gpu/install/public-deps/bin/rocminfo
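For whoever triages the runners, a first-pass check along these lines may help. It is a minimal sketch under assumptions, not an established procedure: the helper names `describe_status` and `check_runner` and the `ROCMINFO` override variable are hypothetical, and the `/dev/kfd` and `/dev/dri` checks assume the usual ROCm device nodes on Linux. The default path is the one quoted above.

```shell
#!/usr/bin/env bash
# Hypothetical runner triage helper; names and checks are illustrative.

# Default to the packaged rocminfo path used by CI; override via ROCMINFO.
ROCMINFO="${ROCMINFO:-/home/ubuntu/actions-runner/_work/hrx-system/hrx-system/build/linux-gpu/install/public-deps/bin/rocminfo}"

# Translate an exit status into a short human-readable verdict.
describe_status() {
  local status="$1"
  if [ "$status" -eq 0 ]; then
    echo "ok"
  elif [ "$status" -gt 128 ]; then
    # The shell reports death by signal N as 128 + N (139 means SIGSEGV).
    echo "killed by signal $((status - 128))"
  else
    echo "exited with status $status"
  fi
}

# Report device-node presence, then run rocminfo and summarize the result.
check_runner() {
  local node
  for node in /dev/kfd /dev/dri; do
    if [ -e "$node" ]; then
      echo "$node: present"
    else
      echo "$node: MISSING"
    fi
  done

  if [ ! -x "$ROCMINFO" ]; then
    echo "rocminfo: not found or not executable at $ROCMINFO"
    return 1
  fi

  local status=0
  "$ROCMINFO" >/dev/null 2>&1 || status=$?
  echo "rocminfo: $(describe_status "$status")"
  [ "$status" -eq 0 ]
}
```

Running `check_runner` on CS-RORDMZ-DT145 and CS-RORDMZ-DT147 should reproduce the failure signature if the runners are still unhealthy: a segfault would be reported as `killed by signal 11`. Note that the packaged binary only exists after a CI job has built and installed it into the workspace.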