This tracks the LLVM/clang-side work needed for AMDGPU device code to use
compiler-emitted AddressSanitizer checks with an external device runtime.
The external-runtime model is:
- clang/LLVM inserts ASAN memory access checks into selected GPU device code
- generated code calls the stable __asan_* hook ABI
- a caller-provided device bitcode library defines those hooks
- the launcher/runtime owns shadow mapping, poison/unpoison policy, and report delivery
- the mode does not depend on the stock ROCm ASAN hostcall reporting path
- the mode does not require XNACK because it is not relying on replayable GPU faults
This is a general infrastructure request for GPU execution environments with
structured allocation ownership. It should not change stock ROCm ASAN behavior.
Reproducers
Standalone reproducers are in this public gist:
https://gist.github.com/benvanik/b01df0c0914e6a5849bc4256b21e69b5
Zip download:
https://gist.github.com/benvanik/b01df0c0914e6a5849bc4256b21e69b5/archive/HEAD.zip
The gist contains:
sample_checked_kernel.hip: a tiny HIP kernel with three byte loads and one
32-bit store.
runtime_bitcode_no_sanitize.c: a tiny external runtime bitcode source with
two support globals and a few ASAN hook definitions.
reproduce.sh: commands that demonstrate the current compiler behavior and
the desired local-only no-sanitize bitcode shape.
README.md: expected outputs and setup notes.
The reproducer uses -nogpulib where possible so the interesting behavior is
compiler policy, LLVM IR composition, and code object metadata rather than a
particular ROCm package layout.
Known Blockers
1. Non-XNACK AMDGPU targets cannot request explicit ASAN checks with an external runtime
For AMDGPU, clang currently treats device AddressSanitizer support as equivalent
to the stock ROCm ASAN runtime model. On a non-XNACK target such as gfx1100,
the driver ignores -fsanitize=address for device code:
clang++ \
--offload-arch=gfx1100 \
--cuda-device-only \
-x hip sample_checked_kernel.hip \
--hip-path="$ROCM_PATH" \
--rocm-path="$ROCM_PATH" \
-nogpulib \
-std=c++17 \
-fsanitize=address \
-fsanitize-stable-abi \
-O1 \
-S -emit-llvm \
-o sample_checked_kernel.stock.ll
Observed warning:
warning: ignoring '-fsanitize=address' option for offload arch 'gfx1100' as it is not currently supported there. Use it with an offload arch containing 'xnack+' instead
warning: argument unused during compilation: '-fsanitize=address'
warning: argument unused during compilation: '-fsanitize-stable-abi'
Observed IR contains raw loads/stores and no sanitizer hook calls:
%4 = load i8, ptr addrspace(1) %3, align 1
%7 = load i8, ptr addrspace(1) %6, align 1
%11 = load i8, ptr addrspace(1) %10, align 1
store i32 %13, ptr addrspace(1) %1, align 4
With only the non-XNACK policy gate bypassed in a local clang build, the same
source emits the expected stable hook calls:
call void @__asan_load1(i64 %0)
%1 = load i8, ptr addrspace(1) %arrayidx, align 1
call void @__asan_load1(i64 %2)
%3 = load i8, ptr addrspace(1) %arrayidx3, align 1
call void @__asan_load1(i64 %4)
%5 = load i8, ptr addrspace(1) %arrayidx6, align 1
call void @__asan_store4(i64 %6)
store i32 %add8, ptr addrspace(1) %output.coerce, align 4
That is the key observation: the backend can already represent and lower this
instrumented code shape for a non-XNACK target. The blocker is policy coupling
to the stock runtime model, not the target's ability to execute explicit shadow
checks.
The XNACK requirement can remain valid for stock ROCm ASAN if that runtime
depends on replayable faults, host-visible fault handling, or unified CPU/GPU
ASAN assumptions. The external mode is a different contract: the launcher/runtime
publishes shadow state explicitly and the compiler inserts explicit checks.
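To make the "explicit checks" contract concrete, here is a minimal host-side sketch of the test an external hook such as __asan_load1 could run against launcher-published shadow state. The type shadow_view_t, its fields, and the function external_shadow_is_poisoned are hypothetical names invented for illustration. The 8-byte granule encoding follows the conventional ASAN shadow scheme and is an assumption about what a downstream runtime might choose, not something LLVM would mandate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical launcher-published shadow description. Field names and layout
 * are illustrative only; they are not part of any LLVM or ROCm ABI. */
typedef struct {
  uintptr_t app_base;   /* first application address covered by the shadow */
  size_t app_size;      /* number of application bytes covered */
  const int8_t *shadow; /* one shadow byte per 8 application bytes */
} shadow_view_t;

/* Returns 1 if an access of `size` bytes (1..8, assumed not to cross an
 * 8-byte granule) at `addr` touches poisoned memory, and 0 otherwise.
 * Addresses outside the published range are treated as unchecked. */
int external_shadow_is_poisoned(const shadow_view_t *view, uintptr_t addr,
                                size_t size) {
  assert(size >= 1 && size <= 8);
  if (addr < view->app_base || addr - view->app_base >= view->app_size) {
    return 0;
  }
  uintptr_t offset = addr - view->app_base;
  int8_t k = view->shadow[offset >> 3];
  if (k == 0) return 0; /* whole granule is addressable */
  if (k < 0) return 1;  /* whole granule is poisoned */
  /* Only the first k bytes of the granule are addressable. */
  return (int)((offset & 7) + size - 1) >= k;
}
```

A hook body would call this with the address the compiler passes in and hand any nonzero result to whatever reporting path the launcher owns. Nothing in the check depends on a replayable fault.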
2. External runtime bitcode needs local no-sanitize state without suppressing the consumer module
An external ASAN runtime bitcode module needs two things at once:
- selected support globals/functions are not instrumented
- the linked consumer GPU module remains instrumented
The support globals may have a binary ABI with the launcher/runtime. ASAN global
instrumentation can add padding, ODR indicator symbols, and registration data,
which changes the symbol layout and breaks that ABI.
The support functions implement __asan_* hooks. Recursively instrumenting
those hook implementations is wrong.
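As a rough illustration of those two requirements at the source level, a support runtime file might annotate its items as sketched below. The global names match the reproducer, but the struct contents, the EXTERNAL_* macro names, and the placeholder hook body are assumptions made for this example; the real layout lives in the gist. The attribute spellings are the clang ones and are guarded so the sketch also builds with compilers that lack them.

```c
#include <assert.h>
#include <stdint.h>

#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

/* Per-global exclusion: keeps ASAN from padding or registering the global. */
#if defined(__clang__) && __has_attribute(no_sanitize)
#define EXTERNAL_NO_ASAN_GLOBAL __attribute__((no_sanitize("address")))
#else
#define EXTERNAL_NO_ASAN_GLOBAL
#endif

/* Per-function exclusion: keeps sanitizer passes out of the hook body. */
#if __has_attribute(disable_sanitizer_instrumentation)
#define EXTERNAL_NO_INSTRUMENT __attribute__((disable_sanitizer_instrumentation))
#else
#define EXTERNAL_NO_INSTRUMENT
#endif

/* Hypothetical layouts shared with the launcher; the fields are invented. */
typedef struct {
  uint64_t shadow_base;
  uint64_t shadow_size;
} external_shadow_config_t;

typedef struct {
  uint64_t last_checked_address;
  uint64_t check_count;
} external_feedback_config_t;

/* The launcher reads and writes these by symbol, so the size must stay exact. */
static_assert(sizeof(external_shadow_config_t) == 16, "launcher ABI layout");
static_assert(sizeof(external_feedback_config_t) == 16, "launcher ABI layout");

EXTERNAL_NO_ASAN_GLOBAL external_shadow_config_t external_shadow_config;
EXTERNAL_NO_ASAN_GLOBAL external_feedback_config_t external_feedback_config;

/* Placeholder hook body: records the address instead of checking shadow. */
EXTERNAL_NO_INSTRUMENT void __asan_load1(uintptr_t addr) {
  external_feedback_config.last_checked_address = (uint64_t)addr;
  external_feedback_config.check_count += 1;
}
```

The static assertions capture why global instrumentation matters here: any padding or companion symbol added by ASAN changes what the launcher sees at that symbol.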
Compiling the support runtime normally preserves
disable_sanitizer_instrumentation on functions, but does not preserve
no_sanitize_address on the support globals in emitted IR:
@external_shadow_config = protected addrspace(1) global ..., align 8
@external_feedback_config = protected addrspace(1) global ..., align 8
Linking that normal runtime bitcode into a sanitized GPU module produces final
object symbols like:
B __odr_asan_gen_external_feedback_config
B __odr_asan_gen_external_shadow_config
B external_feedback_config
B external_shadow_config
Those __odr_asan_gen_* symbols show that the support globals were treated as
ASAN globals. That is the wrong result for ABI globals owned by an external
runtime.
Compiling the support runtime through ASAN cc1 flags emits the per-global
markers the support library needs:
@external_shadow_config = protected addrspace(1) global ..., no_sanitize_address, align 8
@external_feedback_config = protected addrspace(1) global ..., no_sanitize_address, align 8
But it also emits a module-wide flag:
!llvm.module.flags = !{!0, !1, !2}
!2 = !{i32 4, !"nosanitize_address", i32 1}
When this bitcode is linked into the sanitized consumer module, clang reports:
warning: Redundant instrumentation detected, with module flag: nosanitize_address
and the final kernel body contains raw loads/stores instead of calls into the
ASAN hook path:
global_load_u8 ...
global_load_u8 ...
global_load_u8 ...
global_store_b32 ...
This is the opposite failure mode: the support globals remain exact, but the
consumer module loses the instrumentation we wanted.
The desired behavior is local-only no-sanitize state:
- keep per-global no_sanitize_address on selected support globals
- keep disable_sanitizer_instrumentation on selected support functions
- do not propagate a module-wide nosanitize_address flag into the linked consumer module
- keep the consumer module eligible for ASAN instrumentation
The gist script demonstrates this by removing only the module-wide
nosanitize_address flag from the ASAN-compiled support runtime IR and keeping
the per-global attributes. The final object then has exact support globals and
no ODR indicators:
B external_feedback_config
B external_shadow_config
The final kernel still calls sanitizer hooks before memory operations:
s_swappc_b64 ... ; call __asan_load1
global_load_u8 ...
s_swappc_b64 ... ; call __asan_load1
global_load_u8 ...
s_swappc_b64 ... ; call __asan_load1
global_load_u8 ...
s_swappc_b64 ... ; call __asan_store4
global_store_b32 ...
LLVM needs a principled way to build or link external sanitizer runtime bitcode
with item-local sanitizer exclusions but without transitive module-wide
nosanitize_address suppression.
Possible implementation shapes:
- preserve no_sanitize_address on annotated globals without requiring the
runtime source file to be compiled under whole-module ASAN mode
- or treat external ASAN runtime bitcode as trusted support code, preserve
item-level no_sanitize_address attributes, and drop/ignore the module-wide
nosanitize_address flag for consumer instrumentation decisions
The important distinction is local exclusion vs. transitive exclusion. External
runtime bitcode needs local exclusion.
3. External ASAN should not require the stock hostcall ABI
Current AMDGPU ASAN instrumentation still introduces stock hostcall-related
kernel metadata even when the linked runtime hooks do not use hostcall.
Using the local-only support bitcode shape above, the final raw AMDGPU object
still contains:
.value_kind: hidden_hostcall_buffer
.kernarg_segment_size: 272
.name: sample_checked_kernel
The source kernel only has two explicit pointer arguments. The hidden hostcall
buffer is stock-runtime ABI coupling. It is not needed by an external runtime
that reports through a different device-global, queue, signal, trap, or
launcher-owned feedback mechanism.
The external-runtime mode should separate memory access check insertion from
the stock HIP/ROCm ASAN reporting transport.
In external mode:
- compiler-emitted memory checks still call __asan_* hooks
- stock hidden hostcall kernel arguments are not added unless explicitly needed
- stock hostcall metadata is not required for the code object to be considered valid
- the external runtime supplies any reporting globals, queues, or signals it needs
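One way an external runtime could supply its own reporting transport, instead of the hidden hostcall buffer, is a launcher-owned feedback ring that hooks append to. The sketch below is host-side C that uses C11 atomics as a stand-in for device atomics. Every name in it (external_report_t, external_feedback_ring_t, external_feedback_push, the capacity) is hypothetical and not taken from the reproducer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

enum { EXTERNAL_FEEDBACK_CAPACITY = 64 };
static_assert((EXTERNAL_FEEDBACK_CAPACITY & (EXTERNAL_FEEDBACK_CAPACITY - 1)) == 0,
              "capacity must be a power of two");

/* One report record; the fields are illustrative, not a proposed format. */
typedef struct {
  uint64_t address;
  uint32_t access_size;
  uint32_t is_write;
} external_report_t;

/* Ring owned by the launcher and exposed to device code as a global. */
typedef struct {
  _Atomic(uint64_t) write_count; /* total reports ever pushed */
  external_report_t records[EXTERNAL_FEEDBACK_CAPACITY];
} external_feedback_ring_t;

/* Claims a slot and stores a report. Older records are overwritten once the
 * ring wraps; the launcher compares write_count with the capacity to detect
 * that reports were dropped. */
void external_feedback_push(external_feedback_ring_t *ring, uint64_t address,
                            uint32_t access_size, uint32_t is_write) {
  uint64_t slot =
      atomic_fetch_add_explicit(&ring->write_count, 1, memory_order_relaxed);
  external_report_t *record = &ring->records[slot % EXTERNAL_FEEDBACK_CAPACITY];
  record->address = address;
  record->access_size = access_size;
  record->is_write = is_write;
}
```

After the kernel completes, the launcher would read write_count and the records it covers. No hidden kernel argument is involved, which is the separation this blocker asks for.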
Proposed LLVM Capability
Add an explicit external-runtime mode for AMDGPU AddressSanitizer device code.
Flag spelling is open; the semantic shape matters more than the exact name:
-fsanitize=address
-fsanitize-stable-abi
-fgpu-address-sanitizer-runtime=external
or:
-fsanitize=address
-fsanitize-address-gpu-runtime=external
The mode would:
- enable compiler insertion of ASAN memory access checks for AMDGPU device code
- allow non-XNACK targets when the external mode is explicit
- emit calls to the stable __asan_* hook ABI
- link or otherwise allow a caller-provided device runtime bitcode library
- preserve item-local no-sanitize state in the external runtime bitcode
- avoid stock hostcall hidden kernel arguments when external mode does not use them
- leave stock ROCm ASAN behavior unchanged when external mode is not selected
The runtime bitcode link mechanism should be a supported driver-level path, not
a user-visible dependency on cc1-only -mlink-bitcode-file choreography. The
cc1 flag is useful as a proof mechanism, but an external runtime feature needs
clear driver semantics.
Hook ABI Surface
The minimal reproducer only needs:
__asan_load1
__asan_store4
__asan_init
__asan_version_mismatch_check_v8
__asan_register_elf_globals
__asan_unregister_elf_globals
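For orientation, the six symbols above could be satisfied by stubs shaped like the following. The signatures are assumptions modeled on how the host ASAN interface passes addresses as pointer-sized integers; the exact parameter types the AMDGPU lowering expects should be confirmed against the emitted IR. The counter external_hook_calls is hypothetical and exists only so the stubs do something observable.

```c
#include <assert.h>
#include <stdint.h>

static_assert(sizeof(uintptr_t) == sizeof(void *),
              "hooks receive addresses as pointer-sized integers");

/* Hypothetical counter so the stubs have a visible effect. */
unsigned external_hook_calls;

/* Module constructor hooks: nothing to initialize in this sketch. */
void __asan_init(void) { ++external_hook_calls; }
void __asan_version_mismatch_check_v8(void) { ++external_hook_calls; }

/* Global registration hooks: an external runtime that excludes its own
 * globals may still see these called for consumer-module globals. */
void __asan_register_elf_globals(uintptr_t flag, uintptr_t start,
                                 uintptr_t stop) {
  (void)flag; (void)start; (void)stop;
  ++external_hook_calls;
}
void __asan_unregister_elf_globals(uintptr_t flag, uintptr_t start,
                                   uintptr_t stop) {
  (void)flag; (void)start; (void)stop;
  ++external_hook_calls;
}

/* Access hooks: a real runtime would consult its shadow state here. */
void __asan_load1(uintptr_t addr) { (void)addr; ++external_hook_calls; }
void __asan_store4(uintptr_t addr) { (void)addr; ++external_hook_calls; }
```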
A complete external runtime must match the actual hook surface emitted by the
selected ASAN lowering mode, including fixed-width loads/stores, variable-width
accesses, report/noabort forms, poison/unpoison helpers, and any device-library
support symbols the frontend references.
This does not necessarily require LLVM to invent a new ABI. The better shape is:
- external mode emits the same stable hook names where possible
- any stock-runtime-only hooks are documented or disabled in external mode
- the selected hook surface is testable without linking the stock runtime
The relationship with -fsanitize-stable-abi should be explicit. External GPU
ASAN likely wants that mode required or implied; unstable private ABI expansion
would make external runtimes brittle.
Non-Goals
This is not a request to change stock ROCm ASAN semantics.
This is not a request to make stock ASAN work without XNACK.
This is not a request for LLVM to standardize one downstream shadow memory
layout, feedback ring, or report packet format.
This is not a request to support arbitrary legacy pointer behavior that only
works with replayable faults. External-runtime ASAN is for launchers/runtimes
that can make allocation ranges and shadow state explicit.
Suggested Acceptance Criteria
Stock behavior remains unchanged without the new external-runtime option.
With the external-runtime option, compiling the sample kernel for gfx1100
with -fsanitize=address -fsanitize-stable-abi emits __asan_load* and
__asan_store* calls instead of warning that ASAN is ignored.
The emitted object can be linked against a caller-provided device runtime that
defines the required hooks.
Support bitcode can keep selected globals/functions uninstrumented without
disabling consumer instrumentation.
External mode does not add stock hostcall hidden kernel arguments when the
external runtime does not request them.
LLVM tests cover this with generic HIP/C/LLVM IR inputs independent of any
downstream runtime source tree.
Current Confidence
Directly observed:
- stock non-XNACK AMDGPU clang ignores device ASAN
- bypassing only that policy gate causes the sample kernel to emit __asan_* hook calls
- normal support bitcode loses per-global no_sanitize_address
- ASAN-compiled support bitcode preserves per-global no_sanitize_address but
emits a transitive module flag
- removing only the module-wide flag gives the desired final object shape
- current ASAN lowering still emits hidden_hostcall_buffer metadata
Inferred:
- an upstream external-runtime mode can be mostly policy/linkage-shaped
- the backend does not need a new lowering strategy for explicit check calls
- stock ASAN can remain unchanged
Evidence that would change this conclusion:
- AMDGPU backend lowering failures for real external-runtime hook bodies
- code object loader requirements that make hidden_hostcall_buffer mandatory
for any ASAN-marked kernel
- an LLVM module-flag reason that prevents local-only support bitcode semantics
None of those appeared in the reproducers above.