[LLVM][AMDGPU] Track external-runtime AddressSanitizer support for GPU device code

This tracks the LLVM/clang-side work needed for AMDGPU device code to use
compiler-emitted AddressSanitizer checks with an external device runtime.

The external-runtime model is:

- clang/LLVM inserts ASAN memory access checks into selected GPU device code
- generated code calls the stable `__asan_*` hook ABI
- a caller-provided device bitcode library defines those hooks
- the launcher/runtime owns shadow mapping, poison/unpoison policy, and report delivery
- the mode does not depend on the stock ROCm ASAN hostcall reporting path
- the mode does not require XNACK because it does not rely on replayable GPU faults

This is a general infrastructure request for GPU execution environments with
structured allocation ownership. It should not change stock ROCm ASAN behavior.

## Reproducers

Standalone reproducers are in this public gist:

https://gist.github.com/benvanik/b01df0c0914e6a5849bc4256b21e69b5

Zip download:

https://gist.github.com/benvanik/b01df0c0914e6a5849bc4256b21e69b5/archive/HEAD.zip

The gist contains:

- `sample_checked_kernel.hip`: a tiny HIP kernel with three byte loads and one
  32-bit store.
- `runtime_bitcode_no_sanitize.c`: a tiny external runtime bitcode source with
  two support globals and a few ASAN hook definitions.
- `reproduce.sh`: commands that demonstrate the current compiler behavior and
  the desired local-only no-sanitize bitcode shape.
- `README.md`: expected outputs and setup notes.

The reproducer uses `-nogpulib` where possible so the interesting behavior is
compiler policy, LLVM IR composition, and code object metadata rather than a
particular ROCm package layout.

## Known Blockers

### 1. Non-XNACK AMDGPU targets cannot request explicit ASAN checks with an external runtime

For AMDGPU, clang currently treats device AddressSanitizer support as equivalent
to the stock ROCm ASAN runtime model. On a non-XNACK target such as `gfx1100`,
the driver ignores `-fsanitize=address` for device code:

```bash
clang++ \
  --offload-arch=gfx1100 \
  --cuda-device-only \
  -x hip sample_checked_kernel.hip \
  --hip-path="$ROCM_PATH" \
  --rocm-path="$ROCM_PATH" \
  -nogpulib \
  -std=c++17 \
  -fsanitize=address \
  -fsanitize-stable-abi \
  -O1 \
  -S -emit-llvm \
  -o sample_checked_kernel.stock.ll
```

Observed warning:

```text
warning: ignoring '-fsanitize=address' option for offload arch 'gfx1100' as it is not currently supported there. Use it with an offload arch containing 'xnack+' instead
warning: argument unused during compilation: '-fsanitize=address'
warning: argument unused during compilation: '-fsanitize-stable-abi'
```

Observed IR contains raw loads/stores and no sanitizer hook calls:

```llvm
%4 = load i8, ptr addrspace(1) %3, align 1
%7 = load i8, ptr addrspace(1) %6, align 1
%11 = load i8, ptr addrspace(1) %10, align 1
store i32 %13, ptr addrspace(1) %1, align 4
```

With only the non-XNACK policy gate bypassed in a local clang build, the same
source emits the expected stable hook calls:

```llvm
call void @__asan_load1(i64 %0)
%1 = load i8, ptr addrspace(1) %arrayidx, align 1
call void @__asan_load1(i64 %2)
%3 = load i8, ptr addrspace(1) %arrayidx3, align 1
call void @__asan_load1(i64 %4)
%5 = load i8, ptr addrspace(1) %arrayidx6, align 1
call void @__asan_store4(i64 %6)
store i32 %add8, ptr addrspace(1) %output.coerce, align 4
```

That is the key observation: the backend can represent and lower this
instrumented code shape for a non-XNACK target. The blocker is policy coupling
to the stock runtime model, not the target's ability to execute explicit shadow
checks.
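The two IR shapes above can be told apart mechanically by counting hook calls in the emitted textual IR. The following is a minimal sketch, not part of the gist: `count_asan_hook_calls` is a hypothetical helper name, and the sample inputs are shortened excerpts of the IR shown in this section.

```bash
# Hypothetical helper (not from the gist): count calls to fixed-width
# __asan_load*/__asan_store* hooks in textual LLVM IR read from stdin.
count_asan_hook_calls() {
  # grep -c prints 0 and exits non-zero when nothing matches; keep the status clean.
  grep -cE 'call void @__asan_(load|store)[0-9]+\(' || true
}

# Shortened excerpts of the IR quoted in this issue.
stock_ir='%4 = load i8, ptr addrspace(1) %3, align 1
store i32 %13, ptr addrspace(1) %1, align 4'
patched_ir='call void @__asan_load1(i64 %0)
%1 = load i8, ptr addrspace(1) %arrayidx, align 1
call void @__asan_store4(i64 %6)
store i32 %add8, ptr addrspace(1) %output.coerce, align 4'

stock_calls=$(printf '%s\n' "$stock_ir" | count_asan_hook_calls)
patched_calls=$(printf '%s\n' "$patched_ir" | count_asan_hook_calls)
echo "stock: $stock_calls hook calls, patched: $patched_calls hook calls"
```

In a real run the function would read the `.ll` file produced by the compile command above, for example `count_asan_hook_calls < sample_checked_kernel.stock.ll`.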

The XNACK requirement can remain valid for stock ROCm ASAN if that runtime
depends on replayable faults, host-visible fault handling, or unified CPU/GPU
ASAN assumptions. The external mode is a different contract: the launcher/runtime
publishes shadow state explicitly and the compiler inserts explicit checks.

### 2. External runtime bitcode needs local no-sanitize state without suppressing the consumer module

An external ASAN runtime bitcode module needs two things at once:

- selected support globals/functions are not instrumented
- the linked consumer GPU module remains instrumented

The support globals may have a binary ABI with the launcher/runtime. ASAN global
instrumentation can add padding, ODR indicator symbols, and registration data,
which changes the symbol layout and breaks that ABI.

The support functions implement `__asan_*` hooks. Recursively instrumenting
those hook implementations is wrong.

Compiling the support runtime without ASAN enabled preserves
`disable_sanitizer_instrumentation` on functions, but does not preserve
`no_sanitize_address` on the support globals in emitted IR:

```llvm
@external_shadow_config = protected addrspace(1) global ..., align 8
@external_feedback_config = protected addrspace(1) global ..., align 8
```

Linking that normal runtime bitcode into a sanitized GPU module produces final
object symbols like:

```text
B __odr_asan_gen_external_feedback_config
B __odr_asan_gen_external_shadow_config
B external_feedback_config
B external_shadow_config
```

Those `__odr_asan_gen_*` symbols show that the support globals were treated as
ASAN globals. That is the wrong result for ABI globals owned by an external
runtime.
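A symbol listing can be screened for this failure with a small filter. The sketch below is illustrative only: `odr_indicators` is a hypothetical helper, and the sample listing is the one quoted above rather than live tool output.

```bash
# Hypothetical helper: print ASAN ODR indicator symbols found in
# nm-style output ("<type> <name>" per line) read from stdin.
odr_indicators() {
  awk '$NF ~ /^__odr_asan_gen_/ { print $NF }'
}

# Symbol listing copied from the issue text above.
nm_output='B __odr_asan_gen_external_feedback_config
B __odr_asan_gen_external_shadow_config
B external_feedback_config
B external_shadow_config'

indicators=$(printf '%s\n' "$nm_output" | odr_indicators)
if [ -n "$indicators" ]; then
  echo "support globals were treated as ASAN globals:"
  printf '%s\n' "$indicators"
fi
```

Against a real object the same filter could be fed from `llvm-nm`, assuming its default output keeps the symbol name in the last column.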

Compiling the support runtime through ASAN cc1 flags emits the per-global
markers the support library needs:

```llvm
@external_shadow_config = protected addrspace(1) global ..., no_sanitize_address, align 8
@external_feedback_config = protected addrspace(1) global ..., no_sanitize_address, align 8
```

But it also emits a module-wide flag:

```llvm
!llvm.module.flags = !{!0, !1, !2}
!2 = !{i32 4, !"nosanitize_address", i32 1}
```

When this bitcode is linked into the sanitized consumer module, clang reports:

```text
warning: Redundant instrumentation detected, with module flag: nosanitize_address
```

and the final kernel body contains raw loads/stores instead of calls into the
ASAN hook path:

```text
global_load_u8 ...
global_load_u8 ...
global_load_u8 ...
global_store_b32 ...
```

This is the opposite failure mode: the support globals remain exact, but the
consumer module loses the instrumentation we wanted.

The desired behavior is local-only no-sanitize state:

- keep per-global `no_sanitize_address` on selected support globals
- keep `disable_sanitizer_instrumentation` on selected support functions
- do not propagate a module-wide `nosanitize_address` flag into the linked
  consumer module
- keep the consumer module eligible for ASAN instrumentation

The gist script demonstrates this by removing only the module-wide
`nosanitize_address` flag from the ASAN-compiled support runtime IR and keeping
the per-global attributes. The final object then has exact support globals and
no ODR indicators:

```text
B external_feedback_config
B external_shadow_config
```

The final kernel still calls sanitizer hooks before memory operations:

```text
s_swappc_b64 ...   ; call __asan_load1
global_load_u8 ...
s_swappc_b64 ...   ; call __asan_load1
global_load_u8 ...
s_swappc_b64 ...   ; call __asan_load1
global_load_u8 ...
s_swappc_b64 ...   ; call __asan_store4
global_store_b32 ...
```
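The flag-stripping step can be sketched as a text transform on the ASAN-compiled runtime IR. This is a minimal illustration under stated assumptions, not necessarily what `reproduce.sh` does: `strip_nosanitize_module_flag` is a hypothetical helper, it assumes the flag node is printed on a single line in the form shown above, and the sample input contains only the lines quoted in this issue (the `!0` and `!1` nodes are not shown here and are left out).

```bash
# Hypothetical helper (not taken from reproduce.sh): print the textual IR in $1
# with only the module-wide "nosanitize_address" flag removed.
strip_nosanitize_module_flag() {
  # Find the metadata node id (for example "!2") that carries the flag.
  flag_id=$(sed -n 's/^\(![0-9][0-9]*\) = !{i32 [0-9][0-9]*, !"nosanitize_address", i32 [0-9][0-9]*}$/\1/p' "$1")
  if [ -z "$flag_id" ]; then
    cat "$1"   # nothing to strip
    return 0
  fi
  # Drop the node itself and its reference in the !llvm.module.flags list.
  sed -e "/^${flag_id} = /d" \
      -e "/^!llvm\.module\.flags = /{
s/{${flag_id}, /{/
s/, ${flag_id}\([,}]\)/\1/
s/{${flag_id}}/{}/
}" "$1"
}

# Sample input assembled from the IR excerpts quoted in this issue.
ir_file=$(mktemp)
cat > "$ir_file" <<'EOF'
@external_shadow_config = protected addrspace(1) global ..., no_sanitize_address, align 8
@external_feedback_config = protected addrspace(1) global ..., no_sanitize_address, align 8
!llvm.module.flags = !{!0, !1, !2}
!2 = !{i32 4, !"nosanitize_address", i32 1}
EOF
stripped=$(strip_nosanitize_module_flag "$ir_file")
rm -f "$ir_file"
printf '%s\n' "$stripped"
```

The per-global `no_sanitize_address` attributes are untouched because the transform only matches the quoted module-flag string `"nosanitize_address"`. On real bitcode the transform would sit between `llvm-dis` and `llvm-as`; a text rewrite like this is a proof mechanism only, not a substitute for proper driver or linker support.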

LLVM needs a principled way to build or link external sanitizer runtime bitcode
with item-local sanitizer exclusions but without transitive module-wide
`nosanitize_address` suppression.

Possible implementation shapes:

- preserve `no_sanitize_address` on annotated globals without requiring the
  runtime source file to be compiled under whole-module ASAN mode
- or treat external ASAN runtime bitcode as trusted support code, preserve
  item-level `no_sanitize_address` attributes, and drop/ignore the module-wide
  `nosanitize_address` flag for consumer instrumentation decisions

The important distinction is local exclusion vs. transitive exclusion. External
runtime bitcode needs local exclusion.

### 3. External ASAN should not require the stock hostcall ABI

Current AMDGPU ASAN instrumentation still introduces stock hostcall-related
kernel metadata even when the linked runtime hooks do not use hostcall.

Using the local-only support bitcode shape above, the final raw AMDGPU object
still contains:

```text
.value_kind: hidden_hostcall_buffer
.kernarg_segment_size: 272
.name: sample_checked_kernel
```

The source kernel only has two explicit pointer arguments. The hidden hostcall
buffer is stock-runtime ABI coupling. It is not needed by an external runtime
that reports through a different device-global, queue, signal, trap, or
launcher-owned feedback mechanism.
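A regression check for this coupling can be as simple as scanning the code object metadata for the hidden argument kind. The sketch below is illustrative: `has_hostcall_arg` is a hypothetical helper, and the sample text is the metadata excerpt quoted above rather than live tool output.

```bash
# Hypothetical helper: succeed if a metadata dump on stdin declares
# a hidden hostcall kernel argument.
has_hostcall_arg() {
  grep -q 'hidden_hostcall_buffer'
}

# Metadata excerpt copied from the issue text above.
metadata='.value_kind: hidden_hostcall_buffer
.kernarg_segment_size: 272
.name: sample_checked_kernel'

if printf '%s\n' "$metadata" | has_hostcall_arg; then
  hostcall_status="present"
else
  hostcall_status="absent"
fi
echo "hidden hostcall argument: $hostcall_status"
```

For a real object the dump could come from `llvm-readelf --notes`, assuming the AMDGPU metadata note is printed in the same textual form.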

The external-runtime mode should separate memory access check insertion from
the stock HIP/ROCm ASAN reporting transport.

In external mode:

- compiler-emitted memory checks still call `__asan_*` hooks
- stock hidden hostcall kernel arguments are not added unless explicitly needed
- stock hostcall metadata is not required for the code object to be considered valid
- the external runtime supplies any reporting globals, queues, or signals it needs

## Proposed LLVM Capability

Add an explicit external-runtime mode for AMDGPU AddressSanitizer device code.
Flag spelling is open; the semantic shape matters more than the exact name:

```text
-fsanitize=address
-fsanitize-stable-abi
-fgpu-address-sanitizer-runtime=external
```

or:

```text
-fsanitize=address
-fsanitize-address-gpu-runtime=external
```

The mode would:

- enable compiler insertion of ASAN memory access checks for AMDGPU device code
- allow non-XNACK targets when the external mode is explicit
- emit calls to the stable `__asan_*` hook ABI
- link or otherwise allow a caller-provided device runtime bitcode library
- preserve item-local no-sanitize state in the external runtime bitcode
- avoid stock hostcall hidden kernel arguments when external mode does not use them
- leave stock ROCm ASAN behavior unchanged when external mode is not selected

The runtime bitcode link mechanism should be a supported driver-level path, not
a user-visible dependency on cc1-only `-mlink-bitcode-file` choreography. The
cc1 flag is useful as a proof mechanism, but an external runtime feature needs
clear driver semantics.

## Hook ABI Surface

The minimal reproducer only needs:

```text
__asan_load1
__asan_store4
__asan_init
__asan_version_mismatch_check_v8
__asan_register_elf_globals
__asan_unregister_elf_globals
```

A complete external runtime must match the actual hook surface emitted by the
selected ASAN lowering mode, including fixed-width loads/stores, variable-width
accesses, report/noabort forms, poison/unpoison helpers, and any device-library
support symbols the frontend references.
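Whether a given runtime covers the required surface can be checked by comparing hook names against the runtime's defined symbols. The sketch below is a hedged illustration: `missing_hooks` is a hypothetical helper, the required list is the minimal set above, and the `defined_symbols` sample is invented for the example (it is not the symbol list of `runtime_bitcode_no_sanitize.c`).

```bash
# Minimal hook set from the list above, one name per line.
required_hooks='__asan_load1
__asan_store4
__asan_init
__asan_version_mismatch_check_v8
__asan_register_elf_globals
__asan_unregister_elf_globals'

# Hypothetical helper: print each required hook that does not appear in $1,
# a newline-separated list of symbol names defined by the runtime.
missing_hooks() {
  for hook in $required_hooks; do
    printf '%s\n' "$1" | grep -Fxq -- "$hook" || printf '%s\n' "$hook"
  done
}

# Invented sample: a runtime that forgot one hook.
defined_symbols='__asan_load1
__asan_store4
__asan_init
__asan_version_mismatch_check_v8
__asan_register_elf_globals
external_shadow_config'

missing=$(missing_hooks "$defined_symbols")
echo "missing hooks: ${missing:-none}"
```

With a real runtime, the defined-symbol list would come from a tool such as `llvm-nm --defined-only` run on the runtime bitcode, with the output reduced to bare symbol names.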

This does not necessarily require LLVM to invent a new ABI. The better shape is:

- external mode emits the same stable hook names where possible
- any stock-runtime-only hooks are documented or disabled in external mode
- the selected hook surface is testable without linking the stock runtime

The relationship with `-fsanitize-stable-abi` should be explicit. External GPU
ASAN likely wants that mode required or implied; unstable private ABI expansion
would make external runtimes brittle.

## Non-Goals

This is not a request to change stock ROCm ASAN semantics.

This is not a request to make stock ASAN work without XNACK.

This is not a request for LLVM to standardize one downstream shadow memory
layout, feedback ring, or report packet format.

This is not a request to support arbitrary legacy pointer behavior that only
works with replayable faults. External-runtime ASAN is for launchers/runtimes
that can make allocation ranges and shadow state explicit.

## Suggested Acceptance Criteria

Stock behavior remains unchanged without the new external-runtime option.

With the external-runtime option, compiling the sample kernel for `gfx1100`
with `-fsanitize=address -fsanitize-stable-abi` emits `__asan_load*` and
`__asan_store*` calls instead of warning that ASAN is ignored.

The emitted object can be linked against a caller-provided device runtime that
defines the required hooks.

Support bitcode can keep selected globals/functions uninstrumented without
disabling consumer instrumentation.

External mode does not add stock hostcall hidden kernel arguments when the
external runtime does not request them.

LLVM tests cover this with generic HIP/C/LLVM IR inputs independent of any
downstream runtime source tree.

## Current Confidence

Directly observed:

- stock clang ignores device ASAN for non-XNACK AMDGPU targets
- bypassing only that policy gate causes the sample kernel to emit `__asan_*` hook calls
- normal support bitcode loses per-global `no_sanitize_address`
- ASAN-compiled support bitcode preserves per-global `no_sanitize_address` but
  emits a transitive module flag
- removing only the module-wide flag gives the desired final object shape
- current ASAN lowering still emits `hidden_hostcall_buffer` metadata

Inferred:

- an upstream external-runtime mode can be mostly policy/linkage-shaped
- the backend does not need a new lowering strategy for explicit check calls
- stock ASAN can remain unchanged

Evidence that would change this conclusion:

- AMDGPU backend lowering failures for real external-runtime hook bodies
- code object loader requirements that make `hidden_hostcall_buffer` mandatory
  for any ASAN-marked kernel
- an LLVM module-flag reason that prevents local-only support bitcode semantics

None of those appeared in the reproducers above.

