rocr: Drain ASAN quarantine on runtime teardown #7764
Open
ApurvMishra-amd wants to merge 2 commits into
Pull request overview
Drains the AddressSanitizer (AMDGPU) device allocator quarantine during ROCr runtime teardown to prevent stale/dangling device chunks from surviving past Runtime::Unload(), which can later trip hsa_amd_pointer_info().
Changes:
- Forward-declare __sanitizer_purge_allocator() behind SANITIZER_AMDGPU.
- Invoke __sanitizer_purge_allocator() in Runtime::Release() immediately before Unload().
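A minimal sketch of the guarded forward declaration and call ordering listed above. It assumes a heavily simplified, hypothetical stand-in for ROCr's Runtime class: the class shape, the trace_ member, and the Unload() body are illustrative only. __sanitizer_purge_allocator() itself is a real sanitizer interface (also declared in <sanitizer/allocator_interface.h>). Outside SANITIZER_AMDGPU builds the call compiles away and Release() reduces to Unload().

```cpp
#include <string>
#include <vector>

#if defined(SANITIZER_AMDGPU)
// Provided by the ASan runtime; forward-declared only in sanitizer builds.
extern "C" void __sanitizer_purge_allocator(void);
#endif

// Hypothetical, simplified stand-in for ROCr's Runtime. Only the ordering
// inside Release() is meant to mirror the change.
class Runtime {
 public:
  void Release() {
#if defined(SANITIZER_AMDGPU)
    // Drain the quarantine first: device memory is still mapped here, so the
    // deferred frees (and ROCr's deallocation notifiers) see valid chunks.
    __sanitizer_purge_allocator();
    trace_.push_back("purge");
#endif
    Unload();
  }

  const std::vector<std::string>& trace() const { return trace_; }

 private:
  void Unload() {
    // The real teardown unmaps device memory; nothing may remain in the
    // quarantine after this point.
    trace_.push_back("Unload");
  }

  std::vector<std::string> trace_;  // records teardown steps in order
};
```

The purge has to come before Unload() rather than after: once device memory is unmapped, any chunk still parked in the quarantine becomes the dangling chunk that a later hsa_amd_pointer_info() trips over.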
Call __sanitizer_purge_allocator() in Runtime::Release() before Unload() so the ASAN quarantine is drained while device memory is still mapped. Guarded by SANITIZER_AMDGPU.
Signed-off-by: Apurv Mishra <Apurv.Mishra@amd.com>

Drain the ASAN quarantine with __sanitizer_purge_allocator() so the deferred release completes before each check (ROCRTST_ASAN-guarded). The recursive and double-free notifier callbacks run while the quarantine is already being drained, and the drain cannot be invoked again from inside itself, so their inner repeated-callback checks are skipped under ASAN. The outer notifier behavior is still validated.
Signed-off-by: Apurv Mishra <Apurv.Mishra@amd.com>
Force-pushed from 6854609 to 3cbda99.
dayatsin-amd approved these changes on Jun 25, 2026.
Drain ASan quarantine on runtime teardown to avoid stale device chunks
Motivation
On full teardown, Unload() unmaps ROCr device memory that is still held in the ASAN device allocator's quarantine, leaving dangling chunks in the sanitizer's process-global allocator. A later hsa_amd_pointer_info() then reads an uninitialized chunk header and aborts in DeviceAllocatorT::GetBlockBegin.

Under ASAN, hsa_amd_memory_pool_free does not release the memory immediately. The sanitizer holds it in a quarantine and performs the real release (and runs ROCr's deallocation notifier) later. Tests that check the result of a free right after calling it therefore fail: Memory_Available sees the freed VRAM as still in use, and Deallocation_Notifier sees its callback as not yet run.

Technical Details
Call __sanitizer_purge_allocator() in Runtime::Release() before Unload() so the quarantine is drained while device memory is still mapped. Guarded by SANITIZER_AMDGPU.

In the tests, drain the quarantine with __sanitizer_purge_allocator() so the deferred release completes before each check (ROCRTST_ASAN-guarded; no-op otherwise).
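A sketch of the test-side pattern, assuming a hypothetical helper name DrainAsanQuarantine() and a toy ToyPool that models deferred release. Neither is taken from rocrtst, and ToyPool does not reflect how ASan's allocator works internally; it only shows why a check placed right after a free fails until the quarantine is drained.

```cpp
#include <cstddef>
#include <vector>

#if defined(ROCRTST_ASAN)
extern "C" void __sanitizer_purge_allocator(void);
#endif

// Hypothetical test helper: drains the ASan quarantine in ROCRTST_ASAN
// builds and compiles to a no-op otherwise.
inline void DrainAsanQuarantine() {
#if defined(ROCRTST_ASAN)
  __sanitizer_purge_allocator();
#endif
}

// Toy model (not ASan's allocator): Free() only parks the chunk in a
// quarantine; the real release and the deallocation notifier run on Purge().
struct ToyPool {
  std::size_t in_use = 0;
  int notifier_calls = 0;
  std::vector<std::size_t> quarantine;

  void Allocate(std::size_t bytes) { in_use += bytes; }
  void Free(std::size_t bytes) { quarantine.push_back(bytes); }  // deferred
  void Purge() {
    for (std::size_t bytes : quarantine) {
      in_use -= bytes;   // memory becomes available again
      ++notifier_calls;  // notifier fires now, not at Free()
    }
    quarantine.clear();
  }
};
```

Against ToyPool, a check of in_use or notifier_calls made right after Free() sees stale values, mirroring the Memory_Available and Deallocation_Notifier failures. In a real test, the drain would sit between hsa_amd_memory_pool_free and the check, and costs nothing in non-ASan builds.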
The recursive and double-free notifier callbacks run while the quarantine is
already being drained, and the drain cannot be invoked again from inside
itself, so their inner same-callback checks are skipped under ASAN. The outer
notifier behavior is still validated.
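A toy model of the reentrancy constraint, assuming a hypothetical ToyDrainer whose Drain() refuses to start while another drain is already running. This stands in for the quarantine behavior described above, not for ASan's implementation. It shows the split: the outer callback is validated in every build, while the inner check runs only when no nested drain would be needed.

```cpp
// Toy drainer (hypothetical): a drain that cannot be started from inside a
// callback that a drain is currently running.
class ToyDrainer {
 public:
  template <class Callback>
  bool Drain(Callback&& on_release) {
    if (draining_) return false;  // re-entrant drain refused
    draining_ = true;
    ++drains_;
    on_release();  // e.g. a deallocation notifier
    draining_ = false;
    return true;
  }
  int drains() const { return drains_; }

 private:
  bool draining_ = false;
  int drains_ = 0;
};

struct NotifierResult {
  bool outer_ran = false;             // outer notifier fired
  int inner_checks = 0;               // inner same-callback checks performed
  bool nested_drain_refused = false;  // a nested drain was attempted and refused
};

// Recursive-notifier case (hypothetical harness).
NotifierResult RunRecursiveNotifierCase(ToyDrainer& drainer, bool asan_build) {
  NotifierResult r;
  drainer.Drain([&] {
    r.outer_ran = true;  // validated in every build
    if (asan_build) {
      // The inner release would need a nested drain, which is refused, so the
      // inner check could never observe it: skip the check.
      r.nested_drain_refused = !drainer.Drain([] {});
    } else {
      ++r.inner_checks;  // non-ASan: the inner free completes immediately
    }
  });
  return r;
}
```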
JIRA ID
ROCM-26384, 26746
Test Plan
Running the full rocrtst suite.