
rocr: Close shareable dmabuf fds after use instead of holding them open #7786

Open
dayatsin-amd wants to merge 1 commit into develop from users/dayatsin/close-dmabuf-handles

Conversation

@dayatsin-amd

Contributor

Summary

  • Stop holding a shareable dmabuf_fd open for the entire lifetime of a VMM shareable handle; CreateShareableHandle now leaves dmabuf_fd as -1 after converting the allocation to a driver handle.
  • Lazily export a dmabuf fd in VMemorySetAccessPerHandle when peer import needs it, then close it again via a scope guard before returning.
  • Pass DriverMemoryHandle* through ImportMemoryHandle (reading .dmabuf_fd or .fabric_handle) and close fds in MemoryHandle destructor / DestroyMemoryHandle.

Fabric-handle export/import is unchanged and continues to use the internal BO handle path.
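The lazy-export pattern described in the summary can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `FakeHandle`, `FakeExportDmaBuf`, `FakeCloseDmaBuf`, `ScopeGuard`, and `SetAccessPerHandle` are hypothetical stand-ins, and the "fd" is a counter rather than a real dmabuf fd.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a driver handle; -1 means "no fd currently open".
struct FakeHandle {
  int dmabuf_fd = -1;
};

static int g_open_fds = 0;  // number of fake fds currently open
static int g_next_fd = 100;

// Assumed helpers that imitate an export ioctl and a close call.
int FakeExportDmaBuf() { ++g_open_fds; return g_next_fd++; }
void FakeCloseDmaBuf(int fd) { if (fd >= 0) --g_open_fds; }

// Minimal scope guard: runs the stored callable when it leaves scope.
class ScopeGuard {
 public:
  explicit ScopeGuard(std::function<void()> f) : f_(std::move(f)) {}
  ~ScopeGuard() { if (f_) f_(); }
  ScopeGuard(const ScopeGuard&) = delete;
  ScopeGuard& operator=(const ScopeGuard&) = delete;

 private:
  std::function<void()> f_;
};

// Exports an fd only if none is open, lets each peer "import" it, and closes
// the fd on every exit path if this call created it.
// Returns the number of peers that saw a valid fd.
int SetAccessPerHandle(FakeHandle& h, const std::vector<int>& peers) {
  bool created_dmabuf_fd = false;
  if (h.dmabuf_fd < 0) {
    h.dmabuf_fd = FakeExportDmaBuf();
    created_dmabuf_fd = true;
  }
  ScopeGuard guard([&] {
    if (created_dmabuf_fd) {
      FakeCloseDmaBuf(h.dmabuf_fd);
      h.dmabuf_fd = -1;
    }
  });

  int imported = 0;
  for (int peer : peers) {
    (void)peer;  // a real import would pass h.dmabuf_fd to the peer agent
    if (h.dmabuf_fd >= 0) ++imported;
  }
  return imported;
}
```

After a call such as `SetAccessPerHandle(h, {1, 2, 3})`, the handle is back to `dmabuf_fd == -1` and no fake fd remains open, which is the property the summary is after.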

Test plan

  • Built libhsa-runtime64 on S83-1 (MI300X) against TheRock toolchain
  • Unit_hipMemExportToShareableHandle_Positive_Basic — pass
  • Unit_hipMemImportFromShareableHandle_Positive_Basic — pass
  • Full VirtualMemoryManagementTest suite (85 cases, HIP_VISIBLE_DEVICES=0): 69 passed / 6 failed / 10 skipped
  • A/B vs pre-patch baseline ROCr: same 6 failures in full suite; all 6 previously-failing tests pass in isolation on both builds — failures are not regressions from this change (suite-order / VA exhaustion)

Made with Cursor

Defer creating the shareable dmabuf_fd until VMemorySetAccessPerHandle
needs it for peer import, then close it again via a scope guard. This
avoids leaking open fds for the lifetime of a VMM shareable handle while
keeping fabric-handle export/import on the existing BO-based path.

Co-authored-by: Cursor <cursoragent@cursor.com>

@cfreeamd cfreeamd left a comment

Contributor


I don't see any big problems. I have a few suggestions for consideration, but nothing blocking.

@@ -4148,6 +4153,11 @@ Runtime::MemoryHandle::MemoryHandle(hsa_fabric_handle_t fabric_handle)
Runtime::MemoryHandle::~MemoryHandle() {
  if (driver_handle.handle != 0 && region != nullptr)
    agentOwner()->driver().DestroyMemoryHandle(&driver_handle);

Contributor


Suggestion from Claude related to line 4155 and the block at 4157. In a nutshell, it says that if we derive from the base class in the future, we may miss the cleanup that happens here.

~MemoryHandle dual-cleanup coupling — runtime.cpp:4153

The destructor now has two cleanup sites:
  if (driver_handle.handle != 0 && region != nullptr)
    agentOwner()->driver().DestroyMemoryHandle(&driver_handle);  // (A)

  if (driver_handle.dmabuf_fd >= 0) {                             // (B)
    os::DmaBufClose(driver_handle.dmabuf_fd);
    driver_handle.dmabuf_fd = -1;
  }

This is safe today because KfdDriver::DestroyMemoryHandle now resets dmabuf_fd before returning (added in this PR), so block (B) won't fire for
owned handles. But the two cleanup sites are coupled through an informal convention: any future Driver subclass that implements
DestroyMemoryHandle without resetting dmabuf_fd will silently double-close the fd, a latent fd-aliasing bug with no compiler enforcement.

Suggestion: Either make the base class reset dmabuf_fd after the virtual call, or merge the dmabuf_fd close into DestroyMemoryHandle entirely and
remove block (B), with a brief comment explaining the imported-handle path.
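One way to picture the first option (the base class resets the fd after the virtual call) is a non-virtual entry point wrapping a virtual hook. The sketch below is illustrative only; `DriverSketch`, `DoDestroyMemoryHandle`, `ForgetfulDriver`, `CloseFdSketch`, and `CleanupSketch` are hypothetical names, and the real Driver interface may look different.

```cpp
#include <cassert>

// Hypothetical, simplified handle: handle != 0 means "owned driver handle".
struct DriverMemoryHandleSketch {
  unsigned long handle = 0;
  int dmabuf_fd = -1;
};

static int g_close_calls = 0;  // counts closes so a double-close is visible
void CloseFdSketch(int fd) { (void)fd; ++g_close_calls; }

class DriverSketch {
 public:
  virtual ~DriverSketch() = default;

  // Non-virtual entry point: the reset happens here for every subclass,
  // so it no longer depends on each override remembering to do it.
  void DestroyMemoryHandle(DriverMemoryHandleSketch* h) {
    DoDestroyMemoryHandle(h);
    h->dmabuf_fd = -1;
    h->handle = 0;
  }

 protected:
  virtual void DoDestroyMemoryHandle(DriverMemoryHandleSketch* h) = 0;
};

// A subclass that closes the fd but forgets to reset the field.
class ForgetfulDriver : public DriverSketch {
 protected:
  void DoDestroyMemoryHandle(DriverMemoryHandleSketch* h) override {
    if (h->dmabuf_fd >= 0) CloseFdSketch(h->dmabuf_fd);
  }
};

// Mirrors the two cleanup sites (A) and (B) from the destructor.
void CleanupSketch(DriverSketch& d, DriverMemoryHandleSketch& h) {
  if (h.handle != 0) d.DestroyMemoryHandle(&h);  // (A) owned handles
  if (h.dmabuf_fd >= 0) {                        // (B) imported handles
    CloseFdSketch(h.dmabuf_fd);
    h.dmabuf_fd = -1;
  }
}
```

With this shape, an owned handle is closed once even though `ForgetfulDriver` never resets the field, and an imported handle (no driver handle, only an fd) is still closed by block (B).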

* holding an fd open for the lifetime of the handle. Export it lazily here so the target agents
* can import the memory below, then close it again before returning.
*/
bool created_dmabuf_fd = false;

Contributor


I'm not sure how often a single MemoryHandle is mapped across multiple VA chunks--if not often, then this probably isn't worth worrying about, but here's Claude feedback on this:

[MINOR] Per-chunk re-export amplification — runtime.cpp:4166

When a single MemoryHandle is mapped across multiple VA chunks (e.g., the same handle mapped twice via VMemoryHandleMap), VMemorySetAccess calls
VMemorySetAccessPerHandle once per chunk. Each call finds dmabuf_fd == -1 (the scope guard reset it) and does a fresh ExportMemoryHandle → kernel
ioctl. Cost becomes O(chunks × agents) rather than O(agents). Not a correctness issue, but worth lifting the lazy-export out to the per-handle
level in VMemorySetAccess if this path gets hot.
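To make the amplification concrete, the sketch below counts exports when the export and close happen inside each per-chunk call versus once per handle around the chunk loop. It is a toy model under assumed names (`HandleSketch`, `ExportStats`, `ExportSketch`, `CloseSketch`, and both `SetAccess...` functions are hypothetical) and does not reflect how VMemorySetAccess is actually structured.

```cpp
#include <cassert>

struct HandleSketch {
  int dmabuf_fd = -1;  // -1 means "no fd currently open"
};

struct ExportStats {
  int exports = 0;  // stands in for export ioctls issued
  int closes = 0;
};

// Assumed stand-ins for the export and close operations.
int ExportSketch(ExportStats& s) { ++s.exports; return 100 + s.exports; }
void CloseSketch(ExportStats& s, HandleSketch& h) {
  ++s.closes;
  h.dmabuf_fd = -1;
}

// Per-chunk pattern: every chunk finds dmabuf_fd == -1 and re-exports.
void SetAccessPerChunkExport(HandleSketch& h, int chunks, ExportStats& s) {
  for (int c = 0; c < chunks; ++c) {
    h.dmabuf_fd = ExportSketch(s);
    // ... import into the target agents using h.dmabuf_fd ...
    CloseSketch(s, h);
  }
}

// Hoisted pattern: export once per handle, reuse the fd for all chunks.
void SetAccessHoistedExport(HandleSketch& h, int chunks, ExportStats& s) {
  h.dmabuf_fd = ExportSketch(s);
  for (int c = 0; c < chunks; ++c) {
    // ... import into the target agents using h.dmabuf_fd ...
  }
  CloseSketch(s, h);
}
```

For a handle mapped across four chunks, the per-chunk version records four exports and the hoisted version records one; both leave `dmabuf_fd` at -1 on return.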
