Add GPU memory pressure notification and multi-level watermark eviction #3
Open
zyt600 wants to merge 25 commits into
Define UVM_WAIT_EVICTION_NOTICE and its userspace parameter structure to expose eviction target memory through UVM ioctl.
Implement gvm_send_eviction_notice and gvm_wait_eviction_notice to deliver per-GPU eviction signals via a spinlock-protected mailbox. Add a uuid field to the eviction_notice struct and change the ioctl param from IN-reserved to OUT so userspace receives the GPU identity.
When GPU memory usage exceeds the high watermark during chunk allocation, notify all processes proportionally to shrink their memory down to the low watermark target.
… debugfs filename
bytes_to_reclaim * process_current overflows NvU64 when both values are in the ~40GB range, producing a truncated quotient that makes process_target nearly equal to process_current instead of the intended low_watermark (85%) target. Use mul_u64_u64_div_u64() which computes the full 128-bit intermediate product before dividing.
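The overflow can be reproduced in user space. The sketch below is illustrative only: the function names and the divisor `total_current` are assumptions about the shape of the proportional formula, and `unsigned __int128` (a GCC/Clang extension) stands in for the kernel's mul_u64_u64_div_u64().

```c
#include <stdint.h>

/* Naive version: the 64-bit product wraps around before the division. */
static uint64_t share_naive(uint64_t bytes_to_reclaim,
                            uint64_t process_current,
                            uint64_t total_current)
{
    return bytes_to_reclaim * process_current / total_current;
}

/* Wide version: keep the full 128-bit product, then divide.
 * This mirrors what mul_u64_u64_div_u64() provides in the kernel. */
static uint64_t share_wide(uint64_t bytes_to_reclaim,
                           uint64_t process_current,
                           uint64_t total_current)
{
    unsigned __int128 product =
        (unsigned __int128)bytes_to_reclaim * process_current;
    return (uint64_t)(product / total_current);
}
```

With both factors at 40 GiB the true product is far above 2^64, so the naive version returns a share much smaller than the intended one, which leaves the process target close to its current usage.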
numFreePages64k and numFreePages2m describe the same physical memory at different granularities (one 2MB page = 32 x 64KB frames). Adding both caused available_bytes to be ~2x the actual free memory, making the high watermark trigger much later than configured. Use only numFreePages64k for the available memory calculation.
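A small arithmetic sketch of the double count, assuming the two counters are passed in as plain integers (the function names here are hypothetical, not the driver's):

```c
#include <stdint.h>

#define FRAME_64K (64ULL * 1024)
#define PAGE_2M   (2ULL * 1024 * 1024)

/* Buggy: the same free memory is counted once per granularity. */
static uint64_t available_double_counted(uint64_t free_64k, uint64_t free_2m)
{
    return free_64k * FRAME_64K + free_2m * PAGE_2M;
}

/* Fixed: the 64KB frame count alone already covers the free memory. */
static uint64_t available_bytes(uint64_t free_64k)
{
    return free_64k * FRAME_64K;
}
```

For example, 1024 fully free 2MB pages correspond to 32768 free 64KB frames: the fixed version reports 2 GiB, while the double-counted version reports 4 GiB.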
Reject setting high_watermark below low_watermark. Previously only the low side checked against high, so writing a small high value could create an invalid high < low state. Also fix the range check to use == 0 instead of <= 0 for unsigned int.
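A minimal sketch of the symmetric check, assuming the watermarks are percentages in the range 1..100 and that equal values are allowed (the struct and setter names are hypothetical):

```c
#include <errno.h>

struct watermarks {
    unsigned int high; /* percent, assumed range 1..100 */
    unsigned int low;  /* percent, assumed range 1..100 */
};

static int set_high_watermark(struct watermarks *wm, unsigned int value)
{
    /* value is unsigned, so "== 0" is the meaningful lower-bound test. */
    if (value == 0 || value > 100)
        return -EINVAL;
    if (value < wm->low) /* would create high < low */
        return -EINVAL;
    wm->high = value;
    return 0;
}

static int set_low_watermark(struct watermarks *wm, unsigned int value)
{
    if (value == 0 || value > 100)
        return -EINVAL;
    if (value > wm->high) /* would create low > high */
        return -EINVAL;
    wm->low = value;
    return 0;
}
```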
When GPU memory exceeds the high watermark, processes are notified to shrink voluntarily. After a configurable grace period, a delayed work item force-shrinks any process that has not complied. Notification frequency is throttled at the global level to avoid redundant work. Both grace_period_ms and notify_throttle_ms are exposed via debugfs with cross-validation (grace < throttle).
Add a new UVM_WAIT_AVAILABILITY_NOTICE ioctl that lets userspace block until GPU memory crosses back above the low watermark after eviction. Unify shrink and availability notification throttling into a single shared atomic_long_t timestamp with cmpxchg for concurrency safety.
- uvm_va_space.h: add availability mailbox struct
- uvm_va_space.c: initialize availability state in va_space_create
- uvm_ioctl.h: define UVM_WAIT_AVAILABILITY_NOTICE ioctl and params
- uvm.c: route the new ioctl to gvm_wait_availability_notice
- gvm_debugfs.h: declare send/notify/wait availability functions
- gvm_debugfs.c: implement availability send/notify/wait, replace per-function static throttle timestamps with a shared atomic_long_t using cmpxchg, remove redundant gvm_eviction_in_progress flag
- uvm_pmm_gpu.c: on chunk free, check if available memory exceeds the low watermark and fire the availability notification
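The shared-timestamp throttle can be sketched in user space with C11 atomics. This is an assumption-laden illustration, not the driver code: `last_notify_ms`, `throttle_try_acquire`, and the millisecond clock are hypothetical, and `atomic_compare_exchange_strong` plays the role of the kernel's cmpxchg on an atomic_long_t.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* One timestamp shared by all notification paths (zero-initialized). */
static atomic_long last_notify_ms;

/* Returns true if the caller may send a notification now.
 * Only one of several racing callers wins the compare-and-swap;
 * the others see a changed timestamp and back off. */
static bool throttle_try_acquire(long now_ms, long throttle_ms)
{
    long last = atomic_load(&last_notify_ms);

    if (now_ms - last < throttle_ms)
        return false; /* still inside the throttle window */

    return atomic_compare_exchange_strong(&last_notify_ms, &last, now_ms);
}
```

Because both shrink and availability notifications would go through the same timestamp, a burst of either kind is rate-limited together.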
current_memory only counted physical GPU memory (memory_current), causing userspace to underestimate pressure and continue allocating. Add memory_swap_current so the reported usage reflects the true footprint.
Mirrors the existing memory.limit.high interface to provide a lower memory protection threshold. Also renames the backing field from memory_limit to memory_limit_high for naming consistency.
Add per-process per-GPU memory.limit.min interface mirroring memory.limit.low. Validate on write that high >= low >= min, returning -EINVAL on violation.
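The ordering check can be sketched as a single write path that validates the would-be state before committing it. The struct, enum, and function names below are hypothetical; only the high >= low >= min invariant and the -EINVAL return come from the description above.

```c
#include <errno.h>
#include <stdint.h>

struct mem_limits {
    uint64_t high;
    uint64_t low;
    uint64_t min;
};

enum limit_kind { LIMIT_HIGH, LIMIT_LOW, LIMIT_MIN };

static int limits_write(struct mem_limits *limits, enum limit_kind kind,
                        uint64_t value)
{
    struct mem_limits next = *limits; /* validate a copy first */

    switch (kind) {
    case LIMIT_HIGH: next.high = value; break;
    case LIMIT_LOW:  next.low = value;  break;
    case LIMIT_MIN:  next.min = value;  break;
    default:         return -EINVAL;
    }

    /* Enforce high >= low >= min on the resulting state. */
    if (next.high < next.low || next.low < next.min)
        return -EINVAL;

    *limits = next;
    return 0;
}
```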
Two-phase reclaim in gvm_notify_all_processes_to_shrink: first reclaim above low proportionally, then dip into low-to-min range only if needed. Clamp force_shrink target to memory_limit_min. In pick_and_evict_root_chunk, skip chunks belonging to processes already at min (with retry up to root_chunks.count), and rename memory_limit to memory_limit_high. Clarify target_memory/current_memory semantics across the eviction notice chain.
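One way to picture the two-phase split is the user-space sketch below. It is a guess at the arithmetic, not the code in gvm_notify_all_processes_to_shrink: the `struct proc` layout, the helper names, and the proportional rule for the second phase are all assumptions, and it presumes low >= min for every process.

```c
#include <stddef.h>
#include <stdint.h>

struct proc {
    uint64_t current; /* physical GPU bytes in use */
    uint64_t low;     /* memory.limit.low */
    uint64_t min;     /* memory.limit.min */
    uint64_t target;  /* output: bytes the process should shrink to */
};

static uint64_t excess(uint64_t value, uint64_t base)
{
    return value > base ? value - base : 0;
}

static uint64_t min_u64(uint64_t a, uint64_t b)
{
    return a < b ? a : b;
}

/* part * want / pool with a 128-bit intermediate; pool must be non-zero. */
static uint64_t scale(uint64_t part, uint64_t want, uint64_t pool)
{
    return (uint64_t)((unsigned __int128)part * want / pool);
}

static void plan_shrink(struct proc *p, size_t n, uint64_t bytes_to_reclaim)
{
    uint64_t above_low = 0, low_to_min = 0;
    size_t i;

    /* Size the two pools: usage above low, and usage between min and low. */
    for (i = 0; i < n; i++) {
        above_low += excess(p[i].current, p[i].low);
        low_to_min += excess(min_u64(p[i].current, p[i].low), p[i].min);
    }

    /* Phase 1 takes from the above-low pool; phase 2 only covers the rest. */
    uint64_t phase1 = min_u64(bytes_to_reclaim, above_low);
    uint64_t phase2 = min_u64(bytes_to_reclaim - phase1, low_to_min);

    for (i = 0; i < n; i++) {
        uint64_t take = 0;

        if (above_low)
            take += scale(excess(p[i].current, p[i].low), phase1, above_low);
        if (low_to_min)
            take += scale(excess(min_u64(p[i].current, p[i].low), p[i].min),
                          phase2, low_to_min);

        p[i].target = p[i].current - take; /* never drops below min */
    }
}
```

In this sketch a process already at or below its min contributes nothing to either pool, so its target equals its current usage.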
Move current_physical_mem == 0 check before limit/reclaim calculations to avoid unnecessary work. Add reclaim_physical_mem == 0 check to skip processes that have nothing to reclaim.
When a block is killed (cudaFree / VA space teardown), pages that had been evicted from GPU to CPU were charged to memory.swap.current but never uncharged, causing the swap counter to leak indefinitely. Uncharge swap for all evicted pages before block_destroy_gpu_state clears the evicted mask.
Switch to a single wait ioctl and shared wait queue so user space can block on one notification channel while preserving typed payloads.
Use an atomic counter (notice_listener_count) incremented/decremented around wait_event_interruptible in gvm_wait_notice to track the number of programs actively waiting for notices. In broadcast_availability, divide available_bytes by this count so each listener is notified with its fair share instead of the full free memory amount.
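A user-space sketch of the fair-share arithmetic using C11 atomics. The helper names are hypothetical, and returning 0 when nobody is listening is an assumption made here to avoid dividing by zero; the actual handling in broadcast_availability is not shown in this description.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Number of programs currently blocked waiting for a notice. */
static atomic_int notice_listener_count;

/* Called just before a waiter blocks. */
static void listener_enter(void)
{
    atomic_fetch_add(&notice_listener_count, 1);
}

/* Called right after a waiter wakes up or is interrupted. */
static void listener_leave(void)
{
    atomic_fetch_sub(&notice_listener_count, 1);
}

/* Bytes to advertise to each listener. */
static uint64_t fair_share(uint64_t available_bytes)
{
    int listeners = atomic_load(&notice_listener_count);

    if (listeners <= 0)
        return 0; /* assumption: nothing to announce without listeners */
    return available_bytes / (uint64_t)listeners;
}
```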
When GPU memory utilization exceeds the high watermark, evict to the average of high and low watermarks instead of all the way down to the low watermark, reducing unnecessary over-eviction.
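As a quick arithmetic sketch, assuming both watermarks are expressed as percentages of total GPU memory (the function name and signature are hypothetical):

```c
#include <stdint.h>

/* Usage level to evict down to: the midpoint of the two watermarks.
 * (high_pct + low_pct) / 2 percent is computed as a fraction over 200
 * so an odd sum does not lose half a percent to integer division. */
static uint64_t eviction_target_bytes(uint64_t total_bytes,
                                      unsigned int high_pct,
                                      unsigned int low_pct)
{
    return total_bytes * (high_pct + low_pct) / 200;
}
```

With a low watermark of 85% and, say, a high watermark of 95%, eviction stops at 90% of total memory rather than continuing down to 85%.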
The _low and _min variants already have explicit suffixes; align the _high variant to the same naming convention.
Summary
Add a kernel-to-userspace GPU memory pressure notification mechanism with multi-level watermark-based eviction control, enabling cooperative memory management between the kernel driver and user-space applications.
Key changes
memory.limit with three cgroup-style controls: memory.limit.high, memory.limit.low, and memory.limit.min
UVM_WAIT_NOTICE ioctl. A force-shrink delayed-work fallback ensures reclaim happens even if user space does not respond in time.
block_kill