
Add GPU memory pressure notification and multi-level watermark eviction #3

Open
zyt600 wants to merge 25 commits into ovg-project:main from zyt600:main

@zyt600 zyt600 commented Apr 8, 2026

Summary

Add a kernel-to-userspace GPU memory pressure notification mechanism with multi-level watermark-based eviction control, enabling cooperative memory management between the kernel driver and user-space applications.

Key changes

  • Multi-level memory limits: Replace the single memory.limit with three cgroup-style controls — memory.limit.high, memory.limit.low, and memory.limit.min
  • Eviction notice: When GPU memory pressure exceeds the high watermark, the kernel computes a proportional per-process reclaim target and sends an eviction notice to user-space via a new UVM_WAIT_NOTICE ioctl. A force-shrink delayed work fallback ensures reclaim happens even if user-space does not respond in time.
  • Availability notice: When GPU memory becomes available again (on chunk free, once available memory exceeds the low watermark), the kernel broadcasts an availability notice (the available memory, split equally among listeners) so that waiting processes can opportunistically allocate.
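  • Bug fixes during development: Fixed a swap charge leak on block kill, a 64-bit overflow in the proportional reclaim calculation, and double counting of free pages in the available-memory calculation.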

Define UVM_WAIT_EVICTION_NOTICE and its userspace parameter structure to expose eviction target memory through UVM ioctl.

Implement gvm_send_eviction_notice and gvm_wait_eviction_notice to
deliver per-GPU eviction signals via spinlock-protected mailbox.

Add uuid field to eviction_notice struct and change the ioctl param
from IN-reserved to OUT so userspace receives the GPU identity.

When GPU memory usage exceeds the high watermark during chunk
allocation, notify all processes proportionally to shrink their
memory down to the low watermark target.

bytes_to_reclaim * process_current overflows NvU64 when both values
are in the ~40GB range, producing a truncated quotient that makes
process_target nearly equal to process_current instead of the
intended low_watermark (85%) target.

Use mul_u64_u64_div_u64() which computes the full 128-bit
intermediate product before dividing.
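
To illustrate the overflow, here is a minimal userspace sketch, not the driver code. It assumes the per-process target is computed as process_current - bytes_to_reclaim * process_current / total_current (the divisor total_current is an assumption; the commit message only names the product). The helper mul_div_u64 is a stand-in for the kernel's mul_u64_u64_div_u64() and relies on the GCC/Clang unsigned __int128 extension.

```c
#include <assert.h>
#include <stdint.h>

/* Naive version: the 64-bit product wraps around before the division. */
static uint64_t target_naive(uint64_t bytes_to_reclaim,
                             uint64_t process_current,
                             uint64_t total_current)
{
    assert(total_current != 0);
    return process_current -
           (bytes_to_reclaim * process_current) / total_current;
}

/* Userspace stand-in for mul_u64_u64_div_u64(a, b, c): a * b / c with a
 * full 128-bit intermediate product (GCC/Clang extension). */
static uint64_t mul_div_u64(uint64_t a, uint64_t b, uint64_t c)
{
    return (uint64_t)(((unsigned __int128)a * b) / c);
}

/* Fixed version: the product is formed in 128 bits, then divided. */
static uint64_t target_wide(uint64_t bytes_to_reclaim,
                            uint64_t process_current,
                            uint64_t total_current)
{
    assert(total_current != 0);
    return process_current -
           mul_div_u64(bytes_to_reclaim, process_current, total_current);
}
```

With a single process holding 40 GiB and 6 GiB to reclaim, the 64-bit product 6 GiB * 40 GiB is exactly 15 * 2^64, so it wraps to zero: target_naive returns the unchanged 40 GiB, while target_wide returns the intended 34 GiB (85% of 40 GiB).
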
numFreePages64k and numFreePages2m describe the same physical memory at
different granularities (one 2MB page = 32 x 64KB frames). Adding both
caused available_bytes to be ~2x the actual free memory, making the
high watermark trigger much later than configured.

Use only numFreePages64k for the available memory calculation.
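
The double counting can be shown with a small sketch. The function names are hypothetical, and the example assumes the idealized case where all free memory sits in fully free 2MB pages, so every free byte is reported by both counters.

```c
#include <assert.h>
#include <stdint.h>

#define FRAME_64K_BYTES (64ULL * 1024)
#define PAGE_2M_BYTES   (2ULL * 1024 * 1024)

/* C11 compile-time check of the ratio quoted in the commit message. */
static_assert(PAGE_2M_BYTES / FRAME_64K_BYTES == 32,
              "one 2MB page spans 32 64KB frames");

/* Old behavior: both counters describe the same memory, so adding them
 * counts every byte inside a free 2MB page twice. */
static uint64_t available_bytes_double_counted(uint64_t num_free_64k,
                                               uint64_t num_free_2m)
{
    return num_free_64k * FRAME_64K_BYTES + num_free_2m * PAGE_2M_BYTES;
}

/* Fixed behavior: the 64KB counter alone already covers all free memory. */
static uint64_t available_bytes(uint64_t num_free_64k)
{
    return num_free_64k * FRAME_64K_BYTES;
}
```

For 10 GiB of free memory (5120 free 2MB pages, which is 163840 free 64KB frames), the old sum reports 20 GiB and the fixed calculation reports 10 GiB.
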
Reject setting high_watermark below low_watermark. Previously only the
low side checked against high, so writing a small high value could
create an invalid high < low state. Also fix the range check to use
== 0 instead of <= 0 for unsigned int.

When GPU memory exceeds the high watermark, processes are notified to
shrink voluntarily. After a configurable grace period, a delayed work
item force-shrinks any process that has not complied. Notification
frequency is throttled at the global level to avoid redundant work.

Both grace_period_ms and notify_throttle_ms are exposed via debugfs
with cross-validation (grace < throttle).

Add a new UVM_WAIT_AVAILABILITY_NOTICE ioctl that lets userspace block
until GPU memory crosses back above the low watermark after eviction.
Unify shrink and availability notification throttling into a single
shared atomic_long_t timestamp with cmpxchg for concurrency safety.

- uvm_va_space.h: add availability mailbox struct
- uvm_va_space.c: initialize availability state in va_space_create
- uvm_ioctl.h: define UVM_WAIT_AVAILABILITY_NOTICE ioctl and params
- uvm.c: route the new ioctl to gvm_wait_availability_notice
- gvm_debugfs.h: declare send/notify/wait availability functions
- gvm_debugfs.c: implement availability send/notify/wait, replace
  per-function static throttle timestamps with a shared atomic_long_t
  using cmpxchg, remove redundant gvm_eviction_in_progress flag
- uvm_pmm_gpu.c: on chunk free, check if available memory exceeds
  the low watermark and fire the availability notification
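
The shared-timestamp throttle can be sketched in userspace with C11 atomics. This is an illustration of the compare-and-exchange pattern only: throttle_try_claim is a hypothetical name, and millisecond timestamps are passed in by the caller where the driver would presumably read a kernel clock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Returns true if the caller may send a notification now. When several
 * threads race inside the same window, the compare-and-exchange lets
 * exactly one of them advance the shared timestamp; the others see a
 * changed value and back off. */
static bool throttle_try_claim(atomic_long *last_notify_ms,
                               long now_ms, long throttle_ms)
{
    long prev = atomic_load(last_notify_ms);

    assert(throttle_ms > 0);
    if (now_ms - prev < throttle_ms)
        return false;               /* still inside the throttle window */

    /* Succeeds only if nobody else updated the timestamp since the load. */
    return atomic_compare_exchange_strong(last_notify_ms, &prev, now_ms);
}
```

Because both the shrink and the availability paths would call the same helper on the same timestamp, a notification of either kind starts a new throttle window for both.
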
current_memory only counted physical GPU memory (memory_current), causing
userspace to underestimate pressure and continue allocating. Add
memory_swap_current so the reported usage reflects the true footprint.

Add a memory.limit.low interface that mirrors the existing
memory.limit.high interface to provide a lower memory protection
threshold. Also rename the backing field from memory_limit to
memory_limit_high for naming consistency.

Add per-process per-GPU memory.limit.min interface mirroring
memory.limit.low. Validate on write that high >= low >= min,
returning -EINVAL on violation.
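
A minimal sketch of the write-time validation, assuming a hypothetical struct gpu_mem_limits and setter (the driver's actual field and function names may differ). The candidate value is applied to a copy first so that a rejected write leaves the stored limits untouched.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the three per-process per-GPU limits. */
struct gpu_mem_limits {
    uint64_t high;  /* memory.limit.high */
    uint64_t low;   /* memory.limit.low  */
    uint64_t min;   /* memory.limit.min  */
};

enum limit_kind { LIMIT_HIGH, LIMIT_LOW, LIMIT_MIN };

/* Returns 0 on success, -EINVAL if the write would break
 * high >= low >= min. */
static int limits_set(struct gpu_mem_limits *l, enum limit_kind kind,
                      uint64_t value)
{
    struct gpu_mem_limits next;

    assert(l != NULL);
    next = *l;
    switch (kind) {
    case LIMIT_HIGH: next.high = value; break;
    case LIMIT_LOW:  next.low  = value; break;
    case LIMIT_MIN:  next.min  = value; break;
    default:         return -EINVAL;
    }

    if (next.high < next.low || next.low < next.min)
        return -EINVAL;

    *l = next;  /* commit only after the ordering check passed */
    return 0;
}
```

Checking the full ordering on every write, regardless of which limit is being set, avoids the one-sided validation gap described for the watermarks earlier in this series.
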
Two-phase reclaim in gvm_notify_all_processes_to_shrink: first reclaim
above low proportionally, then dip into low-to-min range only if needed.
Clamp force_shrink target to memory_limit_min. In pick_and_evict_root_chunk,
skip chunks belonging to processes already at min (with retry up to
root_chunks.count), and rename memory_limit to memory_limit_high. Clarify
target_memory/current_memory semantics across the eviction notice chain.
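
One way to read the two-phase policy is sketched below. This is an interpretation under stated assumptions, not the code in gvm_notify_all_processes_to_shrink: struct proc_mem and plan_reclaim are hypothetical, each phase is assumed to be split proportionally to the memory a process holds in that band, and the 128-bit helper uses the GCC/Clang unsigned __int128 extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-process view; all values in bytes. */
struct proc_mem {
    uint64_t current;  /* physical GPU memory in use */
    uint64_t low;      /* memory.limit.low */
    uint64_t min;      /* memory.limit.min */
    uint64_t target;   /* output: usage the process is asked to shrink to */
};

static uint64_t min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }

/* Amount by which value exceeds floor, or 0. */
static uint64_t above(uint64_t value, uint64_t floor)
{
    return value > floor ? value - floor : 0;
}

/* a * b / c with a 128-bit intermediate product. */
static uint64_t mul_div(uint64_t a, uint64_t b, uint64_t c)
{
    return (uint64_t)(((unsigned __int128)a * b) / c);
}

static void plan_reclaim(struct proc_mem *p, size_t n, uint64_t need)
{
    uint64_t pool_above_low = 0, pool_low_to_min = 0;
    uint64_t phase1, phase2;
    size_t i;

    assert(p != NULL || n == 0);
    for (i = 0; i < n; i++) {
        pool_above_low  += above(p[i].current, p[i].low);
        pool_low_to_min += above(min_u64(p[i].current, p[i].low), p[i].min);
    }

    /* Phase 1 takes from memory above low; phase 2 dips into the
     * low-to-min band only for whatever phase 1 could not cover. */
    phase1 = min_u64(need, pool_above_low);
    phase2 = min_u64(need - phase1, pool_low_to_min);

    for (i = 0; i < n; i++) {
        uint64_t cut = 0;

        if (pool_above_low)
            cut += mul_div(phase1, above(p[i].current, p[i].low),
                           pool_above_low);
        if (pool_low_to_min)
            cut += mul_div(phase2,
                           above(min_u64(p[i].current, p[i].low), p[i].min),
                           pool_low_to_min);
        p[i].target = p[i].current - cut;  /* never drops below min */
    }
}
```

For example, with process A at current 100, low 60, min 40 and process B at current 50, low 40, min 20, a request for 25 is covered entirely above low (targets 80 and 45), while a request for 70 exhausts the 50 above low and takes the remaining 20 from the low-to-min band (targets 50 and 30).
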
Move current_physical_mem == 0 check before limit/reclaim calculations
to avoid unnecessary work. Add reclaim_physical_mem == 0 check to skip
processes that have nothing to reclaim.

When a block is killed (cudaFree / VA space teardown), pages that had
been evicted from GPU to CPU were charged to memory.swap.current but
never uncharged, causing the swap counter to leak indefinitely. Uncharge
swap for all evicted pages before block_destroy_gpu_state clears the
evicted mask.
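
The ordering matters because the evicted mask is the only record of how many pages were charged. A simplified sketch, assuming a hypothetical 64-page block with a single 64-bit mask, a 4KB page size, and a plain counter standing in for memory.swap.current (__builtin_popcountll is a GCC/Clang builtin):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_BYTES 4096ULL  /* assumed page size for this sketch */

/* Hypothetical, heavily simplified block state. */
struct block {
    uint64_t evicted_mask;  /* bit i set: page i was evicted from GPU to CPU */
};

/* Stand-in for the memory.swap.current counter. */
struct swap_counter {
    uint64_t current;
};

static void block_kill(struct block *b, struct swap_counter *swap)
{
    uint64_t charged =
        (uint64_t)__builtin_popcountll(b->evicted_mask) * PAGE_BYTES;

    /* Uncharge while the mask still says which pages were charged ... */
    assert(swap->current >= charged);
    swap->current -= charged;

    /* ... and only then clear it, as the GPU state teardown would. */
    b->evicted_mask = 0;
}
```

If the mask were cleared first, the page count would read zero and the charge would stay on the counter, which matches the leak described above.
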
Switch to a single wait ioctl and shared wait queue so user space can block on one notification channel while preserving typed payloads.

Use an atomic counter (notice_listener_count) incremented/decremented
around wait_event_interruptible in gvm_wait_notice to track the number
of programs actively waiting for notices. In broadcast_availability,
divide available_bytes by this count so each listener is notified with
its fair share instead of the full free memory amount.
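
The fair-share split can be sketched with C11 atomics. The helper names are hypothetical, and returning 0 when nobody is listening is an assumption of this sketch (the commit message does not say what the driver does in that case); it mainly guards the division.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Called just before a waiter blocks for a notice. */
static void listener_enter(atomic_int *count)
{
    atomic_fetch_add(count, 1);
}

/* Called right after the waiter wakes up or is interrupted. */
static void listener_exit(atomic_int *count)
{
    int prev = atomic_fetch_sub(count, 1);

    assert(prev > 0);  /* every exit must pair with an earlier enter */
    (void)prev;
}

/* Bytes advertised to each listener in an availability broadcast. */
static uint64_t availability_share(uint64_t available_bytes,
                                   atomic_int *count)
{
    int listeners = atomic_load(count);

    if (listeners <= 0)
        return 0;  /* assumption: nothing to advertise without listeners */
    return available_bytes / (uint64_t)listeners;
}
```

With 8 GiB available and two waiting processes, each one is told about 4 GiB, so two listeners that both act on the notice do not jointly overshoot the free memory.
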
When GPU memory utilization exceeds the high watermark, evict to the
average of high and low watermarks instead of all the way down to the
low watermark, reducing unnecessary over-eviction.
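
As a worked example of the midpoint target, the sketch below assumes the watermarks are expressed as percentages of total GPU memory. The 85% low watermark appears earlier in this series; the function name and the 95% high watermark used in the example are illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Usage (in bytes) to evict down to: the average of the high and low
 * watermarks, applied to total memory. Written as one division by 200
 * so the midpoint percentage itself is not rounded first; the product
 * total_bytes * 200 fits in 64 bits for any realistic GPU memory size. */
static uint64_t eviction_target_bytes(uint64_t total_bytes,
                                      unsigned high_pct, unsigned low_pct)
{
    assert(low_pct <= high_pct && high_pct <= 100);
    return total_bytes * (high_pct + low_pct) / 200;
}
```

For an 80 GiB GPU with watermarks at 95% and 85%, the target is 72 GiB (90%) rather than 68 GiB (85%), so 4 GiB less is evicted per pressure event.
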
The _low and _min variants already have explicit suffixes; align the
_high variant to the same naming convention.