Perf script utility to lock gpu frequency. #2977
dingqingy-nv wants to merge 8 commits into NVIDIA-NeMo:main
Signed-off-by: Dingqing Yang <[email protected]>
📝 Walkthrough
These changes add GPU frequency locking capability to the performance experiment setup system. A new CLI argument accepts an optional GPU graphics clock frequency in MHz, which is threaded through the setup pipeline and applied via Slurm.
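As a rough illustration of the CLI argument described in the walkthrough, the sketch below shows how an optional MHz value could be declared with `argparse`. Only the flag names `-lgc`/`--lock_gpu_freq` come from this PR; the function name `build_parser`, the `metavar`, and the help text are assumptions made for the example, not the code in `argument_parser.py`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser; only the flag names are taken from the PR summary."""
    parser = argparse.ArgumentParser(description="perf experiment setup (sketch)")
    parser.add_argument(
        "-lgc",
        "--lock_gpu_freq",
        type=int,
        default=None,  # None means: leave GPU clocks unlocked
        metavar="MHZ",
        help="Lock the GPU graphics clock to this frequency in MHz.",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    # Downstream setup code would receive args.lock_gpu_freq (an int or None).
    print(args.lock_gpu_freq)
```

Defaulting to `None` keeps the feature opt-in, so existing invocations that never pass the flag would behave as before.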
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 2
❌ Failed checks (2 warnings)
✅ Passed checks (2 passed)
🧹 Nitpick comments (1)
scripts/performance/perf_plugins.py (1)
428-432: Consider distinct command labels when both features are enabled.

Both `_set_vboost` and `_set_lock_gpu_freq` use `"# Command 0:"` in their comments. If a user enables both features simultaneously, the setup script will have two commands labeled "Command 0", which could be confusing when debugging.

💡 Optional: Use descriptive labels instead of numeric indices

     lock_freq_cmd = "\n".join(
         [
             "",
    -        "# Command 0: lock GPU graphics clock",
    +        "# Setup: lock GPU graphics clock",
             " ".join(

Or consider implementing a shared counter if command ordering matters for documentation.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/performance/perf_plugins.py` around lines 428 - 432, Both _set_vboost and _set_lock_gpu_freq generate script comments using the same literal label "# Command 0:", which can produce duplicate command labels when both features are enabled; update the comment labels in the lock_freq_cmd and the vboost_cmd generation to use distinct, descriptive labels (e.g., "# Command: lock GPU graphics clock" and "# Command: set vboost") or implement a shared incremental counter used by both functions to produce unique "Command N" labels so the produced setup script has non‑conflicting command identifiers.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@scripts/performance/perf_plugins.py`:
- Around line 428-432: Both _set_vboost and _set_lock_gpu_freq generate script
comments using the same literal label "# Command 0:", which can produce
duplicate command labels when both features are enabled; update the comment
labels in the lock_freq_cmd and the vboost_cmd generation to use distinct,
descriptive labels (e.g., "# Command: lock GPU graphics clock" and "# Command:
set vboost") or implement a shared incremental counter used by both functions to
produce unique "Command N" labels so the produced setup script has
non‑conflicting command identifiers.
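The shared-counter alternative mentioned in the review could look roughly like the sketch below. The class `SetupScriptBuilder`, its methods, and the placeholder command strings are all hypothetical; they are not taken from `perf_plugins.py` and only illustrate how a single counter keeps the "Command N" labels unique.

```python
import itertools


class SetupScriptBuilder:
    """Hypothetical helper that numbers setup commands from one shared counter."""

    def __init__(self):
        self._counter = itertools.count()  # yields 0, 1, 2, ...
        self._lines = []

    def add_command(self, description, command):
        # Each call draws the next index, so labels never repeat.
        index = next(self._counter)
        self._lines.extend(["", f"# Command {index}: {description}", command])

    def render(self):
        return "\n".join(self._lines)


builder = SetupScriptBuilder()
# Placeholder command strings; the real commands live in the PR's plugin code.
builder.add_command("set vboost", "<vboost command>")
builder.add_command("lock GPU graphics clock", "<lock clock command>")
script = builder.render()
```

With both features enabled, the rendered script would carry "Command 0" and "Command 1" rather than two "Command 0" labels. Descriptive labels without indices, as in the suggested diff, avoid the problem with less machinery.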
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 6473137f-4a78-46da-9ee4-380569f9ad03
📒 Files selected for processing (3)
scripts/performance/argument_parser.py
scripts/performance/perf_plugins.py
scripts/performance/setup_experiment.py
@dingqingy-nv can you add a note for the new arg in the README and docstring?
Address review feedback: document when/why to use GPU clock locking and add the new arg to the performance README.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Signed-off-by: Dingqing Yang <[email protected]>

Address review feedback: document the correlation study use case for GPU clock locking and add the new arg to the performance README.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Signed-off-by: Dingqing Yang <[email protected]>

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Signed-off-by: Dingqing Yang <[email protected]>
@malay-nagda fixed, please review and help approve, thanks!
What does this PR do ?
Provides a command-line option in the perf script for locking the GPU frequency. This can be helpful for correlation studies.
Summary by CodeRabbit
`-lgc`/`--lock_gpu_freq` CLI option to lock GPU graphics clock frequency (MHz) during training on compute clusters, with automatic reset after execution.
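To make the lock-then-reset behavior concrete, here is a minimal sketch that builds the two shell commands from an optional frequency. It assumes the clock is pinned with `nvidia-smi --lock-gpu-clocks=<min>,<max>` and released with `nvidia-smi --reset-gpu-clocks`; those flags exist in `nvidia-smi`, but this page does not show the exact commands the PR generates or how they are wired into Slurm, so the function `gpu_clock_commands` is purely illustrative.

```python
from typing import List, Optional


def gpu_clock_commands(lock_gpu_freq_mhz: Optional[int]) -> List[str]:
    """Return [lock_cmd, reset_cmd], or [] when no frequency was requested.

    Hypothetical helper: pins min and max graphics clock to the same value.
    """
    if lock_gpu_freq_mhz is None:
        return []  # feature disabled, nothing to run
    if lock_gpu_freq_mhz <= 0:
        raise ValueError("GPU frequency must be a positive number of MHz")
    freq = lock_gpu_freq_mhz
    lock_cmd = f"nvidia-smi --lock-gpu-clocks={freq},{freq}"
    reset_cmd = "nvidia-smi --reset-gpu-clocks"  # run after the job finishes
    return [lock_cmd, reset_cmd]


commands = gpu_clock_commands(1500)
```

Setting the minimum and maximum to the same value is what fixes the clock at a single frequency. Changing clocks with `nvidia-smi` typically needs elevated privileges, so whether this works on a given cluster depends on how the nodes are configured.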