-
Notifications
You must be signed in to change notification settings - Fork 13
[WIP] Auto-generate metrics config by merging with upstream #77
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[WIP] Auto-generate metrics config by merging with upstream #77
Conversation
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: empovit The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
ced5098 to
ce6cfd0
Compare
|
/hold |
|
Request changes
Helm change ( |
|
Thanks @yakovbeder, good catch! Fixing those |
ce6cfd0 to
0ddcdc5
Compare
yakovbeder
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @empovit 👋
The script fixes look good:
✅ curl -fsSL for proper error handling
✅ Whitespace trimming in awk dedup logic
✅ .vscode/settings.json removed
One question about the relationship with PR #69:
required-metrics.csv includes DCGM_FI_PROF_GR_ENGINE_ACTIVE, but #69 aims to remove profiling metrics due to pre-Volta GPU compatibility issues. Should this PR use DCGM_FI_DEV_GPU_UTIL instead in the required list, or is the plan to have both available and let the plugin code choose?
Also, small nit: required-metrics.csv has a trailing blank line (line 28).
0ddcdc5 to
0ce70bd
Compare
|
Warning Rate limit exceeded@empovit has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 0 minutes and 46 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ⛔ Files ignored due to path filters (2)
📒 Files selected for processing (5)
WalkthroughVersion bump to Chart.yaml and refactoring of DCGM metric configuration by replacing inline metric definitions in a ConfigMap with an external file reference. A new shell script is introduced to fetch, filter, deduplicate, and manage metrics from upstream sources. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
0ce70bd to
9780cb2
Compare
|
@yakovbeder #69 is wrong. We will have to support both metrics, depending on which one is available. I believe in some cases both old and new can be found in the same cluster (node with old and new GPUs). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (1)
deployment/console-plugin-nvidia-gpu/templates/configmap.yaml (1)
9-9: The external metrics file exists and is properly configured.The
dcgm-metrics.csvfile is present (2,357 bytes, 30 lines) infiles/and contains valid metric definitions. Thescripts/update-metrics.shscript is in place to manage updates. The Helm template syntax is correct and will properly load the file.To strengthen the setup, consider:
- Adding a
.helmignorecheck to ensurefiles/dcgm-metrics.csvis packaged with releases- Documenting in the README that the metrics file is auto-generated and the update script must be run before deployment
- Adding CI validation to verify the file is non-empty before release
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
deployment/console-plugin-nvidia-gpu/files/dcgm-metrics.csvis excluded by!**/*.csvscripts/required-metrics.csvis excluded by!**/*.csv
📒 Files selected for processing (3)
deployment/console-plugin-nvidia-gpu/Chart.yamldeployment/console-plugin-nvidia-gpu/templates/configmap.yamlscripts/update-metrics.sh
🔇 Additional comments (3)
deployment/console-plugin-nvidia-gpu/Chart.yaml (1)
15-15: LGTM! Appropriate version bump.The patch version increment aligns with the addition of the auto-generated metrics configuration feature.
scripts/update-metrics.sh (2)
1-19: LGTM! Good use of strict bash options and cleanup trap.The script setup follows best practices with
set -euo pipefailfor error handling and a trap to clean up temporary files on exit.
28-30: LGTM! Clear auto-generated file header.The header comment appropriately indicates the file is auto-generated.
6c56b8f to
766ab7f
Compare
Users of this console plugin are required to replace their current list of DCGM metrics with the list installed by the plugin, because the plugin uses a few metrics that are not included with the DCGM exporter by default. In order to make this configuration switch less intrusive, and to cover the most likely case when no custom DCGM configuration is used, we want to include the metrics required by this plugin in addition to, and not instead of, the default ones.
766ab7f to
61ad14c
Compare
|
I see. In that case, this PR is /lgtm |
|
/lgtm |
Users of this console plugin are required to replace their current list of DCGM metrics with the list installed by the plugin, because the plugin uses a few metrics that are not included with the DCGM exporter by default.
In order to make this configuration switch less intrusive, and to cover the most likely case when no custom DCGM configuration is used, we want to include the metrics required by this plugin in addition to, and not instead of, the default ones.
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.