DCGM-Exporter msg="Could not retrieve ConfigMap ..." #400
Comments
@devnjw What environment variables are you passing to dcgm-exporter? Are you trying to pass a ConfigMap name using the DCGM_EXPORTER_CONFIGMAP_DATA env? For custom metrics you can create a ConfigMap and deploy it as described here: https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/getting-started.html#custom-metrics-config
@shivamerla Thank you for your reply, but I don't need a custom exporter.
Got it. Yes, I will relay this to the DCGM exporter team. When it's configured to run with a custom ConfigMap and that ConfigMap is not found, the exporter should error out.
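As a rough illustration of the fail-fast behavior described above, here is a minimal Python sketch. It is not dcgm-exporter's actual code (the exporter is written in Go), and every name in it (`load_metrics_config`, `fetch`, `ConfigMapNotFound`, `main`) is hypothetical. The idea it shows: if a custom ConfigMap was requested but cannot be retrieved, exit with a non-zero status instead of continuing to run without metrics.

```python
import sys


class ConfigMapNotFound(Exception):
    """Raised when a requested ConfigMap cannot be retrieved."""


def load_metrics_config(name, fetch):
    """Return the metrics config, or raise if a named ConfigMap is missing.

    `fetch` is a hypothetical callable that returns the ConfigMap data,
    or None when the ConfigMap does not exist.
    """
    if not name:
        # No custom ConfigMap requested: fall back to built-in defaults.
        return {"source": "default"}
    data = fetch(name)
    if data is None:
        raise ConfigMapNotFound(f"could not retrieve ConfigMap {name!r}")
    return {"source": name, "data": data}


def main(name, fetch):
    """Return a process exit code: 0 on success, 1 if the config is missing."""
    try:
        load_metrics_config(name, fetch)
    except ConfigMapNotFound as err:
        print(f"fatal: {err}", file=sys.stderr)
        # A non-zero exit lets the container be restarted
        # under the Pod's restart policy.
        return 1
    return 0
```

With this shape, a missing ConfigMap surfaces as a crashed container rather than a running Pod that silently serves no metrics, which is the symptom reported in this issue.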
@glowkey @dualvtable Please take a look at this.
Tracking here: NVIDIA/dcgm-exporter#111
@devnjw this should be fixed in newer versions of dcgm-exporter. Closing. Please re-open if you are still experiencing this issue.
I am running a cluster with a number of NVIDIA GPUs, and I monitor them using dcgm-exporter. However, dcgm-exporter sometimes fails to provide metrics and produces the logs below.
I would expect the Pod to restart if the exporter cannot find the ConfigMap, but it doesn't. (Or at least it should be marked as not ready.)
I would appreciate it if you could give me feedback or fix this issue after checking it.
Other normal dcgm-exporters have the following logs.