Unable to start NVML integration #2382
Comments
All of those fixes seem reasonable. As Datadog now officially supports the NVIDIA DCGM Exporter, I've deprecated the nvml plugin internally. It may be best to mark it as deprecated here as well. Someone could also modify the plugin to refuse to install on newer Datadog versions, but I won't have time to contribute this.
datadog-agent updates have broken this integration for me as well. I've been able to use the DCGM exporter, but it requires running the DCGM exporter container, which is less than ideal on a machine that doesn't run Docker.
While it's not an optimal workaround, I've made the check work using the pure-Python Protobuf implementation:
$ PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python agent check nvml
...
Running Checks
==============
nvml (1.0.9)
------------
Instance ID: nvml:b6f35e1900952b0b [OK]
Configuration Source: file:/etc/datadog-agent/conf.d/nvml.yaml
Total Runs: 1
Metric Samples: Last Run: 0, Total: 0
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 0, Total: 0
Average Execution Time : 1ms
Last Execution Date : 2024-11-11 12:28:56 UTC (1731328136000)
Last Successful Execution Date : 2024-11-11 12:28:56 UTC (1731328136000)
Metadata
========
config.hash: nvml:b6f35e1900952b0b
config.provider: file
Check has run only once, if some metrics are missing you can try again with --check-rate to see any other metric if available.
This check type has 1 instances. If you're looking for a different check instance, try filtering on a specific one using the --instance-filter flag or set --discovery-min-instances to a higher value
This means that it would need to be applied at the agent level for all checks, I guess. I'm not aware of a way to use the non-C++ implementation only for this check.
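For anyone who wants to script the workaround above, here is a minimal Python sketch that sets the override only in the environment of the child process it launches, so nothing else on the host is affected. The helper names are hypothetical, and `agent` being on PATH is an assumption; it does not change how the long-running Agent service loads the check.

```python
import os
import subprocess


def pure_python_protobuf_env(base_env=None):
    """Copy an environment and select protobuf's pure-Python backend in it."""
    env = dict(os.environ if base_env is None else base_env)
    env["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"
    return env


def run_nvml_check(agent_cmd="agent"):
    """Run `agent check nvml` with the override applied to the child only.

    agent_cmd is an assumption: point it at wherever the Agent binary lives.
    """
    return subprocess.run(
        [agent_cmd, "check", "nvml"],
        env=pure_python_protobuf_env(),
        capture_output=True,
        text=True,
        check=False,  # inspect returncode/stdout instead of raising
    )
```

The variable has to be in place before protobuf is first imported, which is why it is passed through the child's environment rather than set from inside the check.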
Trying to solve the issue at the root, I think we can release a new patch version for
$ protoc --python_out=nvml/datadog_checks/nvml nvml/datadog_checks/nvml/api.proto
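As a rough sketch, the regeneration command could be wrapped in a small Python script like the one below. It assumes `protoc` is installed and that the script runs from the repository root; the function names are hypothetical, and the paths are the ones from the command in this comment.

```python
import shutil
import subprocess

# Paths taken from the protoc command quoted in this thread.
PROTO_FILE = "nvml/datadog_checks/nvml/api.proto"
OUT_DIR = "nvml/datadog_checks/nvml"


def build_protoc_command(proto_file=PROTO_FILE, out_dir=OUT_DIR):
    """Build the argument list for regenerating the Python bindings."""
    return ["protoc", f"--python_out={out_dir}", proto_file]


def regenerate_bindings():
    """Run protoc, failing early with a clear message if it is not installed."""
    if shutil.which("protoc") is None:
        raise RuntimeError("protoc not found on PATH")
    subprocess.run(build_protoc_command(), check=True)
```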
Just FYI, I've opened #2535, tested against Datadog Agent v7.59.0.
Output of the info page
When installing the NVML integration, I get the following error:
Loading Errors
Looking at the debug logs
To fix this issue:
This method is the recommended best practice, as the feature is owned and supported by Datadog. The accompanying documentation includes an example configuration that performs the same functions as the NVML integration.
NVIDIA DCGM Exporter: https://docs.datadoghq.com/integrations/dcgm/?tab=hostdocker#overview