@alexezio
Contributor

@alexezio alexezio commented Nov 20, 2025

Description

This PR fixes a field-name mismatch in the datadog_csi_driver integration's auto_conf.yaml that causes the check to fail when it is configured through Kubernetes Autodiscovery.

Problem

When the Datadog Agent attempts to auto-discover and configure the datadog_csi_driver check in Kubernetes environments, it fails with the following error:

Error: Detected 1 error while loading configuration model `InstanceConfig`:
openmetrics_endpoint
  Field required

Root Cause

The datadog_csi_driver integration uses OpenMetricsBaseCheckV2 as its base class, which expects the configuration field openmetrics_endpoint. However, the auto_conf.yaml file was incorrectly configured with the legacy field name prometheus_url:

Before:

instances:
  - prometheus_url: http://%%host%%:5000/metrics

The mismatch was confined to the auto-discovery file. Every other part of the integration already used the V2 naming:

  1. The check implementation (check.py) uses OpenMetricsBaseCheckV2
  2. The config spec (spec.yaml) uses template: instances/openmetrics
  3. The generated config model requires openmetrics_endpoint: str
  4. Only auto_conf.yaml still used the legacy prometheus_url field name
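
For reviewers who want to see the failure mode in isolation, here is a minimal sketch of required-field validation. It is not the generated model: InstanceConfigSketch and load_instance are hypothetical names, the stand-in uses only the standard library, and the error text simply mirrors the message quoted above. Silently ignoring unknown keys such as prometheus_url is also an assumption of the sketch.

```python
from dataclasses import MISSING, dataclass, fields


@dataclass(frozen=True)
class InstanceConfigSketch:
    """Hypothetical stand-in for the generated InstanceConfig model."""

    openmetrics_endpoint: str  # required: no default value


def load_instance(raw: dict) -> InstanceConfigSketch:
    """Build the model, raising if any required field is absent from `raw`."""
    required = [
        f.name
        for f in fields(InstanceConfigSketch)
        if f.default is MISSING and f.default_factory is MISSING
    ]
    missing = [name for name in required if name not in raw]
    if missing:
        details = "\n".join(f"{name}\n  Field required" for name in missing)
        raise ValueError(
            f"Detected {len(missing)} error while loading configuration model "
            f"`InstanceConfig`:\n{details}"
        )
    # Assumption of this sketch: keys the model does not know are dropped.
    known = {f.name for f in fields(InstanceConfigSketch)}
    return InstanceConfigSketch(**{k: v for k, v in raw.items() if k in known})


legacy = {"prometheus_url": "http://%%host%%:5000/metrics"}
fixed = {"openmetrics_endpoint": "http://%%host%%:5000/metrics"}

try:
    load_instance(legacy)
    legacy_error = ""
except ValueError as exc:
    legacy_error = str(exc)  # names openmetrics_endpoint as the missing field

fixed_config = load_instance(fixed)  # loads without error
```

The legacy instance fails because the only key it supplies is one the model never reads, so the required field is still missing.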

Solution

Update auto_conf.yaml to use openmetrics_endpoint, the field name that OpenMetricsBaseCheckV2 expects:

After:

instances:
  - openmetrics_endpoint: http://%%host%%:5000/metrics
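
For context on the %%host%% placeholder: it is an Autodiscovery template variable that the Agent replaces with the discovered container's address before the instance reaches the check. The sketch below only illustrates that substitution. The Agent's real resolver is not written in Python, resolve_template is a hypothetical helper, and the address is taken from the endpoint tag in the test output posted later in this conversation.

```python
import re

# Matches template variables written as %%name%%.
TEMPLATE_VAR = re.compile(r"%%(\w+)%%")


def resolve_template(value: str, variables: dict) -> str:
    """Replace each %%name%% in `value` with variables[name]."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"unresolved template variable: {name}")
        return variables[name]

    return TEMPLATE_VAR.sub(substitute, value)


endpoint = resolve_template(
    "http://%%host%%:5000/metrics",
    {"host": "10.244.0.106"},  # pod IP seen in the reviewer's test output
)
```

Renaming the key does not change this step: the value is resolved the same way under either field name, which is why only the key needed to change.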

Testing

The fix was validated as follows:

  • ✅ The configuration model successfully loads with openmetrics_endpoint
  • ✅ No linter errors introduced
  • ✅ Existing tests continue to pass
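
A regression guard for this class of mistake could look like the sketch below. It is an illustration, not a test that exists in this PR: find_instance_problems is a hypothetical helper, and it works on an already-parsed mapping so that it needs no YAML library. A real test would first load auto_conf.yaml from the integration's data directory.

```python
LEGACY_KEY = "prometheus_url"
REQUIRED_KEY = "openmetrics_endpoint"


def find_instance_problems(auto_conf: dict) -> list[str]:
    """Return one message per instance-level problem found in `auto_conf`."""
    problems = []
    for index, instance in enumerate(auto_conf.get("instances", [])):
        if REQUIRED_KEY not in instance:
            problems.append(f"instances[{index}]: missing {REQUIRED_KEY}")
        if LEGACY_KEY in instance:
            problems.append(f"instances[{index}]: legacy key {LEGACY_KEY}")
    return problems


# Parsed equivalents of the "Before" and "After" snippets in this description.
before = {"instances": [{"prometheus_url": "http://%%host%%:5000/metrics"}]}
after = {"instances": [{"openmetrics_endpoint": "http://%%host%%:5000/metrics"}]}

before_problems = find_instance_problems(before)  # two problems for instance 0
after_problems = find_instance_problems(after)  # no problems
```

Checking for the legacy key separately from the required key keeps the failure message specific when someone copies an older V1-style config.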

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Add the qa/skip-qa label if the PR doesn't need to be tested during QA.
  • If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged

@codecov

codecov bot commented Nov 20, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.01%. Comparing base (bd0fc42) to head (4584fb9).
⚠️ Report is 29 commits behind head on master.


@adel121
Contributor

adel121 commented Nov 24, 2025

Thanks for fixing this.

I tested it and it is working fine after the fix:

=== Series ===
  METRIC                                                 TYPE   TIMESTAMP   VALUE  TAGS                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
  datadog.csi_driver.node_publish_volume_attempts.count  count  1763996417  0      docker_image:gcr.io/datadoghq/csi-driver:1.0.0, endpoint:http://10.244.0.106:5000/metrics, image_id:gcr.io/datadoghq/csi-driver@sha256:fe7aefd6618b032014bbd8c96bf8dd15a2cb62dd75c50703e7c92868508ebcee, image_name:gcr.io/datadoghq/csi-driver, image_tag:1.0.0, kube_container_name:csi-node-driver, kube_daemon_set:datadog-csi-driver-node-server, kube_namespace:default, kube_ownerref_kind:daemonset, kube_qos:BestEffort, path:/var/run/datadog, pod_phase:running, short_image:csi-driver, status:success, type:APMSocketDirectory  
  datadog.csi_driver.node_publish_volume_attempts.count  count  1763996417  0      docker_image:gcr.io/datadoghq/csi-driver:1.0.0, endpoint:http://10.244.0.106:5000/metrics, image_id:gcr.io/datadoghq/csi-driver@sha256:fe7aefd6618b032014bbd8c96bf8dd15a2cb62dd75c50703e7c92868508ebcee, image_name:gcr.io/datadoghq/csi-driver, image_tag:1.0.0, kube_container_name:csi-node-driver, kube_daemon_set:datadog-csi-driver-node-server, kube_namespace:default, kube_ownerref_kind:daemonset, kube_qos:BestEffort, path:/var/run/datadog, pod_phase:running, short_image:csi-driver, status:success, type:DSDSocketDirectory  

=== Service Checks ===

  datadog.csi_driver.openmetrics.health  minikube  1763996415  OK               docker_image:gcr.io/datadoghq/csi-driver:1.0.0, endpoint:http://10.244.0.106:5000/metrics, image_id:gcr.io/datadoghq/csi-driver@sha256:fe7aefd6618b032014bbd8c96bf8dd15a2cb62dd75c50703e7c92868508ebcee, image_name:gcr.io/datadoghq/csi-driver, image_tag:1.0.0, kube_container_name:csi-node-driver, kube_daemon_set:datadog-csi-driver-node-server, kube_namespace:default, kube_ownerref_kind:daemonset, kube_qos:BestEffort, pod_phase:running, short_image:csi-driver  
  datadog.csi_driver.openmetrics.health  minikube  1763996416  OK               docker_image:gcr.io/datadoghq/csi-driver:1.0.0, endpoint:http://10.244.0.106:5000/metrics, image_id:gcr.io/datadoghq/csi-driver@sha256:fe7aefd6618b032014bbd8c96bf8dd15a2cb62dd75c50703e7c92868508ebcee, image_name:gcr.io/datadoghq/csi-driver, image_tag:1.0.0, kube_container_name:csi-node-driver, kube_daemon_set:datadog-csi-driver-node-server, kube_namespace:default, kube_ownerref_kind:daemonset, kube_qos:BestEffort, pod_phase:running, short_image:csi-driver  



  Running Checks
  ==============
    
    datadog_csi_driver (1.0.0)
    --------------------------
      Instance ID: datadog_csi_driver:44aaca37f4c10601 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/datadog_csi_driver.d/auto_conf.yaml[0]
      Total Runs: 2
      Metric Samples: Last Run: 2, Total: 4
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 1, Total: 2
      Average Execution Time : 12ms
      Last Execution Date : 2025-11-24 15:00:16.555 UTC (1763996416555)
      Last Successful Execution Date : 2025-11-24 15:00:16 UTC (1763996416000)
      
  Check Worker Utilization
  ========================
    No worker utilization data available

  Metadata
  ========
    config.hash: datadog_csi_driver:44aaca37f4c10601
    config.provider: file
    config.source: /etc/datadog-agent/conf.d/datadog_csi_driver.d/auto_conf.yaml[0]
2025-11-24 15:00:19 UTC | CORE | INFO | (comp/api/api/apiimpl/server.go:39 in stopServer) | Stopped HTTP server 'CMD API Server'
2025-11-24 15:00:19 UTC | CORE | INFO | (pkg/aggregator/aggregator.go:777 in run) | Stopping aggregator

Thank you

@adel121 adel121 added this pull request to the merge queue Nov 25, 2025
Merged via the queue into DataDog:master with commit f2c29c5 Nov 25, 2025
49 checks passed
3 participants