@tdharris tdharris commented Oct 15, 2025

Issue

#1098 - Failed to scrape Prometheus endpoint

The CloudWatch agent fails to scrape metrics from the control plane. Its logs show "Failed to scrape Prometheus endpoint" together with a 401 Unauthorized error:

{
    "caller": "internal/transaction.go:123",
    "msg": "Failed to scrape Prometheus endpoint",
    "kind": "receiver",
    "name": "awscontainerinsightreceiver",
    "data_type": "metrics",
    "scrape_timestamp": 1760551208877,
    "target_labels": "{ClusterName=\"eks-clustername\", NodeName=\"<ip-redacted>.<region-redacted>.compute.internal\", Sources=\"[\\\"apiserver\\\"]\", Type=\"ControlPlane\", Version=\"0\", __name__=\"up\", instance=\"172.20.0.1:443\", job=\"containerInsightsKubeAPIServerScraper/172.20.0.1\"}"
}

Description of changes

The existing ClusterRole for the agent's service account appears to lack permission to access the /metrics non-resource URL.

This change resolves the issue by adding a rule to the ClusterRole that grants the get verb on the nonResourceURLs path /metrics. With this permission, the agent is authorized to scrape the control plane's Prometheus metrics.

See Metrics For Kubernetes System Components:

"If your cluster uses RBAC, reading metrics requires authorization via a user, group or ServiceAccount with a ClusterRole that allows accessing /metrics. For example:"

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - nonResourceURLs:
      - "/metrics"
    verbs:
      - get
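
For context, a ClusterRole like the one quoted above only takes effect once it is bound to a subject. Because non-resource URLs are not namespaced, a rule using nonResourceURLs applies only when the ClusterRole is referenced from a ClusterRoleBinding, not from a namespaced RoleBinding. The following is a minimal sketch of such a binding, not the chart's actual template: the binding name, service account name, and namespace are hypothetical placeholders.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: example-metrics-reader          # hypothetical binding name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus                      # the ClusterRole from the quoted example
subjects:
  - kind: ServiceAccount
    name: example-agent                 # hypothetical service account name
    namespace: example-namespace        # hypothetical namespace
```

In this chart the rule is added to the agent's existing ClusterRole, so no new binding should be needed as long as that role is already bound to the agent's service account.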

Checklist

  • Added/modified documentation as required (such as the README.md for modified charts)
  • Incremented the chart version in Chart.yaml for the modified chart(s)
  • Manually tested. Describe what testing was done in the testing section below
  • Make sure the title of the PR is a good description that can go into the release notes

Testing

I reproduced the error described above using the latest version of this chart (0.0.11). I then adjusted the existing ClusterRole to include the new permission and verified that the error no longer appears.
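
One additional way to check the permission, separate from the manual test described above, is to ask the API server directly with a SubjectAccessReview. This is only a sketch: the service account name and namespace in the user field are hypothetical and would need to match the account the agent actually runs as.

```yaml
apiVersion: authorization.k8s.io/v1
kind: SubjectAccessReview
spec:
  # Hypothetical identity; the format is system:serviceaccount:<namespace>:<name>
  user: system:serviceaccount:example-namespace:example-agent
  nonResourceAttributes:
    path: /metrics    # the non-resource URL granted by the new rule
    verb: get
```

Submitting this manifest with kubectl create -f <file> -o yaml returns the object with a status.allowed field, which should be true once the rule is in place.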

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@tdharris tdharris changed the title from "[aws-cloudwatch-metrics] fix: grant permission to scrape api server metrics" to "[aws-cloudwatch-metrics] fix: grant permission to scrape api server metrics - Failed to scrape Prometheus endpoint" on Oct 27, 2025

tdharris commented Nov 3, 2025

@jayanthvn, hoping you might be able to offer a review if possible. Thanks so much for your time!
