[aws-cloudwatch-metrics] fix: grant permission to scrape api server metrics - Failed to scrape Prometheus endpoint #1257
Issue
#1098 - Failed to scrape Prometheus endpoint
The CloudWatch agent fails to scrape metrics from the control plane, resulting in logs with "Failed to scrape Prometheus endpoint" and a 401 Unauthorized error.

{
  "caller": "internal/transaction.go:123",
  "msg": "Failed to scrape Prometheus endpoint",
  "kind": "receiver",
  "name": "awscontainerinsightreceiver",
  "data_type": "metrics",
  "scrape_timestamp": 1760551208877,
  "target_labels": "{ClusterName=\"eks-clustername\", NodeName=\"<ip-redacted>.<region-redacted>.compute.internal\", Sources=\"[\\\"apiserver\\\"]\", Type=\"ControlPlane\", Version=\"0\", __name__=\"up\", instance=\"172.20.0.1:443\", job=\"containerInsightsKubeAPIServerScraper/172.20.0.1\"}"
}

Description of changes
It appears the existing ClusterRole for the agent's service account lacks the permission to access this non-resource URL.

This change resolves the issue by adding a new rule to the ClusterRole, granting the get verb for the nonResourceURLs path /metrics. This permission allows the agent to successfully authenticate and scrape the control plane's Prometheus metrics.

See Metrics For Kubernetes System Components in the Kubernetes documentation.
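For illustration, the added rule might look roughly like the sketch below. The ClusterRole name is a hypothetical placeholder, not taken from the chart, and the chart's existing rules are omitted; only the nonResourceURLs path and the get verb come from the description above.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  # Hypothetical name; the chart templates its own name for this role.
  name: example-cloudwatch-agent-role
rules:
  # Existing resource rules from the chart would remain here unchanged.
  # New rule: allow GET on the API server's /metrics non-resource URL.
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]
```

Because /metrics is a non-resource URL, it has to be granted through a ClusterRole rather than a namespaced Role.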
Checklist
README.md for modified charts)
version in Chart.yaml for the modified chart(s)

Testing
I reproduced the error described above using the latest version of this chart, 0.0.11. After adjusting the existing ClusterRole to include the updated permission, I verified that the error no longer appears.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.