Metrics Server could not scrape node, with "tls: failed to verify certificate: x509: certificate is valid for 127.0.0.1, not 10.124.4.238" node="fargate-ip-" error in log #1468
Comments
Duplicate of #1422
Hi honarkhah,
/kind support
Hello, we are having this same issue. The only workaround I have found is to run metrics-server on EC2 rather than Fargate. When running metrics-server on EC2, there are no issues or errors in the logs.
Same question for k3s:
# kubectl logs -n kube-system metrics-server-79f66dff9d-5sflh --tail 300 -f
Error from server: Get "https://10.1.4.13:10250/containerLogs/kube-system/metrics-server-79f66dff9d-5sflh/metrics-server?follow=true&tailLines=300": tls: failed to verify certificate: x509: certificate is valid for 127.0.0.1, not 10.1.4.13
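For the k3s case, note that this particular error comes from the API server talking to the kubelet (here, for kubectl logs), not from metrics-server's own scraping; it typically appears when the node's IP changed after the kubelet serving certificate was generated. A rough sketch of the commonly suggested remedy, assuming a default k3s install layout (verify the paths on your node before deleting anything):

# On the affected k3s node: remove the stale kubelet serving cert so k3s regenerates it with the current node IP
sudo rm /var/lib/rancher/k3s/agent/serving-kubelet.crt /var/lib/rancher/k3s/agent/serving-kubelet.key
sudo systemctl restart k3s        # use k3s-agent on worker nodes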
Facing the same on EC2. Any workaround?
I had the same issue while I was upgrading my cluster, and it practically solved itself. TL;DR: check whether you have any other clusters using similar Fargate profiles, and if they're on different versions, upgrade them to match. I had dev and prod. Even though they're completely different clusters (with their own nodes and Fargate profiles), they broke each other (and fixed each other). I'm still not quite sure what caused my issue, but I'm leaving my story below in case it helps someone else.

Long version: it was odd, because the issue came out of nowhere; I had never had issues with metrics-server on Fargate before, and it was always able to scrape Fargate nodes. This was obviously blocking my prod upgrade: I couldn't upgrade prod if the broken metrics-server was caused by the upgrade. I decided to check how metrics-server was doing on Fargate on prod, and was surprised to find it was also broken there. I was baffled, because the two clusters and their Fargate profiles are (should be?!) completely separate. I checked prod's version, and it was still 1.25 as expected. For some reason, all my Fargate nodes on prod had restarted, and that's when my metrics-server problems started.

I decided to go ahead and update prod to 1.26, and voilà, metrics-server suddenly started working on both clusters, dev and prod. I'm still not sure why. I've since upgraded dev and prod to 1.29, and metrics-server is still working well.
I'm using AWS EKS Fargate 1.29. After I downgraded metrics-server to v0.6.4, it worked normally.
#1025 (comment) works for me.
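The linked comment isn't quoted here, but the workaround most often reported for EKS Fargate is to move metrics-server off port 10250, which the Fargate node's kubelet already uses. A hedged sketch using the metrics-server Helm chart; the containerPort value is assumed to also drive --secure-port in the chart version in use, so check your chart before applying:

helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server/
helm upgrade --install metrics-server metrics-server/metrics-server \
  --namespace kube-system \
  --set containerPort=10251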
What happened:
Logs from the metrics-server pod show this error repeatedly:
E0410 22:04:01.247686 1 scraper.go:149] "Failed to scrape node" err="Get \"https://10.124.4.238:10250/metrics/resource\": tls: failed to verify certificate: x509: certificate is valid for 127.0.0.1, not 10.124.4.238" node="fargate-ip-10-124-4-238.ap-southeast-1.compute.internal"
E0410 22:04:16.201141 1 scraper.go:149] "Failed to scrape node" err="Get \"https://10.124.4.238:10250/metrics/resource\": tls: failed to verify certificate: x509: certificate is valid for 127.0.0.1, not 10.124.4.238" node="fargate-ip-10-124-4-238.ap-southeast-1.compute.internal"
E0410 22:04:31.201853 1 scraper.go:149] "Failed to scrape node" err="Get \"https://10.124.4.238:10250/metrics/resource\": tls: failed to verify certificate: x509: certificate is valid for 127.0.0.1, not 10.124.4.238" node="fargate-ip-10-124-4-238.ap-southeast-1.compute.internal"
E0410 22:04:46.277913 1 scraper.go:149] "Failed to scrape node" err="Get \"https://10.124.4.238:10250/metrics/resource\": tls: failed to verify certificate: x509: certificate is valid for 127.0.0.1, not 10.124.4.238" node="fargate-ip-10-124-4-238.ap-southeast-1.compute.internal"
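For anyone debugging the same symptom, one way to confirm which certificate is actually being served on that node's port 10250 is a quick openssl check (a sketch only; the IP is the node IP from the logs above and must be reachable from wherever you run this, e.g. a debug pod in the cluster):

openssl s_client -connect 10.124.4.238:10250 </dev/null 2>/dev/null \
  | openssl x509 -noout -text \
  | grep -A1 "Subject Alternative Name"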
What you expected to happen:
To be able to scrape itself.
Anything else we need to know?:
However, the following errors appear:
E0410 22:13:28.928630 1 scraper.go:149] "Failed to scrape node" err="request failed, status: \"403 Forbidden\"" node="fargate-ip-10-124-4-186.ap-southeast-1.compute.internal"
E0410 22:13:43.827793 1 scraper.go:149] "Failed to scrape node" err="request failed, status: \"403 Forbidden\"" node="fargate-ip-10-124-4-186.ap-southeast-1.compute.internal"
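A 403 from the kubelet usually points at authorization rather than TLS. Assuming the kubelet delegates authorization to the API server (the EKS default), a quick way to check whether the metrics-server service account is allowed to read node metrics:

kubectl get clusterrole system:metrics-server -o yaml     # should grant get on nodes/metrics
kubectl auth can-i get nodes/metrics \
  --as=system:serviceaccount:kube-system:metrics-server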
Environment:
spoiler for Metrics Server manifest:
apiVersion: v1
kind: ServiceAccount
metadata:
annotations:
meta.helm.sh/release-name: metrics-server
meta.helm.sh/release-namespace: kube-system
creationTimestamp: "2024-04-10T21:48:44Z"
labels:
app.kubernetes.io/instance: metrics-server
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: metrics-server
app.kubernetes.io/version: 0.7.1
helm.sh/chart: metrics-server-3.12.1
name: metrics-server
namespace: kube-system
resourceVersion: "1044967"
uid: bbd89fdf-d933-4fd3-9bfa-2c8351bc9159
---
apiVersion: v1
kind: Service
metadata:
annotations:
meta.helm.sh/release-name: metrics-server
meta.helm.sh/release-namespace: kube-system
creationTimestamp: "2024-04-10T21:48:44Z"
labels:
app.kubernetes.io/instance: metrics-server
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: metrics-server
app.kubernetes.io/version: 0.7.1
helm.sh/chart: metrics-server-3.12.1
name: metrics-server
namespace: kube-system
resourceVersion: "1044976"
uid: fe68eb2b-9ecf-4c57-996e-6836955f614c
spec:
clusterIP: 172.20.20.200
clusterIPs:
internalTrafficPolicy: Cluster
ipFamilies:
ipFamilyPolicy: SingleStack
ports:
- port: 443
protocol: TCP
targetPort: https
selector:
app.kubernetes.io/instance: metrics-server
app.kubernetes.io/name: metrics-server
sessionAffinity: None
type: ClusterIP
status:
loadBalancer: {}
---
apiVersion: apps/v1
kind: Deployment
metadata:
annotations:
deployment.kubernetes.io/revision: "3"
meta.helm.sh/release-name: metrics-server
meta.helm.sh/release-namespace: kube-system
creationTimestamp: "2024-04-10T21:48:44Z"
generation: 3
labels:
app.kubernetes.io/instance: metrics-server
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: metrics-server
app.kubernetes.io/version: 0.7.1
helm.sh/chart: metrics-server-3.12.1
name: metrics-server
namespace: kube-system
resourceVersion: "1048455"
uid: 51c7e198-d10b-4ec4-b96d-69e151de778b
spec:
progressDeadlineSeconds: 600
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
app.kubernetes.io/instance: metrics-server
app.kubernetes.io/name: metrics-server
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
type: RollingUpdate
template:
metadata:
creationTimestamp: null
labels:
app.kubernetes.io/instance: metrics-server
app.kubernetes.io/name: metrics-server
spec:
containers:
- args:
- --secure-port=10250
- --cert-dir=/tmp
- --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
- --kubelet-use-node-status-port
- --metric-resolution=15s
- --kubelet-insecure-tls
image: registry.k8s.io/metrics-server/metrics-server:v0.7.1
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 3
httpGet:
path: /livez
port: https
scheme: HTTPS
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
name: metrics-server
ports:
- containerPort: 10250
name: https
protocol: TCP
readinessProbe:
failureThreshold: 3
httpGet:
path: /readyz
port: https
scheme: HTTPS
initialDelaySeconds: 20
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
resources:
requests:
cpu: 100m
memory: 200Mi
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop:
- ALL
readOnlyRootFilesystem: true
runAsNonRoot: true
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /tmp
name: tmp
dnsPolicy: ClusterFirst
priorityClassName: system-cluster-critical
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: metrics-server
serviceAccountName: metrics-server
terminationGracePeriodSeconds: 30
volumes:
- emptyDir: {}
name: tmp
status:
availableReplicas: 1
conditions:
- lastTransitionTime: "2024-04-10T21:50:06Z"
lastUpdateTime: "2024-04-10T21:50:06Z"
message: Deployment has minimum availability.
reason: MinimumReplicasAvailable
status: "True"
type: Available
- lastTransitionTime: "2024-04-10T21:48:44Z"
lastUpdateTime: "2024-04-10T22:13:47Z"
message: ReplicaSet "metrics-server-578bc9bf64" has successfully progressed.
reason: NewReplicaSetAvailable
status: "True"
type: Progressing
observedGeneration: 3
readyReplicas: 1
replicas: 1
updatedReplicas: 1
---
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
annotations:
meta.helm.sh/release-name: metrics-server
meta.helm.sh/release-namespace: kube-system
creationTimestamp: "2024-04-10T21:48:44Z"
labels:
app.kubernetes.io/instance: metrics-server
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: metrics-server
app.kubernetes.io/version: 0.7.1
helm.sh/chart: metrics-server-3.12.1
name: v1beta1.metrics.k8s.io
resourceVersion: "1048453"
uid: 84cc08c7-27bc-4a4e-a7b8-efcd7b428ea2
spec:
group: metrics.k8s.io
groupPriorityMinimum: 100
insecureSkipTLSVerify: true
service:
name: metrics-server
namespace: kube-system
port: 443
version: v1beta1
versionPriority: 100
status:
conditions:
- message: 'failing or missing response from https://10.124.4.186:10250/apis/metrics.k8s.io/v1beta1:
bad status from https://10.124.4.186:10250/apis/metrics.k8s.io/v1beta1: 404'
reason: FailedDiscoveryCheck
status: "False"
type: Available
spoiler for Kubelet config:
spoiler for Metrics Server logs:
spoiler for Status of Metrics API:
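For reference, after any change it is worth re-checking the aggregated API and the metrics pipeline end to end; the FailedDiscoveryCheck / 404 shown in the APIService status above should clear once the metrics-server pod is reachable on its secure port:

kubectl get apiservice v1beta1.metrics.k8s.io
kubectl top nodes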
/kind bug