TLS handshake error from API server #6898

Open
sknmi opened this issue Aug 30, 2024 · 28 comments
Labels
bug Something isn't working triage/solved Mark the issue as solved by a Karpenter maintainer. This gives time for the issue author to confirm.

Comments

@sknmi

sknmi commented Aug 30, 2024

Description

Observed Behavior:

karpenter-c595bb5d8-8r8jr controller {"level":"ERROR","time":"2024-08-30T08:06:16.304Z","logger":"webhook","message":"http: TLS handshake error from 10.x.x.x:40666: EOF\n","commit":"62a726c"}
karpenter-c595bb5d8-hzfgs controller {"level":"ERROR","time":"2024-08-30T08:07:18.550Z","logger":"webhook","message":"http: TLS handshake error from 10.x.x.x:58290: EOF\n","commit":"62a726c"}
karpenter-c595bb5d8-8r8jr controller {"level":"ERROR","time":"2024-08-30T08:07:18.571Z","logger":"webhook","message":"http: TLS handshake error from 10.x.x.x:55794: EOF\n","commit":"62a726c"}
karpenter-c595bb5d8-8r8jr controller {"level":"ERROR","time":"2024-08-30T08:07:18.572Z","logger":"webhook","message":"http: TLS handshake error from 10.x.x.x:55792: EOF\n","commit":"62a726c"}
karpenter-c595bb5d8-hzfgs controller {"level":"ERROR","time":"2024-08-30T08:08:10.419Z","logger":"webhook","message":"http: TLS handshake error from 10.x.x.x:43424: EOF\n","commit":"62a726c"}
karpenter-c595bb5d8-8r8jr controller {"level":"ERROR","time":"2024-08-30T08:08:10.427Z","logger":"webhook","message":"http: TLS handshake error from 10.x.x.x:52314: EOF\n","commit":"62a726c"}

Expected Behavior:
No errors :)
Reproduction Steps (Please include YAML):
Karpenter running on Fargate in the karpenter namespace.
These messages started to appear after upgrading to 1.0.1.
Versions:

  • Chart Version: 1.0.1
  • Kubernetes Version (kubectl version): 1.30
@sknmi sknmi added bug Something isn't working needs-triage Issues that need to be triaged labels Aug 30, 2024
@sknmi
Author

sknmi commented Aug 30, 2024

fixed with

webhook:
  enabled: false

@sknmi sknmi closed this as completed Aug 30, 2024
@levinedaniel

I don't think this issue should be closed. I am seeing a similar error in my log messages, and I need the webhook to remain enabled to facilitate the conversion of my resources to the latest API version.

@ezh

ezh commented Sep 12, 2024

I agree with @levinedaniel. Why was the issue closed with this as the solution:

webhook:
  enabled: false

The webhook is broken.

@Hronom
Contributor

Hronom commented Sep 20, 2024

Same, v1.0.2. Please re-open.

Is disabling the webhook an OK solution, or will some functionality stop working?

@Hronom
Contributor

Hronom commented Sep 20, 2024

cc @sknmi message above

@sknmi sknmi reopened this Sep 20, 2024
@sknmi
Author

sknmi commented Sep 20, 2024

@Hronom reopened :)

@m0untains

Also seeing this issue after upgrading to v0.37.3.

@adawalli

Saw this issue on 0.37.3 and 1.0.1

@AnkitBhalla22

Seeing same in 1.0.2

@liafizan
Contributor

liafizan commented Sep 25, 2024

Below findings are incorrect

Here is my observation. Please let me know if this is incorrect:

Karpenter does not provide a ca-client bundle as we can see from here.

When I look at the CRD in my cluster, I can see that it has been injected with a caBundle:

    webhook:
      clientConfig:
        caBundle: Redacted...
        service:
          name: karpenter
          namespace: karpenter
          path: /conversion/karpenter.sh
          port: 8443
      conversionReviewVersions:
      - v1beta1
      - v1
  group: karpenter.sh

I believe this is happening through ca-injector. This means that the client config for this webhook has a CA bundle specified, while Karpenter uses knative to inject certificate data into the karpenter-cert secret, which comes from here.

If so, the CAs for the CRD and the webhooks do not match, hence the error. If this is correct, then maybe we can look at possible solutions.


I am still not sure how the CA bundle is injected into the CRD, and at one point I did see that the CA bundle in the secret differed from the one in the CRD.
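
One way to check the mismatch described above is to compare the two values directly. The following is only a sketch: it assumes the CRD and the certificate secret have been fetched as JSON (for example with kubectl get ... -o json), and that the secret stores its CA under the key ca-cert.pem, which is an assumption about the knative certificate layout and is not confirmed anywhere in this thread. The interfaces and the function name compareCaBundles are hypothetical.

```typescript
// Only the fields this sketch reads; real objects carry many more.
interface CrdLike {
  spec?: { conversion?: { webhook?: { clientConfig?: { caBundle?: string } } } };
}
interface SecretLike {
  data?: { [key: string]: string };
}

type CaComparison = "match" | "mismatch" | "missing";

// Both values are base64 strings in `-o json` output, so comparing them
// after stripping whitespace is enough to detect a byte-for-byte difference.
function stripWhitespace(value: string): string {
  return value.replace(/\s+/g, "");
}

// ASSUMPTION: "ca-cert.pem" is the secret key holding the CA; adjust if yours differs.
function compareCaBundles(
  crd: CrdLike,
  secret: SecretLike,
  secretKey = "ca-cert.pem",
): CaComparison {
  const fromCrd = crd.spec?.conversion?.webhook?.clientConfig?.caBundle;
  const fromSecret = secret.data?.[secretKey];
  if (!fromCrd || !fromSecret) {
    return "missing";
  }
  return stripWhitespace(fromCrd) === stripWhitespace(fromSecret) ? "match" : "mismatch";
}

// Example with placeholder values, not real certificate data.
const exampleCrd: CrdLike = {
  spec: { conversion: { webhook: { clientConfig: { caBundle: "QUJD" } } } },
};
const exampleSecret: SecretLike = { data: { "ca-cert.pem": "QUJD" } };
console.log(compareCaBundles(exampleCrd, exampleSecret)); // prints "match"
```

This is a strict equality test, so a "mismatch" is only a hint to look closer: a bundle that contains additional certificates would be reported as different even if it still validates the serving certificate.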

@jmdeal
Contributor

jmdeal commented Oct 4, 2024

This appears to be the same issue we saw previously with our defaulting / validating webhooks. The original issue was closed out when those webhooks were disabled by default: kubernetes-sigs/karpenter#718. I've been able to reproduce it, and as with that issue, there does not appear to be any actual impact on Karpenter's operation, so the errors can be safely ignored.

From the original issue:

These TLS errors appear to be related to kubernetes/kubernetes#109022 which states that these handshake errors may be generated by some caching mechanism that is happening in the standard library that causes TLS errors on a cert rotation.

@liafizan are you still running into this? The cert is injected by knative, and I've been unable to reproduce. If you're still encountering this, I'd recommend opening a separate issue. I don't think it's related to the TLS errors we're seeing here.

I am still not sure how CA bundle is injected in CRD and I did see at one point that the CA bundle in secret vs CRD was different.

I'm going to mark this issue as solved for now, but let us know if any of you believe this issue is impacting Karpenter's ability to operate.
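
Given the assessment above that these errors can be safely ignored, one option short of disabling the webhook is to filter them out when reading logs. This is a minimal sketch, not part of Karpenter: it assumes log lines in the JSON format quoted in this thread, optionally prefixed by a pod and container name, and the function names are hypothetical.

```typescript
// Fields of a Karpenter JSON log entry that this sketch inspects.
interface LogEntry {
  level?: string;
  logger?: string;
  message?: string;
}

const HANDSHAKE_PREFIX = "http: TLS handshake error from";

// Parse the JSON object in a log line, skipping any "pod container " prefix.
function parseLogLine(line: string): LogEntry | undefined {
  const start = line.indexOf("{");
  if (start === -1) {
    return undefined;
  }
  try {
    const parsed: unknown = JSON.parse(line.slice(start));
    if (typeof parsed === "object" && parsed !== null) {
      return parsed as LogEntry;
    }
    return undefined;
  } catch {
    return undefined; // not valid JSON: treat as an ordinary line
  }
}

// True only for webhook-logger entries whose message is a TLS handshake error.
function isHandshakeNoise(line: string): boolean {
  const entry = parseLogLine(line);
  return (
    entry !== undefined &&
    entry.logger === "webhook" &&
    typeof entry.message === "string" &&
    entry.message.indexOf(HANDSHAKE_PREFIX) === 0
  );
}

// Keep every line that is not handshake noise, including unparseable ones.
function dropHandshakeNoise(lines: string[]): string[] {
  return lines.filter((line) => !isHandshakeNoise(line));
}

const sampleLines = [
  // Taken from the issue description.
  'karpenter-c595bb5d8-8r8jr controller {"level":"ERROR","time":"2024-08-30T08:06:16.304Z","logger":"webhook","message":"http: TLS handshake error from 10.x.x.x:40666: EOF\\n","commit":"62a726c"}',
  // Hypothetical unrelated entry, made up for illustration.
  '{"level":"INFO","logger":"controller","message":"example message"}',
];
console.log(dropHandshakeNoise(sampleLines).length); // prints 1
```

The filter matches on both the logger name and the message prefix so that other webhook errors still show up.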

@jmdeal jmdeal added triage/solved Mark the issue as solved by a Karpenter maintainer. This gives time for the issue author to confirm. and removed needs-triage Issues that need to be triaged labels Oct 4, 2024
@laserpedro

Hello @jmdeal,

After upgrading to 0.37.5 to enable deletion of the webhooks when deploying with ArgoCD, I see two things:

  • First, the validating and mutating webhooks are now properly deleted using ArgoCD.
  • Second, my CRDs are not in version v1 and are still in v1beta1, so IMO the TLS handshake error is causing the conversion webhook to fail, which is a problem for the Karpenter migration to v1.0.x.
    kubectl get crd nodeclaims.karpenter.sh -o jsonpath='{.spec.versions[*].name}' => v1 v1beta1. So both versions exist in the cluster.
    Therefore the TLS handshake error in my case seems to prevent the validating webhook from performing the v1 migration. I checked the logs inside the controller and that is all I got from the webhook ...

@jmdeal
Contributor

jmdeal commented Oct 7, 2024

the second one is that my CRDs are not in version v1 and are still in v1beta1 so IMO the TLS handshake error is causing the conversion webhook to fail

This doesn't indicate any issue with the conversion webhook. If you're on any pre-1.0 version with the conversion webhooks, the storage version is still v1beta1. The conversion webhooks only exist on those versions to enable rollback from v1.0. Also, once you upgrade to v1, both versions will still be present on the CRD; one isn't automatically removed once all stored resources are converted. Instead, you want to look at .status.storedVersions on the CRDs. On Karpenter v1.0.5+, Karpenter will remove v1beta1 from the stored versions once all CRs have been successfully migrated.
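
To make the distinction above concrete, here is a small sketch that reads a CRD fetched as JSON (for example from kubectl get crd nodeclaims.karpenter.sh -o json) and reports .status.storedVersions separately from the versions listed in .spec.versions. The interface and function names are hypothetical, and the example object is made up to mirror the situation described in this thread.

```typescript
// Minimal shape of the CRD fields this sketch reads.
interface CrdLike {
  spec: { versions: { name: string; storage?: boolean }[] };
  status?: { storedVersions?: string[] };
}

interface MigrationStatus {
  definedVersions: string[]; // everything in spec.versions (not a migration signal)
  storageVersion: string | undefined; // version new writes are persisted as
  storedVersions: string[]; // versions that may still exist in etcd
  legacyStillStored: boolean; // true while the legacy version is listed as stored
}

function migrationStatus(crd: CrdLike, legacyVersion = "v1beta1"): MigrationStatus {
  const storedVersions = crd.status?.storedVersions ?? [];
  const storage = crd.spec.versions.filter((v) => v.storage === true)[0];
  return {
    definedVersions: crd.spec.versions.map((v) => v.name),
    storageVersion: storage ? storage.name : undefined,
    storedVersions,
    legacyStillStored: storedVersions.indexOf(legacyVersion) !== -1,
  };
}

// Made-up example: both versions are defined, and v1beta1 objects may still be stored.
const exampleNodeClaimCrd: CrdLike = {
  spec: {
    versions: [
      { name: "v1", storage: true },
      { name: "v1beta1", storage: false },
    ],
  },
  status: { storedVersions: ["v1beta1", "v1"] },
};
console.log(migrationStatus(exampleNodeClaimCrd).legacyStillStored); // prints true
```

Seeing both v1 and v1beta1 in definedVersions is expected after the upgrade; only legacyStillStored says whether stored resources may still need conversion.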

@laserpedro

@jmdeal thank you for your answer. I misunderstood the conversion webhook and thought it was the other way around. Thanks for the clarification!

@elihuj117

We are seeing this same behavior. We upgraded from 0.37.0 to 1.0.3 (with an intermediate upgrade to 0.37.3 during the upgrade process). The error seems to be innocuous, but I wanted to see if there was any impact on the core functionality of Karpenter.

@apurvabhandari

apurvabhandari commented Oct 9, 2024

I have done the upgrade from 0.37.5 to 1.0.6 and still see this issue. I enabled the webhook in 0.37.5, and this error is from Karpenter 1.0.6:
{"level":"ERROR","time":"2024-10-09T14:27:06.147Z","logger":"webhook","message":"http: TLS handshake error from 10.214.2.206:34084: EOF\n","commit":"6174c75"}
{"level":"ERROR","time":"2024-10-09T14:27:06.319Z","logger":"webhook","message":"http: TLS handshake error from 10.214.60.56:40108: EOF\n","commit":"6174c75"}

@itayvolo
Contributor

+1

@awoimbee

awoimbee commented Oct 23, 2024

I think this issue is caused by the conversion webhook configured on the CRDs (I have had a hard time with these already with #6818).
I use Pulumi transforms to remove them, and the error is gone:

    transforms: [
      ({ props, opts, type }) => {
        if (type === "kubernetes:apiextensions.k8s.io/v1:CustomResourceDefinition") {
          // Disable the Karpenter conversion webhooks, which were only useful when upgrading to v1 and now cause errors
          props.spec.conversion = undefined;
          return { props, opts };
        }
        return undefined;
      }
    ]
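
Note that the transform above clears spec.conversion on every CustomResourceDefinition it sees, not only Karpenter's. A narrower variant could check the CRD group first. The sketch below uses local stand-in types instead of the Pulumi SDK so that it runs on its own; TransformArgs, TransformResult, and dropKarpenterConversion are hypothetical names, and the group list is an assumption (karpenter.sh appears earlier in this thread, karpenter.k8s.aws is the group of the AWS provider's CRDs).

```typescript
// Local stand-ins for the parts of a Pulumi transform's arguments used here.
interface CrdProps {
  spec?: { group?: string; conversion?: unknown };
}
interface TransformArgs {
  type: string;
  props: CrdProps;
  opts: object;
}
type TransformResult = { props: CrdProps; opts: object } | undefined;

const CRD_TYPE = "kubernetes:apiextensions.k8s.io/v1:CustomResourceDefinition";
// ASSUMPTION: these are the API groups of the Karpenter CRDs in your chart.
const KARPENTER_GROUPS = ["karpenter.sh", "karpenter.k8s.aws"];

function dropKarpenterConversion({ type, props, opts }: TransformArgs): TransformResult {
  const group = props.spec?.group;
  if (type !== CRD_TYPE || group === undefined || KARPENTER_GROUPS.indexOf(group) === -1) {
    return undefined; // leave every other resource untouched
  }
  // Copy instead of mutating the caller's props, then clear the conversion stanza.
  const spec = { ...props.spec, conversion: undefined };
  return { props: { ...props, spec }, opts };
}

// Example: a Karpenter CRD loses its conversion block, anything else is skipped.
const transformed = dropKarpenterConversion({
  type: CRD_TYPE,
  props: { spec: { group: "karpenter.sh", conversion: { strategy: "Webhook" } } },
  opts: {},
});
console.log(transformed !== undefined); // prints true
```

Per the maintainer's explanation earlier in the thread, the conversion webhook is what converts stored v1beta1 resources, so a transform like this seems safe only once .status.storedVersions no longer lists v1beta1.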

@ajith-thomas-fw

Hi, I upgraded Karpenter from v0.33.10 to v1.0.3 following the upgrade guide, https://karpenter.sh/docs/upgrading/v1-migration/#upgrade-procedure,
but, as others mentioned above, ran into the TLS error, without any impact on Karpenter's functionality.

{"level":"ERROR","time":"2024-11-01T05:16:43.587Z","logger":"webhook","message":"http: TLS handshake error from 100.x.x.x:32858: read tcp 100.x.x.x:8443->100.x.x.x:32858: read: connection reset by peer\n","commit":"688ea21"}
{"level":"ERROR","time":"2024-11-01T05:16:43.590Z","logger":"webhook","message":"http: TLS handshake error from 100.x.x.x:32876: read tcp 100.x.x.x:8443->100.x.x.x:32876: read: connection reset by peer\n","commit":"688ea21"}

I was able to get rid of the errors by disabling the webhook (setting DISABLE_WEBHOOK=true), but as mentioned in the thread below, I am not sure about the repercussions of this.
kubernetes-sigs/karpenter#718 (comment)

Following the discussions in these threads, I believe these webhooks are necessary to migrate the API from v1beta1 to v1 in a future release. Can someone comment on this?

Contributor

This issue has been inactive for 7 days and is marked as "triage/solved". StaleBot will close this stale issue after 7 more days of inactivity.

@Hronom
Contributor

Hronom commented Dec 15, 2024

I would like to hear clarification on this from the developers.
Specifically, what is the recommended approach if you use the latest version of Karpenter?

I still don't understand what the webhooks are used for and whether I need to keep them enabled in the latest version of Karpenter.

Contributor

This issue has been inactive for 7 days and is marked as "triage/solved". StaleBot will close this stale issue after 7 more days of inactivity.

@prad9192
Contributor

One of the things I notice is that if we run a single replica of Karpenter, this error goes away. Not a recommendation, but reporting an observation if it helps the investigation.

Contributor

github-actions bot commented Jan 2, 2025

This issue has been inactive for 7 days and is marked as "triage/solved". StaleBot will close this stale issue after 7 more days of inactivity.

@nantiferov
Contributor

I hope this error will be gone with the update to 1.1, which should support only the v1 API.

Contributor

This issue has been inactive for 7 days and is marked as "triage/solved". StaleBot will close this stale issue after 7 more days of inactivity.

@korncola

korncola commented Jan 13, 2025

Are the devs dead, or why are they not responding? What are these webhooks, and is disabling them safe?

@ajaykumarmandapati

The issue still exists after upgrading to Karpenter v1.1.1. It is quite misleading and pollutes our logs. Do you recommend turning off the webhook?

"message":"http: TLS handshake error from [2a05:d014:3b8:5c05::221b]:41978: EOF\n","commit":"a2875e3"}
