Operator Deployment OOMKill. #86
Comments
I am assuming you mean the Cloudflare tunnel deployment and not the operator itself? Mine seems to be running fine with the same limits. May I ask which version of cloudflared you are using?
No, the operator. This is the snippet from my kustomization.yaml:

    patches:
    - path: patches/cloudflare-operator-controller-manager-resources.json
      target:
        group: apps
        version: v1
        kind: Deployment
        name: cloudflare-operator-controller-manager
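In case it helps anyone else landing here, below is a minimal sketch of what such a resources patch could look like, written inline in the kustomization rather than in a separate file. This is an assumption, not the poster's actual patches/cloudflare-operator-controller-manager-resources.json: the 400Mi figure is taken from the original post at the bottom of this thread, and the container index (0 here) depends on the deployment layout, so it may need adjusting if a sidecar such as kube-rbac-proxy comes first.

    # Hypothetical inline equivalent of the separate JSON patch file referenced above.
    # Adjust the container index and memory value for your own deployment;
    # use "add" instead of "replace" if the limit is not already set.
    patches:
      - target:
          group: apps
          version: v1
          kind: Deployment
          name: cloudflare-operator-controller-manager
        patch: |-
          - op: replace
            path: /spec/template/spec/containers/0/resources/limits/memory
            value: 400Mi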
I'll get a log capture. I'm not at the system right now.
Okay, I see the confusion. I referenced the tunnel deployment code in the original post. This is what I meant to reference:
Pod logs:
It seems doubling the limit to …
I wonder if the kube client discovery cache is bloating the memory. I don't have an excessive number of CRDs in either cluster. I cleaned up the 1.24 cluster before the image above; I'll clean up the 1.23 cluster tomorrow and see what happens.
That does not seem right, and I cannot think of a way to debug why this one is taking more memory (other than profiling it, which I am not sure is worth the effort, haha), since the containers themselves do not have any tools for you to exec into. The 50 MB sounds about right. I do not think the kube discovery cache has anything to do with this, but sure, let me know. Mine used to be on k8s 1.22 and is now on 1.26, so the version should not be an issue.
I did get an alloc flame graph with the krew flame plugin. GitHub does a static rendering, so the 15-minute one is sort of useless when posted here. I guess I have something wrong with that cluster; I'll roll this out to the rest and compare. FWIW, there's just a single ClusterTunnel in my deployment, and the overlays only change the name of the tunnel.
Is that an alloc count or byte graph? Either way, all I see are k8s libraries used by the controller, nothing from this project's own code. The widest call, x/net/http2 -> compress/gzip, suggests a lot of HTTP requests (or requests with large bodies, depending on which graph this is) to the manager pod. If health checks or something like that are misconfigured, to either send a lot of requests or requests with large content, that could be a reason too.
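On the health-check angle: a typical kubebuilder-scaffolded manager exposes /healthz and /readyz on a dedicated probe port, roughly as in the sketch below. These are the common scaffold defaults, not values confirmed against this operator's manifests; if the probes were pointed at the wrong port or fired very frequently, that traffic would show up in an HTTP-heavy allocation profile like the one discussed above.

    # Typical kubebuilder scaffold probe settings (assumed defaults,
    # not confirmed against this operator's deployment).
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8081
      initialDelaySeconds: 15
      periodSeconds: 20
    readinessProbe:
      httpGet:
        path: /readyz
        port: 8081
      initialDelaySeconds: 5
      periodSeconds: 10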
Original post:
The memory limits might be a little too low. I wonder if anyone else is seeing the same with this version. I'm not doing anything fancy.
Version:
v0.10.0
I needed to patch them to 400Mi (not a precise figure; I just picked a number):
Thanks!
Edit: Removed an incorrect code line reference.
https://github.com/adyanth/cloudflare-operator/blob/c38e0cc14dceef41729f8f9852c5e3743d392bff/controllers/reconciler.go#L491