fix(local): cap keycloak jvm heap and raise nats helm wait timeout #79
drobertson123 wants to merge 1 commit into
Keycloak was OOMKilled in a crashloop during its build phase: with no heap settings, the Quarkus default MaxRAMPercentage=70% plus non-heap RSS (metaspace, GC, buffers) exceeded the container limit. Constrain the heap via JAVA_OPTS_KC_HEAP so it fits, and bump the limit to 1.5Gi for headroom. Keycloak now reaches Ready in ~30s.

The nats-event-bus Helm release uses wait: true, but skaffold relied on Helm's default 5m --wait timeout. The mTLS NATS cluster can take longer to become ready on slower/loaded machines, producing "context deadline exceeded" even though the release ultimately deploys. Raise the upgrade --timeout to 15m on all three cluster configs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Walkthrough
Updates Keycloak container memory resources and adds JVM heap configuration. Extends Helm deployment upgrade timeouts from 5 to 15 minutes for three NATS release environments to accommodate mTLS cluster readiness checks.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adjusts local deployment settings to reduce flaky/failed startups on slower machines by increasing Helm timeouts for NATS and tuning Keycloak memory/JVM heap to avoid OOM kills.
Changes:
- Increase Helm upgrade timeout to 15 minutes for local NATS releases in Skaffold.
- Raise Keycloak container memory requests/limits.
- Constrain Keycloak JVM heap via environment variable to keep RSS within container limits.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| local/nats/skaffold.releases.yaml | Adds Helm upgrade --timeout=15m to give local NATS clusters more time to become ready. |
| local/infra/keycloak/keycloak.yaml | Increases Keycloak memory resources and sets JVM heap bounds to reduce OOMKilled restarts. |
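
For orientation, here is a minimal sketch of what the timeout change in local/nats/skaffold.releases.yaml might look like. Only `nats-event-bus`, `wait: true`, and `--timeout=15m` come from the PR description; the `apiVersion`, the config name, and the `chartPath` are hypothetical placeholders, and the real file may be laid out differently.

```yaml
# Sketch only: one of the three cluster configs (csc, cpc-1, cpc-2).
apiVersion: skaffold/v4beta6        # assumption: the actual schema version is not shown in the PR
kind: Config
metadata:
  name: nats-csc                    # hypothetical config name
deploy:
  helm:
    flags:
      upgrade:
        - --timeout=15m             # replaces Helm's default 5m wait timeout
    releases:
      - name: nats-event-bus
        chartPath: charts/nats      # hypothetical chart location
        wait: true                  # wait for resources to become ready
```

Because the flag sits under `deploy.helm.flags.upgrade`, it applies to every release in that config, which would explain why the change is repeated once per cluster config.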
      resources:
        requests:
          cpu: "100m"
    -     memory: "550Mi"
    +     memory: "1Gi"
        limits:
          cpu: "1000m"
    -     memory: "550Mi"
    +     memory: "1536Mi"
@drobertson123 I tuned the memory usage pretty low on purpose, so it can fit on smaller machines. It should be using ~80-90% of the requested memory. I measured a run just now at 458MiB used. The e2e tests in CI show the deployment succeeding. Are you running additional activity on Keycloak outside of the normal test flow?
    + # Constrain the JVM heap so heap + non-heap RSS fits inside the
    + # container memory limit. Without this, Quarkus defaults to
    + # MaxRAMPercentage=70%, and heap + metaspace/GC/buffers exceed the
    + # limit and the pod is OOMKilled during the build phase.
    + - name: JAVA_OPTS_KC_HEAP
    +   value: "-Xms256m -Xmx768m"
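
As a rough check of the sizing argument in the comment above, here is the arithmetic. It assumes the JVM derives its maximum heap from the container memory limit and that MaxRAMPercentage=70% is in effect. The non-heap figure is only what remains under the limit, not a measured value.

$$
\begin{aligned}
\text{before:}\quad & 0.70 \times 550\ \text{MiB} = 385\ \text{MiB max heap}, & 550 - 385 &= 165\ \text{MiB left for non-heap} \\
\text{after:}\quad & \texttt{-Xmx768m} \Rightarrow 768\ \text{MiB max heap}, & 1536 - 768 &= 768\ \text{MiB left for non-heap}
\end{aligned}
$$

Whether 165 MiB is actually too little for metaspace, GC structures, and buffers depends on the workload, which is the point under discussion in the review thread.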
Summary
Two fixes to the local Kind e2e stack (`make test`) that were preventing it from completing:

- Keycloak: with no heap settings, the JVM ran with the Quarkus default `MaxRAMPercentage=70%`, so heap + non-heap RSS (metaspace, GC, buffers) exceeded the container memory limit and the pod was OOMKilled during its build phase, regardless of machine speed. Constrain the heap via `JAVA_OPTS_KC_HEAP=-Xms256m -Xmx768m` and bump the limit to 1.5Gi for headroom. Keycloak now reaches Ready in ~30s.
- `nats-event-bus` is deployed with `wait: true`, but skaffold relied on Helm's default 5-minute `--wait` timeout. The mTLS NATS cluster can take longer to become ready on slower/loaded machines, producing `context deadline exceeded` even though the release ultimately deploys. Raise the upgrade `--timeout` to 15m on all three cluster configs (csc, cpc-1, cpc-2).

Testing
Full `make test` (3-cluster Kind e2e incl. federation perf tests) passes end-to-end with these changes.

🤖 Generated with Claude Code