Description
We're experiencing a critical issue with Envoy Gateway where endpoint updates are delayed by 5-10 minutes for a specific HTTPRoute (Keycloak), resulting in 503 errors during deployments. This issue is reproducible only with this particular HTTPRoute, while other HTTPRoutes in our cluster work correctly.
Environment
- Kubernetes Version: GKE v1.32 with dataplane v2
- Envoy Gateway Version: v1.5.4
- Cluster Setup:
  - 2 Envoy Gateway instances
  - 2 proxy replicas per gateway
  - 1 GKE native gateway (same namespace)
- Routing Mode: Endpoints (issue doesn't occur with Service mode, but that's not a viable solution for us)
Observed Behavior
During deployment rollouts of the Keycloak application, Envoy proxies receive endpoint updates with significant delays (5-10 minutes), causing:
- Incorrect proxy state
- 503 errors for incoming requests
- Traffic being routed to terminated pods
We monitored this using:
viddy 'egctl config envoy-proxy endpoint -n gateways -l gateway.envoyproxy.io/owning-gateway-name=my-public-envoy | jq -S "
.gateways |= with_entries(
.value.dynamicEndpointConfigs |= sort_by(.endpointConfig.clusterName) |
.value.staticEndpointConfigs |= sort_by(.endpointConfig.clusterName)
)
"'
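For anyone trying to reproduce this, the comparison we do by eye could be sketched roughly as below. It is only a sketch under assumptions: `envoy_addresses` and `stale_addresses` are hypothetical helper names, the `endpoints[].lbEndpoints[].endpoint.address.socketAddress.address` path is assumed from Envoy's `ClusterLoadAssignment` shape in camelCase JSON (only `.gateways`, `dynamicEndpointConfigs` and `endpointConfig.clusterName` are confirmed by the jq filter above), and the cluster-name substring is something you would need to look up in your own output.

```python
from typing import Any, Dict, Iterable, Set


def envoy_addresses(dump: Dict[str, Any], cluster_substring: str) -> Set[str]:
    """Collect endpoint IPs Envoy holds for clusters whose name contains cluster_substring.

    `dump` is the parsed JSON printed by the egctl command above.
    The nested field names below are an assumption, not verified output.
    """
    found: Set[str] = set()
    for gateway in (dump.get("gateways") or {}).values():
        for cfg in gateway.get("dynamicEndpointConfigs") or []:
            assignment = cfg.get("endpointConfig") or {}
            if cluster_substring not in assignment.get("clusterName", ""):
                continue
            for locality in assignment.get("endpoints") or []:
                for lb_endpoint in locality.get("lbEndpoints") or []:
                    socket = (
                        lb_endpoint.get("endpoint", {})
                        .get("address", {})
                        .get("socketAddress", {})
                    )
                    if socket.get("address"):
                        found.add(socket["address"])
    return found


def stale_addresses(
    dump: Dict[str, Any], ready_pod_ips: Iterable[str], cluster_substring: str
) -> Set[str]:
    """Return IPs Envoy still routes to that are not in the set of ready pod IPs."""
    return envoy_addresses(dump, cluster_substring) - set(ready_pod_ips)
```

Feeding it the egctl JSON together with the ready pod IPs (for example taken from the Service's EndpointSlices) during a rollout would show how long terminated pod IPs linger in the proxy.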
Expected Behavior
Endpoint updates should propagate to Envoy proxies within seconds (similar to other HTTPRoutes in the cluster), ensuring zero-downtime deployments.
Manifests
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: keycloak
  namespace: keycloak
spec:
  hostnames:
    - keycloak.my.domain
  parentRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: my-public-envoy
      namespace: gateways
      sectionName: https
  rules:
    - backendRefs:
        - group: ''
          kind: Service
          name: keycloak
          port: 8080
          weight: 1
      matches:
        - path:
            type: PathPrefix
            value: /realms/
        - path:
            type: PathPrefix
            value: /resources/
        - path:
            type: Exact
            value: /robots.txt
status:
  parents:
    - conditions:
        - lastTransitionTime: '2025-10-27T16:23:15Z'
          message: Route is accepted
          observedGeneration: 3
          reason: Accepted
          status: 'True'
          type: Accepted
        - lastTransitionTime: '2025-10-27T16:23:15Z'
          message: Resolved all the Object references for the Route
          observedGeneration: 3
          reason: ResolvedRefs
          status: 'True'
          type: ResolvedRefs
      controllerName: gateway.envoyproxy.io/gatewayclass-controller
      parentRef:
        group: gateway.networking.k8s.io
        kind: Gateway
        name: my-public-envoy
        namespace: gateways
        sectionName: https

apiVersion: v1
kind: Service
metadata:
  name: keycloak
  namespace: keycloak
spec:
  clusterIP: 10.3.12.238
  clusterIPs:
    - 10.3.12.238
  internalTrafficPolicy: Cluster
  ipFamilies:
    - IPv4
  ipFamilyPolicy: SingleStack
  ports:
    - name: http
      port: 8080
      protocol: TCP
      targetPort: http
  selector:
    app.kubernetes.io/instance: keycloak
    app.kubernetes.io/name: keycloak
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/instance: keycloak
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: keycloak
    app.kubernetes.io/version: 26.0.5
    helm.sh/chart: keycloak-0.1.0
  name: keycloak
  namespace: keycloak
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/instance: keycloak
      app.kubernetes.io/name: keycloak
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app.kubernetes.io/instance: keycloak
        app.kubernetes.io/managed-by: Helm
        app.kubernetes.io/name: keycloak
        app.kubernetes.io/version: 26.0.5
        helm.sh/chart: keycloak-0.1.0
    spec:
      containers:
        - args:
            - start
            - '--verbose'
            - '--log-console-output=json'
            - '--log-level=INFO,org.keycloak:DEBUG'
            - '--features=user-event-metrics,client-secret-rotation'
          env:
            - name: STAKATER_KEYCLOAK_CONFIGMAP
              value: 4e6c49a1c79cdff645dce8afc232d95bc93155b4
            - name: KC_DB_USERNAME
              valueFrom:
                secretKeyRef:
                  key: LOGIN
                  name: keycloak-db
            - name: KC_DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  key: PASSWORD
                  name: keycloak-db
            - name: KC_DB_URL_HOST
              valueFrom:
                secretKeyRef:
                  key: HOST
                  name: keycloak-db
            - name: KC_DB_URL_PORT
              valueFrom:
                secretKeyRef:
                  key: PORT
                  name: keycloak-db
          envFrom:
            - secretRef:
                name: keycloak-env
            - configMapRef:
                name: keycloak
          image: europe-docker.pkg.dev/my-company/docker/keycloak:main-2ac8399
          imagePullPolicy: IfNotPresent
          livenessProbe:
            failureThreshold: 3
            httpGet:
              path: /health/live
              port: management
              scheme: HTTPS
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
          name: keycloak
          ports:
            - containerPort: 8080
              name: http
              protocol: TCP
            - containerPort: 9000
              name: management
              protocol: TCP
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /health/ready
              port: management
              scheme: HTTPS
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
          resources:
            limits:
              memory: 3Gi
            requests:
              cpu: 100m
              memory: 1Gi
          securityContext: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
            - mountPath: /certs/public
              name: public-tls
              readOnly: true
            - mountPath: /certs/private
              name: private-tls
              readOnly: true
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: keycloak
      serviceAccountName: keycloak
      terminationGracePeriodSeconds: 30
      volumes:
        - name: public-tls
          secret:
            defaultMode: 420
            secretName: keycloak-public-tls
        - name: private-tls
          secret:
            defaultMode: 420
            secretName: keycloak-private-tls
status:
  availableReplicas: 1
  conditions:
    - lastTransitionTime: '2025-10-22T15:47:45Z'
      lastUpdateTime: '2025-10-22T15:47:45Z'
      message: Deployment has minimum availability.
      reason: MinimumReplicasAvailable
      status: 'True'
      type: Available
    - lastTransitionTime: '2025-09-24T19:04:05Z'
      lastUpdateTime: '2025-10-28T10:07:22Z'
      message: ReplicaSet "keycloak-7f64f656bf" has successfully progressed.
      reason: NewReplicaSetAvailable
      status: 'True'
      type: Progressing
  observedGeneration: 26
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1

Any guidance on debugging or resolving this issue would be greatly appreciated!