From 9f93f25145525760dd58242b53eb3af5fbfa6635 Mon Sep 17 00:00:00 2001
From: Timo Derstappen
Date: Thu, 25 Jun 2026 20:13:10 +0200
Subject: [PATCH] fix(connectivity): allow agentgateway data-plane egress to klaus-gateway:8080

The channel paths (/v1, /web, /cli/v1, /channels/slack) are served on the
agentgateway data-plane Gateway and forwarded to the klaus-gateway Service
on :8080. The data-plane runs in Cilium default-deny egress (the -dataplane
policy), whose cluster allowance only covers 80/443, and there was no
per-backend egress allowance for klaus-gateway (unlike -dataplane-to-kagent).
So the forward was dropped and every channel request -- including inbound
Slack events delivered to the public hostname and routed through the
data-plane -- failed with 503 UpstreamFailure ("Connect: deadline has
elapsed"), so the Slack bot never replied.

Add a -dataplane-to-klausgateway egress policy (cilium + kubernetes flavors,
rendered when klausGateway.agentgatewayRoute.enabled) mirroring the existing
-dataplane-to-kagent allowance.

Co-authored-by: Cursor
---
 CHANGELOG.md                                       |  1 +
 .../templates/klausgateway/netpol.yaml             | 58 +++++++++++++++++++
 2 files changed, 59 insertions(+)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 9f6acfc..4473f9e 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -15,6 +15,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Fixed
 
+- klausgateway channel routing: a new `*-dataplane-to-klausgateway` egress policy (cilium + kubernetes flavors, rendered when `klausGateway.agentgatewayRoute.enabled`) lets the agentgateway data-plane reach the `klaus-gateway` Service on `:8080`. The channel paths (`/v1`, `/web`, `/cli/v1`, and `/channels/slack` when `slack.enabled`) are served on the data-plane Gateway and forwarded to klaus-gateway, but the data-plane runs in default-deny egress (the `-dataplane` policy) whose cluster allowance only covers `80`/`443`. klaus-gateway listens on `8080`, so the forward was dropped and every channel request — including inbound Slack events delivered to the public hostname — failed with `503 UpstreamFailure ("Connect: deadline has elapsed")`, so the Slack bot never replied. Mirrors the existing `-dataplane-to-kagent` allowance.
 - `agentic-platform-connectivity` `values.schema.json`: allow `klausGateway.slack.{dmOnly,botToken,signingSecret,appToken}`. The umbrella forwards its whole `klausGateway` block to the connectivity HelmRelease via `forwardAllValues`, but the connectivity `slack` schema had `additionalProperties: false` and only declared `enabled`/`mode`/`secretName`, so a real install (gazelle) failed the HelmRelease upgrade with `Additional property dmOnly/botToken/signingSecret/appToken is not allowed`. These keys are consumed by the klaus-gateway subchart, not this chart; they are now declared (and documented as forwarded-only) so validation passes.
 - klausgateway Slack OBO egress: a new `klausgateway-obo-egress` NetworkPolicy (cilium + kubernetes flavors, rendered when `klausGateway.obo.enabled`) lets the klaus-gateway pod reach the muster authorization server on 443/10443 for RFC 8414 discovery and the OAuth token exchange. The gateway is put into default-deny egress by the `klausgateway-a2a-egress` policy, which only allowed DNS + the agentgateway data plane; without this allowance the OBO token call to muster's public issuer host (which resolves to the public NLB / private LB VIP) was dropped. Mirrors the existing kagent-agent and oauth2-proxy `world`+`cluster` 443/10443 egress.
 - klausgateway connectivity route: the `AgentgatewayBackend` `.spec.static.host` now defaults to the correct `klaus-gateway` Service name (the klaus-gateway chart's default, matching `templates/klausgateway/netpol.yaml`) instead of `klausgateway`, which resolved to a non-existent Service when `klausGateway.fullnameOverride` was unset.
diff --git a/helm/agentic-platform-connectivity/templates/klausgateway/netpol.yaml b/helm/agentic-platform-connectivity/templates/klausgateway/netpol.yaml
index e2f8d25..8b52bf0 100644
--- a/helm/agentic-platform-connectivity/templates/klausgateway/netpol.yaml
+++ b/helm/agentic-platform-connectivity/templates/klausgateway/netpol.yaml
@@ -110,3 +110,61 @@ spec:
           protocol: TCP
 {{- end }}
 {{- end }}
+{{- if and .Values.klausGateway.enabled .Values.klausGateway.agentgatewayRoute.enabled .Values.networkPolicy.enabled }}
+{{- $klausgatewayPort := "8080" }}
+{{- $fullname := .Values.klausGateway.fullnameOverride | default "klaus-gateway" }}
+{{- if eq .Values.networkPolicy.flavor "cilium" }}
+---
+# Egress from the agentgateway data-plane to klaus-gateway. The channel paths
+# (/v1, /web, /cli/v1, and /channels/slack when slack.enabled) are served on the
+# agentgateway data-plane Gateway and forwarded to the klaus-gateway Service. The
+# data-plane runs in Cilium default-deny egress (see the -dataplane policy), whose
+# cluster allowance only covers 80/443 — klaus-gateway listens on 8080, so without
+# this rule the data-plane → klaus-gateway connection is dropped and the request
+# fails with a 503 UpstreamFailure ("Connect: deadline has elapsed"). For Slack
+# this means inbound events (delivered to the public hostname, routed through the
+# data-plane) never reach the gateway, so the bot never replies. Mirrors the
+# -dataplane-to-kagent allowance.
+apiVersion: cilium.io/v2
+kind: CiliumNetworkPolicy
+metadata:
+  name: {{ include "name" . }}-dataplane-to-klausgateway
+  namespace: {{ .Release.Namespace }}
+  labels:
+    {{- include "labels.common" . | nindent 4 }}
+spec:
+  endpointSelector:
+    matchLabels:
+      gateway.networking.k8s.io/gateway-name: {{ .Values.gateway.name }}
+  egress:
+    - toEndpoints:
+        - matchLabels:
+            app.kubernetes.io/name: {{ $fullname | quote }}
+      toPorts:
+        - ports:
+            - port: {{ $klausgatewayPort | quote }}
+              protocol: TCP
+{{- else if eq .Values.networkPolicy.flavor "kubernetes" }}
+---
+apiVersion: networking.k8s.io/v1
+kind: NetworkPolicy
+metadata:
+  name: {{ include "name" . }}-dataplane-to-klausgateway
+  namespace: {{ .Release.Namespace }}
+  labels:
+    {{- include "labels.common" . | nindent 4 }}
+spec:
+  podSelector:
+    matchLabels:
+      gateway.networking.k8s.io/gateway-name: {{ .Values.gateway.name }}
+  policyTypes: [Egress]
+  egress:
+    - to:
+        - podSelector:
+            matchLabels:
+              app.kubernetes.io/name: {{ $fullname | quote }}
+      ports:
+        - port: {{ $klausgatewayPort | int }}
+          protocol: TCP
+{{- end }}
+{{- end }}