From 8f4be39cab44bac9d098926f06c9d5d641c89978 Mon Sep 17 00:00:00 2001
From: Timo Derstappen
Date: Thu, 25 Jun 2026 20:47:26 +0200
Subject: [PATCH] fix(connectivity): drop hostname scoping on inner kagent-controller route

The inner kagent-controller HTTPRoute (agentgateway data-plane Gateway)
was rendered with the public hostname from
kagent.controllerRoute.hostname. In-cluster A2A callers (klaus-gateway
--a2a-url -> agentgateway Service cluster-DNS host) did not match it, so
requests fell through to the MCP catch-all route ("/") and were rejected
with 406 ("client must accept both application/json and
text/event-stream"). The internal data-plane hop must match any Host;
/kagent is more specific than "/", so a hostname-less route wins. The
public route keeps its hostname.

Co-authored-by: Cursor
---
 CHANGELOG.md                                        |  1 +
 .../templates/kagent/controller-route.yaml          | 13 +++++++++----
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 4473f9e..5740ed7 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -15,6 +15,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Fixed
 
+- kagent A2A routing: the inner `kagent-controller` HTTPRoute (on the agentgateway data-plane Gateway) is no longer hostname-scoped. It was rendered with `hostnames: []` from `kagent.controllerRoute.hostname`, but in-cluster A2A callers — notably klaus-gateway's `--a2a-url`, which targets the agentgateway Service's cluster-DNS name (`agentgateway..svc.cluster.local`) — send that Service host, which did not match the public hostname. The request fell through to the catch-all MCP route (`agentic-platform-mcps`, path `/`), was handled as MCP Streamable-HTTP, and was rejected with `406 Not Acceptable ("mcp: client must accept both application/json and text/event-stream")`, so the Slack OBO sre-agent turn never reached the agent (it had passed JWT validation — the human token was forwarded — but died at content negotiation on the wrong route). The inner hop is internal and must match any Host; the `/kagent` prefix is more specific than the MCP `/` route, so a hostname-less route wins for every Host. The outer `kagent-controller-public` route keeps the public `hostname` (required on the shared TLS Gateway).
 - klausgateway channel routing: a new `*-dataplane-to-klausgateway` egress policy (cilium + kubernetes flavors, rendered when `klausGateway.agentgatewayRoute.enabled`) lets the agentgateway data-plane reach the `klaus-gateway` Service on `:8080`. The channel paths (`/v1`, `/web`, `/cli/v1`, and `/channels/slack` when `slack.enabled`) are served on the data-plane Gateway and forwarded to klaus-gateway, but the data-plane runs in default-deny egress (the `-dataplane` policy) whose cluster allowance only covers `80`/`443`. klaus-gateway listens on `8080`, so the forward was dropped and every channel request — including inbound Slack events delivered to the public hostname — failed with `503 UpstreamFailure ("Connect: deadline has elapsed")`, so the Slack bot never replied. Mirrors the existing `-dataplane-to-kagent` allowance.
 - `agentic-platform-connectivity` `values.schema.json`: allow `klausGateway.slack.{dmOnly,botToken,signingSecret,appToken}`. The umbrella forwards its whole `klausGateway` block to the connectivity HelmRelease via `forwardAllValues`, but the connectivity `slack` schema had `additionalProperties: false` and only declared `enabled`/`mode`/`secretName`, so a real install (gazelle) failed the HelmRelease upgrade with `Additional property dmOnly/botToken/signingSecret/appToken is not allowed`. These keys are consumed by the klaus-gateway subchart, not this chart; they are now declared (and documented as forwarded-only) so validation passes.
 - klausgateway Slack OBO egress: a new `klausgateway-obo-egress` NetworkPolicy (cilium + kubernetes flavors, rendered when `klausGateway.obo.enabled`) lets the klaus-gateway pod reach the muster authorization server on 443/10443 for RFC 8414 discovery and the OAuth token exchange. The gateway is put into default-deny egress by the `klausgateway-a2a-egress` policy, which only allowed DNS + the agentgateway data plane; without this allowance the OBO token call to muster's public issuer host (which resolves to the public NLB / private LB VIP) was dropped. Mirrors the existing kagent-agent and oauth2-proxy `world`+`cluster` 443/10443 egress.
diff --git a/helm/agentic-platform-connectivity/templates/kagent/controller-route.yaml b/helm/agentic-platform-connectivity/templates/kagent/controller-route.yaml
index 21bd657..907fb78 100644
--- a/helm/agentic-platform-connectivity/templates/kagent/controller-route.yaml
+++ b/helm/agentic-platform-connectivity/templates/kagent/controller-route.yaml
@@ -25,6 +25,15 @@ spec:
 # HTTPRoute — exposes the kagent API at on the agentgateway Gateway.
 # Path prefix is stripped (URLRewrite) before forwarding so the controller
 # receives requests at /api/... as it expects.
+#
+# This is the internal data-plane hop. It is deliberately NOT hostname-scoped:
+# in-cluster A2A callers (e.g. klaus-gateway's --a2a-url, which points at the
+# agentgateway Service's cluster-DNS name) send Host=..svc.cluster.local,
+# which would not match the public hostname. A host-scoped inner route lets those
+# requests fall through to the catch-all MCP route ("/"), where they are handled
+# as MCP Streamable-HTTP and rejected with 406 ("client must accept both
+# application/json and text/event-stream"). The path prefix (/kagent) is more
+# specific than the MCP "/" route, so a hostname-less route wins for every Host.
 apiVersion: gateway.networking.k8s.io/v1
 kind: HTTPRoute
 metadata:
@@ -38,10 +47,6 @@ spec:
       namespace: {{ .Release.Namespace }}
       group: gateway.networking.k8s.io
       kind: Gateway
-  {{- if $route.hostname }}
-  hostnames:
-    - {{ $route.hostname | quote }}
-  {{- end }}
   rules:
     - matches:
         - path: