feat: Discord chat ingress (discordbot) by 0xdiid · Pull Request #401 · paradigmxyz/centaur

0xdiid · 2026-06-04T12:56:11Z

Adds Discord as a chat ingress for centaur. It's deliberately a close clone of slackbotv2: same session lifecycle against the api-rs control plane, same streaming/rendering loop, and same stream-safety behavior (bounded task payloads, tolerant context collection, execution-scoped event streams, Postgres resilience, plain-text render requests). The differences are Discord-shaped edges:

- a persistent Gateway connection instead of webhooks;
- a guild allowlist (the bot is inert until one is configured);
- auto-threading off the triggering message;
- an instant placeholder message while the sandbox spins up (where slackbotv2 instead waits for the first visible output);
- thread-starter context: Discord keeps the message a thread was started from in the parent channel rather than in the thread itself, and webhook-style messages (Sentry alerts and the like) carry their payload in embeds, so both get fetched/flattened into the agent's context, where Slack hands them over for free.

The deliberate divergences are commented in-code at each site.

Non-Discord changes

- api-rs sandboxes get tools + an overlay, via the CLI-shim model. Agents spawned by the Rust control plane get no deployment tools and no overlay without this, and that affects every ingress riding api-rs, not just Discord. Rather than duplicating the Python control plane's tool-server sidecar, this finishes the last mile of the shim architecture the control plane already carries: two init containers copy the tool sources and overlay tree into the agent container, and the agent env gets the matching tool-directory paths so the existing shim installer turns them into CLIs at boot. Secrets were already handled (placeholder env vars with credential injection at the egress proxy, database access through proxied DSNs), so nothing real ever enters the sandbox, and warm-pool sandboxes get the same wiring through the shared spawn path. Everything is opt-in and additive (the config defaults to off, and the chart wires it through the existing toolServer/overlay values), so unset flags reproduce upstream behavior exactly. No new dependencies; needs a live re-verification on a cluster.
- A network policy carve-out for direct HTTPS egress from the bot, because the Discord Gateway is an outbound websocket that can't go through the proxy. Scoped to the bot's pod only (health-probe ingress, control-plane + database + 443 egress), but worth a security look.
- The bot duplicates a lot of slackbotv2 (session client, streaming, rendering) rather than sharing a package. We've been keeping them in sync by hand; factoring out the common core is probably the right long-term move.

🤖 Generated with Claude Code

Rebased onto api-rs-control-plane. discordbot is a direct clone of the current slackbotv2 — its own inlined session-api (no shared bridge package, to avoid coupling to slackbotv2's churn) — pointing at the api-rs session control plane, including the upstream handoff-before-ack flow, retryable SessionApiError classification, execute idempotency keys, client_message_id append dedupe, and render obligations with on-start recovery. Discord-specific pieces: createDiscordAdapter, a long-lived Gateway controller, fail-closed guild allowlist + DM-deny, native-thread naming, and a typing-indicator keepalive in place of Slack status/title. Where slackbotv2 answers 503 to request a Slack webhook retry, the Gateway has no re-delivery, so handoff failures render in-thread. Carries the prior P1 fixes (concurrency:'drop', generic api-rs error messages). 21 unit tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
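
For reviewers, the fail-closed guild allowlist + DM-deny rule can be sketched as a small predicate. This is an illustration only, not the bot's code: `parseGuildAllowlist`, `shouldHandleMessage`, the `MessageOrigin` shape, and the comma-separated config format are all assumptions made for the example.

```typescript
// Hypothetical sketch of a fail-closed guild allowlist with DM-deny.
// All names and the comma-separated format are illustrative assumptions.

interface MessageOrigin {
  guildId: string | null; // null models a direct message
}

// Parse a comma-separated list of guild IDs; blank entries are ignored.
function parseGuildAllowlist(raw: string | undefined): Set<string> {
  const ids = (raw ?? "")
    .split(",")
    .map((id) => id.trim())
    .filter((id) => id.length > 0);
  return new Set(ids);
}

// Fail closed: an empty allowlist admits nothing, and DMs are always denied.
function shouldHandleMessage(origin: MessageOrigin, allowlist: Set<string>): boolean {
  if (origin.guildId === null) return false; // DM-deny
  return allowlist.has(origin.guildId);
}

const exampleAllowlist = parseGuildAllowlist("111, 222,");
console.log(shouldHandleMessage({ guildId: "111" }, exampleAllowlist)); // true
console.log(shouldHandleMessage({ guildId: null }, exampleAllowlist)); // false
console.log(shouldHandleMessage({ guildId: "111" }, parseGuildAllowlist(undefined))); // false
```

The point of the shape is that "unconfigured" and "not listed" take the same deny path, so a missing value cannot accidentally open the bot to every guild.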

Mirror the slackbotv2 deploy pattern for discordbot against the api-rs control plane: Dockerfile (copies packages/ + .npmrc for the tslib hoist), Helm discordbot.yaml (Deployment + Service, CENTAUR_API_URL -> api-rs:{apiRs.port}, replicas:1 + Recreate + 35s grace for the singleton Gateway session), a dedicated NetworkPolicy (egress to api-rs + postgres + direct :443 for the Gateway, since the cluster is default-deny), values + dev override (off by default, guild allowlist required), Justfile _build-discordbot + import/ghcr wiring, and Discord keys in the secrets bootstrap. CI matrix unchanged (slackbotv2/api-rs build locally too). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ress The discordbot (a clone of slackbotv2) forwards every message to api-rs (CENTAUR_API_URL → centaur-centaur-api-rs:8080). Its own egress NetworkPolicy already permits →api-rs, but the api-rs ingress `from` list allowed only slackbotv2 and managed-by=api-rs sandboxes — not the discordbot. On a cluster that enforces NetworkPolicy the forward is rejected at connect time, so the bot can never reach the control plane. Add a discordbot podSelector to the api-rs ingress, gated on `.Values.discordbot.enabled` and mirroring the slackbotv2 block. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The discordbot (like slackbotv2) connects to Postgres directly for its per-thread advisory locks / session state, and its egress NetworkPolicy already permits →postgres:5432. But the postgres ingress `from` list allowed api / api-rs / slackbotv2 / slackbot / iron-control — not the discordbot — so on a NetworkPolicy-enforcing cluster the bot crash-looped at startup with `ECONNREFUSED 5432`. Add a discordbot podSelector to the postgres ingress, gated on `.Values.discordbot.enabled` and mirroring the slackbotv2 block. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… max LONG_RUNNING_MS was 365*24*60*60*1000 (31_536_000_000), passed as the Gateway listener's self-destruct durationMs. The adapter backs it with a single setTimeout, whose delay is a 32-bit signed int: any value above 2^31-1 ms silently clamps to 1ms (Node/Bun logs `TimeoutOverflowWarning ... set to 1`). The "stay connected" timer fired almost immediately — the bot connected, logged "duration elapsed, disconnecting", and exited via onFatalEnd, crash-looping the pod. Cap at 2_147_483_647 (~24.8 days), the largest delay setTimeout can represent. discord.js holds one session via RESUME within the window; the timer then forces at most one re-IDENTIFY per ~24.8 days, well under the 1000/24h budget. A re-arm loop is avoided: the adapter can't tell timer-expiry from a fatal login error, so looping would mask a bad token into an infinite reconnect. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
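
The arithmetic behind the cap can be checked with a short sketch. `clampTimerDelay` is a hypothetical helper name used only for illustration; the commit itself simply lowers the constant.

```typescript
// Largest delay setTimeout can represent: 2^31 - 1 ms (roughly 24.8 days).
const MAX_TIMEOUT_MS = 2 ** 31 - 1;

// The previous value from the commit message: one year in milliseconds.
const ONE_YEAR_MS = 365 * 24 * 60 * 60 * 1000;

// Clamp a requested delay into the range setTimeout handles as intended.
// (Hypothetical helper; NaN and negative inputs fall back to 0.)
function clampTimerDelay(requestedMs: number): number {
  if (Number.isNaN(requestedMs) || requestedMs < 0) return 0;
  return Math.min(Math.floor(requestedMs), MAX_TIMEOUT_MS);
}

console.log(ONE_YEAR_MS); // 31536000000
console.log(ONE_YEAR_MS > MAX_TIMEOUT_MS); // true
console.log(clampTimerDelay(ONE_YEAR_MS)); // 2147483647
```

Passing the unclamped one-year value to `setTimeout` is what triggered the overflow warning and the near-immediate fire described above.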

The chat SDK posts a bare "..." while the agent works before any streamed content. Set ChatConfig.fallbackStreamingPlaceholderText to "✨ thinking..." (overridable via DiscordbotOptions.streamingPlaceholderText). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…stream The placeholder only appeared after the ~9s cold sandbox spin-up because the discordbot awaited the execute call (which blocks on sandbox+tool-server readiness) before starting the render that posts the placeholder. Do create+append only up front (fast), then run executeSession INSIDE the render stream — after yielding the placeholder. The user now sees "✨ thinking..." in ~0.3s while the sandbox spins up. executeSession is idempotent (idempotency_key = message id), so a render retry won't re-spawn; sandbox-spawn failures surface as an error in the same message (api-rs writes no event if the spawn itself fails, so this avoids a hung placeholder). The activeExecution guard is still set synchronously under the per-thread lock before execute, so double-execution protection is unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
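
A minimal sketch of the reordering, assuming a hypothetical `ExecuteFn` that stands in for `executeSession` and resolves to a stream of text chunks. The real render loop and event types are richer than this; only the ordering (placeholder first, execute inside the stream, failures surfaced in the same message) is the point.

```typescript
// Hypothetical stand-in for executeSession: takes an idempotency key and
// resolves to the stream of rendered text chunks for that execution.
type ExecuteFn = (idempotencyKey: string) => Promise<AsyncIterable<string>>;

async function* renderStream(
  messageId: string,
  execute: ExecuteFn,
  placeholder: string = "✨ thinking...",
): AsyncGenerator<string> {
  // Yielded before anything slow is awaited, so it can be posted right away.
  yield placeholder;
  try {
    // The message id doubles as the idempotency key, so a render retry
    // asks for the same execution instead of spawning a second sandbox.
    const events = await execute(messageId);
    for await (const chunk of events) {
      yield chunk;
    }
  } catch {
    // Surface spawn failures in the same message (with a generic text)
    // instead of leaving the placeholder hanging.
    yield "Something went wrong while starting the agent.";
  }
}

// Small helper to drain a stream into an array.
async function collect(stream: AsyncIterable<string>): Promise<string[]> {
  const chunks: string[] = [];
  for await (const chunk of stream) chunks.push(chunk);
  return chunks;
}
```

Because the generator yields before calling `execute`, the consumer can post the placeholder while the sandbox is still spinning up.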

Sync pass against slackbotv2 @ e391f11 (3-way merge per file):
- Port paradigmxyz#416/paradigmxyz#418: discordSafeChatSdkStream omits task_update output and truncates details to 500 chars before streaming.
- Port the thread_not_found tolerance in collectInitialContext with a Discord-shaped guard (NetworkError carrying "Unknown Channel"/10003).

Deliberate deltas kept (commented in-code): the synthetic starting notification and no streamAfterFirstChunk deferral. Both serve the instant "✨ thinking..." placeholder, where slackbotv2 instead posts nothing until the first visible chunk (paradigmxyz#406/paradigmxyz#415).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
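
The task_update bounding can be sketched roughly as follows. The chunk shapes and the `boundTaskChunk` name are assumptions for illustration; the chat SDK's real types are not shown in this PR text.

```typescript
// Hypothetical chunk shapes; the real chat SDK types may differ.
interface TaskUpdateChunk {
  type: "task_update";
  title: string;
  details?: string;
  output?: string;
}
interface TextChunk {
  type: "text";
  text: string;
}
type StreamChunk = TaskUpdateChunk | TextChunk;

const MAX_DETAILS_CHARS = 500;

// Drop task output entirely and cap details before the chunk is streamed.
// Note: slice counts UTF-16 code units, not user-visible characters.
function boundTaskChunk(chunk: StreamChunk): StreamChunk {
  if (chunk.type !== "task_update") return chunk;
  const bounded: TaskUpdateChunk = { type: "task_update", title: chunk.title };
  if (chunk.details !== undefined) {
    bounded.details = chunk.details.slice(0, MAX_DETAILS_CHARS);
  }
  return bounded; // output is intentionally not copied
}
```

Non-task chunks pass through untouched, so the bound only affects the payloads that can grow without limit.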

api-rs sandboxes had no tools and no overlay. Give api-rs-spawned agents the same base + overlay tools and overlay system-prompt the chart already wires for the api-rs pod, using upstream's CLI-shim tool model rather than a sidecar.

Upstream direction: tools are shell CLI shims, not an HTTP registry. The agent image's install-tool-shims (services/sandbox/install_tool_shims.py) scans TOOL_DIRS at entrypoint and `uvx`-installs each pyproject [project.scripts] as a CLI; the SYSTEM_PROMPT points agents at those CLIs and `centaur-tools list`. The old `call <tool>` HTTP registry is deprecated to control-plane-only.

Tool secrets are already handled upstream: codex_app_server_env_template pushes the tool placeholder creds onto the agent env, iron-control grants the per-sandbox principal the real secrets, and Postgres rides proxied `*_DSN` env from apply_proxy_env. So the agent needs only the tool SOURCES at the right paths: no sidecar, no HMAC sandbox token, no loopback tool server.

- tools.rs (replaces tool_server.rs): a `tools-bootstrap` init container copies /app/tools out of the shared centaur-api image into an emptyDir mounted at /app/tools in the agent, and an `overlay-bootstrap` init container copies the org overlay tree into overlay-root mounted at overlay.mountPath (the same path the api-rs Deployment uses) and stages the overlay's SYSTEM_PROMPT.md as $HOME/AGENTS_OVERLAY.md, which the sandbox entrypoint appends to the base prompt. TOOL_DIRS is set on the agent env to /app/tools (or /app/tools:<mountPath>/tools with the overlay), identical to the value the api-rs pod computes for its own tool discovery, set deterministically in the spec builder rather than via passthrough env.
- lib.rs: build_agent_sandbox layers the tools/overlay env over spec.env, mounts the bootstrapped sources read-only into the agent, and appends the tools-bootstrap + overlay-bootstrap init containers and their volumes. No sidecar container, no token minting.
- args.rs: a minimal ToolsArgs (source image/pull-policy, reusing the KUBERNETES_TOOL_SERVER_IMAGE* env the chart sets from the shared api image) and OverlayArgs (image/pull-policy/source-path/mount-path) wired into AgentSandboxConfig. Explicit clap arg ids avoid id collisions with the other flattened arg structs.
- chart apirs.yaml: render the tools source image (api.image.*, gated on toolServer.enabled) and overlay (overlay.*) onto the api-rs env, replacing the KUBERNETES_TOOL_SERVER_* sidecar block.

Gone vs the sidecar port: tool_server.rs, the sbx1 HMAC token minting and its SANDBOX_SIGNING_KEY requirement, CENTAUR_TOOLS_URL, the sidecar pg-DSN/proxy-env collection, and the hmac/base64/sha2 dependency additions (nothing else in the agent-k8s crate uses them).

Warm-pool sandboxes route through the same build_agent_sandbox path, so they get the tools/overlay init containers and volumes for free.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
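
The TOOL_DIRS rule described for the spec builder can be illustrated with a tiny function. The real logic lives in the Rust crate; this TypeScript sketch only mirrors the rule as stated, and `computeToolDirs`, the trailing-slash trimming, and the `/overlay` example path are assumptions.

```typescript
// Illustrative only: mirrors the TOOL_DIRS rule from the commit message.
const BASE_TOOLS_DIR = "/app/tools";

// With no overlay, TOOL_DIRS is just the base tools directory.
// With an overlay, the overlay's tools directory is appended after a colon.
function computeToolDirs(overlayMountPath?: string): string {
  if (!overlayMountPath) return BASE_TOOLS_DIR;
  // Trimming trailing slashes is an assumption made for tidy output.
  const trimmed = overlayMountPath.replace(/\/+$/, "");
  return `${BASE_TOOLS_DIR}:${trimmed}/tools`;
}

console.log(computeToolDirs()); // /app/tools
console.log(computeToolDirs("/overlay")); // /app/tools:/overlay/tools
```

Computing the value from the overlay mount path, rather than passing an env var through, is what keeps the agent's tool discovery identical to the api-rs pod's.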

…eded The .npmrc public-hoist-pattern (and discordbot's direct tslib dep) worked around Bun failing to resolve tslib from discord.js under pnpm's strict layout. On the current lockfile discord.js@14.26.4 declares tslib properly, so pnpm nests a copy right next to it and Bun resolves it without help. Verified both locally and in the production image (build + import discord.js in the container) with the hoist and the direct dep removed. Removes the only workspace-wide file the discordbot work touched. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…igmxyz#422 port) Sync pass against slackbotv2 @ 17882c4: thread executionId through ForwardSessionInput into the events URL so streams only carry the turn they're rendering (set in-stream after execute returns, and from the stored obligation on recovery — the obligation already tracked it). Also adopt the fixed-point truncation helper. Not ported (deliberate delta): the oversized-render plain-text fallback — the Discord adapter hard-truncates content at the 2000-char limit on every outgoing payload, so Slack's msg_too_long failure mode cannot occur here. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
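
As a rough sketch of execution-scoped event streams: the route, the `execution_id` query parameter name, and the field names below are hypothetical, since the PR text names only `ForwardSessionInput` and `executionId`.

```typescript
// Hypothetical subset of the forward input; field names are assumptions.
interface ForwardSessionInputSketch {
  sessionId: string;
  executionId?: string;
}

// Build the events URL, scoping it to one execution when an id is known.
// The path and query parameter name are illustrative, not the real API.
function buildEventsUrl(baseUrl: string, input: ForwardSessionInputSketch): string {
  const base = baseUrl.replace(/\/+$/, "");
  const path = `${base}/sessions/${encodeURIComponent(input.sessionId)}/events`;
  if (input.executionId === undefined) return path;
  return `${path}?execution_id=${encodeURIComponent(input.executionId)}`;
}

console.log(buildEventsUrl("http://api-rs:8080/", { sessionId: "s1", executionId: "e1" }));
// http://api-rs:8080/sessions/s1/events?execution_id=e1
```

On recovery the same builder would be fed the execution id stored on the render obligation, so a resumed stream carries only the turn it is rendering.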

…aradigmxyz#404 port) Own the pg pool so an error handler can swallow idle-client drops (Postgres restart / startup network races) instead of crashing the process, and block obligation recovery on a backoff-retried first connect — same rationale as upstream, Discord trace names. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Sync with slackbotv2 @ 02db3eb: command-execution tasks omit their details from the stream entirely instead of truncating them. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Sync with slackbotv2 @ f9fcb5e: messages asking for plain text ("plain text only", "no interactive blocks", "no dashboards") drain the stream silently and post one final text message — captured terminal result text first, accumulated markdown as fallback. Discord-flavored: typing keepalive instead of assistant status, and the final text pre-truncates to fit Discord's 2000-char content cap with an honest suffix. Brings in the render collector class this needed (the msg_too_long fallback path it was built for remains unported — the Discord adapter hard-truncates, so that failure mode can't occur). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
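
A sketch of the plain-text path under stated assumptions: the three trigger phrases come from the commit message, while the function names, the case-insensitive substring match, and the exact suffix text are illustrative guesses.

```typescript
const DISCORD_CONTENT_LIMIT = 2000;
const PLAIN_TEXT_PHRASES = ["plain text only", "no interactive blocks", "no dashboards"];
const TRUNCATION_SUFFIX = "\n[truncated]"; // suffix wording is an assumption

// Detect a plain-text request with a case-insensitive substring match.
function wantsPlainText(message: string): boolean {
  const lower = message.toLowerCase();
  return PLAIN_TEXT_PHRASES.some((phrase) => lower.includes(phrase));
}

// Pre-truncate so the text plus suffix fits the content cap.
// Length is measured in UTF-16 code units, which is a conservative bound.
function fitToDiscordLimit(text: string): string {
  if (text.length <= DISCORD_CONTENT_LIMIT) return text;
  const keep = DISCORD_CONTENT_LIMIT - TRUNCATION_SUFFIX.length;
  return text.slice(0, keep) + TRUNCATION_SUFFIX;
}

// Prefer the captured terminal result; fall back to accumulated markdown.
function finalPlainText(terminalResult: string | undefined, accumulatedMarkdown: string): string {
  const hasResult = terminalResult !== undefined && terminalResult.trim().length > 0;
  return fitToDiscordLimit(hasResult ? terminalResult : accumulatedMarkdown);
}

console.log(wantsPlainText("Plain text only, please")); // true
console.log(fitToDiscordLimit("x".repeat(3000)).length); // 2000
```

Truncating before the send, with a visible suffix, lets the reader see that text was cut rather than relying on the adapter's silent hard truncation.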

…gent A thread created from a message keeps that message in the parent channel (the thread shares its ID), so the thread-history walk in collectInitialContext never sees it — Slack has no analog because conversations.replies returns the parent as the first reply. Fetch the starter from the parent channel (mirroring discord.js ThreadChannel#fetchStarterMessage) and prepend it to the initial context. Webhook-style messages (Sentry alerts) carry their payload entirely in embeds with empty content, which the chat adapter drops; flatten embed text into the forwarded message so the agent can read them. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
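
The embed flattening might look something like the sketch below. `EmbedLike` models only a few fields of Discord's embed object (title, description, fields, footer), and `flattenEmbed` / `messageTextForAgent` are hypothetical names; the actual helper in the PR may select or order fields differently.

```typescript
// A small subset of Discord's embed object, enough for the illustration.
interface EmbedLike {
  title?: string;
  description?: string;
  fields?: { name: string; value: string }[];
  footer?: { text: string };
}

// Turn one embed into readable lines of text.
function flattenEmbed(embed: EmbedLike): string {
  const lines: string[] = [];
  if (embed.title) lines.push(embed.title);
  if (embed.description) lines.push(embed.description);
  for (const field of embed.fields ?? []) {
    lines.push(`${field.name}: ${field.value}`);
  }
  if (embed.footer?.text) lines.push(embed.footer.text);
  return lines.join("\n");
}

// Combine message content with flattened embeds, skipping empty parts.
function messageTextForAgent(content: string, embeds: EmbedLike[]): string {
  const parts = [content.trim(), ...embeds.map(flattenEmbed)];
  return parts.filter((part) => part.length > 0).join("\n\n");
}

console.log(
  messageTextForAgent("", [
    { title: "TypeError in checkout", description: "x is undefined", fields: [{ name: "env", value: "prod" }] },
  ]),
);
// TypeError in checkout
// x is undefined
// env: prod
```

With empty content and a populated embed, the agent still receives the alert text instead of a blank message.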

The tools-bootstrap init container mounted the tools emptyDir at /app/tools — the same path it copies FROM. The mount shadows the source image's tools tree, so the script self-copies the empty volume and GNU cp rejects it (exit 1); every sandbox dies with 'reached terminal state before running' and no agent ever starts. Mount the volume at /tools-bootstrap instead (mirroring how overlay-bootstrap stages to a distinct target) and copy the image's /app/tools into it. The agent container keeps mounting the same volume at /app/tools, so TOOL_DIRS and the shim installer are unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…sage Discord's post+edit streaming dropped every task chunk, so runs showed a bare placeholder that was later overwritten by the answer — no chain of thought, and the answer sorted above messages sent mid-run. Runs now post a progress message that's edited in place with a step timeline (reasoning excerpts, commands, tool calls) and finalized as a permanent record, while the answer streams into a separate message created on first visible text. Drops the now-subsumed task-chunk stripping (the renderer truncates its own previews instead). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The edited step timeline felt heavy-handed in practice. Replace it with fully append-only narration: the triggering message gets an instant 👀 reaction (flipped to ✅/❌ on settle), and the agent's reasoning posts as its own italic messages as each thought completes — commands and tools render nothing. Reactions go through raw Discord REST since the adapter can't reach thread-starter messages, which live in the parent channel. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
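
A sketch of the reaction flow as raw REST routes. The route shape follows Discord's documented reaction endpoints (`/channels/{channel.id}/messages/{message.id}/reactions/{emoji}/@me`); the helper names, and the assumption that "flipped" means removing 👀 before adding the outcome emoji, are illustrative.

```typescript
type RunState = "running" | "succeeded" | "failed";

const REACTION_FOR_STATE: Record<RunState, string> = {
  running: "👀",
  succeeded: "✅",
  failed: "❌",
};

// Route for the bot's own reaction on a message. Unicode emoji must be
// URL-encoded. For a thread starter, channelId is the PARENT channel.
function ownReactionRoute(channelId: string, messageId: string, emoji: string): string {
  return `/channels/${channelId}/messages/${messageId}/reactions/${encodeURIComponent(emoji)}/@me`;
}

interface RestCall {
  method: "PUT" | "DELETE";
  route: string;
}

// Calls to flip the reaction on settle (assumed: remove 👀, then add ✅ or ❌).
function settleReactionCalls(
  parentChannelId: string,
  starterMessageId: string,
  outcome: "succeeded" | "failed",
): RestCall[] {
  return [
    { method: "DELETE", route: ownReactionRoute(parentChannelId, starterMessageId, REACTION_FOR_STATE.running) },
    { method: "PUT", route: ownReactionRoute(parentChannelId, starterMessageId, REACTION_FOR_STATE[outcome]) },
  ];
}

console.log(ownReactionRoute("1", "2", "👀")); // /channels/1/messages/2/reactions/%F0%9F%91%80/@me
```

Addressing the starter through the parent channel id is what makes this work where the adapter, which only sees the thread, cannot.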

0xdiid force-pushed the feat/discord-chat-ingress branch 4 times, most recently from fc0e56f to a55a343 Compare June 5, 2026 17:48

0xdiid and others added 8 commits June 5, 2026 13:46

0xdiid force-pushed the feat/discord-chat-ingress branch from a55a343 to 79dd0e9 Compare June 5, 2026 19:48

0xdiid and others added 6 commits June 5, 2026 20:22

fix(discordbot): drop details from command-execution task chunks (port)

7f5311c


0xdiid force-pushed the feat/discord-chat-ingress branch from 79dd0e9 to a23bb65 Compare June 6, 2026 02:25

0xdiid and others added 4 commits June 5, 2026 21:52

0xdiid wants to merge 18 commits into paradigmxyz:api-rs-control-plane from 0xSplits:feat/discord-chat-ingress
