Merged
96 commits
b1c3778
feat(api-rs): add sandbox core
Zygimantass May 31, 2026
0a35900
feat(api-rs): add local sandbox backend
Zygimantass May 31, 2026
700a733
feat(api-rs): add agent sandbox backend
Zygimantass May 31, 2026
61b5832
refactor(api-rs): parse sandbox test config with clap
Zygimantass Jun 1, 2026
3489b53
feat(api-rs): add session store
Zygimantass May 31, 2026
bb2c035
chore(api-rs): raise session db pool limit
Zygimantass Jun 1, 2026
1577e01
chore(api-rs): remove unused execution claim helper
Zygimantass Jun 1, 2026
29ab883
feat(api-rs): restrict session harness types
Zygimantass Jun 1, 2026
534bafb
refactor(api-rs): derive session harness serde
Zygimantass Jun 1, 2026
8475482
refactor(api-rs): derive session enum strings
Zygimantass Jun 1, 2026
df806a9
feat(api-rs): add session HTTP API
Zygimantass May 31, 2026
2613d2a
refactor(api-rs): parse server config with clap
Zygimantass Jun 1, 2026
cfd548b
fix(api-rs): render execution status with display
Zygimantass Jun 1, 2026
8c2bb93
feat(api-rs): add session CLI
Zygimantass May 31, 2026
7b3a98a
feat(api-rs): wire codex sandbox e2e path
Zygimantass May 31, 2026
91afb38
feat(api-rs): accept stdin session events in cli
Zygimantass May 31, 2026
648a297
feat(api-rs): add session cli tui
Zygimantass May 31, 2026
3b599b5
feat(api-rs): add sandbox core (#315)
Zygimantass Jun 1, 2026
03a2c1a
refactor: simplify sandbox interface
Zygimantass Jun 1, 2026
b7e5a0a
refactor(api-rs): use owned sandbox io streams (#334)
Zygimantass Jun 1, 2026
32d53dd
refactor(api-rs): wake session streams with postgres notify (#345)
Zygimantass Jun 1, 2026
d23a467
[codex] stack Chat SDK Slackbot on api-rs control plane (#346)
gakonst Jun 1, 2026
9e45022
fix: make sandbox tool CLIs importable (#351)
Zygimantass Jun 1, 2026
4162459
fix: preserve final Slack answers after plans (#352)
Zygimantass Jun 1, 2026
9ff2d85
fix: use upstream Chat SDK Slack streaming (#360)
goksu Jun 2, 2026
7b80aa1
docs: add local centaur dev skill
Zygimantass Jun 2, 2026
f6042db
[codex] add focused api-rs iron proxy integration (#365)
gakonst Jun 2, 2026
d346466
chore: trim sandbox image size (#371)
Zygimantass Jun 2, 2026
ee7ed96
feat: load iron proxy fragments from tool pyprojects (#372)
Zygimantass Jun 2, 2026
3e07479
fix: improve sandbox tool smoke behavior (#373)
Zygimantass Jun 2, 2026
3596bd8
[codex] pass sandbox tool paths and package tool CLIs (#354)
gakonst Jun 2, 2026
4630a78
feat(api-rs): wire api-rs and slackbotv2 into helm chart (#374)
mslipper Jun 2, 2026
f814ba7
fix(rendering): preserve full task output (#379)
goksu Jun 2, 2026
42c6ae1
fix(slackbotv2): harden api-rs handoff and stream recovery
Zygimantass Jun 3, 2026
f16d00b
fix(session): make Slack retries idempotent for append and execute (#…
Zygimantass Jun 3, 2026
e6b8e79
chore(sandbox): add thin test image (#389)
Zygimantass Jun 3, 2026
1dc7ba1
fix: render terminal slack session completions
gakonst Jun 3, 2026
1258b37
feat(api-rs): integrate iron-control for per-principal proxy credenti…
mslipper Jun 4, 2026
0f1790f
fix(api-rs): renumber duplicate 0003 migration to 0004 (#405)
mslipper Jun 4, 2026
a76adae
feat(api-rs): make warm pools deployable (#400)
Zygimantass Jun 4, 2026
56fd887
fix: remove slackbotv2 synthetic starting task (#406)
Zygimantass Jun 4, 2026
1cbb529
fix(api-rs): preserve session migration order (#407)
Zygimantass Jun 4, 2026
d08af3d
fix(api-rs): select idempotency_key in active/latest execution querie…
mslipper Jun 4, 2026
04b6024
fix(api-rs): return idempotency key on terminal updates (#410)
Zygimantass Jun 4, 2026
4b0efaf
fix(slackbotv2): render api-rs terminal result text (#413)
Zygimantass Jun 5, 2026
e433501
fix(api-rs): include terminal result text in completions (#412)
Zygimantass Jun 5, 2026
a4002ab
fix(slackbotv2): defer Slack stream until visible output (#415)
Zygimantass Jun 5, 2026
9a38a82
fix(slackbotv2): bound Slack task stream payloads (#416)
Zygimantass Jun 5, 2026
aaf6020
fix(slackbotv2): omit task output from Slack streams (#418)
Zygimantass Jun 5, 2026
e5b5493
fix: preserve final answer for textless turn completion (#421)
Zygimantass Jun 5, 2026
42d834a
fix(slackbotv2): scope session streams to execution (#422)
Zygimantass Jun 5, 2026
a636dea
fix(slackbotv2): recover from oversized Slack renders
Zygimantass Jun 5, 2026
6079661
feat(api-rs): manage broker credentials in iron-control, drop sidecar…
mslipper Jun 5, 2026
e0d4394
fix: fail sessions on oversized sandbox output
Zygimantass Jun 5, 2026
0eb0cef
fix(slackbotv2): honor plain text render requests
Zygimantass Jun 5, 2026
9872baf
fix(slackbotv2): show command details without output
Zygimantass Jun 7, 2026
8a5f18b
feat: update iron-proxy to 0.42.0-rc.8 with single multiplexed pg lis…
mslipper Jun 8, 2026
fefe549
feat(api-rs): persist session personas (#429)
goksu Jun 8, 2026
69f967f
feat(api-rs): add telemetry observability (#446)
Zygimantass Jun 8, 2026
fed07a7
fix(slackbotv2): pin Slack stream continuation fix (#453)
Zygimantass Jun 9, 2026
a5e3fb7
feat(api-rs): CloudWatch tool aws_auth via iron-control (#451)
mslipper Jun 9, 2026
274f988
fix(chart): repo-cache temp dir broken by k8s $$ collapse
mslipper Jun 9, 2026
4773f9a
Revert "fix(chart): repo-cache temp dir broken by k8s $$ collapse"
mslipper Jun 9, 2026
14025e0
fix(chart): repo-cache temp dir broken by k8s $$ collapse
mslipper Jun 9, 2026
ba3ad8f
fix(chart): keep repo-cache target local
Zygimantass Jun 9, 2026
c6068df
fix(slackbotv2): continue large task streams (#458)
Zygimantass Jun 9, 2026
5c21295
feat(api-rs): add Absurd workflow runtime (#465)
Zygimantass Jun 10, 2026
de5f623
fix(slackbotv2): suppress too-long fallback reposts (#466)
Zygimantass Jun 10, 2026
19a4c5f
fix(slackbotv2): avoid paging open task cards (#467)
Zygimantass Jun 10, 2026
94f9bf1
fix(slackbotv2): page Slack plans by visible tasks (#469)
Zygimantass Jun 10, 2026
1fecb32
fix(slackbotv2): accept Slack events route (#471)
Zygimantass Jun 10, 2026
eb0969b
fix(api-rs): raise stdout cap and disable service links (#473)
Zygimantass Jun 10, 2026
4c7ee51
fix: keep sandbox bootstrap noise out of the harness stdout stream (#…
Zygimantass Jun 10, 2026
d695348
fix: P0 review fixes for the Rust control plane (#344 review) (#472)
Zygimantass Jun 10, 2026
9621f24
fix: align Slack pagination with chat sdk and raise pg limits (#475)
Zygimantass Jun 10, 2026
60f7eac
feat(api-rs): serve tools + overlay to agent sandboxes (#443)
0xdiid Jun 10, 2026
d8a228d
Add Discord tool
Jun 11, 2026
412c9ad
Use Discord self-token client
Jun 11, 2026
4ae9094
fix(api-rs): Rust CI gate, constant-time webhook auth, error-chain ha…
Zygimantass Jun 11, 2026
5381e03
feat(slackbotv2): conflate render streams for slow Slack consumers (#…
Zygimantass Jun 11, 2026
e2b14f4
chore(sandbox): bump codex 0.130.0 -> 0.139.0 (#490)
Zygimantass Jun 11, 2026
41d36d6
Remove overlay images and refresh repo-cache tools/workflows
Zygimantass Jun 11, 2026
4a2a164
docs: add api-rs migration checklist (#493)
Zygimantass Jun 11, 2026
d6e3cd9
Add repo-cache extra tool sources
Zygimantass Jun 11, 2026
d66595e
fix(sandbox): disable Codex multi-agent tools (#499)
Zygimantass Jun 11, 2026
6fe82a2
feat: adopt orphaned executions and unwedge render recovery (#486)
Zygimantass Jun 11, 2026
64206a2
fix(api-rs): raise session body limits (#501)
akshaan Jun 11, 2026
d08c58c
fix(api-rs): wire harness OTLP export so Laminar traces carry cost (#…
Zygimantass Jun 11, 2026
5001072
fix(api-rs): materialize Codex attachments (#502)
akshaan Jun 11, 2026
67991f2
build(sandbox): bake pnpm 10.9.0 into the agent image (#504)
0xdiid Jun 11, 2026
7d72b41
fix(slackbotv2): survive Slack stream expiry and guarantee final-answ…
Zygimantass Jun 11, 2026
cff83e2
Add Slack feedback modal storage (#500)
decofe Jun 11, 2026
21d8a9f
chore(chart): carry repo-cache sync-sentinel readiness onto api-rs-co…
Zygimantass Jun 11, 2026
7809a27
fix(sandbox): deliver files via slack-upload instead of local-path li…
Zygimantass Jun 11, 2026
ac6b6a1
[codex] add Rust harness app server (#463)
gakonst Jun 11, 2026
2fd7daf
feat(slackbotv2): native feedback buttons on final answers (#508)
Zygimantass Jun 11, 2026
113 changes: 113 additions & 0 deletions .agents/skills/harness-development/SKILL.md
@@ -0,0 +1,113 @@
---
name: harness-development
description: "Add, modify, or debug Centaur harness-server backends in crates/harness-server. Use when adding support for a new harness CLI, changing Codex App Server V2 normalization, investigating Claude Code/Amp/Codex streaming or steering behavior, removing Python/TypeScript harness normalizers, or differentially testing real harness stdout/stderr against the shared App Server protocol."
---

# Harness Development

## Overview

Work in `crates/harness-server`. The Rust binary is the only normalization layer for sandbox harness output; do not add Python or TypeScript normalizers, and do not reintroduce per-client protocol shims outside this crate.

The target wire protocol is OpenAI Codex App Server V2. Prefer the pinned `codex-app-server-protocol` Rust types already in `Cargo.toml`; if a type is missing, add a small typed wrapper in Rust rather than passing unstructured JSON through the system.

## Implementation Workflow

1. Observe the native harness CLI before changing the wrapper. Run the real CLI with streaming stdin/stdout, feed hand-written NDJSON, and capture both stdout and stderr.
2. Identify the real process contract: startup args, stdin message shape, stdout event types, terminal event, session id, resume flag or id, multi-turn behavior, tool-use/tool-result shape, and steering behavior.
3. Add one module under `src/` for the backend, such as `src/<harness>.rs`, and implement `HarnessServer` from `src/traits.rs`.
4. Keep conversions inside that harness implementation. Prefer typed `serde` event enums plus explicit `From`/conversion helpers into `NormalizedEvent`; avoid generic `serde_json::Value` plumbing unless it is only at the parser boundary.
5. Wire the subcommand in `src/main.rs` and the dispatch in `src/lib.rs`/`src/server.rs`. The public CLI shape should stay `harness-server codex|claude-code|amp|<new-harness>`.
6. Add unit tests for stdin generation, steering generation, parser behavior, and representative event conversion. Add or extend ignored real-binary cargo tests when native behavior can only be proven with the CLI.
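
Step 4 can be sketched as below. This is a minimal illustration, not the crate's code: `NativeEvent` and every variant shown are hypothetical, and the `NormalizedEvent` here is a stand-in whose shape will differ from the real type in `crates/harness-server`. The `serde` derives are left out so the sketch compiles with the standard library alone.

```rust
/// Hypothetical native harness events, modeled as a typed enum
/// instead of being passed around as unstructured JSON.
#[derive(Debug, Clone, PartialEq)]
enum NativeEvent {
    AssistantText { text: String },
    ToolUse { id: String, name: String },
    Result { session_id: String },
}

/// Hypothetical stand-in for the crate's `NormalizedEvent`.
#[derive(Debug, Clone, PartialEq)]
enum NormalizedEvent {
    AgentMessageDelta { delta: String },
    ItemStarted { item_id: String, kind: String },
    TurnCompleted { session_id: String },
}

/// The conversion lives next to the harness-specific types, so no other
/// module needs to know the native event shapes.
impl From<NativeEvent> for NormalizedEvent {
    fn from(event: NativeEvent) -> Self {
        match event {
            NativeEvent::AssistantText { text } => {
                NormalizedEvent::AgentMessageDelta { delta: text }
            }
            NativeEvent::ToolUse { id, name } => {
                NormalizedEvent::ItemStarted { item_id: id, kind: name }
            }
            NativeEvent::Result { session_id } => {
                NormalizedEvent::TurnCompleted { session_id }
            }
        }
    }
}
```

Because the `match` is exhaustive, adding a new native variant fails to compile until its conversion is written, which is the main reason to prefer this over `serde_json::Value` plumbing.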

## Protocol Invariants

- The wrapper process stays alive across turns. Do not spawn the underlying harness once per user turn unless the native harness cannot support a live streaming process.
- `turn/start` and `turn/steer` must emit Codex V2 `userMessage` item started and completed events, then include those user-message items in the final `turn/completed` item list.
- Complete a turn only at the harness's real completion boundary. Claude Code completes on its `result` event. Amp's streaming process may not emit `result` until stdin closes, so complete live turns on assistant `end_turn` when that is the observed terminal boundary.
- Do not map steering to interruption. Steering appends a new user message to the active turn; interruption is cancellation and has different semantics.
- Claude Code steering uses another streaming user input message. Amp steering uses a streaming user input message with top-level `steer: true`. Codex uses App Server `turn/steer` natively.
- Resume must preserve the native session id or native resume token and must not silently create a fresh conversation when the caller expects continuity.
- Stdout from `harness-server` must be JSON-RPC/App Server JSON only. Harness stderr can be logged, but raw non-protocol lines must not leak on stdout.
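
The difference between steering and interruption can be sketched as a small state holder. `ActiveTurn`, `TurnStatus`, and their methods are hypothetical names chosen for illustration; the sketch only assumes what the invariants above state: steering appends a user message to the active turn, interruption cancels it, and the completed turn lists every user message it received.

```rust
/// Hypothetical lifecycle of one turn inside the wrapper.
#[derive(Debug, Clone, Copy, PartialEq)]
enum TurnStatus {
    Active,
    Completed,
    Interrupted,
}

/// Hypothetical per-turn bookkeeping.
#[derive(Debug)]
struct ActiveTurn {
    user_message_items: Vec<String>,
    status: TurnStatus,
}

impl ActiveTurn {
    /// `turn/start`: the first user message opens the turn.
    fn start(first_message: &str) -> Self {
        ActiveTurn {
            user_message_items: vec![first_message.to_string()],
            status: TurnStatus::Active,
        }
    }

    /// `turn/steer`: append a user message; the turn stays active.
    /// Returns false when there is no active turn to steer.
    fn steer(&mut self, message: &str) -> bool {
        if self.status != TurnStatus::Active {
            return false;
        }
        self.user_message_items.push(message.to_string());
        true
    }

    /// Interruption is cancellation, never a side effect of steering.
    fn interrupt(&mut self) {
        if self.status == TurnStatus::Active {
            self.status = TurnStatus::Interrupted;
        }
    }

    /// Completion returns every user-message item seen during the turn,
    /// so the final item list can include them.
    fn complete(&mut self) -> Option<Vec<String>> {
        if self.status != TurnStatus::Active {
            return None;
        }
        self.status = TurnStatus::Completed;
        Some(self.user_message_items.clone())
    }
}
```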

## Native Probing

Use direct native probes when behavior is unclear. Save every stdin line and stdout/stderr line to a temp directory so the wrapped behavior can be compared later.

Claude Code streaming:

```bash
claude --print \
  --input-format stream-json \
  --output-format stream-json \
  --verbose \
  --include-partial-messages \
  --dangerously-skip-permissions \
  --permission-mode bypassPermissions \
  --model "${CENTAUR_REAL_CLAUDE_MODEL:-sonnet}" \
  --session-id "$(uuidgen | tr 'A-Z' 'a-z')"
```

Amp streaming:

```bash
amp --no-ide \
  --no-notifications \
  --no-color \
  --dangerously-allow-all \
  --execute \
  --stream-json \
  --stream-json-input \
  --stream-json-thinking \
  --mode "${AMP_MODE:-smart}"
```

For steering probes, start a long-running tool call, then send the native steering line before the tool finishes. Claude Code should receive a second `{"type":"user","message":...}` line. Amp should receive the same shape with top-level `"steer":true`.
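
A unit-testable generator for those steering lines might look like the sketch below. `Harness` and `steering_line` are hypothetical names, and the `message` payload is taken as an already-serialized JSON string because its exact shape should come from the native probe, not from this example. A real implementation would more likely serialize a typed struct with `serde`.

```rust
/// Hypothetical selector for the two harnesses that take steering
/// as a streaming user input line on stdin.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Harness {
    ClaudeCode,
    Amp,
}

/// Builds one NDJSON steering line. `message_json` must already be valid
/// JSON; it is embedded verbatim, and its shape is not assumed here.
fn steering_line(harness: Harness, message_json: &str) -> String {
    match harness {
        // Claude Code: a second plain user message line.
        Harness::ClaudeCode => {
            format!("{{\"type\":\"user\",\"message\":{}}}", message_json)
        }
        // Amp: the same shape plus a top-level `steer: true`.
        Harness::Amp => {
            format!(
                "{{\"type\":\"user\",\"message\":{},\"steer\":true}}",
                message_json
            )
        }
    }
}
```

Keeping the two branches in one function makes the only observed difference between the harnesses, the top-level `steer` flag, visible in a single place.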

## Differential Test Commands

Run Rust tests first:

```bash
cargo test --manifest-path crates/harness-server/Cargo.toml
```

Run real-harness comparisons through ignored cargo tests from the repo root. These tests spawn the actual harness binaries and may make network/auth calls.

```bash
cargo test --manifest-path crates/harness-server/Cargo.toml \
  real_claude_code_long_streaming_is_anchored_to_native_cli \
  -- --ignored --nocapture
```

```bash
cargo test --manifest-path crates/harness-server/Cargo.toml \
  real_amp_long_streaming_is_anchored_to_native_cli \
  -- --ignored --nocapture
```

```bash
cargo test --manifest-path crates/harness-server/Cargo.toml \
  real_codex_long_streaming_uses_native_app_server_chunks \
  -- --ignored --nocapture
```

Run steering and resume coverage across all real harnesses with:

```bash
cargo test --manifest-path crates/harness-server/Cargo.toml \
  real_harnesses_basic_steer_and_resume \
  -- --ignored --nocapture
```

Inspect the `--nocapture` stdout, not only the cargo summary. Look for non-JSON stdout, missing `item/completed`, stale final answers after steering, wrong thread or turn ids, lost session continuity on resume, duplicate assistant text, queued steer messages, and process restarts between turns.
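
The first of those checks, non-JSON stdout, can be roughly automated over a captured stream. The helper below is a hypothetical sketch and deliberately crude: it only tests that each non-empty line looks like a JSON object by its first and last character, so it catches stray log lines but does not validate JSON. A stricter check would parse each line with a real JSON parser.

```rust
/// Returns the 1-based line numbers of captured stdout lines that do not
/// look like a JSON object. Heuristic only: a line passes when, after
/// trimming whitespace, it starts with '{' and ends with '}'.
fn suspicious_stdout_lines(captured: &str) -> Vec<usize> {
    captured
        .lines()
        .enumerate()
        .filter(|(_, line)| {
            let trimmed = line.trim();
            // Blank lines are ignored; anything else must be object-shaped.
            !trimmed.is_empty() && !(trimmed.starts_with('{') && trimmed.ends_with('}'))
        })
        .map(|(index, _)| index + 1)
        .collect()
}
```

An empty result means no line was flagged by this heuristic; it does not prove that the stream is valid protocol output.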

## Done Criteria

Consider a harness change done only when:

- Unit tests pass.
- Real Claude Code, Amp, and Codex pass the ignored real-binary cargo tests for long streaming, steering, and multi-turn/resume unless the change is explicitly scoped to fewer harnesses.
- The logs show the exact stdout JSON-RPC stream, and the native harness stderr/stdout observations explain any harness-specific branch.
- Python and TypeScript contain no custom harness output normalization for the changed path.
- Any native quirk is captured in the harness module or tests, not as tribal knowledge in the final response.
4 changes: 4 additions & 0 deletions .agents/skills/harness-development/agents/openai.yaml
@@ -0,0 +1,4 @@
interface:
  display_name: "Harness Development"
  short_description: "Add and compare harness server backends"
  default_prompt: "Use $harness-development to add or debug a harness-server backend with real differential tests."