Changes from 1 commit
34 changes: 34 additions & 0 deletions crates/iii-worker/src/cli/worker_manager_daemon/skills/add.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
# worker/add — install a worker

Install a worker from the iii registry or an OCI image. The op writes the entry to `iii.config.yaml`, caches the artifact under `~/.iii/managed/{name}/`, and pins the resolved version in `iii.lock`. Calling it twice leaves the same end state.

- id: `worker::add`
- timeout: 600s (registry pull + binary fetch + ready wait)
- idempotent: yes
- request: `AddOptions { source, force?, reset_config?, wait? }`
- response: `AddOutcome { name, version?, status, awaited_ready, config_path }`

`source` variants:
- `{ "kind": "registry", "name": "image-resize", "version": "0.1.2" }` — registry slug, optional pinned semver.
- `{ "kind": "oci", "reference": "ghcr.io/iii-hq/node:latest" }` — full OCI ref.
- `{ "kind": "local", "path": "./..." }` — **CLI only**; over the trigger this returns **W102**.

`status` values: `installed` (new), `already_current` (lockfile match), `repaired` (cache was corrupt), `replaced` (different version was installed before).

## Example

```json
{
"source": { "kind": "registry", "name": "image-resize", "version": "0.1.2" },
"force": false,
"reset_config": false,
"wait": true
}
```
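
A minimal sketch of building this payload from caller code. The helper names (`registry_source`, `oci_source`, `add_request`) are hypothetical and not part of any iii SDK; they only assemble the JSON shape documented above. The flags are always sent explicitly, so the sketch makes no claim about server-side defaults.

```python
def registry_source(name, version=None):
    """Build a `registry` source; `version` is an optional pinned semver."""
    source = {"kind": "registry", "name": name}
    if version is not None:
        source["version"] = version
    return source


def oci_source(reference):
    """Build an `oci` source from a full OCI reference."""
    return {"kind": "oci", "reference": reference}


def add_request(source, force=False, reset_config=False, wait=True):
    """Assemble an AddOptions payload, rejecting the CLI-only `local` kind."""
    if source.get("kind") == "local":
        # Over the trigger the daemon answers W102; failing early saves a round trip.
        raise ValueError("kind 'local' is CLI-only (W102 over the trigger)")
    return {
        "source": source,
        "force": force,
        "reset_config": reset_config,
        "wait": wait,
    }


payload = add_request(registry_source("image-resize", "0.1.2"))
```

`payload` here equals the JSON example above once serialized.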

## Errors

- **W101** missing/malformed `source`.
- **W102** local path via trigger.
- **W110** worker name not in registry.
- **W900** OCI pull, network, or filesystem failure (see `details.message`).
28 changes: 28 additions & 0 deletions crates/iii-worker/src/cli/worker_manager_daemon/skills/clear.md
@@ -0,0 +1,28 @@
# worker/clear — wipe cached worker artifacts

Delete cached artifacts under `~/.iii/managed/{name}/`. The worker's entries in `iii.config.yaml` and `iii.lock` are **not** touched; call [`worker/remove`](iii://worker/remove) for that. Useful after `worker/remove` to reclaim disk, or before `worker/add` with `force: true` to force a clean re-download.

- id: `worker::clear`
- timeout: 30s
- idempotent: yes (second call clears 0 bytes)
- request: `ClearOptions { names: [], all?, yes }`
- response: `ClearOutcome { cleared_bytes }`

Same target rules as [`worker/remove`](iii://worker/remove): either `names` *or* `all: true`, never both, never neither, always `yes: true`.

## Example

```json
{ "all": true, "yes": true }
```

Response:
```json
{ "cleared_bytes": 812727797 }
```

## Errors

- **W103** target missing or ambiguous.
- **W104** consent missing.
- **W900** filesystem failure (rare).
37 changes: 37 additions & 0 deletions crates/iii-worker/src/cli/worker_manager_daemon/skills/list.md
@@ -0,0 +1,37 @@
# worker/list — list workers and run state

Read the union of every worker in `iii.config.yaml`, every artifact on disk under `~/.iii/managed/`, and every running process.

- id: `worker::list`
- timeout: 10s
- idempotent: yes (pure read)
- request: `ListOptions { running_only? }`
- response: `ListOutcome { workers: [WorkerEntry] }`

`WorkerEntry`:
- `name` — config name.
- `version` — string for registry-tracked workers; **omitted from JSON** for engine builtins that aren't lock-tracked.
- `running` — bool.
- `pid` — u32 when discoverable via ps; `null` for engine builtins.

## Example

Request:
```json
{ "running_only": false }
```

Response:
```json
{
"workers": [
{ "name": "iii-stream", "running": true, "pid": null },
{ "name": "image-resize", "running": true, "pid": 37037, "version": "0.1.2" },
{ "name": "skills", "running": true, "pid": 37036, "version": "0.2.4" }
]
}
```
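
Because `version` can be absent and `pid` can be `null`, consumers should not index those keys directly. The sketch below shows one tolerant way to render an entry; `describe` and the `"untracked"` label are illustrative choices, not daemon output.

```python
def describe(entry):
    """Render one WorkerEntry as a line, tolerating the optional fields."""
    version = entry.get("version")  # key is absent for untracked engine builtins
    pid = entry.get("pid")          # JSON null arrives as None
    version_text = version if version is not None else "untracked"
    pid_text = str(pid) if pid is not None else "-"
    state = "running" if entry["running"] else "stopped"
    return f"{entry['name']} {version_text} pid={pid_text} {state}"


response = {
    "workers": [
        {"name": "iii-stream", "running": True, "pid": None},
        {"name": "image-resize", "running": True, "pid": 37037, "version": "0.1.2"},
    ]
}
lines = [describe(entry) for entry in response["workers"]]
```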

## Errors

- **W900** filesystem read failure (rare — config/lock both unreadable).
25 changes: 25 additions & 0 deletions crates/iii-worker/src/cli/worker_manager_daemon/skills/remove.md
@@ -0,0 +1,25 @@
# worker/remove — uninstall workers

Remove a worker's entry from `iii.config.yaml`. The engine's file watcher tears down any running sandbox. Cached artifacts under `~/.iii/managed/{name}/` are **not** deleted — call [`worker/clear`](iii://worker/clear) for that.

- id: `worker::remove`
- timeout: 30s
- idempotent: yes
- request: `RemoveOptions { names: [], all?, yes }`
- response: `RemoveOutcome { removed: [string] }`

Either explicit `names` *or* `all: true`, never both, never neither. `yes: true` is always required.

## Example

```json
{ "names": ["image-resize"], "yes": true }
```

Also valid: `{ "names": ["image-resize", "skills"], "yes": true }`, `{ "all": true, "yes": true }`.
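
The target and consent rules can be checked client-side before sending. This is a sketch under two assumptions that the text does not confirm: that `all: false` counts as "unset", and that the target check (W103) runs before the consent check (W104). `check_target` is a hypothetical helper.

```python
def check_target(payload):
    """Return the error code the rules above imply, or None if the payload passes."""
    names = payload.get("names") or []
    all_flag = payload.get("all") is True
    # Exactly one of "non-empty names" and "all: true" must hold.
    if bool(names) == all_flag:
        return "W103"
    # Consent must be the JSON boolean true, nothing truthy-but-different.
    if payload.get("yes") is not True:
        return "W104"
    return None
```

The same check applies to `worker/clear`, which uses the same target rules.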

## Errors

- **W103** `names` empty and `all` unset, or both set.
- **W104** `yes` is not `true`.
- **W900** filesystem failure writing `iii.config.yaml`.
29 changes: 29 additions & 0 deletions crates/iii-worker/src/cli/worker_manager_daemon/skills/router.md
@@ -0,0 +1,29 @@
# worker — install and manage workers

Owned by the `iii-worker-ops` daemon (auto-spawned as an engine sidecar). Every op below is also callable as `iii worker <cmd>` on the CLI; the trigger surface is the SDK path.

## Operations

- [`worker/add`](iii://worker/add) — install from registry or OCI reference
- [`worker/remove`](iii://worker/remove) — uninstall (consent required)
- [`worker/update`](iii://worker/update) — re-resolve registry versions
- [`worker/start`](iii://worker/start) — start a configured worker
- [`worker/stop`](iii://worker/stop) — stop a running worker (consent required)
- [`worker/list`](iii://worker/list) — installed + run state + versions
- [`worker/clear`](iii://worker/clear) — wipe cached artifacts (consent required)
- [`worker/schema`](iii://worker/schema) — discover request/response shapes

Live data: [`iii://fn/worker/schema`](iii://fn/worker/schema) returns the full JSON Schema for all 8 ops, plus per-op `default_timeout_ms` and `idempotent` hints.

## Error envelope

All ops return errors as `{ "type": "WorkerOpError", "code": "Wxxx", "message": "...", "details": {...} }`.

Codes:
- **W100** InvalidName — name doesn't match `[a-z0-9_-]{1,64}`.
- **W101** InvalidSource — required field missing or malformed.
- **W102** LocalPathNotAllowedViaTrigger — `kind: "local"` is CLI-only.
- **W103** MissingTarget — empty `names` + `all` unset, or both set.
- **W104** ConsentRequired — destructive op needs `yes: true`.
- **W110** NotFound — worker not in registry / not installed.
- **W900** Internal — unexpected failure.
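
One way a caller might surface this envelope is to convert it into an exception. The sketch assumes the envelope arrives as an already-decoded JSON object; the `WorkerOpError` exception class and `raise_for_error` are hypothetical names, and only the envelope fields and code names come from the list above.

```python
CODE_NAMES = {
    "W100": "InvalidName",
    "W101": "InvalidSource",
    "W102": "LocalPathNotAllowedViaTrigger",
    "W103": "MissingTarget",
    "W104": "ConsentRequired",
    "W110": "NotFound",
    "W900": "Internal",
}


class WorkerOpError(Exception):
    """Client-side mirror of the documented error envelope."""

    def __init__(self, code, message, details=None):
        super().__init__(f"{code} {CODE_NAMES.get(code, 'Unknown')}: {message}")
        self.code = code
        self.message = message
        self.details = details or {}


def raise_for_error(result):
    """Raise if `result` is an error envelope; otherwise return it unchanged."""
    if isinstance(result, dict) and result.get("type") == "WorkerOpError":
        raise WorkerOpError(
            result["code"], result.get("message", ""), result.get("details")
        )
    return result
```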
30 changes: 30 additions & 0 deletions crates/iii-worker/src/cli/worker_manager_daemon/skills/schema.md
@@ -0,0 +1,30 @@
# worker/schema — discover trigger schemas

Introspect the JSON Schemas for every `worker::*` trigger. Each entry carries field descriptions, defaults, types, plus per-op `default_timeout_ms` and `idempotent` hints. Construct payloads from this without source-diving.

- id: `worker::schema`
- timeout: 10s
- idempotent: yes (pure read)
- request: `SchemaRequest { function_id? }`
- response: `SchemaResponse { schemas: [SchemaEntry] }`

## Example

Omit `function_id` to list all 8 ops:
```json
{}
```

Each `SchemaEntry`:
```json
{
"function_id": "worker::add",
"description": "Install a worker from registry name or OCI ref",
"request": { "...": "JSON Schema for AddOptions" },
"response": { "...": "JSON Schema for AddOutcome" },
"default_timeout_ms": 600000,
"idempotent": true
}
```

Also reachable inline as a section URI: [`iii://fn/worker/schema`](iii://fn/worker/schema).
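
A short sketch of consuming a `SchemaResponse`: index the entries by `function_id` and convert the timeout hint to seconds. Both helper names are hypothetical, and the sample entry is abbreviated from the one shown above.

```python
def index_schemas(response):
    """Map function_id -> SchemaEntry for quick lookup."""
    return {entry["function_id"]: entry for entry in response["schemas"]}


def timeout_seconds(entry):
    """Convert the per-op `default_timeout_ms` hint to seconds."""
    return entry["default_timeout_ms"] / 1000


response = {
    "schemas": [
        {
            "function_id": "worker::add",
            "default_timeout_ms": 600000,
            "idempotent": True,
        }
    ]
}
by_id = index_schemas(response)
add_timeout = timeout_seconds(by_id["worker::add"])
```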
33 changes: 33 additions & 0 deletions crates/iii-worker/src/cli/worker_manager_daemon/skills/start.md
@@ -0,0 +1,33 @@
# worker/start — start a configured worker

Spawn a worker that's already in `iii.config.yaml`. The engine connects to the process over its WebSocket port and waits for the ready signal.

- id: `worker::start`
- timeout: 60s
- idempotent: no (stateful — starting an already-running worker is a no-op or error depending on health)
- request: `StartOptions { name, port?, config?, wait? }`
- response: `StartOutcome { name, pid?, port? }`

- `name` — installed worker name.
- `port` — override the engine WS port (default = engine's `iii-worker-manager` port).
- `config` — YAML config file forwarded as `--config <path>`. Binary workers only; OCI ignores it.
- `wait` — block until ready. Default `true`.

`pid` and `port` may be `null` for engine builtins that don't surface a process (e.g. `iii-stream`, `iii-http`).

## Example

```json
{ "name": "image-resize", "wait": true }
```

Response:
```json
{ "name": "image-resize", "pid": 12345, "port": 49134 }
```

## Errors

- **W100** invalid name.
- **W110** worker not installed.
- **W900** spawn failure, ready-wait timeout, or port-bind failure.
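
To avoid a **W100** round trip, a caller can pre-validate names against the pattern `[a-z0-9_-]{1,64}` listed with the error codes. The sketch assumes the daemon matches the whole string against that pattern; `is_valid_name` is a hypothetical helper.

```python
import re

# Pattern taken from the W100 description; anchored via fullmatch below.
NAME_RE = re.compile(r"[a-z0-9_-]{1,64}")


def is_valid_name(name):
    """True when `name` consists solely of 1 to 64 allowed characters."""
    return isinstance(name, str) and NAME_RE.fullmatch(name) is not None
```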
31 changes: 31 additions & 0 deletions crates/iii-worker/src/cli/worker_manager_daemon/skills/stop.md
@@ -0,0 +1,31 @@
# worker/stop — stop a running worker

Send a graceful shutdown signal. Destructive — requires explicit `yes: true` consent on the trigger surface. The CLI prompts interactively or accepts `-y`.

- id: `worker::stop`
- timeout: 30s
- idempotent: no (stateful)
- request: `StopOptions { name, yes }`
- response: `StopOutcome { name, stopped }`

`yes` must be exactly `true` — not `false`, not omitted, not the string `"true"`, not the number `1`. A slip in caller code should not silently kill a worker.

`stopped: false` means the stop didn't take effect within the daemon's grace window. Retry, or verify with [`worker/list`](iii://worker/list).

## Example

```json
{ "name": "image-resize", "yes": true }
```

Response:
```json
{ "name": "image-resize", "stopped": true }
```
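
The strictness of `yes` matters in dynamically typed callers. In Python, `1 == True` evaluates to true, so an equality check would let the number `1` through; an identity check does not. A minimal sketch, with `stop_request` as a hypothetical helper:

```python
def stop_request(name, yes):
    """Build a StopOptions payload only when consent is the boolean True."""
    # `is True` rejects 1, "true", None and False; `== True` would accept 1.
    if yes is not True:
        raise ValueError("worker/stop requires yes=True (W104 otherwise)")
    return {"name": name, "yes": True}
```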

## Errors

- **W100** invalid worker name (shell metacharacters, empty, > 64 chars).
- **W104** `yes` not `true`.
- **W110** worker name unknown.
- **W900** signal failure or grace-window timeout.
33 changes: 33 additions & 0 deletions crates/iii-worker/src/cli/worker_manager_daemon/skills/update.md
@@ -0,0 +1,33 @@
# worker/update — re-resolve registry versions

Re-resolve each named worker against the registry, download newer artifacts if available, and rewrite `iii.lock`. Configs in `iii.config.yaml` are preserved.

- id: `worker::update`
- timeout: 600s
- idempotent: yes
- request: `UpdateOptions { names: [] }`
- response: `UpdateOutcome { updated: [{ name, from_version, to_version }] }`

`names: []` updates every installed registry-backed worker. `updated` contains one entry per worker that actually changed version; workers already at latest are omitted.

## Example

Request:
```json
{ "names": [] }
```

After a no-op:
```json
{ "updated": [] }
```

After a real change:
```json
{ "updated": [{ "name": "image-resize", "from_version": "0.1.2", "to_version": "0.1.3" }] }
```
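
Since unchanged workers are omitted from `updated`, an empty list is the no-op signal. A small sketch of turning the outcome into log lines; `summarize_update` is a hypothetical helper.

```python
def summarize_update(outcome):
    """Return one line per worker whose version changed; empty list on a no-op."""
    return [
        f"{item['name']}: {item['from_version']} -> {item['to_version']}"
        for item in outcome["updated"]
    ]
```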

## Errors

- **W110** name not installed.
- **W900** registry / network / filesystem failure.
45 changes: 45 additions & 0 deletions crates/iii-worker/src/sandbox_daemon/skills/create.md
@@ -0,0 +1,45 @@
# sandbox/create — spin up a sandbox

Pull an image from the catalog and start a sandbox VM. Returns a UUID `sandbox_id` you reuse for [`sandbox/exec`](iii://sandbox/exec) and [`sandbox/stop`](iii://sandbox/stop).

- id: `sandbox::create`
- timeout: 600s (image pull dominates cold starts)
- idempotent: no (each call spawns a new sandbox)
- request: `CreateRequest { image, cpus?, memory_mb?, name?, network?, idle_timeout_secs?, env: [] }`
- response: `CreateResponse { sandbox_id, image }`

- `image` — preset (`python`, `node`) or full OCI ref in the catalog. Empty catalog denies every call (fail-closed).
- `env` — `Vec<String>` of `"K=V"` entries, NOT a map.
- `name` — optional human label, surfaced in [`sandbox/list`](iii://sandbox/list).

There is no `cwd` field; the workdir is fixed by the rootfs.

## Example

```json
{
"image": "ghcr.io/iii-hq/node:latest",
"env": ["NODE_ENV=production"],
"cpus": 2,
"memory_mb": 1024
}
```

Response:
```json
{
"sandbox_id": "550e8400-e29b-41d4-a716-446655440000",
"image": "ghcr.io/iii-hq/node:latest"
}
```

`sandbox_id` is a UUID, not the `sbx_*` opaque-id shape some other engines use.
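
Callers that hold environment variables in a map need to flatten it into the `"K=V"` list that `env` expects. A minimal sketch: `env_list` and `create_request` are hypothetical helpers, and rejecting keys that are empty or contain `=` is this sketch's own precaution, not a documented daemon rule.

```python
def env_list(mapping):
    """Flatten a dict into the list of "K=V" strings that `env` expects."""
    entries = []
    for key, value in mapping.items():
        if not key or "=" in key:
            raise ValueError(f"invalid env key: {key!r}")
        entries.append(f"{key}={value}")
    return entries


def create_request(image, env=None, cpus=None, memory_mb=None):
    """Assemble a CreateRequest; `env` is always sent, even when empty."""
    payload = {"image": image, "env": env_list(env or {})}
    if cpus is not None:
        payload["cpus"] = cpus
    if memory_mb is not None:
        payload["memory_mb"] = memory_mb
    return payload


request = create_request(
    "ghcr.io/iii-hq/node:latest",
    env={"NODE_ENV": "production"},
    cpus=2,
    memory_mb=1024,
)
```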

## Errors

- **S001** invalid request (malformed `image`, missing required field).
- **S100** image not in catalog. Add it to the catalog or use a preset.
- **S101** rootfs missing on disk — run `iii worker add <image-ref>` first.
- **S102** auto-install failed (transient — retry).
- **S300** VM boot failed.
- **S400** resource limit hit (too many concurrent sandboxes).
49 changes: 49 additions & 0 deletions crates/iii-worker/src/sandbox_daemon/skills/exec.md
@@ -0,0 +1,49 @@
# sandbox/exec — run a command in a sandbox

Execute a command in a running sandbox and stream stdout/stderr back. Per-sandbox serialization: only one exec runs at a time — a concurrent call against the same `sandbox_id` returns **S003**. The call returns when the child exits, the timeout fires, or the VM becomes unreachable.

- id: `sandbox::exec`
- timeout: caller-set (pass `timeout_ms`)
- idempotent: no
- request: `ExecRequest { sandbox_id, cmd, args?, stdin?, env?, timeout_ms?, workdir? }`
- response: `ExecResponse { stdout, stderr, exit_code?, timed_out, duration_ms, success }`

- `cmd` is the binary as a single string, NOT an argv array.
- `args` is the argv tail (`Vec<String>`).
- `stdin` is base64-encoded bytes.
- `env` is `Vec<String>` of `"K=V"` entries.

## Example

```json
{
"sandbox_id": "550e8400-e29b-41d4-a716-446655440000",
"cmd": "bash",
"args": ["-lc", "echo hello && date"],
"env": ["FOO=bar"],
"workdir": "/workspace",
"timeout_ms": 30000
}
```

Response:
```json
{
"stdout": "hello\nMon Oct 13 19:42:11 UTC 2025\n",
"stderr": "",
"exit_code": 0,
"timed_out": false,
"duration_ms": 42,
"success": true
}
```

`success` is `true` iff `exit_code == 0` and `timed_out == false`. Following the POSIX shell convention, `exit_code: 127` means "command not found" and `126` means "not executable": spawn failures surface in `exit_code`, NOT as an error envelope.
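
The field rules and exit-code conventions above can be sketched from the caller's side as follows. `exec_request` and `explain` are hypothetical helpers; only the field names, the base64 encoding of `stdin`, and the meanings of 126 and 127 come from this page.

```python
import base64


def exec_request(sandbox_id, cmd, args=None, stdin=None, timeout_ms=None):
    """Assemble an ExecRequest; `stdin` is raw bytes and is base64-encoded here."""
    payload = {"sandbox_id": sandbox_id, "cmd": cmd}  # cmd is one string, not argv
    if args:
        payload["args"] = list(args)  # argv tail
    if stdin is not None:
        payload["stdin"] = base64.b64encode(stdin).decode("ascii")
    if timeout_ms is not None:
        payload["timeout_ms"] = timeout_ms
    return payload


def explain(response):
    """Give a short human-readable reading of an ExecResponse."""
    if response["timed_out"]:
        return "timed out"
    code = response.get("exit_code")
    if code == 0:
        return "ok"
    if code == 127:
        return "command not found"
    if code == 126:
        return "not executable"
    return f"exited with {code}"
```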

## Errors

- **S001** invalid request (bad `sandbox_id` UUID, missing `cmd`).
- **S002** sandbox not found.
- **S003** concurrent exec — await the previous one first.
- **S200** exec timed out (stdout/stderr captured pre-timeout are still returned).
- **S300** VM unreachable (boot failed earlier, or shell socket dropped).