scafld Agent Contract

Contract

  • spec is the living contract.
  • session is the durable evidence ledger.
  • handoff is transport, not source of truth.
  • review is the adversarial completion gate.

You execute autonomously inside the contract. You do not close the task unchallenged.

Commands

scafld init
scafld plan <task-id> --title "Title" --size small --risk low
scafld harden <task-id>
scafld harden <task-id> --mark-passed
scafld validate <task-id>
scafld approve <task-id>
scafld build <task-id>
scafld review <task-id>
scafld complete <task-id>
scafld status <task-id>
scafld list
scafld report
scafld handoff <task-id>
scafld update

For real review: scafld review <task-id> --provider {codex|claude|command}. --provider local is smoke-test only and cannot satisfy complete. Only an operator may use scafld review <task-id> --human-reviewed --reason ....

Source Checkout

Inside the scafld repo, use ./bin/scafld or go run ./cmd/scafld. Do not use a copied compiled binary; stale binaries can report old lifecycle state.

Lifecycle

plan -> harden -> approve -> build -> review -> complete

Hardening attacks the draft. Review attacks the result. Build opens one phase at a time. After implementing the opened phase, run scafld build <task-id> again to record evidence and advance.

Do Not

  • Edit outside declared scope, objectives, or invariants.
  • Reconstruct lifecycle state by scraping Markdown. Use status --json.
  • Mutate .scafld/core/ by hand. Use scafld update.
  • Run --provider local for real review.
  • Cite files, commands, or review findings you have not verified.
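
The rule about reading lifecycle state through status --json rather than scraping Markdown can be sketched as a small wrapper. This is a minimal sketch, not scafld's own tooling: the helper names `read_status` and `_run_cli` are hypothetical, the placement of `--json` after the task id is an assumption, and the code deliberately assumes nothing about the fields in the JSON payload.

```python
import json
import subprocess
from typing import Any, Callable, Sequence


def _run_cli(argv: Sequence[str]) -> str:
    """Run a command and return its stdout; raises if the exit code is non-zero."""
    result = subprocess.run(list(argv), check=True, capture_output=True, text=True)
    return result.stdout


def read_status(task_id: str, run: Callable[[Sequence[str]], str] = _run_cli) -> Any:
    """Return the parsed output of `scafld status <task-id> --json`.

    The flag position is an assumption; the payload shape is left opaque.
    """
    return json.loads(run(["scafld", "status", task_id, "--json"]))
```

Passing `run` explicitly keeps the wrapper testable without a scafld binary on the path.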

Prompts

.scafld/prompts/* overrides .scafld/core/prompts/* overrides built-ins.
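
The override order above reads as a first-match lookup. A minimal sketch, assuming prompts are plain files resolved per file name; `resolve_prompt` is a hypothetical helper, and returning `None` to mean "fall back to the built-in" is an illustrative choice, not scafld's actual implementation.

```python
from pathlib import Path
from typing import Optional

# Search order taken from the rule above: workspace overrides first,
# then the managed core copy. Built-ins are the final fallback.
PROMPT_DIRS = (".scafld/prompts", ".scafld/core/prompts")


def resolve_prompt(root: Path, name: str) -> Optional[Path]:
    """Return the first existing prompt file, or None to signal 'use the built-in'."""
    for directory in PROMPT_DIRS:
        candidate = root / directory / name
        if candidate.is_file():
            return candidate
    return None
```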

runx OSS Agent Guide

Canonical reference for AI coding agents working in the runx OSS workspace. This repo uses scafld for non-trivial work, but the architecture rules are the runx rules in CONVENTIONS.md, docs/rust-kernel-architecture.md, and docs/trusted-kernel-package-truth.md.

Key files:

  • .scafld/config.yaml - Validation rules, rubric weights, safety controls, profiles
  • .scafld/prompts/plan.md - Planning mode prompt
  • .scafld/prompts/exec.md - Execution mode prompt
  • .scafld/core/schemas/spec.json - Spec validation schema
  • CONVENTIONS.md - Coding standards and patterns

How scafld Works

Spec-driven development: every non-trivial task becomes a machine-readable markdown specification before any code changes happen.

  1. Plan - Analyze task, explore codebase, generate spec in .scafld/specs/drafts/
  2. Review - Human reviews and approves the spec
  3. Build - Agent executes approved spec with validation
  4. Complete - Completed specs are marked through the scafld lifecycle

The spec is the contract. Operate autonomously within its bounds; pause for approval on deviations.

For detailed planning instructions, read .scafld/prompts/plan.md. For execution, read .scafld/prompts/exec.md.


Spec Status Lifecycle

draft → approved → review → completed
  ↓         ↓          ↓
(edit)   failed    cancelled

Valid transitions:

  • draft → approved → review → completed
  • active work can move to failed or cancelled
  • blocked work must be recorded in the spec state and handoff
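
The transitions above can be written down as a lookup table. This is a reading aid under stated assumptions: treating approved and review as the "active" states that may move to failed or cancelled is an interpretation of the list, and `TRANSITIONS` and `can_transition` are hypothetical names.

```python
# Forward path from the lifecycle above. Which states count as "active"
# (and so may fail or be cancelled) is an assumption, not a scafld rule.
TRANSITIONS = {
    "draft": {"approved"},
    "approved": {"review", "failed", "cancelled"},
    "review": {"completed", "failed", "cancelled"},
    "completed": set(),
    "failed": set(),
    "cancelled": set(),
}


def can_transition(current: str, target: str) -> bool:
    """Return True when `target` is a listed next status for `current`."""
    return target in TRANSITIONS.get(current, set())
```

Status changes themselves still go through the scafld CLI; the table only describes what the CLI is expected to allow.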

Architectural Invariants

These rules must not be violated. See config.yaml for the canonical invariant list.

Rust Trusted Runtime

Rust owns trusted local execution, receipt sealing, runtime policy, harness replay, MCP, payment gates, and sandbox planning. TypeScript packages may wrap or present those paths, but must not reintroduce local execution fallback logic.

Pure Kernel Boundaries

Pure crates and packages stay pure. runx-core, runx-contracts, runx-parser, and runx-receipts must not import filesystem, network, subprocess, CLI, adapter, or runtime concerns.

Stable Public Contracts

Public contract changes require a clean cutover through Rust-owned schemas and fixtures. Do not add compatibility aliases, .v2 ids, or dual-read runtime shims for governed wire shapes.

Generic Stateful Effects

Official skills that drive stateful hosted apps emit generic effect transition packets. Put product identity in effect_family and the runner/action in operation; do not add product-specific AuthorityResourceFamily variants or runx.<product>.* packet namespaces. Stateful app memory belongs in the hosted stateful-effect substrate and its declared reducers/views, not in OSS core enums or bespoke runtime branches.

No Legacy Fallbacks

No dual-reads, dual-writes, or runtime fallbacks. When changing schemas or identifiers, adopt the new scheme immediately. Use one-off migration scripts, not runtime code.

Loop Orchestration

Long-running agent workflows are loops over governed turns, not resident kernel loops. The loop host lives in an app, hosted service, local script, or external orchestrator. It owns scheduling, durable loop state, wakeups, projections, and stop policy. A runx turn is one skill or graph run with explicit inputs, authority, allowed_tools, optional context_skills, bounded model/tool rounds, approval gates, and one sealed receipt.

Handoffs are receipt-backed artifacts or tool-shaped results. Prior receipts and skill context are untrusted data for the next turn, not new authority. Do not add loop-specific authority families, packet namespaces, product branches, or schedulers to runx-core; build residency outside the kernel over ordinary runx submissions.
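
The split between loop host and turn can be sketched as follows. Everything here is hypothetical scaffolding for illustration: `Receipt`, `LoopState`, `run_loop`, and the `run_turn` callable stand in for real runx submissions and sealed receipts, whose actual shapes are not described in this guide.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Receipt:
    """Hypothetical stand-in for the one sealed receipt a turn produces."""
    turn: int
    output: str


@dataclass
class LoopState:
    """Durable loop state owned by the host, not by the kernel."""
    receipts: List[Receipt] = field(default_factory=list)


def run_loop(
    run_turn: Callable[[int, List[Receipt]], Receipt],
    should_stop: Callable[[LoopState], bool],
    max_turns: int,
) -> LoopState:
    """Drive bounded turns from outside the kernel."""
    state = LoopState()
    for turn in range(max_turns):
        if should_stop(state):  # stop policy lives in the host
            break
        # Prior receipts are handed over as untrusted data only;
        # each turn must be submitted with its own explicit authority.
        receipt = run_turn(turn, list(state.receipts))
        state.receipts.append(receipt)
    return state
```

The host decides when to stop and what to persist; each call to `run_turn` corresponds to one ordinary submission.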

No Hardcoded Secrets

Load configuration from the environment or a secrets manager; never hardcode it. No secrets in code, logs, or diffs.

Test-Logic Separation

No test fixtures, mocks, or conditional test-only logic in production code. Test utilities stay in dedicated test helper modules.


Spec Management

Always use the scafld CLI for spec lifecycle management. Never manually move, copy, or rename spec files between directories. Never manually change the status field. The CLI enforces validation, state transitions, and the review gate — bypassing it breaks the audit trail.


Operating Modes

Planning Mode

  • When: Starting a new task, exploring requirements
  • Actions: Search, read, analyze (NO code changes outside .scafld/specs/)
  • Output: Markdown spec in .scafld/specs/drafts/ with status draft
  • Prompt: Read .scafld/prompts/plan.md before entering this mode

Execution Mode

  • When: Spec has status approved
  • Actions: Apply changes, run acceptance criteria, record scafld build evidence
  • Output: Code changes, validation results, updated spec
  • Prompt: Read .scafld/prompts/exec.md before entering this mode
  • Autonomy: Execute all phases without pausing unless blocked, deviating from spec, or hitting a destructive action not covered by spec

For trivial changes (typos, single-line fixes), skip the spec workflow and work directly.

Review Mode

  • When: Build has passed and status is review
  • Actions: Run scafld review, then scafld complete only after the native review gate passes
  • Output: Review verdict recorded in the spec and available through scafld status / scafld handoff
  • Prompt: Read .scafld/prompts/review.md before entering this mode
  • Mandate: Find problems, not confirm success. A review that finds zero issues still needs grounded evidence from the changed files, validation commands, and spec scope.

Validation

Validation profiles (light, standard, strict) and their check pipelines are defined in config.yaml. Agents select a profile based on task.acceptance.validation_profile or derive from task.risk_level (low→light, medium→standard, high→strict).
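
The profile selection described above can be sketched as a small function. This assumes the explicit task.acceptance.validation_profile takes precedence over the value derived from task.risk_level; `select_profile` is a hypothetical helper, and the authoritative profile definitions remain in config.yaml.

```python
from typing import Optional

# Mapping stated in the guide: low→light, medium→standard, high→strict.
RISK_TO_PROFILE = {"low": "light", "medium": "standard", "high": "strict"}


def select_profile(validation_profile: Optional[str], risk_level: Optional[str]) -> str:
    """Prefer the explicit profile (assumed precedence), else derive from risk."""
    if validation_profile:
        return validation_profile
    if risk_level in RISK_TO_PROFILE:
        return RISK_TO_PROFILE[risk_level]
    raise ValueError(f"cannot derive a validation profile from risk level {risk_level!r}")
```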

Per-phase: Run configured checks after each phase completes.

Pre-commit: Run full validation pipeline before marking task complete.

Self-evaluation: Score the work against the rubric defined in config.yaml. The threshold is 7/10; perform a second pass if the score falls below it.
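
The threshold check can be sketched as a weighted mean over rubric criteria. Combining the rubric weights as a weighted mean is an assumption, and the criterion names in the usage note are made up for illustration; the real rubric and weights live in config.yaml.

```python
from typing import Mapping

THRESHOLD = 7.0  # from the guide: 7/10


def rubric_score(scores: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted mean of per-criterion scores on a 0-10 scale (assumed formula)."""
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("rubric weights must sum to a positive number")
    return sum(scores[name] * weights[name] for name in scores) / total_weight


def needs_second_pass(score: float) -> bool:
    """A second pass is required only when the score is below the threshold."""
    return score < THRESHOLD
```

For example, with hypothetical criteria `correctness` (weight 2, score 8) and `tests` (weight 1, score 5), the weighted mean is 7.0, which meets the threshold and does not trigger a second pass.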


Safety Controls

Defined in config.yaml under safety. Key rules:

Require approval for: Schema migrations, public API changes, data deletion, production deployments.

Automatically prevent: Hardcoded secrets, unbounded queries, SQL injection, XSS vulnerabilities.


Coding Conventions

See CONVENTIONS.md for full coding standards. Key points:

  • Match existing code style; keep diffs focused
  • Prefer existing helpers; keep code DRY
  • Explicit named imports, no confusing aliases
  • Clear module ownership; split mixed responsibility files when boundaries are already visible in the code
  • Idempotent one-off migrations executed out of band, never hidden runtime compatibility paths

Git Commits

Only commit when explicitly asked by the user.

Format: type(scope): title (conventional commits)

Types: feat, fix, refactor, docs, test, chore, perf, style

Rules:

  • One logical change per commit
  • Title under 72 characters
  • Include what changed and why in the body
  • No unrelated edits bundled together
  • Pre-commit: code builds, tests pass, no secrets in diff, no debug code
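
The header format and type list above can be checked mechanically. A minimal sketch with two assumptions: the scope is treated as required because the format string shows it, and the 72-character limit is applied to the whole first line. `check_commit_header` is a hypothetical helper, not part of scafld.

```python
import re
from typing import List

COMMIT_TYPES = ("feat", "fix", "refactor", "docs", "test", "chore", "perf", "style")

# type(scope): title, with the scope required (an assumption from the format string).
HEADER_RE = re.compile(r"^(?P<type>[a-z]+)\((?P<scope>[^()\s]+)\): (?P<title>\S.*)$")


def check_commit_header(header: str) -> List[str]:
    """Return a list of problems; an empty list means the header passes."""
    match = HEADER_RE.match(header)
    if match is None:
        return ["header does not match 'type(scope): title'"]
    problems = []
    if match.group("type") not in COMMIT_TYPES:
        problems.append(f"unknown type {match.group('type')!r}")
    if len(header) >= 72:
        problems.append("header must be under 72 characters")
    return problems
```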

Communication

Progress updates: Report phase completion, acceptance criteria pass/fail counts, next action. Keep it concise - no verbose preambles.

When blocked: State what is blocked, give a brief error summary, one recommendation, and the resolution options.

Final summary: Phases completed, acceptance results, self-evaluation score, deviations, files changed.


Quick Reference

Key Paths

| Path | Purpose |
| --- | --- |
| .scafld/config.yaml | Validation, rubric, safety, profiles |
| .scafld/prompts/plan.md | Planning mode instructions |
| .scafld/prompts/exec.md | Execution mode instructions |
| .scafld/prompts/review.md | Adversarial review mode instructions |
| .scafld/core/schemas/spec.json | Spec JSON schema |
| .scafld/specs/ | Task specs by lifecycle status |
| .scafld/runs/ | Session ledger, diagnostics, and handoffs |
| CONVENTIONS.md | Coding standards |

Spec Lifecycle

# CLI (manages status, validation, file moves)
scafld plan <task-id>            # scaffold a markdown spec in drafts/
scafld list                      # show all specs
scafld status <task-id>          # show details + phase progress
scafld validate <task-id>        # check against schema
scafld approve <task-id>         # approve the draft spec
scafld build <task-id>           # run validation and move to review when checks pass
scafld exec <task-id>            # execute configured task actions when used
scafld review <task-id>          # run native review provider
scafld complete <task-id>        # record review verdict and complete the task
scafld handoff <task-id>         # render markdown handoff
scafld fail <task-id>            # mark failed
scafld cancel <task-id>          # mark cancelled
scafld report                    # aggregate stats across all specs