External Conformance Experiments

This repository keeps internal regression and external interoperability experiments separate on purpose.

Scope

./scripts/doctor.sh remains the primary internal regression entrypoint.
./scripts/conformance.sh is a local/manual experiment entrypoint for official external tooling.
External conformance output should be treated as investigation input, not as an automatic merge gate.

Current Experiment Shape

The default ./scripts/conformance.sh workflow does the following:

1. Sync the repository environment unless explicitly skipped.
2. Cache or refresh the official a2aproject/a2a-tck checkout.
3. Start a local dummy-backed opencode-a2a runtime unless CONFORMANCE_SUT_URL points to an existing SUT.
4. Run the requested TCK category, defaulting to mandatory.
5. Preserve raw logs and machine-readable reports under run/conformance/<timestamp>/.

The default local SUT uses the repository test double DummyChatOpencodeUpstreamClient. That keeps the experiment reproducible without requiring a live OpenCode upstream.
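
Because every run writes into its own run/conformance/<timestamp>/ directory, it can help to locate the most recent one quickly. The following is a minimal sketch, not part of the repository: the helper name latest_run_dir is hypothetical, and it assumes the timestamp directory names sort lexicographically in chronological order, which this document does not confirm.

```shell
# Hypothetical helper: print the newest run directory under a base path.
# Assumption: timestamp directory names sort lexicographically by time.
latest_run_dir() {
  local base="${1:-run/conformance}"
  local latest=""
  local dir
  # Glob results are sorted by the shell, so the last match is the newest.
  for dir in "$base"/*/; do
    [ -d "$dir" ] || continue   # skips the literal pattern when nothing matches
    latest="${dir%/}"
  done
  [ -n "$latest" ] || return 1  # no run directories found
  printf '%s\n' "$latest"
}
```

Calling latest_run_dir with no argument looks under run/conformance relative to the current directory; the function returns a non-zero status when no run directory exists yet.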

Usage

Run the default mandatory experiment:

bash ./scripts/conformance.sh

Run a different TCK category:

bash ./scripts/conformance.sh capabilities

Target an already running runtime instead of the local dummy-backed SUT:

CONFORMANCE_SUT_URL=http://127.0.0.1:8000 \
A2A_AUTH_TYPE=bearer \
A2A_AUTH_TOKEN=dev-token \
bash ./scripts/conformance.sh mandatory
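
To run several categories back to back, the invocations above can be wrapped in a small loop. This is a sketch under stated assumptions: run_categories and the CONFORMANCE_SCRIPT override are hypothetical names introduced here for illustration, and the sketch assumes ./scripts/conformance.sh exits non-zero when a run has failures, which this document does not state.

```shell
# Hypothetical wrapper: run the conformance script once per category.
# CONFORMANCE_SCRIPT is an illustrative override, not a variable the
# repository defines.
run_categories() {
  local script="${CONFORMANCE_SCRIPT:-./scripts/conformance.sh}"
  local failed=0
  local category
  for category in "$@"; do
    # Assumption: a non-zero exit status means the run had failures.
    if ! bash "$script" "$category"; then
      printf 'category %s reported failures\n' "$category" >&2
      failed=1
    fi
  done
  # Non-zero if any category failed; every category is still attempted.
  return "$failed"
}
```

For example, run_categories mandatory capabilities runs both categories shown above and keeps going after a failing one, in line with treating the output as investigation input rather than a gate.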

Artifacts

Each run keeps the following artifacts in the selected output directory:

agent-card.json: fetched public Agent Card
health.json: authenticated health payload, fetched only when the local SUT is used
tck.log: raw TCK console output
pytest-report.json: pytest-json-report output emitted by the TCK runner
failed-tests.json: compact list of failed/error node IDs for triage
metadata.json: experiment metadata including local repo commit and cached TCK commit
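
Before triage, it can be worth confirming that a run directory actually contains the files listed above. The check below is an illustrative sketch only: check_run_artifacts is a hypothetical helper, and it treats health.json as optional because that file is fetched only when the local SUT is used.

```shell
# Hypothetical helper: verify the expected artifacts exist in a run directory.
check_run_artifacts() {
  local run_dir="$1"
  local missing=0
  local name
  for name in agent-card.json tck.log pytest-report.json \
              failed-tests.json metadata.json; do
    if [ ! -f "$run_dir/$name" ]; then
      printf 'missing: %s\n' "$name" >&2
      missing=1
    fi
  done
  # health.json is optional: it is absent when an external SUT was targeted.
  if [ ! -f "$run_dir/health.json" ]; then
    printf 'note: health.json not present\n' >&2
  fi
  return "$missing"
}
```

The function returns 0 when all required files are present and 1 otherwise, printing each missing name to standard error.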

Interpretation Guidance

When a TCK run fails, inspect the raw report before changing the runtime:

Some failures may point to real runtime gaps.
Some failures may come from TCK assumptions that do not match a2a-sdk==0.3.25.
Some failures may come from A2A v0.3 versus v1.0 naming or schema drift.

The experiment is useful only if those categories stay separate during triage.

The current first-pass triage is recorded in ./conformance-triage.md.
