This repository keeps internal regression and external interoperability experiments separate on purpose.
- `./scripts/doctor.sh` remains the primary internal regression entrypoint.
- `./scripts/conformance.sh` is a local/manual experiment entrypoint for official external tooling.
- External conformance output should be treated as investigation input, not as an automatic merge gate.
The default `./scripts/conformance.sh` workflow does the following:
- Sync the repository environment unless explicitly skipped.
- Cache or refresh the official `a2aproject/a2a-tck` checkout.
- Start a local dummy-backed `opencode-a2a` runtime unless `CONFORMANCE_SUT_URL` points to an existing SUT.
- Run the requested TCK category, defaulting to `mandatory`.
- Preserve raw logs and machine-readable reports under `run/conformance/<timestamp>/`.
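The branch points in the workflow above (skipping the sync, reusing an external SUT, defaulting the category, naming the output directory) can be sketched in Python as a pure planning function. This is a hypothetical illustration, not the logic inside `./scripts/conformance.sh`. The names `plan_run`, `RunPlan` and `skip_sync`, the UTC timestamp format and the local SUT address `http://127.0.0.1:8000` are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path
from typing import Mapping, Optional, Sequence

# Assumed address of the local dummy-backed SUT (hypothetical).
LOCAL_SUT_URL = "http://127.0.0.1:8000"


@dataclass(frozen=True)
class RunPlan:
    sync_env: bool          # sync the repository environment first?
    start_local_sut: bool   # start the dummy-backed opencode-a2a runtime?
    sut_url: str            # URL the TCK is pointed at
    category: str           # TCK category to run
    output_dir: Path        # where logs and reports are preserved


def plan_run(
    argv: Sequence[str],
    env: Mapping[str, str],
    now: datetime,
    skip_sync: bool = False,
) -> RunPlan:
    """Hypothetical sketch of how a conformance run could be planned."""
    # An external SUT is used only when CONFORMANCE_SUT_URL is non-empty.
    external_url: Optional[str] = env.get("CONFORMANCE_SUT_URL") or None
    # The first positional argument selects the category; default is "mandatory".
    category = argv[0] if argv else "mandatory"
    # Timestamp format is an assumption; the real script may differ.
    stamp = now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return RunPlan(
        sync_env=not skip_sync,
        start_local_sut=external_url is None,
        sut_url=external_url or LOCAL_SUT_URL,
        category=category,
        output_dir=Path("run/conformance") / stamp,
    )
```

Because the planner has no side effects, each decision is easy to test in isolation. Cloning the TCK and starting the runtime would happen only after a plan exists.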
The default local SUT uses the repository test double `DummyChatOpencodeUpstreamClient`. This keeps the experiment reproducible without requiring a live OpenCode upstream.
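The reproducibility argument depends on the double returning canned, deterministic output instead of calling a real model. The following is a minimal sketch of that idea. `FakeUpstreamClient` and its `chat` method are hypothetical and do not reproduce the interface of `DummyChatOpencodeUpstreamClient`.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeUpstreamClient:
    """Hypothetical stand-in for a live OpenCode upstream.

    It never touches the network and always answers the same way,
    so repeated TCK runs see identical upstream behaviour.
    """

    reply_prefix: str = "echo: "
    calls: List[str] = field(default_factory=list)

    def chat(self, prompt: str) -> str:
        # Record the request so tests can inspect what was sent.
        self.calls.append(prompt)
        # Deterministic reply: it depends only on the input text.
        return f"{self.reply_prefix}{prompt}"
```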
Run the default mandatory experiment:
```bash
bash ./scripts/conformance.sh
```

Run a different TCK category:

```bash
bash ./scripts/conformance.sh capabilities
```

Target an already running runtime instead of the local dummy-backed SUT:

```bash
CONFORMANCE_SUT_URL=http://127.0.0.1:8000 \
A2A_AUTH_TYPE=bearer \
A2A_AUTH_TOKEN=dev-token \
bash ./scripts/conformance.sh mandatory
```

Each run keeps the following artifacts in the selected output directory:
- `agent-card.json`: fetched public Agent Card
- `health.json`: fetched authenticated health payload when the local SUT is used
- `tck.log`: raw TCK console output
- `pytest-report.json`: `pytest-json-report` output emitted by the TCK runner
- `failed-tests.json`: compact list of failed/error node IDs for triage
- `metadata.json`: experiment metadata, including the local repo commit and the cached TCK commit
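A compact failure list like `failed-tests.json` can be approximated from `pytest-report.json`. The sketch below assumes the standard pytest-json-report layout, with a top-level `tests` array whose entries carry `nodeid` and `outcome`. The function names are hypothetical, and the real `failed-tests.json` schema may differ from the plain sorted list written here.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Mapping

# Outcomes treated as needing triage (assumed pytest-json-report values).
TRIAGE_OUTCOMES = frozenset({"failed", "error"})


def load_report(path: Path) -> Dict[str, Any]:
    """Read a pytest-json-report file such as pytest-report.json."""
    return json.loads(path.read_text(encoding="utf-8"))


def failed_node_ids(report: Mapping[str, Any]) -> List[str]:
    """Return the sorted node IDs whose outcome needs triage."""
    return sorted(
        test["nodeid"]
        for test in report.get("tests", [])
        if test.get("outcome") in TRIAGE_OUTCOMES
    )


def write_failed_tests(run_dir: Path) -> List[str]:
    """Derive failed-tests.json from pytest-report.json inside one run directory."""
    failed = failed_node_ids(load_report(run_dir / "pytest-report.json"))
    (run_dir / "failed-tests.json").write_text(
        json.dumps(failed, indent=2) + "\n", encoding="utf-8"
    )
    return failed
```

Calling `write_failed_tests(Path("run/conformance/<timestamp>"))` on a finished run regenerates the compact list. Keeping `failed_node_ids` free of file I/O means it can also be pointed at a report fetched from elsewhere.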
When a TCK run fails, inspect the raw report before changing the runtime:
- Some failures may point to real runtime gaps.
- Some failures may come from TCK assumptions that do not match `a2a-sdk==0.3.25`.
- Some failures may come from A2A v0.3 versus v1.0 naming or schema drift.
The experiment is useful only if those categories stay separate during triage.
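One way to keep the three categories apart is to record an explicit verdict per failed node ID and group failures by that verdict. With this approach, runtime changes are planned only from the runtime-gap bucket. The sketch below is hypothetical: the `Cause` enum, its labels and the `group_by_cause` helper are illustrative and are not taken from the repository's triage notes.

```python
from collections import defaultdict
from enum import Enum
from typing import Dict, List, Mapping


class Cause(Enum):
    RUNTIME_GAP = "real runtime gap"
    TCK_SDK_MISMATCH = "TCK assumption vs a2a-sdk==0.3.25"
    SPEC_DRIFT = "A2A v0.3 vs v1.0 naming/schema drift"
    UNTRIAGED = "not yet classified"


def group_by_cause(
    failed: List[str], verdicts: Mapping[str, Cause]
) -> Dict[Cause, List[str]]:
    """Bucket failed node IDs by their recorded cause.

    Failures without an explicit verdict land in UNTRIAGED instead of
    being silently counted as runtime gaps.
    """
    buckets: Dict[Cause, List[str]] = defaultdict(list)
    for node_id in failed:
        buckets[verdicts.get(node_id, Cause.UNTRIAGED)].append(node_id)
    return dict(buckets)
```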
The current first-pass triage is recorded in `./conformance-triage.md`.