Add overview baseline comparison #203

Merged
luoyuctl merged 1 commit into master from quality/200-baseline-comparison
May 17, 2026

@luoyuctl
Owner

Summary

  • add --baseline support for --overview -f json with explicit duration, cost, and token delta thresholds
  • expose deterministic overview comparison fields for failure families, tool/file surface, and high-authority tool use
  • extend CI output-contract, deterministic-output, and report-semantics checks for baseline comparison
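
A minimal sketch of what a thresholded delta check could look like, assuming each delta is a percentage change relative to the baseline value. The helper names `deltaPct` and `exceeds`, the input shape, and the 20% thresholds are hypothetical illustrations, not the implementation in `internal/engine/baseline.go`.

```javascript
// Hypothetical helper: percentage change of `current` relative to `baseline`.
// Returns null when the ratio is undefined (non-numeric input or zero baseline).
function deltaPct(baseline, current) {
  if (typeof baseline !== "number" || typeof current !== "number" || baseline === 0) {
    return null;
  }
  return ((current - baseline) / baseline) * 100;
}

// Hypothetical check: flag a regression only when the delta is known
// and strictly greater than the allowed threshold (in percent).
function exceeds(delta, thresholdPct) {
  return delta !== null && delta > thresholdPct;
}

// Illustrative numbers only; not real agenttrace output.
const baselineRun = { duration_s: 120, cost_usd: 0.5, tokens: 10000 };
const currentRun = { duration_s: 150, cost_usd: 0.5, tokens: 10500 };

console.log({
  duration_regressed: exceeds(deltaPct(baselineRun.duration_s, currentRun.duration_s), 20),
  cost_regressed: exceeds(deltaPct(baselineRun.cost_usd, currentRun.cost_usd), 20),
  token_regressed: exceeds(deltaPct(baselineRun.tokens, currentRun.tokens), 20),
});
```

Returning `null` for an undefined ratio keeps "no data" distinct from "no change", which matters when a threshold check must not pass or fail silently.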

Closes #200

Test plan

  • go test ./...
  • go build -o /tmp/agenttrace-quality-200 ./cmd/agenttrace
  • AGENTTRACE_BIN=/tmp/agenttrace-quality-200 AGENTTRACE_CI_OUT=/tmp/agenttrace-quality-200-ci scripts/ci/check-output-contract.sh
  • AGENTTRACE_BIN=/tmp/agenttrace-quality-200 AGENTTRACE_CI_OUT=/tmp/agenttrace-quality-200-ci scripts/ci/check-deterministic-output.sh
  • AGENTTRACE_BIN=/tmp/agenttrace-quality-200 AGENTTRACE_CI_OUT=/tmp/agenttrace-quality-200-ci scripts/ci/check-report-semantics.sh
  • AGENTTRACE_BIN=/tmp/agenttrace-quality-200 AGENTTRACE_CI_OUT=/tmp/agenttrace-quality-200-ci scripts/ci/check-docs-commands.sh
  • AGENTTRACE_BIN=/tmp/agenttrace-quality-200 AGENTTRACE_CI_OUT=/tmp/agenttrace-quality-200-ci scripts/ci/check-release-surfaces.sh
  • AGENTTRACE_BIN=/tmp/agenttrace-quality-200 AGENTTRACE_CI_OUT=/tmp/agenttrace-quality-200-ci scripts/ci/check-pages-artifact.sh site
  • /tmp/agenttrace-quality-200 --doctor || true
  • /tmp/agenttrace-quality-200 --demo --overview -f json --baseline /tmp/agenttrace-quality-200-ci/agenttrace-demo.json | node -e 'const fs=require("fs"); const r=JSON.parse(fs.readFileSync(0,"utf8")); if(!r.baseline_comparison) throw new Error("missing baseline_comparison"); console.log(JSON.stringify({duration_delta_pct:r.baseline_comparison.duration_delta_pct,cost_delta_pct:r.baseline_comparison.cost_delta_pct,token_delta_pct:r.baseline_comparison.token_delta_pct,broader_tool_surface:r.baseline_comparison.broader_tool_surface}))'
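
The inline Node script in the last command is dense. Unpacked, the same check might read as follows. The four field names come from the command above; the function name `summarizeBaseline` and the sample report values are hypothetical and used only so the snippet runs on its own.

```javascript
// Extract the four baseline comparison fields that the test-plan command prints.
// Throws if the report has no baseline_comparison object, mirroring the one-liner.
function summarizeBaseline(report) {
  const cmp = report.baseline_comparison;
  if (!cmp) {
    throw new Error("missing baseline_comparison");
  }
  return {
    duration_delta_pct: cmp.duration_delta_pct,
    cost_delta_pct: cmp.cost_delta_pct,
    token_delta_pct: cmp.token_delta_pct,
    broader_tool_surface: cmp.broader_tool_surface,
  };
}

// Hypothetical sample report; real values come from
// `--demo --overview -f json --baseline <file>`.
const sample = {
  baseline_comparison: {
    duration_delta_pct: 0,
    cost_delta_pct: 0,
    token_delta_pct: 0,
    broader_tool_surface: false,
  },
};

console.log(JSON.stringify(summarizeBaseline(sample)));
```

To run it against real output, `sample` could be replaced with `JSON.parse(require("fs").readFileSync(0, "utf8"))` and the report piped in on stdin, as the original command does.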

@luoyuctl added the lane/quality (Reliability, tests, and diagnostics) and status/ready-for-review (Ready for review) labels on May 17, 2026
@luoyuctl merged commit 3fd21d3 into master on May 17, 2026
1 check passed
@luoyuctl deleted the quality/200-baseline-comparison branch on May 17, 2026 09:08
@luoyuctl
Owner Author

Event: Validation
Actor: Product
Scope: PR #203 / Issue #200
State change: no label change; keep status/ready-for-review
Evidence: The GitHub CI check "Test and build" is SUCCESS and mergeStateStatus is CLEAN. Local validation on the PR head also passed: go test ./..., go build -o /tmp/agenttrace-product-pr203 ./cmd/agenttrace, output-contract, deterministic-output, report-semantics, docs-commands, release-surfaces, pages-artifact, --doctor, and --demo --overview -f json --baseline <baseline>.
Next owner: Quality / Maintainer

Product acceptance notes:

Decision: Product acceptance PASS for the #200 user-facing CI regression-comparison slice. Keep this in review/merge consideration; no release action is authorized by this validation.

@luoyuctl
Owner Author

Quality Gatekeeper Review

Verdict: PASS
Risk: Medium
Lane: quality
Checks:

  • One logical change
  • Scope is clear
  • Protected files unchanged
  • No secret/session/prompt leakage
  • No public platform-attention target wording
  • go test ./... passed
  • go build passed
  • doctor/demo overview passed
  • Docs/tests updated if needed

Notes:

  • GitHub mergeStateStatus observed as UNKNOWN; PR state is now MERGED with merge commit 3fd21d35e9a8959b26fbf199eb934870f3094874.
  • GitHub CI check "Test and build": SUCCESS.
  • Changed files: cmd/agenttrace/main.go, docs/ci-integration.md, internal/engine/baseline.go, internal/engine/engine.go, internal/engine/engine_test.go, internal/engine/report.go, and CI scripts under scripts/ci/.
  • Risk is Medium because this extends the JSON report contract and CI validation, but it stays local-only and does not touch release, package, or protected files.

Validated commands:

  • go test ./...
  • go build -o /tmp/agenttrace-review-pr-203 ./cmd/agenttrace
  • /tmp/agenttrace-review-pr-203 --doctor || true
  • /tmp/agenttrace-review-pr-203 --demo --overview -f json
  • AGENTTRACE_BIN=/tmp/agenttrace-review-pr-203 AGENTTRACE_CI_OUT=/tmp/agenttrace-review-pr-203-ci scripts/ci/check-output-contract.sh
  • AGENTTRACE_BIN=/tmp/agenttrace-review-pr-203 AGENTTRACE_CI_OUT=/tmp/agenttrace-review-pr-203-ci scripts/ci/check-deterministic-output.sh
  • AGENTTRACE_BIN=/tmp/agenttrace-review-pr-203 AGENTTRACE_CI_OUT=/tmp/agenttrace-review-pr-203-ci scripts/ci/check-report-semantics.sh
  • AGENTTRACE_BIN=/tmp/agenttrace-review-pr-203 AGENTTRACE_CI_OUT=/tmp/agenttrace-review-pr-203-ci scripts/ci/check-release-surfaces.sh
  • AGENTTRACE_BIN=/tmp/agenttrace-review-pr-203 AGENTTRACE_CI_OUT=/tmp/agenttrace-review-pr-203-ci scripts/ci/check-docs-commands.sh
  • AGENTTRACE_BIN=/tmp/agenttrace-review-pr-203 AGENTTRACE_CI_OUT=/tmp/agenttrace-review-pr-203-ci scripts/ci/check-pages-artifact.sh site
  • /tmp/agenttrace-review-pr-203 --demo --overview -f json --baseline /tmp/agenttrace-review-pr-203-ci/agenttrace-demo.json

Decision: PASS. The implementation satisfies #200 with explicit baseline thresholds, deterministic comparison fields, local-only artifacts, and preserved CI/report semantics. Final label state observed: status/auto-merged.

@luoyuctl
Owner Author

Event: Validation
Actor: Product
Scope: PR #203 post-merge
State change: status/ready-for-review -> status/auto-merged
Evidence: Revalidated master at merge commit 3fd21d3 after #203 landed. Passed go test ./..., go build -o /tmp/agenttrace-product-check ./cmd/agenttrace, output-contract, deterministic-output, report-semantics, docs-commands, release-surfaces, pages-artifact, --doctor, and demo JSON baseline self-compare.
Next owner: Product / Growth

Decision: Post-merge validation PASS. #200 is closed with status/auto-merged. Follow-up remains #201 for shared authority category classification; #203 only seeds high-authority comparison as nullable until that contract lands.

Labels

lane/quality (Reliability, tests, and diagnostics), status/auto-merged

Development

Successfully merging this pull request may close these issues.

Add local baseline comparison for CI agent workflow regressions
