feat(testing): add baseline comparison for DSL run artifacts by AdityaShome · Pull Request #1558 · mofa-org/mofa

AdityaShome · 2026-04-01T15:07:47Z

Summary

This PR is rebased on #1556, #1555, and #1447. It adds baseline comparison for DSL run artifacts on top of the canonical artifact path.

mofa test-dsl can now save the current run artifact as a baseline and compare future runs against a saved baseline artifact. This adds the first regression-oriented workflow for the DSL path without expanding into replay or CI gating yet.

Context

The previous step introduced a canonical AgentRunArtifact and optional artifact output via --artifact-out.

That established a stable execution artifact, but there was still no way to compare a current run against a saved baseline.

This PR closes that gap with a narrow MVP comparison layer by:

adding a baseline diff model for DSL run artifacts
comparing the current artifact against a saved baseline artifact
exposing baseline input/output through mofa test-dsl
surfacing compact match/mismatch output in the CLI

What Changed

Added AgentRunArtifactDiff and ArtifactDifference
Added AgentRunArtifact::compare_to(&baseline)
Added --baseline-in <file> to compare against a saved baseline artifact
Added --baseline-out <file> to write the current artifact as a baseline
Added compact CLI output for baseline matches and mismatches
Added unit and CLI integration coverage for baseline flows

Files Changed

Core Files

`tests/src/artifact.rs`

Added AgentRunArtifactDiff.
Added ArtifactDifference.
Added compare_to for MVP artifact comparison.

`tests/src/lib.rs`

Exported the baseline diff types from the root-level tests/ crate.

`crates/mofa-cli/src/cli.rs`

Added --baseline-in and --baseline-out for mofa test-dsl.
Added clap parse coverage for the new flags.

`crates/mofa-cli/src/commands/test_dsl.rs`

Loads a saved baseline artifact when --baseline-in is provided.
Compares the current artifact to the baseline artifact.
Prints baseline: matched or baseline: mismatch.
Prints compact difference: <field> lines when mismatches are detected.
Writes the current artifact as a baseline when --baseline-out is provided.
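The output rules above can be sketched as a small rendering function. This is a minimal illustration, not the code in test_dsl.rs: `BaselineDiff` and `render_baseline_summary` are hypothetical names, and the diff is reduced to a list of differing field names.

```rust
/// Hypothetical minimal diff shape: only the names of the fields that differ.
/// The PR's real type is `AgentRunArtifactDiff`; its layout is not shown here.
struct BaselineDiff {
    differing_fields: Vec<String>,
}

/// Builds the compact CLI lines described above:
/// one `baseline: ...` line, then one `difference: <field>` line per mismatch.
fn render_baseline_summary(diff: &BaselineDiff) -> Vec<String> {
    if diff.differing_fields.is_empty() {
        return vec!["baseline: matched".to_string()];
    }
    let mut lines = vec!["baseline: mismatch".to_string()];
    lines.extend(
        diff.differing_fields
            .iter()
            .map(|field| format!("difference: {field}")),
    );
    lines
}
```

Returning the lines instead of printing them directly keeps the formatting testable without capturing stdout; whether the PR structures it this way is an assumption.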

`crates/mofa-cli/src/main.rs`

Wired baseline flags through CLI dispatch.

`tests/tests/artifact_tests.rs`

Added focused artifact comparison tests for exact-match and mismatch cases.

Supporting Files

crates/mofa-cli/tests/test_dsl_integration_tests.rs: Added CLI integration coverage for baseline write and mismatch reporting.

Baseline Comparison Scope

The initial comparison is intentionally narrow and compares:

status
output_text
assertion signatures (kind and passed)
ordered tool call names

This PR does not yet compare:

durations or timing thresholds
workspace snapshots
session snapshots
LLM request/response payload details
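To make the scope concrete, here is a minimal sketch of a comparison limited to the four compared fields. The type and method names follow the PR description, but every field layout below is an assumption made for illustration, as are the `message` field, the `is_match` helper, and the `assertions` difference label. The real definitions live in tests/src/artifact.rs.

```rust
/// Simplified stand-in for one recorded assertion result (assumed shape).
#[derive(Debug, Clone)]
#[allow(dead_code)]
struct AssertionOutcome {
    kind: String,
    passed: bool,
    message: String, // hypothetical extra detail, not part of the compared signature
}

/// Simplified stand-in for the canonical run artifact (assumed shape).
#[derive(Debug, Clone)]
#[allow(dead_code)]
struct AgentRunArtifact {
    status: String,
    output_text: Option<String>,
    assertions: Vec<AssertionOutcome>,
    tool_calls: Vec<String>,
    duration_ms: u64, // deliberately ignored by the comparison
}

/// One differing field, with both sides rendered as text.
#[derive(Debug, Clone, PartialEq)]
#[allow(dead_code)]
struct ArtifactDifference {
    field: &'static str,
    baseline: String,
    current: String,
}

#[derive(Debug, Default)]
struct AgentRunArtifactDiff {
    differences: Vec<ArtifactDifference>,
}

impl AgentRunArtifactDiff {
    fn is_match(&self) -> bool {
        self.differences.is_empty()
    }
}

impl AgentRunArtifact {
    /// Projects each assertion down to its (kind, passed) signature.
    fn assertion_signatures(&self) -> Vec<(&str, bool)> {
        self.assertions
            .iter()
            .map(|a| (a.kind.as_str(), a.passed))
            .collect()
    }

    /// Compares only status, output_text, assertion signatures,
    /// and the ordered tool call names.
    fn compare_to(&self, baseline: &AgentRunArtifact) -> AgentRunArtifactDiff {
        let mut diff = AgentRunArtifactDiff::default();
        let mut record = |field: &'static str, expected: String, actual: String| {
            if expected != actual {
                diff.differences.push(ArtifactDifference {
                    field,
                    baseline: expected,
                    current: actual,
                });
            }
        };
        record("status", baseline.status.clone(), self.status.clone());
        record(
            "output_text",
            format!("{:?}", baseline.output_text),
            format!("{:?}", self.output_text),
        );
        record(
            "assertions",
            format!("{:?}", baseline.assertion_signatures()),
            format!("{:?}", self.assertion_signatures()),
        );
        record(
            "tool_calls",
            format!("{:?}", baseline.tool_calls),
            format!("{:?}", self.tool_calls),
        );
        diff
    }
}
```

In this sketch, two artifacts that differ only in `duration_ms` or in assertion messages compare as a match, which mirrors the exclusions listed above.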

CLI Usage

Examples:

mofa test-dsl tests/examples/simple_agent.toml --baseline-out dsl-baseline.json

mofa test-dsl tests/examples/tool_agent.toml --baseline-in dsl-baseline.json

Execution Flow

  flowchart TD
      A[TOML DSL file] --> B[mofa test-dsl]
      B --> C[Execute case]
      C --> D[Build current AgentRunArtifact]

      E[Saved baseline artifact] --> F[Load baseline]
      F --> G[Compare artifacts]
      D --> G

      G --> H[baseline matched]
      G --> I[baseline mismatch]
      I --> J[Difference fields]

      D --> K[Write artifact or baseline file]
      G --> L[CLI summary output]
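The flow above might be wired together roughly as follows. This is a sketch under stated assumptions: `run_baseline_step` and `BaselineOutcome` are hypothetical names, the artifact is treated as already-serialized text, the field-level comparison is passed in as a closure, and the order of comparing before writing is an illustrative choice rather than something the PR specifies.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Outcome of the optional baseline step for one run (hypothetical type).
#[derive(Debug, PartialEq)]
enum BaselineOutcome {
    NotRequested,
    Matched,
    Mismatch(Vec<String>),
}

/// `current` is the serialized current artifact.
/// `compare(current, baseline)` returns the names of the fields that differ.
fn run_baseline_step<F>(
    current: &str,
    baseline_in: Option<&Path>,
    baseline_out: Option<&Path>,
    compare: F,
) -> io::Result<BaselineOutcome>
where
    F: Fn(&str, &str) -> Vec<String>,
{
    // Load and compare only when a baseline input path was given.
    let outcome = match baseline_in {
        None => BaselineOutcome::NotRequested,
        Some(path) => {
            let baseline = fs::read_to_string(path)?;
            let fields = compare(current, &baseline);
            if fields.is_empty() {
                BaselineOutcome::Matched
            } else {
                BaselineOutcome::Mismatch(fields)
            }
        }
    };

    // Write the current artifact as the new baseline when requested.
    if let Some(path) = baseline_out {
        fs::write(path, current)?;
    }

    Ok(outcome)
}
```

Keeping both paths optional lets a single invocation save a baseline, compare against one, or do both.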

Example Output

mofa test-dsl tests/examples/tool_agent.toml --baseline-in dsl-baseline.json

case: tool_agent_run
status: passed
output: Tool execution complete
tool_calls: echo_tool
duration_ms: 0
baseline: mismatch
difference: output_text
difference: tool_calls

Tests

cargo test -p mofa-testing --test artifact_tests
cargo test -p mofa-cli --test test_dsl_integration_tests
cargo test -p mofa-cli test_test_dsl_

Notes

This PR keeps baseline comparison intentionally small and artifact-backed. It adds the first regression comparison loop for DSL runs while leaving deeper diff semantics, replay, and CI gating to later steps.

AdityaShome added 14 commits March 23, 2026 19:41

tests: add real agent runner harness

00d3d55

Add agent runner test harness with workspace isolation and tool/bootstraps support

aed5b97

Add agent runner metadata, tool capture, and prompt customization

4543c4d

Add workspace snapshots, session assertions, and tool timing to agent runner

dcdb83e

Add agent runner examples and session aware runner updates

4bf72fe

Normalize workspace snapshot paths for cross platform agent runner tests

5560ff7

Merge branch 'main' into agent-runner-testing

709f06e

feat(testing): add agent runner assertions

9efd792

feat(testing): add minimal TOML DSL adapter

f3768d3

feat(cli): add test dsl command for TOML agent test cases

b4d3d08

feat(cli): add report output for test dsl command

3dcceb5

feat(testing): add canonical run artifact for test dsl execution

c7099b9

feat(testing): add baseline comparison for DSL run artifacts

1e3b47c

test(testing): add missing bootstrap DSL fixture

23cc2e9

This was referenced Apr 2, 2026

feat(testing): add structured comparison output for test-dsl #1566

Open

feat(testing): add replay backed provider support for DSL runs #1581

Open

AdityaShome wants to merge 14 commits into mofa-org:main from AdityaShome:testing-baseline-comparison
