feat(testing): add canonical run artifact for DSL test execution #1556

Open
AdityaShome wants to merge 13 commits into mofa-org:main from AdityaShome:testing-canonical-artifact


Summary

This PR is rebased on #1555 and #1447. It adds a canonical run artifact for mofa test-dsl execution and exposes it as optional JSON output via --artifact-out.

The goal is to move the DSL path onto a stable execution artifact instead of relying only on ad hoc runner fields and formatter-specific report generation. This keeps the DSL thin while establishing the output contract needed for later baseline comparison, replay-oriented workflows, and CI artifact upload.

Context

The current DSL and CLI path can execute test cases and emit reports, but it does not yet have one stable, serializable run artifact that represents a case execution end to end.

[Screenshot: contents of the artifact file saved at /tmp/dsl-artifact.json]

This PR closes that gap by:

  • defining a canonical artifact shape for DSL backed agent runs
  • building that artifact from the existing AgentRunResult
  • separating test-case execution from assertion evaluation so assertion outcomes can be captured explicitly
  • allowing mofa test-dsl to write the artifact as JSON

What Changed

  • Added a canonical AgentRunArtifact in the root-level tests/ crate
  • Added serializable nested artifact types for:
    • agent identity
    • tool calls
    • LLM request/response
    • session snapshot
    • workspace snapshots
    • assertion outcomes
  • Split DSL execution from assertion evaluation
  • Added --artifact-out to mofa test-dsl
  • Updated report generation to build from the canonical artifact
  • Added unit and CLI integration coverage for artifact generation

Execution Flow

[Figure: execution flow diagram]

Files Changed

Core Files

tests/src/artifact.rs

  • Added the canonical AgentRunArtifact type and nested serializable artifact models.
  • Added conversion from TestCaseDsl + AgentRunResult into a stable artifact.

tests/src/dsl.rs

  • Split execution from assertion evaluation.
  • Added execute_test_case.
  • Added collect_assertion_outcomes.
  • Added assertion_error_from_outcomes.
  • Preserved existing DSL behavior while making assertion results explicit for artifact capture.
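
The split described above can be sketched roughly as follows. This is a minimal, dependency-free illustration, not the code in tests/src/dsl.rs: the three function names come from this PR, but their signatures, the `Assertion` enum, the `RunResult` struct, and the echo-style runner are all assumptions made for the example.

```rust
// Hypothetical, simplified stand-ins for the real DSL types.
#[derive(Debug, Clone, PartialEq)]
pub enum Assertion {
    OutputContains(String),
    OutputEquals(String),
}

#[derive(Debug, Clone, PartialEq)]
pub struct AssertionOutcome {
    pub description: String,
    pub passed: bool,
}

pub struct RunResult {
    pub output: String,
}

/// Step 1: execute only. The "runner" here just echoes the prompt (assumption).
pub fn execute_test_case(prompt: &str) -> RunResult {
    RunResult { output: format!("echo: {prompt}") }
}

/// Step 2: evaluate every assertion and record each outcome,
/// instead of stopping at the first failure.
pub fn collect_assertion_outcomes(
    result: &RunResult,
    assertions: &[Assertion],
) -> Vec<AssertionOutcome> {
    assertions
        .iter()
        .map(|assertion| match assertion {
            Assertion::OutputContains(needle) => AssertionOutcome {
                description: format!("output contains {needle:?}"),
                passed: result.output.contains(needle.as_str()),
            },
            Assertion::OutputEquals(expected) => AssertionOutcome {
                description: format!("output equals {expected:?}"),
                passed: result.output == *expected,
            },
        })
        .collect()
}

/// Step 3: derive a single error message from the recorded outcomes,
/// so callers that expect a pass/fail error still get one.
pub fn assertion_error_from_outcomes(outcomes: &[AssertionOutcome]) -> Option<String> {
    let failed: Vec<&str> = outcomes
        .iter()
        .filter(|outcome| !outcome.passed)
        .map(|outcome| outcome.description.as_str())
        .collect();
    if failed.is_empty() {
        None
    } else {
        Some(format!("failed assertions: {}", failed.join("; ")))
    }
}
```

Keeping the outcomes as data is what lets the artifact record every assertion result, while the derived error preserves a single failure message for existing callers.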

tests/src/lib.rs

  • Exported the new artifact types and DSL evaluation helpers from the root-level tests/ crate.

crates/mofa-cli/src/cli.rs

  • Added --artifact-out for mofa test-dsl.

crates/mofa-cli/src/commands/test_dsl.rs

  • Builds a canonical artifact from DSL execution.
  • Writes artifact JSON when --artifact-out is provided.
  • Builds report output from the artifact-backed result path.
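
As a rough illustration of the optional write, the sketch below only touches the filesystem when a path was supplied. The helper name `write_artifact_if_requested` and its signature are hypothetical, and the JSON is passed in as an already-serialized string to keep the example free of external crates; the actual command may structure this differently.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Writes `artifact_json` to `artifact_out` when a path was supplied.
/// Returns `Ok(true)` if a file was written and `Ok(false)` if the flag was absent.
/// (Hypothetical helper; not the function used in test_dsl.rs.)
pub fn write_artifact_if_requested(
    artifact_out: Option<&Path>,
    artifact_json: &str,
) -> io::Result<bool> {
    match artifact_out {
        // No --artifact-out given: skip the write entirely.
        None => Ok(false),
        Some(path) => {
            fs::write(path, artifact_json)?;
            Ok(true)
        }
    }
}
```

Modelling the flag as an `Option` keeps the default behavior of the command unchanged when --artifact-out is not passed.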

crates/mofa-cli/src/main.rs

  • Wired --artifact-out through CLI dispatch.

tests/tests/artifact_tests.rs

  • Added focused unit coverage for canonical artifact generation and failed assertion capture.

Supporting Files

crates/mofa-cli/tests/test_dsl_integration_tests.rs: Added CLI integration coverage for --artifact-out.

Canonical Artifact Scope

The initial artifact captures:

  • case name
  • pass/fail status
  • output text
  • runner error
  • duration and start timestamp
  • execution id and session id
  • agent identity
  • assertion outcomes
  • tool call records
  • last LLM request/response
  • session snapshot
  • workspace snapshots before and after execution
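
To make the scope above concrete, here is one way such an artifact could be laid out. Only the type name `AgentRunArtifact` comes from this PR; every field name and type below, the `ToolCallRecord` struct, and the `compute_passed` rule are assumptions for illustration. The real types are serializable, but serialization derives are left out here so the sketch has no external dependencies.

```rust
use std::time::Duration;

#[derive(Debug, Clone, PartialEq)]
pub struct AssertionOutcome {
    pub description: String,
    pub passed: bool,
}

// Hypothetical record shape for a single tool call.
#[derive(Debug, Clone, PartialEq)]
pub struct ToolCallRecord {
    pub tool_name: String,
    pub arguments: String,
    pub result: Option<String>,
}

// Field names and types are illustrative assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct AgentRunArtifact {
    pub case_name: String,
    pub passed: bool,
    pub output_text: Option<String>,
    pub runner_error: Option<String>,
    pub duration: Duration,
    pub started_at_unix_ms: u64,
    pub execution_id: String,
    pub session_id: Option<String>,
    pub agent_id: String,
    pub assertions: Vec<AssertionOutcome>,
    pub tool_calls: Vec<ToolCallRecord>,
    // The last LLM request/response, session snapshot, and workspace
    // snapshots are omitted from this sketch for brevity.
}

impl AgentRunArtifact {
    /// Assumed rule: a run passes only when the runner reported no error
    /// and every recorded assertion passed.
    pub fn compute_passed(runner_error: Option<&str>, assertions: &[AssertionOutcome]) -> bool {
        runner_error.is_none() && assertions.iter().all(|a| a.passed)
    }
}
```

Optional fields are modelled with `Option` so that a failed run (no output, runner error set) and a successful run share one shape.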

CLI Usage

Examples:

mofa test-dsl tests/examples/simple_agent.toml --artifact-out dsl-artifact.json
mofa test-dsl tests/examples/tool_agent.toml --artifact-out dsl-artifact.json --report-out dsl-report.json --report-format json

Example Output

mofa test-dsl tests/examples/simple_agent.toml --artifact-out dsl-artifact.json

In the example run, the artifact was saved to /tmp/dsl-artifact.json.

Tests

  • cargo test -p mofa-testing --test artifact_tests
  • cargo test -p mofa-testing --test dsl_tests
  • cargo test -p mofa-cli --test test_dsl_integration_tests
  • cargo test -p mofa-cli test_test_dsl_

Notes

This PR does not introduce replay or baseline comparison yet. It establishes the canonical artifact foundation those later steps can build on.
