[feat] Run Trace Export System for WhaleFlow/Model Lab #2752

@nayar-900

Problem

When working with multiple models and workflows in CodeWhale (especially within Model Lab + WhaleFlow pipelines), it becomes difficult to track which model configuration produced which output.

Currently, there is no structured way to export or record a “run trace” that includes:

model used
prompt inputs
workflow path (WhaleFlow / Harness)
intermediate steps
final output

This makes debugging, evaluation, and reproducibility harder.


Proposed solution

Introduce a Run Trace Export System for CodeWhale workflows.

Add an export command such as:

/export run --format json

or a UI button labeled “Export Run Trace”.

Output example:

{
  "run_id": "abc123",
  "model": "deepseek-v4",
  "workflow": "whaleflow",
  "steps": [
    {
      "node": "planner",
      "input": "...",
      "output": "..."
    },
    {
      "node": "executor",
      "output": "..."
    }
  ],
  "final_output": "..."
}
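
A minimal Python sketch of how the trace above could be modeled and serialized. The `StepTrace` and `RunTrace` types and the `to_json` helper are hypothetical names used for illustration, not part of any existing CodeWhale API. The `input` field is optional because the executor step in the example has none.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class StepTrace:
    """One node execution in a workflow (hypothetical type)."""
    node: str
    output: str
    input: Optional[str] = None  # optional: the example's executor step omits it


@dataclass
class RunTrace:
    """A full run trace mirroring the fields of the example output."""
    run_id: str
    model: str
    workflow: str
    steps: list[StepTrace] = field(default_factory=list)
    final_output: str = ""


def to_json(trace: RunTrace) -> str:
    """Serialize a trace to JSON, dropping keys whose value is None."""
    def drop_none(pairs):
        return {key: value for key, value in pairs if value is not None}

    return json.dumps(asdict(trace, dict_factory=drop_none), indent=2)


trace = RunTrace(
    run_id="abc123",
    model="deepseek-v4",
    workflow="whaleflow",
    steps=[
        StepTrace(node="planner", input="...", output="..."),
        StepTrace(node="executor", output="..."),
    ],
    final_output="...",
)
print(to_json(trace))
```

Key order within each step may differ from the example (here `output` precedes `input`), which does not change the JSON's meaning.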

Use case

Debugging failed agent workflows
Comparing model performance in Model Lab
Reproducing WhaleFlow executions
Auditing Harness-generated agent behavior
Research experiments (A/B testing workflows)
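
As one illustration of the debugging and comparison use cases, two exported traces could be compared step by step to find where they first diverge. This is a sketch that assumes traces are loaded as dictionaries in the proposed JSON shape; `first_divergence` is a hypothetical helper, not an existing CodeWhale function.

```python
from typing import Optional


def first_divergence(trace_a: dict, trace_b: dict) -> Optional[tuple]:
    """Return (index, node) of the first step that differs, or None if the steps match.

    If one trace simply has extra steps, the index of the first extra
    step is returned with node set to None.
    """
    steps_a = trace_a.get("steps", [])
    steps_b = trace_b.get("steps", [])
    for index, (a, b) in enumerate(zip(steps_a, steps_b)):
        if a.get("node") != b.get("node") or a.get("output") != b.get("output"):
            return index, a.get("node")
    if len(steps_a) != len(steps_b):
        return min(len(steps_a), len(steps_b)), None
    return None


# Two hypothetical runs of the same workflow with different models.
run_a = {
    "model": "model-a",
    "steps": [
        {"node": "planner", "output": "plan"},
        {"node": "executor", "output": "result 1"},
    ],
}
run_b = {
    "model": "model-b",
    "steps": [
        {"node": "planner", "output": "plan"},
        {"node": "executor", "output": "result 2"},
    ],
}
print(first_divergence(run_a, run_b))  # (1, 'executor')
```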


Alternatives considered

Manually logging outputs from each step
Copy-pasting terminal logs
External observability tools

These approaches are error-prone and not integrated into CodeWhale's workflow system.
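
By contrast, an integrated recorder could capture each step as it executes, so nothing depends on the user remembering to log it. The sketch below is only an assumption about how that might look: `TraceRecorder` and `run_step` are hypothetical names, and the step functions are stand-ins for real workflow nodes.

```python
from typing import Any, Callable


class TraceRecorder:
    """Collects step records for a single run (hypothetical helper)."""

    def __init__(self, run_id: str, model: str, workflow: str) -> None:
        self.trace: dict[str, Any] = {
            "run_id": run_id,
            "model": model,
            "workflow": workflow,
            "steps": [],
            "final_output": None,
        }

    def run_step(self, node: str, fn: Callable[[str], str], step_input: str) -> str:
        """Call fn(step_input), record the input and output under node, return the output."""
        output = fn(step_input)
        self.trace["steps"].append(
            {"node": node, "input": step_input, "output": output}
        )
        # Treat the most recent step's output as the final output so far.
        self.trace["final_output"] = output
        return output


recorder = TraceRecorder(run_id="abc123", model="deepseek-v4", workflow="whaleflow")
plan = recorder.run_step("planner", lambda text: f"plan for: {text}", "fix the bug")
result = recorder.run_step("executor", lambda text: text.upper(), plan)
```

After the run, `recorder.trace` holds a dictionary in the proposed export shape, ready to be written out as JSON.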


Impact

Improves reproducibility of AI workflows
Helps debug complex multi-agent runs
Useful for Model Lab evaluation pipelines
Strong fit for v0.9.0 direction (workflow + execution layer maturity)


Additional context

This aligns closely with:

WhaleFlow runtime execution model
Model Lab evaluation workflows
Harness Creator pipeline (inspect → evaluate → promote)

A structured run-trace system would significantly improve transparency and debugging across all layers.

Metadata

Assignees

No one assigned

    Labels

    bug, enhancement

    Projects

    Status
    Backlog

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests