Problem
When working with multiple models and workflows in CodeWhale (especially within Model Lab + WhaleFlow pipelines), it becomes difficult to track which model configuration produced which output.
Currently, there is no structured way to export or record a “run trace” that includes:
model used
prompt inputs
workflow path (WhaleFlow / Harness)
intermediate steps
final output
This makes debugging, evaluation, and reproducibility harder.
Proposed solution
Introduce a Run Trace Export System for CodeWhale workflows.
Add a feature such as:
/export run --format json
or a UI button:
“Export Run Trace”
Output example:
{
  "run_id": "abc123",
  "model": "deepseek-v4",
  "workflow": "whaleflow",
  "steps": [
    {
      "node": "planner",
      "input": "...",
      "output": "..."
    },
    {
      "node": "executor",
      "output": "..."
    }
  ],
  "final_output": "..."
}
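As a rough illustration of how such a trace might be assembled and serialized, here is a minimal Python sketch. The names `RunTrace`, `TraceStep`, `add_step`, and `to_json` are hypothetical and not part of CodeWhale; only the field names are taken from the output example. JSON key order within a step may differ from the example, which does not affect parsing.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class TraceStep:
    """One node execution within a run (hypothetical type)."""
    node: str
    output: str
    input: Optional[str] = None  # some nodes may not record an input


@dataclass
class RunTrace:
    """Hypothetical container mirroring the fields of the output example."""
    run_id: str
    model: str
    workflow: str
    steps: List[TraceStep] = field(default_factory=list)
    final_output: str = ""

    def add_step(self, node: str, output: str, input: Optional[str] = None) -> None:
        self.steps.append(TraceStep(node=node, output=output, input=input))

    def to_json(self) -> str:
        data = asdict(self)
        # Drop unset inputs so a step without an input has no "input" key.
        for step in data["steps"]:
            if step["input"] is None:
                del step["input"]
        return json.dumps(data, indent=2)


# Placeholder values for illustration only.
trace = RunTrace(run_id="abc123", model="deepseek-v4", workflow="whaleflow")
trace.add_step("planner", output="plan text", input="user prompt")
trace.add_step("executor", output="result text")
trace.final_output = "result text"

exported = trace.to_json()
print(exported)
```

Recording each step as it finishes, rather than reconstructing the trace afterwards, would keep the export faithful even when a run fails partway through.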
Use case
Debugging failed agent workflows
Comparing model performance in Model Lab
Reproducing WhaleFlow executions
Auditing Harness-generated agent behavior
Research experiments (A/B testing workflows)
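For the comparison and reproduction use cases, exported traces could be diffed field by field. The sketch below is one possible approach in Python, assuming the JSON shape from the output example; `diff_traces` is a hypothetical helper, not an existing CodeWhale function, and the sample values are placeholders.

```python
import json


def diff_traces(trace_a: dict, trace_b: dict) -> list:
    """Return human-readable differences between two run traces."""
    diffs = []
    # Compare top-level fields (run_id is expected to differ, so it is skipped).
    for key in ("model", "workflow", "final_output"):
        if trace_a.get(key) != trace_b.get(key):
            diffs.append(f"{key}: {trace_a.get(key)!r} != {trace_b.get(key)!r}")

    steps_a = trace_a.get("steps", [])
    steps_b = trace_b.get("steps", [])
    if len(steps_a) != len(steps_b):
        diffs.append(f"step count: {len(steps_a)} != {len(steps_b)}")

    # Compare steps pairwise, up to the length of the shorter trace.
    for i, (a, b) in enumerate(zip(steps_a, steps_b)):
        for key in ("node", "input", "output"):
            if a.get(key) != b.get(key):
                diffs.append(f"steps[{i}].{key}: {a.get(key)!r} != {b.get(key)!r}")
    return diffs


run_a = json.loads(
    '{"run_id": "a1", "model": "model-a", "workflow": "whaleflow", '
    '"steps": [{"node": "planner", "input": "q", "output": "p1"}], '
    '"final_output": "x"}'
)
run_b = json.loads(
    '{"run_id": "b1", "model": "model-b", "workflow": "whaleflow", '
    '"steps": [{"node": "planner", "input": "q", "output": "p2"}], '
    '"final_output": "x"}'
)

for line in diff_traces(run_a, run_b):
    print(line)
```

A diff like this shows where two runs first diverge, which is usually the starting point when debugging a failed workflow or comparing models on the same prompt.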
Alternatives considered
Manually logging outputs from each step
Copy-pasting terminal logs
External observability tools
These approaches are error-prone and not integrated into CodeWhale's workflow system.
Impact
Improves reproducibility of AI workflows
Helps debug complex multi-agent runs
Useful for Model Lab evaluation pipelines
Strong fit for v0.9.0 direction (workflow + execution layer maturity)
Additional context
This aligns closely with:
WhaleFlow runtime execution model
Model Lab evaluation workflows
Harness Creator pipeline (inspect → evaluate → promote)
A structured run-trace system would significantly improve transparency and debugging across all layers.