Feat/whaleflow cost tracking#2486
Conversation
New crate crates/whaleflow providing declarative JSON-config-driven sub-agent swarm orchestration for CodeWhale. Inspired by Claude Code's Dynamic Workflows (Opus 4.8, May 2026). - WorkflowConfig JSON schema with phases, tasks, dependencies - Topological scheduler with semaphore-based concurrency control - File-scope conflict detection for parallel write safety - Git worktree isolation per task (create → extract → apply → clean) - Structured WorkflowResult with per-task cost/token tracking - workflow_run tool schema for model invocation - TUI integration via WhaleFlowSpawner (SubAgentManager bridge) - 18 tests: 15 unit + 3 integration
…_blocking - cwd_path() now returns worktree path for Worktree variant (was dead code) - parallel phases now honor Abort failure policy - WorktreeManager git calls wrapped in tokio::spawn_blocking - timeout_secs wired end-to-end with tokio::time::timeout on polling loop - AgentSpawner trait extended with timeout_secs/max_steps parameters - WorkflowRunTool no longer claims ReadOnly capability - unknown agent_type now logs a warning instead of silently defaulting Addresses Greptile review: P1 (blocking Command), P2 (dead timeout_secs)
… tests - max_steps flows through SubAgentSpawnOptions to per-agent step budget - extract_changes errors now logged instead of silently ignored - files_touched populated from worktree diff output - TaskStatus re-exported from whaleflow crate - 3 new tests: abort in parallel phase, abort stops subsequent phases, timeout_secs/max_steps deserialization Addresses Greptile P1 (silent failure on extract_changes)
There was a problem hiding this comment.
Code Review
This pull request introduces WhaleFlow, a declarative multi-agent workflow orchestration framework for CodeWhale, adding the whaleflow crate and integrating it into the TUI with a new workflow_run tool. While the architecture is solid, several critical issues were identified in the review: token and cost tracking metrics are not being propagated from the sub-agent snapshots; nested Results are ignored during worktree patch application and removal, leading to silent failures; background tasks are not explicitly aborted when a phase fails under an abort policy; untracked files are missed during worktree change extraction; and glob pattern normalization in conflict detection fails for patterns like *.rs.
| tokens_used: None, | ||
| cost_usd: None, |
There was a problem hiding this comment.
The tokens_used and cost_usd fields are hardcoded to None in the WhaleFlowSpawner implementation. Since this PR introduces cost tracking and plumbs these fields into SubAgentResult, they should be propagated from the snapshot to the AgentResult so that WhaleFlow can display the accumulated cost data.
| tokens_used: None, | |
| cost_usd: None, | |
| tokens_used: snapshot.tokens_used, | |
| cost_usd: snapshot.cost_usd, |
| if let Err(e) = tokio::task::spawn_blocking( | ||
| move || WorktreeManager::apply_patch(&ws, &p), | ||
| ) | ||
| .await | ||
| .map_err(|e| { | ||
| SpawnError::Internal(format!( | ||
| "spawn_blocking join: {e}" | ||
| )) | ||
| }) { | ||
| tracing::warn!( | ||
| task_id = %task_id, | ||
| error = %e, | ||
| "Failed to apply worktree patch" | ||
| ); | ||
| } |
There was a problem hiding this comment.
The if let Err(e) = ... block only checks the outer Result (the JoinError mapped to SpawnError::Internal), completely ignoring the inner Result returned by WorktreeManager::apply_patch. If the patch application fails, the error will be silently ignored. We should flatten the nested Result to correctly handle and log patch application failures.
let apply_result = tokio::task::spawn_blocking(
move || WorktreeManager::apply_patch(&ws, &p),
)
.await
.map_err(|e| {
SpawnError::Internal(format!(
"spawn_blocking join: {e}"
))
})
.and_then(|res| res);
if let Err(e) = apply_result {
tracing::warn!(
task_id = %task_id,
error = %e,
"Failed to apply worktree patch"
);
}| if let Err(e) = tokio::task::spawn_blocking(move || { | ||
| WorktreeManager::remove(&tid, &ws) | ||
| }) | ||
| .await | ||
| .map_err(|e| { | ||
| SpawnError::Internal(format!("spawn_blocking join: {e}")) | ||
| }) { | ||
| tracing::warn!( | ||
| task_id = %task_id, | ||
| error = %e, | ||
| "Failed to remove worktree" | ||
| ); | ||
| } |
There was a problem hiding this comment.
The if let Err(e) = ... block only checks the outer Result (the JoinError mapped to SpawnError::Internal), completely ignoring the inner Result returned by WorktreeManager::remove. If the worktree removal fails, the error will be silently ignored. We should flatten the nested Result to correctly handle and log cleanup failures.
let remove_result = tokio::task::spawn_blocking(move || {
WorktreeManager::remove(&tid, &ws)
})
.await
.map_err(|e| {
SpawnError::Internal(format!("spawn_blocking join: {e}"))
})
.and_then(|res| res);
if let Err(e) = remove_result {
tracing::warn!(
task_id = %task_id,
error = %e,
"Failed to remove worktree"
);
}| for (task_id, handle) in handles { | ||
| match handle.await { | ||
| Ok(Ok(agent_result)) => { | ||
| self.results.insert(task_id.clone(), agent_result.clone()); | ||
| task_results.push(TaskResult { | ||
| id: task_id, | ||
| status: TaskStatus::Completed, | ||
| summary: Some(truncate(&agent_result.summary, 500)), | ||
| files_touched: agent_result.files_touched, | ||
| error: None, | ||
| }); | ||
| } | ||
| Ok(Err(spawn_err)) => { | ||
| warn!(task = %task_id, error = %spawn_err, "task failed"); | ||
| task_results.push(TaskResult { | ||
| id: task_id.clone(), | ||
| status: TaskStatus::Failed, | ||
| summary: None, | ||
| files_touched: vec![], | ||
| error: Some(spawn_err.to_string()), | ||
| }); | ||
| if phase.on_failure == FailurePolicy::Abort { | ||
| break; | ||
| } | ||
| } | ||
| Err(join_err) => { | ||
| warn!(task = %task_id, error = %join_err, "task panicked"); | ||
| task_results.push(TaskResult { | ||
| id: task_id.clone(), | ||
| status: TaskStatus::Failed, | ||
| summary: None, | ||
| files_touched: vec![], | ||
| error: Some(format!("join error: {}", join_err)), | ||
| }); | ||
| if phase.on_failure == FailurePolicy::Abort { | ||
| break; | ||
| } | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
When phase.on_failure == FailurePolicy::Abort is triggered, the remaining spawned tasks are dropped but NOT aborted. In Tokio, dropping a JoinHandle does not cancel the background task, meaning the remaining sub-agents will continue to run in the background, consuming resources and API tokens. We should use a VecDeque to track the handles and explicitly call h.abort() on all remaining tasks when aborting.
let mut handles: std::collections::VecDeque<_> = handles.into();
while let Some((task_id, handle)) = handles.pop_front() {
match handle.await {
Ok(Ok(agent_result)) => {
self.results.insert(task_id.clone(), agent_result.clone());
task_results.push(TaskResult {
id: task_id,
status: TaskStatus::Completed,
summary: Some(truncate(&agent_result.summary, 500)),
files_touched: agent_result.files_touched,
error: None,
});
}
Ok(Err(spawn_err)) => {
warn!(task = %task_id, error = %spawn_err, "task failed");
task_results.push(TaskResult {
id: task_id.clone(),
status: TaskStatus::Failed,
summary: None,
files_touched: vec![],
error: Some(spawn_err.to_string()),
});
if phase.on_failure == FailurePolicy::Abort {
for (_, h) in handles {
h.abort();
}
break;
}
}
Err(join_err) => {
warn!(task = %task_id, error = %join_err, "task panicked");
task_results.push(TaskResult {
id: task_id.clone(),
status: TaskStatus::Failed,
summary: None,
files_touched: vec![],
error: Some(format!("join error: {}", join_err)),
});
if phase.on_failure == FailurePolicy::Abort {
for (_, h) in handles {
h.abort();
}
break;
}
}
}
}| let output = Command::new("git") | ||
| .arg("-C") | ||
| .arg(&worktree_path) | ||
| .arg("diff") | ||
| .arg("HEAD") | ||
| .output() | ||
| .map_err(|e| { | ||
| SpawnError::WorktreeError(format!("git diff in worktree failed: {}", e)) | ||
| })?; |
There was a problem hiding this comment.
extract_changes runs git diff HEAD which does not capture untracked files. Any new files created by the sub-agent in the worktree will be completely ignored and lost when the worktree is deleted. We should run git add -A before running git diff HEAD to ensure all changes are captured.
| let output = Command::new("git") | |
| .arg("-C") | |
| .arg(&worktree_path) | |
| .arg("diff") | |
| .arg("HEAD") | |
| .output() | |
| .map_err(|e| { | |
| SpawnError::WorktreeError(format!("git diff in worktree failed: {}", e)) | |
| })?; | |
| // Stage all changes (including untracked files) so they are captured by git diff | |
| let _ = Command::new("git") | |
| .arg("-C") | |
| .arg(&worktree_path) | |
| .arg("add") | |
| .arg("-A") | |
| .output(); | |
| let output = Command::new("git") | |
| .arg("-C") | |
| .arg(&worktree_path) | |
| .arg("diff") | |
| .arg("HEAD") | |
| .output() | |
| .map_err(|e| { | |
| SpawnError::WorktreeError(format!("git diff in worktree failed: {}", e)) | |
| })?; |
| /// `src/*/handler.rs` strips the `*` and checks prefix boundaries. | ||
| fn scopes_overlap(a: &[String], b: &[String]) -> bool { | ||
| if a.is_empty() || b.is_empty() { | ||
| return false; | ||
| } | ||
|
|
||
| /// Strip glob wildcards from the end of a pattern, stopping before | ||
| /// the last directory separator so path-boundary matching works. | ||
| fn normalize_pattern(s: &str) -> String { | ||
| // Remove trailing globs: /** → nothing, /* → nothing, /*.rs → nothing | ||
| let mut p = s.trim_end_matches('/').to_string(); | ||
| while p.ends_with("**") || p.ends_with('*') { | ||
| let trimmed = p.trim_end_matches("**").trim_end_matches('*'); | ||
| if trimmed.len() == p.len() { | ||
| break; | ||
| } |
There was a problem hiding this comment.
The normalize_pattern function fails to correctly normalize glob patterns that contain wildcards followed by other characters (such as *.rs), resulting in false negatives for conflict detection. We should truncate the pattern at the first wildcard character to correctly extract the static directory prefix.
/// Strip glob wildcards from the end of a pattern, stopping before
/// the last directory separator so path-boundary matching works.
fn normalize_pattern(s: &str) -> String {
let mut p = s.to_string();
if let Some(idx) = p.find(|c| c == '*' || c == '?' || c == '[') {
p.truncate(idx);
}
let mut p = p.trim_end_matches('/').to_string();
if p.is_empty() {
p = ".".to_string();
}
p
}|
Thanks @AdityaVG13. I am tracking this with #2482 rather than merging it separately for v0.9 right now. The cost/token visibility goal is useful, but this branch currently inherits the larger WhaleFlow draft surface and its own review points mean the stated propagation is not release-ready yet: For a future narrow harvest, the safest split is likely: first land sub-agent accounting where the runtime actually records token/cost data, then add |
Adds @AdityaVG13 to the contribution-gate allowlist now that WhaleFlow #2482/#2486 have been harvested into the maintained v0.9 IR/TraceStore foundation with public credit.
|
Thanks @AdityaVG13. I opened #2821 as a narrow v0.9 maintainer harvest of the safe typed-IR portion that overlaps this WhaleFlow/cost-tracking direction. #2821 keeps the change metadata-only: Your broader direction is still credited in the commit trailer, changelog, and PR body; I am leaving the larger workflow work open rather than pretending the maintainer slice covers it all. |
Add the explicit WorkflowSpec/WorkflowNode metadata surface requested for the v0.9 WhaleFlow IR, including budget, permission, model, and promotion policy records plus serde roundtrip coverage. Runtime execution, replay, and worktree application remain out of scope. Refs #2668, #2482, #2486. Co-authored-by: AdityaVG13 <44177453+AdityaVG13@users.noreply.github.com>
Add a crate-local mock executor over WorkflowSpec that records leaf, branch, and control-node results for Sequence, BranchSet, Leaf, Reduce, TeacherReview, LoopUntil, Cond, and Expand. Add reducer scaffolding for BranchTournament and ParetoFrontier, plus #2669 acceptance-style tests, without exposing workflow_run, spawning agents, or applying worktrees. Refs #2669. Harvests narrow WhaleFlow executor intent from #2482/#2486. Co-authored-by: AdityaVG13 <44177453+AdityaVG13@users.noreply.github.com>
|
Thanks again @AdityaVG13. I opened #2827 as another narrow v0.9 maintainer harvest from this cost-tracking direction. What #2827 lands:
I am still leaving this broader draft PR open as the source branch for the larger WhaleFlow/runtime cost-tracking work. The unsafe pieces called out earlier remain out of scope for #2827, especially runtime spawner wiring, worktree extraction, and cancellation semantics. Local verification for #2827: |
What
Adds
tokens_used: Option<u64>andcost_usd: Option<f64>fields toSubAgentResultso WhaleFlow (and the TUI agents pane) can displayper-agent API cost data.
Currently
AgentResult::tokens_usedandAgentResult::cost_usdarealways
Nonebecause the sub-agent infrastructure doesn't track them.This PR adds the accumulator fields to
SubAgentand plumbs them intoSubAgentResult::snapshot().Why
Without cost tracking, WhaleFlow users can't see how much each task in
a swarm cost. For a 20-agent security audit, you'd want to know which
agents burned the most tokens. The fields are already defined on
AgentResult— they're just never populated.Scope
tokens_used: u64andcost_usd: f64toSubAgentSubAgentResult::snapshot()asOptionAgentResultvia the spawnerThe actual per-step accumulation in
run_subagent_taskis left for afollow-up — this PR sets up the data structures so accumulation can
be wired in one place.
Related
Testing
cargo fmt --all -- --checkcargo clippy --workspace --all-targets --all-featurescargo test --workspace --all-featuresChecklist
Greptile Summary
This PR introduces the
codewhale-whaleflowcrate — a declarative multi-agent workflow orchestration layer — and wires it into the TUI via a newWorkflowRunTool. It also addstokens_used/cost_usdaccumulator fields toSubAgent/SubAgentResultand amax_stepsper-spawn override toSubAgentSpawnOptions.codewhale-whaleflowcrate: implementsWorkflowConfig(phases, tasks, isolation modes), a topological scheduler with a shared concurrency semaphore, git worktree lifecycle management (create → extract → apply → remove), and theAgentSpawnertrait that decouples orchestration logic from TUI runtime details.crates/tui/src/tools/workflow/mod.rs): implementsAgentSpawnerviaWhaleFlowSpawner, which spawns background sub-agents throughSubAgentManager, polls for completion, and handles worktree patch extraction and apply.SubAgent,SubAgentResult): addstokens_used: u64andcost_usd: f64fields initialized to zero; the spawner plumbing that was supposed to forward these values toAgentResultinstead hardcodesNone, so no cost data reaches WhaleFlow even after the fields are wired up elsewhere.Confidence Score: 3/5
Not ready to merge: the spawner hardcodes None for the cost fields the PR exists to propagate, and the worktree extract-changes path silently discards any git commits made by a sub-agent.
Two concrete correctness gaps stand out. First,
WhaleFlowSpawner::spawn()returnstokens_used: Noneandcost_usd: Noneunconditionally even thoughSubAgentResultnow carries those fields — the propagation step the PR claims to implement does nothing. Second,WorktreeManager::extract_changesrunsgit diff HEADinside the worktree; when a sub-agent commits inside that worktree,HEADadvances to the new commit, sogit diff HEADproduces an empty or incomplete patch and the committed work is silently lost when the worktree is removed.crates/tui/src/tools/workflow/mod.rs(cost field propagation) andcrates/whaleflow/src/worktree.rs(extract_changes commit coverage) need fixes before merge.Important Files Changed
git diff HEADwhich only captures staged/unstaged changes and silently drops any commits made by the sub-agent inside the worktreeSequence Diagram
sequenceDiagram participant Model as Orchestrator Model participant WRT as WorkflowRunTool participant Sched as Scheduler participant Spawner as WhaleFlowSpawner participant WTM as WorktreeManager participant SAM as SubAgentManager Model->>WRT: workflow_run(config JSON) WRT->>Sched: execute_workflow(config, spawner) Sched->>Sched: validate + topological sort phases loop For each phase (in dependency order) Sched->>Spawner: spawn(task_id, prompt, cwd, timeout, max_steps) alt "isolation = Worktree" Spawner->>WTM: create(task_id, workspace) WTM-->>Spawner: worktree path end Spawner->>SAM: spawn_background_with_assignment_options SAM-->>Spawner: agent_id loop Poll until terminal Spawner->>SAM: get_result(agent_id) SAM-->>Spawner: "SubAgentResult (tokens_used⚠=0, cost_usd⚠=0)" end alt Completed + Worktree Spawner->>WTM: extract_changes [git diff HEAD ⚠misses commits] WTM-->>Spawner: patch Spawner->>WTM: apply_patch(workspace, patch) Spawner->>WTM: remove(task_id) end Spawner-->>Sched: "AgentResult(tokens_used=None⚠, cost_usd=None⚠)" end Sched-->>WRT: WorkflowResult JSON WRT-->>Model: tool resultComments Outside Diff (4)
crates/tui/src/tools/workflow/mod.rs, line 459-469 (link)None, defeating the PR's primary goalThe PR's stated scope includes "Propagates to WhaleFlow's
AgentResultvia the spawner," butWhaleFlowSpawner::spawn()hardcodestokens_used: Noneandcost_usd: Noneeven though those fields are now available onsnapshot. SinceSubAgentResult(returned bymgr.get_result()) now carriestokens_used: Option<u64>andcost_usd: Option<f64>, those values should be forwarded here:tokens_used: snapshot.tokens_usedandcost_usd: snapshot.cost_usd. Without this, WhaleFlow's per-agent cost display will always show nothing regardless of whether the follow-up accumulation work is done.crates/whaleflow/src/worktree.rs, line 305-327 (link)git diff HEADsilently drops commits made inside the worktreeextract_changesrunsgit diff HEADin the worktree. When a worktree is created viagit worktree add <path> HEAD, itsHEADis detached at the current commit. If the sub-agent callsgit commitinside the worktree, the detachedHEADadvances to the new commit, sogit diff HEADcompares the working tree to that new commit — the committed changes never appear in the patch and are silently discarded when the worktree is removed. A sub-agent that stages and commits its work (a common pattern) will lose all of that work. The diff should be taken against the original starting commit (captured at worktree-creation time), e.g.git diff <original-sha>or by combininggit diff HEADfor uncommitted changes withgit logto detect new commits.crates/whaleflow/src/scheduler.rs, line 1612-1658 (link)When
FailurePolicy::Abortfires inside the sequential handle-collection loop, the remainingJoinHandles inhandlesare dropped. Dropping a TokioJoinHandledetaches — it does NOT cancel the spawned task. The background agents keep polling, hold their semaphore permits, consume API tokens, and (if worktree-isolated) never receive cleanup signals. A cancellation token or explicitabort()on each handle would be needed to actually stop the running agents.crates/whaleflow/src/config.rs, line 831-843 (link)depends_on_resultsfrom the same parallel phase silently injects empty contextValidation only checks that
depends_on_resultsreferences a known task ID, but does not verify that the referenced task belongs to an earlier (already completed) phase. If a parallel-phase task lists a sibling task independs_on_results,build_promptwill find no entry inself.resultsfor it (the sibling runs concurrently) and silently emit"### sibling-id (not available)". The model receives a prompt that looks like it has context but has none, with no warning. Consider rejectingdepends_on_resultsreferences that point to tasks in the same phase (or later phases) during validation.Reviews (1): Last reviewed commit: "feat(whaleflow): add tokens_used and cos..." | Re-trigger Greptile