
Optimize task ordering through LPT scheduling #117

Open
clkao wants to merge 3 commits into dbt-labs:main from clkao:feature/duration-hints-lpt

@clkao
Contributor

@clkao clkao commented Feb 6, 2026

Cuts the wall-clock time of a sage run from 7m57s to 7m13s.

Summary

  • Add --duration-hints CLI option pointing to a reference experiment directory
  • Load runtime_ms from that experiment's results.json and sort tasks by descending duration before submitting to the thread pool
  • Tasks without a hint are scheduled first (pessimistic: an unknown task might be slow)
  • Reduces total wall-clock time when running multiple concurrent trials, because slow tasks no longer start late and stretch out the end of the run
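The summary above can be sketched in Python roughly as follows. This is a minimal sketch, not the PR's code: `load_duration_hints` and `order_tasks_lpt` are hypothetical names, and the layout of `results.json` (a JSON list of objects carrying `task_id` and `runtime_ms`) is an assumption.

```python
import json
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def load_duration_hints(experiment_dir: str) -> dict[str, int]:
    """Map task_id -> runtime_ms from <experiment_dir>/results.json.

    ASSUMPTION: results.json is a list of objects with "task_id" and
    "runtime_ms"; the real file layout may differ.
    """
    path = Path(experiment_dir) / "results.json"
    try:
        records = json.loads(path.read_text())
    except (OSError, json.JSONDecodeError) as exc:
        # Missing or unreadable reference experiment: warn, use default order.
        logger.warning("duration hints unavailable (%s); using default order", exc)
        return {}
    return {
        r["task_id"]: int(r["runtime_ms"])
        for r in records
        if "task_id" in r and r.get("runtime_ms") is not None
    }


def order_tasks_lpt(task_ids: list[str], hints: dict[str, int]) -> list[str]:
    """Longest Processing Time first; tasks without a hint go to the front."""
    if not hints:
        return list(task_ids)  # no hints at all: keep the original order
    # Sort key (has_hint, -runtime): False sorts before True, so unknown tasks
    # come first; among hinted tasks the longest runtime comes first.
    # sorted() is stable, so unknown tasks keep their relative order.
    return sorted(task_ids, key=lambda t: (t in hints, -hints.get(t, 0)))
```

For example, `order_tasks_lpt(["a", "b", "c"], {"a": 100, "b": 300})` yields `["c", "b", "a"]`: `c` has no hint and goes first, then `b` (300 ms) before `a` (100 ms). The ordered list would then be submitted one task at a time to the thread pool.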

Test plan

  • ade run all --db duckdb --project-type dbt --agent sage --duration-hints experiments/claude-haiku-4-5-20251001-latest — confirm LPT task order: log line shows descending runtimes
  • ade run all --db duckdb --project-type dbt --agent sage — confirm no LPT log, behavior unchanged
  • --duration-hints experiments/nonexistent — confirm warning logged, falls back to default order
  • uv run python -m pytest tests/test_duration_hints.py -v — 6 unit tests pass

🤖 Generated with Claude Code

clkao and others added 2 commits February 6, 2026 15:33
Sort tasks by descending estimated runtime before submitting to the
thread pool, so the slowest tasks start first and reduce tail latency.
Hints are loaded from a reference experiment's results.json.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
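Why starting the slowest tasks first reduces tail latency can be checked with a toy simulation. The sketch below assumes a FIFO pool (as `concurrent.futures.ThreadPoolExecutor` is) where each idle worker takes the next queued task; the durations are made-up illustration values, not measurements from this PR.

```python
import heapq


def makespan(durations: list[float], workers: int) -> float:
    """Wall-clock time when tasks are handed out in list order and each
    idle worker takes the next queued task (FIFO pool behaviour)."""
    free_at = [0.0] * workers  # time at which each worker next becomes idle
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)  # the earliest-idle worker takes the task
        heapq.heappush(free_at, start + d)
    return max(free_at)


toy = [1, 1, 1, 1, 4]  # hypothetical task durations
print(makespan(toy, 2))                        # slow task submitted last
print(makespan(sorted(toy, reverse=True), 2))  # LPT: slow task submitted first
```

With two workers, submitting the 4-unit task last finishes at time 6 (it starts only at time 2), while LPT order finishes at time 4: the long task runs while the other worker drains the short ones.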
@RobertIsmo
Collaborator

Do you have a profiler run you can share?

@joellabes
Collaborator

I like the idea of this working magically, but dislike having to specify a directory every time. What about a well-known file like experiments/timing/timing_history.json, which is just key-value pairs of task name and last known runtime?
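A minimal sketch of reading such a file without any CLI flag. The function name `load_timing_history` and the flat `{"task_id": last_runtime_ms}` layout are hypothetical; the resulting map would feed the same descending-runtime sort the PR applies to `results.json` hints.

```python
import json
from pathlib import Path

# Well-known path suggested in the comment; the flat
# {"task_id": last_runtime_ms, ...} layout is an assumption.
TIMING_HISTORY = Path("experiments/timing/timing_history.json")


def load_timing_history(path: Path = TIMING_HISTORY) -> dict[str, int]:
    """Return task_id -> last known runtime in ms, or {} if there is no history yet."""
    try:
        data = json.loads(path.read_text())
    except FileNotFoundError:
        return {}  # first invocation: no hints, so the default order is kept
    return {str(task_id): int(ms) for task_id, ms in data.items()}
```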

@clkao
Contributor Author

clkao commented Feb 9, 2026

@joellabes we can also default to the latest run in experiments, but this gets annoying if the latest one is a subset and you're running the full suite.

Another reason for pointing at a specific experiment: with different plugin sets, tasks might have different durations.
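Defaulting to the latest run in experiments could be sketched as below. `latest_experiment` is a hypothetical helper, and it assumes each experiment is a subdirectory holding a `results.json`, with "latest" judged by that file's modification time.

```python
from pathlib import Path
from typing import Optional


def latest_experiment(root: str = "experiments") -> Optional[Path]:
    """Most recently modified <root>/<run>/ that contains a results.json.

    ASSUMPTION: one subdirectory per experiment, each with results.json.
    """
    base = Path(root)
    if not base.is_dir():
        return None
    candidates = [p for p in base.iterdir() if (p / "results.json").is_file()]
    if not candidates:
        return None
    # Pick the run whose results.json was written most recently.
    return max(candidates, key=lambda p: (p / "results.json").stat().st_mtime)
```

This shows the drawback raised above: if the latest run covered only a subset of tasks, every task missing from it has no hint and is treated as unknown (scheduled first), so a full-suite run gets little benefit.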

@joellabes
Collaborator

> default to the latest run in experiments, but this gets annoying if the latest one is a subset and you're running the full suite

Yeah, that well-known file wouldn't hold just the most recent run; it'd be updated by every invocation. I figured each run would overwrite only the tasks it evaluated, with their most recent timing, so it might look like:

{
  "timings": [
    {"task_id": "airbnb001", "last_run_ms": 23819},
    {"task_id": "airbnb002", "last_run_ms": 31603},
    {"task_id": "analytics_engineering001", "last_run_ms": 4071}
  ]
}
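The "overwrite only the tasks that were evaluated" update could be sketched as a read, merge, and write of the file shape shown above. The function names are hypothetical, and the merge is kept pure so it is easy to test separately from file I/O.

```python
import json
from pathlib import Path


def merge_timings(history: dict[str, int], new_runs: dict[str, int]) -> dict[str, int]:
    """Overwrite only the tasks evaluated in this invocation; keep every other entry."""
    merged = dict(history)
    merged.update(new_runs)
    return merged


def to_document(timings: dict[str, int]) -> dict:
    """Serialize into the {"timings": [...]} shape, sorted by task_id."""
    return {
        "timings": [
            {"task_id": task_id, "last_run_ms": ms}
            for task_id, ms in sorted(timings.items())
        ]
    }


def update_timing_history(path: Path, new_runs: dict[str, int]) -> dict[str, int]:
    """Read the history file (if any), merge this run's timings, write it back."""
    history: dict[str, int] = {}
    if path.exists():
        doc = json.loads(path.read_text())
        history = {t["task_id"]: t["last_run_ms"] for t in doc.get("timings", [])}
    merged = merge_timings(history, new_runs)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write a sibling temp file, then rename it over the target, so a reader
    # never sees a half-written file.
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(to_document(merged), indent=2) + "\n")
    tmp.replace(path)
    return merged
```

The rename only guards against torn reads; two invocations finishing at the same moment could still lose one run's update, which is probably acceptable for scheduling hints.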

> with different plugin sets, tasks might have different duration

I guess? If so, you'd only have a single suboptimal run before it got overwritten with up-to-date stats, assuming you're iterating on one plugin type at a time. That JSON file could be keyed by many different dimensions (plugin set, agent, database, project type, prompt hash...) if you want, but higher cardinality reduces the number of hits you get 🤷‍♂️ ⚖️
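One way to trade cardinality against hit rate is to store each timing under several keys of increasing specificity and look up the most specific key that exists, falling back to coarser ones. This is a hypothetical sketch; the dimension names and their order are assumptions drawn from the list in the comment.

```python
from typing import Optional

# Hypothetical dimension order: dropping dimensions from the end widens the match.
DIMENSIONS = ("task_id", "agent", "db", "project_type", "plugin_set")


def make_key(values: dict[str, str], depth: int) -> str:
    """Join the first `depth` dimensions into one lookup key."""
    return "|".join(values[d] for d in DIMENSIONS[:depth])


def record_runtime(history: dict[str, int], values: dict[str, str], runtime_ms: int) -> None:
    """Store the timing under every prefix, so coarse keys also get hits."""
    for depth in range(1, len(DIMENSIONS) + 1):
        history[make_key(values, depth)] = runtime_ms


def lookup_runtime(history: dict[str, int], values: dict[str, str]) -> Optional[int]:
    """Try the most specific key first, then fall back to coarser ones."""
    for depth in range(len(DIMENSIONS), 0, -1):
        key = make_key(values, depth)
        if key in history:
            return history[key]
    return None
```

A new plugin set then still gets a hint from the same task under the same agent, database, and project type, while an exact match is preferred whenever one exists.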


3 participants