feat(helixops_saas): add shared dbt project and 17 benchmark tasks by joellabes · Pull Request #137 · dbt-labs/ade-bench

joellabes · 2026-03-19T02:26:00Z

Summary

- Adds the helixops_saas shared dbt project (28 models: 9 staging, 11 intermediate, 8 marts) with a pre-built DuckDB database
- Adds 17 benchmark tasks (helixops_saas001–helixops_saas017) covering a wide range of dbt agent challenges

Adds the OpsPilot SaaS analytics benchmark project as a new shared dbt project in ADE-bench. The project contains 28 models (11 staging, 11 intermediate, 6 marts) over an intentionally messy 11-table seed dataset covering accounts, workspaces, users, subscriptions, invoices, payments, and support tickets.

Key setup decisions:
- Seeds converted to dbt sources (source() refs, not ref()) to match existing ADE-bench shared project conventions
- Raw data loaded into shared/databases/duckdb/ops_pilot.duckdb as VARCHAR columns to preserve intentional messiness
- Hardcoded date anchors replaced with now() in three intermediate models (int_account_users, int_account_engagement, int_support_sla)
- DuckDB-only for now; Snowflake migration deferred

Verified: dbt run 28/28 PASS with dbt-core==1.10.11 + dbt-duckdb==1.9.3

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

25-task plan for creating ADE-bench tasks against the ops_pilot shared project. Covers all benchmark task ideas from the original handoff doc: Type A (remove-and-restore), Type B (genuine addition), and Type C (logic change). Includes common patterns, file manifest, and end-to-end verification steps.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…project

Creates task directories helixops_saas001-017 covering:
- Type A (remove-and-restore): billing_country, owner_team propagation, api_calls DAG propagation
- Type B (genuine addition): net_mrr trap, geo_segment, filter archived workspaces, dim_accounts_v2, SLA seeds
- Type C (logic fix): sandbox filter, department infer trap, email rename, Falcon Works sbx bug, rename+propagate, onboarding fees
- Refactor: inline int_monthly_revenue_prep as CTE
- Already-done: department (no-op, expected-pass for none agent)

Each task has task.yaml, setup.sh, solution.sh, solutions/ SQL, and tests/.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… diffs

Convert all 17 helixops_saas tasks from heredoc/cp-based setup and solution scripts to unified patch files, following the convention introduced in PR #139.

- setup.sh: uses `patch -p1 < /app/setup/changes.patch` for tasks with file modifications (001-006, 011, 014); tasks with no file changes are unchanged
- solution.sh: uses `patch -p1 < /sage/solutions/changes.patch` for all tasks
- setup/changes.patch: unified diff from shared project baseline to broken state
- solutions/changes.patch: unified diff from broken state to correct solution
- Removes ~1500 lines of boilerplate SQL files from solutions/ directories

Includes new file creation (009 dim_accounts_v2, 015/016 seeds) and file deletion (012 removes int_monthly_revenue_prep) using /dev/null patch syntax.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
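The /dev/null convention mentioned in this commit is how a unified diff expresses file creation: the old side of the diff is labelled /dev/null and the hunk starts at line 0. A minimal sketch using Python's standard `difflib`, assuming a hypothetical model path and placeholder SQL (neither is taken from the PR's actual patches):

```python
import difflib

# Hypothetical new file; the path and SQL are placeholders, not the PR's contents.
new_path = "models/marts/dim_accounts_v2.sql"
new_lines = [
    "select *\n",
    "from {{ ref('dim_accounts') }}\n",
]

# File creation diffs an empty "old" side, labelled /dev/null, against the new content.
patch_text = "".join(
    difflib.unified_diff([], new_lines, fromfile="/dev/null", tofile="b/" + new_path)
)
print(patch_text)
```

When such a patch is applied with `patch -p1`, the leading `b/` component is stripped, so the file is created relative to the project root. A deletion is the mirror image, with /dev/null on the new side.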

…ken-state patches

The shared helixops_saas.duckdb will only contain raw tables (matching the airbnb pattern). Setup scripts for tasks 001-005, 011, and 013 previously ran dbt on a partial model selection, relying on pre-built views in the DB.

Change all partial `dbt run --select ...` calls to `dbt run || true` so the full project is built from raw tables before the patch establishes the broken state. Task 013's DB-mutation-only setup also gets a full `dbt run`.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Remove dummy.sql from 11 tasks (was a placeholder to ensure tasks always failed before solution seeds were generated; SELECT 1 always returns a row which fails dbt tests, and the semicolon caused parser errors in dbt-fusion)
- Add AUTO_*_equality.sql and AUTO_*_existence.sql tests for all 17 tasks
- Add solution seed CSVs generated by sage agent for all 17 tasks
- Add ade_bench_equality_test.sql macros and _no-op.txt for all 17 tasks

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- harness.py: SET TimeZone = 'UTC' before COPY TO CSV to fix seeds generated on UTC+12 machines being 12h ahead of CI (UTC) values
- helixops_saas014/solutions/changes.patch: add missing stg_workspace_usage_daily hunk so sage can restore api_calls at the source; without this, the downstream int_workspace_daily_metrics.u.api_calls reference caused a compile error and total_api_calls_not_null test always failed
- helixops_saas005/tests: exclude days_since_last_login from equality comparison since it uses now() and changes daily
- Regenerated seeds for tasks 004, 005, 010, 014, 017 with UTC timezone fix

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
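The timezone issue fixed in harness.py can be illustrated without DuckDB. The sketch below uses only the standard library and assumes, as a simplification, that a timezone-aware timestamp is rendered to text in the exporting session's zone; the instant and the `render_for_csv` helper are illustrative, not taken from the harness:

```python
from datetime import datetime, timedelta, timezone

# One absolute instant, as a timezone-aware timestamp would represent it.
instant = datetime(2026, 3, 19, 2, 26, 0, tzinfo=timezone.utc)

utc = timezone.utc
utc_plus_12 = timezone(timedelta(hours=12))  # stand-in for a UTC+12 developer machine


def render_for_csv(ts, session_tz):
    """Render a timestamp as wall-clock text in the given session time zone."""
    return ts.astimezone(session_tz).strftime("%Y-%m-%d %H:%M:%S")


ci_value = render_for_csv(instant, utc)
local_value = render_for_csv(instant, utc_plus_12)
print(ci_value)     # 2026-03-19 02:26:00
print(local_value)  # 2026-03-19 14:26:00
```

The two strings describe the same instant but differ by 12 hours as text, which is enough to break a row-level equality test against a seed CSV. Pinning the session to UTC before export removes the dependence on the machine that generated the seed.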

AUTO_*.sql test files are regenerated on every harness run, so cols_to_exclude must be set in task.yaml's solution_seeds config rather than directly in the test SQL.

- helixops_saas005: exclude days_since_last_login (uses now())
- helixops_saas015: exclude ticket_age_days (uses now())
- helixops_saas016: exclude ticket_age_days (uses now())

Regenerated seeds for all three tasks with UTC timezone fix.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
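The effect of cols_to_exclude on an equality comparison can be sketched in a few lines. This is not the ADE-bench equality macro; `rows_match` and the sample rows are hypothetical, and the comparison is a plain order-insensitive match over row dictionaries:

```python
def rows_match(expected, actual, cols_to_exclude=()):
    """Compare two lists of row dicts regardless of row order, skipping excluded columns."""
    excluded = set(cols_to_exclude)

    def project(rows):
        projected = [
            tuple(sorted((col, val) for col, val in row.items() if col not in excluded))
            for row in rows
        ]
        # Sort on repr so rows with mixed value types can still be ordered.
        return sorted(projected, key=repr)

    return project(expected) == project(actual)


# Seed captured on one day, model rebuilt a day later: only the now()-derived column moves.
seed = [{"user_id": "u1", "email": "a@example.com", "days_since_last_login": 10}]
rebuilt = [{"user_id": "u1", "email": "a@example.com", "days_since_last_login": 11}]

strict = rows_match(seed, rebuilt)
relaxed = rows_match(seed, rebuilt, cols_to_exclude=["days_since_last_login"])
print(strict, relaxed)  # False True
```

Excluding the column trades coverage for stability: the test no longer checks days_since_last_login at all, which is acceptable only because its value is a function of the current date.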

**helixops_saas008**: solution patch was missing int_account_users.sql (which also selects a.account_status). Patch now renames all 5 occurrences. Seed regenerated with customer_status.

**helixops_saas012**: structural refactor (inline CTE) produces identical data, so equality alone can't distinguish broken from fixed state.
- test_setup now drops the int_monthly_revenue_prep relation before rebuilding fct_monthly_revenue, causing the none-agent build to fail when it still ref()s the dropped view
- Added manifest check test: fails if int_monthly_revenue_prep is still present in graph.nodes (i.e. the file was not deleted)
- Added drop_relation macro used by the test_setup operation

**ci.yml**: add helixops_saas017 to ALLOWED_TO_PASS. It is an intentional no-op task (department already exists) where both none and sage agents correctly produce passing output.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
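The manifest check for helixops_saas012 runs inside dbt against graph.nodes. An equivalent check can be sketched outside dbt against a parsed manifest.json, whose top-level "nodes" mapping carries the same information. The helper name and the trimmed manifests below are hypothetical:

```python
def model_still_defined(manifest, model_name):
    """Return True if any model node in a dbt manifest dict has the given name."""
    return any(
        node.get("resource_type") == "model" and node.get("name") == model_name
        for node in manifest.get("nodes", {}).values()
    )


# Trimmed, hypothetical manifests: real ones carry many more keys per node.
broken = {
    "nodes": {
        "model.helixops_saas.int_monthly_revenue_prep": {
            "resource_type": "model",
            "name": "int_monthly_revenue_prep",
        },
        "model.helixops_saas.fct_monthly_revenue": {
            "resource_type": "model",
            "name": "fct_monthly_revenue",
        },
    }
}
fixed = {
    "nodes": {
        "model.helixops_saas.fct_monthly_revenue": {
            "resource_type": "model",
            "name": "fct_monthly_revenue",
        },
    }
}

print(model_still_defined(broken, "int_monthly_revenue_prep"))  # True
print(model_still_defined(fixed, "int_monthly_revenue_prep"))   # False
```

A structural check like this complements the equality seeds: the data is identical before and after the refactor, so only the project graph can show whether the intermediate model was actually removed.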

Each description now captures the non-obvious aspect an agent must understand to solve the task (e.g. the column source, the trap in the prompt, or the structural requirement) rather than just restating the prompt.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

tasks/helixops_saas004/task.yaml

tasks/helixops_saas006/task.yaml

tasks/helixops_saas007/task.yaml

tasks/helixops_saas008/task.yaml

tasks/helixops_saas010/task.yaml

tasks/helixops_saas010/solution.sh

tasks/helixops_saas012/task.yaml

tasks/helixops_saas013/task.yaml

tasks/helixops_saas014/task.yaml

- 004: add no_hint prompt variant (same challenge, without the hint)
- 006: add compile+grep test_setup check to enforce upstream field reuse
- 007: fix prompt wording (hyphen separated); add equality seeds for int_account_billing_snapshot and mart_account_health
- 008: add equality seeds for dim_accounts, int_account_users, mart_account_health
- 009: rework to use dbt model versioning YAML (latest_version=1, v2 via _models.yml); add graph.nodes manifest check; fix tests to use versioned ref syntax ref('dim_accounts', v=2)
- 010: add stg_workspaces seed to verify staging layer is unchanged; update solution.sh to rebuild stg_workspaces
- 012: no change needed (orphan model check already in place)
- 013: add equality seeds for stg_invoice_line_items and int_invoice_finance
- 014: add equality seeds for all 6 intermediate models in the api_calls chain
- shared/scripts/run-dbt-test.sh: propagate test_setup exit code so compile-based checks can fail the task

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…for upstream field check

When mart_account_360 doesn't reference effective_monthly_value_usd, write a failing singular test file into /app/tests/ so dbt picks it up as a real test failure with proper results page output. Reverts the || exit 1 approach in run-dbt-test.sh which caused an empty test list on the results page.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
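The approach in this commit, turning a compile-time check into a real dbt test failure, can be sketched as follows. The PR implements it in the task's test_setup; the Python helper, the generated file name, and the sample SQL here are all hypothetical, and a temporary directory stands in for /app/tests/:

```python
import tempfile
from pathlib import Path


def enforce_reference(compiled_sql, required_identifier, tests_dir):
    """Write a failing singular test into tests_dir if the identifier is missing.

    Returns the path written, or None when the reference is present.
    """
    if required_identifier.lower() in compiled_sql.lower():
        return None
    test_path = Path(tests_dir) / f"missing_{required_identifier}.sql"
    # A dbt singular test fails when its query returns rows, so one constant
    # row is enough. No trailing semicolon is written.
    test_path.write_text("select 1 as missing_upstream_reference\n")
    return test_path


recalculated = "select contracted_price - discount as net_mrr from billing"
reused = "select b.effective_monthly_value_usd as net_mrr from billing b"

with tempfile.TemporaryDirectory() as tests_dir:
    written = enforce_reference(recalculated, "effective_monthly_value_usd", tests_dir)
    skipped = enforce_reference(reused, "effective_monthly_value_usd", tests_dir)
    written_name = written.name
    written_sql = written.read_text()

print(written_name)  # missing_effective_monthly_value_usd.sql
```

The substring match is deliberately crude: a comment that merely mentions the identifier would satisfy it. Routing the failure through a test file rather than a non-zero exit code keeps the failure visible alongside the other test results.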

tasks/helixops_saas007/solutions/changes.patch

tasks/helixops_saas015/task.yaml

joellabes · 2026-03-26T01:21:55Z

tasks/helixops_saas006/task.yaml

+description: Add net_mrr to mart_account_360 — correct solution reuses an upstream calculated field rather than recalculating from raw inputs
+prompts:
+  - key: base
+    prompt: |-
+      Please add net_mrr to the account 360, based on contracted price less discount, divided by 12 if billed annually.


Let's add another task like this one that requires a formula to be brought through from a grandparent model. It won't be a variant prompt on 006; we'll need to work out what it actually is.
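For reference, the formula stated in the 006 prompt can be worked through numerically. The sketch below is only an illustration of the arithmetic: it assumes the discount is a percentage between 0 and 100 and that annual billing is flagged by the string "annual", neither of which is confirmed by the PR. The intended solution does not recompute this value; it reuses the upstream calculated field.

```python
from decimal import Decimal


def net_mrr(contracted_price_usd, discount_pct, billing_cycle):
    """Contracted price less discount, divided by 12 when billed annually."""
    # Assumption: discount_pct is a percentage in the range 0-100.
    net = Decimal(contracted_price_usd) * (Decimal(100) - Decimal(discount_pct)) / Decimal(100)
    # Assumption: annual contracts are marked with the literal "annual".
    if billing_cycle == "annual":
        return net / Decimal(12)
    return net


annual_example = net_mrr("1200", "10", "annual")    # 1200 less 10% is 1080; 1080 / 12 is 90
monthly_example = net_mrr("100", "10", "monthly")   # 100 less 10% is 90
```

Both examples come out at 90 per month, which shows why an inline recalculation can pass an equality test while still missing the point of the task.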

tasks/helixops_saas007/task.yaml

tasks/helixops_saas008/task.yaml

tasks/helixops_saas010/task.yaml

tasks/helixops_saas011/task.yaml

tasks/helixops_saas012/task.yaml

- saas007: fix geo_segment separator ' / ' -> '-' in patch; add no_location_hint prompt variant
- saas008: add stg_accounts to solution_seeds
- saas010: add int_workspace_roster, int_workspace_daily_metrics, int_support_sla to solution_seeds
- saas011: add hard prompt variant
- saas012: add hard prompt variant
- saas015: add compile+grep check: write failing test if int_support_sla doesn't reference sla_response_targets

Seeds for saas007/008/010 need regeneration via sage --seed run.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…check

Agent created sla_priority_config.csv with 'standard' priority and joined using CASE WHEN ... ELSE 'standard' END, which maps 'low' tickets to 'standard' and produces numerically identical output, so the equality test cannot distinguish the two.

Compile int_support_sla and fail if the compiled SQL contains 'standard' but not 'low': the correct solution joins directly on t.priority so neither literal appears, while the agent's normalization approach contains 'standard'.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…seed and SQL

Two checks in test_setup:
1. grep any non-solution__ seed file for 'standard' (catches agents that create sla_priority_config.csv with 'standard' instead of 'low')
2. grep compiled int_support_sla for literal 'standard' in SQL (catches agents that normalize via CASE WHEN ... ELSE 'standard' END)

Both must be absent for the task to pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
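The two checks can be expressed as a pair of text searches. This is a sketch rather than the PR's test_setup script: the helper names are hypothetical, the seed texts are assumed to have already been filtered to exclude solution__ files, and the seed column name response_target_hours is invented for the example.

```python
import re


def mentions_literal(text, literal):
    """True if text contains the literal as a whole word, case-insensitively."""
    return re.search(r"\b" + re.escape(literal) + r"\b", text, flags=re.IGNORECASE) is not None


def passes_standard_checks(seed_csv_texts, compiled_sql):
    """Both the agent-created seeds and the compiled model must be free of 'standard'."""
    seed_clean = not any(mentions_literal(text, "standard") for text in seed_csv_texts)
    sql_clean = not mentions_literal(compiled_sql, "standard")
    return seed_clean and sql_clean


good_seed = "priority,response_target_hours\nlow,48\nhigh,4\n"
bad_seed = "priority,response_target_hours\nstandard,48\nhigh,4\n"
good_sql = "select t.*, s.response_target_hours from tickets t join sla_response_targets s on s.priority = t.priority"
bad_sql = "select case when t.priority = 'high' then 'high' else 'standard' end as sla_priority from tickets t"

print(passes_standard_checks([good_seed], good_sql))  # True
print(passes_standard_checks([bad_seed], good_sql))   # False
print(passes_standard_checks([good_seed], bad_sql))   # False
```

Searching for a forbidden literal is a blunt instrument, but it targets exactly the failure mode that the equality test cannot see: output that is numerically identical yet built on a remapped priority value.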

- saas007: regenerate 3 seeds with '-' separator (was ' / ')
- saas008: add solution__stg_accounts.csv + regenerate affected seeds
- saas010: add solution__int_workspace_roster/daily_metrics/support_sla.csv

All 4 tasks (including saas007.no_location_hint) pass 38/38 tests with sage.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…_usd pre-exposed

All four raw formula ingredients (billing_cycle, contracted_seats, discount_pct, list_price_usd) are already present in mart_account_360, making inline recalculation maximally tempting. Correct solution still reuses effective_monthly_value_usd from int_account_billing_snapshot.

Setup patch adds list_price_usd to both int_account_billing_snapshot and mart_account_360. Same compile check as saas006.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…made it annoying Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

joellabes and others added 5 commits March 18, 2026 14:09

refactor: rename ops_pilot -> helixops_saas throughout

0766481

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix: add author_email to all helixops_saas task.yaml files

1a2742e

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

joellabes marked this pull request as draft March 19, 2026 03:06

joellabes and others added 8 commits March 23, 2026 15:23

Merge branch 'main' into feature/ops-pilot-shared-project

0389584

joellabes commented Mar 24, 2026


joellabes and others added 2 commits March 25, 2026 07:44

joellabes commented Mar 26, 2026


joellabes and others added 4 commits March 30, 2026 13:52

joellabes marked this pull request as ready for review March 30, 2026 01:57

joellabes and others added 2 commits March 30, 2026 15:39

remove(helixops_saas014): delete task — solution naming disagreement …

9145735

…made it annoying Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

joellabes merged commit 3ffb1c1 into main Mar 30, 2026
9 checks passed

joellabes deleted the feature/ops-pilot-shared-project branch March 30, 2026 06:37

joellabes merged 21 commits into main from feature/ops-pilot-shared-project
