dbt-labs · joellabes · Mar 30, 2026 · Mar 18, 2026
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -162,7 +162,8 @@ jobs:
               results = json.load(f)
 
           # Tasks that are allowed to pass even for the "none" agent
-          ALLOWED_TO_PASS = {"analytics_engineering001"}
+          # (e.g. no-op tasks where the answer is already in place)
+          ALLOWED_TO_PASS = {"analytics_engineering001", "helixops_saas017"}
 
           failed_tasks = []
           passed_tasks = []
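
Reviewer note: the allow-list above only makes sense alongside the check that consumes it. A minimal sketch of that check follows, assuming each entry in the results file carries a `task_id` and an `is_resolved` flag. Both key names and the `unexpected_passes` helper are hypothetical, not taken from the workflow.

```python
# Hypothetical result shape: the real results file may use different keys.
ALLOWED_TO_PASS = {"analytics_engineering001", "helixops_saas017"}


def unexpected_passes(results):
    """Return task IDs that passed for the "none" agent without being allow-listed."""
    passed = {r["task_id"] for r in results if r["is_resolved"]}
    return sorted(passed - ALLOWED_TO_PASS)


# Illustrative data only, not real benchmark output.
sample = [
    {"task_id": "helixops_saas001", "is_resolved": False},
    {"task_id": "helixops_saas017", "is_resolved": True},
    {"task_id": "helixops_saas004", "is_resolved": True},
]
print(unexpected_passes(sample))
```

With this sample, `helixops_saas017` is tolerated because it is a no-op task, while `helixops_saas004` would be flagged as a task that a do-nothing agent should not be able to pass.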

diff --git a/ade_bench/harness.py b/ade_bench/harness.py
@@ -1168,6 +1168,7 @@ def _extract_duckdb_csv(
                 import duckdb
 
                 con = duckdb.connect(temp_db_path)
+                con.execute("SET TimeZone = 'UTC';")
 
                 # Collect all schema information
                 all_schemas = []
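
Reviewer note: pinning the session time zone presumably keeps extracted timestamps stable across machines, since a zone-aware value rendered as a plain timestamp (as in the `epoch_to_timestamp` macro's `to_timestamp(...)::timestamp`) takes its wall-clock time from the session zone. The sketch below reproduces that effect with the Python standard library only. `naive_timestamp` is a hypothetical helper, and the UTC-5 offset is an arbitrary example.

```python
from datetime import datetime, timedelta, timezone


def naive_timestamp(epoch_seconds, session_tz):
    """Render an epoch in session_tz, then drop the zone (like a cast to plain timestamp)."""
    return datetime.fromtimestamp(epoch_seconds, tz=session_tz).replace(tzinfo=None)


epoch = 1750060800  # 2025-06-16 08:00:00 UTC
utc_value = naive_timestamp(epoch, timezone.utc)
shifted_value = naive_timestamp(epoch, timezone(timedelta(hours=-5)))

print(utc_value, shifted_value)
```

The same instant yields two different naive values, so a CSV extract compared against a solution seed would differ depending on the host's zone unless the session is pinned.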

diff --git a/docs/superpowers/plans/2026-03-18-helixops-saas-tasks.md b/docs/superpowers/plans/2026-03-18-helixops-saas-tasks.md
@@ -0,0 +1,125 @@
+# helixops_saas Benchmark Tasks Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Create 25 ADE-bench benchmark tasks that test an AI agent's ability to make targeted dbt model changes against the helixops_saas shared project.
+
+**Architecture:** Each task lives in `tasks/helixops_saas0NN/` and contains a `task.yaml`, `setup.sh`, `solution.sh`, a `solutions/` directory with correct SQL, and a `tests/` directory with SQL validation queries. The shared project (`shared/projects/dbt/helixops_saas`) and database (`shared/databases/duckdb/helixops_saas.duckdb`) are referenced but not modified by individual tasks. `setup.sh` puts the project into a broken or incomplete state, the agent's job is to fix it, and `solution.sh` is the answer key.
+
+**Tech Stack:** bash, dbt-core 1.10.11, dbt-duckdb 1.9.3, DuckDB 1.3.0. DuckDB-only for now (Snowflake deferred). No external dbt packages.
+
+---
+
+## Task Classification
+
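+Most tasks fall into one of three categories, based on the model changes required. The task table also uses `refactor` and `already-done` for tasks that fit none of them: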
+
+**Type A — Remove-and-restore:** The field already exists in the correct model. `setup.sh` removes it with `sed`. The agent must identify what is missing and restore it. `solution.sh` copies back the full correct SQL.
+
+**Type B — Genuine addition:** The field does not currently exist. `setup.sh` runs a baseline dbt build. The agent must implement a new column from scratch. `solution.sh` applies the correct implementation.
+
+**Type C — Logic change:** The field exists but with wrong logic. `setup.sh` applies the broken logic. The agent must fix it. `solution.sh` restores correct logic.
+
+---
+
+## Common Patterns
+
+### task.yaml template
+
+```yaml
+task_id: helixops_saasNNN
+status: ready
+description: One-line description
+prompts:
+  - key: base
+    prompt: |-
+      <task description — goal-oriented, no project context, no command hints>
+author_name: joel
+difficulty: easy
+tags:
+  - dbt
+  - helixops_saas
+variants:
+- db_type: duckdb
+  db_name: helixops_saas
+  project_type: dbt
+  project_name: helixops_saas
+- db_type: duckdb
+  db_name: helixops_saas
+  project_type: dbt-fusion
+  project_name: helixops_saas
+solution_seeds:
+  - table_name: <affected_mart_or_fact_table>
+```
+
+### setup.sh template (Type A — remove a column)
+
+```bash
+#!/bin/bash
+# Remove target column from model, then build baseline
+sed -i '/    column_expression_to_remove,/d' models/path/to/model.sql
+dbt run --select model_name
+```
+
+> Note: `sed -i` works on Linux (inside Docker). No need for macOS fallback since tasks run in containers.
+
+### solution.sh template
+
+```bash
+#!/bin/bash
+# Restore correct model SQL from solutions/ and rebuild
+SOLUTIONS_DIR="$(dirname "$(readlink -f "${BASH_SOURCE[0]}")")/solutions"
+cp "$SOLUTIONS_DIR/model_name.sql" models/path/to/model_name.sql
+dbt run --select model_name
+```
+
+### SQL test template
+
+Tests return **0 rows** to pass, **≥1 row** to fail.
+
+```sql
+-- tests/column_name_not_all_null.sql
+-- Fails if the column is missing (query error), or if no row has a non-null value
+select 1
+from {{ ref('target_model') }}
+having count(column_name) = 0
+```
+
+Or, for a simple presence test (the query errors, and so the test fails, only if the column doesn't exist):
+
+```sql
+-- tests/column_name_exists.sql
+select 1
+from {{ ref('target_model') }}
+where column_name is not null
+limit 0
+```
+
+For data-correctness tests using solution seeds, see `CONTRIBUTING.md` — run `ade run helixops_saasNNN --agent sage --db duckdb --seed` to auto-generate.
+
+---
+
+
+## Tasks
+
+| # | Task ID | Source model(s) | Target model | Type | Prompt | setup.sh sed pattern | solution notes |
+|---|---------|----------------|--------------|------|--------|---------------------|----------------|
+| 1 | helixops_saas001 | stg_accounts | dim_accounts | A | "Add billing_country to dim_accounts." | `/    a.billing_country,/d` | add missing column from parent model |
+| 2 | helixops_saas002 | stg_accounts | dim_accounts + mart_account_360 | A | "Add the owning team to the account 360." | remove `a.owner_team` from dim_accounts and `a.owner_team` from mart_account_360 | multi-layer propagation — must add to dim_accounts first, then mart_account_360 |
+| 3 | helixops_saas003 | stg_workspaces | int_workspace_daily_metrics + int_account_daily_usage | C+A | "Please filter sandbox usage out of daily usage reporting." | (1) strip CASE from stg_workspaces — replace with `trim(env_tier) as environment_tier` (raw values exposed); (2) remove `w.environment_tier` from int_workspace_daily_metrics | agent must propagate environment_tier through metrics and filter, handling raw values 'sandbox'/'sbx' |
+| 4 | helixops_saas004 | stg_users | int_workspace_roster | C | "I need to be able to work out what departments users belong to across their workspace memberships. Please add a department column — you can infer it from job title if needed." | remove `u.department` from int_workspace_roster | agent must recognize department already exists in stg_users and add it directly rather than inferring from title |
+| 5 | helixops_saas005 | stg_users | int_account_users | C | "Can you fix the failing model." | (1) remove `lower(trim(email_addr)) as email` from stg_users; (2) rename `u.email` → `u.email_address` in int_account_users; (3) remove `u.email` from int_workspace_roster (no hints) | solution adds `email_address` to stg_users; agent must not just rename column in intermediate |
+| 6 | helixops_saas006 | int_subscription_history + int_account_billing_snapshot | mart_account_360 | B | "Please add net_mrr to the account 360, based on contracted price less discount, divided by 12 if billed annually." | add `ls.list_price_usd` to int_account_billing_snapshot (so b.list_price_usd, b.discount_pct, b.billing_cycle are all available in mart_account_360); run baseline dbt build | correct solution is `b.effective_monthly_value_usd as net_mrr` — not recalculating from inputs; tests validate column exists with correct values |
+| 7 | helixops_saas007 | int_subscription_history | int_account_billing_snapshot + mart_account_360 + mart_account_health | B | "Add region and segment to int_account_billing_snapshot as a single geo_segment column, and carry it through to the account 360 and account health marts." | none (run baseline build) | add `ls.region || ' / ' || ls.segment as geo_segment` to int_account_billing_snapshot; add `b.geo_segment` to mart_account_360 and mart_account_health; tests confirm geo_segment is present in those three models and absent from fct_support_tickets and fct_monthly_revenue |
+| 8 | helixops_saas008 | stg_accounts | dim_accounts + mart_account_360 + mart_account_health + fct_support_tickets | C | "stg_accounts has account_status instead of customer_status, please rename and propagate." | none (run baseline build) | rename in stg_accounts; update all downstream references through full DAG |
+| 9 | helixops_saas009 | stg_accounts | dim_accounts | B | "Please create a v2 of dim_accounts with account_status renamed to customer_status — this will become the primary version in 6 months." | none (run baseline build) | create dim_accounts_v2.sql as a new model with customer_status; old dim_accounts remains unchanged; tests verify dim_accounts_v2 exists with customer_status column and dim_accounts still has account_status |
+| 10 | helixops_saas010 | stg_workspaces | int_account_workspaces | B | "Please filter out archived workspaces after the staging layer." | none (run baseline build) | add WHERE workspace_status != 'archived' to int_account_workspaces |
+| 11 | helixops_saas011 | stg_workspaces | int_account_workspaces + dim_accounts | C | "The Falcon Works sandbox isn't showing up in dim_accounts." | strip CASE from stg_workspaces environment_tier — replace with `lower(trim(env_tier)) as environment_tier` (exposes raw 'sbx' value); run baseline build | agent must trace 'sbx' back to staging normalization and fix the CASE to include 'sbx' → 'sandbox' |
+| 12 | helixops_saas012 | int_monthly_revenue_prep | fct_monthly_revenue | refactor | "Please move the monthly revenue prep model into being a CTE for the main revenue model." | none (run baseline build) | inline int_monthly_revenue_prep SQL as a CTE in fct_monthly_revenue; delete int_monthly_revenue_prep.sql; SQL tests verify fct_monthly_revenue still has all expected columns; structural verification (model file deleted) requires a non-SQL test |
+| 13 | helixops_saas013 | stg_invoice_line_items | int_invoice_finance + fct_monthly_revenue | C | "Helio Systems' onboarding fees are being treated as recurring revenue, please fix it." | run DuckDB UPDATE: `UPDATE raw_invoice_line_items SET recurring_hint = 'Y' WHERE line_id IN ('L7102', 'L7220', 'L7342')`; run dbt build | fix is_recurring_line CASE in stg_invoice_line_items to explicitly exclude line_type='onboarding_addon' regardless of recurring_hint |
+| 14 | helixops_saas014 | stg_workspace_usage_daily | mart_account_360 | A | "Please add total API calls to the account 360." | remove api_calls from all 7 models: stg_workspace_usage_daily, int_workspace_daily_metrics, int_account_daily_usage, int_account_engagement, fct_daily_account_usage, mart_account_360, mart_account_health; run baseline build | agent must propagate through full DAG (stg → metrics → daily_usage → engagement → 360); must not jump directly from staging to mart |
+| 15 | helixops_saas015 | int_support_sla | fct_support_tickets | B | two variants: (base) "We updated our support SLAs effective 2025-06-16 at 08:00 UTC. Please move priority and response_sla_minutes into a new seed with valid_from and valid_to timestamps, and update the models accordingly. New SLAs: urgent=20min, high=90min, medium=300min, standard=1500min." (low) same but says low=1500min instead of standard. | none (run baseline build) | create seed with priority/response_sla_minutes/valid_from/valid_to; join in int_support_sla on priority + opened_at between valid_from and valid_to; 'standard' variant is a trap — actual priority value in data is 'low' (staging normalizes to low); valid_from must be timestamp '2025-06-16 08:00:00' not just date |
+| 16 | helixops_saas016 | int_support_sla | fct_support_tickets | B | "We have new SLA targets for enterprise accounts only, effective 2025-06-16 at 08:00 UTC. Please update the SLA model so enterprise accounts get: urgent=20min, high=45min, medium=120min, standard=900min. Other segments keep existing SLAs." | none (run baseline build) | seed must include segment column; join on priority + segment + opened_at between valid_from and valid_to; 'standard' is the trap again — enterprise segment value in data is 'enterprise' (lowercase); non-enterprise rows keep old CASE logic or fallback rows in seed |
+| 17 | helixops_saas017 | stg_users | int_workspace_roster | already-done | "Add department to the workspace roster." | none (no setup changes — department is already in the model) | correct behavior is no changes; marked expected-pass for the none agent |
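
Reviewer note: the plan's rule that a SQL test passes on zero rows and fails on one or more can be sketched as below. This is an illustration only: it uses the standard-library `sqlite3` module as a stand-in for DuckDB, and the `sql_test_passes` helper, the table contents, and the two queries are hypothetical rather than part of the harness.

```python
import sqlite3


def sql_test_passes(con, query):
    """A test passes only when its query returns zero rows."""
    return len(con.execute(query).fetchall()) == 0


# Illustrative stand-in table; the real models live in the DuckDB project.
con = sqlite3.connect(":memory:")
con.execute("create table dim_accounts (account_id text, billing_country text)")
con.executemany(
    "insert into dim_accounts values (?, ?)",
    [("A1", "US"), ("A2", None)],
)

no_null_ids = "select 1 from dim_accounts where account_id is null"
no_null_countries = "select 1 from dim_accounts where billing_country is null"

print(sql_test_passes(con, no_null_ids))        # no offending rows
print(sql_test_passes(con, no_null_countries))  # one offending row
```

A query that references a column the model does not have raises an error instead of returning rows, which is what the presence-style template relies on.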
diff --git a/shared/projects/dbt/helixops_saas/dbt_project.yml b/shared/projects/dbt/helixops_saas/dbt_project.yml
@@ -0,0 +1,22 @@
+name: 'helixops_saas'
+version: '1.0.0'
+config-version: 2
+
+profile: 'helixops_saas-duckdb'
+
+model-paths: ['models']
+macro-paths: ['macros']
+
+target-path: target
+clean-targets:
+  - target
+  - dbt_packages
+
+models:
+  helixops_saas:
+    staging:
+      +materialized: view
+    intermediate:
+      +materialized: view
+    marts:
+      +materialized: table
diff --git a/shared/projects/dbt/helixops_saas/macros/clean_helpers.sql b/shared/projects/dbt/helixops_saas/macros/clean_helpers.sql
@@ -0,0 +1,27 @@
+{% macro strip_numeric(col) -%}
+regexp_replace(cast({{ col }} as varchar), '[^0-9\.-]', '', 'g')
+{%- endmacro %}
+
+{% macro integer_from_text(col) -%}
+try_cast(nullif({{ strip_numeric(col) }}, '') as integer)
+{%- endmacro %}
+
+{% macro numeric_from_text(col) -%}
+try_cast(nullif({{ strip_numeric(col) }}, '') as double)
+{%- endmacro %}
+
+{% macro epoch_to_timestamp(col) -%}
+case
+  when {{ col }} is null then null
+  when lower(trim(cast({{ col }} as varchar))) in ('', 'null', 'n/a', 'na') then null
+  else to_timestamp(try_cast(regexp_replace(trim(cast({{ col }} as varchar)), '\.[0-9]+$', '') as bigint))::timestamp
+end
+{%- endmacro %}
+
+{% macro bool_from_text(col) -%}
+case
+  when lower(trim(cast({{ col }} as varchar))) in ('y', 'yes', '1', 'true', 't') then true
+  when lower(trim(cast({{ col }} as varchar))) in ('n', 'no', '0', 'false', 'f') then false
+  else null
+end
+{%- endmacro %}
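
Reviewer note: for readers less familiar with Jinja macros, the three-valued behaviour of `bool_from_text` can be mirrored in plain Python. This is a sketch for reasoning about edge cases, not part of the project; it assumes SQL `trim` strips spaces only, and the Python function and constant names are chosen for illustration.

```python
TRUE_TOKENS = {"y", "yes", "1", "true", "t"}
FALSE_TOKENS = {"n", "no", "0", "false", "f"}


def bool_from_text(value):
    """Map loosely formatted text to True, False, or None (unknown)."""
    if value is None:
        return None  # SQL NULL matches neither branch and falls through to null
    token = str(value).strip(" ").lower()
    if token in TRUE_TOKENS:
        return True
    if token in FALSE_TOKENS:
        return False
    return None  # unrecognised text stays unknown rather than defaulting to False


print(bool_from_text(" Yes "), bool_from_text("F"), bool_from_text("maybe"))
```

Returning `None` for unrecognised input matters downstream: a filter such as `where is_primary` silently drops those rows, while a count of `false` values does not include them.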
diff --git a/shared/projects/dbt/helixops_saas/models/intermediate/int_account_billing_snapshot.sql b/shared/projects/dbt/helixops_saas/models/intermediate/int_account_billing_snapshot.sql
@@ -0,0 +1,66 @@
+with sub_hist as (
+    select * from {{ ref('int_subscription_history') }}
+),
+invoice_finance as (
+    select * from {{ ref('int_invoice_finance') }}
+),
+sub_ranked as (
+    select
+        s.*,
+        row_number() over (
+            partition by account_id
+            order by case when subscription_status = 'active' then 0 else 1 end, start_date desc, subscription_id desc
+        ) as rn
+    from sub_hist s
+),
+latest_sub as (
+    select * from sub_ranked where rn = 1
+),
+inv_ranked as (
+    select
+        f.*,
+        row_number() over (partition by account_id order by invoice_date desc, invoice_id desc) as rn
+    from invoice_finance f
+),
+latest_inv as (
+    select * from inv_ranked where rn = 1
+),
+acct_rollup as (
+    select
+        account_id,
+        sum(case when invoice_status = 'past_due' then 1 else 0 end) as past_due_invoice_count,
+        sum(case when invoice_status in ('open', 'past_due') then 1 else 0 end) as open_invoice_count,
+        max(case when invoice_status = 'past_due' then 1 else 0 end) as has_past_due_invoice
+    from invoice_finance
+    group by 1
+)
+select
+    coalesce(ls.account_id, li.account_id, ar.account_id) as account_id,
+    ls.subscription_id as latest_subscription_id,
+    ls.plan_id,
+    ls.plan_name,
+    ls.plan_family,
+    ls.billing_cycle,
+    ls.support_tier,
+    ls.subscription_status as latest_subscription_status,
+    ls.start_date as latest_subscription_start_date,
+    ls.end_date as latest_subscription_end_date,
+    ls.contracted_seats,
+    ls.discount_pct,
+    ls.effective_monthly_value_usd,
+    li.invoice_id as latest_invoice_id,
+    li.invoice_date as latest_invoice_date,
+    li.due_date as latest_invoice_due_date,
+    li.invoice_status as latest_invoice_status,
+    li.latest_payment_status,
+    li.latest_payment_method,
+    li.latest_payment_date,
+    li.total_usd as latest_invoice_total_usd,
+    li.amount_paid_usd as latest_invoice_amount_paid_usd,
+    li.outstanding_amount_usd as latest_outstanding_amount_usd,
+    coalesce(ar.past_due_invoice_count, 0) as past_due_invoice_count,
+    coalesce(ar.open_invoice_count, 0) as open_invoice_count,
+    case when coalesce(ar.has_past_due_invoice, 0) = 1 then true else false end as has_past_due_invoice
+from latest_sub ls
+full outer join latest_inv li on ls.account_id = li.account_id
+left join acct_rollup ar on coalesce(ls.account_id, li.account_id) = ar.account_id
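
Reviewer note: the `sub_ranked` window picks one subscription per account by preferring active rows, then the newest `start_date`, then the highest `subscription_id`. A minimal Python sketch of that ordering follows, assuming no null start dates. The `latest_subscription` helper and the sample rows are hypothetical.

```python
from datetime import date


def latest_subscription(subs):
    """Pick one row per account: active first, then newest start_date, then highest id."""
    # Stable sorts applied from the least to the most significant key.
    ranked = sorted(subs, key=lambda s: s["subscription_id"], reverse=True)
    ranked = sorted(ranked, key=lambda s: s["start_date"], reverse=True)
    ranked = sorted(ranked, key=lambda s: s["subscription_status"] != "active")
    latest = {}
    for s in ranked:
        latest.setdefault(s["account_id"], s)  # first row seen per account is rn = 1
    return latest


# Illustrative rows only.
subs = [
    {"account_id": "A1", "subscription_id": "S1", "subscription_status": "active", "start_date": date(2024, 1, 1)},
    {"account_id": "A1", "subscription_id": "S2", "subscription_status": "canceled", "start_date": date(2025, 1, 1)},
    {"account_id": "A2", "subscription_id": "S3", "subscription_status": "canceled", "start_date": date(2024, 3, 1)},
    {"account_id": "A2", "subscription_id": "S4", "subscription_status": "canceled", "start_date": date(2024, 3, 1)},
]
picked = latest_subscription(subs)
print({k: v["subscription_id"] for k, v in picked.items()})
```

Note that an older active subscription wins over a newer canceled one, which is worth keeping in mind for the `net_mrr` and `geo_segment` tasks that read from this snapshot.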
diff --git a/shared/projects/dbt/helixops_saas/models/intermediate/int_account_daily_usage.sql b/shared/projects/dbt/helixops_saas/models/intermediate/int_account_daily_usage.sql
@@ -0,0 +1,20 @@
+with daily as (
+    select * from {{ ref('int_workspace_daily_metrics') }}
+)
+select
+    account_id,
+    max(account_name) as account_name,
+    max(industry) as industry,
+    max(segment) as segment,
+    max(region) as region,
+    max(billing_country) as billing_country,
+    usage_date,
+    count(*) as workspace_days_reporting,
+    sum(active_users) as active_users,
+    sum(projects_run) as projects_run,
+    sum(api_calls) as api_calls,
+    sum(alerts_sent) as alerts_sent,
+    max(storage_gb) as max_storage_gb,
+    avg(storage_gb) as avg_storage_gb
+from daily
+group by account_id, usage_date
diff --git a/shared/projects/dbt/helixops_saas/models/intermediate/int_account_engagement.sql b/shared/projects/dbt/helixops_saas/models/intermediate/int_account_engagement.sql
@@ -0,0 +1,53 @@
+with users as (
+    select * from {{ ref('int_account_users') }}
+),
+workspaces as (
+    select * from {{ ref('int_account_workspaces') }}
+),
+daily_usage as (
+    select * from {{ ref('int_account_daily_usage') }}
+),
+user_rollup as (
+    select
+        account_id,
+        count(*) as total_user_count,
+        sum(case when user_status = 'active' then 1 else 0 end) as active_user_count,
+        sum(case when user_status = 'inactive' then 1 else 0 end) as inactive_user_count,
+        sum(case when user_status = 'provisioned' then 1 else 0 end) as provisioned_user_count,
+        max(last_login_at) as latest_user_login_at
+    from users
+    group by 1
+),
+usage_rollup as (
+    select
+        account_id,
+        max(usage_date) as latest_usage_date,
+        avg(case when usage_date >= cast(now() as date) - interval 6 day then active_users end) as avg_active_users_7d,
+        avg(active_users) as avg_active_users_30d,
+        sum(projects_run) as total_projects_run_30d,
+        sum(api_calls) as total_api_calls_30d,
+        sum(alerts_sent) as total_alerts_30d,
+        max(max_storage_gb) as peak_storage_gb_30d
+    from daily_usage
+    group by 1
+)
+select
+    coalesce(u.account_id, w.account_id, g.account_id) as account_id,
+    coalesce(u.total_user_count, 0) as total_user_count,
+    coalesce(u.active_user_count, 0) as active_user_count,
+    coalesce(u.inactive_user_count, 0) as inactive_user_count,
+    coalesce(u.provisioned_user_count, 0) as provisioned_user_count,
+    u.latest_user_login_at,
+    coalesce(w.workspace_count, 0) as workspace_count,
+    coalesce(w.active_workspace_count, 0) as active_workspace_count,
+    coalesce(w.sandbox_workspace_count, 0) as sandbox_workspace_count,
+    g.latest_usage_date,
+    coalesce(g.avg_active_users_7d, 0) as avg_active_users_7d,
+    coalesce(g.avg_active_users_30d, 0) as avg_active_users_30d,
+    coalesce(g.total_projects_run_30d, 0) as total_projects_run_30d,
+    coalesce(g.total_api_calls_30d, 0) as total_api_calls_30d,
+    coalesce(g.total_alerts_30d, 0) as total_alerts_30d,
+    coalesce(g.peak_storage_gb_30d, 0) as peak_storage_gb_30d
+from user_rollup u
+full outer join workspaces w on u.account_id = w.account_id
+full outer join usage_rollup g on coalesce(u.account_id, w.account_id) = g.account_id
diff --git a/shared/projects/dbt/helixops_saas/models/intermediate/int_account_users.sql b/shared/projects/dbt/helixops_saas/models/intermediate/int_account_users.sql
@@ -0,0 +1,28 @@
+with users as (
+    select * from {{ ref('stg_users') }}
+),
+accounts as (
+    select * from {{ ref('stg_accounts') }}
+)
+select
+    u.user_id,
+    u.account_id,
+    a.account_name,
+    a.industry,
+    a.region,
+    a.segment,
+    a.billing_country,
+    a.account_status,
+    a.owner_team,
+    u.email,
+    u.full_name,
+    u.title,
+    u.department,
+    u.created_at,
+    u.last_login_at,
+    u.user_status,
+    u.is_active_user,
+    u.is_test_user,
+    date_diff('day', cast(u.last_login_at as date), cast(now() as date)) as days_since_last_login
+from users u
+left join accounts a using (account_id)
diff --git a/shared/projects/dbt/helixops_saas/models/intermediate/int_account_workspaces.sql b/shared/projects/dbt/helixops_saas/models/intermediate/int_account_workspaces.sql
@@ -0,0 +1,15 @@
+with workspaces as (
+    select * from {{ ref('stg_workspaces') }}
+)
+select
+    account_id,
+    count(*) as workspace_count,
+    sum(case when workspace_status = 'active' then 1 else 0 end) as active_workspace_count,
+    sum(case when environment_tier = 'prod' then 1 else 0 end) as prod_workspace_count,
+    sum(case when environment_tier = 'sandbox' then 1 else 0 end) as sandbox_workspace_count,
+    sum(case when is_primary then 1 else 0 end) as primary_workspace_count,
+    max(case when is_primary then workspace_name end) as primary_workspace_name,
+    min(created_at) as first_workspace_created_at,
+    max(created_at) as latest_workspace_created_at
+from workspaces
+group by 1