Merged
21 commits
2f83edf
feat(ops_pilot): add ops_pilot shared dbt project
joellabes Mar 18, 2026
b48b056
docs: add ops_pilot025 benchmark tasks implementation plan
joellabes Mar 18, 2026
0766481
refactor: rename ops_pilot -> helixops_saas throughout
joellabes Mar 18, 2026
1da9fe1
feat(helixops_saas): add 17 benchmark tasks for helixops_saas shared …
joellabes Mar 19, 2026
1a2742e
fix: add author_email to all helixops_saas task.yaml files
joellabes Mar 19, 2026
11ddd10
refactor(helixops_saas): replace full-file solutions with patch-based…
joellabes Mar 23, 2026
0389584
Merge branch 'main' into feature/ops-pilot-shared-project
joellabes Mar 23, 2026
ed232c2
fix(helixops_saas): rebuild full project in setup before applying bro…
joellabes Mar 23, 2026
3569908
fix(helixops_saas): remove dummy.sql placeholders and add solution seeds
joellabes Mar 23, 2026
03e50f3
fix(helixops_saas): fix 5 failing dbt-fusion CI tasks
joellabes Mar 24, 2026
4da7ffc
fix(helixops_saas): exclude now()-dependent columns via task.yaml
joellabes Mar 24, 2026
2029148
fix(helixops_saas): fix 3 tasks that pass for the none agent
joellabes Mar 24, 2026
735d020
docs(helixops_saas): update task descriptions to reflect key challenge
joellabes Mar 24, 2026
2732bc1
fix(helixops_saas): apply PR #137 review comments across 9 tasks
joellabes Mar 24, 2026
0970bf1
fix(helixops_saas006): write failing singular test instead of exit 1 …
joellabes Mar 25, 2026
7d0f456
fix(helixops_saas): apply PR #137 new review comments (non-seed changes)
joellabes Mar 30, 2026
1a6d8a7
fix(helixops_saas015): detect standard→low normalization via compile …
joellabes Mar 30, 2026
0f75247
fix(helixops_saas015): broaden standard-normalization check to cover …
joellabes Mar 30, 2026
eb40bf7
seed(helixops_saas): regenerate seeds for saas007/008/010
joellabes Mar 30, 2026
03e63e8
feat(helixops_saas018): add harder variant of saas006 with list_price…
joellabes Mar 30, 2026
9145735
remove(helixops_saas014): delete task — solution naming disagreement …
joellabes Mar 30, 2026
3 changes: 2 additions & 1 deletion .github/workflows/ci.yml
@@ -162,7 +162,8 @@ jobs:
 results = json.load(f)

 # Tasks that are allowed to pass even for the "none" agent
-ALLOWED_TO_PASS = {"analytics_engineering001"}
+# (e.g. no-op tasks where the answer is already in place)
+ALLOWED_TO_PASS = {"analytics_engineering001", "helixops_saas017"}

 failed_tasks = []
 passed_tasks = []
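
The allow-list in this hunk can be read as a guard against tests that are too weak: a task that passes for the no-op "none" agent is suspicious unless it is explicitly allowed. A minimal sketch of that check follows; the `results` shape (task id mapped to a pass flag) and the helper name `unexpected_passes` are assumptions for illustration, not the workflow's actual code.

```python
# Allow-list copied from the hunk above.
ALLOWED_TO_PASS = {"analytics_engineering001", "helixops_saas017"}


def unexpected_passes(results: dict[str, bool]) -> list[str]:
    """Return tasks the no-op agent passed that are not on the allow-list.

    `results` maps task id -> passed flag. This shape is an assumption;
    the real results JSON may be structured differently.
    """
    return sorted(
        task_id
        for task_id, passed in results.items()
        if passed and task_id not in ALLOWED_TO_PASS
    )


# Illustrative sample data, not real CI output.
sample = {
    "helixops_saas001": False,
    "helixops_saas006": True,  # a pass here would suggest the tests are too weak
    "helixops_saas017": True,  # expected: the answer is already in place
}
print(unexpected_passes(sample))
```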
1 change: 1 addition & 0 deletions ade_bench/harness.py
@@ -1168,6 +1168,7 @@ def _extract_duckdb_csv(
 import duckdb

 con = duckdb.connect(temp_db_path)
+con.execute("SET TimeZone = 'UTC';")

 # Collect all schema information
 all_schemas = []
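
Pinning the session time zone presumably keeps extracted timestamps identical across hosts. The sketch below uses only the Python standard library (not DuckDB) to show how one instant renders as different wall-clock text, and even a different date, depending on the zone; the instant and offsets are illustrative.

```python
from datetime import datetime, timedelta, timezone

# One fixed instant, expressed in UTC (illustrative value).
instant = datetime(2025, 6, 16, 0, 30, tzinfo=timezone.utc)


def render(ts: datetime, utc_offset_hours: int) -> str:
    """Format the instant as wall-clock text in a fixed-offset zone."""
    zone = timezone(timedelta(hours=utc_offset_hours))
    return ts.astimezone(zone).strftime("%Y-%m-%d %H:%M:%S")


utc_text = render(instant, 0)    # same instant, UTC wall clock
west_text = render(instant, -8)  # same instant, eight hours behind UTC

print(utc_text)
print(west_text)
```

A CSV comparison against seeds generated on a UTC machine would see these two strings as different values, which is the kind of drift a fixed session zone avoids.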
125 changes: 125 additions & 0 deletions docs/superpowers/plans/2026-03-18-helixops-saas-tasks.md
@@ -0,0 +1,125 @@
# helixops_saas Benchmark Tasks Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Create 25 ADE-bench benchmark tasks that test an AI agent's ability to make targeted dbt model changes against the helixops_saas shared project.

**Architecture:** Each task lives in `tasks/helixops_saas0NN/`, contains a `task.yaml`, `setup.sh`, `solution.sh`, a `solutions/` directory with correct SQL, and a `tests/` directory with SQL validation queries. The shared project (`shared/projects/dbt/helixops_saas`) and database (`shared/databases/duckdb/helixops_saas.duckdb`) are referenced but not modified by individual tasks. setup.sh puts the project into a broken/incomplete state; the agent's job is to fix it; solution.sh is the answer key.

**Tech Stack:** bash, dbt-core 1.10.11, dbt-duckdb 1.9.3, DuckDB 1.3.0. DuckDB-only for now (Snowflake deferred). No external dbt packages.

---

## Task Classification

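Most tasks fall into one of three categories based on the model changes required; the task table labels the remaining cases (a refactor and an already-done task) explicitly: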

**Type A — Remove-and-restore:** The field already exists in the correct model. `setup.sh` removes it with `sed`. The agent must identify what is missing and restore it. `solution.sh` copies back the full correct SQL.

**Type B — Genuine addition:** The field does not currently exist. `setup.sh` runs a baseline dbt build. The agent must implement a new column from scratch. `solution.sh` applies the correct implementation.

**Type C — Logic change:** The field exists but with wrong logic. `setup.sh` applies the broken logic. The agent must fix it. `solution.sh` restores correct logic.

---

## Common Patterns

### task.yaml template

```yaml
task_id: helixops_saasNNN
status: ready
description: One-line description
prompts:
  - key: base
    prompt: |-
      <task description — goal-oriented, no project context, no command hints>
author_name: joel
difficulty: easy
tags:
  - dbt
  - helixops_saas
variants:
  - db_type: duckdb
    db_name: helixops_saas
    project_type: dbt
    project_name: helixops_saas
  - db_type: duckdb
    db_name: helixops_saas
    project_type: dbt-fusion
    project_name: helixops_saas
solution_seeds:
  - table_name: <affected_mart_or_fact_table>
```
```
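
A quick structural check on a parsed `task.yaml` might look like the sketch below. It works on a plain dict (as a YAML loader would return), and both the "required" key set and the helper name `check_task_config` are assumptions drawn from the template, not rules enforced by the harness.

```python
# Keys taken from the template; treating them as required is an assumption.
REQUIRED_KEYS = {"task_id", "status", "description", "prompts", "variants", "solution_seeds"}
KNOWN_PROJECT_TYPES = {"dbt", "dbt-fusion"}


def check_task_config(task: dict) -> list[str]:
    """Return a list of human-readable problems; empty means the config looks sane."""
    problems = []
    missing = sorted(REQUIRED_KEYS - set(task))
    if missing:
        problems.append("missing keys: " + ", ".join(missing))
    for index, variant in enumerate(task.get("variants", [])):
        project_type = variant.get("project_type")
        if project_type not in KNOWN_PROJECT_TYPES:
            problems.append(f"variant {index}: unknown project_type {project_type!r}")
    return problems


# Hypothetical parsed config for illustration.
example = {
    "task_id": "helixops_saas001",
    "status": "ready",
    "description": "One-line description",
    "prompts": [{"key": "base", "prompt": "Add billing_country to dim_accounts."}],
    "variants": [
        {"db_type": "duckdb", "db_name": "helixops_saas",
         "project_type": "dbt", "project_name": "helixops_saas"},
        {"db_type": "duckdb", "db_name": "helixops_saas",
         "project_type": "dbt-fusion", "project_name": "helixops_saas"},
    ],
    "solution_seeds": [{"table_name": "dim_accounts"}],
}
print(check_task_config(example))
```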

### setup.sh template (Type A — remove a column)

```bash
#!/bin/bash
# Remove target column from model, then build baseline
sed -i '/ column_expression_to_remove,/d' models/path/to/model.sql
dbt run --select model_name
```
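
The `sed '/pattern/d'` deletion in the template above can be mirrored in Python to preview what a Type A setup does to a model. This is a sketch only: the helper name `delete_matching_lines` and the sample SQL are hypothetical, and Python's `re` syntax differs slightly from sed's basic regular expressions.

```python
import re


def delete_matching_lines(sql: str, pattern: str) -> str:
    """Drop every line that matches the regex, similar to sed '/pattern/d'."""
    regex = re.compile(pattern)
    kept = [line for line in sql.splitlines(keepends=True) if not regex.search(line)]
    return "".join(kept)


# Hypothetical model body for illustration.
model_sql = (
    "select\n"
    "    a.account_id,\n"
    "    a.billing_country,\n"
    "    a.region\n"
    "from accounts a\n"
)

broken_sql = delete_matching_lines(model_sql, r" a\.billing_country,")
print(broken_sql)
```

Because the pattern includes the trailing comma, it would not match a column that is last in the select list; that case needs a different pattern.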

> Note: `sed -i` works on Linux (inside Docker). No need for macOS fallback since tasks run in containers.

### solution.sh template

```bash
#!/bin/bash
# Restore correct model SQL from solutions/ and rebuild
SOLUTIONS_DIR="$(dirname "$(readlink -f "${BASH_SOURCE[0]}")")/solutions"
cp "$SOLUTIONS_DIR/model_name.sql" models/path/to/model_name.sql
dbt run --select model_name
```

### SQL test template

Tests return **0 rows** to pass, **≥1 row** to fail.

```sql
-- tests/column_name_exists.sql
-- Fails if the column is missing (the query errors) or null for every row
select 1
from (
    select count(*) as total_rows, count(column_name) as non_null_rows
    from {{ ref('target_model') }}
) as counts
where total_rows > 0
  and non_null_rows = 0
```

Or, for a simple presence test (the query errors, and the test therefore fails, if the column doesn't exist):

```sql
-- tests/column_name_not_null.sql
select 1
from {{ ref('target_model') }}
where column_name is not null
limit 0
```
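
The pass/fail convention above (zero rows passes, any row or a query error fails) can be sketched with a tiny runner. This uses Python's built-in `sqlite3` as a stand-in for DuckDB so it runs without extra packages; the helper name `run_sql_test` and the sample rows are assumptions for illustration.

```python
import sqlite3


def run_sql_test(con: sqlite3.Connection, query: str) -> bool:
    """Return True (pass) when the query yields no rows.

    A query that errors, for example because a column is missing,
    counts as a failure.
    """
    try:
        rows = con.execute(query).fetchall()
    except sqlite3.Error:
        return False
    return len(rows) == 0


con = sqlite3.connect(":memory:")
con.execute("create table dim_accounts (account_id text, billing_country text)")
# Illustrative rows only.
con.execute("insert into dim_accounts values ('A1', 'US'), ('A2', null)")

presence = "select 1 from dim_accounts where billing_country is not null limit 0"
missing = "select 1 from dim_accounts where owner_team is not null limit 0"
null_rows = "select 1 from dim_accounts where billing_country is null"

print(run_sql_test(con, presence))   # column exists, no rows returned
print(run_sql_test(con, missing))    # column absent, query errors
print(run_sql_test(con, null_rows))  # one null row is returned
```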

For data-correctness tests using solution seeds, see `CONTRIBUTING.md` — run `ade run helixops_saasNNN --agent sage --db duckdb --seed` to auto-generate.

---


## Tasks

| # | Task ID | Source model(s) | Target model | Type | Prompt | setup.sh sed pattern | solution notes |
|---|---------|----------------|--------------|------|--------|---------------------|----------------|
| 1 | helixops_saas001 | stg_accounts | dim_accounts | A | "Add billing_country to dim_accounts." | `/ a.billing_country,/d` | add missing column from parent model |
| 2 | helixops_saas002 | stg_accounts | dim_accounts + mart_account_360 | A | "Add the owning team to the account 360." | remove `a.owner_team` from dim_accounts and `a.owner_team` from mart_account_360 | multi-layer propagation — must add to dim_accounts first, then mart_account_360 |
| 3 | helixops_saas003 | stg_workspaces | int_workspace_daily_metrics + int_account_daily_usage | C+A | "Please filter sandbox usage out of daily usage reporting." | (1) strip CASE from stg_workspaces — replace with `trim(env_tier) as environment_tier` (raw values exposed); (2) remove `w.environment_tier` from int_workspace_daily_metrics | agent must propagate environment_tier through metrics and filter, handling raw values 'sandbox'/'sbx' |
| 4 | helixops_saas004 | stg_users | int_workspace_roster | C | "I need to be able to work out what departments users belong to across their workspace memberships. Please add a department column — you can infer it from job title if needed." | remove `u.department` from int_workspace_roster | agent must recognize department already exists in stg_users and add it directly rather than inferring from title |
| 5 | helixops_saas005 | stg_users | int_account_users | C | "Can you fix the failing model." | (1) remove `lower(trim(email_addr)) as email` from stg_users; (2) rename `u.email` → `u.email_address` in int_account_users; (3) remove `u.email` from int_workspace_roster (no hints) | solution adds `email_address` to stg_users; agent must not just rename column in intermediate |
| 6 | helixops_saas006 | int_subscription_history + int_account_billing_snapshot | mart_account_360 | B | "Please add net_mrr to the account 360, based on contracted price less discount, divided by 12 if billed annually." | add `ls.list_price_usd` to int_account_billing_snapshot (so b.list_price_usd, b.discount_pct, b.billing_cycle are all available in mart_account_360); run baseline dbt build | correct solution is `b.effective_monthly_value_usd as net_mrr` — not recalculating from inputs; tests validate column exists with correct values |
| 7 | helixops_saas007 | int_subscription_history | int_account_billing_snapshot + mart_account_360 + mart_account_health | B | "Add region and segment to int_account_billing_snapshot as a single geo_segment column, and carry it through to the account 360 and account health marts." | none (run baseline build) | add `ls.region || ' / ' || ls.segment as geo_segment` to int_account_billing_snapshot; add `b.geo_segment` to mart_account_360 and mart_account_health; tests confirm geo_segment is present in those three models and absent from fct_support_tickets and fct_monthly_revenue |
| 8 | helixops_saas008 | stg_accounts | dim_accounts + mart_account_360 + mart_account_health + fct_support_tickets | C | "stg_accounts has account_status instead of customer_status, please rename and propagate." | none (run baseline build) | rename in stg_accounts; update all downstream references through full DAG |
| 9 | helixops_saas009 | stg_accounts | dim_accounts | B | "Please create a v2 of dim_accounts with account_status renamed to customer_status — this will become the primary version in 6 months." | none (run baseline build) | create dim_accounts_v2.sql as a new model with customer_status; old dim_accounts remains unchanged; tests verify dim_accounts_v2 exists with customer_status column and dim_accounts still has account_status |
| 10 | helixops_saas010 | stg_workspaces | int_account_workspaces | B | "Please filter out archived workspaces after the staging layer." | none (run baseline build) | add WHERE workspace_status != 'archived' to int_account_workspaces |
| 11 | helixops_saas011 | stg_workspaces | int_account_workspaces + dim_accounts | C | "The Falcon Works sandbox isn't showing up in dim_accounts." | strip CASE from stg_workspaces environment_tier — replace with `lower(trim(env_tier)) as environment_tier` (exposes raw 'sbx' value); run baseline build | agent must trace 'sbx' back to staging normalization and fix the CASE to include 'sbx' → 'sandbox' |
| 12 | helixops_saas012 | int_monthly_revenue_prep | fct_monthly_revenue | refactor | "Please move the monthly revenue prep model into being a CTE for the main revenue model." | none (run baseline build) | inline int_monthly_revenue_prep SQL as a CTE in fct_monthly_revenue; delete int_monthly_revenue_prep.sql; SQL tests verify fct_monthly_revenue still has all expected columns; structural verification (model file deleted) requires a non-SQL test |
| 13 | helixops_saas013 | stg_invoice_line_items | int_invoice_finance + fct_monthly_revenue | C | "Helio Systems' onboarding fees are being treated as recurring revenue, please fix it." | run DuckDB UPDATE: `UPDATE raw_invoice_line_items SET recurring_hint = 'Y' WHERE line_id IN ('L7102', 'L7220', 'L7342')`; run dbt build | fix is_recurring_line CASE in stg_invoice_line_items to explicitly exclude line_type='onboarding_addon' regardless of recurring_hint |
| 14 | helixops_saas014 | stg_workspace_usage_daily | mart_account_360 | A | "Please add total API calls to the account 360." | remove api_calls from all 7 models: stg_workspace_usage_daily, int_workspace_daily_metrics, int_account_daily_usage, int_account_engagement, fct_daily_account_usage, mart_account_360, mart_account_health; run baseline build | agent must propagate through full DAG (stg → metrics → daily_usage → engagement → 360); must not jump directly from staging to mart |
| 15 | helixops_saas015 | int_support_sla | fct_support_tickets | B | two variants: (base) "We updated our support SLAs effective 2025-06-16 at 08:00 UTC. Please move priority and response_sla_minutes into a new seed with valid_from and valid_to timestamps, and update the models accordingly. New SLAs: urgent=20min, high=90min, medium=300min, standard=1500min." (low) same but says low=1500min instead of standard. | none (run baseline build) | create seed with priority/response_sla_minutes/valid_from/valid_to; join in int_support_sla on priority + opened_at between valid_from and valid_to; 'standard' variant is a trap — actual priority value in data is 'low' (staging normalizes to low); valid_from must be timestamp '2025-06-16 08:00:00' not just date |
| 16 | helixops_saas016 | int_support_sla | fct_support_tickets | B | "We have new SLA targets for enterprise accounts only, effective 2025-06-16 at 08:00 UTC. Please update the SLA model so enterprise accounts get: urgent=20min, high=45min, medium=120min, standard=900min. Other segments keep existing SLAs." | none (run baseline build) | seed must include segment column; join on priority + segment + opened_at between valid_from and valid_to; 'standard' is the trap again — enterprise segment value in data is 'enterprise' (lowercase); non-enterprise rows keep old CASE logic or fallback rows in seed |
| 17 | helixops_saas017 | stg_users | int_workspace_roster | already-done | "Add department to the workspace roster." | none (no setup changes — department is already in the model) | correct behavior is no changes; marked expected-pass for the none agent |
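
The validity-window join described for helixops_saas015 can be sketched in plain Python. The seed rows below contain only the new targets quoted in the task prompt; the half-open interval (`valid_from` inclusive, `valid_to` exclusive, `None` meaning open-ended) and the function name are assumptions, since the plan only says "between valid_from and valid_to".

```python
from datetime import datetime

# New SLA targets take effect at this timestamp (from the task prompt).
EFFECTIVE = datetime(2025, 6, 16, 8, 0, 0)

# (priority, response_sla_minutes, valid_from, valid_to); None = open-ended.
SLA_SEED = [
    ("urgent", 20, EFFECTIVE, None),
    ("high", 90, EFFECTIVE, None),
    ("medium", 300, EFFECTIVE, None),
    ("low", 1500, EFFECTIVE, None),  # the data uses 'low', not 'standard'
]


def response_sla_minutes(priority: str, opened_at: datetime):
    """Look up the SLA whose validity window contains opened_at, else None."""
    for seed_priority, minutes, valid_from, valid_to in SLA_SEED:
        if seed_priority != priority:
            continue
        if opened_at >= valid_from and (valid_to is None or opened_at < valid_to):
            return minutes
    return None


print(response_sla_minutes("low", datetime(2025, 6, 16, 8, 0)))
print(response_sla_minutes("low", datetime(2025, 6, 16, 7, 59)))
print(response_sla_minutes("standard", datetime(2025, 6, 17, 9, 0)))
```

A ticket opened at 07:59 on the effective date falls outside the new window, which is why a date-only `valid_from` would misclassify it; a seed keyed on 'standard' would match nothing.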
22 changes: 22 additions & 0 deletions shared/projects/dbt/helixops_saas/dbt_project.yml
@@ -0,0 +1,22 @@
name: 'helixops_saas'
version: '1.0.0'
config-version: 2

profile: 'helixops_saas-duckdb'

model-paths: ['models']
macro-paths: ['macros']

target-path: target
clean-targets:
  - target
  - dbt_packages

models:
  helixops_saas:
    staging:
      +materialized: view
    intermediate:
      +materialized: view
    marts:
      +materialized: table
27 changes: 27 additions & 0 deletions shared/projects/dbt/helixops_saas/macros/clean_helpers.sql
@@ -0,0 +1,27 @@
{% macro strip_numeric(col) -%}
  regexp_replace(cast({{ col }} as varchar), '[^0-9\.-]', '', 'g')
{%- endmacro %}

{% macro integer_from_text(col) -%}
  try_cast(nullif({{ strip_numeric(col) }}, '') as integer)
{%- endmacro %}

{% macro numeric_from_text(col) -%}
  try_cast(nullif({{ strip_numeric(col) }}, '') as double)
{%- endmacro %}

{% macro epoch_to_timestamp(col) -%}
  case
    when {{ col }} is null then null
    when lower(trim(cast({{ col }} as varchar))) in ('', 'null', 'n/a', 'na') then null
    else to_timestamp(try_cast(regexp_replace(trim(cast({{ col }} as varchar)), '\.[0-9]+$', '') as bigint))::timestamp
  end
{%- endmacro %}

{% macro bool_from_text(col) -%}
  case
    when lower(trim(cast({{ col }} as varchar))) in ('y', 'yes', '1', 'true', 't') then true
    when lower(trim(cast({{ col }} as varchar))) in ('n', 'no', '0', 'false', 'f') then false
    else null
  end
{%- endmacro %}
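
To make the intent of these cleaning macros easier to check by hand, here is a rough Python mirror of three of them. It is an approximation, not a port: edge cases of DuckDB's `try_cast` may differ, and `epoch_to_timestamp` assumes the session time zone is UTC when the value is cast to a plain timestamp.

```python
import re
from datetime import datetime, timezone

TRUE_TEXT = {"y", "yes", "1", "true", "t"}
FALSE_TEXT = {"n", "no", "0", "false", "f"}


def numeric_from_text(value):
    """Keep digits, dots and minus signs, then parse; None if nothing parses."""
    if value is None:
        return None
    cleaned = re.sub(r"[^0-9.\-]", "", str(value))
    try:
        return float(cleaned) if cleaned else None
    except ValueError:
        return None  # roughly mirrors try_cast returning NULL


def bool_from_text(value):
    """Map common yes/no spellings to True/False; anything else is None."""
    if value is None:
        return None
    text = str(value).strip().lower()
    if text in TRUE_TEXT:
        return True
    if text in FALSE_TEXT:
        return False
    return None


def epoch_to_timestamp(value):
    """Parse epoch seconds (optionally with a fractional part) as naive UTC."""
    if value is None:
        return None
    text = str(value).strip()
    if text.lower() in {"", "null", "n/a", "na"}:
        return None
    whole = re.sub(r"\.[0-9]+$", "", text)  # drop a trailing fractional part
    try:
        seconds = int(whole)
    except ValueError:
        return None
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(tzinfo=None)


print(numeric_from_text("$1,234.50"), bool_from_text(" Yes "), epoch_to_timestamp("86400.5"))
```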
@@ -0,0 +1,66 @@
with sub_hist as (
    select * from {{ ref('int_subscription_history') }}
),
invoice_finance as (
    select * from {{ ref('int_invoice_finance') }}
),
sub_ranked as (
    select
        s.*,
        row_number() over (
            partition by account_id
            order by case when subscription_status = 'active' then 0 else 1 end, start_date desc, subscription_id desc
        ) as rn
    from sub_hist s
),
latest_sub as (
    select * from sub_ranked where rn = 1
),
inv_ranked as (
    select
        f.*,
        row_number() over (partition by account_id order by invoice_date desc, invoice_id desc) as rn
    from invoice_finance f
),
latest_inv as (
    select * from inv_ranked where rn = 1
),
acct_rollup as (
    select
        account_id,
        sum(case when invoice_status = 'past_due' then 1 else 0 end) as past_due_invoice_count,
        sum(case when invoice_status in ('open', 'past_due') then 1 else 0 end) as open_invoice_count,
        max(case when invoice_status = 'past_due' then 1 else 0 end) as has_past_due_invoice
    from invoice_finance
    group by 1
)
select
    coalesce(ls.account_id, li.account_id, ar.account_id) as account_id,
    ls.subscription_id as latest_subscription_id,
    ls.plan_id,
    ls.plan_name,
    ls.plan_family,
    ls.billing_cycle,
    ls.support_tier,
    ls.subscription_status as latest_subscription_status,
    ls.start_date as latest_subscription_start_date,
    ls.end_date as latest_subscription_end_date,
    ls.contracted_seats,
    ls.discount_pct,
    ls.effective_monthly_value_usd,
    li.invoice_id as latest_invoice_id,
    li.invoice_date as latest_invoice_date,
    li.due_date as latest_invoice_due_date,
    li.invoice_status as latest_invoice_status,
    li.latest_payment_status,
    li.latest_payment_method,
    li.latest_payment_date,
    li.total_usd as latest_invoice_total_usd,
    li.amount_paid_usd as latest_invoice_amount_paid_usd,
    li.outstanding_amount_usd as latest_outstanding_amount_usd,
    coalesce(ar.past_due_invoice_count, 0) as past_due_invoice_count,
    coalesce(ar.open_invoice_count, 0) as open_invoice_count,
    case when coalesce(ar.has_past_due_invoice, 0) = 1 then true else false end as has_past_due_invoice
from latest_sub ls
full outer join latest_inv li on ls.account_id = li.account_id
left join acct_rollup ar on coalesce(ls.account_id, li.account_id) = ar.account_id
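
The `sub_ranked` ordering above prefers an active subscription over a newer inactive one. A small Python sketch of the same selection rule, using made-up rows and a hypothetical helper name, shows the effect:

```python
from datetime import date

# Illustrative rows only; ids, statuses and dates are made up.
subscriptions = [
    {"account_id": "A1", "subscription_id": "S1", "subscription_status": "active", "start_date": date(2024, 1, 1)},
    {"account_id": "A1", "subscription_id": "S2", "subscription_status": "canceled", "start_date": date(2025, 1, 1)},
    {"account_id": "A2", "subscription_id": "S3", "subscription_status": "canceled", "start_date": date(2023, 5, 1)},
    {"account_id": "A2", "subscription_id": "S4", "subscription_status": "canceled", "start_date": date(2024, 5, 1)},
]


def latest_subscriptions(rows):
    """Pick one row per account: active first, then newest start_date, then highest id."""
    # Stable sorts applied from the least to the most significant key.
    ordered = sorted(rows, key=lambda r: r["subscription_id"], reverse=True)
    ordered = sorted(ordered, key=lambda r: r["start_date"], reverse=True)
    ordered = sorted(ordered, key=lambda r: r["subscription_status"] != "active")
    latest = {}
    for row in ordered:
        latest.setdefault(row["account_id"], row)  # first row per account wins (rn = 1)
    return latest


picked = latest_subscriptions(subscriptions)
print({account: row["subscription_id"] for account, row in picked.items()})
```

Account A1 keeps its older active subscription, while A2, which has no active subscription, falls back to the most recent one.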
@@ -0,0 +1,20 @@
with daily as (
    select * from {{ ref('int_workspace_daily_metrics') }}
)
select
    account_id,
    max(account_name) as account_name,
    max(industry) as industry,
    max(segment) as segment,
    max(region) as region,
    max(billing_country) as billing_country,
    usage_date,
    count(*) as workspace_days_reporting,
    sum(active_users) as active_users,
    sum(projects_run) as projects_run,
    sum(api_calls) as api_calls,
    sum(alerts_sent) as alerts_sent,
    max(storage_gb) as max_storage_gb,
    avg(storage_gb) as avg_storage_gb
from daily
group by 1,7
@@ -0,0 +1,53 @@
with users as (
    select * from {{ ref('int_account_users') }}
),
workspaces as (
    select * from {{ ref('int_account_workspaces') }}
),
daily_usage as (
    select * from {{ ref('int_account_daily_usage') }}
),
user_rollup as (
    select
        account_id,
        count(*) as total_user_count,
        sum(case when user_status = 'active' then 1 else 0 end) as active_user_count,
        sum(case when user_status = 'inactive' then 1 else 0 end) as inactive_user_count,
        sum(case when user_status = 'provisioned' then 1 else 0 end) as provisioned_user_count,
        max(last_login_at) as latest_user_login_at
    from users
    group by 1
),
usage_rollup as (
    select
        account_id,
        max(usage_date) as latest_usage_date,
        avg(case when usage_date >= cast(now() as date) - interval 6 day then active_users end) as avg_active_users_7d,
        avg(active_users) as avg_active_users_30d,
        sum(projects_run) as total_projects_run_30d,
        sum(api_calls) as total_api_calls_30d,
        sum(alerts_sent) as total_alerts_30d,
        max(max_storage_gb) as peak_storage_gb_30d
    from daily_usage
    group by 1
)
select
    coalesce(u.account_id, w.account_id, g.account_id) as account_id,
    coalesce(u.total_user_count, 0) as total_user_count,
    coalesce(u.active_user_count, 0) as active_user_count,
    coalesce(u.inactive_user_count, 0) as inactive_user_count,
    coalesce(u.provisioned_user_count, 0) as provisioned_user_count,
    u.latest_user_login_at,
    coalesce(w.workspace_count, 0) as workspace_count,
    coalesce(w.active_workspace_count, 0) as active_workspace_count,
    coalesce(w.sandbox_workspace_count, 0) as sandbox_workspace_count,
    g.latest_usage_date,
    coalesce(g.avg_active_users_7d, 0) as avg_active_users_7d,
    coalesce(g.avg_active_users_30d, 0) as avg_active_users_30d,
    coalesce(g.total_projects_run_30d, 0) as total_projects_run_30d,
    coalesce(g.total_api_calls_30d, 0) as total_api_calls_30d,
    coalesce(g.total_alerts_30d, 0) as total_alerts_30d,
    coalesce(g.peak_storage_gb_30d, 0) as peak_storage_gb_30d
from user_rollup u
full outer join workspaces w on u.account_id = w.account_id
full outer join usage_rollup g on coalesce(u.account_id, w.account_id) = g.account_id
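
Because `avg_active_users_7d` is anchored on `now()`, its value depends on the day the model is built, which is likely why now()-dependent columns are excluded from seed comparison in this PR. The sketch below replays the same filter in Python with made-up rows and an explicit `today` parameter; it returns `None` where the SQL aggregate would yield NULL before the final `coalesce`.

```python
from datetime import date, timedelta


def avg_active_users_7d(daily_rows, today):
    """Average active_users over rows with usage_date >= today - 6 days."""
    cutoff = today - timedelta(days=6)
    recent = [row["active_users"] for row in daily_rows if row["usage_date"] >= cutoff]
    return sum(recent) / len(recent) if recent else None


# Illustrative usage rows for one account.
rows = [
    {"usage_date": date(2025, 6, 10), "active_users": 10},
    {"usage_date": date(2025, 6, 14), "active_users": 20},
]

print(avg_active_users_7d(rows, date(2025, 6, 15)))  # both rows in the window
print(avg_active_users_7d(rows, date(2025, 6, 18)))  # only the newer row
print(avg_active_users_7d(rows, date(2025, 7, 1)))   # no rows in the window
```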
@@ -0,0 +1,28 @@
with users as (
    select * from {{ ref('stg_users') }}
),
accounts as (
    select * from {{ ref('stg_accounts') }}
)
select
    u.user_id,
    u.account_id,
    a.account_name,
    a.industry,
    a.region,
    a.segment,
    a.billing_country,
    a.account_status,
    a.owner_team,
    u.email,
    u.full_name,
    u.title,
    u.department,
    u.created_at,
    u.last_login_at,
    u.user_status,
    u.is_active_user,
    u.is_test_user,
    date_diff('day', cast(u.last_login_at as date), cast(now() as date)) as days_since_last_login
from users u
left join accounts a using (account_id)
@@ -0,0 +1,15 @@
with workspaces as (
    select * from {{ ref('stg_workspaces') }}
)
select
    account_id,
    count(*) as workspace_count,
    sum(case when workspace_status = 'active' then 1 else 0 end) as active_workspace_count,
    sum(case when environment_tier = 'prod' then 1 else 0 end) as prod_workspace_count,
    sum(case when environment_tier = 'sandbox' then 1 else 0 end) as sandbox_workspace_count,
    sum(case when is_primary then 1 else 0 end) as primary_workspace_count,
    max(case when is_primary then workspace_name end) as primary_workspace_name,
    min(created_at) as first_workspace_created_at,
    max(created_at) as latest_workspace_created_at
from workspaces
group by 1