feat(seer): Shard night shift triage into per-chunk feature runs by trevor-e · Pull Request #118080 · getsentry/sentry

trevor-e · 2026-06-18T23:11:30Z

Night shift dispatched all scored candidates to one Seer feature run, which degrades the triage agent on large sets. This shards candidates into chunks of seer.night_shift.shard_size (default 5), dispatching each chunk as its own feature run — so max_candidates=15 triggers 3 triage agents of 5.

New SeerNightShiftRunShard model (migration 0019) makes run → SeerRun one-to-many; one shard per chunk.
Delivery resolves the run via a shard's SeerRun uuid, falling back to the legacy seer_run FK for pre-shard runs.
Legacy seer_run FK kept (first shard) for the transition; backfilled and dropped in follow-up PRs.
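The chunking arithmetic in the description can be sketched as follows. This is a minimal illustration, not the PR's code: `shard_candidates` is a hypothetical helper, and only the default of 5 and the 15-candidate example come from the description above.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def shard_candidates(candidates: Sequence[T], shard_size: int = 5) -> List[List[T]]:
    """Split scored candidates into consecutive chunks of at most shard_size.

    Hypothetical helper for illustration; the PR itself calls a `chunked`
    utility with the seer.night_shift.shard_size option.
    """
    if shard_size < 1:
        raise ValueError("shard_size must be at least 1")
    return [
        list(candidates[start : start + shard_size])
        for start in range(0, len(candidates), shard_size)
    ]


# 15 candidates with the default shard size yield 3 chunks of 5,
# i.e. one feature run per chunk.
shards = shard_candidates(list(range(15)))
```

A candidate count that is not a multiple of the shard size leaves a shorter final chunk, so 12 candidates would produce chunks of 5, 5, and 2.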

A night shift run dispatched all scored candidates to a single Seer feature run. Large candidate sets degrade a single triage agent (limited time and context), so split candidates into chunks of seer.night_shift.shard_size (default 5) and dispatch each chunk as its own feature run / SeerRun. Each shard is recorded as a new SeerNightShiftRunShard, making the run -> SeerRun relationship one-to-many. Result delivery resolves the run via a shard's SeerRun uuid, falling back to the legacy scalar seer_run FK for runs created before sharding. The legacy FK still points at the first shard during the transition; it is backfilled and dropped in follow-up PRs. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The shard model is generic to any night shift workflow, not just triage. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

github-actions · 2026-06-18T23:23:48Z

This PR has a migration; here is the generated SQL for src/sentry/seer/migrations/0019_add_night_shift_run_shard.py

for 0019_add_night_shift_run_shard in seer

--
-- Create model SeerNightShiftRunShard
--
CREATE TABLE "seer_nightshiftrunshard" ("id" bigint NOT NULL PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY, "date_updated" timestamp with time zone NOT NULL, "date_added" timestamp with time zone NOT NULL, "extras" jsonb DEFAULT '{}'::jsonb NOT NULL, "run_id" bigint NOT NULL, "seer_run_id" bigint NULL UNIQUE);
ALTER TABLE "seer_nightshiftrunshard" ADD CONSTRAINT "seer_nightshiftrunsh_run_id_574f3dc7_fk_seer_nigh" FOREIGN KEY ("run_id") REFERENCES "seer_nightshiftrun" ("id") DEFERRABLE INITIALLY DEFERRED NOT VALID;
ALTER TABLE "seer_nightshiftrunshard" VALIDATE CONSTRAINT "seer_nightshiftrunsh_run_id_574f3dc7_fk_seer_nigh";
ALTER TABLE "seer_nightshiftrunshard" ADD CONSTRAINT "seer_nightshiftrunshard_seer_run_id_aa8616a4_fk_seer_seerrun_id" FOREIGN KEY ("seer_run_id") REFERENCES "seer_seerrun" ("id") DEFERRABLE INITIALLY DEFERRED NOT VALID;
ALTER TABLE "seer_nightshiftrunshard" VALIDATE CONSTRAINT "seer_nightshiftrunshard_seer_run_id_aa8616a4_fk_seer_seerrun_id";
CREATE INDEX CONCURRENTLY "seer_nightshiftrunshard_run_id_574f3dc7" ON "seer_nightshiftrunshard" ("run_id");

Each SeerRun is dispatched for exactly one shard, so model the link as one-to-one to enforce the invariant at the DB level. Keep SET_NULL: the SeerRun is a mirror the shard references, not its owner, and gets TTL-cleaned. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
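The invariant described above (at most one shard per SeerRun, with the shard surviving the SeerRun's cleanup) can be imitated with an in-memory SQLite database. This is only a sketch under assumptions: the table names are simplified, SQLite stands in for Postgres, and `ON DELETE SET NULL` is used here to mimic the SET_NULL behavior even though the generated SQL above carries no such clause.

```python
import sqlite3

SCHEMA = """
CREATE TABLE seer_run (id INTEGER PRIMARY KEY);
CREATE TABLE shard (
    id INTEGER PRIMARY KEY,
    -- Nullable + UNIQUE: at most one shard per SeerRun, many shards with NULL.
    seer_run_id INTEGER UNIQUE REFERENCES seer_run (id) ON DELETE SET NULL
);
"""


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite disables FK enforcement by default
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO seer_run (id) VALUES (1)")
    conn.execute("INSERT INTO shard (id, seer_run_id) VALUES (10, 1)")
    try:
        # A second shard pointing at the same SeerRun violates UNIQUE.
        conn.execute("INSERT INTO shard (id, seer_run_id) VALUES (11, 1)")
        duplicate_rejected = False
    except sqlite3.IntegrityError:
        duplicate_rejected = True
    # A shard with no SeerRun yet is allowed.
    conn.execute("INSERT INTO shard (id, seer_run_id) VALUES (12, NULL)")
    # Simulate TTL cleanup of the SeerRun: the shard row stays, its link is nulled.
    conn.execute("DELETE FROM seer_run WHERE id = 1")
    rows = conn.execute("SELECT id, seer_run_id FROM shard ORDER BY id").fetchall()
    conn.close()
    return duplicate_rejected, rows


duplicate_rejected, rows = demo()
```

After the cleanup both remaining shards hold a NULL link, which the unique constraint permits because NULLs are not considered equal to each other.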

trevor-e · 2026-06-19T00:07:07Z

-    except SeerNightShiftRun.DoesNotExist:
+    run = (
+        SeerNightShiftRun.objects.filter(organization_id=organization_id)
+        .filter(Q(shards__seer_run__uuid=run_uuid) | Q(seer_run__uuid=run_uuid))


This is to be backwards compatible until I perform a data migration.
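The lookup in the diff above matches a run either through one of its shards or through the legacy scalar FK. A plain-Python sketch of the same rule, with hypothetical dataclasses standing in for the Django models and a list standing in for the queryset:

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class Shard:
    seer_run_uuid: Optional[str]  # None if the SeerRun was cleaned up


@dataclass
class NightShiftRun:
    organization_id: int
    legacy_seer_run_uuid: Optional[str] = None  # pre-shard rows only
    shards: List[Shard] = field(default_factory=list)


def resolve_run(
    runs: Iterable[NightShiftRun], organization_id: int, run_uuid: str
) -> Optional[NightShiftRun]:
    """Return the run owning run_uuid, via a shard or the legacy FK."""
    for run in runs:
        if run.organization_id != organization_id:
            continue
        via_shard = any(shard.seer_run_uuid == run_uuid for shard in run.shards)
        via_legacy = run.legacy_seer_run_uuid == run_uuid
        if via_shard or via_legacy:
            return run
    return None


sharded = NightShiftRun(organization_id=1, shards=[Shard("a"), Shard("b")])
legacy = NightShiftRun(organization_id=1, legacy_seer_run_uuid="old")
runs = [sharded, legacy]
```

Here `resolve_run(runs, 1, "b")` finds the sharded run through its second shard, while `resolve_run(runs, 1, "old")` finds the pre-shard run through the fallback.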

Sharded runs share one SeerNightShiftRun, so writing per-delivery error_message to the run let one shard's success clear another shard's error, and pinned the legacy seer_run FK to shard_index 0 even if that chunk failed to dispatch. Record delivery errors on the shard, and point the legacy FK at the first successfully dispatched shard. Addresses Cursor review. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Nothing outside the delivery fallback reads SeerNightShiftRun.seer_run, and sharded runs resolve via their shards. Leave the scalar FK null on new runs instead of pointing it at the first shard — this drops the first-shard tracking in dispatch and is a step toward removing the column once pre-shard rows are backfilled. The delivery read-fallback stays for those rows until then. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Per-shard delivery errors record on SeerNightShiftRunShard.extras, but the run serializer read errorMessage only from the run, so a failed shard could read as a healthy run. Surface a shard error_message when the run itself has none. Addresses Cursor review. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
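The serializer fallback described in this commit could look roughly like the sketch below. The function name, the `error_message` key inside `extras`, and the choice to surface the first shard error found are assumptions for illustration; the commit only states that a shard error is surfaced when the run has none.

```python
from typing import Any, Dict, Iterable, Optional


def serialize_error_message(
    run_error_message: Optional[str],
    shard_extras: Iterable[Dict[str, Any]],
) -> Optional[str]:
    """Prefer the run-level error; otherwise surface a shard-level one."""
    if run_error_message:
        return run_error_message
    for extras in shard_extras:
        message = extras.get("error_message")  # assumed key name
        if message:
            return message
    return None


healthy = serialize_error_message(None, [{}, {}])
shard_failed = serialize_error_message(None, [{}, {"error_message": "delivery failed"}])
run_failed = serialize_error_message("dispatch failed", [{"error_message": "delivery failed"}])
```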

shard.seer_run is Optional (nullable OneToOne), so tests asserting on it tripped union-attr under CI mypy. Use the non-null SeerRun from the dispatch helper / captured locals instead. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit b5a6802.

When some shards failed to dispatch but at least one succeeded, the run was treated as fully successful and the API errorMessage stayed empty, hiding that candidates in the failed chunks were never triaged. Record a run-level error_message for partial failures (delivery only clears per-shard errors, so it persists) and emit a metric. Addresses Cursor review. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
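The partial-failure rule in this commit can be sketched as a small classifier over per-shard dispatch outcomes. The function name and the message wording are hypothetical; the commit only says that a run-level error_message is recorded and a metric is emitted when some, but not all, shards fail to dispatch.

```python
from typing import Optional, Sequence, Tuple


def summarize_dispatch(results: Sequence[bool]) -> Tuple[Optional[str], bool]:
    """Return (run-level error message, is_partial_failure).

    `results` holds one boolean per shard: True if the shard dispatched.
    """
    total = len(results)
    failed = sum(1 for ok in results if not ok)
    if failed == 0:
        return None, False
    if failed == total:
        return f"all {total} shards failed to dispatch", False
    # Some shards succeeded: the run proceeds, but the failure stays visible.
    return f"{failed} of {total} shards failed to dispatch", True


message, is_partial = summarize_dispatch([True, False, True])
```

A caller would store `message` on the run and emit the metric only when `is_partial` is true, so that later per-shard deliveries do not erase the record of the chunks that were never triaged.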

sentry · 2026-06-19T07:44:28Z

-            on_run_created=_link_run,
-        )
-    except Exception:
+    shards = list(chunked(scored, options.get("seer.night_shift.shard_size")))


Bug: The seer.night_shift.shard_size option lacks validation. If set to 0, the chunked function will group all items into a single shard, defeating the purpose of sharding.
Severity: MEDIUM

Suggested Fix

Add validation to the seer.night_shift.shard_size option registration in src/sentry/options/defaults.py to enforce a minimum value of 1. Alternatively, add a check in src/sentry/tasks/seer/night_shift/cron.py before calling chunked to handle a shard_size of 0 or less, perhaps by falling back to the default value or raising an error.

Prompt for AI Agent

Review the code at the location below. A potential bug has been identified by an AI agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not valid. Location: src/sentry/tasks/seer/night_shift/cron.py#L539 Potential issue: The `seer.night_shift.shard_size` option, which is modifiable by operators, lacks validation to prevent non-positive values. If an operator sets this value to `0`, the `chunked` utility at `src/sentry/tasks/seer/night_shift/cron.py:539` will not create multiple small shards. Instead, it will silently create a single, large chunk containing all items. This defeats the purpose of the sharding logic, which is to prevent performance degradation in the triage agent by processing large candidate sets. The result is that all candidates are sent to the agent at once.
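To make the reported failure mode concrete, the sketch below uses a stand-in `chunked` that behaves the way the report describes (it is not claimed to be Sentry's actual helper) and a hypothetical `safe_shard_size` guard that falls back to the default, one of the two fixes the report suggests.

```python
from typing import Any, Iterable, Iterator, List

DEFAULT_SHARD_SIZE = 5  # default from the PR description


def chunked(items: Iterable[Any], size: int) -> Iterator[List[Any]]:
    """Stand-in chunker: yields a chunk whenever it reaches exactly `size`."""
    chunk: List[Any] = []
    for item in items:
        chunk.append(item)
        if len(chunk) == size:  # never true for size <= 0
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def safe_shard_size(value: Any, default: int = DEFAULT_SHARD_SIZE) -> int:
    """Fall back to the default for non-integer or non-positive values."""
    if isinstance(value, int) and not isinstance(value, bool) and value >= 1:
        return value
    return default


unguarded = list(chunked(range(7), 0))
guarded = list(chunked(range(7), safe_shard_size(0)))
```

With a size of 0 the unguarded call returns a single chunk holding all 7 items, while the guarded call splits them into chunks of 5 and 2.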

github-actions Bot added the "Scope: Backend" label (automatically applied to PRs that change backend components) Jun 18, 2026

docs(seer): Make SeerNightShiftRunShard docstring workflow-agnostic

291a074

cursor Bot reviewed Jun 18, 2026

Comment thread src/sentry/seer/night_shift/delivery.py

Comment thread src/sentry/tasks/seer/night_shift/cron.py Outdated

trevor-e commented Jun 18, 2026

Comment thread src/sentry/tasks/seer/night_shift/cron.py Outdated

trevor-e commented Jun 19, 2026

cursor Bot reviewed Jun 19, 2026

Comment thread src/sentry/seer/night_shift/delivery.py

trevor-e and others added 3 commits June 18, 2026 20:18

cursor Bot reviewed Jun 19, 2026

Comment thread src/sentry/tasks/seer/night_shift/cron.py

chromy marked this pull request as ready for review June 19, 2026 07:41

chromy requested review from a team as code owners June 19, 2026 07:41

sentry Bot reviewed Jun 19, 2026

chromy approved these changes Jun 19, 2026

chromy merged commit 626bb09 into master Jun 19, 2026
86 checks passed

chromy deleted the telkins/night-shift-shard-triage branch June 19, 2026 07:49

chromy merged 8 commits into master from telkins/night-shift-shard-triage
