You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Codex transcript review and commit-history analysis surfaced several recurring, mechanically detectable failure patterns that should be codified through existing repo checks. The goal is to capture only clear wins: low-noise checks that would have prevented repeated review findings, bugfix follow-ups, or command/workflow mistakes.
Methodology
I scanned local Codex transcript data from /Users/dcramer/.codex and then cross-checked the themes against repository commit history.
Transcript coverage:
1,239 Codex transcript JSONL files total.
469 files with exact cwd /Users/dcramer/src/junior.
581 files with Junior-like cwd, including Codex worktrees.
797 sessions that materially reference Junior.
Final reference-inclusive pass found 7,870 Junior-relevant messages and 3,048 finding/review-like messages.
Signals used:
Structured JSONL parsing of session metadata, user messages, assistant messages, tool commands, and command failures.
Filtering for finding/review-like text: severity labels, impact:, fix:, violates, review task prompts, and subagent notifications.
Command-pattern scans for known repo workflow mistakes such as evals through raw Vitest or -- forwarding.
Commit-history scan over 1,645 visible commits, including 657 fix-like commits.
Strong commit-history clusters:
Slack/message/routing: 147 fix-like commits.
Auth/credentials: 85.
Sandbox/egress: 83.
Plugin/runtime boundaries: 72.
Observability: 66.
Deploy/build/package tracing: 64.
SQL/conversation metadata: 55.
Queue/task execution: 45.
Existing check surfaces to reuse:
pnpm lint, already runs oxlint, ast-grep scan, and package:lint.
pnpm typecheck, focused package scripts, and targeted component/integration tests.
Tasks
Add an eval command/documentation check.
Detect and fail on eval guidance/examples that use raw pnpm exec vitest, pnpm exec vitest ... *.eval.ts, pnpm --filter @sentry/junior-evals evals -- ..., or evals ... -- -t.
Scope to AGENTS.md, packages/junior-evals/README.md, docs, specs, and eval package files.
Wire into pnpm lint.
Evidence: repeated transcript command drift plus fix: Route Sentry telemetry and simplify evals, fix(evals): Use runtime adapter overrides, and fix(junior-evals): Align harness with eval types.
Add an eval harness/schema drift check.
Detect old repo-local eval result surfaces such as result.output, assistant_posts, channel_posts, observed_tool_invocations, local transcript/event-log schemas, and contract/allow rubric remnants.
Prefer an ast-grep or focused text/AST check over broad grep where practical.
Add SQL/conversation boundary checks and invariant tests.
Static checks: ban generic raw row claims like query<T>() and raw SQL execution outside provider/migration modules.
Test invariants: backfill is non-destructive, SQL execution state is monotonic, transient work/lease fields do not leak into conversation metadata records.
Evidence: dense SQL/conversation fix chain around projection isolation, monotonic execution, leases, migrations, and backfill.
Keep Slack routing/message behavior primarily in integration/eval coverage, not static lint.
Static lint should only cover hard boundary mistakes, such as forbidden imports or old Slack test helpers.
Evidence: Slack has the largest fix cluster, but most issues are behavior contracts rather than syntax-level mistakes.
Keep broad "no fallback" as review guidance, not a generic linter.
Codify only narrow typed-boundary variants, such as auth/egress structured signals.
Evidence: transcript count is high, but broad fallback detection would be noisy and intent-dependent.
Recommended First Slice
Eval command/documentation check.
Eval harness/schema drift check.
Changed-spec metadata check.
Runtime singleton/test mutation ast-grep rules.
Production composition-root import boundary check.
These are the clearest low-noise wins and fit the existing pnpm lint/ast-grep setup.
Summary
Codex transcript review and commit-history analysis surfaced several recurring, mechanically detectable failure patterns that should be codified through existing repo checks. The goal is to capture only clear wins: low-noise checks that would have prevented repeated review findings, bugfix follow-ups, or command/workflow mistakes.
Methodology
I scanned local Codex transcript data from
/Users/dcramer/.codexand then cross-checked the themes against repository commit history.Transcript coverage:
/Users/dcramer/src/junior.Signals used:
impact:,fix:,violates, review task prompts, and subagent notifications.--forwarding.Strong commit-history clusters:
Existing check surfaces to reuse:
pnpm lint, already runsoxlint,ast-grep scan, andpackage:lint.sgconfig.ymlwith rules inast-grep/rules.scripts/check-release-config.mjs.pnpm typecheck, focused package scripts, and targeted component/integration tests.Tasks
Add an eval command/documentation check.
pnpm exec vitest,pnpm exec vitest ... *.eval.ts,pnpm --filter @sentry/junior-evals evals -- ..., orevals ... -- -t.AGENTS.md,packages/junior-evals/README.md, docs, specs, and eval package files.pnpm lint.fix: Route Sentry telemetry and simplify evals,fix(evals): Use runtime adapter overrides, andfix(junior-evals): Align harness with eval types.Add an eval harness/schema drift check.
result.output,assistant_posts,channel_posts,observed_tool_invocations, local transcript/event-log schemas, andcontract/allowrubric remnants.pnpm lint.vitest-evalsprimitives.Add a changed-spec metadata check.
specs/**/*.md, requireLast Editedand a changelog entry to be updated.scripts/check-release-config.mjs.pnpm lintorpnpm docs:check.Last Editedmetadata after spec changes.Add ast-grep rules for forbidden runtime singleton/test mutation patterns.
set*ForTestsandreset*ForTestsAPIs in runtime code.default*Store/singleton caches outside explicit allowlisted modules.packages/junior/src.Add a production composition-root import boundary check.
@/chat/app/productionoutside explicit app/composition-root allowlists.oxlint no-restricted-importsor ast-grep, whichever produces the clearer exception model.Add a plugin/runtime boundary schema ownership check.
packages/junior-plugin-api/src/**/*.Add a public API barrel/export check.
export *from package roots except explicit allowlists.packages/*/src/index.ts.export * from "./prompt"publishing dormant plugin APIs andexport * from "./json"adding unused public surface.Add an env/config docs alignment check.
packages/docs/src/content/docs/reference/config-and-env.mdand deploy docs.packages/junior/src/chat/config.tsand SQL/plugin database config.Add narrow auth/egress structured-signal checks.
fix(auth): consume egress auth signals unconditionally, remove heuristic detection,fix(sandbox): always consume egress auth signals regardless of exit code, andfix(auth): remove provider fallback.Add SQL/conversation boundary checks and invariant tests.
query<T>()and raw SQL execution outside provider/migration modules.Keep Slack routing/message behavior primarily in integration/eval coverage, not static lint.
Keep broad "no fallback" as review guidance, not a generic linter.
Recommended First Slice
These are the clearest low-noise wins and fit the existing
pnpm lint/ast-grepsetup.