Skip to content

fix(junior): keep resume diagnostics cumulative#385

Closed
omribz156 wants to merge 2 commits into
getsentry:mainfrom
omribz156:codex/timeout-resume-diagnostics-footer
Closed

fix(junior): keep resume diagnostics cumulative#385
omribz156 wants to merge 2 commits into
getsentry:mainfrom
omribz156:codex/timeout-resume-diagnostics-footer

Conversation

@omribz156
Copy link
Copy Markdown
Contributor

Fixes #384

Summary

  • add cumulative duration and token usage fields to paused turn checkpoints
  • use the prior checkpoint totals when rendering the final timeout-resume Slack footer
  • post timeout continuation notices with the normal Slack reply footer blocks so the conversation ID stays traceable

Verification

  • pnpm install --frozen-lockfile
  • pnpm --filter @sentry/junior exec vitest run tests/unit/services/turn-checkpoint.test.ts tests/unit/handlers/oauth-resume.test.ts
  • pnpm --filter @sentry/junior exec tsc --noEmit --pretty false
  • pnpm --filter @sentry/junior lint
  • pnpm --filter @sentry/junior build
  • git diff --check

Implemented with Codex assistance, then manually reviewed and kept to the pause/resume diagnostics path.

sentry-junior Bot and others added 2 commits May 20, 2026 11:56
…es (getsentry#383)

## Summary

The tool error handler (`handleToolExecutionError`) sent all non-MCP
tool errors to Sentry via `logException`, even when the error was caused
by invalid model input — missing files, bad regex, ambiguous edits, path
traversal. This created noisy alerts for expected conditions like
[JUNIOR-2Q](https://sentry.sentry.io/issues/7495355493/).

This introduces `ToolInputError` as the single classification mechanism
for expected model/user-caused tool failures. Domain code throws
`ToolInputError` at the source where it knows the failure is
input-caused, not a system error. The global error handler respects
this: no `logException`, no Sentry noise.

## Architecture

```
Domain code throws ToolInputError  → handleToolExecutionError → logWarn only → rethrow → Pi tool_error
Domain code throws Error (system)  → handleToolExecutionError → logException (Sentry) → rethrow
MCP tool returns isError           → McpToolError             → logWarn only → rethrow → Pi tool_error
```

No ad-hoc error-type catches at the executor level. The domain function
that creates the error owns the classification.

## Changes

**New:**
- `tool-input-error.ts` — `ToolInputError` class

**Error handler:**
- `tool-error-handler.ts` — skip `logException` for `ToolInputError`;
replace `getMcpAwareErrorType` with `getToolErrorType()` covering both
MCP and input error classification

**Domain sources that now throw `ToolInputError`:**
- `edit-file.ts` — ENOENT on `fs.readFile` → `ToolInputError("File not
found: ...")`
- `text-edits.ts` — all validation (not found, ambiguous, overlap,
no-op, empty oldText, bad newText)
- `file-utils.ts` — workspace path traversal
- `list-dir.ts` — not-a-directory
- `grep.ts` — invalid regex pattern
- `sandbox.ts` — missing required arguments (path, edits, pattern,
command)

**What stays as system errors (still reports to Sentry):**
- Sandbox API failures (`throwSandboxOperationError`)
- Slack API errors
- Network/connection failures
- Non-ENOENT filesystem errors
- `unsupported sandbox tool` (config error)

## Tests

- **`tool-error-handler.test.ts`** (new) — system errors call
`logException`; `ToolInputError` and `McpToolError` do not
- **`file-tools.test.ts`** — `ToolInputError` for ambiguous edits, text
not found, path traversal, invalid regex, file-as-directory
- **`sandbox-executor.test.ts`** — editFile on missing path throws
`ToolInputError`

## Verification

- `pnpm typecheck` ✅
- 51/51 tests passing ✅
- Lint + prettier ✅

Fixes [JUNIOR-2Q](https://sentry.sentry.io/issues/7495355493/)

Requested-By: David Cramer <dcramer>

Co-authored-by: Junior <junior[bot]@sentry.io>
Co-authored-by: Claude (anthropic/claude-opus-4.6) <[email protected]>
Track cumulative duration and token usage across paused turn checkpoints, then use those totals when rendering timeout-resume Slack footers. Timeout continuation notices also now use the normal Slack footer shape so the conversation ID stays traceable.

Co-Authored-By: GPT-5 Codex <[email protected]>
@vercel
Copy link
Copy Markdown

vercel Bot commented May 20, 2026

@omribz156 is attempting to deploy a commit to the Sentry Team on Vercel.

A member of the Team first needs to authorize it.

@dcramer
Copy link
Copy Markdown
Member

dcramer commented May 21, 2026

Superseded by #388, which keeps the timeout diagnostics changes and adds the broader continuation/failure recovery work.

dcramer added a commit that referenced this pull request May 21, 2026
Persist safe running checkpoints at Pi-continuable boundaries so interrupted turns retain a previous resumable state without checkpointing mid-tool-call.

Treat interrupted sandbox command streams as failed bash results that the agent can inspect or retry, and add eval fault injection to prove the recovery path.

Supersedes GH-385
Co-Authored-By: GPT-5 Codex <[email protected]>
dcramer added a commit that referenced this pull request May 21, 2026
Preserve the timeout diagnostics work from #385 and broaden the
continuation recovery strategy for interrupted turns. Junior now writes
safe in-progress checkpoints at Pi-continuable boundaries, posts
consistent Slack continuation notices, and treats interrupted sandbox
command streams as recoverable bash results instead of terminal agent
failures.

**Safe Continuation Boundaries**

Running checkpoints are persisted only when Pi can legally continue from
the transcript, such as after the user message or after tool results are
recorded. This keeps progress available without checkpointing unsafe
mid-tool-call state.

**Sandbox Stream Recovery**

A sandbox command stream that ends before a command finishes now returns
a failed bash result with guidance to inspect or retry. The agent can
continue from that durable tool result rather than failing the whole
turn.

**Fault-Injection Coverage**

The eval harness can inject sandbox bash stream interruptions, and the
lifecycle eval proves the real agent recovers by retrying and finishing
the request.

Supersedes #385

---------

Co-authored-by: Omri SirComp <[email protected]>
Co-authored-by: GPT-5 Codex <[email protected]>
Co-authored-by: GPT-5 Codex <[email protected]>
@dcramer
Copy link
Copy Markdown
Member

dcramer commented May 21, 2026

Thanks!

@dcramer dcramer closed this May 21, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Pause/resume turns missing footer and cumulative stats

2 participants