ci: Add weekly flaky test detector workflow by sl0thentr0py · Pull Request #6484 · getsentry/sentry-python

5 issues

High

`GITHUB_TOKEN` with `issues: write` is passed to the Claude action, contradicting the stated security model - `.github/workflows/flaky-test-detector.yml:112`

The security comment on lines 19–22 claims "The Claude step gets NO … write token," but `github_token: ${{ github.token }}` is explicitly passed to the action on line 112, and that token carries `issues: write` (declared at the workflow level on line 43). If a prompt injection succeeds, or if a future version of `claude-code-action` exposes the token through a built-in GitHub tool, the LLM could open or modify issues directly, bypassing the isolation that step C is designed to enforce. Consider omitting `github_token` from the Claude step entirely, or binding it to a separate fine-grained token scoped only to `contents: read` / `actions: read`.
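One way to get that read-only scope without managing a separate secret is job-level `permissions`, since `GITHUB_TOKEN` is minted per job with that job's scopes. The sketch below assumes the workflow can be split into two jobs that hand off `flaky-issue-body.md` as an artifact; the job names, the artifact name and the elided steps are hypothetical, and the Claude step's own permission requirements are not shown (it may need additional read scopes).

```yaml
# Hypothetical two-job layout: job names, the artifact name and the
# elided steps are placeholders, not the PR's actual workflow.
permissions: {}  # no workflow-level grants; each job opts in below

jobs:
  analyze:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      actions: read  # read access to workflow runs and their logs
    steps:
      # ... fetch logs into ./ci-logs/ and run the Claude step here.
      # Any github.token passed on from this job is read-only.
      - uses: actions/upload-artifact@v4
        with:
          name: flaky-issue-body
          path: flaky-issue-body.md

  report:
    needs: analyze
    runs-on: ubuntu-latest
    permissions:
      issues: write  # the only write scope, in a job that runs no LLM
    steps:
      - uses: actions/download-artifact@v4
        with:
          name: flaky-issue-body
      # ... size check, then gh issue create --body-file flaky-issue-body.md
```

With this split, nothing the model emits in `analyze` can widen that job's token; `issues: write` exists only in `report`, which executes plain shell.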

GITHUB_TOKEN with `issues: write` is passed to the LLM-controlled Claude step, contradicting the stated security model - `.github/workflows/flaky-test-detector.yml:112`

The security comment explicitly states that the Claude step gets "NO write token", but `github_token: ${{ github.token }}` is passed at line 112. The workflow-level permission grants `issues: write` to that token, so a successful prompt injection that finds or reuses the token (e.g. through the action's own GitHub API plumbing) could create or manipulate issues: exactly the capability the trust-boundary design aims to prevent.

Also found at:

.github/workflows/flaky-test-detector.yml:186-188
.github/workflows/flaky-test-detector.yml:114

Low

LLM-authored issue body posted with no size or content guard - `.github/workflows/flaky-test-detector.yml:201-215`

Step C posts `flaky-issue-body.md` verbatim to a public GitHub issue via `gh issue create --body-file`. That file is the sole output channel of the Claude step in Step B, which intentionally ingests untrusted CI log content from `./ci-logs/` (the workflow's own comments acknowledge that prompt injection from log content is possible). The shell step performs no size cap or sanity check on the file before posting.

Because Step B has no Bash/network tools and no write credential, this cannot be used to exfiltrate secrets or write code; the realistic impact is limited to a misleading or oversized issue body that a human triager will see. This is a defense-in-depth gap, not an exploitable vulnerability: the comment on line 188 ("This step never ingests untrusted log text") is technically true of the shell, but the file it reads is LLM output derived from untrusted text. A small body-size cap (and perhaps a quick sanity check) would close the gap cheaply.
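A possible pre-post guard, sketched as a workflow step that runs before `gh issue create`. The 60000-byte cap is an arbitrary placeholder, and treating an empty file as an error is an assumption about how Step B signals that it found nothing to report.

```yaml
- name: Sanity-check LLM-authored issue body
  shell: bash
  run: |
    set -eo pipefail
    body=flaky-issue-body.md
    max_bytes=60000  # hypothetical cap; tune as needed

    # Refuse to post a missing or empty body.
    if [ ! -s "$body" ]; then
      echo "::error::$body is missing or empty"
      exit 1
    fi

    # Refuse to post an oversized body instead of truncating it silently.
    size=$(wc -c < "$body")
    if [ "$size" -gt "$max_bytes" ]; then
      echo "::error::$body is $size bytes (cap: $max_bytes)"
      exit 1
    fi
```

Failing the step, rather than truncating, means a following issue-creation step with the default `success()` condition is skipped, so a runaway body never reaches the public tracker.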

Duplicate issue created when `gh issue create` succeeds server-side but times out client-side - `.github/workflows/flaky-test-detector.yml:207-215`

The `|| gh issue create` fallback (intended to retry without the missing `flaky-test` label) fires on any non-zero exit from the first call, including a network timeout that occurs after GitHub has already persisted the issue, resulting in two identical issues being opened.
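One way to avoid the double create, sketched under the assumption that the label is named `flaky-test`: decide on the label with a read-only probe of the REST "get a label" endpoint first, then call `gh issue create` exactly once. The issue title below is a placeholder, not the PR's actual title.

```yaml
- name: Open flaky-test issue
  shell: bash
  env:
    GH_TOKEN: ${{ github.token }}
  run: |
    set -eo pipefail
    label_args=()
    # Read-only probe: failing or retrying it can never create an issue.
    if gh api "repos/${GITHUB_REPOSITORY}/labels/flaky-test" --silent; then
      label_args=(--label flaky-test)
    fi
    # Single create call: a client-side timeout now fails the step
    # instead of triggering a second create. The title is a placeholder.
    gh issue create \
      --repo "$GITHUB_REPOSITORY" \
      --title "Weekly flaky test report" \
      --body-file flaky-issue-body.md \
      "${label_args[@]}"
```

A failed probe, including a transient network error, only costs the label; it can never produce a duplicate issue.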

Claude step receives github.token with issues:write, weakening the documented 'no write credential' guarantee - `.github/workflows/flaky-test-detector.yml:107`

The workflow grants `issues: write` at the job level and passes `${{ github.token }}` as `github_token` to `anthropics/claude-code-action`, which contradicts the inline security comment stating that the Claude step gets no write token. In this `schedule`/`workflow_dispatch` context the action has no default issue or PR to comment on, and `allowedTools` excludes Bash, so a direct write sink via the action is not demonstrated. However, handing a write-scoped token to an LLM agent that processes attacker-controlled CI log content removes a mechanical protection the author explicitly claims exists, leaving only the prompt's soft "treat as data" instruction.

4 skills analyzed

| Skill | Findings | Duration | Cost |
| --- | --- | --- | --- |
| security-review | 0 | 4m 17s | $0.47 |
| code-review | 2 | 2m 8s | $0.43 |
| find-bugs | 2 | 3m 43s | $0.65 |
| skill-scanner | 1 | 1m 37s | $0.22 |

⏱ 11m 45s · 232.6k in / 33.1k out · $1.77

sl0thentr0py merged 2 commits into master from neel/flaky-workflow


address security
