feat(flue): port skill-drift system to Flue framework (side-by-side) #127
2 issues
security-review: Found 2 issues (2 medium)
Medium
Prompt injection via PR-controlled SDK checkout lets manipulated LLM agent write to sentry-for-ai skills - `.flue/agents/skill-drift-detector.ts:50-62`
The detector agent runs with sandbox: 'local' and receives the SDK repo checkout path in its prompt, so adversarial content in a merged PR's files (source comments, README, AGENTS.md, etc.) can manipulate the LLM to emit a crafted create_pr patch that the actuate job then applies to getsentry/sentry-for-ai using a privileged GitHub App token. Move agent execution to an isolated sandbox without access to the PR checkout, or run the detector only against the PR diff (via gh pr diff) without providing a local filesystem path.
Also found at:
.flue/roles/detector.md:43-46.github/workflows/flue-skill-drift-detector-reusable.yml:213-220docs/agent-port/04-flue-implementation.md:31-32docs/agent-port/sdk-repo-wrappers/sentry-android.yml:19
LLM-generated patches that create new files bypass the `skills/` path allowlist - `.github/workflows/flue-skill-drift-detector-reusable.yml:211`
The path-allowlist guard uses git diff --name-only (working-tree vs. index), which does not list untracked files created by git apply (run without --index). A patch that adds a new file outside skills/ — e.g., .github/workflows/evil.yml — produces an empty touched variable, passes the allowlist loop, and then gets staged and committed by the subsequent git add -A, opening a path to committing arbitrary content into sentry-for-ai.
⏱ 18m 12s · 2.1M in / 239.4k out · $7.99
Annotations
Check warning on line 62 in .flue/agents/skill-drift-detector.ts
sentry-warden / warden: security-review
Prompt injection via PR-controlled SDK checkout lets manipulated LLM agent write to sentry-for-ai skills
The detector agent runs with `sandbox: 'local'` and receives the SDK repo checkout path in its prompt, so adversarial content in a merged PR's files (source comments, README, `AGENTS.md`, etc.) can manipulate the LLM to emit a crafted `create_pr` patch that the actuate job then applies to `getsentry/sentry-for-ai` using a privileged GitHub App token. Move agent execution to an isolated sandbox without access to the PR checkout, or run the detector only against the PR diff (via `gh pr diff`) without providing a local filesystem path.
Check warning on line 46 in .flue/roles/detector.md
sentry-warden / warden: security-review
[L28-7CY] Prompt injection via PR-controlled SDK checkout lets manipulated LLM agent write to sentry-for-ai skills (additional location)
The detector agent runs with `sandbox: 'local'` and receives the SDK repo checkout path in its prompt, so adversarial content in a merged PR's files (source comments, README, `AGENTS.md`, etc.) can manipulate the LLM to emit a crafted `create_pr` patch that the actuate job then applies to `getsentry/sentry-for-ai` using a privileged GitHub App token. Move agent execution to an isolated sandbox without access to the PR checkout, or run the detector only against the PR diff (via `gh pr diff`) without providing a local filesystem path.
Check warning on line 220 in .github/workflows/flue-skill-drift-detector-reusable.yml
sentry-warden / warden: security-review
[L28-7CY] Prompt injection via PR-controlled SDK checkout lets manipulated LLM agent write to sentry-for-ai skills (additional location)
The detector agent runs with `sandbox: 'local'` and receives the SDK repo checkout path in its prompt, so adversarial content in a merged PR's files (source comments, README, `AGENTS.md`, etc.) can manipulate the LLM to emit a crafted `create_pr` patch that the actuate job then applies to `getsentry/sentry-for-ai` using a privileged GitHub App token. Move agent execution to an isolated sandbox without access to the PR checkout, or run the detector only against the PR diff (via `gh pr diff`) without providing a local filesystem path.
Check warning on line 32 in docs/agent-port/04-flue-implementation.md
sentry-warden / warden: security-review
[L28-7CY] Prompt injection via PR-controlled SDK checkout lets manipulated LLM agent write to sentry-for-ai skills (additional location)
The detector agent runs with `sandbox: 'local'` and receives the SDK repo checkout path in its prompt, so adversarial content in a merged PR's files (source comments, README, `AGENTS.md`, etc.) can manipulate the LLM to emit a crafted `create_pr` patch that the actuate job then applies to `getsentry/sentry-for-ai` using a privileged GitHub App token. Move agent execution to an isolated sandbox without access to the PR checkout, or run the detector only against the PR diff (via `gh pr diff`) without providing a local filesystem path.
Check warning on line 19 in docs/agent-port/sdk-repo-wrappers/sentry-android.yml
sentry-warden / warden: security-review
[L28-7CY] Prompt injection via PR-controlled SDK checkout lets manipulated LLM agent write to sentry-for-ai skills (additional location)
The detector agent runs with `sandbox: 'local'` and receives the SDK repo checkout path in its prompt, so adversarial content in a merged PR's files (source comments, README, `AGENTS.md`, etc.) can manipulate the LLM to emit a crafted `create_pr` patch that the actuate job then applies to `getsentry/sentry-for-ai` using a privileged GitHub App token. Move agent execution to an isolated sandbox without access to the PR checkout, or run the detector only against the PR diff (via `gh pr diff`) without providing a local filesystem path.
Check warning on line 211 in .github/workflows/flue-skill-drift-detector-reusable.yml
sentry-warden / warden: security-review
LLM-generated patches that create new files bypass the `skills/` path allowlist
The path-allowlist guard uses `git diff --name-only` (working-tree vs. index), which does not list untracked files created by `git apply` (run without `--index`). A patch that adds a new file outside `skills/` — e.g., `.github/workflows/evil.yml` — produces an empty `touched` variable, passes the allowlist loop, and then gets staged and committed by the subsequent `git add -A`, opening a path to committing arbitrary content into `sentry-for-ai`.