fix(agents): Explore agent type is *specifically* not to be used for code review#2098
Conversation
61695e7 to
878d86c
Compare
Site previewPreview: https://cd93ba24-site.fullsend-ai.workers.dev Commit: |
|
🤖 Finished Review · ✅ Success · Started 8:30 PM UTC · Completed 8:38 PM UTC |
ReviewFindingsMedium
Low
Info
Previous runReviewFindingsMedium
Low
Info
Previous run (2)ReviewFindingsHigh
Medium
Low
Info
Previous run (3)ReviewFindingsMedium
Low
Info
Previous run (4)ReviewFindingsMedium
Low
Info
Previous run (5)ReviewFindingsMedium
Low
Previous run (6)ReviewFindingsMedium
Low
Previous run (7)ReviewFindingsMedium
Low
Previous run (8)ReviewFindingsMedium
Low
Previous run (9)ReviewFindingsMedium
Low
Previous runReviewFindingsHigh
Low
Previous run (10)ReviewFindingsMedium
Low
Previous run (11)ReviewFindingsHigh
Low
|
a6bfcd5 to
b4d01d4
Compare
|
🤖 Finished Review · ✅ Success · Started 8:59 PM UTC · Completed 9:11 PM UTC |
|
Good catch! |
|
@rh-hemartin, @rbean - FWIW, FYI; The In my mind, the likelihood of Claude Code discovering this randomly, during a code review, is miniscule, so, by Occam's Chisel (not as nice/precise as the Razor 😁) it must be part of the system prompt[2][3]. TL;DR - it's not an hallucination. Having said all of that: the security implications are real, and severe. If I can get an answer in PR 192, and the answer is "diffs are the way", then we can stick with Explore type and moot this PR. [1] Summarizing, not quoting verbatim. |
|
Ok, thinking. Explore has the right permission profile (no write tools) but the wrong behavioral disposition (reads excerpts, not whole files). Switching to general-purpose fixes the behavioral problem but opens a privilege escalation surface because Can we solve this at the sandbox layer instead of the agent type layer? Right now filesystem_policy:
include_workdir: true
read_only: [/usr, /lib, /proc, /dev/urandom, /app, /etc, /var/log, /sandbox/workspace/target-repo]
read_write: [/sandbox, /tmp, /dev/null]With that, it wouldn't matter what Then, we could make sub-agents full "general-purpose" agents. Does usage of read_only like that actually work? |
|
EDIT: Experiment at fullsend-ai/experiments/pull/31 tl;dr - No.
🤷 @maruiz93? 🤣 |
|
Try it! :D |
b4d01d4 to
bb7b589
Compare
|
🤖 Review · ❌ Terminated · Started 7:36 PM UTC · Ended 7:49 PM UTC |
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
|
🤖 Finished Review · ✅ Success · Started 7:36 PM UTC · Completed 7:49 PM UTC |
bb7b589 to
617f74d
Compare
|
🤖 Finished Review · ✅ Success · Started 1:22 PM UTC · Completed 1:39 PM UTC |
617f74d to
f4dca6d
Compare
|
🤖 Review · ❌ Terminated · Started 4:21 PM UTC · Ended 4:35 PM UTC |
|
🤖 Finished Review · ✅ Success · Started 4:21 PM UTC · Completed 4:35 PM UTC |
f4dca6d to
4f25a42
Compare
|
🤖 Review · |
4f25a42 to
222769c
Compare
|
🤖 Finished Review · ✅ Success · Started 6:00 PM UTC · Completed 6:19 PM UTC |
ralphbean
left a comment
There was a problem hiding this comment.
Typo at review.md:98 — pre-review should be pr-review. The challenger dispatch at step 6d (SKILL.md ~495) still uses the old params too. And I'm not confident the new parameter names are right — see inline.
| - `run_in_background`: `true` | ||
| - `prompt`: composed from parts 1–5 | ||
| the Agent tool's `model` parameter) | ||
| - `permissionMode`: `dontAsk` |
There was a problem hiding this comment.
[important] Are these parameter names verified against a live Agent tool call? The schema I can see has prompt, run_in_background, subagent_type, description — the new ones (name, tools, permissionMode, background, initialPrompt) don't appear. If we're swapping one set of unverified params for another, we should confirm they actually work before merging.
Also — the challenger dispatch at step 6d (~line 495) still uses the old subagent_type: Explore / prompt params. Worth updating to match.
There was a problem hiding this comment.
Verified in the Anthropic docs https://code.claude.com/docs/en/sub-agents#supported-frontmatter-fields
EDIT: I keep forgetting that you guys don't necessarily read commit messages 😁
There was a problem hiding this comment.
The Agent tool's own JSON schema — you can see it in any Claude Code session by looking at what parameters the Agent tool accepts. It takes description, prompt, subagent_type, run_in_background, resume, and isolation.
The page you linked (code.claude.com/docs/en/sub-agents#supported-frontmatter-fields) lists frontmatter fields for agent definition .md files — name, tools, permissionMode, initialPrompt, etc. These go in the .md file's YAML header, not in the Agent tool call.
So there are two interfaces here, and I think the PR is crossing them:
- Agent tool call: the orchestrator calls the Agent tool with
prompt,subagent_type,run_in_background - Sub-agent definition frontmatter: the
.mdfile declaresname,tools,permissionMode, etc.
The old SKILL.md used valid Agent tool params (subagent_type: Explore, run_in_background: true, prompt). The new version uses frontmatter fields as if they were Agent tool params.
ADR 0027 in this repo covers the distinction — tools in frontmatter restricts what a subagent can access, but it's a property of the definition file, not something you pass to the Agent tool.
If the goal is to restrict sub-agent tools to Read, Grep, Glob (which I think is a good idea), we could add tools: [Read, Grep, Glob] to the sub-agent .md frontmatter and keep using valid Agent tool params for the invocation.
There was a problem hiding this comment.
If the goal is to restrict sub-agent tools to Read, Grep, Glob (which I think is a good idea), we could add tools: [Read, Grep, Glob] to the sub-agent .md frontmatter
Agreed, but I'd take it a step further: IMO, all of that argument marshalling in the skill is wasted tokens anyway; Claude Code knows how to invoke subagents (using the Agent tool, which presumably Claude Code knows how to populate from the subagent definitions) - all we need to do is instruct it to build up the prompt the way we want.
222769c to
e17c1a8
Compare
|
🤖 Review · |
d8c8385 to
533d579
Compare
|
🤖 Finished Review · ✅ Success · Started 8:08 PM UTC · Completed 8:25 PM UTC |
In working this PR, I had to *carefully* review the Claude Code docs. *This* commit is an outcome of that review. - There is no such thing as a "subagent_type" - it's an hallucination. The only existing fields are listed in https://code.claude.com/docs/en/sub-agents#supported-frontmatter-fields - Agent definitions *are not composable* - The builtin agents (e.g. Explore) are defined in *exactly* the same way as custom agents - There is no mechanism for "subclassing" agents - The recommended way to restrict subagents is using one or both of `tools`/`disallowedTools` - `permissionMode`: `dontAsk` (unintuitively) forbids Claude Code from executing any action *which would normally require user approval* - There is no such thing as "run_in_background" - another hallucination - `prompt` is under-specified: it *does not* exist in the list of frontmatter fields mentioned above, but it *can* be used when invoking the `--agents` command line flag. `initialPrompt` exists, but its description is a little fuzzy. Going conservative here. Signed-off-by: Ben Alkov <ben.alkov@redhat.com>
Review agents should not modify the target repository - add harness field `readonly_repo` (boolean) which enforces `chmod -R a-w` on the cloned repo, after copy - tighten the review agent's sandbox policy to exclude workdir writes Assisted-by: Claude Code (Opus 4.6) Signed-off-by: Ben Alkov <ben.alkov@redhat.com>
533d579 to
c1d2b49
Compare
|
🤖 Finished Review · ✅ Success · Started 2:33 PM UTC · Completed 2:48 PM UTC |
2nd EDIT:
In working this PR, I had to carefully review the Claude Code docs. This commit is an outcome of that review.
tools/disallowedToolspermissionMode:dontAsk(unintuitively) forbids Claude Code from executing any action which would normally require user approvalpromptis under-specified: it does not exist in the list of frontmatter fields mentioned above, but it can be used when invoking the--agentscommand line flag.initialPromptexists, but its description is a little fuzzy. Going conservative here.EDIT: This change also requires a supporting mechanism for blocking writes to "/sandbox/workspace/target-repo".
After much discussion, we've settled on a harness step which runs
chmod -R a-wimmediately after the copy takes place (obviously an agent could chmod it back, but at least we'd be able to analyze the session trace(s) to know why).The review subagents are being invoked incorrectly;
subagent_typedoesn't exist. It's a complete hallucination