fix(prompts): remove 10-item cap from discovery TodoWrite plan #2298
Merged
The RULE 3 sentence in DISCOVERY_AND_PHILOSOPHY told the model to write 'a plan of 5–10 short imperative items'. That upper bound caused the agent to cap every plan at exactly ten steps even when the task genuinely needed more. The TodoWrite JSON schema imposes no maxItems constraint, so the cap was entirely prompt-driven.
Replace '5–10 short imperative items' with 'short imperative items covering the work'. TodoWrite intent, RULE 3 label, and planning-before-building requirement all survive unchanged.
Red spec: apps/daemon/tests/prompts/discovery-todo-cap.test.ts
…rden tests
[pass-6,7 BLOCKER] packages/contracts/src/prompts/discovery.ts still had the old '5-10 short imperative items' wording. apps/web imports composeSystemPrompt from @open-design/contracts (ProjectView.tsx:43), so web-originated chat runs were still subject to the cap.
[pass-8 WARNING] discovery-todo-cap.test.ts did not cover the contracts copy, leaving that path unguarded. Also no guard against semantically equivalent re-introduction via 'at most / maximum / no more than'.
Changes:
- packages/contracts/src/prompts/discovery.ts: apply same wording fix as apps/daemon; add inline rationale comment
- apps/daemon/src/prompts/discovery.ts: add inline rationale comment
- apps/daemon/tests/prompts/discovery-todo-cap.test.ts: add 4th assertion blocking 'at most|maximum|no more than N item' re-introduction
- packages/contracts/tests/system-prompt.test.ts: add 5-assertion suite guarding the contracts copy and composed prompt output
Siri-Ray (Contributor) approved these changes on May 19, 2026 and left a comment:
@neogenix I reviewed the prompt wording changes in both the daemon and contracts copies, plus the new regression coverage around the removed TodoWrite plan item cap. I also ran the contracts prompt suite and the daemon test suite after installing workspace dependencies; both passed. Thanks for tightening both runtime paths and adding guardrails against the cap coming back.
🔁 Powered by Looper · runner=reviewer · agent=codex · An autonomous AI dev team for your GitHub repos.
Why
I noticed agents capping their plan at exactly 10 items even on complex
briefs that genuinely needed more — and silently skipping the rest after
hitting the limit. Tracing it back: RULE 3 of
DISCOVERY_AND_PHILOSOPHY in
apps/daemon/src/prompts/discovery.ts explicitly told the model
"a plan of 5–10 short imperative items". That upper bound was the cap.
TodoWrite's schema is unbounded; the limit was entirely prompt copy. The
same wording also lived in
packages/contracts/src/prompts/discovery.ts (consumed by
apps/web for BYOK API mode), so the daemon-only fix would
have left the cap active for half the user base.
What users will see
Plans for complex tasks can now legitimately exceed 10 steps when the
brief warrants it. RULE 3's intent — TodoWrite as the first tool call,
live progress updates, planning before building — is preserved. No UI
change. No new keys, no settings, no behavior change on simpler briefs
that already fit in 10 items.
Surface area
(packages/contracts is touched, but only a string constant inside; no
DTO, no SSE event, no exported shape change.)
Screenshots
N/A (server-side prompt change, no UI).
Bug fix verification
- Test files: apps/daemon/tests/prompts/discovery-todo-cap.test.ts and packages/contracts/tests/system-prompt.test.ts
- Red on main and green on this branch: yes. The prior wording matched the 5[–\-]10\s+short\s+imperative assertion the new tests forbid. Confirmed by grepping the main snapshot at the commit before this branch's first commit.
- The tests also guard against semantically equivalent re-introductions (at most N items, maximum N, no more than N) so a future prompt rewrite cannot accidentally bring the cap back.
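To make the guard described above concrete, here is a minimal sketch of how such a check could be written. The helper name findPlanCaps and the second pattern are assumptions for illustration; only the 5[–\-]10\s+short\s+imperative pattern is quoted from this PR, and the actual test files may be structured differently.

```typescript
// Hypothetical helper: the names and the second pattern are illustrative,
// not copied from the PR's test files.

// The original wording, with either an en dash or a hyphen in the range.
const RANGE_CAP = /5[–\-]10\s+short\s+imperative/i;

// Assumed pattern for phrased caps such as "at most 10 items".
const PHRASED_CAP = /\b(?:at most|maximum|no more than)\s+\d+\s+items?\b/i;

/** Lists the ways a prompt string would re-introduce a plan item cap. */
function findPlanCaps(prompt: string): string[] {
  const reasons: string[] = [];
  if (RANGE_CAP.test(prompt)) reasons.push("range cap");
  if (PHRASED_CAP.test(prompt)) reasons.push("phrased cap");
  return reasons;
}
```

A test for either prompt copy could then assert that findPlanCaps returns an empty array for the composed prompt. Neither regex uses the global flag, so repeated test() calls stay stateless.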
Validation
- pnpm guard (clean)
- pnpm typecheck (clean)
- pnpm --filter @open-design/daemon test: 2870 tests pass
- pnpm --filter @open-design/contracts test: 90 tests pass
Adjacent issues (out of scope)
apps/web/src/components/DesignSystemFlow.tsx:1801 silently caps the DS
workspace compact todo widget at 6 items via todos.slice(0, 6). Now
that plans can legitimately exceed that count, the widget needs a
+N more overflow indicator. Filed as a follow-up; not in scope here.
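For the follow-up, one possible shape of the overflow logic is sketched below. The function name compactTodos and its return type are hypothetical; the real widget code in DesignSystemFlow.tsx is not shown in this PR, so this only illustrates pairing the existing slice(0, 6) with a +N more label.

```typescript
// Hypothetical sketch: compactTodos and CompactTodos are illustrative names,
// not part of the existing widget.
interface CompactTodos<T> {
  visible: T[];                 // the items the compact widget renders
  overflowLabel: string | null; // "+N more" when items are hidden, else null
}

function compactTodos<T>(todos: T[], limit = 6): CompactTodos<T> {
  const visible = todos.slice(0, limit);
  const hidden = todos.length - visible.length;
  return {
    visible,
    overflowLabel: hidden > 0 ? `+${hidden} more` : null,
  };
}
```

Returning the label alongside the visible slice keeps the truncation and the indicator derived from the same limit, so they cannot drift apart.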