fix: grep false negatives, output mangling, and truncation annotations#791
Conversation
📊 Automated PR Analysis
SummaryFixes three issues in grep and smart_truncate that caused AI agents to waste tokens on retry loops: adds --no-ignore to rg so gitignored files aren't silently skipped, passes through raw grep output for small result sets (<=50 matches) instead of a grouped format that confused AI parsers, and replaces synthetic '// ... N lines omitted' truncation markers with clean first-N-lines truncation plus a single '[X more lines]' suffix. Review Checklist
Analyzed automatically by wshm · This is an automated analysis, not a human review. |
pszymkowiak
left a comment
There was a problem hiding this comment.
Thanks @BadassBison — good analysis on grep false negatives and AI retry loops.
Please retarget to develop — all PRs must target develop, not master.
Review notes:
--no-ignoreis risky — this searches insidenode_modules/,target/, etc. Consider--no-ignore-vcsinstead (skips .gitignore but respects .ignore)- Passthrough <=50 — interesting idea but the threshold should be configurable, and it changes RTK's savings metrics
- 10 README files — doc changes should be separate from the code fix
Please retarget and address the --no-ignore concern. Thanks!
|
Hi! Two things needed before we can review:
Thanks! |
|
Hey We are cleaning up the codebase and improving the project structure for better onboarding. As part of this effort, PR #826 reorganizes No logic changes — only file moves and import path updates. What you need to doRebase your branch on git fetch origin && git rebase origin/developGit detects renames automatically. If you get import conflicts, update the paths: use crate::git; // now: use crate::cmds::git::git;
use crate::tracking; // now: use crate::core::tracking;
use crate::config; // now: use crate::core::config;
use crate::init; // now: use crate::hooks::init;
use crate::gain; // now: use crate::analytics::gain;Need help rebasing? Tag @aeppling |
c6c979a to
2cc8a19
Compare
|
@pszymkowiak @aeppling — addressed all feedback:
Thanks for the thorough review! |
Documents the changes from rtk-ai#791: - grep now passes through raw output for <=50 matches (standard file:line:content) - grep uses grouped format only for >50 matches where token savings are meaningful - --no-ignore-vcs flag added to match grep -r behavior for .gitignore'd files - savings range updated to 0-90% to reflect passthrough for small result sets
|
also awaiting changes ;) |
- grep: use --no-ignore-vcs so .gitignore'd files aren't silently skipped (matches grep -r behavior, avoids false negatives in large monorepos) - grep: passthrough raw output for <=50 matches so AI agents can parse standard file:line:content format without retry loops - filter: replace smart_truncate heuristic with clean first-N-lines truncation and a single [X more lines] suffix (eliminates synthetic // ... markers that AI agents misread as code, causing parsing confusion and retries)
baddd42 to
36041c5
Compare
|
@nicklloyd — all changes have been addressed! Retargeted to |
|
@BadassBison - just following from the sidelines as this one is a blocker. Looking forward to being able to try it out 🤘🏻 |
|
@pszymkowiak, |
Summary
Fixes three issues where RTK's output filtering causes AI agents (Claude Code) to burn extra tokens on retry loops, producing net-negative token impact during analysis-heavy workflows.
--no-ignore-vcsto rg — prevents false negatives in repos with.gitignorewhile still respecting.ignoreand.rgignorefile:line:contentformat AI agents can parse// ... N lines omittedannotations that break AI parsingProblem
Observed in a real session across a large Rails monorepo (~83K files, 1,633 RTK commands):
rgrespects.gitignoreby default,grep -rdoesn't"217 matches in 1F:"format// ... 81 lines omittedin file readssmart_truncateinserts synthetic comment markersQuantified impact: grep had the lowest savings rate (9.3%) but the highest retry cost. Estimated 200-500K tokens burned on retries across ~15 retry patterns, each requiring 2-4 extra tool calls.
Changes
1.
src/cmds/system/grep_cmd.rs—--no-ignore-vcsflagAdded
--no-ignore-vcsto therginvocation so it doesn't skip files listed in.gitignore/.hgignore. This matchesgrep -rbehavior and eliminates false negatives in repos where test files, build artifacts, or generated code live in gitignored directories. Using--no-ignore-vcs(not--no-ignore) so.ignoreand.rgignoreare still respected.2.
src/cmds/system/grep_cmd.rs— Passthrough for small resultsResults with <=50 matches now output raw
file:line:contentformat (standard grep output that AI agents already know how to parse). The grouped"X matches in Y files:"format is preserved only for >50 matches where token savings are meaningful.3.
src/core/filter.rs— Clean truncation insmart_truncateReplaced the "smart" truncation logic that scattered
" // ... N lines omitted"markers throughout file content with clean first-N-lines truncation. A single[X more lines]marker appears at the end only.Tests
test_smart_truncate_no_annotations— verifies no// ...markers in outputtest_smart_truncate_no_truncation_when_under_limit— no truncation when content fitstest_smart_truncate_exact_limit— edge case at exact line counttest_rg_no_ignore_vcs_flag_accepted— verifies rg accepts the new flagTest plan
cargo fmt --all && cargo clippy --all-targets && cargo test --allrtk grep "fn run" src/with <=50 results outputs rawfile:line:contentformatrtk read src/main.rs --max-lines 5shows clean truncation without// ...markers.gitignored directories