Skip to content

fix: grep false negatives, output mangling, and truncation annotations#791

Open
BadassBison wants to merge 1 commit intortk-ai:developfrom
BadassBison:fix/grep-false-negatives-and-truncation-annotations
Open

fix: grep false negatives, output mangling, and truncation annotations#791
BadassBison wants to merge 1 commit intortk-ai:developfrom
BadassBison:fix/grep-false-negatives-and-truncation-annotations

Conversation

@BadassBison
Copy link
Copy Markdown

@BadassBison BadassBison commented Mar 23, 2026

Summary

Fixes three issues where RTK's output filtering causes AI agents (Claude Code) to burn extra tokens on retry loops, producing net-negative token impact during analysis-heavy workflows.

  • grep: add --no-ignore-vcs to rg — prevents false negatives in repos with .gitignore while still respecting .ignore and .rgignore
  • grep: passthrough for small results (<=50 matches) — preserves standard file:line:content format AI agents can parse
  • smart_truncate: clean truncation — removes synthetic // ... N lines omitted annotations that break AI parsing

Problem

Observed in a real session across a large Rails monorepo (~83K files, 1,633 RTK commands):

Issue Root Cause Impact
grep returns "0 matches" for existing files rg respects .gitignore by default, grep -r doesn't ~10 false negatives led to wrong analysis conclusions
grep output in "217 matches in 1F:" format Always reformatted, even for 4 matches AI agents can't parse it, retry 2-4 times each
// ... 81 lines omitted in file reads smart_truncate inserts synthetic comment markers AI treats annotations as code, retries with alternative commands

Quantified impact: grep had the lowest savings rate (9.3%) but the highest retry cost. Estimated 200-500K tokens burned on retries across ~15 retry patterns, each requiring 2-4 extra tool calls.

Changes

1. src/cmds/system/grep_cmd.rs--no-ignore-vcs flag

Added --no-ignore-vcs to the rg invocation so it doesn't skip files listed in .gitignore/.hgignore. This matches grep -r behavior and eliminates false negatives in repos where test files, build artifacts, or generated code live in gitignored directories. Using --no-ignore-vcs (not --no-ignore) so .ignore and .rgignore are still respected.

2. src/cmds/system/grep_cmd.rs — Passthrough for small results

Results with <=50 matches now output raw file:line:content format (standard grep output that AI agents already know how to parse). The grouped "X matches in Y files:" format is preserved only for >50 matches where token savings are meaningful.

3. src/core/filter.rs — Clean truncation in smart_truncate

Replaced the "smart" truncation logic that scattered " // ... N lines omitted" markers throughout file content with clean first-N-lines truncation. A single [X more lines] marker appears at the end only.

Tests

  • test_smart_truncate_no_annotations — verifies no // ... markers in output
  • test_smart_truncate_no_truncation_when_under_limit — no truncation when content fits
  • test_smart_truncate_exact_limit — edge case at exact line count
  • test_rg_no_ignore_vcs_flag_accepted — verifies rg accepts the new flag

Test plan

  • cargo fmt --all && cargo clippy --all-targets && cargo test --all
  • Manual: rtk grep "fn run" src/ with <=50 results outputs raw file:line:content format
  • Manual: rtk read src/main.rs --max-lines 5 shows clean truncation without // ... markers
  • Manual: verify grep finds files in .gitignored directories

@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Mar 23, 2026

CLA assistant check
All committers have signed the CLA.

@pszymkowiak pszymkowiak added bug Something isn't working effort-medium 1-2 jours, quelques fichiers filter-quality Filter produces incorrect/truncated signal labels Mar 23, 2026
@pszymkowiak
Copy link
Copy Markdown
Collaborator

[w] wshm · Automated triage by AI

📊 Automated PR Analysis

🐛 Type bug-fix
🟡 Risk medium

Summary

Fixes three issues in grep and smart_truncate that caused AI agents to waste tokens on retry loops: adds --no-ignore to rg so gitignored files aren't silently skipped, passes through raw grep output for small result sets (<=50 matches) instead of a grouped format that confused AI parsers, and replaces synthetic '// ... N lines omitted' truncation markers with clean first-N-lines truncation plus a single '[X more lines]' suffix.

Review Checklist

  • Tests present
  • Breaking change
  • Docs updated

Analyzed automatically by wshm · This is an automated analysis, not a human review.

Copy link
Copy Markdown
Collaborator

@pszymkowiak pszymkowiak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @BadassBison — good analysis on grep false negatives and AI retry loops.

Please retarget to develop — all PRs must target develop, not master.

Review notes:

  1. --no-ignore is risky — this searches inside node_modules/, target/, etc. Consider --no-ignore-vcs instead (skips .gitignore but respects .ignore)
  2. Passthrough <=50 — interesting idea but the threshold should be configurable, and it changes RTK's savings metrics
  3. 10 README files — doc changes should be separate from the code fix

Please retarget and address the --no-ignore concern. Thanks!

@pszymkowiak
Copy link
Copy Markdown
Collaborator

Hi! Two things needed before we can review:

  1. Retarget to develop — this PR targets master, but all PRs should target develop. You can change the base branch in the PR settings (right sidebar).
  2. Sign the CLA — if not already done, please sign at https://cla-assistant.io/rtk-ai/rtk

Thanks!

@aeppling
Copy link
Copy Markdown
Contributor

Hey

We are cleaning up the codebase and improving the project structure for better onboarding. As part of this effort, PR #826 reorganizes src/ from a flat layout into subfolders.

No logic changes — only file moves and import path updates.

What you need to do

Rebase your branch on develop when receiving this comment:

git fetch origin && git rebase origin/develop

Git detects renames automatically. If you get import conflicts, update the paths:

use crate::git;        // now: use crate::cmds::git::git;
use crate::tracking;   // now: use crate::core::tracking;
use crate::config;     // now: use crate::core::config;
use crate::init;       // now: use crate::hooks::init;
use crate::gain;       // now: use crate::analytics::gain;

Need help rebasing? Tag @aeppling

@BadassBison BadassBison force-pushed the fix/grep-false-negatives-and-truncation-annotations branch from c6c979a to 2cc8a19 Compare March 26, 2026 21:52
@BadassBison BadassBison changed the base branch from master to develop March 26, 2026 21:52
@BadassBison BadassBison requested a review from pszymkowiak March 26, 2026 21:53
@BadassBison
Copy link
Copy Markdown
Author

@pszymkowiak @aeppling — addressed all feedback:

  1. Retargeted to develop — base branch updated.
  2. --no-ignore--no-ignore-vcs — switched to the more targeted flag that only disables VCS ignore files (.gitignore/.hgignore) while still respecting .ignore and .rgignore. Updated the corresponding test.
  3. Doc changes removed — the 10 README files have been dropped from this PR. The branch now contains only src/cmds/system/grep_cmd.rs and src/core/filter.rs.
  4. Rebased on develop — applied changes to the new file paths after the PR feat(refacto-codebase-onboarding): partie 1 - folders and technical docs #826 reorganization.
  5. CLA is signed — already confirmed by the CLA assistant bot above.

Thanks for the thorough review!

BadassBison added a commit to BadassBison/rtk that referenced this pull request Mar 26, 2026
Documents the changes from rtk-ai#791:
- grep now passes through raw output for <=50 matches (standard file:line:content)
- grep uses grouped format only for >50 matches where token savings are meaningful
- --no-ignore-vcs flag added to match grep -r behavior for .gitignore'd files
- savings range updated to 0-90% to reflect passthrough for small result sets
@nicklloyd
Copy link
Copy Markdown

also awaiting changes ;)

- grep: use --no-ignore-vcs so .gitignore'd files aren't silently skipped
  (matches grep -r behavior, avoids false negatives in large monorepos)
- grep: passthrough raw output for <=50 matches so AI agents can parse
  standard file:line:content format without retry loops
- filter: replace smart_truncate heuristic with clean first-N-lines truncation
  and a single [X more lines] suffix (eliminates synthetic // ... markers
  that AI agents misread as code, causing parsing confusion and retries)
@BadassBison BadassBison force-pushed the fix/grep-false-negatives-and-truncation-annotations branch from baddd42 to 36041c5 Compare March 26, 2026 22:23
@BadassBison
Copy link
Copy Markdown
Author

@nicklloyd — all changes have been addressed! Retargeted to develop, switched to --no-ignore-vcs, doc updates moved to a separate PR (#871), rebased on the new src/ structure, and the cargo fmt CI failure is fixed. Should be good for another look. 🙏

@nicklloyd
Copy link
Copy Markdown

@BadassBison - just following from the sidelines as this one is a blocker. Looking forward to being able to try it out 🤘🏻

@BadassBison
Copy link
Copy Markdown
Author

BadassBison commented Mar 27, 2026

@pszymkowiak,
Everything is updated and awaiting review. Seems like this work is blocking others, anything else you need from me?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

awaiting-changes bug Something isn't working effort-medium 1-2 jours, quelques fichiers filter-quality Filter produces incorrect/truncated signal

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants