feat: replace custom filter engine with tokf-filter crate #577

Open
mpecan wants to merge 2 commits into rtk-ai:develop from mpecan:feat/tokf-filter-engine

@mpecan mpecan commented Mar 13, 2026

Summary

Hey — I'm the maintainer of tokf. I built tokf because I wanted a configurable, locally-definable filter pipeline for reducing LLM token consumption from CLI output. RTK was an inspiration — I credited it in the README from day one — but tokf's filter engine and TOML DSL predate RTK's TOML filter engine by about three weeks (tokf's filter pipeline landed Feb 18; RTK's TOML Part 1 landed Mar 10).

When RTK added its own TOML engine, the two projects ended up with very similar designs — same core stages (skip/keep, replace, match_output, truncate, head/tail, on_empty), similar TOML schemas. Rather than let the two implementations drift apart, I added RTK format compatibility to tokf's serde layer so RTK's field names (strip_lines_matching, keep_lines_matching, head_lines, tail_lines, message) all deserialize natively.

This PR replaces RTK's custom filter pipeline with a delegation to tokf-filter::apply(). RTK keeps everything that makes it RTK — the registry, command matching, build.rs concatenation, rtk verify, omission markers — but the actual line-by-line filtering is now handled by the shared library.
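
As a rough illustration of the stages named above, here is a minimal standard-library sketch of skip/keep followed by on_empty. It is not tokf-filter's implementation: `FilterConfig` and `apply_filter` are hypothetical names, and patterns are matched as plain substrings rather than regexes to keep the example dependency-free.

```rust
/// Hypothetical, simplified filter config. Field names follow the RTK
/// spellings quoted in this PR; matching is by substring (an assumption).
struct FilterConfig {
    strip_lines_matching: Vec<String>,
    keep_lines_matching: Vec<String>,
    on_empty: Option<String>,
}

/// Apply the skip, keep, and on_empty stages in that order.
fn apply_filter(cfg: &FilterConfig, input: &str) -> String {
    let kept: Vec<&str> = input
        .lines()
        // skip stage: drop any line containing one of the strip patterns
        .filter(|line| {
            !cfg.strip_lines_matching
                .iter()
                .any(|p| line.contains(p.as_str()))
        })
        // keep stage: when keep patterns exist, retain only matching lines
        .filter(|line| {
            cfg.keep_lines_matching.is_empty()
                || cfg
                    .keep_lines_matching
                    .iter()
                    .any(|p| line.contains(p.as_str()))
        })
        .collect();

    if kept.is_empty() {
        // on_empty stage: emit the configured message, or nothing if unset
        return cfg.on_empty.clone().unwrap_or_default();
    }
    kept.join("\n")
}
```

In this sketch, empty input with `on_empty` set yields the on_empty message rather than an empty string, which is the behavior the fixed verify tests expect.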

What changes

  • Filter execution delegates to tokf-filter::apply() (net -35 lines of pipeline code)
  • RTK's registry, matching, and TOML parsing are unchanged — [filters.name] + match_command, build.rs, filter priority chain, rtk verify, RTK_NO_TOML / RTK_TOML_DEBUG all work exactly as before
  • Omission markers ("... (N lines omitted)", "... (N lines truncated)") are still applied by RTK as post-processing
  • 7 pre-existing verify test failures fixed (filters with on_empty + empty input expected "" instead of the on_empty message — these were broken on master before this PR)
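
To make the omission-marker post-processing concrete, the sketch below keeps a head and a tail of the output and inserts a marker in between. The marker text is taken from this PR; the function name `head_tail_with_marker` and the exact placement of the marker are assumptions, not RTK's actual code.

```rust
/// Keep the first `head` and last `tail` lines and replace everything in
/// between with an omission marker. Hypothetical helper for illustration.
fn head_tail_with_marker(lines: &[&str], head: usize, tail: usize) -> Vec<String> {
    // Nothing to omit when the input already fits in head + tail lines.
    if lines.len() <= head + tail {
        return lines.iter().map(|l| l.to_string()).collect();
    }
    let omitted = lines.len() - head - tail;
    let mut out: Vec<String> = lines[..head].iter().map(|l| l.to_string()).collect();
    out.push(format!("... ({} lines omitted)", omitted));
    out.extend(lines[lines.len() - tail..].iter().map(|l| l.to_string()));
    out
}
```

Keeping this step outside the shared filter library means the marker wording stays under RTK's control.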

What this unlocks for RTK users

After this lands, anyone writing .rtk/filters.toml or built-in filters gains access to tokf's full feature set — without any breaking changes to existing filters:

  • [[section]] state machines for collecting failure blocks
  • [[chunk]] for splitting output into repeating blocks with aggregation
  • [on_success] / [on_failure] branches with templates
  • dedup / dedup_window for collapsing duplicate lines
  • [json] extraction via JSONPath
  • Template pipe operations (| each:, | join:, | truncate:, | keep:)
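
As one example of what these features do, windowed de-duplication can be sketched as follows. The semantics here are an assumption (a line is dropped if it already appears among the last `window` emitted lines), and `dedup_window` is a hypothetical function, not tokf's API.

```rust
use std::collections::VecDeque;

/// Drop a line if it equals one of the last `window` lines that were kept.
/// With `window == 1` this collapses consecutive duplicates only.
fn dedup_window(lines: &[&str], window: usize) -> Vec<String> {
    let mut recent: VecDeque<&str> = VecDeque::with_capacity(window);
    let mut out = Vec::new();
    for &line in lines {
        if recent.contains(&line) {
            continue; // duplicate within the window: skip it
        }
        out.push(line.to_string());
        if window > 0 {
            if recent.len() == window {
                recent.pop_front(); // slide the window forward
            }
            recent.push_back(line);
        }
    }
    out
}
```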

Backward compatibility

  • All 890 unit tests pass
  • All 111/111 inline verify tests pass (rtk verify --require-all)
  • cargo fmt --all --check && tokf run cargo clippy --all-targets clean
  • No changes to any .toml filter's logic (7 test expectation fixes only)
  • One cosmetic difference: truncate_lines_at now uses … (Unicode ellipsis) instead of ... (3 ASCII dots)
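
For readers wondering what the cosmetic difference looks like, here is a small sketch of per-line truncation with a Unicode ellipsis. `truncate_line_at` is a hypothetical helper; whether the real stage counts the ellipsis against the limit is not stated in this PR, so this version simply appends it after `max_chars` characters.

```rust
/// Truncate `line` to at most `max_chars` characters and append a single
/// Unicode ellipsis when anything was cut. Counts `char`s, not bytes, so
/// multi-byte characters are never split.
fn truncate_line_at(line: &str, max_chars: usize) -> String {
    if line.chars().count() <= max_chars {
        return line.to_string();
    }
    let mut out: String = line.chars().take(max_chars).collect();
    out.push('…');
    out
}
```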

Dependencies added

  • tokf-filter = "0.2.33" (no default features, Lua disabled — minimal binary size impact)
  • tokf-common = "0.2.33" (shared config types)

Benchmark results

| Metric | Master (no tokf) | With tokf | Delta |
|---|---|---|---|
| Startup time | 13.1ms ± 0.6ms | 15.2ms ± 1.3ms | +2.1ms |
| Binary size | 5.4MB | 5.6MB | +0.2MB |

Note: The 10ms startup target was already exceeded on current master (13.1ms) — this predates this PR. The tokf dependency adds ~2ms and 0.2MB, which I consider acceptable for the capabilities gained. Happy to investigate optimization opportunities if the maintainers feel differently.

Motivation

I don't want two near-identical filter engines maintained in parallel. By sharing the core pipeline, bug fixes and new features in tokf automatically benefit RTK, and RTK's extensive filter library (47 built-in filters with 111 inline tests) has already helped me find and fix bugs in tokf — like match_output not respecting strip_ansi. The ecosystems are stronger together.

Test plan

  • cargo fmt --all --check — clean
  • cargo clippy --all-targets — clean
  • cargo test --all — 890 passed, 0 failed
  • rtk verify --require-all — 111/111 passed
  • Benchmark startup time with hyperfine — 15.2ms (+2.1ms over master)
  • Binary size check — 5.6MB (+0.2MB over master)
  • Manual smoke test: rtk make --version, rtk git log -5, rtk ping -c 2 localhost

🤖 Generated with Claude Code

Delegate RTK's 8-stage filter pipeline to tokf-filter::apply() while
keeping the registry, command matching, build.rs concatenation, rtk verify,
and omission markers unchanged. Unlocks tokf's full feature set (sections,
chunks, JSON extraction, templates) for .rtk/filters.toml authors.

- All 890 unit tests pass
- All 111/111 inline verify tests pass
- 7 pre-existing verify test failures fixed (on_empty + empty input)
- One cosmetic change: truncate_lines_at uses unicode ellipsis (…)
- +2.1ms startup overhead, +0.2MB binary size

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Alorse commented Mar 13, 2026

@mpecan This is a great PR that unlocks a lot of potential for RTK! I wanted to suggest an enhancement that would be incredibly valuable for MCP tool users.

Use Case: MCP-Specific Filters

With the rise of MCP (Model Context Protocol) tools, there's a growing need to filter verbose JSON responses from external tools that RTK doesn't natively support. For example, the ClickUp MCP returns massive JSON payloads with fields like workspace_id, creator, custom_fields.type_config, etc. that consume tokens but are rarely relevant.

Proposed Enhancement

Would it be possible to extend the TOML DSL to support conditional filters based on MCP tool name patterns? Something like:

[[filters.clickup]]
match_mcp = "mcp__clickup__.*"

[filters.clickup.json]
# Extraction of specific fields (note: the expression below is jq-style, not JSONPath)
extract = "{tasks: [.tasks[] | {id, name, status, assignee: .assignees[0].username}]}"

# Or field exclusion
exclude_paths = [
    "$..workspace_id",
    "$..creator",
    "$..custom_fields[*].type_config",
    "$..assignees[*].profilePicture"
]

# Array limits
max_array_items = 10

# Field truncation
[filters.clickup.json.truncate]
description = 200
markdown_description = 0  # 0 = remove entirely

Why This Matters

  1. No code changes needed: Users can add filters for new MCP tools without waiting for RTK releases
  2. Community sharing: Users could share .toml filter packs for popular MCPs (ClickUp, Slack, Notion, etc.)
  3. Complements PR #535 (feat: Compress MCP tool output via PostToolUse hook): while #535 adds generic MCP compression, this would enable semantic filtering per tool

Implementation Ideas

The filter could be triggered by:

  • A new match_mcp pattern in the TOML
  • Or reuse match_command with MCP-aware detection
  • The JSONPath features you mentioned in the PR description seem like they could handle the field filtering

Would this fit within the scope of tokf-filter's roadmap? Happy to help test or refine the proposal!


Related: PR #535 also addresses MCP output compression but with a generic approach. These two PRs could work beautifully together—#535 for generic truncation, and this proposal for tool-specific semantic filtering.


mpecan commented Mar 13, 2026

@Alorse The matching is done entirely in RTK, so the choice of filter is completely independent of the tokf implementation; this PR only adds the filter layer.

That said, this separation makes it easier to add MCP matching to RTK and reuse the JSON capabilities from tokf.

To be clear: the use case you are proposing doesn't require any change to tokf-filter.
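
A rough sketch of that separation, under stated assumptions: the registry picks a filter by name, and a separate step applies it. `Entry`, `select`, `run`, and `first_line` are hypothetical names, the tool name in the usage note is made up, and matching is by prefix instead of the regex patterns RTK actually uses.

```rust
/// Hypothetical registry entry: a name prefix plus the filter to run.
struct Entry {
    prefix: &'static str,
    filter: fn(&str) -> String,
}

/// Placeholder filter standing in for the shared filter layer.
fn first_line(output: &str) -> String {
    output.lines().next().unwrap_or("").to_string()
}

/// Matching step: owned by the host tool and independent of the filter code.
fn select<'a>(registry: &'a [Entry], name: &str) -> Option<&'a Entry> {
    registry.iter().find(|e| name.starts_with(e.prefix))
}

/// Filtering step: apply whichever filter the matching step selected,
/// or pass the output through unchanged when nothing matches.
fn run(registry: &[Entry], name: &str, output: &str) -> String {
    match select(registry, name) {
        Some(entry) => (entry.filter)(output),
        None => output.to_string(),
    }
}
```

With a registry entry whose prefix is `mcp__clickup__`, a name such as `mcp__clickup__get_tasks` would be routed to that entry's filter, while an unmatched name like `git` passes through untouched. Only `select` would need to learn about MCP tool names.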

Merge upstream/master into feat/tokf-filter-engine.
Conflicts resolved:
- Cargo.toml: keep both tokf-* deps and new which dep
- 7 filter TOMLs: trivial test name wording (upstream fix matched ours)

932 tests pass, 111/111 verify tests pass.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

mpecan commented Mar 17, 2026

@Alorse resolved conflicts with master. Any concerns with the PR?

@pszymkowiak (Collaborator)

Please retarget this PR to develop instead of master. We use develop as the integration branch — master is for releases only. You can change the base branch in the PR settings. Thanks!

@pszymkowiak pszymkowiak added the wrong-base PR targets master instead of develop label Mar 17, 2026
@mpecan mpecan changed the base branch from master to develop March 17, 2026 16:32

mpecan commented Mar 17, 2026

@pszymkowiak sorry about that, need to be more careful when reading.


CLAassistant commented Mar 20, 2026

CLA assistant check
All committers have signed the CLA.

@aeppling (Contributor)

Hey

We are cleaning up the codebase and improving the project structure for better onboarding. As part of this effort, PR #826 reorganizes src/ from a flat layout into subfolders.

No logic changes — only file moves and import path updates.

What you need to do

Rebase your branch on develop when you receive this comment:

git fetch origin && git rebase origin/develop

Git detects renames automatically. If you get import conflicts, update the paths:

use crate::git;        // now: use crate::cmds::git::git;
use crate::tracking;   // now: use crate::core::tracking;
use crate::config;     // now: use crate::core::config;
use crate::init;       // now: use crate::hooks::init;
use crate::gain;       // now: use crate::analytics::gain;

Need help rebasing? Tag @aeppling
