feat: replace custom filter engine with tokf-filter crate by mpecan · Pull Request #577 · rtk-ai/rtk

mpecan · 2026-03-13T15:47:23Z

Summary

Hey — I'm the maintainer of tokf. I built tokf because I wanted a configurable, locally-definable filter pipeline for reducing LLM token consumption from CLI output. RTK was an inspiration — I credited it in the README from day one — but tokf's filter engine and TOML DSL predate RTK's TOML filter engine by about three weeks (tokf's filter pipeline landed Feb 18; RTK's TOML Part 1 landed Mar 10).

When RTK added its own TOML engine, the two projects ended up with very similar designs — same core stages (skip/keep, replace, match_output, truncate, head/tail, on_empty), similar TOML schemas. Rather than let the two implementations drift apart, I added RTK format compatibility to tokf's serde layer so RTK's field names (strip_lines_matching, keep_lines_matching, head_lines, tail_lines, message) all deserialize natively.

This PR replaces RTK's custom filter pipeline with a delegation to tokf-filter::apply(). RTK keeps everything that makes it RTK — the registry, command matching, build.rs concatenation, rtk verify, omission markers — but the actual line-by-line filtering is now handled by the shared library.

What changes

Filter execution delegates to tokf-filter::apply() (net -35 lines of pipeline code)
RTK's registry, matching, and TOML parsing are unchanged — [filters.name] + match_command, build.rs, filter priority chain, rtk verify, RTK_NO_TOML / RTK_TOML_DEBUG all work exactly as before
Omission markers ("... (N lines omitted)", "... (N lines truncated)") are still applied by RTK as post-processing
7 pre-existing verify test failures fixed (filters with on_empty + empty input expected "" instead of the on_empty message — these were broken on master before this PR)
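For readers unfamiliar with the two behaviours mentioned above, here is a minimal sketch of what an `on_empty` fallback and an omission marker might look like. The function names `apply_on_empty` and `head_tail_with_marker` are hypothetical and are not taken from RTK or tokf; only the marker text "... (N lines omitted)" comes from this PR description.

```rust
/// Hypothetical helper: substitute the `on_empty` message when filtering
/// leaves nothing to show (the case the 7 fixed verify tests exercise).
fn apply_on_empty(filtered: &str, on_empty: Option<&str>) -> String {
    match on_empty {
        Some(msg) if filtered.trim().is_empty() => msg.to_string(),
        _ => filtered.to_string(),
    }
}

/// Hypothetical helper: keep the first `head` and last `tail` lines and
/// replace the middle with an omission marker, as a post-processing step.
fn head_tail_with_marker(input: &str, head: usize, tail: usize) -> String {
    let lines: Vec<&str> = input.lines().collect();
    // Nothing to omit when everything fits into head + tail.
    if lines.len() <= head + tail {
        return lines.join("\n");
    }
    let omitted = lines.len() - head - tail;
    let mut out: Vec<String> = Vec::with_capacity(head + tail + 1);
    out.extend(lines[..head].iter().map(|l| l.to_string()));
    out.push(format!("... ({} lines omitted)", omitted));
    out.extend(lines[lines.len() - tail..].iter().map(|l| l.to_string()));
    out.join("\n")
}
```

With empty filtered output and `on_empty = "ok"`, the first helper returns the message rather than an empty string, which matches the corrected test expectation described above.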

What this unlocks for RTK users

After this lands, anyone writing .rtk/filters.toml or built-in filters gains access to tokf's full feature set — without any breaking changes to existing filters:

[[section]] state machines for collecting failure blocks
[[chunk]] for splitting output into repeating blocks with aggregation
[on_success] / [on_failure] branches with templates
dedup / dedup_window for collapsing duplicate lines
[json] extraction via JSONPath
Template pipe operations (| each:, | join:, | truncate:, | keep:)
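To give a feel for one of the stages listed above, here is a rough sketch of windowed deduplication. It is an illustration under an assumed semantics (a line is dropped if an identical line is among the last `window` kept lines); tokf's actual `dedup` / `dedup_window` behaviour may differ, and the function name is hypothetical.

```rust
use std::collections::VecDeque;

/// Hypothetical sketch: drop a line when an identical line appears among the
/// previous `window` kept lines. `window = 1` collapses consecutive
/// duplicates; `window = 0` disables deduplication.
fn dedup_window(input: &str, window: usize) -> String {
    let mut recent: VecDeque<&str> = VecDeque::with_capacity(window);
    let mut out: Vec<&str> = Vec::new();
    for line in input.lines() {
        if recent.contains(&line) {
            continue; // duplicate within the window: skip it
        }
        if window > 0 {
            if recent.len() == window {
                recent.pop_front(); // forget the oldest remembered line
            }
            recent.push_back(line);
        }
        out.push(line);
    }
    out.join("\n")
}
```

Under this assumed semantics, the input `a, a, b, a, b` becomes `a, b, a, b` with a window of 1 and `a, b` with a window of 2.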

Backward compatibility

All 890 unit tests pass
All 111/111 inline verify tests pass (rtk verify --require-all)
cargo fmt --all --check && tokf run cargo clippy --all-targets clean
No changes to any .toml filter's logic (7 test expectation fixes only)
One cosmetic difference: truncate_lines_at now uses … (unicode ellipsis) instead of ... (3 ASCII dots)
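The ellipsis change is easiest to see in a small sketch. This is not tokf's implementation; the function name is hypothetical, and it assumes the limit is counted in characters and that the ellipsis itself counts toward the limit.

```rust
/// Hypothetical sketch of line truncation with a unicode ellipsis.
/// Assumption: `max_chars` counts Unicode scalar values, and the trailing
/// `…` occupies one of them.
fn truncate_line_at(line: &str, max_chars: usize) -> String {
    if max_chars == 0 {
        return String::new();
    }
    if line.chars().count() <= max_chars {
        return line.to_string();
    }
    // Iterate over chars rather than bytes so multi-byte text is never split.
    let kept: String = line.chars().take(max_chars - 1).collect();
    format!("{}…", kept)
}
```

One visible consequence: a truncated line ends in a single `…` character instead of three ASCII dots, so downstream tooling that matches on a literal `...` suffix would need adjusting.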

Dependencies added

tokf-filter = "0.2.33" (no default features, Lua disabled — minimal binary size impact)
tokf-common = "0.2.33" (shared config types)

Benchmark results

Metric	Master (no tokf)	With tokf	Delta
Startup time	13.1ms ± 0.6ms	15.2ms ± 1.3ms	+2.1ms
Binary size	5.4MB	5.6MB	+0.2MB

Note: The 10ms startup target was already exceeded on current master (13.1ms) — this predates this PR. The tokf dependency adds ~2ms and 0.2MB, which I consider acceptable for the capabilities gained. Happy to investigate optimization opportunities if the maintainers feel differently.

Motivation

I don't want two near-identical filter engines maintained in parallel. By sharing the core pipeline, bug fixes and new features in tokf automatically benefit RTK, and RTK's extensive filter library (47 built-in filters with 111 inline tests) has already helped me find and fix bugs in tokf — like match_output not respecting strip_ansi. The ecosystems are stronger together.
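As an illustration of the class of bug mentioned above (matching against output that still contains ANSI escape codes), here is a minimal sketch. Both functions are hypothetical stand-ins written for this example, not tokf's code, and the stripper only handles CSI sequences of the form `ESC [ ... final-byte`.

```rust
/// Hypothetical sketch: remove CSI escape sequences (e.g. colour codes).
fn strip_ansi(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\u{1b}' && chars.peek() == Some(&'[') {
            chars.next(); // consume '['
            // Skip parameter bytes until the final byte (0x40..=0x7E).
            for d in chars.by_ref() {
                if ('\u{40}'..='\u{7e}').contains(&d) {
                    break;
                }
            }
        } else {
            out.push(c);
        }
    }
    out
}

/// Hypothetical sketch: substring match that optionally strips ANSI first.
fn match_output(output: &str, needle: &str, strip: bool) -> bool {
    if strip {
        strip_ansi(output).contains(needle)
    } else {
        output.contains(needle)
    }
}
```

Coloured output such as `test \x1b[1mFAILED\x1b[0m` does not contain the literal text `test FAILED` until the escape codes are removed, which is why the match needs to honour the strip setting.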

Test plan

cargo fmt --all --check — clean
cargo clippy --all-targets — clean
cargo test --all — 890 passed, 0 failed
rtk verify --require-all — 111/111 passed
Benchmark startup time with hyperfine — 15.2ms (+2.1ms over master)
Binary size check — 5.6MB (+0.2MB over master)
Manual smoke test: rtk make --version, rtk git log -5, rtk ping -c 2 localhost

🤖 Generated with Claude Code

Delegate RTK's 8-stage filter pipeline to tokf-filter::apply() while keeping the registry, command matching, build.rs concatenation, rtk verify, and omission markers unchanged. Unlocks tokf's full feature set (sections, chunks, JSON extraction, templates) for .rtk/filters.toml authors.

- All 890 unit tests pass
- All 111/111 inline verify tests pass
- 7 pre-existing verify test failures fixed (on_empty + empty input)
- One cosmetic change: truncate_lines_at uses unicode ellipsis (…)
- +2.1ms startup overhead, +0.2MB binary size

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Alorse · 2026-03-13T17:00:27Z

@mpecan This is a great PR that unlocks a lot of potential for RTK! I wanted to suggest an enhancement that would be incredibly valuable for MCP tool users.

Use Case: MCP-Specific Filters

With the rise of MCP (Model Context Protocol) tools, there's a growing need to filter verbose JSON responses from external tools that RTK doesn't natively support. For example, the ClickUp MCP returns massive JSON payloads with fields like workspace_id, creator, custom_fields.type_config, etc. that consume tokens but are rarely relevant.

Proposed Enhancement

Would it be possible to extend the TOML DSL to support conditional filters based on MCP tool name patterns? Something like:

[filters.clickup]
match_mcp = "mcp__clickup__.*"

[filters.clickup.json]
# Extraction of specific fields (jq-style expression, for illustration)
extract = "{tasks: [.tasks[] | {id, name, status, assignee: .assignees[0].username}]}"

# Or field exclusion
exclude_paths = [
    "$..workspace_id",
    "$..creator",
    "$..custom_fields[*].type_config",
    "$..assignees[*].profilePicture"
]

# Array limits
max_array_items = 10

# Field truncation
[filters.clickup.json.truncate]
description = 200
markdown_description = 0  # 0 = remove entirely

Why This Matters

No code changes needed: Users can add filters for new MCP tools without waiting for RTK releases
Community sharing: Users could share .toml filter packs for popular MCPs (ClickUp, Slack, Notion, etc.)
Complements PR #535 (feat: Compress MCP tool output via PostToolUse hook): while #535 adds generic MCP compression, this would enable semantic filtering per tool

Implementation Ideas

The filter could be triggered by:

A new match_mcp pattern in the TOML
Or reuse match_command with MCP-aware detection
The JSONPath features you mentioned in the PR description seem like they could handle the field filtering

Would this fit within the scope of tokf-filter's roadmap? Happy to help test or refine the proposal!

Related: PR #535 also addresses MCP output compression but with a generic approach. These two PRs could work beautifully together—#535 for generic truncation, and this proposal for tool-specific semantic filtering.

mpecan · 2026-03-13T17:06:40Z

@Alorse Matching is done entirely in RTK, so which filter gets selected is completely independent of the tokf implementation; this PR only adds the filter layer.

That said, this separation makes it easier to add MCP matching to RTK and reuse tokf's JSON capabilities.

To be clear: the use case you are proposing doesn't require any change to tokf-filter.
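A rough sketch of that separation, with entirely hypothetical names (`FilterEntry`, `tool_prefix`, `select_filter`) and simple prefix matching standing in for whatever pattern syntax RTK would actually adopt: selection happens on the RTK side, and only the chosen filter's configuration would then be handed to the filter layer.

```rust
/// Hypothetical registry entry: which MCP tool names a filter applies to.
struct FilterEntry {
    name: &'static str,
    /// Assumption for this sketch: match by prefix instead of a regex
    /// such as "mcp__clickup__.*".
    tool_prefix: &'static str,
}

/// Hypothetical selection step living in RTK's registry: return the first
/// filter whose prefix matches the MCP tool name.
fn select_filter<'a>(registry: &'a [FilterEntry], tool_name: &str) -> Option<&'a FilterEntry> {
    registry.iter().find(|f| tool_name.starts_with(f.tool_prefix))
}
```

Nothing in this step touches the filtering itself, which is why the proposal would not need a change in tokf-filter.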

Merge upstream/master into feat/tokf-filter-engine.

Conflicts resolved:
- Cargo.toml: keep both tokf-* deps and new which dep
- 7 filter TOMLs: trivial test name wording (upstream fix matched ours)

932 tests pass, 111/111 verify tests pass.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

mpecan · 2026-03-17T12:50:11Z

@Alorse resolved conflicts with master. Any concerns with the PR?

pszymkowiak · 2026-03-17T14:58:24Z

Please retarget this PR to develop instead of master. We use develop as the integration branch — master is for releases only. You can change the base branch in the PR settings. Thanks!

mpecan · 2026-03-17T16:32:43Z

@pszymkowiak sorry about that, need to be more careful when reading.

CLAassistant · 2026-03-20T16:46:30Z

All committers have signed the CLA.

aeppling · 2026-03-26T18:43:49Z

Hey

We are cleaning up the codebase and improving the project structure for better onboarding. As part of this effort, PR #826 reorganizes src/ from a flat layout into subfolders.

No logic changes — only file moves and import path updates.

What you need to do

Rebase your branch on develop when receiving this comment:

git fetch origin && git rebase origin/develop

Git detects renames automatically. If you get import conflicts, update the paths:

use crate::git;        // now: use crate::cmds::git::git;
use crate::tracking;   // now: use crate::core::tracking;
use crate::config;     // now: use crate::core::config;
use crate::init;       // now: use crate::hooks::init;
use crate::gain;       // now: use crate::analytics::gain;

Need help rebasing? Tag @aeppling

pszymkowiak added the "wrong-base" label (PR targets master instead of develop) on Mar 17, 2026

mpecan changed the base branch from master to develop March 17, 2026 16:32

pszymkowiak force-pushed the develop branch from d400e71 to 8fae5b0 on March 18, 2026 09:27

mpecan wants to merge 2 commits into rtk-ai:develop from mpecan:feat/tokf-filter-engine

5 participants

Conversation

mpecan commented Mar 13, 2026

Summary

What changes

What this unlocks for RTK users

Backward compatibility

Dependencies added

Benchmark results

Motivation

Test plan

Uh oh!

Alorse commented Mar 13, 2026

Use Case: MCP-Specific Filters

Proposed Enhancement

Why This Matters

Implementation Ideas

Uh oh!

mpecan commented Mar 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mpecan commented Mar 17, 2026

Uh oh!

pszymkowiak commented Mar 17, 2026

Uh oh!

mpecan commented Mar 17, 2026

Uh oh!

CLAassistant commented Mar 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

aeppling commented Mar 26, 2026

What you need to do

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

mpecan commented Mar 13, 2026 •

edited

Loading

CLAassistant commented Mar 20, 2026 •

edited

Loading