
feat(json): add TOON encoding to preserve all JSON values (#621, #827) #863

Open
voska wants to merge 1 commit into rtk-ai:develop from voska:feat/toon-encoding

Conversation


@voska voska commented Mar 26, 2026

Summary

  • json_cmd and curl_cmd now try TOON encoding before compact/schema extraction
  • TOON preserves ALL values with 22-59% fewer bytes than JSON — LLMs get data they can reason over instead of type placeholders (benchmarks: 76.4% accuracy vs JSON's 75.0% across 4 models, 5,016 calls)
  • 16 KB byte budget (~4,000 tokens) with line-boundary truncation for large datasets
  • RTK_NO_TOON=1 disables, falling back to existing behavior
  • New toon_convert module: 1 public function, 9 tests
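For reviewers, the dispatch order is roughly the following. This is a minimal sketch with hypothetical names: `render_json`, `try_toon`, and `compact_json` are illustrative stand-ins for the real encoder and the existing compact path, not rtk's actual internals.

```rust
use std::env;

// Sketch of the dispatch order described above; stub names are
// illustrative, not rtk's actual internals.
fn render_json(raw: &str) -> String {
    // RTK_NO_TOON=1 restores the old compact behavior
    if env::var("RTK_NO_TOON").as_deref() == Ok("1") {
        return compact_json(raw);
    }
    // Try TOON first; fall back if encoding fails or blows the budget
    try_toon(raw, 16 * 1024).unwrap_or_else(|| compact_json(raw))
}

// Placeholder: the real code calls the toon-rust encoder and enforces
// the byte budget with line-boundary truncation.
fn try_toon(raw: &str, budget: usize) -> Option<String> {
    (raw.len() <= budget).then(|| format!("TOON:{raw}"))
}

// Placeholder for the existing compact/schema extraction path.
fn compact_json(raw: &str) -> String {
    format!("COMPACT:{raw}")
}
```

The point of the ordering is that the opt-out check costs one env read, and the TOON attempt is total: any failure degrades to exactly the pre-PR behavior.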

Before (6-item array):

[{
    id: 1
    name: "alice"
    role: "admin"
  }, ... +5 more]

After:

[6]{id,name,role}:
  1,alice,admin
  2,bob,user
  3,charlie,user
  4,diana,admin
  5,eve,user
  6,frank,user

142 bytes / 0% data → 120 bytes / 100% data.
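The budget enforcement (16 KB cap, truncation at line boundaries, UTF-8 safety from the test plan) can be sketched like this. `truncate_at_lines` is an illustrative name, not the actual function in the patch:

```rust
// Keep only whole TOON lines that fit within the byte budget.
// Illustrative sketch; not the actual rtk implementation.
fn truncate_at_lines(s: &str, budget: usize) -> &str {
    if s.len() <= budget {
        return s;
    }
    // Back off to a char boundary first so slicing can't panic on a
    // multi-byte character straddling the budget (the "UTF-8 boundary
    // safety" case in the test plan).
    let mut end = budget;
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    // Cut at the last complete line within the window.
    match s[..end].rfind('\n') {
        Some(i) => &s[..i],
        // No newline fits at all (e.g. one very wide header row):
        // the caller falls back to the old compact path.
        None => "",
    }
}
```

Cutting only at `\n` means every emitted TOON row is complete, so the model never sees a half row that looks like real data.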

Addresses #621 (JSON filter strips all values), #827 (silent truncation trust loss), #690 (over-compression causing 3x agent cost), #488 (no middle ground between aggressive and off).

Test plan

  • cargo fmt --all && cargo clippy --all-targets && cargo test — 1123 passed, 0 warnings
  • Binary size unchanged: 5.7 MB → 5.7 MB
  • Manual: echo '[6 items]' | rtk json - shows all items as TOON
  • Manual: RTK_NO_TOON=1 restores old compact behavior
  • Manual: --schema mode unchanged
  • UTF-8 boundary safety verified (multi-byte chars near budget)
  • No-newline truncation path verified (wide headers)


CLAassistant commented Mar 26, 2026

CLA assistant check
All committers have signed the CLA.

@voska voska force-pushed the feat/toon-encoding branch from 47c290c to f316211 on March 26, 2026 at 15:33
@aeppling (Contributor) commented

Hey

We are cleaning up the codebase and improving the project structure for better onboarding. As part of this effort, PR #826 reorganizes src/ from a flat layout into subfolders.

No logic changes — only file moves and import path updates.

What you need to do

Rebase your branch onto develop when you receive this comment:

git fetch origin && git rebase origin/develop

Git detects renames automatically. If you get import conflicts, update the paths:

use crate::git;        // now: use crate::cmds::git::git;
use crate::tracking;   // now: use crate::core::tracking;
use crate::config;     // now: use crate::core::config;
use crate::init;       // now: use crate::hooks::init;
use crate::gain;       // now: use crate::analytics::gain;

Need help rebasing? Tag @aeppling

…rtk-ai#827)

RTK's json_cmd strips values from JSON arrays >5 items, showing only
the first item. Multiple P0/P1 issues document user pain from silent
data loss (rtk-ai#621, rtk-ai#827, rtk-ai#690, rtk-ai#488).

TOON (Token-Oriented Object Notation) encodes JSON 22-59% smaller
while preserving ALL values. LLM benchmarks show 76.4% accuracy vs
JSON's 75.0%.

For a 50-row dataset:
  Raw JSON:          4,214 bytes (all data)
  TOON:              1,963 bytes (all data, 53% smaller)
  RTK compact (old):   142 bytes (1 item shown, 0% data)

Changes:
- Add toon-rust crate for JSON-to-TOON encoding
- New toon_convert module with budget-enforced conversion
- json_cmd: try TOON before compact_json in data mode
- curl_cmd: try TOON before schema extraction
- 16KB byte budget (~4,000 tokens) with line-boundary truncation
- RTK_NO_TOON=1 env var to disable
- Schema mode (--schema) unchanged
@voska voska force-pushed the feat/toon-encoding branch from f316211 to 2959d53 on March 26, 2026 at 20:56

voska commented Mar 26, 2026

Rebased and moved toon_convert.rs into src/core/. Import paths updated, 1,132 tests pass. Thanks @aeppling.
