Conversation

@yahya010
Contributor

@yahya010 yahya010 commented Aug 30, 2025

Adaptive token healing (@timvieira's idea) to handle dead ends where a tokenization makes the next byte unreachable. The token boundary is moved earlier within the current partial bytes; we then commit an EOT, rematerialize, and generate.

It first runs a precheck, costing O(len(P)) trie lookups and 0 LM calls, to find a valid earlier boundary where EOT + replay + target byte are all available. It then materializes at that boundary, so at most 1 LM call is needed.
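A minimal sketch of what such a precheck could look like, assuming a toy nested-dict byte trie rather than the library's actual trie. The names `build_trie`, `walk`, and `find_heal_boundary` are hypothetical, and trying the latest boundary first is an assumption, not something the PR states. The point is only that the search needs trie lookups and no LM calls:

```python
def build_trie(vocab):
    """Build a nested-dict byte trie; the key None marks a complete token."""
    root = {}
    for token in vocab:
        node = root
        for byte in token:  # iterating bytes yields ints
            node = node.setdefault(byte, {})
        node[None] = True
    return root


def walk(root, data):
    """Follow `data` from the root; return the node reached, or None."""
    node = root
    for byte in data:
        node = node.get(byte)
        if node is None:
            return None
    return node


def find_heal_boundary(root, partial, target):
    """Hypothetical precheck: find k such that partial[:k] is a complete
    token (so EOT can be committed) and partial[k:] + target is reachable
    from the trie root (replay + target byte). Uses no LM calls."""
    for k in range(len(partial) - 1, 0, -1):  # assumption: latest k first
        head = walk(root, partial[:k])
        if head is None or None not in head:
            continue  # EOT is not available at this boundary
        if walk(root, partial[k:] + bytes([target])) is not None:
            return k
    return None


# Toy vocabulary: b"abc" is only a prefix of b"abcx", so b"d" is a dead end.
root = build_trie([b"ab", b"abcx", b"cd"])
print(find_heal_boundary(root, b"abc", ord("d")))  # 2: commit b"ab", replay b"c"
print(find_heal_boundary(root, b"abc", ord("z")))  # None: no boundary works
```

In this toy setting, the single LM call would happen only after a boundary is returned, when the committed token is materialized.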

@yahya010 yahya010 requested a review from benlebrun August 30, 2025 13:18
@codecov

codecov bot commented Aug 30, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.


@yahya010
Contributor Author

making some changes for multi splits...

@yahya010
Contributor Author

yahya010 commented Sep 1, 2025

Planning cost: O(|P|) per k (greedy scan), worst-case O(|P|^2) across all k. |P| is small (bytes since the last boundary), so this is negligible.
LM calls: 1 materialization for the initial P[:k] commit, plus 1 per internal split planned inside S; no extra calls during planning.
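The cost accounting above can be illustrated with a small sketch. It assumes that S is the replayed suffix P[k:] plus the target byte, and that the greedy scan splits only when the current token cannot be extended. Both are guesses at the intent, not the PR's code, and `build_trie`, `plan_splits`, and `plan_heal` are hypothetical names on a toy nested-dict trie:

```python
def build_trie(vocab):
    """Toy nested-dict byte trie; the key None marks a complete token."""
    root = {}
    for token in vocab:
        node = root
        for byte in token:
            node = node.setdefault(byte, {})
        node[None] = True
    return root


def plan_splits(root, suffix):
    """Greedy O(len(suffix)) scan: extend the current token while possible,
    and split (plan an EOT) only when stuck. Returns the split offsets,
    or None if the suffix cannot be replayed this way."""
    splits = []
    node = root
    for i, byte in enumerate(suffix):
        nxt = node.get(byte)
        if nxt is None:
            # Stuck: a split is legal only if the bytes so far form a token.
            if node is root or None not in node:
                return None
            splits.append(i)
            nxt = root.get(byte)
            if nxt is None:
                return None
        node = nxt
    return splits


def plan_heal(root, partial, target):
    """Try each boundary k (O(|P|) values, O(|P|) scan each). Returns
    (k, splits, lm_calls) with lm_calls = 1 + len(splits), or None.
    Planning itself only touches the trie."""
    for k in range(len(partial) - 1, 0, -1):
        head = root
        for byte in partial[:k]:
            head = head.get(byte)
            if head is None:
                break
        if head is None or None not in head:
            continue  # partial[:k] is not a complete token
        splits = plan_splits(root, partial[k:] + bytes([target]))
        if splits is not None:
            return k, splits, 1 + len(splits)
    return None


root = build_trie([b"a", b"abcx", b"b", b"cd"])
print(plan_heal(root, b"abc", ord("d")))  # (1, [1], 2)
```

Here the plan commits `b"a"` (1 LM call), then replays `b"bcd"` with one internal split after `b"b"` (1 more LM call). A greedy scan like this can miss plans that backtracking would find, which is a limitation of the sketch.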

@benlebrun
Member

@yahya010 Are you planning on fixing the code coverage?

@yahya010
Contributor Author

yahya010 commented Sep 18, 2025

@benlebrun yeah, got distracted

@yahya010
Contributor Author

alright, is that enough? rest are verbose and such

@avyavkumar

Hi, we are using your library in our work -- and it's amazing, many thanks! If possible, can you please let me know by when this feature can be merged? It's something that would be extremely useful for our work.

@yahya010
Contributor Author

yahya010 commented Oct 16, 2025

Hey, thanks a lot! I need to clean it up and add tests for code coverage, then it should be good to review. It's ready to use as is from the branch if needed, but I'll get it finalized soon.


4 participants