fix(tokenizer): support Kimi K2/K2.5/K2.6 tiktoken models #1482

CatherineSue wants to merge 4 commits into
Conversation
Kimi K2/K2.5/K2.6 ship tiktoken-style vocabs with a 256-slot reserved
special-token range starting at `len(mergeable_ranks)` and a Han-aware
pre-tokenizer regex. The generic tiktoken loader knew nothing about
either, so:
- IDs in the reserved range that `tokenizer_config.json` did not enumerate
(e.g. anything other than `<|im_assistant|>` etc.) panicked on decode
via tiktoken-rs's `_decode_native_and_split` lossy-fallback path.
- The cl100k regex was used for pre-tokenization, ignoring Kimi's
`[\p{Han}]+` leading alternation.
Detection and specialization now live inside `TiktokenTokenizer::load_from_path`,
so `factory.rs` stays untouched and Kimi support is invisible at the type
level — there is no separate public `KimiK2Tokenizer`. The Kimi module is
crate-private and exposes only three items to its sibling: the regex
constant, a `matches_dir` detector, and a reserved-special-token fill helper.
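As a rough illustration of what the fill helper does, here is a Python sketch mirroring upstream `tokenization_kimi.py` (the function name and signature are assumptions for illustration, not the crate's Rust API):

```python
def fill_reserved_special_tokens(special_tokens: dict[str, int],
                                 num_base_tokens: int) -> dict[str, int]:
    """Synthesize names for the 256 reserved slots starting at the BPE
    vocab boundary, keeping any IDs tokenizer_config.json already named."""
    occupied = set(special_tokens.values())
    filled = dict(special_tokens)
    for token_id in range(num_base_tokens, num_base_tokens + 256):
        if token_id not in occupied:
            filled[f"<|reserved_token_{token_id}|>"] = token_id
    return filled

# Example: one explicit name at 163588 survives; the other 255 slots
# in [163584, 163840) get synthesized <|reserved_token_N|> names.
specials = fill_reserved_special_tokens({"<|im_assistant|>": 163588}, 163584)
```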
Kimi detection matches all three known variants:
- primary: `tokenizer_config.json` mentions `tokenization_kimi`
- fallback: `config.json::model_type` is `kimi_k2` or `kimi_k25`
(K2-Thinking reports `kimi_k2`; K2.5 and K2.6 both report `kimi_k25`)
A separate decode guard returns a clean error on truly unknown token IDs
instead of letting the lossy-decode fallback panic. The guard matches
tiktoken-rs's `DecodeKeyError` Display message; the typed error is not
re-exported by the crate as of 0.9.1.
Fixes #1475
Signed-off-by: Chang Su <[email protected]>
📝 Walkthrough

Adds Kimi K2/K2.5/K2.6 tokenization support with a Han-aware regex, synthesizes 256 reserved special tokens at the vocab boundary, refactors tiktoken loading to detect/apply Kimi patterns, and surfaces a clean error for unknown token IDs to avoid decode panics.

Changes: Kimi K2 tokenizer integration
Clean, well-tested addition of Kimi K2/K2.5/K2.6 tokenizer support. The Kimi-specific logic is nicely isolated in its own module, the refactoring of load_from_path into from_encoder_and_config is a good seam for future model-specific specialization, and the unknown-token decode error fix prevents a panic in the lossy-decode fallback. One nit flagged inline about the K2.6 model_type fallback gap.
Summary: 0 🔴 Important · 1 🟡 Nit · 0 🟣 Pre-existing
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/tokenizer/src/kimi_k2_tokenizer.rs`:
- Around line 84-89: The current value_mentions_kimi_tokenizer function uses
s.contains("tokenization_kimi") which produces false positives; change the
string check in Value::String(s) to a stricter match (e.g., exact equality or a
word-boundary/regex match) so only true identifiers like "tokenization_kimi" or
token boundaries are accepted (preserve recursion for Value::Array and
Value::Object); update the check in value_mentions_kimi_tokenizer (the
Value::String arm) to use equality or a whole-word/regex test instead of
contains.
📒 Files selected for processing (3)

- crates/tokenizer/src/kimi_k2_tokenizer.rs
- crates/tokenizer/src/lib.rs
- crates/tokenizer/src/tiktoken.rs
…atches

`load_from_path` no longer delegates to a separate `from_encoder_and_config` seam; that seam only existed to let the old external `KimiK2Tokenizer` type build the same struct. With Kimi specialization living inside `load_from_path` itself, the helper has no second caller and is folded back.

Revert the matching `pub(crate)` visibility on `CL100K_BASE_PATTERN`, `Rank`, `TiktokenConfig` (and its fields), `load_tiktoken_config`, `load_tiktoken_config_from_dir`, `load_tiktoken_bpe`, and `find_tiktoken_file`. Nothing outside `tiktoken.rs` references any of them now, so they go back to private.

The original step-numbered comments (1-5) in `load_from_path` are preserved. No behavior change. `cargo test -p llm-tokenizer --lib` still 132/132; clippy clean.

Signed-off-by: Chang Su <[email protected]>
Code Review
This pull request introduces specialized support for Kimi-K2, K2.5, and K2.6 models by implementing a Han-aware tokenization regex and handling a 256-slot reserved special-token range. The TiktokenTokenizer was refactored to support these specializations through automatic directory detection and configuration loading, and error handling was improved for unknown token IDs during decoding. Review feedback suggested using checked_add to prevent potential overflows when generating reserved token IDs and recommended refactoring the configuration loading process to eliminate redundant I/O during model detection.
```rust
let mut config = load_tiktoken_config_from_dir(dir)?;  // was: load_tiktoken_config(&config_path)?
// ...
let pattern = if kimi_k2_tokenizer::matches_dir(dir) {
```
This call to matches_dir leads to redundant I/O because it re-reads tokenizer_config.json, which was already read and parsed by load_tiktoken_config_from_dir on line 234. Consider refactoring load_tiktoken_config_from_dir to return the raw serde_json::Value alongside the TiktokenConfig, and update matches_dir to accept an optional pre-parsed config value to avoid unnecessary disk access and improve maintainability.
References
- Extract duplicated logic into a shared helper function to improve maintainability and reduce redundancy.
Fixed in 9c48829. load_tiktoken_config_from_dir now returns (TiktokenConfig, Option<Value>) and kimi_k2_tokenizer::matches takes the pre-parsed value, so tokenizer_config.json is read once. config.json fallback still does its own read (different file).
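The single-read shape described here can be sketched in Python (hypothetical stand-ins for the Rust `(TiktokenConfig, Option<Value>)` return; config parsing is elided, and a plain dict stands in for both types):

```python
import json
import os

def load_tiktoken_config_from_dir(directory):
    """Read tokenizer_config.json once, returning (config, raw_value).

    raw_value is None when the file is absent, so detection can skip it
    without touching the disk again.
    """
    path = os.path.join(directory, "tokenizer_config.json")
    if not os.path.exists(path):
        return {}, None  # stand-in for (TiktokenConfig::default(), None)
    with open(path, encoding="utf-8") as f:
        raw = json.load(f)
    return raw, raw  # real code maps raw -> TiktokenConfig; elided here

def matches(tokenizer_config, directory):
    """Kimi entry point takes the pre-parsed value: no second read of
    tokenizer_config.json. The config.json fallback (a different file)
    is elided in this sketch."""
    if tokenizer_config is not None:
        return "tokenization_kimi" in json.dumps(tokenizer_config)
    return False
```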
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Inline comments:
In `@crates/tokenizer/src/tiktoken.rs`:
- Around line 62-69: Add a unit test that verifies load_tiktoken_config_from_dir
returns TiktokenConfig::default() when the directory lacks a
tokenizer_config.json: create a temporary directory, do not write
tokenizer_config.json, call load_tiktoken_config_from_dir(temp_dir.path()), and
assert the Result is Ok and equals TiktokenConfig::default(). Place the test
near other tokenizer tests (e.g., in the same module/tests area) and name it
something like load_tiktoken_config_from_dir_missing_file_returns_default to
make the intent clear.
- Around line 518-520: The helper is_unknown_tiktoken_decode_error currently
relies on the tiktoken-rs error string prefix "Invalid token for decoding:"
which is an internal formatting detail; update the code to either add an
explicit INVARIANT comment above is_unknown_tiktoken_decode_error documenting
that this prefix is a relied-upon upstream contract, or add a regression test
that constructs the upstream decode error (via vendor_tiktoken::DecodeKeyError
or by triggering the decode path) and asserts the error message format to detect
upstream changes; reference the function name is_unknown_tiktoken_decode_error
when adding the comment or test so reviewers can easily find and verify the
stability assumption.
📒 Files selected for processing (1)
crates/tokenizer/src/tiktoken.rs
Address two unresolved review threads on PR 1482:
1. (coderabbit) `value_mentions_kimi_tokenizer` matched any string
containing `tokenization_kimi` as a substring, which would
misclassify identifiers like `tokenization_kimi_v2` or
`my_tokenization_kimi_helper`. Replace the `.contains(...)` test
with a dotted-segment match via `split('.').any(...)`, so only
real Python module identifiers (`tokenization_kimi`,
`tokenization_kimi.TikTokenTokenizer`,
`pkg.tokenization_kimi.X`) match. Add a regression test.
2. (gemini) `load_from_path` read `tokenizer_config.json` twice —
once via `load_tiktoken_config_from_dir`, then again inside
`kimi_k2_tokenizer::matches_dir`. Collapse to a single read:
`load_tiktoken_config_from_dir` now returns the parsed JSON
value alongside the `TiktokenConfig`, and the Kimi entry point
is renamed `matches(tokenizer_config: Option<&Value>, dir)`
so callers pass the parsed value in. The `config.json`
fallback still does its own read (different file).
`cargo test -p llm-tokenizer --lib` → 133/133 (was 132, gained the
false-positive guard). Clippy clean.
Signed-off-by: Chang Su <[email protected]>
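The dotted-segment match from item 1 can be sketched like this (a Python stand-in for the Rust `split('.').any(...)` test; the function name is assumed):

```python
def mentions_kimi_module(value: str) -> bool:
    """Match only whole dotted segments, so identifiers such as
    'tokenization_kimi_v2' or 'my_tokenization_kimi_helper' no longer
    false-positive, while real Python module paths still match."""
    return any(segment == "tokenization_kimi" for segment in value.split("."))
```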
♻️ Duplicate comments (1)
crates/tokenizer/src/tiktoken.rs (1)
511-520: 🧹 Nitpick | 🔵 Trivial | 💤 Low value

Clarify the stability assumption with an INVARIANT marker.

The comment claims the error message format is "stable" based on internal `vendor_tiktoken` code, but this isn't a public API contract. Per repository conventions, consider using an `INVARIANT:` marker to document this relied-upon assumption.

Suggested documentation improvement:

```diff
 /// Detect tiktoken's "unknown token id" error so we can surface a clean error
 /// instead of letting the lossy-decode fallback panic on a missing key.
 ///
-/// We match on the `Display` string because tiktoken-rs's `DecodeKeyError` lives
-/// in a private `vendor_tiktoken` module and isn't re-exported (as of 0.9.1),
-/// so a typed `downcast_ref` is not available. The message format is stable —
-/// see `vendor_tiktoken::DecodeKeyError::fmt` upstream.
+/// We match on the `Display` string because tiktoken-rs's `DecodeKeyError` lives
+/// in a private `vendor_tiktoken` module and isn't re-exported (as of 0.9.1),
+/// so a typed `downcast_ref` is not available.
+///
+/// INVARIANT: The error prefix "Invalid token for decoding:" is assumed stable
+/// based on tiktoken-rs 0.9.1's `vendor_tiktoken::DecodeKeyError::fmt`. If the
+/// upstream format changes, this check will fail to match and the error will
+/// fall through to lossy decoding (which may panic on truly unknown IDs).
 fn is_unknown_tiktoken_decode_error(err: &Error) -> bool {
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/tokenizer/src/tiktoken.rs` around lines 511 - 520, Update the doc comment above is_unknown_tiktoken_decode_error to mark the stability assumption explicitly with an INVARIANT: tag; keep the explanation that we match on Display because vendor_tiktoken::DecodeKeyError is private, but add a line like "INVARIANT: the upstream vendor_tiktoken::DecodeKeyError fmt string 'Invalid token for decoding:' is relied upon here and must be updated if that implementation changes" so future readers know this is an explicit contract tied to the upstream implementation.
📒 Files selected for processing (2)

- crates/tokenizer/src/kimi_k2_tokenizer.rs
- crates/tokenizer/src/tiktoken.rs
`load_tiktoken_config_from_dir` returns `(default, None)` when the file is absent; every other test writes one, so the branch was uncovered. Signed-off-by: Chang Su <[email protected]>
♻️ Duplicate comments (1)
crates/tokenizer/src/tiktoken.rs (1)
514-517: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Don't present the tiktoken error prefix as a stable contract.

This helper still depends on an implementation-detail `Display` string from tiktoken-rs. If that formatting changes in a dependency bump, unknown IDs will miss this guard and drop back into the panic-prone fallback path. Please either document this as an `INVARIANT:` and pin it with a regression test, or soften the comment so it reads as a best-effort heuristic rather than a guaranteed contract.

In tiktoken-rs 0.9.1, is `DecodeKeyError` publicly re-exported, and is the `Display` prefix `Invalid token for decoding:` documented as a stable/public API contract?

Based on learnings, use `INVARIANT:` to document assumptions in safe Rust code.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/tokenizer/src/tiktoken.rs` around lines 514 - 517, The code relies on the tiktoken-rs DecodeKeyError Display prefix ("Invalid token for decoding:") as a stable contract; either document this as an explicit INVARIANT and add a regression test that asserts the current Display string (via the helper that matches the Display output or by constructing vendor_tiktoken::DecodeKeyError and calling fmt/display) so CI will catch future formatting changes, or relax the comment to state this is a best-effort heuristic (not a guarantee) and remove wording that treats the Display value as a stable API; update the comment around the matching logic and add the unit test that fails if the Display prefix changes (or remove hard equality checks in the helper to fall back more safely).
📒 Files selected for processing (1)
crates/tokenizer/src/tiktoken.rs
Description
Problem
Kimi K2 / K2.5 / K2.6 ship tiktoken-style vocabs alongside their HF repos. Their `tokenizer_config.json` enumerates a handful of named special tokens (e.g. `<|im_assistant|>` at ID 163588), but does not enumerate every slot in the 256-slot reserved-special-token range that upstream `tokenization_kimi.py` synthesizes via:

```python
{special_tokens_mapping.get(i, f"<|reserved_token_{i}|>"): i
 for i in range(num_base_tokens, num_base_tokens + 256)}
```

The generic `TiktokenTokenizer` only registered IDs present in `tiktoken.model` or `added_tokens_decoder`, so unnamed reserved IDs were unknown to `CoreBPE`. Decoding a sequence that contained one fell through into `_decode_native_and_split`, which indexes `special_tokens_decoder[&token]` directly: panic on the missing key.

There is also a pre-tokenizer mismatch: Kimi's regex begins with `[\p{Han}]+` and uses character-class intersection (`[X&&[^\p{Han}]]`) so Han chars are split from the letter/contraction branches. The generic path used the cl100k regex, mangling Han-heavy inputs.

Fixes #1475.
Solution
Detect Kimi inside `TiktokenTokenizer::load_from_path` and specialize two things:

- Pre-tokenization: use Kimi's Han-aware regex as `pat_str` (verified against K2-Thinking, K2.5, and K2.6; all three `tokenization_kimi.py` files are identical on the BPE-relevant fields).
- Reserved specials: fill `[len(mergeable_ranks), len(mergeable_ranks)+256)` with `<|reserved_token_{id}|>`, skipping any IDs already occupied by explicit `added_tokens_decoder` entries.

Detection lives in a crate-private `kimi_k2_tokenizer` helper module and exposes only three items to `tiktoken.rs`: the regex constant, `matches_dir`, and `apply_reserved_special_tokens`. There is no separate public `KimiK2Tokenizer` type and no change to `factory.rs`; Kimi support is transparent at the call surface, so future tiktoken variants (LLaMA-3-tiktoken, etc.) can follow the same pattern without growing a dispatch table.

Detection matches all three Kimi variants:

- Primary: `tokenizer_config.json` mentions `tokenization_kimi` (via `auto_map`, `tokenizer_class`, etc.).
- Fallback: `config.json::model_type` ∈ `{kimi_k2, kimi_k25}`. K2-Thinking reports `kimi_k2`; K2.5 and K2.6 both report `kimi_k25`. The architecture string is intentionally not checked: K2-Thinking shares `DeepseekV3ForCausalLM` with vanilla DeepSeek and would false-positive.

Separately, the generic tiktoken decode path now returns a clean `Err` on truly unknown token IDs instead of falling into the lossy-byte path that panics. The guard matches tiktoken-rs's `DecodeKeyError` Display message; the typed error lives in a private module (`vendor_tiktoken`) and isn't re-exported as of tiktoken-rs 0.9.1, so a typed downcast isn't available.

Changes
- `crates/tokenizer/src/kimi_k2_tokenizer.rs` (new, crate-private): `KIMI_K2_PATTERN`, `matches_dir`, `apply_reserved_special_tokens`, and helpers for the two-tier detection.
- `crates/tokenizer/src/tiktoken.rs`: `load_from_path` now consults the Kimi module, picks the pattern, and pre-fills reserved slots before constructing `CoreBPE`; `from_encoder_and_config` is now private. `decode` returns a structured error on unknown IDs instead of letting `_decode_native_and_split` panic.
- `crates/tokenizer/src/lib.rs`: `mod kimi_k2_tokenizer` (was `pub mod`); the helper module has no public surface.

Test Plan
- `cargo test -p llm-tokenizer --lib`: 132/132 pass, including five new Kimi tests:
  - `reserved_special_tokens_are_synthesized`: 256 slots filled, explicit names preserved, vocab_size correct, decode/encode of synthetic reserved IDs round-trips.
  - `matches_via_model_type_kimi_k2`: `model_type: "kimi_k2"` (K2-Thinking) triggers the Kimi load path.
  - `matches_via_model_type_kimi_k25`: `model_type: "kimi_k25"` (K2.5 + K2.6) triggers the Kimi load path.
  - `uses_kimi_pattern_for_contractions`: the Kimi regex matches `hello's` as one chunk (cl100k would split before `'s`).
  - `han_input_round_trips_through_kimi_pattern`: Han + ASCII mixed input encodes without panic and decodes byte-equal. Guards the fancy-regex character-class-intersection compatibility risk.
- `cargo clippy -p llm-tokenizer --lib --tests --no-deps` clean.
- `cargo +nightly fmt -p llm-tokenizer -- --check` clean.

Checklist
- `cargo +nightly fmt` passes
- `cargo clippy --all-targets --all-features -- -D warnings` passes