fix: Fix Kimi K2.5 function calling and structured outputs #1430
Open
gongwei-130 wants to merge 27 commits into main from wei/kimi-k25-consolidate
Commits
4da2617
fix(grpc): skip reasoning parser when constrained decoding is active
vschandramourya b5830db
fix(grpc): register raise_exception in chat templates and coerce tool…
ConnorLi96 404afea
fix(tool_parser): fix function call parsing for models with native to…
ConnorLi96 163dc59
fix(gateway): comprehensive func call and response quality fixes
ConnorLi96 0a999ea
fix(tokenizer): load merged EOS token IDs from config.json + generati…
ConnorLi96 efc36fa
fix(tokenizer): reduce tiktoken partial UTF-8 decode log from warn to…
ConnorLi96 35de058
feat(protocol): add thinking param to Chat API and support bare strin…
ConnorLi96 2d10dbb
fix(reasoning): run reasoning parser before JSON/tool post-processing…
ConnorLi96 849382a
style: fix formatting, clippy warnings, and merge artifacts from cher…
ConnorLi96 1ea9977
fix(streaming): enable reasoning parser for constrained outputs
ConnorLi96 4e123d9
fix(kimik2): rewrite tool_call IDs and fix cross-chunk fence stripping
ConnorLi96 0bc9f24
feat(health): make /health_generate issue a real backend probe with l…
ConnorLi96 65dc892
feat(logging): pass through user-supplied request_id to engine
ConnorLi96 ddccd3b
feat(logging): compute per-message SHA-256 hashes for session reconst…
ConnorLi96 9edea19
fix(streaming): replace fence_buffer with simple cross-chunk fence st…
ConnorLi96 7086874
refactor(streaming): extract strip_json_fence helper with unit tests
ConnorLi96 52dd84b
feat(grpc): pass message_hashes through gRPC proto to TRT-LLM
ConnorLi96 f979166
fix(health): increase health_generate probe timeout from 3s to 60s
ConnorLi96 65ebdbf
fix(reasoning): skip reasoning parsing for structured output requests
ConnorLi96 0061eea
feat(health): skip inference probe when tokens forwarded recently
ConnorLi96 10af42d
fix(multimodal): use 1 placeholder per image for Kimi-K2.5
ConnorLi96 6764b57
fix(multimodal): collapse media placeholders for TRT-LLM only
ConnorLi96 4398a4f
fix(logging): downgrade message_hash log from info to debug
ConnorLi96 6a2278f
feat(protocols): add response_format.type=regex support
vschandramourya ad34ef9
Fix regex structured output reasoning parsing
883c8e7
Fix lint formatting
3c42c86
Make health_generate probe timeout configurable
🟡 Nit: `ThinkingConfig::Enabled { .. }` silently drops the `budget_tokens` field. Users who send `{"type": "enabled", "budget_tokens": 4096}` will get thinking enabled but their budget constraint ignored. Consider passing `budget_tokens` through as a kwarg (e.g., `"budget_tokens"`) so chat templates or downstream engines that support it can respect the value.
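A minimal sketch of what that could look like, not the code in this PR. The shape of `ThinkingConfig` (an optional `budget_tokens` on `Enabled`), the `KwargValue` type, the `thinking_kwargs` helper, and the `"thinking"` kwarg name are all assumptions made for illustration; the real gateway most likely uses a JSON value type for template kwargs.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the request-side thinking config.
/// Assumes `budget_tokens` is optional on the `Enabled` variant.
#[derive(Debug, Clone, PartialEq)]
enum ThinkingConfig {
    Enabled { budget_tokens: Option<u32> },
    Disabled,
}

/// Hypothetical kwarg value type, used here to keep the sketch std-only.
#[derive(Debug, Clone, PartialEq)]
enum KwargValue {
    Bool(bool),
    Int(u32),
}

/// Build chat-template kwargs from the thinking config.
/// Instead of matching `Enabled { .. }` and discarding the fields,
/// bind `budget_tokens` and forward it when the caller supplied one.
fn thinking_kwargs(cfg: &ThinkingConfig) -> BTreeMap<String, KwargValue> {
    let mut kwargs = BTreeMap::new();
    match cfg {
        ThinkingConfig::Enabled { budget_tokens } => {
            // The "thinking" key name is an assumption for this sketch.
            kwargs.insert("thinking".to_string(), KwargValue::Bool(true));
            if let Some(budget) = budget_tokens {
                // Forwarded so templates or engines that support it can use it.
                kwargs.insert("budget_tokens".to_string(), KwargValue::Int(*budget));
            }
        }
        ThinkingConfig::Disabled => {
            kwargs.insert("thinking".to_string(), KwargValue::Bool(false));
        }
    }
    kwargs
}
```

Omitting the key when no budget is supplied keeps templates that do not know about `budget_tokens` unaffected, while engines that do support it see the user's value.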