Merged
5 changes: 4 additions & 1 deletion docs/configuration/server.md
@@ -109,7 +109,10 @@ about.
| `--tool-call-parser` | Parser for OpenAI-compatible tool-call payloads (handled by the smg gateway). |
| `--enable-custom-logit-processor` | Allow custom logit processors. Keep disabled unless the deployment needs it. |

- Common parser values include `kimi_k2` and `gpt-oss`.
+ Common reasoning parser values include `kimi_k25`, `base`, `qwen3`, and
+ `deepseek_r1`. Common tool-call parser values include `kimik2`, `qwen`, `json`,
+ and `passthrough`. The parser names are validated by the SMG gateway, so use
+ the values accepted by the bundled `tokenspeed-smg` package.

## Speculative Decoding

4 changes: 2 additions & 2 deletions docs/guides/launching.md
@@ -31,8 +31,8 @@ tokenspeed serve nvidia/Kimi-K2.5-NVFP4 \
--max-num-seqs 256 \
--attention-backend trtllm_mla \
--moe-backend flashinfer_trtllm \
- --reasoning-parser kimi_k2 \
- --tool-call-parser kimi_k2
+ --reasoning-parser kimi_k25 \
+ --tool-call-parser kimik2
```

## Launch Checklist
10 changes: 4 additions & 6 deletions docs/recipes/models.md
@@ -24,8 +24,8 @@ tokenspeed serve nvidia/Kimi-K2.5-NVFP4 \
--max-num-seqs 256 \
--attention-backend trtllm_mla \
--moe-backend flashinfer_trtllm \
- --reasoning-parser kimi_k2 \
- --tool-call-parser kimi_k2 \
+ --reasoning-parser kimi_k25 \
+ --tool-call-parser kimik2 \
--host 0.0.0.0 \
--port 8000
```
@@ -44,8 +44,7 @@ tokenspeed serve openai/gpt-oss-20b \
--tensor-parallel-size 1 \
--max-model-len 131072 \
--chunked-prefill-size 8192 \
- --reasoning-parser gpt-oss \
- --tool-call-parser gpt-oss \
+ --reasoning-parser base \
--host 0.0.0.0 \
--port 8000
```
@@ -58,8 +57,7 @@ tokenspeed serve openai/gpt-oss-120b \
--kv-cache-dtype fp8 \
--chunked-prefill-size 8192 \
--max-num-seqs 256 \
- --reasoning-parser gpt-oss \
- --tool-call-parser gpt-oss \
+ --reasoning-parser base \
--host 0.0.0.0 \
--port 8000
```
2 changes: 1 addition & 1 deletion python/tokenspeed/runtime/utils/server_args.py
@@ -1312,7 +1312,7 @@ def add_cli_args(parser: argparse.ArgumentParser):
type=str,
default=ServerArgs.reasoning_parser,
help=(
- "Reasoning parser name (e.g. 'minimax', 'gpt-oss'). "
+ "Reasoning parser name (e.g. 'minimax', 'kimi_k25'). "
"Used to defer json_schema grammars past the model's "
"reasoning channel."
),
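
Reviewer note: since the renamed values (`kimi_k2` to `kimi_k25` / `kimik2`, `gpt-oss` to `base`) are easy to mistype, a pre-flight check along the following lines may help when updating launch scripts. This is only a sketch and is not part of this change: `unrecognized_parsers` and both name sets are hypothetical, the sets contain only the example values quoted in this diff, and the SMG gateway remains the authority on what is actually accepted.

```python
import argparse

# Assumption: only the example values named in this diff. The real accepted
# list is whatever the bundled tokenspeed-smg package validates against.
DOCUMENTED_REASONING_PARSERS = {"kimi_k25", "base", "qwen3", "deepseek_r1", "minimax"}
DOCUMENTED_TOOL_CALL_PARSERS = {"kimik2", "qwen", "json", "passthrough"}


def unrecognized_parsers(argv):
    """Return (flag, value) pairs whose value is not among the documented examples."""
    # allow_abbrev=False keeps unrelated flags from matching by prefix.
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    parser.add_argument("--reasoning-parser", type=str, default=None)
    parser.add_argument("--tool-call-parser", type=str, default=None)
    # parse_known_args ignores every other serve flag (--port, --host, ...).
    args, _unused = parser.parse_known_args(argv)

    flagged = []
    if args.reasoning_parser is not None and args.reasoning_parser not in DOCUMENTED_REASONING_PARSERS:
        flagged.append(("--reasoning-parser", args.reasoning_parser))
    if args.tool_call_parser is not None and args.tool_call_parser not in DOCUMENTED_TOOL_CALL_PARSERS:
        flagged.append(("--tool-call-parser", args.tool_call_parser))
    return flagged


if __name__ == "__main__":
    old_style = ["--max-num-seqs", "256", "--reasoning-parser", "kimi_k2", "--tool-call-parser", "kimi_k2"]
    for flag, value in unrecognized_parsers(old_style):
        print(f"warning: {flag} {value!r} is not one of the documented example values")
```

A flagged value is a prompt to double-check against the gateway, not proof that the value is invalid, because the documented lists are described as "common" values rather than exhaustive ones.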