
Conversation

@delavet commented Sep 28, 2025

This PR introduces a new Unix Domain Socket (UDS) based tokenizer service that can be used as an external tokenizer for the KV Cache Manager, related to #126. The changes include:

  1. Added a new tokenizer mode to the KV Cache Manager, which communicates with an external tokenizer service over HTTP via UDS.

  2. Added an example external tokenizer service with:

    • Server implementation which can do both tokenization and chat-templating
    • Tokenizer implementation with HF (transformers) code
    • Dockerfile for containerization
    • Gunicorn configuration for production deployment
    • Documentation and tests
  3. Updated the Helm chart to deploy the external tokenizer as a sidecar container alongside the KV Cache Manager

  4. Added configuration options to enable the external tokenizer in the kv events online example.

Signed-off-by: Hang Yin <[email protected]>
@vMaroon (Member) left a comment:

Great work - thank you for this contribution @delavet. Added a couple of minor comments.

Do you think we can get some profiling data and performance benchmarks here?

@@ -0,0 +1,129 @@
# Model Caching in Tokenizer Service
(Member) commented:

I think that the `uds_tokenizer` package would be better housed in a new `services` directory.

Note that special tokens can end up being added twice:

1. Once from the chat template itself (which may include the BOS token)
2. Once from the add_special_tokens parameter

vLLM handles this by setting add_special_tokens=False when using chat templates.
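A toy illustration of the pitfall, using a stand-in "tokenizer" (the template and encode functions here only mimic real HF tokenizers schematically; all names are hypothetical):

```python
BOS = "<s>"


def apply_chat_template(messages):
    # Chat templates typically render the BOS token into the prompt text.
    return BOS + " " + "".join(
        f"[{m['role']}] {m['content']}" for m in messages
    )


def encode(text, add_special_tokens=True):
    # Tokenizers prepend BOS a second time when add_special_tokens=True.
    return ([BOS] if add_special_tokens else []) + text.split()


msgs = [{"role": "user", "content": "hi"}]
prompt = apply_chat_template(msgs)

assert encode(prompt)[:2] == [BOS, BOS]                          # BOS duplicated
assert encode(prompt, add_special_tokens=False).count(BOS) == 1  # the vLLM fix
```

For token-ID-level KV-cache matching, a duplicated BOS shifts every subsequent token position, so a prompt tokenized this way would never match a cache entry produced by vLLM's own preprocessing.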
(Member) commented:

Do you think it would make sense to extract and reuse vLLM's preprocessing code as-is, serving as a lightweight vLLM sub-component for disaggregated tokenization? Its maintenance would then amount to keeping versions and dependencies in sync.

Not a blocker for this PR, but I think we should aim towards this path. What do you think?
