vllm-frontend-rs

vllm-frontend-rs is an early-stage, drop-in alternative frontend for the vLLM inference engine, written in Rust. The current goal is to rebuild the northbound serving layer in Rust while still talking to the core Python vLLM engine process(es) via ZMQ over the existing engine boundary.

Architecture

The project is organized as a Cargo workspace with several crates; each layer in the diagram below builds on the one beneath it:

┌─────────────────────────────────┐
│  vllm-cmd / vllm-rs             │  CLI entrypoint:
│                                 │  Python vLLM frontend subprocess
│                                 │  Rust managed-engine serve mode
├─────────────────────────────────┤
│  vllm-server                    │  OpenAI-compatible HTTP API (axum)
├─────────────────────────────────┤
│  vllm-chat                      │  Chat completions: template rendering,
│                                 │  structured assistant events,
│                                 │  reasoning & tool parsing
├─────────────────────────────────┤
│  vllm-text                      │  Tokenizer & incremental detokenizer
├─────────────────────────────────┤
│  vllm-llm                       │  Thin token-in/token-out facade over
│                                 │  the engine client
├─────────────────────────────────┤
│  vllm-engine-core-client        │  ZMQ transport + MessagePack protocol
│                                 │  for the headless vLLM engine
└─────────────────────────────────┘
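
At the bottom of the stack, vllm-engine-core-client talks to the headless Python engine over ZMQ with MessagePack-encoded messages. The sketch below only illustrates that general transport pattern in Rust, using the zmq, rmp-serde, and serde crates; the socket type, address, and the EngineRequest/EngineReply structs are placeholders and do not reflect the actual engine-core message schema or handshake.

use serde::{Deserialize, Serialize};

// Illustrative request/reply shapes; the real engine-core protocol differs.
#[derive(Serialize)]
struct EngineRequest {
    request_id: String,
    prompt_token_ids: Vec<u32>,
}

#[derive(Deserialize, Debug)]
struct EngineReply {
    request_id: String,
    new_token_ids: Vec<u32>,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Connect to an engine endpoint (socket type and address are placeholders).
    let ctx = zmq::Context::new();
    let socket = ctx.socket(zmq::DEALER)?;
    socket.connect("tcp://127.0.0.1:62100")?;

    // Encode a request as MessagePack and send it.
    let req = EngineRequest {
        request_id: "req-0".into(),
        prompt_token_ids: vec![1, 2, 3],
    };
    socket.send(rmp_serde::to_vec(&req)?, 0)?;

    // Receive and decode a MessagePack-encoded reply.
    let bytes = socket.recv_bytes(0)?;
    let reply: EngineReply = rmp_serde::from_slice(&bytes)?;
    println!("{:?}", reply);
    Ok(())
}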

Quick Start

Install the vllm-rs CLI either from a local checkout or directly from the git repository:

# from the local checkout
cargo install --path src/cmd --bin vllm-rs

# or directly from the git repo
cargo install --git https://github.com/inferact/vllm-frontend-rs --bin vllm-rs

Python Integration

vllm-rs integrates into the Python vllm package as a Rust frontend subprocess. In that setup, Python owns process startup and launches the Rust API server as a Python-supervised worker, passing the inherited listening socket and transport addresses down to vllm-rs.

For example:

VLLM_USE_RUST_FRONTEND=1 vllm serve Qwen/Qwen3-0.6B
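
The socket-inheritance step can be pictured as rebuilding a listener from a file descriptor handed down by the parent process. The sketch below only illustrates that idea; the VLLM_RS_SOCKET_FD variable name and the fallback port are made up here and are not the actual handoff contract between vllm and vllm-rs.

use std::net::TcpListener;
use std::os::unix::io::FromRawFd;

fn inherited_listener() -> Option<TcpListener> {
    // Purely illustrative: read a file-descriptor number passed by the parent.
    let fd: i32 = std::env::var("VLLM_RS_SOCKET_FD").ok()?.parse().ok()?;
    // Safety: the parent must actually hand over an open, listening socket on
    // this descriptor and must not use it afterwards.
    Some(unsafe { TcpListener::from_raw_fd(fd) })
}

fn main() -> std::io::Result<()> {
    // Fall back to binding a fresh socket when nothing was inherited.
    let listener = match inherited_listener() {
        Some(l) => l,
        None => TcpListener::bind("127.0.0.1:8000")?,
    };
    println!("serving on {}", listener.local_addr()?);
    Ok(())
}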

Because this code is a tightly coupled sub-component of vLLM, it is expected to eventually move into a rust subdirectory of the vLLM repo. For staging purposes, however, it is currently included as a submodule.

External Engine

vllm-rs serve can be run standalone with --data-parallel-size-local 0 when the Python engines are started elsewhere and this node should run only the Rust frontend. The frontend still uses the global --data-parallel-size to determine how many engines it expects to join the shared handshake.

Start the headless Python engine(s) first, for example:

vllm serve Qwen/Qwen3-0.6B \
  --headless \
  --data-parallel-address 127.0.0.1 \
  --data-parallel-rpc-port 62100 \
  --data-parallel-size 1 \
  --data-parallel-size-local 1

Then start the Rust frontend-only server:

vllm-rs serve Qwen/Qwen3-0.6B \
  --data-parallel-address 127.0.0.1 \
  --data-parallel-rpc-port 62100 \
  --data-parallel-size 1 \
  --data-parallel-size-local 0

Example Request

After either startup path, you can use any OpenAI-compatible client:

curl http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-0.6B",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "stream": true
  }'
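
Programmatic clients work the same way. Below is a minimal, non-streaming sketch in Rust using the reqwest crate (with its blocking and json features) and serde_json; it assumes the default port and the model from the commands above.

use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Build the same chat-completions payload as the curl example, without streaming.
    let body = json!({
        "model": "Qwen/Qwen3-0.6B",
        "messages": [{"role": "user", "content": "What is the capital of France?"}],
        "stream": false
    });

    // POST to the OpenAI-compatible endpoint served by vllm-rs.
    let resp: serde_json::Value = reqwest::blocking::Client::new()
        .post("http://127.0.0.1:8000/v1/chat/completions")
        .json(&body)
        .send()?
        .error_for_status()?
        .json()?;

    // Print the assistant message from the first choice.
    println!("{}", resp["choices"][0]["message"]["content"]);
    Ok(())
}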
