[Frontend][RFC] Rust front-end integration #40848

Open
njhill wants to merge 15 commits into vllm-project:main from njhill:rust_frontend

Member

@njhill njhill commented Apr 24, 2026

See the corresponding RFC for introducing a Rust-based alternative front-end process in vLLM: #40846.

For now we have staged the PoC implementation in https://github.com/Inferact/vllm-frontend-rs. This PR contains the logic to integrate it into vLLM.

You can try it out by building this branch:

# Full install
uv pip install .

# Or use pre-compiled wheel
VLLM_USE_PRECOMPILED=1 uv pip install --editable .
./build_rust.sh

# Run it
VLLM_USE_RUST_FRONTEND=1 vllm serve ...

Co-authored with @BugenZhao

Signed-off-by: Nick Hill <nickhill123@gmail.com>

Co-authored-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Contributor

mergify Bot commented Apr 24, 2026

Documentation preview: https://vllm--40848.org.readthedocs.build/en/40848/

@mergify mergify Bot added documentation Improvements or additions to documentation ci/build frontend nvidia labels Apr 24, 2026
@mergify mergify Bot added cpu Related to CPU backends v1 labels Apr 24, 2026
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a Rust-based frontend to vLLM, adding a git submodule and updating the build system to support native extensions via setuptools-rust. It includes a new process manager for the Rust binary and environment variables for configuration. Feedback identifies a future-dated toolchain version, fragile environment variable parsing, and an inconsistent path to the Rust manifest file in the setup configuration.

Comment thread rust-toolchain.toml Outdated
@@ -0,0 +1,2 @@
[toolchain]
channel = "nightly-2026-03-10"
Contributor

critical

The specified Rust toolchain date nightly-2026-03-10 appears to be in the future. This will cause build failures as the toolchain cannot be found. Please update it to a valid, existing nightly version.

Member Author

😱

Member

BTW, why is the nightly toolchain required?

Member

It's because we make heavy use of coroutines for async streams, and coroutine syntax is still an unstable language feature.

Contributor

I'm concerned about the use of nightlies and the experimental coroutine_trait language feature. It carries risk and will inflict pain on downstream rebuilds. I would prefer to have vLLM use a stable MSRV instead.

Member

Regarding coroutine-style async streams, I think there are several alternatives that work on a stable toolchain, at the cost of a little performance overhead, which should be acceptable.

I agree that there would be much less concern if we can find an alternative that gets rid of the unstable features and lets us switch to a stable toolchain.

Contributor

Thank you for considering using stable Rust features! It would help our downstream testing and rebuilds a lot.

Member

Update: I've replaced all (actually only a few) of the unstable language features used by the Rust frontend with stable alternative libraries or desugared syntax, so it now builds with a stable toolchain!

Comment thread vllm/envs.py
When enabled, resolves VLLM_RUST_FRONTEND_PATH ("auto" by default)
to the actual binary path.
"""
use_rust = bool(int(os.environ.get("VLLM_USE_RUST_FRONTEND", "0")))
Contributor

high

The boolean conversion bool(int(os.environ.get("VLLM_USE_RUST_FRONTEND", "0"))) is fragile. It will raise a ValueError if the environment variable is set to common boolean strings like "true" or "false". Consider using a more robust check similar to how VLLM_USE_PRECOMPILED_RUST is handled at line 572.

Suggested change
- use_rust = bool(int(os.environ.get("VLLM_USE_RUST_FRONTEND", "0")))
+ use_rust = os.environ.get("VLLM_USE_RUST_FRONTEND", "").strip().lower() in ("1", "true")
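
A minimal sketch of the more tolerant parsing suggested here, written as a standalone helper. The helper name `env_flag` and the extra accepted spellings ("yes", "on") are assumptions for illustration; the suggestion itself only accepts "1" and "true", and this is not necessarily how vllm/envs.py handles other flags.

```python
import os

# Accepted spellings are an assumption for this sketch; the review suggestion
# only accepts "1" and "true".
_TRUE_VALUES = frozenset({"1", "true", "yes", "on"})


def env_flag(name: str, default: bool = False) -> bool:
    """Read a boolean environment variable without raising on odd input."""
    raw = os.environ.get(name)
    if raw is None:
        # Variable is unset: fall back to the caller's default.
        return default
    # Anything outside the accepted set (including "") counts as False.
    return raw.strip().lower() in _TRUE_VALUES


use_rust = env_flag("VLLM_USE_RUST_FRONTEND")
```

One trade-off to be aware of: a value such as "2" is truthy under `bool(int(...))` but falsy under this membership check, so the two forms are not strictly equivalent.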

Comment thread setup.py
rust_extensions = [
RustExtension(
target="vllm.vllm-rs",
path="rust/src/cmd/Cargo.toml",
Contributor

high

The path to the Cargo.toml manifest for the RustExtension is inconsistent with the build script. build_rust.sh uses the manifest at the root of the submodule (rust/Cargo.toml), while this points to a sub-directory. This may lead to build failures or missing workspace dependencies if the submodule is structured as a workspace.

Suggested change
- path="rust/src/cmd/Cargo.toml",
+ path="rust/Cargo.toml",

Member

IIRC we cannot pass a virtual (workspace) manifest here, so this comment seems invalid.

@njhill njhill marked this pull request as ready for review April 25, 2026 00:08

@claude claude Bot left a comment

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@njhill njhill added the ready ONLY add when PR is ready to merge/full CI is needed label Apr 25, 2026
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Comment thread .gitmodules
@@ -0,0 +1,3 @@
[submodule "rust"]
path = rust
Member

Nit: what about "rsrc" to match "csrc"? Or just putting it as a subdir to avoid adding another top level dir, like "csrc/rust/"

Member Author

Yeah we can decide on the best name. I considered this but it seemed to be non-standard. But now I realize csrc isn't really standard either so maybe you're right and rsrc would be better. I'm not keen on putting it under csrc.

Member

Personally, I don't think rsrc is a common convention for a Rust source directory, and putting it under csrc would be more confusing. I would still lean towards rust for this.

njhill added 3 commits April 25, 2026 16:09
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
@mergify mergify Bot added the rocm Related to AMD ROCm label Apr 26, 2026
Comment thread rust
Collaborator

Should it be moved under the vLLM organization? Would it be donated from the Inferact organization to the vLLM organization? Also, the vllm-frontend-rs repository does not have a formal LICENSE file.

Member Author

@chaunceyjiang yes we are just staging it in the Inferact org initially, where we had been experimenting.

It seems like the general preference (including mine) is to have the code live within a subdir of this main vllm repo (i.e. the submodule in this PR will contain the actual code that's currently in the vllm-frontend-rs repo). In which case there's no need to have a new repo under the vllm org anyhow.

Good point re the license, will add that asap.

njhill added 2 commits April 27, 2026 20:34
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
@njhill njhill removed the ready ONLY add when PR is ready to merge/full CI is needed label Apr 28, 2026
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
@njhill njhill added ready ONLY add when PR is ready to merge/full CI is needed and removed ready ONLY add when PR is ready to merge/full CI is needed labels Apr 28, 2026
njhill added 3 commits April 28, 2026 15:05
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
@njhill njhill added the ready ONLY add when PR is ready to merge/full CI is needed label Apr 29, 2026
Member

@yewentao256 yewentao256 left a comment

Thanks for the work!
Do we have an e2e benchmark showing how much perf we gain from this refactor?

Ideally for a large dense model and a large MoE model.

# checkout does not recurse submodules, and the Dockerfile only sees what's in
# the build context, so initialize the submodule here before building.
git submodule sync --recursive
git submodule update --init --recursive
Collaborator

Until now we've tried to avoid git submodules, as the UX can be a bit rough; we've preferred CMake's FetchContent. I think that would be worth looking into. At the very least, it's probably worth discussing the use of git submodules, given this would be the first one (personally I'm generally OK with it, but I know there are varying opinions in the community).

cc @tlrmchlsmth

Member

I do have pretty strong preferences against using submodules. At NeuralMagic in the DeepSparse days we found stale submodules to be a very annoying footgun.

If we're moving the code in-tree, it seems like we could avoid this. Or use FetchContent, as Lucas suggests.
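
To make the stale-submodule footgun concrete: `git submodule status` prefixes each line with `-` when a submodule is not initialized, `+` when the checked-out commit differs from the one recorded in the superproject, and `U` when it has merge conflicts. A rough sketch of a pre-build check based on that output follows. The function names and the state labels are made up for illustration, and paths containing spaces are not handled.

```python
import subprocess

# Prefix characters documented for `git submodule status`.
_STATE_BY_PREFIX = {
    "-": "uninitialized",
    "+": "stale",  # checked-out commit differs from the recorded one
    "U": "conflict",
}


def parse_submodule_status(output: str) -> dict[str, str]:
    """Map each submodule path to a coarse state label."""
    states: dict[str, str] = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        prefix = line[0]
        # Remaining fields are: <sha> <path> [(<describe>)]
        fields = line[1:].split()
        states[fields[1]] = _STATE_BY_PREFIX.get(prefix, "ok")
    return states


def check_submodules(repo_root: str) -> dict[str, str]:
    """Run git in repo_root and return only the submodules that need attention."""
    result = subprocess.run(
        ["git", "submodule", "status"],
        cwd=repo_root,
        capture_output=True,
        text=True,
        check=True,
    )
    states = parse_submodule_status(result.stdout)
    return {path: state for path, state in states.items() if state != "ok"}
```

A build script could call `check_submodules(".")` and fail early with a clear message if the result is non-empty, rather than compiling against an outdated checkout.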

Member Author

@LucasWilkinson I also agree that we should avoid using a submodule. This was only done here to stage things initially so that we could keep/show the integration scaffolding in this PR and still allow folks to pull it and build/try it.

We still need a final decision on where the Rust code should live. So far the discussion is leaning towards a subdir of the main vllm repo, since the code is tightly coupled and considered primarily an "internal" component (see the related discussion in #40846).

One thing that @BugenZhao is concerned about is how we retain the commit history if we do move the code here, since there are already a fair number of commits in https://github.com/Inferact/vllm-frontend-rs. There's quite a lot of code now and it is useful to be able to see the provenance of different parts of it.

If we do want to try to keep it, the commits can be recreated in this repo with updated paths, but it would mean having a whole bunch of new commits on main in one go, or possibly doing a merge commit (rather than squash-merge) which we haven't used in the past. Would welcome any opinions/thoughts about this.

BugenZhao added 2 commits May 5, 2026 13:36
# Conflicts:
#	docker/Dockerfile

Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Labels

ci/build cpu Related to CPU backends documentation Improvements or additions to documentation frontend nvidia ready ONLY add when PR is ready to merge/full CI is needed rocm Related to AMD ROCm v1

Projects

Status: Todo
Status: No status

8 participants