[Frontend][RFC] Rust front-end integration #40848
New CI pipeline file defining the Rust Frontend test group (new file, 107 lines):

```yaml
group: Rust Frontend
depends_on:
  - image-build
steps:
  - label: Rust Frontend OpenAI Coverage
    timeout_in_minutes: 90
    device: h200_18gb
    working_dir: "/vllm-workspace/tests"
    source_file_dependencies:
      - rust/
      - vllm/benchmarks/
      - vllm/entrypoints/openai/
      - vllm/entrypoints/serve/
      - vllm/v1/sample/
      - tests/utils.py
      - tests/benchmarks/test_serve_cli.py
      - tests/entrypoints/openai/chat_completion/test_chat_completion.py
      # - tests/entrypoints/openai/chat_completion/test_chat_logit_bias_validation.py
      # - tests/entrypoints/openai/chat_completion/test_chat_with_tool_reasoning.py
      # - tests/entrypoints/openai/completion/test_prompt_validation.py
      - tests/entrypoints/openai/completion/test_shutdown.py
      # - tests/entrypoints/openai/test_return_token_ids.py
      # - tests/entrypoints/openai/test_uds.py
      - tests/v1/sample/test_logprobs_e2e.py
    commands:
      - export VLLM_USE_RUST_FRONTEND=1
      - export VLLM_WORKER_MULTIPROC_METHOD=spawn
      - pytest -v -s benchmarks/test_serve_cli.py -k "not insecure and not (test_bench_serve and not test_bench_serve_chat)"
      - pytest -v -s entrypoints/openai/chat_completion/test_chat_completion.py
      # - pytest -v -s entrypoints/openai/chat_completion/test_chat_logit_bias_validation.py -k "not invalid"
      # - pytest -v -s entrypoints/openai/chat_completion/test_chat_with_tool_reasoning.py
      # - pytest -v -s entrypoints/openai/completion/test_prompt_validation.py -k "not prompt_embeds"
      - pytest -v -s entrypoints/openai/completion/test_shutdown.py -k "not engine_failure and not test_abort_timeout_exits_quickly"
      # - pytest -v -s entrypoints/openai/test_return_token_ids.py
      # - pytest -v -s entrypoints/openai/test_uds.py
      - pytest -v -s v1/sample/test_logprobs_e2e.py -k "test_prompt_logprobs_e2e_server"

  - label: Rust Frontend Serve/Admin Coverage
    timeout_in_minutes: 60
    device: h200_18gb
    working_dir: "/vllm-workspace/tests"
    source_file_dependencies:
      - rust/
      - vllm/entrypoints/openai/
      - vllm/entrypoints/serve/
      - vllm/v1/engine/
      - tests/utils.py
      # - tests/entrypoints/rpc/test_collective_rpc.py
      - tests/entrypoints/serve/disagg/test_serving_tokens.py
      - tests/entrypoints/serve/instrumentator/test_basic.py
      - tests/entrypoints/serve/instrumentator/test_metrics.py
      # - tests/entrypoints/serve/instrumentator/test_sleep.py
    commands:
      - export VLLM_USE_RUST_FRONTEND=1
      - export VLLM_WORKER_MULTIPROC_METHOD=spawn
      # - pytest -v -s entrypoints/rpc/test_collective_rpc.py
      - pytest -v -s entrypoints/serve/instrumentator/test_basic.py -k "not show_version and not server_load"
      - pytest -v -s entrypoints/serve/disagg/test_serving_tokens.py -k "not stream and not lora and not test_generate_logprobs and not stop_string_workflow"
      - pytest -v -s entrypoints/serve/instrumentator/test_metrics.py -k "text and not show and not run_batch and not test_metrics_counts and not test_metrics_exist"
      # - pytest -v -s entrypoints/serve/instrumentator/test_sleep.py

  - label: Rust Frontend Core Correctness
    timeout_in_minutes: 30
    device: h200_18gb
    working_dir: "/vllm-workspace/tests"
    source_file_dependencies:
      - rust/
      - vllm/entrypoints/openai/
      - tests/utils.py
      - tests/entrypoints/openai/correctness/test_lmeval.py
    commands:
      - export VLLM_USE_RUST_FRONTEND=1
      - export VLLM_WORKER_MULTIPROC_METHOD=spawn
      - pytest -s entrypoints/openai/correctness/test_lmeval.py::test_lm_eval_accuracy_v1_engine

  - label: Rust Frontend Tool Use
    timeout_in_minutes: 60
    working_dir: "/vllm-workspace/tests"
    source_file_dependencies:
      - rust/
      - vllm/entrypoints/openai/
      - vllm/tool_parsers/
      - tests/utils.py
      - tests/tool_use/
    commands:
      - export VLLM_USE_RUST_FRONTEND=1
      - export VLLM_WORKER_MULTIPROC_METHOD=spawn
      - pytest -v -s tool_use --ignore=tool_use/mistral --models llama3.2 toolACE -k "not test_response_format_with_tool_choice_required and not test_parallel_tool_calls_false and not test_tool_call_and_choice"

  - label: Rust Frontend Distributed
    timeout_in_minutes: 30
    num_devices: 4
    working_dir: "/vllm-workspace/tests"
    source_file_dependencies:
      - rust/
      - vllm/distributed/
      - vllm/engine/
      - vllm/executor/
      - vllm/v1/engine/
      - vllm/v1/worker/
      - tests/utils.py
      - tests/v1/distributed/test_internal_lb_dp.py
    commands:
      - export VLLM_USE_RUST_FRONTEND=1
      - export VLLM_WORKER_MULTIPROC_METHOD=spawn
      - export NCCL_CUMEM_HOST_ENABLE=0
      - TP_SIZE=1 DP_SIZE=4 pytest -v -s v1/distributed/test_internal_lb_dp.py -k "not 4 and not server_info"
```
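Any of these steps can be reproduced locally by exporting the same environment variables before invoking pytest from the tests/ directory. A minimal sketch for the "Rust Frontend Core Correctness" step, assuming a local vLLM checkout with the Rust frontend binary already built:

```bash
# Sketch: run the Core Correctness step locally (commands taken from the
# pipeline above; assumes a built Rust frontend and a working GPU setup).
cd tests

export VLLM_USE_RUST_FRONTEND=1            # route requests through the Rust frontend
export VLLM_WORKER_MULTIPROC_METHOD=spawn  # match the worker spawn method used in CI

pytest -s entrypoints/openai/correctness/test_lmeval.py::test_lm_eval_accuracy_v1_engine
```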
New `.gitmodules` entry registering the Rust frontend as a submodule:

```ini
[submodule "rust"]
	path = rust
	url = https://github.com/Inferact/vllm-frontend-rs.git
```

A review thread on the `path = rust` line:

Member: Nit: what about "rsrc" to match "csrc"? Or just putting it as a subdir to avoid adding another top-level dir, like "csrc/rust/"?

Author: Yeah, we can decide on the best name. I considered this, but it seemed non-standard. Now I realize csrc isn't really standard either, so maybe you're right and rsrc would be better. I'm not keen on putting it under csrc.

Member: Personally I don't find it a common convention to use …
New build script, build_rust.sh (new file, 43 lines):

```bash
#!/bin/bash
# Build the vllm-rs Rust frontend binary and install it into the vllm package.
# Usage: ./build_rust.sh [--debug]
#
# By default builds in release mode. Pass --debug for faster compile times
# during development.

set -euo pipefail

REPO_ROOT="$(cd "$(dirname "$0")" && pwd)"
RUST_DIR="$REPO_ROOT/rust"
TARGET_PATH="$REPO_ROOT/vllm/vllm-rs"

# Read the required toolchain from rust-toolchain.toml.
TOOLCHAIN=$(grep '^channel' "$RUST_DIR/rust-toolchain.toml" | sed 's/.*= *"\(.*\)"/\1/')

# Ensure rustup and the required toolchain are available.
if ! command -v rustup &>/dev/null; then
    echo "rustup not found, installing..."
    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y --default-toolchain none
    source "$HOME/.cargo/env"
fi

if ! rustup run "$TOOLCHAIN" rustc --version &>/dev/null; then
    echo "Installing Rust toolchain: $TOOLCHAIN"
    rustup toolchain install "$TOOLCHAIN"
fi

if [[ "${1:-}" == "--debug" ]]; then
    PROFILE_ARGS=()
    PROFILE_DIR="debug"
else
    PROFILE_ARGS=(--release)
    PROFILE_DIR="release"
fi

cargo +"$TOOLCHAIN" build "${PROFILE_ARGS[@]}" \
    --manifest-path "$RUST_DIR/Cargo.toml" \
    --bin vllm-rs \
    --features native-tls-vendored

cp "$RUST_DIR/target/$PROFILE_DIR/vllm-rs" "$TARGET_PATH"
echo "Installed vllm-rs to $TARGET_PATH"
```
Member: Until this point we've tried to avoid using git submodules, as the UX can be a bit rough; we've preferred to use CMake's FetchContent. I think it would be worth looking into that here. At the very least, it's probably worth discussing the use of git submodules, given this would be the first one. Personally I'm generally OK with it, but I know there are varying opinions in the community. cc @tlrmchlsmth
Member: I do have pretty strong preferences against using submodules. At NeuralMagic, in the DeepSparse days, we found stale submodules to be a very annoying footgun. If we're moving the code in-tree, it seems like we could avoid this. Or use FetchContent like Lucas suggests.
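For context on the UX concern, a generic illustration (not specific to this PR) of the extra sync step submodules impose on every contributor, and of how a checkout silently goes stale:

```bash
# Fresh clone: the rust/ directory is empty until the submodule is initialized.
git clone https://github.com/vllm-project/vllm.git
cd vllm
git submodule update --init rust

# After a later pull, the pinned submodule commit may have moved; without
# re-running the sync step, rust/ is silently left at the old commit.
git pull
git submodule status        # a leading '+' marks an out-of-sync submodule
git submodule update rust   # bring rust/ back to the pinned commit
```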
Author: @LucasWilkinson I also agree that we should avoid using a submodule. It was only done here to stage things initially, so that this PR could keep/show the integration scaffolding while still letting folks pull it and build/try it.

We still need a final decision on where the Rust code should live. So far that is leaning towards a subdirectory of the main vLLM repo, since the code is tightly coupled and considered primarily an "internal" component (see the related discussion in #40846).

One thing @BugenZhao is concerned about is how we retain the commit history if we do move the code here, since there's already a fair amount of it in https://github.com/Inferact/vllm-frontend-rs. There's quite a lot of code now, and it's useful to be able to see the provenance of different parts of it.

If we do want to keep the history, the commits can be recreated in this repo with updated paths, but that would mean landing a whole batch of new commits on main in one go, or possibly doing a merge commit (rather than a squash-merge), which we haven't used in the past. Would welcome any opinions/thoughts on this.
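For illustration, recreating the commits with updated paths could look roughly like the following. This is a sketch, not a decision on the approach: it assumes the git-filter-repo tool, a rust/ target directory, and a main branch in the external repo.

```bash
# Sketch: import vllm-frontend-rs history under rust/ while preserving commits.
# Assumes git-filter-repo is installed; directory and branch names are illustrative.

# 1. Rewrite the external repo so all of its paths live under rust/.
git clone https://github.com/Inferact/vllm-frontend-rs.git frontend-rs
cd frontend-rs
git filter-repo --to-subdirectory-filter rust

# 2. Merge the rewritten history into vLLM with an explicit merge commit.
cd ../vllm
git remote add frontend-rs ../frontend-rs
git fetch frontend-rs
git merge --allow-unrelated-histories frontend-rs/main
```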