Add vLLM #221
Changes from all commits
```diff
@@ -35,7 +35,7 @@ RUN --mount=type=cache,target=/go/pkg/mod \
 FROM docker/docker-model-backend-llamacpp:${LLAMA_SERVER_VERSION}-${LLAMA_SERVER_VARIANT} AS llama-server
 
 # --- Final image ---
-FROM docker.io/${BASE_IMAGE} AS final
+FROM docker.io/${BASE_IMAGE} AS llamacpp
 
 ARG LLAMA_SERVER_VARIANT
 
@@ -55,9 +55,6 @@ RUN mkdir -p /var/run/model-runner /app/bin /models && \
     chown -R modelrunner:modelrunner /var/run/model-runner /app /models && \
     chmod -R 755 /models
 
-# Copy the built binary from builder
-COPY --from=builder /app/model-runner /app/model-runner
-
 # Copy the llama.cpp binary from the llama-server stage
 ARG LLAMA_BINARY_PATH
 COPY --from=llama-server ${LLAMA_BINARY_PATH}/ /app/.
@@ -77,3 +74,31 @@ ENV LD_LIBRARY_PATH=/app/lib
 LABEL com.docker.desktop.service="model-runner"
 
 ENTRYPOINT ["/app/model-runner"]
+
+# --- vLLM variant ---
+FROM llamacpp AS vllm
+
+ARG VLLM_VERSION
+
+USER root
+
+RUN apt update && apt install -y python3 python3-venv python3-dev curl ca-certificates build-essential && rm -rf /var/lib/apt/lists/*
+
+RUN mkdir -p /opt/vllm-env && chown -R modelrunner:modelrunner /opt/vllm-env
+
+USER modelrunner
+
+# Install uv and vLLM as modelrunner user
+RUN curl -LsSf https://astral.sh/uv/install.sh | sh \
+    && ~/.local/bin/uv venv --python /usr/bin/python3 /opt/vllm-env \
```

Inline review comments on the uv/vLLM install step:
Contributor: If we change this to copy from the vllm/vllm-openai:v0.11.0 container, we get DGX Spark support. (I know I suggested doing it this less hacky way; apologies, I didn't realize that container had aarch64 support and this approach doesn't appear to.) Could be a follow-on PR, too.
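For illustration only, a minimal sketch of the copy-from-container idea being suggested. The source path is an assumption about the vllm/vllm-openai image layout, which this PR does not show:

```dockerfile
# Hypothetical sketch of the suggested alternative: reuse the official image's
# prebuilt vLLM environment instead of pip-installing from PyPI. The source
# path is an assumed location, not one verified against the actual image.
FROM vllm/vllm-openai:v0.11.0 AS vllm-openai

FROM llamacpp AS vllm
COPY --from=vllm-openai /usr/local/lib/python3.12/dist-packages /opt/vllm-env/lib
```

As the next comment points out, this kind of cross-image copy is fragile (missing files, OS and library version mismatches), which is why the wheel-based approach below was preferred.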
Contributor: Or, even better, install the wheels from here: https://wheels.vllm.ai/b8b302cde434df8c9289a2b465406b47ebab1c2d/vllm/. That commit SHA is the v0.11.0 one. They tipped me off in the vLLM Slack that they build CUDA x86_64 and aarch64 wheels for every commit. So this is the same thing, but with an aarch64 version as well. It would be better than the hacky container copy, which is prone to error: missing files, OS mismatch, library version mismatch.
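A sketch of how that might slot into the existing install step. Treating the per-commit URL as a package index and passing it via --extra-index-url is an assumption based on this comment, not something shown in the PR:

```dockerfile
# Hypothetical: resolve vLLM from the per-commit wheel index (which, per the
# comment above, publishes both x86_64 and aarch64 CUDA wheels) rather than
# PyPI alone. The SHA is the one the reviewer identified as v0.11.0.
ARG VLLM_COMMIT=b8b302cde434df8c9289a2b465406b47ebab1c2d
RUN ~/.local/bin/uv pip install --python /opt/vllm-env/bin/python \
    --extra-index-url "https://wheels.vllm.ai/${VLLM_COMMIT}/" \
    "vllm==${VLLM_VERSION}"
```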
Contributor: This is a way to get that programmatically:
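The snippet itself did not survive extraction; purely as an illustrative stand-in (an assumption, not the original comment's code), the release-tag-to-commit mapping can be looked up with git:

```bash
# List the v0.11.0 tag on the vLLM repo. For an annotated tag, the line
# suffixed with ^{} carries the peeled commit SHA, which is the value to
# plug into the wheels.vllm.ai URL.
git ls-remote --tags https://github.com/vllm-project/vllm v0.11.0
```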
The diff then continues:

```diff
+    && ~/.local/bin/uv pip install --python /opt/vllm-env/bin/python "vllm==${VLLM_VERSION}"
+
+RUN /opt/vllm-env/bin/python -c "import vllm; print(vllm.__version__)" > /opt/vllm-env/version
+
+FROM llamacpp AS final-llamacpp
+# Copy the built binary from builder
+COPY --from=builder /app/model-runner /app/model-runner
+
+FROM vllm AS final-vllm
+# Copy the built binary from builder
+COPY --from=builder /app/model-runner /app/model-runner
```

doringeman marked these conversations as resolved.
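For orientation, the renamed stages make the image flavor selectable at build time via --target. A usage sketch, assuming the stage names and build arg from the diff above (image tags and the version value are placeholders, not from the PR):

```bash
# Default llama.cpp flavor (stage name from the diff above)
docker build --target final-llamacpp -t model-runner:llamacpp .

# vLLM flavor; VLLM_VERSION pins what uv installs into /opt/vllm-env
# (0.11.0 is used purely as an example value)
docker build --target final-vllm --build-arg VLLM_VERSION=0.11.0 -t model-runner:vllm .
```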
Review comment: The tag generation logic for vllm-cuda duplicates the pattern used for the cuda tags above. Consider extracting this into a reusable function or template to reduce code duplication and improve maintainability.
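The CI code being referenced is not part of this diff; purely as a hypothetical illustration of the suggestion, a shared shell helper might look like this (names and tag shapes invented for the example):

```bash
# Hypothetical helper: one tag template for every backend variant, instead of
# duplicating the same pattern for "cuda" and "vllm-cuda" separately.
generate_tags() {
  local variant="$1" version="$2"
  echo "docker/model-runner:${variant}"
  echo "docker/model-runner:${variant}-${version}"
}

generate_tags cuda "v1.0.0"        # docker/model-runner:cuda, docker/model-runner:cuda-v1.0.0
generate_tags vllm-cuda "v1.0.0"   # docker/model-runner:vllm-cuda, docker/model-runner:vllm-cuda-v1.0.0
```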