
Commit 1736743

Authored by mfuntowicz, glegendre01, Hugoch, and paulinebm
Give TensorRT-LLM a proper CI/CD 😍 (#2886)
* test(ctest) enable address sanitizer
* feat(trtllm): expose finish reason to Rust
* feat(trtllm): fix logits retrieval
* misc(ci): enabe building tensorrt-llm
* misc(ci): update Rust action toolchain
* misc(ci): let's try to build the Dockerfile for trtllm # Conflicts: # Dockerfile_trtllm
* misc(ci): provide mecanism to cache inside container
* misc(ci): export aws creds as output of step
* misc(ci): let's try this way
* misc(ci): again
* misc(ci): again
* misc(ci): add debug profile
* misc(ci): add debug profile
* misc(ci): lets actually use sccache ...
* misc(ci): do not build with ssl enabled
* misc(ci): WAT
* misc(ci): WAT
* misc(ci): WAT
* misc(ci): WAT
* misc(ci): WAT
* misc(backend): test with TGI S3 conf
* misc(backend): test with TGI S3 conf
* misc(backend): once more?
* misc(backend): let's try with GHA
* misc(backend): missing env directive
* misc(backend): make sure to correctly set IS_GHA_BUILD=true in wf
* misc(backend): ok let's debug smtg
* misc(backend): WWWWWWWWWWWWWAAAAAAAA
* misc(backend): kthxbye retry s3
* misc(backend): use session token
* misc(backend): add more info
* misc(backend): lets try 1h30
* misc(backend): lets try 1h30
* misc(backend): increase to 2h
* misc(backend): lets try...
* misc(backend): lets try...
* misc(backend): let's build for ci-runtime
* misc(backend): let's add some more tooling
* misc(backend): add some tags
* misc(backend): disable Werror for now
* misc(backend): added automatic gha detection
* misc(backend): remove leak sanitizer which is included in asan
* misc(backend): forward env
* misc(backend): forward env
* misc(backend): let's try
* misc(backend): let's try
* misc(backend): again
* misc(backend): again
* misc(backend): again
* misc(backend): again
* misc(backend): again
* misc(backend): fix sscache -> sccache
* misc(backend): fix sscache -> sccache
* misc(backend): fix sscache -> sccache
* misc(backend): let's actually cache things now
* misc(backend): let's actually cache things now
* misc(backend): attempt to run the testS?
* misc(backend): attempt to run the tests?
* misc(backend): attempt to run the tests?
* change runner size
* fix: Correctly tag docker images (#2878)
* fix: Correctly tag docker images
* fix: Correctly tag docker images
* misc(llamacpp): maybe?
* misc(llamacpp): maybe?
* misc(llamacpp): maybe?
* misc(ci): gogogo
* misc(ci): gogogo
* misc(ci): gogogo
* misc(ci): gogogo
* misc(ci): gogogo
* misc(ci): gogogo
* misc(ci): go
* misc(ci): go
* misc(ci): go
* misc(ci): use bin folder
* misc(ci): make the wf callable for reuse
* misc(ci): make the wf callable for reuse (bis)
* misc(ci): make the wf callable for reuse (bis)
* misc(ci): give the wf a name
* Create test-trtllm.yml
* Update test-trtllm.yml
* Create build-trtllm2
* Rename build-trtllm2 to 1-build-trtllm2
* Rename test-trtllm.yml to 1-test-trtllm2.yml
* misc(ci): fw secrets
* Update 1-test-trtllm2.yml
* Rename 1-build-trtllm2 to 1-build-trtllm2.yml
* Update 1-test-trtllm2.yml
* misc(ci): use ci-build.yaml as main dispatcher
* Delete .github/workflows/1-test-trtllm2.yml
* Delete .github/workflows/1-build-trtllm2.yml
* misc(ci): rights?
* misc(ci): rights?
* misc(ci): once more?
* misc(ci): once more?
* misc(ci): baby more time?
* misc(ci): baby more time?
* misc(ci): try the permission above again?
* misc(ci): try the permission above again?
* misc(ci): try the permission scoped again?
* misc(ci): install tensorrt_llm_executor_static
* misc(ci): attempt to rebuild with sccache?
* misc(ci): run the tests on GPU instance
* misc(ci): let's actually setup sccache in the build.rs
* misc(ci): reintroduce variables
* misc(ci): enforce sccache
* misc(ci): correct right job name dependency
* misc(ci): detect dev profile for debug
* misc(ci): detect gha build
* misc(ci): detect gha build
* misc(ci): ok debug
* misc(ci): wtf
* misc(ci): wtf2
* misc(ci): wtf3
* misc(ci): use commit HEAD instead of merge commit for image id
* misc(ci): wtfinfini
* misc(ci): wtfinfini
* misc(ci): KAMEHAMEHA
* Merge TRTLLM in standard CI
* misc(ci): remove input machine
* misc(ci): missing id-token for AWS auth
* misc(ci): missing id-token for AWS auth
* misc(ci): missing id-token for AWS auth
* misc(ci): again...
* misc(ci): again...
* misc(ci): again...
* misc(ci): again...
* misc(ci): missing benchmark
* misc(ci): missing backends
* misc(ci): missing launcher
* misc(ci): give everything aws needs
* misc(ci): give everything aws needs
* misc(ci): fix warnings
* misc(ci): attempt to fix sccache not building trtllm
* misc(ci): attempt to fix sccache not building trtllm again

---------

Co-authored-by: Guillaume LEGENDRE <[email protected]>
Co-authored-by: Hugo Larcher <[email protected]>
Co-authored-by: Pauline Bailly-Masson <[email protected]>
1 parent b980848 commit 1736743

14 files changed: +494 −178

.github/workflows/build.yaml

+53 −1

@@ -31,15 +31,28 @@ jobs:
     group: ${{ github.workflow }}-build-and-push-image-${{ inputs.hardware }}-${{ github.head_ref || github.run_id }}
     cancel-in-progress: true
   runs-on:
-    group: aws-highmemory-32-plus-priv
+    group: aws-highmemory-64-plus-priv
   permissions:
     contents: write
     packages: write
+    id-token: write
   steps:
     - name: Checkout repository
       uses: actions/checkout@v4
     - name: Inject slug/short variables
       uses: rlespinasse/[email protected]
+    - name: Extract TensorRT-LLM version
+      run: |
+        echo "TENSORRT_LLM_VERSION=$(grep -oP '([a-z,0-9]{40})' $GITHUB_WORKSPACE/backends/trtllm/cmake/trtllm.cmake)" >> $GITHUB_ENV
+        echo "TensorRT-LLM version: ${{ env.TENSORRT_LLM_VERSION }}"
+    - name: "Configure AWS Credentials"
+      id: aws-creds
+      uses: aws-actions/configure-aws-credentials@v4
+      with:
+        aws-region: us-east-1
+        role-to-assume: ${{ secrets.AWS_ROLE_GITHUB_TGI_TEST }}
+        role-duration-seconds: 7200
+        output-credentials: true
     - name: Construct harware variables
       shell: bash
       run: |
@@ -52,6 +65,7 @@ jobs:
           export runs_on="aws-g6-12xl-plus-priv-cache"
           export platform=""
           export extra_pytest=""
+          export target="nil"
           ;;
         cuda-trtllm)
           export dockerfile="Dockerfile_trtllm"
@@ -61,6 +75,10 @@ jobs:
           export runs_on="ubuntu-latest"
           export platform=""
           export extra_pytest=""
+          export target="ci-runtime"
+          export sccache_s3_key_prefix="trtllm"
+          export sccache_region="us-east-1"
+          export build_type="dev"
           ;;
         rocm)
           export dockerfile="Dockerfile_amd"
@@ -71,6 +89,7 @@ jobs:
           export runs_on="ubuntu-latest"
           export platform=""
           export extra_pytest="-k test_flash_gemma_gptq_load"
+          export target="nil"
           ;;
         intel-xpu)
           export dockerfile="Dockerfile_intel"
@@ -80,6 +99,7 @@ jobs:
           export runs_on="ubuntu-latest"
           export platform="xpu"
           export extra_pytest=""
+          export target="nil"
           ;;
         intel-cpu)
           export dockerfile="Dockerfile_intel"
@@ -90,6 +110,7 @@ jobs:
           export runs_on="aws-highmemory-32-plus-priv"
           export platform="cpu"
           export extra_pytest="-k test_flash_gemma_simple"
+          export target="nil"
           ;;
         esac
         echo $dockerfile
@@ -106,6 +127,10 @@ jobs:
         echo "RUNS_ON=${runs_on}" >> $GITHUB_ENV
         echo "EXTRA_PYTEST=${extra_pytest}" >> $GITHUB_ENV
         echo REGISTRY_MIRROR=$REGISTRY_MIRROR >> $GITHUB_ENV
+        echo "TARGET=${target}" >> $GITHUB_ENV
+        echo "SCCACHE_S3_KEY_PREFIX=${sccache_s3_key_prefix}" >> $GITHUB_ENV
+        echo "SCCACHE_REGION=${sccache_region}" >> $GITHUB_ENV
+        echo "BUILD_TYPE=${build_type}" >> $GITHUB_ENV
     - name: Initialize Docker Buildx
       uses: docker/setup-buildx-action@v3
       with:
@@ -170,6 +195,14 @@ jobs:
           GIT_SHA=${{ env.GITHUB_SHA }}
           DOCKER_LABEL=sha-${{ env.GITHUB_SHA_SHORT }}${{ env.LABEL }}
           PLATFORM=${{ env.PLATFORM }}
+          build_type=${{ env.BUILD_TYPE }}
+          is_gha_build=true
+          aws_access_key_id=${{ steps.aws-creds.outputs.aws-access-key-id }}
+          aws_secret_access_key=${{ steps.aws-creds.outputs.aws-secret-access-key }}
+          aws_session_token=${{ steps.aws-creds.outputs.aws-session-token }}
+          sccache_bucket=${{ secrets.AWS_S3_BUCKET_GITHUB_TGI_TEST }}
+          sccache_s3_key_prefix=${{ env.SCCACHE_S3_KEY_PREFIX }}
+          sccache_region=${{ env.SCCACHE_REGION }}
         tags: ${{ steps.meta.outputs.tags || steps.meta-pr.outputs.tags }}
         labels: ${{ steps.meta.outputs.labels || steps.meta-pr.outputs.labels }}
         cache-from: type=s3,region=us-east-1,bucket=ci-docker-buildx-cache,name=text-generation-inference-cache${{ env.LABEL }},mode=min,access_key_id=${{ secrets.S3_CI_DOCKER_BUILDX_CACHE_ACCESS_KEY_ID }},secret_access_key=${{ secrets.S3_CI_DOCKER_BUILDX_CACHE_SECRET_ACCESS_KEY }},mode=min
@@ -215,3 +248,22 @@ jobs:
         echo $DOCKER_IMAGE
         docker pull $DOCKER_IMAGE
         pytest -s -vv integration-tests ${PYTEST_FLAGS} ${EXTRA_PYTEST}
+
+  backend_trtllm_cxx_tests:
+    needs: build-and-push
+    if: needs.build-and-push.outputs.label == '-trtllm'
+    concurrency:
+      group: ${{ github.workflow }}-${{ github.job }}-trtllm-${{ github.head_ref || github.run_id }}
+      cancel-in-progress: true
+    runs-on:
+      group: aws-g6-12xl-plus-priv-cache
+    container:
+      image: ${{ needs.build-and-push.outputs.docker_image }}
+      credentials:
+        username: ${{ secrets.REGISTRY_USERNAME }}
+        password: ${{ secrets.REGISTRY_PASSWORD }}
+      options: --gpus all --shm-size=8g
+
+    steps:
+      - name: Run C++/CUDA tests
+        run: /usr/local/tgi/bin/tgi_trtllm_backend_tests
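The new "Extract TensorRT-LLM version" step relies on `grep -oP '([a-z,0-9]{40})'` to pull the pinned 40-character commit hash out of `backends/trtllm/cmake/trtllm.cmake`. A minimal sketch of the same extraction, run against a made-up stand-in for the cmake file (the hash and the `TRTLLM_COMMIT` line below are invented for illustration, not the real file contents):

```shell
# Hedged sketch: reproduce the hash-extraction step against a fake
# trtllm.cmake. Hash and variable name are placeholders.
cat > /tmp/trtllm.cmake <<'EOF'
set(TRTLLM_COMMIT "0123456789abcdef0123456789abcdef01234567")
EOF

# Same pattern as the workflow step: any 40-char run of [a-z,0-9].
TENSORRT_LLM_VERSION=$(grep -oP '([a-z,0-9]{40})' /tmp/trtllm.cmake)
echo "$TENSORRT_LLM_VERSION"
```

Note that the character class also accepts commas, so the pattern is slightly looser than a strict lowercase-hex hash match; it works here because nothing else on the line forms a 40-character run.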

.github/workflows/ci_build.yaml

+1 −0

@@ -42,6 +42,7 @@ jobs:
     permissions:
       contents: write
       packages: write
+      id-token: write
     with:
       hardware: ${{ matrix.hardware }}
       # https://github.com/actions/runner/issues/2206
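The single `id-token: write` permission added here is what lets `aws-actions/configure-aws-credentials` authenticate without long-lived keys: the action mints a GitHub OIDC token and exchanges it with AWS STS for temporary credentials. A rough sketch of the token request the action builds under the hood; the request URL and bearer token are placeholders here (on a real runner, GitHub injects both variables into the job):

```shell
# Hedged sketch: the OIDC request that `id-token: write` makes possible.
# Both values are placeholders; GitHub supplies them at job time.
ACTIONS_ID_TOKEN_REQUEST_URL="https://runner.example.githubusercontent.com/idtoken?api-version=2.0"
ACTIONS_ID_TOKEN_REQUEST_TOKEN="placeholder-runner-token"

# The AWS action appends the STS audience before fetching the JWT.
request_url="${ACTIONS_ID_TOKEN_REQUEST_URL}&audience=sts.amazonaws.com"
echo "$request_url"

# The actual fetch (network call, not run here) would look like:
#   curl -sS -H "Authorization: bearer $ACTIONS_ID_TOKEN_REQUEST_TOKEN" "$request_url"
```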

Cargo.toml

+14 −14 (whitespace-only re-indentation of the member lists; indentation below is reconstructed)

@@ -1,21 +1,21 @@
 [workspace]
 members = [
-  "benchmark",
-  "backends/v2",
-  "backends/v3",
-  "backends/grpc-metadata",
-  "backends/trtllm",
-  "launcher",
-  "router"
+    "benchmark",
+    "backends/v2",
+    "backends/v3",
+    "backends/grpc-metadata",
+    "backends/trtllm",
+    "launcher",
+    "router"
 ]
 default-members = [
-  "benchmark",
-  "backends/v2",
-  "backends/v3",
-  "backends/grpc-metadata",
-  # "backends/trtllm",
-  "launcher",
-  "router"
+    "benchmark",
+    "backends/v2",
+    "backends/v3",
+    "backends/grpc-metadata",
+    # "backends/trtllm",
+    "launcher",
+    "router"
 ]
 resolver = "2"

Dockerfile_trtllm

+67 −32

@@ -1,19 +1,7 @@
-ARG CUDA_ARCH_LIST="75-real;80-real;86-real;89-real;90-real"
-ARG OMPI_VERSION="4.1.7rc1"
-
-# Build dependencies resolver stage
-FROM lukemathwalker/cargo-chef:latest-rust-1.84.0 AS chef
-WORKDIR /usr/src/text-generation-inference/backends/trtllm
-
-FROM chef AS planner
-COPY Cargo.lock Cargo.lock
-COPY Cargo.toml Cargo.toml
-COPY rust-toolchain.toml rust-toolchain.toml
-COPY router router
-COPY benchmark/ benchmark/
-COPY backends/ backends/
-COPY launcher/ launcher/
-RUN cargo chef prepare --recipe-path recipe.json
+ARG cuda_arch_list="75-real;80-real;86-real;89-real;90-real"
+ARG ompi_version="4.1.7rc1"
+ARG build_type=release
+ARG is_gha_build=false
 
 # CUDA dependent dependencies resolver stage
 FROM nvidia/cuda:12.6.3-cudnn-devel-ubuntu24.04 AS cuda-builder
@@ -26,8 +14,11 @@ RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y \
     g++-14 \
     git \
     git-lfs \
+    lld \
     libssl-dev \
     libucx-dev \
+    libasan8 \
+    libubsan1 \
     ninja-build \
     pkg-config \
     pipx \
@@ -43,9 +34,9 @@ ENV TENSORRT_INSTALL_PREFIX=/usr/local/tensorrt
 
 # Install OpenMPI
 FROM cuda-builder AS mpi-builder
-ARG OMPI_VERSION
+ARG ompi_version
 
-ENV OMPI_TARBALL_FILENAME="openmpi-$OMPI_VERSION.tar.bz2"
+ENV OMPI_TARBALL_FILENAME="openmpi-$ompi_version.tar.bz2"
 RUN wget "https://download.open-mpi.org/release/open-mpi/v4.1/$OMPI_TARBALL_FILENAME" -P /opt/src && \
     mkdir /usr/src/mpi && \
     tar -xf "/opt/src/$OMPI_TARBALL_FILENAME" -C /usr/src/mpi --strip-components=1 && \
@@ -65,34 +56,56 @@ RUN chmod +x /opt/install_tensorrt.sh && \
 FROM cuda-builder AS tgi-builder
 WORKDIR /usr/src/text-generation-inference
 
+# Scoped global args reuse
+ARG is_gha_build
+ARG build_type
+
 # Install Rust
+ENV PATH="/root/.cargo/bin:$PATH"
 RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | bash -s -- -y && \
     chmod -R a+w /root/.rustup && \
-    chmod -R a+w /root/.cargo
+    chmod -R a+w /root/.cargo && \
+    cargo install sccache --locked
+
+# SCCACHE Specifics args - before finding a better, more generic, way...
+ARG aws_access_key_id
+ARG aws_secret_access_key
+ARG aws_session_token
+ARG sccache_bucket
+ARG sccache_s3_key_prefix
+ARG sccache_region
+
+ENV AWS_ACCESS_KEY_ID=$aws_access_key_id
+ENV AWS_SECRET_ACCESS_KEY=$aws_secret_access_key
+ENV AWS_SESSION_TOKEN=$aws_session_token
+ENV SCCACHE_BUCKET=$sccache_bucket
+ENV SCCACHE_S3_KEY_PREFIX=$sccache_s3_key_prefix
+ENV SCCACHE_REGION=$sccache_region
 
-ENV PATH="/root/.cargo/bin:$PATH"
-RUN cargo install cargo-chef
-
-# Cache dependencies
-COPY --from=planner /usr/src/text-generation-inference/backends/trtllm/recipe.json .
-RUN cargo chef cook --release --recipe-path recipe.json
-
-# Build actual TGI
-ARG CUDA_ARCH_LIST
-ENV CMAKE_PREFIX_PATH="/usr/local/mpi:/usr/local/tensorrt:$CMAKE_PREFIX_PATH"
 ENV LD_LIBRARY_PATH="/usr/local/mpi/lib:$LD_LIBRARY_PATH"
 ENV PKG_CONFIG_PATH="/usr/local/mpi/lib/pkgconfig:$PKG_CONFIG_PATH"
+ENV CMAKE_PREFIX_PATH="/usr/local/mpi:/usr/local/tensorrt:$CMAKE_PREFIX_PATH"
+
+ENV USE_LLD_LINKER=ON
+ENV CUDA_ARCH_LIST=${cuda_arch_list}
+ENV IS_GHA_BUILD=${is_gha_build}
 
 COPY Cargo.lock Cargo.lock
 COPY Cargo.toml Cargo.toml
 COPY rust-toolchain.toml rust-toolchain.toml
 COPY router router
-COPY backends/trtllm backends/trtllm
+COPY backends backends
+COPY benchmark benchmark
+COPY launcher launcher
 COPY --from=trt-builder /usr/local/tensorrt /usr/local/tensorrt
 COPY --from=mpi-builder /usr/local/mpi /usr/local/mpi
+
 RUN mkdir $TGI_INSTALL_PREFIX && mkdir "$TGI_INSTALL_PREFIX/include" && mkdir "$TGI_INSTALL_PREFIX/lib" && \
-    cd backends/trtllm && \
-    CMAKE_INSTALL_PREFIX=$TGI_INSTALL_PREFIX cargo build --release
+    python3 backends/trtllm/scripts/setup_sccache.py --is-gha-build ${is_gha_build} && \
+    CMAKE_INSTALL_PREFIX=$TGI_INSTALL_PREFIX \
+    RUSTC_WRAPPER=sccache \
+    cargo build --profile ${build_type} --package text-generation-backends-trtllm --bin text-generation-backends-trtllm && \
+    sccache --show-stats
 
 FROM nvidia/cuda:12.6.3-cudnn-runtime-ubuntu24.04 AS runtime
 RUN apt update && apt install -y libucx0 pipx python3-minimal python3-dev python3-pip python3-venv && \
@@ -116,6 +129,28 @@ FROM runtime
 
 LABEL co.huggingface.vendor="Hugging Face Inc."
 LABEL org.opencontainers.image.authors="[email protected]"
+LABEL org.opencontainers.title="Text-Generation-Inference TensorRT-LLM Backend"
 
 ENTRYPOINT ["./text-generation-launcher"]
 CMD ["--executor-worker", "/usr/local/tgi/bin/executorWorker"]
+
+# This is used only for the CI/CD
+FROM nvidia/cuda:12.6.3-cudnn-runtime-ubuntu24.04 AS ci-runtime
+RUN apt update && apt install -y libasan8 libubsan1 libucx0 pipx python3-minimal python3-dev python3-pip python3-venv && \
+    rm -rf /var/lib/{apt,dpkg,cache,log}/ && \
+    pipx ensurepath && \
+    pipx install --include-deps transformers tokenizers
+
+WORKDIR /usr/local/tgi/bin
+
+ENV PATH=/root/.local/share/pipx/venvs/transformers/bin/:$PATH
+ENV LD_LIBRARY_PATH="/usr/local/tgi/lib:/usr/local/mpi/lib:/usr/local/tensorrt/lib:/usr/local/cuda/lib64/stubs:$LD_LIBRARY_PATH"
+ENV TOKENIZERS_PARALLELISM=false
+ENV OMPI_MCA_plm_rsh_agent=""
+
+COPY --from=mpi-builder /usr/local/mpi /usr/local/mpi
+COPY --from=trt-builder /usr/local/tensorrt /usr/local/tensorrt
+COPY --from=tgi-builder /usr/local/tgi /usr/local/tgi
+
+# Basically we copy from target/debug instead of target/release
+COPY --from=tgi-builder /usr/src/text-generation-inference/target/debug/text-generation-backends-trtllm /usr/local/tgi/bin/text-generation-launcher
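The tgi-builder stage now routes every `rustc` invocation through sccache (`RUSTC_WRAPPER=sccache`), pointing it at S3 when the AWS credentials and bucket arrive as build args. The `setup_sccache.py` script it calls is not part of this diff, so the helper below is a hypothetical shell rendition of the backend-selection logic it plausibly performs, not the real script:

```shell
# Hypothetical sketch (the real setup_sccache.py is not shown in this diff):
# pick the S3 cache backend only for GHA builds that actually received a
# bucket, otherwise fall back to sccache's local on-disk cache.
sccache_backend() {
  # $1 = is_gha_build ("true"/"false"); prints the backend to use
  if [ "$1" = "true" ] && [ -n "$SCCACHE_BUCKET" ]; then
    echo "s3://$SCCACHE_BUCKET/$SCCACHE_S3_KEY_PREFIX"
  else
    echo "local-disk"
  fi
}

export SCCACHE_BUCKET="example-tgi-cache"   # placeholder bucket name
export SCCACHE_S3_KEY_PREFIX="trtllm"
sccache_backend true    # CI path: shared S3 cache
sccache_backend false   # local build: no S3 needed
```

Splitting on `is_gha_build` matches the Dockerfile's `ARG is_gha_build=false` default: a plain local `docker build` gets a working cacheless/local path without any AWS secrets.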
