
Recommended Installation Method

Triton SDK Container

The recommended way to access Perf Analyzer is to run the pre-built executable from within the Triton SDK Docker container, available on the NVIDIA GPU Cloud (NGC) Catalog. Perf Analyzer will work as long as the SDK container can reach the inference server's address and port over the network; the `--net=host` option in the command below gives the container the host's network stack.

```bash
export RELEASE=<yy.mm> # e.g. for the December 2024 release: export RELEASE=24.12

docker run --rm --gpus=all -it --net=host nvcr.io/nvidia/tritonserver:${RELEASE}-py3-sdk

# inside container
perf_analyzer -m <model>
```
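The `<yy.mm>` placeholder above is the release tag that is substituted into the image name. As a small sketch, a shell helper can check that the value has the expected shape (two-digit year, a dot, two-digit month) before composing the image reference used in the `docker run` command. The function name `sdk_image` is hypothetical and is not part of any Triton tooling.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Triton): validate a <yy.mm> release tag and
# print the matching Triton SDK image reference used in the docker run command.
sdk_image() {
  local release="$1"
  # yy.mm: two-digit year, a literal dot, and a month from 01 to 12.
  local re='^[0-9]{2}\.(0[1-9]|1[0-2])$'
  if [[ ! "$release" =~ $re ]]; then
    printf 'invalid release tag: %s (expected yy.mm, e.g. 24.12)\n' "$release" >&2
    return 1
  fi
  printf 'nvcr.io/nvidia/tritonserver:%s-py3-sdk\n' "$release"
}

# Example (requires Docker and GPU access):
#   docker run --rm --gpus=all -it --net=host "$(sdk_image 24.12)"
```

Rejecting a malformed tag early gives a clearer message than a failed image pull.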

Alternative Installation Methods

pip

```bash
pip install perf-analyzer

perf_analyzer -m <model>
```

Warning: If any runtime dependencies are missing, Perf Analyzer will report errors naming the missing dependencies, and you will need to install them manually.
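When such errors involve shared libraries, one way to list every unresolved library at once is to run `ldd` on the executable and keep the lines it marks `not found`. This is a sketch under assumptions: the helper name `missing_libs` is hypothetical, and it only helps when the path points at an actual dynamically linked binary. If the `perf_analyzer` found on `PATH` is a wrapper script rather than the binary itself, pass the underlying binary instead.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list shared libraries that the dynamic loader cannot
# resolve for a given executable, based on the output of ldd.
missing_libs() {
  local binary="$1"
  if [[ ! -f "$binary" ]]; then
    printf 'no such file: %s\n' "$binary" >&2
    return 1
  fi
  # ldd prints lines such as "libfoo.so.1 => not found" for unresolved
  # libraries; print only the library name from those lines. Nothing is
  # printed if every library resolves or if the file is not dynamically linked.
  ldd "$binary" 2>/dev/null | awk '/=> not found/ {print $1}'
}

# Example:
#   missing_libs "$(command -v perf_analyzer)"
```

Any names it prints are candidates to install (for instance with `apt`) before running Perf Analyzer again.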

Build from Source

```bash
docker run --rm --gpus all -it --network host ubuntu:24.04

# inside container, install build/runtime dependencies
apt update && apt install -y curl
# add Kitware's APT repository (the source of the pinned CMake package below)
curl -LsSf https://apt.kitware.com/kitware-archive.sh | sh
# look up the full package version string for CMake 3.31.8
CMAKE_VERSION_FULL=$(apt-cache madison cmake | awk '/3.31.8/ {print $3; exit}')
apt update && DEBIAN_FRONTEND=noninteractive apt install -y cmake=${CMAKE_VERSION_FULL} cmake-data=${CMAKE_VERSION_FULL} g++ git libssl-dev nvidia-cuda-toolkit python3 rapidjson-dev zlib1g-dev

git clone --depth 1 https://github.com/triton-inference-server/perf_analyzer.git
mkdir perf_analyzer/build
# configure (this is the first cmake command, where optional -D flags go)
cmake -B perf_analyzer/build -S perf_analyzer
# build
cmake --build perf_analyzer/build --parallel 8
# put the freshly built perf_analyzer on PATH
export PATH=$(pwd)/perf_analyzer/build/perf_analyzer/src/perf-analyzer-build${PATH:+:${PATH}}
perf_analyzer -m <model>
```
  • To enable OpenAI mode, add -D TRITON_ENABLE_PERF_ANALYZER_OPENAI=ON to the first cmake command.
  • To enable C API mode, add -D TRITON_ENABLE_PERF_ANALYZER_C_API=ON to the first cmake command.
  • To enable TorchServe backend, add -D TRITON_ENABLE_PERF_ANALYZER_TS=ON to the first cmake command.
  • To enable TensorFlow Serving backend, add -D TRITON_ENABLE_PERF_ANALYZER_TFS=ON to the first cmake command.
  • To disable CUDA shared memory support and the dependency on CUDA toolkit libraries, add -D TRITON_ENABLE_GPU=OFF to the first cmake command.
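The options above are CMake cache variables passed with `-D`, and several of them can be combined in the same configure command (CMake treats `-DNAME=VALUE` the same as `-D NAME=VALUE`). As an illustration only, the hypothetical function `pa_cmake_options` below maps short feature names, which are made up for this sketch and are not part of Perf Analyzer, to those options, printing one option per line.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map short feature names to the CMake options listed
# above, one option per line. The option names come from this document; the
# feature names (openai, c_api, ts, tfs, no_gpu) are invented for illustration.
pa_cmake_options() {
  local feature
  for feature in "$@"; do
    case "$feature" in
      openai) printf '%s\n' '-DTRITON_ENABLE_PERF_ANALYZER_OPENAI=ON' ;;
      c_api)  printf '%s\n' '-DTRITON_ENABLE_PERF_ANALYZER_C_API=ON' ;;
      ts)     printf '%s\n' '-DTRITON_ENABLE_PERF_ANALYZER_TS=ON' ;;
      tfs)    printf '%s\n' '-DTRITON_ENABLE_PERF_ANALYZER_TFS=ON' ;;
      no_gpu) printf '%s\n' '-DTRITON_ENABLE_GPU=OFF' ;;
      *)
        printf 'unknown feature: %s\n' "$feature" >&2
        return 1
        ;;
    esac
  done
}

# Example: configure with OpenAI mode and the C API enabled, then build.
#   opts_text=$(pa_cmake_options openai c_api) || exit 1
#   mapfile -t opts <<< "$opts_text"
#   cmake -B perf_analyzer/build -S perf_analyzer "${opts[@]}"
#   cmake --build perf_analyzer/build --parallel 8
```

Printing one option per line and reading them into an array keeps each option a separate argument, and checking the function's exit status first stops an unknown feature name from silently producing a partial configure command.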