79 commits
8159ca7
first draft
tukwila Sep 15, 2025
0701389
Add formatting to json file with metrics
psydok Sep 25, 2025
f8f6f9d
Container CI bugfix and disable dry-run on image cleaner (#379)
sjmonson Sep 30, 2025
730eeb1
Initial state for datasets rework to enable multimodal and more compl…
markurtz Oct 1, 2025
c32896c
Merge branch 'main' into add_json_formatiing
markurtz Oct 1, 2025
d1297fe
Merge branch 'main' into example_simulator
markurtz Oct 1, 2025
ad25e06
Add formatting to json file with metrics (#372)
markurtz Oct 1, 2025
2c0d993
Merge branch 'main' into example_simulator
markurtz Oct 1, 2025
5c9982a
first benchark testing example (#328)
markurtz Oct 1, 2025
b1b1b78
update default build values to use versioned builds (#310)
DaltheCow Oct 1, 2025
108a657
update tpot to itl in labels and code use (#386)
DaltheCow Oct 3, 2025
dd7a4b8
[GuideLLM Refactor] Advanced Prefix Cache Controls (#382)
sjmonson Oct 3, 2025
000b39e
Fix for container rc tag (#389)
sjmonson Oct 3, 2025
bbca65a
Simplifications for new data pathways and reenablement of completions…
markurtz Oct 7, 2025
616ef92
Fix audio pathways so requests work
markurtz Oct 8, 2025
90a05ab
Fix for container rc tag (Attempt 2) (#398)
sjmonson Oct 9, 2025
81af01b
Fix the failing CI again (#400)
sjmonson Oct 9, 2025
a24a22d
Fix typo in CI (#401)
sjmonson Oct 9, 2025
87ba006
Fixed quality errors
jaredoconnell Oct 9, 2025
1bd8846
Run auto-formatter
jaredoconnell Oct 9, 2025
1e8974c
Fix remaining ruff errors
jaredoconnell Oct 10, 2025
121dcdc
Configurable max_tokens/max_completion_tokens key (#399)
sjmonson Oct 10, 2025
d0dad5a
Fix unit tests
jaredoconnell Oct 10, 2025
8f6fdfa
Features/refactor/quality fixes (#402)
jaredoconnell Oct 10, 2025
f862943
Finalize general refactor implementation for data pathways and enabli…
markurtz Oct 10, 2025
b243664
Move asyncio timeout to common location
sjmonson Oct 10, 2025
cfcbd13
Fix duplicate timeout in openai backend tests
sjmonson Oct 10, 2025
9ca2dba
Force time zone in tests
jaredoconnell Oct 10, 2025
8d20525
Fix function doc
sjmonson Oct 10, 2025
f9fb29c
[GuideLLM Refactor] Fixes for asyncio and timezone tests (#405)
sjmonson Oct 10, 2025
c65f37f
Merge branch 'features/refactor/base' into features/refactor/multimod…
markurtz Oct 13, 2025
3c646d4
runnable state for multi modal refactor
markurtz Oct 13, 2025
2b0fef8
Update src/guidellm/data/deserializers/synthetic.py
markurtz Oct 13, 2025
24f2ca3
Update src/guidellm/scheduler/worker_group.py
markurtz Oct 13, 2025
ef36af1
Fixes from review
markurtz Oct 13, 2025
fcc1114
Features/add tooltip to line chart (#392)
DaltheCow Oct 13, 2025
b0becd5
Reenablement of flows and fixes
markurtz Oct 14, 2025
6adf793
Update src/guidellm/backends/openai.py
markurtz Oct 14, 2025
687b52f
Updates from review for multi modal data
markurtz Oct 14, 2025
b162fb3
Revert "Features/add tooltip to line chart" (#409)
jaredoconnell Oct 14, 2025
16f981c
Merge branch 'features/refactor/base' into features/refactor/multimod…
markurtz Oct 14, 2025
4e142ba
[GuideLLM Refactor] Data pipelines rework and multimodal support (#384)
markurtz Oct 14, 2025
b524e5d
Merge branch 'features/refactor/base' into features/refactor/multimod…
markurtz Oct 14, 2025
c27d488
Missed updates from review that were included in multimodal merge for…
markurtz Oct 14, 2025
e95007d
Fixes for constant rate benchmarking race condition, simplfications, …
markurtz Oct 15, 2025
91f79b7
Propagate valid failures from HuggingFace datasets loading (ones that…
markurtz Oct 15, 2025
5f4a731
Fixes from review
markurtz Oct 15, 2025
eb84935
Merge branch 'features/refactor/constant_rate_fixes' into features/re…
markurtz Oct 15, 2025
035ae24
[GuideLLM Refactor] Propagate valid failures from HuggingFace dataset…
markurtz Oct 15, 2025
060343e
[GuideLLM Refactor] Fixes for constant rate and other minor bugs for …
markurtz Oct 15, 2025
57683a2
[GuideLLM Refactor] Reenablement of scenarios and fixes for benchmark…
markurtz Oct 16, 2025
a401165
Replace pydub, librosa, and soundfile with torchcodec
sjmonson Oct 14, 2025
23d65ed
Add all group for extras
sjmonson Oct 14, 2025
5a768f8
Fix lock
sjmonson Oct 14, 2025
ec7071b
Rewrite encode_audio to use torchcodec
sjmonson Oct 14, 2025
aee230c
Dump raw bytes not tensor
sjmonson Oct 15, 2025
c1340b4
Code pathway cleanup
sjmonson Oct 15, 2025
c8e9ff9
Defer multimodal imports
sjmonson Oct 15, 2025
6d036f8
Apply copliot fixes
sjmonson Oct 16, 2025
cf5a2e3
Bump torchcodec verison
sjmonson Oct 16, 2025
a47c3c3
[GuideLLM Refactor] Replace librosa, pydub, and soundfile with torchc…
sjmonson Oct 17, 2025
ad192cb
Add tox lockfile updater
sjmonson Oct 17, 2025
b65c6ab
Allow arguments for tox type checks
sjmonson Oct 17, 2025
5b38f40
Fix mock server type errors
jaredoconnell Oct 14, 2025
cb36c6a
More type fixes
jaredoconnell Oct 16, 2025
1ffe53a
Address utility and presentation type errors
jaredoconnell Oct 17, 2025
48769c2
Fix type errors in extras
jaredoconnell Oct 17, 2025
c3fdf88
[GuideLLM Refactor] Add tox command to update lock file (#415)
sjmonson Oct 17, 2025
fbd417f
[GuideLLM Refactor] Type fixes (#417)
sjmonson Oct 17, 2025
f9af34d
Merge branch 'main' into features/refactor/base
sjmonson Oct 17, 2025
e787cc1
Full refactor of GuideLLM (#351)
sjmonson Oct 17, 2025
af6b6b8
Split multimodal group into vision and audio
sjmonson Oct 17, 2025
c245fbd
Split multimodal group into vision and audio (#419)
sjmonson Oct 17, 2025
e0fc2e5
Ensure all optional dependicies are in container
sjmonson Oct 17, 2025
8bba4df
Add some nice utlities to the image
sjmonson Oct 17, 2025
58665b6
Add ffmpeg for audio
sjmonson Oct 17, 2025
9669983
Adapt container for new optional requirements (#420)
sjmonson Oct 17, 2025
9e2b7de
Generate synthetic data as multi-turn
sjmonson Oct 20, 2025
fad9e9c
Hack multiturn into dataset formatters
sjmonson Oct 20, 2025
20 changes: 12 additions & 8 deletions .github/workflows/container-maintenance.yml
@@ -9,19 +9,22 @@ on:
concurrency:
group: ${{ github.workflow }}

permissions:
packages: write

jobs:
cleanup-container-tags:
runs-on: ubuntu-latest
steps:
- name: Delete PR and untagged images older than 2 weeks
uses: snok/[email protected]
with:
account: ${{ github.actor }}
account: ${{ github.repository_owner }}
token: ${{ github.token }}
image-names: ${{ github.event.repository.name }}
image-tags: "pr-*"
cut-off: 2w
dry-run: true
dry-run: false

push-container-tags:
runs-on: ubuntu-latest
@@ -31,19 +34,20 @@ jobs:
- name: Log into ghcr.io
uses: redhat-actions/podman-login@v1
with:
username: ${{ github.actor }}
username: ${{ github.repository_owner }}
password: ${{ github.token }}
registry: ghcr.io/${{ github.repository_owner }}
- name: Get list of tags
run: |
skopeo list-tags docker://${{ github.repository }} | jq --raw-output '.Tags[]' > tags
set -euo pipefail # Fail pipe if any command fails
skopeo list-tags docker://ghcr.io/${{ github.repository }} | jq --raw-output '.Tags[]' > tags
- name: Get latest release and rc tags
run: |
STABLE_TAG="$(grep -P '^v\d+\.\d+\.\d+$' tags | sort -rV | head -n1)"
echo "STABLE_TAG=${STABLE_TAG:-v0.0.0}" >> $GITHUB_ENV
echo "stable_tag=${STABLE_TAG:-v0.0.0}" >> $GITHUB_ENV
LATEST_TAG="$(grep -P '^v\d+\.\d+\.\d+' tags | sort -rV | head -n1)"
echo "LATEST_TAG=${LATEST_TAG:-v0.0.0}" >> $GITHUB_ENV
echo "latest_tag=${LATEST_TAG:-v0.0.0}" >> $GITHUB_ENV
- name: Update latest and stable tags
run: |
skopeo copy docker://${{ github.repository }}:${{ env.stable_tag }} docker://${{ github.repository }}:stable
skopeo copy docker://${{ github.repository }}:${{ env.latest_tag }} docker://${{ github.repository }}:latest
skopeo copy docker://ghcr.io/${{ github.repository }}:${{ env.stable_tag }} docker://ghcr.io/${{ github.repository }}:stable
skopeo copy docker://ghcr.io/${{ github.repository }}:${{ env.latest_tag }} docker://ghcr.io/${{ github.repository }}:latest
9 changes: 7 additions & 2 deletions .github/workflows/release-candidate.yml
@@ -228,7 +228,7 @@ jobs:
uses: peaceiris/actions-gh-pages@v3
with:
github_token: ${{ secrets.GITHUB_TOKEN }}
publish_dir: ./ui/out
publish_dir: .src/ui/out
destination_dir: ui/release/${TAG}
keep_files: false
user_name: ${{ github.actor }}
@@ -298,7 +298,12 @@ jobs:
with:
fetch-depth: 0
- name: Get version from branch
run: echo "PACKAGE_VERSION=${GITHUB_REF#refs/*/}" >> $GITHUB_ENV
run: |
echo "package_version=${GITHUB_REF#refs/heads/release/}" >> $GITHUB_ENV
- name: Fail if version is unset
if: ${{ env.package_version == '' }}
run: |
exit 1
- name: Buildah build
id: build-image
uses: redhat-actions/buildah-build@v2
9 changes: 7 additions & 2 deletions .github/workflows/release.yml
@@ -227,7 +227,7 @@ jobs:
uses: peaceiris/actions-gh-pages@v3
with:
github_token: ${{ secrets.GITHUB_TOKEN }}
publish_dir: ./ui/out
publish_dir: ./src/ui/out
destination_dir: ui/${TAG}
keep_files: false
user_name: ${{ github.actor }}
@@ -297,7 +297,12 @@ jobs:
with:
fetch-depth: 0
- name: Get version from branch
run: echo "PACKAGE_VERSION=${GITHUB_REF#refs/*/}" >> $GITHUB_ENV
run: |
echo "package_version=${GITHUB_REF#refs/tags/}" >> $GITHUB_ENV
- name: Fail if version is unset
if: ${{ env.package_version == '' }}
run: |
exit 1
- name: Buildah build
id: build-image
uses: redhat-actions/buildah-build@v2
17 changes: 13 additions & 4 deletions Containerfile
@@ -31,19 +31,28 @@ COPY / /src

# Install guidellm and locked dependencies
RUN pdm use -p /src -f /opt/app-root \
&& pdm install -p /src --check --prod --no-editable
&& pdm install -p /src -G all --check --prod --no-editable

# Prod image
FROM $BASE_IMAGE

# Switch to root for installing packages
USER root

# Install some helpful utilities and deps
RUN dnf install -y --setopt=install_weak_deps=False \
vi tar rsync ffmpeg-free \
&& dnf clean all

# Switch back to unpriv user
# Root group for k8s
USER 1001:0

# Add guidellm bin to PATH
# Argument defaults can be set with GUIDELLM_<ARG>
ENV HOME="/home/guidellm" \
GUIDELLM_OUTPUT_PATH="/results/benchmarks.json"

# Make sure root is the primary group
USER 1001:0

# Create the user home dir
WORKDIR $HOME

Binary file added docs/assets/sample-output1.png
Binary file added docs/assets/sample-output2.png
Binary file added docs/assets/sample-output3.png
117 changes: 117 additions & 0 deletions docs/examples/practice_on_vllm_simulator.md
@@ -0,0 +1,117 @@
# GuideLLM Benchmark Testing Best Practice

This guide walks through an easy first GuideLLM benchmark test, from scratch, using the vLLM Simulator.

## Getting Started

### 📦 1. Benchmark Testing Environment Setup

#### 1.1 Create a Conda Environment (recommended)

```bash
conda create -n guidellm-bench python=3.11 -y
conda activate guidellm-bench
```

#### 1.2 Install Dependencies

```bash
git clone https://github.com/vllm-project/guidellm.git
cd guidellm
pip install guidellm
```

For more detailed instructions, refer to [GuideLLM README](https://github.com/vllm-project/guidellm/blob/main/README.md).

#### 1.3 Verify Installation

```bash
guidellm --help
```

#### 1.4 Start an OpenAI-compatible API in a vLLM Simulator Docker container

```bash
docker pull ghcr.io/llm-d/llm-d-inference-sim:v0.4.0

docker run --rm --publish 8000:8000 \
ghcr.io/llm-d/llm-d-inference-sim:v0.4.0 \
--port 8000 \
--model "Qwen/Qwen2.5-1.5B-Instruct" \
--lora-modules '{"name":"tweet-summary-0"}' '{"name":"tweet-summary-1"}'
```

For more detailed instructions, refer to: [vLLM Simulator](https://llm-d.ai/docs/architecture/Components/inference-sim)

Docker image versions: [Docker Images](https://github.com/llm-d/llm-d-inference-sim/pkgs/container/llm-d-inference-sim)

Check that the OpenAI-compatible API is working via `curl`:

- Check `/v1/models`:

```bash
curl --request GET 'http://localhost:8000/v1/models'
```

- Check `/v1/chat/completions`:

```bash
curl --request POST 'http://localhost:8000/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data-raw '{
"model": "tweet-summary-0",
"stream": false,
"messages": [{"role": "user", "content": "Say this is a test!"}]
}'
```

- Check `/v1/completions`:

```bash
curl --request POST 'http://localhost:8000/v1/completions' \
--header 'Content-Type: application/json' \
--data-raw '{
"model": "tweet-summary-0",
"stream": false,
"prompt": "Say this is a test!",
"max_tokens": 128
}'
```
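
The same checks can be scripted. Below is a minimal Python sketch (assumptions: the simulator from step 1.4 is listening on `localhost:8000`, and the `chat_payload` helper is a hypothetical convenience for this guide, not part of GuideLLM):

```python
import json
from urllib import error, request

BASE_URL = "http://localhost:8000"  # simulator from step 1.4 (assumption)

def chat_payload(model: str, content: str) -> dict:
    # Mirrors the request body used in the curl example above.
    return {
        "model": model,
        "stream": False,
        "messages": [{"role": "user", "content": content}],
    }

payload = chat_payload("tweet-summary-0", "Say this is a test!")

try:
    req = request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req, timeout=5) as resp:
        body = json.load(resp)
        print(body["choices"][0]["message"]["content"])
except (error.URLError, OSError) as exc:
    print(f"Simulator not reachable: {exc}")
```

If the container is not running, the script prints a "not reachable" message instead of raising, so it doubles as a connectivity check.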

#### 1.5 Download Tokenizer

Download the Qwen/Qwen2.5-1.5B-Instruct tokenizer files from [Qwen/Qwen2.5-1.5B-Instruct](https://modelscope.cn/models/Qwen/Qwen2.5-1.5B-Instruct/files) and save them to a local path such as `${local_path}/Qwen2.5-1.5B-Instruct`:

```bash
ls ./Qwen2.5-1.5B-Instruct
merges.txt tokenizer.json tokenizer_config.json vocab.json
```

______________________________________________________________________

## 🚀 2. Running Benchmarks

```bash
guidellm benchmark \
--target "http://localhost:8000/" \
--model "tweet-summary-0" \
--processor "${local_path}/Qwen2.5-1.5B-Instruct" \
--rate-type sweep \
--max-seconds 10 \
--max-requests 10 \
--data "prompt_tokens=128,output_tokens=56"
```
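
As a rough sanity check on the load these flags imply, the arithmetic below is a back-of-envelope sketch only: `--rate-type sweep` runs several benchmarks at different request rates, and `--max-seconds`/`--max-requests` cap each one, whichever limit is hit first.

```python
prompt_tokens = 128   # from --data
output_tokens = 56    # from --data
max_requests = 10     # from --max-requests, per benchmark in the sweep

tokens_per_request = prompt_tokens + output_tokens
tokens_per_benchmark_cap = max_requests * tokens_per_request
print(tokens_per_request, tokens_per_benchmark_cap)  # 184 1840
```

So each benchmark in the sweep processes at most about 1,840 tokens here, which is why this run finishes in seconds against the simulator.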

______________________________________________________________________

## 📊 3. Results Interpretation

![Sample benchmark output 1](../assets/sample-output1.png) ![Sample benchmark output 2](../assets/sample-output2.png) ![Sample benchmark output 3](../assets/sample-output3.png)

After the benchmark completes, the key results are straightforward to read, including:

- **`TTFT`**: Time to First Token
- **`TPOT`**: Time Per Output Token
- **`ITL`**: Inter-Token Latency
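
These metrics are related. Under one common set of definitions (an assumption — exact definitions vary between tools, so treat GuideLLM's own report as authoritative for its numbers), the average inter-token latency can be derived from end-to-end latency and TTFT:

```python
def mean_itl(e2e_latency_s: float, ttft_s: float, output_tokens: int) -> float:
    """Average gap between consecutive output tokens after the first one."""
    if output_tokens < 2:
        raise ValueError("need at least two output tokens")
    # Time spent generating everything after the first token, divided by
    # the number of inter-token gaps.
    return (e2e_latency_s - ttft_s) / (output_tokens - 1)

# Hypothetical numbers: 2.0 s end-to-end, 0.25 s to first token, 56 output tokens.
print(round(mean_itl(2.0, 0.25, 56), 4))  # 0.0318
```

This makes it easy to cross-check the reported numbers: a low TTFT with a high end-to-end latency implies the time went into token-by-token generation rather than queueing or prefill.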

Your first benchmark test is complete.