[Issue #197] Attempt to Reproduce ValueError: Cannot get 31 free blocks from the pool #206
Open: cui36 wants to merge 6 commits into main from fix-block-issue
Changes from 3 commits
Commits (6):
a6998ae Add max-num-batched-tokens control (cui36)
7ad08d1 Merge branch 'main' into fix-block-issue (cui36)
5c9cbaa working on issue-197 (cui36)
49a2857 Remove .claude and ignore it (cui36)
9725bfc try with llama3-8B (cui36)
39cb7c2 adjust config to the issue (cui36)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,10 @@ | ||
| { | ||
| "permissions": { | ||
| "allow": [ | ||
| "Bash(python:*)", | ||
| "Read(//root/.cache/vllm/**)" | ||
| ], | ||
| "deny": [], | ||
| "ask": [] | ||
| } | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,12 @@ | ||
| # Configuration | ||
| # Adjust num_prompts as needed (leave empty to let the runner compute FIXED_RPS * DURATION) | ||
| num_prompts=2000 | ||
| prompt_len=4096 # Default prompt length | ||
| | ||
| for max_rps in 1; do | ||
| for completion_len in 5; do | ||
| # for max_rps in 1 2 3 4 5 6 7 8 9 10 15 20 25 30 40; do | ||
| # for completion_len in 64 128 256; do | ||
| ./run_benchmark_fixed_rate.sh "$max_rps" "$completion_len" "$num_prompts" "" "" "" "$prompt_len" | ||
| done | ||
| done |
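The runner reads its optional arguments with `${N:-default}`, so an empty string in a position selects the default, but only if the empty argument is actually passed. The following is a minimal sketch, using a hypothetical `show_args` function rather than the real runner, of how a quoted empty variable keeps its position while an unquoted one drops out and shifts the later arguments left. The value `2048` is an arbitrary illustration.

```shell
#!/bin/bash
# Hypothetical helper (not part of the PR) that mirrors the positional
# defaulting pattern of run_benchmark_fixed_rate.sh.
show_args() {
    local num_prompts=${3:-unset}
    local duration=${4:-0}
    local burstiness=${6:-10000.0}
    local prompt_len=${7:-4096}
    echo "num_prompts=$num_prompts duration=$duration burstiness=$burstiness prompt_len=$prompt_len"
}

num_prompts=""   # the "leave empty" case

# Quoted: the empty value still occupies position 3.
quoted=$(show_args 1 5 "$num_prompts" "" "" "" 2048)

# Unquoted: the empty value disappears, so 2048 lands in position 6.
unquoted=$(show_args 1 5 $num_prompts "" "" "" 2048)

echo "quoted:   $quoted"
echo "unquoted: $unquoted"
```

In the unquoted call the prompt length is silently read as the burstiness, which is why quoting every positional argument matters when any of them may be empty.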
benchmarks/bench_latency_benefit/run_benchmark_fixed_rate.sh (130 changes: 130 additions & 0 deletions)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,130 @@ | ||
| #!/bin/bash | ||
| set -ex | ||
| | ||
| # Usage: ./run_benchmark_fixed_rate.sh <FIXED_RPS> <COMPLETION_LEN> [NUM_PROMPTS] [DURATION] [MODEL_DELAY] [BURSTINESS] [PROMPT_LEN] | ||
| # Example: ./run_benchmark_fixed_rate.sh 12 256 720 60 30 10.0 4096 | ||
| # | ||
| # Parameters: | ||
| # FIXED_RPS - Fixed request rate (requests per second) | ||
| # COMPLETION_LEN - Completion length | ||
| # NUM_PROMPTS - Total number of prompts (optional, will calculate from FIXED_RPS * DURATION if not provided) | ||
| # DURATION - Duration in seconds (default: 0) | ||
| # MODEL_DELAY - Delay between models in seconds (default: DURATION) | ||
| # BURSTINESS - Higher values = more uniform timing (default: 10000.0) | ||
| # Use high values like 10-100 for near-constant intervals | ||
| # PROMPT_LEN - Prompt length (default: 4096) | ||
| | ||
| # Set environment variables | ||
| export KVCACHED_IPC_NAME=VLLM | ||
| | ||
| # Add vLLM benchmarks and kvcached to Python path | ||
| export PYTHONPATH="../../engine_integration/vllm-v0.9.2/benchmarks:../../:../../benchmarks:$PYTHONPATH" | ||
| | ||
| # Benchmark parameters | ||
| PROMPT_LEN=${7:-4096} | ||
| COMPLETION_LEN=$2 | ||
| BACKEND="vllm" | ||
| # Fixed request rate parameters | ||
| FIXED_RPS=$1 # Fixed request rate (requests per second) | ||
| DURATION=${4:-0} # Duration in seconds (default: 0; pass NUM_PROMPTS explicitly in that case) | ||
| BURSTINESS=${6:-10000.0} # Higher burstiness for more uniform requests (default: 10000.0) | ||
| | ||
| # Calculate total number of requests | ||
| if [ -n "$3" ]; then | ||
| NUM_PROMPTS=$3 | ||
| echo "Using provided NUM_PROMPTS: $NUM_PROMPTS" | ||
| else | ||
| NUM_PROMPTS=$((FIXED_RPS * DURATION)) | ||
| echo "Calculated NUM_PROMPTS: $NUM_PROMPTS (fixed rate: ${FIXED_RPS} RPS for ${DURATION}s)" | ||
| fi | ||
| | ||
| mkdir -p results results/metrics | ||
| | ||
| # Define models and their configurations | ||
| MODELS=( | ||
| "meta-llama/Llama-3.1-8B-Instruct:12346" | ||
| "meta-llama/Llama-3.1-8B-Instruct:30000" | ||
| "meta-llama/Llama-3.1-8B-Instruct:40000" | ||
| ) | ||
| NUM_MODELS=${#MODELS[@]} | ||
| | ||
| # Record unified start time | ||
| UNIFIED_START_TIME=$(date +%s.%N) | ||
| echo "Unified benchmark start time: $UNIFIED_START_TIME" | ||
| | ||
| # Model delay (can be adjusted if needed) | ||
| MODEL_DELAY=${5:-$DURATION} # Delay in seconds before starting next model (default: DURATION) | ||
| | ||
| # Arrays to store PIDs and result files | ||
| PIDS=() | ||
| RESULT_FILES=() | ||
| | ||
| # Run benchmarks for each model | ||
| for i in "${!MODELS[@]}"; do | ||
| # Parse model and port | ||
| MODEL=$(echo "${MODELS[$i]}" | cut -d':' -f1) | ||
| PORT=$(echo "${MODELS[$i]}" | cut -d':' -f2) | ||
| | ||
| # Generate model name and result file | ||
| MODEL_NAME=$(echo "$MODEL" | tr '/' '-') | ||
| MODEL_INDEX=$((i + 1)) | ||
| | ||
| # Generate result file name for fixed rate strategy | ||
| RESULT_FILE="results/metrics/${BACKEND}-${MODEL_NAME}-fixed-rate-${FIXED_RPS}rps-duration-${DURATION}s-burstiness-${BURSTINESS}-prompt_${PROMPT_LEN}-completion_${COMPLETION_LEN}-${MODEL_INDEX}-delay-${MODEL_DELAY}-model-num-${NUM_MODELS}-num-prompt-${NUM_PROMPTS}.json" | ||
| | ||
| # Add delay before starting next model (except for the first one) | ||
| if [ "$i" -gt 0 ] && [ "$MODEL_DELAY" -gt 0 ]; then | ||
| echo "Waiting ${MODEL_DELAY} seconds before starting Model ${MODEL_INDEX}..." | ||
| sleep "$MODEL_DELAY" | ||
| fi | ||
| | ||
| echo "Starting benchmark for $MODEL (Model ${MODEL_INDEX}) on port $PORT..." | ||
| | ||
| # Use fixed rate strategy | ||
| echo "Using fixed rate strategy: ${FIXED_RPS} RPS for ${DURATION} seconds (burstiness: ${BURSTINESS})" | ||
| | ||
| python bench_kvcached_vllm.py \ | ||
| --backend "$BACKEND" \ | ||
| --model "$MODEL" \ | ||
| --dataset-name random \ | ||
| --random-input-len "$PROMPT_LEN" \ | ||
| --random-output-len "$COMPLETION_LEN" \ | ||
| --num-prompts "$NUM_PROMPTS" \ | ||
| --host "localhost" \ | ||
| --port "$PORT" \ | ||
| --endpoint "/v1/completions" \ | ||
| --save-result \ | ||
| --result-filename "$RESULT_FILE" \ | ||
| --metadata "unified_start_time=$UNIFIED_START_TIME" \ | ||
| --request-rate "$FIXED_RPS" \ | ||
| --burstiness "$BURSTINESS" & | ||
| | ||
| # Store PID and result file | ||
| PIDS+=($!) | ||
| RESULT_FILES+=("$RESULT_FILE") | ||
| | ||
| echo "Started Model ${MODEL_INDEX} with PID ${PIDS[$i]}" | ||
| done | ||
| | ||
| # Wait for all benchmarks to complete | ||
| echo "Waiting for all benchmarks to complete..." | ||
| EXIT_CODES=() | ||
| | ||
| for i in "${!PIDS[@]}"; do | ||
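| # Run wait inside `if` so a failing benchmark does not trip `set -e` before its status is recorded | ||
| if wait "${PIDS[$i]}"; then | ||
| EXIT_CODE=0 | ||
| else | ||
| EXIT_CODE=$? | ||
| fi | ||
| EXIT_CODES+=("$EXIT_CODE") | ||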
| echo "Model $((i + 1)) benchmark exit code: $EXIT_CODE" | ||
| done | ||
| | ||
| echo "All benchmarks completed!" | ||
| echo "Results saved to:" | ||
| for result_file in "${RESULT_FILES[@]}"; do | ||
| echo " - $result_file" | ||
| done | ||
| | ||
| # Summary of exit codes | ||
| echo "Exit code summary:" | ||
| for i in "${!EXIT_CODES[@]}"; do | ||
| echo " Model $((i + 1)): ${EXIT_CODES[$i]}" | ||
| done |
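The launch-and-wait pattern above can be sketched in isolation. This is a minimal illustration, not the PR's code: a hypothetical `fake_benchmark` function stands in for `bench_kvcached_vllm.py`, and the two-entry list and the failing port are assumptions chosen for the example. It splits each `model:port` entry with parameter expansion and records each job's status with `wait ... || code=$?`, which keeps a non-zero status from ending the script under `set -e`.

```shell
#!/bin/bash
set -e

# Same "model:port" entry format as the MODELS array in the script.
MODELS=(
    "meta-llama/Llama-3.1-8B-Instruct:12346"
    "meta-llama/Llama-3.1-8B-Instruct:30000"
)

# Hypothetical stand-in for the benchmark client: fails for port 30000.
fake_benchmark() {
    local port=$1
    [ "$port" != "30000" ]
}

PIDS=()
for entry in "${MODELS[@]}"; do
    model=${entry%%:*}   # text before the first ':'
    port=${entry##*:}    # text after the last ':'
    echo "Starting $model on port $port"
    fake_benchmark "$port" &
    PIDS+=($!)
done

EXIT_CODES=()
for pid in "${PIDS[@]}"; do
    code=0
    # The `||` list keeps a non-zero status from triggering `set -e`.
    wait "$pid" || code=$?
    EXIT_CODES+=("$code")
done

echo "Exit codes: ${EXIT_CODES[*]}"
```

Without the `||` guard, a failed benchmark would end the script at `wait`, and neither the per-model exit code nor the final summary would be printed.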