4 changes: 4 additions & 0 deletions pyproject.toml
@@ -81,6 +81,9 @@ disagg = [
    "nixl-cu12 ; platform_machine == 'x86_64'",
    "vllm-router ; platform_machine == 'x86_64'",
]
tui = [
    "vllm-metrics-tui",
]
quack = [
    "quack-kernels>=0.3.3",
]
@@ -127,6 +130,7 @@ transformers = { git = "https://github.com/huggingface/transformers.git", rev =
flash-attn-4 = { git = "https://github.com/Dao-AILab/flash-attention.git", subdirectory = "flash_attn/cute", rev = "abd9943b" }
pydantic-config = { git = "https://github.com/samsja/pydantic_config.git", branch = "main" }
vllm-router = { url = "https://github.com/PrimeIntellect-ai/router/releases/download/v0.1.14/vllm_router-0.1.14-cp38-abi3-linux_x86_64.whl" }
vllm-metrics-tui = { git = "https://github.com/samsja/vllm-metrics-tui.git", branch = "master" }
reverse-text = { index = "primeintellect" }
alphabet-sort = { index = "primeintellect" }
wiki-search = { index = "primeintellect" }
35 changes: 35 additions & 0 deletions scripts/tmux.sh
@@ -100,6 +100,41 @@ tmux send-keys -t "$SESSION_NAME:Claude" \
You are running inside tmux session \"${SESSION_NAME}\". The Launcher window (window 0) is where the user runs launch commands. You can read its contents with: tmux capture-pane -t ${SESSION_NAME}:Launcher -p
Help the user monitor and debug this run.'" C-m

# Window 3: vLLM Metrics TUI (if installed)
# Discovers inference server URLs from log filenames in the output directory.
# Node logs are named node_N.log; port is inferred from the ROLE line in each log.
if command -v vllm-metrics-tui &>/dev/null; then
    METRICS_URLS=""
    INFERENCE_LOG_DIR="${LOG_DIR}/inference"
    if [[ -d "$INFERENCE_LOG_DIR" ]]; then
        for node_log in "$INFERENCE_LOG_DIR"/node_*.log; do
            [[ -f "$node_log" ]] || continue
            # Extract host:port from the "Starting inference on http://..." line
            url=$(grep -m1 "Starting inference on" "$node_log" 2>/dev/null | grep -oP 'http://[^ ]+' | sed 's|/v1||')
            if [[ -n "$url" ]]; then
                # Replace 0.0.0.0 with the hostname from orchestrator connection logs
                if [[ "$url" == *"0.0.0.0"* ]]; then
                    port=$(echo "$url" | grep -oP ':\d+$')
                    # Try to find the real hostname from orchestrator logs
                    host=$(grep -oP "ltc-[a-z0-9-]+${port}" "${LOG_DIR}/orchestrator.log" 2>/dev/null | head -1)

Hostname resolution returns same host for all nodes

Medium Severity

In multi-node inference where all servers bind to 0.0.0.0 on the same default port (8000), the grep … | head -1 on each iteration of the for node_log loop always returns the same first matching hostname from the orchestrator log. Every node therefore resolves to the same URL, and METRICS_URLS ends up with duplicates of a single server rather than one distinct URL per inference node. The grep has no per-node correlation: it needs to either track already-matched hosts or use node-specific identifiers.
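
One way to act on the "track already-matched hosts" suggestion is sketched below. It is an illustration only, not code from this PR: the helper name next_unused_host and the variables USED_HOSTS and NEXT_HOST are hypothetical, and it assumes the orchestrator log contains one distinct ltc-...:PORT entry per inference node. It makes the resulting URLs distinct, but it does not prove that a given host belongs to a given node_N.log.

```shell
# Hosts already assigned to an earlier node (space-separated).
USED_HOSTS=""

# Hypothetical helper, not part of this PR.
# Sets NEXT_HOST to the first "ltc-...:PORT" entry in the orchestrator log
# that is not yet in USED_HOSTS; returns 1 when no unused entry is left.
# $1 = port with its leading colon (e.g. ":8000"), $2 = orchestrator log path.
next_unused_host() {
    NEXT_HOST=""
    for candidate in $(grep -oE "ltc-[a-z0-9-]+$1" "$2" 2>/dev/null); do
        case " $USED_HOSTS " in
            *" $candidate "*) continue ;;  # already handed out, try the next match
        esac
        USED_HOSTS="$USED_HOSTS $candidate"
        NEXT_HOST="$candidate"
        return 0
    done
    return 1
}
```

Inside the for node_log loop, the helper would be called directly rather than through $(...), because a command substitution runs in a subshell and would discard the update to USED_HOSTS. For example: next_unused_host "$port" "${LOG_DIR}/orchestrator.log" && host="$NEXT_HOST" || host="".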

                    if [[ -n "$host" ]]; then
                        url="http://${host}"
                    else
                        continue
                    fi
                fi
                METRICS_URLS="$METRICS_URLS $url"
            fi
        done
    fi

    if [[ -n "$METRICS_URLS" ]]; then
        tmux new-window -t "$SESSION_NAME" -n "Metrics"
        tmux send-keys -t "$SESSION_NAME:Metrics" \
            "vllm-metrics-tui $METRICS_URLS" C-m
    fi
fi

# Pane title styling
tmux set-option -t "$SESSION_NAME" -g pane-border-status top
tmux set-option -t "$SESSION_NAME" -g pane-border-format " #{pane_title} "