3 changes: 2 additions & 1 deletion docs/content/docs/checkpointing-logging/meta.json
@@ -2,6 +2,7 @@
"title": "Checkpointing and Logging",
Member

Suggested change
"title": "Checkpointing and Logging",
"title": "Checkpointing and Observability",

"pages": [
"checkpointing",
"logging"
"logging",
"vllm-metrics"
]
}
72 changes: 72 additions & 0 deletions docs/content/docs/checkpointing-logging/vllm-metrics.mdx
@@ -0,0 +1,72 @@
---
title: "vLLM Engine Metrics"
---

SkyRL can route vLLM's engine-level metrics (queue depth, KV cache usage,
throughput, latency, prefix-cache hit rate) through Ray's per-node Prometheus
metrics agents. A small fixed subset is also scraped once per training step
and merged into the trainer's wandb payload.

## Enabling

Metrics export is **on by default**. To disable it:

```yaml
generator:
inference_engine:
enable_ray_prometheus_stats: false
```

When enabled, vLLM's `RayPrometheusStatLogger` is installed on every engine. Each
engine reports its stats through `ray.util.metrics`, and Ray's per-node
metrics agent exposes them at `http://<node-ip>:<MetricsExportPort>/metrics`
in Prometheus text format. On Anyscale this feeds the hosted Prometheus +
Grafana stack with no extra setup.
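
To illustrate what the endpoint returns, here is a minimal sketch that fetches and parses Prometheus text format with only the standard library. The helper names (`fetch_metrics_text`, `parse_prometheus_text`) are hypothetical and not part of SkyRL, and the metric names in the usage note are illustrative: the exact names Ray exports (including any prefix) should be checked against a real `/metrics` response.

```python
import re
import urllib.request
from collections import defaultdict

# One sample line: name, optional {labels}, value, optional integer timestamp.
_SAMPLE_RE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+(\S+)(?:\s+\d+)?$'
)


def fetch_metrics_text(url: str, timeout: float = 5.0) -> str:
    """Fetch the raw Prometheus text from a metrics endpoint (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_prometheus_text(text: str) -> dict[str, list[float]]:
    """Group sample values by metric name, ignoring labels and comment lines."""
    samples: dict[str, list[float]] = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue
        name, _labels, value = match.groups()
        try:
            samples[name].append(float(value))
        except ValueError:
            continue  # not a numeric sample; ignore
    return dict(samples)
```

Usage would look like `parse_prometheus_text(fetch_metrics_text("http://<node-ip>:<MetricsExportPort>/metrics"))`, with the placeholders filled in for a real node. Each metric name maps to one value per labelled series, for example one per engine replica on that node.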
Comment on lines +23 to +24
Member

Suggested change
in Prometheus text format. On Anyscale this feeds the hosted Prometheus +
Grafana stack with no extra setup.
in Prometheus text format.


## Inference path support

| Inference path | Supported |
| ----------------------------------------------- | --------- |
| New inference (`_SKYRL_USE_NEW_INFERENCE=1`, default) | Yes |
| Old inference + `generator.async_engine=true` | Yes |
| Old inference + `generator.async_engine=false` | **No** |

The new inference path ([vllm_server_actor.py:329-339](skyrl/backends/skyrl_train/inference_servers/vllm_server_actor.py#L329-L339))
Contributor

medium

The link path skyrl/backends/skyrl_train/inference_servers/vllm_server_actor.py is relative to the repository root. In many Markdown renderers (including GitHub), this will be interpreted as relative to the current directory (docs/content/docs/checkpointing-logging/), which will result in a broken link. Consider using a relative path from this file (e.g., ../../../../skyrl/...).

The new inference path ([vllm_server_actor.py:329-339](../../../../skyrl/backends/skyrl_train/inference_servers/vllm_server_actor.py#L329-L339))

always uses `AsyncLLMEngine` and wires the stat logger unconditionally.

The legacy path supports it only when `async_engine=true`
([vllm_engine.py:359-370](skyrl/backends/skyrl_train/inference_engines/vllm/vllm_engine.py#L359-L370)).
Contributor

medium

The link path is relative to the repository root, which may result in a broken link. Consider using a relative path from this file.

([vllm_engine.py:359-370](../../../../skyrl/backends/skyrl_train/inference_engines/vllm/vllm_engine.py#L359-L370)).

The synchronous `VLLMInferenceEngine` pops the flag and emits a warning
([vllm_engine.py:240-247](skyrl/backends/skyrl_train/inference_engines/vllm/vllm_engine.py#L240-L247)):
Contributor

medium

The link path is relative to the repository root, which may result in a broken link. Consider using a relative path from this file.

([vllm_engine.py:240-247](../../../../skyrl/backends/skyrl_train/inference_engines/vllm/vllm_engine.py#L240-L247)):

vLLM's sync `LLM` class doesn't accept `stat_loggers`. Set
`generator.async_engine=true` if you need engine metrics on the legacy path.

## Metrics logged to wandb

When the flag is on, the trainer constructs a `VLLMMetricsScraper`
([trainer.py:122-124](skyrl/train/trainer.py#L122-L124)) that scrapes every
Contributor

medium

The link path is relative to the repository root, which may result in a broken link. Consider using a relative path from this file.

([trainer.py:122-124](../../../../skyrl/train/trainer.py#L122-L124)) that scrapes every

alive Ray node's metrics endpoint once per training step and merges its
output into the trainer's log payload. This is the same payload used for
training metrics, so the keys appear under whichever logger backend is
configured (`wandb`, `mlflow`, `swanlab`, `tensorboard`, or `console`).

Both `Trainer` and `FullyAsyncTrainer` log these:

| Key | Source | Aggregation |
| ---------------------------------- | ---------------------------- | -------------------------- |
| `vllm/num_requests_running` | gauge | sum across replicas |
| `vllm/num_requests_waiting` | gauge | sum across replicas |
| `vllm/kv_cache_usage_perc` | gauge | mean across replicas |
| `vllm/generation_throughput_tok_s` | counter delta / Δt | summed before differencing |
| `vllm/prompt_throughput_tok_s` | counter delta / Δt | summed before differencing |
| `vllm/prefix_cache_hit_rate` | hits Δ / queries Δ | summed before ratio |
| `vllm/ttft_seconds_avg` | histogram sum Δ / count Δ | summed before ratio |
| `vllm/tpot_seconds_avg` | histogram sum Δ / count Δ | summed before ratio |
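
The gauge rows of the table can be illustrated with a small sketch. It assumes the per-replica values have already been collected into lists keyed by the wandb key; `GAUGE_REDUCTIONS` and `aggregate_gauges` are hypothetical names for illustration, not the scraper's actual code.

```python
from statistics import fmean

# Reduction applied to each gauge, following the table above.
GAUGE_REDUCTIONS = {
    "vllm/num_requests_running": sum,
    "vllm/num_requests_waiting": sum,
    "vllm/kv_cache_usage_perc": fmean,
}


def aggregate_gauges(per_replica: dict[str, list[float]]) -> dict[str, float]:
    """Collapse per-replica gauge readings into one value per key."""
    aggregated = {}
    for key, reduce_fn in GAUGE_REDUCTIONS.items():
        values = per_replica.get(key)
        if values:  # skip keys with no readings this step
            aggregated[key] = reduce_fn(values)
    return aggregated
```

Queue depths add up across replicas because each replica holds its own requests, whereas cache usage is a fraction per replica, so a mean is the more meaningful summary.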

Rate- and ratio-style metrics need two consecutive samples to take a delta,
so they appear starting from the **second** training step. Counter resets
(e.g. engine restart) are skipped rather than reported as negative rates.
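
As a rough sketch of the delta logic described above (not the scraper's actual implementation), the following shows a rate and a ratio computed from two consecutive samples, with a negative delta treated as a reset and skipped. The function names `counter_rate` and `delta_ratio` are hypothetical.

```python
from typing import Optional


def counter_rate(prev_total: float, curr_total: float, dt_seconds: float) -> Optional[float]:
    """Per-second rate from two cluster-wide counter totals, or None if unusable."""
    delta = curr_total - prev_total
    if dt_seconds <= 0 or delta < 0:  # a negative delta suggests a counter reset
        return None
    return delta / dt_seconds


def delta_ratio(
    prev_num: float, curr_num: float, prev_den: float, curr_den: float
) -> Optional[float]:
    """Ratio of two deltas, e.g. histogram sum delta over count delta."""
    d_num = curr_num - prev_num
    d_den = curr_den - prev_den
    if d_num < 0 or d_den <= 0:  # reset, or nothing observed in the interval
        return None
    return d_num / d_den


# Counters are summed across replicas first, then differenced.
prev_tokens = sum([1000.0, 2000.0])
curr_tokens = sum([1600.0, 2900.0])
tokens_per_second = counter_rate(prev_tokens, curr_tokens, 10.0)
```

Summing before taking the ratio weights each replica by how much traffic it actually served, instead of averaging per-replica ratios that may rest on very different sample counts.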

The full set of vLLM metrics is still available via the Prometheus endpoints
themselves — only this curated subset is forwarded to wandb. The selection
lives in [vllm_metrics_scraper.py:27-51](skyrl/train/utils/vllm_metrics_scraper.py#L27-L51).
Contributor

medium

The link path is relative to the repository root, which may result in a broken link. Consider using a relative path from this file.

lives in [vllm_metrics_scraper.py:27-51](../../../../skyrl/train/utils/vllm_metrics_scraper.py#L27-L51).

Comment on lines +70 to +72
Member

Can you provide an example for querying KV Cache Residency metrics like Lifetime here?