11 changes: 4 additions & 7 deletions docs/configuration/long_context.md
@@ -15,17 +15,14 @@ Set the following environment variables to avoid OOM/functional issues. Additio…

- `VLLM_ENGINE_ITERATION_TIMEOUT_S=3600`
- `VLLM_RPC_TIMEOUT=100000`
- `VLLM_PROMPT_USE_FUSEDSDPA=1`
- `PT_HPU_ENABLE_LAZY_COLLECTIVES=true`
- `PT_HPUGRAPH_DISABLE_TENSOR_CACHE=1`
- `VLLM_ALLOW_LONG_MAX_MODEL_LEN=1`
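
A minimal sketch of exporting these variables before launching the server follows; the model name and `--max-model-len` value are illustrative assumptions, not part of this change:

```bash
# Sketch: set the long-context environment variables, then start vLLM.
export VLLM_ENGINE_ITERATION_TIMEOUT_S=3600
export VLLM_RPC_TIMEOUT=100000
export VLLM_PROMPT_USE_FUSEDSDPA=1
export PT_HPU_ENABLE_LAZY_COLLECTIVES=true
export PT_HPUGRAPH_DISABLE_TENSOR_CACHE=1
export VLLM_ALLOW_LONG_MAX_MODEL_LEN=1

# Model and --max-model-len are placeholders for illustration.
vllm serve meta-llama/Llama-3.1-8B-Instruct --max-model-len 32768
```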

## Warmup buckets preparation
The exponential bucketing mechanism automatically prepares buckets for long context. The linear bucketing mechanism requires setting the flags manually.

**Example flags for a 32K context length with linear warmup** (a combined sketch follows this list):

- `VLLM_GRAPH_RESERVED_MEM`: The value depends on the model and context length settings. Use `VLLM_GRAPH_RESERVED_MEM=0.02` for Llama3.1-8B or `VLLM_GRAPH_RESERVED_MEM=0.1` for Llama3.1-70B.
- `VLLM_PROMPT_BS_BUCKET_MIN=1`: Suggested value; depends on the model. You can increase it until you reach an OOM error, or decrease it if one occurs.
- `VLLM_PROMPT_BS_BUCKET_STEP=16`: Suggested value; depends on the model. A larger step value results in fewer buckets. If an OOM error occurs, increase the value.
- `VLLM_PROMPT_BS_BUCKET_MAX=16`: Suggested value; depends on the model. You can increase it until you reach an OOM error, or decrease it if one occurs.
- `VLLM_PROMPT_SEQ_BUCKET_MIN=24576`: Suggested value; depends on warmup results.
- `VLLM_PROMPT_SEQ_BUCKET_STEP=2048`: Suggested value; depends on warmup results. A larger value speeds up warmup; `VLLM_PROMPT_SEQ_BUCKET_STEP=16384` is suggested for Intel Gaudi 3.
- `VLLM_PROMPT_SEQ_BUCKET_MAX=32768`: Value for a 32K context length; use 16384 for 16K.
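
As a minimal combined sketch, assuming the Llama3.1-8B suggestions above (treat every value as a starting point to tune per model and warmup results):

```bash
# Sketch: linear-warmup bucket flags for a 32K context with Llama3.1-8B.
# All values come from the suggestions above; tune per model and warmup results.
export VLLM_GRAPH_RESERVED_MEM=0.02       # use 0.1 for Llama3.1-70B
export VLLM_PROMPT_BS_BUCKET_MIN=1
export VLLM_PROMPT_BS_BUCKET_STEP=16
export VLLM_PROMPT_BS_BUCKET_MAX=16
export VLLM_PROMPT_SEQ_BUCKET_MIN=24576
export VLLM_PROMPT_SEQ_BUCKET_STEP=2048   # 16384 suggested for Intel Gaudi 3
export VLLM_PROMPT_SEQ_BUCKET_MAX=32768   # use 16384 for a 16K context
```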