From ef721810ec7418d203d89c80e029cecd4b263feb Mon Sep 17 00:00:00 2001
From: mhelf-intel
Date: Fri, 9 Jan 2026 11:14:17 +0200
Subject: [PATCH 1/2] Update information about the supported models

Signed-off-by: mhelf-intel
---
 README.md                                |  1 +
 docs/getting_started/validated_models.md | 43 +++++++++++++-----------
 docs/release_notes.md                    | 28 +++++++++++++++
 3 files changed, 52 insertions(+), 20 deletions(-)

diff --git a/README.md b/README.md
index 9cc66c622..eb127f0e6 100644
--- a/README.md
+++ b/README.md
@@ -15,6 +15,7 @@ vLLM Hardware Plugin for Intel® Gaudi®
 ---
 
 *Latest News* 🔥
+- [2026/01] Version 0.13.0 is now available, built on [vLLM 0.13.0](https://github.com/vllm-project/vllm/releases/tag/v0.13.0) and fully compatible with [Intel® Gaudi® v1.23.0](https://docs.habana.ai/en/v1.23.0/Release_Notes/GAUDI_Release_Notes.html). It introduces experimental dynamic quantization for MatMul and KV‑cache operations to improve performance, and adds support for additional models.
 - [2025/11] The 0.11.2 release introduces the production-ready version of the vLLM Hardware Plugin for Intel® Gaudi® v1.22.2. The plugin is an alternative to the [vLLM fork](https://github.com/HabanaAI/vllm-fork), which reaches end of life with this release and will be deprecated in v1.24.0, remaining functional only for legacy use cases. We strongly encourage all fork users to begin planning their migration to the plugin. For more information about this release, see the [Release Notes](docs/release_notes.md).
 - [2025/06] We introduced an early developer preview of the vLLM Hardware Plugin for Intel® Gaudi®, which is not yet intended for general use.
 
diff --git a/docs/getting_started/validated_models.md b/docs/getting_started/validated_models.md
index 91f840fe0..474748adc 100644
--- a/docs/getting_started/validated_models.md
+++ b/docs/getting_started/validated_models.md
@@ -7,40 +7,43 @@ The following configurations have been validated to function with Intel® Gaudi
 
 | Model | Tensor parallelism [x HPU] | Datatype | Validated AI accelerator |
 |:--- |:---: |:---: |:---: |
-| [meta-llama/Meta-Llama-3.1-8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B) | 1 | BF16, FP8 | Gaudi 2, Gaudi 3|
-| [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) | 1 | BF16, FP8 | Gaudi 2, Gaudi 3|
-| [meta-llama/Meta-Llama-3.1-70B](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B) | 2, 4, 8 | BF16, FP8 |Gaudi 2, Gaudi 3|
-| [meta-llama/Meta-Llama-3.1-405B](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B) | 8 | BF16, FP8 |Gaudi 3|
-| [meta-llama/Meta-Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) | 4, 8 | BF16, FP8 | Gaudi 3|
-| [meta-llama/Llama-4-Scout-17B-16E-Instruct](https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E-Instruct) | 4, 8 | BF16 | Gaudi 3|
-| [meta-llama/CodeLlama-34b-Instruct-hf](https://huggingface.co/meta-llama/CodeLlama-34b-Instruct-hf) | 1 | BF16 |Gaudi 3|
+| [bielik-1.5b-v3.0-instruct](https://huggingface.co/speakleash/Bielik-1.5B-v3.0-Instruct) | 1 | BF16 | Gaudi 3 |
+| [bielik-11b-v2.6-instruct](https://huggingface.co/speakleash/Bielik-11B-v2.6-Instruct) | 2 | BF16 | Gaudi 3 |
+| [bielik-4.5b-v3.0-instruct](https://huggingface.co/speakleash/Bielik-4.5B-v3.0-Instruct) | 1 | BF16 | Gaudi 3 |
+| [deepseek-ai/DeepSeek-R1-Distill-Llama-70B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B) | 8 | FP8 | Gaudi 3|
+| [ibm-granite/granite-8b-code-instruct-4k](https://huggingface.co/ibm-granite/granite-8b-code-instruct-4k) | 1 | BF16 | Gaudi 3|
+| [meta-llama/CodeLlama-34b-Instruct-hf](https://huggingface.co/meta-llama/CodeLlama-34b-Instruct-hf) | 1 | BF16 |Gaudi 3|
+| [meta-llama/Granite-3.1-8B-instruct](https://huggingface.co/ibm-granite/granite-3.1-8b-instruct) | 1 | BF16 | Gaudi 3|
 | [meta-llama/Granite-3B-code-instruct-128k](https://huggingface.co/ibm-granite/granite-3b-code-instruct-128k) | 1 | BF16 | Gaudi 3|
 | [meta-llama/Granite-8B-code-instruct-128k](https://huggingface.co/ibm-granite/granite-8b-code-instruct-128k) | 1 | BF16 | Gaudi 3|
 | [meta-llama/Granite-20B-code-instruct-8k](https://huggingface.co/ibm-granite/granite-20b-code-instruct-8k) | 1 | BF16, FP8 | Gaudi 2, Gaudi 3|
 | [meta-llama/Granite-34B-code-instruct-8k](https://huggingface.co/ibm-granite/granite-34b-code-instruct-8k) | 1 | BF16 | Gaudi 3|
-| [meta-llama/Granite-3.1-8B-instruct](https://huggingface.co/ibm-granite/granite-3.1-8b-instruct) | 1 | BF16 | Gaudi 3|
-| [mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) | 1, 2 | FP8, BF16 |Gaudi 2, Gaudi 3|
-| [mistralai/Mixtral-8x22B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x22B-v0.1) | 4 | BF16 |Gaudi 3|
+| [meta-llama/Llama-4-Scout-17B-16E-Instruct](https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E-Instruct) | 4, 8 | BF16 | Gaudi 3|
+| [meta-llama/Meta-Llama-3.1-8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B) | 1 | BF16, FP8 | Gaudi 2, Gaudi 3|
+| [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) | 1 | BF16, FP8 | Gaudi 2, Gaudi 3|
+| [meta-llama/Meta-Llama-3.1-70B](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B) | 2, 4, 8 | BF16, FP8 |Gaudi 2, Gaudi 3|
+| [meta-llama/Meta-Llama-3.1-405B](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B) | 8 | BF16, FP8 |Gaudi 3|
+| [meta-llama/Meta-Llama-3.3-70B](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct) | 4 | BF16, FP8 | Gaudi 3|
+| [meta-llama/Meta-Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) | 4, 8 | BF16, FP8 | Gaudi 3|
 | [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) | 1 | BF16 | Gaudi 3|
+| [mistralai/Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3) | 1 | BF16 | Gaudi 3|
 | [mistralai/Mistral-Large-Instruct-2407](https://huggingface.co/mistralai/Mistral-Large-Instruct-2407) | 4, 8 | BF16, FP8 | Gaudi 2, Gaudi 3|
+| [mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) | 1, 2 | FP8, BF16 |Gaudi 2, Gaudi 3|
+| [mistralai/Mixtral-8x22B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x22B-v0.1) | 4 | BF16 |Gaudi 3|
+| [Qwen/Qwen2-72B-Instruct](https://huggingface.co/Qwen/Qwen2-72B-Instruct) | 8 | BF16 |Gaudi 2|
+| [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) | 1 | BF16 | Gaudi 3|
+| [Qwen/Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct) | 1 | |Gaudi 3|
+| [Qwen/Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) | 1 | |Gaudi 3|
 | [Qwen/Qwen2.5-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct) | 4, 8 | BF16 |Gaudi 3|
-| [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) | 1 | BF16 |Gaudi 3|
+| [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) | 1 | |Gaudi 3|
+| [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) | 1 | BF16 | Gaudi 3 |
 | [Qwen/Qwen3-30B-A3B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-30B-A3B-Instruct-2507) | 4, 8 | BF16, FP8 | Gaudi 2, Gaudi 3|
-| [deepseek-ai/DeepSeek-R1-Distill-Llama-70B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B) | 8 | FP8, BF16 |Gaudi 2, Gaudi 3|
 
 Validation of the following configurations is currently in progress:
 
 | Model | Tensor parallelism [x HPU] | Datatype | Validated AI accelerator |
 |:--- |:---: |:---: |:---: |
-| [meta-llama/Meta-Llama-3-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct) | 8 | BF16 |Gaudi 2, Gaudi 3|
+| [llava-hf/llava-1.5-7b-hf](https://huggingface.co/llava-hf/llava-1.5-7b-hf) | 1, 8 | BF16 | Gaudi 2, Gaudi 3 |
 | [meta-llama/Meta-Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct) | 2, 4, 8 | BF16, FP8 |Gaudi 2, Gaudi 3|
 | [meta-llama/Meta-Llama-3.1-405B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct) | 8 | BF16, FP8 |Gaudi 3|
-| [meta-llama/Meta-Llama-3.3-70B](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct) | 4 | BF16, FP8 | Gaudi 3|
-| [mistralai/Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3) | 1, 2 | BF16 | Gaudi 2|
-| [llava-hf/llava-1.5-7b-hf](https://huggingface.co/llava-hf/llava-1.5-7b-hf) | 1, 8 | BF16 | Gaudi 2, Gaudi 3 |
 | [princeton-nlp/gemma-2-9b-it-SimPO](https://huggingface.co/princeton-nlp/gemma-2-9b-it-SimPO) | 1 | BF16 |Gaudi 2, Gaudi 3|
-| [Qwen/Qwen2-72B-Instruct](https://huggingface.co/Qwen/Qwen2-72B-Instruct) | 8 | BF16 |Gaudi 2|
 
diff --git a/docs/release_notes.md b/docs/release_notes.md
index 4a3a5fc2c..1bb16d326 100644
--- a/docs/release_notes.md
+++ b/docs/release_notes.md
@@ -2,6 +2,34 @@
 
 This document provides an overview of the features, changes, and fixes introduced in each release of the vLLM Hardware Plugin for Intel® Gaudi®.
 
+## 0.13.0
+
+This version is based on [vLLM 0.13.0](https://github.com/vllm-project/vllm/releases/tag/v0.13.0) and supports [Intel® Gaudi® v1.23.0](https://docs.habana.ai/en/v1.23.0/Release_Notes/GAUDI_Release_Notes.html).
+
+The release includes experimental dynamic quantization for MatMul and KV‑cache operations. This feature improves performance, with minimal expected impact on accuracy. To enable the feature, see the [Dynamic Quantization for MatMul and KV‑cache Operations](features/supported_features.md#dynamic-quantization-for-matmul-and-kv-cache-operations) section.
+
+This release also introduces support for the following models on Gaudi 3:
+
+- [bielik-1.5b-v3.0-instruct](https://huggingface.co/speakleash/Bielik-1.5B-v3.0-Instruct)
+- [bielik-11b-v2.6-instruct](https://huggingface.co/speakleash/Bielik-11B-v2.6-Instruct)
+- [bielik-4.5b-v3.0-instruct](https://huggingface.co/speakleash/Bielik-4.5B-v3.0-Instruct)
+- [deepseek-ai/DeepSeek-R1-Distill-Llama-70B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B)
+- [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)
+- [Qwen/Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct)
+- [Qwen/Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct)
+- [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct)
+- [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B)
+
+Additionally, the following models were successfully validated:
+
+- [meta-llama/Meta-Llama-3.1-8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B)
+- [meta-llama/Meta-Llama-3.1-70B](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B)
+- [meta-llama/Meta-Llama-3.1-405B](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B)
+- [meta-llama/Meta-Llama-3.3-70B](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct)
+- [mistralai/Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3)
+
+For the list of all supported models, see [Validated Models](getting_started/validated_models.md).
+
 ## 0.11.2
 
 This version is based on [vLLM 0.11.2](https://github.com/vllm-project/vllm/releases/tag/v0.11.2) and supports [Intel® Gaudi® v1.22.2](https://docs.habana.ai/en/v1.22.2/Release_Notes/GAUDI_Release_Notes.html).

From dd351b52dedc66937c4a62abca4b81f20607d30e Mon Sep 17 00:00:00 2001
From: PatrykWo
Date: Fri, 9 Jan 2026 13:06:32 +0200
Subject: [PATCH 2/2] Update validated models and release notes to include additional model configurations

Signed-off-by: PatrykWo
---
 docs/getting_started/validated_models.md | 6 +++---
 docs/release_notes.md                    | 6 ++++--
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/docs/getting_started/validated_models.md b/docs/getting_started/validated_models.md
index 474748adc..cb64cc072 100644
--- a/docs/getting_started/validated_models.md
+++ b/docs/getting_started/validated_models.md
@@ -7,8 +7,8 @@ The following configurations have been validated to function with Intel® Gaudi
 
 | Model | Tensor parallelism [x HPU] | Datatype | Validated AI accelerator |
 |:--- |:---: |:---: |:---: |
-| [bielik-1.5b-v3.0-instruct](https://huggingface.co/speakleash/Bielik-1.5B-v3.0-Instruct) | 1 | BF16 | Gaudi 3 |
 | [bielik-11b-v2.6-instruct](https://huggingface.co/speakleash/Bielik-11B-v2.6-Instruct) | 2 | BF16 | Gaudi 3 |
+| [bielik-1.5b-v3.0-instruct](https://huggingface.co/speakleash/Bielik-1.5B-v3.0-Instruct) | 1 | BF16 | Gaudi 3 |
 | [bielik-4.5b-v3.0-instruct](https://huggingface.co/speakleash/Bielik-4.5B-v3.0-Instruct) | 1 | BF16 | Gaudi 3 |
 | [deepseek-ai/DeepSeek-R1-Distill-Llama-70B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B) | 8 | FP8 | Gaudi 3|
 | [ibm-granite/granite-8b-code-instruct-4k](https://huggingface.co/ibm-granite/granite-8b-code-instruct-4k) | 1 | BF16 | Gaudi 3|
@@ -22,7 +22,9 @@
 | [meta-llama/Meta-Llama-3.1-8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B) | 1 | BF16, FP8 | Gaudi 2, Gaudi 3|
 | [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) | 1 | BF16, FP8 | Gaudi 2, Gaudi 3|
 | [meta-llama/Meta-Llama-3.1-70B](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B) | 2, 4, 8 | BF16, FP8 |Gaudi 2, Gaudi 3|
+| [meta-llama/Meta-Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct) | 2, 4, 8 | BF16, FP8 |Gaudi 2, Gaudi 3|
 | [meta-llama/Meta-Llama-3.1-405B](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B) | 8 | BF16, FP8 |Gaudi 3|
+| [meta-llama/Meta-Llama-3.1-405B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct) | 8 | BF16, FP8 |Gaudi 3|
 | [meta-llama/Meta-Llama-3.3-70B](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct) | 4 | BF16, FP8 | Gaudi 3|
 | [meta-llama/Meta-Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) | 4, 8 | BF16, FP8 | Gaudi 3|
 | [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) | 1 | BF16 | Gaudi 3|
@@ -44,6 +46,4 @@ Validation of the following configurations is currently in progress:
 | Model | Tensor parallelism [x HPU] | Datatype | Validated AI accelerator |
 |:--- |:---: |:---: |:---: |
 | [llava-hf/llava-1.5-7b-hf](https://huggingface.co/llava-hf/llava-1.5-7b-hf) | 1, 8 | BF16 | Gaudi 2, Gaudi 3 |
-| [meta-llama/Meta-Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct) | 2, 4, 8 | BF16, FP8 |Gaudi 2, Gaudi 3|
-| [meta-llama/Meta-Llama-3.1-405B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct) | 8 | BF16, FP8 |Gaudi 3|
 | [princeton-nlp/gemma-2-9b-it-SimPO](https://huggingface.co/princeton-nlp/gemma-2-9b-it-SimPO) | 1 | BF16 |Gaudi 2, Gaudi 3|
 
diff --git a/docs/release_notes.md b/docs/release_notes.md
index 1bb16d326..157031d40 100644
--- a/docs/release_notes.md
+++ b/docs/release_notes.md
@@ -10,8 +10,8 @@ The release includes experimental dynamic quantization for MatMul and KV‑cache
 
 This release also introduces support for the following models on Gaudi 3:
 
-- [bielik-1.5b-v3.0-instruct](https://huggingface.co/speakleash/Bielik-1.5B-v3.0-Instruct)
 - [bielik-11b-v2.6-instruct](https://huggingface.co/speakleash/Bielik-11B-v2.6-Instruct)
+- [bielik-1.5b-v3.0-instruct](https://huggingface.co/speakleash/Bielik-1.5B-v3.0-Instruct)
 - [bielik-4.5b-v3.0-instruct](https://huggingface.co/speakleash/Bielik-4.5B-v3.0-Instruct)
 - [deepseek-ai/DeepSeek-R1-Distill-Llama-70B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B)
 - [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)
@@ -24,7 +24,9 @@ Additionally, the following models were successfully validated:
 
 - [meta-llama/Meta-Llama-3.1-8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B)
 - [meta-llama/Meta-Llama-3.1-70B](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B)
+- [meta-llama/Meta-Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct)
 - [meta-llama/Meta-Llama-3.1-405B](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B)
+- [meta-llama/Meta-Llama-3.1-405B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct)
 - [meta-llama/Meta-Llama-3.3-70B](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct)
 - [mistralai/Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3)
 
@@ -32,7 +34,7 @@ For the list of all supported models, see [Validated Models](getting_started/val
 
 ## 0.11.2
 
-This version is based on [vLLM 0.11.2](https://github.com/vllm-project/vllm/releases/tag/v0.11.2) and supports [Intel® Gaudi® v1.22.2](https://docs.habana.ai/en/v1.22.2/Release_Notes/GAUDI_Release_Notes.html).
+This version is based on [vLLM 0.11.2](https://github.com/vllm-project/vllm/releases/tag/v0.11.2) and supports [Intel® Gaudi® v1.22.2](https://docs.habana.ai/en/v1.22.2/Release_Notes/GAUDI_Release_Notes.html) and [Intel® Gaudi® v1.23.0](https://docs.habana.ai/en/v1.23.0/Release_Notes/GAUDI_Release_Notes.html).
 
 This release introduces the production-ready vLLM Hardware Plugin for Intel® Gaudi®, a community-driven integration layer based on the [vLLM v1 architecture](https://blog.vllm.ai/2025/01/27/v1-alpha-release.html). It enables efficient, high-performance large language model (LLM) inference on [Intel® Gaudi®](https://docs.habana.ai/) AI accelerators. The plugin is an alternative to the [vLLM fork](https://github.com/HabanaAI/vllm-fork), which reaches end of life with this release and will be deprecated in v1.24.0, remaining functional only for legacy use cases. We strongly encourage all fork users to begin planning their migration to the plugin.
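
---

For readers who want to try one of the validated configurations from the tables above, the sketch below shows the general shape of an offline-inference run using vLLM's standard Python API. It is illustrative only and not part of the patches: it assumes the Intel Gaudi plugin is installed so vLLM can target HPU, and the model name, `tensor_parallel_size`, and `dtype` simply mirror the `meta-llama/Meta-Llama-3.1-8B-Instruct | 1 | BF16` row of the validated-models table.

```python
# Minimal sketch: run one validated configuration from the table above.
# Assumes vllm and the Intel Gaudi plugin are installed in the environment,
# so vLLM dispatches to HPU automatically.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    tensor_parallel_size=1,   # "Tensor parallelism [x HPU]" column
    dtype="bfloat16",         # "Datatype" column (BF16)
)

params = SamplingParams(temperature=0.0, max_tokens=64)
outputs = llm.generate(["What is the vLLM Hardware Plugin for Intel Gaudi?"], params)
print(outputs[0].outputs[0].text)
```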