
Commit be7511b

Update Qeff Documentation to indicate vLLM Support in Validated Models Page (#588)
Signed-off-by: Varun Gupta <[email protected]>
Co-authored-by: Abhishek Kumar Singh <[email protected]>
1 parent b2dd328 commit be7511b


1 file changed, +38 -46 lines changed

docs/source/validate.md

Lines changed: 38 additions & 46 deletions
@@ -4,21 +4,21 @@
 ## Text-only Language Models

 ### Text Generation Task
-**QEff Auto Class:** [`QEFFAutoModelForCausalLM`](#QEFFAutoModelForCausalLM)
+**QEff Auto Class:** `QEFFAutoModelForCausalLM`

-| Architecture | Model Family | Representative Models | CB Support |
-|-------------------------|--------------------|--------------------------------------------------------------------------------------|------------|
-| **FalconForCausalLM** | Falcon | [tiiuae/falcon-40b](https://huggingface.co/tiiuae/falcon-40b) | ✔️ |
-| **Qwen3MoeForCausalLM** | Qwen3Moe | [Qwen/Qwen3-30B-A3B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-30B-A3B-Instruct-2507) | ✔️ |
+| Architecture | Model Family | Representative Models | [vLLM Support](https://quic.github.io/cloud-ai-sdk-pages/latest/Getting-Started/Installation/vLLM/vLLM/index.html) |
+|-------------------------|--------------------|--------------------------------------------------------------------------------------|--------------|
+| **FalconForCausalLM** | Falcon** | [tiiuae/falcon-40b](https://huggingface.co/tiiuae/falcon-40b) | ✔️ |
+| **Qwen3MoeForCausalLM** | Qwen3Moe | [Qwen/Qwen3-30B-A3B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-30B-A3B-Instruct-2507) | |
 | **GemmaForCausalLM** | CodeGemma | [google/codegemma-2b](https://huggingface.co/google/codegemma-2b)<br>[google/codegemma-7b](https://huggingface.co/google/codegemma-7b) | ✔️ |
-| | Gemma | [google/gemma-2b](https://huggingface.co/google/gemma-2b)<br>[google/gemma-7b](https://huggingface.co/google/gemma-7b)<br>[google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b)<br>[google/gemma-2-9b](https://huggingface.co/google/gemma-2-9b)<br>[google/gemma-2-27b](https://huggingface.co/google/gemma-2-27b) | ✔️ |
+| | Gemma*** | [google/gemma-2b](https://huggingface.co/google/gemma-2b)<br>[google/gemma-7b](https://huggingface.co/google/gemma-7b)<br>[google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b)<br>[google/gemma-2-9b](https://huggingface.co/google/gemma-2-9b)<br>[google/gemma-2-27b](https://huggingface.co/google/gemma-2-27b) | ✔️ |
 | **GPTBigCodeForCausalLM** | Starcoder1.5 | [bigcode/starcoder](https://huggingface.co/bigcode/starcoder) | ✔️ |
 | | Starcoder2 | [bigcode/starcoder2-15b](https://huggingface.co/bigcode/starcoder2-15b) | ✔️ |
 | **GPTJForCausalLM** | GPT-J | [EleutherAI/gpt-j-6b](https://huggingface.co/EleutherAI/gpt-j-6b) | ✔️ |
 | **GPT2LMHeadModel** | GPT-2 | [openai-community/gpt2](https://huggingface.co/openai-community/gpt2) | ✔️ |
 | **GraniteForCausalLM** | Granite 3.1 | [ibm-granite/granite-3.1-8b-instruct](https://huggingface.co/ibm-granite/granite-3.1-8b-instruct)<br>[ibm-granite/granite-guardian-3.1-8b](https://huggingface.co/ibm-granite/granite-guardian-3.1-8b) | ✔️ |
 | | Granite 20B | [ibm-granite/granite-20b-code-base-8k](https://huggingface.co/ibm-granite/granite-20b-code-base-8k)<br>[ibm-granite/granite-20b-code-instruct-8k](https://huggingface.co/ibm-granite/granite-20b-code-instruct-8k) | ✔️ |
-| **InternVLChatModel** | Intern-VL | [OpenGVLab/InternVL2_5-1B](https://huggingface.co/OpenGVLab/InternVL2_5-1B) | |
+| **InternVLChatModel** | Intern-VL | [OpenGVLab/InternVL2_5-1B](https://huggingface.co/OpenGVLab/InternVL2_5-1B) | ✔️ | | |
 | **LlamaForCausalLM** | CodeLlama | [codellama/CodeLlama-7b-hf](https://huggingface.co/codellama/CodeLlama-7b-hf)<br>[codellama/CodeLlama-13b-hf](https://huggingface.co/codellama/CodeLlama-13b-hf)<br>[codellama/CodeLlama-34b-hf](https://huggingface.co/codellama/CodeLlama-34b-hf) | ✔️ |
 | | DeepSeek-R1-Distill-Llama | [deepseek-ai/DeepSeek-R1-Distill-Llama-70B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B) | ✔️ |
 | | InceptionAI-Adapted | [inceptionai/jais-adapted-7b](https://huggingface.co/inceptionai/jais-adapted-7b)<br>[inceptionai/jais-adapted-13b-chat](https://huggingface.co/inceptionai/jais-adapted-13b-chat)<br>[inceptionai/jais-adapted-70b](https://huggingface.co/inceptionai/jais-adapted-70b) | ✔️ |
@@ -31,45 +31,42 @@
 | **MistralForCausalLM** | Mistral | [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) | ✔️ |
 | **MixtralForCausalLM** | Codestral<br>Mixtral | [mistralai/Codestral-22B-v0.1](https://huggingface.co/mistralai/Codestral-22B-v0.1)<br>[mistralai/Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) | ✔️ |
 | **MPTForCausalLM** | MPT | [mosaicml/mpt-7b](https://huggingface.co/mosaicml/mpt-7b) | ✔️ |
-| **Phi3ForCausalLM** | Phi-3, Phi-3.5 | [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) | ✔️ |
+| **Phi3ForCausalLM** | Phi-3**, Phi-3.5** | [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) | ✔️ |
 | **QwenForCausalLM** | DeepSeek-R1-Distill-Qwen | [DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) | ✔️ |
 | | Qwen2, Qwen2.5 | [Qwen/Qwen2-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2-1.5B-Instruct) | ✔️ |
 | **LlamaSwiftKVForCausalLM** | swiftkv | [Snowflake/Llama-3.1-SwiftKV-8B-Instruct](https://huggingface.co/Snowflake/Llama-3.1-SwiftKV-8B-Instruct) | ✔️ |
-| **Grok1ModelForCausalLM** | grok-1 | [hpcai-tech/grok-1](https://huggingface.co/hpcai-tech/grok-1) | ✔️ |
-
----
-
+| **Grok1ModelForCausalLM** | grok-1 | [hpcai-tech/grok-1](https://huggingface.co/hpcai-tech/grok-1) ||
+- ** set "trust-remote-code" flag to True for e2e inference with vLLM
+- *** pass "disable-sliding-window" flag for e2e inference of Gemma-2 family of models with vLLM
 ## Embedding Models

 ### Text Embedding Task
-**QEff Auto Class:** [`QEFFAutoModel`](#QEFFAutoModel)
-
-| Architecture | Model Family | Representative Models |
-|--------------|--------------|---------------------------------|
-| **BertModel** | BERT-based | [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5)<br> [BAAI/bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5)<br>[BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) <br>[e5-large-v2](https://huggingface.co/intfloat/e5-large-v2) |
-| **LlamaModel** | Llama-based | [intfloat/e5-mistral-7b-instruct](https://huggingface.co/intfloat/e5-mistral-7b-instruct) |
-| **MPNetForMaskedLM** | MPNet | [sentence-transformers/multi-qa-mpnet-base-cos-v1](https://huggingface.co/sentence-transformers/multi-qa-mpnet-base-cos-v1) |
-| **MistralModel** | Mistral | [e5-mistral-7b-instruct](https://huggingface.co/intfloat/e5-mistral-7b-instruct) |
-| **NomicBertModel** | NomicBERT | [nomic-embed-text-v1.5](https://huggingface.co/nomic-ai/nomic-embed-text-v1.5) |
-| **Qwen2ForCausalLM** | Qwen2 | [stella_en_1.5B_v5](https://huggingface.co/NovaSearch/stella_en_1.5B_v5) |
-| **RobertaModel** | RoBERTa | [ibm-granite/granite-embedding-30m-english](https://huggingface.co/ibm-granite/granite-embedding-30m-english)<br> [ibm-granite/granite-embedding-125m-english](https://huggingface.co/ibm-granite/granite-embedding-125m-english) |
-| **XLMRobertaForSequenceClassification** | XLM-RoBERTa | [bge-reranker-v2-m3bge-reranker-v2-m3](https://huggingface.co/BAAI/bge-reranker-v2-m3) |
-| **XLMRobertaModel** | XLM-RoBERTa |[ibm-granite/granite-embedding-107m-multilingual](https://huggingface.co/ibm-granite/granite-embedding-107m-multilingual)<br> [ibm-granite/granite-embedding-278m-multilingual](https://huggingface.co/ibm-granite/granite-embedding-278m-multilingual) |
-
----
+**QEff Auto Class:** `QEFFAutoModel`
+
+| Architecture | Model Family | Representative Models | vLLM Support |
+|--------------|--------------|---------------------------------|--------------|
+| **BertModel** | BERT-based | [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5)<br> [BAAI/bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5)<br>[BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) <br>[e5-large-v2](https://huggingface.co/intfloat/e5-large-v2) | ✔️ |
+| **MPNetForMaskedLM** | MPNet | [sentence-transformers/multi-qa-mpnet-base-cos-v1](https://huggingface.co/sentence-transformers/multi-qa-mpnet-base-cos-v1) ||
+| **MistralModel** | Mistral | [e5-mistral-7b-instruct](https://huggingface.co/intfloat/e5-mistral-7b-instruct) ||
+| **NomicBertModel** | NomicBERT | [nomic-embed-text-v1.5](https://huggingface.co/nomic-ai/nomic-embed-text-v1.5) ||
+| **Qwen2ForCausalLM** | Qwen2 | [stella_en_1.5B_v5](https://huggingface.co/NovaSearch/stella_en_1.5B_v5) | ✔️ |
+| **RobertaModel** | RoBERTa | [ibm-granite/granite-embedding-30m-english](https://huggingface.co/ibm-granite/granite-embedding-30m-english)<br> [ibm-granite/granite-embedding-125m-english](https://huggingface.co/ibm-granite/granite-embedding-125m-english) | ✔️ |
+| **XLMRobertaForSequenceClassification** | XLM-RoBERTa | [bge-reranker-v2-m3bge-reranker-v2-m3](https://huggingface.co/BAAI/bge-reranker-v2-m3) ||
+| **XLMRobertaModel** | XLM-RoBERTa |[ibm-granite/granite-embedding-107m-multilingual](https://huggingface.co/ibm-granite/granite-embedding-107m-multilingual)<br> [ibm-granite/granite-embedding-278m-multilingual](https://huggingface.co/ibm-granite/granite-embedding-278m-multilingual) | ✔️ |

 ## Multimodal Language Models

 ### Vision-Language Models (Text + Image Generation)
-**QEff Auto Class:** [`QEFFAutoModelForImageTextToText`](#QEFFAutoModelForImageTextToText)
+**QEff Auto Class:** `QEFFAutoModelForImageTextToText`

-| Architecture | Model Family | Representative Models | CB Support | Single Qpc Support | Dual Qpc Support |
-|-----------------------------|--------------|----------------------------------------------------------------------------------------|------------|--------------------|------------------|
-| **LlavaForConditionalGeneration** | LLaVA-1.5 | [llava-hf/llava-1.5-7b-hf](https://huggingface.co/llava-hf/llava-1.5-7b-hf) || ✔️ | ✔️ |
-| **MllamaForConditionalGeneration** | Llama 3.2 | [meta-llama/Llama-3.2-11B-Vision Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct)<br>[meta-llama/Llama-3.2-90B-Vision](https://huggingface.co/meta-llama/Llama-3.2-90B-Vision) || ✔️ | ✔️ |
-|**LlavaNextForConditionalGeneration** | Granite Vision | [ibm-granite/granite-vision-3.2-2b](https://huggingface.co/ibm-granite/granite-vision-3.2-2b) ||| ✔️ |
-|**Llama4ForConditionalGeneration** | Llama-4-Scout | [Llama-4-Scout-17B-16E-Instruct](https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E-Instruct) || ✔️ | ✔️ |
-|**Gemma3ForConditionalGeneration** | Gemma3 | [google/gemma-3-4b-it](https://huggingface.co/google/gemma-3-4b-it)|| ✔️ | ✔️ |
+| Architecture | Model Family | Representative Models | Qeff Single Qpc | Qeff Dual Qpc | vllm Single Qpc | vllm Dual Qpc |
+|------------------------------------|--------------|----------------------------------------------------------------------------------------|------------|---------------------|-------------------|-----------------|
+| **LlavaForConditionalGeneration** | LLaVA-1.5 | [llava-hf/llava-1.5-7b-hf](https://huggingface.co/llava-hf/llava-1.5-7b-hf) | ✔️ | ✔️ | ✔️ | ✔️ |
+| **MllamaForConditionalGeneration** | Llama 3.2 | [meta-llama/Llama-3.2-11B-Vision Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct)<br>[meta-llama/Llama-3.2-90B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct) | ✔️ | ✔️ | ✔️ | ✔️ |
+| **LlavaNextForConditionalGeneration** | Granite Vision | [ibm-granite/granite-vision-3.2-2b](https://huggingface.co/ibm-granite/granite-vision-3.2-2b) || ✔️ || ✔️ |
+| **Llama4ForConditionalGeneration** | Llama-4-Scout | [Llama-4-Scout-17B-16E-Instruct](https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E-Instruct) | ✔️ | ✔️ | ✔️ | ✔️ |
+| **Gemma3ForConditionalGeneration** | Gemma3*** | [google/gemma-3-4b-it](https://huggingface.co/google/gemma-3-4b-it) | ✔️ | ✔️ | ✔️ ||
+- *** pass "disable-sliding-window" flag for e2e inference with vLLM


 **Dual QPC:**
@@ -85,25 +82,20 @@ In the Dual QPC(Qualcomm Program Container) setup, the model is split across two
 **Single QPC:**
 In the single QPC(Qualcomm Program Container) setup, the entire model—including both image encoding and text generation—runs within a single QPC. There is no model splitting, and all components operate within the same execution environment.

-**For more details click [here](#QEFFAutoModelForImageTextToText)**

-```{NOTE}
+
+**Note:**
 The choice between Single and Dual QPC is determined during model instantiation using the `kv_offload` setting.
 If the `kv_offload` is set to `True` it runs in dual QPC and if its set to `False` model runs in single QPC mode.
-```

 ---
-
 ### Audio Models
 (Automatic Speech Recognition) - Transcription Task
+**QEff Auto Class:** `QEFFAutoModelForSpeechSeq2Seq`

-**QEff Auto Class:** [`QEFFAutoModelForSpeechSeq2Seq`](#QEFFAutoModelForSpeechSeq2Seq)
-
-| Architecture | Model Family | Representative Models |
-|--------------|--------------|----------------------------------------------------------------------------------------|
-| **Whisper** | Whisper | [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny)<br>[openai/whisper-base](https://huggingface.co/openai/whisper-base)<br>[openai/whisper-small](https://huggingface.co/openai/whisper-small)<br>[openai/whisper-medium](https://huggingface.co/openai/whisper-medium)<br>[openai/whisper-large](https://huggingface.co/openai/whisper-large)<br>[openai/whisper-large-v3-turbo](https://huggingface.co/openai/whisper-large-v3-turbo) |
-
----
+| Architecture | Model Family | Representative Models | vLLM Support |
+|--------------|--------------|----------------------------------------------------------------------------------------|--------------|
+| **Whisper** | Whisper | [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny)<br>[openai/whisper-base](https://huggingface.co/openai/whisper-base)<br>[openai/whisper-small](https://huggingface.co/openai/whisper-small)<br>[openai/whisper-medium](https://huggingface.co/openai/whisper-medium)<br>[openai/whisper-large](https://huggingface.co/openai/whisper-large)<br>[openai/whisper-large-v3-turbo](https://huggingface.co/openai/whisper-large-v3-turbo) | ✔️ |

 (models_coming_soon)=
 # Models Coming Soon
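This commit adds two footnote markers that link particular model families to vLLM launch options. The `**` marker (Falcon, Phi-3, Phi-3.5) maps to `trust-remote-code`, and the `***` marker (Gemma, Gemma3) maps to `disable-sliding-window`. The sketch below expresses that mapping as a lookup. It assumes vLLM's offline Python API, where these options are the `trust_remote_code` and `disable_sliding_window` engine arguments. The helper `vllm_extra_kwargs` and the family labels it matches on are hypothetical. The labels are transcribed from the tables in the diff, and `Gemma-2` follows the wording of the text-generation footnote.

```python
# Hypothetical helper: translates the footnote markers in the validated-models
# tables into keyword arguments for vLLM's offline `LLM` class.
# Family labels are transcribed from the tables; "Gemma-2" follows the wording
# of the "***" footnote under the text-generation table.

TRUST_REMOTE_CODE_FAMILIES = frozenset({"Falcon", "Phi-3", "Phi-3.5"})  # marked **
DISABLE_SLIDING_WINDOW_FAMILIES = frozenset({"Gemma-2", "Gemma3"})  # marked ***


def vllm_extra_kwargs(model_family: str) -> dict[str, bool]:
    """Return the extra vLLM engine arguments that a footnoted model family needs."""
    kwargs: dict[str, bool] = {}
    if model_family in TRUST_REMOTE_CODE_FAMILIES:
        # "**": set the trust-remote-code flag to True
        kwargs["trust_remote_code"] = True
    if model_family in DISABLE_SLIDING_WINDOW_FAMILIES:
        # "***": pass the disable-sliding-window flag
        kwargs["disable_sliding_window"] = True
    return kwargs


if __name__ == "__main__":
    for family in ("Falcon", "Gemma-2", "GPT-J"):
        print(family, vllm_extra_kwargs(family))
```

With vLLM installed, you could unpack the result into the constructor, for example `LLM(model="tiiuae/falcon-40b", **vllm_extra_kwargs("Falcon"))`. On the command line, the equivalent is `--trust-remote-code` or `--disable-sliding-window`. Running on Cloud AI hardware presumably also requires the vLLM build described on the installation page linked from the table header.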

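According to the Single/Dual QPC note, `kv_offload` selects the layout once, when the model is instantiated. The sketch below models only that decision, as plain data. It assumes the Dual QPC layout places image encoding and text generation in separate QPCs, which the Single QPC paragraph implies. `QpcLayout`, `qpc_layout`, and the component names are hypothetical and are not part of QEfficient.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QpcLayout:
    """Hypothetical description of how a vision-language model is packaged."""

    num_qpcs: int
    # One tuple of component names per QPC.
    components: tuple[tuple[str, ...], ...]


def qpc_layout(kv_offload: bool) -> QpcLayout:
    """Map the kv_offload setting to a Single or Dual QPC layout (illustrative only)."""
    if kv_offload:
        # Dual QPC: image encoding and text generation are assumed to be
        # compiled into two separate QPCs.
        return QpcLayout(num_qpcs=2, components=(("vision_encoder",), ("language_model",)))
    # Single QPC: the whole model runs inside one QPC, with no splitting.
    return QpcLayout(num_qpcs=1, components=(("vision_encoder", "language_model"),))


if __name__ == "__main__":
    for flag in (True, False):
        layout = qpc_layout(flag)
        print(f"kv_offload={flag}: {layout.num_qpcs} QPC(s) -> {layout.components}")
```

In QEfficient itself, the flag is passed when the model is loaded, for example `QEFFAutoModelForImageTextToText.from_pretrained(model_id, kv_offload=True)`. Check the class's API reference for the remaining arguments.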