## Text-only Language Models

### Text Generation Task
**QEff Auto Class:** `QEFFAutoModelForCausalLM`

| Architecture | Model Family | Representative Models | [vLLM Support](https://quic.github.io/cloud-ai-sdk-pages/latest/Getting-Started/Installation/vLLM/vLLM/index.html) |
|--------------------------|--------------------|--------------------------------------------------------------------------------------|--------------|
| **FalconForCausalLM** | Falcon** | [tiiuae/falcon-40b](https://huggingface.co/tiiuae/falcon-40b) | ✔️ |
| **Qwen3MoeForCausalLM** | Qwen3Moe | [Qwen/Qwen3-30B-A3B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-30B-A3B-Instruct-2507) | ✕ |
| **GemmaForCausalLM** | CodeGemma | [google/codegemma-2b](https://huggingface.co/google/codegemma-2b)<br>[google/codegemma-7b](https://huggingface.co/google/codegemma-7b) | ✔️ |
| | Gemma*** | [google/gemma-2b](https://huggingface.co/google/gemma-2b)<br>[google/gemma-7b](https://huggingface.co/google/gemma-7b)<br>[google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b)<br>[google/gemma-2-9b](https://huggingface.co/google/gemma-2-9b)<br>[google/gemma-2-27b](https://huggingface.co/google/gemma-2-27b) | ✔️ |
| **GPTBigCodeForCausalLM** | Starcoder1.5 | [bigcode/starcoder](https://huggingface.co/bigcode/starcoder) | ✔️ |
| | Starcoder2 | [bigcode/starcoder2-15b](https://huggingface.co/bigcode/starcoder2-15b) | ✔️ |
| **GPTJForCausalLM** | GPT-J | [EleutherAI/gpt-j-6b](https://huggingface.co/EleutherAI/gpt-j-6b) | ✔️ |
| **GPT2LMHeadModel** | GPT-2 | [openai-community/gpt2](https://huggingface.co/openai-community/gpt2) | ✔️ |
| **GraniteForCausalLM** | Granite 3.1 | [ibm-granite/granite-3.1-8b-instruct](https://huggingface.co/ibm-granite/granite-3.1-8b-instruct)<br>[ibm-granite/granite-guardian-3.1-8b](https://huggingface.co/ibm-granite/granite-guardian-3.1-8b) | ✔️ |
| | Granite 20B | [ibm-granite/granite-20b-code-base-8k](https://huggingface.co/ibm-granite/granite-20b-code-base-8k)<br>[ibm-granite/granite-20b-code-instruct-8k](https://huggingface.co/ibm-granite/granite-20b-code-instruct-8k) | ✔️ |
| **InternVLChatModel** | Intern-VL | [OpenGVLab/InternVL2_5-1B](https://huggingface.co/OpenGVLab/InternVL2_5-1B) | ✔️ |
| **LlamaForCausalLM** | CodeLlama | [codellama/CodeLlama-7b-hf](https://huggingface.co/codellama/CodeLlama-7b-hf)<br>[codellama/CodeLlama-13b-hf](https://huggingface.co/codellama/CodeLlama-13b-hf)<br>[codellama/CodeLlama-34b-hf](https://huggingface.co/codellama/CodeLlama-34b-hf) | ✔️ |
| | DeepSeek-R1-Distill-Llama | [deepseek-ai/DeepSeek-R1-Distill-Llama-70B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B) | ✔️ |
| | InceptionAI-Adapted | [inceptionai/jais-adapted-7b](https://huggingface.co/inceptionai/jais-adapted-7b)<br>[inceptionai/jais-adapted-13b-chat](https://huggingface.co/inceptionai/jais-adapted-13b-chat)<br>[inceptionai/jais-adapted-70b](https://huggingface.co/inceptionai/jais-adapted-70b) | ✔️ |
| **MistralForCausalLM** | Mistral | [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) | ✔️ |
| **MixtralForCausalLM** | Codestral<br>Mixtral | [mistralai/Codestral-22B-v0.1](https://huggingface.co/mistralai/Codestral-22B-v0.1)<br>[mistralai/Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) | ✔️ |
| **MPTForCausalLM** | MPT | [mosaicml/mpt-7b](https://huggingface.co/mosaicml/mpt-7b) | ✔️ |
| **Phi3ForCausalLM** | Phi-3**, Phi-3.5** | [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) | ✔️ |
| **QwenForCausalLM** | DeepSeek-R1-Distill-Qwen | [DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) | ✔️ |
| | Qwen2, Qwen2.5 | [Qwen/Qwen2-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2-1.5B-Instruct) | ✔️ |
| **LlamaSwiftKVForCausalLM** | swiftkv | [Snowflake/Llama-3.1-SwiftKV-8B-Instruct](https://huggingface.co/Snowflake/Llama-3.1-SwiftKV-8B-Instruct) | ✔️ |
| **Grok1ModelForCausalLM** | grok-1 | [hpcai-tech/grok-1](https://huggingface.co/hpcai-tech/grok-1) | ✕ |
- ** set the "trust-remote-code" flag to True for e2e inference with vLLM
- *** pass the "disable-sliding-window" flag for e2e inference of the Gemma-2 family of models with vLLM
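
For any model in the table above, the `QEFFAutoModelForCausalLM` class is the entry point for export, compilation, and on-device execution. The snippet below is a minimal sketch of that flow based on the QEfficient high-level API; the checkpoint name, compile arguments, and prompt are illustrative placeholders rather than recommended settings.

```python
# Minimal text-generation sketch with QEFFAutoModelForCausalLM.
# Model name, compile options, and prompt are illustrative assumptions.
from transformers import AutoTokenizer

from QEfficient import QEFFAutoModelForCausalLM

model_name = "gpt2"  # any checkpoint from the table above

# Load the HuggingFace checkpoint behind the QEff wrapper.
qeff_model = QEFFAutoModelForCausalLM.from_pretrained(model_name)

# Export to ONNX and compile a QPC for the Cloud AI 100 device
# (argument values here are placeholders, not recommendations).
qeff_model.compile(num_cores=16)

# Run generation on the compiled QPC.
tokenizer = AutoTokenizer.from_pretrained(model_name)
qeff_model.generate(prompts=["Hello, my name is"], tokenizer=tokenizer)
```
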
## Embedding Models

### Text Embedding Task
**QEff Auto Class:** `QEFFAutoModel`

| Architecture | Model Family | Representative Models | vLLM Support |
|--------------|--------------|---------------------------------|--------------|
| **BertModel** | BERT-based | [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5)<br>[BAAI/bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5)<br>[BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5)<br>[e5-large-v2](https://huggingface.co/intfloat/e5-large-v2) | ✔️ |
| **MPNetForMaskedLM** | MPNet | [sentence-transformers/multi-qa-mpnet-base-cos-v1](https://huggingface.co/sentence-transformers/multi-qa-mpnet-base-cos-v1) | ✕ |
| **MistralModel** | Mistral | [e5-mistral-7b-instruct](https://huggingface.co/intfloat/e5-mistral-7b-instruct) | ✕ |
| **NomicBertModel** | NomicBERT | [nomic-embed-text-v1.5](https://huggingface.co/nomic-ai/nomic-embed-text-v1.5) | ✕ |
| **Qwen2ForCausalLM** | Qwen2 | [stella_en_1.5B_v5](https://huggingface.co/NovaSearch/stella_en_1.5B_v5) | ✔️ |
| **RobertaModel** | RoBERTa | [ibm-granite/granite-embedding-30m-english](https://huggingface.co/ibm-granite/granite-embedding-30m-english)<br>[ibm-granite/granite-embedding-125m-english](https://huggingface.co/ibm-granite/granite-embedding-125m-english) | ✔️ |
| **XLMRobertaForSequenceClassification** | XLM-RoBERTa | [BAAI/bge-reranker-v2-m3](https://huggingface.co/BAAI/bge-reranker-v2-m3) | ✕ |
| **XLMRobertaModel** | XLM-RoBERTa | [ibm-granite/granite-embedding-107m-multilingual](https://huggingface.co/ibm-granite/granite-embedding-107m-multilingual)<br>[ibm-granite/granite-embedding-278m-multilingual](https://huggingface.co/ibm-granite/granite-embedding-278m-multilingual) | ✔️ |
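
The embedding models above are driven through `QEFFAutoModel`. The sketch below shows the expected load, compile, and run pattern, assuming the QEfficient high-level API; the checkpoint, `num_cores` value, and input text are placeholders, and the exact output format of `generate` may vary by release.

```python
# Minimal text-embedding sketch with QEFFAutoModel.
# Checkpoint, compile options, and input text are illustrative assumptions.
from transformers import AutoTokenizer

from QEfficient import QEFFAutoModel

model_name = "BAAI/bge-base-en-v1.5"  # any checkpoint from the table above

qeff_model = QEFFAutoModel.from_pretrained(model_name)
qeff_model.compile(num_cores=16)  # compile a QPC for the Cloud AI 100 device

tokenizer = AutoTokenizer.from_pretrained(model_name)
inputs = tokenizer("An example sentence to embed.", return_tensors="pt")

# Execute on device; the returned object holds the embedding output.
embeddings = qeff_model.generate(inputs)
```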

## Multimodal Language Models

### Vision-Language Models (Text + Image Generation)
**QEff Auto Class:** `QEFFAutoModelForImageTextToText`

| Architecture | Model Family | Representative Models | QEff Single QPC | QEff Dual QPC | vLLM Single QPC | vLLM Dual QPC |
|------------------------------------|--------------|----------------------------------------------------------------------------------------|-----------------|---------------|-----------------|---------------|
| **LlavaForConditionalGeneration** | LLaVA-1.5 | [llava-hf/llava-1.5-7b-hf](https://huggingface.co/llava-hf/llava-1.5-7b-hf) | ✔️ | ✔️ | ✔️ | ✔️ |
| **MllamaForConditionalGeneration** | Llama 3.2 | [meta-llama/Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct)<br>[meta-llama/Llama-3.2-90B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct) | ✔️ | ✔️ | ✔️ | ✔️ |
| **LlavaNextForConditionalGeneration** | Granite Vision | [ibm-granite/granite-vision-3.2-2b](https://huggingface.co/ibm-granite/granite-vision-3.2-2b) | ✕ | ✔️ | ✕ | ✔️ |
| **Llama4ForConditionalGeneration** | Llama-4-Scout | [Llama-4-Scout-17B-16E-Instruct](https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E-Instruct) | ✔️ | ✔️ | ✔️ | ✔️ |
| **Gemma3ForConditionalGeneration** | Gemma3*** | [google/gemma-3-4b-it](https://huggingface.co/google/gemma-3-4b-it) | ✔️ | ✔️ | ✔️ | ✕ |
- *** pass the "disable-sliding-window" flag for e2e inference with vLLM

**Dual QPC:**
In the Dual QPC (Qualcomm Program Container) setup, the model is split across two QPCs: image encoding runs in one QPC, while text generation runs in the other, with the encoder's outputs passed to the language-model QPC.

**Single QPC:**
In the Single QPC (Qualcomm Program Container) setup, the entire model—including both image encoding and text generation—runs within a single QPC. There is no model splitting, and all components operate within the same execution environment.

**Note:**
The choice between Single and Dual QPC is determined during model instantiation using the `kv_offload` setting.
If `kv_offload` is set to `True`, the model runs in Dual QPC mode; if it is set to `False`, it runs in Single QPC mode.
9591---
96-
9792### Audio Models
9893(Automatic Speech Recognition) - Transcription Task
94+ ** QEff Auto Class:** ` QEFFAutoModelForSpeechSeq2Seq `
| Architecture | Model Family | Representative Models | vLLM Support |
|--------------|--------------|------------------------------------------------------------------------------------------|--------------|
| **Whisper** | Whisper | [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny)<br>[openai/whisper-base](https://huggingface.co/openai/whisper-base)<br>[openai/whisper-small](https://huggingface.co/openai/whisper-small)<br>[openai/whisper-medium](https://huggingface.co/openai/whisper-medium)<br>[openai/whisper-large](https://huggingface.co/openai/whisper-large)<br>[openai/whisper-large-v3-turbo](https://huggingface.co/openai/whisper-large-v3-turbo) | ✔️ |
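
Whisper-family checkpoints are run through `QEFFAutoModelForSpeechSeq2Seq`. The sketch below outlines the expected flow; the audio stand-in, processor usage, and the `generate` arguments are assumptions for illustration, so check the QEfficient examples for the exact signature in your release.

```python
# Transcription sketch with QEFFAutoModelForSpeechSeq2Seq (Whisper-style models).
# Audio input, processor usage, and generate() arguments are illustrative assumptions.
import numpy as np
from transformers import AutoProcessor

from QEfficient import QEFFAutoModelForSpeechSeq2Seq

model_name = "openai/whisper-tiny"

qeff_model = QEFFAutoModelForSpeechSeq2Seq.from_pretrained(model_name)
qeff_model.compile()  # compile encoder/decoder QPCs for the Cloud AI 100 device

# Prepare log-mel features; one second of silence stands in for real 16 kHz audio.
processor = AutoProcessor.from_pretrained(model_name)
audio = np.zeros(16000, dtype=np.float32)
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

# Assumed call signature: generate consumes the processed features and returns token ids,
# which can then be decoded with the processor/tokenizer to obtain the transcript.
output = qeff_model.generate(inputs=inputs, generation_len=32)
```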

(models_coming_soon)=
# Models Coming Soon