[DOCS] aiPC performance update #29536

Merged
@@ -3,7 +3,8 @@ Most Efficient Large Language Models for AI PC

This page is regularly updated to help you identify the best-performing LLMs on the
Intel® Core™ Ultra processor family and AI PCs.
-The current data is as of OpenVINO 2024.6, 13 Dec. 2024.
+The current data is as of OpenVINO 2025.0, 06 March 2025 (7-155H and 7-268V)
+and OpenVINO 2024.6, 13 Dec. 2024 (9-288V).

The tables below list the key performance indicators for inference on built-in GPUs.

243 changes: 147 additions & 96 deletions docs/sphinx_setup/_static/benchmarks_files/llm_models_7-155H.csv
@@ -1,96 +1,147 @@
Topology,Precision,Input Size,max rss memory,1st latency (ms),2nd latency (ms),2nd tok/sec
opt-125m-gptq,INT4-MIXED,32,1116,25.8,8.1,123.5
opt-125m-gptq,INT4-MIXED,1024,1187.1,75.2,8.2,122.0
qwen2-0.5b,INT4-MIXED,32,1587.4,45.1,15.4,64.9
qwen2-0.5b,INT4-MIXED,1024,1587.8,228.2,15.6,64.1
tiny-llama-1.1b-chat,INT4-MIXED,32,1704.2,42.4,17.6,56.8
tiny-llama-1.1b-chat,INT4-MIXED,1024,1616.3,489.2,18.9,52.9
qwen2-0.5b,INT8-CW,32,1477.3,51.5,20.2,49.5
qwen2-0.5b,INT8-CW,1024,1592,263.7,20.6,48.5
tiny-llama-1.1b-chat,INT8-CW,32,1855.6,60.2,20.7,48.3
tiny-llama-1.1b-chat,INT8-CW,1024,1992.6,618.2,21.7,46.1
qwen2-1.5b,INT4-MIXED,32,2024.2,59.6,23.1,43.3
bloomz-560m,FP16,1024,2773.1,647.8,23.8,42.0
qwen2-1.5b,INT4-MIXED,1024,2177.7,577.4,23.8,42.0
bloomz-560m,FP16,32,2582.7,44.2,25.1,39.8
dolly-v2-3b,INT4-MIXED,32,2507.9,79.8,29.4,34.0
phi-2,INT4-MIXED,32,2568.9,74.6,29.7,33.7
qwen2-1.5b,INT8-CW,32,2577.3,81.6,30.5,32.8
red-pajama-incite-chat-3b-v1,INT4-MIXED,32,2489.4,69.9,30.5,32.8
minicpm-1b-sft,INT4-MIXED,31,2442.1,84.7,31,32.3
qwen2-1.5b,INT8-CW,1024,2739.8,773.3,31.2,32.1
gemma-2b-it,INT4-MIXED,32,2998.2,103.5,31.4,31.8
dolly-v2-3b,INT4-MIXED,1024,2508.1,1396.6,32,31.3
gemma-2b-it,INT4-MIXED,1024,3171.5,822.3,32.2,31.1
phi-2,INT4-MIXED,1024,2940.5,1395.3,32.2,31.1
red-pajama-incite-chat-3b-v1,INT4-MIXED,1023,2489.6,1435.5,33.1,30.2
minicpm-1b-sft,INT8-CW,31,2818.6,86.9,33.4,29.9
stable-zephyr-3b-dpo,INT4-MIXED,32,2638.2,87.4,33.8,29.6
stablelm-3b-4e1t,INT4-MIXED,32,2750.5,89.4,35.6,28.1
stablelm-3b-4e1t,INT4-MIXED,1023,3115.5,1473.1,38.1,26.2
phi-3-mini-4k-instruct,INT4-MIXED,32,3039.1,109.2,40.4,24.8
phi-2,INT8-CW,32,3599.7,107.5,42.1,23.8
gemma-2b-it,INT8-CW,32,3845.4,111.3,42.2,23.7
dolly-v2-3b,INT8-CW,32,3596.4,110.1,42.5,23.5
gemma-2b-it,INT8-CW,1024,3844.6,1183,43,23.3
red-pajama-incite-chat-3b-v1,INT8-CW,32,3590,111,43.3,23.1
phi-3-mini-4k-instruct,INT4-MIXED,1024,3467.6,1721.6,43.5,23.0
stablelm-3b-4e1t,INT8-CW,32,3582.8,111,44.3,22.6
stable-zephyr-3b-dpo,INT8-CW,32,3607.2,110.2,44.5,22.5
phi-2,INT8-CW,1024,3982,1508,44.6,22.4
dolly-v2-3b,INT8-CW,1024,3596.5,1529.1,44.9,22.3
minicpm-1b-sft,FP16,31,3769.9,84,45.4,22.0
red-pajama-incite-chat-3b-v1,INT8-CW,1023,3952,2064.5,45.7,21.9
stablelm-3b-4e1t,INT8-CW,1023,3934.5,2286.3,46.8,21.4
gpt-j-6b,INT4-MIXED,32,4443.5,159.3,56.7,17.6
phi-3-mini-4k-instruct,INT8-CW,32,4545,117.1,57.6,17.4
phi-3-mini-4k-instruct,INT8-CW,1024,4810.4,2068.8,60.5,16.5
gpt-j-6b,INT4-MIXED,1024,4746.4,2397,60.6,16.5
falcon-7b-instruct,INT4-MIXED,32,5014,203.7,61.3,16.3
qwen2-7b,INT4-MIXED,32,5269.4,203.8,62.3,16.1
codegen25-7b,INT4-MIXED,32,4641.1,170.6,63.5,15.7
llama-2-7b-gptq,INT4-MIXED,32,4597.3,172.1,63.5,15.7
falcon-7b-instruct,INT4-MIXED,1024,5230.6,2695.3,63.6,15.7
qwen2-7b,INT4-MIXED,1024,5370.8,2505.9,63.9,15.6
decilm-7b-instruct,INT4-MIXED,36,4614.2,301.1,65.3,15.3
codegen25-7b,INT4-MIXED,1024,4641.9,2629.6,67.4,14.8
llama-2-7b-gptq,INT4-MIXED,1024,4928.1,2584.3,67.6,14.8
mistral-7b-v0.1,INT4-MIXED,32,4928.5,180.9,69.2,14.5
llama-2-7b-chat-hf,INT4-MIXED,32,4985.7,160.3,69.5,14.4
qwen-7b-chat-gptq,INT4-MIXED,32,5426.7,188.3,69.5,14.4
llama-3-8b,INT4-MIXED,33,5473.4,285.7,70,14.3
flan-t5-xxl,INT4-MIXED,33,19293.8,211.7,70.1,14.3
llama-3-8b,INT4-MIXED,33,5389.2,281,70.8,14.1
mistral-7b-v0.1,INT4-MIXED,1024,5225.4,2713.3,71.8,13.9
zephyr-7b-beta,INT4-MIXED,32,5306.1,177.9,72.1,13.9
llama-3-8b,INT4-MIXED,1025,5615.2,2937.8,72.4,13.8
llama-3-8b,INT4-MIXED,1025,5531.7,2815.4,73.2,13.7
llama-2-7b-chat-hf,INT4-MIXED,1024,5319.5,2736.2,73.6,13.6
phi-2,FP16,32,6197,104.6,74.7,13.4
zephyr-7b-beta,INT4-MIXED,1024,5306.4,2802.3,74.7,13.4
qwen-7b-chat-gptq,INT4-MIXED,1024,5934.9,2606.9,75,13.3
dolly-v2-3b,FP16,32,6195.1,105.3,75.3,13.3
baichuan2-7b-chat,INT4-MIXED,32,5837.9,188.5,76.8,13.0
red-pajama-incite-chat-3b-v1,FP16,32,6178.6,118,76.8,13.0
gemma-7b-it,INT4-MIXED,32,6495.9,230.6,77,13.0
stablelm-3b-4e1t,FP16,32,6174.2,105.9,77.1,13.0
stable-zephyr-3b-dpo,FP16,32,6217.8,107.9,77.2,13.0
glm-4-9b-chat,INT4-MIXED,32,6333.4,225,77.3,12.9
phi-2,FP16,1024,6411.5,2065.2,77.3,12.9
dolly-v2-3b,FP16,1024,6410.1,2075,77.7,12.9
llama-3.1-8b,INT4-MIXED,32,6324.6,182.2,78.8,12.7
red-pajama-incite-chat-3b-v1,FP16,1023,6394.2,2752.4,79.2,12.6
stablelm-3b-4e1t,FP16,1023,6386.9,2953.3,79.5,12.6
glm-4-9b-chat,INT4-MIXED,1024,6439.5,3282.2,80,12.5
baichuan2-7b-chat,INT4-MIXED,1024,6174.1,2752.6,80.6,12.4
gemma-7b-it,INT4-MIXED,1024,6795.4,3118.3,80.6,12.4
llama-3.1-8b,INT4-MIXED,1024,6324.8,2865.7,81.3,12.3
gpt-j-6b,INT8-CW,32,6793.2,167.6,85,11.8
qwen-7b-chat,INT4-MIXED,32,7274.8,168.8,85.2,11.7
gpt-j-6b,INT8-CW,1024,6793.3,2668.4,88.8,11.3
qwen-7b-chat,INT4-MIXED,1024,7610.3,2991.9,90.6,11.0
flan-t5-xxl,INT4-MIXED,1139,23514,540.8,94.9,10.5
falcon-7b-instruct,INT8-CW,32,7764.1,181.3,95.5,10.5
llama-2-7b-chat-hf,INT8-CW,32,7330.9,172,96.1,10.4
falcon-7b-instruct,INT8-CW,1024,7987.4,3072.8,98.1,10.2
qwen2-7b,INT8-CW,32,8175.3,211.3,99.6,10.0
Topology,Precision,Input Size,max rss memory,1st latency (ms),2nd latency (ms),2nd token per sec (2nd lat^(-1)),,,
bloomz-560m,INT4,32,2123,36.1,12.5,80,,,
bloomz-560m,INT4,1024,2123.6,195,13.7,72.99270073,,,
tiny-llama-1.1b-chat,INT4,32,2249.2,36.8,13.9,71.94244604,,,
tiny-llama-1.1b-chat,INT4,1024,2249.9,427.8,15,66.66666667,,,
qwen2-0.5b,INT4,32,1800.7,44.7,15.4,64.93506494,,,
bloomz-560m,INT8,32,2273.5,39.5,15.4,64.93506494,,,
qwen2-0.5b,INT4,1024,1801.1,185.9,15.5,64.51612903,,,
bloomz-560m,INT8,1024,2471.6,213.3,15.8,63.29113924,,,
qwen2-0.5b,INT8,32,2000.1,37.9,18.2,54.94505495,,,
qwen2-0.5b,INT8,1024,2135.9,218,18.7,53.47593583,,,
bloomz-560m,FP16,32,3069.2,39.1,19.7,50.76142132,,,
qwen2-1.5b,INT4,32,2750.3,47.6,20,50,,,
tiny-llama-1.1b-chat,INT8,32,2441.6,49.4,20.5,48.7804878,,,
qwen2-1.5b,INT4,1024,2575.9,531.2,20.9,47.84688995,,,
bloomz-560m,FP16,1024,3057.5,232.7,21,47.61904762,,,
tiny-llama-1.1b-chat,INT8,1024,2431.7,523.6,21.5,46.51162791,,,
dolly-v2-3b,INT4,32,3178.8,75.4,27.1,36.900369,,,
minicpm-1b-sft,INT4,31,3131.5,74,27.6,36.23188406,,,
red-pajama-incite-chat-3b-v1,INT4,32,3057.5,67.1,27.6,36.23188406,,,
gemma-2b-it,INT4,32,3460.7,97.9,28.5,35.0877193,,,
minicpm-1b-sft,INT4,1014,3132,732.4,29,34.48275862,,,
qwen2-1.5b,INT8,32,3126.4,77.4,29.3,34.12969283,,,
gemma-2b-it,INT4,1024,3461.4,796.3,29.4,34.01360544,,,
qwen2-1.5b,INT8,1024,3126.8,660.3,30.1,33.22259136,,,
dolly-v2-3b,INT4,1024,3179,1171.9,31.8,31.44654088,,,
minicpm-1b-sft,INT8,31,3496,77.9,31.9,31.34796238,,,
red-pajama-incite-chat-3b-v1,INT4,1023,3057.7,1211,32.8,30.48780488,,,
minicpm-1b-sft,INT8,1014,3433.2,783.7,33.6,29.76190476,,,
phi-3-mini-4k-instruct,INT4,32,3534.8,96.6,36.6,27.32240437,,,
red-pajama-incite-chat-3b-v1,INT8,32,4099.8,107.3,42.3,23.64066194,,,
gemma-2b-it,INT8,32,4478.7,103.1,42.4,23.58490566,,,
minicpm-1b-sft,FP16,31,4157.5,75.7,42.7,23.41920375,,,
phi-3-mini-4k-instruct,INT4,1024,3535.3,1521.7,42.8,23.36448598,,,
dolly-v2-3b,INT8,32,4143.7,102,43.1,23.20185615,,,
gemma-2b-it,INT8,1024,4478.9,936.2,43.3,23.09468822,,,
minicpm-1b-sft,FP16,1014,4329.7,876.6,44.8,22.32142857,,,
red-pajama-incite-chat-3b-v1,INT8,1023,4412.8,1815.9,44.9,22.27171492,,,
dolly-v2-3b,INT8,1024,4143.8,1276.4,45.6,21.92982456,,,
chatglm3-6b,INT4,32,4746.8,149.6,50.6,19.76284585,,,
chatglm3-6b,INT4,1024,4747,2279.1,52.6,19.01140684,,,
flan-t5-xxl,INT4,33,13681.2,91.7,53.6,18.65671642,,,
phi-3-mini-4k-instruct,INT8,32,5041.3,110.9,56.9,17.57469244,,,
llama-2-7b-gptq,INT4,32,5115.9,168.1,57.8,17.30103806,,,
chatglm3-6b-gptq,INT4,32,5371.4,159.5,57.8,17.30103806,,,
decilm-7b-instruct,INT4,36,5415.9,230.5,58,17.24137931,,,
codegen25-7b,INT4,32,5110.5,161,59.1,16.92047377,,,
flan-t5-xxl,INT4,1139,16627.6,455.8,59.3,16.86340641,,,
qwen2-7b,INT4,32,5802.2,173.2,60.1,16.63893511,,,
phi-3-mini-4k-instruct,INT8,1024,5041.7,1812.4,60.2,16.61129568,,,
chatglm3-6b-gptq,INT4,1024,5748.7,2236,60.2,16.61129568,,,
falcon-7b-instruct,INT4,32,5495.1,181.3,60.3,16.58374793,,,
decilm-7b-instruct,INT4,1091,5237.4,2995.4,60.9,16.42036125,,,
qwen2-7b,INT4,1024,5758.2,2445.4,61.9,16.15508885,,,
falcon-7b-instruct,INT4,1024,5682.7,2718.5,62.6,15.97444089,,,
codegen25-7b,INT4,1024,5513.9,2500.7,63.2,15.82278481,,,
mistral-7b-v0.1,INT4,32,5475.8,178.5,64.7,15.45595054,,,
qwen-7b-chat-gptq,INT4,32,6115.4,174.2,64.8,15.43209877,,,
llama-3-8b,INT4,33,5964.2,238.4,65.2,15.33742331,,,
llama-3-8b,INT4,33,5870.5,239.8,65.3,15.31393568,,,
llama-2-7b-chat-hf,INT4,32,5493.5,157.4,65.4,15.29051988,,,
llama-2-7b-gptq,INT4,1024,5802.7,2547.3,65.4,15.29051988,,,
mistral-7b-v0.1,INT4,1024,5476,2684.8,67.2,14.88095238,,,
llama-3-8b,INT4,1025,6163.2,2842.9,67.6,14.79289941,,,
zephyr-7b-beta,INT4,32,5739.1,177.4,67.7,14.77104874,,,
llama-3-8b,INT4,1025,6069.4,2741.8,67.8,14.74926254,,,
llama-2-7b-chat-hf,INT4,1024,5494,2500.3,69.5,14.38848921,,,
zephyr-7b-beta,INT4,1024,5739.7,2671.4,71,14.08450704,,,
qwen-7b-chat-gptq,INT4,1024,6646.3,2596.9,73,13.69863014,,,
baichuan2-7b-chat,INT4,32,6385.1,159.5,73.1,13.67989056,,,
gemma-7b-it,INT4,32,7297.7,221.9,73.7,13.56852103,,,
dolly-v2-3b,FP16,32,6652.1,107.1,74.2,13.47708895,,,
red-pajama-incite-chat-3b-v1,FP16,32,6640.8,103.1,74.7,13.38688086,,,
llama-3.1-8b,INT4,32,6797.5,182.7,76.3,13.1061599,,,
glm-4-9b-chat,INT4,32,6805.1,215.5,76.4,13.08900524,,,
baichuan2-7b-chat,INT4,1024,6385.5,2597,77.3,12.93661061,,,
gemma-7b-it,INT4,1024,6974.7,3126,77.5,12.90322581,,,
dolly-v2-3b,FP16,1024,6652.2,1542.4,78.7,12.7064803,,,
red-pajama-incite-chat-3b-v1,FP16,1023,7120.4,2490.4,79.3,12.61034048,,,
llama-3.1-8b,INT4,1024,7114,2807.6,79.7,12.54705144,,,
glm-4-9b-chat,INT4,1024,6805.2,3197,79.7,12.54705144,,,
qwen-7b-chat,INT4,32,7255.7,156.2,81.2,12.31527094,,,
chatglm3-6b,INT8,32,7308.6,154.4,85.1,11.75088132,,,
qwen-7b-chat,INT4,1024,7827.7,2693.7,86.6,11.54734411,,,
chatglm3-6b,INT8,1024,7308.9,2486,87.4,11.4416476,,,
flan-t5-xxl,INT8,33,20923.9,170.5,91.7,10.90512541,,,
llama-2-7b-chat-hf,INT8,32,7838.4,157.9,94.8,10.54852321,,,
falcon-7b-instruct,INT8,32,8250,175.3,95.1,10.51524711,,,
codegen25-7b,INT8,32,7996.9,162.7,95.7,10.44932079,,,
falcon-7b-instruct,INT8,1024,8445.4,3055.4,97.5,10.25641026,,,
flan-t5-xxl,INT8,1139,24095.3,571.2,97.6,10.24590164,,,
qwen2-7b,INT8,32,8542.4,185.5,98.2,10.18329939,,,
llama-2-7b-chat-hf,INT8,1024,7838.6,3132.1,98.8,10.12145749,,,
qwen2-7b,INT8,1024,8543.5,3124.5,99.8,10.02004008,,,
codegen25-7b,INT8,1024,8453.5,3136,99.9,10.01001001,,,
decilm-7b-instruct,INT8,36,8088.5,244.9,100.7,9.930486594,,,
phi-3-mini-4k-instruct,FP16,32,8592.5,124.5,102.9,9.718172983,,,
decilm-7b-instruct,INT8,1091,8292.4,9951.9,103.5,9.661835749,,,
qwen-7b-chat,INT8,32,8991.1,169.7,103.7,9.643201543,,,
zephyr-7b-beta,INT8,32,8267.2,183.1,104.5,9.56937799,,,
mistral-7b-v0.1,INT8,32,8269.6,184.1,104.9,9.532888465,,,
zephyr-7b-beta,INT8,1024,8268.1,3379.7,107,9.345794393,,,
mistral-7b-v0.1,INT8,1024,8513.8,3394.1,107.4,9.310986965,,,
phi-3-mini-4k-instruct,FP16,1024,9157.2,2080.8,108.4,9.225092251,,,
qwen-7b-chat,INT8,1024,8991.4,3137.5,109,9.174311927,,,
llama-3-8b,INT8,33,9085.1,264.9,109.4,9.140767824,,,
llama-3.1-8b,INT8,32,9070.9,189.1,110.7,9.033423668,,,
baichuan2-13b-chat,INT4,32,10592.1,330.4,111.4,8.976660682,,,
llama-3-8b,INT8,1025,9085.2,9900.1,111.9,8.936550492,,,
llama-3.1-8b,INT8,1024,9071,3408.2,113.2,8.833922261,,,
phi-3-medium-4k-instruct,INT4,38,9009.6,443.3,116,8.620689655,,,
phi-3-medium-4k-instruct,INT4,1061,8935.4,5655.5,119.9,8.34028357,,,
baichuan2-7b-chat,INT8,32,8633.7,172.7,120.5,8.298755187,,,
baichuan2-7b-chat,INT8,1024,9135.7,3192.6,124.7,8.019246191,,,
gemma-7b-it,INT8,32,10087.5,223.2,125.2,7.987220447,,,
glm-4-9b-chat,INT8,32,10440,224.2,125.7,7.955449483,,,
gemma-7b-it,INT8,1024,9965.1,3723.4,129.1,7.745933385,,,
glm-4-9b-chat,INT8,1024,10440.1,4054.2,129.2,7.73993808,,,
starcoder,INT4,32,9738.6,599.6,177.5,5.633802817,,,
flan-t5-xxl,FP16,33,19273,553.7,188.1,5.316321106,,,
flan-t5-xxl,FP16,1139,24887.6,999,193.1,5.178663905,,,
phi-3-medium-4k-instruct,INT8,38,14453.1,1342.7,205.9,4.856726566,,,
phi-3-medium-4k-instruct,INT8,1061,14287.2,19763.6,210.9,4.741583689,,,
decilm-7b-instruct,FP16,36,14215.6,465.7,222,4.504504505,,,
decilm-7b-instruct,FP16,1091,14332.5,12122.8,225.6,4.432624113,,,
starcoder,INT8,32,8567.4,379.1,235.4,4.24808836,,,
llama-3.1-8b,FP16,32,15653.3,319.9,240.7,4.154549231,,,
starcoder,INT4,1024,9738.7,6736.5,241.1,4.147656574,,,
llama-3.1-8b,FP16,1024,17004.9,4679.8,245.7,4.07000407,,,
starcoder,INT8,1024,9829.9,8819.9,269.2,3.714710253,,,
lcm-dreamshaper-v7,INT4,32,5391.5,296.1,284.2,3.518648839,,,
lcm-dreamshaper-v7,INT4,1024,5779.1,305.6,284.3,3.517411185,,,
lcm-dreamshaper-v7,FP16,1024,5967.9,304.5,284.5,3.514938489,,,
lcm-dreamshaper-v7,FP16,32,5238.8,295.8,284.5,3.514938489,,,
lcm-dreamshaper-v7,INT8,32,4974.1,314.4,301.4,3.317850033,,,
lcm-dreamshaper-v7,INT8,1024,5622.3,323.9,301.7,3.314550878,,,
stable-diffusion-v2-1,FP16,1024,5942.7,475.7,444.7,2.248706993,,,
stable-diffusion-v2-1,FP16,32,5197.9,466.9,445.4,2.245172878,,,
baichuan2-13b-chat,INT4,1024,12879,5213.1,448.6,2.229157379,,,
stable-diffusion-v2-1,INT8,32,4723.6,484,455.9,2.193463479,,,
stable-diffusion-v2-1,INT8,1024,5458.1,489.4,456.2,2.192021043,,,
stable-diffusion-v1-5,FP16,1024,6573.2,576.6,550.6,1.816200509,,,
stable-diffusion-v1-5,FP16,32,5848.9,570.5,551.4,1.81356547,,,
stable-diffusion-v1-5,INT8,32,5581,603.9,587.7,1.701548409,,,
stable-diffusion-v1-5,INT8,1024,6258.2,612.9,589.4,1.696640652,,,
phi-3-medium-4k-instruct,FP16,38,27222.7,3293.8,1198.9,0.834097923,,,
phi-3-medium-4k-instruct,FP16,1061,28813.8,32882.8,1199.7,0.833541719,,,
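
The new header labels the last data column as `2nd token per sec (2nd lat^(-1))`, which suggests it is the reciprocal of the 2nd-token latency, converted from milliseconds to seconds. A minimal sketch of how a reviewer might check that relationship is shown below. The helper names (`tokens_per_second`, `check_rows`) and the tolerance are illustrative assumptions and are not part of the PR; the sample rows are copied from the updated CSV.

```python
import csv
import io
import math

# A few rows copied from the updated llm_models_7-155H.csv, including the
# trailing empty columns that the file carries.
SAMPLE = """\
Topology,Precision,Input Size,max rss memory,1st latency (ms),2nd latency (ms),2nd token per sec (2nd lat^(-1)),,,
bloomz-560m,INT4,32,2123,36.1,12.5,80,,,
bloomz-560m,INT4,1024,2123.6,195,13.7,72.99270073,,,
qwen2-1.5b,INT4,32,2750.3,47.6,20,50,,,
phi-3-medium-4k-instruct,FP16,1061,28813.8,32882.8,1199.7,0.833541719,,,
"""


def tokens_per_second(latency_ms: float) -> float:
    """Convert a per-token latency in milliseconds to tokens per second."""
    return 1000.0 / latency_ms


def check_rows(text: str, rel_tol: float = 1e-6) -> list:
    """Return the rows whose reported throughput differs from 1000 / latency.

    Each mismatch is a tuple of
    (topology, precision, input size, reported value, recomputed value).
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    lat_idx = header.index("2nd latency (ms)")
    tps_idx = lat_idx + 1  # assumption: throughput directly follows latency
    mismatches = []
    for row in reader:
        if not row or not row[lat_idx]:
            continue  # skip blank or incomplete lines
        reported = float(row[tps_idx])
        recomputed = tokens_per_second(float(row[lat_idx]))
        if not math.isclose(reported, recomputed, rel_tol=rel_tol):
            mismatches.append((row[0], row[1], row[2], reported, recomputed))
    return mismatches


if __name__ == "__main__":
    for item in check_rows(SAMPLE):
        print("mismatch:", item)
```

To run the same check against the full file, the contents of `llm_models_7-155H.csv` could be read and passed to `check_rows` in place of `SAMPLE`. The relative tolerance is loose enough to absorb the rounding of the reported values to about ten significant digits.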