add extra_options use_channel_wised_quantization to builder.py #1362

bopeng1234 · 2025-03-31T08:03:43Z

Add an extra option, use_channel_wised_quantization, to builder.py.

It enables quantizing the model with block size = K.

This PR is intended to work with intel/onnxruntime#631 to give onnxruntime-genai a channel-wise quantization capability, so that it can generate a symmetric quantized model with block_size = -1.

With a model in this format, the Intel NPU runs more than 20x faster than with the original block size 16/32/64/128/256 models.
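To illustrate what "block size = K" means, here is a minimal pure-Python sketch of symmetric 4-bit quantization. It is not the code from this PR or from onnxruntime. It assumes a row-major weight of shape [N][K], where K is the reduction dimension, and it treats block_size = -1 as "one block spans all of K", which gives one scale per output channel. The function names, the signed range [-8, 7], and the max_abs / 7 scale rule are assumptions made for the example; the real quantizer may round and pack values differently.

```python
def quantize_symmetric_int4(weight, block_size=-1):
    """Quantize a row-major [N][K] weight to signed 4-bit integers.

    block_size=-1 is interpreted here (an assumption) as one block per row,
    i.e. channel-wise quantization with a single scale per output channel.
    """
    quantized, scales = [], []
    for row in weight:
        k = len(row)
        size = k if block_size == -1 else block_size
        q_row, s_row = [], []
        for start in range(0, k, size):
            block = row[start:start + size]
            max_abs = max(abs(v) for v in block)
            # Symmetric: the zero point is fixed at 0, so only a scale is kept.
            scale = max_abs / 7.0 if max_abs > 0 else 1.0
            q_row.extend(max(-8, min(7, round(v / scale))) for v in block)
            s_row.append(scale)
        quantized.append(q_row)
        scales.append(s_row)
    return quantized, scales


def dequantize(quantized, scales, block_size=-1):
    """Rebuild approximate float weights from integers and per-block scales."""
    restored = []
    for q_row, s_row in zip(quantized, scales):
        size = len(q_row) if block_size == -1 else block_size
        restored.append([q * s_row[i // size] for i, q in enumerate(q_row)])
    return restored


weight = [[0.7, -0.3, 0.0, 0.1], [1.4, 0.2, -1.4, 0.7]]
q, s = quantize_symmetric_int4(weight)        # channel-wise: one scale per row
q2, s2 = quantize_symmetric_int4(weight, 2)   # block-wise: two scales per row
```

With channel-wise quantization the scale tensor shrinks to one value per output channel, at the cost of a coarser approximation than small blocks would give.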

command:

python -m onnxruntime_genai.models.builder -o E:\download\onnx\Phi-3-mini-4k-instruct-onnx-channelwise-modified -p int4 -e cpu -i E:\download\huggingface\Phi-3-mini-4k-instruct --extra_options use_channel_wised_quantization=1 use_qdq=1
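The two trailing arguments in the command are separate key=value pairs passed through --extra_options. As a rough sketch only, the following shows how such pairs could be turned into a dictionary and checked. parse_extra_options and is_enabled are hypothetical helpers written for this illustration, not functions from builder.py, and the set of values treated as "enabled" is an assumption.

```python
def parse_extra_options(pairs):
    """Turn ["key=value", ...] into a dict of strings (hypothetical helper)."""
    options = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        options[key] = value
    return options


def is_enabled(options, name):
    """Treat "1" or "true" (any case) as enabled; this rule is an assumption."""
    return options.get(name, "0").lower() in ("1", "true")


opts = parse_extra_options(["use_channel_wised_quantization=1", "use_qdq=1"])
channel_wise = is_enabled(opts, "use_channel_wised_quantization")
```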

bopeng1234 · 2025-04-07T03:19:27Z

@microsoft-github-policy-service agree

bopeng1234 mentioned this pull request Mar 31, 2025

add 4bits channel-wised quantization capability for MatMulNbits Op intel/onnxruntime#631

Closed

bopeng1234 force-pushed the main branch from 4ebd8ee to ed100a5 Compare April 1, 2025 02:56

bopeng1234 marked this pull request as draft April 8, 2025 01:36

bopeng1234 mentioned this pull request Apr 22, 2025

add channel wise quantization option for QDQ, and opt for intel NPU intel/onnxruntime#669

Merged

bopeng1234 marked this pull request as ready for review April 22, 2025 02:24

bopeng1234 force-pushed the main branch from ed100a5 to bbc0168 Compare April 30, 2025 07:02

add extra_options use_channel_wised_quantization to builder.py, quant…

cc8b56a

…ize the model with block size = K

bopeng1234 force-pushed the main branch from 9ac5a6a to cc8b56a Compare May 7, 2025 01:07
