
Conversation

@shengliangxu (Contributor) commented Dec 8, 2025

What does this PR do?

Refactor and clean up hf_ptq.py

This script contains several separate pieces of logic whose code is entangled, making it really hard to add new features.

Refactor so that these pieces are separated (see the sketch after this list):

  1. sparsity: all logic goes to sparsity_main. TODO: we may eventually move this logic out into a separate script.

  2. quantization: all logic goes to quantize_main.

    2.1 plain quantization with a single quantization format

    2.2 auto quantization
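
A minimal sketch of this top-level split; sparsity_main and quantize_main are the names from this PR, but the dispatch condition on a sparsity_fmt flag is an illustrative assumption, not taken from the diff:

def main(args):
    # sparsity_main / quantize_main are the entry points introduced by this
    # refactor; the dispatch condition below is an assumption for illustration.
    if args.sparsity_fmt != "dense":
        sparsity_main(args)   # sparsification path (may move to its own script)
    else:
        quantize_main(args)   # plain (mono) quantization or auto quantization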

In the quantization pipeline, split the flow into these stages (a sketch follows the list):

  1. model loading
  2. calibration dataset loading
  3. pre-quantize processing
  4. actual quantization
  5. post-quantize processing
  6. quantized model export
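
A minimal sketch of the staged pipeline; the helper names below are hypothetical, and only the six-stage order comes from this description:

def quantize_main(args):
    # Hypothetical helper names; only the stage order is from the PR.
    model = load_model(args)                    # 1. model loading
    calib_data = load_calib_dataset(args)       # 2. calibration dataset loading
    model = pre_quantize_process(model, args)   # 3. pre-quantize processing
    model = quantize(model, calib_data, args)   # 4. actual quantization
    model = post_quantize_process(model, args)  # 5. post-quantize processing
    export_quantized_model(model, args)         # 6. quantized model export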

Testing

Tested mono quantization (a single quantization config):

python examples/llm_ptq/hf_ptq.py \
    --pyt_ckpt_path=Qwen/Qwen3-8B \
    --export_path=qwen3-8B_fp8 \
    --qformat=fp8 \
    --kv_cache_qformat=fp8 \
    --calib_size=16 \
    --batch_size=0 \
    --trust_remote_code \
    --export_fmt=hf

and deployed to vLLM and TRTLLM, and validated accuracy using lm_eval.
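
For reference, an accuracy check along these lines could look like the following, assuming the exported checkpoint directory is loadable by vLLM; the task and arguments are illustrative, not taken from this PR:

lm_eval --model vllm \
    --model_args pretrained=qwen3-8B_fp8 \
    --tasks mmlu \
    --batch_size auto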

Tested auto quantization:

python examples/llm_ptq/hf_ptq.py \
    --qformat=nvfp4,fp8 \
    --auto_quantize_score_size 128 \
    --auto_quantize_bits 5.0 \
    --auto_quantize_checkpoint llama-8B-auto-quantize-checkpoint \
    --pyt_ckpt_path=meta-llama/Meta-Llama-3-8B \
    --export_path=llama-8B_auto_quantize \
    --kv_cache_qformat=fp8 \
    --calib_size=16 \
    --batch_size=0 \
    --trust_remote_code \
    --export_fmt=tensorrt_llm

and compared the exported files against those produced before this change.
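
One way to run that comparison, assuming the exports from before and after the refactor are kept in separate directories (the paths here are hypothetical):

# Hypothetical paths holding the pre- and post-refactor exports.
diff -r llama-8B_auto_quantize_before llama-8B_auto_quantize_after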

copy-pr-bot commented Dec 8, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

shengliangxu force-pushed the shengliangx/hf_ptq_refactor_cleanup branch from 832fb13 to 070ae87 on December 8, 2025 21:01
shengliangxu force-pushed the shengliangx/hf_ptq_refactor_cleanup branch from 070ae87 to a89625b on December 8, 2025 22:05
shengliangxu marked this pull request as ready for review December 8, 2025 22:11
shengliangxu requested review from a team as code owners December 8, 2025 22:11
codecov bot commented Dec 11, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.78%. Comparing base (cd0d185) to head (defa50a).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #665      +/-   ##
==========================================
- Coverage   74.80%   74.78%   -0.02%     
==========================================
  Files         192      192              
  Lines       18814    18814              
==========================================
- Hits        14073    14070       -3     
- Misses       4741     4744       +3     


shengliangxu self-assigned this Dec 11, 2025
args.qformat,
args.kv_cache_qformat,
args.awq_block_size,
None,
Collaborator:

@shengliangxu is this intended?

Contributor Author:

Yes, this None is for auto_quantize. The target function does not apply to auto_quantize at all, so this unused argument was removed.

mts.export(full_model)


def plain_quantize(
Collaborator:

nit: maybe let's call it default_quantize or single_precision_quantize?

Contributor Author:

It's not necessarily single precision, just a single config. We can have mixed precision even when using a single config.

default_quantize sounds a bit too general and a bit tedious. How about mono_quantize, meaning quantize using a single config?

Contributor Author:

auto_quantize vs mono_quantize, quite symmetric.

@cjluo-nv (Collaborator) left a comment:

Thanks @shengliangxu for the refactoring. Could you also validate that the checkpoint before and after this change is the same?

@shengliangxu (Contributor Author) replied:

> Thanks @shengliangxu for the refactoring. Could you also validate that the checkpoint before and after this change is the same?

Done, updated the description.
