Add autoquant support for torchao quantizer #35503
Conversation
README can be updated after #35490 is landed
cc @SunMarc can you help review?
Thanks for adding this! You also wanted to update the minimum version of torchao, no? If so, we need to update the checks in TorchAoConfig and in TorchAoHfQuantizer. Also, it would be nice to update the docs with details about the autoquant option!
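A minimal sketch of what such a version gate could look like; the version threshold and exact placement inside TorchAoConfig / TorchAoHfQuantizer are assumptions, not taken from the PR:

```python
# Hypothetical minimum-version check for torchao; the "0.6.0" threshold
# is a placeholder, not the value chosen in this PR.
import importlib.metadata
from packaging import version

TORCHAO_MIN_VERSION = "0.6.0"  # placeholder

def _check_torchao_version():
    installed = version.parse(importlib.metadata.version("torchao"))
    if installed < version.parse(TORCHAO_MIN_VERSION):
        raise ImportError(
            f"autoquant requires torchao >= {TORCHAO_MIN_VERSION}, found {installed}"
        )
```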
@SunMarc please take a look again, thanks
Nice, thanks for adding this and for updating the docstring! It would be even better if you could add a paragraph on autoquant inside the torchao docs here: https://huggingface.co/docs/transformers/main/en/quantization/torchao
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
output = quantized_model.generate(**input_ids, max_new_tokens=self.max_new_tokens)
quantized_model.finalize_autoquant()
Make sure to include this in the doc, otherwise it will be hard for users to understand how to use autoquant. Also, instead of generate, can we just call the model's forward, or does that make a difference?
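For the doc, a rough end-to-end sketch of the autoquant flow could look like the following; the model name and generation arguments are illustrative, and the TorchAoConfig("autoquant") spelling is assumed from this PR rather than confirmed:

```python
# Hedged sketch of the autoquant flow discussed in this thread.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TorchAoConfig

model_id = "meta-llama/Meta-Llama-3-8B"  # placeholder model
quantization_config = TorchAoConfig("autoquant")  # assumed spelling for the autoquant mode

tokenizer = AutoTokenizer.from_pretrained(model_id)
quantized_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    quantization_config=quantization_config,
)

# Run the code path under test (generate) so autoquant can benchmark the layers,
# then finalize to pick the best quantization for each one.
input_ids = tokenizer("What are we having for dinner?", return_tensors="pt").to(quantized_model.device)
output = quantized_model.generate(**input_ids, max_new_tokens=128)
quantized_model.finalize_autoquant()
print(tokenizer.decode(output[0], skip_special_tokens=True))
```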
I think we should call the code path that is actually being tested, in this case generate.
If you are calling generate, you need to make sure the static cache is used, otherwise compilation will happen at each step.
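Concretely, the suggestion seems to be an addition like the one below to the test snippet above; cache_implementation="static" is the usual way to request the static KV cache from generate, and the rest mirrors the existing test code:

```python
# Continuing the test snippet above: request the static KV cache so compiled
# shapes stay fixed across decoding steps instead of recompiling at every step.
output = quantized_model.generate(
    **input_ids,
    max_new_tokens=self.max_new_tokens,
    cache_implementation="static",
)
quantized_model.finalize_autoquant()
```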
Gentle ping @ArthurZucker, as @jerryzh168 will soon be on PTO.
@ArthurZucker can you help land this?
Can you update the documentation @jerryzh168? After that, I will merge the PR.
LGTM, this needs some default generation config as well (see the sketch below):
- static cache
- disable compile (add disable compile option #36161 will bring it!)
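A loose sketch of what those defaults might look like on the quantized model's generation config; note that disable_compile is assumed to be the flag #36161 introduces and may not exist yet in the installed version:

```python
# Hypothetical defaults; `disable_compile` is assumed to come from #36161.
quantized_model.generation_config.cache_implementation = "static"
quantized_model.generation_config.disable_compile = True
```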
@ArthurZucker I added the cache_implementation settings; I didn't use disable_compile for now, is it required as well? @SunMarc I added autoquant to the doc.
Thanks for iterating!
Summary:
att, also verified that the autoquantized model can be saved and loaded:
save: https://gist.github.com/jerryzh168/01d367aaf44dbbbfd4068a4a10a00061
load: https://gist.github.com/jerryzh168/d5c6c401b2abdf18e0b6771341f1525c
Test Plan:
tested locally with above script
model uploaded to https://huggingface.co/jerryzh168/llama3-8b-autoquant
Reviewers:
Subscribers:
Tasks:
Tags:
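For reference, the save/load verification described in the linked gists could look roughly like the sketch below; the paths and the safe_serialization=False flag are assumptions here, not taken from the gists:

```python
# Hedged sketch of saving and reloading an autoquantized model;
# see the linked gists for the actual scripts used in this PR.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

save_dir = "llama3-8b-autoquant"  # placeholder path
quantized_model.save_pretrained(save_dir, safe_serialization=False)  # assumed flag
tokenizer.save_pretrained(save_dir)

# Reload later (or from the Hub upload mentioned above).
reloaded = AutoModelForCausalLM.from_pretrained(
    save_dir, torch_dtype=torch.bfloat16, device_map="auto"
)
reloaded_tokenizer = AutoTokenizer.from_pretrained(save_dir)
```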