Add Quanto4,2, HQQ4,2 KV cache quantization support to Transformers loader #6768

dinerburger · 2025-02-22T03:22:15Z

Let's try again on the correct branch lol

This patch adds Quanto KV cache quantization support for Transformers. The placement is unfortunately a little awkward, because the same QuantizedCache object has to be passed to each step. Happy to move it somewhere else.
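To illustrate why the cache object has to persist, here is a minimal toy sketch, not the code from this patch. `ToyQuantizedCache` and `decode` are hypothetical names, and the rounding scheme is a stand-in for whatever Quanto or HQQ actually do. The only point is that one cache instance is created up front and handed to every step, so earlier entries are still there when later steps run.

```python
from dataclasses import dataclass, field


@dataclass
class ToyQuantizedCache:
    """Hypothetical stand-in for a quantized KV cache (not the Transformers class)."""

    nbits: int = 4
    entries: list = field(default_factory=list)

    def update(self, value: float) -> None:
        # Clamp to [0, 1], then snap to one of 2**nbits evenly spaced levels.
        levels = (1 << self.nbits) - 1
        clamped = min(max(value, 0.0), 1.0)
        self.entries.append(round(clamped * levels) / levels)


def decode(step_values, cache: ToyQuantizedCache) -> int:
    """Run one 'step' per value, always writing into the same cache object."""
    for value in step_values:
        cache.update(value)
    return len(cache.entries)


# Create the cache once, then reuse it across calls so history accumulates.
cache = ToyQuantizedCache(nbits=2)
decode([0.1, 0.9], cache)
decode([0.2], cache)
```

If a fresh cache were created inside each step instead, every step would start with an empty history, which is the failure mode the awkward placement avoids.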

Also added optimum-quanto to requirements.txt. It could probably be made an optional dependency safely, but it's quite small and, I believe, portable across platforms.
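If the dependency were made optional, one common pattern is to probe for the module at runtime and only offer the quantized cache when it is present. The sketch below is an assumption about how that could look, not something this patch does; `is_available` is a hypothetical helper.

```python
import importlib


def is_available(module_name: str) -> bool:
    """Return True if the named module can be imported, False otherwise."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        # Covers ModuleNotFoundError, including a missing parent package.
        return False
    return True
```

A loader could then check `is_available("optimum.quanto")` before exposing the Quanto options, and fall back to the unquantized cache otherwise.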

Checklist:

I have read the Contributing guidelines.

cceneag

👍👍

dinerburger added 3 commits February 21, 2025 22:10

Get quanto4,2 KV cache working in Transformers

4a55c74

Add optimum-quanto to requirements

d0ea050

Add HQQ KV cache quantization for Transformers

244d3cb

dinerburger changed the title ~~Add Quanto4,2 KV cache quantization support to Transformers loader~~ Add Quanto4,2, HQQ4,2 KV cache quantization support to Transformers loader Feb 22, 2025

cceneag suggested changes Oct 8, 2025


2 participants
