Releases: intel/auto-round

v0.8.0

23 Oct 08:53

Highlights

What's Changed

Full Changelog: v0.7.1...v0.8.0

v0.7.1 patch release

23 Sep 04:54

Fix a severe VRAM leak regression in auto-round format packing in #842

v0.7.0

10 Sep 09:12

🚀 Highlights

  • Enhanced the NVFP4 algorithm and added support for exporting MXFP4/NVFP4 to the llm-compressor format
    by @WeiweiZhang1 and @wenhuach21

  • Improved the W2A16 quantization algorithm
    by @wenhuach21

  • Introduced the scheme interface for easier configuration of quantization settings (see the sketch after this list)
    by @wenhuach21

  • Added support for using FP8 models as input and for passing a model name string directly to the API
    by @wenhuach21 and @n1ck-guo

  • Unified the device and device_map arguments and introduced device_map="auto"
    to simplify quantization of extremely large models
    by @Kaihui-intel
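
A minimal sketch of how these API changes fit together, assuming the AutoRound entry point and the quantize_and_save call from the project README; the scheme name "W4A16", the example model, and the output path are illustrative assumptions rather than confirmed defaults:

```python
# Hedged sketch: the scheme name, example model, and output format are assumptions.
from auto_round import AutoRound

autoround = AutoRound(
    "Qwen/Qwen2.5-7B-Instruct",  # a model name string can now be passed directly
    scheme="W4A16",              # scheme interface instead of separate bits/group_size knobs
    device_map="auto",           # unified device handling for extremely large models
)
autoround.quantize_and_save("./Qwen2.5-7B-Instruct-W4A16", format="auto_round")
```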

What's Changed

New Contributors

Full Changelog: v0.6.0...v0.7.0

v0.6.0

24 Jul 02:33

Highlights

  • Provide experimental support for the GGUF q*_k formats and customized mixed-bit settings (see the sketch after this list)
  • Support XPU in the Triton backend by @wenhuach21 in #563
  • Add a torch backend by @WeiweiZhang1 in #555
  • Provide initial support for the llm-compressor format (only INT8 W8A8 dynamic quantization is supported) by @xin3he in #646
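
As a rough illustration of the experimental GGUF export path, the sketch below assumes the standard AutoRound workflow; the "gguf:q4_k_m" format tag and the example model are assumptions about the exact spelling, not confirmed values:

```python
# Hedged sketch: the GGUF format tag and example model are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "Qwen/Qwen2-0.5B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

autoround = AutoRound(model, tokenizer)
# Export to one of the experimental GGUF q*_k variants (exact tag may differ)
autoround.quantize_and_save("./qwen-gguf", format="gguf:q4_k_m")
```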

What's Changed

New Contributors

Full Changelog: v0.5.1...v0.6.0

v0.5.1: bug fix release

23 Apr 08:50

What's Changed

Full Changelog: v0.5.0...v0.5.1

v0.5.0

22 Apr 08:05

Highlights

  • Refine auto-round format inference: support 2, 3, 4, and 8 bits and the Marlin kernel, and fix several bugs in the auto-round format
  • Support XPU in tuning and inference by @wenhuach21 in #481
  • Support more VLMs by @n1ck-guo in #390
  • Change the quantization method name and make several refinements by @wenhuach21 in #500
  • Support RTN via iters==0 (see the sketch after this list) by @wenhuach21 in #510
  • Fix a bug with mixed calibration datasets by @n1ck-guo in #492
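
Since iters==0 skips the tuning loop and falls back to round-to-nearest, RTN quantization needs no calibration-heavy setup; a minimal sketch, assuming the standard constructor and save_quantized call, with the example model chosen arbitrarily:

```python
# Hedged sketch: iters=0 gives RTN behavior per the note above; model choice is arbitrary.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "facebook/opt-125m"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

autoround = AutoRound(model, tokenizer, bits=4, group_size=128, iters=0)
autoround.quantize()
autoround.save_quantized("./opt-125m-w4g128-rtn", format="auto_round")
```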

What's Changed

Full Changelog: v0.4.7...v0.5.0

v0.4.7

01 Apr 09:50

Highlights

Support W4AFP8 for HPU by @yiliu30 in #467. Please refer to Intel Neural Compressor for guidance on running these models.

Support immediate packing in the new quantization API to save RAM usage by @wenhuach21 in #466

20x AWQ and 4x GPTQ packing speedup on CUDA by @wenhuach21 in #459

Support auto-round-light to speed up the tuning process by @WeiweiZhang1 in #454

Fix a critical bug of MXFP4 in tuning by @wenhuach21 in #451

What's Changed

Full Changelog: v0.4.6...v0.4.7

v0.4.6

24 Feb 09:23

Highlights:

1. Set torch compile to False by default in #447
2. Fix packing hang and force FP16 at exporting in #430
3. Align auto_quantizer with Transformers 4.49 in #437

What's Changed

Full Changelog: v0.4.5...v0.4.6

v0.4.5

27 Jan 12:12

Highlights:
We have enhanced support for extremely large models with the following updates:

Multi-Card Tuning Support: Added basic support for multi-GPU tuning in #415.

Accelerated Packing Stage: Improved the packing speed (2x-4x) for AutoGPTQ and AutoAWQ formats by leveraging CUDA in #407.

Deepseek V3 GGUF Export: Introduced support for exporting models to the Deepseek V3 GGUF format in #416.

What's Changed

Full Changelog: v0.4.4...v0.4.5

v0.4.4 release

10 Jan 01:47

Highlights:
1. Fix install issue in #387
2. Support exporting GGUF q4_0 and q4_1 formats in #393
3. Fix LLM command-line seqlen issue in #399

What's Changed

Full Changelog: v0.4.3...v0.4.4