
v0.7.0


@wenhuach21 released this 10 Sep 09:12 · 105 commits to main since this release

🚀 Highlights

  • Enhanced the NVFP4 algorithm and added support for exporting MXFP4/NVFP4 models to the llm-compressor format
    by @WeiweiZhang1 and @wenhuach21

  • Improved the W2A16 quantization algorithm
    by @wenhuach21

  • Introduced the scheme interface for easier configuration of quantization settings
    by @wenhuach21

  • Added support for using FP8 models as input, and for passing a model name string directly to the API
    by @wenhuach21 and @n1ck-guo

  • Unified the device and device_map arguments and introduced device_map="auto"
    to simplify quantizing extremely large models
    by @Kaihui-intel
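
The MXFP4/NVFP4 formats in the first highlight quantize weights to the 4-bit E2M1 value grid with a shared per-group scale (NVFP4 uses 16-element groups). The following is a minimal sketch of that idea only; it keeps the scale in full precision for simplicity and is not the exporter's actual implementation.

```python
# E2M1 (FP4) representable magnitudes, per the format definition.
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_fp4(weights, group_size=16):
    """Round a flat list of floats to the FP4 grid, group by group,
    using a per-group scale that maps the largest magnitude to 6.0.
    Illustrative sketch only (hypothetical helper, not library code)."""
    out = []
    for g in range(0, len(weights), group_size):
        group = weights[g:g + group_size]
        amax = max(abs(w) for w in group) or 1.0  # avoid divide-by-zero
        scale = amax / 6.0
        for w in group:
            # snap the scaled magnitude to the nearest grid value
            mag = min(FP4_GRID, key=lambda v: abs(v - abs(w) / scale))
            out.append(-mag * scale if w < 0 else mag * scale)
    return out

print(quantize_fp4([0.1, -0.3, 0.25, 0.6], group_size=4))
```

Values whose scaled magnitude falls between two grid points are rounded to the nearest one, which is where format-specific algorithm improvements (better scale selection, rounding) come in.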
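
The scheme interface highlight replaces setting individual quantization knobs with a single named preset. A minimal sketch of that configuration pattern, with hypothetical scheme names and fields (not AutoRound's actual API):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class QuantScheme:
    bits: int        # weight bit width
    group_size: int  # quantization group size
    data_type: str   # e.g. "int", "mx_fp", "nv_fp"

# Named presets a user would pass instead of individual arguments.
# Field values here are illustrative assumptions.
SCHEMES = {
    "W4A16": QuantScheme(bits=4, group_size=128, data_type="int"),
    "W2A16": QuantScheme(bits=2, group_size=64, data_type="int"),
    "NVFP4": QuantScheme(bits=4, group_size=16, data_type="nv_fp"),
}

def resolve_scheme(name: str) -> QuantScheme:
    """Look up a preset by name, case-insensitively."""
    try:
        return SCHEMES[name.upper()]
    except KeyError:
        raise ValueError(f"unknown scheme: {name!r}") from None

print(resolve_scheme("w2a16"))
```

The design choice is the usual one for presets: one string selects a self-consistent bundle of settings, while power users can still override individual fields.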
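
The device_map="auto" highlight is about spreading a model too large for one device across several. As a rough sketch of what an "auto" resolver does, here is a naive round-robin assignment of blocks to devices; the function and names are hypothetical, and real resolvers weigh per-block memory rather than assigning uniformly.

```python
from itertools import cycle

def auto_device_map(block_names, devices):
    """Assign each named model block to a device in round-robin order.
    Illustrative sketch of device_map="auto"-style placement only."""
    dev = cycle(devices)
    return {name: next(dev) for name in block_names}

blocks = [f"model.layers.{i}" for i in range(4)]
print(auto_device_map(blocks, ["cuda:0", "cuda:1"]))
```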

What's Changed

New Contributors

Full Changelog: v0.6.0...v0.7.0