Releases: ModelCloud/GPTQModel
GPTQModel v5.0.0
Notable Changes:
- New data-parallel quant support for MoE models on multi-gpu using nogil Python (Python >= 3.13t with `PYTHON_GIL=0` env).
- New `offload_to_disk` support, enabled by default, to massively reduce CPU RAM usage.
- New Intel-optimized and AMD-compatible CPU hw-accelerated `TorchFused` kernel.
- Packing stage is now 4x faster and inlined with quantization.
- VRAM pressure for large models reduced during quantization.
- `act_group_aware` is now 16k+ times faster and the default when `desc_act=False`, for higher-quality recovery without the inference penalty of `desc_act=True`.
- New beta-quality AWQ support with full GEMM, GEMM_Fast, and Marlin kernel support.
- New LFM, Ling, and Qwen3 Omni model support.
- Bitblas kernel updated to support the Bitblas 0.1.0.post1 release.
- Quantization is now faster with reduced VRAM usage.
- Enhanced logging support with LogBar.
- And much, much more...
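Taken together, a minimal sketch of what a v5.0.0 quant run could look like under these defaults (`offload_to_disk` on by default, `act_group_aware` active when `desc_act=False`). The model id and calibration set are placeholders, and the commented-out field names are assumptions taken from the notes above, not confirmed API:

```python
# Data-parallel MoE quantization needs a free-threaded interpreter:
#   PYTHON_GIL=0 python quantize_moe.py   (Python >= 3.13t)
from gptqmodel import GPTQModel, QuantizeConfig

calibration = ["GPTQModel quantizes LLMs with minimal accuracy loss."] * 256

quant_config = QuantizeConfig(
    bits=4,
    group_size=128,
    desc_act=False,  # with desc_act=False, act_group_aware is now the default
    # Assumed field names based on the release notes; verify against the API:
    # act_group_aware=True,  # higher-quality recovery, no desc_act inference penalty
    # offload_to_disk=True,  # already on by default in v5.0.0 to cut CPU RAM usage
)

model = GPTQModel.load("Qwen/Qwen3-30B-A3B", quant_config)  # placeholder model id
model.quantize(calibration)  # packing now runs inline with quantization
model.save("qwen3-moe-gptq-4bit")
```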
What's Changed
- rename `torch_dtype` to `dtype` to sync with hf transformers by @Qubitium in #1804 (see the sketch after this list)
- drop support for python < 3.11 by @CSY-ModelCloud in #1805
- hard deprecated ipex in favor of torch_fused by @Qubitium in #1807
- update pyproject.toml by @CSY-ModelCloud in #1808
- [CI] release with 3.13t by @CSY-ModelCloud in #1811
- [QUANTIZATION] Add AWQ support by @ZX-ModelCloud in #1703
- find mapping by @LRL-ModelCloud in #1812
- Update README.md by @Qubitium in #1813
- Update version.py by @Qubitium in #1814
- Turtle in a half shell by @Qubitium in #1809
- note about memory saving by @Qubitium in #1817
- move fail_safe by @LRL-ModelCloud in #1818
- rename turtle method by @Qubitium in #1820
- add threads by @Qubitium in #1821
- remove AWQ mod defs by @ZX-ModelCloud in #1822
- [CI] use new docker by @CSY-ModelCloud in #1823
- Fix awq quantize by @LRL-ModelCloud in #1824
- [CI] use new docker for release source by @CSY-ModelCloud in #1825
- fix awq pack by @LRL-ModelCloud in #1826
- fix loading autoawq models and hf/vllm/sglang loading of newly awq qu… by @Qubitium in #1827
- wrong arg check by @Qubitium in #1828
- fix thread task var scoping by @Qubitium in #1829
- fix call param by @Qubitium in #1830
- fix threads > 1 not considered (unsafe) by @Qubitium in #1832
- cleanup by @Qubitium in #1833
- fix gptqmodel offload paths conflict by @Qubitium in #1834
- CI test by @Qubitium in #1835
- eora: always diff in fp32 + cleanup by @Qubitium in #1836
- add register_buffer/parameter to NamedModule class by @Qubitium in #1837
- typo by @Qubitium in #1839
- add thread safety to all classes by @Qubitium in #1840
- fix fail_safe by @LRL-ModelCloud in #1844
- update marlin kernel by @ZX-ModelCloud in #1838
- fix fp32 reduce on/off by @Qubitium in #1845
- bypass marlin kernel bias issue by @Qubitium in #1846
- disable marlin atomics by default as it failed ci accuracy test by @Qubitium in #1847
- [FIX] awq marlin by @ZX-ModelCloud in #1816
- cleanup var names by @Qubitium in #1849
- pack per module by @LRL-ModelCloud in #1842
- [CI] use new docker by @CSY-ModelCloud in #1850
- tweak eora test by @Qubitium in #1851
- wait for thread tasks only when every module has completed. by @Qubitium in #1852
- [FIX] Compatible with vllm v0.10.2 by @ZX-ModelCloud in #1855
- move req.txt into toml by @CSY-ModelCloud in #1858
- do not create buffers only to overwrite them by @Qubitium in #1857
- pop states after use by @Qubitium in #1859
- [FIX] multiple "register_buffers" parameters by @ZX-ModelCloud in #1860
- Low memory pack by @Qubitium in #1861
- fix packing ci test by @Qubitium in #1862
- simplify by @Qubitium in #1853
- Fix 3bit packing regression in previous commit by @Qubitium in #1863
- remove deprecated `parallel_packing` property by @Qubitium in #1864
- Fix qqq quant/offloading by @Qubitium in #1866
- temp disable awq gemm kernel due to failing ci by @Qubitium in #1867
- update vllm compat by @Qubitium in #1869
- fix regression by @Qubitium in #1870
- fix setup.py crash when torch does not support float8_e8m0fnu by @CSY-ModelCloud in #1871
- [FIX] AwqGEMMQuantLinear skip gptq_v1 convert to v2 by @ZX-ModelCloud in #1872
- Fix awq gemm auto kernel selection order by @Qubitium in #1873
- Update README.md by @Qubitium in #1874
- reduce forwarding to minimal by @Qubitium in #1876
- Update README.md by @Qubitium in #1877
- fix exllama tests by @Qubitium in #1879
- debug print all params/buffers by @Qubitium in #1880
- skip internal loading of non-pkg compatible quantization models, i.e.… by @Qubitium in #1881
- Loader by @Qubitium in #1882
- Cleanup awq by @Qubitium in #1883
- remove broken test by @Qubitium in #1884
- [CI] remove old cuda/torch support for release by @CSY-ModelCloud in #1885
- fix loader by @LRL-ModelCloud in #1886
- fix nvcc warnings about pending cuda > 13.x compat by @Qubitium in #1887
- fix packing speed test by @Qubitium in #1889
- fix licenses warning by @CSY-ModelCloud in #1888
- set licenses to apache by @CSY-ModelCloud in #1890
- [FIX] AwqGEMMQuantLinear should be PackableQuantLinear by @ZX-ModelCloud in #1891
- skip modules that have no parameters and no buffers since they can't be offloaded by @LRL-ModelCloud in #1892
- skip modules that have no parameters and no buffers since they can't offload by @LRL-ModelCloud in #1894
- Fix device check by @Qubitium in #1896
- [CI] disable test install by @CSY-ModelCloud in #1895
- remove hash feature by @Qubitium in #1897
- fix cuda ext cannot be loaded by @Qubitium in #1898
- lock numpy to 2.2.6 by @CSY-ModelCloud in #1899
- [FIX] test_lm_eval.py by @ZX-ModelCloud in #1900
- Patch fix model save by @Qubitium in #1901
- Ugly patch save 2 by @Qubitium in #1902
- fix potential leak by @Qubitium in #1904
- [FIX] test_integration by @ZX-ModelCloud in #1903
- fix build uploading an empty wheel by @CSY-ModelCloud in #1905
- fix lm_head quant by @LRL-ModelCloud in #1906
- batch tweaks by @Qubitium in #1907
- [FIX] test_kernel_output_torch_fused by @ZX-ModelCloud in ...
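For the `torch_dtype` → `dtype` rename in #1804 (first item in this list), callers should switch to the new kwarg, mirroring the same rename in HF Transformers. A minimal before/after sketch with a placeholder model id:

```python
from gptqmodel import GPTQModel

# Before (deprecated): GPTQModel.load("ModelCloud/some-gptq-model", torch_dtype="auto")
# After (synced with HF Transformers' kwarg rename):
model = GPTQModel.load("ModelCloud/some-gptq-model", dtype="auto")
```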
GPTQModel v4.2.5
What's Changed
- Cleanup hyb_act by @Qubitium in #1791
- Remove torch import in setup.py by @Qubitium in #1729
- Refactor: rename `hyb_act` to `act_group_aware` by @Qubitium in #1794
- Cleanup by @Qubitium in #1795, #1796
- [CI] Add torch 2.8.0 by @CSY-ModelCloud in #1797
- [CI] torch-2.6.0+cu128-python-3.9 does not exist by @CSY-ModelCloud in #1798
- Fix wf_unsqueeze_zero and wf_unsqueeze_neg_one by @LRL-ModelCloud in #1799
- GAR field save to meta on quant save by @Qubitium in #1800
- Add pyproject.toml by @CSY-ModelCloud in #1801
- [CI] Don't detect arch list when it has already been set & fix build-system requirements by @CSY-ModelCloud in #1802
Full Changelog: v4.2.0...v4.2.5
GPTQModel v4.2.0
Notable Changes
- Add Qwen3-Next by @Qubitium and @LRL-ModelCloud in #1787
- Add Apertus support by @LRL-ModelCloud in #1767
- Add Kimi k2 support by @LRL-ModelCloud in #1768
- Add Klear support by @LRL-ModelCloud in #1769
- Add FastLLM support by @LRL-ModelCloud in #1771
- Add Nemotron H support by @LRL-ModelCloud in #1773
- Add `fail_safe` option by @LRL-ModelCloud in #1775 (see the sketch after this list)
- Use threading lock to protect unsafe tensor moves in multi-gpu by @Qubitium in #1778
- Avoid building experimental extensions to reduce wheel size by @Qubitium in #1763
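A sketch of how the new `fail_safe` option might be passed. Whether it lives on the config or on `quantize()` is an assumption, so verify against the shipped signature; the model id is a placeholder:

```python
from gptqmodel import GPTQModel, QuantizeConfig

calibration = ["example calibration text"] * 128

model = GPTQModel.load(
    "Qwen/Qwen3-Next-80B-A3B-Instruct",  # placeholder model id
    QuantizeConfig(bits=4, group_size=128),
)
# fail_safe is assumed to guard modules (e.g. MoE experts) that saw too little
# activation during calibration; kwarg placement is an assumption.
model.quantize(calibration, fail_safe=True)
```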
What's Changed
- Fix LlavaQwen2GPTQ by @LRL-ModelCloud in #1772
- Fix Q.to on multi-gpu gptq when proceeding fast with many experts and gpus by @avtc in #1774
- Bump actions/setup-python from 5 to 6 in the github-actions group by @dependabot[bot] in #1758
- [CI] fix release jobs were skipped by @CSY-ModelCloud in #1759
- ignore compile warns about var declared but not used by @Qubitium in #1760
- allow prebuilt wheel path to be customized via env by @Qubitium in #1761
- add build toggles for all cpp kernels by @Qubitium in #1764
- fix multi gpu inference by @LRL-ModelCloud in #1762
- [CI] reduce wheel download size by @CSY-ModelCloud in #1765
- start 4.2.0-dev cycle by @Qubitium in #1766
- fix klear by @LRL-ModelCloud in #1770
- FIX transformers >= 4.56.1 force-changed `torch.default_dtype` by @Qubitium in #1779
- fix multi gpu fail_safe by @LRL-ModelCloud in #1780
- fix device instance by @LRL-ModelCloud in #1783
- prepare for 4.2 release by @Qubitium in #1785
Full Changelog: v4.1.0...v4.2.0
GPTQModel v4.1.0
Notable Changes:
- Add a config option: `mock_quantization` to simplify heavy computations… by @avtc in #1731 (see the sketch after this list)
- Add GLM-4.5-Air support by @avtc in #1730
- Add GPT-OSS support by @LRL2-ModelCloud in #1737
- Add LongCatFlashGPTQ by @LRL-ModelCloud in #1751
- Add Llama 4 Support by @Qubitium in #1508
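A dry-run sketch using the new `mock_quantization` option from #1731. Treating it as a `QuantizeConfig` field is an assumption, as is the model id:

```python
from gptqmodel import GPTQModel, QuantizeConfig

# mock_quantization skips the heavy math so a full pipeline pass can be
# smoke-tested quickly; field placement is assumed, not confirmed.
cfg = QuantizeConfig(bits=4, group_size=128, mock_quantization=True)
model = GPTQModel.load("zai-org/GLM-4.5-Air", cfg)  # placeholder model id
model.quantize(["smoke-test sample"] * 8)
```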
What's Changed
- Minor Cleanup by @Qubitium in #1718
- disable some compilation on torch 2.8 due to compat issues by @Qubitium in #1727
- add glm4 moe test by @LRL2-ModelCloud in #1734
- deprecate autoround by @Qubitium in #1735
- [FIX] test_kernel_output with XPU by @ZX-ModelCloud in #1741
- cleanup checks for GIL control, GIL=0, and python >= 3.13.3t by @Qubitium in #1743
- update torch/transformer depends by @Qubitium in #1749
- reduce pkg depend by @Qubitium in #1750
- fix triton compat check for 3.13.3t by @Qubitium in #1752
- Bump torch from 2.7.1 to 2.8.0 in /gptqmodel_ext/exllama_eora by @dependabot[bot] in #1755
- pkg update: tokenicer 0.0.5 by @Qubitium in #1756
Full Changelog: v4.0.0...v4.1.0
GPTQModel v4.0.0
Notable Changes
- Support: add glm4 by @glide-the in #1559
- Add Xiaomi MiMo model by @Qubitium in #1571
- Free-threading (GIL-free) quantization for linear NxGPU scaling by @Qubitium in #1581 (see the GIL check sketch after this list)
- feat: add Qwen-Omni support. by @tiger-of-shawn in #1613
- add Qwen 2.5 Omni support by @Qubitium in #1615
- [MODEL] ERNIE4.5 by @LRL-ModelCloud in #1645
- [MODEL] support pangu_alpha model by @ZX-ModelCloud in #1646
- new baidu ernie & huawei pangu model support by @Qubitium in #1647
- [MODEL] Add falcon h1 support by @LRL-ModelCloud in #1621
- feat(gemma3): also support larger gemma3 models and not only small te… by @joennlae in #1627
- Add Group Aware Reordering (GAR) for Efficient Activation Reordering by @tgafni in #1656
- Enable pytorch fused op on XPU by @jiqing-feng in #1660
- [MODEL] Add Seed-OSS support by @LRL2-ModelCloud in #1702
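Since free-threading and GIL-control checks feature in this release, here is a minimal, self-contained sketch of how a caller can verify it is on a free-threaded build before expecting NxGPU quantization scaling. `sys._is_gil_enabled()` and the `Py_GIL_DISABLED` config var are standard CPython 3.13 features:

```python
import sys
import sysconfig

def gil_free() -> bool:
    """True only on a free-threaded (3.13t) build running with the GIL off."""
    if sysconfig.get_config_var("Py_GIL_DISABLED") != 1:
        return False  # regular GIL build: no free-threaded scaling
    # On 3.13t the GIL can still be re-enabled at runtime (e.g. PYTHON_GIL=1),
    # so also check the live interpreter state.
    return not sys._is_gil_enabled()

if __name__ == "__main__":
    print("free-threaded quantization scaling available:", gil_free())
```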
Other Changes
- [CI] add release source with github's vm by @CSY-ModelCloud in #1543
- Fix rotation for tied embedding models by @smpanaro in #1550
- Fix input processing for convolution by @Cecilwang in #1554
- [FIX] moe model quant division by zero issue by @LRL-ModelCloud in #1565
- [FIX] remove too short calib data by @LRL-ModelCloud in #1566
- [FIX] hook_module and qwen3_moe by @LRL-ModelCloud in #1569
- [FIX] hook linear and triton by @LRL-ModelCloud in #1570
- [MISC] simplify model definition by @LRL-ModelCloud in #1572
- [FIX] qwen2 moe loop module by @LRL-ModelCloud in #1574
- [CI] fix unit test was unable to run by @CSY-ModelCloud in #1580
- fix has_gil was not imported & device-smi api wrong by @CSY-ModelCloud in #1586
- fix older python didn't have EnumType by @CSY-ModelCloud in #1590
- [FIX] get_module_by_name_prefix by @LRL-ModelCloud in #1591
- [CI] update release CI, add torch 2.7.0 by @CSY-ModelCloud in #1592
- [FIX] Qwen2.5 vl quant by @LRL-ModelCloud in #1623
- Bump torch from 2.6.0 to 2.7.1 in /gptqmodel_ext/exllama_eora by @dependabot[bot] in #1628
- fix bug for device error by @kaixuanliu in #1631
- [FIX] config seq len by @LRL-ModelCloud in #1640
- register buffer for `wf_unsqueeze_zero` and `wf_unsqueeze_neg_one` to… by @kaixuanliu in #1642
- set_postfix is a tqdm function, no need anymore by @CSY-ModelCloud in #1643
- fix exception to avoid memory issue by @jiqing-feng in #1679
- lm_head hooked by @Chunfei-He in #1673
- Bump the github-actions group across 1 directory with 2 updates by @dependabot[bot] in #1677
- Model config.use_cache not correctly used during inference for some models by @LRL2-ModelCloud in #1686
- [FIX] transformers compat by @LRL2-ModelCloud in #1687
- Update module_looper.py by @LRL2-ModelCloud in #1690
- Update requirements.txt by @LRL2-ModelCloud in #1689
- add ACCEPT_USE_FLASH_ATTEN2_ARG by @LRL2-ModelCloud in #1693
- Fix kwarg vs pos arg hidden states by @LRL2-ModelCloud in #1694
- fix import Perplexity failed by @CSY-ModelCloud in #1695
- [CI] fix CI installed wrong libs' version by @CSY-ModelCloud in #1696
- [FIX] GIL Check by @ZX-ModelCloud in #1697
- [FIX] minicpm test by @LRL2-ModelCloud in #1698
- [FIX] use AutoModelForImageTextToText instead of AutoModelForVision2Seq by @ZX-ModelCloud in #1699
- [CI] change qwen2.5-omni model path by @ZX-ModelCloud in #1701
- [CI] install jieba for test_pangu_alpha by @CSY-ModelCloud in #1706
- disable torch.compile by @LRL2-ModelCloud in #1707
- FIX minicpm CI test by @LRL2-ModelCloud in #1708
- [CI] update torch for build by @CSY-ModelCloud in #1709
- [CI] update release matrix by @CSY-ModelCloud in #1710
- [CI] install torch compiled with cuda 126 by @CSY-ModelCloud in #1711
- use "attn_implementation" by @LRL2-ModelCloud in #1712
- [CI] add 5090 support & install latest intel_extension_for_pytorch by @CSY-ModelCloud in #1713
- [CI] don't compile 5090 for cuda < 12.8 by @CSY-ModelCloud in #1714
- [CI] Update unit test docker by @CSY-ModelCloud in #1715
- [CI] fix release ci by @CSY-ModelCloud in #1716
- fix model path is not public by @CSY-ModelCloud in #1720
- [CI] don't exit when package doesn't exist by @CSY-ModelCloud in #1719
- [CI] no need install logbar manually by @CSY-ModelCloud in #1721
- [CI] remove legacy tests & skip intel tests & disable flash_attn for some models by @CSY-ModelCloud in #1722
- [CI] no need install uv by @CSY-ModelCloud in #1723
- [CI] use new docker with uv binary to fix shim/uv didn't exist by @CSY-ModelCloud in #1724
New Contributors
- @Cecilwang made their first contribution in #1554
- @glide-the made their first contribution in #1559
- @tiger-of-shawn made their first contribution in https://github.com/ModelClo...
GPTQModel v3.0.0
🎉 New ground-breaking GPTQ v2 quantization option for improved model quantization accuracy, validated by GSM8K_PLATINUM benchmarks vs the original GPTQ.
✨ New Phi4-MultiModal model support.
✨ New Nvidia Nemotron Ultra model support.
✨ New Dream model support.
✨ New experimental multi-gpu quantization support.
✨ Reduced VRAM usage and faster quantization.
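A sketch of opting into the new GPTQ v2 path. The exact toggle name (`v2` on `QuantizeConfig` here) is an assumption, so check the config class for the shipped field; the model id is a placeholder:

```python
from gptqmodel import GPTQModel, QuantizeConfig

cfg = QuantizeConfig(bits=4, group_size=128, v2=True)  # "v2" field name is assumed
model = GPTQModel.load("microsoft/Phi-4-multimodal-instruct", cfg)  # placeholder id
model.quantize(["calibration sample"] * 256)
```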
What's Changed
- Multi GPU Quantization by @Qubitium in #1502
- experimental multi-gpu quantization by @Qubitium in #1503
- reduce allocation by @Qubitium in #1504
- revert add_ by @Qubitium in #1506
- Switch to non-deprecated mlx.core.clear_cache() by @smpanaro in #1510
- Dream Model Support by @Qubitium in #1512
- fix disabling batch/mask for dream by @Qubitium in #1514
- reduce tensor device movement by @Qubitium in #1516
- fix deepseek v3 module order by @Qubitium in #1517
- Nemotron Ultra Support by @Qubitium in #1518
- faster process_batch by @Qubitium in #1519
- Fix missing arg due to recent `Processor` api changes by @Qubitium in #1523
- Fix gpt2 columns calculation by @Qubitium in #1524
- temp damper should not overwrite damp cfg by @Qubitium in #1526
- Replace module hooking with tree-defined targeting by @Qubitium in #1527
- Fix compat with XPU by @Qubitium in #1535
- Phi4 MultiModal by @Qubitium in #1511
- disable selection of ExllamaV2 kernel for group_size=16 for now by @Qubitium in #1537
- Add Gptqv2 by @yhhhli and @Qubitium in #1533
Full Changelog: v2.2.0...v3.0.0
GPTQModel v2.2.0
What's Changed
✨ New Qwen 2.5 VL model support. Prelim Qwen 3 model support.
✨ New samples log column during quantization to track module activation in MoE models.
✨ Loss log column now color-coded to highlight modules that are friendly/resistant to quantization.
✨ Progress (per-step) stats during quantization now streamed to log file.
✨ Auto bfloat16 dtype loading for models based on model config.
✨ Fix kernel compile for Pytorch/ROCm.
✨ Slightly faster quantization and auto-resolve some low-level oom issues for smaller vram gpus.
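An illustration of the auto-bfloat16 loading rule described above: prefer the dtype recorded in the model config, and fall back to bf16 only where the hardware supports it. This is a standalone sketch of the policy, not GPTQModel's internal code:

```python
import torch
from transformers import AutoConfig

def resolve_load_dtype(model_id: str) -> torch.dtype:
    # prefer the dtype the model was saved with, if the config records one
    cfg_dtype = getattr(AutoConfig.from_pretrained(model_id), "torch_dtype", None)
    if isinstance(cfg_dtype, str):  # configs may store the dtype as a string
        cfg_dtype = getattr(torch, cfg_dtype, None)
    if isinstance(cfg_dtype, torch.dtype):
        return cfg_dtype
    # otherwise pick bf16 when the GPU supports it, else fp16
    if torch.cuda.is_available() and torch.cuda.is_bf16_supported():
        return torch.bfloat16
    return torch.float16
```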
- Enable ipex tests for CPU/XPU by @jiqing-feng in #1460
- test kernel accuracies with more shapes on cuda by @Qubitium in #1461
- Fix rocm flags by @Qubitium in #1467
- use table like logging format by @Qubitium in #1471
- stream process log entries to persistent file by @Qubitium in #1472
- fix some models need trust-remote-code arg by @Qubitium in #1474
- Fix wq dtype by @Qubitium in #1475
- add colors to quant loss column by @Qubitium in #1477
- add prelim qwen3 support by @Qubitium in #1478
- Update eora.py for further optimization by @nbasyl in #1488
- faster cholesky inverse and avoid oom when possible by @Qubitium in #1494
- [MODEL] supports qwen2_5_vl by @ZX-ModelCloud in #1493
Full Changelog: v2.1.0...v2.2.0
GPTQModel v2.1.0
What's Changed
✨ New QQQ quantization method and inference support!
✨ New Google Gemma 3 day-zero model support.
✨ New Alibaba Ovis 2 VL model support.
✨ New AMD Instella day-zero model support.
✨ New GSM8K Platinum and MMLU-Pro benchmarking support.
✨ Peft Lora training with GPTQModel is now 30%+ faster on all gpu and IPEX devices.
✨ Auto detect MoE modules not activated during quantization due to insufficient calibration data.
✨ ROCm setup.py compat fixes.
✨ Optimum and Peft compat fixes.
✨ Fixed Peft bfloat16 training.
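For the new GSM8K Platinum / MMLU-Pro benchmarking, one way to exercise a quantized checkpoint is through lm-eval's Python API. The task name `gsm8k_platinum` and the model id are assumptions, so check `lm_eval --tasks list` for the registered name:

```python
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=ModelCloud/some-gptq-model,device_map=auto",  # placeholder
    tasks=["gsm8k_platinum"],  # task name assumed; verify with `lm_eval --tasks list`
)
print(results["results"])
```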
- auto enable flash_attn only when flash-attn was installed by @CSY-ModelCloud in #1372
- Fix rocm compat by @Qubitium in #1373
- fix unnecessary mkdir by @CSY-ModelCloud in #1374
- add test_kernel_output_xpu.py by @CSY-ModelCloud in #1382
- clean test_kernel_output_xpu.py by @CSY-ModelCloud in #1383
- remove xpu support of triton kernel by @Qubitium in #1384
- [MODEL] Add instella support by @LRL-ModelCloud in #1385
- Fix optimum/peft trainer integration by @CSY-ModelCloud in #1381
- rename peft test file by @CSY-ModelCloud in #1387
- [CI] fix wandb was not installed & update test_olora_finetuning_xpu.py by @CSY-ModelCloud in #1388
- Add lm-eval `GSM8k Platinum` by @Qubitium in #1394
- Remove cuda kernel by @Qubitium in #1396
- fix exllama kernels not compiled by @Qubitium in #1397
- update tests by @Qubitium in #1398
- make the kernel output validation more robust by @Qubitium in #1399
- speed up ci by @Qubitium in #1400
- add fwd counter by @yuchiwang in #1389
- allow triton and ipex to inherit torch kernel and use torch for train… by @Qubitium in #1401
- fix skip moe modules when fwd count is 0 by @Qubitium in #1404
- fix ipex linear post init for finetune by @jiqing-feng in #1406
- fix optimum compat by @Qubitium in #1408
- [Feature] Add mmlupro API by @CL-ModelCloud in #1405
- add training callback by @CSY-ModelCloud in #1409
- Fix bf16 training by @Qubitium in #1410
- fix bf16 forward for triton by @Qubitium in #1411
- Add QQQ by @Qubitium in #1402
- make IPEX or any kernel that uses Torch for Training to auto switch v… by @Qubitium in #1412
- [CI] xpu inference test by @CL-ModelCloud in #1380
- [FIX] qqq with eora by @ZX-ModelCloud in #1415
- [FIX] device error by @ZX-ModelCloud in #1417
- make quant linear expose internal buffers by @Qubitium in #1418
- Fix bfloat16 kernels by @Qubitium in #1420
- fix qqq bfloat16 forward by @Qubitium in #1423
- Fix ci10 by @Qubitium in #1424
- fix marlin bf16 compat by @Qubitium in #1427
- [CI] no need reinstall requirements by @CSY-ModelCloud in #1426
- [FIX] dynamic save error by @ZX-ModelCloud in #1428
- [FIX] super().post_init() calling order by @ZX-ModelCloud in #1431
- fix bitblas choose IPEX in cuda env by @CSY-ModelCloud in #1432
- Fix exllama is not packable by @Qubitium in #1433
- disable exllama for training by @Qubitium in #1435
- remove TritonV2QuantLinear for xpu test by @CSY-ModelCloud in #1436
- [MODEL] add gemma3 support by @LRL-ModelCloud in #1434
- fix the error when downloading models using modelscope by @mushenL in #1437
- Add QQQ Rotation by @ZX-ModelCloud in #1425
- fix no init.py by @CSY-ModelCloud in #1438
- Fix hadamard import by @Qubitium in #1441
- Eora final by @nbasyl in #1440
- triton is not validated for ipex by @Qubitium in #1445
- Fix exllama adapter by @Qubitium in #1446
- fix rocm compile by @Qubitium in #1447
- [FIX] Correctly obtain the submodule's device by @ZX-ModelCloud in #1448
- fix rocm not compatible with exllama v2 and eora kernel by @Qubitium in #1449
- revert overflow code by @Qubitium in #1450
- add kernel dtype support and add full float16 vs bfloat16 kernel testing by @Qubitium in #1452
- [MODEL] add Ovis2 support and bug fix by @Fusionplay in #1454
- add unit test for ovis2 by @CSY-ModelCloud in #1456
New Contributors
- @yuchiwang made their first contribution in #1389
- @mushenL made their first contribution in #1437
- @nbasyl made their first contribution in #1440
- @Fusionplay made their first contribution in #1454
Full Changelog: v2.0.0...v2.1.0
GPTQModel v2.0.0
What's Changed
🎉 GPTQ quantization internals are now broken into multiple stages (processes) for feature expansion.
🎉 Synced Marlin kernel inference quality fix from upstream. Added MARLIN_FP16, a lower-quality but faster backend.
🎉 ModelScope support added.
🎉 Logging and cli progress bar output has been revamped with sticky bottom progress.
🎉 Added CI tests to track regression in kernel inference quality and sweep all bits/group_sizes.
🎉 Delegate logging/progress-bar to LogBar pkg.
🐛 Fix ROCm version auto detection in setup install.
🐛 Fixed generation_config.json save and load.
🐛 Fixed Transformers v4.49.0 compat. Fixed compat of models without bos.
🐛 Fixed group_size=-1 and bits=3 packing regression.
🐛 Fixed Qwen 2.5 MoE regressions.
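Selecting the new faster-but-lower-accuracy Marlin path at load time. The `BACKEND` enum is part of GPTQModel and `MARLIN_FP16` is named above, though the exact member spelling should be verified; the model id is a placeholder:

```python
from gptqmodel import BACKEND, GPTQModel

model = GPTQModel.load(
    "ModelCloud/some-4bit-gptq-model",  # placeholder model id
    backend=BACKEND.MARLIN_FP16,        # trades some accuracy for speed vs default Marlin
)
```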
- fix 3 bit packing regression, fixed #1278 by @CSY-ModelCloud in #1280
- Fix supported models list (syntax error) by @Forenche in #1281
- feat: load model from modelscope by @suluyana in #1283
- merge eval & utils.lm_eval by @CSY-ModelCloud in #1282
- fix modelscope import & tests by @CSY-ModelCloud in #1285
- allow passing model instance to evalplus & update tokenizer loading logics by @CSY-ModelCloud in #1284
- fix lm-eval & vllm check tokenizer type by @CSY-ModelCloud in #1287
- Fix `generation_config.json` not auto-saved by @Qubitium in #1292
- [SAVE] Save config files with empty state dict by @ZX-ModelCloud in #1293
- [SAVE] Save processor related config files by @ZX-ModelCloud in #1295
- fix wrong order of config save causing sharded tensors to be removed by @Qubitium in #1297
- [FIX] not pack when group_size=-1 by @ZX-ModelCloud in #1298
- cleanup marlin paths: marlin does conversion on `post_init` by @Qubitium in #1310
- bump tokenicer to v0.0.3 by @CSY-ModelCloud in #1308
- clean is_marlin_format for tests by @CSY-ModelCloud in #1311
- [CI] fix sglang test name & add status logs & remove exllama packing test by @CSY-ModelCloud in #1312
- skip v1 to v2 conversion for sym=True only kernels by @Qubitium in #1314
- bump tokenicer to 0.0.4 & remove FORMAT_FIELD_COMPAT_MARLIN by @CSY-ModelCloud in #1315
- revert is_marlin_format check by @CSY-ModelCloud in #1316
- Improve Marlin accuracy (default) but add `MARLIN_FP16` backend for faster inference at slightly lower accuracy by @Qubitium in #1317
- marlin fp32 mode should also be enabled if kernel was selected due to… by @Qubitium in #1318
- refactor logger by @Qubitium in #1319
- fix typo by @Qubitium in #1320
- refactor logger and have progress bar sticky to bottom of cli by @Qubitium in #1322
- [CI] fix tokenicer upgraded transformers & install bitblas for test_save_quanted_model by @CSY-ModelCloud in #1321
- [CI] allow to select compiler server & move model test to correct dir by @CSY-ModelCloud in #1323
- fix bitblas loading regression by @Qubitium in #1324
- marlin fp16 warning missed check by @Qubitium in #1325
- fix custom logger overriding system level logger by @Qubitium in #1327
- fix progress bar for packing by @CSY-ModelCloud in #1326
- More log fixes by @Qubitium in #1328
- fix no backend when creating a quant linear by @CSY-ModelCloud in #1329
- use relative path instead of importing gptqmodel by @CSY-ModelCloud in #1331
- no need patch vllm now by @CSY-ModelCloud in #1332
- [CI] fix CI url by @CSY-ModelCloud in #1333
- fix oom by @CSY-ModelCloud in #1335
- add default value for backend, fix optimum doesn't pass it by @CSY-ModelCloud in #1334
- refactor pb and pb usage by @Qubitium in #1341
- fix generator has no length info by @CSY-ModelCloud in #1342
- replace utils.Progressbar with logbar by @CSY-ModelCloud in #1343
- [CI] update UI by @CSY-ModelCloud in #1344
- fix logbar api usage by @CSY-ModelCloud in #1345
- fix v2 to v1 missed logic bypass by @Qubitium in #1347
- [CI] fix xpu env has no logbar by @CSY-ModelCloud in #1346
- [CI] update runner ip env & fix show-statistics didn't run by @CSY-ModelCloud in #1348
- fix time was not imported by @CSY-ModelCloud in #1349
- update device-smi depend to v0.4.0 by @Qubitium in #1351
- [CI] install requirements.txt for m4 by @CSY-ModelCloud in #1352
- Exllama V1 is Packable by @ZX-ModelCloud in #1356
- [FIX] test_packable.py by @ZX-ModelCloud in #1357
- [setup] use torch.version.hip for rocm version check by @CSY-ModelCloud in #1360
- save/load peft lora by @Qubitium in #1358
- update device-smi to 0.4.1 for rocm fix by @Qubitium in #1362
- strip model path by @Qubitium in #1363
- [CI] exllama v1 kernel now eligible for quant stage by @Qubitium in #1364
- Fix transformers modeling code passing `input.shape[0] == 0` to nn.module by @Qubitium in #1365
- simplify log var by @Qubitium in #1368
- fix import by @CSY-ModelCloud in #1369
- update by @Qubitium in #1370
Full Changelog: v1.9.0...v2.0.0
GPTQModel v1.9.0
What's Changed
⚡ Offload tokenizer fixes to Toke(n)icer pkg.
⚡ Optimized lm_head quant time and vram usage.
⚡ Optimized DeepSeek v3/R1 model quant vram usage.
⚡ 3x speed-up for Torch kernel when using Pytorch >= 2.5.0 with model.compile().
⚡ New `calibration_dataset_concat_size` option to enable calibration data concat mode, mimicking the original GPTQ data-packing strategy, which may improve quant speed and accuracy for datasets like wikitext2.
🐛 Fixed Optimum compat and XPU/IPEX auto kernel selection regression in v1.8.1.
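A sketch of the concat mode described above. `calibration_dataset_concat_size` is named in these notes, but passing it as a `quantize()` kwarg (and the 2048-token target, and the model id) are assumptions:

```python
from gptqmodel import GPTQModel, QuantizeConfig

model = GPTQModel.load(
    "meta-llama/Llama-3.2-1B",  # placeholder model id
    QuantizeConfig(bits=4, group_size=128),
)

# concat mode packs calibration rows into fixed-size sequences, mimicking the
# original GPTQ data-packing strategy (helps wikitext2-style datasets).
model.quantize(
    ["wikitext-style calibration text"] * 512,
    calibration_dataset_concat_size=2048,  # kwarg placement assumed
)
model.save("llama-gptq-4bit")
```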
- Fix init arg order and `optimum` compat by @CSY-ModelCloud in #1240
- [FIX][Optimize] lm_head quantize by @ZX-ModelCloud in #1239
- [Model] [DeepSeek] un-merge `gate_proj` and `up_proj` by @LRL-ModelCloud in #1241
- Use Toke(n)icer by @CL-ModelCloud in #1242, #1244
- Add Tokenicer Test by @CL-ModelCloud in #1245
- prepare for 1.8.2 release by @Qubitium in #1243
- simplify calls to tokenicer by @CL-ModelCloud in #1246
- Update requirements.txt by @Qubitium in #1248
- fix trust_remote was lost by @CSY-ModelCloud in #1249
- fix trust_remote was lost by @CSY-ModelCloud in #1250
- prepare for 1.8.5 release by @Qubitium in #1251
- fix unit tests & tweak logic for selecting backends by @CSY-ModelCloud in #1253
- install tokenicer form git & do ruff by @CSY-ModelCloud in #1254
- fix k,v is not a dict by @CSY-ModelCloud in #1255
- fix not enough values to unpack (expected 2, got 1) by @CSY-ModelCloud in #1256
- fix sglang test requires numpy<2.0 by @CSY-ModelCloud in #1258
- fix ipex backend by @jiqing-feng in #1259
- ipex should be packable, reverted pr #1259 importer.py changes by @CSY-ModelCloud in #1260
- remove sentencepiece by @CSY-ModelCloud in #1261
- speed up torch dequantize by @Qubitium in #1262
- Add `calibration_dataset_concat_size` option/mode by @LRL-ModelCloud in #1257
- add transformers test by @CSY-ModelCloud in #1264
- Add kernel torch.compile hook by @Qubitium in #1265
- [FIX]fix vl model prepare_dataset by @LRL-ModelCloud in #1266
Full Changelog: v1.8.1...v1.9.0