I tried quantizing the same model with the same quantize config. With AutoGPTQ I can finish in 15~20 minutes, but with this library I need over 2 hours...

Please use and increase the calibration batch size. auto-gptq has broken batch support for calibration; GPTQModel has batching support, but you need to set the batch size to a value appropriate to your GPU capability and VRAM size.

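In case a concrete example helps, here is a minimal sketch of setting the calibration batch size. It assumes the README-style GPTQModel API (`GPTQModel.load`, `QuantizeConfig`, and a `batch_size` argument to `quantize`); the model id, calibration data, and `batch_size` value are placeholders to adapt to your own GPU and VRAM.

```python
# Minimal sketch, assuming the README-style GPTQModel API; adjust the model id,
# calibration data, and batch_size for your GPU and VRAM size.
from datasets import load_dataset
from gptqmodel import GPTQModel, QuantizeConfig

model_id = "Qwen/QwQ-32B"           # example model; use your own
quant_path = "QwQ-32B-gptq-4bit"    # output directory for the quantized model

# Small calibration set; a larger set improves quality but takes longer.
calibration_dataset = load_dataset(
    "allenai/c4",
    data_files="en/c4-train.00001-of-01024.json.gz",
    split="train",
).select(range(1024))["text"]

quant_config = QuantizeConfig(bits=4, group_size=128)

model = GPTQModel.load(model_id, quant_config)

# batch_size is the knob discussed above: higher values speed up calibration
# but use more VRAM. Start small and increase until VRAM is nearly full.
model.quantize(calibration_dataset, batch_size=4)

model.save(quant_path)
```
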
@CHNtentes Was the speed issue resolved by increasing the batch size?

@CHNtentes In addition to the batch size suggestion above: quantization time is a bit slower than AutoGPTQ at the moment, but we hope to fix that in our next release. More importantly, GPTQModel's quants have consistently lower error loss, which is critical for quantization quality.

@CHNtentes The memory usage issue has been fixed in https://github.com/ModelCloud/GPTQModel/releases/tag/v1.6.0. VRAM usage is now 35% lower than before and 15% lower than AutoGPTQ. In our per-layer test quants with QwQ 32B, using the same calibration data, GPTQModel has consistently lower error_loss, which is critical for quantization. Speed is about 7.5% slower than AutoGPTQ for QwQ 32B, but with lower VRAM usage and higher quality quants it is worth the one-off cost. We will try to improve the speed in our next release.

We now use 90% less memory than AutoGPTQ when quantizing large models, and quantization should also be much faster with multi-GPU acceleration.