diff --git a/README.md b/README.md
index 34f0490..ce32330 100644
--- a/README.md
+++ b/README.md
@@ -77,15 +77,13 @@ python main_large.py
```
## 🐤 Train quantized large models
-We also provide support for quantizing larger models, _e.g._, LLaMA 3.1 70B model, using the [GPTQ](https://arxiv.org/abs/2210.17323) algorithm and then optimizing the LoRA.
+We also provide support for quantizing larger models, _e.g._, the LLaMA 3.3 70B model, using the [GPTQ](https://arxiv.org/abs/2210.17323) algorithm and then optimizing the LoRA adapter.
***The large models can be deployed on consumer GPUs after quantization.***
-We can directly use the Hugging Face [transformers](https://github.com/huggingface/transformers) package to conduct quantization.
-```shell
-python quantization_HF.py --repo "meta-llama/Meta-Llama-3.1-70B-Instruct" --bits 4 --group_size 128
-```
+> [!IMPORTANT]
+> Due to the [suspended development of the AutoGPTQ package](https://github.com/vkola-lab/PodGPT/issues/1), we strongly recommend conducting quantization using the [GPTQModel](https://github.com/ModelCloud/GPTQModel) package!
-Or, we enable the Python [GPTQModel](https://github.com/ModelCloud/GPTQModel) package to conduct quantization.
+First, install the GPTQModel package,
```shell
pip install -v gptqmodel --no-build-isolation
```
@@ -95,20 +93,27 @@ Then,
python quantization_GPTQModel.py "meta-llama/Llama-3.3-70B-Instruct" "./gptq_model" --bits 4 --group_size 128 --seqlen 2048 --damp 0.01 --desc_act 1 --dtype bfloat16
```
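+If you would rather call the GPTQModel Python API directly, the following is a minimal sketch of such a quantization run; the calibration dataset, batch size, and output path are illustrative assumptions rather than the exact settings used by `quantization_GPTQModel.py`.
+```python
+from datasets import load_dataset
+from gptqmodel import GPTQModel, QuantizeConfig
+
+model_id = "meta-llama/Llama-3.3-70B-Instruct"
+quant_path = "./gptq_model"  # illustrative output directory
+
+# A small calibration set of raw text samples (assumed here: 1024 C4 examples).
+calibration_dataset = load_dataset(
+    "allenai/c4", data_files="en/c4-train.00001-of-01024.json.gz", split="train"
+).select(range(1024))["text"]
+
+# 4-bit quantization with group size 128, matching the command-line arguments above.
+quant_config = QuantizeConfig(bits=4, group_size=128)
+
+model = GPTQModel.load(model_id, quant_config)
+model.quantize(calibration_dataset, batch_size=1)  # increase batch_size if VRAM allows
+model.save(quant_path)
+```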
-Alternatively, we also provide a quantization script using the Python [AutoGPTQ](https://github.com/AutoGPTQ/AutoGPTQ) package.
+Alternatively, we can use the Hugging Face [transformers](https://github.com/huggingface/transformers) package to perform the quantization.
+```shell
+python quantization_HF.py --repo "meta-llama/Meta-Llama-3.1-70B-Instruct" --bits 4 --group_size 128
+```
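+Under the hood, the transformers route boils down to passing a `GPTQConfig` to `from_pretrained`; below is a minimal sketch with an illustrative calibration dataset and output path, not necessarily the exact arguments used by `quantization_HF.py`.
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig
+
+model_id = "meta-llama/Meta-Llama-3.1-70B-Instruct"
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+
+# 4-bit GPTQ with group size 128; "c4" is used as the calibration dataset here.
+gptq_config = GPTQConfig(bits=4, group_size=128, dataset="c4", tokenizer=tokenizer)
+
+model = AutoModelForCausalLM.from_pretrained(
+    model_id, device_map="auto", quantization_config=gptq_config
+)
+model.save_pretrained("./gptq_model")
+tokenizer.save_pretrained("./gptq_model")
+```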
+
+We also provide a quantization script based on the Python [AutoGPTQ](https://github.com/AutoGPTQ/AutoGPTQ) package.
+Please run `pip install auto-gptq==0.6.0 --no-build-isolation` to install AutoGPTQ.
```shell
python quantization.py "meta-llama/Meta-Llama-3.1-70B-Instruct" "./gptq_model" --bits 4 --group_size 128 --desc_act 1 --dtype bfloat16 --seqlen 2048 --damp 0.01
```
-Then, we need to upload the model to Hugging Face, for example,
+After the quantization process, you can upload the quantized model to the Hugging Face Hub, for example,
```shell
-python upload_quantized_model.py --repo "shuyuej/MedLLaMA3-70B-BASE-MODEL-QUANT" --folder_path "./gptq_model"
+python upload_quantized_model.py --repo "shuyuej/Llama-3.3-70B-Instruct-GPTQ" --folder_path "./gptq_model"
```
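+The upload step is essentially a `huggingface_hub` folder upload; here is a rough sketch, assuming you are already logged in (e.g., via `huggingface-cli login`) and using the illustrative repository name from above.
+```python
+from huggingface_hub import HfApi
+
+repo_id = "shuyuej/Llama-3.3-70B-Instruct-GPTQ"  # replace with your own repository
+api = HfApi()
+
+# Create the repository if it does not exist, then push the quantized checkpoint.
+api.create_repo(repo_id=repo_id, exist_ok=True)
+api.upload_folder(repo_id=repo_id, folder_path="./gptq_model")
+```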
-Lastly, we optimize the LoRA module,
+Finally, we optimize the LoRA adapter,
```shell
python main_quantization.py
```
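+For orientation, attaching a LoRA adapter to the quantized base model typically follows the standard `peft` pattern sketched below; the repository name, target modules, and hyperparameters are illustrative placeholders, not the exact configuration used by `main_quantization.py`.
+```python
+from transformers import AutoModelForCausalLM
+from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
+
+# Load the GPTQ-quantized base model (repository name is illustrative).
+model = AutoModelForCausalLM.from_pretrained(
+    "shuyuej/Llama-3.3-70B-Instruct-GPTQ", device_map="auto"
+)
+model = prepare_model_for_kbit_training(model)
+
+# Attach a trainable LoRA adapter on top of the frozen quantized weights.
+lora_config = LoraConfig(
+    r=16,
+    lora_alpha=32,
+    lora_dropout=0.05,
+    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
+    task_type="CAUSAL_LM",
+)
+model = get_peft_model(model, lora_config)
+model.print_trainable_parameters()  # only the LoRA weights are trainable
+```
+Keeping the quantized base weights frozen and training only the small adapter is what keeps the memory footprint within reach of consumer GPUs.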
+
_Quantized Model Training Special Notice_:
1. **Stable training** of the quantized model with a LoRA adapter is tricky.
We found the fine-tuned model tends to [**repeat the answer**](https://github.com/tloen/alpaca-lora/issues/467) during the generation process.