Commit 3f42edd

refine readme (#536)
1 parent 87579ef commit 3f42edd

File tree

2 files changed: +9 -3 lines changed

2 files changed

+9
-3
lines changed

README.md

Lines changed: 3 additions & 0 deletions
````diff
@@ -317,6 +317,8 @@ in [Gaudi Guide](https://docs.habana.ai/en/latest/).
 
 #### Gaudi/CPU/XPU/CUDA
 
+**Please avoid manually moving the quantized model to a different device** (e.g., model.to('cpu')) during inference, as this may cause unexpected exceptions.
+
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 from auto_round import AutoRoundConfig ## must import for auto-round format
@@ -512,3 +514,4 @@ If you find AutoRound useful for your research, please cite our paper:
 
 
 
+
````
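The new warning leaves implicit what to do instead: select the target device when the model is loaded, rather than calling `model.to(...)` afterwards. A minimal sketch of that pattern, reusing the model name from the backend example in docs/step_by_step.md; the `device_map` value and the prompt are assumptions for illustration:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRoundConfig  ## must import for auto-round format

model_name = "OPEA/Qwen2.5-1.5B-Instruct-int4-sym-inc"

# Pick the device at load time instead of moving the model afterwards;
# the README warns that model.to(...) on a quantized model may raise
# unexpected exceptions.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="cpu",  # or "auto", "cuda:0", "xpu", "hpu", depending on hardware
    torch_dtype="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Moving the *inputs* to the model's device is fine; only the model must stay put.
inputs = tokenizer("There is a girl who likes adventure,", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=50)[0]))
```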

docs/step_by_step.md

Lines changed: 6 additions & 3 deletions
````diff
@@ -138,7 +138,7 @@ output_dir = "./tmp_autoround"
 autoround.quantize_and_save(output_dir, format='auto_round')
 ```
 
-#### mixed bits Usage
+#### Mixed bits Usage
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 from auto_round import AutoRound
````
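The renamed section's code is cut off after the imports in this hunk. For context, a minimal sketch of what a mixed-bit setup can look like; the `layer_config` structure, the layer name, and the model name are assumptions here, not taken from the diff:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "Qwen/Qwen2.5-1.5B-Instruct"  # assumed example model
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Assumed shape of the per-layer override: keys are layer names, values
# override the global settings, e.g. keeping the output head at 8 bits
# while everything else is quantized to 4 bits.
layer_config = {"lm_head": {"bits": 8}}

autoround = AutoRound(model, tokenizer, bits=4, group_size=128, layer_config=layer_config)
autoround.quantize_and_save("./tmp_autoround", format="auto_round")
```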
````diff
@@ -320,6 +320,7 @@ autoround.quantize_and_save(output_dir, format='gguf:q4_0') # gguf:q4_1
 
 AutoRound automatically selects the best available backend based on the installed libraries and prompts the user to install additional libraries when a better backend is found.
 
+**Please avoid manually moving the quantized model to a different device** (e.g., model.to('cpu')) during inference, as this may cause unexpected exceptions.
 
 ### CPU
 
````
````diff
@@ -398,7 +399,8 @@ The backend may not always be the most suitable for certain devices.
 You can specify your preferred backend such as "ipex" for CPU and XPU, "marlin/exllamav2/triton" for CUDA, according to your needs or hardware compatibility. Please note that additional corresponding libraries may be required.
 
 ```python
-from transformers import AutoModelForCausalLM, AutoTokenizer, AutoRoundConfig
+from transformers import AutoModelForCausalLM, AutoTokenizer
+from auto_round import AutoRoundConfig
 
 model_name = "OPEA/Qwen2.5-1.5B-Instruct-int4-sym-inc"
 quantization_config = AutoRoundConfig(backend="ipex")
````
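The snippet above stops at the config line. Combined with the `generate` call visible in the next hunk's header, a runnable version looks roughly like the following; the `device_map` value and the prompt are assumptions:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRoundConfig

model_name = "OPEA/Qwen2.5-1.5B-Instruct-int4-sym-inc"
quantization_config = AutoRoundConfig(backend="ipex")  # the chosen backend library must be installed

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="cpu",
    quantization_config=quantization_config,
    torch_dtype="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

inputs = tokenizer("There is a girl who likes adventure,", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=50, do_sample=False)[0]))
```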
````diff
@@ -415,7 +417,8 @@ print(tokenizer.decode(model.generate(**inputs, max_new_tokens=50, do_sample=Fal
 Most GPTQ/AWQ models can be converted to the AutoRound format for better compatibility and support with Intel devices. Please note that the quantization config will be changed if the model is serialized.
 
 ```python
-from transformers import AutoModelForCausalLM, AutoTokenizer, AutoRoundConfig
+from transformers import AutoModelForCausalLM, AutoTokenizer
+from auto_round import AutoRoundConfig
 
 model_name = "ybelkada/opt-125m-gptq-4bit"
 quantization_config = AutoRoundConfig()
````
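Here too the hunk ends at the config line. A sketch of the conversion flow the surrounding text implies: loading a GPTQ checkpoint with an `AutoRoundConfig` and reserializing it (which, per the doc, changes the quantization config). The `device_map` value and output path are assumptions:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRoundConfig

model_name = "ybelkada/opt-125m-gptq-4bit"
quantization_config = AutoRoundConfig()

# Loading the GPTQ model through an AutoRoundConfig lets the AutoRound
# backend handle it on Intel devices.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="cpu",
    quantization_config=quantization_config,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Serializing the converted model rewrites its quantization config.
model.save_pretrained("./opt-125m-autoround")
tokenizer.save_pretrained("./opt-125m-autoround")
```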
