More details can be found in [Gaudi Guide](https://docs.habana.ai/en/latest/).
#### Gaudi/CPU/XPU/CUDA
**Please avoid manually moving the quantized model to a different device** (e.g., `model.to('cpu')`) during inference, as this may cause unexpected exceptions.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRoundConfig  ## must import for auto-round format
```
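As a fuller illustration, here is a minimal end-to-end sketch of loading and running a quantized checkpoint. The model name below is a placeholder for any AutoRound-format checkpoint, and `device_map="auto"` lets Transformers place the model instead of moving it manually:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRoundConfig  ## must import for auto-round format

model_name = "OPEA/Qwen2.5-1.5B-Instruct-int4-sym-inc"  # placeholder AutoRound-format checkpoint
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

text = "There is a girl who likes adventure,"
inputs = tokenizer(text, return_tensors="pt").to(model.device)  # stay on the model's own device
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=50)[0]))
```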
AutoRound automatically selects the best available backend based on the installed libraries and prompts the user to install additional libraries when a better backend is found.
### CPU
The backend may not always be the most suitable for certain devices. You can specify your preferred backend, such as "ipex" for CPU and XPU or "marlin/exllamav2/triton" for CUDA, according to your needs or hardware compatibility. Please note that additional corresponding libraries may be required.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRoundConfig  ## must import for auto-round format
```
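For example, a minimal sketch of pinning a backend, assuming the preference is passed through `AutoRoundConfig(backend=...)` and forwarded as `quantization_config` at load time (the checkpoint name is a placeholder):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRoundConfig  ## must import for auto-round format

model_name = "OPEA/Qwen2.5-1.5B-Instruct-int4-sym-inc"  # placeholder checkpoint
quantization_config = AutoRoundConfig(backend="ipex")  # "ipex" for CPU/XPU; "marlin"/"exllamav2"/"triton" for CUDA
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="cpu",
    quantization_config=quantization_config,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```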
Most GPTQ/AWQ models can be converted to the AutoRound format for better compatibility and support with Intel devices. Please note that the quantization config will be changed if the model is serialized.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRoundConfig  ## must import for auto-round format
```
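A sketch of one plausible conversion path, assuming that loading a GPTQ/AWQ checkpoint with an `AutoRoundConfig` as `quantization_config` triggers the conversion (the checkpoint name and output path are placeholders):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRoundConfig  ## must import for auto-round format

model_name = "TheBloke/Llama-2-7B-GPTQ"  # placeholder GPTQ checkpoint
quantization_config = AutoRoundConfig()  # assumed to request the AutoRound format on load
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="cpu",
    quantization_config=quantization_config,
)
# Per the note above, saving afterwards serializes the changed quantization config.
model.save_pretrained("llama-2-7b-autoround")  # hypothetical output path
```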