Commit 87579ef

update readme (#531)
1 parent 322ad6e commit 87579ef

File tree

3 files changed, +359 −117 lines changed


README.md

Lines changed: 6 additions & 5 deletions
```diff
@@ -72,7 +72,7 @@ pip install auto-round-lib
 
 ## Model Quantization
 
-### Basic Usage (Gaudi/CPU/XPU/GPU)
+### Command Line Usage (Gaudi/CPU/XPU/GPU)
 
 A user guide detailing the full list of supported arguments is provided by calling ```auto-round -h``` on the terminal.
 Set the format you want in `format` and
```
```diff
@@ -161,7 +161,7 @@ from auto_round import AutoRound
 bits, group_size, sym = 4, 128, True
 autoround = AutoRound(model, tokenizer, bits=bits, group_size=group_size, sym=sym)
 
-## the best accuracy, 3X slower, low_gpu_mem_usage could save ~20G but ~30% slower
+## the best accuracy, 4-5X slower, low_gpu_mem_usage could save ~20G but ~30% slower
 # autoround = AutoRound(model, tokenizer, nsamples=512, iters=1000, low_gpu_mem_usage=True, bits=bits, group_size=group_size, sym=sym)
 
 ## 2-3X speedup, slight accuracy drop at W4G128
```
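For readers skimming the commit, the recipe lines above slot into a full quantize-and-save run roughly as follows. This is a minimal sketch, assuming AutoRound's `quantize()` and `save_quantized()` entry points; the model name and output path are placeholders, not part of this commit:

```python
# Minimal sketch only: assumes AutoRound's quantize()/save_quantized() API;
# the model name and output path below are placeholders, not from this commit.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "facebook/opt-125m"  # placeholder model
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

bits, group_size, sym = 4, 128, True
# Default recipe; per the diff above, the nsamples=512 / iters=1000 variant is
# ~4-5X slower but more accurate, and low_gpu_mem_usage=True saves ~20G.
autoround = AutoRound(model, tokenizer, bits=bits, group_size=group_size, sym=sym)
autoround.quantize()
autoround.save_quantized("./tmp_autoround", format="auto_round")
```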
```diff
@@ -334,8 +334,8 @@ print(tokenizer.decode(model.generate(**inputs, max_new_tokens=50)[0]))
 
 AutoRound automatically selects the best available backend based on the installed libraries and prompts the user to
 install additional libraries when a better backend is found. On CUDA, the default priority is Marlin > ExLLaMAV2 >
-Triton, but the final choice depends on factors such as bits group_size packing format compatibility, etc. Please refer
-to the following table for the details.
+Triton, but the final choice depends on factors such as bits, group_size, packing format compatibility, etc. And the backend may not always be the most suitable for certain devices. Please refer
+to the following table for the details and specify the backend you want.
 
 | Name                                 | Devices | Bits    | Dtypes    | Priority | Packing format  | Requirements                  |
 |--------------------------------------|---------|---------|-----------|----------|-----------------|-------------------------------|
```
````diff
@@ -345,12 +345,13 @@ to the following table for the details.
 | exllamav2 or<br/>gptqmodel:exllamav2 | cuda    | 4       | BF16/FP16 | 5        | gptq            | gptqmodel                     |
 | exllamav2 or<br/>gptq:exllamav2      | cuda    | 4       | FP16      | 5        | gptq_zp+-1      | auto-gptq                     |
 | gptq:cuda                            | cuda    | 2,3,4,8 | FP16      | 0        | gptq_zp+-1      | auto-gptq                     |
-| triton                               | cuda    | 2,3,8   | BF16/FP16 | 1        | gptq/gptq_zp+-1 | auto-round                    |
+| triton                               | cuda    | 2,4,8   | BF16/FP16 | 1        | gptq/gptq_zp+-1 | auto-round                    |
 | awq                                  | cuda    | 4       | FP16      | 5        | awq             | auto-awq                      |
 | hpu                                  | hpu     | 4       | BF16      | 0        | gptq/gptq_zp+-1 | auto-round                    |
 
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
+from auto_round import AutoRoundConfig
 
 quantized_model_path = "./tmp_autoround"
 quantization_config = AutoRoundConfig(backend="auto")
````
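The Python snippet in this last hunk is cut off by the diff context. Below is a minimal sketch of how such a load typically continues, assuming the `AutoRoundConfig` integration with `from_pretrained` shown above; the prompt, `device_map` setting, and backend choice are illustrative placeholders, not part of the commit:

```python
# Minimal sketch of loading a quantized model with an explicit backend;
# assumes the AutoRoundConfig/from_pretrained integration shown in the hunk.
# Prompt, device_map, and backend string are placeholders, not from this commit.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRoundConfig

quantized_model_path = "./tmp_autoround"
# "auto" picks by priority; per the new README text, pass a specific name from
# the table (e.g. "triton") when the default is not the best fit for the device.
quantization_config = AutoRoundConfig(backend="auto")
model = AutoModelForCausalLM.from_pretrained(
    quantized_model_path,
    device_map="auto",
    quantization_config=quantization_config,
)
tokenizer = AutoTokenizer.from_pretrained(quantized_model_path)
inputs = tokenizer("There is a girl who likes adventure,", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=50)[0]))
```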
