Commit 75ebc25

Merge branch 'main' into issue-1927-modernize-transformers

2 parents 1a56b66 + a270f33
File tree: 3 files changed (+9 -0 lines changed)

docs/getting-started/compress.md

Lines changed: 9 additions & 0 deletions
@@ -113,4 +113,13 @@ This hessian matrix is used to increase the accuracy recovery of the algorithm,

| **DeepSeek-R1-0528-BF16** | mem(684B params) ~= 1368Gb | mem(1 Layer) * 2 ~= 44.8Gb |
| **Qwen2.5-VL-7B-Instruct** | mem(7B params) ~= 14Gb | max(mem(1 Text Layer)~= 0.4B, mem(Vision tower)~=1.3B)*2 ~= 2.6Gb |

## Runtime requirements for LLM Compressor

The following are typical runtimes for each LLM Compressor algorithm, based on runs with Meta-Llama-3-8B-Instruct on an NVIDIA A100 Tensor Core GPU.

| Algorithm | Estimated Time |
|-----------|----------------|
| **RTN (QuantizationModifier)** <br> Weights only (no activation quantization) | ~ 1 minute |
| **RTN (QuantizationModifier)** <br> Weights and activations | ~ 20 minutes |
| **GPTQ** (weights only) | ~ 30 minutes |
| **AWQ** (weights only) | ~ 30 minutes |
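For context on what the fastest row in the table refers to: LLM Compressor algorithms are typically selected via a recipe that names a modifier such as `QuantizationModifier`. Below is a minimal, hypothetical sketch of what a weights-only recipe might look like; the exact schema keys (`quant_stage`, `config_groups`, `num_bits`, and so on) are assumptions for illustration, not taken from this commit, so consult the LLM Compressor documentation for the authoritative format.

```yaml
# Hypothetical weights-only RTN recipe sketch (schema assumed, not from this commit).
quant_stage:
  quant_modifiers:
    QuantizationModifier:
      ignore: ["lm_head"]          # skipping the output head is a common convention
      config_groups:
        group_0:
          targets: ["Linear"]      # apply to all Linear layers
          weights:
            num_bits: 8
            type: int
            symmetric: true
            strategy: channel      # per-channel weight scales
          # no "input_activations" section -> weights-only, the ~1 minute case above
```

Omitting an activation-quantization section keeps the run in the weights-only regime, which is why that row of the table is the fastest: no calibration of activation ranges is needed.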
