1 file changed: +4 −3 lines changed

@@ -1,6 +1,6 @@
 ---
 layout: post
-title: "Boosting vLLM Performance on AMD ROCm: PTPC-FP8 Quantization Unleashes Speed and Accuracy"
+title: "PTPC-FP8: Boosting vLLM Performance on AMD ROCm"
 author: "AMD and Embedded LLM"
 image: /assets/figures/ptpc/PTPC-tumbnail.png
 thumbnail-img: /assets/figures/ptpc/PTPC-tumbnail.png
@@ -36,7 +36,6 @@ LLMs develop activation outliers as they scale beyond certain sizes. These unusu
 - Most values receive few effective bits of precision when using per-tensor quantization
 - Outliers appear persistently in specific channels across different tokens
 - While weights are relatively uniform and easy to quantize, activations are not
-
 #### PTPC: A Precision-Targeted Approach

 PTPC-FP8 (Per-Token-Activation, Per-Channel-Weight FP8) addresses this challenge by using tailored scaling factors based on three key observations:
@@ -49,7 +48,9 @@ This insight led to a dual-granularity approach:
 * **Per-Token Activation Quantization**: Each input token receives its own scaling factor
 * **Per-Channel Weight Quantization**: Each weight column gets a unique scaling factor

-<img align="right" src="/assets/figures/ptpc/PTPC-Diagram.png" alt="Per-Token Activation + Per-Channel Weight Quantization" width="50%" height="50%">
+<div align="center">
+  <img src="/assets/figures/ptpc/PTPC-Diagram.png" alt="Per-Token Activation + Per-Channel Weight Quantization" width="80%">
+</div>

 #### Understanding the Diagram
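The dual-granularity scheme in the diff above can be sketched in a few lines of NumPy. This is an illustrative sketch, not vLLM's actual ROCm kernel: `448.0` is assumed as the OCP FP8 E4M3 maximum, and rounding stands in for the real FP8 cast.

```python
import numpy as np

# Assumed max representable value of the OCP FP8 E4M3 format
# (AMD's e4m3fnuz variant differs); illustrative only.
FP8_MAX = 448.0

def quantize_ptpc(x: np.ndarray, w: np.ndarray):
    """Hypothetical helper: x is activations (tokens, in_features),
    w is weights (in_features, out_features)."""
    # Per-token activation scale: one factor per row of x
    x_scale = np.maximum(np.abs(x).max(axis=1, keepdims=True), 1e-12) / FP8_MAX
    # Per-channel weight scale: one factor per output column of w
    w_scale = np.maximum(np.abs(w).max(axis=0, keepdims=True), 1e-12) / FP8_MAX
    x_q = np.round(x / x_scale)  # stand-in for the FP8 cast
    w_q = np.round(w / w_scale)
    return x_q, w_q, x_scale, w_scale

def matmul_dequant(x_q, w_q, x_scale, w_scale):
    # (tokens, 1) * (1, out_features) broadcasts so each output element
    # is rescaled by its token's and its channel's scale
    return (x_q @ w_q) * (x_scale * w_scale)
```

The point of the dual granularity is visible in `x_scale`: an outlier in one token inflates only that token's scale, leaving every other token's precision untouched.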