truongdo619/WIP

Weighted-Iterative Pruning (WIP)

Weighted Importance Meets Iterative Pruning for Zero-Shot LLM Compression

WIP is a post-training pruning method that compresses large language models (LLMs) without retraining or fine-tuning. It builds on recent methods (e.g., Wanda, RIA, SparseGPT) with two additions:

  1. Weighted Importance Metric: A tunable metric that balances row-wise and column-wise contributions to avoid channel collapse.
  2. Iterative Multi-stage Pruning: Prunes in several stages and recalculates importance scores between stages, reducing the accuracy degradation common in one-shot pruning.
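
The two ideas above can be illustrated with a small, dependency-free sketch. It is not the code in this repository: the exact scoring formula, the `alpha` blending parameter, the linear sparsity schedule, and the function names `weighted_importance` and `iterative_prune` are all assumptions made for illustration. The score here blends each weight's share of its row and of its column, scaled by a per-input activation norm in the spirit of Wanda and RIA.

```python
from typing import List

Matrix = List[List[float]]


def weighted_importance(weights: Matrix, act_norms: List[float], alpha: float = 0.5) -> Matrix:
    """Hypothetical score: blend of row-relative and column-relative magnitude.

    alpha=1.0 uses only the row-wise term, alpha=0.0 only the column-wise term.
    act_norms holds one activation norm per input column (an assumption).
    """
    rows, cols = len(weights), len(weights[0])
    # "or 1.0" avoids division by zero for rows/columns that are fully pruned.
    row_sums = [sum(abs(w) for w in row) or 1.0 for row in weights]
    col_sums = [sum(abs(weights[i][j]) for i in range(rows)) or 1.0 for j in range(cols)]
    return [
        [
            (alpha * abs(weights[i][j]) / row_sums[i]
             + (1.0 - alpha) * abs(weights[i][j]) / col_sums[j]) * act_norms[j]
            for j in range(cols)
        ]
        for i in range(rows)
    ]


def iterative_prune(weights: Matrix, act_norms: List[float], sparsity: float,
                    stages: int = 4, alpha: float = 0.5) -> Matrix:
    """Reach the target sparsity in several stages, rescoring before each stage."""
    pruned = [row[:] for row in weights]  # work on a copy
    total = len(pruned) * len(pruned[0])
    for stage in range(1, stages + 1):
        # Assumed linear schedule: stage k targets k/stages of the final sparsity.
        target_zeros = int(total * sparsity * stage / stages)
        # Scores are recomputed on the partially pruned matrix, so the row and
        # column sums reflect the weights removed in earlier stages.
        scores = weighted_importance(pruned, act_norms, alpha)
        alive = sorted(
            (scores[i][j], i, j)
            for i, row in enumerate(pruned)
            for j, w in enumerate(row)
            if w != 0.0
        )
        already_zero = total - len(alive)
        for _, i, j in alive[: max(0, target_zeros - already_zero)]:
            pruned[i][j] = 0.0
    return pruned
```

Setting `stages=1` reduces this sketch to one-shot pruning, which makes it easy to compare the two regimes on the same matrix. The sketch ranks weights across the whole matrix; a per-row comparison group, as used by some prior methods, would be an equally plausible choice.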

Installation

Create a new conda environment and install dependencies:

conda create -n wip python=3.10
conda activate wip
pip install torch==2.1.0 --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt

Optional: for zero-shot evaluation, install lm-evaluation-harness from source:

git clone https://github.com/EleutherAI/lm-evaluation-harness.git
cd lm-evaluation-harness
pip install -e .

Usage

In the commands below, replace YOUR_MODEL_NAME with the model to prune (e.g., "huggingface/llama2-7b").

Unstructured pruning

python main.py \
  --model YOUR_MODEL_NAME \
  --prune_method wip \
  --sparsity_ratio 0.5 \
  --sparsity_type unstructured \
  --save

Semi-structured pruning (e.g., 2:4 pattern)

python main.py \
  --model YOUR_MODEL_NAME \
  --prune_method wip \
  --sparsity_ratio 0.5 \
  --sparsity_type 2:4 \
  --save
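
For readers unfamiliar with the 2:4 pattern: every group of four consecutive weights keeps at most two nonzero entries. The sketch below shows how such a mask could be derived from per-weight importance scores for a single row. The function name `n_m_mask` and the choice to keep the highest-scoring entries in each group are illustrative assumptions, not the repository's implementation.

```python
from typing import List


def n_m_mask(scores: List[float], n: int = 2, m: int = 4) -> List[bool]:
    """Return a keep-mask that retains the n highest-scoring entries
    in each consecutive group of m (hypothetical helper)."""
    if len(scores) % m != 0:
        raise ValueError("row length must be a multiple of m")
    mask = [False] * len(scores)
    for start in range(0, len(scores), m):
        group = range(start, start + m)
        # Indices of the n largest scores within this group.
        keep = sorted(group, key=lambda idx: scores[idx], reverse=True)[:n]
        for idx in keep:
            mask[idx] = True
    return mask


if __name__ == "__main__":
    row_scores = [0.1, 0.9, 0.5, 0.2, 4.0, 3.0, 2.0, 1.0]
    print(n_m_mask(row_scores))
```

With `n=2, m=4` exactly half of the weights in each row are kept, which is why the command above pairs `--sparsity_type 2:4` with `--sparsity_ratio 0.5`.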

GPU Inference Speedup (Semi-structured sparsity)

We measure inference speedup on GPUs that support semi-structured sparsity (e.g., NVIDIA Ampere and Hopper architectures), using TensorRT-LLM to run models pruned to an N:M sparsity pattern. For more details, refer to this issue.
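
Sparse kernels only give a speedup when the weights actually follow the N:M pattern, so a quick sanity check before benchmarking can be useful. The helper below is a minimal sketch with a hypothetical name, `satisfies_n_m`; it works on a plain list of weights for one row and is not part of this repository or of TensorRT-LLM.

```python
from typing import Sequence


def satisfies_n_m(row: Sequence[float], n: int = 2, m: int = 4) -> bool:
    """True if every consecutive group of m weights has at most n nonzeros."""
    if len(row) % m != 0:
        return False  # the row cannot be split into whole groups of m
    return all(
        sum(1 for w in row[start:start + m] if w != 0) <= n
        for start in range(0, len(row), m)
    )


if __name__ == "__main__":
    print(satisfies_n_m([1.0, 0.0, 0.0, 2.0, 0.0, 0.0, 3.0, 0.0]))
    print(satisfies_n_m([1.0, 2.0, 3.0, 0.0]))
```

In practice the same check would be applied to each row of every pruned linear layer, for example after converting the weight tensor to nested lists.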


Acknowledgments

This repository extends prior work from SparseGPT, Wanda, and RIA.

For questions or suggestions, feel free to contact us.
