Weighted-Iterative Pruning (WIP)

Weighted Importance Meets Iterative Pruning for Zero-Shot LLM Compression

WIP is a post-training pruning method that shrinks large language models (LLMs) without retraining or fine-tuning. It builds on recent methods (e.g., Wanda, RIA, SparseGPT) with two additions:

Weighted Importance Metric: A tunable metric balancing row-wise and column-wise contributions to avoid channel collapse.
Iterative Multi-stage Pruning: Incrementally recalculates importance scores, reducing accuracy degradation common in one-shot pruning.
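
The two ideas above can be illustrated with a minimal NumPy sketch. It is not the code in this repository: the blending formula, the `alpha` knob, the linear sparsity schedule, and the function names `weighted_importance` and `iterative_prune` are all assumptions chosen for illustration, and the activation-norm factor is borrowed from Wanda-style scoring.

```python
import numpy as np


def weighted_importance(weight, act_norm, alpha=0.5, eps=1e-12):
    """Hypothetical score blending row-wise and column-wise relative magnitude.

    weight:   (out_features, in_features) weight matrix
    act_norm: (in_features,) per-input-channel activation norm from calibration data
    alpha:    assumed balance knob; 1.0 = row-wise only, 0.0 = column-wise only
    """
    mag = np.abs(weight)
    row_rel = mag / (mag.sum(axis=1, keepdims=True) + eps)  # share within each output row
    col_rel = mag / (mag.sum(axis=0, keepdims=True) + eps)  # share within each input column
    return (alpha * row_rel + (1.0 - alpha) * col_rel) * act_norm[None, :]


def iterative_prune(weight, act_norm, target_sparsity=0.5, stages=4, alpha=0.5):
    """Prune in several stages, recomputing scores on the surviving weights each time."""
    pruned = weight.copy()
    mask = np.ones(weight.shape, dtype=bool)
    for stage in range(1, stages + 1):
        # Linear sparsity schedule (an assumption): e.g. 12.5%, 25%, 37.5%, 50%.
        n_prune = int(round(target_sparsity * stage / stages * weight.size))
        scores = weighted_importance(pruned, act_norm, alpha)
        scores[~mask] = -np.inf  # already-pruned weights stay pruned
        drop = np.argsort(scores, axis=None)[:n_prune]
        mask.flat[drop] = False
        pruned = weight * mask
    return pruned, mask
```

Because the row and column sums are recomputed from the surviving weights at every stage, a row or column that has already lost many entries gives its remaining weights a larger relative share, which is one plausible way a metric of this kind could discourage channel collapse.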

Installation

Create a new conda environment and install dependencies:

conda create -n wip python=3.10
conda activate wip
pip install torch==2.1.0 --index-url https://download.pytorch.org/whl/cu121
pip install -r requirement.txt

Optionally, to run zero-shot evaluation, install lm-evaluation-harness:

git clone https://github.com/EleutherAI/lm-evaluation-harness.git
cd lm-evaluation-harness
pip install -e .

Usage

Replace YOUR_MODEL_NAME in the commands below with the model you want to prune (e.g., "meta-llama/Llama-2-7b-hf").

Unstructured pruning

python main.py \
  --model YOUR_MODEL_NAME \
  --prune_method wip \
  --sparsity_ratio 0.5 \
  --sparsity_type unstructured \
  --save
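
To check that a pruned layer actually reaches the requested ratio, a small helper like the one below can be used. It is only a sketch: `sparsity_report` is a hypothetical name, and it assumes the weights have already been loaded into NumPy arrays keyed by layer name, since the format written by `--save` is not described here.

```python
import numpy as np


def sparsity_report(layers):
    """Return the fraction of exactly-zero entries for each named weight array."""
    report = {}
    for name, weight in layers.items():
        weight = np.asarray(weight)
        report[name] = float(np.count_nonzero(weight == 0)) / weight.size
    return report
```

With `--sparsity_ratio 0.5`, each pruned linear layer would be expected to report a value close to 0.5.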

Semi-structured pruning (e.g., 2:4 pattern)

python main.py \
  --model YOUR_MODEL_NAME \
  --prune_method wip \
  --sparsity_ratio 0.5 \
  --sparsity_type 2:4 \
  --save
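
For reference, a 2:4 pattern keeps two weights in every group of four consecutive input positions. The sketch below builds such a mask from an arbitrary score matrix; `nm_mask` is a hypothetical helper for illustration, not the function used in `main.py`, and grouping along the input dimension is an assumption.

```python
import numpy as np


def nm_mask(scores, n=2, m=4):
    """Keep the n highest-scoring entries in every group of m consecutive inputs."""
    rows, cols = scores.shape
    if cols % m != 0:
        raise ValueError("in_features must be divisible by m")
    groups = scores.reshape(rows, cols // m, m)
    # Indices of the (m - n) lowest scores within each group.
    drop = np.argsort(groups, axis=-1)[..., : m - n]
    mask = np.ones_like(groups, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=-1)
    return mask.reshape(rows, cols)
```

Multiplying a weight matrix by this mask gives exactly 50% sparsity for 2:4, which matches the `--sparsity_ratio 0.5` used in the command above.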

GPU Inference Speedup (Semi-structured sparsity)

We measure inference acceleration using GPUs that support semi-structured sparsity (e.g., NVIDIA Ampere and Hopper architectures). Specifically, we leverage TensorRT-LLM to run models with an N:M sparsity pattern. For more details, refer to this issue.

Acknowledgments

This repository extends prior work from SparseGPT, Wanda, and RIA.

For questions or suggestions, feel free to contact us.
