Code for the ICLR 2025 paper: Two Sparse Matrices are Better than One: Sparsifying Neural Networks with Double Sparse Factorization.
This repository is based on the SparseGPT code.
Dependencies:
- torch: tested on v2.2.1
- transformers: tested on v4.35.2
- datasets: tested on v2.16.1
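The tested versions can be installed with, e.g., `pip install torch==2.2.1 transformers==4.35.2 datasets==2.16.1`; other recent versions may also work but are untested.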
We also provide a LLaMA pruning script with the same interface:
```
# Sparsify LLaMA with SparseGPT
python llama.py meta-llama/Llama-2-7b-hf c4 --sparsity 0.5
```
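If the pruned model is exported in Hugging Face format (whether and where `llama.py` saves a checkpoint depends on its options; the `./llama2-7b-sparse` path below is purely hypothetical), the achieved weight sparsity can be verified with a short script along these lines:

```python
# Hypothetical check: assumes the pruned model was saved in Hugging Face
# format under ./llama2-7b-sparse (both the path and the saving step are
# assumptions, not part of this repository's documented interface).
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "./llama2-7b-sparse", torch_dtype=torch.float16
)

zeros, total = 0, 0
for name, param in model.named_parameters():
    if param.dim() == 2:  # 2-D weight matrices (linear projections, embeddings)
        zeros += (param == 0).sum().item()
        total += param.numel()

print(f"overall weight sparsity: {zeros / total:.3f}")
```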
To replicate the other experiments (comparison with OBC, and post-training pruning with finetuning), see the `other_experiments` directory.