
Decay Pruning Method (DPM): Smooth Pruning With a Self-Rectifying Procedure


Make Structured Pruning Methods Smooth and Adaptive!

Decay Pruning Method (DPM) is a novel smooth and dynamic pruning approach that can be seamlessly integrated with various existing structured pruning methods, providing significant improvements. Unlike traditional single-step pruning approaches that remove or zero out redundant structures abruptly, DPM employs a multi-step, smooth process that gradually decays these structures to zero for better information retention. Additionally, DPM incorporates a gradient-based self-rectifying procedure that identifies and corrects sub-optimal pruning decisions during the decay, ensuring more precise and adaptive pruning.

How DPM works.

Our DPM contains two procedures:

  • Smooth Pruning (SP): SP is a multi-step pruning process that gradually decays the weights of redundant structures to zero over N steps while maintaining continuous optimization. This minimizes drastic changes to the network and improves information retention during pruning (see the first sketch below).
  • Self-Rectifying (SR): SR uses a gradient-driven criterion to assess how strongly decaying structures resist removal, correcting sub-optimal pruning decisions and making the final pruning more adaptive (see the second sketch below).
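
As a rough illustration of Smooth Pruning, here is a minimal PyTorch sketch of a single decay step. This is not the official implementation: the step count, the geometric decay schedule, and names such as `smooth_prune_step` and `redundant_mask` are assumptions made for this example; the exact schedule is defined in the paper.

```python
import torch
import torch.nn as nn

N_STEPS = 10      # number of smooth-pruning decay steps (assumed hyperparameter)
DECAY_RATE = 0.6  # per-step multiplicative decay factor (assumed hyperparameter)

@torch.no_grad()
def smooth_prune_step(conv: nn.Conv2d, redundant_mask: torch.Tensor, step: int) -> None:
    """Gradually decay redundant output filters instead of zeroing them at once.

    conv           -- a convolution whose output channels are scheduled for pruning
    redundant_mask -- boolean tensor of shape (out_channels,) marking redundant filters
    step           -- current decay step, counted from 1 to N_STEPS
    """
    if step < N_STEPS:
        # Smooth decay: the flagged structures shrink while training continues.
        conv.weight[redundant_mask] *= DECAY_RATE
        if conv.bias is not None:
            conv.bias[redundant_mask] *= DECAY_RATE
    else:
        # Final step: the fully decayed structures are zeroed out.
        conv.weight[redundant_mask] = 0.0
        if conv.bias is not None:
            conv.bias[redundant_mask] = 0.0
```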

More technical details of DPM are available in our preprint: Decay Pruning Method: Smooth Pruning With a Self-Rectifying Procedure
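
Similarly, the Self-Rectifying idea can be sketched as a per-filter resistance check on the gradients of decaying structures. Again, this is only an illustrative approximation: the threshold and the exact resistance criterion below are placeholders, not the formulation used in the paper.

```python
import torch
import torch.nn as nn

RESISTANCE_THRESHOLD = 1.0  # assumed threshold, not the value used in the paper

@torch.no_grad()
def rectify_pruning_decisions(conv: nn.Conv2d, redundant_mask: torch.Tensor) -> torch.Tensor:
    """Return an updated redundancy mask after checking gradient resistance.

    A decaying filter whose gradient norm is large relative to its shrinking weight
    norm is treated as resisting removal and is taken off the pruning list. Call this
    after loss.backward() so that conv.weight.grad is populated.
    """
    grad = conv.weight.grad
    if grad is None:
        return redundant_mask

    weight_norm = conv.weight.flatten(1).norm(dim=1)  # per-filter weight norm
    grad_norm = grad.flatten(1).norm(dim=1)           # per-filter gradient norm
    resistance = grad_norm / (weight_norm + 1e-12)    # illustrative resistance signal

    resisting = redundant_mask & (resistance > RESISTANCE_THRESHOLD)
    return redundant_mask & ~resisting                # structures that keep decaying
```

In a training loop, one plausible wiring is to call `smooth_prune_step` every few iterations and `rectify_pruning_decisions` after each backward pass, so that structures resisting the decay are returned to normal training before they are fully removed.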

TODO List.

  • Add integration examples and tutorials for Depgraph and Gate-Decorator.
  • Provide tutorials and user-friendly code for DPM integration.

Experimental Results.

We verified the effectiveness and generalizability of DPM by integrating it into three pruning frameworks: the newly proposed OTOv2 and Depgraph, as well as the classic Gate-Decorator, each under its original configuration. All code for these integration examples will be uploaded soon!

1. Integrating DPM with OTOv2.

Benchmark: VGG16-BN / CIFAR-10

| Method | FLOPs | Params | Top-1 Acc. |
| --- | --- | --- | --- |
| Baseline | 100% | 100% | 93.2% |
| EC [1] | 65.8% | 37.0% | 93.1% |
| Hinge [2] | 60.9% | 20.0% | 93.6% |
| SCP [3] | 33.8% | 7.0% | 93.8% |
| OTOv2 [4] | 26.5% | 4.8% | 93.4% |
| +SP (Ours) | 26.4% <sup>-0.1%↓</sup> | 4.8% <sup>-</sup> | 93.6% <sup>+0.2%↑</sup> |
| +SR (Ours) | **25.8%** <sup>-0.7%↓</sup> | **4.8%** <sup>-</sup> | **93.8%** <sup>+0.4%↑</sup> |

Notations:

  • Note 1: We use '+SP' to denote the exclusive use of Smooth Pruning, and '+SR' when both Smooth Pruning and Self-Rectifying are applied.
  • Note 2: The best pruning results are highlighted in bold.
  • Note 3: The enhancements from DPM are presented as superscripts.

In this benchmark, DPM notably increased the accuracy of OTOv2 by 0.4% while further reducing FLOPs by 0.7%, outperforming OTOv2 and other leading methods, including SCP.

2. Integrating DPM with Depgraph.

Benchmark: ResNet-56 / CIFAR-10

| Method | FLOPs | Params | Top-1 Acc. |
| --- | --- | --- | --- |
| Baseline | 100% | 100% | 93.53% |
| Hinge [2] | 50.0% | 48.73% | 93.69% |
| SCP [3] | 51.5% | 48.47% | 93.23% |
| ResRep [5] | 47.2% | - | 93.71% |
| SANP [6] | 48.0% | - | 93.81% |
| APIB [7] | 46.0% | 50.0% | 93.92% |
| SFP [8] | 47.4% | - | 93.66% |
| ASFP [9] | 47.4% | - | 93.32% |
| Depgraph [10] | 46.86% | 52.9% | 93.84% |
| +SP (Ours) | 46.32% <sup>-0.1%↓</sup> | 49.69% <sup>-3.21%↓</sup> | 93.96% <sup>+0.12%↑</sup> |
| +SR (Ours) | **45.80%** <sup>-1.06%↓</sup> | **47.22%** <sup>-5.68%↓</sup> | **94.13%** <sup>+0.29%↑</sup> |
| Depgraph w/o SL [10] | 47.2% | 69.7% | 93.32% |
| +SP (Ours) | 46.7% <sup>-0.5%↓</sup> | 68.1% <sup>-1.6%↓</sup> | 93.62% <sup>+0.3%↑</sup> |
| +SR (Ours) | 47.1% <sup>-0.1%↓</sup> | 65.7% <sup>-4.0%↓</sup> | 93.71% <sup>+0.39%↑</sup> |

Notations:

  • Note 1: Results not reported in the literature are marked with '-'.
  • Note 2: "w/o SL" = "without sparse learning".

DPM consistently enhances accuracy in both configurations while reducing parameters by over 4% compared with the original Depgraph. In particular, with the Group Pruner combined with Sparse Learning, DPM achieves a state-of-the-art accuracy of 94.13% while further reducing FLOPs by 1% and parameters by 5.7%. This surpasses the previous state-of-the-art method APIB by 0.2% in accuracy, with even higher model efficiency.

3. Integrating DPM with Gate-Decorator.

Benchmark: VGG16 / CIFAR-10

| Method | FLOPs | Params | Top-1 Acc. |
| --- | --- | --- | --- |
| Gate-Decorator [11] | 9.86% | 1.98% | 91.50% |
| +SP (Ours) | 9.88% <sup>+0.02%↑</sup> | 1.97% <sup>-0.01%↓</sup> | 91.58% <sup>+0.08%↑</sup> |
| +SR (Ours) | **9.79%** <sup>-0.07%↓</sup> | **1.95%** <sup>-0.03%↓</sup> | **91.74%** <sup>+0.24%↑</sup> |

DPM improves accuracy by 0.24%, reduces FLOPs by 0.07%, and decreases parameters by 0.03%.

Citation.

@misc{yang2024decaypruningmethodsmooth,
      title={Decay Pruning Method: Smooth Pruning With a Self-Rectifying Procedure}, 
      author={Minghao Yang and Linlin Gao and Pengyuan Li and Wenbo Li and Yihong Dong and Zhiying Cui},
      year={2024},
      eprint={2406.03879},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2406.03879}, 
}

References.

[1] H. Li, A. Kadav, I. Durdanovic, H. Samet, and H. P. Graf, “Pruning filters for efficient convnets,” in Proc. Int. Conf. Learn. Represent., 2017.

[2] Y. Li, S. Gu, C. Mayer, L. V. Gool, and R. Timofte, “Group sparsity: The hinge between filter pruning and decomposition for network compression,” in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2020, pp. 8015–8024.

[3] M. Kang and B. Han, “Operation-aware soft channel pruning using differentiable masks,” in Proc. Int. Conf. Mach. Learn., 2020, pp. 5122–5131.

[4] T. Chen, L. Liang, T. Ding, Z. Zhu, and I. Zharkov, “OTOv2: Automatic, generic, user-friendly,” in Proc. Int. Conf. Learn. Represent., 2023.

[5] X. Ding, T. Hao, J. Tan, J. Liu, J. Han, Y. Guo, and G. Ding, “ResRep: Lossless CNN pruning via decoupling remembering and forgetting,” in Proc. IEEE Int. Conf. Comput. Vis., 2020, pp. 4490–4500.

[6] S. Gao, Z. Zhang, Y. Zhang, F. Huang, and H. Huang, “Structural alignment for network pruning through partial regularization,” in Proc. IEEE Int. Conf. Comput. Vis., 2023, pp. 17356–17366.

[7] S. Guo, L. Zhang, X. Zheng, Y. Wang, Y. Li, F. Chao, C. Wu, S. Zhang, and R. Ji, “Automatic network pruning via Hilbert-Schmidt independence criterion lasso under information bottleneck principle,” in Proc. IEEE Int. Conf. Comput. Vis., 2023, pp. 17412–17423.

[8] Y. He, G. Kang, X. Dong, Y. Fu, and Y. Yang, “Soft filter pruning for accelerating deep convolutional neural networks,” in Proc. Int. Joint Conf. Artif. Intell., 2018.

[9] Y. He, X. Dong, G. Kang, Y. Fu, C. Yan, and Y. Yang, “Asymptotic soft filter pruning for deep convolutional neural networks,” arXiv preprint arXiv:1808.07471, 2019.

[10] G. Fang, X. Ma, M. Song, M. B. Mi, and X. Wang, “DepGraph: Towards any structural pruning,” in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2023, pp. 16091–16101.

[11] Z. You, K. Yan, J. Ye, M. Ma, and P. Wang, “Gate decorator: Global filter pruning method for accelerating deep convolutional neural networks,” in Proc. Adv. Neural Inf. Process. Syst., 2019, pp. 2130–2141.