Efficient Diffusion Models: A Survey [arXiv] (Version 1: 02/03/2025; Version 3: 06/06/2025, camera-ready version for Transactions on Machine Learning Research (TMLR))
Hui Shen1, Jingxuan Zhang2, Boning Xiong3, Rui Hu4, Shoufa Chen1, Zhongwei Wan1, Xin Wang1, Yu Zhang5, Zixuan Gong5, Guangyin Bao5, Chaofan Tao6, Yongfeng Huang7, Ye Yuan8, Mi Zhang1
1The Ohio State University, 2Indiana University, 3Fudan University, 4Hangzhou City University, 5Tongji University, 6The University of Hong Kong, 7The Chinese University of Hong Kong, 8Peking University.
News: Our survey has been officially accepted by Transactions on Machine Learning Research (TMLR), May 2025.
@article{shen2025efficient,
title={Efficient Diffusion Models: A Survey},
author={Shen, Hui and Zhang, Jingxuan and Xiong, Boning and Hu, Rui and Chen, Shoufa and Wan, Zhongwei and Wang, Xin and Zhang, Yu and Gong, Zixuan and Bao, Guangyin and others},
journal={Transactions on Machine Learning Research (TMLR)},
year={2025}
}
We will actively maintain this repository by incorporating new research as it emerges. If you have suggestions regarding our taxonomy, find a paper we missed, or want to update the status of an arXiv preprint that has since been accepted at a venue, feel free to send us an email or submit a pull request using the following markdown format.
Paper Title, <ins>Conference/Journal/Preprint, Year</ins> [[pdf](link)] [[other resources](link)].
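For example, a completed entry for a paper already in this list would look like (with `link` replaced by the actual URLs):

- DeepCache: Accelerating Diffusion Models for Free, <ins>CVPR, 2024</ins> [[pdf](link)] [[code](link)].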
Diffusion models have emerged as powerful generative models capable of producing high-quality content such as images, videos, audio, and text, demonstrating their potential to revolutionize digital content generation. However, these capabilities come at the cost of significant resource demands and lengthy generation times, underscoring the need for efficient techniques that enable practical deployment. In this survey, we provide a systematic and comprehensive review of research on efficient diffusion models. We organize the literature in a taxonomy consisting of three main categories, covering distinct yet interconnected topics from the algorithm, system, and framework perspectives, respectively. We hope our survey serves as a valuable resource that helps researchers and practitioners gain a systematic understanding of efficient diffusion model research and inspires them to contribute to this important and exciting field.
- [ICLR 2024] Latent 3D Graph Diffusion. [Paper] [Code]
- [Arxiv 2024.10] L3DG: Latent 3D Gaussian Diffusion. [Paper]
- [Arxiv 2024.09] Latent Diffusion Models for Controllable RNA Sequence Generation. [Paper]
- [ICLR 2024] Mixed-Type Tabular Data Synthesis with Score-Based Diffusion in Latent Space. [Paper] [Code]
- [CVPR 2023] Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models. [Paper]
- [CVPR 2023] Video Probabilistic Diffusion Models in Projected Latent Space. [Paper]
- [Arxiv 2023.03] Latent Video Diffusion Models for High-Fidelity Long Video Generation. [Paper]
- [CVPR 2023] Executing your commands via motion diffusion in latent space. [Paper] [Code]
- [ICML 2023] AudioLDM: Text-to-Audio Generation with Latent Diffusion Models. [Paper] [Code]
- [NeurIPS 2023] Generating behaviorally diverse policies with latent diffusion models. [Paper]
- [Arxiv 2022.11] MagicVideo: Efficient Video Generation With Latent Diffusion Models. [Paper]
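The entries above share the core recipe of latent diffusion: a pretrained autoencoder compresses data into a compact latent space, the diffusion process runs there, and a decoder maps samples back to data space, which is where the efficiency gain comes from. Below is a minimal PyTorch sketch of one latent-space training step; `vae_encoder`, `denoiser`, and the schedule tensor are illustrative placeholders, not any specific paper's architecture:

```python
import torch
import torch.nn.functional as F

# Minimal latent-diffusion training step (illustrative sketch).
def latent_diffusion_step(vae_encoder, denoiser, x, alphas_cumprod):
    with torch.no_grad():
        z0 = vae_encoder(x)                     # compress images to latents
    t = torch.randint(0, len(alphas_cumprod), (z0.shape[0],), device=z0.device)
    a = alphas_cumprod[t].view(-1, 1, 1, 1)     # cumulative noise schedule at t
    eps = torch.randn_like(z0)
    zt = a.sqrt() * z0 + (1 - a).sqrt() * eps   # forward (noising) process in latent space
    return F.mse_loss(denoiser(zt, t), eps)     # train the denoiser to predict the noise
```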
- [ICLR 2024] InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation. [Paper] [Code]
- [Arxiv 2024.09] PeRFlow: Piecewise Rectified Flow as Universal Plug-and-Play Accelerator. [Paper] [Code]
- [Arxiv 2024.10] Improving the Training of Rectified Flows. [Paper] [Code]
- [ECCV 2024] SlimFlow: Training Smaller One-Step Diffusion Models with Rectified Flow. [Paper] [Code]
- [ICLR 2022] Score-Based Generative Modeling with Critically-Damped Langevin Diffusion. [Paper] [Code]
- [ICLR 2023] Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow. [Paper] [Code]
- [Arxiv 2022.09] Rectified Flow: A Marginal Preserving Approach to Optimal Transport. [Paper]
- [NeurIPS 2021] Maximum Likelihood Training of Score-Based Diffusion Models. [Paper] [Code]
- [UAI 2019] Sliced Score Matching: A Scalable Approach to Density and Score Estimation. [Paper] [Code]
- [NeurIPS 2019] Generative Modeling by Estimating Gradients of the Data Distribution. [Paper] [Code]
- [JMLR 2005] Estimation of Non-Normalized Statistical Models by Score Matching. [Paper]
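A recurring idea in the rectified-flow line of work above is to learn a velocity field along the straight path between noise and data, so that sampling needs very few (ideally one) integration steps. A minimal sketch of the rectified-flow training loss, with `velocity_net` as a placeholder model:

```python
import torch
import torch.nn.functional as F

# Rectified-flow training: learn v(x_t, t) along the straight path
# x_t = (1 - t) * x0 + t * x1 between noise x0 and data x1.
def rectified_flow_loss(velocity_net, x1):
    x0 = torch.randn_like(x1)                   # source distribution: Gaussian noise
    t = torch.rand(x1.shape[0], device=x1.device).view(-1, 1, 1, 1)
    xt = (1 - t) * x0 + t * x1                  # linear interpolation
    target = x1 - x0                            # constant velocity of the straight path
    return F.mse_loss(velocity_net(xt, t.flatten()), target)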
- [ICLR 2025] Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think. [Paper] [Code]
- [Arxiv 2025.01] Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models. [Paper] [Code]
- [Arxiv 2024.07] Improved Noise Schedule for Diffusion Training. [Paper]
- [NeurIPS 2024] ResShift: Efficient Diffusion Model for Image Super-resolution by Residual Shifting. [Paper] [Code]
- [Arxiv 2024.06] Immiscible Diffusion: Accelerating Diffusion Training with Noise Assignment. [Paper] [Code]
- [Arxiv 2024.02] DecompDiff: Diffusion Models with Decomposed Priors for Structure-Based Drug Design. [Paper] [Code]
- [ACL 2024] Text Diffusion Model with Encoder-Decoder Transformers for Sequence-to-Sequence Generation. [Paper]
- [ICLR 2023] DiGress: Discrete Denoising Diffusion for Graph Generation. [Paper] [Code]
- [CVPR 2023] Leapfrog diffusion model for stochastic trajectory prediction. [Paper] [Code]
- [ICLR 2023] DiffuSeq: Sequence to Sequence Text Generation with Diffusion Models. [Paper] [Code]
- [EMNLP 2023] A Cheaper and Better Diffusion Language Model with Soft-Masked Noise. [Paper] [Code]
- [Arxiv 2022.02] PriorGrad: Improving Conditional Denoising Diffusion Models with Data-Dependent Adaptive Prior. [Paper]
- [ICLR 2021] Denoising Diffusion Implicit Models. [Paper]
- [ICML 2021] Improved Denoising Diffusion Probabilistic Models. [Paper] [Code]
- [NeurIPS 2020] Denoising Diffusion Probabilistic Models. [Paper] [Code]
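Several of the training-efficiency papers above revisit the noise schedule. As a concrete reference point, here is the cosine cumulative-noise schedule popularized by Improved DDPM (sketched from the paper's formula; the clipping constant is the commonly used value):

```python
import torch

# Cosine schedule (Nichol & Dhariwal, 2021):
# alpha_bar(t) = cos^2(((t/T + s) / (1 + s)) * pi/2), betas clipped for stability.
def cosine_alphas_cumprod(T: int, s: float = 0.008) -> torch.Tensor:
    steps = torch.arange(T + 1, dtype=torch.float64) / T
    f = torch.cos((steps + s) / (1 + s) * torch.pi / 2) ** 2
    alphas_cumprod = f / f[0]
    betas = (1 - alphas_cumprod[1:] / alphas_cumprod[:-1]).clamp(max=0.999)
    return torch.cumprod(1 - betas, dim=0).float()
```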
- [ICLR 2022] LoRA: Low-Rank Adaptation of Large Language Models. [Paper] [Code]
- [ECCV 2024] Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models. [Paper] [Code]
- [ECCV 2024] LCM-LoRA: A Universal Stable-Diffusion Acceleration Module. [Paper] [Code]
- [Arxiv 2024.07] LoRA-Composer: Leveraging Low-Rank Adaptation for Multi-Concept Customization in Training-Free Diffusion Models. [Paper] [Code]
- [Arxiv 2024.10] Simple Drop-in LoRA Conditioning on Attention Layers Will Improve Your Diffusion Model. [Paper]
- [AAAI 2024] T2I-Adapter: Learning Adapters to Dig Out More Controllable Ability for Text-to-Image Diffusion Models. [Paper] [Code]
- [ICML 2024] Accelerating Parallel Sampling of Diffusion Models. [Paper] [Code]
- [Arxiv 2024.05] Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model [Paper] [Code]
- [CVPR 2024] SimDA: Simple Diffusion Adapter for Efficient Video Generation. [Paper] [Code]
- [Arxiv 2023.08] IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models. [Paper] [Code]
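LoRA-style adapters, which dominate the list above, freeze the pretrained weights and train only a low-rank update, cutting trainable parameters by orders of magnitude. A minimal sketch of a LoRA-wrapped linear layer; the rank and scaling values are illustrative defaults:

```python
import torch
import torch.nn as nn

# LoRA: keep the pretrained weight W frozen and learn a low-rank update B @ A,
# scaled by alpha / r.
class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base.requires_grad_(False)  # frozen pretrained projection
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero-init
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```

Because `B` is zero-initialized, the wrapped layer reproduces the pretrained model exactly at the start of fine-tuning.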
- [ECCV 2024] ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback. [Paper] [Code]
- [Arxiv 2024.08] ControlNeXt: Powerful and Efficient Control for Image and Video Generation. [Paper] [Code]
- [NeurIPS 2023] Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models. [Paper] [Code]
- [NeurIPS 2023] UniControl: A Unified Diffusion Model for Controllable Visual Generation in the Wild. [Paper] [Code]
- [Arxiv 2023.12] ControlNet-XS: Rethinking the Control of Text-to-Image Diffusion Models as Feedback-Control Systems. [Paper] [Code]
- [ICCV 2023] Adding Conditional Control to Text-to-Image Diffusion Models. [Paper] [Code]
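The ControlNet family above attaches a trainable copy of the backbone encoder to a frozen diffusion model. The connection goes through "zero convolutions": 1x1 convolutions initialized to zero, so the control branch contributes nothing at initialization and training starts exactly from the pretrained model's behavior. A minimal sketch:

```python
import torch.nn as nn

# ControlNet's zero convolution: a 1x1 conv whose weights and bias start at
# zero. Its output is added to the frozen backbone's features, so the model
# is unchanged until the control branch learns something useful.
def zero_conv(channels: int) -> nn.Conv2d:
    conv = nn.Conv2d(channels, channels, kernel_size=1)
    nn.init.zeros_(conv.weight)
    nn.init.zeros_(conv.bias)
    return conv
```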
- [ICML 2024] Unifying Bayesian Flow Networks and Diffusion Models through Stochastic Differential Equations. [Paper] [Code]
- [NeurIPS 2023] Gaussian Mixture Solvers for Diffusion Models. [Paper] [Code]
- [NeurIPS 2023] SA-Solver: Stochastic Adams Solver for Fast Sampling of Diffusion Models. [Paper]
- [NeurIPS 2023] Restart sampling for improving generative processes. [Paper] [Code]
- [ICLR 2023] Fast Sampling of Diffusion Models with Exponential Integrator. [Paper] [Code]
- [ICML 2023] Denoising MCMC for Accelerating Diffusion-Based Generative Models. [Paper] [Code]
- [ICML 2023] Improved Techniques for Maximum Likelihood Estimation for Diffusion ODEs. [Paper] [Code]
- [Arxiv 2023.09] Diffusion models with deterministic normalizing flow priors. [Paper] [Code]
- [NeurIPS 2022] DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps. [Paper] [Code]
- [ICLR 2021] Denoising Diffusion Implicit Models. [Paper]
- [Arxiv 2021.05] Gotta Go Fast When Generating Data with Score-Based Models. [Paper] [Code]
- [NeurIPS 2021] Diffusion Normalizing Flow. [Paper]
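Most solver papers above accelerate sampling by integrating the probability-flow ODE with fewer, larger steps. The simplest instance is the deterministic DDIM update, sketched below; `a_t` and `a_prev` are the cumulative alphas (0-dim tensors) at the current and previous timestep:

```python
import torch

# One deterministic DDIM step (eta = 0): predict x0 from the current noise
# estimate, then jump directly to the previous timestep.
@torch.no_grad()
def ddim_step(xt, eps_pred, a_t, a_prev):
    x0_pred = (xt - (1 - a_t).sqrt() * eps_pred) / a_t.sqrt()
    return a_prev.sqrt() * x0_pred + (1 - a_prev).sqrt() * eps_pred
```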
- [ICML 2024] Align Your Steps: Optimizing Sampling Schedules in Diffusion Models. [Paper]
- [ICML 2024] Accelerating Parallel Sampling of Diffusion Models. [Paper] [Code]
- [NeurIPS 2023] Parallel Sampling of Diffusion Models. [Paper] [Code]
- [Arxiv 2023.12] StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation. [Paper] [Code]
- [NeurIPS 2022] Deep Equilibrium Approaches to Diffusion Models. [Paper] [Code]
- [Arxiv 2021.06] On fast sampling of diffusion probabilistic models. [Paper] [Code]
- [Arxiv 2021.06] Learning to Efficiently Sample from Diffusion Probabilistic Models. [Paper]
- [ICML 2024] A Simple Early Exiting Framework for Accelerated Sampling in Diffusion Models. [Paper] [Code]
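Fast samplers and schedule-optimization methods such as Align Your Steps all reduce to choosing which K of the T training timesteps to visit. A toy sketch of two common baseline spacings (the optimized schedules in the papers above replace these heuristics with learned or searched placements):

```python
import numpy as np

# Pick K of T timesteps for few-step sampling, returned in descending order.
def select_timesteps(T: int, K: int, spacing: str = "uniform") -> np.ndarray:
    if spacing == "uniform":
        ts = np.linspace(0, T - 1, K)
    elif spacing == "quadratic":
        ts = np.linspace(0, np.sqrt(T - 1), K) ** 2  # denser near t = 0
    else:
        raise ValueError(spacing)
    return ts.round().astype(int)[::-1]
```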
- [ICLR 2023] kNN-Diffusion: Image Generation via Large-Scale Retrieval. [Paper]
- [ICLR 2023] Re-Imagen: Retrieval-Augmented Text-to-Image Generator. [Paper]
- [ICML 2023] ReDi: Efficient Learning-Free Diffusion Inference via Trajectory Retrieval. [Paper] [Code]
- [Arxiv 2022.04] Semi-Parametric Neural Image Synthesis. [Paper] [Code]
- [EMNLP 2021] Consistent Accelerated Inference via Confident Adaptive Transformers. [Paper] [Code]
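The retrieval-based methods above trade computation for lookup: results or partial denoising trajectories from previous runs (as in ReDi) are stored and reused so that part of the iterative process can be skipped. A toy sketch of trajectory retrieval with a flat-tensor "database"; the key/value layout is purely illustrative:

```python
import torch

# Find the stored early state nearest to the current one and jump to its
# recorded later state, skipping the intermediate denoising steps.
@torch.no_grad()
def retrieve_skip(keys, values, xt_early):
    # keys: (N, D) early states; values: (N, D) corresponding later states
    q = xt_early.flatten().unsqueeze(0)
    idx = torch.cdist(q, keys).argmin()
    return values[idx].view_as(xt_early)        # resume sampling from here
```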
- [CVPR 2025] MoFlow: One-Step Flow Matching for Human Trajectory Forecasting via Implicit Maximum Likelihood Estimation based Distillation. [Paper] [Code]
- [Arxiv 2025.03] SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation. [Paper] [Code]
- [CVPR 2024] 3D Paintbrush: Local Stylization of 3D Shapes with Cascaded Score Distillation. [Paper] [Code]
- [CVPR 2024] One-step Diffusion with Distribution Matching Distillation. [Paper]
- [CVPR 2023] On Distillation of Guided Diffusion Models. [Paper]
- [ICLR 2023] DreamFusion: Text-to-3D using 2D Diffusion. [Paper] [Project]
- [NeurIPS 2023] ProlificDreamer: High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation. [Paper] [Code]
- [NeurIPS 2023] Diff-Instruct: A Universal Approach for Transferring Knowledge From Pre-trained Diffusion Models. [Paper] [Code]
- [ICLR 2022] Progressive Distillation for Fast Sampling of Diffusion Models. [Paper] [Code]
- [Arxiv 2021.01] Knowledge Distillation in Iterative Generative Models for Improved Sampling Speed. [Paper] [Code]
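Step-distillation methods like those above compress a many-step teacher sampler into a few-step or one-step student. Progressive distillation is the canonical pattern: the student learns to match two teacher steps with one, halving the step count each round. A schematic sketch, where `teacher_step` and `student` are placeholder callables for one sampler update and the trainable model:

```python
import torch
import torch.nn.functional as F

# Progressive distillation, schematically: two teacher steps -> one student step.
def progressive_distill_loss(student, teacher_step, xt, t, t_mid, t_next):
    with torch.no_grad():
        x_mid = teacher_step(xt, t, t_mid)      # teacher: first small step
        x_target = teacher_step(x_mid, t_mid, t_next)  # teacher: second small step
    x_student = student(xt, t, t_next)          # student: one big step
    return F.mse_loss(x_student, x_target)
```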
- [NeurIPS 2024] BitsFusion: 1.99 bits Weight Quantization of Diffusion Model. [Paper]
- [ICLR 2024] EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models. [Paper] [Code]
- [CVPR 2023] Post-training Quantization on Diffusion Models. [Paper] [Code]
- [ICCV 2023] Q-Diffusion: Quantizing Diffusion Models. [Paper] [Code]
- [NeurIPS 2023] Leveraging Early-Stage Robustness in Diffusion Models for Efficient and High-Quality Image Synthesis. [Paper]
- [NeurIPS 2023] PTQD: Accurate Post-Training Quantization for Diffusion Models. [Paper] [Code]
- [NeurIPS 2023] Temporal Dynamic Quantization for Diffusion Models. [Paper]
- [ICLR 2021] BRECQ: Pushing the Limit of Post-Training Quantization by Block Reconstruction. [Paper] [Code]
- [ICLR 2020] Learned Step Size Quantization. [Paper]
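Several entries above build on learned quantizers. The LSQ idea (last entry) is to make the quantization step size a trainable parameter and push gradients through the rounding with a straight-through estimator. A simplified sketch (the original method also rescales the step-size gradient, omitted here):

```python
import torch
import torch.nn as nn

# Learned Step Size Quantization, simplified: trainable step size s plus a
# straight-through estimator so gradients reach both s and the weights.
class LSQQuantizer(nn.Module):
    def __init__(self, init_step: float = 0.1, n_bits: int = 4):
        super().__init__()
        self.s = nn.Parameter(torch.tensor(init_step))
        self.qmin = -(2 ** (n_bits - 1))
        self.qmax = 2 ** (n_bits - 1) - 1

    def forward(self, w):
        q = torch.clamp(w / self.s, self.qmin, self.qmax)
        q_rounded = q + (q.round() - q).detach()  # straight-through estimator
        return q_rounded * self.s                 # dequantize back to float
```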
- [CVPRW 2024] LD-Pruner: Efficient Pruning of Latent Diffusion Models using Task-Agnostic Insights. [Paper]
- [ICML 2024] LayerMerge: Neural Network Depth Compression through Layer Pruning and Merging. [Paper] [Code]
- [Arxiv 2024.04] LAPTOP-Diff: Layer Pruning and Normalized Distillation for Compressing Diffusion Models. [Paper]
- [NeurIPS 2023] Structural Pruning for Diffusion Models. [Paper] [Code]
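Structural pruning removes whole channels or layers, so the pruned network stays dense and fast on standard hardware. The papers above differ mainly in the importance criterion (e.g., scores accumulated over diffusion timesteps); the sketch below shows only the simplest magnitude-based baseline, and ignores the downstream layers a real pass would also rewire:

```python
import torch
import torch.nn as nn

# Magnitude-based channel pruning: rank output channels by L1 norm, keep the top fraction.
def prune_conv_channels(conv: nn.Conv2d, keep_ratio: float = 0.5) -> nn.Conv2d:
    importance = conv.weight.detach().abs().sum(dim=(1, 2, 3))  # per-channel L1
    n_keep = max(1, int(conv.out_channels * keep_ratio))
    keep = importance.topk(n_keep).indices.sort().values
    pruned = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                       conv.stride, conv.padding, bias=conv.bias is not None)
    pruned.weight.data = conv.weight.data[keep].clone()
    if conv.bias is not None:
        pruned.bias.data = conv.bias.data[keep].clone()
    return pruned
```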
- [FPL 2024] SDA: Low-Bit Stable Diffusion Acceleration on Edge FPGAs. [Paper] [Code]
- [ISCAS 2024] A 28.6 mJ/iter Stable Diffusion Processor for Text-to-Image Generation with Patch Similarity-based Sparsity Augmentation and Text-based Mixed-Precision. [Paper]
- [CVPRW 2023] Speed Is All You Need: On-Device Acceleration of Large Diffusion Models via GPU-Aware Optimizations. [Paper] [Project]
- [CVPR 2024] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models. [Paper] [Code]
- [Arxiv 2024.07] SwiftDiffusion: Efficient Diffusion Model Serving with Add-on Modules. [Paper]
- [Arxiv 2024.05] PipeFusion: Displaced Patch Pipeline Parallelism for Inference of Diffusion Transformer Models. [Paper] [Code]
- [MLSys 2024] DiffusionPipe: Training Large Diffusion Models with Efficient Pipelines. [Paper]
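Systems such as DistriFusion and PipeFusion parallelize a single generation across devices by splitting the latent into patches. The core pattern, reduced to a single-process toy (real systems run patches on separate GPUs, overlap communication, and reuse slightly stale activations from neighboring patches to preserve cross-patch context):

```python
import torch

# Naive patch parallelism: split the latent along the width, denoise each
# patch independently, and stitch the results back together.
@torch.no_grad()
def denoise_patchwise(denoiser, xt, t, n_splits: int = 2):
    patches = xt.chunk(n_splits, dim=-1)        # split along width
    outs = [denoiser(p, t) for p in patches]    # independent, parallelizable calls
    return torch.cat(outs, dim=-1)              # stitch (ignores cross-patch context)
```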
- [Arxiv 2025.02] Δ-DiT: Accelerating Diffusion Transformers without Training via Denoising Property Alignment. [Paper]
- [NSDI 2024] Approximate Caching for Efficiently Serving Text-to-Image Diffusion Models. [Paper] [Code]
- [CVPR 2024] DeepCache: Accelerating Diffusion Models for Free. [Paper] [Code]
- [CVPR 2024] Cache Me if You Can: Accelerating Diffusion Models through Block Caching. [Paper] [Project]
- [Arxiv 2024.07] FORA: Fast-Forward Caching in Diffusion Transformer Acceleration. [Paper] [Code]
- [Arxiv 2024.06] Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching. [Paper] [Code]
- [NeurIPS 2024] MD-DiT: Step-aware Mixture-of-Depths for Efficient Diffusion Transformers. [Paper]
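Cache-based acceleration exploits the observation that adjacent denoising steps produce very similar intermediate features, so expensive blocks can be recomputed only occasionally and reused in between. A schematic sketch; the `return_features` / `cached_features` interface and `scheduler_step` are assumptions standing in for the block-level hooks real implementations install:

```python
import torch

# Feature caching in the style of DeepCache / FORA (schematic).
@torch.no_grad()
def sample_with_feature_cache(model, xt, timesteps, refresh_every: int = 5):
    cache = None
    for i, t in enumerate(timesteps):
        if i % refresh_every == 0 or cache is None:
            eps, cache = model(xt, t, return_features=True)   # full forward, refresh cache
        else:
            eps = model(xt, t, cached_features=cache)         # cheap forward, reuse cache
        xt = model.scheduler_step(eps, t, xt)                 # any sampler update
    return xt
```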
| Framework | Training | Inference | Key Features |
|---|---|---|---|
| FlashAttention [Code] | ✅ | ✅ | Highly efficient attention computation for Diffusion Transformers (DiT) |
| xFormers [Code] | ✅ | ✅ | Memory-efficient attention and modular ops tailored for diffusion Transformer speedups |
| DeepSpeed [Code] | ✅ | ✅ | Scalable distributed training and inference optimizations for large diffusion models |
| OneFlow [Code] | ✅ | ✅ | Compiler-optimized pipeline for faster diffusion model training and sampling |
| Stable-Fast [Code] | ❌ | ✅ | Fast inference optimization for Diffusers with CUDA kernels and operator fusion |
| Onediff [Code] | ❌ | ✅ | Diffusion-specific acceleration with DeepCache and quantization |
| DeepCache [Code] | ❌ | ✅ | Reuses cached diffusion features to speed up inference iterations |
| TGATE [Code] | ❌ | ✅ | Temporal gating to streamline cross-attention in diffusion inference |
| xDiT [Code] | ❌ | ✅ | Parallel inference engine for Diffusion Transformers |
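As a minimal usage example for the table above, this is how xFormers memory-efficient attention is enabled on a Hugging Face Diffusers pipeline (requires `diffusers`, `torch`, and `xformers` installed, and a CUDA GPU):

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a Stable Diffusion pipeline in half precision on the GPU.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.enable_xformers_memory_efficient_attention()  # swap in memory-efficient attention kernels
image = pipe("a photo of an astronaut riding a horse").images[0]
```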