
PostDoc position in LAMP group

We are looking for postdocs to join the LAMP group to work on diffusion models.

Awesome Diffusion Categorized

Contents

- Accelerate
- Train-Free
- Image Restoration
- Colorization
- Face Restoration
- Storytelling
- Try On
- Drag Edit
- Diffusion Models Inversion
- Text Guided Image Editing
- Continual Learning
- Remove Concept
- New Concept Learning

Accelerate

PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models
[ICLR 2024 Spotlight] [Diffusers 1] [Diffusers 2] [Project] [Code]

SDXL-Turbo: Adversarial Diffusion Distillation
[Website] [Diffusers 1] [Diffusers 2] [Project] [Code]
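
Since SDXL-Turbo ships with a Diffusers integration (see the links above), here is a minimal one-step text-to-image sketch; the prompt is a placeholder, and the fp16 settings follow the public model card:

```python
import torch
from diffusers import AutoPipelineForText2Image

# SDXL-Turbo is distilled for 1-4 sampling steps and is run without
# classifier-free guidance (guidance_scale=0.0).
pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

image = pipe(
    prompt="a photo of a corgi wearing sunglasses",  # placeholder prompt
    num_inference_steps=1,
    guidance_scale=0.0,
).images[0]
image.save("sdxl_turbo.png")
```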

Trajectory Consistency Distillation: Improved Latent Consistency Distillation by Semi-Linear Consistency Function with Trajectory Mapping
[Website] [Diffusers 1] [Diffusers 2] [Project] [Code]

LCM-LoRA: A Universal Stable-Diffusion Acceleration Module
[Website] [Diffusers] [Project] [Code]
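
LCM-LoRA plugs into a stock Diffusers pipeline by swapping in the LCM scheduler and loading the acceleration LoRA; a minimal sketch (the prompt and the 4-step, low-guidance settings are illustrative values from the model card):

```python
import torch
from diffusers import DiffusionPipeline, LCMScheduler

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Swap the scheduler and attach the LCM-LoRA acceleration module.
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")

image = pipe(
    prompt="a close-up photo of a hummingbird",  # placeholder prompt
    num_inference_steps=4,
    guidance_scale=1.0,
).images[0]
```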

Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
[Website] [Project] [Code]

DMD2: Improved Distribution Matching Distillation for Fast Image Synthesis
[NeurIPS 2024 Oral] [Project] [Code]

DMD1: One-step Diffusion with Distribution Matching Distillation
[CVPR 2024] [Project] [Code]

Consistency Models
[ICML 2023] [Diffusers] [Code]
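
The distilled consistency-model checkpoints are also exposed through ConsistencyModelPipeline in Diffusers; a minimal single-step sampling sketch, assuming the public openai/diffusers-cd_imagenet64_l2 checkpoint:

```python
import torch
from diffusers import ConsistencyModelPipeline

# Class-conditional consistency model distilled on ImageNet 64x64.
pipe = ConsistencyModelPipeline.from_pretrained(
    "openai/diffusers-cd_imagenet64_l2", torch_dtype=torch.float16
).to("cuda")

# Consistency models support one-step sampling (multi-step also works).
image = pipe(num_inference_steps=1).images[0]
image.save("consistency_model_sample.png")
```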

SwiftBrush: One-Step Text-to-Image Diffusion Model with Variational Score Distillation
[CVPR 2024] [Project] [Code]

SwiftBrush V2: Make Your One-Step Diffusion Model Better Than Its Teacher
[ECCV 2024] [Project] [Code]

CoDi: Conditional Diffusion Distillation for Higher-Fidelity and Faster Image Generation
[CVPR 2024] [Project] [Code]

PCM: Phased Consistency Model
[NeurIPS 2024] [Project] [Code]

Motion Consistency Model: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation
[NeurIPS 2024] [Project] [Code]

KOALA: Empirical Lessons Toward Memory-Efficient and Fast Diffusion Models for Text-to-Image Synthesis
[NeurIPS 2024] [Project] [Code]

Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation
[Website] [Project] [Code]

Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model
[Website] [Project] [Code]

You Only Sample Once: Taming One-Step Text-to-Image Synthesis by Self-Cooperative Diffusion GANs
[Website] [Project] [Code]

PeRFlow: Piecewise Rectified Flow as Universal Plug-and-Play Accelerator
[Website] [Project] [Code]

Simplifying, Stabilizing and Scaling Continuous-Time Consistency Models
[Website] [Project] [Code]

Adaptive Caching for Faster Video Generation with Diffusion Transformers
[Website] [Project] [Code]

FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality
[Website] [Project] [Code]

Distilled Decoding 1: One-step Sampling of Image Auto-regressive Models with Flow Matching
[Website] [Project] [Code]

SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions
[Website] [Project] [Code]

Reward Guided Latent Consistency Distillation
[Website] [Project] [Code]

T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching
[Website] [Project] [Code]

Relational Diffusion Distillation for Efficient Image Generation
[ACM MM 2024 (Oral)] [Code]

UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs
[CVPR 2024] [Code]

SlimFlow: Training Smaller One-Step Diffusion Models with Rectified Flow
[ECCV 2024] [Code]

Accelerating Image Generation with Sub-path Linear Approximation Model
[ECCV 2024] [Code]

Diff-Instruct: A Universal Approach for Transferring Knowledge From Pre-trained Diffusion Models
[NeurIPS 2023] [Code]

Fast and Memory-Efficient Video Diffusion Using Streamlined Inference
[NeurIPS 2024] [Code]

A Simple Early Exiting Framework for Accelerated Sampling in Diffusion Models
[ICML 2024] [Code]

Score identity Distillation: Exponentially Fast Distillation of Pretrained Diffusion Models for One-Step Generation
[ICML 2024] [Code]

InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation
[ICLR 2024] [Code]

Accelerating Vision Diffusion Transformers with Skip Branches
[Website] [Code]

Accelerating Diffusion Transformers with Dual Feature Caching
[Website] [Code]

One Step Diffusion via Shortcut Models
[Website] [Code]

DuoDiff: Accelerating Diffusion Models with a Dual-Backbone Approach
[Website] [Code]

A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training
[Website] [Code]

Stable Consistency Tuning: Understanding and Improving Consistency Models
[Website] [Code]

SpeedUpNet: A Plug-and-Play Adapter Network for Accelerating Text-to-Image Diffusion Models
[Website] [Code]

Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching
[Website] [Code]

SDXL-Lightning: Progressive Adversarial Diffusion Distillation
[Website] [Code]

Distribution Backtracking Builds A Faster Convergence Trajectory for Diffusion Distillation
[Website] [Code]

Long and Short Guidance in Score identity Distillation for One-Step Text-to-Image Generation
[Website] [Code]

Diffusion Models Are Innate One-Step Generators
[Website] [Code]

Distilling Diffusion Models into Conditional GANs
[ECCV 2024] [Project]

Cache Me if You Can: Accelerating Diffusion Models through Block Caching
[CVPR 2024] [Project]

Plug-and-Play Diffusion Distillation
[CVPR 2024] [Project]

SnapFusion: Text-to-Image Diffusion Model on Mobile Devices within Two Seconds
[NeurIPS 2023] [Project]

Diffusion Adversarial Post-Training for One-Step Video Generation
[Website] [Project]

SNOOPI: Supercharged One-step Diffusion Distillation with Proper Guidance
[Website] [Project]

NitroFusion: High-Fidelity Single-Step Diffusion through Dynamic Adversarial Training
[Website] [Project]

Truncated Consistency Models
[Website] [Project]

Multi-student Diffusion Distillation for Better One-step Generators
[Website] [Project]

Effortless Efficiency: Low-Cost Pruning of Diffusion Models
[Website] [Project]

FasterDiT: Towards Faster Diffusion Transformers Training without Architecture Modification
[NeurIPS 2024]

One-Step Diffusion Distillation through Score Implicit Matching
[NeurIPS 2024]

Self-Corrected Flow Distillation for Consistent One-Step and Few-Step Text-to-Image Generation
[AAAI 2025]

Inference-Time Diffusion Model Distillation
[Website]

Token Pruning for Caching Better: 9 Times Acceleration on Stable Diffusion for Free
[Website]

HarmoniCa: Harmonizing Training and Inference for Better Feature Cache in Diffusion Transformer Acceleration
[Website]

Diff-Instruct*: Towards Human-Preferred One-step Text-to-image Generative Models
[Website]

MLCM: Multistep Consistency Distillation of Latent Diffusion Model
[Website]

EM Distillation for One-step Diffusion Models
[Website]

AsymRnR: Video Diffusion Transformers Acceleration with Asymmetric Reduction and Restoration
[Website]

LANTERN: Accelerating Visual Autoregressive Models with Relaxed Speculative Decoding
[Website]

Partially Conditioned Patch Parallelism for Accelerated Diffusion Model Inference
[Website]

Importance-based Token Merging for Diffusion Models
[Website]

Imagine Flash: Accelerating Emu Diffusion Models with Backward Distillation
[Website]

Accelerating Diffusion Models with One-to-Many Knowledge Distillation
[Website]

Accelerating Video Diffusion Models via Distribution Matching
[Website]

TDDSR: Single-Step Diffusion with Two Discriminators for Super Resolution
[Website]

DDIL: Improved Diffusion Distillation With Imitation Learning
[Website]

OSV: One Step is Enough for High-Quality Image to Video Generation
[Website]

Target-Driven Distillation: Consistency Distillation with Target Timestep Selection and Decoupled Guidance
[Website]

Token Caching for Diffusion Transformer Acceleration
[Website]

DiP-GO: A Diffusion Pruner via Few-step Gradient Optimization
[Website]

LazyDiT: Lazy Learning for the Acceleration of Diffusion Transformers
[Website]

Flow Generator Matching
[Website]

Multistep Distillation of Diffusion Models via Moment Matching
[Website]

SFDDM: Single-fold Distillation for Diffusion models
[Website]

LAPTOP-Diff: Layer Pruning and Normalized Distillation for Compressing Diffusion Models
[Website]

CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion
[Website]

SCott: Accelerating Diffusion Models with Stochastic Consistency Distillation
[Website]

Adaptive Non-Uniform Timestep Sampling for Diffusion Model Training
[Website]

TSD-SR: One-Step Diffusion with Target Score Distillation for Real-World Image Super-Resolution
[Website]

Train-Free

AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising
[NeurIPS 2024] [Project] [Code]

Training-Free Adaptive Diffusion with Bounded Difference Approximation Strategy
[NeurIPS 2024] [Project] [Code]

DeepCache: Accelerating Diffusion Models for Free
[CVPR 2024] [Project] [Code]

Faster Diffusion: Rethinking the Role of the Encoder for Diffusion Model Inference
[NeurIPS 2024] [Code]

DiTFastAttn: Attention Compression for Diffusion Transformer Models
[NeurIPS 2024] [Code]

Structural Pruning for Diffusion Models
[NeurIPS 2023] [Code]

AutoDiffusion: Training-Free Optimization of Time Steps and Architectures for Automated Diffusion Model Acceleration
[ICCV 2023] [Code]

Agent Attention: On the Integration of Softmax and Linear Attention
[ECCV 2024] [Code]

Token Merging for Fast Stable Diffusion
[CVPRW 2024] [Code]
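
Token Merging is available as the third-party tomesd package, which patches a Diffusers pipeline in place with no retraining; a minimal sketch (the merge ratio of 0.5 is an illustrative setting, not a recommendation from the paper):

```python
import torch
import tomesd  # pip install tomesd
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Merge ~50% of redundant tokens before the UNet's attention layers.
tomesd.apply_patch(pipe, ratio=0.5)
image = pipe("a castle on a hill at dawn").images[0]  # placeholder prompt
tomesd.remove_patch(pipe)  # restore the original pipeline if needed
```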

FORA: Fast-Forward Caching in Diffusion Transformer Acceleration
[Website] [Code]

Real-Time Video Generation with Pyramid Attention Broadcast
[Website] [Code]

Accelerating Diffusion Transformers with Token-wise Feature Caching
[Website] [Code]

TGATE-V1: Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models
[Website] [Code]

TGATE-V2: Faster Diffusion via Temporal Attention Decomposition
[Website] [Code]

SmoothCache: A Universal Inference Acceleration Technique for Diffusion Transformers
[Website] [Code]

Attention-Driven Training-Free Efficiency Enhancement of Diffusion Models
[CVPR 2024] [Project]

Cache Me if You Can: Accelerating Diffusion Models through Block Caching
[Website] [Project]

Token Fusion: Bridging the Gap between Token Pruning and Token Merging
[WACV 2024]

Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding
[Website]

PFDiff: Training-free Acceleration of Diffusion Models through the Gradient Guidance of Past and Future
[Website]

Δ-DiT: A Training-Free Acceleration Method Tailored for Diffusion Transformers
[Website]

Adversarial Score identity Distillation: Rapidly Surpassing the Teacher in One Step
[Website]

Diff-Instruct++: Training One-step Text-to-image Generator Model to Align with Human Preferences
[Website]

Fast constrained sampling in pre-trained diffusion models
[Website]

Image Restoration

Zero-Shot Image Restoration Using Denoising Diffusion Null-Space Model
[ICLR 2023 oral] [Project] [Code]

Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild
[CVPR 2024] [Project] [Code]

Selective Hourglass Mapping for Universal Image Restoration Based on Diffusion Model
[CVPR 2024] [Project] [Code]

Zero-Reference Low-Light Enhancement via Physical Quadruple Priors
[CVPR 2024] [Project] [Code]

From Posterior Sampling to Meaningful Diversity in Image Restoration
[ICLR 2024] [Project] [Code]

Generative Diffusion Prior for Unified Image Restoration and Enhancement
[CVPR 2023] [Project] [Code]

MoE-DiffIR: Task-customized Diffusion Priors for Universal Compressed Image Restoration
[ECCV 2024] [Project] [Code]

Image Restoration with Mean-Reverting Stochastic Differential Equations
[ICML 2023] [Project] [Code]

PhoCoLens: Photorealistic and Consistent Reconstruction in Lensless Imaging
[NeurIPS 2024 Spotlight] [Project] [Code]

Denoising Diffusion Models for Plug-and-Play Image Restoration
[CVPR 2023 Workshop NTIRE] [Project] [Code]

FoundIR: Unleashing Million-scale Training Data to Advance Foundation Models for Image Restoration
[Website] [Project] [Code]

Improving Diffusion Inverse Problem Solving with Decoupled Noise Annealing
[Website] [Project] [Code]

SVFR: A Unified Framework for Generalized Video Face Restoration
[Website] [Project] [Code]

Solving Video Inverse Problems Using Image Diffusion Models
[Website] [Project] [Code]

Learning Efficient and Effective Trajectories for Differential Equation-based Image Restoration
[Website] [Project] [Code]

AutoDIR: Automatic All-in-One Image Restoration with Latent Diffusion
[Website] [Project] [Code]

FlowIE: Efficient Image Enhancement via Rectified Flow
[CVPR 2024 oral] [Code]

ResShift: Efficient Diffusion Model for Image Super-resolution by Residual Shifting
[NeurIPS 2023 (Spotlight)] [Code]

GibbsDDRM: A Partially Collapsed Gibbs Sampler for Solving Blind Inverse Problems with Denoising Diffusion Restoration
[ICML 2023 oral] [Code]

Diffusion Priors for Variational Likelihood Estimation and Image Denoising
[NeurIPS 2024 Spotlight] [Code]

Image Restoration by Denoising Diffusion Models with Iteratively Preconditioned Guidance
[CVPR 2024] [Code]

DiffIR: Efficient Diffusion Model for Image Restoration
[ICCV 2023] [Code]

LightenDiffusion: Unsupervised Low-Light Image Enhancement with Latent-Retinex Diffusion Models
[ECCV 2024] [Code]

Rethinking Video Deblurring with Wavelet-Aware Dynamic Transformer and Diffusion Model
[ECCV 2024] [Code]

DAVI: Diffusion Prior-Based Amortized Variational Inference for Noisy Inverse Problem
[ECCV 2024] [Code]

Low-Light Image Enhancement with Wavelet-based Diffusion Models
[SIGGRAPH Asia 2023] [Code]

Residual Denoising Diffusion Models
[CVPR 2024] [Code]

Diff-Plugin: Revitalizing Details for Diffusion-based Low-level Tasks
[CVPR 2024] [Code]

Deep Equilibrium Diffusion Restoration with Parallel Sampling
[CVPR 2024] [Code]

ReFIR: Grounding Large Restoration Models with Retrieval Augmentation
[NeurIPS 2024] [Code]

DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation
[NeurIPS 2024] [Code]

Refusion: Enabling Large-Size Realistic Image Restoration with Latent-Space Diffusion Models
[CVPR 2023 Workshop NTIRE] [Code]

Equipping Diffusion Models with Differentiable Spatial Entropy for Low-Light Image Enhancement
[CVPR 2024 Workshop NTIRE] [Code]

Reversing the Damage: A QP-Aware Transformer-Diffusion Approach for 8K Video Restoration under Codec Compression
[Website] [Code]

Zero-Shot Adaptation for Approximate Posterior Sampling of Diffusion Models in Inverse Problems
[Website] [Code]

UniProcessor: A Text-induced Unified Low-level Image Processor
[Website] [Code]

Zero-Shot Image Restoration Using Few-Step Guidance of Consistency Models (and Beyond)
[Website] [Code]

Varformer: Adapting VAR's Generative Prior for Image Restoration
[Website] [Code]

Low-Light Image Enhancement via Generative Perceptual Priors
[Website] [Code]

PnP-Flow: Plug-and-Play Image Restoration with Flow Matching
[Website] [Code]

VIIS: Visible and Infrared Information Synthesis for Severe Low-light Image Enhancement
[Website] [Code]

Deep Data Consistency: a Fast and Robust Diffusion Model-based Solver for Inverse Problems
[Website] [Code]

Learning A Coarse-to-Fine Diffusion Transformer for Image Restoration
[Website] [Code]

Stimulating the Diffusion Model for Image Denoising via Adaptive Embedding and Ensembling
[Website] [Code]

Solving Linear Inverse Problems Provably via Posterior Sampling with Latent Diffusion Models
[Website] [Code]

Sagiri: Low Dynamic Range Image Enhancement with Generative Diffusion Prior
[Website] [Code]

Frequency Compensated Diffusion Model for Real-scene Dehazing
[Website] [Code]

Efficient Image Deblurring Networks based on Diffusion Models
[Website] [Code]

Blind Image Restoration via Fast Diffusion Inversion
[Website] [Code]

DMPlug: A Plug-in Method for Solving Inverse Problems with Diffusion Models
[Website] [Code]

Accelerating Diffusion Models for Inverse Problems through Shortcut Sampling
[Website] [Code]

Denoising as Adaptation: Noise-Space Domain Adaptation for Image Restoration
[Website] [Code]

Unlimited-Size Diffusion Restoration
[Website] [Code]

VmambaIR: Visual State Space Model for Image Restoration
[Website] [Code]

Using diffusion model as constraint: Empower Image Restoration Network Training with Diffusion Model
[Website] [Code]

Super-resolving Real-world Image Illumination Enhancement: A New Dataset and A Conditional Diffusion Model
[Website] [Code]

TIP: Text-Driven Image Processing with Semantic and Restoration Instructions
[ECCV 2024] [Project]

Warped Diffusion: Solving Video Inverse Problems with Image Diffusion Models
[NeurIPS 2024] [Project]

GenDeg: Diffusion-Based Degradation Synthesis for Generalizable All-in-One Image Restoration
[Website] [Project]

VISION-XL: High Definition Video Inverse Problem Solver using Latent Image Diffusion Models
[Website] [Project]

SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration
[Website] [Project]

Diff-Retinex: Rethinking Low-light Image Enhancement with A Generative Diffusion Model
[ICCV 2023]

Multiscale Structure Guided Diffusion for Image Deblurring
[ICCV 2023]

Boosting Image Restoration via Priors from Pre-trained Models
[CVPR 2024]

A Modular Conditional Diffusion Framework for Image Reconstruction
[Website]

Unpaired Photo-realistic Image Deraining with Energy-informed Diffusion Model
[Website]

Particle-Filtering-based Latent Diffusion for Inverse Problems
[Website]

Bayesian Conditioned Diffusion Models for Inverse Problem
[Website]

ReCo-Diff: Explore Retinex-Based Condition Strategy in Diffusion Model for Low-Light Image Enhancement
[Website]

Multimodal Prompt Perceiver: Empower Adaptiveness, Generalizability and Fidelity for All-in-One Image Restoration
[Website]

Tell Me What You See: Text-Guided Real-World Image Denoising
[Website]

Zero-LED: Zero-Reference Lighting Estimation Diffusion Model for Low-Light Image Enhancement
[Website]

Prototype Clustered Diffusion Models for Versatile Inverse Problems
[Website]

AGLLDiff: Guiding Diffusion Models Towards Unsupervised Training-free Real-world Low-light Image Enhancement
[Website]

Taming Generative Diffusion for Universal Blind Image Restoration
[Website]

Efficient Image Restoration through Low-Rank Adaptation and Stable Diffusion XL
[Website]

TDM: Temporally-Consistent Diffusion Model for All-in-One Real-World Video Restoration
[Website]

Empirical Bayesian image restoration by Langevin sampling with a denoising diffusion implicit prior
[Website]

Enhancing Diffusion Models for Inverse Problems with Covariance-Aware Posterior Sampling
[Website]

Data-free Distillation with Degradation-prompt Diffusion for Multi-weather Image Restoration
[Website]

FreeEnhance: Tuning-Free Image Enhancement via Content-Consistent Noising-and-Denoising Process
[Website]

Diffusion State-Guided Projected Gradient for Inverse Problems
[Website]

InstantIR: Blind Image Restoration with Instant Generative Reference
[Website]

Score-Based Variational Inference for Inverse Problems
[Website]

Towards Flexible and Efficient Diffusion Low Light Enhancer
[Website]

G2D2: Gradient-guided Discrete Diffusion for image inverse problem solving
[Website]

AllRestorer: All-in-One Transformer for Image Restoration under Composite Degradations
[Website]

DiffMVR: Diffusion-based Automated Multi-Guidance Video Restoration
[Website]

Blind Inverse Problem Solving Made Easy by Text-to-Image Latent Diffusion
[Website]

DIVD: Deblurring with Improved Video Diffusion Model
[Website]

Beyond Pixels: Text Enhances Generalization in Real-World Image Restoration
[Website]

Enhancing and Accelerating Diffusion-Based Inverse Problem Solving through Measurements Optimization
[Website]

Are Conditional Latent Diffusion Models Effective for Image Restoration?
[Website]

Consistent Diffusion: Denoising Diffusion Model with Data-Consistent Training for Image Restoration
[Website]

DiffStereo: High-Frequency Aware Diffusion Model for Stereo Image Restoration
[Website]

Colorization

ColorFlow: Retrieval-Augmented Image Sequence Colorization
[Website] [Project] [Code]

Control Color: Multimodal Diffusion-based Interactive Image Colorization
[Website] [Project] [Code]

Multimodal Semantic-Aware Automatic Colorization with Diffusion Prior
[Website] [Project] [Code]

MangaNinja: Line Art Colorization with Precise Reference Following
[Website] [Project] [Code]

ColorizeDiffusion: Adjustable Sketch Colorization with Reference Image and Text
[Website] [Code]

Diffusing Colors: Image Colorization with Text Guided Diffusion
[SIGGRAPH Asia 2023] [Project]

VanGogh: A Unified Multimodal Diffusion-based Framework for Video Colorization
[Website] [Project]

Enhancing Diffusion Posterior Sampling for Inverse Problems by Integrating Crafted Measurements
[Website]

DiffColor: Toward High Fidelity Text-Guided Image Colorization with Diffusion Models
[Website]

Face Restoration

DiffBIR: Towards Blind Image Restoration with Generative Diffusion Prior
[Website] [Project] [Code]

OSDFace: One-Step Diffusion Model for Face Restoration
[Website] [Project] [Code]

ReF-LDM: A Latent Diffusion Model for Reference-based Face Image Restoration
[Website] [Project] [Code]

InstantRestore: Single-Step Personalized Face Restoration with Shared-Image Attention
[Website] [Project] [Code]

DR2: Diffusion-based Robust Degradation Remover for Blind Face Restoration
[CVPR 2023] [Code]

PGDiff: Guiding Diffusion Models for Versatile Face Restoration via Partial Guidance
[NeurIPS 2023] [Code]

DifFace: Blind Face Restoration with Diffused Error Contraction
[Website] [Code]

AuthFace: Towards Authentic Blind Face Restoration with Face-oriented Generative Diffusion Prior
[Website] [Code]

RestorerID: Towards Tuning-Free Face Restoration with ID Preservation
[Website] [Code]

Towards Real-World Blind Face Restoration with Generative Diffusion Prior
[Website] [Code]

Towards Unsupervised Blind Face Restoration using Diffusion Prior
[Website] [Project]

DiffBFR: Bootstrapping Diffusion Model Towards Blind Face Restoration
[Website]

CLR-Face: Conditional Latent Refinement for Blind Face Restoration Using Score-Based Diffusion Models
[Website]

DiffMAC: Diffusion Manifold Hallucination Correction for High Generalization Blind Face Restoration
[Website]

Gaussian is All You Need: A Unified Framework for Solving Inverse Problems via Diffusion Posterior Sampling
[Website]

Overcoming False Illusions in Real-World Face Restoration with Multi-Modal Guided Diffusion Model
[Website]

DR-BFR: Degradation Representation with Diffusion Models for Blind Face Restoration
[Website]

Storytelling

⭐⭐Intelligent Grimm -- Open-ended Visual Storytelling via Latent Diffusion Models
[CVPR 2024] [Project] [Code]

⭐⭐Training-Free Consistent Text-to-Image Generation
[SIGGRAPH 2024] [Project] [Code]

The Chosen One: Consistent Characters in Text-to-Image Diffusion Models
[SIGGRAPH 2024] [Project] [Code]

DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
[Website] [Project] [Code]

AutoStudio: Crafting Consistent Subjects in Multi-turn Interactive Image Generation
[Website] [Project] [Code]

StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation
[Website] [Project] [Code]

StoryGPT-V: Large Language Models as Consistent Story Visualizers
[Website] [Project] [Code]

Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation
[Website] [Project] [Code]

TaleCrafter: Interactive Story Visualization with Multiple Characters
[Website] [Project] [Code]

Story-Adapter: A Training-free Iterative Framework for Long Story Visualization
[Website] [Project] [Code]

DreamRunner: Fine-Grained Storytelling Video Generation with Retrieval-Augmented Motion Adaptation
[Website] [Project] [Code]

ShowHowTo: Generating Scene-Conditioned Step-by-Step Visual Instructions
[Website] [Project] [Code]

Manga Generation via Layout-controllable Diffusion
[Website] [Project] [Code]

StoryImager: A Unified and Efficient Framework for Coherent Story Visualization and Completion
[ECCV 2024] [Code]

Make-A-Story: Visual Memory Conditioned Consistent Story Generation
[CVPR 2023] [Code]

StoryWeaver: A Unified World Model for Knowledge-Enhanced Story Character Customization
[AAAI 2025] [Code]

StoryMaker: Towards Holistic Consistent Characters in Text-to-image Generation
[Website] [Code]

SEED-Story: Multimodal Long Story Generation with Large Language Model
[Website] [Code]

Synthesizing Coherent Story with Auto-Regressive Latent Diffusion Models
[Website] [Code]

Masked Generative Story Transformer with Character Guidance and Caption Augmentation
[Website] [Code]

StoryBench: A Multifaceted Benchmark for Continuous Story Visualization
[Website] [Code]

Boosting Consistency in Story Visualization with Rich-Contextual Conditional Diffusion Models
[Website] [Code]

DreamStory: Open-Domain Story Visualization by LLM-Guided Multi-Subject Consistent Diffusion
[Website] [Project]

Multi-Shot Character Consistency for Text-to-Video Generation
[Website] [Project]

MagicScroll: Nontypical Aspect-Ratio Image Generation for Visual Storytelling via Multi-Layered Semantic-Aware Denoising
[Website] [Project]

Causal-Story: Local Causal Attention Utilizing Parameter-Efficient Tuning For Visual Story Synthesis
[ICASSP 2024]

CogCartoon: Towards Practical Story Visualization
[Website]

Generating coherent comic with rich story using ChatGPT and Stable Diffusion
[Website]

Improved Visual Story Generation with Adaptive Context Modeling
[Website]

Make-A-Storyboard: A General Framework for Storyboard with Disentangled and Merged Control
[Website]

Zero-shot Generation of Coherent Storybook from Plain Text Story using Diffusion Models
[Website]

Evolving Storytelling: Benchmarks and Methods for New Character Customization with Diffusion Models
[Website]

ORACLE: Leveraging Mutual Information for Consistent Character Generation with LoRAs in Diffusion Models
[Website]

Storynizor: Consistent Story Generation via Inter-Frame Synchronized and Shuffled ID Injection
[Website]

StoryAgent: Customized Storytelling Video Generation via Multi-Agent Collaboration
[Website]

Improving Multi-Subject Consistency in Open-Domain Image Generation with Isolation and Reposition Attention
[Website]

Try On

TryOnDiffusion: A Tale of Two UNets
[CVPR 2023] [Website] [Project] [Official Code] [Unofficial Code]

StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On
[CVPR 2024] [Project] [Code]

VTON-HandFit: Virtual Try-on for Arbitrary Hand Pose Guided by Hand Priors Embedding
[Website] [Project] [Code]

IMAGDressing-v1: Customizable Virtual Dressing
[Website] [Project] [Code]

OutfitAnyone: Ultra-high Quality Virtual Try-On for Any Clothing and Any Person
[Website] [Project] [Code]

AnyDressing: Customizable Multi-Garment Virtual Dressing via Latent Diffusion Models
[Website] [Project] [Code]

ViViD: Video Virtual Try-on using Diffusion Models
[Website] [Project] [Code]

FashionComposer: Compositional Fashion Image Generation
[Website] [Project] [Code]

GaussianVTON: 3D Human Virtual Try-ON via Multi-Stage Gaussian Splatting Editing with Image Prompting
[Website] [Project] [Code]

Street TryOn: Learning In-the-Wild Virtual Try-On from Unpaired Person Images
[Website] [Project] [Code]

From Parts to Whole: A Unified Reference Framework for Controllable Human Image Generation
[Website] [Project] [Code]

PICTURE: PhotorealistIC virtual Try-on from UnconstRained dEsigns
[Website] [Project] [Code]

StableGarment: Garment-Centric Generation via Stable Diffusion
[Website] [Project] [Code]

Improving Diffusion Models for Virtual Try-on
[Website] [Project] [Code]

D4-VTON: Dynamic Semantics Disentangling for Differential Diffusion based Virtual Try-On
[ECCV 2024] [Code]

Improving Virtual Try-On with Garment-focused Diffusion Models
[ECCV 2024] [Code]

Texture-Preserving Diffusion Models for High-Fidelity Virtual Try-On
[CVPR 2024] [Code]

Taming the Power of Diffusion Models for High-Quality Virtual Try-On with Appearance Flow
[ACM MM 2023] [Code]

LaDI-VTON: Latent Diffusion Textual-Inversion Enhanced Virtual Try-On
[ACM MM 2023] [Code]

OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on
[Website] [Code]

CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Model
[Website] [Code]

Learning Flow Fields in Attention for Controllable Person Image Generation
[Website] [Code]

DreamPaint: Few-Shot Inpainting of E-Commerce Items for Virtual Try-On without 3D Modeling
[Website] [Code]

CAT-DM: Controllable Accelerated Virtual Try-on with Diffusion Model
[Website] [Code]

Consistent Human Image and Video Generation with Spatially Conditioned Diffusion
[Website] [Code]

MV-VTON: Multi-View Virtual Try-On with Diffusion Models
[Website] [Code]

PromptDresser: Improving the Quality and Controllability of Virtual Try-On via Generative Textual Prompt and Prompt-aware Mask
[Website] [Code]

M&M VTO: Multi-Garment Virtual Try-On and Editing
[CVPR 2024 Highlight] [Project]

WildVidFit: Video Virtual Try-On in the Wild via Image-Based Controlled Diffusion Models
[ECCV 2024] [Project]

Fashion-VDM: Video Diffusion Model for Virtual Try-On
[SIGGRAPH Asia 2024] [Project]

Tunnel Try-on: Excavating Spatial-temporal Tunnels for High-quality Virtual Try-on in Videos
[Website] [Project]

Masked Extended Attention for Zero-Shot Virtual Try-On In The Wild
[Website] [Project]

TryOffDiff: Virtual-Try-Off via High-Fidelity Garment Reconstruction using Diffusion Models
[Website] [Project]

Diffuse to Choose: Enriching Image Conditioned Inpainting in Latent Diffusion Models for Virtual Try-All
[Website] [Project]

Wear-Any-Way: Manipulable Virtual Try-on via Sparse Correspondence Alignment
[Website] [Project]

VITON-DiT: Learning In-the-Wild Video Try-On from Human Dance Videos via Diffusion Transformers
[Website] [Project]

AnyFit: Controllable Virtual Try-on for Any Combination of Attire Across Any Scenario
[Website] [Project]

Dynamic Try-On: Taming Video Virtual Try-on with Dynamic Attention Mechanism
[Website] [Project]

FLDM-VTON: Faithful Latent Diffusion Model for Virtual Try-on
[IJCAI 2024]

GraVITON: Graph based garment warping with attention guided inversion for Virtual-tryon
[Website]

WarpDiffusion: Efficient Diffusion Model for High-Fidelity Virtual Try-on
[Website]

Product-Level Try-on: Characteristics-preserving Try-on with Realistic Clothes Shading and Wrinkles
[Website]

Mobile Fitting Room: On-device Virtual Try-on via Diffusion Models
[Website]

Time-Efficient and Identity-Consistent Virtual Try-On Using A Variant of Altered Diffusion Models
[Website]

ACDG-VTON: Accurate and Contained Diffusion Generation for Virtual Try-On
[Website]

ShoeModel: Learning to Wear on the User-specified Shoes via Diffusion Model
[Website]

AnyDesign: Versatile Area Fashion Editing via Mask-Free Diffusion
[Website]

DPDEdit: Detail-Preserved Diffusion Models for Multimodal Fashion Image Editing
[Website]

TED-VITON: Transformer-Empowered Diffusion Models for Virtual Try-On
[Website]

Controllable Human Image Generation with Personalized Multi-Garments
[Website]

RAGDiffusion: Faithful Cloth Generation via External Knowledge Assimilation
[Website]

SwiftTry: Fast and Consistent Video Virtual Try-On with Diffusion Models
[Website]

IGR: Improving Diffusion Model for Garment Restoration from Person Image
[Website]

DiffusionTrend: A Minimalist Approach to Virtual Fashion Try-On
[Website]

DreamFit: Garment-Centric Human Generation via a Lightweight Anything-Dressing Encoder
[Website]

Fashionability-Enhancing Outfit Image Editing with Conditional Diffusion Models
[Website]

MC-VTON: Minimal Control Virtual Try-On Diffusion Transformer
[Website]

Drag Edit

DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models
[ICLR 2024] [Website] [Project] [Code]

Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold
[SIGGRAPH 2023] [Project] [Code]

Readout Guidance: Learning Control from Diffusion Features
[CVPR 2024 Highlight] [Project] [Code]

FreeDrag: Feature Dragging for Reliable Point-based Image Editing
[CVPR 2024] [Project] [Code]

DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing
[CVPR 2024] [Project] [Code]

InstaDrag: Lightning Fast and Accurate Drag-based Image Editing Emerging from Videos
[Website] [Project] [Code]

GoodDrag: Towards Good Practices for Drag Editing with Diffusion Models
[Website] [Project] [Code]

Repositioning the Subject within Image
[Website] [Project] [Code]

Drag-A-Video: Non-rigid Video Editing with Point-based Interaction
[Website] [Project] [Code]

ObjCtrl-2.5D: Training-free Object Control with Camera Poses
[Website] [Project] [Code]

DragAnything: Motion Control for Anything using Entity Representation
[Website] [Project] [Code]

InstantDrag: Improving Interactivity in Drag-based Image Editing
[Website] [Project] [Code]

DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing
[CVPR 2024] [Code]

Drag Your Noise: Interactive Point-based Editing via Diffusion Semantic Propagation
[CVPR 2024] [Code]

DragVideo: Interactive Drag-style Video Editing
[ECCV 2024] [Code]

RotationDrag: Point-based Image Editing with Rotated Diffusion Features
[Website] [Code]

TrackGo: A Flexible and Efficient Method for Controllable Video Generation
[Website] [Project]

DragText: Rethinking Text Embedding in Point-based Image Editing
[Website] [Project]

OmniDrag: Enabling Motion Control for Omnidirectional Image-to-Video Generation
[Website] [Project]

FastDrag: Manipulate Anything in One Step
[Website] [Project]

DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory
[Website] [Project]

StableDrag: Stable Dragging for Point-based Image Editing
[Website] [Project]

DiffUHaul: A Training-Free Method for Object Dragging in Images
[Website] [Project]

RegionDrag: Fast Region-Based Image Editing with Diffusion Models
[Website]

Motion Guidance: Diffusion-Based Image Editing with Differentiable Motion Estimators
[Website]

Combing Text-based and Drag-based Editing for Precise and Flexible Image Editing
[Website]

AdaptiveDrag: Semantic-Driven Dragging on Diffusion-Based Image Editing
[Website]

Diffusion Models Inversion

⭐⭐⭐Null-text Inversion for Editing Real Images using Guided Diffusion Models
[CVPR 2023] [Website] [Project] [Code]

⭐⭐Direct Inversion: Boosting Diffusion-based Editing with 3 Lines of Code
[ICLR 2024] [Website] [Project] [Code]

Inversion-Based Creativity Transfer with Diffusion Models
[CVPR 2023] [Website] [Code]

EDICT: Exact Diffusion Inversion via Coupled Transformations
[CVPR 2023] [Website] [Code]

Improving Negative-Prompt Inversion via Proximal Guidance
[Website] [Code]

An Edit Friendly DDPM Noise Space: Inversion and Manipulations
[CVPR 2024] [Project] [Code] [Demo]

Dynamic Prompt Learning: Addressing Cross-Attention Leakage for Text-Based Image Editing
[NeurIPS 2023] [Website] [Code]

Inversion-Free Image Editing with Natural Language
[CVPR 2024] [Project] [Code]

LEDITS++: Limitless Image Editing using Text-to-Image Models
[CVPR 2024] [Project] [Code]

Noise Map Guidance: Inversion with Spatial Context for Real Image Editing
[ICLR 2024] [Website] [Code]

ReNoise: Real Image Inversion Through Iterative Noising
[ECCV 2024] [Project] [Code]

IterInv: Iterative Inversion for Pixel-Level T2I Models
[NeurIPS-W 2023] [OpenReview] [NeurIPS-W] [Website] [Code]

DICE: Discrete Inversion Enabling Controllable Editing for Multinomial Diffusion and Masked Generative Models
[Website] [Project] [Code]

Object-aware Inversion and Reassembly for Image Editing
[Website] [Project] [Code]

Taming Rectified Flow for Inversion and Editing
[Website] [Project] [Code]

A Latent Space of Stochastic Diffusion Models for Zero-Shot Image Editing and Guidance
[ICCV 2023] [Code]

Source Prompt Disentangled Inversion for Boosting Image Editability with Diffusion Models
[ECCV 2024] [Code]

LocInv: Localization-aware Inversion for Text-Guided Image Editing
[CVPR 2024 AI4CC workshop] [Code]

Accelerating Diffusion Models for Inverse Problems through Shortcut Sampling
[IJCAI 2024] [Code]

StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing
[Website] [Code]

Generating Non-Stationary Textures using Self-Rectification
[Website] [Code]

Exact Diffusion Inversion via Bi-directional Integration Approximation
[Website] [Code]

IQA-Adapter: Exploring Knowledge Transfer from Image Quality Assessment to Diffusion-based Generative Models
[Website] [Code]

Fixed-point Inversion for Text-to-image diffusion models
[Website] [Code]

Eta Inversion: Designing an Optimal Eta Function for Diffusion-based Real Image Editing
[Website] [Code]

Effective Real Image Editing with Accelerated Iterative Diffusion Inversion
[ICCV 2023 Oral] [Website]

BELM: Bidirectional Explicit Linear Multi-step Sampler for Exact Inversion in Diffusion Models
[NeurIPS 2024]

Schedule Your Edit: A Simple yet Effective Diffusion Noise Schedule for Image Editing
[NeurIPS 2024]

BARET: Balanced Attention based Real image Editing driven by Target-text Inversion
[WACV 2024]

Wavelet-Guided Acceleration of Text Inversion in Diffusion-Based Image Editing
[ICASSP 2024]

Task-Oriented Diffusion Inversion for High-Fidelity Text-based Editing
[Website]

Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations
[Website]

Negative-prompt Inversion: Fast Image Inversion for Editing with Text-guided Diffusion Models
[Website]

Direct Inversion: Optimization-Free Text-Driven Real Image Editing with Diffusion Models
[Website]

SimInversion: A Simple Framework for Inversion-Based Text-to-Image Editing
[Website]

Prompt Tuning Inversion for Text-Driven Image Editing Using Diffusion Models
[Website]

KV Inversion: KV Embeddings Learning for Text-Conditioned Real Image Action Editing
[Website]

Tuning-Free Inversion-Enhanced Control for Consistent Image Editing
[Website]

LEDITS: Real Image Editing with DDPM Inversion and Semantic Guidance
[Website]

Dual-Schedule Inversion: Training- and Tuning-Free Inversion for Real Image Editing
[Website]

Exploring Optimal Latent Trajectory for Zero-shot Image Editing
[Website]

Text Guided Image Editing

⭐⭐⭐Prompt-to-Prompt Image Editing with Cross Attention Control
[ICLR 2023] [Website] [Project] [Code] [Replicate Demo]

⭐⭐⭐Zero-shot Image-to-Image Translation
[SIGGRAPH 2023] [Project] [Code] [Replicate Demo] [Diffusers Doc] [Diffusers Code]

⭐⭐InstructPix2Pix: Learning to Follow Image Editing Instructions
[CVPR 2023 (Highlight)] [Website] [Project] [Diffusers Doc] [Diffusers Code] [Official Code] [Dataset]
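
InstructPix2Pix is available as a standard Diffusers pipeline; a minimal editing sketch, where the instruction text and the input.jpg path are placeholders:

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInstructPix2PixPipeline

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

image = Image.open("input.jpg").convert("RGB")  # placeholder input image
edited = pipe(
    "make it look like a watercolor painting",  # the edit instruction
    image=image,
    num_inference_steps=20,
    image_guidance_scale=1.5,  # fidelity to the input image
    guidance_scale=7.5,        # strength of the instruction
).images[0]
```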

⭐⭐Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation
[CVPR 2023] [Website] [Project] [Code] [Dataset] [Replicate Demo] [Demo]

DiffEdit: Diffusion-based semantic image editing with mask guidance
[ICLR 2023] [Website] [Unofficial Code] [Diffusers Doc] [Diffusers Code]
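
DiffEdit's mask-then-edit procedure (contrast two prompts to infer a mask, invert the image, then denoise under the mask) is implemented in Diffusers as StableDiffusionDiffEditPipeline; a sketch roughly following the library documentation, with illustrative prompts and image path:

```python
import torch
from PIL import Image
from diffusers import (
    StableDiffusionDiffEditPipeline, DDIMScheduler, DDIMInverseScheduler,
)

pipe = StableDiffusionDiffEditPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
pipe.inverse_scheduler = DDIMInverseScheduler.from_config(pipe.scheduler.config)

img = Image.open("fruit_bowl.jpg").convert("RGB").resize((768, 768))
# 1) infer an edit mask by contrasting source and target prompts,
# 2) DDIM-invert the image, 3) denoise with the mask as guidance.
mask = pipe.generate_mask(
    image=img, source_prompt="a bowl of fruits", target_prompt="a basket of pears"
)
inv_latents = pipe.invert(prompt="a bowl of fruits", image=img).latents
out = pipe(
    prompt="a basket of pears", mask_image=mask, image_latents=inv_latents
).images[0]
```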

Imagic: Text-Based Real Image Editing with Diffusion Models
[CVPR 2023] [Website] [Project] [Diffusers]

Inpaint Anything: Segment Anything Meets Image Inpainting
[Website] [Code 1] [Code 2]

MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing
[ICCV 2023] [Website] [Project] [Code] [Demo]

Collaborative Score Distillation for Consistent Visual Synthesis
[NeurIPS 2023] [Website] [Project] [Code]

Visual Instruction Inversion: Image Editing via Visual Prompting
[NeurIPS 2023] [Website] [Project] [Code]

Energy-Based Cross Attention for Bayesian Context Update in Text-to-Image Diffusion Models
[NeurIPS 2023] [Website] [Code]

Localizing Object-level Shape Variations with Text-to-Image Diffusion Models
[ICCV 2023] [Website] [Project] [Code]

Unifying Diffusion Models' Latent Space, with Applications to CycleDiffusion and Guidance
[Website] [Code1] [Code2] [Diffusers Code]

PAIR-Diffusion: Object-Level Image Editing with Structure-and-Appearance Paired Diffusion Models
[Website] [Project] [Code] [Demo]

SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models
[CVPR 2024] [Project] [Code]

Contrastive Denoising Score for Text-guided Latent Diffusion Image Editing
[CVPR 2024] [Project] [Code]

Text-Driven Image Editing via Learnable Regions
[CVPR 2024] [Project] [Code]

Motion Guidance: Diffusion-Based Image Editing with Differentiable Motion Estimators
[ICLR 2024] [Project] [Code]

TurboEdit: Text-Based Image Editing Using Few-Step Diffusion Models
[SIGGRAPH Asia 2024] [Project] [Code]

Invertible Consistency Distillation for Text-Guided Image Editing in Around 7 Steps
[NeurIPS 2024] [Project] [Code]

Zero-shot Image Editing with Reference Imitation
[Website] [Project] [Code]

OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision
[Website] [Project] [Code]

MultiBooth: Towards Generating All Your Concepts in an Image from Text
[Website] [Project] [Code]

Infusion: Preventing Customized Text-to-Image Diffusion from Overfitting
[Website] [Project] [Code]

StyleBooth: Image Style Editing with Multimodal Instruction
[Website] [Project] [Code]

SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing
[Website] [Project] [Code]

EditVal: Benchmarking Diffusion Based Text-Guided Image Editing Methods
[Website] [Project] [Code]

InsightEdit: Towards Better Instruction Following for Image Editing
[Website] [Project] [Code]

InstructEdit: Improving Automatic Masks for Diffusion-based Image Editing With User Instructions
[Website] [Project] [Code]

MDP: A Generalized Framework for Text-Guided Image Editing by Manipulating the Diffusion Path
[Website] [Project] [Code]

HIVE: Harnessing Human Feedback for Instructional Visual Editing
[Website] [Project] [Code]

FaceStudio: Put Your Face Everywhere in Seconds
[Website] [Project] [Code]

Edicho: Consistent Image Editing in the Wild
[Website] [Project] [Code]

Towards Small Object Editing: A Benchmark Dataset and A Training-Free Approach
[Website] [Project] [Code]

Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models
[Website] [Project] [Code]

FreeEdit: Mask-free Reference-based Image Editing with Multi-modal Instruction
[Website] [Project] [Code]

MAG-Edit: Localized Image Editing in Complex Scenarios via Mask-Based Attention-Adjusted Guidance
[Website] [Project] [Code]

LIME: Localized Image Editing via Attention Regularization in Diffusion Models
[Website] [Project] [Code]

MirrorDiffusion: Stabilizing Diffusion Process in Zero-shot Image Translation by Prompts Redescription and Beyond
[Website] [Project] [Code]

MagicQuill: An Intelligent Interactive Image Editing System
[Website] [Project] [Code]

Scaling Concept With Text-Guided Diffusion Models
[Website] [Project] [Code]

Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control
[Website] [Project] [Code]

FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow Models
[Website] [Project] [Code]

FastEdit: Fast Text-Guided Single-Image Editing via Semantic-Aware Diffusion Fine-Tuning
[Website] [Project] [Code]

Steering Rectified Flow Models in the Vector Field for Controlled Image Generation
[Website] [Project] [Code]

Delta Denoising Score
[Website] [Project] [Code]

InstantSwap: Fast Customized Concept Swapping across Sharp Shape Differences
[Website] [Project] [Code]

UniTune: Text-Driven Image Editing by Fine Tuning an Image Generation Model on a Single Image
[SIGGRAPH 2023] [Code]

Learning to Follow Object-Centric Image Editing Instructions Faithfully
[EMNLP 2023] [Code]

GroupDiff: Diffusion-based Group Portrait Editing
[ECCV 2024] [Code]

TiNO-Edit: Timestep and Noise Optimization for Robust Diffusion-Based Image Editing
[CVPR 2024] [Code]

ZONE: Zero-Shot Instruction-Guided Local Editing
[CVPR 2024] [Code]

Focus on Your Instruction: Fine-grained and Multi-instruction Image Editing by Attention Modulation
[CVPR 2024] [Code]

DreamSampler: Unifying Diffusion Sampling and Score Distillation for Image Manipulation
[ECCV 2024] [Code]

FlexiEdit: Frequency-Aware Latent Refinement for Enhanced Non-Rigid Editing
[ECCV 2024] [Code]

Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free Real Image Editing
[ECCV 2024] [Code]

Towards Efficient Diffusion-Based Image Editing with Instant Attention Masks
[AAAI 2024] [Code]

FISEdit: Accelerating Text-to-image Editing via Cache-enabled Sparse Diffusion Inference
[AAAI 2024] [Code]

Face Aging via Diffusion-based Editing
[BMVC 2023] [Code]

Uniform Attention Maps: Boosting Image Fidelity in Reconstruction and Editing
[Website] [Code]

FlexEdit: Marrying Free-Shape Masks to VLLM for Flexible Image Editing
[Website] [Code]

Specify and Edit: Overcoming Ambiguity in Text-Based Image Editing
[Website] [Code]

PostEdit: Posterior Sampling for Efficient Zero-Shot Image Editing
[Website] [Code]

DiT4Edit: Diffusion Transformer for Image Editing
[Website] [Code]

FramePainter: Endowing Interactive Image Editing with Video Diffusion Priors
[Website] [Code]

Move and Act: Enhanced Object Manipulation and Background Integrity for Image Editing
[Website] [Code]

EditWorld: Simulating World Dynamics for Instruction-Following Image Editing
[Website] [Code]

ClickDiffusion: Harnessing LLMs for Interactive Precise Image Editing
[Website] [Code]

Differential Diffusion: Giving Each Pixel Its Strength
[Website] [Code]

Ground-A-Score: Scaling Up the Score Distillation for Multi-Attribute Editing
[Website] [Code]

InstructDiffusion: A Generalist Modeling Interface for Vision Tasks
[Website] [Code]

Region-Aware Diffusion for Zero-shot Text-driven Image Editing
[Website] [Code]

Forgedit: Text Guided Image Editing via Learning and Forgetting
[Website] [Code]

AdapEdit: Spatio-Temporal Guided Adaptive Editing Algorithm for Text-Based Continuity-Sensitive Image Editing
[Website] [Code]

An Item is Worth a Prompt: Versatile Image Editing with Disentangled Control
[Website] [Code]

FreeDiff: Progressive Frequency Truncation for Image Editing with Diffusion Models
[Website] [Code]

Unified Diffusion-Based Rigid and Non-Rigid Editing with Text and Image Guidance
[Website] [Code]

SpecRef: A Fast Training-free Baseline of Specific Reference-Condition Real Image Editing
[Website] [Code]

FireFlow: Fast Inversion of Rectified Flow for Image Semantic Editing
[Website] [Code]

PromptFix: You Prompt and We Fix the Photo
[Website] [Code]

FBSDiff: Plug-and-Play Frequency Band Substitution of Diffusion Features for Highly Controllable Text-Driven Image Translation
[Website] [Code]

Conditional Score Guidance for Text-Driven Image-to-Image Translation
[NeurIPS 2023] [Website]

Emu Edit: Precise Image Editing via Recognition and Generation Tasks
[CVPR 2024] [Project]

ByteEdit: Boost, Comply and Accelerate Generative Image Editing
[ECCV 2024] [Project]

Watch Your Steps: Local Image and Scene Editing by Text Instructions
[ECCV 2024] [Project]

TurboEdit: Instant text-based image editing
[ECCV 2024] [Project]

Novel Object Synthesis via Adaptive Text-Image Harmony
[NeurIPS 2024] [Project]

Textualize Visual Prompt for Image Editing via Diffusion Bridge
[AAAI 2025] [Project]

UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics
[Website] [Project]

HeadRouter: A Training-free Image Editing Framework for MM-DiTs by Adaptively Routing Attention Heads
[Website] [Project]

MultiEdits: Simultaneous Multi-Aspect Editing with Text-to-Image Diffusion Models
[Website] [Project]

Object-level Visual Prompts for Compositional Image Generation
[Website] [Project]

Instruction-based Image Manipulation by Watching How Things Move
[Website] [Project]

BrushEdit: All-In-One Image Inpainting and Editing
[Website] [Project]

Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models
[Website] [Project]

FluxSpace: Disentangled Semantic Editing in Rectified Flow Transformers
[Website] [Project]

SeedEdit: Align Image Re-Generation to Image Editing
[Website] [Project]

Unified Editing of Panorama, 3D Scenes, and Videos Through Disentangled Self-Attention Injection
[Website] [Project]

Generative Image Layer Decomposition with Visual Effects
[Website] [Project]

Editable Image Elements for Controllable Synthesis
[Website] [Project]

SGEdit: Bridging LLM with Text2Image Generative Model for Scene Graph-based Image Editing
[Website] [Project]

SwiftEdit: Lightning Fast Text-Guided Image Editing via One-Step Diffusion
[Website] [Project]

ReGeneration Learning of Diffusion Models with Rich Prompts for Zero-Shot Image Translation
[Website] [Project]

UIP2P: Unsupervised Instruction-based Image Editing via Cycle Edit Consistency
[Website] [Project]

GANTASTIC: GAN-based Transfer of Interpretable Directions for Disentangled Image Editing in Text-to-Image Diffusion Models
[Website] [Project]

MoEController: Instruction-based Arbitrary Image Manipulation with Mixture-of-Expert Controllers
[Website] [Project]

FlexEdit: Flexible and Controllable Diffusion-based Object-centric Image Editing
[Website] [Project]

GeoDiffuser: Geometry-Based Image Editing with Diffusion Models
[Website] [Project]

SOEDiff: Efficient Distillation for Small Object Editing
[Website] [Project]

Click2Mask: Local Editing with Dynamic Mask Generation
[Website] [Project]

Stable Flow: Vital Layers for Training-Free Image Editing
[Website] [Project]

Iterative Multi-granular Image Editing using Diffusion Models
[WACV 2024]

Text-to-image Editing by Image Information Removal
[WACV 2024]

TexSliders: Diffusion-Based Texture Editing in CLIP Space
[SIGGRAPH 2024]

Custom-Edit: Text-Guided Image Editing with Customized Diffusion Models
[CVPR 2023 AI4CC Workshop]

Learning Feature-Preserving Portrait Editing from Generated Pairs
[Website]

EmoEdit: Evoking Emotions through Image Manipulation
[Website]

DM-Align: Leveraging the Power of Natural Language Instructions to Make Changes to Images
[Website]

LayerDiffusion: Layered Controlled Image Editing with Diffusion Models
[Website]

iEdit: Localised Text-guided Image Editing with Weak Supervision
[Website]

User-friendly Image Editing with Minimal Text Input: Leveraging Captioning and Injection Techniques
[Website]

PFB-Diff: Progressive Feature Blending Diffusion for Text-driven Image Editing
[Website]

PRedItOR: Text Guided Image Editing with Diffusion Prior
[Website]

FEC: Three Finetuning-free Methods to Enhance Consistency for Real Image Editing
[Website]

The Blessing of Randomness: SDE Beats ODE in General Diffusion-based Image Editing
[Website]

Image Translation as Diffusion Visual Programmers
[Website]

Latent Inversion with Timestep-aware Sampling for Training-free Non-rigid Editing
[Website]

LoMOE: Localized Multi-Object Editing via Multi-Diffusion
[Website]

Towards Understanding Cross and Self-Attention in Stable Diffusion for Text-Guided Image Editing
[Website]

DiffChat: Learning to Chat with Text-to-Image Synthesis Models for Interactive Image Creation
[Website]

InstructGIE: Towards Generalizable Image Editing
[Website]

LASPA: Latent Spatial Alignment for Fast Training-free Single Image Editing
[Website]

Uncovering the Text Embedding in Text-to-Image Diffusion Models
[Website]

Tuning-Free Adaptive Style Incorporation for Structure-Consistent Text-Driven Style Transfer
[Website]

Enhancing Text-to-Image Editing via Hybrid Mask-Informed Fusion
[Website]

Text Guided Image Editing with Automatic Concept Locating and Forgetting
[Website]

The Curious Case of End Token: A Zero-Shot Disentangled Image Editing using CLIP
[Website]

LIPE: Learning Personalized Identity Prior for Non-rigid Image Editing
[Website]

Achieving Complex Image Edits via Function Aggregation with Diffusion Models
[Website]

Prompt-Softbox-Prompt: A free-text Embedding Control for Image Editing
[Website]

InverseMeetInsert: Robust Real Image Editing via Geometric Accumulation Inversion in Guided Diffusion Models
[Website]

PixLens: A Novel Framework for Disentangled Evaluation in Diffusion-Based Image Editing with Object Detection + SAM
[Website]

Augmentation-Driven Metric for Balancing Preservation and Modification in Text-Guided Image Editing
[Website]

Vision-guided and Mask-enhanced Adaptive Denoising for Prompt-based Image Editing
[Website]

ERDDCI: Exact Reversible Diffusion via Dual-Chain Inversion for High-Quality Image Editing
[Website]

ReEdit: Multimodal Exemplar-Based Image Editing with Diffusion Models
[Website]

ColorEdit: Training-free Image-Guided Color editing with diffusion model
[Website]

GalaxyEdit: Large-Scale Image Editing Dataset with Enhanced Diffusion Adapter
[Website]

Unveil Inversion and Invariance in Flow Transformer for Versatile Image Editing
[Website]

Pathways on the Image Manifold: Image Editing via Video Generation
[Website]

LoRA of Change: Learning to Generate LoRA for the Editing Instruction from A Single Before-After Image Pair
[Website]

Action-based image editing guided by human instructions
[Website]

Addressing Attribute Leakages in Diffusion-based Image Editing without Training
[Website]

Prompt Augmentation for Self-supervised Text-guided Image Manipulation
[Website]

PixelMan: Consistent Object Editing with Diffusion Models via Pixel Manipulation and Generation
[Website]

Diffusion-Based Conditional Image Editing through Optimized Inference with Guidance
[Website]

Continual Learning

RGBD2: Generative Scene Synthesis via Incremental View Inpainting using RGBD Diffusion Models
[CVPR 2023] [Website] [Project] [Code]

Diffusion-Driven Data Replay: A Novel Approach to Combat Forgetting in Federated Class Continual Learning
[ECCV 2024 Oral] [Code]

How to Continually Adapt Text-to-Image Diffusion Models for Flexible Customization?
[NeurIPS 2024] [Code]

CLoG: Benchmarking Continual Learning of Image Generation Models
[Website] [Code]

Selective Amnesia: A Continual Learning Approach to Forgetting in Deep Generative Models
[Website] [Code]

Continual Learning of Diffusion Models with Generative Distillation
[Website] [Code]

Prompt-Based Exemplar Super-Compression and Regeneration for Class-Incremental Learning
[Website] [Code]

Continual Diffusion: Continual Customization of Text-to-Image Diffusion with C-LoRA
[TMLR] [Project]

Assessing Open-world Forgetting in Generative Image Model Customization
[Website] [Project]

Class-Incremental Learning using Diffusion Model for Distillation and Replay
[ICCV 2023 VCL workshop best paper]

Create Your World: Lifelong Text-to-Image Diffusion
[Website]

Low-Rank Continual Personalization of Diffusion Models
[Website]

Mining Your Own Secrets: Diffusion Classifier Scores for Continual Personalization of Text-to-Image Diffusion Models
[Website]

Online Continual Learning of Video Diffusion Models From a Single Video Stream
[Website]

Exploring Continual Learning of Diffusion Models
[Website]

DiracDiffusion: Denoising and Incremental Reconstruction with Assured Data-Consistency
[Website]

DiffusePast: Diffusion-based Generative Replay for Class Incremental Semantic Segmentation
[Website]

Continual Diffusion with STAMINA: STack-And-Mask INcremental Adapters
[Website]

Premonition: Using Generative Models to Preempt Future Data Changes in Continual Learning
[Website]

MuseumMaker: Continual Style Customization without Catastrophic Forgetting
[Website]

Towards Lifelong Few-Shot Customization of Text-to-Image Diffusion
[Website]

Remove Concept

Ablating Concepts in Text-to-Image Diffusion Models
[ICCV 2023] [Website] [Project] [Code]

Erasing Concepts from Diffusion Models
[ICCV 2023] [Website] [Project] [Code]

Paint by Inpaint: Learning to Add Image Objects by Removing Them First
[Website] [Project] [Code]

One-dimensional Adapter to Rule Them All: Concepts, Diffusion Models and Erasing Applications
[Website] [Project] [Code]

Editing Massive Concepts in Text-to-Image Diffusion Models
[Website] [Project] [Code]

Memories of Forgotten Concepts
[Website] [Project] [Code]

STEREO: Towards Adversarially Robust Concept Erasing from Text-to-Image Generation Models
[Website] [Project] [Code]

ACE++: Instruction-Based Image Creation and Editing via Context-Aware Content Filling
[Website] [Project] [Code]

ACE: All-round Creator and Editor Following Instructions via Diffusion Transformer
[Website] [Project] [Code]

ACE: Anti-Editing Concept Erasure in Text-to-Image Models
[Website] [Code]

Towards Safe Self-Distillation of Internet-Scale Text-to-Image Diffusion Models
[ICML 2023 workshop] [Code]

Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models
[ECCV 2024] [Code]

Safeguard Text-to-Image Diffusion Models with Human Feedback Inversion
[ECCV 2024] [Code]

Erasing Undesirable Concepts in Diffusion Models with Adversarial Preservation
[NeurIPS 2024] [Code]

Unveiling Concept Attribution in Diffusion Models
[Website] [Code]

TraSCE: Trajectory Steering for Concept Erasure
[Website] [Code]

Meta-Unlearning on Diffusion Models: Preventing Relearning Unlearned Concepts
[Website] [Code]

ObjectAdd: Adding Objects into Image via a Training-Free Diffusion Modification Fashion
[Website] [Code]

Forget-Me-Not: Learning to Forget in Text-to-Image Diffusion Models
[Website] [Code]

Defensive Unlearning with Adversarial Training for Robust Concept Erasure in Diffusion Models
[Website] [Code]

ConceptPrune: Concept Editing in Diffusion Models via Skilled Neuron Pruning
[Website] [Code]

Selective Amnesia: A Continual Learning Approach to Forgetting in Deep Generative Models
[Website] [Code]

DuMo: Dual Encoder Modulation Network for Precise Concept Erasure
[Website] [Code]

Add-SD: Rational Generation without Manual Reference
[Website] [Code]

RealEra: Semantic-level Concept Erasure via Neighbor-Concept Mining
[Website] [Project]

MACE: Mass Concept Erasure in Diffusion Models
[CVPR 2024]

EraseAnything: Enabling Concept Erasure in Rectified Flow Transformers
[Website]

Continuous Concepts Removal in Text-to-image Diffusion Models
[Website]

Safety Alignment Backfires: Preventing the Re-emergence of Suppressed Concepts in Fine-tuned Text-to-Image Diffusion Models
[Website]

Unstable Unlearning: The Hidden Risk of Concept Resurgence in Diffusion Models
[Website]

Direct Unlearning Optimization for Robust and Safe Text-to-Image Models
[Website]

Prompt Sliders for Fine-Grained Control, Editing and Erasing of Concepts in Diffusion Model
[Website]

Erasing Concepts from Text-to-Image Diffusion Models with Few-shot Unlearning
[Website]

Geom-Erasing: Geometry-Driven Removal of Implicit Concept in Diffusion Models
[Website]

Receler: Reliable Concept Erasing of Text-to-Image Diffusion Models via Lightweight Erasers
[Website]

All but One: Surgical Concept Erasing with Model Preservation in Text-to-Image Diffusion Models
[Website]

EraseDiff: Erasing Data Influence in Diffusion Models
[Website]

UnlearnCanvas: A Stylized Image Dataset to Benchmark Machine Unlearning for Diffusion Models
[Website]

Removing Undesirable Concepts in Text-to-Image Generative Models with Learnable Prompts
[Website]

R.A.C.E.: Robust Adversarial Concept Erasure for Secure Text-to-Image Diffusion Model
[Website]

Pruning for Robust Concept Erasing in Diffusion Models
[Website]

Efficient Fine-Tuning and Concept Suppression for Pruned Diffusion Models
[Website]

Unlearning Concepts from Text-to-Video Diffusion Models
[Website]

EIUP: A Training-Free Approach to Erase Non-Compliant Concepts Conditioned on Implicit Unsafe Prompts
[Website]

Holistic Unlearning Benchmark: A Multi-Faceted Evaluation for Text-to-Image Diffusion Model Unlearning
[Website]

Understanding the Impact of Negative Prompts: When and How Do They Take Effect?
[Website]

Model Integrity when Unlearning with T2I Diffusion Models
[Website]

Learning to Forget using Hypernetworks
[Website]

Precise, Fast, and Low-cost Concept Erasure in Value Space: Orthogonal Complement Matters
[Website]

AdvAnchor: Enhancing Diffusion Model Unlearning with Adversarial Anchors
[Website]

EraseBench: Understanding The Ripple Effects of Concept Erasure Techniques
[Website]

New Concept Learning

⭐⭐⭐DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation
[CVPR 2023 Honorable Mention] [Website] [Project] [Official Dataset] [Unofficial Code] [Diffusers Doc] [Diffusers Code]
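
After DreamBooth fine-tuning, the subject is addressed through the rare identifier token bound during training. A minimal diffusers-style inference sketch (the checkpoint path and the `sks` token below are placeholders, not the paper's release):

```python
import torch
from diffusers import StableDiffusionPipeline

# Path to a DreamBooth-fine-tuned checkpoint (placeholder).
pipe = StableDiffusionPipeline.from_pretrained(
    "path/to/dreambooth-output", torch_dtype=torch.float16
).to("cuda")

# "sks" stands in for the rare identifier bound to the subject at training time.
image = pipe("a photo of sks dog in a bucket", num_inference_steps=50).images[0]
image.save("sks_dog.png")
```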

⭐⭐⭐An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion
[ICLR 2023 top-25%] [Website] [Diffusers Doc] [Diffusers Code] [Code]
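
Textual Inversion ships as a small set of learned token embeddings that diffusers can attach to an existing pipeline. A short usage sketch (the model and concept IDs are just examples):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Attach a learned concept embedding; the prompt then uses its placeholder token.
pipe.load_textual_inversion("sd-concepts-library/cat-toy")
image = pipe("a photo of a <cat-toy> on a beach").images[0]
image.save("cat_toy.png")
```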

⭐⭐Custom Diffusion: Multi-Concept Customization of Text-to-Image Diffusion
[CVPR 2023] [Website] [Project] [Diffusers Doc] [Diffusers Code] [Code]

⭐⭐ColorPeel: Color Prompt Learning with Diffusion Models via Color and Shape Disentanglement
[ECCV 2024] [Project] [Code]

⭐⭐ReVersion: Diffusion-Based Relation Inversion from Images
[Website] [Project] [Code]

SINE: SINgle Image Editing with Text-to-Image Diffusion Models
[CVPR 2023] [Website] [Project] [Code]

Break-A-Scene: Extracting Multiple Concepts from a Single Image
[SIGGRAPH Asia 2023] [Project] [Code]

Concept Decomposition for Visual Exploration and Inspiration
[SIGGRAPH Asia 2023] [Project] [Code]

Cones: Concept Neurons in Diffusion Models for Customized Generation
[ICML 2023 Oral] [Website] [Code]

BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing
[NeurIPS 2023] [Website] [Project] [Code]

Inserting Anybody in Diffusion Models via Celeb Basis
[NeurIPS 2023] [Website] [Project] [Code]

Controlling Text-to-Image Diffusion by Orthogonal Finetuning
[NeurIPS 2023] [Website] [Project] [Code]

Photoswap: Personalized Subject Swapping in Images
[NeurIPS 2023] [Website] [Project] [Code]

Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models
[NeurIPS 2023] [Website] [Project] [Code]

ITI-GEN: Inclusive Text-to-Image Generation
[ICCV 2023 Oral] [Website] [Project] [Code]

Unsupervised Compositional Concepts Discovery with Text-to-Image Generative Models
[ICCV 2023] [Website] [Project] [Code]

ELITE: Encoding Visual Concepts into Textual Embeddings for Customized Text-to-Image Generation
[ICCV 2023 Oral] [Website] [Code]

A Neural Space-Time Representation for Text-to-Image Personalization
[SIGGRAPH Asia 2023] [Project] [Code]

Encoder-based Domain Tuning for Fast Personalization of Text-to-Image Models
[SIGGRAPH 2023] [Project] [Code]

Is This Loss Informative? Speeding Up Textual Inversion with Deterministic Objective Evaluation
[NeurIPS 2023] [Website] [Code]

ConceptExpress: Harnessing Diffusion Models for Single-image Unsupervised Concept Extraction
[ECCV 2024] [Project] [Code]

Face2Diffusion for Fast and Editable Face Personalization
[CVPR 2024] [Project] [Code]

Identity Decoupling for Multi-Subject Personalization of Text-to-Image Models
[CVPR 2024] [Project] [Code]

CapHuman: Capture Your Moments in Parallel Universes
[CVPR 2024] [Project] [Code]

Style Aligned Image Generation via Shared Attention
[CVPR 2024] [Project] [Code]

FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition
[CVPR 2024] [Project] [Code]

DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization
[CVPR 2024] [Project] [Code]

Material Palette: Extraction of Materials from a Single Image
[CVPR 2024] [Project] [Code]

Learning Continuous 3D Words for Text-to-Image Generation
[CVPR 2024] [Project] [Code]

ConceptBed: Evaluating Concept Learning Abilities of Text-to-Image Diffusion Models
[AAAI 2024] [Project] [Code]

Direct Consistency Optimization for Compositional Text-to-Image Personalization
[NeurIPS 2024] [Project] [Code]

The Hidden Language of Diffusion Models
[ICLR 2024] [Project] [Code]

ZeST: Zero-Shot Material Transfer from a Single Image
[ECCV 2024] [Project] [Code]

UniPortrait: A Unified Framework for Identity-Preserving Single- and Multi-Human Image Personalization
[Website] [Project] [Code]

MagicFace: Training-free Universal-Style Human Image Customized Synthesis
[Website] [Project] [Code]

LCM-Lookahead for Encoder-based Text-to-Image Personalization
[Website] [Project] [Code]

EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM
[Website] [Project] [Code]

AITTI: Learning Adaptive Inclusive Token for Text-to-Image Generation
[Website] [Project] [Code]

MS-Diffusion: Multi-subject Zero-shot Image Personalization with Layout Guidance
[Website] [Project] [Code]

ClassDiffusion: More Aligned Personalization Tuning with Explicit Class Guidance
[Website] [Project] [Code]

MasterWeaver: Taming Editability and Identity for Personalized Text-to-Image Generation
[Website] [Project] [Code]

Customizing Text-to-Image Models with a Single Image Pair
[Website] [Project] [Code]

DisEnvisioner: Disentangled and Enriched Visual Prompt for Customized Image Generation
[Website] [Project] [Code]

ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving
[Website] [Project] [Code]

ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning
[Website] [Project] [Code]

CharacterFactory: Sampling Consistent Characters with GANs for Diffusion Models
[Website] [Project] [Code]

Customizing Text-to-Image Diffusion with Camera Viewpoint Control
[Website] [Project] [Code]

Harmonizing Visual and Textual Embeddings for Zero-Shot Text-to-Image Customization
[Website] [Project] [Code]

StyleDrop: Text-to-Image Generation in Any Style
[Website] [Project] [Code]

Personalized Representation from Personalized Generation
[Website] [Project] [Code]

FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention
[Website] [Project] [Code]

AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning
[Website] [Project] [Code]

Subject-Diffusion: Open Domain Personalized Text-to-Image Generation without Test-time Fine-tuning
[Website] [Project] [Code]

Highly Personalized Text Embedding for Image Manipulation by Stable Diffusion
[Website] [Project] [Code]

MagicTailor: Component-Controllable Personalization in Text-to-Image Diffusion Models
[Website] [Project] [Code]

MagicNaming: Consistent Identity Generation by Finding a "Name Space" in T2I Diffusion Models
[Website] [Project] [Code]

DreamArtist: Towards Controllable One-Shot Text-to-Image Generation via Positive-Negative Prompt-Tuning
[Website] [Project] [Code]

SingleInsert: Inserting New Concepts from a Single Image into Text-to-Image Models for Flexible Editing
[Website] [Project] [Code]

CustomNet: Zero-shot Object Customization with Variable-Viewpoints in Text-to-Image Diffusion Models
[Website] [Project] [Code]

When StyleGAN Meets Stable Diffusion: a W+ Adapter for Personalized Image Generation
[Website] [Project] [Code]

InstantID: Zero-shot Identity-Preserving Generation in Seconds
[Website] [Project] [Code]

PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding
[Website] [Project] [Code]

Dense-Face: Personalized Face Generation Model via Dense Annotation Prediction
[Website] [Project] [Code]

CatVersion: Concatenating Embeddings for Diffusion-Based Text-to-Image Personalization
[Website] [Project] [Code]

DreamDistribution: Prompt Distribution Learning for Text-to-Image Diffusion Models
[Website] [Project] [Code]

Magic Mirror: ID-Preserved Video Generation in Video Diffusion Transformers
[Website] [Project] [Code]

λ-ECLIPSE: Multi-Concept Personalized Text-to-Image Diffusion Models by Leveraging CLIP Latent Space
[Website] [Project] [Code]

Viewpoint Textual Inversion: Unleashing Novel View Synthesis with Pretrained 2D Diffusion Models
[Website] [Project] [Code]

Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition
[Website] [Project] [Code]

StableIdentity: Inserting Anybody into Anywhere at First Sight
[Website] [Project] [Code]

DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized Diffusion Model
[Website] [Project] [Code]

TextBoost: Towards One-Shot Personalization of Text-to-Image Models via Fine-tuning Text Encoder
[Website] [Project] [Code]

EZIGen: Enhancing zero-shot subject-driven image generation with precise subject encoding and decoupled guidance
[Website] [Project] [Code]

OMG: Occlusion-friendly Personalized Multi-concept Generation in Diffusion Models
[Website] [Project] [Code]

MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation
[Website] [Project] [Code]

ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs
[Website] [Project] [Code]

CSGO: Content-Style Composition in Text-to-Image Generation
[Website] [Project] [Code]

DreamSteerer: Enhancing Source Image Conditioned Editability using Personalized Diffusion Models
[NeurIPS 2024] [Code]

Customized Generation Reimagined: Fidelity and Editability Harmonized
[ECCV 2024] [Code]

Powerful and Flexible: Personalized Text-to-Image Generation via Reinforcement Learning
[ECCV 2024] [Code]

High-fidelity Person-centric Subject-to-Image Synthesis
[CVPR 2024] [Code]

ProSpect: Expanded Conditioning for the Personalization of Attribute-aware Image Generation
[SIGGRAPH Asia 2023] [Code]

Multi-Class Textual-Inversion Secretly Yields a Semantic-Agnostic Classifier
[WACV 2025] [Code]

Multiresolution Textual Inversion
[NeurIPS 2022 workshop] [Code]

Compositional Inversion for Stable Diffusion Models
[AAAI 2024] [Code]

Decoupled Textual Embeddings for Customized Image Generation
[AAAI 2024] [Code]

DomainGallery: Few-shot Domain-driven Image Generation by Attribute-centric Finetuning
[NeurIPS 2024] [Code]

TweedieMix: Improving Multi-Concept Fusion for Diffusion-based Image/Video Generation
[Website] [Code]

Resolving Multi-Condition Confusion for Finetuning-Free Personalized Image Generation
[Website] [Code]

Concept Conductor: Orchestrating Multiple Personalized Concepts in Text-to-Image Synthesis
[Website] [Code]

RectifID: Personalizing Rectified Flow with Anchored Classifier Guidance
[Website] [Code]

PuLID: Pure and Lightning ID Customization via Contrastive Alignment
[Website] [Code]

Cross Initialization for Personalized Text-to-Image Generation
[Website] [Code]

Enhancing Detail Preservation for Customized Text-to-Image Generation: A Regularization-Free Approach
[Website] [Code]

SVDiff: Compact Parameter Space for Diffusion Fine-Tuning
[Website] [Code]

ViCo: Detail-Preserving Visual Condition for Personalized Text-to-Image Generation
[Website] [Code]

AerialBooth: Mutual Information Guidance for Text Controlled Aerial View Synthesis from a Single Image
[Website] [Code]

A Closer Look at Parameter-Efficient Tuning in Diffusion Models
[Website] [Code]

FaceChain-FACT: Face Adapter with Decoupled Training for Identity-preserved Personalization
[Website] [Code]

Controllable Textual Inversion for Personalized Text-to-Image Generation
[Website] [Code]

Cross-domain Compositing with Pretrained Diffusion Models
[Website] [Code]

Concept-centric Personalization with Large-scale Diffusion Priors
[Website] [Code]

Customization Assistant for Text-to-image Generation
[Website] [Code]

Cones 2: Customizable Image Synthesis with Multiple Subjects
[Website] [Code]

LoRA-Composer: Leveraging Low-Rank Adaptation for Multi-Concept Customization in Training-Free Diffusion Models
[Website] [Code]

AttenCraft: Attention-guided Disentanglement of Multiple Concepts for Text-to-Image Customization
[Website] [Code]

PersonaMagic: Stage-Regulated High-Fidelity Face Customization with Tandem Equilibrium
[Website] [Code]

CusConcept: Customized Visual Concept Decomposition with Diffusion Models
[Website] [Code]

HybridBooth: Hybrid Prompt Inversion for Efficient Subject-Driven Generation
[ECCV 2024] [Project]

Language-Informed Visual Concept Learning
[ICLR 2024] [Project]

Key-Locked Rank One Editing for Text-to-Image Personalization
[SIGGRAPH 2023] [Project]

Diffusion in Style
[ICCV 2023] [Project]

RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization
[CVPR 2024] [Project]

RealCustom++: Representing Images as Real-Word for Real-Time Customization
[Website] [Project]

Personalized Residuals for Concept-Driven Text-to-Image Generation
[CVPR 2024] [Project]

LogoSticker: Inserting Logos into Diffusion Models for Customized Generation
[ECCV 2024] [Project]

Diffusion Self-Distillation for Zero-Shot Customized Image Generation
[Website] [Project]

Multi-subject Open-set Personalization in Video Generation
[Website] [Project]

RelationBooth: Towards Relation-Aware Customized Object Generation
[Website] [Project]

LoRACLR: Contrastive Adaptation for Customization of Diffusion Models
[Website] [Project]

InstructBooth: Instruction-following Personalized Text-to-Image Generation
[Website] [Project]

AttnDreamBooth: Towards Text-Aligned Personalized Text-to-Image Generation
[Website] [Project]

MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation
[Website] [Project]

ObjectMate: A Recurrence Prior for Object Insertion and Subject-Driven Generation
[Website] [Project]

PortraitBooth: A Versatile Portrait Model for Fast Identity-preserved Personalization
[Website] [Project]

Subject-driven Text-to-Image Generation via Apprenticeship Learning
[Website] [Project]

Orthogonal Adaptation for Modular Customization of Diffusion Models
[Website] [Project]

Diffusion in Diffusion: Cyclic One-Way Diffusion for Text-Vision-Conditioned Generation
[Website] [Project]

Nested Attention: Semantic-aware Attention Values for Concept Personalization
[Website] [Project]

HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models
[Website] [Project]

SUGAR: Subject-Driven Video Customization in a Zero-Shot Manner
[Website] [Project]

Domain-Agnostic Tuning-Encoder for Fast Personalization of Text-To-Image Models
[Website] [Project]

$P+$: Extended Textual Conditioning in Text-to-Image Generation
[Website] [Project]

PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models
[Website] [Project]

InstantBooth: Personalized Text-to-Image Generation without Test-Time Finetuning
[Website] [Project]

Total Selfie: Generating Full-Body Selfies
[Website] [Project]

PersonalVideo: High ID-Fidelity Video Customization without Dynamic and Semantic Degradation
[Website] [Project]

DreamTuner: Single Image is Enough for Subject-Driven Generation
[Website] [Project]

SerialGen: Personalized Image Generation by First Standardization Then Personalization
[Website] [Project]

ConceptMaster: Multi-Concept Video Customization on Diffusion Transformer Models Without Test-Time Tuning
[Website] [Project]

PALP: Prompt Aligned Personalization of Text-to-Image Models
[Website] [Project]

TextureDreamer: Image-guided Texture Synthesis through Geometry-aware Diffusion
[CVPR 2024] [Project]

Visual Style Prompting with Swapping Self-Attention
[Website] [Project]

Infinite-ID: Identity-preserved Personalization via ID-semantics Decoupling Paradigm
[Website] [Project]

Non-confusing Generation of Customized Concepts in Diffusion Models
[Website] [Project]

Hollowed Net for On-Device Personalization of Text-to-Image Diffusion Models
[NeurIPS 2024]

ComFusion: Personalized Subject Generation in Multiple Specific Scenes From Single Image
[ECCV 2024]

Concept Weaver: Enabling Multi-Concept Fusion in Text-to-Image Models
[CVPR 2024]

JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation
[CVPR 2024]

DreamStyler: Paint by Style Inversion with Text-to-Image Diffusion Models
[AAAI 2024]

FreeTuner: Any Subject in Any Style with Training-free Diffusion
[Website]

Towards Prompt-robust Face Privacy Protection via Adversarial Decoupling Augmentation Framework
[Website]

InstaStyle: Inversion Noise of a Stylized Image is Secretly a Style Adviser
[Website]

DisenBooth: Disentangled Parameter-Efficient Tuning for Subject-Driven Text-to-Image Generation
[Website]

Taming Encoder for Zero Fine-tuning Image Customization with Text-to-Image Diffusion Models
[Website]

Gradient-Free Textual Inversion
[Website]

Identity Encoder for Personalized Diffusion
[Website]

Unified Multi-Modal Latent Diffusion for Joint Subject and Text Conditional Image Generation
[Website]

ELODIN: Naming Concepts in Embedding Spaces
[Website]

Generate Anything Anywhere in Any Scene
[Website]

Paste, Inpaint and Harmonize via Denoising: Subject-Driven Image Editing with Pre-Trained Diffusion Model
[Website]

Face0: Instantaneously Conditioning a Text-to-Image Model on a Face
[Website]

MagiCapture: High-Resolution Multi-Concept Portrait Customization
[Website]

A Data Perspective on Enhanced Identity Preservation for Diffusion Personalization
[Website]

DIFFNAT: Improving Diffusion Image Quality Using Natural Image Statistics
[Website]

An Image is Worth Multiple Words: Multi-attribute Inversion for Constrained Text-to-Image Synthesis
[Website]

Lego: Learning to Disentangle and Invert Concepts Beyond Object Appearance in Text-to-Image Diffusion Models
[Website]

Memory-Efficient Personalization using Quantized Diffusion Model
[Website]

BootPIG: Bootstrapping Zero-shot Personalized Image Generation Capabilities in Pretrained Diffusion Models
[Website]

Pick-and-Draw: Training-free Semantic Guidance for Text-to-Image Personalization
[Website]

Object-Driven One-Shot Fine-tuning of Text-to-Image Diffusion with Prototypical Embedding
[Website]

SeFi-IDE: Semantic-Fidelity Identity Embedding for Personalized Diffusion-Based Generation
[Website]

Visual Concept-driven Image Generation with Text-to-Image Diffusion Model
[Website]

IDAdapter: Learning Mixed Features for Tuning-Free Personalization of Text-to-Image Models
[Website]

MM-Diff: High-Fidelity Image Personalization via Multi-Modal Condition Integration
[Website]

DreamSalon: A Staged Diffusion Framework for Preserving Identity-Context in Editable Face Generation
[Website]

OneActor: Consistent Character Generation via Cluster-Conditioned Guidance
[Website]

StyleMaster: Towards Flexible Stylized Image Generation with Diffusion Models
[Website]

Exploring Diffusion Models' Corruption Stage in Few-Shot Fine-tuning and Mitigating with Bayesian Neural Networks
[Website]

Inv-Adapter: ID Customization Generation via Image Inversion and Lightweight Adapter
[Website]

PaRa: Personalizing Text-to-Image Diffusion via Parameter Rank Reduction
[Website]

AlignIT: Enhancing Prompt Alignment in Customization of Text-to-Image Models
[Website]

Layout-and-Retouch: A Dual-stage Framework for Improving Diversity in Personalized Image Generation
[Website]

PreciseControl: Enhancing Text-To-Image Diffusion Models with Fine-Grained Attribute Control
[Website]

MagicID: Flexible ID Fidelity Generation System
[Website]

CoRe: Context-Regularized Text Embedding Learning for Text-to-Image Personalization
[Website]

ArtiFade: Learning to Generate High-quality Subject from Blemished Images
[Website]

CustomContrast: A Multilevel Contrastive Perspective For Subject-Driven Text-to-Image Customization
[Website]

Fusion is all you need: Face Fusion for Customized Identity-Preserving Image Synthesis
[Website]

Event-Customized Image Generation
[Website]

Learning to Customize Text-to-Image Diffusion in Diverse Context
[Website]

HYPNOS: Highly Precise Foreground-focused Diffusion Finetuning for Inanimate Objects
[Website]

Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator
[Website]

Foundation Cures Personalization: Recovering Facial Personalized Models' Prompt Consistency
[Website]

Self-Cross Diffusion Guidance for Text-to-Image Synthesis of Similar Subjects
[Website]

DreamBlend: Advancing Personalized Fine-tuning of Text-to-Image Diffusion Models
[Website]

RealisID: Scale-Robust and Fine-Controllable Identity Customization via Local and Global Complementation
[Website]

P3S-Diffusion: A Selective Subject-driven Generation Framework via Point Supervision
[Website]

PersonaHOI: Effortlessly Improving Personalized Face with Human-Object Interaction Generation
[Website]

T2I Diffusion Model augmentation

⭐⭐⭐Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models
[SIGGRAPH 2023] [Project] [Official Code] [Diffusers Code] [Diffusers doc] [Replicate Demo]
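
Attend-and-Excite is integrated in diffusers as its own pipeline; at inference you pass the indices of the subject tokens whose cross-attention maps should be strengthened. A short sketch:

```python
import torch
from diffusers import StableDiffusionAttendAndExcitePipeline

pipe = StableDiffusionAttendAndExcitePipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

prompt = "a cat and a frog"
# Indices of "cat" and "frog" in the tokenized prompt;
# pipe.get_indices(prompt) prints the token-to-index mapping.
image = pipe(prompt, token_indices=[2, 5], guidance_scale=7.5).images[0]
image.save("cat_and_frog.png")
```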

SEGA: Instructing Diffusion using Semantic Dimensions
[NeurIPS 2023] [Website] [Code] [Diffusers Code] [Diffusers Doc]

Improving Sample Quality of Diffusion Models Using Self-Attention Guidance
[ICCV 2023] [Website] [Project] [Code Official] [Diffusers Doc] [Diffusers Code]
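
Self-attention guidance is likewise available as a diffusers pipeline; `sag_scale` controls its strength (0 disables it). A short sketch:

```python
import torch
from diffusers import StableDiffusionSAGPipeline

pipe = StableDiffusionSAGPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

# sag_scale > 0 adds self-attention guidance on top of classifier-free guidance.
image = pipe("a photo of an astronaut riding a horse", sag_scale=0.75).images[0]
image.save("astronaut_sag.png")
```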

Expressive Text-to-Image Generation with Rich Text
[ICCV 2023] [Website] [Project] [Code] [Demo]

Editing Implicit Assumptions in Text-to-Image Diffusion Models
[ICCV 2023] [Website] [Project] [Code] [Demo]

ElasticDiffusion: Training-free Arbitrary Size Image Generation
[CVPR 2024] [Project] [Code] [Demo]

MagicFusion: Boosting Text-to-Image Generation Performance by Fusing Diffusion Models
[ICCV 2023] [Website] [Project] [Code]

Discriminative Class Tokens for Text-to-Image Diffusion Models
[ICCV 2023] [Website] [Project] [Code]

Compositional Visual Generation with Composable Diffusion Models
[ECCV 2022] [Website] [Project] [Code]

DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
[ICCV 2023] [Project] [Code] [Blog]

Diffusion Self-Guidance for Controllable Image Generation
[NeurIPS 2023] [Website] [Project] [Code]

ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation
[NeurIPS 2023] [Website] [Code]

DiffSketcher: Text Guided Vector Sketch Synthesis through Latent Diffusion Models
[NeurIPS 2023] [Website] [Code]

Linguistic Binding in Diffusion Models: Enhancing Attribute Correspondence through Attention Map Alignment
[NeurIPS 2023] [Website] [Code]

DemoFusion: Democratising High-Resolution Image Generation With No $$$
[CVPR 2024] [Project] [Code]

Towards Effective Usage of Human-Centric Priors in Diffusion Models for Text-based Human Image Generation
[CVPR 2024] [Project] [Code]

Training Diffusion Models with Reinforcement Learning
[ICLR 2024] [Project] [Code]

Divide & Bind Your Attention for Improved Generative Semantic Nursing
[BMVC 2023 Oral] [Project] [Code]

Make It Count: Text-to-Image Generation with an Accurate Number of Objects
[Website] [Project] [Code]

OmniBooth: Learning Latent Control for Image Synthesis with Multi-modal Instruction
[Website] [Project] [Code]

Margin-aware Preference Optimization for Aligning Diffusion Models without Reference
[Website] [Project] [Code]

Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step
[Website] [Project] [Code]

Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation
[Website] [Project] [Code]

MoLE: Enhancing Human-centric Text-to-image Diffusion via Mixture of Low-rank Experts
[Website] [Project] [Code]

Efficient Diversity-Preserving Diffusion Alignment via Gradient-Informed GFlowNets
[Website] [Project] [Code]

CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching
[Website] [Project] [Code]

Continuous, Subject-Specific Attribute Control in T2I Models by Identifying Semantic Directions
[Website] [Project] [Code]

Self-Rectifying Diffusion Sampling with Perturbed-Attention Guidance
[Website] [Project] [Code]

Real-World Image Variation by Aligning Diffusion Inversion Chain
[Website] [Project] [Code]

FreeU: Free Lunch in Diffusion U-Net
[Website] [Project] [Code]

GraPE: A Generate-Plan-Edit Framework for Compositional T2I Synthesis
[Website] [Project] [Code]

ConceptLab: Creative Generation using Diffusion Prior Constraints
[Website] [Project] [Code]

Aligning Text-to-Image Diffusion Models with Reward Backpropagation
[Website] [Project] [Code]

Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models
[Website] [Project] [Code]

VMix: Improving Text-to-Image Diffusion Model with Cross-Attention Mixing Control
[Website] [Project] [Code]

Tiled Diffusion
[Website] [Project] [Code]

ScaleCrafter: Tuning-free Higher-Resolution Visual Generation with Diffusion Models
[Website] [Project] [Code]

One More Step: A Versatile Plug-and-Play Module for Rectifying Diffusion Schedule Flaws and Enhancing Low-Frequency Controls
[Website] [Project] [Code]

TokenCompose: Grounding Diffusion with Token-level Supervision
[Website] [Project] [Code]

DiffusionGPT: LLM-Driven Text-to-Image Generation System
[Website] [Project] [Code]

Decompose and Realign: Tackling Condition Misalignment in Text-to-Image Diffusion Models
[Website] [Project] [Code]

Taiyi-Diffusion-XL: Advancing Bilingual Text-to-Image Generation with Large Vision-Language Model Support
[Website] [Project] [Code]

ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations
[Website] [Project] [Code]

MuLan: Multimodal-LLM Agent for Progressive Multi-Object Diffusion
[Website] [Project] [Code]

ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models
[Website] [Project] [Code]

Stylus: Automatic Adapter Selection for Diffusion Models
[Website] [Project] [Code]

MaxFusion: Plug&Play Multi-Modal Generation in Text-to-Image Diffusion Models
[Website] [Project] [Code]

Negative Token Merging: Image-based Adversarial Feature Guidance
[Website] [Project] [Code]

Iterative Object Count Optimization for Text-to-image Diffusion Models
[Website] [Project] [Code]

ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
[Website] [Project] [Code]

HiPrompt: Tuning-free Higher-Resolution Generation with Hierarchical MLLM Prompts
[Website] [Project] [Code]

Omegance: A Single Parameter for Various Granularities in Diffusion-Based Synthesis
[Website] [Project] [Code]

TheaterGen: Character Management with LLM for Consistent Multi-turn Image Generation
[Website] [Project] [Code]

SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models
[ACM MM 2023 Oral] [Code]

Get What You Want, Not What You Don't: Image Content Suppression for Text-to-Image Diffusion Models
[ICLR 2024] [Code]

Token Merging for Training-Free Semantic Binding in Text-to-Image Synthesis
[NeurIPS 2024] [Code]

Dynamic Prompt Optimizing for Text-to-Image Generation
[CVPR 2024] [Code]

Tackling the Singularities at the Endpoints of Time Intervals in Diffusion Models
[CVPR 2024] [Code]

Rethinking the Spatial Inconsistency in Classifier-Free Diffusion Guidance
[CVPR 2024] [Code]

InitNO: Boosting Text-to-Image Diffusion Models via Initial Noise Optimization
[CVPR 2024] [Code]

Object-Conditioned Energy-Based Attention Map Alignment in Text-to-Image Diffusion Models
[ECCV 2024] [Code]

On Discrete Prompt Optimization for Diffusion Models
[ICML 2024] [Code]

Magnet: We Never Know How Text-to-Image Diffusion Models Work, Until We Learn How Vision-Language Models Function
[NeurIPS 2024] [Code]

Embedding an Ethical Mind: Aligning Text-to-Image Synthesis via Lightweight Value Optimization
[ACM MM 2024] [Code]

DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models
[NeurIPS 2023] [Code]

Diffusion Model Alignment Using Direct Preference Optimization
[Website] [Code]

SePPO: Semi-Policy Preference Optimization for Diffusion Alignment
[Website] [Code]

Bridging the Gap: Aligning Text-to-Image Diffusion Models with Specific Feedback
[Website] [Code]

Zigzag Diffusion Sampling: The Path to Success Is Zigzag
[Website] [Code]

Prompt-Consistency Image Generation (PCIG): A Unified Framework Integrating LLMs, Knowledge Graphs, and Controllable Diffusion Models
[Website] [Code]

Progressive Compositionality In Text-to-Image Generative Models
[Website] [Code]

Improving Long-Text Alignment for Text-to-Image Diffusion Models
[Website] [Code]

Diffusion-RPO: Aligning Diffusion Models through Relative Preference Optimization
[Website] [Code]

RealisHuman: A Two-Stage Approach for Refining Malformed Human Parts in Generated Images
[Website] [Code]

Aggregation of Multi Diffusion Models for Enhancing Learned Representations
[Website] [Code]

AID: Attention Interpolation of Text-to-Image Diffusion
[Website] [Code]

Rare-to-Frequent: Unlocking Compositional Generation Power of Diffusion Models on Rare Concepts with LLM Guidance
[Website] [Code]

FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis
[Website] [Code]

ORES: Open-vocabulary Responsible Visual Synthesis
[Website] [Code]

Alignment without Over-optimization: Training-Free Solution for Diffusion Models
[Website] [Code]

Fair Diffusion: Instructing Text-to-Image Generation Models on Fairness
[Website] [Code]

Understanding and Mitigating Compositional Issues in Text-to-Image Generative Models
[Website] [Code]

IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation
[Website] [Code]

InstructG2I: Synthesizing Images from Multimodal Attributed Graphs
[Website] [Code]

Detector Guidance for Multi-Object Text-to-Image Generation
[Website] [Code]

Designing a Better Asymmetric VQGAN for StableDiffusion
[Website] [Code]

FABRIC: Personalizing Diffusion Models with Iterative Feedback
[Website] [Code]

Prompt-Free Diffusion: Taking "Text" out of Text-to-Image Diffusion Models
[Website] [Code]

Progressive Text-to-Image Diffusion with Soft Latent Direction
[Website] [Code]

Hypernymy Understanding Evaluation of Text-to-Image Models via WordNet Hierarchy
[Website] [Code]

TraDiffusion: Trajectory-Based Training-Free Image Generation
[Website] [Code]

If at First You Don’t Succeed, Try, Try Again: Faithful Diffusion-based Text-to-Image Generation by Selection
[Website] [Code]

CoMPaSS: Enhancing Spatial Understanding in Text-to-Image Diffusion Models
[Website] [Code]

LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts
[Website] [Code]

A General Framework for Inference-time Scaling and Steering of Diffusion Models
[Website] [Code]

Making Multimodal Generation Easier: When Diffusion Models Meet LLMs
[Website] [Code]

Enhancing Diffusion Models with Text-Encoder Reinforcement Learning
[Website] [Code]

AltDiffusion: A Multilingual Text-to-Image Diffusion Model
[Website] [Code]

It is all about where you start: Text-to-image generation with seed selection
[Website] [Code]

End-to-End Diffusion Latent Optimization Improves Classifier Guidance
[Website] [Code]

ReNeg: Learning Negative Embedding with Reward Guidance
[Website] [Code]

Correcting Diffusion Generation through Resampling
[Website] [Code]

Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs
[Website] [Code]

Enhancing MMDiT-Based Text-to-Image Models for Similar Subject Generation
[Website] [Code]

A User-Friendly Framework for Generating Model-Preferred Prompts in Text-to-Image Synthesis
[Website] [Code]

PromptCharm: Text-to-Image Generation through Multi-modal Prompting and Refinement
[Website] [Code]

Enhancing Semantic Fidelity in Text-to-Image Synthesis: Attention Regulation in Diffusion Models
[Website] [Code]

Aligning Few-Step Diffusion Models with Dense Reward Difference Learning
[Website] [Code]

LightIt: Illumination Modeling and Control for Diffusion Models
[CVPR 2024] [Project]

Adapting Diffusion Models for Improved Prompt Compliance and Controllable Image Synthesis
[NeurIPS 2024] [Project]

Context Canvas: Enhancing Text-to-Image Diffusion Models with Knowledge Graph-Based RAG
[Website] [Project]

Scalable Ranked Preference Optimization for Text-to-Image Generation
[Website] [Project]

A Noise is Worth Diffusion Guidance
[Website] [Project]

LayerFusion: Harmonized Multi-Layer Text-to-Image Generation with Generative Priors
[Website] [Project]

ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation
[Website] [Project]

LLM4GEN: Leveraging Semantic Representation of LLMs for Text-to-Image Generation
[Website] [Project]

MotiF: Making Text Count in Image Animation with Motion Focal Loss
[Website] [Project]

RefDrop: Controllable Consistency in Image or Video Generation via Reference Feature Guidance
[Website] [Project]

UniFL: Improve Stable Diffusion via Unified Feedback Learning
[Website] [Project]

Generative Photography: Scene-Consistent Camera Control for Realistic Text-to-Image Synthesis
[Website] [Project]

ChatGen: Automatic Text-to-Image Generation From FreeStyle Chatting
[Website] [Project]

Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation
[Website] [Project]

Semantic Guidance Tuning for Text-To-Image Diffusion Models
[Website] [Project]

Amazing Combinatorial Creation: Acceptable Swap-Sampling for Text-to-Image Generation
[Website] [Project]

Image Anything: Towards Reasoning-coherent and Training-free Multi-modal Image Generation
[Website] [Project]

DyMO: Training-Free Diffusion Model Alignment with Dynamic Multi-Objective Scheduling
[Website] [Project]

Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation
[Website] [Project]

FineDiffusion: Scaling up Diffusion Models for Fine-grained Image Generation with 10,000 Classes
[Website] [Project]

Lazy Diffusion Transformer for Interactive Image Editing
[Website] [Project]

Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis
[Website] [Project]

Concept Arithmetics for Circumventing Concept Inhibition in Diffusion Models
[Website] [Project]

Norm-guided latent space exploration for text-to-image generation
[NeurIPS 2023] [Website]

Improving Diffusion-Based Image Synthesis with Context Prediction
[NeurIPS 2023] [Website]

GarmentAligner: Text-to-Garment Generation via Retrieval-augmented Multi-level Corrections
[ECCV 2024]

MultiGen: Zero-shot Image Generation from Multi-modal Prompt
[ECCV 2024]

On Mechanistic Knowledge Localization in Text-to-Image Generative Models
[ICML 2024]

Scene Graph Disentanglement and Composition for Generalizable Complex Image Generation
[NeurIPS 2024]

Generating Compositional Scenes via Text-to-image RGBA Instance Generation
[NeurIPS 2024]

A Cat Is A Cat (Not A Dog!): Unraveling Information Mix-ups in Text-to-Image Encoders through Causal Analysis and Embedding Optimization
[Website]

PROUD: PaRetO-gUided Diffusion Model for Multi-objective Generation
[Website]

Exposure Diffusion: HDR Image Generation by Consistent LDR denoising
[Website]

Information Theoretic Text-to-Image Alignment
[Website]

Diffscaler: Enhancing the Generative Prowess of Diffusion Transformers
[Website]

Object-Attribute Binding in Text-to-Image Generation: Evaluation and Control
[Website]

Aligning Diffusion Models by Optimizing Human Utility
[Website]

Instruct-Imagen: Image Generation with Multi-modal Instruction
[Website]

CONFORM: Contrast is All You Need For High-Fidelity Text-to-Image Diffusion Models
[Website]

MaskDiffusion: Boosting Text-to-Image Consistency with Conditional Mask
[Website]

Any-Size-Diffusion: Toward Efficient Text-Driven Synthesis for Any-Size HD Images
[Website]

Text2Layer: Layered Image Generation using Latent Diffusion Model
[Website]

Stimulating the Diffusion Model for Image Denoising via Adaptive Embedding and Ensembling
[Website]

A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Generation
[Website]

UNIMO-G: Unified Image Generation through Multimodal Conditional Diffusion
[Website]

Improving Compositional Text-to-image Generation with Large Vision-Language Models
[Website]

Multi-Concept T2I-Zero: Tweaking Only The Text Embeddings and Nothing Else
[Website]

Unseen Image Synthesis with Diffusion Models
[Website]

AnyLens: A Generative Diffusion Model with Any Rendering Lens
[Website]

Seek for Incantations: Towards Accurate Text-to-Image Diffusion Synthesis through Prompt Engineering
[Website]

Text2Street: Controllable Text-to-image Generation for Street Views
[Website]

Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation
[Website]

Contrastive Prompts Improve Disentanglement in Text-to-Image Diffusion Model
[Website]

Debiasing Text-to-Image Diffusion Models
[Website]

Stochastic Conditional Diffusion Models for Semantic Image Synthesis
[Website]

Referee Can Play: An Alternative Approach to Conditional Generation via Model Inversion
[Website]

Transparent Image Layer Diffusion using Latent Transparency
[Website]

Playground v2.5: Three Insights towards Enhancing Aesthetic Quality in Text-to-Image Generation
[Website]

HanDiffuser: Text-to-Image Generation With Realistic Hand Appearances
[Website]

StereoDiffusion: Training-Free Stereo Image Generation Using Latent Diffusion Models
[Website]

Make Me Happier: Evoking Emotions Through Image Diffusion Models
[Website]

Zippo: Zipping Color and Transparency Distributions into a Single Diffusion Model
[Website]

LayerDiff: Exploring Text-guided Multi-layered Composable Image Synthesis via Layer-Collaborative Diffusion Model
[Website]

AGFSync: Leveraging AI-Generated Feedback for Preference Optimization in Text-to-Image Generation
[Website]

U-Sketch: An Efficient Approach for Sketch to Image Diffusion Models
[Website]

ECNet: Effective Controllable Text-to-Image Diffusion Models
[Website]

TextCraftor: Your Text Encoder Can be Image Quality Controller
[Website]

Prompt Optimizer of Text-to-Image Diffusion Models for Abstract Concept Understanding
[Website]

Towards Better Text-to-Image Generation Alignment via Attention Modulation
[Website]

Towards Understanding the Working Mechanism of Text-to-Image Diffusion Model
[Website]

SG-Adapter: Enhancing Text-to-Image Generation with Scene Graph Guidance
[Website]

Improving Geo-diversity of Generated Images with Contextualized Vendi Score Guidance
[Website]

Lost in Translation: Latent Concept Misalignment in Text-to-Image Diffusion Models
[Website]

FRAP: Faithful and Realistic Text-to-Image Generation with Adaptive Prompt Weighting
[Website]

Foodfusion: A Novel Approach for Food Image Composition via Diffusion Models
[Website]

SPDiffusion: Semantic Protection Diffusion for Multi-concept Text-to-image Generation
[Website]

Training-Free Sketch-Guided Diffusion with Latent Optimization
[Website]

Tuning Timestep-Distilled Diffusion Model Using Pairwise Sample Optimization
[Website]

Sparse Repellency for Shielded Generation in Text-to-image Diffusion Models
[Website]

Training-free Diffusion Model Alignment with Sampling Demons
[Website]

MinorityPrompt: Text to Minority Image Generation via Prompt Optimization
[Website]

Automated Filtering of Human Feedback Data for Aligning Text-to-Image Diffusion Models
[Website]

Saliency Guided Optimization of Diffusion Latents
[Website]

Preference Optimization with Multi-Sample Comparisons
[Website]

CtrlSynth: Controllable Image Text Synthesis for Data-Efficient Multimodal Learning
[Website]

Redefining in Dictionary: Towards an Enhanced Semantic Understanding of Creative Generation
[Website]

Investigating Conceptual Blending of a Diffusion Model for Improving Nonword-to-Image Generation
[Website]

Improving image synthesis with diffusion-negative sampling
[Website]

Golden Noise for Diffusion Models: A Learning Framework
[Website]

Test-time Conditional Text-to-Image Synthesis Using Diffusion Models
[Website]

Decoupling Training-Free Guided Diffusion by ADMM
[Website]

Text Embedding is Not All You Need: Attention Control for Text-to-Image Semantic Alignment with Text Self-Attention Maps
[Website]

Noise Diffusion for Enhancing Semantic Faithfulness in Text-to-Image Synthesis
[Website]

TKG-DM: Training-free Chroma Key Content Generation Diffusion Model
[Website]

Unlocking the Potential of Text-to-Image Diffusion with PAC-Bayesian Theory
[Website]

CoCoNO: Attention Contrast-and-Complete for Initial Noise Optimization in Text-to-Image Synthesis
[Website]

Reward Incremental Learning in Text-to-Image Generation
[Website]

QUOTA: Quantifying Objects with Text-to-Image Models for Any Domain
[Website]

Enhancing Compositional Text-to-Image Generation with Reliable Random Seeds
[Website]

Cross-Attention Head Position Patterns Can Align with Human Visual Concepts in Text-to-Image Generative Models
[Website]

The Silent Prompt: Initial Noise as Implicit Guidance for Goal-Driven Image Generation
[Website]

ASGDiffusion: Parallel High-Resolution Generation with Asynchronous Structure Guidance
[Website]

Visual Lexicon: Rich Image Features in Language Space
[Website]

BudgetFusion: Perceptually-Guided Adaptive Diffusion Models
[Website]

ArtAug: Enhancing Text-to-Image Generation through Synthesis-Understanding Interaction
[Website]

TextMatch: Enhancing Image-Text Consistency Through Multimodal Optimization
[Website]

Personalized Preference Fine-tuning of Diffusion Models
[Website]

Focus-N-Fix: Region-Aware Fine-Tuning for Text-to-Image Generation
[Website]

Spatial Control

MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation
[ICML 2023] [Website] [Project] [Code] [Diffusers Code] [Diffusers Doc] [Replicate Demo]
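
MultiDiffusion's fused sampling paths are exposed in diffusers as the panorama pipeline: overlapping latent windows are denoised separately and averaged into one wide canvas at every step. A short sketch:

```python
import torch
from diffusers import StableDiffusionPanoramaPipeline, DDIMScheduler

model_id = "stabilityai/stable-diffusion-2-base"
scheduler = DDIMScheduler.from_pretrained(model_id, subfolder="scheduler")
pipe = StableDiffusionPanoramaPipeline.from_pretrained(
    model_id, scheduler=scheduler, torch_dtype=torch.float16
).to("cuda")

# Overlapping latent windows are denoised and fused at each step.
image = pipe("a photo of the dolomites", width=2048).images[0]
image.save("dolomites_panorama.png")
```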

SceneComposer: Any-Level Semantic Image Synthesis
[CVPR 2023 Highlight] [Website] [Project] [Code]

GLIGEN: Open-Set Grounded Text-to-Image Generation
[CVPR 2023] [Website] [Code] [Demo]
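
GLIGEN's grounded generation is integrated in diffusers: each phrase is paired with a normalized [x0, y0, x1, y1] box. A short sketch (the checkpoint ID is the community conversion used in the diffusers docs):

```python
import torch
from diffusers import StableDiffusionGLIGENPipeline

pipe = StableDiffusionGLIGENPipeline.from_pretrained(
    "masterful/gligen-1-4-generation-text-box", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="a waterfall and a modern high speed train in a beautiful forest",
    gligen_phrases=["a waterfall", "a modern high speed train"],
    # Normalized [x0, y0, x1, y1] boxes, one per phrase.
    gligen_boxes=[[0.14, 0.17, 0.43, 0.87], [0.45, 0.4, 0.98, 0.9]],
    gligen_scheduled_sampling_beta=1.0,
    num_inference_steps=50,
).images[0]
image.save("gligen_boxes.png")
```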

Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis
[ICLR 2023] [Website] [Project] [Code]

Visual Programming for Text-to-Image Generation and Evaluation
[NeurIPS 2023] [Website] [Project] [Code]

GeoDiffusion: Text-Prompted Geometric Control for Object Detection Data Generation
[ICLR 2024] [Website] [Project] [Code]

GrounDiT: Grounding Diffusion Transformers via Noisy Patch Transplantation
[NeurIPS 2024] [Project] [Code]

ReCo: Region-Controlled Text-to-Image Generation
[CVPR 2023] [Website] [Code]

Harnessing the Spatial-Temporal Attention of Diffusion Models for High-Fidelity Text-to-Image Synthesis
[ICCV 2023] [Website] [Code]

BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion
[ICCV 2023] [Website] [Code]

Dense Text-to-Image Generation with Attention Modulation
[ICCV 2023] [Website] [Code]

LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models
[Website] [Project] [Code] [Demo] [Blog]

StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control
[CVPR 2024] [Code] [Project]

MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis
[CVPR 2024] [Project] [Code]

Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language
[Website] [Project] [Code]

Training-Free Layout Control with Cross-Attention Guidance
[Website] [Project] [Code]

ROICtrl: Boosting Instance Control for Visual Generation
[Website] [Project] [Code]

CreatiLayout: Siamese Multimodal Diffusion Transformer for Creative Layout-to-Image Generation
[Website] [Project] [Code]

Directed Diffusion: Direct Control of Object Placement through Attention Guidance
[Website] [Project] [Code]

Grounded Text-to-Image Synthesis with Attention Refocusing
[Website] [Project] [Code]

eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers
[Website] [Project] [Code]

LayoutLLM-T2I: Eliciting Layout Guidance from LLM for Text-to-Image Generation
[Website] [Project] [Code]

Compositional Text-to-Image Synthesis with Attention Map Control of Diffusion Models
[Website] [Project] [Code]

R&B: Region and Boundary Aware Zero-shot Grounded Text-to-image Generation
[Website] [Project] [Code]

FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition
[Website] [Project] [Code]

InteractDiffusion: Interaction Control in Text-to-Image Diffusion Models
[Website] [Project] [Code]

Ranni: Taming Text-to-Image Diffusion for Accurate Instruction Following
[Website] [Project] [Code]

InstanceDiffusion: Instance-level Control for Image Generation
[Website] [Project] [Code]

3DIS-FLUX: simple and efficient multi-instance generation with DiT rendering
[Website] [Project] [Code]

Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis
[CVPR 2024] [Code]

NoiseCollage: A Layout-Aware Text-to-Image Diffusion Model Based on Noise Cropping and Merging
[CVPR 2024] [Code]

Masked-Attention Diffusion Guidance for Spatially Controlling Text-to-Image Generation
[Website] [Code]

Rethinking The Training And Evaluation of Rich-Context Layout-to-Image Generation
[Website] [Code]

Enhancing Object Coherence in Layout-to-Image Synthesis
[Website] [Code]

Training-free Regional Prompting for Diffusion Transformers
[Website] [Code]

DivCon: Divide and Conquer for Progressive Text-to-Image Generation
[Website] [Code]

RealCompo: Dynamic Equilibrium between Realism and Compositionality Improves Text-to-Image Diffusion Models
[Website] [Code]

HiCo: Hierarchical Controllable Diffusion Model for Layout-to-image Generation
[Website] [Code]

Layered Rendering Diffusion Model for Zero-Shot Guided Image Synthesis
[ECCV 2024] [Project]

ReCorD: Reasoning and Correcting Diffusion for HOI Generation
[ACM MM 2024] [Project]

Compositional Text-to-Image Generation with Dense Blob Representations
[Website] [Project]

GroundingBooth: Grounding Text-to-Image Customization
[Website] [Project]

Check, Locate, Rectify: A Training-Free Layout Calibration System for Text-to-Image Generation
[Website] [Project]

ReGround: Improving Textual and Spatial Grounding at No Cost
[Website] [Project]

DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception
[CVPR 2024]

Guided Image Synthesis via Initial Image Editing in Diffusion Model
[ACM MM 2023]

Training-free Composite Scene Generation for Layout-to-Image Synthesis
[ECCV 2024]

LSReGen: Large-Scale Regional Generator via Backward Guidance Framework
[Website]

Enhancing Prompt Following with Visual Control Through Training-Free Mask-Guided Diffusion
[Website]

Draw Like an Artist: Complex Scene Generation with Diffusion Model via Composition, Painting, and Retouching
[Website]

Boundary Attention Constrained Zero-Shot Layout-To-Image Generation
[Website]

Enhancing Image Layout Control with Loss-Guided Diffusion Models
[Website]

GLoD: Composing Global Contexts and Local Details in Image Generation
[Website]

A-STAR: Test-time Attention Segregation and Retention for Text-to-image Synthesis
[Website]

Controllable Text-to-Image Generation with GPT-4
[Website]

Localized Text-to-Image Generation for Free via Cross Attention Control
[Website]

Training-Free Location-Aware Text-to-Image Synthesis
[Website]

Composite Diffusion | whole >= Σ parts
[Website]

Continuous Layout Editing of Single Images with Diffusion Models
[Website]

Zero-shot spatial layout conditioning for text-to-image diffusion models
[Website]

Obtaining Favorable Layouts for Multiple Object Generation
[Website]

EliGen: Entity-Level Controlled Image Generation with Regional Attention
[Website]

LoCo: Locally Constrained Training-Free Layout-to-Image Synthesis
[Website]

Self-correcting LLM-controlled Diffusion Models
[Website]

Joint Generative Modeling of Scene Graphs and Images via Diffusion Models
[Website]

Spatial-Aware Latent Initialization for Controllable Image Generation
[Website]

Layout-to-Image Generation with Localized Descriptions using ControlNet with Cross-Attention Control
[Website]

ObjBlur: A Curriculum Learning Approach With Progressive Object-Level Blurring for Improved Layout-to-Image Generation
[Website]

The Crystal Ball Hypothesis in diffusion models: Anticipating object positions from initial noise
[Website]

Zero-Painter: Training-Free Layout Control for Text-to-Image Synthesis
[Website]

SpotActor: Training-Free Layout-Controlled Consistent Image Generation
[Website]

IFAdapter: Instance Feature Control for Grounded Text-to-Image Generation
[Website]

Scribble-Guided Diffusion for Training-free Text-to-Image Generation
[Website]

3DIS: Depth-Driven Decoupled Instance Synthesis for Text-to-Image Generation
[Website]

Region-Aware Text-to-Image Generation via Hard Binding and Soft Refinement
[Website]

Test-time Controllable Image Generation by Explicit Spatial Constraint Enforcement
[Website]

Grounding Text-To-Image Diffusion Models For Controlled High-Quality Image Generation
[Website]

I2I translation

⭐⭐⭐SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations
[ICLR 2022] [Website] [Project] [Code]
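
SDEdit's guide-then-denoise recipe is what the standard img2img pipeline implements: the input image is noised partway to an intermediate timestep (set by `strength`) and then denoised under the text prompt. A short sketch (the input filename is a placeholder):

```python
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from diffusers.utils import load_image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

init_image = load_image("sketch.png")  # placeholder guide image
# strength in (0, 1]: how far toward pure noise the input is pushed before
# denoising; higher = more faithful to the prompt, less to the input image.
image = pipe("a fantasy landscape, oil painting",
             image=init_image, strength=0.6, guidance_scale=7.5).images[0]
image.save("sdedit_result.png")
```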

⭐⭐⭐DiffusionCLIP: Text-Guided Diffusion Models for Robust Image Manipulation
[CVPR 2022] [Website] [Code]

CycleNet: Rethinking Cycle Consistency in Text-Guided Diffusion for Image Manipulation
[NeurIPS 2023] [Website] [Project] [Code]

DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations
[CVPR 2024] [Project] [Code]

Diffusion-based Image Translation using Disentangled Style and Content Representation
[ICLR 2023] [Website] [Code]

FlexIT: Towards Flexible Semantic Image Translation
[CVPR 2022] [Website] [Code]

Zero-Shot Contrastive Loss for Text-Guided Diffusion Image Style Transfer
[ICCV 2023] [Website] [Code]

E2GAN: Efficient Training of Efficient GANs for Image-to-Image Translation
[ICML 2024] [Project] [Code]

Eye-for-an-eye: Appearance Transfer with Semantic Correspondence in Diffusion Models
[Website] [Project] [Code]

Cross-Image Attention for Zero-Shot Appearance Transfer
[Website] [Project] [Code]

FashionR2R: Texture-preserving Rendered-to-Real Image Translation with Diffusion Models
[Website] [Project] [Code]

Diffusion Guided Domain Adaptation of Image Generators
[Website] [Project] [Code]

Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models
[Website] [Project] [Code]

FreeStyle: Free Lunch for Text-guided Style Transfer using Diffusion Models
[Website] [Project] [Code]

FilterPrompt: Guiding Image Transfer in Diffusion Models
[Website] [Project] [Code]

Every Pixel Has its Moments: Ultra-High-Resolution Unpaired Image-to-Image Translation via Dense Normalization
[ECCV 2024] [Code]

One-Shot Structure-Aware Stylized Image Synthesis
[CVPR 2024] [Code]

BBDM: Image-to-image Translation with Brownian Bridge Diffusion Models
[CVPR 2023] [Code]

Spectrum Translation for Refinement of Image Generation (STIG) Based on Contrastive Learning and Spectral Filter Profile
[AAAI 2024] [Code]

Frequency-Controlled Diffusion Model for Versatile Text-Guided Image-to-Image Translation
[AAAI 2024] [Code]

ZePo: Zero-Shot Portrait Stylization with Faster Sampling
[ACM MM 2024] [Code]

DiffuseST: Unleashing the Capability of the Diffusion Model for Style Transfer
[ACM MM Asia 2024] [Code]

TextCtrl: Diffusion-based Scene Text Editing with Prior Guidance Control
[Website] [Code]

Improving Diffusion-based Image Translation using Asymmetric Gradient Guidance
[Website] [Code]

Enabling Local Editing in Diffusion Models by Joint and Individual Component Analysis
[Website] [Code]

PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions
[Website] [Code]

GEM: Boost Simple Network for Glass Surface Segmentation via Segment Anything Model and Data Synthesis
[Website] [Code]

CreativeSynth: Creative Blending and Synthesis of Visual Arts based on Multimodal Diffusion
[Website] [Code]

PrimeComposer: Faster Progressively Combined Diffusion for Image Composition with Attention Steering
[Website] [Code]

One-Step Image Translation with Text-to-Image Models
[Website] [Code]

D2Styler: Advancing Arbitrary Style Transfer with Discrete Diffusion Methods
[Website] [Code]

StyleDiffusion: Controllable Disentangled Style Transfer via Diffusion Models
[ICCV 2023] [Website]

ControlStyle: Text-Driven Stylized Image Generation Using Diffusion Priors
[ACM MM 2023]

High-Fidelity Diffusion-based Image Editing
[AAAI 2024]

EBDM: Exemplar-guided Image Translation with Brownian-bridge Diffusion Models
[ECCV 2024]

Harnessing the Latent Diffusion Model for Training-Free Image Style Transfer
[Website]

UniHDA: Towards Universal Hybrid Domain Adaptation of Image Generators
[Website]

Regularized Distribution Matching Distillation for One-step Unpaired Image-to-Image Translation
[Website]

TEXTOC: Text-driven Object-Centric Style Transfer
[Website]

Seed-to-Seed: Image Translation in Diffusion Seed Space
[Website]

Diffusion-Based Image-to-Image Translation by Noise Correction via Prompt Interpolation
[Website]

Latent Schrodinger Bridge: Prompting Latent Diffusion for Fast Unpaired Image-to-Image Translation
[Website]

Segmentation Detection Tracking

ODISE: Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models
[CVPR 2023 Highlight] [Project] [Code] [Demo]

LD-ZNet: A Latent Diffusion Approach for Text-Based Image Segmentation
[ICCV 2023] [Website] [Project] [Code]

Text-Image Alignment for Diffusion-Based Perception
[CVPR 2024] [Website] [Project] [Code]

Stochastic Segmentation with Conditional Categorical Diffusion Models
[ICCV 2023] [Website] [Code]

DDP: Diffusion Model for Dense Visual Prediction
[ICCV 2023] [Website] [Code]

DiffusionDet: Diffusion Model for Object Detection
[ICCV 2023] [Website] [Code]

OVTrack: Open-Vocabulary Multiple Object Tracking
[CVPR 2023] [Website] [Project]

SegRefiner: Towards Model-Agnostic Segmentation Refinement with Discrete Diffusion Process
[NeurIPS 2023] [Website] [Code]

DiffMOT: A Real-time Diffusion-based Multiple Object Tracker with Non-linear Prediction
[CVPR 2024] [Project] [Code]

Zero-Shot Image Segmentation via Recursive Normalized Cut on Diffusion Features
[Website] [Project] [Code]

Diffuse, Attend, and Segment: Unsupervised Zero-Shot Segmentation using Stable Diffusion
[Website] [Project] [Code]

InstaGen: Enhancing Object Detection by Training on Synthetic Dataset
[Website] [Project] [Code]

InvSeg: Test-Time Prompt Inversion for Semantic Segmentation
[Website] [Project] [Code]

SMITE: Segment Me In TimE
[Website] [Project] [Code]

Unsupervised Modality Adaptation with Text-to-Image Diffusion Models for Semantic Segmentation
[NeurIPS 2024] [Code]

Exploring Phrase-Level Grounding with Text-to-Image Diffusion Model
[ECCV 2024] [Code]

ConsistencyTrack: A Robust Multi-Object Tracker with a Generation Strategy of Consistency Model
[Website] [Code]

SemFlow: Binding Semantic Segmentation and Image Synthesis via Rectified Flow
[Website] [Code]

Delving into the Trajectory Long-tail Distribution for Multi-object Tracking
[Website] [Code]

Zero-Shot Video Semantic Segmentation based on Pre-Trained Diffusion Models
[Website] [Code]

Scribble Hides Class: Promoting Scribble-Based Weakly-Supervised Semantic Segmentation with Its Class Label
[Website] [Code]

Personalize Segment Anything Model with One Shot
[Website] [Code]

DiffusionTrack: Diffusion Model For Multi-Object Tracking
[Website] [Code]

MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation
[Website] [Code]

A Simple Latent Diffusion Approach for Panoptic Segmentation and Mask Inpainting
[Website] [Code]

Beyond Generation: Harnessing Text to Image Models for Object Detection and Segmentation
[Website] [Code]

UniGS: Unified Representation for Image Generation and Segmentation
[Website] [Code]

Placing Objects in Context via Inpainting for Out-of-distribution Segmentation
[Website] [Code]

MaskDiffusion: Exploiting Pre-trained Diffusion Models for Semantic Segmentation
[Website] [Code]

Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video Object Segmentation
[Website] [Code]

Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models
[Website] [Code]

No Annotations for Object Detection in Art through Stable Diffusion
[Website] [Code]

EmerDiff: Emerging Pixel-level Semantic Knowledge in Diffusion Models
[ICLR 2024] [Website] [Project]

Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation
[CVPR 2024] [Project]

FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models
[Website] [Project]

ReferEverything: Towards Segmenting Everything We Can Speak of in Videos
[Website] [Project]

DiffuMask: Synthesizing Images with Pixel-level Annotations for Semantic Segmentation Using Diffusion Models
[Website] [Project]

Diffusion-based Image Translation with Label Guidance for Domain Adaptive Semantic Segmentation
[ICCV 2023] [Website]

SDDGR: Stable Diffusion-based Deep Generative Replay for Class Incremental Object Detection
[CVPR 2024]

Diff-Tracker: Text-to-Image Diffusion Models are Unsupervised Trackers
[ECCV 2024]

Unleashing the Potential of the Diffusion Model in Few-shot Semantic Segmentation
[NeurIPS 2024]

Generalization by Adaptation: Diffusion-Based Domain Extension for Domain-Generalized Semantic Segmentation
[WACV 2024]

Boosting Few-Shot Detection with Large Language Models and Layout-to-Image Synthesis
[ACCV 2024]

A Simple Background Augmentation Method for Object Detection with Diffusion Model
[Website]

Unveiling the Power of Diffusion Features For Personalized Segmentation and Retrieval
[Website]

SLiMe: Segment Like Me
[Website]

ASAM: Boosting Segment Anything Model with Adversarial Tuning
[Website]

Diffusion Features to Bridge Domain Gap for Semantic Segmentation
[Website]

MaskDiff: Modeling Mask Distribution with Diffusion Probabilistic Model for Few-Shot Instance Segmentation
[Website]

DiffusionSeg: Adapting Diffusion Towards Unsupervised Object Discovery
[Website]

Ref-Diff: Zero-shot Referring Image Segmentation with Generative Models
[Website]

Diffusion Model is Secretly a Training-free Open Vocabulary Semantic Segmenter
[Website]

Attention as Annotation: Generating Images and Pseudo-masks for Weakly Supervised Semantic Segmentation with Diffusion
[Website]

From Text to Mask: Localizing Entities Using the Attention of Text-to-Image Diffusion Models
[Website]

Factorized Diffusion Architectures for Unsupervised Image Generation and Segmentation
[Website]

Patch-based Selection and Refinement for Early Object Detection
[Website]

TrackDiffusion: Multi-object Tracking Data Generation via Diffusion Models
[Website]

Towards Granularity-adjusted Pixel-level Semantic Annotation
[Website]

Gen2Det: Generate to Detect
[Website]

Bridging Generative and Discriminative Models for Unified Visual Perception with Diffusion Priors
[Website]

ConsistencyDet: Robust Object Detector with Denoising Paradigm of Consistency Model
[Website]

Diverse Generation while Maintaining Semantic Coordination: A Diffusion-Based Data Augmentation Method for Object Detection
[Website]

Generative Edge Detection with Stable Diffusion
[Website]

DINTR: Tracking via Diffusion-based Interpolation
[Website]

Enhanced Kalman with Adaptive Appearance Motion SORT for Grounded Generic Multiple Object Tracking
[Website]

DiffuMask-Editor: A Novel Paradigm of Integration Between the Segmentation Diffusion Model and Image Editing to Improve Segmentation Ability
[Website]

Repurposing Stable Diffusion Attention for Training-Free Unsupervised Interactive Segmentation
[Website]

Panoptic Diffusion Models: co-generation of images and segmentation maps
[Website]

Additional conditions

⭐⭐⭐Adding Conditional Control to Text-to-Image Diffusion Models
[ICCV 2023 best paper] [Website] [Official Code] [Diffusers Doc] [Diffusers Code]
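
Since ControlNet is integrated into Diffusers (see the doc links above), a minimal text-to-image sketch conditioned on a Canny edge map looks roughly like the following; the checkpoint IDs, input path, and prompt are illustrative assumptions, not part of the paper.

```python
# Minimal ControlNet sketch via Diffusers (IDs, paths, and prompt are illustrative).
import cv2
import numpy as np
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image
from PIL import Image

# Canny-edge ControlNet weights released alongside the paper.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# Turn any RGB image into a 3-channel Canny edge map for conditioning.
rgb = np.array(load_image("input.png"))  # assumed local image
edges = cv2.Canny(rgb, 100, 200)
edge_map = Image.fromarray(np.stack([edges] * 3, axis=-1))

image = pipe(
    "a portrait in a futuristic city", image=edge_map, num_inference_steps=30
).images[0]
image.save("controlnet_out.png")
```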

⭐⭐T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models
[Website] [Official Code] [Diffusers Code]

SketchKnitter: Vectorized Sketch Generation with Diffusion Models
[ICLR 2023 Spotlight] [Website] [Code]

Freestyle Layout-to-Image Synthesis
[CVPR 2023 highlight] [Website] [Project] [Code]

Collaborative Diffusion for Multi-Modal Face Generation and Editing
[CVPR 2023] [Website] [Project] [Code]

HumanSD: A Native Skeleton-Guided Diffusion Model for Human Image Generation
[ICCV 2023] [Website] [Project] [Code]

FreeDoM: Training-Free Energy-Guided Conditional Diffusion Model
[ICCV 2023] [Website] [Code]

Sketch-Guided Text-to-Image Diffusion Models
[SIGGRAPH 2023] [Project] [Code]

Adversarial Supervision Makes Layout-to-Image Diffusion Models Thrive
[ICLR 2024] [Project] [Code]

IPAdapter-Instruct: Resolving Ambiguity in Image-based Conditioning using Instruct Prompts
[Website] [Project] [Code]

ControlNeXt: Powerful and Efficient Control for Image and Video Generation
[Website] [Project] [Code]

Ctrl-X: Controlling Structure and Appearance for Text-To-Image Generation Without Guidance
[Website] [Project] [Code]

Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model
[Website] [Project] [Code]

IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models
[Website] [Project] [Code]
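
Diffusers also ships a loader for the released IP-Adapter weights, so an image prompt can be attached to a standard Stable Diffusion pipeline roughly as below; the reference-image path and scale value are assumptions for illustration.

```python
# Minimal IP-Adapter sketch via Diffusers (paths and scale are illustrative).
import torch
from diffusers import StableDiffusionPipeline
from diffusers.utils import load_image

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Attach the publicly released SD 1.5 IP-Adapter weights.
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")
pipe.set_ip_adapter_scale(0.6)  # 0 = text prompt only, 1 = image prompt dominates

reference = load_image("reference.png")  # assumed local reference image
image = pipe(
    prompt="best quality, high detail",
    ip_adapter_image=reference,
    num_inference_steps=30,
).images[0]
image.save("ip_adapter_out.png")
```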

Appearance Matching Adapter for Exemplar-based Semantic Image Synthesis
[Website] [Project] [Code]

DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation
[Website] [Project] [Code]

A Simple Approach to Unifying Diffusion-based Conditional Generation
[Website] [Project] [Code]

HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion
[Website] [Project] [Code]

Late-Constraint Diffusion Guidance for Controllable Image Synthesis
[Website] [Project] [Code]

Composer: Creative and controllable image synthesis with composable conditions
[Website] [Project] [Code]

DiffBlender: Scalable and Composable Multimodal Text-to-Image Diffusion Models
[Website] [Project] [Code]

Cocktail: Mixing Multi-Modality Controls for Text-Conditional Image Generation
[Website] [Project] [Code]

UniControl: A Unified Diffusion Model for Controllable Visual Generation In the Wild
[Website] [Project] [Code]

Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models
[Website] [Project] [Code]

LooseControl: Lifting ControlNet for Generalized Depth Conditioning
[Website] [Project] [Code]

X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model
[Website] [Project] [Code]

ControlNet-XS: Designing an Efficient and Effective Architecture for Controlling Text-to-Image Diffusion Models
[Website] [Project] [Code]

ViscoNet: Bridging and Harmonizing Visual and Textual Conditioning for ControlNet
[Website] [Project] [Code]

SCP-Diff: Photo-Realistic Semantic Image Synthesis with Spatial-Categorical Joint Prior
[Website] [Project] [Code]

Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis
[ICLR 2024] [Code]

It's All About Your Sketch: Democratising Sketch Control in Diffusion Models
[CVPR 2024] [Code]

VersaGen: Unleashing Versatile Visual Control for Text-to-Image Synthesis
[AAAI 2025] [Code]

CtrLoRA: An Extensible and Efficient Framework for Controllable Image Generation
[Website] [Code]

Universal Guidance for Diffusion Models
[Website] [Code]

Meta ControlNet: Enhancing Task Adaptation via Meta Learning
[Website] [Code]

Local Conditional Controlling for Text-to-Image Diffusion Models
[Website] [Code]

KnobGen: Controlling the Sophistication of Artwork in Sketch-Based Diffusion Models
[Website] [Code]

Do We Need to Design Specific Diffusion Models for Different Tasks? Try ONE-PIC
[Website] [Code]

OminiControl: Minimal and Universal Control for Diffusion Transformer
[Website] [Code]

Modulating Pretrained Diffusion Models for Multimodal Image Synthesis
[SIGGRAPH 2023] [Project]

SpaText: Spatio-Textual Representation for Controllable Image Generation
[CVPR 2023] [Project]

CCM: Adding Conditional Controls to Text-to-Image Consistency Models
[ICML 2024] [Project]

Dreamguider: Improved Training free Diffusion-based Conditional Generation
[Website] [Project]

ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback
[Website] [Project]

AnyControl: Create Your Artwork with Versatile Control on Text-to-Image Generation
[Website] [Project]

BeyondScene: Higher-Resolution Human-Centric Scene Generation With Pretrained Diffusion
[Website] [Project]

FineControlNet: Fine-level Text Control for Image Generation with Spatially Aligned Text Control Injection
[Website] [Project]

Control4D: Dynamic Portrait Editing by Learning 4D GAN from 2D Diffusion-based Editor
[Website] [Project]

SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing
[Website] [Project]

CTRLorALTer: Conditional LoRAdapter for Efficient 0-Shot Control & Altering of T2I Models
[Website] [Project]

EditAR: Unified Conditional Generation with Autoregressive Models
[Website] [Project]

Sketch-Guided Scene Image Generation
[Website]

SSMG: Spatial-Semantic Map Guided Diffusion Model for Free-form Layout-to-Image Generation
[Website]

Conditioning Diffusion Models via Attributes and Semantic Masks for Face Generation
[Website]

Integrating Geometric Control into Text-to-Image Diffusion Models for High-Quality Detection Data Generation via Text Prompt
[Website]

Adding 3D Geometry Control to Diffusion Models
[Website]

LayoutDiffuse: Adapting Foundational Diffusion Models for Layout-to-Image Generation
[Website]

JointNet: Extending Text-to-Image Diffusion for Dense Distribution Modeling
[Website]

Do You Guys Want to Dance: Zero-Shot Compositional Human Dance Generation with Multiple Persons
[Website]

Mask-ControlNet: Higher-Quality Image Generation with An Additional Mask Prompt
[Website]

FlexEControl: Flexible and Efficient Multimodal Control for Text-to-Image Generation
[Website]

Stable-Pose: Leveraging Transformers for Pose-Guided Text-to-Image Generation
[Website]

Label-free Neural Semantic Image Synthesis
[Website]

UNIC-Adapter: Unified Image-instruction Adapter with Multi-modal Transformer for Image Generation
[Website]

Few-Shot

Discriminative Diffusion Models as Few-shot Vision and Language Learners
[Website] [Code]

Few-Shot Diffusion Models
[Website] [Code]

Few-shot Semantic Image Synthesis with Class Affinity Transfer
[CVPR 2023] [Website]

DiffAlign: Few-shot learning using diffusion based synthesis and alignment
[Website]

Few-shot Image Generation with Diffusion Models
[Website]

Lafite2: Few-shot Text-to-Image Generation
[Website]

Few-Shot Task Learning through Inverse Generative Modeling
[Website]

SD-inpaint

Paint by Example: Exemplar-based Image Editing with Diffusion Models
[CVPR 2023] [Website] [Code] [Diffusers Doc] [Diffusers Code]
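
Paint by Example is available in Diffusers as `PaintByExamplePipeline` (see the doc links above); a minimal exemplar-guided inpainting sketch follows, with the three input images assumed to be local files.

```python
# Minimal Paint-by-Example sketch via Diffusers (file paths are assumptions).
import torch
from diffusers import PaintByExamplePipeline
from diffusers.utils import load_image

pipe = PaintByExamplePipeline.from_pretrained(
    "Fantasy-Studio/Paint-by-Example", torch_dtype=torch.float16
).to("cuda")

init_image = load_image("scene.png")        # image to edit
mask_image = load_image("mask.png")         # white pixels mark the region to repaint
example_image = load_image("exemplar.png")  # reference object to paint into the mask

result = pipe(
    image=init_image, mask_image=mask_image, example_image=example_image
).images[0]
result.save("paint_by_example_out.png")
```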

GLIDE: Towards photorealistic image generation and editing with text-guided diffusion model
[ICML 2022 Spotlight] [Website] [Code]

Blended Diffusion for Text-driven Editing of Natural Images
[CVPR 2022] [Website] [Project] [Code]

Blended Latent Diffusion
[SIGGRAPH 2023] [Project] [Code]

CLIPAway: Harmonizing Focused Embeddings for Removing Objects via Diffusion Models
[NeurIPS 2024] [Project] [Code]

TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition
[ICCV 2023] [Website] [Project] [Code]

Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting
[CVPR 2023] [Website] [Code]

Towards Coherent Image Inpainting Using Denoising Diffusion Implicit Models
[ICML 2023] [Website] [Code]

Coherent and Multi-modality Image Inpainting via Latent Space Optimization
[Website] [Project] [Code]

Inst-Inpaint: Instructing to Remove Objects with Diffusion Models
[Website] [Project] [Code] [Demo]

Anywhere: A Multi-Agent Framework for Reliable and Diverse Foreground-Conditioned Image Inpainting
[Website] [Project] [Code]

AnyDoor: Zero-shot Object-level Image Customization
[Website] [Project] [Code]

Affordance-Aware Object Insertion via Mask-Aware Dual Diffusion
[Website] [Project] [Code]

A Task is Worth One Word: Learning with Task Prompts for High-Quality Versatile Image Inpainting
[Website] [Project] [Code]

Follow-Your-Canvas: Higher-Resolution Video Outpainting with Extensive Content Generation
[Website] [Project] [Code]

Towards Language-Driven Video Inpainting via Multimodal Large Language Models
[Website] [Project] [Code]

Reflecting Reality: Enabling Diffusion Models to Produce Faithful Mirror Reflections
[Website] [Project] [Code]

Improving Text-guided Object Inpainting with Semantic Pre-inpainting
[ECCV 2024] [Code]

FreeCompose: Generic Zero-Shot Image Composition with Diffusion Prior
[ECCV 2024] [Code]

360-Degree Panorama Generation from Few Unregistered NFoV Images
[ACM MM 2023] [Code]

Delving Globally into Texture and Structure for Image Inpainting
[ACM MM 2022] [Code]

PIXELS: Progressive Image Xemplar-based Editing with Latent Surgery
[AAAI 2025] [Code]

ControlEdit: A MultiModal Local Clothing Image Editing Method
[Website] [Code]

CA-Edit: Causality-Aware Condition Adapter for High-Fidelity Local Facial Attribute Editing
[Website] [Code]

DreamMix: Decoupling Object Attributes for Enhanced Editability in Customized Image Inpainting
[Website] [Code]

Attentive Eraser: Unleashing Diffusion Model's Object Removal Potential via Self-Attention Redirection Guidance
[Website] [Code]

Training-and-prompt-free General Painterly Harmonization Using Image-wise Attention Sharing
[Website] [Code]

What to Preserve and What to Transfer: Faithful, Identity-Preserving Diffusion-based Hairstyle Transfer
[Website] [Code]

Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model
[Website] [Code]

Structure Matters: Tackling the Semantic Discrepancy in Diffusion Models for Image Inpainting
[Website] [Code]

Reference-based Image Composition with Sketch via Structure-aware Diffusion Model
[Website] [Code]

Image Inpainting via Iteratively Decoupled Probabilistic Modeling
[Website] [Code]

ControlCom: Controllable Image Composition using Diffusion Model
[Website] [Code]

Uni-paint: A Unified Framework for Multimodal Image Inpainting with Pretrained Diffusion Model
[Website] [Code]

MagicRemover: Tuning-free Text-guided Image Inpainting with Diffusion Models
[Website] [Code]

HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models
[Website] [Code]

BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion
[Website] [Code]

Sketch-guided Image Inpainting with Partial Discrete Diffusion Process
[Website] [Code]

ReMOVE: A Reference-free Metric for Object Erasure
[Website] [Code]

Modification Takes Courage: Seamless Image Stitching via Reference-Driven Inpainting
[Website] [Code]

MotionCom: Automatic and Motion-Aware Image Composition with LLM and Video Diffusion Prior
[Website] [Code]

Yuan: Yielding Unblemished Aesthetics Through A Unified Network for Visual Imperfections Removal in Generated Images
[Website] [Code]

AddMe: Zero-shot Group-photo Synthesis by Inserting People into Scenes
[ECCV 2024] [Project]

Text2Place: Affordance-aware Text Guided Human Placement
[ECCV 2024] [Project]

IMPRINT: Generative Object Compositing by Learning Identity-Preserving Representation
[CVPR 2024] [Project]

Matting by Generation
[SIGGRAPH 2024] [Project]

PrefPaint: Aligning Image Inpainting Diffusion Model with Human Preference
[NeurIPS 2024] [Project]

Taming Latent Diffusion Model for Neural Radiance Field Inpainting
[Website] [Project]

VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control
[Website] [Project]

CorrFill: Enhancing Faithfulness in Reference-based Inpainting with Correspondence Guidance in Diffusion Models
[Website] [Project]

SmartMask: Context Aware High-Fidelity Mask Generation for Fine-grained Object Insertion and Layout Control
[Website] [Project]

Towards Stable and Faithful Inpainting
[Website] [Project]

Magic Fixup: Streamlining Photo Editing by Watching Dynamic Videos
[Website] [Project]

ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion
[Website] [Project]

TALE: Training-free Cross-domain Image Composition via Adaptive Latent Manipulation and Energy-guided Optimization
[ACM MM 2024]

Semantically Consistent Video Inpainting with Conditional Diffusion Models
[Website]

Personalized Face Inpainting with Diffusion Models by Parallel Visual Attention
[Website]

Outline-Guided Object Inpainting with Diffusion Models
[Website]

SmartBrush: Text and Shape Guided Object Inpainting with Diffusion Model
[Website]

Gradpaint: Gradient-Guided Inpainting with Diffusion Models
[Website]

Infusion: Internal Diffusion for Video Inpainting
[Website]

Rethinking Referring Object Removal
[Website]

Tuning-Free Image Customization with Image and Text Guidance
[Website]

VIP: Versatile Image Outpainting Empowered by Multimodal Large Language Model
[Website]

FaithFill: Faithful Inpainting for Object Completion Using a Single Reference Image
[Website]

InsertDiffusion: Identity Preserving Visualization of Objects through a Training-Free Diffusion Architecture
[Website]

Thinking Outside the BBox: Unconstrained Generative Object Compositing
[Website]

Content-aware Tile Generation using Exterior Boundary Inpainting
[Website]

AnyLogo: Symbiotic Subject-Driven Diffusion System with Gemini Status
[Website]

TD-Paint: Faster Diffusion Inpainting Through Time Aware Pixel Conditioning
[Website]

MagicEraser: Erasing Any Objects via Semantics-Aware Control
[Website]

I Dream My Painting: Connecting MLLMs and Diffusion Models via Prompt Generation for Text-Guided Multi-Mask Inpainting
[Website]

VIPaint: Image Inpainting with Pre-Trained Diffusion Models via Variational Inference
[Website]

FreeCond: Free Lunch in the Input Conditions of Text-Guided Inpainting
[Website]

PainterNet: Adaptive Image Inpainting with Actual-Token Attention and Diverse Mask Control
[Website]

Refine-by-Align: Reference-Guided Artifacts Refinement through Semantic Alignment
[Website]

Advanced Video Inpainting Using Optical Flow-Guided Efficient Diffusion
[Website]

Pinco: Position-induced Consistent Adapter for Diffusion Transformer in Foreground-conditioned Inpainting
[Website]

AsyncDSB: Schedule-Asynchronous Diffusion Schrödinger Bridge for Image Inpainting
[Website]

RAD: Region-Aware Diffusion Models for Image Inpainting
[Website]

MObI: Multimodal Object Inpainting Using Diffusion Models
[Website]

DiffuEraser: A Diffusion Model for Video Inpainting
[Website]

Layout Generation

LayoutDM: Discrete Diffusion Model for Controllable Layout Generation
[CVPR 2023] [Website] [Project] [Code]

Desigen: A Pipeline for Controllable Design Template Generation
[CVPR 2024] [Project] [Code]

DLT: Conditioned layout generation with Joint Discrete-Continuous Diffusion Layout Transformer
[ICCV 2023] [Website] [Code]

LayoutDiffusion: Improving Graphic Layout Generation by Discrete Diffusion Probabilistic Models
[ICCV 2023] [Website] [Code]

DogLayout: Denoising Diffusion GAN for Discrete and Continuous Layout Generation
[Website] [Code]

LayoutDM: Transformer-based Diffusion Model for Layout Generation
[CVPR 2023] [Website]

Unifying Layout Generation with a Decoupled Diffusion Model
[CVPR 2023] [Website]

PLay: Parametrically Conditioned Layout Generation using Latent Diffusion
[ICML 2023] [Website]

Towards Aligned Layout Generation via Diffusion Model with Aesthetic Constraints
[ICLR 2024]

SLayR: Scene Layout Generation with Rectified Flow
[Website]

CGB-DM: Content and Graphic Balance Layout Generation with Transformer-based Diffusion Model
[Website]

Diffusion-based Document Layout Generation
[Website]

Dolfin: Diffusion Layout Transformers without Autoencoder
[Website]

LayoutFlow: Flow Matching for Layout Generation
[Website]

Layout-Corrector: Alleviating Layout Sticking Phenomenon in Discrete Diffusion Model
[Website]

Text Generation

⭐⭐TextDiffuser: Diffusion Models as Text Painters
[NeurIPS 2023] [Website] [Project] [Code]

⭐⭐TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering
[ECCV 2024 Oral] [Project] [Code]

GlyphControl: Glyph Conditional Control for Visual Text Generation
[NeurIPS 2023] [Website] [Code]

DiffUTE: Universal Text Editing Diffusion Model
[NeurIPS 2023] [Website] [Code]

Word-As-Image for Semantic Typography
[SIGGRAPH 2023] [Project] [Code]

Kinetic Typography Diffusion Model
[ECCV 2024] [Project] [Code]

Dynamic Typography: Bringing Text to Life via Video Diffusion Prior
[Website] [Project] [Code]

JoyType: A Robust Design for Multilingual Visual Text Creation
[Website] [Project] [Code]

UDiffText: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diffusion Models
[Website] [Project] [Code]

One-Shot Diffusion Mimicker for Handwritten Text Generation
[ECCV 2024] [Code]

DCDM: Diffusion-Conditioned-Diffusion Model for Scene Text Image Super-Resolution
[ECCV 2024] [Code]

HFH-Font: Few-shot Chinese Font Synthesis with Higher Quality, Faster Speed, and Higher Resolution
[SIGGRAPH Asia 2024] [Code]

Brush Your Text: Synthesize Any Scene Text on Images via Diffusion Model
[AAAI 2024] [Code]

FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning
[AAAI 2024] [Code]

Text Image Inpainting via Global Structure-Guided Diffusion Models
[AAAI 2024] [Code]

Ambigram generation by a diffusion model
[ICDAR 2023] [Code]

Scene Text Image Super-resolution based on Text-conditional Diffusion Models
[WACV 2024] [Code]

Leveraging Text Localization for Scene Text Removal via Text-aware Masked Image Modeling
[ECCV 2024] [Code]

First Creating Backgrounds Then Rendering Texts: A New Paradigm for Visual Text Blending
[ECAI 2024] [Code]

Poetry in Pixels: Prompt Tuning for Poem Image Generation via Diffusion Models
[COLING 2025] [Code]

VitaGlyph: Vitalizing Artistic Typography with Flexible Dual-branch Diffusion Models
[Website] [Code]

Visual Text Generation in the Wild
[Website] [Code]

Deciphering Oracle Bone Language with Diffusion Models
[Website] [Code]

High Fidelity Scene Text Synthesis
[Website] [Code]

TextSSR: Diffusion-based Data Synthesis for Scene Text Recognition
[Website] [Code]

AnyText: Multilingual Visual Text Generation And Editing
[Website] [Code]

AnyText2: Visual Text Generation and Editing With Customizable Attributes
[Website] [Code]

Few-shot Calligraphy Style Learning
[Website] [Code]

GlyphDraw2: Automatic Generation of Complex Glyph Posters with Diffusion Models and Large Language Models
[Website] [Code]

DiffusionPen: Towards Controlling the Style of Handwritten Text Generation
[Website] [Code]

AmbiGen: Generating Ambigrams from Pre-trained Diffusion Model
[Website] [Project]

UniVG: Towards UNIfied-modal Video Generation
[Website] [Project]

FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation
[Website] [Project]

DECDM: Document Enhancement using Cycle-Consistent Diffusion Models
[WACV 2024]

SceneTextGen: Layout-Agnostic Scene Text Image Synthesis with Diffusion Models
[Website]

Beyond Flat Text: Dual Self-inherited Guidance for Visual Text Generation
[Website]

AnyTrans: Translate AnyText in the Image with Large Scale Models
[Website]

ARTIST: Improving the Generation of Text-rich Images by Disentanglement
[Website]

Improving Text Generation on Images with Synthetic Captions
[Website]

CustomText: Customized Textual Image Generation using Diffusion Models
[Website]

VecFusion: Vector Font Generation with Diffusion
[Website]

Typographic Text Generation with Off-the-Shelf Diffusion Model
[Website]

Font Style Interpolation with Diffusion Models
[Website]

Refining Text-to-Image Generation: Towards Accurate Training-Free Glyph-Enhanced Image Generation
[Website]

DiffCJK: Conditional Diffusion Model for High-Quality and Wide-coverage CJK Character Generation
[Website]

CLII: Visual-Text Inpainting via Cross-Modal Predictive Interaction
[Website]

Zero-Shot Paragraph-level Handwriting Imitation with Latent Diffusion Models
[Website]

Text Image Generation for Low-Resource Languages with Dual Translation Learning
[Website]

Decoupling Layout from Glyph in Online Chinese Handwriting Generation
[Website]

Empowering Backbone Models for Visual Text Generation with Input Granularity Control and Glyph-Aware Training
[Website]

TextMaster: Universal Controllable Text Edit
[Website]

Towards Visual Text Design Transfer Across Languages
[Website]

DiffSTR: Controlled Diffusion Models for Scene Text Removal
[Website]

TextDestroyer: A Training- and Annotation-Free Diffusion Method for Destroying Anomal Text from Images
[Website]

TypeScore: A Text Fidelity Metric for Text-to-Image Generative Models
[Website]

Conditional Text-to-Image Generation with Reference Guidance
[Website]

Type-R: Automatically Retouching Typos for Text-to-Image Generation
[Website]

AMO Sampler: Enhancing Text Rendering with Overshooting
[Website]

FonTS: Text Rendering with Typography and Style Controls
[Website]

CharGen: High Accurate Character-Level Visual Text Generation Model with MultiModal Encoder
[Website]

Super Resolution

ResShift: Efficient Diffusion Model for Image Super-resolution by Residual Shifting
[NeurIPS 2023 spotlight] [Website] [Project] [Code]

Image Super-Resolution via Iterative Refinement
[TPAMI] [Website] [Project] [Code]

DiffIR: Efficient Diffusion Model for Image Restoration
[ICCV 2023] [Website] [Code]

Kalman-Inspired Feature Propagation for Video Face Super-Resolution
[ECCV 2024] [Project] [Code]

HoliSDiP: Image Super-Resolution via Holistic Semantics and Diffusion Prior
[Website] [Project] [Code]

MatchDiffusion: Training-free Generation of Match-cuts
[Website] [Project] [Code]

Spatiotemporal Skip Guidance for Enhanced Video Diffusion Sampling
[Website] [Project] [Code]

STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution
[Website] [Project] [Code]

AddSR: Accelerating Diffusion-based Blind Super-Resolution with Adversarial Diffusion Distillation
[Website] [Project] [Code]

FaithDiff: Unleashing Diffusion Priors for Faithful Image Super-resolution
[Website] [Project] [Code]

Exploiting Diffusion Prior for Real-World Image Super-Resolution
[Website] [Project] [Code]

SinSR: Diffusion-Based Image Super-Resolution in a Single Step
[CVPR 2024] [Code]

CDFormer: When Degradation Prediction Embraces Diffusion Model for Blind Image Super-Resolution
[CVPR 2024] [Code]

Taming Diffusion Prior for Image Super-Resolution with Domain Shift SDEs
[NeurIPS 2024] [Code]

SeeClear: Semantic Distillation Enhances Pixel Condensation for Video Super-Resolution
[NeurIPS 2024] [Code]

Iterative Token Evaluation and Refinement for Real-World Super-Resolution
[AAAI 2024] [Code]

Boosting Diffusion Guidance via Learning Degradation-Aware Models for Blind Super Resolution
[Website] [Code]

PassionSR: Post-Training Quantization with Adaptive Scale in One-Step Diffusion based Image Super-Resolution
[Website] [Code]

Distillation-Free One-Step Diffusion for Real-World Image Super-Resolution
[Website] [Code]

Degradation-Guided One-Step Image Super-Resolution with Diffusion Priors
[Website] [Code]

One Step Diffusion-based Super-Resolution with Time-Aware Distillation
[Website] [Code]

Diffusion Prior Interpolation for Flexibility Real-World Face Super-Resolution
[Website] [Code]

StructSR: Refuse Spurious Details in Real-World Image Super-Resolution
[Website] [Code]

Hero-SR: One-Step Diffusion for Super-Resolution with Human Perception Priors
[Website] [Code]

RAP-SR: RestorAtion Prior Enhancement in Diffusion Models for Realistic Image Super-Resolution
[Website] [Code]

One-Step Effective Diffusion Network for Real-World Image Super-Resolution
[Website] [Code]

Binarized Diffusion Model for Image Super-Resolution
[Website] [Code]

Does Diffusion Beat GAN in Image Super Resolution?
[Website] [Code]

PatchScaler: An Efficient Patch-independent Diffusion Model for Super-Resolution
[Website] [Code]

DeeDSR: Towards Real-World Image Super-Resolution via Degradation-Aware Stable Diffusion
[Website] [Code]

Image Super-resolution Via Latent Diffusion: A Sampling-space Mixture Of Experts And Frequency-augmented Decoder Approach
[Website] [Code]

OFTSR: One-Step Flow for Image Super-Resolution with Tunable Fidelity-Realism Trade-offs
[Website] [Code]

Arbitrary-steps Image Super-resolution via Diffusion Inversion
[Website] [Code]

Pixel-Aware Stable Diffusion for Realistic Image Super-resolution and Personalized Stylization
[Website] [Code]

DSR-Diff: Depth Map Super-Resolution with Diffusion Model
[Website] [Code]

Pixel-level and Semantic-level Adjustable Super-resolution: A Dual-LoRA Approach
[Website] [Code]

RFSR: Improving ISR Diffusion Models via Reward Feedback Learning
[Website] [Code]

SAM-DiffSR: Structure-Modulated Diffusion Model for Image Super-Resolution
[Website] [Code]

XPSR: Cross-modal Priors for Diffusion-based Image Super-Resolution
[Website] [Code]

Self-Adaptive Reality-Guided Diffusion for Artifact-Free Super-Resolution
[Website] [Code]

BlindDiff: Empowering Degradation Modelling in Diffusion Models for Blind Image Super-Resolution
[Website] [Code]

TASR: Timestep-Aware Diffusion Model for Image Super-Resolution
[Website] [Code]

DiffVSR: Enhancing Real-World Video Super-Resolution with Diffusion Models for Advanced Visual Quality and Temporal Consistency
[Website] [Project]

HSR-Diff: Hyperspectral Image Super-Resolution via Conditional Diffusion Models
[ICCV 2023] [Website]

Text-guided Explorable Image Super-resolution
[CVPR 2024]

Arbitrary-Scale Image Generation and Upsampling using Latent Diffusion Model and Implicit Neural Decoder
[CVPR 2024]

AdaDiffSR: Adaptive Region-aware Dynamic Acceleration Diffusion Model for Real-World Image Super-Resolution
[CVPR 2024]

Enhancing Hyperspectral Images via Diffusion Model and Group-Autoencoder Super-resolution Network
[AAAI 2024]

Detail-Enhancing Framework for Reference-Based Image Super-Resolution
[Website]

You Only Need One Step: Fast Super-Resolution with Stable Diffusion via Scale Distillation
[Website]

Solving Diffusion ODEs with Optimal Boundary Conditions for Better Image Super-Resolution
[Website]

Dissecting Arbitrary-scale Super-resolution Capability from Pre-trained Diffusion Generative Models
[Website]

Edge-SD-SR: Low Latency and Parameter Efficient On-device Super-Resolution with Stable Diffusion via Bidirectional Conditioning
[Website]

YODA: You Only Diffuse Areas. An Area-Masked Diffusion Approach For Image Super-Resolution
[Website]

Domain Transfer in Latent Space (DTLS) Wins on Image Super-Resolution -- a Non-Denoising Model
[Website]

TDDSR: Single-Step Diffusion with Two Discriminators for Super Resolution
[Website]

ConsisSR: Delving Deep into Consistency in Diffusion-based Image Super-Resolution
[Website]

Image Super-Resolution with Text Prompt Diffusion
[Website]

DifAugGAN: A Practical Diffusion-style Data Augmentation for GAN-based Single Image Super-resolution
[Website]

DREAM: Diffusion Rectification and Estimation-Adaptive Models
[Website]

Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution
[Website]

Adaptive Multi-modal Fusion of Spatially Variant Kernel Refinement with Diffusion Model for Blind Image Super-Resolution
[Website]

CasSR: Activating Image Power for Real-World Image Super-Resolution
[Website]

Learning Spatial Adaptation and Temporal Coherence in Diffusion Models for Video Super-Resolution
[Website]

Frequency-Domain Refinement with Multiscale Diffusion for Super Resolution
[Website]

ClearSR: Latent Low-Resolution Image Embeddings Help Diffusion-Based Real-World Super Resolution Models See Clearer
[Website]

Zoomed In, Diffused Out: Towards Local Degradation-Aware Multi-Diffusion for Extreme Image Super-Resolution
[Website]

Adversarial Diffusion Compression for Real-World Image Super-Resolution
[Website]

HF-Diff: High-Frequency Perceptual Loss and Distribution Matching for One-Step Diffusion-Based Image Super-Resolution
[Website]

Semantic Segmentation Prior for Diffusion-Based Real-World Super-Resolution
[Website]

RealOSR: Latent Unfolding Boosting Diffusion-based Real-world Omnidirectional Image Super-Resolution
[Website]

CLIP-SR: Collaborative Linguistic and Image Processing for Super-Resolution
[Website]

Video Generation

Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators
[ICCV 2023 Oral] [Website] [Project] [Code]

SinFusion: Training Diffusion Models on a Single Image or Video
[ICML 2023] [Website] [Project] [Code]

Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models
[CVPR 2023] [Website] [Project] [Code]

ZIGMA: A DiT-style Zigzag Mamba Diffusion Model
[ECCV 2024] [Project] [Code]

MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation
[NeurIPS 2022] [Website] [Project] [Code]

GLOBER: Coherent Non-autoregressive Video Generation via GLOBal Guided Video DecodER
[NeurIPS 2023] [Website] [Code]

Free-Bloom: Zero-Shot Text-to-Video Generator with LLM Director and LDM Animator
[NeurIPS 2023] [Website] [Code]

Conditional Image-to-Video Generation with Latent Flow Diffusion Models
[CVPR 2023] [Website] [Code]

FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation
[CVPR 2023] [Project] [Code]

TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models
[CVPR 2024] [Project] [Code]

Video Diffusion Models
[ICLR 2022 workshop] [Website] [Code] [Project]

PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models
[Website] [Diffusers Doc] [Project] [Code]

IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation
[ECCV 2024] [Project] [Code]

EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
[ECCV 2024] [Project] [Code]

T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design
[Website] [Project] [Code]

Tora: Trajectory-oriented Diffusion Transformer for Video Generation
[Website] [Project] [Code]

MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling
[Website] [Project] [Code]

MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence
[Website] [Project] [Code]

SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation
[Website] [Project] [Code]

Video Diffusion Alignment via Reward Gradients
[Website] [Project] [Code]

Live2Diff: Live Stream Translation via Uni-directional Attention in Video Diffusion Models
[Website] [Project] [Code]

Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models
[Website] [Project] [Code]

TVG: A Training-free Transition Video Generation Method with Diffusion Models
[Website] [Project] [Code]

VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement
[Website] [Project] [Code]

CamI2V: Camera-Controlled Image-to-Video Diffusion Model
[Website] [Project] [Code]

Identity-Preserving Text-to-Video Generation by Frequency Decomposition
[Website] [Project] [Code]

Enhancing Motion in Text-to-Video Generation with Decomposed Encoding and Conditioning
[Website] [Project] [Code]

MIMAFace: Face Animation via Motion-Identity Modulated Appearance Feature Learning
[Website] [Project] [Code]

MotionClone: Training-Free Motion Cloning for Controllable Video Generation
[Website] [Project] [Code]

TransPixar: Advancing Text-to-Video Generation with Transparency
[Website] [Project] [Code]

StableAnimator: High-Quality Identity-Preserving Human Image Animation
[Website] [Project] [Code]

AnimateAnything: Consistent and Controllable Animation for Video Generation
[Website] [Project] [Code]

GameGen-X: Interactive Open-world Game Video Generation
[Website] [Project] [Code]

AniDoc: Animation Creation Made Easier
[Website] [Project] [Code]

VEnhancer: Generative Space-Time Enhancement for Video Generation
[Website] [Project] [Code]

SF-V: Single Forward Video Generation Model
[Website] [Project] [Code]

Video Motion Transfer with Diffusion Transformers
[Website] [Project] [Code]

SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints
[Website] [Project] [Code]

Pyramidal Flow Matching for Efficient Video Generative Modeling
[Website] [Project] [Code]

AnchorCrafter: Animate CyberAnchors Selling Your Products via Human-Object Interacting Video Generation
[Website] [Project] [Code]

Trajectory Attention for Fine-grained Video Motion Control
[Website] [Project] [Code]

GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration
[Website] [Project] [Code]

DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation
[Website] [Project] [Code]

CoNo: Consistency Noise Injection for Tuning-free Long Video Diffusion
[Website] [Project] [Code]

Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation
[Website] [Project] [Code]

MotionShop: Zero-Shot Motion Transfer in Video Diffusion Models with Mixture of Score Guidance
[Website] [Project] [Code]

VideoTetris: Towards Compositional Text-to-Video Generation
[Website] [Project] [Code]

T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback
[Website] [Project] [Code]

ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation
[Website] [Project] [Code]

MotionBooth: Motion-Aware Customized Text-to-Video Generation
[Website] [Project] [Code]

Long Video Diffusion Generation with Segmented Cross-Attention and Content-Rich Video Data Curation
[Website] [Project] [Code]

MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model
[Website] [Project] [Code]

MotionDreamer: Zero-Shot 3D Mesh Animation from Video Diffusion Models
[Website] [Project] [Code]

MotionCraft: Physics-based Zero-Shot Video Generation
[Website] [Project] [Code]

MotionMaster: Training-free Camera Motion Transfer For Video Generation
[Website] [Project] [Code]

Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
[Website] [Project] [Code]
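
Stable Video Diffusion can also be run through Diffusers' `StableVideoDiffusionPipeline`; a minimal image-to-video sketch follows, where the conditioning-frame path and fps are illustrative assumptions.

```python
# Minimal Stable Video Diffusion sketch via Diffusers (paths are assumptions).
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import export_to_video, load_image

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# SVD conditions on a single frame; 1024x576 matches the released checkpoint.
image = load_image("conditioning_frame.png").resize((1024, 576))
frames = pipe(image, decode_chunk_size=8).frames[0]
export_to_video(frames, "svd_out.mp4", fps=7)
```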

Motion Inversion for Video Customization
[Website] [Project] [Code]

MagicAvatar: Multimodal Avatar Generation and Animation
[Website] [Project] [Code]

Progressive Autoregressive Video Diffusion Models
[Website] [Project] [Code]

TrailBlazer: Trajectory Control for Diffusion-Based Video Generation
[Website] [Project] [Code]

Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos
[Website] [Project] [Code]

Align3R: Aligned Monocular Depth Estimation for Dynamic Videos
[Website] [Project] [Code]

Breathing Life Into Sketches Using Text-to-Video Priors
[Website] [Project] [Code]

Latent Video Diffusion Models for High-Fidelity Long Video Generation
[Website] [Project] [Code]

Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance
[Website] [Project] [Code]

RepVideo: Rethinking Cross-Layer Representation for Video Generation
[Website] [Project] [Code]

Gen-L-Video: Multi-Text to Long Video Generation via Temporal Co-Denoising
[Website] [Project] [Code]

Control-A-Video: Controllable Text-to-Video Generation with Diffusion Models
[Website] [Project] [Code]

VideoComposer: Compositional Video Synthesis with Motion Controllability
[Website] [Project] [Code]

DreamPose: Fashion Image-to-Video Synthesis via Stable Diffusion
[Website] [Project] [Code]

LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models
[Website] [Project] [Code]

Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
[Website] [Project] [Code]

LAMP: Learn A Motion Pattern for Few-Shot-Based Video Generation
[Website] [Project] [Code]

MagicDance: Realistic Human Dance Video Generation with Motions & Facial Expressions Transfer
[Website] [Project] [Code]

LLM-Grounded Video Diffusion Models
[Website] [Project] [Code]

FreeNoise: Tuning-Free Longer Video Diffusion Via Noise Rescheduling
[Website] [Project] [Code]

VideoCrafter1: Open Diffusion Models for High-Quality Video Generation
[Website] [Project] [Code]

VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
[Website] [Project] [Code]

VideoDreamer: Customized Multi-Subject Text-to-Video Generation with Disen-Mix Finetuning
[Website] [Project] [Code]

I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models
[Website] [Project] [Code]

FusionFrames: Efficient Architectural Aspects for Text-to-Video Generation Pipeline
[Website] [Project] [Code]

MotionCtrl: A Unified and Flexible Motion Controller for Video Generation
[Website] [Project] [Code]

ART⋅V: Auto-Regressive Text-to-Video Generation with Diffusion Models
[Website] [Project] [Code]

FlowZero: Zero-Shot Text-to-Video Synthesis with LLM-Driven Dynamic Scene Syntax
[Website] [Project] [Code]

VideoBooth: Diffusion-based Video Generation with Image Prompts
[Website] [Project] [Code]

MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model
[Website] [Project] [Code]

LivePhoto: Real Image Animation with Text-guided Motion Control
[Website] [Project] [Code]

AnimateZero: Video Diffusion Models are Zero-Shot Image Animators
[Website] [Project] [Code]

DreamVideo: Composing Your Dream Videos with Customized Subject and Motion
[Website] [Project] [Code]

Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation
[Website] [Project] [Code]

DreaMoving: A Human Dance Video Generation Framework based on Diffusion Models
[Website] [Project] [Code]

Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution
[Website] [Project] [Code]

FreeInit: Bridging Initialization Gap in Video Diffusion Models
[Website] [Project] [Code]

Text2AC-Zero: Consistent Synthesis of Animated Characters using 2D Diffusion
[Website] [Project] [Code]

StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter
[Website] [Project] [Code]

A Recipe for Scaling up Text-to-Video Generation with Text-free Videos
[Website] [Project] [Code]

Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise
[Website] [Project] [Code]

GameFactory: Creating New Games with Generative Interactive Videos
[Website] [Project] [Code]

FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis
[Website] [Project] [Code]

Moonshot: Towards Controllable Video Generation and Editing with Multimodal Conditions
[Website] [Project] [Code]

Latte: Latent Diffusion Transformer for Video Generation
[Website] [Project] [Code]

WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens
[Website] [Project] [Code]

SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models
[Website] [Project] [Code]

Towards A Better Metric for Text-to-Video Generation
[Website] [Project] [Code]

Generative Inbetweening through Frame-wise Conditions-Driven Video Generation
[Website] [Project] [Code]

HelloMeme: Integrating Spatial Knitting Attentions to Embed High-Level and Fidelity-Rich Conditions in Diffusion Models
[Website] [Project] [Code]

AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning
[Website] [Project] [Code]

Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation
[Website] [Project] [Code]

UniCtrl: Improving the Spatiotemporal Consistency of Text-to-Video Diffusion Models via Training-Free Unified Attention Control
[Website] [Project] [Code]

VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models
[Website] [Project] [Code]

ID-Animator: Zero-Shot Identity-Preserving Human Video Generation
[Website] [Project] [Code]

Optical-Flow Guided Prompt Optimization for Coherent Video Generation
[Website] [Project] [Code]

Large Motion Video Autoencoding with Cross-modal Video VAE
[Website] [Project] [Code]

FlexiFilm: Long Video Generation with Flexible Conditions
[Website] [Project] [Code]

FIFO-Diffusion: Generating Infinite Videos from Text without Training
[Website] [Project] [Code]

TALC: Time-Aligned Captions for Multi-Scene Text-to-Video Generation
[Website] [Project] [Code]

CV-VAE: A Compatible Video VAE for Latent Generative Video Models
[Website] [Project] [Code]

MVOC: a training-free multiple video object composition method with diffusion models
[Website] [Project] [Code]

Identifying and Solving Conditional Image Leakage in Image-to-Video Diffusion Model
[Website] [Project] [Code]

VideoDPO: Omni-Preference Alignment for Video Diffusion Generation
[Website] [Project] [Code]

Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation
[Website] [Project] [Code]

Efficient Long Video Tokenization via Coordinate-based Patch Reconstruction
[Website] [Project] [Code]

AMG: Avatar Motion Guided Video Generation
[Website] [Project] [Code]

DiVE: DiT-based Video Generation with Enhanced Control
[Website] [Project] [Code]

MegActor-Σ: Unlocking Flexible Mixed-Modal Control in Portrait Animation with Diffusion Transformer
[Website] [Project] [Code]

X-Dyna: Expressive Dynamic Human Image Animation
[Website] [Project] [Code]

CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers
[ICLR 2023] [Code]

UFO: Enhancing Diffusion-Based Video Generation with a Uniform Frame Organizer
[AAAI 2025] [Code]

Open-Sora: Democratizing Efficient Video Production for All
[Website] [Code]

InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption
[Website] [Code]

MoTrans: Customized Motion Transfer with Text-driven Video Diffusion Models
[Website] [Code]

CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer
[Website] [Code]

Cross-Modal Contextualized Diffusion Models for Text-Guided Visual Generation and Editing
[ICLR 2024] [Code]

SSM Meets Video Diffusion Models: Efficient Video Generation with Structured State Spaces
[ICLR 2024] [Code]

Ca2-VDM: Efficient Autoregressive Video Diffusion Model with Causal Generation and Cache Sharing
[Website] [Code]

Redefining Temporal Modeling in Video Diffusion: The Vectorized Timestep Approach
[Website] [Code]

PhyT2V: LLM-Guided Iterative Self-Refinement for Physics-Grounded Text-to-Video Generation
[Website] [Code]

Real-Time Video Generation with Pyramid Attention Broadcast
[Website] [Code]

Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model
[Website] [Code]

Diffusion Probabilistic Modeling for Video Generation
[Website] [Code]

DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors
[Website] [Code]

VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation
[Website] [Code]

Autoregressive Video Generation without Vector Quantization
[Website] [Code]

STDiff: Spatio-temporal Diffusion for Continuous Stochastic Video Prediction
[Website] [Code]

CustomTTT: Motion and Appearance Customized Video Generation via Test-Time Training
[Website] [Code]

LTX-Video: Realtime Video Latent Diffusion
[Website] [Code]

Vlogger: Make Your Dream A Vlog
[Website] [Code]

Magic-Me: Identity-Specific Video Customized Diffusion
[Website] [Code]

VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models
[Website] [Code]

EchoReel: Enhancing Action Generation of Existing Video Diffusion Models
[Website] [Code]

StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
[Website] [Code]

TAVGBench: Benchmarking Text to Audible-Video Generation
[Website] [Code]

OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model
[Website] [Code]

Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation
[Website] [Code]

FlipSketch: Flipping Static Drawings to Text-Guided Sketch Animations
[Website] [Code]

IV-Mixed Sampler: Leveraging Image Diffusion Models for Enhanced Video Synthesis
[Website] [Code]

REDUCIO! Generating 1024×1024 Video within 16 Seconds using Extremely Compressed Motion Latents
[Website] [Code]

GRID: Visual Layout Generation
[Website] [Code]

MAVIN: Multi-Action Video Generation with Diffusion Models via Transition Video Infilling
[Website] [Code]

WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model
[Website] [Code]

HARIVO: Harnessing Text-to-Image Models for Video Generation
[ECCV 2024] [Project]

Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners
[CVPR 2024] [Project]

AtomoVideo: High Fidelity Image-to-Video Generation
[CVPR 2024] [Project]

Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition
[ICLR 2024] [Project]

TRIP: Temporal Residual Learning with Image Noise Prior for Image-to-Video Diffusion Models
[CVPR 2024] [Project]

ZoLA: Zero-Shot Creative Long Animation Generation with Short Video Model
[ECCV 2024] [Project]

TCAN: Animating Human Images with Temporally Consistent Pose Guidance using Diffusion Models
[ECCV 2024] [Project]

VideoMaker: Zero-shot Customized Video Generation with the Inherent Force of Video Diffusion Models
[Website] [Project]

Motion Prompting: Controlling Video Generation with Motion Trajectories
[Website] [Project]

Mojito: Motion Trajectory and Intensity Control for Video Generation
[Website] [Project]

OmniCreator: Self-Supervised Unified Generation with Universal Editing
[Website] [Project]

DiCoDe: Diffusion-Compressed Deep Tokens for Autoregressive Video Generation with Language Models
[Website] [Project]

VideoGen-of-Thought: A Collaborative Framework for Multi-Shot Video Generation
[Website] [Project]

Scene Co-pilot: Procedural Text to Video Generation with Human in the Loop
[Website] [Project]

DynamicScaler: Seamless and Scalable Video Generation for Panoramic Scenes
[Website] [Project]

LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity
[Website] [Project]

Training-free Long Video Generation with Chain of Diffusion Model Experts
[Website] [Project]

Free2Guide: Gradient-Free Path Integral Control for Enhancing Text-to-Video Generation with Large Vision-Language Model
[Website] [Project]

FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Attention
[Website] [Project]

CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation
[Website] [Project]

Hierarchical Patch Diffusion Models for High-Resolution Video Generation
[Website] [Project]

Mimir: Improving Video Diffusion Models for Precise Text Understanding
[Website] [Project]

From Slow Bidirectional to Fast Causal Video Generators
[Website] [Project]

ModelGrow: Continual Text-to-Video Pre-training with Model Expansion and Language Understanding Enhancement
[Website] [Project]

I4VGen: Image as Stepping Stone for Text-to-Video Generation
[Website] [Project]

Improving Dynamic Object Interactions in Text-to-Video Generation with AI Feedback
[Website] [Project]

FrameBridge: Improving Image-to-Video Generation with Bridge Models
[Website] [Project]

MarDini: Masked Autoregressive Diffusion for Video Generation at Scale
[Website] [Project]

Boosting Camera Motion Control for Video Diffusion Transformers
[Website] [Project]

UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation
[Website] [Project]

Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control
[Website] [Project]

Controllable Longer Image Animation with Diffusion Models
[Website] [Project]

AniClipart: Clipart Animation with Text-to-Video Priors
[Website] [Project]

Spectral Motion Alignment for Video Motion Transfer using Diffusion Models
[Website] [Project]

TimeRewind: Rewinding Time with Image-and-Events Video Diffusion
[Website] [Project]

VideoPoet: A Large Language Model for Zero-Shot Video Generation
[Website] [Project]

PEEKABOO: Interactive Video Generation via Masked-Diffusion
[Website] [Project]

Searching Priors Makes Text-to-Video Synthesis Better
[Website] [Project]

Reuse and Diffuse: Iterative Denoising for Text-to-Video Generation
[Website] [Project]

Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning
[Website] [Project]

BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models
[Website] [Project]

SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation
[Website] [Project]

Imagen Video: High Definition Video Generation with Diffusion Models
[Website] [Project]

MoVideo: Motion-Aware Video Generation with Diffusion Models
[Website] [Project]

Latent-Reframe: Enabling Camera Control for Video Diffusion Model without Training
[Website] [Project]

Space-Time Diffusion Features for Zero-Shot Text-Driven Motion Transfer
[Website] [Project]

BlobGEN-Vid: Compositional Text-to-Video Generation with Blob Video Representations
[Website] [Project]

Smooth Video Synthesis with Noise Constraints on Diffusion Models for One-shot Video Tuning
[Website] [Project]

VideoAssembler: Identity-Consistent Video Generation with Reference Entities using Diffusion Model
[Website] [Project]

MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation
[Website] [Project]

Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models
[Website] [Project]

GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation
[Website] [Project]

Customizing Motion in Text-to-Video Diffusion Models
[Website] [Project]

Photorealistic Video Generation with Diffusion Models
[Website] [Project]

DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control
[Website] [Project]

VideoDrafter: Content-Consistent Multi-Scene Video Generation with LLM
[Website] [Project]

Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models
[Website] [Project]

ActAnywhere: Subject-Aware Video Background Generation
[Website] [Project]

Lumiere: A Space-Time Diffusion Model for Video Generation
[Website] [Project]

InstructVideo: Instructing Video Diffusion Models with Human Feedback
[Website] [Project]

Boximator: Generating Rich and Controllable Motions for Video Synthesis
[Website] [Project]

Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion
[Website] [Project]

ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation
[Website] [Project]

Tuning-Free Noise Rectification for High Fidelity Image-to-Video Generation
[Website] [Project]

Audio-Synchronized Visual Animation
[Website] [Project]

I2VControl: Disentangled and Unified Video Motion Synthesis Control
[Website] [Project]

Mind the Time: Temporally-Controlled Multi-Event Video Generation
[Website] [Project]

VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis
[Website] [Project]

S2DM: Sector-Shaped Diffusion Models for Video Generation
[Website] [Project]

MotionFlow: Attention-Driven Motion Transfer in Video Diffusion Models
[Website] [Project]

AnimateZoo: Zero-shot Video Generation of Cross-Species Animation via Subject Alignment
[Website] [Project]

Disentangling Foreground and Background Motion for Enhanced Realism in Human Video Generation
[Website] [Project]

Dance Any Beat: Blending Beats with Visuals in Dance Video Generation
[Website] [Project]

Training-Free Motion-Guided Video Generation with Enhanced Temporal Consistency Using Motion Consistency Loss
[Website] [Project]

PoseCrafter: One-Shot Personalized Video Synthesis Following Flexible Pose Control
[Website] [Project]

Human4DiT: Free-view Human Video Generation with 4D Diffusion Transformer
[Website] [Project]

Follow-Your-Emoji: Fine-Controllable and Expressive Freestyle Portrait Animation
[Website] [Project]

FancyVideo: Towards Dynamic and Consistent Video Generation via Cross-frame Textual Guidance
[Website] [Project]

VideoAuteur: Towards Long Narrative Video Generation
[Website] [Project]

CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities
[Website] [Project]

Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention
[Website] [Project]

DOLLAR: Few-Step Video Generation via Distillation and Latent Reward Optimization
[Website] [Project]

VideoGuide: Improving Video Diffusion Models without Training Through a Teacher's Guide
[Website] [Project]

MovieCharacter: A Tuning-Free Framework for Controllable Character Video Synthesis
[Website] [Project]

ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation
[Website] [Project]

Improved Video VAE for Latent Video Diffusion Model
[Website] [Project]

Track4Gen: Teaching Video Diffusion Models to Track Points Improves Video Generation
[Website] [Project]

Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video Generation
[Website] [Project]

SnapGen-V: Generating a Five-Second Video within Five Seconds on a Mobile Device
[Website] [Project]

OnlineVPO: Align Video Diffusion Model with Online Video-Centric Preference Optimization
[Website] [Project]

DragEntity: Trajectory Guided Video Generation using Entity and Positional Relationships
[ACM MM 2024 Oral]

Ouroboros-Diffusion: Exploring Consistent Content Generation in Tuning-free Long Video Diffusion
[Website]

Four-Plane Factorized Video Autoencoders
[Website]

Grid Diffusion Models for Text-to-Video Generation
[Website]

SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction
[Website]

GenRec: Unifying Video Generation and Recognition with Diffusion Models
[Website]

Efficient Continuous Video Flow Model for Video Prediction
[Website]

Dual-Stream Diffusion Net for Text-to-Video Generation
[Website]

DisenStudio: Customized Multi-subject Text-to-Video Generation with Disentangled Spatial Control
[Website]

SimDA: Simple Diffusion Adapter for Efficient Video Generation
[Website]

VideoFactory: Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation
[Website]

Empowering Dynamics-aware Text-to-Video Diffusion with Large Language Models
[Website]

ConditionVideo: Training-Free Condition-Guided Text-to-Video Generation
[Website]

LatentWarp: Consistent Diffusion Latents for Zero-Shot Video-to-Video Translation
[Website]

Optimal Noise pursuit for Augmenting Text-to-Video Generation
[Website]

Make Pixels Dance: High-Dynamic Video Generation
[Website]

Video-Infinity: Distributed Long Video Generation
[Website]

GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning
[Website]

Highly Detailed and Temporal Consistent Video Stylization via Synchronized Multi-Frame Diffusion
[Website]

Decouple Content and Motion for Conditional Image-to-Video Generation
[Website]

X-Portrait: Expressive Portrait Animation with Hierarchical Motion Attention
[Website]

F3-Pruning: A Training-Free and Generalized Pruning Strategy towards Faster and Finer Text-to-Video Synthesis
[Website]

MTVG : Multi-text Video Generation with Text-to-Video Models
[Website]

VideoLCM: Video Latent Consistency Model
[Website]

MotionAura: Generating High-Quality and Motion Consistent Videos using Discrete Diffusion
[Website]

MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation
[Website]

I2V-Adapter: A General Image-to-Video Adapter for Video Diffusion Models
[Website]

360DVD: Controllable Panorama Video Generation with 360-Degree Video Diffusion Model
[Website]

CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects
[Website]

Motion-Zero: Zero-Shot Moving Object Control Framework for Diffusion-Based Video Generation
[Website]

Training-Free Semantic Video Composition via Pre-trained Diffusion Model
[Website]

STIV: Scalable Text and Image Conditioned Video Generation
[Website]

Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling
[Website]

Diffutoon: High-Resolution Editable Toon Shading via Diffusion Models
[Website]

Human Video Translation via Query Warping
[Website]

Hybrid Video Diffusion Models with 2D Triplane and 3D Wavelet Representation
[Website]

Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis
[Website]

Context-aware Talking Face Video Generation
[Website]

Pix2Gif: Motion-Guided Diffusion for GIF Generation
[Website]

Intention-driven Ego-to-Exo Video Generation
[Website]

AnimateDiff-Lightning: Cross-Model Diffusion Distillation
[Website]

Frame by Familiar Frame: Understanding Replication in Video Diffusion Models
[Website]

Matten: Video Generation with Mamba-Attention
[Website]

Vidu: a Highly Consistent, Dynamic and Skilled Text-to-Video Generator with Diffusion Models
[Website]

ReVideo: Remake a Video with Motion and Content Control
[Website]

VividPose: Advancing Stable Video Diffusion for Realistic Human Image Animation
[Website]

SNED: Superposition Network Architecture Search for Efficient Video Diffusion Model
[Website]

GVDIFF: Grounded Text-to-Video Generation with Diffusion Models
[Website]

Mobius: An High Efficient Spatial-Temporal Parallel Training Paradigm for Text-to-Video Generation Task
[Website]

Contrastive Sequential-Diffusion Learning: An approach to Multi-Scene Instructional Video Synthesis
[Website]

Multi-sentence Video Grounding for Long Video Generation
[Website]

Fine-gained Zero-shot Video Sampling
[Website]

Factorized-Dreamer: Training A High-Quality Video Generator with Limited and Low-Quality Data
[Website]

xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations
[Website]

EasyControl: Transfer ControlNet to Video Diffusion for Controllable Generation and Interpolation
[Website]

Alignment is All You Need: A Training-free Augmentation Strategy for Pose-guided Video Generation
[Website]

One-Shot Learning Meets Depth Diffusion in Multi-Object Videos
[Website]

Denoising Reuse: Exploiting Inter-frame Motion Consistency for Efficient Video Latent Generation
[Website]

S2AG-Vid: Enhancing Multi-Motion Alignment in Video Diffusion Models via Spatial and Syntactic Attention-Based Guidance
[Website]

JVID: Joint Video-Image Diffusion for Visual-Quality and Temporal-Consistency in Video Generation
[Website]

ImmersePro: End-to-End Stereo Video Synthesis Via Implicit Disparity Learning
[Website]

COMUNI: Decomposing Common and Unique Video Signals for Diffusion-based Video Generation
[Website]

Noise Crystallization and Liquid Noise: Zero-shot Video Generation using Image Diffusion Models
[Website]

BroadWay: Boost Your Text-to-Video Generation Model in a Training-free Way
[Website]

LumiSculpt: A Consistency Lighting Control Network for Video Generation
[Website]

TPC: Test-time Procrustes Calibration for Diffusion-based Human Image Animation
[Website]

OnlyFlow: Optical Flow based Motion Conditioning for Video Diffusion Models
[Website]

Teaching Video Diffusion Model with Latent Physical Phenomenon Knowledge
[Website]

SpatialDreamer: Self-supervised Stereo Video Synthesis from Monocular Input
[Website]

StereoCrafter-Zero: Zero-Shot Stereo Video Generation with Noisy Restart
[Website]

VIRES: Video Instance Repainting with Sketch and Text Guidance
[Website]

MotionCharacter: Identity-Preserving and Motion Controllable Human Video Generation
[Website]

Enhancing Sketch Animation: Text-to-Video Diffusion Models with Temporal Consistency and Rigidity Constraints
[Website]

Fleximo: Towards Flexible Text-to-Human Motion Video Generation
[Website]

SPAgent: Adaptive Task Decomposition and Model Selection for General Video Generation and Editing
[Website]

Towards Chunk-Wise Generation for Long Videos
[Website]

Motion Dreamer: Realizing Physically Coherent Video Generation through Scene-Aware Motion Reasoning
[Website]

CPA: Camera-pose-awareness Diffusion Transformer for Video Generation
[Website]

Sketch-Guided Motion Diffusion for Stylized Cinemagraph Synthesis
[Website]

MotionStone: Decoupled Motion Intensity Modulation with Diffusion Transformer for Image-to-Video Generation
[Website]

Mobile Video Diffusion
[Website]

TIV-Diffusion: Towards Object-Centric Movement for Text-driven Image to Video Generation
[Website]

Can video generation replace cinematographers? Research on the cinematic language of generated video
[Website]

MotionBridge: Dynamic Video Inbetweening with Flexible Controls
[Website]

Enhancing Multi-Text Long Video Generation Consistency without Tuning: Time-Frequency Analysis, Prompt Alignment, and Theory
[Website]

Tuning-Free Long Video Generation via Global-Local Collaborative Diffusion
[Website]

Video Editing

FateZero: Fusing Attentions for Zero-shot Text-based Video Editing
[ICCV 2023 Oral] [Website] [Project] [Code]

Text2LIVE: Text-Driven Layered Image and Video Editing
[ECCV 2022 Oral] [Project] [Code]

Diffusion Video Autoencoders: Toward Temporally Consistent Face Video Editing via Disentangled Video Encoding
[CVPR 2023] [Project] [Code]

Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
[ICCV 2023] [Project] [Code]

StableVideo: Text-driven Consistency-aware Diffusion Video Editing
[ICCV 2023] [Website] [Code]

Noise Calibration: Plug-and-play Content-Preserving Video Enhancement using Pre-trained Video Diffusion Models
[ECCV 2024] [Project] [Code]

StableV2V: Stablizing Shape Consistency in Video-to-Video Editing
[Website] [Project] [Code]

Video-P2P: Video Editing with Cross-attention Control
[Website] [Project] [Code]

CoDeF: Content Deformation Fields for Temporally Consistent Video Processing
[Website] [Project] [Code]

MagicEdit: High-Fidelity and Temporally Coherent Video Editing
[Website] [Project] [Code]

TokenFlow: Consistent Diffusion Features for Consistent Video Editing
[Website] [Project] [Code]

ControlVideo: Adding Conditional Control for One Shot Text-to-Video Editing
[Website] [Project] [Code]

Make-A-Protagonist: Generic Video Editing with An Ensemble of Experts
[Website] [Project] [Code]

MotionDirector: Motion Customization of Text-to-Video Diffusion Models
[Website] [Project] [Code]

EVA: Zero-shot Accurate Attributes and Multi-Object Video Editing
[Website] [Project] [Code]

RAVE: Randomized Noise Shuffling for Fast and Consistent Video Editing with Diffusion Models
[Website] [Project] [Code]

Ground-A-Video: Zero-shot Grounded Video Editing using Text-to-image Diffusion Models
[Website] [Project] [Code]

MotionEditor: Editing Video Motion via Content-Aware Diffusion
[Website] [Project] [Code]

VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models
[Website] [Project] [Code]

MagicStick: Controllable Video Editing via Control Handle Transformations
[Website] [Project] [Code]

VidToMe: Video Token Merging for Zero-Shot Video Editing
[Website] [Project] [Code]

VASE: Object-Centric Appearance and Shape Manipulation of Real Videos
[Website] [Project] [Code]

Neural Video Fields Editing
[Website] [Project] [Code]

UniEdit: A Unified Tuning-Free Framework for Video Motion and Appearance Editing
[Website] [Project] [Code]

MotionFollower: Editing Video Motion via Lightweight Score-Guided Diffusion
[Website] [Project] [Code]

Vid2Vid-zero: Zero-Shot Video Editing Using Off-the-Shelf Image Diffusion Models
[Website] [Code]

Re-Attentional Controllable Video Diffusion Editing
[Website] [Code]

DiffSLVA: Harnessing Diffusion Models for Sign Language Video Anonymization
[Website] [Code]

SST-EM: Advanced Metrics for Evaluating Semantic, Spatial and Temporal Aspects in Video Editing
[Website] [Code]

LOVECon: Text-driven Training-Free Long Video Editing with ControlNet
[Website] [Code]

Pix2Video: Video Editing using Image Diffusion
[Website] [Code]

E-Bench: Subjective-Aligned Benchmark Suite for Text-Driven Video Editing Quality Assessment
[Website] [Code]

Style-A-Video: Agile Diffusion for Arbitrary Text-based Video Style Transfer
[Website] [Code]

Flow-Guided Diffusion for Video Inpainting
[Website] [Code]

Investigating the Effectiveness of Cross-Attention to Unlock Zero-Shot Editing of Text-to-Video Diffusion Models
[Website] [Code]

Edit-Your-Motion: Space-Time Diffusion Decoupling Learning for Video Motion Editing
[Website] [Code]

COVE: Unleashing the Diffusion Feature Correspondence for Consistent Video Editing
[Website] [Code]

Shape-Aware Text-Driven Layered Video Editing
[CVPR 2023] [Website] [Project]

VideoDirector: Precise Video Editing via Text-to-Video Models
[Website] [Project]

NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing
[Website] [Project]

Slicedit: Zero-Shot Video Editing With Text-to-Image Diffusion Models Using Spatio-Temporal Slices
[Website] [Project]

DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing
[Website] [Project]

I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion Models
[Website] [Project]

FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing
[Website] [Project]

VidEdit: Zero-Shot and Spatially Aware Text-Driven Video Editing
[Website] [Project]

DIVE: Taming DINO for Subject-Driven Video Editing
[Website] [Project]

VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence
[Website] [Project]

Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation
[Website] [Project]

ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning
[Website] [Project]

WAVE: Warping DDIM Inversion Features for Zero-shot Text-to-Video Editing
[ECCV 2024] [Project]

MeDM: Mediating Image Diffusion Models for Video-to-Video Translation with Temporal Correspondence Guidance
[Website] [Project]

Customize-A-Video: One-Shot Motion Customization of Text-to-Video Diffusion Models
[Website] [Project]

DreamMotion: Space-Time Self-Similarity Score Distillation for Zero-Shot Video Editing
[Website] [Project]

MIVE: New Design and Benchmark for Multi-Instance Video Editing
[Website] [Project]

VIVID-10M: A Dataset and Baseline for Versatile and Interactive Video Local Editing
[Website] [Project]

DeCo: Decoupled Human-Centered Diffusion Video Editing with Motion Consistency
[ECCV 2024]

Edit Temporal-Consistent Videos with Image Diffusion Model
[Website]

Streaming Video Diffusion: Online Video Editing with Diffusion Models
[Website]

Cut-and-Paste: Subject-Driven Video Editing with Attention Control
[Website]

MagicProp: Diffusion-based Video Editing via Motion-aware Appearance Propagation
[Website]

Dreamix: Video Diffusion Models Are General Video Editors
[Website]

Towards Consistent Video Editing with Text-to-Image Diffusion Models
[Website]

EVE: Efficient zero-shot text-based Video Editing with Depth Map Guidance and Temporal Consistency Constraints
[Website]

CCEdit: Creative and Controllable Video Editing via Diffusion Models
[Website]

Fuse Your Latents: Video Editing with Multi-source Latent Diffusion Models
[Website]

FastBlend: a Powerful Model-Free Toolkit Making Video Stylization Easier
[Website]

VIDiff: Translating Videos via Multi-Modal Instructions with Diffusion Models
[Website]

RealCraft: Attention Control as A Solution for Zero-shot Long Video Editing
[Website]

Object-Centric Diffusion for Efficient Video Editing
[Website]

FastVideoEdit: Leveraging Consistency Models for Efficient Text-to-Video Editing
[Website]

Video Editing via Factorized Diffusion Distillation
[Website]

MAKIMA: Tuning-free Multi-Attribute Open-domain Video Editing via Mask-Guided Attention Modulation
[Website]

EffiVED: Efficient Video Editing via Text-instruction Diffusion Models
[Website]

Videoshop: Localized Semantic Video Editing with Noise-Extrapolated Diffusion Inversion
[Website]

GenVideo: One-shot Target-image and Shape Aware Video Editing using T2I Diffusion Models
[Website]

Temporally Consistent Object Editing in Videos using Extended Attention
[Website]

Enhancing Temporal Consistency in Video Editing by Reconstructing Videos with 3D Gaussian Splatting
[Website]

FRAG: Frequency Adapting Group for Diffusion Video Editing
[Website]

InVi: Object Insertion In Videos Using Off-the-Shelf Diffusion Models
[Website]

Text-based Talking Video Editing with Cascaded Conditional Diffusion
[Website]

Reenact Anything: Semantic Video Motion Transfer Using Motion-Textual Inversion
[Website]

Blended Latent Diffusion under Attention Control for Real-World Video Editing
[Website]

EditBoard: Towards A Comprehensive Evaluation Benchmark for Text-based Video Editing Models
[Website]

DNI: Dilutional Noise Initialization for Diffusion Video Editing
[Website]

FreeMask: Rethinking the Importance of Attention Masks for Zero-Shot Video Editing
[Website]

Replace Anyone in Videos
[Website]

Shaping a Stabilized Video by Mitigating Unintended Changes for Concept-Augmented Video Editing
[Website]

DreamColour: Controllable Video Colour Editing without Training
[Website]

MoViE: Mobile Diffusion for Video Editing
[Website]

Edit as You See: Image-guided Video Editing via Masked Motion Modeling
[Website]

IP-FaceDiff: Identity-Preserving Facial Video Editing with Diffusion
[Website]

Qffusion: Controllable Portrait Video Editing via Quadrant-Grid Attention Learning
[Website]