
Magic Mirror: ID-Preserved Video Generation in Video Diffusion Transformers

Showcases

Please refer to the project page for full-quality videos and more examples.

1. Reference-Driven Identity-Aware Text-to-Video Generation

[Gallery: reference image alongside two generated videos per identity]

2. Stylized & Special Effects

[Gallery: reference images paired with stylized generated videos]

3. Multi-Shot Generation

A bearded man, wearing a yellow T-shirt, working at a wooden table...

Reference: [image]

A woman, wearing a white shirt and blue jeans, enjoying her daytime activities...

Reference: [image]

Overview

Milestones

  • 2025-01-01: Paper released!
  • 2025-01 to 2025-02: We will release the code and models (we are working on fitting our method to CogVideoX-1.5, HunyuanVideo, etc.). Stay tuned!

Methods

[Figure: Magic Mirror framework]

In this work, we present Magic Mirror, a zero-shot framework for identity-preserved video generation. Magic Mirror incorporates dual facial embeddings and Conditional Adaptive Normalization (CAN) into DiT-based architectures, enabling robust identity preservation and stable training convergence. Extensive experiments demonstrate that Magic Mirror generates high-quality personalized videos while maintaining identity consistency from a single reference image, outperforming existing methods across multiple benchmarks and human evaluations.
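For intuition, here is a minimal PyTorch sketch of how a conditional adaptive normalization layer could inject a fused facial condition into a DiT block. All names, shapes, and the choice of embedding sources (an identity vector plus a semantic vector, standing in for the "dual facial embeddings") are illustrative assumptions for this sketch, not the released implementation.

```python
# Minimal sketch (assumed design, not the official Magic Mirror code):
# a LayerNorm whose per-channel scale and shift are predicted from a
# condition embedding, applied to DiT tokens.
import torch
import torch.nn as nn


class ConditionalAdaptiveNorm(nn.Module):
    """Normalization modulated by a condition vector (hypothetical CAN)."""

    def __init__(self, hidden_dim: int, cond_dim: int):
        super().__init__()
        # Affine parameters come from the condition, not from the norm itself.
        self.norm = nn.LayerNorm(hidden_dim, elementwise_affine=False)
        self.to_scale_shift = nn.Sequential(
            nn.SiLU(),
            nn.Linear(cond_dim, 2 * hidden_dim),
        )

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, hidden_dim); cond: (batch, cond_dim)
        scale, shift = self.to_scale_shift(cond).chunk(2, dim=-1)
        return self.norm(x) * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)


# Dual facial embeddings (placeholder tensors): e.g. an ID embedding from a
# face-recognition backbone and a semantic embedding from a vision encoder,
# concatenated into one condition vector.
id_embed = torch.randn(2, 512)
sem_embed = torch.randn(2, 768)
cond = torch.cat([id_embed, sem_embed], dim=-1)

can = ConditionalAdaptiveNorm(hidden_dim=1024, cond_dim=512 + 768)
tokens = torch.randn(2, 16, 1024)  # video latent tokens inside a DiT block
out = can(tokens, cond)
print(out.shape)  # torch.Size([2, 16, 1024])
```

In this reading, CAN plays a role analogous to the adaptive layer norm (adaLN) used in DiT, with the facial condition replacing or augmenting the usual timestep/class conditioning.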

Cite Magic Mirror

If you find this repo useful for your research, please consider citing the paper:

@article{zhang2025magic,
  title={Magic Mirror: ID-Preserved Video Generation in Video Diffusion Transformers},
  author={Zhang, Yuechen and Liu, Yaoyang and Xia, Bin and Peng, Bohao and Yan, Zexin and Lo, Eric and Jia, Jiaya},
  journal={arXiv preprint arXiv:2501.03931},
  year={2025}
}