This repository contains the official implementation of the following paper:
Tuning-Free Image Customization with Image and Text Guidance
Current image customization methods based on diffusion models have limitations such as unintended changes, reliance on either reference images or text alone, and time-consuming fine-tuning. We introduce TIGIC, a tuning-free framework for text-and-image-guided image customization that enables precise, region-specific edits within seconds. Our approach preserves the reference image's semantic features while modifying details based on text descriptions, using an innovative attention blending strategy in the UNet decoder. This method is applicable to image synthesis, design, and creative photography.
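The attention blending strategy itself is detailed in the paper. As a rough illustration of the general idea only (not the actual TIGIC implementation), the sketch below blends the cross-attention map from a reference branch into the editing branch within a masked region; the tensor shapes, `region_mask`, and `blend_weight` are assumptions made for the sake of the example.

```python
# Illustrative sketch of region-restricted attention blending between a
# reference pass and an editing pass. All names here are hypothetical and
# are not taken from the TIGIC code base.
import torch

def blend_attention(attn_edit: torch.Tensor,
                    attn_ref: torch.Tensor,
                    region_mask: torch.Tensor,
                    blend_weight: float = 0.8) -> torch.Tensor:
    """Blend reference attention into the edited region only.

    attn_edit, attn_ref: (batch, heads, query_len, key_len) attention maps
    region_mask:         (query_len,) binary mask marking the edited region
    """
    mask = region_mask.view(1, 1, -1, 1)
    blended = blend_weight * attn_ref + (1.0 - blend_weight) * attn_edit
    # Keep background attention untouched; only the masked region is blended.
    return mask * blended + (1.0 - mask) * attn_edit
```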
git clone https://github.com/zrealli/TIGIC.git
cd TIGIC
To set up our environment, please follow these instructions:
conda env create -f environment.yaml
conda activate TIGIC
Please note that our project requires 24GB of memory to run.
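If you are unsure whether your hardware is sufficient, a quick check such as the one below (our own convenience snippet, assuming the 24GB figure refers to GPU memory) can save a failed run:

```python
# Check that the visible GPU has roughly 24 GB of memory before running.
import torch

if not torch.cuda.is_available():
    raise SystemExit("CUDA GPU not found; TIGIC inference requires a GPU.")

total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
print(f"GPU 0: {torch.cuda.get_device_name(0)}, {total_gb:.1f} GB")
if total_gb < 24:
    print("Warning: less than 24 GB of GPU memory; inference may run out of memory.")
```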
Next, download the Stable Diffusion weights and put them in ./ckpt.
wget https://huggingface.co/stabilityai/stable-diffusion-2-1-base/resolve/main/v2-1_512-ema-pruned.ckpt
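Alternatively, assuming you have the huggingface_hub package installed, the checkpoint can be fetched directly into ./ckpt with a short script like this (a convenience sketch, not part of the repository):

```python
# Download the Stable Diffusion 2.1-base checkpoint straight into ./ckpt.
from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id="stabilityai/stable-diffusion-2-1-base",
    filename="v2-1_512-ema-pruned.ckpt",
    local_dir="./ckpt",
)
```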
After downloading the base model, run inference with the following command:
python run.py --ckpt ./ckpt/v2-1_512-ema-pruned.ckpt --dpm_steps 20 --outdir ./outputs --root ./examples --scale 5 --seed 42
The generated results can be found in the ./outputs directory. Some examples are shown below.
From left to right, the images are the background, subject, collage, and the result generated by TIGIC.
'a photo of a pixel cartoon'
'a photo of a black sloth plushie'
'a photo of a polygonal illustration duck toy'
'a photo of a purple plushie toy'
'a photo of a sculpture golden retriever'
'a photo of a red clock'
🖋️ Please note that if the generated results contain significant artifacts, adjust the random seed via --seed
to obtain the desired outcome. As the example below demonstrates, different random seeds can produce results of varying quality; a small script for sweeping several seeds is sketched after it.
'a photo of a black sloth plushie'
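The following sketch, our own helper rather than part of the repository, tries a few arbitrary seeds using only the command-line flags documented above and writes each run to its own output folder:

```python
# Sweep several random seeds by invoking run.py once per seed.
# The seed list and per-seed output directories are our own choices.
import subprocess

for seed in (42, 123, 2024, 31337):
    subprocess.run(
        [
            "python", "run.py",
            "--ckpt", "./ckpt/v2-1_512-ema-pruned.ckpt",
            "--dpm_steps", "20",
            "--outdir", f"./outputs/seed_{seed}",
            "--root", "./examples",
            "--scale", "5",
            "--seed", str(seed),
        ],
        check=True,
    )
```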
Please refer to our project page for more visual comparisons.
We will soon release a Gradio demo version, which will include an integrated automatic foreground subject segmentation module.
This project is distributed under the MIT License. Our work builds upon the foundations laid by others: our code is based on TF-ICON and Prompt-to-Prompt, and we thank the authors of those projects for their contributions.
If you find our repo useful for your research, please consider citing our paper:
@inproceedings{li2024tuning,
title={Tuning-Free Image Customization with Image and Text Guidance},
author={Li, Pengzhi and Nie, Qiang and Chen, Ying and Jiang, Xi and Wu, Kai and Lin, Yuhuan and Liu, Yong and Peng, Jinlong and Wang, Chengjie and Zheng, Feng},
booktitle={European Conference on Computer Vision},
year={2024}
}