This project implements a latent diffusion model for image generation using PyTorch and the diffusers library.
It first pretrains an autoencoder to compress images into a latent space, then performs diffusion in that latent space, which is more efficient than diffusing directly in pixel space.
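As a rough sketch of the idea (the shapes and layers here are illustrative, not this repo's actual architecture), a small convolutional autoencoder maps a 3x64x64 image to a 4x8x8 latent, and the diffusion model only ever operates on those latents:

```python
# Illustrative two-stage sketch, NOT the repo's architecture: a tiny conv
# autoencoder compresses 3x64x64 images into 4x8x8 latents; diffusion is then
# trained on those latents instead of raw pixels.
import torch
import torch.nn as nn

encoder = nn.Sequential(                        # 3x64x64 -> 4x8x8
    nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(64, 4, 4, stride=2, padding=1),
)
decoder = nn.Sequential(                        # 4x8x8 -> 3x64x64
    nn.ConvTranspose2d(4, 64, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),
)

images = torch.randn(8, 3, 64, 64)              # a batch of RGB images
latents = encoder(images)                       # shape (8, 4, 8, 8)
# ...diffusion (noising + denoising) would happen on `latents`...
recon = decoder(latents)                        # back to (8, 3, 64, 64)
```

Because the latent has roughly 48x fewer values than the image, each diffusion training and sampling step is correspondingly cheaper.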
You can install these dependencies using pip:
```
pip install torch torchvision diffusers tqdm
```

Download the CelebA images from Google Drive and extract them to a directory. Note that the images should be placed as `<DATA_ROOT>/<sub_folder_name>/123.jpg`, e.g. `data_root/celeba/123.jpg`, with no other subfolders under the data root.
You can also use other datasets; just make sure the images are placed in a single subfolder under the data root.
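This layout is the same one `torchvision.datasets.ImageFolder` expects, so a quick sanity check of the extracted data could look like the snippet below (the training scripts may build their dataloader differently; the transform sizes are just examples):

```python
# Sanity-check the dataset layout with torchvision's ImageFolder.
from torchvision import datasets, transforms

data_root = "data_root"  # contains exactly one subfolder, e.g. data_root/celeba/*.jpg
dataset = datasets.ImageFolder(
    data_root,
    transform=transforms.Compose([
        transforms.Resize(64),        # resize the shorter edge to 64
        transforms.CenterCrop(64),
        transforms.ToTensor(),
    ]),
)
print(len(dataset), dataset[0][0].shape)  # number of images, and a CxHxW tensor
```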
The autoencoder needs to be trained first. You can train it using the autoencoder.py script.
```
python -m torch.distributed.run --nproc_per_node=NUM_GPUS autoencoder.py
```

Replace `NUM_GPUS` with the number of GPUs you want to use. Adjust the hyperparameters in `autoencoder.py` as needed.
For debugging, you can also run on a single GPU with `python autoencoder.py`.
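For reference, `torch.distributed.run` launches one process per GPU and sets environment variables such as `RANK`, `LOCAL_RANK`, and `WORLD_SIZE`. A typical setup helper looks roughly like the sketch below; the repo's actual helpers live in `util.py` and may differ in detail:

```python
# Rough shape of the distributed setup expected by torch.distributed.run;
# the repo's real helpers are in util.py and may differ.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def setup_distributed() -> int:
    """Initialize the process group when launched by torch.distributed.run."""
    if "RANK" in os.environ:                      # env vars set by torch.distributed.run
        dist.init_process_group(backend="nccl")
        local_rank = int(os.environ["LOCAL_RANK"])
        torch.cuda.set_device(local_rank)
        return local_rank
    return 0                                      # plain `python autoencoder.py`: single process

local_rank = setup_distributed()
model = torch.nn.Conv2d(3, 3, 3, padding=1).cuda(local_rank)   # stand-in for the autoencoder
if dist.is_initialized():
    # Wrap in DDP so gradients are all-reduced across GPUs on every backward pass.
    model = DDP(model, device_ids=[local_rank])
```

When the script is launched directly with `python autoencoder.py`, those environment variables are absent and the code falls back to a single process, which is what makes the single-GPU debug invocation above work.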
Once the autoencoder is trained, you can train the latent diffusion model using `latent_diffusion.py`:
- Set the autoencoder checkpoint path: Update `AUTOENCODER_CKPT_PATH` in `latent_diffusion.py` to point to the saved autoencoder checkpoint.
- Run the training script:

  ```
  python -m torch.distributed.run --nproc_per_node=NUM_GPUS latent_diffusion.py
  ```

During training, `latent_diffusion.py` will periodically generate and save sample images in the current directory. You can monitor these to track the progress of training.
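To make the moving parts concrete, here is a hedged sketch of one latent-diffusion training step built from `diffusers` components. The real loop in `latent_diffusion.py` additionally loads the frozen autoencoder from `AUTOENCODER_CKPT_PATH` and encodes each image batch into the `latents` tensor (random tensors stand in for it here), and its model and scheduler settings may differ:

```python
# Hedged sketch of one latent-diffusion training step; the real loop in
# latent_diffusion.py encodes images with the pretrained autoencoder loaded
# from AUTOENCODER_CKPT_PATH, and its model/scheduler choices are its own.
import torch
import torch.nn.functional as F
from diffusers import DDPMScheduler, UNet2DModel

unet = UNet2DModel(sample_size=8, in_channels=4, out_channels=4,
                   block_out_channels=(64, 128, 256, 256))      # denoiser over 4x8x8 latents
scheduler = DDPMScheduler(num_train_timesteps=1000)

latents = torch.randn(8, 4, 8, 8)                               # would come from the autoencoder
noise = torch.randn_like(latents)
t = torch.randint(0, scheduler.config.num_train_timesteps, (latents.shape[0],))
noisy_latents = scheduler.add_noise(latents, noise, t)          # forward (noising) process
pred = unet(noisy_latents, t).sample                            # predict the added noise
loss = F.mse_loss(pred, noise)                                  # standard epsilon-prediction loss
loss.backward()
```

Sampling runs this in reverse: start from random latents, denoise them step by step with the scheduler, then decode with the pretrained autoencoder to obtain the sample images that are saved for monitoring.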
- `autoencoder.py`: Trains the autoencoder.
- `latent_diffusion.py`: Trains the latent diffusion model.
- `util.py`: Contains utility functions for distributed training and seeding.
This is a minimal implementation of a latent diffusion model; the adversarial loss is skipped during autoencoder pretraining. Hyperparameters are not well tuned, and the model has not been trained for long.
