This project aims to create a custom implementation of the stable diffusion technique for generating high-quality images from text prompts. Stable Diffusion is a generative model that iteratively refines noise into a coherent image based on the given text input.
- ✅ VAE (Variational Autoencoder)
- ✅ CLIP (Contrastive Language–Image Pretraining)
- ✅ Attention Mechanism
- ✅ Time Embedding
- ✅ Attention UNet is in progress
- ❌ Denoising Algorithm
- ❌ Full Pipeline