Audio2Art is a Streamlit-based AI application that converts spoken audio commands into artistic images using Speech Recognition and Stable Diffusion.
Users upload a voice recording, which is transcribed into text and then transformed into AI-generated artwork.

Features:
- 🎧 Upload audio (WAV format)
- 📝 Convert speech to text using Google Speech Recognition
- 🎨 Generate images from text using Stable Diffusion
- 🖼️ Display generated artwork instantly
- 🌈 Custom background and styled UI
- ⚡ Cached model loading for better performance

Tech stack:
- Python 3.9+
- Streamlit – Web UI
- SpeechRecognition – Audio to text
- PyTorch – Deep learning backend
- Diffusers – Stable Diffusion image generation
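- Transformers – Text encoder and tokenizer used by the Stable Diffusion pipeline
- SoundFile – Reading and writing audio files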
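
The cached model loading listed among the features could look roughly like the sketch below. It is not taken from `app.py`: the checkpoint name `CompVis/stable-diffusion-v1-4` and the function names are assumptions, and the standard-library `functools.lru_cache` stands in for the caching decorator. In a Streamlit app, `@st.cache_resource` is the usual way to get the same effect across reruns.

```python
from functools import lru_cache
from typing import Any, Callable

# Assumed checkpoint; the README does not say which model app.py loads.
MODEL_ID = "CompVis/stable-diffusion-v1-4"


def load_stable_diffusion(model_id: str) -> Any:
    """Load the pipeline once; heavy imports are deferred until first use."""
    import torch
    from diffusers import StableDiffusionPipeline

    use_gpu = torch.cuda.is_available()
    pipe = StableDiffusionPipeline.from_pretrained(
        model_id,
        # Half precision only makes sense on the GPU.
        torch_dtype=torch.float16 if use_gpu else torch.float32,
    )
    return pipe.to("cuda" if use_gpu else "cpu")


@lru_cache(maxsize=1)
def get_model(
    model_id: str = MODEL_ID,
    loader: Callable[[str], Any] = load_stable_diffusion,
) -> Any:
    """Return the cached model, calling the loader only on the first request."""
    return loader(model_id)
```

Repeated calls to `get_model()` with the same arguments return the same object, so the multi-gigabyte weights are read from disk only once per process.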

Project structure:
Audio2art/
├── app.py
├── requirements.txt
├── README.md
├── input_audio.wav      # temporary (auto-generated)
└── generated_art.png    # output image

Install the dependencies:
python -m pip install --upgrade pip
pip install -r requirements.txt
If needed, install manually:
pip install streamlit torch torchaudio diffusers transformers soundfile SpeechRecognition
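
To see which packages still need a manual install, a small check like the following may help. It is a hypothetical helper, not part of the project; the mapping pairs each pip name from the command above with the module name it is imported under.

```python
from importlib.util import find_spec

# pip package name -> importable module name
REQUIRED = {
    "streamlit": "streamlit",
    "torch": "torch",
    "torchaudio": "torchaudio",
    "diffusers": "diffusers",
    "transformers": "transformers",
    "soundfile": "soundfile",
    "SpeechRecognition": "speech_recognition",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages. Try: pip install " + " ".join(missing))
    else:
        print("All dependencies found.")
```

The check only confirms that each module can be located; it does not verify versions or GPU support.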

Run the app:
streamlit run app.py
After running, open:
[http://localhost:8501](http://localhost:8501)

Usage:
- Open the web app in a browser
- Upload a WAV audio file
- The app transcribes the speech to text
- The transcript is used as the Stable Diffusion prompt
- The app generates and displays the artwork
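
The steps above can be sketched as a small pipeline. This is an illustration under assumptions, not the code in `app.py`: the function names are hypothetical, `pipe` is assumed to be an already-loaded Diffusers Stable Diffusion pipeline, and the output file name follows the project structure.

```python
from typing import Any, Callable, Tuple


def transcribe_wav(wav_path: str) -> str:
    """Transcribe a WAV file with Google Speech Recognition (needs internet)."""
    import speech_recognition as sr  # deferred so the module loads without it

    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)  # read the whole file
    return recognizer.recognize_google(audio)


def generate_image(prompt: str, pipe: Any) -> Any:
    """Run the pipeline on the prompt and return the first generated image."""
    return pipe(prompt).images[0]


def audio_to_art(
    wav_path: str,
    pipe: Any,
    transcribe: Callable[[str], str] = transcribe_wav,
    out_path: str = "generated_art.png",
) -> Tuple[str, Any]:
    """Transcribe the audio, use the text as the prompt, and save the image."""
    prompt = transcribe(wav_path).strip()
    if not prompt:
        raise ValueError("No speech was recognized in the audio file.")
    image = generate_image(prompt, pipe)
    image.save(out_path)
    return prompt, image
```

Passing `transcribe` as a parameter keeps the speech step swappable, for example to try an offline recognizer or to test the flow without network access.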

Notes:
- Stable Diffusion runs best on a GPU
- On a CPU, image generation may be slow
- An internet connection is required for Google Speech Recognition
- Only WAV audio is supported
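
Because only WAV input is supported, it can be useful to reject other uploads before transcription. The sketch below uses Python's standard `wave` module; `describe_wav` is a hypothetical helper, not something the app is known to contain. Note that `wave` reads uncompressed PCM files, so some valid but compressed WAV variants would also be rejected.

```python
import io
import wave


def describe_wav(data: bytes) -> dict:
    """Return basic properties of WAV bytes, or raise ValueError if unreadable."""
    try:
        with wave.open(io.BytesIO(data), "rb") as wav:
            frames = wav.getnframes()
            rate = wav.getframerate()
            return {
                "channels": wav.getnchannels(),
                "sample_rate": rate,
                "duration_s": frames / rate if rate else 0.0,
            }
    except (wave.Error, EOFError) as exc:
        # wave.Error: bad header or unsupported encoding; EOFError: truncated data
        raise ValueError(f"Not a readable WAV file: {exc}") from exc
```

In a Streamlit app, the bytes could come from the uploaded file object, and a `ValueError` could be shown to the user as an error message instead of continuing to the transcription step.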
This project is suitable for:
- Mini Project
- AI / ML Lab
- Data Science Portfolio
- Hackathons
- Final Year Project Prototype

Future enhancements:
- 🎙️ Live microphone recording
- 🖌️ Style selection (realistic, anime, sketch)
- ☁️ Cloud deployment (Streamlit Cloud)
- 💾 Download generated images
- ⚡ Faster CPU-optimized models