The objective of this project is to develop a deep learning pipeline that converts spoken descriptions into corresponding images in real time. The project is a Speech-to-Image Generator that takes audio input, transcribes it using a Whisper model, analyzes the sentiment of the transcription, and generates an image using Stable Diffusion. The application is built using Python and Streamlit.
- Audio Recording: Record audio through your microphone.
- Transcription: Convert recorded audio to text using the Whisper model.
- Sentiment Analysis: Classify the sentiment of the transcribed text.
- Image Generation: Generate an image based on the transcription if the sentiment is positive or neutral.
The application follows these steps (a minimal end-to-end code sketch follows the list):
- Audio Input:
  - The user sets the recording duration.
  - The application records audio using the `sounddevice` library.
  - The recorded audio is saved as a `.wav` file.
- Transcription (Whisper Model):
  - The saved audio is loaded and preprocessed.
  - The Whisper model transcribes the audio into text.
- Sentiment Analysis:
  - The transcribed text is passed to a sentiment analysis pipeline.
  - The sentiment is classified as Positive, Neutral, or Negative.
- Image Generation (Stable Diffusion):
  - If the sentiment is Positive or Neutral, the transcription is passed to the Stable Diffusion pipeline, which generates a matching image.
  - If the sentiment is Negative, no image is generated and a warning is displayed to the user.
- Display Results:
  - The transcription and sentiment analysis results are displayed.
  - If the sentiment is Positive or Neutral, the generated image is also displayed alongside the results.
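To make the flow concrete, here is a minimal end-to-end sketch of these steps. The model IDs (`openai/whisper-base`, `runwayml/stable-diffusion-v1-5`), the default sentiment model, the 16 kHz sample rate, and the file names are illustrative assumptions, not necessarily what `speech_to_image.py` uses. Note also that the default `sentiment-analysis` pipeline returns only Positive/Negative labels, so a three-class model would be needed to distinguish Neutral as described above.

```python
# Minimal sketch of the record -> transcribe -> classify -> generate flow.
# All model IDs, the sample rate, and file names below are assumptions.
import sounddevice as sd
import torch
from scipy.io.wavfile import write
from transformers import pipeline
from diffusers import StableDiffusionPipeline

SAMPLE_RATE = 16_000   # Whisper models expect 16 kHz mono audio
DURATION = 5           # seconds; chosen by the user in the real app

# 1. Record audio from the microphone and save it as a .wav file
audio = sd.rec(int(DURATION * SAMPLE_RATE), samplerate=SAMPLE_RATE, channels=1)
sd.wait()  # block until the recording finishes
write("recording.wav", SAMPLE_RATE, audio)

device = 0 if torch.cuda.is_available() else -1  # transformers device index

# 2. Transcribe the recording with a Whisper checkpoint
asr = pipeline("automatic-speech-recognition", model="openai/whisper-base", device=device)
text = asr("recording.wav")["text"]

# 3. Classify the sentiment of the transcription
classifier = pipeline("sentiment-analysis", device=device)
sentiment = classifier(text)[0]["label"]  # e.g. "POSITIVE" or "NEGATIVE"

# 4. Generate an image unless the sentiment is negative
if sentiment != "NEGATIVE":
    sd_pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
    sd_pipe = sd_pipe.to("cuda" if torch.cuda.is_available() else "cpu")
    image = sd_pipe(text).images[0]
    image.save("generated.png")
else:
    print("Negative sentiment detected; skipping image generation.")
```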
```mermaid
graph TD
    A[Start: User Opens the App] --> B[Set Recording Duration]
    B --> C[Record Audio Using Microphone]
    C --> D[Save Audio as WAV File]
    D --> E[Transcribe Audio Using Whisper]
    E --> F[Perform Sentiment Analysis]
    F -->|Positive/Neutral| G[Generate Image Using Stable Diffusion]
    F -->|Negative| H[Skip Image Generation]
    G --> I[Display Transcription, Sentiment, and Image]
    H --> J[Display Transcription and Sentiment Only]
    I --> K[End]
    J --> K[End]
```
- Clone the repository:

  ```bash
  git clone https://github.com/AabidMK/Speech-to-Image-Live-Conversion-using-Deep-Learning_Infosys_Internship_Oct2024.git
  cd Speech-to-Image-Live-Conversion-using-Deep-Learning_Infosys_Internship_Oct2024
  ```

- Create a virtual environment (optional but recommended):

  ```bash
  python -m venv venv
  venv\Scripts\activate       # On Windows
  source venv/bin/activate    # On macOS/Linux
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
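For reference, a plausible (unpinned) dependency set for this stack is shown below. This list is an assumption inferred from the libraries mentioned in this README; the pinned `requirements.txt` shipped in the repository is authoritative.

```text
streamlit
sounddevice
scipy
torch
transformers
diffusers
accelerate
```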
- Start the Streamlit application:

  ```bash
  streamlit run speech_to_image.py
  ```
- Interact with the app (a minimal Streamlit sketch of this flow appears below):
  - Set the recording duration using the slider.
  - Click Start Recording 🎙️ to record your audio.
  - View the transcription and sentiment analysis.
  - If the sentiment is positive/neutral, view the generated image.
- The application will not generate an image if the sentiment is classified as negative.
- Ensure your microphone is functioning correctly for the audio recording.
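To illustrate how this interaction maps onto Streamlit widgets, here is a minimal UI sketch. The widget labels and the `run_pipeline` stub are hypothetical; the actual layout in `speech_to_image.py` may differ.

```python
import streamlit as st

def run_pipeline(duration: int):
    # Hypothetical stand-in for the record -> transcribe -> classify ->
    # generate steps from the earlier sketch; returns dummy values here.
    return "a sunny beach at dawn", "POSITIVE", None

st.title("Speech-to-Image Generator")

# Recording duration is chosen with a slider, as described above
duration = st.slider("Recording duration (seconds)", min_value=1, max_value=30, value=5)

if st.button("Start Recording 🎙️"):
    text, sentiment, image = run_pipeline(duration)
    st.write("**Transcription:**", text)
    st.write("**Sentiment:**", sentiment)
    if image is not None:
        st.image(image, caption=text)
    else:
        st.warning("Negative sentiment detected; no image was generated.")
```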
- CUDA Error:
  - If a GPU is not available, the application falls back to the CPU, which can slow down inference considerably.
  - Verify that your PyTorch installation supports CUDA (see the diagnostic sketch after this list).
- Model Loading Issues:
  - Confirm that the paths for the Whisper and Stable Diffusion models are correct.
  - Ensure the models are downloaded and accessible.
- Dependencies:
  - Use the exact versions of dependencies listed in `requirements.txt` to avoid compatibility issues.
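As a quick diagnostic for the CUDA and model-loading issues above, the following sketch checks GPU availability and loads both models with an explicit CPU fallback. The model IDs are the same illustrative ones used earlier, not necessarily the repository's choices.

```python
import torch
from transformers import pipeline
from diffusers import StableDiffusionPipeline

# Confirm whether PyTorch was built with CUDA support and can see a GPU
print("CUDA available:", torch.cuda.is_available())

device = "cuda" if torch.cuda.is_available() else "cpu"

# Loading errors here usually mean a wrong model path/ID or missing weights
asr = pipeline("automatic-speech-recognition", model="openai/whisper-base",
               device=0 if device == "cuda" else -1)
sd_pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to(device)
print("Models loaded on:", device)
```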
Feel free to fork this repository and contribute. Open a pull request with your changes for review.
This project is licensed under the MIT License. See LICENSE for details.