A simple Speech-to-Text (STT) system in Python using the SpeechRecognition library and Google's Web Speech API. It listens to your microphone, sends the audio to Google's cloud recognizer, and prints the transcribed text.
- 🎤 Capture live audio from your microphone.
- ☁️ Cloud-based transcription using Google's Web Speech API.
- ⚡ Real-time transcription with minimal setup.
- 🛠️ Minimal Python dependencies for quick prototyping.
├── stt_speechrecognition.py              # Main script (run this)
├── requirements_speechrecognition.txt    # Python dependencies
└── README.md                             # Documentation (this file)
- Python 3.8+
- A working microphone
- Internet connection (required for the Google Web Speech API)
- System audio backend:
  - macOS: `brew install portaudio`
  - Ubuntu/Debian: `sudo apt-get install portaudio19-dev python3-pyaudio`
  - Windows: download a PyAudio wheel from here and install it with `pip install <wheel-file>`.
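
Before running the script, it can help to confirm that the prerequisites above are in place. The following is a minimal sketch and not part of this repo: `check_environment` is a hypothetical helper, and it assumes the libraries are importable under the module names `speech_recognition` and `pyaudio`.

```python
import importlib.util
import sys

# Hypothetical helper, not part of this repo: reports missing prerequisites.
REQUIRED_MODULES = ("speech_recognition", "pyaudio")  # assumed import names


def check_environment(min_version=(3, 8), modules=REQUIRED_MODULES):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if sys.version_info < min_version:
        found = ".".join(map(str, sys.version_info[:3]))
        wanted = ".".join(map(str, min_version))
        problems.append(f"Python {wanted}+ required, found {found}")
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(name) is None:
            problems.append(f"Module '{name}' is not installed")
    return problems
```

Printing each entry returned by `check_environment()` gives a quick checklist of what still needs installing. It does not verify that a microphone is attached or that the network is reachable.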
- Clone or download this repo, then install dependencies: `pip install -r requirements_speechrecognition.txt`
- Run the script: `python stt_speechrecognition.py`
Steps:
- The program adjusts for background noise.
- Speak into your microphone.
- The recognized text is printed to the terminal.
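
The steps above can be sketched as follows. This is not the contents of `stt_speechrecognition.py`. It is a hedged illustration of how the flow is commonly written with SpeechRecognition's `Recognizer` and `Microphone` classes. The function name `listen_and_transcribe`, its parameter, and the error messages are assumptions made for this example.

```python
def listen_and_transcribe(calibration_seconds=1.0):
    """Capture one phrase from the default microphone and return its text.

    Returns None when the speech is unintelligible or the service
    cannot be reached. Illustrative sketch, not the repo's script.
    """
    # Imported lazily so this module loads even without audio libraries.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        print("Adjusting for ambient noise...")
        # Sample the background level so quiet noise is not taken for speech.
        recognizer.adjust_for_ambient_noise(source, duration=calibration_seconds)
        print("Ready. Speak now!")
        # Blocks until a phrase is heard and followed by a pause.
        audio = recognizer.listen(source)

    try:
        # Sends the recorded audio to Google's Web Speech API.
        text = recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        print("Could not understand the audio.")
        return None
    except sr.RequestError as exc:
        print(f"Could not reach the recognition service: {exc}")
        return None

    print(f"You said: {text}")
    return text
```

Calling `listen_and_transcribe()` once handles a single utterance. Wrapping the call in a loop would give continuous transcription, at the cost of one API request per phrase.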
- Requires an active internet connection (Google Web Speech API).
- The API is free but usage-limited, so it is not suitable for very large-scale transcription.
- For offline transcription, consider using OpenAI Whisper.
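
One way to combine these notes is to try the cloud recognizer first and fall back to an offline engine when the request fails. The sketch below is library-agnostic: `transcribe_with_fallback` is a hypothetical helper, and the recognizers are passed in as plain callables. Pairing it with `recognize_whisper`, as shown in the comment, assumes a SpeechRecognition version that provides that method and a local Whisper installation.

```python
def transcribe_with_fallback(audio, online, offline, network_errors):
    """Try the online recognizer; use the offline one if the request fails.

    online, offline: callables that take the audio and return text.
    network_errors: exception class (or tuple of classes) that signals
    the online service could not be reached.
    """
    try:
        return online(audio)
    except network_errors:
        # Only connectivity or quota failures trigger the fallback;
        # other exceptions propagate to the caller unchanged.
        return offline(audio)


# Possible wiring with SpeechRecognition (assumption, not tested here):
#   import speech_recognition as sr
#   r = sr.Recognizer()
#   text = transcribe_with_fallback(
#       audio, r.recognize_google, r.recognize_whisper, sr.RequestError
#   )
```

Passing the exception type explicitly keeps the helper independent of any one library, so a different offline engine can be swapped in without changing it.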
🎙️ Adjusting for ambient noise...
✅ Ready. Speak now!
You said: Hello world, this is my Jarvis project!
Enhance your project by adding:
- Wake word detection (e.g., "Jarvis").
- Text-to-Speech (TTS) for voice responses.
- Custom commands (open apps, fetch info, control IoT devices).
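
As a starting point for the first and last ideas, the sketch below looks for a wake word in the transcribed text and dispatches what follows to a command handler. It works on the transcript only, so it is a simple stand-in for true acoustic wake word detection. All names here (`WAKE_WORD`, `extract_command`, `handle`) are hypothetical and not part of this repo.

```python
WAKE_WORD = "jarvis"  # illustrative wake word


def extract_command(transcript, wake_word=WAKE_WORD):
    """Return the text following the wake word, or None if it is absent."""
    # Lowercase and drop commas so "Jarvis," still matches the wake word.
    words = transcript.lower().replace(",", " ").split()
    if wake_word not in words:
        return None
    index = words.index(wake_word)
    return " ".join(words[index + 1:])


def handle(transcript, commands):
    """Run the first command whose trigger phrase starts the spoken command.

    commands maps a trigger phrase to a callable that receives the rest
    of the command text and returns a response string.
    """
    command = extract_command(transcript)
    if command is None:
        return None  # wake word not heard; ignore the utterance
    for trigger, action in commands.items():
        if command.startswith(trigger):
            return action(command[len(trigger):].strip())
    return "Sorry, I don't know that command."
```

For example, with `commands = {"open": lambda arg: f"opening {arg}"}`, the transcript "Jarvis open notepad" is routed to the `open` handler with the argument `notepad`, while a transcript without the wake word is ignored.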
This project is licensed under the MIT License: free to use, modify, and distribute.