A multimodal AI voice assistant with speech recognition, text-to-speech, computer vision, and hardware control capabilities.
Creator/Author: Mohammad Faiz
Repository: https://github.com/Mohammad-Faiz-Cloud-Engineer/Voice-Assistant-Multimodal
- Speech Recognition: Whisper-based voice command recognition
- Text-to-Speech: Coqui TTS with emotion support and voice cloning
- Computer Vision: Image analysis, object detection, face detection
- Camera Control: Real-time camera preview and image capture
- Screen Capture: Screenshot functionality
- Video Recording: Short video recording capability
- Hardware Control: Arduino servo motor control
- LM Studio Integration: Local LLM for conversational AI
- Python 3.14+
- CUDA-capable GPU (optional, for faster inference)
- Webcam (for camera features)
- Arduino (optional, for servo control)
- LM Studio running locally on port 1234
-
Clone the repository
-
Create a virtual environment:
python -m venv venv source venv/bin/activate # On Windows: venv\Scripts\activate
-
Install dependencies:
pip install -r requirements.txt
-
Copy
.env.exampleto.envand configure:cp .env.example .env
-
Edit
.envwith your configuration- Set
ARDUINO_PORTonly if you want servo control enabled - Keep
LM_STUDIO_BASE_URLon a trusted local or private endpoint - Adjust
OUTPUT_DIRandTEMP_DIRif you need non-default storage paths
- Set
python3 voice_assistant_multimodal.py- Camera: "Turn camera on/off", "Take picture", "Record video"
- Screen: "Take screenshot", "Capture screen"
- Vision: "Describe image", "Analyze screenshot"
- Detection: "Find car", "Detect faces"
- Servo: "Turn left/right", "Look up/down", "Center"
- Exit: "Stop listening"
All configuration is managed through environment variables in .env:
LM_STUDIO_BASE_URL: LM Studio API endpointARDUINO_PORT: Optional serial port for Arduino (e.g., COM3, /dev/ttyUSB0)CAMERA_INDEX: Camera device index (usually 0)WHISPER_MODEL: Whisper model size (tiny, base, small, medium, large)OUTPUT_DIR/TEMP_DIR: Writable directories for generated media files
- Never commit
.envfile to version control - Use HTTPS for production API endpoints
- Keep LM Studio bound to trusted interfaces only
- Run with minimal required permissions
See LICENSE file for details.
Contributions welcome! Please ensure code passes all security and quality checks.
Made by Mohammad Faiz