A kawaii-style AI assistant web application featuring voice interaction, real-time speech processing, and memory capabilities.
Features:
- 🎨 Kawaii-style UI with responsive design
- 🎤 Real-time voice interaction with silence detection
- 🌙 Light/Dark mode support
- 💫 Smooth animations and transitions
- 🔊 High-quality Text-to-Speech using OpenAI's TTS API
- 🎯 Accurate speech recognition with WhisperX
- 🎙️ Browser-based Voice Activity Detection
- 🧠 Memory capabilities through Memobase integration
- 🚀 LLM integration with Ollama
Coming Soon:
- STT optimization on macOS using lightning-whisper-mlx
Prerequisites:
- Python 3.11+
- Node.js
- Docker and Docker Compose (for Memobase)
- FFmpeg (for audio processing)
- Ollama installed and running
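Before continuing, a quick way to confirm everything is installed (the backend setup below also relies on the `uv` package manager):

```bash
# Verify each prerequisite is installed and on your PATH
python3 --version
node --version
ffmpeg -version
docker-compose --version
ollama list        # also confirms the Ollama server is responding
uv --version
```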
Backend Setup:
- Create and activate the Python virtual environment:
```bash
uv sync
```
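`uv sync` creates a `.venv` in the project directory and installs the locked dependencies. To activate the environment in your shell afterwards (assuming a POSIX shell, run from the `backend` directory):

```bash
source .venv/bin/activate
```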
- Set up environment variables:
```bash
# Create .env file in backend directory
cp .env.example .env
# Edit .env with your API keys and configurations
```
- Set up Memobase:
```bash
# Navigate to memobase directory
cd memobase/src/server
# Start Memobase services
docker-compose up -d
```
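To check that the Memobase services came up cleanly:

```bash
# List the compose services and their status
docker-compose ps
# Tail the logs if something looks wrong
docker-compose logs -f
```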
- Configure Memobase by editing `memobase/src/server/api/config.yaml`:
```yaml
llm_api_key: ollama
llm_base_url: http://host.docker.internal:11434/v1
best_llm_model: phi4 # Must match your Ollama model
```
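Since `best_llm_model` must match a model available in your local Ollama, it may help to pull and verify it first (using `phi4` from the example config):

```bash
ollama pull phi4   # download the model if you don't have it yet
ollama list        # confirm it appears among your local models
```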
- Start the backend server:
```bash
uvicorn main:app --reload
```
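Assuming uvicorn's default bind of `http://127.0.0.1:8000` (matching `NEXT_PUBLIC_API_URL` below), you can verify the server is up via FastAPI's auto-generated docs page:

```bash
# FastAPI serves interactive API docs at /docs by default; expect a 200
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8000/docs
```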
Frontend Setup:
- Install Node.js dependencies:
```bash
cd frontend
npm install
```
- Set up environment variables:
```bash
# Create .env.local file in frontend directory
cp .env.example .env.local
# Edit .env.local with your configurations
```
- Start the development server:
```bash
npm run dev
```
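The Next.js dev server listens on `http://localhost:3000` by default; with both servers running, open it in a browser:

```bash
# macOS; on Linux use xdg-open instead
open http://localhost:3000
```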
Tech Stack:

Frontend:
- Next.js
- TailwindCSS
- Pixi Live2D Display
- WebRTC

Backend:
- Python 3.12
- FastAPI
- pyannote.audio
- WhisperX

LLM:
- Ollama
- OpenAI (Coming Soon)

Memory:
- Memobase

TTS:
- OpenAI `tts-1` (the standard non-HD model; the cheapest option, but still good quality)
Project Structure:
```
project-root/
├── backend/
│   ├── main.py                  # FastAPI application
│   ├── services/
│   │   ├── vad.py               # Voice Activity Detection
│   │   ├── stt.py               # Speech-to-Text
│   │   ├── tts.py               # Text-to-Speech
│   │   └── llm.py               # Language Model
│   └── requirements.txt
├── frontend/
│   ├── components/
│   │   ├── Live2DModel.js
│   │   └── AudioTranscriber.js
│   ├── pages/
│   ├── public/
│   │   └── live2d/              # Live2D model assets
│   └── styles/
└── docs/
    ├── CHANGELOG.md
    ├── CONTRIBUTING.md
    ├── CODE_OF_CONDUCT.md
    ├── SECURITY.md
    └── LICENSE.md
```
Environment Variables:

Backend (.env):
```env
HF_TOKEN=your_huggingface_token
OPENAI_API_KEY=your_openai_key
INFERENCE_SERVER=ollama
OLLAMA_MODEL=your_model_name
USER_NAME=your_username
STREAM=True
STT_LANGUAGE=it # Language for speech recognition
```

Frontend (.env.local):
```env
NEXT_PUBLIC_API_URL=http://localhost:8000
```
Required Memobase configuration files:
- memobase/src/server/api/.env
- memobase/src/server/api/config.yaml
- memobase/src/server/.env
API Endpoints:
- `POST /api/transcribe` - Speech-to-text conversion
- `POST /api/chat` - Chat with memory-enabled LLM
- `POST /api/tts` - Text-to-speech conversion
- `POST /api/detect-voice` - Voice activity detection
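As a quick smoke test, here are hypothetical `curl` calls against two of the endpoints; the payload field names are assumptions, so check the actual request schemas on the FastAPI `/docs` page:

```bash
# Chat endpoint (JSON body; the "message" field is an assumption)
curl -X POST http://localhost:8000/api/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello!"}'

# Transcription endpoint (multipart upload; the "file" field is an assumption)
curl -X POST http://localhost:8000/api/transcribe \
  -F "file=@recording.wav"
```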
See CONTRIBUTING.md for detailed contribution guidelines.
Quick start:
- Fork the repository
- Create a new branch (`git checkout -b feature/amazing-feature`)
- Make your changes
- Commit your changes (`git commit -m 'Add some amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
Please read our Code of Conduct before contributing.
This project is licensed under the MIT License - see LICENSE.md for details.
See SECURITY.md for reporting security vulnerabilities.
For immediate security concerns, please contact [email protected].
See CHANGELOG.md for a list of changes and versions.