
Samanta AI Assistant 🌸

A kawaii-style AI assistant web application featuring voice interaction, real-time speech processing, and memory capabilities.

✨ Features

Frontend

  • 🎨 Kawaii-style UI with responsive design
  • 🎤 Real-time voice interaction with silence detection
  • 🌓 Light/Dark mode support
  • 💫 Smooth animations and transitions
  • 🔊 Text-to-Speech using OpenAI's TTS API

Backend

  • 🎯 Accurate speech recognition with WhisperX
  • 🎤 Browser-based Voice Activity Detection
  • 🧠 Memory capabilities through Memobase integration
  • 🗣️ High-quality Text-to-Speech with OpenAI
  • 💭 LLM integration with Ollama

Coming Soon:

🚀 Quick Start

Prerequisites

  • Python 3.11+
  • Node.js
  • Docker and Docker Compose (for Memobase)
  • FFmpeg (for audio processing)
  • Ollama installed and running

Backend Setup

  1. Create the Python virtual environment and install dependencies (requires uv):
uv sync
  2. Set up environment variables:
# Create .env file in backend directory
cp .env.example .env
# Edit .env with your API keys and configurations
  3. Set up Memobase:
# Navigate to memobase directory
cd memobase/src/server

# Start Memobase services
docker-compose up -d
  4. Configure Memobase: edit memobase/src/server/api/config.yaml:
llm_api_key: ollama
llm_base_url: http://host.docker.internal:11434/v1
best_llm_model: phi4  # Must match your Ollama model
  5. Start the backend server:
uvicorn main:app --reload
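Once the server is up, you can sanity-check it from a second terminal. FastAPI serves interactive API docs at /docs by default, and uvicorn listens on port 8000 unless told otherwise:

# Confirm the backend is responding
curl http://localhost:8000/docs

# Make sure the model referenced in config.yaml is actually available locally
ollama pull phi4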

Frontend Setup

  1. Install Node.js dependencies:
cd frontend
npm install
  2. Set up environment variables:
# Create .env.local file in frontend directory
cp .env.example .env.local
# Edit .env.local with your configurations
  3. Start the development server:
npm run dev
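The app should now be reachable at http://localhost:3000 (Next.js's default dev port). Make sure NEXT_PUBLIC_API_URL in .env.local points at the running backend (see Frontend Configuration below).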

πŸ› οΈ Tech Stack

Frontend

  • Next.js
  • TailwindCSS
  • Pixi Live2D Display
  • WebRTC

Backend

  • Python 3.12
  • FastAPI
  • pyannote.audio
  • WhisperX
  • Ollama
  • OpenAI (Coming Soon)

LLM Inference Server

  • Ollama
  • OpenAI (Coming Soon)

LLM Memory

  • Memobase

TTS

  • OpenAI (the standard, non-HD model: the cheapest tier, but still good quality)
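For reference, a request against the non-HD model (tts-1) via the OpenAI API looks like the sketch below; the voice and input text are illustrative, not the app's actual values:

curl https://api.openai.com/v1/audio/speech \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "tts-1", "voice": "nova", "input": "Ciao! Sono Samanta."}' \
  --output samanta.mp3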

πŸ“ Project Structure

project-root/
├── backend/
│   ├── main.py              # FastAPI application
│   ├── services/
│   │   ├── vad.py           # Voice Activity Detection
│   │   ├── stt.py           # Speech-to-Text
│   │   ├── tts.py           # Text-to-Speech
│   │   └── llm.py           # Language Model
│   └── requirements.txt
│
├── frontend/
│   ├── components/
│   │   ├── Live2DModel.js
│   │   └── AudioTranscriber.js
│   ├── pages/
│   ├── public/
│   │   └── live2d/          # Live2D model assets
│   └── styles/
│
└── docs/
    ├── CHANGELOG.md
    ├── CONTRIBUTING.md
    ├── CODE_OF_CONDUCT.md
    ├── SECURITY.md
    └── LICENSE.md

βš™οΈ Configuration

Backend Configuration

HF_TOKEN=your_huggingface_token
OPENAI_API_KEY=your_openai_key
INFERENCE_SERVER=ollama
OLLAMA_MODEL=your_model_name
USER_NAME=your_username
STREAM=True
STT_LANGUAGE=it  # Language for speech recognition
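A note on two of these: HF_TOKEN is presumably needed to download the gated pyannote.audio models that WhisperX relies on (they are hosted on Hugging Face behind an access agreement), and STT_LANGUAGE is the language code passed to the recognizer, so set it to en, it, etc. to match your speech.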

Frontend Configuration

NEXT_PUBLIC_API_URL=http://localhost:8000

Memobase Configuration

Required files:

  • memobase/src/server/api/.env
  • memobase/src/server/api/config.yaml
  • memobase/src/server/.env

API Endpoints

  • POST /api/transcribe - Speech-to-text conversion
  • POST /api/chat - Chat with memory-enabled LLM
  • POST /api/tts - Text-to-speech conversion
  • POST /api/detect-voice - Voice activity detection
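Example calls with curl (the payload field names below are assumptions; check the auto-generated FastAPI docs at http://localhost:8000/docs for the exact request schemas):

# Speech-to-text: upload an audio clip as multipart form data
curl -X POST http://localhost:8000/api/transcribe -F "file=@recording.wav"

# Chat with the memory-enabled LLM
curl -X POST http://localhost:8000/api/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Ciao Samanta!"}'

# Text-to-speech: synthesize a reply and save the audio
curl -X POST http://localhost:8000/api/tts \
  -H "Content-Type: application/json" \
  -d '{"text": "Ciao!"}' \
  --output reply.mp3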

🤝 Contributing

See CONTRIBUTING.md for detailed contribution guidelines.

Quick start:

  1. Fork the repository
  2. Create a new branch (git checkout -b feature/amazing-feature)
  3. Make your changes
  4. Commit your changes (git commit -m 'Add some amazing feature')
  5. Push to the branch (git push origin feature/amazing-feature)
  6. Open a Pull Request

Please read our Code of Conduct before contributing.

📜 License

This project is licensed under the MIT License - see LICENSE.md for details.

🔒 Security

See SECURITY.md for reporting security vulnerabilities.

For immediate security concerns, please contact [email protected].

πŸ™ Acknowledgments

πŸ“ Changelog

See CHANGELOG.md for a list of changes and versions.
