Full-featured voice AI agent: record your voice and get spoken responses. It has memory, can manage cronjobs, send messages, and more.
git clone https://github.com/zeroknowledge0x/zka-voice.git
cd zka-voice/server
chmod +x setup.sh && ./setup.sh
# API keys (required)
nano .env
# Agent config (optional)
cp hermes_config.json.example hermes_config.json
nano hermes_config.json
python3 audio-server.py
Open http://localhost:8082 in your browser. Done!
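Before starting the server, it can help to confirm that `.env` actually contains the required key. The sketch below is a minimal check, assuming the file uses plain `KEY=value` lines. The variable name `GROQ_API_KEY` is an assumption for illustration and may differ from what `.env.example` defines.

```python
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: dict[str, str], required: list[str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not env.get(key)]


# Hypothetical key name; check .env.example for the real one.
REQUIRED = ["GROQ_API_KEY"]

if __name__ == "__main__":
    env_path = Path(".env")
    env = parse_env(env_path.read_text()) if env_path.exists() else {}
    missing = missing_keys(env, REQUIRED)
    print("Missing keys:", ", ".join(missing) if missing else "none")
```

Run it from `zka-voice/server` after editing `.env`; an empty or absent value is reported as missing.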
Host the web UI for free. No server is needed for the frontend!
- Go to vercel.com and sign up with GitHub
- Click "New Project"
- Import `zeroknowledge0x/zka-voice`
- Set Root Directory to `web-client`
- Click Deploy
- Get your URL (e.g., `https://zka-voice.vercel.app`)
- Go to pages.cloudflare.com
- Click "Create a project"
- Connect GitHub repo `zeroknowledge0x/zka-voice`
- Set Build output directory to `web-client`
- Click Deploy
- Get your URL (e.g., `https://zka-voice.pages.dev`)
- Go to repo Settings → Pages
- Source: Deploy from a branch
- Branch: `main`, folder: `/web-client`
- Click Save
- Get your URL (e.g., `https://zeroknowledge0x.github.io/zka-voice`)
If you have a Hermes Agent, just say:
"Install ZKA Voice from https://github.com/zeroknowledge0x/zka-voice"
The agent will automatically clone the repo, run setup, and start the server. No manual commands needed!
File: `SKILL.md`, the skill loaded by other Hermes agents.
# Install cloudflared
curl -L https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64 -o /usr/local/bin/cloudflared
chmod +x /usr/local/bin/cloudflared
# Start tunnel
cloudflared tunnel --url http://localhost:8082
Get the URL (e.g., `https://xxx.trycloudflare.com`) and open it from your phone.
| Service | Purpose | Sign Up |
|---|---|---|
| Groq | STT (Whisper) | console.groq.com |
| MiMo | LLM | mimo.xiaomi.com |
| Edge TTS | Text-to-Speech | No sign up needed |
Groq is required for STT. MiMo is optional; if it is not set, the LLM falls back to Groq.
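The fallback rule above can be sketched as a small selection function. This is an illustration only, not the server's actual code: the environment variable names `GROQ_API_KEY` and `MIMO_API_KEY` and the function `pick_llm_provider` are hypothetical.

```python
def pick_llm_provider(env: dict[str, str]) -> str:
    """Prefer MiMo when its key is set, otherwise fall back to Groq."""
    # Hypothetical variable names; the real ones live in .env.example.
    if not env.get("GROQ_API_KEY"):
        raise RuntimeError("A Groq key is required for speech-to-text")
    if env.get("MIMO_API_KEY"):
        return "mimo"
    return "groq"


if __name__ == "__main__":
    # Only the Groq key is set, so Groq also serves as the LLM.
    print(pick_llm_provider({"GROQ_API_KEY": "example"}))  # prints: groq
    # Both keys are set, so MiMo is used as the LLM.
    print(pick_llm_provider({"GROQ_API_KEY": "example", "MIMO_API_KEY": "example"}))  # prints: mimo
```

Failing fast when the Groq key is missing mirrors the "required" column in the table: without STT, nothing downstream can run.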
See the `examples/` folder for config samples:
- `basic-english.json`: Simple, English language
- `basic-indonesia.json`: Simple, Indonesian language
- `full-telegram.json`: Full features + Telegram sync
- `developer.json`: Developer-focused, terminal tools
Usage:
cp examples/basic-english.json server/hermes_config.json
# Edit as needed
nano server/hermes_config.json
- Hold-to-Talk: record voice, get audio response
- Memory: remembers conversation context
- Dual Mode: Chat (natural) & Command (12 tools)
- Cronjob Management: manage cronjobs by voice
- Telegram Sync: auto-send to topic
- Password Protection: token-based auth
- Multi-STT: Groq Whisper + MiMo fallback
- Indonesian TTS: Edge TTS ArdiNeural
- Mobile-Friendly: optimized for iPhone/Android
| Tool | Description |
|---|---|
| `cronjob_list` | List all cronjobs |
| `cronjob_run` | Run a cronjob |
| `cronjob_pause` | Pause a cronjob |
| `cronjob_resume` | Resume a cronjob |
| `memory_read` | Read memory |
| `memory_write` | Write to memory |
| `skills_list` | List skills |
| `skills_search` | Search skills |
| `telegram_send` | Send Telegram message |
| `terminal_run` | Run command |
| `server_status` | Server status |
| `github_repos` | List repos |
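One common way to wire a tool table like this to a command mode is a name-to-handler registry. The sketch below shows that pattern under stated assumptions: the handler bodies are stand-ins, and `TOOLS`, `tool`, and `dispatch` are hypothetical names, not the project's actual API.

```python
from typing import Callable

# Registry mapping tool names to handler functions (hypothetical structure).
TOOLS: dict[str, Callable[[str], str]] = {}

# In-memory stand-in for whatever storage the real memory tools use.
MEMORY: list[str] = []


def tool(name: str):
    """Decorator that registers a handler under the given tool name."""
    def register(func: Callable[[str], str]) -> Callable[[str], str]:
        TOOLS[name] = func
        return func
    return register


@tool("memory_write")
def memory_write(arg: str) -> str:
    MEMORY.append(arg)
    return f"stored {len(MEMORY)} note(s)"


@tool("memory_read")
def memory_read(arg: str) -> str:
    return "; ".join(MEMORY) if MEMORY else "memory is empty"


def dispatch(name: str, arg: str = "") -> str:
    """Look up a tool by name and run it, reporting unknown names."""
    handler = TOOLS.get(name)
    if handler is None:
        return f"unknown tool: {name}"
    return handler(arg)


if __name__ == "__main__":
    print(dispatch("memory_write", "buy milk"))
    print(dispatch("memory_read"))
    print(dispatch("does_not_exist"))
```

A registry keeps the set of callable tools explicit, which matters for entries such as `terminal_run` where only known, named actions should be reachable from voice input.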
┌──────────────────┐     ┌──────────────────┐     ┌──────────────┐
│   Web Client     │────▶│   HTTP Server    │────▶│   Groq STT   │
│   (Vercel/CF)    │◀────│   (port 8082)    │◀────│   (Whisper)  │
└──────────────────┘     └────────┬─────────┘     └──────────────┘
                                  │
                         ┌────────▼─────────┐     ┌──────────────┐
                         │   MiMo LLM       │────▶│   Edge TTS   │
                         │   (or Groq)      │◀────│ (ArdiNeural) │
                         └──────────────────┘     └──────────────┘
Pipeline: Audio → ffmpeg → Groq STT → LLM → Edge TTS → Audio
Latency: ~3-5 seconds total
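To see where the latency goes, the pipeline can be modeled as a chain of stages with per-stage timing. The sketch below uses stub stages in place of the real Groq, LLM, and Edge TTS calls. The function names are hypothetical, and the 16 kHz mono WAV target in the ffmpeg command is an assumption (a common input format for Whisper), not something the project documents.

```python
import time
from typing import Any, Callable


def ffmpeg_to_wav_cmd(src: str, dst: str) -> list[str]:
    """Build an ffmpeg command converting any input audio to 16 kHz mono WAV."""
    # -y overwrites dst, -ar sets the sample rate, -ac sets the channel count.
    return ["ffmpeg", "-y", "-i", src, "-ar", "16000", "-ac", "1", dst]


def run_pipeline(
    data: Any, stages: list[tuple[str, Callable[[Any], Any]]]
) -> tuple[Any, dict[str, float]]:
    """Feed data through each stage in order and record seconds per stage."""
    timings: dict[str, float] = {}
    for name, stage in stages:
        start = time.perf_counter()
        data = stage(data)
        timings[name] = time.perf_counter() - start
    return data, timings


if __name__ == "__main__":
    # Stub stages standing in for the real network calls.
    stages = [
        ("stt", lambda audio: "what time is it"),
        ("llm", lambda text: f"You asked: {text}"),
        ("tts", lambda text: text.encode("utf-8")),
    ]
    reply_audio, timings = run_pipeline(b"raw-audio-bytes", stages)
    for name, seconds in timings.items():
        print(f"{name}: {seconds:.4f}s")
    print(" ".join(ffmpeg_to_wav_cmd("input.webm", "output.wav")))
```

Swapping the stubs for real calls and logging `timings` shows which stage dominates the total.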
zka-voice/
├── web-client/                    # Static web UI
│   ├── index.html                 # Main UI (hold-to-talk, dual mode)
│   └── config.js                  # Server URL config
├── server/                        # Python backend
│   ├── audio-server.py            # Main server
│   ├── hermes_context.py          # Context builder (generic)
│   ├── setup.sh                   # One-click setup
│   ├── requirements.txt           # Dependencies
│   ├── .env.example               # API key template
│   └── hermes_config.json.example # Agent config template
├── examples/                      # Config examples
│   ├── basic-english.json
│   ├── basic-indonesia.json
│   ├── full-telegram.json
│   └── developer.json
├── SKILL.md                       # Auto-install skill for Hermes
├── vercel.json                    # Vercel deployment config
├── wrangler.toml                  # Cloudflare Pages config
├── .gitignore
└── README.md
- Python 3.10+
- ffmpeg
- Groq API key (free)
- VPS or laptop for server
- Password protection (24h TTL)
- API keys not committed to repo
- Cloudflare Tunnel = automatic HTTPS
MIT: free to use, modify, and share.
- Groq: STT (Whisper)
- Xiaomi MiMo: LLM
- Edge TTS: Text-to-Speech
- Cloudflare: Tunnel & Pages
- Vercel: Web Hosting
- Issues: GitHub Issues
- Discussions: GitHub Discussions
Made with ❤️ by ZKA Labs