Open Query AI is a lightweight, open-source take on Cluely from before the hype. It runs as a local, real-time speech-to-LLM assistant powered entirely by open-source models. Speak to your computer and get instant answers back from an offline LLM: no accounts, no paid APIs.
Built with:
- faster-whisper – GPU-accelerated, low-latency speech-to-text
- Ollama – run local LLMs (Gemma 3 and others) on your machine
- Real-time microphone capture – live transcription (partials + final commit)
- In-context hotkey toggle: decide when user input should be sent to the LLM
- Open-source models only β no closed API keys, no rate limits
- Lightweight β runs entirely on your local machine
- Easy to extend (custom prompts, text-to-speech, conversation memory, etc.)
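The in-context hotkey toggle above can be sketched as a small gate that decides whether finalized transcripts are forwarded to the LLM. This is an illustrative sketch, not the actual `prompter.py` implementation; all names are hypothetical:

```python
class InContextGate:
    """Gate that forwards final transcripts to the LLM only while toggled on."""

    def __init__(self):
        self.in_context = False  # start muted: nothing is sent to the LLM

    def toggle(self):
        """Bound to a global hotkey in the real app; here just flips the flag."""
        self.in_context = not self.in_context
        return self.in_context

    def route(self, final_transcript, send_to_llm):
        """Forward the transcript only when the gate is open."""
        if self.in_context:
            return send_to_llm(final_transcript)
        return None  # dropped: user has not opted in


# Usage: pretend the LLM call just echoes the prompt.
gate = InContextGate()
assert gate.route("ignore me", lambda p: p) is None           # gate closed
gate.toggle()
assert gate.route("answer me", lambda p: p) == "answer me"    # gate open
```

The point of the gate is privacy: transcription can run continuously, but nothing leaves the speech-to-text stage unless the user explicitly opts in.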
```bash
uv sync              # install all dependencies
uv run prompter.py   # start the live transcriber + LLM
```

The following software needs to be installed on your local machine before running.
We highly recommend the following:
- uv – a blazing-fast Python environment manager (written in Rust)
- Ollama – a streamlined, open-source platform for running and managing LLMs on your local machine. It simplifies downloading, setting up, and interacting with open-source models
ℹ️ Make sure Ollama is running in the background for LLM-based workflows.
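Once Ollama is running, it serves a REST API on its default port, `11434`. As a sketch (the exact model name and helper functions are illustrative, not from this project), a single-turn request to `/api/chat` using only the standard library looks like:

```python
import json
from urllib import request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local endpoint


def build_chat_payload(model, prompt):
    """Build the JSON body for a single-turn /api/chat request."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # one complete response instead of streamed chunks
    }


def ask_ollama(model, prompt):
    """POST the prompt to a locally running Ollama server (requires `ollama serve`)."""
    body = json.dumps(build_chat_payload(model, prompt)).encode()
    req = request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]
```

With `stream` set to `False`, the server returns one JSON object whose `message.content` field holds the full reply; streaming mode instead emits one JSON chunk per line.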
To enable GPU-accelerated transcription with faster-whisper:
- NVIDIA GPU with sufficient VRAM for your chosen model
- NVIDIA GPU driver (version depends on your CUDA setup)
- CUDA Toolkit (typically version 11+)
- cuDNN (sometimes bundled with CUDA)
To ensure PyTorch is installed with CUDA support:
```bash
uv pip install torch --index-url https://download.pytorch.org/whl/cu128 && uv sync
```

This project used to be a Next.js app that connected to OpenAI's GPT API and offered users access to ChatGPT without having to register. However, since OpenAI now expires its free API keys, the only way to keep using that API is to sign up for the paid tier.
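After installing, you can sanity-check the GPU setup. A minimal sketch (the function name is illustrative) that reports CUDA status without hard-failing when PyTorch is absent:

```python
def cuda_status():
    """Report whether a CUDA-capable PyTorch build is usable."""
    try:
        import torch  # only present after the install step above
    except ImportError:
        return "PyTorch is not installed"
    if torch.cuda.is_available():
        # torch.cuda.get_device_name(0) reports the first visible GPU
        return f"CUDA ready: {torch.cuda.get_device_name(0)}"
    return "PyTorch installed, but CUDA is not available (CPU fallback)"


print(cuda_status())
```

If this reports a CPU fallback, the usual culprits are a CPU-only torch wheel (check the `--index-url` above) or a driver/CUDA version mismatch.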
- You control your data (runs entirely offline if desired)
- No rate limits / API key expirations
- Full extensibility for custom workflows
- True to the spirit of OpenAI's founding principles
```
🎤 Microphone
   ↓ (sounddevice)
[Audio Queue]
   ↓ (Silero VAD – voice activity detection)
Partial transcript every X seconds
   ↓ (Whisper transcription)
Final transcript when speech ends
   ↓ if in_context = True
Send to Ollama (Gemma 3 or other LLM)
   ↓
LLM Response → log / console / next step
```
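The flow above can be simulated end to end with stubbed stages. This is a pure-Python sketch in which VAD, transcription, and the LLM are placeholders; all names are illustrative:

```python
import queue


def run_pipeline(audio_chunks, in_context, vad, transcribe, ask_llm):
    """Drain the audio queue, commit a final transcript when speech ends,
    and forward it to the LLM only when in_context is True."""
    audio_q = queue.Queue()
    for chunk in audio_chunks:
        audio_q.put(chunk)

    voiced = []
    while not audio_q.empty():
        chunk = audio_q.get()
        if vad(chunk):            # keep only chunks that contain speech
            voiced.append(chunk)

    final_transcript = transcribe(voiced)  # final commit after speech ends
    if in_context and final_transcript:
        return ask_llm(final_transcript)
    return None                   # gate closed or nothing was said


# Stub stages: VAD passes non-silent chunks, "transcription" joins them,
# and the "LLM" echoes a canned answer.
answer = run_pipeline(
    audio_chunks=["hello", "", "world"],
    in_context=True,
    vad=lambda c: bool(c),
    transcribe=lambda chunks: " ".join(chunks),
    ask_llm=lambda text: f"LLM saw: {text}",
)
assert answer == "LLM saw: hello world"
```

In the real app each stage runs concurrently (the microphone callback feeds the queue while transcription drains it), but the data flow between stages is the same as in this single-threaded sketch.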