Easy, fast, and private LLM & VLM inference for every device
| Getting Started | Documentation | Architecture |
OmniInfer is a high-performance, cross-platform inference engine for running Large Language Models (LLMs) and Vision-Language Models (VLMs) locally. It abstracts away model compilation, hardware adaptation, and deployment complexity, enabling efficient local inference with minimal configuration.
OmniInfer powers the inference layer of Omni Studio, a unified model orchestration platform.
OmniInfer is fast with:
- Optimized token generation speed and minimal memory footprint
- Multiple backend engines (llama.cpp, MNN, ET, MLX, OmniInfer Native) for best-fit performance
- Hardware-aware adaptation and optimization
OmniInfer is flexible and easy to use with:
- Seamless multi-backend switching — choose the best engine for your workload
- OpenAI-compatible API server for drop-in integration (see the sketch after this list)
- Support for LLM, VLM, and World Models
- Fine-grained parameter control (context length, GPU offloading, KV cache, etc.)
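Because the server speaks the OpenAI API, existing clients should work unchanged. Here is a minimal sketch using the official `openai` Python package; the server address, port, and model id are assumptions for illustration, so substitute whatever your server actually reports:

```python
# Minimal sketch: query a local OmniInfer server through the standard
# OpenAI Python client. The base_url and model id below are assumptions,
# not documented defaults -- replace them with your server's real values.
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:8080/v1",  # assumed local endpoint
    api_key="not-needed",                 # local servers typically ignore the key
)

response = client.chat.completions.create(
    model="qwen2.5-1.5b-instruct",        # placeholder model id
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=64,
    temperature=0.7,
)
print(response.choices[0].message.content)
```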
OmniInfer runs everywhere:
- Linux, macOS, Windows — desktop & server
- Android, iOS — mobile & edge devices
- One codebase, all platforms
macOS, Linux, and Android:
```sh
curl -fsSL https://raw.githubusercontent.com/omnimind-ai/OmniInfer/main/scripts/install.sh | bash
```

Windows (PowerShell):

```powershell
irm https://raw.githubusercontent.com/omnimind-ai/OmniInfer/main/scripts/install.ps1 | iex
```

The installer detects your platform and hardware, recommends a backend, and walks you through model setup interactively.
If you already cloned this repository, build at least one local runtime backend first.
- Windows: see Build Guide: Windows
- Linux: see Build Guide: Linux
- macOS: see Build Guide: macOS
- Android: see Build Guide: Android
After the runtime is ready, run the OmniInfer CLI from the repository root.
Linux and macOS:
```sh
./omniinfer --help
```

Windows:

```powershell
.\omniinfer.cmd --help
```

Android:

```sh
./omniinfer --help
```

If you are using a packaged release that already includes `runtime/`, you can run the CLI immediately from the release directory:

Windows:

```powershell
.\omniinfer.cmd --help
```

Linux and macOS:

```sh
./omniinfer --help
```

Recommended docs:
- CLI Guide: end-to-end CLI usage for Linux, macOS, Windows, and Android
- Android CLI Notes: Android direct-mode details
- Android JNI Bridge: generate an Android App bridge and packaged runtime assets
- Build Guide: build and platform packaging notes
- API Reference: OpenAI-compatible local API usage (a streaming sketch follows this list)
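As a small preview of what the API Reference covers, here is one more hedged sketch: streaming tokens from the same local endpoint, reusing the assumed address and placeholder model id from the example above:

```python
# Streaming sketch against the OpenAI-compatible endpoint. The address
# and model id are assumptions carried over from the earlier example.
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="not-needed")

stream = client.chat.completions.create(
    model="qwen2.5-1.5b-instruct",  # placeholder model id
    messages=[{"role": "user", "content": "Count from 1 to 5."}],
    stream=True,  # tokens arrive incrementally as they are generated
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```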
If you use OmniInfer in research, please cite this repository. GitHub can automatically generate citation formats from CITATION.cff.
```bibtex
@software{omniinfer,
  author = {{Omnimind AI}},
  title  = {OmniInfer},
  url    = {https://github.com/omnimind-ai/OmniInfer}
}
```

We welcome and value any contributions and collaborations. Please check out Contributing to OmniInfer for how to get involved.
This project is licensed under the Apache License 2.0 — see LICENSE for details.