
OmniInfer

Easy, fast, and private LLM & VLM inference for every device

| Getting Started | Documentation | Architecture |

About

OmniInfer is a high-performance, cross-platform inference engine for running Large Language Models (LLMs) and Vision-Language Models (VLMs) locally. It abstracts away model compilation, hardware adaptation, and deployment complexity, enabling efficient local inference with minimal configuration.

OmniInfer powers the inference layer of Omni Studio, a unified model orchestration platform.

OmniInfer is fast with:

  • Optimized token generation speed and minimal memory footprint
  • Multiple backend engines (llama.cpp, mnn, et, mlx, OmniInfer Native) for best-fit performance
  • Hardware-aware adaptation and optimization

OmniInfer is flexible and easy to use with:

  • Seamless multi-backend switching — choose the best engine for your workload
  • OpenAI-compatible API server for drop-in integration (see the client sketch after this list)
  • Support for LLMs, VLMs, and World Models
  • Fine-grained parameter control (context length, GPU offloading, KV cache, etc.)
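
For example, once a local OmniInfer server is exposing its OpenAI-compatible API, any standard OpenAI client can talk to it. The sketch below uses the openai Python package; the base URL, port, model name, and the assumption that no real API key is required are placeholders for illustration rather than documented OmniInfer defaults, so adjust them to your local setup.

# Minimal sketch: chat completion against a local OpenAI-compatible endpoint.
# Endpoint, port, and model name are assumptions, not guaranteed OmniInfer defaults.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # assumed local server address
    api_key="not-needed",                 # most local servers ignore the key
)

response = client.chat.completions.create(
    model="your-local-model",             # whichever model you loaded locally
    messages=[{"role": "user", "content": "Hello from OmniInfer!"}],
)
print(response.choices[0].message.content)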

OmniInfer runs everywhere:

  • Linux, macOS, Windows — desktop & server
  • Android, iOS — mobile & edge devices
  • One codebase, all platforms

Getting Started

Quick Install

macOS, Linux, and Android:

curl -fsSL https://raw.githubusercontent.com/omnimind-ai/OmniInfer/main/scripts/install.sh | bash

Windows (PowerShell):

irm https://raw.githubusercontent.com/omnimind-ai/OmniInfer/main/scripts/install.ps1 | iex

The installer detects your platform and hardware, recommends a backend, and walks you through model setup interactively.

Source Checkout

If you already cloned this repository, build at least one local runtime backend first.

After the runtime is ready, start with the OmniInfer CLI from the repository root.

Linux and macOS:

./omniinfer --help

Windows:

.\omniinfer.cmd --help

Android:

./omniinfer --help

Packaged Release

If you are using a packaged release that already includes runtime/, you can run the CLI immediately from the release directory:

Windows:

.\omniinfer.cmd --help

Linux and macOS:

./omniinfer --help

Documentation

Recommended docs:

Architecture

Omni Studio architecture diagram (omni_studio_architecture)

Citation

If you use OmniInfer in research, please cite this repository. GitHub can automatically generate citation formats from CITATION.cff.

@software{omniinfer,
  author = {{Omnimind AI}},
  title = {OmniInfer},
  url = {https://github.com/omnimind-ai/OmniInfer}
}

Contributing

We welcome and value contributions and collaborations of all kinds. Please check out Contributing to OmniInfer for how to get involved.

License

This project is licensed under the Apache License 2.0 — see LICENSE for details.
