LLM Deployment Management Platform - Deploy and manage Large Language Models on distributed GPU workers.
- Web UI for managing workers, models, and deployments
- Support for vLLM and Ollama inference backends
- Docker-based worker agents for GPU nodes
- Real-time deployment status monitoring
- OpenAI-compatible API gateway
┌─────────────────┐     ┌─────────────────┐
│  Web Frontend   │────▶│   API Server    │
│     (React)     │     │    (FastAPI)    │
└─────────────────┘     └────────┬────────┘
                                 │
                    ┌────────────┴────────────┐
                    ▼                         ▼
             ┌──────────────┐          ┌──────────────┐
             │ Worker Agent │          │ Worker Agent │
             │  (GPU Node)  │          │  (GPU Node)  │
             └──────────────┘          └──────────────┘
- Docker
- Docker Compose V2: `sudo apt install docker-compose-v2`
- Docker permissions: `sudo usermod -aG docker $USER && newgrp docker`
- NVIDIA GPU with CUDA support
- NVIDIA Container Toolkit (install with `./scripts/install-nvidia-toolkit.sh`)
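Before registering a worker, it can be useful to verify these prerequisites on the GPU node. The following is an optional sketch, not part of the repo; the CUDA base image tag is only an example and any available tag will do.

#!/usr/bin/env python3
# Optional sanity check (not part of the repo) for the prerequisites above.
import shutil
import subprocess

def check(cmd, name):
    # Report whether the tool exists on PATH and exits cleanly.
    if shutil.which(cmd[0]) is None:
        print(f"[MISSING] {name}: '{cmd[0]}' not found")
        return
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(f"[{'OK' if result.returncode == 0 else 'FAILED'}] {name}")

check(["docker", "--version"], "Docker")
check(["docker", "compose", "version"], "Docker Compose V2")
check(["nvidia-smi"], "NVIDIA driver / GPU")
# Exercises the NVIDIA Container Toolkit; the CUDA image tag is just an example.
check(["docker", "run", "--rm", "--gpus", "all",
       "nvidia/cuda:12.4.1-base-ubuntu22.04", "nvidia-smi"],
      "NVIDIA Container Toolkit")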
# Deploy Backend + Frontend
docker compose -f docker-compose.deploy.yml up -d

- Frontend: http://localhost:3000
- Backend API: http://localhost:52000
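The containers can take a moment to become ready. This optional snippet (not part of the repo) polls the two URLs above, using the backend's Swagger UI path documented further below:

# Optional helper (not part of the repo): wait until the stack answers on its ports.
import time
import urllib.request

SERVICES = {
    "Frontend": "http://localhost:3000",
    "Backend API": "http://localhost:52000/docs",  # Swagger UI path, see API docs below
}

def wait_for(name, url, timeout_s=120):
    # Poll once per second until the service responds or the timeout expires.
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                print(f"{name}: up (HTTP {resp.status})")
                return True
        except OSError:
            time.sleep(1)
    print(f"{name}: no response from {url} after {timeout_s}s")
    return False

for name, url in SERVICES.items():
    wait_for(name, url)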
Windows Firewall blocks LAN access by default. Choose one of the following options:
Option 1: Disable Firewall (Simplest)
# Run in PowerShell (Administrator)
Set-NetFirewallProfile -Profile Domain,Public,Private -Enabled False

Option 2: Add Firewall Rules (More Secure)
# Run in PowerShell (Administrator)
# Base ports (Frontend + Backend API)
New-NetFirewallRule -DisplayName "LMStack" -Direction Inbound -LocalPort 3000,52000 -Protocol TCP -Action Allow
# Model deployment ports (add ports as needed, e.g., 40000-40100)
New-NetFirewallRule -DisplayName "LMStack Models" -Direction Inbound -LocalPort 40000-40100 -Protocol TCP -Action Allow
# App ports (e.g., Open WebUI on 46488)
New-NetFirewallRule -DisplayName "LMStack Apps" -Direction Inbound -LocalPort 46000-46500 -Protocol TCP -Action AllowNote: When you deploy models or apps, check the assigned port in the UI and ensure it's allowed through the firewall.
- Login with `admin/admin` (change the password after first login)
- Go to the Workers page and click Add Worker to get the Docker command
- Run the Docker command on your GPU machine to register a worker
- Add a model on the Models page
- Create a deployment on the Deployments page
- Use the OpenAI-compatible API:
curl http://localhost:52000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{"model": "llama3.2:3b", "messages": [{"role": "user", "content": "Hello!"}]}'Build and run Docker images locally:
Build and run Docker images locally:

# Build all images
./scripts/build-local.sh
# Or build specific image
./scripts/build-local.sh backend
./scripts/build-local.sh frontend
./scripts/build-local.sh worker
# Run locally built backend + frontend
docker compose -f docker-compose.local.yml up -d

Then go to the Workers page in the UI to add a worker.
# Terminal 1 - Frontend
cd frontend
npm install
npm run dev
# Terminal 2 - Backend
cd backend
pip install -r requirements.txt
uvicorn app.main:app --host 0.0.0.0 --port 52000 --reload
# Terminal 3 - Worker (on GPU machine)
cd worker
pip install -r requirements.txt
python agent.py --name gpu-worker-01 --server-url http://YOUR_SERVER_IP:52000

- Swagger UI: http://localhost:52000/docs
- ReDoc: http://localhost:52000/redoc
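Beyond the interactive docs, the endpoint list can be pulled programmatically. A small sketch assuming the backend exposes FastAPI's default /openapi.json schema alongside the documented /docs and /redoc pages:

# Sketch (not part of the repo): list the backend's endpoints from its OpenAPI schema.
import json
import urllib.request

with urllib.request.urlopen("http://localhost:52000/openapi.json", timeout=5) as resp:
    spec = json.load(resp)

print(spec["info"].get("title", ""), spec["info"].get("version", ""))
for path, ops in sorted(spec["paths"].items()):
    verbs = [m.upper() for m in ops if m in ("get", "post", "put", "delete", "patch")]
    print(f"{', '.join(verbs):24s} {path}")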
Apache-2.0
