LMStack

LLM Deployment Management Platform - Deploy and manage Large Language Models on distributed GPU workers.

Features

Web UI for managing workers, models, and deployments
Support for vLLM and Ollama inference backends
Docker-based worker agents for GPU nodes
Real-time deployment status monitoring
OpenAI-compatible API gateway

Architecture

┌─────────────────┐     ┌─────────────────┐
│   Web Frontend  │────▶│   API Server    │
│   (React)       │     │   (FastAPI)     │
└─────────────────┘     └────────┬────────┘
                                │
                   ┌────────────┴────────────┐
                   ▼                         ▼
           ┌──────────────┐          ┌──────────────┐
           │ Worker Agent │          │ Worker Agent │
           │  (GPU Node)  │          │  (GPU Node)  │
           └──────────────┘          └──────────────┘

Quick Start

Prerequisites

Docker
Docker Compose V2: sudo apt install docker-compose-v2
Docker permissions: sudo usermod -aG docker $USER && newgrp docker
NVIDIA GPU with CUDA support
NVIDIA Container Toolkit (install with ./scripts/install-nvidia-toolkit.sh)

Deploy with Docker Compose

# Deploy Backend + Frontend
docker compose -f docker-compose.deploy.yml up -d

Frontend: http://localhost:3000
Backend API: http://localhost:8088
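
After `docker compose ... up -d` returns, the containers may still be starting. A minimal sketch for checking that both ports accept TCP connections is shown below. The ports come from the URLs above; the helper names `is_port_open` and `check_services` are illustrative and not part of LMStack.

```python
import socket

# Ports taken from the Frontend and Backend API URLs listed above.
SERVICES = {
    "frontend": ("localhost", 3000),
    "backend": ("localhost", 8088),
}


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_services(services=SERVICES):
    """Map each service name to whether its port currently accepts connections."""
    return {name: is_port_open(host, port) for name, (host, port) in services.items()}


# Example usage (hypothetical):
#   for name, ok in check_services().items():
#       print(name, "up" if ok else "down")
```

An open port only shows that something is listening; it does not confirm that the application behind it is healthy.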

Start Worker (on GPU machine)

docker run -d \
  --name lmstack-worker \
  --gpus all \
  --privileged \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -e BACKEND_URL=http://YOUR_SERVER_IP:8088 \
  -e WORKER_NAME=gpu-worker-01 \
  infinirc/lmstack-worker:latest
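
When starting workers on several GPU machines, it can help to generate the command rather than edit it by hand. The sketch below assembles the same `docker run` arguments as above from two parameters. The function name `worker_run_command` is hypothetical, and the flags are copied from the command above rather than from any LMStack API.

```python
import os
from typing import List


def worker_run_command(
    backend_url: str,
    worker_name: str,
    image: str = "infinirc/lmstack-worker:latest",
    hf_cache: str = "~/.cache/huggingface",
) -> List[str]:
    """Build the argument list for the worker `docker run` command shown above."""
    return [
        "docker", "run", "-d",
        "--name", "lmstack-worker",
        "--gpus", "all",
        "--privileged",
        # The worker talks to the host Docker daemon through this socket.
        "-v", "/var/run/docker.sock:/var/run/docker.sock",
        # Share the host Hugging Face cache; "~" is expanded here because
        # no shell is involved when the list is passed to subprocess.
        "-v", f"{os.path.expanduser(hf_cache)}:/root/.cache/huggingface",
        "-e", f"BACKEND_URL={backend_url}",
        "-e", f"WORKER_NAME={worker_name}",
        image,
    ]


# Example usage (hypothetical):
#   import subprocess
#   cmd = worker_run_command("http://YOUR_SERVER_IP:8088", "gpu-worker-01")
#   subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting issues when a worker name or URL contains special characters.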

Usage

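Log in with admin / admin (change the password after first login)
Check the Workers page - workers register automatically
Add a model on the Models page
Create a deployment on the Deployments page
Call the OpenAI-compatible API (the example targets port 8000, the local development backend; the Docker Compose deployment above lists the backend at port 8088):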

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{"model": "llama3.2:3b", "messages": [{"role": "user", "content": "Hello!"}]}'
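
The same request can be sent from Python using only the standard library. This is a minimal sketch: the URL, headers, and payload mirror the curl command above, while the helper names `build_chat_request` and `send_chat_request` are illustrative. The response is assumed to follow the OpenAI chat completions shape, which the source does not spell out.

```python
import json
import urllib.request


def build_chat_request(base_url, api_key, model, user_message):
    """Create a POST request for /v1/chat/completions without sending it."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send_chat_request(request, timeout=60.0):
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example usage (hypothetical; requires a running deployment):
#   req = build_chat_request("http://localhost:8000", "YOUR_API_KEY",
#                            "llama3.2:3b", "Hello!")
#   reply = send_chat_request(req)
#   # Assumes an OpenAI-style response body:
#   print(reply["choices"][0]["message"]["content"])
```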

Development

For local development without Docker:

# Terminal 1 - Frontend
cd frontend
npm install
npm run dev

# Terminal 2 - Backend
cd backend
pip install -r requirements.txt
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload

# Terminal 3 - Worker (on GPU machine)
cd worker
pip install -r requirements.txt
python agent.py --name gpu-worker-01 --server-url http://YOUR_SERVER_IP:8000

API Docs

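Swagger UI: http://localhost:8000/docs
ReDoc: http://localhost:8000/redoc

These URLs use port 8000, the local development backend; the Docker Compose deployment lists the backend at port 8088.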
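
FastAPI serves the machine-readable OpenAPI schema at `/openapi.json` by default, so the available endpoints can also be listed from a script. The sketch below assumes LMStack keeps that default path, which the source does not confirm. The function names are illustrative.

```python
import json
import urllib.request

# HTTP verbs that can appear as operation keys under an OpenAPI path item.
HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def list_endpoints(schema):
    """Return sorted "METHOD /path" strings from an OpenAPI schema dict."""
    endpoints = []
    for path, operations in sorted(schema.get("paths", {}).items()):
        for method in sorted(operations):
            # Skip non-operation keys such as "parameters" or "summary".
            if method in HTTP_METHODS:
                endpoints.append(f"{method.upper()} {path}")
    return endpoints


def fetch_schema(base_url="http://localhost:8000", timeout=10.0):
    """Download the schema; assumes the FastAPI default /openapi.json path."""
    url = f"{base_url.rstrip('/')}/openapi.json"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Example usage (hypothetical; requires a running backend):
#   for line in list_endpoints(fetch_schema()):
#       print(line)
```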

License

Apache-2.0
