LLM Deployment Management Platform - Deploy and manage Large Language Models on distributed GPU workers.
- Web UI for managing workers, models, and deployments
- Support for vLLM and Ollama inference backends
- Docker-based worker agents for GPU nodes
- Real-time deployment status monitoring
- OpenAI-compatible API gateway
┌─────────────────┐     ┌─────────────────┐
│  Web Frontend   │────▶│   API Server    │
│     (React)     │     │    (FastAPI)    │
└─────────────────┘     └────────┬────────┘
                                 │
                    ┌────────────┴────────────┐
                    ▼                         ▼
             ┌──────────────┐          ┌──────────────┐
             │ Worker Agent │          │ Worker Agent │
             │  (GPU Node)  │          │  (GPU Node)  │
             └──────────────┘          └──────────────┘
- Docker
- Docker Compose V2: `sudo apt install docker-compose-v2`
- Docker permissions: `sudo usermod -aG docker $USER && newgrp docker`
- NVIDIA GPU with CUDA support
- NVIDIA Container Toolkit (install with `./scripts/install-nvidia-toolkit.sh`)
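Before registering a worker, it can be useful to verify these prerequisites on the GPU node. The following is an optional sketch, not part of the repo; the CUDA base image tag is only an example and any available tag will do.

#!/usr/bin/env python3
# Optional sanity check (not part of the repo) for the prerequisites above.
import shutil
import subprocess

def check(cmd, name):
    # Report whether the tool exists on PATH and exits cleanly.
    if shutil.which(cmd[0]) is None:
        print(f"[MISSING] {name}: '{cmd[0]}' not found")
        return
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(f"[{'OK' if result.returncode == 0 else 'FAILED'}] {name}")

check(["docker", "--version"], "Docker")
check(["docker", "compose", "version"], "Docker Compose V2")
check(["nvidia-smi"], "NVIDIA driver / GPU")
# Exercises the NVIDIA Container Toolkit; the CUDA image tag is just an example.
check(["docker", "run", "--rm", "--gpus", "all",
       "nvidia/cuda:12.4.1-base-ubuntu22.04", "nvidia-smi"],
      "NVIDIA Container Toolkit")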
# Deploy Backend + Frontend
docker compose -f docker-compose.deploy.yml up -d

- Frontend: http://localhost:3000
- Backend API: http://localhost:52000
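The containers can take a moment to become ready. This optional snippet (not part of the repo) polls the two URLs above, using the backend's Swagger UI path documented further below:

# Optional helper (not part of the repo): wait until the stack answers on its ports.
import time
import urllib.request

SERVICES = {
    "Frontend": "http://localhost:3000",
    "Backend API": "http://localhost:52000/docs",  # Swagger UI path, see API docs below
}

def wait_for(name, url, timeout_s=120):
    # Poll once per second until the service responds or the timeout expires.
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                print(f"{name}: up (HTTP {resp.status})")
                return True
        except OSError:
            time.sleep(1)
    print(f"{name}: no response from {url} after {timeout_s}s")
    return False

for name, url in SERVICES.items():
    wait_for(name, url)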
Windows Firewall blocks LAN access by default. Choose one of the following options:
Option 1: Disable Firewall (Simplest)
# Run in PowerShell (Administrator)
Set-NetFirewallProfile -Profile Domain,Public,Private -Enabled False

Option 2: Add Firewall Rules (More Secure)
# Run in PowerShell (Administrator)
# Base ports (Frontend + Backend API)
New-NetFirewallRule -DisplayName "LMStack" -Direction Inbound -LocalPort 3000,52000 -Protocol TCP -Action Allow
# Model deployment ports (add ports as needed, e.g., 40000-40100)
New-NetFirewallRule -DisplayName "LMStack Models" -Direction Inbound -LocalPort 40000-40100 -Protocol TCP -Action Allow
# App ports (e.g., Open WebUI on 46488)
New-NetFirewallRule -DisplayName "LMStack Apps" -Direction Inbound -LocalPort 46000-46500 -Protocol TCP -Action AllowNote: When you deploy models or apps, check the assigned port in the UI and ensure it's allowed through the firewall.
- Login with `admin/admin` (change the password after first login)
- Go to the Workers page and click Add Worker to get the Docker command
- Run the Docker command on your GPU machine to register a worker
- Add a model on the Models page
- Create a deployment on the Deployments page
- Use the OpenAI-compatible API:
curl http://localhost:52000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{"model": "llama3.2:3b", "messages": [{"role": "user", "content": "Hello!"}]}'Build and run Docker images locally:
Build and run Docker images locally:

# Build all images
./scripts/build-local.sh
# Or build specific image
./scripts/build-local.sh backend
./scripts/build-local.sh frontend
./scripts/build-local.sh worker
# Run locally built backend + frontend
docker compose -f docker-compose.local.yml up -d

Then go to the Workers page in the UI to add a worker.
# Terminal 1 - Frontend
cd frontend
npm install
npm run dev
# Terminal 2 - Backend
cd backend
pip install -r requirements.txt
uvicorn app.main:app --host 0.0.0.0 --port 52000 --reload
# Terminal 3 - Worker (on GPU machine)
cd worker
pip install -r requirements.txt
python agent.py --name gpu-worker-01 --server-url http://YOUR_SERVER_IP:52000

- Swagger UI: http://localhost:52000/docs
- ReDoc: http://localhost:52000/redoc
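Beyond the interactive docs, the endpoint list can be pulled programmatically. A small sketch assuming the backend exposes FastAPI's default /openapi.json schema alongside the documented /docs and /redoc pages:

# Sketch (not part of the repo): list the backend's endpoints from its OpenAPI schema.
import json
import urllib.request

with urllib.request.urlopen("http://localhost:52000/openapi.json", timeout=5) as resp:
    spec = json.load(resp)

print(spec["info"].get("title", ""), spec["info"].get("version", ""))
for path, ops in sorted(spec["paths"].items()):
    verbs = [m.upper() for m in ops if m in ("get", "post", "put", "delete", "patch")]
    print(f"{', '.join(verbs):24s} {path}")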
Apache-2.0
