LLM Deployment Management Platform - Deploy and manage Large Language Models on distributed GPU workers.
- Web UI for managing workers, models, and deployments
- Support for vLLM and Ollama inference backends
- Docker-based worker agents for GPU nodes
- Real-time deployment status monitoring
- OpenAI-compatible API gateway
┌─────────────────┐     ┌─────────────────┐
│  Web Frontend   │────▶│   API Server    │
│     (React)     │     │    (FastAPI)    │
└─────────────────┘     └────────┬────────┘
                                 │
                    ┌────────────┴────────────┐
                    ▼                         ▼
             ┌──────────────┐          ┌──────────────┐
             │ Worker Agent │          │ Worker Agent │
             │  (GPU Node)  │          │  (GPU Node)  │
             └──────────────┘          └──────────────┘
- Docker
- Docker Compose V2:
  sudo apt install docker-compose-v2
- Docker permissions:
  sudo usermod -aG docker $USER && newgrp docker
- NVIDIA GPU with CUDA support
- NVIDIA Container Toolkit (install with ./scripts/install-nvidia-toolkit.sh)
# Deploy Backend + Frontend
docker compose -f docker-compose.deploy.yml up -d
- Frontend: http://localhost:3000
- Backend API: http://localhost:8088
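
After `docker compose ... up -d` returns, the containers may still be starting. The following is a minimal sketch, not part of the project, that polls the two ports listed above until they accept TCP connections. The function names `port_open` and `wait_for_services` are hypothetical, and the check only confirms that something is listening; it does not verify application health.

```python
import socket
import time


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_services(targets, attempts: int = 30, delay: float = 2.0) -> dict:
    """Poll each (name, host, port) target until it accepts connections
    or the attempts run out. Returns a name -> bool status mapping."""
    status = {name: False for name, _, _ in targets}
    for _ in range(attempts):
        for name, host, port in targets:
            if not status[name]:
                status[name] = port_open(host, port)
        if all(status.values()):
            break
        time.sleep(delay)
    return status


if __name__ == "__main__":
    # Ports taken from the Frontend and Backend API URLs listed above.
    services = [("frontend", "localhost", 3000), ("backend", "localhost", 8088)]
    for name, ok in wait_for_services(services).items():
        print(f"{name}: {'up' if ok else 'not reachable'}")
```

If the backend stays unreachable, `docker compose -f docker-compose.deploy.yml logs` is a reasonable next place to look.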
docker run -d \
  --name lmstack-worker \
  --gpus all \
  --privileged \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -e BACKEND_URL=http://YOUR_SERVER_IP:8088 \
  -e WORKER_NAME=gpu-worker-01 \
  infinirc/lmstack-worker:latest
- Log in with admin/admin (change the password after first login)
- Check the Workers page; workers register automatically
- Add a model on the Models page
- Create a deployment on the Deployments page
- Use the OpenAI-compatible API:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{"model": "llama3.2:3b", "messages": [{"role": "user", "content": "Hello!"}]}'

For local development without Docker:
# Terminal 1 - Frontend
cd frontend
npm install
npm run dev
# Terminal 2 - Backend
cd backend
pip install -r requirements.txt
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
# Terminal 3 - Worker (on GPU machine)
cd worker
pip install -r requirements.txt
python agent.py --name gpu-worker-01 --server-url http://YOUR_SERVER_IP:8000

- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
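
The chat completions request shown earlier with curl can also be sent from Python. The sketch below uses only the standard library and mirrors that curl call: same path, headers, and JSON body. The helper names `build_chat_request` and `chat` are hypothetical, and the response parsing assumes the gateway returns the standard OpenAI chat completions shape (`choices[0].message.content`), which this README implies but does not spell out.

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, model: str,
                       prompt: str) -> urllib.request.Request:
    """Build a POST request equivalent to the curl example."""
    payload = {"model": model,
               "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def chat(base_url: str, api_key: str, model: str, prompt: str,
         timeout: float = 60.0) -> str:
    """Send the request and return the assistant's reply text."""
    req = build_chat_request(base_url, api_key, model, prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    # Assumption: standard OpenAI-style response layout.
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    # Same placeholder key and model name as the curl example.
    print(chat("http://localhost:8000", "YOUR_API_KEY", "llama3.2:3b", "Hello!"))
```

Because the gateway is described as OpenAI-compatible, an existing OpenAI client library pointed at the same base URL may also work, though that is not confirmed here.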
Apache-2.0