Launch, manage, and stop LLM inference workloads on one or more NVIDIA DGX Spark systems — no Slurm, no Kubernetes, no fuss.
Documentation · Quick Start · Recipes · Spark Arena
```sh
uvx sparkrun setup
```

One command — installs sparkrun, then launches the guided setup wizard to create a cluster, configure the SSH mesh, detect ConnectX-7 NICs, set up sudoers, and enable earlyoom.
```sh
# Run an inference workload
sparkrun run qwen3-1.7b-vllm

# Multi-node tensor parallelism (TP maps to node count on DGX Spark)
sparkrun run qwen3-1.7b-vllm --tp 2

# Re-attach to logs, stop a workload, check status
sparkrun logs qwen3-1.7b-vllm
sparkrun stop qwen3-1.7b-vllm
sparkrun status
```

Ctrl+C detaches from logs — it never kills your inference job. Your model keeps serving.
See the full CLI reference for all commands and options.
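Recipes built on vLLM or SGLang serve an OpenAI-compatible HTTP API once launched, so any standard client can query the model. A minimal stdlib sketch, assuming the server listens on `localhost:8000` (the vLLM default) and the model name matches the recipe — sparkrun itself does not ship this helper:

```python
import json
import urllib.request

def build_payload(prompt: str, model: str = "qwen3-1.7b", max_tokens: int = 128) -> dict:
    """Assemble an OpenAI-style chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def chat(prompt: str, base_url: str = "http://localhost:8000") -> str:
    """POST to the OpenAI-compatible route and return the first choice's text."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",  # assumed host/port; check your recipe
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```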
- Multi-runtime — vLLM, SGLang, and llama.cpp out of the box
- Multi-node tensor parallelism — `--tp 2` = 2 hosts, with automatic InfiniBand/RDMA detection
- VRAM estimation — know whether your model fits before you launch (`sparkrun show <recipe>`)
- Git-based recipe registries — we publish official recipes, community recipes, and benchmarked recipes via Spark Arena, plus you can add your own registries
- Guided setup wizard — cluster creation, SSH mesh, CX7 auto-detection, sudoers, earlyoom
- Model & container distribution — syncs models and images to cluster nodes over SSH automatically
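The VRAM check above comes down to arithmetic on parameter count and precision. As a rough back-of-envelope sketch (not sparkrun's actual estimator), weight memory is parameter count times bytes per parameter; KV cache and activation buffers come on top:

```python
def weight_gib(n_params: float, bytes_per_param: float = 2.0) -> float:
    """Rough weight-memory estimate: params x bytes per param (fp16/bf16 = 2)."""
    return n_params * bytes_per_param / 2**30

# A 1.7B-parameter model in bf16 needs roughly 3.2 GiB for weights alone.
print(round(weight_gib(1.7e9), 1))
```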
Spark Arena is the community hub for DGX Spark recipe benchmarks — browse benchmark results, then run them directly with sparkrun.
Official Recipes are maintained by the Spark Arena team and hosted on GitHub. They are tested and optimized for NVIDIA DGX Spark systems.
Community Recipes are contributed by the community and hosted on GitHub.
Apache License 2.0 — see LICENSE for details.