diff --git a/k8s/README.md b/k8s/README.md
new file mode 100644
index 000000000..be1a26248
--- /dev/null
+++ b/k8s/README.md
@@ -0,0 +1,205 @@
+# NemoClaw on Kubernetes
+
+> **⚠️ Experimental**: This deployment method is intended for **trying out NemoClaw on Kubernetes**, not for production use. It requires a **privileged pod** running **Docker-in-Docker (DinD)** to create isolated sandbox environments. Operational requirements (storage, runtime, security policies) vary by cluster configuration.
+
+Run [NemoClaw](https://github.com/NVIDIA/NemoClaw) on Kubernetes with GPU inference powered by [Dynamo](https://github.com/ai-dynamo/dynamo) or any OpenAI-compatible endpoint.
+
+---
+
+## Quick Start
+
+### Prerequisites
+
+- Kubernetes cluster with `kubectl` access
+- An OpenAI-compatible inference endpoint (Dynamo vLLM, vLLM, etc.)
+- Permissions to create **privileged pods** (required for Docker-in-Docker)
+- Sufficient node resources (the manifest requests 8Gi memory and 2 CPUs for the DinD container, plus 4Gi and 2 CPUs for the workspace container)
+
+### 1. Deploy NemoClaw
+
+```bash
+kubectl create namespace nemoclaw
+kubectl apply -f https://raw.githubusercontent.com/NVIDIA/NemoClaw/main/k8s/nemoclaw-k8s.yaml
+```
+
+### 2. Check Logs
+
+```bash
+kubectl logs -f nemoclaw -n nemoclaw -c workspace
+```
+
+Wait for the "Onboard complete" message.
+
+### 3. Connect to Your Sandbox
+
+```bash
+kubectl exec -it nemoclaw -n nemoclaw -c workspace -- nemoclaw my-assistant connect
+```
+
+You're now inside a secure sandbox with an AI agent ready to help.
+
+---
+
+## Configuration
+
+Edit the environment variables in `nemoclaw-k8s.yaml` before deploying:
+
+| Variable | Required | Description |
+|----------|----------|-------------|
+| `DYNAMO_HOST` | Yes | Inference endpoint as `host:port`, forwarded by the socat proxy (e.g., `vllm-frontend.dynamo.svc:8000`) |
+| `NEMOCLAW_ENDPOINT_URL` | Yes | URL the sandbox uses (usually `http://host.openshell.internal:8000/v1`) |
+| `COMPATIBLE_API_KEY` | Yes | API key (use `dummy` for Dynamo/vLLM) |
+| `NEMOCLAW_MODEL` | Yes | Model name (e.g., `meta-llama/Llama-3.1-8B-Instruct`) |
+| `NEMOCLAW_SANDBOX_NAME` | No | Sandbox name (default: `my-assistant`) |
+
+### Example: Custom Endpoint
+
+```yaml
+env:
+  - name: DYNAMO_HOST
+    value: "my-vllm.my-namespace.svc.cluster.local:8000"
+  - name: NEMOCLAW_ENDPOINT_URL
+    value: "http://host.openshell.internal:8000/v1"
+  - name: COMPATIBLE_API_KEY
+    value: "dummy"
+  - name: NEMOCLAW_MODEL
+    value: "mistralai/Mistral-7B-Instruct-v0.3"
+```
+
+---
+
+## Using NemoClaw
+
+### Access the Workspace Shell
+
+```bash
+kubectl exec -it nemoclaw -n nemoclaw -c workspace -- bash
+```
+
+### Check Sandbox Status
+
+```bash
+kubectl exec nemoclaw -n nemoclaw -c workspace -- nemoclaw list
+kubectl exec nemoclaw -n nemoclaw -c workspace -- nemoclaw my-assistant status
+```
+
+### Connect to Sandbox
+
+```bash
+kubectl exec -it nemoclaw -n nemoclaw -c workspace -- nemoclaw my-assistant connect
+```
+
+### Test Inference
+
+From inside the sandbox:
+
+```bash
+curl -s https://inference.local/v1/models
+
+curl -s https://inference.local/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{"model":"meta-llama/Llama-3.1-8B-Instruct","messages":[{"role":"user","content":"Hello!"}],"max_tokens":50}'
+```
+
+### Verify Local Inference
+
+Confirm NemoClaw is using your Dynamo/vLLM endpoint:
+
+```bash
+# Check model from sandbox
+kubectl exec -it nemoclaw -n nemoclaw -c workspace -- nemoclaw my-assistant connect
+sandbox@my-assistant:~$ curl -s https://inference.local/v1/models
+# Should show your model (e.g., meta-llama/Llama-3.1-8B-Instruct)
+
+# Compare with Dynamo directly (from workspace)
+kubectl exec nemoclaw -n nemoclaw -c workspace -- curl -s http://localhost:8000/v1/models
+# Should show the same model
+
+# Check provider configuration
+kubectl exec nemoclaw -n nemoclaw -c workspace -- openshell inference get
+# Shows: Provider: compatible-endpoint, Model:
+
+# Test the agent
+sandbox@my-assistant:~$ openclaw agent --agent main -m "What is 7 times 8?"
+# Should respond with 56
+```
+
+---
+
+## Architecture
+
+```text
+┌─────────────────────────────────────────────────────────────────┐
+│                       Kubernetes Cluster                        │
+│                                                                 │
+│  ┌───────────────────────────────────────────────────────────┐  │
+│  │                       NemoClaw Pod                        │  │
+│  │                                                           │  │
+│  │  ┌─────────────────┐    ┌─────────────────────────────┐   │  │
+│  │  │ Docker-in-Docker│    │  Workspace Container        │   │  │
+│  │  │                 │    │                             │   │  │
+│  │  │  ┌───────────┐  │    │  nemoclaw CLI               │   │  │
+│  │  │  │    k3s    │  │◄───│  openshell CLI              │   │  │
+│  │  │  │  cluster  │  │    │                             │   │  │
+│  │  │  │           │  │    │  socat proxy ───────────────│───│──┼──► Dynamo/vLLM
+│  │  │  │ ┌───────┐ │  │    │  localhost:8000             │   │  │
+│  │  │  │ │Sandbox│ │  │    │                             │   │  │
+│  │  │  │ └───────┘ │  │    │  host.openshell.internal    │   │  │
+│  │  │  └───────────┘  │    │  routes to socat            │   │  │
+│  │  └─────────────────┘    └─────────────────────────────┘   │  │
+│  └───────────────────────────────────────────────────────────┘  │
+└─────────────────────────────────────────────────────────────────┘
+```
+
+**How it works:**
+
+1. NemoClaw runs in a privileged pod with Docker-in-Docker
+2. OpenShell creates a nested k3s cluster for sandbox isolation
+3. A socat proxy bridges K8s DNS to the nested environment
+4. Inside the sandbox, `host.openshell.internal:8000` routes to the inference endpoint
+
+---
+
+## Troubleshooting
+
+### Pod won't start
+
+```bash
+kubectl describe pod nemoclaw -n nemoclaw
+```
+
+Common issues:
+
+- Missing privileged security context
+- Insufficient memory (needs ~8GB for DinD)
+
+### Docker daemon not starting
+
+```bash
+kubectl logs nemoclaw -n nemoclaw -c dind
+```
+
+Usually resolves after 30-60 seconds.
+
+### Inference not working
+
+Check that socat is running:
+
+```bash
+kubectl exec nemoclaw -n nemoclaw -c workspace -- pgrep -a socat
+```
+
+Test the endpoint directly:
+
+```bash
+kubectl exec nemoclaw -n nemoclaw -c workspace -- curl -s http://localhost:8000/v1/models
+```
+
+---
+
+## Learn More
+
+- [NemoClaw Documentation](https://docs.nvidia.com/nemoclaw)
+- [OpenShell](https://github.com/NVIDIA/OpenShell)
+- [Dynamo](https://github.com/ai-dynamo/dynamo)
+- [OpenClaw](https://openclaw.ai)
diff --git a/k8s/nemoclaw-k8s.yaml b/k8s/nemoclaw-k8s.yaml
new file mode 100644
index 000000000..edc8748cf
--- /dev/null
+++ b/k8s/nemoclaw-k8s.yaml
@@ -0,0 +1,119 @@
+# NemoClaw on Kubernetes
+# Uses official installer with Docker-in-Docker for sandbox isolation.
+# Prerequisites: kubectl create namespace nemoclaw
+apiVersion: v1
+kind: Pod
+metadata:
+  name: nemoclaw
+  namespace: nemoclaw
+  labels:
+    app: nemoclaw
+spec:
+  containers:
+    # Docker daemon (DinD)
+    - name: dind
+      image: docker:24-dind
+      securityContext:
+        privileged: true
+      env:
+        - name: DOCKER_TLS_CERTDIR
+          value: ""
+      command: ["dockerd", "--host=unix:///var/run/docker.sock"]
+      volumeMounts:
+        - name: docker-storage
+          mountPath: /var/lib/docker
+        - name: docker-socket
+          mountPath: /var/run
+        - name: docker-config
+          mountPath: /etc/docker
+      resources:
+        requests:
+          memory: "8Gi"
+          cpu: "2"
+
+    # Workspace - runs official NemoClaw installer
+    - name: workspace
+      image: node:22
+      command:
+        - bash
+        - -c
+        - |
+          set -e
+
+          # Install packages
+          echo "[1/4] Installing packages..."
+          apt-get update -qq
+          apt-get install -y -qq docker.io socat curl >/dev/null 2>&1
+
+          # Start socat proxy for K8s DNS bridge
+          echo "[2/4] Starting socat proxy..."
+          socat TCP-LISTEN:8000,fork,reuseaddr TCP:$DYNAMO_HOST &
+          # Add hosts entry so validation can reach socat via host.openshell.internal
+          echo "127.0.0.1 host.openshell.internal" >> /etc/hosts
+          sleep 1
+
+          # Wait for Docker
+          echo "[3/4] Waiting for Docker daemon..."
+          for i in $(seq 1 30); do
+            if docker info >/dev/null 2>&1; then break; fi
+            sleep 2
+          done
+          docker info >/dev/null 2>&1 || { echo "Docker not ready"; exit 1; }
+          echo "Docker ready"
+
+          # Run official NemoClaw installer
+          echo "[4/4] Running NemoClaw installer..."
+          curl -fsSL https://nvidia.com/nemoclaw.sh | bash
+
+          # Keep running after onboard
+          echo "Onboard complete. Container staying alive."
+          exec sleep infinity
+      env:
+        - name: DOCKER_HOST
+          value: unix:///var/run/docker.sock
+        # Dynamo endpoint (raw host:port for socat) - UPDATE THIS FOR YOUR CLUSTER
+        - name: DYNAMO_HOST
+          value: "vllm-agg-frontend.dynamo.svc.cluster.local:8000"
+        # NemoClaw config (uses host.openshell.internal via socat)
+        - name: NEMOCLAW_NON_INTERACTIVE
+          value: "1"
+        - name: NEMOCLAW_PROVIDER
+          value: "custom"
+        - name: NEMOCLAW_ENDPOINT_URL
+          value: "http://host.openshell.internal:8000/v1"
+        - name: COMPATIBLE_API_KEY
+          value: "dummy"
+        - name: NEMOCLAW_MODEL
+          value: "meta-llama/Llama-3.1-8B-Instruct"
+        - name: NEMOCLAW_SANDBOX_NAME
+          value: "my-assistant"
+        - name: NEMOCLAW_POLICY_MODE
+          value: "skip"
+      volumeMounts:
+        - name: docker-socket
+          mountPath: /var/run
+        - name: docker-config
+          mountPath: /etc/docker
+      resources:
+        requests:
+          memory: "4Gi"
+          cpu: "2"
+
+  initContainers:
+    # Configure Docker daemon for cgroup v2
+    - name: init-docker-config
+      image: busybox
+      command: ["sh", "-c", "echo '{\"default-cgroupns-mode\":\"host\"}' > /etc/docker/daemon.json"]
+      volumeMounts:
+        - name: docker-config
+          mountPath: /etc/docker
+
+  volumes:
+    - name: docker-storage
+      emptyDir: {}
+    - name: docker-socket
+      emptyDir: {}
+    - name: docker-config
+      emptyDir: {}
+
+  restartPolicy: Never
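
The workspace startup script in this manifest prints `Onboard complete` once the installer finishes, so an automated deployment could poll the container logs for that marker instead of tailing them by hand. The following is a minimal sketch of that idea, not part of the PR: the function name `wait_for_onboard`, the default attempt count, and the delay are illustrative assumptions, while the pod, namespace, container, and marker string are taken from the manifest.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of NemoClaw): poll the workspace container
# logs until the "Onboard complete" marker from the startup script appears.
# Arguments (both assumed defaults): $1 = max attempts, $2 = seconds between attempts.
wait_for_onboard() {
  local attempts="${1:-60}" delay="${2:-10}" i
  for ((i = 1; i <= attempts; i++)); do
    # grep -q succeeds as soon as the marker is found anywhere in the log
    if kubectl logs nemoclaw -n nemoclaw -c workspace 2>/dev/null \
        | grep -q "Onboard complete"; then
      echo "onboard complete after ${i} check(s)"
      return 0
    fi
    sleep "$delay"
  done
  echo "timed out waiting for onboard" >&2
  return 1
}
```

A possible use is `wait_for_onboard 90 10 && kubectl exec -it nemoclaw -n nemoclaw -c workspace -- nemoclaw my-assistant connect`. Because the pod uses `restartPolicy: Never` and the script runs under `set -e`, a failed install stops the container, in which case this sketch simply times out; checking `kubectl describe pod nemoclaw -n nemoclaw` would show the reason.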