feat: add Kubernetes testing infrastructure #227
# NemoClaw on Kubernetes

> **⚠️ Experimental**: This deployment method is intended for **trying out NemoClaw on Kubernetes**, not for production use. It requires a **privileged pod** running **Docker-in-Docker (DinD)** to create isolated sandbox environments. Operational requirements (storage, runtime, security policies) vary by cluster configuration.

Run [NemoClaw](https://github.com/NVIDIA/NemoClaw) on Kubernetes with GPU inference powered by [Dynamo](https://github.com/ai-dynamo/dynamo) or any OpenAI-compatible endpoint.

---

## Quick Start

### Prerequisites

- Kubernetes cluster with `kubectl` access
- An OpenAI-compatible inference endpoint (Dynamo vLLM, vLLM, etc.) reachable from the cluster
- Permissions to create **privileged pods** (required for Docker-in-Docker)
- A node with enough allocatable resources for the pod's requests: 8Gi memory and 2 CPUs for the `dind` container plus 4Gi memory and 2 CPUs for the `workspace` container (12Gi and 4 CPUs in total)

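Privileged pods are commonly blocked by Pod Security admission. As a pre-flight check, the hypothetical helper below (the name `psa_allows_privileged` and the `KUBECTL` override are illustrative, not part of NemoClaw) reads the namespace's `pod-security.kubernetes.io/enforce` label; the `baseline` and `restricted` levels reject privileged containers. It cannot see cluster-wide admission defaults or other policy engines, and a missing namespace also reads as unset, so an `unset` result does not guarantee the pod will be admitted.

```bash
# Hypothetical pre-flight check: report the Pod Security "enforce" level on the
# target namespace. The baseline and restricted levels reject privileged pods.
# Set KUBECTL to use a different kubectl binary (or a stub for testing).
psa_allows_privileged() {
  local kubectl="${KUBECTL:-kubectl}" ns="${1:-nemoclaw}" level
  # Dots inside the label key must be escaped in kubectl's JSONPath syntax.
  level="$("$kubectl" get namespace "$ns" \
    -o jsonpath='{.metadata.labels.pod-security\.kubernetes\.io/enforce}' 2>/dev/null || true)"
  case "$level" in
    baseline|restricted)
      echo "namespace $ns enforces '$level': privileged pods will be rejected"
      return 1
      ;;
    *)
      echo "namespace $ns enforce level: ${level:-unset}"
      ;;
  esac
}
```

Run it as `psa_allows_privileged nemoclaw` after creating the namespace. If it reports `baseline` or `restricted` and your cluster policy allows an exception, an administrator can relabel the namespace with `kubectl label --overwrite namespace nemoclaw pod-security.kubernetes.io/enforce=privileged`, which lifts Pod Security enforcement for every pod in that namespace.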
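### 1. Deploy NemoClaw

```bash
kubectl create namespace nemoclaw
kubectl apply -f https://raw.githubusercontent.com/NVIDIA/NemoClaw/main/k8s/nemoclaw-k8s.yaml
```

Applying the manifest directly uses its defaults, including `DYNAMO_HOST=vllm-agg-frontend.dynamo.svc.cluster.local:8000` and `NEMOCLAW_MODEL=meta-llama/Llama-3.1-8B-Instruct`. If your endpoint or model differs, edit the manifest first (see [Configuration](#configuration)).
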
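### 2. Check Logs

```bash
kubectl logs -f nemoclaw -n nemoclaw -c workspace
```

Wait for the `Onboard complete. Container staying alive.` message. Before running the NemoClaw installer (step `[4/4]`), the workspace container installs packages, starts the socat proxy, and waits for the Docker daemon (steps `[1/4]` to `[3/4]`).
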
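For scripted setups, `kubectl logs -f` can be replaced by polling. A pod-level check such as `kubectl wait --for=condition=Ready` does not help here: the manifest defines no readiness probe, so the pod reports Ready as soon as its containers start, long before onboarding finishes. The hypothetical helper below (its name, arguments, and timeout defaults are assumptions) polls the workspace logs for the two messages the manifest's startup script prints: `Onboard complete` on success and `Docker not ready` when the Docker wait gives up.

```bash
# Hypothetical helper: wait until the workspace container logs "Onboard complete".
# Arguments: namespace (default nemoclaw), timeout in seconds (default 900),
# poll interval in seconds (default 10). Set KUBECTL to override kubectl.
wait_for_onboard() {
  local kubectl="${KUBECTL:-kubectl}"
  local ns="${1:-nemoclaw}" timeout="${2:-900}" interval="${3:-10}"
  local waited=0 logs
  while [ "$waited" -lt "$timeout" ]; do
    # Errors are expected while the container is still being created.
    logs="$("$kubectl" logs nemoclaw -n "$ns" -c workspace 2>/dev/null || true)"
    case "$logs" in
      *"Onboard complete"*)
        echo "onboard complete after ${waited}s"
        return 0
        ;;
      *"Docker not ready"*)
        echo "Docker daemon did not become ready; see the dind container logs" >&2
        return 1
        ;;
    esac
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "timed out after ${timeout}s waiting for onboarding" >&2
  return 1
}
```

For example, `wait_for_onboard nemoclaw 900 10 && kubectl exec -it nemoclaw -n nemoclaw -c workspace -- nemoclaw my-assistant connect`. The 900-second default is a guess; the real install time depends on image pulls and network speed.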
### 3. Connect to Your Sandbox

```bash
kubectl exec -it nemoclaw -n nemoclaw -c workspace -- nemoclaw my-assistant connect
```

You're now in a shell inside the sandbox, where the OpenClaw agent is available. (`my-assistant` is the default `NEMOCLAW_SANDBOX_NAME`; if you changed it, use your own sandbox name here and in the commands below.)

---

## Configuration

Edit the environment variables of the `workspace` container in `nemoclaw-k8s.yaml` before deploying:

| Variable | Required | Description |
|----------|----------|-------------|
| `DYNAMO_HOST` | Yes | Inference endpoint that the socat proxy forwards to, as `host:port` without a scheme (e.g., `vllm-frontend.dynamo.svc:8000`) |
| `NEMOCLAW_ENDPOINT_URL` | Yes | URL the sandbox uses (usually `http://host.openshell.internal:8000/v1`) |
| `COMPATIBLE_API_KEY` | Yes | API key for the endpoint (use `dummy` when Dynamo/vLLM runs without authentication) |
| `NEMOCLAW_MODEL` | Yes | Model name, which must match a model served by the endpoint (e.g., `meta-llama/Llama-3.1-8B-Instruct`) |
| `NEMOCLAW_SANDBOX_NAME` | No | Sandbox name (default: `my-assistant`) |

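Because the quick start applies the manifest straight from GitHub, customizing it means working on a local copy. The sketch below assumes each `value:` line directly follows its `- name:` line, as in the published manifest; `set_env_value` is a hypothetical helper, not part of NemoClaw, and a YAML-aware tool such as `yq` is more robust if you have one.

```bash
# Hypothetical helper: set the value of an env entry in a manifest, in place.
# Assumes the "value:" line directly follows its "- name: <NAME>" line.
# Note: awk -v interprets backslash escapes in VALUE.
# Usage: set_env_value FILE NAME VALUE
set_env_value() {
  local file="$1" name="$2" value="$3" tmp
  tmp="$(mktemp)" || return 1
  awk -v name="$name" -v value="$value" '
    # Rewrite the "value:" line that follows the matching "- name:" line,
    # keeping its original indentation.
    pending && /^[ \t]*value:/ {
      match($0, /^[ \t]*/)
      print substr($0, 1, RLENGTH) "value: \"" value "\""
      pending = 0
      next
    }
    # Remember whether this line is the "- name:" entry we are looking for.
    { pending = ($0 ~ ("^[ \t]*- name: " name "[ \t]*$")); print }
  ' "$file" > "$tmp" && mv "$tmp" "$file"
}
```

For example: download the manifest with `curl -fsSLo nemoclaw-k8s.yaml https://raw.githubusercontent.com/NVIDIA/NemoClaw/main/k8s/nemoclaw-k8s.yaml`, run `set_env_value nemoclaw-k8s.yaml DYNAMO_HOST my-vllm.my-namespace.svc.cluster.local:8000`, then `kubectl apply -f nemoclaw-k8s.yaml`. Kubernetes does not allow the environment of an existing Pod to be changed, so if the pod is already running, delete it first with `kubectl delete pod nemoclaw -n nemoclaw`.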
### Example: Custom Endpoint

```yaml
env:
  - name: DYNAMO_HOST
    value: "my-vllm.my-namespace.svc.cluster.local:8000"
  - name: NEMOCLAW_ENDPOINT_URL
    value: "http://host.openshell.internal:8000/v1"
  - name: COMPATIBLE_API_KEY
    value: "dummy"
  - name: NEMOCLAW_MODEL
    value: "mistralai/Mistral-7B-Instruct-v0.3"
```

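The manifest passes `DYNAMO_HOST` verbatim to socat as `TCP:$DYNAMO_HOST`, so it must be `host:port` with no `http://` scheme or `/v1` path; only `NEMOCLAW_ENDPOINT_URL` takes a full URL. The hypothetical check below (not part of NemoClaw) catches that mix-up before deploying. It does not verify that the host resolves or that the port is open.

```bash
# Hypothetical pre-flight check for DYNAMO_HOST: socat's TCP: address expects
# host:port, so reject URL schemes, paths, and missing or invalid ports.
check_dynamo_host() {
  local value="$1" host port
  case "$value" in
    */*) echo "error: expected host:port without a scheme or path, got '$value'" >&2; return 1 ;;
    *:*) ;;
    *) echo "error: missing :port in '$value'" >&2; return 1 ;;
  esac
  host="${value%:*}"    # everything before the last colon
  port="${value##*:}"   # everything after the last colon
  case "$port" in
    ""|*[!0-9]*) echo "error: port must be numeric in '$value'" >&2; return 1 ;;
  esac
  if [ -z "$host" ] || [ "${#port}" -gt 5 ] || [ "$port" -lt 1 ] || [ "$port" -gt 65535 ]; then
    echo "error: invalid host or port in '$value'" >&2
    return 1
  fi
  echo "host=$host port=$port"
}
```

For example, `check_dynamo_host my-vllm.my-namespace.svc.cluster.local:8000` prints the parsed host and port, while `check_dynamo_host http://my-vllm:8000/v1` fails.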
---

## Using NemoClaw

### Access the Workspace Shell

```bash
kubectl exec -it nemoclaw -n nemoclaw -c workspace -- bash
```

### Check Sandbox Status

```bash
kubectl exec nemoclaw -n nemoclaw -c workspace -- nemoclaw list
kubectl exec nemoclaw -n nemoclaw -c workspace -- nemoclaw my-assistant status
```

### Connect to Sandbox

```bash
kubectl exec -it nemoclaw -n nemoclaw -c workspace -- nemoclaw my-assistant connect
```

### Test Inference

From inside the sandbox (if you changed `NEMOCLAW_MODEL`, use that model name in the request body):

```bash
curl -s https://inference.local/v1/models

curl -s https://inference.local/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"meta-llama/Llama-3.1-8B-Instruct","messages":[{"role":"user","content":"Hello!"}],"max_tokens":50}'
```

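The request body's `model` must name a model the endpoint actually serves, so a hard-coded name goes stale as soon as `NEMOCLAW_MODEL` changes. A small hypothetical helper (not part of NemoClaw) can build the body from a model name and a single-line prompt. It escapes only backslashes and double quotes, so prompts containing newlines or other control characters need a real JSON encoder such as `jq`.

```bash
# Hypothetical helper: print a chat-completions request body.
# Usage: chat_request_body MODEL PROMPT [MAX_TOKENS]
# Only backslashes and double quotes in PROMPT are escaped (single-line prompts).
chat_request_body() {
  local model="$1" prompt="$2" max_tokens="${3:-50}" escaped
  # Prefix every backslash and double quote with a backslash.
  escaped="$(printf '%s' "$prompt" | sed 's/[\\"]/\\&/g')"
  printf '{"model":"%s","messages":[{"role":"user","content":"%s"}],"max_tokens":%d}\n' \
    "$model" "$escaped" "$max_tokens"
}
```

Inside the sandbox it would be used as `curl -s https://inference.local/v1/chat/completions -H "Content-Type: application/json" -d "$(chat_request_body meta-llama/Llama-3.1-8B-Instruct 'What is 7 times 8?')"`.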
### Verify Local Inference

Confirm that NemoClaw is using your Dynamo/vLLM endpoint:

```bash
# 1. Check the model from inside the sandbox
kubectl exec -it nemoclaw -n nemoclaw -c workspace -- nemoclaw my-assistant connect
# ...then, at the sandbox@my-assistant prompt:
curl -s https://inference.local/v1/models
# Should show your model (e.g., meta-llama/Llama-3.1-8B-Instruct)

# 2. Compare with the endpoint as seen from the workspace container (through socat)
kubectl exec nemoclaw -n nemoclaw -c workspace -- curl -s http://localhost:8000/v1/models
# Should show the same model

# 3. Check the provider configuration
kubectl exec nemoclaw -n nemoclaw -c workspace -- openshell inference get
# Shows: Provider: compatible-endpoint, Model: <your-model>

# 4. Test the agent, at the sandbox@my-assistant prompt
openclaw agent --agent main -m "What is 7 times 8?"
# Should respond with 56
```

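To make the model comparison scriptable rather than visual, a hypothetical helper can check whether a `/v1/models` response lists a given model ID. It assumes the OpenAI-compatible response shape, where each entry of the `data` array has an `id` field, and uses plain text matching instead of a JSON parser, so treat it as a quick smoke test.

```bash
# Hypothetical helper: succeed if the /v1/models JSON on stdin lists model id $1.
model_served() {
  # Remove all whitespace so compact and pretty-printed JSON compare the same.
  tr -d '[:space:]' | grep -qF "\"id\":\"$1\""
}
```

For example: `kubectl exec nemoclaw -n nemoclaw -c workspace -- curl -s http://localhost:8000/v1/models | model_served meta-llama/Llama-3.1-8B-Instruct && echo "model is served"`.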
---

## Architecture

```text
┌───────────────────────────────────────────────────────────────┐
│ Kubernetes Cluster                                            │
│                                                               │
│ ┌──────────────────────────────────────────────────────────┐  │
│ │ NemoClaw Pod                                             │  │
│ │                                                          │  │
│ │ ┌─────────────────┐    ┌─────────────────────────────┐   │  │
│ │ │ Docker-in-Docker│    │ Workspace Container         │   │  │
│ │ │                 │    │                             │   │  │
│ │ │ ┌───────────┐   │    │ nemoclaw CLI                │   │  │
│ │ │ │ k3s       │   │◄───│ openshell CLI               │   │  │
│ │ │ │ cluster   │   │    │                             │   │  │
│ │ │ │           │   │    │ socat proxy ────────────────│───│──┼──► Dynamo/vLLM
│ │ │ │ ┌───────┐ │   │    │   localhost:8000            │   │  │
│ │ │ │ │Sandbox│ │   │    │                             │   │  │
│ │ │ │ └───────┘ │   │    │ host.openshell.internal     │   │  │
│ │ │ └───────────┘   │    │   routes to socat           │   │  │
│ │ └─────────────────┘    └─────────────────────────────┘   │  │
│ └──────────────────────────────────────────────────────────┘  │
└───────────────────────────────────────────────────────────────┘
```

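**How it works:**

1. NemoClaw runs in a privileged pod with Docker-in-Docker.
2. OpenShell creates a nested k3s cluster for sandbox isolation.
3. A socat proxy in the workspace container listens on port 8000 and forwards connections to `DYNAMO_HOST`, a Kubernetes service address, bridging cluster DNS into the nested environment.
4. Inside the sandbox, `host.openshell.internal:8000` routes to the socat proxy and on to the inference endpoint. The workspace container also maps `host.openshell.internal` to `127.0.0.1` in its own `/etc/hosts`, so the installer's validation step reaches socat through the same URL.

---
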
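## Troubleshooting

### Pod won't start

```bash
kubectl describe pod nemoclaw -n nemoclaw
```

Common issues:

- Missing privileged security context on the `dind` container, or privileged pods rejected by the cluster's admission policy (in that case `kubectl apply` reports the error and the pod is never created)
- Insufficient node resources: the pod requests 12Gi of memory and 4 CPUs in total (8Gi and 2 CPUs for `dind`, 4Gi and 2 CPUs for `workspace`)
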
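The scheduler places the pod using its total requests, so at least one node needs 12Gi of allocatable memory. The hypothetical helpers below (names are illustrative) convert Kubernetes memory quantities to bytes and filter a list of `name quantity` lines. They handle only integer quantities with the common suffixes, and allocatable memory does not account for pods already running on a node, so a node that passes may still be too full.

```bash
# Hypothetical helper: convert an integer Kubernetes memory quantity
# (for example 16Gi, 16384Mi, 16777216Ki, or plain bytes) to bytes.
mem_to_bytes() {
  local q="$1" num unit
  num="${q%%[!0-9]*}"    # leading digits
  unit="${q#"$num"}"     # whatever suffix follows them
  [ -n "$num" ] || { echo "unsupported quantity: $q" >&2; return 1; }
  case "$unit" in
    "") echo "$num" ;;
    Ki) echo $((num * 1024)) ;;
    Mi) echo $((num * 1024 * 1024)) ;;
    Gi) echo $((num * 1024 * 1024 * 1024)) ;;
    Ti) echo $((num * 1024 * 1024 * 1024 * 1024)) ;;
    k)  echo $((num * 1000)) ;;
    M)  echo $((num * 1000 * 1000)) ;;
    G)  echo $((num * 1000 * 1000 * 1000)) ;;
    *)  echo "unsupported quantity: $q" >&2; return 1 ;;
  esac
}

# Read "node-name memory-quantity" lines on stdin and print the nodes whose
# quantity is at least the given minimum (for example 12Gi).
nodes_with_memory() {
  local min_bytes name qty bytes
  min_bytes="$(mem_to_bytes "$1")" || return 1
  while read -r name qty; do
    bytes="$(mem_to_bytes "$qty")" || continue
    if [ "$bytes" -ge "$min_bytes" ]; then
      echo "$name"
    fi
  done
}
```

For example: `kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name} {.status.allocatable.memory}{"\n"}{end}' | nodes_with_memory 12Gi`.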
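### Docker daemon not starting

```bash
kubectl logs nemoclaw -n nemoclaw -c dind
```

The daemon usually becomes ready after 30-60 seconds. The workspace container waits up to 60 seconds (30 checks, 2 seconds apart) and exits with `Docker not ready` if the daemon is still unavailable. Because the pod uses `restartPolicy: Never`, delete and re-apply it after such a failure.
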
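If you script this check, a generic retry loop keeps the wait bounded. The hypothetical `retry` helper below (not part of NemoClaw) mirrors the manifest's own wait of 30 attempts, 2 seconds apart, and can run `docker info` in the `dind` container, whose image ships the Docker CLI.

```bash
# Hypothetical helper: run a command up to ATTEMPTS times, sleeping DELAY
# seconds between failed attempts. Returns 0 on the first success, 1 otherwise.
# Usage: retry ATTEMPTS DELAY COMMAND [ARGS...]
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # No need to sleep after the final failed attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

For example: `retry 30 2 kubectl exec nemoclaw -n nemoclaw -c dind -- docker info >/dev/null 2>&1 && echo "Docker ready"`.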
### Inference not working

Check that socat is running:

```bash
kubectl exec nemoclaw -n nemoclaw -c workspace -- pgrep -a socat
```

Test the endpoint from the workspace container (this request goes through socat to `DYNAMO_HOST`):

```bash
kubectl exec nemoclaw -n nemoclaw -c workspace -- curl -s http://localhost:8000/v1/models
```

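The two checks above, plus the `/etc/hosts` entry the startup script adds, can be run in sequence so the first broken hop is reported. `ws_exec` and `diagnose_inference` are hypothetical helpers (not part of NemoClaw). A passing result only shows that the workspace container can reach the endpoint; it does not prove that OpenShell's routing inside the sandbox is correct.

```bash
# Hypothetical helper: run a command in the workspace container.
# Set KUBECTL to override kubectl (for example with a stub when testing).
ws_exec() {
  local ns="$1"
  shift
  "${KUBECTL:-kubectl}" exec nemoclaw -n "$ns" -c workspace -- "$@"
}

# Hypothetical diagnostic: run the checks in order and report the first failure.
diagnose_inference() {
  local ns="${1:-nemoclaw}"
  if ! ws_exec "$ns" pgrep -a socat >/dev/null 2>&1; then
    echo "FAIL: socat is not running in the workspace container"
    return 1
  fi
  if ! ws_exec "$ns" grep -q host.openshell.internal /etc/hosts >/dev/null 2>&1; then
    echo "FAIL: /etc/hosts in the workspace has no host.openshell.internal entry"
    return 1
  fi
  if ! ws_exec "$ns" curl -sf --max-time 10 http://localhost:8000/v1/models >/dev/null 2>&1; then
    echo "FAIL: nothing answered /v1/models through socat; check DYNAMO_HOST"
    return 1
  fi
  echo "OK: socat is running and the endpoint answers /v1/models"
}
```

Run it as `diagnose_inference nemoclaw` from any machine with `kubectl` access to the cluster.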
---

## Learn More

- [NemoClaw Documentation](https://docs.nvidia.com/nemoclaw)
- [OpenShell](https://github.com/NVIDIA/OpenShell)
- [Dynamo](https://github.com/ai-dynamo/dynamo)
- [OpenClaw](https://openclaw.ai)

**`k8s/nemoclaw-k8s.yaml`**

```yaml
# NemoClaw on Kubernetes
# Uses official installer with Docker-in-Docker for sandbox isolation.
# Prerequisites: kubectl create namespace nemoclaw
apiVersion: v1
kind: Pod
metadata:
  name: nemoclaw
  namespace: nemoclaw
  labels:
    app: nemoclaw
spec:
  containers:
```

> **Review comment (Contributor) on lines +11 to +12:** Disable automatic service-account token mount for this pod. The pod does not call the Kubernetes API; it only runs Docker-in-Docker and the NemoClaw installer. Mounting a token by default increases the blast radius if the pod is compromised. Proposed hardening:
>
> ```diff
>  spec:
> +  automountServiceAccountToken: false
>    containers:
> ```
>
> Trivy (0.69.3) also reported rule KSV-0118 (IaC/Kubernetes) on lines 11-119: pod `nemoclaw` in the `nemoclaw` namespace uses the default security context, which allows root privileges.

```yaml
    # Docker daemon (DinD)
    - name: dind
      image: docker:24-dind
      securityContext:
        privileged: true
      env:
        - name: DOCKER_TLS_CERTDIR
          value: ""
      command: ["dockerd", "--host=unix:///var/run/docker.sock"]
      volumeMounts:
        - name: docker-storage
          mountPath: /var/lib/docker
        - name: docker-socket
          mountPath: /var/run
        - name: docker-config
          mountPath: /etc/docker
      resources:
        requests:
          memory: "8Gi"
          cpu: "2"

    # Workspace - runs official NemoClaw installer
    - name: workspace
      image: node:22
      command:
        - bash
        - -c
        - |
          set -e

          # Install packages
          echo "[1/4] Installing packages..."
          apt-get update -qq
          apt-get install -y -qq docker.io socat curl >/dev/null 2>&1

          # Start socat proxy for K8s DNS bridge
          echo "[2/4] Starting socat proxy..."
          socat TCP-LISTEN:8000,fork,reuseaddr TCP:$DYNAMO_HOST &
          # Add hosts entry so validation can reach socat via host.openshell.internal
          echo "127.0.0.1 host.openshell.internal" >> /etc/hosts
          sleep 1
```

> **Review comment (Contributor) on lines +50 to +53:** Fail fast if the socat bridge does not start. Proposed reliability check:
>
> ```diff
> -          socat TCP-LISTEN:8000,fork,reuseaddr TCP:$DYNAMO_HOST &
> +          : "${DYNAMO_HOST:?DYNAMO_HOST must be set as host:port}"
> +          socat TCP-LISTEN:8000,fork,reuseaddr TCP:$DYNAMO_HOST &
> +          SOCAT_PID=$!
>            # Add hosts entry so validation can reach socat via host.openshell.internal
>            echo "127.0.0.1 host.openshell.internal" >> /etc/hosts
>            sleep 1
> +          kill -0 "$SOCAT_PID" 2>/dev/null || { echo "socat failed to start"; exit 1; }
> ```

```yaml
          # Wait for Docker
          echo "[3/4] Waiting for Docker daemon..."
          for i in $(seq 1 30); do
            if docker info >/dev/null 2>&1; then break; fi
            sleep 2
          done
          docker info >/dev/null 2>&1 || { echo "Docker not ready"; exit 1; }
          echo "Docker ready"

          # Run official NemoClaw installer
          echo "[4/4] Running NemoClaw installer..."
          curl -fsSL https://nvidia.com/nemoclaw.sh | bash
```

*rwipfelnv marked this conversation as resolved.*

```yaml
          # Keep running after onboard
          echo "Onboard complete. Container staying alive."
          exec sleep infinity
      env:
        - name: DOCKER_HOST
          value: unix:///var/run/docker.sock
        # Dynamo endpoint (raw host:port for socat) - UPDATE THIS FOR YOUR CLUSTER
        - name: DYNAMO_HOST
          value: "vllm-agg-frontend.dynamo.svc.cluster.local:8000"
        # NemoClaw config (uses host.openshell.internal via socat)
        - name: NEMOCLAW_NON_INTERACTIVE
          value: "1"
        - name: NEMOCLAW_PROVIDER
          value: "custom"
        - name: NEMOCLAW_ENDPOINT_URL
          value: "http://host.openshell.internal:8000/v1"
        - name: COMPATIBLE_API_KEY
          value: "dummy"
        - name: NEMOCLAW_MODEL
          value: "meta-llama/Llama-3.1-8B-Instruct"
        - name: NEMOCLAW_SANDBOX_NAME
          value: "my-assistant"
        - name: NEMOCLAW_POLICY_MODE
          value: "skip"
      volumeMounts:
        - name: docker-socket
          mountPath: /var/run
        - name: docker-config
          mountPath: /etc/docker
      resources:
        requests:
          memory: "4Gi"
          cpu: "2"

  initContainers:
    # Configure Docker daemon for cgroup v2
    - name: init-docker-config
      image: busybox
      command: ["sh", "-c", "echo '{\"default-cgroupns-mode\":\"host\"}' > /etc/docker/daemon.json"]
      volumeMounts:
        - name: docker-config
          mountPath: /etc/docker

  volumes:
    - name: docker-storage
      emptyDir: {}
    - name: docker-socket
      emptyDir: {}
    - name: docker-config
      emptyDir: {}
```

*rwipfelnv marked this conversation as resolved.*

```yaml
  restartPolicy: Never
```