Controller tool for ISV Lab cluster lifecycle orchestration.
isvctl is the unified tool for validating GPU clusters. It wraps the
internal isvtest engine and provides three phases:
- Setup: Run inventory stubs that query or set up clusters
- Test: Execute validation tests against the cluster
- Teardown: Clean up resources (runs by default, even after test failures; see teardown behavior)
# From workspace root
uv sync
# Verify installation
uv run isvctl --help

# Validate a Kubernetes cluster
isvctl test run -f isvctl/configs/suites/k8s.yaml
# Validate a local MicroK8s
isvctl test run -f isvctl/configs/providers/microk8s.yaml
# Validate a local Minikube
isvctl test run -f isvctl/configs/providers/minikube.yaml
# Validate a local k3s
isvctl test run -f isvctl/configs/providers/k3s.yaml
# Validate a Slurm cluster
isvctl test run -f isvctl/configs/suites/slurm.yaml
# Create a provider scaffold
isvctl provider scaffold acme
# Check local readiness before a run
isvctl doctor -f isvctl/configs/suites/k8s.yaml
# Pass extra pytest args
isvctl test run -f isvctl/configs/suites/k8s.yaml -- -v -s -k "NodeCount"

isvctl/
├── configs/
│   ├── suites/            # Provider-agnostic validation contracts (vm.yaml, bare_metal.yaml, ...)
│   ├── providers/         # Per-provider configs and scripts
│   │   ├── aws/
│   │   │   ├── config/    # AWS YAML bindings (import suite + supply commands)
│   │   │   └── scripts/   # AWS lifecycle scripts (boto3/Terraform implementations)
│   │   ├── my-isv/
│   │   │   ├── config/    # my-isv YAML bindings (copy-and-fill-in starting point)
│   │   │   └── scripts/   # my-isv lifecycle scripts (copy-and-fill-in stubs)
│   │   └── common/        # Shared scripts used across providers (NIM deploy/teardown)
│   └── overrides.yaml     # Example override file for customizing any suite
├── schemas/               # JSON Schema for validation
├── scripts/               # Helper scripts
├── src/                   # isvctl Python source
└── tests/                 # Unit tests

Use isvctl doctor before longer runs or in CI to check local tools,
environment variables, and config files.
# Check tools, environment, and default config discovery
isvctl doctor
# Validate a config file before running it
isvctl doctor -f isvctl/configs/suites/k8s.yaml
# Require provider-specific checks such as AWS tools and credentials
isvctl doctor --provider aws -f isvctl/configs/providers/aws/config/control-plane.yaml
# Machine-readable output; use --strict to treat warnings as failures
isvctl doctor --json --strict

# Full lifecycle: setup (query inventory) -> test -> teardown
isvctl test run -f isvctl/configs/suites/k8s.yaml
# Run only the test phase (skip inventory query)
isvctl test run -f isvctl/configs/suites/k8s.yaml --phase test
# Run only teardown (cleanup from a previous run)
isvctl test run -f isvctl/configs/suites/k8s.yaml --phase teardown
# Dry run - validate config without executing
isvctl test run -f isvctl/configs/suites/k8s.yaml --dry-run
# Verbose with pytest options
isvctl test run -f isvctl/configs/suites/k8s.yaml -- -v -s --tb=short

# Base config + overrides
isvctl test run \
-f base.yaml \
-f overrides.yaml
# Override context values
isvctl test run -f config.yaml --set context.node_count=8

# Check configuration syntax and schema
isvctl test validate -f isvctl/configs/suites/k8s.yaml

See Configuration Guide for full details.
version: "1.0"
commands:
  kubernetes:
    phases: ["setup", "test", "teardown"]
    steps:
      - name: setup
        phase: setup
        command: "my-isv/scripts/k8s/setup.sh" # replace "my-isv" with your ISV name
        timeout: 120
      - name: teardown
        phase: teardown
        command: "my-isv/scripts/k8s/teardown.sh" # replace "my-isv" with your ISV name
        timeout: 30
tests:
  platform: kubernetes
  cluster_name: "{{steps.setup.cluster_name}}"
  validations:
    kubernetes:
      checks:
        K8sNodeCountCheck:
          count: "{{steps.setup.kubernetes.node_count}}"
        K8sGpuCapacityCheck:
          expected_total: "{{steps.setup.kubernetes.total_gpus}}"

Setup stubs must output JSON to stdout:
{
  "platform": "kubernetes",
  "cluster_name": "my-cluster",
  "kubernetes": {
    "driver_version": "580.95.05",
    "node_count": 4,
    "nodes": ["node1", "node2", "node3", "node4"],
    "gpu_node_count": 4,
    "gpu_per_node": 4,
    "total_gpus": 16,
    "gpu_operator_namespace": "nvidia-gpu-operator",
    "runtime_class": "nvidia",
    "gpu_resource_name": "nvidia.com/gpu"
  }
}

This output is validated and becomes the {{inventory.*}} available in templates.
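
To make the link between stub output and template placeholders concrete, here is a minimal Python sketch that resolves placeholders of the form used in the example config (such as {{steps.setup.kubernetes.node_count}}) against a stub's JSON. It is not isvctl's actual template engine: the `lookup` and `render` helpers are hypothetical, and nesting the stub output under `steps.setup` is an assumption based on the example config.

```python
import json
import re

# Matches placeholders such as {{steps.setup.cluster_name}}.
PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def lookup(context, dotted_path):
    """Walk a nested dict along a dotted path (hypothetical helper)."""
    value = context
    for key in dotted_path.split("."):
        value = value[key]  # KeyError if the stub did not emit this field
    return value


def render(template, context):
    """Replace each {{path}} in template with the matching context value."""
    return PLACEHOLDER.sub(lambda m: str(lookup(context, m.group(1))), template)


# Example stub output: a subset of the fields in the JSON above.
stub_stdout = (
    '{"platform": "kubernetes", "cluster_name": "my-cluster", '
    '"kubernetes": {"node_count": 4, "total_gpus": 16}}'
)

# Assumption: output of the step named "setup" is exposed under steps.setup.
context = {"steps": {"setup": json.loads(stub_stdout)}}

print(render("{{steps.setup.cluster_name}}", context))
print(render("{{steps.setup.kubernetes.total_gpus}}", context))
```

In this sketch a placeholder that names a field the stub did not emit fails with a KeyError rather than rendering an empty string, so a missing inventory field surfaces before any check runs.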
Generate provider-specific lifecycle scripts with isvctl provider scaffold <your-isv-name>, then implement the TODO blocks under isvctl/configs/providers/<your-isv-name>/scripts/ (e.g. isvctl/configs/providers/acme/scripts/k8s/setup.sh). The providers directory contains:
- isvctl/configs/providers/my-isv/scripts/ - source template scripts for every scaffolded domain
- isvctl/configs/providers/aws/scripts/ - fully implemented AWS reference (follow its layout and JSON output contracts)
- isvctl/configs/providers/shared/ - cross-provider YAML-invoked scripts (deploy_nim.py, teardown_nim.py)

Stubs can be written in any language. They must:
- Output valid JSON to stdout (for inventory/setup commands)
- Exit with code 0 on success, non-zero on failure
- Write logs/errors to stderr (not stdout)
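
Seen from the caller's side, the three rules above could be enforced roughly as in the following Python sketch. It is not isvctl's actual runner: `run_stub` is a hypothetical helper, and the stand-in stub is an inline Python one-liner used only so the example is self-contained.

```python
import json
import subprocess
import sys


def run_stub(command, timeout=120):
    """Run a stub and return its parsed JSON output (hypothetical helper).

    Raises RuntimeError on a non-zero exit code and ValueError when
    stdout is not valid JSON.
    """
    result = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
    if result.returncode != 0:
        # stderr carries the stub's logs, so include it in the error.
        raise RuntimeError(
            f"stub exited with {result.returncode}: {result.stderr.strip()}"
        )
    try:
        return json.loads(result.stdout)
    except json.JSONDecodeError as exc:
        raise ValueError(f"stub stdout is not valid JSON: {exc}") from exc


# Stand-in stub: logs to stderr, prints JSON to stdout, exits 0.
good_stub = [
    sys.executable,
    "-c",
    "import json, sys; print('querying cluster', file=sys.stderr); "
    "print(json.dumps({'platform': 'kubernetes', 'cluster_name': 'demo'}))",
]

inventory = run_stub(good_stub)
print(inventory["cluster_name"])
```

The sketch also shows why logs belong on stderr: any log line written to stdout would be mixed into the JSON and make parsing fail.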

#!/bin/bash
# setup.sh - Query a real cluster
set -euo pipefail  # exit non-zero if kubectl or jq fails

kubectl get nodes -o json | jq '{
  platform: "kubernetes",
  cluster_name: "my-cluster",
  kubernetes: {
    node_count: (.items | length),
    nodes: [.items[].metadata.name],
    total_gpus: ([.items[].status.capacity."nvidia.com/gpu" // 0 | tonumber] | add)
  }
}'

#!/usr/bin/env python3
import json
import subprocess

# Set up the cluster using the ISV provisioning tool.
# text=True decodes stdout to str (bytes are not JSON-serializable);
# check=True raises on a non-zero exit so the stub fails too.
result = subprocess.run(
    ["isv-tool", "setup", "--nodes", "4"],
    capture_output=True, text=True, check=True,
)
cluster_id = result.stdout.strip()

# Output inventory JSON
print(json.dumps({
    "platform": "kubernetes",
    "cluster_name": cluster_id,
    "kubernetes": {
        "node_count": 4,
        "total_gpus": 16
    }
}))

See Remote Deployment Guide for full details.
# Deploy and run tests on remote machine
uv run isvctl deploy run 192.168.1.100 -u ubuntu -f isvctl/configs/suites/k8s.yaml
# With jumphost
uv run isvctl deploy run 192.168.1.100 -j jumphost.example.com -u ubuntu -f isvctl/configs/suites/k8s.yaml

# Run tests
uv --directory=isvctl run pytest
# Run linter
uvx pre-commit run -a
# Regenerate JSON schemas from Pydantic models
uv --directory=isvctl run python scripts/check_schemas.py --generate