This guide explains how to create custom validation tests using the AI Cloud Validation framework without modifying the repository. You can use isvctl as a standalone tool with your own scripts and configuration files.
For the full configuration reference, see the Configuration Guide.
The AI Cloud Validation framework uses a step-based architecture:
Config (YAML) --> Script (any language) --> JSON output --> Validations (assertions)
- Scripts do the work - Launch VMs, create clusters, test APIs (Python, Bash, Go, etc.)
- Scripts output JSON - Structured results to stdout
- Validations check JSON - Built-in assertion classes verify the output
# From source
git clone git@github.com:NVIDIA/ai-cloud-validation.git
cd ai-cloud-validation
uv sync

my-validations/
├── config.yaml # Your validation config
├── scripts/
│ ├── provision.py # Setup script
│ └── teardown.py # Cleanup script
└── README.md
uv run isvctl test run -f config.yaml

Scripts must:
- Output valid JSON to stdout (this is captured and validated)
- Exit 0 for success, non-zero for failure
- Write logs/errors to stderr (only stdout is captured as JSON)
- Include success and platform fields in the JSON output
#!/usr/bin/env python3
"""Provision cloud resources."""
import argparse
import json
import sys
from typing import Any


def main() -> int:
    parser = argparse.ArgumentParser(description="Provision cluster")
    parser.add_argument("--name", required=True, help="Cluster name")
    parser.add_argument("--region", default="us-west-2")
    args = parser.parse_args()
    result: dict[str, Any] = {
        "success": False,
        "platform": "kubernetes",  # or "vm", "network", etc.
    }
    try:
        # Your provisioning logic here
        result["success"] = True
        result["cluster_name"] = args.name
        result["node_count"] = 3
        result["endpoint"] = f"https://{args.name}.example.com"
    except Exception as e:
        print(f"Error: {e}", file=sys.stderr)
        result["error"] = str(e)
    print(json.dumps(result, indent=2))
    return 0 if result["success"] else 1


if __name__ == "__main__":
    sys.exit(main())

Bash, Go, Terraform wrappers, or any other language works, as long as valid JSON goes to stdout. See the Configuration Guide for more script examples.
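The script contract above can be checked mechanically before wiring a script into a config. The following is a minimal sketch, not part of the framework: `check_script_output` is a hypothetical helper, and the rules that the top-level value is an object and that the `success` flag agrees with the exit code are assumptions drawn from the example script.

```python
import json


def check_script_output(stdout: str, exit_code: int) -> list[str]:
    """Return a list of contract violations; an empty list means the output looks OK."""
    try:
        data = json.loads(stdout)
    except json.JSONDecodeError as exc:
        return [f"stdout is not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level JSON value must be an object"]

    problems = []
    for field in ("success", "platform"):
        if field not in data:
            problems.append(f"missing required field: {field}")
    # Assumption: exit code 0 should go together with success == true.
    if "success" in data and bool(data["success"]) != (exit_code == 0):
        problems.append(
            f"success={data['success']!r} disagrees with exit code {exit_code}"
        )
    return problems


print(check_script_output('{"success": true, "platform": "kubernetes"}', 0))
print(check_script_output('{"success": true}', 0))
```

One way to use it is to run a script with `subprocess.run(..., capture_output=True, text=True)` and pass the captured `stdout` and `returncode` to the helper.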
Here's a minimal working config:
version: "1.0"
commands:
  myplatform:
    phases: ["setup", "test", "teardown"]
    steps:
      - name: launch_instance
        phase: setup
        command: "python3 ./scripts/provision.py"
        args: ["--name", "{{cluster_name}}", "--region", "{{region}}"]
        timeout: 600
      - name: teardown
        phase: teardown
        command: "python3 ./scripts/teardown.py"
        args: ["--instance-id", "{{steps.launch_instance.instance_id}}"]
        timeout: 300
tests:
  platform: myplatform
  cluster_name: "my-validation"
  settings:
    region: "us-west-2"
    cluster_name: "test-cluster"
  validations:
    launch_checks:
      step: launch_instance
      checks:
        - StepSuccessCheck: {}
        - FieldExistsCheck:
            fields: ["instance_id", "public_ip"]
        - InstanceStateCheck:
            expected_state: "running"
    ssh_checks:
      step: launch_instance
      checks:
        - ConnectivityCheck: {}
        - GpuCheck:
            expected_gpus: 1
    teardown_checks:
      step: teardown
      checks:
        - StepSuccessCheck: {}

For full details on step options, template variables, schemas, and validation configuration, see the Configuration Guide.
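To make the template references in the config concrete, here is a rough sketch of how a placeholder such as `{{steps.launch_instance.instance_id}}` could be resolved against earlier step output. This is not the framework's implementation. The `render` function, the regex, and the layout of `context` are all hypothetical, and the instance ID is made up for illustration.

```python
import re
from typing import Any

# Matches {{ dotted.path }} placeholders (illustrative syntax handling only).
_PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def render(template: str, context: dict[str, Any]) -> str:
    """Replace each {{dotted.path}} with the value found by walking context."""

    def lookup(match: re.Match) -> str:
        value: Any = context
        for key in match.group(1).split("."):
            value = value[key]  # a KeyError here means the variable is undefined
        return str(value)

    return _PLACEHOLDER.sub(lookup, template)


# Hypothetical context: settings plus the JSON output of earlier steps.
context = {
    "region": "us-west-2",
    "steps": {"launch_instance": {"instance_id": "i-0abc123"}},
}
args = ["--instance-id", "{{steps.launch_instance.instance_id}}"]
print([render(arg, context) for arg in args])
# ['--instance-id', 'i-0abc123']
```

The practical point is that a step can only reference fields that an earlier step actually printed in its JSON output, so the teardown step above depends on the setup script emitting `instance_id`.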
Validations are grouped by meaningful category names. Set step at the group level or override per-check:
validations:
  # Group-level step applies to all checks in the group
  setup_checks:
    step: setup
    checks:
      - StepSuccessCheck: {}
      - ClusterHealthCheck: {}
  # Per-check step overrides
  mixed:
    step: setup  # default
    checks:
      - ClusterHealthCheck: {}  # uses default step
      - StepSuccessCheck:
          step: teardown  # overrides

For validation timing and phase control, see the Configuration Guide.
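The group-level default and per-check override described above can be pictured as a simple lookup. The sketch below assumes each check is a single-key mapping of check name to options, as in the YAML. `resolve_steps` is a hypothetical name, not a framework function.

```python
from typing import Any, Optional


def resolve_steps(group: dict[str, Any]) -> list[tuple[str, Optional[str]]]:
    """Return (check_name, effective_step) pairs for one validation group."""
    default_step = group.get("step")
    resolved = []
    for entry in group.get("checks", []):
        # Each entry looks like {"CheckName": {...options...}}.
        name, options = next(iter(entry.items()))
        step = (options or {}).get("step", default_step)
        resolved.append((name, step))
    return resolved


mixed = {
    "step": "setup",
    "checks": [
        {"ClusterHealthCheck": {}},
        {"StepSuccessCheck": {"step": "teardown"}},
    ],
}
print(resolve_steps(mixed))
# [('ClusterHealthCheck', 'setup'), ('StepSuccessCheck', 'teardown')]
```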
# Run all phases
isvctl test run -f config.yaml
# Verbose output
isvctl test run -f config.yaml -v
# Run specific phase
isvctl test run -f config.yaml --phase setup
# Run only teardown (cleanup from a previous run)
isvctl test run -f config.yaml --phase teardown
# Merge configs (later overrides earlier)
isvctl test run -f base.yaml -f overrides.yaml
# Filter validations with labels or advanced pytest args
isvctl test run -f config.yaml -- -k "ConnectivityCheck"
isvctl test run -f config.yaml --label gpu
isvctl test run -f config.yaml -- -m "not slow"
# Debug: full output on failure
isvctl test run -f config.yaml -v -- -s --tb=long

Teardown behavior: By default, teardown runs even when setup or test validations fail, ensuring cloud resources are cleaned up. Individual teardown step failures don't block remaining teardown steps (best-effort execution).
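Best-effort execution, as described above, amounts to catching each step's failure and continuing. The following sketch illustrates the idea only. `run_teardown` and the two step functions are hypothetical and say nothing about how isvctl implements it internally.

```python
from typing import Callable


def run_teardown(steps: dict[str, Callable[[], None]]) -> dict[str, str]:
    """Run every teardown step in order, recording failures instead of stopping."""
    outcomes = {}
    for name, step in steps.items():
        try:
            step()
            outcomes[name] = "ok"
        except Exception as exc:  # keep going so later steps still run
            outcomes[name] = f"failed: {exc}"
    return outcomes


def delete_instance() -> None:
    raise RuntimeError("instance not found")


def delete_network() -> None:
    pass  # stands in for a cleanup call that succeeds


print(run_teardown({"delete_instance": delete_instance, "delete_network": delete_network}))
# {'delete_instance': 'failed: instance not found', 'delete_network': 'ok'}
```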
- Always output valid JSON - Even on failure: {"success": false, "error": "..."}
- Log to stderr - Keep stdout clean for JSON
- Use settings for reusable values - region, instance_type, etc.
- Set appropriate timeouts - Account for cloud API latency
- Test scripts manually first - Run standalone to verify JSON output
- Keep teardown idempotent - Safe to re-run
- Never hardcode credentials - Use environment variables or IAM roles
Script output issues:
# Test script manually, verify valid JSON
python ./scripts/provision.py --name test 2>/dev/null | jq .

Schema validation failures - check required fields per schema in the Configuration Guide.
SSH validation failures - ensure step output includes:
- Host: public_ip, host, or ssh_host
- Key: key_file, key_path, or ssh_key_path (must exist, permissions 0600)
- User: user or ssh_user (default: "ubuntu")
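The SSH field requirements listed above can be pre-checked against a step's JSON output before running validations. This is a sketch under assumptions: the helper names are hypothetical, and the order in which alternative field names are tried is a guess, not documented framework behavior.

```python
import os
import stat
from typing import Any, Optional


def first_present(output: dict[str, Any], names: tuple[str, ...]) -> Optional[Any]:
    """Return the first non-empty value among the candidate field names."""
    for name in names:
        if output.get(name):
            return output[name]
    return None


def ssh_user(output: dict[str, Any]) -> str:
    """User falls back to "ubuntu" when neither field is set."""
    return first_present(output, ("user", "ssh_user")) or "ubuntu"


def ssh_problems(output: dict[str, Any]) -> list[str]:
    """List reasons why SSH validation would likely fail for this step output."""
    problems = []
    host = first_present(output, ("public_ip", "host", "ssh_host"))
    key = first_present(output, ("key_file", "key_path", "ssh_key_path"))
    if host is None:
        problems.append("missing host field (public_ip, host, or ssh_host)")
    if key is None:
        problems.append("missing key field (key_file, key_path, or ssh_key_path)")
    elif not os.path.isfile(key):
        problems.append(f"key file does not exist: {key}")
    elif stat.S_IMODE(os.stat(key).st_mode) != 0o600:
        problems.append(f"key file permissions are not 0600: {key}")
    return problems


print(ssh_problems({"host": "10.0.0.1"}))
print(ssh_user({}))
```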
For common validation scenarios, don't write your config from scratch - the repo ships a ready-made scaffold:
- my-isv scaffold - copy-and-fill-in stubs covering IAM, control-plane, VM, bare metal, network, observability, image registry, security, k8s, and Slurm. Each stub has a TODO: block and a demo-mode fallback.
- Test suite contracts - per-step JSON-field breakdown for every domain.
- AWS reference - a working implementation for domains with AWS-backed reference scripts.
Preview the whole pipeline with no cloud:
make demo-test # sets ISVCTL_DEMO_MODE=1 and runs all my-isv configs (~10s)
# Domains are listed in the Makefile MY_ISV_DOMAINS variable.

- Configuration Guide - Full config reference (steps, schemas, validations, templates)
- Validation Test Suites - Provider-agnostic test suites with step-by-step details
- AWS Reference Implementation - Working AWS examples for AWS-backed templates
- isvctl Package - CLI documentation
- Local Development - Development setup