diff --git a/skills/usecase/agent-deploy.md b/skills/usecase/agent-deploy.md new file mode 100644 index 00000000..0548332a --- /dev/null +++ b/skills/usecase/agent-deploy.md @@ -0,0 +1,352 @@ +--- +name: agent-deploy +description: | + Deploy a confidential AI agent to Phala Cloud — a Claude Code wrapper, + Codex agent, MCP server, autonomous bot, anything with sealed API keys + and tool calls. Use when users want to ship an agent with credentials + sealed in a TEE and a verifiable Sign-RPC action log. +--- + +# Confidential AI Agent on Phala Cloud + +`phala deploy` an agent CVM with sealed credentials and a verifiable action log. + +## Operations + +| User says | Operation | +|---|---| +| "deploy an agent", "ship my agent", "上线 agent" | **First Deploy** | +| "scaffold a new agent", "create agent project" | **Scaffold** | +| "seal the API key", "credentials leak", "secrets" | **Seal Secrets** | +| "verify agent identity", "RA-TLS", "TDX quote" | **Verify Identity** | +| "audit tool calls", "Sign-RPC log", "what did the agent do" | **Action Log** | +| "deploy 10 agents", "fleet", "many agents" | **Multi-Agent Fleet** | + +This skill builds on the foundational `../phala-cli/SKILL.md`. Install + login per that skill first. + +--- + +## Scaffold + +A confidential agent is a Docker container that: + +1. Reads sealed credentials from env vars (decrypted only inside the CVM at boot) +2. Calls its tools (OpenAI, GitHub, Slack, etc.) from inside the TEE +3. Emits a Sign-RPC log of every tool call (signature chains to the TDX root) + +### Step 1: Project layout + +```bash +mkdir my-agent && cd my-agent +``` + +``` +my-agent/ +├── docker-compose.yml # CVM definition +├── .env.example # which sealed vars the agent expects +├── .env # local-only, gitignored (real secrets) +├── agent/ +│ ├── Dockerfile +│ ├── requirements.txt +│ └── main.py # the agent loop +└── README.md +``` + +### Step 2: `docker-compose.yml` + +```yaml +services: + agent: + image: ghcr.io//my-agent:latest # publicly pullable, OR set DSTACK_DOCKER_USERNAME/PASSWORD + restart: unless-stopped + environment: + - OPENAI_API_KEY=${OPENAI_API_KEY} + - GITHUB_TOKEN=${GITHUB_TOKEN} + - AGENT_NAME=${AGENT_NAME:-my-agent} + volumes: + - /var/run/dstack.sock:/var/run/dstack.sock # for Sign-RPC + KMS access + ports: + - "8080:8080" +``` + +The `dstack.sock` mount gives the container access to: +- `dstack-guest-agent` for Sign-RPC signing +- KMS to derive per-app keys +- Attestation quotes on demand + +### Step 3: `.env.example` + +``` +OPENAI_API_KEY=sk-replace-me +GITHUB_TOKEN=ghp_replace-me +AGENT_NAME=my-agent +``` + +Commit this. Never commit the real `.env` — that goes to Phala via `-e`. + +--- + +## Seal Secrets + +The `phala deploy -e` flag seals env vars to the registered compose-hash. Stolen ciphertext is useless — keys only re-derive inside an attested CVM whose compose-hash matches. + +### Step 1: Local `.env` + +``` +OPENAI_API_KEY=sk-real-value-here +GITHUB_TOKEN=ghp_real-value-here +AGENT_NAME=my-trading-agent +``` + +### Step 2: Pass at deploy + +```bash +phala deploy -n my-agent -c docker-compose.yml -e .env --kms phala +``` + +Or inline: + +```bash +phala deploy -n my-agent -c docker-compose.yml \ + -e OPENAI_API_KEY=sk-... \ + -e GITHUB_TOKEN=ghp-... \ + --kms phala +``` + +`--kms phala` (default) seals to Phala's managed KMS. For ETH multi-sig gating, use `--kms ethereum` with `--private-key` and `--rpc-url`. 
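Inside the CVM the sealing is transparent to your code — the variables arrive already decrypted, so the agent reads them as ordinary environment variables. A minimal sketch of the agent entrypoint (illustrative; the variable names match `.env.example` above):

```python
# agent/main.py — sealed vars are decrypted at boot inside the TEE;
# plaintext exists only in CVM memory and is never visible to the host.
import os

OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]    # sealed via `phala deploy -e`
GITHUB_TOKEN = os.environ["GITHUB_TOKEN"]
AGENT_NAME = os.getenv("AGENT_NAME", "my-agent")

def main() -> None:
    # ... build your OpenAI / GitHub clients with the unsealed creds here ...
    print(f"{AGENT_NAME} booted with sealed credentials")

if __name__ == "__main__":
    main()
```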
+ +--- + +## First Deploy + +### Step 1: Authenticate + +Per `../phala-cli/SKILL.md`: + +```bash +phala login +``` + +### Step 2: Pick instance type + +Most agents fit a small CPU TEE: + +```bash +phala instance-types +# pick tdx.small ($0.058/hr) for light agents +# tdx.medium for tool-heavy / memory-hungry agents +# h200.small ($3.50/hr) only if the agent runs local inference +``` + +### Step 3: Deploy + +```bash +phala deploy -n my-agent -c docker-compose.yml -e .env -t tdx.medium --kms phala --wait +``` + +`--wait` blocks until the CVM is ready (essential in CI). + +### Step 4: Link the directory + +```bash +phala link +git add phala.toml # safe to commit, contains no secrets +``` + +After `link`, all subsequent `phala` commands target this CVM without `-n`. + +### Step 5: Verify + +```bash +phala ps # containers running? +phala logs -f # agent output +phala cvms attestation # TDX quote — proves the CVM is genuine +``` + +--- + +## Verify Identity + +The CVM's identity is its compose-hash. Every Sign-RPC signature chains to the TDX root + this compose-hash. + +### Pull the attestation + +```bash +# Get the full cert chain + TDX quote +phala cvms attestation --json | jq '.app_certificates[0].quote' + +# Or summary form +phala cvms attestation +``` + +The response shape: + +```json +{ + "success": true, + "is_online": true, + "is_public": true, + "app_certificates": [ + { "subject": {...}, "issuer": {...}, "quote": "0400...", "app_id": "..." }, + ... + ] +} +``` + +The hex `quote` decodes into a TDX quote containing: +- `mrtd` — TDX measurement (the firmware identity) +- `rtmr0..3` — runtime measurements (kernel, initrd, compose hash) +- `report_data` — your app-specific binding + +### Verify offline + +For the full step-by-step verification flow (Intel TDX root + NVIDIA root + report-data binding + compose-hash + Sigstore provenance), follow `verify-attestation.md`. + +The minimum check: + +```bash +phala cvms attestation my-agent --json > attestation.json +QUOTE=$(jq -r '.app_certificates[0].quote' attestation.json) +curl -sX POST "https://cloud-api.phala.com/api/v1/attestations/verify" \ + -H "Content-Type: application/json" \ + -d "{\"hex\": \"$QUOTE\"}" | jq '.quote.verified' +# Expect: true +``` + +A passing quote + matching compose-hash = the running agent IS the build you registered. + +--- + +## Action Log + +Every tool call the agent makes can be signed via Sign-RPC inside the CVM. The signature chains to the per-app key derived from KMS. + +### From inside the agent (Python) + +```python +import socket, json + +def sign_action(payload: dict) -> str: + """Send to dstack-guest-agent over the Unix socket; receive signature.""" + s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) + s.connect("/var/run/dstack.sock") + s.sendall(json.dumps({"method": "sign", "params": payload}).encode()) + return json.loads(s.recv(4096))["signature"] + +# Wrap every tool call +sig = sign_action({"tool": "github.create_issue", "args_hash": "0xab12..."}) +``` + +### Read the log later + +The signed log is emitted to stdout (or your sink). Stream it: + +```bash +phala logs -f --since 1h | jq 'select(.sign_rpc)' +``` + +### Verify the log offline + +Each entry includes a signature that anyone can verify against the per-app pubkey (derived from the compose-hash). + +--- + +## Multi-Agent Fleet + +Deploy N parallel agents, each with their own compose-hash + sealed creds. They attest each other via mutual RA-TLS. 
+ +### Step 1: Per-agent compose + +Each agent gets a slightly different `docker-compose.yml` (different image, different env). Different compose = different `compose-hash` = different identity. + +### Step 2: Deploy in a loop + +```bash +for AGENT in researcher coder triager; do + phala deploy -n $AGENT -c compose/$AGENT.yml -e env/$AGENT.env --wait +done +``` + +### Step 3: Mutual RA-TLS between them + +Each agent CVM gets a public endpoint shaped like +`https://-.` — the exact gateway domain +is per-cluster (e.g. `dstack-pha-prod12.phala.network`). Get it live: + +```bash +phala cvms get my-agent --json | jq -r '.endpoints[0].app' +``` + +Each cert carries the peer's TDX quote in an X.509 extension — TLS handshake +AND attestation in one handshake. + +### Step 4: List the fleet + +```bash +phala apps +# or +phala apps --search trading +``` + +--- + +## Common patterns + +### Wrap an existing CLI agent (Claude Code, Codex) + +The agent runs as a long-lived service that exposes a tool API on port 8080. The tool API: +1. Receives a task from a user +2. Resolves OpenAI / GitHub creds from sealed env +3. Calls Claude Code / Codex internally +4. Emits Sign-RPC log of every tool call + +Compose example: a `claude-code` image + an `nginx` reverse proxy with TLS. + +### MCP server + +Deploy any MCP server image (e.g., `bluenexus/mcp-search`) with mutual RA-TLS. Clients verify the server's TDX quote before sending requests. + +### Pre-launch script + +Need to download model weights or warm a cache before the agent starts? + +```bash +phala deploy ... --pre-launch-script ./bootstrap.sh +``` + +The script runs once inside the CVM after attestation, before containers start. + +--- + +## Troubleshooting + +| Symptom | Cause | Fix | +|---|---|---| +| `manifest unknown` in serial logs | Image not pullable | Push to a public registry, OR set `DSTACK_DOCKER_USERNAME` + `DSTACK_DOCKER_PASSWORD` in `.env` | +| Container restarts immediately | Missing env var | Run `phala logs my-agent` — check for `KeyError` / `undefined` env | +| Agent can't reach `dstack.sock` | Volume not mounted | Add `- /var/run/dstack.sock:/var/run/dstack.sock` to compose | +| Sign-RPC returns 401 | KMS doesn't recognize compose-hash | Re-deploy — first deploy registers the hash | +| Tool calls failing intermittently | Outbound network blocked | Check serial logs (`phala logs --serial`) — `dstack-gateway` allowlist may need updating | + +For deeper debugging, see **Debug a CVM** in `../phala-cli/SKILL.md`. + +--- + +## Reference: minimal end-to-end + +```bash +# 1. Scaffold +mkdir my-agent && cd my-agent +# (write docker-compose.yml + .env) + +# 2. Auth + deploy +phala login +phala deploy -n my-agent -c docker-compose.yml -e .env -t tdx.medium --kms phala --wait + +# 3. Link + verify +phala link +phala cvms attestation --json > attestation.json +phala logs -f +``` + +Done — the agent is live, its credentials are sealed, every tool call is signed, and anyone can verify the binding offline. diff --git a/skills/usecase/cloud-migration.md b/skills/usecase/cloud-migration.md new file mode 100644 index 00000000..0c9db5a2 --- /dev/null +++ b/skills/usecase/cloud-migration.md @@ -0,0 +1,273 @@ +--- +name: cloud-migration +description: | + Migrate a confidential workload from AWS Nitro Enclaves, GCP + Confidential VMs, or Tinfoil to Phala Cloud. Use when users have an + existing TEE workload elsewhere and want to move it — covers auth, + compose adaptation, attestation diff, and cutover. 
+--- + +# Migrate to Phala Cloud + +Port a confidential workload to Phala from another TEE provider. + +## Operations + +| User says | Operation | +|---|---| +| "migrate from AWS Nitro" | **From AWS Nitro** | +| "migrate from GCP CC VM" | **From GCP** | +| "migrate from Tinfoil" | **From Tinfoil** | +| "general migration", "where to start" | **Diff Map** | +| "cutover", "DNS switch" | **Cutover** | + +This skill builds on the foundational `../phala-cli/SKILL.md`. Install + login per that skill first. + +--- + +## Diff Map + +| Concern | AWS Nitro Enclaves | GCP Confidential VM | Tinfoil | Phala Cloud | +|---|---|---|---|---| +| Hardware TEE | AWS Nitro hypervisor | AMD SEV-SNP / Intel TDX | TDX (managed) | Intel TDX + NVIDIA NV-CSE (GPU) | +| Container model | EIF (Enclave Image Format) | Standard VM, you bring TEE-aware images | Custom AMI | Standard `docker-compose.yml` | +| Auth | IAM roles + KMS | gcloud + IAM | proprietary CLI | `phala login` (device flow) | +| Deploy | `nitro-cli build-enclave` + `run-enclave` | `gcloud compute instances create --confidential-compute` | proprietary | `phala deploy -c docker-compose.yml` | +| Secrets | KMS + parent instance | Cloud KMS | proprietary | `phala deploy -e .env --kms phala` (sealed to compose-hash) | +| Attestation | NSM PCRs (PCR0/1/2) | `vTPM` quote | proprietary | TDX quote + on-chain registry | +| Verify offline | AWS-signed PCRs | GCP attestation library | trust the provider | DCAP → Intel root + `DstackApp.sol` (anyone can verify) | +| GPU TEE | not natively | not yet (preview) | yes (limited) | yes (H200 today, more SKUs coming) | +| Multi-party | bilateral DPAs | Confidential Space (workload identity) | n/a | multi-sig DstackApp + on-chain compose-hash | + +The biggest deltas: +1. **Compose vs custom image format** — dstack runs vanilla Docker Compose; no need to build an EIF or AMI. +2. **Sealed env vars** — Phala's `--kms phala` flag seals env to the compose-hash automatically. +3. **Verifiable offline** — Phala attestation chains to Intel/NVIDIA roots + on-chain registry; no need to trust the provider. + +--- + +## From AWS Nitro + +### Step 1: Convert EIF → Docker Compose + +Nitro EIFs are typically built from a `Dockerfile` already. Reuse the same Dockerfile, package as a compose service: + +```yaml +# was: nitro-cli build-enclave --docker-uri myapp:latest +# becomes: +services: + app: + image: myapp:latest + restart: unless-stopped + ports: + - "8080:8080" + volumes: + - /var/run/dstack.sock:/var/run/dstack.sock + environment: + - YOUR_VARS=${YOUR_VARS} +``` + +### Step 2: Migrate KMS + +AWS KMS calls go to your provider's KMS. With Phala: + +```python +# was: boto3.client('kms').decrypt(CiphertextBlob=blob) +# becomes (inside the CVM): +from dstack_sdk import DstackClient +client = DstackClient() # /var/run/dstack.sock +key = client.get_key("aws-migration", compose_hash).decode_key() +plaintext = AESGCM(key).decrypt(nonce, ct, None) +``` + +The key is derived only after attestation passes — equivalent guarantee to KMS gating, but on-chain auditable. 
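Filled out with imports, the same call looks roughly like this — a sketch that mirrors the call shape above and assumes the sealed blob layout is `12-byte nonce || AES-GCM ciphertext` (as in the sealing examples elsewhere in these skills), with `dstack-sdk` and `cryptography` baked into the image:

```python
# unseal.py — runs inside the CVM; replaces the old boto3 KMS decrypt call.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from dstack_sdk import DstackClient

def unseal(path: str, purpose: str = "aws-migration") -> bytes:
    client = DstackClient()                      # talks to /var/run/dstack.sock
    compose_hash = os.environ["COMPOSE_HASH"]
    key = client.get_key(purpose, compose_hash).decode_key()  # 32-byte per-app key
    blob = open(path, "rb").read()
    nonce, ct = blob[:12], blob[12:]
    return AESGCM(key).decrypt(nonce, ct, None)  # raises if key or blob is wrong

secret = unseal("/data/secret.sealed")
```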
+ +### Step 3: Migrate attestation verification + +```diff +- # AWS NSM PCR verification +- nsm-cli describe-pcr --index 0 +- # client checks PCR0 == expected_hash ++ # Phala attestation ++ phala cvms attestation my-app --json > attestation.json ++ # client checks: TDX quote → Intel root, mrtd matches expected, app_id matches +``` + +### Step 4: Deploy to Phala + +```bash +phala login +phala deploy -n my-app -c docker-compose.yml -e .env --kms phala --wait +phala cvms attestation my-app --json +``` + +### Step 5: Update client code + +Clients that previously verified Nitro PCRs now verify Phala attestation. Follow `verify-attestation.md` for the full flow (Intel TDX root + NVIDIA NRAS + report-data binding + compose-hash). Reference implementation: [`Phala-Network/private-ml-sdk/vllm-proxy/verifiers/attestation_verifier.py`](https://github.com/Phala-Network/private-ml-sdk/blob/main/vllm-proxy/verifiers/attestation_verifier.py). + +--- + +## From GCP + +### Step 1: Compose + +GCP Confidential VMs run regular VM images. Your TEE-aware service runs as a systemd unit or a Docker container. Move it to a compose: + +```yaml +services: + app: + image: gcr.io//:tag + # ... env, volumes, ports as before ... +``` + +GCR images work as long as they're publicly pullable, or use `DSTACK_DOCKER_USERNAME/PASSWORD` for private GCR. + +### Step 2: Migrate Cloud KMS calls + +Replace `gcloud kms decrypt` calls with `dstack-sdk` key derivation (see AWS section above). + +### Step 3: Migrate vTPM attestation + +GCP exposes a vTPM quote via `go-attestation`. Phala provides a TDX quote via `phala cvms attestation`. Both chain to a hardware root; the verification API differs: + +```diff +- # GCP go-attestation +- attest.NewClient(...).Attest(...) ++ # Phala ++ phala cvms attestation my-app --json +``` + +### Step 4: Deploy + +```bash +phala login +phala deploy -n my-app -c docker-compose.yml -e .env --kms phala --wait +``` + +For GPU workloads (which GCP doesn't yet support in CC mode), use `-t h200.small`. + +--- + +## From Tinfoil + +Tinfoil is closest in spirit to Phala — managed TDX, OpenAI-compatible inference for some flows. Migration is mostly endpoint swap + (optionally) self-deploy. + +### Inference users + +If you're calling Tinfoil's API: + +```diff +- base_url = "https://inference.tinfoil.sh/v1" ++ base_url = "https://api.redpill.ai/v1" +``` + +Model names may differ — check `https://redpill.ai/models`. + +### Custom-deploy users + +If you're running a custom container on Tinfoil: + +```bash +# was: tinfoil deploy --image myimage:tag +# becomes: +phala deploy -n my-app -c docker-compose.yml --kms phala --wait +``` + +The compose flow is more flexible than Tinfoil's single-image model — multi-service apps, sealed env, GPU TEE all work natively. + +--- + +## Cutover + +A safe cutover keeps both providers running until verification is solid. + +### Step 1: Deploy to Phala in parallel + +```bash +phala deploy -n my-app-phala -c docker-compose.yml -e .env --kms phala --wait +PHALA_URL=$(phala cvms get my-app-phala --json | jq -r '.endpoints[0]') +``` + +### Step 2: Shadow traffic + +Send 1-5% of production traffic to Phala. Compare: +- Latency (Phala TDX overhead is ~3-5%, GPU CC ~5-7%) +- Output equivalence (same model, same input → same output) +- Attestation availability (`/_phala/attestation` should respond on every request) + +### Step 3: Increase traffic gradually + +10% → 50% → 100% over a week. Monitor your metrics. 
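A quick equivalence spot-check you can run during the ramp — a minimal sketch, assuming both deployments accept the same JSON POST and that `requests` is available (adjust the paths, payload, and placeholder URLs to your API):

```python
# shadow_check.py — send the same request to both providers and compare.
import json
import time
import requests

OLD_URL = "https://old.example.com/infer"                  # existing AWS/GCP/Tinfoil endpoint
NEW_URL = "https://<app-id>-8080.<gateway-domain>/infer"   # from `phala cvms get`
PAYLOAD = {"input": "smoke-test"}

def timed_post(url: str) -> tuple[float, dict]:
    t0 = time.time()
    r = requests.post(url, json=PAYLOAD, timeout=30)
    r.raise_for_status()
    return time.time() - t0, r.json()

old_t, old_out = timed_post(OLD_URL)
new_t, new_out = timed_post(NEW_URL)
print(f"latency old={old_t:.2f}s new={new_t:.2f}s (expect ~3-7% TEE overhead)")
print("outputs match" if old_out == new_out
      else json.dumps({"old": old_out, "new": new_out}, indent=2))
```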
+ +### Step 4: Decommission + +Once 100% on Phala for a stable period: + +```bash +# AWS +nitro-cli terminate-enclave --enclave-id + +# GCP +gcloud compute instances delete + +# Tinfoil +tinfoil delete +``` + +### Step 5: Update DNS + +Point your customer-facing DNS to the Phala endpoint. Real format is +`-.` — pull it live from the CVM JSON: + +```bash +phala cvms get my-app-phala --json | jq -r '.endpoints[0].app' +# e.g. https://e029a4b8...-8080.dstack-pha-prod5.phala.network +``` + +``` +api.example.com. CNAME e029a4b8...-8080.dstack-pha-prod5.phala.network. +``` + +`dstack-gateway` can also bind a custom domain via the dashboard so the URL +doesn't expose the app_id. + +--- + +## Common gotchas + +| Provider | Gotcha | Mitigation | +|---|---|---| +| AWS Nitro | App was using parent-instance file system | Move to a Docker volume (Phala persists volumes by default) | +| AWS Nitro | Used IMDS for IAM creds | Switch to sealed env vars via `phala deploy -e` | +| GCP | Hardcoded `metadata.google.internal` | Replace with sealed env or `dstack-gateway`-routed config | +| Tinfoil | Tinfoil-specific signing extensions | Replace with Phala's Sign-RPC (or generic JWT signed by per-app key) | +| All | Outbound network was unrestricted | Phala routes egress via `dstack-gateway` — check allowlist for your destination domains | + +--- + +## Reference: typical migration + +```bash +# 1. Audit your existing workload +# - what TEE primitive (PCR / vTPM / proprietary)? +# - what KMS calls? +# - what is your verification flow? + +# 2. Adapt to Phala primitives +# - compose file from your existing Dockerfile +# - .env with secrets you used to fetch from KMS +# - swap attestation library to dstack-verifier + +# 3. Deploy in shadow +phala login +phala deploy -n my-app -c docker-compose.yml -e .env --kms phala --wait + +# 4. Verify equivalence +phala cvms attestation my-app --json +# A/B test endpoints + +# 5. Cutover +# DNS swap, decommission old enclave/VM +``` + +For provider-specific deep dives, see the comparison pages on `https://phala.com/compare/`. diff --git a/skills/usecase/data-coanalysis.md b/skills/usecase/data-coanalysis.md new file mode 100644 index 00000000..6708a10c --- /dev/null +++ b/skills/usecase/data-coanalysis.md @@ -0,0 +1,298 @@ +--- +name: data-coanalysis +description: | + Set up multi-party cohort analysis on Phala Cloud — multiple data owners + each seal datasets locally, then a sealed Analysis CVM joins them in + TDX+H200 memory under multi-sig DstackApp approval. Use for healthcare + consortia, financial risk, fraud detection, supply-chain audits — any + case where data must stay at source but compute happens jointly. +--- + +# Multi-Party Confidential Cohort Analysis + +Compute-to-data: sealed datasets stay at source, the model travels, multi-owner approval gates every key release. + +## Operations + +| User says | Operation | +|---|---| +| "set up multi-party analysis", "consortium", "data clean room" | **End-to-End** | +| "seal my dataset", "encrypt at source" | **Owner Sealing** | +| "register on-chain", "multi-sig approval" | **Register & Approve** | +| "deploy the analysis CVM" | **Deploy Analysis** | +| "differential privacy aggregate", "DP" | **DP Output** | +| "revoke", "stop the analysis" | **Revoke** | + +This skill builds on the foundational `../phala-cli/SKILL.md`. Install + login per that skill first. 
+ +--- + +## Architecture (skim this once) + +``` +Owner A laptop Owner B laptop +[seal-cli encrypts ds-A.jsonl] [seal-cli encrypts ds-B.jsonl] + | | + +-> S3/IPFS ciphertext blobs <--------+ + | + v + Analysis CVM (TDX + H200) + - reads blobs + - calls KMS via RA-TLS for per-dataset keys + - joins, embeds, runs the model + - emits ONLY contract-allowed output (DP-aggregate) + | + v + DstackApp.sol (multi-sig) + - owners = [A, B, ...] are signers + - compose-hash added only after threshold met + - any owner can withdraw signature → halt all subsequent compute +``` + +Phala is not in the trust chain. Owners verify everything offline. + +--- + +## Owner Sealing + +Each data owner does this on their own laptop. Phala/operator never sees plaintext. + +### Step 1: Get the analysis compose-hash + +The analyst publishes the analysis `docker-compose.yml`. Each owner reviews it (it's small and reviewable; data is large and sensitive). The compose-hash is the contract. + +```bash +# Owner: clone the analyst's repo, review, then compute the hash +sha256sum docker-compose.yml +# 0xa3f2c1... (this is the compose-hash) +``` + +### Step 2: Get the analysis app-id + +After the analyst publishes the compose, the analyst registers it on `DstackApp.sol` (or shares the app-id): + +``` +APP_ID=app_d8e2f1... +COMPOSE_HASH=a3f2c1... +KMS_ROOT_PUBKEY=0x04abc... # from kms.phala.com or hardcoded in dstack +``` + +### Step 3: Seal the dataset locally + +```python +# seal-dataset.py +import os, sys, hashlib, hmac +from cryptography.hazmat.primitives.ciphers.aead import AESGCM + +OWNER_ID = sys.argv[1] # e.g. "hospital-a" +INPUT = sys.argv[2] # e.g. "ehr-data.jsonl" +OUTPUT = sys.argv[3] # e.g. "ehr-data.sealed" + +KMS = bytes.fromhex(os.environ["KMS_ROOT_PUBKEY"][2:]) +APP = os.environ["APP_ID"].encode() +HASH = os.environ["COMPOSE_HASH"].encode() +INFO = b"|".join([APP, HASH, OWNER_ID.encode()]) + +# HKDF-Expand simulation +key = hmac.new(KMS, INFO, hashlib.sha256).digest() +aes = AESGCM(key) +nonce = os.urandom(12) + +plaintext = open(INPUT, "rb").read() +ct = aes.encrypt(nonce, plaintext, INFO) +open(OUTPUT, "wb").write(nonce + ct) +print(f"Sealed {len(plaintext)} bytes -> {OUTPUT} ({len(ct) + 12} bytes)") +``` + +```bash +python seal-dataset.py hospital-a ehr-data.jsonl ehr-data.sealed +``` + +### Step 4: Publish the ciphertext + +Owners ship `*.sealed` blobs to a shared S3 bucket / IPFS / wherever the analysis CVM can read them. Plaintext never leaves the owner's machine. + +--- + +## Register & Approve + +### Step 1: Analyst registers the compose + +```bash +phala deploy -n cohort-analysis -c docker-compose.yml --kms ethereum \ + --private-key $ANALYST_KEY --rpc-url $ETH_RPC \ + --custom-app-id app_cohort_v1 --nonce 1 \ + --prepare-only +``` + +`--prepare-only` produces a commit token (no on-chain transaction yet — for multi-sig flow). + +### Step 2: Owners approve via multisig wallet + +Each owner uses Safe / Gnosis to approve the `DstackApp.addAllowedHash(compose_hash)` transaction. The DstackApp owner is configured as a multi-sig with each data owner as a signer. + +### Step 3: Once threshold met, commit + +```bash +phala deploy --commit --token $COMMIT_TOKEN --transaction-hash $TX +``` + +The compose-hash is now on-chain. The CVM can boot. 
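Before sealing against it, an owner can confirm the hash really landed on-chain. A minimal web3.py sketch — the addresses are placeholders and the ABI fragment is illustrative; use the ABI of the deployed `DstackApp.sol` (the function names follow the checks in the Verify section below):

```python
# check_registration.py — owner-side sanity check of the on-chain allowlist.
from web3 import Web3

RPC_URL = "https://<your-eth-rpc>"
APP_ADDR = "0x<DstackApp address shared by the analyst>"
COMPOSE_HASH = bytes.fromhex("a3f2c1...")   # 32-byte sha256 of docker-compose.yml

ABI = [
    {"name": "allowedHashes", "type": "function", "stateMutability": "view",
     "inputs": [{"name": "h", "type": "bytes32"}], "outputs": [{"type": "bool"}]},
]

w3 = Web3(Web3.HTTPProvider(RPC_URL))
app = w3.eth.contract(address=APP_ADDR, abi=ABI)
assert app.functions.allowedHashes(COMPOSE_HASH).call(), "compose-hash not (yet) allowed"
print("compose-hash is registered — safe to let the CVM derive keys")
```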
+ +--- + +## Deploy Analysis + +### Step 1: Compose with sealed data ingestion + +```yaml +services: + analysis: + image: ghcr.io//cohort-analysis:v1 + environment: + - APP_ID=${APP_ID} + - COMPOSE_HASH=${COMPOSE_HASH} + - SEALED_BLOBS=s3://cohort/sealed/ # paths owners published + - OWNER_LIST=hospital-a,hospital-b # whose keys to derive + volumes: + - /var/run/dstack.sock:/var/run/dstack.sock + deploy: + resources: + reservations: + devices: [{ driver: nvidia, count: all, capabilities: [gpu] }] +``` + +### Step 2: Inside the analysis container + +```python +# analysis.py — pip install dstack-sdk cryptography +import os +from dstack_sdk import DstackClient +from cryptography.hazmat.primitives.ciphers.aead import AESGCM + +client = DstackClient() # /var/run/dstack.sock +owners = os.environ["OWNER_LIST"].split(",") + +for owner in owners: + derived = client.get_key(f"cohort/{owner}", os.environ["COMPOSE_HASH"]) + key = derived.decode_key() # 32 bytes + blob = read_s3(f"s3://cohort/sealed/{owner}.sealed") + aes = AESGCM(key) + plaintext = aes.decrypt(blob[:12], blob[12:], None) + # join into a polars/pandas frame, run the model, etc. +``` + +The keys ONLY re-derive if the running compose-hash matches the on-chain registered hash — i.e. the verifier already passed. + +### Step 3: Deploy + +```bash +phala deploy -n cohort-analysis -c docker-compose.yml \ + -e APP_ID=app_cohort_v1 -e COMPOSE_HASH=$HASH \ + -t h200.small --kms ethereum \ + --private-key $ANALYST_KEY --rpc-url $ETH_RPC --listed --wait +``` + +`--listed` makes the CVM visible on the public Trust Center so owners can independently verify it's running their registered build. + +--- + +## DP Output + +The analysis container should emit ONLY contract-allowed output — typically: +- Aggregate statistics (mean, count, ratio) +- Differential-privacy-noised aggregates +- Embedding vectors (without row provenance) +- Signed labels (without the row's full features) + +Anything that could leak per-row provenance must be guarded by the compose itself. The compose is the contract owners reviewed. + +```python +# In analysis.py — emit only DP-aggregate +from diffprivlib import LaplaceMechanism +mech = LaplaceMechanism(epsilon=1.0, sensitivity=1) +result = { + "cohort_size": mech.randomise(len(joined_df)), + "mean_risk_score": mech.randomise(float(joined_df["risk"].mean())), +} + +# Bind the result into a TDX quote so anyone can verify it offline +import hashlib +report = hashlib.sha256(json.dumps(result, sort_keys=True).encode()).digest() +quote = client.get_quote(report) +print(json.dumps({**result, "quote": quote.quote, "compose_hash": compose_hash})) +``` + +--- + +## Revoke + +Any single owner can halt all subsequent compute by withdrawing their multi-sig approval on `DstackApp.sol`. + +```bash +# Owner's wallet: +DstackApp.removeAllowedHash(compose_hash) +``` + +The next time the Analysis CVM tries to refresh keys via KMS, the verifier sees the hash is no longer allowed, and the key derivation fails. In-flight compute that already has unwrapped data is in TDX memory only — it cannot persist anything to disk that's readable elsewhere. 
+ +--- + +## Verify + +### Each owner runs locally: + +```bash +phala cvms attestation cohort-analysis --json > attestation.json +# Then run the full verification per verify-attestation.md: +# - TDX quote chains to Intel root (DCAP / Phala verify endpoint) +# - GPU NV-CSE quote chains to NVIDIA root (NRAS) +# - report_data binds to the signing key + your nonce +# - mr_config binds to the expected compose_hash +# - container images have Sigstore provenance from expected source repos +``` + +### Each owner checks DstackApp.sol: + +``` +DstackApp.allowedHashes(compose_hash) == true +DstackApp.owners() == [hospital-a-addr, hospital-b-addr, ...] +DstackApp.threshold() == 2 # or whatever k-of-n is in use +``` + +### Output verification + +The signed aggregate's signature must verify against the per-app pubkey — and that pubkey only exists if attestation + on-chain approval both passed. + +--- + +## Troubleshooting + +| Symptom | Cause | Fix | +|---|---|---| +| Decrypt fails inside CVM | Compose-hash mismatch | Confirm `phala cvms get` returns the same compose-hash you sealed against | +| `--prepare-only` token expired | Not committed within window | Re-run `--prepare-only` and have owners approve quickly | +| KMS query 403 | On-chain `addAllowedHash` not yet confirmed | Wait for tx confirmation; `phala cvms restart` | +| Output too aggressive | DP epsilon too small or aggregate too narrow | Tune `epsilon`; coarsen the aggregation function | +| Owner can't decrypt their own dataset locally | Used wrong KMS pubkey | Pull from `kms.phala.com` per current epoch | + +--- + +## Reference: minimal end-to-end + +```bash +# Owner side (each owner) +python seal-dataset.py hospital-a ./ehr-data.jsonl ./ehr-data.sealed +aws s3 cp ehr-data.sealed s3://cohort/sealed/hospital-a.sealed + +# Analyst side +phala login +phala deploy -n cohort-analysis -c docker-compose.yml --kms ethereum \ + --private-key $ANALYST_KEY --rpc-url $ETH_RPC --prepare-only +# (owners approve in multisig wallet) +phala deploy --commit --token $TOKEN --transaction-hash $TX +phala cvms attestation cohort-analysis --json > attestation.json +phala logs -f # watch the join + DP-aggregate +``` + +The output is a DP-aggregate signed by an attested CVM whose compose-hash was multi-owner approved on-chain. Each owner can verify offline without trusting the analyst, the operator, or Phala. diff --git a/skills/usecase/dstack-self-host.md b/skills/usecase/dstack-self-host.md new file mode 100644 index 00000000..1c9dab4a --- /dev/null +++ b/skills/usecase/dstack-self-host.md @@ -0,0 +1,377 @@ +--- +name: dstack-self-host +description: | + Self-host the dstack control plane on your own bare-metal Intel TDX + hardware. Use when users need data residency, regulatory boundary + control, or want to run dstack outside Phala's managed cloud. Covers + building dstack-vmm / dstack-kms / dstack-gateway from source, using + the vmm-cli.py app deployer, and choosing an auth server (auth-simple + vs auth-eth on-chain). +--- + +# Self-Hosted dstack + +Run `dstack-vmm`, `dstack-kms`, and `dstack-gateway` on your own bare-metal Intel TDX hardware. App developers use `vmm-cli.py` to deploy CVMs to your dstack instance. 
+ +## Operations + +| User says | Operation | +|---|---| +| "self-host dstack", "BYOH", "data residency" | **End-to-End** | +| "dev setup", "try locally" | **Dev Deployment** | +| "production setup" | **Production Deployment** | +| "deploy KMS", "auth-simple", "auth-eth" | **KMS + Auth Server** | +| "deploy Gateway", "TLS termination" | **Gateway** | +| "deploy an app to my dstack", "vmm-cli" | **App Deployment (vmm-cli.py)** | +| "compare to managed Phala" | **Self-host vs Managed** | + +> **Source of truth:** all canonical operator commands are in +> [github.com/Dstack-TEE/dstack/docs/deployment.md](https://github.com/Dstack-TEE/dstack/blob/master/docs/deployment.md) +> and the [VMM CLI User Guide](https://github.com/Dstack-TEE/dstack/blob/master/docs/vmm-cli-user-guide.md). +> This skill summarizes them — verify against the current release before +> running anything in production. + +> **Heads up:** there is **no `dstack` command-line tool**. Self-hosting +> means running the Rust binaries `dstack-vmm` / `dstack-kms` / +> `dstack-gateway` directly, plus `vmm-cli.py` for app management. This is +> separate from the npm-installed `phala` CLI used against managed Phala +> Cloud. + +--- + +## Self-host vs Managed + +| Aspect | Managed (`phala` CLI) | Self-Hosted (`dstack-vmm` + `vmm-cli.py`) | +|---|---|---| +| Hardware | Phala provides H200 + TDX hosts | You provide bare-metal TDX | +| Operator | Phala | You | +| Trust path | Same: TDX quote + on-chain registry | Same: TDX quote + on-chain registry | +| Best for | Most teams. Lower TCO. | Strict data residency, regulatory boundary, providers building their own confidential cloud | +| Open source | Yes (dstack runtime) | Yes (you run the same code) | +| App CLI | `phala deploy` | `./vmm-cli.py deploy` | + +The trust model is identical. The only real difference is who operates the hardware. + +--- + +## Hardware Prerequisites + +- Bare-metal TDX-capable server (Sapphire Rapids+ Xeon, BIOS TDX enabled). See [canonical/tdx](https://github.com/canonical/tdx) for the host setup. +- ≥16GB RAM, ≥100GB free disk +- Public IPv4 + DNS access +- Optional: NVIDIA H100 or Blackwell for GPU TEE workloads + +Verify TDX is active on the host: + +```bash +dmesg | grep -i tdx +``` + +--- + +## Dev Deployment + +For local development / testing only. **No security guarantees** — KMS runs in dev mode. + +### Step 1: Install build deps + +```bash +# Ubuntu 24.04 +sudo apt install -y build-essential chrpath diffstat lz4 wireguard-tools xorriso + +# Install Rust +curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh +``` + +### Step 2: Build host config + +```bash +git clone https://github.com/Dstack-TEE/meta-dstack.git --recursive +cd meta-dstack/ +mkdir build && cd build +../build.sh hostcfg +``` + +Edit the generated `build-config.sh`: + +| Variable | Description | +|---|---| +| `KMS_DOMAIN` | DNS domain for KMS RPC, e.g. `kms.example.com` | +| `GATEWAY_DOMAIN` | DNS domain for Gateway RPC, e.g. `gateway.example.com` | +| `GATEWAY_PUBLIC_DOMAIN` | Public base domain for app routing, e.g. 
`apps.example.com` | +| `CERTBOT_ENABLED` | `true` (for ACME via Cloudflare) | +| `CF_API_TOKEN` | your Cloudflare API token | + +```bash +vim build-config.sh +../build.sh hostcfg +../build.sh dl 0.5.5 # download guest image +``` + +### Step 3: Run components in separate terminals + +```bash +# Terminal 1: KMS +./dstack-kms -c kms.toml + +# Terminal 2: Gateway (needs sudo for port 443) +sudo ./dstack-gateway -c gateway.toml + +# Terminal 3: VMM +./dstack-vmm -c vmm.toml +``` + +VMM listens on `http://localhost:8080` by default. App deployers point `vmm-cli.py` at this URL. + +--- + +## Production Deployment + +Production runs KMS and Gateway each as their own CVMs, behind an auth server. The summary below tracks the canonical guide; see `docs/deployment.md` for the latest. + +### Production checklist + +1. Set up TDX host with `dstack-vmm` +2. Deploy KMS as CVM (with auth server; capture its attestation; allowlist its `mrAggregated` before bootstrap) +3. Deploy Gateway as CVM +4. Optional: Zero-Trust HTTPS, CT monitoring, multi-node, on-chain governance + +### Step 1: Build dstack-vmm + +```bash +git clone https://github.com/Dstack-TEE/dstack +cd dstack +cargo build --release -p dstack-vmm -p supervisor +mkdir -p vmm-data +cp target/release/dstack-vmm vmm-data/ +cp target/release/supervisor vmm-data/ +cd vmm-data/ +``` + +### Step 2: Configure VMM + +Create `vmm.toml`: + +```toml +address = "tcp:0.0.0.0:9080" +reuse = true +image_path = "./images" +run_path = "./run/vm" + +[cvm] +kms_urls = [] +gateway_urls = [] +cid_start = 30000 +cid_pool_size = 1000 + +[cvm.port_mapping] +enabled = true +address = "127.0.0.1" +range = [ + { protocol = "tcp", from = 1, to = 20000 }, + { protocol = "udp", from = 1, to = 20000 }, +] + +[host_api] +address = "vsock:2" +port = 10000 +``` + +Download guest images from [meta-dstack releases](https://github.com/Dstack-TEE/meta-dstack/releases) and extract them to `./images/`. Then start VMM: + +```bash +./dstack-vmm -c vmm.toml +``` + +--- + +## KMS + Auth Server + +Production KMS requires an **auth server** that validates boot requests via webhook. Two stock implementations: + +| Auth server | Use case | Config | +|---|---|---| +| `auth-simple` | Config-file whitelisting | JSON config file | +| `auth-eth` | On-chain governance via smart contracts | Ethereum RPC + contract | +| Custom | Your own logic | Implement the webhook interface | + +All auth servers expose: +- `GET /` — health +- `POST /bootAuth/app` — app boot authz +- `POST /bootAuth/kms` — KMS boot authz + +### auth-simple (config-based) + +Create `auth-config.json`: + +```json +{ + "osImages": ["0x"], + "kms": { + "mrAggregated": ["0x"], + "allowAnyDevice": true + }, + "apps": {} +} +``` + +Get the OS image hash: + +```bash +tar -xzf dstack-0.5.5.tar.gz +cat dstack-0.5.5/digest.txt +# 0b327bcd642788b0517de3ff46d31ebd3847b6c64ea40bacde268bb9f1c8ec83 +# prefix with 0x in the JSON +``` + +Run auth-simple: + +```bash +cd kms/auth-simple +bun install +PORT=3001 AUTH_CONFIG_PATH=/path/to/auth-config.json bun run start +``` + +> **Important:** an empty `kms.mrAggregated` allowlist is treated as deny-all +> for KMS. Capture the current KMS measurement with `Onboard.GetAttestationInfo` +> and add it before bootstrap, or KMS will refuse to onboard. + +### auth-eth (on-chain governance) + +Use this for decentralized governance — the allowlist lives in a smart contract instead of a JSON file. See [docs/onchain-governance.md](https://github.com/Dstack-TEE/dstack/blob/master/docs/onchain-governance.md). 
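Whichever auth server you choose, the allowlist entries are just 0x-prefixed hex digests. A small helper sketch for assembling the `auth-simple` config from `digest.txt` (paths and the digest-file layout are illustrative; confirm against the extracted image tarball):

```python
# make_auth_config.py — build auth-config.json from the guest image digest.
import json
import pathlib

# digest.txt ships inside the extracted dstack-<version> tarball; take the first token
digest = pathlib.Path("dstack-0.5.5/digest.txt").read_text().split()[0]
kms_mr = "0x..."  # capture via Onboard.GetAttestationInfo BEFORE bootstrap

config = {
    "osImages": [f"0x{digest}"],
    "kms": {"mrAggregated": [kms_mr], "allowAnyDevice": True},
    "apps": {},
}
pathlib.Path("auth-config.json").write_text(json.dumps(config, indent=2))
print(json.dumps(config, indent=2))
```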
+ +### Deploy KMS as CVM + +Production KMS runs inside its own CVM, NOT on the host: + +```bash +cd dstack/kms/dstack-app/ +# Use the deploy script matching your auth server (auth-simple vs auth-eth) +# Capture the KMS attestation info, allowlist its mrAggregated, then bootstrap +``` + +The exact script and bootstrap dance is version-specific — follow `docs/deployment.md` for the current release. + +--- + +## Gateway + +Gateway terminates public TLS and routes traffic to apps. It also runs as a CVM in production. + +Gateway config sets: +- Public domain (e.g. `apps.example.com`) +- ACME provider (Let's Encrypt via Cloudflare DNS-01) +- Authorization endpoint (your auth server) + +App URLs follow the shape `https://-.` — the same scheme as managed Phala (just with your domain). + +--- + +## App Deployment (vmm-cli.py) + +App developers (not operators) use `vmm-cli.py` against your VMM endpoint. + +### Install + configure + +```bash +# Get the script +curl -O https://raw.githubusercontent.com/Dstack-TEE/dstack/master/vmm-cli.py +chmod +x vmm-cli.py + +# Point at your VMM +export DSTACK_VMM_URL=http://your-vmm-host:8080 + +# (optional) auth +export DSTACK_VMM_AUTH_USER=username +export DSTACK_VMM_AUTH_PASSWORD=password + +./vmm-cli.py --help +``` + +Server URL precedence: CLI `--url` > `DSTACK_VMM_URL` > default `http://localhost:8080`. + +### Discover what's available + +```bash +./vmm-cli.py lsimage # available OS images +./vmm-cli.py lsgpu # available GPU slots +./vmm-cli.py lsvm # current VMs (basic) +./vmm-cli.py lsvm -v # detailed (vCPU, memory, image, GPUs) +``` + +### Deploy an app (two-step) + +```bash +# Step 1: build the app-compose.json from your docker-compose +./vmm-cli.py compose \ + --name "my-web-app" \ + --docker-compose ./docker-compose.yml \ + --output ./app-compose.json + +# Step 2: deploy to a VM +./vmm-cli.py deploy --app-compose ./app-compose.json [other flags per --help] +``` + +### VM lifecycle + +```bash +./vmm-cli.py start +./vmm-cli.py stop # graceful +./vmm-cli.py stop -f # force +./vmm-cli.py logs # last 20 lines +./vmm-cli.py logs -n 50 # last 50 +./vmm-cli.py logs -f # stream +./vmm-cli.py remove # permanent — wipes data +``` + +### KMS key management + +```bash +./vmm-cli.py kms list +./vmm-cli.py kms add +./vmm-cli.py kms remove +``` + +For full reference: [VMM CLI User Guide](https://github.com/Dstack-TEE/dstack/blob/master/docs/vmm-cli-user-guide.md). 
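For CI, the two-step compose/deploy flow scripts cleanly. A minimal sketch that shells out to `vmm-cli.py` using only the flags shown above — add sizing and image flags per `./vmm-cli.py deploy --help`:

```python
# ci_deploy.py — wrap the vmm-cli.py two-step flow for CI.
import os
import subprocess

os.environ.setdefault("DSTACK_VMM_URL", "http://your-vmm-host:8080")

def run(*args: str) -> None:
    subprocess.run(["./vmm-cli.py", *args], check=True)

# Step 1: build app-compose.json from the docker-compose file
run("compose", "--name", "my-web-app",
    "--docker-compose", "./docker-compose.yml",
    "--output", "./app-compose.json")

# Step 2: deploy it (append vCPU/memory/image flags per --help)
run("deploy", "--app-compose", "./app-compose.json")
```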
+ +--- + +## Troubleshooting + +| Symptom | Cause | Fix | +|---|---|---| +| `dstack-vmm` won't start | TDX not enabled in BIOS | Reboot, enable TDX, check `dmesg \| grep -i tdx` | +| KMS rejects bootstrap | `mrAggregated` not in allowlist | Capture KMS measurement via `Onboard.GetAttestationInfo`, add to `auth-config.json` | +| Gateway 5xx after first boot | ACME cert not yet issued | Wait 1-3 min on first start (DNS-01 challenge) | +| `vmm-cli.py` connection refused | `DSTACK_VMM_URL` wrong | Confirm VMM listens on `0.0.0.0:8080` (not just `127.0.0.1`) | +| App deploy fails with "image hash not allowlisted" | OS image not in `auth-config.json` | Add the image's `digest.txt` hash with `0x` prefix | + +--- + +## Reference: minimal end-to-end (dev) + +```bash +# Operator +git clone https://github.com/Dstack-TEE/meta-dstack.git --recursive +cd meta-dstack && mkdir build && cd build +../build.sh hostcfg +vim build-config.sh # set domains, CF token +../build.sh hostcfg +../build.sh dl 0.5.5 + +# Run in separate terminals +./dstack-kms -c kms.toml +sudo ./dstack-gateway -c gateway.toml +./dstack-vmm -c vmm.toml + +# App developer (separate machine) +export DSTACK_VMM_URL=http://operator-host:8080 +curl -O https://raw.githubusercontent.com/Dstack-TEE/dstack/master/vmm-cli.py +./vmm-cli.py compose --name my-app --docker-compose ./docker-compose.yml --output ./app-compose.json +./vmm-cli.py deploy --app-compose ./app-compose.json +./vmm-cli.py lsvm +``` + +The same `docker-compose.yml` ships unchanged between managed Phala and self-hosted dstack — the trust path is identical. + +For production deployment (KMS as CVM, auth server, on-chain governance), follow [docs/deployment.md](https://github.com/Dstack-TEE/dstack/blob/master/docs/deployment.md) line-by-line — version-specific bootstrap details change between releases. diff --git a/skills/usecase/gpu-tee-custom.md b/skills/usecase/gpu-tee-custom.md new file mode 100644 index 00000000..6ed74b2b --- /dev/null +++ b/skills/usecase/gpu-tee-custom.md @@ -0,0 +1,278 @@ +--- +name: gpu-tee-custom +description: | + Deploy any custom workload to a Phala Cloud GPU TEE — Jupyter notebooks, + custom training scripts, computer vision pipelines, scientific compute. + Generic recipe for getting a Docker image running on H200 with TDX + + NVIDIA CC attestation. For LLM serving see gpu-vllm-deploy. For + fine-tuning see training-run. +--- + +# Custom Workload on Phala GPU TEE + +`phala deploy` any Docker image onto an H200 GPU with full TDX + NVIDIA CC attestation. + +## Operations + +| User says | Operation | +|---|---| +| "deploy my notebook", "Jupyter", "research env" | **Jupyter Notebook** | +| "run inference", "computer vision pipeline", "custom workload" | **First Deploy** | +| "list GPUs", "what instances", "pricing" | **Instance Types** | +| "scale to 8 GPU" | **Multi-GPU** | +| "verify GPU CC", "is the GPU sealed" | **Verify GPU CC** | +| "SSH into GPU" | **SSH** | +| "GPU support for H100", "B300" | **Availability** | + +This skill builds on the foundational `../phala-cli/SKILL.md`. Install + login per that skill first. 
+ +--- + +## Instance Types + +Run live: + +```bash +phala instance-types +``` + +Current GPU types (as of writing): + +| ID | GPUs | vCPU | RAM | Hourly | +|---|---|---|---|---| +| `h200.small` | 1× H200 SXM 141GB | 24 | 192 GB | $3.50 | +| `h200.16xlarge` | 8× H200 SXM 141GB | 64 | 256 GB | $23.04 | +| `h200.8x.large` | 8× H200 SXM 141GB | 192 | 1.5 TB | $23.04 | + +Pick `h200.small` for single-GPU workloads (most fine-tuning, single-tenant inference). Pick the 8× variants for multi-GPU training, large-model inference, or heavy CPU/RAM needs. + +### Availability + +The CLI lists what's actually deployable in your workspace. **H100 and B300 SKUs may appear on the marketing site but not the CLI** depending on current capacity. Run `phala instance-types` for ground truth. For other regions / hardware, contact Phala sales. + +--- + +## Jupyter Notebook + +The fastest path to "I can run code in a GPU TEE." + +### Step 1: `docker-compose.yml` + +```yaml +services: + jupyter: + image: quay.io/jupyter/scipy-notebook:cuda-latest + restart: unless-stopped + environment: + - JUPYTER_TOKEN=${JUPYTER_TOKEN} + ports: + - "8888:8888" + volumes: + - work:/home/jovyan/work + - /var/run/dstack.sock:/var/run/dstack.sock + deploy: + resources: + reservations: + devices: + - driver: nvidia + count: all + capabilities: [gpu] +volumes: + work: +``` + +### Step 2: `.env` + +``` +JUPYTER_TOKEN=pick-a-strong-token +``` + +### Step 3: Deploy + +```bash +phala deploy -n my-jupyter -c docker-compose.yml -e .env -t h200.small --kms phala --wait +``` + +### Step 4: Open the notebook + +```bash +# URL shape: https://-. +# Get it live from the CVM: +phala cvms get my-jupyter --json | jq -r '.endpoints[] | select(.app | contains("-8888.")) | .app' +# Open the URL in your browser, paste your JUPYTER_TOKEN to log in. +``` + +--- + +## First Deploy (Generic) + +For any custom Docker image: + +### Step 1: Compose template + +```yaml +services: + workload: + image: /: # publicly pullable + restart: unless-stopped + environment: + - YOUR_VAR=${YOUR_VAR} + volumes: + - data:/data + - /var/run/dstack.sock:/var/run/dstack.sock + deploy: + resources: + reservations: + devices: + - driver: nvidia + count: all + capabilities: [gpu] +volumes: + data: +``` + +### Step 2: Build + push your image + +```bash +docker build -t ghcr.io//:v1 . +docker push ghcr.io//:v1 +``` + +For private registries, set `DSTACK_DOCKER_USERNAME` and `DSTACK_DOCKER_PASSWORD` in your `.env`. + +### Step 3: Deploy + +```bash +phala deploy -n my-workload -c docker-compose.yml -e .env -t h200.small --kms phala --wait +``` + +### Step 4: Verify + +```bash +phala ps # is your container running? +phala logs -f # output +phala cvms attestation # TDX + GPU NV-CSE quote +``` + +--- + +## Multi-GPU + +Tensor-parallel or data-parallel across 8× H200: + +### Compose changes + +```yaml +services: + workload: + # ... same as above ... + command: > + torchrun + --nproc_per_node=8 + train.py +``` + +### Deploy + +```bash +phala deploy -n my-workload -c docker-compose.yml -e .env -t h200.16xlarge --kms phala --wait +``` + +`h200.16xlarge` and `h200.8x.large` both give 8× H200; pick `8x.large` for larger host CPU/RAM (192 vCPU, 1.5 TB RAM). + +--- + +## Verify GPU CC + +NVIDIA Confidential Computing seals the GPU memory. 
Confirm it's active: + +### Inside the CVM + +```bash +phala ssh +nvidia-smi conf-compute -q +# Look for: ConfComputeMode : ON +``` + +### From the attestation + +```bash +phala cvms attestation my-workload --json | jq '.app_certificates[0].quote' +``` + +The hex `quote` decodes into a combined TDX + GPU NV-CSE attestation. For the full verification flow (Intel TDX + NVIDIA NRAS + report-data binding + compose-hash), follow `verify-attestation.md`. Offline-only path: `dcap-qvl` for the TDX layer + `nvattest-verifier` for the NVIDIA layer. + +For a full reference on parsing the combined TDX+NVIDIA quote, see `https://docs.phala.com/phala-cloud/confidential-ai/verify/verify-attestation`. + +--- + +## SSH + +Useful for debugging GPU-specific issues. + +```bash +phala ssh +# inside the CVM: +nvidia-smi # check GPU state +nvidia-smi conf-compute -q # check CC mode +docker ps # what's running +docker logs # container logs +``` + +Run a single command without an interactive shell: + +```bash +phala ssh -- nvidia-smi +phala ssh -- docker stats --no-stream +``` + +Port-forward a custom port back to your laptop: + +```bash +phala ssh -- -L 8088:localhost:8088 +``` + +--- + +## Common Docker images + +| Use case | Image | Notes | +|---|---|---| +| Jupyter + PyTorch | `quay.io/jupyter/scipy-notebook:cuda-latest` | Pre-installed CUDA, scipy, sklearn | +| PyTorch dev | `nvcr.io/nvidia/pytorch:24.10-py3` | NVIDIA's official, CUDA 12.x | +| TensorFlow | `tensorflow/tensorflow:latest-gpu` | TF 2.x with GPU support | +| Ollama | `ollama/ollama:latest` | Local LLM serving (alternative to vLLM) | +| ComfyUI / Stable Diffusion | `yanwk/comfyui-boot:latest` | SD pipeline | +| Whisper / TTS | `onerahmet/openai-whisper-asr-webservice` | ASR endpoints | + +--- + +## Troubleshooting + +| Symptom | Cause | Fix | +|---|---|---| +| Container reports `no CUDA device` | GPU passthrough missing | Add the `deploy.resources.reservations.devices` block to compose | +| OOM during model load | Model too big for 1 H200 (141 GB VRAM) | Move to `h200.16xlarge` (8× 141 = 1.1 TB total) | +| Jupyter login loop | Wrong token | `phala logs` to find the auto-generated token, or set `JUPYTER_TOKEN` env explicitly | +| `manifest unknown` on deploy | Image not public | Push to public registry OR add `DSTACK_DOCKER_USERNAME/PASSWORD` to env | +| Slow `apt-get` / `pip` | dstack-gateway egress | Pre-bake all dependencies into the Docker image | +| `nvidia-smi conf-compute -q` says OFF | Host config | Open a Phala support ticket | + +--- + +## Reference: minimal end-to-end + +```bash +# 1. Scaffold + build +docker build -t ghcr.io/me/my-workload:v1 ./workload +docker push ghcr.io/me/my-workload:v1 + +# 2. Deploy on 1× H200 +phala login +phala deploy -n my-workload -c docker-compose.yml -t h200.small --kms phala --wait + +# 3. Verify + use +phala cvms attestation --json | jq +phala ssh -- nvidia-smi conf-compute -q +phala cvms get my-workload --json | jq '.endpoints' +``` diff --git a/skills/usecase/gpu-vllm-deploy.md b/skills/usecase/gpu-vllm-deploy.md new file mode 100644 index 00000000..4fb69c86 --- /dev/null +++ b/skills/usecase/gpu-vllm-deploy.md @@ -0,0 +1,275 @@ +--- +name: gpu-vllm-deploy +description: | + Deploy vLLM (or any OpenAI-compatible LLM server) onto a Phala Cloud + GPU TEE — Llama, Qwen, DeepSeek, Mistral, etc. Use when users want + self-hosted private inference on H200 with verifiable attestation + and an OpenAI-compatible endpoint at /v1/chat/completions. 
+--- + +# Self-Hosted vLLM on Phala GPU TEE + +`phala deploy` an OpenAI-compatible inference server inside a confidential H200 GPU. + +## Operations + +| User says | Operation | +|---|---| +| "deploy vLLM", "run my own LLM", "self-host inference" | **First Deploy** | +| "load Llama", "load Qwen", "switch model" | **Choose Model** | +| "scale to 8 GPU", "tensor parallel" | **Multi-GPU** | +| "verify GPU TEE", "is the GPU in CC mode" | **Verify GPU CC** | +| "private model weights", "seal weights" | **Seal Weights** | +| "OpenAI client can't connect" | **Endpoint** | + +This skill builds on the foundational `../phala-cli/SKILL.md`. Install + login per that skill first. + +--- + +## Choose Model + +vLLM supports any HuggingFace model. Common choices for confidential workloads: + +| Model | HF ID | VRAM | Fits on | +|---|---|---|---| +| Llama 3.1 8B Instruct | `meta-llama/Llama-3.1-8B-Instruct` | 16 GB | `h200.small` | +| Llama 3.1 70B Instruct | `meta-llama/Llama-3.1-70B-Instruct` | 140 GB | `h200.small` (FP8) or `h200.16xlarge` (FP16) | +| Qwen 2.5 7B Instruct | `Qwen/Qwen2.5-7B-Instruct` | 16 GB | `h200.small` | +| DeepSeek V3 0324 | `deepseek-ai/DeepSeek-V3-0324` | ~600 GB | `h200.16xlarge` (8× H200, FP8) | +| GPT-OSS 120B | `openai/gpt-oss-120b` | ~240 GB | `h200.16xlarge` (FP8) | +| Gemma 3 27B | `google/gemma-3-27b-it` | 54 GB | `h200.small` | + +Confirm available types: + +```bash +phala instance-types +``` + +--- + +## Scaffold + +### Step 1: Project layout + +```bash +mkdir my-llm && cd my-llm +``` + +``` +my-llm/ +├── docker-compose.yml +├── .env.example # HF_TOKEN +└── .env # gitignored +``` + +### Step 2: `docker-compose.yml` + +```yaml +services: + vllm: + image: vllm/vllm-openai:latest + restart: unless-stopped + environment: + - HF_TOKEN=${HF_TOKEN} + - VLLM_API_KEY=${VLLM_API_KEY:-sk-local} + volumes: + - hf-cache:/root/.cache/huggingface + - /var/run/dstack.sock:/var/run/dstack.sock + ports: + - "8000:8000" + command: > + --model meta-llama/Llama-3.1-8B-Instruct + --dtype auto + --max-model-len 8192 + --api-key $${VLLM_API_KEY} +volumes: + hf-cache: +``` + +Notes: +- `vllm/vllm-openai:latest` exposes `/v1/chat/completions` and `/v1/completions` on port 8000. +- The `HF_TOKEN` env (sealed via `phala deploy -e`) lets vLLM pull gated models from Hugging Face. +- `VLLM_API_KEY` protects the endpoint — anyone hitting it must know the key. +- For GPU CC mode confirmation, see **Verify GPU CC** below. + +### Step 3: `.env` + +``` +HF_TOKEN=hf_real_token_here +VLLM_API_KEY=sk-pick-something-secret +``` + +--- + +## First Deploy + +### Step 1: Pick GPU shape + +| Use case | Instance type | vCPU / RAM | Notes | +|---|---|---|---| +| 7-13B model, single user | `h200.small` (1× H200) | 24 / 192 GB | $3.50/hr | +| 70B FP16, batch | `h200.16xlarge` (8× H200) | 64 / 256 GB | $23/hr — needs `--tensor-parallel-size 8` | +| 70B FP16, GPU-heavy throughput | `h200.8x.large` (8× H200) | 192 / 1.5 TB | $23/hr — same GPUs, more host CPU/RAM | + +```bash +phala deploy -n my-llm -c docker-compose.yml -e .env -t h200.small --kms phala --wait +``` + +### Step 2: Get the endpoint + +```bash +phala cvms get my-llm --json | jq '.endpoints' +# Each endpoint has shape: { "app": "https://-.", "instance": "..." } +``` + +The URL format is `https://-.` — the +gateway base domain is per-cluster (e.g. `dstack-pha-prod12.phala.network`), +NOT a global `dstack.phala.network`. Always pull it live from the CVM JSON. 
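Because the endpoint is OpenAI-compatible, any OpenAI SDK can point at it straight away — a minimal Python sketch, with the endpoint URL and API key taken from the steps above (Step 3 below runs the same check with curl):

```python
# chat.py — call the self-hosted vLLM endpoint with the OpenAI SDK.
import os
from openai import OpenAI

client = OpenAI(
    base_url=f"{os.environ['ENDPOINT']}/v1",   # e.g. https://<app-id>-8000.<gateway-domain>/v1
    api_key=os.environ["VLLM_API_KEY"],        # the key from your sealed .env
)

resp = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Hello from inside a TEE?"}],
)
print(resp.choices[0].message.content)
```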
+ +### Step 3: Test the endpoint + +```bash +ENDPOINT=$(phala cvms get my-llm --json | jq -r '.endpoints[] | select(.app | contains("-8000.")) | .app') +curl $ENDPOINT/v1/chat/completions \ + -H "Authorization: Bearer sk-pick-something-secret" \ + -H "Content-Type: application/json" \ + -d '{ + "model": "meta-llama/Llama-3.1-8B-Instruct", + "messages": [{"role": "user", "content": "Hello"}] + }' +``` + +--- + +## Multi-GPU + +For 70B models or higher throughput, spread across 8× H200. + +### Step 1: Update compose + +```yaml +services: + vllm: + image: vllm/vllm-openai:latest + # ... env, volumes, ports same as above ... + command: > + --model meta-llama/Llama-3.1-70B-Instruct + --tensor-parallel-size 8 + --dtype auto + --max-model-len 16384 + --api-key $${VLLM_API_KEY} +``` + +### Step 2: Deploy on the 8× H200 instance + +```bash +phala deploy -n my-llm-70b -c docker-compose.yml -e .env -t h200.16xlarge --kms phala --wait +``` + +vLLM auto-shards the model across all 8 GPUs. + +--- + +## Verify GPU CC + +### From inside the CVM + +```bash +phala ssh +nvidia-smi conf-compute -q +# Expect: ConfComputeMode : ON +``` + +### Full attestation flow + +For the complete verification (NVIDIA NRAS + Intel TDX + report-data binding + compose-hash + Sigstore), follow `verify-attestation.md`. + +The minimum check: + +```bash +phala cvms attestation my-llm --json > attestation.json +# The .app_certificates[0].quote contains the combined TDX + GPU NV-CSE attestation. +# Verify with Phala's online endpoint or dcap-qvl + nvattest-verifier offline. +``` + +--- + +## Seal Weights + +If your model weights are private and shouldn't be re-pullable on every deploy: + +### Option 1: Pre-launch download + +```bash +phala deploy ... --pre-launch-script ./download-weights.sh +``` + +`download-weights.sh` runs inside the CVM after attestation. Use `HF_TOKEN` (sealed env) to pull, write to a persistent volume. + +### Option 2: Encrypt-at-rest + +Encrypt the weight tarball client-side, ship to S3, decrypt inside the CVM with a key derived from `HKDF(kms_root_pubkey, app_id, compose_hash)`. The decrypt key only re-derives if the compose-hash matches — stolen ciphertext is useless. + +See `data-coanalysis.md` for the HKDF pattern. + +--- + +## Endpoint + +### Public URL + +The shape is `https://-./v1/chat/completions`. +`` is whatever port your compose exposes (8000 for vLLM by default). +The gateway base domain is per-cluster — pull it from `phala cvms get`: + +```bash +phala cvms get my-llm --json | jq -r '.gateway.base_domain' +# e.g. dstack-pha-prod5.phala.network +phala cvms get my-llm --json | jq -r '.app_id' +# e.g. e029a4b8... +# Compose: https://e029a4b8...-8000.dstack-pha-prod5.phala.network +``` + +### Custom domain + +`dstack-gateway` supports custom domain mapping via the dashboard. Add `gateway.alias.example.com` → `-8000`. The TLS cert continues to carry the TDX quote in an X.509 extension. + +--- + +## Troubleshooting + +| Symptom | Cause | Fix | +|---|---|---| +| vLLM OOMs on startup | Model too big for 1 GPU | Move to `h200.16xlarge` + `--tensor-parallel-size 8` | +| `HF_TOKEN` invalid | Sealed env not passed | Re-deploy with `-e .env` and confirm token in HF account has read access | +| Endpoint times out | vLLM still loading weights | First boot can take 10-20 min for large models. `phala logs -f` shows progress. 
| +| `nvidia-smi conf-compute -q` says OFF | GPU not in CC mode | Open a Phala support ticket — host config issue | +| 401 from `/v1/chat/completions` | Wrong API key | Set `Authorization: Bearer $VLLM_API_KEY` | +| Throughput poor on 70B | FP16 on 1 GPU | Switch to FP8 or move to 8× H200 | + +--- + +## Reference: minimal end-to-end + +```bash +# 1. Scaffold +mkdir my-llm && cd my-llm +# (write docker-compose.yml + .env with HF_TOKEN) + +# 2. Auth + deploy +phala login +phala deploy -n my-llm -c docker-compose.yml -e .env -t h200.small --kms phala --wait + +# 3. Verify GPU CC + endpoint +phala cvms attestation --json | jq '.app_certificates[0].quote' +# The hex string decodes into the combined TDX quote. For GPU CC verification, +# use NVIDIA's nvattest-verifier or dstack-verifier on the same blob — +# the GPU quote is bound into the TDX report_data. +phala ssh -- nvidia-smi conf-compute -q +ENDPOINT=$(phala cvms get my-llm --json | jq -r '.endpoints[0]') +curl $ENDPOINT/v1/chat/completions -H "Authorization: Bearer $VLLM_API_KEY" \ + -H "Content-Type: application/json" \ + -d '{"model":"meta-llama/Llama-3.1-8B-Instruct","messages":[{"role":"user","content":"Hello"}]}' +``` + +The endpoint is OpenAI-compatible. Drop into any OpenAI SDK by setting `base_url=$ENDPOINT/v1`. diff --git a/skills/usecase/inference-call.md b/skills/usecase/inference-call.md new file mode 100644 index 00000000..37f03a14 --- /dev/null +++ b/skills/usecase/inference-call.md @@ -0,0 +1,313 @@ +--- +name: inference-call +description: | + Call the Phala Confidential AI API (hosted models on GPU TEE) via the + OpenAI-compatible interface at api.redpill.ai/v1. Use when users want + to call DeepSeek, Qwen, Llama, GPT-OSS, Gemma, etc. without deploying + their own server — pay per token, no infrastructure. +--- + +# Phala Confidential AI API + +OpenAI-compatible inference on confidential GPUs at `https://api.redpill.ai/v1`. + +## Operations + +| User says | Operation | +|---|---| +| "call confidential AI", "use phala model", "OpenAI compatible" | **First Call** | +| "get an API key", "how to authenticate" | **Get API Key** | +| "Python SDK", "TypeScript SDK" | **SDKs** | +| "stream tokens", "SSE" | **Streaming** | +| "tool calling", "function calling" | **Tool Calling** | +| "vision", "image input" | **Images & Vision** | +| "structured output", "JSON mode" | **Structured Output** | +| "verify the response", "signed receipt" | **Verify Signature** | +| "list models", "available models" | **Model Catalog** | + +--- + +## Get API Key + +1. Go to [cloud.phala.com](https://cloud.phala.com) and add at least $5 in credits (Dashboard → Deposit). +2. Open **Dashboard → Confidential AI API** and click **Enable**. +3. Click **Create Key**, give it a name, and copy the value (starts with `sk-`). + +Store the key in your environment: + +```bash +export CONFIDENTIAL_AI_KEY="sk-..." +``` + +--- + +## First Call + +### cURL + +```bash +curl https://api.redpill.ai/v1/chat/completions \ + -H "Content-Type: application/json" \ + -H "Authorization: Bearer $CONFIDENTIAL_AI_KEY" \ + -d '{ + "model": "openai/gpt-oss-20b", + "messages": [ + { "role": "user", "content": "Hello world!" } + ] + }' +``` + +That's the canonical "hello world." Replace the model and message and you're done. + +--- + +## Model Catalog + +Models are namespaced by provider. All run inside GPU TEE. 
+
+### Phala provider (lowest cost)
+
+| Model | Model ID | Context | $/1M (in/out) |
+|---|---|---|---|
+| DeepSeek V3 0324 | `deepseek/deepseek-chat-v3-0324` | 163K | 0.28 / 1.14 |
+| Qwen 2.5 VL 72B | `qwen/qwen2.5-vl-72b-instruct` | 65K | 0.59 / 0.59 |
+| Gemma 3 27B | `google/gemma-3-27b-it` | 53K | 0.11 / 0.40 |
+| GPT-OSS 120B | `openai/gpt-oss-120b` | 131K | 0.10 / 0.49 |
+| GPT-OSS 20B | `openai/gpt-oss-20b` | 131K | 0.04 / 0.15 |
+| Qwen 2.5 7B | `qwen/qwen-2.5-7b-instruct` | 32K | 0.04 / 0.10 |
+
+### Other providers
+
+| Model | Model ID | Context |
+|---|---|---|
+| DeepSeek V3.1 (NearAI) | `deepseek/deepseek-chat-v3.1` | 163K |
+| Qwen3 30B (NearAI) | `qwen/qwen3-30b-a3b-instruct-2507` | 262K |
+| Z.AI GLM 4.6 (NearAI) | `z-ai/glm-4.6` | 202K |
+| Phi-4 (Tinfoil) | check live catalog | — |
+
+For the full live catalog, filter by **GPU TEE** to see only confidential variants.
+
+---
+
+## SDKs
+
+The endpoint is OpenAI-compatible. Any OpenAI SDK works — just change the base URL.
+
+### Python (OpenAI SDK)
+
+```python
+import os
+
+from openai import OpenAI
+
+client = OpenAI(
+    api_key=os.environ["CONFIDENTIAL_AI_KEY"],
+    base_url="https://api.redpill.ai/v1",
+)
+
+response = client.chat.completions.create(
+    model="phala/deepseek-chat-v3-0324",
+    messages=[
+        {"role": "system", "content": "You are a helpful assistant."},
+        {"role": "user", "content": "What is your model name?"},
+    ],
+)
+print(response.choices[0].message.content)
+```
+
+### TypeScript (OpenAI SDK)
+
+```typescript
+import OpenAI from "openai"
+
+const client = new OpenAI({
+  baseURL: "https://api.redpill.ai/v1",
+  apiKey: process.env.CONFIDENTIAL_AI_KEY,
+})
+
+const completion = await client.chat.completions.create({
+  model: "phala/deepseek-chat-v3-0324",
+  messages: [{ role: "user", content: "What is your model name?" }],
+})
+
+console.log(completion.choices[0].message)
+```
+
+### LangChain
+
+```python
+import os
+
+from langchain_openai import ChatOpenAI
+
+llm = ChatOpenAI(
+    model="phala/deepseek-chat-v3-0324",
+    base_url="https://api.redpill.ai/v1",
+    api_key=os.environ["CONFIDENTIAL_AI_KEY"],
+)
+```
+
+---
+
+## Streaming
+
+```python
+stream = client.chat.completions.create(
+    model="phala/qwen-2.5-7b-instruct",
+    messages=[{"role": "user", "content": "Write a haiku about TEEs"}],
+    stream=True,
+)
+for chunk in stream:
+    print(chunk.choices[0].delta.content or "", end="", flush=True)
+```
+
+cURL with SSE:
+
+```bash
+curl https://api.redpill.ai/v1/chat/completions \
+  -H "Authorization: Bearer $CONFIDENTIAL_AI_KEY" \
+  -H "Content-Type: application/json" \
+  -N \
+  -d '{
+    "model": "phala/qwen-2.5-7b-instruct",
+    "messages": [{"role":"user","content":"Hello"}],
+    "stream": true
+  }'
+```
+
+---
+
+## Tool Calling
+
+Standard OpenAI tool-calling format.
+
+```python
+tools = [{
+    "type": "function",
+    "function": {
+        "name": "get_weather",
+        "description": "Get weather for a city",
+        "parameters": {
+            "type": "object",
+            "properties": {"city": {"type": "string"}},
+            "required": ["city"],
+        },
+    },
+}]
+
+response = client.chat.completions.create(
+    model="phala/deepseek-chat-v3-0324",
+    messages=[{"role": "user", "content": "Weather in Tokyo?"}],
+    tools=tools,
+)
+print(response.choices[0].message.tool_calls)
+```
+
+Models that support tool calling: most Phala-provider models. Check the catalog page for capability flags.
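+
+The snippet above stops at the model's `tool_calls`. To close the loop, execute the tool yourself and send the result back as a `tool` message. A minimal sketch using the same `tools` schema; `get_weather` here is a local stand-in, not part of the API:
+
+```python
+import json
+import os
+
+from openai import OpenAI
+
+client = OpenAI(api_key=os.environ["CONFIDENTIAL_AI_KEY"], base_url="https://api.redpill.ai/v1")
+tools = [{"type": "function", "function": {
+    "name": "get_weather",
+    "description": "Get weather for a city",
+    "parameters": {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]},
+}}]
+
+def get_weather(city: str) -> str:
+    return f"Sunny in {city}, 22°C"  # stand-in for a real weather lookup
+
+messages = [{"role": "user", "content": "Weather in Tokyo?"}]
+first = client.chat.completions.create(model="phala/deepseek-chat-v3-0324", messages=messages, tools=tools)
+call = first.choices[0].message.tool_calls[0]
+
+# Append the assistant turn (containing the tool call), then the tool result turn.
+messages.append(first.choices[0].message)
+messages.append({
+    "role": "tool",
+    "tool_call_id": call.id,
+    "content": get_weather(**json.loads(call.function.arguments)),
+})
+
+final = client.chat.completions.create(model="phala/deepseek-chat-v3-0324", messages=messages, tools=tools)
+print(final.choices[0].message.content)
+```
+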
+ +--- + +## Images & Vision + +For VLM models like `qwen/qwen2.5-vl-72b-instruct`: + +```python +response = client.chat.completions.create( + model="qwen/qwen2.5-vl-72b-instruct", + messages=[{ + "role": "user", + "content": [ + {"type": "text", "text": "What's in this image?"}, + {"type": "image_url", "image_url": {"url": "https://example.com/img.jpg"}}, + ], + }], +) +``` + +--- + +## Structured Output + +JSON mode + JSON Schema: + +```python +response = client.chat.completions.create( + model="phala/deepseek-chat-v3-0324", + messages=[{"role": "user", "content": "Give me a person record."}], + response_format={ + "type": "json_schema", + "json_schema": { + "name": "person", + "schema": { + "type": "object", + "properties": { + "name": {"type": "string"}, + "age": {"type": "integer"}, + }, + "required": ["name", "age"], + }, + }, + }, +) +``` + +--- + +## Verify Signature + +Every Confidential AI API response can be cryptographically verified — the response chains to the GPU TEE quote. + +### Per-request attestation report + +Fetch a fresh attestation tied to a nonce: + +```bash +NONCE=$(openssl rand -hex 32) +curl -s "https://api.redpill.ai/v1/attestation/report?model=phala/deepseek-chat-v3-0324&nonce=$NONCE" \ + -H "Authorization: Bearer $CONFIDENTIAL_AI_KEY" > report.json +``` + +The response has `nvidia_payload`, `intel_quote`, `signing_address`, and `signing_algo`. + +### Response headers (per-request) + +| Header | Meaning | +|---|---| +| `x-phala-receipt-sig` | Signature over `(model_id, prompt_hash, response_hash, timestamp)` | +| `x-phala-compose-hash` | Compose-hash of the model-serving CVM | +| `x-phala-app-id` | Per-app key identity | + +### Full verification flow + +For the complete end-to-end verification — verify NVIDIA GPU via NRAS, verify Intel TDX, check report-data binds the signing key + nonce, verify the compose manifest, check Sigstore provenance, and verify the response signature — follow `verify-attestation.md`. + +Reference implementation: [`Phala-Network/private-ml-sdk/vllm-proxy/verifiers/attestation_verifier.py`](https://github.com/Phala-Network/private-ml-sdk/blob/main/vllm-proxy/verifiers/attestation_verifier.py). + +--- + +## Troubleshooting + +| Symptom | Cause | Fix | +|---|---|---| +| 401 Unauthorized | Bad / expired key | Generate a new key in Dashboard → Confidential AI API | +| 402 Payment Required | Out of credits | Add funds in Dashboard → Deposit | +| 404 Not Found | Wrong model ID | Use lowercase, e.g. `phala/deepseek-chat-v3-0324` not `Phala/DeepSeek-V3` | +| 429 Rate Limited | Workspace quota | Wait or contact Phala for quota increase | +| Response cuts off | Hit `max_tokens` | Increase `max_tokens` in request | +| Slow first token | Cold start on smaller models | Use a Dedicated Model deployment for predictable latency | + +--- + +## Reference: minimal end-to-end + +```bash +# 1. Get API key from cloud.phala.com (one-time) +export CONFIDENTIAL_AI_KEY="sk-..." + +# 2. Call +curl https://api.redpill.ai/v1/chat/completions \ + -H "Content-Type: application/json" \ + -H "Authorization: Bearer $CONFIDENTIAL_AI_KEY" \ + -d '{ + "model": "openai/gpt-oss-20b", + "messages": [{"role":"user","content":"Hello world!"}] + }' +``` + +For self-hosted alternative (dedicated GPU + your own model weights), see `gpu-vllm-deploy.md`. 
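+
+The same minimal call from Python, also capturing the signed-receipt headers from **Verify Signature** so they can be checked later per `verify-attestation.md`. Header names are taken from the table in that section; if a response does not carry them, fall back to the per-request attestation report.
+
+```python
+import os
+
+import requests
+
+resp = requests.post(
+    "https://api.redpill.ai/v1/chat/completions",
+    headers={"Authorization": f"Bearer {os.environ['CONFIDENTIAL_AI_KEY']}"},
+    json={
+        "model": "openai/gpt-oss-20b",
+        "messages": [{"role": "user", "content": "Hello world!"}],
+    },
+    timeout=120,
+)
+resp.raise_for_status()
+print(resp.json()["choices"][0]["message"]["content"])
+
+# Archive the signed-receipt headers alongside the response body.
+for h in ("x-phala-receipt-sig", "x-phala-compose-hash", "x-phala-app-id"):
+    print(h, "=", resp.headers.get(h))
+```
+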
diff --git a/skills/usecase/training-run.md b/skills/usecase/training-run.md new file mode 100644 index 00000000..475dc02c --- /dev/null +++ b/skills/usecase/training-run.md @@ -0,0 +1,372 @@ +--- +name: training-run +description: | + Run a confidential training / fine-tuning job on Phala Cloud GPU TEE. + SFT, DPO, RLHF, LoRA / QLoRA / PEFT, continued pre-training, or + multimodal projector training with TRL, Unsloth, or HuggingFace. + Use when users want to train on sealed datasets with attested checkpoints. +--- + +# Confidential Training on Phala GPU TEE + +`phala deploy` a TRL/Unsloth/HuggingFace training job on H200 with sealed datasets and signed checkpoint output. + +## Operations + +| User says | Operation | +|---|---| +| "fine-tune Llama", "SFT my model" | **First Run (SFT)** | +| "DPO", "preference tuning", "RLHF" | **DPO / RLHF** | +| "LoRA", "QLoRA", "PEFT" | **LoRA / PEFT** | +| "continued pre-training", "domain adaptation" | **Continued PT** | +| "multimodal", "vision adapter" | **Multimodal** | +| "seal the dataset", "private data" | **Seal Dataset** | +| "save checkpoints", "signed manifest" | **Output & Signing** | +| "scale to 8 GPU" | **Multi-GPU** | + +This skill builds on the foundational `../phala-cli/SKILL.md`. Install + login per that skill first. + +--- + +## Choose Method + +| Method | Trainer | Best for | Memory | +|---|---|---|---| +| SFT | TRL `SFTTrainer`, Unsloth | Instruction tuning on chat data | Full / LoRA | +| DPO | TRL `DPOTrainer` | Preference tuning (chosen/rejected pairs) | Reference + policy | +| RLHF | TRL `PPOTrainer` + reward model | Online RL from prefs | Heaviest | +| LoRA / PEFT | TRL + PEFT, Unsloth | Cost-efficient fine-tune | Tiny — 7B fits in 1 H200 | +| QLoRA | Unsloth, BitsAndBytes | 4-bit base + LoRA adapters | Smallest | +| Continued PT | Unsloth, raw HF Trainer | Domain adaptation on raw text | Medium | +| Multimodal | TRL + projector head | Adding vision / audio to LLM | Medium | + +Most teams start with **LoRA on a 7-13B model** (`h200.small`, $3.50/hr). + +--- + +## Scaffold + +### Step 1: Project layout + +```bash +mkdir my-finetune && cd my-finetune +``` + +``` +my-finetune/ +├── docker-compose.yml +├── train/ +│ ├── Dockerfile +│ ├── requirements.txt +│ └── train.py # the actual training script +├── data/ # local-only, sealed dataset +│ └── dataset.tar.gz.enc +├── .env.example +└── .env # gitignored +``` + +### Step 2: `train/Dockerfile` + +```dockerfile +FROM nvcr.io/nvidia/pytorch:24.10-py3 +RUN pip install transformers trl peft accelerate bitsandbytes datasets unsloth +COPY train.py /app/train.py +WORKDIR /app +CMD ["python", "train.py"] +``` + +### Step 3: `train/train.py` (LoRA SFT example) + +```python +from trl import SFTTrainer, SFTConfig +from peft import LoraConfig +from datasets import load_dataset +from transformers import AutoModelForCausalLM, AutoTokenizer +import os + +model_id = os.environ["BASE_MODEL"] # e.g. 
meta-llama/Llama-3.1-8B-Instruct +dataset_path = os.environ["DATASET_PATH"] # mounted volume path +output_dir = os.environ.get("OUTPUT_DIR", "/output") + +tok = AutoTokenizer.from_pretrained(model_id) +model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto") +ds = load_dataset("json", data_files=dataset_path, split="train") + +trainer = SFTTrainer( + model=model, + tokenizer=tok, + train_dataset=ds, + peft_config=LoraConfig(r=32, lora_alpha=64, target_modules="all-linear"), + args=SFTConfig( + output_dir=output_dir, + num_train_epochs=3, + per_device_train_batch_size=2, + save_strategy="epoch", + ), +) +trainer.train() +trainer.save_model(output_dir) +``` + +### Step 4: `docker-compose.yml` + +```yaml +services: + trainer: + build: ./train + environment: + - BASE_MODEL=meta-llama/Llama-3.1-8B-Instruct + - DATASET_PATH=/data/dataset.jsonl + - OUTPUT_DIR=/output + - HF_TOKEN=${HF_TOKEN} + - DATASET_KEY=${DATASET_KEY} # for in-CVM dataset decryption + volumes: + - sealed-data:/data + - checkpoints:/output + - /var/run/dstack.sock:/var/run/dstack.sock + deploy: + resources: + reservations: + devices: + - driver: nvidia + count: all + capabilities: [gpu] +volumes: + sealed-data: + checkpoints: +``` + +--- + +## Seal Dataset + +The dataset never leaves your laptop in cleartext. + +### Step 1: Encrypt locally with HKDF + +```python +# scripts/seal-dataset.py +import os, hashlib, hmac, json +from cryptography.hazmat.primitives.ciphers.aead import AESGCM + +KMS_PUBKEY = open("kms-pubkey.txt").read().strip() +APP_ID = os.environ["APP_ID"] +COMPOSE_HASH = os.environ["COMPOSE_HASH"] + +def hkdf(material: bytes, info: bytes, length: int = 32) -> bytes: + return hmac.new(material, info, hashlib.sha256).digest()[:length] + +key = hkdf(KMS_PUBKEY.encode(), f"{APP_ID}:{COMPOSE_HASH}".encode()) +aes = AESGCM(key) +nonce = os.urandom(12) +plaintext = open("dataset.jsonl", "rb").read() +ct = aes.encrypt(nonce, plaintext, None) +open("dataset.jsonl.enc", "wb").write(nonce + ct) +``` + +### Step 2: Upload encrypted blob + +The encrypted file ships with the compose. Or to S3 (decrypt inside CVM via `--pre-launch-script`). + +### Step 3: Decrypt inside the CVM + +The CVM's per-app key is derived only after attestation passes. Mirror the HKDF to re-derive the same key inside the CVM: + +```python +# Inside train.py — pip install dstack-sdk cryptography +import os +from cryptography.hazmat.primitives.ciphers.aead import AESGCM +from dstack_sdk import DstackClient + +client = DstackClient() # auto-connects to /var/run/dstack.sock +derived = client.get_key("dataset/wrap", os.environ["COMPOSE_HASH"]) +key = derived.decode_key() # 32-byte secp256k1 → use as AES key + +aes = AESGCM(key) +blob = open("/data/dataset.jsonl.enc", "rb").read() +plaintext = aes.decrypt(blob[:12], blob[12:], None) +open("/tmp/dataset.jsonl", "wb").write(plaintext) +``` + +If the compose-hash doesn't match the registered hash, the derived key is wrong and decryption fails. Stolen ciphertext is useless. 
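+
+Before uploading, it is cheap to sanity-check the sealed blob locally: decrypt it with the same locally derived key and compare against the plaintext. A minimal sketch reusing the `hkdf` helper and env vars from `scripts/seal-dataset.py` above (the in-CVM path still goes through the dstack SDK as shown):
+
+```python
+# scripts/check-seal.py - local round-trip test before shipping dataset.jsonl.enc
+import hashlib
+import hmac
+import os
+
+from cryptography.hazmat.primitives.ciphers.aead import AESGCM
+
+KMS_PUBKEY = open("kms-pubkey.txt").read().strip()
+APP_ID = os.environ["APP_ID"]
+COMPOSE_HASH = os.environ["COMPOSE_HASH"]
+
+def hkdf(material: bytes, info: bytes, length: int = 32) -> bytes:
+    # Same truncated HMAC-SHA256 derivation as seal-dataset.py
+    return hmac.new(material, info, hashlib.sha256).digest()[:length]
+
+key = hkdf(KMS_PUBKEY.encode(), f"{APP_ID}:{COMPOSE_HASH}".encode())
+blob = open("dataset.jsonl.enc", "rb").read()
+plaintext = AESGCM(key).decrypt(blob[:12], blob[12:], None)  # nonce is the first 12 bytes
+
+assert plaintext == open("dataset.jsonl", "rb").read()
+print("round-trip OK, dataset.jsonl.enc is safe to upload")
+```
+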
+ +--- + +## First Run (SFT) + +### Step 1: Pick GPU shape + +| Model size | Type | Per-device batch | GPU | +|---|---|---|---| +| 7B LoRA | `h200.small` | 4-8 | 1× H200 | +| 13B LoRA | `h200.small` | 2-4 | 1× H200 | +| 70B LoRA | `h200.16xlarge` | 1-2 | 8× H200 | +| 7B full SFT | `h200.16xlarge` | 1 | 8× H200 | + +### Step 2: Deploy + +```bash +phala deploy -n llama-sft -c docker-compose.yml -e .env -t h200.small --kms phala --wait +``` + +### Step 3: Stream logs + +```bash +phala logs -f +``` + +Look for `loss=...` decreasing. Training duration depends on dataset size and method — typical small-data LoRA finishes in 1-3 hours. + +--- + +## Multi-GPU + +For 70B SFT or larger: + +### Update `train.py` to use accelerate launcher + +```python +# Replace the simple Trainer with deepspeed / FSDP via accelerate +``` + +### Update compose command + +```yaml +command: ["accelerate", "launch", "--multi_gpu", "--num_processes=8", "train.py"] +``` + +### Deploy on 8× H200 + +```bash +phala deploy -n llama-70b-sft -c docker-compose.yml -e .env -t h200.16xlarge --kms phala --wait +``` + +--- + +## DPO / RLHF + +Swap `SFTTrainer` for `DPOTrainer`: + +```python +from trl import DPOTrainer, DPOConfig + +trainer = DPOTrainer( + model=model, + ref_model=None, # uses peft adapters disabled + tokenizer=tok, + train_dataset=ds, # must have chosen/rejected fields + args=DPOConfig(output_dir=output_dir, beta=0.1, num_train_epochs=1), +) +trainer.train() +``` + +For PPO/RLHF, follow TRL's `PPOTrainer` recipe — same compose, swap script. + +--- + +## LoRA / PEFT + +Already shown in **Scaffold**. Adapter files land in `/output/adapter_*` — much smaller than full checkpoints (typically 50-500 MB). + +--- + +## Continued PT + +Use Unsloth's `from_pretrained` + raw `Trainer`: + +```python +from unsloth import FastLanguageModel +model, tok = FastLanguageModel.from_pretrained(model_id, load_in_4bit=True) +# ... raw text dataset, MaskedLM-style training ... +``` + +--- + +## Output & Signing + +After training completes, the output dir contains: +- `pytorch_model.bin` / `safetensors` (weights) +- `config.json` +- `tokenizer.json` (if shipped) + +### Sign the manifest + +Use the per-app key (via `dstack-guest-agent`) to sign a manifest containing checkpoint hashes: + +```python +import json, hashlib, time +from dstack_sdk import DstackClient + +client = DstackClient() +ckpt = open("/output/pytorch_model.bin", "rb").read() +manifest = { + "compose_hash": os.environ["COMPOSE_HASH"], + "base_model": os.environ["BASE_MODEL"], + "checkpoint_sha256": hashlib.sha256(ckpt).hexdigest(), + "ts": int(time.time()), +} + +# Bind the manifest into a fresh TDX quote — `application_data` is the +# 64-byte report_data the verifier checks. +report = hashlib.sha256(json.dumps(manifest, sort_keys=True).encode()).digest() +quote = client.get_quote(report) + +open("/output/manifest.signed.json", "w").write(json.dumps({ + **manifest, + "quote": quote.quote, + "event_log": quote.event_log, +})) +``` + +The signature chains to the TDX root + on-chain `DstackApp.sol` entry. Auditors verify offline. + +### Pull the artifacts + +```bash +phala cp :/output/ ./checkpoints/ -r +``` + +--- + +## Verify + +```bash +phala cvms attestation llama-sft --json > attestation.json +phala ssh -- nvidia-smi conf-compute -q # ConfComputeMode : ON +``` + +Then run the full verification flow per `verify-attestation.md` — Intel TDX + NVIDIA NRAS + report-data binding + compose-hash. 
The `manifest.signed.json` quote can be verified the same way: its `report_data` binds the manifest hash to a fresh TDX quote. + +--- + +## Troubleshooting + +| Symptom | Cause | Fix | +|---|---|---| +| OOM on 7B SFT | Batch too big | Reduce `per_device_train_batch_size` or use `gradient_accumulation_steps` | +| Slow throughput | Single-GPU FP16 70B | Move to `h200.16xlarge` + tensor parallel | +| `HF_TOKEN` 401 | Token doesn't have model access | Accept the model license on HF, regenerate token | +| Decrypt fails | Compose-hash mismatch | First deploy registers the hash; subsequent deploys must use the same compose | +| Signature fails | Wrong app-id | `phala status` to confirm workspace; pubkey must match the running app | + +--- + +## Reference: minimal end-to-end + +```bash +# 1. Scaffold + seal dataset +mkdir my-finetune && cd my-finetune +# (write Dockerfile + train.py + compose + .env) +python scripts/seal-dataset.py + +# 2. Auth + deploy +phala login +phala deploy -n llama-sft -c docker-compose.yml -e .env -t h200.small --kms phala --wait + +# 3. Watch + verify +phala logs -f +phala cvms attestation --json | jq +phala cp :/output/manifest.signed.json ./ +``` + +The output is a signed checkpoint that any auditor can verify — no need to trust the trainer or Phala. diff --git a/skills/usecase/verify-attestation.md b/skills/usecase/verify-attestation.md new file mode 100644 index 00000000..752dfd7b --- /dev/null +++ b/skills/usecase/verify-attestation.md @@ -0,0 +1,349 @@ +--- +name: verify-attestation +description: | + Verify a Phala Cloud TEE attestation end-to-end — Intel TDX quote + to Intel root, NVIDIA GPU quote to NVIDIA root, report-data binding + the signing key + nonce, OS image hash to dstack-os reproducible + build, compose-hash to expected app, and Sigstore provenance for + container images. Use this whenever a user asks "how do I verify + this is really running in TEE?" +--- + +# Verify TEE Attestation + +The hardware-rooted proof flow that other skills reference. Every step gives a separate cryptographic guarantee — together they prove a Phala Cloud workload is running on genuine TEE hardware with the exact code you registered. + +## Operations + +| User says | Operation | +|---|---| +| "is this really in TEE?", "verify the CVM" | **Quick Check** | +| "verify the GPU TEE", "NVIDIA quote" | **Verify NVIDIA GPU** | +| "verify the TDX quote", "Intel root" | **Verify Intel TDX** | +| "fresh nonce", "replay attack" | **Nonce Binding** | +| "OS image hash", "reproducible build" | **OS Image Verification** | +| "compose-hash matches", "exact code running" | **Compose Manifest** | +| "verify the response signature" | **Verify Signature** | +| "offline verifier", "no internet" | **Offline Verification** | + +> **Authoritative doc:** [docs.phala.com/phala-cloud/confidential-ai/verify](https://docs.phala.com/phala-cloud/confidential-ai/verify) is the source of truth. This skill summarizes it as runnable steps. + +--- + +## Quick Check + +If you just want to confirm "yes, this CVM is running in TEE": + +```bash +# Summary +phala cvms attestation my-app + +# Full JSON (for programmatic verification) +phala cvms attestation my-app --json > attestation.json +``` + +The summary should report `is_online: true`, `is_public: true`, and `error: null`. The JSON contains `app_certificates[0].quote` — a hex-encoded TDX quote that's the basis for everything else below. 
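+
+To turn the quick check into a pass/fail without reading the JSON by hand, post that quote to the same verify endpoint used in **Verify Intel TDX** below. A minimal sketch:
+
+```python
+import json
+
+import requests
+
+# Uses the file saved by `phala cvms attestation my-app --json > attestation.json`
+attestation = json.load(open("attestation.json"))
+quote_hex = attestation["app_certificates"][0]["quote"]
+
+result = requests.post(
+    "https://cloud-api.phala.com/api/v1/attestations/verify",
+    json={"hex": quote_hex},
+    timeout=60,
+).json()
+assert result["quote"]["verified"] is True
+print("TDX quote verified")
+```
+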
+ +For inference (hosted Confidential AI API), use the per-request flow: + +```bash +curl "https://api.redpill.ai/v1/attestation/report?model=phala/deepseek-chat-v3-0324&nonce=$(openssl rand -hex 32)" \ + -H "Authorization: Bearer $CONFIDENTIAL_AI_KEY" > report.json +``` + +The response includes `nvidia_payload`, `intel_quote`, `signing_address`, and `signing_algo`. + +--- + +## Why every step + +| Risk | Step that catches it | +|---|---| +| Replayed attestation from old/compromised hardware | **Nonce binding** — fresh per request | +| Counterfeit CPU pretending to be Intel TDX | **Verify Intel TDX** via DCAP / Phala verify endpoint | +| Counterfeit GPU pretending to be H100/H200 | **Verify NVIDIA** via NRAS | +| Signing key not actually inside the TEE | **Report-data binding** — first 32 bytes of `reportdata` = signing key | +| Operator swapped your code post-boot | **Compose manifest hash** — `mr_config` includes compose-hash | +| Operator swapped the OS image | **OS image hash** — matches dstack-os reproducible build | +| Container image swapped at registry | **Sigstore provenance** — built from expected source | + +Skip any step → that risk is unguarded. + +--- + +## Nonce Binding + +Generate a fresh 32-byte nonce per attestation request. The TEE embeds this nonce into the report — replayed quotes won't match. + +```python +import secrets +request_nonce = secrets.token_hex(32) # 64 hex chars +``` + +Pass the nonce when you fetch the attestation report: + +```python +import requests +report = requests.get( + f"https://api.redpill.ai/v1/attestation/report?model={model}&nonce={request_nonce}", + headers={"Authorization": f"Bearer {api_key}"}, +).json() +``` + +For app-CVM attestation (not hosted-API), the nonce is bound at handshake time via RA-TLS, and you verify by extracting `report_data` from the TLS cert's TDX-quote extension. + +--- + +## Verify NVIDIA GPU + +Only NVIDIA can confirm their hardware is genuine — secret keys baked into each chip at manufacturing. + +```python +import json, base64, requests + +gpu_payload = json.loads(report["nvidia_payload"]) +assert gpu_payload["nonce"].lower() == request_nonce.lower() # check fresh + +# Send to NVIDIA Remote Attestation Service (NRAS) +r = requests.post("https://nras.attestation.nvidia.com/v3/attest/gpu", json=gpu_payload) +result = r.json() + +# Decode the JWT verdict +jwt_token = result[0][1] +payload_b64 = jwt_token.split(".")[1] +padded = payload_b64 + "=" * ((4 - len(payload_b64) % 4) % 4) +verdict = json.loads(base64.urlsafe_b64decode(padded)) + +assert verdict["x-nvidia-overall-att-result"] is True +``` + +A passing NRAS verdict means the GPU silicon is genuine NVIDIA, the firmware is signed, and Confidential Compute mode is active. + +--- + +## Verify Intel TDX + +Two paths — an online verifier service (easy) or local DCAP verification (offline). + +### Online (Phala verifier) + +```python +intel_result = requests.post( + "https://cloud-api.phala.com/api/v1/attestations/verify", + json={"hex": report["intel_quote"]}, +).json() + +assert intel_result["quote"]["verified"] is True +``` + +### Offline (DCAP) + +Use [`dcap-qvl`](https://github.com/Phala-Network/dcap-qvl) — Phala's open-source DCAP quote verifier: + +```bash +cargo install --git https://github.com/Phala-Network/dcap-qvl +echo "$INTEL_QUOTE_HEX" | xxd -r -p > quote.bin +dcap-qvl verify quote.bin +``` + +This chains to Intel's PCS root cert, no network call to Phala. 
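+
+If you are starting from a per-request `report.json` rather than a raw hex string, the equivalent of the `xxd -r -p` step in Python (a convenience sketch; `intel_quote` is the field shown in **Quick Check**):
+
+```python
+import json
+
+# Convert the hosted-API report's intel_quote into the raw blob dcap-qvl expects.
+report = json.load(open("report.json"))
+quote_hex = report["intel_quote"].removeprefix("0x")
+open("quote.bin", "wb").write(bytes.fromhex(quote_hex))
+print("wrote quote.bin:", len(quote_hex) // 2, "bytes")
+```
+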
+
+For an interactive sanity check without code, paste the hex `intel_quote` into the [TEE Attestation Explorer](https://proof.t16z.com/) — it decodes the quote and shows TDX version + security features.
+
+---
+
+## Report-Data Binding
+
+The TDX quote's `reportdata` field is 64 bytes the application provides to the hardware at attestation time. Phala packs it as:
+
+| Bytes | Content |
+|---|---|
+| 0–31 | Signing address (ECDSA: 20-byte Eth address right-padded; Ed25519: 32-byte pubkey) |
+| 32–63 | Your request nonce |
+
+Verify both halves match what you expect:
+
+```python
+report_data = bytes.fromhex(intel_result["quote"]["body"]["reportdata"].removeprefix("0x"))
+
+embedded_address = report_data[:32]
+embedded_nonce = report_data[32:64]
+
+if report["signing_algo"] == "ecdsa":
+    addr = bytes.fromhex(report["signing_address"].removeprefix("0x"))
+    assert embedded_address == addr.ljust(32, b"\x00")
+else:
+    pubkey = bytes.fromhex(report["signing_address"])
+    assert embedded_address == pubkey
+
+assert embedded_nonce.hex() == request_nonce
+```
+
+This proves: (1) the signing key was generated inside the TEE — it's bound into hardware-attested report data; (2) the attestation is fresh — it contains your unique nonce; (3) the signing key you'll use for verifying responses actually belongs to this TEE instance.
+
+---
+
+## OS Image Verification
+
+Verify the operating system the CVM booted is the dstack-os reproducible build, not a tampered image.
+
+The TDX `mrtd` and `rtmr0..3` measurements are folded into the quote at boot. Compare them against the expected values from [meta-dstack reproducible builds](https://github.com/Dstack-TEE/meta-dstack#reproducible-build-the-guest-image):
+
+```bash
+tar -xzf dstack-0.5.5.tar.gz
+cat dstack-0.5.5/digest.txt
+# 0b327bcd642788b0517de3ff46d31ebd3847b6c64ea40bacde268bb9f1c8ec83
+```
+
+Then in the verification code, follow the [`osVerification.ts`](https://github.com/Phala-Network/dstack-verifier/blob/95689c41/src/verification/osVerification.ts#L13-L27) pattern from `dstack-verifier` to compute and compare TCB measurements against the digest above.
+
+If even one byte of the OS image differs, the measurements won't match.
+
+---
+
+## Compose Manifest
+
+Verify the running CVM's `app_compose.json` hash matches the registered `compose-hash`. This is what proves "the code I see is the code that's running".
+
+```python
+import json
+from hashlib import sha256
+
+tcb_info = report["info"]["tcb_info"]
+if isinstance(tcb_info, str):
+    tcb_info = json.loads(tcb_info)
+
+app_compose = tcb_info["app_compose"]
+compose_hash = sha256(app_compose.encode()).hexdigest()
+
+# `mr_config` field of TDX quote includes the compose hash, prefixed with "0x01"
+mr_config = intel_result["quote"]["body"]["mrconfig"]
+expected = "0x01" + compose_hash
+assert mr_config.lower().startswith(expected.lower())
+
+# Optional: print the actual docker-compose so the user can review
+docker_compose = json.loads(app_compose)["docker_compose_file"]
+print(docker_compose)
+```
+
+This proves the CVM booted with exactly the docker-compose you registered. The operator can't swap services, change images, or inject env vars after boot.
+
+---
+
+## Sigstore Provenance
+
+Verify each container image in the compose was built from a known source repo (not a backdoor pushed to the registry).
+ +```python +import re, requests + +digests = set(re.findall(r'@sha256:([0-9a-f]{64})', docker_compose)) + +for digest in digests: + sigstore_url = f"https://search.sigstore.dev/?hash=sha256:{digest}" + r = requests.head(sigstore_url, timeout=10) + if r.status_code < 400: + print(f"✓ {sigstore_url}") + else: + print(f"✗ {sigstore_url} (HTTP {r.status_code})") +``` + +A passing Sigstore link lets the user open the URL and confirm the image was built from the expected GitHub repo, with the expected workflow, by the expected actor. If a digest has no Sigstore record, the user must trust the registry directly — flag this in your verifier UI. + +--- + +## Verify Signature + +Once you've verified the signing key is bound to a real TEE, verify response signatures from the Confidential AI API. + +Every response carries: + +| Header | Meaning | +|---|---| +| `x-phala-receipt-sig` | Signature over `(model_id, prompt_hash, response_hash, ts)` | +| `x-phala-compose-hash` | Compose-hash of the model serving CVM | +| `x-phala-app-id` | Per-app key identity | + +```python +import hashlib +from eth_keys import keys + +prompt_hash = hashlib.sha256(prompt.encode()).hexdigest() +response_hash = hashlib.sha256(response_text.encode()).hexdigest() +payload = f"{model_id}|{prompt_hash}|{response_hash}|{timestamp}" + +# ECDSA verification (signing_algo == "ecdsa") +signature = bytes.fromhex(receipt_sig.removeprefix("0x")) +recovered = keys.ecdsa_recover(hashlib.sha256(payload.encode()).digest(), keys.Signature(signature)) +assert recovered.to_address() == report["signing_address"] +``` + +For Ed25519, use `nacl.signing.VerifyKey(pubkey).verify(payload, signature)` instead. + +This is the final link: the response YOU received is bound to the (verified-genuine) TEE that produced it. + +--- + +## Offline Verification + +If you can't reach Phala's verify endpoint or NRAS: + +| Step | Offline tool | +|---|---| +| Verify Intel TDX quote | [`dcap-qvl`](https://github.com/Phala-Network/dcap-qvl) — chains to Intel PCS root | +| Verify NVIDIA GPU quote | [`nvattest-verifier`](https://github.com/NVIDIA/nvtrust) — NVIDIA's local verifier (chains to NVIDIA root) | +| Verify OS image hash | Compare against `digest.txt` from [meta-dstack release tarball](https://github.com/Dstack-TEE/meta-dstack/releases) | +| Verify compose-hash | sha256 the app_compose JSON locally; check against `mr_config` byte for byte | +| Verify Sigstore record | `cosign verify --certificate-identity-regexp '...' --certificate-oidc-issuer 'https://token.actions.githubusercontent.com' ` | + +End-to-end offline: clone [Phala-Network/dstack-verifier](https://github.com/Phala-Network/dstack-verifier), feed it `attestation.json`, get a single PASS/FAIL. + +--- + +## Reference: minimal end-to-end (Python) + +```python +# Full flow — verify a Confidential AI API response is genuine +import secrets, requests, json, base64, hashlib + +api_key = os.environ["CONFIDENTIAL_AI_KEY"] +model = "phala/deepseek-chat-v3-0324" + +# 1. Fresh nonce +nonce = secrets.token_hex(32) + +# 2. Get attestation report +report = requests.get( + f"https://api.redpill.ai/v1/attestation/report?model={model}&nonce={nonce}", + headers={"Authorization": f"Bearer {api_key}"}, +).json() + +# 3. 
Verify NVIDIA GPU +gpu_payload = json.loads(report["nvidia_payload"]) +assert gpu_payload["nonce"].lower() == nonce.lower() +nras = requests.post("https://nras.attestation.nvidia.com/v3/attest/gpu", json=gpu_payload).json() +verdict = json.loads(base64.urlsafe_b64decode(nras[0][1].split(".")[1] + "==")) +assert verdict["x-nvidia-overall-att-result"] is True + +# 4. Verify Intel TDX +intel = requests.post( + "https://cloud-api.phala.com/api/v1/attestations/verify", + json={"hex": report["intel_quote"]}, +).json() +assert intel["quote"]["verified"] is True + +# 5. Verify report-data binding +rd = bytes.fromhex(intel["quote"]["body"]["reportdata"].removeprefix("0x")) +addr = bytes.fromhex(report["signing_address"].removeprefix("0x")) +assert rd[:32] == addr.ljust(32, b"\x00") +assert rd[32:64].hex() == nonce + +# 6. Verify compose-hash +tcb = report["info"]["tcb_info"] +if isinstance(tcb, str): tcb = json.loads(tcb) +ch = hashlib.sha256(tcb["app_compose"].encode()).hexdigest() +assert intel["quote"]["body"]["mrconfig"].lower().startswith(("0x01" + ch).lower()) + +print("ALL VERIFIED — request to", model, "ran on genuine GPU TEE with the expected code.") +``` + +A reference implementation lives at [`Phala-Network/private-ml-sdk/vllm-proxy/verifiers/attestation_verifier.py`](https://github.com/Phala-Network/private-ml-sdk/blob/main/vllm-proxy/verifiers/attestation_verifier.py) — copy it, point at your model + key, run.