diff --git a/skills/usecase/agent-deploy.md b/skills/usecase/agent-deploy.md new file mode 100644 index 00000000..0548332a --- /dev/null +++ b/skills/usecase/agent-deploy.md @@ -0,0 +1,352 @@ +--- +name: agent-deploy +description: | + Deploy a confidential AI agent to Phala Cloud — a Claude Code wrapper, + Codex agent, MCP server, autonomous bot, anything with sealed API keys + and tool calls. Use when users want to ship an agent with credentials + sealed in a TEE and a verifiable Sign-RPC action log. +--- + +# Confidential AI Agent on Phala Cloud + +`phala deploy` an agent CVM with sealed credentials and a verifiable action log. + +## Operations + +| User says | Operation | +|---|---| +| "deploy an agent", "ship my agent", "上线 agent" | **First Deploy** | +| "scaffold a new agent", "create agent project" | **Scaffold** | +| "seal the API key", "credentials leak", "secrets" | **Seal Secrets** | +| "verify agent identity", "RA-TLS", "TDX quote" | **Verify Identity** | +| "audit tool calls", "Sign-RPC log", "what did the agent do" | **Action Log** | +| "deploy 10 agents", "fleet", "many agents" | **Multi-Agent Fleet** | + +This skill builds on the foundational `../phala-cli/SKILL.md`. Install + login per that skill first. + +--- + +## Scaffold + +A confidential agent is a Docker container that: + +1. Reads sealed credentials from env vars (decrypted only inside the CVM at boot) +2. Calls its tools (OpenAI, GitHub, Slack, etc.) from inside the TEE +3. Emits a Sign-RPC log of every tool call (signature chains to the TDX root) + +### Step 1: Project layout + +```bash +mkdir my-agent && cd my-agent +``` + +``` +my-agent/ +├── docker-compose.yml # CVM definition +├── .env.example # which sealed vars the agent expects +├── .env # local-only, gitignored (real secrets) +├── agent/ +│ ├── Dockerfile +│ ├── requirements.txt +│ └── main.py # the agent loop +└── README.md +``` + +### Step 2: `docker-compose.yml` + +```yaml +services: + agent: + image: ghcr.io//my-agent:latest # publicly pullable, OR set DSTACK_DOCKER_USERNAME/PASSWORD + restart: unless-stopped + environment: + - OPENAI_API_KEY=${OPENAI_API_KEY} + - GITHUB_TOKEN=${GITHUB_TOKEN} + - AGENT_NAME=${AGENT_NAME:-my-agent} + volumes: + - /var/run/dstack.sock:/var/run/dstack.sock # for Sign-RPC + KMS access + ports: + - "8080:8080" +``` + +The `dstack.sock` mount gives the container access to: +- `dstack-guest-agent` for Sign-RPC signing +- KMS to derive per-app keys +- Attestation quotes on demand + +### Step 3: `.env.example` + +``` +OPENAI_API_KEY=sk-replace-me +GITHUB_TOKEN=ghp_replace-me +AGENT_NAME=my-agent +``` + +Commit this. Never commit the real `.env` — that goes to Phala via `-e`. + +--- + +## Seal Secrets + +The `phala deploy -e` flag seals env vars to the registered compose-hash. Stolen ciphertext is useless — keys only re-derive inside an attested CVM whose compose-hash matches. + +### Step 1: Local `.env` + +``` +OPENAI_API_KEY=sk-real-value-here +GITHUB_TOKEN=ghp_real-value-here +AGENT_NAME=my-trading-agent +``` + +### Step 2: Pass at deploy + +```bash +phala deploy -n my-agent -c docker-compose.yml -e .env --kms phala +``` + +Or inline: + +```bash +phala deploy -n my-agent -c docker-compose.yml \ + -e OPENAI_API_KEY=sk-... \ + -e GITHUB_TOKEN=ghp-... \ + --kms phala +``` + +`--kms phala` (default) seals to Phala's managed KMS. For ETH multi-sig gating, use `--kms ethereum` with `--private-key` and `--rpc-url`. 
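Inside the CVM the sealing is transparent to your code — the variables arrive already decrypted, so the agent reads them as ordinary environment variables. A minimal sketch of the agent entrypoint (illustrative; the variable names match `.env.example` above):

```python
# agent/main.py — sealed vars are decrypted at boot inside the TEE;
# plaintext exists only in CVM memory and is never visible to the host.
import os

OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]    # sealed via `phala deploy -e`
GITHUB_TOKEN = os.environ["GITHUB_TOKEN"]
AGENT_NAME = os.getenv("AGENT_NAME", "my-agent")

def main() -> None:
    # ... build your OpenAI / GitHub clients with the unsealed creds here ...
    print(f"{AGENT_NAME} booted with sealed credentials")

if __name__ == "__main__":
    main()
```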
+ +--- + +## First Deploy + +### Step 1: Authenticate + +Per `../phala-cli/SKILL.md`: + +```bash +phala login +``` + +### Step 2: Pick instance type + +Most agents fit a small CPU TEE: + +```bash +phala instance-types +# pick tdx.small ($0.058/hr) for light agents +# tdx.medium for tool-heavy / memory-hungry agents +# h200.small ($3.50/hr) only if the agent runs local inference +``` + +### Step 3: Deploy + +```bash +phala deploy -n my-agent -c docker-compose.yml -e .env -t tdx.medium --kms phala --wait +``` + +`--wait` blocks until the CVM is ready (essential in CI). + +### Step 4: Link the directory + +```bash +phala link +git add phala.toml # safe to commit, contains no secrets +``` + +After `link`, all subsequent `phala` commands target this CVM without `-n`. + +### Step 5: Verify + +```bash +phala ps # containers running? +phala logs -f # agent output +phala cvms attestation # TDX quote — proves the CVM is genuine +``` + +--- + +## Verify Identity + +The CVM's identity is its compose-hash. Every Sign-RPC signature chains to the TDX root + this compose-hash. + +### Pull the attestation + +```bash +# Get the full cert chain + TDX quote +phala cvms attestation --json | jq '.app_certificates[0].quote' + +# Or summary form +phala cvms attestation +``` + +The response shape: + +```json +{ + "success": true, + "is_online": true, + "is_public": true, + "app_certificates": [ + { "subject": {...}, "issuer": {...}, "quote": "0400...", "app_id": "..." }, + ... + ] +} +``` + +The hex `quote` decodes into a TDX quote containing: +- `mrtd` — TDX measurement (the firmware identity) +- `rtmr0..3` — runtime measurements (kernel, initrd, compose hash) +- `report_data` — your app-specific binding + +### Verify offline + +For the full step-by-step verification flow (Intel TDX root + NVIDIA root + report-data binding + compose-hash + Sigstore provenance), follow `verify-attestation.md`. + +The minimum check: + +```bash +phala cvms attestation my-agent --json > attestation.json +QUOTE=$(jq -r '.app_certificates[0].quote' attestation.json) +curl -sX POST "https://cloud-api.phala.com/api/v1/attestations/verify" \ + -H "Content-Type: application/json" \ + -d "{\"hex\": \"$QUOTE\"}" | jq '.quote.verified' +# Expect: true +``` + +A passing quote + matching compose-hash = the running agent IS the build you registered. + +--- + +## Action Log + +Every tool call the agent makes can be signed via Sign-RPC inside the CVM. The signature chains to the per-app key derived from KMS. + +### From inside the agent (Python) + +```python +import socket, json + +def sign_action(payload: dict) -> str: + """Send to dstack-guest-agent over the Unix socket; receive signature.""" + s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) + s.connect("/var/run/dstack.sock") + s.sendall(json.dumps({"method": "sign", "params": payload}).encode()) + return json.loads(s.recv(4096))["signature"] + +# Wrap every tool call +sig = sign_action({"tool": "github.create_issue", "args_hash": "0xab12..."}) +``` + +### Read the log later + +The signed log is emitted to stdout (or your sink). Stream it: + +```bash +phala logs -f --since 1h | jq 'select(.sign_rpc)' +``` + +### Verify the log offline + +Each entry includes a signature that anyone can verify against the per-app pubkey (derived from the compose-hash). + +--- + +## Multi-Agent Fleet + +Deploy N parallel agents, each with their own compose-hash + sealed creds. They attest each other via mutual RA-TLS. 
+ +### Step 1: Per-agent compose + +Each agent gets a slightly different `docker-compose.yml` (different image, different env). Different compose = different `compose-hash` = different identity. + +### Step 2: Deploy in a loop + +```bash +for AGENT in researcher coder triager; do + phala deploy -n $AGENT -c compose/$AGENT.yml -e env/$AGENT.env --wait +done +``` + +### Step 3: Mutual RA-TLS between them + +Each agent CVM gets a public endpoint shaped like +`https://-.` — the exact gateway domain +is per-cluster (e.g. `dstack-pha-prod12.phala.network`). Get it live: + +```bash +phala cvms get my-agent --json | jq -r '.endpoints[0].app' +``` + +Each cert carries the peer's TDX quote in an X.509 extension — TLS handshake +AND attestation in one handshake. + +### Step 4: List the fleet + +```bash +phala apps +# or +phala apps --search trading +``` + +--- + +## Common patterns + +### Wrap an existing CLI agent (Claude Code, Codex) + +The agent runs as a long-lived service that exposes a tool API on port 8080. The tool API: +1. Receives a task from a user +2. Resolves OpenAI / GitHub creds from sealed env +3. Calls Claude Code / Codex internally +4. Emits Sign-RPC log of every tool call + +Compose example: a `claude-code` image + an `nginx` reverse proxy with TLS. + +### MCP server + +Deploy any MCP server image (e.g., `bluenexus/mcp-search`) with mutual RA-TLS. Clients verify the server's TDX quote before sending requests. + +### Pre-launch script + +Need to download model weights or warm a cache before the agent starts? + +```bash +phala deploy ... --pre-launch-script ./bootstrap.sh +``` + +The script runs once inside the CVM after attestation, before containers start. + +--- + +## Troubleshooting + +| Symptom | Cause | Fix | +|---|---|---| +| `manifest unknown` in serial logs | Image not pullable | Push to a public registry, OR set `DSTACK_DOCKER_USERNAME` + `DSTACK_DOCKER_PASSWORD` in `.env` | +| Container restarts immediately | Missing env var | Run `phala logs my-agent` — check for `KeyError` / `undefined` env | +| Agent can't reach `dstack.sock` | Volume not mounted | Add `- /var/run/dstack.sock:/var/run/dstack.sock` to compose | +| Sign-RPC returns 401 | KMS doesn't recognize compose-hash | Re-deploy — first deploy registers the hash | +| Tool calls failing intermittently | Outbound network blocked | Check serial logs (`phala logs --serial`) — `dstack-gateway` allowlist may need updating | + +For deeper debugging, see **Debug a CVM** in `../phala-cli/SKILL.md`. + +--- + +## Reference: minimal end-to-end + +```bash +# 1. Scaffold +mkdir my-agent && cd my-agent +# (write docker-compose.yml + .env) + +# 2. Auth + deploy +phala login +phala deploy -n my-agent -c docker-compose.yml -e .env -t tdx.medium --kms phala --wait + +# 3. Link + verify +phala link +phala cvms attestation --json > attestation.json +phala logs -f +``` + +Done — the agent is live, its credentials are sealed, every tool call is signed, and anyone can verify the binding offline. diff --git a/skills/usecase/cloud-migration.md b/skills/usecase/cloud-migration.md new file mode 100644 index 00000000..0c9db5a2 --- /dev/null +++ b/skills/usecase/cloud-migration.md @@ -0,0 +1,273 @@ +--- +name: cloud-migration +description: | + Migrate a confidential workload from AWS Nitro Enclaves, GCP + Confidential VMs, or Tinfoil to Phala Cloud. Use when users have an + existing TEE workload elsewhere and want to move it — covers auth, + compose adaptation, attestation diff, and cutover. 
+--- + +# Migrate to Phala Cloud + +Port a confidential workload to Phala from another TEE provider. + +## Operations + +| User says | Operation | +|---|---| +| "migrate from AWS Nitro" | **From AWS Nitro** | +| "migrate from GCP CC VM" | **From GCP** | +| "migrate from Tinfoil" | **From Tinfoil** | +| "general migration", "where to start" | **Diff Map** | +| "cutover", "DNS switch" | **Cutover** | + +This skill builds on the foundational `../phala-cli/SKILL.md`. Install + login per that skill first. + +--- + +## Diff Map + +| Concern | AWS Nitro Enclaves | GCP Confidential VM | Tinfoil | Phala Cloud | +|---|---|---|---|---| +| Hardware TEE | AWS Nitro hypervisor | AMD SEV-SNP / Intel TDX | TDX (managed) | Intel TDX + NVIDIA NV-CSE (GPU) | +| Container model | EIF (Enclave Image Format) | Standard VM, you bring TEE-aware images | Custom AMI | Standard `docker-compose.yml` | +| Auth | IAM roles + KMS | gcloud + IAM | proprietary CLI | `phala login` (device flow) | +| Deploy | `nitro-cli build-enclave` + `run-enclave` | `gcloud compute instances create --confidential-compute` | proprietary | `phala deploy -c docker-compose.yml` | +| Secrets | KMS + parent instance | Cloud KMS | proprietary | `phala deploy -e .env --kms phala` (sealed to compose-hash) | +| Attestation | NSM PCRs (PCR0/1/2) | `vTPM` quote | proprietary | TDX quote + on-chain registry | +| Verify offline | AWS-signed PCRs | GCP attestation library | trust the provider | DCAP → Intel root + `DstackApp.sol` (anyone can verify) | +| GPU TEE | not natively | not yet (preview) | yes (limited) | yes (H200 today, more SKUs coming) | +| Multi-party | bilateral DPAs | Confidential Space (workload identity) | n/a | multi-sig DstackApp + on-chain compose-hash | + +The biggest deltas: +1. **Compose vs custom image format** — dstack runs vanilla Docker Compose; no need to build an EIF or AMI. +2. **Sealed env vars** — Phala's `--kms phala` flag seals env to the compose-hash automatically. +3. **Verifiable offline** — Phala attestation chains to Intel/NVIDIA roots + on-chain registry; no need to trust the provider. + +--- + +## From AWS Nitro + +### Step 1: Convert EIF → Docker Compose + +Nitro EIFs are typically built from a `Dockerfile` already. Reuse the same Dockerfile, package as a compose service: + +```yaml +# was: nitro-cli build-enclave --docker-uri myapp:latest +# becomes: +services: + app: + image: myapp:latest + restart: unless-stopped + ports: + - "8080:8080" + volumes: + - /var/run/dstack.sock:/var/run/dstack.sock + environment: + - YOUR_VARS=${YOUR_VARS} +``` + +### Step 2: Migrate KMS + +AWS KMS calls go to your provider's KMS. With Phala: + +```python +# was: boto3.client('kms').decrypt(CiphertextBlob=blob) +# becomes (inside the CVM): +from dstack_sdk import DstackClient +client = DstackClient() # /var/run/dstack.sock +key = client.get_key("aws-migration", compose_hash).decode_key() +plaintext = AESGCM(key).decrypt(nonce, ct, None) +``` + +The key is derived only after attestation passes — equivalent guarantee to KMS gating, but on-chain auditable. 
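Filled out with imports, the same call looks roughly like this — a sketch that mirrors the call shape above and assumes the sealed blob layout is `12-byte nonce || AES-GCM ciphertext` (as in the sealing examples elsewhere in these skills), with `dstack-sdk` and `cryptography` baked into the image:

```python
# unseal.py — runs inside the CVM; replaces the old boto3 KMS decrypt call.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from dstack_sdk import DstackClient

def unseal(path: str, purpose: str = "aws-migration") -> bytes:
    client = DstackClient()                      # talks to /var/run/dstack.sock
    compose_hash = os.environ["COMPOSE_HASH"]
    key = client.get_key(purpose, compose_hash).decode_key()  # 32-byte per-app key
    blob = open(path, "rb").read()
    nonce, ct = blob[:12], blob[12:]
    return AESGCM(key).decrypt(nonce, ct, None)  # raises if key or blob is wrong

secret = unseal("/data/secret.sealed")
```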
+ +### Step 3: Migrate attestation verification + +```diff +- # AWS NSM PCR verification +- nsm-cli describe-pcr --index 0 +- # client checks PCR0 == expected_hash ++ # Phala attestation ++ phala cvms attestation my-app --json > attestation.json ++ # client checks: TDX quote → Intel root, mrtd matches expected, app_id matches +``` + +### Step 4: Deploy to Phala + +```bash +phala login +phala deploy -n my-app -c docker-compose.yml -e .env --kms phala --wait +phala cvms attestation my-app --json +``` + +### Step 5: Update client code + +Clients that previously verified Nitro PCRs now verify Phala attestation. Follow `verify-attestation.md` for the full flow (Intel TDX root + NVIDIA NRAS + report-data binding + compose-hash). Reference implementation: [`Phala-Network/private-ml-sdk/vllm-proxy/verifiers/attestation_verifier.py`](https://github.com/Phala-Network/private-ml-sdk/blob/main/vllm-proxy/verifiers/attestation_verifier.py). + +--- + +## From GCP + +### Step 1: Compose + +GCP Confidential VMs run regular VM images. Your TEE-aware service runs as a systemd unit or a Docker container. Move it to a compose: + +```yaml +services: + app: + image: gcr.io//:tag + # ... env, volumes, ports as before ... +``` + +GCR images work as long as they're publicly pullable, or use `DSTACK_DOCKER_USERNAME/PASSWORD` for private GCR. + +### Step 2: Migrate Cloud KMS calls + +Replace `gcloud kms decrypt` calls with `dstack-sdk` key derivation (see AWS section above). + +### Step 3: Migrate vTPM attestation + +GCP exposes a vTPM quote via `go-attestation`. Phala provides a TDX quote via `phala cvms attestation`. Both chain to a hardware root; the verification API differs: + +```diff +- # GCP go-attestation +- attest.NewClient(...).Attest(...) ++ # Phala ++ phala cvms attestation my-app --json +``` + +### Step 4: Deploy + +```bash +phala login +phala deploy -n my-app -c docker-compose.yml -e .env --kms phala --wait +``` + +For GPU workloads (which GCP doesn't yet support in CC mode), use `-t h200.small`. + +--- + +## From Tinfoil + +Tinfoil is closest in spirit to Phala — managed TDX, OpenAI-compatible inference for some flows. Migration is mostly endpoint swap + (optionally) self-deploy. + +### Inference users + +If you're calling Tinfoil's API: + +```diff +- base_url = "https://inference.tinfoil.sh/v1" ++ base_url = "https://api.redpill.ai/v1" +``` + +Model names may differ — check `https://redpill.ai/models`. + +### Custom-deploy users + +If you're running a custom container on Tinfoil: + +```bash +# was: tinfoil deploy --image myimage:tag +# becomes: +phala deploy -n my-app -c docker-compose.yml --kms phala --wait +``` + +The compose flow is more flexible than Tinfoil's single-image model — multi-service apps, sealed env, GPU TEE all work natively. + +--- + +## Cutover + +A safe cutover keeps both providers running until verification is solid. + +### Step 1: Deploy to Phala in parallel + +```bash +phala deploy -n my-app-phala -c docker-compose.yml -e .env --kms phala --wait +PHALA_URL=$(phala cvms get my-app-phala --json | jq -r '.endpoints[0]') +``` + +### Step 2: Shadow traffic + +Send 1-5% of production traffic to Phala. Compare: +- Latency (Phala TDX overhead is ~3-5%, GPU CC ~5-7%) +- Output equivalence (same model, same input → same output) +- Attestation availability (`/_phala/attestation` should respond on every request) + +### Step 3: Increase traffic gradually + +10% → 50% → 100% over a week. Monitor your metrics. 
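A quick equivalence spot-check you can run during the ramp — a minimal sketch, assuming both deployments accept the same JSON POST and that `requests` is available (adjust the paths, payload, and placeholder URLs to your API):

```python
# shadow_check.py — send the same request to both providers and compare.
import json
import time
import requests

OLD_URL = "https://old.example.com/infer"                  # existing AWS/GCP/Tinfoil endpoint
NEW_URL = "https://<app-id>-8080.<gateway-domain>/infer"   # from `phala cvms get`
PAYLOAD = {"input": "smoke-test"}

def timed_post(url: str) -> tuple[float, dict]:
    t0 = time.time()
    r = requests.post(url, json=PAYLOAD, timeout=30)
    r.raise_for_status()
    return time.time() - t0, r.json()

old_t, old_out = timed_post(OLD_URL)
new_t, new_out = timed_post(NEW_URL)
print(f"latency old={old_t:.2f}s new={new_t:.2f}s (expect ~3-7% TEE overhead)")
print("outputs match" if old_out == new_out
      else json.dumps({"old": old_out, "new": new_out}, indent=2))
```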
+ +### Step 4: Decommission + +Once 100% on Phala for a stable period: + +```bash +# AWS +nitro-cli terminate-enclave --enclave-id + +# GCP +gcloud compute instances delete + +# Tinfoil +tinfoil delete +``` + +### Step 5: Update DNS + +Point your customer-facing DNS to the Phala endpoint. Real format is +`-.` — pull it live from the CVM JSON: + +```bash +phala cvms get my-app-phala --json | jq -r '.endpoints[0].app' +# e.g. https://e029a4b8...-8080.dstack-pha-prod5.phala.network +``` + +``` +api.example.com. CNAME e029a4b8...-8080.dstack-pha-prod5.phala.network. +``` + +`dstack-gateway` can also bind a custom domain via the dashboard so the URL +doesn't expose the app_id. + +--- + +## Common gotchas + +| Provider | Gotcha | Mitigation | +|---|---|---| +| AWS Nitro | App was using parent-instance file system | Move to a Docker volume (Phala persists volumes by default) | +| AWS Nitro | Used IMDS for IAM creds | Switch to sealed env vars via `phala deploy -e` | +| GCP | Hardcoded `metadata.google.internal` | Replace with sealed env or `dstack-gateway`-routed config | +| Tinfoil | Tinfoil-specific signing extensions | Replace with Phala's Sign-RPC (or generic JWT signed by per-app key) | +| All | Outbound network was unrestricted | Phala routes egress via `dstack-gateway` — check allowlist for your destination domains | + +--- + +## Reference: typical migration + +```bash +# 1. Audit your existing workload +# - what TEE primitive (PCR / vTPM / proprietary)? +# - what KMS calls? +# - what is your verification flow? + +# 2. Adapt to Phala primitives +# - compose file from your existing Dockerfile +# - .env with secrets you used to fetch from KMS +# - swap attestation library to dstack-verifier + +# 3. Deploy in shadow +phala login +phala deploy -n my-app -c docker-compose.yml -e .env --kms phala --wait + +# 4. Verify equivalence +phala cvms attestation my-app --json +# A/B test endpoints + +# 5. Cutover +# DNS swap, decommission old enclave/VM +``` + +For provider-specific deep dives, see the comparison pages on `https://phala.com/compare/`. diff --git a/skills/usecase/data-coanalysis.md b/skills/usecase/data-coanalysis.md new file mode 100644 index 00000000..6708a10c --- /dev/null +++ b/skills/usecase/data-coanalysis.md @@ -0,0 +1,298 @@ +--- +name: data-coanalysis +description: | + Set up multi-party cohort analysis on Phala Cloud — multiple data owners + each seal datasets locally, then a sealed Analysis CVM joins them in + TDX+H200 memory under multi-sig DstackApp approval. Use for healthcare + consortia, financial risk, fraud detection, supply-chain audits — any + case where data must stay at source but compute happens jointly. +--- + +# Multi-Party Confidential Cohort Analysis + +Compute-to-data: sealed datasets stay at source, the model travels, multi-owner approval gates every key release. + +## Operations + +| User says | Operation | +|---|---| +| "set up multi-party analysis", "consortium", "data clean room" | **End-to-End** | +| "seal my dataset", "encrypt at source" | **Owner Sealing** | +| "register on-chain", "multi-sig approval" | **Register & Approve** | +| "deploy the analysis CVM" | **Deploy Analysis** | +| "differential privacy aggregate", "DP" | **DP Output** | +| "revoke", "stop the analysis" | **Revoke** | + +This skill builds on the foundational `../phala-cli/SKILL.md`. Install + login per that skill first. 
+ +--- + +## Architecture (skim this once) + +``` +Owner A laptop Owner B laptop +[seal-cli encrypts ds-A.jsonl] [seal-cli encrypts ds-B.jsonl] + | | + +-> S3/IPFS ciphertext blobs <--------+ + | + v + Analysis CVM (TDX + H200) + - reads blobs + - calls KMS via RA-TLS for per-dataset keys + - joins, embeds, runs the model + - emits ONLY contract-allowed output (DP-aggregate) + | + v + DstackApp.sol (multi-sig) + - owners = [A, B, ...] are signers + - compose-hash added only after threshold met + - any owner can withdraw signature → halt all subsequent compute +``` + +Phala is not in the trust chain. Owners verify everything offline. + +--- + +## Owner Sealing + +Each data owner does this on their own laptop. Phala/operator never sees plaintext. + +### Step 1: Get the analysis compose-hash + +The analyst publishes the analysis `docker-compose.yml`. Each owner reviews it (it's small and reviewable; data is large and sensitive). The compose-hash is the contract. + +```bash +# Owner: clone the analyst's repo, review, then compute the hash +sha256sum docker-compose.yml +# 0xa3f2c1... (this is the compose-hash) +``` + +### Step 2: Get the analysis app-id + +After the analyst publishes the compose, the analyst registers it on `DstackApp.sol` (or shares the app-id): + +``` +APP_ID=app_d8e2f1... +COMPOSE_HASH=a3f2c1... +KMS_ROOT_PUBKEY=0x04abc... # from kms.phala.com or hardcoded in dstack +``` + +### Step 3: Seal the dataset locally + +```python +# seal-dataset.py +import os, sys, hashlib, hmac +from cryptography.hazmat.primitives.ciphers.aead import AESGCM + +OWNER_ID = sys.argv[1] # e.g. "hospital-a" +INPUT = sys.argv[2] # e.g. "ehr-data.jsonl" +OUTPUT = sys.argv[3] # e.g. "ehr-data.sealed" + +KMS = bytes.fromhex(os.environ["KMS_ROOT_PUBKEY"][2:]) +APP = os.environ["APP_ID"].encode() +HASH = os.environ["COMPOSE_HASH"].encode() +INFO = b"|".join([APP, HASH, OWNER_ID.encode()]) + +# HKDF-Expand simulation +key = hmac.new(KMS, INFO, hashlib.sha256).digest() +aes = AESGCM(key) +nonce = os.urandom(12) + +plaintext = open(INPUT, "rb").read() +ct = aes.encrypt(nonce, plaintext, INFO) +open(OUTPUT, "wb").write(nonce + ct) +print(f"Sealed {len(plaintext)} bytes -> {OUTPUT} ({len(ct) + 12} bytes)") +``` + +```bash +python seal-dataset.py hospital-a ehr-data.jsonl ehr-data.sealed +``` + +### Step 4: Publish the ciphertext + +Owners ship `*.sealed` blobs to a shared S3 bucket / IPFS / wherever the analysis CVM can read them. Plaintext never leaves the owner's machine. + +--- + +## Register & Approve + +### Step 1: Analyst registers the compose + +```bash +phala deploy -n cohort-analysis -c docker-compose.yml --kms ethereum \ + --private-key $ANALYST_KEY --rpc-url $ETH_RPC \ + --custom-app-id app_cohort_v1 --nonce 1 \ + --prepare-only +``` + +`--prepare-only` produces a commit token (no on-chain transaction yet — for multi-sig flow). + +### Step 2: Owners approve via multisig wallet + +Each owner uses Safe / Gnosis to approve the `DstackApp.addAllowedHash(compose_hash)` transaction. The DstackApp owner is configured as a multi-sig with each data owner as a signer. + +### Step 3: Once threshold met, commit + +```bash +phala deploy --commit --token $COMMIT_TOKEN --transaction-hash $TX +``` + +The compose-hash is now on-chain. The CVM can boot. 
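Before sealing against it, an owner can confirm the hash really landed on-chain. A minimal web3.py sketch — the addresses are placeholders and the ABI fragment is illustrative; use the ABI of the deployed `DstackApp.sol` (the function names follow the checks in the Verify section below):

```python
# check_registration.py — owner-side sanity check of the on-chain allowlist.
from web3 import Web3

RPC_URL = "https://<your-eth-rpc>"
APP_ADDR = "0x<DstackApp address shared by the analyst>"
COMPOSE_HASH = bytes.fromhex("a3f2c1...")   # 32-byte sha256 of docker-compose.yml

ABI = [
    {"name": "allowedHashes", "type": "function", "stateMutability": "view",
     "inputs": [{"name": "h", "type": "bytes32"}], "outputs": [{"type": "bool"}]},
]

w3 = Web3(Web3.HTTPProvider(RPC_URL))
app = w3.eth.contract(address=APP_ADDR, abi=ABI)
assert app.functions.allowedHashes(COMPOSE_HASH).call(), "compose-hash not (yet) allowed"
print("compose-hash is registered — safe to let the CVM derive keys")
```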
+ +--- + +## Deploy Analysis + +### Step 1: Compose with sealed data ingestion + +```yaml +services: + analysis: + image: ghcr.io//cohort-analysis:v1 + environment: + - APP_ID=${APP_ID} + - COMPOSE_HASH=${COMPOSE_HASH} + - SEALED_BLOBS=s3://cohort/sealed/ # paths owners published + - OWNER_LIST=hospital-a,hospital-b # whose keys to derive + volumes: + - /var/run/dstack.sock:/var/run/dstack.sock + deploy: + resources: + reservations: + devices: [{ driver: nvidia, count: all, capabilities: [gpu] }] +``` + +### Step 2: Inside the analysis container + +```python +# analysis.py — pip install dstack-sdk cryptography +import os +from dstack_sdk import DstackClient +from cryptography.hazmat.primitives.ciphers.aead import AESGCM + +client = DstackClient() # /var/run/dstack.sock +owners = os.environ["OWNER_LIST"].split(",") + +for owner in owners: + derived = client.get_key(f"cohort/{owner}", os.environ["COMPOSE_HASH"]) + key = derived.decode_key() # 32 bytes + blob = read_s3(f"s3://cohort/sealed/{owner}.sealed") + aes = AESGCM(key) + plaintext = aes.decrypt(blob[:12], blob[12:], None) + # join into a polars/pandas frame, run the model, etc. +``` + +The keys ONLY re-derive if the running compose-hash matches the on-chain registered hash — i.e. the verifier already passed. + +### Step 3: Deploy + +```bash +phala deploy -n cohort-analysis -c docker-compose.yml \ + -e APP_ID=app_cohort_v1 -e COMPOSE_HASH=$HASH \ + -t h200.small --kms ethereum \ + --private-key $ANALYST_KEY --rpc-url $ETH_RPC --listed --wait +``` + +`--listed` makes the CVM visible on the public Trust Center so owners can independently verify it's running their registered build. + +--- + +## DP Output + +The analysis container should emit ONLY contract-allowed output — typically: +- Aggregate statistics (mean, count, ratio) +- Differential-privacy-noised aggregates +- Embedding vectors (without row provenance) +- Signed labels (without the row's full features) + +Anything that could leak per-row provenance must be guarded by the compose itself. The compose is the contract owners reviewed. + +```python +# In analysis.py — emit only DP-aggregate +from diffprivlib import LaplaceMechanism +mech = LaplaceMechanism(epsilon=1.0, sensitivity=1) +result = { + "cohort_size": mech.randomise(len(joined_df)), + "mean_risk_score": mech.randomise(float(joined_df["risk"].mean())), +} + +# Bind the result into a TDX quote so anyone can verify it offline +import hashlib +report = hashlib.sha256(json.dumps(result, sort_keys=True).encode()).digest() +quote = client.get_quote(report) +print(json.dumps({**result, "quote": quote.quote, "compose_hash": compose_hash})) +``` + +--- + +## Revoke + +Any single owner can halt all subsequent compute by withdrawing their multi-sig approval on `DstackApp.sol`. + +```bash +# Owner's wallet: +DstackApp.removeAllowedHash(compose_hash) +``` + +The next time the Analysis CVM tries to refresh keys via KMS, the verifier sees the hash is no longer allowed, and the key derivation fails. In-flight compute that already has unwrapped data is in TDX memory only — it cannot persist anything to disk that's readable elsewhere. 
+ +--- + +## Verify + +### Each owner runs locally: + +```bash +phala cvms attestation cohort-analysis --json > attestation.json +# Then run the full verification per verify-attestation.md: +# - TDX quote chains to Intel root (DCAP / Phala verify endpoint) +# - GPU NV-CSE quote chains to NVIDIA root (NRAS) +# - report_data binds to the signing key + your nonce +# - mr_config binds to the expected compose_hash +# - container images have Sigstore provenance from expected source repos +``` + +### Each owner checks DstackApp.sol: + +``` +DstackApp.allowedHashes(compose_hash) == true +DstackApp.owners() == [hospital-a-addr, hospital-b-addr, ...] +DstackApp.threshold() == 2 # or whatever k-of-n is in use +``` + +### Output verification + +The signed aggregate's signature must verify against the per-app pubkey — and that pubkey only exists if attestation + on-chain approval both passed. + +--- + +## Troubleshooting + +| Symptom | Cause | Fix | +|---|---|---| +| Decrypt fails inside CVM | Compose-hash mismatch | Confirm `phala cvms get` returns the same compose-hash you sealed against | +| `--prepare-only` token expired | Not committed within window | Re-run `--prepare-only` and have owners approve quickly | +| KMS query 403 | On-chain `addAllowedHash` not yet confirmed | Wait for tx confirmation; `phala cvms restart` | +| Output too aggressive | DP epsilon too small or aggregate too narrow | Tune `epsilon`; coarsen the aggregation function | +| Owner can't decrypt their own dataset locally | Used wrong KMS pubkey | Pull from `kms.phala.com` per current epoch | + +--- + +## Reference: minimal end-to-end + +```bash +# Owner side (each owner) +python seal-dataset.py hospital-a ./ehr-data.jsonl ./ehr-data.sealed +aws s3 cp ehr-data.sealed s3://cohort/sealed/hospital-a.sealed + +# Analyst side +phala login +phala deploy -n cohort-analysis -c docker-compose.yml --kms ethereum \ + --private-key $ANALYST_KEY --rpc-url $ETH_RPC --prepare-only +# (owners approve in multisig wallet) +phala deploy --commit --token $TOKEN --transaction-hash $TX +phala cvms attestation cohort-analysis --json > attestation.json +phala logs -f # watch the join + DP-aggregate +``` + +The output is a DP-aggregate signed by an attested CVM whose compose-hash was multi-owner approved on-chain. Each owner can verify offline without trusting the analyst, the operator, or Phala. diff --git a/skills/usecase/dstack-self-host.md b/skills/usecase/dstack-self-host.md new file mode 100644 index 00000000..1c9dab4a --- /dev/null +++ b/skills/usecase/dstack-self-host.md @@ -0,0 +1,377 @@ +--- +name: dstack-self-host +description: | + Self-host the dstack control plane on your own bare-metal Intel TDX + hardware. Use when users need data residency, regulatory boundary + control, or want to run dstack outside Phala's managed cloud. Covers + building dstack-vmm / dstack-kms / dstack-gateway from source, using + the vmm-cli.py app deployer, and choosing an auth server (auth-simple + vs auth-eth on-chain). +--- + +# Self-Hosted dstack + +Run `dstack-vmm`, `dstack-kms`, and `dstack-gateway` on your own bare-metal Intel TDX hardware. App developers use `vmm-cli.py` to deploy CVMs to your dstack instance. 
+ +## Operations + +| User says | Operation | +|---|---| +| "self-host dstack", "BYOH", "data residency" | **End-to-End** | +| "dev setup", "try locally" | **Dev Deployment** | +| "production setup" | **Production Deployment** | +| "deploy KMS", "auth-simple", "auth-eth" | **KMS + Auth Server** | +| "deploy Gateway", "TLS termination" | **Gateway** | +| "deploy an app to my dstack", "vmm-cli" | **App Deployment (vmm-cli.py)** | +| "compare to managed Phala" | **Self-host vs Managed** | + +> **Source of truth:** all canonical operator commands are in +> [github.com/Dstack-TEE/dstack/docs/deployment.md](https://github.com/Dstack-TEE/dstack/blob/master/docs/deployment.md) +> and the [VMM CLI User Guide](https://github.com/Dstack-TEE/dstack/blob/master/docs/vmm-cli-user-guide.md). +> This skill summarizes them — verify against the current release before +> running anything in production. + +> **Heads up:** there is **no `dstack` command-line tool**. Self-hosting +> means running the Rust binaries `dstack-vmm` / `dstack-kms` / +> `dstack-gateway` directly, plus `vmm-cli.py` for app management. This is +> separate from the npm-installed `phala` CLI used against managed Phala +> Cloud. + +--- + +## Self-host vs Managed + +| Aspect | Managed (`phala` CLI) | Self-Hosted (`dstack-vmm` + `vmm-cli.py`) | +|---|---|---| +| Hardware | Phala provides H200 + TDX hosts | You provide bare-metal TDX | +| Operator | Phala | You | +| Trust path | Same: TDX quote + on-chain registry | Same: TDX quote + on-chain registry | +| Best for | Most teams. Lower TCO. | Strict data residency, regulatory boundary, providers building their own confidential cloud | +| Open source | Yes (dstack runtime) | Yes (you run the same code) | +| App CLI | `phala deploy` | `./vmm-cli.py deploy` | + +The trust model is identical. The only real difference is who operates the hardware. + +--- + +## Hardware Prerequisites + +- Bare-metal TDX-capable server (Sapphire Rapids+ Xeon, BIOS TDX enabled). See [canonical/tdx](https://github.com/canonical/tdx) for the host setup. +- ≥16GB RAM, ≥100GB free disk +- Public IPv4 + DNS access +- Optional: NVIDIA H100 or Blackwell for GPU TEE workloads + +Verify TDX is active on the host: + +```bash +dmesg | grep -i tdx +``` + +--- + +## Dev Deployment + +For local development / testing only. **No security guarantees** — KMS runs in dev mode. + +### Step 1: Install build deps + +```bash +# Ubuntu 24.04 +sudo apt install -y build-essential chrpath diffstat lz4 wireguard-tools xorriso + +# Install Rust +curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh +``` + +### Step 2: Build host config + +```bash +git clone https://github.com/Dstack-TEE/meta-dstack.git --recursive +cd meta-dstack/ +mkdir build && cd build +../build.sh hostcfg +``` + +Edit the generated `build-config.sh`: + +| Variable | Description | +|---|---| +| `KMS_DOMAIN` | DNS domain for KMS RPC, e.g. `kms.example.com` | +| `GATEWAY_DOMAIN` | DNS domain for Gateway RPC, e.g. `gateway.example.com` | +| `GATEWAY_PUBLIC_DOMAIN` | Public base domain for app routing, e.g. 
`apps.example.com` | +| `CERTBOT_ENABLED` | `true` (for ACME via Cloudflare) | +| `CF_API_TOKEN` | your Cloudflare API token | + +```bash +vim build-config.sh +../build.sh hostcfg +../build.sh dl 0.5.5 # download guest image +``` + +### Step 3: Run components in separate terminals + +```bash +# Terminal 1: KMS +./dstack-kms -c kms.toml + +# Terminal 2: Gateway (needs sudo for port 443) +sudo ./dstack-gateway -c gateway.toml + +# Terminal 3: VMM +./dstack-vmm -c vmm.toml +``` + +VMM listens on `http://localhost:8080` by default. App deployers point `vmm-cli.py` at this URL. + +--- + +## Production Deployment + +Production runs KMS and Gateway each as their own CVMs, behind an auth server. The summary below tracks the canonical guide; see `docs/deployment.md` for the latest. + +### Production checklist + +1. Set up TDX host with `dstack-vmm` +2. Deploy KMS as CVM (with auth server; capture its attestation; allowlist its `mrAggregated` before bootstrap) +3. Deploy Gateway as CVM +4. Optional: Zero-Trust HTTPS, CT monitoring, multi-node, on-chain governance + +### Step 1: Build dstack-vmm + +```bash +git clone https://github.com/Dstack-TEE/dstack +cd dstack +cargo build --release -p dstack-vmm -p supervisor +mkdir -p vmm-data +cp target/release/dstack-vmm vmm-data/ +cp target/release/supervisor vmm-data/ +cd vmm-data/ +``` + +### Step 2: Configure VMM + +Create `vmm.toml`: + +```toml +address = "tcp:0.0.0.0:9080" +reuse = true +image_path = "./images" +run_path = "./run/vm" + +[cvm] +kms_urls = [] +gateway_urls = [] +cid_start = 30000 +cid_pool_size = 1000 + +[cvm.port_mapping] +enabled = true +address = "127.0.0.1" +range = [ + { protocol = "tcp", from = 1, to = 20000 }, + { protocol = "udp", from = 1, to = 20000 }, +] + +[host_api] +address = "vsock:2" +port = 10000 +``` + +Download guest images from [meta-dstack releases](https://github.com/Dstack-TEE/meta-dstack/releases) and extract them to `./images/`. Then start VMM: + +```bash +./dstack-vmm -c vmm.toml +``` + +--- + +## KMS + Auth Server + +Production KMS requires an **auth server** that validates boot requests via webhook. Two stock implementations: + +| Auth server | Use case | Config | +|---|---|---| +| `auth-simple` | Config-file whitelisting | JSON config file | +| `auth-eth` | On-chain governance via smart contracts | Ethereum RPC + contract | +| Custom | Your own logic | Implement the webhook interface | + +All auth servers expose: +- `GET /` — health +- `POST /bootAuth/app` — app boot authz +- `POST /bootAuth/kms` — KMS boot authz + +### auth-simple (config-based) + +Create `auth-config.json`: + +```json +{ + "osImages": ["0x"], + "kms": { + "mrAggregated": ["0x"], + "allowAnyDevice": true + }, + "apps": {} +} +``` + +Get the OS image hash: + +```bash +tar -xzf dstack-0.5.5.tar.gz +cat dstack-0.5.5/digest.txt +# 0b327bcd642788b0517de3ff46d31ebd3847b6c64ea40bacde268bb9f1c8ec83 +# prefix with 0x in the JSON +``` + +Run auth-simple: + +```bash +cd kms/auth-simple +bun install +PORT=3001 AUTH_CONFIG_PATH=/path/to/auth-config.json bun run start +``` + +> **Important:** an empty `kms.mrAggregated` allowlist is treated as deny-all +> for KMS. Capture the current KMS measurement with `Onboard.GetAttestationInfo` +> and add it before bootstrap, or KMS will refuse to onboard. + +### auth-eth (on-chain governance) + +Use this for decentralized governance — the allowlist lives in a smart contract instead of a JSON file. See [docs/onchain-governance.md](https://github.com/Dstack-TEE/dstack/blob/master/docs/onchain-governance.md). 
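Whichever auth server you choose, the allowlist entries are just 0x-prefixed hex digests. A small helper sketch for assembling the `auth-simple` config from `digest.txt` (paths and the digest-file layout are illustrative; confirm against the extracted image tarball):

```python
# make_auth_config.py — build auth-config.json from the guest image digest.
import json
import pathlib

# digest.txt ships inside the extracted dstack-<version> tarball; take the first token
digest = pathlib.Path("dstack-0.5.5/digest.txt").read_text().split()[0]
kms_mr = "0x..."  # capture via Onboard.GetAttestationInfo BEFORE bootstrap

config = {
    "osImages": [f"0x{digest}"],
    "kms": {"mrAggregated": [kms_mr], "allowAnyDevice": True},
    "apps": {},
}
pathlib.Path("auth-config.json").write_text(json.dumps(config, indent=2))
print(json.dumps(config, indent=2))
```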
+ +### Deploy KMS as CVM + +Production KMS runs inside its own CVM, NOT on the host: + +```bash +cd dstack/kms/dstack-app/ +# Use the deploy script matching your auth server (auth-simple vs auth-eth) +# Capture the KMS attestation info, allowlist its mrAggregated, then bootstrap +``` + +The exact script and bootstrap dance is version-specific — follow `docs/deployment.md` for the current release. + +--- + +## Gateway + +Gateway terminates public TLS and routes traffic to apps. It also runs as a CVM in production. + +Gateway config sets: +- Public domain (e.g. `apps.example.com`) +- ACME provider (Let's Encrypt via Cloudflare DNS-01) +- Authorization endpoint (your auth server) + +App URLs follow the shape `https://-.` — the same scheme as managed Phala (just with your domain). + +--- + +## App Deployment (vmm-cli.py) + +App developers (not operators) use `vmm-cli.py` against your VMM endpoint. + +### Install + configure + +```bash +# Get the script +curl -O https://raw.githubusercontent.com/Dstack-TEE/dstack/master/vmm-cli.py +chmod +x vmm-cli.py + +# Point at your VMM +export DSTACK_VMM_URL=http://your-vmm-host:8080 + +# (optional) auth +export DSTACK_VMM_AUTH_USER=username +export DSTACK_VMM_AUTH_PASSWORD=password + +./vmm-cli.py --help +``` + +Server URL precedence: CLI `--url` > `DSTACK_VMM_URL` > default `http://localhost:8080`. + +### Discover what's available + +```bash +./vmm-cli.py lsimage # available OS images +./vmm-cli.py lsgpu # available GPU slots +./vmm-cli.py lsvm # current VMs (basic) +./vmm-cli.py lsvm -v # detailed (vCPU, memory, image, GPUs) +``` + +### Deploy an app (two-step) + +```bash +# Step 1: build the app-compose.json from your docker-compose +./vmm-cli.py compose \ + --name "my-web-app" \ + --docker-compose ./docker-compose.yml \ + --output ./app-compose.json + +# Step 2: deploy to a VM +./vmm-cli.py deploy --app-compose ./app-compose.json [other flags per --help] +``` + +### VM lifecycle + +```bash +./vmm-cli.py start +./vmm-cli.py stop # graceful +./vmm-cli.py stop -f # force +./vmm-cli.py logs # last 20 lines +./vmm-cli.py logs -n 50 # last 50 +./vmm-cli.py logs -f # stream +./vmm-cli.py remove # permanent — wipes data +``` + +### KMS key management + +```bash +./vmm-cli.py kms list +./vmm-cli.py kms add +./vmm-cli.py kms remove +``` + +For full reference: [VMM CLI User Guide](https://github.com/Dstack-TEE/dstack/blob/master/docs/vmm-cli-user-guide.md). 
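For CI, the two-step compose/deploy flow scripts cleanly. A minimal sketch that shells out to `vmm-cli.py` using only the flags shown above — add sizing and image flags per `./vmm-cli.py deploy --help`:

```python
# ci_deploy.py — wrap the vmm-cli.py two-step flow for CI.
import os
import subprocess

os.environ.setdefault("DSTACK_VMM_URL", "http://your-vmm-host:8080")

def run(*args: str) -> None:
    subprocess.run(["./vmm-cli.py", *args], check=True)

# Step 1: build app-compose.json from the docker-compose file
run("compose", "--name", "my-web-app",
    "--docker-compose", "./docker-compose.yml",
    "--output", "./app-compose.json")

# Step 2: deploy it (append vCPU/memory/image flags per --help)
run("deploy", "--app-compose", "./app-compose.json")
```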
+ +--- + +## Troubleshooting + +| Symptom | Cause | Fix | +|---|---|---| +| `dstack-vmm` won't start | TDX not enabled in BIOS | Reboot, enable TDX, check `dmesg \| grep -i tdx` | +| KMS rejects bootstrap | `mrAggregated` not in allowlist | Capture KMS measurement via `Onboard.GetAttestationInfo`, add to `auth-config.json` | +| Gateway 5xx after first boot | ACME cert not yet issued | Wait 1-3 min on first start (DNS-01 challenge) | +| `vmm-cli.py` connection refused | `DSTACK_VMM_URL` wrong | Confirm VMM listens on `0.0.0.0:8080` (not just `127.0.0.1`) | +| App deploy fails with "image hash not allowlisted" | OS image not in `auth-config.json` | Add the image's `digest.txt` hash with `0x` prefix | + +--- + +## Reference: minimal end-to-end (dev) + +```bash +# Operator +git clone https://github.com/Dstack-TEE/meta-dstack.git --recursive +cd meta-dstack && mkdir build && cd build +../build.sh hostcfg +vim build-config.sh # set domains, CF token +../build.sh hostcfg +../build.sh dl 0.5.5 + +# Run in separate terminals +./dstack-kms -c kms.toml +sudo ./dstack-gateway -c gateway.toml +./dstack-vmm -c vmm.toml + +# App developer (separate machine) +export DSTACK_VMM_URL=http://operator-host:8080 +curl -O https://raw.githubusercontent.com/Dstack-TEE/dstack/master/vmm-cli.py +./vmm-cli.py compose --name my-app --docker-compose ./docker-compose.yml --output ./app-compose.json +./vmm-cli.py deploy --app-compose ./app-compose.json +./vmm-cli.py lsvm +``` + +The same `docker-compose.yml` ships unchanged between managed Phala and self-hosted dstack — the trust path is identical. + +For production deployment (KMS as CVM, auth server, on-chain governance), follow [docs/deployment.md](https://github.com/Dstack-TEE/dstack/blob/master/docs/deployment.md) line-by-line — version-specific bootstrap details change between releases. diff --git a/skills/usecase/gpu-tee-custom.md b/skills/usecase/gpu-tee-custom.md new file mode 100644 index 00000000..6ed74b2b --- /dev/null +++ b/skills/usecase/gpu-tee-custom.md @@ -0,0 +1,278 @@ +--- +name: gpu-tee-custom +description: | + Deploy any custom workload to a Phala Cloud GPU TEE — Jupyter notebooks, + custom training scripts, computer vision pipelines, scientific compute. + Generic recipe for getting a Docker image running on H200 with TDX + + NVIDIA CC attestation. For LLM serving see gpu-vllm-deploy. For + fine-tuning see training-run. +--- + +# Custom Workload on Phala GPU TEE + +`phala deploy` any Docker image onto an H200 GPU with full TDX + NVIDIA CC attestation. + +## Operations + +| User says | Operation | +|---|---| +| "deploy my notebook", "Jupyter", "research env" | **Jupyter Notebook** | +| "run inference", "computer vision pipeline", "custom workload" | **First Deploy** | +| "list GPUs", "what instances", "pricing" | **Instance Types** | +| "scale to 8 GPU" | **Multi-GPU** | +| "verify GPU CC", "is the GPU sealed" | **Verify GPU CC** | +| "SSH into GPU" | **SSH** | +| "GPU support for H100", "B300" | **Availability** | + +This skill builds on the foundational `../phala-cli/SKILL.md`. Install + login per that skill first. 
+ +--- + +## Instance Types + +Run live: + +```bash +phala instance-types +``` + +Current GPU types (as of writing): + +| ID | GPUs | vCPU | RAM | Hourly | +|---|---|---|---|---| +| `h200.small` | 1× H200 SXM 141GB | 24 | 192 GB | $3.50 | +| `h200.16xlarge` | 8× H200 SXM 141GB | 64 | 256 GB | $23.04 | +| `h200.8x.large` | 8× H200 SXM 141GB | 192 | 1.5 TB | $23.04 | + +Pick `h200.small` for single-GPU workloads (most fine-tuning, single-tenant inference). Pick the 8× variants for multi-GPU training, large-model inference, or heavy CPU/RAM needs. + +### Availability + +The CLI lists what's actually deployable in your workspace. **H100 and B300 SKUs may appear on the marketing site but not the CLI** depending on current capacity. Run `phala instance-types` for ground truth. For other regions / hardware, contact Phala sales. + +--- + +## Jupyter Notebook + +The fastest path to "I can run code in a GPU TEE." + +### Step 1: `docker-compose.yml` + +```yaml +services: + jupyter: + image: quay.io/jupyter/scipy-notebook:cuda-latest + restart: unless-stopped + environment: + - JUPYTER_TOKEN=${JUPYTER_TOKEN} + ports: + - "8888:8888" + volumes: + - work:/home/jovyan/work + - /var/run/dstack.sock:/var/run/dstack.sock + deploy: + resources: + reservations: + devices: + - driver: nvidia + count: all + capabilities: [gpu] +volumes: + work: +``` + +### Step 2: `.env` + +``` +JUPYTER_TOKEN=pick-a-strong-token +``` + +### Step 3: Deploy + +```bash +phala deploy -n my-jupyter -c docker-compose.yml -e .env -t h200.small --kms phala --wait +``` + +### Step 4: Open the notebook + +```bash +# URL shape: https://-. +# Get it live from the CVM: +phala cvms get my-jupyter --json | jq -r '.endpoints[] | select(.app | contains("-8888.")) | .app' +# Open the URL in your browser, paste your JUPYTER_TOKEN to log in. +``` + +--- + +## First Deploy (Generic) + +For any custom Docker image: + +### Step 1: Compose template + +```yaml +services: + workload: + image: /: # publicly pullable + restart: unless-stopped + environment: + - YOUR_VAR=${YOUR_VAR} + volumes: + - data:/data + - /var/run/dstack.sock:/var/run/dstack.sock + deploy: + resources: + reservations: + devices: + - driver: nvidia + count: all + capabilities: [gpu] +volumes: + data: +``` + +### Step 2: Build + push your image + +```bash +docker build -t ghcr.io//:v1 . +docker push ghcr.io//:v1 +``` + +For private registries, set `DSTACK_DOCKER_USERNAME` and `DSTACK_DOCKER_PASSWORD` in your `.env`. + +### Step 3: Deploy + +```bash +phala deploy -n my-workload -c docker-compose.yml -e .env -t h200.small --kms phala --wait +``` + +### Step 4: Verify + +```bash +phala ps # is your container running? +phala logs -f # output +phala cvms attestation # TDX + GPU NV-CSE quote +``` + +--- + +## Multi-GPU + +Tensor-parallel or data-parallel across 8× H200: + +### Compose changes + +```yaml +services: + workload: + # ... same as above ... + command: > + torchrun + --nproc_per_node=8 + train.py +``` + +### Deploy + +```bash +phala deploy -n my-workload -c docker-compose.yml -e .env -t h200.16xlarge --kms phala --wait +``` + +`h200.16xlarge` and `h200.8x.large` both give 8× H200; pick `8x.large` for larger host CPU/RAM (192 vCPU, 1.5 TB RAM). + +--- + +## Verify GPU CC + +NVIDIA Confidential Computing seals the GPU memory. 
Confirm it's active: + +### Inside the CVM + +```bash +phala ssh +nvidia-smi conf-compute -q +# Look for: ConfComputeMode : ON +``` + +### From the attestation + +```bash +phala cvms attestation my-workload --json | jq '.app_certificates[0].quote' +``` + +The hex `quote` decodes into a combined TDX + GPU NV-CSE attestation. For the full verification flow (Intel TDX + NVIDIA NRAS + report-data binding + compose-hash), follow `verify-attestation.md`. Offline-only path: `dcap-qvl` for the TDX layer + `nvattest-verifier` for the NVIDIA layer. + +For a full reference on parsing the combined TDX+NVIDIA quote, see `https://docs.phala.com/phala-cloud/confidential-ai/verify/verify-attestation`. + +--- + +## SSH + +Useful for debugging GPU-specific issues. + +```bash +phala ssh +# inside the CVM: +nvidia-smi # check GPU state +nvidia-smi conf-compute -q # check CC mode +docker ps # what's running +docker logs # container logs +``` + +Run a single command without an interactive shell: + +```bash +phala ssh -- nvidia-smi +phala ssh -- docker stats --no-stream +``` + +Port-forward a custom port back to your laptop: + +```bash +phala ssh -- -L 8088:localhost:8088 +``` + +--- + +## Common Docker images + +| Use case | Image | Notes | +|---|---|---| +| Jupyter + PyTorch | `quay.io/jupyter/scipy-notebook:cuda-latest` | Pre-installed CUDA, scipy, sklearn | +| PyTorch dev | `nvcr.io/nvidia/pytorch:24.10-py3` | NVIDIA's official, CUDA 12.x | +| TensorFlow | `tensorflow/tensorflow:latest-gpu` | TF 2.x with GPU support | +| Ollama | `ollama/ollama:latest` | Local LLM serving (alternative to vLLM) | +| ComfyUI / Stable Diffusion | `yanwk/comfyui-boot:latest` | SD pipeline | +| Whisper / TTS | `onerahmet/openai-whisper-asr-webservice` | ASR endpoints | + +--- + +## Troubleshooting + +| Symptom | Cause | Fix | +|---|---|---| +| Container reports `no CUDA device` | GPU passthrough missing | Add the `deploy.resources.reservations.devices` block to compose | +| OOM during model load | Model too big for 1 H200 (141 GB VRAM) | Move to `h200.16xlarge` (8× 141 = 1.1 TB total) | +| Jupyter login loop | Wrong token | `phala logs` to find the auto-generated token, or set `JUPYTER_TOKEN` env explicitly | +| `manifest unknown` on deploy | Image not public | Push to public registry OR add `DSTACK_DOCKER_USERNAME/PASSWORD` to env | +| Slow `apt-get` / `pip` | dstack-gateway egress | Pre-bake all dependencies into the Docker image | +| `nvidia-smi conf-compute -q` says OFF | Host config | Open a Phala support ticket | + +--- + +## Reference: minimal end-to-end + +```bash +# 1. Scaffold + build +docker build -t ghcr.io/me/my-workload:v1 ./workload +docker push ghcr.io/me/my-workload:v1 + +# 2. Deploy on 1× H200 +phala login +phala deploy -n my-workload -c docker-compose.yml -t h200.small --kms phala --wait + +# 3. Verify + use +phala cvms attestation --json | jq +phala ssh -- nvidia-smi conf-compute -q +phala cvms get my-workload --json | jq '.endpoints' +``` diff --git a/skills/usecase/gpu-vllm-deploy.md b/skills/usecase/gpu-vllm-deploy.md new file mode 100644 index 00000000..4fb69c86 --- /dev/null +++ b/skills/usecase/gpu-vllm-deploy.md @@ -0,0 +1,275 @@ +--- +name: gpu-vllm-deploy +description: | + Deploy vLLM (or any OpenAI-compatible LLM server) onto a Phala Cloud + GPU TEE — Llama, Qwen, DeepSeek, Mistral, etc. Use when users want + self-hosted private inference on H200 with verifiable attestation + and an OpenAI-compatible endpoint at /v1/chat/completions. 
+--- + +# Self-Hosted vLLM on Phala GPU TEE + +`phala deploy` an OpenAI-compatible inference server inside a confidential H200 GPU. + +## Operations + +| User says | Operation | +|---|---| +| "deploy vLLM", "run my own LLM", "self-host inference" | **First Deploy** | +| "load Llama", "load Qwen", "switch model" | **Choose Model** | +| "scale to 8 GPU", "tensor parallel" | **Multi-GPU** | +| "verify GPU TEE", "is the GPU in CC mode" | **Verify GPU CC** | +| "private model weights", "seal weights" | **Seal Weights** | +| "OpenAI client can't connect" | **Endpoint** | + +This skill builds on the foundational `../phala-cli/SKILL.md`. Install + login per that skill first. + +--- + +## Choose Model + +vLLM supports any HuggingFace model. Common choices for confidential workloads: + +| Model | HF ID | VRAM | Fits on | +|---|---|---|---| +| Llama 3.1 8B Instruct | `meta-llama/Llama-3.1-8B-Instruct` | 16 GB | `h200.small` | +| Llama 3.1 70B Instruct | `meta-llama/Llama-3.1-70B-Instruct` | 140 GB | `h200.small` (FP8) or `h200.16xlarge` (FP16) | +| Qwen 2.5 7B Instruct | `Qwen/Qwen2.5-7B-Instruct` | 16 GB | `h200.small` | +| DeepSeek V3 0324 | `deepseek-ai/DeepSeek-V3-0324` | ~600 GB | `h200.16xlarge` (8× H200, FP8) | +| GPT-OSS 120B | `openai/gpt-oss-120b` | ~240 GB | `h200.16xlarge` (FP8) | +| Gemma 3 27B | `google/gemma-3-27b-it` | 54 GB | `h200.small` | + +Confirm available types: + +```bash +phala instance-types +``` + +--- + +## Scaffold + +### Step 1: Project layout + +```bash +mkdir my-llm && cd my-llm +``` + +``` +my-llm/ +├── docker-compose.yml +├── .env.example # HF_TOKEN +└── .env # gitignored +``` + +### Step 2: `docker-compose.yml` + +```yaml +services: + vllm: + image: vllm/vllm-openai:latest + restart: unless-stopped + environment: + - HF_TOKEN=${HF_TOKEN} + - VLLM_API_KEY=${VLLM_API_KEY:-sk-local} + volumes: + - hf-cache:/root/.cache/huggingface + - /var/run/dstack.sock:/var/run/dstack.sock + ports: + - "8000:8000" + command: > + --model meta-llama/Llama-3.1-8B-Instruct + --dtype auto + --max-model-len 8192 + --api-key $${VLLM_API_KEY} +volumes: + hf-cache: +``` + +Notes: +- `vllm/vllm-openai:latest` exposes `/v1/chat/completions` and `/v1/completions` on port 8000. +- The `HF_TOKEN` env (sealed via `phala deploy -e`) lets vLLM pull gated models from Hugging Face. +- `VLLM_API_KEY` protects the endpoint — anyone hitting it must know the key. +- For GPU CC mode confirmation, see **Verify GPU CC** below. + +### Step 3: `.env` + +``` +HF_TOKEN=hf_real_token_here +VLLM_API_KEY=sk-pick-something-secret +``` + +--- + +## First Deploy + +### Step 1: Pick GPU shape + +| Use case | Instance type | vCPU / RAM | Notes | +|---|---|---|---| +| 7-13B model, single user | `h200.small` (1× H200) | 24 / 192 GB | $3.50/hr | +| 70B FP16, batch | `h200.16xlarge` (8× H200) | 64 / 256 GB | $23/hr — needs `--tensor-parallel-size 8` | +| 70B FP16, GPU-heavy throughput | `h200.8x.large` (8× H200) | 192 / 1.5 TB | $23/hr — same GPUs, more host CPU/RAM | + +```bash +phala deploy -n my-llm -c docker-compose.yml -e .env -t h200.small --kms phala --wait +``` + +### Step 2: Get the endpoint + +```bash +phala cvms get my-llm --json | jq '.endpoints' +# Each endpoint has shape: { "app": "https://-.", "instance": "..." } +``` + +The URL format is `https://-.` — the +gateway base domain is per-cluster (e.g. `dstack-pha-prod12.phala.network`), +NOT a global `dstack.phala.network`. Always pull it live from the CVM JSON. 
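Because the endpoint is OpenAI-compatible, any OpenAI SDK can point at it straight away — a minimal Python sketch, with the endpoint URL and API key taken from the steps above (Step 3 below runs the same check with curl):

```python
# chat.py — call the self-hosted vLLM endpoint with the OpenAI SDK.
import os
from openai import OpenAI

client = OpenAI(
    base_url=f"{os.environ['ENDPOINT']}/v1",   # e.g. https://<app-id>-8000.<gateway-domain>/v1
    api_key=os.environ["VLLM_API_KEY"],        # the key from your sealed .env
)

resp = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Hello from inside a TEE?"}],
)
print(resp.choices[0].message.content)
```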
+ +### Step 3: Test the endpoint + +```bash +ENDPOINT=$(phala cvms get my-llm --json | jq -r '.endpoints[] | select(.app | contains("-8000.")) | .app') +curl $ENDPOINT/v1/chat/completions \ + -H "Authorization: Bearer sk-pick-something-secret" \ + -H "Content-Type: application/json" \ + -d '{ + "model": "meta-llama/Llama-3.1-8B-Instruct", + "messages": [{"role": "user", "content": "Hello"}] + }' +``` + +--- + +## Multi-GPU + +For 70B models or higher throughput, spread across 8× H200. + +### Step 1: Update compose + +```yaml +services: + vllm: + image: vllm/vllm-openai:latest + # ... env, volumes, ports same as above ... + command: > + --model meta-llama/Llama-3.1-70B-Instruct + --tensor-parallel-size 8 + --dtype auto + --max-model-len 16384 + --api-key $${VLLM_API_KEY} +``` + +### Step 2: Deploy on the 8× H200 instance + +```bash +phala deploy -n my-llm-70b -c docker-compose.yml -e .env -t h200.16xlarge --kms phala --wait +``` + +vLLM auto-shards the model across all 8 GPUs. + +--- + +## Verify GPU CC + +### From inside the CVM + +```bash +phala ssh +nvidia-smi conf-compute -q +# Expect: ConfComputeMode : ON +``` + +### Full attestation flow + +For the complete verification (NVIDIA NRAS + Intel TDX + report-data binding + compose-hash + Sigstore), follow `verify-attestation.md`. + +The minimum check: + +```bash +phala cvms attestation my-llm --json > attestation.json +# The .app_certificates[0].quote contains the combined TDX + GPU NV-CSE attestation. +# Verify with Phala's online endpoint or dcap-qvl + nvattest-verifier offline. +``` + +--- + +## Seal Weights + +If your model weights are private and shouldn't be re-pullable on every deploy: + +### Option 1: Pre-launch download + +```bash +phala deploy ... --pre-launch-script ./download-weights.sh +``` + +`download-weights.sh` runs inside the CVM after attestation. Use `HF_TOKEN` (sealed env) to pull, write to a persistent volume. + +### Option 2: Encrypt-at-rest + +Encrypt the weight tarball client-side, ship to S3, decrypt inside the CVM with a key derived from `HKDF(kms_root_pubkey, app_id, compose_hash)`. The decrypt key only re-derives if the compose-hash matches — stolen ciphertext is useless. + +See `data-coanalysis.md` for the HKDF pattern. + +--- + +## Endpoint + +### Public URL + +The shape is `https://-./v1/chat/completions`. +`` is whatever port your compose exposes (8000 for vLLM by default). +The gateway base domain is per-cluster — pull it from `phala cvms get`: + +```bash +phala cvms get my-llm --json | jq -r '.gateway.base_domain' +# e.g. dstack-pha-prod5.phala.network +phala cvms get my-llm --json | jq -r '.app_id' +# e.g. e029a4b8... +# Compose: https://e029a4b8...-8000.dstack-pha-prod5.phala.network +``` + +### Custom domain + +`dstack-gateway` supports custom domain mapping via the dashboard. Add `gateway.alias.example.com` → `-8000`. The TLS cert continues to carry the TDX quote in an X.509 extension. + +--- + +## Troubleshooting + +| Symptom | Cause | Fix | +|---|---|---| +| vLLM OOMs on startup | Model too big for 1 GPU | Move to `h200.16xlarge` + `--tensor-parallel-size 8` | +| `HF_TOKEN` invalid | Sealed env not passed | Re-deploy with `-e .env` and confirm token in HF account has read access | +| Endpoint times out | vLLM still loading weights | First boot can take 10-20 min for large models. `phala logs -f` shows progress. 
| +| `nvidia-smi conf-compute -q` says OFF | GPU not in CC mode | Open a Phala support ticket — host config issue | +| 401 from `/v1/chat/completions` | Wrong API key | Set `Authorization: Bearer $VLLM_API_KEY` | +| Throughput poor on 70B | FP16 on 1 GPU | Switch to FP8 or move to 8× H200 | + +--- + +## Reference: minimal end-to-end + +```bash +# 1. Scaffold +mkdir my-llm && cd my-llm +# (write docker-compose.yml + .env with HF_TOKEN) + +# 2. Auth + deploy +phala login +phala deploy -n my-llm -c docker-compose.yml -e .env -t h200.small --kms phala --wait + +# 3. Verify GPU CC + endpoint +phala cvms attestation --json | jq '.app_certificates[0].quote' +# The hex string decodes into the combined TDX quote. For GPU CC verification, +# use NVIDIA's nvattest-verifier or dstack-verifier on the same blob — +# the GPU quote is bound into the TDX report_data. +phala ssh -- nvidia-smi conf-compute -q +ENDPOINT=$(phala cvms get my-llm --json | jq -r '.endpoints[0]') +curl $ENDPOINT/v1/chat/completions -H "Authorization: Bearer $VLLM_API_KEY" \ + -H "Content-Type: application/json" \ + -d '{"model":"meta-llama/Llama-3.1-8B-Instruct","messages":[{"role":"user","content":"Hello"}]}' +``` + +The endpoint is OpenAI-compatible. Drop into any OpenAI SDK by setting `base_url=$ENDPOINT/v1`. diff --git a/skills/usecase/inference-call.md b/skills/usecase/inference-call.md new file mode 100644 index 00000000..37f03a14 --- /dev/null +++ b/skills/usecase/inference-call.md @@ -0,0 +1,313 @@ +--- +name: inference-call +description: | + Call the Phala Confidential AI API (hosted models on GPU TEE) via the + OpenAI-compatible interface at api.redpill.ai/v1. Use when users want + to call DeepSeek, Qwen, Llama, GPT-OSS, Gemma, etc. without deploying + their own server — pay per token, no infrastructure. +--- + +# Phala Confidential AI API + +OpenAI-compatible inference on confidential GPUs at `https://api.redpill.ai/v1`. + +## Operations + +| User says | Operation | +|---|---| +| "call confidential AI", "use phala model", "OpenAI compatible" | **First Call** | +| "get an API key", "how to authenticate" | **Get API Key** | +| "Python SDK", "TypeScript SDK" | **SDKs** | +| "stream tokens", "SSE" | **Streaming** | +| "tool calling", "function calling" | **Tool Calling** | +| "vision", "image input" | **Images & Vision** | +| "structured output", "JSON mode" | **Structured Output** | +| "verify the response", "signed receipt" | **Verify Signature** | +| "list models", "available models" | **Model Catalog** | + +--- + +## Get API Key + +1. Go to [cloud.phala.com](https://cloud.phala.com) and add at least $5 in credits (Dashboard → Deposit). +2. Open **Dashboard → Confidential AI API** and click **Enable**. +3. Click **Create Key**, give it a name, and copy the value (starts with `sk-`). + +Store the key in your environment: + +```bash +export CONFIDENTIAL_AI_KEY="sk-..." +``` + +--- + +## First Call + +### cURL + +```bash +curl https://api.redpill.ai/v1/chat/completions \ + -H "Content-Type: application/json" \ + -H "Authorization: Bearer $CONFIDENTIAL_AI_KEY" \ + -d '{ + "model": "openai/gpt-oss-20b", + "messages": [ + { "role": "user", "content": "Hello world!" } + ] + }' +``` + +That's the canonical "hello world." Replace the model and message and you're done. + +--- + +## Model Catalog + +Models are namespaced by provider. All run inside GPU TEE. 
+
+### Phala provider (lowest cost)
+
+| Model | Model ID | Context | $/1M (in/out) |
+|---|---|---|---|
+| DeepSeek V3 0324 | `deepseek/deepseek-chat-v3-0324` | 163K | 0.28 / 1.14 |
+| Qwen 2.5 VL 72B | `qwen/qwen2.5-vl-72b-instruct` | 65K | 0.59 / 0.59 |
+| Gemma 3 27B | `google/gemma-3-27b-it` | 53K | 0.11 / 0.40 |
+| GPT-OSS 120B | `openai/gpt-oss-120b` | 131K | 0.10 / 0.49 |
+| GPT-OSS 20B | `openai/gpt-oss-20b` | 131K | 0.04 / 0.15 |
+| Qwen 2.5 7B | `qwen/qwen-2.5-7b-instruct` | 32K | 0.04 / 0.10 |
+
+### Other providers
+
+| Model | Model ID | Context |
+|---|---|---|
+| DeepSeek V3.1 (NearAI) | `deepseek/deepseek-chat-v3.1` | 163K |
+| Qwen3 30B (NearAI) | `qwen/qwen3-30b-a3b-instruct-2507` | 262K |
+| Z.AI GLM 4.6 (NearAI) | `z-ai/glm-4.6` | 202K |
+| Phi-4 (Tinfoil) | check live catalog | — |
+
+For the full live catalog, filter by **GPU TEE** to see only confidential variants.
+
+---
+
+## SDKs
+
+The endpoint is OpenAI-compatible. Any OpenAI SDK works — just change the base URL.
+
+### Python (OpenAI SDK)
+
+```python
+import os
+
+from openai import OpenAI
+
+client = OpenAI(
+    api_key=os.environ["CONFIDENTIAL_AI_KEY"],
+    base_url="https://api.redpill.ai/v1",
+)
+
+response = client.chat.completions.create(
+    model="phala/deepseek-chat-v3-0324",
+    messages=[
+        {"role": "system", "content": "You are a helpful assistant."},
+        {"role": "user", "content": "What is your model name?"},
+    ],
+)
+print(response.choices[0].message.content)
+```
+
+### TypeScript (OpenAI SDK)
+
+```typescript
+import OpenAI from "openai"
+
+const client = new OpenAI({
+  baseURL: "https://api.redpill.ai/v1",
+  apiKey: process.env.CONFIDENTIAL_AI_KEY,
+})
+
+const completion = await client.chat.completions.create({
+  model: "phala/deepseek-chat-v3-0324",
+  messages: [{ role: "user", content: "What is your model name?" }],
+})
+
+console.log(completion.choices[0].message)
+```
+
+### LangChain
+
+```python
+import os
+
+from langchain_openai import ChatOpenAI
+
+llm = ChatOpenAI(
+    model="phala/deepseek-chat-v3-0324",
+    base_url="https://api.redpill.ai/v1",
+    api_key=os.environ["CONFIDENTIAL_AI_KEY"],
+)
+```
+
+---
+
+## Streaming
+
+```python
+stream = client.chat.completions.create(
+    model="phala/qwen-2.5-7b-instruct",
+    messages=[{"role": "user", "content": "Write a haiku about TEEs"}],
+    stream=True,
+)
+for chunk in stream:
+    print(chunk.choices[0].delta.content or "", end="", flush=True)
+```
+
+cURL with SSE:
+
+```bash
+curl https://api.redpill.ai/v1/chat/completions \
+  -H "Authorization: Bearer $CONFIDENTIAL_AI_KEY" \
+  -H "Content-Type: application/json" \
+  -N \
+  -d '{
+    "model": "phala/qwen-2.5-7b-instruct",
+    "messages": [{"role":"user","content":"Hello"}],
+    "stream": true
+  }'
+```
+
+---
+
+## Tool Calling
+
+Standard OpenAI tool-calling format.
+
+```python
+tools = [{
+    "type": "function",
+    "function": {
+        "name": "get_weather",
+        "description": "Get weather for a city",
+        "parameters": {
+            "type": "object",
+            "properties": {"city": {"type": "string"}},
+            "required": ["city"],
+        },
+    },
+}]
+
+response = client.chat.completions.create(
+    model="phala/deepseek-chat-v3-0324",
+    messages=[{"role": "user", "content": "Weather in Tokyo?"}],
+    tools=tools,
+)
+print(response.choices[0].message.tool_calls)
+```
+
+Models that support tool calling: most Phala-provider models. Check the catalog page for capability flags.
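+
+The snippet above stops at the model's `tool_calls`. To close the loop, execute the tool yourself and send the result back as a `tool` message. A minimal sketch using the same `tools` schema; `get_weather` here is a local stand-in, not part of the API:
+
+```python
+import json
+import os
+
+from openai import OpenAI
+
+client = OpenAI(api_key=os.environ["CONFIDENTIAL_AI_KEY"], base_url="https://api.redpill.ai/v1")
+tools = [{"type": "function", "function": {
+    "name": "get_weather",
+    "description": "Get weather for a city",
+    "parameters": {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]},
+}}]
+
+def get_weather(city: str) -> str:
+    return f"Sunny in {city}, 22°C"  # stand-in for a real weather lookup
+
+messages = [{"role": "user", "content": "Weather in Tokyo?"}]
+first = client.chat.completions.create(model="phala/deepseek-chat-v3-0324", messages=messages, tools=tools)
+call = first.choices[0].message.tool_calls[0]
+
+# Append the assistant turn (containing the tool call), then the tool result turn.
+messages.append(first.choices[0].message)
+messages.append({
+    "role": "tool",
+    "tool_call_id": call.id,
+    "content": get_weather(**json.loads(call.function.arguments)),
+})
+
+final = client.chat.completions.create(model="phala/deepseek-chat-v3-0324", messages=messages, tools=tools)
+print(final.choices[0].message.content)
+```
+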
+ +--- + +## Images & Vision + +For VLM models like `qwen/qwen2.5-vl-72b-instruct`: + +```python +response = client.chat.completions.create( + model="qwen/qwen2.5-vl-72b-instruct", + messages=[{ + "role": "user", + "content": [ + {"type": "text", "text": "What's in this image?"}, + {"type": "image_url", "image_url": {"url": "https://example.com/img.jpg"}}, + ], + }], +) +``` + +--- + +## Structured Output + +JSON mode + JSON Schema: + +```python +response = client.chat.completions.create( + model="phala/deepseek-chat-v3-0324", + messages=[{"role": "user", "content": "Give me a person record."}], + response_format={ + "type": "json_schema", + "json_schema": { + "name": "person", + "schema": { + "type": "object", + "properties": { + "name": {"type": "string"}, + "age": {"type": "integer"}, + }, + "required": ["name", "age"], + }, + }, + }, +) +``` + +--- + +## Verify Signature + +Every Confidential AI API response can be cryptographically verified — the response chains to the GPU TEE quote. + +### Per-request attestation report + +Fetch a fresh attestation tied to a nonce: + +```bash +NONCE=$(openssl rand -hex 32) +curl -s "https://api.redpill.ai/v1/attestation/report?model=phala/deepseek-chat-v3-0324&nonce=$NONCE" \ + -H "Authorization: Bearer $CONFIDENTIAL_AI_KEY" > report.json +``` + +The response has `nvidia_payload`, `intel_quote`, `signing_address`, and `signing_algo`. + +### Response headers (per-request) + +| Header | Meaning | +|---|---| +| `x-phala-receipt-sig` | Signature over `(model_id, prompt_hash, response_hash, timestamp)` | +| `x-phala-compose-hash` | Compose-hash of the model-serving CVM | +| `x-phala-app-id` | Per-app key identity | + +### Full verification flow + +For the complete end-to-end verification — verify NVIDIA GPU via NRAS, verify Intel TDX, check report-data binds the signing key + nonce, verify the compose manifest, check Sigstore provenance, and verify the response signature — follow `verify-attestation.md`. + +Reference implementation: [`Phala-Network/private-ml-sdk/vllm-proxy/verifiers/attestation_verifier.py`](https://github.com/Phala-Network/private-ml-sdk/blob/main/vllm-proxy/verifiers/attestation_verifier.py). + +--- + +## Troubleshooting + +| Symptom | Cause | Fix | +|---|---|---| +| 401 Unauthorized | Bad / expired key | Generate a new key in Dashboard → Confidential AI API | +| 402 Payment Required | Out of credits | Add funds in Dashboard → Deposit | +| 404 Not Found | Wrong model ID | Use lowercase, e.g. `phala/deepseek-chat-v3-0324` not `Phala/DeepSeek-V3` | +| 429 Rate Limited | Workspace quota | Wait or contact Phala for quota increase | +| Response cuts off | Hit `max_tokens` | Increase `max_tokens` in request | +| Slow first token | Cold start on smaller models | Use a Dedicated Model deployment for predictable latency | + +--- + +## Reference: minimal end-to-end + +```bash +# 1. Get API key from cloud.phala.com (one-time) +export CONFIDENTIAL_AI_KEY="sk-..." + +# 2. Call +curl https://api.redpill.ai/v1/chat/completions \ + -H "Content-Type: application/json" \ + -H "Authorization: Bearer $CONFIDENTIAL_AI_KEY" \ + -d '{ + "model": "openai/gpt-oss-20b", + "messages": [{"role":"user","content":"Hello world!"}] + }' +``` + +For self-hosted alternative (dedicated GPU + your own model weights), see `gpu-vllm-deploy.md`. 
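+
+The same minimal call from Python, also capturing the signed-receipt headers from **Verify Signature** so they can be checked later per `verify-attestation.md`. Header names are taken from the table in that section; if a response does not carry them, fall back to the per-request attestation report.
+
+```python
+import os
+
+import requests
+
+resp = requests.post(
+    "https://api.redpill.ai/v1/chat/completions",
+    headers={"Authorization": f"Bearer {os.environ['CONFIDENTIAL_AI_KEY']}"},
+    json={
+        "model": "openai/gpt-oss-20b",
+        "messages": [{"role": "user", "content": "Hello world!"}],
+    },
+    timeout=120,
+)
+resp.raise_for_status()
+print(resp.json()["choices"][0]["message"]["content"])
+
+# Archive the signed-receipt headers alongside the response body.
+for h in ("x-phala-receipt-sig", "x-phala-compose-hash", "x-phala-app-id"):
+    print(h, "=", resp.headers.get(h))
+```
+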
diff --git a/skills/usecase/training-run.md b/skills/usecase/training-run.md new file mode 100644 index 00000000..475dc02c --- /dev/null +++ b/skills/usecase/training-run.md @@ -0,0 +1,372 @@ +--- +name: training-run +description: | + Run a confidential training / fine-tuning job on Phala Cloud GPU TEE. + SFT, DPO, RLHF, LoRA / QLoRA / PEFT, continued pre-training, or + multimodal projector training with TRL, Unsloth, or HuggingFace. + Use when users want to train on sealed datasets with attested checkpoints. +--- + +# Confidential Training on Phala GPU TEE + +`phala deploy` a TRL/Unsloth/HuggingFace training job on H200 with sealed datasets and signed checkpoint output. + +## Operations + +| User says | Operation | +|---|---| +| "fine-tune Llama", "SFT my model" | **First Run (SFT)** | +| "DPO", "preference tuning", "RLHF" | **DPO / RLHF** | +| "LoRA", "QLoRA", "PEFT" | **LoRA / PEFT** | +| "continued pre-training", "domain adaptation" | **Continued PT** | +| "multimodal", "vision adapter" | **Multimodal** | +| "seal the dataset", "private data" | **Seal Dataset** | +| "save checkpoints", "signed manifest" | **Output & Signing** | +| "scale to 8 GPU" | **Multi-GPU** | + +This skill builds on the foundational `../phala-cli/SKILL.md`. Install + login per that skill first. + +--- + +## Choose Method + +| Method | Trainer | Best for | Memory | +|---|---|---|---| +| SFT | TRL `SFTTrainer`, Unsloth | Instruction tuning on chat data | Full / LoRA | +| DPO | TRL `DPOTrainer` | Preference tuning (chosen/rejected pairs) | Reference + policy | +| RLHF | TRL `PPOTrainer` + reward model | Online RL from prefs | Heaviest | +| LoRA / PEFT | TRL + PEFT, Unsloth | Cost-efficient fine-tune | Tiny — 7B fits in 1 H200 | +| QLoRA | Unsloth, BitsAndBytes | 4-bit base + LoRA adapters | Smallest | +| Continued PT | Unsloth, raw HF Trainer | Domain adaptation on raw text | Medium | +| Multimodal | TRL + projector head | Adding vision / audio to LLM | Medium | + +Most teams start with **LoRA on a 7-13B model** (`h200.small`, $3.50/hr). + +--- + +## Scaffold + +### Step 1: Project layout + +```bash +mkdir my-finetune && cd my-finetune +``` + +``` +my-finetune/ +├── docker-compose.yml +├── train/ +│ ├── Dockerfile +│ ├── requirements.txt +│ └── train.py # the actual training script +├── data/ # local-only, sealed dataset +│ └── dataset.tar.gz.enc +├── .env.example +└── .env # gitignored +``` + +### Step 2: `train/Dockerfile` + +```dockerfile +FROM nvcr.io/nvidia/pytorch:24.10-py3 +RUN pip install transformers trl peft accelerate bitsandbytes datasets unsloth +COPY train.py /app/train.py +WORKDIR /app +CMD ["python", "train.py"] +``` + +### Step 3: `train/train.py` (LoRA SFT example) + +```python +from trl import SFTTrainer, SFTConfig +from peft import LoraConfig +from datasets import load_dataset +from transformers import AutoModelForCausalLM, AutoTokenizer +import os + +model_id = os.environ["BASE_MODEL"] # e.g. 
meta-llama/Llama-3.1-8B-Instruct +dataset_path = os.environ["DATASET_PATH"] # mounted volume path +output_dir = os.environ.get("OUTPUT_DIR", "/output") + +tok = AutoTokenizer.from_pretrained(model_id) +model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto") +ds = load_dataset("json", data_files=dataset_path, split="train") + +trainer = SFTTrainer( + model=model, + tokenizer=tok, + train_dataset=ds, + peft_config=LoraConfig(r=32, lora_alpha=64, target_modules="all-linear"), + args=SFTConfig( + output_dir=output_dir, + num_train_epochs=3, + per_device_train_batch_size=2, + save_strategy="epoch", + ), +) +trainer.train() +trainer.save_model(output_dir) +``` + +### Step 4: `docker-compose.yml` + +```yaml +services: + trainer: + build: ./train + environment: + - BASE_MODEL=meta-llama/Llama-3.1-8B-Instruct + - DATASET_PATH=/data/dataset.jsonl + - OUTPUT_DIR=/output + - HF_TOKEN=${HF_TOKEN} + - DATASET_KEY=${DATASET_KEY} # for in-CVM dataset decryption + volumes: + - sealed-data:/data + - checkpoints:/output + - /var/run/dstack.sock:/var/run/dstack.sock + deploy: + resources: + reservations: + devices: + - driver: nvidia + count: all + capabilities: [gpu] +volumes: + sealed-data: + checkpoints: +``` + +--- + +## Seal Dataset + +The dataset never leaves your laptop in cleartext. + +### Step 1: Encrypt locally with HKDF + +```python +# scripts/seal-dataset.py +import os, hashlib, hmac, json +from cryptography.hazmat.primitives.ciphers.aead import AESGCM + +KMS_PUBKEY = open("kms-pubkey.txt").read().strip() +APP_ID = os.environ["APP_ID"] +COMPOSE_HASH = os.environ["COMPOSE_HASH"] + +def hkdf(material: bytes, info: bytes, length: int = 32) -> bytes: + return hmac.new(material, info, hashlib.sha256).digest()[:length] + +key = hkdf(KMS_PUBKEY.encode(), f"{APP_ID}:{COMPOSE_HASH}".encode()) +aes = AESGCM(key) +nonce = os.urandom(12) +plaintext = open("dataset.jsonl", "rb").read() +ct = aes.encrypt(nonce, plaintext, None) +open("dataset.jsonl.enc", "wb").write(nonce + ct) +``` + +### Step 2: Upload encrypted blob + +The encrypted file ships with the compose. Or to S3 (decrypt inside CVM via `--pre-launch-script`). + +### Step 3: Decrypt inside the CVM + +The CVM's per-app key is derived only after attestation passes. Mirror the HKDF to re-derive the same key inside the CVM: + +```python +# Inside train.py — pip install dstack-sdk cryptography +import os +from cryptography.hazmat.primitives.ciphers.aead import AESGCM +from dstack_sdk import DstackClient + +client = DstackClient() # auto-connects to /var/run/dstack.sock +derived = client.get_key("dataset/wrap", os.environ["COMPOSE_HASH"]) +key = derived.decode_key() # 32-byte secp256k1 → use as AES key + +aes = AESGCM(key) +blob = open("/data/dataset.jsonl.enc", "rb").read() +plaintext = aes.decrypt(blob[:12], blob[12:], None) +open("/tmp/dataset.jsonl", "wb").write(plaintext) +``` + +If the compose-hash doesn't match the registered hash, the derived key is wrong and decryption fails. Stolen ciphertext is useless. 
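+
+Before uploading, it is cheap to sanity-check the sealed blob locally: decrypt it with the same locally derived key and compare against the plaintext. A minimal sketch reusing the `hkdf` helper and env vars from `scripts/seal-dataset.py` above (the in-CVM path still goes through the dstack SDK as shown):
+
+```python
+# scripts/check-seal.py - local round-trip test before shipping dataset.jsonl.enc
+import hashlib
+import hmac
+import os
+
+from cryptography.hazmat.primitives.ciphers.aead import AESGCM
+
+KMS_PUBKEY = open("kms-pubkey.txt").read().strip()
+APP_ID = os.environ["APP_ID"]
+COMPOSE_HASH = os.environ["COMPOSE_HASH"]
+
+def hkdf(material: bytes, info: bytes, length: int = 32) -> bytes:
+    # Same truncated HMAC-SHA256 derivation as seal-dataset.py
+    return hmac.new(material, info, hashlib.sha256).digest()[:length]
+
+key = hkdf(KMS_PUBKEY.encode(), f"{APP_ID}:{COMPOSE_HASH}".encode())
+blob = open("dataset.jsonl.enc", "rb").read()
+plaintext = AESGCM(key).decrypt(blob[:12], blob[12:], None)  # nonce is the first 12 bytes
+
+assert plaintext == open("dataset.jsonl", "rb").read()
+print("round-trip OK, dataset.jsonl.enc is safe to upload")
+```
+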
+ +--- + +## First Run (SFT) + +### Step 1: Pick GPU shape + +| Model size | Type | Per-device batch | GPU | +|---|---|---|---| +| 7B LoRA | `h200.small` | 4-8 | 1× H200 | +| 13B LoRA | `h200.small` | 2-4 | 1× H200 | +| 70B LoRA | `h200.16xlarge` | 1-2 | 8× H200 | +| 7B full SFT | `h200.16xlarge` | 1 | 8× H200 | + +### Step 2: Deploy + +```bash +phala deploy -n llama-sft -c docker-compose.yml -e .env -t h200.small --kms phala --wait +``` + +### Step 3: Stream logs + +```bash +phala logs -f +``` + +Look for `loss=...` decreasing. Training duration depends on dataset size and method — typical small-data LoRA finishes in 1-3 hours. + +--- + +## Multi-GPU + +For 70B SFT or larger: + +### Update `train.py` to use accelerate launcher + +```python +# Replace the simple Trainer with deepspeed / FSDP via accelerate +``` + +### Update compose command + +```yaml +command: ["accelerate", "launch", "--multi_gpu", "--num_processes=8", "train.py"] +``` + +### Deploy on 8× H200 + +```bash +phala deploy -n llama-70b-sft -c docker-compose.yml -e .env -t h200.16xlarge --kms phala --wait +``` + +--- + +## DPO / RLHF + +Swap `SFTTrainer` for `DPOTrainer`: + +```python +from trl import DPOTrainer, DPOConfig + +trainer = DPOTrainer( + model=model, + ref_model=None, # uses peft adapters disabled + tokenizer=tok, + train_dataset=ds, # must have chosen/rejected fields + args=DPOConfig(output_dir=output_dir, beta=0.1, num_train_epochs=1), +) +trainer.train() +``` + +For PPO/RLHF, follow TRL's `PPOTrainer` recipe — same compose, swap script. + +--- + +## LoRA / PEFT + +Already shown in **Scaffold**. Adapter files land in `/output/adapter_*` — much smaller than full checkpoints (typically 50-500 MB). + +--- + +## Continued PT + +Use Unsloth's `from_pretrained` + raw `Trainer`: + +```python +from unsloth import FastLanguageModel +model, tok = FastLanguageModel.from_pretrained(model_id, load_in_4bit=True) +# ... raw text dataset, MaskedLM-style training ... +``` + +--- + +## Output & Signing + +After training completes, the output dir contains: +- `pytorch_model.bin` / `safetensors` (weights) +- `config.json` +- `tokenizer.json` (if shipped) + +### Sign the manifest + +Use the per-app key (via `dstack-guest-agent`) to sign a manifest containing checkpoint hashes: + +```python +import json, hashlib, time +from dstack_sdk import DstackClient + +client = DstackClient() +ckpt = open("/output/pytorch_model.bin", "rb").read() +manifest = { + "compose_hash": os.environ["COMPOSE_HASH"], + "base_model": os.environ["BASE_MODEL"], + "checkpoint_sha256": hashlib.sha256(ckpt).hexdigest(), + "ts": int(time.time()), +} + +# Bind the manifest into a fresh TDX quote — `application_data` is the +# 64-byte report_data the verifier checks. +report = hashlib.sha256(json.dumps(manifest, sort_keys=True).encode()).digest() +quote = client.get_quote(report) + +open("/output/manifest.signed.json", "w").write(json.dumps({ + **manifest, + "quote": quote.quote, + "event_log": quote.event_log, +})) +``` + +The signature chains to the TDX root + on-chain `DstackApp.sol` entry. Auditors verify offline. + +### Pull the artifacts + +```bash +phala cp :/output/ ./checkpoints/ -r +``` + +--- + +## Verify + +```bash +phala cvms attestation llama-sft --json > attestation.json +phala ssh -- nvidia-smi conf-compute -q # ConfComputeMode : ON +``` + +Then run the full verification flow per `verify-attestation.md` — Intel TDX + NVIDIA NRAS + report-data binding + compose-hash. 
The `manifest.signed.json` quote can be verified the same way: its `report_data` binds the manifest hash to a fresh TDX quote. + +--- + +## Troubleshooting + +| Symptom | Cause | Fix | +|---|---|---| +| OOM on 7B SFT | Batch too big | Reduce `per_device_train_batch_size` or use `gradient_accumulation_steps` | +| Slow throughput | Single-GPU FP16 70B | Move to `h200.16xlarge` + tensor parallel | +| `HF_TOKEN` 401 | Token doesn't have model access | Accept the model license on HF, regenerate token | +| Decrypt fails | Compose-hash mismatch | First deploy registers the hash; subsequent deploys must use the same compose | +| Signature fails | Wrong app-id | `phala status` to confirm workspace; pubkey must match the running app | + +--- + +## Reference: minimal end-to-end + +```bash +# 1. Scaffold + seal dataset +mkdir my-finetune && cd my-finetune +# (write Dockerfile + train.py + compose + .env) +python scripts/seal-dataset.py + +# 2. Auth + deploy +phala login +phala deploy -n llama-sft -c docker-compose.yml -e .env -t h200.small --kms phala --wait + +# 3. Watch + verify +phala logs -f +phala cvms attestation --json | jq +phala cp :/output/manifest.signed.json ./ +``` + +The output is a signed checkpoint that any auditor can verify — no need to trust the trainer or Phala. diff --git a/skills/usecase/verify-attestation.md b/skills/usecase/verify-attestation.md new file mode 100644 index 00000000..752dfd7b --- /dev/null +++ b/skills/usecase/verify-attestation.md @@ -0,0 +1,349 @@ +--- +name: verify-attestation +description: | + Verify a Phala Cloud TEE attestation end-to-end — Intel TDX quote + to Intel root, NVIDIA GPU quote to NVIDIA root, report-data binding + the signing key + nonce, OS image hash to dstack-os reproducible + build, compose-hash to expected app, and Sigstore provenance for + container images. Use this whenever a user asks "how do I verify + this is really running in TEE?" +--- + +# Verify TEE Attestation + +The hardware-rooted proof flow that other skills reference. Every step gives a separate cryptographic guarantee — together they prove a Phala Cloud workload is running on genuine TEE hardware with the exact code you registered. + +## Operations + +| User says | Operation | +|---|---| +| "is this really in TEE?", "verify the CVM" | **Quick Check** | +| "verify the GPU TEE", "NVIDIA quote" | **Verify NVIDIA GPU** | +| "verify the TDX quote", "Intel root" | **Verify Intel TDX** | +| "fresh nonce", "replay attack" | **Nonce Binding** | +| "OS image hash", "reproducible build" | **OS Image Verification** | +| "compose-hash matches", "exact code running" | **Compose Manifest** | +| "verify the response signature" | **Verify Signature** | +| "offline verifier", "no internet" | **Offline Verification** | + +> **Authoritative doc:** [docs.phala.com/phala-cloud/confidential-ai/verify](https://docs.phala.com/phala-cloud/confidential-ai/verify) is the source of truth. This skill summarizes it as runnable steps. + +--- + +## Quick Check + +If you just want to confirm "yes, this CVM is running in TEE": + +```bash +# Summary +phala cvms attestation my-app + +# Full JSON (for programmatic verification) +phala cvms attestation my-app --json > attestation.json +``` + +The summary should report `is_online: true`, `is_public: true`, and `error: null`. The JSON contains `app_certificates[0].quote` — a hex-encoded TDX quote that's the basis for everything else below. 
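+
+To turn the quick check into a pass/fail without reading the JSON by hand, post that quote to the same verify endpoint used in **Verify Intel TDX** below. A minimal sketch:
+
+```python
+import json
+
+import requests
+
+# Uses the file saved by `phala cvms attestation my-app --json > attestation.json`
+attestation = json.load(open("attestation.json"))
+quote_hex = attestation["app_certificates"][0]["quote"]
+
+result = requests.post(
+    "https://cloud-api.phala.com/api/v1/attestations/verify",
+    json={"hex": quote_hex},
+    timeout=60,
+).json()
+assert result["quote"]["verified"] is True
+print("TDX quote verified")
+```
+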
+ +For inference (hosted Confidential AI API), use the per-request flow: + +```bash +curl "https://api.redpill.ai/v1/attestation/report?model=phala/deepseek-chat-v3-0324&nonce=$(openssl rand -hex 32)" \ + -H "Authorization: Bearer $CONFIDENTIAL_AI_KEY" > report.json +``` + +The response includes `nvidia_payload`, `intel_quote`, `signing_address`, and `signing_algo`. + +--- + +## Why every step + +| Risk | Step that catches it | +|---|---| +| Replayed attestation from old/compromised hardware | **Nonce binding** — fresh per request | +| Counterfeit CPU pretending to be Intel TDX | **Verify Intel TDX** via DCAP / Phala verify endpoint | +| Counterfeit GPU pretending to be H100/H200 | **Verify NVIDIA** via NRAS | +| Signing key not actually inside the TEE | **Report-data binding** — first 32 bytes of `reportdata` = signing key | +| Operator swapped your code post-boot | **Compose manifest hash** — `mr_config` includes compose-hash | +| Operator swapped the OS image | **OS image hash** — matches dstack-os reproducible build | +| Container image swapped at registry | **Sigstore provenance** — built from expected source | + +Skip any step → that risk is unguarded. + +--- + +## Nonce Binding + +Generate a fresh 32-byte nonce per attestation request. The TEE embeds this nonce into the report — replayed quotes won't match. + +```python +import secrets +request_nonce = secrets.token_hex(32) # 64 hex chars +``` + +Pass the nonce when you fetch the attestation report: + +```python +import requests +report = requests.get( + f"https://api.redpill.ai/v1/attestation/report?model={model}&nonce={request_nonce}", + headers={"Authorization": f"Bearer {api_key}"}, +).json() +``` + +For app-CVM attestation (not hosted-API), the nonce is bound at handshake time via RA-TLS, and you verify by extracting `report_data` from the TLS cert's TDX-quote extension. + +--- + +## Verify NVIDIA GPU + +Only NVIDIA can confirm their hardware is genuine — secret keys baked into each chip at manufacturing. + +```python +import json, base64, requests + +gpu_payload = json.loads(report["nvidia_payload"]) +assert gpu_payload["nonce"].lower() == request_nonce.lower() # check fresh + +# Send to NVIDIA Remote Attestation Service (NRAS) +r = requests.post("https://nras.attestation.nvidia.com/v3/attest/gpu", json=gpu_payload) +result = r.json() + +# Decode the JWT verdict +jwt_token = result[0][1] +payload_b64 = jwt_token.split(".")[1] +padded = payload_b64 + "=" * ((4 - len(payload_b64) % 4) % 4) +verdict = json.loads(base64.urlsafe_b64decode(padded)) + +assert verdict["x-nvidia-overall-att-result"] is True +``` + +A passing NRAS verdict means the GPU silicon is genuine NVIDIA, the firmware is signed, and Confidential Compute mode is active. + +--- + +## Verify Intel TDX + +Two paths — an online verifier service (easy) or local DCAP verification (offline). + +### Online (Phala verifier) + +```python +intel_result = requests.post( + "https://cloud-api.phala.com/api/v1/attestations/verify", + json={"hex": report["intel_quote"]}, +).json() + +assert intel_result["quote"]["verified"] is True +``` + +### Offline (DCAP) + +Use [`dcap-qvl`](https://github.com/Phala-Network/dcap-qvl) — Phala's open-source DCAP quote verifier: + +```bash +cargo install --git https://github.com/Phala-Network/dcap-qvl +echo "$INTEL_QUOTE_HEX" | xxd -r -p > quote.bin +dcap-qvl verify quote.bin +``` + +This chains to Intel's PCS root cert, no network call to Phala. 
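+
+If you are starting from a per-request `report.json` rather than a raw hex string, the equivalent of the `xxd -r -p` step in Python (a convenience sketch; `intel_quote` is the field shown in **Quick Check**):
+
+```python
+import json
+
+# Convert the hosted-API report's intel_quote into the raw blob dcap-qvl expects.
+report = json.load(open("report.json"))
+quote_hex = report["intel_quote"].removeprefix("0x")
+open("quote.bin", "wb").write(bytes.fromhex(quote_hex))
+print("wrote quote.bin:", len(quote_hex) // 2, "bytes")
+```
+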
+
+For an interactive sanity check without code, paste the hex `intel_quote` into the [TEE Attestation Explorer](https://proof.t16z.com/) — it decodes the quote and shows TDX version + security features.
+
+---
+
+## Report-Data Binding
+
+The TDX quote's `reportdata` field is 64 bytes the application provides to the hardware at attestation time. Phala packs it as:
+
+| Bytes | Content |
+|---|---|
+| 0–31 | Signing address (ECDSA: 20-byte Eth address right-padded; Ed25519: 32-byte pubkey) |
+| 32–63 | Your request nonce |
+
+Verify both halves match what you expect:
+
+```python
+report_data = bytes.fromhex(intel_result["quote"]["body"]["reportdata"].removeprefix("0x"))
+
+embedded_address = report_data[:32]
+embedded_nonce = report_data[32:64]
+
+if report["signing_algo"] == "ecdsa":
+    addr = bytes.fromhex(report["signing_address"].removeprefix("0x"))
+    assert embedded_address == addr.ljust(32, b"\x00")
+else:
+    pubkey = bytes.fromhex(report["signing_address"])
+    assert embedded_address == pubkey
+
+assert embedded_nonce.hex() == request_nonce
+```
+
+This proves: (1) the signing key was generated inside the TEE — it's bound into hardware-attested report data; (2) the attestation is fresh — it contains your unique nonce; (3) the signing key you'll use for verifying responses actually belongs to this TEE instance.
+
+---
+
+## OS Image Verification
+
+Verify the operating system the CVM booted is the dstack-os reproducible build, not a tampered image.
+
+The TDX `mrtd` and `rtmr0..3` measurements are folded into the quote at boot. Compare them against the expected values from [meta-dstack reproducible builds](https://github.com/Dstack-TEE/meta-dstack#reproducible-build-the-guest-image):
+
+```bash
+tar -xzf dstack-0.5.5.tar.gz
+cat dstack-0.5.5/digest.txt
+# 0b327bcd642788b0517de3ff46d31ebd3847b6c64ea40bacde268bb9f1c8ec83
+```
+
+Then in the verification code, follow the [`osVerification.ts`](https://github.com/Phala-Network/dstack-verifier/blob/95689c41/src/verification/osVerification.ts#L13-L27) pattern from `dstack-verifier` to compute and compare TCB measurements against the digest above.
+
+If even one byte of the OS image differs, the measurements won't match.
+
+---
+
+## Compose Manifest
+
+Verify the running CVM's `app_compose.json` hash matches the registered `compose-hash`. This is what proves "the code I see is the code that's running".
+
+```python
+import json
+from hashlib import sha256
+
+tcb_info = report["info"]["tcb_info"]
+if isinstance(tcb_info, str):
+    tcb_info = json.loads(tcb_info)
+
+app_compose = tcb_info["app_compose"]
+compose_hash = sha256(app_compose.encode()).hexdigest()
+
+# `mr_config` field of TDX quote includes the compose hash, prefixed with "0x01"
+mr_config = intel_result["quote"]["body"]["mrconfig"]
+expected = "0x01" + compose_hash
+assert mr_config.lower().startswith(expected.lower())
+
+# Optional: print the actual docker-compose so the user can review
+docker_compose = json.loads(app_compose)["docker_compose_file"]
+print(docker_compose)
+```
+
+This proves the CVM booted with exactly the docker-compose you registered. The operator can't swap services, change images, or inject env vars after boot.
+
+---
+
+## Sigstore Provenance
+
+Verify each container image in the compose was built from a known source repo (not a backdoor pushed to the registry).
+ +```python +import re, requests + +digests = set(re.findall(r'@sha256:([0-9a-f]{64})', docker_compose)) + +for digest in digests: + sigstore_url = f"https://search.sigstore.dev/?hash=sha256:{digest}" + r = requests.head(sigstore_url, timeout=10) + if r.status_code < 400: + print(f"✓ {sigstore_url}") + else: + print(f"✗ {sigstore_url} (HTTP {r.status_code})") +``` + +A passing Sigstore link lets the user open the URL and confirm the image was built from the expected GitHub repo, with the expected workflow, by the expected actor. If a digest has no Sigstore record, the user must trust the registry directly — flag this in your verifier UI. + +--- + +## Verify Signature + +Once you've verified the signing key is bound to a real TEE, verify response signatures from the Confidential AI API. + +Every response carries: + +| Header | Meaning | +|---|---| +| `x-phala-receipt-sig` | Signature over `(model_id, prompt_hash, response_hash, ts)` | +| `x-phala-compose-hash` | Compose-hash of the model serving CVM | +| `x-phala-app-id` | Per-app key identity | + +```python +import hashlib +from eth_keys import keys + +prompt_hash = hashlib.sha256(prompt.encode()).hexdigest() +response_hash = hashlib.sha256(response_text.encode()).hexdigest() +payload = f"{model_id}|{prompt_hash}|{response_hash}|{timestamp}" + +# ECDSA verification (signing_algo == "ecdsa") +signature = bytes.fromhex(receipt_sig.removeprefix("0x")) +recovered = keys.ecdsa_recover(hashlib.sha256(payload.encode()).digest(), keys.Signature(signature)) +assert recovered.to_address() == report["signing_address"] +``` + +For Ed25519, use `nacl.signing.VerifyKey(pubkey).verify(payload, signature)` instead. + +This is the final link: the response YOU received is bound to the (verified-genuine) TEE that produced it. + +--- + +## Offline Verification + +If you can't reach Phala's verify endpoint or NRAS: + +| Step | Offline tool | +|---|---| +| Verify Intel TDX quote | [`dcap-qvl`](https://github.com/Phala-Network/dcap-qvl) — chains to Intel PCS root | +| Verify NVIDIA GPU quote | [`nvattest-verifier`](https://github.com/NVIDIA/nvtrust) — NVIDIA's local verifier (chains to NVIDIA root) | +| Verify OS image hash | Compare against `digest.txt` from [meta-dstack release tarball](https://github.com/Dstack-TEE/meta-dstack/releases) | +| Verify compose-hash | sha256 the app_compose JSON locally; check against `mr_config` byte for byte | +| Verify Sigstore record | `cosign verify --certificate-identity-regexp '...' --certificate-oidc-issuer 'https://token.actions.githubusercontent.com' ` | + +End-to-end offline: clone [Phala-Network/dstack-verifier](https://github.com/Phala-Network/dstack-verifier), feed it `attestation.json`, get a single PASS/FAIL. + +--- + +## Reference: minimal end-to-end (Python) + +```python +# Full flow — verify a Confidential AI API response is genuine +import secrets, requests, json, base64, hashlib + +api_key = os.environ["CONFIDENTIAL_AI_KEY"] +model = "phala/deepseek-chat-v3-0324" + +# 1. Fresh nonce +nonce = secrets.token_hex(32) + +# 2. Get attestation report +report = requests.get( + f"https://api.redpill.ai/v1/attestation/report?model={model}&nonce={nonce}", + headers={"Authorization": f"Bearer {api_key}"}, +).json() + +# 3. 
Verify NVIDIA GPU +gpu_payload = json.loads(report["nvidia_payload"]) +assert gpu_payload["nonce"].lower() == nonce.lower() +nras = requests.post("https://nras.attestation.nvidia.com/v3/attest/gpu", json=gpu_payload).json() +verdict = json.loads(base64.urlsafe_b64decode(nras[0][1].split(".")[1] + "==")) +assert verdict["x-nvidia-overall-att-result"] is True + +# 4. Verify Intel TDX +intel = requests.post( + "https://cloud-api.phala.com/api/v1/attestations/verify", + json={"hex": report["intel_quote"]}, +).json() +assert intel["quote"]["verified"] is True + +# 5. Verify report-data binding +rd = bytes.fromhex(intel["quote"]["body"]["reportdata"].removeprefix("0x")) +addr = bytes.fromhex(report["signing_address"].removeprefix("0x")) +assert rd[:32] == addr.ljust(32, b"\x00") +assert rd[32:64].hex() == nonce + +# 6. Verify compose-hash +tcb = report["info"]["tcb_info"] +if isinstance(tcb, str): tcb = json.loads(tcb) +ch = hashlib.sha256(tcb["app_compose"].encode()).hexdigest() +assert intel["quote"]["body"]["mrconfig"].lower().startswith(("0x01" + ch).lower()) + +print("ALL VERIFIED — request to", model, "ran on genuine GPU TEE with the expected code.") +``` + +A reference implementation lives at [`Phala-Network/private-ml-sdk/vllm-proxy/verifiers/attestation_verifier.py`](https://github.com/Phala-Network/private-ml-sdk/blob/main/vllm-proxy/verifiers/attestation_verifier.py) — copy it, point at your model + key, run.