Last updated: 2026-02-12
This document provides a reproducible, public workflow:
1. Deploy dstack-kms on a GCP TDX CVM
2. Complete on-chain authorization
3. Have a Nitro Enclave retrieve keys from the KMS via RA-TLS
- KMS Runtime: dstack OS on GCP Confidential VM (TDX)
- Authentication: On-chain authorization (production mode)
- Two Connectivity Modes:
  - Direct RPC: kms -> auth-api -> public RPC (contract execution)
  - Light Client: kms -> auth-api -> helios (contract execution) -> public RPC (data sync)
- Caller: AWS Nitro Enclave application
The following must be installed and authenticated:
- gcloud (logged in via gcloud auth login, with permissions to create Confidential VMs)
- aws CLI (configured via aws configure)
- docker / docker compose
- node + npm
- jq
- Rust toolchain (cargo) + musl target: rustup target add x86_64-unknown-linux-musl
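A quick, non-authoritative check that everything above is installed (sketch; adjust to taste):

# Sanity-check the prerequisites listed above
for t in gcloud aws docker node npm jq cargo; do
  command -v "$t" >/dev/null || echo "missing: $t"
done
rustup target list --installed | grep -q x86_64-unknown-linux-musl || echo "missing: musl target"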
# Download the dstack-cloud CLI
curl -fsSL -o ~/.local/bin/dstack-cloud \
https://raw.githubusercontent.com/Phala-Network/meta-dstack-cloud/main/scripts/bin/dstack-cloud
chmod +x ~/.local/bin/dstack-cloud

Source code and latest releases: Phala-Network/meta-dstack-cloud

dstack-cloud config-edit

Edit ~/.config/dstack-cloud/config.json and fill in the following fields:
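For illustration, a minimal config with the required fields (same shape as the reference config at the very end of this guide; replace project, zone, bucket, and the image path with your own):

{
  "image_search_paths": ["/path/to/your/images"],
  "gcp": {
    "project": "your-gcp-project",
    "zone": "us-central1-a",
    "bucket": "gs://your-bucket-dstack"
  }
}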
# Download and extract (note: must use the -uki.tar.gz variant)
dstack-cloud pull https://github.com/Phala-Network/meta-dstack-cloud/releases/download/v0.6.0-test/dstack-cloud-0.6.0-uki.tar.gz

This is currently a test release (v0.6.0-test); reproducible build scripts are not yet available.

Important: The release contains two image variants. You must download the -uki.tar.gz file (containing disk.raw + auth_hash.txt), not the plain .tar.gz file (which contains separate kernel/rootfs files for bare-metal VMM use). Using the wrong variant will cause dstack-cloud deploy to fail with "Boot image not found" and result in an empty os_image_hash in attestation.
git clone https://github.com/Phala-Network/dstack-cloud-deployment-guide.git
cd dstack-cloud-deployment-guide

It is recommended to prepare a domain name (e.g., test-kms.kvin.wang) pointing to the KMS GCP instance's public IP.
Open ports:
- 12001/tcp: KMS API
- 18000/tcp: internal auth-api (debugging, optional)
- 18545/tcp: internal helios eth RPC (debugging, optional, Light Client mode only)
Get a Base Sepolia RPC URL from a provider — Alchemy / Infura / QuickNode free tier all work. Keep it handy; you'll reuse it in §4 (Hardhat tasks) and §9 (helios EXECUTION_RPC).
export RPC_URL="https://base-sepolia.g.alchemy.com/v2/<YOUR_KEY>"

Don't use https://sepolia.base.org for this. Since the Base V1 / base-reth-node rollout on 2026-04-20, the public endpoint runs with most non-eth_* namespaces stripped, which breaks two specific things this guide needs:

- web3_clientVersion is missing → kms:deploy aborts during OpenZeppelin upgrades-core's isDevelopmentNetwork check with ProviderError: Method not found (details in the §4.2 callout).
- eth_getProof is missing → the helios light client can't verify state reads (details in §9.1).
sepolia.base.org is still fine for runtime eth_calls — i.e. ETH_RPC_URL inside the KMS CVM in Direct RPC mode (§6.2) can stay on the public endpoint. The provider URL is only needed for tooling on your laptop and (in Light Client mode) for helios.
One-line sanity check before you start:
curl -s -X POST -H 'Content-Type: application/json' \
--data '{"jsonrpc":"2.0","method":"web3_clientVersion","id":1}' "$RPC_URL"
# Expect e.g. {"jsonrpc":"2.0","id":1,"result":"reth/v1.11.3-.../base/v0.7.6"}
# If you get "Method not found", you're hitting a stripped-down endpoint — switch provider.

# Create a dstack-cloud project in your working directory
# (stay in the dstack-cloud-deployment-guide root — later sections reference paths relative to it)
mkdir -p workshop-run
dstack-cloud new workshop-run/kms-prod \
--os-image dstack-cloud-0.6.0 \
--key-provider tpm \
  --instance-name dstack-kms

Example output:
Initialized project in /path/to/workshop-run/kms-prod
Created files:
app.json - Application configuration (with embedded GCP config)
shared/ - System-generated files
docker-compose.yaml - Docker compose file
prelaunch.sh - Prelaunch script
.user-config - User configuration
Created new project: kms-prod
Project directory: /path/to/workshop-run/kms-prod
Key files in the generated project:
- app.json: Project configuration (OS image, key provider, instance name, etc.)
- docker-compose.yaml: Container orchestration (will be replaced in subsequent steps)
- prelaunch.sh: Script executed before containers start
Note: Production mode requires completing contract deployment and authorization configuration first.
Prepare a test wallet and ensure it has sufficient balance (approximately 0.003 ETH on Base Sepolia is enough for the deployment steps).
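A quick balance check against the provider RPC (sketch; eth_getBalance returns the balance in wei as a hex quantity, and 0.003 ETH is 3,000,000,000,000,000 wei):

# Check the deployer wallet balance via standard JSON-RPC (replace the address)
curl -s -X POST -H 'Content-Type: application/json' \
  --data '{"jsonrpc":"2.0","method":"eth_getBalance","params":["0x<YOUR_WALLET_ADDRESS>","latest"],"id":1}' \
  "$RPC_URL" | jq -r .result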
git clone https://github.com/Phala-Network/dstack-nitro-enclave-app.git --recurse-submodules
cd dstack-nitro-enclave-app/dstack/kms/auth-eth
npm ci
npx hardhat compile

Compilation output:
Generating typings for: 19 artifacts in dir: typechain-types for target: ethers-v6
Successfully generated 72 typings!
Compiled 19 Solidity files successfully (evm target: paris).
Deploy the contract:
# RPC_URL is the provider URL from §2.7 — do NOT use https://sepolia.base.org here.
export PRIVATE_KEY="<YOUR_PRIVATE_KEY>"
echo "y" | npx hardhat kms:deploy --with-app-impl --network customExample output:
Deploying with account: 0xe359...EfB5
Account balance: 0.002689232335867312
Step 1: Deploying DstackApp implementation...
✅ DstackApp implementation deployed to: 0x43ac...A578
Step 2: Deploying DstackKms...
DstackKms Proxy deployed to: 0xFaAD...4DBC
Record from the output:
- DstackKms Proxy (used later as KMS_CONTRACT_ADDR)
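Optionally verify that code is present at the proxy address (sketch using the standard eth_getCode JSON-RPC method; anything longer than a bare "0x" means bytecode is deployed):

curl -s -X POST -H 'Content-Type: application/json' \
  --data '{"jsonrpc":"2.0","method":"eth_getCode","params":["<KMS_CONTRACT_ADDR>","latest"],"id":1}' \
  "$RPC_URL" | jq -r '.result | length'
# > 2 (longer than "0x") means the contract is live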
If you ignored §2.7 and used https://sepolia.base.org, kms:deploy aborts with:

ProviderError: Method not found
    at HttpProvider.request ...
    at async isDevelopmentNetwork (.../@openzeppelin/upgrades-core/src/provider.ts:160)

OpenZeppelin upgrades-core calls web3_clientVersion from isDevelopmentNetwork before deploying any proxy (on any chain id ≠ 1337 / 31337), and base-reth-node behind the public endpoint runs with the web3 namespace disabled. base/node#421 (merged Oct 2025) enabled it in the upstream reth-entrypoint script for self-hosted operators, but sepolia.base.org hasn't picked it up; OpenZeppelin has no flag to skip the check. The fix is to switch to a provider URL (§2.7).

Historically — on older RPC backends — this step also surfaced as a "Contract deployment failed - no code at address" race: the proxy was on chain but the post-deploy read raced ahead of propagation. If you hit that on a different RPC, the "DstackKms Proxy deployed to:" line still appears before the error; record that address and verify with cast code <addr> --rpc-url <RPC> or sepolia.basescan.org.
export KMS_CONTRACT_ADDRESS="<KMS_CONTRACT_ADDR>"
# Create app (allow-any-device is fine for demos; tighten for production)
npx hardhat kms:create-app --network custom --allow-any-device

✅ App deployed and registered successfully!
Proxy Address (App Id): 0x1342...8BA0
Owner: 0xe359...EfB5
export APP_ID="<APP_ID_FROM_CREATE_APP>"

The RPC_URL, PRIVATE_KEY, and KMS_CONTRACT_ADDRESS environment variables must remain set throughout this process.
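A minimal guard you can paste before later steps (sketch; uses bash-only indirect expansion, so adjust for other shells):

for v in RPC_URL PRIVATE_KEY KMS_CONTRACT_ADDRESS APP_ID; do
  [ -n "${!v}" ] || echo "missing env var: $v"
done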
The workshop/kms/builder/ directory in this repository provides a one-step build script that produces an image containing both dstack-kms and helios (for the Light Client mode in Section 9).
Source versions are pinned in build-image.sh (DSTACK_REV / HELIOS_REV) and can be overridden via environment variables.
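For example, to build from different source revisions, override the pinned variables on the command line (hypothetical commit hashes; run from workshop/kms/builder, as in the block below):

DSTACK_REV=<dstack-commit> HELIOS_REV=<helios-commit> ./build-image.sh cr.kvin.wang/dstack-kms:latest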
cd workshop/kms/builder
# Build (uses pinned versions by default)
# Replace cr.kvin.wang with your own registry if needed
./build-image.sh cr.kvin.wang/dstack-kms:latest
# Push
docker push cr.kvin.wang/dstack-kms:latest

Copy the compose template from this repository to the project directory generated by dstack-cloud new:
# Assuming current directory is dstack-cloud-deployment-guide and project is at workshop-run/kms-prod
cp workshop/kms/docker-compose.direct.yaml workshop-run/kms-prod/docker-compose.yaml

Datadog (optional): use workshop/kms/docker-compose.direct.datadog.yaml instead — same services plus a sidecar datadog-agent that scrapes the KMS Prometheus /metrics endpoint and forwards container logs. The rest of §6 proceeds identically; you'll also add a few DD_* env vars in §6.2 and use the verification block at the end of §6.5.
Security boundary:
prelaunch.sh is embedded in app-compose.json, which the dstack guest-agent serves over HTTP whenever public_tcbinfo: true (the default) and which is also bundled into the shared-disk tarball uploaded to GCS during deploy. Never put secrets in prelaunch.sh.

Secrets go in .user-config (mounted at /dstack/.host-shared/.user-config inside the CVM). dstack stores this file verbatim — it is not part of app-compose.json, not measured (changing it doesn't move mr_aggregated, so you can rotate without re-registering on chain), and not served by the public TCB-info HTTP endpoint. See docs/security/cvm-boundaries.md for the full boundary contract.

Because .user-config is not measured, a malicious host could swap it. Two consequences shape how we consume it:

- JSON format, parsed with jq. Raw cat-then-source would let a substituted file inject shell metacharacters (KEY="; rm -rf / ") into the .env consumed by docker-compose.
- Explicit allowlist of keys baked into prelaunch.sh. Since prelaunch.sh is measured (part of app-compose.json, hashed into mr_aggregated), a malicious host can swap the value of a whitelisted key, but cannot introduce new env vars.
Write a prelaunch.sh that lays down the non-secret defaults and then pulls whitelisted keys from .user-config:
cat > workshop-run/kms-prod/prelaunch.sh <<'EOF'
#!/bin/sh
# Prelaunch script - write .env for docker-compose (non-secrets only)
cat > .env <<'ENVEOF'
KMS_HTTPS_PORT=12001
AUTH_HTTP_PORT=18000
KMS_IMAGE=cr.kvin.wang/dstack-kms:latest
ETH_RPC_URL=https://sepolia.base.org
KMS_CONTRACT_ADDR=<KMS_CONTRACT_ADDR>
DSTACK_REPO=https://github.com/Phala-Network/dstack-cloud.git
DSTACK_REF=14963a2ccb0ec7bef8a496c1ac5ac40f5593145d
ENVEOF
# Whitelisted keys to import from .user-config (JSON).
# Plain Direct RPC needs none; add DD_* for the Datadog variant (see §6.1 note),
# or EXECUTION_RPC for the Light Client variant (§9).
ALLOWED=""
UC=/dstack/.host-shared/.user-config
if [ -f "$UC" ] && [ -n "$ALLOWED" ]; then
for key in $ALLOWED; do
val=$(jq -r --arg k "$key" '.[$k] // empty' "$UC" | tr -d '\n\r')
[ -n "$val" ] && printf '%s=%s\n' "$key" "$val" >> .env
done
fi
EOF

Replace <KMS_CONTRACT_ADDR> with the actual value. jq is included in the dstack OS rootfs.
dstack-cloud new initializes .user-config to {}. Plain Direct RPC needs no secrets — leave it {}.
Building .user-config: it's a plain JSON object. For one-off setup a heredoc with literal values is fine, but if any value comes from an existing env file or a secret manager, build it with jq -n instead — it handles JSON escaping correctly when values contain ", \, or /, and the --arg form reads the value from a shell variable, so the literal secret never lands in your shell history.

# values from a private env file, secret manager, password prompt, …
DD_API_KEY="<32-char hex from Datadog>"; DD_SITE="datadoghq.com"
jq -n --arg k "$DD_API_KEY" --arg s "$DD_SITE" '{
  DD_API_KEY: $k, DD_SITE: $s, DD_ENV: "production",
  DD_SERVICE: "dstack-kms", DD_TAGS: "env:production,service:dstack-kms"
}' > workshop-run/kms-prod/.user-config

jq . workshop-run/kms-prod/.user-config  # validate
Datadog: if you used docker-compose.direct.datadog.yaml in §6.1:

- In prelaunch.sh, change the allowlist:

ALLOWED="DD_API_KEY DD_SITE DD_ENV DD_SERVICE DD_TAGS"

- Write .user-config with the jq -n snippet above (or the literal heredoc form if you prefer):

cat > workshop-run/kms-prod/.user-config <<'EOF'
{
  "DD_API_KEY": "<32-char hex from Datadog>",
  "DD_SITE": "datadoghq.com",
  "DD_ENV": "production",
  "DD_SERVICE": "dstack-kms",
  "DD_TAGS": "env:production,service:dstack-kms"
}
EOF

DD_API_KEY must be exactly 32 alphanumeric characters. Some sources include a human prefix (e.g. pub<32hex>) — paste only the 32-char body, otherwise the agent retries forever and nothing reaches Datadog (silent failure, invisible from outside the CVM). Sanity-check before deploy:

curl -sw '%{http_code}\n' -X POST "https://api.${DD_SITE}/api/v2/series" \
  -H "DD-API-KEY: $DD_API_KEY" -H "Content-Type: application/json" \
  -d '{"series":[{"metric":"sanity.test","type":3,"points":[{"timestamp":'$(date +%s)',"value":1}],"resources":[{"name":"laptop","type":"host"}]}]}'
# 202 {"errors":[]} -> OK; 403 + format-error message -> bad key.
cd workshop-run/kms-prod
dstack-cloud deploy --delete

Example output:
=== GCP TDX VM Deployment ===
Project: your-project
Zone: us-central1-a
Instance: dstack-kms
...
=== Deployment Complete ===
Instance: dstack-kms
External IP: 35.188.xxx.xxx
# Open ports
dstack-cloud fw allow 12001
dstack-cloud fw allow 18000

After deployment, point your domain DNS to the External IP shown in the output.
Container startup takes approximately 1-2 minutes (image pull + auth-api compilation). You can monitor progress via serial port logs:
dstack-cloud logs

On first boot, the KMS starts an HTTP (not HTTPS) Onboard service on port 12001. Opening http://<KMS_DOMAIN>:12001/ in a browser shows an interactive UI that automatically displays Attestation Info (device_id, mr_aggregated, os_image_hash) — these are the real values needed for on-chain registration.
Pre-Bootstrap registration:
Onboard.Bootstrap calls bootAuth/kms, which checks the KMS contract's isKmsAllowed(). On a fresh contract (or any time the compose / prelaunch.sh content changes, including switching to the .datadog.yaml variant), the current mr_aggregated and device_id won't be on the allow-list and Bootstrap returns:

{"error": "KMS is not allowed to bootstrap: boot denied: Aggregated MR not allowed"}

Fetch the values and register them before calling Bootstrap:

curl -s "http://<KMS_DOMAIN>:12001/prpc/Onboard.GetAttestationInfo?json" | jq .

# In auth-eth/ with RPC_URL / PRIVATE_KEY / KMS_CONTRACT_ADDRESS exported
npx hardhat kms:add-image 0x<OS_IMAGE_HASH> --network custom   # idempotent if already registered
npx hardhat kms:add 0x<MR_AGGREGATED> --network custom
npx hardhat kms:add-device 0x<DEVICE_ID> --network custom

If kms:add-device reports nonce too low immediately after kms:add, retry once — the public RPC sometimes returns a stale nonce while the previous tx is propagating.
Wait for auth-api before calling Bootstrap. The KMS Onboard HTTP listener on 12001 comes up before auth-api finishes its first-run npm ci + compile (the long pole of the 1–2 minute boot). Onboard.Bootstrap calls auth-api internally; if it isn't reachable yet you'll get:

"error": "KMS is not allowed to bootstrap: failed to call KMS auth check: error sending request for url (http://auth-api:8000/bootAuth/kms): client error (Connect): tcp connect error: Connection refused (os error 111)"

One-liner readiness gate before Bootstrap:
until curl -sf --max-time 3 "http://<KMS_DOMAIN>:18000/" | jq -e '.status=="ok"' >/dev/null; do sleep 10; done
Call Bootstrap to generate keys:
curl -s "http://<KMS_DOMAIN>:12001/prpc/Onboard.Bootstrap?json" \
-d '{"domain": "<KMS_DOMAIN>"}' | jq .{
"ca_pubkey": "3059301306072a8648ce3d0201...",
"k256_pubkey": "03548465f50fca3aec29ec1569...",
"attestation": "0001017d040002008100..."
}
attestation is the TDX quote (because quote_enabled = true), containing the key fingerprint. It can be verified on-chain or off-chain to confirm the KMS is running in a trusted environment.
Complete initialization:
curl "http://<KMS_DOMAIN>:12001/finish"
# Returns "OK"/finish causes the Onboard service to exit(0), and docker-compose's restart: unless-stopped automatically restarts the container.
At this point, keys are written to the persistent volume. The KMS detects existing keys, skips Onboard, and starts the main service directly over HTTPS.
Verify auth-api:
curl -s "http://<KMS_DOMAIN>:18000/" | jq .{
"status": "ok",
"kmsContractAddr": "0xFaAD...4DBC",
"gatewayAppId": "",
"chainId": 84532,
"appAuthImplementation": "0x43ac...A578",
"appImplementation": "0x43ac...A578"
}

Verify KMS (note: HTTPS at this point):

curl -sk "https://<KMS_DOMAIN>:12001/prpc/GetMeta?json" -d '{}' | jq .

{
"ca_cert": "-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----\n",
"allow_any_upgrade": false,
"k256_pubkey": "02...",
"bootstrap_info": {
"ca_pubkey": "3059...",
"k256_pubkey": "03...",
"attestation": "0001..."
},
"is_dev": false,
"kms_contract_address": "0xFaAD...4DBC",
"chain_id": 84532,
"app_auth_implementation": "0x43ac...A578"
}
bootstrap_info contains the ca_pubkey, k256_pubkey, and TDX attestation returned during Bootstrap.
Datadog (only if you used the .datadog.yaml variant):
The KMS exposes Prometheus metrics on the same TLS port as the RPCs (/metrics):
curl -sk "https://<KMS_DOMAIN>:12001/metrics"# HELP dstack_kms_attestation_requests_total Total number of KMS attestation requests.
# TYPE dstack_kms_attestation_requests_total counter
dstack_kms_attestation_requests_total 0
# HELP dstack_kms_attestation_failures_total Total number of failed KMS attestation requests.
# TYPE dstack_kms_attestation_failures_total counter
dstack_kms_attestation_failures_total 0
These two counters increment when enclaves call GetAppKey / GetKmsKey / SignCert.
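A quick way to watch them move (sketch; just greps the two counters before and after a §8 key retrieval):

curl -sk "https://<KMS_DOMAIN>:12001/metrics" | grep '^dstack_kms_attestation'
# run ./get_keys.sh (Section 8), then repeat — requests_total should have incremented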
Serial logs only show app-compose.sh orchestration output (container stdout doesn't reach the serial console), so the most we can confirm host-side is that both containers started:
dstack-cloud logs --lines 400 | grep -E "Container dstack-(kms|datadog-agent)-1"

[ 41.747276] app-compose.sh[768]: Container dstack-datadog-agent-1 Starting
[ 41.934329] app-compose.sh[768]: Container dstack-datadog-agent-1 Started
(The compose project name on the CVM is dstack, not the directory name kms-prod — that's why containers are dstack-*-1.)
In Datadog, open Metrics Explorer and search for dstack_kms_attestation_requests_total; container logs land in Logs Explorer with service:dstack-kms. The openmetrics namespace is set to "" so metric names match upstream Prometheus exactly (setting namespace: dstack_kms would produce a duplicated dstack_kms.dstack_kms_* prefix). The template uses bridge networking rather than network_mode: host (avoids TDX iptables conflicts), so systemd-level metrics from the dstack guest agent on port 8090 are not collected — only KMS application metrics.
# If not already cloned (skip if done in step 4)
git clone https://github.com/Phala-Network/dstack-nitro-enclave-app.git --recurse-submodules
cd dstack-nitro-enclave-app
# Override variables as needed: REGION / INSTANCE_TYPE / KEY_NAME / KEY_PATH, etc.
REGION=us-east-1 \
INSTANCE_TYPE=c5.xlarge \
KEY_NAME=dstack-nitro-enclave-key \
KEY_PATH=./dstack-nitro-enclave-key.pem \
./deploy_host.sh

Note: If a Key Pair with the same name (dstack-nitro-enclave-key) already exists in AWS but you don't have the corresponding local PEM file, the script will fail. Delete the old key pair first:

aws ec2 delete-key-pair --region us-east-1 --key-name dstack-nitro-enclave-key
If you see Insufficient CPUs available in the pool (E22), first check for leftover enclaves occupying CPUs:
# Check for leftover enclaves (common when running multiple times on the same instance)
nitro-cli describe-enclaves
# If the output shows RUNNING enclaves, terminate them first
nitro-cli terminate-enclave --all

If the issue persists after terminating leftover enclaves, manually configure the allocator:
# Check available CPUs on this instance (enclave can use at most nproc - 1)
nproc
# Run on the EC2 instance (cpu_count must be less than nproc output)
sudo bash -c 'cat > /etc/nitro_enclaves/allocator.yaml <<YAML
---
memory_mib: 512
cpu_count: 2
YAML'
# Free memory fragments and restart
sudo bash -c 'sync; echo 3 > /proc/sys/vm/drop_caches; echo 1 > /proc/sys/vm/compact_memory'
sudo systemctl restart nitro-enclaves-allocator.service

After ./deploy_host.sh completes, it generates deployment.json:
{
"instance_id": "i-0324740db36bfeb08",
"public_ip": "18.207.xxx.xxx"
}

Prerequisite: A local Rust toolchain (with musl target) is required, as get_keys.sh compiles dstack-util from source.
get_keys.sh --show-mrs builds the EIF image on the EC2 host, computes sha256(pcr0 || pcr1 || pcr2), and prints the OS_IMAGE_HASH directly:
cd dstack-nitro-enclave-app
HOST=$(jq -r .public_ip deployment.json) \
KMS_URL="https://<KMS_DOMAIN>:12001" \
APP_ID="<APP_ID>" \
KEY_PATH=./dstack-nitro-enclave-key.pem \
./get_keys.sh --show-mrs

Example output:
PCR0: 1415501c7caeba0a7aea20f...
PCR1: 4b4d5b3661b3efc1292090...
PCR2: 33ae855210ea2ce171925831...
OS_IMAGE_HASH: 0x1078395c3151831924c255f7b7dec87b3f6bb3bf9db98fe17d43abfbe506407d
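If you want to cross-check the printed hash yourself, a sketch follows; it assumes the hash is sha256 over the raw (binary) concatenation of the three PCRs, so if get_keys.sh hashes the hex strings instead, adjust accordingly:

# Hypothetical cross-check of OS_IMAGE_HASH from the PCR values printed above
PCR0=<pcr0-hex>; PCR1=<pcr1-hex>; PCR2=<pcr2-hex>
printf '%s%s%s' "$PCR0" "$PCR1" "$PCR2" | xxd -r -p | sha256sum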
Register on-chain (run in the dstack-nitro-enclave-app/dstack/kms/auth-eth directory):
# Register OS image hash in the KMS contract
npx hardhat kms:add-image ${OS_IMAGE_HASH} --network custom
# Register compose hash in the APP contract (hash value is the same as OS image hash)
npx hardhat app:add-hash --app-id ${APP_ID} ${OS_IMAGE_HASH} --network custom

Important: Both KMS_URL and APP_ID are baked into the EIF image (via enclave_run_get_keys.sh). They affect PCR values and therefore OS_IMAGE_HASH. The values used for --show-mrs must be identical to those used in the actual key retrieval run (Section 8.2). If either value differs, the PCR measurements will not match the registered hash, causing a "Boot denied: OS image is not allowed" error.
cd dstack-nitro-enclave-app
HOST=$(jq -r .public_ip deployment.json) \
KMS_URL="https://<KMS_DOMAIN>:12001" \
APP_ID="<APP_ID>" \
KEY_PATH=./dstack-nitro-enclave-key.pem \
./get_keys.sh

Example output:
[local] Building dstack-util (musl)...
[local] Built .../dstack/target/x86_64-unknown-linux-musl/release/dstack-util
[local] Uploading dstack-util and get-keys scripts to host...
[remote] Starting forward proxy (squid) and vsock proxy bridge...
...
[enclave] run dstack-util get-keys
[enclave] dstack-util exit=0
[enclave] keys-bytes=2325
[enclave] sending keys to host vsock:9999
...
Saved app keys to .../app_keys.json (size: 2325 bytes)
Saved enclave console log to .../enclave_console.log
On success, the following files are generated locally:
- app_keys.json — contains ca_cert, disk_crypt_key, env_crypt_key, k256_key, etc.

# Verify app_keys.json structure
jq 'keys' app_keys.json

["ca_cert", "disk_crypt_key", "env_crypt_key", "gateway_app_id", "k256_key", "k256_signature", "key_provider"]

- enclave_console.log — enclave kernel boot log (only generated when DEBUG_ENCLAVE=1 is set; absent by default)
- ncat_keys.log
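To poke at the retrieved material, a small sketch (assumes ca_cert is a PEM-encoded certificate, as in the GetMeta output earlier):

# Inspect the CA certificate embedded in app_keys.json
jq -r .ca_cert app_keys.json | openssl x509 -noout -subject -issuer -dates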
The cr.kvin.wang/dstack-kms:latest image already includes the helios binary (see workshop/kms/builder/).
EXECUTION_RPC for helios is the same kind of endpoint as RPC_URL in §4 — use the provider URL you set up in §2.7. The KMS image already pulls helios from a pinned source; you only need to point it at an execution RPC that serves eth_getProof.
EXECUTION_RPC=https://base-sepolia.g.alchemy.com/v2/<YOUR_KEY>
Why not https://sepolia.base.org? Helios verifies every account read against the block's state root by calling eth_getProof (core/src/execution/providers/rpc.rs) — no "trusted" fallback. Since Base V1 activated on Sepolia on 2026-04-20 (see base/node#1035 and #980), the public endpoint runs base-reth-node with the historical-proofs ExEx disabled by default; eth_getProof returns 403 -32601 "rpc method is unsupported", which surfaces as auth-api 500s (missing revert data / CALL_EXCEPTION) and Onboard.Bootstrap failures (boot denied: ...). Same root cause as the §4.2 web3_clientVersion issue — different missing namespace, same workaround.
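A quick sanity check that your provider actually serves eth_getProof (sketch, mirroring the §2.7 check; any address works, here the zero address):

curl -s -X POST -H 'Content-Type: application/json' \
  --data '{"jsonrpc":"2.0","method":"eth_getProof","params":["0x0000000000000000000000000000000000000000",[],"latest"],"id":1}' \
  "$EXECUTION_RPC" | jq 'has("result")'
# true -> proofs available; an error mentioning "unsupported" -> switch provider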
# Use the light template from this repository
cp workshop/kms/docker-compose.light.yaml workshop-run/kms-prod/docker-compose.yaml

The light compose templates require EXECUTION_RPC in .env (compose refuses to start with a clear error otherwise). Because the Alchemy URL is a secret, put it in .user-config, and add EXECUTION_RPC to the prelaunch.sh allowlist from §6.2:
# prelaunch.sh — change the allowlist line:
ALLOWED="EXECUTION_RPC"
# (combine with Datadog: ALLOWED="EXECUTION_RPC DD_API_KEY DD_SITE DD_ENV DD_SERVICE DD_TAGS")

Build .user-config (see §6.2 for the JSON-construction note — jq -n from a shell variable keeps the literal key out of your shell history and JSON-escapes the URL safely):
EXECUTION_RPC="https://base-sepolia.g.alchemy.com/v2/<YOUR_ALCHEMY_KEY>"
jq -n --arg r "$EXECUTION_RPC" '{EXECUTION_RPC: $r}' \
  > workshop-run/kms-prod/.user-config

Or, if you already keep EXECUTION_RPC=... in a private env file (e.g. alchemy.env), turn it into the JSON shape in one line:
. ./alchemy.env # sets EXECUTION_RPC
jq -n --arg r "$EXECUTION_RPC" '{EXECUTION_RPC: $r}' \
  > workshop-run/kms-prod/.user-config

Datadog (optional): use workshop/kms/docker-compose.light.datadog.yaml instead. Apply the DD_* env-var addition from §6.2 and the verification block from §6.5.
To build a KMS image with helios yourself, see workshop/kms/builder/README.md.
cd workshop-run/kms-prod
dstack-cloud deploy --delete
# Ports: KMS + auth-api + helios (debugging)
dstack-cloud fw allow 12001
dstack-cloud fw allow 18000
dstack-cloud fw allow 18545

The Bootstrap process is the same as Section 6.4: wait for the containers to start, then call Onboard.Bootstrap + /finish. The §6.4 readiness gate (curl http://<KMS_DOMAIN>:18000/ returning "status":"ok") is the right wait here too — in Light Client mode it also covers helios initial sync. While helios is still catching up to a recent block, auth-api returns 500 with CALL_EXCEPTION / missing revert data (helios can't supply the state proof yet); a Bootstrap call during that window fails the same way. The gate stays red until both are ready.
Verify:
# helios RPC
curl -s -H 'Content-Type: application/json' \
--data '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}' \
"http://<KMS_DOMAIN>:18545"
# auth-api
curl -s "http://<KMS_DOMAIN>:18000/" | jq .
# KMS (HTTPS, after Bootstrap)
curl -sk "https://<KMS_DOMAIN>:12001/prpc/GetMeta?json" -d '{}' | jq .Nitro-side verification is the same as Chapter 8 — just keep the same KMS_URL.
This scenario requires two GCP TDX instances running simultaneously, with the source KMS having completed Bootstrap (Section 6.4).
Onboard allows a new KMS instance to replicate keys from a running source KMS, enabling multi-node shared identity (high availability / disaster recovery).
During Onboard, the new KMS requests keys from the source KMS via RA-TLS. The source KMS's auth-api verifies the new KMS's TDX attestation quote and calls the on-chain contract's isKmsAllowed() to check:
- OS image hash: Must be registered via npx hardhat kms:add-image
- Aggregated MR: Must be registered via npx hardhat kms:add
- Device ID: Must be registered via npx hardhat kms:add-device

Note: isKmsAllowed() performs exact matching on each field — there are no wildcards. Real values must be registered.
While the new KMS is in Onboard mode (HTTP), open http://<NEW_KMS_DOMAIN>:12001/ to see the Attestation Info, or retrieve it via RPC:
curl -s "http://<NEW_KMS_DOMAIN>:12001/prpc/Onboard.GetAttestationInfo?json" | jq .{
"device_id": "7c05db197ea451c8...",
"mr_aggregated": "77eea120a230044f...",
"os_image_hash": "182e89740db72378...",
"attestation_mode": "dstack-gcp-tdx"
}

Important: The device_id shown in serial port logs (dstack-util show) is a dummy value e3b0c442... (SHA256("")) and cannot be used for on-chain registration. You must obtain the real values from the GetAttestationInfo RPC or the Web UI.
Register the real values for on-chain authorization:
# Register KMS-specific on-chain authorizations (run in the auth-eth directory)
npx hardhat kms:add-image 0x<OS_IMAGE_HASH> --network custom
npx hardhat kms:add 0x<MR_AGGREGATED> --network custom
npx hardhat kms:add-device 0x<DEVICE_ID> --network custom

Assume a running KMS (source) is available at https://source-kms.example.com:12001.
- Deploy a second KMS instance (using docker-compose.direct.yaml or docker-compose.light.yaml; both use interactive Bootstrap)
- Instead of calling Bootstrap, call the Onboard RPC, specifying the source KMS URL and the new instance's domain:
curl -s "http://<NEW_KMS_DOMAIN>:12001/prpc/Onboard.Onboard?json" \
-d '{
"source_url": "https://source-kms.example.com:12001",
"domain": "<NEW_KMS_DOMAIN>"
}' | jq .

{}

An empty object {} indicates success. The new KMS has obtained ca_key, k256_key, and tmp_ca_key from the source KMS, and has generated its own RPC certificate.

If the source KMS is running in Light Client mode, auth-api checks isKmsAllowed() through its in-CVM helios. Helios needs to verify the block that contains the new KMS's just-registered mr_aggregated / device_id, which lags the public chain by up to a couple of minutes. During that window Onboard returns:

"Boot denied: missing revert data (action=\"call\", ..., code=CALL_EXCEPTION)"

Retry every 30s until {} comes back — no action needed, helios just has to catch up.
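A minimal retry loop for that wait, reusing the Onboard call above (sketch):

until curl -s "http://<NEW_KMS_DOMAIN>:12001/prpc/Onboard.Onboard?json" \
    -d '{"source_url": "https://source-kms.example.com:12001", "domain": "<NEW_KMS_DOMAIN>"}' \
    | jq -e 'length == 0' >/dev/null; do
  sleep 30
done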
- Complete initialization:
curl "http://<NEW_KMS_DOMAIN>:12001/finish"- Verify both KMS instances share the same identity:
# Source KMS
curl -sk "https://source-kms.example.com:12001/prpc/GetMeta?json" -d '{}' | jq .k256_pubkey
# New KMS
curl -sk "https://<NEW_KMS_DOMAIN>:12001/prpc/GetMeta?json" -d '{}' | jq .k256_pubkeyBoth should return the same k256_pubkey (shared identity). Different ca_cert values are expected — each instance generates its own RPC certificate.
Notes:
- During Onboard, the new KMS connects to the source KMS via RA-TLS (quote_enabled = true). Both instances must run in a dstack environment that supports TDX attestation.
- After a successful Onboard, the new KMS's bootstrap_info is null (only the source KMS retains the attestation from Bootstrap).
- DNS must be correct: the source_url domain is resolved by the new KMS from inside the GCP VM using public DNS — not your local /etc/hosts. If you redeployed the source KMS and its IP changed, you must update the DNS record before calling Onboard. A stale DNS record (e.g., pointing to the new KMS itself) will cause a TLS handshake failure: received corrupt message of type InvalidContentType. Alternatively, you can use the source KMS's IP address directly in source_url to avoid DNS-related issues.
# GCP
cd workshop-run/kms-prod
dstack-cloud stop
# Or remove completely
dstack-cloud remove

Deleting instance dstack-kms...
Deleting shared disk image dstack-kms-shared...
Instance removed.
Note: dstack-cloud remove deletes the instance and the shared-disk image, but not the firewall rules created via dstack-cloud fw allow. They'll be left as dstack-<instance>-allow-tcp-<port> and collide with future runs that reuse the instance name. Clean them up explicitly:

dstack-cloud fw remove 12001
dstack-cloud fw remove 18000
dstack-cloud fw remove 18545   # only if you ran §9 Light Client
# AWS
aws ec2 terminate-instances \
--instance-ids <INSTANCE_ID> \
  --region <REGION>

{
"TerminatingInstances": [{
"InstanceId": "i-0324...",
"CurrentState": { "Name": "shutting-down" },
"PreviousState": { "Name": "running" }
}]
}

Don't forget to clean up DNS records and AWS Key Pairs (if no longer needed).
{
  // Local search paths for OS images
  "image_search_paths": [
    "/path/to/your/images"
  ],
  "gcp": {
    "project": "your-gcp-project",        // GCP Project ID
    "zone": "us-central1-a",              // Availability zone
    "bucket": "gs://your-bucket-dstack"   // GCS Bucket (for storing deployment images)
  }
}