Audience: the operator standing up a brand-new cloud account to host AgentKeys for the first time, or porting the deployment to a new cloud provider (AliCloud, GCP, Tencent Cloud).
Scope: the per-account, run-once provisioning that has to happen before the broker host can come up (§§3–8 of this doc), followed by the per-broker OIDC federation activation (§9), broker host bring-up (§10), and tear-down (§11). Identifiers (DNS names, IAM principals, mail backend, object store, initial bucket policy) + runtime activation in one place.
FAQ + troubleshooting: wiki/cloud-setup-faq.md.
After this doc is run, the operator returns here ONLY when:
- Switching cloud providers (e.g. AWS → AliCloud)
- Adding a second AWS account (test instance, regional shard)
- Re-bootstrapping after a teardown
- Auditing the identity surface (the security-audit checklist in §7)
The day-to-day broker re-deploys live in §10 below (setup-broker-host.sh); they re-run that section without touching §§1–9.
Tight five-step flow. Explanation + per-step reasoning are in §1–§11 below; the same flow works for prod (no flag), the CI/test stack (--ci, alias --test, swaps in -test identifiers everywhere + targets the test broker EIP agentkeys-broker-eip-test), or any test-fleet slot N ≥ 2 (--ci --slot N, issue #265 — -test-N identifiers + EIP agentkeys-broker-eip-test-N; see §0.3). The orchestrator scripts/setup-cloud.sh is idempotent — re-running is safe. Prod and every test broker are SEPARATE EC2 machines with SEPARATE EIPs — --ci (+ --slot N) is what keeps them apart; never mix the flags.
For each stack (prod and test) you stand up SEPARATELY:
- Launch an EC2 — t3.small minimum (Ubuntu 22.04 LTS recommended). `t3.micro` runs the OS but its 1 GB RAM gets OOM-killed compiling `aws-sdk-s3` during `setup-broker-host.sh`. If you already have a t3.micro you can resize: `aws ec2 stop-instances` → `modify-instance-attribute --instance-type t3.small` → `start-instances` (EIP stays attached, INSTANCE_ID unchanged).
- Allocate an EIP (or reuse one) and attach it to the EC2.
- Open SG ports 22 (SSH), 80 (certbot HTTP-01 challenge), 443 (TLS) to `0.0.0.0/0`. All three are required — port 80 is needed for Let's Encrypt to validate domain ownership during cert issuance (step 5b), even though steady-state traffic only flows over 443. Verify with `aws ec2 describe-security-groups --group-ids <sg-id> --query 'SecurityGroups[].IpPermissions[].[FromPort,IpRanges[].CidrIp]'` — you should see all three ports.
- Generate or import an SSH key pair (the `.pem` you'll keep as the fallback when EC2 Instance Connect is down). Confirm SSH works: `ssh -i your.pem ubuntu@<EIP>`.
- The default `ubuntu` user is enough for now — the `agentkey` SSH login user (used by EC2 Instance Connect later) is created automatically by `setup-broker-host.sh` in step 5, along with the `ec2-instance-connect` package.
- Note INSTANCE_ID + EIP — both go into the env files in step 2.
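The security-group requirement in the list above can be checked mechanically. This is a minimal sketch, assuming you have already collected the open ingress ports (for example from the `describe-security-groups` query) into a plain argument list; `missing_ports` is a hypothetical helper, not something the repo ships.

```shell
# Print each required port (22, 80, 443) that is absent from the given list.
# Arguments: the ingress ports currently open in the security group.
missing_ports() (
  for required in 22 80 443; do
    found=no
    for open in "$@"; do
      [ "$open" = "$required" ] && found=yes
    done
    [ "$found" = yes ] || printf '%s\n' "$required"
  done
)

# Example: a group that only opens SSH and HTTPS is missing port 80.
missing_ports 22 443   # prints 80
```

Empty output means all three ports are present; any printed port needs an ingress rule before cert issuance can succeed.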
Two files per environment: {operator-workstation, broker} × {prod, test, test-2, …, test-N}. The operator-workstation files carry account-wide identifiers; the broker files carry per-machine identifiers (INSTANCE_ID + EIP). The base 2×2 (prod + test) ships in-repo; each additional test-fleet slot adds its operator-workstation.test-N.env + broker.test-N.env pair (copy from the slot-2 files — see §0.3).
Both operator-workstation files are pre-populated with litentry.org / account 429071895007 defaults, and every derived value uses bash ${VAR} substitution off of ACCOUNT_ID / BROKER_HOST / ZONE. The script writes 2 values back automatically — operator never hand-edits them:
- `EIP=…` persisted to broker env file by step 4 (after allocate-or-adopt)
- `DATA_ROLE_ARN=…` persisted to operator env file by step 11 (after data role create)
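To illustrate the `${VAR}` substitution pattern, here is a minimal sketch of how a few derived keys could fall out of the base keys. The derivations mirror the prod defaults listed in this doc; the function name `derive_prod_env`, the example account ID, and the example zone are assumptions, and the shipped env file may lay things out differently.

```shell
# Derive a few prod identifiers from ACCOUNT_ID and ZONE, the way an env
# file does with ${VAR} substitution. Prints KEY=value lines.
derive_prod_env() (
  ACCOUNT_ID=$1
  ZONE=$2
  BROKER_HOST="broker.${ZONE}"
  MAIL_DOMAIN="bots.${ZONE}"
  VAULT_BUCKET="agentkeys-vault-${ACCOUNT_ID}"
  BROKER_EMAIL_FROM_ADDRESS="noreply@${MAIL_DOMAIN}"
  printf 'BROKER_HOST=%s\n' "$BROKER_HOST"
  printf 'MAIL_DOMAIN=%s\n' "$MAIL_DOMAIN"
  printf 'VAULT_BUCKET=%s\n' "$VAULT_BUCKET"
  printf 'BROKER_EMAIL_FROM_ADDRESS=%s\n' "$BROKER_EMAIL_FROM_ADDRESS"
)

# Placeholder account ID and zone, not the real defaults.
derive_prod_env 123456789012 example.org
```

Because every value hangs off the base keys, a fork only has to change those keys for the rest to follow.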
| File | Operator edits | What to set |
|---|---|---|
| `scripts/operator-workstation.env` | None if your account is litentry.org / 429071895007. 5 keys if you're forking: ACCOUNT_ID, BROKER_HOST, ZONE, PARENT_ZONE_ID, MAIL_DOMAIN (the other ~20 keys all derive). | account-wide identifiers |
| `scripts/operator-workstation.test.env` | None in the same case. Same 5 keys (or just ZONE + PARENT_ZONE_ID) for a fork. | -test variants pre-derived |
| `scripts/broker.env` | `INSTANCE_ID=i-…` | EIP is written by the script |
| `scripts/broker.test.env` | `INSTANCE_ID=i-…` | EIP is written by the script |
In practice: paste INSTANCE_ID into the two broker env files. Done.
awsp agentkeys-admin
# Prod stack (no env flag):
bash scripts/setup-cloud.sh --yes
# CI/test stack — --ci (alias --test) auto-selects scripts/operator-workstation.test.env
# + scripts/broker.test.env, suffixes IAM identifiers with -test, and targets the
# test broker EIP (tag agentkeys-broker-eip-test):
bash scripts/setup-cloud.sh --ci --yes
# Test-fleet slot N >= 2 (issue #265) — --slot N selects the slot's env files
# (operator-workstation.test-N.env + broker.test-N.env), suffixes every
# identifier with -test-N, and targets EIP tag agentkeys-broker-eip-test-N:
bash scripts/setup-cloud.sh --ci --slot 2 --yes
# Base prod stack (#282 dual-stack, §0.4) — --base selects
# operator-workstation.base.env + broker.base.env, suffixes every identifier
# with -base, and targets EIP tag agentkeys-broker-eip-base:
bash scripts/setup-cloud.sh --base --yes
The orchestrator walks 15 idempotent steps (cloud-side AWS resources + IAM users + per-data-class roles + bucket policies + DNS UPSERTs). Steps 10 (agentkeys-daemon[-test]) and 12 (agentkeys-broker[-test]) print access keys to copy off — they're shown ONCE.
Append the two access-key blocks from step 3 to ~/.aws/credentials:
[agentkeys-daemon-test]
aws_access_key_id = AKIA…
aws_secret_access_key = …
region = us-east-1
[agentkeys-broker-test]
aws_access_key_id = AKIA…
aws_secret_access_key = …
region = us-east-1
(Drop the -test suffix for the prod variants. Account-owner agentkeys-admin is shared — no -test variant.)
Add to ~/.zshenv (works in zsh + bash):
export AGENTKEYS_REPO="$HOME/Projects/agentKeys"
alias ssh-agentkeys='bash $AGENTKEYS_REPO/scripts/ssh-broker.sh prod'
alias ssh-agentkeys-test='bash $AGENTKEYS_REPO/scripts/ssh-broker.sh test'
alias ssh-agentkeys-fallback='bash $AGENTKEYS_REPO/scripts/ssh-broker.sh prod --fallback'
alias ssh-agentkeys-test-fallback='bash $AGENTKEYS_REPO/scripts/ssh-broker.sh test --fallback'
# One pair per additional test-fleet slot (§0.3) — e.g. slot 2:
alias ssh-agentkeys-test-2='bash $AGENTKEYS_REPO/scripts/ssh-broker.sh test-2'
alias ssh-agentkeys-test-2-fallback='bash $AGENTKEYS_REPO/scripts/ssh-broker.sh test-2 --fallback'
Then run `source ~/.zshenv`. The fallback aliases use the .pem key + ubuntu user; the non-fallback ones use EC2 Instance Connect + the agentkey user (which comes online in step 5).
First-time SSH: use the fallback path (the agentkey user doesn't exist yet — setup-broker-host.sh creates it):
ssh-agentkeys-test-fallback # ssh -i ~/.ssh/your.pem ubuntu@<test EIP>
# On the EC2 (~10-15 min on t3.small):
git clone https://github.com/litentry/agentKeys.git
cd agentKeys
sudo bash scripts/setup-broker-host.sh --ci --yes
Two flags. --ci (alias --test) triggers the -test suffix on every derived hostname / bucket / email; --issuer-url + --account-id auto-derive from ZONE + ACCOUNT_ID in scripts/operator-workstation.env (which the repo clone ships with). Override any flag explicitly if you need a non-conventional name. For prod, drop --ci:
sudo bash scripts/setup-broker-host.sh --yes
What --ci derives automatically:
- `signer-test.${ZONE}`, `audit-test.${ZONE}`, `email-test.${ZONE}`, `cred-test.${ZONE}`, `memory-test.${ZONE}`, `config-test.${ZONE}`
- `agentkeys-vault-test-${ACCOUNT_ID}`, `agentkeys-memory-test-${ACCOUNT_ID}`, `agentkeys-config-test-${ACCOUNT_ID}`
- `noreply-test@bots-test.${ZONE}`
- `https://test-broker.${ZONE}` for the OIDC issuer URL
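The hostname part of that derivation can be sketched as follows. This assumes the suffix is inserted between the service name and the zone, exactly as in the list above; `service_hosts` is a hypothetical helper and not a function from `setup-broker-host.sh`.

```shell
# Print the per-service hostnames for a stack.
# $1 = zone, $2 = suffix ("" for prod, "-test" for --ci).
service_hosts() (
  zone=$1
  suffix=$2
  for svc in signer audit email cred memory config; do
    printf '%s%s.%s\n' "$svc" "$suffix" "$zone"
  done
)

# Test stack (what --ci would produce for the placeholder zone example.org):
service_hosts example.org -test
# Prod stack: same call with an empty suffix.
service_hosts example.org ""
```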
When the script finishes (~10-15 min on t3.small cold; ~30-60s on re-runs), it does three things at the end so steady-state operator work is one keystroke from your laptop:
- Creates the `agentkey` SSH login user (separate from the `agentkeys` daemon system user).
- Installs `ec2-instance-connect` + writes the sshd `AuthorizedKeysCommand` config so EC2 Instance Connect can push ephemeral keys to `agentkey`.
- Relocates the repo `/home/ubuntu/agentKeys` → `/home/agentkey/agentKeys` (chowned to `agentkey`) so re-runs + ongoing edits happen as the steady-state user.
Then exit the ubuntu session and reconnect as agentkey for everything from here on:
exit # leave the ubuntu fallback session
ssh-agentkeys-test # Instance Connect, no .pem needed
cd ~/agentKeys # → /home/agentkey/agentKeys, files visible
Subsequent re-runs (`git pull` + `sudo bash scripts/setup-broker-host.sh --ci --yes`) happen from /home/agentkey/agentKeys — step 10's relocation is idempotent (existence check skips when already in place). The cargo build cache survives the move (it's inside target/). The Rust toolchain is KEPT across runs by default so re-deploys skip the slow rustup + crate-registry re-download (and sccache caches the compilations) — a no-source-change re-run drops to ~30-60s. Pass --reclaim-toolchain on a final deploy to delete /root/.cargo + /root/.rustup and free ~1.5 GB.
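The "existence check skips when already in place" behaviour can be sketched like this. It is an illustration of the pattern only: `relocate_repo` is a hypothetical name, and the sketch omits the `chown` that the real script performs.

```shell
# Move the repo to its steady-state location unless it is already there.
# $1 = source dir, $2 = destination dir. Prints what it did.
relocate_repo() (
  src=$1
  dst=$2
  if [ -d "$dst" ]; then
    echo "skip: $dst already in place"
  elif [ -d "$src" ]; then
    mv "$src" "$dst"
    echo "moved: $src -> $dst"
  else
    echo "error: neither $src nor $dst exists" >&2
    exit 1
  fi
)

# Demo in a scratch directory: the first call moves, the second is a no-op.
tmp=$(mktemp -d)
mkdir -p "$tmp/ubuntu/agentKeys" "$tmp/agentkey"
relocate_repo "$tmp/ubuntu/agentKeys" "$tmp/agentkey/agentKeys"
relocate_repo "$tmp/ubuntu/agentKeys" "$tmp/agentkey/agentKeys"
rm -rf "$tmp"
```

Checking the destination first is what makes a re-run safe: once the repo has moved, the source no longer exists and the function never reaches the `mv`.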
For prod, the same flow applies — drop --test everywhere and the relocation moves the repo from whichever home dir you bootstrapped in to /home/agentkey/.
Optional: install rustup for the agentkey user (dev-loop cargo). If you want to run cargo clippy / cargo test interactively as agentkey (e.g., to mirror the CI Linux env locally and catch cfg(target_os = "linux") clippy lints that don't fire on macOS), install rustup under your own $HOME once after reconnecting as agentkey:
ssh-agentkeys-test
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs \
| sh -s -- -y --default-toolchain stable --profile minimal
source "$HOME/.cargo/env"
echo 'source "$HOME/.cargo/env"' >> ~/.bashrc # persist for future sessions
cd ~/agentKeys
cargo --version # resolves the repo's rust-toolchain.toml pin — the exact CI toolchain
cargo clippy --workspace --all-targets -- -D warnings # same toolchain + lint set as CI
This is optional; the broker itself runs from compiled binaries, not from a live toolchain. Operators who only manage the deployed broker (no compile-in-place dev work) can skip this.
setup-broker-host.sh (step 7b) auto-issues the Let's Encrypt cert for every co-located vhost whose DNS already resolves to this host, then flips nginx onto :443 — no manual certbot step. Issuance is DNS-dependent (HTTP-01 validates against the hostname's public A record), so a host whose A record isn't live yet is skipped and left HTTP-only (503 "TLS cert not yet issued"; the OIDC federation step in ci-setup.md §1 also needs the broker cert). The order: DNS first (setup-cloud.sh), then this script — a host added to DNS later (e.g. the classify worker, #207) just needs a re-run.
Symptom of a skipped host:
- `curl https://<host>/healthz` → `SSL: no alternative certificate subject name matches target host name` — nginx served the broker's default cert because `<host>` has no `:443` block. Fix: repair the A record, then re-run the script on the host.
Re-run to converge any missing certs:
# On the broker host (ssh-agentkeys; `curl -s ifconfig.me` must print the broker EIP):
cd ~/agentKeys
sudo bash scripts/setup-broker-host.sh --ref main # prod (--ci --ref main = test broker)
# Optional LE registration email: --certbot-email ops@litentry.org (default: none)
The hostname env vars come from /etc/agentkeys/broker.env (which setup-broker-host.sh wrote at step 5). For test: BROKER_HOST=test-broker.${ZONE}, SIGNER_HOST=signer-test.${ZONE}, etc. For prod: drop the -test suffix.
Manual fallback — issue one cert by hand (step 7b failed or a one-off)
⚠️ Run ON THE BROKER HOST (`ssh-agentkeys`; `curl -s ifconfig.me` must print the broker EIP). The CA validates against the hostname's public IP, so certbot run on a laptop — especially behind a VPN that rewrites `${ZONE}` — writes the challenge to the wrong machine and the CA gets a 404.
# PRE-CHECK (avoids burning LE's rate-limited attempts): the RUNNING nginx must
# serve the ACME path for <host> — `nginx -T` showing the vhost is not enough.
sudo nginx -t && sudo systemctl reload nginx
sudo mkdir -p /var/www/certbot/.well-known/acme-challenge
echo probe-ok | sudo tee /var/www/certbot/.well-known/acme-challenge/probe >/dev/null
curl -s http://localhost/.well-known/acme-challenge/probe -H "Host: <host>" # → probe-ok (404 ⇒ reload)
sudo rm -f /var/www/certbot/.well-known/acme-challenge/probe
sudo certbot certonly --webroot -w /var/www/certbot -d <host> \
--agree-tos --register-unsafely-without-email --non-interactive
# Flip nginx Phase A → B (the renderer picks B when /etc/letsencrypt/live/<host>/ exists):
cd ~/agentKeys
sudo bash scripts/setup-broker-host.sh --ref main # or --ci --ref main (test broker)
Verify the cert is live (bypass laptop DNS, which may be rewritten by WARP / Zscaler / Tailscale to 198.18.x.y for ${ZONE}):
# DoH lookup — proves Route 53 has the right EIP, not your laptop's local resolver
curl -sS "https://dns.google/resolve?name=${BROKER_HOST}&type=A" | jq -r '.Answer[].data'
# → should be your EIP, not 198.18.x.y
# TLS handshake against the real EIP:
echo | openssl s_client -servername "${BROKER_HOST}" -connect "$(curl -sS "https://dns.google/resolve?name=${BROKER_HOST}&type=A" | jq -r '.Answer[0].data'):443" 2>&1 \
| grep -E "subject="
# → subject=/CN=<your-BROKER_HOST>
If `openssl s_client` returns `no peer certificate available`, certbot didn't finish or nginx isn't on Phase B yet. Check:
- `sudo ls /etc/letsencrypt/live/` — should list all 8 hostnames as subdirs (broker + signer + 6 workers: audit/email/cred/memory/config/classify).
- `sudo ss -tlnp | grep ':443'` — nginx should be on `0.0.0.0:443`.
- `sudo tail /var/log/letsencrypt/letsencrypt.log` for the actual certbot failure.
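The first check can be automated instead of eyeballing the `ls` output. A minimal sketch, assuming certbot's usual one-directory-per-hostname layout under the live directory; `missing_certs` is a hypothetical helper and the hostnames are placeholders.

```shell
# List hostnames that have no certificate directory under the live dir.
# $1 = live dir (normally /etc/letsencrypt/live), remaining args = hostnames.
missing_certs() (
  live=$1
  shift
  for host in "$@"; do
    [ -d "$live/$host" ] || printf '%s\n' "$host"
  done
)

# Demo against a scratch directory that only has the broker's cert dir.
live=$(mktemp -d)
mkdir "$live/broker.example.org"
missing_certs "$live" broker.example.org signer.example.org   # prints signer.example.org
rm -rf "$live"
```

On a real host the live directory is typically readable by root only, so the function would need to run in a root shell.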
Common failures + fixes:
- `Connection timeout to … port 80` — the SG is missing port 80 ingress. Re-check step 1's SG requirements (you need 22, 80, and 443).
- `DNS problem: NXDOMAIN` — Route 53 doesn't have the A record yet, or DNS hasn't propagated. Wait 1-2 min, then retry. Quick check: `curl -sS "https://dns.google/resolve?name=<host>&type=A"` (do NOT rely on `dig` — local resolver may be lying).
- `No such file or directory: /var/www/certbot` — Phase A nginx render didn't complete; re-run `sudo bash scripts/setup-broker-host.sh --test --yes` first.
- A worker's cert fails but the broker's works (or the CA hits an IP that isn't your broker) — the worker A records point at a different EIP than the broker. Workers co-locate with the broker, so every worker A record MUST equal `broker.${ZONE}`'s. This bit us when the account had both a prod and a test broker EIP and the worker records were pointed at the test EIP while `broker`/`signer` stayed on prod. Check via DoH (never laptop DNS — a VPN rewrites it): `for h in broker audit email cred memory config classify; do echo "$h → $(curl -s "https://dns.google/resolve?name=$h.${ZONE}&type=A" | jq -r '.Answer[0].data')"; done`. If any worker differs from `broker`, re-run `bash scripts/dns-upsert-workers.sh` from your laptop (`agentkeys-admin`) — it derives the EIP from `broker.${ZONE}`'s own A record, so all six workers mirror the broker (pass `--eip <broker-EIP>` to be explicit).
- `unauthorized` / `Invalid response … /.well-known/acme-challenge/… : 404` — the CA reached the host but couldn't fetch the challenge file. Two causes: (1) certbot ran on the wrong machine — `--webroot` wrote the challenge to that box's `/var/www/certbot`, but the CA validates against the hostname's public IP (the broker). Run certbot ON THE BROKER (see the ⚠️ above). (2) a newly-added worker's vhost isn't live — `nginx -T` shows it on disk but the running process wasn't reloaded; `sudo nginx -t && sudo systemctl reload nginx`, then re-run the PRE-CHECK probe. Confirm locally on the broker (not a VPN'd laptop): `echo ok | sudo tee /var/www/certbot/.well-known/acme-challenge/probe >/dev/null && curl -s http://localhost/.well-known/acme-challenge/probe -H "Host: <host>"` must print `ok`.
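For the worker/broker mismatch case, the comparison itself is easy to script once the DoH answers are in hand. A minimal sketch, assuming the lookups have been reduced to `host ip` pairs on stdin; `mismatched_workers` is a hypothetical helper and the addresses are documentation-range placeholders.

```shell
# Read "host ip" pairs on stdin and print every host whose IP differs
# from the broker's. $1 = the broker's IP (its own A record).
mismatched_workers() (
  broker_ip=$1
  while read -r host ip; do
    [ -n "$host" ] || continue
    [ "$ip" = "$broker_ip" ] || printf '%s\n' "$host"
  done
)

# Example: cred points at a different address than the broker.
printf '%s\n' 'audit 203.0.113.10' 'email 203.0.113.10' 'cred 198.51.100.7' |
  mismatched_workers 203.0.113.10   # prints cred
```

Empty output means every worker mirrors the broker; any printed host is a candidate for the `dns-upsert-workers.sh` re-run.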
The rest of this doc explains why each step exists and how to recover from failures. Operators following the quick start above can skip to docs/chain-setup.md once step 5b completes.
§1 Identities — four IAM principals; concept first, then provider commands
§2 Domain + DNS — subdomain ownership; parent-zone confirmation
§3 Email backend — SES domain identity + receipt rule + S3 inbound bucket
§4 IAM users + roles — agentkeys-{admin,broker,daemon} + agentkeys-data-role
§5 Bucket policy — static-IAM variant (pre-OIDC; replaced in §9 below)
§6 Instance profile — agentkeys-broker-host (optional, EC2-only)
§7 Security audit — strip legacy over-broad attached policies
§8 Cloud portability — AWS → AliCloud / GCP / Tencent Cloud mapping
§9 OIDC federation — per-broker security upgrade after broker is reachable
§10 Broker host — what setup-broker-host.sh does
§11 Cleanup — full account teardown
Surgical re-run of any single step: bash scripts/setup-cloud.sh --only-step N (with --test for test).
Two env files per environment — {operator, broker} × {prod, test, test-N…} (the test fleet adds a pair per slot, §0.3). The GitHub Actions runner doesn't get its own file — it materializes the operator-workstation env inline at job start from TEST_* secrets.
| File | Lives on | Scope | Sourced by |
|---|---|---|---|
| `scripts/operator-workstation.env` | operator laptop | prod | every helper script + setup-cloud.sh + setup-heima.sh + harness/run.sh |
| `scripts/operator-workstation.test.env` | operator laptop | test (slot 1) | same scripts, via --env-file <path> or --ci |
| `scripts/operator-workstation.test-2.env` | operator laptop | test slot 2 (#265; template for slots ≥ 2) | same scripts, via --ci --slot 2 |
| `scripts/broker.env` | prod broker host at `/etc/agentkeys/broker.env` | prod | the broker process at boot (also setup-broker-host.sh writes equivalent systemd Environment= lines) |
| `scripts/broker.test.env` | test broker host at `/etc/agentkeys/broker.env` | test (slot 1) | same |
| `scripts/broker.test-2.env` | slot-2 test broker host | test slot 2 | same |
| GitHub Actions runner | ephemeral runner per job | test | harness-ci.yml writes scripts/operator-workstation.env inline from TEST_* secrets (see docs/ci-setup.md §7) |
| Variable | Prod | Test | Purpose |
|---|---|---|---|
| `ACCOUNT_ID` | `429071895007` | `429071895007` (same) | every cloud step |
| `REGION` | `us-east-1` | `us-east-1` | regional API calls |
| `ZONE` | `litentry.org` | `litentry.org` (same) | parent DNS zone |
| `PARENT_ZONE_ID` | Route 53 zone ID | same | DNS UPSERTs |
| `BROKER_HOST` | `broker.${ZONE}` | `test-broker.${ZONE}` | OIDC issuer hostname (byte-for-byte distinct → distinct IAM OIDC provider ARN) |
| `MAIL_DOMAIN` | `bots.${ZONE}` | `bots-test.${ZONE}` | SES inbound subdomain |
| `BUCKET` / `MAIL_BUCKET` | `agentkeys-mail-${ACCT}` | `agentkeys-mail-test-${ACCT}` | inbound mail bucket |
| `VAULT_BUCKET` | `agentkeys-vault-${ACCT}` | `agentkeys-vault-test-${ACCT}` | credentials bucket (arch.md §17) |
| `MEMORY_BUCKET` | `agentkeys-memory-${ACCT}` | `agentkeys-memory-test-${ACCT}` | memory bucket |
| `DATA_ROLE_ARN` | `…:role/agentkeys-data-role` | `…:role/agentkeys-data-role-test` | OIDC-federated data role |
| `VAULT_ROLE_ARN` | `…:role/agentkeys-vault-role` | `…:role/agentkeys-vault-role-test` | per-data-class vault role |
| `MEMORY_ROLE_ARN` | `…:role/agentkeys-memory-role` | `…:role/agentkeys-memory-role-test` | per-data-class memory role |
| `OIDC_PROVIDER_ARN` | `…:oidc-provider/${BROKER_HOST}` | `…:oidc-provider/test-broker.${ZONE}` | derived from `BROKER_HOST` |
| `SIGNER_HOST` + worker hosts | `signer.${ZONE}` etc. | `signer-test.${ZONE}` etc. | per-service public hostnames |
| `BROKER_EMAIL_FROM_ADDRESS` | `noreply@bots.${ZONE}` | `noreply-test@bots-test.${ZONE}` | SES verified sender |
| Heima contract `*_HEIMA` addresses | one set | a DIFFERENT set (same chain, different deployer key) | per-deploy pinned addresses |
| Variable | Prod | Test |
|---|---|---|
| `ACCOUNT_ID` | same | same |
| `BROKER_DATA_ROLE_ARN` | `…:role/agentkeys-data-role` | `…:role/agentkeys-data-role-test` |
| `BROKER_AWS_REGION` | `us-east-1` | `us-east-1` |
| `BROKER_OIDC_ISSUER` | `https://broker.${ZONE}` | `https://test-broker.${ZONE}` |
| `BROKER_OIDC_KEYPAIR_PATH` | `/home/ubuntu/.agentkeys/broker/oidc-keypair.json` | same |
| `BROKER_SESSION_KEYPAIR_PATH` | `/home/ubuntu/.agentkeys/broker/session-keypair.json` | same |
| `BROKER_AUTH_METHODS` | `wallet_sig,email_link` | same |
| `BROKER_AUDIT_ANCHORS` | `sqlite` | same |
| `BROKER_EMAIL_SENDER` | `ses` | `ses` |
| `BROKER_EMAIL_FROM_ADDRESS` | `noreply@bots.${ZONE}` | `noreply-test@bots-test.${ZONE}` |
The broker process never reads operator-workstation env vars directly — separation prevents a laptop value from silently shadowing the broker's own config (per scripts/broker.env header comment).
The runner doesn't ship with a checked-in env file. harness-ci.yml writes one inline at job start, mapping TEST_* repo secrets into scripts/operator-workstation.env:
| TEST secret | Maps to operator var |
|---|---|
| `TEST_ACCOUNT_ID` | `ACCOUNT_ID` |
| `TEST_AWS_REGION` | `REGION` |
| `TEST_BROKER_HOST` | `BROKER_HOST` |
| `TEST_VAULT_BUCKET` / `TEST_MEMORY_BUCKET` | `VAULT_BUCKET` / `MEMORY_BUCKET` |
| `TEST_DATA_ROLE_ARN` / `TEST_VAULT_ROLE_ARN` / `TEST_MEMORY_ROLE_ARN` | `DATA_ROLE_ARN` / `VAULT_ROLE_ARN` / `MEMORY_ROLE_ARN` |
| `TEST_HEIMA_DEPLOYER_KEY` | written to `~/.agentkeys/heima-deployer.key` |
| `TEST_*_HEIMA` contract addresses | `*_HEIMA` |
| `TEST_OIDC_AWS_ROLE_ARN` | the GH Actions OIDC role (gate; not a runtime var) |
Full list + activation flow: docs/ci-setup.md §7. setup-cloud.sh validates required keys at step 2 and dies with a precise pointer if missing.
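The inline materialization can be pictured with a small sketch. It covers only three of the mappings from the table and is not a copy of `harness-ci.yml`; the function name `write_env_from_secrets` and the sample values are assumptions.

```shell
# Write an operator env file from TEST_* variables, failing on the first
# one that is unset or empty. $1 = output path.
write_env_from_secrets() (
  out=$1
  : "${TEST_ACCOUNT_ID:?TEST_ACCOUNT_ID is not set}"
  : "${TEST_AWS_REGION:?TEST_AWS_REGION is not set}"
  : "${TEST_BROKER_HOST:?TEST_BROKER_HOST is not set}"
  {
    printf 'ACCOUNT_ID=%s\n' "$TEST_ACCOUNT_ID"
    printf 'REGION=%s\n' "$TEST_AWS_REGION"
    printf 'BROKER_HOST=%s\n' "$TEST_BROKER_HOST"
  } > "$out"
)

# Demo with placeholder values standing in for the repo secrets.
TEST_ACCOUNT_ID=123456789012
TEST_AWS_REGION=us-east-1
TEST_BROKER_HOST=test-broker.example.org
env_file=$(mktemp)
write_env_from_secrets "$env_file" && cat "$env_file"
rm -f "$env_file"
```

Checking every variable before opening the output file means a missing secret leaves no half-written env file behind.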
setup-cloud.sh consumes already-existing identifiers — it does NOT register your domain, create a Route 53 hosted zone, or launch the EC2. Those are operator decisions (instance type, region, key pair, DNS provider choice) and don't belong in an automated script. Three manual prereqs before the orchestrator works:
You own a domain (e.g. litentry.org). If not, register one with any registrar (Namecheap, GoDaddy, Route 53 Domains, etc.) — fully manual, out of scope here.
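Create a Route 53 hosted zone for the domain. Skip this if the zone already exists: idempotency is keyed on the caller reference, and the command below generates a new one on every run, so re-running it would create a duplicate zone with the same name: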
aws route53 create-hosted-zone \
--name "$ZONE" \
--caller-reference "agentkeys-$(date +%s)"
Look up the zone ID (strip the /hostedzone/ prefix):
aws route53 list-hosted-zones \
--query 'HostedZones[?Name==`'"$ZONE"'.`].Id' --output text \
| awk -F/ '{print $NF}'
# → Z09723983CFJOHAE3VC65
Paste it into operator-workstation.env as PARENT_ZONE_ID=Z….
Delegation: Route 53 outputs 4 NS records when you create the zone (visible via aws route53 get-hosted-zone --id $PARENT_ZONE_ID --query 'DelegationSet.NameServers'). Copy them into your registrar's DNS settings as the authoritative nameservers. Verify after propagation (usually <1h):
dig +short NS "$ZONE"
# Should return 4 ns-XX.awsdns-YY.{com,net,org,co.uk} entries.
If dig returns the registrar's default nameservers instead, delegation hasn't propagated. All downstream DNS UPSERTs in §6 will silently miss until it does.
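The delegation check can be turned into a yes/no gate. A minimal sketch, assuming the `ns-XX.awsdns-YY` naming shown above is a sufficient signal; `delegated_to_route53` is a hypothetical helper and the nameserver names are made up for the demo.

```shell
# Read nameserver names on stdin (one per line, as `dig +short NS` prints
# them) and succeed only if there is at least one and all are awsdns hosts.
delegated_to_route53() (
  count=0
  while read -r ns; do
    [ -n "$ns" ] || continue
    case "$ns" in
      ns-*.awsdns-*) count=$((count + 1)) ;;
      *) exit 1 ;;
    esac
  done
  [ "$count" -gt 0 ]
)

# Demo with canned input instead of a live lookup.
if printf '%s\n' 'ns-1.awsdns-01.com.' 'ns-2.awsdns-02.net.' | delegated_to_route53; then
  echo "delegated"
else
  echo "not delegated yet"
fi
```

Against live DNS the input would come from `dig +short NS "$ZONE"` piped into the function.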
Non-Route 53 DNS providers: setup-cloud.sh step 6 hardcodes Route 53 API calls. To use Cloudflare / DigitalOcean / etc., skip step 6 (--to-step 5) and replicate the same 12 records manually — see §6 below for the canonical record set. Test isolation works identically: a test-broker.${ZONE} A record under any DNS provider is the same byte-for-byte trust scope as under Route 53.
setup-broker-host.sh runs on any Linux box with sudo, systemd, public-internet egress, ports 22/80/443 open inbound. The host is your choice:
| Setting | Prod | Test |
|---|---|---|
| Instance type | t3.small minimum | t3.small minimum (t3.micro runs the OS but is OOM-killed compiling `aws-sdk-s3` during `setup-broker-host.sh`) |
| AMI | Ubuntu 22.04 LTS or Amazon Linux 2023 | same |
| Security group | 22 (SSH), 80 (certbot HTTP-01), 443 (broker + workers TLS), all from `0.0.0.0/0` | same (AWS validates OIDC JWKS over public TLS from AWS IPs that aren't pinnable) |
| Key pair | SSH key, EC2 Instance Connect, or SSM Session Manager | same |
Launch via AWS console, aws ec2 run-instances, or your IaC tool. The script doesn't care which.
Getting the IP — three workflows:
Both INSTANCE_ID and EIP live in the env file (scripts/operator-workstation.env or …test.env) — set them there once, not on the shell every run. The test stack is selected by --env-file <path> + the explicit --test flag (or auto-detected when the env-file name contains "test").
Workflow 0 (you already have EC2 + EIP attached): step 4 adopts the existing EIP
If the EC2 is already running with an EIP attached (whether allocated via the AWS Console, Terraform, or a previous setup-cloud.sh run), there's no need to allocate or re-associate. Step 4's precedence ladder detects it:
# 1. Find the existing EC2's instance id:
aws ec2 describe-instances --region "$REGION" \
--filters "Name=ip-address,Values=<YOUR-EXISTING-EIP>" \
--query 'Reservations[].Instances[].InstanceId' --output text
# 2. Paste it into the env file (one line edit):
echo 'INSTANCE_ID=i-0123…' >> scripts/operator-workstation.env
# 3. Run setup-cloud.sh — step 4 prints:
# "skip EIP <ip> already attached to <instance-id> (adopting; no allocation)"
# "ok tagged existing EIP as agentkeys-broker-eip (idempotency for re-runs)"
# No new EIP is allocated. No re-association. The existing EIP gets
# retroactively tagged so future re-runs find it via tag-lookup too.
AWS_PROFILE=agentkeys-admin bash scripts/setup-cloud.sh --yes
The precedence inside step 4 is: A adopt EIP attached to $INSTANCE_ID → B reuse tagged EIP → C use $EIP from env file → D allocate fresh. First match wins; no later branch fires if an earlier one resolves. Fully idempotent re-runs even when the operator pre-provisioned EC2 + EIP outside the script.
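The first-match-wins ladder can be sketched as a pure function. This illustrates the ordering only and is not the code inside `setup-cloud.sh`; `resolve_eip` and its three inputs are hypothetical, and the real step obtains them from AWS API lookups and the env file.

```shell
# Pick the EIP source using the A -> B -> C -> D order; first non-empty wins.
# $1 = EIP already attached to the instance, $2 = EIP found by tag,
# $3 = EIP from the env file. Pass "" for any that is absent.
resolve_eip() (
  attached=$1
  tagged=$2
  from_env=$3
  if [ -n "$attached" ]; then
    echo "A adopt $attached"
  elif [ -n "$tagged" ]; then
    echo "B reuse $tagged"
  elif [ -n "$from_env" ]; then
    echo "C env $from_env"
  else
    echo "D allocate"
  fi
)

# Nothing attached, but a tagged EIP exists: branch B wins over the env value.
resolve_eip "" 203.0.113.10 198.51.100.7   # prints: B reuse 203.0.113.10
```

Because allocation is the last branch, a re-run against an already-provisioned stack never reaches it.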
Workflow A (recommended): EC2-first, then attach via env-file edit + re-run
# 1. Launch EC2 → note INSTANCE_ID
aws ec2 run-instances --instance-type t3.small --image-id <ami> --key-name <key> ...
# 2. Paste INSTANCE_ID into the env file (one line edit):
echo 'INSTANCE_ID=<from-step-1>' >> scripts/operator-workstation.env
# (or for test: scripts/operator-workstation.test.env)
# 3. Bootstrap (allocates EIP + attaches to INSTANCE_ID + persists EIP back to env)
AWS_PROFILE=agentkeys-admin bash scripts/setup-cloud.sh --yes
# Test stack:
AWS_PROFILE=agentkeys-admin bash scripts/setup-cloud.sh \
--env-file scripts/operator-workstation.test.env --test --yes
# 4. SSH (EIP is now in the env file as EIP=…)
ssh ubuntu@$(grep ^EIP= scripts/operator-workstation.env | cut -d= -f2)
Workflow B: EIP-first, attach manually
# 1. Allocate EIP (printed at §14 summary; persisted to env file as EIP=…)
AWS_PROFILE=agentkeys-admin bash scripts/setup-cloud.sh --yes
# 2. Launch EC2
aws ec2 run-instances ...
# 3. Attach the EIP
aws ec2 associate-address --region "$REGION" \
--instance-id <new-instance-id> \
--public-ip $(grep ^EIP= scripts/operator-workstation.env | cut -d= -f2)
A is one fewer command; B is sometimes necessary when an existing EC2 needs to be repointed at the EIP later. For test, swap in --env-file scripts/operator-workstation.test.env --test everywhere — the EIP will be tagged agentkeys-broker-eip-test (the test env file has the test placeholders pre-populated).
Once the EC2 is launched + the EIP attached, SSH access goes through scripts/ssh-broker.sh — single entry point that reads INSTANCE_ID + EIP from scripts/broker.env or scripts/broker.test.env so it stays in lockstep with whatever setup-cloud.sh persisted.
# Prod broker via EC2 Instance Connect (no .pem needed):
bash scripts/ssh-broker.sh
# Test broker:
bash scripts/ssh-broker.sh test
# Fallback via .pem key (when EC2 Instance Connect is down):
bash scripts/ssh-broker.sh prod --fallback
bash scripts/ssh-broker.sh test --fallback
Default AWS profiles per stack (least-privilege, one-shot to provision):
| Stack | Default profile | Trust |
|---|---|---|
| `prod` | `agentkeys-broker` | `ec2-instance-connect:SendSSHPublicKey` on the prod instance ARN only |
| `test` | `agentkeys-broker-test` | same, scoped to the test instance ARN |
If agentkeys-broker or agentkeys-broker-test doesn't exist yet, setup-cloud.sh step 12 creates it idempotently (scoped to whatever INSTANCE_ID is set in the corresponding broker env file):
# Test stack — creates agentkeys-broker-test, scopes ec2-instance-connect
# to INSTANCE_ID from broker.test.env, mints an access key ONCE if none
# active. Re-run is a no-op once the user + policy + key already exist.
AWS_PROFILE=agentkeys-admin bash scripts/setup-cloud.sh \
--env-file scripts/operator-workstation.test.env --test --only-step 12
# Prod stack (the canonical `agentkeys-broker` user from AGENTS.md):
AWS_PROFILE=agentkeys-admin bash scripts/setup-cloud.sh --only-step 12
The script prints the access key once (paste into ~/.aws/credentials as [agentkeys-broker] / [agentkeys-broker-test]) — it never re-mints on subsequent runs because the operator already holds the secret. If INSTANCE_ID is unset in the broker env file, step 12 skips with a pointer to paste it first.
Shell wrappers (drop in ~/.zshrc) make the common case one keystroke:
AGENTKEYS_REPO="$HOME/Projects/agentKeys"
alias ssh-prod='bash $AGENTKEYS_REPO/scripts/ssh-broker.sh prod'
alias ssh-test='bash $AGENTKEYS_REPO/scripts/ssh-broker.sh test'
The `agentkeys-admin` profile is a long-lived IAM user with IAMFullAccess + AmazonS3FullAccess + AmazonSESFullAccess + AmazonRoute53FullAccess permissions. Already provisioned per AGENTS.md "AWS local-profile ↔ remote-IAM mapping". Switch to it before any bootstrap call:
awsp agentkeys-admin
aws sts get-caller-identity # → arn:aws:iam::…:user/agentkeys-admin
The bootstrap script intentionally doesn't auto-create the admin user — bootstrapping IAM root credentials onto disk is the kind of thing you only do once, by hand, with the IAM Console open.
Same AWS account is fine — isolation comes from the -test suffix on every identifier, not from the account boundary. Cross-trust is structurally impossible because the trust policy on every test role lists ONLY the test OIDC provider ARN (which is bound byte-for-byte to test-broker.${ZONE}, never broker.${ZONE}).
| Resource | Prod name | Test name | Created by |
|---|---|---|---|
| IAM user (daemon) | `agentkeys-daemon` | `agentkeys-daemon-test` | `setup-cloud.sh` step 10 (suffixed when `--test` flag is passed, or env-file path matches `*test*` as an ergonomic auto-detect) |
| IAM role (data) | `agentkeys-data-role` | `agentkeys-data-role-test` | `setup-cloud.sh` step 11 (same suffix logic) |
| IAM role (vault) | `agentkeys-vault-role` | `agentkeys-vault-role-test` | `provision-vault-role.sh` reads `VAULT_ROLE_ARN` from the active env file |
| IAM role (memory) | `agentkeys-memory-role` | `agentkeys-memory-role-test` | `provision-memory-role.sh` (same env-driven pattern) |
| IAM OIDC provider | `…oidc-provider/broker.${ZONE}` | `…oidc-provider/test-broker.${ZONE}` | manual `aws iam create-open-id-connect-provider` per §9.2 (one per broker URL — AWS validates byte-for-byte) |
| EC2 instance profile | `agentkeys-broker-host` | `agentkeys-broker-host-test` | §6 (optional) |
| EIP (tag) | `agentkeys-broker-eip` | `agentkeys-broker-eip-test` | `setup-cloud.sh` step 4 |
| Mail bucket | `agentkeys-mail-${ACCT}` | `agentkeys-mail-test-${ACCT}` | `setup-cloud.sh` step 7 (from `BUCKET` env var) |
| Vault bucket | `agentkeys-vault-${ACCT}` | `agentkeys-vault-test-${ACCT}` | `provision-vault-bucket.sh` (from `VAULT_BUCKET` env var) |
| Memory bucket | `agentkeys-memory-${ACCT}` | `agentkeys-memory-test-${ACCT}` | `provision-memory-bucket.sh` (from `MEMORY_BUCKET` env var) |
| SES sender | `noreply@bots.${ZONE}` | `noreply-test@bots-test.${ZONE}` | `ses-verify-sender.sh` (from `BROKER_EMAIL_FROM_ADDRESS`) |
| Heima contracts | one set of 6 addresses | a different set of 6 (same chain, different deployer key) | setup-heima.sh per deployer key |
Cross-trust isolation enforced by:
- OIDC provider URL is the trust scope. Each role's trust policy names exactly one provider ARN. The provider ARN derives from the broker URL. `broker.${ZONE}` and `test-broker.${ZONE}` produce distinct ARNs, so the test OIDC provider literally cannot mint JWTs that prod roles accept.
- PrincipalTag scoping (§9.4) layers on top. Even if a test JWT somehow reached a prod role, the bucket policy condition `s3:prefix=bots/${aws:PrincipalTag/agentkeys_actor_omni}/*` would still scope reads/writes by actor.
- Per-data-class bucket separation. Vault role's IAM grants reference vault bucket only; memory role references memory bucket only. Even within one stack, vault creds in the memory bucket → AccessDenied (defense-in-depth for the cap-mint layer).
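The first point, that the provider ARN follows from the broker hostname, can be made concrete. A minimal sketch using the standard IAM OIDC provider ARN shape; `oidc_provider_arn`, the account ID, and the zone are placeholders rather than values from the repo.

```shell
# Build the IAM OIDC provider ARN for a broker hostname.
# $1 = AWS account ID, $2 = broker hostname (the issuer URL without https://).
oidc_provider_arn() (
  account_id=$1
  broker_host=$2
  printf 'arn:aws:iam::%s:oidc-provider/%s\n' "$account_id" "$broker_host"
)

prod_arn=$(oidc_provider_arn 123456789012 broker.example.org)
test_arn=$(oidc_provider_arn 123456789012 test-broker.example.org)
echo "$prod_arn"
echo "$test_arn"
# Different hostnames give different ARNs, so a trust policy that names one
# provider ARN never matches tokens issued under the other.
[ "$prod_arn" != "$test_arn" ] && echo "distinct trust scopes"
```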
setup-cloud.sh validates required env keys at step 2 and dies with a precise pointer if missing.
The account can host N test brokers in parallel (one EC2 per slot) so concurrent CI runs stop trampling each other's deploys. Prod stays single-broker. Slots are selected everywhere by --ci --slot N (or AGENTKEYS_TEST_SLOT=N, or auto-detected from a *test-N* env-file path); a bare --ci is slot 1.
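The env-file auto-detection can be pictured as a filename match. The sketch below is illustrative only: the helper name `slot_from_env_path` and its exact matching rules are assumptions, not the logic the scripts ship.

```shell
# Hypothetical helper: derive the test-fleet slot from an env-file path.
# Prints nothing and returns 1 for a path that is not a test env file.
slot_from_env_path() {
  local base=${1##*/}              # strip directories
  local n
  case "$base" in
    *.test-[0-9]*.env)
      n=${base##*.test-}           # e.g. "2.env"
      n=${n%.env}                  # e.g. "2"
      case "$n" in
        ''|*[!0-9]*) return 1 ;;   # reject non-numeric suffixes
      esac
      printf '%s\n' "$n"
      ;;
    *.test.env) printf '1\n' ;;    # a bare test env file means slot 1
    *) return 1 ;;
  esac
}

slot_from_env_path scripts/operator-workstation.test-2.env   # prints 2
slot_from_env_path scripts/broker.test.env                   # prints 1
```

A prod env file (no `.test` segment) returns non-zero, which a caller could treat as "no slot".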
Naming scheme. Slot 1 predates the fleet and keeps its names (the OIDC issuer URL is byte-frozen against the registered IAM provider — renaming would break every trust policy). Slots ≥ 2 use a uniform -test-N suffix:
| Identifier | Prod | Test slot 1 (grandfathered) | Test slot N ≥ 2 |
|---|---|---|---|
| Broker host / OIDC issuer | broker.${ZONE} | test-broker.${ZONE} | broker-test-N.${ZONE} |
| MCP host | mcp.${ZONE} | test-mcp.${ZONE} | mcp-test-N.${ZONE} |
| Signer + 6 workers | signer.${ZONE} … | signer-test.${ZONE} … | signer-test-N.${ZONE} … |
| EIP tag | agentkeys-broker-eip | agentkeys-broker-eip-test | agentkeys-broker-eip-test-N |
| IAM users/roles | agentkeys-daemon … | agentkeys-daemon-test … | agentkeys-daemon-test-N … |
| Buckets | agentkeys-vault-${ACCT} … | agentkeys-vault-test-${ACCT} … | agentkeys-vault-test-N-${ACCT} … |
| Mail domain / sender | bots.${ZONE} | bots-test.${ZONE} / noreply-test@ | bots-test-N.${ZONE} / noreply-test-N@ |
| SES receipt rule | agentkeys-inbound | agentkeys-inbound-test | agentkeys-inbound-test-N |
| SSM instance profile | agentkeys-broker-host (§6) | agentkeys-test-broker-ssm | agentkeys-test-broker-ssm-N |
| Env files | …workstation.env + broker.env | ….test.env + broker.test.env | ….test-N.env + broker.test-N.env |
| SSH | ssh-broker.sh prod | ssh-broker.sh test | ssh-broker.sh test-N |

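The naming scheme can be expressed as a small function. This is a sketch under assumptions: the helper name `stack_hostnames` is hypothetical, and the six worker names are taken from the Base stack row (audit/email/cred/memory/config/classify); applying the same suffix to test slots is inferred from the table, not confirmed by the scripts.

```shell
# Sketch: print the 9 hostnames (broker, mcp, signer, 6 workers) for a stack.
# usage: stack_hostnames <zone> <prod|N>
stack_hostnames() {
  local zone=$1 slot=$2 suffix svc
  case "$slot" in
    prod) suffix="" ;;
    1)    suffix="-test" ;;          # slot 1 keeps its grandfathered names
    *)    suffix="-test-$slot" ;;    # slots >= 2 use the uniform suffix
  esac
  if [ "$slot" = 1 ]; then
    # Grandfathered: broker and mcp carry a test- prefix instead of a suffix.
    printf 'test-broker.%s\ntest-mcp.%s\n' "$zone" "$zone"
  else
    printf 'broker%s.%s\nmcp%s.%s\n' "$suffix" "$zone" "$suffix" "$zone"
  fi
  for svc in signer audit email cred memory config classify; do
    printf '%s%s.%s\n' "$svc" "$suffix" "$zone"
  done
}

stack_hostnames example.com 2
```

The first line printed for slot 2 is `broker-test-2.example.com`, which is also the slot's OIDC issuer host.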
Shared across the fleet vs replicated per slot (mirrors the issue #265 table):
- Shared: the AWS account, parent DNS zone, SES rule set name (agentkeys; rules are per-slot), the Heima TEST contract set (Registry/Scope/EntryPoint/factory are multi-tenant, keyed by omni/account), the GH-Actions deploy role github-actions-agentkeys-deploy (its SendCommand scope covers the whole fleet).
- Replicated per slot: EC2 + EIP (+ tag), the full DNS name set (9 A records), OIDC issuer + IAM provider, daemon/SSH users, data + per-data-class roles, all four buckets (mail/vault/memory/config; decision (a): per-slot buckets, zero cross-slot blast radius), SES domain identity + sender + receipt rule, SSM instance profile (per-machine, never shared, per #265's isolation requirement), and (phase 3) the slot's own Heima deployer wallet → distinct omnis + master account.

Cross-slot isolation is structural, same mechanism as prod↔test (§0.2): each slot's roles trust ONLY that slot's OIDC provider ARN, which derives from the slot's broker URL — slot 2's JWTs cannot assume slot 1's roles, and vice versa.
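To make the "ARN derives from the broker URL" point concrete, here is a minimal sketch. The helper name `oidc_provider_arn`, the account ID, and the zone are placeholders; the ARN shape is the one used for the Federated principal in §9.3.

```shell
# Build the IAM OIDC provider ARN that a role trust policy would name.
oidc_provider_arn() {
  local acct=$1 broker_host=$2
  printf 'arn:aws:iam::%s:oidc-provider/%s\n' "$acct" "$broker_host"
}

ACCT=111122223333     # placeholder account id
ZONE=example.com      # placeholder zone
slot1=$(oidc_provider_arn "$ACCT" "test-broker.$ZONE")
slot2=$(oidc_provider_arn "$ACCT" "broker-test-2.$ZONE")

# Different broker hosts always yield different provider ARNs.
[ "$slot1" != "$slot2" ] && echo "distinct trust scopes"
```

Because each role names exactly one such ARN, a JWT issued by one slot's broker has no provider in common with another slot's roles.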
Adding slot N (checklist):
1. Launch the EC2 (t3.small+; SG ports 22/80/443 open; the existing launch-wizard-2 SG is fine) and attach/allocate an EIP. Tag conventions are applied by the script, not by hand.
2. Create the two env files by copying the slot-2 pair: scripts/operator-workstation.test-2.env → ….test-N.env (replace every test-2 with test-N; keep the SHARED contract addresses verbatim; leave DEPLOYER_ADDR_HEIMA empty until the slot's deployer key exists) and scripts/broker.test-2.env → broker.test-N.env (paste the new INSTANCE_ID + EIP).
3. bash scripts/setup-cloud.sh --ci --slot N --yes --to-step 12 (laptop, agentkeys-admin). Step 13's per-data-class roles need the slot's OIDC provider to exist, so it runs later (step 6 of this checklist).
4. On the EC2 (first time via bash scripts/ssh-broker.sh test-N --fallback): clone the repo, sudo bash scripts/setup-broker-host.sh --ci --slot N --yes, then issue certs per quick-start §5b (all 9 hostnames) and re-run the script to flip nginx onto :443. --slot (and even --ci) matter only on this FIRST bootstrap of the virgin host: once the broker unit exists, every re-run self-identifies the environment + slot from the deployed unit's BROKER_OIDC_ISSUER (so CI's flagless-slot SSM re-deploys can't cross-wire slots), and an explicit --slot that contradicts the deployed identity is a hard error.
5. Register the slot's OIDC provider per §9.2 (issuer https://broker-test-N.${ZONE}).
6. bash scripts/setup-cloud.sh --ci --slot N --yes --only-step 13: per-data-class buckets + roles + bucket policies now succeed (their trust policies name the provider from step 5).
7. bash scripts/provision-ci-deploy-role.sh --fix-ssm (no instance flag: it auto-discovers every broker.test*.env slot, re-scopes the CI deploy role to the whole fleet, and creates + associates the slot's agentkeys-test-broker-ssm-N instance profile).
8. Verify: ENV_FILE=scripts/operator-workstation.test-N.env bash scripts/verify-workers.sh and curl -sf https://broker-test-N.${ZONE}/healthz.

CI slot routing (issue #265 phase 4) is wired but dormant by default. harness-ci.yml shards the deploy + harness concurrency per slot (slot 1 keeps the legacy group string heima-test-deployer-nonce so runs from pre-phase-4 branches still mutually exclude; slots ≥ 2 get heima-test-slot-N) and selects the slot's deployer key / instance id / env; the slot itself is picked least-loaded by the run's dependency-free slot-claim job (static PR# % N + 1 as tiebreak/fallback — spec/ci-parallel-test-fleet.md §4b). N is the repo variable AGENTKEYS_TEST_SLOT_COUNT, default 1 (every run on slot 1 = the pre-fleet behavior). Turn on parallelism with gh variable set AGENTKEYS_TEST_SLOT_COUNT --body N once each slot's CI is proven (validate a slot first via gh workflow run harness-ci.yml -f slot_override=N). Watch the fleet + run→slot mapping live with bash scripts/fleet-status.sh --watch (#279 v1). Full runbook: ci-setup.md "Activate parallel CI"; design + why the chain contract set is SHARED: spec/ci-parallel-test-fleet.md.
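The static fallback formula from the paragraph above (PR# % N + 1) works out as follows. The function name `fallback_slot` and the PR numbers are illustrative, not taken from harness-ci.yml.

```shell
# Static fallback: slot = PR# % N + 1, where N is the slot count.
# usage: fallback_slot <pr_number> <slot_count>
fallback_slot() {
  local pr=$1 count=$2
  [ "$count" -ge 1 ] || { echo "slot count must be >= 1" >&2; return 1; }
  echo $(( pr % count + 1 ))
}

fallback_slot 281 1   # prints 1 (default count: every run lands on slot 1)
fallback_slot 281 2   # prints 2
fallback_slot 282 2   # prints 1
```

With the default count of 1 the modulo is always 0, so every run maps to slot 1, which is the pre-fleet behavior the paragraph describes.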
Live fleet inventory (re-verify with aws ec2 describe-addresses --region "$REGION" --filters Name=tag:Name,Values='agentkeys-broker-eip-test*' — never hardcode the IPs downstream; the env files are the source of truth):
| Slot | EC2 Name tag | Env files | Stood up |
|---|---|---|---|
| 1 | agentkeys-broker-test | *.test.env | pre-#265 (the original CI broker) |
| 2 | agentkeys-broker-test-2 | *.test-2.env | 2026-06-11 (#265 phase 2) |

Why jq -n --arg and not cat > file.json <<EOF: jq --arg passes values outside shell parameter expansion, sidestepping the zsh modifier bug ($VAR:r etc.) that silently corrupts ARNs. JSON is validated on construction, command substitution feeds straight into --policy-document, and no file lands on disk. The orchestrator and every helper script apply this convention.

Production runs one full broker stack per chain: the Heima stack (broker.${ZONE}, the consumer free tier, plain flags) and the Base stack (-base, the permissioned B2B2C partner tier — #282 D5 dual-stack), each on its own EC2. This is the same spawn-another-broker pattern as §0.3 / ci-setup.md "Add a test-fleet slot" — instantiated on the chain axis (one machine per chain, prod posture) instead of the slot axis (N machines, one chain). The chain migration itself (contracts, ceremonies, phases) is plan/chain/base-migration.md; this section is the end-to-end operator journey for the per-chain cloud + host substrate, traps inline.
Selection everywhere is --base (or auto-detect from a *base* env-file path). A PROD stack: mutually exclusive with --ci; real SES sender, hard-fail accept drift guard. Host re-runs are flagless (self-identify from the deployed unit's issuer; a contradicting --ci/--base is a hard cross-wiring error).
Naming — the §0.3 worker-style suffix scheme with -base on every identifier:
| Identifier | Base prod stack |
|---|---|
| Broker host / OIDC issuer | broker-base.${ZONE} |
| MCP host | mcp-base.${ZONE} |
| Signer + 6 workers | signer-base.${ZONE}, audit/email/cred/memory/config/classify-base.${ZONE} |
| EIP tag | agentkeys-broker-eip-base |
| IAM users/roles | agentkeys-daemon-base, agentkeys-broker-base, agentkeys-data-role-base, per-class agentkeys-{vault,memory,config}-role-base |
| Buckets | agentkeys-{mail,vault,memory,config}-base-${ACCT} |
| Mail domain / sender / receipt rule | bots-base.${ZONE} / noreply-base@ / agentkeys-inbound-base |
| EC2 instance profile | agentkeys-broker-host-base (§6 with -base substituted) |
| Env files / SSH | *.base.env / ssh-broker.sh base |
Shared vs replicated: shared with the Heima prod stack = the AWS account + parent DNS zone, nothing else. Everything in the table is replicated per stack. Isolation is structural, same mechanism as prod↔test (§0.2): each stack's roles trust only that stack's OIDC provider ARN.
The chain dimension (the one thing the slot pattern doesn't have): the env file pins AGENTKEYS_CHAIN=base; setup-broker-host.sh templates that chain into every unit + worker env it writes — env keys carry the _BASE suffix per the Rust env_profile convention, RPC default from chain-profiles/base.json. Until #282 Phase 4 deploys the Base contract set, the *_BASE addresses are empty → the stack runs healthz-green but chain-degraded (#241 posture: accept / cap-mint / chain-verify answer actionable 5xx; OIDC + IAM + S3 + SES fully live). Deployer key: ~/.agentkeys/deployer-base.key (0600; symlink base-deployer.key → it for the bring-up's <chain>-deployer.key path), address pinned as DEPLOYER_ADDR_BASE.
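As a small illustration of the suffix convention, the sketch below composes a chain-scoped key name. The helper `chain_env_key` is hypothetical and assumes the suffix is simply the upper-cased AGENTKEYS_CHAIN value; the Rust env_profile code remains the source of truth.

```shell
# Hypothetical helper: compose a chain-scoped env key such as
# DEPLOYER_ADDR_BASE from a key stem and the AGENTKEYS_CHAIN value.
chain_env_key() {
  local stem=$1 chain=$2 upper
  upper=$(printf '%s' "$chain" | tr '[:lower:]' '[:upper:]')
  printf '%s_%s\n' "$stem" "$upper"
}

chain_env_key DEPLOYER_ADDR base    # prints DEPLOYER_ADDR_BASE
chain_env_key DEPLOYER_ADDR heima   # prints DEPLOYER_ADDR_HEIMA
```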
Add the stack (checklist; live since 2026-06-12 on EC2 agentkeys-broker-base; broker.base.env is the instance/EIP source of truth):
1. Launch the EC2 (t3.medium; SG ports 22/80/443; the broker SGs qualify) and attach an EIP. Paste INSTANCE_ID + EIP into scripts/broker.base.env; tags are applied by the script.
2. Create the two env files. For Base they're committed (operator-workstation.base.env + broker.base.env); for a future chain <c>, copy that pair with s/base/<c>/, keep the *_<C> contract keys EMPTY, and pin the chain's deployer address.
3. Cloud provision (laptop, agentkeys-admin): bash scripts/setup-cloud.sh --base --yes --to-step 12. Step 4 adopts + retro-tags the attached EIP; append the two printed access-key blocks to ~/.aws/credentials as [agentkeys-daemon-base] / [agentkeys-broker-base]. Trap: step 9's sender verify can exit non-zero while actually succeeding; re-run --only-step 9; "already verified" = done.
4. Instance profile agentkeys-broker-host-base per §6 with -base substituted (SES grant scoped to the bots-base.${ZONE} identities + AmazonSSMManagedInstanceCore), associated to the instance.
5. Broker host bring-up (first time via bash scripts/ssh-broker.sh base --fallback): clone the repo, sudo bash scripts/setup-broker-host.sh --base --yes. TLS for the 8 hostnames auto-issues in the same run (§5b; DNS from step 3 must be live first; mcp-base stays DNS-only until #152); if any host was skipped, re-run flagless after fixing its A record.
6. Register the stack's OIDC provider (§9.2, issuer https://broker-base.${ZONE}), then bash scripts/setup-cloud.sh --base --yes --only-step 13 (the per-class roles' trust policies need the provider to exist first), then a full bash scripts/setup-cloud.sh --base --yes converge (all ok/skip).
7. Verify: ENV_FILE=scripts/operator-workstation.base.env bash scripts/wait-stack-healthy.sh (8/8 green over public TLS) + OIDC discovery at https://broker-base.${ZONE}/.well-known/openid-configuration serves the issuer with a populated JWKS.

What this does NOT yet wire (deferred to #282 Phases 2–5): the Base contract set + P-256 wrapper (#170) + paymaster/EntryPoint wiring, binding ceremonies, harness coverage, web-app chain copy. No CI deploy role either — prod stacks are operator-deployed via ssh-broker.sh base, never SSM. fleet-status.sh (#279/#280) does not yet discover *.base.env.
Cloud-agnostic. The four principals exist in every cloud the broker runs on; the cloud changes only which API creates them.
| Identity | Type | Holds | Purpose |
|---|---|---|---|
| agentkeys-admin | privileged user | Long-lived access key | One-shot provisioning. Runs every command in this doc. IAM-admin scope. |
| agentkeys-broker | scoped user | Long-lived access key | Operator's SSH-into-EC2 path via EC2 Instance Connect (AWS) / SSH key (other clouds). No data-plane access. |
| agentkeys-daemon | runtime user | Long-lived access key | The broker process uses this at runtime. Only permission: assume the data role. |
| agentkeys-data-role | assumed role | (none; assumed) | Holds the actual storage + email permissions. Trusted by the runtime user (Stage 6) or by the OIDC provider (Stage 7). |
| agentkeys-broker-host | instance profile (optional) | (none; bound to a VM) | If the broker runs on a managed VM, attach this so the daemon never sees a static key. Runtime creds come from IMDS / metadata server. |

Why "data role" and not "agent role": the project word "agent" already means three things (the AI agent, the AgentKeys product, an IAM role). The role holds data-plane permissions. The broker still accepts the legacy BROKER_AGENT_ROLE_ARN env var for backwards compatibility.

Eight subdomains under the operator's parent zone (substitute ${ZONE} everywhere):
| Host | Purpose | Provisioned in |
|---|---|---|
| ${MAIL_DOMAIN} (e.g. bots.${ZONE}) | SES / email backend inbound | §3 |
| ${BROKER_HOST} (e.g. broker.${ZONE}) | Broker public reverse proxy | §10.1 below |
| signer.${ZONE} | Signer service (issue #74 step 1b) | §10.1 below |
| audit.${ZONE} / email.${ZONE} / cred.${ZONE} / memory.${ZONE} / config.${ZONE} | Service workers (issue #90; config = #201 master-only taxonomy) | §10.1 below (dev co-location on broker EIP today) |

Confirm the parent zone is reachable before any record changes (AWS Route 53 example; the same get-hosted-zone shape exists on AliCloud DNS + Cloud DNS):
aws route53 get-hosted-zone --id "$PARENT_ZONE_ID" \
--query 'HostedZone.{name:Name, private:Config.PrivateZone}'
# → {"name": "${ZONE}.", "private": false}

The bulk service-worker A-record creation is automated by scripts/dns-upsert-workers.sh (AWS Route 53 today). For other providers, replicate the same shape: the hostnames are the migration seam.

aws sesv2 create-email-identity \
--region "$REGION" --email-identity "$MAIL_DOMAIN" \
--dkim-signing-attributes NextSigningKeyLength=RSA_2048_BIT

Then publish DKIM + SPF + DMARC + MX records in one DNS change. AWS Route 53:

read -r T1 T2 T3 <<<"$(aws sesv2 get-email-identity --region "$REGION" \
--email-identity "$MAIL_DOMAIN" --query 'DkimAttributes.Tokens' --output text)"
aws route53 change-resource-record-sets --hosted-zone-id "$PARENT_ZONE_ID" \
--change-batch "$(jq -n \
--arg domain "$MAIL_DOMAIN" --arg region "$REGION" \
--arg t1 "$T1" --arg t2 "$T2" --arg t3 "$T3" '{
Comment: "AgentKeys email infra for \($domain)",
Changes: [
{Action:"UPSERT", ResourceRecordSet:{Name:"\($t1)._domainkey.\($domain)", Type:"CNAME", TTL:300, ResourceRecords:[{Value:"\($t1).dkim.amazonses.com"}]}},
{Action:"UPSERT", ResourceRecordSet:{Name:"\($t2)._domainkey.\($domain)", Type:"CNAME", TTL:300, ResourceRecords:[{Value:"\($t2).dkim.amazonses.com"}]}},
{Action:"UPSERT", ResourceRecordSet:{Name:"\($t3)._domainkey.\($domain)", Type:"CNAME", TTL:300, ResourceRecords:[{Value:"\($t3).dkim.amazonses.com"}]}},
{Action:"UPSERT", ResourceRecordSet:{Name:$domain, Type:"MX", TTL:300, ResourceRecords:[{Value:"10 inbound-smtp.\($region).amazonaws.com"}]}},
{Action:"UPSERT", ResourceRecordSet:{Name:$domain, Type:"TXT", TTL:300, ResourceRecords:[{Value:"\"v=spf1 include:amazonses.com -all\""}]}},
{Action:"UPSERT", ResourceRecordSet:{Name:"_dmarc.\($domain)", Type:"TXT", TTL:300, ResourceRecords:[{Value:"\"v=DMARC1; p=quarantine; rua=mailto:dmarc@\($domain)\""}]}}
]
}')"

Wait ~5 min for DKIM propagation, then verify:

aws sesv2 get-email-identity --region "$REGION" --email-identity "$MAIL_DOMAIN" \
--query '{verified: VerifiedForSendingStatus, dkim: DkimAttributes.Status}'
# → {"verified": true, "dkim": "SUCCESS"}

DKIM key custody: in this interim setup, the email service holds the private DKIM key (AWS-internal on SES, AliCloud-internal on DirectMail, etc.). Trust surface = provider could forge mail signed as us → bounded blast radius (reputation, not user-data custody). Migration target is TEE-held BYODKIM; track in docs/spec/heima-gaps-vs-desired-architecture.md §4. Do not intermediate-step to "BYODKIM with file-stored key" (strictly worse than provider-managed).

aws s3api create-bucket \
--region "$REGION" --bucket "$BUCKET" \
$([ "$REGION" != "us-east-1" ] && echo "--create-bucket-configuration LocationConstraint=$REGION")
aws s3api put-public-access-block --region "$REGION" --bucket "$BUCKET" \
--public-access-block-configuration BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true
# 30-day TTL on inbound objects (throwaway-inbox model)
aws s3api put-bucket-lifecycle-configuration --region "$REGION" --bucket "$BUCKET" \
--lifecycle-configuration "$(jq -n '{
Rules: [{ID:"inbound-30d-ttl", Status:"Enabled", Filter:{Prefix:"inbound/"}, Expiration:{Days:30}}]
}')"

aws ses create-receipt-rule-set --rule-set-name agentkeys --region "$REGION" 2>/dev/null || true
aws ses create-receipt-rule --region "$REGION" --rule-set-name agentkeys \
--rule "$(jq -n --arg domain "$MAIL_DOMAIN" --arg bucket "$BUCKET" '{
Name: "agentkeys-inbound", Enabled: true, ScanEnabled: true, TlsPolicy: "Optional",
Recipients: [$domain],
Actions: [{S3Action: {BucketName: $bucket, ObjectKeyPrefix: "inbound/"}}]
}')"
aws ses set-active-receipt-rule-set --rule-set-name agentkeys --region "$REGION"

Inbound MIME lands at s3://$BUCKET/inbound/<msg_id>. First object: AMAZON_SES_SETUP_NOTIFICATION (provider's "I successfully wrote to your bucket" marker). Real mail follows.

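When checking whether real mail has arrived, the setup marker needs to be skipped. A minimal sketch, assuming the object keys are already available one per line (however they were listed); the helper name `real_inbound_keys` and the sample message ID are made up for illustration.

```shell
# Drop the SES setup marker from a list of object keys read on stdin.
# `|| true` keeps the exit status zero when nothing but the marker is present.
real_inbound_keys() {
  grep -v '^inbound/AMAZON_SES_SETUP_NOTIFICATION$' || true
}

printf '%s\n' \
  'inbound/AMAZON_SES_SETUP_NOTIFICATION' \
  'inbound/0a1b2c3d-example-message-id' \
  | real_inbound_keys
# prints inbound/0a1b2c3d-example-message-id
```

An empty result after the filter means the receipt rule is wired but no real message has landed yet.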
Sandbox vs production sending: inbound is unaffected by SES sandbox; outbound to arbitrary addresses needs Console → Support → "SES Sending Limits" → "Request Production Access".
aws iam create-user --user-name agentkeys-daemon
aws iam create-access-key --user-name agentkeys-daemon
# → save AccessKeyId + SecretAccessKey to your secret manager. NEVER to git.
aws iam put-user-policy --user-name agentkeys-daemon \
--policy-name agentkeys-daemon-assume-role \
--policy-document "$(jq -n --arg acct "$ACCOUNT_ID" '{
Version:"2012-10-17",
Statement:[{
Effect:"Allow", Action:"sts:AssumeRole",
Resource:"arn:aws:iam::\($acct):role/agentkeys-data-role"
}]
}')"

The daemon user can do exactly one thing: assume agentkeys-data-role. Any storage / email action goes through the role's permissions, never the user's.

The role's trust policy starts with the static-IAM-user variant. After the broker is publicly reachable, docs/cloud-bootstrap.md §4 swaps it for the OIDC-federated variant.
aws iam create-role --role-name agentkeys-data-role \
--assume-role-policy-document "$(jq -n --arg acct "$ACCOUNT_ID" '{
Version:"2012-10-17",
Statement:[{
Effect:"Allow",
Principal:{AWS:"arn:aws:iam::\($acct):user/agentkeys-daemon"},
Action:"sts:AssumeRole"
}]
}')"
aws iam put-role-policy --role-name agentkeys-data-role \
--policy-name agentkeys-data-role-inline \
--policy-document "$(jq -n \
--arg bucket "$BUCKET" --arg region "$REGION" \
--arg acct "$ACCOUNT_ID" --arg domain "$MAIL_DOMAIN" '{
Version:"2012-10-17",
Statement:[
{Effect:"Allow", Action:"s3:ListBucket", Resource:"arn:aws:s3:::\($bucket)"},
{Effect:"Allow", Action:"s3:GetObject", Resource:"arn:aws:s3:::\($bucket)/*"},
{Effect:"Allow", Action:["ses:SendEmail","ses:GetEmailIdentity"],
Resource:["arn:aws:ses:\($region):\($acct):identity/\($domain)",
"arn:aws:ses:\($region):\($acct):identity/*@\($domain)"]}
]
}')"
export ROLE_ARN=$(aws iam get-role --role-name agentkeys-data-role --query 'Role.Arn' --output text)
echo "ROLE_ARN=$ROLE_ARN"

Per arch.md §17.2: separate roles for credentials + memory data classes. Same trust shape as §4.2, distinct inline policies + PrincipalTag scoping. Provisioned by per-data-class helpers (idempotent):

bash scripts/provision-vault-bucket.sh # agentkeys-vault-${ACCOUNT_ID}
bash scripts/provision-vault-role.sh # agentkeys-vault-role
bash scripts/apply-vault-bucket-policy.sh # v3 split-statement PrincipalTag policy
bash scripts/provision-memory-bucket.sh
bash scripts/provision-memory-role.sh
bash scripts/apply-memory-bucket-policy.sh
bash scripts/cleanup-mail-bucket-policy.sh   # restore email-only grants on $BUCKET

These scripts are the source of truth for the policy shape: read them, don't transcribe.

If you reached this section, agentkeys-admin exists (you're using it). agentkeys-broker is whatever IAM user you SSH into the broker host with — its perms are out of scope (ec2-instance-connect:SendSSHPublicKey on the host's instance ID is sufficient for AWS Instance Connect).
aws s3api put-bucket-policy --region "$REGION" --bucket "$BUCKET" \
--policy "$(jq -n --arg bucket "$BUCKET" --arg acct "$ACCOUNT_ID" '{
Version:"2012-10-17",
Statement:[
{
Sid:"AllowSESWriteInbound", Effect:"Allow",
Principal:{Service:"ses.amazonaws.com"},
Action:"s3:PutObject",
Resource:"arn:aws:s3:::\($bucket)/*",
Condition:{StringEquals:{"aws:Referer":$acct}}
},
{
Sid:"AllowDaemonRead", Effect:"Allow",
Principal:{AWS:"arn:aws:iam::\($acct):role/agentkeys-data-role"},
Action:["s3:GetObject","s3:ListBucket"],
Resource:["arn:aws:s3:::\($bucket)","arn:aws:s3:::\($bucket)/*"]
}
]
}')"

The PrincipalTag-scoped federated variant (which replaces this once OIDC federation is up) lives in docs/cloud-bootstrap.md §4.4.

If the broker runs on AWS EC2, attach this so the daemon never holds a static key. Runtime creds come from IMDS.
ROLE=agentkeys-broker-host
aws iam create-role --role-name "$ROLE" \
--assume-role-policy-document "$(jq -n '{
Version:"2012-10-17",
Statement:[{Effect:"Allow", Principal:{Service:"ec2.amazonaws.com"}, Action:"sts:AssumeRole"}]
}')"
aws iam put-role-policy --role-name "$ROLE" --policy-name BrokerAssumeData \
--policy-document "$(jq -n --arg acct "$ACCOUNT_ID" '{
Version:"2012-10-17",
Statement:[{Effect:"Allow", Action:"sts:AssumeRole",
Resource:"arn:aws:iam::\($acct):role/agentkeys-data-role"}]
}')"
aws iam create-instance-profile --instance-profile-name "$ROLE"
aws iam add-role-to-instance-profile --instance-profile-name "$ROLE" --role-name "$ROLE"
aws ec2 associate-iam-instance-profile --region "$REGION" \
--instance-id "$INSTANCE_ID" \
--iam-instance-profile Name="$ROLE"

Caller-region trap: the agentkeys-admin profile defaults to us-west-2; the broker EC2 usually lives in us-east-1. Without --region "$REGION", describe-instances silently returns empty and the downstream put-role-policy runs with --role-name "". Pass --region explicitly on every regional call. See AGENTS.md "AWS local-profile ↔ remote-IAM mapping".

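One way to make this trap fail loudly is a guard between the lookup and the mutating call. This is a sketch, not part of the shipped scripts: `require_value` is a hypothetical helper. It also rejects the literal string None, which the AWS CLI prints under --output text when a query resolves to null.

```shell
# Hypothetical guard: abort before an empty lookup result reaches a
# mutating call such as put-role-policy.
require_value() {
  local name=$1 value=${2-}
  if [ -z "$value" ] || [ "$value" = "None" ]; then
    echo "refusing to continue: $name is empty (check --region)" >&2
    return 1
  fi
}

# Usage sketch: ROLE would normally come from an earlier describe/get call.
ROLE="agentkeys-broker-host"
require_value ROLE "$ROLE" && echo "ok: $ROLE"
```

Placing the guard right after each lookup keeps a wrong-region empty result from turning into a call with --role-name "".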
The broker calls SES v2 SendEmail with its own runtime credentials (instance profile), not via the assumed agentkeys-data-role. Without ses:SendEmail on the broker's role, the operator hits:
broker rejected /v1/auth/email/request: status=502 body=
{"error":"backend_unreachable","message":"… ses SendEmail:
unhandled error (AccessDeniedException)"}
The IAM action is ses:SendEmail (sesv2), NOT ses:SendRawEmail (v1; different code path the broker doesn't use). The grant lives on the broker's runtime role (agentkeys-broker-host on EC2; the user agentkeys-daemon otherwise) — see docs/cloud-bootstrap.md §3.3 for the exact statement.
Some early deploys ship with AmazonS3FullAccess (or similar wide permissions) attached to the broker's runtime role. The broker at runtime ONLY uses aws-sdk-sts (the GetCallerIdentity startup probe) + aws-sdk-sesv2 (the §6.1 grant) — it never accesses S3 with its own creds. Per-user S3 is via JWT-assumed agentkeys-{data,vault,memory}-role, not the broker's runtime role.
A broker compromise with AmazonS3FullAccess would expose every inbound email in the SES bucket (verification tokens, magic links). Strip it:
# Discover the actual role attached to the broker host (canonical name:
# agentkeys-broker-host; some early deploys landed on different names):
INSTANCE_PROFILE_ARN=$(aws ec2 describe-instances --region "$REGION" \
--filters "Name=ip-address,Values=$EIP" \
--query 'Reservations[].Instances[].IamInstanceProfile.Arn' --output text)
ROLE=$(aws iam get-instance-profile \
--instance-profile-name "${INSTANCE_PROFILE_ARN##*/}" \
--query 'InstanceProfile.Roles[0].RoleName' --output text)
echo "broker runtime role: $ROLE"
# Audit attached policies:
aws iam list-attached-role-policies --role-name "$ROLE"
# Detach AmazonS3FullAccess if present:
aws iam detach-role-policy --role-name "$ROLE" \
--policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
# Verify only the narrow inline policy (BrokerSendEmail + AssumeDataRole) remains:
aws iam list-role-policies --role-name "$ROLE"
aws iam list-attached-role-policies --role-name "$ROLE"

Every layer in §3–§5 has a 1:1 analog on the major providers. The provisioning shape carries; only the API endpoints + JSON dialects differ.

| Layer | AWS (current) | AliCloud (in progress) | GCP | Tencent Cloud |
|---|---|---|---|---|
| Privileged user | IAM user with IAMFullAccess | RAM user with AliyunRAMFullAccess | IAM service account with roles/iam.securityAdmin | CAM user with AdministratorAccess |
| Runtime user | IAM user + access key | RAM user + AK/SK | Service account + key file (or Workload Identity) | CAM user + SecretId/SecretKey |
| Data role | IAM role + assume policy | RAM role + assume policy | Service account + IAM bindings | CAM role + assume policy |
| Federation | IAM OIDC provider | RAM IDaaS / OIDC provider | Workload Identity Pool | CAM OIDC provider |
| Object store | S3 + bucket policy | OSS + bucket policy | Cloud Storage + IAM bindings | COS + bucket policy |
| Email backend | SES + S3 receipt rule | DirectMail / SimpleDM + OSS event notification | SendGrid / Mailgun (no GCP-native) | SimpleDM + COS |
| TLS termination | nginx + Let's Encrypt | nginx + Let's Encrypt | nginx + Let's Encrypt | nginx + Let's Encrypt |
| Compute (broker host) | EC2 + EIP | ECS + EIP | Compute Engine + external IP | CVM + EIP |
| DNS | Route 53 | AliCloud DNS | Cloud DNS | DNSPod / Cloud DNS |
| Secrets storage | Secrets Manager / SSM Parameter Store | KMS Secrets Manager | Secret Manager | KMS |
Migration playbook (cloud → cloud):
- Re-bind operator-workstation.env to the new provider's identifiers (account ID, region, role ARNs, bucket name).
- Re-run this doc top-to-bottom against the new provider.
- Re-run §9 (OIDC federation activation), substituting the provider's OIDC API.
- Re-run scripts/setup-broker-host.sh on the new host (the script doesn't care which cloud; it consumes already-provisioned identifiers).
- Re-run scripts/setup-heima.sh; the chain side is cloud-agnostic.
- Re-run the harness scripts to validate end-to-end.

The boundary is sharp: the broker process itself contains zero cloud-specific code — it talks STS-compatible OIDC + S3-compatible PutObject/GetObject + SMTP-compatible SendEmail. Every cloud above offers all three primitives. The provisioner-scripts/email-backends/ directory documents the email-backend trait; a new backend slots in as tencent-simpledm-cos (or similar) with the same upstream API as ses-s3.
The broker mints OIDC JWTs that AWS STS validates via the broker's public JWKS endpoint. Three one-shot steps per account, run AFTER setup-broker-host.sh finishes and the broker is reachable at https://${BROKER_HOST} over public TLS.
- https://${BROKER_HOST}/.well-known/openid-configuration returns 200 with the expected issuer + jwks_uri.
- https://${BROKER_HOST}/.well-known/jwks.json returns at least one ES256 key.
- curl -sf "https://${BROKER_HOST}/healthz" returns 200.

# DoH-resolved EIP (immune to local DNS interception; see §5b verify steps):
broker_ip=$(curl -sS "https://dns.google/resolve?name=${BROKER_HOST}&type=A" | jq -r '.Answer[0].data')
# -sha1 is REQUIRED. macOS LibreSSL 3.3 + OpenSSL 3.x default to SHA256
# (64 hex chars) but AWS IAM CreateOpenIDConnectProvider rejects anything
# that isn't exactly 40 hex chars (SHA1).
thumb=$(echo | openssl s_client -servername "$BROKER_HOST" \
-connect "${broker_ip}:443" 2>/dev/null \
| openssl x509 -fingerprint -sha1 -noout \
| awk -F'=' '{print $2}' | tr -d ':' | tr 'A-Z' 'a-z')
[ ${#thumb} -eq 40 ] || { echo "thumb length ${#thumb} != 40 — check -sha1 flag" >&2; return 1; }
aws iam create-open-id-connect-provider \
--url "https://${BROKER_HOST}" \
--client-id-list "sts.amazonaws.com" \
--thumbprint-list "$thumb"

AWS validates the issuer URL byte-for-byte against the JWT iss claim. Once registered, the URL is effectively immutable: switching means a new provider ARN + new trust policy + new federated grants.

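The thumbprint normalization in the pipeline above can be exercised offline. The sketch below is a hypothetical helper (`normalize_thumbprint`) run against a made-up fingerprint; it checks the full 40-hex-character shape rather than length alone.

```shell
# Normalize `openssl x509 -fingerprint -sha1 -noout` output into the form
# IAM expects: 40 lowercase hex characters, no colons.
normalize_thumbprint() {
  local raw=$1 thumb
  # Keep everything after the first '=', drop colons, lowercase the hex.
  thumb=$(printf '%s' "${raw#*=}" | tr -d ':' | tr 'A-F' 'a-f')
  if ! printf '%s' "$thumb" | grep -Eq '^[0-9a-f]{40}$'; then
    echo "not a SHA-1 fingerprint: $raw" >&2
    return 1
  fi
  printf '%s\n' "$thumb"
}

# Made-up fingerprint, for illustration only:
normalize_thumbprint 'SHA1 Fingerprint=AB:CD:EF:01:23:45:67:89:AB:CD:EF:01:23:45:67:89:AB:CD:EF:01'
# prints abcdef0123456789abcdef0123456789abcdef01
```

A SHA-256 fingerprint (64 hex characters) fails the pattern, which is the failure mode the -sha1 comment warns about.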
Apply to each of the three data roles. Use $ROLE ∈ {agentkeys-data-role, agentkeys-vault-role, agentkeys-memory-role} (or the -test variants when bootstrapping the CI test instance).
aws iam update-assume-role-policy --role-name "$ROLE" --policy-document "$(jq -n \
--arg acct "$ACCOUNT_ID" --arg host "$BROKER_HOST" '{
Version:"2012-10-17",
Statement:[{
Effect:"Allow",
Principal:{Federated:"arn:aws:iam::\($acct):oidc-provider/\($host)"},
Action:"sts:AssumeRoleWithWebIdentity",
Condition:{StringEquals:{"\($host):aud":"sts.amazonaws.com"}}
}]
}')"

Per AGENTS.md "Per-actor + per-data-class isolation invariants": every S3 read/write is scoped to bots/${aws:PrincipalTag/agentkeys_actor_omni}/{credentials,memory}/*. The split-statement v3 bucket policy is applied by scripts/apply-{vault,memory}-bucket-policy.sh; those scripts are the source of truth for the policy shape.

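The prefix rule can be reasoned about locally before running the harness. The sketch below only mirrors the string shape of the scope; IAM and the bucket policy do the real enforcement. The helper name, the actor IDs, and the object names are all made up.

```shell
# Local illustration of the per-actor, per-data-class prefix rule.
# usage: key_in_actor_scope <actor_omni> <credentials|memory> <object_key>
key_in_actor_scope() {
  local omni=$1 class=$2 key=$3
  case "$key" in
    # The quoted prefix is matched literally; ?* requires a non-empty object name.
    "bots/$omni/$class/"?*) return 0 ;;
    *) return 1 ;;
  esac
}

key_in_actor_scope 0xabc credentials bots/0xabc/credentials/github.json \
  && echo "own prefix: in scope"
key_in_actor_scope 0xabc credentials bots/0xdef/credentials/github.json \
  || echo "cross-actor prefix: out of scope"
```

The second call corresponds to the NEGATIVE case, where the real request is expected to end in AccessDenied.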
After §9.3 + §9.4, strip the broad-bucket inline grant from the role's policy (the bucket-side policy enforces; defense in depth means no app-side grant):
aws iam delete-role-policy --role-name "$ROLE" --policy-name "${ROLE}-inline"

Run harness/v2-stage3-demo.sh (or bash harness/run.sh --stage 3): it mints session JWT → OIDC JWT → STS creds, then proves both POSITIVE (own prefix) and NEGATIVE (cross-actor prefix → AccessDenied) writes for both data classes plus the cross-role isolation matrix. It walks the full §17.2 isolation table from AGENTS.md.

§§3–8 set up identifiers. This step stands up the actual processes — broker + mock-server + signer + 5 service workers (audit/email/cred/memory/config) — on the EC2 host (or any Linux box with public-internet egress + the broker's hostname).
- Fresh Linux host with sudo, systemd, public-internet egress, ports 80 + 443 open inbound (for certbot + nginx).
- DNS A records for ${BROKER_HOST} + signer.${ZONE} + audit.${ZONE} + email.${ZONE} + cred.${ZONE} + memory.${ZONE} + config.${ZONE} all pointing at the host's public IP (provisioned by setup-cloud.sh step 6: broker/signer/mcp inline, the 5 service workers via dns-upsert-workers.sh).
- AWS credentials in /etc/agentkeys/broker.env (the script writes the template; operator pastes the agentkeys-daemon access key from §4.1).

# Bootstrap a fresh host:
sudo bash scripts/setup-broker-host.sh \
--issuer-url "https://${BROKER_HOST}" \
--account-id "${ACCOUNT_ID}" \
--signer-host "signer.${ZONE}" \
--audit-host "audit.${ZONE}" \
--email-host "email.${ZONE}" \
--cred-host "cred.${ZONE}" \
--memory-host "memory.${ZONE}" \
--config-host "config.${ZONE}" \
--yes
# After a `git pull`, the same command re-deploys:
sudo bash scripts/setup-broker-host.sh --yes

The script:
- Builds
agentkeys-broker-server(+auth-email-linkfeature),agentkeys-mock-server, the 5 service workers (audit/email/cred/memory/config), and the signer. Compilations are cached with sccache (a content-addressed compiler cache, auto-installed best-effort) so re-deploys +--refbranch switches reuse the cache instead of recompiling — even whengit checkoutchurns mtimes ortarget/is cold. The build printssccache statsafterward (re-deploys should be mostly cache hits). Opt out withAGENTKEYS_NO_SCCACHE=1; pin a version withSCCACHE_VERSION=vX.Y.Z. - Creates the
agentkeyssystem user + state dir/var/lib/agentkeys/. - Writes the dev_key_service master secret (one-shot at first boot, never rotated — rotation invalidates every previously-derived wallet).
- Writes per-worker env files at
/etc/agentkeys/worker-{audit,email,creds,memory,config}.env. - Writes systemd units for broker + signer + each worker, enables + starts.
- Configures nginx vhosts for
${BROKER_HOST}+signer.${ZONE}+ 6 worker hosts (audit/email/cred/memory/config/classify) (skip via--without-nginx). Vhost is rendered in two phases: Phase A (HTTP-only on:80, with the ACME challenge path under/.well-known/acme-challenge/and a 503 placeholder on/) when no cert is on disk; Phase B (HTTPS on:443, broker proxy on/) when/etc/letsencrypt/live/<host>/fullchain.pemexists. - Installs certbot AND auto-issues certs (step 7b) for every co-located vhost whose DNS resolves to this host, flipping nginx Phase A → B in the same run; an unresolved host is skipped (fix DNS, re-run). Optional
--certbot-email <addr>(default: no email). Details + manual fallback: quick-start §5b. - Mints broker keypairs (oidc + session) under
/var/lib/agentkeys/keys/.
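The two-phase vhost rule in the list above comes down to one file-existence test per host. A small sketch of that decision follows. The name `vhost_phase` is hypothetical and does not come from setup-broker-host.sh. The optional second argument exists only so the check can be pointed at a scratch directory.

```shell
# Sketch of the Phase A / Phase B decision for one vhost.
# vhost_phase is an illustrative name, not a function in the real script.

# Print "B" when a certificate for the host is on disk, otherwise "A".
vhost_phase() {
  local host="$1" le_root="${2:-/etc/letsencrypt}"
  if [ -e "${le_root}/live/${host}/fullchain.pem" ]; then
    echo "B"   # HTTPS on :443, broker proxy on /
  else
    echo "A"   # HTTP-only on :80, ACME challenge path + 503 placeholder on /
  fi
}
```

Running `vhost_phase "$BROKER_HOST"` under sudo (the letsencrypt live directory is normally root-only) shows which phase the next run would render for that host.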
Auto-detects bootstrap vs upgrade by reading the existing systemd unit's Environment= lines. Pass --ref <branch> to opt into an in-script git fetch + pull.
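To illustrate the idea of detecting bootstrap vs upgrade from a unit file, here is a rough sketch. It is an assumption about how such detection could work, not a copy of the script's logic. The function names `unit_env` and `deploy_mode` are hypothetical, as is the rule that any Environment= line means "upgrade".

```shell
# Sketch: classify a host as bootstrap or upgrade from a systemd unit file.
# Function names and the decision rule are illustrative assumptions.

# Print the value part of every Environment= line, with quotes stripped.
unit_env() {
  local unit_file="$1"
  [ -r "$unit_file" ] || return 1
  sed -n 's/^Environment=//p' "$unit_file" | tr -d '"'
}

# "upgrade" if the unit exists and carries at least one Environment= line,
# otherwise "bootstrap".
deploy_mode() {
  if [ -n "$(unit_env "$1" 2>/dev/null)" ]; then
    echo "upgrade"
  else
    echo "bootstrap"
  fi
}
```

On a live host, `systemctl show -p Environment <unit>` reports the same settings as systemd has parsed them, which can be a useful cross-check when a re-deploy picks up unexpected values.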
curl -sf "https://${BROKER_HOST}/healthz" # → 200
curl -sf "https://${BROKER_HOST}/.well-known/openid-configuration" | jq .
curl -sf "https://${BROKER_HOST}/.well-known/jwks.json" | jq '.keys | length'
curl -sf "https://audit.${ZONE}/healthz" # → 200 (and friends)

For full E2E (broker + workers + chain + AWS), run bash harness/run.sh; see docs/chain-setup.md for the chain side and docs/ci-setup.md for the automated path.
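The per-host health checks shown earlier can be looped over the broker and every worker. The sketch below assumes that the other four workers expose /healthz the same way audit does (the source only shows broker and audit explicitly). The function names are hypothetical.

```shell
# Sketch: probe /healthz on the broker and the five service workers.
# Assumes email/cred/memory/config expose /healthz like audit does.

# Print one /healthz URL per host.
healthz_urls() {
  local broker_host="$1" zone="$2" svc
  printf 'https://%s/healthz\n' "$broker_host"
  for svc in audit email cred memory config; do
    printf 'https://%s.%s/healthz\n' "$svc" "$zone"
  done
}

# Probe every URL; curl -f exits non-zero on HTTP status >= 400.
# Returns non-zero if any probe fails.
check_health() {
  local url rc=0
  for url in $(healthz_urls "$1" "$2"); do
    if curl -sf -o /dev/null --max-time 10 "$url"; then
      echo "ok   $url"
    else
      echo "FAIL $url"
      rc=1
    fi
  done
  return "$rc"
}
```

Calling `check_health "$BROKER_HOST" "$ZONE"` after a deploy gives a one-line verdict per host and an exit code suitable for scripting.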
Tear down the whole AgentKeys footprint in one account. Use only when retiring the deployment.
# Drain the buckets
for b in "$BUCKET" "agentkeys-vault-${ACCOUNT_ID}" "agentkeys-memory-${ACCOUNT_ID}"; do
aws s3 rm "s3://$b" --recursive 2>/dev/null || true
aws s3api delete-bucket --bucket "$b" --region "$REGION" 2>/dev/null || true
done
# Roles
for r in agentkeys-data-role agentkeys-vault-role agentkeys-memory-role agentkeys-broker-host; do
for p in $(aws iam list-role-policies --role-name "$r" --query 'PolicyNames[]' --output text 2>/dev/null); do
aws iam delete-role-policy --role-name "$r" --policy-name "$p"
done
aws iam delete-role --role-name "$r" 2>/dev/null || true
done
# OIDC provider
aws iam delete-open-id-connect-provider \
--open-id-connect-provider-arn "arn:aws:iam::${ACCOUNT_ID}:oidc-provider/${BROKER_HOST}"
# Daemon user
for k in $(aws iam list-access-keys --user-name agentkeys-daemon --query 'AccessKeyMetadata[].AccessKeyId' --output text); do
aws iam delete-access-key --user-name agentkeys-daemon --access-key-id "$k"
done
aws iam delete-user-policy --user-name agentkeys-daemon --policy-name agentkeys-daemon-assume-role 2>/dev/null || true
aws iam delete-user --user-name agentkeys-daemon
# SES + DNS
# Omitting --rule-set-name deactivates the currently active receipt rule set
aws ses set-active-receipt-rule-set --region "$REGION" 2>/dev/null || true
aws sesv2 delete-email-identity --email-identity "$MAIL_DOMAIN" --region "$REGION" 2>/dev/null || true
# DNS records: operator-managed (Route 53 / your DNS provider) — delete by hand.
# EC2 + EIP: manual via console or aws ec2 CLI

For the test instance, substitute -test on every identifier above; for a test-fleet slot N ≥ 2 (§0.3), substitute -test-N (and delete that slot's OIDC provider …oidc-provider/broker-test-N.${ZONE}, its agentkeys-test-broker-ssm-N instance profile/role, and its agentkeys-broker-test-N SSH user); for the Base prod stack (§0.4), substitute -base (its OIDC provider is …oidc-provider/broker-base.${ZONE}, its instance profile agentkeys-broker-host-base). Tearing down one stack must not touch any other stack's resources: every identifier is stack-suffixed precisely so the blast radius stays per-stack.
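IAM refuses to delete a role while it still has inline policies, attached managed policies, or instance-profile membership, and the `|| true` in the role loop above would hide that failure. The sketch below is one way to clear all three before deleting. The helper name `delete_role_fully` is hypothetical. Whether these roles actually carry managed policies or instance profiles in a given account is an assumption to verify.

```shell
# Sketch: remove everything that blocks `aws iam delete-role`, then delete.
# delete_role_fully is an illustrative helper, not part of the repo.
delete_role_fully() {
  local role="$1" p arn ip
  # Inline policies (same as the teardown loop above).
  for p in $(aws iam list-role-policies --role-name "$role" \
               --query 'PolicyNames[]' --output text 2>/dev/null); do
    aws iam delete-role-policy --role-name "$role" --policy-name "$p"
  done
  # Attached managed policies.
  for arn in $(aws iam list-attached-role-policies --role-name "$role" \
                 --query 'AttachedPolicies[].PolicyArn' --output text 2>/dev/null); do
    aws iam detach-role-policy --role-name "$role" --policy-arn "$arn"
  done
  # Instance profiles that still reference the role.
  for ip in $(aws iam list-instance-profiles-for-role --role-name "$role" \
                --query 'InstanceProfiles[].InstanceProfileName' --output text 2>/dev/null); do
    aws iam remove-role-from-instance-profile \
      --instance-profile-name "$ip" --role-name "$role"
  done
  aws iam delete-role --role-name "$role"
}
```

For example, `delete_role_fully agentkeys-broker-host` would replace the inner loop plus `delete-role` call for that role. The instance profile itself is left in place and can be removed separately with `aws iam delete-instance-profile`.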
- Operator workstation setup: docs/dev-setup.md
- Chain bring-up: docs/chain-setup.md
- CI activation: docs/ci-setup.md
- Broker host script (single entry point): scripts/setup-broker-host.sh
- Cloud bootstrap script (single entry point): scripts/setup-cloud.sh
- Architecture (per-data-class buckets + isolation invariants): docs/arch.md §17, §17.2
- Future Tencent / TEE DKIM: docs/spec/heima-gaps-vs-desired-architecture.md §4
- FAQ + troubleshooting: wiki/cloud-setup-faq.md