Context
The test-broker fleet is growing (#265): prod + N CI test slots, each a full broker stack (broker, signer, 6 workers, bundler, nginx) on its own EC2 + EIP, plus the local daemon and the chain layer. Today an operator answers "which server is running, and which CI run is on which slot?" by hand: aws ec2 describe-addresses by tag, curl …/healthz per host, ssh-broker.sh test-N 'systemctl status …', GitHub Actions tab for run→slot. There's no single view.
Ask
A read-only fleet status dashboard — start narrow (which broker/slot is up, healthy, and what CI run currently holds it), designed to grow into a general DevOps dashboard later. Do not implement yet — this issue is to scope + design.
v1 scope (status board)
Per environment (prod, test slot 1..N):
Machine: EC2 instance id + state, EIP (by tag agentkeys-broker-eip[-test[-N]]), instance type, uptime.
/healthz (green/red/degraded), TLS cert expiry per host, nginx :443 up.
[…] heima-test-slot-N once #265 ("Parallel CI envs: multi-broker architecture, one EC2 per env (max N)") phase 4 lands; heima-test-deployer-nonce today), queue depth behind it.
[…] check-wallet-balances.sh), master account registration state.
BROKER_OIDC_ISSUER ≠ the slot it's tagged as, stale binary vs origin/main sha.
Data sources (all already exist, read-only)
ec2 describe-addresses / describe-instances (by env-aware tag), Route 53 record sets, ACM/letsencrypt cert via TLS probe.
/healthz endpoints + scripts/wait-stack-healthy.sh logic; ssh-broker.sh test-N for systemd/journald.
cast balance + SidecarRegistry.operatorMasterWallet (omni); scripts/check-wallet-balances.sh.
SoT for the fleet inventory: scripts/broker.test*.env + the slot env files (instance ids, EIPs, hostnames).
Future (the "DevOps dashboard" expansion: out of v1 scope, listed so v1 doesn't paint into a corner)
Deploy history + rollback, per-worker log tail, cost/idle-stop controls (#265 phase 6), alerting on health flaps, prod broker too (not just test fleet), the cloud (AWS/IAM/DNS) and chain-contract layers.
Design questions to settle first
Surface: static page regenerated by a scheduled job (cheap, no new always-on service) vs a small live service. Lean static-first to avoid adding an always-on component beyond the broker.
Auth: it reads AWS + GH + chain, so where does it run and whose creds (operator laptop one-shot? a locked-down read-only IAM principal?).
Where it lives: a new viz/ page (there's already a viz/ dashboard pattern in-repo) vs a separate tool.
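To make the static-first option among the design questions more concrete, here is a minimal sketch of what a scheduled job might emit. It is a discussion aid, not a proposed implementation. The SlotStatus record, its field names, and render_status_page are all hypothetical, and the sample values are invented. How each field would be collected is deliberately left out, since that depends on the auth question.

```python
from dataclasses import dataclass
from html import escape
from typing import List, Optional


@dataclass
class SlotStatus:
    """One dashboard row. Field names are hypothetical, not from the repo."""
    env: str                # e.g. "prod" or "test-1"
    instance_state: str     # EC2 state such as "running" or "stopped"
    health: str             # "green", "red", or "degraded"
    ci_run: Optional[str]   # CI run holding the slot, or None if the slot is free


def render_status_page(slots: List[SlotStatus], generated_at: str) -> str:
    """Return a complete static HTML page for one snapshot of the fleet."""
    rows = "\n".join(
        "<tr><td>{env}</td><td>{state}</td>"
        '<td class="{health}">{health}</td><td>{run}</td></tr>'.format(
            env=escape(s.env),
            state=escape(s.instance_state),
            health=escape(s.health),
            run=escape(s.ci_run or "free"),
        )
        for s in slots
    )
    return (
        "<!doctype html>\n<html><body>\n"
        f"<p>Generated at {escape(generated_at)}</p>\n"
        "<table>\n"
        "<tr><th>Env</th><th>Instance</th><th>Health</th><th>CI run</th></tr>\n"
        f"{rows}\n"
        "</table>\n</body></html>\n"
    )


if __name__ == "__main__":
    # Invented sample data, for illustration only.
    snapshot = [
        SlotStatus("prod", "running", "green", None),
        SlotStatus("test-1", "running", "degraded", "run-42"),
    ]
    print(render_status_page(snapshot, "2024-01-01T00:00:00Z"))
```

Because the output is a plain string, the scheduled job only has to overwrite one file. Nothing stays running between refreshes, which is the property the static-first lean is after.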
References
docs/spec/ci-parallel-test-fleet.md
docs/cloud-bootstrap.md §0.3 (live fleet inventory + EIP-by-tag rule), scripts/wait-stack-healthy.sh, scripts/check-wallet-balances.sh
Scope/design only: do not implement until the design questions above are answered.
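As a reading aid for the EIP tag pattern agentkeys-broker-eip[-test[-N]] named in the v1 scope, the bracket notation can be written as a regular expression. This is a sketch of a literal reading of that pattern only. The authoritative EIP-by-tag rule is the one in docs/cloud-bootstrap.md §0.3, and parse_eip_tag and EipTag are hypothetical names. It makes no claim about which environment a bare -test tag (no -N suffix) refers to.

```python
import re
from typing import NamedTuple, Optional

# Literal reading of agentkeys-broker-eip[-test[-N]]: "-test" is optional,
# and "-N" (a decimal slot number) is optional within it.
_EIP_TAG = re.compile(r"agentkeys-broker-eip(?P<test>-test(?:-(?P<slot>\d+))?)?")


class EipTag(NamedTuple):
    is_test: bool         # True when the tag carries the -test suffix
    slot: Optional[int]   # slot number N, or None when there is no -N suffix


def parse_eip_tag(tag: str) -> Optional[EipTag]:
    """Parse a tag value, returning None if it does not fit the pattern."""
    m = _EIP_TAG.fullmatch(tag)
    if m is None:
        return None
    slot = m.group("slot")
    return EipTag(
        is_test=m.group("test") is not None,
        slot=int(slot) if slot is not None else None,
    )
```

Under this reading, agentkeys-broker-eip parses as a non-test tag, agentkeys-broker-eip-test-3 as test slot 3, and anything else (for example agentkeys-broker-eip-3) is rejected, which would let a dashboard flag mis-tagged addresses rather than silently skip them.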