Terraform module that deploys the Ghost Agent Platform to a single EC2 VM in an AWS account. Provisions the VM, network, IAM, and AWS Secrets Manager entries, then bootstraps the stack via cloud-init - including signature verification of every published image before any container starts.
- 1 EC2 instance (`t3.large` by default, AL2023 AMI) running the full Ghost Agent Platform stack as docker-compose. Five service containers: the gateway, credential proxy, worker, UI, and an in-stack updater that handles UI-driven image bumps (see Updating the deployment).
- 1 EBS data volume (100 GB gp3 by default) mounted at `/var/lib/exo` - holds the database, TLS material, run artifacts, and signing-cert state. Configured with `prevent_destroy = true` to guard against accidental teardown.
- 1 Elastic IP providing a stable public address.
- 1 Security group - SSH from `var.admin_cidr` only, HTTP/HTTPS from `var.public_ingress_cidrs` (default: world).
- 1 IAM role + instance profile with scoped permissions: ECR pull on Ghost's image repos, Secrets Manager read on this module's own secrets, SSM core (for Session Manager).
- 4 Secrets Manager entries: JWT secret, encryption key, seed admin password, optional Slack config. JWT and encryption key are auto-generated by Terraform.
- (Optional) Route 53 A record when both `var.domain_name` and `var.route53_zone_id` are set.
- AWS account with admin or equivalent permissions to create EC2 / EBS / IAM / Secrets Manager resources.
- A public subnet in a VPC. The subnet must allow inbound HTTP/HTTPS for Let's Encrypt cert issuance and public app access.
- Cross-account ECR pull access from Ghost Security. Ghost publishes images to a private ECR registry; the AWS account running this module needs to be added to Ghost's repository policy before the EC2 can pull. See Onboarding below.
- Terraform >= 1.5 with the AWS provider 6.x.
- Share the target AWS account ID with Ghost Security. The account running this module needs to appear on Ghost's `pull_allowed_accounts` list - without it, `terraform apply` will succeed but cloud-init will fail to pull images.
- Ghost provides back the `image_registry` URL (ECR registry hostname) and the released `image_tag` to deploy.
- Apply the module with those values plus a subnet, admin CIDR, and seed admin email.
module "ghost_agent" {
# Pin to a released tag for reproducibility (omit ?ref=… to track main).
source = "github.com/ghostsecurity/terraform-ghost-agent-platform?ref=<latest release tag, e.g. v0.1.5>"
# Provided by Ghost during onboarding
image_registry = "012345678901.dkr.ecr.<region>.amazonaws.com"
image_tag = "v1.0.0"
# Network placement
subnet_id = "subnet-0abc123def456789"
admin_cidr = "203.0.113.42/32"
# Initial admin user - password is auto-generated
seed_admin_email = "ops@example.com"
}
output "url" {
value = module.ghost_agent.bringup_url
}
# Access the VM. Uses AWS Systems Manager Session Manager - no SSH key
# pair needed (the module attaches the SSM role). For SSH instead, set
# `ssh_key_name` on the module and use `module.ghost_agent.ssh_command`.
output "ssm" {
value = module.ghost_agent.ssm_session_command
}

This bring-up uses the default nip.io fallback for the public hostname (no DNS setup required) and pulls a real Let's Encrypt cert automatically. To use a custom FQDN instead, set `domain_name` and optionally `route53_zone_id`.
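To make the fallback hostname concrete, here is a minimal sketch of how a nip.io name can be derived from the Elastic IP. It assumes the module uses nip.io's dash notation, as the example output `https://3-14-15-92.nip.io` suggests. The function name `nip_io_hostname` is hypothetical and not part of the module.

```shell
#!/usr/bin/env bash
# Sketch only: derive a nip.io hostname from a public IPv4 address.
# nip_io_hostname is a hypothetical helper, not something the module ships.
nip_io_hostname() {
  local ip="$1"
  # Accept only four dot-separated groups of 1-3 digits.
  if ! printf '%s\n' "$ip" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'; then
    echo "not an IPv4 address: $ip" >&2
    return 1
  fi
  # Swap dots for dashes and append the nip.io suffix.
  printf '%s.nip.io\n' "$(printf '%s' "$ip" | tr '.' '-')"
}
```

For example, `nip_io_hostname 3.14.15.92` prints `3-14-15-92.nip.io`, which resolves back to the same address without any DNS setup.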
terraform apply returns as soon as the EC2 instance is created, but the on-VM bootstrap (mount EBS, install Docker + cosign, verify image signatures, pull images, start the stack) takes another ~3-5 minutes. Wait for cloud-init to finish before trying to log in - watch the bootstrap log and look for the final ===> Bootstrap complete at <timestamp> line.
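If you would rather script the wait than watch the log by hand, a polling loop along these lines may help. It is a sketch under assumptions: the helper names (`has_bootstrap_marker`, `wait_for_bootstrap`, `fetch_console_output`) and the `AWS_REGION` variable are hypothetical, and the attempt count and delay are arbitrary defaults. Only the marker string and the `aws ec2 get-console-output` invocation come from this README.

```shell
#!/usr/bin/env bash
# Sketch only: poll a log source until the bootstrap completion marker shows up.

# Succeeds if the text on stdin contains the completion marker.
has_bootstrap_marker() {
  grep -Fq -- '===> Bootstrap complete'
}

# Fetches the buffered EC2 console output (assumes AWS_REGION is set and
# that this runs in the Terraform working directory).
fetch_console_output() {
  aws ec2 get-console-output --region "${AWS_REGION:?set AWS_REGION}" \
    --instance-id "$(terraform output -raw instance_id)" \
    --latest --output text
}

# wait_for_bootstrap FETCH_CMD [ATTEMPTS] [DELAY_SECONDS]
# Runs FETCH_CMD repeatedly; returns 0 once the marker appears, 1 on timeout.
wait_for_bootstrap() {
  local fetch_cmd="$1" attempts="${2:-40}" delay="${3:-15}" i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$fetch_cmd" 2>/dev/null | has_bootstrap_marker; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

A typical call would be `wait_for_bootstrap fetch_console_output && echo "stack is up"`. Because console output lags, the loop can report completion a few minutes after the stack is actually ready.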
Two ways to watch:
# From your laptop - no SSM/SSH needed. Console output is buffered
# by EC2 and typically lags 1-3 minutes, but works even before the
# SSM agent comes up:
aws ec2 get-console-output --region <region, e.g. us-east-1> \
--instance-id "$(terraform output -raw instance_id)" \
--latest --output text | tail -100

# Real-time, via SSM Session Manager (requires the SSM agent to be
# running on the VM - happens a minute or two into bootstrap):
$(terraform output -raw ssm_session_command)
sudo tail -f /var/log/ghost-agent-bootstrap.log

Once the `===> Bootstrap complete` line appears, the stack is up. Useful terraform outputs:
terraform output bringup_url # e.g. https://3-14-15-92.nip.io
terraform output -raw ssm_session_command # aws ssm start-session --region <region> --target i-01234...
terraform output secret_arns # ARNs for the auto-generated secrets

Retrieve the initial admin password:
aws secretsmanager get-secret-value \
--region <region> \
--secret-id "$(terraform output -json secret_arns | jq -r .seed_admin_password)" \
--query SecretString --output text

Open the `bringup_url` in a browser, sign in with `seed_admin_email` + the password above, then rotate the password in-app.
By default this module publishes an IAM role, <name_prefix>-ssm-support (e.g. ghost-agent-ssm-support), that Ghost Security assumes to open an AWS Systems Manager Session Manager shell on the VM for support. Its trust is scoped to a single Ghost support role, so only that role can assume it. The role grants only ssm:StartSession on this one instance - an interactive shell, or a port-forward to reach the UI without opening any inbound ports - plus management of the operator's own sessions: no SSH key, no inbound ports, and no access to any other resource. The instance still dials out to the SSM service exactly as it does for your own ssm_session_command; nothing new is exposed to the internet.
terraform output ssm_support_role_arn # the role Ghost assumes (empty if disabled)

To turn it off, set `ghost_support_access_enabled = false`. The role is removed and Ghost has no Session Manager path to the VM.
Three paths, in order of operator overhead.
A workspace admin opens the UI's workspace dropdown → System → Version. The page shows the running tag and (after a check-against-ECR) the latest available release. Clicking Upgrade to vX.Y.Z dispatches the bump to the in-stack updater container, which:
- Rewrites `TAG=` in `/opt/exo/.env`
- Runs `docker compose pull` for the managed services (gateway, credential-proxy, worker, ui)
- Runs `docker compose up -d` for the same set
The updater excludes itself from the dispatched `up -d` so it doesn't kill the container running the upgrade. Expect a brief window of roughly 10-30s during the gateway swap in which the API may return 502; this is acceptable for a single-host deployment.
Rollback works the same way: the page also offers a one-click rollback to the prior tag, derived from the audit log of past upgrades.
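The updater's three steps can be sketched in shell as follows. This is an illustration, not the updater's actual code: `set_env_tag`, `apply_tag`, and `MANAGED_SERVICES` are hypothetical names, the tag pattern is an assumed sanity check, and the sketch assumes the compose service names match the list given above.

```shell
#!/usr/bin/env bash
# Sketch only: the rewrite / pull / up sequence described above.
# Services the upgrade touches; the updater itself is deliberately left out.
MANAGED_SERVICES="gateway credential-proxy worker ui"

# set_env_tag FILE TAG: rewrite the TAG= line of an env file in place.
set_env_tag() {
  local env_file="$1" new_tag="$2" tmp rc
  # Assumed sanity check: vX.Y.Z with an optional pre-release suffix.
  if ! printf '%s\n' "$new_tag" | grep -Eq '^v[0-9]+\.[0-9]+\.[0-9]+(-[0-9A-Za-z.]+)?$'; then
    echo "refusing unexpected tag: $new_tag" >&2
    return 1
  fi
  tmp="$(mktemp)" || return 1
  # ^TAG= is anchored, so UPDATER_TAG= is not affected.
  sed "s/^TAG=.*/TAG=${new_tag}/" "$env_file" > "$tmp" && cat "$tmp" > "$env_file"
  rc=$?
  rm -f "$tmp"
  return "$rc"
}

# apply_tag TAG [DIR]: bump the tag, then pull and restart the managed services.
apply_tag() {
  local new_tag="$1" dir="${2:-/opt/exo}"
  set_env_tag "$dir/.env" "$new_tag" || return 1
  # MANAGED_SERVICES is intentionally unquoted so it splits into service names.
  docker compose -f "$dir/docker-compose.prod.yml" pull $MANAGED_SERVICES &&
    docker compose -f "$dir/docker-compose.prod.yml" up -d $MANAGED_SERVICES
}
```

Naming the services explicitly on `up -d` is what keeps the updater container out of the restart set in this sketch. A rollback would be the same call with the prior tag.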
When the updater itself needs a version bump (it doesn't auto-update), or as break-glass if the in-app path is broken:
$(terraform output -raw ssm_session_command) # or ssh_command
cd /opt/exo
sudo sed -i 's/^TAG=.*/TAG=vNEW.TAG.HERE/' .env
# To also bump the updater image (typically same tag, but pinned separately):
sudo sed -i 's/^UPDATER_TAG=.*/UPDATER_TAG=vNEW.TAG.HERE/' .env
sudo docker compose -f docker-compose.prod.yml pull
sudo docker compose -f docker-compose.prod.yml up -d

ECR repositories are configured with immutable tags, so a given tag always resolves to the same image.
Force a clean instance recycle by replacing the EC2 resource:
terraform apply -replace=module.ghost_agent.aws_instance.vm

The instance is replaced and cloud-init re-runs end-to-end (including cosign verification), while the data EBS volume and secrets persist. Useful when AMI / cloud-init changes need to land alongside an image bump. Note: the `var.image_tag` value written to `.env` by cloud-init overrides whatever the running stack was on, so make sure `image_tag` in tfvars matches the version you want before triggering a replace.
Every image is verified with `cosign verify` against Ghost's published-workflow identity before any container starts. The verify policy is encoded in `var.image_signing_identity_regex` + `var.image_signing_oidc_issuer`. Defaults match the Ghost Security publish workflow on a `v*` tag:
identity-regexp: ^https://github\.com/ghostsecurity/exo/\.github/workflows/publish-to-ecr\.yml@refs/tags/v[0-9]+\.[0-9]+\.[0-9]+(-.*)?$
oidc-issuer: https://token.actions.githubusercontent.com
If a tag's signature fails verification, cloud-init exits before docker compose up. Cosign and the workflow signer are pinned in lockstep on both sides - Ghost's publish workflow signs with cosign-installer@v4.1.2 (cosign v3.0.6), and the cloud-init verify uses the same version.
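To see which signer identities the default policy accepts, the regex can be exercised locally. The sketch below is an assumption-laden illustration: `identity_matches` and `verify_image` are hypothetical helpers, `grep -E` stands in for cosign's own regex engine (the two agree for this pattern), and the `cosign verify` flags shown are the keyless-verification options as commonly documented, so check them against your installed version.

```shell
#!/usr/bin/env bash
# Sketch only: exercise the default signing policy from this README.
IDENTITY_REGEX='^https://github\.com/ghostsecurity/exo/\.github/workflows/publish-to-ecr\.yml@refs/tags/v[0-9]+\.[0-9]+\.[0-9]+(-.*)?$'
OIDC_ISSUER='https://token.actions.githubusercontent.com'

# identity_matches IDENTITY: succeeds if the certificate identity string
# would satisfy the default identity regex.
identity_matches() {
  printf '%s\n' "$1" | grep -Eq -- "$IDENTITY_REGEX"
}

# verify_image IMAGE_REF: run cosign against an image reference
# (for example <image>@<digest>) with the same policy.
verify_image() {
  cosign verify \
    --certificate-identity-regexp "$IDENTITY_REGEX" \
    --certificate-oidc-issuer "$OIDC_ISSUER" \
    "$1"
}
```

Under this policy a signature produced from a `v1.0.0` or `v1.0.0-rc.1` tag matches, while one produced from a branch ref or from a different repository does not.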
| Symptom | Where to look |
|---|---|
| `terraform apply` succeeds but the bringup URL doesn't respond | Cloud-init log: `sudo tail -200 /var/log/ghost-agent-bootstrap.log` |
| `cosign verify` failure in the bootstrap log | Image-tag mismatch, or Ghost's signing identity regex doesn't match. Verify with `cosign verify <image>@<digest>` locally to confirm. |
| `RepositoryNotFoundException` during image pull | Cross-account ECR access not yet granted - contact Ghost. |
| 502 from Caddy for `/api` calls | Gateway container down. `docker compose -f /opt/exo/docker-compose.prod.yml ps` and check gateway logs. |
| Browser cert error / "not secure" warning | LE issuance failed (port 80 unreachable, DNS not resolving, rate limited). Check `docker logs <caddy-container>`. |
| Database not initializing | First-boot replica-set bootstrap timing. Verify `/var/lib/exo` is actually the EBS mount (`mountpoint /var/lib/exo`) - if not, the bootstrap script aborts before docker starts. |
| `terraform apply` fails with `secret with this name is already scheduled for deletion` | A previous `terraform destroy` puts secrets into a 7-day soft-delete (`recovery_window_in_days = 7` in `secrets.tf`). To re-deploy into the same account before the window expires, force-delete the leftovers: `for s in jwt-secret encryption-key seed-admin-password slack; do aws secretsmanager delete-secret --secret-id "ghost-agent/$s" --force-delete-without-recovery --region <region>; done` |
Persistent log locations on the VM:
- `/var/log/ghost-agent-bootstrap.log` - cloud-init bootstrap (single run, captured during first boot)
- `/var/log/cloud-init-output.log` - cloud-init's own log
- `docker logs <container>` - runtime container logs (gateway, credential-proxy, worker, caddy, database)
The EBS data volume at /var/lib/exo holds all survival-critical state:
- `tls/` - MITM CA private key + service CA private key. Loss of these invalidates every encrypted credential and every enrolled runner identity.
- `mongo-data/` - application database.
- `caddy-data/` - Let's Encrypt account and issued certificates.
- `runner-identity/` - per-runner client certs and slot locks.
- `artifacts/` - run output (recoverable, but loss is a regression).
The module sets prevent_destroy = true on this volume to guard against terraform destroy. Recommended additional posture for production:
- Enable an EBS snapshot lifecycle policy for daily snapshots with 30-day retention.
- Treat the volume ID as sensitive - anyone with `ec2:DescribeVolumes` can locate it.
- For dev/test instances: temporarily set `prevent_destroy = false` (one-line module edit) before `terraform destroy`.
See variables.tf for the full input list. Most useful overrides:
| Variable | When to override |
|---|---|
| `domain_name` + `route53_zone_id` | Real public domain instead of the nip.io fallback |
| `instance_type` | Higher throughput; in-place change supported (~1-2 min downtime, data persists) |
| `data_volume_size_gb` | Anticipate high workflow rate / artifact retention |
| `public_ingress_cidrs` | Lock down 80/443 to a CDN/WAF origin (note: scoping 80 too tightly will break Let's Encrypt cert renewal) |
| `ssh_key_name` | Use SSH instead of SSM Session Manager |
| `ghost_support_access_enabled` | Set `false` to remove the Ghost support role and disable cross-account SSM access |
| `image_signing_identity_regex` | Ghost rotates the publish workflow path or pre-release tag pattern |
A few things this module does not enforce, but which any production deploy should have in place:
- Tighten `admin_cidr` to an office IP or VPN egress range. The variable validates that the value is a CIDR, not that it's narrow.
- Use a remote Terraform backend (S3 with KMS encryption). The module generates secrets via `random_*` resources, so their values live in state - local state on a laptop is not appropriate.
- Enable an EBS snapshot lifecycle policy for the data volume. The volume holds all survival-critical state (see Data persistence and recovery); `prevent_destroy` guards against accidental teardown but not against corruption or accidental file deletion inside the VM.
- Add CloudWatch alarms on EC2 status checks and configure instance auto-recovery. A single-VM deployment has no built-in failover.
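Since the module does not check that `admin_cidr` is narrow, a pre-apply guard in a wrapper script or CI job is one way to catch an overly broad value. The sketch below is an assumption, not part of the module: `cidr_is_narrow` is a hypothetical helper, it handles IPv4 prefixes only, and the default threshold of /24 is an arbitrary choice to adjust to your own policy.

```shell
#!/usr/bin/env bash
# Sketch only: reject admin CIDRs broader than a chosen prefix length.
# cidr_is_narrow CIDR [MIN_PREFIX]
#   returns 0 if the prefix length is between MIN_PREFIX and 32,
#   1 if it is broader than MIN_PREFIX (or above 32),
#   2 if the value has no numeric /prefix part.
cidr_is_narrow() {
  local cidr="$1" min_prefix="${2:-24}" prefix
  case "$cidr" in
    */*) prefix="${cidr##*/}" ;;
    *) return 2 ;;
  esac
  case "$prefix" in
    '' | *[!0-9]*) return 2 ;;
  esac
  [ "$prefix" -le 32 ] && [ "$prefix" -ge "$min_prefix" ]
}
```

For instance, `cidr_is_narrow 203.0.113.42/32` succeeds, while `cidr_is_narrow 0.0.0.0/0` fails and could be used to abort before `terraform apply` runs.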
Apache License 2.0 - see LICENSE.
Copyright 2026 Ghost Security