terraform-ghost-agent-platform

Terraform module that deploys the Ghost Agent Platform to a single EC2 VM in an AWS account. Provisions the VM, its network resources (security group, Elastic IP), IAM, and AWS Secrets Manager entries, then bootstraps the stack via cloud-init, including signature verification of every published image before any container starts.

What gets deployed

  • 1 EC2 instance (t3.large by default, AL2023 AMI) running the full Ghost Agent Platform stack with docker compose. There are five service containers (the gateway, credential proxy, worker, UI, and an in-stack updater that handles UI-driven image bumps; see Updating the deployment), plus the Caddy reverse proxy and the database.
  • 1 EBS data volume (100 GB gp3 by default) mounted at /var/lib/exo - holds the database, TLS material, run artifacts, and signing-cert state. Configured with prevent_destroy = true to guard against accidental teardown.
  • 1 Elastic IP providing a stable public address.
  • 1 Security group - SSH from var.admin_cidr only, HTTP/HTTPS from var.public_ingress_cidrs (default: world).
  • 1 IAM role + instance profile with scoped permissions: ECR pull on Ghost's image repos, Secrets Manager read on this module's own secrets, SSM core (for Session Manager).
  • 4 Secrets Manager entries: JWT secret, encryption key, seed admin password, and optional Slack config. The JWT secret, encryption key, and seed admin password are auto-generated by Terraform.
  • (Optional) Route 53 A record when both var.domain_name and var.route53_zone_id are set.

Prerequisites

  1. AWS account with admin or equivalent permissions to create EC2 / EBS / IAM / Secrets Manager resources.
  2. A public subnet (one with a route to an internet gateway) in a VPC. Its network ACLs must allow inbound HTTP/HTTPS, both for public app access and for Let's Encrypt certificate issuance, which needs port 80 reachable.
  3. Cross-account ECR pull access from Ghost Security. Ghost publishes images to a private ECR registry; the AWS account running this module needs to be added to Ghost's repository policy before the EC2 can pull. See Onboarding below.
  4. Terraform >= 1.5 with the AWS provider 6.x.

Onboarding

  1. Share the target AWS account ID with Ghost Security. The account running this module needs to appear on Ghost's pull_allowed_accounts list - without it, terraform apply will succeed but cloud-init will fail to pull images.
  2. Ghost sends back the image_registry URL (the ECR registry hostname) and the released image_tag to deploy.
  3. Apply the module with those values plus a subnet, admin CIDR, and seed admin email.
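
Step 1 needs your account ID, and after step 2 it can be worth sanity-checking the registry hostname before applying. A minimal sketch, assuming the standard ECR hostname form `<account>.dkr.ecr.<region>.amazonaws.com` shown in the Quick start placeholder; `parse_ecr_registry` is a hypothetical helper, not part of the module.

```shell
#!/bin/sh
# Step 1: print the account ID your current AWS credentials belong to
# (requires the AWS CLI; shown for reference, not run here):
#   aws sts get-caller-identity --query Account --output text

# Hypothetical helper: split an ECR registry hostname into "<account> <region>".
# Returns non-zero if the value does not look like a standard (commercial
# partition, *.amazonaws.com) ECR registry hostname.
parse_ecr_registry() {
  printf '%s\n' "$1" |
    grep -Eq '^[0-9]{12}\.dkr\.ecr\.[a-z0-9-]+\.amazonaws\.com$' || return 1
  account=${1%%.*}                              # text before the first dot
  region=$(printf '%s\n' "$1" | cut -d. -f4)    # 4th dot-separated field
  printf '%s %s\n' "$account" "$region"
}
```

For example, `parse_ecr_registry 012345678901.dkr.ecr.us-east-1.amazonaws.com` prints the account and region separated by a space.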

Quick start

module "ghost_agent" {
  # Pin to a released tag for reproducibility (omit ?ref=… to track main).
  source = "github.com/ghostsecurity/terraform-ghost-agent-platform?ref=<latest release tag, e.g. v0.1.5>"

  # Provided by Ghost during onboarding
  image_registry = "012345678901.dkr.ecr.<region>.amazonaws.com"
  image_tag      = "v1.0.0"

  # Network placement
  subnet_id  = "subnet-0abc123def456789"
  admin_cidr = "203.0.113.42/32"

  # Initial admin user - password is auto-generated
  seed_admin_email = "ops@example.com"
}

output "url" {
  value = module.ghost_agent.bringup_url
}

# Access the VM. Uses AWS Systems Manager Session Manager - no SSH key
# pair needed (the module attaches the SSM role). For SSH instead, set
# `ssh_key_name` on the module and use `module.ghost_agent.ssh_command`.
output "ssm" {
  value = module.ghost_agent.ssm_session_command
}

This bring-up uses the default nip.io fallback for the public hostname (no DNS setup required) and obtains a real Let's Encrypt certificate automatically. To use a custom FQDN instead, set domain_name. Also set route53_zone_id to have the module create the Route 53 A record; otherwise, point the domain at the Elastic IP yourself.
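
nip.io resolves any hostname that embeds an IPv4 address back to that address, which is why no DNS record is needed. Judging by the bringup_url example under After apply (https://3-14-15-92.nip.io), the fallback hostname appears to be the Elastic IP with dots replaced by dashes. A minimal sketch of that mapping, useful for predicting the URL or checking resolution; `nip_hostname` is a hypothetical helper, and the module's exact derivation is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: build the dash-form nip.io hostname for an IPv4
# address, e.g. 3.14.15.92 -> 3-14-15-92.nip.io. Mirrors the format of the
# bringup_url example; not taken from the module source.
nip_hostname() {
  printf '%s.nip.io\n' "$(printf '%s' "$1" | tr '.' '-')"
}

# Usage (requires network access and dig; not run here):
#   dig +short "$(nip_hostname 3.14.15.92)"   # should print 3.14.15.92
```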

After apply

terraform apply returns as soon as the EC2 instance is created, but the on-VM bootstrap (mount EBS, install Docker + cosign, verify image signatures, pull images, start the stack) takes another ~3-5 minutes. Wait for cloud-init to finish before trying to log in - watch the bootstrap log and look for the final ===> Bootstrap complete at <timestamp> line.

Two ways to watch:

# From your laptop - no SSM/SSH needed. Console output is buffered
# by EC2 and typically lags 1-3 minutes, but works even before the
# SSM agent comes up:
aws ec2 get-console-output --region <region, e.g. us-east-1> \
  --instance-id "$(terraform output -raw instance_id)" \
  --latest --output text | tail -100

# Real-time, via SSM Session Manager (requires the SSM agent to be
# running on the VM, which happens a minute or two into bootstrap):
$(terraform output -raw ssm_session_command)
# ...then, inside the session:
sudo tail -f /var/log/ghost-agent-bootstrap.log
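
The completion marker can also be polled from a script instead of watched by eye. A minimal POSIX-shell sketch, assuming the AWS CLI and terraform are on PATH and it runs from the root module directory; `bootstrap_done` and `wait_for_bootstrap` are hypothetical helpers, and the 15-minute timeout is an arbitrary choice comfortably above the ~3-5 minute bootstrap.

```shell
#!/bin/sh
# Succeeds if the text on stdin contains the bootstrap completion marker.
# Not anchored to the start of the line, since console output may prefix
# log lines (e.g. with a timestamp or a cloud-init tag).
bootstrap_done() {
  grep -q '===> Bootstrap complete at '
}

# Poll the EC2 console output every 30 s for up to ~15 minutes.
# REGION must be set by the caller. Not invoked here.
wait_for_bootstrap() {
  instance_id=$(terraform output -raw instance_id) || return 1
  i=0
  while [ "$i" -lt 30 ]; do
    if aws ec2 get-console-output --region "$REGION" \
         --instance-id "$instance_id" --latest --output text |
       bootstrap_done; then
      echo "bootstrap complete"
      return 0
    fi
    i=$((i + 1))
    sleep 30
  done
  echo "timed out waiting for bootstrap" >&2
  return 1
}
```

Because console output lags by a few minutes, expect this to report completion somewhat after the SSM tail shows it.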

Once the ===> Bootstrap complete line appears, the stack is up. Useful terraform outputs:

terraform output bringup_url                # e.g. https://3-14-15-92.nip.io
terraform output -raw ssm_session_command   # aws ssm start-session --region <region> --target i-01234...
terraform output secret_arns                # ARNs for the auto-generated secrets

Retrieve the initial admin password:

aws secretsmanager get-secret-value \
  --region <region> \
  --secret-id "$(terraform output -json secret_arns | jq -r .seed_admin_password)" \
  --query SecretString --output text

Open the bringup_url in a browser, sign in with seed_admin_email and the password above, then rotate the password in-app.

Ghost support access (cross-account SSM)

By default this module publishes an IAM role, <name_prefix>-ssm-support (e.g. ghost-agent-ssm-support), that Ghost Security assumes to open an AWS Systems Manager Session Manager shell on the VM for support. Its trust is scoped to a single Ghost support role, so only that role can assume it. The role grants only ssm:StartSession on this one instance - an interactive shell, or a port-forward to reach the UI without opening any inbound ports - plus management of the operator's own sessions: no SSH key, no inbound ports, and no access to any other resource. The instance still dials out to the SSM service exactly as it does for your own ssm_session_command; nothing new is exposed to the internet.

terraform output ssm_support_role_arn   # the role Ghost assumes (empty if disabled)

To turn it off, set ghost_support_access_enabled = false. The role is removed and Ghost has no Session Manager path to the VM.
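
Before deciding on ghost_support_access_enabled, you may want to inspect what the support role allows and review past sessions. A minimal sketch using standard AWS CLI calls, assuming it runs from the root module directory with credentials for the deploying account; `role_name_from_arn`, `show_support_role`, and `list_start_sessions` are hypothetical helpers.

```shell
#!/bin/sh
# Extract the role name from an IAM role ARN such as
# arn:aws:iam::123456789012:role/ghost-agent-ssm-support.
# Prints nothing for an empty ARN (support access disabled).
role_name_from_arn() {
  [ -n "$1" ] || return 0
  printf '%s\n' "${1##*/}"
}

# Print the support role's trust policy and its attached policies.
# Not invoked here.
show_support_role() {
  arn=$(terraform output -raw ssm_support_role_arn) || return 1
  name=$(role_name_from_arn "$arn")
  [ -n "$name" ] || { echo "support access disabled"; return 0; }
  aws iam get-role --role-name "$name" \
    --query 'Role.AssumeRolePolicyDocument' --output json
  aws iam list-role-policies --role-name "$name"
  aws iam list-attached-role-policies --role-name "$name"
}

# List recent Session Manager StartSession calls recorded by CloudTrail
# in region $1. Not invoked here.
list_start_sessions() {
  aws cloudtrail lookup-events --region "$1" \
    --lookup-attributes AttributeKey=EventName,AttributeValue=StartSession \
    --max-items 20
}
```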

Updating the deployment

Three paths, in increasing order of operator overhead.

In-app (preferred)

A workspace admin opens the UI's workspace dropdown → System → Version. The page shows the running tag and, after checking ECR, the latest available release. Clicking Upgrade to vX.Y.Z dispatches the bump to the in-stack updater container, which:

  1. Rewrites TAG= in /opt/exo/.env
  2. Runs docker compose pull for the managed services (gateway, credential-proxy, worker, ui)
  3. Runs docker compose up -d for the same set

The updater excludes itself from the dispatched up -d so it doesn't kill the container running the upgrade. Expect a brief window (~10-30 s) during the gateway swap in which the API may return 502; this is acceptable for a single-host deployment.
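
To make the mechanics concrete, here is a rough shell equivalent of the three steps. It is a sketch, not the updater's actual code: `set_env_var` and `upgrade_managed_services` are hypothetical names, and the compose file name and service names are assumed from the out-of-band instructions and step 2 above.

```shell
#!/bin/sh
# Hypothetical helper: set KEY=VALUE in an env file, failing (instead of
# silently doing nothing, as a bare sed would) if KEY is not present.
# VALUE must not contain |, & or \ (sed replacement metacharacters).
# Uses GNU sed -i, as available on the AL2023 VM.
set_env_var() {
  file=$1 key=$2 value=$3
  grep -q "^${key}=" "$file" || { echo "$key not found in $file" >&2; return 1; }
  sed -i "s|^${key}=.*|${key}=${value}|" "$file"
}

# Rough equivalent of steps 1-3. Only the managed services are named, so the
# updater container itself is never recreated. Runs in a subshell so the cd
# does not leak; must run as root on the VM. Not invoked here.
upgrade_managed_services() (
  cd /opt/exo || exit 1
  set_env_var .env TAG "$1" || exit 1
  docker compose -f docker-compose.prod.yml pull gateway credential-proxy worker ui &&
    docker compose -f docker-compose.prod.yml up -d gateway credential-proxy worker ui
)
```

Note that the anchored pattern `^TAG=` leaves `UPDATER_TAG=` untouched, which matches the updater pinning its own tag separately.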

Rollback works the same way: the page also offers a one-click rollback to the prior tag, derived from the audit log of past upgrades.

Out-of-band (operator SSH)

When the updater itself needs a version bump (it doesn't auto-update), or as break-glass if the in-app path is broken:

$(terraform output -raw ssm_session_command)   # or ssh_command
# Then, inside the session on the VM:
cd /opt/exo
sudo sed -i 's/^TAG=.*/TAG=vNEW.TAG.HERE/' .env
# To also bump the updater image (typically same tag, but pinned separately):
sudo sed -i 's/^UPDATER_TAG=.*/UPDATER_TAG=vNEW.TAG.HERE/' .env
sudo docker compose -f docker-compose.prod.yml pull
sudo docker compose -f docker-compose.prod.yml up -d

Ghost's ECR repositories use immutable tags, so a given tag always resolves to the same image digest and re-pulling it reproduces the byte-identical image.

Terraform replace (heavier)

Force a clean instance recycle with -replace (the modern replacement for terraform taint):

terraform apply -replace=module.ghost_agent.aws_instance.vm

The instance is replaced and cloud-init re-runs end-to-end (including cosign verification), but the data EBS volume and secrets persist. This is useful when AMI or cloud-init changes need to land alongside an image bump. Note: cloud-init writes var.image_tag into .env, overriding whatever tag the running stack was on (including one set in-app), so make sure image_tag in your tfvars matches the version you want before triggering a replace.

Signature verification

Every image is verified with cosign verify against Ghost's publish-workflow identity before any container starts. The verification policy is encoded in var.image_signing_identity_regex and var.image_signing_oidc_issuer. The defaults match the Ghost Security publish workflow running on a vX.Y.Z release tag (optionally with a pre-release suffix):

identity-regexp: ^https://github\.com/ghostsecurity/exo/\.github/workflows/publish-to-ecr\.yml@refs/tags/v[0-9]+\.[0-9]+\.[0-9]+(-.*)?$
oidc-issuer:     https://token.actions.githubusercontent.com

If a tag's signature fails verification, cloud-init exits before docker compose up. Cosign and the workflow signer are pinned in lockstep on both sides - Ghost's publish workflow signs with cosign-installer@v4.1.2 (cosign v3.0.6), and the cloud-init verify uses the same version.
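
The default identity policy can be checked offline against a given certificate identity, which helps separate "regex doesn't match" from other cosign failures. A minimal POSIX-shell sketch; the helper names are hypothetical, and `verify_image` only approximates what the bootstrap runs. grep -E and Go's regexp (used by cosign) differ in edge cases but agree on this pattern.

```shell
#!/bin/sh
# Module defaults (image_signing_identity_regex / image_signing_oidc_issuer).
IDENTITY_REGEX='^https://github\.com/ghostsecurity/exo/\.github/workflows/publish-to-ecr\.yml@refs/tags/v[0-9]+\.[0-9]+\.[0-9]+(-.*)?$'
OIDC_ISSUER='https://token.actions.githubusercontent.com'

# Succeeds if a certificate identity (workflow URL plus git ref) matches.
identity_matches() {
  printf '%s\n' "$1" | grep -Eq "$IDENTITY_REGEX"
}

# Keyless verification with the same policy. Requires cosign and registry
# credentials first, e.g.:
#   aws ecr get-login-password --region <region> |
#     docker login --username AWS --password-stdin <image_registry>
# Not invoked here.
verify_image() {
  cosign verify \
    --certificate-identity-regexp "$IDENTITY_REGEX" \
    --certificate-oidc-issuer "$OIDC_ISSUER" \
    "$1"
}
```

A tag built from a branch (refs/heads/...) rather than a vX.Y.Z tag fails `identity_matches`, and would likewise fail the bootstrap's verification.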

Troubleshooting

| Symptom | Where to look |
| --- | --- |
| terraform apply succeeds but the bringup URL doesn't respond | Cloud-init log: sudo tail -200 /var/log/ghost-agent-bootstrap.log |
| cosign verify failure in the bootstrap log | Image-tag mismatch, or Ghost's signing identity regex doesn't match. To confirm, verify locally: cosign verify --certificate-identity-regexp '<image_signing_identity_regex>' --certificate-oidc-issuer https://token.actions.githubusercontent.com <image>@<digest> |
| RepositoryNotFoundException during image pull | Cross-account ECR access not yet granted; contact Ghost. |
| 502 from Caddy for /api calls | Gateway container is down. Run docker compose -f /opt/exo/docker-compose.prod.yml ps and check the gateway logs. |
| Browser certificate error / "not secure" warning | Let's Encrypt issuance failed (port 80 unreachable, DNS not resolving, or rate limited). Check docker logs <caddy-container>. |
| Database not initializing | First-boot replica-set bootstrap timing. Verify that /var/lib/exo is actually the EBS mount (mountpoint /var/lib/exo); if it is not, the bootstrap script aborts before Docker starts. |
| terraform apply fails with "secret with this name is already scheduled for deletion" | A previous terraform destroy left the secrets in a 7-day soft-delete (recovery_window_in_days = 7 in secrets.tf). To redeploy into the same account before the window expires, force-delete the leftovers (names assume the default ghost-agent prefix): for s in jwt-secret encryption-key seed-admin-password slack; do aws secretsmanager delete-secret --secret-id "ghost-agent/$s" --force-delete-without-recovery --region <region>; done |

Persistent log locations on the VM:

  • /var/log/ghost-agent-bootstrap.log - cloud-init bootstrap (single run, captured during first boot)
  • /var/log/cloud-init-output.log - cloud-init's own log
  • docker logs <container> - runtime container logs (gateway, credential-proxy, worker, caddy, database)

Data persistence and recovery

The EBS data volume at /var/lib/exo holds all survival-critical state:

  • tls/ - MITM CA private key + service CA private key. Loss of these invalidates every encrypted credential and every enrolled runner identity.
  • mongo-data/ - application database.
  • caddy-data/ - Let's Encrypt account and issued certificates.
  • runner-identity/ - per-runner client certs and slot locks.
  • artifacts/ - run output (recoverable, but loss is a regression).
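
Because the bootstrap aborts if /var/lib/exo is not the EBS mount, and losing any of these directories is costly, a quick on-VM check can help after a replace or a restore from snapshot. A minimal sketch; `missing_state_dirs` is a hypothetical helper, and the directory names come from the list above. On a fresh install some directories may not exist yet, so treat its output as a prompt to investigate rather than as proof of loss.

```shell
#!/bin/sh
# Print each survival-critical subdirectory missing under ROOT
# (default /var/lib/exo), one per line.
missing_state_dirs() {
  root=${1:-/var/lib/exo}
  for d in tls mongo-data caddy-data runner-identity artifacts; do
    [ -d "$root/$d" ] || printf '%s\n' "$d"
  done
}

# On the VM (not run here): confirm the path is a mount, then list gaps.
#   mountpoint -q /var/lib/exo || echo "/var/lib/exo is NOT a mount point"
#   missing_state_dirs /var/lib/exo
```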

The module sets prevent_destroy = true on this volume to guard against terraform destroy. Recommended additional posture for production:

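  • Enable an EBS snapshot lifecycle policy (Amazon Data Lifecycle Manager) for daily snapshots with 30-day retention.
  • Treat the volume ID as sensitive - anyone with ec2:DescribeVolumes can locate it.
  • For dev/test instances: temporarily set prevent_destroy = false before terraform destroy. Terraform does not allow lifecycle arguments to come from variables, so this is a one-line edit to the module source (e.g. a local copy), not an input you can override.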

Configuration knobs

See variables.tf for the full input list. Most useful overrides:

| Variable | When to override |
| --- | --- |
| domain_name + route53_zone_id | Use a real public domain instead of the nip.io fallback |
| instance_type | Higher throughput; in-place change supported (~1-2 min downtime, data persists) |
| data_volume_size_gb | High workflow rate or long artifact retention expected |
| public_ingress_cidrs | Restrict 80/443 to a CDN/WAF origin (note: scoping port 80 too tightly will break Let's Encrypt certificate renewal) |
| ssh_key_name | Use SSH instead of SSM Session Manager |
| ghost_support_access_enabled | Set to false to remove the Ghost support role and disable cross-account SSM access |
| image_signing_identity_regex | Ghost changes the publish workflow path or the pre-release tag pattern |

Production recommendations

A few things this module does not enforce, but which any production deploy should have in place:

  • Tighten admin_cidr to an office IP or VPN egress range. The variable validates that the value is a CIDR, not that it's narrow.
  • Use a remote Terraform backend (S3 with KMS encryption). The module generates secrets via random_* resources, so their values live in state - local state on a laptop is not appropriate.
  • Enable an EBS snapshot lifecycle policy for the data volume. The volume holds all survival-critical state (see Data persistence and recovery); prevent_destroy guards against accidental teardown but not against corruption or accidental file deletion inside the VM.
  • Add CloudWatch alarms on EC2 status checks and configure instance auto-recovery. A single-VM deployment has no built-in failover.
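
One way to act on the status-check recommendation is a CloudWatch alarm whose action is EC2's built-in recover automation. A minimal sketch using the AWS CLI; in practice you would likely manage this as a Terraform aws_cloudwatch_metric_alarm next to the module. The alarm name, period, and evaluation count are illustrative choices, and the helper names are hypothetical.

```shell
#!/bin/sh
# Build the EC2 recover action ARN for a region.
recover_action_arn() {
  printf 'arn:aws:automate:%s:ec2:recover\n' "$1"
}

# Alarm that triggers EC2 auto-recovery when the system status check fails
# for two consecutive one-minute periods. Arguments: REGION INSTANCE_ID.
# Not invoked here.
create_recovery_alarm() {
  aws cloudwatch put-metric-alarm --region "$1" \
    --alarm-name "ghost-agent-system-check-$2" \
    --namespace AWS/EC2 --metric-name StatusCheckFailed_System \
    --dimensions Name=InstanceId,Value="$2" \
    --statistic Maximum --period 60 --evaluation-periods 2 \
    --threshold 1 --comparison-operator GreaterThanOrEqualToThreshold \
    --alarm-actions "$(recover_action_arn "$1")"
}
```

Usage: `create_recovery_alarm us-east-1 "$(terraform output -raw instance_id)"`.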

License

Apache License 2.0 - see LICENSE.

Copyright 2026 Ghost Security
