Add PostgreSQL functional tests with real containers in CI #71

renecannao merged 1 commit into master
Conversation
Add functional test infrastructure for PostgreSQL streaming replication:

- PostgreSQL 17 primary + standby containers in docker-compose.yml
- Init scripts for primary (WAL config, replication user, orchestrator user) and standby (pg_basebackup from primary)
- Dedicated orchestrator instance (port 3098) with PostgreSQL provider
- Test script covering discovery, read-only verification, API endpoints, and DeadPrimary failover with automatic standby promotion
- GitHub Actions workflow updated to run PG tests after MySQL tests

Closes #68
Code Review
This pull request introduces a functional testing framework for PostgreSQL within Orchestrator, including a Docker Compose environment with primary and standby nodes and a comprehensive test script for discovery and failover. Key feedback points out that installing packages during container startup is inefficient and suggests replacing fixed sleeps in the test script with polling loops to prevent flakiness during successor promotion checks.
```yaml
      - ./orchestrator-pg-test.conf.json:/orchestrator/orchestrator.conf.json:ro
    command: >
      bash -c "
        apt-get update -qq && apt-get install -y -qq curl sqlite3 > /dev/null 2>&1 &&
```
Running apt-get update and apt-get install every time the container starts is inefficient and makes the tests dependent on external network connectivity and package repository availability. It is recommended to use a custom Dockerfile or a pre-built image that already includes curl and sqlite3 to speed up the test execution and improve reliability.
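A minimal sketch of that approach, assuming a local Dockerfile is acceptable in the compose stack (the filename and base image below are illustrative, not taken from this PR):

```dockerfile
# Hypothetical tests/functional/Dockerfile.orchestrator-pg:
# bake the test dependencies into the image once, instead of
# running apt-get on every container startup.
FROM debian:bookworm-slim
RUN apt-get update -qq \
    && apt-get install -y -qq curl sqlite3 \
    && rm -rf /var/lib/apt/lists/*
```

The compose service could then point `build:` at this file and drop the `apt-get` lines from `command:`, removing the network dependency from test startup.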
```shell
sleep 3
SUCCESSOR_RO=$(curl -s "$ORC_URL/api/cluster/$PG_CLUSTER" 2>/dev/null | python3 -c "
import json, sys
instances = json.load(sys.stdin)
for inst in instances:
    hostname = inst.get('Key', {}).get('Hostname', '')
    if hostname == '$SUCCESSOR':
        print('true' if inst.get('ReadOnly', True) else 'false')
        sys.exit(0)
print('unknown')
" 2>/dev/null || echo "unknown")

if [ "$SUCCESSOR_RO" = "false" ]; then
    pass "Successor $SUCCESSOR promoted (read_only=false)"
else
    # After promotion the instance needs a poll cycle to update
    skip "Successor read_only=$SUCCESSOR_RO (may need additional poll cycle)"
fi
```
The current check for successor promotion is potentially flaky. It uses a fixed 3-second sleep, which may be shorter than the orchestrator's poll interval (InstancePollSeconds: 5), and then skips the test if the condition isn't met. It's better to loop and wait for the expected state change (read_only=false) to ensure the test correctly verifies the promotion.
```shell
echo "Waiting for successor to be reported as primary (read_only=false)..."
SUCCESSOR_PROMOTED=false
for j in $(seq 1 20); do
    SUCCESSOR_RO=$(curl -s "$ORC_URL/api/cluster/$PG_CLUSTER" 2>/dev/null | python3 -c "
import json, sys
instances = json.load(sys.stdin)
for inst in instances:
    hostname = inst.get('Key', {}).get('Hostname', '')
    if hostname == '$SUCCESSOR':
        print('true' if inst.get('ReadOnly', True) else 'false')
        sys.exit(0)
print('unknown')
" 2>/dev/null || echo "unknown")
    if [ "$SUCCESSOR_RO" = "false" ]; then
        pass "Successor $SUCCESSOR promoted (read_only=false)"
        SUCCESSOR_PROMOTED=true
        break
    fi
    sleep 1
done
if [ "$SUCCESSOR_PROMOTED" = "false" ]; then
    fail "Successor $SUCCESSOR not promoted (read_only=$SUCCESSOR_RO) after 20s"
fi
```
Pull request overview
Adds PostgreSQL functional coverage to the existing tests/functional container-based infrastructure, and wires it into the GitHub Actions functional workflow so CI validates PostgreSQL discovery and automated failover behaviors.
Changes:

- Introduces PostgreSQL primary + standby services in `tests/functional/docker-compose.yml`, with init scripts to bootstrap streaming replication.
- Adds a dedicated `orchestrator-pg` instance configured for `ProviderType: "postgresql"` and a new `test-postgresql.sh` script to validate discovery, API responses, and DeadPrimary recovery.
- Extends `.github/workflows/functional.yml` to start PostgreSQL services and execute the new PostgreSQL functional test script after the existing MySQL tests.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.
Show a summary per file:

| File | Description |
|---|---|
| `tests/functional/test-postgresql.sh` | New functional test script for PostgreSQL discovery + failover + API validation. |
| `tests/functional/postgres/init-standby.sh` | New standby entrypoint to take a base backup from primary and start streaming replication. |
| `tests/functional/postgres/init-primary.sh` | New primary initdb script to enable WAL settings and create replication/orchestrator users. |
| `tests/functional/orchestrator-pg-test.conf.json` | Orchestrator config for a dedicated PostgreSQL-provider test instance (port 3098, sqlite backend). |
| `tests/functional/docker-compose.yml` | Adds `pgprimary`, `pgstandby1`, and `orchestrator-pg` services to the functional test compose stack. |
| `.github/workflows/functional.yml` | Runs PostgreSQL container bring-up + orchestrator bring-up + new PostgreSQL tests in CI. |
```yaml
    environment:
      POSTGRES_PASSWORD: testpass
      PGUSER: postgres
      PGPASSWORD: repl_pass
```
In the pgstandby1 service the libpq environment variables are inconsistent (PGUSER: postgres but PGPASSWORD: repl_pass). This can break readiness checks / any libpq client defaults, and it makes the intended authentication unclear. Set PGUSER/PGPASSWORD to the same account you expect to use (e.g., PGUSER: repl with PGPASSWORD: repl_pass, or keep postgres with testpass).
```diff
-      PGPASSWORD: repl_pass
+      PGPASSWORD: testpass
```
```shell
PRIMARY_PORT=5432

echo "Waiting for primary to accept connections..."
until pg_isready -h "$PRIMARY_HOST" -p "$PRIMARY_PORT" -U postgres; do
```
init-standby.sh waits for the primary using pg_isready -U postgres, but this container’s environment sets PGPASSWORD to the replication password. If the primary requires password auth, this readiness loop may never succeed. Use the replication user (or ensure PGPASSWORD matches the user used in pg_isready).
```diff
-until pg_isready -h "$PRIMARY_HOST" -p "$PRIMARY_PORT" -U postgres; do
+until pg_isready -h "$PRIMARY_HOST" -p "$PRIMARY_PORT" -U repl; do
```
```shell
if [ -n "$PG_CLUSTER" ]; then
    pass "PostgreSQL cluster discovered: $PG_CLUSTER"
else
    fail "No PostgreSQL cluster discovered"
fi

INST_COUNT=$(curl -s "$ORC_URL/api/cluster/$PG_CLUSTER" 2>/dev/null | python3 -c "import json,sys; print(len(json.load(sys.stdin)))" 2>/dev/null || echo "0")
if [ "$INST_COUNT" -ge 2 ]; then
    pass "PostgreSQL instances discovered: $INST_COUNT"
else
    fail "PostgreSQL instances discovered: $INST_COUNT (expected >= 2)"
fi
```
If PG_CLUSTER is not discovered (empty), the script still proceeds to call /api/cluster/$PG_CLUSTER and run follow-up assertions. This can yield confusing failures (or hit an unintended endpoint) and makes debugging harder. Consider short-circuiting after the discovery failure (e.g., skip/exit remaining PG-specific assertions when PG_CLUSTER is empty).
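A hedged sketch of that short-circuit, with the script's `pass`/`fail` helpers stubbed out for illustration (the guard function and its name are assumptions, not code from this PR):

```shell
#!/usr/bin/env bash
# Stub the test script's reporting helpers for this standalone sketch.
pass() { echo "PASS: $1"; }
fail() { echo "FAIL: $1"; }

# Gate all cluster-specific assertions behind a guard, so an empty
# PG_CLUSTER fails fast instead of calling /api/cluster/ with no name.
run_pg_assertions() {
    cluster="$1"
    if [ -z "$cluster" ]; then
        fail "No PostgreSQL cluster discovered; skipping remaining PG assertions"
        return 1
    fi
    pass "PostgreSQL cluster discovered: $cluster"
    # ...follow-up curl calls against /api/cluster/$cluster would go here...
    return 0
}

run_pg_assertions ""          # short-circuits with a single clear failure
run_pg_assertions "pgprimary" # proceeds to the cluster-specific checks
```

Alternatively, an `exit 1` after the discovery `fail` achieves the same effect with less restructuring.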
```shell
test_body_contains "/api/clusters contains PG cluster" "$ORC_URL/api/clusters" "pgprimary"
```
The API assertion test_body_contains "/api/clusters contains PG cluster" ... "pgprimary" assumes the cluster name contains pgprimary. Cluster naming can change (or be normalized) and this makes the test unnecessarily brittle. Prefer asserting the response contains the discovered $PG_CLUSTER value (or validate via /api/cluster/$PG_CLUSTER data) instead of hard-coding pgprimary.
Suggested change (replacing the hard-coded `pgprimary` assertion):

```shell
PG_CLUSTER=$(curl -s "$ORC_URL/api/clusters" 2>/dev/null | python3 -c "
import json, sys
try:
    clusters = json.load(sys.stdin)
except Exception:
    print('')
    sys.exit(0)
if isinstance(clusters, list):
    for cluster in clusters:
        if isinstance(cluster, dict):
            name = cluster.get('ClusterName') or cluster.get('clusterName') or cluster.get('Name') or cluster.get('Alias')
            if name:
                print(name)
                sys.exit(0)
print('')
")
if [ -n "$PG_CLUSTER" ]; then
    test_endpoint "GET /api/cluster/$PG_CLUSTER" "$ORC_URL/api/cluster/$PG_CLUSTER" "200"
else
    fail "Could not determine PostgreSQL cluster name from /api/clusters"
fi
```
```sql
    -- Orchestrator monitoring user
    CREATE ROLE orchestrator WITH LOGIN PASSWORD 'orch_pass';
    GRANT pg_monitor TO orchestrator;
    -- Allow orchestrator to promote standbys and reload config
    ALTER ROLE orchestrator SUPERUSER;
EOSQL
```
init-primary.sh grants orchestrator SUPERUSER and pg_hba.conf allows password auth from all addresses. Combined with the host port publishing in docker-compose.yml, this creates a broad superuser entry point during CI runs. If possible, reduce privileges to what orchestrator needs for tests (e.g., pg_monitor + pg_signal_backend for pg_promote) and/or restrict pg_hba.conf to the Docker network CIDR / remove host port publishing for Postgres services.
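One possible least-privilege sketch for init-primary.sh, assuming PostgreSQL 17 semantics (`pg_promote()` and `pg_reload_conf()` can be granted to non-superusers via `GRANT EXECUTE`; verify the exact function signatures against the server version before adopting this):

```sql
-- Hypothetical replacement for the SUPERUSER grant: monitoring plus just
-- enough rights to promote a standby and reload configuration.
CREATE ROLE orchestrator WITH LOGIN PASSWORD 'orch_pass';
GRANT pg_monitor TO orchestrator;
GRANT EXECUTE ON FUNCTION pg_promote(boolean, integer) TO orchestrator;
GRANT EXECUTE ON FUNCTION pg_reload_conf() TO orchestrator;
```

Restricting the pg_hba.conf entry to the compose network's CIDR would further narrow the exposure during CI runs.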
```yaml
  pgstandby1:
    image: postgres:17
    hostname: pgstandby1
    environment:
      POSTGRES_PASSWORD: testpass
      PGUSER: postgres
      PGPASSWORD: repl_pass
    volumes:
      - ./postgres/init-standby.sh:/init-standby.sh
    entrypoint: ["/bin/bash", "/init-standby.sh"]
    depends_on:
      pgprimary:
        condition: service_healthy
    ports:
      - "15433:5432"
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 3s
      retries: 30
    networks:
      orchnet:
        aliases:
          - pgstandby1
```
The PR description / linked issue mentions a primary plus two streaming replicas, but this compose file defines only one standby (pgstandby1). If the intent is to close #68 as written, add the second standby service (and extend the tests accordingly); otherwise, update the PR/issue linkage so expectations match what’s implemented.
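If a second standby is the intent, a sketch mirroring the existing service could look like the following (the service name and host port are assumptions; any free host port works):

```yaml
  # Hypothetical second standby, mirroring pgstandby1.
  pgstandby2:
    image: postgres:17
    hostname: pgstandby2
    environment:
      POSTGRES_PASSWORD: testpass
      PGUSER: postgres
      PGPASSWORD: testpass
    volumes:
      - ./postgres/init-standby.sh:/init-standby.sh
    entrypoint: ["/bin/bash", "/init-standby.sh"]
    depends_on:
      pgprimary:
        condition: service_healthy
    ports:
      - "15434:5432"
    networks:
      orchnet:
        aliases:
          - pgstandby2
```

The instance-count assertion in the test script would then expect three instances instead of two.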
Summary

- PostgreSQL 17 primary + standby containers with streaming replication, plus init scripts
- Dedicated orchestrator instance (`orchestrator-pg` on port 3098) configured with `ProviderType: "postgresql"`
- `test-postgresql.sh` covering discovery verification, read-only state checks, API endpoint validation, and DeadPrimary failover with automatic standby promotion
- `.github/workflows/functional.yml` updated to run PostgreSQL tests after the existing MySQL tests

Test plan

- `/api/clusters`, `/api/v2/clusters`, `/api/v2/status` return expected data
- `/api/v2/recoveries` shows successful recovery

Closes #68