Add PostgreSQL functional tests with real containers in CI#71

Merged
renecannao merged 1 commit into master from issue68-postgresql-tests
Apr 3, 2026

Conversation

@renecannao

Summary

  • Adds PostgreSQL 17 primary + standby containers to the functional test docker-compose infrastructure
  • Creates init scripts for primary (WAL replication config, replication user, orchestrator monitoring user) and standby (pg_basebackup + streaming replication)
  • Adds a dedicated orchestrator instance (orchestrator-pg on port 3098) configured with ProviderType: "postgresql"
  • Adds test-postgresql.sh covering: discovery verification, read-only state checks, API endpoint validation, and DeadPrimary failover with automatic standby promotion
  • Updates .github/workflows/functional.yml to run PostgreSQL tests after the existing MySQL tests

Test plan

  • PostgreSQL containers start and form a streaming replication topology
  • Orchestrator discovers pgprimary and pgstandby1
  • Primary reports read_only=false, standby reports read_only=true
  • /api/clusters, /api/v2/clusters, /api/v2/status return expected data
  • Stopping pgprimary triggers DeadPrimary recovery and promotes pgstandby1
  • /api/v2/recoveries shows successful recovery
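The read-only checks in the test plan boil down to pulling one instance's `ReadOnly` flag out of the cluster JSON. A minimal bash sketch of that step follows, assuming the `/api/cluster/<name>` response is a list of objects with `Key.Hostname` and `ReadOnly` fields, as the review snippets in this PR suggest. The helper name `instance_read_only` is hypothetical and not part of the test script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads a JSON instance list on stdin and prints
# "true", "false", or "unknown" for the hostname given as $1.
# Field names (Key.Hostname, ReadOnly) are assumed from the PR's snippets.
instance_read_only() {
    python3 -c '
import json, sys

target = sys.argv[1]
try:
    instances = json.load(sys.stdin)
except ValueError:
    # Empty or malformed response body
    print("unknown")
    sys.exit(0)
if not isinstance(instances, list):
    print("unknown")
    sys.exit(0)
for inst in instances:
    if isinstance(inst, dict) and inst.get("Key", {}).get("Hostname", "") == target:
        # A missing ReadOnly field is treated as read-only, as in the PR
        print("true" if inst.get("ReadOnly", True) else "false")
        sys.exit(0)
print("unknown")
' "$1"
}

# Usage (needs a running orchestrator, so left commented out):
# curl -s "$ORC_URL/api/cluster/$PG_CLUSTER" | instance_read_only pgstandby1
```

Passing the hostname as an argument rather than interpolating it into the Python source avoids quoting problems if a hostname ever contains a quote character.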

Closes #68

Add functional test infrastructure for PostgreSQL streaming replication:

- PostgreSQL 17 primary + standby containers in docker-compose.yml
- Init scripts for primary (WAL config, replication user, orchestrator
  user) and standby (pg_basebackup from primary)
- Dedicated orchestrator instance (port 3098) with PostgreSQL provider
- Test script covering discovery, read-only verification, API endpoints,
  and DeadPrimary failover with automatic standby promotion
- GitHub Actions workflow updated to run PG tests after MySQL tests

Closes #68
Copilot AI review requested due to automatic review settings April 3, 2026 07:11
@coderabbitai

coderabbitai bot commented Apr 3, 2026

Warning

Rate limit exceeded

@renecannao has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 10 minutes and 41 seconds before requesting another review.


📥 Commits

Reviewing files that changed from the base of the PR and between 51ff6f6 and feebad6.

📒 Files selected for processing (6)
  • .github/workflows/functional.yml
  • tests/functional/docker-compose.yml
  • tests/functional/orchestrator-pg-test.conf.json
  • tests/functional/postgres/init-primary.sh
  • tests/functional/postgres/init-standby.sh
  • tests/functional/test-postgresql.sh

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a functional testing framework for PostgreSQL within Orchestrator, including a Docker Compose environment with primary and standby nodes and a comprehensive test script for discovery and failover. Key feedback points out that installing packages during container startup is inefficient and suggests replacing fixed sleeps in the test script with polling loops to prevent flakiness during successor promotion checks.

      - ./orchestrator-pg-test.conf.json:/orchestrator/orchestrator.conf.json:ro
    command: >
      bash -c "
      apt-get update -qq && apt-get install -y -qq curl sqlite3 > /dev/null 2>&1 &&

medium

Running apt-get update and apt-get install every time the container starts is inefficient and makes the tests dependent on external network connectivity and package repository availability. It is recommended to use a custom Dockerfile or a pre-built image that already includes curl and sqlite3 to speed up the test execution and improve reliability.

Comment on lines +158 to +175
    sleep 3
    SUCCESSOR_RO=$(curl -s "$ORC_URL/api/cluster/$PG_CLUSTER" 2>/dev/null | python3 -c "
import json, sys
instances = json.load(sys.stdin)
for inst in instances:
    hostname = inst.get('Key', {}).get('Hostname', '')
    if hostname == '$SUCCESSOR':
        print('true' if inst.get('ReadOnly', True) else 'false')
        sys.exit(0)
print('unknown')
" 2>/dev/null || echo "unknown")

    if [ "$SUCCESSOR_RO" = "false" ]; then
        pass "Successor $SUCCESSOR promoted (read_only=false)"
    else
        # After promotion the instance needs a poll cycle to update
        skip "Successor read_only=$SUCCESSOR_RO (may need additional poll cycle)"
    fi

medium

The current check for successor promotion is potentially flaky. It uses a fixed 3-second sleep, which may be shorter than the orchestrator's poll interval (InstancePollSeconds: 5), and then skips the test if the condition isn't met. It's better to loop and wait for the expected state change (read_only=false) to ensure the test correctly verifies the promotion.

    echo "Waiting for successor to be reported as primary (read_only=false)..."
    SUCCESSOR_PROMOTED=false
    for j in $(seq 1 20); do
        SUCCESSOR_RO=$(curl -s "$ORC_URL/api/cluster/$PG_CLUSTER" 2>/dev/null | python3 -c "
import json, sys
instances = json.load(sys.stdin)
for inst in instances:
    hostname = inst.get('Key', {}).get('Hostname', '')
    if hostname == '$SUCCESSOR':
        print('true' if inst.get('ReadOnly', True) else 'false')
        sys.exit(0)
print('unknown')
" 2>/dev/null || echo "unknown")

        if [ "$SUCCESSOR_RO" = "false" ]; then
            pass "Successor $SUCCESSOR promoted (read_only=false)"
            SUCCESSOR_PROMOTED=true
            break
        fi
        sleep 1
    done
    if [ "$SUCCESSOR_PROMOTED" = "false" ]; then
        fail "Successor $SUCCESSOR not promoted (read_only=$SUCCESSOR_RO) after 20s"
    fi
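
The same poll-until-true pattern could be factored into a reusable helper so that other waits in the script do not need their own loops. This is a sketch only: `wait_until` and `successor_is_writable` are hypothetical names, not functions in test-postgresql.sh, and the helper relies on bash's built-in `SECONDS` counter.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it succeeds or the timeout elapses.
# Returns 0 on success and 1 on timeout.
wait_until() {
    local timeout=$1 interval=$2
    shift 2
    # SECONDS is maintained by bash and counts seconds since shell start
    local deadline=$(( SECONDS + timeout ))
    while true; do
        if "$@"; then
            return 0
        fi
        if [ "$SECONDS" -ge "$deadline" ]; then
            return 1
        fi
        sleep "$interval"
    done
}

# Usage sketch (successor_is_writable would wrap the curl + python3 check above):
# if wait_until 20 1 successor_is_writable; then
#     pass "Successor $SUCCESSOR promoted (read_only=false)"
# else
#     fail "Successor $SUCCESSOR not promoted after 20s"
# fi
```

The command is always tried at least once, so a timeout of 0 still performs a single check.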

@renecannao renecannao merged commit 86a4f3d into master Apr 3, 2026
7 of 8 checks passed

Copilot AI left a comment


Pull request overview

Adds PostgreSQL functional coverage to the existing tests/functional container-based infrastructure, and wires it into the GitHub Actions functional workflow so CI validates PostgreSQL discovery and automated failover behaviors.

Changes:

  • Introduces PostgreSQL primary + standby services in tests/functional/docker-compose.yml, with init scripts to bootstrap streaming replication.
  • Adds a dedicated orchestrator-pg instance configured for ProviderType: "postgresql" and a new test-postgresql.sh script to validate discovery, API responses, and DeadPrimary recovery.
  • Extends .github/workflows/functional.yml to start PostgreSQL services and execute the new PostgreSQL functional test script after the existing MySQL tests.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.

Show a summary per file

| File | Description |
| --- | --- |
| tests/functional/test-postgresql.sh | New functional test script for PostgreSQL discovery + failover + API validation. |
| tests/functional/postgres/init-standby.sh | New standby entrypoint to take a base backup from primary and start streaming replication. |
| tests/functional/postgres/init-primary.sh | New primary initdb script to enable WAL settings and create replication/orchestrator users. |
| tests/functional/orchestrator-pg-test.conf.json | Orchestrator config for a dedicated PostgreSQL-provider test instance (port 3098, sqlite backend). |
| tests/functional/docker-compose.yml | Adds pgprimary, pgstandby1, and orchestrator-pg services to the functional test compose stack. |
| .github/workflows/functional.yml | Runs PostgreSQL container bring-up + orchestrator bring-up + new PostgreSQL tests in CI. |


    environment:
      POSTGRES_PASSWORD: testpass
      PGUSER: postgres
      PGPASSWORD: repl_pass

Copilot AI Apr 3, 2026

In the pgstandby1 service the libpq environment variables are inconsistent (PGUSER: postgres but PGPASSWORD: repl_pass). This can break readiness checks / any libpq client defaults, and it makes the intended authentication unclear. Set PGUSER/PGPASSWORD to the same account you expect to use (e.g., PGUSER: repl with PGPASSWORD: repl_pass, or keep postgres with testpass).

Suggested change
-      PGPASSWORD: repl_pass
+      PGPASSWORD: testpass

Copilot uses AI. Check for mistakes.
PRIMARY_PORT=5432

echo "Waiting for primary to accept connections..."
until pg_isready -h "$PRIMARY_HOST" -p "$PRIMARY_PORT" -U postgres; do

Copilot AI Apr 3, 2026

init-standby.sh waits for the primary using pg_isready -U postgres, but this container’s environment sets PGPASSWORD to the replication password. If the primary requires password auth, this readiness loop may never succeed. Use the replication user (or ensure PGPASSWORD matches the user used in pg_isready).

Suggested change
-until pg_isready -h "$PRIMARY_HOST" -p "$PRIMARY_PORT" -U postgres; do
+until pg_isready -h "$PRIMARY_HOST" -p "$PRIMARY_PORT" -U repl; do

Comment on lines +54 to +65
if [ -n "$PG_CLUSTER" ]; then
    pass "PostgreSQL cluster discovered: $PG_CLUSTER"
else
    fail "No PostgreSQL cluster discovered"
fi

INST_COUNT=$(curl -s "$ORC_URL/api/cluster/$PG_CLUSTER" 2>/dev/null | python3 -c "import json,sys; print(len(json.load(sys.stdin)))" 2>/dev/null || echo "0")
if [ "$INST_COUNT" -ge 2 ]; then
    pass "PostgreSQL instances discovered: $INST_COUNT"
else
    fail "PostgreSQL instances discovered: $INST_COUNT (expected >= 2)"
fi

Copilot AI Apr 3, 2026

If PG_CLUSTER is not discovered (empty), the script still proceeds to call /api/cluster/$PG_CLUSTER and run follow-up assertions. This can yield confusing failures (or hit an unintended endpoint) and makes debugging harder. Consider short-circuiting after the discovery failure (e.g., skip/exit remaining PG-specific assertions when PG_CLUSTER is empty).
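
One way to short-circuit, sketched in bash under the assumption that the remaining cluster-specific assertions can be wrapped in a single conditional. The `require_cluster` name is hypothetical and does not exist in the test script.

```shell
#!/usr/bin/env bash
# Hypothetical guard: succeeds only when a non-empty cluster name is passed.
# On an empty name it prints a skip notice to stderr and returns 1 so the
# caller can bypass every check that builds a URL from the cluster name.
require_cluster() {
    if [ -z "${1:-}" ]; then
        echo "SKIP: no PostgreSQL cluster discovered; skipping cluster-specific checks" >&2
        return 1
    fi
}

# Usage sketch:
# if require_cluster "$PG_CLUSTER"; then
#     # instance count, read-only checks, failover test ...
#     :
# fi
```

Returning a status instead of calling `exit` leaves it to the caller to decide whether later, non-PostgreSQL sections of the script should still run.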

Comment on lines +109 to +110
test_body_contains "/api/clusters contains PG cluster" "$ORC_URL/api/clusters" "pgprimary"


Copilot AI Apr 3, 2026

The API assertion test_body_contains "/api/clusters contains PG cluster" ... "pgprimary" assumes the cluster name contains pgprimary. Cluster naming can change (or be normalized) and this makes the test unnecessarily brittle. Prefer asserting the response contains the discovered $PG_CLUSTER value (or validate via /api/cluster/$PG_CLUSTER data) instead of hard-coding pgprimary.

Suggested change
-test_body_contains "/api/clusters contains PG cluster" "$ORC_URL/api/clusters" "pgprimary"
+PG_CLUSTER=$(curl -s "$ORC_URL/api/clusters" 2>/dev/null | python3 -c "
+import json, sys
+try:
+    clusters = json.load(sys.stdin)
+except Exception:
+    print('')
+    sys.exit(0)
+if isinstance(clusters, list):
+    for cluster in clusters:
+        if isinstance(cluster, dict):
+            name = cluster.get('ClusterName') or cluster.get('clusterName') or cluster.get('Name') or cluster.get('Alias')
+            if name:
+                print(name)
+                sys.exit(0)
+print('')
+")
+if [ -n "$PG_CLUSTER" ]; then
+    test_endpoint "GET /api/cluster/$PG_CLUSTER" "$ORC_URL/api/cluster/$PG_CLUSTER" "200"
+else
+    fail "Could not determine PostgreSQL cluster name from /api/clusters"
+fi

Comment on lines +29 to +34
-- Orchestrator monitoring user
CREATE ROLE orchestrator WITH LOGIN PASSWORD 'orch_pass';
GRANT pg_monitor TO orchestrator;
-- Allow orchestrator to promote standbys and reload config
ALTER ROLE orchestrator SUPERUSER;
EOSQL

Copilot AI Apr 3, 2026

init-primary.sh grants orchestrator SUPERUSER and pg_hba.conf allows password auth from all addresses. Combined with the host port publishing in docker-compose.yml, this creates a broad superuser entry point during CI runs. If possible, reduce privileges to what orchestrator needs for tests (e.g., pg_monitor + pg_signal_backend for pg_promote) and/or restrict pg_hba.conf to the Docker network CIDR / remove host port publishing for Postgres services.
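
A reduced-privilege variant might look like the bash sketch below. It relies on PostgreSQL's documented behaviour that `pg_promote()` and `pg_reload_conf()` are restricted to superusers by default but can be made available to other roles with `GRANT EXECUTE`. Whether these two grants are sufficient for orchestrator's PostgreSQL provider is an assumption that this PR does not confirm, and the function name `orchestrator_role_sql` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints SQL that creates the monitoring role without
# SUPERUSER. The role name and password are the test values from init-primary.sh.
# Assumption: promote + reload are the only privileged operations needed.
orchestrator_role_sql() {
    cat <<'EOSQL'
-- Orchestrator monitoring user (no SUPERUSER)
CREATE ROLE orchestrator WITH LOGIN PASSWORD 'orch_pass';
GRANT pg_monitor TO orchestrator;
-- Allow promotion of a standby and configuration reload only
GRANT EXECUTE ON FUNCTION pg_promote(boolean, integer) TO orchestrator;
GRANT EXECUTE ON FUNCTION pg_reload_conf() TO orchestrator;
EOSQL
}

# Usage sketch inside an init script (needs a running server, so commented out):
# orchestrator_role_sql | psql -v ON_ERROR_STOP=1 --username "$POSTGRES_USER" --dbname "$POSTGRES_DB"
```

If the failover path turns out to need more than these functions, the missing privilege should surface as a permission error in the DeadPrimary test rather than being masked by SUPERUSER.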

Comment on lines +115 to +139
  pgstandby1:
    image: postgres:17
    hostname: pgstandby1
    environment:
      POSTGRES_PASSWORD: testpass
      PGUSER: postgres
      PGPASSWORD: repl_pass
    volumes:
      - ./postgres/init-standby.sh:/init-standby.sh
    entrypoint: ["/bin/bash", "/init-standby.sh"]
    depends_on:
      pgprimary:
        condition: service_healthy
    ports:
      - "15433:5432"
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 3s
      retries: 30
    networks:
      orchnet:
        aliases:
          - pgstandby1


Copilot AI Apr 3, 2026

The PR description / linked issue mentions a primary plus two streaming replicas, but this compose file defines only one standby (pgstandby1). If the intent is to close #68 as written, add the second standby service (and extend the tests accordingly); otherwise, update the PR/issue linkage so expectations match what’s implemented.

Development

Successfully merging this pull request may close these issues.

Phase D: PostgreSQL functional tests in CI

2 participants