Add functional test CI: Docker MySQL topology + ProxySQL + failover tests#51
renecannao merged 11 commits into master from
Conversation
Comprehensive functional tests running against a real MySQL topology + ProxySQL.

Infrastructure (docker-compose.yml):
- MySQL 8.4 master + 2 replicas with GTID replication
- ProxySQL with writer (HG 10) and reader (HG 20) hostgroups
- Auto-configured replication and orchestrator user

Test suites:
- test-smoke.sh (Tier A): Discovery, API v1/v2, Prometheus, health endpoints, ProxySQL CLI/API, web UI, static files (25+ checks)
- test-failover.sh (Tier B): Graceful master takeover with ProxySQL hook validation, hard failover (kill master), recovery audit (12+ checks)
- test-regression.sh (Tier C): Full chi router regression, API v2 response format, Prometheus metrics, health endpoints (22+ checks)

GitHub Actions workflow (functional.yml):
- Triggers on push to master, PRs, and manual dispatch
- Starts Docker infrastructure, builds orchestrator, runs all 3 suites
- Uploads orchestrator and docker logs as artifacts on failure
- 15 minute timeout

Closes #46, closes #47, closes #48, closes #49, closes #50
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. This behavior is configurable in the CodeRabbit settings.
📝 Walkthrough

Adds a functional test harness: Docker Compose topology (MySQL master + 2 replicas + ProxySQL), initialization and replication setup, orchestrator test config and binary build, three test suites (smoke, regression, failover), shared helpers, a runner, and a GitHub Actions workflow that boots the stack, runs tests, collects logs/artifacts, and always tears down.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant GH as GitHub Actions
    participant DC as Docker Compose (mysql1,mysql2,mysql3,proxysql)
    participant SETUP as setup-replication.sh
    participant ORC as Orchestrator (bin/orchestrator)
    participant TEST as Test Suites (smoke/regression/failover)
    participant ART as Artifact Uploader
    GH->>DC: docker compose up -d (tests/functional/docker-compose.yml)
    DC-->>GH: services start & healthchecks
    GH->>SETUP: run tests/functional/setup-replication.sh
    SETUP->>DC: exec MySQL CHANGE REPLICATION / START REPLICA
    SETUP-->>GH: replication verified
    GH->>ORC: build & start orchestrator (orchestrator-test.conf.json)
    ORC-->>GH: HTTP API ready (/api/clusters)
    GH->>TEST: run test-smoke.sh -> test-regression.sh -> test-failover.sh
    TEST->>ORC: HTTP API calls (discovery, recovery endpoints)
    TEST->>DC: docker compose exec for ProxySQL/MySQL inspection
    TEST-->>GH: PASS/FAIL results
    GH->>ORC: collect /tmp/orchestrator-test.log
    GH->>DC: docker compose down -v --remove-orphans
    GH->>ART: upload logs/artifacts
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request introduces a robust, automated functional testing framework for orchestrator. Using Docker Compose, it sets up a realistic MySQL replication topology with ProxySQL, enabling comprehensive validation of topology discovery, API endpoints, and critical failover mechanisms. This significantly improves the orchestrator's reliability and maintainability by continuously exercising key functionality in an integrated environment.
Code Review
This pull request introduces a comprehensive functional test suite, which is a great addition for ensuring the stability and correctness of orchestrator, especially around failover scenarios. The test setup using Docker Compose is well-structured. I've identified a few areas for improvement, mainly concerning the robustness and maintainability of the test environment and scripts. Key points include pinning Docker image versions for reproducibility, avoiding hardcoded credentials, fixing a potential bug with duplicate MySQL server-ids, and making the test scripts more robust. Addressing these points will make the new test suite even more valuable.
```diff
@@ -0,0 +1,9 @@
+[mysqld]
+server-id=100
```
Both mysql2 and mysql3 replicas use this same configuration file, which means they will both have server-id=100. This creates a server-id collision. While the current test setup might work, this is a latent bug that will cause issues if orchestrator attempts to reconfigure replication between these two instances (e.g., making mysql3 a replica of mysql2 after a failover). Each MySQL instance in a replication topology must have a unique server-id.
To fix this, you should use separate configuration files for each replica (e.g., replica2.cnf and replica3.cnf) with unique server-ids, and update tests/functional/docker-compose.yml accordingly.
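Alternatively, the same result can be had without extra config files. The sketch below is a hypothetical `docker-compose.yml` fragment (service names taken from this PR; the specific `--server-id` values are illustrative): the official mysql image's entrypoint passes flag-only `command` arguments through to `mysqld`, so each replica can keep the shared `replica.cnf` and receive a unique id.

```yaml
# Sketch: keep a shared replica.cnf without server-id and pass a unique
# --server-id per service instead of maintaining replica2.cnf/replica3.cnf.
services:
  mysql2:
    image: mysql:8.4
    command: --server-id=101
  mysql3:
    image: mysql:8.4
    command: --server-id=102
```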
```yaml
    image: mysql:8.4
    hostname: mysql1
    environment:
      MYSQL_ROOT_PASSWORD: testpass
```
Hardcoded credentials are used here and throughout the new test files. While this is a test environment, it's a security risk and a bad practice that could be copied elsewhere. Consider using environment variables to manage secrets. You can define them in a .env file (which should be added to .gitignore) and Docker Compose will automatically pick them up. This would centralize credential management and improve security hygiene.
This issue is also present in:
- tests/functional/docker-compose.yml (lines 13, 24, 32, 43, 51, 71)
- tests/functional/mysql/init-master.sql
- tests/functional/mysql/init-replica.sql
- tests/functional/orchestrator-test.conf.json
- tests/functional/proxysql/proxysql.cnf
- tests/functional/run-all.sh
- tests/functional/test-failover.sh
- tests/functional/lib.sh
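The `.env` approach the comment describes might look like this (a sketch; the variable name `MYSQL_TEST_ROOT_PASSWORD` is hypothetical — Docker Compose automatically substitutes variables from a sibling `.env` file into the compose file):

```yaml
# docker-compose.yml fragment -- the password comes from .env, not this file.
# A gitignored .env next to it would contain:
#   MYSQL_TEST_ROOT_PASSWORD=testpass
services:
  mysql1:
    image: mysql:8.4
    environment:
      MYSQL_ROOT_PASSWORD: ${MYSQL_TEST_ROOT_PASSWORD}
```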
```yaml
      - orchnet

  proxysql:
    image: proxysql/proxysql:latest
```
```sh
test_endpoint() {
  local NAME="$1" URL="$2" EXPECT="$3"
  local CODE
  CODE=$(curl -s -o /dev/null -w "%{http_code}" "$URL" 2>&1)
  if [ "$CODE" = "$EXPECT" ]; then
    pass "$NAME (HTTP $CODE)"
  else
    fail "$NAME (HTTP $CODE, expected $EXPECT)"
  fi
}
```
The use of 2>&1 can lead to incorrect behavior if curl fails to connect. In such cases, the error message from curl is redirected to stdout and gets assigned to the CODE variable, which will then contain more than just the HTTP status code. A more robust approach is to remove the redirection and check curl's exit code separately.
```diff
 test_endpoint() {
   local NAME="$1" URL="$2" EXPECT="$3"
   local CODE
-  CODE=$(curl -s -o /dev/null -w "%{http_code}" "$URL" 2>&1)
+  CODE=$(curl -s -o /dev/null -w "%{http_code}" "$URL")
+  if [ $? -ne 0 ]; then
+    fail "$NAME" "curl command failed for $URL"
+    return
+  fi
   if [ "$CODE" = "$EXPECT" ]; then
     pass "$NAME (HTTP $CODE)"
   else
     fail "$NAME (HTTP $CODE, expected $EXPECT)"
   fi
 }
```
```sh
test_body_contains() {
  local NAME="$1" URL="$2" EXPECT="$3"
  local BODY
  BODY=$(curl -s "$URL" 2>&1)
  if echo "$BODY" | grep -q "$EXPECT"; then
    pass "$NAME"
  else
    fail "$NAME" "Response does not contain '$EXPECT'"
  fi
}
```
This function is not fully robust. If curl fails with a network error, the error message is captured in BODY, and the test might fail with a generic message. It's better to explicitly check curl's exit code. Additionally, using grep -F is safer for matching fixed strings, as it prevents misinterpretation of special characters as regex.
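The `grep -F` point is easy to demonstrate in isolation (a standalone sketch, not code from this PR):

```shell
# Under plain grep, '.' in the expected string is a regex wildcard,
# so an unintended body can satisfy the check...
echo 'version 1x2' | grep -q '1.2' && echo "regex: false positive"
# ...while -F matches the pattern as a literal string.
echo 'version 1x2' | grep -qF '1.2' || echo "fixed-string: correctly rejected"
```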
```diff
 test_body_contains() {
   local NAME="$1" URL="$2" EXPECT="$3"
   local BODY
-  BODY=$(curl -s "$URL" 2>&1)
-  if echo "$BODY" | grep -q "$EXPECT"; then
+  BODY=$(curl -s "$URL")
+  if [ $? -ne 0 ]; then
+    fail "$NAME" "curl command failed for $URL"
+    return
+  fi
+  if echo "$BODY" | grep -qF -- "$EXPECT"; then
     pass "$NAME"
   else
     fail "$NAME" "Response does not contain '$EXPECT'"
   fi
 }
```
tests/functional/lib.sh (Outdated)
```sh
echo "Waiting for topology discovery..."
for i in $(seq 1 60); do
  local COUNT
  COUNT=$(curl -s "$ORC_URL/api/cluster/$MASTER_HOST:3306" 2>/dev/null | python3 -c "import json,sys; print(len(json.load(sys.stdin)))" 2>/dev/null)
```
The Python one-liner for parsing JSON is not fully robust. If curl returns a non-JSON response (e.g., an HTML error page), the json.load call will fail, and the COUNT variable will be empty. This can lead to silent failures in the loop condition. It's safer to handle this case by defaulting to 0 if the Python script fails.
```diff
-COUNT=$(curl -s "$ORC_URL/api/cluster/$MASTER_HOST:3306" 2>/dev/null | python3 -c "import json,sys; print(len(json.load(sys.stdin)))" 2>/dev/null)
+COUNT=$(curl -s "$ORC_URL/api/cluster/$MASTER_HOST:3306" 2>/dev/null | python3 -c "import json,sys; print(len(json.load(sys.stdin)))" 2>/dev/null || echo 0)
```
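The `|| echo 0` fallback behaves as intended when the response is not JSON (a standalone sketch mirroring the suggestion, using a hardcoded invalid payload instead of a live curl):

```shell
# python3 exits non-zero on invalid JSON; the `|| echo 0` fallback keeps
# COUNT numeric instead of empty, so the loop comparison stays valid.
COUNT=$(echo 'not json' | python3 -c "import json,sys; print(len(json.load(sys.stdin)))" 2>/dev/null || echo 0)
echo "count=$COUNT"   # count=0
```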
```sh
    fail "Graceful takeover API returned: $CODE" "$(echo "$RESULT" | head -c 200)"
fi

sleep 3
```
Using fixed sleep durations can make tests flaky or unnecessarily slow. It's more reliable to poll for the desired state. For example, instead of sleeping for 3 seconds after triggering a graceful takeover, you could create a polling loop that repeatedly checks the read_only status of mysql1 and mysql2 until they reflect the role change. This would make the test faster and more robust against timing variations.
This applies to other sleep commands in this file as well (e.g., lines 92, 96, 142, 162).
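A generic polling helper along the lines suggested here could look like this (a sketch only; `poll_until` is a hypothetical helper, not part of the PR):

```shell
# Retry a command once per second until it succeeds or TIMEOUT seconds elapse.
poll_until() {
  local TIMEOUT="$1"; shift
  local i
  for i in $(seq 1 "$TIMEOUT"); do
    if "$@"; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# Usage sketch: replace `sleep 3` with a wait on the actual condition,
# e.g. `poll_until 30 is_read_only mysql2`, where is_read_only would wrap a
# docker compose exec / SELECT @@global.read_only call.
poll_until 5 test -d /tmp && echo "condition met"
```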
Pull request overview
Adds a Docker-based functional test environment (MySQL + ProxySQL) and wires three tiered functional test suites into a GitHub Actions workflow so manual QA checks run automatically on PRs and master.
Changes:
- Introduces `tests/functional/` Docker Compose topology (MySQL primary/replicas + ProxySQL) plus an orchestrator test config.
- Adds Tier A/B/C bash test suites (smoke, failover, regression) and a local `run-all.sh` runner.
- Adds `.github/workflows/functional.yml` to run the functional test suite in CI and upload logs as artifacts.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 7 comments.
Summary per file:

| File | Description |
|---|---|
| `.github/workflows/functional.yml` | New CI workflow to build orchestrator, start Compose topology, and run functional suites. |
| `tests/functional/docker-compose.yml` | Defines MySQL+ProxySQL topology used by functional tests. |
| `tests/functional/lib.sh` | Shared bash helpers (HTTP checks, discovery, mysql/proxysql helpers, summary). |
| `tests/functional/run-all.sh` | Local runner to build, start infra, start orchestrator, and run all suites. |
| `tests/functional/test-smoke.sh` | Tier A smoke checks (UI, v1/v2 APIs, metrics, health, ProxySQL API/CLI). |
| `tests/functional/test-regression.sh` | Tier C regression coverage for key endpoints, metrics, health, static assets. |
| `tests/functional/test-failover.sh` | Tier B graceful + hard failover validation and ProxySQL hook assertions. |
| `tests/functional/orchestrator-test.conf.json` | Orchestrator config tuned for functional tests (SQLite backend, ProxySQL settings, metrics). |
| `tests/functional/proxysql/proxysql.cnf` | ProxySQL config (admin creds, hostgroups, monitor + mysql users). |
| `tests/functional/mysql/master.cnf` | MySQL primary config (GTID/binlog settings). |
| `tests/functional/mysql/replica.cnf` | MySQL replica config (GTID/binlog settings, read-only). |
| `tests/functional/mysql/init-master.sql` | Initializes users for orchestrator + replication. |
| `tests/functional/mysql/init-replica.sql` | Initializes orchestrator user and sets up replication from primary. |
```yaml
  mysql1:
    image: mysql:8.4
    hostname: mysql1
    environment:
      MYSQL_ROOT_PASSWORD: testpass
```
The functional workflow starts orchestrator as a host process, but these Compose services do not publish MySQL/ProxySQL ports to the runner. A host process also cannot resolve mysql1/proxysql via Docker DNS, so orchestrator won't be able to connect to the topology as configured. Either (a) run orchestrator as a Compose service on orchnet and expose 3099, or (b) publish the needed ports and update the test config/URLs to use host-reachable addresses.
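Option (b) from this comment might be sketched as follows (hypothetical port mappings; the actual host ports would need to match whatever `orchestrator-test.conf.json` expects):

```yaml
# Publish container ports so a host-run orchestrator can reach the services
# on 127.0.0.1 instead of Docker-internal DNS names like mysql1/proxysql.
services:
  mysql1:
    image: mysql:8.4
    ports:
      - "3307:3306"   # host 127.0.0.1:3307 -> container 3306
  proxysql:
    image: proxysql/proxysql:latest
    ports:
      - "6032:6032"   # ProxySQL admin interface
      - "6033:6033"   # ProxySQL MySQL traffic
```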
```diff
@@ -0,0 +1,9 @@
+[mysqld]
+server-id=100
```
Both replicas mount this same config, so they will share the same server-id value. In MySQL replication, each server must have a unique server_id; duplicates can cause replicas to disconnect or behave unpredictably (especially under GTID). Provide distinct configs (e.g., replica2.cnf/replica3.cnf) or override server-id per service.
```diff
-server-id=100
+# server-id must be set uniquely per replica instance (e.g., via instance-specific config)
```
```sh
echo "--- Step 4: Run smoke tests ---"
bash tests/functional/test-smoke.sh
SMOKE_EXIT=$?

echo ""
echo "--- Step 5: Run regression tests ---"
bash tests/functional/test-regression.sh
REGRESSION_EXIT=$?

echo ""
echo "--- Step 6: Run failover tests ---"
bash tests/functional/test-failover.sh
FAILOVER_EXIT=$?
```
Because this script runs with set -e, if any suite script exits non-zero the entire run-all.sh will terminate immediately, and the subsequent *_EXIT=$? assignments and cleanup won't run. Disable -e around the suite execution (or use bash ...; SMOKE_EXIT=$? with set +e), so you can collect all exit codes and always perform teardown.
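The `cmd || rc=$?` pattern this comment alludes to can be shown in a self-contained sketch (hypothetical suite functions, not the PR's actual scripts):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stand-in suites for illustration only.
suite_ok()   { return 0; }
suite_fail() { return 3; }

# A command failing inside an `||` list does not trigger `set -e`,
# so the runner survives and still records each suite's real exit code.
suite_ok   && SMOKE_EXIT=0 || SMOKE_EXIT=$?
suite_fail && FAIL_EXIT=0  || FAIL_EXIT=$?

echo "smoke=$SMOKE_EXIT fail=$FAIL_EXIT"   # smoke=0 fail=3
```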
```sh
echo "Failover: exit $FAILOVER_EXIT"

# Exit with failure if any suite failed
[ "$SMOKE_EXIT" -ne 0 ] || [ "$REGRESSION_EXIT" -ne 0 ] || [ "$FAILOVER_EXIT" -ne 0 ] && exit 1
```
The final failure check is buggy, though not in the direction one might expect: `||` and `&&` have equal precedence and chain left-to-right, so the line evaluates as `((A || B) || C) && exit 1`. When any suite failed, `exit 1` fires as intended — but when all suites passed, every `[ ... ]` is false and the overall list's exit status is non-zero, so if this is the script's last command the script reports failure even on a fully green run. Use an explicit `if` (or group with `{ ...; }`) so the script exits non-zero only when a suite exit code is non-zero.
```diff
-[ "$SMOKE_EXIT" -ne 0 ] || [ "$REGRESSION_EXIT" -ne 0 ] || [ "$FAILOVER_EXIT" -ne 0 ] && exit 1
+if [ "$SMOKE_EXIT" -ne 0 ] || [ "$REGRESSION_EXIT" -ne 0 ] || [ "$FAILOVER_EXIT" -ne 0 ]; then
+  exit 1
+fi
```
```yaml
      - orchnet

  proxysql:
    image: proxysql/proxysql:latest
```
Using proxysql/proxysql:latest makes the functional tests non-reproducible and can break CI when upstream publishes a new image with behavior/config changes. Pin to a known-good ProxySQL version tag (and ideally document the required major/minor).
```diff
-image: proxysql/proxysql:latest
+image: proxysql/proxysql:2.6.3
```
```sh
fail() {
  echo " ❌ FAIL: $1"
  [ -n "$2" ] && echo " $2"
```
With set -u enabled in all test scripts, referencing $2 here will trigger an "unbound variable" error when fail is called with a single argument (which happens throughout the suite). Use a safe expansion like ${2-} (or shift/iterate over args) before testing/printing the optional details parameter.
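The difference is reproducible with two one-liners (a standalone sketch):

```shell
# Plain $2 aborts the whole script under `set -u` when the optional
# argument is absent...
bash -u -c 'f(){ [ -n "$2" ] && echo "$2"; }; f one-arg' 2>/dev/null \
  || echo "unbound variable: aborted"
# ...while ${2-} expands to an empty string and execution continues.
bash -u -c 'f(){ [ -n "${2-}" ] && echo "${2-}"; :; }; f one-arg; echo handled'
```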
```diff
-[ -n "$2" ] && echo " $2"
+[ -n "${2-}" ] && echo " ${2-}"
```
tests/functional/lib.sh (Outdated)
```sh
echo "Waiting for topology discovery..."
for i in $(seq 1 60); do
  local COUNT
  COUNT=$(curl -s "$ORC_URL/api/cluster/$MASTER_HOST:3306" 2>/dev/null | python3 -c "import json,sys; print(len(json.load(sys.stdin)))" 2>/dev/null)
```
discover_topology assumes /api/cluster/... always returns a JSON array; when it returns an error envelope/object (or temporarily invalid JSON while the cluster is being discovered), json.load may fail or len() may yield a non-instance count (e.g., number of object keys), causing flakiness or false positives. Make the parsing resilient by treating non-list/parse errors as count=0 and continuing to retry.
```diff
-COUNT=$(curl -s "$ORC_URL/api/cluster/$MASTER_HOST:3306" 2>/dev/null | python3 -c "import json,sys; print(len(json.load(sys.stdin)))" 2>/dev/null)
+COUNT=$(curl -s "$ORC_URL/api/cluster/$MASTER_HOST:3306" 2>/dev/null | python3 -c 'import json, sys
+try:
+    data = json.load(sys.stdin)
+    if isinstance(data, list):
+        print(len(data))
+    else:
+        print(0)
+except Exception:
+    print(0)
+' 2>/dev/null)
```
- mysql3 gets server-id=200 (was 100, the same as mysql2)
- Replication is configured by setup-replication.sh after containers are up (init scripts run during mysqld startup and can't connect to the master)
- The workflow uses setup-replication.sh instead of inline verification
Actionable comments posted: 9
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/functional.yml:
- Around line 56-65: The current readiness loop only checks mysql2 for
Replica_IO_Running which can leave mysql3 or the SQL thread unready; update the
readiness check (the timeout loop that runs the mysql -uroot -ptestpass "SHOW
REPLICA STATUS\G" command) to verify all replicas (e.g., mysql2 and mysql3) and
that both Replica_IO_Running and Replica_SQL_Running are "Yes" for each host
before proceeding; implement this by iterating the target containers (mysql2,
mysql3) and asserting both fields are "Yes" in the SHOW REPLICA STATUS\G output
(or by checking the results of grepping for "Replica_IO_Running: Yes" and
"Replica_SQL_Running: Yes" per container) and only exit success when every
replica passes, otherwise keep waiting until timeout.
In `@tests/functional/docker-compose.yml`:
- Line 5: Replace floating image tags in the docker-compose.yml to pin images to
immutable versions: locate the image entries "mysql:8.4" (appearing at the three
occurrences) and "proxysql/proxysql:latest" and update them to digest-pinned
image references (e.g., mysql@sha256:<digest> and
proxysql/proxysql@sha256:<digest>) or a specific immutable version tag; fetch
the appropriate digests from the upstream registries (Docker Hub or the image
provider) and substitute them so the functional test suite uses fixed,
reproducible images.
In `@tests/functional/lib.sh`:
- Around line 57-63: The readiness check in wait_for_orchestrator currently
treats any HTTP response (including 4xx/5xx) as success because curl is invoked
with -s -o /dev/null; change the curl invocation in wait_for_orchestrator (the
call hitting "$ORC_URL/api/clusters") to use flags that cause non-2xx responses
to exit nonzero (e.g., replace -s -o /dev/null with -fsS -o /dev/null or
equivalent) so the loop only reports "Orchestrator ready" on actual HTTP
success.
- Around line 33-53: The test functions test_endpoint and test_body_contains
currently let curl failures abort the whole suite and mix stderr into outputs;
update them to run curl without inheriting set -e (e.g., append || true or
capture exit code immediately), capture stdout and stderr separately (don’t use
2>&1 into the asserted payload), check curl’s exit status and treat any non-zero
exit as a test failure (call fail with diagnostic including the curl stderr and
exit code), and only then assert on the HTTP code or body content; reference
these changes inside the test_endpoint and test_body_contains functions so
failures are recorded instead of aborting the script.
- Around line 9-22: The three reporting functions (pass, fail, skip) are unsafe
under set -euo pipefail: change the post-increment arithmetic ((...++)) to a
safe assignment form (e.g. PASS_COUNT=$((PASS_COUNT+1)),
SKIP_COUNT=$((SKIP_COUNT+1)), FAIL_COUNT=$((FAIL_COUNT+1))) so the increment
never produces a failing exit status, and make fail() guard the optional second
parameter by using a parameter expansion fallback (e.g. "${2:-}") or testing
with "${2+set}" before echoing so referencing $2 cannot trigger an unbound
variable error; update the functions pass, fail, and skip accordingly (ensuring
their echo lines remain) to preserve behavior under strict mode.
In `@tests/functional/mysql/replica.cnf`:
- Line 2: The replica.cnf currently hardcodes server-id=100 which causes
duplicate IDs; remove the server-id line from tests/functional/mysql/replica.cnf
and instead add per-service command overrides in the docker-compose service
definitions for mysql2 and mysql3 (e.g., set --server-id=2 for mysql2 and
--server-id=3 for mysql3) so each replica has a unique MySQL server-id; update
the mysql2 and mysql3 service blocks to include the command override (or
equivalent entrypoint args) rather than relying on the shared replica.cnf.
In `@tests/functional/run-all.sh`:
- Around line 68-80: The script currently aborts early due to set -e when any
test script fails, so wrap each test invocation to prevent immediate exit and
still capture its exit code: run each suite as "bash
tests/functional/test-smoke.sh || true" (and similarly for test-regression.sh
and test-failover.sh), then immediately assign the real exit status to
SMOKE_EXIT, REGRESSION_EXIT and FAILOVER_EXIT via "$?" (e.g., bash ... || true;
SMOKE_EXIT=$?), ensuring the cleanup/summary and trap handlers always run; keep
the rest of the logic unchanged so the final summary uses those variables.
- Around line 48-55: The loop that sets REPL_OK using command substitution exits
under set -euo pipefail when grep finds no match; change the command to avoid
failing the pipeline (e.g., append "|| true" or use grep -q with || true and
capture exit status/output) and parse the mysql SHOW REPLICA STATUS output
safely; extend the check to run against both replicas (mysql2 and mysql3) and
validate both fields Replica_IO_Running and Replica_SQL_Running (aggregate
counts into REPL_OK or separate variables), retry up to the existing timeout,
and if after retries replication is not fully ready, print a clear error and
exit non-zero so the script fails instead of continuing. Ensure you update the
invocation(s) that use $COMPOSE exec -T mysql ... and the REPL_OK logic
accordingly.
In `@tests/functional/test-failover.sh`:
- Around line 78-99: The script restores MySQL and ProxySQL but doesn't verify
the resulting topology; add explicit checks using the helper functions
mysql_read_only, mysql_source_host, and proxysql_servers (from
tests/functional/lib.sh) before declaring PASS and proceeding to the
hard-failover case. After the discovery curls and sleep, call
mysql_read_only(mysql1) to assert mysql1 is writable (read_only=0),
mysql_read_only(mysql2/mysql3) to assert they are read-only, use
mysql_source_host(mysql2) and mysql_source_host(mysql3) to assert their source
is mysql1, and use proxysql_servers() (or proxysql_servers for hostgroup 10/20)
to verify ProxySQL backends were applied; if any check fails, print diagnostics
and exit non‑zero instead of passing.
ℹ️ Review info — ⚙️ Run configuration
- Configuration used: Organization UI
- Review profile: CHILL
- Plan: Pro
- Run ID: 1895a8cf-54c8-4e03-adda-336b217b2263
📒 Files selected for processing (13)
- .github/workflows/functional.yml
- tests/functional/docker-compose.yml
- tests/functional/lib.sh
- tests/functional/mysql/init-master.sql
- tests/functional/mysql/init-replica.sql
- tests/functional/mysql/master.cnf
- tests/functional/mysql/replica.cnf
- tests/functional/orchestrator-test.conf.json
- tests/functional/proxysql/proxysql.cnf
- tests/functional/run-all.sh
- tests/functional/test-failover.sh
- tests/functional/test-regression.sh
- tests/functional/test-smoke.sh
.github/workflows/functional.yml (Outdated)
```sh
timeout 30 bash -c '
  while true; do
    REPL=$(docker compose exec -T mysql2 mysql -uroot -ptestpass -Nse "SHOW REPLICA STATUS\G" 2>/dev/null | grep "Replica_IO_Running: Yes" || true)
    if [ -n "$REPL" ]; then
      echo "Replication running"
      exit 0
    fi
    sleep 1
  done
' || { echo "Replication not running"; exit 1; }
```
Wait for full replica readiness before starting the suites.
This only checks mysql2 and only Replica_IO_Running. CI can proceed with mysql3 still broken or with a stopped SQL thread, which makes the later failover assertions flaky against a partially healthy topology.
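A readiness check covering both replicas and both threads might look like the sketch below. Note `replica_status` is stubbed here for illustration — in CI it would stand in for the real `docker compose exec -T "$host" mysql -uroot -ptestpass -e "SHOW REPLICA STATUS\G"` call:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stub standing in for the real docker compose exec / SHOW REPLICA STATUS call.
replica_status() {
  printf 'Replica_IO_Running: Yes\nReplica_SQL_Running: Yes\n'
}

# Succeed only when every replica reports both threads running.
all_replicas_ready() {
  local host status
  for host in mysql2 mysql3; do
    status=$(replica_status "$host")
    echo "$status" | grep -q "Replica_IO_Running: Yes"  || return 1
    echo "$status" | grep -q "Replica_SQL_Running: Yes" || return 1
  done
}

all_replicas_ready && echo "all replicas ready"
```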
```yaml
services:
  mysql1:
    image: mysql:8.4
```
🧩 Analysis chain

🏁 Script executed:
```sh
#!/bin/bash
set -euo pipefail
echo "Declared images:"
rg -n '^\s*image:\s*' tests/functional/docker-compose.yml
echo
echo "Floating tags detected (latest or major.minor only):"
rg -nP 'image:\s*\S+:(latest|\d+\.\d+)$' tests/functional/docker-compose.yml || true
```
Repository: ProxySQL/orchestrator — Length of output: 355
Pin container images to immutable versions (ideally digest-pinned).
Floating tags (mysql:8.4 and proxysql/proxysql:latest) make the functional test suite non-reproducible and vulnerable to upstream image drift. Each pull of the image may yield different contents, causing tests to break without code changes.
Apply to lines 5, 21, 40 (all mysql:8.4), and line 59 (proxysql/proxysql:latest).
🔧 Proposed fix
```diff
-    image: mysql:8.4
+    image: mysql:8.4.x@sha256:<digest>
-    image: proxysql/proxysql:latest
+    image: proxysql/proxysql:<stable-version>@sha256:<digest>
```
```sh
pass() {
  echo " ✅ PASS: $1"
  ((PASS_COUNT++))
}

fail() {
  echo " ❌ FAIL: $1"
  [ -n "$2" ] && echo " $2"
  ((FAIL_COUNT++))
}

skip() {
  echo " ⚠️ SKIP: $1"
  ((SKIP_COUNT++))
```
🧩 Analysis chain

🏁 Script executed:
```sh
#!/bin/bash
bash -euo pipefail -c 'count=0; ((count++)); echo "unreached"'
bash -euo pipefail -c 'f(){ [ -n "$2" ] && echo "$2"; }; f only-one-arg'
```
Repository: ProxySQL/orchestrator — Length of output: 106

🏁 Script executed:
```sh
#!/bin/bash
# Check lines 1-8 to see if counters are initialized
head -n 30 tests/functional/lib.sh
```
Repository: ProxySQL/orchestrator — Length of output: 566

🏁 Script executed:
```sh
#!/bin/bash
# Test 1: Post-increment at 0 returns old value (0), causing exit status 1 under set -e
bash -euo pipefail -c 'count=0; ((count++)); echo "reached: $count"'
echo "Test 1 exit code: $?"
# Test 2: Post-increment at 1+ returns old value, causing exit status 0 (truthy)
bash -euo pipefail -c 'count=1; ((count++)); echo "reached: $count"'
echo "Test 2 exit code: $?"
# Test 3: Assignment form always succeeds
bash -euo pipefail -c 'count=0; count=$((count+1)); echo "reached: $count"'
echo "Test 3 exit code: $?"
```
Repository: ProxySQL/orchestrator — Length of output: 146
pass/fail/skip are not safe under the strict mode the suites enable.
The post-increment ((...++)) evaluates to the old value and returns exit status 1 when the counter is 0, causing the script to exit on the first call to any of these functions. Additionally, fail() dereferences $2 without guarding for missing positional parameters, which causes an unbound variable error under set -u when called with a single argument.
Because every suite does set -euo pipefail before sourcing tests/functional/lib.sh, the first PASS, the first SKIP, or any one-argument FAIL exits the script instead of recording the result.
Suggested fix
```diff
 pass() {
   echo " ✅ PASS: $1"
-  ((PASS_COUNT++))
+  PASS_COUNT=$((PASS_COUNT + 1))
 }
 fail() {
   echo " ❌ FAIL: $1"
-  [ -n "$2" ] && echo " $2"
-  ((FAIL_COUNT++))
+  if [ $# -ge 2 ] && [ -n "$2" ]; then
+    echo " $2"
+  fi
+  FAIL_COUNT=$((FAIL_COUNT + 1))
 }
 skip() {
   echo " ⚠️ SKIP: $1"
-  ((SKIP_COUNT++))
+  SKIP_COUNT=$((SKIP_COUNT + 1))
 }
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```sh
pass() {
  echo " ✅ PASS: $1"
  PASS_COUNT=$((PASS_COUNT + 1))
}

fail() {
  echo " ❌ FAIL: $1"
  if [ $# -ge 2 ] && [ -n "$2" ]; then
    echo " $2"
  fi
  FAIL_COUNT=$((FAIL_COUNT + 1))
}

skip() {
  echo " ⚠️ SKIP: $1"
  SKIP_COUNT=$((SKIP_COUNT + 1))
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/functional/lib.sh` around lines 9 - 22, The three reporting functions
(pass, fail, skip) are unsafe under set -euo pipefail: change the post-increment
arithmetic ((...++)) to a safe assignment form (e.g.
PASS_COUNT=$((PASS_COUNT+1)), SKIP_COUNT=$((SKIP_COUNT+1)),
FAIL_COUNT=$((FAIL_COUNT+1))) so the increment never produces a failing exit
status, and make fail() guard the optional second parameter by using a parameter
expansion fallback (e.g. "${2:-}") or testing with "${2+set}" before echoing so
referencing $2 cannot trigger an unbound variable error; update the functions
pass, fail, and skip accordingly (ensuring their echo lines remain) to preserve
behavior under strict mode.
```bash
wait_for_orchestrator() {
    echo "Waiting for orchestrator to be ready..."
    for i in $(seq 1 30); do
        if curl -s -o /dev/null "$ORC_URL/api/clusters" 2>/dev/null; then
            echo "Orchestrator ready after ${i}s"
            return 0
        fi
```
🧩 Analysis chain
🌐 Web query:
Does curl return a zero exit status for HTTP 404/500 responses unless -f/--fail is used?
💡 Result:
Yes, curl returns a zero exit status for HTTP 404 and 500 responses unless the -f or --fail option is used. By default, curl considers the transfer successful as long as the HTTP response is received, regardless of the status code (4xx or 5xx). The --fail option makes curl return a non-zero exit code (typically 22) for HTTP responses with status codes >= 400.
Citations:
- 1: https://curl.se/docs/manpage.html
- 2: https://man7.org/linux/man-pages/man1/curl.1.html
- 3: HTTP error codes not reflected in curl exit value curl/curl#3789
- 4: https://superuser.com/questions/590099/can-i-make-curl-fail-with-an-exitcode-different-than-0-if-the-http-status-code-i
- 5: https://stackoverflow.com/questions/22506024/get-page-output-with-curl-fail
- 6: https://everything.curl.dev/http/response.html
Fix readiness check to require HTTP success response.
curl -s -o /dev/null exits with code 0 for HTTP 404/500 responses, allowing wait_for_orchestrator() to incorrectly report success when the endpoint is failing. Use curl -fsS to make curl fail on HTTP error status codes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/functional/lib.sh` around lines 57 - 63, The readiness check in
wait_for_orchestrator currently treats any HTTP response (including 4xx/5xx) as
success because curl is invoked with -s -o /dev/null; change the curl invocation
in wait_for_orchestrator (the call hitting "$ORC_URL/api/clusters") to use flags
that cause non-2xx responses to exit nonzero (e.g., replace -s -o /dev/null with
-fsS -o /dev/null or equivalent) so the loop only reports "Orchestrator ready"
on actual HTTP success.
```
@@ -0,0 +1,9 @@
[mysqld]
server-id=100
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Replica server-id definitions:"
rg -n 'server-id\s*=' tests/functional/mysql/*.cnf
echo
echo "Where replica.cnf is mounted:"
rg -n 'replica\.cnf' tests/functional/docker-compose.yml -C2

Repository: ProxySQL/orchestrator
Length of output: 638
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Check the full service definitions for mysql2 and mysql3
echo "=== Full mysql2 service definition ==="
rg -A 20 'mysql2:' tests/functional/docker-compose.yml
echo
echo "=== Full mysql3 service definition ==="
rg -A 20 'mysql3:' tests/functional/docker-compose.yml
# Check if there are any other replicas
echo
echo "=== All service definitions ==="
rg '^\s+[a-z]+[a-z0-9]*:$' tests/functional/docker-compose.yml | head -20

Repository: ProxySQL/orchestrator
Length of output: 2156
Assign unique MySQL server-id values per replica.
Both mysql2 and mysql3 services mount the same replica.cnf with server-id=100, violating MySQL's replication requirement that each server have a unique ID. This causes replication connection conflicts and breaks the topology state needed for failover tests.
- Remove the fixed `server-id` from `replica.cnf`
- Add a unique `server-id` per service via a `docker-compose` `command` override (e.g., mysql2: `--server-id=2`, mysql3: `--server-id=3`)
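Once each node reports its id (e.g. via `SELECT @@server_id`), duplicates can be flagged with a one-liner; a sketch with illustrative values:

```shell
#!/bin/bash
# Detect duplicate server-ids from a list gathered off each node.
# The values below mimic the buggy state: both replicas sharing 100.
ids="100 100 1"
dups=$(printf '%s\n' $ids | sort | uniq -d)
if [ -n "$dups" ]; then
  echo "duplicate server-id(s): $dups"
else
  echo "all server-ids unique"
fi
```

A check like this could run in CI right after the topology boots, catching the misconfiguration before replication setup.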
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/functional/mysql/replica.cnf` at line 2, The replica.cnf currently
hardcodes server-id=100 which causes duplicate IDs; remove the server-id line
from tests/functional/mysql/replica.cnf and instead add per-service command
overrides in the docker-compose service definitions for mysql2 and mysql3 (e.g.,
set --server-id=2 for mysql2 and --server-id=3 for mysql3) so each replica has a
unique MySQL server-id; update the mysql2 and mysql3 service blocks to include
the command override (or equivalent entrypoint args) rather than relying on the
shared replica.cnf.
```bash
for i in $(seq 1 30); do
  REPL_OK=$($COMPOSE exec -T mysql2 mysql -uroot -ptestpass -Nse "SHOW REPLICA STATUS\G" 2>/dev/null | grep "Replica_IO_Running: Yes" | wc -l)
  if [ "$REPL_OK" -ge 1 ]; then
    echo "Replication is running"
    break
  fi
  sleep 1
done
```
🧩 Analysis chain
🏁 Script executed:

#!/bin/bash
bash -euo pipefail -c 'x=$(printf "" | grep foo | wc -l); echo "unreached"'

Repository: ProxySQL/orchestrator
Length of output: 47

🏁 Script executed:

cd /root/codebase && find . -name "run-all.sh" -type f

Repository: ProxySQL/orchestrator
Length of output: 129

🏁 Script executed:

cd /root/codebase && sed -n '1,60p' tests/functional/run-all.sh | cat -n

Repository: ProxySQL/orchestrator
Length of output: 129

🏁 Script executed:

cd /root/codebase && head -30 tests/functional/run-all.sh | cat -n

Repository: ProxySQL/orchestrator
Length of output: 129

🏁 Script executed:

pwd && git ls-files tests/functional/run-all.sh

Repository: ProxySQL/orchestrator
Length of output: 111

🏁 Script executed:

cat -n tests/functional/run-all.sh | head -60

Repository: ProxySQL/orchestrator
Length of output: 2250

🏁 Script executed:

head -20 tests/functional/run-all.sh && echo "---" && sed -n '40,65p' tests/functional/run-all.sh | cat -n

Repository: ProxySQL/orchestrator
Length of output: 1648

🏁 Script executed:

cat tests/functional/docker-compose.yml | grep -E "mysql[0-9]|container_name|MYSQL_REPLICATION" -A 2 -B 2

Repository: ProxySQL/orchestrator
Length of output: 973

🏁 Script executed:

rg "mysql3" tests/functional/

Repository: ProxySQL/orchestrator
Length of output: 992

🏁 Script executed:

rg "Replica_SQL_Running" tests/functional/

Repository: ProxySQL/orchestrator
Length of output: 47
The replication verification loop will exit prematurely on first iteration if replication is not immediately ready due to pipefail behavior.
With set -euo pipefail on line 3, the command substitution fails when grep finds no match in the pipeline, causing the script to exit immediately instead of retrying. Additionally:
- Only mysql2 is verified, but mysql3 (another replica in the topology) is not checked
- Only `Replica_IO_Running` is verified; `Replica_SQL_Running` should also be validated
- The loop completes silently without explicit failure if replication never starts, allowing the script to continue to Step 3
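The pipefail interaction is easy to reproduce in a child shell; a minimal sketch:

```shell
#!/bin/bash
# Shows how pipefail turns an empty-grep pipeline into a fatal error
# under `set -e`, and how appending `|| true` inside the substitution defuses it.
bash -euo pipefail -c 'x=$(printf "" | grep foo | wc -l); echo survived' >/dev/null 2>&1
STRICT_RC=$?

bash -euo pipefail -c 'x=$(printf "" | grep foo | wc -l || true); echo survived' >/dev/null 2>&1
GUARDED_RC=$?

echo "strict rc=$STRICT_RC guarded rc=$GUARDED_RC"
```

The first child shell dies at the assignment because `grep` exits 1 and pipefail propagates that status out of the command substitution; the second completes normally.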
Suggested fix
-echo "Verifying replication..."
-for i in $(seq 1 30); do
- REPL_OK=$($COMPOSE exec -T mysql2 mysql -uroot -ptestpass -Nse "SHOW REPLICA STATUS\G" 2>/dev/null | grep "Replica_IO_Running: Yes" | wc -l)
- if [ "$REPL_OK" -ge 1 ]; then
- echo "Replication is running"
- break
- fi
- sleep 1
-done
+echo "Verifying replication..."
+replica_ready() {
+ local container="$1"
+ local status
+ status=$($COMPOSE exec -T "$container" mysql -uroot -ptestpass -Nse "SHOW REPLICA STATUS\G" 2>/dev/null || true)
+ echo "$status" | grep -q "Replica_IO_Running: Yes" &&
+ echo "$status" | grep -q "Replica_SQL_Running: Yes"
+}
+
+for i in $(seq 1 30); do
+ if replica_ready mysql2 && replica_ready mysql3; then
+ echo "Replication is running"
+ break
+ fi
+ if [ "$i" -eq 30 ]; then
+ echo "FATAL: Replication not running on all replicas after 30s"
+ exit 1
+ fi
+ sleep 1
+done

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/functional/run-all.sh` around lines 48 - 55, The loop that sets REPL_OK
using command substitution exits under set -euo pipefail when grep finds no
match; change the command to avoid failing the pipeline (e.g., append "|| true"
or use grep -q with || true and capture exit status/output) and parse the mysql
SHOW REPLICA STATUS output safely; extend the check to run against both replicas
(mysql2 and mysql3) and validate both fields Replica_IO_Running and
Replica_SQL_Running (aggregate counts into REPL_OK or separate variables), retry
up to the existing timeout, and if after retries replication is not fully ready,
print a clear error and exit non-zero so the script fails instead of continuing.
Ensure you update the invocation(s) that use $COMPOSE exec -T mysql ... and the
REPL_OK logic accordingly.
```bash
echo "--- Step 4: Run smoke tests ---"
bash tests/functional/test-smoke.sh
SMOKE_EXIT=$?

echo ""
echo "--- Step 5: Run regression tests ---"
bash tests/functional/test-regression.sh
REGRESSION_EXIT=$?

echo ""
echo "--- Step 6: Run failover tests ---"
bash tests/functional/test-failover.sh
FAILOVER_EXIT=$?
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
bash -euo pipefail -c 'false; echo "unreached"'

Repository: ProxySQL/orchestrator
Length of output: 47
🏁 Script executed:
# Check if the file exists and examine the structure
if [ -f tests/functional/run-all.sh ]; then
echo "=== File exists, checking total lines ==="
wc -l tests/functional/run-all.sh
echo ""
echo "=== Checking for set -e/errexit ==="
head -20 tests/functional/run-all.sh
echo ""
echo "=== Lines 68-80 (the section in review) ==="
sed -n '68,80p' tests/functional/run-all.sh
echo ""
echo "=== Checking if there's a summary/cleanup block after line 80 ==="
tail -30 tests/functional/run-all.sh
else
echo "File not found"
fi

Repository: ProxySQL/orchestrator
Length of output: 2087
set -e prevents the suite summary and cleanup from running on test failure.
If any test suite (smoke, regression, or failover) exits non-zero, the script terminates immediately before capturing its exit code. The cleanup block (killing ORC_PID, stopping Docker Compose) and the final summary are never reached, leaving infrastructure orphaned. All subsequent test suites also fail to run. Wrap each suite call in conditional statements to allow the script to continue and capture exit codes:
Suggested fix
echo ""
echo "--- Step 4: Run smoke tests ---"
-bash tests/functional/test-smoke.sh
-SMOKE_EXIT=$?
+if bash tests/functional/test-smoke.sh; then
+ SMOKE_EXIT=0
+else
+ SMOKE_EXIT=$?
+fi
echo ""
echo "--- Step 5: Run regression tests ---"
-bash tests/functional/test-regression.sh
-REGRESSION_EXIT=$?
+if bash tests/functional/test-regression.sh; then
+ REGRESSION_EXIT=0
+else
+ REGRESSION_EXIT=$?
+fi
echo ""
echo "--- Step 6: Run failover tests ---"
-bash tests/functional/test-failover.sh
-FAILOVER_EXIT=$?
+if bash tests/functional/test-failover.sh; then
+ FAILOVER_EXIT=0
+else
+ FAILOVER_EXIT=$?
+fi🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/functional/run-all.sh` around lines 68 - 80, The script currently
aborts early due to set -e when any test script fails. Wrap each suite
invocation in a conditional that lets the script continue while still capturing
the real exit code, e.g. `if bash tests/functional/test-smoke.sh; then
SMOKE_EXIT=0; else SMOKE_EXIT=$?; fi` (and similarly for test-regression.sh and
test-failover.sh). Avoid `bash ... || true; SMOKE_EXIT=$?`: after `|| true`
succeeds, `$?` is always 0, so the real exit code is lost. This ensures the
cleanup/summary and trap handlers always run; keep the rest of the logic
unchanged so the final summary uses those variables.
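The behavior of the different capture patterns under `set -e` can be checked in isolation; a small self-contained sketch:

```shell
#!/bin/bash
# Compare exit-code capture patterns under `set -e`.

# Unguarded: the script dies before the assignment runs.
bash -e -c 'false; RC=$?; echo "rc=$RC"' >/dev/null 2>&1
UNGUARDED=$?

# if/else: the script survives and records the real exit code.
IFELSE=$(bash -e -c 'if false; then RC=0; else RC=$?; fi; echo "$RC"')

# `|| true` pitfall: $? reflects `true`, not the failed command.
ORTRUE=$(bash -e -c 'false || true; RC=$?; echo "$RC"')

echo "unguarded=$UNGUARDED ifelse=$IFELSE ortrue=$ORTRUE"
```

Only the if/else form both survives errexit and preserves the failing command's status, which is why the suggested fix uses it.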
```bash
# Restore mysql1 as master
docker compose -f tests/functional/docker-compose.yml exec -T mysql1 \
  mysql -uroot -ptestpass -e "STOP REPLICA; RESET REPLICA ALL; SET GLOBAL read_only=0;" 2>/dev/null
docker compose -f tests/functional/docker-compose.yml exec -T mysql2 \
  mysql -uroot -ptestpass -e "STOP REPLICA; CHANGE REPLICATION SOURCE TO SOURCE_HOST='mysql1', SOURCE_PORT=3306, SOURCE_USER='repl', SOURCE_PASSWORD='repl_pass', SOURCE_AUTO_POSITION=1; START REPLICA; SET GLOBAL read_only=1;" 2>/dev/null
docker compose -f tests/functional/docker-compose.yml exec -T mysql3 \
  mysql -uroot -ptestpass -e "STOP REPLICA; CHANGE REPLICATION SOURCE TO SOURCE_HOST='mysql1', SOURCE_PORT=3306, SOURCE_USER='repl', SOURCE_PASSWORD='repl_pass', SOURCE_AUTO_POSITION=1; START REPLICA; SET GLOBAL read_only=1;" 2>/dev/null

# Reset ProxySQL
docker compose -f tests/functional/docker-compose.yml exec -T proxysql \
  mysql -h127.0.0.1 -P6032 -uradmin -pradmin -e \
  "DELETE FROM mysql_servers WHERE hostgroup_id IN (10,20); INSERT INTO mysql_servers (hostgroup_id,hostname,port) VALUES (10,'mysql1',3306),(20,'mysql2',3306),(20,'mysql3',3306); LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;" 2>/dev/null

# Re-discover after topology change
sleep 5
curl -s "$ORC_URL/api/discover/mysql1/3306" > /dev/null
curl -s "$ORC_URL/api/discover/mysql2/3306" > /dev/null
curl -s "$ORC_URL/api/discover/mysql3/3306" > /dev/null
sleep 15

echo "Topology restored, waiting for orchestrator to stabilize..."
pass "Topology restored for hard failover test"
```
Validate the restored topology before starting the hard-failover case.
This section records PASS after a couple of sleeps, but none of the MySQL or ProxySQL rewiring is actually verified. If restore drifted, the DeadMaster test starts from the wrong topology and its outcome is misleading. Re-check tests/functional/lib.sh helpers mysql_read_only, mysql_source_host, and proxysql_servers here before continuing.
Suggested fix
# Re-discover after topology change
sleep 5
curl -s "$ORC_URL/api/discover/mysql1/3306" > /dev/null
curl -s "$ORC_URL/api/discover/mysql2/3306" > /dev/null
curl -s "$ORC_URL/api/discover/mysql3/3306" > /dev/null
sleep 15
echo "Topology restored, waiting for orchestrator to stabilize..."
-pass "Topology restored for hard failover test"
+RO1=$(mysql_read_only mysql1)
+RO2=$(mysql_read_only mysql2)
+RO3=$(mysql_read_only mysql3)
+SRC2=$(mysql_source_host mysql2)
+SRC3=$(mysql_source_host mysql3)
+HG10=$(proxysql_servers 10)
+HG20=$(proxysql_servers 20)
+
+if [ "$RO1" = "0" ] && [ "$RO2" = "1" ] && [ "$RO3" = "1" ] && \
+ [ "$SRC2" = "mysql1" ] && [ "$SRC3" = "mysql1" ] && \
+ echo "$HG10" | grep -q "mysql1" && \
+ echo "$HG20" | grep -q "mysql2" && \
+ echo "$HG20" | grep -q "mysql3"; then
+ pass "Topology restored for hard failover test"
+else
+ fail "Topology restore validation failed" \
+ "RO=$RO1/$RO2/$RO3 SRC=$SRC2/$SRC3 HG10=$HG10 HG20=$HG20"
+fi

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Before:

```bash
# Restore mysql1 as master
docker compose -f tests/functional/docker-compose.yml exec -T mysql1 \
  mysql -uroot -ptestpass -e "STOP REPLICA; RESET REPLICA ALL; SET GLOBAL read_only=0;" 2>/dev/null
docker compose -f tests/functional/docker-compose.yml exec -T mysql2 \
  mysql -uroot -ptestpass -e "STOP REPLICA; CHANGE REPLICATION SOURCE TO SOURCE_HOST='mysql1', SOURCE_PORT=3306, SOURCE_USER='repl', SOURCE_PASSWORD='repl_pass', SOURCE_AUTO_POSITION=1; START REPLICA; SET GLOBAL read_only=1;" 2>/dev/null
docker compose -f tests/functional/docker-compose.yml exec -T mysql3 \
  mysql -uroot -ptestpass -e "STOP REPLICA; CHANGE REPLICATION SOURCE TO SOURCE_HOST='mysql1', SOURCE_PORT=3306, SOURCE_USER='repl', SOURCE_PASSWORD='repl_pass', SOURCE_AUTO_POSITION=1; START REPLICA; SET GLOBAL read_only=1;" 2>/dev/null

# Reset ProxySQL
docker compose -f tests/functional/docker-compose.yml exec -T proxysql \
  mysql -h127.0.0.1 -P6032 -uradmin -pradmin -e \
  "DELETE FROM mysql_servers WHERE hostgroup_id IN (10,20); INSERT INTO mysql_servers (hostgroup_id,hostname,port) VALUES (10,'mysql1',3306),(20,'mysql2',3306),(20,'mysql3',3306); LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;" 2>/dev/null

# Re-discover after topology change
sleep 5
curl -s "$ORC_URL/api/discover/mysql1/3306" > /dev/null
curl -s "$ORC_URL/api/discover/mysql2/3306" > /dev/null
curl -s "$ORC_URL/api/discover/mysql3/3306" > /dev/null
sleep 15

echo "Topology restored, waiting for orchestrator to stabilize..."
pass "Topology restored for hard failover test"
```

After (same restore and rediscovery steps, then validate before passing):

```bash
echo "Topology restored, waiting for orchestrator to stabilize..."
RO1=$(mysql_read_only mysql1)
RO2=$(mysql_read_only mysql2)
RO3=$(mysql_read_only mysql3)
SRC2=$(mysql_source_host mysql2)
SRC3=$(mysql_source_host mysql3)
HG10=$(proxysql_servers 10)
HG20=$(proxysql_servers 20)

if [ "$RO1" = "0" ] && [ "$RO2" = "1" ] && [ "$RO3" = "1" ] && \
   [ "$SRC2" = "mysql1" ] && [ "$SRC3" = "mysql1" ] && \
   echo "$HG10" | grep -q "mysql1" && \
   echo "$HG20" | grep -q "mysql2" && \
   echo "$HG20" | grep -q "mysql3"; then
  pass "Topology restored for hard failover test"
else
  fail "Topology restore validation failed" \
    "RO=$RO1/$RO2/$RO3 SRC=$SRC2/$SRC3 HG10=$HG10 HG20=$HG20"
fi
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/functional/test-failover.sh` around lines 78 - 99, The script restores
MySQL and ProxySQL but doesn't verify the resulting topology; add explicit
checks using the helper functions mysql_read_only, mysql_source_host, and
proxysql_servers (from tests/functional/lib.sh) before declaring PASS and
proceeding to the hard-failover case. After the discovery curls and sleep, call
mysql_read_only(mysql1) to assert mysql1 is writable (read_only=0),
mysql_read_only(mysql2/mysql3) to assert they are read-only, use
mysql_source_host(mysql2) and mysql_source_host(mysql3) to assert their source
is mysql1, and use proxysql_servers() (or proxysql_servers for hostgroup 10/20)
to verify ProxySQL backends were applied; if any check fails, print diagnostics
and exit non‑zero instead of passing.
Actionable comments posted: 1
🧹 Nitpick comments (2)
.github/workflows/functional.yml (1)
56-61: Consider polling orchestrator health instead of a fixed sleep.

A 3-second sleep may be insufficient if orchestrator takes longer to initialize (database setup, port binding, etc.). A health-check poll would be more reliable.
♻️ Example: Poll health endpoint
 bin/orchestrator -config tests/functional/orchestrator-test.conf.json http > /tmp/orchestrator-test.log 2>&1 &
 echo "$!" > /tmp/orchestrator-test.pid
-sleep 3
+for i in $(seq 1 30); do
+  curl -sf http://localhost:3000/api/health >/dev/null && break
+  sleep 1
+done

Adjust the port and endpoint to match your orchestrator configuration.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/functional.yml around lines 56 - 61, Replace the fixed "sleep 3" wait with a health-check polling loop that calls the orchestrator's HTTP health endpoint (after starting bin/orchestrator -config tests/functional/orchestrator-test.conf.json http) until it returns success or a timeout is reached; ensure you still write the PID to /tmp/orchestrator-test.pid, stream or capture /tmp/orchestrator-test.log for debug on failure, and fail the workflow if the health check times out. Use the started process (bin/orchestrator) and its HTTP health endpoint (adjust port/path as needed) for polling with a short interval and a reasonable overall timeout so tests only proceed once the orchestrator is actually ready.

tests/functional/setup-replication.sh (1)
20-20: Consider preserving stderr for debugging.

Suppressing stderr during retries hides diagnostic information (like the authentication error that caused the pipeline failure). Consider logging to a temp file or only suppressing on non-final attempts.
♻️ Example: Log errors on final retry
-  " 2>/dev/null && break
+  " 2>&1 | { [ "$i" -lt 30 ] && cat >/dev/null || cat; } && break

Or simpler: just remove `2>/dev/null` and accept the noise during retries.

🤖 Prompt for AI Agents
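One way to keep diagnostics for the final attempt only; a sketch where `always_fails` stands in for the real mysql invocation and the helper name is illustrative:

```shell
#!/bin/bash
# Retry helper that silences stderr on all but the last attempt.
always_fails() { echo "simulated auth error" >&2; return 1; }

retry_quiet_until_last() {
  local attempts="$1" i
  for i in $(seq 1 "$attempts"); do
    if [ "$i" -lt "$attempts" ]; then
      always_fails 2>/dev/null && return 0
    else
      always_fails && return 0  # final try: stderr passes through
    fi
  done
  return 1
}

# Across 3 attempts, exactly one diagnostic line should escape.
captured=$(retry_quiet_until_last 3 2>&1) || true
echo "captured: $captured"
```

This keeps CI logs quiet during the normal warm-up churn while still surfacing the error that made the last retry fail.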
Verify each finding against the current code and only fix it if needed.
In `@tests/functional/setup-replication.sh` at line 20, The retry loop in
setup-replication.sh suppresses stderr with the redirection "2>/dev/null &&
break", hiding useful diagnostics; update the loop so stderr is preserved for
the final attempt (e.g., write stderr to a temp file and only discard it on
non-final retries, or remove the "2>/dev/null" entirely) so that authentication
and other errors are visible when retries fail; locate the retry construct
containing the literal "2>/dev/null && break" and modify the redirection logic
to log or retain stderr on the final try.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/functional/setup-replication.sh`:
- Around line 12-22: The replication setup fails due to caching_sha2_password
requiring RSA public key exchange; update the CHANGE REPLICATION SOURCE TO
statement used in the setup loop to add the MySQL option
GET_SOURCE_PUBLIC_KEY=1 (or SOURCE_PUBLIC_KEY_PATH, the file-based equivalent)
so the replica can fetch the source's RSA public key for authentication;
modify the block that runs CHANGE REPLICATION SOURCE TO / START REPLICA (the
command that sets
SOURCE_HOST/SOURCE_PORT/SOURCE_USER/SOURCE_PASSWORD/SOURCE_AUTO_POSITION) to
include the public-key option so START REPLICA succeeds without Last_IO_Errno
2061.
---
Nitpick comments:
In @.github/workflows/functional.yml:
- Around line 56-61: Replace the fixed "sleep 3" wait with a health-check
polling loop that calls the orchestrator's HTTP health endpoint (after starting
bin/orchestrator -config tests/functional/orchestrator-test.conf.json http)
until it returns success or a timeout is reached; ensure you still write the PID
to /tmp/orchestrator-test.pid, stream or capture /tmp/orchestrator-test.log for
debug on failure, and fail the workflow if the health check times out. Use the
started process (bin/orchestrator) and its HTTP health endpoint (adjust
port/path as needed) for polling with a short interval and a reasonable overall
timeout so tests only proceed once the orchestrator is actually ready.
In `@tests/functional/setup-replication.sh`:
- Line 20: The retry loop in setup-replication.sh suppresses stderr with the
redirection "2>/dev/null && break", hiding useful diagnostics; update the loop
so stderr is preserved for the final attempt (e.g., write stderr to a temp file
and only discard it on non-final retries, or remove the "2>/dev/null" entirely)
so that authentication and other errors are visible when retries fail; locate
the retry construct containing the literal "2>/dev/null && break" and modify the
redirection logic to log or retain stderr on the final try.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: dc0cdc68-486b-4fe1-b9c4-e9db66e395f1
📒 Files selected for processing (5)
.github/workflows/functional.ymltests/functional/docker-compose.ymltests/functional/mysql/init-replica.sqltests/functional/mysql/replica2.cnftests/functional/setup-replication.sh
✅ Files skipped from review due to trivial changes (2)
- tests/functional/mysql/replica2.cnf
- tests/functional/docker-compose.yml
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/functional/mysql/init-replica.sql
…on SHOW REPLICA STATUS
…scovery
- Remove set -e from test scripts (tests handle failures themselves)
- Discover topology dynamically (cluster name varies by environment)
- Seed all replicas during discovery, not just master
- Use CLUSTER_NAME variable throughout instead of hardcoded names
- Failover test uses dynamic cluster name for takeover API
Actionable comments posted: 4
♻️ Duplicate comments (1)
tests/functional/test-failover.sh (1)
79-99: ⚠️ Potential issue | 🟠 Major

Restore phase still reports PASS without validating restored topology.
This is the same unresolved issue previously flagged: the script can proceed to hard-failover from an unverified baseline.
Proposed fix
 # Re-discover after topology change
 sleep 5
 curl -s "$ORC_URL/api/discover/mysql1/3306" > /dev/null
 curl -s "$ORC_URL/api/discover/mysql2/3306" > /dev/null
 curl -s "$ORC_URL/api/discover/mysql3/3306" > /dev/null
 sleep 15
 echo "Topology restored, waiting for orchestrator to stabilize..."
-pass "Topology restored for hard failover test"
+RO1=$(mysql_read_only mysql1)
+RO2=$(mysql_read_only mysql2)
+RO3=$(mysql_read_only mysql3)
+SRC2=$(mysql_source_host mysql2)
+SRC3=$(mysql_source_host mysql3)
+HG10=$(proxysql_servers 10)
+HG20=$(proxysql_servers 20)
+
+if [ "$RO1" = "0" ] && [ "$RO2" = "1" ] && [ "$RO3" = "1" ] && \
+   [ "$SRC2" = "mysql1" ] && [ "$SRC3" = "mysql1" ] && \
+   echo "$HG10" | grep -q "mysql1" && \
+   echo "$HG20" | grep -q "mysql2" && \
+   echo "$HG20" | grep -q "mysql3"; then
+  pass "Topology restored for hard failover test"
+else
+  fail "Topology restore validation failed" "RO=$RO1/$RO2/$RO3 SRC=$SRC2/$SRC3 HG10=$HG10 HG20=$HG20"
+  summary
+  exit 1
+fi

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/functional/test-failover.sh` around lines 79 - 99, The script currently calls the discovery endpoints (curl .../api/discover/...) and then immediately reports success with pass "Topology restored for hard failover test" without validating the actual cluster state; add a verification step after the discovery/sleep block and before the pass call that queries the orchestrator/topology API (or MySQL status) to assert mysql1 is PRIMARY and mysql2/mysql3 are replicas with replication running and read_only=1, and fail the test (exit non-zero or call the existing fail helper) if any check fails; locate where the discovery curls and the pass "Topology restored for hard failover test" are used and add these checks there.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/functional/test-failover.sh`:
- Line 4: The script performs cd "$(dirname "$0")/../.." without handling
failures, which can cause subsequent relative-path operations to run in the
wrong directory; update the test-failover.sh entry so the repository-root cd is
fail-fast by checking its exit status or enabling errexit (e.g., use a
conditional or set -e) so that if cd "$(dirname "$0")/../.." fails the script
exits immediately and does not continue with invalid relative paths.
In `@tests/functional/test-regression.sh`:
- Line 4: The script currently runs cd "$(dirname "$0")/../.." without checking
its result; update tests/functional/test-regression.sh to explicitly handle a
failing cd by checking its exit status (e.g., test the return value or use ||)
and print a clear error and exit non‑zero if the cd fails before any further
relative path operations or sourcing; reference the existing cd command in the
top of the script and ensure the script halts immediately on failure to avoid
running subsequent commands in the wrong directory.
In `@tests/functional/test-smoke.sh`:
- Line 4: The repository-root cd command (cd "$(dirname "$0")/../..") is
unguarded and can leave the script running from the wrong directory; update that
invocation to fail fast by appending a guard such as || exit 1 (or || { echo
"failed to change directory"; exit 1; }) so the script stops if the chdir fails
and subsequent path-sensitive checks run from the intended repository root.
- Around line 27-31: The smoke test currently treats INST_COUNT >= 2 as success
which is too low for the intended 3‑node topology; update the conditional that
checks INST_COUNT (the curl + python3 JSON length pipeline assigning INST_COUNT)
so it requires at least 3 (e.g., change the numeric comparator from 2 to 3) and
adjust the pass/fail messages produced by pass and fail to reflect the new
expected minimum of 3 instances; keep using the same variables (INST_COUNT,
ORC_URL, CLUSTER_NAME) and the existing pass/fail helpers.
---
Duplicate comments:
In `@tests/functional/test-failover.sh`:
- Around line 79-99: The script currently calls the discovery endpoints (curl
.../api/discover/...) and then immediately reports success with pass "Topology
restored for hard failover test" without validating the actual cluster state;
add a verification step after the discovery/sleep block and before the pass call
that queries the orchestrator/topology API (or MySQL status) to assert mysql1 is
PRIMARY and mysql2/mysql3 are replicas with replication running and read_only=1,
and fail the test (exit non-zero or call the existing fail helper) if any check
fails; locate where the discovery curls and the pass "Topology restored for hard
failover test" are used and add these checks there.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 2a3e65ba-13dc-4c1a-b713-2e394cc66c8b
📒 Files selected for processing (5)
tests/functional/lib.shtests/functional/setup-replication.shtests/functional/test-failover.shtests/functional/test-regression.shtests/functional/test-smoke.sh
✅ Files skipped from review due to trivial changes (1)
- tests/functional/lib.sh
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/functional/setup-replication.sh
```bash
#!/bin/bash
# Tier B: Failover tests — verify failover and ProxySQL hooks against real services
set -uo pipefail  # no -e: we handle failures ourselves
cd "$(dirname "$0")/../.."
```
Add fail-fast handling for repository-root cd.
Continuing after a failed directory change can invalidate all subsequent relative-path commands.
Proposed fix
```diff
-cd "$(dirname "$0")/../.."
+cd "$(dirname "$0")/../.." || exit 1
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```suggestion
cd "$(dirname "$0")/../.." || exit 1
```
🧰 Tools
🪛 Shellcheck (0.11.0)
[warning] 4-4: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/functional/test-failover.sh` at line 4, The script performs cd
"$(dirname "$0")/../.." without handling failures, which can cause subsequent
relative-path operations to run in the wrong directory; update the
test-failover.sh entry so the repository-root cd is fail-fast by checking its
exit status or enabling errexit (e.g., use a conditional or set -e) so that if
cd "$(dirname "$0")/../.." fails the script exits immediately and does not
continue with invalid relative paths.
```bash
#!/bin/bash
# Tier C: Regression tests — verify all API endpoints and features
set -uo pipefail  # no -e: we handle failures ourselves
cd "$(dirname "$0")/../.."
```
Handle cd failure explicitly before sourcing project-relative paths.
If this cd fails, subsequent relative paths run in the wrong directory and produce misleading failures.
Proposed fix
```diff
-cd "$(dirname "$0")/../.."
+cd "$(dirname "$0")/../.." || exit 1
```

📝 Committable suggestion
```suggestion
cd "$(dirname "$0")/../.." || exit 1
```
🧰 Tools
🪛 Shellcheck (0.11.0)
[warning] 4-4: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/functional/test-regression.sh` at line 4, The script currently runs cd
"$(dirname "$0")/../.." without checking its result; update
tests/functional/test-regression.sh to explicitly handle a failing cd by
checking its exit status (e.g., test the return value or use ||) and print a
clear error and exit non‑zero if the cd fails before any further relative path
operations or sourcing; reference the existing cd command in the top of the
script and ensure the script halts immediately on failure to avoid running
subsequent commands in the wrong directory.
```bash
#!/bin/bash
# Tier A: Smoke tests — verify basic functionality against real services
set -uo pipefail  # no -e: we handle failures ourselves
cd "$(dirname "$0")/../.."
```
Guard the repository-root cd with || exit.
Without this, path-sensitive checks may execute from the wrong working directory.
Proposed fix
```diff
-cd "$(dirname "$0")/../.."
+cd "$(dirname "$0")/../.." || exit 1
```

📝 Committable suggestion
```suggestion
cd "$(dirname "$0")/../.." || exit 1
```
🧰 Tools
🪛 Shellcheck (0.11.0)
[warning] 4-4: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/functional/test-smoke.sh` at line 4, The repository-root cd command (cd
"$(dirname "$0")/../..") is unguarded and can leave the script running from the
wrong directory; update that invocation to fail fast by appending a guard such
as || exit 1 (or || { echo "failed to change directory"; exit 1; }) so the
script stops if the chdir fails and subsequent path-sensitive checks run from
the intended repository root.
```bash
INST_COUNT=$(curl -s "$ORC_URL/api/cluster/$CLUSTER_NAME" 2>/dev/null | python3 -c "import json,sys; print(len(json.load(sys.stdin)))" 2>/dev/null || echo "0")
if [ "$INST_COUNT" -ge 2 ]; then
  pass "Instances discovered: $INST_COUNT"
else
  fail "Instances discovered: $INST_COUNT (expected >= 2)"
```
Smoke discovery threshold is too weak for the intended 3-node topology.
This currently passes with only 2 discovered instances, which can mask partial topology failures in Tier A.
Proposed fix
```diff
 INST_COUNT=$(curl -s "$ORC_URL/api/cluster/$CLUSTER_NAME" 2>/dev/null | python3 -c "import json,sys; print(len(json.load(sys.stdin)))" 2>/dev/null || echo "0")
-if [ "$INST_COUNT" -ge 2 ]; then
+if [ "$INST_COUNT" -ge 3 ]; then
   pass "Instances discovered: $INST_COUNT"
 else
-  fail "Instances discovered: $INST_COUNT (expected >= 2)"
+  fail "Instances discovered: $INST_COUNT (expected >= 3)"
 fi
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/functional/test-smoke.sh` around lines 27 - 31, The smoke test
currently treats INST_COUNT >= 2 as success which is too low for the intended
3‑node topology; update the conditional that checks INST_COUNT (the curl +
python3 JSON length pipeline assigning INST_COUNT) so it requires at least 3
(e.g., change the numeric comparator from 2 to 3) and adjust the pass/fail
messages produced by pass and fail to reflect the new expected minimum of 3
instances; keep using the same variables (INST_COUNT, ORC_URL, CLUSTER_NAME) and
the existing pass/fail helpers.
Orchestrator needs to reach MySQL containers by their Docker hostnames (mysql1, mysql2, mysql3). Running it on the host requires /etc/hosts hacks.

Solution: add orchestrator as a Docker Compose service that:
- Builds from source using the golang:1.25.7 image
- Joins the same network as MySQL and ProxySQL
- Port-forwards 3099 to the host for test scripts

Also: expose MySQL and ProxySQL ports for host-side CLI tests.
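Since the commit message above forwards port 3099 to the host, test scripts can poll the compose-run orchestrator from outside the network. The suites' lib.sh already provides a `wait_for_orchestrator` helper; a minimal standalone variant, assuming the forwarded port and the `/health/live` endpoint mentioned elsewhere in this PR, might look like:

```bash
# Illustrative only: poll the host-forwarded port until the liveness
# endpoint answers, or give up after a bounded number of attempts.
wait_for_orchestrator_host() {
  local url="${1:-http://127.0.0.1:3099}" tries="${2:-30}" i
  for ((i = 0; i < tries; i++)); do
    if curl -sf "$url/health/live" >/dev/null 2>&1; then
      return 0  # orchestrator is up
    fi
    sleep 1
  done
  return 1  # timed out
}
```

Callers would typically fail fast on timeout, e.g. `wait_for_orchestrator_host || { echo "FATAL: Orchestrator not reachable"; exit 1; }`.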
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/functional/test-smoke.sh`:
- Line 11: The call to discover_topology currently ignores its exit code so
failures are swallowed; update the invocation to check its return status and
fail the test early (e.g., run discover_topology "mysql1" and if it returns
non-zero then log an error and exit non‑zero). Specifically, after calling
discover_topology, test its exit code (or use `||` to short-circuit) and call
the existing test-failure/cleanup routine or `exit 1` so the suite stops on
discovery timeout; ensure you reference the discover_topology invocation and use
the same logging/teardown helpers used elsewhere in the script.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c9d952b2-1693-48dd-889c-bf252f022fc4
📒 Files selected for processing (3)
- .github/workflows/functional.yml
- tests/functional/docker-compose.yml
- tests/functional/test-smoke.sh
✅ Files skipped from review due to trivial changes (1)
- tests/functional/docker-compose.yml
🚧 Files skipped from review as they are similar to previous changes (1)
- .github/workflows/functional.yml
```bash
# Prerequisites
wait_for_orchestrator || { echo "FATAL: Orchestrator not reachable"; exit 1; }
discover_topology "mysql1"
```
Handle discover_topology failure explicitly.
Line 11 ignores the function’s exit code. If discovery times out, the suite continues and can produce misleading downstream results instead of recording the root failure.
Suggested fix
```diff
-discover_topology "mysql1"
+if ! discover_topology "mysql1"; then
+  fail "Topology discovery timed out" "Expected 3 discovered instances within timeout"
+fi
```

📝 Committable suggestion
```suggestion
if ! discover_topology "mysql1"; then
  fail "Topology discovery timed out" "Expected 3 discovered instances within timeout"
fi
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/functional/test-smoke.sh` at line 11, The call to discover_topology
currently ignores its exit code so failures are swallowed; update the invocation
to check its return status and fail the test early (e.g., run discover_topology
"mysql1" and if it returns non-zero then log an error and exit non‑zero).
Specifically, after calling discover_topology, test its exit code (or use `||`
to short-circuit) and call the existing test-failure/cleanup routine or `exit 1`
so the suite stops on discovery timeout; ensure you reference the
discover_topology invocation and use the same logging/teardown helpers used
elsewhere in the script.
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/functional/test-regression.sh (1)
30-35: Consider extracting the HTTP status code check into a reusable helper. The manual curl call for the 404 test works correctly, but if similar error-code validations are needed elsewhere, a helper like `test_status_code` in `lib.sh` could improve consistency. This is a minor observation; the current implementation is functional.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/functional/test-regression.sh` around lines 30 - 35, Extract the repeated HTTP status check into a reusable helper (e.g., add function test_status_code in lib.sh) and replace the inline curl+comparison in the test with a call to that helper; specifically, move the curl invocation (currently assigned to V2_404 and compared to "404") into test_status_code which accepts a URL and expected code, and update this test to call test_status_code "$ORC_URL/api/v2/instances/nonexistent/9999" 404 so the logic is centralized and can be reused elsewhere.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/functional/test-failover.sh`:
- Around line 152-158: The current grep against RECOVERY_API for the literal
string '"IsSuccessful":true' is fragile to JSON whitespace; replace the grep
check with a small Python JSON parse that reads RECOVERY_API and tests whether
any object in the response has IsSuccessful == True (use the RECOVERY_API
variable, parse it with json.loads, and exit 0/1 accordingly), and then use that
exit code to call pass/fail; locate the block using the RECOVERY_API variable
and the recovery API check to implement this change.
---
Nitpick comments:
In `@tests/functional/test-regression.sh`:
- Around line 30-35: Extract the repeated HTTP status check into a reusable
helper (e.g., add function test_status_code in lib.sh) and replace the inline
curl+comparison in the test with a call to that helper; specifically, move the
curl invocation (currently assigned to V2_404 and compared to "404") into
test_status_code which accepts a URL and expected code, and update this test to
call test_status_code "$ORC_URL/api/v2/instances/nonexistent/9999" 404 so the
logic is centralized and can be reused elsewhere.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 451f234a-dbf6-40ba-8557-61a142374d92
📒 Files selected for processing (3)
- tests/functional/test-failover.sh
- tests/functional/test-regression.sh
- tests/functional/test-smoke.sh
```bash
# Check recovery via API
RECOVERY_API=$(curl -s "$ORC_URL/api/v2/recoveries" 2>/dev/null)
if echo "$RECOVERY_API" | grep -q '"IsSuccessful":true'; then
  pass "Recovery audit: /api/v2/recoveries shows successful recovery"
else
  fail "Recovery audit: no successful recovery in API response"
fi
```
JSON grep pattern may be fragile with whitespace.
The grep for "IsSuccessful":true on line 154 will fail if the JSON response is pretty-printed with spaces (e.g., "IsSuccessful": true). Since Python is already available and used elsewhere in this script, consider using it for reliable JSON parsing.
Proposed fix

```diff
 # Check recovery via API
 RECOVERY_API=$(curl -s "$ORC_URL/api/v2/recoveries" 2>/dev/null)
-if echo "$RECOVERY_API" | grep -q '"IsSuccessful":true'; then
+HAS_SUCCESS=$(echo "$RECOVERY_API" | python3 -c "
+import json, sys
+d = json.load(sys.stdin)
+for r in d.get('data', []):
+    if r.get('IsSuccessful'):
+        print('YES')
+        sys.exit(0)
+print('NO')
+" 2>/dev/null)
+if [ "$HAS_SUCCESS" = "YES" ]; then
   pass "Recovery audit: /api/v2/recoveries shows successful recovery"
 else
   fail "Recovery audit: no successful recovery in API response"
 fi
```

📝 Committable suggestion
```suggestion
# Check recovery via API
RECOVERY_API=$(curl -s "$ORC_URL/api/v2/recoveries" 2>/dev/null)
HAS_SUCCESS=$(echo "$RECOVERY_API" | python3 -c "
import json, sys
d = json.load(sys.stdin)
for r in d.get('data', []):
    if r.get('IsSuccessful'):
        print('YES')
        sys.exit(0)
print('NO')
" 2>/dev/null)
if [ "$HAS_SUCCESS" = "YES" ]; then
  pass "Recovery audit: /api/v2/recoveries shows successful recovery"
else
  fail "Recovery audit: no successful recovery in API response"
fi
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/functional/test-failover.sh` around lines 152 - 158, The current grep
against RECOVERY_API for the literal string '"IsSuccessful":true' is fragile to
JSON whitespace; replace the grep check with a small Python JSON parse that
reads RECOVERY_API and tests whether any object in the response has IsSuccessful
== True (use the RECOVERY_API variable, parse it with json.loads, and exit 0/1
accordingly), and then use that exit code to call pass/fail; locate the block
using the RECOVERY_API variable and the recovery API check to implement this
change.
Closes #46, closes #47, closes #48, closes #49, closes #50
Summary
Every manual test from the QA session is now automated and runs in GitHub Actions.
Infrastructure (`tests/functional/docker-compose.yml`)

Test Suites

- Tier A — Smoke (`test-smoke.sh`): 25+ checks, covering Prometheus `/metrics`, health endpoints (`/health/live`, `/health/ready`, `/health/leader`), ProxySQL CLI (`proxysql-test`, `proxysql-servers`), and the ProxySQL servers API (`/api/proxysql/servers`)
- Tier B — Failover (`test-failover.sh`): 12+ checks, including hard failover (`docker stop mysql1`, verify detection + promotion + ProxySQL update) and a recovery audit via `/api/v2/recoveries`
- Tier C — Regression (`test-regression.sh`): 22+ checks

GitHub Actions (`functional.yml`)

Test plan
Summary by CodeRabbit
Tests
Chores