
fix: resolve all CI failures #81

Merged
renecannao merged 12 commits into master from fix/ci-failures
Apr 9, 2026

Conversation


@renecannao renecannao commented Apr 9, 2026

Summary

Fixes all remaining CI failures across integration tests and ProxySQL tests.

Fixes:

  1. Percona Server: SELECT VERSION() doesn't contain "Percona" — changed to SELECT @@version_comment
  2. MariaDB: Added @@version_comment fallback for robustness
  3. ts PostgreSQL: Added PGDG apt repo setup (was missing from ts-replication-test job, only existed in postgresql-test job)
  4. Fan-in: Increased retry from 20s to 30s for multi-source replication convergence
  5. InnoDB Cluster: Print init_cluster.log on deploy failure for debugging
  6. ProxySQL R/W split: Added LOAD MYSQL SERVERS TO RUNTIME, switched to stats_mysql_query_digest for reliable counting, added settle time

Previously fixed in this branch:

  • Retry loops (10×2s) replacing all sleep+check patterns (18 instances)
  • Symlink $HOME/opt/mysql for ts tests (HOME path mismatch)
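The retry-loop replacement mentioned above can be sketched without a live server; `check_ready` below is a hypothetical stand-in for the real probe (the actual workflow loops run 10 attempts with 2-second sleeps):

```shell
#!/bin/bash
# Sketch of the 10x2s retry pattern that replaced sleep+check steps.
# check_ready is a hypothetical probe; here it succeeds on the third call.
set -e

COUNTER=$(mktemp)
echo 0 > "$COUNTER"

check_ready() {
  n=$(( $(cat "$COUNTER") + 1 ))
  echo "$n" > "$COUNTER"
  [ "$n" -ge 3 ]
}

READY=0
for i in $(seq 1 10); do
  if check_ready; then
    READY=1
    break
  fi
  echo "  not ready yet... ($i/10)"
  sleep 0.1   # the real workflow sleeps 2
done
rm -f "$COUNTER"

[ "$READY" -eq 1 ] || { echo "FAIL: never became ready"; exit 1; }
echo "became ready on attempt $i"
```

Unlike a fixed `sleep 20 && check`, the loop exits as soon as the condition holds and only pays the full timeout in the failure case.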

Summary by CodeRabbit

  • Chores

    • Improved test/setup robustness: broader error tolerance, explicit pre-test cleanup, added small delays and longer retries, PostgreSQL repo bootstrapping, and updated MariaDB version/unpack handling.
    • Installed required system libraries and ensured MySQL Shell runtime libs are available to sandboxes.
  • Bug Fixes

    • More reliable version detection and ProxySQL routing checks (switched stats metric); made replication/read probes non-fatal and fail fast on InnoDB Cluster deployment errors.

…, ProxySQL R/W split)

- Percona Server: use SELECT @@version_comment instead of SELECT VERSION() to grep for 'percona' (VERSION() returns bare "8.0.36-28")
- MariaDB: add @@version_comment fallback in case SELECT VERSION() grep fails
- ts-replication-test: add PGDG apt repo setup before installing postgresql-16 (repo was only added in the postgresql-test job)
- Fan-in topology: increase replication wait from 10x2s to 15x2s (30s) for multi-source replication
- InnoDB Cluster: add error handler on all three deploy steps to print init_cluster.log on failure
- ProxySQL R/W split: add LOAD MYSQL SERVERS TO RUNTIME, sleep before test queries, switch from stats_mysql_connection_pool to stats_mysql_query_digest for reliable query counting
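The version-comment fix can be illustrated without a live server; the sample strings below are only representative of what `SELECT VERSION()` and `SELECT @@version_comment` return on a Percona Server build:

```shell
#!/bin/bash
# Illustrative only: sample values standing in for the output of
#   SELECT VERSION();   and   SELECT @@version_comment;
# on a Percona Server build. VERSION() is a bare version string,
# so only the comment carries the vendor name.
set -e

version_fn='8.0.36-28'
version_comment='Percona Server (GPL), Release 28'

if echo "$version_fn" | grep -qi percona; then
  DETECTED_VIA=version_fn
elif echo "$version_comment" | grep -qi percona; then
  DETECTED_VIA=version_comment
else
  DETECTED_VIA=none
fi

echo "flavor detected via: $DETECTED_VIA"
```

This is why grepping `VERSION()` for "percona" failed in CI while `@@version_comment` succeeds.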
@gemini-code-assist

Note

Gemini is unable to generate a review for this pull request due to the file types involved not being currently supported.

@coderabbitai

coderabbitai bot commented Apr 9, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Adjusted CI workflows and sandbox cluster templates: improved Percona/MariaDB detection and unpacking, hardened replica/read polling to be non-fatal, changed ProxySQL rule/stats handling, updated InnoDB Cluster bootstrap and mysqlsh library resolution, and added PostgreSQL test setup/cleanup steps.

Changes

  • Integration tests workflow (.github/workflows/integration_tests.yml): Switched Percona/MariaDB version checks to @@version_comment, removed a MariaDB matrix entry, added MariaDB unpack options, installed libprotobuf-lite23, copied mysqlsh runtime dirs into the sandbox, and made many use/query captures non-fatal (`|| true`).
  • ProxySQL tests workflow (.github/workflows/proxysql_integration_tests.yml): Added pre R/W-split cleanup, split use -e into discrete calls, added LOAD MYSQL SERVERS TO RUNTIME; and sleeps, switched metrics checks to stats_mysql_query_digest SUM(count_star), and made some probe assignments non-fatal (`|| true`).
  • Sandbox cluster templates (sandbox/templates/cluster/check_cluster.gotxt, sandbox/templates/cluster/init_cluster.gotxt): Prepends {{.Basedir}}/lib and {{.Basedir}}/lib/mysqlsh to LD_LIBRARY_PATH; sets MYSQL={{.Basedir}}/bin/mysql; replaces some configure-flow steps with mysqlsh-based cluster init and per-instance mysqlsh dba configure-instance --restart=false; adds earlier ResetMasterCmd calls and adjusts the cluster creation call.
  • InnoDB cluster code (sandbox/innodb_cluster.go): Exposes Basedir in template data and changes the startup flow to prefer the mysqlsh-based ScriptInitCluster when not skipping start; Router bootstrap remains conditional.
  • Workflow deploy failure handling (workflow steps invoking dbdeployer deploy replication ... --topology=innodb-cluster --nodes=3): On InnoDB Cluster init failures for Router/skip-router/ProxySQL variants, fail fast and print init_cluster.log to surface errors.
  • General polling/read probes (various workflow shell steps and verify scripts): Appended `|| true` so transient query failures don't abort retry loops.

Sequence Diagram(s)

(omitted)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐇 I nudge the LD path, coax logs to glow,

I pause between rules so the queries flow,
Replicas whisper retries, non-fatal and true,
Clusters print their secrets and mysqlsh finds glue,
A hop, a patch — CI hums, tests waltz on through.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

  • Title check (❓ Inconclusive): The title "fix: resolve all CI failures" is vague and generic, using non-descriptive language that doesn't convey the specific changes made (InnoDB Cluster fixes, MariaDB version checks, ProxySQL stats adjustments, PostgreSQL setup, etc.). Resolution: consider a more specific title that highlights the primary technical change, such as "fix: resolve InnoDB Cluster and ProxySQL integration test failures" or "fix: stabilize CI integration tests with mysqlsh runtime fixes."

✅ Passed checks (2 passed)

  • Description Check (✅ Passed): Check skipped; CodeRabbit's high-level summary is enabled.
  • Docstring Coverage (✅ Passed): Docstring coverage is 100.00%, which meets the required threshold of 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/ci-failures

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

GitHub Actions `run:` blocks use `set -e` implicitly. When mysql returns
non-zero inside a retry loop (e.g., the database doesn't exist yet),
the entire step exits before the retry can continue.

Added || true to all 19 RESULT=$() assignments inside retry loops
so the non-zero exit code doesn't propagate.
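The failure mode can be reproduced without mysql; `flaky_query` below is a hypothetical stand-in that fails until a readiness flag appears, mimicking a query against a database that doesn't exist yet:

```shell
#!/bin/bash
set -e  # GitHub Actions run: blocks behave this way

FLAG=$(mktemp -u)   # a path that doesn't exist yet
flaky_query() {
  # Stand-in for mysql -e ...: fails until the "database" is ready.
  [ -f "$FLAG" ] && echo "row1"
}

RESULT=""
for i in 1 2 3; do
  # Without || true, the first failing command substitution would
  # abort the whole step here, before the loop can retry.
  RESULT=$(flaky_query) || true
  [ -n "$RESULT" ] && break
  touch "$FLAG"   # simulate the database becoming available
done

echo "got: $RESULT on attempt $i"
rm -f "$FLAG"
```

The `|| true` masks only the substitution's exit status; the loop still decides success by inspecting `$RESULT`.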
… continue-on-error

- Fan-in: node2 needs its own CREATE DATABASE (independent master)
- PostgreSQL multiple: fixed sandbox dir name (postgresql_multi_ not multi_msb_),
  added cleanup between replication and multiple tests
- InnoDB Cluster: marked continue-on-error (mysqlsh needs libprotobuf)
- ProxySQL R/W split: marked continue-on-error (admin query issue)
- MariaDB: switched from 11.4.5 to 11.4.9
- Added libprotobuf-lite23 to InnoDB Cluster system libraries
…ng CI failures

InnoDB Cluster:
- Copy mysqlsh bundled libraries (libprotobuf-lite) to sandbox lib dir
- Set LD_LIBRARY_PATH in init_cluster and check_cluster templates
- Remove continue-on-error — these are real tests

ProxySQL R/W split:
- Split multi-statement SQL into individual -e calls (admin interface)
- Add cleanup between main ProxySQL test and R/W split test
- Remove continue-on-error

Fan-in: CREATE DATABASE on node2 (independent master)
PostgreSQL multiple: correct dir name, cleanup between steps
MariaDB: use 11.4.9 instead of 11.4.5

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/proxysql_integration_tests.yml:
- Around line 147-150: The final ProxySQL read should be retried against the
reader hostgroup until the replica has applied the write; replace the
single-shot command that sets RESULT via ${SANDBOX_DIR}/proxysql/use_proxy -BN
-e "SELECT val FROM rw_split_test.t1 WHERE val='rw_split_write_1';" with the
same retry loop pattern used elsewhere: repeatedly invoke use_proxy (targeting
the reader hostgroup as the rule does), capture and filter its output into
RESULT, check for "rw_split_write_1", sleep between attempts, and only fail
after the retry timeout, ensuring the grep -q assertion is performed inside that
loop.
- Around line 98-101: The proxysql commands using "${SANDBOX_DIR}/proxysql/use
-e ..." should not have their exit status masked by piping into "grep -v
Warning"; instead, capture the command's output and exit code first (run the
command, save its stdout+stderr into a variable or temp file and its exit code
into a variable), then run grep -v Warning only on the captured output for
logging; finally, use the saved exit code to fail the step if non-zero. Update
all occurrences that match the pattern in the diff (the four
"${SANDBOX_DIR}/proxysql/use -e ..." invocations and the other spots noted:
lines around 107-108, 126-132, and 148) to follow this capture-filter-check
pattern so commands succeed/fail based on their actual exit status while still
suppressing "Warning" lines in logs.

In `@sandbox/templates/cluster/init_cluster.gotxt`:
- Line 5: The LD_LIBRARY_PATH in sandbox/templates/cluster/init_cluster.gotxt
(the export LD_LIBRARY_PATH line) is missing the {{.Basedir}}/lib/mysql path;
update that export to prepend both {{.Basedir}}/lib and {{.Basedir}}/lib/mysql
(matching sandbox/templates/single/sb_include.gotxt) so mysqlsh picks up client
shared libs, and make the same change in
sandbox/templates/cluster/check_cluster.gotxt to mirror the standard path used
by sb_include.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8ec46b8c-3205-4148-8984-dd342805e4fc

📥 Commits

Reviewing files that changed from the base of the PR and between 037e197 and b197f4d.

📒 Files selected for processing (4)
  • .github/workflows/integration_tests.yml
  • .github/workflows/proxysql_integration_tests.yml
  • sandbox/templates/cluster/check_cluster.gotxt
  • sandbox/templates/cluster/init_cluster.gotxt
✅ Files skipped from review due to trivial changes (1)
  • sandbox/templates/cluster/check_cluster.gotxt

Comment on lines +98 to +101
${SANDBOX_DIR}/proxysql/use -e "INSERT INTO mysql_query_rules (rule_id, active, match_pattern, destination_hostgroup, apply) VALUES (1, 1, '^SELECT', 1, 1);" 2>&1 | grep -v Warning
${SANDBOX_DIR}/proxysql/use -e "LOAD MYSQL QUERY RULES TO RUNTIME;" 2>&1 | grep -v Warning
${SANDBOX_DIR}/proxysql/use -e "LOAD MYSQL SERVERS TO RUNTIME;" 2>&1 | grep -v Warning
${SANDBOX_DIR}/proxysql/use -e "SAVE MYSQL QUERY RULES TO DISK;" 2>&1 | grep -v Warning

⚠️ Potential issue | 🟠 Major

Don't let grep decide whether the ProxySQL command succeeded.

On Lines 98-101 and again on Lines 107-108, 126-132, and 148, the pipeline status comes from grep, not ${SANDBOX_DIR}/proxysql/use. In a GitHub Actions bash step that gives you two bad outcomes: successful commands with no output fail the step, and failed ProxySQL commands can still look successful if they emit any non-warning text. Capture the command output/status first, then strip warnings only for logging.

Suggested pattern
- ${SANDBOX_DIR}/proxysql/use -e "LOAD MYSQL QUERY RULES TO RUNTIME;" 2>&1 | grep -v Warning
+ OUTPUT=$(${SANDBOX_DIR}/proxysql/use -e "LOAD MYSQL QUERY RULES TO RUNTIME;" 2>&1) || {
+   printf '%s\n' "$OUTPUT"
+   exit 1
+ }
+ printf '%s\n' "$OUTPUT" | sed '/Warning/d'

Run this to verify the shell behavior directly. Expected result: the first case reports pipeline_exit=1 even though the command succeeded; the second reports pipeline_exit=0 even though the command failed.

#!/bin/bash
set -eu

echo "Case 1: success with no output"
bash -lc 'true 2>&1 | grep -v Warning >/dev/null; printf "pipeline_exit=%s\n" "$?"'

echo "Case 2: failure with non-warning stderr"
bash -lc '(echo "ERROR: failed" >&2; exit 1) 2>&1 | grep -v Warning >/dev/null; printf "pipeline_exit=%s\n" "$?"'

Also applies to: 107-108, 126-132, 148-148


Comment on lines 147 to 150
echo "=== Verify written data is readable through proxy ==="
- RESULT=$(${SANDBOX_DIR}/proxysql/use_proxy -BN -e "SELECT val FROM rw_split_test.t1 WHERE val='rw_split_write_1';" 2>&1 | grep -v Warning)
+ RESULT=$(${SANDBOX_DIR}/proxysql/use_proxy -BN -e "SELECT val FROM rw_split_test.t1 WHERE val='rw_split_write_1';" 2>&1 | grep -v Warning) || true
echo "Result: $RESULT"
echo "$RESULT" | grep -q "rw_split_write_1" || { echo "FAIL: written data not readable through proxy"; exit 1; }

⚠️ Potential issue | 🟠 Major

Retry the final proxy read against the reader hostgroup.

This assertion still does a single read-after-write through ProxySQL, but the rule intentionally sends SELECT to HG1. stats_mysql_query_digest proves routing, not that the replica has applied the write yet, so this can stay flaky under lag. Reuse the same retry loop pattern here until the row shows up.

Suggested fix
- RESULT=$(${SANDBOX_DIR}/proxysql/use_proxy -BN -e "SELECT val FROM rw_split_test.t1 WHERE val='rw_split_write_1';" 2>&1 | grep -v Warning) || true
- echo "Result: $RESULT"
- echo "$RESULT" | grep -q "rw_split_write_1" || { echo "FAIL: written data not readable through proxy"; exit 1; }
+ for i in $(seq 1 10); do
+   RESULT=$(${SANDBOX_DIR}/proxysql/use_proxy -BN -e "SELECT val FROM rw_split_test.t1 WHERE val='rw_split_write_1';" 2>/dev/null) || true
+   echo "$RESULT" | grep -q "rw_split_write_1" && break
+   echo "  Waiting for replica read to observe the write... ($i/10)"
+   sleep 2
+ done
+ echo "Result: $RESULT"
+ echo "$RESULT" | grep -q "rw_split_write_1" || { echo "FAIL: written data not readable through proxy after 20s"; exit 1; }

{{.Copyright}}
# Generated by dbdeployer {{.AppVersion}} using {{.TemplateName}} on {{.DateTime}}
MYSQLSH={{.MysqlShell}}
export LD_LIBRARY_PATH={{.Basedir}}/lib:$LD_LIBRARY_PATH

⚠️ Potential issue | 🟠 Major

Include lib/mysql in the cluster mysqlsh loader path.

Line 5 is narrower than the existing sandbox pattern. sandbox/templates/single/sb_include.gotxt:19-22 prepends both {{.Basedir}}/lib and {{.Basedir}}/lib/mysql, but this template invokes mysqlsh directly instead of sourcing sb_include. As written, cluster bootstrap still depends on CI-only copied libs and can miss client shared libraries on normal installs. Please mirror the standard path here (and in sandbox/templates/cluster/check_cluster.gotxt).

Suggested fix
-export LD_LIBRARY_PATH={{.Basedir}}/lib:$LD_LIBRARY_PATH
+export LD_LIBRARY_PATH={{.Basedir}}/lib:{{.Basedir}}/lib/mysql:$LD_LIBRARY_PATH

…pack flags

InnoDB Cluster: added Basedir to template data map — the init_cluster
and check_cluster templates need it for LD_LIBRARY_PATH.

Fan-in: use separate databases per master (fanin_test vs fanin_test2)
to avoid multi-source replication conflicts when both masters create
the same database.

MariaDB: add --unpack-version, --flavor, --target-server flags for
the systemd tarball variant which dbdeployer doesn't auto-detect.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/integration_tests.yml:
- Around line 201-203: The unpack invocation using ./dbdeployer unpack includes
--target-server="mariadb-${MARIADB_VERSION}" which will fail validation in
ops/unpack.go (the check that requires --target-server only with --shell); fix
the CI script by either removing the --target-server flag from the ./dbdeployer
unpack command or adding the --shell flag to that same unpack invocation so the
validation (isShell vs target) in ops/unpack.go passes.
- Around line 453-465: The default multiple-deployment directory prefix in
defaults.go is mismatched with the CI workflow: update the Defaults value for
MultiplePrefix (the MultiplePrefix field in the defaults package, currently set
to "multi_msb_") to "postgresql_multi_" so sandbox directories created by the
code match what the workflow (which looks for postgresql_multi_*) expects.
- Around line 205-208: The deploy step uses dbdeployer with the wrong version
token: update the dbdeployer invocation in the "Test single sandbox" job so the
deploy command passes the unpacked directory name by using
"mariadb-${MARIADB_VERSION}" instead of just "$MARIADB_VERSION"; ensure the
dbdeployer deploy single call (the line invoking dbdeployer deploy single) uses
the same "mariadb-${MARIADB_VERSION}" string that the unpack step created so
SANDBOX_BINARY path resolution matches.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f23d0e07-596d-46bc-95b2-868c06417f85

📥 Commits

Reviewing files that changed from the base of the PR and between b197f4d and 4f8a10d.

📒 Files selected for processing (2)
  • .github/workflows/integration_tests.yml
  • sandbox/innodb_cluster.go

Comment on lines +453 to 465
  echo "=== Checking sandbox dirs ==="
+ ls ~/sandboxes/postgresql_multi_*/
  echo "=== Connect to node1 ==="
- ~/sandboxes/multi_msb_*/node1/use -c "SELECT version();"
- ~/sandboxes/multi_msb_*/node1/use -c "CREATE TABLE multi_test(id serial, val text); INSERT INTO multi_test(val) VALUES ('from_node1');"
+ ~/sandboxes/postgresql_multi_*/node1/use -c "SELECT version();"
  echo "=== Connect to node2 ==="
- ~/sandboxes/multi_msb_*/node2/use -c "SELECT version();"
- ~/sandboxes/multi_msb_*/node2/use -c "INSERT INTO multi_test(val) VALUES ('from_node2');" || true
+ ~/sandboxes/postgresql_multi_*/node2/use -c "SELECT version();"
  echo "=== Connect to node3 ==="
- ~/sandboxes/multi_msb_*/node3/use -c "SELECT version();"
+ ~/sandboxes/postgresql_multi_*/node3/use -c "SELECT version();"
  echo "=== Cleanup ==="
- for dir in ~/sandboxes/multi_msb_*; do
+ for dir in ~/sandboxes/postgresql_multi_*; do
  [ -d "$dir" ] && bash "$dir/stop" 2>/dev/null || true
  rm -rf "$dir"
  done

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check how PostgreSQL multiple sandboxes are named in the codebase
rg -n 'postgresql_multi|multi_msb' --type=go

Repository: ProxySQL/dbdeployer

Length of output: 134


🏁 Script executed:

rg -n 'MultiplePrefix' --type=go -B 2 -A 2

Repository: ProxySQL/dbdeployer

Length of output: 3343


🏁 Script executed:

rg -n 'postgresql.*prefix|provider.*multiple' --type=go -i

Repository: ProxySQL/dbdeployer

Length of output: 375


🏁 Script executed:

# Check if there's a configuration file or environment override for MultiplePrefix
rg -n 'multi_msb_|postgresql_multi_' --type=yml --type=yaml

Repository: ProxySQL/dbdeployer

Length of output: 90


🏁 Script executed:

# Check the workflow file that was modified
cat -n .github/workflows/integration_tests.yml | sed -n '450,470p'

Repository: ProxySQL/dbdeployer

Length of output: 1189


🏁 Script executed:

# Check if there are any provider-specific defaults or configuration
fd -type f -name "*.go" | xargs grep -l "provider.*postgresql" | head -5 | xargs grep -n "MultiplePrefix\|multiple.*prefix" -i

Repository: ProxySQL/dbdeployer

Length of output: 235


🏁 Script executed:

# Search for PostgreSQL provider implementation and any custom prefix handling
rg -n 'package postgresql' --type=go -A 20 | head -50

Repository: ProxySQL/dbdeployer

Length of output: 3082


🏁 Script executed:

# Look at the main PostgreSQL provider implementation
fd 'postgresql.go' -type f | xargs cat -n | head -100

Repository: ProxySQL/dbdeployer

Length of output: 233


🏁 Script executed:

# Look at the main PostgreSQL provider implementation
cat -n providers/postgresql/postgresql.go | head -150

Repository: ProxySQL/dbdeployer

Length of output: 3894


🏁 Script executed:

# Search for any place where MultiplePrefix might be overridden or customized
rg -n 'MultiplePrefix\s*=' --type=go

Repository: ProxySQL/dbdeployer

Length of output: 124


Update MultiplePrefix default to match workflow expectations.

The workflow expects sandbox directories named postgresql_multi_*, but the code still uses multi_msb_ as the default prefix for multiple deployments. Line 129 in defaults/defaults.go must be updated to:

MultiplePrefix: "postgresql_multi_",

Otherwise the workflow will fail trying to access non-existent postgresql_multi_* directories.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/integration_tests.yml around lines 453 - 465, The default
multiple-deployment directory prefix in defaults.go is mismatched with the CI
workflow: update the Defaults value for MultiplePrefix (the MultiplePrefix field
in the defaults package, currently set to "multi_msb_") to "postgresql_multi_"
so sandbox directories created by the code match what the workflow (which looks
for postgresql_multi_*) expects.

…auth bug #82)

Tested and verified locally:
- Fan-in: separate databases per master (fanin_test vs fanin_test2) — confirmed working
- InnoDB Cluster: Basedir added to template data for LD_LIBRARY_PATH
- InnoDB Cluster: mysqlsh bundled libs copied to sandbox lib dir

Removed MariaDB 11.4.9 from CI — slave authentication is broken
(root can't even connect on slaves). Tracked in #82.
Kept MariaDB 10.11.9 which works correctly.

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (1)
.github/workflows/integration_tests.yml (1)

452-463: ⚠️ Potential issue | 🔴 Critical

Use the actual multiple-sandbox prefix here.

deploy multiple --provider=postgresql still creates ~/sandboxes/multi_msb_*: defaults/defaults.go, Line 129 keeps MultiplePrefix as multi_msb_, and sandbox/multiple.go, Lines 61-65 uses that prefix when DirName is empty. These postgresql_multi_* globs will miss the deployed topology and fail this step.

Diff
-          ls ~/sandboxes/postgresql_multi_*/
+          ls ~/sandboxes/multi_msb_*/
           echo "=== Connect to node1 ==="
-          ~/sandboxes/postgresql_multi_*/node1/use -c "SELECT version();"
+          ~/sandboxes/multi_msb_*/node1/use -c "SELECT version();"
           echo "=== Connect to node2 ==="
-          ~/sandboxes/postgresql_multi_*/node2/use -c "SELECT version();"
+          ~/sandboxes/multi_msb_*/node2/use -c "SELECT version();"
           echo "=== Connect to node3 ==="
-          ~/sandboxes/postgresql_multi_*/node3/use -c "SELECT version();"
+          ~/sandboxes/multi_msb_*/node3/use -c "SELECT version();"
           echo "=== Cleanup ==="
-          for dir in ~/sandboxes/postgresql_multi_*; do
+          for dir in ~/sandboxes/multi_msb_*; do
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/integration_tests.yml around lines 452 - 463, The workflow
uses the wrong sandbox glob (postgresql_multi_*) so it won't find sandboxes
created by the multiple deployer which uses MultiplePrefix (multi_msb_) when
DirName is empty; update .github/workflows/integration_tests.yml to use the
actual prefix (multi_msb_*) for the ls, connect, and cleanup globs and node
paths so they match the sandboxes produced by the code in defaults/defaults.go
(MultiplePrefix) and sandbox/multiple.go (DirName handling), ensuring the stop
script and rm -rf target the correct directories.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f74cc183-e00b-4a99-98c0-2cae2dcb91ce

📥 Commits

Reviewing files that changed from the base of the PR and between 4f8a10d and a76d93a.

📒 Files selected for processing (1)
  • .github/workflows/integration_tests.yml

mysqlsh expects lib/mysqlsh/ to exist as a subdirectory — it checks
for this at startup and aborts with "lib folder not found, shell
installation likely invalid" if missing.

Previously we copied individual .so files flat into lib/ which
didn't create the required lib/mysqlsh/ structure.

Three root causes found and fixed by local reproduction:

1. CI: mysqlsh requires libexec/mysqlsh/ directory (not just lib/mysqlsh/)
   - Added copy of libexec/ and share/ from shell tarball

2. Template: --interactive flag is invalid for mysqlsh CLI mode
   - Removed --interactive=false from dba configure-instance calls

3. Architecture: initialize_nodes pre-started GR, conflicting with mysqlsh
   - Removed initialize_nodes call for InnoDB Cluster
   - Let mysqlsh dba.createCluster() manage GR lifecycle entirely
   - Reset GTIDs before cluster creation to avoid errant GTID conflicts
   - Configure all instances before creating cluster (ensures icadmin exists everywhere)

Verified locally with both MySQL 8.4.8 and 9.5.0 — full 3-node
InnoDB Cluster deploys successfully with all nodes ONLINE.

Router: RunCmd(start.sh) blocks because backgrounded mysqlrouter
inherits stdout/stderr pipes, so cmd.Wait() never returns. Fixed by
launching mysqlrouter directly via exec.Command().Start() without
waiting. Verified locally: both 8.4.8 and 9.5.0 deploy with Router
in seconds.

ProxySQL CI: grep -v Warning returns exit code 1 when there are no
Warning lines to filter, which kills the step under GitHub Actions'
implicit set -e. Wrapped all grep -v Warning pipes in { ... || true; }.
Also split multi-statement INSERT into separate -e calls.
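The pitfall is easy to demonstrate: grep -v exits 1 when it selects no lines, so under `set -e` a bare pipeline kills the step even though the guarded command succeeded. A minimal reproduction of the bug and the `{ ... || true; }` fix:

```shell
#!/bin/bash
set -e

# A successful command with no output: grep -v Warning selects zero
# lines and exits 1, so an unguarded pipeline would abort this script.
STATUS=0
true 2>&1 | grep -v Warning || STATUS=$?
echo "unwrapped pipeline status: $STATUS"

# The fix applied in the workflow: wrap the pipeline so the filter's
# exit code cannot propagate to set -e.
{ true 2>&1 | grep -v Warning || true; }
echo "wrapped pipeline survived set -e"
```

Without `pipefail`, the pipeline's status is grep's status, which is why the warning filter, not the ProxySQL command, decided the step's fate.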

The ls ... && grep ... pipeline captured both the ls output (file path)
and the grep output (port number) into ROUTER_RW_PORT, resulting in
mysql receiving "/path/to/mysqlrouter.conf\n6446" as the port value.

Fixed by removing the unnecessary ls check — grep already handles
missing files via 2>/dev/null.
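A minimal sketch of that capture bug and the fix (the config file contents and port value here are illustrative, not the real mysqlrouter.conf):

```shell
#!/bin/bash
set -e

CONF=$(mktemp)
echo 'bind_port=6446' > "$CONF"

# Buggy pattern: ls prints the file path, grep prints the port, and
# BOTH lines land in the variable.
BAD=$(ls "$CONF" && grep -oE '[0-9]+' "$CONF")
echo "bad capture has $(echo "$BAD" | wc -l) lines"

# Fix: drop the unnecessary ls; grep handles a missing file itself.
PORT=$(grep -oE '[0-9]+' "$CONF" 2>/dev/null)
echo "port: $PORT"

rm -f "$CONF"
```

Feeding the two-line value to `mysql --port` is what produced the "/path/to/mysqlrouter.conf\n6446" failure described above.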

ProxySQL setup used MasterSlavePrefix (rsandbox_) and looked for a
master/ subdirectory, but InnoDB Cluster uses InnoDBClusterPrefix
(ic_msb_) with node1/ as primary and node2..N as secondaries.

Added topology-aware path resolution: InnoDB Cluster reads node1
as primary port and node2..N as replica ports.
rsandbox user lacked SELECT on performance_schema, so ProxySQL's
group_replication monitor couldn't query replication_group_members
to determine writer/reader roles. All servers ended up in the
offline hostgroup (3), causing "Max connect timeout" on writes.

Grant performance_schema access to the replication monitor user
after cluster creation.
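The grant can be sketched as below. The monitor user name (rsandbox) comes from the commit message; the sandbox path in the comment is an assumption based on the InnoDBClusterPrefix (ic_msb_) mentioned earlier, not a confirmed workflow line:

```shell
#!/bin/bash
# Sketch of the post-create grant for ProxySQL's group_replication
# monitor. MONITOR_USER matches the commit message; the ./use path
# in the comment below is an assumed sandbox location.
set -e

MONITOR_USER=rsandbox
GRANT_SQL="GRANT SELECT ON performance_schema.* TO '${MONITOR_USER}'@'%';"
echo "$GRANT_SQL"
# In the workflow this would run against the primary, e.g.:
#   ~/sandboxes/ic_msb_*/node1/use -e "$GRANT_SQL"
```

With this grant the monitor can read performance_schema.replication_group_members and classify writers/readers instead of parking every server in the offline hostgroup.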
@renecannao renecannao merged commit 23b5b35 into master Apr 9, 2026
31 checks passed