Fix: Extended query Routing and Statistics Tracking#5388
Conversation
In the extended query protocol flow (Parse/Bind/Execute/Sync messages), queries without matching query rules were incorrectly retaining the current_hostgroup value from the previous query instead of being reset to the default_hostgroup.
This happened because:
- Simple query protocol goes through STATE_SLEEP between queries, which
resets current_hostgroup to default_hostgroup
- Extended query protocol does NOT go through STATE_SLEEP between
Parse/Bind/Execute messages, so current_hostgroup was never reset
The queries_sent counter in stats.stats_pgsql_connection_pool was not being incremented for prepared statement executions (ASYNC_STMT_EXECUTE_START), only for regular queries (ASYNC_QUERY_START) and statement preparation (ASYNC_STMT_PREPARE_START).
Added TAP test to verify query routing works correctly for both text protocol and extended query protocol. Test cases: - Test 1: Basic read/write split (^SELECT -> HG 1, others -> HG 0) - Test 2: Table-specific routing rules - Test 3: SELECT ... FOR UPDATE exception handling Each test validates routing by checking query counts in stats.stats_pgsql_connection_pool for both protocols.
Summary of ChangesHello @rahim-kanji, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses two critical bugs related to PostgreSQL extended query protocol handling within ProxySQL. It ensures that prepared statements are correctly routed to the default hostgroup when no specific rule applies and that their execution statistics are accurately recorded. Additionally, a comprehensive new test has been added to verify the robustness of PostgreSQL query rule routing for both standard and extended query types. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
📝 WalkthroughWalkthroughFixes PostgreSQL extended query protocol issues: prepared-statement executions increment backend query counters and bytes sent; hostgroup selection now uses Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
The pull request addresses two bugs related to PostgreSQL extended query protocol: routing queries without matching rules to the default hostgroup and correctly counting extended query protocol executions in statistics. It also adds a basic read/write split TAP test. The changes improve the correctness and observability of PostgreSQL extended query handling.
| if (transaction_persistent_hostgroup == -1) { | ||
| if (qpo->destination_hostgroup >= 0) { | ||
| current_hostgroup = qpo->destination_hostgroup; | ||
| } else { | ||
| // No query rule matched - use default hostgroup | ||
| current_hostgroup = default_hostgroup; |
There was a problem hiding this comment.
The logic for setting current_hostgroup when qpo->destination_hostgroup is not set (i.e., no query rule matched) has been changed. Previously, if qpo->destination_hostgroup was not set (value < 0), current_hostgroup would retain its previous value. Now, it explicitly defaults to default_hostgroup. This is a significant change in behavior and should be thoroughly tested to ensure it aligns with the intended routing logic, especially for edge cases where default_hostgroup might not be explicitly defined or might lead to unexpected routing.
| __sync_fetch_and_add(&parent->queries_sent, 1); | ||
| update_bytes_sent(query.length + 5); | ||
| statuses.questions++; |
There was a problem hiding this comment.
| case ASYNC_RESYNC_START: | ||
| if (PQpipelineStatus(pgsql_conn) == PQ_PIPELINE_OFF) { | ||
| proxy_warning("Resync not required � connection already synchronized.\n"); | ||
| proxy_warning("Resync not required - connection already synchronized.\n"); |
There was a problem hiding this comment.
The character `` is a replacement character for an unknown or unrepresentable character. This likely indicates a character encoding issue. It should be replaced with a standard hyphen - to ensure proper display across different systems and locales.
proxy_warning("Resync not required - connection already synchronized.\n");|
|
||
| // For prepared statements, we expect 2 queries (PREPARE + EXECUTE or similar) | ||
| // The exact count depends on ProxySQL implementation | ||
| if (new_hid_queries - cur_hid_queries < 1) { |
There was a problem hiding this comment.
The comment states that for prepared statements, 2 queries (PREPARE + EXECUTE or similar) are expected. However, the condition new_hid_queries - cur_hid_queries < 1 only checks for at least one query. If the expectation is strictly 2, the condition should be != 2 or == 0 to catch cases where only one operation is counted or none at all. If the exact count can vary but should be at least one, the current condition is acceptable, but the comment should reflect this flexibility.
There was a problem hiding this comment.
🧹 Nitpick comments (6)
test/tap/tests/pgsql-query_rules_routing-t.cpp (3)
148-170:PQgetvaluenever returnsNULLfor valid row/column; usePQgetisnullinstead.
PQgetvaluereturns an empty string""for SQL NULL values (not a CNULLpointer), so the!= NULLcheck on line 164 is always true whenPQntuples > 0. In practice,atoi("")returns0, which happens to be a reasonable fallback, so this is harmless here. Still, usingPQgetisnullwould be more correct and clearer.Proposed fix
int query_count = -1; - if (PQntuples(res) > 0 && PQgetvalue(res, 0, 0) != NULL) { + if (PQntuples(res) > 0 && !PQgetisnull(res, 0, 0)) { query_count = atoi(PQgetvalue(res, 0, 0)); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/tap/tests/pgsql-query_rules_routing-t.cpp` around lines 148 - 170, The get_hostgroup_query_count function incorrectly checks PQgetvalue(...) != NULL to detect SQL NULLs; replace that check with PQgetisnull(res, 0, 0) to detect NULLs and only call atoi(PQgetvalue(res,0,0)) when PQgetisnull returns false (otherwise keep the fallback -1 or 0 as desired); ensure you include PQgetisnull(res, 0, 0) in the conditional that currently uses PQgetvalue != NULL and leave PQclear(res) and error handling unchanged.
204-231: DEALLOCATE via text protocol may inflate query counts for hostgroup 0.
perform_extended_query_protocolissues aDEALLOCATEviaPQexec(text protocol), which ProxySQL routes as a regular query — likely to the default hostgroup (0). This means for queries expected to route to HG 0, the extended protocol check (new_hid_queries - cur_hid_queries) will include the DEALLOCATE's count. The< 1check on line 433 tolerates this, but the test comments (lines 431–432) should clarify that the DEALLOCATE contributes to the count, to avoid confusion for future maintainers.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/tap/tests/pgsql-query_rules_routing-t.cpp` around lines 204 - 231, The test's extended-protocol helper perform_extended_query_protocol issues DEALLOCATE via PQexec (text protocol), which ProxySQL treats as a regular query and increments hostgroup 0 counts; update the test comments adjacent to the assertion that checks (new_hid_queries - cur_hid_queries) < 1 to explicitly state that the DEALLOCATE issued by perform_extended_query_protocol is sent over the text protocol and will contribute to the query count for hostgroup 0, so the check allows that extra count; reference perform_extended_query_protocol and the assertion that compares new_hid_queries and cur_hid_queries when editing the comments.
341-347:plan()count reflects test cases, not individual query assertions — looks intentional.The TAP plan is set to
dst_hostgroup_tests.size()(3), with each test case producing a singleok()that aggregates all sub-query checks. This is fine, but consider whether finer-grainedok()calls per query would improve debuggability when a test fails. As-is, a failure requires readingdiagoutput to identify which specific query misrouted.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/tap/tests/pgsql-query_rules_routing-t.cpp` around lines 341 - 347, The TAP plan currently uses plan(dst_hostgroup_tests.size()) which counts only test cases, making single aggregated ok() calls per case; update the test to report each individual query as its own TAP test by changing the plan to the total number of queries (e.g. sum of queries per dst_hostgroup_tests) and emit an ok() for each sub-query instead of one per case, or alternatively keep the plan as-is but add finer-grained ok() calls inside the loop that runs through dst_hostgroup_tests and the per-query checks and use diag() only for additional context; locate the plan(), dst_hostgroup_tests usage, and the ok()/diag() calls to implement this change.test/tap/groups/pgsql17-repl/pre-proxysql.bash (2)
40-44: Quote variable expansions to prevent word splitting.
$JENKINS_SCRIPTS_PATHand${INFRA}should be quoted in all path usages.Proposed fix
-source $JENKINS_SCRIPTS_PATH/${INFRA}/.env +source "$JENKINS_SCRIPTS_PATH/${INFRA}/.env" # Start infra (this will configure ProxySQL via docker-proxy-post.bash) -$JENKINS_SCRIPTS_PATH/infra-docker-hoster/docker-compose-init.bash -$JENKINS_SCRIPTS_PATH/${INFRA}/docker-compose-init.bash +"$JENKINS_SCRIPTS_PATH/infra-docker-hoster/docker-compose-init.bash" +"$JENKINS_SCRIPTS_PATH/${INFRA}/docker-compose-init.bash"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/tap/groups/pgsql17-repl/pre-proxysql.bash` around lines 40 - 44, The script uses unquoted variable expansions (e.g., source $JENKINS_SCRIPTS_PATH/${INFRA}/.env and calling $JENKINS_SCRIPTS_PATH/infra-docker-hoster/docker-compose-init.bash and $JENKINS_SCRIPTS_PATH/${INFRA}/docker-compose-init.bash) which can cause word-splitting; update these usages to quote the expansions (e.g., wrap $JENKINS_SCRIPTS_PATH and ${INFRA} in double quotes when constructing the paths) so the source and docker-compose-init.bash invocations become robust against spaces or special characters.
7-7: Quote inner command substitution to prevent word splitting.The inner
$(dirname "$0")result is passed unquoted tobasename, which can break if the path contains spaces (SC2046).Proposed fix
-INFRA=infra-$(basename $(dirname "$0") | sed 's/-g[0-9]//' | sed 's/_.*//') +INFRA=infra-$(basename "$(dirname "$0")" | sed 's/-g[0-9]//' | sed 's/_.*//')🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/tap/groups/pgsql17-repl/pre-proxysql.bash` at line 7, The INFRA assignment uses basename with an unquoted command substitution which can word-split paths; update the INFRA line to pass the dirname result to basename quoted (i.e., change basename $(dirname "$0") to basename "$(dirname "$0")") so the dirname output isn't split by whitespace, keeping the rest of the pipeline (sed calls) unchanged and referencing the INFRA variable, basename and dirname invocations.test/tap/groups/pgsql17-repl/env.sh (1)
1-5: Add a shebang to silence the shellcheck warning.While this file is sourced rather than executed directly, adding a shebang (e.g.,
#!/usr/bin/env bash) is good practice and will silence the SC2148 warning.Proposed fix
+#!/usr/bin/env bash # PostgreSQL 17 Replication Test Group # # Tests for replication, query routing, read-only detection export TEST_PY_TAP_INCL="pgsql-query_rules_routing-t"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/tap/groups/pgsql17-repl/env.sh` around lines 1 - 5, Add a bash shebang as the first line of the env.sh script to silence shellcheck SC2148 and clarify the intended shell (e.g., use "#!/usr/bin/env bash"); keep the rest of the file intact (the TEST_PY_TAP_INCL export and comments) and ensure the shebang appears before any comments or exports so sourcing still works.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@test/tap/groups/pgsql17-repl/env.sh`:
- Around line 1-5: Add a bash shebang as the first line of the env.sh script to
silence shellcheck SC2148 and clarify the intended shell (e.g., use
"#!/usr/bin/env bash"); keep the rest of the file intact (the TEST_PY_TAP_INCL
export and comments) and ensure the shebang appears before any comments or
exports so sourcing still works.
In `@test/tap/groups/pgsql17-repl/pre-proxysql.bash`:
- Around line 40-44: The script uses unquoted variable expansions (e.g., source
$JENKINS_SCRIPTS_PATH/${INFRA}/.env and calling
$JENKINS_SCRIPTS_PATH/infra-docker-hoster/docker-compose-init.bash and
$JENKINS_SCRIPTS_PATH/${INFRA}/docker-compose-init.bash) which can cause
word-splitting; update these usages to quote the expansions (e.g., wrap
$JENKINS_SCRIPTS_PATH and ${INFRA} in double quotes when constructing the paths)
so the source and docker-compose-init.bash invocations become robust against
spaces or special characters.
- Line 7: The INFRA assignment uses basename with an unquoted command
substitution which can word-split paths; update the INFRA line to pass the
dirname result to basename quoted (i.e., change basename $(dirname "$0") to
basename "$(dirname "$0")") so the dirname output isn't split by whitespace,
keeping the rest of the pipeline (sed calls) unchanged and referencing the INFRA
variable, basename and dirname invocations.
In `@test/tap/tests/pgsql-query_rules_routing-t.cpp`:
- Around line 148-170: The get_hostgroup_query_count function incorrectly checks
PQgetvalue(...) != NULL to detect SQL NULLs; replace that check with
PQgetisnull(res, 0, 0) to detect NULLs and only call atoi(PQgetvalue(res,0,0))
when PQgetisnull returns false (otherwise keep the fallback -1 or 0 as desired);
ensure you include PQgetisnull(res, 0, 0) in the conditional that currently uses
PQgetvalue != NULL and leave PQclear(res) and error handling unchanged.
- Around line 204-231: The test's extended-protocol helper
perform_extended_query_protocol issues DEALLOCATE via PQexec (text protocol),
which ProxySQL treats as a regular query and increments hostgroup 0 counts;
update the test comments adjacent to the assertion that checks (new_hid_queries
- cur_hid_queries) < 1 to explicitly state that the DEALLOCATE issued by
perform_extended_query_protocol is sent over the text protocol and will
contribute to the query count for hostgroup 0, so the check allows that extra
count; reference perform_extended_query_protocol and the assertion that compares
new_hid_queries and cur_hid_queries when editing the comments.
- Around line 341-347: The TAP plan currently uses
plan(dst_hostgroup_tests.size()) which counts only test cases, making single
aggregated ok() calls per case; update the test to report each individual query
as its own TAP test by changing the plan to the total number of queries (e.g.
sum of queries per dst_hostgroup_tests) and emit an ok() for each sub-query
instead of one per case, or alternatively keep the plan as-is but add
finer-grained ok() calls inside the loop that runs through dst_hostgroup_tests
and the per-query checks and use diag() only for additional context; locate the
plan(), dst_hostgroup_tests usage, and the ok()/diag() calls to implement this
change.
Fix missing and incorrect update_bytes_sent() calls in PgSQL_Connection async state machine.
|
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@lib/PgSQL_Connection.cpp`:
- Around line 732-738: The bytes_sent calculation double-counts the null
terminator for non-empty portal/statement names: keep the base 7 (which already
includes the trailing '\0') and add only strlen(name) when
query.extended_query_info->stmt_type is 'P' for stmt_client_portal_name, and
likewise add only strlen(query.backend_stmt_name) in the else branch;
update_bytes_sent(bytes_sent) should then be called with this corrected
bytes_sent value so named portals/statements don't add an extra +1.
| size_t bytes_sent = 7 + 5; // 7 for DESCRIBE header, 5 for SYNC/FLUSH | ||
| if (query.extended_query_info->stmt_type == 'P') { | ||
| bytes_sent += query.extended_query_info->stmt_client_portal_name ? (strlen(query.extended_query_info->stmt_client_portal_name) + 1) : 0; | ||
| } else { | ||
| bytes_sent += query.backend_stmt_name ? (strlen(query.backend_stmt_name) + 1) : 0; | ||
| } | ||
| update_bytes_sent(bytes_sent); |
There was a problem hiding this comment.
Off-by-one in bytes_sent for non-empty portal/statement names.
The base 7 already encodes the null terminator of an anonymous (empty-string) name — 1('D') + 4(Int32) + 1(stmt_type) + 1('\0'). For non-NULL names the code then adds strlen(name) + 1, double-counting that null byte.
- Anonymous name (
NULL):7 + 5 + 0 = 12→ matches wire size7 + 5 = 12. ✓ - Named portal
"foo"(len 3): code gives7 + 5 + 4 = 16, but wire size is(1+4+1+3+1) + 5 = 15. ✗
🐛 Proposed fix
- size_t bytes_sent = 7 + 5; // 7 for DESCRIBE header, 5 for SYNC/FLUSH
+ size_t bytes_sent = 7 + 5; // 7 = 'D'(1)+len(4)+stmt_type(1)+null_term(1) for empty name, 5 for SYNC/FLUSH
if (query.extended_query_info->stmt_type == 'P') {
- bytes_sent += query.extended_query_info->stmt_client_portal_name ? (strlen(query.extended_query_info->stmt_client_portal_name) + 1) : 0;
+ bytes_sent += query.extended_query_info->stmt_client_portal_name ? strlen(query.extended_query_info->stmt_client_portal_name) : 0;
} else {
- bytes_sent += query.backend_stmt_name ? (strlen(query.backend_stmt_name) + 1) : 0;
+ bytes_sent += query.backend_stmt_name ? strlen(query.backend_stmt_name) : 0;
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@lib/PgSQL_Connection.cpp` around lines 732 - 738, The bytes_sent calculation
double-counts the null terminator for non-empty portal/statement names: keep the
base 7 (which already includes the trailing '\0') and add only strlen(name) when
query.extended_query_info->stmt_type is 'P' for stmt_client_portal_name, and
likewise add only strlen(query.backend_stmt_name) in the else branch;
update_bytes_sent(bytes_sent) should then be called with this corrected
bytes_sent value so named portals/statements don't add an extra +1.
|
retest this please |



Summary
Fixes two bugs in PostgreSQL extended query protocol (prepared statements):
stats.stats_pgsql_connection_poolAdded Basic read/write split TAP test
Closes #5387
Summary by CodeRabbit
Bug Fixes
Tests