Conversation
Add a MySQL daemon plugin that serves Prometheus text exposition format metrics via an embedded HTTP endpoint. Uses srv_session + command_service to execute SHOW GLOBAL STATUS internally (no network, no auth). Features: - Disabled by default (--prometheus-exporter-enabled=ON to activate) - Configurable port (default 9104) and bind address - Exports ~400 global status variables as mysql_global_status_* metrics - Gauge/untyped type classification for known variables - Plugin self-monitoring: requests_total, errors_total, scrape_duration - Clean shutdown via poll() + atomic flag Includes MTR test suite (basic + metrics_endpoint) and design spec/plan for v2 enhancements.
Extract the inline SHOW GLOBAL STATUS logic from collect_metrics() into separate reusable functions: collect_name_value_query() as a generic helper for 2-column queries, and collect_global_status() as the specific collector. MetricsCollectorCtx now takes a configurable prefix and type function, enabling additional collectors to be added cleanly.
Export all numeric MySQL server configuration variables as Prometheus gauge metrics with the mysql_global_variables_ prefix. Non-numeric values (strings, paths, ON/OFF) are automatically skipped by the existing strtod check.
Add collector that exports InnoDB metrics from information_schema.INNODB_METRICS with proper Prometheus type mapping (InnoDB 'counter' -> Prometheus counter, everything else -> gauge). Uses dedicated 4-column callbacks since INNODB_METRICS returns NAME, SUBSYSTEM, TYPE, COUNT rather than simple name/value pairs.
Add column-name-aware parsing for SHOW REPLICA STATUS that exports 6 replication metrics as gauges: seconds_behind_source, io_running, sql_running, relay_log_space, exec_source_log_pos, and read_source_log_pos. Boolean fields (IO/SQL running) emit 1 for "Yes", 0 otherwise. Gracefully emits nothing on non-replica servers.
Adds a binlog collector that runs SHOW BINARY LOGS and emits two synthetic gauge metrics: mysql_binlog_file_count (number of binary log files) and mysql_binlog_size_bytes_total (sum of all file sizes). If binary logging is disabled the query errors silently and no metrics are emitted.
- Update basic.test with inline documentation and clearer echo messages - Update metrics_endpoint.test to verify all 5 collector prefixes (global status, global variables, InnoDB metrics, binlog) - Add format_validation.test with Perl-based Prometheus exposition format validation (TYPE lines, metric names, value structure) - Record all result files
- Full README with ASCII architecture diagram, configuration table, metric namespace reference, usage examples, type classification, and limitations - Pointer doc in Docs/ directory
📝 WalkthroughWalkthroughAdds a new Prometheus Exporter daemon plugin that serves an HTTP Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant HTTPServer as "Listener Thread (HTTP)"
participant MySQL as "MySQL Command Service"
participant Formatter as "Metrics Formatter"
Client->>HTTPServer: TCP connect + GET /metrics
HTTPServer->>HTTPServer: parse request
HTTPServer->>MySQL: open srv_session / run SHOW GLOBAL STATUS
MySQL-->>Formatter: status rows
Formatter->>Formatter: classify (gauge/untyped) and format
HTTPServer->>MySQL: SHOW GLOBAL VARIABLES / INNODB_METRICS / REPLICA STATUS / SHOW BINARY LOGS
MySQL-->>Formatter: query results
Formatter->>Formatter: map types, convert booleans, aggregate binlog sizes
HTTPServer->>HTTPServer: update Prometheus_exporter_* counters/timers
HTTPServer->>Client: HTTP 200 + Prometheus text payload
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Remove spec and plan docs from docs/superpowers/ -- these were development-time artifacts. User-facing documentation lives in plugin/prometheus_exporter/README.md.
There was a problem hiding this comment.
Pull request overview
Adds an embedded Prometheus /metrics HTTP exporter as a MySQL/VillageSQL daemon plugin, plus an accompanying MTR test suite and documentation, so Prometheus can scrape server metrics without an external sidecar.
Changes:
- Introduces
prometheus_exporterdaemon plugin implementing an in-process HTTP server and 5 SQL-based collectors (status, variables, InnoDB metrics, replica status, binlogs). - Adds a new
prometheus_exporterMTR suite with per-collector endpoint checks and a Perl-based exposition-format validator. - Adds plugin build wiring and end-user documentation (README + Docs pointer) plus internal design/spec docs.
Reviewed changes
Copilot reviewed 25 out of 25 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| plugin/prometheus_exporter/prometheus_exporter.cc | Implements the embedded HTTP server and SQL-driven metric collection/export. |
| plugin/prometheus_exporter/CMakeLists.txt | Adds build rules to produce the prometheus_exporter plugin module. |
| plugin/prometheus_exporter/README.md | Documents architecture, configuration, metric namespaces, and limitations. |
| Docs/prometheus_exporter.md | Provides a Docs/ entry pointing to the plugin README. |
| mysql-test/include/plugin.defs | Registers the plugin for MTR so it can be loaded in test runs. |
| mysql-test/suite/prometheus_exporter/t/basic.test | Verifies install/uninstall lifecycle and sysvar/statusvar presence. |
| mysql-test/suite/prometheus_exporter/t/metrics_endpoint.test | Verifies /metrics endpoint behavior, multiple prefixes, and 404 handling. |
| mysql-test/suite/prometheus_exporter/t/global_variables.test | Validates SHOW GLOBAL VARIABLES metrics export. |
| mysql-test/suite/prometheus_exporter/t/innodb_metrics.test | Validates information_schema.INNODB_METRICS metrics export and typing. |
| mysql-test/suite/prometheus_exporter/t/replica_status.test | Ensures replica metrics are absent on non-replica servers. |
| mysql-test/suite/prometheus_exporter/t/binlog.test | Validates binlog file count/size metrics export. |
| mysql-test/suite/prometheus_exporter/t/format_validation.test | Perl-based structural validation of Prometheus text exposition output. |
| mysql-test/suite/prometheus_exporter/t/metrics_endpoint-master.opt | Test server options (enable plugin, bind/port) for endpoint test. |
| mysql-test/suite/prometheus_exporter/t/global_variables-master.opt | Test server options for global variables collector test. |
| mysql-test/suite/prometheus_exporter/t/innodb_metrics-master.opt | Test server options for InnoDB metrics collector test. |
| mysql-test/suite/prometheus_exporter/t/replica_status-master.opt | Test server options for replica-status collector test. |
| mysql-test/suite/prometheus_exporter/t/binlog-master.opt | Test server options for binlog collector test. |
| mysql-test/suite/prometheus_exporter/t/format_validation-master.opt | Test server options for format-validation test. |
| mysql-test/suite/prometheus_exporter/r/basic.result | Recorded output for basic lifecycle test. |
| mysql-test/suite/prometheus_exporter/r/metrics_endpoint.result | Recorded output for endpoint/all-collector smoke test. |
| mysql-test/suite/prometheus_exporter/r/global_variables.result | Recorded output for global variables collector test. |
| mysql-test/suite/prometheus_exporter/r/innodb_metrics.result | Recorded output for InnoDB metrics collector test. |
| mysql-test/suite/prometheus_exporter/r/replica_status.result | Recorded output for replica-status absence test. |
| mysql-test/suite/prometheus_exporter/r/binlog.result | Recorded output for binlog collector test. |
| mysql-test/suite/prometheus_exporter/r/format_validation.result | Recorded output for exposition-format validation test. |
| docs/superpowers/specs/2026-04-05-prometheus-exporter-v2-design.md | Internal design spec for the v2 exporter. |
| docs/superpowers/plans/2026-04-05-prometheus-exporter-v2.md | Internal step-by-step implementation plan and commands. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
plugin/prometheus_exporter/README.md
Outdated
| │ ┌──────────────���───────────────────────────────┐ │ | ||
| │ │ prometheus_exporter plugin │ │ | ||
| │ │ �� │ | ||
| │ │ ┌──────────────┐ ┌────────────────────┐ │ │ |
There was a problem hiding this comment.
The ASCII architecture diagram contains corrupted/invalid UTF-8 characters (e.g., "���", "��"), which makes the README render poorly and is likely an encoding artifact. Please replace those characters with plain ASCII/Unicode box-drawing characters and ensure the file is saved as valid UTF-8.
| // Switch to root security context | ||
| MYSQL_SECURITY_CONTEXT sc; | ||
| thd_get_security_context(srv_session_info_get_thd(session), &sc); | ||
| security_context_lookup(sc, "root", "localhost", "127.0.0.1", ""); | ||
|
|
There was a problem hiding this comment.
The exporter unconditionally switches the internal session to the "root" security context and ignores return codes from thd_get_security_context/security_context_lookup. This effectively runs the collectors with full privileges and can fail silently depending on account configuration. Consider using a dedicated low-privilege account/context (or the admin session service if appropriate), and handle errors explicitly (e.g., fail the scrape / increment errors_total when context switch fails).
| // Switch to root security context | |
| MYSQL_SECURITY_CONTEXT sc; | |
| thd_get_security_context(srv_session_info_get_thd(session), &sc); | |
| security_context_lookup(sc, "root", "localhost", "127.0.0.1", ""); | |
| // Run collectors using the default internal session context. Do not | |
| // unconditionally elevate this session to the root account. |
| "endpoint. Default 0.0.0.0.", | ||
| nullptr, nullptr, "0.0.0.0"); |
There was a problem hiding this comment.
Defaulting the bind address to 0.0.0.0 while also having no authentication/TLS means enabling the plugin can unintentionally expose internal server metrics on all interfaces. Consider changing the default to 127.0.0.1 (or requiring an explicit bind address when enabled) and logging a warning when binding to a non-loopback address.
| "endpoint. Default 0.0.0.0.", | |
| nullptr, nullptr, "0.0.0.0"); | |
| "endpoint. Default 127.0.0.1.", | |
| nullptr, nullptr, "127.0.0.1"); |
|
|
||
| static void prom_handle_error(void *ctx, uint, const char *, const char *) { | ||
| auto *mc = static_cast<MetricsCollectorCtx *>(ctx); | ||
| mc->error = true; |
There was a problem hiding this comment.
errors_total is never incremented: prom_handle_error()/innodb_handle_error()/replica_handle_error()/binlog_handle_error() only set per-collector flags, but those flags are never checked and g_ctx->errors_total is only read for SHOW STATUS. Please wire collector/scrape failures to increment errors_total (and/or surface an error comment in the scrape output) so the status variable reflects actual failures.
| mc->error = true; | |
| mc->error = true; | |
| if (g_ctx != nullptr) { | |
| ++g_ctx->errors_total; | |
| } |
| // Try to parse as a number; skip non-numeric values (ON/OFF etc.) | ||
| char *end = nullptr; | ||
| double val = strtod(mc->current_value.c_str(), &end); | ||
| if (end == mc->current_value.c_str()) return 0; // not numeric | ||
| (void)val; // we use the string representation directly |
There was a problem hiding this comment.
Numeric parsing accepts partially-numeric strings (e.g., "1abc"): strtod() is only checked for end==start, but the original (possibly invalid) string is emitted as the Prometheus value. This can generate invalid exposition output. Please require that the full string parses as a number (optionally allowing surrounding whitespace) before emitting.
| // Check if numeric | ||
| char *end = nullptr; | ||
| strtod(val.c_str(), &end); | ||
| if (end == val.c_str()) continue; // not numeric, skip | ||
| value_str = val; | ||
| } |
There was a problem hiding this comment.
Replica metric numeric validation has the same partial-parse issue as prom_end_row: it only checks end==start after strtod(), so values like "123abc" would be emitted and break Prometheus parsing. Please require full numeric consumption (optionally allowing trailing whitespace) before emitting.
| char *end = nullptr; | ||
| long long sz = strtoll(bc->current_size.c_str(), &end, 10); | ||
| if (end != bc->current_size.c_str()) { | ||
| bc->total_size += sz; |
There was a problem hiding this comment.
Binlog size parsing uses strtoll() but only checks end!=start; partially-numeric strings (e.g., "123abc") would still be counted. Please validate that the entire field parses as an integer (optionally allowing whitespace) before adding it to total_size.
| char *end = nullptr; | |
| long long sz = strtoll(bc->current_size.c_str(), &end, 10); | |
| if (end != bc->current_size.c_str()) { | |
| bc->total_size += sz; | |
| const char *start = bc->current_size.c_str(); | |
| char *end = nullptr; | |
| long long sz = strtoll(start, &end, 10); | |
| if (end != start) { | |
| while (*end == ' ' || *end == '\t' || *end == '\n' || *end == '\r' || | |
| *end == '\f' || *end == '\v') { | |
| ++end; | |
| } | |
| if (*end == '\0') { | |
| bc->total_size += sz; | |
| } |
| if (strncmp(buf, "GET /metrics", 12) == 0) { | ||
| ctx->requests_total.fetch_add(1, std::memory_order_relaxed); |
There was a problem hiding this comment.
HTTP routing check is too permissive: strncmp("GET /metrics", 12) will also match paths like /metricsXYZ and doesn't ensure the first line is a valid request line. Consider parsing just the request line and matching the exact path "/metrics" (and method GET) before returning 200.
| --source include/not_windows.inc | ||
|
|
||
| --echo # On a non-replica server, no mysql_replica_ metrics should appear | ||
| --exec curl -s http://127.0.0.1:19107/metrics | grep -c "mysql_replica_" || echo "0" |
There was a problem hiding this comment.
This command prints two lines when there are no matches: grep -c already outputs 0, but exits with status 1 which triggers || echo "0". Use || true (or similar) to keep the exit status non-fatal without duplicating output.
| --exec curl -s http://127.0.0.1:19107/metrics | grep -c "mysql_replica_" || echo "0" | |
| --exec curl -s http://127.0.0.1:19107/metrics | grep -c "mysql_replica_" || true |
| int client_fd = accept(ctx->listen_fd, nullptr, nullptr); | ||
| if (client_fd < 0) continue; | ||
|
|
||
| // Read HTTP request | ||
| char buf[4096]; | ||
| ssize_t n = read(client_fd, buf, sizeof(buf) - 1); | ||
| if (n <= 0) { | ||
| close(client_fd); | ||
| continue; | ||
| } |
There was a problem hiding this comment.
The listener performs a blocking read() on the accepted client socket. Because the HTTP server is single-threaded, a client that connects and then stalls (or sends very slowly) can block the exporter indefinitely and prevent Prometheus scrapes (and potentially interfere with plugin shutdown). Consider setting a receive timeout (SO_RCVTIMEO), making the client socket non-blocking + polling for readability, or otherwise enforcing a small read deadline before parsing the request line.
There was a problem hiding this comment.
Actionable comments posted: 9
🧹 Nitpick comments (12)
mysql-test/suite/prometheus_exporter/t/replica_status.test (2)
1-1: Consider documenting why Windows is excluded.Adding a brief comment explaining why
not_windows.incis needed (e.g., curl dependency, or Windows socket behavior differences) would help future maintainers understand the limitation.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@mysql-test/suite/prometheus_exporter/t/replica_status.test` at line 1, Add a short inline comment above the include line referencing include/not_windows.inc that explains why Windows is excluded (for example: reliance on curl/Unix sockets, POSIX-only behavior in the Prometheus exporter tests, or known Windows networking incompatibilities); update the test file's top-of-file section (where the include "include/not_windows.inc" appears) to state the specific reason and any relevant bug/issue tracker IDs so future maintainers understand the limitation.
4-4: Thegrep -c || echo "0"pattern produces two lines of output.When
grep -cfinds no matches, it outputs "0" (the count) but exits with status 1. The|| echo "0"then also outputs "0", resulting in two "0" lines. This matches the result file, but the logic is confusing and redundant.Consider using
grep -c ... || trueto suppress the exit status while keeping the count output, or redirect grep's output and handle it differently.Proposed fix for cleaner output
--echo # On a non-replica server, no mysql_replica_ metrics should appear ---exec curl -s http://127.0.0.1:19107/metrics | grep -c "mysql_replica_" || echo "0" +--exec curl -s http://127.0.0.1:19107/metrics | grep -c "mysql_replica_" || trueAnd update the result file to expect a single "0":
# On a non-replica server, no mysql_replica_ metrics should appear 0 -0 # But other metrics should still be present🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@mysql-test/suite/prometheus_exporter/t/replica_status.test` at line 4, The test currently uses the shell line '--exec curl -s http://127.0.0.1:19107/metrics | grep -c "mysql_replica_" || echo "0"', which prints "0" twice when no matches are found; replace that pattern with '--exec curl -s http://127.0.0.1:19107/metrics | grep -c "mysql_replica_" || true' so grep's count is preserved and non-zero exit is suppressed, and update the test result expectation to a single "0" output (remove the duplicate line) to keep output and assertions consistent.plugin/prometheus_exporter/prometheus_exporter.cc (5)
403-410: Unusedhas_rowfield inReplicaStatusCtx.The
has_rowfield is set totrueinreplica_start_row(line 432) but never read. If not needed for future use, consider removing it for clarity.♻️ Suggested cleanup
struct ReplicaStatusCtx { std::string *output; std::vector<std::string> col_names; std::vector<std::string> col_values; int col_index; - bool has_row; bool error; };And remove the assignment in
replica_start_row:static int replica_start_row(void *ctx) { auto *rc = static_cast<ReplicaStatusCtx *>(ctx); rc->col_values.clear(); rc->col_values.resize(rc->col_names.size()); rc->col_index = 0; - rc->has_row = true; return 0; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/prometheus_exporter/prometheus_exporter.cc` around lines 403 - 410, The struct ReplicaStatusCtx contains an unused field has_row; remove the has_row member from ReplicaStatusCtx and also remove the assignment to it inside replica_start_row so there are no leftover writes to a deleted field; update any references (none expected) and rebuild to ensure no use remains of ReplicaStatusCtx::has_row.
851-863: Thread attribute not destroyed after use.
my_thread_attr_init(&attr)allocates resources that should be released withmy_thread_attr_destroy(&attr)after the thread is created, regardless of success or failure.♻️ Suggested fix
my_thread_attr_t attr; my_thread_attr_init(&attr); my_thread_attr_setdetachstate(&attr, MY_THREAD_CREATE_JOINABLE); if (my_thread_create(&g_ctx->listener_thread, &attr, prometheus_listener_thread, g_ctx) != 0) { LogPluginErr(ERROR_LEVEL, ER_LOG_PRINTF_MSG, "Prometheus exporter: failed to create listener thread"); + my_thread_attr_destroy(&attr); close(g_ctx->listen_fd); delete g_ctx; g_ctx = nullptr; return 1; } + my_thread_attr_destroy(&attr); LogPluginErr(INFORMATION_LEVEL, ER_LOG_PRINTF_MSG,🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/prometheus_exporter/prometheus_exporter.cc` around lines 851 - 863, The thread attribute initialized with my_thread_attr_init(&attr) is not destroyed; call my_thread_attr_destroy(&attr) after attempting to create the thread to free resources whether my_thread_create(&g_ctx->listener_thread, &attr, prometheus_listener_thread, g_ctx) succeeds or fails; add my_thread_attr_destroy(&attr) before the error return path (after cleaning up g_ctx/listen_fd) and also after successful creation (before proceeding) so attr is always destroyed.
119-157: Consider using a hash set for gauge variable lookup.
is_gauge()performs a linear search through ~30 items on every metric emission. While acceptable for current scale, astd::unordered_setwould provide O(1) lookup and better scalability if the list grows.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/prometheus_exporter/prometheus_exporter.cc` around lines 119 - 157, Replace the linear-scan array lookup in is_gauge with an O(1) hash lookup: create a static std::unordered_set<std::string> (or std::string_view with storage) containing all gauge names (populate it once from gauge_variables or via an initializer list), ensure case-insensitive matching by storing lowercased names and lowercasing the input name before lookup (or use a custom case-insensitive hash/equality), and make initialization thread-safe by using a function-local static or std::call_once; then change is_gauge to query that unordered_set instead of iterating through gauge_variables.
52-55: Global service pointers lack internal linkage.
log_biandlog_bsare declared withoutstatic, giving them external linkage. This could cause symbol collisions if another plugin uses the same names. These should bestaticlikereg_srv.♻️ Suggested fix
static SERVICE_TYPE(registry) *reg_srv = nullptr; -SERVICE_TYPE(log_builtins) *log_bi = nullptr; -SERVICE_TYPE(log_builtins_string) *log_bs = nullptr; +static SERVICE_TYPE(log_builtins) *log_bi = nullptr; +static SERVICE_TYPE(log_builtins_string) *log_bs = nullptr;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/prometheus_exporter/prometheus_exporter.cc` around lines 52 - 55, The global service pointers log_bi and log_bs currently have external linkage; make them have internal linkage like reg_srv by declaring them static (i.e., add the static storage specifier to the SERVICE_TYPE(log_builtins) *log_bi and SERVICE_TYPE(log_builtins_string) *log_bs declarations) so they won’t collide with symbols from other plugins; keep SERVICE_TYPE(registry) *reg_srv unchanged.
703-707: Missing return value check onsecurity_context_lookup.The return value of
security_context_lookupis not checked. While failures are rare (root user should always exist), defensive coding would log or handle the error case.🛡️ Suggested defensive check
// Switch to root security context MYSQL_SECURITY_CONTEXT sc; thd_get_security_context(srv_session_info_get_thd(session), &sc); - security_context_lookup(sc, "root", "localhost", "127.0.0.1", ""); + if (security_context_lookup(sc, "root", "localhost", "127.0.0.1", "") != 0) { + srv_session_close(session); + srv_session_deinit_thread(); + return "# Failed to set security context\n"; + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/prometheus_exporter/prometheus_exporter.cc` around lines 703 - 707, The call to security_context_lookup is unchecked; capture its return value and handle errors defensively: call thd_get_security_context(...) into MYSQL_SECURITY_CONTEXT sc as before, then if security_context_lookup(sc, "root", "localhost", "127.0.0.1", "") returns non-zero, log the failure (using the plugin/mysql logging facility available in this module) with context (e.g., include the user/host) and abort the root switch path—return an error code or clean up/exit the function instead of continuing; ensure any resources are left consistent before returning.docs/superpowers/plans/2026-04-05-prometheus-exporter-v2.md (3)
1093-1098: Numeric validation regex doesn't cover Prometheus special values.The regex
/^-?[0-9]+(\.[0-9]+)?([eE][+-]?[0-9]+)?$/doesn't match Prometheus special values like+Inf,-Inf, orNaN. While the plugin currently doesn't emit these values, you may want to extend the regex for future-proofing:unless ($value =~ /^-?[0-9]+(\.[0-9]+)?([eE][+-]?[0-9]+)?$/ || $value =~ /^[+-]?Inf$/ || $value eq 'NaN') {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/superpowers/plans/2026-04-05-prometheus-exporter-v2.md` around lines 1093 - 1098, The numeric validation currently rejects Prometheus special values; update the check around the $value validation (the regex that tests $value before emitting the FORMAT ERROR for $name) to also accept `+Inf`, `-Inf`, and `NaN` — e.g., extend the condition so it passes if the original numeric regex matches OR $value matches /^[+-]?Inf$/ OR $value eq 'NaN', and keep the existing error message and $errors++ path unchanged.
556-577: Minor inconsistency:emplace_backvspush_backin plan vs implementation.The plan uses
rc->col_names.emplace_back(field->col_name)while the actual implementation atprometheus_exporter.cc:423usespush_back. Both are functionally equivalent here, but the plan should match the implementation for clarity.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/superpowers/plans/2026-04-05-prometheus-exporter-v2.md` around lines 556 - 577, The plan and implementation are inconsistent: the plan's replica_field_metadata uses rc->col_names.emplace_back(field->col_name) but the actual implementation (prometheus_exporter.cc: replica_field_metadata) uses push_back; update the plan to match the implementation by replacing emplace_back with rc->col_names.push_back(field->col_name) (reference symbols: ReplicaStatusCtx, replica_field_metadata, col_names) so the plan and code are consistent.
675-676: Outdated note about#include <vector>.The note mentions adding
#include <vector>to the includes, but the actual implementation already includes it (line 46 ofprometheus_exporter.cc). This note can be removed since it's already satisfied.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/superpowers/plans/2026-04-05-prometheus-exporter-v2.md` around lines 675 - 676, Remove the outdated note that says "Note: add `#include <vector>`" from the markdown (the note in the plans doc), because the implementation already includes <vector> in prometheus_exporter.cc; simply delete that line so the doc no longer suggests adding the header.docs/superpowers/specs/2026-04-05-prometheus-exporter-v2-design.md (2)
96-108: Add language specifier to fenced code block.The code block showing internal structure is missing a language specifier. Use
textorplaintextto satisfy markdown linting and improve readability.📝 Suggested fix
-``` +```text 1. Includes, logging refs, system vars, context struct (existing) 2. Prometheus formatting helpers + gauge classification (existing, expanded)As per coding guidelines from static analysis hints:
[warning] 96-96: Fenced code blocks should have a language specified (MD040, fenced-code-language).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/superpowers/specs/2026-04-05-prometheus-exporter-v2-design.md` around lines 96 - 108, The fenced code block that lists the internal structure (the block starting with "1. Includes, logging refs, system vars, context struct" and containing the numbered list) is missing a language specifier; update that triple-backtick fence to include a language token such as text or plaintext (e.g., change ``` to ```text) so the markdown linter rule MD040 is satisfied and readability is improved.
165-202: Add language specifier to ASCII art code block.The ASCII architecture diagram is missing a language specifier. Use
textfor consistency with standard markdown practices.📝 Suggested fix
-``` +```text ┌─────────────────────────────────────────────────────┐ │ VillageSQL Server │As per coding guidelines from static analysis hints:
[warning] 165-165: Fenced code blocks should have a language specified (MD040, fenced-code-language).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/superpowers/specs/2026-04-05-prometheus-exporter-v2-design.md` around lines 165 - 202, The fenced ASCII-art diagram (the triple-backtick code block that begins the big VillageSQL Server diagram) lacks a language specifier; update the opening fence to include the language "text" (i.e., change ``` to ```text) so the diagram is treated as plain text and satisfies MD040/fenced-code-language rules.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@mysql-test/suite/prometheus_exporter/r/innodb_metrics.result`:
- Around line 5-6: The test's comment and metric TYPE disagree: the comment on
the check mentions a "counter" but the metric declaration lists
mysql_innodb_metrics_buffer_pool_reads as a "gauge"; update one to match the
other. Either change the comment to say "gauge" (so it documents the current
TYPE) or change the TYPE line to "counter" to assert the metric is a counter;
ensure you update the metric identifier mysql_innodb_metrics_buffer_pool_reads
accordingly so the test and comment are consistent.
In `@mysql-test/suite/prometheus_exporter/r/replica_status.result`:
- Around line 2-3: The result file replica_status.result contains two "0" lines
but the test replica_status.test only runs one grep -c "mysql_replica_" || echo
"0" command, so remove the extra "0" line from replica_status.result (leaving a
single "0" output) to match the single grep invocation; alternatively if the
test should produce two outputs, add the missing second grep -c "mysql_replica_"
(or equivalent) to replica_status.test so the number of grep outputs matches the
number of result lines.
In `@mysql-test/suite/prometheus_exporter/t/format_validation.test`:
- Around line 33-42: The code sets $expect_metric_name on a valid "# TYPE" line
but can overwrite it with a subsequent TYPE or leave it set at EOF without
reporting an error; update the TYPE-handling logic in the loop (the block that
matches /^# TYPE / and the similar block at lines 73-79) to first check if
$expect_metric_name is defined and, if so, emit a "missing sample for TYPE"
FORMAT ERROR and increment $errors before assigning a new value, and also add a
final check after the main input loop to emit the same error if
$expect_metric_name remains set at EOF so TYPE lines that never receive a
matching metric sample are flagged.
- Around line 59-66: The test currently skips numeric validation for any metric
whose name matches /^mysql_global_variables_/ which lets non-numeric exports
pass; to fix, remove the "$name =~ /^mysql_global_variables_/" exception in the
value-checking unless block so only the intended "_time$" exception remains
(i.e., keep the check for $name =~ /_time$/ but delete the global variables
whitelist), ensuring $value is enforced to match the numeric regex for
mysql_global_variables_* metrics; update the unless that inspects $value and
$name accordingly.
In `@mysql-test/suite/prometheus_exporter/t/innodb_metrics.test`:
- Around line 7-10: The test currently only checks the presence of the metric
families; tighten the assertions to require exact Prometheus types by changing
the grep patterns to match the full type lines for the two metrics: require "^#
TYPE mysql_innodb_metrics_buffer_pool_reads counter$" for
mysql_innodb_metrics_buffer_pool_reads and "^# TYPE
mysql_innodb_metrics_buffer_pool_size gauge$" for
mysql_innodb_metrics_buffer_pool_size (update the two curl|grep commands in
innodb_metrics.test that reference those metric names).
In `@mysql-test/suite/prometheus_exporter/t/metrics_endpoint.test`:
- Around line 4-8: The test description in metrics_endpoint.test claims "all 5
collector prefixes" but the assertions do not check replica metrics; either add
an assertion that the Prometheus output contains the mysql_replica_ prefix
(e.g., verify presence of a sample metric starting with "mysql_replica_") and
ensure that assertion runs only in a replica-capable test setup, or change the
test description to accurately state the number of collectors currently asserted
(e.g., remove "5" or list the actual prefixes asserted). Update the file's top
comment and any related assertions that mention collectors so they remain
consistent with the new check or the narrowed description.
In `@plugin/prometheus_exporter/prometheus_exporter.cc`:
- Around line 100-101: The errors_total atomic is declared and exported but
never incremented; update the code to increment errors_total wherever request or
collection failures occur (for example in prom_handle_error, in the HTTP handler
when collect_metrics() fails, and inside session operation failure paths), or
remove the variable if you prefer not to expose it. Locate the symbols
errors_total and requests_total to find existing counters, then add
errors_total.fetch_add(1, std::memory_order_relaxed) (or equivalent) in
prom_handle_error and any branch that returns an error from collect_metrics() or
session operations so the metric accurately reflects failures. Ensure increments
happen before sending error responses so the exposed status variable stays
consistent.
- Around line 774-784: The read() on the accepted socket (client_fd from accept
in the listener loop) has no receive timeout and can block the single-threaded
exporter; before calling read() in the accept loop (where client_fd is
obtained), set a receive timeout on client_fd via setsockopt(client_fd,
SOL_SOCKET, SO_RCVTIMEO, ...) or alternatively poll() the client_fd with a short
timeout and only call read() when it's ready; update the code around accept/read
(symbols: accept, client_fd, read) to close the socket and continue on timeout
or error.
In `@plugin/prometheus_exporter/README.md`:
- Around line 15-52: The README's ASCII architecture diagram (the
triple-backtick block containing "VillageSQL Server" and "prometheus_exporter
plugin") contains corrupted glyphs (`���`); fix it by replacing those
placeholders with the correct box-drawing characters (e.g., ─ ┌ ┐ └ ┘ │ ├ ┤ ┬ ┴)
or plain ASCII equivalents so the boxes/lines render correctly, and ensure the
file is saved with UTF-8 encoding; update all occurrences of `���` inside that
diagram block to the appropriate characters to restore the intended boxes and
connectors.
---
Nitpick comments:
In `@docs/superpowers/plans/2026-04-05-prometheus-exporter-v2.md`:
- Around line 1093-1098: The numeric validation currently rejects Prometheus
special values; update the check around the $value validation (the regex that
tests $value before emitting the FORMAT ERROR for $name) to also accept `+Inf`,
`-Inf`, and `NaN` — e.g., extend the condition so it passes if the original
numeric regex matches OR $value matches /^[+-]?Inf$/ OR $value eq 'NaN', and
keep the existing error message and $errors++ path unchanged.
- Around line 556-577: The plan and implementation are inconsistent: the plan's
replica_field_metadata uses rc->col_names.emplace_back(field->col_name) but the
actual implementation (prometheus_exporter.cc: replica_field_metadata) uses
push_back; update the plan to match the implementation by replacing emplace_back
with rc->col_names.push_back(field->col_name) (reference symbols:
ReplicaStatusCtx, replica_field_metadata, col_names) so the plan and code are
consistent.
- Around line 675-676: Remove the outdated note that says "Note: add `#include
<vector>`" from the markdown (the note in the plans doc), because the
implementation already includes <vector> in prometheus_exporter.cc; simply
delete that line so the doc no longer suggests adding the header.
In `@docs/superpowers/specs/2026-04-05-prometheus-exporter-v2-design.md`:
- Around line 96-108: The fenced code block that lists the internal structure
(the block starting with "1. Includes, logging refs, system vars, context
struct" and containing the numbered list) is missing a language specifier;
update that triple-backtick fence to include a language token such as text or
plaintext (e.g., change ``` to ```text) so the markdown linter rule MD040 is
satisfied and readability is improved.
- Around line 165-202: The fenced ASCII-art diagram (the triple-backtick code
block that begins the big VillageSQL Server diagram) lacks a language specifier;
update the opening fence to include the language "text" (i.e., change ``` to
```text) so the diagram is treated as plain text and satisfies
MD040/fenced-code-language rules.
In `@mysql-test/suite/prometheus_exporter/t/replica_status.test`:
- Line 1: Add a short inline comment above the include line referencing
include/not_windows.inc that explains why Windows is excluded (for example:
reliance on curl/Unix sockets, POSIX-only behavior in the Prometheus exporter
tests, or known Windows networking incompatibilities); update the test file's
top-of-file section (where the include "include/not_windows.inc" appears) to
state the specific reason and any relevant bug/issue tracker IDs so future
maintainers understand the limitation.
- Line 4: The test currently uses the shell line '--exec curl -s
http://127.0.0.1:19107/metrics | grep -c "mysql_replica_" || echo "0"', which
prints "0" twice when no matches are found; replace that pattern with '--exec
curl -s http://127.0.0.1:19107/metrics | grep -c "mysql_replica_" || true' so
grep's count is preserved and non-zero exit is suppressed, and update the test
result expectation to a single "0" output (remove the duplicate line) to keep
output and assertions consistent.
In `@plugin/prometheus_exporter/prometheus_exporter.cc`:
- Around line 403-410: The struct ReplicaStatusCtx contains an unused field
has_row; remove the has_row member from ReplicaStatusCtx and also remove the
assignment to it inside replica_start_row so there are no leftover writes to a
deleted field; update any references (none expected) and rebuild to ensure no
use remains of ReplicaStatusCtx::has_row.
- Around line 851-863: The thread attribute initialized with
my_thread_attr_init(&attr) is not destroyed; call my_thread_attr_destroy(&attr)
after attempting to create the thread to free resources whether
my_thread_create(&g_ctx->listener_thread, &attr, prometheus_listener_thread,
g_ctx) succeeds or fails; add my_thread_attr_destroy(&attr) before the error
return path (after cleaning up g_ctx/listen_fd) and also after successful
creation (before proceeding) so attr is always destroyed.
- Around line 119-157: Replace the linear-scan array lookup in is_gauge with an
O(1) hash lookup: create a static std::unordered_set<std::string> (or
std::string_view with storage) containing all gauge names (populate it once from
gauge_variables or via an initializer list), ensure case-insensitive matching by
storing lowercased names and lowercasing the input name before lookup (or use a
custom case-insensitive hash/equality), and make initialization thread-safe by
using a function-local static or std::call_once; then change is_gauge to query
that unordered_set instead of iterating through gauge_variables.
- Around line 52-55: The global service pointers log_bi and log_bs currently
have external linkage; make them have internal linkage like reg_srv by declaring
them static (i.e., add the static storage specifier to the
SERVICE_TYPE(log_builtins) *log_bi and SERVICE_TYPE(log_builtins_string) *log_bs
declarations) so they won’t collide with symbols from other plugins; keep
SERVICE_TYPE(registry) *reg_srv unchanged.
- Around line 703-707: The call to security_context_lookup is unchecked; capture
its return value and handle errors defensively: call
thd_get_security_context(...) into MYSQL_SECURITY_CONTEXT sc as before, then if
security_context_lookup(sc, "root", "localhost", "127.0.0.1", "") returns
non-zero, log the failure (using the plugin/mysql logging facility available in
this module) with context (e.g., include the user/host) and abort the root
switch path—return an error code or clean up/exit the function instead of
continuing; ensure any resources are left consistent before returning.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 5dfbf72a-b1ba-4ab5-821a-cb858172c6d2
📒 Files selected for processing (27)
Docs/prometheus_exporter.mddocs/superpowers/plans/2026-04-05-prometheus-exporter-v2.mddocs/superpowers/specs/2026-04-05-prometheus-exporter-v2-design.mdmysql-test/include/plugin.defsmysql-test/suite/prometheus_exporter/r/basic.resultmysql-test/suite/prometheus_exporter/r/binlog.resultmysql-test/suite/prometheus_exporter/r/format_validation.resultmysql-test/suite/prometheus_exporter/r/global_variables.resultmysql-test/suite/prometheus_exporter/r/innodb_metrics.resultmysql-test/suite/prometheus_exporter/r/metrics_endpoint.resultmysql-test/suite/prometheus_exporter/r/replica_status.resultmysql-test/suite/prometheus_exporter/t/basic.testmysql-test/suite/prometheus_exporter/t/binlog-master.optmysql-test/suite/prometheus_exporter/t/binlog.testmysql-test/suite/prometheus_exporter/t/format_validation-master.optmysql-test/suite/prometheus_exporter/t/format_validation.testmysql-test/suite/prometheus_exporter/t/global_variables-master.optmysql-test/suite/prometheus_exporter/t/global_variables.testmysql-test/suite/prometheus_exporter/t/innodb_metrics-master.optmysql-test/suite/prometheus_exporter/t/innodb_metrics.testmysql-test/suite/prometheus_exporter/t/metrics_endpoint-master.optmysql-test/suite/prometheus_exporter/t/metrics_endpoint.testmysql-test/suite/prometheus_exporter/t/replica_status-master.optmysql-test/suite/prometheus_exporter/t/replica_status.testplugin/prometheus_exporter/CMakeLists.txtplugin/prometheus_exporter/README.mdplugin/prometheus_exporter/prometheus_exporter.cc
| --exec curl -s http://127.0.0.1:19106/metrics | grep "^# TYPE mysql_innodb_metrics_buffer_pool_reads " | head -1 | ||
|
|
||
| --echo # Verify a known gauge type metric exists (buffer_pool_size is 'value' type in InnoDB) | ||
| --exec curl -s http://127.0.0.1:19106/metrics | grep "^# TYPE mysql_innodb_metrics_buffer_pool_size " | head -1 |
There was a problem hiding this comment.
Assert exact Prometheus types for the two canonical InnoDB metrics.
These patterns currently only confirm metric family presence; they don’t enforce counter for buffer_pool_reads or gauge for buffer_pool_size.
🔧 Suggested tightening
---exec curl -s http://127.0.0.1:19106/metrics | grep "^# TYPE mysql_innodb_metrics_buffer_pool_reads " | head -1
+--exec curl -s http://127.0.0.1:19106/metrics | grep "^# TYPE mysql_innodb_metrics_buffer_pool_reads counter$" | head -1
---exec curl -s http://127.0.0.1:19106/metrics | grep "^# TYPE mysql_innodb_metrics_buffer_pool_size " | head -1
+--exec curl -s http://127.0.0.1:19106/metrics | grep "^# TYPE mysql_innodb_metrics_buffer_pool_size gauge$" | head -1🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@mysql-test/suite/prometheus_exporter/t/innodb_metrics.test` around lines 7 -
10, The test currently only checks the presence of the metric families; tighten
the assertions to require exact Prometheus types by changing the grep patterns
to match the full type lines for the two metrics: require "^# TYPE
mysql_innodb_metrics_buffer_pool_reads counter$" for
mysql_innodb_metrics_buffer_pool_reads and "^# TYPE
mysql_innodb_metrics_buffer_pool_size gauge$" for
mysql_innodb_metrics_buffer_pool_size (update the two curl|grep commands in
innodb_metrics.test that reference those metric names).
…tness - Default bind address changed from 0.0.0.0 to 127.0.0.1 (no auth endpoint) - Increment errors_total in all four error handlers - Strict HTTP path matching (reject /metricsXYZ) - Add SO_RCVTIMEO on client sockets to prevent slow-client hangs - Full strtod/strtoll consumption checks to reject partial numeric parses - Check return values of security context functions - Fix README corrupted box-drawing characters - Test fixes: grep -c || true, orphaned TYPE detection, comment accuracy
There was a problem hiding this comment.
🧹 Nitpick comments (1)
plugin/prometheus_exporter/README.md (1)
15-15: Consider adding a language identifier to the fenced code block.While not critical for ASCII art diagrams, adding a language identifier (e.g.,
textor leaving it as```) would satisfy the markdownlint rule and is a minor documentation best practice.📝 Proposed fix
-``` +```text ┌─────────────────────────────────────────────────────┐🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/prometheus_exporter/README.md` at line 15, Add a language identifier to the fenced code block in the README so markdownlint is satisfied; locate the ASCII-art fenced block delimited by ``` and change the opening fence to include a language token such as text (e.g., replace the opening ``` with ```text) so the block becomes a labeled code fence while keeping its contents unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@plugin/prometheus_exporter/README.md`:
- Line 15: Add a language identifier to the fenced code block in the README so
markdownlint is satisfied; locate the ASCII-art fenced block delimited by ```
and change the opening fence to include a language token such as text (e.g.,
replace the opening ``` with ```text) so the block becomes a labeled code fence
while keeping its contents unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: bd21e74f-8e15-4551-a7a8-bc072326f157
📒 Files selected for processing (9)
mysql-test/suite/prometheus_exporter/r/basic.resultmysql-test/suite/prometheus_exporter/r/innodb_metrics.resultmysql-test/suite/prometheus_exporter/r/replica_status.resultmysql-test/suite/prometheus_exporter/t/format_validation.testmysql-test/suite/prometheus_exporter/t/innodb_metrics.testmysql-test/suite/prometheus_exporter/t/metrics_endpoint.testmysql-test/suite/prometheus_exporter/t/replica_status.testplugin/prometheus_exporter/README.mdplugin/prometheus_exporter/prometheus_exporter.cc
✅ Files skipped from review due to trivial changes (7)
- mysql-test/suite/prometheus_exporter/r/replica_status.result
- mysql-test/suite/prometheus_exporter/r/innodb_metrics.result
- mysql-test/suite/prometheus_exporter/r/basic.result
- mysql-test/suite/prometheus_exporter/t/replica_status.test
- mysql-test/suite/prometheus_exporter/t/metrics_endpoint.test
- mysql-test/suite/prometheus_exporter/t/innodb_metrics.test
- plugin/prometheus_exporter/prometheus_exporter.cc
🚧 Files skipped from review as they are similar to previous changes (1)
- mysql-test/suite/prometheus_exporter/t/format_validation.test
Thoughtful Review of PR #1I commissioned five parallel specialized review agents (security, concurrency, code quality, test coverage, MySQL plugin API conformance) and synthesized their findings below. The bottom line: the plugin works and has a solid foundation, but it has a number of real correctness, security, and lifecycle bugs that should be addressed before this can be considered production-ready. The v1 scope was intentionally minimal; many of these items are natural follow-ups for a v2. Findings are grouped by theme with severity and concrete fix suggestions. Line numbers reference the current HEAD of the branch. 🚨 Critical / High — Must Fix Before Merge1. Use-after-free race in
|
| Severity | Count |
|---|---|
| Critical/High (must fix) | 8 |
| Medium (should fix) | 7 |
| Code Quality | 6 |
| Test Coverage | 7 |
Recommended Path Forward
Given the scope of the findings, I would suggest landing this PR (v1 foundation is solid) and opening follow-up issues for:
- Security hardening issue — items Fix README installation and Quick Start examples villagesql/villagesql-server#4, Set Default Issue templates villagesql/villagesql-server#9, and the bind-address warning (Feature request: Stored procedures in Python villagesql/villagesql-server#7 in README)
- Concurrency / lifecycle issue — items Embedded Prometheus metrics exporter plugin #1, Data directory initialization issues first time installation villagesql/villagesql-server#2, Five builtin extensions mentioned, only vsql_complex and vsql_simple actually available villagesql/villagesql-server#5, Fix README to clarify only 2 extensions included in source builds villagesql/villagesql-server#6
- HTTP robustness issue — items Streamlining installation instructions villagesql/villagesql-server#3, Initial launch of VillageSQL villagesql/villagesql-server#8, [Feature Request]: Provide a pre-built Docker image villagesql/villagesql-server#15
- Test coverage expansion issue — items Error messages for disallowed implicit cast of custom types villagesql/villagesql-server#22–Update README to reflect unqualified VDF function calls villagesql/villagesql-server#28
- Refactor issue — items [Feature Request]: Server support on Windows villagesql/villagesql-server#16–vsql_simple doesn't seem to function villagesql/villagesql-server#21
Alternatively, items #1, villagesql#2, villagesql#3, villagesql#4, villagesql#5, and villagesql#6 (the critical correctness/safety items) could be addressed in this PR before merge, and the rest deferred to follow-ups.
This review was produced by dispatching 5 specialized agents in parallel: security, concurrency, code quality, test coverage, and MySQL plugin API conformance. Each agent read the code independently and reported findings; I synthesized them here.
- Move atomic counters out of PrometheusContext to file-scope globals, eliminating the use-after-free race between SHOW STATUS callbacks and plugin deinit. Use memcpy instead of reinterpret_cast to avoid alignment issues in SHOW_FUNC callbacks. - Use eventfd as wakeup mechanism instead of closing listen_fd from another thread (fixes fd-recycle hazard). close() now happens after join(), so the listener never sees closed fds. - Use memory_order_acquire/release on shutdown_requested flag. - Add deinit_logging_service_for_plugin to all init failure paths to avoid leaking the logging service registration. - NULL/empty check for bind_address to prevent crash on inet_pton(NULL).
- Add SO_SNDTIMEO (5s) on accepted client sockets alongside existing SO_RCVTIMEO. Prevents listener thread from blocking forever when a client completes TCP handshake but never reads. - Switch write_full from write() to send() with MSG_NOSIGNAL to avoid SIGPIPE on half-closed connections, and retry on EINTR. - Loop reading HTTP request until \r\n\r\n delimiter or buffer full, instead of a single read() that may return a partial request across TCP segments. Handles EINTR correctly. - Length check before indexing buf[12] in path matcher.
- Move srv_session_init_thread/deinit_thread to listener thread lifetime instead of per-scrape. Per MySQL service contract, these must be called once per physical thread, not per request. - Add prometheus_exporter_security_user sysvar to make the internal user configurable (was hardcoded to "root"). Default remains "root" for backward compatibility. Operators can switch to a least-privilege account (e.g. one granted PROCESS, REPLICATION CLIENT, and SELECT on information_schema) by setting this variable at startup. Note that mysql.session@localhost does NOT have PROCESS by default, so cannot read information_schema.INNODB_METRICS without an additional grant. - Improve error message when security context setup fails.
- Binlog collector: fully validate numeric parsing (reject "123abc"), check for negative values and overflow before accumulating. - Replica collector: require full-string numeric consumption in strtod check, matching the fix previously applied to global_status. - Log a WARNING at plugin init when bound to a non-loopback address, alerting operators to the security implications of exposing the unauthenticated /metrics endpoint to the network. - Mark unused parameter in global_variables_type with [[maybe_unused]].
- Remove section separator comments per CLAUDE.md guideline - Remove g_ctx/plugin->data redundancy, keeping only one source of truth for the plugin context pointer
- Add scrape_counter.test: captures requests_total before and after a fixed number of curl scrapes, asserts exact delta. Also verifies scrape_duration_microseconds > 0 and errors_total == 0. Catches regressions where status counters silently fail to update. - Remove format_validation carve-out that exempted global_variables and _time metrics from numeric validation. The plugin's prom_end_row already filters non-numeric values, so the validator should enforce the invariant strictly. Extend regex to accept NaN/+Inf/-Inf per Prometheus spec.
Review Fixes AppliedI've addressed the majority of the findings from the review above across 6 commits. 9 tests passing (up from 8 — added a new Commits (most recent last)
What was fixed (by review item number)Critical / High (8/8):
Medium (6/7):
Code Quality (3/6):
Test Coverage (2/7):
Not fixed (deferred to follow-ups)The following items from the review are real but were judged out of scope for this PR:
Test run |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
mysql-test/suite/prometheus_exporter/t/format_validation.test (1)
55-60: Tighten validation: reject metric samples without a preceding# TYPE.Currently, a sample line is accepted when no expected TYPE is pending. For this plugin’s contract, that weakens regression detection.
🔧 Proposed patch
- if (defined $expect_metric_name && $name ne $expect_metric_name) { - print "FORMAT ERROR: expected metric '$expect_metric_name' but got '$name'\n"; - $errors++; - } + if (!defined $expect_metric_name) { + print "FORMAT ERROR: metric '$name' missing preceding TYPE line\n"; + $errors++; + } elsif ($name ne $expect_metric_name) { + print "FORMAT ERROR: expected metric '$expect_metric_name' but got '$name'\n"; + $errors++; + } $expect_metric_name = undef;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@mysql-test/suite/prometheus_exporter/t/format_validation.test` around lines 55 - 60, The test currently allows a sample line when $expect_metric_name is undef; change the check in the sample-handling block so that if $expect_metric_name is defined you still compare $name to $expect_metric_name, but if it is not defined you treat this as a format error (increment $errors and print a clear message like "FORMAT ERROR: sample without preceding TYPE for '<name>'"). Update the block that references $expect_metric_name, $name, and $errors to implement this rejection and then clear $expect_metric_name as before.mysql-test/suite/prometheus_exporter/t/scrape_counter.test (1)
16-18: Add explicit timeout to prevent test hangs, but reconsider fail-fast flag.
curl -scorrectly returns success here since the test validates correctness through theerrors_totalcounter assertion (line 33–35). However,curlwith no deadline can hang indefinitely. Adding--max-time 5is prudent.Avoid
-fflag: It makescurlexit non-zero on HTTP 4xx/5xx, causing--execto fail and abort the test before the verification checks (lines 24–35) run. The current design intentionally lets curl succeed and then validates via metrics—which is the correct approach for this regression test.---exec curl -s http://127.0.0.1:19110/metrics > /dev/null ---exec curl -s http://127.0.0.1:19110/metrics > /dev/null ---exec curl -s http://127.0.0.1:19110/metrics > /dev/null ++exec curl -s --max-time 5 http://127.0.0.1:19110/metrics > /dev/null ++exec curl -s --max-time 5 http://127.0.0.1:19110/metrics > /dev/null ++exec curl -s --max-time 5 http://127.0.0.1:19110/metrics > /dev/null🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@mysql-test/suite/prometheus_exporter/t/scrape_counter.test` around lines 16 - 18, The three test invocations that run curl (the lines with "--exec curl -s http://127.0.0.1:19110/metrics > /dev/null") should include an explicit timeout to avoid hanging: add "--max-time 5" to each curl invocation; do NOT add "-f" or any fail-fast flag since the test intentionally lets curl succeed and then verifies correctness via the errors_total counter in the subsequent assertions (lines around the errors_total check).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@mysql-test/suite/prometheus_exporter/t/format_validation.test`:
- Line 19: The curl backtick invocation assigning to $output can hang; change
the command to include timeout and fail flags (e.g. add --max-time 5 --fail
--silent) and then check the shell exit status ($?) immediately after the
backtick call, failing the test if non-zero (die or ok/fail) so the test is
fail-fast on fetch errors; locate the assignment to $output (my $output = `curl
-s http://127.0.0.1:19109/metrics`), replace the command string with one that
includes --max-time and --fail, and add a conditional checking $? to handle and
report the error before proceeding.
---
Nitpick comments:
In `@mysql-test/suite/prometheus_exporter/t/format_validation.test`:
- Around line 55-60: The test currently allows a sample line when
$expect_metric_name is undef; change the check in the sample-handling block so
that if $expect_metric_name is defined you still compare $name to
$expect_metric_name, but if it is not defined you treat this as a format error
(increment $errors and print a clear message like "FORMAT ERROR: sample without
preceding TYPE for '<name>'"). Update the block that references
$expect_metric_name, $name, and $errors to implement this rejection and then
clear $expect_metric_name as before.
In `@mysql-test/suite/prometheus_exporter/t/scrape_counter.test`:
- Around line 16-18: The three test invocations that run curl (the lines with
"--exec curl -s http://127.0.0.1:19110/metrics > /dev/null") should include an
explicit timeout to avoid hanging: add "--max-time 5" to each curl invocation;
do NOT add "-f" or any fail-fast flag since the test intentionally lets curl
succeed and then verifies correctness via the errors_total counter in the
subsequent assertions (lines around the errors_total check).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 0ca05812-c49f-4d12-9a5d-4794d009e036
📒 Files selected for processing (6)
mysql-test/suite/prometheus_exporter/r/basic.resultmysql-test/suite/prometheus_exporter/r/scrape_counter.resultmysql-test/suite/prometheus_exporter/t/format_validation.testmysql-test/suite/prometheus_exporter/t/scrape_counter-master.optmysql-test/suite/prometheus_exporter/t/scrape_counter.testplugin/prometheus_exporter/prometheus_exporter.cc
✅ Files skipped from review due to trivial changes (3)
- mysql-test/suite/prometheus_exporter/t/scrape_counter-master.opt
- mysql-test/suite/prometheus_exporter/r/basic.result
- mysql-test/suite/prometheus_exporter/r/scrape_counter.result
🚧 Files skipped from review as they are similar to previous changes (1)
- plugin/prometheus_exporter/prometheus_exporter.cc
| use strict; | ||
| use warnings; | ||
|
|
||
| my $output = `curl -s http://127.0.0.1:19109/metrics`; |
There was a problem hiding this comment.
Add timeout and fail-fast handling to the curl fetch.
At Line 19, curl -s can block indefinitely and make this test flaky under listener stalls. Add --max-time, --fail, and explicit exit-status handling.
🔧 Proposed patch
-my $output = `curl -s http://127.0.0.1:19109/metrics`;
+my $output = `curl -sS --fail --max-time 5 http://127.0.0.1:19109/metrics`;
+if ($? != 0) {
+ print "ERROR: failed to fetch /metrics\n";
+ exit 1;
+}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| my $output = `curl -s http://127.0.0.1:19109/metrics`; | |
| my $output = `curl -sS --fail --max-time 5 http://127.0.0.1:19109/metrics`; | |
| if ($? != 0) { | |
| print "ERROR: failed to fetch /metrics\n"; | |
| exit 1; | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@mysql-test/suite/prometheus_exporter/t/format_validation.test` at line 19,
The curl backtick invocation assigning to $output can hang; change the command
to include timeout and fail flags (e.g. add --max-time 5 --fail --silent) and
then check the shell exit status ($?) immediately after the backtick call,
failing the test if non-zero (die or ok/fail) so the test is fail-fast on fetch
errors; locate the assignment to $output (my $output = `curl -s
http://127.0.0.1:19109/metrics`), replace the command string with one that
includes --max-time and --fail, and add a conditional checking $? to handle and
report the error before proceeding.
Summary
plugin/prometheus_exporter/)/metricsendpoint in Prometheus text exposition format via a built-in HTTP serversrv_session+command_serviceto execute SQL queries internally (no network, no auth overhead)--prometheus-exporter-enabled=ONData Sources (5 collectors)
mysql_global_status_*SHOW GLOBAL STATUSmysql_global_variables_*SHOW GLOBAL VARIABLESmysql_innodb_metrics_*information_schema.INNODB_METRICSmysql_replica_*SHOW REPLICA STATUSmysql_binlog_*SHOW BINARY LOGSConfiguration
Then:
curl http://127.0.0.1:9104/metricsDesign Decisions
srv_session, opened once per Prometheus scrapeSHOW REPLICA STATUSon a non-replica), that section is silently skippedTest Plan
./mysql-test/mysql-test-run.pl --suite=prometheus_exporter --parallel=auto)basic-- plugin install/uninstall, system variables, status variablesmetrics_endpoint-- HTTP endpoint, all 5 collector prefixes, 404 handlingglobal_variables-- SHOW GLOBAL VARIABLES collector outputinnodb_metrics-- INNODB_METRICS collector with counter/gauge typesreplica_status-- graceful absence on non-replica serverbinlog-- binary log file count and size metricsformat_validation-- Perl-based structural validation of Prometheus exposition formatcurl http://localhost:9104/metricsreturns valid Prometheus outputSummary by CodeRabbit
New Features
Documentation
Tests