Embedded Prometheus metrics exporter plugin #1

Open

renecannao wants to merge 16 commits into main from prometheus

Conversation


@renecannao renecannao commented Apr 5, 2026

Summary

  • Add an embedded Prometheus metrics exporter as a MySQL daemon plugin (plugin/prometheus_exporter/)
  • Serves /metrics endpoint in Prometheus text exposition format via a built-in HTTP server
  • Uses srv_session + command_service to execute SQL queries internally (no network, no auth overhead)
  • Disabled by default; enabled via --prometheus-exporter-enabled=ON
  • Zero changes to existing MySQL/VillageSQL code outside the plugin directory and test suite

Data Sources (5 collectors)

Prefix                     Source                             Metrics
mysql_global_status_*      SHOW GLOBAL STATUS                 ~400 server status counters
mysql_global_variables_*   SHOW GLOBAL VARIABLES              Numeric config values
mysql_innodb_metrics_*     information_schema.INNODB_METRICS  ~200 InnoDB internals
mysql_replica_*            SHOW REPLICA STATUS                Replication lag, thread state
mysql_binlog_*             SHOW BINARY LOGS                   File count, total size

Configuration

[mysqld]
plugin-load=prometheus_exporter=prometheus_exporter.so
prometheus-exporter-enabled=ON
prometheus-exporter-port=9104
prometheus-exporter-bind-address=127.0.0.1

Then: curl http://127.0.0.1:9104/metrics

Design Decisions

  • SQL queries over direct struct access: Uses stable SQL interfaces rather than internal server structs, making the plugin resilient to MySQL version changes during rebases
  • Single session per scrape: All collectors share one srv_session, opened once per Prometheus scrape
  • Graceful failure per collector: If a query fails (e.g., SHOW REPLICA STATUS on a non-replica), that section is silently skipped
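The "graceful failure per collector" decision can be sketched in isolation (illustrative C++ only; `Collector` and `run_scrape` are hypothetical names, not the plugin's actual types): each collector either appends its complete section to the scrape output, or contributes nothing.

```cpp
#include <string>

// Hypothetical sketch of per-collector graceful failure: a failing
// collector's section is simply omitted from the scrape output.
struct Collector {
  bool (*collect)(std::string *out);  // true on success, false on query error
};

static std::string run_scrape(const Collector *collectors, int n) {
  std::string out;
  for (int i = 0; i < n; ++i) {
    std::string section;
    if (collectors[i].collect(&section)) {
      out += section;  // append only complete, successful sections
    }
  }
  return out;  // a failed query (e.g. SHOW REPLICA STATUS) leaves no trace
}
```

Collecting into a per-collector buffer before appending also prevents a half-emitted section from corrupting the exposition output.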

Test Plan

  • 7 MTR tests pass (./mysql-test/mysql-test-run.pl --suite=prometheus_exporter --parallel=auto)
    • basic -- plugin install/uninstall, system variables, status variables
    • metrics_endpoint -- HTTP endpoint, all 5 collector prefixes, 404 handling
    • global_variables -- SHOW GLOBAL VARIABLES collector output
    • innodb_metrics -- INNODB_METRICS collector with counter/gauge types
    • replica_status -- graceful absence on non-replica server
    • binlog -- binary log file count and size metrics
    • format_validation -- Perl-based structural validation of Prometheus exposition format
  • Manual: curl http://localhost:9104/metrics returns valid Prometheus output
  • Manual: Prometheus can scrape the endpoint and display metrics

Summary by CodeRabbit

  • New Features

    • Prometheus exporter plugin: exposes MySQL metrics via HTTP /metrics; configurable enablement, bind address, and port; includes self-observability counters.
  • Documentation

    • Added comprehensive docs and usage examples for installing, configuring, and scraping the exporter.
  • Tests

    • Added extensive test suite and fixtures validating endpoint behavior, metric formats, metric families (status, variables, InnoDB, replication, binlog), and scrape counters.

Add a MySQL daemon plugin that serves Prometheus text exposition format
metrics via an embedded HTTP endpoint. Uses srv_session + command_service
to execute SHOW GLOBAL STATUS internally (no network, no auth).

Features:
- Disabled by default (--prometheus-exporter-enabled=ON to activate)
- Configurable port (default 9104) and bind address
- Exports ~400 global status variables as mysql_global_status_* metrics
- Gauge/untyped type classification for known variables
- Plugin self-monitoring: requests_total, errors_total, scrape_duration
- Clean shutdown via poll() + atomic flag

Includes MTR test suite (basic + metrics_endpoint) and design spec/plan
for v2 enhancements.

Extract the inline SHOW GLOBAL STATUS logic from collect_metrics() into
separate reusable functions: collect_name_value_query() as a generic
helper for 2-column queries, and collect_global_status() as the specific
collector. MetricsCollectorCtx now takes a configurable prefix and type
function, enabling additional collectors to be added cleanly.

Export all numeric MySQL server configuration variables as Prometheus
gauge metrics with the mysql_global_variables_ prefix. Non-numeric
values (strings, paths, ON/OFF) are automatically skipped by the
existing strtod check.

Add collector that exports InnoDB metrics from
information_schema.INNODB_METRICS with proper Prometheus type mapping
(InnoDB 'counter' -> Prometheus counter, everything else -> gauge).
Uses dedicated 4-column callbacks since INNODB_METRICS returns
NAME, SUBSYSTEM, TYPE, COUNT rather than simple name/value pairs.

Add column-name-aware parsing for SHOW REPLICA STATUS that exports
6 replication metrics as gauges: seconds_behind_source, io_running,
sql_running, relay_log_space, exec_source_log_pos, and
read_source_log_pos. Boolean fields (IO/SQL running) emit 1 for
"Yes", 0 otherwise. Gracefully emits nothing on non-replica servers.

Adds a binlog collector that runs SHOW BINARY LOGS and emits two
synthetic gauge metrics: mysql_binlog_file_count (number of binary
log files) and mysql_binlog_size_bytes_total (sum of all file sizes).
If binary logging is disabled the query errors silently and no
metrics are emitted.

- Update basic.test with inline documentation and clearer echo messages
- Update metrics_endpoint.test to verify all 5 collector prefixes
  (global status, global variables, InnoDB metrics, binlog)
- Add format_validation.test with Perl-based Prometheus exposition
  format validation (TYPE lines, metric names, value structure)
- Record all result files

- Full README with ASCII architecture diagram, configuration table,
  metric namespace reference, usage examples, type classification,
  and limitations
- Pointer doc in Docs/ directory
Copilot AI review requested due to automatic review settings April 5, 2026 05:59

coderabbitai bot commented Apr 5, 2026

📝 Walkthrough

Adds a new Prometheus Exporter daemon plugin that serves an HTTP /metrics endpoint, collects MySQL metrics via internal sessions, exposes Prometheus text-format metrics, includes build files, comprehensive tests, and documentation.

Changes

Documentation (Docs/prometheus_exporter.md, plugin/prometheus_exporter/README.md):
New overview and detailed README describing plugin behavior, config variables, metric namespaces, usage, and limitations.

Plugin Source & Build (plugin/prometheus_exporter/prometheus_exporter.cc, plugin/prometheus_exporter/CMakeLists.txt):
New daemon plugin implementation and CMake target registering system/status variables, listener thread, HTTP /metrics handling, metric collection logic, and lifecycle.

Test Framework Registration (mysql-test/include/plugin.defs):
Added plugin definition entry for prometheus_exporter to the test harness.

Test Suites, scripts & results (mysql-test/suite/prometheus_exporter/t/*, mysql-test/suite/prometheus_exporter/r/*, mysql-test/suite/prometheus_exporter/t/*-master.opt):
Comprehensive MTR tests and expected results covering install/uninstall, metrics endpoint, format validation, global variables, InnoDB metrics, binlog metrics, replica-status handling, and scrape counter behavior across multiple option files.

Test Fixtures / Results (mysql-test/suite/prometheus_exporter/r/*.result):
New expected-result fixtures validating metric presence, types, numeric values, format validation, and scrape counter assertions.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant HTTPServer as "Listener Thread (HTTP)"
    participant MySQL as "MySQL Command Service"
    participant Formatter as "Metrics Formatter"

    Client->>HTTPServer: TCP connect + GET /metrics
    HTTPServer->>HTTPServer: parse request
    HTTPServer->>MySQL: open srv_session / run SHOW GLOBAL STATUS
    MySQL-->>Formatter: status rows
    Formatter->>Formatter: classify (gauge/untyped) and format
    HTTPServer->>MySQL: SHOW GLOBAL VARIABLES / INNODB_METRICS / REPLICA STATUS / SHOW BINARY LOGS
    MySQL-->>Formatter: query results
    Formatter->>Formatter: map types, convert booleans, aggregate binlog sizes
    HTTPServer->>HTTPServer: update Prometheus_exporter_* counters/timers
    HTTPServer->>Client: HTTP 200 + Prometheus text payload

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped a thread to fetch each stat,
From SHOW and schema I gathered that.
Metrics in lines, for Prometheus to read,
Scrapes counted neatly, each little deed.
A tiny exporter, with a rabbit’s proud beat.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title 'Embedded Prometheus metrics exporter plugin' accurately summarizes the main change in the pull request, which adds a new Prometheus exporter daemon plugin with comprehensive documentation and test coverage.


Remove spec and plan docs from docs/superpowers/ -- these were
development-time artifacts. User-facing documentation lives in
plugin/prometheus_exporter/README.md.

Copilot AI left a comment


Pull request overview

Adds an embedded Prometheus /metrics HTTP exporter as a MySQL/VillageSQL daemon plugin, plus an accompanying MTR test suite and documentation, so Prometheus can scrape server metrics without an external sidecar.

Changes:

  • Introduces prometheus_exporter daemon plugin implementing an in-process HTTP server and 5 SQL-based collectors (status, variables, InnoDB metrics, replica status, binlogs).
  • Adds a new prometheus_exporter MTR suite with per-collector endpoint checks and a Perl-based exposition-format validator.
  • Adds plugin build wiring and end-user documentation (README + Docs pointer) plus internal design/spec docs.

Reviewed changes

Copilot reviewed 25 out of 25 changed files in this pull request and generated 10 comments.

Summary per file:
plugin/prometheus_exporter/prometheus_exporter.cc Implements the embedded HTTP server and SQL-driven metric collection/export.
plugin/prometheus_exporter/CMakeLists.txt Adds build rules to produce the prometheus_exporter plugin module.
plugin/prometheus_exporter/README.md Documents architecture, configuration, metric namespaces, and limitations.
Docs/prometheus_exporter.md Provides a Docs/ entry pointing to the plugin README.
mysql-test/include/plugin.defs Registers the plugin for MTR so it can be loaded in test runs.
mysql-test/suite/prometheus_exporter/t/basic.test Verifies install/uninstall lifecycle and sysvar/statusvar presence.
mysql-test/suite/prometheus_exporter/t/metrics_endpoint.test Verifies /metrics endpoint behavior, multiple prefixes, and 404 handling.
mysql-test/suite/prometheus_exporter/t/global_variables.test Validates SHOW GLOBAL VARIABLES metrics export.
mysql-test/suite/prometheus_exporter/t/innodb_metrics.test Validates information_schema.INNODB_METRICS metrics export and typing.
mysql-test/suite/prometheus_exporter/t/replica_status.test Ensures replica metrics are absent on non-replica servers.
mysql-test/suite/prometheus_exporter/t/binlog.test Validates binlog file count/size metrics export.
mysql-test/suite/prometheus_exporter/t/format_validation.test Perl-based structural validation of Prometheus text exposition output.
mysql-test/suite/prometheus_exporter/t/metrics_endpoint-master.opt Test server options (enable plugin, bind/port) for endpoint test.
mysql-test/suite/prometheus_exporter/t/global_variables-master.opt Test server options for global variables collector test.
mysql-test/suite/prometheus_exporter/t/innodb_metrics-master.opt Test server options for InnoDB metrics collector test.
mysql-test/suite/prometheus_exporter/t/replica_status-master.opt Test server options for replica-status collector test.
mysql-test/suite/prometheus_exporter/t/binlog-master.opt Test server options for binlog collector test.
mysql-test/suite/prometheus_exporter/t/format_validation-master.opt Test server options for format-validation test.
mysql-test/suite/prometheus_exporter/r/basic.result Recorded output for basic lifecycle test.
mysql-test/suite/prometheus_exporter/r/metrics_endpoint.result Recorded output for endpoint/all-collector smoke test.
mysql-test/suite/prometheus_exporter/r/global_variables.result Recorded output for global variables collector test.
mysql-test/suite/prometheus_exporter/r/innodb_metrics.result Recorded output for InnoDB metrics collector test.
mysql-test/suite/prometheus_exporter/r/replica_status.result Recorded output for replica-status absence test.
mysql-test/suite/prometheus_exporter/r/binlog.result Recorded output for binlog collector test.
mysql-test/suite/prometheus_exporter/r/format_validation.result Recorded output for exposition-format validation test.
docs/superpowers/specs/2026-04-05-prometheus-exporter-v2-design.md Internal design spec for the v2 exporter.
docs/superpowers/plans/2026-04-05-prometheus-exporter-v2.md Internal step-by-step implementation plan and commands.


Comment on lines +19 to +22
│ ┌──────────────���───────────────────────────────┐ │
│ │ prometheus_exporter plugin │ │
│ │ �� │
│ │ ┌──────────────┐ ┌────────────────────┐ │ │

Copilot AI Apr 5, 2026


The ASCII architecture diagram contains corrupted/invalid UTF-8 characters (e.g., "���", "��"), which makes the README render poorly and is likely an encoding artifact. Please replace those characters with plain ASCII/Unicode box-drawing characters and ensure the file is saved as valid UTF-8.

Comment on lines +703 to +707
// Switch to root security context
MYSQL_SECURITY_CONTEXT sc;
thd_get_security_context(srv_session_info_get_thd(session), &sc);
security_context_lookup(sc, "root", "localhost", "127.0.0.1", "");


Copilot AI Apr 5, 2026


The exporter unconditionally switches the internal session to the "root" security context and ignores return codes from thd_get_security_context/security_context_lookup. This effectively runs the collectors with full privileges and can fail silently depending on account configuration. Consider using a dedicated low-privilege account/context (or the admin session service if appropriate), and handle errors explicitly (e.g., fail the scrape / increment errors_total when context switch fails).

Suggested change
-  // Switch to root security context
-  MYSQL_SECURITY_CONTEXT sc;
-  thd_get_security_context(srv_session_info_get_thd(session), &sc);
-  security_context_lookup(sc, "root", "localhost", "127.0.0.1", "");
+  // Run collectors using the default internal session context. Do not
+  // unconditionally elevate this session to the root account.
Comment on lines +80 to +81
"endpoint. Default 0.0.0.0.",
nullptr, nullptr, "0.0.0.0");

Copilot AI Apr 5, 2026


Defaulting the bind address to 0.0.0.0 while also having no authentication/TLS means enabling the plugin can unintentionally expose internal server metrics on all interfaces. Consider changing the default to 127.0.0.1 (or requiring an explicit bind address when enabled) and logging a warning when binding to a non-loopback address.

Suggested change
-    "endpoint. Default 0.0.0.0.",
-    nullptr, nullptr, "0.0.0.0");
+    "endpoint. Default 127.0.0.1.",
+    nullptr, nullptr, "127.0.0.1");


static void prom_handle_error(void *ctx, uint, const char *, const char *) {
auto *mc = static_cast<MetricsCollectorCtx *>(ctx);
mc->error = true;

Copilot AI Apr 5, 2026


errors_total is never incremented: prom_handle_error()/innodb_handle_error()/replica_handle_error()/binlog_handle_error() only set per-collector flags, but those flags are never checked and g_ctx->errors_total is only read for SHOW STATUS. Please wire collector/scrape failures to increment errors_total (and/or surface an error comment in the scrape output) so the status variable reflects actual failures.

Suggested change
-  mc->error = true;
+  mc->error = true;
+  if (g_ctx != nullptr) {
+    ++g_ctx->errors_total;
+  }

Comment on lines +204 to +208
// Try to parse as a number; skip non-numeric values (ON/OFF etc.)
char *end = nullptr;
double val = strtod(mc->current_value.c_str(), &end);
if (end == mc->current_value.c_str()) return 0; // not numeric
(void)val; // we use the string representation directly

Copilot AI Apr 5, 2026


Numeric parsing accepts partially-numeric strings (e.g., "1abc"): strtod() is only checked for end==start, but the original (possibly invalid) string is emitted as the Prometheus value. This can generate invalid exposition output. Please require that the full string parses as a number (optionally allowing surrounding whitespace) before emitting.
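A strict check along these lines (a sketch, not the plugin's code; `parse_full_double` is an illustrative name) would require strtod() to consume the whole string, tolerating only trailing whitespace:

```cpp
#include <cctype>
#include <cstdlib>
#include <string>

// Hypothetical helper: accept a value only when the entire string parses
// as a number; trailing whitespace is tolerated, anything else rejects.
static bool parse_full_double(const std::string &s, double *out) {
  if (s.empty()) return false;
  char *end = nullptr;
  double v = std::strtod(s.c_str(), &end);
  if (end == s.c_str()) return false;  // nothing was consumed at all
  while (*end != '\0') {               // only whitespace may follow the number
    if (!std::isspace(static_cast<unsigned char>(*end))) return false;
    ++end;
  }
  *out = v;
  return true;
}
```

With this helper, "1abc" is rejected instead of being emitted verbatim into the exposition output.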

Comment on lines +482 to +487
// Check if numeric
char *end = nullptr;
strtod(val.c_str(), &end);
if (end == val.c_str()) continue; // not numeric, skip
value_str = val;
}

Copilot AI Apr 5, 2026


Replica metric numeric validation has the same partial-parse issue as prom_end_row: it only checks end==start after strtod(), so values like "123abc" would be emitted and break Prometheus parsing. Please require full numeric consumption (optionally allowing trailing whitespace) before emitting.

Comment on lines +570 to +573
char *end = nullptr;
long long sz = strtoll(bc->current_size.c_str(), &end, 10);
if (end != bc->current_size.c_str()) {
bc->total_size += sz;

Copilot AI Apr 5, 2026


Binlog size parsing uses strtoll() but only checks end!=start; partially-numeric strings (e.g., "123abc") would still be counted. Please validate that the entire field parses as an integer (optionally allowing whitespace) before adding it to total_size.

Suggested change
-  char *end = nullptr;
-  long long sz = strtoll(bc->current_size.c_str(), &end, 10);
-  if (end != bc->current_size.c_str()) {
-    bc->total_size += sz;
+  const char *start = bc->current_size.c_str();
+  char *end = nullptr;
+  long long sz = strtoll(start, &end, 10);
+  if (end != start) {
+    while (*end == ' ' || *end == '\t' || *end == '\n' || *end == '\r' ||
+           *end == '\f' || *end == '\v') {
+      ++end;
+    }
+    if (*end == '\0') {
+      bc->total_size += sz;
+    }

Comment on lines +786 to +787
if (strncmp(buf, "GET /metrics", 12) == 0) {
ctx->requests_total.fetch_add(1, std::memory_order_relaxed);

Copilot AI Apr 5, 2026


HTTP routing check is too permissive: strncmp("GET /metrics", 12) will also match paths like /metricsXYZ and doesn't ensure the first line is a valid request line. Consider parsing just the request line and matching the exact path "/metrics" (and method GET) before returning 200.
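One way to implement the stricter routing (a sketch under the assumption that `buf` holds the raw request bytes; `is_metrics_request` is an illustrative name) is to isolate the request line and compare method and path exactly:

```cpp
#include <cstring>
#include <string>

// Hypothetical request-line check: requires a complete "METHOD PATH VERSION"
// line terminated by CRLF, and matches method GET and path /metrics exactly,
// so "/metricsXYZ" no longer routes to the metrics handler.
static bool is_metrics_request(const char *buf) {
  const char *eol = std::strstr(buf, "\r\n");
  if (eol == nullptr) return false;  // incomplete request line
  std::string line(buf, eol - buf);  // e.g. "GET /metrics HTTP/1.1"
  std::string::size_type sp1 = line.find(' ');
  if (sp1 == std::string::npos) return false;
  std::string::size_type sp2 = line.find(' ', sp1 + 1);
  if (sp2 == std::string::npos) return false;
  return line.substr(0, sp1) == "GET" &&
         line.substr(sp1 + 1, sp2 - sp1 - 1) == "/metrics";
}
```

Anything else would fall through to the existing 404 path.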

--source include/not_windows.inc

--echo # On a non-replica server, no mysql_replica_ metrics should appear
--exec curl -s http://127.0.0.1:19107/metrics | grep -c "mysql_replica_" || echo "0"

Copilot AI Apr 5, 2026


This command prints two lines when there are no matches: grep -c already outputs 0, but exits with status 1 which triggers || echo "0". Use || true (or similar) to keep the exit status non-fatal without duplicating output.

Suggested change
- --exec curl -s http://127.0.0.1:19107/metrics | grep -c "mysql_replica_" || echo "0"
+ --exec curl -s http://127.0.0.1:19107/metrics | grep -c "mysql_replica_" || true

Comment on lines +774 to +783
int client_fd = accept(ctx->listen_fd, nullptr, nullptr);
if (client_fd < 0) continue;

// Read HTTP request
char buf[4096];
ssize_t n = read(client_fd, buf, sizeof(buf) - 1);
if (n <= 0) {
close(client_fd);
continue;
}

Copilot AI Apr 5, 2026


The listener performs a blocking read() on the accepted client socket. Because the HTTP server is single-threaded, a client that connects and then stalls (or sends very slowly) can block the exporter indefinitely and prevent Prometheus scrapes (and potentially interfere with plugin shutdown). Consider setting a receive timeout (SO_RCVTIMEO), making the client socket non-blocking + polling for readability, or otherwise enforcing a small read deadline before parsing the request line.
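A minimal version of the timeout suggestion, assuming POSIX sockets (the helper name `set_read_deadline` is illustrative), sets SO_RCVTIMEO on the accepted socket so a stalled client makes read() fail instead of blocking the listener forever:

```cpp
#include <sys/socket.h>
#include <sys/time.h>

// Hypothetical helper: bound how long read() may block on a client socket.
// After the deadline, read() returns -1 with errno EAGAIN/EWOULDBLOCK.
static bool set_read_deadline(int fd, long seconds) {
  struct timeval tv;
  tv.tv_sec = seconds;
  tv.tv_usec = 0;
  return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) == 0;
}
```

The listener would call this right after accept(), before the blocking read, and treat a timed-out read like any other failed request.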


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 9

🧹 Nitpick comments (12)
mysql-test/suite/prometheus_exporter/t/replica_status.test (2)

1-1: Consider documenting why Windows is excluded.

Adding a brief comment explaining why not_windows.inc is needed (e.g., curl dependency, or Windows socket behavior differences) would help future maintainers understand the limitation.


4-4: The grep -c || echo "0" pattern produces two lines of output.

When grep -c finds no matches, it outputs "0" (the count) but exits with status 1. The || echo "0" then also outputs "0", resulting in two "0" lines. This matches the result file, but the logic is confusing and redundant.

Consider using grep -c ... || true to suppress the exit status while keeping the count output, or redirect grep's output and handle it differently.

Proposed fix for cleaner output
 --echo # On a non-replica server, no mysql_replica_ metrics should appear
---exec curl -s http://127.0.0.1:19107/metrics | grep -c "mysql_replica_" || echo "0"
+--exec curl -s http://127.0.0.1:19107/metrics | grep -c "mysql_replica_" || true

And update the result file to expect a single "0":

 # On a non-replica server, no mysql_replica_ metrics should appear
 0
-0
 # But other metrics should still be present
plugin/prometheus_exporter/prometheus_exporter.cc (5)

403-410: Unused has_row field in ReplicaStatusCtx.

The has_row field is set to true in replica_start_row (line 432) but never read. If not needed for future use, consider removing it for clarity.

♻️ Suggested cleanup
 struct ReplicaStatusCtx {
   std::string *output;
   std::vector<std::string> col_names;
   std::vector<std::string> col_values;
   int col_index;
-  bool has_row;
   bool error;
 };

And remove the assignment in replica_start_row:

 static int replica_start_row(void *ctx) {
   auto *rc = static_cast<ReplicaStatusCtx *>(ctx);
   rc->col_values.clear();
   rc->col_values.resize(rc->col_names.size());
   rc->col_index = 0;
-  rc->has_row = true;
   return 0;
 }

851-863: Thread attribute not destroyed after use.

my_thread_attr_init(&attr) allocates resources that should be released with my_thread_attr_destroy(&attr) after the thread is created, regardless of success or failure.

♻️ Suggested fix
   my_thread_attr_t attr;
   my_thread_attr_init(&attr);
   my_thread_attr_setdetachstate(&attr, MY_THREAD_CREATE_JOINABLE);

   if (my_thread_create(&g_ctx->listener_thread, &attr,
                        prometheus_listener_thread, g_ctx) != 0) {
     LogPluginErr(ERROR_LEVEL, ER_LOG_PRINTF_MSG,
                  "Prometheus exporter: failed to create listener thread");
+    my_thread_attr_destroy(&attr);
     close(g_ctx->listen_fd);
     delete g_ctx;
     g_ctx = nullptr;
     return 1;
   }
+  my_thread_attr_destroy(&attr);

   LogPluginErr(INFORMATION_LEVEL, ER_LOG_PRINTF_MSG,

119-157: Consider using a hash set for gauge variable lookup.

is_gauge() performs a linear search through ~30 items on every metric emission. While acceptable for current scale, a std::unordered_set would provide O(1) lookup and better scalability if the list grows.
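The suggestion can be sketched as follows (the gauge names here are a small illustrative subset, not the plugin's actual ~30-entry list): a function-local static std::unordered_set gives O(1) lookup and is initialized thread-safely under C++11 rules.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical rewrite of is_gauge() using a hash set instead of a
// linear scan over an array of gauge variable names.
static bool is_gauge(const std::string &name) {
  static const std::unordered_set<std::string> kGauges = {
      "Threads_connected", "Threads_running", "Open_tables",
      "Innodb_buffer_pool_pages_free"};
  return kGauges.count(name) != 0;
}
```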


52-55: Global service pointers lack internal linkage.

log_bi and log_bs are declared without static, giving them external linkage. This could cause symbol collisions if another plugin uses the same names. These should be static like reg_srv.

♻️ Suggested fix
 static SERVICE_TYPE(registry) *reg_srv = nullptr;
-SERVICE_TYPE(log_builtins) *log_bi = nullptr;
-SERVICE_TYPE(log_builtins_string) *log_bs = nullptr;
+static SERVICE_TYPE(log_builtins) *log_bi = nullptr;
+static SERVICE_TYPE(log_builtins_string) *log_bs = nullptr;

703-707: Missing return value check on security_context_lookup.

The return value of security_context_lookup is not checked. While failures are rare (root user should always exist), defensive coding would log or handle the error case.

🛡️ Suggested defensive check
   // Switch to root security context
   MYSQL_SECURITY_CONTEXT sc;
   thd_get_security_context(srv_session_info_get_thd(session), &sc);
-  security_context_lookup(sc, "root", "localhost", "127.0.0.1", "");
+  if (security_context_lookup(sc, "root", "localhost", "127.0.0.1", "") != 0) {
+    srv_session_close(session);
+    srv_session_deinit_thread();
+    return "# Failed to set security context\n";
+  }
docs/superpowers/plans/2026-04-05-prometheus-exporter-v2.md (3)

1093-1098: Numeric validation regex doesn't cover Prometheus special values.

The regex /^-?[0-9]+(\.[0-9]+)?([eE][+-]?[0-9]+)?$/ doesn't match Prometheus special values like +Inf, -Inf, or NaN. While the plugin currently doesn't emit these values, you may want to extend the regex for future-proofing:

unless ($value =~ /^-?[0-9]+(\.[0-9]+)?([eE][+-]?[0-9]+)?$/ ||
        $value =~ /^[+-]?Inf$/ ||
        $value eq 'NaN') {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/superpowers/plans/2026-04-05-prometheus-exporter-v2.md` around lines
1093 - 1098, The numeric validation currently rejects Prometheus special values;
update the check around the $value validation (the regex that tests $value
before emitting the FORMAT ERROR for $name) to also accept `+Inf`, `-Inf`, and
`NaN` — e.g., extend the condition so it passes if the original numeric regex
matches OR $value matches /^[+-]?Inf$/ OR $value eq 'NaN', and keep the existing
error message and $errors++ path unchanged.

556-577: Minor inconsistency: emplace_back vs push_back in plan vs implementation.

The plan uses rc->col_names.emplace_back(field->col_name) while the actual implementation at prometheus_exporter.cc:423 uses push_back. Both are functionally equivalent here, but the plan should match the implementation for clarity.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/superpowers/plans/2026-04-05-prometheus-exporter-v2.md` around lines 556
- 577, The plan and implementation are inconsistent: the plan's
replica_field_metadata uses rc->col_names.emplace_back(field->col_name) but the
actual implementation (prometheus_exporter.cc: replica_field_metadata) uses
push_back; update the plan to match the implementation by replacing emplace_back
with rc->col_names.push_back(field->col_name) (reference symbols:
ReplicaStatusCtx, replica_field_metadata, col_names) so the plan and code are
consistent.

675-676: Outdated note about #include <vector>.

The note mentions adding #include <vector> to the includes, but the actual implementation already includes it (line 46 of prometheus_exporter.cc). This note can be removed since it's already satisfied.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/superpowers/plans/2026-04-05-prometheus-exporter-v2.md` around lines 675
- 676, Remove the outdated note that says "Note: add `#include <vector>`" from
the markdown (the note in the plans doc), because the implementation already
includes <vector> in prometheus_exporter.cc; simply delete that line so the doc
no longer suggests adding the header.
docs/superpowers/specs/2026-04-05-prometheus-exporter-v2-design.md (2)

96-108: Add language specifier to fenced code block.

The code block showing internal structure is missing a language specifier. Use text or plaintext to satisfy markdown linting and improve readability.

📝 Suggested fix
-```
+```text
 1. Includes, logging refs, system vars, context struct        (existing)
 2. Prometheus formatting helpers + gauge classification        (existing, expanded)

As per coding guidelines from static analysis hints: [warning] 96-96: Fenced code blocks should have a language specified (MD040, fenced-code-language).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/superpowers/specs/2026-04-05-prometheus-exporter-v2-design.md` around
lines 96 - 108, The fenced code block that lists the internal structure (the
block starting with "1. Includes, logging refs, system vars, context struct" and
containing the numbered list) is missing a language specifier; update that
triple-backtick fence to include a language token such as text or plaintext
(e.g., change ``` to ```text) so the markdown linter rule MD040 is satisfied and
readability is improved.

165-202: Add language specifier to ASCII art code block.

The ASCII architecture diagram is missing a language specifier. Use text for consistency with standard markdown practices.

📝 Suggested fix
-```
+```text
 ┌─────────────────────────────────────────────────────┐
 │                  VillageSQL Server                   │

As per coding guidelines from static analysis hints: [warning] 165-165: Fenced code blocks should have a language specified (MD040, fenced-code-language).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/superpowers/specs/2026-04-05-prometheus-exporter-v2-design.md` around
lines 165 - 202, The fenced ASCII-art diagram (the triple-backtick code block
that begins the big VillageSQL Server diagram) lacks a language specifier;
update the opening fence to include the language "text" (i.e., change ``` to
```text) so the diagram is treated as plain text and satisfies
MD040/fenced-code-language rules.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@mysql-test/suite/prometheus_exporter/r/innodb_metrics.result`:
- Around line 5-6: The test's comment and metric TYPE disagree: the comment on
the check mentions a "counter" but the metric declaration lists
mysql_innodb_metrics_buffer_pool_reads as a "gauge"; update one to match the
other. Either change the comment to say "gauge" (so it documents the current
TYPE) or change the TYPE line to "counter" to assert the metric is a counter;
ensure you update the metric identifier mysql_innodb_metrics_buffer_pool_reads
accordingly so the test and comment are consistent.

In `@mysql-test/suite/prometheus_exporter/r/replica_status.result`:
- Around line 2-3: The result file replica_status.result contains two "0" lines
but the test replica_status.test only runs one grep -c "mysql_replica_" || echo
"0" command, so remove the extra "0" line from replica_status.result (leaving a
single "0" output) to match the single grep invocation; alternatively if the
test should produce two outputs, add the missing second grep -c "mysql_replica_"
(or equivalent) to replica_status.test so the number of grep outputs matches the
number of result lines.

In `@mysql-test/suite/prometheus_exporter/t/format_validation.test`:
- Around line 33-42: The code sets $expect_metric_name on a valid "# TYPE" line
but can overwrite it with a subsequent TYPE or leave it set at EOF without
reporting an error; update the TYPE-handling logic in the loop (the block that
matches /^# TYPE / and the similar block at lines 73-79) to first check if
$expect_metric_name is defined and, if so, emit a "missing sample for TYPE"
FORMAT ERROR and increment $errors before assigning a new value, and also add a
final check after the main input loop to emit the same error if
$expect_metric_name remains set at EOF so TYPE lines that never receive a
matching metric sample are flagged.
- Around line 59-66: The test currently skips numeric validation for any metric
whose name matches /^mysql_global_variables_/ which lets non-numeric exports
pass; to fix, remove the "$name =~ /^mysql_global_variables_/" exception in the
value-checking unless block so only the intended "_time$" exception remains
(i.e., keep the check for $name =~ /_time$/ but delete the global variables
whitelist), ensuring $value is enforced to match the numeric regex for
mysql_global_variables_* metrics; update the unless that inspects $value and
$name accordingly.

In `@mysql-test/suite/prometheus_exporter/t/innodb_metrics.test`:
- Around line 7-10: The test currently only checks the presence of the metric
families; tighten the assertions to require exact Prometheus types by changing
the grep patterns to match the full type lines for the two metrics: require "^#
TYPE mysql_innodb_metrics_buffer_pool_reads counter$" for
mysql_innodb_metrics_buffer_pool_reads and "^# TYPE
mysql_innodb_metrics_buffer_pool_size gauge$" for
mysql_innodb_metrics_buffer_pool_size (update the two curl|grep commands in
innodb_metrics.test that reference those metric names).

In `@mysql-test/suite/prometheus_exporter/t/metrics_endpoint.test`:
- Around line 4-8: The test description in metrics_endpoint.test claims "all 5
collector prefixes" but the assertions do not check replica metrics; either add
an assertion that the Prometheus output contains the mysql_replica_ prefix
(e.g., verify presence of a sample metric starting with "mysql_replica_") and
ensure that assertion runs only in a replica-capable test setup, or change the
test description to accurately state the number of collectors currently asserted
(e.g., remove "5" or list the actual prefixes asserted). Update the file's top
comment and any related assertions that mention collectors so they remain
consistent with the new check or the narrowed description.

In `@plugin/prometheus_exporter/prometheus_exporter.cc`:
- Around line 100-101: The errors_total atomic is declared and exported but
never incremented; update the code to increment errors_total wherever request or
collection failures occur (for example in prom_handle_error, in the HTTP handler
when collect_metrics() fails, and inside session operation failure paths), or
remove the variable if you prefer not to expose it. Locate the symbols
errors_total and requests_total to find existing counters, then add
errors_total.fetch_add(1, std::memory_order_relaxed) (or equivalent) in
prom_handle_error and any branch that returns an error from collect_metrics() or
session operations so the metric accurately reflects failures. Ensure increments
happen before sending error responses so the exposed status variable stays
consistent.
- Around line 774-784: The read() on the accepted socket (client_fd from accept
in the listener loop) has no receive timeout and can block the single-threaded
exporter; before calling read() in the accept loop (where client_fd is
obtained), set a receive timeout on client_fd via setsockopt(client_fd,
SOL_SOCKET, SO_RCVTIMEO, ...) or alternatively poll() the client_fd with a short
timeout and only call read() when it's ready; update the code around accept/read
(symbols: accept, client_fd, read) to close the socket and continue on timeout
or error.

In `@plugin/prometheus_exporter/README.md`:
- Around line 15-52: The README's ASCII architecture diagram (the
triple-backtick block containing "VillageSQL Server" and "prometheus_exporter
plugin") contains corrupted glyphs (`���`); fix it by replacing those
placeholders with the correct box-drawing characters (e.g., ─ ┌ ┐ └ ┘ │ ├ ┤ ┬ ┴)
or plain ASCII equivalents so the boxes/lines render correctly, and ensure the
file is saved with UTF-8 encoding; update all occurrences of `���` inside that
diagram block to the appropriate characters to restore the intended boxes and
connectors.

---

Nitpick comments:
In `@docs/superpowers/plans/2026-04-05-prometheus-exporter-v2.md`:
- Around line 1093-1098: The numeric validation currently rejects Prometheus
special values; update the check around the $value validation (the regex that
tests $value before emitting the FORMAT ERROR for $name) to also accept `+Inf`,
`-Inf`, and `NaN` — e.g., extend the condition so it passes if the original
numeric regex matches OR $value matches /^[+-]?Inf$/ OR $value eq 'NaN', and
keep the existing error message and $errors++ path unchanged.
- Around line 556-577: The plan and implementation are inconsistent: the plan's
replica_field_metadata uses rc->col_names.emplace_back(field->col_name) but the
actual implementation (prometheus_exporter.cc: replica_field_metadata) uses
push_back; update the plan to match the implementation by replacing emplace_back
with rc->col_names.push_back(field->col_name) (reference symbols:
ReplicaStatusCtx, replica_field_metadata, col_names) so the plan and code are
consistent.
- Around line 675-676: Remove the outdated note that says "Note: add `#include
<vector>`" from the markdown (the note in the plans doc), because the
implementation already includes <vector> in prometheus_exporter.cc; simply
delete that line so the doc no longer suggests adding the header.

In `@docs/superpowers/specs/2026-04-05-prometheus-exporter-v2-design.md`:
- Around line 96-108: The fenced code block that lists the internal structure
(the block starting with "1. Includes, logging refs, system vars, context
struct" and containing the numbered list) is missing a language specifier;
update that triple-backtick fence to include a language token such as text or
plaintext (e.g., change ``` to ```text) so the markdown linter rule MD040 is
satisfied and readability is improved.
- Around line 165-202: The fenced ASCII-art diagram (the triple-backtick code
block that begins the big VillageSQL Server diagram) lacks a language specifier;
update the opening fence to include the language "text" (i.e., change ``` to
```text) so the diagram is treated as plain text and satisfies
MD040/fenced-code-language rules.

In `@mysql-test/suite/prometheus_exporter/t/replica_status.test`:
- Line 1: Add a short inline comment above the include line referencing
include/not_windows.inc that explains why Windows is excluded (for example:
reliance on curl/Unix sockets, POSIX-only behavior in the Prometheus exporter
tests, or known Windows networking incompatibilities); update the test file's
top-of-file section (where the include "include/not_windows.inc" appears) to
state the specific reason and any relevant bug/issue tracker IDs so future
maintainers understand the limitation.
- Line 4: The test currently uses the shell line '--exec curl -s
http://127.0.0.1:19107/metrics | grep -c "mysql_replica_" || echo "0"', which
prints "0" twice when no matches are found; replace that pattern with '--exec
curl -s http://127.0.0.1:19107/metrics | grep -c "mysql_replica_" || true' so
grep's count is preserved and non-zero exit is suppressed, and update the test
result expectation to a single "0" output (remove the duplicate line) to keep
output and assertions consistent.

In `@plugin/prometheus_exporter/prometheus_exporter.cc`:
- Around line 403-410: The struct ReplicaStatusCtx contains an unused field
has_row; remove the has_row member from ReplicaStatusCtx and also remove the
assignment to it inside replica_start_row so there are no leftover writes to a
deleted field; update any references (none expected) and rebuild to ensure no
use remains of ReplicaStatusCtx::has_row.
- Around line 851-863: The thread attribute initialized with
my_thread_attr_init(&attr) is not destroyed; call my_thread_attr_destroy(&attr)
after attempting to create the thread to free resources whether
my_thread_create(&g_ctx->listener_thread, &attr, prometheus_listener_thread,
g_ctx) succeeds or fails; add my_thread_attr_destroy(&attr) before the error
return path (after cleaning up g_ctx/listen_fd) and also after successful
creation (before proceeding) so attr is always destroyed.
- Around line 119-157: Replace the linear-scan array lookup in is_gauge with an
O(1) hash lookup: create a static std::unordered_set<std::string> (or
std::string_view with storage) containing all gauge names (populate it once from
gauge_variables or via an initializer list), ensure case-insensitive matching by
storing lowercased names and lowercasing the input name before lookup (or use a
custom case-insensitive hash/equality), and make initialization thread-safe by
using a function-local static or std::call_once; then change is_gauge to query
that unordered_set instead of iterating through gauge_variables.
- Around line 52-55: The global service pointers log_bi and log_bs currently
have external linkage; make them have internal linkage like reg_srv by declaring
them static (i.e., add the static storage specifier to the
SERVICE_TYPE(log_builtins) *log_bi and SERVICE_TYPE(log_builtins_string) *log_bs
declarations) so they won’t collide with symbols from other plugins; keep
SERVICE_TYPE(registry) *reg_srv unchanged.
- Around line 703-707: The call to security_context_lookup is unchecked; capture
its return value and handle errors defensively: call
thd_get_security_context(...) into MYSQL_SECURITY_CONTEXT sc as before, then if
security_context_lookup(sc, "root", "localhost", "127.0.0.1", "") returns
non-zero, log the failure (using the plugin/mysql logging facility available in
this module) with context (e.g., include the user/host) and abort the root
switch path—return an error code or clean up/exit the function instead of
continuing; ensure any resources are left consistent before returning.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5dfbf72a-b1ba-4ab5-821a-cb858172c6d2

📥 Commits

Reviewing files that changed from the base of the PR and between 957d549 and ba7b2c0.

📒 Files selected for processing (27)
  • Docs/prometheus_exporter.md
  • docs/superpowers/plans/2026-04-05-prometheus-exporter-v2.md
  • docs/superpowers/specs/2026-04-05-prometheus-exporter-v2-design.md
  • mysql-test/include/plugin.defs
  • mysql-test/suite/prometheus_exporter/r/basic.result
  • mysql-test/suite/prometheus_exporter/r/binlog.result
  • mysql-test/suite/prometheus_exporter/r/format_validation.result
  • mysql-test/suite/prometheus_exporter/r/global_variables.result
  • mysql-test/suite/prometheus_exporter/r/innodb_metrics.result
  • mysql-test/suite/prometheus_exporter/r/metrics_endpoint.result
  • mysql-test/suite/prometheus_exporter/r/replica_status.result
  • mysql-test/suite/prometheus_exporter/t/basic.test
  • mysql-test/suite/prometheus_exporter/t/binlog-master.opt
  • mysql-test/suite/prometheus_exporter/t/binlog.test
  • mysql-test/suite/prometheus_exporter/t/format_validation-master.opt
  • mysql-test/suite/prometheus_exporter/t/format_validation.test
  • mysql-test/suite/prometheus_exporter/t/global_variables-master.opt
  • mysql-test/suite/prometheus_exporter/t/global_variables.test
  • mysql-test/suite/prometheus_exporter/t/innodb_metrics-master.opt
  • mysql-test/suite/prometheus_exporter/t/innodb_metrics.test
  • mysql-test/suite/prometheus_exporter/t/metrics_endpoint-master.opt
  • mysql-test/suite/prometheus_exporter/t/metrics_endpoint.test
  • mysql-test/suite/prometheus_exporter/t/replica_status-master.opt
  • mysql-test/suite/prometheus_exporter/t/replica_status.test
  • plugin/prometheus_exporter/CMakeLists.txt
  • plugin/prometheus_exporter/README.md
  • plugin/prometheus_exporter/prometheus_exporter.cc

Comment on lines +7 to +10
--exec curl -s http://127.0.0.1:19106/metrics | grep "^# TYPE mysql_innodb_metrics_buffer_pool_reads " | head -1

--echo # Verify a known gauge type metric exists (buffer_pool_size is 'value' type in InnoDB)
--exec curl -s http://127.0.0.1:19106/metrics | grep "^# TYPE mysql_innodb_metrics_buffer_pool_size " | head -1

⚠️ Potential issue | 🟡 Minor

Assert exact Prometheus types for the two canonical InnoDB metrics.

These patterns currently only confirm metric family presence; they don’t enforce counter for buffer_pool_reads or gauge for buffer_pool_size.

🔧 Suggested tightening
---exec curl -s http://127.0.0.1:19106/metrics | grep "^# TYPE mysql_innodb_metrics_buffer_pool_reads " | head -1
+--exec curl -s http://127.0.0.1:19106/metrics | grep "^# TYPE mysql_innodb_metrics_buffer_pool_reads counter$" | head -1

---exec curl -s http://127.0.0.1:19106/metrics | grep "^# TYPE mysql_innodb_metrics_buffer_pool_size " | head -1
+--exec curl -s http://127.0.0.1:19106/metrics | grep "^# TYPE mysql_innodb_metrics_buffer_pool_size gauge$" | head -1
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mysql-test/suite/prometheus_exporter/t/innodb_metrics.test` around lines 7 -
10, The test currently only checks the presence of the metric families; tighten
the assertions to require exact Prometheus types by changing the grep patterns
to match the full type lines for the two metrics: require "^# TYPE
mysql_innodb_metrics_buffer_pool_reads counter$" for
mysql_innodb_metrics_buffer_pool_reads and "^# TYPE
mysql_innodb_metrics_buffer_pool_size gauge$" for
mysql_innodb_metrics_buffer_pool_size (update the two curl|grep commands in
innodb_metrics.test that reference those metric names).

…tness

- Default bind address changed from 0.0.0.0 to 127.0.0.1 (no auth endpoint)
- Increment errors_total in all four error handlers
- Strict HTTP path matching (reject /metricsXYZ)
- Add SO_RCVTIMEO on client sockets to prevent slow-client hangs
- Full strtod/strtoll consumption checks to reject partial numeric parses
- Check return values of security context functions
- Fix README corrupted box-drawing characters
- Test fixes: grep -c || true, orphaned TYPE detection, comment accuracy

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
plugin/prometheus_exporter/README.md (1)

15-15: Consider adding a language identifier to the fenced code block.

While not critical for ASCII art diagrams, adding a language identifier (e.g., `text`) would satisfy the markdownlint rule and is a minor documentation best practice.

📝 Proposed fix
-```
+```text
 ┌─────────────────────────────────────────────────────┐
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugin/prometheus_exporter/README.md` at line 15, Add a language identifier
to the fenced code block in the README so markdownlint is satisfied; locate the
ASCII-art fenced block delimited by ``` and change the opening fence to include
a language token such as text (e.g., replace the opening ``` with ```text) so
the block becomes a labeled code fence while keeping its contents unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@plugin/prometheus_exporter/README.md`:
- Line 15: Add a language identifier to the fenced code block in the README so
markdownlint is satisfied; locate the ASCII-art fenced block delimited by ```
and change the opening fence to include a language token such as text (e.g.,
replace the opening ``` with ```text) so the block becomes a labeled code fence
while keeping its contents unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: bd21e74f-8e15-4551-a7a8-bc072326f157

📥 Commits

Reviewing files that changed from the base of the PR and between ba7b2c0 and f709eb0.

📒 Files selected for processing (9)
  • mysql-test/suite/prometheus_exporter/r/basic.result
  • mysql-test/suite/prometheus_exporter/r/innodb_metrics.result
  • mysql-test/suite/prometheus_exporter/r/replica_status.result
  • mysql-test/suite/prometheus_exporter/t/format_validation.test
  • mysql-test/suite/prometheus_exporter/t/innodb_metrics.test
  • mysql-test/suite/prometheus_exporter/t/metrics_endpoint.test
  • mysql-test/suite/prometheus_exporter/t/replica_status.test
  • plugin/prometheus_exporter/README.md
  • plugin/prometheus_exporter/prometheus_exporter.cc
✅ Files skipped from review due to trivial changes (7)
  • mysql-test/suite/prometheus_exporter/r/replica_status.result
  • mysql-test/suite/prometheus_exporter/r/innodb_metrics.result
  • mysql-test/suite/prometheus_exporter/r/basic.result
  • mysql-test/suite/prometheus_exporter/t/replica_status.test
  • mysql-test/suite/prometheus_exporter/t/metrics_endpoint.test
  • mysql-test/suite/prometheus_exporter/t/innodb_metrics.test
  • plugin/prometheus_exporter/prometheus_exporter.cc
🚧 Files skipped from review as they are similar to previous changes (1)
  • mysql-test/suite/prometheus_exporter/t/format_validation.test

@renecannao
Author

Thoughtful Review of PR #1

I commissioned five parallel specialized review agents (security, concurrency, code quality, test coverage, MySQL plugin API conformance) and synthesized their findings below. The bottom line: the plugin works and has a solid foundation, but it has a number of real correctness, security, and lifecycle bugs that should be addressed before this can be considered production-ready. The v1 scope was intentionally minimal; many of these items are natural follow-ups for a v2.

Findings are grouped by theme with severity and concrete fix suggestions. Line numbers reference the current HEAD of the branch.


🚨 Critical / High — Must Fix Before Merge

1. Use-after-free race in SHOW STATUS callbacks vs plugin uninstall

Files: `plugin/prometheus_exporter/prometheus_exporter.cc` lines 113, 897–905, 915–945

`g_ctx` is a plain `PrometheusContext *`, not `std::atomic`. The status-variable handlers (`show_requests_total`, `show_errors_total`, `show_scrape_duration`) do a null check then dereference `g_ctx` — but MySQL can invoke these callbacks from any user thread at any time, including while `prometheus_exporter_deinit` is running `delete ctx; g_ctx = nullptr;`.

Scenario: A user runs `SHOW GLOBAL STATUS LIKE 'Prometheus_exporter_%'` at the same moment another session runs `UNINSTALL PLUGIN prometheus_exporter`. The status handler reads `g_ctx` as non-null, passes the check, then deinit runs `delete ctx`, then the status handler dereferences freed memory.

Fix: Make `g_ctx` a `std::atomic<PrometheusContext *>`, or move the atomic counters out of the context into file-scope globals (simplest). Alternatively, an rwlock held by all status callbacks and exclusively by deinit.


2. `close(listen_fd)` as shutdown wakeup is unsafe

Files: `plugin/prometheus_exporter/prometheus_exporter.cc` lines 897–901, 777–786

Deinit does `shutdown_requested=true; close(listen_fd); my_thread_join(...)`. Closing a file descriptor that another thread is concurrently using in `poll()`/`accept()` has unspecified behavior — the fd number can be reused by another thread in the process (e.g., a new client connection) between the close and the next poll iteration, causing the listener thread to operate on an unrelated fd.

Fix: Use a self-pipe or `eventfd` added as a second pollfd. On shutdown, write to it. The listener treats the wakeup as "shutdown" and closes `listen_fd` itself after the loop. Or at minimum, `shutdown(listen_fd, SHUT_RDWR)` before `close()` and only close after join.


3. Single-threaded listener + unbounded write = trivial DoS

Files: `plugin/prometheus_exporter/prometheus_exporter.cc` lines 765–772, 774–836

  • `write_full()` loops on `write()` with no `SO_SNDTIMEO` and no write-side timeout. A client that completes the TCP handshake but never reads will cause `write()` to block until the socket buffer fills, then block forever.
  • The listener is single-threaded, so one stalled client permanently halts all metrics scraping.
  • `listen(fd, 5)` is a backlog of only 5.

Fix:

  1. Set `SO_SNDTIMEO` on accepted client sockets.
  2. Consider `send(..., MSG_NOSIGNAL)` instead of `write()` to avoid SIGPIPE.
  3. Increase backlog to `SOMAXCONN` (or at least 128).
  4. Bound total per-request time with an overall deadline.

4. Hardcoded `root@localhost` security context

Files: `plugin/prometheus_exporter/prometheus_exporter.cc` line 713

`security_context_lookup(sc, "root", "localhost", "127.0.0.1", "")` is fragile:

  • Hardened / packaged installs may not have `root@localhost` (Debian uses `debian-sys-maint`; some distros rename root).
  • Runs every query with root privileges — far more than the scrape actually needs (`PROCESS`, `REPLICATION CLIENT`, `SELECT` on `information_schema`).
  • If anything ever causes user input to flow into the query path, root execution becomes full compromise.

Fix: Use `mysql.session@localhost` (guaranteed-exists, low-priv internal account used by MySQL itself for service threads), or make the account configurable via a sysvar. Fail at init if the lookup fails, not silently at every scrape.


5. `srv_session_init_thread` called per-scrape instead of per-thread

Files: `plugin/prometheus_exporter/prometheus_exporter.cc` lines 700, 727

The MySQL service contract (see `include/mysql/service_srv_session.h:82-94` and `plugin/test_service_sql_api/test_sql_cmds_1.cc:514-551`) specifies `srv_session_init_thread` should be called once per physical thread that will use the session service. The current code calls it on every HTTP request, which is:

  • Wasteful (THD allocation, PFS registration per scrape)
  • Semantically wrong
  • A potential shutdown hang: if a scrape is stuck inside `command_service_run_command` during server shutdown, the thread state has no cancellation mechanism, and `my_thread_join` in deinit will hang indefinitely.

Fix: Move `srv_session_init_thread(ctx->plugin_ref)` to the top of `prometheus_listener_thread()` and `srv_session_deinit_thread()` just before return.


6. Logging service leak on init failure paths

Files: `plugin/prometheus_exporter/prometheus_exporter.cc` lines 846, 857, 861–868, 874–882

`prometheus_exporter_init` calls `init_logging_service_for_plugin` early, but failure returns later in init (bind failure, thread create failure) don't call `deinit_logging_service_for_plugin`. MySQL's `plugin_initialize()` does NOT call deinit when init returns non-zero — see `connection_control.cc` for the correct pattern of cleaning up before every failure return.

Fix: Add `deinit_logging_service_for_plugin(&reg_srv, &log_bi, &log_bs);` before every error `return 1;` in init.


7. `prom_bind_address` can be NULL

Files: `plugin/prometheus_exporter/prometheus_exporter.cc` lines 747, 860, 863

A user can set `--prometheus-exporter-bind-address=""` (empty). `setup_listen_socket` passes it straight to `inet_pton`, which is undefined for NULL on some libcs. The error log message also uses `"%s", prom_bind_address` which formats NULL unsafely.

Fix: Null/empty check at the top of `setup_listen_socket`, fail clean with a log message.


8. HTTP request parsing reads once, ignoring TCP segmentation

Files: `plugin/prometheus_exporter/prometheus_exporter.cc` lines 796–805

A single `read(client_fd, buf, 4095)` and then `strncmp(buf, "GET /metrics", 12)`. Problems:

  • TCP can deliver the request in multiple segments — a short read returns fewer bytes than needed and the `strncmp` check silently fails.
  • Legitimate slow clients would see intermittent 404s.
  • Pipelined requests and malformed clients are not handled robustly.

Fix: Loop reading until the request line `\r\n\r\n` is seen (with overall deadline). Validate the method/URI/HTTP version properly (GET method, matching `/metrics` exactly, HTTP/1.x version).


⚠️ Medium — Should Fix Soon

9. `SHOW GLOBAL VARIABLES` over unauthenticated HTTP (information disclosure)

Files: `plugin/prometheus_exporter/prometheus_exporter.cc` lines 689–693, C-1 in security review

`collect_global_variables` runs `SHOW GLOBAL VARIABLES` as root and emits every variable with a numeric value. While the default bind is `127.0.0.1` (mitigating network exposure), any local user on the host can read the full tuning surface of the server without authentication. And if the bind address is ever changed, the exposure becomes network-wide.

Fix (conservative): Add a hard deny-list of sensitive variable names (log paths, SSL paths, auth settings) even when numeric. Log a warning when binding to a non-loopback address.

Fix (better): Make `SHOW GLOBAL VARIABLES` collection opt-in via a separate sysvar that's `OFF` by default, and/or document explicitly that the endpoint must not be exposed outside a trusted network (the README already says this, but it bears repeating).


10. Status variable buffer alignment assumption

Files: `plugin/prometheus_exporter/prometheus_exporter.cc` lines 918, 929, 939

`reinterpret_cast<longlong *>(buff)` in the SHOW_FUNC callbacks. MySQL does pass an aligned 1024-byte buffer in practice, and `connection_control.cc:467` uses the same pattern, but on strict-alignment platforms this is a latent SIGBUS. Use `memcpy` to the buffer instead of a reinterpret_cast assignment.


11. `errors_total` never actually useful

Files: `plugin/prometheus_exporter/prometheus_exporter.cc` multiple

`errors_total` is incremented in the error handlers, but it is the only signal that anything went wrong: the HTTP response is still 200 OK with a partial body, and the individual collector's `error` flag is set but never read by `collect_metrics()`. Consider:

  • Emit `mysql_exporter_collector_success{collector="..."}` gauge per collector (standard mysqld_exporter pattern).
  • Or return HTTP 503 when all collectors fail.

12. Missing PSI instrumentation

Files: `plugin/prometheus_exporter/prometheus_exporter.cc` entire file

The listener thread is not registered with Performance Schema. Memory allocations (context struct, string building) are not tracked. Compare with `plugin/daemon_example/daemon_example.cc:44-59`. This is standard for any daemon plugin and important for operators who rely on P_S for observability of the server itself.

Fix: Register a PSI memory category and thread key.


13. `std::memory_order_relaxed` on shutdown flag

Files: `plugin/prometheus_exporter/prometheus_exporter.cc` lines 777, 897

Using relaxed ordering on `shutdown_requested` happens to work today because `my_thread_join` provides the release/acquire edge. But if anything else is added around the flag in the future, relaxed will silently break the contract.

Fix: Use `memory_order_release` on store, `memory_order_acquire` on load. Cheap and self-documenting.
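A minimal sketch of that ordering, with illustrative names (not the plugin's): the release store publishes everything written before shutdown was requested, and the acquire load on the listener side makes those writes visible.

```cpp
#include <atomic>
#include <cassert>

std::atomic<bool> shutdown_requested{false};
int requests_served = 0;  // hypothetical plain data published before shutdown

void request_shutdown(int served) {
  requests_served = served;                                   // plain write...
  shutdown_requested.store(true, std::memory_order_release);  // ...published here
}

bool listener_should_exit() {
  // Acquire pairs with the release store: once this returns true, the
  // listener also sees requests_served.
  return shutdown_requested.load(std::memory_order_acquire);
}
```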


14. Partial-numeric acceptance in binlog size accumulator

Files: `plugin/prometheus_exporter/prometheus_exporter.cc` lines 571–582

`strtoll` is used with `end != start` check, but a string like `"123abc"` would still be counted as `123`. Also, `total_size` is `long long` not `uint64_t` — theoretical signed overflow on petabyte-scale binlog totals is UB. (Note: the other numeric-parsing call sites were fixed in `f709eb03` but this path seems to have been missed — please double check.)
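A sketch of the stricter parse this finding asks for (function names are hypothetical): require the whole string to be consumed, reject negatives, and accumulate into an unsigned 64-bit total with an explicit overflow check.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <cstdlib>

// Strict decimal parse: "123abc", "-1", and out-of-range inputs are rejected.
bool parse_binlog_size(const char *s, uint64_t &out) {
  if (s == nullptr || *s == '\0') return false;
  errno = 0;
  char *end = nullptr;
  long long v = strtoll(s, &end, 10);
  if (errno == ERANGE || *end != '\0' || v < 0) return false;
  out = static_cast<uint64_t>(v);
  return true;
}

// Checked accumulation: refuse the add rather than wrap (or invoke signed UB).
bool add_checked(uint64_t &total, uint64_t add) {
  if (total > UINT64_MAX - add) return false;  // would overflow
  total += add;
  return true;
}
```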


15. EINTR not handled consistently

Files: `plugin/prometheus_exporter/prometheus_exporter.cc` lines 765–772 (`write_full`), 782–786 (`poll/accept`), 796–800 (`read`)

  • `write_full` breaks the loop on `EINTR`, silently truncating the response.
  • `poll`/`accept` silently retry on all errors including persistent ones, busy-looping.
  • `read` treats `EINTR` as disconnect.

Fix: Explicitly retry on `EINTR`, log+exit on fatal errors like `EBADF`.
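An EINTR-aware `write_full` might look like this sketch (POSIX assumed; `MSG_NOSIGNAL` additionally avoids SIGPIPE on half-closed peers, which pairs with the SIGPIPE concern elsewhere in this review):

```cpp
#include <cassert>
#include <cerrno>
#include <sys/socket.h>
#include <unistd.h>

// Write the whole buffer: retry on EINTR instead of truncating, and bail
// out on any other error (EPIPE, EBADF, ...) so the caller can log it.
bool write_full(int fd, const char *buf, size_t len) {
  size_t off = 0;
  while (off < len) {
    ssize_t n = send(fd, buf + off, len - off, MSG_NOSIGNAL);
    if (n < 0) {
      if (errno == EINTR) continue;  // interrupted: retry
      return false;                  // fatal: give up
    }
    off += static_cast<size_t>(n);
  }
  return true;
}
```

The same pattern applies in reverse on the read side: `EINTR` should retry, while errors like `EBADF` should terminate the loop with a log line rather than busy-loop.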


📋 Code Quality — Larger Cleanup Opportunities

16. Massive duplication across the 5 collectors

Four near-identical callback structs (`prom_cbs`, `innodb_cbs`, `replica_cbs`, `binlog_cbs`), four nearly-identical error handlers, four copies of the `# TYPE ...` emission pattern. Any change to the format or callback ABI requires touching 4–5 places.

Suggested refactor: A single `Collector` abstraction taking the query string and a per-row lambda. This would reduce the file by 200+ lines and consolidate the error handling. Also: `collect_name_value_query` already exists as a helper but covers only 2 of 5 collectors.
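The suggested abstraction could take roughly this shape. Everything here is hypothetical (`Row`, `Collector`, `scrape`, `run_query` are illustrative names), and the real `run_query` would go through the srv_session machinery; the point is that emission and error handling live in one loop instead of five copies.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

struct Row { std::vector<std::string> cols; };

// One struct per data source: the query plus a per-row formatter.
struct Collector {
  std::string query;
  std::function<void(const Row &, std::string &)> emit_row;
};

std::string scrape(const std::vector<Collector> &collectors,
                   const std::function<std::vector<Row>(const std::string &)> &run_query) {
  std::string out;
  for (const auto &c : collectors) {
    // A failed query returns no rows: that collector is skipped, matching
    // the plugin's graceful-failure-per-collector behavior.
    for (const Row &r : run_query(c.query)) c.emit_row(r, out);
  }
  return out;
}
```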

17. `g_ctx` + `plugin->data` are redundant sources of truth

Both point at the same object. Status callbacks can only reach `g_ctx`, so keep that and remove `plugin->data`, or vice versa. Mixing them makes the lifecycle harder to reason about.

18. Section separator comments violate CLAUDE.md

The file contains 13 `// -----` block separators. `CLAUDE.md` explicitly says: "Do NOT use section separator comments". Remove them.

19. Hardcoded gauge list at lines 119–150 is a maintenance burden

29 hardcoded names via linear `strcasecmp` (~17k comparisons per scrape for ~600 status vars). Use a `std::unordered_set` or suffix-based pattern matching (`_total`, `_seconds`, `_bytes`, etc.).
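A hash-set lookup would be a one-liner per probe; a sketch (the names below are a small illustrative subset, not the plugin's full list):

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <unordered_set>

// Build the gauge-name set once; each probe is then a single hash lookup
// instead of ~29 strcasecmp calls per status variable.
const std::unordered_set<std::string> &gauge_names() {
  static const std::unordered_set<std::string> names = {
      "threads_connected", "threads_running", "open_tables",
  };
  return names;
}

bool is_gauge(std::string name) {
  // Lowercase once to preserve the case-insensitive strcasecmp semantics.
  for (char &c : name)
    c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
  return gauge_names().count(name) != 0;
}
```

Suffix-based matching (`_total`, `_seconds`, `_bytes`) could layer on top of the same helper if the hardcoded list keeps growing.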

20. `CS_TEXT_REPRESENTATION` forces string round-trip for all numeric values

Using `CS_BINARY_REPRESENTATION` + the typed getters (`get_longlong`, `get_double`) would be faster and eliminate the `strtod` validation entirely. Minor perf but touches hot paths.

21. `global_variables_type` ignores its argument

The callback takes a name but never uses it. Fine for now, but it makes the signature misleading. Use `[[maybe_unused]]` or a wrapper that doesn't need the parameter.


🧪 Test Coverage Gaps

22. Scrape counter delta test is trivially passing

`metrics_endpoint.test` asserts `Prometheus_exporter_requests_total` with `--replace_regex /[0-9]+/NUM/`. `0` matches too — a bug that fails to increment the counter would not be caught.

Fix: Capture `requests_total` before and after N curls, assert delta = N.

23. `scrape_duration_microseconds` and `errors_total` are never exercised

Only `requests_total` is touched. The timing instrumentation could be removed entirely and no test would fail. `errors_total` is completely untested.

24. Replica positive path is not tested

`replica_status.test` only checks absence on a non-replica. All 6 replica fields, the Yes→1/No→0 mapping, the column-name matching loop, and `replica_end_row` are dead code from a test perspective. Consider an `rpl` suite variant.

25. Several error paths are untested

  • Plugin init when the port is already in use (bind failure)
  • Plugin init with invalid `bind_address` (e.g., `not.an.ip`, `::1` — IPv4-only constraint not documented by a test)
  • Plugin loaded with `enabled=OFF` at startup (verifying the short-circuit path)
  • HTTP `POST /metrics` (currently 404 by accident)
  • `GET /metrics?query=string` (Prometheus sometimes appends this)
  • `SO_RCVTIMEO` firing on an idle client
  • Concurrent scrapes
  • Plugin uninstall while a scrape is in-flight

26. `format_validation.test` has a carve-out that hides bugs

The validator exempts `mysql_global_variables_` and `_time` metrics from the numeric check. But the plugin code already filters non-numeric values at `prom_end_row`. The carve-out would mask a regression where that filter breaks.

Fix: Remove the carve-out. The plugin invariant is "every emitted value is numeric" — the test should enforce it. Also add support for `NaN`, `+Inf`, `-Inf` per the Prometheus spec.
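The invariant plus the special values could be checked with something like this sketch (a stand-in for the test's validator, not the plugin code):

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Accept exactly what the Prometheus text format allows for a sample value:
// a float that strtod fully consumes, or the special tokens NaN/+Inf/-Inf.
bool is_valid_prom_value(const std::string &v) {
  if (v == "NaN" || v == "+Inf" || v == "-Inf") return true;
  if (v.empty()) return false;
  char *end = nullptr;
  std::strtod(v.c_str(), &end);
  return end == v.c_str() + v.size();  // whole string consumed: "12abc" fails
}
```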

27. No unit tests for pure functions

`is_gauge()`, metric name formatting, numeric detection, binlog size parsing — all are testable without the server. Consider adding a `unittest/gunit`-style test.

28. `untyped` path is likely uncovered

Tests verify the gauge path but never assert that at least one `untyped` metric appears. If the gauge list accidentally swallowed all metrics as gauge, nothing would fail.


✅ Things Done Well

  1. Conservative defaults — `enabled=OFF`, `bind_address=127.0.0.1`, READONLY sysvars. Good security posture out of the box.
  2. Correct command_service plumbing — the callbacks, COM_DATA setup, charset choice, and CS_TEXT_REPRESENTATION usage are all correct and match the reference examples in `plugin/test_service_sql_api/`.
  3. Clean separation of concerns — `collect_metrics()`, `setup_listen_socket()`, and the listener loop are cleanly divided. The HTTP listener is a thin wrapper around `collect_metrics()`.
  4. SQL over internal structs — using stable SQL interfaces instead of poking at `global_status_var` makes the plugin resilient to MySQL version changes during rebases. This is the right call for a VillageSQL fork.
  5. `format_validation.test` is a genuinely novel and useful test — the concept of a structural format validator is much better than raw greps. With the carve-out removed it would be very strong.
  6. No SQL injection surface — all queries are hardcoded literals, no user input concatenation.
  7. No dangerous C string functions — all string building uses `std::string +=`, which is length-safe.
  8. Port isolation across tests — 19104–19109 lets tests run in parallel cleanly.

Summary Counts

| Severity | Count |
| --- | --- |
| Critical/High (must fix) | 8 |
| Medium (should fix) | 7 |
| Code Quality | 6 |
| Test Coverage | 7 |

Recommended Path Forward

Given the scope of the findings, I would suggest landing this PR (v1 foundation is solid) and opening follow-up issues for:

  1. Security hardening issue — items #4, #9, and the bind-address warning (#7 in README)
  2. Concurrency / lifecycle issue — items #1, #2, #5, #6
  3. HTTP robustness issue — items #3, #8, #15
  4. Test coverage expansion issue — items #22–#28
  5. Refactor issue — items #16–#21

Alternatively, items #1, #2, #3, #4, #5, and #6 (the critical correctness/safety items) could be addressed in this PR before merge, and the rest deferred to follow-ups.


This review was produced by dispatching 5 specialized agents in parallel: security, concurrency, code quality, test coverage, and MySQL plugin API conformance. Each agent read the code independently and reported findings; I synthesized them here.

- Move atomic counters out of PrometheusContext to file-scope globals,
  eliminating the use-after-free race between SHOW STATUS callbacks
  and plugin deinit. Use memcpy instead of reinterpret_cast to avoid
  alignment issues in SHOW_FUNC callbacks.
- Use eventfd as wakeup mechanism instead of closing listen_fd
  from another thread (fixes fd-recycle hazard). close() now happens
  after join(), so the listener never sees closed fds.
- Use memory_order_acquire/release on shutdown_requested flag.
- Add deinit_logging_service_for_plugin to all init failure paths
  to avoid leaking the logging service registration.
- NULL/empty check for bind_address to prevent crash on inet_pton(NULL).
- Add SO_SNDTIMEO (5s) on accepted client sockets alongside existing
  SO_RCVTIMEO. Prevents listener thread from blocking forever when
  a client completes TCP handshake but never reads.
- Switch write_full from write() to send() with MSG_NOSIGNAL to
  avoid SIGPIPE on half-closed connections, and retry on EINTR.
- Loop reading HTTP request until \r\n\r\n delimiter or buffer full,
  instead of a single read() that may return a partial request
  across TCP segments. Handles EINTR correctly.
- Length check before indexing buf[12] in path matcher.
- Move srv_session_init_thread/deinit_thread to listener thread lifetime
  instead of per-scrape. Per MySQL service contract, these must be
  called once per physical thread, not per request.

- Add prometheus_exporter_security_user sysvar to make the internal
  user configurable (was hardcoded to "root"). Default remains "root"
  for backward compatibility. Operators can switch to a least-privilege
  account (e.g. one granted PROCESS, REPLICATION CLIENT, and SELECT on
  information_schema) by setting this variable at startup. Note that
  mysql.session@localhost does NOT have PROCESS by default, so cannot
  read information_schema.INNODB_METRICS without an additional grant.

- Improve error message when security context setup fails.
- Binlog collector: fully validate numeric parsing (reject "123abc"),
  check for negative values and overflow before accumulating.
- Replica collector: require full-string numeric consumption in strtod
  check, matching the fix previously applied to global_status.
- Log a WARNING at plugin init when bound to a non-loopback address,
  alerting operators to the security implications of exposing the
  unauthenticated /metrics endpoint to the network.
- Mark unused parameter in global_variables_type with [[maybe_unused]].
- Remove section separator comments per CLAUDE.md guideline
- Remove g_ctx/plugin->data redundancy, keeping only one source of
  truth for the plugin context pointer
- Add scrape_counter.test: captures requests_total before and after
  a fixed number of curl scrapes, asserts exact delta. Also verifies
  scrape_duration_microseconds > 0 and errors_total == 0. Catches
  regressions where status counters silently fail to update.
- Remove format_validation carve-out that exempted global_variables
  and _time metrics from numeric validation. The plugin's prom_end_row
  already filters non-numeric values, so the validator should enforce
  the invariant strictly. Extend regex to accept NaN/+Inf/-Inf per
  Prometheus spec.
@renecannao (Author)

Review Fixes Applied

I've addressed the majority of the findings from the review above across 6 commits. 9 tests passing (up from 8 — added a new scrape_counter test).

Commits (most recent last)

| SHA | What |
| --- | --- |
| 8efd77f2d2a | Critical safety: use-after-free race, unsafe shutdown, NULL bind_address, logging leak |
| f4041d3aed6 | HTTP robustness: SO_SNDTIMEO, read loop for TCP segmentation, EINTR handling, MSG_NOSIGNAL |
| dccbd944818 | srv_session lifecycle fix + configurable security user via new sysvar |
| 9c388765b45 | Medium items: binlog numeric validation, replica strict parse, non-loopback warning, unused param |
| d96ee346f5b | Code quality: removed section separators (CLAUDE.md), removed g_ctx/plugin->data redundancy |
| d92a1fe775e | Tests: new scrape_counter delta test, removed format_validation carve-out |

What was fixed (by review item number)

Critical / High (8/8):

Medium (6/7):

Code Quality (3/6):

Test Coverage (2/7):

Not fixed (deferred to follow-ups)

The following items from the review are real but were judged out of scope for this PR:

Test run

prometheus_exporter.basic              pass
prometheus_exporter.binlog             pass
prometheus_exporter.format_validation  pass
prometheus_exporter.global_variables   pass
prometheus_exporter.innodb_metrics     pass
prometheus_exporter.metrics_endpoint   pass
prometheus_exporter.replica_status     pass
prometheus_exporter.scrape_counter     pass  (new)
shutdown_report                        pass
Completed: All 9 tests were successful.

@coderabbitai bot left a comment
Actionable comments posted: 1

🧹 Nitpick comments (2)
mysql-test/suite/prometheus_exporter/t/format_validation.test (1)

55-60: Tighten validation: reject metric samples without a preceding # TYPE.

Currently, a sample line is accepted when no expected TYPE is pending. For this plugin’s contract, that weakens regression detection.

🔧 Proposed patch

```diff
-    if (defined $expect_metric_name && $name ne $expect_metric_name) {
-      print "FORMAT ERROR: expected metric '$expect_metric_name' but got '$name'\n";
-      $errors++;
-    }
+    if (!defined $expect_metric_name) {
+      print "FORMAT ERROR: metric '$name' missing preceding TYPE line\n";
+      $errors++;
+    } elsif ($name ne $expect_metric_name) {
+      print "FORMAT ERROR: expected metric '$expect_metric_name' but got '$name'\n";
+      $errors++;
+    }
     $expect_metric_name = undef;
```
mysql-test/suite/prometheus_exporter/t/scrape_counter.test (1)

16-18: Add explicit timeout to prevent test hangs, but reconsider fail-fast flag.

curl -s correctly returns success here since the test validates correctness through the errors_total counter assertion (lines 33–35). However, curl with no deadline can hang indefinitely. Adding --max-time 5 is prudent.

Avoid -f flag: It makes curl exit non-zero on HTTP 4xx/5xx, causing --exec to fail and abort the test before the verification checks (lines 24–35) run. The current design intentionally lets curl succeed and then validates via metrics—which is the correct approach for this regression test.

```diff
---exec curl -s http://127.0.0.1:19110/metrics > /dev/null
---exec curl -s http://127.0.0.1:19110/metrics > /dev/null
---exec curl -s http://127.0.0.1:19110/metrics > /dev/null
+--exec curl -s --max-time 5 http://127.0.0.1:19110/metrics > /dev/null
+--exec curl -s --max-time 5 http://127.0.0.1:19110/metrics > /dev/null
+--exec curl -s --max-time 5 http://127.0.0.1:19110/metrics > /dev/null
```
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0ca05812-c49f-4d12-9a5d-4794d009e036

📥 Commits

Reviewing files that changed from the base of the PR and between f709eb0 and d92a1fe.

📒 Files selected for processing (6)
  • mysql-test/suite/prometheus_exporter/r/basic.result
  • mysql-test/suite/prometheus_exporter/r/scrape_counter.result
  • mysql-test/suite/prometheus_exporter/t/format_validation.test
  • mysql-test/suite/prometheus_exporter/t/scrape_counter-master.opt
  • mysql-test/suite/prometheus_exporter/t/scrape_counter.test
  • plugin/prometheus_exporter/prometheus_exporter.cc
✅ Files skipped from review due to trivial changes (3)
  • mysql-test/suite/prometheus_exporter/t/scrape_counter-master.opt
  • mysql-test/suite/prometheus_exporter/r/basic.result
  • mysql-test/suite/prometheus_exporter/r/scrape_counter.result
🚧 Files skipped from review as they are similar to previous changes (1)
  • plugin/prometheus_exporter/prometheus_exporter.cc

The comment below attaches to this excerpt of `format_validation.test`:

```perl
use strict;
use warnings;

my $output = `curl -s http://127.0.0.1:19109/metrics`;
```

⚠️ Potential issue | 🟠 Major

Add timeout and fail-fast handling to the curl fetch.

At Line 19, curl -s can block indefinitely and make this test flaky under listener stalls. Add --max-time, --fail, and explicit exit-status handling.

🔧 Proposed patch

```diff
-my $output = `curl -s http://127.0.0.1:19109/metrics`;
+my $output = `curl -sS --fail --max-time 5 http://127.0.0.1:19109/metrics`;
+if ($? != 0) {
+  print "ERROR: failed to fetch /metrics\n";
+  exit 1;
+}
```
