SQL-189: External metrics endpoint #36463
Merged
SangJunBak merged 10 commits on May 13, 2026
Adds a feature flag for the upcoming federated /metrics/external endpoint on environmentd, which will fan out a single scrape across env's local metrics and every clusterd replica's /metrics.
Adds the protobuf feature to the prometheus crate for future use and updates all calls of `get_value` to `value`. `get_value` has been deprecated since v0.14.0, and enabling the protobuf feature in the prometheus crate forces us onto the new method name.
Adds a small crate that owns the rust-protobuf based decoder for the Prometheus protobuf wire format. Encoding goes through prometheus::ProtobufEncoder directly; decoding requires rust-protobuf's Message trait and CodedInputStream, which we fence behind this crate via a deny.toml wrapper exception. The protobuf crate was originally denylisted because it wasn't actively maintained and we could get much of the same behavior from the crates `prost` and `protobuf-native`. However, because the prometheus crate already exposes encoding/decoding through protobuf, the alternative crates would require copying or forking Prometheus' client_model.proto and writing our own shims.
The federated /metrics/external endpoint will use this to scrape clusterd replicas in protobuf so that it can mutate labels before re-emitting the merged result as text. All callers of this endpoint now need to pass the headermap.
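A minimal sketch of why a handler might need the request headers: one plausible design (an assumption, not confirmed by this PR) is to choose between the text and protobuf encodings from the `Accept` header. The function name `wants_protobuf` is hypothetical, and the sketch ignores q-value ordering and uses only the standard library rather than a real header map type.

```rust
/// Media type Prometheus uses for its protobuf exposition format.
const PROTOBUF_MEDIA_TYPE: &str = "application/vnd.google.protobuf";

/// Hypothetical helper: returns true if any media range in the `Accept`
/// header value names the protobuf media type. Parameters such as
/// `proto=...`, `encoding=...`, and `q=...` are ignored in this sketch.
pub fn wants_protobuf(accept: Option<&str>) -> bool {
    let Some(accept) = accept else {
        // No Accept header: fall back to the text format.
        return false;
    };
    accept.split(',').any(|range| {
        // Keep only the media type, dropping everything after the first ';'.
        let media_type = range.split(';').next().unwrap_or("").trim();
        media_type.eq_ignore_ascii_case(PROTOBUF_MEDIA_TYPE)
    })
}
```

A caller would pass the raw `Accept` value, if present, and fall back to text exposition when the function returns false.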
Extracts the helper functions used to proxy clusterd HTTP endpoints from environmentd. We'll use these to proxy each clusterd's metrics endpoint.
Adds a new metrics endpoint that aggregates env's local Prometheus metrics with every clusterd replica's /metrics output. Each clusterd-sourced metric is decorated with cluster_id, replica_id, process, and (when known) cluster_name and replica_name labels before the merged result is re-emitted as Prometheus text. Names come from a catalog snapshot taken once per request; IDs come from the known addresses. The handler scrapes replicas in protobuf (added in the previous handle_prometheus commit) so labels can be mutated cheaply without a text-format reparse.
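The label decoration described above can be sketched as follows. This is an illustration under stated assumptions, not the handler's actual code: `Sample`, `ReplicaIdentity`, `decorate`, and `render` are hypothetical stand-ins using only the standard library, whereas the real handler operates on decoded protobuf metric families.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified stand-in for one scraped series.
#[derive(Debug, Clone, PartialEq)]
pub struct Sample {
    pub name: String,
    pub labels: BTreeMap<String, String>,
    pub value: f64,
}

/// Hypothetical identity of the replica process a scrape came from.
/// IDs are always known; names are optional (catalog lookup may miss).
pub struct ReplicaIdentity {
    pub cluster_id: String,
    pub replica_id: String,
    pub process: usize,
    pub cluster_name: Option<String>,
    pub replica_name: Option<String>,
}

/// Adds the identifying labels to every sample scraped from one replica.
pub fn decorate(samples: &mut [Sample], id: &ReplicaIdentity) {
    for sample in samples.iter_mut() {
        let labels = &mut sample.labels;
        labels.insert("cluster_id".to_string(), id.cluster_id.clone());
        labels.insert("replica_id".to_string(), id.replica_id.clone());
        labels.insert("process".to_string(), id.process.to_string());
        // Name labels are only attached when the name is known.
        if let Some(name) = &id.cluster_name {
            labels.insert("cluster_name".to_string(), name.clone());
        }
        if let Some(name) = &id.replica_name {
            labels.insert("replica_name".to_string(), name.clone());
        }
    }
}

/// Renders one sample as a text-format line. Label values are not
/// escaped here, which a real encoder would have to do.
pub fn render(sample: &Sample) -> String {
    let labels: Vec<String> = sample
        .labels
        .iter()
        .map(|(k, v)| format!("{k}=\"{v}\""))
        .collect();
    format!("{}{{{}}} {}", sample.name, labels.join(","), sample.value)
}
```

Because the labels live in a structured map, decoration is a handful of inserts per sample; doing the same on text-format output would mean parsing and rewriting every line.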
Adds a server test to ensure that a cluster replica's metrics are reflected in the new endpoint with the correct labels. Adds a similar cloudtest to verify that this also works in a Kubernetes environment.
Going to rename as |
def- reviewed May 13, 2026 and left a comment:
I pushed a commit with a failing cargo test
aljoscha approved these changes May 13, 2026 and left a comment:
The structure, the wiring, and all this look good. Good to merge once Dennis is happy on the testing front and Pranshu is happy with the naming and things.
We'll now have duplicate metric families. That wouldn't be a problem in itself, but it means we'd emit multiple HELP / TYPE lines for the same metric name, which would cause errors in scrapers.
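One way to avoid the duplicate HELP / TYPE lines is to merge families by name before encoding. The following is a minimal sketch of that idea, not the fix that was applied in this PR: `Family`, `merge_families`, and `render` are hypothetical, series are kept as pre-rendered lines for brevity, and the first HELP / TYPE seen for a name wins without checking for conflicts.

```rust
use std::collections::btree_map::Entry;
use std::collections::BTreeMap;

/// Hypothetical, simplified metric family: metadata plus rendered series.
#[derive(Debug, Clone, PartialEq)]
pub struct Family {
    pub name: String,
    pub help: String,
    pub kind: String,        // e.g. "counter" or "gauge"
    pub series: Vec<String>, // already-rendered sample lines
}

/// Merges families that share a name so each name appears once.
/// The first family seen supplies HELP and TYPE (an assumption of this
/// sketch; conflicting types are not detected).
pub fn merge_families(families: Vec<Family>) -> Vec<Family> {
    let mut merged: BTreeMap<String, Family> = BTreeMap::new();
    for family in families {
        match merged.entry(family.name.clone()) {
            Entry::Occupied(mut existing) => {
                existing.get_mut().series.extend(family.series);
            }
            Entry::Vacant(slot) => {
                slot.insert(family);
            }
        }
    }
    merged.into_values().collect()
}

/// Renders families as text, writing HELP and TYPE once per family.
pub fn render(families: &[Family]) -> String {
    let mut out = String::new();
    for family in families {
        out.push_str(&format!("# HELP {} {}\n", family.name, family.help));
        out.push_str(&format!("# TYPE {} {}\n", family.name, family.kind));
        for line in &family.series {
            out.push_str(line);
            out.push('\n');
        }
    }
    out
}
```

With this shape, the same metric reported by environmentd and by a replica ends up under a single HELP / TYPE header, with the series distinguished only by their labels.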
Create a single Prometheus endpoint, proxy all processes' metrics through environmentd, and add object/cluster/replica names to each series from clusterd.
This PR touches a lot of files, mainly because we had to rename one of the method calls for the prometheus crate.
Motivation
Fixes https://linear.app/materializeinc/issue/SQL-189/created-federated-prometheus-endpoint
Description
See commit messages for details.
Verification