
SQL-189: External metrics endpoint #36463

Merged
SangJunBak merged 10 commits into MaterializeInc:main from SangJunBak:jun/external-metrics-endpoint-1 on May 13, 2026

@SangJunBak (Contributor) commented May 7, 2026

Create a single Prometheus endpoint, proxy all processes' metrics through environmentd, and add object/cluster/replica names to each series from clusterd.

This PR touches many files, but that is mainly because we had to rename one method call from the `prometheus` crate (`get_value` became `value`).

Motivation

Fixes https://linear.app/materializeinc/issue/SQL-189/created-federated-prometheus-endpoint

Description

See commit messages for details.

Verification

  • Created a Rust-based server integration test
  • Created a cloud test to verify the endpoint in a Kubernetes environment

@SangJunBak SangJunBak changed the title from "Jun/external metrics endpoint 1" to "External metrics endpoint" May 7, 2026
@SangJunBak SangJunBak force-pushed the jun/external-metrics-endpoint-1 branch 8 times, most recently from 65c9289 to 0c502bc May 13, 2026 13:49
@SangJunBak SangJunBak changed the title from "External metrics endpoint" to "SQL-189: External metrics endpoint" May 13, 2026
@SangJunBak SangJunBak requested a review from mtabebe May 13, 2026 13:58
@SangJunBak SangJunBak force-pushed the jun/external-metrics-endpoint-1 branch 2 times, most recently from ddd9c8c to 29c43be May 13, 2026 14:15
@SangJunBak SangJunBak marked this pull request as ready for review May 13, 2026 14:16
@SangJunBak SangJunBak requested review from a team as code owners May 13, 2026 14:16
@SangJunBak SangJunBak force-pushed the jun/external-metrics-endpoint-1 branch 2 times, most recently from e79a931 to d718848 May 13, 2026 14:44
@SangJunBak SangJunBak requested review from aljoscha and removed request for a team and mtabebe May 13, 2026 14:48
Adds a feature flag for the upcoming federated
`/metrics/external` endpoint on environmentd, which will fan out a single
scrape across env's local metrics and every clusterd replica's `/metrics`.
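A minimal std-only sketch of that fan-out, using scoped threads as a stand-in for whatever async runtime the server actually uses. `FederatedConfig`, `ScrapeResult`, and `scrape_all` are hypothetical names, not the PR's API; each closure represents one HTTP scrape that returns a body or an error string.

```rust
use std::thread;

/// Hypothetical stand-in for the feature flag that guards the endpoint.
pub struct FederatedConfig {
    pub enabled: bool,
}

/// Result of one scrape: the response body, or an error message.
pub type ScrapeResult = Result<String, String>;

/// Scrapes environmentd's own metrics and every replica concurrently.
/// Returns `None` when the flag is off; otherwise returns the local result
/// first, followed by one result per replica in input order.
pub fn scrape_all<L, R>(config: &FederatedConfig, local: L, replicas: &[R]) -> Option<Vec<ScrapeResult>>
where
    L: Fn() -> ScrapeResult + Sync,
    R: Fn() -> ScrapeResult + Sync,
{
    if !config.enabled {
        return None;
    }
    let results = thread::scope(|s| {
        // Scoped threads may borrow `local` and `replicas`; all of them are
        // joined before `thread::scope` returns.
        let local_handle = s.spawn(|| local());
        let replica_handles: Vec<_> = replicas
            .iter()
            .map(|scrape| s.spawn(move || scrape()))
            .collect();
        let mut out = Vec::with_capacity(replicas.len() + 1);
        out.push(local_handle.join().expect("local scrape panicked"));
        for handle in replica_handles {
            // A failed replica shows up as an `Err` entry, not a failed request.
            out.push(handle.join().expect("replica scrape panicked"));
        }
        out
    });
    Some(results)
}
```

Keeping results in input order makes it straightforward to pair each body with the replica it came from when labels are added afterwards.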
Adds the `protobuf` feature to the `prometheus` crate for future use and updates all calls of `get_value` to `value`. `get_value` has been deprecated since v0.14.0, and with the `protobuf` feature enabled the build forces us to use `value` instead.
Adds a small crate that owns the rust-protobuf-based decoder for
the Prometheus protobuf wire format. Encoding goes through
`prometheus::ProtobufEncoder` directly; decoding requires rust-protobuf's `Message` trait and `CodedInputStream`, which we fence behind
this crate via a `deny.toml` wrapper exception.
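For context, Prometheus' protobuf exposition format is a stream of `MetricFamily` messages, each preceded by its byte length encoded as a varint (the `encoding=delimited` variant). The sketch below shows only that framing step with the standard library; decoding each frame into a `MetricFamily` is the part the crate delegates to rust-protobuf. `FrameError`, `read_varint`, and `split_delimited` are hypothetical names for illustration.

```rust
/// Errors for a malformed length-delimited stream.
#[derive(Debug, PartialEq)]
pub enum FrameError {
    TruncatedVarint,
    VarintTooLong,
    TruncatedMessage { expected: usize, available: usize },
}

/// Reads one base-128 varint from the front of `buf`, returning the value
/// and how many bytes it occupied (at most 10 for a u64).
fn read_varint(buf: &[u8]) -> Result<(u64, usize), FrameError> {
    let mut value: u64 = 0;
    for (i, &byte) in buf.iter().enumerate() {
        if i == 10 {
            return Err(FrameError::VarintTooLong);
        }
        // Low 7 bits carry data, least-significant group first.
        value |= u64::from(byte & 0x7f) << (7 * i);
        // A clear high bit marks the last byte of the varint.
        if (byte & 0x80) == 0 {
            return Ok((value, i + 1));
        }
    }
    Err(FrameError::TruncatedVarint)
}

/// Splits a delimited stream into the raw bytes of each message. Each slice
/// would then go to a protobuf decoder to become a `MetricFamily`.
pub fn split_delimited(mut buf: &[u8]) -> Result<Vec<&[u8]>, FrameError> {
    let mut frames = Vec::new();
    while !buf.is_empty() {
        let (len, header_len) = read_varint(buf)?;
        let rest = &buf[header_len..];
        let len = usize::try_from(len).map_err(|_| FrameError::VarintTooLong)?;
        if rest.len() < len {
            return Err(FrameError::TruncatedMessage { expected: len, available: rest.len() });
        }
        frames.push(&rest[..len]);
        buf = &rest[len..];
    }
    Ok(frames)
}
```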

The initial motivation for denylisting the `protobuf` crate was that it wasn't actively maintained and that we could get much of the same behavior from the `prost` and `protobuf-native` crates. However, because the `prometheus` crate already exposes encoding/decoding through `protobuf`, the alternative crates would require copying or forking Prometheus' `client_model.proto` and writing our own shims.
The federated `/metrics/external` endpoint will use this to scrape clusterd replicas in protobuf so it can mutate labels before re-emitting
the merged result as text. All callers of this endpoint now need to pass the header map.
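A minimal sketch of why the handler needs the header map: Prometheus scrapers negotiate the exposition format through the `Accept` header, and a protobuf request carries the `application/vnd.google.protobuf; proto=io.prometheus.client.MetricFamily; encoding=delimited` media type. Here the headers are assumed to arrive as a plain `HashMap<String, String>` rather than an HTTP library's header type, and `ScrapeFormat` and `choose_format` are hypothetical names.

```rust
use std::collections::HashMap;

/// Media type of Prometheus' length-delimited protobuf exposition format.
pub const PROTOBUF_MEDIA_TYPE: &str =
    "application/vnd.google.protobuf; proto=io.prometheus.client.MetricFamily; encoding=delimited";

#[derive(Debug, PartialEq)]
pub enum ScrapeFormat {
    Protobuf,
    Text,
}

/// Chooses the response format from the caller's headers. Header names are
/// compared case-insensitively, as HTTP requires; anything other than an
/// explicit protobuf request falls back to the text format.
pub fn choose_format(headers: &HashMap<String, String>) -> ScrapeFormat {
    let accept = headers
        .iter()
        .find(|(name, _)| name.eq_ignore_ascii_case("accept"))
        .map(|(_, value)| value.as_str())
        .unwrap_or("");
    let wants_protobuf = accept.split(',').any(|range| {
        let range = range.trim();
        range.starts_with("application/vnd.google.protobuf")
            && range.contains("proto=io.prometheus.client.MetricFamily")
    });
    if wants_protobuf {
        ScrapeFormat::Protobuf
    } else {
        ScrapeFormat::Text
    }
}
```

Defaulting to text keeps ordinary scrapers working; only callers that explicitly ask for protobuf, such as environmentd scraping a replica, get the binary format.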
Extracts helper functions used to proxy clusterd HTTP endpoints from environmentd. We'll use these to proxy each clusterd's metrics endpoint.
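A sketch of what such a shared helper might look like, assuming environmentd knows each replica's IDs and the HTTP address of every clusterd process. `ReplicaTarget`, `ProxyRequest`, and `proxy_requests` are hypothetical, and the addresses in the example are placeholders.

```rust
/// Hypothetical view of one replica as environmentd knows it.
pub struct ReplicaTarget {
    pub cluster_id: String,
    pub replica_id: String,
    /// HTTP address (`host:port`) of each clusterd process, by process index.
    pub http_addrs: Vec<String>,
}

/// One request to proxy, tagged with where it goes so the response can be
/// attributed to the right cluster, replica, and process.
#[derive(Debug, PartialEq)]
pub struct ProxyRequest {
    pub cluster_id: String,
    pub replica_id: String,
    pub process: usize,
    pub url: String,
}

/// Builds one request per clusterd process for the given HTTP `path`.
pub fn proxy_requests(target: &ReplicaTarget, path: &str) -> Vec<ProxyRequest> {
    let path = path.trim_start_matches('/');
    target
        .http_addrs
        .iter()
        .enumerate()
        .map(|(process, addr)| ProxyRequest {
            cluster_id: target.cluster_id.clone(),
            replica_id: target.replica_id.clone(),
            process,
            url: format!("http://{}/{}", addr.trim_end_matches('/'), path),
        })
        .collect()
}
```

Sharing one helper means the existing per-replica proxy and the metrics fan-out build URLs the same way.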
Adds a new metrics endpoint that aggregates env's local Prometheus
metrics with every clusterd replica's `/metrics` output. Each
clusterd-sourced metric is decorated with `cluster_id`, `replica_id`,
`process`, and (when known) `cluster_name` and `replica_name` labels before
the merged result is re-emitted as Prometheus text. Names come from a
catalog snapshot taken once per request; IDs come from the known addresses. The handler scrapes replicas in protobuf (added in the
previous `handle_prometheus` commit) so labels can be mutated cheaply
without a text-format reparse.
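The decoration and re-emission steps can be sketched with simplified in-memory types. This assumes a decoded family is just a name, HELP text, TYPE, and a list of labeled samples; `Family`, `Sample`, `Origin`, `decorate`, and `render_text` are hypothetical stand-ins for the protobuf `MetricFamily` types, and the renderer ignores histogram and summary suffixes such as `_bucket`.

```rust
use std::fmt::Write;

/// One labeled sample (simplified: no timestamp, no exemplars).
#[derive(Clone, Debug)]
pub struct Sample {
    pub labels: Vec<(String, String)>,
    pub value: f64,
}

/// A decoded metric family: one name, one HELP, one TYPE, many samples.
#[derive(Clone, Debug)]
pub struct Family {
    pub name: String,
    pub help: String,
    pub kind: String,
    pub samples: Vec<Sample>,
}

/// Where a family was scraped from. Names are optional because they are
/// only added when known.
pub struct Origin<'a> {
    pub cluster_id: &'a str,
    pub replica_id: &'a str,
    pub process: usize,
    pub cluster_name: Option<&'a str>,
    pub replica_name: Option<&'a str>,
}

/// Appends the origin labels to every sample in `family`.
pub fn decorate(family: &mut Family, origin: &Origin) {
    for sample in &mut family.samples {
        sample.labels.push(("cluster_id".into(), origin.cluster_id.into()));
        sample.labels.push(("replica_id".into(), origin.replica_id.into()));
        sample.labels.push(("process".into(), origin.process.to_string()));
        if let Some(name) = origin.cluster_name {
            sample.labels.push(("cluster_name".into(), name.into()));
        }
        if let Some(name) = origin.replica_name {
            sample.labels.push(("replica_name".into(), name.into()));
        }
    }
}

/// Escapes a label value per the Prometheus text format: `\`, `"`, newline.
fn escape_label_value(value: &str) -> String {
    value.replace('\\', "\\\\").replace('"', "\\\"").replace('\n', "\\n")
}

/// Renders families as Prometheus text (HELP escaping omitted for brevity).
pub fn render_text(families: &[Family]) -> String {
    let mut out = String::new();
    for family in families {
        writeln!(out, "# HELP {} {}", family.name, family.help).unwrap();
        writeln!(out, "# TYPE {} {}", family.name, family.kind).unwrap();
        for sample in &family.samples {
            out.push_str(&family.name);
            if !sample.labels.is_empty() {
                let pairs: Vec<String> = sample
                    .labels
                    .iter()
                    .map(|(k, v)| format!("{k}=\"{}\"", escape_label_value(v)))
                    .collect();
                write!(out, "{{{}}}", pairs.join(",")).unwrap();
            }
            writeln!(out, " {}", sample.value).unwrap();
        }
    }
    out
}
```

Because labels are appended to already-decoded structs, no text parsing is involved; label values are escaped once, at render time.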
@SangJunBak SangJunBak force-pushed the jun/external-metrics-endpoint-1 branch 2 times, most recently from 53cf91c to 5bb5182 May 13, 2026 15:37
Creates a server test to ensure a cluster replica's metrics are reflected in the new endpoint with the correct labels. Creates a similar cloudtest so the endpoint is also verified in a Kubernetes environment.
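A std-only sketch of the kind of check such a test might make against the endpoint's text output: find a sample of a given metric that carries all expected labels. `has_series` and `parse_labels` are hypothetical helpers, and the label parser assumes values contain no escaped quotes or commas, which holds for IDs and simple names but not for arbitrary strings.

```rust
use std::collections::BTreeMap;

/// Parses the inside of a label block, e.g. `a="1",b="2"`.
/// Assumes values contain no escaped quotes or commas.
fn parse_labels(block: &str) -> BTreeMap<&str, &str> {
    block
        .split(',')
        .filter_map(|pair| {
            let (key, value) = pair.split_once('=')?;
            Some((key.trim(), value.trim().trim_matches('"')))
        })
        .collect()
}

/// Returns true if some sample of exactly `metric` carries every
/// `(label, value)` pair in `expected`.
pub fn has_series(text: &str, metric: &str, expected: &[(&str, &str)]) -> bool {
    text.lines()
        .filter(|line| !line.starts_with('#'))
        .filter_map(|line| {
            let rest = line.strip_prefix(metric)?;
            // Requiring `{` right after the name rejects longer names such as
            // `<metric>_total`.
            let block = rest.strip_prefix('{')?;
            let end = block.find('}')?;
            Some(parse_labels(&block[..end]))
        })
        .any(|labels| expected.iter().all(|(k, v)| labels.get(k) == Some(v)))
}
```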
@SangJunBak SangJunBak force-pushed the jun/external-metrics-endpoint-1 branch from 5bb5182 to c32c1d2 May 13, 2026 15:43
@def- (Contributor) left a comment

I pushed a commit with a failing cargo test

@aljoscha (Contributor) left a comment

The structure and wiring and all this look good. Good to merge once Dennis is happy on the testing front and Pranshu is happy with the naming and things.

We'll now have duplicate metric families. That alone wouldn't be a problem, but it means we'd emit multiple HELP/TYPE lines for the same metric name, which would cause errors in scrapers.
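One possible fix, sketched under simplifying assumptions: merge families by name before rendering so each metric name gets a single HELP and TYPE line followed by all of its series, which is what the Prometheus text format expects. `Family`, `merge_families`, and `render` are hypothetical; samples are kept as pre-rendered series lines, the first HELP text wins, and a TYPE mismatch is treated as an error rather than guessed at.

```rust
use std::collections::HashMap;

/// Simplified family: samples are already-rendered series lines.
#[derive(Clone, Debug, PartialEq)]
pub struct Family {
    pub name: String,
    pub help: String,
    pub kind: String,
    pub samples: Vec<String>,
}

/// Merges families that share a name, preserving first-seen order.
pub fn merge_families(families: Vec<Family>) -> Result<Vec<Family>, String> {
    let mut merged: Vec<Family> = Vec::new();
    let mut position: HashMap<String, usize> = HashMap::new();
    for family in families {
        match position.get(&family.name).copied() {
            Some(i) => {
                let existing = &mut merged[i];
                if existing.kind != family.kind {
                    return Err(format!(
                        "metric {} has conflicting types: {} vs {}",
                        family.name, existing.kind, family.kind
                    ));
                }
                // Keep the first HELP text; only the series are combined.
                existing.samples.extend(family.samples);
            }
            None => {
                position.insert(family.name.clone(), merged.len());
                merged.push(family);
            }
        }
    }
    Ok(merged)
}

/// Emits one HELP and one TYPE line per family, then all of its series.
pub fn render(families: &[Family]) -> String {
    let mut out = String::new();
    for family in families {
        out.push_str(&format!("# HELP {} {}\n", family.name, family.help));
        out.push_str(&format!("# TYPE {} {}\n", family.name, family.kind));
        for line in &family.samples {
            out.push_str(line);
            out.push('\n');
        }
    }
    out
}
```

Merging after labels are added keeps the series distinguishable (they differ in `replica_id` and friends) while the family header appears only once.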
@SangJunBak SangJunBak merged commit 8b77a49 into MaterializeInc:main May 13, 2026
121 checks passed
3 participants