feat: Add Devnet-4 metrics #29

Merged: KatyaRyazantseva merged 3 commits into leanEthereum:main from KatyaRyazantseva:devnet4-metrics on Apr 16, 2026
@KatyaRyazantseva (Collaborator)

This PR adds a new set of Devnet-4 metrics related to recursive aggregation, plus extra metrics for monitoring node sync status and gossip message sizes.

MegaRedHand pushed a commit to lambdaclass/ethlambda that referenced this pull request Apr 15, 2026
## Motivation

Implements the metrics defined in leanEthereum/leanMetrics#29 for
Devnet-4 monitoring: block production, gossip message sizes, sync
status, and updated histogram buckets.

## Description

### Block Production Metrics (`blockchain/metrics.rs` → instrumented in `lib.rs` + `store.rs`)

| Metric | Type | Buckets |
|--------|------|---------|
| `lean_block_building_time_seconds` | Histogram | 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 0.75, 1 |
| `lean_block_building_payload_aggregation_time_seconds` | Histogram | 0.1, 0.25, 0.5, 0.75, 1, 2, 3, 4 |
| `lean_block_aggregated_payloads` | Histogram | 1, 2, 4, 8, 16, 32, 64, 128 |
| `lean_block_building_success_total` | Counter | - |
| `lean_block_building_failures_total` | Counter | - |

- `propose_block()` is wrapped with a timing guard for total block
building time, and increments success/failure counters on each exit
path.
- `produce_block_with_signatures()` times the `build_block()` call
specifically for payload aggregation, and observes the aggregated
payload count.
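
The timing-guard pattern described in the bullets above can be sketched without any metrics crate. This is only an illustration: `BlockMetrics`, `TimingGuard`, and the closure-based `propose_block` signature are hypothetical stand-ins, not ethlambda's actual code or any library's API. The guard records the elapsed time when it is dropped, so every exit path is timed, while the counters are incremented explicitly on the success and failure branches.

```rust
use std::cell::{Cell, RefCell};
use std::time::Instant;

/// Hypothetical stand-in for a metrics registry; a real client would use its metrics library.
#[derive(Default)]
struct BlockMetrics {
    building_time_seconds: RefCell<Vec<f64>>, // observations for lean_block_building_time_seconds
    success_total: Cell<u64>,                 // lean_block_building_success_total
    failures_total: Cell<u64>,                // lean_block_building_failures_total
}

/// Records the elapsed time into `sink` when dropped.
struct TimingGuard<'a> {
    start: Instant,
    sink: &'a RefCell<Vec<f64>>,
}

impl<'a> TimingGuard<'a> {
    fn new(sink: &'a RefCell<Vec<f64>>) -> Self {
        Self { start: Instant::now(), sink }
    }
}

impl Drop for TimingGuard<'_> {
    fn drop(&mut self) {
        self.sink.borrow_mut().push(self.start.elapsed().as_secs_f64());
    }
}

/// Hypothetical wrapper: `build` stands in for the real block-building logic.
fn propose_block<F>(metrics: &BlockMetrics, build: F) -> Result<u64, String>
where
    F: FnOnce() -> Result<u64, String>,
{
    // Dropped at the end of this function, on both the Ok and Err paths.
    let _timer = TimingGuard::new(&metrics.building_time_seconds);
    match build() {
        Ok(slot) => {
            metrics.success_total.set(metrics.success_total.get() + 1);
            Ok(slot)
        }
        Err(e) => {
            metrics.failures_total.set(metrics.failures_total.get() + 1);
            Err(e)
        }
    }
}

fn main() {
    let metrics = BlockMetrics::default();
    let _ = propose_block(&metrics, || Ok(7));
    let _ = propose_block(&metrics, || Err("no parent".to_string()));
    println!(
        "success={} failures={} observations={}",
        metrics.success_total.get(),
        metrics.failures_total.get(),
        metrics.building_time_seconds.borrow().len()
    );
}
```

Tying the observation to `Drop` means an early `return` or `?` added later cannot silently skip the timing.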

### Sync Status (`blockchain/metrics.rs` → tracked in `lib.rs`)

| Metric | Type | Labels |
|--------|------|--------|
| `lean_node_sync_status` | Gauge | status=idle,syncing,synced |

- Set to `idle` at startup (before first tick).
- Updated to `syncing` or `synced` after every block, by comparing
`head_slot` against the wall clock slot.
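
A rough sketch of the classification described above follows. Two details are assumptions rather than facts from this commit: the `tolerance_slots` parameter (the value 2 used in `main` mirrors the rule quoted in the gean commit further down this thread; ethlambda's threshold is not stated here), and the encoding that exports one sample per `status` label with `1` for the active state.

```rust
/// The three label values of `lean_node_sync_status`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SyncStatus {
    Idle,
    Syncing,
    Synced,
}

impl SyncStatus {
    const ALL: [SyncStatus; 3] = [SyncStatus::Idle, SyncStatus::Syncing, SyncStatus::Synced];

    fn label(self) -> &'static str {
        match self {
            SyncStatus::Idle => "idle",
            SyncStatus::Syncing => "syncing",
            SyncStatus::Synced => "synced",
        }
    }
}

/// Classify after a block is processed.
/// `tolerance_slots` is an assumed parameter, not taken from the PR.
fn classify(head_slot: u64, wall_clock_slot: u64, tolerance_slots: u64) -> SyncStatus {
    if wall_clock_slot.saturating_sub(head_slot) <= tolerance_slots {
        SyncStatus::Synced
    } else {
        SyncStatus::Syncing
    }
}

/// Render one sample per label value, with 1 for the active status (assumed encoding).
fn render_gauge(current: SyncStatus) -> String {
    SyncStatus::ALL
        .iter()
        .map(|s| {
            let v = if *s == current { 1 } else { 0 };
            format!("lean_node_sync_status{{status=\"{}\"}} {}\n", s.label(), v)
        })
        .collect()
}

fn main() {
    let mut status = SyncStatus::Idle; // startup value, before the first tick
    print!("{}", render_gauge(status));
    status = classify(90, 100, 2); // head is 10 slots behind the wall clock
    print!("{}", render_gauge(status));
}
```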

### Gossip Message Size Metrics (`p2p/metrics.rs` → instrumented in `gossipsub/handler.rs`)

| Metric | Type | Buckets |
|--------|------|---------|
| `lean_gossip_block_size_bytes` | Histogram | 10K, 50K, 100K, 250K, 500K, 1M, 2M, 5M |
| `lean_gossip_attestation_size_bytes` | Histogram | 512, 1K, 2K, 4K, 8K, 16K |
| `lean_gossip_aggregation_size_bytes` | Histogram | 1K, 4K, 16K, 64K, 128K, 256K, 512K, 1M |

- Observed on the **uncompressed** message after snappy decompression,
for each gossip message type.
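
To make the observation step concrete, here is a minimal sketch of a cumulative-bucket histogram fed with the decompressed payload length. The `Histogram` type and the `on_attestation` hook are hypothetical, and reading "1K" as 1024 bytes is an assumption; the table does not say whether the suffixes are decimal or binary.

```rust
/// Minimal cumulative-bucket histogram (hypothetical; real clients use a metrics library).
struct Histogram {
    upper_bounds: Vec<f64>,
    bucket_counts: Vec<u64>, // cumulative counts, one per upper bound
    count: u64,              // total observations (the implicit +Inf bucket)
    sum: f64,
}

impl Histogram {
    fn new(upper_bounds: &[f64]) -> Self {
        Self {
            upper_bounds: upper_bounds.to_vec(),
            bucket_counts: vec![0; upper_bounds.len()],
            count: 0,
            sum: 0.0,
        }
    }

    fn observe(&mut self, value: f64) {
        // Cumulative semantics: a value counts toward every bucket whose bound it fits under.
        for (bound, slot) in self.upper_bounds.iter().zip(self.bucket_counts.iter_mut()) {
            if value <= *bound {
                *slot += 1;
            }
        }
        self.count += 1;
        self.sum += value;
    }
}

/// Attestation buckets from the table, reading "1K" as 1024 bytes (an assumption).
const ATTESTATION_SIZE_BUCKETS: [f64; 6] = [512.0, 1024.0, 2048.0, 4096.0, 8192.0, 16384.0];

/// Hypothetical handler hook: `decompressed` is the payload after snappy decompression.
fn on_attestation(hist: &mut Histogram, decompressed: &[u8]) {
    hist.observe(decompressed.len() as f64);
}

fn main() {
    let mut hist = Histogram::new(&ATTESTATION_SIZE_BUCKETS);
    on_attestation(&mut hist, &vec![0u8; 300]);
    on_attestation(&mut hist, &vec![0u8; 3000]);
    on_attestation(&mut hist, &vec![0u8; 20000]); // larger than every bound: only +Inf sees it
    println!("{:?} count={} sum={}", hist.bucket_counts, hist.count, hist.sum);
}
```

A message larger than the top bound is still counted in `count` and `sum`, but its size can no longer be resolved from the buckets, which is why the bucket ranges need to cover the expected message sizes.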

### Modified Existing Metric

- `lean_committee_signatures_aggregation_time_seconds`: buckets updated
from `[0.005..1.0]` to `[0.05..4.0]` to capture longer aggregation times
in Devnet-4.

## How to Test

1. `make fmt` — passes
2. `make lint` — passes (clippy clean)
3. `make test` — spec test failures are pre-existing (fixture format
mismatch, unrelated to this PR)
4. Run a local devnet and verify new metrics appear on the `/metrics`
endpoint (port 5054)
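
Step 4 can be partly automated. The sketch below is an assumption-laden helper, not part of the PR: it takes the text body of a `/metrics` response (fetching it is left out to keep the example dependency-free) and reports which of the new metric names have no sample. It relies on the standard Prometheus text format, where histograms appear as `<name>_bucket`, `<name>_sum`, and `<name>_count`.

```rust
/// Metric names introduced by this PR (taken from the tables above).
const EXPECTED: [&str; 9] = [
    "lean_block_building_time_seconds",
    "lean_block_building_payload_aggregation_time_seconds",
    "lean_block_aggregated_payloads",
    "lean_block_building_success_total",
    "lean_block_building_failures_total",
    "lean_node_sync_status",
    "lean_gossip_block_size_bytes",
    "lean_gossip_attestation_size_bytes",
    "lean_gossip_aggregation_size_bytes",
];

/// Extract the metric name from one sample line; comments and blank lines yield None.
fn sample_name(line: &str) -> Option<&str> {
    let line = line.trim();
    if line.is_empty() || line.starts_with('#') {
        return None;
    }
    let end = line.find(|c: char| c == '{' || c == ' ').unwrap_or(line.len());
    Some(&line[..end])
}

/// Return the expected metrics that have no sample in `body`.
fn missing_metrics<'a>(body: &str, expected: &[&'a str]) -> Vec<&'a str> {
    let names: Vec<&str> = body.lines().filter_map(sample_name).collect();
    expected
        .iter()
        .copied()
        .filter(|want| {
            !names.iter().any(|got| {
                // Histograms expose `<name>_bucket`, `<name>_sum`, `<name>_count`.
                *got == *want
                    || got
                        .strip_prefix(*want)
                        .map_or(false, |rest| matches!(rest, "_bucket" | "_sum" | "_count"))
            })
        })
        .collect()
}

fn main() {
    // Sample body; in practice this would be fetched from the node's /metrics endpoint.
    let body = "\
# TYPE lean_block_building_success_total counter
lean_block_building_success_total 3
lean_node_sync_status{status=\"synced\"} 1
lean_gossip_block_size_bytes_bucket{le=\"+Inf\"} 0
";
    for name in missing_metrics(body, &EXPECTED) {
        println!("missing: {}", name);
    }
}
```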

KatyaRyazantseva merged commit d38c9e9 into leanEthereum:main on Apr 16, 2026
dimka90 added a commit to geanlabs/gean that referenced this pull request Apr 17, 2026
Adds the metrics tracked in leanEthereum/leanMetrics#29 for cross-client
Grafana dashboarding. All client teams committed to implementing these.

Block production (5):
  lean_block_building_time_seconds                     histogram
  lean_block_building_payload_aggregation_time_seconds histogram
  lean_block_aggregated_payloads                       histogram
  lean_block_building_success_total                    counter
  lean_block_building_failures_total                   counter

Network gossip sizes (3):
  lean_gossip_block_size_bytes                         histogram
  lean_gossip_attestation_size_bytes                   histogram
  lean_gossip_aggregation_size_bytes                   histogram

Sync status (1):
  lean_node_sync_status{status=idle|syncing|synced}    gauge

Wiring:
  - Block production: timed in ProduceBlockWithSignatures + buildBlock
  - Gossip size: nil-safe hooks in p2p/gossip.go set by node at engine start
    (avoids p2p->node import cycle, matches existing AggregateMetricsFunc pattern)
  - Sync status: updated each tick based on head-vs-wallclock distance and
    peer count. idle=no peers, synced=head within 2 slots, syncing=otherwise

No production behavior change beyond observability.
zclawz pushed a commit to blockblaz/zeam that referenced this pull request Apr 17, 2026
Implements the metrics defined in leanEthereum/leanMetrics#29:

## Block production metrics (chain.zig)
- lean_block_building_time_seconds (Histogram): total produceBlock() wall time
- lean_block_building_payload_aggregation_time_seconds (Histogram): time
  to aggregate attestation payloads during block building
- lean_block_aggregated_payloads (Histogram): number of aggregated
  attestation signatures included in produced block
- lean_block_building_success_total (Counter): incremented on each
  successful block production
- lean_block_building_failures_total (Counter): incremented via errdefer
  on any block production error

## Sync status metric (chain.zig)
- lean_node_sync_status (Gauge): updated every onInterval tick;
  0=idle (no peers / fc_initing), 1=syncing (behind peers), 2=synced

## Gossip message size metrics (ethlibp2p.zig)
- lean_gossip_block_size_bytes (Histogram): uncompressed block gossip size
- lean_gossip_attestation_size_bytes (Histogram): uncompressed attestation size
- lean_gossip_aggregation_size_bytes (Histogram): uncompressed aggregation size
  Observed after snappy decode, per topic kind, matching spec bucket sizes.

## Updated existing metric
- lean_committee_signatures_aggregation_time_seconds: buckets widened
  from [0.005..1] to [0.05..4] to capture longer Devnet-4 aggregation times

## Infrastructure
- Added Histogram.record(value) method for direct observation without a timer
- Wired @zeam/metrics into @zeam/network module in build.zig

Ref: leanEthereum/leanMetrics#29
anshalshukla pushed a commit to blockblaz/zeam that referenced this pull request Apr 21, 2026
* feat(metrics): add Devnet-4 metrics from leanMetrics#29

* chore: align with leanMetrics devnet-4 spec

* chore: update zeam metrics

* fix: format fix

* fix: address anshalshukla review on PR #753

- Restore source labels ('gossip'/'aggregation'/'block') for
  lean_attestations_valid_total and lean_attestations_invalid_total
  counters in chain.zig and lib.zig (CounterVec restored)
- Move lean_committee_signatures_aggregation_time_seconds timer
  from aggregate() into aggregateUnlocked() so it measures only
  the time after the mutex is acquired
- Add metrics around compactAttestations in forkchoice.zig:
  lean_compact_attestations_time_seconds, _input_total, _output_total
- Restore the Metrics Definitions section in README.md (including
  updated metric names and new block/compact metrics)
- Restore full init() boilerplate in README Step 2
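
The timer move described in the review notes above (from `aggregate()` into `aggregateUnlocked()`) can be illustrated in Rust. This is a sketch of the general idea only, not zeam's Zig code: `Aggregator` and both method bodies are hypothetical. Starting the timer after the lock is held keeps lock contention out of the measured aggregation time.

```rust
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Hypothetical aggregator state guarded by a mutex.
struct Aggregator {
    signatures: Mutex<Vec<u64>>,
}

impl Aggregator {
    /// Returns (result, time including lock wait, time after the lock was acquired).
    fn aggregate(&self) -> (u64, Duration, Duration) {
        let before_lock = Instant::now();
        let guard = self.signatures.lock().unwrap();
        let (sum, locked_only) = Self::aggregate_unlocked(&guard);
        (sum, before_lock.elapsed(), locked_only)
    }

    /// The caller already holds the lock; the timer starts here,
    /// so time spent waiting for the mutex is excluded.
    fn aggregate_unlocked(signatures: &[u64]) -> (u64, Duration) {
        let start = Instant::now();
        let sum: u64 = signatures.iter().sum(); // stand-in for real signature aggregation
        (sum, start.elapsed())
    }
}

fn main() {
    let agg = Aggregator { signatures: Mutex::new(vec![1, 2, 3]) };
    let (sum, total, locked_only) = agg.aggregate();
    println!("sum={} total={:?} locked_only={:?}", sum, total, locked_only);
}
```

Under contention the first duration grows with the wait for the mutex while the second does not, so only the second is a fair input for an aggregation-time histogram.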

* fix: use constSlice().len for utils.List length in metrics counters

* fix: update, remove redundant zeam metrics

* fix: prefix zeam inner metrics with 'zeam_'

* fix: correct lean_pq_sig_aggregated_signatures_building_time_seconds

---------

Co-authored-by: zclawz <zclawz@openclaw.ai>
Co-authored-by: Parthasarathy Ramanujam <1627026+ch4r10t33r@users.noreply.github.com>
Co-authored-by: Katya Ryazantseva <sibkatya@gmail.com>
Co-authored-by: zclawz <zclawz@blockblaz.xyz>
Co-authored-by: zclawz <zclawz@zeam.dev>