From dac9e840e338992a84fa7095f57fd7cef2da13f6 Mon Sep 17 00:00:00 2001 From: Tri Lam Date: Mon, 1 Jun 2026 16:00:40 -0700 Subject: [PATCH] feat(bench): absolute allocs/event ceiling gate (issue #302) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Adds scripts/bench-allocs-check.sh + an allocs_gate array in scripts/bench-registry.sh that pins each per-detector benchmark to an absolute allocs-per-event ceiling, complementing the existing % delta gate. Wired into `make bench-check` so CI fails closed on breach. The new gate is the load-bearing one for ratchet semantics: optimization PRs must lower the ceiling in the registry to lock the win in, and reverts cannot silently raise it back without showing up in `git diff scripts/bench-registry.sh`. Initial ceilings (measured baseline + 1 slack on Apple M1 Max; allocs/op is hardware-invariant so the same numbers gate on CI): - pod_evicted 15.27 ev → ceiling 16 - xid_correlation 12.40 ev → ceiling 13 - hbm_ecc 1.40 ev → ceiling 2 - nccl_hang 3.99 ev → ceiling 5 - thermal_throttle 0.76 ev → ceiling 1 - pcie_aer 0.51 ev → ceiling 1 NORTHSTAR is 2 allocs/event; the per-detector optimization trackers (filed separately) drive the ratchet path down. Mutation-verified: tightening HBM ceiling from 2 → 1 in the registry trips the gate (exit 1, "BREACH"). Fail-closed verified: a gated bench with no measurement exits 2. ```release-notes - New `make bench-allocs-check` (wired into `make bench-check`) enforces absolute allocs-per-event ceilings on every pattern- library detector; ratchets downward as optimization PRs land. ``` Signed-off-by: Tri Lam --- Makefile | 11 ++- bench/detectors/README.md | 22 ++++++ scripts/bench-allocs-check.sh | 139 ++++++++++++++++++++++++++++++++++ scripts/bench-registry.sh | 38 ++++++++++ 4 files changed, 208 insertions(+), 2 deletions(-) create mode 100755 scripts/bench-allocs-check.sh diff --git a/Makefile b/Makefile index 539bce2a..ac7d73b8 100644 --- a/Makefile +++ b/Makefile @@ -2,7 +2,7 @@ .PHONY: help build clean hooks # Test suites -.PHONY: test test-extras test-extras-sustained test-extras-fuzz test-extras-fuzz-kmsg test-extras-fuzz-journald test-extras-fuzz-nccl-fr test-extras-race bench bench-check bench-baseline bench-detectors bench-detectors-check bench-detectors-baseline +.PHONY: test test-extras test-extras-sustained test-extras-fuzz test-extras-fuzz-kmsg test-extras-fuzz-journald test-extras-fuzz-nccl-fr test-extras-race bench bench-check bench-allocs-check bench-baseline bench-detectors bench-detectors-check bench-detectors-baseline # Format + tidy .PHONY: fmt fmt-fix vet lint lint-fix tidy tidy-check mod-verify bump-otel @@ -79,16 +79,23 @@ test: ## Run unit tests with the race detector. bench: ## Run benchmarks across the repo with -benchmem, count=5. go test -bench=Benchmark -benchmem -benchtime=500ms -count=5 -run='^$$' ./... -bench-check: ## Run perf gate: bench each gated pkg vs its committed baseline; fail if any row regresses >THRESHOLD% (default 10). +bench-check: ## Run perf gate: bench each gated pkg vs its committed baseline; fail if any row regresses >THRESHOLD% (default 10). Also runs the absolute allocs/event ceiling gate (issue #302). @# Per-package registry + execution lives in scripts/bench-check-all.sh @# rather than inline here: the bench regex contains parentheses that @# Make-invoked /bin/sh tokenises as subshell metacharacters. THRESHOLD @# env var passes through ("THRESHOLD=20 make bench-check" etc.). scripts/bench-check-all.sh + @# Absolute alloc-per-event ceiling gate (issue #302). Complementary + @# to the % delta gate above — catches absolute ceiling breaches even + @# if the baseline silently drifted upward. + scripts/bench-allocs-check.sh bench-baseline: ## Regenerate every per-package bench-baseline.txt. Commit the diff after vetting it on your hardware. scripts/bench-baseline.sh +bench-allocs-check: ## Run absolute allocs/event ceiling gate for per-detector benches (issue #302). Fails if any detector exceeds its ceiling in scripts/bench-registry.sh::allocs_gate. + scripts/bench-allocs-check.sh + bench-detectors: ## Run per-detector allocs/event benches (issue #302). count=10 x 500ms; outputs full benchstat-compatible block. go test -bench=. -benchmem -benchtime=500ms -count=10 -run='^$$' ./bench/detectors/ diff --git a/bench/detectors/README.md b/bench/detectors/README.md index 120239c0..559300e6 100644 --- a/bench/detectors/README.md +++ b/bench/detectors/README.md @@ -35,6 +35,28 @@ go test -bench=. -benchmem -benchtime=500ms -count=10 \ -run='^$' ./bench/detectors/ ``` +## Two gates: hard absolute ceiling + soft % delta + +Two complementary gates run against these benches: + +1. **`make bench-allocs-check` (HARD, gates CI)** — reads + `allocs_gate` from `scripts/bench-registry.sh` and fails the build + if any detector exceeds its `allocs_per_event` ceiling. Catches + absolute breaches: a new detector lands above the per-event budget, + or a baseline-update PR tries to raise the ceiling without + adversarial review. Wired into `make bench-check` (CI step) so it + runs on every PR. Fails closed — a gated bench with no output + exits 2. + +2. **`make bench-detectors-check` (SOFT, advisory)** — runs in the + post-merge `bench.yml` workflow and reports a % delta vs + `baselines.json`. Graduates to hard-fail per the criterion below. + +The hard gate is the load-bearing one for ratchet semantics: +optimization PRs lower the ceiling in `allocs_gate`, locking the win +in. Reverts cannot silently raise it back without showing up in +`git diff scripts/bench-registry.sh`. + ## CI wiring (soft gate) The CI runs `make bench-detectors-check` and reports the diff but diff --git a/scripts/bench-allocs-check.sh b/scripts/bench-allocs-check.sh new file mode 100755 index 00000000..d63a12f2 --- /dev/null +++ b/scripts/bench-allocs-check.sh @@ -0,0 +1,139 @@ +#!/usr/bin/env bash +# bench-allocs-check.sh — hard absolute alloc-per-event ceiling gate +# for the per-detector benches (issue #302). +# +# Reads `allocs_gate` from scripts/bench-registry.sh, runs every +# benchmark under ./bench/detectors/ at count=10 x 500ms, parses the +# median allocs/op per benchmark, divides by the entry's window size, +# and fails (exit 1) if any allocs-per-event value exceeds its ceiling. +# +# Complementary to scripts/bench-check.sh (the % delta gate) — the +# delta gate only catches relative regression vs the committed +# baseline; this gate catches an absolute ceiling breach (e.g. a new +# detector lands above the per-event budget, or a baseline-update PR +# tries to raise the ceiling silently). Together they ratchet +# DOWNWARD: optimization PRs must lower the ceiling to lock the win in. +# +# FAIL CLOSED: a benchmark in `allocs_gate` with no measurement in the +# bench output exits 2 (configuration error). Missing data must not +# silently pass — that defeats the purpose of an absolute gate. +# +# Exit codes: +# 0 every gated detector stayed at-or-below its allocs-per-event ceiling. +# 1 one or more detectors breached the ceiling. +# 2 configuration error (gated bench produced no output, bad entry +# format, missing `go`, etc.). +# +# Portability: avoids bash 4 features so the script runs on macOS +# stock bash 3.2. +set -euo pipefail + +# shellcheck source=scripts/bench-registry.sh +source "$(dirname "$0")/bench-registry.sh" + +# shellcheck disable=SC2154 # allocs_gate is defined by bench-registry.sh +if [[ ${#allocs_gate[@]} -eq 0 ]]; then + echo "bench-allocs-check: allocs_gate is empty in scripts/bench-registry.sh" >&2 + exit 2 +fi + +if ! command -v go >/dev/null 2>&1; then + echo "bench-allocs-check: go not in PATH" >&2 + exit 2 +fi + +tmp=$(mktemp) +trap 'rm -f "$tmp"' EXIT + +echo "==> running per-detector benches (count=10 x 500ms)" +go test -bench=. -benchmem -benchtime=500ms -count=10 \ + -run='^$' ./bench/detectors/ > "$tmp" 2>&1 + +# Collect (bench_name, allocs_per_op) pairs. Bench names in Go output +# are suffixed with `-N` where N is GOMAXPROCS; strip it for matching. +samples=$(grep -E '^Benchmark[A-Za-z]+-[0-9]+' "$tmp" | awk '{ + for (i = 1; i <= NF; i++) { + if ($i == "allocs/op") { + name = $1 + sub(/-[0-9]+$/, "", name) + print name, $(i - 1) + break + } + } +}') + +echo +echo "==> absolute allocs/event ceiling check" +echo + +status=0 +breached="" + +for entry in "${allocs_gate[@]}"; do + name="${entry%%|*}" + rest="${entry#*|}" + window="${rest%%|*}" + ceiling="${rest#*|}" + + if [[ -z "$name" || -z "$window" || -z "$ceiling" ]]; then + echo "bench-allocs-check: malformed allocs_gate entry: $entry" >&2 + status=2 + continue + fi + if ! [[ "$window" =~ ^[0-9]+$ && "$ceiling" =~ ^[0-9]+$ ]]; then + echo "bench-allocs-check: non-numeric window/ceiling in: $entry" >&2 + status=2 + continue + fi + + # Median allocs/op across the 10 samples. Take the upper-median + # (NR/2+1) so we round toward the worse side — single-spike noise + # cannot mask a real regression. + median=$(awk -v k="$name" '$1 == k {print $2}' <<<"$samples" \ + | sort -n \ + | awk 'BEGIN {n=0} {a[++n] = $1} END {if (n == 0) exit 0; print a[int(n/2) + 1]}') + + if [[ -z "$median" ]]; then + echo "bench-allocs-check: no measurement for gated bench '$name'" >&2 + echo " (allocs_gate entry has no matching Benchmark in ./bench/detectors/)" >&2 + status=2 + continue + fi + + allocs_per_event=$(awk -v m="$median" -v w="$window" 'BEGIN { + if (w == 0) { print "+inf"; exit } + printf "%.4f", m / w + }') + + over=$(awk -v a="$allocs_per_event" -v c="$ceiling" 'BEGIN { + print (a + 0 > c + 0) ? "yes" : "no" + }') + + if [[ "$over" == "yes" ]]; then + printf " %-40s ceiling=%-3s allocs/op=%-7s allocs/ev=%-7s BREACH\n" \ + "$name" "$ceiling" "$median" "$allocs_per_event" + breached+=" $name allocs/event=$allocs_per_event > ceiling=$ceiling (allocs/op=$median, window=$window)"$'\n' + status=1 + else + printf " %-40s ceiling=%-3s allocs/op=%-7s allocs/ev=%-7s ok\n" \ + "$name" "$ceiling" "$median" "$allocs_per_event" + fi +done + +if [[ $status -eq 1 ]]; then + echo >&2 + echo "FAIL: the following detectors exceeded their allocs/event ceiling:" >&2 + echo "$breached" >&2 + echo "Either:" >&2 + echo " - root-cause the alloc growth (escape analysis, hot-path string ops, etc.)" >&2 + echo " - or raise the ceiling in scripts/bench-registry.sh (allocs_gate) WITH a" >&2 + echo " PR-body justification + adversarial reviewer signoff." >&2 +elif [[ $status -eq 2 ]]; then + echo >&2 + echo "FAIL: configuration error (see messages above). Gate exits 2 (fail-closed)." >&2 +else + echo + echo "PASS: every detector stayed at-or-below its allocs/event ceiling." +fi + +exit "$status" diff --git a/scripts/bench-registry.sh b/scripts/bench-registry.sh index 6a57f89f..97ffbdf2 100644 --- a/scripts/bench-registry.sh +++ b/scripts/bench-registry.sh @@ -23,3 +23,41 @@ bench_entries=( "module/pkg/patterns|^BenchmarkPodEvictedDetector" "components/receivers/pyspy|^Benchmark(ParseDump|StackID)\$" ) + +# allocs_gate — per-detector absolute alloc ceilings (issue #302). Each +# entry is "bench_name|window_size|ceiling_allocs_per_event". +# +# Semantics: scripts/bench-allocs-check.sh runs every benchmark under +# ./bench/detectors/ at count=10 x 500ms, takes the median allocs/op +# per benchmark, divides by window_size to get allocs-per-event, and +# fails (exit 1) if any value exceeds its ceiling. +# +# This is a HARD ABSOLUTE GATE, complementary to bench-check.sh's % delta +# gate: % delta only catches relative regression, but this catches a +# detector that ships above the alloc-per-event ceiling regardless of +# its prior baseline. Ratchets DOWNWARD as detector optimizations land — +# a PR that reduces allocs/event must update the ceiling so the gain +# locks in. +# +# Window size: 1024 events, matching bench/detectors/detectors_bench_test.go +# (the `benchWindowSize` const). One source of truth: changing the window +# in the bench file requires updating these entries. +# +# Initial ceilings (measured baseline + 1 slack per #302): +# - pod_evicted measured 15.27/ev, ceiling 16 (tracked: optimize via #TBD) +# - xid_correlation measured 12.4/ev, ceiling 13 +# - hbm_ecc measured 1.4/ev, ceiling 2 +# - nccl_hang measured 3.99/ev, ceiling 5 +# - thermal_throttle measured 0.76/ev, ceiling 1 +# - pcie_aer measured 0.51/ev, ceiling 1 +# +# NORTHSTAR is 2 allocs/event; the per-detector tracking issues drive +# the ratchet path down to that ceiling. See #302 for the rollup. +allocs_gate=( + "BenchmarkPodEvictedDetector|1024|16" + "BenchmarkXidCorrelationDetector|1024|13" + "BenchmarkHBMECCDetector|1024|2" + "BenchmarkNCCLHangDetector|1024|5" + "BenchmarkThermalThrottleDetector|1024|1" + "BenchmarkPCIeAERDetector|1024|1" +)