Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 9 additions & 2 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
.PHONY: help build clean hooks

# Test suites
.PHONY: test test-extras test-extras-sustained test-extras-fuzz test-extras-fuzz-kmsg test-extras-fuzz-journald test-extras-fuzz-nccl-fr test-extras-race bench bench-check bench-baseline bench-detectors bench-detectors-check bench-detectors-baseline
.PHONY: test test-extras test-extras-sustained test-extras-fuzz test-extras-fuzz-kmsg test-extras-fuzz-journald test-extras-fuzz-nccl-fr test-extras-race bench bench-check bench-allocs-check bench-baseline bench-detectors bench-detectors-check bench-detectors-baseline

# Format + tidy
.PHONY: fmt fmt-fix vet lint lint-fix tidy tidy-check mod-verify bump-otel
Expand Down Expand Up @@ -79,16 +79,23 @@ test: ## Run unit tests with the race detector.
bench: ## Run benchmarks across the repo with -benchmem, count=5.
go test -bench=Benchmark -benchmem -benchtime=500ms -count=5 -run='^$$' ./...

bench-check: ## Run perf gate: bench each gated pkg vs its committed baseline; fail if any row regresses >THRESHOLD% (default 10).
bench-check: ## Run perf gate: bench each gated pkg vs its committed baseline; fail if any row regresses >THRESHOLD% (default 10). Also runs the absolute allocs/event ceiling gate (issue #302).
@# Per-package registry + execution lives in scripts/bench-check-all.sh
@# rather than inline here: the bench regex contains parentheses that
@# Make-invoked /bin/sh tokenises as subshell metacharacters. THRESHOLD
@# env var passes through ("THRESHOLD=20 make bench-check" etc.).
scripts/bench-check-all.sh
@# Absolute alloc-per-event ceiling gate (issue #302). Complementary
@# to the % delta gate above — catches absolute ceiling breaches even
@# if the baseline silently drifted upward.
scripts/bench-allocs-check.sh

bench-baseline: ## Regenerate every per-package bench-baseline.txt. Commit the diff after vetting it on your hardware.
scripts/bench-baseline.sh

bench-allocs-check: ## Run absolute allocs/event ceiling gate for per-detector benches (issue #302). Fails if any detector exceeds its ceiling in scripts/bench-registry.sh::allocs_gate.
scripts/bench-allocs-check.sh

bench-detectors: ## Run per-detector allocs/event benches (issue #302). count=10 x 500ms; outputs full benchstat-compatible block.
go test -bench=. -benchmem -benchtime=500ms -count=10 -run='^$$' ./bench/detectors/

Expand Down
22 changes: 22 additions & 0 deletions bench/detectors/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,28 @@ go test -bench=. -benchmem -benchtime=500ms -count=10 \
-run='^$' ./bench/detectors/
```

## Two gates: hard absolute ceiling + soft % delta

Two complementary gates run against these benches:

1. **`make bench-allocs-check` (HARD, gates CI)** — reads
`allocs_gate` from `scripts/bench-registry.sh` and fails the build
if any detector exceeds its `allocs_per_event` ceiling. Catches
absolute breaches: a new detector lands above the per-event budget,
or a baseline-update PR tries to raise the ceiling without
adversarial review. Wired into `make bench-check` (CI step) so it
runs on every PR. Fails closed — a gated bench with no output
exits 2.

2. **`make bench-detectors-check` (SOFT, advisory)** — runs in the
post-merge `bench.yml` workflow and reports a % delta vs
`baselines.json`. Graduates to hard-fail per the criterion below.

The hard gate is the load-bearing one for ratchet semantics:
optimization PRs lower the ceiling in `allocs_gate`, locking the win
in. Reverts cannot silently raise it back without showing up in
`git diff scripts/bench-registry.sh`.

## CI wiring (soft gate)

The CI runs `make bench-detectors-check` and reports the diff but
Expand Down
139 changes: 139 additions & 0 deletions scripts/bench-allocs-check.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,139 @@
#!/usr/bin/env bash
# bench-allocs-check.sh — hard absolute alloc-per-event ceiling gate
# for the per-detector benches (issue #302).
#
# Reads `allocs_gate` from scripts/bench-registry.sh, runs every
# benchmark under ./bench/detectors/ at count=10 x 500ms, parses the
# median allocs/op per benchmark, divides by the entry's window size,
# and fails (exit 1) if any allocs-per-event value exceeds its ceiling.
#
# Complementary to scripts/bench-check.sh (the % delta gate) — the
# delta gate only catches relative regression vs the committed
# baseline; this gate catches an absolute ceiling breach (e.g. a new
# detector lands above the per-event budget, or a baseline-update PR
# tries to raise the ceiling silently). Together they ratchet
# DOWNWARD: optimization PRs must lower the ceiling to lock the win in.
#
# FAIL CLOSED: a benchmark in `allocs_gate` with no measurement in the
# bench output exits 2 (configuration error). Missing data must not
# silently pass — that defeats the purpose of an absolute gate.
#
# Exit codes:
# 0 every gated detector stayed at-or-below its allocs-per-event ceiling.
# 1 one or more detectors breached the ceiling.
# 2 configuration error (gated bench produced no output, bad entry
# format, missing `go`, etc.).
#
# Portability: avoids bash 4 features so the script runs on macOS
# stock bash 3.2.
set -euo pipefail

# shellcheck source=scripts/bench-registry.sh
source "$(dirname "$0")/bench-registry.sh"

# shellcheck disable=SC2154 # allocs_gate is defined by bench-registry.sh
if [[ ${#allocs_gate[@]} -eq 0 ]]; then
echo "bench-allocs-check: allocs_gate is empty in scripts/bench-registry.sh" >&2
exit 2
fi

if ! command -v go >/dev/null 2>&1; then
echo "bench-allocs-check: go not in PATH" >&2
exit 2
fi

tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT

echo "==> running per-detector benches (count=10 x 500ms)"
go test -bench=. -benchmem -benchtime=500ms -count=10 \
-run='^$' ./bench/detectors/ > "$tmp" 2>&1

# Collect (bench_name, allocs_per_op) pairs. Bench names in Go output
# are suffixed with `-N` where N is GOMAXPROCS; strip it for matching.
samples=$(grep -E '^Benchmark[A-Za-z]+-[0-9]+' "$tmp" | awk '{
for (i = 1; i <= NF; i++) {
if ($i == "allocs/op") {
name = $1
sub(/-[0-9]+$/, "", name)
print name, $(i - 1)
break
}
}
}')

echo
echo "==> absolute allocs/event ceiling check"
echo

status=0
breached=""

for entry in "${allocs_gate[@]}"; do
name="${entry%%|*}"
rest="${entry#*|}"
window="${rest%%|*}"
ceiling="${rest#*|}"

if [[ -z "$name" || -z "$window" || -z "$ceiling" ]]; then
echo "bench-allocs-check: malformed allocs_gate entry: $entry" >&2
status=2
continue
fi
if ! [[ "$window" =~ ^[0-9]+$ && "$ceiling" =~ ^[0-9]+$ ]]; then
echo "bench-allocs-check: non-numeric window/ceiling in: $entry" >&2
status=2
continue
fi

# Median allocs/op across the 10 samples. Take the upper-median
# (NR/2+1) so we round toward the worse side — single-spike noise
# cannot mask a real regression.
median=$(awk -v k="$name" '$1 == k {print $2}' <<<"$samples" \
| sort -n \
| awk 'BEGIN {n=0} {a[++n] = $1} END {if (n == 0) exit 0; print a[int(n/2) + 1]}')

if [[ -z "$median" ]]; then
echo "bench-allocs-check: no measurement for gated bench '$name'" >&2
echo " (allocs_gate entry has no matching Benchmark in ./bench/detectors/)" >&2
status=2
continue
fi

allocs_per_event=$(awk -v m="$median" -v w="$window" 'BEGIN {
if (w == 0) { print "+inf"; exit }
printf "%.4f", m / w
}')

over=$(awk -v a="$allocs_per_event" -v c="$ceiling" 'BEGIN {
print (a + 0 > c + 0) ? "yes" : "no"
}')

if [[ "$over" == "yes" ]]; then
printf " %-40s ceiling=%-3s allocs/op=%-7s allocs/ev=%-7s BREACH\n" \
"$name" "$ceiling" "$median" "$allocs_per_event"
breached+=" $name allocs/event=$allocs_per_event > ceiling=$ceiling (allocs/op=$median, window=$window)"$'\n'
status=1
else
printf " %-40s ceiling=%-3s allocs/op=%-7s allocs/ev=%-7s ok\n" \
"$name" "$ceiling" "$median" "$allocs_per_event"
fi
done

if [[ $status -eq 1 ]]; then
echo >&2
echo "FAIL: the following detectors exceeded their allocs/event ceiling:" >&2
echo "$breached" >&2
echo "Either:" >&2
echo " - root-cause the alloc growth (escape analysis, hot-path string ops, etc.)" >&2
echo " - or raise the ceiling in scripts/bench-registry.sh (allocs_gate) WITH a" >&2
echo " PR-body justification + adversarial reviewer signoff." >&2
elif [[ $status -eq 2 ]]; then
echo >&2
echo "FAIL: configuration error (see messages above). Gate exits 2 (fail-closed)." >&2
else
echo
echo "PASS: every detector stayed at-or-below its allocs/event ceiling."
fi

exit "$status"
38 changes: 38 additions & 0 deletions scripts/bench-registry.sh
Original file line number Diff line number Diff line change
Expand Up @@ -23,3 +23,41 @@ bench_entries=(
"module/pkg/patterns|^BenchmarkPodEvictedDetector"
"components/receivers/pyspy|^Benchmark(ParseDump|StackID)\$"
)

# allocs_gate — per-detector absolute alloc ceilings (issue #302). Each
# entry is "bench_name|window_size|ceiling_allocs_per_event".
#
# Semantics: scripts/bench-allocs-check.sh runs every benchmark under
# ./bench/detectors/ at count=10 x 500ms, takes the median allocs/op
# per benchmark, divides by window_size to get allocs-per-event, and
# fails (exit 1) if any value exceeds its ceiling.
#
# This is a HARD ABSOLUTE GATE, complementary to bench-check.sh's % delta
# gate: % delta only catches relative regression, but this catches a
# detector that ships above the alloc-per-event ceiling regardless of
# its prior baseline. Ratchets DOWNWARD as detector optimizations land —
# a PR that reduces allocs/event must update the ceiling so the gain
# locks in.
#
# Window size: 1024 events, matching bench/detectors/detectors_bench_test.go
# (the `benchWindowSize` const). One source of truth: changing the window
# in the bench file requires updating these entries.
#
# Initial ceilings (measured baseline + 1 slack per #302):
# - pod_evicted measured 15.27/ev, ceiling 16 (tracked: optimize via #TBD)
# - xid_correlation measured 12.4/ev, ceiling 13
# - hbm_ecc measured 1.4/ev, ceiling 2
# - nccl_hang measured 3.99/ev, ceiling 5
# - thermal_throttle measured 0.76/ev, ceiling 1
# - pcie_aer measured 0.51/ev, ceiling 1
#
# NORTHSTAR is 2 allocs/event; the per-detector tracking issues drive
# the ratchet path down to that ceiling. See #302 for the rollup.
allocs_gate=(
"BenchmarkPodEvictedDetector|1024|16"
"BenchmarkXidCorrelationDetector|1024|13"
"BenchmarkHBMECCDetector|1024|2"
"BenchmarkNCCLHangDetector|1024|5"
"BenchmarkThermalThrottleDetector|1024|1"
"BenchmarkPCIeAERDetector|1024|1"
)