feat(bench): absolute allocs/event ceiling gate (#302)#416
Conversation
Adds scripts/bench-allocs-check.sh + an allocs_gate array in scripts/bench-registry.sh that pins each per-detector benchmark to an absolute allocs-per-event ceiling, complementing the existing % delta gate. Wired into `make bench-check` so CI fails closed on breach. The new gate is the load-bearing one for ratchet semantics: optimization PRs must lower the ceiling in the registry to lock the win in, and reverts cannot silently raise it back without showing up in `git diff scripts/bench-registry.sh`. Initial ceilings (measured baseline + 1 slack on Apple M1 Max; allocs/op is hardware-invariant so the same numbers gate on CI): - pod_evicted 15.27 ev → ceiling 16 - xid_correlation 12.40 ev → ceiling 13 - hbm_ecc 1.40 ev → ceiling 2 - nccl_hang 3.99 ev → ceiling 5 - thermal_throttle 0.76 ev → ceiling 1 - pcie_aer 0.51 ev → ceiling 1 NORTHSTAR is 2 allocs/event; the per-detector optimization trackers (filed separately) drive the ratchet path down. Mutation-verified: tightening HBM ceiling from 2 → 1 in the registry trips the gate (exit 1, "BREACH"). Fail-closed verified: a gated bench with no measurement exits 2. ```release-notes - New `make bench-allocs-check` (wired into `make bench-check`) enforces absolute allocs-per-event ceilings on every pattern- library detector; ratchets downward as optimization PRs land. ``` Signed-off-by: Tri Lam <tree@lumalabs.ai>
Independent Review: feat(bench): absolute allocs/event ceiling gate (#302)Mutation Verification: PASS
Scope Alignment: PASSIssue #302 asks for "ratchet gate"; PR delivers absolute ceiling with hard-fail on CI.
CI Wiring: PASS
Dual-Gate Logic: SIMPLIFICATION OPPORTUNITYPR gates on BOTH % delta AND absolute ceiling. Assessment:
Hardware Invariance Claim: UNVERIFIEDPR asserts "allocs/op is hardware-invariant" and M1 ceilings will hold on ubuntu-latest/amd64.
Minor: Over-Engineering Check
Findings
VerdictSHIP — mutation test passes, all 6 detectors gated, CI wiring correct, fail-closed semantics solid. Dual-gate trade-off is defensible. Let CI run validate amd64 parity. |
Closes #424. ## Root cause PR #416 shipped `make bench-allocs-check` as the absolute allocs/event ceiling gate and wired it into `make bench-check` (so CI's `verify-static` job runs it). It was **not** wired into `make ci-full`. Result: local pre-PR `make ci-full` diverges from what CI enforces — a per-event allocs ceiling breach passes locally and only fails after the PR is opened, defeating the per-PR enforcement intent of #302. (The issue's framing claims per-PR enforcement is entirely missing; in fact CI's verify-static already enforces it via the `bench-check` step. The actual gap is local/CI divergence, which is what this PR closes.) ## Change One-line dep addition: `bench-allocs-check` appended to `ci-full`'s prerequisite list. Plus comment + PRINCIPLES.md §10 update naming the new wall-time floor. ```diff -ci-full: ... build smoke-quickstart ## ... +ci-full: ... build smoke-quickstart bench-allocs-check ## ... ``` ## Wall-time impact (M1 Max) | Run | Before | After | |---|---|---| | `make ci-full` (cold testcache) | 229s | 283s | | `make bench-allocs-check` standalone | n/a | ~80s | Delta: **+54s** (effective, with warm Go build cache from prior `ci-full` steps; cold-only standalone bench is ~80s). PRINCIPLES.md §10's historical "~2.5m" budget was already stale (cold-cache `ci-full` measured 229s = 3.8m before this PR). Updated to ~5m and named the dominant steps (`coverage-check`, `build`, `bench-allocs-check`). ## Mutation test (TDD red) Tightened HBM ceiling 2 → 1 in `scripts/bench-registry.sh`, ran `make bench-allocs-check`: ``` BenchmarkHBMECCDetector ceiling=1 allocs/op=1429 allocs/ev=1.3955 BREACH FAIL: the following detectors exceeded their allocs/event ceiling: BenchmarkHBMECCDetector allocs/event=1.3955 > ceiling=1 (allocs/op=1429, window=1024) make: *** [bench-allocs-check] Error 1 ``` Reverted. Confirmed `make -n ci-full | grep bench-allocs-check` resolves to `scripts/bench-allocs-check.sh` (i.e., the new dep is in `ci-full`'s graph). ## Verify (green) - `make ci-full` (cold testcache) → PASS in 283.54s; tail shows all six detectors at-or-below ceiling. - `make lint`, `make vet`, pre-commit hooks → green. ## Audit (A+ criterion) Other bench targets and CI status: - `bench-check` (composite: % delta + allocs ceiling) — CI's verify-static runs it; local `ci-full` now runs the allocs-ceiling half. The % delta half (`bench-check-all.sh`) is **not** added to `ci-full` because it requires `benchstat` (CI installs it; local devs would need a manual `go install`). Acceptable: CI is the ground truth; local divergence on the % delta gate is acknowledged. - `bench-detectors-check` — intentionally SOFT gate (script comment), runs in `.github/workflows/bench.yml` as a PR-comment annotation. Graduates to hard-fail per `bench/detectors/README.md` roadmap. No action. - `bench`, `bench-detectors`, `bench-baseline`, `bench-detectors-baseline` — manual exploration / regen helpers. Not gates. No action. Why not `verify` (pre-push)? `verify` is documented as `<30s`; +80s busts the budget 3x. ## Follow-up Will file separately: ratchet path to fold `bench-allocs-check` wall-time down (faster `benchtime` or `count`) so the gate eventually folds back under the historical 2.5m local budget. Tracked-against-#424 in the issue's existing "Cost" line. ```release-notes - `make ci-full` now invokes `bench-allocs-check`, closing the local/CI gap that let per-event allocs-ceiling breaches pass local pre-PR but fail in CI (#424). ``` Signed-off-by: Tri Lam <tree@lumalabs.ai>
Closes #302.
Summary
Adds a hard absolute alloc-per-event ceiling gate complementing the existing
% delta gate. Wired into
make bench-check(the CI step), so a PR that landsabove the per-event ceiling on any pattern-library detector fails closed.
scripts/bench-allocs-check.shruns everyBenchmarkXxxDetectorunder
./bench/detectors/atcount=10 × 500ms, takes the medianallocs/op, divides by the per-entry window size (1024), compares to theceiling in
scripts/bench-registry.sh::allocs_gate.scripts/bench-registry.shgains anallocs_gatearray withone row per detector (
bench_name|window|ceiling_allocs_per_event).make bench-checknow invokes both gates (% delta + absolute).A new standalone
make bench-allocs-checktarget runs the new gate alone.bench/detectors/README.mdgains a "Two gates" sectiondescribing how the hard ceiling and the soft % delta gate compose.
Why an absolute gate
make bench-checktoday gates against a committedbench-baseline.txtviabenchstat % delta. That catches relative regression but not absolute
breach: a PR that updates the baseline ALSO raises the ceiling implicitly,
and a reverted optimization can creep back above the prior floor without
showing up as a >10% delta from its own (post-revert) baseline.
The absolute ceiling lives in
git diff scripts/bench-registry.sh—any change to the ceiling is reviewable on the diff alone. Optimization PRs
must lower the ceiling to lock the win in; reverts cannot silently raise
it back. That is the ratchet semantic #302 asks for.
Initial ceilings (measured baseline + 1 slack, Apple M1 Max)
testing.B'sallocs/opis hardware-invariant — same value on CI'slinux/amd64 runner — so the M1-captured numbers gate cleanly.
NORTHSTAR is 2 allocs/event; pod_evicted / xid_correlation / nccl_hang are
the three above the bar. Per-detector optimization trackers will be filed
as follow-ups to drive the ratchet path down.
Test plan
make bench-allocs-checklocally → PASS (all six detectors belowceiling).
allocs_gate, re-ran: gate exit 1,BenchmarkHBMECCDetector allocs/event=1.3955 > ceiling=1 ... BREACH. Reverted.BenchmarkNonexistentDetector, re-ran: gate exit 2, "no measurementfor gated bench". Reverted.
make vet,make lint, pre-commit hooks → green.check run is the validation).