Follow-up to #302 / PR #416.
Status after PR #416
Remaining for Phase 2
- Verify linux/amd64 baseline matches M1 numbers. Initial ceilings were measured on Apple M1 Max; the README claims +/- 1 alloc on linux/amd64 (Go's testing.B allocs/op is hardware-invariant in theory). Confirm by:
  - Capture per-detector allocs/op output from the next 10 push-to-main bench.yml runs.
  - Compare medians vs bench/detectors/baselines.json.
  - If any detector shifts >= 2 allocs/op consistently, re-baseline on linux and update the JSON + ceilings.
- Graduate soft gate to hard. Criteria per bench/detectors/README.md:
  - N >= 10 consecutive PRs with no flake-driven false positive.
  - alloc-CV < 1% across those 10 PRs per detector.
  - Once both hold, flip $gate_mode in baselines.json from soft to hard, and change the final exit 0 in scripts/bench-check-detectors.sh to exit "$status".
- (Optional) Capture cross-run CV from CI artifacts. Have bench.yml upload the bench output as an artifact on each run; a small script can then compute the rolling CV across the last 10 runs, so the graduation decision rests on measured data.
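A minimal sketch of the checks in the list above (median vs baseline, and alloc-CV over the last 10 runs), assuming the per-detector allocs/op samples have already been extracted from the bench.yml output. Everything named here (assess, Verdict, the example detector names and numbers) is hypothetical and not taken from the repo, and treating a median shift of >= 2 as "consistently shifted" is an assumption about how that criterion would be applied.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

const (
	window     = 10   // runs required before judging stability
	shiftLimit = 2.0  // allocs/op shift that triggers a re-baseline
	cvLimit    = 0.01 // alloc-CV must stay below 1%
)

// Verdict summarizes one detector over the sampled runs (hypothetical type).
type Verdict struct {
	Median     float64 // median allocs/op across the runs
	Shift      float64 // Median minus the recorded baseline
	CV         float64 // sample stddev divided by mean
	Rebaseline bool    // |Shift| >= shiftLimit
	Stable     bool    // enough runs and CV < cvLimit
}

// median returns the median of xs without modifying the caller's slice.
func median(xs []float64) float64 {
	if len(xs) == 0 {
		return math.NaN()
	}
	s := append([]float64(nil), xs...)
	sort.Float64s(s)
	n := len(s)
	if n%2 == 1 {
		return s[n/2]
	}
	return (s[n/2-1] + s[n/2]) / 2
}

// cv returns the coefficient of variation using the sample standard deviation.
// Identical samples (including all-zero allocs/op) yield 0 rather than NaN.
func cv(xs []float64) float64 {
	n := float64(len(xs))
	if n < 2 {
		return math.NaN()
	}
	var sum float64
	for _, x := range xs {
		sum += x
	}
	mean := sum / n
	var ss float64
	for _, x := range xs {
		ss += (x - mean) * (x - mean)
	}
	if ss == 0 {
		return 0
	}
	return math.Sqrt(ss/(n-1)) / mean
}

// assess compares one detector's runs against its baseline allocs/op.
func assess(baseline float64, runs []float64) Verdict {
	m, c := median(runs), cv(runs)
	return Verdict{
		Median:     m,
		Shift:      m - baseline,
		CV:         c,
		Rebaseline: math.Abs(m-baseline) >= shiftLimit,
		Stable:     len(runs) >= window && c < cvLimit,
	}
}

func main() {
	// Made-up numbers; real values would come from baselines.json and CI artifacts.
	baselines := map[string]float64{"example-a": 12, "example-b": 12}
	runs := map[string][]float64{
		"example-a": {12, 12, 12, 12, 12, 12, 12, 12, 12, 12},
		"example-b": {14, 14, 15, 14, 14, 14, 15, 14, 14, 14},
	}
	names := make([]string, 0, len(runs))
	for name := range runs {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic report order
	for _, name := range names {
		v := assess(baselines[name], runs[name])
		fmt.Printf("%-10s median=%.1f shift=%+.1f cv=%.2f%% rebaseline=%t stable=%t\n",
			name, v.Median, v.Shift, v.CV*100, v.Rebaseline, v.Stable)
	}
}
```

Parsing the uploaded bench output into these slices is left out on purpose; the point is only that both the re-baseline decision and the soft-to-hard graduation can be reduced to two numbers per detector.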
Out of scope
Per-detector hot-path optimization PRs (track separately as bench-ratchet: issues per detector).
Status after PR #416
- make bench-check invokes both gates (% delta + absolute ceiling).
- bench-detectors-check runs on push-to-main via .github/workflows/bench.yml.
- allocs_gate (in scripts/bench-registry.sh) hard-fails on breach.