
Phase 2: verify linux baseline + graduate bench-check soft gate to hard (#302 follow-up) #420

Description

@trilamsr

Follow-up to #302 / PR #416.

Status after PR #416

  • make bench-check invokes both gates (% delta + absolute ceiling).
  • bench-detectors-check runs on push-to-main via .github/workflows/bench.yml.
  • Absolute ceiling array (allocs_gate in scripts/bench-registry.sh) hard-fails on breach.
  • Mutation-verified: lowering the HBM ceiling from 2 to 1 trips the gate.
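A minimal Go sketch of how the two gates in the status list could combine. It is not the repo's implementation, which lives in the shell scripts named above. Based on the status list and the Phase 2 steps, it assumes that an absolute-ceiling breach always fails the run while only the % delta gate is governed by gate_mode. The 10% threshold, the Result type, the detectorB entry, and all numbers are hypothetical.

```go
package main

import "fmt"

// Result holds one detector's measured allocs/op plus its recorded baseline
// and absolute ceiling. All values used below are hypothetical.
type Result struct {
	Name     string
	Allocs   float64 // measured allocs/op (e.g. from go test -bench . -benchmem)
	Baseline float64 // recorded baseline allocs/op
	Ceiling  float64 // absolute cap, analogous to an allocs_gate entry
}

// maxDeltaPct is a hypothetical threshold for the relative (% delta) gate.
const maxDeltaPct = 10.0

// checkGates splits breaches into relative failures (% delta vs baseline)
// and absolute failures (allocs/op above the ceiling).
func checkGates(results []Result) (deltaFails, ceilingFails []string) {
	for _, r := range results {
		if r.Baseline > 0 {
			pct := (r.Allocs - r.Baseline) / r.Baseline * 100
			if pct > maxDeltaPct {
				deltaFails = append(deltaFails, fmt.Sprintf("%s: +%.1f%% allocs/op vs baseline", r.Name, pct))
			}
		}
		if r.Allocs > r.Ceiling {
			ceilingFails = append(ceilingFails, fmt.Sprintf("%s: %g allocs/op > ceiling %g", r.Name, r.Allocs, r.Ceiling))
		}
	}
	return deltaFails, ceilingFails
}

// exitStatus treats ceiling breaches as always fatal. A % delta breach only
// fails the run in "hard" mode; "soft" mode just reports it.
func exitStatus(deltaFails, ceilingFails []string, gateMode string) int {
	if len(ceilingFails) > 0 {
		return 1
	}
	if len(deltaFails) > 0 && gateMode == "hard" {
		return 1
	}
	return 0
}

func main() {
	results := []Result{
		{Name: "HBM", Allocs: 2, Baseline: 2, Ceiling: 1},          // hypothetical measurement, ceiling mutated 2 -> 1
		{Name: "detectorB", Allocs: 12, Baseline: 10, Ceiling: 20}, // hypothetical detector, +20% vs baseline
	}
	deltaFails, ceilingFails := checkGates(results)
	for _, f := range deltaFails {
		fmt.Println("DELTA", f)
	}
	for _, f := range ceilingFails {
		fmt.Println("CEILING", f)
	}
	fmt.Println("exit status (soft mode):", exitStatus(deltaFails, ceilingFails, "soft"))
}
```

Splitting the two failure lists lets the soft-to-hard graduation change only the delta branch, with the ceiling behavior left untouched.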

Remaining for Phase 2

  1. Verify that the linux/amd64 baseline matches the M1 numbers. The initial ceilings were measured on an Apple M1 Max; the README claims they hold within ±1 alloc on linux/amd64 (allocs/op from Go's testing.B should, in theory, be hardware-invariant). Confirm by:
    • Capture per-detector allocs/op output from the next 10 push-to-main bench.yml runs.
    • Compare the per-detector medians against bench/detectors/baselines.json.
    • If any detector shifts >= 2 allocs/op consistently, re-baseline on linux and update the JSON + ceilings.
  2. Graduate soft gate to hard. Per bench/detectors/README.md:
    • N >= 10 consecutive PRs with no flake-driven false positives.
    • Per-detector alloc CV < 1% across those 10 PRs.
    • Flip $gate_mode in baselines.json from soft to hard, and change the final exit 0 in scripts/bench-check-detectors.sh to exit "$status".
  3. (Optional) Capture cross-run CV via a CI artifact. Have bench.yml upload the bench output as an artifact on every run; a small script can then compute the rolling CV across the last 10 runs so the graduation decision is made objectively.
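
Step 1 could be scripted along these lines. This is a Go sketch, not the repo's tooling. It parses allocs/op from go test -bench . -benchmem output, takes the per-detector median across runs, and flags any detector whose median moves 2 or more allocs/op from its baseline. The baseline map stands in for bench/detectors/baselines.json, whose schema is not shown in this issue. The helper names and sample lines are hypothetical, and main uses three runs for brevity rather than ten.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"strconv"
	"strings"
)

// parseAllocs extracts allocs/op per benchmark from go test -bench -benchmem
// output lines such as "BenchmarkHBM-10  500000  2400 ns/op  128 B/op  2 allocs/op".
// A trailing numeric "-N" (GOMAXPROCS) suffix is stripped so names match across machines.
func parseAllocs(output string) map[string]float64 {
	out := make(map[string]float64)
	for _, line := range strings.Split(output, "\n") {
		fields := strings.Fields(line)
		if len(fields) == 0 || !strings.HasPrefix(fields[0], "Benchmark") {
			continue
		}
		name := fields[0]
		if i := strings.LastIndex(name, "-"); i > 0 {
			if _, err := strconv.Atoi(name[i+1:]); err == nil {
				name = name[:i]
			}
		}
		for j := 1; j < len(fields); j++ {
			if fields[j] == "allocs/op" {
				if v, err := strconv.ParseFloat(fields[j-1], 64); err == nil {
					out[name] = v
				}
			}
		}
	}
	return out
}

// median returns the middle value of xs (the mean of the two middle values
// for an even count) without modifying the caller's slice.
func median(xs []float64) float64 {
	if len(xs) == 0 {
		return math.NaN()
	}
	s := append([]float64(nil), xs...)
	sort.Float64s(s)
	n := len(s)
	if n%2 == 1 {
		return s[n/2]
	}
	return (s[n/2-1] + s[n/2]) / 2
}

// driftedDetectors returns, sorted, the detectors whose median allocs/op
// across runs differs from the baseline by at least minShift.
func driftedDetectors(runs []map[string]float64, baseline map[string]float64, minShift float64) []string {
	var drifted []string
	for name, base := range baseline {
		var samples []float64
		for _, run := range runs {
			if v, ok := run[name]; ok {
				samples = append(samples, v)
			}
		}
		if len(samples) == 0 {
			continue // no data for this detector yet
		}
		if math.Abs(median(samples)-base) >= minShift {
			drifted = append(drifted, name)
		}
	}
	sort.Strings(drifted)
	return drifted
}

func main() {
	// Hypothetical output from three CI runs; step 1 calls for ten.
	runs := []map[string]float64{
		parseAllocs("BenchmarkHBM-4 \t 500000 \t 2400 ns/op \t 128 B/op \t 2 allocs/op"),
		parseAllocs("BenchmarkHBM-4 \t 510000 \t 2350 ns/op \t 128 B/op \t 2 allocs/op"),
		parseAllocs("BenchmarkHBM-4 \t 490000 \t 2420 ns/op \t 128 B/op \t 3 allocs/op"),
	}
	baseline := map[string]float64{"BenchmarkHBM": 2} // stand-in for baselines.json
	fmt.Println("re-baseline needed for:", driftedDetectors(runs, baseline, 2))
}
```

Using the median rather than the mean keeps a single outlier run from triggering a re-baseline, which matches the "consistently" wording in step 1.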

Out of scope

Per-detector hot-path optimization PRs (tracked separately as one bench-ratchet: issue per detector).

Metadata


Assignees: No one assigned
Labels: enhancement (New feature or request), external-clock (Blocked on out-of-repo state)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
