ci(bench): re-baseline make bench-check + add regression gate#244
Merged
added 3 commits
May 31, 2026 13:27
Per #227. The 'bench-check' target had been an empty 'for pkg in ;' loop since PR-K.2 deleted the in-tree k8sevents receiver (the gate's sole baseline row). Perf regressions could ship. Restore the gate by registering two fast, deterministic-allocation benchmarks against committed baselines:
- internal/synthesis/patterns (M19 PodEvictedDetector budget)
- components/receivers/pyspy (M18 ParseDump + StackID hash)
The 1 GiB bench/overhead/nccl_fr_bench_test.go stays advisory: it self-asserts on HeapAlloc against NORTHSTARS O2 and is too slow for CI gating.
The Make → shell indirection (scripts/bench-check-all.sh) exists because the bench regex contains parentheses that Make-invoked /bin/sh tokenises as subshell metacharacters.
Signed-off-by: Tri Lam <tri@maydow.com>
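The parenthesis problem this commit message describes can be reproduced without Make at all. The sketch below is only a demonstration under assumptions: it hands the bench regex to /bin/sh once unquoted and once single-quoted, and the variable names are hypothetical. It does not reproduce the actual Makefile recipe.

```shell
#!/bin/sh
# Hypothetical demo: the same regex passed unquoted vs. single-quoted to sh.
regex='^Benchmark(ParseDump|StackID)$'

# Unquoted: the inner shell treats "(" and "|" as syntax, so parsing fails
# before any command runs. Error text is discarded; only the status is kept.
unquoted_status=0
sh -c "printf '%s\n' $regex" >/dev/null 2>&1 || unquoted_status=$?

# Single-quoted: the regex reaches printf as one literal argument.
quoted_output=$(sh -c "printf '%s\n' '$regex'")

echo "unquoted exit status: $unquoted_status"
echo "quoted argument:      $quoted_output"
```

The unquoted invocation ends with a nonzero status, while the quoted one passes the regex through intact. Inside a Make recipe the same quoting has to survive Make's own `$` expansion as well, which is the fragility the commit avoids by moving the loop into a script.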
Signed-off-by: Tri Lam <tri@maydow.com> # Conflicts: # module/pkg/patterns/testdata/bench-baseline.txt
Adversarial-review pass on the prior commit caught two real issues:
1. The prior commit gated sec/op too, but wall-clock CV on dev
hardware routinely crosses the 10% threshold from background load
alone even when benchstat marks the delta significant (p<0.05).
Reproduced locally: 'make bench-check' false-fired on a +16.5%
ns/op row immediately after baseline capture, with allocs/op + B/op
both pristine at 0% CV. Hide sec/op from the gate; alloc-count +
B/op are hardware-invariant and only move when the code does.
2. scripts/bench-{check-all,baseline}.sh duplicated the package
registry — DRY violation that would silently desync. Extract to
scripts/bench-registry.sh sourced by both consumers.
Re-verified TDD plant (alloc regression in stackID): gate still
trips with exit 2 on B/op (+800%) + allocs/op (+133%); plant
reverted; clean tree exits 0.
Signed-off-by: Tri Lam <tri@maydow.com>
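The registry extraction described in item 2 of this commit message can be sketched as follows. This is an illustration, not the contents of scripts/bench-registry.sh: the row format, the variable name BENCH_REGISTRY, the two function names, and the regex for the patterns package are all assumptions.

```shell
#!/bin/sh
# Sketch only: row format ("<package> <bench regex>") and BENCH_REGISTRY are
# assumptions, not the real scripts/bench-registry.sh.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Stand-in for the shared registry file: one row per gated package.
cat >"$workdir/bench-registry.sh" <<'EOF'
BENCH_REGISTRY='module/pkg/patterns ^BenchmarkPodEvictedDetector_1kEventWindow$
components/receivers/pyspy ^Benchmark(ParseDump|StackID)$'
EOF

# Stand-in for the gate runner: sources the registry, prints what it would check.
gate_plan() {
    . "$workdir/bench-registry.sh"
    printf '%s\n' "$BENCH_REGISTRY" | while read -r pkg regex; do
        printf 'check    %s  %s\n' "$pkg" "$regex"
    done
}

# Stand-in for the regenerator: sources the same file, so it sees the same rows.
baseline_plan() {
    . "$workdir/bench-registry.sh"
    printf '%s\n' "$BENCH_REGISTRY" | while read -r pkg regex; do
        printf 'baseline %s  %s/testdata/bench-baseline.txt\n' "$pkg" "$pkg"
    done
}

gate_plan
baseline_plan
```

Because both functions read the same variable from the same sourced file, adding or removing a row changes the gated set and the regenerated set together.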
trilamsr pushed a commit that referenced this pull request on May 31, 2026
Land both processors (rankjoin + patterndetector) at v0.120 and keep bench-check scaffolding from #244 alongside the OTel bump. go mod tidy in module/ collapsed indirects to the v0.120 surface incl. xprocessor + consumertest. Signed-off-by: Tri Lam <tri@maydow.com>
Closes #227.
Summary
make bench-check has been a no-op since PR-K.2 (#217) deleted components/receivers/k8sevents/: its testdata/bench-baseline.txt was the sole row in the gate's for pkg in … loop, leaving the loop body literally empty (for pkg in ; do …). The target stayed as a stub so make ci automation kept a stable invocation, with a TODO to re-register packages later. That re-registration is this PR.
Root cause
Makefile:71-101 (pre-diff) iterated an empty list. The wrapper around benchstat + scripts/bench-check.sh was intact; only the registry of which packages to gate had been emptied. So the fix is a registration and baseline-capture exercise: the gate's comparison machinery is unchanged apart from one scope-tightening change called out under "Adversarial-review fix" below.
What this PR does
- Registers two packages with deterministic-allocation micro-benchmarks for gating:
  - module/pkg/patterns: BenchmarkPodEvictedDetector_1kEventWindow (the M19 detector budget). (Path is post-PR-I.2a, "refactor(module): relocate patterns + replay into module/pkg/ (PR-I.2a)" #242; this branch was rebased onto that merge during the PR.)
  - components/receivers/pyspy: BenchmarkParseDump + BenchmarkStackID (the M18 join-key hash + faulthandler parser).
- Captures committed baselines at module/pkg/patterns/testdata/bench-baseline.txt and components/receivers/pyspy/testdata/bench-baseline.txt, both at count=10 -benchtime=500ms -benchmem. count=10 is the minimum that survives benchstat's "≥6 samples for confidence interval at level 0.95" warning with headroom for one outlier.
- Extracts the registry to scripts/bench-registry.sh, sourced by both scripts/bench-check-all.sh (gate runner, wired to make bench-check) and scripts/bench-baseline.sh (regenerator, wired to make bench-baseline). Single source of truth: "what we gate" and "what we regenerate" can no longer drift apart.
- Why scripts and not an inline Make loop: the bench regex ^Benchmark(ParseDump|StackID)$ contains parentheses that the Make-invoked /bin/sh tokenises as subshell metacharacters. Getting the quoting right under make's shell escaping is more fragile than punting to a plain shell script.
- Restricts the gate to the B/op + allocs/op tables in scripts/bench-check.sh; sec/op is no longer gated. See "Adversarial-review fix" below.
- Wires the gate into CI as a new step in .github/workflows/ci.yml's verify-static job, after make build. A pinned go install golang.org/x/perf/cmd/benchstat@… step installs the tool on the runner.
Adversarial-review fix (mid-PR)
The first commit gated sec/op too, with the design note claiming pure wall-clock jitter would show as benchstat's ~ (non-significant) and never reach the +NN% parser. That's false. Repeating make bench-check immediately after a baseline capture, with identical code on identical hardware, fired on a +16.5% ns/op row while allocs/op and B/op both stayed at 0% CV.
The 16.5% delta was background-load drift between two runs minutes apart. benchstat marks it significant because the CV is small (~5-10%) and the means are reliably different: both are true symptoms of real but uninteresting wall-clock variance. Gating on sec/op would produce continuous false fires across the team without catching anything code review can act on.
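As the commit message above puts it, the remedy is to hide sec/op from the gate and count only B/op and allocs/op deltas. A minimal sketch of that idea follows. It is not the real scripts/bench-check.sh: gate_allocs, noisy_run, and planted_run are hypothetical names, and the two sample tables are hand-written in benchstat's general shape with illustrative numbers (only the percentages come from this PR).

```shell
#!/bin/sh
# Sketch only: hypothetical stand-in for the table-tracking logic.

# Flag "+NN%" deltas above the limit, but only inside B/op and allocs/op tables.
gate_allocs() {
    awk -v limit=10 '
    /^[ \t]/ {                      # indented line: a table header
        if ($0 ~ /sec\/op/)         unit = "sec/op"
        else if ($0 ~ /allocs\/op/) unit = "allocs/op"
        else if ($0 ~ /B\/op/)      unit = "B/op"
        next
    }
    unit != "B/op" && unit != "allocs/op" { next }   # sec/op rows are ignored
    match($0, /\+[0-9.]+%/) {
        delta = substr($0, RSTART, RLENGTH)
        pct = substr(delta, 2, length(delta) - 2) + 0
        if (pct > limit) { print "REGRESSION", unit, $1, delta; bad = 1 }
    }
    END { exit (bad ? 1 : 0) }'
}

# Hand-written sample: wall-clock drift only, allocation tables unchanged.
noisy_run() {
    cat <<'EOF'
           │ sec/op │ sec/op vs base │
StackID-10   100.0n ± 5%   116.5n ± 6%  +16.50% (p=0.000 n=10)
           │ B/op │ B/op vs base │
StackID-10   8.000 ± 0%   8.000 ± 0%  ~ (p=1.000 n=10)
           │ allocs/op │ allocs/op vs base │
StackID-10   3.000 ± 0%   3.000 ± 0%  ~ (p=1.000 n=10)
EOF
}

# Hand-written sample: an allocation regression like the planted one.
planted_run() {
    cat <<'EOF'
           │ sec/op │ sec/op vs base │
StackID-10   100.0n ± 5%   101.0n ± 5%  ~ (p=0.400 n=10)
           │ B/op │ B/op vs base │
StackID-10   8.000 ± 0%   72.00 ± 0%  +800.00% (p=0.000 n=10)
           │ allocs/op │ allocs/op vs base │
StackID-10   3.000 ± 0%   7.000 ± 0%  +133.33% (p=0.000 n=10)
EOF
}

noisy_run   | gate_allocs && echo "noisy run: PASS"
planted_run | gate_allocs || echo "planted run: FAIL"
```

The noisy sample passes even though its sec/op row shows +16.50%, because that row sits under a table the gate never inspects. The planted sample fails on both allocation tables.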
Fix: scripts/bench-check.sh now tracks the active benchstat table by indented header line (sec/op / B/op / allocs/op) and only counts +NN% deltas under B/op + allocs/op. Both stay pinned to 0% CV across runs (deterministic Go allocator + identical bytes-allocated per code path) and only move when the code does. make bench-check on clean main now passes consistently across runs; the TDD planted regression (heap-allocating sink in stackID) still trips the gate at +800% B/op + +133% allocs/op.
Design notes
- The 1 GiB bench/overhead/nccl_fr_bench_test.go self-asserts on its HeapAlloc delta against the NORTHSTARS O2 ceiling. Adding it to bench-check would push CI past the budget and produce nothing the in-bench assertion doesn't already gate. Stays advisory.
- […] ubuntu-latest (x86_64). The Go allocator is deterministic across GOARCH for a given code path; b.ReportAllocs() reports identical numbers on both. The historical k8sevents baseline lived under the same cross-arch arrangement.
- make ci unchanged: bench-check is NOT added to the make ci recipe. That would force every contributor to install benchstat. CI installs benchstat itself in the workflow; local dev runs make bench-check only when intentionally vetting a perf change.
- module/ submodule resolution: module/pkg/patterns lives in the in-repo Go submodule (module/go.mod); local dev runs resolve through go.work and OCB through builder-config.yaml's replaces: ./module. go test ./module/pkg/patterns/… works from the repo root because of the workspace.
TDD record
1. make bench-check on clean main passes: exit 0, both packages report "PASS: no benchmarks regressed by more than 10% vs baseline." (Re-run multiple times across different machine load; stays green.)
2. Planted regression: changed stackID to allocate 64 B/iter via a package-level sink (escape-analysis-routed to the heap, so allocs/op actually moved); make bench-check exits 2 (make wraps the inner exit 1). Output flagged StackID-10 +800.00% (p=0.000 n=10) on B/op and +133.33% (p=0.000 n=10) on allocs/op.
3. Plant reverted: make bench-check exits 0; git diff main -- components/receivers/pyspy/stackid.go is empty.
Release notes
Test plan
- make check (fmt, tidy-check, lint, vet, mod-verify): clean (re-run post-adversarial-fix)
- make actionlint: clean
- make zizmor: clean (No findings to report. Good job!)
- make license-check: clean
- make doc-check: clean
- make bench-check on clean main: exit 0, both packages PASS, repeated runs stay green (false-positive bug fixed)
- make bench-check with deliberate alloc-regression planted in stackID: exit 2, gate flags +800% B/op + +133% allocs/op
- make bench-check after revert: exit 0 again, stackid.go diff vs main is empty
- verify-static runs make bench-check green on this PR (awaiting CI)
Self-grading
A — root cause named, fix scoped to missing registration without expanding scope (no new benchmark framework, no rewrite of the benchstat-comparison machinery beyond the sec/op-vs-alloc scoping fix). TDD cycle complete with a deliberate regression that the gate catches. Adversarial self-review found a real false-positive bug (sec/op gating) and fixed it in-PR before merge. Registry extracted to a single source of truth so the two consumers can't drift. CHANGELOG entry written.
Not A+ because: even with sec/op excluded, the gate still depends on benchstat's text-output format remaining stable; a future major version that changes column layout could silently disable the gate (it would skip every row, exit 0, print PASS). A future iteration could parse benchstat's -format=csv output instead, which is its versioned machine-readable interface. Out of scope here; the current text format hasn't changed for years and matches the existing scripts/bench-check.sh shape that already shipped with the k8sevents baseline.
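Short of switching to CSV, the silent-disable failure mode described above could be narrowed by failing closed whenever the parser recognises zero gated rows. The sketch below illustrates that idea only; it is not part of this PR, and the function name, the exit code 3, and the one-line sample inputs are hypothetical.

```shell
#!/bin/sh
# Hypothetical guard: exit 3 when no B/op or allocs/op rows were recognised,
# so a changed layout produces a failure instead of a silent PASS.
require_gated_rows() {
    awk '
    /^[ \t]/ { gated = ($0 ~ /B\/op/ || $0 ~ /allocs\/op/); next }
    gated && NF > 1 { rows++ }
    END {
        if (rows == 0) { print "FAIL: no gated rows parsed"; exit 3 }
        printf "parsed %d gated rows\n", rows
    }'
}

# A recognisable table: one indented header naming the unit, one data row.
printf '%s\n' '   | B/op | B/op vs base |' \
              'StackID-10 8.000 8.000 ~ (p=1.000 n=10)' | require_gated_rows

# An unrecognisable layout: nothing matches, so the guard refuses to pass.
printf '%s\n' 'some future layout' | require_gated_rows \
    || echo "gate failed closed (status $?)"
```

A guard like this would sit in front of the percentage check, so an empty parse is reported as a failure of the gate itself and not as a clean run.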