TraceCoreAI · trilamsr · May 31, 2026 · May 31, 2026 · May 31, 2026 · May 31, 2026
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -128,6 +128,12 @@ jobs:
         run: make doc-check
       - name: build
         run: make build
+      - name: install benchstat
+        run: |
+          go install golang.org/x/perf/cmd/benchstat@v0.0.0-20260512194132-3cf34090a3db
+          echo "$(go env GOPATH)/bin" >> "$GITHUB_PATH"
+      - name: bench-check (perf regression gate)
+        run: make bench-check
 
   # M6: validate every docs/integrations/examples/*.yaml against the
   # right binary. Tracecore-tagged recipes use the in-tree binary;

diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -4,6 +4,8 @@ User-visible changes are documented here. Format: [Keep a Changelog](https://kee
 
 ## [Unreleased]
 
+**`make bench-check` re-baselined as a real perf-regression gate.** Since PR-K.2 deleted the in-tree k8sevents receiver (whose `testdata/bench-baseline.txt` was the gate's sole row), the target had been a no-op `for pkg in ;` empty loop. The loop now iterates a quoted-shell registry in `scripts/bench-check-all.sh` covering `module/pkg/patterns` (`BenchmarkPodEvictedDetector_1kEventWindow`, the M19 detector budget) and `components/receivers/pyspy` (`BenchmarkParseDump` + `BenchmarkStackID`, the M18 join-key hash + faulthandler parser). Each package owns a checked-in `testdata/bench-baseline.txt` captured at `count=10 -benchtime=500ms -benchmem` (benchstat needs ≥6 samples for a meaningful CI band); regenerate with `make bench-baseline` after a vetted, intentional perf change and commit the diff. `scripts/bench-check.sh` parses the `+NN.NN%` deltas from `benchstat baseline new` for the `B/op` + `allocs/op` tables and exits non-zero on any row past `THRESHOLD` (default 10%, env-overridable); `sec/op` rows are skipped because wall-clock CV on dev hardware and shared CI runners routinely crosses 10% from background load alone, even when benchstat marks the delta statistically significant. Alloc-count + B/op are the hardware-invariant signals that pin to 0% CV across runs and only move when the code does. The 1 GiB `bench/overhead/nccl_fr_bench_test.go` stays advisory (self-asserts on `HeapAlloc` against NORTHSTARS O2; too slow + already gated internally) and is NOT in the registry. CI runs the gate in `verify-static` after a pinned `go install golang.org/x/perf/cmd/benchstat@...` step. The shell-registry indirection (Makefile → `scripts/bench-check-all.sh`) exists because the bench regex `^Benchmark(ParseDump|StackID)$` contains parentheses that the Make-invoked `/bin/sh` tokenises as subshell metacharacters. Closes #227.
+
 **`tracecore --version` now reports the git-describe state of the build commit**, not the hardcoded `builder-config.yaml` `dist.version` value. `make build` resolves the injected version in order: `$TRACECORE_VERSION` (CI / release override) → `git describe --tags --always --match 'v*' --dirty=-dev` → the unmodified `dist.version` fallback (used when git is unavailable, e.g. a tarball extract). The `v*` filter keeps `module/vX.Y.Z` submodule tags from shadowing the binary's release namespace; the `-dev` suffix flags uncommitted working trees so a laptop build is never indistinguishable from a tagged release. The injection sed-rewrites a temporary copy of `builder-config.yaml` and restores the original on exit via a trap, so the on-disk source-tree value stays in lockstep with `Chart.yaml` `appVersion` and `make doc-check`'s parity gate. `.github/workflows/release.yml` pins `TRACECORE_VERSION="$TAG"` for both the goreleaser pre-build matrix and the ko-publish step so the in-image binary's `--version` matches the published tag verbatim regardless of git-history depth. Closes RFC-0013 PR-D follow-up.
 
 Pre-alpha. **Distribution-first pivot adopted ([RFC-0013](docs/rfcs/0013-distro-first-pivot.md))** - binary now assembled via the OpenTelemetry Collector Builder (OCB) from upstream + contrib components plus a thin in-repo Go submodule at `module/` (path `github.com/tracecoreai/tracecore/module`) containing only the moat (NCCL FlightRecorder receiver, OTTL processors with windowed semantics, pattern detectors). The M1 in-tree pipeline runtime + factory-based assembly is queued for deletion at v0.1.0 in favor of the OCB-generated boot path; the canonical `clockreceiver` + `stdoutexporter` examples ship for one PR cycle and then exit. Targeting v0.1.0 / v0.2.0 / v0.3.0 release boundaries per RFC-0013 §4.

diff --git a/Makefile b/Makefile
@@ -2,7 +2,7 @@
 .PHONY: help build clean hooks
 
 # Test suites
-.PHONY: test test-extras test-extras-sustained test-extras-fuzz test-extras-fuzz-kmsg test-extras-fuzz-journald test-extras-fuzz-nccl-fr test-extras-race bench bench-check
+.PHONY: test test-extras test-extras-sustained test-extras-fuzz test-extras-fuzz-kmsg test-extras-fuzz-journald test-extras-fuzz-nccl-fr test-extras-race bench bench-check bench-baseline
 
 # Format + tidy
 .PHONY: fmt fmt-fix vet lint lint-fix tidy tidy-check mod-verify
@@ -79,26 +79,15 @@ test:  ## Run unit tests with the race detector.
 bench:  ## Run benchmarks across the repo with -benchmem, count=5.
 	go test -bench=Benchmark -benchmem -benchtime=500ms -count=5 -run='^$$' ./...
 
-bench-check:  ## Run benches; fail if any row regresses >10% vs the per-package baseline under testdata/bench-baseline.txt. THRESHOLD overridable.
-	@# No per-package baselines remain in tree at this point — the
-	@# k8sevents baseline was retired in RFC-0013 PR-K.2 alongside the
-	@# in-tree k8sevents receiver. Future per-package baselines wire
-	@# back into the `for pkg in ...` loop below. The target stays so
-	@# downstream automation (`make ci`) keeps a stable invocation.
-	@status=0; \
-	out=$$(mktemp); \
-	trap 'rm -f $$out' EXIT; \
-	for pkg in ; do \
-	  baseline=$$pkg/testdata/bench-baseline.txt; \
-	  if [ ! -f $$baseline ]; then \
-	    echo "bench-check: missing baseline $$baseline" >&2; \
-	    status=2; continue; \
-	  fi; \
-	  echo "==> $$pkg"; \
-	  go test -bench=Benchmark -benchmem -benchtime=500ms -count=5 -run='^$$' ./$$pkg/ > $$out 2>&1; \
-	  scripts/bench-check.sh $$baseline $$out $${THRESHOLD:-10} || status=$$?; \
-	done; \
-	exit $$status
+bench-check:  ## Run perf gate: bench each gated pkg vs its committed baseline; fail if any row regresses >THRESHOLD% (default 10).
+	@# Per-package registry + execution lives in scripts/bench-check-all.sh
+	@# rather than inline here: the bench regex contains parentheses that
+	@# Make-invoked /bin/sh tokenises as subshell metacharacters. THRESHOLD
+	@# env var passes through ("THRESHOLD=20 make bench-check" etc.).
+	scripts/bench-check-all.sh
+
+bench-baseline:  ## Regenerate every per-package bench-baseline.txt. Commit the diff after vetting it on your hardware.
+	scripts/bench-baseline.sh
 
 
 fmt:  ## Check formatting; fails if any file is not gofumpt-clean.

diff --git a/components/receivers/pyspy/testdata/bench-baseline.txt b/components/receivers/pyspy/testdata/bench-baseline.txt
@@ -0,0 +1,26 @@
+goos: darwin
+goarch: arm64
+pkg: github.com/tracecoreai/tracecore/components/receivers/pyspy
+cpu: Apple M1 Max
+BenchmarkParseDump-10    	   10000	     69886 ns/op	   66339 B/op	      14 allocs/op
+BenchmarkParseDump-10    	   10000	    109513 ns/op	   66337 B/op	      14 allocs/op
+BenchmarkParseDump-10    	   10000	     50370 ns/op	   66337 B/op	      14 allocs/op
+BenchmarkParseDump-10    	   10000	     78885 ns/op	   66337 B/op	      14 allocs/op
+BenchmarkParseDump-10    	   33966	     19766 ns/op	   66336 B/op	      14 allocs/op
+BenchmarkParseDump-10    	   42043	     19763 ns/op	   66336 B/op	      14 allocs/op
+BenchmarkParseDump-10    	   36619	     16521 ns/op	   66336 B/op	      14 allocs/op
+BenchmarkParseDump-10    	   10000	     68161 ns/op	   66337 B/op	      14 allocs/op
+BenchmarkParseDump-10    	   33739	     24151 ns/op	   66336 B/op	      14 allocs/op
+BenchmarkParseDump-10    	   32496	     17291 ns/op	   66336 B/op	      14 allocs/op
+BenchmarkStackID-10      	 1000000	      2463 ns/op	      32 B/op	       3 allocs/op
+BenchmarkStackID-10      	 1000000	      2175 ns/op	      32 B/op	       3 allocs/op
+BenchmarkStackID-10      	 1000000	      2419 ns/op	      32 B/op	       3 allocs/op
+BenchmarkStackID-10      	 1000000	      2151 ns/op	      32 B/op	       3 allocs/op
+BenchmarkStackID-10      	  788203	      2649 ns/op	      32 B/op	       3 allocs/op
+BenchmarkStackID-10      	  442069	      2578 ns/op	      32 B/op	       3 allocs/op
+BenchmarkStackID-10      	  605858	      2137 ns/op	      32 B/op	       3 allocs/op
+BenchmarkStackID-10      	 1000000	      2168 ns/op	      32 B/op	       3 allocs/op
+BenchmarkStackID-10      	 1000000	      1993 ns/op	      32 B/op	       3 allocs/op
+BenchmarkStackID-10      	  982612	      2085 ns/op	      32 B/op	       3 allocs/op
+PASS
+ok  	github.com/tracecoreai/tracecore/components/receivers/pyspy	32.159s
diff --git a/module/pkg/patterns/testdata/bench-baseline.txt b/module/pkg/patterns/testdata/bench-baseline.txt
@@ -0,0 +1,16 @@
+goos: darwin
+goarch: arm64
+pkg: github.com/tracecoreai/tracecore/module/pkg/patterns
+cpu: Apple M1 Max
+BenchmarkPodEvictedDetector_1kEventWindow-10    	     278	   2311466 ns/op	  847109 B/op	   15635 allocs/op
+BenchmarkPodEvictedDetector_1kEventWindow-10    	     249	   2603816 ns/op	  847115 B/op	   15635 allocs/op
+BenchmarkPodEvictedDetector_1kEventWindow-10    	     214	   2610155 ns/op	  847060 B/op	   15635 allocs/op
+BenchmarkPodEvictedDetector_1kEventWindow-10    	     241	   2670796 ns/op	  847112 B/op	   15635 allocs/op
+BenchmarkPodEvictedDetector_1kEventWindow-10    	     249	   2468809 ns/op	  847070 B/op	   15635 allocs/op
+BenchmarkPodEvictedDetector_1kEventWindow-10    	     222	   2466137 ns/op	  847072 B/op	   15635 allocs/op
+BenchmarkPodEvictedDetector_1kEventWindow-10    	     237	   2476560 ns/op	  847063 B/op	   15635 allocs/op
+BenchmarkPodEvictedDetector_1kEventWindow-10    	     256	   2770240 ns/op	  847087 B/op	   15635 allocs/op
+BenchmarkPodEvictedDetector_1kEventWindow-10    	     267	   2589770 ns/op	  847115 B/op	   15635 allocs/op
+BenchmarkPodEvictedDetector_1kEventWindow-10    	     250	   2418795 ns/op	  847075 B/op	   15635 allocs/op
+PASS
+ok  	github.com/tracecoreai/tracecore/module/pkg/patterns	9.643s
diff --git a/scripts/bench-baseline.sh b/scripts/bench-baseline.sh
@@ -0,0 +1,29 @@
+#!/usr/bin/env bash
+# bench-baseline.sh — regenerate every per-package bench baseline
+# registered in scripts/bench-registry.sh.
+#
+# benchstat needs >= 6 samples for confidence intervals at level 0.95
+# (anything less yields a "need >= 6 samples" warning + ± ∞ band, which
+# defeats the noise filter). count=10 gives headroom + matches what
+# `make bench-check` runs, so a freshly-regenerated baseline self-
+# compares cleanly under `benchstat <baseline> <baseline>`.
+#
+# Run on the hardware you intend to gate on. The committed baseline
+# pins B/op + allocs/op (hardware-invariant) precisely; sec/op gating
+# only fires when a regression exceeds the run-to-run CV, which is the
+# right behavior — alloc-count growth is the load-bearing signal, not
+# wall-clock jitter.
+set -euo pipefail
+
+# shellcheck source=scripts/bench-registry.sh
+source "$(dirname "$0")/bench-registry.sh"
+
+for entry in "${bench_entries[@]}"; do
+    pkg="${entry%%|*}"
+    pat="${entry#*|}"
+    baseline="$pkg/testdata/bench-baseline.txt"
+    mkdir -p "$pkg/testdata"
+    echo "==> regenerating $baseline ($pat)"
+    go test -bench="$pat" -benchmem -benchtime=500ms -count=10 \
+        -run='^$' "./$pkg/" > "$baseline"
+done
diff --git a/scripts/bench-check-all.sh b/scripts/bench-check-all.sh
@@ -0,0 +1,40 @@
+#!/usr/bin/env bash
+# bench-check-all.sh — perf gate runner.
+#
+# Iterates the registry in scripts/bench-registry.sh and, for each
+# (package, bench-regex) entry, runs the matching benchmarks and
+# compares results against the package's frozen
+# `testdata/bench-baseline.txt` via scripts/bench-check.sh (benchstat
+# under the hood).
+#
+# Exit codes:
+#   0  every gated row stayed within THRESHOLD (default 10%).
+#   1  one or more rows regressed past THRESHOLD.
+#   2  a baseline file was missing or benchstat itself was missing.
+set -euo pipefail
+
+# shellcheck source=scripts/bench-registry.sh
+source "$(dirname "$0")/bench-registry.sh"
+
+threshold="${THRESHOLD:-10}"
+
+status=0
+out=$(mktemp)
+trap 'rm -f "$out"' EXIT
+
+for entry in "${bench_entries[@]}"; do
+    pkg="${entry%%|*}"
+    pat="${entry#*|}"
+    baseline="$pkg/testdata/bench-baseline.txt"
+    if [[ ! -f "$baseline" ]]; then
+        echo "bench-check: missing baseline $baseline" >&2
+        status=2
+        continue
+    fi
+    echo "==> $pkg ($pat)"
+    go test -bench="$pat" -benchmem -benchtime=500ms -count=10 \
+        -run='^$' "./$pkg/" > "$out" 2>&1
+    scripts/bench-check.sh "$baseline" "$out" "$threshold" || status=$?
+done
+
+exit $status
diff --git a/scripts/bench-check.sh b/scripts/bench-check.sh
@@ -37,12 +37,32 @@ trap 'rm -f "$cmp"' EXIT
 
 benchstat "$baseline" "$new" | tee "$cmp"
 
-# Parse `+NN.NN%` regressions from benchstat output. The vs-base
-# column shows `~` for noise, `+X.YZ%` for slowdowns, `-X.YZ%` for
-# speedups. We only care about + with magnitude > threshold.
+# Parse `+NN.NN%` regressions from benchstat output, restricted to the
+# B/op and allocs/op tables. sec/op rows are deliberately ignored: the
+# wall-clock CV on dev hardware (and on shared CI runners) routinely
+# crosses the 10% threshold from background-load jitter alone, even
+# when benchstat marks the delta as statistically significant — gating
+# on it would false-fire on most PRs without catching anything code
+# review can act on. Alloc-count and B/op growth are the
+# hardware-invariant signals that track real regressions (escape-
+# analysis bugs, accidentally-heap-allocated locals, hot-loop string
+# concatenations, etc.); they pin to 0% CV across runs and only move
+# when the code does.
 #
-# awk extracts {benchmark_name, delta_percent} pairs, then we filter.
+# Tracking the active table: benchstat prints an indented header line
+# containing only whitespace, box-drawing chars (`│`, U+2502 — multi-
+# byte, not portable to match as a literal in awk regex), and the unit
+# label (`sec/op` / `B/op` / `allocs/op`) before each comparison table.
+# We detect these by `index(line, "<unit>")` since the unit substrings
+# do not appear in benchmark names (Go disallows `/` in identifiers).
+# Header lines start with whitespace; data rows start with an alpha
+# benchmark name; the indent prefix is therefore an unambiguous
+# disambiguator. Set `gate=1` only inside B/op + allocs/op tables.
 regressed=$(awk -v thr="$threshold" '
+    /^ +/ && index($0, "sec/op")    { gate = 0; next }
+    /^ +/ && index($0, "B/op")      { gate = 1; next }
+    /^ +/ && index($0, "allocs/op") { gate = 1; next }
+    !gate { next }
     # Skip header lines that contain column titles (no benchmark name).
     /^[A-Za-z]/ && /\+[0-9]+(\.[0-9]+)?%/ {
         # Match either +NN.NN% (current benchstat output) or +NN%

diff --git a/scripts/bench-registry.sh b/scripts/bench-registry.sh
@@ -0,0 +1,25 @@
+# shellcheck shell=bash
+# bench-registry.sh — single source of truth for the perf-gate registry.
+#
+# Sourced by scripts/bench-check-all.sh (gate runner) and
+# scripts/bench-baseline.sh (regenerator); not directly executable.
+# Defines the `bench_entries` array; each entry is "pkg|regex":
+#   pkg    — Go import path relative to repo root.
+#   regex  — `-bench` regex; quote-safe since both consumers run under
+#            plain shell, not Make-invoked /bin/sh (which would tokenise
+#            unescaped parentheses as subshell metacharacters).
+#
+# Gating criteria for a new entry:
+#   1. Benchmark must be deterministic-allocation — B/op + allocs/op are
+#      hardware-invariant and gate cleanly across CI runner and dev box;
+#      sec/op is noisy on dev machines and only fires the gate when the
+#      regression exceeds the run-to-run CV (which is the right behavior).
+#   2. Benchmark must run fast (≲30s per package at count=10 × 500ms ×
+#      benches-in-package). Volume benchmarks such as
+#      bench/overhead/nccl_fr_bench_test.go (1 GiB/iter, self-asserting
+#      HeapAlloc) stay advisory and OUT of this registry.
+
+bench_entries=(
+  "module/pkg/patterns|^BenchmarkPodEvictedDetector"
+  "components/receivers/pyspy|^Benchmark(ParseDump|StackID)\$"
+)