From b41d212222157ad431dd456846a58c6e2fb99279 Mon Sep 17 00:00:00 2001
From: Tri Lam <tri@maydow.com>
Date: Sun, 31 May 2026 13:27:55 -0700
Subject: [PATCH 1/2] ci(bench): re-baseline make bench-check + add regression
 gate
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Per #227. The 'bench-check' target had been an empty 'for pkg in ;'
loop since PR-K.2 deleted the in-tree k8sevents receiver (the gate's
sole baseline row). Perf regressions could ship.

Restore the gate by registering two fast, deterministic-allocation
benchmarks against committed baselines:
  - internal/synthesis/patterns (M19 PodEvictedDetector budget)
  - components/receivers/pyspy   (M18 ParseDump + StackID hash)

The 1 GiB bench/overhead/nccl_fr_bench_test.go stays advisory: it
self-asserts on HeapAlloc against NORTHSTARS O2 and is too slow for
CI gating.

The Make → shell indirection (scripts/bench-check-all.sh) exists
because the bench regex contains parentheses that Make-invoked
/bin/sh tokenises as subshell metacharacters.

Signed-off-by: Tri Lam <tri@maydow.com>
---
 .github/workflows/ci.yml                      |  6 ++
 CHANGELOG.md                                  |  2 +
 Makefile                                      | 31 ++++------
 .../pyspy/testdata/bench-baseline.txt         | 26 +++++++++
 .../patterns/testdata/bench-baseline.txt      | 16 ++++++
 scripts/bench-baseline.sh                     | 31 ++++++++++
 scripts/bench-check-all.sh                    | 57 +++++++++++++++++++
 7 files changed, 148 insertions(+), 21 deletions(-)
 create mode 100644 components/receivers/pyspy/testdata/bench-baseline.txt
 create mode 100644 internal/synthesis/patterns/testdata/bench-baseline.txt
 create mode 100755 scripts/bench-baseline.sh
 create mode 100755 scripts/bench-check-all.sh

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index 6770c89f..7260c30a 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -128,6 +128,12 @@ jobs:
         run: make doc-check
       - name: build
         run: make build
+      - name: install benchstat
+        run: |
+          go install golang.org/x/perf/cmd/benchstat@v0.0.0-20260512194132-3cf34090a3db
+          echo "$(go env GOPATH)/bin" >> "$GITHUB_PATH"
+      - name: bench-check (perf regression gate)
+        run: make bench-check
 
   # M6: validate every docs/integrations/examples/*.yaml against the
   # right binary. Tracecore-tagged recipes use the in-tree binary;
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 4ff7e3ea..ceff91d0 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -4,6 +4,8 @@ User-visible changes are documented here. Format: [Keep a Changelog](https://kee
 
 ## [Unreleased]
 
+**`make bench-check` re-baselined as a real perf-regression gate.** Since PR-K.2 deleted the in-tree k8sevents receiver (whose `testdata/bench-baseline.txt` was the gate's sole row), the target had been a no-op `for pkg in ;` empty loop. The loop now iterates a quoted-shell registry in `scripts/bench-check-all.sh` covering `internal/synthesis/patterns` (`BenchmarkPodEvictedDetector_1kEventWindow`, the M19 detector budget) and `components/receivers/pyspy` (`BenchmarkParseDump` + `BenchmarkStackID`, the M18 join-key hash + faulthandler parser). Each package owns a checked-in `testdata/bench-baseline.txt` captured at `count=10 -benchtime=500ms -benchmem` (benchstat needs ≥6 samples for a meaningful CI band); regenerate with `make bench-baseline` after a vetted, intentional perf change and commit the diff. `scripts/bench-check.sh` (unchanged) parses the `+NN.NN%` deltas from `benchstat baseline new` and exits non-zero on any row past `THRESHOLD` (default 10%, env-overridable). The wire-up keeps in-tree CV-noisy `sec/op` rows from false-firing — only deltas exceeding run-to-run noise reach the gate — while `B/op` + `allocs/op` (hardware-invariant) gate precisely. The 1 GiB `bench/overhead/nccl_fr_bench_test.go` stays advisory (self-asserts on `HeapAlloc` against NORTHSTARS O2; too slow + already gated internally) and is NOT in the registry. CI runs the gate in `verify-static` after a pinned `go install golang.org/x/perf/cmd/benchstat@...` step. The shell-registry indirection (Makefile → `scripts/bench-check-all.sh`) exists because the bench regex `^Benchmark(ParseDump|StackID)$` contains parentheses that the Make-invoked `/bin/sh` tokenises as subshell metacharacters. Closes #227.
+
 **`tracecore --version` now reports the git-describe state of the build commit**, not the hardcoded `builder-config.yaml` `dist.version` value. `make build` resolves the injected version in order: `$TRACECORE_VERSION` (CI / release override) → `git describe --tags --always --match 'v*' --dirty=-dev` → the unmodified `dist.version` fallback (used when git is unavailable, e.g. a tarball extract). The `v*` filter keeps `module/vX.Y.Z` submodule tags from shadowing the binary's release namespace; the `-dev` suffix flags uncommitted working trees so a laptop build is never indistinguishable from a tagged release. The injection sed-rewrites a temporary copy of `builder-config.yaml` and restores the original on exit via a trap, so the on-disk source-tree value stays in lockstep with `Chart.yaml` `appVersion` and `make doc-check`'s parity gate. `.github/workflows/release.yml` pins `TRACECORE_VERSION="$TAG"` for both the goreleaser pre-build matrix and the ko-publish step so the in-image binary's `--version` matches the published tag verbatim regardless of git-history depth. Closes RFC-0013 PR-D follow-up.
 
 Pre-alpha. **Distribution-first pivot adopted ([RFC-0013](docs/rfcs/0013-distro-first-pivot.md))** - binary now assembled via the OpenTelemetry Collector Builder (OCB) from upstream + contrib components plus a thin in-repo Go submodule at `module/` (path `github.com/tracecoreai/tracecore/module`) containing only the moat (NCCL FlightRecorder receiver, OTTL processors with windowed semantics, pattern detectors). The M1 in-tree pipeline runtime + factory-based assembly is queued for deletion at v0.1.0 in favor of the OCB-generated boot path; the canonical `clockreceiver` + `stdoutexporter` examples ship for one PR cycle and then exit. Targeting v0.1.0 / v0.2.0 / v0.3.0 release boundaries per RFC-0013 §4.
diff --git a/Makefile b/Makefile
index 82d37019..2cc09068 100644
--- a/Makefile
+++ b/Makefile
@@ -2,7 +2,7 @@
 .PHONY: help build clean hooks
 
 # Test suites
-.PHONY: test test-extras test-extras-sustained test-extras-fuzz test-extras-fuzz-kmsg test-extras-fuzz-journald test-extras-fuzz-nccl-fr test-extras-race bench bench-check
+.PHONY: test test-extras test-extras-sustained test-extras-fuzz test-extras-fuzz-kmsg test-extras-fuzz-journald test-extras-fuzz-nccl-fr test-extras-race bench bench-check bench-baseline
 
 # Format + tidy
 .PHONY: fmt fmt-fix vet lint lint-fix tidy tidy-check mod-verify
@@ -79,26 +79,15 @@ test:  ## Run unit tests with the race detector.
 bench:  ## Run benchmarks across the repo with -benchmem, count=5.
 	go test -bench=Benchmark -benchmem -benchtime=500ms -count=5 -run='^$$' ./...
 
-bench-check:  ## Run benches; fail if any row regresses >10% vs the per-package baseline under testdata/bench-baseline.txt. THRESHOLD overridable.
-	@# No per-package baselines remain in tree at this point — the
-	@# k8sevents baseline was retired in RFC-0013 PR-K.2 alongside the
-	@# in-tree k8sevents receiver. Future per-package baselines wire
-	@# back into the `for pkg in ...` loop below. The target stays so
-	@# downstream automation (`make ci`) keeps a stable invocation.
-	@status=0; \
-	out=$$(mktemp); \
-	trap 'rm -f $$out' EXIT; \
-	for pkg in ; do \
-	  baseline=$$pkg/testdata/bench-baseline.txt; \
-	  if [ ! -f $$baseline ]; then \
-	    echo "bench-check: missing baseline $$baseline" >&2; \
-	    status=2; continue; \
-	  fi; \
-	  echo "==> $$pkg"; \
-	  go test -bench=Benchmark -benchmem -benchtime=500ms -count=5 -run='^$$' ./$$pkg/ > $$out 2>&1; \
-	  scripts/bench-check.sh $$baseline $$out $${THRESHOLD:-10} || status=$$?; \
-	done; \
-	exit $$status
+bench-check:  ## Run perf gate: bench each gated pkg vs its committed baseline; fail if any row regresses >THRESHOLD% (default 10).
+	@# Per-package registry + execution lives in scripts/bench-check-all.sh
+	@# rather than inline here: the bench regex contains parentheses that
+	@# Make-invoked /bin/sh tokenises as subshell metacharacters. THRESHOLD
+	@# env var passes through ("THRESHOLD=20 make bench-check" etc.).
+	scripts/bench-check-all.sh
+
+bench-baseline:  ## Regenerate every per-package bench-baseline.txt. Commit the diff after vetting it on your hardware.
+	scripts/bench-baseline.sh
 
 
 fmt:  ## Check formatting; fails if any file is not gofumpt-clean.
diff --git a/components/receivers/pyspy/testdata/bench-baseline.txt b/components/receivers/pyspy/testdata/bench-baseline.txt
new file mode 100644
index 00000000..4dfe3ef4
--- /dev/null
+++ b/components/receivers/pyspy/testdata/bench-baseline.txt
@@ -0,0 +1,26 @@
+goos: darwin
+goarch: arm64
+pkg: github.com/tracecoreai/tracecore/components/receivers/pyspy
+cpu: Apple M1 Max
+BenchmarkParseDump-10    	   10000	     69886 ns/op	   66339 B/op	      14 allocs/op
+BenchmarkParseDump-10    	   10000	    109513 ns/op	   66337 B/op	      14 allocs/op
+BenchmarkParseDump-10    	   10000	     50370 ns/op	   66337 B/op	      14 allocs/op
+BenchmarkParseDump-10    	   10000	     78885 ns/op	   66337 B/op	      14 allocs/op
+BenchmarkParseDump-10    	   33966	     19766 ns/op	   66336 B/op	      14 allocs/op
+BenchmarkParseDump-10    	   42043	     19763 ns/op	   66336 B/op	      14 allocs/op
+BenchmarkParseDump-10    	   36619	     16521 ns/op	   66336 B/op	      14 allocs/op
+BenchmarkParseDump-10    	   10000	     68161 ns/op	   66337 B/op	      14 allocs/op
+BenchmarkParseDump-10    	   33739	     24151 ns/op	   66336 B/op	      14 allocs/op
+BenchmarkParseDump-10    	   32496	     17291 ns/op	   66336 B/op	      14 allocs/op
+BenchmarkStackID-10      	 1000000	      2463 ns/op	      32 B/op	       3 allocs/op
+BenchmarkStackID-10      	 1000000	      2175 ns/op	      32 B/op	       3 allocs/op
+BenchmarkStackID-10      	 1000000	      2419 ns/op	      32 B/op	       3 allocs/op
+BenchmarkStackID-10      	 1000000	      2151 ns/op	      32 B/op	       3 allocs/op
+BenchmarkStackID-10      	  788203	      2649 ns/op	      32 B/op	       3 allocs/op
+BenchmarkStackID-10      	  442069	      2578 ns/op	      32 B/op	       3 allocs/op
+BenchmarkStackID-10      	  605858	      2137 ns/op	      32 B/op	       3 allocs/op
+BenchmarkStackID-10      	 1000000	      2168 ns/op	      32 B/op	       3 allocs/op
+BenchmarkStackID-10      	 1000000	      1993 ns/op	      32 B/op	       3 allocs/op
+BenchmarkStackID-10      	  982612	      2085 ns/op	      32 B/op	       3 allocs/op
+PASS
+ok  	github.com/tracecoreai/tracecore/components/receivers/pyspy	32.159s
diff --git a/internal/synthesis/patterns/testdata/bench-baseline.txt b/internal/synthesis/patterns/testdata/bench-baseline.txt
new file mode 100644
index 00000000..653879d3
--- /dev/null
+++ b/internal/synthesis/patterns/testdata/bench-baseline.txt
@@ -0,0 +1,16 @@
+goos: darwin
+goarch: arm64
+pkg: github.com/tracecoreai/tracecore/internal/synthesis/patterns
+cpu: Apple M1 Max
+BenchmarkPodEvictedDetector_1kEventWindow-10    	     100	   5140860 ns/op	  847365 B/op	   15636 allocs/op
+BenchmarkPodEvictedDetector_1kEventWindow-10    	     100	   6203007 ns/op	  847152 B/op	   15636 allocs/op
+BenchmarkPodEvictedDetector_1kEventWindow-10    	     370	   4990751 ns/op	  847097 B/op	   15635 allocs/op
+BenchmarkPodEvictedDetector_1kEventWindow-10    	     100	   6649042 ns/op	  847089 B/op	   15635 allocs/op
+BenchmarkPodEvictedDetector_1kEventWindow-10    	     150	   4384935 ns/op	  846940 B/op	   15635 allocs/op
+BenchmarkPodEvictedDetector_1kEventWindow-10    	     100	   5087184 ns/op	  846956 B/op	   15634 allocs/op
+BenchmarkPodEvictedDetector_1kEventWindow-10    	     250	   5339458 ns/op	  846707 B/op	   15634 allocs/op
+BenchmarkPodEvictedDetector_1kEventWindow-10    	     100	   5578500 ns/op	  846980 B/op	   15635 allocs/op
+BenchmarkPodEvictedDetector_1kEventWindow-10    	     153	   4252240 ns/op	  847009 B/op	   15635 allocs/op
+BenchmarkPodEvictedDetector_1kEventWindow-10    	     500	   5139900 ns/op	  846761 B/op	   15634 allocs/op
+PASS
+ok  	github.com/tracecoreai/tracecore/internal/synthesis/patterns	13.295s
diff --git a/scripts/bench-baseline.sh b/scripts/bench-baseline.sh
new file mode 100755
index 00000000..5711f3a7
--- /dev/null
+++ b/scripts/bench-baseline.sh
@@ -0,0 +1,31 @@
+#!/usr/bin/env bash
+# bench-baseline.sh — regenerate every per-package bench baseline
+# registered in scripts/bench-check-all.sh.
+#
+# benchstat needs >= 6 samples for confidence intervals at level 0.95
+# (anything less yields a "need >= 6 samples" warning + ± ∞ band, which
+# defeats the noise filter). count=10 gives headroom + matches what
+# `make bench-check` runs, so a freshly-regenerated baseline self-
+# compares cleanly under `benchstat <baseline> <baseline>`.
+#
+# Run on the hardware you intend to gate on. The committed baseline
+# pins B/op + allocs/op (hardware-invariant) precisely; sec/op gating
+# only fires when a regression exceeds the run-to-run CV, which is the
+# right behavior — alloc-count growth is the load-bearing signal, not
+# wall-clock jitter.
+set -euo pipefail
+
+entries=(
+  "internal/synthesis/patterns|^BenchmarkPodEvictedDetector"
+  "components/receivers/pyspy|^Benchmark(ParseDump|StackID)\$"
+)
+
+for entry in "${entries[@]}"; do
+    pkg="${entry%%|*}"
+    pat="${entry#*|}"
+    baseline="$pkg/testdata/bench-baseline.txt"
+    mkdir -p "$pkg/testdata"
+    echo "==> regenerating $baseline ($pat)"
+    go test -bench="$pat" -benchmem -benchtime=500ms -count=10 \
+        -run='^$' "./$pkg/" > "$baseline"
+done
diff --git a/scripts/bench-check-all.sh b/scripts/bench-check-all.sh
new file mode 100755
index 00000000..b2435fb4
--- /dev/null
+++ b/scripts/bench-check-all.sh
@@ -0,0 +1,57 @@
+#!/usr/bin/env bash
+# bench-check-all.sh — perf gate runner.
+#
+# Iterates the registry of (package, bench-regex) pairs below and, for
+# each, runs the matching benchmarks and compares results against the
+# package's frozen `testdata/bench-baseline.txt` via
+# scripts/bench-check.sh (benchstat under the hood).
+#
+# Exit codes:
+#   0  every gated row stayed within THRESHOLD (default 10%).
+#   1  one or more rows regressed past THRESHOLD.
+#   2  a baseline file was missing or benchstat itself was missing.
+#
+# Why a separate script (not a Makefile loop): the bench regex
+# `^Benchmark(ParseDump|StackID)$` contains parentheses, which the
+# Make-invoked /bin/sh tokenises as subshell metacharacters. Pinning
+# the registry in shell with quoted strings avoids that escape-hell.
+#
+# Why a registry (not `go test -bench=. ./...`): each row is an
+# explicit, reviewed contract — adding a benchmark to the gate is an
+# intentional act, not an accident of placing a `func Benchmark…` in
+# the tree. Volume benchmarks (e.g. bench/overhead/nccl_fr_bench_test.go
+# at 1 GiB/iter with self-asserting HeapAlloc check) stay advisory and
+# OUT of this registry; only fast, deterministic-allocation micro-
+# benchmarks belong here.
+set -euo pipefail
+
+threshold="${THRESHOLD:-10}"
+
+# Registry: one entry per gated package.
+#   pkg     — Go import path relative to repo root.
+#   pattern — `-bench` regex; quote-safe since this is plain shell.
+entries=(
+  "internal/synthesis/patterns|^BenchmarkPodEvictedDetector"
+  "components/receivers/pyspy|^Benchmark(ParseDump|StackID)\$"
+)
+
+status=0
+out=$(mktemp)
+trap 'rm -f "$out"' EXIT
+
+for entry in "${entries[@]}"; do
+    pkg="${entry%%|*}"
+    pat="${entry#*|}"
+    baseline="$pkg/testdata/bench-baseline.txt"
+    if [[ ! -f "$baseline" ]]; then
+        echo "bench-check: missing baseline $baseline" >&2
+        status=2
+        continue
+    fi
+    echo "==> $pkg ($pat)"
+    go test -bench="$pat" -benchmem -benchtime=500ms -count=10 \
+        -run='^$' "./$pkg/" > "$out" 2>&1
+    scripts/bench-check.sh "$baseline" "$out" "$threshold" || status=$?
+done
+
+exit $status

From 43fcd5ea7073b7a9b24d105d920c39ca183de38a Mon Sep 17 00:00:00 2001
From: Tri Lam <tri@maydow.com>
Date: Sun, 31 May 2026 13:41:42 -0700
Subject: [PATCH 2/2] ci(bench): gate B/op + allocs/op only; extract registry
 to shared file
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Adversarial-review pass on the prior commit caught two real issues:

1. The prior commit gated sec/op too, but wall-clock CV on dev
   hardware routinely crosses the 10% threshold from background load
   alone even when benchstat marks the delta significant (p<0.05).
   Reproduced locally: 'make bench-check' false-fired on a +16.5%
   ns/op row immediately after baseline capture, with allocs/op + B/op
   both pristine at 0% CV. Hide sec/op from the gate; alloc-count +
   B/op are hardware-invariant and only move when the code does.

2. scripts/bench-{check-all,baseline}.sh duplicated the package
   registry — DRY violation that would silently desync. Extract to
   scripts/bench-registry.sh sourced by both consumers.

Re-verified TDD plant (alloc regression in stackID): gate still
trips with exit 2 on B/op (+800%) + allocs/op (+133%); plant
reverted; clean tree exits 0.

Signed-off-by: Tri Lam <tri@maydow.com>
---
 CHANGELOG.md               |  2 +-
 scripts/bench-baseline.sh  | 10 ++++------
 scripts/bench-check-all.sh | 35 +++++++++--------------------------
 scripts/bench-check.sh     | 28 ++++++++++++++++++++++++----
 scripts/bench-registry.sh  | 25 +++++++++++++++++++++++++
 5 files changed, 63 insertions(+), 37 deletions(-)
 create mode 100644 scripts/bench-registry.sh

diff --git a/CHANGELOG.md b/CHANGELOG.md
index ceff91d0..0a6411aa 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -4,7 +4,7 @@ User-visible changes are documented here. Format: [Keep a Changelog](https://kee
 
 ## [Unreleased]
 
-**`make bench-check` re-baselined as a real perf-regression gate.** Since PR-K.2 deleted the in-tree k8sevents receiver (whose `testdata/bench-baseline.txt` was the gate's sole row), the target had been a no-op `for pkg in ;` empty loop. The loop now iterates a quoted-shell registry in `scripts/bench-check-all.sh` covering `internal/synthesis/patterns` (`BenchmarkPodEvictedDetector_1kEventWindow`, the M19 detector budget) and `components/receivers/pyspy` (`BenchmarkParseDump` + `BenchmarkStackID`, the M18 join-key hash + faulthandler parser). Each package owns a checked-in `testdata/bench-baseline.txt` captured at `count=10 -benchtime=500ms -benchmem` (benchstat needs ≥6 samples for a meaningful CI band); regenerate with `make bench-baseline` after a vetted, intentional perf change and commit the diff. `scripts/bench-check.sh` (unchanged) parses the `+NN.NN%` deltas from `benchstat baseline new` and exits non-zero on any row past `THRESHOLD` (default 10%, env-overridable). The wire-up keeps in-tree CV-noisy `sec/op` rows from false-firing — only deltas exceeding run-to-run noise reach the gate — while `B/op` + `allocs/op` (hardware-invariant) gate precisely. The 1 GiB `bench/overhead/nccl_fr_bench_test.go` stays advisory (self-asserts on `HeapAlloc` against NORTHSTARS O2; too slow + already gated internally) and is NOT in the registry. CI runs the gate in `verify-static` after a pinned `go install golang.org/x/perf/cmd/benchstat@...` step. The shell-registry indirection (Makefile → `scripts/bench-check-all.sh`) exists because the bench regex `^Benchmark(ParseDump|StackID)$` contains parentheses that the Make-invoked `/bin/sh` tokenises as subshell metacharacters. Closes #227.
+**`make bench-check` re-baselined as a real perf-regression gate.** Since PR-K.2 deleted the in-tree k8sevents receiver (whose `testdata/bench-baseline.txt` was the gate's sole row), the target had been a no-op `for pkg in ;` empty loop. The loop now iterates a quoted-shell registry in `scripts/bench-check-all.sh` covering `module/pkg/patterns` (`BenchmarkPodEvictedDetector_1kEventWindow`, the M19 detector budget) and `components/receivers/pyspy` (`BenchmarkParseDump` + `BenchmarkStackID`, the M18 join-key hash + faulthandler parser). Each package owns a checked-in `testdata/bench-baseline.txt` captured at `count=10 -benchtime=500ms -benchmem` (benchstat needs ≥6 samples for a meaningful CI band); regenerate with `make bench-baseline` after a vetted, intentional perf change and commit the diff. `scripts/bench-check.sh` parses the `+NN.NN%` deltas from `benchstat baseline new` for the `B/op` + `allocs/op` tables and exits non-zero on any row past `THRESHOLD` (default 10%, env-overridable); `sec/op` rows are skipped because wall-clock CV on dev hardware and shared CI runners routinely crosses 10% from background load alone, even when benchstat marks the delta statistically significant. Alloc-count + B/op are the hardware-invariant signals that pin to 0% CV across runs and only move when the code does. The 1 GiB `bench/overhead/nccl_fr_bench_test.go` stays advisory (self-asserts on `HeapAlloc` against NORTHSTARS O2; too slow + already gated internally) and is NOT in the registry. CI runs the gate in `verify-static` after a pinned `go install golang.org/x/perf/cmd/benchstat@...` step. The shell-registry indirection (Makefile → `scripts/bench-check-all.sh`) exists because the bench regex `^Benchmark(ParseDump|StackID)$` contains parentheses that the Make-invoked `/bin/sh` tokenises as subshell metacharacters. Closes #227.
 
 **`tracecore --version` now reports the git-describe state of the build commit**, not the hardcoded `builder-config.yaml` `dist.version` value. `make build` resolves the injected version in order: `$TRACECORE_VERSION` (CI / release override) → `git describe --tags --always --match 'v*' --dirty=-dev` → the unmodified `dist.version` fallback (used when git is unavailable, e.g. a tarball extract). The `v*` filter keeps `module/vX.Y.Z` submodule tags from shadowing the binary's release namespace; the `-dev` suffix flags uncommitted working trees so a laptop build is never indistinguishable from a tagged release. The injection sed-rewrites a temporary copy of `builder-config.yaml` and restores the original on exit via a trap, so the on-disk source-tree value stays in lockstep with `Chart.yaml` `appVersion` and `make doc-check`'s parity gate. `.github/workflows/release.yml` pins `TRACECORE_VERSION="$TAG"` for both the goreleaser pre-build matrix and the ko-publish step so the in-image binary's `--version` matches the published tag verbatim regardless of git-history depth. Closes RFC-0013 PR-D follow-up.
 
diff --git a/scripts/bench-baseline.sh b/scripts/bench-baseline.sh
index 4b888096..675a1300 100755
--- a/scripts/bench-baseline.sh
+++ b/scripts/bench-baseline.sh
@@ -1,6 +1,6 @@
 #!/usr/bin/env bash
 # bench-baseline.sh — regenerate every per-package bench baseline
-# registered in scripts/bench-check-all.sh.
+# registered in scripts/bench-registry.sh.
 #
 # benchstat needs >= 6 samples for confidence intervals at level 0.95
 # (anything less yields a "need >= 6 samples" warning + ± ∞ band, which
@@ -15,12 +15,10 @@
 # wall-clock jitter.
 set -euo pipefail
 
-entries=(
-  "module/pkg/patterns|^BenchmarkPodEvictedDetector"
-  "components/receivers/pyspy|^Benchmark(ParseDump|StackID)\$"
-)
+# shellcheck source=scripts/bench-registry.sh
+source "$(dirname "$0")/bench-registry.sh"
 
-for entry in "${entries[@]}"; do
+for entry in "${bench_entries[@]}"; do
     pkg="${entry%%|*}"
     pat="${entry#*|}"
     baseline="$pkg/testdata/bench-baseline.txt"
diff --git a/scripts/bench-check-all.sh b/scripts/bench-check-all.sh
index f11cf520..5fe607c7 100755
--- a/scripts/bench-check-all.sh
+++ b/scripts/bench-check-all.sh
@@ -1,45 +1,28 @@
 #!/usr/bin/env bash
 # bench-check-all.sh — perf gate runner.
 #
-# Iterates the registry of (package, bench-regex) pairs below and, for
-# each, runs the matching benchmarks and compares results against the
-# package's frozen `testdata/bench-baseline.txt` via
-# scripts/bench-check.sh (benchstat under the hood).
+# Iterates the registry in scripts/bench-registry.sh and, for each
+# (package, bench-regex) entry, runs the matching benchmarks and
+# compares results against the package's frozen
+# `testdata/bench-baseline.txt` via scripts/bench-check.sh (benchstat
+# under the hood).
 #
 # Exit codes:
 #   0  every gated row stayed within THRESHOLD (default 10%).
 #   1  one or more rows regressed past THRESHOLD.
 #   2  a baseline file was missing or benchstat itself was missing.
-#
-# Why a separate script (not a Makefile loop): the bench regex
-# `^Benchmark(ParseDump|StackID)$` contains parentheses, which the
-# Make-invoked /bin/sh tokenises as subshell metacharacters. Pinning
-# the registry in shell with quoted strings avoids that escape-hell.
-#
-# Why a registry (not `go test -bench=. ./...`): each row is an
-# explicit, reviewed contract — adding a benchmark to the gate is an
-# intentional act, not an accident of placing a `func Benchmark…` in
-# the tree. Volume benchmarks (e.g. bench/overhead/nccl_fr_bench_test.go
-# at 1 GiB/iter with self-asserting HeapAlloc check) stay advisory and
-# OUT of this registry; only fast, deterministic-allocation micro-
-# benchmarks belong here.
 set -euo pipefail
 
-threshold="${THRESHOLD:-10}"
+# shellcheck source=scripts/bench-registry.sh
+source "$(dirname "$0")/bench-registry.sh"
 
-# Registry: one entry per gated package.
-#   pkg     — Go import path relative to repo root.
-#   pattern — `-bench` regex; quote-safe since this is plain shell.
-entries=(
-  "module/pkg/patterns|^BenchmarkPodEvictedDetector"
-  "components/receivers/pyspy|^Benchmark(ParseDump|StackID)\$"
-)
+threshold="${THRESHOLD:-10}"
 
 status=0
 out=$(mktemp)
 trap 'rm -f "$out"' EXIT
 
-for entry in "${entries[@]}"; do
+for entry in "${bench_entries[@]}"; do
     pkg="${entry%%|*}"
     pat="${entry#*|}"
     baseline="$pkg/testdata/bench-baseline.txt"
diff --git a/scripts/bench-check.sh b/scripts/bench-check.sh
index f2dd5642..486ecd2b 100755
--- a/scripts/bench-check.sh
+++ b/scripts/bench-check.sh
@@ -37,12 +37,32 @@ trap 'rm -f "$cmp"' EXIT
 
 benchstat "$baseline" "$new" | tee "$cmp"
 
-# Parse `+NN.NN%` regressions from benchstat output. The vs-base
-# column shows `~` for noise, `+X.YZ%` for slowdowns, `-X.YZ%` for
-# speedups. We only care about + with magnitude > threshold.
+# Parse `+NN.NN%` regressions from benchstat output, restricted to the
+# B/op and allocs/op tables. sec/op rows are deliberately ignored: the
+# wall-clock CV on dev hardware (and on shared CI runners) routinely
+# crosses the 10% threshold from background-load jitter alone, even
+# when benchstat marks the delta as statistically significant — gating
+# on it would false-fire on most PRs without catching anything code
+# review can act on. Alloc-count and B/op growth are the
+# hardware-invariant signals that track real regressions (escape-
+# analysis bugs, accidentally-heap-allocated locals, hot-loop string
+# concatenations, etc.); they pin to 0% CV across runs and only move
+# when the code does.
 #
-# awk extracts {benchmark_name, delta_percent} pairs, then we filter.
+# Tracking the active table: benchstat prints an indented header line
+# containing only whitespace, box-drawing chars (`│`, U+2502 — multi-
+# byte, not portable to match as a literal in awk regex), and the unit
+# label (`sec/op` / `B/op` / `allocs/op`) before each comparison table.
+# We detect these by `index(line, "<unit>")` since the unit substrings
+# do not appear in benchmark names (Go disallows `/` in identifiers).
+# Header lines start with whitespace; data rows start with an alpha
+# benchmark name; the indent prefix is therefore an unambiguous
+# disambiguator. Set `gate=1` only inside B/op + allocs/op tables.
 regressed=$(awk -v thr="$threshold" '
+    /^ +/ && index($0, "sec/op")    { gate = 0; next }
+    /^ +/ && index($0, "B/op")      { gate = 1; next }
+    /^ +/ && index($0, "allocs/op") { gate = 1; next }
+    !gate { next }
     # Skip header lines that contain column titles (no benchmark name).
     /^[A-Za-z]/ && /\+[0-9]+(\.[0-9]+)?%/ {
         # Match either +NN.NN% (current benchstat output) or +NN%
diff --git a/scripts/bench-registry.sh b/scripts/bench-registry.sh
new file mode 100644
index 00000000..6a57f89f
--- /dev/null
+++ b/scripts/bench-registry.sh
@@ -0,0 +1,25 @@
+# shellcheck shell=bash
+# bench-registry.sh — single source of truth for the perf-gate registry.
+#
+# Sourced by scripts/bench-check-all.sh (gate runner) and
+# scripts/bench-baseline.sh (regenerator); not directly executable.
+# Defines the `bench_entries` array; each entry is "pkg|regex":
+#   pkg    — Go import path relative to repo root.
+#   regex  — `-bench` regex; quote-safe since both consumers run under
+#            plain shell, not Make-invoked /bin/sh (which would tokenise
+#            unescaped parentheses as subshell metacharacters).
+#
+# Gating criteria for a new entry:
+#   1. Benchmark must be deterministic-allocation — B/op + allocs/op are
+#      hardware-invariant and gate cleanly across CI runner and dev box;
+#      sec/op is noisy on dev machines and only fires the gate when the
+#      regression exceeds the run-to-run CV (which is the right behavior).
+#   2. Benchmark must run fast (≲30s per package at count=10 × 500ms ×
+#      benches-in-package). Volume benchmarks such as
+#      bench/overhead/nccl_fr_bench_test.go (1 GiB/iter, self-asserting
+#      HeapAlloc) stay advisory and OUT of this registry.
+
+bench_entries=(
+  "module/pkg/patterns|^BenchmarkPodEvictedDetector"
+  "components/receivers/pyspy|^Benchmark(ParseDump|StackID)\$"
+)