diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 290facac..4ad3fd39 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -128,6 +128,12 @@ jobs: run: make doc-check - name: build run: make build + - name: install benchstat + run: | + go install golang.org/x/perf/cmd/benchstat@v0.0.0-20260512194132-3cf34090a3db + echo "$(go env GOPATH)/bin" >> "$GITHUB_PATH" + - name: bench-check (perf regression gate) + run: make bench-check # M6: validate every docs/integrations/examples/*.yaml against the # right binary. Tracecore-tagged recipes use the in-tree binary; diff --git a/CHANGELOG.md b/CHANGELOG.md index 4ff7e3ea..0a6411aa 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -4,6 +4,8 @@ User-visible changes are documented here. Format: [Keep a Changelog](https://kee ## [Unreleased] +**`make bench-check` re-baselined as a real perf-regression gate.** Since PR-K.2 deleted the in-tree k8sevents receiver (whose `testdata/bench-baseline.txt` was the gate's sole row), the target had been a no-op `for pkg in ;` empty loop. The loop now iterates a quoted-shell registry in `scripts/bench-check-all.sh` covering `module/pkg/patterns` (`BenchmarkPodEvictedDetector_1kEventWindow`, the M19 detector budget) and `components/receivers/pyspy` (`BenchmarkParseDump` + `BenchmarkStackID`, the M18 join-key hash + faulthandler parser). Each package owns a checked-in `testdata/bench-baseline.txt` captured at `count=10 -benchtime=500ms -benchmem` (benchstat needs ≥6 samples for a meaningful CI band); regenerate with `make bench-baseline` after a vetted, intentional perf change and commit the diff. `scripts/bench-check.sh` parses the `+NN.NN%` deltas from `benchstat baseline new` for the `B/op` + `allocs/op` tables and exits non-zero on any row past `THRESHOLD` (default 10%, env-overridable); `sec/op` rows are skipped because wall-clock CV on dev hardware and shared CI runners routinely crosses 10% from background load alone, even when benchstat marks the delta statistically significant. Alloc-count + B/op are the hardware-invariant signals that pin to 0% CV across runs and only move when the code does. The 1 GiB `bench/overhead/nccl_fr_bench_test.go` stays advisory (self-asserts on `HeapAlloc` against NORTHSTARS O2; too slow + already gated internally) and is NOT in the registry. CI runs the gate in `verify-static` after a pinned `go install golang.org/x/perf/cmd/benchstat@...` step. The shell-registry indirection (Makefile → `scripts/bench-check-all.sh`) exists because the bench regex `^Benchmark(ParseDump|StackID)$` contains parentheses that the Make-invoked `/bin/sh` tokenises as subshell metacharacters. Closes #227. + **`tracecore --version` now reports the git-describe state of the build commit**, not the hardcoded `builder-config.yaml` `dist.version` value. `make build` resolves the injected version in order: `$TRACECORE_VERSION` (CI / release override) → `git describe --tags --always --match 'v*' --dirty=-dev` → the unmodified `dist.version` fallback (used when git is unavailable, e.g. a tarball extract). The `v*` filter keeps `module/vX.Y.Z` submodule tags from shadowing the binary's release namespace; the `-dev` suffix flags uncommitted working trees so a laptop build is never indistinguishable from a tagged release. The injection sed-rewrites a temporary copy of `builder-config.yaml` and restores the original on exit via a trap, so the on-disk source-tree value stays in lockstep with `Chart.yaml` `appVersion` and `make doc-check`'s parity gate. `.github/workflows/release.yml` pins `TRACECORE_VERSION="$TAG"` for both the goreleaser pre-build matrix and the ko-publish step so the in-image binary's `--version` matches the published tag verbatim regardless of git-history depth. Closes RFC-0013 PR-D follow-up. Pre-alpha. **Distribution-first pivot adopted ([RFC-0013](docs/rfcs/0013-distro-first-pivot.md))** - binary now assembled via the OpenTelemetry Collector Builder (OCB) from upstream + contrib components plus a thin in-repo Go submodule at `module/` (path `github.com/tracecoreai/tracecore/module`) containing only the moat (NCCL FlightRecorder receiver, OTTL processors with windowed semantics, pattern detectors). The M1 in-tree pipeline runtime + factory-based assembly is queued for deletion at v0.1.0 in favor of the OCB-generated boot path; the canonical `clockreceiver` + `stdoutexporter` examples ship for one PR cycle and then exit. Targeting v0.1.0 / v0.2.0 / v0.3.0 release boundaries per RFC-0013 §4. diff --git a/Makefile b/Makefile index cf98ff50..520f0c97 100644 --- a/Makefile +++ b/Makefile @@ -2,7 +2,7 @@ .PHONY: help build clean hooks # Test suites -.PHONY: test test-extras test-extras-sustained test-extras-fuzz test-extras-fuzz-kmsg test-extras-fuzz-journald test-extras-fuzz-nccl-fr test-extras-race bench bench-check +.PHONY: test test-extras test-extras-sustained test-extras-fuzz test-extras-fuzz-kmsg test-extras-fuzz-journald test-extras-fuzz-nccl-fr test-extras-race bench bench-check bench-baseline # Format + tidy .PHONY: fmt fmt-fix vet lint lint-fix tidy tidy-check mod-verify @@ -79,26 +79,15 @@ test: ## Run unit tests with the race detector. bench: ## Run benchmarks across the repo with -benchmem, count=5. go test -bench=Benchmark -benchmem -benchtime=500ms -count=5 -run='^$$' ./... -bench-check: ## Run benches; fail if any row regresses >10% vs the per-package baseline under testdata/bench-baseline.txt. THRESHOLD overridable. - @# No per-package baselines remain in tree at this point — the - @# k8sevents baseline was retired in RFC-0013 PR-K.2 alongside the - @# in-tree k8sevents receiver. Future per-package baselines wire - @# back into the `for pkg in ...` loop below. The target stays so - @# downstream automation (`make ci`) keeps a stable invocation. - @status=0; \ - out=$$(mktemp); \ - trap 'rm -f $$out' EXIT; \ - for pkg in ; do \ - baseline=$$pkg/testdata/bench-baseline.txt; \ - if [ ! -f $$baseline ]; then \ - echo "bench-check: missing baseline $$baseline" >&2; \ - status=2; continue; \ - fi; \ - echo "==> $$pkg"; \ - go test -bench=Benchmark -benchmem -benchtime=500ms -count=5 -run='^$$' ./$$pkg/ > $$out 2>&1; \ - scripts/bench-check.sh $$baseline $$out $${THRESHOLD:-10} || status=$$?; \ - done; \ - exit $$status +bench-check: ## Run perf gate: bench each gated pkg vs its committed baseline; fail if any row regresses >THRESHOLD% (default 10). + @# Per-package registry + execution lives in scripts/bench-check-all.sh + @# rather than inline here: the bench regex contains parentheses that + @# Make-invoked /bin/sh tokenises as subshell metacharacters. THRESHOLD + @# env var passes through ("THRESHOLD=20 make bench-check" etc.). + scripts/bench-check-all.sh + +bench-baseline: ## Regenerate every per-package bench-baseline.txt. Commit the diff after vetting it on your hardware. + scripts/bench-baseline.sh fmt: ## Check formatting; fails if any file is not gofumpt-clean. diff --git a/components/receivers/pyspy/testdata/bench-baseline.txt b/components/receivers/pyspy/testdata/bench-baseline.txt new file mode 100644 index 00000000..4dfe3ef4 --- /dev/null +++ b/components/receivers/pyspy/testdata/bench-baseline.txt @@ -0,0 +1,26 @@ +goos: darwin +goarch: arm64 +pkg: github.com/tracecoreai/tracecore/components/receivers/pyspy +cpu: Apple M1 Max +BenchmarkParseDump-10 10000 69886 ns/op 66339 B/op 14 allocs/op +BenchmarkParseDump-10 10000 109513 ns/op 66337 B/op 14 allocs/op +BenchmarkParseDump-10 10000 50370 ns/op 66337 B/op 14 allocs/op +BenchmarkParseDump-10 10000 78885 ns/op 66337 B/op 14 allocs/op +BenchmarkParseDump-10 33966 19766 ns/op 66336 B/op 14 allocs/op +BenchmarkParseDump-10 42043 19763 ns/op 66336 B/op 14 allocs/op +BenchmarkParseDump-10 36619 16521 ns/op 66336 B/op 14 allocs/op +BenchmarkParseDump-10 10000 68161 ns/op 66337 B/op 14 allocs/op +BenchmarkParseDump-10 33739 24151 ns/op 66336 B/op 14 allocs/op +BenchmarkParseDump-10 32496 17291 ns/op 66336 B/op 14 allocs/op +BenchmarkStackID-10 1000000 2463 ns/op 32 B/op 3 allocs/op +BenchmarkStackID-10 1000000 2175 ns/op 32 B/op 3 allocs/op +BenchmarkStackID-10 1000000 2419 ns/op 32 B/op 3 allocs/op +BenchmarkStackID-10 1000000 2151 ns/op 32 B/op 3 allocs/op +BenchmarkStackID-10 788203 2649 ns/op 32 B/op 3 allocs/op +BenchmarkStackID-10 442069 2578 ns/op 32 B/op 3 allocs/op +BenchmarkStackID-10 605858 2137 ns/op 32 B/op 3 allocs/op +BenchmarkStackID-10 1000000 2168 ns/op 32 B/op 3 allocs/op +BenchmarkStackID-10 1000000 1993 ns/op 32 B/op 3 allocs/op +BenchmarkStackID-10 982612 2085 ns/op 32 B/op 3 allocs/op +PASS +ok github.com/tracecoreai/tracecore/components/receivers/pyspy 32.159s diff --git a/module/pkg/patterns/testdata/bench-baseline.txt b/module/pkg/patterns/testdata/bench-baseline.txt new file mode 100644 index 00000000..b9701c6b --- /dev/null +++ b/module/pkg/patterns/testdata/bench-baseline.txt @@ -0,0 +1,16 @@ +goos: darwin +goarch: arm64 +pkg: github.com/tracecoreai/tracecore/module/pkg/patterns +cpu: Apple M1 Max +BenchmarkPodEvictedDetector_1kEventWindow-10 278 2311466 ns/op 847109 B/op 15635 allocs/op +BenchmarkPodEvictedDetector_1kEventWindow-10 249 2603816 ns/op 847115 B/op 15635 allocs/op +BenchmarkPodEvictedDetector_1kEventWindow-10 214 2610155 ns/op 847060 B/op 15635 allocs/op +BenchmarkPodEvictedDetector_1kEventWindow-10 241 2670796 ns/op 847112 B/op 15635 allocs/op +BenchmarkPodEvictedDetector_1kEventWindow-10 249 2468809 ns/op 847070 B/op 15635 allocs/op +BenchmarkPodEvictedDetector_1kEventWindow-10 222 2466137 ns/op 847072 B/op 15635 allocs/op +BenchmarkPodEvictedDetector_1kEventWindow-10 237 2476560 ns/op 847063 B/op 15635 allocs/op +BenchmarkPodEvictedDetector_1kEventWindow-10 256 2770240 ns/op 847087 B/op 15635 allocs/op +BenchmarkPodEvictedDetector_1kEventWindow-10 267 2589770 ns/op 847115 B/op 15635 allocs/op +BenchmarkPodEvictedDetector_1kEventWindow-10 250 2418795 ns/op 847075 B/op 15635 allocs/op +PASS +ok github.com/tracecoreai/tracecore/module/pkg/patterns 9.643s diff --git a/scripts/bench-baseline.sh b/scripts/bench-baseline.sh new file mode 100755 index 00000000..675a1300 --- /dev/null +++ b/scripts/bench-baseline.sh @@ -0,0 +1,29 @@ +#!/usr/bin/env bash +# bench-baseline.sh — regenerate every per-package bench baseline +# registered in scripts/bench-registry.sh. +# +# benchstat needs >= 6 samples for confidence intervals at level 0.95 +# (anything less yields a "need >= 6 samples" warning + ± ∞ band, which +# defeats the noise filter). count=10 gives headroom + matches what +# `make bench-check` runs, so a freshly-regenerated baseline self- +# compares cleanly under `benchstat `. +# +# Run on the hardware you intend to gate on. The committed baseline +# pins B/op + allocs/op (hardware-invariant) precisely; sec/op gating +# only fires when a regression exceeds the run-to-run CV, which is the +# right behavior — alloc-count growth is the load-bearing signal, not +# wall-clock jitter. +set -euo pipefail + +# shellcheck source=scripts/bench-registry.sh +source "$(dirname "$0")/bench-registry.sh" + +for entry in "${bench_entries[@]}"; do + pkg="${entry%%|*}" + pat="${entry#*|}" + baseline="$pkg/testdata/bench-baseline.txt" + mkdir -p "$pkg/testdata" + echo "==> regenerating $baseline ($pat)" + go test -bench="$pat" -benchmem -benchtime=500ms -count=10 \ + -run='^$' "./$pkg/" > "$baseline" +done diff --git a/scripts/bench-check-all.sh b/scripts/bench-check-all.sh new file mode 100755 index 00000000..5fe607c7 --- /dev/null +++ b/scripts/bench-check-all.sh @@ -0,0 +1,40 @@ +#!/usr/bin/env bash +# bench-check-all.sh — perf gate runner. +# +# Iterates the registry in scripts/bench-registry.sh and, for each +# (package, bench-regex) entry, runs the matching benchmarks and +# compares results against the package's frozen +# `testdata/bench-baseline.txt` via scripts/bench-check.sh (benchstat +# under the hood). +# +# Exit codes: +# 0 every gated row stayed within THRESHOLD (default 10%). +# 1 one or more rows regressed past THRESHOLD. +# 2 a baseline file was missing or benchstat itself was missing. +set -euo pipefail + +# shellcheck source=scripts/bench-registry.sh +source "$(dirname "$0")/bench-registry.sh" + +threshold="${THRESHOLD:-10}" + +status=0 +out=$(mktemp) +trap 'rm -f "$out"' EXIT + +for entry in "${bench_entries[@]}"; do + pkg="${entry%%|*}" + pat="${entry#*|}" + baseline="$pkg/testdata/bench-baseline.txt" + if [[ ! -f "$baseline" ]]; then + echo "bench-check: missing baseline $baseline" >&2 + status=2 + continue + fi + echo "==> $pkg ($pat)" + go test -bench="$pat" -benchmem -benchtime=500ms -count=10 \ + -run='^$' "./$pkg/" > "$out" 2>&1 + scripts/bench-check.sh "$baseline" "$out" "$threshold" || status=$? +done + +exit $status diff --git a/scripts/bench-check.sh b/scripts/bench-check.sh index f2dd5642..486ecd2b 100755 --- a/scripts/bench-check.sh +++ b/scripts/bench-check.sh @@ -37,12 +37,32 @@ trap 'rm -f "$cmp"' EXIT benchstat "$baseline" "$new" | tee "$cmp" -# Parse `+NN.NN%` regressions from benchstat output. The vs-base -# column shows `~` for noise, `+X.YZ%` for slowdowns, `-X.YZ%` for -# speedups. We only care about + with magnitude > threshold. +# Parse `+NN.NN%` regressions from benchstat output, restricted to the +# B/op and allocs/op tables. sec/op rows are deliberately ignored: the +# wall-clock CV on dev hardware (and on shared CI runners) routinely +# crosses the 10% threshold from background-load jitter alone, even +# when benchstat marks the delta as statistically significant — gating +# on it would false-fire on most PRs without catching anything code +# review can act on. Alloc-count and B/op growth are the +# hardware-invariant signals that track real regressions (escape- +# analysis bugs, accidentally-heap-allocated locals, hot-loop string +# concatenations, etc.); they pin to 0% CV across runs and only move +# when the code does. # -# awk extracts {benchmark_name, delta_percent} pairs, then we filter. +# Tracking the active table: benchstat prints an indented header line +# containing only whitespace, box-drawing chars (`│`, U+2502 — multi- +# byte, not portable to match as a literal in awk regex), and the unit +# label (`sec/op` / `B/op` / `allocs/op`) before each comparison table. +# We detect these by `index(line, "")` since the unit substrings +# do not appear in benchmark names (Go disallows `/` in identifiers). +# Header lines start with whitespace; data rows start with an alpha +# benchmark name; the indent prefix is therefore an unambiguous +# disambiguator. Set `gate=1` only inside B/op + allocs/op tables. regressed=$(awk -v thr="$threshold" ' + /^ +/ && index($0, "sec/op") { gate = 0; next } + /^ +/ && index($0, "B/op") { gate = 1; next } + /^ +/ && index($0, "allocs/op") { gate = 1; next } + !gate { next } # Skip header lines that contain column titles (no benchmark name). /^[A-Za-z]/ && /\+[0-9]+(\.[0-9]+)?%/ { # Match either +NN.NN% (current benchstat output) or +NN% diff --git a/scripts/bench-registry.sh b/scripts/bench-registry.sh new file mode 100644 index 00000000..6a57f89f --- /dev/null +++ b/scripts/bench-registry.sh @@ -0,0 +1,25 @@ +# shellcheck shell=bash +# bench-registry.sh — single source of truth for the perf-gate registry. +# +# Sourced by scripts/bench-check-all.sh (gate runner) and +# scripts/bench-baseline.sh (regenerator); not directly executable. +# Defines the `bench_entries` array; each entry is "pkg|regex": +# pkg — Go import path relative to repo root. +# regex — `-bench` regex; quote-safe since both consumers run under +# plain shell, not Make-invoked /bin/sh (which would tokenise +# unescaped parentheses as subshell metacharacters). +# +# Gating criteria for a new entry: +# 1. Benchmark must be deterministic-allocation — B/op + allocs/op are +# hardware-invariant and gate cleanly across CI runner and dev box; +# sec/op is noisy on dev machines and only fires the gate when the +# regression exceeds the run-to-run CV (which is the right behavior). +# 2. Benchmark must run fast (≲30s per package at count=10 × 500ms × +# benches-in-package). Volume benchmarks such as +# bench/overhead/nccl_fr_bench_test.go (1 GiB/iter, self-asserting +# HeapAlloc) stay advisory and OUT of this registry. + +bench_entries=( + "module/pkg/patterns|^BenchmarkPodEvictedDetector" + "components/receivers/pyspy|^Benchmark(ParseDump|StackID)\$" +)