Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -128,6 +128,12 @@ jobs:
run: make doc-check
- name: build
run: make build
- name: install benchstat
run: |
go install golang.org/x/perf/cmd/benchstat@v0.0.0-20260512194132-3cf34090a3db
echo "$(go env GOPATH)/bin" >> "$GITHUB_PATH"
- name: bench-check (perf regression gate)
run: make bench-check

# M6: validate every docs/integrations/examples/*.yaml against the
# right binary. Tracecore-tagged recipes use the in-tree binary;
Expand Down
2 changes: 2 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,8 @@ User-visible changes are documented here. Format: [Keep a Changelog](https://kee

## [Unreleased]

**`make bench-check` re-baselined as a real perf-regression gate.** Since PR-K.2 deleted the in-tree k8sevents receiver (whose `testdata/bench-baseline.txt` was the gate's sole row), the target had been a no-op `for pkg in ;` empty loop. The loop now iterates a quoted-shell registry in `scripts/bench-check-all.sh` covering `module/pkg/patterns` (`BenchmarkPodEvictedDetector_1kEventWindow`, the M19 detector budget) and `components/receivers/pyspy` (`BenchmarkParseDump` + `BenchmarkStackID`, the M18 join-key hash + faulthandler parser). Each package owns a checked-in `testdata/bench-baseline.txt` captured at `count=10 -benchtime=500ms -benchmem` (benchstat needs ≥6 samples for a meaningful CI band); regenerate with `make bench-baseline` after a vetted, intentional perf change and commit the diff. `scripts/bench-check.sh` parses the `+NN.NN%` deltas from `benchstat baseline new` for the `B/op` + `allocs/op` tables and exits non-zero on any row past `THRESHOLD` (default 10%, env-overridable); `sec/op` rows are skipped because wall-clock CV on dev hardware and shared CI runners routinely crosses 10% from background load alone, even when benchstat marks the delta statistically significant. Alloc-count + B/op are the hardware-invariant signals that pin to 0% CV across runs and only move when the code does. The 1 GiB `bench/overhead/nccl_fr_bench_test.go` stays advisory (self-asserts on `HeapAlloc` against NORTHSTARS O2; too slow + already gated internally) and is NOT in the registry. CI runs the gate in `verify-static` after a pinned `go install golang.org/x/perf/cmd/benchstat@...` step. The shell-registry indirection (Makefile → `scripts/bench-check-all.sh`) exists because the bench regex `^Benchmark(ParseDump|StackID)$` contains parentheses that the Make-invoked `/bin/sh` tokenises as subshell metacharacters. Closes #227.

**`tracecore --version` now reports the git-describe state of the build commit**, not the hardcoded `builder-config.yaml` `dist.version` value. `make build` resolves the injected version in order: `$TRACECORE_VERSION` (CI / release override) → `git describe --tags --always --match 'v*' --dirty=-dev` → the unmodified `dist.version` fallback (used when git is unavailable, e.g. a tarball extract). The `v*` filter keeps `module/vX.Y.Z` submodule tags from shadowing the binary's release namespace; the `-dev` suffix flags uncommitted working trees so a laptop build is never indistinguishable from a tagged release. The injection sed-rewrites a temporary copy of `builder-config.yaml` and restores the original on exit via a trap, so the on-disk source-tree value stays in lockstep with `Chart.yaml` `appVersion` and `make doc-check`'s parity gate. `.github/workflows/release.yml` pins `TRACECORE_VERSION="$TAG"` for both the goreleaser pre-build matrix and the ko-publish step so the in-image binary's `--version` matches the published tag verbatim regardless of git-history depth. Closes RFC-0013 PR-D follow-up.

Pre-alpha. **Distribution-first pivot adopted ([RFC-0013](docs/rfcs/0013-distro-first-pivot.md))** - binary now assembled via the OpenTelemetry Collector Builder (OCB) from upstream + contrib components plus a thin in-repo Go submodule at `module/` (path `github.com/tracecoreai/tracecore/module`) containing only the moat (NCCL FlightRecorder receiver, OTTL processors with windowed semantics, pattern detectors). The M1 in-tree pipeline runtime + factory-based assembly is queued for deletion at v0.1.0 in favor of the OCB-generated boot path; the canonical `clockreceiver` + `stdoutexporter` examples ship for one PR cycle and then exit. Targeting v0.1.0 / v0.2.0 / v0.3.0 release boundaries per RFC-0013 §4.
Expand Down
31 changes: 10 additions & 21 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
.PHONY: help build clean hooks

# Test suites
.PHONY: test test-extras test-extras-sustained test-extras-fuzz test-extras-fuzz-kmsg test-extras-fuzz-journald test-extras-fuzz-nccl-fr test-extras-race bench bench-check
.PHONY: test test-extras test-extras-sustained test-extras-fuzz test-extras-fuzz-kmsg test-extras-fuzz-journald test-extras-fuzz-nccl-fr test-extras-race bench bench-check bench-baseline

# Format + tidy
.PHONY: fmt fmt-fix vet lint lint-fix tidy tidy-check mod-verify
Expand Down Expand Up @@ -79,26 +79,15 @@ test: ## Run unit tests with the race detector.
bench: ## Run benchmarks across the repo with -benchmem, count=5.
go test -bench=Benchmark -benchmem -benchtime=500ms -count=5 -run='^$$' ./...

bench-check: ## Run benches; fail if any row regresses >10% vs the per-package baseline under testdata/bench-baseline.txt. THRESHOLD overridable.
@# No per-package baselines remain in tree at this point — the
@# k8sevents baseline was retired in RFC-0013 PR-K.2 alongside the
@# in-tree k8sevents receiver. Future per-package baselines wire
@# back into the `for pkg in ...` loop below. The target stays so
@# downstream automation (`make ci`) keeps a stable invocation.
@status=0; \
out=$$(mktemp); \
trap 'rm -f $$out' EXIT; \
for pkg in ; do \
baseline=$$pkg/testdata/bench-baseline.txt; \
if [ ! -f $$baseline ]; then \
echo "bench-check: missing baseline $$baseline" >&2; \
status=2; continue; \
fi; \
echo "==> $$pkg"; \
go test -bench=Benchmark -benchmem -benchtime=500ms -count=5 -run='^$$' ./$$pkg/ > $$out 2>&1; \
scripts/bench-check.sh $$baseline $$out $${THRESHOLD:-10} || status=$$?; \
done; \
exit $$status
bench-check: ## Run perf gate: bench each gated pkg vs its committed baseline; fail if any row regresses >THRESHOLD% (default 10).
@# Per-package registry + execution lives in scripts/bench-check-all.sh
@# rather than inline here: the bench regex contains parentheses that
@# Make-invoked /bin/sh tokenises as subshell metacharacters. THRESHOLD
@# env var passes through ("THRESHOLD=20 make bench-check" etc.).
scripts/bench-check-all.sh

bench-baseline: ## Regenerate every per-package bench-baseline.txt. Commit the diff after vetting it on your hardware.
scripts/bench-baseline.sh


fmt: ## Check formatting; fails if any file is not gofumpt-clean.
Expand Down
26 changes: 26 additions & 0 deletions components/receivers/pyspy/testdata/bench-baseline.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
goos: darwin
goarch: arm64
pkg: github.com/tracecoreai/tracecore/components/receivers/pyspy
cpu: Apple M1 Max
BenchmarkParseDump-10 10000 69886 ns/op 66339 B/op 14 allocs/op
BenchmarkParseDump-10 10000 109513 ns/op 66337 B/op 14 allocs/op
BenchmarkParseDump-10 10000 50370 ns/op 66337 B/op 14 allocs/op
BenchmarkParseDump-10 10000 78885 ns/op 66337 B/op 14 allocs/op
BenchmarkParseDump-10 33966 19766 ns/op 66336 B/op 14 allocs/op
BenchmarkParseDump-10 42043 19763 ns/op 66336 B/op 14 allocs/op
BenchmarkParseDump-10 36619 16521 ns/op 66336 B/op 14 allocs/op
BenchmarkParseDump-10 10000 68161 ns/op 66337 B/op 14 allocs/op
BenchmarkParseDump-10 33739 24151 ns/op 66336 B/op 14 allocs/op
BenchmarkParseDump-10 32496 17291 ns/op 66336 B/op 14 allocs/op
BenchmarkStackID-10 1000000 2463 ns/op 32 B/op 3 allocs/op
BenchmarkStackID-10 1000000 2175 ns/op 32 B/op 3 allocs/op
BenchmarkStackID-10 1000000 2419 ns/op 32 B/op 3 allocs/op
BenchmarkStackID-10 1000000 2151 ns/op 32 B/op 3 allocs/op
BenchmarkStackID-10 788203 2649 ns/op 32 B/op 3 allocs/op
BenchmarkStackID-10 442069 2578 ns/op 32 B/op 3 allocs/op
BenchmarkStackID-10 605858 2137 ns/op 32 B/op 3 allocs/op
BenchmarkStackID-10 1000000 2168 ns/op 32 B/op 3 allocs/op
BenchmarkStackID-10 1000000 1993 ns/op 32 B/op 3 allocs/op
BenchmarkStackID-10 982612 2085 ns/op 32 B/op 3 allocs/op
PASS
ok github.com/tracecoreai/tracecore/components/receivers/pyspy 32.159s
16 changes: 16 additions & 0 deletions module/pkg/patterns/testdata/bench-baseline.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
goos: darwin
goarch: arm64
pkg: github.com/tracecoreai/tracecore/module/pkg/patterns
cpu: Apple M1 Max
BenchmarkPodEvictedDetector_1kEventWindow-10 278 2311466 ns/op 847109 B/op 15635 allocs/op
BenchmarkPodEvictedDetector_1kEventWindow-10 249 2603816 ns/op 847115 B/op 15635 allocs/op
BenchmarkPodEvictedDetector_1kEventWindow-10 214 2610155 ns/op 847060 B/op 15635 allocs/op
BenchmarkPodEvictedDetector_1kEventWindow-10 241 2670796 ns/op 847112 B/op 15635 allocs/op
BenchmarkPodEvictedDetector_1kEventWindow-10 249 2468809 ns/op 847070 B/op 15635 allocs/op
BenchmarkPodEvictedDetector_1kEventWindow-10 222 2466137 ns/op 847072 B/op 15635 allocs/op
BenchmarkPodEvictedDetector_1kEventWindow-10 237 2476560 ns/op 847063 B/op 15635 allocs/op
BenchmarkPodEvictedDetector_1kEventWindow-10 256 2770240 ns/op 847087 B/op 15635 allocs/op
BenchmarkPodEvictedDetector_1kEventWindow-10 267 2589770 ns/op 847115 B/op 15635 allocs/op
BenchmarkPodEvictedDetector_1kEventWindow-10 250 2418795 ns/op 847075 B/op 15635 allocs/op
PASS
ok github.com/tracecoreai/tracecore/module/pkg/patterns 9.643s
29 changes: 29 additions & 0 deletions scripts/bench-baseline.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
#!/usr/bin/env bash
# bench-baseline.sh — regenerate every per-package bench baseline
# registered in scripts/bench-registry.sh.
#
# benchstat needs >= 6 samples for confidence intervals at level 0.95
# (anything less yields a "need >= 6 samples" warning + ± ∞ band, which
# defeats the noise filter). count=10 gives headroom + matches what
# `make bench-check` runs, so a freshly-regenerated baseline self-
# compares cleanly under `benchstat <baseline> <baseline>`.
#
# Run on the hardware you intend to gate on. The committed baseline
# pins B/op + allocs/op (hardware-invariant) precisely; sec/op gating
# only fires when a regression exceeds the run-to-run CV, which is the
# right behavior — alloc-count growth is the load-bearing signal, not
# wall-clock jitter.
set -euo pipefail

# shellcheck source=scripts/bench-registry.sh
source "$(dirname "$0")/bench-registry.sh"

for entry in "${bench_entries[@]}"; do
pkg="${entry%%|*}"
pat="${entry#*|}"
baseline="$pkg/testdata/bench-baseline.txt"
mkdir -p "$pkg/testdata"
echo "==> regenerating $baseline ($pat)"
go test -bench="$pat" -benchmem -benchtime=500ms -count=10 \
-run='^$' "./$pkg/" > "$baseline"
done
40 changes: 40 additions & 0 deletions scripts/bench-check-all.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,40 @@
#!/usr/bin/env bash
# bench-check-all.sh — perf gate runner.
#
# Iterates the registry in scripts/bench-registry.sh and, for each
# (package, bench-regex) entry, runs the matching benchmarks and
# compares results against the package's frozen
# `testdata/bench-baseline.txt` via scripts/bench-check.sh (benchstat
# under the hood).
#
# Exit codes:
# 0 every gated row stayed within THRESHOLD (default 10%).
# 1 one or more rows regressed past THRESHOLD.
# 2 a baseline file was missing or benchstat itself was missing.
set -euo pipefail

# shellcheck source=scripts/bench-registry.sh
source "$(dirname "$0")/bench-registry.sh"

threshold="${THRESHOLD:-10}"

status=0
out=$(mktemp)
trap 'rm -f "$out"' EXIT

for entry in "${bench_entries[@]}"; do
pkg="${entry%%|*}"
pat="${entry#*|}"
baseline="$pkg/testdata/bench-baseline.txt"
if [[ ! -f "$baseline" ]]; then
echo "bench-check: missing baseline $baseline" >&2
status=2
continue
fi
echo "==> $pkg ($pat)"
go test -bench="$pat" -benchmem -benchtime=500ms -count=10 \
-run='^$' "./$pkg/" > "$out" 2>&1
scripts/bench-check.sh "$baseline" "$out" "$threshold" || status=$?
done

exit $status
28 changes: 24 additions & 4 deletions scripts/bench-check.sh
Original file line number Diff line number Diff line change
Expand Up @@ -37,12 +37,32 @@ trap 'rm -f "$cmp"' EXIT

benchstat "$baseline" "$new" | tee "$cmp"

# Parse `+NN.NN%` regressions from benchstat output. The vs-base
# column shows `~` for noise, `+X.YZ%` for slowdowns, `-X.YZ%` for
# speedups. We only care about + with magnitude > threshold.
# Parse `+NN.NN%` regressions from benchstat output, restricted to the
# B/op and allocs/op tables. sec/op rows are deliberately ignored: the
# wall-clock CV on dev hardware (and on shared CI runners) routinely
# crosses the 10% threshold from background-load jitter alone, even
# when benchstat marks the delta as statistically significant — gating
# on it would false-fire on most PRs without catching anything code
# review can act on. Alloc-count and B/op growth are the
# hardware-invariant signals that track real regressions (escape-
# analysis bugs, accidentally-heap-allocated locals, hot-loop string
# concatenations, etc.); they pin to 0% CV across runs and only move
# when the code does.
#
# awk extracts {benchmark_name, delta_percent} pairs, then we filter.
# Tracking the active table: benchstat prints an indented header line
# containing only whitespace, box-drawing chars (`│`, U+2502 — multi-
# byte, not portable to match as a literal in awk regex), and the unit
# label (`sec/op` / `B/op` / `allocs/op`) before each comparison table.
# We detect these by `index(line, "<unit>")` since the unit substrings
# do not appear in benchmark names (Go disallows `/` in identifiers).
# Header lines start with whitespace; data rows start with an alpha
# benchmark name; the indent prefix is therefore an unambiguous
# disambiguator. Set `gate=1` only inside B/op + allocs/op tables.
regressed=$(awk -v thr="$threshold" '
/^ +/ && index($0, "sec/op") { gate = 0; next }
/^ +/ && index($0, "B/op") { gate = 1; next }
/^ +/ && index($0, "allocs/op") { gate = 1; next }
!gate { next }
# Skip header lines that contain column titles (no benchmark name).
/^[A-Za-z]/ && /\+[0-9]+(\.[0-9]+)?%/ {
# Match either +NN.NN% (current benchstat output) or +NN%
Expand Down
25 changes: 25 additions & 0 deletions scripts/bench-registry.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
# shellcheck shell=bash
# bench-registry.sh — single source of truth for the perf-gate registry.
#
# Sourced by scripts/bench-check-all.sh (gate runner) and
# scripts/bench-baseline.sh (regenerator); not directly executable.
# Defines the `bench_entries` array; each entry is "pkg|regex":
# pkg — Go import path relative to repo root.
# regex — `-bench` regex; quote-safe since both consumers run under
# plain shell, not Make-invoked /bin/sh (which would tokenise
# unescaped parentheses as subshell metacharacters).
#
# Gating criteria for a new entry:
# 1. Benchmark must be deterministic-allocation — B/op + allocs/op are
# hardware-invariant and gate cleanly across CI runner and dev box;
# sec/op is noisy on dev machines and only fires the gate when the
# regression exceeds the run-to-run CV (which is the right behavior).
# 2. Benchmark must run fast (≲30s per package at count=10 × 500ms ×
# benches-in-package). Volume benchmarks such as
# bench/overhead/nccl_fr_bench_test.go (1 GiB/iter, self-asserting
# HeapAlloc) stay advisory and OUT of this registry.

bench_entries=(
"module/pkg/patterns|^BenchmarkPodEvictedDetector"
"components/receivers/pyspy|^Benchmark(ParseDump|StackID)\$"
)
Loading