refactor(module): relocate patterns + replay into module/pkg/ (PR-I.2a)#242
Merged
Conversation
added 2 commits
May 31, 2026 13:07
Mechanical reshuffle of the pattern library and replay runner from
internal/synthesis/{patterns,replay}/ to module/pkg/{patterns,replay}/
so the moat surface lives inside the same Go submodule that hosts the
OTel components PR-I.2b will introduce (rankjoinprocessor, pattern-
detectorprocessor).
No behavior change. Imports updated everywhere; doc comments + the
chaos.yml + bench.yml workflow paths follow the new home. module/go.mod
gains direct requires on santhosh-tekuri/jsonschema/v6 + stretchr/
testify (the moved test files' deps); root go.mod sheds the now-
indirect jsonschema reference. Collector pins on module/ stay at
v0.110.0 — the MVS-floor trap from PR-I.1b is honored.
RFC-0013 §migration L249 amended to describe the I.2a/I.2b split.
Forward-looking doc references under docs/followups/ + docs/research/
updated to point at module/pkg/. Historical RFC + CHANGELOG entries
that stamp prior receiver-era paths are left intact.
Partial close of #221: PR-I.2a (relocate) lands here; PR-I.2b (new
processors + builder-config wiring + module/v0.2.0 tag) remains open.
Signed-off-by: Tri Lam <tri@maydow.com>
Adversarial-review followup. Two corrections to the chaos.yml updates the relocation commit made: 1. Path triggers `module/pkg/**` was too broad — it would fire chaos on unrelated module/pkg additions (e.g. the nccl_fr_parser package that landed in PR-I.1b). Narrowed to `module/pkg/patterns/**` + `module/pkg/replay/**`, matching the pre-move scope shape. 2. `go test ./module/pkg/...` does not actually resolve from the root module (module/ is its own Go module; root `./...` does not cross into it under either GOWORK=on or GOWORK=off). The original `./internal/synthesis/...` worked because synthesis was in the root module. Fixed by `cd module && go test ./pkg/patterns/... ./pkg/replay/...`, mirroring the make-ci invocation shape used by the nccl-fr fuzz targets. Signed-off-by: Tri Lam <tri@maydow.com>
trilamsr
added a commit
that referenced
this pull request
May 31, 2026
…/4) (#243) ## Summary PR-1 of 4 in the OTel collector v0.110 -> v0.130 staged catch-up (#225). - **Tooling**: new `make bump-otel VERSION=0.X.0` Makefile target. Single-source sed rewrite across `builder-config.yaml` (16 lines), `module/go.mod` (collector/pdata require lines), and the `builder@v0.X.0` pin in `Makefile`. Defaults `PDATA_VERSION` to `1.<minor-94>.0` per the upstream offset (v0.110<->v1.16, v0.115<->v1.21, v0.130<->v1.36); override for off-cycle bumps. `go mod tidy` is left manual so reviewers see MVS-resolved drift in the diff. - **Bump**: collector v0.110.0 -> v0.115.0. `consumer` graduated to v1.21.0 (v1.x train); otel libs followed upstream to v1.32.0; pdata -> v1.21.0. Dropped two indirect entries removed in v0.115 (`component/componentprofiles`, `internal/globalsignal`). - **Test fix**: `receivertest.NewNopSettings(...)` took no args from v0.110 through v0.119.x; the existing `module/receiver/ncclfrreceiver/nccl_fr_test.go:288` call passed `componentType()` and was broken on `main` against the v0.110 pin. Drop the bogus arg -- `set.ID = component.NewIDWithName(componentType(), "test")` already carries the type for downstream BuildInfo derivation. (At v0.120 the bump-back signature lands as `NewNopSettingsWithType`, addressed in PR-2.) - **Docs**: RFC-0013 example block bumped in lockstep. Historical PR-I.1b narrative line left as-is (records what landed then). ## Out of scope (deferred per #225 plan) - PR-2: v0.115 -> v0.120 + Go 1.23 floor bump (module/go.mod `go 1.22.0` -> `go 1.23.0`). - PR-3: v0.120 -> v0.125 + `*profiles` -> `x*` migration (consumer/consumerprofiles -> consumer/xconsumer, etc). - PR-4: v0.125 -> v0.130 + `TLSSetting` -> `TLS` rename across config blocks. - Root `go.mod` untouched -- already at v1.59.0 / v0.153.0 from a separate code path. ## Renovate decision (per #225 ask) The issue suggested a Renovate `regexManagers` block for `builder-config.yaml` since Dependabot can't parse the freeform `gomod:` lines. Decision: **skip Renovate, document the choice**. Rationale: 1. `make bump-otel` already automates the multi-file rewrite end-to-end; Renovate's value-add would be opening a PR, but the PR body would still need a human to run `go mod tidy` (Renovate can't, in-PR, drive a shell-out for a non-go-module file). 2. Adding a second bot expands the toolchain footprint (renovate.json + Mend permissions) for one file's worth of regex coverage. 3. Dependabot keeps gomod ecosystem coverage of root/module go.mod, which is the lion's share. Revisit if (a) Dependabot adds custom regex managers, or (b) the bump cadence exceeds quarterly and the manual ergonomics start to bite. ## Verification - `GOWORK=off go build ./...` at root + module/: green. - `GOWORK=off go test ./...` in module/: green (was red pre-bump on `main`). Post-merge of #242 (which relocated `internal/synthesis/{patterns,replay}` -> `module/pkg/{patterns,replay}`) the package set under verification grew to four: `module`, `module/pkg/nccl/fr_parser`, `module/pkg/patterns`, `module/pkg/replay`, `module/receiver/ncclfrreceiver`. All green at v0.115.0. - `GOWORK=off go test -race ./...` in module/: green. - `make ci-fuzz-nccl-fr` (30s gate): PASS, 2 new corpus interesting. - `make build` (OCB end-to-end): produces `_build/tracecore` against `builder@v0.115.0`; `--version` prints expected. - `docker run --rm -v $(pwd)/install/kubernetes/tracecore:/chart alpine/helm:3.16.4 lint /chart`: 1 chart linted, 0 failed. - `make check` (fmt + tidy-check + lint + vet + mod-verify): clean. ```release-notes chore(otel): bump pinned collector v0.110 -> v0.115 (PR-1 of 4 in the v0.110 -> v0.130 catch-up). Adds `make bump-otel VERSION=0.X.0` to single-source the multi-file pin rewrite. No runtime/operator-visible behaviour change; OCB-built tracecore binary now resolves against collector v0.115.0 components. ``` Refs #225 (leave open; PR-2/3/4 to follow). --------- Signed-off-by: Tri Lam <tri@maydow.com> Co-authored-by: Tri Lam <tri@maydow.com>
9 tasks
trilamsr
added a commit
that referenced
this pull request
May 31, 2026
Closes #227. ## Summary `make bench-check` has been a no-op since PR-K.2 (#217) deleted `components/receivers/k8sevents/` — its `testdata/bench-baseline.txt` was the sole row in the gate's `for pkg in …` loop, leaving the loop body literally empty (`for pkg in ; do …`). The target stayed as a stub so `make ci` automation kept a stable invocation, with a TODO to re-register packages later. That re-registration is this PR. ## Root cause `Makefile:71-101` (pre-diff) iterated an empty list. The wrapper around `benchstat`+`scripts/bench-check.sh` was intact; only the *registry of which packages to gate* had been emptied. So the fix is purely a registration + baseline-capture exercise — no logic change to the gate's comparison machinery, plus one scope-tightening change called out under "Adversarial-review fix" below. ## What this PR does 1. **Registers two packages** with deterministic-allocation micro-benchmarks for gating: - `module/pkg/patterns` — `BenchmarkPodEvictedDetector_1kEventWindow` (the M19 detector budget). (Path is post-PR-I.2a / #242; this branch was rebased onto that merge during the PR.) - `components/receivers/pyspy` — `BenchmarkParseDump` + `BenchmarkStackID` (the M18 join-key hash + faulthandler parser). 2. **Captures committed baselines** at `module/pkg/patterns/testdata/bench-baseline.txt` and `components/receivers/pyspy/testdata/bench-baseline.txt`, both at `count=10 -benchtime=500ms -benchmem`. `count=10` is the minimum that survives benchstat's `≥6 samples for confidence interval at level 0.95` warning with headroom for one outlier. 3. **Extracts the registry to `scripts/bench-registry.sh`** sourced by both `scripts/bench-check-all.sh` (gate runner, wired to `make bench-check`) and `scripts/bench-baseline.sh` (regenerator, wired to `make bench-baseline`). Single source of truth — drift between "what we gate" and "what we regenerate" can no longer happen. Why scripts and not an inline Make loop: the bench regex `^Benchmark(ParseDump|StackID)$` contains parentheses that the Make-invoked `/bin/sh` tokenises as subshell metacharacters. Quoting under `make` shellquote escaping is more fragile than just punting to plain shell. 4. **Restricts the gate to `B/op` + `allocs/op` tables** in `scripts/bench-check.sh`; `sec/op` is no longer gated. See "Adversarial-review fix" below. 5. **Wires the gate into CI** as a new step in `.github/workflows/ci.yml`'s `verify-static` job, after `make build`. A pinned `go install golang.org/x/perf/cmd/benchstat@…` step installs the tool on the runner. ## Adversarial-review fix (mid-PR) The first commit gated sec/op too, with the design note claiming pure wall-clock jitter would show as benchstat's `~` (non-significant) and never reach the `+NN%` parser. **That's false.** Repeating `make bench-check` immediately after a baseline capture, with identical code on identical hardware, fired: ``` PodEvictedDetector_1kEventWindow-10 sec/op +16.50% (p=0.001 n=10) PodEvictedDetector_1kEventWindow-10 B/op +0.01% (p=0.005 n=10) PodEvictedDetector_1kEventWindow-10 allocs/op ~ (p=1.000 n=10) ``` The 16.5% delta was background-load drift between two runs minutes apart. benchstat marks it significant because the CV is small (~5-10%) and the means are reliably different — both true symptoms of *real* but *uninteresting* wall-clock variance. Gating on sec/op would produce continuous false fires across the team without catching anything code review can act on. **Fix**: `scripts/bench-check.sh` now tracks the active benchstat table by indented header line (`sec/op` / `B/op` / `allocs/op`) and only counts `+NN%` deltas under `B/op` + `allocs/op`. Both stay pinned to 0% CV across runs (deterministic Go allocator + identical bytes-allocated per code path) and only move when the code does. `make bench-check` on clean main now passes consistently across runs; the TDD planted regression (heap-allocating sink in `stackID`) still trips the gate at `+800% B/op` + `+133% allocs/op`. ## Design notes - **Why these two benches and not bench/overhead/nccl_fr_bench_test.go**: that benchmark replays 1 GiB of fixtures per iter (~90s on M1 Max) and *already* self-asserts on `HeapAlloc` delta against the NORTHSTARS O2 ceiling. Adding it to `bench-check` would push CI past the budget and produce nothing the in-bench assertion doesn't already gate. Stays advisory. - **Hardware skew is irrelevant for B/op + allocs/op**: baselines were captured on Apple M1 Max; CI runs on `ubuntu-latest` (x86_64). The Go allocator is deterministic across GOARCH for a given code path — `b.ReportAllocs()` returns identical numbers on both. The historical k8sevents baseline lived under the same cross-arch arrangement. - **`make ci` unchanged**: `bench-check` is NOT added to the `make ci` recipe. That would force every contributor to install benchstat. CI installs benchstat itself in the workflow; local dev runs `make bench-check` only when intentionally vetting a perf change. - **`module/` submodule resolution**: `module/pkg/patterns` lives in the in-repo Go submodule (`module/go.mod`); local dev runs resolve through `go.work` and OCB through `builder-config.yaml`'s `replaces: ./module`. `go test ./module/pkg/patterns/…` works from the repo root because of the workspace. ## TDD record - **Define gate**: scripts/bench-check-all.sh + bench-registry.sh register two packages; threshold 10%, env-overridable; only B/op + allocs/op rows count. - **Red (no plant)**: `make bench-check` on clean main passes — exit 0, both packages report `PASS: no benchmarks regressed by more than 10% vs baseline.` (Re-run multiple times across different machine load — stays green.) - **Red→Green (planted regression)**: edited `stackID` to allocate 64 B/iter via a package-level sink (escape-analysis-routed to the heap, so allocs/op actually moved); `make bench-check` exit 2 (make wraps the inner exit 1). Output flagged `StackID-10 +800.00% (p=0.000 n=10)` on B/op and `+133.33% (p=0.000 n=10)` on allocs/op. - **Green (revert)**: plant removed; `make bench-check` exit 0; `git diff main -- components/receivers/pyspy/stackid.go` empty. ## Release notes ```release-notes ### Added - `make bench-check` is now a real perf-regression gate again (was a no-op since PR-K.2). Registry-driven via `scripts/bench-registry.sh` (single source of truth, sourced by both `bench-check-all.sh` runner and `bench-baseline.sh` regenerator); the runner gates `B/op` + `allocs/op` deltas against a committed `testdata/bench-baseline.txt` via benchstat; any row regressing past `THRESHOLD%` (default 10, env-overridable) fails. `sec/op` is deliberately not gated (wall-clock CV on dev hardware and shared CI runners routinely crosses 10% from background load alone, even when benchstat marks the delta significant). Two packages registered: `module/pkg/patterns` (M19 PodEvictedDetector budget) and `components/receivers/pyspy` (M18 ParseDump + StackID hash). Regenerate baselines with `make bench-baseline` after a vetted, intentional perf change and commit the diff. - CI `verify-static` job now runs `make bench-check` after installing a pinned benchstat. Local dev only needs benchstat when intentionally running the perf gate. ``` ## Test plan - [x] `make check` (fmt, tidy-check, lint, vet, mod-verify) — clean (re-run post-adversarial-fix) - [x] `make actionlint` — clean - [x] `make zizmor` — clean (`No findings to report. Good job!`) - [x] `make license-check` — clean - [x] `make doc-check` — clean - [x] `make bench-check` on clean main — exit 0, both packages PASS, repeated runs stay green (false-positive bug fixed) - [x] `make bench-check` with deliberate alloc-regression planted in `stackID` — exit 2, gate flags `+800% B/op` + `+133% allocs/op` - [x] `make bench-check` after revert — exit 0 again, stackid.go diff vs main is empty - [ ] CI `verify-static` runs `make bench-check` green on this PR (awaiting CI) ## Self-grading **A** — root cause named, fix scoped to missing registration without expanding scope (no new benchmark framework, no rewrite of the benchstat-comparison machinery beyond the sec/op-vs-alloc scoping fix). TDD cycle complete with a deliberate regression that the gate catches. Adversarial self-review found a real false-positive bug (sec/op gating) and fixed it in-PR before merge. Registry extracted to a single source of truth so the two consumers can't drift. CHANGELOG entry written. Not A+ because: even with sec/op excluded, the gate still depends on benchstat's text-output format remaining stable; a future major version that changes column layout could silently disable the gate (it would skip every row, exit 0, print PASS). A future iteration could parse benchstat's `-format=csv` output instead, which is its versioned machine-readable interface. Out of scope here; the current text format hasn't changed for years and matches the existing `scripts/bench-check.sh` shape that already shipped with the k8sevents baseline. --------- Signed-off-by: Tri Lam <tri@maydow.com> Co-authored-by: Tri Lam <tri@maydow.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Mechanical reshuffle of the pattern library and replay runner from
internal/synthesis/{patterns,replay}/intomodule/pkg/{patterns,replay}/so the moat surface lives inside the same Go submodule that hosts the OTel
components PR-I.2b will introduce (
rankjoinprocessor,patterndetectorprocessor).This is PR-I.2a per the RFC-0013 §migration L249 sub-slice; net-new
processor packages,
builder-config.yamlwiring, and themodule/v0.2.0tag bump land in PR-I.2b later.Why
internal/synthesis/patterns/was the only piece of the differentiator("moat") still living in the root module's
internal/tree after thePR-K wave deletions. Once PR-I.2b introduces the processor packages
they will import the pattern lib from
module/; a cross-module importchain that today is
module/receiver/* → root/internal/synthesis/*isthe wrong direction for the post-pivot architecture (processors target
the OCB-distribution baseline at collector v0.110.0 which
module/pins; root tracks v1.59.0).
What moved
24 files renamed via
git mv(rename similarity 93-100%), zero contentedits except:
github.com/tracecoreai/tracecore/internal/synthesis/{patterns,replay}→
…/module/pkg/{patterns,replay}.home).
module/pkg/replay/pod_evicted/_real_world/README.md: 3 pathreferences updated.
Module-graph plumbing
module/go.modgains direct requires ongithub.com/santhosh-tekuri/jsonschema/v6 v6.0.2andgithub.com/stretchr/testify v1.9.0(the moved test files' deps);module/go.sumpicks updlclark/regexp2 v1.11.0(jsonschema'stransitive).
go.modflipssanthosh-tekuri/jsonschema/v6to// indirect(root-module Go code no longer references it directly; the only
consumer was
verdict_schema_test.go, which moved).module/go.mod's collector pins stay atv0.110.0aftergo mod tidy. The OCB-distribution baseline pinfrom PR-I.1b is preserved —
scraperhelperis still part ofcollector/receiverat v0.110.0 and would break thehostmetricsreceiver@v0.110.0build if it floated forward tov1.59.0.
CI / workflow path updates
.github/workflows/bench.yml(M19 benchmark) →./module/pkg/patterns/..github/workflows/chaos.ymlpaths-filter trigger:internal/synthesis/**→module/pkg/patterns/**+module/pkg/replay/**(deliberately tighter thanmodule/pkg/**so unrelated module additions, e.g. nccl_fr_parser, do not fire the
chaos gate).
.github/workflows/chaos.ymlM19 race-test invocation:go test ./internal/synthesis/...→cd module && go test ./pkg/patterns/... ./pkg/replay/....The naive rewrite
./module/pkg/...does not resolve from root—
module/is its own Go module; the root./...glob does notcross into it under either GOWORK=on or GOWORK=off. Verified the
fixed invocation passes with
-race -count=1.Doc references updated
docs/rfcs/0013-distro-first-pivot.mdPR-I.2 paragraph amended todescribe the PR-I.2a / PR-I.2b sub-slice.
docs/followups/{M19,M4b,otlphttp}.md,docs/research/m16-kueue-production-followups.md: forward-lookingreferences repointed at
module/pkg/.paths are intentionally left intact.
Commits
refactor(module): relocate patterns + replay into module/pkg/ (PR-I.2a)— the 24-file
git mv+ import-path bump + go.mod/go.sum tidy +forward-looking doc + bench.yml + initial chaos.yml update.
ci(chaos): scope path triggers + test invocation to patterns + replay— adversarial-review follow-up tightening chaos.yml paths-filter
and fixing the broken root-module test invocation.
Out of scope (do not look for these)
rankjoinprocessororpatterndetectorprocessorpackages.builder-config.yamlprocessors:wiring.module/v0.2.0).processor packages).
All four land in PR-I.2b.
Test plan
GOWORK=off go build ./...clean in root.GOWORK=off go build ./...clean inmodule/.cd module && go test ./pkg/patterns/... ./pkg/replay/...green(the rubric L405/L406/L414 gates pass at the new location).
cd module && go test -race -count=1 ./pkg/patterns/... ./pkg/replay/...green (the corrected chaos.yml invocation).GOWORK=off go test ./...green in root (12 packages —internal/synthesiswas the only consumer left in rootpost-PR-K).
make checkgreen (fmt, tidy-check, lint, vet, mod-verify).make verifygreen (adds license, fixtures, build-tags, RCEgate, register-lint, actionlint, zizmor, doc-check,
no-autoupdate).
module/go.modcollector pins still atv0.110.0(MVS floorpreserved post-tidy).
grep); notools/orbench/coupling.Refs #221.