refactor(module): relocate patterns + replay into module/pkg/ (PR-I.2a) by trilamsr · Pull Request #242 · TraceCoreAI/tracecore

trilamsr · 2026-05-31T20:10:07Z

Summary

Mechanical reshuffle of the pattern library and replay runner from
internal/synthesis/{patterns,replay}/ into module/pkg/{patterns,replay}/
so the moat surface lives inside the same Go submodule that hosts the OTel
components PR-I.2b will introduce (rankjoinprocessor,
patterndetectorprocessor).

This is PR-I.2a per the RFC-0013 §migration L249 sub-slice; net-new
processor packages, builder-config.yaml wiring, and the
module/v0.2.0 tag bump land in PR-I.2b later.

Why

internal/synthesis/patterns/ was the only piece of the differentiator
("moat") still living in the root module's internal/ tree after the
PR-K wave deletions. Once PR-I.2b introduces the processor packages
they will import the pattern lib from module/; a cross-module import
chain that today is module/receiver/* → root/internal/synthesis/* is
the wrong direction for the post-pivot architecture (processors target
the OCB-distribution baseline at collector v0.110.0 which module/
pins; root tracks v1.59.0).

What moved

internal/synthesis/patterns/  →  module/pkg/patterns/
internal/synthesis/replay/    →  module/pkg/replay/

24 files renamed via git mv (rename similarity 93-100%), zero content
edits except:

5 Go files: import-path bump
github.com/tracecoreai/tracecore/internal/synthesis/{patterns,replay}
→ …/module/pkg/{patterns,replay}.
4 doc-comment path mentions inside the moved files (point at the new
home).
module/pkg/replay/pod_evicted/_real_world/README.md: 3 path
references updated.

Module-graph plumbing

module/go.mod gains direct requires on
github.com/santhosh-tekuri/jsonschema/v6 v6.0.2 and
github.com/stretchr/testify v1.9.0 (the moved test files' deps);
module/go.sum picks up dlclark/regexp2 v1.11.0 (jsonschema's
transitive).
Root go.mod flips santhosh-tekuri/jsonschema/v6 to // indirect
(root-module Go code no longer references it directly; the only
consumer was verdict_schema_test.go, which moved).
MVS-trap honored: module/go.mod's collector pins stay at
v0.110.0 after go mod tidy. The OCB-distribution baseline pin
from PR-I.1b is preserved — scraperhelper is still part of
collector/receiver at v0.110.0 and would break the
hostmetricsreceiver@v0.110.0 build if it floated forward to
v1.59.0.

CI / workflow path updates

.github/workflows/bench.yml (M19 benchmark) →
./module/pkg/patterns/.
.github/workflows/chaos.yml paths-filter trigger:
internal/synthesis/** → module/pkg/patterns/** +
module/pkg/replay/** (deliberately tighter than module/pkg/**
so unrelated module additions, e.g. nccl_fr_parser, do not fire the
chaos gate).
.github/workflows/chaos.yml M19 race-test invocation:
go test ./internal/synthesis/... →
cd module && go test ./pkg/patterns/... ./pkg/replay/....
The naive rewrite ./module/pkg/... does not resolve from root
— module/ is its own Go module; the root ./... glob does not
cross into it under either GOWORK=on or GOWORK=off. Verified the
fixed invocation passes with -race -count=1.

Doc references updated

docs/rfcs/0013-distro-first-pivot.md PR-I.2 paragraph amended to
describe the PR-I.2a / PR-I.2b sub-slice.
docs/followups/{M19,M4b,otlphttp}.md,
docs/research/m16-kueue-production-followups.md: forward-looking
references repointed at module/pkg/.
Historical RFC + CHANGELOG entries that stamp prior receiver-era
paths are intentionally left intact.

Commits

refactor(module): relocate patterns + replay into module/pkg/ (PR-I.2a)
— the 24-file git mv + import-path bump + go.mod/go.sum tidy +
forward-looking doc + bench.yml + initial chaos.yml update.
ci(chaos): scope path triggers + test invocation to patterns + replay
— adversarial-review follow-up tightening chaos.yml paths-filter
and fixing the broken root-module test invocation.

Out of scope (do not look for these)

No rankjoinprocessor or patterndetectorprocessor packages.
No builder-config.yaml processors: wiring.
No submodule tag bump (module/v0.2.0).
No M19 cross-signal join test relocation (that comes with the
processor packages).

All four land in PR-I.2b.

Test plan

GOWORK=off go build ./... clean in root.
GOWORK=off go build ./... clean in module/.
cd module && go test ./pkg/patterns/... ./pkg/replay/... green
(the rubric L405/L406/L414 gates pass at the new location).
cd module && go test -race -count=1 ./pkg/patterns/... ./pkg/replay/... green (the corrected chaos.yml invocation).
GOWORK=off go test ./... green in root (12 packages —
internal/synthesis was the only consumer left in root
post-PR-K).
make check green (fmt, tidy-check, lint, vet, mod-verify).
make verify green (adds license, fixtures, build-tags, RCE
gate, register-lint, actionlint, zizmor, doc-check,
no-autoupdate).
module/go.mod collector pins still at v0.110.0 (MVS floor
preserved post-tidy).
No root-module Go file imports the moved packages (verified via
grep); no tools/ or bench/ coupling.

PR-I.2a (relocate): pattern library + replay runner moved from
`internal/synthesis/{patterns,replay}/` to `module/pkg/{patterns,replay}/`
to colocate the moat surface with the Go submodule that will host the
OTel processors in PR-I.2b. Mechanical reshuffle only — no behavior
change, no API surface change, no public release artifact. Module pins
on `module/` stay at the OCB-distribution baseline (collector
v0.110.0). Partial close of #221; PR-I.2b (net-new processors +
builder-config wiring + `module/v0.2.0` tag) remains open.

Refs #221.

Mechanical reshuffle of the pattern library and replay runner from internal/synthesis/{patterns,replay}/ to module/pkg/{patterns,replay}/ so the moat surface lives inside the same Go submodule that hosts the OTel components PR-I.2b will introduce (rankjoinprocessor, pattern- detectorprocessor). No behavior change. Imports updated everywhere; doc comments + the chaos.yml + bench.yml workflow paths follow the new home. module/go.mod gains direct requires on santhosh-tekuri/jsonschema/v6 + stretchr/ testify (the moved test files' deps); root go.mod sheds the now- indirect jsonschema reference. Collector pins on module/ stay at v0.110.0 — the MVS-floor trap from PR-I.1b is honored. RFC-0013 §migration L249 amended to describe the I.2a/I.2b split. Forward-looking doc references under docs/followups/ + docs/research/ updated to point at module/pkg/. Historical RFC + CHANGELOG entries that stamp prior receiver-era paths are left intact. Partial close of #221: PR-I.2a (relocate) lands here; PR-I.2b (new processors + builder-config wiring + module/v0.2.0 tag) remains open. Signed-off-by: Tri Lam <tri@maydow.com>

Adversarial-review followup. Two corrections to the chaos.yml updates the relocation commit made: 1. Path triggers `module/pkg/**` was too broad — it would fire chaos on unrelated module/pkg additions (e.g. the nccl_fr_parser package that landed in PR-I.1b). Narrowed to `module/pkg/patterns/**` + `module/pkg/replay/**`, matching the pre-move scope shape. 2. `go test ./module/pkg/...` does not actually resolve from the root module (module/ is its own Go module; root `./...` does not cross into it under either GOWORK=on or GOWORK=off). The original `./internal/synthesis/...` worked because synthesis was in the root module. Fixed by `cd module && go test ./pkg/patterns/... ./pkg/replay/...`, mirroring the make-ci invocation shape used by the nccl-fr fuzz targets. Signed-off-by: Tri Lam <tri@maydow.com>

…/4) (#243) ## Summary PR-1 of 4 in the OTel collector v0.110 -> v0.130 staged catch-up (#225). - **Tooling**: new `make bump-otel VERSION=0.X.0` Makefile target. Single-source sed rewrite across `builder-config.yaml` (16 lines), `module/go.mod` (collector/pdata require lines), and the `builder@v0.X.0` pin in `Makefile`. Defaults `PDATA_VERSION` to `1.<minor-94>.0` per the upstream offset (v0.110<->v1.16, v0.115<->v1.21, v0.130<->v1.36); override for off-cycle bumps. `go mod tidy` is left manual so reviewers see MVS-resolved drift in the diff. - **Bump**: collector v0.110.0 -> v0.115.0. `consumer` graduated to v1.21.0 (v1.x train); otel libs followed upstream to v1.32.0; pdata -> v1.21.0. Dropped two indirect entries removed in v0.115 (`component/componentprofiles`, `internal/globalsignal`). - **Test fix**: `receivertest.NewNopSettings(...)` took no args from v0.110 through v0.119.x; the existing `module/receiver/ncclfrreceiver/nccl_fr_test.go:288` call passed `componentType()` and was broken on `main` against the v0.110 pin. Drop the bogus arg -- `set.ID = component.NewIDWithName(componentType(), "test")` already carries the type for downstream BuildInfo derivation. (At v0.120 the bump-back signature lands as `NewNopSettingsWithType`, addressed in PR-2.) - **Docs**: RFC-0013 example block bumped in lockstep. Historical PR-I.1b narrative line left as-is (records what landed then). ## Out of scope (deferred per #225 plan) - PR-2: v0.115 -> v0.120 + Go 1.23 floor bump (module/go.mod `go 1.22.0` -> `go 1.23.0`). - PR-3: v0.120 -> v0.125 + `*profiles` -> `x*` migration (consumer/consumerprofiles -> consumer/xconsumer, etc). - PR-4: v0.125 -> v0.130 + `TLSSetting` -> `TLS` rename across config blocks. - Root `go.mod` untouched -- already at v1.59.0 / v0.153.0 from a separate code path. ## Renovate decision (per #225 ask) The issue suggested a Renovate `regexManagers` block for `builder-config.yaml` since Dependabot can't parse the freeform `gomod:` lines. Decision: **skip Renovate, document the choice**. Rationale: 1. `make bump-otel` already automates the multi-file rewrite end-to-end; Renovate's value-add would be opening a PR, but the PR body would still need a human to run `go mod tidy` (Renovate can't, in-PR, drive a shell-out for a non-go-module file). 2. Adding a second bot expands the toolchain footprint (renovate.json + Mend permissions) for one file's worth of regex coverage. 3. Dependabot keeps gomod ecosystem coverage of root/module go.mod, which is the lion's share. Revisit if (a) Dependabot adds custom regex managers, or (b) the bump cadence exceeds quarterly and the manual ergonomics start to bite. ## Verification - `GOWORK=off go build ./...` at root + module/: green. - `GOWORK=off go test ./...` in module/: green (was red pre-bump on `main`). Post-merge of #242 (which relocated `internal/synthesis/{patterns,replay}` -> `module/pkg/{patterns,replay}`) the package set under verification grew to four: `module`, `module/pkg/nccl/fr_parser`, `module/pkg/patterns`, `module/pkg/replay`, `module/receiver/ncclfrreceiver`. All green at v0.115.0. - `GOWORK=off go test -race ./...` in module/: green. - `make ci-fuzz-nccl-fr` (30s gate): PASS, 2 new corpus interesting. - `make build` (OCB end-to-end): produces `_build/tracecore` against `builder@v0.115.0`; `--version` prints expected. - `docker run --rm -v $(pwd)/install/kubernetes/tracecore:/chart alpine/helm:3.16.4 lint /chart`: 1 chart linted, 0 failed. - `make check` (fmt + tidy-check + lint + vet + mod-verify): clean. ```release-notes chore(otel): bump pinned collector v0.110 -> v0.115 (PR-1 of 4 in the v0.110 -> v0.130 catch-up). Adds `make bump-otel VERSION=0.X.0` to single-source the multi-file pin rewrite. No runtime/operator-visible behaviour change; OCB-built tracecore binary now resolves against collector v0.115.0 components. ``` Refs #225 (leave open; PR-2/3/4 to follow). --------- Signed-off-by: Tri Lam <tri@maydow.com> Co-authored-by: Tri Lam <tri@maydow.com>

Closes #227. ## Summary `make bench-check` has been a no-op since PR-K.2 (#217) deleted `components/receivers/k8sevents/` — its `testdata/bench-baseline.txt` was the sole row in the gate's `for pkg in …` loop, leaving the loop body literally empty (`for pkg in ; do …`). The target stayed as a stub so `make ci` automation kept a stable invocation, with a TODO to re-register packages later. That re-registration is this PR. ## Root cause `Makefile:71-101` (pre-diff) iterated an empty list. The wrapper around `benchstat`+`scripts/bench-check.sh` was intact; only the *registry of which packages to gate* had been emptied. So the fix is purely a registration + baseline-capture exercise — no logic change to the gate's comparison machinery, plus one scope-tightening change called out under "Adversarial-review fix" below. ## What this PR does 1. **Registers two packages** with deterministic-allocation micro-benchmarks for gating: - `module/pkg/patterns` — `BenchmarkPodEvictedDetector_1kEventWindow` (the M19 detector budget). (Path is post-PR-I.2a / #242; this branch was rebased onto that merge during the PR.) - `components/receivers/pyspy` — `BenchmarkParseDump` + `BenchmarkStackID` (the M18 join-key hash + faulthandler parser). 2. **Captures committed baselines** at `module/pkg/patterns/testdata/bench-baseline.txt` and `components/receivers/pyspy/testdata/bench-baseline.txt`, both at `count=10 -benchtime=500ms -benchmem`. `count=10` is the minimum that survives benchstat's `≥6 samples for confidence interval at level 0.95` warning with headroom for one outlier. 3. **Extracts the registry to `scripts/bench-registry.sh`** sourced by both `scripts/bench-check-all.sh` (gate runner, wired to `make bench-check`) and `scripts/bench-baseline.sh` (regenerator, wired to `make bench-baseline`). Single source of truth — drift between "what we gate" and "what we regenerate" can no longer happen. Why scripts and not an inline Make loop: the bench regex `^Benchmark(ParseDump|StackID)$` contains parentheses that the Make-invoked `/bin/sh` tokenises as subshell metacharacters. Quoting under `make` shellquote escaping is more fragile than just punting to plain shell. 4. **Restricts the gate to `B/op` + `allocs/op` tables** in `scripts/bench-check.sh`; `sec/op` is no longer gated. See "Adversarial-review fix" below. 5. **Wires the gate into CI** as a new step in `.github/workflows/ci.yml`'s `verify-static` job, after `make build`. A pinned `go install golang.org/x/perf/cmd/benchstat@…` step installs the tool on the runner. ## Adversarial-review fix (mid-PR) The first commit gated sec/op too, with the design note claiming pure wall-clock jitter would show as benchstat's `~` (non-significant) and never reach the `+NN%` parser. **That's false.** Repeating `make bench-check` immediately after a baseline capture, with identical code on identical hardware, fired: ``` PodEvictedDetector_1kEventWindow-10 sec/op +16.50% (p=0.001 n=10) PodEvictedDetector_1kEventWindow-10 B/op +0.01% (p=0.005 n=10) PodEvictedDetector_1kEventWindow-10 allocs/op ~ (p=1.000 n=10) ``` The 16.5% delta was background-load drift between two runs minutes apart. benchstat marks it significant because the CV is small (~5-10%) and the means are reliably different — both true symptoms of *real* but *uninteresting* wall-clock variance. Gating on sec/op would produce continuous false fires across the team without catching anything code review can act on. **Fix**: `scripts/bench-check.sh` now tracks the active benchstat table by indented header line (`sec/op` / `B/op` / `allocs/op`) and only counts `+NN%` deltas under `B/op` + `allocs/op`. Both stay pinned to 0% CV across runs (deterministic Go allocator + identical bytes-allocated per code path) and only move when the code does. `make bench-check` on clean main now passes consistently across runs; the TDD planted regression (heap-allocating sink in `stackID`) still trips the gate at `+800% B/op` + `+133% allocs/op`. ## Design notes - **Why these two benches and not bench/overhead/nccl_fr_bench_test.go**: that benchmark replays 1 GiB of fixtures per iter (~90s on M1 Max) and *already* self-asserts on `HeapAlloc` delta against the NORTHSTARS O2 ceiling. Adding it to `bench-check` would push CI past the budget and produce nothing the in-bench assertion doesn't already gate. Stays advisory. - **Hardware skew is irrelevant for B/op + allocs/op**: baselines were captured on Apple M1 Max; CI runs on `ubuntu-latest` (x86_64). The Go allocator is deterministic across GOARCH for a given code path — `b.ReportAllocs()` returns identical numbers on both. The historical k8sevents baseline lived under the same cross-arch arrangement. - **`make ci` unchanged**: `bench-check` is NOT added to the `make ci` recipe. That would force every contributor to install benchstat. CI installs benchstat itself in the workflow; local dev runs `make bench-check` only when intentionally vetting a perf change. - **`module/` submodule resolution**: `module/pkg/patterns` lives in the in-repo Go submodule (`module/go.mod`); local dev runs resolve through `go.work` and OCB through `builder-config.yaml`'s `replaces: ./module`. `go test ./module/pkg/patterns/…` works from the repo root because of the workspace. ## TDD record - **Define gate**: scripts/bench-check-all.sh + bench-registry.sh register two packages; threshold 10%, env-overridable; only B/op + allocs/op rows count. - **Red (no plant)**: `make bench-check` on clean main passes — exit 0, both packages report `PASS: no benchmarks regressed by more than 10% vs baseline.` (Re-run multiple times across different machine load — stays green.) - **Red→Green (planted regression)**: edited `stackID` to allocate 64 B/iter via a package-level sink (escape-analysis-routed to the heap, so allocs/op actually moved); `make bench-check` exit 2 (make wraps the inner exit 1). Output flagged `StackID-10 +800.00% (p=0.000 n=10)` on B/op and `+133.33% (p=0.000 n=10)` on allocs/op. - **Green (revert)**: plant removed; `make bench-check` exit 0; `git diff main -- components/receivers/pyspy/stackid.go` empty. ## Release notes ```release-notes ### Added - `make bench-check` is now a real perf-regression gate again (was a no-op since PR-K.2). Registry-driven via `scripts/bench-registry.sh` (single source of truth, sourced by both `bench-check-all.sh` runner and `bench-baseline.sh` regenerator); the runner gates `B/op` + `allocs/op` deltas against a committed `testdata/bench-baseline.txt` via benchstat; any row regressing past `THRESHOLD%` (default 10, env-overridable) fails. `sec/op` is deliberately not gated (wall-clock CV on dev hardware and shared CI runners routinely crosses 10% from background load alone, even when benchstat marks the delta significant). Two packages registered: `module/pkg/patterns` (M19 PodEvictedDetector budget) and `components/receivers/pyspy` (M18 ParseDump + StackID hash). Regenerate baselines with `make bench-baseline` after a vetted, intentional perf change and commit the diff. - CI `verify-static` job now runs `make bench-check` after installing a pinned benchstat. Local dev only needs benchstat when intentionally running the perf gate. ``` ## Test plan - [x] `make check` (fmt, tidy-check, lint, vet, mod-verify) — clean (re-run post-adversarial-fix) - [x] `make actionlint` — clean - [x] `make zizmor` — clean (`No findings to report. Good job!`) - [x] `make license-check` — clean - [x] `make doc-check` — clean - [x] `make bench-check` on clean main — exit 0, both packages PASS, repeated runs stay green (false-positive bug fixed) - [x] `make bench-check` with deliberate alloc-regression planted in `stackID` — exit 2, gate flags `+800% B/op` + `+133% allocs/op` - [x] `make bench-check` after revert — exit 0 again, stackid.go diff vs main is empty - [ ] CI `verify-static` runs `make bench-check` green on this PR (awaiting CI) ## Self-grading **A** — root cause named, fix scoped to missing registration without expanding scope (no new benchmark framework, no rewrite of the benchstat-comparison machinery beyond the sec/op-vs-alloc scoping fix). TDD cycle complete with a deliberate regression that the gate catches. Adversarial self-review found a real false-positive bug (sec/op gating) and fixed it in-PR before merge. Registry extracted to a single source of truth so the two consumers can't drift. CHANGELOG entry written. Not A+ because: even with sec/op excluded, the gate still depends on benchstat's text-output format remaining stable; a future major version that changes column layout could silently disable the gate (it would skip every row, exit 0, print PASS). A future iteration could parse benchstat's `-format=csv` output instead, which is its versioned machine-readable interface. Out of scope here; the current text format hasn't changed for years and matches the existing `scripts/bench-check.sh` shape that already shipped with the k8sevents baseline. --------- Signed-off-by: Tri Lam <tri@maydow.com> Co-authored-by: Tri Lam <tri@maydow.com>

Tri Lam added 2 commits May 31, 2026 13:07

trilamsr enabled auto-merge (squash) May 31, 2026 20:13

trilamsr merged commit eb9f791 into main May 31, 2026
19 checks passed

trilamsr deleted the refactor/i221a-patterns-to-module branch May 31, 2026 20:21

trilamsr mentioned this pull request May 31, 2026

chore(otel): bump collector v0.110 -> v0.115 + add bump tooling (PR-1/4) #243

Merged

trilamsr mentioned this pull request May 31, 2026

ci(bench): re-baseline make bench-check + add regression gate #244

Merged

9 tasks

trilamsr mentioned this pull request May 31, 2026

PR-I.2: rankjoinprocessor + patterndetectorprocessor in module/ #221

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

refactor(module): relocate patterns + replay into module/pkg/ (PR-I.2a)#242

refactor(module): relocate patterns + replay into module/pkg/ (PR-I.2a)#242
trilamsr merged 2 commits into
mainfrom
refactor/i221a-patterns-to-module

trilamsr commented May 31, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

trilamsr commented May 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why

What moved

Module-graph plumbing

CI / workflow path updates

Doc references updated

Commits

Out of scope (do not look for these)

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

trilamsr commented May 31, 2026 •

edited

Loading