
refactor(module): relocate patterns + replay into module/pkg/ (PR-I.2a)#242

Merged
trilamsr merged 2 commits into main from refactor/i221a-patterns-to-module
May 31, 2026

trilamsr commented May 31, 2026


Summary

Mechanical reshuffle of the pattern library and replay runner from
internal/synthesis/{patterns,replay}/ into module/pkg/{patterns,replay}/
so the moat surface lives inside the same Go submodule that hosts the OTel
components PR-I.2b will introduce (rankjoinprocessor,
patterndetectorprocessor).

This is PR-I.2a per the RFC-0013 §migration L249 sub-slice; net-new
processor packages, builder-config.yaml wiring, and the
module/v0.2.0 tag bump land in PR-I.2b later.

Why

internal/synthesis/patterns/ was the only piece of the differentiator
("moat") still living in the root module's internal/ tree after the
PR-K wave deletions. Once PR-I.2b introduces the processor packages
they will import the pattern lib from module/; a cross-module import
chain that today is module/receiver/* → root/internal/synthesis/* is
the wrong direction for the post-pivot architecture (processors target
the OCB-distribution baseline at collector v0.110.0 which module/
pins; root tracks v1.59.0).

What moved

internal/synthesis/patterns/  →  module/pkg/patterns/
internal/synthesis/replay/    →  module/pkg/replay/

24 files renamed via git mv (rename similarity 93-100%), zero content
edits except:

  • 5 Go files: import-path bump
    github.com/tracecoreai/tracecore/internal/synthesis/{patterns,replay}
    → …/module/pkg/{patterns,replay}.
  • 4 doc-comment path mentions inside the moved files (point at the new
    home).
  • module/pkg/replay/pod_evicted/_real_world/README.md: 3 path
    references updated.

Module-graph plumbing

  • module/go.mod gains direct requires on
    github.com/santhosh-tekuri/jsonschema/v6 v6.0.2 and
    github.com/stretchr/testify v1.9.0 (the moved test files' deps);
    module/go.sum picks up dlclark/regexp2 v1.11.0 (jsonschema's
    transitive).
  • Root go.mod flips santhosh-tekuri/jsonschema/v6 to // indirect
    (root-module Go code no longer references it directly; the only
    consumer was verdict_schema_test.go, which moved).
  • MVS-trap honored: module/go.mod's collector pins stay at
    v0.110.0 after go mod tidy. The OCB-distribution baseline pin
    from PR-I.1b is preserved — scraperhelper is still part of
    collector/receiver at v0.110.0 and would break the
    hostmetricsreceiver@v0.110.0 build if it floated forward to
    v1.59.0.

CI / workflow path updates

  • .github/workflows/bench.yml (M19 benchmark) →
    ./module/pkg/patterns/.
  • .github/workflows/chaos.yml paths-filter trigger:
    internal/synthesis/** → module/pkg/patterns/** +
    module/pkg/replay/** (deliberately tighter than module/pkg/**
    so unrelated module additions, e.g. nccl_fr_parser, do not fire the
    chaos gate).
  • .github/workflows/chaos.yml M19 race-test invocation:
    go test ./internal/synthesis/... →
    cd module && go test ./pkg/patterns/... ./pkg/replay/....
    The naive rewrite ./module/pkg/... does not resolve from root:
    module/ is its own Go module, and the root ./... glob does not
    cross into it under either GOWORK=on or GOWORK=off. Verified the
    fixed invocation passes with -race -count=1.
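The two chaos.yml corrections can be modelled in a few lines of Go. This is a simplified sketch, not the workflow's actual logic: `firesChaos`, `owningModule` and `coveredByRootGlob` are hypothetical names, the `**` globs are approximated as directory prefixes, and the ownership rule models module mode with GOWORK=off, where a `./...` pattern stops at any subdirectory that holds its own go.mod. The changed file names are made up.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// chaosTriggers approximates the narrowed paths-filter globs
// module/pkg/patterns/** and module/pkg/replay/** as directory prefixes.
var chaosTriggers = []string{"module/pkg/patterns/", "module/pkg/replay/"}

// firesChaos reports whether any changed file falls under a trigger prefix.
func firesChaos(changed []string) bool {
	for _, f := range changed {
		for _, p := range chaosTriggers {
			if strings.HasPrefix(f, p) {
				return true
			}
		}
	}
	return false
}

// owningModule walks up from a repo-relative package directory to the
// nearest directory that holds a go.mod (the keys of roots).
func owningModule(dir string, roots map[string]bool) string {
	for d := path.Clean(dir); ; d = path.Dir(d) {
		if roots[d] {
			return d
		}
		if d == "." || d == "/" {
			return "" // not inside any known module
		}
	}
}

// coveredByRootGlob models GOWORK=off: the root module's ./... only matches
// packages whose nearest go.mod is the root one, so it stops at module/.
func coveredByRootGlob(dir string, roots map[string]bool) bool {
	return owningModule(dir, roots) == "."
}

func main() {
	roots := map[string]bool{".": true, "module": true}
	fmt.Println(firesChaos([]string{"module/pkg/nccl/fr_parser/parser.go"}))
	fmt.Println(firesChaos([]string{"module/pkg/replay/runner.go"}))
	fmt.Println(owningModule("module/pkg/patterns", roots))
	fmt.Println(coveredByRootGlob("module/pkg/patterns", roots))
}
```

The prefix model is why the narrowed trigger ignores `module/pkg/nccl/fr_parser`, and the ownership walk is why the test invocation has to `cd module` first.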

Doc references updated

  • docs/rfcs/0013-distro-first-pivot.md PR-I.2 paragraph amended to
    describe the PR-I.2a / PR-I.2b sub-slice.
  • docs/followups/{M19,M4b,otlphttp}.md,
    docs/research/m16-kueue-production-followups.md: forward-looking
    references repointed at module/pkg/.
  • Historical RFC + CHANGELOG entries that stamp prior receiver-era
    paths are intentionally left intact.

Commits

  1. refactor(module): relocate patterns + replay into module/pkg/ (PR-I.2a)
    — the 24-file git mv + import-path bump + go.mod/go.sum tidy +
    forward-looking doc + bench.yml + initial chaos.yml update.
  2. ci(chaos): scope path triggers + test invocation to patterns + replay
    — adversarial-review follow-up tightening chaos.yml paths-filter
    and fixing the broken root-module test invocation.

Out of scope (do not look for these)

  • No rankjoinprocessor or patterndetectorprocessor packages.
  • No builder-config.yaml processors: wiring.
  • No submodule tag bump (module/v0.2.0).
  • No M19 cross-signal join test relocation (that comes with the
    processor packages).

All four land in PR-I.2b.

Test plan

  • GOWORK=off go build ./... clean in root.
  • GOWORK=off go build ./... clean in module/.
  • cd module && go test ./pkg/patterns/... ./pkg/replay/... green
    (the rubric L405/L406/L414 gates pass at the new location).
  • cd module && go test -race -count=1 ./pkg/patterns/... ./pkg/replay/... green (the corrected chaos.yml invocation).
  • GOWORK=off go test ./... green in root (12 packages —
    internal/synthesis was the only consumer left in root
    post-PR-K).
  • make check green (fmt, tidy-check, lint, vet, mod-verify).
  • make verify green (adds license, fixtures, build-tags, RCE
    gate, register-lint, actionlint, zizmor, doc-check,
    no-autoupdate).
  • module/go.mod collector pins still at v0.110.0 (MVS floor
    preserved post-tidy).
  • No root-module Go file imports the moved packages (verified via
    grep); no tools/ or bench/ coupling.

PR-I.2a (relocate): pattern library + replay runner moved from
`internal/synthesis/{patterns,replay}/` to `module/pkg/{patterns,replay}/`
to colocate the moat surface with the Go submodule that will host the
OTel processors in PR-I.2b. Mechanical reshuffle only — no behavior
change, no API surface change, no public release artifact. Module pins
on `module/` stay at the OCB-distribution baseline (collector
v0.110.0). Partial close of #221; PR-I.2b (net-new processors +
builder-config wiring + `module/v0.2.0` tag) remains open.

Refs #221.

Tri Lam added 2 commits May 31, 2026 13:07
Mechanical reshuffle of the pattern library and replay runner from
internal/synthesis/{patterns,replay}/ to module/pkg/{patterns,replay}/
so the moat surface lives inside the same Go submodule that hosts the
OTel components PR-I.2b will introduce (rankjoinprocessor,
patterndetectorprocessor).

No behavior change. Imports updated everywhere; doc comments + the
chaos.yml + bench.yml workflow paths follow the new home. module/go.mod
gains direct requires on santhosh-tekuri/jsonschema/v6 + stretchr/
testify (the moved test files' deps); root go.mod sheds the now-
indirect jsonschema reference. Collector pins on module/ stay at
v0.110.0 — the MVS-floor trap from PR-I.1b is honored.

RFC-0013 §migration L249 amended to describe the I.2a/I.2b split.
Forward-looking doc references under docs/followups/ + docs/research/
updated to point at module/pkg/. Historical RFC + CHANGELOG entries
that stamp prior receiver-era paths are left intact.

Partial close of #221: PR-I.2a (relocate) lands here; PR-I.2b (new
processors + builder-config wiring + module/v0.2.0 tag) remains open.

Signed-off-by: Tri Lam <tri@maydow.com>
Adversarial-review follow-up. Two corrections to the chaos.yml updates
the relocation commit made:

1. The path trigger `module/pkg/**` was too broad: it would fire chaos
   on unrelated module/pkg additions (e.g. the nccl_fr_parser package
   that landed in PR-I.1b). Narrowed to
   `module/pkg/patterns/**` + `module/pkg/replay/**`, matching the
   pre-move scope shape.

2. `go test ./module/pkg/...` does not actually resolve from the root
   module (module/ is its own Go module; root `./...` does not cross
   into it under either GOWORK=on or GOWORK=off). The original
   `./internal/synthesis/...` worked because synthesis was in the
   root module. Fixed by `cd module && go test ./pkg/patterns/...
   ./pkg/replay/...`, mirroring the make-ci invocation shape used by
   the nccl-fr fuzz targets.

Signed-off-by: Tri Lam <tri@maydow.com>
trilamsr enabled auto-merge (squash) May 31, 2026 20:13
trilamsr merged commit eb9f791 into main May 31, 2026
19 checks passed
trilamsr deleted the refactor/i221a-patterns-to-module branch May 31, 2026 20:21
trilamsr added a commit that referenced this pull request May 31, 2026
…/4) (#243)

## Summary

PR-1 of 4 in the OTel collector v0.110 -> v0.130 staged catch-up (#225).

- **Tooling**: new `make bump-otel VERSION=0.X.0` Makefile target.
Single-source sed rewrite across `builder-config.yaml` (16 lines),
`module/go.mod` (collector/pdata require lines), and the
`builder@v0.X.0` pin in `Makefile`. Defaults `PDATA_VERSION` to
`1.<minor-94>.0` per the upstream offset (v0.110<->v1.16,
v0.115<->v1.21, v0.130<->v1.36); override for off-cycle bumps. `go mod
tidy` is left manual so reviewers see MVS-resolved drift in the diff.
- **Bump**: collector v0.110.0 -> v0.115.0. `consumer` graduated to
v1.21.0 (v1.x train); otel libs followed upstream to v1.32.0; pdata ->
v1.21.0. Dropped two indirect entries removed in v0.115
(`component/componentprofiles`, `internal/globalsignal`).
- **Test fix**: `receivertest.NewNopSettings(...)` took no args from
v0.110 through v0.119.x; the existing
`module/receiver/ncclfrreceiver/nccl_fr_test.go:288` call passed
`componentType()` and was broken on `main` against the v0.110 pin. Drop
the bogus arg -- `set.ID = component.NewIDWithName(componentType(),
"test")` already carries the type for downstream BuildInfo derivation.
(At v0.120 the type-taking variant lands as `NewNopSettingsWithType`,
addressed in PR-2.)
- **Docs**: RFC-0013 example block bumped in lockstep. Historical
PR-I.1b narrative line left as-is (records what landed then).
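The `PDATA_VERSION` default is plain arithmetic on the collector minor version. The sketch below restates only that rule in Go for illustration; the actual `make bump-otel` target is a Makefile/sed rewrite, and `pdataVersion` is a hypothetical helper. Off-cycle bumps still need the explicit override.

```go
package main

import (
	"fmt"
	"strings"
)

// pdataVersion maps a collector release v0.<minor>.<patch> to the pdata
// release 1.<minor-94>.0, the upstream offset quoted above.
func pdataVersion(collector string) (string, error) {
	var minor, patch int
	v := strings.TrimPrefix(collector, "v")
	if _, err := fmt.Sscanf(v, "0.%d.%d", &minor, &patch); err != nil {
		return "", fmt.Errorf("unexpected collector version %q: %w", collector, err)
	}
	if minor < 94 {
		// The offset rule would yield a negative pdata minor here.
		return "", fmt.Errorf("offset rule does not apply to %q", collector)
	}
	return fmt.Sprintf("1.%d.0", minor-94), nil
}

func main() {
	for _, c := range []string{"0.110.0", "0.115.0", "0.130.0"} {
		p, err := pdataVersion(c)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("collector v%s -> pdata v%s\n", c, p)
	}
}
```

The three pairs quoted above (v0.110 ↔ v1.16, v0.115 ↔ v1.21, v0.130 ↔ v1.36) all fall out of the same subtraction.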

## Out of scope (deferred per #225 plan)

- PR-2: v0.115 -> v0.120 + Go 1.23 floor bump (module/go.mod `go 1.22.0`
-> `go 1.23.0`).
- PR-3: v0.120 -> v0.125 + `*profiles` -> `x*` migration
(consumer/consumerprofiles -> consumer/xconsumer, etc).
- PR-4: v0.125 -> v0.130 + `TLSSetting` -> `TLS` rename across config
blocks.
- Root `go.mod` untouched -- already at v1.59.0 / v0.153.0 from a
separate code path.

## Renovate decision (per #225 ask)

The issue suggested a Renovate `regexManagers` block for
`builder-config.yaml` since Dependabot can't parse the freeform `gomod:`
lines. Decision: **skip Renovate, document the choice**. Rationale:

1. `make bump-otel` already automates the multi-file rewrite end-to-end;
Renovate's value-add would be opening a PR, but the PR body would still
need a human to run `go mod tidy` (Renovate can't, in-PR, drive a
shell-out for a non-go-module file).
2. Adding a second bot expands the toolchain footprint (renovate.json +
Mend permissions) for one file's worth of regex coverage.
3. Dependabot keeps gomod ecosystem coverage of root/module go.mod,
which is the lion's share.

Revisit if (a) Dependabot adds custom regex managers, or (b) the bump
cadence exceeds quarterly and the manual ergonomics start to bite.

## Verification

- `GOWORK=off go build ./...` at root + module/: green.
- `GOWORK=off go test ./...` in module/: green (was red pre-bump on
`main`). Post-merge of #242 (which relocated
`internal/synthesis/{patterns,replay}` ->
`module/pkg/{patterns,replay}`) the package set under verification grew
to five: `module`, `module/pkg/nccl/fr_parser`, `module/pkg/patterns`,
`module/pkg/replay`, `module/receiver/ncclfrreceiver`. All green at
v0.115.0.
- `GOWORK=off go test -race ./...` in module/: green.
- `make ci-fuzz-nccl-fr` (30s gate): PASS, 2 new interesting corpus entries.
- `make build` (OCB end-to-end): produces `_build/tracecore` against
`builder@v0.115.0`; `--version` prints expected.
- `docker run --rm -v $(pwd)/install/kubernetes/tracecore:/chart
alpine/helm:3.16.4 lint /chart`: 1 chart linted, 0 failed.
- `make check` (fmt + tidy-check + lint + vet + mod-verify): clean.

```release-notes
chore(otel): bump pinned collector v0.110 -> v0.115 (PR-1 of 4 in the
v0.110 -> v0.130 catch-up). Adds `make bump-otel VERSION=0.X.0` to
single-source the multi-file pin rewrite. No runtime/operator-visible
behaviour change; OCB-built tracecore binary now resolves against
collector v0.115.0 components.
```

Refs #225 (leave open; PR-2/3/4 to follow).

---------

Signed-off-by: Tri Lam <tri@maydow.com>
Co-authored-by: Tri Lam <tri@maydow.com>
trilamsr added a commit that referenced this pull request May 31, 2026
Closes #227.

## Summary

`make bench-check` has been a no-op since PR-K.2 (#217) deleted
`components/receivers/k8sevents/` — its `testdata/bench-baseline.txt`
was the sole row in the gate's `for pkg in …` loop, leaving the loop
body literally empty (`for pkg in ; do …`). The target stayed as a stub
so `make ci` automation kept a stable invocation, with a TODO to
re-register packages later. That re-registration is this PR.

## Root cause

`Makefile:71-101` (pre-diff) iterated an empty list. The wrapper around
`benchstat`+`scripts/bench-check.sh` was intact; only the *registry of
which packages to gate* had been emptied. So the fix is purely a
registration + baseline-capture exercise — no logic change to the gate's
comparison machinery, plus one scope-tightening change called out under
"Adversarial-review fix" below.

## What this PR does

1. **Registers two packages** with deterministic-allocation
micro-benchmarks for gating:
- `module/pkg/patterns` — `BenchmarkPodEvictedDetector_1kEventWindow`
(the M19 detector budget). (Path is post-PR-I.2a / #242; this branch was
rebased onto that merge during the PR.)
- `components/receivers/pyspy` — `BenchmarkParseDump` +
`BenchmarkStackID` (the M18 join-key hash + faulthandler parser).

2. **Captures committed baselines** at
`module/pkg/patterns/testdata/bench-baseline.txt` and
`components/receivers/pyspy/testdata/bench-baseline.txt`, both at
`-count=10 -benchtime=500ms -benchmem`. `-count=10` clears
benchstat's `≥6 samples for confidence interval at level 0.95`
warning with headroom for outliers.

3. **Extracts the registry to `scripts/bench-registry.sh`** sourced by
both `scripts/bench-check-all.sh` (gate runner, wired to `make
bench-check`) and `scripts/bench-baseline.sh` (regenerator, wired to
`make bench-baseline`). Single source of truth — drift between "what we
gate" and "what we regenerate" can no longer happen.

Why scripts and not an inline Make loop: the bench regex
`^Benchmark(ParseDump|StackID)$` contains parentheses that the
Make-invoked `/bin/sh` tokenises as subshell metacharacters. Quoting
under `make` shellquote escaping is more fragile than just punting to
plain shell.

4. **Restricts the gate to `B/op` + `allocs/op` tables** in
`scripts/bench-check.sh`; `sec/op` is no longer gated. See
"Adversarial-review fix" below.

5. **Wires the gate into CI** as a new step in
`.github/workflows/ci.yml`'s `verify-static` job, after `make build`. A
pinned `go install golang.org/x/perf/cmd/benchstat@…` step installs the
tool on the runner.

## Adversarial-review fix (mid-PR)

The first commit gated sec/op too, with the design note claiming pure
wall-clock jitter would show as benchstat's `~` (non-significant) and
never reach the `+NN%` parser. **That's false.** Repeating `make
bench-check` immediately after a baseline capture, with identical code
on identical hardware, fired:

```
PodEvictedDetector_1kEventWindow-10  sec/op  +16.50% (p=0.001 n=10)
PodEvictedDetector_1kEventWindow-10  B/op    +0.01% (p=0.005 n=10)
PodEvictedDetector_1kEventWindow-10  allocs/op  ~ (p=1.000 n=10)
```

The 16.5% delta was background-load drift between two runs minutes
apart. benchstat marks it significant because the CV is small (~5-10%)
and the means are reliably different — both true symptoms of *real* but
*uninteresting* wall-clock variance. Gating on sec/op would produce
continuous false fires across the team without catching anything code
review can act on.

**Fix**: `scripts/bench-check.sh` now tracks the active benchstat table
by indented header line (`sec/op` / `B/op` / `allocs/op`) and only
counts `+NN%` deltas under `B/op` + `allocs/op`. Both stay pinned to 0%
CV across runs (deterministic Go allocator + identical bytes-allocated
per code path) and only move when the code does. `make bench-check` on
clean main now passes consistently across runs; the TDD-planted
regression (heap-allocating sink in `stackID`) still trips the gate at
`+800% B/op` + `+133% allocs/op`.

## Design notes

- **Why these two benches and not
bench/overhead/nccl_fr_bench_test.go**: that benchmark replays 1 GiB of
fixtures per iter (~90s on M1 Max) and *already* self-asserts on
`HeapAlloc` delta against the NORTHSTARS O2 ceiling. Adding it to
`bench-check` would push CI past the budget and produce nothing the
in-bench assertion doesn't already gate. Stays advisory.
- **Hardware skew is irrelevant for B/op + allocs/op**: baselines were
captured on Apple M1 Max; CI runs on `ubuntu-latest` (x86_64). The Go
allocator is deterministic across GOARCH for a given code path: the
B/op and allocs/op figures reported under `b.ReportAllocs()` are identical on both. The historical
k8sevents baseline lived under the same cross-arch arrangement.
- **`make ci` unchanged**: `bench-check` is NOT added to the `make ci`
recipe. That would force every contributor to install benchstat. CI
installs benchstat itself in the workflow; local dev runs `make
bench-check` only when intentionally vetting a perf change.
- **`module/` submodule resolution**: `module/pkg/patterns` lives in the
in-repo Go submodule (`module/go.mod`); local dev runs resolve through
`go.work` and OCB through `builder-config.yaml`'s `replaces: ./module`.
`go test ./module/pkg/patterns/…` works from the repo root because of
the workspace.

## TDD record

- **Define gate**: scripts/bench-check-all.sh + bench-registry.sh
register two packages; threshold 10%, env-overridable; only B/op +
allocs/op rows count.
- **Red (no plant)**: `make bench-check` on clean main passes — exit 0,
both packages report `PASS: no benchmarks regressed by more than 10% vs
baseline.` (Re-run multiple times across different machine load — stays
green.)
- **Red→Green (planted regression)**: edited `stackID` to allocate 64
B/iter via a package-level sink (escape-analysis-routed to the heap, so
allocs/op actually moved); `make bench-check` exit 2 (make wraps the
inner exit 1). Output flagged `StackID-10 +800.00% (p=0.000 n=10)` on
B/op and `+133.33% (p=0.000 n=10)` on allocs/op.
- **Green (revert)**: plant removed; `make bench-check` exit 0; `git
diff main -- components/receivers/pyspy/stackid.go` empty.
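To see why a package-level sink moves allocs/op and not only B/op, here is a self-contained sketch using `testing.AllocsPerRun`. `stackID` is a hypothetical FNV-1a stand-in for the pyspy hash, and `sink`/`stackIDPlanted` only mimic the described plant; the real function and benchmark are not shown in this PR.

```go
package main

import (
	"fmt"
	"testing"
)

// stackID is a hypothetical stand-in for the pyspy join-key hash: FNV-1a
// over the frame strings with a separator byte between frames.
// It performs no heap allocation.
func stackID(frames []string) uint64 {
	const offset, prime = 14695981039346656037, 1099511628211
	h := uint64(offset)
	for _, f := range frames {
		for i := 0; i < len(f); i++ {
			h ^= uint64(f[i])
			h *= prime
		}
		h ^= 0xff // frame separator
		h *= prime
	}
	return h
}

// sink is package-level, so anything stored in it escapes to the heap.
var sink []byte

// stackIDPlanted mimics the planted regression: a 64-byte buffer stored in
// the package-level sink, which escape analysis must heap-allocate.
func stackIDPlanted(frames []string) uint64 {
	sink = make([]byte, 64)
	return stackID(frames)
}

func main() {
	frames := []string{"main.run", "worker.loop", "nccl.allreduce"}
	clean := testing.AllocsPerRun(1000, func() { _ = stackID(frames) })
	planted := testing.AllocsPerRun(1000, func() { _ = stackIDPlanted(frames) })
	fmt.Printf("allocs/op clean=%.0f planted=%.0f\n", clean, planted)
}
```

The clean version should report 0 allocs/op and the planted one 1: a buffer that stayed local could live on the stack, but storing it in `sink` forces a real heap allocation that both B/op and allocs/op observe.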

## Release notes

```release-notes
### Added
- `make bench-check` is now a real perf-regression gate again (was a no-op since PR-K.2). Registry-driven via `scripts/bench-registry.sh` (single source of truth, sourced by both `bench-check-all.sh` runner and `bench-baseline.sh` regenerator); the runner gates `B/op` + `allocs/op` deltas against a committed `testdata/bench-baseline.txt` via benchstat; any row regressing past `THRESHOLD%` (default 10, env-overridable) fails. `sec/op` is deliberately not gated (wall-clock CV on dev hardware and shared CI runners routinely crosses 10% from background load alone, even when benchstat marks the delta significant). Two packages registered: `module/pkg/patterns` (M19 PodEvictedDetector budget) and `components/receivers/pyspy` (M18 ParseDump + StackID hash). Regenerate baselines with `make bench-baseline` after a vetted, intentional perf change and commit the diff.
- CI `verify-static` job now runs `make bench-check` after installing a pinned benchstat. Local dev only needs benchstat when intentionally running the perf gate.
```

## Test plan

- [x] `make check` (fmt, tidy-check, lint, vet, mod-verify) — clean
(re-run post-adversarial-fix)
- [x] `make actionlint` — clean
- [x] `make zizmor` — clean (`No findings to report. Good job!`)
- [x] `make license-check` — clean
- [x] `make doc-check` — clean
- [x] `make bench-check` on clean main — exit 0, both packages PASS,
repeated runs stay green (false-positive bug fixed)
- [x] `make bench-check` with deliberate alloc-regression planted in
`stackID` — exit 2, gate flags `+800% B/op` + `+133% allocs/op`
- [x] `make bench-check` after revert — exit 0 again, stackid.go diff vs
main is empty
- [ ] CI `verify-static` runs `make bench-check` green on this PR
(awaiting CI)

## Self-grading

**A** — root cause named, fix scoped to missing registration without
expanding scope (no new benchmark framework, no rewrite of the
benchstat-comparison machinery beyond the sec/op-vs-alloc scoping fix).
TDD cycle complete with a deliberate regression that the gate catches.
Adversarial self-review found a real false-positive bug (sec/op gating)
and fixed it in-PR before merge. Registry extracted to a single source
of truth so the two consumers can't drift. CHANGELOG entry written.

Not A+ because: even with sec/op excluded, the gate still depends on
benchstat's text-output format remaining stable; a future major version
that changes column layout could silently disable the gate (it would
skip every row, exit 0, print PASS). A future iteration could parse
benchstat's `-format=csv` output instead, which is its versioned
machine-readable interface. Out of scope here; the current text format
hasn't changed for years and matches the existing
`scripts/bench-check.sh` shape that already shipped with the k8sevents
baseline.

---------

Signed-off-by: Tri Lam <tri@maydow.com>
Co-authored-by: Tri Lam <tri@maydow.com>