diff --git a/CHANGELOG.md b/CHANGELOG.md index 684f0477..d9fb8771 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -6,10 +6,11 @@ User-visible changes are documented here. Format: [Keep a Changelog](https://kee Pre-alpha. **Distribution-first pivot adopted ([RFC-0013](docs/rfcs/0013-distro-first-pivot.md))** - binary now assembled via the OpenTelemetry Collector Builder (OCB) from upstream + contrib components plus a thin in-repo Go submodule at `module/` (path `github.com/tracecoreai/tracecore/module`) containing only the moat (NCCL FlightRecorder receiver, OTTL processors with windowed semantics, pattern detectors). The M1 in-tree pipeline runtime + factory-based assembly is queued for deletion at v0.1.0 in favor of the OCB-generated boot path; the canonical `clockreceiver` + `stdoutexporter` examples ship for one PR cycle and then exit. Targeting v0.1.0 / v0.2.0 / v0.3.0 release boundaries per RFC-0013 §4. -Pivot landed across three waves of PRs: +Pivot landed across four waves of PRs: - Wave 1 (#166 RFC doc accepted, #168 delete kueue + kineto receivers, #169 pre-PR-A drift sweep + Helm security tighten, #170 containerstdout deletion explicit in §7, #171 PR-A OCB skeleton + `builder-config.yaml` + `make build-ocb`, #172 dedup gate execution, #173 rename check tiers + add PR body-artifact guard, #174 PR-C release pipeline → goreleaser stack + RFC supersession + top-level doc alignment, #175 wave-1 self-review fixes + delete archive folder). - Wave 2 (#176 PR-D image build → ko + `_build/` walker fix + PR-B reframe as side-effect of binary swap, #177 build-ocb CI gate, #178 post-wave-2 drift sweep, #179 v0.1→v0.2 migration guide skeleton). -- Wave 3 (PR-E: bench heartbeat swap `clockreceiver` → `hostmetricsreceiver`; PR-J: ship the four receiver-side recipes that replace the deleted in-tree receivers — filelog+container, journald+filelog+OTTL, k8sobjects+transform, prometheusreceiver). 
+- Wave 3 — PR-B1-shape sibling ports (each receiver/exporter gets its own `selftel.go` + `lifecycle.go` siblings off `internal/selftelemetry` + `internal/runtime/lifecycle`) plus support infra: #180 PR-E bench heartbeat swap `clockreceiver` → `hostmetricsreceiver`; #181 RFC-0013 §migration rescope (PR-A2/B1/B2/I.1/I.2 sub-sequencing + in-repo submodule); #182 PR-G RFC-0004 archive + stale-path sweep; #183 PR-H PRINCIPLES + CONTRIBUTING pivot alignment; #184 nccl_fr (PR-B1 canonical); #185 clockreceiver; #186 stdoutexporter; #187 kernelevents; #188 dcgm; #189 PR-A2 OCB-generated `main` swap + delete `cmd/tracecore/` tree; #190 install-bench → OCB binary; #191 PR-L migration-guide body; #192 doc-rot sweep post-A2; #193 otlphttp; #194 pyspy; #196 k8sevents. +- Wave 4 — PR-B2-shape sibling ports (mechanical import swap to upstream `go.opentelemetry.io/collector/{component,receiver,consumer,pipeline}`; lands together with #195 PR-J recipes, #199 RFC §migration amendment for PR-I/K sub-slicing, #200 PR-N security-posture migration, #198 lint concurrency fix, #210 TOCTOU race-window test hardening): #195 PR-J four upstream-receiver recipes; #197 PR-F precursor containerstdout off selftel+lc; #198 golangci-lint stale-PID fix; #199 RFC §migration PR-I/K amendment + PR-B2 gate; #200 PR-N pyspy capability-surface guide; #201 PR-B2 nccl_fr upstream (canonical); #202 stdoutexporter upstream; #203 pyspy upstream; #206 PR-F.1 (delete `internal/{selftelemetry,telemetry}` + `components/receivers/dcgm/` + `pkg/dcgm/`); #208 kernelevents upstream; #209 containerstdout upstream; #210 lifecycle TOCTOU concurrent-Add race hardening (kernelevents + k8sevents). Remaining open: #204 (k8sevents upstream), #205 (PR-B3 clockreceiver upstream), #207 (otlphttp upstream) — gate for PR-F.2. 
**PR-F.1 landed: `components/receivers/dcgm/` + `pkg/dcgm/` + `internal/selftelemetry/` + `internal/telemetry/` deleted; one orphan clockreceiver integration test deleted.** Net deletion across the four moats RFC-0013 §migration step 8 promised. Deletes: - `components/receivers/dcgm/` + `pkg/dcgm/` — cgo stub never shipped real code; live ports removed in #188's PR-B2-shaped dcgm sweep; kueue + kineto already deleted in #168. @@ -17,7 +18,7 @@ Pivot landed across three waves of PRs: - `internal/telemetry/` — was the in-tree `MeterProvider` + probe-server (`/metrics`, `/healthz`, `/readyz`) wrapper. Probes now flow through the upstream `healthcheckextension`; meter-provider is upstream `service.telemetry`. Only remaining consumers were `internal/selftelemetry/*_test.go` (deleted together with selftelemetry) and one orphan clockreceiver integration test. - `components/receivers/clockreceiver/errors_integration_test.go` — orphan integration test from #185's PR-B1 clockreceiver port; bootstrapped via the now-deleted `selftelemetry.Receiver` interface but never migrated to the receiver-scoped sibling `selftel.go`. The covered behaviour ("errors_total surfaces on downstream failure") is now exercised through clockreceiver's sibling tests. -PR-F.2 (deferred): `internal/componentstatus/` (5-line `ReportStatus` free function) travels with `internal/pipeline` — its only non-test consumers are `internal/pipeline/runtime_test.go` + `internal/pipeline/pipelinetest/fixture_test.go`. Deletion lands when pipeline migrates to upstream `go.opentelemetry.io/collector/component/componentstatus`. +PR-F.2 (deferred — pending three open ports): Delete `internal/{componentstatus,pipeline,pipelinebuilder,consumer,fanout,runtime/lifecycle}`. Gated on the last three pipeline+consumer-importing receivers landing — k8sevents (#204), clockreceiver (#205), otlphttp (#207) — all three open as of this entry, all three following the PR-B2 (#201) shape. 
Once they merge, the entire `internal/*` runtime bundle has zero non-test consumers and drops in a single cut. The `clockreceiver` source deletion stays in PR-K (chart + values-keys deprecation cycle) — PR-F.2 only deletes `internal/*` packages, not the canonical-example receivers themselves. Build-tag `dcgm` retired (`make build-tags` no longer vets `-tags dcgm`). `make bench-check` loop drops both deleted package rows (dcgm + internal/telemetry). `scripts/register-lint.sh` allowlist emptied (the two `internal/telemetry/{build_info,slo}.go` entries are gone with the package). Chart `receivers.dcgm` toggle + `_helpers.tpl` doc-list + `NOTES.txt` warning retained until PR-K removes them outright (toggle is already inert — operators enabling `receivers.dcgm.enabled=true` have crashed at boot since PR-A2). `internal/runtime/lifecycle/` doc-comment updated. `docs/FAILURE-MODES.md` self-tel-surface rows rewired to upstream-delegated wording. `docs/patterns/{README,pattern-{1,3,4,5}}.md` replay-test pointers updated. diff --git a/MILESTONES.md b/MILESTONES.md index 4de8dff9..5a6d7b8d 100644 --- a/MILESTONES.md +++ b/MILESTONES.md @@ -108,7 +108,7 @@ Every milestone, in every lane, satisfies all seven principles below. Depth live ### M1. Pipeline runtime & component contract - **Status:** ☑ delivered (PRs #12 + #13) -- **Status (RFC-0013):** DELETED at v0.1.0 (pipeline boot path) - replaced by OCB-generated `main.go` from `builder-config.yaml`. `internal/pipeline/`, `internal/pipelinebuilder/`, `internal/config/`, `internal/consumer/`, `internal/fanout/` audited and folded into the upstream lifecycle per RFC-0013 §7 (kept only if a custom receiver/processor depends on a non-replaceable abstraction). `internal/runtime/lifecycle/` deletes at v0.2.0 with its last consumer. The bundled `components/receivers/clockreceiver/` and `components/exporters/stdoutexporter/` canonical examples are queued for deletion at v0.1.0; `clockreceiver` replaced by `telemetrygeneratorreceiver`. 
+- **Status (RFC-0013):** DELETED at v0.1.0 (pipeline boot path) - replaced by OCB-generated `main.go` from `builder-config.yaml`. **PR-A2 landed (#189)**: `cmd/tracecore/` deleted (3,032 LOC across 14 source + 7 test files); the OCB binary at `./_build/tracecore` is the canonical entry point. **PR-F.1 landed (#206)**: `internal/selftelemetry/` + `internal/telemetry/` deleted; every receiver/exporter now travels its own `selftel.go` + `lifecycle.go` siblings (PR-B1-shape sibling ports: #184/#185/#186/#187/#188/#193/#194/#196/#197). **PR-F.2 deferred**: `internal/{componentstatus,pipeline,pipelinebuilder,consumer,fanout,runtime/lifecycle}` drop together once the last three pipeline+consumer-importing receivers land (#204 k8sevents, #205 clockreceiver, #207 otlphttp — all PR-B2-shape ports off canonical #201). `internal/config/` retained (still load-bearing for `tracecore validate`). The bundled `components/receivers/clockreceiver/` and `components/exporters/stdoutexporter/` canonical examples are queued for deletion at v0.2.0 (PR-K.2); `clockreceiver` replaced by `hostmetricsreceiver` (loadscraper @ 1s) per PR-E unblocking (#180) — the originally-planned `telemetrygeneratorreceiver` does not exist in opentelemetry-collector-contrib at any tag. - **Depends on:** none (foundational) - **Reference:** [RFC-0003](docs/rfcs/0003-pipeline-runtime-and-component-contract.md). Contract documented in [`internal/pipeline/README.md`](internal/pipeline/README.md). @@ -126,7 +126,7 @@ Every milestone, in every lane, satisfies all seven principles below. Depth live ### M2. Self-telemetry surface - **Status:** ☑ delivered (PR #17) -- **Status (RFC-0013):** DELETED at v0.1.0 - replaced by upstream `go.opentelemetry.io/collector/component/componentstatus` + `service/telemetry` + standard `otelcol_*` metrics. `internal/componentstatus`, `internal/selftelemetry`, `internal/telemetry` removed per RFC-0013 §7. 
M2 carry-forward divergence list closes via adoption, not via further in-tree work. +- **Status (RFC-0013):** DELETED at v0.1.0 - replaced by upstream `go.opentelemetry.io/collector/component/componentstatus` + `service/telemetry` + standard `otelcol_*` metrics. **PR-F.1 landed (#206)**: `internal/selftelemetry/` + `internal/telemetry/` deleted; probes flow through upstream `healthcheckextension`, meter-provider via upstream `service.telemetry`. **PR-F.2 deferred**: `internal/componentstatus` deletes alongside `internal/pipeline` (its only non-test consumers) once the last three PR-B2-shape ports land (#204 / #205 / #207). M2 carry-forward divergence list closes via adoption, not via further in-tree work. - **Depends on:** M1 - **Reference:** [RFC-0006](docs/rfcs/0006-self-telemetry-surface.md). - **Carry-forward:** see [`docs/followups/M2.md`](docs/followups/M2.md) (pprof endpoint, queue impl, restart mechanism, OTLP push reader, MetricsLevel knob, histogram tuning, per-role CreateSettings split, TracerProvider field). @@ -587,7 +587,7 @@ Lane 6 covers NVIDIA-side device telemetry (DCGM), NCCL collective diagnostics ( ### M8. DCGM receiver - cgo client + hardware integration (carry-forward) - **Status:** ⧗ (alpha scaffold shipped in PR #18; cgo client + hardware integration carry-forward pending) -- **Status (RFC-0013):** DELETED at v0.1.0 - cgo client path never shipped, and the stub adds no value once `dcgm-exporter` + `prometheusreceiver` is the supported path. NVIDIA's 1st-party `dcgm-exporter` covers every metric in the RFC-0005 set; cross-vendor `gpu.vendor` resource attribute lands via OTTL transform over Prometheus output (RFC-0013 §3, upstream-contribution target to OTel `hw.*` semconv per §5). Replacement applies to AMD (`ROCm/device-metrics-exporter`), Intel (`intel/xpumanager`), and Habana (Habana Prometheus Metric Exporter) on the same recipe shape. 
+- **Status (RFC-0013):** DELETED at v0.1.0 — **landed in PR-F.1 (#206)**: `components/receivers/dcgm/` + `pkg/dcgm/` removed (cgo client path never shipped real code; live ports removed in #188's PR-B2-shaped dcgm sweep). Replaced by `dcgm-exporter` + `prometheusreceiver` per `docs/integrations/prometheus-scrape.md` (PR-J, #195). NVIDIA's 1st-party `dcgm-exporter` covers every metric in the RFC-0005 set; cross-vendor `gpu.vendor` resource attribute lands via OTTL transform over Prometheus output (RFC-0013 §3, upstream-contribution target to OTel `hw.*` semconv per §5). Replacement applies to AMD (`ROCm/device-metrics-exporter`), Intel (`intel/xpumanager`), and Habana (Habana Prometheus Metric Exporter) on the same recipe shape. Chart `receivers.dcgm` toggle + `_helpers.tpl` doc-list + `NOTES.txt` warning retained until PR-K.3 (toggle is inert post-PR-A2: enabling it crashes the OCB binary at boot with "unknown factory"). - **Depends on:** M1 - **Reference:** [RFC-0005](docs/rfcs/0005-dcgm-receiver-scope.md) - **Hardware:** Linux + NVIDIA GPU host with `nv-hostengine` reachable; driver R580 LTSB + DCGM 4.4.x reference (per [endoflife.date/nvidia](https://endoflife.date/nvidia) - R580 active support ends 2026-08-04, refresh LTSB pin within Q3 2026; DCGM 4.4.2 is current core release per [NVIDIA/DCGM tags](https://github.com/NVIDIA/DCGM/tags)) diff --git a/docs/migration/v0.1-to-v0.2.md b/docs/migration/v0.1-to-v0.2.md index be376880..f749fad4 100644 --- a/docs/migration/v0.1-to-v0.2.md +++ b/docs/migration/v0.1-to-v0.2.md @@ -130,7 +130,7 @@ CI workflows changed path triggers from `cmd/tracecore/**` to `builder-config.ya ## `internal/*` package deletion (PR-F) -> **Status:** PR-F not yet open at the time of this guide's first publish. The packages listed below are still present in v0.2.0 RC builds and will be deleted in PR-F before v0.2.0 GA. +> **Status:** PR-F.1 landed (#206) — `internal/selftelemetry/` and `internal/telemetry/` are already gone in current main. 
PR-F.2 (deletes `internal/{componentstatus,pipeline,pipelinebuilder,consumer,fanout,runtime/lifecycle}`) is gated on three open ports: #204 (k8sevents), #205 (clockreceiver), #207 (otlphttp). Once those land, the remaining `internal/*` runtime packages drop in a single cut before v0.2.0 GA. Several internal Go packages were load-bearing only for the deleted `cmd/tracecore` boot path and the in-tree receivers/exporters. Third-party Go importers (unlikely in OSS pre-1.0; the packages live under `internal/` and the Go compiler rejects external imports) lose: @@ -199,8 +199,16 @@ Charts are pinned by `--version` (the chart-package version from `Chart.yaml`), ## Open items (fill in as PRs land) -- [ ] PR-I (in-repo Go submodule extraction at `module/`) — link -- [x] PR-J (ship recipes for filelog + journald + k8sobjects + prometheus) — [`docs/integrations/{filelog-container,journald-kernel,k8sobjects-events,prometheus-scrape}.md`](../integrations/) -- [ ] PR-K (delete in-tree receivers) — link -- [ ] PR-L (this guide, full body) — link -- [ ] PR-E unblocking decision (heartbeat replacement) — link +- [x] PR-A2 (switch entrypoint to OCB-generated main + delete `cmd/tracecore/`) — [#189](https://github.com/TraceCoreAI/tracecore/pull/189) +- [x] PR-E (heartbeat replacement decision) — `hostmetricsreceiver` (loadscraper @ 1s); [#180](https://github.com/TraceCoreAI/tracecore/pull/180). The originally-planned `telemetrygeneratorreceiver` does not exist in opentelemetry-collector-contrib (contrib issues #41687 + #43657 both closed `not_planned`). 
+- [x] PR-F.1 (delete `internal/{selftelemetry,telemetry}` + `components/receivers/dcgm/` + `pkg/dcgm/`) — [#206](https://github.com/TraceCoreAI/tracecore/pull/206) +- [ ] PR-F.2 (delete `internal/{componentstatus,pipeline,pipelinebuilder,consumer,fanout,runtime/lifecycle}`) — gated on #204 / #205 / #207 +- [ ] PR-I.1a (scaffold `module/` Go submodule + `go.work` + `replaces:`) — in flight +- [ ] PR-I.1b (`git mv` nccl_fr → `module/receiver/ncclfrreceiver`) — gate satisfied by [#201](https://github.com/TraceCoreAI/tracecore/pull/201); waits on PR-I.1a +- [ ] PR-I.2 (`rankjoinprocessor` + `patterndetectorprocessor`) — gated on PR-K.1 +- [x] PR-J (ship recipes for filelog + journald + k8sobjects + prometheus) — [`docs/integrations/{filelog-container,journald-kernel,k8sobjects-events,prometheus-scrape}.md`](../integrations/); [#195](https://github.com/TraceCoreAI/tracecore/pull/195) +- [ ] PR-K.1 (sever patterns-lib k8sevents dep) — in flight +- [ ] PR-K.2 (delete in-tree receivers + migrate ~86 test fixtures + delete `tools/failure-inject/xidgen/`) — link +- [ ] PR-K.3 (chart cleanup + values-keys `NOTES.txt` deprecation + values delete after one-minor window) — link +- [x] PR-L (this guide, skeleton + body) — skeleton [#179](https://github.com/TraceCoreAI/tracecore/pull/179), body [#191](https://github.com/TraceCoreAI/tracecore/pull/191); living document +- [x] PR-N (pyspy capability-surface security note, ahead of v0.3.0) — [`docs/migration/v0.2-to-v0.3.md`](v0.2-to-v0.3.md); [#200](https://github.com/TraceCoreAI/tracecore/pull/200) diff --git a/docs/rfcs/0013-distro-first-pivot.md b/docs/rfcs/0013-distro-first-pivot.md index 4b5ec962..f6420646 100644 --- a/docs/rfcs/0013-distro-first-pivot.md +++ b/docs/rfcs/0013-distro-first-pivot.md @@ -229,36 +229,36 @@ Release-boundary schedule in §4. PR sequencing follows. PR sequencing within v0.1.0: 1. **PR-A1** (landed, #171): Add `builder-config.yaml`. 
`make build-ocb` produces `_build/tracecore` via OCB side-by-side with the legacy `cmd/tracecore` binary. -2. **PR-A2** (landed, 2026-05-30): Switched `cmd/tracecore` to the OCB-generated main. Deleted the entire `cmd/tracecore/` tree (3,032 LOC across 14 source + 7 test files) — the OCB binary at `./_build/tracecore` is the canonical entry point. All receivers register through OCB's generated `otelcol.Factories`, not the hand-rolled `cmd/tracecore/components.go`. Also deleted: `components.yaml` (superseded by `builder-config.yaml`), `tools/components-gen/` (codegen no longer needed), `Makefile` `generate` + `generate-check` + `run` targets, internal-version `-ldflags -X` injection (OCB main reads version from `builder-config.yaml` `dist.version`). Goreleaser switched to `builder: prebuilt` against `./_build/{Os}-{Arch}/tracecore`; ko-publish runs from inside `./_build/` against the OCB-generated submodule; Chart Dockerfile builds via `make`; daemonset args dropped the `collect` subcommand. Scope deferrals: the legacy `components/receivers/{clockreceiver,kernelevents,k8sevents,containerstdout,dcgm,pyspy,nccl_fr}` packages and `internal/{pipeline,selftelemetry,telemetry,componentstatus,pipelinebuilder,consumer,fanout,runtime}` packages survive as orphan-but-compiling code until PR-F (deletes internal/*) and PR-K (deletes in-tree receivers + migrates chart shape). The chart's `tracecore validate` CI gate is temporarily disabled — the chart's `renderedConfig` template still emits the legacy `telemetry:` top-level key and references `clockreceiver`/`pyspy`/`containerstdout`, none of which the OCB binary recognises; PR-K reinstates the gate after the chart shape migrates to upstream `service.telemetry` + OCB-supported receivers. The `failure-inject` subcommand surface was already duplicated in the standalone `tools/failure-inject/` binary; `debug dump` and `receivers list` are replaced by upstream `tracecore validate` / `tracecore components`. 
**Sequencing gate satisfied**: PR-B2 / PR-F / PR-I are unblocked. +2. **PR-A2** (landed, #189, 2026-05-30): Switched `cmd/tracecore` to the OCB-generated main. Deleted the entire `cmd/tracecore/` tree (3,032 LOC across 14 source + 7 test files) — the OCB binary at `./_build/tracecore` is the canonical entry point. All receivers register through OCB's generated `otelcol.Factories`, not the hand-rolled `cmd/tracecore/components.go`. Also deleted: `components.yaml` (superseded by `builder-config.yaml`), `tools/components-gen/` (codegen no longer needed), `Makefile` `generate` + `generate-check` + `run` targets, internal-version `-ldflags -X` injection (OCB main reads version from `builder-config.yaml` `dist.version`). Goreleaser switched to `builder: prebuilt` against `./_build/{Os}-{Arch}/tracecore`; ko-publish runs from inside `./_build/` against the OCB-generated submodule; Chart Dockerfile builds via `make`; daemonset args dropped the `collect` subcommand. Scope deferrals: the legacy `components/receivers/{clockreceiver,kernelevents,k8sevents,containerstdout,dcgm,pyspy,nccl_fr}` packages and `internal/{pipeline,selftelemetry,telemetry,componentstatus,pipelinebuilder,consumer,fanout,runtime}` packages survive as orphan-but-compiling code until PR-F (deletes internal/*) and PR-K (deletes in-tree receivers + migrates chart shape). The chart's `tracecore validate` CI gate is temporarily disabled — the chart's `renderedConfig` template still emits the legacy `telemetry:` top-level key and references `clockreceiver`/`pyspy`/`containerstdout`, none of which the OCB binary recognises; PR-K reinstates the gate after the chart shape migrates to upstream `service.telemetry` + OCB-supported receivers. The `failure-inject` subcommand surface was already duplicated in the standalone `tools/failure-inject/` binary; `debug dump` and `receivers list` are replaced by upstream `tracecore validate` / `tracecore components`. 
**Sequencing gate satisfied**: PR-B2 / PR-F / PR-I are unblocked. 3. **PR-B1**: Port `components/receivers/nccl_fr` off `internal/selftelemetry` and `internal/runtime/lifecycle`. Helpers travel with the receiver as unexported `selftel.go` + `lifecycle.go` siblings (slimmed of multi-receiver indirection — drop the noop type and the Kind canonical-set registry; keep the 8-instrument MeterProvider pattern). **Metric namespace:** helpers acquire the meter via `set.TelemetrySettings.MeterProvider.Meter("github.com/tracecoreai/tracecore/components/receivers/nccl_fr")`, NOT via the global `service/telemetry` provider — instrument names use the `otelcol_receiver_ncclfr_*` shape (matching upstream `receiver/scraperhelper`'s `otelcol_receiver__*` convention where `` is the OCB receiver package name without underscores, i.e. `ncclfr` per the `nccl_fr → ncclfr` rename tracked in project memory). Receiver-scoped meter cannot collide with the pipeline-runtime's own `otelcol_*` namespace. No second rename in PR-I; helper-emitted metrics are byte-identical pre/post extraction. **Unblocks** PR-F's `internal/selftelemetry` delete (when paired with the other receivers' ports / deletions). -4. **PR-B2** (lands with or after PR-A2): Port `components/receivers/nccl_fr` off `internal/{pipeline,consumer,runtime/lifecycle}` to upstream `go.opentelemetry.io/collector/{component,receiver,consumer,pipeline}`. Mechanical import swap (~60 LOC) but requires the binary boot path to already be OCB-driven; otherwise nccl_fr has nowhere to register. **Hard gate for PR-I.1b**: until nccl_fr depends only on upstream packages, the `git mv` into `module/receiver/ncclfrreceiver/` would drag `internal/` paths into the submodule and break the `module/go.mod` boundary. PR-B1 only severed `selftelemetry` + the meter-provider half of `runtime/lifecycle`; PR-B2 finishes the lifecycle helpers and the pipeline/consumer ports. 
**Slug-collision note**: merged commit #188 (`feat(pivot): PR-B2 — port dcgm off internal selftel + lifecycle`) reused the `PR-B2` label for the dcgm replication of the PR-B1 shape; that work is a PR-B1-style port for a different receiver, not the canonical PR-B2 defined here. The dcgm package is deleted entirely in PR-F (cgo stub never shipped), so the dcgm "PR-B2" was a no-regression sweep against the deprecated package and does not satisfy this slot. +4. **PR-B2** (landed, #201; sibling-receiver ports #202/#203/#208/#209 followed the same shape): Port `components/receivers/nccl_fr` off `internal/{pipeline,consumer,runtime/lifecycle}` to upstream `go.opentelemetry.io/collector/{component,receiver,consumer,pipeline}`. Mechanical import swap (~60 LOC) but requires the binary boot path to already be OCB-driven; otherwise nccl_fr has nowhere to register. **Hard gate for PR-I.1b satisfied**: `grep -rE 'internal/(pipeline|consumer|runtime/lifecycle)' components/receivers/nccl_fr/` returns zero hits post-merge. PR-B1 only severed `selftelemetry` + the meter-provider half of `runtime/lifecycle`; PR-B2 finishes the lifecycle helpers and the pipeline/consumer ports. **Wave-3-followup ports** (same shape, different receivers): stdoutexporter (#202), pyspy (#203), kernelevents (#208), containerstdout (#209). The remaining importers of `internal/pipeline` + `internal/consumer` (clockreceiver, k8sevents, otlphttp) have ports in flight as separate PRs; their RFC role is wave cleanup, not a sequencing gate — `internal/pipeline` only deletes after PR-K.2 retires the receivers that lean on it. **Slug-collision note**: merged commit #188 (`feat(pivot): PR-B2 — port dcgm off internal selftel + lifecycle`) reused the `PR-B2` label for the dcgm replication of the PR-B1 shape; that work is a PR-B1-style port for a different receiver, not the canonical PR-B2 defined here. 
The dcgm package was deleted entirely in PR-F.1 (cgo stub never shipped), so the dcgm "PR-B2" was a no-regression sweep against the deprecated package and does not satisfy this slot. 5. **PR-C** (landed, #174): Release pipeline switches to goreleaser stack. Old `release.yml` archived under `.github/workflows/archived/`. 6. **PR-D** (landed, #176): Image build moves to `ko`. Chart `image.repository` continues to resolve. 7. **PR-E** (landed, #180): `clockreceiver` → `hostmetricsreceiver` (loadscraper @ 1s) in OCB manifest + bench-install Helm values. The originally-planned `telemetrygeneratorreceiver` does not exist in opentelemetry-collector-contrib at any tag (verified 2026-05-30; contrib issues #41687 and #43657 both closed `not_planned`). hostmetrics' loadscraper emits 3 low-cardinality series (`system.cpu.load_average.{1m,5m,15m}`) and satisfies the bench's "first parseable JSON line at sink" pass condition. Scope deferral: chart default stays `clockreceiver` and the in-tree source survives this PR (~92 references across `cmd/tracecore/*_test.go` + `internal/pipeline` + `internal/selftelemetry` fixtures); chart-default flip + source deletion ship as part of PR-K alongside coordinated test-fixture migration and the values-keys `NOTES.txt` deprecation cycle. 8. **PR-F** (lands after PR-A2 + PR-B1 + the wave-3 sibling-port PRs) splits into PR-F.1 + PR-F.2 to match the actual import-graph state — `internal/componentstatus` cannot be deleted until `internal/pipeline` migrates off it, and `internal/pipeline` is explicitly out of PR-F's scope. - **PR-F.1** (landed): Delete `components/receivers/dcgm/` + `pkg/dcgm/` (cgo stub never shipped real code; live ports removed in #188's PR-B2-shaped dcgm sweep; kueue + kineto already deleted in #168). 
Delete `internal/selftelemetry/` — every consumer (containerstdout, clockreceiver, kernelevents, k8sevents, nccl_fr, dcgm, pyspy, stdoutexporter, otlphttp) ported onto receiver/exporter-scoped sibling `selftel.go` files in wave-3 of the pivot (#184/#185/#186/#187/#188/#193/#194/#196/#197). Delete `internal/telemetry/` — was the in-tree `MeterProvider` + probe-server (`/metrics`, `/healthz`, `/readyz`) wrapper; probes now flow through the upstream `healthcheckextension`, meter-provider via upstream `service.telemetry`; only remaining consumers were `internal/selftelemetry/*_test.go` (deleted in the same cut) and one orphan clockreceiver integration test (`components/receivers/clockreceiver/errors_integration_test.go`, left dangling by #185's PR-B1 clockreceiver port) which is deleted too. Retire the `dcgm` build tag; drop both deleted package rows from `make bench-check`; empty the `register-lint` allowlist (the two `internal/telemetry/{build_info,slo}.go` entries are gone). Chart `receivers.dcgm` toggle + `_helpers.tpl` doc-list + `NOTES.txt` warning retained until PR-K removes them outright (toggle is already inert — operators enabling `receivers.dcgm.enabled=true` have crashed at boot since PR-A2). - - **PR-F.2** (deferred): Delete `internal/componentstatus`. Hard-gated on `internal/pipeline` migrating to upstream `go.opentelemetry.io/collector/component/componentstatus` — `componentstatus` is logically a sibling of `pipeline` (its only non-test consumers are `internal/pipeline/runtime_test.go` + `internal/pipeline/pipelinetest/fixture_test.go`) and `internal/pipeline` is out of PR-F's scope per RFC line 240's original framing. `clockreceiver` source deletion deferred to PR-K — see PR-E note for rationale. -9. **PR-G**: Supersede RFCs (add status headers + redirects). Move RFC-0004 to `archived/`. -10. **PR-H**: Update top-level docs (README, NORTHSTARS, STRATEGY, PRINCIPLES, MILESTONES, CHANGELOG, CONTRIBUTING, AGENTS, docs/README). 
+ - **PR-F.2** (deferred — pending wave-3-followup completion): Delete `internal/componentstatus`, `internal/pipeline`, `internal/pipelinebuilder`, `internal/consumer`, `internal/fanout`, `internal/runtime/lifecycle`. Hard-gated on the remaining three receiver/exporter ports landing (clockreceiver / k8sevents / otlphttp pipeline+consumer ports are in flight as separate agents; the PR-B2-shape templates exist as PRs #205 / #204 / #207). Once those land, `internal/{pipeline,consumer,runtime/lifecycle,componentstatus,pipelinebuilder,fanout}` have zero non-test consumers and can drop in a single cut. The `clockreceiver` source deletion stays in PR-K (chart + values-keys deprecation cycle) — PR-F.2 only deletes the `internal/*` packages, not the canonical-example receivers themselves. See PR-E note for the chart-default flip rationale. +9. **PR-G** (landed, #182): Supersede RFCs (add status headers + redirects). Move RFC-0004 to `archived/`. +10. **PR-H** (landed, #183): Update top-level docs (README, NORTHSTARS, STRATEGY, PRINCIPLES, MILESTONES, CHANGELOG, CONTRIBUTING, AGENTS, docs/README). v0.2.0 sequencing: 1. **PR-I**: Extract the moat into a **Go submodule inside this repo** at `module/` (not an external repo). Layout: `module/go.mod` declaring `module github.com/tracecoreai/tracecore/module`, with `module/receiver/ncclfrreceiver/`, `module/processor/rankjoinprocessor/`, `module/processor/patterndetectorprocessor/`, and `module/pkg/nccl/fr_parser/` (the shared wire-format parser per §6.1). Root `go.work` lists both `.` and `./module` so dev builds resolve without publishing. Tag scheme `module/vX.Y.Z` (Go submodule prefix). builder-config.yaml adds three `gomod:` entries pointing at `github.com/tracecoreai/tracecore/module/{receiver/ncclfrreceiver,processor/rankjoinprocessor,processor/patterndetectorprocessor}` and a `replaces:` block pointing at `./module` for the dev cycle. 
**Rationale for in-repo submodule (vs external repo)**: single fork, one CI, one issue tracker, one DCO, one PR for cross-cutting changes; Go submodule tags give an independent version line; OCB `gomod:` resolves submodules identically to external repos; no operational driver (different maintainer set / different release cadence submodule tags can't solve / license incompatibility) exists. Sub-sequencing: - - **PR-I.1a** — scaffolding only: introduce `module/go.mod` declaring `module github.com/tracecoreai/tracecore/module`, root `go.work` listing `.` and `./module`, and a `builder-config.yaml` `replaces:` skeleton block pointing at `./module`. No file movement; root tree and OCB build stay green side-by-side. Tag `module/v0.0.1` as the genesis tag (empty submodule, validates the tagging contract before any real code moves). - - **PR-I.1b** — `git mv components/receivers/nccl_fr → module/receiver/ncclfrreceiver` + `git mv pkg/nccl/fr_parser → module/pkg/nccl/fr_parser`; rename the Go package `nccl_fr` → `ncclfrreceiver` (matches the OCB receiver name + the metric-namespace shape locked in by PR-B1); update all importers in the root module (failure-inject ncclhang, fuzz harness, integration tests); root `go.work` keeps both modules resolvable. (Per PR-B1's metric-namespace decision, helper-emitted metrics remain byte-identical pre/post move; there is no second rename here.) No new tag — the next bump is `module/v0.1.0` at PR-I.2. **Hard-gated on PR-B2.** + - **PR-I.1a** (in flight — scaffold agent): introduce `module/go.mod` declaring `module github.com/tracecoreai/tracecore/module`, root `go.work` listing `.` and `./module`, and a `builder-config.yaml` `replaces:` skeleton block pointing at `./module`. No file movement; root tree and OCB build stay green side-by-side. Tag `module/v0.0.1` as the genesis tag (empty submodule, validates the tagging contract before any real code moves). 
+ - **PR-I.1b** (pre-staged, gate satisfied): `git mv components/receivers/nccl_fr → module/receiver/ncclfrreceiver` + `git mv pkg/nccl/fr_parser → module/pkg/nccl/fr_parser`; rename the Go package `nccl_fr` → `ncclfrreceiver` (matches the OCB receiver name + the metric-namespace shape locked in by PR-B1); update all importers in the root module (failure-inject ncclhang, fuzz harness, integration tests); root `go.work` keeps both modules resolvable. (Per PR-B1's metric-namespace decision, helper-emitted metrics remain byte-identical pre/post move; there is no second rename here.) No new tag — the next bump is `module/v0.1.0` at PR-I.2. **Hard-gated on PR-B2** — gate satisfied by #201; lands after PR-I.1a scaffolding is in.
  - **PR-I.2** — introduces `rankjoinprocessor` and `patterndetectorprocessor` as net-new OTel processors wrapping `internal/synthesis/patterns/` logic. **Hard-gated on PR-K.1** (the patterns-lib k8sevents dep must be severed first; PR-K.2's k8sevents delete is sufficient but PR-K.1 is the minimum). Tag `module/v0.1.0` once the moat surface is complete — this is the first version pinned in `builder-config.yaml` for the v0.2.0 release.
 2. **PR-J** (landed, #195): Ship recipes: `filelogreceiver + container stanza + file_storage`, `journaldreceiver + filelogreceiver + OTTL transform`, `k8sobjectsreceiver + transform`, `prometheusreceiver` (Kueue + dcgm-exporter). Helm chart values old→new compat map with `NOTES.txt` deprecation warning. (Recipe docs landed under `docs/integrations/`; chart values compat map follows in PR-K.3.)
 3. **PR-K**: Sub-sliced into K.1/K.2/K.3 so each lands as a reviewable unit and the patterns-lib dep break is decoupled from the receiver deletes:
- - **PR-K.1** — sever `internal/synthesis/patterns/` and the replay runner from `components/receivers/k8sevents` by introducing local model types in `internal/synthesis/patterns/model.go` (the patterns lib currently imports k8sevents for its event-record shape; PR-I.2 cannot ship until this dep is severed). No deletions in this PR.
+ - **PR-K.1** (in flight — separate agent landing): sever `internal/synthesis/patterns/` and the replay runner from `components/receivers/k8sevents` by introducing local model types in `internal/synthesis/patterns/model.go` (the patterns lib currently imports k8sevents for its event-record shape; PR-I.2 cannot ship until this dep is severed). No deletions in this PR.
  - **PR-K.2** — delete `components/receivers/{clockreceiver,kernelevents,k8sevents,containerstdout}`; migrate the ~86 test fixtures (`cmd/tracecore/*_test.go` references already removed by PR-A2; remaining fixtures in `internal/pipeline` + `internal/selftelemetry` + chart `renderedConfig` golden files move to hostmetricsreceiver / filelogreceiver shapes); delete `tools/failure-inject/xidgen/` (sole consumer was kernelevents OTTL recipe validation per Open Question #5); keep `tools/failure-inject/ncclhang/` (still useful for `ncclfrreceiver` testing per Open Question #5).
  - **PR-K.3** — chart cleanup: flip `containerstdout-on-values.yaml` to filelog+container-stanza shape, delete `containerstdout-rbac.yaml` (no SA needed once container stanza reads via hostPath), delete `.github/ISSUE_TEMPLATE/component-bug-kernelevents.yml`, ship `NOTES.txt` deprecation warning for the old values keys (`receivers.clockreceiver`, `receivers.kernelevents`, `receivers.k8sevents`, `receivers.containerstdout`), remove those values keys from `values.yaml` after the one-minor deprecation window.
 M19 cross-signal join test moves to `processor/rankjoinprocessor/` integration suite against filelogreceiver + k8sobjectsreceiver inputs (lands with PR-I.2, which is gated on PR-K.1). (Kineto already deleted in PR-F per #168; PR-O retains the OTel Profiles GA re-evaluation hook.)
-4. **PR-L**: Migration guide in `docs/migration/v0.1-to-v0.2.md` covering every operator-visible change.
+4. **PR-L** (landed, skeleton #179 + body #191): Migration guide in `docs/migration/v0.1-to-v0.2.md` covering every operator-visible change. Living document — receiver-deletion / chart-flip rows fill in as PR-K lands.

 v0.3.0 sequencing:

 1. **PR-M**: Delete `components/receivers/pyspy/`, `python/tracecore_pyspy/`, `tools/pyspy-lint/` (pyspy-integration + python-publish workflows already removed pre-RFC). Ship `parca-agent` adoption recipe.
-2. **PR-N**: Security-posture migration note (`CAP_SYS_PTRACE` → `CAP_SYS_ADMIN`/`CAP_BPF`).
+2. **PR-N** (landed, #200): Security-posture migration note (`CAP_SYS_PTRACE` → `CAP_SYS_ADMIN`/`CAP_BPF`). Shipped at v0.1.0 (ahead of v0.3.0 schedule) as a doc-only update at `docs/migration/v0.2-to-v0.3.md` so operators evaluating pyspy today see the v0.3.0 migration shape up front.
 3. **PR-O**: Re-evaluate Kineto (`components/receivers/kineto/`) against OTel Profiles state. If GA: adopt + delete receiver. If still Alpha: keep deferred.

 ## References