From 4c793df14b43f2f7aefbc19d13df7775dbea9ec7 Mon Sep 17 00:00:00 2001
From: Tri Lam
Date: Sat, 30 May 2026 20:05:36 -0700
Subject: [PATCH 1/2] =?UTF-8?q?docs(pivot):=20RFC-0013=20=C2=A7migration?=
 =?UTF-8?q?=20rescope=20=E2=80=94=20PR-I=20in-repo=20submodule?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Headline: PR-I is now an in-repo Go submodule at module/ (not external
tracecoreai/tracecore-components repo). Open-source project — one fork,
one CI, one issue tracker, one DCO. Go submodule tags give independent
version line; OCB gomod: + replaces: ./module resolves identical to
external repo.

Three sequencing findings from adversarial scoping of PR-B / PR-F / PR-I:

- New PR-A2 introduced as sequencing gate (switch cmd/tracecore to
  OCB-generated main; precondition for PR-B2 / PR-F / PR-I).
- PR-B splits into PR-B1 (port nccl_fr off internal/selftelemetry +
  internal/runtime/lifecycle; helpers travel as siblings using
  receiver-scoped MeterProvider — instrument names
  otelcol_receiver_nccl_fr_* cannot collide with pipeline-runtime's own
  otelcol_* namespace) + PR-B2 (port off internal/pipeline +
  internal/consumer after PR-A2 lands).
- PR-I subdivides into PR-I.1 (move nccl_fr into
  module/receiver/ncclfrreceiver/ after PR-B2) + PR-I.2 (build
  rankjoinprocessor + patterndetectorprocessor as net-new processors
  wrapping internal/synthesis/patterns/ after PR-K severs the k8sevents
  dep).

Adversarial-review fixes (independent reviewer on diff):

- RFC v0.1.0 sequence renumbered 1-10 (was 1,2,3,4,3,4,5,8,9,10).
- PR-B1 metric-namespace handling spelled out explicitly to prevent
  collision with upstream service/telemetry otelcol_* namespace.
- Migration guide moat-components row reworded with explicit
  current-state vs future-state ("currently in components/... will live
  in module/...").
- CHANGELOG entry split from 253-word paragraph into headline + 3
  bullets + file-list.

Co-Authored-By: Claude Opus 4.7 (1M context)
Signed-off-by: Tri Lam
---
 CHANGELOG.md                         | 10 ++++++++
 docs/STRATEGY.md                     | 12 +++++-----
 docs/migration/v0.1-to-v0.2.md       |  4 ++--
 docs/rfcs/0013-distro-first-pivot.md | 36 +++++++++++++++-------------
 docs/rfcs/README.md                  |  4 ++--
 5 files changed, 39 insertions(+), 27 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 944245cd..f840e14c 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -15,6 +15,16 @@ Pivot landed across three waves of PRs:
 
 Remaining v0.1.0 work: PR-F (delete `internal/{componentstatus,selftelemetry,telemetry}` + `components/receivers/{dcgm,kueue}`) deferred — chart default pipeline hardwires the to-be-deleted receivers, so deletion happens together with the v0.2.0 recipe migration (PR-K) to avoid an interim chart break. `clockreceiver` source deletion also part of PR-K per PR-E rationale.
 
+**RFC-0013 §migration rescoped (doc-only).** Headline: **PR-I is now an in-repo Go submodule at `module/`, not an external `tracecoreai/tracecore-components` repo.** Open-source project — one fork, one CI, one issue tracker, one DCO wins. Go submodule tags give independent version line; OCB `gomod:` + `replaces: ./module` for dev-loop resolves identical to external repo.
+
+Three sequencing findings from adversarial scoping of PR-B / PR-F / PR-I:
+
+- **New PR-A2 introduced as sequencing gate.** Switching `cmd/tracecore` to OCB-generated main + deleting legacy boot wiring is the precondition for PR-B2 / PR-F / PR-I — they cannot delete or rewire `internal/pipeline` + `internal/consumer` while the legacy boot path is the live wiring.
+- **PR-B splits into PR-B1 + PR-B2.** B1 (lands now): port nccl_fr off `internal/selftelemetry` + `internal/runtime/lifecycle`; helpers travel with the receiver as siblings using a receiver-scoped `MeterProvider` (instrument names `otelcol_receiver_nccl_fr_*` — receiver-scoped meter cannot collide with pipeline-runtime's own `otelcol_*` namespace).
B2 (lands with or after PR-A2): mechanical import swap off `internal/pipeline` + `internal/consumer` to upstream `go.opentelemetry.io/collector/{component,receiver,consumer,pipeline}`. +- **PR-I subdivides into PR-I.1 + PR-I.2.** I.1: move surviving `nccl_fr` into `module/receiver/ncclfrreceiver/` after PR-B2 cleans its imports. I.2: build `rankjoinprocessor` + `patterndetectorprocessor` as net-new OTel processors wrapping `internal/synthesis/patterns/` logic, after PR-K severs the k8sevents dep that the pattern engine currently imports. + +Files updated in this PR: `docs/rfcs/0013-distro-first-pivot.md` (§1, §6, §migration §4 v0.2.0 row, §migration §OQ #2, §migration v0.1.0 sequence renumbered 1-10 with PR-A1 / PR-A2 / PR-B1 / PR-B2 / PR-C / PR-D / PR-E / PR-F / PR-G / PR-H, §migration v0.2.0 PR-I body rewritten with in-repo layout + sub-sequencing), `docs/migration/v0.1-to-v0.2.md` (moat-components row reworded with explicit current-state vs future-state), `docs/rfcs/README.md` (new-receiver pointer), `docs/STRATEGY.md` (moat-location framing). + **PR-B reframed: self-tel metric rename (`tracecore.*` → `otelcol_*`) is a side-effect of the binary swap, not a caller rewrite.** Investigation found that `service/telemetry` + `componentstatus` upstream APIs are not drop-in replacements for the `IncError`/`IncEmissions`/`ObserveLatency`/`SetDegraded`/`MarkActivity` surface that `internal/selftelemetry/` provides today — the standard `otelcol_*` metrics RFC-0013 §2 promises are emitted by upstream `receiver/scraperhelper`, `exporter/exporterhelper`, and the OCB-generated pipeline runtime, NOT by `componentstatus` (which is a status-event surface). The rename therefore arrives automatically once PR-A's OCB binary boots with upstream receivers and PR-F deletes the in-tree receivers; no caller rewrite is needed in between. RFC-0013 §migration PR-B is collapsed into PR-F; the standalone PR-B step is documentation-only and lives in this CHANGELOG entry. 
**PR-D landed: production container-image build moved to [`ko`](https://ko.build).** Root `Dockerfile` deleted; `.ko.yaml` at the repo root pins `gcr.io/distroless/static-debian12:nonroot` by digest, `defaultPlatforms: linux/{amd64,arm64}`, and the `cmd/tracecore` ldflags that match `.goreleaser.yaml` so the in-image binary is shape-identical to the goreleaser-published archive's binary. New `ko-publish` job in `.github/workflows/release.yml` runs after `goreleaser`, builds the multi-arch image, pushes by digest to `ghcr.io/tracecoreai/tracecore` (matching the chart default `image.repository` — no chart values change required), tags `:` plus `:latest` on stable releases only (no `-` in SemVer pre-release), cosign-signs the manifest keyless against the workflow's OIDC identity, and pushes a `actions/attest-build-provenance` attestation bound to the image digest into the registry. `scripts/base-digest-check.sh` now reads the base-image pin from `.ko.yaml::defaultBaseImage` instead of the deleted `Dockerfile`. Operator-visible: chart pull path unchanged; verification flow now uses `cosign verify @` + `gh attestation verify oci://@` (docs/reproducibility.md walkthrough updates carry-forward). The chart-local kind-CI reference `install/kubernetes/tracecore/Dockerfile` (used by `.github/workflows/{chart,install-bench}.yml`) is preserved — it builds a self-contained CI image without depending on the production build path. diff --git a/docs/STRATEGY.md b/docs/STRATEGY.md index f146a9db..946f3390 100644 --- a/docs/STRATEGY.md +++ b/docs/STRATEGY.md @@ -128,7 +128,7 @@ patterns *that are expensive to add and additive later*. **`tracecoreai/tracecore` (this repo):** - `builder-config.yaml` - OCB manifest pinning upstream + contrib + - `tracecore-components` versions per release cycle. + `module/` submodule tag versions per release cycle. - `install/kubernetes/tracecore/` - Helm chart + OTTL normalization layer (the customer-stable contract in RFC-0013 §3). 
- `docs/integrations/` - bundled recipes wiring upstream receivers @@ -137,8 +137,8 @@ patterns *that are expensive to add and additive later*. - `.github/workflows/release.yml` - goreleaser + slsa-github-generator + cosign-installer + sbom-action integration glue. -**`tracecoreai/tracecore-components` (separate repo, separate Go -module):** +**`module/` (in-repo Go submodule, +path `github.com/tracecoreai/tracecore/module`):** - `receiver/ncclfrreceiver/` - moat scope #1. - `processor/rankjoinprocessor/` - moat scope #2 (windowed cross-signal join). @@ -153,7 +153,7 @@ posture): (versions pinned per release cycle); the components moat evolves at customer-driven cadence (new patterns, parser fixes). Separating the modules lets each move independently. -- `tracecore-components` is the upstream-contribution staging ground: +- `module/` submodule is the upstream-contribution staging ground: when upstream OTel-contrib accepts a component, it leaves this repo with no ripple into the distro skeleton's `go.mod`. @@ -224,8 +224,8 @@ harness lands in M5. **Default answer: don't add a receiver to this repo.** Per RFC-0013 §6, in-house code is bounded to the four moat scopes. New -receivers/processors land in the separate `tracecoreai/tracecore-components` -Go module (which graduates components to upstream OTel-contrib when +receivers/processors land in the in-repo `module/` Go submodule +(which graduates components to upstream OTel-contrib when mature). Only components matching a moat scope live in-tree. When a new component is genuinely in scope (e.g. a second cross-signal diff --git a/docs/migration/v0.1-to-v0.2.md b/docs/migration/v0.1-to-v0.2.md index 06d1dabe..c59e197a 100644 --- a/docs/migration/v0.1-to-v0.2.md +++ b/docs/migration/v0.1-to-v0.2.md @@ -20,7 +20,7 @@ v0.2.0 completes the RFC-0013 receiver swap. 
The in-tree custom receivers for ke | Kueue scheduler metrics | `kueue` (in-tree, never shipped) | `prometheusreceiver` recipe with bearer-token + TLS | Opt-in via `kueue.recipe: prometheus`. | | Heartbeat / install-bench primitive | `clockreceiver` (in-tree, chart default) | `hostmetricsreceiver` (loadscraper @ 1s, upstream OCB-bundled) | v0.1.x bench already swapped (PR-E). v0.2.0 flips the chart default — set `receivers.hostmetrics.enabled: true` + `receivers.clockreceiver.enabled: false` if you want to track the new default before the chart-default flip; otherwise no action until v0.2.0. `NOTES.txt` will surface a deprecation warning for one minor after the flip. | | Kineto profiler | `kineto` (in-tree, deferred) | Deferred until OTel Profiles GA | No action; re-evaluation when contrib ships `pprofreceiver`. | -| `tracecoreai/tracecore-components` module | Lives inside this repo | Separate Go module pulled via OCB `gomod:` | No operator action. Module split is internal. | +| Moat components (nccl_fr + pattern engine) | Currently in `components/receivers/nccl_fr/` + `internal/synthesis/patterns/` under the single repo-root `go.mod` | Will live in an in-repo Go submodule at `module/` (path `github.com/tracecoreai/tracecore/module`) pulled via OCB `gomod:` + dev-loop `replaces: ./module` | No operator action. Submodule split is internal — same repo, same fork, same CI; OCB builds via `gomod:` like any other upstream module. | | Helm values keys | Per-receiver `.*` | Per-receiver `.recipe: ` + per-recipe stanzas | One-minor compat. Migrate by setting `.recipe: upstream` per receiver. | ## What's NOT changing @@ -51,7 +51,7 @@ If recipe-toggle rollback doesn't help, pin the chart and image at the prior v0. 
## Open items (fill in as PRs land) -- [ ] PR-I (separate `tracecoreai/tracecore-components` module) — link +- [ ] PR-I (in-repo Go submodule extraction at `module/`) — link - [ ] PR-J (ship recipes for filelog + journald + k8sobjects + prometheus) — link - [ ] PR-K (delete in-tree receivers) — link - [ ] PR-L (this guide, full body) — link diff --git a/docs/rfcs/0013-distro-first-pivot.md b/docs/rfcs/0013-distro-first-pivot.md index ce3a719c..bcb598d7 100644 --- a/docs/rfcs/0013-distro-first-pivot.md +++ b/docs/rfcs/0013-distro-first-pivot.md @@ -7,7 +7,7 @@ ## Summary -Tracecore pivots from a build-first OpenTelemetry Collector to a **distribution-first** posture. The binary is assembled via the OpenTelemetry Collector Builder (OCB) from upstream + contrib components plus a thin `tracecore-components` Go module containing only the parts that have no upstream equivalent (cross-signal pattern detectors, OTTL processors with windowed semantics, NCCL FlightRecorder parsing). Seven in-tree receivers and three internal packages are deleted. Operator-facing telemetry contracts (attribute names, event hint enum, NCCL span schema) are preserved across the cut so downstream alerts survive the swap. Upstream contributions are first-class: when adoption surfaces a gap, tracecore patches upstream rather than forks. +Tracecore pivots from a build-first OpenTelemetry Collector to a **distribution-first** posture. The binary is assembled via the OpenTelemetry Collector Builder (OCB) from upstream + contrib components plus a thin in-repo Go submodule at `module/` (path `github.com/tracecoreai/tracecore/module`) containing only the parts that have no upstream equivalent (cross-signal pattern detectors, OTTL processors with windowed semantics, NCCL FlightRecorder parsing). Seven in-tree receivers and three internal packages are deleted. 
Operator-facing telemetry contracts (attribute names, event hint enum, NCCL span schema) are preserved across the cut so downstream alerts survive the swap. Upstream contributions are first-class: when adoption surfaces a gap, tracecore patches upstream rather than forks. ## Motivation @@ -43,15 +43,15 @@ receivers: - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/k8sobjectsreceiver v0.110.0 - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver v0.110.0 - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/telemetrygeneratorreceiver v0.110.0 - - gomod: github.com/tracecoreai/tracecore-components/receiver/ncclfrreceiver v0.1.0 + - gomod: github.com/tracecoreai/tracecore/module/receiver/ncclfrreceiver v0.1.0 processors: - gomod: go.opentelemetry.io/collector/processor/batchprocessor v0.110.0 - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/processor/transformprocessor v0.110.0 - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/processor/filterprocessor v0.110.0 - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sattributesprocessor v0.110.0 - - gomod: github.com/tracecoreai/tracecore-components/processor/rankjoinprocessor v0.1.0 - - gomod: github.com/tracecoreai/tracecore-components/processor/patterndetectorprocessor v0.1.0 + - gomod: github.com/tracecoreai/tracecore/module/processor/rankjoinprocessor v0.1.0 + - gomod: github.com/tracecoreai/tracecore/module/processor/patterndetectorprocessor v0.1.0 exporters: - gomod: go.opentelemetry.io/collector/exporter/otlphttpexporter v0.110.0 @@ -65,7 +65,7 @@ extensions: - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/extension/zpagesextension v0.110.0 ``` -The `tracecoreai/tracecore-components` Go module (separate repo) contains only: +The `github.com/tracecoreai/tracecore/module` Go submodule (lives in-repo at `module/` with its own `go.mod`; root 
`go.work` lists both `.` and `./module` so dev builds resolve without publishing; OCB consumes via `gomod:` + `replaces: ./module` per §migration PR-I) contains only: - `receiver/ncclfrreceiver/` — NCCL FlightRecorder cross-rank parser - `processor/rankjoinprocessor/` — windowed cross-signal join (5s eviction window) - `processor/patterndetectorprocessor/` — pattern engine + replay corpus @@ -117,7 +117,7 @@ The bundled Helm chart ships these mappings as a default OTTL pipeline. Operator | Release | Operator-visible breaks | Internal changes | |---|---|---| | **v0.1.0** | self-tel metric rename `tracecore.*` → `otelcol_*`; release-artifact provenance shape change (documented once) | OCB skeleton; upstream `componentstatus` adoption; release pipeline → goreleaser stack; image build → `ko`; bench heartbeat swap `clockreceiver` → `hostmetricsreceiver` in `bench/install/tracecore-values.yaml` (chart default + source delete deferred to v0.2.0 / PR-K per PR-E rationale) | -| **v0.2.0** | ALL recipe-side receiver swaps + chart-default flip from `clockreceiver` to `hostmetrics`. ONE migration guide. Helm values keys map old→new for one minor release with `NOTES.txt` deprecation warning. | Delete `components/receivers/{kernelevents,k8sevents,dcgm,kueue,clockreceiver,containerstdout}`; ship recipes for `filelog+container`, `journald+filelog+OTTL`, `k8sobjects+transform`, `prometheusreceiver` (dcgm-exporter + Kueue); receivers-only Go module split | +| **v0.2.0** | ALL recipe-side receiver swaps + chart-default flip from `clockreceiver` to `hostmetrics`. ONE migration guide. Helm values keys map old→new for one minor release with `NOTES.txt` deprecation warning. 
| Delete `components/receivers/{kernelevents,k8sevents,dcgm,kueue,clockreceiver,containerstdout}`; ship recipes for `filelog+container`, `journald+filelog+OTTL`, `k8sobjects+transform`, `prometheusreceiver` (dcgm-exporter + Kueue); moat code (nccl_fr + pattern engine) extracted to in-repo Go submodule at `module/` | | **v0.3.0** | Python profiling: `tracecore_pyspy` PyPI helper deleted; operator deploys `parca-agent` DaemonSet via separate chart; security posture changes (CAP_SYS_PTRACE → CAP_SYS_ADMIN/BPF — review window) | Delete `components/receivers/pyspy/`, `python/tracecore_pyspy/`, `tools/pyspy-lint/`, `.github/workflows/{pyspy-integration,python-publish}.yml`. Kineto re-evaluated when OTel Profiles GA. | Pre-v0.1.0 receivers with zero pilot deployments → clean delete in v0.1.0. Receivers with ≥1 pilot deployment (check D6 NPS Discussion list before cutting v0.1.0) → one-version deprecation warn → delete in next minor. Pre-v0.1.0 there is no compat owed by SemVer; this policy is operator-courtesy, not contract. @@ -140,7 +140,7 @@ When a contribution is in-flight, tracecore ships against a `replace` directive ### 6. What tracecore still builds (the moat) -After adoption, in-house code in `tracecoreai/tracecore-components` is bounded by these four scopes: +After adoption, in-house code in the in-repo `module/` submodule (path `github.com/tracecoreai/tracecore/module`) is bounded by these four scopes: 1. **`receiver/ncclfrreceiver/`** — Cross-rank NCCL FlightRecorder parsing. No OSS owner exists; this is M17 pattern-#1 (NVLink degradation) and M18 pattern-#6 input. `pkg/nccl/fr_parser/` lives alongside as the shared wire-format parser. 2. **`processor/rankjoinprocessor/`** — Windowed cross-signal join (5s default for eviction; configurable per pattern). OTTL has no windowed-join primitive; this is the smallest gap that justifies a custom processor. @@ -219,7 +219,7 @@ Customers pin `tracecore-recipes` like a library. 
Their `values.yaml` does not g ## Open questions 1. **Pilot operator count.** Before cutting v0.1.0, audit D6 NPS Discussion + integration matrix users to determine whether any operator has deployed `M9 kernelevents`, `M10 k8sevents`, `M13 pyspy Phase 2`, or `M15 containerstdout` in production. If zero: clean delete in v0.1.0. If ≥1: one-version deprecation warn (still under SemVer pre-1.0 policy, but operator-courtesy). -2. **NCCL FlightRecorder receiver homing.** The custom `ncclfrreceiver` belongs in `tracecore-components` (this is settled), but the upstream-contribution question — should we propose an OTel-contrib `ncclfrreceiver` instead — needs evaluation. NVIDIA has not picked this up; CNCF/OTel SIG may have appetite. Tracked as a v0.2 follow-up. +2. **NCCL FlightRecorder receiver homing.** The custom `ncclfrreceiver` belongs in the in-repo `module/` submodule (this is settled), but the upstream-contribution question — should we propose an OTel-contrib `ncclfrreceiver` instead — needs evaluation. NVIDIA has not picked this up; CNCF/OTel SIG may have appetite. Tracked as a v0.2 follow-up. 3. **Recipe-vs-binary versioning skew tolerance.** A recipe can pin a wider binary range than the binary supports. Need a values-validation gate in the chart that asserts `tracecore.minVersion` is satisfied. Tracked as a v0.1 follow-up. 4. **OCB build reproducibility.** Tracecore-specific: validate that OCB-generated `main.go` is byte-identical across CI runs given fixed `builder-config.yaml`. If not, file upstream issue and propose fix. Tracked as a v0.1 verification step. 5. **`tools/failure-inject/` scope.** Keep `pod-evict` (M19 fixture) and `xid` (kernel-events OTTL recipe validation); delete `cpu-steal` and `ncclhang` if no downstream consumer remains. Decide alongside v0.2.0. @@ -230,18 +230,20 @@ Release-boundary schedule in §4. PR sequencing follows. PR sequencing within v0.1.0: -1. **PR-A**: Add `builder-config.yaml`. 
`make build` switches to `builder --config=builder-config.yaml`. Old binary stays buildable via legacy target for one PR cycle to enable side-by-side testing. -2. **PR-B**: Self-telemetry adopts upstream `componentstatus` + `service/telemetry`. Metric rename `tracecore.*` → `otelcol_*` with one-line migration note in `CHANGELOG.md`. -3. **PR-C**: Release pipeline switches to goreleaser stack. Old `release.yml` archived under `.github/workflows/archived/`. -4. **PR-D**: Image build moves to `ko`. Chart `image.repository` continues to resolve. -5. **PR-E**: `clockreceiver` → `hostmetricsreceiver` (loadscraper @ 1s) in OCB manifest + bench-install Helm values. The originally-planned `telemetrygeneratorreceiver` does not exist in opentelemetry-collector-contrib at any tag (verified 2026-05-30; contrib issues #41687 and #43657 both closed `not_planned`). hostmetrics' loadscraper emits 3 low-cardinality series (`system.cpu.load_average.{1m,5m,15m}`) and satisfies the bench's "first parseable JSON line at sink" pass condition. Scope deferral: chart default stays `clockreceiver` and the in-tree source survives this PR (~92 references across `cmd/tracecore/*_test.go` + `internal/pipeline` + `internal/selftelemetry` fixtures); chart-default flip + source deletion ship as part of PR-K alongside coordinated test-fixture migration and the values-keys `NOTES.txt` deprecation cycle. -6. **PR-F**: Delete `internal/componentstatus`, `internal/selftelemetry`, `internal/telemetry`. Delete `components/receivers/{dcgm,kueue,kineto}` (none shipped real code). `clockreceiver` source deletion deferred to PR-K — see PR-E note for rationale. -7. **PR-G**: Supersede RFCs (add status headers + redirects). Move RFC-0004 to `archived/`. -8. **PR-H**: Update top-level docs (README, NORTHSTARS, STRATEGY, PRINCIPLES, MILESTONES, CHANGELOG, CONTRIBUTING, AGENTS, docs/README). +1. **PR-A1** (landed, #171): Add `builder-config.yaml`. 
`make build-ocb` produces `_build/tracecore` via OCB side-by-side with the legacy `cmd/tracecore` binary. +2. **PR-A2**: Switch `cmd/tracecore` to the OCB-generated main. Delete `cmd/tracecore/{main,components}.go` legacy wiring. After this lands, all receivers register through OCB's generated `service.Settings.Factories`, not through hand-rolled `cmd/tracecore/components.go`. **Sequencing gate**: this PR is the precondition for PR-B2 / PR-F / PR-I — they cannot delete or rewire `internal/pipeline` + `internal/consumer` while the legacy boot path is the live wiring. +3. **PR-B1**: Port `components/receivers/nccl_fr` off `internal/selftelemetry` and `internal/runtime/lifecycle`. Helpers travel with the receiver as unexported `selftel.go` + `lifecycle.go` siblings (slimmed of multi-receiver indirection — drop the noop type and the Kind canonical-set registry; keep the 8-instrument MeterProvider pattern). **Metric namespace:** helpers acquire the meter via `set.TelemetrySettings.MeterProvider.Meter("github.com/tracecoreai/tracecore/components/receivers/nccl_fr")`, NOT via the global `service/telemetry` provider — instrument names use the `otelcol_receiver_nccl_fr_*` shape (matching upstream `receiver/scraperhelper`'s `otelcol_receiver_*` convention), which lives under the receiver-scoped meter and cannot collide with the pipeline-runtime's own `otelcol_*` namespace. No second rename in PR-I; helper-emitted metrics are byte-identical pre/post extraction. **Unblocks** PR-F's `internal/selftelemetry` delete (when paired with the other receivers' ports / deletions). +4. **PR-B2** (lands with or after PR-A2): Port `components/receivers/nccl_fr` off `internal/pipeline` and `internal/consumer` to `go.opentelemetry.io/collector/{component,receiver,consumer,pipeline}`. Mechanical import swap (~60 LOC) but requires the binary boot path to already be OCB-driven; otherwise nccl_fr has nowhere to register. Natural prelude to PR-I. +5. 
**PR-C** (landed, #174): Release pipeline switches to goreleaser stack. Old `release.yml` archived under `.github/workflows/archived/`. +6. **PR-D** (landed, #176): Image build moves to `ko`. Chart `image.repository` continues to resolve. +7. **PR-E** (landed, #180): `clockreceiver` → `hostmetricsreceiver` (loadscraper @ 1s) in OCB manifest + bench-install Helm values. The originally-planned `telemetrygeneratorreceiver` does not exist in opentelemetry-collector-contrib at any tag (verified 2026-05-30; contrib issues #41687 and #43657 both closed `not_planned`). hostmetrics' loadscraper emits 3 low-cardinality series (`system.cpu.load_average.{1m,5m,15m}`) and satisfies the bench's "first parseable JSON line at sink" pass condition. Scope deferral: chart default stays `clockreceiver` and the in-tree source survives this PR (~92 references across `cmd/tracecore/*_test.go` + `internal/pipeline` + `internal/selftelemetry` fixtures); chart-default flip + source deletion ship as part of PR-K alongside coordinated test-fixture migration and the values-keys `NOTES.txt` deprecation cycle. +8. **PR-F** (lands after PR-A2 + PR-B1): Delete `internal/componentstatus`, `internal/selftelemetry`, `internal/telemetry`. Delete `components/receivers/dcgm` (cgo stub never shipped real code; kueue + kineto already deleted in #168). `clockreceiver` source deletion deferred to PR-K — see PR-E note for rationale. +9. **PR-G**: Supersede RFCs (add status headers + redirects). Move RFC-0004 to `archived/`. +10. **PR-H**: Update top-level docs (README, NORTHSTARS, STRATEGY, PRINCIPLES, MILESTONES, CHANGELOG, CONTRIBUTING, AGENTS, docs/README). v0.2.0 sequencing: -1. **PR-I**: Receivers-only Go module `tracecoreai/tracecore-components` extracted. Migrate `ncclfr` + pattern detectors + custom OTTL processors. OCB manifest pulls via `gomod:`. +1. **PR-I**: Extract the moat into a **Go submodule inside this repo** at `module/` (not an external repo). 
Layout: `module/go.mod` declaring `module github.com/tracecoreai/tracecore/module`, with `module/receiver/ncclfrreceiver/`, `module/processor/rankjoinprocessor/`, `module/processor/patterndetectorprocessor/`, and `module/pkg/nccl/fr_parser/` (the shared wire-format parser per §6.1). Root `go.work` lists both `.` and `./module` so dev builds resolve without publishing. Tag scheme `module/vX.Y.Z` (Go submodule prefix). builder-config.yaml adds three `gomod:` entries pointing at `github.com/tracecoreai/tracecore/module/{receiver/ncclfrreceiver,processor/rankjoinprocessor,processor/patterndetectorprocessor}` and a `replaces:` block pointing at `./module` for the dev cycle. **Rationale for in-repo submodule (vs external repo)**: single fork, one CI, one issue tracker, one DCO, one PR for cross-cutting changes; Go submodule tags give independent version line; OCB `gomod:` resolves submodules identical to external repos; no operational driver (different maintainer set / different release cadence submodule tags can't solve / license incompatibility) exists. Sub-sequencing: **PR-I.1** moves the surviving `nccl_fr` receiver into `module/receiver/ncclfrreceiver/` (after PR-B2 cleans its imports) + extracts the wire parser to `module/pkg/nccl/fr_parser/`. **PR-I.2** introduces `rankjoinprocessor` and `patterndetectorprocessor` as net-new OTel processors wrapping `internal/synthesis/patterns/` logic (which currently lives as a verdict library and imports `components/receivers/k8sevents`; the k8sevents dep is severed by PR-K's deletion of k8sevents, so PR-I.2 lands after PR-K). 2. **PR-J**: Ship recipes: `filelogreceiver + container stanza + file_storage`, `journaldreceiver + filelogreceiver + OTTL transform`, `k8sobjectsreceiver + transform`, `prometheusreceiver` (Kueue + dcgm-exporter). Helm chart values old→new compat map with `NOTES.txt` deprecation warning. 3. 
**PR-K**: Delete `components/receivers/{clockreceiver,kernelevents,k8sevents,containerstdout}` together with the test-fixture migration (cmd/tracecore/*_test.go + internal/pipeline + internal/selftelemetry fixtures move to hostmetricsreceiver / filelogreceiver shapes). Flip chart default from `clockreceiver` to `hostmetrics` with `NOTES.txt` deprecation warning. Delete `.github/workflows/kernelevents-integration.yml`. Delete `.github/ISSUE_TEMPLATE/component-bug-kernelevents.yml`. M19 cross-signal join test moves to `processor/rankjoinprocessor/` integration suite against filelogreceiver + k8sobjectsreceiver inputs. (Kineto already deleted in PR-F per #168; PR-O retains the OTel Profiles GA re-evaluation hook.) 4. **PR-L**: Migration guide in `docs/migration/v0.1-to-v0.2.md` covering every operator-visible change. diff --git a/docs/rfcs/README.md b/docs/rfcs/README.md index 1c3c4709..e7da9a62 100644 --- a/docs/rfcs/README.md +++ b/docs/rfcs/README.md @@ -39,7 +39,7 @@ to `NNNN-short-title.md` with the next free number. ## When to read which - **Onboarding to the architecture:** [0001](0001-architecture-overview.md), then [0003](0003-pipeline-runtime-and-component-contract.md). -- **Writing a new receiver:** Per RFC-0013, new receivers land in [`tracecoreai/tracecore-components`](https://github.com/tracecoreai/tracecore-components) outside this repo. Historical references: [0004](archived/0004-clockreceiver-stdoutexporter.md) (canonical contract surface, archived), [0005](0005-dcgm-receiver-scope.md), [0007](0007-kernelevents-receiver-scope.md). +- **Writing a new receiver:** Per RFC-0013, new receivers land in the in-repo Go submodule at `module/` (path `github.com/tracecoreai/tracecore/module`). Historical references: [0004](archived/0004-clockreceiver-stdoutexporter.md) (canonical contract surface, archived), [0005](0005-dcgm-receiver-scope.md), [0007](0007-kernelevents-receiver-scope.md). 
- **Touching self-telemetry / `/metrics`:** [0006](0006-self-telemetry-surface.md). - **Considering a fork or out-of-tree split:** [0002](0002-own-binary-vs-otel-contrib.md). - **Adding a dependency or thinking about update semantics:** [0008](0008-auto-update-boundary.md) (the boundary that pins the no-in-binary-update default and the depguard / grep gates that enforce it). @@ -97,4 +97,4 @@ Receivers register via the repo-root `components.yaml` + `make generate`: 2. Run `make generate` to regenerate `cmd/tracecore/components.go` from the YAML. Do NOT hand-edit the generated file — the header carries a `DO NOT EDIT` warning. 3. The generator (`tools/components-gen/main.go`) emits an import alias and a factory map entry like `pipeline.MustNewType(""): .NewFactory()`. -Historical: under the pre-RFC-0013 architecture, the receiver's factory function lived in `components/receivers//factory.go` per the [`0004-clockreceiver-stdoutexporter.md`](archived/0004-clockreceiver-stdoutexporter.md) contract. Per RFC-0013, new receivers live in the `tracecoreai/tracecore-components` Go module, registered via the OCB `builder-config.yaml` manifest. +Historical: under the pre-RFC-0013 architecture, the receiver's factory function lived in `components/receivers//factory.go` per the [`0004-clockreceiver-stdoutexporter.md`](archived/0004-clockreceiver-stdoutexporter.md) contract. Per RFC-0013, new receivers live in the in-repo Go submodule at `module/` (path `github.com/tracecoreai/tracecore/module`), registered via the OCB `builder-config.yaml` manifest. 
From 182c64121a4b63e0c588418a1c11a0ccd2456d6a Mon Sep 17 00:00:00 2001 From: Tri Lam Date: Sat, 30 May 2026 20:09:15 -0700 Subject: [PATCH 2/2] =?UTF-8?q?docs(pivot):=20adversarial-review-2=20fixes?= =?UTF-8?q?=20=E2=80=94=20CHANGELOG=20L7=20+=20metric=20name?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Second adversarial review on PR #181 diff found: - CHANGELOG.md:7 stale `tracecoreai/tracecore-components` ref in the Unreleased preamble (was pre-existing, not in commit 4c793df's change scope but inconsistent with the rescope headline below). Rewrite to in-repo `module/` submodule wording. - RFC §migration PR-B1 metric naming used `otelcol_receiver_nccl_fr_*` with the underscore. OCB upstream convention is `otelcol_receiver_*`, where the segment after `otelcol_receiver_` is the receiver package name without underscores. Per project memory the rename is `nccl_fr → ncclfr`; align the RFC metric naming to `otelcol_receiver_ncclfr_*` so the helpers emit the right shape from day one and no rename is needed at PR-I. Co-Authored-By: Claude Opus 4.7 (1M context) Signed-off-by: Tri Lam --- CHANGELOG.md | 2 +- docs/rfcs/0013-distro-first-pivot.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index f840e14c..369cf23f 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -4,7 +4,7 @@ User-visible changes are documented here. Format: [Keep a Changelog](https://kee ## [Unreleased] -Pre-alpha. **Distribution-first pivot adopted ([RFC-0013](docs/rfcs/0013-distro-first-pivot.md))** - binary now assembled via the OpenTelemetry Collector Builder (OCB) from upstream + contrib components plus a thin `tracecoreai/tracecore-components` module containing only the moat (NCCL FlightRecorder receiver, OTTL processors with windowed semantics, pattern detectors).
The M1 in-tree pipeline runtime + factory-based assembly is queued for deletion at v0.1.0 in favor of the OCB-generated boot path; the canonical `clockreceiver` + `stdoutexporter` examples ship for one PR cycle and then exit. Targeting v0.1.0 / v0.2.0 / v0.3.0 release boundaries per RFC-0013 §4. +Pre-alpha. **Distribution-first pivot adopted ([RFC-0013](docs/rfcs/0013-distro-first-pivot.md))** - binary now assembled via the OpenTelemetry Collector Builder (OCB) from upstream + contrib components plus a thin in-repo Go submodule at `module/` (path `github.com/tracecoreai/tracecore/module`) containing only the moat (NCCL FlightRecorder receiver, OTTL processors with windowed semantics, pattern detectors). The M1 in-tree pipeline runtime + factory-based assembly is queued for deletion at v0.1.0 in favor of the OCB-generated boot path; the canonical `clockreceiver` + `stdoutexporter` examples ship for one PR cycle and then exit. Targeting v0.1.0 / v0.2.0 / v0.3.0 release boundaries per RFC-0013 §4. Pivot landed across three waves of PRs: - Wave 1 (#166 RFC doc accepted, #168 delete kueue + kineto receivers, #169 pre-PR-A drift sweep + Helm security tighten, #170 containerstdout deletion explicit in §7, #171 PR-A OCB skeleton + `builder-config.yaml` + `make build-ocb`, #172 dedup gate execution, #173 rename check tiers + add PR body-artifact guard, #174 PR-C release pipeline → goreleaser stack + RFC supersession + top-level doc alignment, #175 wave-1 self-review fixes + delete archive folder). diff --git a/docs/rfcs/0013-distro-first-pivot.md b/docs/rfcs/0013-distro-first-pivot.md index bcb598d7..70fc3ed0 100644 --- a/docs/rfcs/0013-distro-first-pivot.md +++ b/docs/rfcs/0013-distro-first-pivot.md @@ -232,7 +232,7 @@ PR sequencing within v0.1.0: 1. **PR-A1** (landed, #171): Add `builder-config.yaml`. `make build-ocb` produces `_build/tracecore` via OCB side-by-side with the legacy `cmd/tracecore` binary. 2. **PR-A2**: Switch `cmd/tracecore` to the OCB-generated main. 
Delete `cmd/tracecore/{main,components}.go` legacy wiring. After this lands, all receivers register through OCB's generated `service.Settings.Factories`, not through hand-rolled `cmd/tracecore/components.go`. **Sequencing gate**: this PR is the precondition for PR-B2 / PR-F / PR-I — they cannot delete or rewire `internal/pipeline` + `internal/consumer` while the legacy boot path is the live wiring. -3. **PR-B1**: Port `components/receivers/nccl_fr` off `internal/selftelemetry` and `internal/runtime/lifecycle`. Helpers travel with the receiver as unexported `selftel.go` + `lifecycle.go` siblings (slimmed of multi-receiver indirection — drop the noop type and the Kind canonical-set registry; keep the 8-instrument MeterProvider pattern). **Metric namespace:** helpers acquire the meter via `set.TelemetrySettings.MeterProvider.Meter("github.com/tracecoreai/tracecore/components/receivers/nccl_fr")`, NOT via the global `service/telemetry` provider — instrument names use the `otelcol_receiver_nccl_fr_*` shape (matching upstream `receiver/scraperhelper`'s `otelcol_receiver_*` convention), which lives under the receiver-scoped meter and cannot collide with the pipeline-runtime's own `otelcol_*` namespace. No second rename in PR-I; helper-emitted metrics are byte-identical pre/post extraction. **Unblocks** PR-F's `internal/selftelemetry` delete (when paired with the other receivers' ports / deletions). +3. **PR-B1**: Port `components/receivers/nccl_fr` off `internal/selftelemetry` and `internal/runtime/lifecycle`. Helpers travel with the receiver as unexported `selftel.go` + `lifecycle.go` siblings (slimmed of multi-receiver indirection — drop the noop type and the Kind canonical-set registry; keep the 8-instrument MeterProvider pattern). 
**Metric namespace:** helpers acquire the meter via `set.TelemetrySettings.MeterProvider.Meter("github.com/tracecoreai/tracecore/components/receivers/nccl_fr")`, NOT via the global `service/telemetry` provider; instrument names use the `otelcol_receiver_ncclfr_*` shape (matching upstream `receiver/scraperhelper`'s `otelcol_receiver_*` convention, where the segment after `otelcol_receiver_` is the OCB receiver package name without underscores, i.e. `ncclfr` per the `nccl_fr → ncclfr` rename tracked in project memory). The receiver-scoped meter cannot collide with the pipeline-runtime's own `otelcol_*` namespace. No second rename in PR-I; helper-emitted metrics are byte-identical pre/post extraction. **Unblocks** PR-F's `internal/selftelemetry` delete (when paired with the other receivers' ports / deletions). 4. **PR-B2** (lands with or after PR-A2): Port `components/receivers/nccl_fr` off `internal/pipeline` and `internal/consumer` to `go.opentelemetry.io/collector/{component,receiver,consumer,pipeline}`. Mechanical import swap (~60 LOC) but requires the binary boot path to already be OCB-driven; otherwise nccl_fr has nowhere to register. Natural prelude to PR-I. 5. **PR-C** (landed, #174): Release pipeline switches to goreleaser stack. Old `release.yml` archived under `.github/workflows/archived/`. 6. **PR-D** (landed, #176): Image build moves to `ko`. Chart `image.repository` continues to resolve.
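
Reviewer note: the naming rule in the PR-B1 text (prefix `otelcol_receiver_`, then the receiver package name with underscores removed, then an instrument suffix) can be sketched in plain Go. This is a minimal illustration under stated assumptions, not the helper code from `selftel.go`: the function names `normalizeReceiverName` and `instrumentName` and the suffix `records_total` are hypothetical, and acquiring the meter from `TelemetrySettings` is not covered.

```go
package main

import (
	"fmt"
	"strings"
)

// receiverMetricPrefix is the shared prefix named in the PR-B1 text.
const receiverMetricPrefix = "otelcol_receiver_"

// normalizeReceiverName is a hypothetical helper: it removes underscores so a
// directory-style name such as "nccl_fr" becomes the package-style "ncclfr".
func normalizeReceiverName(name string) string {
	return strings.ReplaceAll(name, "_", "")
}

// instrumentName is a hypothetical helper that joins the prefix, the
// normalized receiver name, and an instrument suffix.
func instrumentName(receiver, suffix string) string {
	return receiverMetricPrefix + normalizeReceiverName(receiver) + "_" + suffix
}

func main() {
	// "records_total" is an illustrative suffix, not one of the receiver's
	// actual eight instruments.
	fmt.Println(instrumentName("nccl_fr", "records_total"))
	fmt.Println(instrumentName("ncclfr", "records_total"))
}
```

Both calls yield the same name, which is the property the second commit relies on: whether a caller passes the old directory name or the new package name, the emitted shape is already `otelcol_receiver_ncclfr_*`, so no rename is needed at PR-I.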