From 6fc57875e3bc7881337548ac762abf9453d5244b Mon Sep 17 00:00:00 2001 From: Tri Lam Date: Sat, 30 May 2026 19:18:40 -0700 Subject: [PATCH 1/2] =?UTF-8?q?chore(pivot):=20PR-E=20unblock=20=E2=80=94?= =?UTF-8?q?=20bench=20heartbeat=20to=20hostmetricsreceiver?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit clockreceiver -> hostmetricsreceiver (loadscraper @ 1s) in builder-config.yaml + bench/install/tracecore-values.yaml. Adds opt-in `receivers.hostmetrics` block to chart values.yaml (default disabled; chart default stays clockreceiver this release per RFC-0013 §migration PR-E rationale). The originally-planned telemetrygeneratorreceiver does not exist in opentelemetry-collector-contrib at any tag (verified 2026-05-30; contrib issues #41687 and #43657 both closed `not_planned`). hostmetrics' loadscraper emits 3 low-cardinality series (system.cpu.load_average.{1m,5m,15m}) at the cadence the bench's pass condition needs (first parseable JSON line at the sink). RFC-0013 §4 + §7 + §migration PR-E updated. CHANGELOG + migration guide row filled. Chart-default flip + clockreceiver source-deletion (~92 in-tree fixture references) deferred to PR-K alongside coordinated test-fixture migration + NOTES.txt deprecation cycle. Co-Authored-By: Claude Opus 4.7 (1M context) Signed-off-by: Tri Lam --- CHANGELOG.md | 11 ++++-- bench/install/tracecore-values.yaml | 50 ++++++++++++++++-------- builder-config.yaml | 19 +++------ docs/migration/v0.1-to-v0.2.md | 2 +- docs/rfcs/0013-distro-first-pivot.md | 16 ++++---- install/kubernetes/tracecore/values.yaml | 18 +++++++++ 6 files changed, 73 insertions(+), 43 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 954fce13..944245cd 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -6,11 +6,14 @@ User-visible changes are documented here. Format: [Keep a Changelog](https://kee Pre-alpha. 
**Distribution-first pivot adopted ([RFC-0013](docs/rfcs/0013-distro-first-pivot.md))** - binary now assembled via the OpenTelemetry Collector Builder (OCB) from upstream + contrib components plus a thin `tracecoreai/tracecore-components` module containing only the moat (NCCL FlightRecorder receiver, OTTL processors with windowed semantics, pattern detectors). The M1 in-tree pipeline runtime + factory-based assembly is queued for deletion at v0.1.0 in favor of the OCB-generated boot path; the canonical `clockreceiver` + `stdoutexporter` examples ship for one PR cycle and then exit. Targeting v0.1.0 / v0.2.0 / v0.3.0 release boundaries per RFC-0013 §4. -Pivot landed across two waves of PRs: +Pivot landed across three waves of PRs: - Wave 1 (#166 RFC doc accepted, #168 delete kueue + kineto receivers, #169 pre-PR-A drift sweep + Helm security tighten, #170 containerstdout deletion explicit in §7, #171 PR-A OCB skeleton + `builder-config.yaml` + `make build-ocb`, #172 dedup gate execution, #173 rename check tiers + add PR body-artifact guard, #174 PR-C release pipeline → goreleaser stack + RFC supersession + top-level doc alignment, #175 wave-1 self-review fixes + delete archive folder). -- Wave 2 (#176 PR-D image build → ko + `_build/` walker fix + PR-B reframe as side-effect of binary swap). +- Wave 2 (#176 PR-D image build → ko + `_build/` walker fix + PR-B reframe as side-effect of binary swap, #177 build-ocb CI gate, #178 post-wave-2 drift sweep, #179 v0.1→v0.2 migration guide skeleton). +- Wave 3 (PR-E: bench heartbeat swap `clockreceiver` → `hostmetricsreceiver`). -Remaining v0.1.0 work: PR-E (clockreceiver → telemetrygenerator) currently BLOCKED — upstream `telemetrygeneratorreceiver` doesn't exist at any version. 
PR-F (delete `internal/{componentstatus,selftelemetry,telemetry}` + `components/receivers/{clockreceiver,dcgm,kueue}`) deferred — chart default pipeline hardwires the to-be-deleted receivers, so deletion happens together with the v0.2.0 recipe migration (PR-K) to avoid an interim chart break. +**PR-E unblocked.** Original RFC-0013 §migration plan named `telemetrygeneratorreceiver` as the upstream replacement for `clockreceiver`. Verified 2026-05-30: the receiver does not exist in `opentelemetry-collector-contrib` at any tag from v0.95.0 through v0.130.0; two community proposals (contrib issues #41687 and #43657) were closed `not_planned`. Replacement landed on `hostmetricsreceiver` (loadscraper @ 1s) — an upstream OCB-bundled receiver that emits 3 low-cardinality series (`system.cpu.load_average.{1m,5m,15m}`) at the cadence the bench's pass condition needs (first parseable JSON line at the sink — see `bench/install/run.sh`). This PR adds `hostmetricsreceiver` to `builder-config.yaml`, adds a `receivers.hostmetrics` opt-in block to the chart values (default disabled — chart default stays `clockreceiver` this release), and flips `bench/install/tracecore-values.yaml` to enable hostmetrics + disable clockreceiver. RFC-0013 §migration PR-E + §4 + §7 deletion table updated. Chart-default flip from `clockreceiver` to `hostmetrics` + source-deletion of `components/receivers/clockreceiver/` are deferred to PR-K (in-tree-receiver deletion wave) so the values-keys migration ships together with `NOTES.txt` deprecation warnings and the coordinated migration of ~92 in-tree test-fixture references in one cut rather than two operator-visible changes. + +Remaining v0.1.0 work: PR-F (delete `internal/{componentstatus,selftelemetry,telemetry}` + `components/receivers/{dcgm,kueue}`) deferred — chart default pipeline hardwires the to-be-deleted receivers, so deletion happens together with the v0.2.0 recipe migration (PR-K) to avoid an interim chart break. 
`clockreceiver` source deletion also part of PR-K per PR-E rationale. **PR-B reframed: self-tel metric rename (`tracecore.*` → `otelcol_*`) is a side-effect of the binary swap, not a caller rewrite.** Investigation found that `service/telemetry` + `componentstatus` upstream APIs are not drop-in replacements for the `IncError`/`IncEmissions`/`ObserveLatency`/`SetDegraded`/`MarkActivity` surface that `internal/selftelemetry/` provides today — the standard `otelcol_*` metrics RFC-0013 §2 promises are emitted by upstream `receiver/scraperhelper`, `exporter/exporterhelper`, and the OCB-generated pipeline runtime, NOT by `componentstatus` (which is a status-event surface). The rename therefore arrives automatically once PR-A's OCB binary boots with upstream receivers and PR-F deletes the in-tree receivers; no caller rewrite is needed in between. RFC-0013 §migration PR-B is collapsed into PR-F; the standalone PR-B step is documentation-only and lives in this CHANGELOG entry. @@ -21,7 +24,7 @@ Remaining v0.1.0 work: PR-E (clockreceiver → telemetrygenerator) currently BLO - **Adopt > build posture replaces in-tree receivers for GPU telemetry, container stdout, kernel events, K8s events, Kueue, Python profiling, heartbeat, self-telemetry, release pipeline, and image publish.** Adoption matrix lives in [RFC-0013 §2](docs/rfcs/0013-distro-first-pivot.md#2-adoption-matrix). Vendors: NVIDIA (`dcgm-exporter`), AMD (`ROCm/device-metrics-exporter`), Intel (`intel/xpumanager`), Habana (Habana Prometheus Metric Exporter) - all scraped via upstream `prometheusreceiver`. CNCF: `filelogreceiver` + container stanza + `file_storage`; `journaldreceiver`; `k8sobjectsreceiver`; `telemetrygeneratorreceiver`. CNCF Profiles: `parca-agent` via OTLP profiles sink. Self-telemetry: upstream `componentstatus` + `service/telemetry` + standard `otelcol_*` metrics. 
- **Customer-stable telemetry contracts preserved across the pivot** via the OTTL `transform` processor in the bundled Helm-chart recipe ([RFC-0013 §3](docs/rfcs/0013-distro-first-pivot.md#3-customer-stable-telemetry-contracts)). Stable surfaces: `k8s.event.hint` 11-entry enum (pod_evicted, mount_failure, backoff, oom_killed, node_unhealthy, schedule_failure, create_failure, volume_attach_failure, container_status_unknown, node_pressure, image_pull_failure); `kernelevents.xid` (NVRM Xid code); `gpu.id` (PCI BDF); `gpu.vendor` (nvidia | amd | intel | habana - upstream-contribution target to OTel `hw.*` semconv); `gen_ai.training.rank` and `gen_ai.training.job_id` (cross-receiver join keys); NCCL FlightRecorder span schema; pattern detector outputs (M17/M18/M19). Operator alerts written against these survive the receiver swap. - **Deletions scheduled** (RFC-0013 §7): - - **v0.1.0:** `components/receivers/clockreceiver/` (→ `telemetrygeneratorreceiver`), `components/receivers/dcgm/` (cgo stub never shipped real path; → `dcgm-exporter` + `prometheusreceiver` recipe), `components/receivers/kueue/` (never shipped; → `prometheusreceiver` recipe), `internal/componentstatus/`, `internal/selftelemetry/`, `internal/telemetry/`. Hand-rolled `.github/workflows/release.yml` rewritten onto the goreleaser stack (prior workflow preserved in git history). Operator-visible breaks: self-tel metric rename `tracecore.*` → `otelcol_*`; release-artifact provenance shape change (documented once). + - **v0.1.0:** bench heartbeat swap `clockreceiver` → `hostmetricsreceiver` (PR-E; source survives until PR-K per coupled test-fixture migration), `components/receivers/dcgm/` (cgo stub never shipped real path; → `dcgm-exporter` + `prometheusreceiver` recipe), `components/receivers/kueue/` (never shipped; → `prometheusreceiver` recipe), `internal/componentstatus/`, `internal/selftelemetry/`, `internal/telemetry/`. 
Hand-rolled `.github/workflows/release.yml` rewritten onto the goreleaser stack (prior workflow preserved in git history). Operator-visible breaks: self-tel metric rename `tracecore.*` → `otelcol_*`; release-artifact provenance shape change (documented once). - **v0.2.0:** `components/receivers/kernelevents/` (→ `journaldreceiver` + `filelogreceiver` + OTTL Xid transform), `components/receivers/k8sevents/` (→ `k8sobjectsreceiver` + OTTL `k8s.event.hint` transform), `components/receivers/kineto/` (deferred; re-eval at OTel Profiles GA), plus `.github/workflows/kernelevents-integration.yml`. Operator-visible breaks: ALL recipe-side receiver swaps, batched into one migration guide; Helm values keys map old→new for one minor release with `NOTES.txt` deprecation warning. - **v0.3.0:** `components/receivers/pyspy/` (→ `parca-agent` via separate chart), `python/tracecore_pyspy/`, `tools/pyspy-lint/`, `.github/workflows/{pyspy-integration,python-publish}.yml`. Operator-visible breaks: PyPI helper deleted; security posture changes (CAP_SYS_PTRACE → CAP_SYS_ADMIN/BPF - operator review window). - **Upstream contributions become first-class policy.** Tracecore patches upstream first; forks only when upstream rejects ([RFC-0013 §5](docs/rfcs/0013-distro-first-pivot.md#5-upstream-contribution-policy)). When a contribution is in-flight, tracecore ships against a `replace` directive in `go.mod` pointing at the contribution branch; the replace is removed when the upstream tag lands. Likely contribution slots opened by the pivot: `k8sobjectsreceiver` (`k8s.event.hint` derived attribute), `filelogreceiver` / container stanza (PyTorch rank + dataloader-timing presets), `journaldreceiver` (`_TRACE_ID`/`_SPAN_ID` propagation), cross-vendor `gpu.vendor` semconv extension, OTel Profiles Kineto adapter, OCB reproducibility flags, `telemetrygeneratorreceiver` rate-limit knobs. 
diff --git a/bench/install/tracecore-values.yaml b/bench/install/tracecore-values.yaml index f6ea4e5b..5f5a2a97 100644 --- a/bench/install/tracecore-values.yaml +++ b/bench/install/tracecore-values.yaml @@ -6,27 +6,43 @@ # alongside this bench) so the rendered pipeline wires otlphttp # automatically. No free-form `config:` override needed. # -# RFC-0013 note: the `clockreceiver` and `stdoutexporter` values -# keys below refer to the v0.0.x in-tree components. At v0.1.0 they -# map to recipe equivalents per RFC-0013 §7 (Deletion list): -# clockreceiver -> telemetrygeneratorreceiver (OCB-bundled) [BLOCKED] -# stdoutexporter -> debugexporter (OCB-bundled) -# PR-E status (2026-05-30): the clockreceiver -> telemetrygeneratorreceiver -# swap is DEFERRED. The receiver does not exist in -# opentelemetry-collector-contrib at v0.110.0 or main (verified against -# the GH tree API; receiver/telemetrygeneratorreceiver path 404s on every -# tag from v0.95.0 through v0.130.0). The bench keeps using the in-tree -# clockreceiver until PR-K (v0.2.0 recipe migration) deletes it — at which point -# the bench will switch to a non-clock load source (likely -# hostmetricsreceiver on a 1s scrape, or an OCB-bundled equivalent if one -# lands upstream). See builder-config.yaml TODO(RFC-0013 PR-E) block. -# The chart's compat map will keep these values keys working for one -# minor with a `NOTES.txt` deprecation warning per RFC-0013 §8. +# RFC-0013 PR-E (2026-05-30): bench load source is `hostmetrics` +# (loadscraper @ 1s) — an upstream OTel-contrib receiver bundled by +# OCB. Replaces the legacy in-tree `clockreceiver` here because the +# distro-first pivot's intent is "no custom receiver where upstream +# satisfies." hostmetrics' loadscraper emits 3 low-cardinality +# series (system.cpu.load_average.{1m,5m,15m}) at the cadence the +# bench's pass condition needs (first parseable JSON line at the +# sink — see bench/install/run.sh). 
+# +# The originally-planned `telemetrygeneratorreceiver` does NOT +# exist in opentelemetry-collector-contrib at any tag (verified +# 2026-05-30; contrib issues #41687 and #43657 both closed +# `not_planned`). Re-evaluation trigger: a new generator-shaped +# receiver landing in contrib. +# +# Scope deferral: chart default stays `clockreceiver` this release; +# default-flip + values-keys migration ship together in PR-K +# (in-tree-receiver deletion wave) with NOTES.txt deprecation +# warnings — one coordinated cut rather than two operator-visible +# changes. `components/receivers/clockreceiver/` source also +# survives until PR-K because it doubles as the canonical example +# receiver across cmd/tracecore/*_test.go + internal/pipeline + +# internal/selftelemetry fixtures (~92 references audited). image: repository: tracecore tag: bench pullPolicy: Never +receivers: + clockreceiver: + enabled: false + hostmetrics: + enabled: true + collection_interval: 1s + scrapers: + load: {} + exporters: stdoutexporter: enabled: false @@ -36,5 +52,5 @@ exporters: pipelines: metrics: - receivers: [clockreceiver] + receivers: [hostmetrics] exporters: [otlphttp] diff --git a/builder-config.yaml b/builder-config.yaml index db6a3b23..7f64b038 100644 --- a/builder-config.yaml +++ b/builder-config.yaml @@ -11,19 +11,12 @@ receivers: - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/journaldreceiver v0.110.0 - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/k8sobjectsreceiver v0.110.0 - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver v0.110.0 - # TODO(RFC-0013 PR-E): swap clockreceiver -> telemetrygeneratorreceiver. - # BLOCKER (verified 2026-05-30 against opentelemetry-collector-contrib - # v0.110.0 + main): receiver does NOT exist in the OTel contrib repo at - # any path (receiver/telemetrygeneratorreceiver, loadgenreceiver, - # mockreceiver, dummyreceiver all 404). 
RFC-0013 §1's example shape - # referenced it speculatively — the receiver was never landed upstream. - # Decision: keep clockreceiver alive in cmd/tracecore/components.go - # legacy boot path until either (a) an upstream replacement lands and we - # bump OCB to that release, or (b) PR-F's deletion ships and the bench - # rewires to a different load source (e.g. hostmetricsreceiver on a - # short scrape interval, or otlpreceiver fed by a sibling loader pod). - # Re-evaluate on every OCB version bump. Tracked in RFC-0013 OQ - # follow-up (cannot edit RFC inline per PR-E constraint). + # RFC-0013 PR-E (2026-05-30): hostmetricsreceiver replaces the + # planned-but-nonexistent telemetrygeneratorreceiver as the bench + # heartbeat source. Two upstream proposals (contrib #41687, #43657) + # closed `not_planned`; re-evaluation trigger is a generator-shaped + # receiver landing in contrib at any future tag. + - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/hostmetricsreceiver v0.110.0 processors: - gomod: go.opentelemetry.io/collector/processor/batchprocessor v0.110.0 diff --git a/docs/migration/v0.1-to-v0.2.md b/docs/migration/v0.1-to-v0.2.md index 69323864..06d1dabe 100644 --- a/docs/migration/v0.1-to-v0.2.md +++ b/docs/migration/v0.1-to-v0.2.md @@ -18,7 +18,7 @@ v0.2.0 completes the RFC-0013 receiver swap. The in-tree custom receivers for ke | GPU telemetry (NVIDIA) | `dcgm` (in-tree, cgo stub) | `dcgm-exporter` DaemonSet + `prometheusreceiver` | Deploy `dcgm-exporter` via its own chart; chart adds a `gpu.nvidia.recipe: prometheus` toggle to wire the scrape. | | GPU telemetry (AMD / Intel / Habana) | Not shipped | `ROCm/device-metrics-exporter` / `intel/xpumanager` / Habana Prometheus Metric Exporter, all scraped via `prometheusreceiver` | New capability; opt-in via `gpu..recipe: prometheus`. 
| | Kueue scheduler metrics | `kueue` (in-tree, never shipped) | `prometheusreceiver` recipe with bearer-token + TLS | Opt-in via `kueue.recipe: prometheus`. | -| Heartbeat / install-bench primitive | `clockreceiver` (in-tree) | `telemetrygeneratorreceiver` — **BLOCKED**: receiver does not exist upstream at any version | TBD. Current plan: `hostmetricsreceiver` (1s scrape) as the load source. Track [GitHub issue / RFC OQ followup]. | +| Heartbeat / install-bench primitive | `clockreceiver` (in-tree, chart default) | `hostmetricsreceiver` (loadscraper @ 1s, upstream OCB-bundled) | v0.1.x bench already swapped (PR-E). v0.2.0 flips the chart default — set `receivers.hostmetrics.enabled: true` + `receivers.clockreceiver.enabled: false` if you want to track the new default before the chart-default flip; otherwise no action until v0.2.0. `NOTES.txt` will surface a deprecation warning for one minor after the flip. | | Kineto profiler | `kineto` (in-tree, deferred) | Deferred until OTel Profiles GA | No action; re-evaluation when contrib ships `pprofreceiver`. | | `tracecoreai/tracecore-components` module | Lives inside this repo | Separate Go module pulled via OCB `gomod:` | No operator action. Module split is internal. | | Helm values keys | Per-receiver `.*` | Per-receiver `.recipe: ` + per-recipe stanzas | One-minor compat. Migrate by setting `.recipe: upstream` per receiver. | diff --git a/docs/rfcs/0013-distro-first-pivot.md b/docs/rfcs/0013-distro-first-pivot.md index 31b04b64..db8bf830 100644 --- a/docs/rfcs/0013-distro-first-pivot.md +++ b/docs/rfcs/0013-distro-first-pivot.md @@ -116,8 +116,8 @@ The bundled Helm chart ships these mappings as a default OTTL pipeline. 
Operator | Release | Operator-visible breaks | Internal changes | |---|---|---| -| **v0.1.0** | self-tel metric rename `tracecore.*` → `otelcol_*`; release-artifact provenance shape change (documented once) | OCB skeleton; upstream `componentstatus` adoption; release pipeline → goreleaser stack; image build → `ko`; `clockreceiver` → `telemetrygeneratorreceiver` | -| **v0.2.0** | ALL recipe-side receiver swaps. ONE migration guide. Helm values keys map old→new for one minor release with `NOTES.txt` deprecation warning. | Delete `components/receivers/{kernelevents,k8sevents,dcgm,kueue,clockreceiver,containerstdout}`; ship recipes for `filelog+container`, `journald+filelog+OTTL`, `k8sobjects+transform`, `prometheusreceiver` (dcgm-exporter + Kueue); receivers-only Go module split | +| **v0.1.0** | self-tel metric rename `tracecore.*` → `otelcol_*`; release-artifact provenance shape change (documented once) | OCB skeleton; upstream `componentstatus` adoption; release pipeline → goreleaser stack; image build → `ko`; bench heartbeat swap `clockreceiver` → `hostmetricsreceiver` (chart-default flip deferred to v0.2.0) | +| **v0.2.0** | ALL recipe-side receiver swaps + chart-default flip from `clockreceiver` to `hostmetrics`. ONE migration guide. Helm values keys map old→new for one minor release with `NOTES.txt` deprecation warning. | Delete `components/receivers/{kernelevents,k8sevents,dcgm,kueue,clockreceiver,containerstdout}`; ship recipes for `filelog+container`, `journald+filelog+OTTL`, `k8sobjects+transform`, `prometheusreceiver` (dcgm-exporter + Kueue); receivers-only Go module split | | **v0.3.0** | Python profiling: `tracecore_pyspy` PyPI helper deleted; operator deploys `parca-agent` DaemonSet via separate chart; security posture changes (CAP_SYS_PTRACE → CAP_SYS_ADMIN/BPF — review window) | Delete `components/receivers/pyspy/`, `python/tracecore_pyspy/`, `tools/pyspy-lint/`, `.github/workflows/{pyspy-integration,python-publish}.yml`. 
Kineto re-evaluated when OTel Profiles GA. | Pre-v0.1.0 receivers with zero pilot deployments → clean delete in v0.1.0. Receivers with ≥1 pilot deployment (check D6 NPS Discussion list before cutting v0.1.0) → one-version deprecation warn → delete in next minor. Pre-v0.1.0 there is no compat owed by SemVer; this policy is operator-courtesy, not contract. @@ -134,7 +134,7 @@ Likely contribution slots opened by the pivot: - **`dcgm-exporter`** + AMD/Intel/Habana counterparts: cross-vendor `gpu.vendor` resource attribute. Propose to OTel `hw.*` semconv. - **OTel Profiles**: PyTorch Kineto chrome-trace adapter when the profile spec stabilizes (likely a separate `pprofreceiver`-adjacent format converter). - **OCB (`go.opentelemetry.io/collector/cmd/builder`)**: build-info fields and reproducibility flags if `-trimpath` / `SOURCE_DATE_EPOCH` plumbing is incomplete. -- **`telemetrygeneratorreceiver`**: rate-limit knobs for install-bench cadence if missing. +- **`hostmetricsreceiver`**: scraper subset selection ergonomics for low-cardinality install-bench-style use (currently the bench enables only the loadscraper; chart surface stays minimal until/unless other scrapers prove useful). When a contribution is in-flight, tracecore ships against a `replace` directive in `go.mod` pointing at the contribution branch. The replace is removed when the upstream tag lands. 
@@ -155,7 +155,7 @@ In-tree receivers deleted at v0.2.0 (or v0.1.0 if no pilots): | Path | Replacement | Cuts | |---|---|---| -| `components/receivers/clockreceiver/` | `telemetrygeneratorreceiver` | v0.1.0 | +| `components/receivers/clockreceiver/` | `hostmetricsreceiver` (loadscraper @ 1s) — see PR-E note below | v0.2.0 (source delete coupled to test-fixture migration in PR-K) | | `components/receivers/containerstdout/` | `filelogreceiver` + container stanza + `file_storage` extension | v0.2.0 (pending pilot audit per Open Question #1) | | `components/receivers/dcgm/` | `dcgm-exporter` + `prometheusreceiver` (recipe) | v0.1.0 (cgo stub never shipped real path) | | `components/receivers/k8sevents/` | `k8sobjectsreceiver` + OTTL transform | v0.2.0 | @@ -234,8 +234,8 @@ PR sequencing within v0.1.0: 2. **PR-B**: Self-telemetry adopts upstream `componentstatus` + `service/telemetry`. Metric rename `tracecore.*` → `otelcol_*` with one-line migration note in `CHANGELOG.md`. 3. **PR-C**: Release pipeline switches to goreleaser stack. Old `release.yml` archived under `.github/workflows/archived/`. 4. **PR-D**: Image build moves to `ko`. Chart `image.repository` continues to resolve. -5. **PR-E**: `clockreceiver` → `telemetrygeneratorreceiver` in OCB manifest + bench-install Helm values. -6. **PR-F**: Delete `internal/componentstatus`, `internal/selftelemetry`, `internal/telemetry`. Delete `components/receivers/{clockreceiver,dcgm,kueue,kineto}` (none shipped real code). +5. **PR-E**: `clockreceiver` → `hostmetricsreceiver` (loadscraper @ 1s) in OCB manifest + bench-install Helm values. The originally-planned `telemetrygeneratorreceiver` does not exist in opentelemetry-collector-contrib at any tag (verified 2026-05-30; contrib issues #41687 and #43657 both closed `not_planned`). hostmetrics' loadscraper emits 3 low-cardinality series (`system.cpu.load_average.{1m,5m,15m}`) and satisfies the bench's "first parseable JSON line at sink" pass condition. 
Scope deferral: chart default stays `clockreceiver` and the in-tree source survives this PR (~92 references across `cmd/tracecore/*_test.go` + `internal/pipeline` + `internal/selftelemetry` fixtures); chart-default flip + source deletion ship as part of PR-K alongside coordinated test-fixture migration and the values-keys `NOTES.txt` deprecation cycle. +6. **PR-F**: Delete `internal/componentstatus`, `internal/selftelemetry`, `internal/telemetry`. Delete `components/receivers/{dcgm,kueue,kineto}` (none shipped real code). `clockreceiver` source deletion deferred to PR-K — see PR-E note for rationale. 7. **PR-G**: Supersede RFCs (add status headers + redirects). Move RFC-0004 to `archived/`. 8. **PR-H**: Update top-level docs (README, NORTHSTARS, STRATEGY, PRINCIPLES, MILESTONES, CHANGELOG, CONTRIBUTING, AGENTS, docs/README). @@ -243,7 +243,7 @@ v0.2.0 sequencing: 1. **PR-I**: Receivers-only Go module `tracecoreai/tracecore-components` extracted. Migrate `ncclfr` + pattern detectors + custom OTTL processors. OCB manifest pulls via `gomod:`. 2. **PR-J**: Ship recipes: `filelogreceiver + container stanza + file_storage`, `journaldreceiver + filelogreceiver + OTTL transform`, `k8sobjectsreceiver + transform`, `prometheusreceiver` (Kueue + dcgm-exporter). Helm chart values old→new compat map with `NOTES.txt` deprecation warning. -3. **PR-K**: Delete `components/receivers/{kernelevents,k8sevents,containerstdout}`. Delete `.github/workflows/kernelevents-integration.yml`. Delete `.github/ISSUE_TEMPLATE/component-bug-kernelevents.yml`. M19 cross-signal join test moves to `processor/rankjoinprocessor/` integration suite against filelogreceiver + k8sobjectsreceiver inputs. (Kineto already deleted in PR-F per #168; PR-O retains the OTel Profiles GA re-evaluation hook.) +3. 
**PR-K**: Delete `components/receivers/{clockreceiver,kernelevents,k8sevents,containerstdout}` together with the test-fixture migration (cmd/tracecore/*_test.go + internal/pipeline + internal/selftelemetry fixtures move to hostmetricsreceiver / filelogreceiver shapes). Flip chart default from `clockreceiver` to `hostmetrics` with `NOTES.txt` deprecation warning. Delete `.github/workflows/kernelevents-integration.yml`. Delete `.github/ISSUE_TEMPLATE/component-bug-kernelevents.yml`. M19 cross-signal join test moves to `processor/rankjoinprocessor/` integration suite against filelogreceiver + k8sobjectsreceiver inputs. (Kineto already deleted in PR-F per #168; PR-O retains the OTel Profiles GA re-evaluation hook.) 4. **PR-L**: Migration guide in `docs/migration/v0.1-to-v0.2.md` covering every operator-visible change. v0.3.0 sequencing: @@ -263,7 +263,7 @@ v0.3.0 sequencing: - [otel-contrib k8sobjectsreceiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/k8sobjectsreceiver) - [otel-contrib journaldreceiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/journaldreceiver) - [otel-contrib prometheusreceiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/prometheusreceiver) -- [otel-contrib telemetrygeneratorreceiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/telemetrygeneratorreceiver) +- [otel-contrib hostmetricsreceiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/hostmetricsreceiver) (loadscraper used as bench heartbeat; see PR-E) - [parca-dev/parca-agent](https://github.com/parca-dev/parca-agent) - [OTel Profiles signal (Alpha, Mar 2026)](https://opentelemetry.io/blog/2026/profiles-alpha/) - [Perfetto trace_processor](https://perfetto.dev/docs/analysis/trace-processor) diff --git a/install/kubernetes/tracecore/values.yaml b/install/kubernetes/tracecore/values.yaml index 
829e648d..9907b393 100644 --- a/install/kubernetes/tracecore/values.yaml +++ b/install/kubernetes/tracecore/values.yaml @@ -84,6 +84,24 @@ receivers: clockreceiver: enabled: true interval: 1s + # hostmetrics — upstream OTel-contrib receiver, OCB-bundled. Added in + # RFC-0013 PR-E as the bench-install heartbeat source (load scraper @ + # 1s) replacing the planned-but-nonexistent telemetrygeneratorreceiver. + # Opt-in (enabled: false) this release; the chart default remains + # clockreceiver until PR-K flips it as part of the in-tree-receiver + # deletion wave so the values-keys migration ships with NOTES.txt + # deprecation warnings in one coordinated cut. + # + # Scrapers: the loadscraper emits 3 low-cardinality series + # (system.cpu.load_average.{1m,5m,15m}). Other scrapers (cpu, memory, + # disk, filesystem, network, paging, processes, process) are + # available via the free-form config: block below if needed; the + # chart-surface keys here cover only the bench's heartbeat need. + hostmetrics: + enabled: false + collection_interval: 1s + scrapers: + load: {} dcgm: enabled: false mode: standalone From a3c5432a8073e4c77ee3407e2bb0a9b99d96b648 Mon Sep 17 00:00:00 2001 From: Tri Lam Date: Sat, 30 May 2026 19:23:47 -0700 Subject: [PATCH 2/2] chore(pivot): PR-E adversarial-review fixes MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Address adversarial-review findings on #180: - bench/install/README.md: document `clockreceiver_interval_seconds` field is a historical schema-v1 name; semantics still correct (heartbeat-receiver emit period). Schema v2 will rename alongside PR-K chart-default flip. Also update tick-aliasing caveat to name both receivers. - docs/rfcs/0013-distro-first-pivot.md §4: tighten v0.1.0 release row to match CHANGELOG wording ("bench/install/tracecore-values.yaml" scope explicit; "source delete deferred to v0.2.0 / PR-K"). 
Hostmetrics @ 1s loadscraper verified locally on darwin/arm64 against the OCB-built binary: 3 datapoints (system.cpu.load_average.{1m,5m,15m}) emit 1.002s after pipeline start, no errors, no warnings. Co-Authored-By: Claude Opus 4.7 (1M context) Signed-off-by: Tri Lam --- bench/install/README.md | 15 +++++++++++---- docs/rfcs/0013-distro-first-pivot.md | 2 +- 2 files changed, 12 insertions(+), 5 deletions(-) diff --git a/bench/install/README.md b/bench/install/README.md index b3e65cce..11f56796 100644 --- a/bench/install/README.md +++ b/bench/install/README.md @@ -53,8 +53,13 @@ for the JSON Schema. Each row carries: - `install_seconds`; `helm install` return time - `first_data_seconds`; first OTLP byte at the sink (tick-aliased, see `clockreceiver_interval_seconds`) -- `clockreceiver_interval_seconds`; receiver tick period, for - tick-alias correction across runs with different intervals +- `clockreceiver_interval_seconds`; heartbeat-receiver emit period, + for tick-alias correction across runs with different intervals. + **Note (RFC-0013 PR-E, 2026-05-30):** the bench heartbeat source + is now `hostmetricsreceiver` (loadscraper @ 1s); the field name + is preserved for schema-v1 stability. Schema v2 will rename to + `heartbeat_interval_seconds` alongside the PR-K chart-default + flip. - `poll_interval_ms`; sink-side polling cadence (noise floor for `first_data_seconds`) - envelope fields per the shared schema @@ -67,8 +72,10 @@ for the JSON Schema. Each row carries: `components/exporters/otlphttp/otlphttp_test.go`), but install-bench validates the metrics wire path only. Adding traces+logs to the bench is tracked in `docs/FOLLOWUPS.md`. 
-- **First-data is tick-aliased.** The clockreceiver fires on a 1 s - interval by default; `first_data_seconds` includes up to one full +- **First-data is tick-aliased.** The bench heartbeat source emits + on a 1 s interval (hostmetricsreceiver loadscraper as of PR-E; + was clockreceiver pre-PR-E — chart default remains clockreceiver + this release); `first_data_seconds` includes up to one full tick of wait. Subtract `clockreceiver_interval_seconds` for the pipeline-startup latency. - **No multi-arch yet.** ubuntu-latest only. arm64 GHA runners were diff --git a/docs/rfcs/0013-distro-first-pivot.md b/docs/rfcs/0013-distro-first-pivot.md index db8bf830..ce3a719c 100644 --- a/docs/rfcs/0013-distro-first-pivot.md +++ b/docs/rfcs/0013-distro-first-pivot.md @@ -116,7 +116,7 @@ The bundled Helm chart ships these mappings as a default OTTL pipeline. Operator | Release | Operator-visible breaks | Internal changes | |---|---|---| -| **v0.1.0** | self-tel metric rename `tracecore.*` → `otelcol_*`; release-artifact provenance shape change (documented once) | OCB skeleton; upstream `componentstatus` adoption; release pipeline → goreleaser stack; image build → `ko`; bench heartbeat swap `clockreceiver` → `hostmetricsreceiver` (chart-default flip deferred to v0.2.0) | +| **v0.1.0** | self-tel metric rename `tracecore.*` → `otelcol_*`; release-artifact provenance shape change (documented once) | OCB skeleton; upstream `componentstatus` adoption; release pipeline → goreleaser stack; image build → `ko`; bench heartbeat swap `clockreceiver` → `hostmetricsreceiver` in `bench/install/tracecore-values.yaml` (chart default + source delete deferred to v0.2.0 / PR-K per PR-E rationale) | | **v0.2.0** | ALL recipe-side receiver swaps + chart-default flip from `clockreceiver` to `hostmetrics`. ONE migration guide. Helm values keys map old→new for one minor release with `NOTES.txt` deprecation warning. 
| Delete `components/receivers/{kernelevents,k8sevents,dcgm,kueue,clockreceiver,containerstdout}`; ship recipes for `filelog+container`, `journald+filelog+OTTL`, `k8sobjects+transform`, `prometheusreceiver` (dcgm-exporter + Kueue); receivers-only Go module split | | **v0.3.0** | Python profiling: `tracecore_pyspy` PyPI helper deleted; operator deploys `parca-agent` DaemonSet via separate chart; security posture changes (CAP_SYS_PTRACE → CAP_SYS_ADMIN/BPF — review window) | Delete `components/receivers/pyspy/`, `python/tracecore_pyspy/`, `tools/pyspy-lint/`, `.github/workflows/{pyspy-integration,python-publish}.yml`. Kineto re-evaluated when OTel Profiles GA. |
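
Reviewer sketch (not part of the patch): the `bench/install/README.md` hunk above says `first_data_seconds` is tick-aliased and that subtracting `clockreceiver_interval_seconds` gives the pipeline-startup latency. A minimal illustration of that correction, assuming a result row is a plain mapping with those two schema-v1 field names; the function name `startup_latency_bounds` is hypothetical and does not exist in the repo. Because the wait before the first emit can be anywhere from zero to one full tick, the sketch returns a range rather than a single number.

```python
def startup_latency_bounds(row):
    """Return (lower, upper) bounds in seconds on pipeline-startup latency.

    Assumes `row` carries the two schema-v1 fields named in the README.
    The heartbeat wait is between 0 and one full tick, so the true
    startup latency lies between (first_data - interval) and first_data.
    """
    first_data = float(row["first_data_seconds"])
    interval = float(row["clockreceiver_interval_seconds"])
    # Clamp at zero: a first datapoint that arrives in under one tick
    # cannot imply a negative startup latency.
    lower = max(0.0, first_data - interval)
    return lower, first_data


if __name__ == "__main__":
    # Illustrative values only, not measured bench output.
    example_row = {
        "first_data_seconds": 3.5,
        "clockreceiver_interval_seconds": 1.0,
    }
    print(startup_latency_bounds(example_row))
```

Reporting both bounds keeps runs with different heartbeat intervals comparable without claiming more precision than the tick period allows; the field name is kept as-is because the patch preserves it for schema-v1 stability.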