feat(pivot): port k8sevents off internal selftel + lifecycle (PR-F unblock)#196

Merged: trilamsr merged 1 commit into main from pr-k8sevents-selftel-port on May 31, 2026
Conversation

@trilamsr


Summary

RFC-0013 PR-F unblock: ports components/receivers/k8sevents off
internal/selftelemetry + internal/runtime/lifecycle to package-local
sibling types, mirroring PR-B1 (#184, nccl_fr) and PR-B2 (#187,
kernelevents). One more receiver off the internal helpers; clears
another blocker for PR-F's deletion of those internal packages.

Why

internal/selftelemetry and internal/runtime/lifecycle are slated
for deletion in RFC-0013 PR-F. Every in-tree user must migrate first.
PR-B1 landed the pattern (nccl_fr); PR-B2 applied it to
kernelevents; this PR applies it to k8sevents. Receivers landing
post-PR-F will own their selftel + lifecycle in their own submodule per
OTel convention — this PR puts k8sevents on that footing.

Pattern choice (multi-source, keeps Add())

k8sevents is multi-source (Events informer + Node informer driver
goroutines both join the receiver's lifecycle), so this PR follows
the PR-B2 kernelevents sibling rather than the slimmer PR-B1 nccl_fr
sibling — keeps lifecycle.Add() (with its post-Shutdown and
pre-Start refusal guards, plus the mutex-guarded TOCTOU-safe
Add/Shutdown interleave) so the SharedInformerFactory driver
goroutine joins the same WaitGroup as the run-loop goroutine.

What changed

  • New components/receivers/k8sevents/selftel.go — local
    selfTelemetry interface + noop + OTel-backed implementations +
    recordInitError fallback ticker. Instrumentation scope name pinned
    to k8sevents' Go import path per OTel convention (PR-I will move it
    to the submodule path when k8sevents goes external). Metric names
    (tracecore.receiver.errors_total{kind,component_id} and siblings)
    preserved so dashboards/alerts don't regress.
  • New components/receivers/k8sevents/lifecycle.go — local
    lifecycle helper (Start + Add + Shutdown + panic-recovery +
    TOCTOU-safe mutex-guarded closed/Add interleave).
  • New selftel_test.go + lifecycle_test.go — TDD-first, no
    internal/telemetry dep. Pin: noop safety across every kind,
    nil-MeterProvider error, errors_total kind+component_id labels,
    scope-name = receiver Go import path, init_errors_total tick,
    factory noop fallback, Start/Shutdown/Add contracts, Add
    refusal-modes, TOCTOU concurrent Add-during-Shutdown stress.
  • factory.go, receiver.go — drop internal imports, wire local
    selfTelemetry + lifecycle. r.lc retyped from
    *lifecycle.Lifecycle to *lifecycle; r.telemetry retyped from
    selftelemetry.Receiver to selfTelemetry. Exported KindWatch
    / KindBackpressureDrop / KindNode* constants retyped from
    selftelemetry.Kind to package-local kind alias (load-bearing
    for the K8sEventsReceiverDegraded alert label values, which stay
    byte-identical). Added re-exported KindParse / KindDownstream
    / KindCardinality / KindPanic so external _test packages can
    still partition assertions on canonical kinds without depending on
    the deleted internal package.
  • export_test.go — test-only recordingTel replaced by exported
    CapturingTelemetry (mirrors selftelemetry.CapturingReceiver,
    trimmed to the k8sevents surface). External _test packages get
    a stable shim that survives the internal/selftelemetry deletion.
  • receiver_test.go, node_failure_modes_test.go — drop
    internal/selftelemetry import; redirect through
    k8sevents.CapturingTelemetry. Per-test bodies otherwise
    unchanged (kept the recordingTel symbol as a thin wrapper).

Verification

  • make check clean (golangci-lint 0 issues, vet clean, mod-verify)
  • make verify clean (incl. doc-check + alert-check + chart-check +
    no-autoupdate)
  • go test -race -count=1 ./... entire repo green (incl. k8sevents +
    kernelevents + every internal package, fixture, tool)
  • go test -race -count=1 ./components/receivers/k8sevents/...
    all green; lifecycle TOCTOU stress test passes deterministically
    under -race
  • Goroutine leaks: pre-existing TestReceiver_GoleakNoLeakAfterShutdown
    continues to pass after the lifecycle rewire

Out of scope (deferred)

  • PR-F itself (deleting internal/selftelemetry +
    internal/runtime/lifecycle) — gated on the remaining 3 wave-1
    consumers (pyspy, otlphttp, containerstdout) landing their ports.
    Counting this PR (k8sevents), that makes four wave-1 ports.
NONE

RFC-0013 PR-F unblock: ports `components/receivers/k8sevents` off
`internal/selftelemetry` + `internal/runtime/lifecycle` to package-local
sibling types, mirroring PR-B1 (#184, nccl_fr) and PR-B2 (#187,
kernelevents). One more receiver off the internal helpers; clears
another blocker for PR-F's deletion of those internal packages.

## Pattern

Followed the multi-source kernelevents sibling (k8sevents has Events
informer + Node informer driver goroutines sharing one lifecycle.Add),
not the slimmer single-source nccl_fr sibling. New files:

- `selftel.go` — package-local `selfTelemetry` interface, noop +
  OTel-backed implementations, `recordInitError` fallback ticker.
  Instrumentation scope name pinned to k8sevents' Go import path per
  OTel convention (PR-I will move it to the submodule path).
- `lifecycle.go` — package-local `lifecycle` helper with Start +
  Add + Shutdown + panic-recovery + TOCTOU-safe mutex-guarded
  closed/Add interleave. Keeps `Add()` so the informer driver
  goroutine joins the receiver's WaitGroup.
- `selftel_test.go` — TDD-first ManualReader-backed assertions
  pinning errors_total{kind,component_id}, emissions_total, scope
  name, init_errors_total tick, nil-MeterProvider safety, factory
  noop-fallback.
- `lifecycle_test.go` — Start/Shutdown/Add contracts +
  TOCTOU-concurrent-Add-during-Shutdown stress (passes under -race).

## Existing tests rewired

- `export_test.go` — `recordingTel` (test-only) replaced by exported
  `CapturingTelemetry` so external `_test` packages can assert
  without depending on the deleted internal/selftelemetry package.
- `receiver_test.go` — drops `selftelemetry` import; per-test bodies
  unchanged via thin `newRecordingTel()` shim.
- `node_failure_modes_test.go` — switches from
  `selftelemetry.NewCapturingReceiver()` to
  `k8sevents.NewCapturingTelemetry()`; `countKinds` retyped to
  `[]string` so callers compare against the exported `KindNodeWatch`
  / `KindWatch` / `KindParse` constants directly.

## Verification

- `make check` / `make verify` clean (golangci-lint 0 issues, vet
  clean, mod-verify, doc-check)
- `go test -race -count=1 ./...` entire repo green
- `go test -race -count=1 ./components/receivers/k8sevents/...` green

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Signed-off-by: Tri Lam <tri@maydow.com>
@trilamsr trilamsr enabled auto-merge (squash) May 31, 2026 06:14
@trilamsr trilamsr disabled auto-merge May 31, 2026 06:16
@trilamsr trilamsr enabled auto-merge (squash) May 31, 2026 06:25
@trilamsr trilamsr merged commit d324dc0 into main May 31, 2026
13 checks passed
@trilamsr trilamsr deleted the pr-k8sevents-selftel-port branch May 31, 2026 06:25
trilamsr added a commit that referenced this pull request May 31, 2026
)

## Summary

Four amendments to `docs/rfcs/0013-distro-first-pivot.md` (plus a
one-line sweep across 6 companion docs) per the scope-review findings
staged before PR-I.1 / PR-K / PR-M code work begins. Pre-stages each
decision in the RFC so the autonomous code PRs don't escalate
mid-flight.

## Root cause

#181 (RFC-0013 PR-I in-repo submodule rescope) was incomplete:

1. Sweep missed 6 companion docs still pointing at the original-design
external `tracecoreai/tracecore-components` repo.
2. §7 listed 3 GitHub workflows for deletion that were already removed
pre-RFC and 1 issue template (`component-bug-dcgm.yml`) that was already
removed pre-RFC.
3. PR-K was a single 4-receiver-delete-plus-chart-migration mega-PR with
no decoupling of the `internal/synthesis/patterns/` k8sevents dep break,
which is on PR-I.2's critical path.
4. PR-I.1 conflated the `module/go.mod` scaffolding with the `git mv` +
package rename, blocking PR-I.1a from landing without PR-B2 even though
the scaffolding step has no nccl_fr dep.

Mid-flight discovery during merge cycle: merged commit #188
(`feat(pivot): PR-B2 — port dcgm off internal selftel + lifecycle`)
reused the `PR-B2` slug for a PR-B1-shape dcgm port (which is moot since
dcgm is deleted entirely in PR-F), creating a naming collision against
the canonical PR-B2 defined in the RFC — the nccl_fr
`internal/{pipeline,consumer,runtime/lifecycle}` → upstream port that
hard-gates the PR-I.1b `git mv`.

## Amendments

1. **§6/§7 sweep miss (Amendment 1)**: Remove surviving
`tracecoreai/tracecore-components` external-repo references across
`docs/getting-started.md`, `docs/followups/M11.md`,
`docs/followups/M19.md`, `docs/FOLLOWUPS.md`,
`docs/rfcs/0003-pipeline-runtime-and-component-contract.md`,
`AGENTS.md`. All re-pointed at `github.com/tracecoreai/tracecore/module`
per RFC-0013 §6. Verified zero surviving stale refs.
2. **§7 nonexistent workflow entries (Amendment 2)**: Collapse
`pyspy-integration.yml`, `python-publish.yml`,
`kernelevents-integration.yml` deletion rows into one row marked
"already removed pre-RFC". `component-bug-dcgm.yml` also already
removed. Only `component-bug-kernelevents.yml` survives for PR-K. §4
v0.3.0 row + PR-M slug cleaned for consistency.
3. **§migration PR-K sub-slice (Amendment 3)**:
- **PR-K.1** — sever `internal/synthesis/patterns/` from
`components/receivers/k8sevents` via local model types in
`internal/synthesis/patterns/model.go`. No deletions. **Unblocks
PR-I.2.**
- **PR-K.2** — delete
`components/receivers/{clockreceiver,kernelevents,k8sevents,containerstdout}`
+ migrate ~86 test fixtures + delete `tools/failure-inject/xidgen/` +
keep `tools/failure-inject/ncclhang/`.
- **PR-K.3** — chart cleanup: flip `containerstdout-on-values.yaml` to
filelog+container-stanza, delete `containerstdout-rbac.yaml`, delete
`.github/ISSUE_TEMPLATE/component-bug-kernelevents.yml`, ship
`NOTES.txt` deprecation + values-key removal.
4. **§migration PR-I sub-slice + PR-B2 promotion (Amendment 4)**:
- **PR-B2** reframed as hard gate for PR-I.1b: port
`components/receivers/nccl_fr` off
`internal/{pipeline,consumer,runtime/lifecycle}` to upstream
`go.opentelemetry.io/collector/{component,receiver,consumer,pipeline}`.
Slug-collision note added re: merged #188.
- **PR-I.1a** — `module/go.mod` + root `go.work` + `builder-config.yaml`
`replaces:` skeleton. No file movement. Tag `module/v0.0.1` (genesis
tag, validates the tagging contract).
- **PR-I.1b** — `git mv components/receivers/nccl_fr →
module/receiver/ncclfrreceiver` + `git mv pkg/nccl/fr_parser →
module/pkg/nccl/fr_parser` + rename Go package `nccl_fr` →
`ncclfrreceiver` + update all importers. Hard-gated on PR-B2. No new
tag; next bump is `module/v0.1.0` at PR-I.2.
- **PR-I.2** — `rankjoinprocessor` + `patterndetectorprocessor` net-new.
Hard-gated on PR-K.1. Tag `module/v0.1.0` (first version pinned in
`builder-config.yaml` for v0.2.0).

Also: PR-J marked `(landed, #195)` with note that recipe docs landed but
chart-values compat map follows in PR-K.3.

## Adversarial review (5 lenses, inline)

- **(a) PR slug internal consistency**: PR-I.1b ↔ PR-B2 ↔ PR-I.2 ↔
PR-K.1 bidirectional gates all match. PR-J landed marker consistent with
#195. §4 v0.2.0 row has pre-existing drift (mentions dcgm+kueue in
v0.2.0 when PR-F+#168 already deleted them in v0.1.0) — out of these 4
amendments' scope; flag for follow-up.
- **(b) PR-B2 hard-gate naming**: tightened from "Hard gate for PR-I.1"
to "Hard gate for PR-I.1b" — accurate because PR-I.1a is
scaffolding-only with no file movement.
- **(c) Sub-PR numbering collision**: #188 explicitly addressed in
slug-collision note. #185/#186/#187/#193/#194/#196 are PR-B1-shape ports
for non-nccl receivers (no PR-slug label in their commits), no
collision.
- **(d) Stale external-repo refs**: `grep -rn
"tracecoreai/tracecore-components" docs/ AGENTS.md README.md` returns
zero hits post-amendment.
- **(e) Cross-reference link integrity**:
`docs/migration/v0.1-to-v0.2.md` references `#migration--rollout` and
`#3-customer-stable-telemetry-contracts`; both anchors preserved (§
headers unchanged). `make doc-check` confirms 526 markdown links
resolve.

## Test plan

- [x] `make doc-check` — 526 markdown links resolve, 0 stale refs,
banned-phrase lint clean, alert-check + chart-appversion gates green.
- [x] Pre-push hooks: golangci-lint clean, go vet clean, go mod verify
clean.
- [ ] CI doc-check + actionlint + zizmor gates pass on PR.

```release-notes
NONE
```

Signed-off-by: Tri Lam <tri@maydow.com>
Co-authored-by: Tri Lam <tri@maydow.com>
trilamsr pushed a commit that referenced this pull request May 31, 2026
Reviewer findings on PR #197:

1. Delete KindFingerprintCardinality. The kind had zero IncError call
   sites in production — only defensive coverage in TestSelfTel_*
   and a t.Skip()'d "Phase 14" deferred test. Per the dcgm sibling
   rule "every kind has an impl call site" + [[no-bloat]], deleted
   the const, the failure_modes_test.go skipped test, kind_test.go
   row, RUNBOOK section, README alert row, prometheus-alerts entry,
   RFC-0010 enumeration, and docs/followups/M15.md grep example.
   The original tailer-pool LRU it was meant to instrument never
   materialised (footnoted in RFC-0010 §M15).

2. Rename 9 TestSelfTel_* → TestContainerstdout_SelfTel_* so test
   names disambiguate from sibling receivers' selftel tests (matches
   the lifecycle-test prefix convention reviewer flagged).

3. Reduce export surface — NewTelemetry, NewNoopTelemetry, and
   ErrNilMeterProvider had zero external callers (factory + selftel_test
   only, both same-package), so unexported to newTelemetry,
   newNoopTelemetry, errNilMeterProvider. Telemetry interface, Kind
   type, KindXxx constants, and CapturingTelemetry/NewCapturingTelemetry
   stay EXPORTED — they are used by ~10 external test files in package
   containerstdout_test (cursor_test, attribution_test, ratelimit_test,
   tailer_test, factory_test, etc.) and operators grep KindXxx names
   in dashboards/alert rules per RFC-0010. Unexporting them would
   force converting every external test into the internal package,
   which is net bloat against the symbol-hiding gain.

make check + go test -race ./components/receivers/containerstdout/...
+ go test ./... all green (excluding the pre-existing
TestK8sevents_Lifecycle_ConcurrentAddDuringShutdown_NoPanic flake from #196).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Signed-off-by: Tri Lam <tri@maydow.com>
trilamsr added a commit that referenced this pull request May 31, 2026
## Summary

Ports `components/receivers/containerstdout` off the
`internal/selftelemetry` + `internal/runtime/lifecycle` packages —
biggest port in the RFC-0013 PR-F unblock wave (14 production `.go`
files, 24 deleted import sites, ~80 selftelemetry/lifecycle references
rewired). Receiver now owns its self-telemetry + streaming-source
lifecycle surfaces in-tree, mirroring the kernelevents (multi-source) +
dcgm (co-located Kind block) sibling patterns. Zero remaining imports of
the internal packages from inside `containerstdout/`.

Unblocks PR-F's deletion of `internal/selftelemetry` +
`internal/runtime/lifecycle` once the remaining sibling ports
(pyspy/otlphttp/k8sevents) land.

## Design notes

- **`selftel.go` exported surface.** Per reviewer findings on the
initial port:
- `Telemetry` interface, `Kind` type, `KindXxx` constants, and
`CapturingTelemetry` / `NewCapturingTelemetry` stay EXPORTED. They are
consumed by ~10 external test files in package `containerstdout_test`
(cursor_test, attribution_test, ratelimit_test, tailer_test,
factory_test, attribution_body_test, attribution_informer_test,
dataloader_test, pattern_consumer_test, failure_modes_test). Operators
also grep `KindXxx` names in dashboards/alert rules per RFC-0010.
Symbol-hiding here would force every external test into the internal
package — a much larger blast radius than the symbol export.
- `newTelemetry`, `newNoopTelemetry`, `errNilMeterProvider` are
UNEXPORTED. Factory + same-package `selftel_test.go` are the only
callers; no godoc-visible surface gain from being exported.
- **Canonical + receiver-local `Kind*` co-located.** One const block in
`selftel.go` for the full kind set
(`KindParse`/`KindRead`/`KindCardinality`/`KindDownstream`/`KindPanic`
canonical mirrors +
`KindRotationStalled`/`KindCursorWriteFailed`/`KindBackpressureDrop`/`KindAttributionCardinality`/`KindRateLimitCardinality`/`KindWatch`
receiver-local). Replaces the prior split between `kind.go` and
`internal/selftelemetry`. Matches the dcgm sibling layout. `kind.go`
deleted; `kind_test.go` updated to also pin the canonical mirror values
(extra coverage, since string-byte parity with the deleted internal
`Kind*` values is now load-bearing for dashboard/alert grep).
- **`KindFingerprintCardinality` dropped (reviewer finding).** The
original RFC-0010 enumeration declared a fingerprint-LRU cardinality
kind, but the tailer-pool LRU it was meant to instrument never
materialised — zero `IncError` call sites in production code, only
defensive coverage in `TestSelfTel_*` and a `t.Skip()`'d "Phase 14"
deferred test. Per the dcgm sibling rule "every kind has an impl call
site" + `[[no-bloat]]`, deleted the const + the skipped test + RUNBOOK
section + README row + prometheus-alerts entry + RFC-0010 enumeration +
docs/followups/M15.md grep example. Footnoted in RFC-0010 §M15.
- **`lifecycle.go` — receiver-scoped lifecycle with `Add()`.**
Multi-source: the per-tailer `Run` + per-tailer pipeline + informer +
`healthLoop` + `idleEvictLoop` all join the same WaitGroup. TOCTOU-safe
`Add` (mutex around `(closed-check, wg.Add(1))`).
- **`CapturingTelemetry` lives in production file.** External `_test`
packages need to construct it (replaces
`selftelemetry.NewCapturingReceiver()` in 7 test files). Putting it in a
`_test.go` file would force a test-helper package or build-tag dance.
The dead-code cost in the production binary is negligible.
- **Test-rig `_test.go` files swap
`selftelemetry.NewCapturingReceiver()` →
`containerstdout.NewCapturingTelemetry()` mechanically.** Same shape,
same accessors, same `[]Kind` slice return for assertion.
- **All 9 selftel tests prefixed `TestContainerstdout_SelfTel_*`**
(reviewer finding) so test names disambiguate from sibling receivers'
selftel tests — matches the lifecycle-test prefix convention.
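
To make the `CapturingTelemetry` note above concrete, here is a minimal sketch of a capturing test double that records error kinds and exposes them as a `[]Kind` slice. The accessor names (`Kinds`, `Emissions`), the method set, and the `Kind` string values are assumptions for this example; the negative-emission discard follows the behaviour named in the test plan.

```go
package main

import "sync"

// Kind is the error-kind label type; the values below are illustrative only.
type Kind string

const (
	KindParse Kind = "parse" // hypothetical value
	KindWatch Kind = "watch" // hypothetical value
)

// CapturingTelemetry records calls so tests can assert on them afterwards.
type CapturingTelemetry struct {
	mu        sync.Mutex
	kinds     []Kind
	emissions int64
}

// NewCapturingTelemetry returns an empty recorder.
func NewCapturingTelemetry() *CapturingTelemetry { return &CapturingTelemetry{} }

// IncError appends the kind in call order.
func (c *CapturingTelemetry) IncError(k Kind) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.kinds = append(c.kinds, k)
}

// IncEmissions accumulates n and discards negative deltas.
func (c *CapturingTelemetry) IncEmissions(n int64) {
	if n < 0 {
		return
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	c.emissions += n
}

// Kinds returns a copy so callers cannot mutate the recorder's state.
func (c *CapturingTelemetry) Kinds() []Kind {
	c.mu.Lock()
	defer c.mu.Unlock()
	out := make([]Kind, len(c.kinds))
	copy(out, c.kinds)
	return out
}

// Emissions returns the accumulated non-negative emission count.
func (c *CapturingTelemetry) Emissions() int64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.emissions
}
```

Keeping a type like this in a non-test file is what lets an external `_test` package construct it directly, at the cost of a few unused bytes in the production binary.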

## Test plan

- [x] `go test ./components/receivers/containerstdout/ -count=1 -race` —
green.
- [x] `go test ./... -count=1` — full repo green except a pre-existing
flake in `TestK8sevents_Lifecycle_ConcurrentAddDuringShutdown_NoPanic`
introduced by #196 (race-window collapse on fast machines; 3/5
retry-pass; unrelated to this PR).
- [x] `make check` — gofumpt clean, golangci-lint 0 issues, vet clean,
go.mod verified.
- [x] `TestTailer_RotationStalledKind` ×10 under `-race` — green.
- [x]
`TestContainerstdout_Lifecycle_ConcurrentAddDuringShutdown_NoPanic` ×20
under `-race` — green. Scheduler-hardened with a `shutdownGate` so the
test deterministically exercises the TOCTOU race window.
- [x] New selftel tests pin: noop safety across every Kind, nil-MP error
sentinel, `errors_total` w/ kind + component_id labels, every
receiver-local Kind routes through the same counter, emissions
discard-negative, scope name = receiver import path, `init_errors_total`
tick on factory fallback, nil-MP `recordInitError` panic-safe,
`CapturingTelemetry` round-trip.
- [x] New lifecycle tests pin:
happy/idempotent/panic-cb/deadline-wrap/`Add`-shares-WG/`Add`-panic-fires-cb/`Add`-before-Start-noop/`Add`-after-Shutdown-noop/concurrent-`Add`-during-Shutdown
TOCTOU.

```release-notes
NONE
```

---------

Signed-off-by: Tri Lam <tri@maydow.com>
Co-authored-by: Tri Lam <tri@maydow.com>
trilamsr added a commit that referenced this pull request May 31, 2026
#206)

## Summary

Deletes the three internal moats and the in-tree DCGM receiver that
RFC-0013 §migration step 8 promised — the payoff for the wave-3
sibling-port PRs (#184/#185/#186/#187/#188/#193/#194/#196/#197).

**Net: -12,482 LOC across 92 files (78 deletions, 14 modifications).**

### What deletes

| Path | LOC | Why safe now |
|---|---|---|
| `components/receivers/dcgm/` | 7,604 | cgo stub never shipped real
code; #188's PR-B2-shaped dcgm sweep already removed the live port
surface. |
| `pkg/dcgm/` | 922 | Only consumer was the deleted receiver. Bonus
cleanup. |
| `internal/selftelemetry/` | 1,946 | Every consumer (containerstdout,
clockreceiver, kernelevents, k8sevents, nccl_fr, dcgm, pyspy,
stdoutexporter, otlphttp) ported onto receiver/exporter-scoped sibling
`selftel.go` files. |
| `internal/telemetry/` | 1,991 | Probes flow through upstream
`healthcheckextension`; MeterProvider via upstream `service.telemetry`.
Only remaining consumers were `internal/selftelemetry/*_test.go`
(deleted together) + one orphan clockreceiver test. |
| `components/receivers/clockreceiver/errors_integration_test.go` | 100
| Orphan from #185's PR-B1 clockreceiver port — bootstrapped via the
deleted `selftelemetry.Receiver` interface but never migrated to the
receiver-scoped sibling `selftel.go`. Covered behaviour ("errors_total
surfaces on downstream failure") is now exercised through
clockreceiver's sibling tests. |

### Pre-flight grep evidence (post-merge of origin/main)

```
$ grep -rn "tracecoreai/tracecore/internal/selftelemetry" --include="*.go" .
(zero matches)

$ grep -rn "tracecoreai/tracecore/internal/telemetry" --include="*.go" .
(zero matches)

$ grep -rn "tracecoreai/tracecore/components/receivers/dcgm" --include="*.go" .
$ grep -rn "tracecoreai/tracecore/pkg/dcgm" --include="*.go" .
(zero matches)
```

### Tooling

- Retire the `dcgm` build tag — `make build-tags` no longer vets `-tags
dcgm` (kept as a hook for future build-tag-gated paths).
- `make bench-check` loop drops both deleted package rows
(`internal/telemetry`, `components/receivers/dcgm`).
- `scripts/register-lint.sh` allowlist emptied (the two
`internal/telemetry/{build_info,slo}.go` entries are gone with the
package; allowlist comment notes the post-PR-F.1 state).
- `go.mod` direct deps shrink — `github.com/prometheus/client_golang`
and `go.opentelemetry.io/otel/exporters/prometheus` drop to indirect
(they were used by `internal/telemetry/server.go`).

### Chart toggles intentionally retained

Chart `receivers.dcgm` toggle + `templates/NOTES.txt` warning +
`templates/_helpers.tpl` doc-comment list keep the `dcgm` symbol for the
migration window. The toggle has been inert since PR-A2 — operators
enabling `receivers.dcgm.enabled=true` already crashed at boot because
the OCB binary doesn't register the factory. PR-K removes the toggle
entirely alongside the chart-default flip from `clockreceiver` →
`hostmetrics` and the v0.2.0 recipe migration.

### Doc sweep

- `internal/runtime/lifecycle/lifecycle.go` doc-comment: drop the dcgm
pointer; flag containerstdout as the sole remaining in-tree consumer;
reschedule the package itself for PR-F.2 deletion once containerstdout
ports off the helper or PR-K.2 deletes the receiver.
- `docs/FAILURE-MODES.md` self-tel-surface rows rewired from
`internal/telemetry/server_test.go::*` (deleted) to upstream-delegated
wording.
- `docs/patterns/{README,pattern-{1,3,4,5}}.md` replay-test pointers
updated — the in-tree `components/receivers/dcgm/pattern_replay_test.go`
is gone; pattern replay now flows through
`docs/integrations/prometheus-scrape.md` (PR-J's upstream
`dcgm-exporter` recipe).
- `docs/README.md` per-component table: drop the deleted
`internal/telemetry/{README,SECURITY}.md` rows + the deleted
`components/receivers/dcgm/{README,RUNBOOK}.md` rows.
- `STYLE.md` vendor-SDK section: drop the `pkg/dcgm/` reference + the
`//go:build dcgm` example; explicit cross-reference to PR-F.1 in the
integration-test build-tag note.
- `CHANGELOG.md`: PR-F.1 landed entry under Unreleased; "Remaining
v0.1.0 work" line updated to point at PR-F.2.
- `docs/rfcs/0013-distro-first-pivot.md` §migration step 8: PR-F entry
replaced with the PR-F.1/PR-F.2 split + the explicit rationale
(componentstatus travels with pipeline; pipeline is out of PR-F's scope
per line 240's original framing).

### Out of scope (PR-F.2 follow-up)

- `internal/componentstatus/` — 5-line `ReportStatus` free function.
Travels with `internal/pipeline` (its only non-test consumers are
`internal/pipeline/runtime_test.go` +
`internal/pipeline/pipelinetest/fixture_test.go`). Deletion lands when
pipeline migrates to upstream
`go.opentelemetry.io/collector/component/componentstatus`.

### Rationale links

- RFC-0013 §migration step 8 — the PR-F entry now codifies the F.1/F.2
split in this branch's RFC update.
- PR-B2 scope-discovery (#188) — established the "rename + slim, don't
reshape" pattern for the dcgm sweep that retired the cgo path.
- Wave-3 PRs that unblocked selftelemetry deletion: #184 (nccl_fr), #185
(clockreceiver), #186 (kernelevents), #187 (stdoutexporter), #188
(dcgm), #193 (otlphttp), #194 (pyspy), #196 (k8sevents), #197
(containerstdout).

```release-notes
[CHANGE] internal/{selftelemetry,telemetry} packages deleted; components/receivers/dcgm + pkg/dcgm deleted. Operators using the v0.1.x in-tree `tracecore.*` self-telemetry metric names migrate per docs/migration/v0.1-to-v0.2.md. Third-party importers of internal/* (unlikely pre-1.0) lose the `selftelemetry.{Receiver,Exporter}` interfaces and the `telemetry.MeterProvider` wrapper; receiver/exporter authors now wire a receiver-scoped sibling `selftel.go` per the PR-B1 pattern.
```

## Test plan

- [x] `make verify` (lint + vet + tidy-check + mod-verify +
license-check + generate-fixtures-check + build-tags + nccl-fr-rce-gate
+ register-lint + actionlint + zizmor + doc-check + no-autoupdate-check)
— exit 0.
- [x] `go test ./...` — all green (29 packages).
- [x] `make build` (OCB) — `./_build/tracecore` produced.
- [x] `./_build/tracecore --version` — `tracecore version
0.1.0-m9-alpha`.
- [x] Pre-flight greps for all four deleted paths — zero external
importers.
- [ ] CI green on PR (linux/race matrix, chart render, install-bench,
zizmor, govulncheck).
- [ ] Operator verification that the chart's `dcgm` toggle remains inert
post-merge (no behaviour change from main — already inert since PR-A2).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: Tri Lam <tri@maydow.com>
Co-authored-by: Tri Lam <tri@maydow.com>
trilamsr added a commit that referenced this pull request May 31, 2026
#204)

## Summary

- Port `components/receivers/k8sevents` off the v0.1.x internal facades
(`internal/pipeline`, `internal/consumer`, `internal/runtime/lifecycle`)
  onto upstream
`go.opentelemetry.io/collector/{component,receiver,consumer}` v1.59.0 —
the canonical types the OCB-generated `_build/main.go` already consumes.
  Mirrors PR-B2 #201 (nccl_fr); same type-swap table applied.
- Factory is now `receiver.NewFactory(componentType,
createDefaultConfig,
receiver.WithLogs(createLogs, component.StabilityLevelBeta))` instead of
  a hand-rolled struct implementing `internal/pipeline.ReceiverFactory`.
  Stability level (`Beta`) preserved so OCB-surfaced metadata doesn't
  regress.
- `k8sEventsReceiver` implements `receiver.Logs` via `Start(ctx,
  component.Host) error` / `Shutdown(ctx) error`; the
`pipeline.ComponentState` embed dropped (upstream `component.Component`
  carries no equivalent mixin; the multi-source lifecycle bookkeeping
  already lives in the sibling `lifecycle.go` from PR #196).
- Logger swapped from `*slog.Logger` → upstream's `*zap.Logger`. Every
  call site converted to `zap.String / Int / Int64 / Duration / Strings
  / Bool / Error` fields; log messages and field names preserved
  byte-for-byte so operator log-content alerting doesn't regress.

## Multi-source specifics

Unlike PR-B1/B2 nccl_fr (single-source, single `lc.Start`), k8sevents
runs an Events informer + an optional Node informer in parallel under
the package-local lifecycle helper's `Add()`. Both sources fan into the
same `selfTelemetry` instance; the receiver-wide degraded flag stays
the OR of per-source `degradedEvent` / `degradedNode` bits — preserved
across the port. Per-source error kinds (`KindNodeWatch`,
`KindNodeParse`, `KindNodePanic`, `KindNodeBackpressureDrop`) still
fire on the correct path; the
`TestReceiver_NodeWatchErrorFiresDistinctKind` /
`TestReceiver_EmitNodePreservesEventDegraded` /
`TestReceiver_NodeParseAndPanicFireDistinctKinds` falsifiers stay
green.
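
A minimal sketch of the OR-of-bits degraded flag described above, assuming one atomic bit per source. The type and method names are hypothetical; only the `degradedEvent` / `degradedNode` split comes from the description.

```go
package main

import "sync/atomic"

// degradedState keeps one bit per source so that one source recovering
// cannot clear a degradation reported by the other.
type degradedState struct {
	event atomic.Bool // set by the Events informer path
	node  atomic.Bool // set by the Node informer path
}

// SetEvent updates only the Events-source bit.
func (d *degradedState) SetEvent(v bool) { d.event.Store(v) }

// SetNode updates only the Node-source bit.
func (d *degradedState) SetNode(v bool) { d.node.Store(v) }

// Degraded is the receiver-wide flag: the OR of the per-source bits.
func (d *degradedState) Degraded() bool {
	return d.event.Load() || d.node.Load()
}
```

A single shared boolean would let a healthy Node emit overwrite an Events-side degradation, which is the kind of regression a test such as `TestReceiver_EmitNodePreservesEventDegraded` is named to catch.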

## Hard gate

PR-I.1 (submodule extraction) requires zero
`internal/(pipeline|consumer|runtime/lifecycle)` import hits in the
package. This PR clears it:

```
$ grep -rn 'internal/pipeline\|internal/consumer\|internal/runtime/lifecycle' \
    components/receivers/k8sevents/*.go
components/receivers/k8sevents/factory.go:27:// internal/pipeline factory; PR-F preserves it across the upstream
components/receivers/k8sevents/lifecycle.go:4:// v0.1.x dependency on `internal/runtime/lifecycle`, which is slated
```

Two comment-only mentions remain — historical context for the
v0.1.x → v0.2.0 migration, not imports.
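
Since a plain `grep` cannot tell a comment from an import, a stricter form of the gate could parse only the import declarations. The sketch below uses the standard `go/parser` package in `ImportsOnly` mode; the function name `bannedImports` and the idea of running it as a gate are assumptions, not part of this PR.

```go
package main

import (
	"go/parser"
	"go/token"
	"strconv"
	"strings"
)

// bannedImports parses src up to the end of its import block and returns
// every import path containing one of the banned fragments. Mentions in
// comments are ignored because only import specs are inspected.
func bannedImports(filename, src string, banned []string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, filename, src, parser.ImportsOnly)
	if err != nil {
		return nil, err
	}
	var hits []string
	for _, spec := range f.Imports {
		// Path.Value is the quoted literal, e.g. "\"context\"".
		path, err := strconv.Unquote(spec.Path.Value)
		if err != nil {
			return nil, err
		}
		for _, b := range banned {
			if strings.Contains(path, b) {
				hits = append(hits, path)
				break
			}
		}
	}
	return hits, nil
}
```

Run over each `.go` file in the package with the three internal path fragments as the banned list, a check like this would return no hits for the two comment-only mentions shown above.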

## Predecessor

PR #196 (merged) ported the **self-telemetry + lifecycle helpers** into
the package as siblings (with the multi-source `Add()` k8sevents needs).
This PR handles the **pipeline + consumer + factory** layer — the last
remaining `internal/*` imports in the package.

## Type-swap reference

Same mapping PR-B2 #201 established:

| Internal | Upstream |
|---|---|
| `internal/pipeline.Type` | `component.Type` |
| `internal/pipeline.ReceiverFactory` | `receiver.Factory` |
| `internal/pipeline.CreateSettings` | `receiver.Settings` (via
`receivertest.NewNopSettings` in tests) |
| `internal/pipeline.Config` | `component.Config` |
| `internal/pipeline.Receiver` | `receiver.Logs` (`= interface{
component.Component }`) |
| `internal/consumer.Logs` | `consumer.Logs` |
| `*slog.Logger` | `*zap.Logger` |
| `internal/pipeline.MustNewType` | `component.MustNewType` |
| `internal/pipeline.MustNewID` | `component.NewIDWithName` |
| `set.Telemetry.Logger` / `set.Telemetry.MeterProvider` /
`set.Telemetry.Resource` | `set.Logger` / `set.MeterProvider` /
`set.Resource` (flattened via embedded `TelemetrySettings`) |

## Test plan

- [x] `go build ./...` — green
- [x] `go test -race ./components/receivers/k8sevents/...` — 60+ tests
      pass (incl. multi-source informer + degraded-state falsifiers +
      goleak)
- [x] `go test ./...` — green except pre-existing
      `kernelevents/TestReceiver_SLIBudget` flake (passes on retry; same
      flake PR-B2 documented)
- [x] `make check` — golangci-lint + go vet + go mod verify — green
- [x] No `log/slog` imports remain in the package
- [x] Hard-gate `grep` returns only the two comment-only mentions

## Compatibility note

No new module versions added — k8sevents inherits the upstream pins
PR-B2 already brought in (`collector/component`, `collector/receiver`,
`collector/consumer` v1.59.0; `zap` 1.28.0; `receiver/receivertest`
v0.153.0).

```release-notes
NONE
```

Signed-off-by: Tri Lam <tri@maydow.com>
Co-authored-by: Tri Lam <tri@maydow.com>
trilamsr added a commit that referenced this pull request May 31, 2026
## Summary

Reconcile the four pivot-tracking docs
(`docs/rfcs/0013-distro-first-pivot.md`, `CHANGELOG.md`,
`MILESTONES.md`, `docs/migration/v0.1-to-v0.2.md`) with the wave-3
(PR-B1-shape sibling ports) and wave-4 (PR-B2-shape upstream-only ports
+ PR-F.1 + PR-J + PR-L + PR-N) landings. Pure doc sweep — no code or
config touched.

## What changed

### `docs/rfcs/0013-distro-first-pivot.md` §migration

PR sequence rows updated with PR-number citations and landed markers:

- **PR-A2** (landed, #189, 2026-05-30)
- **PR-B2** (landed, #201) — also enumerates sibling-receiver follow-ups
under PR-B2 to dispel the slug collision with #188's PR-B2-labelled dcgm
port: stdoutexporter (#202), pyspy (#203), kernelevents (#208),
containerstdout (#209)
- **PR-F.1** (landed) — fleshed-out delete list
(`internal/{selftelemetry,telemetry}` + `components/receivers/dcgm/` +
`pkg/dcgm/` + one orphan clockreceiver integration test)
- **PR-F.2** (re-scoped): now deletes the whole
`internal/{componentstatus,pipeline,pipelinebuilder,consumer,fanout,runtime/lifecycle}`
bundle in one cut once the last three pipeline+consumer-importing
receivers land (#204 k8sevents, #205 clockreceiver, #207 otlphttp). Per
the import graph, `internal/componentstatus`'s only non-test consumer is
`internal/pipeline`, so the two delete together
- **PR-G** (landed, #182), **PR-H** (landed, #183)
- **PR-I.1a** (in flight — scaffold agent), **PR-I.1b** (pre-staged;
gate satisfied by #201)
- **PR-J** (landed, #195) — kept existing marker
- **PR-K.1** (in flight — separate agent landing)
- **PR-L** (landed, skeleton #179 + body #191) — flagged as living
document
- **PR-N** (landed, #200) — shipped at v0.1.0 ahead of v0.3.0 as a
doc-only update at `docs/migration/v0.2-to-v0.3.md`

### `CHANGELOG.md` [Unreleased]

- Restructured the pivot wave list as **four waves** (was three). Wave 3
enumerates PR-B1-shape sibling ports + support infra (#180-#194/#196).
Wave 4 enumerates PR-B2-shape upstream-only ports + PR-J (#195) + PR-F.1
(#206) + PR-N (#200) + lint/TOCTOU hardening (#198/#210).
- Tightened the PR-F.2 deferred note to point at the three open ports
(#204/#205/#207) as the gate.

### `MILESTONES.md`

- **M1** (pipeline runtime): status row now cites PR-A2 (#189), PR-F.1
(#206), the PR-F.2 gate (#204/#205/#207), and PR-E (#180), and retains
`internal/config/` (still load-bearing for `tracecore validate`).
- **M2** (self-telemetry) — status row now cites PR-F.1 (#206); flags
`internal/componentstatus` as travelling with `internal/pipeline` in
PR-F.2.
- **M8** (DCGM receiver) — status flipped to *landed-and-replaced*:
cites PR-F.1 (#206) deletion + PR-J (#195)
`docs/integrations/prometheus-scrape.md` recipe. Notes the inert chart
toggle retention until PR-K.3.

### `docs/migration/v0.1-to-v0.2.md`

- §`internal/*` package deletion (PR-F) status flips from "not yet open"
to "PR-F.1 landed (#206), PR-F.2 gated on three open ports".
- Open-items checklist expanded from 5 to 13 entries — tracks every PR
letter the migration guide cares about (A2 / E / F.1 / F.2 / I.1a-c / J
/ K.1-3 / L / N) with PR numbers and links.

## Why now

Tracking docs accumulated drift across wave-3 + wave-4 because every
sibling-port PR (and the support-infra PRs around them) updated the
bottom of `CHANGELOG.md` but did not always touch the upstream
sequencing section in RFC-0013. Per memory rule `[Keeping this document
current]`: status drift is a review blocker. This PR is the consolidated
catch-up; future port PRs include their RFC-row flip in-PR.

## What this PR does NOT change

- No code, no config, no YAML, no chart — only the four tracking docs.
- No new doc gates added; existing gates pass.
- No files other than the four named docs are modified.

## Test plan

- [x] `bash scripts/doc-check.sh` clean (33 test refs, 528 links
resolve, comment-noise diff gate clean vs `origin/main`, all 13 gates
green).
- [x] Pre-commit hook (`commitlint` 72-char subject limit + DCO +
AI-trailer gates) passed.
- [x] Pre-push hook (`make ci-fast` equivalent: `golangci-lint`, `go
vet`, `go mod verify`, `no-autoupdate-check`, `doc-check.sh`) passed on
the second attempt. The first push failed because the worktree
previously tracked the (gone) `pr-a2-ocb-main-swap` branch, so
`doc-check.sh`'s comment-noise diff-scope gate exited 128 on the missing
`origin/main` ref. Running `git fetch origin main` populated the ref,
which fixes the root cause rather than working around it.
- [ ] CI green on this branch.

```release-notes
NONE
```

Signed-off-by: Tri Lam <tri@maydow.com>
Co-authored-by: Tri Lam <tri@maydow.com>
trilamsr added a commit that referenced this pull request May 31, 2026
## Summary

Deletes the seven `internal/*` packages that RFC-0013 §migration step 8
PR-F.2 promised once the upstream-port wave
(#201/#202/#203/#204/#205/#207/#208/#209) cleared every external caller
of the in-tree pipeline runtime.

**Net: -6,888 LOC across 56 deleted files, +80 LOC across 14 modified
files. 70 files total.** This is the final cut of RFC-0013 §migration
step 8 PR-F.

## What deletes

| Path | LOC | Replacement |
|---|---|---|
| `internal/pipeline/` | 4,134 | `go.opentelemetry.io/collector/service` (OCB-generated `_build/main.go` consumes `builder-config.yaml`). |
| `internal/pipelinebuilder/` | 1,282 | Same; assembly is upstream `service`. |
| `internal/config/` | 718 | Upstream `confmap` providers (`file`, `yaml`, `env`). |
| `internal/consumer/` | 87 | Upstream `go.opentelemetry.io/collector/consumer`. |
| `internal/fanout/` | 366 | Upstream `internal/fanoutconsumer` (collector module). |
| `internal/componentstatus/` | 16 | Upstream `component/componentstatus.ReportStatus` (same free-function shape). |
| `internal/runtime/lifecycle/` | 505 | Per-receiver package-local `lifecycle.go` siblings, already ported during the PR-B1 wave (#184/#185/#186/#187/#194/#196/#197); the in-tree helper had no remaining non-test consumer after PR-F.1 + the wave-2 upstream-port PRs. `kernelevents/lifecycle.go` was inherited from k8sevents (#208). |

## Pre-flight grep evidence

```
$ grep -rnE 'tracecoreai/tracecore/internal/(pipeline|consumer|pipelinebuilder|config|fanout|componentstatus|runtime/lifecycle)' --include='*.go' .
(zero matches)
```

## Tooling

- `.golangci.yml` `ignore-interface-regexps` repointed at upstream
`consumer.{Metrics,Traces,Logs}` + `component.Component`. The
in-tree-only same-package-error-wrap exemption stays — the STYLE rule
applies regardless of which interface is forwarded.
- `.github/workflows/chaos.yml` drops the `chaos-pipeline-test` job (the
in-tree `internal/pipeline/chaos_test.go` is gone; upstream `service`
provides the equivalent panic-recovery contract). `harness-determinism`
(failure-inject golden-SHA), `cpu-steal-mpstat`, `pattern-pod-evicted`
jobs preserved.
- `.github/workflows/install-bench.yml` drops the
`internal/{pipeline,runtime,selftelemetry}/**` path-filter rows.
- `go.mod` / `go.sum` unchanged.

## Doc sweep

- `CHANGELOG.md` Unreleased: PR-F.2 landed entry replacing the "PR-F.2
deferred" sentence; "Remaining v0.1.0 work" line updated; one dead
`internal/pipeline/README.md` link in Foundation block rewritten as
"deleted at v0.1.0".
- `docs/rfcs/0013-distro-first-pivot.md` §7 deletion table: both
pipeline-internals and runtime/lifecycle rows updated from "v0.1.0
(audit first…)" / "v0.2.0 (with last consumer)" to "v0.1.0 (landed
PR-F.2)". §migration step 8 reframed.
- `docs/FAILURE-MODES.md` Lifecycle / Data flow / Shutdown timing /
Backend tables rewired from in-tree
`internal/{config,pipeline,fanout}/*_test.go::TestName` pointers to
upstream-delegated wording matching the pattern PR-A2 established.
- `docs/STRATEGY.md` "Post-RFC-0013 status" intro updated; "Stable
interfaces in `internal/pipeline/`" graduation row rewritten to point at
the upstream surface.
- `docs/migration/v0.1-to-v0.2.md` `internal/*` section status banner
flipped from "deferred, still present in RC builds" to "landed, deleted
in v0.2.0 builds".
- `MILESTONES.md` v0.1.0 deletions row extended with boot-path
internals; M1 + M4b + M19 rubric details annotated with the PR-F.2
retirement.
- `README.md` Contributor row repointed at upstream
`go.opentelemetry.io/collector` package docs.
- `AGENTS.md` "Self-telemetry internals" bullet split into "Self-tel
internals" + "Pipeline / boot-path internals" with explicit deletion
status.
- `docs/README.md` table row for `internal/pipeline/README.md` dropped.
- `components/receivers/kernelevents/README.md` lifecycle-sibling
rationale updated to past-tense.
- `tools/failure-inject/README.md` "Testing locally" section drops the
`-tags=chaos ./internal/pipeline/...` invocation.

## Sequencing

This PR is hard-gated on every upstream-port PR landing first:

- #201 nccl_fr (PR-B2)
- #202 stdoutexporter
- #203 pyspy
- #204 k8sevents
- #205 clockreceiver (PR-B3)
- #207 otlphttp
- #208 kernelevents
- #209 containerstdout
- #206 PR-F.1 (selftel / telemetry / dcgm)

All nine merged before this PR opened; this is the moat-deletion payoff.
Remaining v0.1.0 work is PR-K (chart-default flip + `clockreceiver` +
`stdoutexporter` + remaining receiver source deletions, coupled with
test-fixture migration and the `telemetry:` values-key deprecation
cycle).

## Test plan

- [x] `make check` — golangci-lint 0 issues, go vet clean, go mod verify
ok.
- [x] `go build ./...` — clean.
- [x] `go test -count=1 ./...` — green (excluding the known
`kernelevents/TestReceiver_SLIBudget` flake called out in #205's body,
which only triggers under heavy parallel `go test ./...` load; passes
standalone).
- [x] `grep` confirms zero non-internal callers of the deleted packages.
- [x] Doc-check pre-push hook passes after the CHANGELOG dead-link fix.

```release-notes
[CHANGE] internal/{pipeline,pipelinebuilder,config,consumer,fanout,componentstatus,runtime/lifecycle} packages deleted. The OCB-generated boot path off builder-config.yaml replaces them. Third-party importers of internal/* (unlikely pre-1.0; the packages live under internal/ and the Go compiler rejects external imports) lose the pipeline-assembly + lifecycle + config-loader surfaces; receiver authors now wire against upstream go.opentelemetry.io/collector/{component,receiver,consumer,pipeline} directly. See docs/migration/v0.1-to-v0.2.md "internal/* package deletion".
```

---------

Signed-off-by: Tri Lam <tri@maydow.com>
Co-authored-by: Tri Lam <tri@maydow.com>