feat(pivot): PR-F port containerstdout off internal selftel + lc#197

Merged
trilamsr merged 3 commits into main from pr-containerstdout-selftel-port on May 31, 2026

@trilamsr trilamsr commented May 31, 2026

Summary

Ports components/receivers/containerstdout off the internal/selftelemetry + internal/runtime/lifecycle packages — biggest port in the RFC-0013 PR-F unblock wave (14 production .go files, 24 deleted import sites, ~80 selftelemetry/lifecycle references rewired). Receiver now owns its self-telemetry + streaming-source lifecycle surfaces in-tree, mirroring the kernelevents (multi-source) + dcgm (co-located Kind block) sibling patterns. Zero remaining imports of the internal packages from inside containerstdout/.

Unblocks PR-F's deletion of internal/selftelemetry + internal/runtime/lifecycle once the remaining sibling ports (pyspy/otlphttp/k8sevents) land.

Design notes

  • selftel.go exported surface. Per reviewer findings on the initial port:
    • Telemetry interface, Kind type, KindXxx constants, and CapturingTelemetry / NewCapturingTelemetry stay EXPORTED. They are consumed by ~10 external test files in package containerstdout_test (cursor_test, attribution_test, ratelimit_test, tailer_test, factory_test, attribution_body_test, attribution_informer_test, dataloader_test, pattern_consumer_test, failure_modes_test). Operators also grep KindXxx names in dashboards/alert rules per RFC-0010. Symbol-hiding here would force every external test into the internal package — a much larger blast radius than the symbol export.
    • newTelemetry, newNoopTelemetry, errNilMeterProvider are UNEXPORTED. Factory + same-package selftel_test.go are the only callers; no godoc-visible surface gain from being exported.
  • Canonical + receiver-local Kind* co-located. One const block in selftel.go for the full kind set (KindParse/KindRead/KindCardinality/KindDownstream/KindPanic canonical mirrors + KindRotationStalled/KindCursorWriteFailed/KindBackpressureDrop/KindAttributionCardinality/KindRateLimitCardinality/KindWatch receiver-local). Replaces the prior split between kind.go and internal/selftelemetry. Matches the dcgm sibling layout. kind.go deleted; kind_test.go updated to also pin the canonical mirror values (extra coverage, since string-byte parity with the deleted internal Kind* values is now load-bearing for dashboard/alert grep).
  • KindFingerprintCardinality dropped (reviewer finding). The original RFC-0010 enumeration declared a fingerprint-LRU cardinality kind, but the tailer-pool LRU it was meant to instrument never materialised — zero IncError call sites in production code, only defensive coverage in TestSelfTel_* and a t.Skip()'d "Phase 14" deferred test. Per the dcgm sibling rule "every kind has an impl call site" + [[no-bloat]], deleted the const + the skipped test + RUNBOOK section + README row + prometheus-alerts entry + RFC-0010 enumeration + docs/followups/M15.md grep example. Footnoted in RFC-0010 §M15.
  • lifecycle.go — receiver-scoped lifecycle with Add(). Multi-source: the per-tailer Run + per-tailer pipeline + informer + healthLoop + idleEvictLoop all join the same WaitGroup. TOCTOU-safe Add (mutex around (closed-check, wg.Add(1))).
  • CapturingTelemetry lives in production file. External _test packages need to construct it (replaces selftelemetry.NewCapturingReceiver() in 7 test files). Putting it in a _test.go file would force a test-helper package or build-tag dance. The dead-code cost in the production binary is negligible.
  • Test-rig _test.go files swap selftelemetry.NewCapturingReceiver() → containerstdout.NewCapturingTelemetry() mechanically. Same shape, same accessors, same []Kind slice return for assertion.
  • All 9 selftel tests prefixed TestContainerstdout_SelfTel_* (reviewer finding) so test names disambiguate from sibling receivers' selftel tests — matches the lifecycle-test prefix convention.
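
The co-located Kind block and the production-file CapturingTelemetry described above can be sketched roughly as follows. This is a minimal illustration, not the receiver's actual selftel.go: the Kind string values, the `IncError` signature, and the `Kinds()` accessor name are assumptions made for the example; only the type and constant names come from the notes above.

```go
package main

import (
	"fmt"
	"sync"
)

// Kind labels an error class. The names below appear in the PR notes;
// the string values are placeholders, not the real dashboard-visible values.
type Kind string

const (
	KindParse           Kind = "parse"            // canonical mirror (value assumed)
	KindRead            Kind = "read"             // canonical mirror (value assumed)
	KindRotationStalled Kind = "rotation_stalled" // receiver-local (value assumed)
)

// Telemetry is the surface receiver components call into.
// The single-argument IncError signature is an assumption.
type Telemetry interface {
	IncError(kind Kind)
}

// CapturingTelemetry records every Kind it receives so that tests in an
// external _test package can assert on the sequence.
type CapturingTelemetry struct {
	mu    sync.Mutex
	kinds []Kind
}

// NewCapturingTelemetry returns an empty capturing implementation.
func NewCapturingTelemetry() *CapturingTelemetry { return &CapturingTelemetry{} }

// IncError appends kind to the captured list under the lock.
func (c *CapturingTelemetry) IncError(kind Kind) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.kinds = append(c.kinds, kind)
}

// Kinds returns a copy of the captured kinds (hypothetical accessor name).
func (c *CapturingTelemetry) Kinds() []Kind {
	c.mu.Lock()
	defer c.mu.Unlock()
	return append([]Kind(nil), c.kinds...)
}

func main() {
	tel := NewCapturingTelemetry()
	var t Telemetry = tel // production code would hold the interface
	t.IncError(KindRotationStalled)
	t.IncError(KindParse)
	fmt.Println(tel.Kinds())
}
```

Keeping the capturing type in a non-test file is what lets a `containerstdout_test` package construct it directly, at the cost of a few unused bytes in the production binary.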

Test plan

  • go test ./components/receivers/containerstdout/ -count=1 -race — green.
  • go test ./... -count=1: full repo green except a pre-existing flake in TestK8sevents_Lifecycle_ConcurrentAddDuringShutdown_NoPanic, introduced by #196 (feat(pivot): port k8sevents off internal selftel + lifecycle (PR-F unblock)). Race-window collapse on fast machines; 3/5 retry-pass; unrelated to this PR.
  • make check — gofumpt clean, golangci-lint 0 issues, vet clean, go.mod verified.
  • TestTailer_RotationStalledKind ×10 under -race — green.
  • TestContainerstdout_Lifecycle_ConcurrentAddDuringShutdown_NoPanic ×20 under -race — green. Scheduler-hardened with a shutdownGate so the test deterministically exercises the TOCTOU race window.
  • New selftel tests pin: noop safety across every Kind, nil-MP error sentinel, errors_total w/ kind + component_id labels, every receiver-local Kind routes through the same counter, emissions discard-negative, scope name = receiver import path, init_errors_total tick on factory fallback, nil-MP recordInitError panic-safe, CapturingTelemetry round-trip.
  • New lifecycle tests pin: happy/idempotent/panic-cb/deadline-wrap/Add-shares-WG/Add-panic-fires-cb/Add-before-Start-noop/Add-after-Shutdown-noop/concurrent-Add-during-Shutdown TOCTOU.
Release notes: NONE

Receiver-scoped sibling for `internal/selftelemetry` and
`internal/runtime/lifecycle` so RFC-0013 PR-F can delete the
internal moats. Mirrors the kernelevents (PR #187) multi-source
sibling pattern + the dcgm (PR #188) co-located Kind block. 14
production .go files rewired; zero remaining imports of the
internal packages in containerstdout.

- selftel.go: exported Telemetry interface, Kind type (canonical
  mirrors + receiver-local kinds co-located in one const block),
  NewTelemetry / NewNoopTelemetry / NewCapturingTelemetry,
  recordInitError. Scope name pinned to the receiver's import path.
  Constructors stay exported because external _test packages
  consume them via NewCursorStore / NewCache / NewRateLimiter /
  NewPodInformer / TailerOptions.Telemetry.
- lifecycle.go: receiver-scoped lifecycle with Add() (multi-source:
  per-tailer Run + per-tailer pipeline + informer + health +
  idleEvict loops all join the same WaitGroup). TOCTOU-safe Add.
- New tests (TDD): selftel_test pins noop safety across every
  Kind, nil-MP error sentinel, errors_total + scope name, every
  receiver-local kind routes through the same counter,
  init_errors_total tick, nil-MP recordInitError safety,
  CapturingTelemetry round-trip. lifecycle_test pins
  happy/idempotent/panic-cb/deadline/Add-WG-share/Add-panic-cb/
  Add-before-start-noop/Add-after-shutdown-noop/concurrent-Add-
  during-Shutdown (TOCTOU, race-detector verified, scheduler-
  hardened with a shutdownGate so the test exercises the race
  window deterministically rather than flaking on fast machines).
- TestTailer_RotationStalledKind stays green via the new sibling
  (the pre-existing assertion that KindRotationStalled fires after
  stall must hold; verified under -race × 10).

LOC: +1419 new selftel/lifecycle (incl. tests) / -193 in 23
rewired sites = net +1226 across the receiver. Eliminates 24
import sites of the deleted-target packages.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Signed-off-by: Tri Lam <tri@maydow.com>
@trilamsr trilamsr enabled auto-merge (squash) May 31, 2026 06:25
@trilamsr trilamsr disabled auto-merge May 31, 2026 06:25
Tri Lam added 2 commits May 30, 2026 23:40
Reviewer findings on PR #197:

1. Delete KindFingerprintCardinality. The kind had zero IncError call
   sites in production — only defensive coverage in TestSelfTel_*
   and a t.Skip()'d "Phase 14" deferred test. Per the dcgm sibling
   rule "every kind has an impl call site" + [[no-bloat]], deleted
   the const, the failure_modes_test.go skipped test, kind_test.go
   row, RUNBOOK section, README alert row, prometheus-alerts entry,
   RFC-0010 enumeration, and docs/followups/M15.md grep example.
   The original tailer-pool LRU it was meant to instrument never
   materialised (footnoted in RFC-0010 §M15).

2. Rename 9 TestSelfTel_* → TestContainerstdout_SelfTel_* so test
   names disambiguate from sibling receivers' selftel tests (matches
   the lifecycle-test prefix convention reviewer flagged).

3. Reduce export surface — NewTelemetry, NewNoopTelemetry, and
   ErrNilMeterProvider had zero external callers (factory + selftel_test
   only, both same-package), so unexported to newTelemetry,
   newNoopTelemetry, errNilMeterProvider. Telemetry interface, Kind
   type, KindXxx constants, and CapturingTelemetry/NewCapturingTelemetry
   stay EXPORTED — they are used by ~10 external test files in package
   containerstdout_test (cursor_test, attribution_test, ratelimit_test,
   tailer_test, factory_test, etc.) and operators grep KindXxx names
   in dashboards/alert rules per RFC-0010. Unexporting them would
   force converting every external test into the internal package,
   which is net bloat against the symbol-hiding gain.

make check + go test -race ./components/receivers/containerstdout/...
+ go test ./... all green (excluding pre-existing TestK8sevents_
Lifecycle_ConcurrentAddDuringShutdown_NoPanic flake from #196).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Signed-off-by: Tri Lam <tri@maydow.com>
@trilamsr trilamsr enabled auto-merge (squash) May 31, 2026 06:56
@trilamsr trilamsr merged commit e94ad0b into main May 31, 2026
20 of 22 checks passed
@trilamsr trilamsr deleted the pr-containerstdout-selftel-port branch May 31, 2026 07:27
trilamsr added a commit that referenced this pull request May 31, 2026
## Summary

- Port `components/receivers/containerstdout` off the v0.1.x internal
  facades (`internal/pipeline`, `internal/consumer`) onto upstream
  `go.opentelemetry.io/collector/{component,receiver,consumer}` v1.59.0
  — the canonical types the OCB-generated `_build/main.go` already
  consumes for all third-party receivers.
- Follow-up to **PR-F #197** (which ported off `internal/selftelemetry`
  + `internal/runtime/lifecycle` into in-package siblings) and mirrors
  **PR-B2 #201** (which did the same swap for nccl_fr). After this PR
  the receiver has zero `internal/*` imports, clearing the PR-I.1
  submodule-extraction gate for containerstdout.
- Factory is now `receiver.NewFactory(componentType,
  createDefaultConfig, receiver.WithLogs(createLogs,
  component.StabilityLevelBeta))` instead of a hand-rolled struct
  implementing `internal/pipeline.ReceiverFactory`. Stability level
  (`Beta`) preserved across the swap so OCB-surfaced metadata doesn't
  regress. The `Factory` package-var + `type factory struct{}` are
  deleted; each OCB-stitched pipeline gets a freshly-built factory via
  `containerstdout.NewFactory()` (mirrors upstream-contrib
  otlpreceiver / filelogreceiver and the sibling nccl_fr port).
- Receiver + noopReceiver no longer embed `pipeline.ComponentState`
  (upstream `component.Component` carries no equivalent
  Started/Stopped mixin; the runtime never read that bookkeeping on
  the upstream graph). Lifecycle bookkeeping the receiver actually
  needs lives in the in-package `lifecycle` helper added in PR-F #197.
- Logger swapped from `*slog.Logger` → upstream's `*zap.Logger` (the
  type carried in `receiver.Settings.Logger`). All log call sites
  converted to `zap.String/Int64/Float64/Bool/Error` fields; log
  messages and field names are byte-for-byte preserved so operator
  alerting on log content does not regress. Internal lifecycle helper,
  tailer, informer, pipeline, and receiver all converted in lockstep.

## Type-swap reference

Inherited verbatim from PR-B2 #201 — the canonical mapping for the
PR-F.2 receiver/exporter ports.

| Internal | Upstream |
|---|---|
| `internal/pipeline.Type` | `component.Type` |
| `internal/pipeline.ReceiverFactory` | `receiver.Factory` |
| `internal/pipeline.CreateSettings` | `receiver.Settings` (via `receivertest.NewNopSettings` in tests) |
| `internal/pipeline.Config` | `component.Config` |
| `internal/pipeline.Receiver` | `receiver.Logs` (`= interface{ component.Component }`) |
| `internal/pipeline.Host` | `component.Host` (via `componenttest.NewNopHost` in tests) |
| `internal/pipeline.ID` | `component.ID` |
| `internal/consumer.Logs` | `consumer.Logs` |
| `*slog.Logger` | `*zap.Logger` |
| `internal/pipeline.MustNewType` | `component.MustNewType` |
| `internal/pipeline.MustNewID` | `component.NewIDWithName` |

## Hard gate

PR-I.1 (submodule extraction to
`module/receiver/containerstdoutreceiver/`) requires zero `internal/*`
imports from the receiver package. This PR clears it:

```
$ grep -rn 'internal/pipeline\|internal/consumer\|internal/runtime/lifecycle\|internal/selftelemetry' components/receivers/containerstdout/*.go
(no matches)
```

Comment-only historical references remain in `factory.go`,
`receiver.go`, `noop_receiver.go`, `selftel.go`, `kind_test.go`, and
`selftel_test.go` documenting the v0.1.x → v0.2.0 migration; they are
not imports.
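
The distinction between real imports and comment-only references can also be checked structurally instead of with grep. The sketch below is an illustration under assumptions, not tooling from this repository: `forbiddenImports` is a hypothetical helper and the `example.com/module` path is invented. It uses the standard library's `go/parser` in `ImportsOnly` mode, so text inside comments is never matched.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"strconv"
	"strings"
)

// forbiddenImports parses only the import declarations of src and returns
// every import path that contains an "/internal/" segment.
func forbiddenImports(filename, src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, filename, src, parser.ImportsOnly)
	if err != nil {
		return nil, err
	}
	var hits []string
	for _, imp := range f.Imports {
		// Path.Value is the quoted literal, e.g. "\"fmt\"".
		path, err := strconv.Unquote(imp.Path.Value)
		if err != nil {
			return nil, err
		}
		if strings.Contains(path, "/internal/") {
			hits = append(hits, path)
		}
	}
	return hits, nil
}

func main() {
	// Hypothetical source: one comment mentioning internal/pipeline,
	// one real import of an internal package.
	src := `package containerstdout

// Historical note: v0.1.x used internal/pipeline here (comment only).
import (
	"context"

	"example.com/module/internal/pipeline"
)
`
	hits, err := forbiddenImports("factory.go", src)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(hits)
}
```

A check like this would report only the real import and ignore the historical comment, which is the property the gate above relies on.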

## Predecessor / scope

- **PR-B2 #201** (merged 2026-05-31) — same swap on nccl_fr; reference
  template for the type-swap table above.
- **PR-F #197** (merged 2026-05-31) — extracted `internal/selftelemetry`
  + `internal/runtime/lifecycle` into in-package siblings (`selftel.go`,
  `lifecycle.go`, `CapturingTelemetry`, local `Kind*` consts). This PR
  builds on top.

Together with the sibling receiver/exporter ports already merged
(clockreceiver / kernelevents / stdoutexporter / k8sevents / otlphttp /
pyspy / dcgm / nccl_fr), this is the last `internal/{pipeline,consumer}`
import site on `components/receivers/containerstdout` — once the other
sibling PR-F.2 ports land, RFC-0013 PR-F can delete the
`internal/{pipeline,consumer}` packages outright.

## Test removals (intentional)

Three tests are removed because the upstream API makes them
tautological:

- `TestFactory_CreateMetrics_Unsupported` /
  `TestFactory_CreateTraces_Unsupported` — `receiver.NewFactory(...
  WithLogs(...))` returns a factory whose CreateMetrics / CreateTraces
  surface upstream's `componenterror.ErrDataTypeIsNotSupported`
  naturally; no hand-rolled sentinel to pin.
- `TestNewFactory_ReturnsPackageVarFactory` — there is no longer a
  `Factory` package-var; `NewFactory()` constructs a fresh factory
  each call, mirroring upstream-contrib. Mirrors the PR-B2 nccl_fr
  removal.

## Compatibility note

`go.mod` promotes `go.opentelemetry.io/collector/component/componenttest`
from indirect to direct (used by `receiver_test.go` + `pipeline_test.go`
for `componenttest.NewNopHost()`). No transitive-dep churn beyond that:
all `go.opentelemetry.io/collector/{component,receiver,consumer}` v1.59.0
+ `receiver/receivertest` v0.153.0 were already pinned by PR-B2.

## Test plan

- [x] `make check` — gofumpt clean, golangci-lint 0 issues, vet clean,
  go mod verified.
- [x] `go test -race ./components/receivers/containerstdout/... -count=10`:
  all tests green under race across 10 runs, including stress runs
  of `TestTailer_RotationStalledKind` and the TOCTOU pin
  `TestContainerstdout_Lifecycle_ConcurrentAddDuringShutdown_NoPanic`.
- [x] `go test ./...` — full repo green except the pre-existing
  `TestK8sevents_Lifecycle_ConcurrentAddDuringShutdown_NoPanic`
  race-window flake (passes on retry; documented in PR-B2 #201 +
  PR-F #197 bodies).
- [x] `TestContainerstdout_SelfTel_ScopeNameIsReceiverImportPath` still
  pins the OTel scope name to the receiver's Go import path
  (`github.com/tracecoreai/tracecore/components/receivers/containerstdout`)
  so a regression back to the deleted internal/selftelemetry scope
  fails here.

```release-notes
NONE
```

Signed-off-by: Tri Lam <tri@maydow.com>
Co-authored-by: Tri Lam <tri@maydow.com>
trilamsr added a commit that referenced this pull request May 31, 2026
#206)

## Summary

Deletes the three internal moats and the in-tree DCGM receiver that
RFC-0013 §migration step 8 promised — the payoff for the wave-3
sibling-port PRs (#184/#185/#186/#187/#188/#193/#194/#196/#197).

**Net: -12,482 LOC across 92 files (78 deletions, 14 modifications).**

### What deletes

| Path | LOC | Why safe now |
|---|---|---|
| `components/receivers/dcgm/` | 7,604 | cgo stub never shipped real code; #188's PR-B2-shaped dcgm sweep already removed the live port surface. |
| `pkg/dcgm/` | 922 | Only consumer was the deleted receiver. Bonus cleanup. |
| `internal/selftelemetry/` | 1,946 | Every consumer (containerstdout, clockreceiver, kernelevents, k8sevents, nccl_fr, dcgm, pyspy, stdoutexporter, otlphttp) ported onto receiver/exporter-scoped sibling `selftel.go` files. |
| `internal/telemetry/` | 1,991 | Probes flow through upstream `healthcheckextension`; MeterProvider via upstream `service.telemetry`. Only remaining consumers were `internal/selftelemetry/*_test.go` (deleted together) + one orphan clockreceiver test. |
| `components/receivers/clockreceiver/errors_integration_test.go` | 100 | Orphan from #185's PR-B1 clockreceiver port: bootstrapped via the deleted `selftelemetry.Receiver` interface but never migrated to the receiver-scoped sibling `selftel.go`. Covered behaviour ("errors_total surfaces on downstream failure") is now exercised through clockreceiver's sibling tests. |

### Pre-flight grep evidence (post-merge of origin/main)

```
$ grep -rn "tracecoreai/tracecore/internal/selftelemetry" --include="*.go" .
(zero matches)

$ grep -rn "tracecoreai/tracecore/internal/telemetry" --include="*.go" .
(zero matches)

$ grep -rn "tracecoreai/tracecore/components/receivers/dcgm" --include="*.go" .
$ grep -rn "tracecoreai/tracecore/pkg/dcgm" --include="*.go" .
(zero matches)
```

### Tooling

- Retire the `dcgm` build tag — `make build-tags` no longer vets `-tags
dcgm` (kept as a hook for future build-tag-gated paths).
- `make bench-check` loop drops both deleted package rows
(`internal/telemetry`, `components/receivers/dcgm`).
- `scripts/register-lint.sh` allowlist emptied (the two
`internal/telemetry/{build_info,slo}.go` entries are gone with the
package; allowlist comment notes the post-PR-F.1 state).
- `go.mod` direct deps shrink — `github.com/prometheus/client_golang`
and `go.opentelemetry.io/otel/exporters/prometheus` drop to indirect
(they were used by `internal/telemetry/server.go`).

### Chart toggles intentionally retained

Chart `receivers.dcgm` toggle + `templates/NOTES.txt` warning +
`templates/_helpers.tpl` doc-comment list keep the `dcgm` symbol for the
migration window. The toggle has been inert since PR-A2 — operators
enabling `receivers.dcgm.enabled=true` already crashed at boot because
the OCB binary doesn't register the factory. PR-K removes the toggle
entirely alongside the chart-default flip from `clockreceiver` →
`hostmetrics` and the v0.2.0 recipe migration.

### Doc sweep

- `internal/runtime/lifecycle/lifecycle.go` doc-comment: drop the dcgm
pointer; flag containerstdout as the sole remaining in-tree consumer;
reschedule the package itself for PR-F.2 deletion once containerstdout
ports off the helper or PR-K.2 deletes the receiver.
- `docs/FAILURE-MODES.md` self-tel-surface rows rewired from
`internal/telemetry/server_test.go::*` (deleted) to upstream-delegated
wording.
- `docs/patterns/{README,pattern-{1,3,4,5}}.md` replay-test pointers
updated — the in-tree `components/receivers/dcgm/pattern_replay_test.go`
is gone; pattern replay now flows through
`docs/integrations/prometheus-scrape.md` (PR-J's upstream
`dcgm-exporter` recipe).
- `docs/README.md` per-component table: drop the deleted
`internal/telemetry/{README,SECURITY}.md` rows + the deleted
`components/receivers/dcgm/{README,RUNBOOK}.md` rows.
- `STYLE.md` vendor-SDK section: drop the `pkg/dcgm/` reference + the
`//go:build dcgm` example; explicit cross-reference to PR-F.1 in the
integration-test build-tag note.
- `CHANGELOG.md`: PR-F.1 landed entry under Unreleased; "Remaining
v0.1.0 work" line updated to point at PR-F.2.
- `docs/rfcs/0013-distro-first-pivot.md` §migration step 8: PR-F entry
replaced with the PR-F.1/PR-F.2 split + the explicit rationale
(componentstatus travels with pipeline; pipeline is out of PR-F's scope
per line 240's original framing).

### Out of scope (PR-F.2 follow-up)

- `internal/componentstatus/` — 5-line `ReportStatus` free function.
Travels with `internal/pipeline` (its only non-test consumers are
`internal/pipeline/runtime_test.go` +
`internal/pipeline/pipelinetest/fixture_test.go`). Deletion lands when
pipeline migrates to upstream
`go.opentelemetry.io/collector/component/componentstatus`.

### Rationale links

- RFC-0013 §migration step 8 — the PR-F entry now codifies the F.1/F.2
split in this branch's RFC update.
- PR-B2 scope-discovery (#188) — established the "rename + slim, don't
reshape" pattern for the dcgm sweep that retired the cgo path.
- Wave-3 PRs that unblocked selftelemetry deletion: #184 (nccl_fr), #185
(clockreceiver), #186 (kernelevents), #187 (stdoutexporter), #188
(dcgm), #193 (otlphttp), #194 (pyspy), #196 (k8sevents), #197
(containerstdout).

```release-notes
[CHANGE] internal/{selftelemetry,telemetry} packages deleted; components/receivers/dcgm + pkg/dcgm deleted. Operators using the v0.1.x in-tree `tracecore.*` self-telemetry metric names migrate per docs/migration/v0.1-to-v0.2.md. Third-party importers of internal/* (unlikely pre-1.0) lose the `selftelemetry.{Receiver,Exporter}` interfaces and the `telemetry.MeterProvider` wrapper; receiver/exporter authors now wire a receiver-scoped sibling `selftel.go` per the PR-B1 pattern.
```

## Test plan

- [x] `make verify` (lint + vet + tidy-check + mod-verify +
license-check + generate-fixtures-check + build-tags + nccl-fr-rce-gate
+ register-lint + actionlint + zizmor + doc-check + no-autoupdate-check)
— exit 0.
- [x] `go test ./...` — all green (29 packages).
- [x] `make build` (OCB) — `./_build/tracecore` produced.
- [x] `./_build/tracecore --version` — `tracecore version
0.1.0-m9-alpha`.
- [x] Pre-flight greps for all four deleted paths — zero external
importers.
- [ ] CI green on PR (linux/race matrix, chart render, install-bench,
zizmor, govulncheck).
- [ ] Operator verification that the chart's `dcgm` toggle remains inert
post-merge (no behaviour change from main — already inert since PR-A2).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: Tri Lam <tri@maydow.com>
Co-authored-by: Tri Lam <tri@maydow.com>
trilamsr added a commit that referenced this pull request May 31, 2026
…ts+k8sevents) (#210)

## Summary

Backport containerstdout's hardened TOCTOU race-window test pattern
(#197, #209) to `kernelevents` and `k8sevents`
`Lifecycle_ConcurrentAddDuringShutdown_NoPanic` tests. The pre-hardened
tests have a brittle race window that collapses on fast schedulers —
Shutdown wins universally, every Add no-ops via the closed-guard, and
the test passes vacuously while never exercising the TOCTOU path it
claims to cover. On slightly slower (or differently-loaded) CI machines,
the same brittleness flips and produces intermittent failures, blocking
unrelated PRs.

## Root cause

The lifecycle's mutex guard around `(closed-check, wg.Add(1))` is
correct in production code. The *test*, however, releases all 50 adders
+ the shutdowner from a single gate, then relies on `runtime.Gosched()`
alone to interleave them. On fast schedulers the shutdowner reaches
`lc.Shutdown(...)` before any adder reaches `lc.Add(...)`, so:

- Branch (a) "Add wins, registers under WaitGroup" never fires
- Branch (b) "Shutdown wins, Add no-ops" fires for all 50 adders
- Test passes but tests nothing meaningful
- On the flip side, how the race window straddles the Shutdown call
changes between CI runs, which surfaces as a flake

Containerstdout's hardened equivalent inserts a `shutdownGate` channel
that holds the shutdowner back until 50µs after `release` fires,
deterministically straddling adders-in-flight with the Shutdown call.
This PR ports that pattern verbatim.
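
The gate pattern described above can be sketched as follows. This is an illustrative reconstruction under assumptions, not the test from either package: the `lifecycle` stand-in, the `raceOnce` helper, and the boolean return from `Add` are all hypothetical. What it shows is the ordering: adders are released first, and the shutdowner is held behind a separate `shutdownGate` that closes 50µs later.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// lifecycle is a hypothetical stand-in with the mutex-guarded
// (closed-check, wg.Add(1)) pair.
type lifecycle struct {
	mu     sync.Mutex
	closed bool
	wg     sync.WaitGroup
}

// Add registers fn under the WaitGroup unless Shutdown already won.
func (l *lifecycle) Add(fn func()) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.closed {
		return false
	}
	l.wg.Add(1)
	go func() {
		defer l.wg.Done()
		fn()
	}()
	return true
}

// Shutdown closes the lifecycle and waits for registered goroutines.
func (l *lifecycle) Shutdown() {
	l.mu.Lock()
	l.closed = true
	l.mu.Unlock()
	l.wg.Wait()
}

// raceOnce releases the adders, then opens shutdownGate 50µs later,
// and returns how many Add calls registered before Shutdown won.
func raceOnce(adders int) int64 {
	lc := &lifecycle{}
	release := make(chan struct{})
	shutdownGate := make(chan struct{})
	var registered atomic.Int64
	var all sync.WaitGroup

	for i := 0; i < adders; i++ {
		all.Add(1)
		go func() {
			defer all.Done()
			<-release
			if lc.Add(func() {}) {
				registered.Add(1)
			}
		}()
	}
	all.Add(1)
	go func() {
		defer all.Done()
		<-shutdownGate // held back so adders are already in flight
		lc.Shutdown()
	}()

	close(release)
	time.Sleep(50 * time.Microsecond)
	close(shutdownGate)
	all.Wait()
	return registered.Load()
}

func main() {
	for i := 0; i < 5; i++ {
		fmt.Printf("iteration %d: registered=%d of 50\n", i, raceOnce(50))
	}
}
```

The registered count varies from run to run; a count below the number of adders means some Add calls hit the closed guard, and a non-zero count means at least one Add registered before Shutdown.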

## Changes

- `components/receivers/kernelevents/lifecycle_test.go`: add
`shutdownGate` chan + 50µs sleep + comment explaining intent
- `components/receivers/k8sevents/lifecycle_test.go`: same

Production code: unchanged.

## Verification

```
for i in $(seq 1 10); do go test -race -count=1 -run ConcurrentAddDuringShutdown ./components/receivers/kernelevents/...; done
# 10/10 PASS

for i in $(seq 1 10); do go test -race -count=1 -run ConcurrentAddDuringShutdown ./components/receivers/k8sevents/...; done
# 10/10 PASS
```

Branch-coverage check across 10 verbose iterations (registered count <50
== branch (b) exercised):

- kernelevents: registered counts 44, 47, 49, 50, 50, 47, 44, 50, 50, 50
→ branch (b) hit 5/10 iterations
- k8sevents: registered counts 49, 49, 50, 50, 50, 48, 50, 44, 48, 50 →
branch (b) hit 5/10 iterations

Both branches deterministically exercised. The existing `registeredCount
== 0` guard in the test prevents the inverse-vacuous regression
(all-no-op).

Full repo: `make check` clean, `go test -race ./...` green.

## Motivation

Unblocks four in-flight PRs hitting this flake on CI:
- #203
- #204
- #205
- #207

## Reference

Containerstdout's port: #197 (original finding + fix), #209 (additional
hardening). Same pattern applied here with no behavioral divergence.

## Test plan

- [x] `make check` passes locally
- [x] `go test -race ./...` passes locally
- [x] 10-iter stress on `ConcurrentAddDuringShutdown` in both target
packages, 10/10 pass
- [x] Verified both TOCTOU branches still exercised (branch (a) every
iter; branch (b) 5/10 iters in each package, per the registered counts
listed under Verification)
- [ ] CI green on this branch
- [ ] PRs #203, #204, #205, #207 re-run + CI green after merge

```release-notes
NONE
```

Signed-off-by: Tri Lam <tri@maydow.com>
Co-authored-by: Tri Lam <tri@maydow.com>
trilamsr added a commit that referenced this pull request May 31, 2026
## Summary

Deletes the seven `internal/*` packages that RFC-0013 §migration step 8
PR-F.2 promised once the upstream-port wave
(#201/#202/#203/#204/#205/#207/#208/#209) cleared every external caller
of the in-tree pipeline runtime.

**Net: -6,888 LOC across 56 deleted files, +80 LOC across 14 modified
files. 70 files total.** This is the final cut of RFC-0013 §migration
step 8 PR-F.

## What deletes

| Path | LOC | Replacement |
|---|---|---|
| `internal/pipeline/` | 4,134 | `go.opentelemetry.io/collector/service` (OCB-generated `_build/main.go` consumes `builder-config.yaml`). |
| `internal/pipelinebuilder/` | 1,282 | Same: assembly is upstream `service`. |
| `internal/config/` | 718 | Upstream `confmap` providers (`file`, `yaml`, `env`). |
| `internal/consumer/` | 87 | Upstream `go.opentelemetry.io/collector/consumer`. |
| `internal/fanout/` | 366 | Upstream `internal/fanoutconsumer` (collector module). |
| `internal/componentstatus/` | 16 | Upstream `component/componentstatus.ReportStatus` (same free-function shape). |
| `internal/runtime/lifecycle/` | 505 | Per-receiver package-local `lifecycle.go` siblings, already ported during the PR-B1 wave (#184/#185/#186/#187/#194/#196/#197); the in-tree helper had no remaining non-test consumer after PR-F.1 + the wave-2 upstream-port PRs. `kernelevents/lifecycle.go` was inherited from k8sevents (#208). |

## Pre-flight grep evidence

```
$ grep -rnE 'tracecoreai/tracecore/internal/(pipeline|consumer|pipelinebuilder|config|fanout|componentstatus|runtime/lifecycle)' --include='*.go' .
(zero matches)
```

## Tooling

- `.golangci.yml` `ignore-interface-regexps` repointed at upstream
`consumer.{Metrics,Traces,Logs}` + `component.Component`. The
in-tree-only same-package-error-wrap exemption stays — the STYLE rule
applies regardless of which interface is forwarded.
- `.github/workflows/chaos.yml` drops the `chaos-pipeline-test` job (the
in-tree `internal/pipeline/chaos_test.go` is gone; upstream `service`
provides the equivalent panic-recovery contract). `harness-determinism`
(failure-inject golden-SHA), `cpu-steal-mpstat`, `pattern-pod-evicted`
jobs preserved.
- `.github/workflows/install-bench.yml` drops the
`internal/{pipeline,runtime,selftelemetry}/**` path-filter rows.
- `go.mod` / `go.sum` unchanged.

## Doc sweep

- `CHANGELOG.md` Unreleased: PR-F.2 landed entry replacing the "PR-F.2
deferred" sentence; "Remaining v0.1.0 work" line updated; one dead
`internal/pipeline/README.md` link in Foundation block rewritten as
"deleted at v0.1.0".
- `docs/rfcs/0013-distro-first-pivot.md` §7 deletion table: both
pipeline-internals and runtime/lifecycle rows updated from "v0.1.0
(audit first…)" / "v0.2.0 (with last consumer)" to "v0.1.0 (landed
PR-F.2)". §migration step 8 reframed.
- `docs/FAILURE-MODES.md` Lifecycle / Data flow / Shutdown timing /
Backend tables rewired from in-tree
`internal/{config,pipeline,fanout}/*_test.go::TestName` pointers to
upstream-delegated wording matching the pattern PR-A2 established.
- `docs/STRATEGY.md` "Post-RFC-0013 status" intro updated; "Stable
interfaces in `internal/pipeline/`" graduation row rewritten to point at
the upstream surface.
- `docs/migration/v0.1-to-v0.2.md` `internal/*` section status banner
flipped from "deferred, still present in RC builds" to "landed, deleted
in v0.2.0 builds".
- `MILESTONES.md` v0.1.0 deletions row extended with boot-path
internals; M1 + M4b + M19 rubric details annotated with the PR-F.2
retirement.
- `README.md` Contributor row repointed at upstream
`go.opentelemetry.io/collector` package docs.
- `AGENTS.md` "Self-telemetry internals" bullet split into "Self-tel
internals" + "Pipeline / boot-path internals" with explicit deletion
status.
- `docs/README.md` table row for `internal/pipeline/README.md` dropped.
- `components/receivers/kernelevents/README.md` lifecycle-sibling
rationale updated to past-tense.
- `tools/failure-inject/README.md` "Testing locally" section drops the
`-tags=chaos ./internal/pipeline/...` invocation.

## Sequencing

This PR is hard-gated on every upstream-port PR landing first:

- #201 nccl_fr (PR-B2)
- #202 stdoutexporter
- #203 pyspy
- #204 k8sevents
- #205 clockreceiver (PR-B3)
- #207 otlphttp
- #208 kernelevents
- #209 containerstdout
- #206 PR-F.1 (selftel / telemetry / dcgm)

All nine merged before this PR opened; this is the moat-deletion payoff.
Remaining v0.1.0 work is PR-K (chart-default flip + `clockreceiver` +
`stdoutexporter` + remaining receiver source deletions, coupled with
test-fixture migration and the `telemetry:` values-key deprecation
cycle).

## Test plan

- [x] `make check` — golangci-lint 0 issues, go vet clean, go mod verify
ok.
- [x] `go build ./...` — clean.
- [x] `go test -count=1 ./...` — green (excluding the known
`kernelevents/TestReceiver_SLIBudget` flake called out in #205's body,
which only triggers under heavy parallel `go test ./...` load; passes
standalone).
- [x] `grep` confirms zero non-internal callers of the deleted packages.
- [x] Doc-check pre-push hook passes after the CHANGELOG dead-link fix.

```release-notes
[CHANGE] internal/{pipeline,pipelinebuilder,config,consumer,fanout,componentstatus,runtime/lifecycle} packages deleted. The OCB-generated boot path off builder-config.yaml replaces them. Third-party importers of internal/* (unlikely pre-1.0; the packages live under internal/ and the Go compiler rejects external imports) lose the pipeline-assembly + lifecycle + config-loader surfaces; receiver authors now wire against upstream go.opentelemetry.io/collector/{component,receiver,consumer,pipeline} directly. See docs/migration/v0.1-to-v0.2.md "internal/* package deletion".
```

---------

Signed-off-by: Tri Lam <tri@maydow.com>
Co-authored-by: Tri Lam <tri@maydow.com>