feat(pivot): port kernelevents off internal pipeline+consumer to upstream #208

Merged
trilamsr merged 1 commit into main from pr-kernelevents-upstream
May 31, 2026

@trilamsr

Summary

Ports components/receivers/kernelevents/ off the soon-to-be-deleted internal/{pipeline,consumer,runtime/lifecycle} packages onto upstream OTel collector v1.59.0 / v0.153.0. Mirrors PR-B2 (#201) for the multi-source kernelevents shape — receiver-scoped lifecycle helper keeps its Add() method (one goroutine per source under one WaitGroup); only the logger type flips from *slog.Logger to *zap.Logger. Net delta: +317 / -296 LOC across 28 files in components/receivers/kernelevents/ plus go.mod (promotes componenttest, pipeline from indirect to direct).

Root cause

RFC-0013 PR-F deletes internal/pipeline, internal/consumer, and internal/runtime/lifecycle. Every receiver must consume upstream types directly. PR #187 already pulled the receiver-local lifecycle + selftel siblings into this package (preempting the internal-lifecycle dep) but kept *slog.Logger and the internal/pipeline factory shape. This PR completes the swap.

Type-swap applied

| internal | upstream |
|---|---|
| `internal/pipeline.Type` | `component.Type` |
| `internal/pipeline.ReceiverFactory` | `receiver.Factory` (via `receiver.NewFactory(...)`) |
| `internal/pipeline.CreateSettings` | `receiver.Settings` (via `receivertest.NewNopSettings`) |
| `internal/pipeline.Config` | `component.Config` |
| `internal/pipeline.Receiver` | `receiver.Logs` (compile-time `var _ receiver.Logs = (*kernelEventsReceiver)(nil)`) |
| `internal/pipeline.Host` | `component.Host` (via `componenttest.NewNopHost`) |
| `internal/pipeline.ComponentState` mixin | dropped (`Started()`/`Stopped()` accessors were unused) |
| `internal/pipeline.ErrSignalNotSupported` | `pipeline.ErrSignalNotSupported` (auto-surfaced by `receiver.NewFactory` default) |
| `internal/pipeline/pipelinetest.New(t)` | local `testSettings(t)` wrapper + `componenttest.NewNopHost` |
| `internal/consumer.{Logs,Capabilities,Metrics,Traces}` | upstream `consumer.*` |
| `internal/pipeline.TelemetrySettings.{Logger,MeterProvider,Resource}` | `receiver.Settings.{Logger,MeterProvider,Resource}` (`TelemetrySettings` embedded, no `.Telemetry` indirection) |
| `*slog.Logger` everywhere (lifecycle, source, kmsg, journald, factory, kernelevents.go, all tests) | `*zap.Logger` |
| `slog.Default()` | `zap.NewNop()` |

Behavior changes (deliberate)

  • Shutdown surfaces lifecycle ctx-deadline error: previously _ = r.lc.Shutdown(ctx) silently swallowed any deadline-exceeded; now wrapped + returned (kernelevents lifecycle shutdown: %w). Matches PR-B2 nccl_fr discipline. TestReceiver_ShutdownIsIdempotent still passes (second Shutdown short-circuits to the first call's stashed error, which is nil on success).
  • Factory shape: package-var Factory removed; NewFactory() returns a fresh receiver.Factory per call (matches upstream-contrib pattern + PR-B2 nccl_fr). Stability pinned at component.StabilityLevelBeta.
  • No ComponentState mixin: Started() / Stopped() accessors had no callers; dropping the embed removes the only stateful field the receiver carried for upstream interface satisfaction.

Behavior preserved

  • Multi-source Add() lifecycle contract — TOCTOU-safe under the closed mutex guard; TestKernelevents_Lifecycle_ConcurrentAddDuringShutdown_NoPanic passes under -race (the only mode it exercises; flaky without -race as documented in the test docstring).
  • Per-source isolation (each source owns its OWN *lifecycle; receiver hosts the driver goroutine).
  • Sibling-isolation invariant + cancel cascade (Start-ctx cancellation reaches every source without an explicit Shutdown).
  • Self-telemetry: metric names + label shape unchanged (tracecore.receiver.errors_total{kind,component_id} and siblings); instrumentation scope = receiver's Go import path.
  • Schema URL (kernelEventsSchemaURL), attribute set, severity mapping, partial-source mode, regex / facility / min-severity filters, kmsg bufio.ErrTooLong → kindKmsgOversized discrimination, journald restart-with-backoff + slow-recovery loop.

Test plan

  • make check → 0 issues.
  • go test -race ./components/receivers/kernelevents/... — pass.
  • go test ./... — pass for kernelevents; two unrelated pre-existing race-window flakes (TestK8sevents_Lifecycle_ConcurrentAddDuringShutdown_NoPanic, TestServer_HealthzReturns503DuringShutdown) both pass under -race.
  • Concurrent-Add stress test passes under -race (first try; the test is -race-only by design).
  • Compile-time assertion var _ receiver.Logs = (*kernelEventsReceiver)(nil) guards against upstream interface drift.
  • go.mod tidies cleanly with componenttest + pipeline promoted indirect → direct.

Out of scope

  • internal/sli import in bench_test.go (SLI summary publisher, not deleted by PR-F).
  • internal/pipeline/pipelinetest deletion (covered by sibling PRs).
  • Source-template .go.example parser-gate test — comment refreshed; the template's package-kernelevents private-type usage means the parse-only gate is unchanged.
Release notes: NONE

Mirrors PR-B2 (#201) for the multi-source kernelevents receiver. Swap
internal/{pipeline,consumer,runtime/lifecycle} → upstream
go.opentelemetry.io/collector/{component,consumer,receiver,receiver/receivertest}
+ component/componenttest; flip *slog.Logger → *zap.Logger across
factory + lifecycle + sources + tests. Receiver-scoped lifecycle
helper keeps Add() (multi-source contract) — only the logger type
flips. Factory shape moves from package-var ReceiverFactory to
upstream NewFactory() builder with WithLogs(stability=beta). Shutdown
now surfaces lifecycle ctx-deadline error instead of silently swallowing
(matches nccl_fr discipline). receivertest.NewNopSettings replaces the
internal pipelinetest.New fixture; componenttest.NewNopHost replaces
pipelinetest.NewHost.

Tests: in-package source tests use zap.NewNop; parser_test's warnOnce
gating asserted via zaptest/observer instead of slog text-handler
buffer. factory_test now exercises the upstream LogsStability + the
pipeline.ErrSignalNotSupported default surfaced by WithLogs.
TestKernelevents_Lifecycle_ConcurrentAddDuringShutdown_NoPanic passes
under -race (the only mode it's designed to exercise; flaky without
-race as documented in the test docstring).

Self-telemetry preserved (metric names + label shape unchanged).
Schema URL, attribute set, severity mapping, partial-source mode all
intact.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Signed-off-by: Tri Lam <tri@maydow.com>
@trilamsr trilamsr enabled auto-merge (squash) May 31, 2026 07:49
@trilamsr trilamsr merged commit 82df7c2 into main May 31, 2026
14 checks passed
@trilamsr trilamsr deleted the pr-kernelevents-upstream branch May 31, 2026 07:56
trilamsr added a commit that referenced this pull request May 31, 2026
## Summary

Cross-cut review cleanup of
[#205](#205) (clockreceiver
port off internal/pipeline). Two convention drifts caught when reviewing
the merged PR against peer receivers; both are pure realignment, no
behavior change.

## Root cause

#205 ported `clockreceiver` from `internal/pipeline+consumer` to the
upstream `receiver.Factory` shape. The port worked, but two
package-surface decisions diverged from every other component package in
the repo:

1. **`type Receiver struct` exported.** Every peer keeps the concrete
struct package-private:
   - `containerStdoutReceiver` (components/receivers/containerstdout)
   - `k8sEventsReceiver` (components/receivers/k8sevents)
   - `kernelEventsReceiver` (components/receivers/kernelevents)
   - `ncclfrReceiver` (components/receivers/nccl_fr)
   - `pyspyReceiver` (components/receivers/pyspy)

The factory is the only exported constructor — callers stitch via OCB,
never by importing the concrete struct. `git grep
clockreceiver.Receiver` returns zero external hits, confirming no caller
depended on the exported name. Restore convention.

2. **Local `type componenttest struct{}` test stub.** The fake
`component.Host` in `clockreceiver_test.go` shadowed the upstream
package `go.opentelemetry.io/collector/component/componenttest` — which
is already in `go.mod` at `v0.153.0` and is what every peer test uses
(`componenttest.NewNopHost()`). The local stub was invented for no
reason; peer ports
[#208](#208)
(kernelevents) and
[#209](#209) (containerstdout)
use the upstream helper directly.

Neither drift is a workaround for an upstream blocker — both are pure
convention misses caught post-merge. This PR is the root-cause fix.

## Changes

- Rename `Receiver` → `clockReceiver` across `clockreceiver.go`,
`factory.go` (comment), `selftel_test.go` (type-assertion).
- Drop local `componenttest{}` stub from `clockreceiver_test.go`;
replace usage with upstream `componenttest.NewNopHost()`.
- Add doc comment on `clockReceiver` explaining the package-private
convention so a future drift fails review.

Net diff: 4 files, +23/-23 LOC.

## Test plan

- [x] `make check` (fmt + tidy + lint + vet + mod-verify) — green
- [x] `go test -race ./components/receivers/clockreceiver/...` — green
- [x] Verified no external `clockreceiver.Receiver` callers exist (`grep
-rn 'clockreceiver\.Receiver' --include='*.go'` returns nothing)
- [x] Verified `componenttest.NewNopHost()` is the peer-receiver pattern
(containerstdout, kernelevents, nccl_fr, k8sevents)

```release-notes
NONE
```

Signed-off-by: Tri Lam <tri@maydow.com>
Co-authored-by: Tri Lam <tri@maydow.com>
trilamsr added a commit that referenced this pull request May 31, 2026
## Summary

Reconcile the four pivot-tracking docs
(`docs/rfcs/0013-distro-first-pivot.md`, `CHANGELOG.md`,
`MILESTONES.md`, `docs/migration/v0.1-to-v0.2.md`) with the wave-3
(PR-B1-shape sibling ports) and wave-4 (PR-B2-shape upstream-only ports
+ PR-F.1 + PR-J + PR-L + PR-N) landings. Pure doc sweep — no code or
config touched.

## What changed

### `docs/rfcs/0013-distro-first-pivot.md` §migration

PR sequence rows updated with PR-number citations and landed markers:

- **PR-A2** (landed, #189, 2026-05-30)
- **PR-B2** (landed, #201) — also enumerates sibling-receiver follow-ups
under PR-B2 to dispel the slug collision with #188's PR-B2-labelled dcgm
port: stdoutexporter (#202), pyspy (#203), kernelevents (#208),
containerstdout (#209)
- **PR-F.1** (landed) — fleshed-out delete list
(`internal/{selftelemetry,telemetry}` + `components/receivers/dcgm/` +
`pkg/dcgm/` + one orphan clockreceiver integration test)
- **PR-F.2** re-scoped — now deletes the whole
`internal/{componentstatus,pipeline,pipelinebuilder,consumer,fanout,runtime/lifecycle}`
bundle in one cut once the last three pipeline+consumer-importing
receivers land (#204 k8sevents, #205 clockreceiver, #207 otlphttp). Per
the import-graph state — `internal/componentstatus`'s only non-test
consumer is `internal/pipeline`, so they delete together
- **PR-G** (landed, #182), **PR-H** (landed, #183)
- **PR-I.1a** (in flight — scaffold agent), **PR-I.1b** (pre-staged;
gate satisfied by #201)
- **PR-J** (landed, #195) — kept existing marker
- **PR-K.1** (in flight — separate agent landing)
- **PR-L** (landed, skeleton #179 + body #191) — flagged as living
document
- **PR-N** (landed, #200) — shipped at v0.1.0 ahead of v0.3.0 as a
doc-only update at `docs/migration/v0.2-to-v0.3.md`

### `CHANGELOG.md` [Unreleased]

- Restructured the pivot wave list as **four waves** (was three). Wave 3
enumerates PR-B1-shape sibling ports + support infra (#180-#194/#196).
Wave 4 enumerates PR-B2-shape upstream-only ports + PR-J (#195) + PR-F.1
(#206) + PR-N (#200) + lint/TOCTOU hardening (#198/#210).
- Tightened the PR-F.2 deferred note to point at the three open ports
(#204/#205/#207) as the gate.

### `MILESTONES.md`

- **M1** (pipeline runtime) — status row now cites PR-A2 (#189), PR-F.1
(#206), PR-F.2 gate (#204/#205/#207), PR-E (#180), retains
`internal/config/` (still load-bearing for `tracecore validate`).
- **M2** (self-telemetry) — status row now cites PR-F.1 (#206); flags
`internal/componentstatus` as travelling with `internal/pipeline` in
PR-F.2.
- **M8** (DCGM receiver) — status flipped to *landed-and-replaced*:
cites PR-F.1 (#206) deletion + PR-J (#195)
`docs/integrations/prometheus-scrape.md` recipe. Notes the inert chart
toggle retention until PR-K.3.

### `docs/migration/v0.1-to-v0.2.md`

- §`internal/*` package deletion (PR-F) status flips from "not yet open"
to "PR-F.1 landed (#206), PR-F.2 gated on three open ports".
- Open-items checklist expanded from 5 to 13 entries — tracks every PR
letter the migration guide cares about (A2 / E / F.1 / F.2 / I.1a-c / J
/ K.1-3 / L / N) with PR numbers and links.

## Why now

Tracking docs accumulated drift across wave-3 + wave-4 because every
sibling-port PR (and the support-infra PRs around them) updated the
bottom of `CHANGELOG.md` but did not always touch the upstream
sequencing section in RFC-0013. Per memory rule `[Keeping this document
current]`: status drift is a review blocker. This PR is the consolidated
catch-up; future port PRs include their RFC-row flip in-PR.

## What this PR does NOT change

- No code, no config, no YAML, no chart — only the four tracking docs.
- No new doc gates added; existing gates pass.
- No files other than the four named docs are modified.

## Test plan

- [x] `bash scripts/doc-check.sh` clean (33 test refs, 528 links
resolve, comment-noise diff gate clean vs `origin/main`, all 13 gates
green).
- [x] Pre-commit hook (`commitlint` 72-char subject limit + DCO +
AI-trailer gates) passed.
- [x] Pre-push hook (`make ci-fast` equivalent: `golangci-lint`, `go
vet`, `go mod verify`, `no-autoupdate-check`, `doc-check.sh`) passed on
second attempt after `git fetch origin main` populated the worktree's
`origin/main` ref — first push failed because the worktree previously
tracked the (gone) `pr-a2-ocb-main-swap` branch, so `doc-check.sh`'s
comment-noise diff-scope gate exited 128 on the missing ref. Root cause
fixed by the fetch; not a workaround.
- [ ] CI green on this branch.

```release-notes
NONE
```

Signed-off-by: Tri Lam <tri@maydow.com>
Co-authored-by: Tri Lam <tri@maydow.com>
trilamsr added a commit that referenced this pull request May 31, 2026
## Summary

Deletes the seven `internal/*` packages that RFC-0013 §migration step 8
PR-F.2 promised once the upstream-port wave
(#201/#202/#203/#204/#205/#207/#208/#209) cleared every external caller
of the in-tree pipeline runtime.

**Net: -6,888 LOC across 56 deleted files, +80 LOC across 14 modified
files. 70 files total.** This is the final cut of RFC-0013 §migration
step 8 PR-F.

## What deletes

| Path | LOC | Replacement |
|---|---|---|
| `internal/pipeline/` | 4,134 | `go.opentelemetry.io/collector/service`
(OCB-generated `_build/main.go` consumes `builder-config.yaml`). |
| `internal/pipelinebuilder/` | 1,282 | Same — assembly is upstream
`service`. |
| `internal/config/` | 718 | Upstream `confmap` providers (`file`,
`yaml`, `env`). |
| `internal/consumer/` | 87 | Upstream
`go.opentelemetry.io/collector/consumer`. |
| `internal/fanout/` | 366 | Upstream `internal/fanoutconsumer`
(collector module). |
| `internal/componentstatus/` | 16 | Upstream
`component/componentstatus.ReportStatus` (same free-function shape). |
| `internal/runtime/lifecycle/` | 505 | Per-receiver package-local
`lifecycle.go` siblings — already ported during the PR-B1 wave
(#184/#185/#186/#187/#194/#196/#197); the in-tree helper had no
remaining non-test consumer after PR-F.1 + the wave-2 upstream-port PRs.
`kernelevents/lifecycle.go` was inherited from k8sevents (#208). |

## Pre-flight grep evidence

```
$ grep -rn 'tracecoreai/tracecore/internal/(pipeline|consumer|pipelinebuilder|config|fanout|componentstatus|runtime/lifecycle)' --include='*.go' .
(zero matches)
```

## Tooling

- `.golangci.yml` `ignore-interface-regexps` repointed at upstream
`consumer.{Metrics,Traces,Logs}` + `component.Component`. The
in-tree-only same-package-error-wrap exemption stays — the STYLE rule
applies regardless of which interface is forwarded.
- `.github/workflows/chaos.yml` drops the `chaos-pipeline-test` job (the
in-tree `internal/pipeline/chaos_test.go` is gone; upstream `service`
provides the equivalent panic-recovery contract). `harness-determinism`
(failure-inject golden-SHA), `cpu-steal-mpstat`, `pattern-pod-evicted`
jobs preserved.
- `.github/workflows/install-bench.yml` drops the
`internal/{pipeline,runtime,selftelemetry}/**` path-filter rows.
- `go.mod` / `go.sum` unchanged.

## Doc sweep

- `CHANGELOG.md` Unreleased: PR-F.2 landed entry replacing the "PR-F.2
deferred" sentence; "Remaining v0.1.0 work" line updated; one dead
`internal/pipeline/README.md` link in Foundation block rewritten as
"deleted at v0.1.0".
- `docs/rfcs/0013-distro-first-pivot.md` §7 deletion table: both
pipeline-internals and runtime/lifecycle rows updated from "v0.1.0
(audit first…)" / "v0.2.0 (with last consumer)" to "v0.1.0 (landed
PR-F.2)". §migration step 8 reframed.
- `docs/FAILURE-MODES.md` Lifecycle / Data flow / Shutdown timing /
Backend tables rewired from in-tree
`internal/{config,pipeline,fanout}/*_test.go::TestName` pointers to
upstream-delegated wording matching the pattern PR-A2 established.
- `docs/STRATEGY.md` "Post-RFC-0013 status" intro updated; "Stable
interfaces in `internal/pipeline/`" graduation row rewritten to point at
the upstream surface.
- `docs/migration/v0.1-to-v0.2.md` `internal/*` section status banner
flipped from "deferred, still present in RC builds" to "landed, deleted
in v0.2.0 builds".
- `MILESTONES.md` v0.1.0 deletions row extended with boot-path
internals; M1 + M4b + M19 rubric details annotated with the PR-F.2
retirement.
- `README.md` Contributor row repointed at upstream
`go.opentelemetry.io/collector` package docs.
- `AGENTS.md` "Self-telemetry internals" bullet split into "Self-tel
internals" + "Pipeline / boot-path internals" with explicit deletion
status.
- `docs/README.md` table row for `internal/pipeline/README.md` dropped.
- `components/receivers/kernelevents/README.md` lifecycle-sibling
rationale updated to past-tense.
- `tools/failure-inject/README.md` "Testing locally" section drops the
`-tags=chaos ./internal/pipeline/...` invocation.

## Sequencing

This PR is hard-gated on every upstream-port PR landing first:

- #201 nccl_fr (PR-B2)
- #202 stdoutexporter
- #203 pyspy
- #204 k8sevents
- #205 clockreceiver (PR-B3)
- #207 otlphttp
- #208 kernelevents
- #209 containerstdout
- #206 PR-F.1 (selftel / telemetry / dcgm)

All nine merged before this PR opened; this is the moat-deletion payoff.
Remaining v0.1.0 work is PR-K (chart-default flip + `clockreceiver` +
`stdoutexporter` + remaining receiver source deletions, coupled with
test-fixture migration and the `telemetry:` values-key deprecation
cycle).

## Test plan

- [x] `make check` — golangci-lint 0 issues, go vet clean, go mod verify
ok.
- [x] `go build ./...` — clean.
- [x] `go test -count=1 ./...` — green (excluding the known
`kernelevents/TestReceiver_SLIBudget` flake called out in #205's body,
which only triggers under heavy parallel `go test ./...` load; passes
standalone).
- [x] `grep` confirms zero non-internal callers of the deleted packages.
- [x] Doc-check pre-push hook passes after the CHANGELOG dead-link fix.

```release-notes
[CHANGE] internal/{pipeline,pipelinebuilder,config,consumer,fanout,componentstatus,runtime/lifecycle} packages deleted. The OCB-generated boot path off builder-config.yaml replaces them. Third-party importers of internal/* (unlikely pre-1.0; the packages live under internal/ and the Go compiler rejects external imports) lose the pipeline-assembly + lifecycle + config-loader surfaces; receiver authors now wire against upstream go.opentelemetry.io/collector/{component,receiver,consumer,pipeline} directly. See docs/migration/v0.1-to-v0.2.md "internal/* package deletion".
```

---------

Signed-off-by: Tri Lam <tri@maydow.com>
Co-authored-by: Tri Lam <tri@maydow.com>