
feat(pivot): port containerstdout off internal pipeline+consumer #209

Merged
trilamsr merged 1 commit into main from pr-containerstdout-upstream on May 31, 2026

Summary

  • Port components/receivers/containerstdout off the v0.1.x internal
    facades (internal/pipeline, internal/consumer) onto upstream
    go.opentelemetry.io/collector/{component,receiver,consumer} v1.59.0
    — the canonical types the OCB-generated _build/main.go already
    consumes for all third-party receivers.
  • Follow-up to PR-F #197 (the selftel + lifecycle port, which ported off internal/selftelemetry).
  • Factory is now receiver.NewFactory(componentType, createDefaultConfig, receiver.WithLogs(createLogs, component.StabilityLevelBeta)) instead of a hand-rolled struct
    implementing internal/pipeline.ReceiverFactory. Stability level
    (Beta) preserved across the swap so OCB-surfaced metadata doesn't
    regress. The Factory package-var + type factory struct{} are
    deleted; each OCB-stitched pipeline gets a freshly-built factory via
    containerstdout.NewFactory() (mirrors upstream-contrib
    otlpreceiver / filelogreceiver and the sibling nccl_fr port).
  • Receiver + noopReceiver no longer embed pipeline.ComponentState
    (upstream component.Component carries no equivalent
    Started/Stopped mixin; the runtime never read that bookkeeping on
    the upstream graph). Lifecycle bookkeeping the receiver actually
    needs lives in the in-package lifecycle helper added in PR-F #197.
  • Logger swapped from *slog.Logger → upstream's *zap.Logger (the
    type carried in receiver.Settings.Logger). All log call sites
    converted to zap.String/Int64/Float64/Bool/Error fields; log
    messages and field names are byte-for-byte preserved so operator
    alerting on log content does not regress. Internal lifecycle helper,
    tailer, informer, pipeline, and receiver all converted in lockstep.

Type-swap reference

Inherited verbatim from PR-B2 #201 — the canonical mapping for the
PR-F.2 receiver/exporter ports.

| Internal | Upstream |
|---|---|
| internal/pipeline.Type | component.Type |
| internal/pipeline.ReceiverFactory | receiver.Factory |
| internal/pipeline.CreateSettings | receiver.Settings (via receivertest.NewNopSettings in tests) |
| internal/pipeline.Config | component.Config |
| internal/pipeline.Receiver | receiver.Logs (= interface{ component.Component }) |
| internal/pipeline.Host | component.Host (via componenttest.NewNopHost in tests) |
| internal/pipeline.ID | component.ID |
| internal/consumer.Logs | consumer.Logs |
| *slog.Logger | *zap.Logger |
| internal/pipeline.MustNewType | component.MustNewType |
| internal/pipeline.MustNewID | component.NewIDWithName |

Hard gate

PR-I.1 (submodule extraction to module/receiver/containerstdoutreceiver/)
requires zero internal/* imports from the receiver package. This PR
clears it:

$ grep -rn 'internal/pipeline\|internal/consumer\|internal/runtime/lifecycle\|internal/selftelemetry' components/receivers/containerstdout/*.go
(no matches)

Comment-only historical references remain in factory.go,
receiver.go, noop_receiver.go, selftel.go, kind_test.go, and
selftel_test.go documenting the v0.1.x → v0.2.0 migration; they are
not imports.
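
The distinction between a comment that mentions a package and a real import can also be checked mechanically. The following is a minimal sketch, not the gate this PR uses (that is the grep above): `forbiddenImports` and the sample source are hypothetical names made up for illustration. It relies on the standard library's `go/parser` in `ImportsOnly` mode, so only import declarations are inspected.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"strconv"
	"strings"
)

// forbiddenImports is a hypothetical helper: it parses only the import
// declarations of src and returns every import path that contains one
// of the banned substrings. Mentions inside comments are ignored
// because they are not import declarations.
func forbiddenImports(src string, banned []string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "sample.go", src, parser.ImportsOnly)
	if err != nil {
		return nil, err
	}
	var hits []string
	for _, imp := range file.Imports {
		// Import paths are stored as quoted string literals.
		path, err := strconv.Unquote(imp.Path.Value)
		if err != nil {
			return nil, err
		}
		for _, b := range banned {
			if strings.Contains(path, b) {
				hits = append(hits, path)
				break
			}
		}
	}
	return hits, nil
}

func main() {
	// Illustrative source: the comment mentions internal/pipeline, but
	// the only real import is the upstream receiver package.
	src := `// Package sample was ported off internal/pipeline in v0.2.0.
package sample

import "go.opentelemetry.io/collector/receiver"

var _ receiver.Factory
`
	hits, err := forbiddenImports(src, []string{"internal/pipeline", "internal/consumer"})
	fmt.Println(len(hits), err)
}
```

A check like this would report the comment-only references listed above as clean, while a plain text search over the same files would still match them.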

Predecessor / scope

Together with the sibling receiver/exporter ports already merged
(clockreceiver / kernelevents / stdoutexporter / k8sevents / otlphttp /
pyspy / dcgm / nccl_fr), this is the last internal/{pipeline,consumer}
import site on components/receivers/containerstdout — once the other
sibling PR-F.2 ports land, RFC-0013 PR-F can delete the
internal/{pipeline,consumer} packages outright.

Test removals (intentional)

Three tests are removed because the upstream API makes them tautological:

  • TestFactory_CreateMetrics_Unsupported /
    TestFactory_CreateTraces_Unsupported: receiver.NewFactory(... WithLogs(...))
    returns a factory whose CreateMetrics / CreateTraces surface
    upstream's "signal not supported" error (pipeline.ErrSignalNotSupported)
    naturally; no hand-rolled sentinel to pin.
  • TestNewFactory_ReturnsPackageVarFactory — there is no longer a
    Factory package-var; NewFactory() constructs a fresh factory
    each call, mirroring upstream-contrib. Mirrors the PR-B2 nccl_fr
    removal.

Compatibility note

go.mod promotes go.opentelemetry.io/collector/component/componenttest
from indirect to direct (used by receiver_test.go +
pipeline_test.go for componenttest.NewNopHost()). No transitive-dep
churn beyond that — all go.opentelemetry.io/collector/{component,receiver,consumer} v1.59.0 + receiver/receivertest v0.153.0 were
already pinned by PR-B2.

Test plan

  • make check — gofumpt clean, golangci-lint 0 issues, vet clean,
    go mod verified.
  • go test -race ./components/receivers/containerstdout/... -count=10
    — all tests green under race across 10 runs, including stress runs
    of TestTailer_RotationStalledKind and the TOCTOU pin
    TestContainerstdout_Lifecycle_ConcurrentAddDuringShutdown_NoPanic.
  • go test ./... — full repo green except the pre-existing
    TestK8sevents_Lifecycle_ConcurrentAddDuringShutdown_NoPanic
    race-window flake (passes on retry; documented in PR-B2 #201 +
    PR-F #197 bodies).
  • TestContainerstdout_SelfTel_ScopeNameIsReceiverImportPath still
    pins the OTel scope name to the receiver's Go import path
    (github.com/tracecoreai/tracecore/components/receivers/containerstdout)
    so a regression back to the deleted internal/selftelemetry scope
    fails here.
NONE

Ports `components/receivers/containerstdout` off the v0.1.x internal
facades (`internal/pipeline`, `internal/consumer`) onto upstream
`go.opentelemetry.io/collector/{component,receiver,consumer}` v1.59.0
— follow-up to PR-F #197 (selftel + lifecycle port) and mirrors PR-B2
#201 (nccl_fr same swap). After this PR the receiver has zero
`internal/*` imports, clearing the PR-I.1 submodule-extraction gate.

Type-swap (per PR-B2 #201 reference table):

  internal/pipeline.Type             → component.Type
  internal/pipeline.ReceiverFactory  → receiver.Factory
  internal/pipeline.CreateSettings   → receiver.Settings
  internal/pipeline.Config           → component.Config (any)
  internal/pipeline.Receiver         → receiver.Logs
  internal/pipeline.Host             → component.Host
  internal/pipeline.ID               → component.ID
  internal/consumer.Logs             → consumer.Logs
  *slog.Logger                       → *zap.Logger

Factory is now `receiver.NewFactory(componentType, createDefaultConfig,
receiver.WithLogs(createLogs, component.StabilityLevelBeta))` —
stability level preserved across the swap so OCB-surfaced metadata
doesn't regress. The hand-rolled `Factory` package-var + `type factory
struct{}` are deleted; each OCB-stitched pipeline gets a freshly-built
factory via `containerstdout.NewFactory()`. The `CreateMetrics` /
`CreateTraces` sentinel methods are no longer needed —
receiver.NewFactory's default unimplemented behavior surfaces "signal
not supported" naturally, so the two sentinel tests are removed.

Receiver + noopReceiver no longer embed `pipeline.ComponentState`
(upstream `component.Component` carries no equivalent
Started/Stopped mixin; the runtime never read that bookkeeping on the
upstream graph). Lifecycle bookkeeping the receiver actually needs
lives in the in-package `lifecycle` helper added in PR-F #197.

Logger swapped from `*slog.Logger` → upstream's `*zap.Logger` (the
type carried in `receiver.Settings.Logger`). All log call sites
converted to `zap.String/Int64/Float64/Bool/Error` fields; log
messages and field names are byte-for-byte preserved so operator
alerting on log content does not regress. Internal lifecycle helper,
tailer, informer, pipeline, and receiver all converted in lockstep.

Tests swap `pipelinetest.New(t)` → `receivertest.NewNopSettings(componentType())`
+ `componenttest.NewNopHost()`. A package-local `testSettings()` helper
pins the ID to `containerstdout/test` so selftel label assertions
stay deterministic; mirrors the sibling PR-B2 nccl_fr pattern.

Hard gate (PR-I.1 submodule extraction):

  $ grep -rn 'internal/pipeline\|internal/consumer\|internal/runtime/lifecycle\|internal/selftelemetry' components/receivers/containerstdout/*.go
  (no matches)

Comment-only historical references remain in factory.go, receiver.go,
noop_receiver.go, selftel.go, kind_test.go, selftel_test.go documenting
the v0.1.x → v0.2.0 migration; they are not imports.

Compatibility note: `go.mod` promotes
`go.opentelemetry.io/collector/component/componenttest` from indirect
to direct (used by receiver_test.go + pipeline_test.go for
componenttest.NewNopHost()). No transitive-dep churn beyond that.

Test plan:
- `make check` — gofumpt clean, golangci-lint 0 issues, vet clean,
  go.mod verified
- `go test -race ./components/receivers/containerstdout/... -count=10`
  — all tests green under race, including stress runs of
  TestTailer_RotationStalledKind and
  TestContainerstdout_Lifecycle_ConcurrentAddDuringShutdown_NoPanic
- `go test ./...` — full repo green except the pre-existing
  TestK8sevents_Lifecycle_ConcurrentAddDuringShutdown_NoPanic
  race-window flake (passes on retry; documented in PR-B2 #201 +
  PR-F #197 bodies)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Signed-off-by: Tri Lam <tri@maydow.com>
@trilamsr trilamsr enabled auto-merge (squash) May 31, 2026 07:52
@trilamsr trilamsr merged commit 476d328 into main May 31, 2026
14 checks passed
@trilamsr trilamsr deleted the pr-containerstdout-upstream branch May 31, 2026 08:00
trilamsr added a commit that referenced this pull request May 31, 2026
…ts+k8sevents) (#210)

## Summary

Backport containerstdout's hardened TOCTOU race-window test pattern
(#197, #209) to `kernelevents` and `k8sevents`
`Lifecycle_ConcurrentAddDuringShutdown_NoPanic` tests. The pre-hardened
tests have a brittle race window that collapses on fast schedulers —
Shutdown wins universally, every Add no-ops via the closed-guard, and
the test passes vacuously while never exercising the TOCTOU path it
claims to cover. On slightly slower (or differently-loaded) CI machines,
the same brittleness flips and produces intermittent failures, blocking
unrelated PRs.

## Root cause

The lifecycle's mutex guard around `(closed-check, wg.Add(1))` is
correct in production code. The *test*, however, releases all 50 adders
+ the shutdowner from a single gate, then relies on `runtime.Gosched()`
alone to interleave them. On fast schedulers the shutdowner reaches
`lc.Shutdown(...)` before any adder reaches `lc.Add(...)`, so:

- Branch (a) "Add wins, registers under WaitGroup" never fires
- Branch (b) "Shutdown wins, Add no-ops" fires for all 50 adders
- Test passes but tests nothing meaningful
- On the flip side, whether the race window is straddled varies between
CI runs, which surfaces as a flake

Containerstdout's hardened equivalent inserts a `shutdownGate` channel
that holds the shutdowner back until 50µs after `release` fires,
deterministically straddling adders-in-flight with the Shutdown call.
This PR ports that pattern verbatim.
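
To make the guard and the gated test shape concrete, here is a minimal, standard-library-only sketch. It is an illustration under assumptions, not the repository's code: the `lifecycle` type, its `Add`/`Shutdown` signatures, and `raceOnce` are hypothetical stand-ins. Only the `(closed-check, wg.Add(1))` mutex guard, the shared `release` gate, the `shutdownGate` channel, and the 50µs delay come from the description above.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// lifecycle is a hypothetical stand-in for the per-receiver helper.
type lifecycle struct {
	mu     sync.Mutex
	closed bool
	wg     sync.WaitGroup
}

// Add runs fn on a tracked goroutine unless Shutdown has already run.
// The closed check and wg.Add(1) happen under one mutex hold, so
// Shutdown cannot slip in between them (the TOCTOU window).
func (l *lifecycle) Add(fn func()) bool {
	l.mu.Lock()
	if l.closed {
		l.mu.Unlock()
		return false // branch (b): Shutdown won, Add no-ops
	}
	l.wg.Add(1) // branch (a): Add won, registered under the WaitGroup
	l.mu.Unlock()
	go func() {
		defer l.wg.Done()
		fn()
	}()
	return true
}

// Shutdown marks the lifecycle closed, then waits for registered work.
func (l *lifecycle) Shutdown() {
	l.mu.Lock()
	l.closed = true
	l.mu.Unlock()
	l.wg.Wait()
}

// raceOnce releases all adders from one gate and holds the shutdowner
// behind shutdownGate until 50µs after release fires. It returns how
// many adders registered before Shutdown closed the lifecycle.
func raceOnce(adders int) int64 {
	var lc lifecycle
	var registered atomic.Int64
	release := make(chan struct{})
	shutdownGate := make(chan struct{})

	var callers sync.WaitGroup
	for i := 0; i < adders; i++ {
		callers.Add(1)
		go func() {
			defer callers.Done()
			<-release
			if lc.Add(func() {}) {
				registered.Add(1)
			}
		}()
	}
	callers.Add(1)
	go func() {
		defer callers.Done()
		<-shutdownGate
		lc.Shutdown()
	}()

	close(release)
	time.Sleep(50 * time.Microsecond)
	close(shutdownGate)
	callers.Wait()
	return registered.Load()
}

func main() {
	// The count varies from run to run; a value below 50 means at
	// least one adder took branch (b).
	fmt.Printf("registered %d of 50 adders\n", raceOnce(50))
}
```

The registered count is scheduler-dependent by design, so a test built on this shape would assert on invariants (no panic, count within bounds, count not zero) rather than on an exact number.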

## Changes

- `components/receivers/kernelevents/lifecycle_test.go`: add
`shutdownGate` chan + 50µs sleep + comment explaining intent
- `components/receivers/k8sevents/lifecycle_test.go`: same

Production code: unchanged.

## Verification

```
for i in $(seq 1 10); do go test -race -count=1 -run ConcurrentAddDuringShutdown ./components/receivers/kernelevents/...; done
# 10/10 PASS

for i in $(seq 1 10); do go test -race -count=1 -run ConcurrentAddDuringShutdown ./components/receivers/k8sevents/...; done
# 10/10 PASS
```

Branch-coverage check across 10 verbose iterations (registered count <50
== branch (b) exercised):

- kernelevents: registered counts 44, 47, 49, 50, 50, 47, 44, 50, 50, 50
→ branch (b) hit 5/10 iterations
- k8sevents: registered counts 49, 49, 50, 50, 50, 48, 50, 44, 48, 50 →
branch (b) hit 5/10 iterations

Both branches are exercised across the 10 iterations: branch (a) fires
in every iteration and branch (b) in half of them. The existing
`registeredCount == 0` guard in the test prevents the inverse-vacuous
regression (all-no-op).

Full repo: `make check` clean, `go test -race ./...` green.

## Motivation

Unblocks four in-flight PRs hitting this flake on CI:
- #203
- #204
- #205
- #207

## Reference

Containerstdout's port: #197 (original finding + fix), #209 (additional
hardening). Same pattern applied here with no behavioral divergence.

## Test plan

- [x] `make check` passes locally
- [x] `go test -race ./...` passes locally
- [x] 10-iter stress on `ConcurrentAddDuringShutdown` in both target
packages, 10/10 pass
- [x] Verified both TOCTOU branches still exercised (branch (a) every
iter; branch (b) 5/10 iters)
- [ ] CI green on this branch
- [ ] PRs #203, #204, #205, #207 re-run + CI green after merge

```release-notes
NONE
```

Signed-off-by: Tri Lam <tri@maydow.com>
Co-authored-by: Tri Lam <tri@maydow.com>
trilamsr added a commit that referenced this pull request May 31, 2026
## Summary

Cross-cut review cleanup of
[#205](#205) (clockreceiver
port off internal/pipeline). Two convention drifts caught when reviewing
the merged PR against peer receivers; both are pure realignment, no
behavior change.

## Root cause

#205 ported `clockreceiver` from `internal/pipeline+consumer` to the
upstream `receiver.Factory` shape. The port worked, but two
package-surface decisions diverged from every other component package in
the repo:

1. **`type Receiver struct` exported.** Every peer keeps the concrete
struct package-private:
   - `containerStdoutReceiver` (components/receivers/containerstdout)
   - `k8sEventsReceiver` (components/receivers/k8sevents)
   - `kernelEventsReceiver` (components/receivers/kernelevents)
   - `ncclfrReceiver` (components/receivers/nccl_fr)
   - `pyspyReceiver` (components/receivers/pyspy)

The factory is the only exported constructor — callers stitch via OCB,
never by importing the concrete struct. `git grep
clockreceiver.Receiver` returns zero external hits, confirming no caller
depended on the exported name. Restore convention.

2. **Local `type componenttest struct{}` test stub.** The fake
`component.Host` in `clockreceiver_test.go` shadowed the upstream
package `go.opentelemetry.io/collector/component/componenttest` — which
is already in `go.mod` at `v0.153.0` and is what every peer test uses
(`componenttest.NewNopHost()`). The local stub was invented for no
reason; peer ports
[#208](#208)
(kernelevents) and
[#209](#209) (containerstdout)
use the upstream helper directly.

Neither drift is a workaround for an upstream blocker — both are pure
convention misses caught post-merge. This PR is the root-cause fix.

## Changes

- Rename `Receiver` → `clockReceiver` across `clockreceiver.go`,
`factory.go` (comment), `selftel_test.go` (type-assertion).
- Drop local `componenttest{}` stub from `clockreceiver_test.go`;
replace usage with upstream `componenttest.NewNopHost()`.
- Add doc comment on `clockReceiver` explaining the package-private
convention so a future drift fails review.

Net diff: 4 files, +23/-23 LOC.

## Test plan

- [x] `make check` (fmt + tidy + lint + vet + mod-verify) — green
- [x] `go test -race ./components/receivers/clockreceiver/...` — green
- [x] Verified no external `clockreceiver.Receiver` callers exist (`grep
-rn 'clockreceiver\.Receiver' --include='*.go'` returns nothing)
- [x] Verified `componenttest.NewNopHost()` is the peer-receiver pattern
(containerstdout, kernelevents, nccl_fr, k8sevents)

```release-notes
NONE
```

Signed-off-by: Tri Lam <tri@maydow.com>
Co-authored-by: Tri Lam <tri@maydow.com>
trilamsr added a commit that referenced this pull request May 31, 2026
## Summary

Reconcile the four pivot-tracking docs
(`docs/rfcs/0013-distro-first-pivot.md`, `CHANGELOG.md`,
`MILESTONES.md`, `docs/migration/v0.1-to-v0.2.md`) with the wave-3
(PR-B1-shape sibling ports) and wave-4 (PR-B2-shape upstream-only ports
+ PR-F.1 + PR-J + PR-L + PR-N) landings. Pure doc sweep — no code or
config touched.

## What changed

### `docs/rfcs/0013-distro-first-pivot.md` §migration

PR sequence rows updated with PR-number citations and landed markers:

- **PR-A2** (landed, #189, 2026-05-30)
- **PR-B2** (landed, #201) — also enumerates sibling-receiver follow-ups
under PR-B2 to dispel the slug collision with #188's PR-B2-labelled dcgm
port: stdoutexporter (#202), pyspy (#203), kernelevents (#208),
containerstdout (#209)
- **PR-F.1** (landed) — fleshed-out delete list
(`internal/{selftelemetry,telemetry}` + `components/receivers/dcgm/` +
`pkg/dcgm/` + one orphan clockreceiver integration test)
- **PR-F.2** re-scoped — now deletes the whole
`internal/{componentstatus,pipeline,pipelinebuilder,consumer,fanout,runtime/lifecycle}`
bundle in one cut once the last three pipeline+consumer-importing
receivers land (#204 k8sevents, #205 clockreceiver, #207 otlphttp). Per
the import-graph state — `internal/componentstatus`'s only non-test
consumer is `internal/pipeline`, so they delete together
- **PR-G** (landed, #182), **PR-H** (landed, #183)
- **PR-I.1a** (in flight — scaffold agent), **PR-I.1b** (pre-staged;
gate satisfied by #201)
- **PR-J** (landed, #195) — kept existing marker
- **PR-K.1** (in flight — separate agent landing)
- **PR-L** (landed, skeleton #179 + body #191) — flagged as living
document
- **PR-N** (landed, #200) — shipped at v0.1.0 ahead of v0.3.0 as a
doc-only update at `docs/migration/v0.2-to-v0.3.md`

### `CHANGELOG.md` [Unreleased]

- Restructured the pivot wave list as **four waves** (was three). Wave 3
enumerates PR-B1-shape sibling ports + support infra (#180-#194/#196).
Wave 4 enumerates PR-B2-shape upstream-only ports + PR-J (#195) + PR-F.1
(#206) + PR-N (#200) + lint/TOCTOU hardening (#198/#210).
- Tightened the PR-F.2 deferred note to point at the three open ports
(#204/#205/#207) as the gate.

### `MILESTONES.md`

- **M1** (pipeline runtime) — status row now cites PR-A2 (#189), PR-F.1
(#206), PR-F.2 gate (#204/#205/#207), PR-E (#180), retains
`internal/config/` (still load-bearing for `tracecore validate`).
- **M2** (self-telemetry) — status row now cites PR-F.1 (#206); flags
`internal/componentstatus` as travelling with `internal/pipeline` in
PR-F.2.
- **M8** (DCGM receiver) — status flipped to *landed-and-replaced*:
cites PR-F.1 (#206) deletion + PR-J (#195)
`docs/integrations/prometheus-scrape.md` recipe. Notes the inert chart
toggle retention until PR-K.3.

### `docs/migration/v0.1-to-v0.2.md`

- §`internal/*` package deletion (PR-F) status flips from "not yet open"
to "PR-F.1 landed (#206), PR-F.2 gated on three open ports".
- Open-items checklist expanded from 5 to 13 entries — tracks every PR
letter the migration guide cares about (A2 / E / F.1 / F.2 / I.1a-c / J
/ K.1-3 / L / N) with PR numbers and links.

## Why now

Tracking docs accumulated drift across wave-3 + wave-4 because every
sibling-port PR (and the support-infra PRs around them) updated the
bottom of `CHANGELOG.md` but did not always touch the upstream
sequencing section in RFC-0013. Per memory rule `[Keeping this document
current]`: status drift is a review blocker. This PR is the consolidated
catch-up; future port PRs include their RFC-row flip in-PR.

## What this PR does NOT change

- No code, no config, no YAML, no chart — only the four tracking docs.
- No new doc gates added; existing gates pass.
- No files other than the four named docs are modified.

## Test plan

- [x] `bash scripts/doc-check.sh` clean (33 test refs, 528 links
resolve, comment-noise diff gate clean vs `origin/main`, all 13 gates
green).
- [x] Pre-commit hook (`commitlint` 72-char subject limit + DCO +
AI-trailer gates) passed.
- [x] Pre-push hook (`make ci-fast` equivalent: `golangci-lint`, `go
vet`, `go mod verify`, `no-autoupdate-check`, `doc-check.sh`) passed on
second attempt after `git fetch origin main` populated the worktree's
`origin/main` ref — first push failed because the worktree previously
tracked the (gone) `pr-a2-ocb-main-swap` branch, so `doc-check.sh`'s
comment-noise diff-scope gate exited 128 on the missing ref. Root cause
fixed by the fetch; not a workaround.
- [ ] CI green on this branch.

```release-notes
NONE
```

Signed-off-by: Tri Lam <tri@maydow.com>
Co-authored-by: Tri Lam <tri@maydow.com>
trilamsr added a commit that referenced this pull request May 31, 2026
## Summary

Deletes the seven `internal/*` packages that RFC-0013 §migration step 8
PR-F.2 promised once the upstream-port wave
(#201/#202/#203/#204/#205/#207/#208/#209) cleared every external caller
of the in-tree pipeline runtime.

**Net: -6,888 LOC across 56 deleted files, +80 LOC across 14 modified
files. 70 files total.** This is the final cut of RFC-0013 §migration
step 8 PR-F.

## What deletes

| Path | LOC | Replacement |
|---|---|---|
| `internal/pipeline/` | 4,134 | `go.opentelemetry.io/collector/service`
(OCB-generated `_build/main.go` consumes `builder-config.yaml`). |
| `internal/pipelinebuilder/` | 1,282 | Same — assembly is upstream
`service`. |
| `internal/config/` | 718 | Upstream `confmap` providers (`file`,
`yaml`, `env`). |
| `internal/consumer/` | 87 | Upstream
`go.opentelemetry.io/collector/consumer`. |
| `internal/fanout/` | 366 | Upstream `internal/fanoutconsumer`
(collector module). |
| `internal/componentstatus/` | 16 | Upstream
`component/componentstatus.ReportStatus` (same free-function shape). |
| `internal/runtime/lifecycle/` | 505 | Per-receiver package-local
`lifecycle.go` siblings — already ported during the PR-B1 wave
(#184/#185/#186/#187/#194/#196/#197); the in-tree helper had no
remaining non-test consumer after PR-F.1 + the wave-2 upstream-port PRs.
`kernelevents/lifecycle.go` was inherited from k8sevents (#208). |

## Pre-flight grep evidence

```
$ grep -rnE 'tracecoreai/tracecore/internal/(pipeline|consumer|pipelinebuilder|config|fanout|componentstatus|runtime/lifecycle)' --include='*.go' .
(zero matches)
```

## Tooling

- `.golangci.yml` `ignore-interface-regexps` repointed at upstream
`consumer.{Metrics,Traces,Logs}` + `component.Component`. The
in-tree-only same-package-error-wrap exemption stays — the STYLE rule
applies regardless of which interface is forwarded.
- `.github/workflows/chaos.yml` drops the `chaos-pipeline-test` job (the
in-tree `internal/pipeline/chaos_test.go` is gone; upstream `service`
provides the equivalent panic-recovery contract). `harness-determinism`
(failure-inject golden-SHA), `cpu-steal-mpstat`, `pattern-pod-evicted`
jobs preserved.
- `.github/workflows/install-bench.yml` drops the
`internal/{pipeline,runtime,selftelemetry}/**` path-filter rows.
- `go.mod` / `go.sum` unchanged.

## Doc sweep

- `CHANGELOG.md` Unreleased: PR-F.2 landed entry replacing the "PR-F.2
deferred" sentence; "Remaining v0.1.0 work" line updated; one dead
`internal/pipeline/README.md` link in Foundation block rewritten as
"deleted at v0.1.0".
- `docs/rfcs/0013-distro-first-pivot.md` §7 deletion table: both
pipeline-internals and runtime/lifecycle rows updated from "v0.1.0
(audit first…)" / "v0.2.0 (with last consumer)" to "v0.1.0 (landed
PR-F.2)". §migration step 8 reframed.
- `docs/FAILURE-MODES.md` Lifecycle / Data flow / Shutdown timing /
Backend tables rewired from in-tree
`internal/{config,pipeline,fanout}/*_test.go::TestName` pointers to
upstream-delegated wording matching the pattern PR-A2 established.
- `docs/STRATEGY.md` "Post-RFC-0013 status" intro updated; "Stable
interfaces in `internal/pipeline/`" graduation row rewritten to point at
the upstream surface.
- `docs/migration/v0.1-to-v0.2.md` `internal/*` section status banner
flipped from "deferred, still present in RC builds" to "landed, deleted
in v0.2.0 builds".
- `MILESTONES.md` v0.1.0 deletions row extended with boot-path
internals; M1 + M4b + M19 rubric details annotated with the PR-F.2
retirement.
- `README.md` Contributor row repointed at upstream
`go.opentelemetry.io/collector` package docs.
- `AGENTS.md` "Self-telemetry internals" bullet split into "Self-tel
internals" + "Pipeline / boot-path internals" with explicit deletion
status.
- `docs/README.md` table row for `internal/pipeline/README.md` dropped.
- `components/receivers/kernelevents/README.md` lifecycle-sibling
rationale updated to past-tense.
- `tools/failure-inject/README.md` "Testing locally" section drops the
`-tags=chaos ./internal/pipeline/...` invocation.

## Sequencing

This PR is hard-gated on every upstream-port PR landing first:

- #201 nccl_fr (PR-B2)
- #202 stdoutexporter
- #203 pyspy
- #204 k8sevents
- #205 clockreceiver (PR-B3)
- #207 otlphttp
- #208 kernelevents
- #209 containerstdout
- #206 PR-F.1 (selftel / telemetry / dcgm)

All nine merged before this PR opened; this is the moat-deletion payoff.
Remaining v0.1.0 work is PR-K (chart-default flip + `clockreceiver` +
`stdoutexporter` + remaining receiver source deletions, coupled with
test-fixture migration and the `telemetry:` values-key deprecation
cycle).

## Test plan

- [x] `make check` — golangci-lint 0 issues, go vet clean, go mod verify
ok.
- [x] `go build ./...` — clean.
- [x] `go test -count=1 ./...` — green (excluding the known
`kernelevents/TestReceiver_SLIBudget` flake called out in #205's body,
which only triggers under heavy parallel `go test ./...` load; passes
standalone).
- [x] `grep` confirms zero non-internal callers of the deleted packages.
- [x] Doc-check pre-push hook passes after the CHANGELOG dead-link fix.

```release-notes
[CHANGE] internal/{pipeline,pipelinebuilder,config,consumer,fanout,componentstatus,runtime/lifecycle} packages deleted. The OCB-generated boot path off builder-config.yaml replaces them. Third-party importers of internal/* (unlikely pre-1.0; the packages live under internal/ and the Go compiler rejects external imports) lose the pipeline-assembly + lifecycle + config-loader surfaces; receiver authors now wire against upstream go.opentelemetry.io/collector/{component,receiver,consumer,pipeline} directly. See docs/migration/v0.1-to-v0.2.md "internal/* package deletion".
```

---------

Signed-off-by: Tri Lam <tri@maydow.com>
Co-authored-by: Tri Lam <tri@maydow.com>
trilamsr added a commit that referenced this pull request Jun 2, 2026
## Summary

Closes the M3 carry-forward in `docs/MILESTONES.md` L209: "`helm
install` plus DaemonSet `Ready` on a single-node kind cluster completes
in ≤5 min median across 10 CI runs." The single-run ≤300s gate already
lives in `chart.yml`; this PR adds the missing 10-run rolling-median
aggregation layer.

**Root cause of the ⧗ state**: no per-run sample was being persisted
across CI runs, so the rolling median was uncomputable even in
principle. The single-run gate had nothing to roll up against. Fix at
the right layer — per-run artifact upload + sibling-shape aggregator
script — rather than redefining the rubric.

Sibling pattern: PR #446's `bench-cv-rolling` artifact pipeline. Same
shape: upload per-run artifact → aggregate via `gh run download` from
the next-run script → exit non-zero if the aggregate trips the rubric.

## Pieces

- **`.github/workflows/chart.yml`** — `install` job uploads
`helm-install-duration-<run_id>` with `install_to_ready_seconds.txt`
(single integer) + metadata. 90-day retention. `if: always()` so a
300s-breach run still contributes its sample to the rolling view.
- **`scripts/helm-install-rolling.sh`** — downloads last N=10 successful
main-branch chart.yml runs, computes median, fails if median > 300.
Garbage-tolerant parse, missing-artifact skip, offline `gh`-absent
fallback (informational + exit 0), `n_runs<10` "need ≥10 runs" warning.
- **`scripts/helm-install-rolling_test.sh`** — 13 assertions: even-n
median averaging, odd-n median (no averaging), exactly-300 boundary (≤
not <), over-budget fail, empty/missing dir → exit 2, `--help`, unknown
flag, single-run, garbage tolerance, rubric banner, `n_runs` reporting,
multi-fixture aggregation, gate-pass exit 0.
- **`make helm-install-rolling-report`** — operator entry point. Honours
`N=20 make helm-install-rolling-report`.
- **`docs/MILESTONES.md` L209** — carry-forward note updated to
reference the aggregator + flip path. Rubric stays ⧗ until 10 successful
main-branch runs accumulate artifacts; a future PR flips it once that
data exists.
- **`install/kubernetes/tracecore/README.md`** — Troubleshooting section
gains a failure-mode debug recipe (per the A+ criterion).
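
The median rule the aggregator applies can be sketched in a few lines. The actual implementation is the shell script described above; this Go version is only an illustration under assumptions, and `median`, `withinBudget`, and the sample durations are hypothetical. It shows the three behaviours the test list calls out: even-n averaging, odd-n middle element, and an inclusive 300s boundary.

```go
package main

import (
	"fmt"
	"sort"
)

// budgetSeconds is the rubric threshold: the median must be <= 300s.
const budgetSeconds = 300.0

// median returns the median of samples. For an even count it averages
// the two middle values; for an odd count it returns the middle value.
// ok is false when there are no samples.
func median(samples []float64) (m float64, ok bool) {
	if len(samples) == 0 {
		return 0, false
	}
	// Sort a copy so the caller's slice is left untouched.
	s := append([]float64(nil), samples...)
	sort.Float64s(s)
	mid := len(s) / 2
	if len(s)%2 == 1 {
		return s[mid], true
	}
	return (s[mid-1] + s[mid]) / 2, true
}

// withinBudget reports whether the median is within the budget. The
// comparison is inclusive, so a median of exactly 300 passes.
func withinBudget(samples []float64) (pass bool, ok bool) {
	m, ok := median(samples)
	if !ok {
		return false, false
	}
	return m <= budgetSeconds, true
}

func main() {
	// Made-up install-to-Ready durations (seconds) for ten runs.
	runs := []float64{190, 100, 180, 110, 170, 120, 160, 130, 150, 140}
	m, _ := median(runs)
	pass, _ := withinBudget(runs)
	fmt.Println(m, pass)
}
```

Using the median rather than the mean means a single slow CI runner does not trip the gate on its own, which fits the central-tendency framing of the rubric.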

## Why median, not CV

`bench-cv-rolling.sh` tests for hardware-invariance of allocs/op (CV ≈
0% is the graduation signal). `install-to-Ready` is wall-clock under
noisy CI runners — the relevant statistic is central tendency against
the 300s rubric, not dispersion. Matches `MILESTONES.md` wording
verbatim ("median across 10 CI runs").

## Verification

- `shellcheck scripts/helm-install-rolling.sh
scripts/helm-install-rolling_test.sh` → exit 0
- `actionlint .github/workflows/chart.yml` → exit 0
- `bash scripts/helm-install-rolling_test.sh` → 13/13 PASS
- Mutation tests:
1. Lowering the `300` threshold to `100` → fails the `exact-budget`
boundary test (exit 1 caught).
2. Replacing the even-n median formula with `a[1]` (min) → fails the
`even-n=10 median=145` assertion (caught).
- Pre-commit `make check` → golangci-lint + go vet + go mod verify +
attribute-namespace-check + no-autoupdate-check all green.

## Test plan

- [x] shellcheck both scripts → exit 0
- [x] actionlint `chart.yml` → exit 0
- [x] 13/13 shell-test assertions pass locally
- [x] Mutation-verified: threshold + median formula both produce failing
tests when mutated
- [x] `make helm-install-rolling-report` runs offline (no `gh` auth
needed) and prints informational fallback rather than crashing
- [ ] CI `chart.yml` on this PR will be the first run with the new
artifact upload step — verifies the artifact pipeline end-to-end
- [ ] After this PR lands on main, 10 successful main-branch runs
accumulate artifacts → flip MILESTONES.md L209 ⧗ → ☑ in a follow-up

## Self-grade: A+

- **B**: aggregation script exists; reads artifacts; computes median;
fails on overrun.
- **A**: above + wired into CI (per-run artifact upload in `chart.yml`);
MILESTONES.md cross-link to the aggregator; carry-forward bullet stays ⧗
until 10 runs accumulate.
- **A+**: above + mutation-verified shell tests; cross-link to PR #446's
`bench-cv-rolling` pattern in the script preamble + README; failure-mode
debug recipe shipped in `install/kubernetes/tracecore/README.md`.

```release-notes
- New `scripts/helm-install-rolling.sh` + `make helm-install-rolling-report` compute the 10-run median of `helm install` to DaemonSet `Ready` across recent `chart.yml` runs on main; drives the M3 carry-forward (`docs/MILESTONES.md` L209) graduation.
- `chart.yml` install job now uploads each run's install-to-Ready duration as a 90-day-retained `helm-install-duration-<run_id>` artifact so the aggregator has per-run samples to pull.
- Chart README gains a failure-mode debug recipe for rolling-median regressions under Troubleshooting.
```

Signed-off-by: Tri Lam <tree@lumalabs.ai>