Skip to content

feat(pivot): port otlphttp off internal/selftelemetry (PR-F unblock)#193

Merged
trilamsr merged 3 commits into
mainfrom
pr-otlphttp-selftel-port
May 31, 2026
Merged

feat(pivot): port otlphttp off internal/selftelemetry (PR-F unblock)#193
trilamsr merged 3 commits into
mainfrom
pr-otlphttp-selftel-port

Conversation

@trilamsr

Copy link
Copy Markdown
Contributor

Summary

Sibling-port the otlphttp exporter off internal/selftelemetry so RFC-0013 PR-F can delete the internal package. Mirrors the stdoutexporter precedent landed in #186.

  • components/exporters/otlphttp/selftel.go owns the package-local kind, selfExporter, newSelfExporter, and recordInitError (no internal/selftelemetry import).
  • Metric contract preserved: tracecore.exporter.calls_total{result, kind, component_id} with the same three kind values the v0.1.x exporter emitted (marshal, io, downstream). init_errors_total ticks on register failure via the same factory fallback shape as stdoutexporter. Dashboards / alerts keyed on the counter do not regress.
  • Instrumentation scope name pins to the exporter's Go import path (github.com/tracecoreai/tracecore/components/exporters/otlphttp); when the exporter moves under module/ in PR-I, the scope name moves with it.

ExporterCarrier dropped (intentional gap)

The v0.1.x SelfExporter() selftelemetry.Exporter carrier is removed. There is no current production consumer of the carrier in this tree, and PR-F deletes the contract regardless. The runtime degrades to the documented "no per-exporter signal" mode; tracecore_exporter_failure_rate still appears in scrape via the SLO observable gauge (reports 0 with no readers). Documented at newSelfTelemetry in otlphttp.go. Matches the stdoutexporter precedent.

Root cause

The otlphttp exporter's internal/selftelemetry import was the load-bearing dependency blocking PR-F from deleting the internal package. Root-caused at the import itself, not a symptom — fixed by full sibling port (no shim, no compat layer).

Test plan

  • make check (golangci-lint, vet, mod verify) — clean.
  • go test ./components/exporters/otlphttp/... — all tests green (8 new sibling tests + 5 rewired classify tests + 22 pre-existing integration tests).
  • go build ./... — full tree compiles; no other consumers of otlphttp's selftelemetry surface.
  • New tests cover: noop hot-path safety; errNilMeterProvider returned (not silent noop); calls_total emission with all three kinds + component_id label; OTel scope name pinned to the exporter import path; init_errors_total shape with kind=exporter + reason=instrument_register; factory fallback when meter registration fails (synthetic failingExporterMP); factory fallback when MeterProvider is nil; sibling-types compile pin (any reintroduction of internal/selftelemetry here breaks compile).
NONE

Internal refactor — sibling-port of the otlphttp exporter's self-telemetry to a package-local surface. Metric names, label shape, and kind values preserved. The v0.1.x SelfExporter() (ExporterCarrier) handle is dropped — the runtime degrades to the documented "no per-exporter signal" mode; tracecore_exporter_failure_rate still appears via the SLO observable gauge. No operator-facing change.

Tri Lam and others added 3 commits May 30, 2026 23:04
Sibling-port the otlphttp exporter to a package-local self-telemetry
surface so PR-F can delete `internal/selftelemetry`. Mirrors the
stdoutexporter precedent (PR #186): selftel.go owns `kind`,
`selfExporter`, `newSelfExporter`, and `recordInitError`; the
instrumentation scope name pins to the exporter's Go import path.

Metric contract preserved: `tracecore.exporter.calls_total{result, kind,
component_id}` with the same three kind values (marshal, io, downstream)
the v0.1.x exporter emitted; dashboards / alerts keyed on the counter
do not regress. `init_errors_total` ticks on register failure via the
same factory fallback shape as stdoutexporter.

ExporterCarrier dropped — there is no current production consumer of
`SelfExporter()` in this tree and PR-F deletes the contract anyway. The
runtime degrades to the documented "no per-exporter signal" mode;
`tracecore.exporter.failure_rate` still appears via the SLO observable
gauge. Documented at newSelfTelemetry.

TDD red-then-green: 8 new sibling tests cover noop safety, nil-MP error,
calls_total emission with all 3 kinds, scope-name pin, init_errors
shape, factory fallback (failing meter), nil-MP factory path, and the
sibling-types compile pin. classify_internal_test.go rewired to the
package-local kind type. All new + updated + pre-existing tests green;
make check clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Tri Lam <tri@maydow.com>
Rename TestSelfTelemetry_* / TestFactory_* / TestSelfExporter_* /
TestRecordInitError_* tests in selftel_test.go + factory_test.go to
TestOtlphttp_* form so test output unambiguously identifies the
package under test. Doc comments above each test updated in lockstep.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Signed-off-by: Tri Lam <tri@maydow.com>
@trilamsr trilamsr enabled auto-merge (squash) May 31, 2026 06:28
@trilamsr trilamsr merged commit ab46fe2 into main May 31, 2026
14 checks passed
@trilamsr trilamsr deleted the pr-otlphttp-selftel-port branch May 31, 2026 06:34
trilamsr added a commit that referenced this pull request May 31, 2026
)

## Summary

Four amendments to `docs/rfcs/0013-distro-first-pivot.md` (plus a
one-line sweep across 6 companion docs) per the scope-review findings
staged before PR-I.1 / PR-K / PR-M code work begins. Pre-stages each
decision in the RFC so the autonomous code PRs don't escalate
mid-flight.

## Root cause

#181 (RFC-0013 PR-I in-repo submodule rescope) was incomplete:

1. Sweep missed 6 companion docs still pointing at the original-design
external `tracecoreai/tracecore-components` repo.
2. §7 listed 3 GitHub workflows for deletion that were already removed
pre-RFC and 1 issue template (`component-bug-dcgm.yml`) that was already
removed pre-RFC.
3. PR-K was a single 4-receiver-delete-plus-chart-migration mega-PR with
no decoupling of the `internal/synthesis/patterns/` k8sevents dep break,
which is on PR-I.2's critical path.
4. PR-I.1 conflated the `module/go.mod` scaffolding with the `git mv` +
package rename, blocking PR-I.1a from landing without PR-B2 even though
the scaffolding step has no nccl_fr dep.

Mid-flight discovery during merge cycle: merged commit #188
(`feat(pivot): PR-B2 — port dcgm off internal selftel + lifecycle`)
reused the `PR-B2` slug for a PR-B1-shape dcgm port (which is moot since
dcgm is deleted entirely in PR-F), creating a naming collision against
the canonical PR-B2 defined in the RFC — the nccl_fr
`internal/{pipeline,consumer,runtime/lifecycle}` → upstream port that
hard-gates the PR-I.1b `git mv`.

## Amendments

1. **§6/§7 sweep miss (Amendment 1)**: Remove surviving
`tracecoreai/tracecore-components` external-repo references across
`docs/getting-started.md`, `docs/followups/M11.md`,
`docs/followups/M19.md`, `docs/FOLLOWUPS.md`,
`docs/rfcs/0003-pipeline-runtime-and-component-contract.md`,
`AGENTS.md`. All re-pointed at `github.com/tracecoreai/tracecore/module`
per RFC-0013 §6. Verified zero surviving stale refs.
2. **§7 nonexistent workflow entries (Amendment 2)**: Collapse
`pyspy-integration.yml`, `python-publish.yml`,
`kernelevents-integration.yml` deletion rows into one row marked
"already removed pre-RFC". `component-bug-dcgm.yml` also already
removed. Only `component-bug-kernelevents.yml` survives for PR-K. §4
v0.3.0 row + PR-M slug cleaned for consistency.
3. **§migration PR-K sub-slice (Amendment 3)**:
- **PR-K.1** — sever `internal/synthesis/patterns/` from
`components/receivers/k8sevents` via local model types in
`internal/synthesis/patterns/model.go`. No deletions. **Unblocks
PR-I.2.**
- **PR-K.2** — delete
`components/receivers/{clockreceiver,kernelevents,k8sevents,containerstdout}`
+ migrate ~86 test fixtures + delete `tools/failure-inject/xidgen/` +
keep `tools/failure-inject/ncclhang/`.
- **PR-K.3** — chart cleanup: flip `containerstdout-on-values.yaml` to
filelog+container-stanza, delete `containerstdout-rbac.yaml`, delete
`.github/ISSUE_TEMPLATE/component-bug-kernelevents.yml`, ship
`NOTES.txt` deprecation + values-key removal.
4. **§migration PR-I sub-slice + PR-B2 promotion (Amendment 4)**:
- **PR-B2** reframed as hard gate for PR-I.1b: port
`components/receivers/nccl_fr` off
`internal/{pipeline,consumer,runtime/lifecycle}` to upstream
`go.opentelemetry.io/collector/{component,receiver,consumer,pipeline}`.
Slug-collision note added re: merged #188.
- **PR-I.1a** — `module/go.mod` + root `go.work` + `builder-config.yaml`
`replaces:` skeleton. No file movement. Tag `module/v0.0.1` (genesis
tag, validates the tagging contract).
- **PR-I.1b** — `git mv components/receivers/nccl_fr →
module/receiver/ncclfrreceiver` + `git mv pkg/nccl/fr_parser →
module/pkg/nccl/fr_parser` + rename Go package `nccl_fr` →
`ncclfrreceiver` + update all importers. Hard-gated on PR-B2. No new
tag; next bump is `module/v0.1.0` at PR-I.2.
- **PR-I.2** — `rankjoinprocessor` + `patterndetectorprocessor` net-new.
Hard-gated on PR-K.1. Tag `module/v0.1.0` (first version pinned in
`builder-config.yaml` for v0.2.0).

Also: PR-J marked `(landed, #195)` with note that recipe docs landed but
chart-values compat map follows in PR-K.3.

## Adversarial review (5 lenses, inline)

- **(a) PR slug internal consistency**: PR-I.1b ↔ PR-B2 ↔ PR-I.2 ↔
PR-K.1 bidirectional gates all match. PR-J landed marker consistent with
#195. §4 v0.2.0 row has pre-existing drift (mentions dcgm+kueue in
v0.2.0 when PR-F+#168 already deleted them in v0.1.0) — out of these 4
amendments' scope; flag for follow-up.
- **(b) PR-B2 hard-gate naming**: tightened from "Hard gate for PR-I.1"
to "Hard gate for PR-I.1b" — accurate because PR-I.1a is
scaffolding-only with no file movement.
- **(c) Sub-PR numbering collision**: #188 explicitly addressed in
slug-collision note. #185/#186/#187/#193/#194/#196 are PR-B1-shape ports
for non-nccl receivers (no PR-slug label in their commits), no
collision.
- **(d) Stale external-repo refs**: `grep -rn
"tracecoreai/tracecore-components" docs/ AGENTS.md README.md` returns
zero hits post-amendment.
- **(e) Cross-reference link integrity**:
`docs/migration/v0.1-to-v0.2.md` references `#migration--rollout` and
`#3-customer-stable-telemetry-contracts`; both anchors preserved (§
headers unchanged). `make doc-check` confirms 526 markdown links
resolve.

## Test plan

- [x] `make doc-check` — 526 markdown links resolve, 0 stale refs,
banned-phrase lint clean, alert-check + chart-appversion gates green.
- [x] Pre-push hooks: golangci-lint clean, go vet clean, go mod verify
clean.
- [ ] CI doc-check + actionlint + zizmor gates pass on PR.

```release-notes
NONE
```

Signed-off-by: Tri Lam <tri@maydow.com>
Co-authored-by: Tri Lam <tri@maydow.com>
trilamsr added a commit that referenced this pull request May 31, 2026
#206)

## Summary

Deletes the three internal moats and the in-tree DCGM receiver that
RFC-0013 §migration step 8 promised — the payoff for the wave-3
sibling-port PRs (#184/#185/#186/#187/#188/#193/#194/#196/#197).

**Net: -12,482 LOC across 92 files (78 deletions, 14 modifications).**

### What deletes

| Path | LOC | Why safe now |
|---|---|---|
| `components/receivers/dcgm/` | 7,604 | cgo stub never shipped real
code; #188's PR-B2-shaped dcgm sweep already removed the live port
surface. |
| `pkg/dcgm/` | 922 | Only consumer was the deleted receiver. Bonus
cleanup. |
| `internal/selftelemetry/` | 1,946 | Every consumer (containerstdout,
clockreceiver, kernelevents, k8sevents, nccl_fr, dcgm, pyspy,
stdoutexporter, otlphttp) ported onto receiver/exporter-scoped sibling
`selftel.go` files. |
| `internal/telemetry/` | 1,991 | Probes flow through upstream
`healthcheckextension`; MeterProvider via upstream `service.telemetry`.
Only remaining consumers were `internal/selftelemetry/*_test.go`
(deleted together) + one orphan clockreceiver test. |
| `components/receivers/clockreceiver/errors_integration_test.go` | 100
| Orphan from #185's PR-B1 clockreceiver port — bootstrapped via the
deleted `selftelemetry.Receiver` interface but never migrated to the
receiver-scoped sibling `selftel.go`. Covered behaviour ("errors_total
surfaces on downstream failure") is now exercised through
clockreceiver's sibling tests. |

### Pre-flight grep evidence (post-merge of origin/main)

```
$ grep -rn "tracecoreai/tracecore/internal/selftelemetry" --include="*.go" .
(zero matches)

$ grep -rn "tracecoreai/tracecore/internal/telemetry" --include="*.go" .
(zero matches)

$ grep -rn "tracecoreai/tracecore/components/receivers/dcgm" --include="*.go" .
$ grep -rn "tracecoreai/tracecore/pkg/dcgm" --include="*.go" .
(zero matches)
```

### Tooling

- Retire the `dcgm` build tag — `make build-tags` no longer vets `-tags
dcgm` (kept as a hook for future build-tag-gated paths).
- `make bench-check` loop drops both deleted package rows
(`internal/telemetry`, `components/receivers/dcgm`).
- `scripts/register-lint.sh` allowlist emptied (the two
`internal/telemetry/{build_info,slo}.go` entries are gone with the
package; allowlist comment notes the post-PR-F.1 state).
- `go.mod` direct deps shrink — `github.com/prometheus/client_golang`
and `go.opentelemetry.io/otel/exporters/prometheus` drop to indirect
(they were used by `internal/telemetry/server.go`).

### Chart toggles intentionally retained

Chart `receivers.dcgm` toggle + `templates/NOTES.txt` warning +
`templates/_helpers.tpl` doc-comment list keep the `dcgm` symbol for the
migration window. The toggle has been inert since PR-A2 — operators
enabling `receivers.dcgm.enabled=true` already crashed at boot because
the OCB binary doesn't register the factory. PR-K removes the toggle
entirely alongside the chart-default flip from `clockreceiver` →
`hostmetrics` and the v0.2.0 recipe migration.

### Doc sweep

- `internal/runtime/lifecycle/lifecycle.go` doc-comment: drop the dcgm
pointer; flag containerstdout as the sole remaining in-tree consumer;
reschedule the package itself for PR-F.2 deletion once containerstdout
ports off the helper or PR-K.2 deletes the receiver.
- `docs/FAILURE-MODES.md` self-tel-surface rows rewired from
`internal/telemetry/server_test.go::*` (deleted) to upstream-delegated
wording.
- `docs/patterns/{README,pattern-{1,3,4,5}}.md` replay-test pointers
updated — the in-tree `components/receivers/dcgm/pattern_replay_test.go`
is gone; pattern replay now flows through
`docs/integrations/prometheus-scrape.md` (PR-J's upstream
`dcgm-exporter` recipe).
- `docs/README.md` per-component table: drop the deleted
`internal/telemetry/{README,SECURITY}.md` rows + the deleted
`components/receivers/dcgm/{README,RUNBOOK}.md` rows.
- `STYLE.md` vendor-SDK section: drop the `pkg/dcgm/` reference + the
`//go:build dcgm` example; explicit cross-reference to PR-F.1 in the
integration-test build-tag note.
- `CHANGELOG.md`: PR-F.1 landed entry under Unreleased; "Remaining
v0.1.0 work" line updated to point at PR-F.2.
- `docs/rfcs/0013-distro-first-pivot.md` §migration step 8: PR-F entry
replaced with the PR-F.1/PR-F.2 split + the explicit rationale
(componentstatus travels with pipeline; pipeline is out of PR-F's scope
per line 240's original framing).

### Out of scope (PR-F.2 follow-up)

- `internal/componentstatus/` — 5-line `ReportStatus` free function.
Travels with `internal/pipeline` (its only non-test consumers are
`internal/pipeline/runtime_test.go` +
`internal/pipeline/pipelinetest/fixture_test.go`). Deletion lands when
pipeline migrates to upstream
`go.opentelemetry.io/collector/component/componentstatus`.

### Rationale links

- RFC-0013 §migration step 8 — the PR-F entry now codifies the F.1/F.2
split in this branch's RFC update.
- PR-B2 scope-discovery (#188) — established the "rename + slim, don't
reshape" pattern for the dcgm sweep that retired the cgo path.
- Wave-3 PRs that unblocked selftelemetry deletion: #184 (nccl_fr), #185
(clockreceiver), #186 (kernelevents), #187 (stdoutexporter), #188
(dcgm), #193 (otlphttp), #194 (pyspy), #196 (k8sevents), #197
(containerstdout).

```release-notes
[CHANGE] internal/{selftelemetry,telemetry} packages deleted; components/receivers/dcgm + pkg/dcgm deleted. Operators using the v0.1.x in-tree `tracecore.*` self-telemetry metric names migrate per docs/migration/v0.1-to-v0.2.md. Third-party importers of internal/* (unlikely pre-1.0) lose the `selftelemetry.{Receiver,Exporter}` interfaces and the `telemetry.MeterProvider` wrapper; receiver/exporter authors now wire a receiver-scoped sibling `selftel.go` per the PR-B1 pattern.
```

## Test plan

- [x] `make verify` (lint + vet + tidy-check + mod-verify +
license-check + generate-fixtures-check + build-tags + nccl-fr-rce-gate
+ register-lint + actionlint + zizmor + doc-check + no-autoupdate-check)
— exit 0.
- [x] `go test ./...` — all green (29 packages).
- [x] `make build` (OCB) — `./_build/tracecore` produced.
- [x] `./_build/tracecore --version` — `tracecore version
0.1.0-m9-alpha`.
- [x] Pre-flight greps for all four deleted paths — zero external
importers.
- [ ] CI green on PR (linux/race matrix, chart render, install-bench,
zizmor, govulncheck).
- [ ] Operator verification that the chart's `dcgm` toggle remains inert
post-merge (no behaviour change from main — already inert since PR-A2).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: Tri Lam <tri@maydow.com>
Co-authored-by: Tri Lam <tri@maydow.com>
trilamsr added a commit that referenced this pull request May 31, 2026
…#207)

## Summary

Ports the in-tree `components/exporters/otlphttp` off
`internal/{pipeline,consumer}` to the upstream OTel collector exporter
API (v1.59.0 / v0.153.0). Mirrors the PR-B2 nccl_fr receiver port (#201)
on the exporter flavor.

This is RFC-0013 §migration cleanup: `internal/pipeline` and
`internal/consumer` are slated for deletion by PR-F. Until this PR,
otlphttp blocked that deletion. After this PR, OCB can stitch otlphttp
alongside upstream-contrib exporters without a translation layer.

## Type-swap surface

| Before (`internal/*`) | After (upstream) |
| ---------------------------------- |
----------------------------------------- |
| `pipeline.ExporterFactory` + `Factory` var |
`exporter.NewFactory(...)` constructor |
| `pipeline.CreateSettings` | `exporter.Settings` |
| `pipeline.BuildInfo` | `component.BuildInfo` |
| `pipeline.ID` | `component.ID` |
| `pipeline.ComponentState` (embed) | local `sync.Mutex + bool` for
idempotent `Shutdown` |
| `*slog.Logger` | `*zap.Logger` (upstream `TelemetrySettings.Logger`) |
| `pipelinetest.New(t).CreateSettings` |
`exportertest.NewNopSettings(componentType())` |
| `consumer.Capabilities` (internal) | `consumer.Capabilities` (upstream
— same name, new import) |

## Behavior preserved across the swap

- Retry semantics (429/502/503/504 + Retry-After integer / RFC 1123 /
clamping).
- Idempotent `Shutdown` that drains in-flight `Consume*` (`inflight`
WaitGroup pattern + TOCTOU-safe ordering).
- Self-telemetry metric names and label shape
(`tracecore.exporter.calls_total{result,kind,component_id}`,
`tracecore.selftelemetry.init_errors_total`).
- Sentinel-based `classifyKind` (`errPermanentStatus` /
`errRetryableStatus` → `kindDownstream`).
- Header canonicalization, gzip threshold, user-agent format.
- Stability level (`StabilityLevelBeta`) carried via
`WithMetrics/Traces/Logs`.

## Behavior intentionally dropped

None new — the per-exporter `ExporterCarrier` / `FailureRateReader`
plumbing was already removed by the earlier selftelemetry port (#193).
This PR is mechanical type-swap only.

## Compile-time guard

`factory.go` now pins `_ exporter.Metrics = (*otlpExporter)(nil)` (plus
Traces/Logs) so upstream shape changes surface at build time, not at OCB
stitch time.

## Test plan

- [x] `go test -race ./components/exporters/otlphttp/...` (green, 2.7s)
- [x] `go build ./...`
- [x] `make check` (lint + vet + tidy + doc-check + mod verify — all
green)
- [x] `zaptest/observer`-based assertion for
`signal=metrics/traces/logs` Stringer rendering (replaces the prior
`slog.NewTextHandler` byte-buffer assertion)
- [x] `failingExporterMP` synthetic-failure path still falls back to
noop telemetry and ticks `init_errors_total`
- [x] Existing `pipelinetest` callers in this package removed; no other
in-tree caller references the deleted `otlphttp.Factory` package var

```release-notes
NONE
```

Signed-off-by: Tri Lam <tri@maydow.com>
Co-authored-by: Tri Lam <tri@maydow.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant