feat(pivot): RFC-0013 namespace alignment — tracecore.* → otelcol.*#216

Merged
trilamsr merged 4 commits into main from pr-otelcol-namespace-alignment
May 31, 2026

trilamsr commented May 31, 2026

Summary

The four surviving in-tree components (nccl_fr, pyspy, otlphttp, stdoutexporter) each emit self-telemetry through their own per-component MeterProvider. This PR renames their instrument names from the v0.1.x tracecore.* family to the upstream otelcol_<role>_<component>_<metric> convention so the in-tree namespace does not collide with the OCB pipeline-runtime's own otelcol_* family.

Per RFC-0013 §migration v0.1.0 row 119: "self-tel metric rename tracecore.* → otelcol_*".

Scope reduced post-PR-K.2 / PR-F.2. The four legacy in-tree receivers originally listed for rename (clockreceiver, containerstdout, k8sevents, kernelevents) plus the in-tree boot-path internals were deleted from the binary in #217 and #215 while this PR was in CI. The namespace-alignment edits to those files were dropped during the post-#217 merge; the surviving net diff covers only the four components still in the binary.

Rename matrix

OTel-dot form (Prometheus scrape renders dots as underscores):

| v0.1.x instrument | post-rename instrument |
| --- | --- |
| `tracecore.receiver.errors_total` | `otelcol.receiver.<name>.errors_total` |
| `tracecore.receiver.emissions_total` | `otelcol.receiver.<name>.emissions_total` |
| `tracecore.receiver.collection_latency_seconds` | `otelcol.receiver.<name>.collection_latency_seconds` |
| `tracecore.receiver.degraded_seconds_total` | `otelcol.receiver.<name>.degraded_seconds_total` |
| `tracecore.receiver.last_activity_unix_seconds` | `otelcol.receiver.<name>.last_activity_unix_seconds` |
| `tracecore.exporter.calls_total` | `otelcol.exporter.<name>.calls_total` |
| `tracecore.selftelemetry.init_errors_total` | `otelcol.selftelemetry.init_errors_total` |

Where <name> is the OCB component name without underscores. Per-component substitutions:

| Component | `<name>` |
| --- | --- |
| `components/receivers/nccl_fr` | `ncclfr` (underscore stripped per RFC) |
| `components/receivers/pyspy` | `pyspy` |
| `components/exporters/otlphttp` | `otlphttp` |
| `components/exporters/stdoutexporter` | `stdoutexporter` |

Label shape is preserved. component_id still partitions per-instance; kind / result values are unchanged. Dashboards and alerts that filtered on labels need only the metric-name rename, not a label-selector rewrite.
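The rename matrix and the `<name>` substitution rule can be sketched as a small Go helper. This is an illustration only, not code from this PR: `renameInstrument` and `promName` are hypothetical names, and the dot-to-underscore step only approximates what a Prometheus scrape renders.

```go
package main

import (
	"fmt"
	"strings"
)

// renameInstrument maps a v0.1.x instrument name to its post-rename form.
// component is the in-tree directory name (for example "nccl_fr"); its
// underscores are stripped to form <name>. Hypothetical helper, not in the PR.
func renameInstrument(old, component string) (string, error) {
	const legacy = "tracecore."
	if !strings.HasPrefix(old, legacy) {
		return "", fmt.Errorf("not a v0.1.x instrument: %q", old)
	}
	rest := strings.TrimPrefix(old, legacy) // e.g. "receiver.errors_total"
	role, metric, ok := strings.Cut(rest, ".")
	if !ok {
		return "", fmt.Errorf("malformed instrument: %q", old)
	}
	// Per the matrix, selftelemetry instruments carry no <name> segment.
	if role == "selftelemetry" {
		return "otelcol." + rest, nil
	}
	name := strings.ReplaceAll(component, "_", "")
	return "otelcol." + role + "." + name + "." + metric, nil
}

// promName approximates the scraped form: dots become underscores.
func promName(otelName string) string {
	return strings.ReplaceAll(otelName, ".", "_")
}

func main() {
	got, err := renameInstrument("tracecore.receiver.errors_total", "nccl_fr")
	if err != nil {
		panic(err)
	}
	fmt.Println(got)           // otelcol.receiver.ncclfr.errors_total
	fmt.Println(promName(got)) // otelcol_receiver_ncclfr_errors_total
}
```

Labels are deliberately absent from the sketch: only the metric name changes, so a selector such as `{component_id="..."}` carries over untouched.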

What changed

  • Four selftel.go files (otlphttp, stdoutexporter, nccl_fr, pyspy) — instrument literals updated; header comments rewritten to document the new convention + label-preservation invariant.
  • Four selftel_test.go files: findInstrument(..., "...") and scopeOf(..., "...") assertions updated; receiverInstrumentPrefix / exporterInstrumentPrefix constants (used by the failingReceiverMP / failingExporterMP synthetic-failure seams) bumped to the new prefix so a future drift back to tracecore.* would fail at compile time on the prefix mismatch.
  • docs/examples/prometheus-alerts.example.yaml — rewritten as receiver-agnostic starter using regex matchers ({__name__=~"otelcol_receiver_.*_errors_total"}) so a new in-tree receiver inherits coverage on first scrape. Removed the tracecore_exporter_failure_rate / tracecore_build_info rules — those v0.1.x gauges no longer exist post-RFC-0013.
  • docs/migration/v0.1-to-v0.2.md — new "In-tree receiver / exporter namespace alignment (RFC-0013 v0.1.0)" section listing only the four surviving in-tree components; points operators at the existing "Orphan in-tree components" table for deleted-receiver migration.
  • NOTE on ExporterCarrier removal blocks in otlphttp.go + stdoutexporter.go collapsed — they referenced cmd/tracecore.collect.collectFailureRateReaders (deleted in PR-A2) and internal/selftelemetry (deleted in PR-F.1). New comments point at the PromQL recipe rate(otelcol_exporter_<name>_calls_total{result="error"}[5m]).
  • CHANGELOG: [Unreleased] entry added documenting the alignment.
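To see why the regex matcher in the alerts starter is receiver-agnostic, the sketch below replays it in Go against a few scraped metric names. It assumes the names appear in underscore form, and it adds `^` and `$` explicitly because Prometheus anchors regex matchers at both ends while Go's `regexp` does not. `matchesReceiverErrors` is a hypothetical helper, not part of this PR.

```go
package main

import (
	"fmt"
	"regexp"
)

// receiverErrors mirrors {__name__=~"otelcol_receiver_.*_errors_total"}.
// Prometheus fully anchors regex matchers, so the anchors are spelled out here.
var receiverErrors = regexp.MustCompile(`^(?:otelcol_receiver_.*_errors_total)$`)

// matchesReceiverErrors reports whether a scraped metric name would be
// selected by the receiver-agnostic matcher.
func matchesReceiverErrors(metric string) bool {
	return receiverErrors.MatchString(metric)
}

func main() {
	names := []string{
		"otelcol_receiver_ncclfr_errors_total",  // surviving receiver
		"otelcol_receiver_pyspy_errors_total",   // surviving receiver
		"otelcol_exporter_otlphttp_calls_total", // exporter metric
		"tracecore_receiver_errors_total",       // v0.1.x name
	}
	for _, n := range names {
		fmt.Printf("%-40s %v\n", n, matchesReceiverErrors(n))
	}
}
```

The first two names match and the last two do not. Any future receiver that follows the `otelcol_receiver_<name>_errors_total` shape would match without a rule change, which is the coverage property the starter file relies on.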

Test plan

  • go vet ./... clean (root + module/ submodule)
  • go test ./... all green (root module; module/ submodule has no test files)
  • make check (fmt + tidy-check + lint + vet + mod-verify) green
  • make doc-check green
  • TestSelfTelemetry_* assertions pass for every surviving package
  • TestFactory_FallsBackToNoopWhenMeterFails synthetic-register-failure seam still trips on the renamed prefix (failingReceiverMP / failingExporterMP)

Merge resolution (post-#215, #217)

Conflicts surfaced as "deleted by them" across 20 files in the four deleted receivers. Resolution: accept all upstream deletions; the namespace-alignment edits to those files were moot because the files themselves no longer exist on main. No content edits required to the surviving four components from the merge — their namespace-alignment commits remain intact.

**Breaking (pre-1.0)**: in-tree component self-telemetry metric names renamed from `tracecore.*` to `otelcol.<role>.<name>.*` per RFC-0013 §migration v0.1.0 namespace alignment. Affects the four surviving in-tree components (`nccl_fr`, `pyspy`, `otlphttp`, `stdoutexporter`); the four legacy in-tree receivers (`clockreceiver`, `containerstdout`, `k8sevents`, `kernelevents`) were already deleted from the binary in #215 / #217 and migrate via the upstream-receiver replacements documented in the "Orphan in-tree components" table in `docs/migration/v0.1-to-v0.2.md`. Label shape preserved (`component_id`, `kind`, `result` unchanged). Operators with dashboards / alerts on the v0.1.x names rename per the table in `docs/migration/v0.1-to-v0.2.md` under "In-tree receiver / exporter namespace alignment".

Tri Lam added 2 commits May 31, 2026 02:19
The eight surviving in-tree components (clockreceiver, containerstdout,
k8sevents, kernelevents, nccl_fr, pyspy, otlphttp, stdoutexporter) each
emit self-telemetry through their own per-component MeterProvider.
Instrument names now match the upstream
`otelcol_<role>_<component>_<metric>` convention so the in-tree
namespace does not collide with the OCB pipeline-runtime's own
`otelcol_*` family.

Rename matrix (OTel-dot form; Prometheus scrape renders as underscores):
  tracecore.receiver.errors_total                      → otelcol.receiver.<name>.errors_total
  tracecore.receiver.emissions_total                   → otelcol.receiver.<name>.emissions_total
  tracecore.receiver.collection_latency_seconds        → otelcol.receiver.<name>.collection_latency_seconds
  tracecore.receiver.degraded_seconds_total            → otelcol.receiver.<name>.degraded_seconds_total
  tracecore.receiver.last_activity_unix_seconds        → otelcol.receiver.<name>.last_activity_unix_seconds
  tracecore.exporter.calls_total                       → otelcol.exporter.<name>.calls_total
  tracecore.selftelemetry.init_errors_total            → otelcol.selftelemetry.init_errors_total

Where <name> is the OCB component name without underscores (nccl_fr →
ncclfr; all others unchanged). Label shape is preserved — component_id
still partitions per-instance, kind/result values unchanged.

Updates eight selftel.go + eight selftel_test.go files (instrument
literals + scope-name asserts + receiverInstrumentPrefix /
exporterInstrumentPrefix constants in the failingMP synthetic-failure
seam). Updates three per-receiver prometheus-alerts.example.yaml
(containerstdout / k8sevents / kernelevents) + the receiver-agnostic
docs/examples/prometheus-alerts.example.yaml against the new
namespace. The kernelevents prometheus_alerts_test.go expectedMetrics
array tracks the rename so a typo in an alert still fails the gate.

Migration guide picks up a new "In-tree receiver / exporter namespace
alignment" section with the full rename table, per-component <name>
substitutions, and a PromQL diff recipe. Stale code-comments
referencing the v0.1.x prom-underscore names updated across
containerstdout/{attribution,cursor,informer,ratelimit}.go,
k8sevents/{receiver,selftel,node_failure_modes_test}.go,
kernelevents/source.go, clockreceiver/clockreceiver.go, and the
otlphttp/stdoutexporter `NOTE on ExporterCarrier removal` blocks
(which now point at the post-RFC-0013 PromQL recipe instead of the
deleted SLO observable gauge).

Test plan:
- `go vet ./components/...` clean
- `go test -short ./components/...` all green
- TestSelfTelemetry_EmitsErrorsTotal_WithKindAndComponentID + the
  TestSelfTelemetry_ScopeNameIsReceiverImportPath / ScopeNameIsExporterImportPath
  variants pass for every package
- TestFactory_FallsBackToNoopWhenMeterFails synthetic-register-failure
  seam still trips on the renamed prefix (failingReceiverMP /
  failingExporterMP)
- kernelevents TestPrometheusAlerts_ReferenceWiredCounters cross-checks
  the alert YAML against the post-rename instrument list

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Signed-off-by: Tri Lam <tri@maydow.com>
…alignment

# Conflicts:
#	components/exporters/otlphttp/otlphttp.go
#	components/exporters/otlphttp/selftel_test.go
#	components/receivers/clockreceiver/clockreceiver.go
Tri Lam added 2 commits May 31, 2026 10:20
…alignment

# Conflicts:
#	components/receivers/clockreceiver/clockreceiver.go
#	components/receivers/clockreceiver/selftel.go
#	components/receivers/clockreceiver/selftel_test.go
#	components/receivers/containerstdout/attribution.go
#	components/receivers/containerstdout/cursor.go
#	components/receivers/containerstdout/informer.go
#	components/receivers/containerstdout/prometheus-alerts.example.yaml
#	components/receivers/containerstdout/ratelimit.go
#	components/receivers/containerstdout/selftel.go
#	components/receivers/containerstdout/selftel_test.go
#	components/receivers/k8sevents/node_failure_modes_test.go
#	components/receivers/k8sevents/prometheus-alerts.example.yaml
#	components/receivers/k8sevents/receiver.go
#	components/receivers/k8sevents/selftel.go
#	components/receivers/k8sevents/selftel_test.go
#	components/receivers/kernelevents/prometheus-alerts.example.yaml
#	components/receivers/kernelevents/prometheus_alerts_test.go
#	components/receivers/kernelevents/selftel.go
#	components/receivers/kernelevents/selftel_test.go
#	components/receivers/kernelevents/source.go
PR-K.2 (#217) and PR-F.2 (#215) deleted clockreceiver, containerstdout,
k8sevents, kernelevents from the binary; the namespace-alignment section
in v0.1-to-v0.2.md was authored before those merges landed and still
listed all eight. Trim the per-component substitution table to the four
surviving in-tree components (nccl_fr, pyspy, otlphttp, stdoutexporter)
and point operators at the existing "Orphan in-tree components" table
for the deleted-receiver migration path. Refresh the PromQL example
to use otlphttp (still present) instead of containerstdout (deleted).

Signed-off-by: Tri Lam <tri@maydow.com>
trilamsr enabled auto-merge (squash) May 31, 2026 17:26
trilamsr merged commit 8becd8e into main May 31, 2026
15 checks passed
trilamsr deleted the pr-otelcol-namespace-alignment branch May 31, 2026 17:32