Merged
2 changes: 2 additions & 0 deletions CHANGELOG.md
Expand Up @@ -13,6 +13,8 @@ Pivot landed across four waves of PRs:
- Wave 4 — PR-B2-shape sibling ports (mechanical import swap to upstream `go.opentelemetry.io/collector/{component,receiver,consumer,pipeline}`; lands together with #195 PR-J recipes, #199 RFC §migration amendment for PR-I/K sub-slicing, #200 PR-N security-posture migration, #198 lint concurrency fix, #210 TOCTOU race-window test hardening): #195 PR-J four upstream-receiver recipes; #197 PR-F precursor containerstdout off selftel+lc; #198 golangci-lint stale-PID fix; #199 RFC §migration PR-I/K amendment + PR-B2 gate; #200 PR-N pyspy capability-surface guide; #201 PR-B2 nccl_fr upstream (canonical); #202 stdoutexporter upstream; #203 pyspy upstream; #204 k8sevents upstream; #205 PR-B3 clockreceiver upstream; #206 PR-F.1 (delete `internal/{selftelemetry,telemetry}` + `components/receivers/dcgm/` + `pkg/dcgm/`); #207 otlphttp upstream; #208 kernelevents upstream; #209 containerstdout upstream; #210 lifecycle TOCTOU concurrent-Add race hardening (kernelevents + k8sevents); #211 PR-K.1 sever patterns lib + replay runner from k8sevents; #212 wave-3/4 docs sweep; #213 clockreceiver convention restore.
- Wave 5 — receiver-source deletions (this PR + follow-ups): PR-K.2 (this PR) deletes `components/receivers/{clockreceiver,kernelevents,k8sevents,containerstdout}/` + `tools/failure-inject/xidgen/` + the `containerstdout-on-values.yaml` chart fixture; PR-K.3 (next) clears the chart values keys + DaemonSet template refs + `NOTES.txt` deprecation warnings. PR-F.2's `internal/{componentstatus,pipeline,pipelinebuilder,consumer,fanout,runtime/lifecycle}` cut is now gated only on the surviving consumers (`nccl_fr`, `pyspy`, `otlphttp`, `stdoutexporter`, `internal/pipeline/chaos_test.go`) landing on upstream-only.

**RFC-0013 namespace alignment landed: in-tree component self-telemetry renamed `tracecore.*` → `otelcol.<role>.<name>.*`.** The eight surviving in-tree components (`clockreceiver`, `containerstdout`, `k8sevents`, `kernelevents`, `nccl_fr`, `pyspy`, `otlphttp`, `stdoutexporter`) each emit through their own per-component MeterProvider; instrument names now match the upstream `otelcol_<role>_<component>_<metric>` convention (e.g. `otelcol_receiver_containerstdout_errors_total`, `otelcol_exporter_otlphttp_calls_total`). Label shape preserved (`component_id`, `kind`, `result` unchanged). Per-component scope name unchanged (still the Go import path) — the receiver-scoped meter cannot collide with the OCB pipeline-runtime's own `otelcol_*` namespace. Three `prometheus-alerts.example.yaml` files (containerstdout, k8sevents, kernelevents) + `docs/examples/prometheus-alerts.example.yaml` rewritten against the new namespace. Migration table added to `docs/migration/v0.1-to-v0.2.md` under "In-tree receiver / exporter namespace alignment" with the rename matrix, per-component `<name>` substitutions, and a PromQL diff recipe.

**PR-F.1 landed: `components/receivers/dcgm/` + `pkg/dcgm/` + `internal/selftelemetry/` + `internal/telemetry/` deleted; one orphan clockreceiver integration test deleted.** Net deletion across the four moats RFC-0013 §migration step 8 promised. Deletes:
- `components/receivers/dcgm/` + `pkg/dcgm/` — cgo stub never shipped real code; live ports removed in #188's PR-B2-shaped dcgm sweep; kueue + kineto already deleted in #168.
- `internal/selftelemetry/` — every consumer (containerstdout, clockreceiver, kernelevents, k8sevents, nccl_fr, dcgm, pyspy, stdoutexporter, otlphttp) ported onto receiver/exporter-scoped sibling `selftel.go` files in wave-3 of the pivot (#184/#185/#186/#187/#188/#193/#194/#196/#197). The 5-method `selftelemetry.Receiver` and 1-method `selftelemetry.Exporter` interfaces (and the `Kind` canonical-set enum) leave the tree.
Expand Down
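The naming convention described in the CHANGELOG entry above can be sketched as follows. This is a minimal illustration, not code from the PR: `instrumentName` and `promName` are hypothetical helpers, and the dot-to-underscore step is a simplification of what a Prometheus scrape does (the real exporter also sanitizes other characters and may append unit suffixes).

```go
package main

import (
	"fmt"
	"strings"
)

// instrumentName builds the dotted OTel instrument name
// otelcol.<role>.<component>.<metric>.
// Hypothetical helper for illustration; it is not part of the PR.
func instrumentName(role, component, metric string) string {
	return strings.Join([]string{"otelcol", role, component, metric}, ".")
}

// promName approximates how a Prometheus scrape renders a dotted
// instrument name: every dot becomes an underscore. Simplified on
// purpose; other sanitization rules are ignored here.
func promName(otelName string) string {
	return strings.ReplaceAll(otelName, ".", "_")
}

func main() {
	name := instrumentName("exporter", "otlphttp", "calls_total")
	fmt.Println(name)           // otelcol.exporter.otlphttp.calls_total
	fmt.Println(promName(name)) // otelcol_exporter_otlphttp_calls_total
}
```

Note that `otelcol.selftelemetry.init_errors_total` does not carry a component segment, so a helper like this would only apply to the per-component instruments.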
4 changes: 2 additions & 2 deletions components/exporters/otlphttp/README.md
Expand Up @@ -60,7 +60,7 @@ Operator-supplied `headers` are sent verbatim on every outgoing request; useful

## Self-telemetry labels

The exporter increments `tracecore.exporter.calls_total{result, kind, component_id}` on every Consume*. The `kind` values emitted by otlphttp are exporter-local low-cardinality strings declared in [`selftel.go`](selftel.go) (sibling-scoped, package-local — see RFC-0013 §migration PR-B1; the `internal/selftelemetry` canonical set is being deleted in PR-F):
The exporter increments `otelcol.exporter.otlphttp.calls_total{result, kind, component_id}` on every Consume* (Prometheus scrape renders this as `otelcol_exporter_otlphttp_calls_total`). The `kind` values emitted by otlphttp are exporter-local low-cardinality strings declared in [`selftel.go`](selftel.go) (sibling-scoped, package-local — see RFC-0013 §migration v0.1.0 namespace alignment; the `internal/selftelemetry` canonical set was deleted in PR-F.1):

| `kind` | When | Operator first-step |
|---|---|---|
Expand All @@ -75,7 +75,7 @@ Operator dashboards split by `kind` to triage:

## Partial-success responses

A 200 OK response with a populated `partial_success` body is treated as a full success in v0.1.0; the response body is NOT decoded. The OTLP spec marks client handling of `partial_success` as OPTIONAL. Operators who need rejected-count reporting can scrape `tracecore.exporter.calls_total{result=success}` and compare against upstream input counts via their backend.
A 200 OK response with a populated `partial_success` body is treated as a full success in v0.1.0; the response body is NOT decoded. The OTLP spec marks client handling of `partial_success` as OPTIONAL. Operators who need rejected-count reporting can scrape `otelcol_exporter_otlphttp_calls_total{result="success"}` and compare against upstream input counts via their backend.
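
The behavior described above (a 200 counts as a full success and the body is never inspected) might look roughly like the sketch below. The `classify` function and `outcome` type are hypothetical names, and the retryable set (429, 502, 503, 504) is an assumption taken from the OTLP/HTTP specification; the exporter's own `isRetryableStatus` may differ.

```go
package main

import (
	"fmt"
	"net/http"
)

// outcome is a hypothetical result classification for one HTTP attempt.
type outcome string

const (
	outcomeSuccess outcome = "success"
	outcomeRetry   outcome = "retry"
	outcomeFail    outcome = "error"
)

// classify maps a response status to an outcome. The body parameter is
// deliberately ignored: a populated partial_success payload on a 200
// still counts as a full success in this sketch.
func classify(status int, _ []byte) outcome {
	switch status {
	case http.StatusOK:
		return outcomeSuccess
	case http.StatusTooManyRequests, http.StatusBadGateway,
		http.StatusServiceUnavailable, http.StatusGatewayTimeout:
		// Assumed retryable set; not confirmed by the source.
		return outcomeRetry
	default:
		return outcomeFail
	}
}

func main() {
	// Illustrative payload only; it is never decoded.
	body := []byte(`{"partialSuccess":{"rejectedSpans":"3"}}`)
	fmt.Println(classify(http.StatusOK, body)) // success
}
```

Because rejected items are invisible to the counter, any rejected-count reporting has to come from comparing backend-side counts, as the paragraph above notes.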

## Signals supported

Expand Down
35 changes: 14 additions & 21 deletions components/exporters/otlphttp/otlphttp.go
Expand Up @@ -312,34 +312,27 @@ func buildUserAgent(bi component.BuildInfo) string {
// newSelfTelemetry wires the per-exporter self-telemetry handle.
// Returns a no-op when the MeterProvider is absent or instrument
// registration fails; the register-failure path also ticks
// `tracecore.selftelemetry.init_errors_total` via recordInitError so
// operators can alert on > 0. Mirrors the nccl_fr sibling — same wire
// shape, no internal/selftelemetry import.
// `otelcol.selftelemetry.init_errors_total` via recordInitError so
// operators can alert on > 0. Mirrors the stdoutexporter sibling —
// same wire shape, no internal/selftelemetry import.
//
// NOTE on ExporterCarrier removal:
//
// v0.1.x otlphttp exposed `SelfExporter() selftelemetry.Exporter` so
// the runtime's reader-collection path could feed
// `tracecore.exporter.failure_rate`. The PR-B1 sibling port dropped
// the `selftelemetry.ExporterCarrier` implementation:
// `tracecore.exporter.failure_rate`. RFC-0013 PR-A2 deleted the
// `cmd/tracecore` hand-wired entry point and PR-F.1 deleted
// `internal/selftelemetry` entirely — the carrier surface has no
// remaining consumer.
//
// - The runtime path that consumed ExporterCarrier (`cmd/tracecore`
// in v0.1.x) silently skipped components that didn't implement
// it. There is no current production caller in this tree; the
// v0.1.x ConsumeCarrier was the only consumer and PR-F deletes it.
// - `tracecore_exporter_failure_rate` still appears in scrape via the
// SLO observable gauge (reports 0 with no readers registered).
// - `tracecore.exporter.calls_total{result,kind,component_id}`
// - `otelcol.exporter.otlphttp.calls_total{result,kind,component_id}`
// continues to surface because the sibling impl emits it on
// `set.MeterProvider` directly — dashboards / alerts keyed on the
// calls_total counter do not regress.
// - PR-F deletes `internal/selftelemetry` entirely, so the contract
// evaporates regardless. Removing now keeps the sibling
// import-graph clean and matches the stdoutexporter precedent.
//
// The per-exporter failure_rate gauge feed is the documented gap; the
// runtime degrades to the "no per-exporter signal" mode in line with
// the v0.1.x contract.
// calls_total counter can rate-derive failure via
// PromQL `rate(otelcol_exporter_otlphttp_calls_total{result="error"}[5m])`.
// - The per-exporter failure_rate gauge feed is intentionally
// dropped; the v0.1.x SLO observable gauge contract is replaced
// by the upstream OCB pipeline-runtime counters.
func newSelfTelemetry(ctx context.Context, set exporter.Settings, logger *zap.Logger) selfExporter {
if set.MeterProvider == nil {
logger.Warn("otlphttp: no MeterProvider; self-telemetry using noop")
Expand Down Expand Up @@ -604,7 +597,7 @@ func (e *otlpExporter) doOnce(ctx context.Context, endpoint string, body []byte,
// spec marks client handling as OPTIONAL. v1 treats all 200s
// as full success without parsing the body. Operators who
// want partial-success reporting can scrape the
// `tracecore.exporter.calls_total` counter, which still
// `otelcol.exporter.otlphttp.calls_total` counter, which still
// counts the 200 as a success here.
return false, retryHint{}, nil
case isRetryableStatus(resp.StatusCode):
Expand Down
38 changes: 19 additions & 19 deletions components/exporters/otlphttp/selftel.go
@@ -1,11 +1,14 @@
// SPDX-License-Identifier: Apache-2.0

// Exporter-scoped self-telemetry surface. Replaces the v0.1.x
// dependency on `internal/selftelemetry`, which is slated for deletion
// in RFC-0013 PR-F. Metric names + label shape are preserved
// (`tracecore.exporter.calls_total{result,kind,component_id}`) so
// dashboards / alerts don't regress. The instrumentation scope name is
// THIS exporter's Go import path — when the exporter moves under
// dependency on `internal/selftelemetry`. Metric names follow the
// upstream OTel collector `otelcol_<role>_<component>_<metric>`
// convention per RFC-0013 §migration v0.1.0 namespace alignment:
// `otelcol.exporter.otlphttp.calls_total{result,kind,component_id}`
// (Prometheus exporter renders the dots as underscores). Label shape
// is preserved (`component_id`) so multi-instance disambiguation in
// dashboards is unchanged from v0.1.x. The instrumentation scope name
// is THIS exporter's Go import path — when the exporter moves under
// `module/` in PR-I, the scope name moves with it, matching OTel
// convention.
//
Expand Down Expand Up @@ -69,16 +72,13 @@ var errNilMeterProvider = errors.New("otlphttp: MeterProvider is nil")
// removal in otlphttp.go).
//
// Why drop FailureRateReader / ExporterCarrier:
// - The runtime path that consumed ExporterCarrier (`cmd/tracecore`
// in v0.1.x) silently skipped components that didn't implement
// it — the documented "no per-exporter signal" degraded mode. The
// production-runtime caller has no current consumer in this tree.
// - `tracecore_exporter_failure_rate` still appears in scrape via the
// SLO observable gauge (reports 0 with no readers).
// - PR-F deletes `internal/selftelemetry` entirely, so any code that
// referenced ExporterCarrier here would have to be removed then
// anyway. Removing now keeps the sibling import-graph clean and
// matches the stdoutexporter precedent.
// - The runtime path that consumed ExporterCarrier
// (`cmd/tracecore.collect.collectFailureRateReaders` in v0.1.x)
// was deleted by RFC-0013 PR-A2 along with the hand-wired entry
// point, so the carrier interface has no remaining consumer.
// - `internal/selftelemetry` (which owned the carrier) was deleted
// by RFC-0013 PR-F.1. Operators rate-derive failure rate via
// PromQL `rate(otelcol_exporter_otlphttp_calls_total{result="error"}[5m])`.
type selfExporter interface {
IncCallSuccess()
IncCallFailure(k kind)
Expand All @@ -95,7 +95,7 @@ func (noopSelfExporter) IncCallFailure(kind) {}
var _ selfExporter = noopSelfExporter{}

// newSelfExporter returns a real selfExporter backed by an OTel counter
// `tracecore.exporter.calls_total{result, kind, component_id}` acquired
// `otelcol.exporter.otlphttp.calls_total{result, kind, component_id}` acquired
// from mp. The component's id is attached as the `component_id` label
// on every emission. The label shape is preserved from the v0.1.x
// internal selftelemetry package (only the metric name changed) so label-keyed dashboards / alerts don't
Expand All @@ -107,7 +107,7 @@ func newSelfExporter(id component.ID, mp metric.MeterProvider) (selfExporter, er
meter := mp.Meter(instrumentationScope)

calls, err := meter.Int64Counter(
"tracecore.exporter.calls_total",
"otelcol.exporter.otlphttp.calls_total",
metric.WithDescription("Exporter Consume* calls partitioned by result"),
)
if err != nil {
Expand Down Expand Up @@ -145,7 +145,7 @@ func (e *selfExporterImpl) IncCallFailure(k kind) {
))
}

// recordInitError ticks tracecore.selftelemetry.init_errors_total when
// recordInitError ticks otelcol.selftelemetry.init_errors_total when
// exporter wiring falls back to noop telemetry. Operators alert on
// `> 0` to learn that self-telemetry isn't really plugged in. Panics
// from a broken MeterProvider are swallowed — recordInitError IS the
Expand All @@ -158,7 +158,7 @@ func recordInitError(ctx context.Context, mp metric.MeterProvider, kindLabel, co
}
meter := mp.Meter(instrumentationScope)
c, err := meter.Int64Counter(
"tracecore.selftelemetry.init_errors_total",
"otelcol.selftelemetry.init_errors_total",
metric.WithDescription("Counter of self-telemetry construction failures that fell back to the noop implementation."),
)
if err != nil {
Expand Down
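The noop-fallback contract that `selftel.go` documents can be sketched without the OTel SDK. The version below is a minimal stand-in under stated assumptions: an in-memory map replaces the real counter, `kind` is modeled as a plain string, and `wire` and `countingSelfExporter` are hypothetical names rather than identifiers from the PR.

```go
package main

import (
	"errors"
	"fmt"
)

// selfExporter mirrors the two-method surface shown in the diff,
// with kind simplified to a string for this sketch.
type selfExporter interface {
	IncCallSuccess()
	IncCallFailure(kind string)
}

// noopSelfExporter is the safe fallback: every call does nothing.
type noopSelfExporter struct{}

func (noopSelfExporter) IncCallSuccess()       {}
func (noopSelfExporter) IncCallFailure(string) {}

// countingSelfExporter is an in-memory stand-in for the OTel counter.
type countingSelfExporter struct {
	calls map[string]int // keyed by "success" or "error/<kind>"
}

func (c *countingSelfExporter) IncCallSuccess()            { c.calls["success"]++ }
func (c *countingSelfExporter) IncCallFailure(kind string) { c.calls["error/"+kind]++ }

var errNilBackend = errors.New("sketch: backend is nil")

// newSelfExporter fails when no backend is available, analogous to a
// nil MeterProvider or a failed instrument registration.
func newSelfExporter(backend map[string]int) (selfExporter, error) {
	if backend == nil {
		return nil, errNilBackend
	}
	return &countingSelfExporter{calls: backend}, nil
}

// wire always returns a usable handle. On construction failure it
// bumps initErrors (the stand-in for init_errors_total) and falls
// back to the noop implementation.
func wire(backend map[string]int, initErrors *int) selfExporter {
	se, err := newSelfExporter(backend)
	if err != nil {
		*initErrors++
		return noopSelfExporter{}
	}
	return se
}

func main() {
	initErrors := 0
	se := wire(nil, &initErrors)
	se.IncCallFailure("io") // safe: the noop swallows the call
	fmt.Println(initErrors) // 1
}
```

The point of the pattern is that hot-path callers never need a nil check, while the init-error tick remains the only signal that telemetry is not really plugged in.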
32 changes: 16 additions & 16 deletions components/exporters/otlphttp/selftel_test.go
Expand Up @@ -43,7 +43,7 @@ func collectRM(t *testing.T, rdr *sdkmetric.ManualReader) metricdata.ResourceMet
}

// findInstrument returns the first metricdata.Metrics whose Name matches the
// supplied OTel-dot name (e.g. "tracecore.exporter.calls_total"). Returns
// supplied OTel-dot name (e.g. "otelcol.exporter.otlphttp.calls_total"). Returns
// (nil, false) if absent. Scope-agnostic: walks all scope metrics.
func findInstrument(rm metricdata.ResourceMetrics, name string) (metricdata.Metrics, bool) {
for _, sm := range rm.ScopeMetrics {
Expand Down Expand Up @@ -115,7 +115,7 @@ func TestOtlphttp_NewExporter_NilProviderErrors(t *testing.T) {
// M2 metric contract for exporters. After IncCallSuccess() ×2 +
// IncCallFailure(kindMarshal) ×1 + IncCallFailure(kindIO) ×1 +
// IncCallFailure(kindDownstream) ×1, the ManualReader collects
// tracecore.exporter.calls_total with datapoints partitioned by result and
// otelcol.exporter.otlphttp.calls_total with datapoints partitioned by result and
// (for failures) kind, labeled with the component_id. A regression that
// drops the kind label, the component_id label, the result label, or the
// metric-name prefix fails here.
Expand All @@ -132,9 +132,9 @@ func TestOtlphttp_EmitsCallsTotal_WithResultKindAndComponentID(t *testing.T) {
se.IncCallFailure(kindDownstream)

rm := collectRM(t, rdr)
m, ok := findInstrument(rm, "tracecore.exporter.calls_total")
m, ok := findInstrument(rm, "otelcol.exporter.otlphttp.calls_total")
if !ok {
t.Fatalf("metric tracecore.exporter.calls_total absent; have: %s", dumpNames(rm))
t.Fatalf("metric otelcol.exporter.otlphttp.calls_total absent; have: %s", dumpNames(rm))
}
sum, ok := m.Data.(metricdata.Sum[int64])
if !ok {
Expand Down Expand Up @@ -180,7 +180,7 @@ func TestOtlphttp_ScopeNameIsExporterImportPath(t *testing.T) {
}
se.IncCallSuccess()
rm := collectRM(t, rdr)
scope, ok := scopeOf(rm, "tracecore.exporter.calls_total")
scope, ok := scopeOf(rm, "otelcol.exporter.otlphttp.calls_total")
if !ok {
t.Fatalf("calls_total absent")
}
Expand All @@ -192,7 +192,7 @@ func TestOtlphttp_ScopeNameIsExporterImportPath(t *testing.T) {

// TestOtlphttp_RecordInitError_TicksInitErrorsCounter pins: when factory wiring
// fails (newSelfExporter returns an error), recordInitError surfaces a
// tracecore.selftelemetry.init_errors_total tick with kind="exporter",
// otelcol.selftelemetry.init_errors_total tick with kind="exporter",
// the component_id label, and reason="instrument_register". This is the
// only signal that an exporter fell back to noop telemetry; dropping the
// recordInitError call must fail this test.
Expand All @@ -201,7 +201,7 @@ func TestOtlphttp_RecordInitError_TicksInitErrorsCounter(t *testing.T) {
recordInitError(context.Background(), mp, "exporter", testID().String(), reasonInstrumentRegister)

rm := collectRM(t, rdr)
m, ok := findInstrument(rm, "tracecore.selftelemetry.init_errors_total")
m, ok := findInstrument(rm, "otelcol.selftelemetry.init_errors_total")
if !ok {
t.Fatalf("init_errors_total absent; have: %s", dumpNames(rm))
}
Expand Down Expand Up @@ -240,11 +240,11 @@ func TestOtlphttp_RecordInitError_NilProviderIsSafe(t *testing.T) {

// TestOtlphttp_FallsBackToNoopWhenMeterFails pins the factory
// observability contract end-to-end: when newSelfExporter returns an
// error (synthetic register failure for every tracecore.exporter.*
// error (synthetic register failure for every otelcol.exporter.otlphttp.*
// instrument), the factory MUST (1) leave the exporter with a working
// noop telemetry field (no nil, no panic on hot-path calls), AND (2)
// tick tracecore.selftelemetry.init_errors_total via recordInitError.
// Mirrors the nccl_fr sibling test seam.
// tick otelcol.selftelemetry.init_errors_total via recordInitError.
// Mirrors the stdoutexporter sibling test seam.
func TestOtlphttp_FallsBackToNoopWhenMeterFails(t *testing.T) {
mp, rdr := newTestMeterProvider(t)
failing := &failingExporterMP{real: mp}
Expand All @@ -268,12 +268,12 @@ func TestOtlphttp_FallsBackToNoopWhenMeterFails(t *testing.T) {
exp.telemetry.IncCallFailure(kindIO)

rm := collectRM(t, rdr)
if m, ok := findInstrument(rm, "tracecore.exporter.calls_total"); ok {
if m, ok := findInstrument(rm, "otelcol.exporter.otlphttp.calls_total"); ok {
if sum, ok := m.Data.(metricdata.Sum[int64]); ok && len(sum.DataPoints) > 0 {
t.Errorf("noop fallback leaked Inc* into calls_total datapoints: %v", sum.DataPoints)
}
}
m, ok := findInstrument(rm, "tracecore.selftelemetry.init_errors_total")
m, ok := findInstrument(rm, "otelcol.selftelemetry.init_errors_total")
if !ok {
t.Fatalf("init_errors_total absent after factory fallback; have: %s", dumpNames(rm))
}
Expand Down Expand Up @@ -385,9 +385,9 @@ func dumpNames(rm metricdata.ResourceMetrics) string {
}

// failingExporterMP wraps a real MeterProvider but fails every instrument
// registration whose name starts with "tracecore.exporter.". Mirrors the
// stdoutexporter sibling test seam so a future refactor that reorders the
// newSelfExporter constructor doesn't silently bypass coverage.
// registration whose name starts with "otelcol.exporter.otlphttp.".
// Mirrors the stdoutexporter sibling test seam so a future refactor that
// reorders the newSelfExporter constructor doesn't silently bypass coverage.
type failingExporterMP struct {
embedded.MeterProvider
real metric.MeterProvider
Expand All @@ -401,7 +401,7 @@ type failingExporterMeter struct {
metric.Meter
}

const exporterInstrumentPrefix = "tracecore.exporter."
const exporterInstrumentPrefix = "otelcol.exporter.otlphttp."

var errSyntheticExporterFailure = errors.New("synthetic: exporter instrument registration failed")

Expand Down