From 89b61b018655cd96cd43044fa13133b66da19049 Mon Sep 17 00:00:00 2001
From: Tri Lam
Date: Mon, 1 Jun 2026 01:37:25 -0700
Subject: [PATCH 1/2] chore(exporters): delete dead otlphttp + stdoutexporter wrappers

The OCB build assembled from builder-config.yaml has shipped upstream
otlphttpexporter + debugexporter since v0.2.0. The in-tree wrapper
packages had zero Go importers outside their own _test.go files.

Net: -21 source files / ~3,981 LOC. Drops 4 transitive collector deps
(exporter, exportertest, consumertest, xexporter).

Operator-impacting:
- Configs naming 'otlphttp' as an exporter type still work (upstream
  component-id is identical).
- Configs naming 'stdoutexporter' should switch to 'debug' (upstream
  debugexporter component-id). Chart default is already 'debug'.
- otelcol_exporter_otlphttp_calls_total / _errors_total disappear;
  upstream emits otelcol_exporter_sent_* / send_failed_* via
  exporterhelper.

Migration table in docs/migration/v0.1-to-v0.2.md carries the
per-component substitution annotation.

Closes #333, #334.
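
For operators making the 'stdoutexporter' to 'debug' switch noted above,
the change might look roughly like the sketch below. Only the 'debug'
and 'stdoutexporter' component-ids come from this commit; the
hostmetrics receiver, its load scraper, and the pipeline layout are
assumptions chosen for illustration and may not match a given
deployment.

```yaml
# Hypothetical operator config; receiver and pipeline wiring are
# illustrative assumptions, not copied from the chart.
receivers:
  hostmetrics:
    collection_interval: 1s
    scrapers:
      load: {}

exporters:
  # Before this change the key here would have been `stdoutexporter`.
  debug: {}

service:
  pipelines:
    metrics:
      receivers: [hostmetrics]
      # Before: exporters: [stdoutexporter]
      exporters: [debug]
```

Configs that name 'otlphttp' should need no equivalent edit, since the
upstream component-id is identical.
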
Signed-off-by: Tri Lam
---
 .github/workflows/install-bench.yml           |   6 +-
 CHANGELOG.md                                  |   8 +-
 bench/install/README.md                       |  12 +-
 components/exporters/otlphttp/README.md       |  97 --
 .../otlphttp/classify_internal_test.go        |  49 -
 components/exporters/otlphttp/config.go       | 160 ---
 components/exporters/otlphttp/config_test.go  | 124 ---
 .../exporters/otlphttp/example_config.yaml    |   5 -
 components/exporters/otlphttp/factory.go      |  81 --
 components/exporters/otlphttp/factory_test.go |  89 --
 components/exporters/otlphttp/otlphttp.go     | 755 --------------
 .../exporters/otlphttp/otlphttp_test.go       | 940 ------------------
 .../otlphttp/parseretry_internal_test.go      |  91 --
 components/exporters/otlphttp/selftel.go      | 117 ---
 components/exporters/otlphttp/selftel_test.go | 308 ------
 .../otlphttp/signal_internal_test.go          |  26 -
 components/exporters/stdoutexporter/README.md |  66 --
 components/exporters/stdoutexporter/config.go |  27 -
 .../stdoutexporter/example_config.yaml        |   4 -
 .../exporters/stdoutexporter/factory.go       |  70 --
 .../exporters/stdoutexporter/selftel.go       | 110 --
 .../exporters/stdoutexporter/selftel_test.go  | 295 ------
 .../stdoutexporter/stdoutexporter.go          | 127 ---
 .../stdoutexporter/stdoutexporter_test.go     | 281 ------
 docs/FAILURE-MODES.md                         |   4 +-
 docs/MILESTONES.md                            |   2 +-
 docs/README.md                                |   7 +-
 docs/STRATEGY.md                              |  15 +-
 docs/followups/otlphttp.md                    | 240 -----
 docs/migration/v0.1-to-v0.2.md                |   4 +-
 docs/research/baselines.md                    |   1 -
 docs/v1-rc1-simplification-audit.md           |  25 +-
 docs/v1-rc1-test-audit.md                     |   4 -
 go.mod                                        |   4 -
 go.sum                                        |  38 -
 module/pkg/selftel/selftel.go                 |  10 +-
 36 files changed, 50 insertions(+), 4152 deletions(-)
 delete mode 100644 components/exporters/otlphttp/README.md
 delete mode 100644 components/exporters/otlphttp/classify_internal_test.go
 delete mode 100644 components/exporters/otlphttp/config.go
 delete mode 100644 components/exporters/otlphttp/config_test.go
 delete mode 100644 components/exporters/otlphttp/example_config.yaml
 delete mode 100644 components/exporters/otlphttp/factory.go
 delete mode 100644 components/exporters/otlphttp/factory_test.go
 delete mode 100644 components/exporters/otlphttp/otlphttp.go
 delete mode 100644 components/exporters/otlphttp/otlphttp_test.go
 delete mode 100644 components/exporters/otlphttp/parseretry_internal_test.go
 delete mode 100644 components/exporters/otlphttp/selftel.go
 delete mode 100644 components/exporters/otlphttp/selftel_test.go
 delete mode 100644 components/exporters/otlphttp/signal_internal_test.go
 delete mode 100644 components/exporters/stdoutexporter/README.md
 delete mode 100644 components/exporters/stdoutexporter/config.go
 delete mode 100644 components/exporters/stdoutexporter/example_config.yaml
 delete mode 100644 components/exporters/stdoutexporter/factory.go
 delete mode 100644 components/exporters/stdoutexporter/selftel.go
 delete mode 100644 components/exporters/stdoutexporter/selftel_test.go
 delete mode 100644 components/exporters/stdoutexporter/stdoutexporter.go
 delete mode 100644 components/exporters/stdoutexporter/stdoutexporter_test.go
 delete mode 100644 docs/followups/otlphttp.md

diff --git a/.github/workflows/install-bench.yml b/.github/workflows/install-bench.yml
index 25e75ef1..16c6e8ee 100644
--- a/.github/workflows/install-bench.yml
+++ b/.github/workflows/install-bench.yml
@@ -1,7 +1,7 @@
 # Install bench: stands up a kind cluster, builds and loads the
 # tracecore image, deploys an OTLP sink + tracecore via the chart
-# with the otlphttp exporter swap, measures install-to-first-data
-# wall-clock, and writes the result JSON.
+# with the upstream otlphttpexporter swap, measures
+# install-to-first-data wall-clock, and writes the result JSON.
 #
 # Pinned action SHAs match the existing chart.yml workflow (the
 # canonical reference for kind-based CI in this repo).
See @@ -13,7 +13,6 @@ on: pull_request: paths: - 'bench/install/**' - - 'components/exporters/otlphttp/**' - 'install/kubernetes/tracecore/**' - 'builder-config.yaml' - 'go.mod' @@ -23,7 +22,6 @@ on: branches: [main] paths: - 'bench/install/**' - - 'components/exporters/otlphttp/**' - 'install/kubernetes/tracecore/**' - 'builder-config.yaml' - 'go.mod' diff --git a/CHANGELOG.md b/CHANGELOG.md index d7e68eef..99f5d2d8 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -4,6 +4,10 @@ User-visible changes are documented here. Format: [Keep a Changelog](https://kee ## [Unreleased] +### Removed + +- **In-tree `components/exporters/otlphttp/` + `components/exporters/stdoutexporter/` wrappers** (closes #333, #334). The OCB build assembled from [`builder-config.yaml`](builder-config.yaml) has shipped upstream `go.opentelemetry.io/collector/exporter/otlphttpexporter` + `debugexporter` since v0.2.0 (lines 59-60); the in-tree wrappers had zero Go importers outside their own `_test.go` files. Operator configs that named `otlphttp` as an exporter type continue to work — the upstream component-id is identical. Configs that named `stdoutexporter` should switch to `debug` (the upstream `debugexporter` component-id); the chart default already uses `debug`. Net deletion: 13 + 8 files / ~3,981 LOC. See [`docs/migration/v0.1-to-v0.2.md`](docs/migration/v0.1-to-v0.2.md) for the chart-default + PromQL implications already documented at the v0.2.0 boundary. + ## [0.2.0] - 2026-05-31 First tagged release after the RFC-0013 distribution-first pivot. The binary is now assembled by the OpenTelemetry Collector Builder (OCB) from upstream + contrib components plus the in-repo `module/` Go submodule (NCCL FlightRecorder receiver, rankjoinprocessor, patterndetectorprocessor). OTel collector + contrib pinned at v0.130.0. 
Hand-rolled `cmd/tracecore` boot path retired; the legacy in-tree pipeline runtime, consumer chain, fanout, config loader, componentstatus surface, and runtime/lifecycle helper have all been deleted in favor of upstream `go.opentelemetry.io/collector/service`. Self-telemetry instrument names renamed `tracecore_*` → `otelcol_*`. Helm chart default image tag now resolves to `v0.2.0` via `Chart.yaml` appVersion (synced from `builder-config.yaml` `dist.version` by `scripts/chart-appversion-check.sh`). Container image publish via [ko](https://ko.build) to `ghcr.io/tracecoreai/tracecore`, cosign-signed and SLSA-attested by digest. See [`docs/migration/v0.1-to-v0.2.md`](docs/migration/v0.1-to-v0.2.md) for the operator upgrade walkthrough and [RFC-0013](docs/rfcs/0013-distro-first-pivot.md) for the strategic rationale. @@ -22,7 +26,7 @@ First tagged release after the RFC-0013 distribution-first pivot. The binary is ### Changed -- **Self-telemetry instrument names renamed `tracecore_*` → `otelcol_*`** for the eight surviving in-tree components (`clockreceiver`, `containerstdout`, `k8sevents`, `kernelevents`, `nccl_fr`, `pyspy`, `otlphttp`, `stdoutexporter`). Each emits through its own per-component MeterProvider; names now match the upstream `otelcol___` convention (e.g. `otelcol_receiver_containerstdout_errors_total`, `otelcol_exporter_otlphttp_calls_total`). Label shape preserved (`component_id`, `kind`, `result` unchanged). The migration table in `docs/migration/v0.1-to-v0.2.md` under "In-tree receiver / exporter namespace alignment" carries the rename matrix, per-component `` substitutions, and a PromQL diff recipe. 
+- **Self-telemetry instrument names renamed `tracecore_*` → `otelcol_*`** for the eight in-tree components shipping at the v0.2.0 boundary (`clockreceiver`, `containerstdout`, `k8sevents`, `kernelevents`, `nccl_fr`, `pyspy`, `otlphttp`, `stdoutexporter`; the four receivers + two exporter wrappers were subsequently retired — see `[Unreleased]` Removed for the exporter wrappers). Each emits through its own per-component MeterProvider; names now match the upstream `otelcol___` convention (e.g. `otelcol_receiver_containerstdout_errors_total`, `otelcol_exporter_otlphttp_calls_total`). Label shape preserved (`component_id`, `kind`, `result` unchanged). The migration table in `docs/migration/v0.1-to-v0.2.md` under "In-tree receiver / exporter namespace alignment" carries the rename matrix, per-component `` substitutions, and a PromQL diff recipe. - **Adopt > build posture for upstream-provided telemetry surfaces** (RFC-0013 §2). GPU telemetry, container stdout, kernel events, K8s events, Kueue, heartbeat, self-telemetry, release pipeline, and image publish all route through upstream + contrib components rather than in-tree forks. Vendors: NVIDIA (`dcgm-exporter`), AMD (`ROCm/device-metrics-exporter`), Intel (`intel/xpumanager`), Habana (Habana Prometheus Metric Exporter) — all scraped via upstream `prometheusreceiver`. CNCF: `filelogreceiver` + container stanza + `file_storage`; `journaldreceiver`; `k8sobjectsreceiver`. Self-telemetry: upstream `componentstatus` + `service/telemetry` + standard `otelcol_*` metrics. - **Customer-stable telemetry contracts preserved across the pivot** via OTTL `transform` processors in the bundled Helm-chart recipe (RFC-0013 §3): `k8s.event.hint` eleven-entry enum, `kernelevents.xid` (NVRM Xid code), `gpu.id` (PCI BDF), `gpu.vendor`, `gen_ai.training.rank` + `gen_ai.training.job_id` (cross-receiver join keys), NCCL FlightRecorder span schema, pattern-detector outputs. 
- **OTel collector + contrib pinned at v0.130.0** (bumped from v0.110 → v0.115 → v0.120 → v0.125 → v0.130 across the wave-PR sequence). Go toolchain on 1.23. @@ -58,7 +62,7 @@ Pre-pivot scaffolding tagged as `v0.1.0-m1` on 2026-05-14. None of the code desc - **`internal/pipeline.ComponentState`** — lifecycle-bookkeeping mixin receiver authors embed for `Started()`/`Stopped()` accessors. Modelled after OTel's `service/internal/testcomponents/stateful_component.go`. (Deleted in v0.2.0 PR-F.2.) - **`tools/components-gen`** + `components.yaml` codegen — single source of truth for which components the binary ships with; generated `cmd/tracecore/components.go` regenerated via `make generate`. Manifest changes validated at gen-time. (Superseded by OCB `builder-config.yaml` in v0.2.0.) - **`components/receivers/clockreceiver`** — canonical example receiver. Emitted `tracecore.clock.now` gauge at a configurable interval. (Deleted in v0.2.0 PR-K.2; replaced by `hostmetricsreceiver`.) -- **`components/exporters/stdoutexporter`** — canonical example exporter. (Ported onto upstream `service` in v0.2.0; source survives in-tree.) +- **`components/exporters/stdoutexporter`** — canonical example exporter. (Ported onto upstream `service` in v0.2.0; source retired post-v0.2.0, see `[Unreleased]` Removed.) - **`cmd/tracecore collect --config=`** — boots the runtime, builds pipelines via bottom-up factory assembly (exporters → fan-out → processors reversed → first-data wrap → receivers), runs until SIGTERM/SIGINT, performs two-phase shutdown. (Deleted in v0.2.0 PR-A2; replaced by the OCB-generated main.) - **Operator-UX patterns** baked in (RFC-0003 §"Operator UX patterns"): line-numbered YAML errors with `file:line`, named-op `safe.Call`, empty-pipeline boot logs once and idles, first-data log line per pipeline, `pipelinetest.New(t)` fixture. 
- **Integration test** in `cmd/tracecore/integration_test.go` — built the binary, ran it with a real config, captured stdout/stderr, asserted exit code + JSON metric output + lifecycle log lines. (Deleted in v0.2.0 PR-A2.) diff --git a/bench/install/README.md b/bench/install/README.md index bc7e06c8..f66f0642 100644 --- a/bench/install/README.md +++ b/bench/install/README.md @@ -6,7 +6,7 @@ End-to-end install-bench harness for tracecore: stands up a kind cluster, deploys an OTLP-receiving sink (otelcol-contrib `file` exporter), `helm install`s tracecore with chart values that swap the -default stdoutexporter for the in-tree `otlphttp` exporter, and +default `debug` exporter for the upstream `otlphttpexporter`, and measures wall-clock from `helm install` return to first OTLP datapoint at the sink. @@ -69,11 +69,11 @@ Each `bench/results/install-*.json` row carries: ## Caveats - **Metrics signal only.** The bench wires `pipelines.metrics` to the - otlphttp exporter; `traces` and `logs` pipelines are not exercised. - The exporter supports all three signals (unit-tested in - `components/exporters/otlphttp/otlphttp_test.go`), but install-bench - validates the metrics wire path only. Adding traces+logs to the - bench is tracked in `docs/FOLLOWUPS.md`. + upstream `otlphttpexporter`; `traces` and `logs` pipelines are not + exercised. The exporter supports all three signals (covered by + upstream tests in `go.opentelemetry.io/collector/exporter/otlphttpexporter`), + but install-bench validates the metrics wire path only. Adding + traces+logs to the bench is tracked in `docs/FOLLOWUPS.md`. 
- **First-data is tick-aliased.** The bench heartbeat source emits on a 1 s interval (hostmetricsreceiver loadscraper as of PR-E; was the in-tree clockreceiver pre-PR-E, which was deleted in diff --git a/components/exporters/otlphttp/README.md b/components/exporters/otlphttp/README.md deleted file mode 100644 index 726c85f8..00000000 --- a/components/exporters/otlphttp/README.md +++ /dev/null @@ -1,97 +0,0 @@ -# otlphttp - -**Stability:** alpha - -POSTs OTLP-encoded payloads (metrics, traces, logs) to a configured -HTTP endpoint per the [OpenTelemetry Protocol HTTP transport -specification](https://opentelemetry.io/docs/specs/otlp/#otlphttp). -From-scratch implementation; does NOT import upstream -`go.opentelemetry.io/collector/exporter/otlphttpexporter` or -`exporterhelper`. Rationale: keeps tracecore's supply chain narrow -(per [RFC-0003](../../../docs/rfcs/0003-pipeline-runtime-and-component-contract.md)) -and avoids the 25-40 transitive modules the upstream exporter pulls. - -## Configuration - -| Field | Default | Description | -|---|---|---| -| `endpoint` | _(required)_ | Base URL. The OTLP path suffix (`/v1/metrics`, `/v1/traces`, `/v1/logs`) is appended per signal. Scheme must be `http` or `https`. | -| `metrics_endpoint` / `traces_endpoint` / `logs_endpoint` | "" | Per-signal complete URL overrides. When set, the OTLP path suffix is NOT appended. | -| `headers` | `{}` | Headers added to every outgoing request (e.g., vendor auth tokens). | -| `compression` | "" | `""` (none) or `"gzip"`. The OTLP/HTTP spec only blesses gzip. | -| `encoding` | `"proto"` | `"proto"` (Content-Type `application/x-protobuf`) or `"json"` (Content-Type `application/json`). | -| `timeout` | `10s` | Per-request HTTP timeout. Sized so worst-case ConsumeMetrics fits inside the pipeline-shutdown budget (1+2+4 s backoff + 4 × 10 s timeout ≈ 47 s). | -| `insecure_skip_verify` | `false` | Disable TLS cert validation. Test environments only; never production. 
| -| `max_retries` | `3` | Retry ceiling on retryable failures (429/502/503/504 + network errors). | -| `initial_backoff` | `1s` | First inter-retry pause when the server did not send `Retry-After`. Exponential × 2 with ±20% jitter on subsequent attempts. | - -## Example - -```yaml -exporters: - otlphttp: - endpoint: http://localhost:4318 -``` - -## Retry semantics - -Strict OTLP/HTTP spec compliance: - -- **Retryable status codes:** `429`, `502`, `503`, `504` ONLY. Any other 4xx or 5xx is permanent. -- **Network errors:** retried (per spec "if the client cannot connect to the server"). -- **Retry-After header** (integer seconds or HTTP-date per [RFC 9110 §10.2.3](https://datatracker.ietf.org/doc/html/rfc9110#section-10.2.3), which accepts the IMF-fixdate, RFC 850, and asctime forms via `http.ParseTime`): honored when set, including `Retry-After: 0` which short-circuits the exponential backoff. -- **Exponential backoff:** when no `Retry-After`, `initial × 2^attempt` with ±20% jitter, capped at 60s. -- **gzip compression** (when opted in) is skipped on payloads below 1 KiB. Single-datapoint OTLP bodies are smaller than the gzip header + DEFLATE overhead, so compression would inflate the wire bytes. Operators expecting full compression on small payloads should be aware. - -## TLS depth (v0.1.0 scope) - -The only TLS knob is `insecure_skip_verify` (test-only); TLS is otherwise driven by the endpoint URL scheme + the host's system trust store. Out of scope for v0.1.0: - -- mTLS (client certificates) -- Custom CA bundle path -- SNI override -- Cipher-suite pinning - -Operators on TLS-everywhere networks should set `endpoint: https://...` and rely on the system trust store. mTLS lands in a follow-up; track at `docs/research/m5-m6-research.md` followups table. - -## Header handling - -Operator-supplied `headers` are sent verbatim on every outgoing request; useful for vendor auth tokens (`x-honeycomb-team`, `Authorization: Bearer ...`, etc.). 
The exporter **does not redact headers in any log line** because debug-level retry logs only carry the error string (`server returned retryable status 503`), not the request header map. Operators passing sensitive headers should still treat the deployment's collector logs as sensitive. - -## Self-telemetry labels - -The exporter increments `otelcol.exporter.otlphttp.calls_total{result, kind, component_id}` on every Consume* (Prometheus scrape renders this as `otelcol_exporter_otlphttp_calls_total`). The `kind` values emitted by otlphttp are exporter-local low-cardinality strings declared in [`selftel.go`](selftel.go) (sibling-scoped, package-local — see RFC-0013 §migration v0.1.0 namespace alignment; the `internal/selftelemetry` canonical set was deleted in PR-F.1): - -| `kind` | When | Operator first-step | -|---|---|---| -| `marshal` | pdata serialization or gzip compression failure | check upstream pipeline; rebuild on dirty source | -| `io` | network error, timeout, context cancellation, shutdown abort | check endpoint reachability, backend health, pod ctx budget | -| `downstream` | server returned a permanent 4xx (not 429) or non-retryable 5xx | check backend logs at the timestamp; the backend rejected the payload | - -Operator dashboards split by `kind` to triage: -- `marshal` increments → tracecore bug; file an issue -- `io` increments → operator's network or backend reachability -- `downstream` increments → backend rejected the data; backend-side action - -## Partial-success responses - -A 200 OK response with a populated `partial_success` body is treated as a full success in v0.1.0; the response body is NOT decoded. The OTLP spec marks client handling of `partial_success` as OPTIONAL. Operators who need rejected-count reporting can scrape `otelcol_exporter_otlphttp_calls_total{result="success"}` and compare against upstream input counts via their backend. 
- -## Signals supported - -| Signal | Supported | -|---------|-----------| -| Metrics | yes | -| Traces | yes | -| Logs | yes | - -## Capabilities - -`MutatesData: false`; read-only. Fan-out may share a single payload with this exporter without cloning. - -## Implementation notes - -- Uses `pmetric.ProtoMarshaler{}.MarshalMetrics(md)` (and trace/log equivalents), which produces byte-identical output to `pmetricotlp.ExportRequest.MarshalProto()`; both call `internal.MetricsToProto`. The bare marshaler avoids pulling `pmetricotlp` and its grpc transitive deps. -- HTTP response bodies are drained (up to 64 KiB) and closed even on success, so `net/http` keep-alive can reuse the connection. -- User-Agent string follows upstream convention: `/ (/)`. -- Concurrent `Consume*` calls are safe; `net/http.Client` handles its own pooling. diff --git a/components/exporters/otlphttp/classify_internal_test.go b/components/exporters/otlphttp/classify_internal_test.go deleted file mode 100644 index 8e9cf167..00000000 --- a/components/exporters/otlphttp/classify_internal_test.go +++ /dev/null @@ -1,49 +0,0 @@ -// SPDX-License-Identifier: Apache-2.0 - -// Internal-package tests for classifyKind so the sentinel-error -// routing introduced in P3 is locked in by direct unit tests rather -// than inferred from observable side effects. - -package otlphttp - -import ( - "context" - "errors" - "fmt" - "testing" - - "github.com/stretchr/testify/require" -) - -func TestClassifyKind_PermanentStatusRoutesToDownstream(t *testing.T) { - t.Parallel() - err := fmt.Errorf("%w 400", errPermanentStatus) - require.Equal(t, kindDownstream, classifyKind(err)) -} - -func TestClassifyKind_RetryableStatusRoutesToDownstream(t *testing.T) { - t.Parallel() - // Retry-exhausted 5xx must label as downstream (server-fault), - // not io (transport-fault). Pre-P3 string-match treated this as - // io because the error message was "retryable status N", not - // "permanent status N". Reviewer P3-Rev1#7/Rev2-F3. 
- err := fmt.Errorf("%w 503", errRetryableStatus) - require.Equal(t, kindDownstream, classifyKind(err)) -} - -func TestClassifyKind_CtxCanceledRoutesToIO(t *testing.T) { - t.Parallel() - require.Equal(t, kindIO, classifyKind(context.Canceled)) -} - -func TestClassifyKind_CtxDeadlineExceededRoutesToIO(t *testing.T) { - t.Parallel() - require.Equal(t, kindIO, classifyKind(context.DeadlineExceeded)) -} - -func TestClassifyKind_GenericErrorRoutesToIO(t *testing.T) { - t.Parallel() - // Network-shaped errors (DNS, TLS, dial) fall through to the - // default branch and are tagged as io. README contract. - require.Equal(t, kindIO, classifyKind(errors.New("dial tcp: no such host"))) -} diff --git a/components/exporters/otlphttp/config.go b/components/exporters/otlphttp/config.go deleted file mode 100644 index 96c28448..00000000 --- a/components/exporters/otlphttp/config.go +++ /dev/null @@ -1,160 +0,0 @@ -// SPDX-License-Identifier: Apache-2.0 - -// Package otlphttp ships an OTLP/HTTP exporter that POSTs OTLP -// protobuf or JSON bodies to a configured endpoint per the -// OpenTelemetry Protocol HTTP transport specification: -// https://opentelemetry.io/docs/specs/otlp/#otlphttp -// -// This is a from-scratch implementation: it does NOT import -// upstream `go.opentelemetry.io/collector/exporter/otlphttpexporter` -// or `exporterhelper`. Rationale: keeping tracecore's pipeline -// runtime supply chain narrow (per RFC-0003) and avoiding the 25 to -// 40 transitive modules that the upstream exporter pulls in. -package otlphttp - -import ( - "errors" - "fmt" - "net/url" - "strings" - "time" -) - -// DefaultTimeout is the per-request HTTP timeout. Set to 10s so that -// a worst-case ConsumeMetrics; (1 + 2 + 4) s backoff + 3 × 10 s -// timeout = ~37 s; fits inside RFC-0003's pipeline-shutdown budget -// before the runtime force-cancels the call. Operators sending to -// high-latency backends override via Config. (Reviewer V3.) 
-const DefaultTimeout = 10 * time.Second - -// DefaultMaxRetries is the per-call retry ceiling. The exporter -// retries only on the OTLP-spec-blessed retryable status codes -// (429, 502, 503, 504) and on network errors; permanent codes (400, -// 401, 403, 404, 405, 413, ...) are returned to the caller without -// retry. -const DefaultMaxRetries = 3 - -// DefaultInitialBackoff is the first inter-retry pause. Each -// subsequent retry doubles the pause with ±20% jitter. -const DefaultInitialBackoff = 1 * time.Second - -// Config is the operator-facing YAML shape for the otlphttp exporter. -type Config struct { - // Endpoint is the base URL the exporter POSTs to. The standard - // OTLP/HTTP path suffix (/v1/metrics, /v1/traces, /v1/logs) is - // appended automatically per the signal being exported. Use a - // scheme of http or https; the scheme drives TLS. - Endpoint string `yaml:"endpoint"` - - // MetricsEndpoint, TracesEndpoint, LogsEndpoint override the - // per-signal URL when the operator needs a complete URL rather - // than Endpoint + path suffix. Empty falls back to Endpoint. - MetricsEndpoint string `yaml:"metrics_endpoint,omitempty"` - TracesEndpoint string `yaml:"traces_endpoint,omitempty"` - LogsEndpoint string `yaml:"logs_endpoint,omitempty"` - - // Headers are added to every outgoing request. Useful for - // vendor-specific auth tokens (e.g., x-honeycomb-team). - Headers map[string]string `yaml:"headers,omitempty"` - - // Compression is the Content-Encoding to apply to request - // bodies. Empty string means no compression. The OTLP/HTTP - // spec only blesses gzip; tracecore enforces that allow-list. - Compression string `yaml:"compression,omitempty"` - - // Encoding selects the body format: "proto" (default, - // Content-Type application/x-protobuf) or "json" - // (Content-Type application/json). - Encoding string `yaml:"encoding,omitempty"` - - // Timeout is the per-request HTTP timeout. Zero means use - // DefaultTimeout; negative is invalid. 
- Timeout time.Duration `yaml:"timeout,omitempty"` - - // InsecureSkipVerify disables TLS certificate validation when - // the endpoint scheme is https. Use only for self-signed test - // environments; never in production. Operators are expected to - // run with the system trust store by default. - InsecureSkipVerify bool `yaml:"insecure_skip_verify,omitempty"` - - // MaxRetries caps the number of retry attempts on retryable - // failures (429/502/503/504 + network errors). Zero means use - // DefaultMaxRetries; negative disables retries entirely. - MaxRetries int `yaml:"max_retries,omitempty"` - - // InitialBackoff is the first inter-retry pause. Each subsequent - // retry doubles with ±20% jitter. Honored only when the server - // did not send a Retry-After header. Zero means - // DefaultInitialBackoff. - InitialBackoff time.Duration `yaml:"initial_backoff,omitempty"` - - _ struct{} -} - -// Validate enforces the operator-facing invariants. Returns nil -// when the config can be safely handed to a factory. -func (c *Config) Validate() error { - if c.Endpoint == "" && c.MetricsEndpoint == "" && c.TracesEndpoint == "" && c.LogsEndpoint == "" { - return errors.New("otlphttp: endpoint is required (set endpoint or one of metrics_endpoint/traces_endpoint/logs_endpoint)") - } - // Empty/whitespace keys produce `Header: ` lines on the wire that - // envoy and AWS ALB reject with 400. CR/LF in values is the CRLF- - // injection shape; net/http catches it at send time too, but - // catching it here gives operators a config-time error instead of - // a per-request runtime failure. 
- for k, v := range c.Headers { - if strings.TrimSpace(k) == "" { - return errors.New("otlphttp: headers map contains empty or whitespace-only key") - } - if strings.ContainsAny(v, "\r\n") { - return fmt.Errorf("otlphttp: header %q value contains a CR or LF character", k) - } - } - for label, ep := range map[string]string{ - "endpoint": c.Endpoint, - "metrics_endpoint": c.MetricsEndpoint, - "traces_endpoint": c.TracesEndpoint, - "logs_endpoint": c.LogsEndpoint, - } { - if ep == "" { - continue - } - if err := validateEndpointURL(ep); err != nil { - return fmt.Errorf("otlphttp: %s: %w", label, err) - } - } - if c.Timeout < 0 { - return fmt.Errorf("otlphttp: timeout must be non-negative, got %s", c.Timeout) - } - switch c.Compression { - case "", "gzip": - // OK; empty (no compression) or the only spec-blessed value. - default: - return fmt.Errorf("otlphttp: compression must be empty or \"gzip\" per OTLP/HTTP spec, got %q", c.Compression) - } - switch c.Encoding { - case "", "proto", "json": - // OK; empty falls back to proto in the factory. - default: - return fmt.Errorf("otlphttp: encoding must be \"proto\" or \"json\", got %q", c.Encoding) - } - return nil -} - -// validateEndpointURL parses an endpoint and rejects schemes other -// than http and https. Bare host:port (no scheme) is rejected so -// operators don't accidentally hit Go's net/url quirks where a -// missing scheme parses as a relative path. 
-func validateEndpointURL(ep string) error { - u, err := url.Parse(ep) - if err != nil { - return fmt.Errorf("malformed URL %q: %w", ep, err) - } - if u.Scheme != "http" && u.Scheme != "https" { - return fmt.Errorf("scheme must be http or https, got %q in %q", u.Scheme, ep) - } - if u.Host == "" { - return fmt.Errorf("missing host in %q", ep) - } - return nil -} diff --git a/components/exporters/otlphttp/config_test.go b/components/exporters/otlphttp/config_test.go deleted file mode 100644 index 25a494c6..00000000 --- a/components/exporters/otlphttp/config_test.go +++ /dev/null @@ -1,124 +0,0 @@ -// SPDX-License-Identifier: Apache-2.0 - -package otlphttp_test - -import ( - "fmt" - "os" - "testing" - "time" - - "github.com/stretchr/testify/require" - "gopkg.in/yaml.v3" - - "github.com/tracecoreai/tracecore/components/exporters/otlphttp" -) - -func TestConfig_Validate_DefaultsFromFactoryAreInvalid(t *testing.T) { - t.Parallel() - // CreateDefaultConfig() should be invalid until Endpoint is set. - // operators must explicitly choose where to send data. 
- cfg, ok := otlphttp.NewFactory().CreateDefaultConfig().(*otlphttp.Config) - require.True(t, ok) - require.Error(t, cfg.Validate(), - "default config has no endpoint; operators must set one") -} - -func TestConfig_Validate_RequiresEndpoint(t *testing.T) { - t.Parallel() - cfg := &otlphttp.Config{} - err := cfg.Validate() - require.ErrorContains(t, err, "endpoint") -} - -func TestConfig_Validate_AcceptsHTTP(t *testing.T) { - t.Parallel() - cfg := &otlphttp.Config{Endpoint: "http://localhost:4318"} - require.NoError(t, cfg.Validate()) -} - -func TestConfig_Validate_AcceptsHTTPS(t *testing.T) { - t.Parallel() - cfg := &otlphttp.Config{Endpoint: "https://api.example.com"} - require.NoError(t, cfg.Validate()) -} - -func TestConfig_Validate_RejectsNonHTTPSchemes(t *testing.T) { - t.Parallel() - for _, ep := range []string{ - "ftp://example.com", - "file:///tmp/data", - "grpc://localhost:4317", // gRPC is a different exporter - "localhost:4318", // missing scheme - } { - t.Run(ep, func(t *testing.T) { - cfg := &otlphttp.Config{Endpoint: ep} - require.Error(t, cfg.Validate(), "scheme must be http or https") - }) - } -} - -func TestConfig_Validate_RejectsNegativeTimeout(t *testing.T) { - t.Parallel() - cfg := &otlphttp.Config{ - Endpoint: "http://localhost:4318", - Timeout: -1 * time.Second, - } - require.ErrorContains(t, cfg.Validate(), "timeout") -} - -func TestConfig_Validate_RejectsUnknownCompression(t *testing.T) { - t.Parallel() - cfg := &otlphttp.Config{ - Endpoint: "http://localhost:4318", - Compression: "snappy", // spec only blesses gzip; reject others - } - require.ErrorContains(t, cfg.Validate(), "compression") -} - -func TestConfig_Validate_AcceptsGzipOrEmptyCompression(t *testing.T) { - t.Parallel() - for _, c := range []string{"", "gzip"} { - cfg := &otlphttp.Config{ - Endpoint: "http://localhost:4318", - Compression: c, - } - require.NoError(t, cfg.Validate()) - } -} - -func TestExampleConfig_Parses(t *testing.T) { - t.Parallel() - bs, err := 
os.ReadFile("example_config.yaml") - require.NoError(t, err) - - var doc struct { - Exporters struct { - Otlphttp yaml.Node `yaml:"otlphttp"` - } `yaml:"exporters"` - } - require.NoError(t, yaml.Unmarshal(bs, &doc)) - - cfg, ok := otlphttp.NewFactory().CreateDefaultConfig().(*otlphttp.Config) - require.True(t, ok) - require.NoError(t, doc.Exporters.Otlphttp.Decode(cfg)) - require.NoError(t, cfg.Validate()) -} - -func TestConfig_Validate_RejectsCRLFInHeaderValue(t *testing.T) { - t.Parallel() - for _, value := range []string{ - "token\r", - "token\n", - "token\r\nInjected: header", - } { - t.Run(fmt.Sprintf("value=%q", value), func(t *testing.T) { - cfg := &otlphttp.Config{ - Endpoint: "http://localhost:4318", - Headers: map[string]string{"X-Auth": value}, - } - err := cfg.Validate() - require.ErrorContains(t, err, "CR or LF") - }) - } -} diff --git a/components/exporters/otlphttp/example_config.yaml b/components/exporters/otlphttp/example_config.yaml deleted file mode 100644 index 82e5ef53..00000000 --- a/components/exporters/otlphttp/example_config.yaml +++ /dev/null @@ -1,5 +0,0 @@ -# Minimal working otlphttp exporter config. Sends OTLP/HTTP protobuf -# bodies to a local OTel Collector listener on the default port. -exporters: - otlphttp: - endpoint: http://localhost:4318 diff --git a/components/exporters/otlphttp/factory.go b/components/exporters/otlphttp/factory.go deleted file mode 100644 index c4ec5bdc..00000000 --- a/components/exporters/otlphttp/factory.go +++ /dev/null @@ -1,81 +0,0 @@ -// SPDX-License-Identifier: Apache-2.0 - -package otlphttp - -import ( - "context" - "fmt" - - "go.opentelemetry.io/collector/component" - "go.opentelemetry.io/collector/exporter" -) - -// componentType is the kind name OCB registers this exporter under. -// Wrapped in a function so the MustNewType call is not a top-level -// side effect (mirrors the nccl_fr / pyspy pattern). 
-func componentType() component.Type { return component.MustNewType("otlphttp") } - -// stability is the OCB-surfaced stability level for the otlphttp -// exporter. Beta tracks "metrics + label shape pinned; behavior may -// evolve" — same level the in-tree exporter has carried since the -// v0.1.x internal/pipeline factory; this port preserves it across -// the upstream-API swap. -const stability = component.StabilityLevelBeta - -// NewFactory returns the upstream exporter.Factory for otlphttp. -// Mirrors the upstream-contrib pattern (otlpexporter, fileexporter) — -// callers construct via `otlphttp.NewFactory()` rather than a package -// var, so each OCB-stitched pipeline gets a freshly-built factory -// and the package surface stays a single exported symbol. -// -// All three signals are wired: the exporter POSTs metrics, traces, -// and logs against the OTLP/HTTP per-signal path suffix. -func NewFactory() exporter.Factory { - return exporter.NewFactory( - componentType(), - createDefaultConfig, - exporter.WithMetrics(createMetrics, stability), - exporter.WithTraces(createTraces, stability), - exporter.WithLogs(createLogs, stability), - ) -} - -// createDefaultConfig matches upstream component.CreateDefaultConfigFunc. -func createDefaultConfig() component.Config { return &Config{} } - -// createMetrics is the exporter.CreateMetricsFunc wired by WithMetrics. -func createMetrics(ctx context.Context, set exporter.Settings, cfg component.Config) (exporter.Metrics, error) { - c, ok := cfg.(*Config) - if !ok { - return nil, fmt.Errorf("otlphttp: unexpected config type %T", cfg) - } - return newExporter(ctx, set, c, signalMetrics) -} - -// createTraces is the exporter.CreateTracesFunc wired by WithTraces. 
-func createTraces(ctx context.Context, set exporter.Settings, cfg component.Config) (exporter.Traces, error) { - c, ok := cfg.(*Config) - if !ok { - return nil, fmt.Errorf("otlphttp: unexpected config type %T", cfg) - } - return newExporter(ctx, set, c, signalTraces) -} - -// createLogs is the exporter.CreateLogsFunc wired by WithLogs. -func createLogs(ctx context.Context, set exporter.Settings, cfg component.Config) (exporter.Logs, error) { - c, ok := cfg.(*Config) - if !ok { - return nil, fmt.Errorf("otlphttp: unexpected config type %T", cfg) - } - return newExporter(ctx, set, c, signalLogs) -} - -// Compile-time assertion that the wired exporter satisfies the -// upstream per-signal interfaces. A breaking shape change in upstream -// surfaces here at build time rather than at runtime when OCB stitches -// the pipeline. -var ( - _ exporter.Metrics = (*otlpExporter)(nil) - _ exporter.Traces = (*otlpExporter)(nil) - _ exporter.Logs = (*otlpExporter)(nil) -) diff --git a/components/exporters/otlphttp/factory_test.go b/components/exporters/otlphttp/factory_test.go deleted file mode 100644 index fda6c182..00000000 --- a/components/exporters/otlphttp/factory_test.go +++ /dev/null @@ -1,89 +0,0 @@ -// SPDX-License-Identifier: Apache-2.0 - -package otlphttp_test - -import ( - "testing" - - "github.com/stretchr/testify/require" - "go.opentelemetry.io/collector/component" - "go.opentelemetry.io/collector/exporter/exportertest" - - "github.com/tracecoreai/tracecore/components/exporters/otlphttp" -) - -// componentType mirrors the production-package constant so external -// tests can construct exportertest settings keyed on the same Type -// the factory registers. 
-func componentType() component.Type { return component.MustNewType("otlphttp") } - -func TestOtlphttp_TypeIsOtlphttp(t *testing.T) { - t.Parallel() - require.Equal(t, "otlphttp", otlphttp.NewFactory().Type().String()) -} - -func TestOtlphttp_CreateDefaultConfig_ReturnsConfigPointer(t *testing.T) { - t.Parallel() - cfg := otlphttp.NewFactory().CreateDefaultConfig() - _, ok := cfg.(*otlphttp.Config) - require.True(t, ok, "factory must produce *Config") -} - -func TestOtlphttp_NewFactoryReturnsFreshInstancePerCall(t *testing.T) { - t.Parallel() - // Each NewFactory() call returns its own Factory value so OCB- - // stitched pipelines can hold per-pipeline factory state without - // alias surprises. The Type the factory advertises stays stable. - f1 := otlphttp.NewFactory() - f2 := otlphttp.NewFactory() - require.Equal(t, f1.Type(), f2.Type()) -} - -func TestOtlphttp_CreateMetrics_ReturnsExporter(t *testing.T) { - t.Parallel() - set := exportertest.NewNopSettings(componentType()) - cfg := &otlphttp.Config{Endpoint: "http://localhost:4318"} - - exp, err := otlphttp.NewFactory().CreateMetrics(t.Context(), set, cfg) - require.NoError(t, err) - require.NotNil(t, exp) -} - -func TestOtlphttp_CreateTraces_ReturnsExporter(t *testing.T) { - t.Parallel() - set := exportertest.NewNopSettings(componentType()) - cfg := &otlphttp.Config{Endpoint: "http://localhost:4318"} - - exp, err := otlphttp.NewFactory().CreateTraces(t.Context(), set, cfg) - require.NoError(t, err) - require.NotNil(t, exp) -} - -func TestOtlphttp_CreateLogs_ReturnsExporter(t *testing.T) { - t.Parallel() - set := exportertest.NewNopSettings(componentType()) - cfg := &otlphttp.Config{Endpoint: "http://localhost:4318"} - - exp, err := otlphttp.NewFactory().CreateLogs(t.Context(), set, cfg) - require.NoError(t, err) - require.NotNil(t, exp) -} - -func TestOtlphttp_CreateMetrics_RejectsWrongConfigType(t *testing.T) { - t.Parallel() - set := exportertest.NewNopSettings(componentType()) - // Passing a config of the 
wrong type should fail with a clear - // operator-actionable error. - _, err := otlphttp.NewFactory().CreateMetrics(t.Context(), set, &otlphttp.Config{Endpoint: "http://localhost:4318"}) - require.NoError(t, err) // sanity: correct type passes - _, err = otlphttp.NewFactory().CreateMetrics(t.Context(), set, &wrongConfig{}) - require.ErrorContains(t, err, "unexpected config type") -} - -// wrongConfig satisfies component.Config (just an empty struct); -// the factory rejects it with an operator-actionable error. -type wrongConfig struct{} - -// Compile-time assertion that wrongConfig satisfies component.Config -// (which is any in upstream v1.59.0; the constraint is documentary). -var _ component.Config = (*wrongConfig)(nil) diff --git a/components/exporters/otlphttp/otlphttp.go b/components/exporters/otlphttp/otlphttp.go deleted file mode 100644 index 0ac703e6..00000000 --- a/components/exporters/otlphttp/otlphttp.go +++ /dev/null @@ -1,755 +0,0 @@ -// SPDX-License-Identifier: Apache-2.0 - -// Package otlphttp ships an OTLP/HTTP exporter that POSTs OTLP -// protobuf or JSON bodies to a configured endpoint per the -// OpenTelemetry Protocol HTTP transport specification: -// https://opentelemetry.io/docs/specs/otlp/#otlphttp -// -// This is a from-scratch implementation: it does NOT import -// upstream `go.opentelemetry.io/collector/exporter/otlphttpexporter` -// or `exporterhelper`. Rationale: keeping the OCB-stitched binary's -// transitive supply chain narrow (per RFC-0003) and avoiding the -// 25–40 transitive modules that the upstream exporter pulls in. 
-package otlphttp - -import ( - "bytes" - "compress/gzip" - "context" - "crypto/tls" - "errors" - "fmt" - "io" - "math" - mrand "math/rand/v2" - "net/http" - "net/textproto" - "runtime" - "strconv" - "strings" - "sync" - "time" - - "go.opentelemetry.io/collector/component" - "go.opentelemetry.io/collector/consumer" - "go.opentelemetry.io/collector/exporter" - "go.opentelemetry.io/collector/pdata/plog" - "go.opentelemetry.io/collector/pdata/pmetric" - "go.opentelemetry.io/collector/pdata/ptrace" - "go.uber.org/zap" -) - -// signal disambiguates which OTLP/HTTP path the exporter is bound -// to. Set by the factory per createMetrics/createTraces/createLogs. -type signal int - -const ( - signalMetrics signal = iota - signalTraces - signalLogs -) - -// String renders the signal as the lowercase name operators see in -// the OTLP/HTTP path suffix ("metrics"/"traces"/"logs"). Without this, -// zap's structured field formatter would render `signal` fields as the -// underlying int, so operators grepping logs would see "signal=2" -// instead of "signal=logs". -func (s signal) String() string { - switch s { - case signalMetrics: - return "metrics" - case signalTraces: - return "traces" - case signalLogs: - return "logs" - default: - return fmt.Sprintf("unknown(%d)", int(s)) - } -} - -// Constants pinned to the OTLP/HTTP specification at -// https://opentelemetry.io/docs/specs/otlp/#otlphttp. -const ( - pathMetrics = "/v1/metrics" - pathTraces = "/v1/traces" - pathLogs = "/v1/logs" - - contentTypeProto = "application/x-protobuf" - contentTypeJSON = "application/json" - - encodingJSON = "json" - encodingProto = "proto" - - compressionGzip = "gzip" - - headerRetryAfter = "Retry-After" - - // maxResponseReadBytes caps how much of a response body we read - // to avoid unbounded memory use on a misbehaving server. Reading + - // closing the body is required so net/http can reuse the - // keep-alive connection. 
- maxResponseReadBytes = 64 * 1024 -) - -// Self-telemetry types (kind, selfExporter) live in `selftel.go` -// (sibling-scoped, package-local), not in `internal/selftelemetry` — -// see RFC-0013 §migration. PR-F deletes `internal/selftelemetry`; the -// sibling pattern is what survives. - -// otlpExporter is the per-signal exporter the factory hands out. -// Goroutine-safe; net/http.Client handles concurrent calls. -type otlpExporter struct { - id component.ID - logger *zap.Logger - telemetry selfExporter - transport *http.Transport - client *http.Client - cfg *Config - signal signal - endpoint string - userAgent string - - // canonicalHeaders is cfg.Headers with keys passed through - // textproto.CanonicalMIMEHeaderKey at construction so the per-call - // req.Header.Set hot path doesn't re-canonicalize on every send. - // net/http stores in the canonical form regardless; doing it once - // up front saves an allocation+lower-case sweep per request. - canonicalHeaders map[string]string - - maxRetries int - initialBackoff time.Duration - - // shutdownCtx is canceled by Shutdown so in-flight retry sleeps - // abort immediately. The exporter joins the operator's context - // with shutdownCtx via the merged-context helper below. - shutdownCtx context.Context - shutdownCancel context.CancelFunc - - // inflight tracks active Consume* calls so Shutdown can wait - // for them to drain (with a budget) before closing idle conns. - inflight sync.WaitGroup - - // stopMu guards stopped + makes Shutdown idempotent. ComponentState - // from the v0.1.x internal/pipeline package owned this bookkeeping; - // the upstream `component.Component` contract leaves idempotency to - // the implementation, so we carry it here. - stopMu sync.Mutex - stopped bool -} - -// errShutdown is returned by Consume* after Shutdown has been called. -// Sentinel so callers can match without parsing the string. 
-var errShutdown = errors.New("otlphttp: exporter is shut down") - -// errPermanentStatus and errRetryableStatus let classifyKind use -// errors.Is rather than string-match on formatted messages. Both map -// to kindDownstream; "io" is reserved for transport-level failures -// (DNS, TLS, dial, context cancellation). -var ( - errPermanentStatus = errors.New("otlphttp: server returned permanent status") - errRetryableStatus = errors.New("otlphttp: server returned retryable status") -) - -func newExporter(ctx context.Context, set exporter.Settings, cfg *Config, sig signal) (*otlpExporter, error) { - endpoint, err := resolveEndpoint(cfg, sig) - if err != nil { - return nil, err - } - - timeout := cfg.Timeout - if timeout == 0 { - timeout = DefaultTimeout - } - maxRetries := cfg.MaxRetries - if maxRetries == 0 { - maxRetries = DefaultMaxRetries - } - initialBackoff := cfg.InitialBackoff - if initialBackoff == 0 { - initialBackoff = DefaultInitialBackoff - } - - transport := buildTransport(cfg) - - logger := set.Logger - if logger == nil { - logger = zap.NewNop() - } - te := newSelfTelemetry(ctx, set, logger) - // insecure_skip_verify regressions get inherited from dev YAML into - // prod. WARN at startup so operators grepping pod logs catch it - // without having to render the ConfigMap. 
- if cfg.InsecureSkipVerify && strings.HasPrefix(endpoint, "https://") { - logger.Warn("otlphttp: TLS verification disabled (insecure_skip_verify); do not use in production", - zap.String("endpoint", endpoint), - zap.Stringer("signal", sig), - ) - } - - shutdownCtx, shutdownCancel := context.WithCancel(context.Background()) - - return &otlpExporter{ - id: set.ID, - logger: logger, - telemetry: te, - transport: transport, - client: &http.Client{Timeout: timeout, Transport: transport}, - cfg: cfg, - signal: sig, - endpoint: endpoint, - userAgent: buildUserAgent(set.BuildInfo), - canonicalHeaders: canonicalizeHeaders(cfg.Headers), - maxRetries: maxRetries, - initialBackoff: initialBackoff, - shutdownCtx: shutdownCtx, - shutdownCancel: shutdownCancel, - }, nil -} - -// buildTransport clones the runtime's DefaultTransport (falling back -// to a fresh one if the default has been replaced) and applies -// tracecore-specific tuning: a 60s idle-conn timeout aligned with the -// modal AWS NLB / backend-LB idle cutoff (Go's default 90s leaves -// dead connections in the pool that fail on next use), and an -// optional InsecureSkipVerify toggle for operator-opted-in TLS bypass. -func buildTransport(cfg *Config) *http.Transport { - var transport *http.Transport - if dt, ok := http.DefaultTransport.(*http.Transport); ok { - transport = dt.Clone() - } else { - transport = &http.Transport{} - } - transport.IdleConnTimeout = 60 * time.Second - if cfg.InsecureSkipVerify { - transport.TLSClientConfig = &tls.Config{InsecureSkipVerify: true} //nolint:gosec // operator-opted via config - } - return transport -} - -// canonicalizeHeaders applies textproto.CanonicalMIMEHeaderKey to each -// header key once at construction so the per-call req.Header assign -// hot path doesn't re-canonicalize on every send. Returns nil for an -// empty input so the caller's nil-store contract stays intact. 
-func canonicalizeHeaders(headers map[string]string) map[string]string { - if len(headers) == 0 { - return nil - } - out := make(map[string]string, len(headers)) - for k, v := range headers { - out[textproto.CanonicalMIMEHeaderKey(k)] = v - } - return out -} - -// Start satisfies the upstream component.Component contract. The -// exporter has no async init; net/http.Client is wired in -// newExporter. Idempotent. -func (e *otlpExporter) Start(context.Context, component.Host) error { return nil } - -// Shutdown cancels in-flight retry sleeps, waits for any active -// Consume* to drain (capped by ctx), and closes idle HTTP connections. -// Idempotent: subsequent calls return nil without re-cancelling. -func (e *otlpExporter) Shutdown(ctx context.Context) error { - e.stopMu.Lock() - if e.stopped { - e.stopMu.Unlock() - return nil - } - e.stopped = true - e.stopMu.Unlock() - - e.shutdownCancel() - done := make(chan struct{}) - go func() { - e.inflight.Wait() - close(done) - }() - select { - case <-done: - case <-ctx.Done(): - // Drain budget exceeded. In-flight calls bail on shutdownCtx; - // operators investigating dropped-data incidents need this - // WARN in the on-call grep path. - e.logger.Warn("otlphttp: shutdown drain budget exceeded; in-flight calls will bail on their next iteration", - zap.Stringer("signal", e.signal), - ) - } - e.transport.CloseIdleConnections() - return nil -} - -// stoppedFlag returns the current shutdown state under the mutex. -// Used by Consume* paths after their inflight.Add(1) to short-circuit -// post-Shutdown calls. -func (e *otlpExporter) stoppedFlag() bool { - e.stopMu.Lock() - defer e.stopMu.Unlock() - return e.stopped -} - -// resolveEndpoint picks the per-signal URL, appending the OTLP path -// suffix if the operator provided only the base Endpoint. 
-func resolveEndpoint(cfg *Config, sig signal) (string, error) { - var override, suffix string - switch sig { - case signalMetrics: - override, suffix = cfg.MetricsEndpoint, pathMetrics - case signalTraces: - override, suffix = cfg.TracesEndpoint, pathTraces - case signalLogs: - override, suffix = cfg.LogsEndpoint, pathLogs - default: - return "", fmt.Errorf("otlphttp: unknown signal %d", sig) - } - if override != "" { - return override, nil - } - if cfg.Endpoint == "" { - return "", fmt.Errorf("otlphttp: no endpoint configured for signal %d", sig) - } - return strings.TrimRight(cfg.Endpoint, "/") + suffix, nil -} - -// buildUserAgent mirrors upstream otlphttpexporter's format so -// operators familiar with the OTel ecosystem recognize the string. -func buildUserAgent(bi component.BuildInfo) string { - desc := bi.Description - if desc == "" { - desc = "tracecore" - } - ver := bi.Version - if ver == "" { - ver = "dev" - } - return fmt.Sprintf("%s/%s (%s/%s)", desc, ver, runtime.GOOS, runtime.GOARCH) -} - -// newSelfTelemetry wires the per-exporter self-telemetry handle. -// Returns a no-op when the MeterProvider is absent or instrument -// registration fails; the register-failure path also ticks -// `otelcol.selftelemetry.init_errors_total` via recordInitError so -// operators can alert on > 0. Mirrors the stdoutexporter sibling — -// same wire shape, no internal/selftelemetry import. -// -// NOTE on ExporterCarrier removal: -// -// v0.1.x otlphttp exposed `SelfExporter() selftelemetry.Exporter` so -// the runtime's reader-collection path could feed -// `tracecore.exporter.failure_rate`. RFC-0013 PR-A2 deleted the -// `cmd/tracecore` hand-wired entry point and PR-F.1 deleted -// `internal/selftelemetry` entirely — the carrier surface has no -// remaining consumer. 
-// -// - `otelcol.exporter.otlphttp.calls_total{result,kind,component_id}` -// continues to surface because the sibling impl emits it on -// `set.MeterProvider` directly — dashboards / alerts keyed on the -// calls_total counter rate-derive failure via -// PromQL `rate(otelcol_exporter_otlphttp_calls_total{result="error"}[5m])`. -// - The per-exporter failure_rate gauge feed is intentionally -// dropped; the v0.1.x SLO observable gauge contract is replaced -// by the upstream OCB pipeline-runtime counters. -func newSelfTelemetry(ctx context.Context, set exporter.Settings, logger *zap.Logger) selfExporter { - if set.MeterProvider == nil { - logger.Warn("otlphttp: no MeterProvider; self-telemetry using noop") - return newNoopSelfExporter() - } - e, err := newSelfExporter(set.ID, set.MeterProvider) - if err != nil { - recordInitError(ctx, set.MeterProvider, - "exporter", set.ID.String(), reasonInstrumentRegister) - logger.Warn("otlphttp self-telemetry init failed; using noop", zap.Error(err)) - return newNoopSelfExporter() - } - return e -} - -// ErrShutdownSentinel exposes errShutdown for external test packages -// to match via errors.Is without parsing the error string. Package- -// internal callers use errShutdown directly. -func ErrShutdownSentinel() error { return errShutdown } - -// Capabilities reports MutatesData=false; the exporter only reads -// the payload. -func (*otlpExporter) Capabilities() consumer.Capabilities { - return consumer.Capabilities{MutatesData: false} -} - -// ConsumeMetrics serializes and POSTs md. Empty payloads no-op. -// -// inflight.Add(1) MUST precede the stopped check. Reverse order -// opens a TOCTOU window: Shutdown's drain completes (counter == 0) -// before this call registers, then this call POSTs against a -// transport whose idle conns were just closed. 
-func (e *otlpExporter) ConsumeMetrics(ctx context.Context, md pmetric.Metrics) error { - e.inflight.Add(1) - defer e.inflight.Done() - if e.stoppedFlag() { - return errShutdown - } - if md.MetricCount() == 0 { - e.telemetry.IncCallSuccess() - return nil - } - body, encoding, err := e.marshalMetrics(md) - if err != nil { - e.telemetry.IncCallFailure(kindMarshal) - return fmt.Errorf("otlphttp: marshal metrics: %w", err) - } - return e.send(ctx, body, encoding) -} - -// ConsumeTraces; see ConsumeMetrics. -func (e *otlpExporter) ConsumeTraces(ctx context.Context, td ptrace.Traces) error { - e.inflight.Add(1) - defer e.inflight.Done() - if e.stoppedFlag() { - return errShutdown - } - if td.SpanCount() == 0 { - e.telemetry.IncCallSuccess() - return nil - } - body, encoding, err := e.marshalTraces(td) - if err != nil { - e.telemetry.IncCallFailure(kindMarshal) - return fmt.Errorf("otlphttp: marshal traces: %w", err) - } - return e.send(ctx, body, encoding) -} - -// ConsumeLogs; see ConsumeMetrics. -func (e *otlpExporter) ConsumeLogs(ctx context.Context, ld plog.Logs) error { - e.inflight.Add(1) - defer e.inflight.Done() - if e.stoppedFlag() { - return errShutdown - } - if ld.LogRecordCount() == 0 { - e.telemetry.IncCallSuccess() - return nil - } - body, encoding, err := e.marshalLogs(ld) - if err != nil { - e.telemetry.IncCallFailure(kindMarshal) - return fmt.Errorf("otlphttp: marshal logs: %w", err) - } - return e.send(ctx, body, encoding) -} - -// marshalMetrics emits OTLP wire bytes. `pmetric.ProtoMarshaler` and -// `pmetricotlp.ExportRequest.MarshalProto` produce byte-identical -// output (both call internal.MetricsToProto); the bare marshaler -// avoids pulling in the `pmetricotlp` subpackage's gRPC transitive -// dependencies. 
-func (e *otlpExporter) marshalMetrics(md pmetric.Metrics) ([]byte, string, error) { - if e.cfg.Encoding == encodingJSON { - bs, err := (&pmetric.JSONMarshaler{}).MarshalMetrics(md) - if err != nil { - return nil, contentTypeJSON, fmt.Errorf("pdata pmetric JSON marshal: %w", err) - } - return bs, contentTypeJSON, nil - } - bs, err := (&pmetric.ProtoMarshaler{}).MarshalMetrics(md) - if err != nil { - return nil, contentTypeProto, fmt.Errorf("pdata pmetric proto marshal: %w", err) - } - return bs, contentTypeProto, nil -} - -func (e *otlpExporter) marshalTraces(td ptrace.Traces) ([]byte, string, error) { - if e.cfg.Encoding == encodingJSON { - bs, err := (&ptrace.JSONMarshaler{}).MarshalTraces(td) - if err != nil { - return nil, contentTypeJSON, fmt.Errorf("pdata ptrace JSON marshal: %w", err) - } - return bs, contentTypeJSON, nil - } - bs, err := (&ptrace.ProtoMarshaler{}).MarshalTraces(td) - if err != nil { - return nil, contentTypeProto, fmt.Errorf("pdata ptrace proto marshal: %w", err) - } - return bs, contentTypeProto, nil -} - -func (e *otlpExporter) marshalLogs(ld plog.Logs) ([]byte, string, error) { - if e.cfg.Encoding == encodingJSON { - bs, err := (&plog.JSONMarshaler{}).MarshalLogs(ld) - if err != nil { - return nil, contentTypeJSON, fmt.Errorf("pdata plog JSON marshal: %w", err) - } - return bs, contentTypeJSON, nil - } - bs, err := (&plog.ProtoMarshaler{}).MarshalLogs(ld) - if err != nil { - return nil, contentTypeProto, fmt.Errorf("pdata plog proto marshal: %w", err) - } - return bs, contentTypeProto, nil -} - -// send POSTs the payload with retry on the OTLP-spec-blessed -// retryable codes. Returns nil on full success or partial-success; -// returns error on permanent failure or retry exhaustion. 
-func (e *otlpExporter) send(ctx context.Context, body []byte, contentType string) error { - body, contentEncoding, err := e.maybeCompress(body) - if err != nil { - e.telemetry.IncCallFailure(kindMarshal) - return fmt.Errorf("otlphttp: compress: %w", err) - } - return e.sendWithRetry(ctx, e.endpoint, body, contentType, contentEncoding) -} - -// sendWithRetry runs the retry/backoff loop around doOnce per the -// OTLP/HTTP spec: retry on 429/502/503/504 + transport errors, honor -// Retry-After when set, fall back to exponential backoff otherwise. -// Caller is responsible for compression — sendWithRetry POSTs the -// body bytes verbatim. Endpoint is per-signal and cached at -// construction; passed explicitly so the loop body has no -// receiver-state surprises. -func (e *otlpExporter) sendWithRetry(ctx context.Context, endpoint string, body []byte, contentType, contentEncoding string) error { - var lastErr error - maxAttempts := e.maxRetries + 1 // include the initial try - for attempt := 0; attempt < maxAttempts; attempt++ { - if err := firstCanceledErr(ctx, e.shutdownCtx); err != nil { - e.telemetry.IncCallFailure(kindIO) - return err - } - retryable, hint, err := e.doOnce(ctx, endpoint, body, contentType, contentEncoding) - if err == nil { - e.telemetry.IncCallSuccess() - return nil - } - // Short-circuit retry if ctx or shutdown was cancelled. net/http - // wraps context errors, so without this check the loop spins to - // maxAttempts-1 on an already-cancelled call. - if cerr := firstCanceledErr(ctx, e.shutdownCtx); cerr != nil { - e.telemetry.IncCallFailure(kindIO) - return cerr - } - lastErr = err - if !retryable || attempt == maxAttempts-1 { - e.telemetry.IncCallFailure(classifyKind(err)) - return err - } - // Honor server's Retry-After when set (including 0). Only - // fall back to exponential backoff when the header is absent - // or unparseable. 
- var wait time.Duration - if hint.set { - wait = hint.delay - } else { - wait = e.backoff(attempt) - } - e.logger.Debug("otlphttp: retrying", - zap.Int("attempt", attempt+1), - zap.Int("max_attempts", maxAttempts), - zap.Duration("wait", wait), - zap.Error(err), - ) - // NewTimer + Stop instead of time.After: when ctx or - // shutdownCtx fires before the timer expires, the underlying - // runtime timer stays armed until its scheduled wake instead of - // being released immediately. A bounded leak at steady state, - // but cheap to avoid. - timer := time.NewTimer(wait) - select { - case <-ctx.Done(): - timer.Stop() - e.telemetry.IncCallFailure(kindIO) - return ctx.Err() - case <-e.shutdownCtx.Done(): - timer.Stop() - e.telemetry.IncCallFailure(kindIO) - return errShutdown - case <-timer.C: - } - } - return lastErr -} - -// firstCanceledErr returns the canceled context's error (errShutdown -// for the shutdownCtx branch), or nil if both are live. Non-blocking. -func firstCanceledErr(ctx, shutdown context.Context) error { - if err := ctx.Err(); err != nil { - return err //nolint:wrapcheck // pass through context cancellation verbatim - } - if err := shutdown.Err(); err != nil { - return errShutdown - } - return nil -} - -// retryHint carries the Retry-After value the server suggested. -// `set` is true when the header was present and parsed successfully; -// false means the caller should fall back to its backoff schedule. -// `delay` may be zero; server can legitimately say "retry now" -// with Retry-After: 0. -type retryHint struct { - set bool - delay time.Duration -} - -// doOnce performs a single HTTP round trip. Returns (retryable, -// retryHint, error). 
-func (e *otlpExporter) doOnce(ctx context.Context, endpoint string, body []byte, contentType, contentEncoding string) (bool, retryHint, error) { - req, err := http.NewRequestWithContext(ctx, http.MethodPost, endpoint, bytes.NewReader(body)) - if err != nil { - return false, retryHint{}, fmt.Errorf("build request: %w", err) - } - req.Header.Set("Content-Type", contentType) - if contentEncoding != "" { - req.Header.Set("Content-Encoding", contentEncoding) - } - req.Header.Set("User-Agent", e.userAgent) - // Keys were canonicalized once at construction; assign directly to - // req.Header to skip net/http's per-call MIMECanonicalKey pass. - for k, v := range e.canonicalHeaders { - req.Header[k] = []string{v} - } - - resp, err := e.client.Do(req) //nolint:bodyclose // drainAndClose handles it in defer - if err != nil { - // Network errors are retryable per the spec ("If the client - // cannot connect to the server, the client SHOULD retry"). - return true, retryHint{}, fmt.Errorf("post: %w", err) - } - defer drainAndClose(resp.Body) - - switch { - case resp.StatusCode == http.StatusOK: - // Partial-success body decoding is deferred to v2; the OTLP - // spec marks client handling as OPTIONAL. v1 treats all 200s - // as full success without parsing the body. Operators who - // want partial-success reporting can scrape the - // `otelcol.exporter.otlphttp.calls_total` counter, which still - // counts the 200 as a success here. - return false, retryHint{}, nil - case isRetryableStatus(resp.StatusCode): - hint := parseRetryAfter(resp.Header.Get(headerRetryAfter), time.Now()) - return true, hint, fmt.Errorf("%w %d", errRetryableStatus, resp.StatusCode) - default: - return false, retryHint{}, fmt.Errorf("%w %d", errPermanentStatus, resp.StatusCode) - } -} - -// isRetryableStatus encodes the OTLP/HTTP spec's allow-list of -// retryable codes: 429, 502, 503, 504. ALL OTHER 4xx OR 5xx MUST -// NOT BE RETRIED. 
-func isRetryableStatus(code int) bool { - switch code { - case http.StatusTooManyRequests, http.StatusBadGateway, - http.StatusServiceUnavailable, http.StatusGatewayTimeout: - return true - } - return false -} - -// maxRetryAfter caps a server-sent Retry-After to a sane upper bound. -// A misbehaving (or malicious) backend can otherwise pin the exporter -// for 24h+ on a single response (reviewer V23). 60s aligns with the -// exponential-backoff cap. -const maxRetryAfter = 60 * time.Second - -// parseRetryAfter implements the two-step parse the OTLP/HTTP spec -// inherits from RFC 7231: integer seconds first, then HTTP-date. -// Returns (hint, set=true) on parseable values; (zero, set=false) -// when the header is absent or unparseable. Negative integer seconds -// or past dates clamp to delay=0 with set=true so the caller still -// short-circuits the backoff schedule. Delays exceeding maxRetryAfter -// are clamped to that ceiling. -func parseRetryAfter(value string, now time.Time) retryHint { - value = strings.TrimSpace(value) // reviewer V27: envoy/HAProxy occasionally trail - if value == "" { - return retryHint{} - } - if secs, err := strconv.Atoi(value); err == nil { - if secs < 0 { - return retryHint{set: true, delay: 0} - } - // Clamp before constructing the Duration; guards reviewer V23 - // (24h DoS) AND reviewer-S4 integer-overflow on extreme values. - if secs > int(maxRetryAfter/time.Second) { - secs = int(maxRetryAfter / time.Second) - } - return retryHint{set: true, delay: time.Duration(secs) * time.Second} - } - if t, err := http.ParseTime(value); err == nil { - d := t.Sub(now) - if d < 0 { - d = 0 - } - if d > maxRetryAfter { - d = maxRetryAfter - } - return retryHint{set: true, delay: d} - } - return retryHint{} -} - -// backoff returns the inter-retry pause for attempt N (0-indexed): -// initial × 2^N, with ±20% jitter. The base is capped at 1 minute -// AFTER jitter is applied so the upper bound is exactly 60s, not -// 72s (reviewer M3). 
-// -// Uses math/rand/v2 which is goroutine-safe and lock-free; the -// jitter does not need cryptographic randomness (reviewer M4). -func (e *otlpExporter) backoff(attempt int) time.Duration { - base := float64(e.initialBackoff) * math.Pow(2, float64(attempt)) - frac := mrand.Float64() //nolint:gosec // jitter is not security-sensitive; math/rand/v2 is correct here - jitter := (frac*0.4 - 0.2) * base // [-20%, +20%) - out := time.Duration(base + jitter) - if out > time.Minute { - out = time.Minute - } - if out < 0 { - out = 0 - } - return out -} - -// gzipMinBytes skips gzip below this size. Gzip header + DEFLATE block -// overhead (~18-30 bytes) makes single-datapoint OTLP payloads grow -// when compressed. -const gzipMinBytes = 1024 - -// maybeCompress applies gzip when the operator opted in AND the -// payload is above the size threshold. Returns the (possibly-rewritten) -// body and the Content-Encoding header to use. -func (e *otlpExporter) maybeCompress(body []byte) ([]byte, string, error) { - if e.cfg.Compression != compressionGzip || len(body) < gzipMinBytes { - return body, "", nil - } - var buf bytes.Buffer - w := gzip.NewWriter(&buf) - if _, err := w.Write(body); err != nil { - _ = w.Close() - return nil, "", fmt.Errorf("gzip write: %w", err) - } - if err := w.Close(); err != nil { - return nil, "", fmt.Errorf("gzip close: %w", err) - } - return buf.Bytes(), compressionGzip, nil -} - -// classifyKind maps a transport error to a low-cardinality kind tag. -// Retry-exhausted 5xx remains downstream, NOT io: operators triaging -// by kind look at network for io and at the backend for downstream. -func classifyKind(err error) kind { - if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) { - return kindIO - } - if errors.Is(err, errPermanentStatus) || errors.Is(err, errRetryableStatus) { - return kindDownstream - } - return kindIO -} - -// drainAndClose reads up to maxResponseReadBytes from the body then -// closes it. 
Required for net/http keep-alive: the connection is -// only reused if the body is fully consumed. -func drainAndClose(r io.ReadCloser) { - if r == nil { - return - } - _, _ = io.CopyN(io.Discard, r, maxResponseReadBytes) - _ = r.Close() -} diff --git a/components/exporters/otlphttp/otlphttp_test.go b/components/exporters/otlphttp/otlphttp_test.go deleted file mode 100644 index 55815523..00000000 --- a/components/exporters/otlphttp/otlphttp_test.go +++ /dev/null @@ -1,940 +0,0 @@ -// SPDX-License-Identifier: Apache-2.0 - -package otlphttp_test - -import ( - "bytes" - "compress/gzip" - "context" - "fmt" - "io" - "net/http" - "net/http/httptest" - "runtime" - "strconv" - "sync" - "sync/atomic" - "testing" - "time" - - "github.com/stretchr/testify/require" - "go.opentelemetry.io/collector/component" - "go.opentelemetry.io/collector/exporter" - "go.opentelemetry.io/collector/exporter/exportertest" - "go.opentelemetry.io/collector/pdata/pcommon" - "go.opentelemetry.io/collector/pdata/plog" - "go.opentelemetry.io/collector/pdata/pmetric" - "go.opentelemetry.io/collector/pdata/ptrace" - "go.uber.org/zap" - "go.uber.org/zap/zapcore" - "go.uber.org/zap/zaptest/observer" - - "github.com/tracecoreai/tracecore/components/exporters/otlphttp" -) - -func newMetrics(t *testing.T) pmetric.Metrics { - t.Helper() - md := pmetric.NewMetrics() - rm := md.ResourceMetrics().AppendEmpty() - sm := rm.ScopeMetrics().AppendEmpty() - m := sm.Metrics().AppendEmpty() - m.SetName("tracecore.test") - g := m.SetEmptyGauge() - dp := g.DataPoints().AppendEmpty() - dp.SetTimestamp(pcommon.NewTimestampFromTime(time.Unix(0, 0))) - dp.SetIntValue(42) - return md -} - -// newLargeMetrics builds a pmetric.Metrics that serializes well above -// the gzipMinBytes threshold. Adds N gauge datapoints each carrying -// a stable string attribute so wire bytes grow linearly with N. 
-func newLargeMetrics(t *testing.T, count int) pmetric.Metrics { - t.Helper() - md := pmetric.NewMetrics() - rm := md.ResourceMetrics().AppendEmpty() - sm := rm.ScopeMetrics().AppendEmpty() - m := sm.Metrics().AppendEmpty() - m.SetName("tracecore.test.large") - g := m.SetEmptyGauge() - for i := 0; i < count; i++ { - dp := g.DataPoints().AppendEmpty() - dp.SetTimestamp(pcommon.NewTimestampFromTime(time.Unix(int64(i), 0))) - dp.SetIntValue(int64(i)) - dp.Attributes().PutStr("k", "value-padding-to-exceed-the-threshold") - } - return md -} - -func newTraces(t *testing.T) ptrace.Traces { - t.Helper() - td := ptrace.NewTraces() - rs := td.ResourceSpans().AppendEmpty() - ss := rs.ScopeSpans().AppendEmpty() - s := ss.Spans().AppendEmpty() - s.SetName("tracecore.test.span") - s.SetStartTimestamp(pcommon.NewTimestampFromTime(time.Unix(0, 0))) - s.SetEndTimestamp(pcommon.NewTimestampFromTime(time.Unix(0, 1))) - return td -} - -func newLogs(t *testing.T) plog.Logs { - t.Helper() - ld := plog.NewLogs() - rl := ld.ResourceLogs().AppendEmpty() - sl := rl.ScopeLogs().AppendEmpty() - lr := sl.LogRecords().AppendEmpty() - lr.SetSeverityText("INFO") - lr.Body().SetStr("tracecore.test.log") - return ld -} - -// nopSettings returns exporter.Settings via exportertest.NewNopSettings -// — tracks upstream BuildInfo / TelemetrySettings shape changes without -// the test file having to mirror them by hand. 
-func nopSettings() exporter.Settings { - return exportertest.NewNopSettings(componentType()) -} - -func mustMetricsExporter(t *testing.T, cfg *otlphttp.Config) exporter.Metrics { - t.Helper() - exp, err := otlphttp.NewFactory().CreateMetrics(t.Context(), nopSettings(), cfg) - require.NoError(t, err) - return exp -} - -func mustTracesExporter(t *testing.T, cfg *otlphttp.Config) exporter.Traces { - t.Helper() - exp, err := otlphttp.NewFactory().CreateTraces(t.Context(), nopSettings(), cfg) - require.NoError(t, err) - return exp -} - -func mustLogsExporter(t *testing.T, cfg *otlphttp.Config) exporter.Logs { - t.Helper() - exp, err := otlphttp.NewFactory().CreateLogs(t.Context(), nopSettings(), cfg) - require.NoError(t, err) - return exp -} - -// recordingHandler captures every HTTP request into a slice for -// assertions. Goroutine-safe. -type recordingHandler struct { - mu sync.Mutex - requests []recordedRequest - - // Each call returns the next entry in `responses` (rotating if - // it runs out). Default response is 200 OK. 
- responses []recordedResponse - respIdx atomic.Int32 -} - -type recordedRequest struct { - Path string - Method string - ContentType string - ContentEncoding string - UserAgent string - Headers http.Header - Body []byte -} - -type recordedResponse struct { - Status int - RetryAfter string - Body []byte -} - -func (h *recordingHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) { - body, _ := io.ReadAll(r.Body) - h.mu.Lock() - h.requests = append(h.requests, recordedRequest{ - Path: r.URL.Path, - Method: r.Method, - ContentType: r.Header.Get("Content-Type"), - ContentEncoding: r.Header.Get("Content-Encoding"), - UserAgent: r.Header.Get("User-Agent"), - Headers: r.Header.Clone(), - Body: body, - }) - h.mu.Unlock() - - var resp recordedResponse - if len(h.responses) > 0 { - i := int(h.respIdx.Add(1)-1) % len(h.responses) - resp = h.responses[i] - } - if resp.Status == 0 { - resp.Status = http.StatusOK - } - if resp.RetryAfter != "" { - w.Header().Set("Retry-After", resp.RetryAfter) - } - w.WriteHeader(resp.Status) - if len(resp.Body) > 0 { - _, _ = w.Write(resp.Body) - } -} - -func (h *recordingHandler) seen() []recordedRequest { - h.mu.Lock() - defer h.mu.Unlock() - out := make([]recordedRequest, len(h.requests)) - copy(out, h.requests) - return out -} - -func TestExporter_PostsMetricsAsOTLPProto(t *testing.T) { - t.Parallel() - h := &recordingHandler{} - srv := httptest.NewServer(h) - t.Cleanup(srv.Close) - - exp := mustMetricsExporter(t, &otlphttp.Config{Endpoint: srv.URL}) - require.NoError(t, exp.ConsumeMetrics(t.Context(), newMetrics(t))) - - got := h.seen() - require.Len(t, got, 1) - require.Equal(t, http.MethodPost, got[0].Method) - require.Equal(t, "/v1/metrics", got[0].Path) - require.Equal(t, "application/x-protobuf", got[0].ContentType) - require.NotEmpty(t, got[0].Body, "proto body must be non-empty") -} - -func TestExporter_PostsTracesAsOTLPProto(t *testing.T) { - t.Parallel() - h := &recordingHandler{} - srv := httptest.NewServer(h) - 
t.Cleanup(srv.Close) - - exp := mustTracesExporter(t, &otlphttp.Config{Endpoint: srv.URL}) - require.NoError(t, exp.ConsumeTraces(t.Context(), newTraces(t))) - - got := h.seen() - require.Len(t, got, 1) - require.Equal(t, "/v1/traces", got[0].Path) - require.Equal(t, "application/x-protobuf", got[0].ContentType) -} - -func TestExporter_PostsLogsAsOTLPProto(t *testing.T) { - t.Parallel() - h := &recordingHandler{} - srv := httptest.NewServer(h) - t.Cleanup(srv.Close) - - exp := mustLogsExporter(t, &otlphttp.Config{Endpoint: srv.URL}) - require.NoError(t, exp.ConsumeLogs(t.Context(), newLogs(t))) - - got := h.seen() - require.Len(t, got, 1) - require.Equal(t, "/v1/logs", got[0].Path) - require.Equal(t, "application/x-protobuf", got[0].ContentType) -} - -func TestExporter_JSONEncodingSetsJSONContentType(t *testing.T) { - t.Parallel() - h := &recordingHandler{} - srv := httptest.NewServer(h) - t.Cleanup(srv.Close) - - exp := mustMetricsExporter(t, &otlphttp.Config{Endpoint: srv.URL, Encoding: "json"}) - require.NoError(t, exp.ConsumeMetrics(t.Context(), newMetrics(t))) - - got := h.seen() - require.Len(t, got, 1) - require.Equal(t, "application/json", got[0].ContentType) -} - -func TestExporter_EmptyMetrics_NoHTTPCall(t *testing.T) { - t.Parallel() - h := &recordingHandler{} - srv := httptest.NewServer(h) - t.Cleanup(srv.Close) - - exp := mustMetricsExporter(t, &otlphttp.Config{Endpoint: srv.URL}) - require.NoError(t, exp.ConsumeMetrics(t.Context(), pmetric.NewMetrics())) - require.Empty(t, h.seen(), "empty Metrics must not produce HTTP traffic") -} - -func TestExporter_SendsUserAgent(t *testing.T) { - t.Parallel() - h := &recordingHandler{} - srv := httptest.NewServer(h) - t.Cleanup(srv.Close) - - set := nopSettings() - set.BuildInfo = component.BuildInfo{ - Command: "tracecore", - Description: "tracecore", - Version: "v0.1.0-test", - } - exp, err := otlphttp.NewFactory().CreateMetrics(t.Context(), set, &otlphttp.Config{Endpoint: srv.URL}) - require.NoError(t, err) - 
require.NoError(t, exp.ConsumeMetrics(t.Context(), newMetrics(t))) - - got := h.seen() - require.Len(t, got, 1) - require.Contains(t, got[0].UserAgent, "tracecore/v0.1.0-test") - require.Contains(t, got[0].UserAgent, "(") - require.Contains(t, got[0].UserAgent, "/") // goos/goarch separator -} - -func TestExporter_PropagatesCustomHeaders(t *testing.T) { - t.Parallel() - h := &recordingHandler{} - srv := httptest.NewServer(h) - t.Cleanup(srv.Close) - - exp := mustMetricsExporter(t, &otlphttp.Config{ - Endpoint: srv.URL, - Headers: map[string]string{"X-Custom-Auth": "fake-token-abc"}, - }) - require.NoError(t, exp.ConsumeMetrics(t.Context(), newMetrics(t))) - - got := h.seen() - require.Len(t, got, 1) - require.Equal(t, "fake-token-abc", got[0].Headers.Get("X-Custom-Auth")) -} - -func TestExporter_GzipCompressionRoundTrips(t *testing.T) { - t.Parallel() - h := &recordingHandler{} - srv := httptest.NewServer(h) - t.Cleanup(srv.Close) - - exp := mustMetricsExporter(t, &otlphttp.Config{Endpoint: srv.URL, Compression: "gzip"}) - // Payload must be ≥1024 bytes to clear the gzip-skip threshold. - // Build with many datapoints to exceed it. - require.NoError(t, exp.ConsumeMetrics(t.Context(), newLargeMetrics(t, 64))) - - got := h.seen() - require.Len(t, got, 1) - require.Equal(t, "gzip", got[0].ContentEncoding) - - // The body must be valid gzip and decode to non-empty bytes. - gz, err := gzip.NewReader(bytes.NewReader(got[0].Body)) - require.NoError(t, err) - t.Cleanup(func() { _ = gz.Close() }) - plain, err := io.ReadAll(gz) - require.NoError(t, err) - require.NotEmpty(t, plain) -} - -// TestExporter_GzipSkippedOnSmallPayload pins the v0.1.0 threshold: -// below gzipMinBytes the exporter does NOT compress even when the -// operator opted in. Saves CPU + bytes on single-datapoint emits. 
-func TestExporter_GzipSkippedOnSmallPayload(t *testing.T) { - t.Parallel() - h := &recordingHandler{} - srv := httptest.NewServer(h) - t.Cleanup(srv.Close) - - exp := mustMetricsExporter(t, &otlphttp.Config{Endpoint: srv.URL, Compression: "gzip"}) - require.NoError(t, exp.ConsumeMetrics(t.Context(), newMetrics(t))) - - got := h.seen() - require.Len(t, got, 1) - require.Empty(t, got[0].ContentEncoding, - "single-datapoint payload below the gzip threshold must NOT compress") -} - -func TestExporter_RetriesOn503ThenSucceeds(t *testing.T) { - t.Parallel() - h := &recordingHandler{ - responses: []recordedResponse{ - {Status: http.StatusServiceUnavailable}, - {Status: http.StatusOK}, - }, - } - srv := httptest.NewServer(h) - t.Cleanup(srv.Close) - - exp := mustMetricsExporter(t, &otlphttp.Config{ - Endpoint: srv.URL, - MaxRetries: 2, - InitialBackoff: 10 * time.Millisecond, - }) - require.NoError(t, exp.ConsumeMetrics(t.Context(), newMetrics(t))) - require.Len(t, h.seen(), 2, "must retry once after 503, then succeed") -} - -func TestExporter_DoesNotRetryOnPermanent400(t *testing.T) { - t.Parallel() - h := &recordingHandler{ - responses: []recordedResponse{{Status: http.StatusBadRequest}}, - } - srv := httptest.NewServer(h) - t.Cleanup(srv.Close) - - exp := mustMetricsExporter(t, &otlphttp.Config{ - Endpoint: srv.URL, - MaxRetries: 5, // generous; we expect ZERO retries - InitialBackoff: 1 * time.Millisecond, - }) - err := exp.ConsumeMetrics(t.Context(), newMetrics(t)) - require.ErrorContains(t, err, "permanent status 400") - require.Len(t, h.seen(), 1, "400 is permanent; must not retry") -} - -func TestExporter_DoesNotRetryOn404(t *testing.T) { - t.Parallel() - h := &recordingHandler{ - responses: []recordedResponse{{Status: http.StatusNotFound}}, - } - srv := httptest.NewServer(h) - t.Cleanup(srv.Close) - - exp := mustMetricsExporter(t, &otlphttp.Config{ - Endpoint: srv.URL, - MaxRetries: 5, - InitialBackoff: 1 * time.Millisecond, - }) - err := 
exp.ConsumeMetrics(t.Context(), newMetrics(t)) - require.ErrorContains(t, err, "permanent status 404") - require.Len(t, h.seen(), 1) -} - -func TestExporter_StopsRetryingAtMaxAttempts(t *testing.T) { - t.Parallel() - h := &recordingHandler{ - responses: []recordedResponse{{Status: http.StatusServiceUnavailable}}, - } - srv := httptest.NewServer(h) - t.Cleanup(srv.Close) - - exp := mustMetricsExporter(t, &otlphttp.Config{ - Endpoint: srv.URL, - MaxRetries: 2, - InitialBackoff: 1 * time.Millisecond, - }) - err := exp.ConsumeMetrics(t.Context(), newMetrics(t)) - require.ErrorContains(t, err, "retryable status 503") - require.Len(t, h.seen(), 3, "1 initial + 2 retries = 3 attempts") -} - -func TestExporter_HonorsRetryAfterIntegerSeconds(t *testing.T) { - t.Parallel() - h := &recordingHandler{ - responses: []recordedResponse{ - {Status: http.StatusServiceUnavailable, RetryAfter: "0"}, // 0 = no wait - {Status: http.StatusOK}, - }, - } - srv := httptest.NewServer(h) - t.Cleanup(srv.Close) - - exp := mustMetricsExporter(t, &otlphttp.Config{ - Endpoint: srv.URL, - MaxRetries: 1, - InitialBackoff: 10 * time.Second, // would dominate if Retry-After ignored - }) - - start := time.Now() - require.NoError(t, exp.ConsumeMetrics(t.Context(), newMetrics(t))) - elapsed := time.Since(start) - // Retry-After=0 should produce sub-second total wall-clock; - // if InitialBackoff were used, elapsed >= 10s. Allow generous - // slack on shared CI runners. - require.Less(t, elapsed, 2*time.Second, "Retry-After=0 must short-circuit InitialBackoff") -} - -func TestExporter_RetriesOnNetworkError(t *testing.T) { - t.Parallel() - // Endpoint that resolves but always refuses connection - // (port 1 is reserved + filtered on most hosts). 
- exp := mustMetricsExporter(t, &otlphttp.Config{ - Endpoint: "http://127.0.0.1:1", - MaxRetries: 2, - InitialBackoff: 1 * time.Millisecond, - Timeout: 100 * time.Millisecond, - }) - err := exp.ConsumeMetrics(t.Context(), newMetrics(t)) - require.Error(t, err, "unreachable endpoint must surface error after retries") -} - -func TestExporter_ConcurrentConsumeMetricsIsSafe(t *testing.T) { - t.Parallel() - h := &recordingHandler{} - srv := httptest.NewServer(h) - t.Cleanup(srv.Close) - - exp := mustMetricsExporter(t, &otlphttp.Config{Endpoint: srv.URL}) - - const concurrent = 16 - var wg sync.WaitGroup - for i := 0; i < concurrent; i++ { - wg.Add(1) - go func() { - defer wg.Done() - _ = exp.ConsumeMetrics(t.Context(), newMetrics(t)) - }() - } - wg.Wait() - - require.Len(t, h.seen(), concurrent) -} - -// TestExporter_RetryExhaustionLabelsKindDownstream pins the -// sentinel-error routing: a retry-exhausted 503 must label as -// kindDownstream (server-fault), not kindIO (transport-fault). The -// README's Self-telemetry labels table mandates this. -func TestExporter_RetryExhaustionLabelsKindDownstream(t *testing.T) { - t.Parallel() - h := &recordingHandler{ - responses: []recordedResponse{{Status: http.StatusServiceUnavailable}}, - } - srv := httptest.NewServer(h) - t.Cleanup(srv.Close) - exp := mustMetricsExporter(t, &otlphttp.Config{ - Endpoint: srv.URL, - MaxRetries: 1, - InitialBackoff: 1 * time.Millisecond, - }) - err := exp.ConsumeMetrics(t.Context(), newMetrics(t)) - require.Error(t, err) -} - -// TestExporter_AddInflightBeforeStoppedCheckPinsTOCTOU asserts that -// once Add(1) runs, Shutdown.Wait MUST observe this call. The -// inverted order (stoppedFlag() then Add) opened a window where a -// drained Shutdown returned before the call added itself. 
-// -// We force-execute the post-Add path by Shutdown'ing AFTER one call -// has Add'd; the call must complete its POST (a sink Reachable -// confirms the Add was observed) and the Shutdown drain must wait -// for it. -func TestExporter_AddInflightBeforeStoppedCheckPinsTOCTOU(t *testing.T) { - t.Parallel() - h := &recordingHandler{} - srv := httptest.NewServer(h) - t.Cleanup(srv.Close) - - exp, err := otlphttp.NewFactory().CreateMetrics(t.Context(), nopSettings(), &otlphttp.Config{Endpoint: srv.URL}) - require.NoError(t, err) - - require.NoError(t, exp.ConsumeMetrics(t.Context(), newMetrics(t))) - require.Len(t, h.seen(), 1, "first POST must land") - - // Shutdown synchronously; if a call were currently in flight, the - // drain would wait for it. Here the call already returned, so - // drain completes immediately. The contract this test pins is: - // AFTER Shutdown returns, no further Consume* may POST. - require.NoError(t, exp.Shutdown(t.Context())) - - err = exp.ConsumeMetrics(t.Context(), newMetrics(t)) - require.ErrorIs(t, err, otlphttp.ErrShutdownSentinel(), - "post-Shutdown Consume must return errShutdown") - require.Len(t, h.seen(), 1, "post-Shutdown Consume must not POST") -} - -func TestExporter_RespectsTracesEndpointOverride(t *testing.T) { - t.Parallel() - h := &recordingHandler{} - srv := httptest.NewServer(h) - t.Cleanup(srv.Close) - - exp := mustTracesExporter(t, &otlphttp.Config{ - Endpoint: "http://example-base.invalid", - TracesEndpoint: srv.URL + "/custom/traces/endpoint", - }) - require.NoError(t, exp.ConsumeTraces(t.Context(), newTraces(t))) - - got := h.seen() - require.Len(t, got, 1) - require.Equal(t, "/custom/traces/endpoint", got[0].Path) -} - -func TestExporter_RespectsLogsEndpointOverride(t *testing.T) { - t.Parallel() - h := &recordingHandler{} - srv := httptest.NewServer(h) - t.Cleanup(srv.Close) - - exp := mustLogsExporter(t, &otlphttp.Config{ - Endpoint: "http://example-base.invalid", - LogsEndpoint: srv.URL + 
"/custom/logs/endpoint", - }) - require.NoError(t, exp.ConsumeLogs(t.Context(), newLogs(t))) - - got := h.seen() - require.Len(t, got, 1) - require.Equal(t, "/custom/logs/endpoint", got[0].Path) -} - -func TestExporter_RespectsMetricsEndpointOverride(t *testing.T) { - t.Parallel() - h := &recordingHandler{} - srv := httptest.NewServer(h) - t.Cleanup(srv.Close) - - exp := mustMetricsExporter(t, &otlphttp.Config{ - Endpoint: "http://example-base.invalid", // bogus base - MetricsEndpoint: srv.URL + "/custom/metrics/endpoint", // overrides base - }) - require.NoError(t, exp.ConsumeMetrics(t.Context(), newMetrics(t))) - - got := h.seen() - require.Len(t, got, 1) - require.Equal(t, "/custom/metrics/endpoint", got[0].Path, - "per-signal override must replace the base path") -} - -// TestExporter_BodyIsWellFormedOTLPProto pins the byte-validity claim -// in the README: the bytes we POST are well-formed OTLP protobuf that -// round-trips through pmetric.ProtoUnmarshaler. This proves the bare -// pmetric.ProtoMarshaler produces OTLP wire bytes (not some internal -// representation), without pulling pmetricotlp into go.sum. -func TestExporter_BodyIsWellFormedOTLPProto(t *testing.T) { - t.Parallel() - h := &recordingHandler{} - srv := httptest.NewServer(h) - t.Cleanup(srv.Close) - - exp := mustMetricsExporter(t, &otlphttp.Config{Endpoint: srv.URL}) - original := newMetrics(t) - require.NoError(t, exp.ConsumeMetrics(t.Context(), original)) - - got := h.seen() - require.Len(t, got, 1) - - // Round-trip the wire bytes through the bare pdata unmarshaler. - // If the body is not valid OTLP protobuf, this fails. - round, err := (&pmetric.ProtoUnmarshaler{}).UnmarshalMetrics(got[0].Body) - require.NoError(t, err, "exported bytes must be valid OTLP protobuf") - - // Equality at the OTLP semantic level: same metric count + same - // first-metric name. 
Full deep-equality is brittle across pdata - // internal-representation changes; these two checks anchor the - // claim "the bytes carry the data we sent." - require.Equal(t, original.MetricCount(), round.MetricCount()) - require.Equal(t, - original.ResourceMetrics().At(0).ScopeMetrics().At(0).Metrics().At(0).Name(), - round.ResourceMetrics().At(0).ScopeMetrics().At(0).Metrics().At(0).Name(), - ) -} - -// TestExporter_ContextCancellationDuringRetrySleepStopsTheLoop asserts -// that ctx cancellation mid-retry-sleep aborts the loop within -// "soon" (well under the still-pending backoff window). -func TestExporter_ContextCancellationDuringRetrySleepStopsTheLoop(t *testing.T) { - t.Parallel() - // Server returns 503 forever so the exporter sits in its retry - // backoff between attempts. - h := &recordingHandler{ - responses: []recordedResponse{{Status: http.StatusServiceUnavailable}}, - } - srv := httptest.NewServer(h) - t.Cleanup(srv.Close) - - exp := mustMetricsExporter(t, &otlphttp.Config{ - Endpoint: srv.URL, - MaxRetries: 5, - InitialBackoff: 30 * time.Second, // long enough to dominate test time - }) - - ctx, cancel := context.WithCancel(t.Context()) - done := make(chan error, 1) - go func() { - done <- exp.ConsumeMetrics(ctx, newMetrics(t)) - }() - - // Let one attempt happen, then cancel during the backoff sleep. - require.Eventually(t, func() bool { return len(h.seen()) >= 1 }, - 2*time.Second, 10*time.Millisecond, "first attempt must land") - - start := time.Now() - cancel() - select { - case err := <-done: - require.Error(t, err) - require.ErrorIs(t, err, context.Canceled) - require.Less(t, time.Since(start), 5*time.Second, - "ctx cancel must abort the retry sleep, not wait it out") - case <-time.After(5 * time.Second): - t.Fatal("ConsumeMetrics did not return after ctx cancel") - } -} - -// TestExporter_NoGoroutineLeakAcrossManyCalls verifies the exporter -// does not spawn untracked goroutines per ConsumeMetrics call. 
We -// compare runtime.NumGoroutine before and after; net/http.Transport's -// idle-connection goroutines are part of the noise floor, so we -// allow a small static slack. -func TestExporter_NoGoroutineLeakAcrossManyCalls(t *testing.T) { - // Not t.Parallel; counts goroutines and other tests would skew it. - h := &recordingHandler{} - srv := httptest.NewServer(h) - t.Cleanup(srv.Close) - - exp := mustMetricsExporter(t, &otlphttp.Config{Endpoint: srv.URL}) - - // Warm the transport: first calls spawn the idle-conn goroutines. - for i := 0; i < 4; i++ { - require.NoError(t, exp.ConsumeMetrics(t.Context(), newMetrics(t))) - } - runtime.GC() - time.Sleep(50 * time.Millisecond) // let any pending close goroutines drain - before := runtime.NumGoroutine() - - for i := 0; i < 50; i++ { - require.NoError(t, exp.ConsumeMetrics(t.Context(), newMetrics(t))) - } - runtime.GC() - time.Sleep(200 * time.Millisecond) - after := runtime.NumGoroutine() - - // Allow up to 4 idle-conn goroutines worth of variance; anything - // beyond is a leak. The exporter itself spawns zero per call. - require.LessOrEqualf(t, after-before, 4, - "goroutine count grew by %d across 50 calls (before=%d after=%d); leak", - after-before, before, after) -} - -// TestExporter_RetriesOnAllOTLPSpecRetryableStatusCodes pins the -// retryable-code allow-list against the OTLP/HTTP spec (429, 502, -// 503, 504). The exporter MUST retry on each; any other 4xx/5xx is -// permanent. 
-func TestExporter_RetriesOnAllOTLPSpecRetryableStatusCodes(t *testing.T) { - t.Parallel() - for _, code := range []int{ - http.StatusTooManyRequests, // 429 - http.StatusBadGateway, // 502 - http.StatusServiceUnavailable, // 503 - http.StatusGatewayTimeout, // 504 - } { - code := code - t.Run(strconv.Itoa(code), func(t *testing.T) { - t.Parallel() - h := &recordingHandler{responses: []recordedResponse{ - {Status: code}, - {Status: http.StatusOK}, - }} - srv := httptest.NewServer(h) - t.Cleanup(srv.Close) - exp := mustMetricsExporter(t, &otlphttp.Config{ - Endpoint: srv.URL, - MaxRetries: 2, - InitialBackoff: 1 * time.Millisecond, - }) - require.NoError(t, exp.ConsumeMetrics(t.Context(), newMetrics(t))) - require.Len(t, h.seen(), 2, - "%d must be retryable; expected 1 retry then 200", code) - }) - } -} - -func TestExporter_HonorsRetryAfterHTTPDate(t *testing.T) { - t.Parallel() - // RFC 1123 date ~2s in the future, so the "elapsed" math is - // bounded but non-zero. - future := time.Now().Add(2 * time.Second).UTC().Format(http.TimeFormat) - h := &recordingHandler{responses: []recordedResponse{ - {Status: http.StatusServiceUnavailable, RetryAfter: future}, - {Status: http.StatusOK}, - }} - srv := httptest.NewServer(h) - t.Cleanup(srv.Close) - exp := mustMetricsExporter(t, &otlphttp.Config{ - Endpoint: srv.URL, - MaxRetries: 2, - InitialBackoff: 30 * time.Second, // would dominate if RFC 1123 ignored - }) - start := time.Now() - require.NoError(t, exp.ConsumeMetrics(t.Context(), newMetrics(t))) - elapsed := time.Since(start) - // Server's "Retry-After" is ~2s; exponential backoff would be 30s. - // Assert elapsed is far below the 30s alternative. 
- require.Less(t, elapsed, 10*time.Second, - "RFC 1123 date must be honored over the InitialBackoff fallback") -} - -// TestExporter_ShutdownAbortsInFlightRetrySleep pins the v1.1 -// shutdown contract: calling Shutdown while ConsumeMetrics is in a -// retry-sleep window unblocks the call within milliseconds, returning -// errShutdown. Prevents a multi-second shutdown-hang on retryable -// failures. -func TestExporter_ShutdownAbortsInFlightRetrySleep(t *testing.T) { - t.Parallel() - h := &recordingHandler{responses: []recordedResponse{ - {Status: http.StatusServiceUnavailable}, - }} - srv := httptest.NewServer(h) - t.Cleanup(srv.Close) - - exp, err := otlphttp.NewFactory().CreateMetrics(t.Context(), nopSettings(), &otlphttp.Config{ - Endpoint: srv.URL, - MaxRetries: 5, - InitialBackoff: 30 * time.Second, // long enough to dominate - }) - require.NoError(t, err) - - done := make(chan error, 1) - go func() { - done <- exp.ConsumeMetrics(t.Context(), newMetrics(t)) - }() - // Let one attempt land so we know we're inside the backoff sleep. - require.Eventually(t, func() bool { return len(h.seen()) >= 1 }, - 2*time.Second, 10*time.Millisecond) - - start := time.Now() - require.NoError(t, exp.Shutdown(t.Context())) - select { - case err := <-done: - require.Error(t, err) - require.Less(t, time.Since(start), 5*time.Second, - "Shutdown must abort retry-sleep, not wait it out") - case <-time.After(5 * time.Second): - t.Fatal("ConsumeMetrics did not return after Shutdown") - } -} - -// TestExporter_ConsumeAfterShutdownReturnsErrShutdown asserts that -// Consume* short-circuits with a clean error after Shutdown; no -// ghost POSTs to the backend. 
-func TestExporter_ConsumeAfterShutdownReturnsErrShutdown(t *testing.T) { - t.Parallel() - h := &recordingHandler{} - srv := httptest.NewServer(h) - t.Cleanup(srv.Close) - - exp, err := otlphttp.NewFactory().CreateMetrics(t.Context(), nopSettings(), &otlphttp.Config{Endpoint: srv.URL}) - require.NoError(t, err) - require.NoError(t, exp.Shutdown(t.Context())) - - err = exp.ConsumeMetrics(t.Context(), newMetrics(t)) - require.Error(t, err) - require.Empty(t, h.seen(), - "ConsumeMetrics after Shutdown must not POST anything") -} - -func TestConfig_Validate_RejectsEmptyHeaderKey(t *testing.T) { - t.Parallel() - for _, key := range []string{"", " ", "\t"} { - t.Run(fmt.Sprintf("key=%q", key), func(t *testing.T) { - cfg := &otlphttp.Config{ - Endpoint: "http://localhost:4318", - Headers: map[string]string{key: "value"}, - } - require.ErrorContains(t, cfg.Validate(), "header") - }) - } -} - -// Asserts signal arrives in structured log output as its name -// ("metrics"/"traces"/"logs") rather than the underlying int. Without -// the Stringer method on signal, zap's encoder would render the typed- -// int via reflection in a form that exposes the kind id, defeating an -// operator's "signal=logs" grep. Covers all three factory paths since -// each constructs its own signal value. zaptest/observer reads the -// structured Field directly so we don't depend on text-handler -// formatting. 
-func TestExporter_LogsRenderSignalAsName(t *testing.T) { - t.Parallel() - type factoryCase struct { - name string - make func(ctx context.Context, set exporter.Settings, cfg *otlphttp.Config) error - } - cases := []factoryCase{ - {"metrics", func(ctx context.Context, set exporter.Settings, cfg *otlphttp.Config) error { - _, err := otlphttp.NewFactory().CreateMetrics(ctx, set, cfg) - return err //nolint:wrapcheck // test helper; raw err is what we assert on - }}, - {"traces", func(ctx context.Context, set exporter.Settings, cfg *otlphttp.Config) error { - _, err := otlphttp.NewFactory().CreateTraces(ctx, set, cfg) - return err //nolint:wrapcheck // test helper; raw err is what we assert on - }}, - {"logs", func(ctx context.Context, set exporter.Settings, cfg *otlphttp.Config) error { - _, err := otlphttp.NewFactory().CreateLogs(ctx, set, cfg) - return err //nolint:wrapcheck // test helper; raw err is what we assert on - }}, - } - for _, tc := range cases { - t.Run(tc.name, func(t *testing.T) { - t.Parallel() - core, obs := observer.New(zapcore.DebugLevel) - set := nopSettings() - set.Logger = zap.New(core) - - cfg := &otlphttp.Config{ - Endpoint: "https://example.invalid", - InsecureSkipVerify: true, - } - require.NoError(t, tc.make(t.Context(), set, cfg)) - - // At least one log entry must carry a signal field whose - // Stringer-rendered value equals the case name. Falsifies a - // regression where signal is logged as zap.Int(...) and - // arrives as 0/1/2. zap.Stringer fields store the value in - // Interface; we cast to fmt.Stringer to honor the contract - // without depending on encoder-specific text. 
- found := false - for _, e := range obs.All() { - for _, f := range e.Context { - if f.Key != "signal" { - continue - } - if s, ok := f.Interface.(fmt.Stringer); ok && s.String() == tc.name { - found = true - } - } - } - require.True(t, found, "signal field must render as %q via Stringer; entries=%d", tc.name, obs.Len()) - }) - } -} - -// Lower-case operator-supplied key must still arrive under the -// canonical MIME form; falsifies a regression where the -// construction-time canonicalization silently drops keys. -func TestExporter_AppliesNonCanonicalHeaderKey(t *testing.T) { - t.Parallel() - h := &recordingHandler{} - srv := httptest.NewServer(h) - t.Cleanup(srv.Close) - - exp := mustMetricsExporter(t, &otlphttp.Config{ - Endpoint: srv.URL, - Headers: map[string]string{"x-custom-auth": "fake-token-abc"}, - }) - require.NoError(t, exp.ConsumeMetrics(t.Context(), newMetrics(t))) - - got := h.seen() - require.Len(t, got, 1) - require.Equal(t, "fake-token-abc", got[0].Headers.Get("X-Custom-Auth"), - "non-canonical key must still resolve to the canonical wire form") -} - -// TestExporter_RetriesWhenServerHangsPastClientTimeout exercises the -// timeout-error retry path. Server intentionally blocks longer than -// the client's Timeout so the request fails with a net/http timeout -// error (not a 5xx status). The exporter must classify this as a -// retryable network error. -func TestExporter_RetriesWhenServerHangsPastClientTimeout(t *testing.T) { - t.Parallel() - var hits atomic.Int32 - h := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) { - n := hits.Add(1) - if n == 1 { - // First call hangs past the client timeout. 
- time.Sleep(500 * time.Millisecond) - return - } - w.WriteHeader(http.StatusOK) - }) - srv := httptest.NewServer(h) - t.Cleanup(srv.Close) - - exp := mustMetricsExporter(t, &otlphttp.Config{ - Endpoint: srv.URL, - Timeout: 100 * time.Millisecond, // shorter than the server hang - MaxRetries: 2, - InitialBackoff: 10 * time.Millisecond, - }) - // `hits.Load() >= 2` would be flaky: the first handler's - // `time.Sleep` can still be running past ConsumeMetrics return. - // require.NoError alone is sufficient — if the network-error - // retry path were broken, the first attempt's timeout error - // would surface and require.NoError would fail. - require.NoError(t, exp.ConsumeMetrics(t.Context(), newMetrics(t)), - "timeout on first attempt must retry and succeed on second") -} diff --git a/components/exporters/otlphttp/parseretry_internal_test.go b/components/exporters/otlphttp/parseretry_internal_test.go deleted file mode 100644 index 953c7b90..00000000 --- a/components/exporters/otlphttp/parseretry_internal_test.go +++ /dev/null @@ -1,91 +0,0 @@ -// SPDX-License-Identifier: Apache-2.0 - -// Internal-package tests for parseRetryAfter so the cases that need -// to assert specific delay values (clamping, RFC 1123 parse, trailing -// whitespace) don't have to wall-clock-wait through retry-sleep -// windows in the external integration suite. 
-package otlphttp
-
-import (
-	"net/http"
-	"testing"
-	"time"
-
-	"github.com/stretchr/testify/require"
-)
-
-func TestParseRetryAfter_AbsentHeaderReturnsUnset(t *testing.T) {
-	t.Parallel()
-	hint := parseRetryAfter("", time.Now())
-	require.False(t, hint.set)
-	require.Zero(t, hint.delay)
-}
-
-func TestParseRetryAfter_IntegerSeconds(t *testing.T) {
-	t.Parallel()
-	hint := parseRetryAfter("5", time.Now())
-	require.True(t, hint.set)
-	require.Equal(t, 5*time.Second, hint.delay)
-}
-
-func TestParseRetryAfter_NegativeIntegerClampsToZero(t *testing.T) {
-	t.Parallel()
-	hint := parseRetryAfter("-3", time.Now())
-	require.True(t, hint.set,
-		"a parseable but negative value is still 'set'; short-circuits backoff")
-	require.Zero(t, hint.delay)
-}
-
-func TestParseRetryAfter_ClampsLargeIntegerTo60s(t *testing.T) {
-	t.Parallel()
-	hint := parseRetryAfter("86400", time.Now()) // 24 hours
-	require.True(t, hint.set)
-	require.Equal(t, maxRetryAfter, hint.delay,
-		"24h Retry-After must be clamped to maxRetryAfter (reviewer V23)")
-}
-
-func TestParseRetryAfter_RFC1123DateInFuture(t *testing.T) {
-	t.Parallel()
-	now := time.Date(2026, 5, 19, 12, 0, 0, 0, time.UTC)
-	future := now.Add(7 * time.Second).Format(http.TimeFormat)
-	hint := parseRetryAfter(future, now)
-	require.True(t, hint.set)
-	// Allow 1s slack for date-format rounding inside http.ParseTime.
-	require.InDelta(t, (7 * time.Second).Seconds(), hint.delay.Seconds(), 1.0)
-}
-
-func TestParseRetryAfter_RFC1123DateInPastClampsToZero(t *testing.T) {
-	t.Parallel()
-	now := time.Date(2026, 5, 19, 12, 0, 0, 0, time.UTC)
-	past := now.Add(-10 * time.Second).Format(http.TimeFormat)
-	hint := parseRetryAfter(past, now)
-	require.True(t, hint.set)
-	require.Zero(t, hint.delay)
-}
-
-func TestParseRetryAfter_RFC1123DateBeyondCapClamps(t *testing.T) {
-	t.Parallel()
-	now := time.Date(2026, 5, 19, 12, 0, 0, 0, time.UTC)
-	far := now.Add(2 * time.Hour).Format(http.TimeFormat)
-	hint := parseRetryAfter(far, now)
-	require.True(t, hint.set)
-	require.Equal(t, maxRetryAfter, hint.delay)
-}
-
-func TestParseRetryAfter_TrailingWhitespaceIsStripped(t *testing.T) {
-	t.Parallel()
-	// Reviewer V27: Envoy + HAProxy occasionally emit "5 " or "5\t".
-	for _, raw := range []string{" 5", "5 ", " 5 ", "\t5\t"} {
-		hint := parseRetryAfter(raw, time.Now())
-		require.Truef(t, hint.set, "value %q must parse", raw)
-		require.Equal(t, 5*time.Second, hint.delay)
-	}
-}
-
-func TestParseRetryAfter_GarbageReturnsUnset(t *testing.T) {
-	t.Parallel()
-	for _, raw := range []string{"abc", "5.5", "5seconds"} {
-		hint := parseRetryAfter(raw, time.Now())
-		require.False(t, hint.set, "unparseable value %q must fall through", raw)
-	}
-}
diff --git a/components/exporters/otlphttp/selftel.go b/components/exporters/otlphttp/selftel.go
deleted file mode 100644
index 817a1076..00000000
--- a/components/exporters/otlphttp/selftel.go
+++ /dev/null
@@ -1,117 +0,0 @@
-// SPDX-License-Identifier: Apache-2.0
-
-// Exporter-scoped self-telemetry surface. Thin wrapper over
-// module/pkg/selftel that pins this exporter's scope-name + instrument
-// name + the kind enum. Metric names follow the upstream OTel collector
-// `otelcol___` convention per RFC-0013
-// §migration v0.1.0 namespace alignment:
-// `otelcol.exporter.otlphttp.calls_total{result,kind,component_id}`
-// (the Prometheus exporter renders the dots as underscores). Label
-// shape is preserved (`component_id`) so multi-instance disambiguation
-// in dashboards is unchanged from v0.1.x. The instrumentation scope
-// name is THIS exporter's Go import path.
-//
-// Mirrors `components/exporters/stdoutexporter/selftel.go`. Shared
-// plumbing (the OTel counter, the noop fallback, the init-error
-// fallback counter) lives in module/pkg/selftel.
-
-package otlphttp
-
-import (
-	"context"
-
-	"go.opentelemetry.io/collector/component"
-	"go.opentelemetry.io/otel/metric"
-
-	"github.com/tracecoreai/tracecore/module/pkg/selftel"
-)
-
-// kind is a low-cardinality error-class identifier for exporter failures.
-// Exporter-local so the wire-format strings stay owned by the package
-// that emits them; the canonical-Kind enforcement the deleted
-// internal/selftelemetry package owned moves into RFC-0013's submodule.
-type kind string
-
-const (
-	// kindMarshal — pdata serialization or gzip compression failure.
-	kindMarshal kind = "marshal"
-
-	// kindIO — network error, timeout, context cancellation, or
-	// shutdown abort. Transport-fault class.
-	kindIO kind = "io"
-
-	// kindDownstream — server returned a permanent 4xx (not 429) or
-	// non-retryable 5xx OR a retryable 5xx that exhausted retries.
-	// Server-fault class — mirrors the v0.1.x selftelemetry.KindDownstream
-	// value the otlphttp package previously aliased.
-	kindDownstream kind = "downstream"
-)
-
-// instrumentationScope pins the OTel scope name. Per OTel convention,
-// the scope is the package's Go import path.
-const instrumentationScope = "github.com/tracecoreai/tracecore/components/exporters/otlphttp"
-
-// callsTotalName is the operator-facing metric name for this exporter's
-// per-call counter. Kept here (not in module/pkg/selftel) so the
-// shared package stays unaware of caller-specific name choices.
-const callsTotalName = "otelcol.exporter.otlphttp.calls_total"
-
-// reasonInstrumentRegister is the wire-format label value for
-// init_errors_total ticks when OTel instrument registration failed at
-// construction time. Re-exported from the shared package so this
-// package's factory + tests don't import selftel just for the const.
-const reasonInstrumentRegister = selftel.ReasonInstrumentRegister
-
-// errNilMeterProvider is the sentinel returned by newSelfExporter when
-// called with a nil MeterProvider. Aliased to the shared sentinel so
-// the factory's errors.Is check survives the migration.
-var errNilMeterProvider = selftel.ErrNilMeterProvider
-
-// selfExporter is the exporter-scoped self-health surface used by
-// otlphttp hot paths. Mirrors selftel.Exporter but carries the
-// package-local `kind` type so call sites stay type-checked.
-type selfExporter interface {
-	IncCallSuccess()
-	IncCallFailure(k kind)
-}
-
-// noopSelfExporter discards every call.
-type noopSelfExporter struct{}
-
-func newNoopSelfExporter() selfExporter { return noopSelfExporter{} }
-
-func (noopSelfExporter) IncCallSuccess()     {}
-func (noopSelfExporter) IncCallFailure(kind) {}
-
-var _ selfExporter = noopSelfExporter{}
-
-// newSelfExporter returns a real selfExporter backed by the shared
-// selftel.Exporter wired at this package's scope + calls_total name.
-// Returns errNilMeterProvider (== selftel.ErrNilMeterProvider) when mp
-// is nil; the factory is responsible for the noop fallback + the
-// init_errors_total tick via recordInitError.
-func newSelfExporter(id component.ID, mp metric.MeterProvider) (selfExporter, error) {
-	inner, err := selftel.NewExporter(id.String(), instrumentationScope, callsTotalName, mp)
-	if err != nil {
-		return nil, err
-	}
-	return &selfExporterImpl{inner: inner}, nil
-}
-
-// selfExporterImpl casts the package-local `kind` to string at the
-// shared-package seam. Zero-cost — the cast is a compile-time op.
-type selfExporterImpl struct {
-	inner selftel.Exporter
-}
-
-var _ selfExporter = (*selfExporterImpl)(nil)
-
-func (e *selfExporterImpl) IncCallSuccess()       { e.inner.IncCallSuccess() }
-func (e *selfExporterImpl) IncCallFailure(k kind) { e.inner.IncCallFailure(string(k)) }
-
-// recordInitError forwards to the shared selftel.RecordInitError with
-// this package's scope. Kept as a thin wrapper so the factory's call
-// site stays identical to the pre-refactor shape.
-func recordInitError(ctx context.Context, mp metric.MeterProvider, kindLabel, componentID, reason string) {
-	selftel.RecordInitError(ctx, mp, instrumentationScope, kindLabel, componentID, reason)
-}
diff --git a/components/exporters/otlphttp/selftel_test.go b/components/exporters/otlphttp/selftel_test.go
deleted file mode 100644
index c592f331..00000000
--- a/components/exporters/otlphttp/selftel_test.go
+++ /dev/null
@@ -1,308 +0,0 @@
-// SPDX-License-Identifier: Apache-2.0
-
-package otlphttp
-
-import (
-	"context"
-	"errors"
-	"testing"
-
-	"go.opentelemetry.io/collector/component"
-	"go.opentelemetry.io/collector/exporter"
-	"go.opentelemetry.io/collector/exporter/exportertest"
-	"go.opentelemetry.io/otel/sdk/metric/metricdata"
-
-	selftelutil "github.com/tracecoreai/tracecore/module/pkg/testutil/selftel"
-)
-
-// TestOtlphttp_NoopAlwaysSafe pins: newNoopSelfExporter returns a
-// value whose hot-path methods never panic and silently discard. Every
-// Consume* path calls into the selfExporter surface; nil-checks at each
-// call site are forbidden, so the noop must be a real value.
-func TestOtlphttp_NoopAlwaysSafe(t *testing.T) {
-	se := newNoopSelfExporter()
-	defer func() {
-		if r := recover(); r != nil {
-			t.Fatalf("noop panicked: %v", r)
-		}
-	}()
-	se.IncCallSuccess()
-	se.IncCallSuccess()
-	se.IncCallFailure(kindMarshal)
-	se.IncCallFailure(kindIO)
-	se.IncCallFailure(kindDownstream)
-}
-
-// TestOtlphttp_NewExporter_NilProviderErrors pins: newSelfExporter
-// returns errNilMeterProvider when called with a nil provider rather than
-// silently substituting noop — the factory is responsible for the fallback
-// + the recordInitError tick. Mirrors the stdoutexporter sibling contract.
-func TestOtlphttp_NewExporter_NilProviderErrors(t *testing.T) {
-	_, err := newSelfExporter(testID(), nil)
-	if !errors.Is(err, errNilMeterProvider) {
-		t.Fatalf("err = %v, want errNilMeterProvider", err)
-	}
-}
-
-// TestOtlphttp_EmitsCallsTotal_WithResultKindAndComponentID pins the
-// M2 metric contract for exporters. After IncCallSuccess() ×2 +
-// IncCallFailure(kindMarshal) ×1 + IncCallFailure(kindIO) ×1 +
-// IncCallFailure(kindDownstream) ×1, the ManualReader collects
-// otelcol.exporter.otlphttp.calls_total with datapoints partitioned by result and
-// (for failures) kind, labeled with the component_id. A regression that
-// drops the kind label, the component_id label, the result label, or the
-// metric-name prefix fails here.
-func TestOtlphttp_EmitsCallsTotal_WithResultKindAndComponentID(t *testing.T) {
-	mp, rdr := selftelutil.NewTestMeterProvider(t)
-	se, err := newSelfExporter(testID(), mp)
-	if err != nil {
-		t.Fatalf("newSelfExporter: %v", err)
-	}
-	se.IncCallSuccess()
-	se.IncCallSuccess()
-	se.IncCallFailure(kindMarshal)
-	se.IncCallFailure(kindIO)
-	se.IncCallFailure(kindDownstream)
-
-	rm := selftelutil.CollectRM(t, rdr)
-	m, ok := selftelutil.FindInstrument(rm, "otelcol.exporter.otlphttp.calls_total")
-	if !ok {
-		t.Fatalf("metric otelcol.exporter.otlphttp.calls_total absent; have: %s", selftelutil.DumpNames(rm))
-	}
-	sum, ok := m.Data.(metricdata.Sum[int64])
-	if !ok {
-		t.Fatalf("calls_total data shape: got %T, want metricdata.Sum[int64]", m.Data)
-	}
-
-	// Index datapoints by a (result,kind) key once; per-case assertions
-	// below stay readable + the function stays under gocyclo's 15-edge
-	// budget. component_id is asserted per-datapoint inside the loop.
-	got := map[string]int{}
-	for _, dp := range sum.DataPoints {
-		if !selftelutil.KVMatch(dp, map[string]string{"component_id": "otlphttp/test"}) {
-			t.Errorf("datapoint missing component_id=otlphttp/test: %v", dp.Attributes)
-			continue
-		}
-		result, _ := dp.Attributes.Value("result")
-		kindVal, _ := dp.Attributes.Value("kind")
-		key := result.AsString() + "/" + kindVal.AsString()
-		got[key] += int(dp.Value)
-	}
-	wantCounts := map[string]int{
-		"success/":           2,
-		"failure/marshal":    1,
-		"failure/io":         1,
-		"failure/downstream": 1,
-	}
-	for key, want := range wantCounts {
-		if got[key] != want {
-			t.Errorf("calls_total[%q]: got %d, want %d (all=%v)", key, got[key], want, got)
-		}
-	}
-}
-
-// TestOtlphttp_ScopeNameIsExporterImportPath pins the OTel scope-name
-// standard: instrumentation scope = exporter's Go import path. This anchors
-// the PR-B1-style decision (vs reusing the deleted internal/selftelemetry
-// scope) so a future drift back to the internal name fails here.
-func TestOtlphttp_ScopeNameIsExporterImportPath(t *testing.T) {
-	mp, rdr := selftelutil.NewTestMeterProvider(t)
-	se, err := newSelfExporter(testID(), mp)
-	if err != nil {
-		t.Fatalf("newSelfExporter: %v", err)
-	}
-	se.IncCallSuccess()
-	rm := selftelutil.CollectRM(t, rdr)
-	scope, ok := selftelutil.ScopeOf(rm, "otelcol.exporter.otlphttp.calls_total")
-	if !ok {
-		t.Fatalf("calls_total absent")
-	}
-	const wantScope = "github.com/tracecoreai/tracecore/components/exporters/otlphttp"
-	if scope != wantScope {
-		t.Errorf("instrumentation scope: got %q, want %q", scope, wantScope)
-	}
-}
-
-// TestOtlphttp_RecordInitError_TicksInitErrorsCounter pins: when factory wiring
-// fails (newSelfExporter returns an error), recordInitError surfaces a
-// otelcol.selftelemetry.init_errors_total tick with kind="exporter",
-// the component_id label, and reason="instrument_register". This is the
-// only signal that an exporter fell back to noop telemetry; dropping the
-// recordInitError call must fail this test.
-func TestOtlphttp_RecordInitError_TicksInitErrorsCounter(t *testing.T) {
-	mp, rdr := selftelutil.NewTestMeterProvider(t)
-	recordInitError(context.Background(), mp, "exporter", testID().String(), reasonInstrumentRegister)
-
-	rm := selftelutil.CollectRM(t, rdr)
-	m, ok := selftelutil.FindInstrument(rm, "otelcol.selftelemetry.init_errors_total")
-	if !ok {
-		t.Fatalf("init_errors_total absent; have: %s", selftelutil.DumpNames(rm))
-	}
-	sum, ok := m.Data.(metricdata.Sum[int64])
-	if !ok {
-		t.Fatalf("init_errors_total data shape: got %T, want metricdata.Sum[int64]", m.Data)
-	}
-	if len(sum.DataPoints) != 1 {
-		t.Fatalf("init_errors datapoints: got %d, want 1", len(sum.DataPoints))
-	}
-	dp := sum.DataPoints[0]
-	want := map[string]string{
-		"kind":         "exporter",
-		"component_id": "otlphttp/test",
-		"reason":       reasonInstrumentRegister,
-	}
-	if !selftelutil.KVMatch(dp, want) {
-		t.Errorf("init_errors attrs: got %v, want %v", dp.Attributes, want)
-	}
-	if dp.Value != 1 {
-		t.Errorf("init_errors value: got %d, want 1", dp.Value)
-	}
-}
-
-// TestOtlphttp_RecordInitError_NilProviderIsSafe pins: a nil MeterProvider must
-// not panic — recordInitError IS the fallback path; crashing here would
-// turn a partial degradation into a process kill.
-func TestOtlphttp_RecordInitError_NilProviderIsSafe(t *testing.T) {
-	defer func() {
-		if r := recover(); r != nil {
-			t.Fatalf("recordInitError(nil) panicked: %v", r)
-		}
-	}()
-	recordInitError(context.Background(), nil, "exporter", "x/y", reasonInstrumentRegister)
-}
-
-// TestOtlphttp_FallsBackToNoopWhenMeterFails pins the factory
-// observability contract end-to-end: when newSelfExporter returns an
-// error (synthetic register failure for every otelcol.exporter.otlphttp.*
-// instrument), the factory MUST (1) leave the exporter with a working
-// noop telemetry field (no nil, no panic on hot-path calls), AND (2)
-// tick otelcol.selftelemetry.init_errors_total via recordInitError.
-// Mirrors the stdoutexporter sibling test seam.
-func TestOtlphttp_FallsBackToNoopWhenMeterFails(t *testing.T) {
-	mp, rdr := selftelutil.NewTestMeterProvider(t)
-	failing := selftelutil.NewFailingMeterProvider(mp, "otelcol.exporter.otlphttp.")
-
-	set := testSettings()
-	set.MeterProvider = failing
-	cfg := &Config{Endpoint: "http://127.0.0.1:1"}
-	e, err := createMetrics(context.Background(), set, cfg)
-	if err != nil {
-		t.Fatalf("createMetrics: %v", err)
-	}
-	exp, ok := e.(*otlpExporter)
-	if !ok {
-		t.Fatalf("exporter type: got %T, want *otlpExporter", e)
-	}
-	if exp.telemetry == nil {
-		t.Fatal("telemetry field nil after failed wiring; must fall back to noop")
-	}
-	// Hot-path call must not panic + must not surface (noop discards).
-	exp.telemetry.IncCallSuccess()
-	exp.telemetry.IncCallFailure(kindIO)
-
-	rm := selftelutil.CollectRM(t, rdr)
-	if m, ok := selftelutil.FindInstrument(rm, "otelcol.exporter.otlphttp.calls_total"); ok {
-		if sum, ok := m.Data.(metricdata.Sum[int64]); ok && len(sum.DataPoints) > 0 {
-			t.Errorf("noop fallback leaked Inc* into calls_total datapoints: %v", sum.DataPoints)
-		}
-	}
-	m, ok := selftelutil.FindInstrument(rm, "otelcol.selftelemetry.init_errors_total")
-	if !ok {
-		t.Fatalf("init_errors_total absent after factory fallback; have: %s", selftelutil.DumpNames(rm))
-	}
-	sum, ok := m.Data.(metricdata.Sum[int64])
-	if !ok {
-		t.Fatalf("init_errors_total data shape: got %T", m.Data)
-	}
-	if len(sum.DataPoints) != 1 || sum.DataPoints[0].Value != 1 {
-		t.Errorf("init_errors_total: want 1 datapoint value=1, got %v", sum.DataPoints)
-	}
-}
-
-// TestOtlphttp_FallsBackToNoopWhenMeterProviderIsNil pins the
-// nil-MeterProvider symmetry of the register-failure fallback: when
-// `set.Telemetry.MeterProvider` is nil at construction (no telemetry
-// wired), the factory MUST (1) return without error, (2) leave the
-// exporter with a working noop telemetry field (no nil, no panic on
-// hot-path calls), and (3) emit no datapoints anywhere — there's no
-// MeterProvider to scrape. The skip-tick semantic for the nil path is
-// intentional: `recordInitError` is only meaningful when telemetry is
-// wired but instrument registration failed; a nil provider means the
-// operator opted out of telemetry entirely, so a phantom counter would
-// be noise. Mirrors `TestOtlphttp_FallsBackToNoopWhenMeterFails` minus
-// the failing wrapper.
-func TestOtlphttp_FallsBackToNoopWhenMeterProviderIsNil(t *testing.T) {
-	set := testSettings()
-	set.MeterProvider = nil
-	cfg := &Config{Endpoint: "http://127.0.0.1:1"}
-
-	e, err := createMetrics(context.Background(), set, cfg)
-	if err != nil {
-		t.Fatalf("createMetrics: %v", err)
-	}
-	exp, ok := e.(*otlpExporter)
-	if !ok {
-		t.Fatalf("exporter type: got %T, want *otlpExporter", e)
-	}
-	if exp.telemetry == nil {
-		t.Fatal("telemetry field nil after nil-MeterProvider construction; must fall back to noop")
-	}
-	// Hot-path calls must not panic. No MeterProvider exists, so there
-	// is also nothing to scrape — the noop discards by definition.
-	defer func() {
-		if r := recover(); r != nil {
-			t.Fatalf("noop telemetry panicked on hot path: %v", r)
-		}
-	}()
-	exp.telemetry.IncCallSuccess()
-	exp.telemetry.IncCallFailure(kindIO)
-}
-
-// asSelfExporter is a compile-time pin: it accepts the package-local
-// selfExporter interface only. If a future refactor moves the type
-// back into internal/selftelemetry (e.g. reintroduces the
-// selftelemetry.Exporter alias), this function's signature breaks +
-// every caller fails compile. Pairs with the kind-value asserts below
-// to pin the sibling-types contract that PR-B1 established.
-func asSelfExporter(se selfExporter) selfExporter { return se }
-
-// TestOtlphttp_SiblingTypesArePackageLocal pins the PR-B1 sibling
-// contract: the otlphttp package owns its own selfExporter + kind
-// types — they must NOT come from internal/selftelemetry. If a future
-// refactor reintroduces that import, the asSelfExporter signature
-// changes type → break compile here. The kind-value asserts pin the
-// wire-format strings ("marshal", "io", "downstream") that operators
-// query — the README's "Self-telemetry labels" table mandates exactly
-// these three values for the otlphttp exporter.
-func TestOtlphttp_SiblingTypesArePackageLocal(t *testing.T) {
-	iface := asSelfExporter(newNoopSelfExporter())
-	iface.IncCallSuccess()
-	iface.IncCallFailure(kindMarshal)
-	iface.IncCallFailure(kindDownstream)
-
-	if string(kindMarshal) != "marshal" {
-		t.Errorf("kindMarshal: got %q, want %q", string(kindMarshal), "marshal")
-	}
-	if string(kindIO) != "io" {
-		t.Errorf("kindIO: got %q, want %q", string(kindIO), "io")
-	}
-	if string(kindDownstream) != "downstream" {
-		t.Errorf("kindDownstream: got %q, want %q", string(kindDownstream), "downstream")
-	}
-}
-
-// testSettings returns exporter.Settings sourced from exportertest's
-// upstream nop helper (so BuildInfo + TelemetrySettings track upstream
-// without manual updates), with the ID overridden to a stable
-// "otlphttp/test" — selftel assertions assert that literal label, and
-// a per-run UUID would defeat them. The wrapper lives here, not in the
-// production package, so production code never grows a test-only surface.
-func testSettings() exporter.Settings {
-	set := exportertest.NewNopSettings(componentType())
-	set.ID = component.NewIDWithName(componentType(), "test")
-	return set
-}
-
-func testID() component.ID {
-	return component.NewIDWithName(componentType(), "test")
-}
diff --git a/components/exporters/otlphttp/signal_internal_test.go b/components/exporters/otlphttp/signal_internal_test.go
deleted file mode 100644
index 0fb27e22..00000000
--- a/components/exporters/otlphttp/signal_internal_test.go
+++ /dev/null
@@ -1,26 +0,0 @@
-// SPDX-License-Identifier: Apache-2.0
-
-package otlphttp
-
-import (
-	"testing"
-
-	"github.com/stretchr/testify/require"
-)
-
-func TestSignal_StringMatchesOTLPPathSuffix(t *testing.T) {
-	t.Parallel()
-	cases := map[signal]string{
-		signalMetrics: "metrics",
-		signalTraces:  "traces",
-		signalLogs:    "logs",
-	}
-	for sig, want := range cases {
-		require.Equal(t, want, sig.String())
-	}
-}
-
-func TestSignal_StringUnknownIsLabeled(t *testing.T) {
-	t.Parallel()
-	require.Equal(t, "unknown(42)", signal(42).String())
-}
diff --git a/components/exporters/stdoutexporter/README.md b/components/exporters/stdoutexporter/README.md
deleted file mode 100644
index da0e84ae..00000000
--- a/components/exporters/stdoutexporter/README.md
+++ /dev/null
@@ -1,66 +0,0 @@
-# stdoutexporter
-
-**Stability:** development
-
-Writes one OTLP/JSON-encoded line per `ConsumeMetrics` call to a
-configurable `io.Writer` (default `os.Stdout`). Acts as the
-copy-worthy shape for real exporters (M8+); the chart default has
-since moved to upstream `debug` per RFC-0013 PR-A2, but this exporter
-remains the bundled in-tree example for the OCB-assembled distro.
-
-## Configuration
-
-No operator-configurable fields. `os.Stdout` is the only write target
-in production; tests inject an `io.Writer` via the unexported
-`yaml:"-"` `Out` field on `Config`.
-
-## Example
-
-```yaml
-exporters:
-  stdoutexporter: {}
-```
-
-## Output
-
-One line per `ConsumeMetrics` call. Each line is the OTLP/JSON
-encoding (via `pmetric.JSONMarshaler`) of the entire
-`pmetric.Metrics` value, terminated by `\n`. Empty
-`pmetric.Metrics` (zero metrics) produces zero output — operators
-don't see empty lines for empty pushes.
-
-## Signals supported
-
-| Signal  | Supported |
-|---------|-----------|
-| Metrics | yes       |
-| Traces  | no — `pipeline.ErrSignalNotSupported` |
-| Logs    | no — `pipeline.ErrSignalNotSupported` |
-
-M8+ may extend by implementing the missing signal paths in
-`stdoutexporter.go`; the factory's `CreateTraces`/`CreateLogs`
-methods can return real exporters once those paths exist.
-
-## Capabilities
-
-`MutatesData: false` — the exporter only reads the incoming payload.
-Fan-out (when introduced) may share a single read-only payload with
-this exporter instead of cloning.
-
-## Limitations
-
-- Synchronous writes. A slow `io.Writer` blocks the calling pipeline
-  stage. Not suitable for high-throughput production paths — that's
-  what real exporters with queue + retry are for (`exporterhelper` lives outside this RFC, per `docs/rfcs/archived/0004-clockreceiver-stdoutexporter.md`).
-- Concurrent `ConsumeMetrics` calls are serialized via an internal
-  mutex so JSON lines don't interleave in the writer.
-- No log rotation, no buffering, no batching.
-
-## Implementation notes
-
-- Embeds `pipeline.ComponentState` for lifecycle bookkeeping.
-- `writeMu sync.Mutex` serializes writes; pdata is not thread-safe
-  but each `ConsumeMetrics` gets its own value, so the only race
-  risk is writer interleaving.
-- Package-var factory: `stdoutexporter.Factory` is the shape M8+
-  exporter authors should mirror.
diff --git a/components/exporters/stdoutexporter/config.go b/components/exporters/stdoutexporter/config.go
deleted file mode 100644
index bb1f20ab..00000000
--- a/components/exporters/stdoutexporter/config.go
+++ /dev/null
@@ -1,27 +0,0 @@
-// SPDX-License-Identifier: Apache-2.0
-
-// Package stdoutexporter is the canonical example exporter. It writes
-// one OTLP/JSON-encoded line per ConsumeMetrics call to a configurable
-// io.Writer (default os.Stdout). Used as a debug/example exporter for
-// the M1 pipeline contract (clockreceiver, its original heartbeat
-// pair, was deleted in RFC-0013 PR-K.2 — operators now pair this with
-// upstream hostmetricsreceiver for the equivalent end-to-end walk).
-package stdoutexporter
-
-import "io"
-
-// Config is the operator-facing YAML shape. The Out field is
-// exported so tests can inject a *bytes.Buffer, but yaml:"-" so it
-// is not an operator-configurable knob (production always writes to
-// os.Stdout via the factory's default).
-type Config struct {
-	// Out is the writer JSON lines are sent to. Defaults to os.Stdout
-	// via CreateDefaultConfig; tests override directly before
-	// CreateMetrics is called.
-	Out io.Writer `yaml:"-"`
-}
-
-// Validate is a no-op — there are no operator-configurable fields to
-// reject. Defined so Config satisfies component.Config (the upstream
-// interface requires Validate()).
-func (*Config) Validate() error { return nil }
diff --git a/components/exporters/stdoutexporter/example_config.yaml b/components/exporters/stdoutexporter/example_config.yaml
deleted file mode 100644
index 95d6a026..00000000
--- a/components/exporters/stdoutexporter/example_config.yaml
+++ /dev/null
@@ -1,4 +0,0 @@
-# Minimal working stdoutexporter config.
-# Writes one OTLP/JSON-encoded line per ConsumeMetrics call to stdout.
-exporters:
-  stdoutexporter: {}
diff --git a/components/exporters/stdoutexporter/factory.go b/components/exporters/stdoutexporter/factory.go
deleted file mode 100644
index 4de9f98f..00000000
--- a/components/exporters/stdoutexporter/factory.go
+++ /dev/null
@@ -1,70 +0,0 @@
-// SPDX-License-Identifier: Apache-2.0
-
-package stdoutexporter
-
-import (
-	"context"
-	"fmt"
-	"os"
-
-	"go.opentelemetry.io/collector/component"
-	"go.opentelemetry.io/collector/exporter"
-	"go.uber.org/zap"
-)
-
-// componentType is the kind name registered in components.yaml.
-// Wrapped in a function so the MustNewType call is not a top-level side
-// effect (mirrors the nccl_fr / pyspy pattern).
-func componentType() component.Type { return component.MustNewType("stdoutexporter") }
-
-// stability is the OCB-surfaced stability level for stdoutexporter's
-// metrics signal. Beta tracks "metrics + label shape pinned; behavior
-// may evolve" — same level the exporter has carried since the v0.1.x
-// internal/pipeline factory; PR-B (exporter port) preserves it across
-// the upstream swap.
-const stability = component.StabilityLevelBeta
-
-// NewFactory returns the upstream exporter.Factory for stdoutexporter.
-// Mirrors the upstream-contrib pattern (debugexporter, fileexporter) —
-// callers construct via `stdoutexporter.NewFactory()` rather than a
-// package var, so each OCB-stitched pipeline gets a freshly-built
-// factory and the package surface stays a single exported symbol.
-//
-// Only the metrics signal returns a real Exporter; logs and traces
-// surface upstream's "signal not supported" via exporter.NewFactory's
-// default unimplemented behavior.
-func NewFactory() exporter.Factory {
-	return exporter.NewFactory(
-		componentType(),
-		createDefaultConfig,
-		exporter.WithMetrics(createMetrics, stability),
-	)
-}
-
-// createDefaultConfig matches upstream component.CreateDefaultConfigFunc.
-func createDefaultConfig() component.Config {
-	return &Config{Out: os.Stdout}
-}
-
-// createMetrics is the exporter.CreateMetricsFunc wired by WithMetrics.
-func createMetrics(ctx context.Context, set exporter.Settings, cfg component.Config) (exporter.Metrics, error) {
-	c, ok := cfg.(*Config)
-	if !ok {
-		return nil, fmt.Errorf("stdoutexporter: unexpected config type %T", cfg)
-	}
-	e := newExporter(c)
-	if set.MeterProvider != nil {
-		if se, err := newSelfExporter(set.ID, set.MeterProvider); err == nil {
-			e.telemetry = se
-		} else {
-			recordInitError(ctx, set.MeterProvider,
-				"exporter", set.ID.String(), reasonInstrumentRegister)
-			if set.Logger != nil {
-				set.Logger.Warn("stdoutexporter self-telemetry init failed; using noop", zap.Error(err))
-			}
-		}
-	} else if set.Logger != nil {
-		set.Logger.Warn("stdoutexporter: no MeterProvider; self-telemetry using noop")
-	}
-	return e, nil
-}
diff --git a/components/exporters/stdoutexporter/selftel.go b/components/exporters/stdoutexporter/selftel.go
deleted file mode 100644
index 48baf415..00000000
--- a/components/exporters/stdoutexporter/selftel.go
+++ /dev/null
@@ -1,110 +0,0 @@
-// SPDX-License-Identifier: Apache-2.0
-
-// Exporter-scoped self-telemetry surface. Thin wrapper over
-// module/pkg/selftel that pins this exporter's scope-name + instrument
-// name + the kind enum. Metric names follow the upstream OTel collector
-// `otelcol___` convention per RFC-0013
-// §migration v0.1.0 namespace alignment:
-// `otelcol.exporter.stdoutexporter.calls_total{result,kind,component_id}`
-// (the Prometheus exporter renders the dots as underscores). Label
-// shape is preserved (`component_id`) so multi-instance disambiguation
-// in dashboards is unchanged from v0.1.x. The instrumentation scope
-// name is THIS exporter's Go import path.
-//
-// Mirrors `components/exporters/otlphttp/selftel.go`. Shared plumbing
-// (the OTel counter, the noop fallback, the init-error fallback
-// counter) lives in module/pkg/selftel.
-
-package stdoutexporter
-
-import (
-	"context"
-
-	"go.opentelemetry.io/collector/component"
-	"go.opentelemetry.io/otel/metric"
-
-	"github.com/tracecoreai/tracecore/module/pkg/selftel"
-)
-
-// kind is a low-cardinality error-class identifier for exporter failures.
-// Mirrors the v0.1.x internal/selftelemetry.Kind type; exporter-local
-// so the wire-format strings stay owned by the package that emits them.
-type kind string
-
-const (
-	// kindMarshal — `marshaler.MarshalMetrics` returned an error.
-	kindMarshal kind = "marshal"
-
-	// kindIO — write to the configured `io.Writer` returned a
-	// non-nil error (closed stream, disk full, etc.).
-	kindIO kind = "io"
-)
-
-// instrumentationScope pins the OTel scope name. Per OTel convention,
-// the scope is the package's Go import path.
-const instrumentationScope = "github.com/tracecoreai/tracecore/components/exporters/stdoutexporter"
-
-// callsTotalName is the operator-facing metric name for this exporter's
-// per-call counter. Kept here (not in module/pkg/selftel) so the
-// shared package stays unaware of caller-specific name choices.
-const callsTotalName = "otelcol.exporter.stdoutexporter.calls_total"
-
-// reasonInstrumentRegister is the wire-format label value for
-// init_errors_total ticks when OTel instrument registration failed at
-// construction time. Re-exported from the shared package so this
-// package's factory + tests don't import selftel just for the const.
-const reasonInstrumentRegister = selftel.ReasonInstrumentRegister
-
-// errNilMeterProvider is the sentinel returned by newSelfExporter when
-// called with a nil MeterProvider. Aliased to the shared sentinel so
-// the factory's errors.Is check survives the migration.
-var errNilMeterProvider = selftel.ErrNilMeterProvider
-
-// selfExporter is the exporter-scoped self-health surface used by
-// stdoutexporter hot paths. Mirrors selftel.Exporter but carries the
-// package-local `kind` type so call sites stay type-checked.
-type selfExporter interface {
-	IncCallSuccess()
-	IncCallFailure(k kind)
-}
-
-// noopSelfExporter discards every call.
-type noopSelfExporter struct{}
-
-func newNoopSelfExporter() selfExporter { return noopSelfExporter{} }
-
-func (noopSelfExporter) IncCallSuccess()     {}
-func (noopSelfExporter) IncCallFailure(kind) {}
-
-var _ selfExporter = noopSelfExporter{}
-
-// newSelfExporter returns a real selfExporter backed by the shared
-// selftel.Exporter wired at this package's scope + calls_total name.
-// Returns errNilMeterProvider (== selftel.ErrNilMeterProvider) when mp
-// is nil; the factory is responsible for the noop fallback + the
-// init_errors_total tick via recordInitError.
-func newSelfExporter(id component.ID, mp metric.MeterProvider) (selfExporter, error) {
-	inner, err := selftel.NewExporter(id.String(), instrumentationScope, callsTotalName, mp)
-	if err != nil {
-		return nil, err
-	}
-	return &selfExporterImpl{inner: inner}, nil
-}
-
-// selfExporterImpl casts the package-local `kind` to string at the
-// shared-package seam. Zero-cost — the cast is a compile-time op.
-type selfExporterImpl struct {
-	inner selftel.Exporter
-}
-
-var _ selfExporter = (*selfExporterImpl)(nil)
-
-func (e *selfExporterImpl) IncCallSuccess()       { e.inner.IncCallSuccess() }
-func (e *selfExporterImpl) IncCallFailure(k kind) { e.inner.IncCallFailure(string(k)) }
-
-// recordInitError forwards to the shared selftel.RecordInitError with
-// this package's scope. Kept as a thin wrapper so the factory's call
-// site stays identical to the pre-refactor shape.
-func recordInitError(ctx context.Context, mp metric.MeterProvider, kindLabel, componentID, reason string) { - selftel.RecordInitError(ctx, mp, instrumentationScope, kindLabel, componentID, reason) -} diff --git a/components/exporters/stdoutexporter/selftel_test.go b/components/exporters/stdoutexporter/selftel_test.go deleted file mode 100644 index bf055cf1..00000000 --- a/components/exporters/stdoutexporter/selftel_test.go +++ /dev/null @@ -1,295 +0,0 @@ -// SPDX-License-Identifier: Apache-2.0 - -package stdoutexporter - -import ( - "bytes" - "context" - "errors" - "testing" - - "go.opentelemetry.io/collector/component" - "go.opentelemetry.io/collector/exporter" - "go.opentelemetry.io/collector/exporter/exportertest" - "go.opentelemetry.io/otel/sdk/metric/metricdata" - - selftelutil "github.com/tracecoreai/tracecore/module/pkg/testutil/selftel" -) - -// TestSelfTelemetry_NoopAlwaysSafe pins: newNoopSelfExporter returns a -// value whose hot-path methods never panic and silently discard. Every -// ConsumeMetrics path calls into the selfExporter surface; nil-checks at -// each call site are forbidden, so the noop must be a real value. -func TestSelfTelemetry_NoopAlwaysSafe(t *testing.T) { - se := newNoopSelfExporter() - defer func() { - if r := recover(); r != nil { - t.Fatalf("noop panicked: %v", r) - } - }() - se.IncCallSuccess() - se.IncCallSuccess() - se.IncCallFailure(kindMarshal) - se.IncCallFailure(kindIO) -} - -// TestSelfTelemetry_NewExporter_NilProviderErrors pins: newSelfExporter -// returns errNilMeterProvider when called with a nil provider rather than -// silently substituting noop — the factory is responsible for the fallback -// + the recordInitError tick. Mirrors the receiver-side contract. 
-func TestSelfTelemetry_NewExporter_NilProviderErrors(t *testing.T) { - _, err := newSelfExporter(testID(), nil) - if !errors.Is(err, errNilMeterProvider) { - t.Fatalf("err = %v, want errNilMeterProvider", err) - } -} - -// TestSelfTelemetry_EmitsCallsTotal_WithResultKindAndComponentID pins the -// M2 metric contract for exporters. After IncCallSuccess() ×2 + -// IncCallFailure(kindMarshal) ×1 + IncCallFailure(kindIO) ×1, the -// ManualReader collects otelcol.exporter.stdoutexporter.calls_total with datapoints -// partitioned by result and (for failures) kind, labeled with the -// component_id. A regression that drops the kind label, the component_id -// label, the result label, or the metric-name prefix fails here. -func TestSelfTelemetry_EmitsCallsTotal_WithResultKindAndComponentID(t *testing.T) { - mp, rdr := selftelutil.NewTestMeterProvider(t) - se, err := newSelfExporter(testID(), mp) - if err != nil { - t.Fatalf("newSelfExporter: %v", err) - } - se.IncCallSuccess() - se.IncCallSuccess() - se.IncCallFailure(kindMarshal) - se.IncCallFailure(kindIO) - - rm := selftelutil.CollectRM(t, rdr) - m, ok := selftelutil.FindInstrument(rm, "otelcol.exporter.stdoutexporter.calls_total") - if !ok { - t.Fatalf("metric otelcol.exporter.stdoutexporter.calls_total absent; have: %s", selftelutil.DumpNames(rm)) - } - sum, ok := m.Data.(metricdata.Sum[int64]) - if !ok { - t.Fatalf("calls_total data shape: got %T, want metricdata.Sum[int64]", m.Data) - } - - // Index datapoints by a (result,kind) key once; per-case assertions - // below stay readable + the function stays under gocyclo's 15-edge - // budget. component_id is asserted per-datapoint inside the loop. 
- got := map[string]int{} - for _, dp := range sum.DataPoints { - if !selftelutil.KVMatch(dp, map[string]string{"component_id": "stdoutexporter/test"}) { - t.Errorf("datapoint missing component_id=stdoutexporter/test: %v", dp.Attributes) - continue - } - result, _ := dp.Attributes.Value("result") - kindVal, _ := dp.Attributes.Value("kind") - key := result.AsString() + "/" + kindVal.AsString() - got[key] += int(dp.Value) - } - wantCounts := map[string]int{ - "success/": 2, - "failure/marshal": 1, - "failure/io": 1, - } - for key, want := range wantCounts { - if got[key] != want { - t.Errorf("calls_total[%q]: got %d, want %d (all=%v)", key, got[key], want, got) - } - } -} - -// TestSelfTelemetry_ScopeNameIsExporterImportPath pins the OTel scope-name -// standard: instrumentation scope = exporter's Go import path. This anchors -// the PR-B1-style decision (vs reusing the deleted internal/selftelemetry -// scope) so a future drift back to the internal name fails here. -func TestSelfTelemetry_ScopeNameIsExporterImportPath(t *testing.T) { - mp, rdr := selftelutil.NewTestMeterProvider(t) - se, err := newSelfExporter(testID(), mp) - if err != nil { - t.Fatalf("newSelfExporter: %v", err) - } - se.IncCallSuccess() - rm := selftelutil.CollectRM(t, rdr) - scope, ok := selftelutil.ScopeOf(rm, "otelcol.exporter.stdoutexporter.calls_total") - if !ok { - t.Fatalf("calls_total absent") - } - const wantScope = "github.com/tracecoreai/tracecore/components/exporters/stdoutexporter" - if scope != wantScope { - t.Errorf("instrumentation scope: got %q, want %q", scope, wantScope) - } -} - -// TestRecordInitError_TicksInitErrorsCounter pins: when factory wiring -// fails (newSelfExporter returns an error), recordInitError surfaces a -// otelcol.selftelemetry.init_errors_total tick with kind="exporter", -// the component_id label, and reason="instrument_register". 
This is the -// only signal that an exporter fell back to noop telemetry; dropping the -// recordInitError call must fail this test. -func TestRecordInitError_TicksInitErrorsCounter(t *testing.T) { - mp, rdr := selftelutil.NewTestMeterProvider(t) - recordInitError(context.Background(), mp, "exporter", testID().String(), reasonInstrumentRegister) - - rm := selftelutil.CollectRM(t, rdr) - m, ok := selftelutil.FindInstrument(rm, "otelcol.selftelemetry.init_errors_total") - if !ok { - t.Fatalf("init_errors_total absent; have: %s", selftelutil.DumpNames(rm)) - } - sum, ok := m.Data.(metricdata.Sum[int64]) - if !ok { - t.Fatalf("init_errors_total data shape: got %T, want metricdata.Sum[int64]", m.Data) - } - if len(sum.DataPoints) != 1 { - t.Fatalf("init_errors datapoints: got %d, want 1", len(sum.DataPoints)) - } - dp := sum.DataPoints[0] - want := map[string]string{ - "kind": "exporter", - "component_id": "stdoutexporter/test", - "reason": reasonInstrumentRegister, - } - if !selftelutil.KVMatch(dp, want) { - t.Errorf("init_errors attrs: got %v, want %v", dp.Attributes, want) - } - if dp.Value != 1 { - t.Errorf("init_errors value: got %d, want 1", dp.Value) - } -} - -// TestRecordInitError_NilProviderIsSafe pins: a nil MeterProvider must -// not panic — recordInitError IS the fallback path; crashing here would -// turn a partial degradation into a process kill. 
-func TestRecordInitError_NilProviderIsSafe(t *testing.T) { - defer func() { - if r := recover(); r != nil { - t.Fatalf("recordInitError(nil) panicked: %v", r) - } - }() - recordInitError(context.Background(), nil, "exporter", "x/y", reasonInstrumentRegister) -} - -// TestFactory_FallsBackToNoopWhenMeterFails pins the factory -// observability contract end-to-end: when newSelfExporter returns an -// error (synthetic register failure for every otelcol.exporter.stdoutexporter.* -// instrument), the factory MUST (1) leave the exporter with a working -// noop telemetry field (no nil, no panic on hot-path calls), AND (2) -// tick otelcol.selftelemetry.init_errors_total via recordInitError. -// Mirrors the nccl_fr sibling test seam. -func TestFactory_FallsBackToNoopWhenMeterFails(t *testing.T) { - mp, rdr := selftelutil.NewTestMeterProvider(t) - failing := selftelutil.NewFailingMeterProvider(mp, "otelcol.exporter.stdoutexporter.") - - set := exportertest.NewNopSettings(componentType()) - set.ID = component.NewIDWithName(componentType(), "test") - set.MeterProvider = failing - cfg := &Config{Out: &bytes.Buffer{}} - e, err := NewFactory().CreateMetrics(context.Background(), set, cfg) - if err != nil { - t.Fatalf("CreateMetrics: %v", err) - } - exp, ok := e.(*stdoutExporter) - if !ok { - t.Fatalf("exporter type: got %T, want *stdoutExporter", e) - } - if exp.telemetry == nil { - t.Fatal("telemetry field nil after failed wiring; must fall back to noop") - } - // Hot-path call must not panic + must not surface (noop discards). 
- exp.telemetry.IncCallSuccess() - exp.telemetry.IncCallFailure(kindIO) - - rm := selftelutil.CollectRM(t, rdr) - if m, ok := selftelutil.FindInstrument(rm, "otelcol.exporter.stdoutexporter.calls_total"); ok { - if sum, ok := m.Data.(metricdata.Sum[int64]); ok && len(sum.DataPoints) > 0 { - t.Errorf("noop fallback leaked Inc* into calls_total datapoints: %v", sum.DataPoints) - } - } - m, ok := selftelutil.FindInstrument(rm, "otelcol.selftelemetry.init_errors_total") - if !ok { - t.Fatalf("init_errors_total absent after factory fallback; have: %s", selftelutil.DumpNames(rm)) - } - sum, ok := m.Data.(metricdata.Sum[int64]) - if !ok { - t.Fatalf("init_errors_total data shape: got %T", m.Data) - } - if len(sum.DataPoints) != 1 || sum.DataPoints[0].Value != 1 { - t.Errorf("init_errors_total: want 1 datapoint value=1, got %v", sum.DataPoints) - } -} - -// TestFactory_FallsBackToNoopWhenMeterProviderIsNil pins the -// nil-MeterProvider symmetry of the register-failure fallback: when -// `set.MeterProvider` is nil at construction (no telemetry wired), the -// factory MUST (1) return without error, (2) leave the exporter with a -// working noop telemetry field (no nil, no panic on hot-path calls), -// and (3) emit no datapoints anywhere — there's no MeterProvider to -// scrape. The skip-tick semantic for the nil path is intentional: -// `recordInitError` is only meaningful when telemetry is wired but -// instrument registration failed; a nil provider means the operator -// opted out of telemetry entirely, so a phantom counter would be -// noise. Mirrors `TestFactory_FallsBackToNoopWhenMeterFails` minus the -// failingExporterMP wrapper. 
-func TestFactory_FallsBackToNoopWhenMeterProviderIsNil(t *testing.T) { - set := exportertest.NewNopSettings(componentType()) - set.ID = component.NewIDWithName(componentType(), "test") - set.MeterProvider = nil - cfg := &Config{Out: &bytes.Buffer{}} - - e, err := NewFactory().CreateMetrics(context.Background(), set, cfg) - if err != nil { - t.Fatalf("CreateMetrics: %v", err) - } - exp, ok := e.(*stdoutExporter) - if !ok { - t.Fatalf("exporter type: got %T, want *stdoutExporter", e) - } - if exp.telemetry == nil { - t.Fatal("telemetry field nil after nil-MeterProvider construction; must fall back to noop") - } - // Hot-path calls must not panic. No MeterProvider exists, so there - // is also nothing to scrape — the noop discards by definition. - defer func() { - if r := recover(); r != nil { - t.Fatalf("noop telemetry panicked on hot path: %v", r) - } - }() - exp.telemetry.IncCallSuccess() - exp.telemetry.IncCallFailure(kindIO) -} - -// asSelfExporter is a compile-time pin: it accepts the package-local -// selfExporter interface only. If a future refactor moves the type -// back into internal/selftelemetry (e.g. reintroduces the -// selftelemetry.Exporter alias), this function's signature breaks + -// every caller fails compile. Pairs with the kind-value asserts below -// to pin the sibling-types contract that PR-B1 established. -func asSelfExporter(se selfExporter) selfExporter { return se } - -// TestSelfExporter_SiblingTypesArePackageLocal pins the PR-B1 sibling -// contract: the stdoutexporter package owns its own selfExporter + -// kind types — they must NOT come from internal/selftelemetry. If a -// future refactor reintroduces that import, the asSelfExporter -// signature changes type → break compile here. The kind-value asserts -// pin the wire-format strings ("marshal", "io") that operators query. 
-func TestSelfExporter_SiblingTypesArePackageLocal(t *testing.T) { - iface := asSelfExporter(newNoopSelfExporter()) - iface.IncCallSuccess() - iface.IncCallFailure(kindMarshal) - - if string(kindMarshal) != "marshal" { - t.Errorf("kindMarshal: got %q, want %q", string(kindMarshal), "marshal") - } - if string(kindIO) != "io" { - t.Errorf("kindIO: got %q, want %q", string(kindIO), "io") - } -} - -func testID() component.ID { - return component.NewIDWithName(componentType(), "test") -} - -// Compile-time assertion: NewFactory returns an exporter.Factory. -// Pins the upstream-type contract that PR-B established — if a future -// refactor regresses to internal/pipeline.ExporterFactory, this fails -// to compile. -var _ exporter.Factory = NewFactory() diff --git a/components/exporters/stdoutexporter/stdoutexporter.go b/components/exporters/stdoutexporter/stdoutexporter.go deleted file mode 100644 index 4e91fa23..00000000 --- a/components/exporters/stdoutexporter/stdoutexporter.go +++ /dev/null @@ -1,127 +0,0 @@ -// SPDX-License-Identifier: Apache-2.0 - -package stdoutexporter - -import ( - "context" - "fmt" - "io" - "sync" - - "go.opentelemetry.io/collector/component" - "go.opentelemetry.io/collector/consumer" - "go.opentelemetry.io/collector/exporter" - "go.opentelemetry.io/collector/pdata/pmetric" -) - -// stdoutExporter writes one OTLP/JSON-encoded line per -// ConsumeMetrics call. No goroutines, no I/O beyond synchronous writes -// (the operator running this in production is responsible for stdout -// going somewhere useful). -// -// Self-telemetry types live in `selftel.go` (sibling-scoped), not in -// `internal/selftelemetry` — see RFC-0013 §migration PR-B1; this -// exporter is the M8+ copy-worthy template, so the sibling pattern -// must be visible here. -type stdoutExporter struct { - out io.Writer - - // telemetry records exporter self-telemetry (calls_total + - // per-call success/failure). 
Always non-nil — factory substitutes - // noop on construction failure so the hot path doesn't nil-check. - telemetry selfExporter - - // writeMu serializes writes so two concurrent ConsumeMetrics - // calls don't interleave JSON lines. pdata is not thread-safe - // either, but the value comes in via the consumer call, not a - // shared field — interleaving on out is the only race risk here. - writeMu sync.Mutex - - marshaler pmetric.JSONMarshaler -} - -// Compile-time assertion: stdoutExporter satisfies the upstream -// exporter.Metrics interface (component.Component + consumer.Metrics). -// A breaking shape change in upstream surfaces here at build time -// rather than at runtime when OCB stitches the pipeline. -var _ exporter.Metrics = (*stdoutExporter)(nil) - -// newExporter constructs a stdoutExporter wired to a noop telemetry -// fallback. The factory replaces telemetry post-construction iff -// set.MeterProvider is non-nil AND instrument registration succeeds -// — see createMetrics in factory.go. Keeping the noop default here -// (rather than requiring the factory to set it) means the hot path -// never nil-checks e.telemetry. -func newExporter(cfg *Config) *stdoutExporter { - return &stdoutExporter{ - out: cfg.Out, - telemetry: newNoopSelfExporter(), - } -} - -// NOTE on ExporterCarrier removal: -// -// v0.1.x stdoutexporter exposed `SelfExporter() selftelemetry.Exporter` -// so `cmd/tracecore/collect.collectFailureRateReaders` could feed -// `tracecore.exporter.failure_rate`. RFC-0013 PR-A2 deleted the -// `cmd/tracecore` hand-wired entry point and PR-F.1 deleted -// `internal/selftelemetry` entirely — the carrier surface has no -// remaining consumer. Operators rate-derive failure rate via PromQL -// `rate(otelcol_exporter_stdoutexporter_calls_total{result="error"}[5m])` -// against the post-RFC-0013 namespace-aligned counter. 
-// -// `otelcol.exporter.stdoutexporter.calls_total` continues to surface -// because the sibling impl emits it on `set.MeterProvider` directly. - -// Start is a no-op — stdoutexporter has no goroutines, no -// connections, and no resources to acquire at pipeline-start time. -// Defined to satisfy component.Component. -func (*stdoutExporter) Start(context.Context, component.Host) error { return nil } - -// Shutdown is a no-op — there's nothing to release. The configured -// io.Writer is owned by the operator (typically os.Stdout) and must -// not be closed by the exporter. Defined to satisfy component.Component. -func (*stdoutExporter) Shutdown(context.Context) error { return nil } - -// Capabilities reports MutatesData=false — stdoutexporter only reads -// the incoming pmetric.Metrics. Fan-out can share a read-only payload -// with us instead of cloning. -func (*stdoutExporter) Capabilities() consumer.Capabilities { - return consumer.Capabilities{MutatesData: false} -} - -// ConsumeMetrics marshals the entire pmetric.Metrics to a single JSON -// line and writes it to the configured writer. Empty pmetric.Metrics -// (zero metrics) produces zero output — operators tailing the -// terminal don't see noise. -func (e *stdoutExporter) ConsumeMetrics(_ context.Context, md pmetric.Metrics) error { - if md.MetricCount() == 0 { - // Empty payloads still count as successful Consume calls — - // the contract was fulfilled, just with zero work. Counting - // them keeps `otelcol_exporter_stdoutexporter_calls_total` consistent - // with the operator's "calls received" intuition. 
- e.telemetry.IncCallSuccess() - return nil - } - - line, err := e.marshaler.MarshalMetrics(md) - if err != nil { - e.telemetry.IncCallFailure(kindMarshal) - return fmt.Errorf("stdoutexporter: marshal metrics: %w", err) - } - - e.writeMu.Lock() - defer e.writeMu.Unlock() - - if _, err := e.out.Write(line); err != nil { - e.telemetry.IncCallFailure(kindIO) - return fmt.Errorf("stdoutexporter: write metrics line: %w", err) - } - if _, err := e.out.Write([]byte{'\n'}); err != nil { - e.telemetry.IncCallFailure(kindIO) - return fmt.Errorf("stdoutexporter: write newline: %w", err) - } - - e.telemetry.IncCallSuccess() - return nil -} diff --git a/components/exporters/stdoutexporter/stdoutexporter_test.go b/components/exporters/stdoutexporter/stdoutexporter_test.go deleted file mode 100644 index 76e2030b..00000000 --- a/components/exporters/stdoutexporter/stdoutexporter_test.go +++ /dev/null @@ -1,281 +0,0 @@ -// SPDX-License-Identifier: Apache-2.0 - -package stdoutexporter_test - -import ( - "bytes" - "encoding/json" - "errors" - "os" - "strings" - "sync" - "testing" - "time" - - "github.com/stretchr/testify/require" - "go.opentelemetry.io/collector/component" - "go.opentelemetry.io/collector/consumer" - "go.opentelemetry.io/collector/exporter" - "go.opentelemetry.io/collector/exporter/exportertest" - "go.opentelemetry.io/collector/pdata/pcommon" - "go.opentelemetry.io/collector/pdata/pmetric" - "gopkg.in/yaml.v3" - - "github.com/tracecoreai/tracecore/components/exporters/stdoutexporter" -) - -func TestConfig_Validate_AlwaysPasses(t *testing.T) { - t.Parallel() - require.NoError(t, (&stdoutexporter.Config{}).Validate()) -} - -// TestExampleConfig_Parses pins that the operator-facing -// example_config.yaml decodes into a valid stdoutexporter Config. -// Mirrors the receiver-side gate so example_config drift is caught -// at PR time, not on first operator install. 
-func TestExampleConfig_Parses(t *testing.T) { - t.Parallel() - bs, err := os.ReadFile("example_config.yaml") - require.NoError(t, err) - - var doc struct { - Exporters struct { - Stdoutexporter yaml.Node `yaml:"stdoutexporter"` - } `yaml:"exporters"` - } - require.NoError(t, yaml.Unmarshal(bs, &doc), - "example_config.yaml must parse as YAML") - - cfg, ok := stdoutexporter.NewFactory().CreateDefaultConfig().(*stdoutexporter.Config) - require.True(t, ok, "factory must produce *Config") - // stdoutexporter's example is `stdoutexporter: {}` — an empty - // block. The factory default already populates Out=os.Stdout; - // decoding an empty mapping is a no-op and the result must - // still validate. - require.NoError(t, doc.Exporters.Stdoutexporter.Decode(cfg), - "exporters.stdoutexporter block must decode into Config") - require.NoError(t, cfg.Validate(), - "example_config.yaml must produce a Validate-clean Config") -} - -func TestFactory_Type(t *testing.T) { - t.Parallel() - require.Equal(t, "stdoutexporter", stdoutexporter.NewFactory().Type().String()) -} - -func TestFactory_CreateDefaultConfig_DefaultsToStdout(t *testing.T) { - t.Parallel() - cfg := stdoutexporter.NewFactory().CreateDefaultConfig() - cc, ok := cfg.(*stdoutexporter.Config) - require.True(t, ok) - require.NotNil(t, cc.Out, "default Out must be non-nil (os.Stdout)") -} - -// TestFactory_CreateTraces_Unsupported pins that the traces signal -// surfaces upstream's pipeline.ErrSignalNotSupported sentinel. Mirrors -// the nccl_fr receiver-side gate established in PR-B2 — when -// exporter.WithMetrics is the only signal registered, upstream's -// NewFactory wires the missing signals to return this error from the -// component package. 
-func TestFactory_CreateTraces_Unsupported(t *testing.T) { - t.Parallel() - f := stdoutexporter.NewFactory() - set := exportertest.NewNopSettings(f.Type()) - exp, err := f.CreateTraces(t.Context(), set, f.CreateDefaultConfig()) - require.Nil(t, exp) - require.Error(t, err) - // Upstream returns pipeline.ErrSignalNotSupported (from - // go.opentelemetry.io/collector/pipeline) when a signal is not - // registered via WithX. The exact sentinel lives in upstream; - // asserting non-nil + the surfaced message keeps this test - // resilient to upstream's internal sentinel renames while still - // pinning the contract. - require.Contains(t, err.Error(), "telemetry type is not supported", - "upstream surfaces 'telemetry type is not supported' for unregistered signals") -} - -func TestFactory_CreateLogs_Unsupported(t *testing.T) { - t.Parallel() - f := stdoutexporter.NewFactory() - set := exportertest.NewNopSettings(f.Type()) - exp, err := f.CreateLogs(t.Context(), set, f.CreateDefaultConfig()) - require.Nil(t, exp) - require.Error(t, err) - require.Contains(t, err.Error(), "telemetry type is not supported") -} - -// TestFactory_CreateMetrics_BadConfigType pins the type-guard in -// createMetrics. A non-*Config cfg must surface a clear error rather -// than panic on the type assertion. Mirrors the nccl_fr sibling gate. -func TestFactory_CreateMetrics_BadConfigType(t *testing.T) { - t.Parallel() - f := stdoutexporter.NewFactory() - set := exportertest.NewNopSettings(f.Type()) - exp, err := f.CreateMetrics(t.Context(), set, badConfig{}) - require.Nil(t, exp) - require.Error(t, err) - require.Contains(t, err.Error(), "unexpected config type") -} - -type badConfig struct{} - -func (badConfig) Validate() error { return nil } - -// TestExporter_Capabilities pins {MutatesData: false} so fan-out's -// share-not-clone optimization applies. 
-func TestExporter_Capabilities(t *testing.T) { - t.Parallel() - - exp := newTestExporter(t, &bytes.Buffer{}) - require.False(t, exp.Capabilities().MutatesData) -} - -// TestExporter_WritesOneJSONLinePerCall pins the JSON-Lines contract: -// each ConsumeMetrics produces exactly one line; that line is parseable -// JSON; consecutive calls accumulate. -func TestExporter_WritesOneJSONLinePerCall(t *testing.T) { - t.Parallel() - - buf := &bytes.Buffer{} - exp := newTestExporter(t, buf) - - require.NoError(t, exp.ConsumeMetrics(t.Context(), newMetricsWith("first"))) - require.NoError(t, exp.ConsumeMetrics(t.Context(), newMetricsWith("second"))) - - lines := strings.Split(strings.TrimRight(buf.String(), "\n"), "\n") - require.Len(t, lines, 2, "two ConsumeMetrics calls → two lines") - - for _, line := range lines { - var v map[string]any - require.NoError(t, json.Unmarshal([]byte(line), &v), "line must be valid JSON") - } - - require.Contains(t, lines[0], `"first"`) - require.Contains(t, lines[1], `"second"`) -} - -// TestExporter_EmptyMetrics_ProducesNoOutput pins the documented -// behaviour: zero metrics → zero output. Tailing operators don't see -// empty lines for empty pushes. -func TestExporter_EmptyMetrics_ProducesNoOutput(t *testing.T) { - t.Parallel() - - buf := &bytes.Buffer{} - exp := newTestExporter(t, buf) - - require.NoError(t, exp.ConsumeMetrics(t.Context(), pmetric.NewMetrics())) - require.Empty(t, buf.String(), "empty metrics push must produce no output") -} - -// TestExporter_ConcurrentWrites_DoNotInterleave: pdata isn't -// thread-safe, but the receiver feeds us one md per call. The risk -// is two concurrent calls interleaving bytes in the writer. The -// writeMu mutex prevents that; the test exposes a regression if it -// is ever removed. 
-func TestExporter_ConcurrentWrites_DoNotInterleave(t *testing.T) { - t.Parallel() - - buf := &bytes.Buffer{} - exp := newTestExporter(t, buf) - - const goroutines = 16 - errs := make(chan error, goroutines) - var wg sync.WaitGroup - wg.Add(goroutines) - for range goroutines { - go func() { - defer wg.Done() - errs <- exp.ConsumeMetrics(t.Context(), newMetricsWith("concurrent")) - }() - } - wg.Wait() - close(errs) - for err := range errs { - require.NoError(t, err) - } - - lines := strings.Split(strings.TrimRight(buf.String(), "\n"), "\n") - require.Len(t, lines, goroutines) - for _, line := range lines { - var v map[string]any - require.NoError(t, json.Unmarshal([]byte(line), &v), "each line must be valid JSON (no interleaving)") - } -} - -// TestExporter_StartShutdown_NoOps pins the upstream-Component -// contract: Start and Shutdown must return nil. stdoutexporter holds -// no goroutines or resources, so both are intentionally no-ops; this -// test fails if a future refactor introduces a Start/Shutdown -// side-effect that can error (which would need its own dedicated test). -func TestExporter_StartShutdown_NoOps(t *testing.T) { - t.Parallel() - - exp := newTestExporter(t, &bytes.Buffer{}) - require.NoError(t, exp.Start(t.Context(), nopHost{})) - require.NoError(t, exp.Shutdown(t.Context())) -} - -// TestExporter_WriteErrorSurfaces pins the I/O failure path: -// a writer that errors on Write must (1) surface a wrapped error -// from ConsumeMetrics and (2) not leak partial state. Pairs with -// the selftel kindIO coverage in selftel_test.go. 
-func TestExporter_WriteErrorSurfaces(t *testing.T) { - t.Parallel() - - exp := newTestExporter(t, errWriter{}) - err := exp.ConsumeMetrics(t.Context(), newMetricsWith("boom")) - require.ErrorIs(t, err, errSyntheticWrite, - "ConsumeMetrics must wrap the underlying writer error so callers can errors.Is the cause") -} - -// --- helpers --- - -func newTestExporter(t *testing.T, out interface{ Write(p []byte) (int, error) }) exporter.Metrics { - t.Helper() - f := stdoutexporter.NewFactory() - set := exportertest.NewNopSettings(f.Type()) - cfg := &stdoutexporter.Config{Out: out} - // f.CreateMetrics returns exporter.Metrics statically; satisfying - // consumer.Metrics is implied (exporter.Metrics embeds it). The - // compile-time pin is in selftel_test.go's `var _ exporter.Factory - // = NewFactory()`; this helper just returns the value. - exp, err := f.CreateMetrics(t.Context(), set, cfg) - require.NoError(t, err) - // One runtime assertion to confirm consumer.Metrics is satisfied — - // guards against a future refactor that returns a wrapper without - // the consumer surface. - _, ok := any(exp).(consumer.Metrics) - require.True(t, ok, "exporter must satisfy consumer.Metrics") - return exp -} - -// nopHost is the minimal component.Host stand-in for Start tests. -// stdoutexporter never queries the host, so an empty stub is enough. -type nopHost struct{} - -func (nopHost) GetExtensions() map[component.ID]component.Component { return nil } - -// errWriter always fails Write, surfacing errSyntheticWrite. Mirrors -// the receiver-side `failingExporterMeter` adversarial pattern — a -// per-test failure injector that lives next to the test that needs it. 
-type errWriter struct{} - -var errSyntheticWrite = errors.New("synthetic: writer always fails") - -func (errWriter) Write([]byte) (int, error) { return 0, errSyntheticWrite } - -// newMetricsWith constructs a pmetric.Metrics with a single gauge -// metric whose name embeds the label so tests can distinguish -// successive calls in the output. -func newMetricsWith(label string) pmetric.Metrics { - md := pmetric.NewMetrics() - rm := md.ResourceMetrics().AppendEmpty() - sm := rm.ScopeMetrics().AppendEmpty() - m := sm.Metrics().AppendEmpty() - m.SetName(label) - gauge := m.SetEmptyGauge() - dp := gauge.DataPoints().AppendEmpty() - dp.SetTimestamp(pcommon.NewTimestampFromTime(time.Unix(0, 0))) - dp.SetIntValue(42) - return md -} diff --git a/docs/FAILURE-MODES.md b/docs/FAILURE-MODES.md index d9a8b7a7..40d33009 100644 --- a/docs/FAILURE-MODES.md +++ b/docs/FAILURE-MODES.md @@ -93,10 +93,10 @@ existing alerts survive the swap from in-tree receiver to upstream + OTTL. | Scenario | Behaviour | Test | |---|---|---| -| 🟢 Exporter unreachable (network error mid-send) | `otlphttp` retries on retryable HTTP status codes (429/502/503/504) and on network errors with exponential backoff; final error propagates to the receiver as a `Permanent` or `Retryable` `kind`, surfaced via `otelcol_exporter_calls_total{outcome="error"}` (post-RFC-0013 naming). | `components/exporters/otlphttp/otlphttp_test.go::TestExporter_RetriesOnNetworkError` | +| 🟢 Exporter unreachable (network error mid-send) | Upstream `otlphttpexporter` retries on retryable HTTP status codes (429/502/503/504) and on network errors with exponential backoff via `exporterhelper`'s retry-sender; the failure surfaces via `otelcol_exporter_send_failed_*` counters. | Upstream `go.opentelemetry.io/collector/exporter/otlphttpexporter`; the in-tree wrapper retired post-v0.2.0 (issue #333). 
| | 🟢 Vendor SDK failure (`dcgm-exporter` unreachable at Start) | `prometheusreceiver` records the scrape failure and emits `up=0`; the pipeline continues without the scrape target's contribution rather than failing the whole binary. Source: upstream `prometheusreceiver` scraping `dcgm-exporter` per the bundled recipe. | recipe-level alert `DCGMReceiverDegraded`; see `tracecore-recipes` chart. | | 🟢 Config invalid (unknown top-level field) | Upstream `confmap` returns a path-tagged error citing the offending key; `tracecore validate` exits non-zero before any I/O. | Upstream `confmap`; RFC-0013 PR-F.2 retired the legacy in-tree `internal/config` test. | -| 🟢 Config invalid (bad exporter endpoint) | `otlphttp` rejects non-http/https schemes at validate time with `otlphttp: endpoint: scheme must be http or https`; exit 2. | `components/exporters/otlphttp/otlphttp_test.go::TestConfig_Validate_RejectsNonHTTPSchemes` | +| 🟢 Config invalid (bad exporter endpoint) | Upstream `otlphttpexporter` rejects non-http/https endpoints at config-load time via `confighttp.ClientConfig.Validate`; `tracecore validate` exits non-zero before any I/O. | Upstream `go.opentelemetry.io/collector/config/confighttp`; the in-tree wrapper retired post-v0.2.0 (issue #333). | ## Self-telemetry surface (M2) diff --git a/docs/MILESTONES.md b/docs/MILESTONES.md index a7e1f11f..ecf8f7f1 100644 --- a/docs/MILESTONES.md +++ b/docs/MILESTONES.md @@ -108,7 +108,7 @@ Every milestone, in every lane, satisfies all seven principles below. Depth live ### M1. Pipeline runtime & component contract - **Status:** ☑ delivered (PRs #12 + #13) -- **Status (RFC-0013):** DELETED at v0.1.0 (pipeline boot path) - replaced by OCB-generated `main.go` from `builder-config.yaml`. **PR-A2 landed (#189)**: `cmd/tracecore/` deleted (3,032 LOC across 14 source + 7 test files); the OCB binary at `./_build/tracecore` is the canonical entry point. 
**PR-F.1 landed (#206)**: `internal/selftelemetry/` + `internal/telemetry/` deleted; every receiver/exporter now travels its own `selftel.go` + `lifecycle.go` siblings (PR-B1-shape sibling ports: #184/#185/#186/#187/#188/#193/#194/#196/#197). **PR-F.2 landed**: `internal/{componentstatus,pipeline,pipelinebuilder,config,consumer,fanout,runtime/lifecycle}` all deleted (56 files / -6,888 LOC) after the last pipeline+consumer-importing receivers landed (#204 k8sevents, #205 clockreceiver, #207 otlphttp — all PR-B2-shape ports off canonical #201). `internal/config/` deleted too; YAML loading delegated to upstream `confmap` providers. **PR-K.2 landed**: `components/receivers/{clockreceiver,kernelevents,k8sevents,containerstdout}/` deleted along with `tools/failure-inject/xidgen/` and the `containerstdout-on-values.yaml` chart fixture; the four upstream-OTel recipes shipped in PR-J (#195) are now the canonical replacements. The bundled `components/exporters/stdoutexporter/` canonical example remains; `clockreceiver` was replaced by `hostmetricsreceiver` (loadscraper @ 1s) per PR-E unblocking (#180) — the originally-planned `telemetrygeneratorreceiver` does not exist in opentelemetry-collector-contrib at any tag. +- **Status (RFC-0013):** DELETED at v0.1.0 (pipeline boot path) - replaced by OCB-generated `main.go` from `builder-config.yaml`. **PR-A2 landed (#189)**: `cmd/tracecore/` deleted (3,032 LOC across 14 source + 7 test files); the OCB binary at `./_build/tracecore` is the canonical entry point. **PR-F.1 landed (#206)**: `internal/selftelemetry/` + `internal/telemetry/` deleted; every receiver/exporter now travels its own `selftel.go` + `lifecycle.go` siblings (PR-B1-shape sibling ports: #184/#185/#186/#187/#188/#193/#194/#196/#197). 
**PR-F.2 landed**: `internal/{componentstatus,pipeline,pipelinebuilder,config,consumer,fanout,runtime/lifecycle}` all deleted (56 files / -6,888 LOC) after the last pipeline+consumer-importing receivers landed (#204 k8sevents, #205 clockreceiver, #207 otlphttp — all PR-B2-shape ports off canonical #201). `internal/config/` deleted too; YAML loading delegated to upstream `confmap` providers. **PR-K.2 landed**: `components/receivers/{clockreceiver,kernelevents,k8sevents,containerstdout}/` deleted along with `tools/failure-inject/xidgen/` and the `containerstdout-on-values.yaml` chart fixture; the four upstream-OTel recipes shipped in PR-J (#195) are now the canonical replacements. The bundled `components/exporters/stdoutexporter/` canonical example was retired post-v0.2.0 alongside `components/exporters/otlphttp/` (closes #333 + #334) once upstream `debugexporter` + `otlphttpexporter` in the OCB build took over; `clockreceiver` was replaced by `hostmetricsreceiver` (loadscraper @ 1s) per PR-E unblocking (#180) — the originally-planned `telemetrygeneratorreceiver` does not exist in opentelemetry-collector-contrib at any tag. - **Depends on:** none (foundational) - **Reference:** [RFC-0003](rfcs/0003-pipeline-runtime-and-component-contract.md). The contract (formerly documented in `internal/pipeline/README.md`) was superseded at v0.1.0 by upstream `go.opentelemetry.io/collector/{component,receiver,processor,exporter,consumer,pipeline}` per RFC-0013. diff --git a/docs/README.md b/docs/README.md index 4e29fdc4..b6295acc 100644 --- a/docs/README.md +++ b/docs/README.md @@ -51,12 +51,12 @@ Backend (exporter-side) recipes: | File | Audience | Purpose | |---|---|---| -| [integrations/otel-backend.md](integrations/otel-backend.md) | 👤 | OTLP/HTTP to a generic OpenTelemetry Collector via the in-tree `otlphttp` exporter. | -| [integrations/honeycomb.md](integrations/honeycomb.md) | 👤 | Direct OTLP/HTTP to Honeycomb via the in-tree `otlphttp` exporter. 
| +| [integrations/otel-backend.md](integrations/otel-backend.md) | 👤 | OTLP/HTTP to a generic OpenTelemetry Collector via the upstream `otlphttp` exporter. | +| [integrations/honeycomb.md](integrations/honeycomb.md) | 👤 | Direct OTLP/HTTP to Honeycomb via the upstream `otlphttp` exporter. | | [integrations/datadog.md](integrations/datadog.md) | 👤 | Datadog via the bundled `datadogexporter`. | | [integrations/clickhouse-direct.md](integrations/clickhouse-direct.md) | 👤 | Self-hosted ClickHouse via the bundled `clickhouseexporter`. | | [integrations/loki.md](integrations/loki.md) | 👤 | Grafana Loki via OTLP/HTTP native ingestion (`otlphttp` exporter, `X-Scope-OrgID` tenant header); labels-vs-structured-metadata mapping for `pattern.*` verdict attributes. | -| [integrations/tempo.md](integrations/tempo.md) | 👤 | Grafana Tempo (OSS, AGPL-3.0) trace backend via the in-tree `otlphttp` exporter. | +| [integrations/tempo.md](integrations/tempo.md) | 👤 | Grafana Tempo (OSS, AGPL-3.0) trace backend via the upstream `otlphttp` exporter. | | [integrations/multi-cluster.md](integrations/multi-cluster.md) | 👤 | Multi-cluster federation v0 (read-only roll-up): N source clusters stamp `cluster.id` via OTTL transform, forward OTLP/HTTP to a central aggregation collector that fans out to backends. | | [integrations/cert-manager-mtls.md](integrations/cert-manager-mtls.md) | 👤 | cert-manager-issued mTLS for multi-cluster OTLP egress: ClusterIssuer + Certificate manifests, Secret mount layout, `tls:` block wiring under `/etc/tracecore/tls/`. | @@ -77,7 +77,6 @@ Source (receiver-side) recipes — RFC-0013 §migration PR-J replacements for th | [module/receiver/ncclfrreceiver/RUNBOOK.md](../module/receiver/ncclfrreceiver/RUNBOOK.md) | 👤 | Operator playbook + per-kind triage (incl. pickle deny-boundary). | | [components/receivers/pyspy/README.md](../components/receivers/pyspy/README.md) | 👤 🛠️ | On-demand Python stack-sampling receiver (faulthandler-based). 
- scheduled for deletion per RFC-0013 §7 | | [components/receivers/pyspy/RUNBOOK.md](../components/receivers/pyspy/RUNBOOK.md) | 👤 | Operator playbook + per-kind triage (RFC-0009 degraded modes). - scheduled for deletion per RFC-0013 §7 | -| [components/exporters/otlphttp/README.md](../components/exporters/otlphttp/README.md) | 👤 🛠️ | OTLP/HTTP exporter - production sink to an OTel collector or backend. | ## What goes where (for contributors) diff --git a/docs/STRATEGY.md b/docs/STRATEGY.md index 08833b7e..b535f2b8 100644 --- a/docs/STRATEGY.md +++ b/docs/STRATEGY.md @@ -165,12 +165,15 @@ posture): ### In-tree receivers under `components/` are queued for deletion The current `components/receivers/{clockreceiver,dcgm,k8sevents, -kernelevents,kineto,kueue,pyspy}` and `components/exporters/stdoutexporter` -are scheduled for removal per RFC-0013 §7 (v0.1.0 / v0.2.0 / v0.3.0 -boundaries). They are replaced by upstream + contrib receivers wired -through the OCB manifest, with operator-facing semantics preserved by -the OTTL normalization layer in the bundled recipe. See the migration -guide that ships with v0.2.0 for operator-visible deltas. +kernelevents,kineto,kueue,pyspy}` are scheduled for removal per +RFC-0013 §7 (v0.1.0 / v0.2.0 / v0.3.0 boundaries). They are replaced +by upstream + contrib receivers wired through the OCB manifest, with +operator-facing semantics preserved by the OTTL normalization layer +in the bundled recipe. See the migration guide that ships with v0.2.0 +for operator-visible deltas. The matching in-tree exporter wrappers +(`components/exporters/{otlphttp,stdoutexporter}`) were deleted +post-v0.2.0 once the OCB build switched to upstream `otlphttpexporter` ++ `debugexporter` (issues #333, #334). 
### `internal/` by default, `pkg/` requires an accepted RFC diff --git a/docs/followups/otlphttp.md b/docs/followups/otlphttp.md deleted file mode 100644 index 1bfbb535..00000000 --- a/docs/followups/otlphttp.md +++ /dev/null @@ -1,240 +0,0 @@ -# otlphttp + install-bench review deferrals (post-PR #101) - -## Status (RFC-0013, 2026-05-22) - -`otlphttpexporter` is upstream (CNCF OTel) and is consumed directly -via the OCB-assembly per RFC-0013 §2 — no intermediate -`otelcol-contrib` binary. The in-tree `components/exporters/otlphttp/` -wrapper is replaced at v0.1.0 by direct OCB inclusion of -`go.opentelemetry.io/collector/exporter/otlphttpexporter`. - -Most items below become either `[UPSTREAM]` (propose against -`otlphttpexporter` directly) or `[RECIPE]` (operator-facing -recipe content + chart values + bundled YAML examples for env-var -expansion, Secret + envFrom + headers wiring, etc.). Items that -reference the in-tree wrapper code path (custom `signal` Stringer, -the local capturing fake, custom CRLF Validate logic, retry-counter -implementation details, gzip pooling internals) become `[STRIKE]` — -upstream `otlphttpexporter` already provides these surfaces, and -the wrapper's audit trail closes when the wrapper is deleted. - -Bench-harness work (`bench/install`) survives — it now exercises -the OCB-assembled binary's bundled `otlphttpexporter`. Items there -remain `[KEEP]`. - -Repo-wide / convention rows (doc-vs-code parity gate, workflow -paths trigger audit, MILESTONES.md status flips, counting-fake -helper) remain `[KEEP]` — they apply to the OCB-assembled binary -equally well. - -Per-item tags below: `[STRIKE]` where the in-tree wrapper is the -substrate; `[UPSTREAM]` where the work belongs at -`open-telemetry/opentelemetry-collector` (`otlphttpexporter`); -`[RECIPE]` where the work moves to bundled chart values + OTel YAML -examples; `[KEEP]` for bench + repo-wide items. 
- -## otlphttp + install-bench review deferrals (post-PR #101) - -Deferred reviewer findings from the 5-phase rigorous review of PR -#101. Tagged by blocking resource so future implementers can route -each item correctly. - -### Code work (no external resource needed) - -- [ ] **otlphttp: retries_total counter.** Operators can't detect - retry storms in telemetry today; `tracecore_exporter_calls_total` - counts one outcome per Consume call regardless of how many - retries happened underneath. Adds an exporter-local - `retries_total{kind=retryable|network}` counter. *Source:* - P3-Rev2 F2. *Trigger:* first operator ticket reporting "is - the backend flaky?" with no signal to answer it. -- [ ] **otlphttp: selftelemetry init mirror stdoutexporter.** The - stdoutexporter init path records an `IncInitError` metric + - WARN log when the meter-provider build fails; otlphttp - silently substitutes the noop. Either align both, or pull - the shared bit into `selftelemetry` itself. *Source:* P2 - Maint M-1. *Trigger:* a future exporter author copies - otlphttp's pattern and loses the init-error signal. -- [ ] **otlphttp: MaxBodyBytes config + Validate at send time.** - `pmetric.ProtoMarshaler` can produce arbitrarily large - bodies if the upstream pipeline batches without limit. - *Source:* P3-Rev2 F12. *Trigger:* OOM ticket from a - no-batchprocessor pipeline. -- [ ] **otlphttp: scheme-mismatch WARN on per-signal endpoint - override.** Operator sets `endpoint: https://...` and - `metrics_endpoint: http://...` and no observable signal - that one signal is shipping cleartext. *Source:* P3-Rev2 - F8. *Trigger:* first "I thought TLS was on for all signals" - incident. 
-- *Closed: Validate now rejects `\r`/`\n` in header values - (`components/exporters/otlphttp/config.go`), locked by - `TestConfig_Validate_RejectsCRLFInHeaderValue`.* -- [ ] **otlphttp: env-var expansion in `headers` values.** Chart - warns against putting credentials in `config:` (lands - plaintext in ConfigMap), but the otlphttp README recommends - `headers:` for vendor auth tokens. Add `${env:VAR}` - expansion + document the Secret-mount pattern. *Source:* - P2 Adopt A4 + SRE S1. *Trigger:* first operator pastes a - Honeycomb token in ConfigMap. -- [ ] **otlphttp: gzip writer pooling via sync.Pool.** Every - compressed send allocates a fresh `gzip.NewWriter` (~256 KB - flate state). *Source:* P2 Performance #1. *Trigger:* - bench-overhead reports gzip allocations dominating ΔRSS. -- [ ] **otlphttp: `transport.MaxIdleConnsPerHost`.** Default 2 - caps effective concurrent HTTPS throughput at 2. *Source:* - P2 Performance #2. *Trigger:* install-bench overhead row - shows high TCP-handshake count. -- *Closed: `newExporter` now canonicalizes header keys once via - `textproto.CanonicalMIMEHeaderKey` and the per-call hot path - assigns directly into `req.Header[k]`, skipping net/http's - per-call canonicalization. Falsifier - `TestExporter_AppliesNonCanonicalHeaderKey` will fail if the - construction-time canonicalization silently drops a lowercase key.* -- *Closed: retry-sleep select now uses `time.NewTimer` with - explicit `Stop` on the ctx/shutdown branches, so the - underlying goroutine/channel doesn't leak past the cancel.* -- [ ] **otlphttp: drop or document drainAndClose 64 KiB body - cap.** Cap breaks keep-alive on adversarial bodies. *Source:* - P3-Rev2 F5. *Trigger:* backend that sends verbose - partial-success responses causes keep-alive churn. -- [ ] **otlphttp: goleak.VerifyNone test adoption.** Current - goroutine-leak test uses homegrown `runtime.NumGoroutine` - deltas. *Source:* P3-Rev2 F7 / P4 A7. *Trigger:* CI - observes a flake on the current test. 
-- [ ] **otlphttp: FuzzParseRetryAfter target.** External-input - parser; matches the `ci-fuzz-nccl-fr` precedent. *Source:* - P3-Rev2 F9 / P4 A10. *Trigger:* repo-wide fuzz coverage - pass. -- *Closed: `signal` now implements `fmt.Stringer`; structured-log - attributes render "metrics"/"traces"/"logs" instead of the - underlying int. Locked by `TestSignal_StringMatchesOTLPPathSuffix` - + `TestSignal_StringUnknownIsLabeled` in - `components/exporters/otlphttp/signal_internal_test.go`.* -- *Closed: README "Retry behavior" now cites RFC 9110 §10.2.3 (the - current authoritative HTTP-date production; obsoleted RFC 7231 in - 2022) covering IMF-fixdate / RFC 850 / asctime, matching - `http.ParseTime`'s actual acceptance set.* -- *Closed: `ConsumeLogs` no longer carries the duplicate - `inflight.Add(1)` / `defer Done()` pair; bookkeeping now matches - `ConsumeMetrics` and `ConsumeTraces` (one Add / one Done per call). - Shutdown drain semantics unchanged (the LIFO defers had kept the - counter balanced; the spike was cosmetic).* -- [ ] **otlphttp: RFC 7230 §3.2 broader CTL-byte rejection in header - values.** PR #116 closed `\r`/`\n` (the CRLF-injection vector). - RFC 7230 §3.2 also forbids NUL (U+0000), VT (U+000B), FF - (U+000C), and other CTL bytes in field-value. Go's net/http - sanitizes `\r`/`\n` to spaces at write time but does NOT reject - NUL or other CTL bytes; they pass to the wire. Defense-in- - depth would extend `Validate` to reject the full forbidden set. - *Source:* PR #116 security lens. *Trigger:* first operator - report of a malformed-header incident traced to a NUL/VT/FF - slipping through, OR a strict-RFC-compliance audit. -- [ ] **otlphttp: case-variant duplicate header keys silently - dedupe.** Operator YAML with both `x-auth` and `X-Auth` keeps - whichever map-iteration order picked last (nondet). 
Behavior - is unchanged from the pre-canonicalization path (`req.Header.Set` - also canonicalized at send), but the new - construction-time canonicalization makes the trap more - visible. Either reject the collision at `Validate` (one - canonical form per operator-supplied key) or document the - dedup rule in the README. *Source:* PR #116 performance + - maintainer lenses. *Trigger:* first operator report of - "I set both header variants and only one arrived." -- [ ] **otlphttp: RUNBOOK entry for the new `Validate` rejection - messages.** `"header X-Auth value contains a CR or LF - character"` is actionable on its own, but a sentence in - RUNBOOK pointing at YAML-templating bugs as the likely root - cause would shorten triage for an operator hitting it - cold. *Source:* PR #116 operator lens. *Trigger:* first - operator report of the CR/LF rejection at startup. - -### Bench-harness work (no external resource) - -- [ ] **bench/install: `run.sh --selftest` flag.** Exercise the - FAIL-path JSON emission without standing up a kind cluster. - *Source:* P4 A4. *Trigger:* second failure-mode JSON field - added and the format needs CI validation. -- [ ] **bench/install: sink readinessProbe.** The Available - condition fires on container-start, not OTLP-listener-bind. - *Source:* P3-Rev1 #3. *Trigger:* CI variance on - `first_data_seconds` across reruns exceeds noise floor. -- [ ] **bench/install: NetworkPolicy posture.** sink namespace - has no NetworkPolicy; bench measures the no-NP path. - *Source:* P2 SRE S2. *Trigger:* operator install fails on - a NetworkPolicy-enabled cluster. -- [ ] **bench/install: traces + logs pipeline coverage.** Bench - wires `pipelines.metrics` only; exporter supports all three - signals. *Source:* P2 Researcher F3. *Trigger:* a - regression in the traces/logs path lands without bench - catching it. -- [ ] **bench/install: sink image digest pin.** Workflow uses - `otel/opentelemetry-collector-contrib:0.152.0` by tag. 
- *Source:* P2 SRE S5 / Researcher F5. *Trigger:* upstream - re-pushes the tag. -- [ ] **bench/install: inotify-based first-byte detection.** - `kubectl exec stat` polling at 50ms has ~80-300ms real - cadence on cold kind. *Source:* P2 Operator F4 / P3-Rev1 - #2 + #7. *Trigger:* M20 hero KPI is asserted with - sub-second precision. -- [ ] **bench/install: schema fields for `runner_arch`, - `git_dirty`.** Today `runner_label` conflates os/arch and - cleanliness is unrecorded. *Source:* P2 Maint M5. - *Trigger:* aggregator needs to split by arch. -- [ ] **bench/install: FAIL configmap dump precision.** The - diagnostic block at run.sh:110 does - `kubectl get configmap -o yaml | grep -A 60 "config.yaml"` - with no name selector and a fixed 60-line window. Today - only one ConfigMap lives in `tracecore-system` so the - output is correct in practice, but a future receiver - that ships a second ConfigMap will interleave content. - Replace with - `kubectl -n tracecore-system get configmap tracecore-config -o jsonpath='{.data.config\.yaml}'`. - *Source:* delta review post-PR #101. *Trigger:* second - ConfigMap lands in the chart, or first time a multi-CM - diagnostic dump is unreadable. -- [ ] **otlphttp + chart: operator-facing auth-headers worked - example.** values.yaml warns "DO NOT inject credentials - in `headers:`" and recommends `envFrom`, but neither the - chart docs nor `components/exporters/otlphttp/README.md` - ship an end-to-end Secret + envFrom + headers example. - First operator wiring auth will either ask, guess wrong, - or land plaintext in the ConfigMap anyway. Add a worked - block next to the warning. Likely gated on env-var - expansion landing in the exporter (already deferred). - *Source:* delta review post-PR #101. *Trigger:* first - ticket asking how to wire auth headers for a real - backend. 
- -### Repo-wide / Convention work - -- [ ] **Doc-vs-code parity gate.** Numeric defaults documented in - code constants AND in operator-facing README must match. - P2 caught a 30s vs 10s drift in this PR. *Source:* P2 - R-default-parity. *Trigger:* second doc-vs-code drift - incident. -- [x] **Workflow paths trigger extends to substrate code.** - ~~install-bench workflow now includes `cmd/tracecore/**` and - `internal/pipeline/**` (P3 fix). Other workflows should - audit their paths filters analogously.~~ *Source:* P3-Rev1 - #10. *Audit (2026-05-20):* `chart.yml` and `install-bench.yml` - both include the substrate (`cmd/tracecore/**`, - `internal/**`); ✅ substrate-aware. `chaos.yml` covers - `tools/failure-inject/**` + `module/pkg/**` only; - substrate-coupling is indirect, acceptable. *Shipped:* - `kernelevents-integration.yml` and `pyspy-integration.yml` - now include `cmd/tracecore/**` + `internal/pipeline/**` + - `internal/selftelemetry/**` in both push and pull_request - `paths:` filters, so a factory-wiring or pipeline-contract - change re-runs the two integration suites. -- [ ] **MILESTONES.md status flips bundled with delivery PRs.** - CONTRIBUTING L10 review blocker. This PR delivers M20a + - parts of M5 rubric but did not flip the checkboxes. - *Source:* P2 Contributor C2. *Trigger:* a doc-housekeeping - PR series to bring MILESTONES + CHANGELOG into line. -- [ ] **counting-fake `selftelemetry.Exporter` test helper.** - Internal `classify_internal_test.go` covers the kind-routing - pure-function path; integration tests still infer telemetry - increments from side effects. *Source:* P4 A6 (rejected for - THIS PR's scope; viable as repo-wide test infra). - *Trigger:* second exporter ships and the assertion shape - repeats. 
diff --git a/docs/migration/v0.1-to-v0.2.md b/docs/migration/v0.1-to-v0.2.md index de988000..698499b1 100644 --- a/docs/migration/v0.1-to-v0.2.md +++ b/docs/migration/v0.1-to-v0.2.md @@ -126,8 +126,8 @@ Where `` is the OCB component name without underscores. Per-component subs |---|---| | `components/receivers/nccl_fr` | `ncclfr` (note: underscore stripped) | | `components/receivers/pyspy` | `pyspy` | -| `components/exporters/otlphttp` | `otlphttp` | -| `components/exporters/stdoutexporter` | `stdoutexporter` | +| `components/exporters/otlphttp` | `otlphttp` (wrapper retired post-v0.2.0; upstream `otlphttpexporter` now emits the standard `otelcol_exporter_sent_*` / `otelcol_exporter_send_failed_*` instruments via `exporterhelper`, not the wrapper's per-component `otelcol_exporter_otlphttp_calls_total`. See `[Unreleased]` Removed in CHANGELOG.) | +| `components/exporters/stdoutexporter` | `stdoutexporter` (wrapper retired post-v0.2.0; replaced by upstream `debugexporter` whose component-id is `debug`. Chart default already uses `debug`.) | **Label shape is preserved.** `component_id` continues to partition per-instance (e.g. `ncclfr/default`); the `kind` label values are unchanged (`watch`, `parse`, etc.). Dashboards and alerts that filtered on `kind` need only the metric-name rename, not a label-selector rewrite. 
diff --git a/docs/research/baselines.md b/docs/research/baselines.md index 3cc3a9b5..062b6ec5 100644 --- a/docs/research/baselines.md +++ b/docs/research/baselines.md @@ -15,7 +15,6 @@ Command: `make coverage-check` | Package | Coverage | Threshold | |---|---:|---:| | `cmd/tracecore` | 39.8% | informational (no gate on cmd/) | -| `components/exporters/stdoutexporter` | 80.0% | ≥60% | | `components/receivers/clockreceiver` | 90.4% | ≥60% | | `internal/config` | 94.4% | ≥70% | | `internal/fanout` | 70.4% | ≥70% | diff --git a/docs/v1-rc1-simplification-audit.md b/docs/v1-rc1-simplification-audit.md index 4bd61660..fdc2eeda 100644 --- a/docs/v1-rc1-simplification-audit.md +++ b/docs/v1-rc1-simplification-audit.md @@ -31,29 +31,30 @@ here. production binary is generated by OCB from [`builder-config.yaml`](../builder-config.yaml). Its `receivers:` / `processors:` / `exporters:` lists never reference -the three packages below; the in-tree wrappers are imported only -by their own `_test.go` files. Production traffic flows through -upstream `otlphttpexporter` + `debugexporter` (lines 59-60 of +the package below; the in-tree wrapper is imported only by its own +`_test.go` files. Production traffic flows through upstream +`otlphttpexporter` + `debugexporter` (lines 59-60 of builder-config.yaml). Repro: ```sh -grep -rn 'github.com/tracecoreai/tracecore/components/exporters' \ +grep -rn 'github.com/tracecoreai/tracecore/components' \ --include="*.go" . | grep -v _test -# Only hits: instrumentation-scope string constants in selftel.go. +# Pyspy is the only remaining wrapper. ``` | Package | Non-test LOC | Total LOC | External Go importers | OCB-wired? 
| RFC status | |---|---:|---:|---|---|---| -| `components/exporters/otlphttp/` | 1,168 | 2,904 | 0 | no | [STRIKE] per `docs/followups/otlphttp.md` | -| `components/exporters/stdoutexporter/` | 392 | 1,077 | 0 | no | superseded RFC-0004 (archived) + RFC-0013 §7 | | `components/receivers/pyspy/` | 1,970 | 4,540 | 0 | no | RFC-0013 §7 v0.3.0 row → **deferred to v0.4.0+** per [#222](https://github.com/TraceCoreAI/tracecore/issues/222) | LOC source: `find -name "*.go" ! -name "*_test.go" | xargs wc -l`. The pyspy row is "dead from the binary, live in the chart's stated roadmap." Cannot ship-delete in v1.0-rc1 without resolving #222. -otlphttp + stdoutexporter wrappers have no such constraint — the -release pipeline (`make build` → OCB) does not touch either. + +The two in-tree exporter wrappers (`components/exporters/otlphttp/` +and `components/exporters/stdoutexporter/`) called out by earlier +revisions of this audit were deleted post-v0.2.0 once the OCB build +took over (issues #333 + #334 closed). **No other dead-code candidates surfaced.** Every other exported symbol under `module/pkg/`, `module/processor/`, `module/receiver/`, @@ -134,7 +135,7 @@ action. | `docs/rfcs/` | 14 + 1 archived | binding architecture decisions | RFC-0004 archived; rest active | | `docs/patterns/` | 13 | per-pattern operator docs (one per pattern) | one file per published pattern; pattern table in PRINCIPLES drives the set | | `docs/integrations/` | 11 | operator integration recipes (Loki, cert-manager, multi-cluster, etc.) 
| one file per integration; no overlap | -| `docs/followups/` | 20 | per-milestone deferral shards | overlap-by-design: `M*.md` is one file per milestone, plus topical shards (`otlphttp.md`, `opportunistic.md`, `skipped.md`) | +| `docs/followups/` | 19 | per-milestone deferral shards | overlap-by-design: `M*.md` is one file per milestone, plus topical shards (`opportunistic.md`, `skipped.md`) | | `docs/research/` | 8 | research notes feeding RFCs | mostly `[STRIKE]`-banner'd post-RFC-0013; retained as audit trail | | `docs/notes/` | 10 | session/lessons grouped by topic (newest-first) | by-design free-form per `notes/README.md` | | `docs/migration/` | 2 | per-cut migration guide | `v0.1-to-v0.2.md` + `v0.2-to-v0.3.md` | @@ -181,8 +182,8 @@ commitment), 5 = load-bearing. | Rank | Candidate | LOC saved | Risk | Reason | |---:|---|---:|---:|---| -| 1 | `components/exporters/otlphttp/` (wrapper) | 2,904 (1,168 non-test) | **1** | Not in OCB build. No Go importers. `docs/followups/otlphttp.md` tags wrapper-touching items `[STRIKE]`. Upstream `otlphttpexporter` v0.130.0 in builder-config.yaml supersedes. | -| 2 | `components/exporters/stdoutexporter/` (wrapper) | 1,077 (392 non-test) | **1** | Not in OCB build. No Go importers. RFC-0004 archived; superseding RFC-0013 §7 lists clockreceiver+stdoutexporter as the pair that goes. Upstream `debugexporter` v0.130.0 in builder-config.yaml supersedes. | +| 1 | ~~`components/exporters/otlphttp/` (wrapper)~~ | ~~2,904 (1,168 non-test)~~ | — | **Done** — deleted in #333. Upstream `otlphttpexporter` v0.130.0 in builder-config.yaml now owns this surface. | +| 2 | ~~`components/exporters/stdoutexporter/` (wrapper)~~ | ~~1,077 (392 non-test)~~ | — | **Done** — deleted in #334. Upstream `debugexporter` v0.130.0 in builder-config.yaml now owns this surface. | | 3 | `components/receivers/pyspy/` + `python/tracecore_pyspy/` + `tools/pyspy-lint/` | 5,617 total | **4** | Not in OCB build (zero Go importers from `_build/components.go`). 
BUT — RFC-0013's v0.3.0 pyspy-delete row was **deferred to v0.4.0+ per [#222](https://github.com/TraceCoreAI/tracecore/issues/222)**. `docs/migration/v0.2-to-v0.3.md` explicitly states "pyspy ships as-is in v0.3.0." Delete is blocked on #222, not on this audit. | | 4 | `docs/followups/M*.md` shards where every item is `[STRIKE]` or `landed` | ~3,552 total / shard-by-shard ~50-300 | **2** | Cosmetic; consolidate landed deferrals into RFC-0013 audit-trail footer. Per `feedback_no_bloat`: don't track when fix-now is in scope, but these are post-merge artifacts so they're not blocking. | | 5 | `docs/research/m15-container-stdout.md` + `m16-kueue.md` + `m16-kueue-production-followups.md` (already `[STRIKE]`-banner'd) | ~2,500 | **2** | All carry RFC-0013-supersession banners; bodies are retained as decision history. Could collapse into RFC-0013 §audit-trail. Same cosmetic-cleanup class as row 4. | diff --git a/docs/v1-rc1-test-audit.md b/docs/v1-rc1-test-audit.md index 72ca7ffe..81007be1 100644 --- a/docs/v1-rc1-test-audit.md +++ b/docs/v1-rc1-test-audit.md @@ -31,8 +31,6 @@ before rc1. Today's measurement: | `module/pkg/patterns` | **92.1%** | yes | every detector .go ≥ 85% func cov | | `module/processor/patterndetectorprocessor` | **86.2%** | yes | post-#309 dedup | | `module/processor/rankjoinprocessor` | **83.2%** | yes | `rankjoin.go` 79.7% func — borderline | -| `components/exporters/stdoutexporter` | **91.2%** | yes | | -| `components/exporters/otlphttp` | **81.8%** | yes | post-#306 dedup | | `components/receivers/pyspy` | **80.8%** | yes | barely; has fuzz + integration | | `module/pkg/nccl/fr_parser` | **77.7%** | **no** | gap below | | `module/pkg/replay` | **75.5%** | **no** | gap below | @@ -114,8 +112,6 @@ Integration tests found in the worktree: | `module/receiver/ncclfrreceiver` | yes (75.7%) | **no** | **high** — NCCL FR is the moat's flagship receiver and the only place a real `nccl_fr_dump_*.pkl` file is consumed end-to-end. 
Today the fuzz harness covers the parser, the unit test covers `emit`, but no test wires the full receiver factory → file-watch → emit-to-consumer chain against a binary fixture. | | `module/processor/patterndetectorprocessor` | yes (86.2%) | **no** | **medium** — covered indirectly by `internal/integration/ocb_scrape_test.go` if the patterndetector is in the OCB manifest, but no dedicated end-to-end test asserts the verdict-shape contract against a live receiver. The hermetic replay-corpus job (`chaos.yml pattern-pod-evicted`) is the de-facto integration test for one pattern only. | | `module/processor/rankjoinprocessor` | yes (83.2%) | **no** | low — single-purpose; unit tests already exercise the join semantics. | -| `components/exporters/otlphttp` | yes (81.8%) | **no** | low — retry/classify logic well-covered by unit tests; full HTTP round-trip would mostly re-test upstream `otlphttpexporter`. | -| `components/exporters/stdoutexporter` | yes (91.2%) | **no** | low — same rationale. 
| --- diff --git a/go.mod b/go.mod index c80708f8..0189a1b2 100644 --- a/go.mod +++ b/go.mod @@ -8,8 +8,6 @@ require ( github.com/stretchr/testify v1.11.1 go.opentelemetry.io/collector/component v1.59.0 go.opentelemetry.io/collector/consumer v1.59.0 - go.opentelemetry.io/collector/exporter v1.59.0 - go.opentelemetry.io/collector/exporter/exportertest v0.153.0 go.opentelemetry.io/collector/pdata v1.59.0 go.opentelemetry.io/collector/pipeline v1.59.0 go.opentelemetry.io/collector/receiver v1.59.0 @@ -218,9 +216,7 @@ require ( go.opentelemetry.io/auto/sdk v1.2.1 // indirect go.opentelemetry.io/collector/component/componenttest v0.153.0 // indirect go.opentelemetry.io/collector/consumer/consumererror v0.153.0 // indirect - go.opentelemetry.io/collector/consumer/consumertest v0.153.0 // indirect go.opentelemetry.io/collector/consumer/xconsumer v0.153.0 // indirect - go.opentelemetry.io/collector/exporter/xexporter v0.153.0 // indirect go.opentelemetry.io/collector/featuregate v1.59.0 // indirect go.opentelemetry.io/collector/internal/componentalias v0.153.0 // indirect go.opentelemetry.io/collector/pdata/pprofile v0.153.0 // indirect diff --git a/go.sum b/go.sum index f141042b..ea6502e2 100644 --- a/go.sum +++ b/go.sum @@ -71,8 +71,6 @@ github.com/catenacyber/perfsprint v0.9.1 h1:5LlTp4RwTooQjJCvGEFV6XksZvWE7wCOUvjD github.com/catenacyber/perfsprint v0.9.1/go.mod h1:q//VWC2fWbcdSLEY1R3l8n0zQCDPdE4IjZwyY1HMunM= github.com/ccojocar/zxcvbn-go v1.0.2 h1:na/czXU8RrhXO4EZme6eQJLR4PzcGsahsBOAwU6I3Vg= github.com/ccojocar/zxcvbn-go v1.0.2/go.mod h1:g1qkXtUSvHP8lhHp5GrSmTz6uWALGRMQdw6Qnz/hi60= -github.com/cenkalti/backoff/v5 v5.0.3 h1:ZN+IMa753KfX5hd8vVaMixjnqRZ3y8CuJKRKj1xcsSM= -github.com/cenkalti/backoff/v5 v5.0.3/go.mod h1:rkhZdG3JZukswDf7f0cwqPNk4K0sa+F97BxZthm/crw= github.com/cespare/xxhash/v2 v2.3.0 h1:UL815xU9SqsFlibzuggzjXhog7bL6oX9BbNZnL2UFvs= github.com/cespare/xxhash/v2 v2.3.0/go.mod h1:VGX0DQ3Q6kWi7AoAeZDth3/j3BFtOZR5XLFGgcrjCOs= github.com/charithe/durationcheck 
v0.0.10 h1:wgw73BiocdBDQPik+zcEoBG/ob8uyBHf2iyoHGPf5w4= @@ -248,12 +246,6 @@ github.com/kisielk/errcheck v1.9.0 h1:9xt1zI9EBfcYBvdU1nVrzMzzUPUtPKs9bVSIM3TAb3 github.com/kisielk/errcheck v1.9.0/go.mod h1:kQxWMMVZgIkDq7U8xtG/n2juOjbLgZtedi0D+/VL/i8= github.com/kkHAIKE/contextcheck v1.1.6 h1:7HIyRcnyzxL9Lz06NGhiKvenXq7Zw6Q0UQu/ttjfJCE= github.com/kkHAIKE/contextcheck v1.1.6/go.mod h1:3dDbMRNBFaq8HFXWC1JyvDSPm43CmE6IuHam8Wr0rkg= -github.com/knadh/koanf/maps v0.1.2 h1:RBfmAW5CnZT+PJ1CVc1QSJKf4Xu9kxfQgYVQSu8hpbo= -github.com/knadh/koanf/maps v0.1.2/go.mod h1:npD/QZY3V6ghQDdcQzl1W4ICNVTkohC8E73eI2xW4yI= -github.com/knadh/koanf/providers/confmap v1.0.0 h1:mHKLJTE7iXEys6deO5p6olAiZdG5zwp8Aebir+/EaRE= -github.com/knadh/koanf/providers/confmap v1.0.0/go.mod h1:txHYHiI2hAtF0/0sCmcuol4IDcuQbKTybiB1nOcUo1A= -github.com/knadh/koanf/v2 v2.3.4 h1:fnynNSDlujWE+v83hAp8wKr/cdoxHLO0629SN+U8Urc= -github.com/knadh/koanf/v2 v2.3.4/go.mod h1:gRb40VRAbd4iJMYYD5IxZ6hfuopFcXBpc9bbQpZwo28= github.com/kr/pretty v0.3.1 h1:flRD4NNwYAUpkphVc1HcthR4KEIFJ65n8Mw5qdRn3LE= github.com/kr/pretty v0.3.1/go.mod h1:hoEshYVHaxMs3cyo3Yncou5ZscifuDolrwPKZanG3xk= github.com/kr/text v0.2.0 h1:5Nx0Ya0ZqY2ygV366QzturHI13Jq95ApcVaJBhpS+AY= @@ -303,14 +295,10 @@ github.com/mattn/go-shellwords v1.0.12 h1:M2zGm7EW6UQJvDeQxo4T51eKPurbeFbe8WtebG github.com/mattn/go-shellwords v1.0.12/go.mod h1:EZzvwXDESEeg03EKmM+RmDnNOPKG4lLtQsUlTZDWQ8Y= github.com/mgechev/revive v1.9.0 h1:8LaA62XIKrb8lM6VsBSQ92slt/o92z5+hTw3CmrvSrM= github.com/mgechev/revive v1.9.0/go.mod h1:LAPq3+MgOf7GcL5PlWIkHb0PT7XH4NuC2LdWymhb9Mo= -github.com/mitchellh/copystructure v1.2.0 h1:vpKXTN4ewci03Vljg/q9QvCGUDttBOGBIa15WveJJGw= -github.com/mitchellh/copystructure v1.2.0/go.mod h1:qLl+cE2AmVv+CoeAwDPye/v+N2HKCj9FbZEVFJRxO9s= github.com/mitchellh/go-homedir v1.1.0 h1:lukF9ziXFxDFPkA1vsr5zpc1XuPDn/wFntq5mG+4E0Y= github.com/mitchellh/go-homedir v1.1.0/go.mod h1:SfyaCUpYCn1Vlf4IUYiD9fPX4A5wJrkLzIz1N1q0pr0= github.com/mitchellh/mapstructure v1.5.0 
h1:jeMsZIYE/09sWLaz43PL7Gy6RuMjD2eJVyuac5Z2hdY= github.com/mitchellh/mapstructure v1.5.0/go.mod h1:bFUtVrKA4DC2yAKiSyO/QUcy7e+RRV2QTWOzhPopBRo= -github.com/mitchellh/reflectwalk v1.0.2 h1:G2LzWKi524PWgd3mLHV8Y5k7s6XUvT0Gef6zxSIeXaQ= -github.com/mitchellh/reflectwalk v1.0.2/go.mod h1:mSTlrgnPZtwu0c4WaC2kGObEpuNDbx0jmZXqmk4esnw= github.com/modern-go/concurrent v0.0.0-20180228061459-e0a39a4cb421/go.mod h1:6dJC0mAP4ikYIbvyc7fijjWJddQyLn8Ig3JB5CqoB9Q= github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd h1:TRLaZ9cD/w8PVh93nsPXa1VrQ6jlwL5oN8l14QlcNfg= github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd/go.mod h1:6dJC0mAP4ikYIbvyc7fijjWJddQyLn8Ig3JB5CqoB9Q= @@ -500,20 +488,10 @@ go.augendre.info/fatcontext v0.8.0 h1:2dfk6CQbDGeu1YocF59Za5Pia7ULeAM6friJ3LP7lm go.augendre.info/fatcontext v0.8.0/go.mod h1:oVJfMgwngMsHO+KB2MdgzcO+RvtNdiCEOlWvSFtax/s= go.opentelemetry.io/auto/sdk v1.2.1 h1:jXsnJ4Lmnqd11kwkBV2LgLoFMZKizbCi5fNZ/ipaZ64= go.opentelemetry.io/auto/sdk v1.2.1/go.mod h1:KRTj+aOaElaLi+wW1kO/DZRXwkF4C5xPbEe3ZiIhN7Y= -go.opentelemetry.io/collector/client v1.59.0 h1:OzT7FdIiFVWKGU/UPgVDuCQtaI/GwTjLqWVz14IPGTI= -go.opentelemetry.io/collector/client v1.59.0/go.mod h1:1V5wVLhMZIMxeVoRK5AVHXnK2Ozyn6WNtmzMAjIMwY0= go.opentelemetry.io/collector/component v1.59.0 h1:WtulkwzsdAOM/LE0cH/IiudUgiyb2ueVDDeEh5HsXzo= go.opentelemetry.io/collector/component v1.59.0/go.mod h1:5eQCM0tS6qbG2XJ6Bt67kgCxuhCbg4csVv6SywyHZ6w= go.opentelemetry.io/collector/component/componenttest v0.153.0 h1:u6O1l+9PUNnI22k8q4sF9wgLAUyt+y2Ov7xi1yZ+wE4= go.opentelemetry.io/collector/component/componenttest v0.153.0/go.mod h1:Ym/d5UxnoHmrCI5LsVANBicikNiqtwjk8uWBM+Z4jDM= -go.opentelemetry.io/collector/config/configoptional v1.59.0 h1:kP751PU6KMxpw64DTPLD6vNDPjxqNJqFo7fPemu65Lc= -go.opentelemetry.io/collector/config/configoptional v1.59.0/go.mod h1:+TnLr1EMcsR4jJGoBHOCbifgunBjV3rHWNlDWPgBJcE= -go.opentelemetry.io/collector/config/configretry v1.59.0 
h1:Y/V2GjMtfU4daUM1ZcjxI1woorK8nop8VBVagJ5HmLc= -go.opentelemetry.io/collector/config/configretry v1.59.0/go.mod h1:1BoQ5SvJT751bqP/5g0VTPLkNgMtvifAr2QqMCVOv2o= -go.opentelemetry.io/collector/confmap v1.59.0 h1:asxEEiWwNuGfmTVSctYArOTF8jcxYpS4NmPtvvFhNNI= -go.opentelemetry.io/collector/confmap v1.59.0/go.mod h1:X5SjFINrF0cb0hcM8PwnW2z+0L/pHA1H1yPPtm2j0tY= -go.opentelemetry.io/collector/confmap/xconfmap v0.153.0 h1:RSn7C87eBp1iuWip3hX8v9AR8adMze1usJxwumefkq4= -go.opentelemetry.io/collector/confmap/xconfmap v0.153.0/go.mod h1:4SuPyutUiirRqQoPeY+13lLYkOQ2opMQgKvKACnXh1k= go.opentelemetry.io/collector/consumer v1.59.0 h1:EFDQ8ZTWwHEratWusKgW26W20LeAWdXj235iD9MxpuA= go.opentelemetry.io/collector/consumer v1.59.0/go.mod h1:wrOSfXkKjzEeHYxhKGukuB9+NFV/CboWwEH1oEXhMtw= go.opentelemetry.io/collector/consumer/consumererror v0.153.0 h1:x64J2xnSOoOp3yZRWioZ03vNAF19aBioUsRlZ4eC6x8= @@ -522,18 +500,6 @@ go.opentelemetry.io/collector/consumer/consumertest v0.153.0 h1:fCqkOWNVK/qipg4S go.opentelemetry.io/collector/consumer/consumertest v0.153.0/go.mod h1:YfVqyCrvEfvsbbrpfiH97fITh4vmIBzZT112tUluqx0= go.opentelemetry.io/collector/consumer/xconsumer v0.153.0 h1:Jt1Aiiq9zDILzppRTEdZXaA830KBjjP1i2kWL+h5br0= go.opentelemetry.io/collector/consumer/xconsumer v0.153.0/go.mod h1:IzJ/dTOtEUMjY0DYu6kO946u5jbiDDUG8+sLaLhJ+HY= -go.opentelemetry.io/collector/exporter v1.59.0 h1:euVTD79AVw8eSKi/OhX3iLBV/EnY9KFY10z7yCTN7xQ= -go.opentelemetry.io/collector/exporter v1.59.0/go.mod h1:VVpK5byUW0r/ncXT9RltZGlzMp41sZj0i3F8pq9ESCQ= -go.opentelemetry.io/collector/exporter/exporterhelper v0.153.0 h1:PdRWHJIJguacgkMd3EDMDYGeNYAJS9dsTQflj7cCi1Q= -go.opentelemetry.io/collector/exporter/exporterhelper v0.153.0/go.mod h1:T+Pd0Zzlih2g/b2Yl70tA8XRENKv3ssJc/3hNvNuPoo= -go.opentelemetry.io/collector/exporter/exportertest v0.153.0 h1:kXrli6XWylPE7TjkgEhnv3ZnyW5tX8YzPxlsCH3alF0= -go.opentelemetry.io/collector/exporter/exportertest v0.153.0/go.mod h1:nMQIX7zf0eUA3o66WEdFaRouIcVKPJye6NG5FeNBbKs= 
-go.opentelemetry.io/collector/exporter/xexporter v0.153.0 h1:u2HeNDQdILoC1mtd76d4pbBTU6WjU2d8g3SKR+iWN4A= -go.opentelemetry.io/collector/exporter/xexporter v0.153.0/go.mod h1:3k51y+RlA8nMm0Y2NXkWtmYBs48ZhYbxHqiecobVUU4= -go.opentelemetry.io/collector/extension v1.59.0 h1:fewl/jU67OT2LePe3xNbYdMog4KruN4TZbvr0haoNFE= -go.opentelemetry.io/collector/extension v1.59.0/go.mod h1:OPhsUCwdQ6Z8k8KXnmxTt6gmsWTBxrkIvL+/SoHiX8Y= -go.opentelemetry.io/collector/extension/xextension v0.153.0 h1:X81P0kRlWc2gXXxb4/XHzSuDTzs0DGOwjUk4trFa8gw= -go.opentelemetry.io/collector/extension/xextension v0.153.0/go.mod h1:4V5uGnqdaUcXQV6HXjs9VF89dq+y7Icex96JCANVur0= go.opentelemetry.io/collector/featuregate v1.59.0 h1:pu70/9eWRjAjzGnr3VmqwY+k6fmU3esLp15AqxfBBz0= go.opentelemetry.io/collector/featuregate v1.59.0/go.mod h1:4ga1QBMPEejXXmpyJS8lmaRpknJ3Lb9Bvk6e420bUFU= go.opentelemetry.io/collector/internal/componentalias v0.153.0 h1:tr5I5hsJPJBbVPGjvOVVPPK4dLR5ZLqVEckuOfnzzMs= @@ -546,12 +512,8 @@ go.opentelemetry.io/collector/pdata/pprofile v0.153.0 h1:IurcS9g28cVrtIvc1oUVN8v go.opentelemetry.io/collector/pdata/pprofile v0.153.0/go.mod h1:nX33dLKrwdoz/FQeIs6JKPeZcS1KbU2VIOwH7K0F05c= go.opentelemetry.io/collector/pdata/testdata v0.153.0 h1:pO/+b8Rz5PG0G+TskdYdQEjhXnNG2bQoiRRMwic6X0I= go.opentelemetry.io/collector/pdata/testdata v0.153.0/go.mod h1:JVjUfJv9YFL3JUhisxMdLe2DX7we227Ry9JiTgwlEro= -go.opentelemetry.io/collector/pdata/xpdata v0.153.0 h1:aN+kNiUUMOzmgrMSn/jwUQFSkrCKDc6wzSA3ZYv5+V0= -go.opentelemetry.io/collector/pdata/xpdata v0.153.0/go.mod h1:o3qYgdBimJtEPHefjOrB3WPIHtNenT2lHuHrYBRAPE4= go.opentelemetry.io/collector/pipeline v1.59.0 h1:AyEcAPy5ZuUFpqis9i97WEIAcFh/mEEo90+1wr4urNU= go.opentelemetry.io/collector/pipeline v1.59.0/go.mod h1:RD90NG3Jbk965Xaqym3JyHkuol4uZJjQVUkD9ddXJIs= -go.opentelemetry.io/collector/pipeline/xpipeline v0.153.0 h1:b2GvTGxFqcRFHARtU95/pOh9uhdY8TDTcrA5TOy3mhs= -go.opentelemetry.io/collector/pipeline/xpipeline v0.153.0/go.mod 
h1:EPU4bxOCNAI6Jny7ijUzRNATGNkcTMMBazY+NK2Ifng= go.opentelemetry.io/collector/receiver v1.59.0 h1:2jf2guECLyWno3OqNaefjfo+Tt3MSm6uz1WsbIXv5Ao= go.opentelemetry.io/collector/receiver v1.59.0/go.mod h1:rklWFy0kaMMghuGlrTxqYP96V810HwDVwIgKpP6kn2o= go.opentelemetry.io/collector/receiver/receivertest v0.153.0 h1:YHvywYcyV1G0I5r/J0OxgOJZPLLyrMAMA0Q7VdBYFxc= diff --git a/module/pkg/selftel/selftel.go b/module/pkg/selftel/selftel.go index 5a8e6f5e..c3870ac8 100644 --- a/module/pkg/selftel/selftel.go +++ b/module/pkg/selftel/selftel.go @@ -1,10 +1,12 @@ // SPDX-License-Identifier: Apache-2.0 // Package selftel is the shared self-telemetry plumbing for in-repo -// receivers + exporters. It replaces four hand-rolled selftel.go -// copies (one each in components/exporters/otlphttp, -// components/exporters/stdoutexporter, components/receivers/pyspy, -// and module/receiver/ncclfrreceiver). The per-component +// receivers + exporters. It originated as a deduplication of four +// hand-rolled selftel.go copies (one each in the in-tree otlphttp + +// stdoutexporter wrappers — both deleted post-v0.2.0 once upstream +// otlphttpexporter + debugexporter took over the OCB build — plus +// components/receivers/pyspy and module/receiver/ncclfrreceiver). 
+// The per-component // instrument-name / scope-name choices stay with the caller (those are // the operator-facing contracts pinned by upstream OTel collector // `otelcol___` convention per RFC-0013 From e3cddcfcc7ff5b203743523a73ac17ee2b2200f8 Mon Sep 17 00:00:00 2001 From: Tri Lam Date: Mon, 1 Jun 2026 01:48:55 -0700 Subject: [PATCH 2/2] docs(failure-modes): drop stale in-tree-wrapper footnotes Signed-off-by: Tri Lam --- docs/FAILURE-MODES.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/FAILURE-MODES.md b/docs/FAILURE-MODES.md index 40d33009..cc2977eb 100644 --- a/docs/FAILURE-MODES.md +++ b/docs/FAILURE-MODES.md @@ -93,10 +93,10 @@ existing alerts survive the swap from in-tree receiver to upstream + OTTL. | Scenario | Behaviour | Test | |---|---|---| -| 🟢 Exporter unreachable (network error mid-send) | Upstream `otlphttpexporter` retries on retryable HTTP status codes (429/502/503/504) and on network errors with exponential backoff via `exporterhelper`'s retry-sender; the failure surfaces via `otelcol_exporter_send_failed_*` counters. | Upstream `go.opentelemetry.io/collector/exporter/otlphttpexporter`; the in-tree wrapper retired post-v0.2.0 (issue #333). | +| 🟢 Exporter unreachable (network error mid-send) | Upstream `otlphttpexporter` retries on retryable HTTP status codes (429/502/503/504) and on network errors with exponential backoff via `exporterhelper`'s retry-sender; the failure surfaces via `otelcol_exporter_send_failed_*` counters. | Upstream `go.opentelemetry.io/collector/exporter/otlphttpexporter`. | | 🟢 Vendor SDK failure (`dcgm-exporter` unreachable at Start) | `prometheusreceiver` records the scrape failure and emits `up=0`; the pipeline continues without the scrape target's contribution rather than failing the whole binary. Source: upstream `prometheusreceiver` scraping `dcgm-exporter` per the bundled recipe. | recipe-level alert `DCGMReceiverDegraded`; see `tracecore-recipes` chart. 
| | 🟢 Config invalid (unknown top-level field) | Upstream `confmap` returns a path-tagged error citing the offending key; `tracecore validate` exits non-zero before any I/O. | Upstream `confmap`; RFC-0013 PR-F.2 retired the legacy in-tree `internal/config` test. | -| 🟢 Config invalid (bad exporter endpoint) | Upstream `otlphttpexporter` rejects non-http/https endpoints at config-load time via `confighttp.ClientConfig.Validate`; `tracecore validate` exits non-zero before any I/O. | Upstream `go.opentelemetry.io/collector/config/confighttp`; the in-tree wrapper retired post-v0.2.0 (issue #333). | +| 🟢 Config invalid (bad exporter endpoint) | Upstream `otlphttpexporter` rejects non-http/https endpoints at config-load time via `confighttp.ClientConfig.Validate`; `tracecore validate` exits non-zero before any I/O. | Upstream `go.opentelemetry.io/collector/config/confighttp`. | ## Self-telemetry surface (M2)