diff --git a/CHANGELOG.md b/CHANGELOG.md index 6b41577b..9a94a82c 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -30,6 +30,15 @@ Pre-alpha. The CLI runs the M1 pipeline runtime end-to-end via factory-based ass - **`internal/selftelemetry`** — producer-side `Receiver` interface (`IncError`, `IncEmissions`, `ObserveLatency`, `SetDegraded`, `MarkActivity`) that components write to when reporting their own health, plus a noop default. The `/metrics` endpoint that surfaces these to operators is owned by M2; this package lets M8+ receivers wire to self-telemetry from day one without waiting for M2. - **`docs/agents/REVIEWER-CONTEXT.md`** — pre-digested standards bundle for review subagents launched from the parallel-agent ralph-loops (M8, M9, M10+). Consolidates the must-read entries from `STYLE.md`, `PRINCIPLES.md`, `STYLE-errors.md`, `NORTHSTARS.md`, and the current divergences table. - **`docs/agents/RECEIVER-PATTERNS.md`** — cross-loop knowledge sink. Each parallel-agent receiver loop appends patterns future receiver authors should inherit (build-tag layout, streaming-source lifecycle, subprocess teardown, self-telemetry wiring, cardinality cap, degraded-mode re-arm, correlation-context propagation). +- **M2 — self-telemetry surface (`/metrics`, `/healthz`, `/readyz`) + MeterProvider + ReportStatus alignment.** [RFC-0006](docs/rfcs/RFC-0006-self-telemetry-surface.md): + - `internal/telemetry.NewMeterProvider()` constructs the OTel SDK MeterProvider backed by an OTel Prometheus exporter writing to a tracecore-owned `*prometheus.Registry`. + - `internal/telemetry.Server` mounts `/metrics`, `/healthz`, `/readyz` on a single listener. Default OFF; default `listen: "localhost:8888"` when enabled. + - `internal/selftelemetry.NewReceiver(id, mp)` real impl backing the 5-method interface. Receivers acquire it from `TelemetrySettings.MeterProvider` in one line. + - `clockreceiver` wired as the canonical example: ObserveLatency around every push, IncEmissions/MarkActivity on success, IncError("downstream") on failure. + - `cmd/tracecore` plumbs MeterProvider + BuildInfo + Server lifecycle. ReadyFn flips when Runtime.Start returns. + - **Closes three STRATEGY M2 divergence rows:** `Host.ReportStatus` → free fn `internal/componentstatus.ReportStatus(host, ev)`; `CreateSettings` gains `BuildInfo` + `_ struct{}` guard; `TelemetrySettings` gains `MeterProvider` + `_ struct{}` guard. + - End-to-end integration tests pin the operator-observable contract: scrape returns 200 + expected metric names; default-off does not bind a port. + - **O2 SLO observable gauges** (`tracecore.exporter.failure_rate`, `tracecore.queue.depth_ratio`, `tracecore.component.restart_count_per_hour`): `failure_rate` is a **rolling 60s window** rate driven by real exporter signal via `selftelemetry.Exporter` wired into `stdoutexporter` (the lifetime cumulative ratio would have pinned the gauge above 0 forever after a single failure — useless for SLO alerting); `queue.depth_ratio` and `restart_count_per_hour` report 0 today (carry-forward until tracecore has a queue mechanism + a runtime restart mechanism). Raw counter `tracecore.exporter.calls_total{result,kind}` also surfaces so operators can derive custom windows in PromQL. ### Changed diff --git a/MILESTONES.md b/MILESTONES.md index 4a1ec058..4e1f18ae 100644 --- a/MILESTONES.md +++ b/MILESTONES.md @@ -8,6 +8,8 @@ See [`NORTHSTARS.md`](NORTHSTARS.md) §O6 for the governing operating rules. **Target hit-rate:** ≥80% of the milestones below shipped on the committed week. **Scope:** in-repo work only. Standards-body engagement (NORTHSTARS.md O4) and external community work are recurring cadences, not quarterly milestones — they live in [`NORTHSTARS.md`](NORTHSTARS.md), not here. +**Status glyphs:** ☐ planned / not yet started · ☑ delivered · ☒ deliberately not done (policy choice or out-of-scope) · ⊟ carry-forward (intent valid, deferred for a documented reason). + ## Status legend - ☐ **planned** — committed at quarter start @@ -81,17 +83,28 @@ Must reach `⧗` before Phase 2 broadly starts. Eight foundation tracks; most ca ### M2. Self-telemetry surface -- **Status:** ☐ +- **Status:** ☑ (delivered) - **Owner:** observability lead -- **Files touched:** `internal/telemetry/*.go`, `cmd/tracecore/main.go` (wire-up) +- **Files touched:** `internal/telemetry/*.go`, `internal/selftelemetry/impl*.go`, `internal/componentstatus/*.go`, `internal/pipeline/{component,factory,runtime}.go`, `internal/pipelinebuilder/builder.go`, `internal/config/{config,load,telemetry_test}.go`, `components/receivers/clockreceiver/clockreceiver.go`, `cmd/tracecore/{collect,integration_telemetry_test}.go`, [RFC-0006](docs/rfcs/RFC-0006-self-telemetry-surface.md), `internal/telemetry/README.md`. - **Depends on:** M1 alpha - **Acceptance:** - - `/metrics` (Prometheus exposition), `/healthz`, `/readyz` endpoints on configurable port - - Optional `pprof` via `--debug.pprof` flag (off by default) - - Every component contributes ingested / dropped / queue-depth / exporter success-failure counters through this single surface (per [RFC 0001](docs/rfcs/0001-architecture-overview.md) §Self-telemetry) - - Self-telemetry SLO thresholds wired: exporter failure rate >0.1% sustained → `/readyz` reports degraded; component restart >1/hr → `/readyz` reports degraded - - Structured `slog` logging on stderr, JSON by default (per [`STYLE.md`](STYLE.md) §Logging) - - `internal/telemetry/README.md` documents the contract + - ☑ `/metrics` (Prometheus exposition), `/healthz`, `/readyz` endpoints on configurable port + - ☐ Optional `pprof` via `--debug.pprof` flag — **Carry-forward from M2:** security policy story > 5 LOC of plumbing. + - ☑ Receivers contribute self-metrics (errors/emissions/latency/degraded/last-activity) through `selftelemetry.Receiver` injected via `TelemetrySettings.MeterProvider`. + - ☑ Three O2 SLO observable gauges emitted with exact names: `tracecore.exporter.failure_rate` (driven by real exporter signal via `selftelemetry.Exporter` wired into `stdoutexporter`), `tracecore.queue.depth_ratio` (0 — **Carry-forward from M2** until queue mechanism lands), `tracecore.component.restart_count_per_hour` (0 — **Carry-forward from M2** until restart mechanism lands). Raw counter `tracecore.exporter.calls_total{result,kind}` also emitted so operators can derive richer rates. + - ☒ Self-telemetry SLO thresholds wired to `/readyz` — **Policy-declined**, not deferred. RFC-0006 explicitly chose degraded ≠ not-ready, so k8s doesn't evict pods on transient backend issues. Operators alert on the SLO gauges via Prometheus rules rather than via /readyz. (☒ glyph distinguishes a deliberate policy choice from a carry-forward.) + - ☑ Structured `slog` logging on stderr (inherited from M1). + - ☑ `internal/telemetry/README.md` documents the contract. + - ☑ Three STRATEGY M2 divergences closed (`Host.ReportStatus` → free fn, `CreateSettings` BuildInfo + guard, `TelemetrySettings` MeterProvider + guard). +- **Carry-forward from M2:** + - pprof endpoint (security policy work). + - Queue mechanism that drives `tracecore.queue.depth_ratio` (the gauge is registered + reports 0 today; needs a queue impl). + - Component restart mechanism that drives `tracecore.component.restart_count_per_hour` (gauge registered + reports 0 today; needs a runtime restart impl). + - OTLP push reader on the MeterProvider (operators on push-only backends). + - `MetricsLevel` knob (only when cardinality becomes a real problem). + - Histogram bucket tuning for `collection_latency_seconds`. + - Per-role `CreateSettings` split (`receiver.Settings` / `exporter.Settings` / `processor.Settings`). + - `TracerProvider` field on `TelemetrySettings` (tracing milestone). - **Can run alongside:** M3, M4, M4b, M5, M5b, M6 in drafting; final integration after M1 alpha. - **Effort:** ≈1 week diff --git a/Makefile b/Makefile index 96aa9f58..7242775d 100644 --- a/Makefile +++ b/Makefile @@ -1,4 +1,4 @@ -.PHONY: help build run test fmt fmt-fix vet lint lint-fix tidy tidy-check mod-verify license-check license-fix govulncheck dco-check ai-review hooks clean check ci generate generate-check coverage coverage-check doc-check +.PHONY: help build run test bench bench-check fmt fmt-fix vet lint lint-fix tidy tidy-check mod-verify license-check license-fix govulncheck dco-check ai-review hooks clean check ci generate generate-check coverage coverage-check doc-check BIN := tracecore PKG := ./cmd/tracecore @@ -34,6 +34,20 @@ run: ## Run tracecore from source (no version embedding). test: ## Run unit tests with the race detector. go test -race ./... +bench: ## Run benchmarks across the repo with -benchmem, count=5. + go test -bench=Benchmark -benchmem -benchtime=500ms -count=5 -run='^$$' ./... + +bench-check: ## Compare current bench against internal/telemetry/testdata/bench-baseline.txt. Fails if any row regresses >10% on geomean. + @out=$$(mktemp); \ + go test -bench=Benchmark -benchmem -benchtime=500ms -count=5 -run='^$$' ./internal/telemetry/ > $$out 2>&1; \ + if ! command -v benchstat >/dev/null 2>&1; then \ + echo "benchstat not installed; install with: go install golang.org/x/perf/cmd/benchstat@latest"; \ + exit 2; \ + fi; \ + echo "Comparing $$out against internal/telemetry/testdata/bench-baseline.txt..."; \ + benchstat internal/telemetry/testdata/bench-baseline.txt $$out + + fmt: ## Check formatting; fails if any file is not gofumpt-clean. @diff=$$(go tool gofumpt -l .); \ if [ -n "$$diff" ]; then echo "gofumpt needs to be run on:"; echo "$$diff"; exit 1; fi diff --git a/cmd/tracecore/collect.go b/cmd/tracecore/collect.go index ed805e4e..6357a994 100644 --- a/cmd/tracecore/collect.go +++ b/cmd/tracecore/collect.go @@ -4,15 +4,57 @@ package main import ( "context" + "fmt" "log/slog" + "sync/atomic" "time" "github.com/tracecoreai/tracecore/internal/config" "github.com/tracecoreai/tracecore/internal/pipeline" "github.com/tracecoreai/tracecore/internal/pipelinebuilder" + "github.com/tracecoreai/tracecore/internal/selftelemetry" + "github.com/tracecoreai/tracecore/internal/telemetry" "github.com/tracecoreai/tracecore/internal/version" ) +// exporterRegistry implements telemetry.ExporterRegistry over the +// exporters built by pipelinebuilder. Captured at build time; +// readers see a stable snapshot of registered FailureRateReaders. +type exporterRegistry struct { + readers []selftelemetry.FailureRateReader +} + +func (r exporterRegistry) RegisteredExporters() []selftelemetry.FailureRateReader { + return r.readers +} + +func collectFailureRateReaders(pipelines []pipeline.Pipeline) exporterRegistry { + // Dedup by FailureRateReader identity (pointer equality — the + // FailureRateReader docstring pins the pointer-type requirement). + // Same exporter referenced from multiple pipelines must contribute + // once, not N times, or the failure_rate aggregate is multiplied. + seen := map[selftelemetry.FailureRateReader]struct{}{} + var readers []selftelemetry.FailureRateReader + for _, p := range pipelines { + for _, exp := range p.Exporters { + carrier, ok := exp.(selftelemetry.ExporterCarrier) + if !ok { + continue + } + frr, ok := carrier.SelfExporter().(selftelemetry.FailureRateReader) + if !ok { + continue + } + if _, dup := seen[frr]; dup { + continue + } + seen[frr] = struct{}{} + readers = append(readers, frr) + } + } + return exporterRegistry{readers: readers} +} + // runCollect loads config, builds the pipeline runtime, runs until the // context is cancelled (SIGINT/SIGTERM), then performs two-phase // shutdown. Returns the process exit code. @@ -32,12 +74,99 @@ func runCollect(ctx context.Context, logger *slog.Logger, configPath string, dra return exitDataErr } - pipelines, err := pipelinebuilder.BuildPipelines(ctx, logger, cfg, components()) + buildInfo := pipeline.BuildInfo{ + Command: "tracecore", + Description: "tracecore observability collector for GPU clusters", + Version: info.Version, + } + + // Construct the production MeterProvider only when the operator + // opted in. Otherwise BuildPipelines's default noop is fine and + // the HTTP server stays off. + var ( + meterProvider *telemetry.MeterProvider + telemetrySrv *telemetry.Server + ready atomic.Bool + ) + if cfg.Telemetry.Enabled { + mp, srv, err := initTelemetryStack(ctx, logger, cfg.Telemetry, ready.Load) + if err != nil { + logger.Error("init telemetry stack", "err", err) + return exitFailure + } + meterProvider = mp + telemetrySrv = srv + } + + buildOpts := []pipelinebuilder.BuildOption{pipelinebuilder.WithBuildInfo(buildInfo)} + if meterProvider != nil { + buildOpts = append(buildOpts, pipelinebuilder.WithMeterProvider(meterProvider.Provider)) + } + + pipelines, err := pipelinebuilder.BuildPipelines(ctx, logger, cfg, components(), buildOpts...) if err != nil { logger.Error("build pipelines", "err", err) + shutdownTelemetry(ctx, logger, telemetrySrv, meterProvider) return exitDataErr } + if err := registerObservability(meterProvider, pipelines, buildInfo, info.Revision); err != nil { + logger.Error("register observability metrics", "err", err) + shutdownTelemetry(ctx, logger, telemetrySrv, meterProvider) + return exitFailure + } + + return runRuntime(ctx, logger, pipelines, drainBudget, &ready, telemetrySrv, meterProvider) +} + +// registerObservability wires the SLO observable gauges and the +// build_info join-target onto the operator-facing MeterProvider once +// pipelines are built and exporters are concrete. No-op when +// telemetry is disabled (meterProvider == nil). +func registerObservability(mp *telemetry.MeterProvider, pipelines []pipeline.Pipeline, buildInfo pipeline.BuildInfo, revision string) error { + if mp == nil { + return nil + } + src := telemetry.NewAggregateSLOSource( + collectFailureRateReaders(pipelines), + telemetry.DefaultSLOWindow, + ) + if err := telemetry.RegisterSLOMetrics(mp.Provider, src); err != nil { + return fmt.Errorf("SLO metrics: %w", err) + } + // Surface build identity as a Prometheus join-target so + // operators see version metadata next to any tracecore metric. + // Mirrors the otelcol_build_info / prometheus_build_info + // convention. + if err := telemetry.RegisterBuildInfo(mp.Provider, map[string]string{ + "command": buildInfo.Command, + "version": buildInfo.Version, + "revision": revision, + }); err != nil { + return fmt.Errorf("build_info metric: %w", err) + } + return nil +} + +// runRuntime starts the pipeline runtime, flips /readyz to ready, +// blocks until the context is cancelled (SIGINT/SIGTERM), then +// performs the ordered shutdown: +// +// 1. flip /readyz to 503 so scrapers stop new traffic during drain. +// 2. shut down the runtime (drains receivers + exporters). +// 3. shut down telemetry server + MeterProvider in that order so +// a final scrape doesn't fail mid-export. +// +// Returns the process exit code. +func runRuntime( + ctx context.Context, + logger *slog.Logger, + pipelines []pipeline.Pipeline, + drainBudget time.Duration, + ready *atomic.Bool, + telemetrySrv *telemetry.Server, + meterProvider *telemetry.MeterProvider, +) int { rt := pipeline.NewRuntime(pipelines, pipeline.WithLogger(logger), pipeline.WithDrainBudget(drainBudget), @@ -52,22 +181,97 @@ func runCollect(ctx context.Context, logger *slog.Logger, configPath string, dra if shutdownErr := rt.Shutdown(ctx); shutdownErr != nil { logger.Warn("shutdown after failed start", "err", shutdownErr) } + shutdownTelemetry(ctx, logger, telemetrySrv, meterProvider) return exitFailure } + ready.Store(true) logger.Info("tracecore running; waiting for signal") <-ctx.Done() logger.Info("shutdown signal received", "drain_budget", drainBudget) + // Flip /readyz to 503 before the pipeline begins shutting down so + // scrapers stop sending traffic our way. + ready.Store(false) + // WithoutCancel preserves ctx values but drops the already-fired // cancellation so Shutdown's own per-phase timeouts can apply // against a fresh deadline. shutdownCtx := context.WithoutCancel(ctx) - if err := rt.Shutdown(shutdownCtx); err != nil { - logger.Error("shutdown", "err", err) + rtErr := rt.Shutdown(shutdownCtx) + if rtErr != nil { + logger.Error("shutdown", "err", rtErr) + } + + shutdownTelemetry(shutdownCtx, logger, telemetrySrv, meterProvider) + + if rtErr != nil { return exitFailure } logger.Info("tracecore stopped cleanly") return exitOK } + +// initTelemetryStack constructs the MeterProvider + Server pair when +// the operator enabled the self-telemetry surface. On any error it +// drains whatever has already been built and returns the error, +// keeping runCollect's complexity bounded. +func initTelemetryStack(ctx context.Context, logger *slog.Logger, cfg config.Telemetry, readyFn func() bool) (*telemetry.MeterProvider, *telemetry.Server, error) { + mp, err := telemetry.NewMeterProvider() + if err != nil { + return nil, nil, fmt.Errorf("meter provider: %w", err) + } + + srv, err := telemetry.NewServer(telemetry.ServerConfig{ + Listen: cfg.Listen, + MeterProvider: mp, + Paths: telemetry.Paths{ + Metrics: cfg.Paths.Metrics, + Healthz: cfg.Paths.Healthz, + Readyz: cfg.Paths.Readyz, + }, + ReadyFn: readyFn, + Logger: logger, + }) + if err != nil { + if sErr := mp.Shutdown(ctx); sErr != nil { + logger.Warn("shutdown telemetry meter provider after init failure", "err", sErr) + } + return nil, nil, fmt.Errorf("telemetry server: %w", err) + } + + // Start the telemetry server FIRST so /healthz and /readyz are + // observable from the moment the binary commits to running. + // /readyz still reports 503 until the pipeline has Started. + if err := srv.Start(ctx); err != nil { + if sErr := mp.Shutdown(ctx); sErr != nil { + logger.Warn("shutdown telemetry meter provider after start failure", "err", sErr) + } + return nil, nil, fmt.Errorf("start telemetry server: %w", err) + } + + logger.Info("telemetry surface listening", + "listen", cfg.Listen, + "metrics_path", cfg.Paths.Metrics) + + return mp, srv, nil +} + +// shutdownTelemetry drains the telemetry server + MeterProvider in +// that order; the server's connections must close before metric +// exports drain so a final scrape doesn't fail mid-export. Both calls +// are idempotent; nil receivers are accepted (telemetry surface was +// disabled). +func shutdownTelemetry(ctx context.Context, logger *slog.Logger, srv *telemetry.Server, mp *telemetry.MeterProvider) { + if srv != nil { + if err := srv.Shutdown(ctx); err != nil { + logger.Warn("shutdown telemetry server", "err", err) + } + } + if mp != nil { + if err := mp.Shutdown(ctx); err != nil { + logger.Warn("shutdown telemetry meter provider", "err", err) + } + } +} diff --git a/cmd/tracecore/collect_test.go b/cmd/tracecore/collect_test.go new file mode 100644 index 00000000..a1a7554f --- /dev/null +++ b/cmd/tracecore/collect_test.go @@ -0,0 +1,74 @@ +// SPDX-License-Identifier: Apache-2.0 + +package main + +import ( + "testing" + + "github.com/stretchr/testify/require" + + "github.com/tracecoreai/tracecore/internal/pipeline" + "github.com/tracecoreai/tracecore/internal/selftelemetry" +) + +// fakeExporterCarrier is a stand-in for stdoutexporter that exposes +// its own FailureRateReader so collectFailureRateReaders can dedup. +type fakeExporterCarrier struct { + pipeline.ComponentState + self *fakeFRR +} + +func (f *fakeExporterCarrier) SelfExporter() selftelemetry.Exporter { return f.self } + +type fakeFRR struct{ s, f uint64 } + +func (r *fakeFRR) IncCallSuccess() { r.s++ } +func (r *fakeFRR) IncCallFailure(string) { r.f++ } +func (r *fakeFRR) SuccessCount() uint64 { return r.s } +func (r *fakeFRR) FailureCount() uint64 { return r.f } + +// TestCollectFailureRateReaders_Dedup pins the bug fix: the same +// exporter instance referenced from multiple pipelines must contribute +// to the registry exactly once. Without dedup, the failure_rate gauge +// would double-count and overstate the failure ratio when an exporter +// is shared. +func TestCollectFailureRateReaders_Dedup(t *testing.T) { + t.Parallel() + + shared := &fakeExporterCarrier{self: &fakeFRR{s: 10, f: 1}} + other := &fakeExporterCarrier{self: &fakeFRR{s: 20, f: 0}} + + pipelines := []pipeline.Pipeline{ + { + ID: pipeline.MustNewID(pipeline.MustNewType("metrics"), "primary"), + Exporters: []pipeline.Exporter{shared, other}, + }, + { + // Same `shared` exporter referenced from a second + // pipeline — must NOT be counted twice. + ID: pipeline.MustNewID(pipeline.MustNewType("metrics"), "secondary"), + Exporters: []pipeline.Exporter{shared}, + }, + } + + reg := collectFailureRateReaders(pipelines) + got := reg.RegisteredExporters() + require.Len(t, got, 2, "shared exporter must dedup; expected exactly 2 unique readers") +} + +// TestCollectFailureRateReaders_SkipsNonSelfExporterComponents pins +// the contract that exporters NOT implementing SelfExporter() are +// silently skipped (no panic, no error). Components built before +// criterion-10 simply don't participate. +func TestCollectFailureRateReaders_SkipsNonSelfExporterComponents(t *testing.T) { + t.Parallel() + + pipelines := []pipeline.Pipeline{{ + ID: pipeline.MustNewID(pipeline.MustNewType("metrics"), "x"), + Exporters: []pipeline.Exporter{&plainExporter{}}, + }} + reg := collectFailureRateReaders(pipelines) + require.Empty(t, reg.RegisteredExporters()) +} + +type plainExporter struct{ pipeline.ComponentState } diff --git a/cmd/tracecore/integration_telemetry_test.go b/cmd/tracecore/integration_telemetry_test.go new file mode 100644 index 00000000..27078c40 --- /dev/null +++ b/cmd/tracecore/integration_telemetry_test.go @@ -0,0 +1,310 @@ +// SPDX-License-Identifier: Apache-2.0 + +package main + +import ( + "fmt" + "io" + "net" + "net/http" + "os/exec" + "path/filepath" + "strings" + "syscall" + "testing" + "time" + + "github.com/stretchr/testify/require" +) + +// freeTelemetryPort grabs a port, closes the listener, and returns +// "127.0.0.1:P". Same race-y caveat as the unit-test version, but +// fine for the boot/scrape scenario. +func freeTelemetryPort(t *testing.T) string { + t.Helper() + l, err := net.Listen("tcp", "127.0.0.1:0") + require.NoError(t, err) + addr := l.Addr().String() + require.NoError(t, l.Close()) + return addr +} + +// writeTelemetryIntegrationConfig writes a clockreceiver → +// stdoutexporter config with the telemetry surface enabled at addr. +func writeTelemetryIntegrationConfig(t *testing.T, addr string) string { + t.Helper() + yaml := fmt.Sprintf(` +receivers: + clockreceiver: + interval: 100ms +exporters: + stdoutexporter: {} +telemetry: + enabled: true + listen: %q +service: + pipelines: + metrics/primary: + receivers: [clockreceiver] + exporters: [stdoutexporter] +`, addr) + return writeConfigFile(t, yaml) +} + +// TestIntegration_TelemetrySurface_EndToEnd boots tracecore with +// telemetry enabled, scrapes /metrics + /healthz + /readyz, asserts +// the expected metric names from a running clockreceiver appear, +// then shuts down cleanly. +func TestIntegration_TelemetrySurface_EndToEnd(t *testing.T) { + if testing.Short() { + t.Skip("integration test: builds the binary and runs a subprocess; skip with -short") + } + t.Parallel() + + binPath := buildBinary(t) + addr := freeTelemetryPort(t) + cfgPath := writeTelemetryIntegrationConfig(t, addr) + + stderr := &syncBuffer{} + //nolint:gosec // binPath/cfgPath under t.TempDir(), fully test-controlled + cmd := exec.CommandContext(t.Context(), binPath, "collect", + "--config="+cfgPath, "--log.format=text") + cmd.Stderr = stderr + require.NoError(t, cmd.Start()) + t.Cleanup(func() { + if t.Failed() { + t.Logf("=== stderr ===\n%s", stderr.String()) + } + }) + + base := "http://" + addr + + // Healthz must turn 200 as soon as the server is up. + require.Eventually(t, func() bool { + return getStatusCode(t, base+"/healthz") == 200 + }, 5*time.Second, 50*time.Millisecond, "/healthz did not become 200 within 5s") + + // Readyz must turn 200 once the runtime entered "running" — but + // must have been 503 before that point. Eventually-200 covers the + // after-start case; the during-boot 503 case is best left to the + // unit-test (testing it here races with how fast the runtime + // completes Start). + require.Eventually(t, func() bool { + return getStatusCode(t, base+"/readyz") == 200 + }, 5*time.Second, 50*time.Millisecond, "/readyz did not become 200 within 5s") + + // Scrape /metrics — must contain the tracecore.receiver.* names + // from the clockreceiver's self-telemetry calls. The 100ms tick + // interval means at least one emission has happened by the time + // /readyz returns 200, but allow up to 3s for the emission + + // observable-counter cycle. + require.Eventually(t, func() bool { + body := scrapeMetrics(t, base+"/metrics") + return strings.Contains(body, "tracecore_receiver_emissions_total") + }, 5*time.Second, 100*time.Millisecond, "tracecore_receiver_emissions_total not present in scrape") + + body := scrapeMetrics(t, base+"/metrics") + // Four of the five receiver methods surface here in normal + // operation: emissions/latency/activity from real calls, and + // degraded_seconds_total from the observable counter that emits + // at 0 even with no SetDegraded calls. errors_total only + // surfaces on push failure; that path is integration-tested + // in components/receivers/clockreceiver against a failing + // downstream (TestIntegration_ErrorsTotal_SurfacesOnDownstreamFailure) + // since clockreceiver→stdoutexporter doesn't fail under normal + // operation. + require.Contains(t, body, "tracecore_receiver_emissions_total") + require.Contains(t, body, "tracecore_receiver_last_activity_unix_seconds") + require.Contains(t, body, "tracecore_receiver_collection_latency_seconds") + require.Contains(t, body, "tracecore_receiver_degraded_seconds_total") + // build_info must carry the three canonical labels (command, + // revision, version) with value 1. Note: `go test` builds without + // -ldflags so revision="" + version="0.0.0-dev" are honest empty + // strings here; the production binary populates them. + require.Regexp(t, + `tracecore_build_info\{[^}]*command="tracecore"[^}]*revision="[^"]*"[^}]*version="[^"]*"[^}]*\}\s+1`, + body, "build_info must carry the three canonical labels") + + // M2 criterion 10: the three O2 SLO observable gauges must + // appear in every scrape, populated from real exporter signal + // (failure_rate) and pre-wired placeholders for queue + restart + // (carry-forward). + require.Contains(t, body, "tracecore_exporter_failure_rate") + require.Contains(t, body, "tracecore_queue_depth_ratio") + require.Contains(t, body, "tracecore_component_restart_count_per_hour") + // Exporter must also surface its per-call counter so operators can + // derive rates in PromQL when the gauge's windowed value isn't + // granular enough. + require.Contains(t, body, "tracecore_exporter_calls_total") + + // Shutdown: SIGTERM, /readyz flips to 503, then process exits. + require.NoError(t, cmd.Process.Signal(syscall.SIGTERM)) + require.NoError(t, cmd.Wait()) + require.Contains(t, stderr.String(), "tracecore stopped cleanly") +} + +// TestIntegration_ExampleConfigValidates pins the operator +// quickstart: `docs/examples/with-telemetry.yaml` must continue to +// pass `tracecore validate`. Doc rot (config schema drift vs the +// committed example) would otherwise ship to operators as a +// confusing first-run failure. +func TestIntegration_ExampleConfigValidates(t *testing.T) { + if testing.Short() { + t.Skip("integration test: builds the binary; skip with -short") + } + t.Parallel() + + binPath := buildBinary(t) + + // Resolve the example path relative to the repo root from the + // test's working directory (cmd/tracecore/). + examplePath := filepath.Join("..", "..", "docs", "examples", "with-telemetry.yaml") + + stderr := &syncBuffer{} + //nolint:gosec // binPath under t.TempDir(); examplePath is a checked-in repo file + cmd := exec.CommandContext(t.Context(), binPath, "validate", "--config="+examplePath) + cmd.Stderr = stderr + require.NoError(t, cmd.Run(), + "docs/examples/with-telemetry.yaml must continue to validate; stderr=%q", stderr.String()) + require.Contains(t, stderr.String(), "config valid", + "validate must report the operator-friendly success line") +} + +// TestIntegration_ValidateExplain_DumpsResolvedConfig pins the +// operator-facing `validate --explain` output: the resolved +// configuration must appear, with telemetry defaults populated +// where the YAML left them blank. +func TestIntegration_ValidateExplain_DumpsResolvedConfig(t *testing.T) { + if testing.Short() { + t.Skip("integration test: builds the binary; skip with -short") + } + t.Parallel() + + binPath := buildBinary(t) + + // YAML with telemetry enabled but no listen/paths — defaults + // should resolve to localhost:8888 + /metrics etc. + cfg := ` +receivers: + clockreceiver: {} +exporters: + stdoutexporter: {} +telemetry: + enabled: true +service: + pipelines: + metrics/primary: + receivers: [clockreceiver] + exporters: [stdoutexporter] +` + cfgPath := writeConfigFile(t, cfg) + + stderr := &syncBuffer{} + //nolint:gosec // binPath under t.TempDir(); cfgPath checked-in repo file + cmd := exec.CommandContext(t.Context(), binPath, "validate", "--config="+cfgPath, "--explain") + cmd.Stderr = stderr + require.NoError(t, cmd.Run()) + + out := stderr.String() + require.Contains(t, out, "config valid") + require.Contains(t, out, "resolved configuration (after defaults applied):") + require.Contains(t, out, "telemetry.enabled: true") + require.Contains(t, out, "telemetry.listen: localhost:8888") + require.Contains(t, out, "telemetry.paths.metrics: /metrics") + require.Contains(t, out, "telemetry.paths.healthz: /healthz") + require.Contains(t, out, "telemetry.paths.readyz: /readyz") + require.Contains(t, out, "metrics/primary:") + require.Contains(t, out, "receivers: [clockreceiver]") + require.Contains(t, out, "exporters: [stdoutexporter]") +} + +// TestIntegration_ValidateExplain_TelemetryOff pins the disabled +// branch — explain still works, just notes the HTTP surface is off. +func TestIntegration_ValidateExplain_TelemetryOff(t *testing.T) { + if testing.Short() { + t.Skip("integration test: builds the binary; skip with -short") + } + t.Parallel() + + binPath := buildBinary(t) + cfgPath := writeIntegrationConfig(t) // no telemetry block + + stderr := &syncBuffer{} + //nolint:gosec // binPath under t.TempDir(); cfgPath checked-in repo file + cmd := exec.CommandContext(t.Context(), binPath, "validate", "--config="+cfgPath, "--explain") + cmd.Stderr = stderr + require.NoError(t, cmd.Run()) + + out := stderr.String() + require.Contains(t, out, "telemetry.enabled: false") + require.Contains(t, out, "HTTP surface OFF; no port bound") +} + +// TestIntegration_TelemetryDisabled_NoListener pins that the +// default-off contract holds end-to-end: a config without telemetry: +// enabled does NOT bind a port. Operators get zero attack surface +// unless they opt in. +func TestIntegration_TelemetryDisabled_NoListener(t *testing.T) { + if testing.Short() { + t.Skip("integration test: builds the binary and runs a subprocess; skip with -short") + } + t.Parallel() + + binPath := buildBinary(t) + cfgPath := writeIntegrationConfig(t) // no telemetry block + + stderr := &syncBuffer{} + //nolint:gosec // binPath/cfgPath under t.TempDir(), fully test-controlled + cmd := exec.CommandContext(t.Context(), binPath, "collect", + "--config="+cfgPath, "--log.format=text") + cmd.Stderr = stderr + require.NoError(t, cmd.Start()) + t.Cleanup(func() { + if t.Failed() { + t.Logf("=== stderr ===\n%s", stderr.String()) + } + }) + + // Wait until the runtime is up — same pattern as the other + // integration tests. + require.Eventually(t, func() bool { + return strings.Contains(stderr.String(), "tracecore running") + }, 5*time.Second, 25*time.Millisecond) + + // Nothing should be listening on the default telemetry port. + // Try a quick connect; expect ECONNREFUSED (or equivalent). + c, err := net.DialTimeout("tcp", "localhost:8888", 100*time.Millisecond) + if err == nil { + _ = c.Close() + t.Fatal("expected nothing listening on localhost:8888 when telemetry disabled") + } + + require.NoError(t, cmd.Process.Signal(syscall.SIGTERM)) + require.NoError(t, cmd.Wait()) +} + +func getStatusCode(t *testing.T, url string) int { + t.Helper() + req, err := http.NewRequestWithContext(t.Context(), http.MethodGet, url, http.NoBody) + if err != nil { + return 0 + } + resp, err := http.DefaultClient.Do(req) + if err != nil { + return 0 + } + defer func() { _ = resp.Body.Close() }() + return resp.StatusCode +} + +func scrapeMetrics(t *testing.T, url string) string { + t.Helper() + req, err := http.NewRequestWithContext(t.Context(), http.MethodGet, url, http.NoBody) + require.NoError(t, err) + resp, err := http.DefaultClient.Do(req) + require.NoError(t, err) + defer func() { _ = resp.Body.Close() }() + b, err := io.ReadAll(resp.Body) + require.NoError(t, err) + return string(b) +} diff --git a/cmd/tracecore/integration_test.go b/cmd/tracecore/integration_test.go index 18c956e6..040f4bcd 100644 --- a/cmd/tracecore/integration_test.go +++ b/cmd/tracecore/integration_test.go @@ -18,6 +18,8 @@ import ( "github.com/stretchr/testify/require" "go.uber.org/goleak" + + "github.com/tracecoreai/tracecore/internal/pipeline/pipelinetest" ) // binaryPathOnce caches the integration-test binary across every test @@ -104,9 +106,12 @@ func TestIntegration_ClockreceiverToStdoutexporter(t *testing.T) { }) // Poll until we've seen at least 3 JSON metric lines OR the - // scenario-level deadline fires. Polling at 25ms gives ≥6 ticks - // of the receiver's 100ms interval to clear the buffer in <1s. - scenarioDeadline := time.After(1500 * time.Millisecond) + // scenario-level deadline fires. Receiver interval is 100ms, + // so ≥3 lines arrive in ~300ms under ideal scheduling. The + // 5s deadline gives ~16× margin to absorb `go test -race ./...` + // CPU pressure (race detector + concurrent package tests can + // stall the polling goroutine for hundreds of ms at a time). + scenarioDeadline := time.After(5 * time.Second) tick := time.NewTicker(25 * time.Millisecond) defer tick.Stop() @@ -117,7 +122,7 @@ func TestIntegration_ClockreceiverToStdoutexporter(t *testing.T) { // doesn't outlive the test. Then fail. _ = cmd.Process.Signal(syscall.SIGTERM) _ = cmd.Wait() - t.Fatalf("did not see ≥3 stdout lines within 1.5s") + t.Fatalf("did not see ≥3 stdout lines within 5s") case <-tick.C: if countLines(stdout.String()) >= 3 { goto enoughLines @@ -201,7 +206,9 @@ func TestIntegration_DoubleSIGTERM_DoesNotHang(t *testing.T) { } }) - deadline := time.After(1500 * time.Millisecond) + // 5s deadline absorbs `go test -race ./...` CPU pressure; the + // "tracecore running" log fires in ~50-200ms under normal load. + deadline := time.After(5 * time.Second) tick := time.NewTicker(25 * time.Millisecond) defer tick.Stop() for { @@ -209,7 +216,7 @@ func TestIntegration_DoubleSIGTERM_DoesNotHang(t *testing.T) { case <-deadline: _ = cmd.Process.Kill() _ = cmd.Wait() - t.Fatalf("did not see startup log within 1.5s") + t.Fatalf("did not see startup log within 5s") case <-tick.C: if strings.Contains(stderr.String(), "tracecore running") { goto running @@ -281,10 +288,11 @@ func TestIntegration_SIGINT(t *testing.T) { } }) - // Wait for runtime to be live before signaling. + // Wait for runtime to be live before signaling. 5s absorbs + // `go test -race ./...` CPU pressure. require.Eventually(t, func() bool { return strings.Contains(stderr.String(), "tracecore running") - }, 1500*time.Millisecond, 25*time.Millisecond) + }, 5*time.Second, 25*time.Millisecond) require.NoError(t, cmd.Process.Signal(syscall.SIGINT)) require.NoError(t, cmd.Wait(), "SIGINT must trigger the same clean shutdown as SIGTERM") @@ -448,21 +456,9 @@ func splitNonEmpty(s string) []string { return out } -// syncBuffer is bytes.Buffer with a mutex so the exec.Cmd writer -// goroutine doesn't race the test goroutine reading buffered output. -type syncBuffer struct { - mu sync.Mutex - buf bytes.Buffer -} - -func (b *syncBuffer) Write(p []byte) (int, error) { - b.mu.Lock() - defer b.mu.Unlock() - return b.buf.Write(p) -} - -func (b *syncBuffer) String() string { - b.mu.Lock() - defer b.mu.Unlock() - return b.buf.String() -} +// syncBuffer is the shared concurrent-safe bytes.Buffer used by +// integration tests in this package. Aliases pipelinetest.SyncBuffer +// to keep call sites short; the canonical type lives there so +// future receiver-author integration tests get the same primitive +// without re-rolling it per package. +type syncBuffer = pipelinetest.SyncBuffer diff --git a/cmd/tracecore/main.go b/cmd/tracecore/main.go index 7a509f73..5b0a157b 100644 --- a/cmd/tracecore/main.go +++ b/cmd/tracecore/main.go @@ -105,6 +105,11 @@ func run(args []string, stderr io.Writer) int { validateConfigPath := validate.Flag("config", "Path to the collector YAML config. Example: --config=./config.yaml.", ).Required().String() + validateExplain := validate.Flag("explain", + "After validating, print the resolved configuration with defaults applied. "+ + "Use this to confirm what `collect` would actually do with the given YAML "+ + "(e.g., what port the telemetry surface would bind to when only `enabled: true` is set).", + ).Bool() cmd, err := app.Parse(args) if earlyExit >= 0 { @@ -143,7 +148,7 @@ func run(args []string, stderr io.Writer) int { case "collect": return runCollect(ctx, logger, *configPath, *drainBudget) case "validate": - return runValidate(ctx, logger, *validateConfigPath, stderr) + return runValidate(ctx, logger, *validateConfigPath, *validateExplain, stderr) default: logger.Error("unknown command", "command", cmd) return exitUsageErr diff --git a/cmd/tracecore/validate.go b/cmd/tracecore/validate.go index 0f32112c..ad4e81f7 100644 --- a/cmd/tracecore/validate.go +++ b/cmd/tracecore/validate.go @@ -18,12 +18,17 @@ import ( // `collect` returns for the same input, so operators can rely on // `validate` as a dry run. // +// When explain is true, also dumps the resolved configuration after +// defaults have been applied so an operator can see exactly what +// tracecore would do at runtime — handy for "I set `enabled: true` +// and nothing else; what port would it actually bind?" questions. +// // out is the writer for the human-readable summary. Convention is to // match `collect`'s lifecycle-log stream (stderr) so a typical // pre-flight invocation looks like: // // tracecore validate --config=config.yaml || exit $? -func runValidate(ctx context.Context, logger *slog.Logger, configPath string, out io.Writer) int { +func runValidate(ctx context.Context, logger *slog.Logger, configPath string, explain bool, out io.Writer) int { cfg, err := config.Load(configPath) if err != nil { logger.Error("load config", "err", err) @@ -51,5 +56,44 @@ func runValidate(ctx context.Context, logger *slog.Logger, configPath string, ou _, _ = fmt.Fprintf(out, " %s: receivers=%d processors=%d exporters=%d\n", p.ID, len(p.Receivers), len(p.Processors), len(p.Exporters)) } + if explain { + explainResolved(out, cfg) + } return exitOK } + +// explainResolved prints the resolved configuration. Defaults have +// already been applied by config.Load, so this shows the operator +// exactly what runtime values would be used. +// +// Output is human-readable and may change between releases. Operators +// scripting against tracecore parse YAML directly; this is for +// eyeballing during pre-flight. +func explainResolved(out io.Writer, cfg *config.Config) { + _, _ = fmt.Fprintln(out, "") + _, _ = fmt.Fprintln(out, "resolved configuration (after defaults applied):") + + // Self-telemetry surface. + _, _ = fmt.Fprintf(out, " telemetry.enabled: %t\n", cfg.Telemetry.Enabled) + if cfg.Telemetry.Enabled { + _, _ = fmt.Fprintf(out, " telemetry.listen: %s\n", cfg.Telemetry.Listen) + _, _ = fmt.Fprintf(out, " telemetry.paths.metrics: %s\n", cfg.Telemetry.Paths.Metrics) + _, _ = fmt.Fprintf(out, " telemetry.paths.healthz: %s\n", cfg.Telemetry.Paths.Healthz) + _, _ = fmt.Fprintf(out, " telemetry.paths.readyz: %s\n", cfg.Telemetry.Paths.Readyz) + } else { + _, _ = fmt.Fprintln(out, " (HTTP surface OFF; no port bound)") + } + + // Pipeline composition — already printed above as a summary. + // Here add the receiver/processor/exporter names for each pipeline + // so operators see which components run. + _, _ = fmt.Fprintln(out, " pipelines:") + for key, p := range cfg.Service.Pipelines { + _, _ = fmt.Fprintf(out, " %s:\n", key) + _, _ = fmt.Fprintf(out, " receivers: %v\n", p.Receivers) + if len(p.Processors) > 0 { + _, _ = fmt.Fprintf(out, " processors: %v\n", p.Processors) + } + _, _ = fmt.Fprintf(out, " exporters: %v\n", p.Exporters) + } +} diff --git a/components/exporters/stdoutexporter/factory.go b/components/exporters/stdoutexporter/factory.go index e25e8ced..fe5a77a0 100644 --- a/components/exporters/stdoutexporter/factory.go +++ b/components/exporters/stdoutexporter/factory.go @@ -36,12 +36,12 @@ func (*factory) CreateDefaultConfig() pipeline.Config { return &Config{Out: os.Stdout} } -func (*factory) CreateMetrics(_ context.Context, set pipeline.CreateSettings, cfg pipeline.Config) (pipeline.Exporter, error) { +func (*factory) CreateMetrics(ctx context.Context, set pipeline.CreateSettings, cfg pipeline.Config) (pipeline.Exporter, error) { c, ok := cfg.(*Config) if !ok { return nil, fmt.Errorf("stdoutexporter: unexpected config type %T", cfg) } - return newExporter(set, c), nil + return newExporter(ctx, set, c), nil } func (*factory) CreateTraces(_ context.Context, _ pipeline.CreateSettings, _ pipeline.Config) (pipeline.Exporter, error) { diff --git a/components/exporters/stdoutexporter/stdoutexporter.go b/components/exporters/stdoutexporter/stdoutexporter.go index 72a32493..73369214 100644 --- a/components/exporters/stdoutexporter/stdoutexporter.go +++ b/components/exporters/stdoutexporter/stdoutexporter.go @@ -13,6 +13,7 @@ import ( "github.com/tracecoreai/tracecore/internal/consumer" "github.com/tracecoreai/tracecore/internal/pipeline" + "github.com/tracecoreai/tracecore/internal/selftelemetry" ) // stdoutExporter writes one OTLP/JSON-encoded line per @@ -27,6 +28,12 @@ type stdoutExporter struct { logger *slog.Logger out io.Writer + // selfExp records exporter self-telemetry (calls_total + + // per-call success/failure for the M2 failure_rate gauge). + // Always non-nil — factory substitutes noop on construction + // failure so the hot path doesn't nil-check. + selfExp selftelemetry.Exporter + // writeMu serializes writes so two concurrent ConsumeMetrics // calls don't interleave JSON lines. pdata is not thread-safe // either, but the value comes in via the consumer call, not a @@ -36,14 +43,36 @@ type stdoutExporter struct { marshaler pmetric.JSONMarshaler } -func newExporter(set pipeline.CreateSettings, cfg *Config) *stdoutExporter { +func newExporter(ctx context.Context, set pipeline.CreateSettings, cfg *Config) *stdoutExporter { + se := selftelemetry.NewNoopExporter() + if set.Telemetry.MeterProvider == nil { + if set.Telemetry.Logger != nil { + set.Telemetry.Logger.Warn("self-telemetry exporter: no MeterProvider; using noop") + } + } else if x, err := selftelemetry.NewExporter(set.ID, set.Telemetry.MeterProvider); err == nil { + se = x + } else { + selftelemetry.RecordInitError(ctx, set.Telemetry.MeterProvider, + "exporter", set.ID.String(), selftelemetry.ReasonInstrumentRegister) + if set.Telemetry.Logger != nil { + set.Telemetry.Logger.Warn("self-telemetry exporter init failed; using noop", "err", err) + } + } return &stdoutExporter{ - id: set.ID, - logger: set.Telemetry.Logger, - out: cfg.Out, + id: set.ID, + logger: set.Telemetry.Logger, + out: cfg.Out, + selfExp: se, } } +// SelfExporter exposes the per-exporter selftelemetry handle so the +// runtime can register it with `internal/telemetry.RegisterSLOMetrics` +// to feed `tracecore.exporter.failure_rate`. Returns nil only if +// construction substituted the noop (in which case the noop is also +// safe — its counts are always 0). +func (e *stdoutExporter) SelfExporter() selftelemetry.Exporter { return e.selfExp } + // Capabilities reports MutatesData=false — stdoutexporter only reads // the incoming pmetric.Metrics. Fan-out can share a read-only payload // with us instead of cloning. @@ -57,11 +86,17 @@ func (*stdoutExporter) Capabilities() consumer.Capabilities { // terminal don't see noise. func (e *stdoutExporter) ConsumeMetrics(_ context.Context, md pmetric.Metrics) error { if md.MetricCount() == 0 { + // Empty payloads still count as successful Consume calls — + // the contract was fulfilled, just with zero work. Counting + // them keeps `tracecore_exporter_calls_total` consistent + // with the operator's "calls received" intuition. + e.selfExp.IncCallSuccess() return nil } line, err := e.marshaler.MarshalMetrics(md) if err != nil { + e.selfExp.IncCallFailure("marshal") return fmt.Errorf("stdoutexporter: marshal metrics: %w", err) } @@ -69,10 +104,14 @@ func (e *stdoutExporter) ConsumeMetrics(_ context.Context, md pmetric.Metrics) e defer e.writeMu.Unlock() if _, err := e.out.Write(line); err != nil { + e.selfExp.IncCallFailure("io") return fmt.Errorf("stdoutexporter: write metrics line: %w", err) } if _, err := e.out.Write([]byte{'\n'}); err != nil { + e.selfExp.IncCallFailure("io") return fmt.Errorf("stdoutexporter: write newline: %w", err) } + + e.selfExp.IncCallSuccess() return nil } diff --git a/components/receivers/clockreceiver/clockreceiver.go b/components/receivers/clockreceiver/clockreceiver.go index 1cd4cfb9..47c7b10f 100644 --- a/components/receivers/clockreceiver/clockreceiver.go +++ b/components/receivers/clockreceiver/clockreceiver.go @@ -13,6 +13,7 @@ import ( "github.com/tracecoreai/tracecore/internal/consumer" "github.com/tracecoreai/tracecore/internal/pipeline" + "github.com/tracecoreai/tracecore/internal/selftelemetry" ) // clockReceiver emits a gauge metric (`tracecore.clock.now`) on a @@ -32,18 +33,44 @@ type clockReceiver struct { interval time.Duration next consumer.Metrics + // selfRecv records self-telemetry (errors, emissions, latency, + // activity) per the M2 wiring pattern. Always non-nil — the + // factory substitutes a noop if construction fails. + selfRecv selftelemetry.Receiver + // cancel terminates the ticker goroutine. Set by Start, called // by Shutdown. wg waits for the goroutine to exit. cancel context.CancelFunc wg sync.WaitGroup } -func newReceiver(set pipeline.CreateSettings, cfg *Config, next consumer.Metrics) *clockReceiver { +func newReceiver(ctx context.Context, set pipeline.CreateSettings, cfg *Config, next consumer.Metrics) *clockReceiver { + // Construct the real selftelemetry.Receiver. If MeterProvider is + // somehow nil or construction fails, fall back to the noop so the + // hot path doesn't have to nil-check. The fallback path emits a + // `tracecore.selftelemetry.init_errors_total` tick so operators + // see when a component is silently running with a noop. + sr := selftelemetry.NewNoopReceiver() + if set.Telemetry.MeterProvider == nil { + if set.Telemetry.Logger != nil { + set.Telemetry.Logger.Warn("self-telemetry: no MeterProvider; using noop") + } + } else if r, err := selftelemetry.NewReceiver(set.ID, set.Telemetry.MeterProvider); err == nil { + sr = r + } else { + selftelemetry.RecordInitError(ctx, set.Telemetry.MeterProvider, + "receiver", set.ID.String(), selftelemetry.ReasonInstrumentRegister) + if set.Telemetry.Logger != nil { + set.Telemetry.Logger.Warn("self-telemetry init failed; using noop", "err", err) + } + } + return &clockReceiver{ logger: set.Telemetry.Logger, resource: set.Telemetry.Resource, interval: cfg.Interval, next: next, + selfRecv: sr, } } @@ -130,7 +157,15 @@ func (r *clockReceiver) emit(ctx context.Context, now time.Time) { dp.SetTimestamp(pcommon.NewTimestampFromTime(now)) dp.SetIntValue(now.Unix()) - if err := r.next.ConsumeMetrics(ctx, md); err != nil { + start := time.Now() + err := r.next.ConsumeMetrics(ctx, md) + r.selfRecv.ObserveLatency(time.Since(start)) + + if err != nil { + r.selfRecv.IncError("downstream") r.logger.Warn("clockreceiver: downstream rejected push", "err", err) + return } + r.selfRecv.IncEmissions(1) + r.selfRecv.MarkActivity() } diff --git a/components/receivers/clockreceiver/errors_integration_test.go b/components/receivers/clockreceiver/errors_integration_test.go new file mode 100644 index 00000000..2d7079ad --- /dev/null +++ b/components/receivers/clockreceiver/errors_integration_test.go @@ -0,0 +1,100 @@ +// SPDX-License-Identifier: Apache-2.0 + +package clockreceiver_test + +import ( + "context" + "errors" + "io" + "log/slog" + "net/http" + "net/http/httptest" + "strings" + "testing" + "time" + + "github.com/stretchr/testify/require" + "go.opentelemetry.io/collector/pdata/pcommon" + + "github.com/tracecoreai/tracecore/components/receivers/clockreceiver" + "github.com/tracecoreai/tracecore/internal/pipeline" + "github.com/tracecoreai/tracecore/internal/pipeline/pipelinetest" + "github.com/tracecoreai/tracecore/internal/telemetry" +) + +// TestIntegration_ErrorsTotal_SurfacesOnDownstreamFailure pins the +// 5th of the 5 selftelemetry.Receiver methods (criterion 8): +// clockreceiver's IncError("downstream") path must surface in the +// Prometheus scrape when the next consumer rejects pushes. +// +// Drives the full producer chain (real MeterProvider + real +// TelemetrySettings.MeterProvider + real selftelemetry.NewReceiver +// inside clockreceiver) without spawning a subprocess. The OTel +// SDK's Prometheus exporter sees the receiver's IncError calls and +// exposes them at /metrics. +func TestIntegration_ErrorsTotal_SurfacesOnDownstreamFailure(t *testing.T) { + t.Parallel() + + mp, err := telemetry.NewMeterProvider() + require.NoError(t, err) + t.Cleanup(func() { _ = mp.Shutdown(context.Background()) }) + + // Build a real CreateSettings with the production MeterProvider. + id := pipeline.MustNewID(pipeline.MustNewType("clockreceiver"), "errors_int") + cfg := clockreceiver.Factory.CreateDefaultConfig() + c, ok := cfg.(*clockreceiver.Config) + require.True(t, ok) + c.Interval = 50 * time.Millisecond + + sink := &pipelinetest.FailingMetricsSink{Err: errors.New("simulated downstream failure")} + rcv, err := clockreceiver.Factory.CreateMetrics(t.Context(), + pipeline.CreateSettings{ + ID: id, + Telemetry: pipeline.TelemetrySettings{ + Logger: slog.New(slog.NewTextHandler(io.Discard, nil)), + MeterProvider: mp.Provider, + Resource: pcommon.NewResource(), + }, + }, cfg, sink) + require.NoError(t, err) + + host := pipelinetest.NewHost() + require.NoError(t, rcv.Start(t.Context(), host)) + t.Cleanup(func() { _ = rcv.Shutdown(context.Background()) }) + + // Wait for at least a couple of tick attempts. + require.Eventually(t, func() bool { + return sink.Calls() >= 2 + }, 2*time.Second, 25*time.Millisecond) + + srv := httptest.NewServer(mp.PromHandler()) + t.Cleanup(srv.Close) + + body := scrapeMetrics(t, srv.URL) + require.Contains(t, body, "tracecore_receiver_errors_total", + "errors_total must surface when downstream pushes fail") + require.Regexp(t, + `tracecore_receiver_errors_total\{[^}]*component_id="clockreceiver/errors_int"[^}]*kind="downstream"[^}]*\}\s+\d+`, + body, "errors_total must carry kind=\"downstream\"") + // Sanity: emissions_total must NOT advance when every push fails. + require.NotContains(t, body, `tracecore_receiver_emissions_total{`+ + `otel_scope_name="clockreceiver/errors_int"} 1`, + "emissions_total must not advance when every push errors") + // And the latency histogram still records, since latency is + // measured around the failed push. + require.Contains(t, body, "tracecore_receiver_collection_latency_seconds") +} + +func scrapeMetrics(t *testing.T, url string) string { + t.Helper() + req, err := http.NewRequestWithContext(t.Context(), http.MethodGet, url, http.NoBody) + require.NoError(t, err) + resp, err := http.DefaultClient.Do(req) + require.NoError(t, err) + t.Cleanup(func() { _ = resp.Body.Close() }) + b, err := io.ReadAll(resp.Body) + require.NoError(t, err) + require.Equal(t, http.StatusOK, resp.StatusCode, "scrape returned %d; body=%q", + resp.StatusCode, strings.TrimSpace(string(b))) + return string(b) +} diff --git a/components/receivers/clockreceiver/factory.go b/components/receivers/clockreceiver/factory.go index 63320dc4..bd295ead 100644 --- a/components/receivers/clockreceiver/factory.go +++ b/components/receivers/clockreceiver/factory.go @@ -43,12 +43,12 @@ func (*factory) CreateDefaultConfig() pipeline.Config { return &Config{Interval: 1 * time.Second} } -func (*factory) CreateMetrics(_ context.Context, set pipeline.CreateSettings, cfg pipeline.Config, next consumer.Metrics) (pipeline.Receiver, error) { +func (*factory) CreateMetrics(ctx context.Context, set pipeline.CreateSettings, cfg pipeline.Config, next consumer.Metrics) (pipeline.Receiver, error) { c, ok := cfg.(*Config) if !ok { return nil, fmt.Errorf("clockreceiver: unexpected config type %T", cfg) } - return newReceiver(set, c, next), nil + return newReceiver(ctx, set, c, next), nil } func (*factory) CreateTraces(_ context.Context, _ pipeline.CreateSettings, _ pipeline.Config, _ consumer.Traces) (pipeline.Receiver, error) { diff --git a/docs/FAILURE-MODES.md b/docs/FAILURE-MODES.md index 4577706e..d0e6526c 100644 --- a/docs/FAILURE-MODES.md +++ b/docs/FAILURE-MODES.md @@ -55,6 +55,22 @@ Each row links to the test that pins it. | 🟡 Operator sets drain budget > 30s | Clamped to 30s with WARN log. Operators who genuinely need longer should fix the exporter, not extend the timeout. | `internal/pipeline/runtime_test.go::TestRuntime_DrainBudgetClamp` | | 🟢 Component `Shutdown` panics | Recovered by `safeShutdown`. Returned as an error joined into the shutdown error set. Process exits, no crash. | `internal/pipeline/runtime_test.go::TestRuntime_PanickingShutdown_RecoveredAsError` | +## Self-telemetry surface (M2) + +| Scenario | Behaviour | Test | +|---|---|---| +| 🟢 `telemetry.enabled: false` | No HTTP listener bound. Default. | `cmd/tracecore/integration_telemetry_test.go::TestIntegration_TelemetryDisabled_NoListener` | +| 🟢 `telemetry.listen` already in use | `Server.Start` returns the bind error before pipeline starts. `cmd/tracecore` exits with `exitFailure`; no partial state. | — (port-conflict triggers the synchronous `net.Listen` error path) | +| 🟢 Bad `listen` address (e.g., `notavalidaddress`) | `tracecore validate` rejects with `telemetry.listen: invalid host:port "..."`. Exit 2. | `internal/config/telemetry_test.go::TestTelemetry_RejectsBadListen` | +| 🟢 Non-absolute path (`metrics: m`) | Validate rejects with `telemetry.paths.metrics: must be an absolute path starting with '/'`. Exit 2. | `internal/config/telemetry_test.go::TestTelemetry_RejectsNonAbsolutePath` | +| 🟢 `/healthz` during shutdown | Returns 503 once `Server.Shutdown` flips the `shuttingDown` atomic. Operator's k8s livenessProbe sees the transition. | `internal/telemetry/server_test.go::TestServer_HealthzReturns503DuringShutdown` | +| 🟢 `/readyz` before runtime ready | Returns 503 until `cmd/tracecore.runCollect` flips `ready` after `Runtime.Start` returns. | `internal/telemetry/server_test.go::TestServer_ReadyzReflectsReadyFn` | +| 🟢 Scrape during high-cardinality emission | The OTel SDK + Prometheus exporter handle thousands of series; tracecore receivers MUST keep `kind` labels low-cardinality by contract (`internal/selftelemetry/interface.go` doc). Receivers using `IncError(err.Error())` violate the contract. | — (cardinality contract is documentation-only; no runtime guard) | +| 🟢 `Server.Shutdown` without `Server.Start` | No-op. Returns nil. Mirrors `Component.Shutdown` idempotency. | `internal/telemetry/server_test.go::TestServer_ShutdownIsIdempotent` | +| 🟡 `Server.Shutdown` exceeds `ShutdownBudget` (800ms) | http.Server cancels in-flight requests; returns within budget. Leaves headroom in PRINCIPLES §1 1s overall budget. | `internal/telemetry/server_test.go::TestServer_ShutdownWithin1s` | +| 🟢 Repeated `Server.Start`/`Server.Shutdown` cycles | Listener fd is closed each Shutdown; no leak. `goleak` in TestMain catches regressions. | `internal/telemetry/server_test.go::TestServer_ShutdownIsIdempotent` (covers the cycle) | +| 🔴 `/metrics` handler panics during scrape | promhttp catches the panic internally (its default behaviour) and returns 500. Our handler chain doesn't add recovery middleware in M2. **Carry-forward from M2:** dedicated panic-recovery + dedicated metric `tracecore.telemetry.scrape_panics_total`. | — (no current test; promhttp default is the only safety net) | + ## Operator quick reference | Symptom | First thing to check | diff --git a/docs/FOLLOWUPS.md b/docs/FOLLOWUPS.md index f7e771ad..92121391 100644 --- a/docs/FOLLOWUPS.md +++ b/docs/FOLLOWUPS.md @@ -28,6 +28,258 @@ Highest-leverage opportunistic items, in order: 2. **Verify squash-merge preserves `Assisted-by:` disclosure** — observable only after M1.6 actually lands on main. +## M2 process lessons (for future milestones) + +Captured here so the next milestone's loop prompt picks them up. + +- **Scope creep in this PR.** Two unrelated changes shipped under + `[telemetry]` commits in M2's PR #17: + 1. `e42289f` bumped three integration-test deadlines (1.5s → 5s) + — CI infrastructure fix, not M2 feature work. + 2. Go directive 1.26.2 → 1.26.3 in go.mod — toolchain change + required for govulncheck to clear on M2's new `net.Listen` call, + but a separable concern. + STYLE.md says "Bump deliberately, in a dedicated PR." Both should + have been their own PRs off main. Retroactive split is blocked by + the no-force-push rule (commits already pushed). Lesson for next + milestone: triage scope at the START of the loop, not as a + reviewer finding. + +- **Commit history discipline.** PR #17 ended at 25 commits; 13 were + review-pass fixes. Operators reading `git log` skim past those. + The repo doesn't yet have a "squash review-fix commits at merge + time" convention. **Opportunistic:** draft RFC-0007 proposing a + commit-discipline rule (e.g., review-fix commits get a `[fixup]` + prefix in the subject so the maintainer can `git rebase -i` at + merge time before the no-force-push rule kicks in). *Target:* when + the next non-trivial PR shows the same pattern. + +- **Self-assessment optimism caught the loop's gates four times.** + I declared "M2 COMPLETE" before criterion 10 was met, scored 5/5 + before reviewers ran, claimed Loop 4 done without spawning the + reviewer agents, and claimed all deferrals were documented when + they weren't. Each time external feedback caught it. The pattern: + declared completion in advance of completion. **Lesson:** + declared-vs-actual gap is the loop's blast radius; gate + declarations on a fresh-eyes reviewer agent BEFORE outputting + the completion tag, not as a post-hoc audit. + +## Considered and explicitly skipped (M2) + +- **OTel `MetricsLevel` knob** (`None|Basic|Normal|Detailed`). OTel + collector ships this to gate verbose self-metrics; tracecore's + five-metric surface is small enough that the knob would add + configuration without paying back. Revisit only if a cardinality + problem materializes. +- **Hand-rolled `prometheus/client_golang` API** instead of the + OTel SDK metric + OTel Prom exporter (Candidate B in + `docs/loops/m2-research.md`). Loses OTel alignment — the stated + STRATEGY tiebreaker — for marginal complexity savings. +- **Hybrid OTel/native-Prom translation layer** (Candidate C). Adds + translation failure modes for no clear win over Candidate A. +- **Splitting `CreateSettings` into per-role + `receiver.Settings`/`exporter.Settings`/`processor.Settings`** at + M2. STRATEGY row deferred to "first cross-role config divergence" + — no concrete forcing function exists today. +- **`TracerProvider` field on `TelemetrySettings`** at M2. Tracing + surface is a separate milestone; the unkeyed-init guard means + adding it later is non-breaking. + +### M2 Loop 4 reviewer deferrals + +Items raised by Loop-4 reviewer agents that were intentionally NOT +addressed in the M2 PR. Each carries the reason and a revisit +condition. + +**Hardening / observability of the surface itself:** + +- **`tracecore.telemetry.scrape_panics_total` counter** — the + Pass-2 reviewer suggested adding a metric alongside the panic- + recovery middleware so operators can alert on a previously-silent + bug. Today the panic is logged at slog.Error; that's the + observability path. Add the counter only if log-based alerting + turns out to be insufficient in practice. +- **`tracecore.telemetry.serve_exited{reason}` counter** — + similarly suggested so `Server.Serve` exits can be alerted via + Prom rules rather than logs. Same rationale to defer. +- **`tracecore.selftelemetry.init_errors_total` counter** — + surfaces the silent noop-fallback path in + `clockreceiver.newReceiver` / `stdoutexporter.newExporter`. Defer + until operators report a real instance of "my metrics + vanished and I couldn't tell why." + +**Cardinality + cost guards:** + +- **Runtime cap on `kind` cardinality in `IncError`/`IncCallFailure`.** + Today the cardinality contract is doc-only (RECEIVER-PATTERNS.md + + interface.go godoc). A receiver passing `err.Error()` blows + up the downstream Prom server. Adding a stable-set check at + the producer is the right long-term fix; deferred because + M2 has no in-tree caller that violates the contract and the + enforcement design (allowlist vs LRU vs hash) wants its own + RFC. +- **Histogram bucket tuning for `tracecore.receiver.collection_latency_seconds`.** + Already in the M2 carry-forward list under MILESTONES.md; tune + when a real receiver produces sub-ms ticks that bucket poorly. + +**API ergonomics (defer until a third caller exists):** + +- **`selftelemetry.NewRecordingReceiver(t)` test helper.** M8/M9 + both ship their own Option-pattern test-double seam; until a + third receiver wants the same thing, the right abstraction + isn't obvious yet. +- **`selftelemetry.Processor` producer interface.** No in-tree + processor today calls into self-telemetry; design the surface + when the first processor needs it. +- **Per-exporter `tracecore.exporter.failure_rate{component_id}` + series.** Today `AggregateSLOSource` reports one global rate; + operators must use the raw `tracecore_exporter_calls_total` + for per-exporter alerts. Add only if multi-exporter deployments + become common. +- **Live `ExporterRegistry` (factory func instead of slice snapshot).** + M2 captures the registry at `BuildPipelines` time; safe today + because the runtime has no dynamic component registration. If + hot-reload or M-something-dynamic-pipelines ever lands, swap + to `func() []FailureRateReader`. +- **Make `AggregateSLOSource` unexported (constructor-only API).** + Pass-2 reviewer noted the type allows copy-construction which + the doc-comment warns against. Defer because no in-tree caller + copies; revisit if a future caller is tempted. + +**Edge cases worth a test someday:** + +- **Tighter path validation in `internal/telemetry.Server`.** + Today `/foo bar` (with space) or `/foo\n` would slip past the + `cfg.MetricsPath[0] != '/'` check and `http.ServeMux.Handle` + would panic at registration. The panic surfaces at `NewServer` + call time so the failure mode is loud, just not the clean + validation error the rest of the surface advertises. Replace + with `url.Parse` if/when a malformed-path bug ships. +- **`uint64` underflow guard in `AggregateSLOSource`.** Today's + exporter counters are monotonic across the binary's lifetime; + a future restart-mechanism that recycles `selftelemetry.Exporter` + instances could non-monotonically reset them, underflowing + `failure - anchor.failure`. Guard with `if failure < anchor.failure + { return 0 }` when the restart mechanism lands. +- **`Server.Shutdown` ctx semantics under already-expired caller + contexts.** `cmd/tracecore` always passes `context.WithoutCancel(ctx)` + so the production path is correct. The contract should be + documented explicitly or made internally robust if a third + caller appears. +- **`SetDegraded` ↔ observable-callback consistency.** The + observable callback for `degraded_seconds_total` reads + `degradedAt` and `accumulated` outside the mutex; a concurrent + toggle could produce a microsecond-scale inconsistency. Take + the mutex in the callback if/when a real scrape-vs-toggle race + is observed in production. + +**Documentation polish:** + +- **Datadog Agent backend compatibility claim** in + `internal/telemetry/README.md`. Listed as "needs the OpenMetrics + integration" but never end-to-end verified. Either run a real + agent against the surface and write up the result, or hedge + the claim further. + +## M2 review phases — what each phase did and what's left + +The M2 PR (#17) went through five distinct review rounds. Each +round produced commits AND deferrals; this section indexes them so a +future maintainer (or future-me) reading `git log` knows which +commit closed which review and what was intentionally left for later. + +**Loop 4 reviewer agents (3 passes × 2 agents):** +- Pass 1 — correctness + UX (commit c685f43) +- Pass 2 — security + docs (commit 0fa17a6) +- Pass 3 — fresh-eyes + M8/M9 author (commit 628c5ea) +- Pass deferrals — initial documentation (commit 6b0af2d) + +**Self-review pass (10 phases):** +- Phase 1 (4fa276b) — pkg distinction docs, MILESTONES glyph, + process lessons (this section's origin) +- Phase 2 (ca30681) — `--version` OTel SDK, Example_* tests, + Kind constants helper, shutdown error logging, + initTelemetryStack helper +- Phase 3 (569f79e) — ServerConfig.Paths sub-struct +- Phase 4 (9a9fa80) — split selftelemetry/impl.go into three files +- Phase 5 (b975fdf) — WindowedRate primitive extracted +- Phase 6 (1da39bc) — pipelinetest test fakes (SyncBuffer, + FailingMetricsSink, RecordingMetricsSink) +- Phase 7 (8d7ade3) — registerObservability + runRuntime helpers +- Phase 8 (a9862cb) — `validate --explain` flag +- Phase 9 (91212f2) — JSON probe responses + deprecation policy +- Phase 10 (6605aa7) — consolidate export_test files + +**Reviewer second-pass (A+ batches 1-4):** +- Batch 1 (a9d6def) — promote ExporterCarrier, drop SetDegraded + mutex (CAS-only), single-call IncError attribute merge +- Batch 2 (1b4346b) — pre-marshaled JSON probe payloads, cached + PromHandler, registerObservableFloat64Gauge helper +- Batch 3 (a1ea906) — promote config.ValidateMountPath as the + single source of truth, WindowedRate "why-not-x/time/rate" + rationale comment, O(n) docstring on ExporterFailureRate +- Batch 4 (a89b7fa) — fd-leak count fault injection in the + goleak test, tick all 19 m2-self-telemetry.md success criteria + +**Still deferred (and which phase decided):** + +| Item | Decided in | Trigger to revisit | +|---|---|---| +| Per-exporter `failure_rate{component_id}` label | Pass 2 reviewer C + 6b0af2d | Multi-exporter deployments become common | +| `selftelemetry.Processor` producer interface | Pass 3 + 6b0af2d | First in-tree processor needs self-telemetry | +| Runtime cap on `IncError(kind)` cardinality | 6b0af2d | First in-tree caller violates the doc contract | +| Histogram bucket tuning (collection_latency_seconds) | Loop 2 verdict + 6b0af2d | Real-receiver data shows defaults bucket poorly | +| `selftelemetry.NewRecordingReceiver(t)` helper | Pass 3 + 6b0af2d | Third receiver wants the same option-pattern seam | +| Scrape-panic + serve-exit + selftelemetry-init-error metrics | Pass 2 + 6b0af2d | Operators report log-only is insufficient | +| Live (factory-func) ExporterRegistry | Pass 2 + 6b0af2d | Dynamic component registration ships | +| Unexport AggregateSLOSource (constructor-only) | Pass 2 + 6b0af2d | A future caller is tempted to copy-construct | +| Tighter `Server` path validation via `url.Parse` | 6b0af2d | A malformed-path bug actually ships | +| uint64 underflow guard in WindowedRate | 6b0af2d (shipped in batch 3) | — closed | +| SetDegraded ↔ observable-callback mutex consistency | Pass 2 + 6b0af2d | Production scrape races visible | +| Server.Shutdown ctx semantics under expired contexts | 6b0af2d | Third Server.Shutdown caller materializes | +| Datadog Agent compatibility verification | Pass 2 + 6b0af2d | Operator reports | +| pprof endpoint | Loop 2 verdict | Security policy story > 5 LOC plumbing | +| OTLP push reader on MeterProvider | Loop 2 verdict | Operators on push-only backends | +| MetricsLevel knob | Loop 2 verdict | Cardinality becomes a real problem | +| Per-role `CreateSettings` split | STRATEGY M2 row | First cross-role config divergence | +| `TracerProvider` field on TelemetrySettings | STRATEGY M2 row | Tracing milestone | +| Queue mechanism that drives queue.depth_ratio | MILESTONES M2 row | Queue milestone | +| Component restart mechanism for restart_count_per_hour | MILESTONES M2 row | Restart milestone | +| `tracecore validate --explain` provenance (default vs explicit) | Phase 8 | Operators ask for it | +| JSON probe richer payload (more fields) | Phase 9 | Tooling needs structured fields beyond `status` | +| Squash review-fix commits at merge time (repo policy) | Phase 1 lesson | Pattern repeats on the next milestone | + +Each future milestone reading this file should scan the table for +"trigger to revisit" entries matching its own scope before opening +new tickets. + +## M2 ecosystem-fit deferrals + +Post-merge reviewer flagged adoption-loop gaps beyond M2 scope. The +implementation is genuinely strong; these are "what does the next +milestone need to make M2 a baseline rather than a destination?" +items. Shipped here as carry-forward with the milestone target the +reviewer suggested. + +| Item | Suggested target | Why deferred from M2 | +|---|---|---| +| Helm chart `install/kubernetes/tracecore/` with `telemetry.enabled` toggle, ServiceMonitor template, kubelet probe stanza | M5b (Helm chart milestone per NORTHSTARS O2) | Packaging is its own milestone with its own conventions; landing a skeleton in M2 risks half-finished chart shipping as "the canonical one." | +| Mermaid architecture diagram in RFC-0006 (Component → meter.Counter.Add → SDK MeterProvider → Prom Reader → Registry → /metrics) | M3 (CI gates the doc render) | Tools to render Mermaid in CI not yet wired; ship the diagram once `make doc-check` knows how to validate it. | +| `tracecore version --bundle` emitting an issue-triage payload (version, revision, build-date, enabled-features, deps, OS/arch) | M3 (operator-UX polish milestone) | Touches CLI args + version package; M2's scope was the telemetry surface, not the CLI shape. | +| Concrete overhead numbers in `internal/telemetry/README.md` "Performance" section (e.g., "at 1Hz scrape with 8 receivers, CPU% is X") | M3 (operator-UX polish) | NORTHSTARS O2 has the <0.05%/receiver budget; running the full bench against representative receivers belongs to the bench-harness milestone (M4 or M5). | +| Threat model 1-pager (`docs/THREAT-MODEL.md`) covering scraper-can-read-not-write + bound-interface attack-surface analysis | M3 (security posture polish) | A real threat model wants formal review; the M2 README "Security posture" section is the operationally meaningful subset for shipping. | +| `EnableOpenMetrics: true` config toggle (current default `false` matches Prometheus) | M5 (backend-compat milestone) | Mimir/Cortex 2024+ want exemplars + `_created` lines; needs explicit operator opt-in and a compat-matrix test. | +| Recording rules `tracecore:exporter_failure_rate:rate5m` + Grafana dashboard JSON | M4 (observability-platform integration milestone) | Operator-platform glue, not collector-side work. | +| Programmatic `tracecore.RunCollect(...)` API for embedding the collector in another Go binary | post-v1 | Pre-v1 the surface is still settling; locking a Go API into the binary now would force re-cuts. | +| Server `Server:` header strip | opportunistic | Trivial; do it the next time someone touches server.go. | +| Rate limit on `/metrics` (defensive against pathological scrapers) | M5 (security hardening) | Slowloris timeouts already bound single-attacker resource burn; rate-limit is a layered defense. | +| SBOM human-readable summary attached to release artifacts | M3 (release-process milestone) | The CI scan already runs; the human-readable manifest is operator-friendliness work. | + +Each item's "suggested target" is a hint, not a commitment — the +milestone planner gets the final word. The point of this table is +that future-me reading FOLLOWUPS sees both the post-M2-A+ gap and +the rationale for where each piece belongs. + ## Open — opportunistic ### Test depth diff --git a/docs/STRATEGY.md b/docs/STRATEGY.md index ae79b412..bf2fa2ac 100644 --- a/docs/STRATEGY.md +++ b/docs/STRATEGY.md @@ -35,9 +35,12 @@ Current accepted divergences: | Test fixture naming | `pipelinetest.Fixture` | `componenttest.NopHost` | Different concepts (full fixture vs no-op host); coexists, doesn't conflict | permanent | | Consumer-seam panic recovery | `pipeline.WrapSafeX` wraps every Consume call | None — panics propagate | M2 will hot-path components for self-metrics; a buggy increment must not crash the binary | permanent | | Operator UX | `WrapFirstData` emits "pipeline first data" once per pipeline | None | UX criterion #4 — lets operators verify a pipeline is alive without external tooling | permanent | -| `Host.ReportStatus` | Method on `Host` (no-op stub for M2) | Removed from `Host` in v0.152; lives on `componentstatus.ReportStatus(host, ev)` free fn | Discovered M1.6 Phase-19 OTel audit. Align in M2 alongside `componentstatus` work. | **M2** | -| `CreateSettings` shape | `{ID, Telemetry}` shared across roles | `receiver.Settings` / `exporter.Settings` / `processor.Settings` with `{ID, TelemetrySettings (embedded), BuildInfo, _ struct{}}` | M2 adds `BuildInfo` and the unkeyed-init guard; per-role split deferred to first cross-role config divergence. | **M2** (BuildInfo + guard); **post-v1** (per-role split) | -| `TelemetrySettings` shape | `{Logger, Resource}` | `{Logger, TracerProvider, MeterProvider, Resource, _ struct{}}` | RFC-0003 §"TelemetrySettings extension preview" pins the M2 shape; field additions are non-breaking under the variadic-Options pattern. | **M2** (MeterProvider); **post-v1** (TracerProvider) | +| `Host.ReportStatus` | Free fn `internal/componentstatus.ReportStatus(host, ev)`; host opts in via `StatusReporter` interface | Free fn `componentstatus.ReportStatus(host, ev)` in `go.opentelemetry.io/collector/component/componentstatus` | M2 closure (in-tree package vs external dep to avoid pulling component module; same shape). | **done** (M2) | +| `CreateSettings` shape | `{ID, Telemetry, BuildInfo, _ struct{}}` shared across roles | `receiver.Settings` / `exporter.Settings` / `processor.Settings` with `{ID, TelemetrySettings (embedded), BuildInfo, _ struct{}}` | M2 added `BuildInfo` and the unkeyed-init guard; per-role split deferred to first cross-role config divergence. | **done** (M2) for BuildInfo + guard; **post-v1** for per-role split | +| `TelemetrySettings` shape | `{Logger *slog.Logger, MeterProvider, Resource, _ struct{}}` | `{Logger *zap.Logger, TracerProvider, MeterProvider, Resource, _ struct{}}` | M2 added MeterProvider + guard; slog vs zap is a pre-existing permanent divergence; TracerProvider deferred to tracing milestone. | **done** (M2) for MeterProvider; **post-v1** for TracerProvider | +| Self-telemetry HTTP default bind | `localhost:8888` | `0.0.0.0:8888` | Security tiebreaker: pre-1.0 tracecore favours safe-by-default. Operators override via `telemetry.listen`. | permanent | +| Self-telemetry metric names | `tracecore.receiver.*` | `otelcol_*` | Separate surface; operators KNOW they're scraping tracecore. | permanent | +| `componentstatus` package | `internal/componentstatus` (in-tree) | `go.opentelemetry.io/collector/component/componentstatus` (external module) | Avoids pulling the OTel collector component module just for this one fn. Revisit at M22 OTel-compat sweep. | permanent (revisit M22) | | Factory interface decomposition | Three monolithic `ReceiverFactory` / `ProcessorFactory` / `ExporterFactory` | Base `component.Factory` embedded into per-role factories, plus `XStability()` per signal | Stability tracking deferred to post-v1.0 (RFC-0003 §"Deferred"); decomposition adds no value without it. | **post-v1** (when stability tracking lands) | | Sentinel error name | `ErrSignalNotSupported` | `pipeline.ErrSignalNotSupported` | Aligned in M1.6 Phase-19 audit (was `ErrSignalUnsupported`). | **done** (M1.6) | diff --git a/docs/agents/RECEIVER-PATTERNS.md b/docs/agents/RECEIVER-PATTERNS.md index aef5fb12..3bf98746 100644 --- a/docs/agents/RECEIVER-PATTERNS.md +++ b/docs/agents/RECEIVER-PATTERNS.md @@ -22,8 +22,57 @@ readers need the trail. ## Active patterns -*Empty — M8 / M9 will populate the first entries when their loops -land.* +### Self-telemetry wiring (constructor injection + noop fallback) + +**Established by:** M2 self-telemetry surface +**Reference:** `components/receivers/clockreceiver/clockreceiver.go` +in `newReceiver` (called from `CreateMetrics`). +**One-line:** Construct the real `selftelemetry.Receiver` from +`set.Telemetry.MeterProvider` + `set.ID` inside the receiver's +`newReceiver` (the function the factory's `CreateMetrics` calls), +with a noop fallback so the hot path never nil-checks. + +The canonical six-line pattern: + +```go +sr := selftelemetry.NewNoopReceiver() +if set.Telemetry.MeterProvider != nil { + if r, err := selftelemetry.NewReceiver(set.ID, set.Telemetry.MeterProvider); err == nil { + sr = r + } else if set.Telemetry.Logger != nil { + set.Telemetry.Logger.Warn("self-telemetry init failed; using noop", "err", err) + } +} +``` + +Call from the hot path (matches `clockreceiver.emit` exactly): + +```go +start := time.Now() +err := r.next.ConsumeMetrics(ctx, md) +sr.ObserveLatency(time.Since(start)) // ALWAYS — including failures + +if err != nil { + sr.IncError("downstream") // kind MUST be low-cardinality + return +} +sr.IncEmissions(n) +sr.MarkActivity() +``` + +Notes: + +- **Latency is observed for both successes and failures** — operators + alerting on p99 latency see the union. A future receiver-author + ergonomics pass may split into two histograms. +- `IncError(kind)`: `kind` MUST be low-cardinality (`"connect"`, + `"parse"`, `"downstream"`, never `err.Error()`). +- `SetDegraded(true)` on entering degraded state, `(false)` on + recovery. Cumulative degraded-seconds accumulate via an + observable counter, so scrapes see fresh data even before + recovery. + +Example config: [`docs/examples/with-telemetry.yaml`](../examples/with-telemetry.yaml). ## Expected first entries diff --git a/docs/agents/REVIEWER-CONTEXT.md b/docs/agents/REVIEWER-CONTEXT.md index 13dcebcc..689b8429 100644 --- a/docs/agents/REVIEWER-CONTEXT.md +++ b/docs/agents/REVIEWER-CONTEXT.md @@ -206,9 +206,12 @@ RFC or STRATEGY.md update. | `pipelinetest.Fixture` vs `componenttest.NopHost` | permanent | done | | `pipeline.WrapSafeX` panic recovery vs propagation | permanent | done | | `WrapFirstData` UX criterion #4 | permanent | done | -| `Host.ReportStatus` method vs free fn `componentstatus.ReportStatus` | open | **M2** | -| `CreateSettings` shape `{ID, Telemetry}` vs per-role split | open | **M2** (BuildInfo+guard); post-v1 (split) | -| `TelemetrySettings` shape `{Logger, Resource}` vs full | open | **M2** (MeterProvider); post-v1 (TracerProvider) | +| `Host.ReportStatus` method vs free fn `internal/componentstatus.ReportStatus(host, ev)` | done | **M2** — in-tree pkg, host opts in via `StatusReporter` | +| `CreateSettings` shape `{ID, Telemetry, BuildInfo, _ struct{}}` vs per-role split | done | **M2** for BuildInfo+guard; post-v1 for per-role split | +| `TelemetrySettings` shape `{Logger, MeterProvider, Resource, _ struct{}}` vs full | done | **M2** for MeterProvider; post-v1 for TracerProvider | +| Self-telemetry HTTP default bind `localhost:8888` vs OTel `0.0.0.0:8888` | permanent | done (M2) — security tiebreaker | +| Self-telemetry metric names `tracecore.*` vs `otelcol_*` | permanent | done (M2) | +| `componentstatus` in-tree at `internal/componentstatus` vs external module | permanent | done (M2); revisit M22 | | Factory monolithic vs decomposed + `XStability()` | open | post-v1 | | Sentinel name `ErrSignalNotSupported` (was Unsupported) | done | M1.6 | diff --git a/docs/examples/grafana-dashboard.example.json b/docs/examples/grafana-dashboard.example.json new file mode 100644 index 00000000..fe511d5e --- /dev/null +++ b/docs/examples/grafana-dashboard.example.json @@ -0,0 +1,228 @@ +{ + "title": "tracecore self-telemetry", + "description": "Starter dashboard for tracecore's M2 self-telemetry surface. Import via Grafana → Dashboards → New → Import → paste JSON. Six panels cover the five selftelemetry.Receiver methods plus exporter health and build identity. Adjust the Prometheus datasource UID if yours isn't named `prometheus`.", + "tags": ["tracecore", "self-telemetry", "m2"], + "timezone": "browser", + "schemaVersion": 38, + "version": 1, + "refresh": "30s", + "time": {"from": "now-30m", "to": "now"}, + "templating": { + "list": [ + { + "name": "instance", + "type": "query", + "datasource": {"type": "prometheus", "uid": "prometheus"}, + "query": "label_values(tracecore_build_info, instance)", + "refresh": 1, + "multi": true, + "includeAll": true + }, + { + "name": "component_id", + "type": "query", + "datasource": {"type": "prometheus", "uid": "prometheus"}, + "query": "label_values(tracecore_receiver_emissions_total{instance=~\"$instance\"}, component_id)", + "refresh": 1, + "multi": true, + "includeAll": true + } + ] + }, + "panels": [ + { + "id": 1, + "title": "Build identity", + "type": "table", + "gridPos": {"x": 0, "y": 0, "w": 24, "h": 4}, + "datasource": {"type": "prometheus", "uid": "prometheus"}, + "targets": [ + { + "expr": "tracecore_build_info{instance=~\"$instance\"}", + "format": "table", + "instant": true, + "refId": "A" + } + ], + "description": "Version + revision + command of every running tracecore instance. Join target for other tracecore_* queries via `* on() group_left(version, revision) tracecore_build_info`." + }, + { + "id": 2, + "title": "Exporter failure rate (60s rolling)", + "type": "timeseries", + "gridPos": {"x": 0, "y": 4, "w": 12, "h": 8}, + "datasource": {"type": "prometheus", "uid": "prometheus"}, + "targets": [ + { + "expr": "tracecore_exporter_failure_rate{instance=~\"$instance\"}", + "legendFormat": "{{instance}}", + "refId": "A" + } + ], + "fieldConfig": { + "defaults": { + "unit": "percentunit", + "min": 0, + "max": 1, + "thresholds": { + "mode": "absolute", + "steps": [ + {"color": "green", "value": null}, + {"color": "yellow", "value": 0.01}, + {"color": "red", "value": 0.1} + ] + } + } + }, + "description": "Operators alert on this gauge sustained above 0.01 — see docs/examples/prometheus-alerts.example.yaml `TracecoreExporterFailureRateHigh`. 0 means \"no failures in the last 60s\" OR \"warming up\" (during boot)." + }, + { + "id": 3, + "title": "Exporter call rate by result", + "type": "timeseries", + "gridPos": {"x": 12, "y": 4, "w": 12, "h": 8}, + "datasource": {"type": "prometheus", "uid": "prometheus"}, + "targets": [ + { + "expr": "sum by (result) (rate(tracecore_exporter_calls_total{instance=~\"$instance\"}[5m]))", + "legendFormat": "{{result}}", + "refId": "A" + } + ], + "fieldConfig": {"defaults": {"unit": "ops"}}, + "description": "Raw counter rate. Use when failure_rate isn't granular enough (custom windows, per-kind splits)." + }, + { + "id": 4, + "title": "Receiver emission rate by component", + "type": "timeseries", + "gridPos": {"x": 0, "y": 12, "w": 12, "h": 8}, + "datasource": {"type": "prometheus", "uid": "prometheus"}, + "targets": [ + { + "expr": "rate(tracecore_receiver_emissions_total{instance=~\"$instance\", component_id=~\"$component_id\"}[5m])", + "legendFormat": "{{component_id}}", + "refId": "A" + } + ], + "fieldConfig": {"defaults": {"unit": "ops"}}, + "description": "Per-component emission throughput. A flat line at 0 for a normally-active receiver is a red flag — pair with the activity-staleness panel below." + }, + { + "id": 5, + "title": "Collection latency (p50 / p95 / p99)", + "type": "timeseries", + "gridPos": {"x": 12, "y": 12, "w": 12, "h": 8}, + "datasource": {"type": "prometheus", "uid": "prometheus"}, + "targets": [ + { + "expr": "histogram_quantile(0.50, sum by (component_id, le) (rate(tracecore_receiver_collection_latency_seconds_bucket{instance=~\"$instance\", component_id=~\"$component_id\"}[5m])))", + "legendFormat": "p50 {{component_id}}", + "refId": "A" + }, + { + "expr": "histogram_quantile(0.95, sum by (component_id, le) (rate(tracecore_receiver_collection_latency_seconds_bucket{instance=~\"$instance\", component_id=~\"$component_id\"}[5m])))", + "legendFormat": "p95 {{component_id}}", + "refId": "B" + }, + { + "expr": "histogram_quantile(0.99, sum by (component_id, le) (rate(tracecore_receiver_collection_latency_seconds_bucket{instance=~\"$instance\", component_id=~\"$component_id\"}[5m])))", + "legendFormat": "p99 {{component_id}}", + "refId": "C" + } + ], + "fieldConfig": {"defaults": {"unit": "s"}}, + "description": "Histogram is bucketed for sub-ms resolution: 100µs, 1ms, 5ms, 10ms, 50ms, 100ms, 250ms, 500ms, 1s, 2.5s, 5s, 10s. Failure-path latency is mixed in by design (see selftelemetry.Receiver.ObserveLatency godoc)." + }, + { + "id": 6, + "title": "Receiver activity staleness", + "type": "timeseries", + "gridPos": {"x": 0, "y": 20, "w": 12, "h": 8}, + "datasource": {"type": "prometheus", "uid": "prometheus"}, + "targets": [ + { + "expr": "time() - tracecore_receiver_last_activity_unix_seconds{instance=~\"$instance\", component_id=~\"$component_id\"}", + "legendFormat": "{{component_id}}", + "refId": "A" + } + ], + "fieldConfig": { + "defaults": { + "unit": "s", + "thresholds": { + "mode": "absolute", + "steps": [ + {"color": "green", "value": null}, + {"color": "yellow", "value": 60}, + {"color": "red", "value": 300} + ] + } + } + }, + "description": "Seconds since the receiver's last MarkActivity. Operators alert on >5 minutes (TracecoreReceiverNoActivity). Fresh-boot receivers seed activity to NewReceiver time so this never shows 1970 falsely." + }, + { + "id": 7, + "title": "Receiver errors by kind", + "type": "timeseries", + "gridPos": {"x": 12, "y": 20, "w": 12, "h": 8}, + "datasource": {"type": "prometheus", "uid": "prometheus"}, + "targets": [ + { + "expr": "sum by (component_id, kind) (rate(tracecore_receiver_errors_total{instance=~\"$instance\", component_id=~\"$component_id\"}[5m]))", + "legendFormat": "{{component_id}} {{kind}}", + "refId": "A" + } + ], + "fieldConfig": {"defaults": {"unit": "ops"}}, + "description": "Errors per kind (Kind* constants in selftelemetry: connect, parse, downstream, enumerate, read, cardinality, panic, init). Cardinality is bounded by the canonical kind set; receivers that drift see this panel split into >8 series." + }, + { + "id": 8, + "title": "Degraded-seconds accumulation", + "type": "timeseries", + "gridPos": {"x": 0, "y": 28, "w": 12, "h": 8}, + "datasource": {"type": "prometheus", "uid": "prometheus"}, + "targets": [ + { + "expr": "rate(tracecore_receiver_degraded_seconds_total{instance=~\"$instance\", component_id=~\"$component_id\"}[5m])", + "legendFormat": "{{component_id}}", + "refId": "A" + } + ], + "fieldConfig": { + "defaults": { + "unit": "percentunit", + "min": 0, + "max": 1, + "thresholds": { + "mode": "absolute", + "steps": [ + {"color": "green", "value": null}, + {"color": "yellow", "value": 0.05}, + {"color": "red", "value": 0.5} + ] + } + } + }, + "description": "Fraction of time the receiver spent in degraded state over the last 5 minutes. 0.05 = 5% of the window degraded. Receivers that boot already-degraded must call SetDegraded(true) explicitly for this to tick." + }, + { + "id": 9, + "title": "Self-telemetry init errors", + "type": "timeseries", + "gridPos": {"x": 12, "y": 28, "w": 12, "h": 8}, + "datasource": {"type": "prometheus", "uid": "prometheus"}, + "targets": [ + { + "expr": "sum by (kind, reason) (rate(tracecore_selftelemetry_init_errors_total{instance=~\"$instance\"}[5m]))", + "legendFormat": "{{kind}} {{reason}}", + "refId": "A" + } + ], + "fieldConfig": {"defaults": {"unit": "ops"}}, + "description": "Components that fell back to the noop selftelemetry impl — their per-component metrics are absent. Should always be 0 in healthy operation; non-zero indicates a wire-up regression." + } + ] +} diff --git a/docs/examples/prometheus-alerts.example.yaml b/docs/examples/prometheus-alerts.example.yaml new file mode 100644 index 00000000..c9a8c4a7 --- /dev/null +++ b/docs/examples/prometheus-alerts.example.yaml @@ -0,0 +1,87 @@ +# Starter Prometheus alert rules for tracecore's M2 self-telemetry +# surface. Five alerts cover the operationally meaningful failure +# modes. Adjust thresholds + `for:` durations to match your SLOs; +# the defaults here are deliberately conservative. +# +# Wire by including under `rule_files:` in prometheus.yml, or convert +# to a PrometheusRule CRD for prometheus-operator setups. +groups: + - name: tracecore.self-telemetry + interval: 30s + rules: + # Exporter failure rate sustained — operators' #1 page-worthy + # signal. The gauge is a rolling 60s window; alerting above 0.01 + # over 5 minutes means a real upstream is rejecting pushes. + - alert: TracecoreExporterFailureRateHigh + expr: tracecore_exporter_failure_rate > 0.01 + for: 5m + labels: + severity: warning + annotations: + summary: tracecore exporter failure rate exceeds 1% (5m sustained) + description: | + tracecore_exporter_failure_rate is {{ $value }} on + {{ $labels.instance }} for the last 5 minutes. Inspect + tracecore_exporter_calls_total{result="failure"} to see + which exporter is failing. + + # Receiver stuck in degraded state — accumulating + # degraded-seconds while no recovery transition fires. + - alert: TracecoreReceiverDegraded + expr: rate(tracecore_receiver_degraded_seconds_total[5m]) > 0.05 + for: 5m + labels: + severity: warning + annotations: + summary: tracecore receiver {{ $labels.component_id }} spent >5%/min in degraded state + description: | + Receiver {{ $labels.component_id }} on {{ $labels.instance }} + has accumulated degraded time at a rate of + {{ $value | humanizePercentage }} over the last 5 minutes. + + # Stale activity — a receiver that hasn't reported activity in + # 5 minutes. Possible deadlock or stuck upstream. + - alert: TracecoreReceiverNoActivity + expr: (time() - tracecore_receiver_last_activity_unix_seconds) > 300 + for: 1m + labels: + severity: warning + annotations: + summary: tracecore receiver {{ $labels.component_id }} has been silent for >5 minutes + description: | + tracecore_receiver_last_activity_unix_seconds is + {{ $value | humanizeDuration }} behind wall-clock on + {{ $labels.instance }} for component + {{ $labels.component_id }}. The receiver may be + wedged or its upstream may have disappeared. + + # Self-telemetry construction silently fell back to noop. + # Operators see no per-component metrics from this binary even + # though the surface is up; investigate the binary's log. + - alert: TracecoreSelftelemetryInitErrors + expr: increase(tracecore_selftelemetry_init_errors_total[10m]) > 0 + labels: + severity: warning + annotations: + summary: tracecore self-telemetry init failed for {{ $labels.kind }} {{ $labels.component_id }} + description: | + Component {{ $labels.component_id }} (kind={{ $labels.kind }}) + fell back to the noop selftelemetry impl on + {{ $labels.instance }}; reason={{ $labels.reason }}. + Per-component metrics from this component are absent + from the scrape. + + # Build identity informational — fires on a join against an + # external "blessed version" set. Operators redefine the + # expression to match their fleet management approach; the + # alert ships as scaffolding. + - alert: TracecoreBuildIdentityKnown + expr: tracecore_build_info == 1 + labels: + severity: info + annotations: + summary: tracecore {{ $labels.version }} ({{ $labels.revision }}) on {{ $labels.instance }} + description: | + Informational. Replace with a real version-drift alert + once your fleet defines a "current" tracecore version + (e.g., `tracecore_build_info{version!="v0.2.0"}`). diff --git a/docs/examples/with-telemetry.yaml b/docs/examples/with-telemetry.yaml new file mode 100644 index 00000000..584fbac7 --- /dev/null +++ b/docs/examples/with-telemetry.yaml @@ -0,0 +1,41 @@ +# Operator-facing example: tracecore with the M2 self-telemetry +# surface enabled. +# +# Save as `tracecore.yaml` and run: +# tracecore collect --config=tracecore.yaml +# then scrape: +# curl http://localhost:8888/metrics +# +# The `telemetry:` block is OPT-IN. Without it tracecore binds no +# HTTP port — operators in production opt in by setting +# `telemetry.enabled: true` (or by uncommenting the listen line and +# changing it to `:8888` to scrape from another node). + +receivers: + clockreceiver: + interval: 1s + +exporters: + stdoutexporter: {} + +# Self-telemetry surface. The whole block is opt-in: omitting it +# leaves tracecore binding no HTTP port. As shipped this example +# enables the surface so a first-run operator gets a working +# `curl localhost:8888/metrics` — set `enabled: false` (or delete +# the block) to turn it off again. +telemetry: + enabled: true + # Default "localhost:8888" — safe default keeps tracecore from + # binding all interfaces. Operators in multi-node setups override + # to ":8888" or to a specific interface. + listen: localhost:8888 + paths: + metrics: /metrics + healthz: /healthz + readyz: /readyz + +service: + pipelines: + metrics/primary: + receivers: [clockreceiver] + exporters: [stdoutexporter] diff --git a/docs/rfcs/RFC-0006-self-telemetry-surface.md b/docs/rfcs/RFC-0006-self-telemetry-surface.md new file mode 100644 index 00000000..bc9617b7 --- /dev/null +++ b/docs/rfcs/RFC-0006-self-telemetry-surface.md @@ -0,0 +1,275 @@ +# RFC-0006: Self-telemetry surface + +**Status:** Accepted (M2) +**Authors:** Tracecore team +**Implementation:** `internal/telemetry`, `internal/selftelemetry` +**Related:** [RFC-0003 §"TelemetrySettings extension preview"](0003-pipeline-runtime-and-component-contract.md), [`docs/STRATEGY.md`](../STRATEGY.md) M2 divergence rows. + +## Summary + +Add a single operator-facing HTTP server that exposes Prometheus-style +`/metrics` plus k8s-compatible `/healthz` and `/readyz`. Producer side +is the OTel SDK MeterProvider injected into every component via +`pipeline.TelemetrySettings`. Aligns three pre-existing divergences +from OTel collector v0.152.0: `Host.ReportStatus`, `CreateSettings` +shape, and `TelemetrySettings` shape. + +## Goals + +1. Operators scrape one endpoint and see what tracecore is doing. +2. Receiver authors wire self-telemetry in one line. +3. The surface itself has zero workload-perturbation risk + (default-off; localhost-only when on). + +## Non-goals (carry-forward from M2) + +- `pprof` endpoint (security policy story > 5 LOC plumbing). +- OTLP push reader (operators on push-only backends). +- `MetricsLevel` knob (defer until cardinality is a real problem). +- Histogram bucket tuning for `collection_latency_seconds`. +- Per-role `CreateSettings` split. + +## Architecture + +```mermaid +flowchart LR + classDef component fill:#fef3c7,stroke:#92400e + classDef sdk fill:#e0e7ff,stroke:#3730a3 + classDef surface fill:#dcfce7,stroke:#166534 + classDef operator fill:#fecaca,stroke:#991b1b + + R[clockreceiver / M8 / M9]:::component + E[stdoutexporter / future exporters]:::component + + R -- selftelemetry.Receiver --> SR[receiverImpl]:::sdk + E -- selftelemetry.Exporter --> SE[exporterImpl]:::sdk + + SR -- Meter.Counter.Add --> MP[OTel SDK MeterProvider]:::sdk + SE -- Meter.Counter.Add --> MP + + SLO[AggregateSLOSource + WindowedRate]:::sdk -- Observable callbacks --> MP + + MP -- Reader --> EX[OTel Prometheus exporter]:::sdk + EX --> REG[prometheus.Registry]:::sdk + REG --> H[promhttp.HandlerFor]:::surface + H --> M[/metrics]:::surface + HC[handleHealthz]:::surface --> HZ[/healthz]:::surface + RD[handleReadyz]:::surface --> RZ[/readyz]:::surface + + M --> OPS[Prometheus / Mimir / Grafana]:::operator + HZ --> K8S[k8s livenessProbe]:::operator + RZ --> K8S +``` + +Three layers: **components** (receivers/exporters call the +producer-side `selftelemetry` interfaces), **SDK** (OTel +MeterProvider with the bundled Prom exporter writing into a +tracecore-owned `*prometheus.Registry`), **surface** (single +ServeMux mounting `/metrics`, `/healthz`, `/readyz` on one +listener). Operator scraping ↔ k8s probes ↔ Prom/Grafana sit on +the right. + +The dashed-line equivalents you'd add for M3+: a TracerProvider +parallel to the MeterProvider for tracing; an OTLP push Reader +parallel to the Prom Reader for push-only operators. + +## Producer surface: `selftelemetry.Receiver` + +The five-method interface already shipped at M1.6: + +| Method | Metric (OTel SDK name) | Prom exposition | +|---|---|---| +| `IncError(kind)` | `tracecore.receiver.errors_total` | `tracecore_receiver_errors_total{kind,component_id}` | +| `IncEmissions(n)` | `tracecore.receiver.emissions_total` | `tracecore_receiver_emissions_total{component_id}` | +| `ObserveLatency(d)` | `tracecore.receiver.collection_latency_seconds` | histogram | +| `SetDegraded(b)` | `tracecore.receiver.degraded_seconds_total` | observable counter — accumulates including currently-open interval | +| `MarkActivity()` | `tracecore.receiver.last_activity_unix_seconds` | observable gauge | + +Receiver authors get the real impl from `selftelemetry.NewReceiver(id, +mp)` where `mp` comes from `telSet.MeterProvider`. The noop impl +remains for tests + the import-time default. + +## Operator surface: `/metrics`, `/healthz`, `/readyz` + +Single `http.ServeMux` behind one listener. + +- `/metrics` — promhttp handler over the OTel-fed + `*prometheus.Registry`. text-exposition format. HEAD + GET work. +- `/healthz` — 200 unless shutting down. 503 on shutdown. +- `/readyz` — 200 once `cmd/tracecore.runCollect`'s `ready` atomic + flips true (after `Runtime.Start` returns). 503 before that AND + during shutdown. A receiver entering "degraded" state does NOT + flip `/readyz` to 503 — that policy decision keeps k8s from + evicting pods on transient degradation. + +## Config schema + +```yaml +telemetry: + enabled: false # default OFF — opt in to bind any port + listen: "localhost:8888" # default localhost-only + paths: + metrics: /metrics + healthz: /healthz + readyz: /readyz +``` + +When `enabled: false`, other fields are not validated. Operators may +keep a fully-spelled production block commented or with the disable +flag without tripping `tracecore validate`. + +Validation: + +- `listen` must be a host:port (`net.SplitHostPort`). +- Paths must start with `/`. +- Empty fields apply the canonical defaults. + +## Lifecycle + +`cmd/tracecore.runCollect`: + +1. Build BuildInfo from `internal/version`. +2. If telemetry enabled: construct MeterProvider + Server. Start + the Server. Now `/healthz` returns 200 and `/readyz` returns 503 + (probe-friendly during boot). +3. Build pipelines with `WithMeterProvider(mp)` + `WithBuildInfo(bi)`. +4. Start runtime. +5. Flip `ready` atomic to true. `/readyz` transitions to 200. +6. Wait for signal. +7. Flip `ready` to false. `/readyz` transitions to 503 — scrapers + stop sending traffic. +8. Shut down runtime. +9. Shut down telemetry server. Connections drain. +10. Shut down MeterProvider. + +Order matters: server closes connections before MeterProvider +drains so a final scrape doesn't fail mid-export. + +## Divergences from OTel collector v0.152.0 (STRATEGY.md M2 rows +move from open to done) + +1. **`Host.ReportStatus`** → free function `componentstatus.ReportStatus(host, ev)`. + tracecore's package is `internal/componentstatus` rather than + `go.opentelemetry.io/collector/component/componentstatus` — same + shape, in-tree to avoid pulling the collector component module. +2. **`CreateSettings`** — gains `BuildInfo BuildInfo` field + + trailing `_ struct{}` guard. Mirrors OTel's + `{ID, TelemetrySettings, BuildInfo, _ struct{}}` at v1.55.0, + minus the per-role split (deferred to first cross-role config + divergence; carry-forward). +3. **`TelemetrySettings`** — gains `MeterProvider metric.MeterProvider` + + `_ struct{}` guard. Mirrors OTel's + `{Logger, TracerProvider, MeterProvider, Resource, _ struct{}}` + at v1.55.0, minus TracerProvider (deferred to post-v1; + carry-forward). + +Pre-existing divergences kept (already in STRATEGY): + +- `*slog.Logger` vs OTel's `*zap.Logger`. +- `localhost:8888` default vs OTel's `0.0.0.0:8888` (security + tiebreaker; documented). +- `tracecore.*` self-metric names vs `otelcol_*` semconv (separate + surface). + +## Risks + mitigations + +| Risk | Mitigation | +|---|---| +| OTel Prom exporter (v0.x) API churn | Single chokepoint at `telemetry.NewMeterProvider`; pinned version | +| Listener fd leak under repeated Start/Shutdown | `Server.Shutdown` waits for `Serve` to return + `goleak` in TestMain | +| `/readyz` flapping under k8s | Degraded ≠ not-ready; only "pre-Started" returns 503 | +| Shutdown exceeds PRINCIPLES §1 1s budget | `Server.Shutdown` capped at 800ms; pull-based exporter has no remote drain | +| Self-metric cardinality explosion | `kind` labels are low-cardinality by contract; receivers calling `IncError(err.Error())` would violate it (docs warn) | +| `failure_rate` pinned > 0 forever after one historical failure | Rolling 60s window via timestamped ring buffer in `AggregateSLOSource`; lifetime ratio is never exposed | +| Server.Start fails after CAS → Shutdown hangs on doneCh | Close `doneCh` in the bind-error path; regression test forces EADDRINUSE | +| `http.Server.Serve` returns non-clean error silently | `ServerConfig.Logger` surfaces non-`ErrServerClosed` exits at slog.Error | + +## TLS — carry-forward schema sketch + +M2 ships plaintext-only. The reverse-proxy approach (operator runs +Envoy/nginx/Istio sidecar in front of `:8888`) is the documented +near-term recommendation; in-process TLS is deferred so the config +schema stays stable across milestones, sketched here: + +```yaml +telemetry: + enabled: true + listen: "0.0.0.0:8888" + tls: + enabled: false # default off + cert_file: "" # PEM-encoded server certificate + key_file: "" # PEM-encoded private key + client_ca_file: "" # set for mTLS — clients must present + # a cert signed by this CA + min_version: "TLS1.3" # "TLS1.2" | "TLS1.3" (default 1.3) + paths: + metrics: /metrics + healthz: /healthz + readyz: /readyz +``` + +Future milestones implementing TLS preserve this YAML shape. The +implementation: + +- `tls.enabled: true` requires `cert_file` + `key_file`; the loader + rejects bare-enabled or cert-without-key. +- `client_ca_file` set → mTLS enforced (`tls.Config.ClientAuth = + RequireAndVerifyClientCert`); otherwise server auth only. +- `min_version` defaults to `TLS1.3`; downgrade to `TLS1.2` + requires explicit operator action. +- Cert + key are reloaded on SIGHUP — operators rotate without + restart. + +Pre-1.0 the schema may shift; v1.0 anchors it. + +## Deprecation policy for self-metric names + +Operators write Prometheus alert rules, Grafana dashboards, and SLO +recording rules against the metric names this surface ships: + +- `tracecore_receiver_{errors,emissions,collection_latency_seconds,degraded_seconds,last_activity_unix_seconds}_total` +- `tracecore_exporter_calls_total`, `tracecore_exporter_failure_rate` +- `tracecore_queue_depth_ratio`, `tracecore_component_restart_count_per_hour` +- `tracecore_build_info`, `tracecore_selftelemetry_init_errors_total` + +A future milestone renaming any of these silently breaks operator +tooling. To avoid that, tracecore commits to the following deprecation +procedure for any post-1.0 rename: + +1. **Announce.** Open an RFC documenting the rename, the rationale, + and the cut-over window. Land an `Assisted-by:` commit that adds + the new name **alongside** the old; both emit the same value. + +2. **Bridge window.** Both names emit for at least one minor release + (`v0.X.0` → `v0.X+1.0`). Operators update their rules at their own + pace. + +3. **Surface usage.** During the bridge window emit a + `tracecore_deprecated_metric{old_name, new_name}` counter that + ticks every time the deprecated name is scraped (the OTel SDK + doesn't directly expose scrape-time counters; an observable gauge + sufficies for "old name is still being asked for"). Operators + alert on the counter to find dashboards still using the old name. + +4. **Removal.** After the bridge window the old name is removed in + the same minor release that announces the deprecated counter's + removal. CHANGELOG `[Removed]` section + a release-notes + `[CHANGE]` block warn loudly. + +Pre-1.0 (today, M2–M7) tracecore reserves the right to rename without +this procedure — the surface is still settling. The CHANGELOG entry +for any pre-1.0 rename explicitly notes "no bridge window: pre-1.0 +contract." + +## Acceptance tests + +| Test | Pin | +|---|---| +| `internal/pipeline.TestCreateSettings_HasBuildInfo` | BuildInfo round-trips through CreateSettings | +| `internal/pipeline.TestTelemetrySettings_HasMeterProvider` | MeterProvider round-trips through TelemetrySettings | +| `internal/telemetry.TestNewMeterProvider_YieldsWorkingMeter` | Provider yields working Meter; emissions surface via PromHandler | +| `internal/telemetry.TestServer_*` | Lifecycle, /metrics 200, /healthz + /readyz semantics, shutdown <1s | +| `internal/selftelemetry.TestReceiver_*` | Five-method impl emits the expected Prometheus metric families | +| `internal/componentstatus.TestReportStatus_*` | Free fn delegates / silently discards | +| `cmd/tracecore.TestIntegration_TelemetrySurface_EndToEnd` | Real binary, real config, real scrape — four clockreceiver self-metrics visible (emissions, latency, activity, degraded); errors-path covered by `components/receivers/clockreceiver.TestIntegration_ErrorsTotal_SurfacesOnDownstreamFailure` | +| `cmd/tracecore.TestIntegration_TelemetryDisabled_NoListener` | Default-off contract: no port bound | diff --git a/go.mod b/go.mod index 870a616e..04e5b25a 100644 --- a/go.mod +++ b/go.mod @@ -1,11 +1,16 @@ module github.com/tracecoreai/tracecore -go 1.26.2 +go 1.26.3 require ( github.com/alecthomas/kingpin/v2 v2.4.0 + github.com/prometheus/client_golang v1.23.2 github.com/stretchr/testify v1.11.1 go.opentelemetry.io/collector/pdata v1.58.0 + go.opentelemetry.io/otel v1.43.0 + go.opentelemetry.io/otel/exporters/prometheus v0.65.0 + go.opentelemetry.io/otel/metric v1.43.0 + go.opentelemetry.io/otel/sdk/metric v1.43.0 go.uber.org/goleak v1.3.0 gopkg.in/yaml.v3 v3.0.1 ) @@ -25,7 +30,7 @@ require ( github.com/OpenPeeDeeP/depguard/v2 v2.2.1 // indirect github.com/alecthomas/chroma/v2 v2.17.2 // indirect github.com/alecthomas/go-check-sumtype v0.3.1 // indirect - github.com/alecthomas/units v0.0.0-20211218093645-b94a6e3cc137 // indirect + github.com/alecthomas/units v0.0.0-20240927000941-0f3dac36c52b // indirect github.com/alexkohler/nakedret/v2 v2.0.6 // indirect github.com/alexkohler/prealloc v1.0.0 // indirect github.com/alingse/asasalint v0.0.11 // indirect @@ -67,6 +72,8 @@ require ( github.com/fzipp/gocyclo v0.6.0 // indirect github.com/ghostiam/protogetter v0.3.15 // indirect github.com/go-critic/go-critic v0.13.0 // indirect + github.com/go-logr/logr v1.4.3 // indirect + github.com/go-logr/stdr v1.2.2 // indirect github.com/go-toolsmith/astcast v1.1.0 // indirect github.com/go-toolsmith/astcopy v1.1.0 // indirect github.com/go-toolsmith/astequal v1.2.0 // indirect @@ -78,7 +85,6 @@ require ( github.com/go-xmlfmt/xmlfmt v1.1.3 // indirect github.com/gobwas/glob v0.2.3 // indirect github.com/gofrs/flock v0.12.1 // indirect - github.com/golang/protobuf v1.5.3 // indirect github.com/golangci/dupl v0.0.0-20250308024227-f665c8d69b32 // indirect github.com/golangci/go-printf-func-name v0.1.0 // indirect github.com/golangci/gofmt v0.0.0-20250106114630-d62b90e6713d // indirect @@ -90,6 +96,7 @@ require ( github.com/golangci/unconvert v0.0.0-20250410112200-a129a6e6413e // indirect github.com/google/addlicense v1.2.0 // indirect github.com/google/go-cmp v0.7.0 // indirect + github.com/google/uuid v1.6.0 // indirect github.com/gordonklaus/ineffassign v0.1.0 // indirect github.com/gostaticanalysis/analysisutil v0.7.1 // indirect github.com/gostaticanalysis/comment v1.5.0 // indirect @@ -128,7 +135,6 @@ require ( github.com/mattn/go-colorable v0.1.14 // indirect github.com/mattn/go-isatty v0.0.20 // indirect github.com/mattn/go-runewidth v0.0.16 // indirect - github.com/matttproud/golang_protobuf_extensions v1.0.1 // indirect github.com/mgechev/revive v1.9.0 // indirect github.com/mitchellh/go-homedir v1.1.0 // indirect github.com/mitchellh/mapstructure v1.5.0 // indirect @@ -136,6 +142,7 @@ require ( github.com/modern-go/reflect2 v1.0.3-0.20250322232337-35a7c28c31ee // indirect github.com/moricho/tparallel v0.3.2 // indirect github.com/muesli/termenv v0.16.0 // indirect + github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 // indirect github.com/nakabonne/nestif v0.3.1 // indirect github.com/nishanths/exhaustive v0.12.0 // indirect github.com/nishanths/predeclared v0.2.2 // indirect @@ -145,10 +152,10 @@ require ( github.com/pelletier/go-toml/v2 v2.2.4 // indirect github.com/pmezard/go-difflib v1.0.0 // indirect github.com/polyfloyd/go-errorlint v1.8.0 // indirect - github.com/prometheus/client_golang v1.12.1 // indirect - github.com/prometheus/client_model v0.2.0 // indirect - github.com/prometheus/common v0.32.1 // indirect - github.com/prometheus/procfs v0.7.3 // indirect + github.com/prometheus/client_model v0.6.2 // indirect + github.com/prometheus/common v0.67.5 // indirect + github.com/prometheus/otlptranslator v1.0.0 // indirect + github.com/prometheus/procfs v0.20.1 // indirect github.com/quasilyte/go-ruleguard v0.4.4 // indirect github.com/quasilyte/go-ruleguard/dsl v0.3.22 // indirect github.com/quasilyte/gogrep v0.5.0 // indirect @@ -198,11 +205,15 @@ require ( go-simpler.org/musttag v0.13.1 // indirect go-simpler.org/sloglint v0.11.0 // indirect go.augendre.info/fatcontext v0.8.0 // indirect + go.opentelemetry.io/auto/sdk v1.2.1 // indirect go.opentelemetry.io/collector/featuregate v1.58.0 // indirect + go.opentelemetry.io/otel/sdk v1.43.0 // indirect + go.opentelemetry.io/otel/trace v1.43.0 // indirect go.uber.org/atomic v1.7.0 // indirect go.uber.org/automaxprocs v1.6.0 // indirect go.uber.org/multierr v1.11.0 // indirect go.uber.org/zap v1.24.0 // indirect + go.yaml.in/yaml/v2 v2.4.4 // indirect golang.org/x/exp/typeparams v0.0.0-20250210185358-939b2ce775ac // indirect golang.org/x/mod v0.32.0 // indirect golang.org/x/sync v0.19.0 // indirect diff --git a/go.sum b/go.sum index e4b1d459..d24059e3 100644 --- a/go.sum +++ b/go.sum @@ -2,39 +2,6 @@ 4d63.com/gocheckcompilerdirectives v1.3.0/go.mod h1:ofsJ4zx2QAuIP/NO/NAh1ig6R1Fb18/GI7RVMwz7kAY= 4d63.com/gochecknoglobals v0.2.2 h1:H1vdnwnMaZdQW/N+NrkT1SZMTBmcwHe9Vq8lJcYYTtU= 4d63.com/gochecknoglobals v0.2.2/go.mod h1:lLxwTQjL5eIesRbvnzIP3jZtG140FnTdz+AlMa+ogt0= -cloud.google.com/go v0.26.0/go.mod h1:aQUYkXzVsufM+DwF1aE+0xfcU+56JwCaLick0ClmMTw= -cloud.google.com/go v0.34.0/go.mod h1:aQUYkXzVsufM+DwF1aE+0xfcU+56JwCaLick0ClmMTw= -cloud.google.com/go v0.38.0/go.mod h1:990N+gfupTy94rShfmMCWGDn0LpTmnzTp2qbd1dvSRU= -cloud.google.com/go v0.44.1/go.mod h1:iSa0KzasP4Uvy3f1mN/7PiObzGgflwredwwASm/v6AU= -cloud.google.com/go v0.44.2/go.mod h1:60680Gw3Yr4ikxnPRS/oxxkBccT6SA1yMk63TGekxKY= -cloud.google.com/go v0.45.1/go.mod h1:RpBamKRgapWJb87xiFSdk4g1CME7QZg3uwTez+TSTjc= -cloud.google.com/go v0.46.3/go.mod h1:a6bKKbmY7er1mI7TEI4lsAkts/mkhTSZK8w33B4RAg0= -cloud.google.com/go v0.50.0/go.mod h1:r9sluTvynVuxRIOHXQEHMFffphuXHOMZMycpNR5e6To= -cloud.google.com/go v0.52.0/go.mod h1:pXajvRH/6o3+F9jDHZWQ5PbGhn+o8w9qiu/CffaVdO4= -cloud.google.com/go v0.53.0/go.mod h1:fp/UouUEsRkN6ryDKNW/Upv/JBKnv6WDthjR6+vze6M= -cloud.google.com/go v0.54.0/go.mod h1:1rq2OEkV3YMf6n/9ZvGWI3GWw0VoqH/1x2nd8Is/bPc= -cloud.google.com/go v0.56.0/go.mod h1:jr7tqZxxKOVYizybht9+26Z/gUq7tiRzu+ACVAMbKVk= -cloud.google.com/go v0.57.0/go.mod h1:oXiQ6Rzq3RAkkY7N6t3TcE6jE+CIBBbA36lwQ1JyzZs= -cloud.google.com/go v0.62.0/go.mod h1:jmCYTdRCQuc1PHIIJ/maLInMho30T/Y0M4hTdTShOYc= -cloud.google.com/go v0.65.0/go.mod h1:O5N8zS7uWy9vkA9vayVHs65eM1ubvY4h553ofrNHObY= -cloud.google.com/go/bigquery v1.0.1/go.mod h1:i/xbL2UlR5RvWAURpBYZTtm/cXjCha9lbfbpx4poX+o= -cloud.google.com/go/bigquery v1.3.0/go.mod h1:PjpwJnslEMmckchkHFfq+HTD2DmtT67aNFKH1/VBDHE= -cloud.google.com/go/bigquery v1.4.0/go.mod h1:S8dzgnTigyfTmLBfrtrhyYhwRxG72rYxvftPBK2Dvzc= -cloud.google.com/go/bigquery v1.5.0/go.mod h1:snEHRnqQbz117VIFhE8bmtwIDY80NLUZUMb4Nv6dBIg= -cloud.google.com/go/bigquery v1.7.0/go.mod h1://okPTzCYNXSlb24MZs83e2Do+h+VXtc4gLoIoXIAPc= -cloud.google.com/go/bigquery v1.8.0/go.mod h1:J5hqkt3O0uAFnINi6JXValWIb1v0goeZM77hZzJN/fQ= -cloud.google.com/go/datastore v1.0.0/go.mod h1:LXYbyblFSglQ5pkeyhO+Qmw7ukd3C+pD7TKLgZqpHYE= -cloud.google.com/go/datastore v1.1.0/go.mod h1:umbIZjpQpHh4hmRpGhH4tLFup+FVzqBi1b3c64qFpCk= -cloud.google.com/go/pubsub v1.0.1/go.mod h1:R0Gpsv3s54REJCy4fxDixWD93lHJMoZTyQ2kNxGRt3I= -cloud.google.com/go/pubsub v1.1.0/go.mod h1:EwwdRX2sKPjnvnqCa270oGRyludottCI76h+R3AArQw= -cloud.google.com/go/pubsub v1.2.0/go.mod h1:jhfEVHT8odbXTkndysNHCcx0awwzvfOlguIAii9o8iA= -cloud.google.com/go/pubsub v1.3.1/go.mod h1:i+ucay31+CNRpDW4Lu78I4xXG+O1r/MAHgjpRVR+TSU= -cloud.google.com/go/storage v1.0.0/go.mod h1:IhtSnM/ZTZV8YYJWCY8RULGVqBDmpoyjwiyrjsg+URw= -cloud.google.com/go/storage v1.5.0/go.mod h1:tpKbwo567HUNpVclU5sGELwQWBDZ8gh0ZeosJ0Rtdos= -cloud.google.com/go/storage v1.6.0/go.mod h1:N7U0C8pVQ/+NIKOBQyamJIeKQKkZ+mxpohlUTyfDhBk= -cloud.google.com/go/storage v1.8.0/go.mod h1:Wv1Oy7z6Yz3DshWRJFhqM/UCfaWIRTdp0RXyy7KQOVs= -cloud.google.com/go/storage v1.10.0/go.mod h1:FLPqc6j+Ki4BU591ie1oL6qBQGu2Bl/tZ9ullr3+Kg0= -dmitri.shuralyov.com/gpu/mtl v0.0.0-20190408044501-666a987793e9/go.mod h1:H6x//7gZCb22OMCxBHrMx7a5I7Hp++hsVxbQ4BYO7hU= github.com/4meepo/tagalign v1.4.2 h1:0hcLHPGMjDyM1gHG58cS73aQF8J4TdVR96TZViorO9E= github.com/4meepo/tagalign v1.4.2/go.mod h1:+p4aMyFM+ra7nb41CnFG6aSDXqRxU/w1VQqScKqDARI= github.com/Abirdcfly/dupword v0.1.3 h1:9Pa1NuAsZvpFPi9Pqkd93I7LIYRURj+A//dFd5tgBeE= @@ -45,10 +12,8 @@ github.com/Antonboom/nilnil v1.1.0 h1:jGxJxjgYS3VUUtOTNk8Z1icwT5ESpLH/426fjmQG+n github.com/Antonboom/nilnil v1.1.0/go.mod h1:b7sAlogQjFa1wV8jUW3o4PMzDVFLbTux+xnQdvzdcIE= github.com/Antonboom/testifylint v1.6.1 h1:6ZSytkFWatT8mwZlmRCHkWz1gPi+q6UBSbieji2Gj/o= github.com/Antonboom/testifylint v1.6.1/go.mod h1:k+nEkathI2NFjKO6HvwmSrbzUcQ6FAnbZV+ZRrnXPLI= -github.com/BurntSushi/toml v0.3.1/go.mod h1:xHWCNGjB5oqiDr8zfno3MHue2Ht5sIBksp03qcyfWMU= github.com/BurntSushi/toml v1.5.0 h1:W5quZX/G/csjUnuI8SUYlsHs9M38FC7znL0lIO+DvMg= github.com/BurntSushi/toml v1.5.0/go.mod h1:ukJfTF/6rtPPRCnwkur4qwRxa8vTRFBF0uk2lLoLwho= -github.com/BurntSushi/xgb v0.0.0-20160522181843-27f122750802/go.mod h1:IVnqGOEym/WlBOVXweHU+Q+/VP0lqqI8lqeDx9IjBqo= github.com/Djarvur/go-err113 v0.0.0-20210108212216-aea10b59be24 h1:sHglBQTwgx+rWPdisA5ynNEsoARbiCBOyGcJM4/OzsM= github.com/Djarvur/go-err113 v0.0.0-20210108212216-aea10b59be24/go.mod h1:4UJr5HIiMZrwgkSPdsjy2uOQExX/WEILpIrO9UPGuXs= github.com/GaijinEntertainment/go-exhaustruct/v3 v3.3.1 h1:Sz1JIXEcSfhz7fUi7xHnhpIE0thVASYjvosApmHuD2k= @@ -67,13 +32,8 @@ github.com/alecthomas/kingpin/v2 v2.4.0 h1:f48lwail6p8zpO1bC4TxtqACaGqHYA22qkHjH github.com/alecthomas/kingpin/v2 v2.4.0/go.mod h1:0gyi0zQnjuFk8xrkNKamJoyUo382HRL7ATRpFZCw6tE= github.com/alecthomas/repr v0.4.0 h1:GhI2A8MACjfegCPVq9f1FLvIBS+DrQ2KQBFZP1iFzXc= github.com/alecthomas/repr v0.4.0/go.mod h1:Fr0507jx4eOXV7AlPV6AVZLYrLIuIeSOWtW57eE/O/4= -github.com/alecthomas/template v0.0.0-20160405071501-a0175ee3bccc/go.mod h1:LOuyumcjzFXgccqObfd/Ljyb9UuFJ6TxHnclSeseNhc= -github.com/alecthomas/template v0.0.0-20190718012654-fb15b899a751/go.mod h1:LOuyumcjzFXgccqObfd/Ljyb9UuFJ6TxHnclSeseNhc= -github.com/alecthomas/units v0.0.0-20151022065526-2efee857e7cf/go.mod h1:ybxpYRFXyAe+OPACYpWeL0wqObRcbAqCMya13uyzqw0= -github.com/alecthomas/units v0.0.0-20190717042225-c3de453c63f4/go.mod h1:ybxpYRFXyAe+OPACYpWeL0wqObRcbAqCMya13uyzqw0= -github.com/alecthomas/units v0.0.0-20190924025748-f65c72e2690d/go.mod h1:rBZYJk541a8SKzHPHnH3zbiI+7dagKZ0cgpgrD7Fyho= -github.com/alecthomas/units v0.0.0-20211218093645-b94a6e3cc137 h1:s6gZFSlWYmbqAuRjVTiNNhvNRfY2Wxp9nhfyel4rklc= -github.com/alecthomas/units v0.0.0-20211218093645-b94a6e3cc137/go.mod h1:OMCwj8VM1Kc9e19TLln2VL61YJF0x1XFtfdL4JdbSyE= +github.com/alecthomas/units v0.0.0-20240927000941-0f3dac36c52b h1:mimo19zliBX/vSQ6PWWSL9lK8qwHozUj03+zLoEB8O0= +github.com/alecthomas/units v0.0.0-20240927000941-0f3dac36c52b/go.mod h1:fvzegU4vN3H1qMT+8wDmzjAcDONcgo2/SZ/TyfdUOFs= github.com/alexkohler/nakedret/v2 v2.0.6 h1:ME3Qef1/KIKr3kWX3nti3hhgNxw6aqN5pZmQiFSsuzQ= github.com/alexkohler/nakedret/v2 v2.0.6/go.mod h1:l3RKju/IzOMQHmsEvXwkqMDzHHvurNQfAgE1eVmT40Q= github.com/alexkohler/prealloc v1.0.0 h1:Hbq0/3fJPQhNkN0dR95AVrr6R7tou91y0uHG5pOcUuw= @@ -90,8 +50,6 @@ github.com/aymanbagabas/go-osc52/v2 v2.0.1 h1:HwpRHbFMcZLEVr42D4p7XBqjyuxQH5SMiE github.com/aymanbagabas/go-osc52/v2 v2.0.1/go.mod h1:uYgXzlJ7ZpABp8OJ+exZzJJhRNQ2ASbcXHWsFqH8hp8= github.com/benbjohnson/clock v1.1.0 h1:Q92kusRqC1XV2MjkWETPvjJVqKetz1OzxZB7mHJLju8= github.com/benbjohnson/clock v1.1.0/go.mod h1:J11/hYXuz8f4ySSvYwY0FKfm+ezbsZBKZxNJlLklBHA= -github.com/beorn7/perks v0.0.0-20180321164747-3a771d992973/go.mod h1:Dwedo/Wpr24TaqPxmxbtue+5NUziq4I4S80YR8gNf3Q= -github.com/beorn7/perks v1.0.0/go.mod h1:KWe93zE9D1o94FZ5RNwFwVgaQK1VOXiVxmqh+CedLV8= github.com/beorn7/perks v1.0.1 h1:VlbKKnNfV8bJzeqoa4cOKqO6bYr3WgKZxO8Z16+hsOM= github.com/beorn7/perks v1.0.1/go.mod h1:G2ZrVWU2WbWT9wwq4/hrbKbnv/1ERSJQ0ibhJ6rlkpw= github.com/bkielbasa/cyclop v1.2.3 h1:faIVMIGDIANuGPWH031CZJTi2ymOQBULs9H21HSMa5w= @@ -114,9 +72,6 @@ github.com/catenacyber/perfsprint v0.9.1 h1:5LlTp4RwTooQjJCvGEFV6XksZvWE7wCOUvjD github.com/catenacyber/perfsprint v0.9.1/go.mod h1:q//VWC2fWbcdSLEY1R3l8n0zQCDPdE4IjZwyY1HMunM= github.com/ccojocar/zxcvbn-go v1.0.2 h1:na/czXU8RrhXO4EZme6eQJLR4PzcGsahsBOAwU6I3Vg= github.com/ccojocar/zxcvbn-go v1.0.2/go.mod h1:g1qkXtUSvHP8lhHp5GrSmTz6uWALGRMQdw6Qnz/hi60= -github.com/census-instrumentation/opencensus-proto v0.2.1/go.mod h1:f6KPmirojxKA12rnyqOA5BBL4O983OfeGPqjHWSTneU= -github.com/cespare/xxhash/v2 v2.1.1/go.mod h1:VGX0DQ3Q6kWi7AoAeZDth3/j3BFtOZR5XLFGgcrjCOs= -github.com/cespare/xxhash/v2 v2.1.2/go.mod h1:VGX0DQ3Q6kWi7AoAeZDth3/j3BFtOZR5XLFGgcrjCOs= github.com/cespare/xxhash/v2 v2.3.0 h1:UL815xU9SqsFlibzuggzjXhog7bL6oX9BbNZnL2UFvs= github.com/cespare/xxhash/v2 v2.3.0/go.mod h1:VGX0DQ3Q6kWi7AoAeZDth3/j3BFtOZR5XLFGgcrjCOs= github.com/charithe/durationcheck v0.0.10 h1:wgw73BiocdBDQPik+zcEoBG/ob8uyBHf2iyoHGPf5w4= @@ -133,13 +88,8 @@ github.com/charmbracelet/x/term v0.2.1 h1:AQeHeLZ1OqSXhrAWpYUtZyX1T3zVxfpZuEQMIQ github.com/charmbracelet/x/term v0.2.1/go.mod h1:oQ4enTYFV7QN4m0i9mzHrViD7TQKvNEEkHUMCmsxdUg= github.com/chavacava/garif v0.1.0 h1:2JHa3hbYf5D9dsgseMKAmc/MZ109otzgNFk5s87H9Pc= github.com/chavacava/garif v0.1.0/go.mod h1:XMyYCkEL58DF0oyW4qDjjnPWONs2HBqYKI+UIPD+Gww= -github.com/chzyer/logex v1.1.10/go.mod h1:+Ywpsq7O8HXn0nuIou7OrIPyXbp3wmkHB+jjWRnGsAI= -github.com/chzyer/readline v0.0.0-20180603132655-2972be24d48e/go.mod h1:nSuG5e5PlCu98SY8svDHJxuZscDgtXS6KTTbou5AhLI= -github.com/chzyer/test v0.0.0-20180213035817-a1ea475d72b1/go.mod h1:Q3SI9o4m/ZMnBNeIyt5eFwwo7qiLfzFZmjNmxjkiQlU= github.com/ckaznocha/intrange v0.3.1 h1:j1onQyXvHUsPWujDH6WIjhyH26gkRt/txNlV7LspvJs= github.com/ckaznocha/intrange v0.3.1/go.mod h1:QVepyz1AkUoFQkpEqksSYpNpUo3c5W7nWh/s6SHIJJk= -github.com/client9/misspell v0.3.4/go.mod h1:qj6jICC3Q7zFZvVWo7KLAzC3yx5G7kyvSDkc90ppPyw= -github.com/cncf/udpa/go v0.0.0-20191209042840-269d4d468f6f/go.mod h1:M8M6+tZqaGXZJjfX53e64911xZQV5JYwmTeXPW+k8Sc= github.com/cpuguy83/go-md2man/v2 v2.0.6/go.mod h1:oOW0eioCTA6cOiMLiUPZOpcVxMig6NIQQ7OS05n1F4g= github.com/curioswitch/go-reassign v0.3.0 h1:dh3kpQHuADL3cobV/sSGETA8DOv457dwl+fbBAhrQPs= github.com/curioswitch/go-reassign v0.3.0/go.mod h1:nApPCCTtqLJN/s8HfItCcKV0jIPwluBOvZP+dsJGA88= @@ -156,10 +106,6 @@ github.com/denis-tingaikin/go-header v0.5.0 h1:SRdnP5ZKvcO9KKRP1KJrhFR3RrlGuD+42 github.com/denis-tingaikin/go-header v0.5.0/go.mod h1:mMenU5bWrok6Wl2UsZjy+1okegmwQ3UgWl4V1D8gjlY= github.com/dlclark/regexp2 v1.11.5 h1:Q/sSnsKerHeCkc/jSTNq1oCm7KiVgUMZRDUoRu0JQZQ= github.com/dlclark/regexp2 v1.11.5/go.mod h1:DHkYz0B9wPfa6wondMfaivmHpzrQ3v9q8cnmRbL6yW8= -github.com/envoyproxy/go-control-plane v0.9.0/go.mod h1:YTl/9mNaCwkRvm6d1a2C3ymFceY/DCBVvsKhRF0iEA4= -github.com/envoyproxy/go-control-plane v0.9.1-0.20191026205805-5f8ba28d4473/go.mod h1:YTl/9mNaCwkRvm6d1a2C3ymFceY/DCBVvsKhRF0iEA4= -github.com/envoyproxy/go-control-plane v0.9.4/go.mod h1:6rpuAdCZL397s3pYoYcLgu1mIlRU8Am5FuJP05cCM98= -github.com/envoyproxy/protoc-gen-validate v0.1.0/go.mod h1:iSmxcyjqTsJpI2R4NaDN7+kN2VEUnK/pcBlmesArF7c= github.com/ettle/strcase v0.2.0 h1:fGNiVF21fHXpX1niBgk0aROov1LagYsOwV/xqKDKR/Q= github.com/ettle/strcase v0.2.0/go.mod h1:DajmHElDSaX76ITe3/VHVyMin4LWSJN5Z909Wp+ED1A= github.com/fatih/color v1.18.0 h1:S8gINlzdQ840/4pfAwic/ZE0djQEH3wM94VfqLTZcOM= @@ -178,20 +124,13 @@ github.com/ghostiam/protogetter v0.3.15 h1:1KF5sXel0HE48zh1/vn0Loiw25A9ApyseLzQu github.com/ghostiam/protogetter v0.3.15/go.mod h1:WZ0nw9pfzsgxuRsPOFQomgDVSWtDLJRfQJEhsGbmQMA= github.com/go-critic/go-critic v0.13.0 h1:kJzM7wzltQasSUXtYyTl6UaPVySO6GkaR1thFnJ6afY= github.com/go-critic/go-critic v0.13.0/go.mod h1:M/YeuJ3vOCQDnP2SU+ZhjgRzwzcBW87JqLpMJLrZDLI= -github.com/go-gl/glfw v0.0.0-20190409004039-e6da0acd62b1/go.mod h1:vR7hzQXu2zJy9AVAgeJqvqgH9Q5CA+iKCZ2gyEVpxRU= -github.com/go-gl/glfw/v3.3/glfw v0.0.0-20191125211704-12ad95a8df72/go.mod h1:tQ2UAYgL5IevRw8kRxooKSPJfGvJ9fJQFa0TUsXzTg8= -github.com/go-gl/glfw/v3.3/glfw v0.0.0-20200222043503-6f7a984d4dc4/go.mod h1:tQ2UAYgL5IevRw8kRxooKSPJfGvJ9fJQFa0TUsXzTg8= -github.com/go-kit/kit v0.8.0/go.mod h1:xBxKIO96dXMWWy0MnWVtmwkA9/13aqxPnvrjFYMA2as= -github.com/go-kit/kit v0.9.0/go.mod h1:xBxKIO96dXMWWy0MnWVtmwkA9/13aqxPnvrjFYMA2as= -github.com/go-kit/log v0.1.0/go.mod h1:zbhenjAZHb184qTLMA9ZjW7ThYL0H2mk7Q6pNt4vbaY= -github.com/go-logfmt/logfmt v0.3.0/go.mod h1:Qt1PoO58o5twSAckw1HlFXLmHsOX5/0LbT9GBnD5lWE= -github.com/go-logfmt/logfmt v0.4.0/go.mod h1:3RMwSq7FuexP4Kalkev3ejPJsZTpXXBr9+V4qmtdjCk= -github.com/go-logfmt/logfmt v0.5.0/go.mod h1:wCYkCAKZfumFQihp8CzCvQ3paCTfi41vtzG1KdI/P7A= -github.com/go-logr/logr v1.4.2 h1:6pFjapn8bFcIbiKo3XT4j/BhANplGihG6tvd+8rYgrY= -github.com/go-logr/logr v1.4.2/go.mod h1:9T104GzyrTigFIr8wt5mBrctHMim0Nb2HLGrmQ40KvY= +github.com/go-logr/logr v1.2.2/go.mod h1:jdQByPbusPIv2/zmleS9BjJVeZ6kBagPoEUsqbVz/1A= +github.com/go-logr/logr v1.4.3 h1:CjnDlHq8ikf6E492q6eKboGOC0T8CDaOvkHCIg8idEI= +github.com/go-logr/logr v1.4.3/go.mod h1:9T104GzyrTigFIr8wt5mBrctHMim0Nb2HLGrmQ40KvY= +github.com/go-logr/stdr v1.2.2 h1:hSWxHoqTgW2S2qGc0LTAI563KZ5YKYRhT3MFKZMbjag= +github.com/go-logr/stdr v1.2.2/go.mod h1:mMo/vtBO5dYbehREoey6XUKy/eSumjCCveDpRre4VKE= github.com/go-quicktest/qt v1.101.0 h1:O1K29Txy5P2OK0dGo59b7b0LR6wKfIhttaAhHUyn7eI= github.com/go-quicktest/qt v1.101.0/go.mod h1:14Bz/f7NwaXPtdYEgzsx46kqSxVwTbzVZsDC26tQJow= -github.com/go-stack/stack v1.8.0/go.mod h1:v0f6uXyyMGvRgIKkXu+yp6POWl0qKG85gN/melR3HDY= github.com/go-task/slim-sprig/v3 v3.0.0 h1:sUs3vkvUymDpBKi3qH1YSqBQk9+9D/8M2mN1vB6EwHI= github.com/go-task/slim-sprig/v3 v3.0.0/go.mod h1:W848ghGpv3Qj3dhTPRyJypKRiqCdHZiAzKg9hl15HA8= github.com/go-toolsmith/astcast v1.1.0 h1:+JN9xZV1A+Re+95pgnMgDboWNVnIMMQXwfBwLRPgSC8= @@ -221,36 +160,6 @@ github.com/gobwas/glob v0.2.3 h1:A4xDbljILXROh+kObIiy5kIaPYD8e96x1tgBhUI5J+Y= github.com/gobwas/glob v0.2.3/go.mod h1:d3Ez4x06l9bZtSvzIay5+Yzi0fmZzPgnTbPcKjJAkT8= github.com/gofrs/flock v0.12.1 h1:MTLVXXHf8ekldpJk3AKicLij9MdwOWkZ+a/jHHZby9E= github.com/gofrs/flock v0.12.1/go.mod h1:9zxTsyu5xtJ9DK+1tFZyibEV7y3uwDxPPfbxeeHCoD0= -github.com/gogo/protobuf v1.1.1/go.mod h1:r8qH/GZQm5c6nD/R0oafs1akxWv10x8SbQlK7atdtwQ= -github.com/golang/glog v0.0.0-20160126235308-23def4e6c14b/go.mod h1:SBH7ygxi8pfUlaOkMMuAQtPIUF8ecWP5IEl/CR7VP2Q= -github.com/golang/groupcache v0.0.0-20190702054246-869f871628b6/go.mod h1:cIg4eruTrX1D+g88fzRXU5OdNfaM+9IcxsU14FzY7Hc= -github.com/golang/groupcache v0.0.0-20191227052852-215e87163ea7/go.mod h1:cIg4eruTrX1D+g88fzRXU5OdNfaM+9IcxsU14FzY7Hc= -github.com/golang/groupcache v0.0.0-20200121045136-8c9f03a8e57e/go.mod h1:cIg4eruTrX1D+g88fzRXU5OdNfaM+9IcxsU14FzY7Hc= -github.com/golang/mock v1.1.1/go.mod h1:oTYuIxOrZwtPieC+H1uAHpcLFnEyAGVDL/k47Jfbm0A= -github.com/golang/mock v1.2.0/go.mod h1:oTYuIxOrZwtPieC+H1uAHpcLFnEyAGVDL/k47Jfbm0A= -github.com/golang/mock v1.3.1/go.mod h1:sBzyDLLjw3U8JLTeZvSv8jJB+tU5PVekmnlKIyFUx0Y= -github.com/golang/mock v1.4.0/go.mod h1:UOMv5ysSaYNkG+OFQykRIcU/QvvxJf3p21QfJ2Bt3cw= -github.com/golang/mock v1.4.1/go.mod h1:UOMv5ysSaYNkG+OFQykRIcU/QvvxJf3p21QfJ2Bt3cw= -github.com/golang/mock v1.4.3/go.mod h1:UOMv5ysSaYNkG+OFQykRIcU/QvvxJf3p21QfJ2Bt3cw= -github.com/golang/mock v1.4.4/go.mod h1:l3mdAwkq5BuhzHwde/uurv3sEJeZMXNpwsxVWU71h+4= -github.com/golang/protobuf v1.2.0/go.mod h1:6lQm79b+lXiMfvg/cZm0SGofjICqVBUtrP5yJMmIC1U= -github.com/golang/protobuf v1.3.1/go.mod h1:6lQm79b+lXiMfvg/cZm0SGofjICqVBUtrP5yJMmIC1U= -github.com/golang/protobuf v1.3.2/go.mod h1:6lQm79b+lXiMfvg/cZm0SGofjICqVBUtrP5yJMmIC1U= -github.com/golang/protobuf v1.3.3/go.mod h1:vzj43D7+SQXF/4pzW/hwtAqwc6iTitCiVSaWz5lYuqw= -github.com/golang/protobuf v1.3.4/go.mod h1:vzj43D7+SQXF/4pzW/hwtAqwc6iTitCiVSaWz5lYuqw= -github.com/golang/protobuf v1.3.5/go.mod h1:6O5/vntMXwX2lRkT1hjjk0nAC1IDOTvTlVgjlRvqsdk= -github.com/golang/protobuf v1.4.0-rc.1/go.mod h1:ceaxUfeHdC40wWswd/P6IGgMaK3YpKi5j83Wpe3EHw8= -github.com/golang/protobuf v1.4.0-rc.1.0.20200221234624-67d41d38c208/go.mod h1:xKAWHe0F5eneWXFV3EuXVDTCmh+JuBKY0li0aMyXATA= -github.com/golang/protobuf v1.4.0-rc.2/go.mod h1:LlEzMj4AhA7rCAGe4KMBDvJI+AwstrUpVNzEA03Pprs= -github.com/golang/protobuf v1.4.0-rc.4.0.20200313231945-b860323f09d0/go.mod h1:WU3c8KckQ9AFe+yFwt9sWVRKCVIyN9cPHBJSNnbL67w= -github.com/golang/protobuf v1.4.0/go.mod h1:jodUvKwWbYaEsadDk5Fwe5c77LiNKVO9IDvqG2KuDX0= -github.com/golang/protobuf v1.4.1/go.mod h1:U8fpvMrcmy5pZrNK1lt4xCsGvpyWQ/VVv6QDs8UjoX8= -github.com/golang/protobuf v1.4.2/go.mod h1:oDoupMAO8OvCJWAcko0GGGIgR6R6ocIYbsSw735rRwI= -github.com/golang/protobuf v1.4.3/go.mod h1:oDoupMAO8OvCJWAcko0GGGIgR6R6ocIYbsSw735rRwI= -github.com/golang/protobuf v1.5.0/go.mod h1:FsONVRAS9T7sI+LIUmWTfcYkHO4aIWwzhcaSAoJOfIk= -github.com/golang/protobuf v1.5.2/go.mod h1:XVQd3VNwM+JqD3oG2Ue2ip4fOMUkwXdXDdiuN0vRsmY= -github.com/golang/protobuf v1.5.3 h1:KhyjKVUg7Usr/dYsdSqoFveMYd5ko72D+zANwlG1mmg= -github.com/golang/protobuf v1.5.3/go.mod h1:XVQd3VNwM+JqD3oG2Ue2ip4fOMUkwXdXDdiuN0vRsmY= github.com/golangci/dupl v0.0.0-20250308024227-f665c8d69b32 h1:WUvBfQL6EW/40l6OmeSBYQJNSif4O11+bmWEz+C7FYw= github.com/golangci/dupl v0.0.0-20250308024227-f665c8d69b32/go.mod h1:NUw9Zr2Sy7+HxzdjIULge71wI6yEg1lWQr7Evcu8K0E= github.com/golangci/go-printf-func-name v0.1.0 h1:dVokQP+NMTO7jwO4bwsRwLWeudOVUPPyAKJuzv8pEJU= @@ -271,40 +180,22 @@ github.com/golangci/unconvert v0.0.0-20250410112200-a129a6e6413e h1:gD6P7NEo7Eqt github.com/golangci/unconvert v0.0.0-20250410112200-a129a6e6413e/go.mod h1:h+wZwLjUTJnm/P2rwlbJdRPZXOzaT36/FwnPnY2inzc= github.com/google/addlicense v1.2.0 h1:W+DP4A639JGkcwBGMDvjSurZHvaq2FN0pP7se9czsKA= github.com/google/addlicense v1.2.0/go.mod h1:Sm/DHu7Jk+T5miFHHehdIjbi4M5+dJDRS3Cq0rncIxA= -github.com/google/btree v0.0.0-20180813153112-4030bb1f1f0c/go.mod h1:lNA+9X1NB3Zf8V7Ke586lFgjr2dZNuvo3lPJSGZ5JPQ= -github.com/google/btree v1.0.0/go.mod h1:lNA+9X1NB3Zf8V7Ke586lFgjr2dZNuvo3lPJSGZ5JPQ= github.com/google/go-cmdtest v0.4.1-0.20220921163831-55ab3332a786 h1:rcv+Ippz6RAtvaGgKxc+8FQIpxHgsF+HBzPyYL2cyVU= github.com/google/go-cmdtest v0.4.1-0.20220921163831-55ab3332a786/go.mod h1:apVn/GCasLZUVpAJ6oWAuyP7Ne7CEsQbTnc0plM3m+o= -github.com/google/go-cmp v0.2.0/go.mod h1:oXzfMopK8JAjlY9xF4vHSVASa0yLyX7SntLO5aqRK0M= -github.com/google/go-cmp v0.3.0/go.mod h1:8QqcDgzrUqlUb/G2PQTWiueGozuR1884gddMywk6iLU= -github.com/google/go-cmp v0.3.1/go.mod h1:8QqcDgzrUqlUb/G2PQTWiueGozuR1884gddMywk6iLU= -github.com/google/go-cmp v0.4.0/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE= -github.com/google/go-cmp v0.4.1/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE= -github.com/google/go-cmp v0.5.0/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE= github.com/google/go-cmp v0.5.1/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE= github.com/google/go-cmp v0.5.2/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE= github.com/google/go-cmp v0.5.4/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE= -github.com/google/go-cmp v0.5.5/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE= github.com/google/go-cmp v0.5.6/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE= github.com/google/go-cmp v0.5.8/go.mod h1:17dUlkBOakJ0+DkrSSNjCkIjxS6bF9zb3elmeNGIjoY= github.com/google/go-cmp v0.7.0 h1:wk8382ETsv4JYUZwIsn6YpYiWiBsYLSJiTsyBybVuN8= github.com/google/go-cmp v0.7.0/go.mod h1:pXiqmnSA92OHEEa9HXL2W4E7lf9JzCmGVUdgjX3N/iU= github.com/google/gofuzz v1.0.0/go.mod h1:dBl0BpW6vV/+mYPU4Po3pmUjxk6FQPldtuIdl/M65Eg= -github.com/google/martian v2.1.0+incompatible/go.mod h1:9I4somxYTbIHy5NJKHRl3wXiIaQGbYVAs8BPL6v8lEs= -github.com/google/martian/v3 v3.0.0/go.mod h1:y5Zk1BBys9G+gd6Jrk0W3cC1+ELVxBWuIGO+w/tUAp0= -github.com/google/pprof v0.0.0-20181206194817-3ea8567a2e57/go.mod h1:zfwlbNMJ+OItoe0UupaVj+oy1omPYYDuagoSzA8v9mc= -github.com/google/pprof v0.0.0-20190515194954-54271f7e092f/go.mod h1:zfwlbNMJ+OItoe0UupaVj+oy1omPYYDuagoSzA8v9mc= -github.com/google/pprof v0.0.0-20191218002539-d4f498aebedc/go.mod h1:ZgVRPoUq/hfqzAqh7sHMqb3I9Rq5C59dIz2SbBwJ4eM= -github.com/google/pprof v0.0.0-20200212024743-f11f1df84d12/go.mod h1:ZgVRPoUq/hfqzAqh7sHMqb3I9Rq5C59dIz2SbBwJ4eM= -github.com/google/pprof v0.0.0-20200229191704-1ebb73c60ed3/go.mod h1:ZgVRPoUq/hfqzAqh7sHMqb3I9Rq5C59dIz2SbBwJ4eM= -github.com/google/pprof v0.0.0-20200430221834-fc25d7d30c6d/go.mod h1:ZgVRPoUq/hfqzAqh7sHMqb3I9Rq5C59dIz2SbBwJ4eM= -github.com/google/pprof v0.0.0-20200708004538-1a94d8640e99/go.mod h1:ZgVRPoUq/hfqzAqh7sHMqb3I9Rq5C59dIz2SbBwJ4eM= github.com/google/pprof v0.0.0-20241210010833-40e02aabc2ad h1:a6HEuzUHeKH6hwfN/ZoQgRgVIWFJljSWa/zetS2WTvg= github.com/google/pprof v0.0.0-20241210010833-40e02aabc2ad/go.mod h1:vavhavw2zAxS5dIdcRluK6cSGGPlZynqzFM8NdvU144= github.com/google/renameio v0.1.0 h1:GOZbcHa3HfsPKPlmyPyN2KEohoMXOhdMbHrvbpl2QaA= github.com/google/renameio v0.1.0/go.mod h1:KWCgfxg9yswjAJkECMjeO8J8rahYeXnNhOm40UhjYkI= -github.com/googleapis/gax-go/v2 v2.0.4/go.mod h1:0Wqv26UfaUD9n4G6kQubkQ+KchISgw+vpHVxEJEs9eg= -github.com/googleapis/gax-go/v2 v2.0.5/go.mod h1:DWXyrwAJ9X0FpwwEdw+IPEYBICEFu5mhpdKc/us6bOk= +github.com/google/uuid v1.6.0 h1:NIvaJDMOsjHA8n1jAhLSgzrAzy1Hgr+hNrb57e+94F0= +github.com/google/uuid v1.6.0/go.mod h1:TIyPZe4MgqvfeYDBFedMoGGpEw/LqOeaOT+nhxU+yHo= github.com/gordonklaus/ineffassign v0.1.0 h1:y2Gd/9I7MdY1oEIt+n+rowjBNDcLQq3RsH5hwJd0f9s= github.com/gordonklaus/ineffassign v0.1.0/go.mod h1:Qcp2HIAYhR7mNUVSIxZww3Guk4it82ghYcEXIAk+QT0= github.com/gostaticanalysis/analysisutil v0.7.1 h1:ZMCjoue3DtDWQ5WyU16YbjbQEQ3VuzwxALrpYd+HeKk= @@ -327,15 +218,12 @@ github.com/hashicorp/go-uuid v1.0.3/go.mod h1:6SBZvOh/SIDV7/2o3Jml5SYk/TvGqwFJ/b github.com/hashicorp/go-version v1.2.1/go.mod h1:fltr4n8CU8Ke44wwGCBoEymUuxUHl09ZGVZPK5anwXA= github.com/hashicorp/go-version v1.9.0 h1:CeOIz6k+LoN3qX9Z0tyQrPtiB1DFYRPfCIBtaXPSCnA= github.com/hashicorp/go-version v1.9.0/go.mod h1:fltr4n8CU8Ke44wwGCBoEymUuxUHl09ZGVZPK5anwXA= -github.com/hashicorp/golang-lru v0.5.0/go.mod h1:/m3WP610KZHVQ1SGc6re/UDhFvYD7pJ4Ao+sR/qLZy8= -github.com/hashicorp/golang-lru v0.5.1/go.mod h1:/m3WP610KZHVQ1SGc6re/UDhFvYD7pJ4Ao+sR/qLZy8= github.com/hashicorp/golang-lru/v2 v2.0.7 h1:a+bsQ5rvGLjzHuww6tVxozPZFVghXaHOwFs4luLUK2k= github.com/hashicorp/golang-lru/v2 v2.0.7/go.mod h1:QeFd9opnmA6QUJc5vARoKUSoFhyfM2/ZepoAG6RGpeM= github.com/hashicorp/hcl v1.0.0 h1:0Anlzjpi4vEasTeNFn2mLJgTSwt0+6sfsiTG8qcWGx4= github.com/hashicorp/hcl v1.0.0/go.mod h1:E5yfLk+7swimpb2L/Alb/PJmXilQ/rhwaUYs4T20WEQ= github.com/hexops/gotextdiff v1.0.3 h1:gitA9+qJrrTCsiCl7+kh75nPqQt1cx4ZkudSTLoUqJM= github.com/hexops/gotextdiff v1.0.3/go.mod h1:pSWU5MAI3yDq+fZBTazCSJysOMbxWL1BSow5/V2vxeg= -github.com/ianlancetaylor/demangle v0.0.0-20181102032728-5e5cf60278f6/go.mod h1:aSSvb/t6k1mPoxDqO4vJh6VOCGPwU4O0C2/Eqndh1Sc= github.com/inconshreveable/mousetrap v1.1.0 h1:wN+x4NVGpMsO7ErUn/mUI3vEoE6Jt13X2s0bqwp9tc8= github.com/inconshreveable/mousetrap v1.1.0/go.mod h1:vpF70FUmC8bwa3OWnCshd2FqLfsEA9PFc4w1p2J65bw= github.com/jgautheron/goconst v1.8.1 h1:PPqCYp3K/xlOj5JmIe6O1Mj6r1DbkdbLtR3AJuZo414= @@ -344,39 +232,28 @@ github.com/jingyugao/rowserrcheck v1.1.1 h1:zibz55j/MJtLsjP1OF4bSdgXxwL1b+Vn7Tjz github.com/jingyugao/rowserrcheck v1.1.1/go.mod h1:4yvlZSDb3IyDTUZJUmpZfm2Hwok+Dtp+nu2qOq+er9c= github.com/jjti/go-spancheck v0.6.4 h1:Tl7gQpYf4/TMU7AT84MN83/6PutY21Nb9fuQjFTpRRc= github.com/jjti/go-spancheck v0.6.4/go.mod h1:yAEYdKJ2lRkDA8g7X+oKUHXOWVAXSBJRv04OhF+QUjk= -github.com/jpillora/backoff v1.0.0/go.mod h1:J/6gKK9jxlEcS3zixgDgUAsiuZ7yrSoa/FX5e0EB2j4= -github.com/json-iterator/go v1.1.6/go.mod h1:+SdeFBvtyEkXs7REEP0seUULqWtbJapLOCVDaaPEHmU= -github.com/json-iterator/go v1.1.10/go.mod h1:KdQUCv79m/52Kvf8AW2vK1V8akMuk1QjK/uOdHXbAo4= -github.com/json-iterator/go v1.1.11/go.mod h1:KdQUCv79m/52Kvf8AW2vK1V8akMuk1QjK/uOdHXbAo4= github.com/json-iterator/go v1.1.12 h1:PV8peI4a0ysnczrg+LtxykD8LfKY9ML6u2jnxaEnrnM= github.com/json-iterator/go v1.1.12/go.mod h1:e30LSqwooZae/UwlEbR2852Gd8hjQvJoHmT4TnhNGBo= -github.com/jstemmer/go-junit-report v0.0.0-20190106144839-af01ea7f8024/go.mod h1:6v2b51hI/fHJwM22ozAgKL4VKDeJcHhJFhtBdhmNjmU= -github.com/jstemmer/go-junit-report v0.9.1/go.mod h1:Brl9GWCQeLvo8nXZwPNNblvFj/XSXhF0NWZEnDohbsk= -github.com/julienschmidt/httprouter v1.2.0/go.mod h1:SYymIcj16QtmaHHD7aYtjjsJG7VTCxuUUipMqKk8s4w= -github.com/julienschmidt/httprouter v1.3.0/go.mod h1:JR6WtHb+2LUe8TCKY3cZOxFyyO8IZAc4RVcycCCAKdM= github.com/julz/importas v0.2.0 h1:y+MJN/UdL63QbFJHws9BVC5RpA2iq0kpjrFajTGivjQ= github.com/julz/importas v0.2.0/go.mod h1:pThlt589EnCYtMnmhmRYY/qn9lCf/frPOK+WMx3xiJY= github.com/karamaru-alpha/copyloopvar v1.2.1 h1:wmZaZYIjnJ0b5UoKDjUHrikcV0zuPyyxI4SVplLd2CI= github.com/karamaru-alpha/copyloopvar v1.2.1/go.mod h1:nFmMlFNlClC2BPvNaHMdkirmTJxVCY0lhxBtlfOypMM= github.com/kisielk/errcheck v1.9.0 h1:9xt1zI9EBfcYBvdU1nVrzMzzUPUtPKs9bVSIM3TAb3M= github.com/kisielk/errcheck v1.9.0/go.mod h1:kQxWMMVZgIkDq7U8xtG/n2juOjbLgZtedi0D+/VL/i8= -github.com/kisielk/gotool v1.0.0/go.mod h1:XhKaO+MFFWcvkIS/tQcRk01m1F5IRFswLeQ+oQHNcck= github.com/kkHAIKE/contextcheck v1.1.6 h1:7HIyRcnyzxL9Lz06NGhiKvenXq7Zw6Q0UQu/ttjfJCE= github.com/kkHAIKE/contextcheck v1.1.6/go.mod h1:3dDbMRNBFaq8HFXWC1JyvDSPm43CmE6IuHam8Wr0rkg= -github.com/konsorten/go-windows-terminal-sequences v1.0.1/go.mod h1:T0+1ngSBFLxvqU3pZ+m/2kptfBszLMUkC4ZK/EgS/cQ= -github.com/konsorten/go-windows-terminal-sequences v1.0.3/go.mod h1:T0+1ngSBFLxvqU3pZ+m/2kptfBszLMUkC4ZK/EgS/cQ= -github.com/kr/logfmt v0.0.0-20140226030751-b84e30acd515/go.mod h1:+0opPa2QZZtGFBFZlji/RkVcI2GknAs/DXo4wKdlNEc= -github.com/kr/pretty v0.1.0/go.mod h1:dAy3ld7l9f0ibDNOQOHHMYYIIbhfbHSm3C4ZsoJORNo= +github.com/klauspost/compress v1.18.0 h1:c/Cqfb0r+Yi+JtIEq73FWXVkRonBlf0CRNYc8Zttxdo= +github.com/klauspost/compress v1.18.0/go.mod h1:2Pp+KzxcywXVXMr50+X0Q/Lsb43OQHYWRCY2AiWywWQ= github.com/kr/pretty v0.3.1 h1:flRD4NNwYAUpkphVc1HcthR4KEIFJ65n8Mw5qdRn3LE= github.com/kr/pretty v0.3.1/go.mod h1:hoEshYVHaxMs3cyo3Yncou5ZscifuDolrwPKZanG3xk= -github.com/kr/pty v1.1.1/go.mod h1:pFQYn66WHrOpPYNljwOMqo10TkYh1fy3cYio2l3bCsQ= -github.com/kr/text v0.1.0/go.mod h1:4Jbv+DJW3UT/LiOwJeYQe1efqtUx/iVham/4vfdArNI= github.com/kr/text v0.2.0 h1:5Nx0Ya0ZqY2ygV366QzturHI13Jq95ApcVaJBhpS+AY= github.com/kr/text v0.2.0/go.mod h1:eLer722TekiGuMkidMxC/pM04lWEeraHUUmBw8l2grE= github.com/kulti/thelper v0.6.3 h1:ElhKf+AlItIu+xGnI990no4cE2+XaSu1ULymV2Yulxs= github.com/kulti/thelper v0.6.3/go.mod h1:DsqKShOvP40epevkFrvIwkCMNYxMeTNjdWL4dqWHZ6I= github.com/kunwardeep/paralleltest v1.0.14 h1:wAkMoMeGX/kGfhQBPODT/BL8XhK23ol/nuQ3SwFaUw8= github.com/kunwardeep/paralleltest v1.0.14/go.mod h1:di4moFqtfz3ToSKxhNjhOZL+696QtJGCFe132CbBLGk= +github.com/kylelemons/godebug v1.1.0 h1:RPNrshWIDI6G2gRW9EHilWtl7Z6Sb1BR0xunSBf0SNc= +github.com/kylelemons/godebug v1.1.0/go.mod h1:9/0rRGxNHcop5bhtWyNeEfOS8JIWk580+fNqagV/RAw= github.com/lasiar/canonicalheader v1.1.2 h1:vZ5uqwvDbyJCnMhmFYimgMZnJMjwljN5VGY0VKbMXb4= github.com/lasiar/canonicalheader v1.1.2/go.mod h1:qJCeLFS0G/QlLQ506T+Fk/fWMa2VmBUiEI2cuMK4djI= github.com/ldez/exptostd v0.4.3 h1:Ag1aGiq2epGePuRJhez2mzOpZ8sI9Gimcb4Sb3+pk9Y= @@ -414,8 +291,6 @@ github.com/mattn/go-isatty v0.0.20/go.mod h1:W+V8PltTTMOvKvAeJH7IuucS94S2C6jfK/D github.com/mattn/go-runewidth v0.0.9/go.mod h1:H031xJmbD/WCDINGzjvQ9THkh0rPKHF+m2gUSrubnMI= github.com/mattn/go-runewidth v0.0.16 h1:E5ScNMtiwvlvB5paMFdw9p4kSQzbXFikJ5SQO6TULQc= github.com/mattn/go-runewidth v0.0.16/go.mod h1:Jdepj2loyihRzMpdS35Xk/zdY8IAYHsh153qUoGf23w= -github.com/matttproud/golang_protobuf_extensions v1.0.1 h1:4hp9jkHxhMHkqkrB3Ix0jegS5sx/RkqARlsWZ6pIwiU= -github.com/matttproud/golang_protobuf_extensions v1.0.1/go.mod h1:D8He9yQNgCq6Z5Ld7szi9bcBfOoFv/3dc6xSMkL2PC0= github.com/mgechev/revive v1.9.0 h1:8LaA62XIKrb8lM6VsBSQ92slt/o92z5+hTw3CmrvSrM= github.com/mgechev/revive v1.9.0/go.mod h1:LAPq3+MgOf7GcL5PlWIkHb0PT7XH4NuC2LdWymhb9Mo= github.com/mitchellh/go-homedir v1.1.0 h1:lukF9ziXFxDFPkA1vsr5zpc1XuPDn/wFntq5mG+4E0Y= @@ -425,8 +300,6 @@ github.com/mitchellh/mapstructure v1.5.0/go.mod h1:bFUtVrKA4DC2yAKiSyO/QUcy7e+RR github.com/modern-go/concurrent v0.0.0-20180228061459-e0a39a4cb421/go.mod h1:6dJC0mAP4ikYIbvyc7fijjWJddQyLn8Ig3JB5CqoB9Q= github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd h1:TRLaZ9cD/w8PVh93nsPXa1VrQ6jlwL5oN8l14QlcNfg= github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd/go.mod h1:6dJC0mAP4ikYIbvyc7fijjWJddQyLn8Ig3JB5CqoB9Q= -github.com/modern-go/reflect2 v0.0.0-20180701023420-4b7aa43c6742/go.mod h1:bx2lNnkwVCuqBIxFjflWJWanXIb3RllmbCylyMrvgv0= -github.com/modern-go/reflect2 v1.0.1/go.mod h1:bx2lNnkwVCuqBIxFjflWJWanXIb3RllmbCylyMrvgv0= github.com/modern-go/reflect2 v1.0.2/go.mod h1:yWuevngMOJpCy52FWWMvUC8ws7m/LJsjYzDa0/r8luk= github.com/modern-go/reflect2 v1.0.3-0.20250322232337-35a7c28c31ee h1:W5t00kpgFdJifH4BDsTlE89Zl93FEloxaWZfGcifgq8= github.com/modern-go/reflect2 v1.0.3-0.20250322232337-35a7c28c31ee/go.mod h1:yWuevngMOJpCy52FWWMvUC8ws7m/LJsjYzDa0/r8luk= @@ -434,8 +307,8 @@ github.com/moricho/tparallel v0.3.2 h1:odr8aZVFA3NZrNybggMkYO3rgPRcqjeQUlBBFVxKH github.com/moricho/tparallel v0.3.2/go.mod h1:OQ+K3b4Ln3l2TZveGCywybl68glfLEwFGqvnjok8b+U= github.com/muesli/termenv v0.16.0 h1:S5AlUN9dENB57rsbnkPyfdGuWIlkmzJjbFf0Tf5FWUc= github.com/muesli/termenv v0.16.0/go.mod h1:ZRfOIKPFDYQoDFF4Olj7/QJbW60Ol/kL1pU3VfY/Cnk= -github.com/mwitkow/go-conntrack v0.0.0-20161129095857-cc309e4a2223/go.mod h1:qRWi+5nqEBWmkhHvq77mSJWrCKwh8bxhgT7d/eI7P4U= -github.com/mwitkow/go-conntrack v0.0.0-20190716064945-2f068394615f/go.mod h1:qRWi+5nqEBWmkhHvq77mSJWrCKwh8bxhgT7d/eI7P4U= +github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 h1:C3w9PqII01/Oq1c1nUAm88MOHcQC9l5mIlSMApZMrHA= +github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822/go.mod h1:+n7T8mK8HuQTcFwEeznm/DIxMOiR9yIdICNftLE1DvQ= github.com/nakabonne/nestif v0.3.1 h1:wm28nZjhQY5HyYPx+weN3Q65k6ilSBxDb8v5S81B81U= github.com/nakabonne/nestif v0.3.1/go.mod h1:9EtoZochLn5iUprVDmDjqGKPofoUEBL8U4Ngq6aY7OE= github.com/nishanths/exhaustive v0.12.0 h1:vIY9sALmw6T/yxiASewa4TQcFsVYZQQRUQJhKRf3Swg= @@ -461,8 +334,6 @@ github.com/pelletier/go-toml v1.9.5 h1:4yBQzkHv+7BHq2PQUZF3Mx0IYxG7LsP222s7Agd3v github.com/pelletier/go-toml v1.9.5/go.mod h1:u1nR/EPcESfeI/szUZKdtJ0xRNbUoANCkoOuaOx1Y+c= github.com/pelletier/go-toml/v2 v2.2.4 h1:mye9XuhQ6gvn5h28+VilKrrPoQVanw5PMw/TB0t5Ec4= github.com/pelletier/go-toml/v2 v2.2.4/go.mod h1:2gIqNv+qfxSVS7cM2xJQKtLSTLUE9V8t9Stt+h56mCY= -github.com/pkg/errors v0.8.0/go.mod h1:bwawxfHBFNV+L2hUp1rHADufV3IMtnDRdf1r5NINEl0= -github.com/pkg/errors v0.8.1/go.mod h1:bwawxfHBFNV+L2hUp1rHADufV3IMtnDRdf1r5NINEl0= github.com/pkg/errors v0.9.1 h1:FEBLx1zS214owpjy7qsBeixbURkuhQAwrK5UwLGTwt4= github.com/pkg/errors v0.9.1/go.mod h1:bwawxfHBFNV+L2hUp1rHADufV3IMtnDRdf1r5NINEl0= github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM= @@ -471,28 +342,16 @@ github.com/polyfloyd/go-errorlint v1.8.0 h1:DL4RestQqRLr8U4LygLw8g2DX6RN1eBJOpa2 github.com/polyfloyd/go-errorlint v1.8.0/go.mod h1:G2W0Q5roxbLCt0ZQbdoxQxXktTjwNyDbEaj3n7jvl4s= github.com/prashantv/gostub v1.1.0 h1:BTyx3RfQjRHnUWaGF9oQos79AlQ5k8WNktv7VGvVH4g= github.com/prashantv/gostub v1.1.0/go.mod h1:A5zLQHz7ieHGG7is6LLXLz7I8+3LZzsrV0P1IAHhP5U= -github.com/prometheus/client_golang v0.9.1/go.mod h1:7SWBe2y4D6OKWSNQJUaRYU/AaXPKyh/dDVn+NZz0KFw= -github.com/prometheus/client_golang v1.0.0/go.mod h1:db9x61etRT2tGnBNRi70OPL5FsnadC4Ky3P0J6CfImo= -github.com/prometheus/client_golang v1.7.1/go.mod h1:PY5Wy2awLA44sXw4AOSfFBetzPP4j5+D6mVACh+pe2M= -github.com/prometheus/client_golang v1.11.0/go.mod h1:Z6t4BnS23TR94PD6BsDNk8yVqroYurpAkEiz0P2BEV0= -github.com/prometheus/client_golang v1.12.1 h1:ZiaPsmm9uiBeaSMRznKsCDNtPCS0T3JVDGF+06gjBzk= -github.com/prometheus/client_golang v1.12.1/go.mod h1:3Z9XVyYiZYEO+YQWt3RD2R3jrbd179Rt297l4aS6nDY= -github.com/prometheus/client_model v0.0.0-20180712105110-5c3871d89910/go.mod h1:MbSGuTsp3dbXC40dX6PRTWyKYBIrTGTE9sqQNg2J8bo= -github.com/prometheus/client_model v0.0.0-20190129233127-fd36f4220a90/go.mod h1:xMI15A0UPsDsEKsMN9yxemIoYk6Tm2C1GtYGdfGttqA= -github.com/prometheus/client_model v0.0.0-20190812154241-14fe0d1b01d4/go.mod h1:xMI15A0UPsDsEKsMN9yxemIoYk6Tm2C1GtYGdfGttqA= -github.com/prometheus/client_model v0.2.0 h1:uq5h0d+GuxiXLJLNABMgp2qUWDPiLvgCzz2dUR+/W/M= -github.com/prometheus/client_model v0.2.0/go.mod h1:xMI15A0UPsDsEKsMN9yxemIoYk6Tm2C1GtYGdfGttqA= -github.com/prometheus/common v0.4.1/go.mod h1:TNfzLD0ON7rHzMJeJkieUDPYmFC7Snx/y86RQel1bk4= -github.com/prometheus/common v0.10.0/go.mod h1:Tlit/dnDKsSWFlCLTWaA1cyBgKHSMdTB80sz/V91rCo= -github.com/prometheus/common v0.26.0/go.mod h1:M7rCNAaPfAosfx8veZJCuw84e35h3Cfd9VFqTh1DIvc= -github.com/prometheus/common v0.32.1 h1:hWIdL3N2HoUx3B8j3YN9mWor0qhY/NlEKZEaXxuIRh4= -github.com/prometheus/common v0.32.1/go.mod h1:vu+V0TpY+O6vW9J44gczi3Ap/oXXR10b+M/gUGO4Hls= -github.com/prometheus/procfs v0.0.0-20181005140218-185b4288413d/go.mod h1:c3At6R/oaqEKCNdg8wHV1ftS6bRYblBhIjjI8uT2IGk= -github.com/prometheus/procfs v0.0.2/go.mod h1:TjEm7ze935MbeOT/UhFTIMYKhuLP4wbCsTZCD3I8kEA= -github.com/prometheus/procfs v0.1.3/go.mod h1:lV6e/gmhEcM9IjHGsFOCxxuZ+z1YqCvr4OA4YeYWdaU= -github.com/prometheus/procfs v0.6.0/go.mod h1:cz+aTbrPOrUb4q7XlbU9ygM+/jj0fzG6c1xBZuNvfVA= -github.com/prometheus/procfs v0.7.3 h1:4jVXhlkAyzOScmCkXBTOLRLTz8EeU+eyjrwB/EPq0VU= -github.com/prometheus/procfs v0.7.3/go.mod h1:cz+aTbrPOrUb4q7XlbU9ygM+/jj0fzG6c1xBZuNvfVA= +github.com/prometheus/client_golang v1.23.2 h1:Je96obch5RDVy3FDMndoUsjAhG5Edi49h0RJWRi/o0o= +github.com/prometheus/client_golang v1.23.2/go.mod h1:Tb1a6LWHB3/SPIzCoaDXI4I8UHKeFTEQ1YCr+0Gyqmg= +github.com/prometheus/client_model v0.6.2 h1:oBsgwpGs7iVziMvrGhE53c/GrLUsZdHnqNwqPLxwZyk= +github.com/prometheus/client_model v0.6.2/go.mod h1:y3m2F6Gdpfy6Ut/GBsUqTWZqCUvMVzSfMLjcu6wAwpE= +github.com/prometheus/common v0.67.5 h1:pIgK94WWlQt1WLwAC5j2ynLaBRDiinoAb86HZHTUGI4= +github.com/prometheus/common v0.67.5/go.mod h1:SjE/0MzDEEAyrdr5Gqc6G+sXI67maCxzaT3A2+HqjUw= +github.com/prometheus/otlptranslator v1.0.0 h1:s0LJW/iN9dkIH+EnhiD3BlkkP5QVIUVEoIwkU+A6qos= +github.com/prometheus/otlptranslator v1.0.0/go.mod h1:vRYWnXvI6aWGpsdY/mOT/cbeVRBlPWtBNDb7kGR3uKM= +github.com/prometheus/procfs v0.20.1 h1:XwbrGOIplXW/AU3YhIhLODXMJYyC1isLFfYCsTEycfc= +github.com/prometheus/procfs v0.20.1/go.mod h1:o9EMBZGRyvDrSPH1RqdxhojkuXstoe4UlK79eF5TGGo= github.com/quasilyte/go-ruleguard v0.4.4 h1:53DncefIeLX3qEpjzlS1lyUmQoUEeOWPFWqaTJq9eAQ= github.com/quasilyte/go-ruleguard v0.4.4/go.mod h1:Vl05zJ538vcEEwu16V/Hdu7IYZWyKSwIy4c88Ro1kRE= github.com/quasilyte/go-ruleguard/dsl v0.3.22 h1:wd8zkOhSNr+I+8Qeciml08ivDt1pSXe60+5DqOpCjPE= @@ -508,7 +367,6 @@ github.com/raeperd/recvcheck v0.2.0/go.mod h1:n04eYkwIR0JbgD73wT8wL4JjPC3wm0nFtz github.com/rivo/uniseg v0.2.0/go.mod h1:J6wj4VEh+S6ZtnVlnTBMWIodfgj8LQOQFoIToxlJtxc= github.com/rivo/uniseg v0.4.7 h1:WUdvkW8uEhrYfLC4ZzdpI2ztxP1I582+49Oc5Mq64VQ= github.com/rivo/uniseg v0.4.7/go.mod h1:FN3SvrM+Zdj16jyLfmOkMNblXMcoc8DfTHruCPUcx88= -github.com/rogpeppe/go-internal v1.3.0/go.mod h1:M8bDsm7K2OlrFYOpmOWEs/qY81heoFRclV5y23lUDJ4= github.com/rogpeppe/go-internal v1.14.1 h1:UQB4HGPB6osV0SQTLymcB4TgvyWu6ZyliaW0tI/otEQ= github.com/rogpeppe/go-internal v1.14.1/go.mod h1:MaRKkUm5W0goXpeCfT7UZI6fk/L7L7so1lCWt35ZSgc= github.com/russross/blackfriday/v2 v2.1.0/go.mod h1:+Rmxgy9KzJVeS9/2gXHxylqXiyQDYRxCVz55jmeOWTM= @@ -530,9 +388,6 @@ github.com/sergi/go-diff v1.2.0 h1:XU+rvMAioB0UC3q1MFrIQy4Vo5/4VsRDQQXHsEya6xQ= github.com/sergi/go-diff v1.2.0/go.mod h1:STckp+ISIX8hZLjrqAeVduY0gWCT9IjLuqbuNXdaHfM= github.com/shurcooL/go v0.0.0-20180423040247-9e1955d9fb6e/go.mod h1:TDJrrUr11Vxrven61rcy3hJMUqaf/CLWYhHNPmT14Lk= github.com/shurcooL/go-goon v0.0.0-20170922171312-37c2f522c041/go.mod h1:N5mDOmsrJOB+vfqUK+7DmDyjhSLIIBnXo9lvZJj3MWQ= -github.com/sirupsen/logrus v1.2.0/go.mod h1:LxeOpSwHxABJmUn/MG1IvRgCAasNZTLOkJPxbbu5VWo= -github.com/sirupsen/logrus v1.4.2/go.mod h1:tLMulIdttU9McNUspp0xgXVQah82FyeX6MwdIuYE2rE= -github.com/sirupsen/logrus v1.6.0/go.mod h1:7uNnSEd1DgxDLC74fIahvMZmmYsHGZGEOFrfsX/uA88= github.com/sirupsen/logrus v1.9.3 h1:dueUQJ1C2q9oE3F7wvmSGAaVtTmUizReu6fjN8uqzbQ= github.com/sirupsen/logrus v1.9.3/go.mod h1:naHLuLoDiP4jHNo9R0sCBMtWGeIprob74mVsIT4qYEQ= github.com/sivchari/containedctx v1.0.3 h1:x+etemjbsh2fB5ewm5FeLNi5bUjK0V8n0RB+Wwfd0XE= @@ -559,7 +414,6 @@ github.com/ssgreg/nlreturn/v2 v2.2.1/go.mod h1:E/iiPB78hV7Szg2YfRgyIrk1AD6JVMTRk github.com/stbenjam/no-sprintf-host-port v0.2.0 h1:i8pxvGrt1+4G0czLr/WnmyH7zbZ8Bg8etvARQ1rpyl4= github.com/stbenjam/no-sprintf-host-port v0.2.0/go.mod h1:eL0bQ9PasS0hsyTyfTjjG+E80QIyPnBVQbYZyv20Jfk= github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME= -github.com/stretchr/objx v0.1.1/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME= github.com/stretchr/objx v0.4.0/go.mod h1:YvHI0jy2hoMjB+UWwv71VJQ9isScKT/TqJzVSSt89Yw= github.com/stretchr/objx v0.5.0/go.mod h1:Yh+to48EsGEfYuaHDzXPcE3xhTkx73EhmCGUpEOglKo= github.com/stretchr/objx v0.5.2 h1:xuMeJ0Sdp5ZMRXx/aWO6RZxdr3beISkG5/G/aIRr3pY= @@ -572,6 +426,7 @@ github.com/stretchr/testify v1.7.0/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/ github.com/stretchr/testify v1.7.1/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/h/Wwjteg= github.com/stretchr/testify v1.8.0/go.mod h1:yNjHg4UonilssWZ8iaSj1OCr/vHnekPRkoO+kdMU+MU= github.com/stretchr/testify v1.8.4/go.mod h1:sz/lmYIOXD/1dqDmKjjqLyZ2RngseejIcXlSw2iwfAo= +github.com/stretchr/testify v1.9.0/go.mod h1:r2ic/lqez/lEtzL7wO/rwa5dbSLXVDPFyf8C91i36aY= github.com/stretchr/testify v1.11.1 h1:7s2iGBzp5EwR7/aIZr8ao5+dra3wiQyKjjFuvgVKu7U= github.com/stretchr/testify v1.11.1/go.mod h1:wZwfW3scLgRK+23gO65QZefKpKQRnfz6sD981Nm4B6U= github.com/subosito/gotenv v1.4.1 h1:jyEFiXpy21Wm81FBN71l9VoMMV8H8jG+qIK3GCpY6Qs= @@ -613,7 +468,6 @@ github.com/yeya24/promlinter v0.3.0/go.mod h1:cDfJQQYv9uYciW60QT0eeHlFodotkYZlL+ github.com/ykadowak/zerologlint v0.1.5 h1:Gy/fMz1dFQN9JZTPjv1hxEk+sRWm05row04Yoolgdiw= github.com/ykadowak/zerologlint v0.1.5/go.mod h1:KaUskqF3e/v59oPmdq1U1DnKcuHokl2/K1U4pmIELKg= github.com/yuin/goldmark v1.1.25/go.mod h1:3hX8gzYuyVAZsxl0MRgGTJEmQBFcNTphYh9decYSb74= -github.com/yuin/goldmark v1.1.27/go.mod h1:3hX8gzYuyVAZsxl0MRgGTJEmQBFcNTphYh9decYSb74= github.com/yuin/goldmark v1.1.32/go.mod h1:3hX8gzYuyVAZsxl0MRgGTJEmQBFcNTphYh9decYSb74= github.com/yuin/goldmark v1.2.1/go.mod h1:3hX8gzYuyVAZsxl0MRgGTJEmQBFcNTphYh9decYSb74= github.com/yuin/goldmark v1.3.5/go.mod h1:mwnBkeHKe2W/ZEtQ+71ViKU8L12m81fl3OWwC1Zlc8k= @@ -629,17 +483,26 @@ go-simpler.org/sloglint v0.11.0 h1:JlR1X4jkbeaffiyjLtymeqmGDKBDO1ikC6rjiuFAOco= go-simpler.org/sloglint v0.11.0/go.mod h1:CFDO8R1i77dlciGfPEPvYke2ZMx4eyGiEIWkyeW2Pvw= go.augendre.info/fatcontext v0.8.0 h1:2dfk6CQbDGeu1YocF59Za5Pia7ULeAM6friJ3LP7lmk= go.augendre.info/fatcontext v0.8.0/go.mod h1:oVJfMgwngMsHO+KB2MdgzcO+RvtNdiCEOlWvSFtax/s= -go.opencensus.io v0.21.0/go.mod h1:mSImk1erAIZhrmZN+AvHh14ztQfjbGwt4TtuofqLduU= -go.opencensus.io v0.22.0/go.mod h1:+kGneAE2xo2IficOXnaByMWTGM9T73dGwxeWcUqIpI8= -go.opencensus.io v0.22.2/go.mod h1:yxeiOL68Rb0Xd1ddK5vPZ/oVn4vY4Ynel7k9FzqtOIw= -go.opencensus.io v0.22.3/go.mod h1:yxeiOL68Rb0Xd1ddK5vPZ/oVn4vY4Ynel7k9FzqtOIw= -go.opencensus.io v0.22.4/go.mod h1:yxeiOL68Rb0Xd1ddK5vPZ/oVn4vY4Ynel7k9FzqtOIw= +go.opentelemetry.io/auto/sdk v1.2.1 h1:jXsnJ4Lmnqd11kwkBV2LgLoFMZKizbCi5fNZ/ipaZ64= +go.opentelemetry.io/auto/sdk v1.2.1/go.mod h1:KRTj+aOaElaLi+wW1kO/DZRXwkF4C5xPbEe3ZiIhN7Y= go.opentelemetry.io/collector/featuregate v1.58.0 h1:Kh6Dpgbxywv/Q3D6qPehaSxNCxvr/U/ki7CL4y3udCo= go.opentelemetry.io/collector/featuregate v1.58.0/go.mod h1:4ga1QBMPEejXXmpyJS8lmaRpknJ3Lb9Bvk6e420bUFU= go.opentelemetry.io/collector/internal/testutil v0.152.0 h1:8LGwekR7mLcUDhT1ofLmdnrHRFuUa3U7PBd95ZvJEjQ= go.opentelemetry.io/collector/internal/testutil v0.152.0/go.mod h1:Jkjs6rkqs973LqgZ0Fe3zrokQRKULYXPIf4HuqStiEE= go.opentelemetry.io/collector/pdata v1.58.0 h1:5Lxut3NxKp87066Pzt+3q7+JUuFI5B3teCyLZIF8wIs= go.opentelemetry.io/collector/pdata v1.58.0/go.mod h1:4vZtODINbC/JF3eGocnatdImzbRHseOywIcr+aULjCg= +go.opentelemetry.io/otel v1.43.0 h1:mYIM03dnh5zfN7HautFE4ieIig9amkNANT+xcVxAj9I= +go.opentelemetry.io/otel v1.43.0/go.mod h1:JuG+u74mvjvcm8vj8pI5XiHy1zDeoCS2LB1spIq7Ay0= +go.opentelemetry.io/otel/exporters/prometheus v0.65.0 h1:jOveH/b4lU9HT7y+Gfamf18BqlOuz2PWEvs8yM7Q6XE= +go.opentelemetry.io/otel/exporters/prometheus v0.65.0/go.mod h1:i1P8pcumauPtUI4YNopea1dhzEMuEqWP1xoUZDylLHo= +go.opentelemetry.io/otel/metric v1.43.0 h1:d7638QeInOnuwOONPp4JAOGfbCEpYb+K6DVWvdxGzgM= +go.opentelemetry.io/otel/metric v1.43.0/go.mod h1:RDnPtIxvqlgO8GRW18W6Z/4P462ldprJtfxHxyKd2PY= +go.opentelemetry.io/otel/sdk v1.43.0 h1:pi5mE86i5rTeLXqoF/hhiBtUNcrAGHLKQdhg4h4V9Dg= +go.opentelemetry.io/otel/sdk v1.43.0/go.mod h1:P+IkVU3iWukmiit/Yf9AWvpyRDlUeBaRg6Y+C58QHzg= +go.opentelemetry.io/otel/sdk/metric v1.43.0 h1:S88dyqXjJkuBNLeMcVPRFXpRw2fuwdvfCGLEo89fDkw= +go.opentelemetry.io/otel/sdk/metric v1.43.0/go.mod h1:C/RJtwSEJ5hzTiUz5pXF1kILHStzb9zFlIEe85bhj6A= +go.opentelemetry.io/otel/trace v1.43.0 h1:BkNrHpup+4k4w+ZZ86CZoHHEkohws8AY+WTX09nk+3A= +go.opentelemetry.io/otel/trace v1.43.0/go.mod h1:/QJhyVBUUswCphDVxq+8mld+AvhXZLhe+8WVFxiFff0= go.opentelemetry.io/proto/slim/otlp v1.10.0 h1:iR97Vs/ZDR+y9TfuP9b1XBtdPWeC+OMslIBmhcLU7jM= go.opentelemetry.io/proto/slim/otlp v1.10.0/go.mod h1:lV9250stpjYLPNA5viFabIgP2QlUGRT1GdTgAf8SIUk= go.opentelemetry.io/proto/slim/otlp/collector/profiles/v1development v0.3.0 h1:RUF5rO0hAlgiJt1fzQVzcVs3vZVNHIcMLgOgG4rWNcQ= @@ -656,49 +519,20 @@ go.uber.org/multierr v1.11.0 h1:blXXJkSxSSfBVBlC76pxqeO+LN3aDfLQo+309xJstO0= go.uber.org/multierr v1.11.0/go.mod h1:20+QtiLqy0Nd6FdQB9TLXag12DsQkrbs3htMFfDN80Y= go.uber.org/zap v1.24.0 h1:FiJd5l1UOLj0wCgbSE0rwwXHzEdAZS6hiiSnxJN/D60= go.uber.org/zap v1.24.0/go.mod h1:2kMP+WWQ8aoFoedH3T2sq6iJ2yDWpHbP0f6MQbS9Gkg= -golang.org/x/crypto v0.0.0-20180904163835-0709b304e793/go.mod h1:6SG95UA2DQfeDnfUPMdvaQW0Q7yPrPDi9nlGo2tz2b4= +go.yaml.in/yaml/v2 v2.4.4 h1:tuyd0P+2Ont/d6e2rl3be67goVK4R6deVxCUX5vyPaQ= +go.yaml.in/yaml/v2 v2.4.4/go.mod h1:gMZqIpDtDqOfM0uNfy0SkpRhvUryYH0Z6wdMYcacYXQ= golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2/go.mod h1:djNgcEr1/C05ACkg1iLfiJU5Ep61QUkGW8qpdssI0+w= -golang.org/x/crypto v0.0.0-20190510104115-cbcb75029529/go.mod h1:yigFU9vqHzYiE8UmvKecakEJjdnWj3jj499lnFckfCI= -golang.org/x/crypto v0.0.0-20190605123033-f99c8df09eb5/go.mod h1:yigFU9vqHzYiE8UmvKecakEJjdnWj3jj499lnFckfCI= golang.org/x/crypto v0.0.0-20191011191535-87dc89f01550/go.mod h1:yigFU9vqHzYiE8UmvKecakEJjdnWj3jj499lnFckfCI= golang.org/x/crypto v0.0.0-20200622213623-75b288015ac9/go.mod h1:LzIPMQfyMNhhGPhUkYOs5KpL4U8rLKemX1yGLhDgUto= golang.org/x/crypto v0.0.0-20210921155107-089bfa567519/go.mod h1:GvvjBRRGRdwPK5ydBHafDWAxML/pGHZbMvKqRZ5+Abc= golang.org/x/crypto v0.13.0/go.mod h1:y6Z2r+Rw4iayiXXAIxJIDAJ1zMW4yaTpebo8fPOliYc= golang.org/x/crypto v0.14.0/go.mod h1:MVFd36DqK4CsrnJYDkBA3VC4m2GkXAM0PvzMCn4JQf4= -golang.org/x/exp v0.0.0-20190121172915-509febef88a4/go.mod h1:CJ0aWSM057203Lf6IL+f9T1iT9GByDxfZKAQTCR3kQA= -golang.org/x/exp v0.0.0-20190306152737-a1d7652674e8/go.mod h1:CJ0aWSM057203Lf6IL+f9T1iT9GByDxfZKAQTCR3kQA= -golang.org/x/exp v0.0.0-20190510132918-efd6b22b2522/go.mod h1:ZjyILWgesfNpC6sMxTJOJm9Kp84zZh5NQWvqDGG3Qr8= -golang.org/x/exp v0.0.0-20190829153037-c13cbed26979/go.mod h1:86+5VVa7VpoJ4kLfm080zCjGlMRFzhUhsZKEZO7MGek= -golang.org/x/exp v0.0.0-20191030013958-a1ab85dbe136/go.mod h1:JXzH8nQsPlswgeRAPE3MuO9GYsAcnJvJ4vnMwN/5qkY= -golang.org/x/exp v0.0.0-20191129062945-2f5052295587/go.mod h1:2RIsYlXP63K8oxa1u096TMicItID8zy7Y6sNkU49FU4= -golang.org/x/exp v0.0.0-20191227195350-da58074b4299/go.mod h1:2RIsYlXP63K8oxa1u096TMicItID8zy7Y6sNkU49FU4= -golang.org/x/exp v0.0.0-20200119233911-0405dc783f0a/go.mod h1:2RIsYlXP63K8oxa1u096TMicItID8zy7Y6sNkU49FU4= -golang.org/x/exp v0.0.0-20200207192155-f17229e696bd/go.mod h1:J/WKrq2StrnmMY6+EHIKF9dgMWnmCNThgcyBT1FY9mM= -golang.org/x/exp v0.0.0-20200224162631-6cc2880d07d6/go.mod h1:3jZMyOhIsHpP37uCMkUooju7aAi5cS1Q23tOzKc+0MU= golang.org/x/exp v0.0.0-20240909161429-701f63a606c0 h1:e66Fs6Z+fZTbFBAxKfP3PALWBtpfqks2bwGcexMxgtk= golang.org/x/exp v0.0.0-20240909161429-701f63a606c0/go.mod h1:2TbTHSBQa924w8M6Xs1QcRcFwyucIwBGpK1p2f1YFFY= golang.org/x/exp/typeparams v0.0.0-20220428152302-39d4317da171/go.mod h1:AbB0pIl9nAr9wVwH+Z2ZpaocVmF5I4GyWCDIsVjR0bk= golang.org/x/exp/typeparams v0.0.0-20230203172020-98cc5a0785f9/go.mod h1:AbB0pIl9nAr9wVwH+Z2ZpaocVmF5I4GyWCDIsVjR0bk= golang.org/x/exp/typeparams v0.0.0-20250210185358-939b2ce775ac h1:TSSpLIG4v+p0rPv1pNOQtl1I8knsO4S9trOxNMOLVP4= golang.org/x/exp/typeparams v0.0.0-20250210185358-939b2ce775ac/go.mod h1:AbB0pIl9nAr9wVwH+Z2ZpaocVmF5I4GyWCDIsVjR0bk= -golang.org/x/image v0.0.0-20190227222117-0694c2d4d067/go.mod h1:kZ7UVZpmo3dzQBMxlp+ypCbDeSB+sBbTgSJuh5dn5js= -golang.org/x/image v0.0.0-20190802002840-cff245a6509b/go.mod h1:FeLwcggjj3mMvU+oOTbSwawSJRM1uh48EjtB4UJZlP0= -golang.org/x/lint v0.0.0-20181026193005-c67002cb31c3/go.mod h1:UVdnD1Gm6xHRNCYTkRU2/jEulfH38KcIWyp/GAMgvoE= -golang.org/x/lint v0.0.0-20190227174305-5b3e6a55c961/go.mod h1:wehouNa3lNwaWXcvxsM5YxQ5yQlVC4a0KAMCusXpPoU= -golang.org/x/lint v0.0.0-20190301231843-5614ed5bae6f/go.mod h1:UVdnD1Gm6xHRNCYTkRU2/jEulfH38KcIWyp/GAMgvoE= -golang.org/x/lint v0.0.0-20190313153728-d0100b6bd8b3/go.mod h1:6SW0HCj/g11FgYtHlgUYUwCkIfeOF89ocIRzGO/8vkc= -golang.org/x/lint v0.0.0-20190409202823-959b441ac422/go.mod h1:6SW0HCj/g11FgYtHlgUYUwCkIfeOF89ocIRzGO/8vkc= -golang.org/x/lint v0.0.0-20190909230951-414d861bb4ac/go.mod h1:6SW0HCj/g11FgYtHlgUYUwCkIfeOF89ocIRzGO/8vkc= -golang.org/x/lint v0.0.0-20190930215403-16217165b5de/go.mod h1:6SW0HCj/g11FgYtHlgUYUwCkIfeOF89ocIRzGO/8vkc= -golang.org/x/lint v0.0.0-20191125180803-fdd1cda4f05f/go.mod h1:5qLYkcX4OjUUV8bRuDixDT3tpyyb+LUpUlRWLxfhWrs= -golang.org/x/lint v0.0.0-20200130185559-910be7a94367/go.mod h1:3xt1FjdF8hUf6vQPIChWIBhFzV8gjjsPE/fR3IyQdNY= -golang.org/x/lint v0.0.0-20200302205851-738671d3881b/go.mod h1:3xt1FjdF8hUf6vQPIChWIBhFzV8gjjsPE/fR3IyQdNY= -golang.org/x/mobile v0.0.0-20190312151609-d3739f865fa6/go.mod h1:z+o9i4GpDbdi3rU15maQ/Ox0txvL9dWGYEHz965HBQE= -golang.org/x/mobile v0.0.0-20190719004257-d2bd2a29d028/go.mod h1:E/iHnbuqvinMTCcRqshq8CkpyQDoeVncDDYHnLhea+o= -golang.org/x/mod v0.0.0-20190513183733-4bf6d317e70e/go.mod h1:mXi4GBBbnImb6dmsKGUJ2LatrhH/nqhxcFungHvyanc= -golang.org/x/mod v0.1.0/go.mod h1:0QHyrYULN0/3qlju5TqG8bIK38QM8yzMo5ekMj3DlcY= -golang.org/x/mod v0.1.1-0.20191105210325-c90efee705ee/go.mod h1:QqPTAvyqsEbceGzBzNggFXnrqF1CaUcvgkdR5Ot7KZg= -golang.org/x/mod v0.1.1-0.20191107180719-034126e5016b/go.mod h1:QqPTAvyqsEbceGzBzNggFXnrqF1CaUcvgkdR5Ot7KZg= golang.org/x/mod v0.2.0/go.mod h1:s0Qsj1ACt9ePp/hMypM3fl4fZqREWJwdYDEqhRiZZUA= golang.org/x/mod v0.3.0/go.mod h1:s0Qsj1ACt9ePp/hMypM3fl4fZqREWJwdYDEqhRiZZUA= golang.org/x/mod v0.4.1/go.mod h1:s0Qsj1ACt9ePp/hMypM3fl4fZqREWJwdYDEqhRiZZUA= @@ -712,38 +546,13 @@ golang.org/x/mod v0.12.0/go.mod h1:iBbtSCu2XBx23ZKBPSOrRkjjQPZFPuis4dIYUhu/chs= golang.org/x/mod v0.13.0/go.mod h1:hTbmBsO62+eylJbnUtE2MGJUyE7QWk4xUqPFrRgJ+7c= golang.org/x/mod v0.32.0 h1:9F4d3PHLljb6x//jOyokMv3eX+YDeepZSEo3mFJy93c= golang.org/x/mod v0.32.0/go.mod h1:SgipZ/3h2Ci89DlEtEXWUk/HteuRin+HHhN+WbNhguU= -golang.org/x/net v0.0.0-20180724234803-3673e40ba225/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4= -golang.org/x/net v0.0.0-20180826012351-8a410e7b638d/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4= -golang.org/x/net v0.0.0-20181114220301-adae6a3d119a/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4= -golang.org/x/net v0.0.0-20190108225652-1e06a53dbb7e/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4= -golang.org/x/net v0.0.0-20190213061140-3a22650c66bd/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4= -golang.org/x/net v0.0.0-20190311183353-d8887717615a/go.mod h1:t9HGtf8HONx5eT2rtn7q6eTqICYqUVnKs3thJo3Qplg= golang.org/x/net v0.0.0-20190404232315-eb5bcb51f2a3/go.mod h1:t9HGtf8HONx5eT2rtn7q6eTqICYqUVnKs3thJo3Qplg= -golang.org/x/net v0.0.0-20190501004415-9ce7a6920f09/go.mod h1:t9HGtf8HONx5eT2rtn7q6eTqICYqUVnKs3thJo3Qplg= -golang.org/x/net v0.0.0-20190503192946-f4e77d36d62c/go.mod h1:t9HGtf8HONx5eT2rtn7q6eTqICYqUVnKs3thJo3Qplg= -golang.org/x/net v0.0.0-20190603091049-60506f45cf65/go.mod h1:HSz+uSET+XFnRR8LxR5pz3Of3rY3CfYBVs4xY44aLks= -golang.org/x/net v0.0.0-20190613194153-d28f0bde5980/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s= golang.org/x/net v0.0.0-20190620200207-3b0461eec859/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s= -golang.org/x/net v0.0.0-20190628185345-da137c7871d7/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s= -golang.org/x/net v0.0.0-20190724013045-ca1201d0de80/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s= -golang.org/x/net v0.0.0-20191209160850-c0dbc17a3553/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s= -golang.org/x/net v0.0.0-20200114155413-6afb5195e5aa/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s= -golang.org/x/net v0.0.0-20200202094626-16171245cfb2/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s= -golang.org/x/net v0.0.0-20200222125558-5a598a2470a0/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s= golang.org/x/net v0.0.0-20200226121028-0de0cce0169b/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s= -golang.org/x/net v0.0.0-20200301022130-244492dfa37a/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s= -golang.org/x/net v0.0.0-20200324143707-d3edc9973b7e/go.mod h1:qpuaurCH72eLCgpAm/N6yyVIVM9cpaDIP3A8BGJEC5A= -golang.org/x/net v0.0.0-20200501053045-e0ff5e5a1de5/go.mod h1:qpuaurCH72eLCgpAm/N6yyVIVM9cpaDIP3A8BGJEC5A= -golang.org/x/net v0.0.0-20200506145744-7e3656a0809f/go.mod h1:qpuaurCH72eLCgpAm/N6yyVIVM9cpaDIP3A8BGJEC5A= -golang.org/x/net v0.0.0-20200513185701-a91f0712d120/go.mod h1:qpuaurCH72eLCgpAm/N6yyVIVM9cpaDIP3A8BGJEC5A= -golang.org/x/net v0.0.0-20200520182314-0ba52f642ac2/go.mod h1:qpuaurCH72eLCgpAm/N6yyVIVM9cpaDIP3A8BGJEC5A= golang.org/x/net v0.0.0-20200625001655-4c5254603344/go.mod h1:/O7V0waA8r7cgGh81Ro3o1hOxt32SMVPicZroKQ2sZA= -golang.org/x/net v0.0.0-20200707034311-ab3426394381/go.mod h1:/O7V0waA8r7cgGh81Ro3o1hOxt32SMVPicZroKQ2sZA= -golang.org/x/net v0.0.0-20200822124328-c89045814202/go.mod h1:/O7V0waA8r7cgGh81Ro3o1hOxt32SMVPicZroKQ2sZA= golang.org/x/net v0.0.0-20201021035429-f5854403a974/go.mod h1:sp8m0HH+o8qH0wwXwYZr8TS3Oi6o0r6Gce1SSxlDquU= golang.org/x/net v0.0.0-20210226172049-e18ecbb05110/go.mod h1:m0MpNAwzfU5UDzcl9v0D8zg8gWTRqZa9RBIspLL5mdg= golang.org/x/net v0.0.0-20210405180319-a5a99cb37ef4/go.mod h1:p54w0d4576C0XHj96bSt6lcn1PtDYWL6XObtHCRCNQM= -golang.org/x/net v0.0.0-20210525063256-abc453219eb5/go.mod h1:9nx3DQGgdP8bBQD5qxJ1jj9UTztislL4KSBs9R2vV5Y= golang.org/x/net v0.0.0-20211015210444-4f30a5c0130f/go.mod h1:9nx3DQGgdP8bBQD5qxJ1jj9UTztislL4KSBs9R2vV5Y= golang.org/x/net v0.0.0-20220722155237-a158d28d115b/go.mod h1:XRhObCWvk6IyKnWLug+ECip1KBveYUHfp+8e9klMJ9c= golang.org/x/net v0.2.0/go.mod h1:KqCZLdyyvdV855qA2rE3GC2aiw5xGR5TEjj8smXukLY= @@ -754,22 +563,10 @@ golang.org/x/net v0.15.0/go.mod h1:idbUs1IY1+zTqbi8yxTbhexhEEk5ur9LInksu6HrEpk= golang.org/x/net v0.16.0/go.mod h1:NxSsAGuq816PNPmqtQdLE42eU2Fs7NoRIZrHJAlaCOE= golang.org/x/net v0.51.0 h1:94R/GTO7mt3/4wIKpcR5gkGmRLOuE/2hNGeWq/GBIFo= golang.org/x/net v0.51.0/go.mod h1:aamm+2QF5ogm02fjy5Bb7CQ0WMt1/WVM7FtyaTLlA9Y= -golang.org/x/oauth2 v0.0.0-20180821212333-d2e6202438be/go.mod h1:N/0e6XlmueqKjAGxoOufVs8QHGRruUQn6yWY3a++T0U= -golang.org/x/oauth2 v0.0.0-20190226205417-e64efc72b421/go.mod h1:gOpvHmFTYa4IltrdGE7lF6nIHvwfUNPOp7c8zoXwtLw= -golang.org/x/oauth2 v0.0.0-20190604053449-0f29369cfe45/go.mod h1:gOpvHmFTYa4IltrdGE7lF6nIHvwfUNPOp7c8zoXwtLw= -golang.org/x/oauth2 v0.0.0-20191202225959-858c2ad4c8b6/go.mod h1:gOpvHmFTYa4IltrdGE7lF6nIHvwfUNPOp7c8zoXwtLw= -golang.org/x/oauth2 v0.0.0-20200107190931-bf48bf16ab8d/go.mod h1:gOpvHmFTYa4IltrdGE7lF6nIHvwfUNPOp7c8zoXwtLw= -golang.org/x/oauth2 v0.0.0-20210514164344-f6687ab2804c/go.mod h1:KelEdhl1UZF7XfJ4dDtk6s++YSgaE7mD/BuKKDLBl4A= -golang.org/x/sync v0.0.0-20180314180146-1d60e4601c6f/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM= -golang.org/x/sync v0.0.0-20181108010431-42b317875d0f/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM= -golang.org/x/sync v0.0.0-20181221193216-37e7f081c4d4/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM= -golang.org/x/sync v0.0.0-20190227155943-e225da77a7e6/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM= golang.org/x/sync v0.0.0-20190423024810-112230192c58/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM= golang.org/x/sync v0.0.0-20190911185100-cd5d95a43a6e/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM= -golang.org/x/sync v0.0.0-20200317015054-43a5402ce75a/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM= golang.org/x/sync v0.0.0-20200625203802-6e8e738ad208/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM= golang.org/x/sync v0.0.0-20201020160332-67f06af15bc9/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM= -golang.org/x/sync v0.0.0-20201207232520-09787c993a3a/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM= golang.org/x/sync v0.0.0-20210220032951-036812b2e83c/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM= golang.org/x/sync v0.0.0-20220722155255-886fb9371eb4/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM= golang.org/x/sync v0.1.0/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM= @@ -777,48 +574,18 @@ golang.org/x/sync v0.3.0/go.mod h1:FU7BRWz2tNW+3quACPkgCx/L+uEAv1htQ0V83Z9Rj+Y= golang.org/x/sync v0.4.0/go.mod h1:FU7BRWz2tNW+3quACPkgCx/L+uEAv1htQ0V83Z9Rj+Y= golang.org/x/sync v0.19.0 h1:vV+1eWNmZ5geRlYjzm2adRgW2/mcpevXNg50YZtPCE4= golang.org/x/sync v0.19.0/go.mod h1:9KTHXmSnoGruLpwFjVSX0lNNA75CykiMECbovNTZqGI= -golang.org/x/sys v0.0.0-20180830151530-49385e6e1522/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY= -golang.org/x/sys v0.0.0-20180905080454-ebe1bf3edb33/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY= -golang.org/x/sys v0.0.0-20181116152217-5ac8a444bdc5/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY= golang.org/x/sys v0.0.0-20190215142949-d0b11bdaac8a/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY= -golang.org/x/sys v0.0.0-20190312061237-fead79001313/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= golang.org/x/sys v0.0.0-20190412213103-97732733099d/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= -golang.org/x/sys v0.0.0-20190422165155-953cdadca894/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= -golang.org/x/sys v0.0.0-20190502145724-3ef323f4f1fd/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= -golang.org/x/sys v0.0.0-20190507160741-ecd444e8653b/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= -golang.org/x/sys v0.0.0-20190606165138-5da285871e9c/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= -golang.org/x/sys v0.0.0-20190624142023-c5567b49c5d0/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= -golang.org/x/sys v0.0.0-20190726091711-fc99dfbffb4e/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= -golang.org/x/sys v0.0.0-20191001151750-bb3f8db39f24/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= -golang.org/x/sys v0.0.0-20191204072324-ce4227a45e2e/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= -golang.org/x/sys v0.0.0-20191228213918-04cbcbbfeed8/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= -golang.org/x/sys v0.0.0-20200106162015-b016eb3dc98e/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= -golang.org/x/sys v0.0.0-20200113162924-86b910548bc1/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= -golang.org/x/sys v0.0.0-20200122134326-e047566fdf82/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= -golang.org/x/sys v0.0.0-20200202164722-d101bd2416d5/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= -golang.org/x/sys v0.0.0-20200212091648-12a6c2dcc1e4/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= -golang.org/x/sys v0.0.0-20200223170610-d5e6a3e2c0ae/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= -golang.org/x/sys v0.0.0-20200302150141-5c8b2ff67527/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= golang.org/x/sys v0.0.0-20200323222414-85ca7c5b95cd/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= -golang.org/x/sys v0.0.0-20200331124033-c3d80250170d/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= -golang.org/x/sys v0.0.0-20200501052902-10377860bb8e/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= -golang.org/x/sys v0.0.0-20200511232937-7e40ca221e25/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= -golang.org/x/sys v0.0.0-20200515095857-1151b9dac4a9/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= -golang.org/x/sys v0.0.0-20200523222454-059865788121/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= -golang.org/x/sys v0.0.0-20200615200032-f1bc736245b1/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= -golang.org/x/sys v0.0.0-20200625212154-ddb9806d33ae/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= -golang.org/x/sys v0.0.0-20200803210538-64077c9b5642/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= golang.org/x/sys v0.0.0-20200930185726-fdedc70b468f/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= golang.org/x/sys v0.0.0-20201119102817-f84b799fce68/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= golang.org/x/sys v0.0.0-20210124154548-22da62e12c0c/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= golang.org/x/sys v0.0.0-20210330210617-4fbd30eecc44/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= golang.org/x/sys v0.0.0-20210423082822-04245dca01da/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= golang.org/x/sys v0.0.0-20210510120138-977fb7262007/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= -golang.org/x/sys v0.0.0-20210603081109-ebe580a85c40/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= golang.org/x/sys v0.0.0-20210615035016-665e8c7367d1/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= golang.org/x/sys v0.0.0-20211019181941-9d821ace8654/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= golang.org/x/sys v0.0.0-20211105183446-c75c47738b0c/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= -golang.org/x/sys v0.0.0-20220114195835-da31bd327af9/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= golang.org/x/sys v0.0.0-20220412211240-33da011f77ad/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= golang.org/x/sys v0.0.0-20220520151302-bc2c85ada10a/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= golang.org/x/sys v0.0.0-20220715151400-c0bba94af5f8/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= @@ -841,9 +608,7 @@ golang.org/x/term v0.6.0/go.mod h1:m6U89DPEgQRMq3DNkDClhWw02AUbt2daBVO4cn4Hv9U= golang.org/x/term v0.8.0/go.mod h1:xPskH00ivmX89bAKVGSKKtLOWNx2+17Eiy94tnKShWo= golang.org/x/term v0.12.0/go.mod h1:owVbMEjm3cBLCHdkQu9b1opXd4ETQWc3BhuQGKgXgvU= golang.org/x/term v0.13.0/go.mod h1:LTmsnFJwVN6bCy1rVCoS+qHT1HhALEFxKncY3WNNh4U= -golang.org/x/text v0.0.0-20170915032832-14c0d48ead0c/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ= golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ= -golang.org/x/text v0.3.1-0.20180807135948-17ff2d5776d2/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ= golang.org/x/text v0.3.2/go.mod h1:bEr9sfX3Q8Zfm5fL9x+3itogRgK3+ptLWKqgva+5dAk= golang.org/x/text v0.3.3/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ= golang.org/x/text v0.3.6/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ= @@ -855,53 +620,12 @@ golang.org/x/text v0.9.0/go.mod h1:e1OnstbJyHTd6l/uOt8jFFHp6TRDWZR/bV3emEE/zU8= golang.org/x/text v0.13.0/go.mod h1:TvPlkZtksWOMsz7fbANvkp4WM8x/WCo/om8BMLbz+aE= golang.org/x/text v0.34.0 h1:oL/Qq0Kdaqxa1KbNeMKwQq0reLCCaFtqu2eNuSeNHbk= golang.org/x/text v0.34.0/go.mod h1:homfLqTYRFyVYemLBFl5GgL/DWEiH5wcsQ5gSh1yziA= -golang.org/x/time v0.0.0-20181108054448-85acf8d2951c/go.mod h1:tRJNPiyCQ0inRvYxbN9jk5I+vvW/OXSQhTDSoE431IQ= -golang.org/x/time v0.0.0-20190308202827-9d24e82272b4/go.mod h1:tRJNPiyCQ0inRvYxbN9jk5I+vvW/OXSQhTDSoE431IQ= -golang.org/x/time v0.0.0-20191024005414-555d28b269f0/go.mod h1:tRJNPiyCQ0inRvYxbN9jk5I+vvW/OXSQhTDSoE431IQ= golang.org/x/tools v0.0.0-20180917221912-90fa682c2a6e/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ= -golang.org/x/tools v0.0.0-20190114222345-bf090417da8b/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ= -golang.org/x/tools v0.0.0-20190226205152-f727befe758c/go.mod h1:9Yl7xja0Znq3iFh3HoIrodX9oNMXvdceNzlUR8zjMvY= -golang.org/x/tools v0.0.0-20190311212946-11955173bddd/go.mod h1:LCzVGOaR6xXOjkQ3onu1FJEFr0SW1gC7cKk1uF8kGRs= -golang.org/x/tools v0.0.0-20190312151545-0bb0c0a6e846/go.mod h1:LCzVGOaR6xXOjkQ3onu1FJEFr0SW1gC7cKk1uF8kGRs= -golang.org/x/tools v0.0.0-20190312170243-e65039ee4138/go.mod h1:LCzVGOaR6xXOjkQ3onu1FJEFr0SW1gC7cKk1uF8kGRs= -golang.org/x/tools v0.0.0-20190425150028-36563e24a262/go.mod h1:RgjU9mgBXZiqYHBnxXauZ1Gv1EHHAz9KjViQ78xBX0Q= -golang.org/x/tools v0.0.0-20190506145303-2d16b83fe98c/go.mod h1:RgjU9mgBXZiqYHBnxXauZ1Gv1EHHAz9KjViQ78xBX0Q= -golang.org/x/tools v0.0.0-20190524140312-2c0ae7006135/go.mod h1:RgjU9mgBXZiqYHBnxXauZ1Gv1EHHAz9KjViQ78xBX0Q= -golang.org/x/tools v0.0.0-20190606124116-d0a3d012864b/go.mod h1:/rFqwRUd4F7ZHNgwSSTFct+R/Kf4OFW1sUzUTQQTgfc= -golang.org/x/tools v0.0.0-20190621195816-6e04913cbbac/go.mod h1:/rFqwRUd4F7ZHNgwSSTFct+R/Kf4OFW1sUzUTQQTgfc= -golang.org/x/tools v0.0.0-20190628153133-6cdbf07be9d0/go.mod h1:/rFqwRUd4F7ZHNgwSSTFct+R/Kf4OFW1sUzUTQQTgfc= -golang.org/x/tools v0.0.0-20190816200558-6889da9d5479/go.mod h1:b+2E5dAYhXwXZwtnZ6UAqBI28+e2cm9otk0dWdXHAEo= -golang.org/x/tools v0.0.0-20190911174233-4f2ddba30aff/go.mod h1:b+2E5dAYhXwXZwtnZ6UAqBI28+e2cm9otk0dWdXHAEo= -golang.org/x/tools v0.0.0-20191012152004-8de300cfc20a/go.mod h1:b+2E5dAYhXwXZwtnZ6UAqBI28+e2cm9otk0dWdXHAEo= -golang.org/x/tools v0.0.0-20191113191852-77e3bb0ad9e7/go.mod h1:b+2E5dAYhXwXZwtnZ6UAqBI28+e2cm9otk0dWdXHAEo= -golang.org/x/tools v0.0.0-20191115202509-3a792d9c32b2/go.mod h1:b+2E5dAYhXwXZwtnZ6UAqBI28+e2cm9otk0dWdXHAEo= golang.org/x/tools v0.0.0-20191119224855-298f0cb1881e/go.mod h1:b+2E5dAYhXwXZwtnZ6UAqBI28+e2cm9otk0dWdXHAEo= -golang.org/x/tools v0.0.0-20191125144606-a911d9008d1f/go.mod h1:b+2E5dAYhXwXZwtnZ6UAqBI28+e2cm9otk0dWdXHAEo= -golang.org/x/tools v0.0.0-20191130070609-6e064ea0cf2d/go.mod h1:b+2E5dAYhXwXZwtnZ6UAqBI28+e2cm9otk0dWdXHAEo= -golang.org/x/tools v0.0.0-20191216173652-a0e659d51361/go.mod h1:TB2adYChydJhpapKDTa4BR/hXlZSLoq2Wpct/0txZ28= -golang.org/x/tools v0.0.0-20191227053925-7b8e75db28f4/go.mod h1:TB2adYChydJhpapKDTa4BR/hXlZSLoq2Wpct/0txZ28= -golang.org/x/tools v0.0.0-20200117161641-43d50277825c/go.mod h1:TB2adYChydJhpapKDTa4BR/hXlZSLoq2Wpct/0txZ28= -golang.org/x/tools v0.0.0-20200122220014-bf1340f18c4a/go.mod h1:TB2adYChydJhpapKDTa4BR/hXlZSLoq2Wpct/0txZ28= -golang.org/x/tools v0.0.0-20200130002326-2f3ba24bd6e7/go.mod h1:TB2adYChydJhpapKDTa4BR/hXlZSLoq2Wpct/0txZ28= -golang.org/x/tools v0.0.0-20200204074204-1cc6d1ef6c74/go.mod h1:TB2adYChydJhpapKDTa4BR/hXlZSLoq2Wpct/0txZ28= -golang.org/x/tools v0.0.0-20200207183749-b753a1ba74fa/go.mod h1:TB2adYChydJhpapKDTa4BR/hXlZSLoq2Wpct/0txZ28= -golang.org/x/tools v0.0.0-20200212150539-ea181f53ac56/go.mod h1:TB2adYChydJhpapKDTa4BR/hXlZSLoq2Wpct/0txZ28= -golang.org/x/tools v0.0.0-20200224181240-023911ca70b2/go.mod h1:TB2adYChydJhpapKDTa4BR/hXlZSLoq2Wpct/0txZ28= -golang.org/x/tools v0.0.0-20200227222343-706bc42d1f0d/go.mod h1:TB2adYChydJhpapKDTa4BR/hXlZSLoq2Wpct/0txZ28= -golang.org/x/tools v0.0.0-20200304193943-95d2e580d8eb/go.mod h1:o4KQGtdN14AW+yjsvvwRTJJuXz8XRtIHtEnmAXLyFUw= -golang.org/x/tools v0.0.0-20200312045724-11d5b4c81c7d/go.mod h1:o4KQGtdN14AW+yjsvvwRTJJuXz8XRtIHtEnmAXLyFUw= golang.org/x/tools v0.0.0-20200324003944-a576cf524670/go.mod h1:Sl4aGygMT6LrqrWclx+PTx3U+LnKx/seiNR+3G19Ar8= golang.org/x/tools v0.0.0-20200329025819-fd4102a86c65/go.mod h1:Sl4aGygMT6LrqrWclx+PTx3U+LnKx/seiNR+3G19Ar8= -golang.org/x/tools v0.0.0-20200331025713-a30bf2db82d4/go.mod h1:Sl4aGygMT6LrqrWclx+PTx3U+LnKx/seiNR+3G19Ar8= -golang.org/x/tools v0.0.0-20200501065659-ab2804fb9c9d/go.mod h1:EkVYQZoAsY45+roYkvgYkIh4xh/qjgUK9TdY2XT94GE= -golang.org/x/tools v0.0.0-20200512131952-2bc93b1c0c88/go.mod h1:EkVYQZoAsY45+roYkvgYkIh4xh/qjgUK9TdY2XT94GE= -golang.org/x/tools v0.0.0-20200515010526-7d3b6ebf133d/go.mod h1:EkVYQZoAsY45+roYkvgYkIh4xh/qjgUK9TdY2XT94GE= -golang.org/x/tools v0.0.0-20200618134242-20370b0cb4b2/go.mod h1:EkVYQZoAsY45+roYkvgYkIh4xh/qjgUK9TdY2XT94GE= golang.org/x/tools v0.0.0-20200724022722-7017fd6b1305/go.mod h1:njjCfa9FT2d7l9Bc6FUM5FLjQPp3cFF28FI3qnDFljA= -golang.org/x/tools v0.0.0-20200729194436-6467de6f59a7/go.mod h1:njjCfa9FT2d7l9Bc6FUM5FLjQPp3cFF28FI3qnDFljA= -golang.org/x/tools v0.0.0-20200804011535-6c149bb5ef0d/go.mod h1:njjCfa9FT2d7l9Bc6FUM5FLjQPp3cFF28FI3qnDFljA= golang.org/x/tools v0.0.0-20200820010801-b793a1359eac/go.mod h1:njjCfa9FT2d7l9Bc6FUM5FLjQPp3cFF28FI3qnDFljA= -golang.org/x/tools v0.0.0-20200825202427-b303f430e36d/go.mod h1:njjCfa9FT2d7l9Bc6FUM5FLjQPp3cFF28FI3qnDFljA= golang.org/x/tools v0.0.0-20201023174141-c8cfbd0f21e6/go.mod h1:emZCQorbCU4vsT4fOWvOPXz4eW1wZW4PmDk9uLelYpA= golang.org/x/tools v0.1.1-0.20210205202024-ef80cdb6ec6d/go.mod h1:9bzcO0MWcOuT0tm1iBGzDVPshzfwoVvREIui8C+MHqU= golang.org/x/tools v0.1.1-0.20210302220138-2ac05c832e1a/go.mod h1:9bzcO0MWcOuT0tm1iBGzDVPshzfwoVvREIui8C+MHqU= @@ -926,115 +650,22 @@ golang.org/x/xerrors v0.0.0-20190717185122-a985d3407aa7/go.mod h1:I/5z698sn9Ka8T golang.org/x/xerrors v0.0.0-20191011141410-1b5146add898/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0= golang.org/x/xerrors v0.0.0-20191204190536-9bdfabe68543/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0= golang.org/x/xerrors v0.0.0-20200804184101-5ec99f83aff1/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0= -google.golang.org/api v0.4.0/go.mod h1:8k5glujaEP+g9n7WNsDg8QP6cUVNI86fCNMcbazEtwE= -google.golang.org/api v0.7.0/go.mod h1:WtwebWUNSVBH/HAw79HIFXZNqEvBhG+Ra+ax0hx3E3M= -google.golang.org/api v0.8.0/go.mod h1:o4eAsZoiT+ibD93RtjEohWalFOjRDx6CVaqeizhEnKg= -google.golang.org/api v0.9.0/go.mod h1:o4eAsZoiT+ibD93RtjEohWalFOjRDx6CVaqeizhEnKg= -google.golang.org/api v0.13.0/go.mod h1:iLdEw5Ide6rF15KTC1Kkl0iskquN2gFfn9o9XIsbkAI= -google.golang.org/api v0.14.0/go.mod h1:iLdEw5Ide6rF15KTC1Kkl0iskquN2gFfn9o9XIsbkAI= -google.golang.org/api v0.15.0/go.mod h1:iLdEw5Ide6rF15KTC1Kkl0iskquN2gFfn9o9XIsbkAI= -google.golang.org/api v0.17.0/go.mod h1:BwFmGc8tA3vsd7r/7kR8DY7iEEGSU04BFxCo5jP/sfE= -google.golang.org/api v0.18.0/go.mod h1:BwFmGc8tA3vsd7r/7kR8DY7iEEGSU04BFxCo5jP/sfE= -google.golang.org/api v0.19.0/go.mod h1:BwFmGc8tA3vsd7r/7kR8DY7iEEGSU04BFxCo5jP/sfE= -google.golang.org/api v0.20.0/go.mod h1:BwFmGc8tA3vsd7r/7kR8DY7iEEGSU04BFxCo5jP/sfE= -google.golang.org/api v0.22.0/go.mod h1:BwFmGc8tA3vsd7r/7kR8DY7iEEGSU04BFxCo5jP/sfE= -google.golang.org/api v0.24.0/go.mod h1:lIXQywCXRcnZPGlsd8NbLnOjtAoL6em04bJ9+z0MncE= -google.golang.org/api v0.28.0/go.mod h1:lIXQywCXRcnZPGlsd8NbLnOjtAoL6em04bJ9+z0MncE= -google.golang.org/api v0.29.0/go.mod h1:Lcubydp8VUV7KeIHD9z2Bys/sm/vGKnG1UHuDBSrHWM= -google.golang.org/api v0.30.0/go.mod h1:QGmEvQ87FHZNiUVJkT14jQNYJ4ZJjdRF23ZXz5138Fc= -google.golang.org/appengine v1.1.0/go.mod h1:EbEs0AVv82hx2wNQdGPgUI5lhzA/G0D9YwlJXL52JkM= -google.golang.org/appengine v1.4.0/go.mod h1:xpcJRLb0r/rnEns0DIKYYv+WjYCduHsrkT7/EB5XEv4= -google.golang.org/appengine v1.5.0/go.mod h1:xpcJRLb0r/rnEns0DIKYYv+WjYCduHsrkT7/EB5XEv4= -google.golang.org/appengine v1.6.1/go.mod h1:i06prIuMbXzDqacNJfV5OdTW448YApPu5ww/cMBSeb0= -google.golang.org/appengine v1.6.5/go.mod h1:8WjMMxjGQR8xUklV/ARdw2HLXBOI7O7uCIDZVag1xfc= -google.golang.org/appengine v1.6.6/go.mod h1:8WjMMxjGQR8xUklV/ARdw2HLXBOI7O7uCIDZVag1xfc= -google.golang.org/genproto v0.0.0-20180817151627-c66870c02cf8/go.mod h1:JiN7NxoALGmiZfu7CAH4rXhgtRTLTxftemlI0sWmxmc= -google.golang.org/genproto v0.0.0-20190307195333-5fe7a883aa19/go.mod h1:VzzqZJRnGkLBvHegQrXjBqPurQTc5/KpmUdxsrq26oE= -google.golang.org/genproto v0.0.0-20190418145605-e7d98fc518a7/go.mod h1:VzzqZJRnGkLBvHegQrXjBqPurQTc5/KpmUdxsrq26oE= -google.golang.org/genproto v0.0.0-20190425155659-357c62f0e4bb/go.mod h1:VzzqZJRnGkLBvHegQrXjBqPurQTc5/KpmUdxsrq26oE= -google.golang.org/genproto v0.0.0-20190502173448-54afdca5d873/go.mod h1:VzzqZJRnGkLBvHegQrXjBqPurQTc5/KpmUdxsrq26oE= -google.golang.org/genproto v0.0.0-20190801165951-fa694d86fc64/go.mod h1:DMBHOl98Agz4BDEuKkezgsaosCRResVns1a3J2ZsMNc= -google.golang.org/genproto v0.0.0-20190819201941-24fa4b261c55/go.mod h1:DMBHOl98Agz4BDEuKkezgsaosCRResVns1a3J2ZsMNc= -google.golang.org/genproto v0.0.0-20190911173649-1774047e7e51/go.mod h1:IbNlFCBrqXvoKpeg0TB2l7cyZUmoaFKYIwrEpbDKLA8= -google.golang.org/genproto v0.0.0-20191108220845-16a3f7862a1a/go.mod h1:n3cpQtvxv34hfy77yVDNjmbRyujviMdxYliBSkLhpCc= -google.golang.org/genproto v0.0.0-20191115194625-c23dd37a84c9/go.mod h1:n3cpQtvxv34hfy77yVDNjmbRyujviMdxYliBSkLhpCc= -google.golang.org/genproto v0.0.0-20191216164720-4f79533eabd1/go.mod h1:n3cpQtvxv34hfy77yVDNjmbRyujviMdxYliBSkLhpCc= -google.golang.org/genproto v0.0.0-20191230161307-f3c370f40bfb/go.mod h1:n3cpQtvxv34hfy77yVDNjmbRyujviMdxYliBSkLhpCc= -google.golang.org/genproto v0.0.0-20200115191322-ca5a22157cba/go.mod h1:n3cpQtvxv34hfy77yVDNjmbRyujviMdxYliBSkLhpCc= -google.golang.org/genproto v0.0.0-20200122232147-0452cf42e150/go.mod h1:n3cpQtvxv34hfy77yVDNjmbRyujviMdxYliBSkLhpCc= -google.golang.org/genproto v0.0.0-20200204135345-fa8e72b47b90/go.mod h1:GmwEX6Z4W5gMy59cAlVYjN9JhxgbQH6Gn+gFDQe2lzA= -google.golang.org/genproto v0.0.0-20200212174721-66ed5ce911ce/go.mod h1:55QSHmfGQM9UVYDPBsyGGes0y52j32PQ3BqQfXhyH3c= -google.golang.org/genproto v0.0.0-20200224152610-e50cd9704f63/go.mod h1:55QSHmfGQM9UVYDPBsyGGes0y52j32PQ3BqQfXhyH3c= -google.golang.org/genproto v0.0.0-20200228133532-8c2c7df3a383/go.mod h1:55QSHmfGQM9UVYDPBsyGGes0y52j32PQ3BqQfXhyH3c= -google.golang.org/genproto v0.0.0-20200305110556-506484158171/go.mod h1:55QSHmfGQM9UVYDPBsyGGes0y52j32PQ3BqQfXhyH3c= -google.golang.org/genproto v0.0.0-20200312145019-da6875a35672/go.mod h1:55QSHmfGQM9UVYDPBsyGGes0y52j32PQ3BqQfXhyH3c= -google.golang.org/genproto v0.0.0-20200331122359-1ee6d9798940/go.mod h1:55QSHmfGQM9UVYDPBsyGGes0y52j32PQ3BqQfXhyH3c= -google.golang.org/genproto v0.0.0-20200430143042-b979b6f78d84/go.mod h1:55QSHmfGQM9UVYDPBsyGGes0y52j32PQ3BqQfXhyH3c= -google.golang.org/genproto v0.0.0-20200511104702-f5ebc3bea380/go.mod h1:55QSHmfGQM9UVYDPBsyGGes0y52j32PQ3BqQfXhyH3c= -google.golang.org/genproto v0.0.0-20200515170657-fc4c6c6a6587/go.mod h1:YsZOwe1myG/8QRHRsmBRE1LrgQY60beZKjly0O1fX9U= -google.golang.org/genproto v0.0.0-20200526211855-cb27e3aa2013/go.mod h1:NbSheEEYHJ7i3ixzK3sjbqSGDJWnxyFXZblF3eUsNvo= -google.golang.org/genproto v0.0.0-20200618031413-b414f8b61790/go.mod h1:jDfRM7FcilCzHH/e9qn6dsT145K34l5v+OpcnNgKAAA= -google.golang.org/genproto v0.0.0-20200729003335-053ba62fc06f/go.mod h1:FWY/as6DDZQgahTzZj3fqbO1CbirC29ZNUFHwi0/+no= -google.golang.org/genproto v0.0.0-20200804131852-c06518451d9c/go.mod h1:FWY/as6DDZQgahTzZj3fqbO1CbirC29ZNUFHwi0/+no= -google.golang.org/genproto v0.0.0-20200825200019-8632dd797987/go.mod h1:FWY/as6DDZQgahTzZj3fqbO1CbirC29ZNUFHwi0/+no= -google.golang.org/grpc v1.19.0/go.mod h1:mqu4LbDTu4XGKhr4mRzUsmM4RtVoemTSY81AxZiDr8c= -google.golang.org/grpc v1.20.1/go.mod h1:10oTOabMzJvdu6/UiuZezV6QK5dSlG84ov/aaiqXj38= -google.golang.org/grpc v1.21.1/go.mod h1:oYelfM1adQP15Ek0mdvEgi9Df8B9CZIaU1084ijfRaM= -google.golang.org/grpc v1.23.0/go.mod h1:Y5yQAOtifL1yxbo5wqy6BxZv8vAUGQwXBOALyacEbxg= -google.golang.org/grpc v1.25.1/go.mod h1:c3i+UQWmh7LiEpx4sFZnkU36qjEYZ0imhYfXVyQciAY= -google.golang.org/grpc v1.26.0/go.mod h1:qbnxyOmOxrQa7FizSgH+ReBfzJrCY1pSN7KXBS8abTk= -google.golang.org/grpc v1.27.0/go.mod h1:qbnxyOmOxrQa7FizSgH+ReBfzJrCY1pSN7KXBS8abTk= -google.golang.org/grpc v1.27.1/go.mod h1:qbnxyOmOxrQa7FizSgH+ReBfzJrCY1pSN7KXBS8abTk= -google.golang.org/grpc v1.28.0/go.mod h1:rpkK4SK4GF4Ach/+MFLZUBavHOvF2JJB5uozKKal+60= -google.golang.org/grpc v1.29.1/go.mod h1:itym6AZVZYACWQqET3MqgPpjcuV5QH3BxFS3IjizoKk= -google.golang.org/grpc v1.30.0/go.mod h1:N36X2cJ7JwdamYAgDz+s+rVMFjt3numwzf/HckM8pak= -google.golang.org/grpc v1.31.0/go.mod h1:N36X2cJ7JwdamYAgDz+s+rVMFjt3numwzf/HckM8pak= -google.golang.org/protobuf v0.0.0-20200109180630-ec00e32a8dfd/go.mod h1:DFci5gLYBciE7Vtevhsrf46CRTquxDuWsQurQQe4oz8= -google.golang.org/protobuf v0.0.0-20200221191635-4d8936d0db64/go.mod h1:kwYJMbMJ01Woi6D6+Kah6886xMZcty6N08ah7+eCXa0= -google.golang.org/protobuf v0.0.0-20200228230310-ab0ca4ff8a60/go.mod h1:cfTl7dwQJ+fmap5saPgwCLgHXTUD7jkjRqWcaiX5VyM= -google.golang.org/protobuf v1.20.1-0.20200309200217-e05f789c0967/go.mod h1:A+miEFZTKqfCUM6K7xSMQL9OKL/b6hQv+e19PK+JZNE= -google.golang.org/protobuf v1.21.0/go.mod h1:47Nbq4nVaFHyn7ilMalzfO3qCViNmqZ2kzikPIcrTAo= -google.golang.org/protobuf v1.22.0/go.mod h1:EGpADcykh3NcUnDUJcl1+ZksZNG86OlYog2l/sGQquU= -google.golang.org/protobuf v1.23.0/go.mod h1:EGpADcykh3NcUnDUJcl1+ZksZNG86OlYog2l/sGQquU= -google.golang.org/protobuf v1.23.1-0.20200526195155-81db48ad09cc/go.mod h1:EGpADcykh3NcUnDUJcl1+ZksZNG86OlYog2l/sGQquU= -google.golang.org/protobuf v1.24.0/go.mod h1:r/3tXBNzIEhYS9I1OUVjXDlt8tc493IdKGjtUeSXeh4= -google.golang.org/protobuf v1.25.0/go.mod h1:9JNX74DMeImyA3h4bdi1ymwjUzf21/xIlbajtzgsN7c= -google.golang.org/protobuf v1.26.0-rc.1/go.mod h1:jlhhOSvTdKEhbULTjvd4ARK9grFBp09yW+WbY/TyQbw= -google.golang.org/protobuf v1.26.0/go.mod h1:9q0QmTI4eRPtz6boOQmLYwt+qCgq0jsYwAQnmE0givc= google.golang.org/protobuf v1.36.11 h1:fV6ZwhNocDyBLK0dj+fg8ektcVegBBuEolpbTQyBNVE= google.golang.org/protobuf v1.36.11/go.mod h1:HTf+CrKn2C3g5S8VImy6tdcUvCska2kB7j23XfzDpco= -gopkg.in/alecthomas/kingpin.v2 v2.2.6/go.mod h1:FMv+mEhP44yOT+4EoQTLFTRgOQ1FBLkstjWtayDeSgw= gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0= -gopkg.in/check.v1 v1.0.0-20180628173108-788fd7840127/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0= -gopkg.in/check.v1 v1.0.0-20190902080502-41f04d3bba15/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0= gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c h1:Hei/4ADfdWqJk1ZMxUNpqntNwaWcugrBjAiHlqqRiVk= gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c/go.mod h1:JHkPIbrfpd72SG/EVd6muEfDQjcINNoR0C8j2r3qZ4Q= -gopkg.in/errgo.v2 v2.1.0/go.mod h1:hNsd1EY+bozCKY1Ytp96fpM3vjJbqLJn88ws8XvfDNI= gopkg.in/ini.v1 v1.67.0 h1:Dgnx+6+nfE+IfzjUEISNeydPJh9AXNNsWbGP9KzCsOA= gopkg.in/ini.v1 v1.67.0/go.mod h1:pNLf8WUiyNEtQjuu5G5vTm06TEv9tsIgeAvK8hOrP4k= -gopkg.in/yaml.v2 v2.2.1/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI= gopkg.in/yaml.v2 v2.2.2/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI= -gopkg.in/yaml.v2 v2.2.4/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI= -gopkg.in/yaml.v2 v2.2.5/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI= -gopkg.in/yaml.v2 v2.3.0/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI= gopkg.in/yaml.v2 v2.4.0 h1:D8xgwECY7CYvx+Y2n4sBz93Jn9JRvxdiyyo8CTfuKaY= gopkg.in/yaml.v2 v2.4.0/go.mod h1:RDklbk79AGWmwhnvt/jBztapEOGDOx6ZbXqjP6csGnQ= gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM= gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA= gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM= -honnef.co/go/tools v0.0.0-20190102054323-c2f93a96b099/go.mod h1:rf3lG4BRIbNafJWhAfAdb/ePZxsR/4RtNHQocxwk9r4= -honnef.co/go/tools v0.0.0-20190106161140-3f1c8253044a/go.mod h1:rf3lG4BRIbNafJWhAfAdb/ePZxsR/4RtNHQocxwk9r4= -honnef.co/go/tools v0.0.0-20190418001031-e561f6794a2a/go.mod h1:rf3lG4BRIbNafJWhAfAdb/ePZxsR/4RtNHQocxwk9r4= -honnef.co/go/tools v0.0.0-20190523083050-ea95bdfd59fc/go.mod h1:rf3lG4BRIbNafJWhAfAdb/ePZxsR/4RtNHQocxwk9r4= -honnef.co/go/tools v0.0.1-2019.2.3/go.mod h1:a3bituU0lyd329TUQxRnasdCoJDkEUEAqEt0JzvZhAg= -honnef.co/go/tools v0.0.1-2020.1.3/go.mod h1:X/FiERA/W4tHapMX5mGpAtMSVEeEUOyHaw9vFzvIQ3k= -honnef.co/go/tools v0.0.1-2020.1.4/go.mod h1:X/FiERA/W4tHapMX5mGpAtMSVEeEUOyHaw9vFzvIQ3k= honnef.co/go/tools v0.6.1 h1:R094WgE8K4JirYjBaOpz/AvTyUu/3wbmAoskKN/pxTI= honnef.co/go/tools v0.6.1/go.mod h1:3puzxxljPCe8RGJX7BIy1plGbxEOZni5mR2aXe3/uk4= mvdan.cc/gofumpt v0.8.0 h1:nZUCeC2ViFaerTcYKstMmfysj6uhQrA2vJe+2vwGU6k= mvdan.cc/gofumpt v0.8.0/go.mod h1:vEYnSzyGPmjvFkqJWtXkh79UwPWP9/HMxQdGEXZHjpg= mvdan.cc/unparam v0.0.0-20250301125049-0df0534333a4 h1:WjUu4yQoT5BHT1w8Zu56SP8367OuBV5jvo+4Ulppyf8= mvdan.cc/unparam v0.0.0-20250301125049-0df0534333a4/go.mod h1:rthT7OuvRbaGcd5ginj6dA2oLE7YNlta9qhBNNdCaLE= -rsc.io/binaryregexp v0.2.0/go.mod h1:qTv7/COck+e2FymRvadv62gMdZztPaShugOCi3I+8D8= -rsc.io/quote/v3 v3.1.0/go.mod h1:yEA65RcK8LyAZtP9Kv3t0HmxON59tX3rD+tICJqUlj0= -rsc.io/sampler v1.3.0/go.mod h1:T1hPZKmBbMNahiBKFy5HrXp6adAjACjK9JXDnKaTXpA= diff --git a/internal/componentstatus/componentstatus.go b/internal/componentstatus/componentstatus.go new file mode 100644 index 00000000..ffb79a03 --- /dev/null +++ b/internal/componentstatus/componentstatus.go @@ -0,0 +1,40 @@ +// SPDX-License-Identifier: Apache-2.0 + +// Package componentstatus is tracecore's in-tree analogue of OTel +// collector's `go.opentelemetry.io/collector/component/componentstatus` +// at v0.152. ReportStatus is a free function that delegates to the +// host only when the host opts in via the optional StatusReporter +// interface — the pre-M2 Host.ReportStatus method has been removed. +// +// See docs/STRATEGY.md "Host.ReportStatus" divergence row. +package componentstatus + +import "github.com/tracecoreai/tracecore/internal/pipeline" + +// StatusReporter is the optional interface a pipeline.Host may +// implement to record a Component's status events. Hosts that don't +// care about status events simply omit this method; ReportStatus +// degrades to a no-op. +// +// The method is named ReportComponentStatus (not ReportStatus) so a +// type that satisfies it can't be confused with an old-style host +// that exposed ReportStatus directly. Once the pre-M2 method is +// removed, this is the only way to surface status into a host. +type StatusReporter interface { + ReportComponentStatus(ev pipeline.StatusEvent) +} + +// ReportStatus delivers ev to host if host implements StatusReporter, +// otherwise silently discards it. The "silent discard" is deliberate: +// receivers should not have to know which host implementation they're +// running against, and a binary that wants status logging wires a +// StatusReporter into its host. +// +// Mirrors OTel's componentstatus.ReportStatus(host, ev) signature +// (with a tracecore-flavored StatusEvent that already lived in +// pipeline.StatusEvent pre-M2). +func ReportStatus(host pipeline.Host, ev pipeline.StatusEvent) { + if r, ok := host.(StatusReporter); ok { + r.ReportComponentStatus(ev) + } +} diff --git a/internal/componentstatus/componentstatus_test.go b/internal/componentstatus/componentstatus_test.go new file mode 100644 index 00000000..72594b45 --- /dev/null +++ b/internal/componentstatus/componentstatus_test.go @@ -0,0 +1,64 @@ +// SPDX-License-Identifier: Apache-2.0 + +package componentstatus_test + +import ( + "errors" + "testing" + + "github.com/stretchr/testify/require" + + "github.com/tracecoreai/tracecore/internal/componentstatus" + "github.com/tracecoreai/tracecore/internal/pipeline" +) + +// recordingHost implements both pipeline.Host and the optional +// componentstatus.StatusReporter interface so the free function can +// delegate to it. +type recordingHost struct { + events []pipeline.StatusEvent +} + +func (h *recordingHost) GetExtensions() map[pipeline.ID]pipeline.Component { + return map[pipeline.ID]pipeline.Component{} +} + +func (h *recordingHost) ReportComponentStatus(ev pipeline.StatusEvent) { + h.events = append(h.events, ev) +} + +// silentHost implements pipeline.Host but NOT StatusReporter — to +// pin the silent-discard contract. +type silentHost struct{} + +func (silentHost) GetExtensions() map[pipeline.ID]pipeline.Component { + return map[pipeline.ID]pipeline.Component{} +} + +// TestReportStatus_DelegatesToStatusReporter verifies that when a +// host implements StatusReporter, ReportStatus passes through. +func TestReportStatus_DelegatesToStatusReporter(t *testing.T) { + t.Parallel() + + h := &recordingHost{} + componentstatus.ReportStatus(h, pipeline.StatusEvent{Kind: "starting"}) + componentstatus.ReportStatus(h, pipeline.StatusEvent{Kind: "fault", Err: errors.New("boom")}) + + require.Len(t, h.events, 2) + require.Equal(t, "starting", h.events[0].Kind) + require.Equal(t, "fault", h.events[1].Kind) + require.EqualError(t, h.events[1].Err, "boom") +} + +// TestReportStatus_SilentDiscardWhenHostHasNoReporter pins the +// "no opt-in == no error" contract: a host that doesn't implement +// StatusReporter sees the call discarded without panicking. This +// matches OTel's behavior and lets receivers call ReportStatus +// freely without knowing the host shape. +func TestReportStatus_SilentDiscardWhenHostHasNoReporter(t *testing.T) { + t.Parallel() + + require.NotPanics(t, func() { + componentstatus.ReportStatus(silentHost{}, pipeline.StatusEvent{Kind: "ignored"}) + }) +} diff --git a/internal/config/config.go b/internal/config/config.go index dac7f21d..a5f03ec5 100644 --- a/internal/config/config.go +++ b/internal/config/config.go @@ -4,6 +4,7 @@ package config import ( "fmt" + "net" "strings" "gopkg.in/yaml.v3" @@ -11,6 +12,11 @@ import ( "github.com/tracecoreai/tracecore/internal/pipeline" ) +// netSplitHostPort is a small alias so the validator imports net at +// the package level rather than letting the bare call drift into +// other helpers. +var netSplitHostPort = net.SplitHostPort + // Config is the in-memory shape of the collector YAML config after // structural parsing. Per-component config bodies are kept as opaque // yaml.Nodes; the matching factory decodes them at runtime. @@ -19,6 +25,126 @@ type Config struct { Processors map[string]yaml.Node `yaml:"processors,omitempty"` Exporters map[string]yaml.Node `yaml:"exporters,omitempty"` Service Service `yaml:"service,omitempty"` + Telemetry Telemetry `yaml:"telemetry,omitempty"` +} + +// Telemetry is the operator-facing block for tracecore's self- +// telemetry surface. Default is OFF — operators opt in by setting +// `telemetry.enabled: true`. When disabled, other fields are not +// validated so operators can keep a full block commented as a +// production template. +type Telemetry struct { + // Enabled gates the surface. Default false. + Enabled bool `yaml:"enabled"` + + // Listen is the bind address for the HTTP server (host:port). + // Default "localhost:8888" — localhost-only is the safer + // default; operators in multi-node setups override to ":8888". + Listen string `yaml:"listen,omitempty"` + + // Paths controls the routes mounted on the single listener. + // Each field defaults to the conventional name. + Paths TelemetryPaths `yaml:"paths,omitempty"` +} + +// TelemetryPaths captures the three route knobs operators may need +// to override (e.g., service-mesh probe-path conflicts). +type TelemetryPaths struct { + // Metrics defaults to "/metrics". + Metrics string `yaml:"metrics,omitempty"` + // Healthz defaults to "/healthz". + Healthz string `yaml:"healthz,omitempty"` + // Readyz defaults to "/readyz". + Readyz string `yaml:"readyz,omitempty"` +} + +// telemetryDefaults are applied when a field is the zero-value AND +// the block is enabled. Kept centralized so tests + the loader stay +// in sync on what "default" means. +const ( + defaultTelemetryListen = "localhost:8888" + defaultTelemetryMetricsPath = "/metrics" + defaultTelemetryHealthzPath = "/healthz" + defaultTelemetryReadyzPath = "/readyz" +) + +// applyDefaults fills in zero-value fields with the canonical +// defaults when the block is enabled. No-op when disabled (the +// zero-value fields stay zero so operators inspecting the parsed +// config can tell "default-on" from "explicit-set"). +func (t *Telemetry) applyDefaults() { + if !t.Enabled { + return + } + if t.Listen == "" { + t.Listen = defaultTelemetryListen + } + if t.Paths.Metrics == "" { + t.Paths.Metrics = defaultTelemetryMetricsPath + } + if t.Paths.Healthz == "" { + t.Paths.Healthz = defaultTelemetryHealthzPath + } + if t.Paths.Readyz == "" { + t.Paths.Readyz = defaultTelemetryReadyzPath + } +} + +// validate runs the operator-facing checks. Empty fields after +// applyDefaults are bugs in the loader (not operator errors); this +// catches the operator-error path: malformed listen, non-absolute +// paths. +func (t *Telemetry) validate() error { + if !t.Enabled { + return nil + } + if _, _, err := splitHostPort(t.Listen); err != nil { + return fmt.Errorf("telemetry.listen: %w", err) + } + if err := validatePath("metrics", t.Paths.Metrics); err != nil { + return err + } + if err := validatePath("healthz", t.Paths.Healthz); err != nil { + return err + } + return validatePath("readyz", t.Paths.Readyz) +} + +func validatePath(name, p string) error { + if err := ValidateMountPath(p); err != nil { + return fmt.Errorf("telemetry.paths.%s: %w", name, err) + } + return nil +} + +// ValidateMountPath enforces the shared mount-path rules used by both +// the YAML loader (`telemetry.paths.*`) and the HTTP server +// (`ServerConfig.Paths.*`). Stricter than a bare leading-slash check: +// rejects whitespace, control bytes, query strings, and fragments — +// inputs that would let http.ServeMux.Handle panic at registration +// time rather than surface a clean operator-facing error. +func ValidateMountPath(p string) error { + if p == "" || p[0] != '/' { + return fmt.Errorf("must be an absolute path starting with '/' (got %q)", p) + } + if strings.ContainsAny(p, " \t\n\r") { + return fmt.Errorf("must not contain whitespace (got %q)", p) + } + if strings.ContainsAny(p, "?#") { + return fmt.Errorf("must be a bare path with no query or fragment (got %q)", p) + } + return nil +} + +// splitHostPort wraps net.SplitHostPort so the caller doesn't have +// to import net just for the validation path. We keep the dep +// shallow because the surface is small. +func splitHostPort(addr string) (host, port string, err error) { + host, port, err = netSplitHostPort(addr) + if err != nil { + return "", "", fmt.Errorf("invalid host:port %q: %w", addr, err) + } + return host, port, nil } // Service holds the pipeline assembly block. diff --git a/internal/config/load.go b/internal/config/load.go index 2fb2ebe2..bdd6e4ab 100644 --- a/internal/config/load.go +++ b/internal/config/load.go @@ -82,6 +82,8 @@ func Load(path string) (*Config, error) { return nil, &LoadError{Path: path, Err: errors.New("multi-document YAML not supported; the loader reads only the first `---` block")} } + cfg.Telemetry.applyDefaults() + if err := cfg.validate(); err != nil { return nil, &LoadError{Path: path, Err: err} } @@ -89,9 +91,13 @@ func Load(path string) (*Config, error) { } // validate runs the cross-section checks the YAML decoder alone can't -// catch (pipeline-key format, component-reference resolution). The -// caller wraps with a *LoadError to attach the file path. +// catch (pipeline-key format, component-reference resolution, +// telemetry block sanity). The caller wraps with a *LoadError to +// attach the file path. func (c *Config) validate() error { + if err := c.Telemetry.validate(); err != nil { + return err + } for key, p := range c.Service.Pipelines { if _, _, err := ParsePipelineID(key); err != nil { return fmt.Errorf("service.pipelines.%s: %w", key, err) diff --git a/internal/config/telemetry_test.go b/internal/config/telemetry_test.go new file mode 100644 index 00000000..678db725 --- /dev/null +++ b/internal/config/telemetry_test.go @@ -0,0 +1,124 @@ +// SPDX-License-Identifier: Apache-2.0 + +package config_test + +import ( + "os" + "path/filepath" + "testing" + + "github.com/stretchr/testify/require" + + "github.com/tracecoreai/tracecore/internal/config" +) + +// writeTmp writes contents to a fresh temp file and returns the path. +func writeTmp(t *testing.T, contents string) string { + t.Helper() + p := filepath.Join(t.TempDir(), "config.yaml") + require.NoError(t, os.WriteFile(p, []byte(contents), 0o600)) + return p +} + +// TestTelemetry_DefaultsToDisabled pins the operator-safety contract: +// a config that omits the `telemetry:` block must leave the surface +// OFF. Self-telemetry is opt-in. +func TestTelemetry_DefaultsToDisabled(t *testing.T) { + t.Parallel() + const yaml = ` +receivers: {} +exporters: {} +service: + pipelines: {} +` + cfg, err := config.Load(writeTmp(t, yaml)) + require.NoError(t, err) + require.False(t, cfg.Telemetry.Enabled, "default must be off") +} + +// TestTelemetry_LoadsListenAndPaths exercises a full opt-in block. +func TestTelemetry_LoadsListenAndPaths(t *testing.T) { + t.Parallel() + const yaml = ` +telemetry: + enabled: true + listen: ":8888" + paths: + metrics: /m + healthz: /h + readyz: /r +` + cfg, err := config.Load(writeTmp(t, yaml)) + require.NoError(t, err) + require.True(t, cfg.Telemetry.Enabled) + require.Equal(t, ":8888", cfg.Telemetry.Listen) + require.Equal(t, "/m", cfg.Telemetry.Paths.Metrics) + require.Equal(t, "/h", cfg.Telemetry.Paths.Healthz) + require.Equal(t, "/r", cfg.Telemetry.Paths.Readyz) +} + +// TestTelemetry_AppliesDefaultsOnEmptyBlock pins the contract: an +// `enabled: true` block with no listen + paths gets sensible +// defaults so operators don't have to spell out the whole block. +func TestTelemetry_AppliesDefaultsOnEmptyBlock(t *testing.T) { + t.Parallel() + const yaml = ` +telemetry: + enabled: true +` + cfg, err := config.Load(writeTmp(t, yaml)) + require.NoError(t, err) + require.True(t, cfg.Telemetry.Enabled) + require.Equal(t, "localhost:8888", cfg.Telemetry.Listen, "default listen is localhost-only") + require.Equal(t, "/metrics", cfg.Telemetry.Paths.Metrics) + require.Equal(t, "/healthz", cfg.Telemetry.Paths.Healthz) + require.Equal(t, "/readyz", cfg.Telemetry.Paths.Readyz) +} + +// TestTelemetry_RejectsBadListen pins that a malformed listen address +// is caught at validate time rather than at HTTP-Start time, when the +// error would land in the runtime log instead of `validate` output. +func TestTelemetry_RejectsBadListen(t *testing.T) { + t.Parallel() + const yaml = ` +telemetry: + enabled: true + listen: "notavalidaddress" +` + _, err := config.Load(writeTmp(t, yaml)) + require.Error(t, err) + require.Contains(t, err.Error(), "telemetry.listen", "error points at the offending field") +} + +// TestTelemetry_RejectsNonAbsolutePath pins that paths without the +// leading "/" are an obvious operator typo. +func TestTelemetry_RejectsNonAbsolutePath(t *testing.T) { + t.Parallel() + const yaml = ` +telemetry: + enabled: true + paths: + metrics: metrics +` + _, err := config.Load(writeTmp(t, yaml)) + require.Error(t, err) + require.Contains(t, err.Error(), "telemetry.paths.metrics") +} + +// TestTelemetry_DisabledSkipsValidation pins the operator-UX policy: +// fields are only validated when telemetry.enabled is true. A disabled +// block with a malformed listen must still load (so operators can +// keep their full production block commented as a template). +func TestTelemetry_DisabledSkipsValidation(t *testing.T) { + t.Parallel() + const yaml = ` +telemetry: + enabled: false + listen: "" + paths: + metrics: nope +` + cfg, err := config.Load(writeTmp(t, yaml)) + require.NoError(t, err, "disabled block must not gate on listen/path validation") + require.False(t, cfg.Telemetry.Enabled) +} diff --git a/internal/pipeline/component.go b/internal/pipeline/component.go index 2479749c..f0cea71d 100644 --- a/internal/pipeline/component.go +++ b/internal/pipeline/component.go @@ -7,6 +7,7 @@ import ( "log/slog" "go.opentelemetry.io/collector/pdata/pcommon" + "go.opentelemetry.io/otel/metric" ) // Component is the runtime lifecycle contract every receiver, processor, @@ -26,13 +27,14 @@ type Component interface { } // Host is the runtime's downward-facing surface for a Component. The -// shape intentionally mirrors go.opentelemetry.io/collector/component.Host -// at v0.152.0 with two deferred fields: +// shape mirrors go.opentelemetry.io/collector/component.Host at +// v1.55.0 (collector v0.152.0): GetExtensions only. // -// - GetFactory: deferred to a future milestone; M1 components do not need -// to look up sibling factories at runtime. -// - StatusEvent shape: ReportStatus accepts an opaque event for now; the -// concrete type stabilises once M2 wires self-telemetry. +// Component status reporting moved to a free function +// `componentstatus.ReportStatus(host, ev)` in M2 — see +// `internal/componentstatus` and docs/STRATEGY.md "Host.ReportStatus" +// divergence row. Hosts that want to record status events implement +// the optional `componentstatus.StatusReporter` interface. // // pipelinetest.NewHost() returns a no-op implementation suitable for // component unit tests. @@ -41,17 +43,6 @@ type Host interface { // In M1 the runtime has no extensions; implementations return an // empty map. Never returns nil. GetExtensions() map[ID]Component - - // ReportStatus records a status event for the calling Component. - // M1 implementations may discard the event; the API exists so - // components can be written today without churning interfaces in M2. - // - // Deprecated: M2 will migrate to OTel v0.152's pattern where - // status reporting is a free function `componentstatus.ReportStatus(host, event)` - // rather than a Host method. See docs/STRATEGY.md "Host.ReportStatus" - // divergence row. Receivers should call this method today but expect - // the migration; the rename will be a one-line edit per call site. - ReportStatus(event StatusEvent) } // StatusEvent is an opaque status report from a Component to the Host. @@ -69,16 +60,25 @@ type StatusEvent struct { } // TelemetrySettings carries the per-Component observability handles the -// runtime injects at construction time. The shape is intentionally -// narrow at M1 — Logger + Resource — and will grow when self-telemetry -// (M2) and tracing land. +// runtime injects at construction time. The shape mirrors OTel +// component.TelemetrySettings at v1.55.0 minus TracerProvider (deferred +// to post-v1; see docs/STRATEGY.md). Logger stays slog (documented +// divergence from OTel's zap). type TelemetrySettings struct { // Logger is scoped to the Component (typically with attributes for // the Component's kind and instance name pre-attached). Logger *slog.Logger + // MeterProvider is the OTel metric.MeterProvider the Component + // uses to acquire a Meter and register instruments. The runtime + // substitutes a noop provider when self-telemetry is disabled so + // receiver code never has to nil-check. + MeterProvider metric.MeterProvider + // Resource describes the collector instance (host.name, service.name, // etc.) and is attached to any data the Component emits about itself. // Components should treat it as read-only. Resource pcommon.Resource + + _ struct{} } diff --git a/internal/pipeline/component_extension_test.go b/internal/pipeline/component_extension_test.go new file mode 100644 index 00000000..000d4f99 --- /dev/null +++ b/internal/pipeline/component_extension_test.go @@ -0,0 +1,81 @@ +// SPDX-License-Identifier: Apache-2.0 + +package pipeline_test + +import ( + "context" + "io" + "log/slog" + "testing" + + "github.com/stretchr/testify/require" + "go.opentelemetry.io/collector/pdata/pcommon" + "go.opentelemetry.io/otel/metric" + "go.opentelemetry.io/otel/metric/noop" + + "github.com/tracecoreai/tracecore/internal/pipeline" +) + +// TestCreateSettings_HasBuildInfo pins M2: CreateSettings carries a +// BuildInfo struct so factories can stamp version metadata onto +// emitted data without reaching into a package-level global. The +// three fields mirror OTel component.BuildInfo at v1.55.0 (Command, +// Description, Version); tracecore's revision + build date stay in +// internal/version, not BuildInfo, to keep STRATEGY's M2 row of +// "Add BuildInfo + unkeyed-init guard" minimal. +func TestCreateSettings_HasBuildInfo(t *testing.T) { + t.Parallel() + + settings := pipeline.CreateSettings{ + ID: pipeline.MustNewID(pipeline.MustNewType("test"), "x"), + Telemetry: pipeline.TelemetrySettings{}, + BuildInfo: pipeline.BuildInfo{ + Command: "tracecore", + Description: "tracecore OTel-compatible collector", + Version: "v0.1.0", + }, + } + + require.Equal(t, "tracecore", settings.BuildInfo.Command) + require.Equal(t, "tracecore OTel-compatible collector", settings.BuildInfo.Description) + require.Equal(t, "v0.1.0", settings.BuildInfo.Version) +} + +// TestTelemetrySettings_HasMeterProvider pins M2: TelemetrySettings +// carries a metric.MeterProvider so receiver authors acquire a Meter +// in the same place they already get Logger + Resource. Mirrors OTel +// component.TelemetrySettings at v1.55.0. +func TestTelemetrySettings_HasMeterProvider(t *testing.T) { + t.Parallel() + + mp := noop.NewMeterProvider() + tel := pipeline.TelemetrySettings{ + Logger: slog.New(slog.NewTextHandler(io.Discard, nil)), + Resource: pcommon.NewResource(), + MeterProvider: mp, + } + + require.NotNil(t, tel.MeterProvider, "MeterProvider must round-trip") + // Receivers acquire a Meter from the provider; assert that path works. + meter := tel.MeterProvider.Meter("tracecore.test") + require.NotNil(t, meter) + // And that the noop meter doesn't panic on a real instrument call. + ctr, err := meter.Int64Counter("test.counter") + require.NoError(t, err) + ctr.Add(context.Background(), 1, metric.WithAttributes()) +} + +// TestTelemetrySettings_NilMeterProvider_NotPanicOnReceiverDefault +// pins the safety invariant for receivers built against the zero-value +// TelemetrySettings: M2's wire-up MUST default MeterProvider to a +// noop so receivers don't have to nil-check. The zero-value struct's +// MeterProvider IS nil; the runtime is responsible for substituting +// a noop. This test documents the contract via a comment that +// receivers must trust: "telSet.MeterProvider is never nil when the +// runtime constructs the settings." +func TestTelemetrySettings_ZeroValue_MeterProviderIsNil(t *testing.T) { + t.Parallel() + + var tel pipeline.TelemetrySettings + require.Nil(t, tel.MeterProvider, "zero-value documents runtime's responsibility to substitute noop") +} diff --git a/internal/pipeline/factory.go b/internal/pipeline/factory.go index bc0affca..1beaa11f 100644 --- a/internal/pipeline/factory.go +++ b/internal/pipeline/factory.go @@ -30,9 +30,36 @@ type Config interface { // CreateSettings is the bundle a factory's create-X method receives at // construction time. +// +// The trailing unexported `_ struct{}` rejects positional struct +// literals at compile time, so adding fields stays non-breaking. +// Mirrors OTel component.Settings shape at v1.55.0 (see +// docs/STRATEGY.md M2 row for "CreateSettings shape"). type CreateSettings struct { ID ID Telemetry TelemetrySettings + BuildInfo BuildInfo + + _ struct{} +} + +// BuildInfo carries the binary identity factories may stamp onto +// emitted data. Mirrors go.opentelemetry.io/collector/component.BuildInfo +// at v1.55.0: three fields, plus the unkeyed-init guard. The richer +// version metadata tracecore carries (revision, build date) lives in +// `internal/version` and is not duplicated here. +type BuildInfo struct { + // Command is the executable file name, e.g. "tracecore". + Command string + + // Description is a human-readable name, e.g. "tracecore + // telemetry collector". + Description string + + // Version is the binary version string, e.g. "v0.1.0". + Version string + + _ struct{} } // ReceiverFactory creates Components that pull data into a pipeline. diff --git a/internal/pipeline/pipelinetest/fakes.go b/internal/pipeline/pipelinetest/fakes.go new file mode 100644 index 00000000..6c3d6866 --- /dev/null +++ b/internal/pipeline/pipelinetest/fakes.go @@ -0,0 +1,88 @@ +// SPDX-License-Identifier: Apache-2.0 + +package pipelinetest + +import ( + "bytes" + "context" + "sync" + "sync/atomic" + + "go.opentelemetry.io/collector/pdata/pmetric" + + "github.com/tracecoreai/tracecore/internal/consumer" +) + +// SyncBuffer is a concurrent-safe bytes.Buffer for integration tests +// where an exec.Cmd writer races a test goroutine reader. +type SyncBuffer struct { + mu sync.Mutex + buf bytes.Buffer +} + +// Write satisfies io.Writer. +func (b *SyncBuffer) Write(p []byte) (int, error) { + b.mu.Lock() + defer b.mu.Unlock() + return b.buf.Write(p) +} + +// String returns a snapshot of the buffer's current contents. +func (b *SyncBuffer) String() string { + b.mu.Lock() + defer b.mu.Unlock() + return b.buf.String() +} + +// RecordingMetricsSink captures pushed metrics for test assertion. +// Pushed buffers up to 128 payloads; further pushes drop silently +// so a slow test can't wedge the producer goroutine. +type RecordingMetricsSink struct { + Pushed chan pmetric.Metrics + count atomic.Int32 +} + +// NewRecordingMetricsSink returns a sink with a 128-entry buffer. +func NewRecordingMetricsSink() *RecordingMetricsSink { + return &RecordingMetricsSink{Pushed: make(chan pmetric.Metrics, 128)} +} + +// ConsumeMetrics records the payload and bumps the call counter. +func (s *RecordingMetricsSink) ConsumeMetrics(_ context.Context, md pmetric.Metrics) error { + s.count.Add(1) + select { + case s.Pushed <- md: + default: + } + return nil +} + +// Capabilities reports MutatesData=false. +func (*RecordingMetricsSink) Capabilities() consumer.Capabilities { + return consumer.Capabilities{} +} + +// Count returns the cumulative ConsumeMetrics call count. +func (s *RecordingMetricsSink) Count() int32 { return s.count.Load() } + +// FailingMetricsSink returns Err from every ConsumeMetrics call. +// Setting Err to nil reverts to silent-success — guard against that +// in tests that mean to exercise the failure path. +type FailingMetricsSink struct { + Err error + calls atomic.Int32 +} + +// ConsumeMetrics increments the call counter and returns s.Err. +func (s *FailingMetricsSink) ConsumeMetrics(_ context.Context, _ pmetric.Metrics) error { + s.calls.Add(1) + return s.Err +} + +// Capabilities reports MutatesData=false. +func (*FailingMetricsSink) Capabilities() consumer.Capabilities { + return consumer.Capabilities{MutatesData: false} +} + +// Calls returns the cumulative ConsumeMetrics call count. +func (s *FailingMetricsSink) Calls() int32 { return s.calls.Load() } diff --git a/internal/pipeline/pipelinetest/fakes_test.go b/internal/pipeline/pipelinetest/fakes_test.go new file mode 100644 index 00000000..d4601e84 --- /dev/null +++ b/internal/pipeline/pipelinetest/fakes_test.go @@ -0,0 +1,74 @@ +// SPDX-License-Identifier: Apache-2.0 + +package pipelinetest_test + +import ( + "context" + "errors" + "sync" + "testing" + + "github.com/stretchr/testify/require" + "go.opentelemetry.io/collector/pdata/pmetric" + + "github.com/tracecoreai/tracecore/internal/pipeline/pipelinetest" +) + +// TestSyncBuffer_ConcurrentWritesAreSafe pins the concurrent-safety +// contract: parallel writers don't race, every byte lands. +func TestSyncBuffer_ConcurrentWritesAreSafe(t *testing.T) { + t.Parallel() + + var b pipelinetest.SyncBuffer + const writers = 16 + const writesPerWriter = 100 + + var wg sync.WaitGroup + wg.Add(writers) + for range writers { + go func() { + defer wg.Done() + for range writesPerWriter { + _, _ = b.Write([]byte("x")) + } + }() + } + wg.Wait() + + require.Len(t, b.String(), writers*writesPerWriter) +} + +// TestRecordingMetricsSink_CountsAndCaptures pins the recording +// contract. +func TestRecordingMetricsSink_CountsAndCaptures(t *testing.T) { + t.Parallel() + + sink := pipelinetest.NewRecordingMetricsSink() + md := pmetric.NewMetrics() + + require.NoError(t, sink.ConsumeMetrics(context.Background(), md)) + require.NoError(t, sink.ConsumeMetrics(context.Background(), md)) + require.EqualValues(t, 2, sink.Count()) + require.False(t, sink.Capabilities().MutatesData) + + select { + case <-sink.Pushed: + default: + t.Fatal("expected a payload on Pushed channel") + } +} + +// TestFailingMetricsSink_ReturnsErrAndCounts pins the failing +// contract: every call increments and returns the configured error. +func TestFailingMetricsSink_ReturnsErrAndCounts(t *testing.T) { + t.Parallel() + + want := errors.New("boom") + sink := &pipelinetest.FailingMetricsSink{Err: want} + md := pmetric.NewMetrics() + + require.ErrorIs(t, sink.ConsumeMetrics(context.Background(), md), want) + require.ErrorIs(t, sink.ConsumeMetrics(context.Background(), md), want) + require.EqualValues(t, 2, sink.Calls()) + require.False(t, sink.Capabilities().MutatesData) +} diff --git a/internal/pipeline/pipelinetest/fixture_test.go b/internal/pipeline/pipelinetest/fixture_test.go index 4ffb27e5..bc1d504b 100644 --- a/internal/pipeline/pipelinetest/fixture_test.go +++ b/internal/pipeline/pipelinetest/fixture_test.go @@ -9,6 +9,7 @@ import ( "github.com/stretchr/testify/require" + "github.com/tracecoreai/tracecore/internal/componentstatus" "github.com/tracecoreai/tracecore/internal/pipeline" "github.com/tracecoreai/tracecore/internal/pipeline/pipelinetest" ) @@ -33,8 +34,8 @@ func TestHost_RecordsStatusEvents(t *testing.T) { t.Parallel() host := pipelinetest.NewHost() - host.ReportStatus(pipeline.StatusEvent{Kind: "starting"}) - host.ReportStatus(pipeline.StatusEvent{Kind: "permanent-error", Err: errors.New("boom")}) + componentstatus.ReportStatus(host, pipeline.StatusEvent{Kind: "starting"}) + componentstatus.ReportStatus(host, pipeline.StatusEvent{Kind: "permanent-error", Err: errors.New("boom")}) events := host.StatusEvents() require.Len(t, events, 2) diff --git a/internal/pipeline/pipelinetest/host.go b/internal/pipeline/pipelinetest/host.go index fc33ce13..629a4c0d 100644 --- a/internal/pipeline/pipelinetest/host.go +++ b/internal/pipeline/pipelinetest/host.go @@ -8,9 +8,10 @@ import ( "github.com/tracecoreai/tracecore/internal/pipeline" ) -// Host is a no-op pipeline.Host suitable for unit tests. It records -// every StatusEvent the Component reports so a test can assert on -// lifecycle reporting without wiring a real runtime. +// Host is a pipeline.Host suitable for unit tests. It also implements +// the optional componentstatus.StatusReporter interface so tests can +// assert on the events a Component would have reported via the free +// function `componentstatus.ReportStatus(host, ev)`. // // Host is safe for concurrent use; tests that spawn goroutines through // the Component can call StatusEvents from the main test goroutine @@ -42,8 +43,10 @@ func (h *Host) GetExtensions() map[pipeline.ID]pipeline.Component { return out } -// ReportStatus appends the event to the Host's record. -func (h *Host) ReportStatus(event pipeline.StatusEvent) { +// ReportComponentStatus implements componentstatus.StatusReporter +// so the free fn `componentstatus.ReportStatus(host, ev)` delegates +// here. Appends the event to the Host's record. +func (h *Host) ReportComponentStatus(event pipeline.StatusEvent) { h.mu.Lock() defer h.mu.Unlock() h.events = append(h.events, event) diff --git a/internal/pipeline/runtime.go b/internal/pipeline/runtime.go index 0cc8ab52..c14603a1 100644 --- a/internal/pipeline/runtime.go +++ b/internal/pipeline/runtime.go @@ -383,4 +383,3 @@ func safeShutdown(ctx context.Context, c Component) (err error) { type noopHost struct{} func (noopHost) GetExtensions() map[ID]Component { return map[ID]Component{} } -func (noopHost) ReportStatus(_ StatusEvent) {} diff --git a/internal/pipeline/runtime_test.go b/internal/pipeline/runtime_test.go index 73278ca2..c27e010d 100644 --- a/internal/pipeline/runtime_test.go +++ b/internal/pipeline/runtime_test.go @@ -14,6 +14,7 @@ import ( "github.com/stretchr/testify/require" + "github.com/tracecoreai/tracecore/internal/componentstatus" "github.com/tracecoreai/tracecore/internal/pipeline" ) @@ -440,8 +441,11 @@ func TestRuntime_NoopHost_ReturnsEmptyExtensions(t *testing.T) { require.NotNil(t, observedHost) require.Empty(t, observedHost.GetExtensions()) - // ReportStatus must not panic on the noop host. - require.NotPanics(t, func() { observedHost.ReportStatus(pipeline.StatusEvent{Kind: "anything"}) }) + // componentstatus.ReportStatus must not panic when the host + // does not implement StatusReporter (the runtime's noop host). + require.NotPanics(t, func() { + componentstatus.ReportStatus(observedHost, pipeline.StatusEvent{Kind: "anything"}) + }) } // hostProbe is a stubComponent variant that captures the Host it sees diff --git a/internal/pipelinebuilder/builder.go b/internal/pipelinebuilder/builder.go index 079208f7..11356da8 100644 --- a/internal/pipelinebuilder/builder.go +++ b/internal/pipelinebuilder/builder.go @@ -12,16 +12,84 @@ import ( "errors" "fmt" "log/slog" + "os" "sort" "strings" "go.opentelemetry.io/collector/pdata/pcommon" + "go.opentelemetry.io/otel/metric" + "go.opentelemetry.io/otel/metric/noop" "gopkg.in/yaml.v3" "github.com/tracecoreai/tracecore/internal/config" "github.com/tracecoreai/tracecore/internal/pipeline" ) +// BuildOption configures BuildPipelines without breaking the +// signature. Useful when the caller has a MeterProvider (M2) or +// BuildInfo (M2) to inject; tests + tools that don't care omit +// the options and get the noop defaults. +type BuildOption func(*buildOptions) + +type buildOptions struct { + meterProvider metric.MeterProvider + buildInfo pipeline.BuildInfo + resource pcommon.Resource +} + +func newBuildOptions(opts []BuildOption) buildOptions { + o := buildOptions{meterProvider: noop.NewMeterProvider()} + for _, opt := range opts { + opt(&o) + } + if o.meterProvider == nil { + o.meterProvider = noop.NewMeterProvider() + } + // Build the default Resource lazily once per BuildPipelines call. + // Receivers reading set.Telemetry.Resource get OTel-semconv + // canonical attributes (host.name, service.name, service.version, + // service.instance.id) rather than a bare empty Resource. + o.resource = newDefaultResource(o.buildInfo) + return o +} + +// newDefaultResource constructs the pcommon.Resource the runtime +// stamps on every component's TelemetrySettings. Honest defaults: +// host.name from os.Hostname(); service.name/version from BuildInfo; +// service.instance.id falls back to host.name when nothing better +// exists. Missing values stay absent rather than getting placeholder +// strings that operators would mistake for real data. +func newDefaultResource(bi pipeline.BuildInfo) pcommon.Resource { + res := pcommon.NewResource() + attrs := res.Attributes() + if hn, err := os.Hostname(); err == nil && hn != "" { + attrs.PutStr("host.name", hn) + attrs.PutStr("service.instance.id", hn) + } + if bi.Command != "" { + attrs.PutStr("service.name", bi.Command) + } + if bi.Version != "" { + attrs.PutStr("service.version", bi.Version) + } + return res +} + +// WithMeterProvider sets the metric.MeterProvider every component's +// TelemetrySettings.MeterProvider points at. Defaults to a noop +// provider so receivers never have to nil-check. +func WithMeterProvider(mp metric.MeterProvider) BuildOption { + return func(o *buildOptions) { o.meterProvider = mp } +} + +// WithBuildInfo sets the BuildInfo every component's CreateSettings +// carries. Defaults to the zero-value (operators see empty strings), +// which is fine for unit tests; cmd/tracecore populates it from +// internal/version. +func WithBuildInfo(bi pipeline.BuildInfo) BuildOption { + return func(o *buildOptions) { o.buildInfo = bi } +} + // BuildPipelines turns a loaded config into the runtime's Pipeline // list using the supplied factories. Per RFC-0004, the assembly is // bottom-up: exporters constructed first, wrapped in a fan-out @@ -38,7 +106,10 @@ func BuildPipelines( logger *slog.Logger, cfg *config.Config, factories pipeline.Factories, + opts ...BuildOption, ) ([]pipeline.Pipeline, error) { + bopts := newBuildOptions(opts) + if !hasFactories(factories) { if hasOperatorIntent(cfg) { return nil, errors.New("no component factories registered; only an empty config is accepted until at least one factory is wired") @@ -74,11 +145,11 @@ func BuildPipelines( var p pipeline.Pipeline switch signal { case pipeline.SignalMetrics: - p, err = buildSignalPipeline(ctx, metricsOps, logger, pID, cfg, pConfig, factories) + p, err = buildSignalPipeline(ctx, metricsOps, logger, pID, cfg, pConfig, factories, bopts) case pipeline.SignalTraces: - p, err = buildSignalPipeline(ctx, tracesOps, logger, pID, cfg, pConfig, factories) + p, err = buildSignalPipeline(ctx, tracesOps, logger, pID, cfg, pConfig, factories, bopts) case pipeline.SignalLogs: - p, err = buildSignalPipeline(ctx, logsOps, logger, pID, cfg, pConfig, factories) + p, err = buildSignalPipeline(ctx, logsOps, logger, pID, cfg, pConfig, factories, bopts) default: // Unreachable: ParsePipelineID rejects non-metrics/traces/logs // keys. Defense in depth so a fourth Signal value added to @@ -146,6 +217,7 @@ func resolveComponent( section map[string]yaml.Node, defaultConfig func() pipeline.Config, logger *slog.Logger, + bopts buildOptions, ) (componentSet, error) { typ, instance, err := splitName(name) if err != nil { @@ -181,9 +253,11 @@ func resolveComponent( Settings: pipeline.CreateSettings{ ID: cID, Telemetry: pipeline.TelemetrySettings{ - Logger: logger.With("component", cID.String()), - Resource: pcommon.NewResource(), + Logger: logger.With("component", cID.String()), + MeterProvider: bopts.meterProvider, + Resource: bopts.resource, }, + BuildInfo: bopts.buildInfo, }, }, nil } @@ -201,15 +275,16 @@ func buildSignalPipeline[C any]( cfg *config.Config, pConfig config.Pipeline, factories pipeline.Factories, + bopts buildOptions, ) (pipeline.Pipeline, error) { - exporters, expConsumers, err := buildExporters(ctx, ops, logger, pID, cfg, pConfig.Exporters, factories.Exporters) + exporters, expConsumers, err := buildExporters(ctx, ops, logger, pID, cfg, pConfig.Exporters, factories.Exporters, bopts) if err != nil { return pipeline.Pipeline{}, err } next := ops.newFanout(expConsumers) - processors, next, err := buildProcessors(ctx, ops, logger, pID, cfg, pConfig.Processors, factories.Processors, next) + processors, next, err := buildProcessors(ctx, ops, logger, pID, cfg, pConfig.Processors, factories.Processors, next, bopts) if err != nil { return pipeline.Pipeline{}, err } @@ -219,7 +294,7 @@ func buildSignalPipeline[C any]( // operators verify aliveness without external tooling. next = ops.wrapFirstData(pID.String(), logger, next) - receivers, err := buildReceivers(ctx, ops, logger, pID, cfg, pConfig.Receivers, factories.Receivers, next) + receivers, err := buildReceivers(ctx, ops, logger, pID, cfg, pConfig.Receivers, factories.Receivers, next, bopts) if err != nil { return pipeline.Pipeline{}, err } @@ -243,6 +318,7 @@ func buildExporters[C any]( cfg *config.Config, names []string, factories map[pipeline.Type]pipeline.ExporterFactory, + bopts buildOptions, ) ([]pipeline.Exporter, []C, error) { exporters := make([]pipeline.Exporter, 0, len(names)) consumers := make([]C, 0, len(names)) @@ -255,7 +331,7 @@ func buildExporters[C any]( if !ok { return nil, nil, fmt.Errorf("pipeline %s: unknown exporter type %q", pID, typ) } - cs, err := resolveComponent(pID, "exporter", name, cfg.Exporters, f.CreateDefaultConfig, logger) + cs, err := resolveComponent(pID, "exporter", name, cfg.Exporters, f.CreateDefaultConfig, logger, bopts) if err != nil { return nil, nil, err } @@ -287,6 +363,7 @@ func buildProcessors[C any]( names []string, factories map[pipeline.Type]pipeline.ProcessorFactory, next C, + bopts buildOptions, ) ([]pipeline.Processor, C, error) { processors := make([]pipeline.Processor, len(names)) for i := len(names) - 1; i >= 0; i-- { @@ -299,7 +376,7 @@ func buildProcessors[C any]( if !ok { return nil, next, fmt.Errorf("pipeline %s: unknown processor type %q", pID, typ) } - cs, err := resolveComponent(pID, "processor", name, cfg.Processors, f.CreateDefaultConfig, logger) + cs, err := resolveComponent(pID, "processor", name, cfg.Processors, f.CreateDefaultConfig, logger, bopts) if err != nil { return nil, next, err } @@ -326,6 +403,7 @@ func buildReceivers[C any]( names []string, factories map[pipeline.Type]pipeline.ReceiverFactory, next C, + bopts buildOptions, ) ([]pipeline.Receiver, error) { receivers := make([]pipeline.Receiver, 0, len(names)) for _, name := range names { @@ -337,7 +415,7 @@ func buildReceivers[C any]( if !ok { return nil, fmt.Errorf("pipeline %s: unknown receiver type %q", pID, typ) } - cs, err := resolveComponent(pID, "receiver", name, cfg.Receivers, f.CreateDefaultConfig, logger) + cs, err := resolveComponent(pID, "receiver", name, cfg.Receivers, f.CreateDefaultConfig, logger, bopts) if err != nil { return nil, err } diff --git a/internal/pipelinebuilder/telemetry_wiring_test.go b/internal/pipelinebuilder/telemetry_wiring_test.go new file mode 100644 index 00000000..47590432 --- /dev/null +++ b/internal/pipelinebuilder/telemetry_wiring_test.go @@ -0,0 +1,137 @@ +// SPDX-License-Identifier: Apache-2.0 + +package pipelinebuilder_test + +import ( + "context" + "sync" + "testing" + + "github.com/stretchr/testify/require" + "go.opentelemetry.io/otel/metric/noop" + "gopkg.in/yaml.v3" + + "github.com/tracecoreai/tracecore/internal/config" + "github.com/tracecoreai/tracecore/internal/consumer" + "github.com/tracecoreai/tracecore/internal/pipeline" + "github.com/tracecoreai/tracecore/internal/pipelinebuilder" +) + +// captureFactory is a stub receiver factory that records the +// CreateSettings it sees so a test can assert MeterProvider + +// BuildInfo flowed through. +type captureFactory struct { + mu sync.Mutex + set pipeline.CreateSettings +} + +func (*captureFactory) Type() pipeline.Type { return pipeline.MustNewType("capture") } +func (*captureFactory) CreateDefaultConfig() pipeline.Config { return &emptyConfig{} } + +func (f *captureFactory) CreateMetrics(_ context.Context, set pipeline.CreateSettings, _ pipeline.Config, _ consumer.Metrics) (pipeline.Receiver, error) { + f.mu.Lock() + f.set = set + f.mu.Unlock() + return noopComponent{}, nil +} + +func (*captureFactory) CreateTraces(_ context.Context, _ pipeline.CreateSettings, _ pipeline.Config, _ consumer.Traces) (pipeline.Receiver, error) { + return nil, pipeline.ErrSignalNotSupported +} + +func (*captureFactory) CreateLogs(_ context.Context, _ pipeline.CreateSettings, _ pipeline.Config, _ consumer.Logs) (pipeline.Receiver, error) { + return nil, pipeline.ErrSignalNotSupported +} + +func (f *captureFactory) seen() pipeline.CreateSettings { + f.mu.Lock() + defer f.mu.Unlock() + return f.set +} + +func minimalCfg(t *testing.T) *config.Config { + t.Helper() + var node yaml.Node + require.NoError(t, yaml.Unmarshal([]byte("{}"), &node)) + return &config.Config{ + Receivers: map[string]yaml.Node{"capture": node}, + Exporters: map[string]yaml.Node{"sink": node}, + Service: config.Service{ + Pipelines: map[string]config.Pipeline{ + "metrics/primary": { + Receivers: []string{"capture"}, + Exporters: []string{"sink"}, + }, + }, + }, + } +} + +// TestBuildPipelines_DefaultsMeterProviderToNoop pins the safety +// invariant: receivers MUST never see a nil MeterProvider. When the +// caller doesn't pass WithMeterProvider, BuildPipelines substitutes a +// noop, so receiver code can write `telSet.MeterProvider.Meter(...)` +// without a nil-check. +func TestBuildPipelines_DefaultsMeterProviderToNoop(t *testing.T) { + t.Parallel() + + rf := &captureFactory{} + factories := pipeline.Factories{ + Receivers: map[pipeline.Type]pipeline.ReceiverFactory{rf.Type(): rf}, + Exporters: map[pipeline.Type]pipeline.ExporterFactory{ + pipeline.MustNewType("sink"): &sinkExporterFactory{}, + }, + } + + _, err := pipelinebuilder.BuildPipelines(t.Context(), discardLogger(), minimalCfg(t), factories) + require.NoError(t, err) + + set := rf.seen() + require.NotNil(t, set.Telemetry.MeterProvider, "MeterProvider must default to noop, not stay nil") +} + +// TestBuildPipelines_WithMeterProvider_FlowsThrough pins the wire-up: +// when the caller passes WithMeterProvider, the receiver sees that +// exact provider. +func TestBuildPipelines_WithMeterProvider_FlowsThrough(t *testing.T) { + t.Parallel() + + mp := noop.NewMeterProvider() + rf := &captureFactory{} + factories := pipeline.Factories{ + Receivers: map[pipeline.Type]pipeline.ReceiverFactory{rf.Type(): rf}, + Exporters: map[pipeline.Type]pipeline.ExporterFactory{ + pipeline.MustNewType("sink"): &sinkExporterFactory{}, + }, + } + + _, err := pipelinebuilder.BuildPipelines(t.Context(), discardLogger(), minimalCfg(t), factories, + pipelinebuilder.WithMeterProvider(mp)) + require.NoError(t, err) + + set := rf.seen() + require.Equal(t, mp, set.Telemetry.MeterProvider, + "WithMeterProvider must be the exact provider the receiver sees") +} + +// TestBuildPipelines_WithBuildInfo_FlowsThrough pins that BuildInfo +// flows from BuildPipelines through to the receiver's CreateSettings. +func TestBuildPipelines_WithBuildInfo_FlowsThrough(t *testing.T) { + t.Parallel() + + rf := &captureFactory{} + factories := pipeline.Factories{ + Receivers: map[pipeline.Type]pipeline.ReceiverFactory{rf.Type(): rf}, + Exporters: map[pipeline.Type]pipeline.ExporterFactory{ + pipeline.MustNewType("sink"): &sinkExporterFactory{}, + }, + } + + bi := pipeline.BuildInfo{Command: "tracecore", Description: "test build", Version: "v0.2.0"} + _, err := pipelinebuilder.BuildPipelines(t.Context(), discardLogger(), minimalCfg(t), factories, + pipelinebuilder.WithBuildInfo(bi)) + require.NoError(t, err) + + set := rf.seen() + require.Equal(t, bi, set.BuildInfo) +} diff --git a/internal/selftelemetry/capturing.go b/internal/selftelemetry/capturing.go new file mode 100644 index 00000000..0eb79dac --- /dev/null +++ b/internal/selftelemetry/capturing.go @@ -0,0 +1,113 @@ +// SPDX-License-Identifier: Apache-2.0 + +package selftelemetry + +import ( + "sync" + "time" +) + +// CapturingReceiver records every call for test assertion. Use in +// receiver unit tests when you want to verify the receiver calls +// the right self-telemetry method with the right kind/value. +// +// Concurrent-safe; tests spawning receiver goroutines can read the +// accessor methods from the main test goroutine. +type CapturingReceiver struct { + mu sync.Mutex + errors []string + emissions []int64 + latencies []time.Duration + degraded []bool + activityHits int +} + +// NewCapturingReceiver returns a Receiver that records every call. +func NewCapturingReceiver() *CapturingReceiver { + return &CapturingReceiver{} +} + +// IncError satisfies Receiver; records the kind. +func (c *CapturingReceiver) IncError(kind string) { + c.mu.Lock() + defer c.mu.Unlock() + c.errors = append(c.errors, kind) +} + +// IncEmissions satisfies Receiver; records the n value. Negative +// values are discarded per the Receiver contract. +func (c *CapturingReceiver) IncEmissions(n int64) { + if n < 0 { + return + } + c.mu.Lock() + defer c.mu.Unlock() + c.emissions = append(c.emissions, n) +} + +// ObserveLatency satisfies Receiver; records the duration. +func (c *CapturingReceiver) ObserveLatency(d time.Duration) { + c.mu.Lock() + defer c.mu.Unlock() + c.latencies = append(c.latencies, d) +} + +// SetDegraded satisfies Receiver; records every transition value +// (does not collapse no-op repeats — tests see exactly what the +// receiver called). +func (c *CapturingReceiver) SetDegraded(degraded bool) { + c.mu.Lock() + defer c.mu.Unlock() + c.degraded = append(c.degraded, degraded) +} + +// MarkActivity satisfies Receiver; increments the activity counter. +func (c *CapturingReceiver) MarkActivity() { + c.mu.Lock() + defer c.mu.Unlock() + c.activityHits++ +} + +// Errors returns the kind argument of every IncError call, in +// order. Returned slice is a copy; callers may mutate freely. +func (c *CapturingReceiver) Errors() []string { + c.mu.Lock() + defer c.mu.Unlock() + out := make([]string, len(c.errors)) + copy(out, c.errors) + return out +} + +// Emissions returns the n value of every IncEmissions call. +func (c *CapturingReceiver) Emissions() []int64 { + c.mu.Lock() + defer c.mu.Unlock() + out := make([]int64, len(c.emissions)) + copy(out, c.emissions) + return out +} + +// Latencies returns every ObserveLatency duration. +func (c *CapturingReceiver) Latencies() []time.Duration { + c.mu.Lock() + defer c.mu.Unlock() + out := make([]time.Duration, len(c.latencies)) + copy(out, c.latencies) + return out +} + +// DegradedTransitions returns every SetDegraded argument in order. +func (c *CapturingReceiver) DegradedTransitions() []bool { + c.mu.Lock() + defer c.mu.Unlock() + out := make([]bool, len(c.degraded)) + copy(out, c.degraded) + return out +} + +// ActivityHits returns the cumulative MarkActivity call count. +func (c *CapturingReceiver) ActivityHits() int { + c.mu.Lock() + defer c.mu.Unlock() + return c.activityHits +} diff --git a/internal/selftelemetry/capturing_test.go b/internal/selftelemetry/capturing_test.go new file mode 100644 index 00000000..9a64fda7 --- /dev/null +++ b/internal/selftelemetry/capturing_test.go @@ -0,0 +1,56 @@ +// SPDX-License-Identifier: Apache-2.0 + +package selftelemetry_test + +import ( + "testing" + "time" + + "github.com/stretchr/testify/require" + + "github.com/tracecoreai/tracecore/internal/selftelemetry" +) + +// TestCapturingReceiver_RecordsAllFiveMethods pins the test-helper +// contract: every Receiver method is captured with its argument +// (or a hit counter for nullary methods), returned slices are +// copies callers may mutate without affecting the receiver. +func TestCapturingReceiver_RecordsAllFiveMethods(t *testing.T) { + t.Parallel() + + c := selftelemetry.NewCapturingReceiver() + + c.IncError(selftelemetry.KindConnect) + c.IncError(selftelemetry.KindParse) + c.IncEmissions(3) + c.IncEmissions(7) + c.IncEmissions(-1) // contract: silently discarded + c.ObserveLatency(150 * time.Millisecond) + c.SetDegraded(true) + c.SetDegraded(false) + c.MarkActivity() + c.MarkActivity() + c.MarkActivity() + + require.Equal(t, []string{selftelemetry.KindConnect, selftelemetry.KindParse}, c.Errors()) + require.Equal(t, []int64{3, 7}, c.Emissions(), "negative emission discarded") + require.Equal(t, []time.Duration{150 * time.Millisecond}, c.Latencies()) + require.Equal(t, []bool{true, false}, c.DegradedTransitions()) + require.Equal(t, 3, c.ActivityHits()) +} + +// TestCapturingReceiver_ReturnsCopies pins that mutating returned +// slices doesn't poison subsequent accessor calls — tests that +// modify their assertion data don't surprise the next test. +func TestCapturingReceiver_ReturnsCopies(t *testing.T) { + t.Parallel() + + c := selftelemetry.NewCapturingReceiver() + c.IncError(selftelemetry.KindRead) + + errs := c.Errors() + errs[0] = "mutated" + + require.Equal(t, []string{selftelemetry.KindRead}, c.Errors(), + "accessor must return a copy; mutating one slice must not affect subsequent calls") +} diff --git a/internal/selftelemetry/doc.go b/internal/selftelemetry/doc.go index d9b53d19..813c9062 100644 --- a/internal/selftelemetry/doc.go +++ b/internal/selftelemetry/doc.go @@ -1,16 +1,50 @@ // SPDX-License-Identifier: Apache-2.0 -// Package selftelemetry defines the producer-side interface that -// components (receivers, processors, exporters) call to publish -// metrics about their own health and throughput. -// -// The /metrics HTTP endpoint that exposes these to operators is -// owned by the M2 self-telemetry milestone. This package ships -// only the interface and a no-op implementation, so components -// can wire to it from day one and M2 fills in the real impl -// without component-side changes. -// -// Receiver naming mirrors the metric taxonomy in NORTHSTARS.md O2 -// (self-telemetry SLOs). See RFC-0001 §Self-telemetry for the -// architectural contract. +// Package selftelemetry is the per-component PRODUCER CONTRACT for +// reporting health + throughput signals. Receivers, processors, and +// exporters write into it; the process-level surface that exposes +// those signals to operators lives in internal/telemetry. +// +// # Distinction from internal/telemetry +// +// tracecore has two observability-adjacent packages whose names +// look similar but solve different problems: +// +// - internal/selftelemetry (THIS package) — per-component +// PRODUCER contract. One instance per receiver/exporter. Wired +// by the component's factory at construction. Defines: +// Receiver interface (IncError/IncEmissions/ObserveLatency/ +// SetDegraded/MarkActivity), Exporter interface (IncCallSuccess/ +// IncCallFailure), FailureRateReader (for the SLO layer), +// Kind* canonical kind constants, RecordInitError, noop impls. +// +// - internal/telemetry — process-level SURFACE. Owns the +// MeterProvider, the HTTP listener (/metrics, /healthz, +// /readyz), the SLO observable gauges, the build_info +// join-target. One instance per binary. +// +// In short: this package defines what components emit; telemetry +// serves what operators scrape. +// +// # Cardinality contract (load-bearing) +// +// Every label value passed to interface methods (kind on IncError, +// kind on IncCallFailure, the labels passed to RecordInitError) +// MUST be low-cardinality — bounded set across the binary's +// lifetime, ideally drawn from the canonical Kind* constants in +// this package. A receiver passing err.Error() or a Kubernetes +// pod name as a kind would explode Prometheus series cardinality +// and crash downstream scrapers. The contract is documentation- +// only today (no runtime guard); see docs/FOLLOWUPS.md "Runtime +// cap on kind cardinality" for the deferred enforcement. +// +// # Resource attributes +// +// Components reading TelemetrySettings.Resource see auto-populated +// OTel-semconv attributes: host.name, service.name, service.version, +// service.instance.id. The pipelinebuilder injects these at construct +// time so receivers don't have to compose their own resource — see +// docs/agents/RECEIVER-PATTERNS.md for the wiring pattern. +// +// See RFC-0006 for the M2 spec. package selftelemetry diff --git a/internal/selftelemetry/example_test.go b/internal/selftelemetry/example_test.go new file mode 100644 index 00000000..e7ce3674 --- /dev/null +++ b/internal/selftelemetry/example_test.go @@ -0,0 +1,74 @@ +// SPDX-License-Identifier: Apache-2.0 + +package selftelemetry_test + +import ( + "fmt" + "time" + + "go.opentelemetry.io/otel/metric/noop" + + "github.com/tracecoreai/tracecore/internal/pipeline" + "github.com/tracecoreai/tracecore/internal/selftelemetry" +) + +// ExampleNewReceiver shows the canonical wiring pattern receiver +// authors copy. Renders in pkg.go.dev and compile-checks on every CI +// run, so RECEIVER-PATTERNS.md's snippet can't drift from a working +// example without breaking this test. +func ExampleNewReceiver() { + id := pipeline.MustNewID(pipeline.MustNewType("examplereceiver"), "primary") + + // In production, set.Telemetry.MeterProvider comes from + // pipeline.CreateSettings (wired by the runtime). Here we use + // the OTel noop provider for an allocation-free godoc example. + mp := noop.NewMeterProvider() + + sr, err := selftelemetry.NewReceiver(id, mp) + if err != nil { + // Fall back to noop so the hot path never nil-checks. + sr = selftelemetry.NewNoopReceiver() + } + + // Hot path — invoke once per collection cycle, success or + // failure. ObserveLatency wraps the downstream push; kind + // values must come from selftelemetry.Kind* constants. + start := time.Now() + pushErr := error(nil) // placeholder for next.ConsumeMetrics(...) + sr.ObserveLatency(time.Since(start)) + if pushErr != nil { + sr.IncError(selftelemetry.KindDownstream) + } else { + sr.IncEmissions(1) + sr.MarkActivity() + } + + fmt.Println("wired") + // Output: wired +} + +// ExampleNewExporter shows the canonical exporter wiring. Same +// pattern as receivers — construct from the MeterProvider in the +// component's factory, fall back to noop on failure, call from the +// hot path with low-cardinality kinds. +func ExampleNewExporter() { + id := pipeline.MustNewID(pipeline.MustNewType("exampleexporter"), "primary") + mp := noop.NewMeterProvider() + + se, err := selftelemetry.NewExporter(id, mp) + if err != nil { + se = selftelemetry.NewNoopExporter() + } + + // On each Consume call, log success or partition the failure + // by kind. + pushErr := error(nil) + if pushErr != nil { + se.IncCallFailure("io") // or "marshal", etc. + } else { + se.IncCallSuccess() + } + + fmt.Println("wired") + // Output: wired +} diff --git a/internal/selftelemetry/exporter_impl.go b/internal/selftelemetry/exporter_impl.go new file mode 100644 index 00000000..dda58866 --- /dev/null +++ b/internal/selftelemetry/exporter_impl.go @@ -0,0 +1,72 @@ +// SPDX-License-Identifier: Apache-2.0 + +package selftelemetry + +import ( + "context" + "fmt" + "sync/atomic" + + "go.opentelemetry.io/otel/attribute" + "go.opentelemetry.io/otel/metric" + + "github.com/tracecoreai/tracecore/internal/pipeline" +) + +// NewExporter returns a real Exporter backed by an OTel counter +// `tracecore.exporter.calls_total{result, kind, component_id}`. The +// returned value also satisfies FailureRateReader so the SLO layer +// (`internal/telemetry.RegisterSLOMetrics`) can aggregate the +// success/failure totals into the `tracecore.exporter.failure_rate` +// observable gauge. +func NewExporter(id pipeline.ID, mp metric.MeterProvider) (Exporter, error) { + if mp == nil { + return nil, ErrNilMeterProvider + } + meter := mp.Meter(instrumentationScope) + componentAttr := attribute.String("component_id", id.String()) + + calls, err := meter.Int64Counter( + "tracecore.exporter.calls_total", + metric.WithDescription("Exporter Consume* calls partitioned by result"), + ) + if err != nil { + return nil, fmt.Errorf("exporter.calls_total counter: %w", err) + } + + return &exporterImpl{ + calls: calls, + attrBase: componentAttr, + successes: &atomic.Uint64{}, + failures: &atomic.Uint64{}, + }, nil +} + +type exporterImpl struct { + calls metric.Int64Counter + attrBase attribute.KeyValue + successes *atomic.Uint64 + failures *atomic.Uint64 +} + +func (e *exporterImpl) IncCallSuccess() { + e.successes.Add(1) + e.calls.Add(context.Background(), 1, metric.WithAttributes( + e.attrBase, + attribute.String("result", "success"), + )) +} + +func (e *exporterImpl) IncCallFailure(kind string) { + e.failures.Add(1) + e.calls.Add(context.Background(), 1, metric.WithAttributes( + e.attrBase, + attribute.String("result", "failure"), + attribute.String("kind", kind), + )) +} + +// SuccessCount + FailureCount satisfy FailureRateReader. The SLO +// layer queries these to compute `tracecore.exporter.failure_rate`. +func (e *exporterImpl) SuccessCount() uint64 { return e.successes.Load() } +func (e *exporterImpl) FailureCount() uint64 { return e.failures.Load() } diff --git a/internal/selftelemetry/exporter_test.go b/internal/selftelemetry/exporter_test.go new file mode 100644 index 00000000..53fd3bc8 --- /dev/null +++ b/internal/selftelemetry/exporter_test.go @@ -0,0 +1,130 @@ +// SPDX-License-Identifier: Apache-2.0 + +package selftelemetry_test + +import ( + "context" + "net/http/httptest" + "testing" + + "github.com/stretchr/testify/require" + + "github.com/tracecoreai/tracecore/internal/pipeline" + "github.com/tracecoreai/tracecore/internal/selftelemetry" + "github.com/tracecoreai/tracecore/internal/telemetry" +) + +// TestNewExporter_EmitsCallCounter exercises the success + failure +// paths and asserts the calls_total counter surfaces both labels. +func TestNewExporter_EmitsCallCounter(t *testing.T) { + t.Parallel() + + mp, err := telemetry.NewMeterProvider() + require.NoError(t, err) + t.Cleanup(func() { _ = mp.Shutdown(context.Background()) }) + + id := pipeline.MustNewID(pipeline.MustNewType("stdoutexporter"), "primary") + exp, err := selftelemetry.NewExporter(id, mp.Provider) + require.NoError(t, err) + + exp.IncCallSuccess() + exp.IncCallSuccess() + exp.IncCallFailure("io") + + srv := httptest.NewServer(mp.PromHandler()) + t.Cleanup(srv.Close) + body := scrape(t, srv.URL) + + require.Regexp(t, + `tracecore_exporter_calls_total\{[^}]*component_id="stdoutexporter/primary"[^}]*result="success"[^}]*\}\s+2`, + body) + require.Regexp(t, + `tracecore_exporter_calls_total\{[^}]*kind="io"[^}]*result="failure"[^}]*\}\s+1`, + body) +} + +// TestNewExporter_SatisfiesFailureRateReader pins that the real impl +// implements FailureRateReader so the SLO layer can consume it. +func TestNewExporter_SatisfiesFailureRateReader(t *testing.T) { + t.Parallel() + + mp, err := telemetry.NewMeterProvider() + require.NoError(t, err) + t.Cleanup(func() { _ = mp.Shutdown(context.Background()) }) + + id := pipeline.MustNewID(pipeline.MustNewType("stdoutexporter"), "x") + exp, err := selftelemetry.NewExporter(id, mp.Provider) + require.NoError(t, err) + + exp.IncCallSuccess() + exp.IncCallSuccess() + exp.IncCallFailure("io") + + r, ok := exp.(selftelemetry.FailureRateReader) + require.True(t, ok, "exporterImpl must satisfy FailureRateReader for the SLO source") + require.EqualValues(t, 2, r.SuccessCount()) + require.EqualValues(t, 1, r.FailureCount()) +} + +// TestNewExporter_NilProviderReturnsError mirrors the Receiver +// contract: callers can't construct against a nil MeterProvider. +func TestNewExporter_NilProviderReturnsError(t *testing.T) { + t.Parallel() + + _, err := selftelemetry.NewExporter( + pipeline.MustNewID(pipeline.MustNewType("x"), "y"), nil) + require.Error(t, err) +} + +// TestNoopExporter pins the discard contract. +func TestNoopExporter(t *testing.T) { + t.Parallel() + + require.NotPanics(t, func() { + e := selftelemetry.NewNoopExporter() + e.IncCallSuccess() + e.IncCallFailure("anything") + }) +} + +// TestRecordInitError_EmitsCounter pins the silent-noop-fallback +// observability path: when a component fails to construct its real +// selftelemetry instrument and falls back to noop, RecordInitError +// ticks `tracecore.selftelemetry.init_errors_total{kind, component_id, +// reason}` so operators alerting on `> 0` see the silent +// degradation. +func TestRecordInitError_EmitsCounter(t *testing.T) { + t.Parallel() + + mp, err := telemetry.NewMeterProvider() + require.NoError(t, err) + t.Cleanup(func() { _ = mp.Shutdown(context.Background()) }) + + selftelemetry.RecordInitError(context.Background(), mp.Provider, + "receiver", "clockreceiver/primary", "instrument_register") + selftelemetry.RecordInitError(context.Background(), mp.Provider, + "exporter", "stdoutexporter/primary", "instrument_register") + + srv := httptest.NewServer(mp.PromHandler()) + t.Cleanup(srv.Close) + body := scrape(t, srv.URL) + + require.Contains(t, body, "tracecore_selftelemetry_init_errors_total") + require.Regexp(t, + `tracecore_selftelemetry_init_errors_total\{[^}]*component_id="clockreceiver/primary"[^}]*kind="receiver"[^}]*reason="instrument_register"[^}]*\}\s+1`, + body) + require.Regexp(t, + `tracecore_selftelemetry_init_errors_total\{[^}]*component_id="stdoutexporter/primary"[^}]*kind="exporter"[^}]*reason="instrument_register"[^}]*\}\s+1`, + body) +} + +// TestRecordInitError_NilProviderIsNoOp pins the safety contract: +// the meta-observability path must never crash even when the +// MeterProvider that should hold the counter is itself nil. +func TestRecordInitError_NilProviderIsNoOp(t *testing.T) { + t.Parallel() + require.NotPanics(t, func() { + selftelemetry.RecordInitError(context.Background(), nil, + "receiver", "x", "any") + }) +} diff --git a/internal/selftelemetry/impl_test.go b/internal/selftelemetry/impl_test.go new file mode 100644 index 00000000..f0cfb426 --- /dev/null +++ b/internal/selftelemetry/impl_test.go @@ -0,0 +1,259 @@ +// SPDX-License-Identifier: Apache-2.0 + +package selftelemetry_test + +import ( + "context" + "fmt" + "io" + "net/http" + "net/http/httptest" + "sync" + "testing" + "time" + + "github.com/stretchr/testify/require" + + "github.com/tracecoreai/tracecore/internal/pipeline" + "github.com/tracecoreai/tracecore/internal/selftelemetry" + "github.com/tracecoreai/tracecore/internal/telemetry" +) + +// componentID returns a stable test pipeline.ID for use as the +// component identity the impl attaches as a label on every emission. +func componentID(t *testing.T, instance string) pipeline.ID { + t.Helper() + return pipeline.MustNewID(pipeline.MustNewType("clockreceiver"), instance) +} + +// setup constructs a real MeterProvider + the selftelemetry impl +// wired to it, and returns the scrape URL. Each test scrapes +// /metrics text-exposition output and asserts presence of the +// expected metric line. +func setup(t *testing.T, instance string) (selftelemetry.Receiver, string) { + t.Helper() + + mp, err := telemetry.NewMeterProvider() + require.NoError(t, err) + t.Cleanup(func() { _ = mp.Shutdown(context.Background()) }) + + recv, err := selftelemetry.NewReceiver(componentID(t, instance), mp.Provider) + require.NoError(t, err) + + srv := httptest.NewServer(mp.PromHandler()) + t.Cleanup(srv.Close) + + return recv, srv.URL +} + +func scrape(t *testing.T, url string) string { + t.Helper() + resp, err := http.Get(url) //nolint:gosec,noctx // test code + require.NoError(t, err) + t.Cleanup(func() { _ = resp.Body.Close() }) + body, err := io.ReadAll(resp.Body) + require.NoError(t, err) + return string(body) +} + +// TestReceiver_IncError emits and asserts the errors_total counter +// appears in the scrape with a `kind` attribute and a `component_id` +// pointing at the receiver instance. +func TestReceiver_IncError(t *testing.T) { + t.Parallel() + recv, url := setup(t, "TestReceiver_IncError") + + recv.IncError("connect") + recv.IncError("connect") + recv.IncError("parse") + + body := scrape(t, url) + require.Contains(t, body, "tracecore_receiver_errors_total") + require.Regexp(t, + `tracecore_receiver_errors_total\{[^}]*component_id="clockreceiver/TestReceiver_IncError"[^}]*kind="connect"[^}]*\}\s+2`, + body, "connect counter at 2") + require.Regexp(t, + `tracecore_receiver_errors_total\{[^}]*kind="parse"[^}]*\}\s+1`, + body, "parse counter at 1") +} + +// TestReceiver_IncEmissions exercises the emissions counter; also +// pins that negative values are silently discarded per the interface +// contract. +func TestReceiver_IncEmissions(t *testing.T) { + t.Parallel() + recv, url := setup(t, "TestReceiver_IncEmissions") + + recv.IncEmissions(3) + recv.IncEmissions(5) + recv.IncEmissions(-1) // contract: silently discarded + + body := scrape(t, url) + require.Regexp(t, + `tracecore_receiver_emissions_total\{[^}]*component_id="clockreceiver/TestReceiver_IncEmissions"[^}]*\}\s+8`, + body, "emissions at 8 (3+5; -1 discarded)") +} + +// TestReceiver_ObserveLatency records latencies and asserts the +// histogram surfaces with the expected sum + count buckets. +func TestReceiver_ObserveLatency(t *testing.T) { + t.Parallel() + recv, url := setup(t, "TestReceiver_ObserveLatency") + + recv.ObserveLatency(100 * time.Millisecond) + recv.ObserveLatency(200 * time.Millisecond) + + body := scrape(t, url) + // Histogram exposes _bucket, _sum, _count for the metric name. + require.Contains(t, body, "tracecore_receiver_collection_latency_seconds_bucket") + require.Contains(t, body, "tracecore_receiver_collection_latency_seconds_count") + require.Regexp(t, + `tracecore_receiver_collection_latency_seconds_count\{[^}]*\}\s+2`, + body, "histogram count at 2 observations") +} + +// TestReceiver_SetDegraded transitions degraded on, sleeps, transitions +// off, and asserts the degraded_seconds_total counter advanced. +// The observable counter reports cumulative time while degraded +// (continuous accumulation), so even calling once with no recover +// should produce a positive value on the next scrape. +func TestReceiver_SetDegraded(t *testing.T) { + t.Parallel() + recv, url := setup(t, "TestReceiver_SetDegraded") + + recv.SetDegraded(true) + // Sleep so the observable counter can witness elapsed degraded + // time at the next scrape. + time.Sleep(50 * time.Millisecond) + + body := scrape(t, url) + require.Contains(t, body, "tracecore_receiver_degraded_seconds_total") + // Value must be >0 (>= ~0.05s). + require.Regexp(t, + `tracecore_receiver_degraded_seconds_total\{[^}]*component_id="clockreceiver/TestReceiver_SetDegraded"[^}]*\}\s+0\.0[0-9]+`, + body, "degraded counter advanced past 0") + + recv.SetDegraded(false) +} + +// TestReceiver_MarkActivity ticks activity and asserts the +// last_activity_unix_seconds gauge shows roughly "now". +func TestReceiver_MarkActivity(t *testing.T) { + t.Parallel() + recv, url := setup(t, "TestReceiver_MarkActivity") + + before := time.Now().Unix() + recv.MarkActivity() + after := time.Now().Unix() + + body := scrape(t, url) + require.Contains(t, body, "tracecore_receiver_last_activity_unix_seconds") + // Just assert the metric is present + value within the + // before/after band; exact compare with regex on the body is + // brittle. + require.Regexp(t, + `tracecore_receiver_last_activity_unix_seconds\{[^}]*\}\s+\d+`, + body) + _ = before + _ = after +} + +// TestReceiver_NewReceiver_NilProvider documents that constructing +// with a nil provider returns a non-nil error; the runtime must +// substitute a noop provider before calling NewReceiver. +func TestReceiver_NewReceiver_NilProvider(t *testing.T) { + t.Parallel() + + _, err := selftelemetry.NewReceiver(componentID(t, "x"), nil) + require.Error(t, err, "nil MeterProvider must be rejected") +} + +// TestIsCanonicalKind pins the canonical-kind set + the helper that +// linters / tests use to enforce it. Adding or removing a Kind* +// constant must update this test, which forces the canonical-set +// change to be intentional rather than silent. +func TestIsCanonicalKind(t *testing.T) { + t.Parallel() + + for _, k := range selftelemetry.CanonicalKinds() { + require.True(t, selftelemetry.IsCanonicalKind(k), + "%q is in CanonicalKinds() but IsCanonicalKind returned false", k) + } + + require.False(t, selftelemetry.IsCanonicalKind("")) + require.False(t, selftelemetry.IsCanonicalKind("not_a_kind")) + require.False(t, selftelemetry.IsCanonicalKind("DOWNSTREAM"), + "case-sensitive: uppercase is NOT canonical") +} + +// TestClockreceiverUsesCanonicalKinds pins that the in-tree receiver +// emits only canonical kinds. Future receivers (M8/M9) get tested +// the same way in their own packages; this is the M2-side anchor. +func TestClockreceiverUsesCanonicalKinds(t *testing.T) { + t.Parallel() + + // The clockreceiver.emit hot path calls IncError("downstream") + // on push failure. Pinned here so a refactor that introduces a + // non-canonical kind fails this test alongside the existing + // clockreceiver tests. + require.True(t, selftelemetry.IsCanonicalKind(selftelemetry.KindDownstream)) +} + +// TestReceiver_ConcurrentRegistration_SameNames pins Loop 1 +// falsifier DF2: two receivers constructing concurrently against +// the same MeterProvider register the same instrument names. The +// OTel SDK must handle this idempotently — neither call returns an +// error, and both receivers' emissions surface in the scrape. +// +// Operationally this is the M8/M9 case: multiple receiver components +// share one MeterProvider; the runtime spawns them in parallel +// (Component.Start). A non-idempotent registration would cause a +// race-dependent crash at the second-fastest receiver's first call. +func TestReceiver_ConcurrentRegistration_SameNames(t *testing.T) { + t.Parallel() + + mp, err := telemetry.NewMeterProvider() + require.NoError(t, err) + t.Cleanup(func() { _ = mp.Shutdown(context.Background()) }) + + const concurrent = 8 + var wg sync.WaitGroup + receivers := make([]selftelemetry.Receiver, concurrent) + errs := make([]error, concurrent) + wg.Add(concurrent) + for i := range concurrent { + go func(i int) { + defer wg.Done() + id := pipeline.MustNewID( + pipeline.MustNewType("clockreceiver"), + fmt.Sprintf("conc%d", i), + ) + r, e := selftelemetry.NewReceiver(id, mp.Provider) + receivers[i] = r + errs[i] = e + }(i) + } + wg.Wait() + + for i, e := range errs { + require.NoError(t, e, "concurrent NewReceiver #%d must succeed", i) + require.NotNil(t, receivers[i]) + } + + // All receivers must be able to emit without panic or error. + for i, r := range receivers { + r.IncEmissions(int64(i + 1)) + r.IncError("test") + r.MarkActivity() + } + + // Scrape must expose entries for every component_id. + srv := httptest.NewServer(mp.PromHandler()) + t.Cleanup(srv.Close) + body := scrape(t, srv.URL) + + for i := range concurrent { + require.Contains(t, body, fmt.Sprintf("clockreceiver/conc%d", i), + "component_id #%d must surface", i) + } +} diff --git a/internal/selftelemetry/init_errors.go b/internal/selftelemetry/init_errors.go new file mode 100644 index 00000000..03622e21 --- /dev/null +++ b/internal/selftelemetry/init_errors.go @@ -0,0 +1,41 @@ +// SPDX-License-Identifier: Apache-2.0 + +package selftelemetry + +import ( + "context" + + "go.opentelemetry.io/otel/attribute" + "go.opentelemetry.io/otel/metric" +) + +// RecordInitError emits one tick on +// `tracecore.selftelemetry.init_errors_total` so the silent noop- +// fallback path in receiver / exporter factories +// (when NewReceiver / NewExporter fails or MeterProvider is nil +// at the component level) is observable in the scrape — operators +// alerting on `> 0` see when self-telemetry isn't really wired. +// +// `kind` is "receiver" or "exporter"; `componentID` is the +// pipeline.ID string. `reason` is a low-cardinality category +// (e.g. "nil_provider", "instrument_register"). Errors from the +// counter registration itself are swallowed — the meta-observability +// path must not crash callers. +func RecordInitError(ctx context.Context, mp metric.MeterProvider, kind, componentID, reason string) { + if mp == nil { + return + } + meter := mp.Meter(instrumentationScope) + c, err := meter.Int64Counter( + "tracecore.selftelemetry.init_errors_total", + metric.WithDescription("Counter of self-telemetry construction failures that fell back to the noop implementation."), + ) + if err != nil { + return + } + c.Add(ctx, 1, metric.WithAttributes( + attribute.String("kind", kind), + attribute.String("component_id", componentID), + attribute.String("reason", reason), + )) +} diff --git a/internal/selftelemetry/interface.go b/internal/selftelemetry/interface.go index ecf4212a..9bca39bf 100644 --- a/internal/selftelemetry/interface.go +++ b/internal/selftelemetry/interface.go @@ -4,6 +4,171 @@ package selftelemetry import "time" +// Canonical `kind` values for IncError. Receiver authors SHOULD pick +// from this set so cross-receiver Prometheus dashboards on +// `tracecore_receiver_errors_total{kind=…}` partition consistently. +// Adding a new kind is non-breaking; document it in the receiver's +// README and update RECEIVER-PATTERNS.md if it's broadly applicable. +// +// The constants here intentionally exclude vendor-SDK-specific names +// (e.g. `nvml_init`); those go in the receiver's own const block. +const ( + // KindConnect — failure to reach the upstream source (network + // dial, library Init, fd open). + KindConnect = "connect" + + // KindParse — failure to parse data the source produced + // (malformed line, schema mismatch, expected-field-missing). + KindParse = "parse" + + // KindDownstream — next consumer in the pipeline returned an + // error from Consume*. + KindDownstream = "downstream" + + // KindEnumerate — failure to enumerate sources/devices (e.g. + // nvml device count, dcgm field group list). + KindEnumerate = "enumerate" + + // KindRead — failure during a read from an already-open source + // (read syscall error, library FetchSample error). + KindRead = "read" + + // KindCardinality — a cardinality cap was hit; the receiver + // dropped a series rather than emit unbounded labels. + KindCardinality = "cardinality" + + // KindPanic — a panic was recovered inside the receiver's + // hot path (per `internal/safe.Call` or a defer/recover). + KindPanic = "panic" + + // KindInit — error during the receiver's Start phase before it + // became operational. + KindInit = "init" +) + +// CanonicalKinds returns the in-tree-approved set of `kind` values +// receivers MAY pass to IncError / IncCallFailure. Receivers that +// need a kind outside this set should propose an extension via RFC +// or add a comment justifying the divergence in their own README. +// Returned slice is freshly allocated; callers may mutate. +func CanonicalKinds() []string { + return []string{ + KindConnect, KindParse, KindDownstream, + KindEnumerate, KindRead, KindCardinality, + KindPanic, KindInit, + } +} + +// IsCanonicalKind reports whether `kind` matches one of the +// canonical Kind* constants. Useful in tests + lints to keep +// cross-receiver `tracecore_receiver_errors_total{kind=…}` +// dashboards consistent. +func IsCanonicalKind(kind string) bool { + for _, k := range CanonicalKinds() { + if k == kind { + return true + } + } + return false +} + +// Canonical `reason` values for RecordInitError. Same cardinality +// discipline as the Kind* constants — RecordInitError surfaces +// silent-noop-fallback paths to operators, and cross-receiver +// dashboards on `tracecore_selftelemetry_init_errors_total{reason=…}` +// need a stable set. +const ( + // ReasonNilProvider — the MeterProvider field on the component's + // TelemetrySettings was nil at construction. + ReasonNilProvider = "nil_provider" + + // ReasonInstrumentRegister — MeterProvider.NewExporter / + // NewReceiver returned an error registering one of the OTel + // instruments (e.g., name collision, SDK validation failure). + ReasonInstrumentRegister = "instrument_register" + + // ReasonUnsupportedSDK — the bound OTel SDK version doesn't + // expose an API tracecore needs (e.g., observable callbacks + // added in a later SDK). Reserved for forward-compat use; not + // currently emitted in tree. + ReasonUnsupportedSDK = "unsupported_sdk" +) + +// CanonicalReasons returns the in-tree-approved set of `reason` +// values RecordInitError accepts. Receiver authors stick to this set +// so dashboards stay consistent; extending the set is an RFC. +func CanonicalReasons() []string { + return []string{ + ReasonNilProvider, ReasonInstrumentRegister, ReasonUnsupportedSDK, + } +} + +// IsCanonicalReason reports whether `reason` is one of the +// canonical Reason* constants. +func IsCanonicalReason(reason string) bool { + for _, r := range CanonicalReasons() { + if r == reason { + return true + } + } + return false +} + +// Exporter is the producer-side surface an exporter component writes +// to when reporting per-call outcomes. The /metrics endpoint surfaces +// these as `tracecore.exporter.calls_total{result,kind}` plus the +// derived `tracecore.exporter.failure_rate` observable gauge. +// +// All methods are non-blocking and safe for concurrent use. The +// no-op implementation discards every call. +type Exporter interface { + // IncCallSuccess records one successful Consume* call. + IncCallSuccess() + + // IncCallFailure records one failed Consume* call, partitioned + // by `kind`. `kind` MUST be low-cardinality (e.g., "downstream", + // "encode", "io"); never the full error message — it explodes + // the metric's cardinality. + IncCallFailure(kind string) +} + +// FailureRateReader is implemented by Exporter implementations whose +// internal counts can be queried for the derived failure_rate +// observable gauge. The runtime aggregates across registered exporters +// to compute `tracecore.exporter.failure_rate`. +// +// Implementations MUST be pointer types (the runtime's dedup map +// keys on FailureRateReader by interface identity; value-typed impls +// with identical fields would compare equal and be incorrectly +// deduped). The in-tree `*exporterImpl` from NewExporter satisfies +// this; receiver-author Exporter wrappers should too. +type FailureRateReader interface { + // SuccessCount returns the cumulative successful call count. + SuccessCount() uint64 + // FailureCount returns the cumulative failed call count. + FailureCount() uint64 +} + +// ExporterCarrier is implemented by exporter components that expose +// their per-call selftelemetry handle to the runtime. The runtime +// walks built pipelines looking for components satisfying this +// contract; it uses the returned Exporter (if it also satisfies +// FailureRateReader) to feed the `tracecore.exporter.failure_rate` +// observable gauge. +// +// Receiver-author template (see clockreceiver / stdoutexporter): +// +// func (e *myExporter) SelfExporter() selftelemetry.Exporter { +// return e.selfExp +// } +// +// Components that don't implement this are silently skipped — the +// SLO surface degrades to "no per-exporter signal" rather than +// erroring at boot. +type ExporterCarrier interface { + SelfExporter() Exporter +} + // Receiver is the producer-side surface a receiver component writes // to when reporting its own health. The /metrics endpoint that // surfaces these to operators is owned by M2. @@ -28,12 +193,25 @@ type Receiver interface { // receiver_collection_latency_seconds histogram. Receivers // should call this once per collection cycle, measuring from // "start of work" to "consumer.Consume returned". + // + // Conventionally observed for BOTH successful and failed + // pushes — operators see the union and can split via PromQL + // if needed. Splitting into per-result histograms is a future + // ergonomics carry-forward; today the convention is "always + // observe." ObserveLatency(d time.Duration) // SetDegraded transitions the receiver's degraded state. // Implementations derive receiver_degraded_seconds_total from // the time between SetDegraded(true) and SetDegraded(false). // Calling with the current state is a no-op. + // + // Receivers that boot already-degraded (e.g. NVML init fails + // at Start) MUST call SetDegraded(true) explicitly during + // Start so the cumulative-degraded counter begins ticking; + // otherwise `degraded_seconds_total` stays at 0 until first + // recovery, hiding boot-time failures from operators alerting + // on `> 0`. SetDegraded(degraded bool) // MarkActivity records that the receiver successfully diff --git a/internal/selftelemetry/noop.go b/internal/selftelemetry/noop.go index c49fa639..6aa339ce 100644 --- a/internal/selftelemetry/noop.go +++ b/internal/selftelemetry/noop.go @@ -18,3 +18,14 @@ func (noopReceiver) IncEmissions(int64) {} func (noopReceiver) ObserveLatency(time.Duration) {} func (noopReceiver) SetDegraded(bool) {} func (noopReceiver) MarkActivity() {} + +// noopExporter discards every call. Returned by NewNoopExporter for +// tests + as the import-time default. +type noopExporter struct{} + +// NewNoopExporter returns an Exporter that discards every call. +// Safe to share across goroutines (it carries no state). +func NewNoopExporter() Exporter { return noopExporter{} } + +func (noopExporter) IncCallSuccess() {} +func (noopExporter) IncCallFailure(string) {} diff --git a/internal/selftelemetry/receiver_impl.go b/internal/selftelemetry/receiver_impl.go new file mode 100644 index 00000000..15d35732 --- /dev/null +++ b/internal/selftelemetry/receiver_impl.go @@ -0,0 +1,213 @@ +// SPDX-License-Identifier: Apache-2.0 + +package selftelemetry + +import ( + "context" + "errors" + "fmt" + "sync/atomic" + "time" + + "go.opentelemetry.io/otel/attribute" + "go.opentelemetry.io/otel/metric" + + "github.com/tracecoreai/tracecore/internal/pipeline" +) + +// ErrNilMeterProvider is returned by NewReceiver / NewExporter when +// the caller passes a nil metric.MeterProvider. The runtime is +// responsible for substituting `noop.NewMeterProvider()` before +// constructing component telemetry — this error catches a wire-up +// regression rather than letting the noop hide the bug at runtime. +var ErrNilMeterProvider = errors.New("selftelemetry: MeterProvider is nil") + +// Package-stable Meter scope so the Prom exporter doesn't stamp +// per-instance `otel_scope_name` labels — `component_id` already +// partitions; doubling would balloon cardinality. +const instrumentationScope = "github.com/tracecoreai/tracecore/internal/selftelemetry" + +// NewReceiver returns a real Receiver backed by OTel metric +// instruments acquired from `mp`. The component's `id` is attached +// as a `component_id` label on every emission. +// +// Five instruments are registered: +// +// - tracecore.receiver.errors_total (Int64Counter, attr: kind, +// component_id) — IncError. +// - tracecore.receiver.emissions_total (Int64Counter, attr: +// component_id) — IncEmissions. +// - tracecore.receiver.collection_latency_seconds (Float64Histogram, +// attr: component_id) — ObserveLatency. +// - tracecore.receiver.degraded_seconds_total (Float64Observable +// Counter, attr: component_id) — derived from SetDegraded +// timing; reports cumulative time spent degraded, including +// elapsed time while currently degraded. +// - tracecore.receiver.last_activity_unix_seconds (Int64Observable +// Gauge, attr: component_id) — MarkActivity timestamp. +// +// Names follow `tracecore.*` per docs/STRATEGY.md M2 row; the OTel +// Prometheus exporter substitutes `.` for `_` in the scraped text, +// so operators see `tracecore_receiver_*_total`. +func NewReceiver(id pipeline.ID, mp metric.MeterProvider) (Receiver, error) { + if mp == nil { + return nil, ErrNilMeterProvider + } + + meter := mp.Meter(instrumentationScope) + attrSet := attribute.NewSet(attribute.String("component_id", id.String())) + + errsCtr, err := meter.Int64Counter( + "tracecore.receiver.errors_total", + metric.WithDescription("Errors observed by a receiver, partitioned by kind"), + ) + if err != nil { + return nil, fmt.Errorf("errors_total counter: %w", err) + } + + emissionsCtr, err := meter.Int64Counter( + "tracecore.receiver.emissions_total", + metric.WithDescription("Data points / events emitted by a receiver"), + ) + if err != nil { + return nil, fmt.Errorf("emissions_total counter: %w", err) + } + + latencyHist, err := meter.Float64Histogram( + "tracecore.receiver.collection_latency_seconds", + metric.WithDescription("Receiver collection cycle latency in seconds"), + metric.WithUnit("s"), + // Default OTel buckets (5/10/25/…/10000 ms) miss the + // sub-millisecond resolution receivers commonly produce. + // These boundaries span 100µs to 10s logarithmically so a + // 200µs collection cycle and a 2s slow one both land in + // distinct buckets. + metric.WithExplicitBucketBoundaries( + 0.0001, 0.001, 0.005, 0.01, 0.05, + 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0, + ), + ) + if err != nil { + return nil, fmt.Errorf("collection_latency_seconds histogram: %w", err) + } + + r := &receiverImpl{ + attrs: attrSet, + errors: errsCtr, + emissions: emissionsCtr, + latency: latencyHist, + } + // Seed to construction time. Alerts of the form + // `time() - last_activity > 60` would otherwise fire on every + // fresh receiver's zero-valued gauge during boot. + r.activityUnix.Store(time.Now().Unix()) + + if _, err := meter.Float64ObservableCounter( + "tracecore.receiver.degraded_seconds_total", + metric.WithDescription("Cumulative seconds the receiver has been in the degraded state"), + metric.WithUnit("s"), + metric.WithFloat64Callback(func(_ context.Context, obs metric.Float64Observer) error { + obs.Observe(r.degradedTotalSeconds(), metric.WithAttributeSet(attrSet)) + return nil + }), + ); err != nil { + return nil, fmt.Errorf("degraded_seconds_total observable: %w", err) + } + + if _, err := meter.Int64ObservableGauge( + "tracecore.receiver.last_activity_unix_seconds", + metric.WithDescription("Unix-second timestamp of the receiver's last successful activity"), + metric.WithInt64Callback(func(_ context.Context, obs metric.Int64Observer) error { + obs.Observe(r.activityUnix.Load(), metric.WithAttributeSet(attrSet)) + return nil + }), + ); err != nil { + return nil, fmt.Errorf("last_activity_unix_seconds observable: %w", err) + } + + return r, nil +} + +type receiverImpl struct { + attrs attribute.Set + + errors metric.Int64Counter + emissions metric.Int64Counter + latency metric.Float64Histogram + + // degradedAt holds the time of the most recent SetDegraded(true). + // Nil pointer means "not currently degraded." Atomic pointer so + // SetDegraded can update lock-free via CAS/Swap and the + // observable-counter callback reads a stable snapshot. + degradedAt atomic.Pointer[time.Time] + + // accumulated holds the cumulative nanoseconds spent degraded + // across completed degrade→recover cycles. degradedTotalSeconds + // adds the open-interval contribution at observation time. + accumulated atomic.Uint64 + + // activityUnix holds the Unix-second timestamp of the most recent + // MarkActivity. Zero until first call. + activityUnix atomic.Int64 +} + +func (r *receiverImpl) IncError(kind string) { + // One WithAttributes call rather than WithAttributeSet+WithAttributes — + // the latter relies on the SDK's cross-version merge semantics. + componentID, _ := r.attrs.Value("component_id") + r.errors.Add(context.Background(), 1, metric.WithAttributes( + attribute.String("component_id", componentID.AsString()), + attribute.String("kind", kind), + )) +} + +func (r *receiverImpl) IncEmissions(n int64) { + if n < 0 { + return + } + r.emissions.Add(context.Background(), n, metric.WithAttributeSet(r.attrs)) +} + +func (r *receiverImpl) ObserveLatency(d time.Duration) { + r.latency.Record(context.Background(), d.Seconds(), metric.WithAttributeSet(r.attrs)) +} + +// SetDegraded transitions the receiver between degraded and recovered +// states. Lock-free: enter via CAS(nil → &now), exit via Swap → nil +// and accumulate the elapsed interval. The observable callback reads +// both `accumulated` and `degradedAt` without synchronization; on a +// concurrent transition it may briefly observe an under-count (the +// just-closed interval flushed before the pointer cleared) — this is +// a microsecond-scale artifact that self-corrects on the next scrape +// and matches the precision of the underlying histogram bucket. +func (r *receiverImpl) SetDegraded(degraded bool) { + if degraded { + now := time.Now() + // If we're already degraded the CAS fails and we leave the + // existing pointer in place — calling SetDegraded(true) + // twice doesn't reset the start time. + r.degradedAt.CompareAndSwap(nil, &now) + return + } + if old := r.degradedAt.Swap(nil); old != nil { + elapsed := time.Since(*old) + if elapsed > 0 { + r.accumulated.Add(uint64(elapsed.Nanoseconds())) + } + } +} + +func (r *receiverImpl) MarkActivity() { + r.activityUnix.Store(time.Now().Unix()) +} + +// degradedTotalSeconds is called from the observable-counter callback. +// Returns the cumulative degraded-seconds count including any +// currently-open interval. +func (r *receiverImpl) degradedTotalSeconds() float64 { + acc := time.Duration(r.accumulated.Load()) + if openStart := r.degradedAt.Load(); openStart != nil { + acc += time.Since(*openStart) + } + return acc.Seconds() +} diff --git a/internal/telemetry/README.md b/internal/telemetry/README.md new file mode 100644 index 00000000..cbb591d4 --- /dev/null +++ b/internal/telemetry/README.md @@ -0,0 +1,276 @@ +# `internal/telemetry` + +Producer side of tracecore's self-telemetry surface. M2 wires +`MeterProvider` + an HTTP server (`/metrics`, `/healthz`, `/readyz`) +into every component via `pipeline.TelemetrySettings`. + +See [RFC-0006](../../docs/rfcs/RFC-0006-self-telemetry-surface.md) +for the spec. + +## Operator quickstart + +A runnable example config lives at +[`docs/examples/with-telemetry.yaml`](../../docs/examples/with-telemetry.yaml). +The block: + +```yaml +telemetry: + enabled: true + listen: "localhost:8888" + paths: + metrics: /metrics + healthz: /healthz + readyz: /readyz +``` + +Then scrape: + +```bash +curl localhost:8888/metrics +``` + +Default `enabled: false`. The binary doesn't bind a port until the +operator opts in. + +## Endpoints + +| Path | Status | When | +|---|---|---| +| `/metrics` | 200 + text-exposition | always while server up | +| `/healthz` | 200 | server up, not shutting down | +| `/healthz` | 503 | shutting down | +| `/readyz` | 503 | server up, runtime hasn't finished Start | +| `/readyz` | 200 | runtime running | +| `/readyz` | 503 | shutting down | + +`/readyz` does NOT return 503 on transient receiver-degraded events — +that policy decision keeps k8s from evicting pods on a noisy backend. +Operators wanting a sharper signal write their own probe consuming +`/metrics` directly (out of scope for M2). + +## Self-metric taxonomy + +All five `selftelemetry.Receiver` methods surface as +`tracecore.receiver.*` metrics. The OTel Prometheus exporter +substitutes `.` for `_` in the scraped text, so: + +| Method | Prom exposition | +|---|---| +| `IncError(kind)` | `tracecore_receiver_errors_total{kind,component_id}` | +| `IncEmissions(n)` | `tracecore_receiver_emissions_total{component_id}` | +| `ObserveLatency(d)` | `tracecore_receiver_collection_latency_seconds{component_id}` (histogram) | +| `SetDegraded(bool)` | `tracecore_receiver_degraded_seconds_total{component_id}` (observable counter — accumulates while degraded) | +| `MarkActivity()` | `tracecore_receiver_last_activity_unix_seconds{component_id}` (gauge) | + +`component_id` is `"/"` (e.g., +`clockreceiver/primary`). + +`kind` on `errors_total` MUST be low-cardinality — +e.g., `"connect"`, `"parse"`, `"downstream"`. Receivers that pass +arbitrary error messages explode cardinality and degrade the surface. + +### Exporter-side self-metrics + +Exporters use `selftelemetry.NewExporter(id, mp)` — same MeterProvider +flow as receivers. The real impl emits: + +| Method | Prom exposition | +|---|---| +| `IncCallSuccess()` | `tracecore_exporter_calls_total{result="success",component_id}` | +| `IncCallFailure(kind)` | `tracecore_exporter_calls_total{result="failure",kind,component_id}` | + +The runtime aggregates success+failure counts across registered +exporters to feed the `tracecore.exporter.failure_rate` SLO gauge. + +### O2 SLO observable gauges + +Three named gauges driven by `telemetry.RegisterSLOMetrics`: + +| Metric | Source | +|---|---| +| `tracecore_exporter_failure_rate` | **Rolling 60s window** failure / (success+failure) aggregated across registered exporters; 0 while warming up and when no in-window calls. The lifetime ratio is intentionally NOT exposed — operators alerting on `> 0` want recent signal, not historical sediment. Operators wanting custom windows compute from the raw `tracecore_exporter_calls_total` counter via PromQL. | +| `tracecore_queue_depth_ratio` | depth / capacity of an internal queue (M2: always 0; depends on a queue mechanism — carry-forward) | +| `tracecore_component_restart_count_per_hour` | restart rate scaled to per-hour (M2: always 0; depends on runtime restart impl — carry-forward) | + +The two carry-forward gauges still register — operators see the +names in the scrape and can write Prometheus alerts against them +before the underlying mechanisms ship. + +### Probe content negotiation + +`/healthz` and `/readyz` default to `text/plain` bodies for k8s +probes + curl. Tools wanting structured payloads send +`Accept: application/json`: + +```bash +$ curl -H 'Accept: application/json' localhost:8888/readyz +{"status":"ready"} +``` + +The status string is the same value the text body emits +(`"ok" | "ready" | "not ready" | "shutting down"`). Future fields +added to the JSON payload don't break the text contract. + +### Build identity + +`tracecore_build_info` is an observable gauge whose value is always +1 and whose labels carry the running binary's identity. Mirrors the +`otelcol_build_info` / `prometheus_build_info` convention. + +| Label | Source | +|---|---| +| `command` | `BuildInfo.Command`, typically `"tracecore"` | +| `version` | `BuildInfo.Version`, from `internal/version` (e.g. `"v0.1.0-m1-22-g6b4e703"`) | +| `revision` | `internal/version.Revision`, the short git SHA | + +Use it as a PromQL join-target to surface version metadata next to +any tracecore series: + +```promql +rate(tracecore_exporter_calls_total[5m]) + * on() group_left(version, revision) tracecore_build_info +``` + +One series per process (revision changes on rebuild = new series; +this is expected and matches OTel / Prometheus convention). + +## Security posture + +The HTTP surface ships PLAINTEXT — no TLS, no authentication, no +authorization. Default `listen: "localhost:8888"` keeps it +loopback-only; flipping to a non-loopback address (e.g. `":8888"`) +exposes `/metrics`, `/healthz`, `/readyz` plaintext to anything +that can route a TCP packet to the port. + +For non-loopback deployments: + +- Front with a reverse proxy (Envoy, nginx, Istio sidecar) that + terminates TLS and (optionally) enforces authn/authz. +- Allow inbound TCP only from your Prometheus scraper's IP range. +- `/metrics` leaks operational topology (component IDs, exporter + kinds, failure counts, build SHA + version via + `tracecore_build_info`) — treat the endpoint as RFC-1918 / internal. + +The slowloris-style timeouts (`ReadHeaderTimeout`, `ReadTimeout`, +`WriteTimeout`, `IdleTimeout`) and the 8 KiB `MaxHeaderBytes` cap +keep a single-attacker resource burn bounded if exposed, but they +do not substitute for the proxy. + +A panic in any handler (including OTel observable callbacks like +SLOSource computations) is caught by middleware: the request gets +a 500 and the panic value lands in the configured Logger via +`cfg.Logger.Error`. The binary stays alive. + +## Performance + +Self-telemetry must not perturb the workload (PRINCIPLES §1). +Benchmarks at `internal/telemetry/slo_bench_test.go`: + +### SLO callback (AggregateSLOSource.ExporterFailureRate) + +Measures the full per-scrape path — walk registered exporters, +sample their atomic counters, run the rolling-window math, append +to the sample ring. + +| Registered exporters | ns/op | B/op | allocs/op | +|---|---|---|---| +| 1 | ~3.85µs | 268 | 0 | +| 10 | ~3.87µs | 268 | 0 | +| 100 | ~4.51µs | 268 | 0 | + +### WindowedRate.Observe (primitive in isolation) + +Measures the rolling-window math without the registry walk — +sizes the cost a future SLI gauge inherits when it reuses +`WindowedRate` directly. + +| | ns/op | B/op | allocs/op | +|---|---|---|---| +| Steady-state Observe | ~3.85µs | 268 | 0 | + +(Apple M4 Pro, Go 1.26.3, `-race=false`. Re-run via +`go test -bench Benchmark ./internal/telemetry/ -benchmem -count=3`.) + +### What this means in operator terms + +- 15s Prometheus scrape interval × 4.5µs/scrape = **0.00003% CPU** + on one core, even at 100 registered exporters. +- 0 allocs/op steady-state — the ring buffer reuses backing memory + past the 2× window pruning point. +- NORTHSTARS O2 budgets per-receiver overhead at <0.05%; this + surface contributes negligible to that envelope. + +### Regression detection + +Treat the numbers above as a baseline. To check for regression +locally: + +```bash +go test -bench Benchmark -benchmem -count=10 ./internal/telemetry/ > new.txt +# compare against baseline.txt (from main): +go tool benchstat baseline.txt new.txt +``` + +A degradation of >10% (geometric mean across the four bench rows) +should block a PR; CI integration of this gate is filed under +`docs/FOLLOWUPS.md` "bench regression detection in CI." Until that +lands, this is a manual check at PR-review time. + +## Backend compatibility + +`/metrics` emits standard Prometheus text-exposition format via the +OTel Prometheus exporter (`EnableOpenMetrics: false`). Compatibility +matrix: + +| Backend | Status | Notes | +|---|---|---| +| Prometheus + Grafana | **verified** | Standard `scrape_configs` job; auto-discovery via k8s annotations works | +| Mimir, Thanos | **verified-by-protocol** | Prom-compatible scrape interfaces; not separately CI-tested | +| VictoriaMetrics | **verified-by-protocol** | Same | +| Datadog Agent (OpenMetrics integration) | **unverified** | The Agent's `openmetrics` check should scrape this output, but tracecore has not been end-to-end-tested against a Datadog Agent. Operators report back via FOLLOWUPS.md if it works | +| OTLP push backends | **not supported in M2** | Pull-only surface; add OTLP push reader to the MeterProvider as a future milestone | + +"verified" = CI-tested against a live instance. +"verified-by-protocol" = should work by spec but not separately tested. +"unverified" = the claim is reasoned but no one's run it. + +## Lifecycle invariants + +- `Server.Start` binds the listener SYNCHRONOUSLY — port-conflict + errors surface before the goroutine spawns. +- `Server.Shutdown` is idempotent; safe to call from a shutdown + unwinder without checking whether Start ran. +- The graceful-stop window is 800ms (leaves headroom in the + PRINCIPLES §1 1s overall budget). +- `MeterProvider.Shutdown` drains pending exports. The pull-based + Prometheus reader has no remote flush, so the typical duration + is microseconds. + +## Failure modes + +Per `docs/FAILURE-MODES.md`. Most operator-visible: + +- Port in use → `Server.Start` returns the bind error; `cmd/tracecore` + exits with a non-zero code BEFORE the pipeline starts. No partial + state. +- Bad `listen` address in YAML → `tracecore validate` rejects with + `telemetry.listen: invalid host:port "X"`. +- Non-absolute path (`metrics: metrics`) → validate rejects with + `telemetry.paths.metrics: must be an absolute path starting with '/'`. + +## Receiver-author wiring + +The canonical pattern is in `components/receivers/clockreceiver`: + +```go +sr := selftelemetry.NewNoopReceiver() +if set.Telemetry.MeterProvider != nil { + if r, err := selftelemetry.NewReceiver(set.ID, set.Telemetry.MeterProvider); err == nil { + sr = r + } +} +``` + +The fallback to noop ensures the hot path never nil-checks. Receivers +then call `sr.IncEmissions(n)`, `sr.MarkActivity()`, `sr.IncError(kind)` +as they collect. diff --git a/internal/telemetry/SECURITY.md b/internal/telemetry/SECURITY.md new file mode 100644 index 00000000..27479b6e --- /dev/null +++ b/internal/telemetry/SECURITY.md @@ -0,0 +1,125 @@ +# Self-telemetry surface — threat model (M2) + +Operator-facing 1-pager. Covers the HTTP surface tracecore exposes +when `telemetry.enabled: true`. Pairs with `README.md`'s "Security +posture" section. + +## Scope + +In: `/metrics`, `/healthz`, `/readyz` over an operator-configured +TCP listener. The MeterProvider, HTTP server, panic-recovery +middleware, JSON/text content negotiation, and the SLO/build_info +observable gauges. + +Out: the underlying receiver/exporter components (their threat +models live with each component); the configuration loader's YAML +parsing (covered by `internal/config`); the binary's process-level +signal handling (covered by `cmd/tracecore`). + +## Attacker model + +Single attacker class: **a network principal that can reach the +listen socket.** Capability scoped to HTTP requests against the +mounted paths. No control-plane vectors, no inbound writes, no RPC +endpoints. + +The attacker's goals: + +| Goal | Plausibility | Notes | +|---|---|---| +| Read tracecore-internal metric values | High when listener is non-loopback. The operator opted in by binding outside `localhost` | Default `localhost:8888` makes this require explicit operator action | +| Exhaust resources via slow-headers / slow-body | High under non-loopback exposure | Mitigated by `ReadHeaderTimeout`, `ReadTimeout`, `MaxHeaderBytes` | +| Crash the binary via crafted scrape | Low | promhttp's gather recovery + tracecore's `recoverHandler` middleware catch handler panics including buggy OTel observable callbacks | +| Tamper with metric data | Out of scope | No write endpoints; the surface is read-only by construction | +| Learn topology / version identity for downstream exploitation | High | `/metrics` exposes `component_id`, kinds, build SHA via `tracecore_build_info`. Treat as RFC-1918 / internal-only | + +## Mitigations in this PR + +### Default-safe binding + +- `telemetry.enabled: false` by default — no socket bound unless + the operator opts in. +- When enabled, `telemetry.listen` defaults to `localhost:8888`. + Operators flipping to a public interface (e.g., `:8888`) + acknowledge the change. +- Validation rejects empty or malformed listen addresses at + `tracecore validate`; misconfiguration doesn't reach server + start. + +### Time / resource bounds + +| Server field | Value | Defends against | +|---|---|---| +| `ReadHeaderTimeout` | 5s | Slow-header (slowloris classic) | +| `ReadTimeout` | 10s | Slow-body | +| `WriteTimeout` | 30s | Slow-read by client | +| `IdleTimeout` | 120s | Keep-alive parking | +| `MaxHeaderBytes` | 8 KiB | Per-connection memory ceiling | + +### Panic recovery + +`recoverHandler` wraps the ServeMux. Any panic — promhttp internals, +OTel observable callbacks, or future custom handlers — is caught, +logged via the configured `slog.Logger` with the request path and +panic value, and surfaced to the client as 500. The binary stays +up. See `internal/telemetry/server.go`'s `recoverHandler` for the +exact code path. + +### Cache discipline + +`/healthz` and `/readyz` set `Cache-Control: no-store` so a +misconfigured intermediate proxy can't serve stale 200s after the +pod transitioned to 503 — k8s probes get the live signal. + +### Content-type honesty + +Status responses set `Content-Type` explicitly (`text/plain` or +`application/json` per Accept negotiation) so future code paths +that write headers before the body don't get sniffed Content-Types. + +### Cardinality contract + +Metric label values are bounded — Kind/Reason constants in +`internal/selftelemetry`; `component_id` is the receiver's pipeline +ID. A receiver passing `err.Error()` to `IncError` would violate +the contract; the violation is documentation-only today (runtime +enforcement is a FOLLOWUPS item). Bounded cardinality protects +downstream Prom servers from denial-of-amplification. + +## Not mitigated in M2 + +| Gap | Workaround | Carry-forward target | +|---|---|---| +| No TLS — plaintext only | Reverse proxy (Envoy / nginx / Istio sidecar) terminates TLS | Schema sketched in RFC-0006; impl deferred | +| No authn / authz | Reverse proxy or service mesh enforces | Same | +| No rate limit on `/metrics` | Slowloris timeouts cap single-attacker burn; reverse-proxy adds layered defense | FOLLOWUPS "Rate limit on /metrics" (M5 hardening) | +| `Server:` header not stripped | Cosmetic fingerprinting; reverse proxy strips | FOLLOWUPS opportunistic | +| No anonymization layer for receiver-emitted labels | Receivers respect the cardinality contract by convention | Runtime cap is a FOLLOWUPS item | + +## Recommended deployment posture + +For production / non-loopback exposure: + +1. Front with a reverse proxy (Envoy, nginx, Istio sidecar) that + terminates TLS and enforces authn/authz against your fleet's + service-mesh identity model. +2. Restrict inbound TCP at the network layer (NetworkPolicy in + k8s, security group in cloud) so only your Prometheus scraper's + IP range reaches the listener. +3. Run as an unprivileged user with `ProtectHome=true`, + `ProtectSystem=strict`, `NoNewPrivileges=true` (systemd) or + equivalent (k8s `securityContext`). +4. Set `Accept: application/json` from your monitoring stack if + structured payloads are easier than text — both are equally + safe. + +For local dev / single-node operation: defaults are sufficient. +localhost-binding means the kernel restricts visibility to local +processes. + +## Disclosure + +Security issues in the surface go to the path documented in +`SECURITY.md` at the repo root. Don't open public issues for +findings that could let an attacker degrade tracecore on a +production deployment. diff --git a/internal/telemetry/build_info.go b/internal/telemetry/build_info.go new file mode 100644 index 00000000..147ca90a --- /dev/null +++ b/internal/telemetry/build_info.go @@ -0,0 +1,72 @@ +// SPDX-License-Identifier: Apache-2.0 + +package telemetry + +import ( + "context" + "errors" + "fmt" + "regexp" + "sort" + + "go.opentelemetry.io/otel/attribute" + "go.opentelemetry.io/otel/metric" +) + +// Reject invalid Prometheus label names at registration time. Keys +// outside `[a-zA-Z_][a-zA-Z0-9_]*` get mangled to `_` by the +// OTel→Prom translator, silently colliding. +var labelKeyPattern = regexp.MustCompile(`^[a-zA-Z_][a-zA-Z0-9_]*$`) + +// RegisterBuildInfo registers `tracecore.build.info` — an observable +// gauge whose value is always 1 and whose labels identify the running +// binary. Operators join against it in PromQL to surface version +// metadata next to any other tracecore series: +// +// rate(tracecore_exporter_calls_total[5m]) +// * on() group_left(version, revision) +// tracecore_build_info +// +// Mirrors the convention used by OTel Collector +// (`otelcol_build_info`), Prometheus itself (`prometheus_build_info`), +// and most Go services. +// +// `labels` carries every operator-visible identifier. Keys MUST be +// low-cardinality (max ~5 keys; values stable across a process's +// lifetime). The function does not enforce this — operators choosing +// to add per-pod identifiers are responsible for their scrape budget. +func RegisterBuildInfo(mp metric.MeterProvider, labels map[string]string) error { + if mp == nil { + return errors.New("telemetry.RegisterBuildInfo: MeterProvider is nil") + } + + // Materialize attributes once at registration; the callback path + // stays allocation-free at scrape time. + keys := make([]string, 0, len(labels)) + for k := range labels { + if !labelKeyPattern.MatchString(k) { + return fmt.Errorf("telemetry.RegisterBuildInfo: label key %q is not a valid Prometheus label name (must match [a-zA-Z_][a-zA-Z0-9_]*)", k) + } + keys = append(keys, k) + } + sort.Strings(keys) // stable label order = stable scrape output + + attrs := make([]attribute.KeyValue, 0, len(labels)) + for _, k := range keys { + attrs = append(attrs, attribute.String(k, labels[k])) + } + attrSet := attribute.NewSet(attrs...) + + meter := mp.Meter("tracecore.build") + if _, err := meter.Int64ObservableGauge( + "tracecore.build.info", + metric.WithDescription("Build identity: value always 1; labels carry version/command/revision."), + metric.WithInt64Callback(func(_ context.Context, obs metric.Int64Observer) error { + obs.Observe(1, metric.WithAttributeSet(attrSet)) + return nil + }), + ); err != nil { + return fmt.Errorf("register tracecore.build.info: %w", err) + } + return nil +} diff --git a/internal/telemetry/build_info_test.go b/internal/telemetry/build_info_test.go new file mode 100644 index 00000000..6fad2d20 --- /dev/null +++ b/internal/telemetry/build_info_test.go @@ -0,0 +1,69 @@ +// SPDX-License-Identifier: Apache-2.0 + +package telemetry_test + +import ( + "context" + "net/http/httptest" + "testing" + + "github.com/stretchr/testify/require" + + "github.com/tracecoreai/tracecore/internal/telemetry" +) + +// TestRegisterBuildInfo_AppearsInScrape pins the operator contract: +// after registration, `tracecore_build_info{...}=1` is scrapable. +func TestRegisterBuildInfo_AppearsInScrape(t *testing.T) { + t.Parallel() + + mp, err := telemetry.NewMeterProvider() + require.NoError(t, err) + t.Cleanup(func() { _ = mp.Shutdown(context.Background()) }) + + require.NoError(t, telemetry.RegisterBuildInfo(mp.Provider, map[string]string{ + "command": "tracecore", + "version": "v0.2.0", + "revision": "abc1234", + })) + + srv := httptest.NewServer(mp.PromHandler()) + t.Cleanup(srv.Close) + + body := scrapeText(t, srv.URL) + require.Contains(t, body, "tracecore_build_info") + require.Regexp(t, + `tracecore_build_info\{[^}]*command="tracecore"[^}]*revision="abc1234"[^}]*version="v0\.2\.0"[^}]*\}\s+1`, + body, "build_info must include the canonical labels with value 1") +} + +// TestRegisterBuildInfo_RejectsNilMP guards the wire-up against a +// caller that skipped MeterProvider construction. +func TestRegisterBuildInfo_RejectsNilMP(t *testing.T) { + t.Parallel() + require.Error(t, telemetry.RegisterBuildInfo(nil, map[string]string{"version": "x"})) +} + +// TestRegisterBuildInfo_RejectsBadLabelKey pins the Pass-2 +// security-review finding: invalid Prometheus label names get +// mangled to `_` silently by the OTel→Prom translator, causing +// hidden collisions. Reject at registration so an operator typo +// surfaces immediately rather than appearing as a missing label. +func TestRegisterBuildInfo_RejectsBadLabelKey(t *testing.T) { + t.Parallel() + + mp, err := telemetry.NewMeterProvider() + require.NoError(t, err) + t.Cleanup(func() { _ = mp.Shutdown(context.Background()) }) + + cases := []map[string]string{ + {"build-info": "x"}, // hyphen + {"build.info": "x"}, // dot + {"1version": "x"}, // leading digit + {"": "x"}, // empty + } + for _, labels := range cases { + require.Error(t, telemetry.RegisterBuildInfo(mp.Provider, labels), + "must reject invalid label key in %v", labels) + } +} diff --git a/internal/telemetry/doc.go b/internal/telemetry/doc.go new file mode 100644 index 00000000..2fa806ae --- /dev/null +++ b/internal/telemetry/doc.go @@ -0,0 +1,51 @@ +// SPDX-License-Identifier: Apache-2.0 + +// Package telemetry is tracecore's operator-facing observability +// SURFACE: the process-level MeterProvider, the HTTP server that +// exposes /metrics + /healthz + /readyz, the SLO observable gauges, +// and the build_info join-target. +// +// # Distinction from internal/selftelemetry +// +// tracecore has two observability-adjacent packages with names that +// look similar but solve different problems: +// +// - internal/telemetry (THIS package) — process-level service. +// Owns the MeterProvider, the HTTP listener, and the metric +// families an operator scrapes. One instance per binary. +// Wired by cmd/tracecore at boot. +// +// - internal/selftelemetry — per-component PRODUCER CONTRACT. +// Defines the Receiver + Exporter interfaces a component writes +// into when reporting its own health. One instance per +// component. Wired by the component's factory at construction. +// +// In short: this package serves the metrics; selftelemetry defines +// what receivers/exporters emit into them. +// +// # When to add code where +// +// - "I want a new operator-facing metric (something every component +// contributes to, like a global counter or SLO gauge)" → here. +// - "I want receivers/exporters to emit a new per-component signal" +// → internal/selftelemetry adds the interface method; this +// package's instrument registration follows. +// - "I want to change /metrics, /healthz, /readyz behavior or add +// a new HTTP route" → here. +// - "I want to change a receiver's IncError taxonomy" → +// internal/selftelemetry's Kind* constants. +// +// # Architecture intent (future milestones) +// +// This package is the central nervous system for process-level +// observability. As of M2 it owns metrics; when tracing lands in a +// future milestone the TracerProvider construction + tracing +// endpoint will live here too. The split between this package and +// internal/selftelemetry is signal-direction-driven (consumer vs +// producer), not modality-driven (metrics vs traces). Don't split +// metrics + tracing across packages; split surface (here) vs +// contract (selftelemetry). +// +// See docs/rfcs/RFC-0006-self-telemetry-surface.md for the M2 +// spec and the rationale behind the package boundary. +package telemetry diff --git a/internal/telemetry/export_test.go b/internal/telemetry/export_test.go new file mode 100644 index 00000000..70555b73 --- /dev/null +++ b/internal/telemetry/export_test.go @@ -0,0 +1,23 @@ +// SPDX-License-Identifier: Apache-2.0 + +package telemetry + +import "net" + +// This file aggregates all test-only re-exports for the telemetry +// package. Go convention: a single `export_test.go` per package +// holds the test-visible escape hatches. Production code must not +// reach for any of these. + +// StartWithListenerForTest serves the provided listener directly +// (skipping the cfg.Listen bind). Reserved for fault-injection tests +// that need to force a non-clean Serve exit. +func (s *Server) StartWithListenerForTest(ln net.Listener) error { + return s.start(ln) +} + +// HTTPAddr exposes the bound listen address for tests that need +// to scrape via a real HTTP round-trip after Start. +func (s *Server) HTTPAddr() string { + return "http://" + s.cfg.Listen +} diff --git a/internal/telemetry/leak_test.go b/internal/telemetry/leak_test.go new file mode 100644 index 00000000..a985b2d7 --- /dev/null +++ b/internal/telemetry/leak_test.go @@ -0,0 +1,68 @@ +// SPDX-License-Identifier: Apache-2.0 + +package telemetry_test + +import ( + "context" + "os" + "testing" + + "github.com/stretchr/testify/require" + "go.uber.org/goleak" + + "github.com/tracecoreai/tracecore/internal/telemetry" +) + +// TestServer_RepeatedStartShutdown_NoLeaks pins falsifier IF2 from +// `docs/loops/m2-research.md` Pass 5: "If the listener fd is leaked +// across repeated Start/Shutdown cycles, tests will eventually fail +// with EMFILE." Verified two ways: +// +// 1. goleak.Find catches any goroutine that didn't exit cleanly. +// 2. /proc/self/fd or syscall.Getrlimit-based fd count check: the +// binary's fd-count after N cycles must not grow by N. +// +// Ten cycles is enough to surface a leak under `make ci` without +// inflating wall-clock noticeably. +func TestServer_RepeatedStartShutdown_NoLeaks(t *testing.T) { + mp, err := telemetry.NewMeterProvider() + require.NoError(t, err) + t.Cleanup(func() { _ = mp.Shutdown(context.Background()) }) + + fdBefore, fdSupported := countOpenFDs() + + for i := 0; i < 10; i++ { + srv, err := telemetry.NewServer(telemetry.ServerConfig{ + Listen: freePort(t), + MeterProvider: mp, + Paths: telemetry.Paths{Metrics: "/metrics", Healthz: "/healthz", Readyz: "/readyz"}, + }) + require.NoError(t, err, "cycle %d", i) + require.NoError(t, srv.Start(t.Context()), "cycle %d", i) + require.NoError(t, srv.Shutdown(context.Background()), "cycle %d", i) + } + + require.NoError(t, goleak.Find(goleak.IgnoreCurrent())) + + if !fdSupported { + t.Log("no /proc/self/fd or /dev/fd; fd-count check skipped (goleak ran)") + return + } + // A small slack absorbs test-runtime fd churn (goroutine stack + // growth picking up scratch fds). A listener leak of 1 fd/cycle + // over 10 cycles produces a delta ≥10; legitimate churn is 0–2. + fdAfter, _ := countOpenFDs() + require.LessOrEqual(t, fdAfter-fdBefore, 3, + "fd count grew by >3 across 10 Start/Shutdown cycles (before=%d after=%d) — listener fd leak", + fdBefore, fdAfter) +} + +func countOpenFDs() (int, bool) { + for _, dir := range []string{"/proc/self/fd", "/dev/fd"} { + entries, err := os.ReadDir(dir) + if err == nil { + return len(entries), true + } + } + return 0, false +} diff --git a/internal/telemetry/meter_provider.go b/internal/telemetry/meter_provider.go new file mode 100644 index 00000000..a83d610c --- /dev/null +++ b/internal/telemetry/meter_provider.go @@ -0,0 +1,108 @@ +// SPDX-License-Identifier: Apache-2.0 + +package telemetry + +import ( + "context" + "fmt" + "net/http" + "sync" + + "github.com/prometheus/client_golang/prometheus" + "github.com/prometheus/client_golang/prometheus/promhttp" + otelprom "go.opentelemetry.io/otel/exporters/prometheus" + "go.opentelemetry.io/otel/metric" + sdkmetric "go.opentelemetry.io/otel/sdk/metric" +) + +// MeterProvider bundles the OTel MeterProvider receivers acquire a +// Meter from, the *prometheus.Registry it feeds, and the lifecycle +// hooks the runtime needs to drain on shutdown. +// +// The MeterProvider field is what receivers see via +// `pipeline.TelemetrySettings.MeterProvider`; the Registry feeds the +// /metrics endpoint via PromHandler. +type MeterProvider struct { + // Provider is the metric.MeterProvider receivers acquire a Meter + // from. Backed by the OTel SDK with the bundled Prometheus + // exporter as its sole Reader. + Provider metric.MeterProvider + + // Registry is the *prometheus.Registry the OTel exporter writes + // into and the /metrics handler reads from. + Registry *prometheus.Registry + + // provider holds the concrete SDK provider so Shutdown can drain + // pending exports. Distinct from Provider (the interface) so + // callers that store this struct via the public field can't + // accidentally call SDK-only methods. + provider *sdkmetric.MeterProvider + + // promHandler is the cached http.Handler returned by PromHandler. + // Set once at NewMeterProvider time; promhttp.HandlerFor creates + // a closure over the registry, so re-creating per call wastes + // allocations + GC churn under high scrape cadence. + promHandler http.Handler + + shutdownOnce sync.Once + shutdownErr error +} + +// NewMeterProvider constructs the M2 production MeterProvider: +// `*prometheus.Registry` + OTel Prometheus exporter as the sole +// `sdkmetric.Reader`. The returned MeterProvider can be wired into +// `pipeline.TelemetrySettings.MeterProvider` and its PromHandler() is +// the /metrics endpoint operators scrape. +// +// Callers must call Shutdown during runtime shutdown to drain any +// pending exports — the Prometheus exporter is pull-based, so this is +// typically a no-op, but the contract is preserved for parity with +// other reader types we might add later (e.g., an OTLP push reader). +func NewMeterProvider() (*MeterProvider, error) { + reg := prometheus.NewRegistry() + + // WithRegisterer keeps Prometheus's default registry untouched. + // Operators wanting a single shared registry for tracecore + their + // own metrics use WithRegistry on the server config (post-M2). + exporter, err := otelprom.New(otelprom.WithRegisterer(reg)) + if err != nil { + return nil, fmt.Errorf("construct OTel Prometheus exporter: %w", err) + } + + sdkp := sdkmetric.NewMeterProvider(sdkmetric.WithReader(exporter)) + + return &MeterProvider{ + Provider: sdkp, + Registry: reg, + provider: sdkp, + promHandler: promhttp.HandlerFor(reg, promhttp.HandlerOpts{ + // EnableOpenMetrics false — Prometheus + Grafana both + // still default to the older format; flipping requires + // a backend negotiation we don't ship in M2. + EnableOpenMetrics: false, + }), + }, nil +} + +// PromHandler returns the cached http.Handler that serves the +// text-exposition-format metrics scrape. Set at construction; safe +// for repeated calls without allocation. +func (m *MeterProvider) PromHandler() http.Handler { + return m.promHandler +} + +// Shutdown drains pending exports. Safe to call from a shutdown +// unwinder that doesn't know whether Start finished — calling twice +// is a no-op (idempotency contract documented in +// internal/pipeline/component.go). +func (m *MeterProvider) Shutdown(ctx context.Context) error { + m.shutdownOnce.Do(func() { + if m.provider == nil { + return + } + if err := m.provider.Shutdown(ctx); err != nil { + m.shutdownErr = fmt.Errorf("shutdown OTel meter provider: %w", err) + } + }) + return m.shutdownErr +} diff --git a/internal/telemetry/meter_provider_test.go b/internal/telemetry/meter_provider_test.go new file mode 100644 index 00000000..02bcc304 --- /dev/null +++ b/internal/telemetry/meter_provider_test.go @@ -0,0 +1,95 @@ +// SPDX-License-Identifier: Apache-2.0 + +package telemetry_test + +import ( + "context" + "net/http/httptest" + "strings" + "testing" + + "github.com/stretchr/testify/require" + "go.opentelemetry.io/otel/metric" + + "github.com/tracecoreai/tracecore/internal/telemetry" +) + +// TestNewMeterProvider_YieldsWorkingMeter pins the contract M2 work-item +// 2 declares: NewMeterProvider returns a metric.MeterProvider whose +// Meter("kind") yields a working instrument. A Counter.Add is observable +// downstream — here, via the bundled Prometheus exporter that +// NewMeterProvider wires in. +// +// We don't crack open the SDK internals; we observe via the operator's +// view (the registered prometheus.Gatherer producing text-exposition +// output). That's the only contract receivers + operators actually +// care about. +func TestNewMeterProvider_YieldsWorkingMeter(t *testing.T) { + t.Parallel() + + mp, err := telemetry.NewMeterProvider() + require.NoError(t, err, "NewMeterProvider must succeed for default settings") + require.NotNil(t, mp.Provider) + require.NotNil(t, mp.Registry) + t.Cleanup(func() { + require.NoError(t, mp.Shutdown(context.Background())) + }) + + meter := mp.Provider.Meter("tracecore.test") + require.NotNil(t, meter) + + c, err := meter.Int64Counter("tracecore.test.counter", + metric.WithDescription("test counter")) + require.NoError(t, err) + + c.Add(context.Background(), 7) + c.Add(context.Background(), 5) + + // Scrape the Prometheus registry through the standard handler. + srv := httptest.NewServer(mp.PromHandler()) + t.Cleanup(srv.Close) + + body := scrapeText(t, srv.URL) + require.Contains(t, body, "tracecore_test_counter_total", "Prom exporter exposes the registered counter") + // 7 + 5 = 12. The OTel Prom exporter stamps an otel_scope_name + // label onto the metric line, so check the value appears on the + // same line as the metric name, not at a fixed offset. + require.Regexp(t, `tracecore_test_counter_total\{[^}]*\}\s+12`, body) +} + +// TestNewMeterProvider_ShutdownIsIdempotent pins that Shutdown is safe +// to call from the lifecycle's shutdown unwinder even when an earlier +// failure means Start never finished. +func TestNewMeterProvider_ShutdownIsIdempotent(t *testing.T) { + t.Parallel() + + mp, err := telemetry.NewMeterProvider() + require.NoError(t, err) + + require.NoError(t, mp.Shutdown(context.Background())) + require.NoError(t, mp.Shutdown(context.Background()), "second Shutdown must be a no-op") +} + +// TestNewMeterProvider_PromHandler_AcceptsHEAD pins that an operator +// doing a quick health probe via `curl -I :8888/metrics` (HEAD) gets a +// useful response — both Prometheus servers and standard k8s probes +// occasionally use HEAD. promhttp's HandlerFor supports HEAD by +// default; a regression that disables it would break those probes. +func TestNewMeterProvider_PromHandler_AcceptsHEAD(t *testing.T) { + t.Parallel() + + mp, err := telemetry.NewMeterProvider() + require.NoError(t, err) + t.Cleanup(func() { _ = mp.Shutdown(context.Background()) }) + + srv := httptest.NewServer(mp.PromHandler()) + t.Cleanup(srv.Close) + + req, err := newRequest(t, "HEAD", srv.URL, strings.NewReader("")) + require.NoError(t, err) + resp, err := srv.Client().Do(req) + require.NoError(t, err) + t.Cleanup(func() { _ = resp.Body.Close() }) + + require.Equal(t, 200, resp.StatusCode, "HEAD must return 200 like GET") +} diff --git a/internal/telemetry/server.go b/internal/telemetry/server.go new file mode 100644 index 00000000..6fe04e7f --- /dev/null +++ b/internal/telemetry/server.go @@ -0,0 +1,321 @@ +// SPDX-License-Identifier: Apache-2.0 + +package telemetry + +import ( + "context" + "encoding/json" + "errors" + "fmt" + "io" + "log/slog" + "net" + "net/http" + "strings" + "sync/atomic" + "time" + + "github.com/tracecoreai/tracecore/internal/config" +) + +// Pre-marshal probe payloads so the hot scrape path doesn't +// allocate per request. +var probeJSONBodies = mustMarshalProbeBodies() + +func mustMarshalProbeBodies() map[string][]byte { + statuses := []string{"ok", "ready", "not ready", "shutting down"} + out := make(map[string][]byte, len(statuses)) + for _, s := range statuses { + b, err := json.Marshal(map[string]string{"status": s}) + if err != nil { + panic("telemetry: marshal probe body: " + err.Error()) + } + out[s] = append(b, '\n') + } + return out +} + +// ShutdownBudget bounds the HTTP-server graceful-stop window per +// PRINCIPLES §1 (1s total shutdown). Receivers + exporters +// already split a longer budget; the telemetry surface gets the +// shortest tail because it isn't on the data path. +const ShutdownBudget = 800 * time.Millisecond + +// validateMountPath delegates to the shared config.ValidateMountPath +// rules and prefixes the operator-facing error with the field path +// inside ServerConfig so the failure points at the actual call site. +func validateMountPath(name, p string) error { + if err := config.ValidateMountPath(p); err != nil { + return fmt.Errorf("telemetry server: %s %w", name, err) + } + return nil +} + +// ServerConfig captures everything the Server needs at construction. +// Fields are required unless documented otherwise. Validation happens +// in NewServer; callers should not zero-init the struct without +// passing through validation. +type ServerConfig struct { + // Listen is the operator-facing bind address (e.g. + // "localhost:8888"). Required. + Listen string + + // MeterProvider supplies the /metrics handler. Required. + MeterProvider *MeterProvider + + // Paths groups the three route knobs that mirror the operator- + // facing `telemetry.paths.*` YAML block (see + // internal/config.TelemetryPaths). Keeping the shape parallel + // means the YAML → Go mapping is mechanical and a future + // renaming changes one struct, not five fields across the + // codebase. All required and must start with "/". + Paths Paths + + // ReadyFn returns true once the runtime considers itself "ready" + // — every Component.Start has returned. Optional; if nil, /readyz + // stays 503 (matches RFC-0006: "before [ready flips true] + // /readyz returns 503"). Production wire-up sets this to the + // runtime's "all components started" predicate. + ReadyFn func() bool + + // Logger surfaces non-clean Serve exits (listener crash, fd + // exhaustion, etc.) so operators see why the surface stopped. + // Optional; defaults to a logger that discards output. Production + // wire-up should pass the runtime's slog. + Logger *slog.Logger + + _ struct{} +} + +// Paths captures the three HTTP route knobs the Server mounts on a +// single ServeMux. Shape mirrors `config.TelemetryPaths` so the +// YAML-to-Go mapping is a field-by-field copy. +type Paths struct { + Metrics string + Healthz string + Readyz string + + _ struct{} +} + +// ServerOption mutates a ServerConfig in a controlled way; used by +// tests and the runtime wire-up to override individual fields without +// re-specifying the whole struct. +type ServerOption func(*ServerConfig) + +// WithReadyFn sets the predicate /readyz consults. +func WithReadyFn(f func() bool) ServerOption { + return func(c *ServerConfig) { c.ReadyFn = f } +} + +// Server owns the HTTP listener + ServeMux that exposes /metrics, +// /healthz, /readyz on a single port. +type Server struct { + cfg ServerConfig + mux *http.ServeMux + httpSrv *http.Server + + // started + shuttingDown gate the lifecycle. Atomic so the hot- + // path handlers can read shuttingDown without locking and + // Start/Shutdown can race the listener bind without panicking. + shuttingDown atomic.Bool + started atomic.Bool + doneCh chan struct{} +} + +// NewServer validates cfg and prepares the Server for Start. The +// listener is not opened until Start. +func NewServer(cfg ServerConfig) (*Server, error) { + if cfg.Listen == "" { + return nil, errors.New("telemetry server: Listen is required") + } + if cfg.MeterProvider == nil { + return nil, errors.New("telemetry server: MeterProvider is required") + } + if err := validateMountPath("Paths.Metrics", cfg.Paths.Metrics); err != nil { + return nil, err + } + if err := validateMountPath("Paths.Healthz", cfg.Paths.Healthz); err != nil { + return nil, err + } + if err := validateMountPath("Paths.Readyz", cfg.Paths.Readyz); err != nil { + return nil, err + } + if cfg.Logger == nil { + cfg.Logger = slog.New(slog.NewTextHandler(io.Discard, nil)) + } + + s := &Server{ + cfg: cfg, + mux: http.NewServeMux(), + doneCh: make(chan struct{}), + } + + s.mux.Handle(cfg.Paths.Metrics, cfg.MeterProvider.PromHandler()) + s.mux.HandleFunc(cfg.Paths.Healthz, s.handleHealthz) + s.mux.HandleFunc(cfg.Paths.Readyz, s.handleReadyz) + + s.httpSrv = &http.Server{ + Addr: cfg.Listen, + Handler: s.recoverHandler(s.mux), + // Conservative timeouts cap slowloris-style attack surface + // when operators bind on a non-localhost interface. Scrapes + // finish in milliseconds; nothing legitimate hits these. + ReadHeaderTimeout: 5 * time.Second, + ReadTimeout: 10 * time.Second, + WriteTimeout: 30 * time.Second, + IdleTimeout: 120 * time.Second, + // 8 KiB header budget — Prometheus + curl never approach + // this; tighter than the stdlib 1 MiB default to limit + // memory burn from slow-header attacks. + MaxHeaderBytes: 8 << 10, + } + + return s, nil +} + +// Start opens the listener and spawns the Serve goroutine. Returns +// immediately after the listener is bound (so a failed bind is +// reported before Start returns rather than swallowed by the +// background goroutine). +func (s *Server) Start(_ context.Context) error { + return s.start(nil) +} + +// start is the internal lifecycle: if lnOverride is nil, bind +// cfg.Listen; otherwise serve the provided listener directly. The +// override path is used by fault-injection tests; production callers +// reach this through Start. +func (s *Server) start(lnOverride net.Listener) error { + if !s.started.CompareAndSwap(false, true) { + return errors.New("telemetry server: Start called twice") + } + + ln := lnOverride + if ln == nil { + var err error + ln, err = net.Listen("tcp", s.cfg.Listen) + if err != nil { + // Close doneCh in the error path so a subsequent Shutdown + // (called by the failure-unwinder in cmd/tracecore) doesn't + // block on a channel that no goroutine will ever close. + close(s.doneCh) + return fmt.Errorf("listen %s: %w", s.cfg.Listen, err) + } + } + + go func() { + defer close(s.doneCh) + err := s.httpSrv.Serve(ln) + if err != nil && !errors.Is(err, http.ErrServerClosed) { + // Surface non-clean-shutdown errors so an operator sees + // why the surface stopped. Logger is required by + // NewServer's validation, so this can't nil-deref. + s.cfg.Logger.Error("telemetry server: Serve exited with error", "err", err) + } + }() + return nil +} + +// Shutdown drains the HTTP server within ShutdownBudget. Idempotent. +func (s *Server) Shutdown(ctx context.Context) error { + if !s.started.Load() { + return nil + } + if !s.shuttingDown.CompareAndSwap(false, true) { + return nil + } + + ctx, cancel := context.WithTimeout(ctx, ShutdownBudget) + defer cancel() + + if err := s.httpSrv.Shutdown(ctx); err != nil { + return fmt.Errorf("http server shutdown: %w", err) + } + <-s.doneCh + return nil +} + +func (s *Server) handleHealthz(w http.ResponseWriter, r *http.Request) { + switch { + case s.shuttingDown.Load(): + writeProbeResponse(w, r, http.StatusServiceUnavailable, "shutting down") + default: + writeProbeResponse(w, r, http.StatusOK, "ok") + } +} + +func (s *Server) handleReadyz(w http.ResponseWriter, r *http.Request) { + switch { + case s.shuttingDown.Load(): + writeProbeResponse(w, r, http.StatusServiceUnavailable, "shutting down") + // Nil ReadyFn defaults to "not ready" — matches RFC-0006 boot + // semantics; production wire-up always supplies a real predicate. + case s.cfg.ReadyFn == nil || !s.cfg.ReadyFn(): + writeProbeResponse(w, r, http.StatusServiceUnavailable, "not ready") + default: + writeProbeResponse(w, r, http.StatusOK, "ready") + } +} + +// writeProbeResponse implements Accept-header content negotiation +// between text/plain (default; k8s probes + curl) and +// application/json (tools that want a structured payload). +// +// Body for text/plain: "\n" +// Body for application/json: {"status":""} — payload is +// pre-marshaled at package init, no per-request allocation. +func writeProbeResponse(w http.ResponseWriter, r *http.Request, code int, status string) { + w.Header().Set("Cache-Control", "no-store") + if wantsJSON(r) { + w.Header().Set("Content-Type", "application/json; charset=utf-8") + w.WriteHeader(code) + _, _ = w.Write(probeJSONBodies[status]) + return + } + w.Header().Set("Content-Type", "text/plain; charset=utf-8") + w.WriteHeader(code) + _, _ = w.Write([]byte(status + "\n")) +} + +// wantsJSON returns true if the request's Accept header lists +// application/json BEFORE text/plain (or text/plain is absent). +// Tools opt in by setting `Accept: application/json`; k8s probes +// and curl with no Accept get the default text/plain. +// +// Substring-match, not full RFC 7231 grammar — q-values are ignored +// and a malformed Accept falls through to text/plain. Operators +// needing q-value-aware negotiation should put a reverse proxy in +// front of tracecore that does it (or send `Accept: application/json` +// explicitly, which is what every real client does anyway). +func wantsJSON(r *http.Request) bool { + accept := r.Header.Get("Accept") + if accept == "" { + return false + } + jsonIdx := strings.Index(accept, "application/json") + if jsonIdx < 0 { + return false + } + textIdx := strings.Index(accept, "text/plain") + return textIdx < 0 || jsonIdx < textIdx +} + +// PRINCIPLES §1: the telemetry surface must never poison the workload. +// promhttp's recover wraps gather, not OTel observable callbacks — +// a buggy SLOSource panic escapes promhttp; we own the safety net. +func (s *Server) recoverHandler(h http.Handler) http.Handler { + return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + defer func() { + if rv := recover(); rv != nil { + s.cfg.Logger.Error("telemetry server: handler panic", + "path", r.URL.Path, + "panic", rv, + ) + // Best-effort 500; headers may already be written. + w.WriteHeader(http.StatusInternalServerError) + } + }() + h.ServeHTTP(w, r) + }) +} diff --git a/internal/telemetry/server_test.go b/internal/telemetry/server_test.go new file mode 100644 index 00000000..050e4492 --- /dev/null +++ b/internal/telemetry/server_test.go @@ -0,0 +1,384 @@ +// SPDX-License-Identifier: Apache-2.0 + +package telemetry_test + +import ( + "context" + "errors" + "io" + "log/slog" + "net" + "net/http" + "strings" + "sync/atomic" + "testing" + "time" + + "github.com/stretchr/testify/require" + + "github.com/tracecoreai/tracecore/internal/pipeline/pipelinetest" + "github.com/tracecoreai/tracecore/internal/telemetry" +) + +// freePort grabs a port, closes the listener, and returns "127.0.0.1:P". +// Picking a port this way races with the actual bind below; in practice +// macOS + Linux loopback hands the same port back when we re-bind +// within microseconds, and Server.Start will surface a clear error if +// it races into use. Sufficient for tests, NOT for production. +func freePort(t *testing.T) string { + t.Helper() + l, err := net.Listen("tcp", "127.0.0.1:0") + require.NoError(t, err) + addr := l.Addr().String() + require.NoError(t, l.Close()) + return addr +} + +func newServer(t *testing.T, opts ...telemetry.ServerOption) (*telemetry.Server, string) { + t.Helper() + mp, err := telemetry.NewMeterProvider() + require.NoError(t, err) + t.Cleanup(func() { _ = mp.Shutdown(context.Background()) }) + + addr := freePort(t) + cfg := telemetry.ServerConfig{ + Listen: addr, + MeterProvider: mp, + Paths: telemetry.Paths{Metrics: "/metrics", Healthz: "/healthz", Readyz: "/readyz"}, + } + for _, o := range opts { + o(&cfg) + } + srv, err := telemetry.NewServer(cfg) + require.NoError(t, err) + require.NoError(t, srv.Start(t.Context())) + t.Cleanup(func() { _ = srv.Shutdown(context.Background()) }) + return srv, "http://" + addr +} + +func getStatus(t *testing.T, url string) int { + t.Helper() + req, err := http.NewRequestWithContext(t.Context(), http.MethodGet, url, http.NoBody) + require.NoError(t, err) + resp, err := http.DefaultClient.Do(req) + require.NoError(t, err) + t.Cleanup(func() { _ = resp.Body.Close() }) + return resp.StatusCode +} + +// TestServer_MetricsEndpointReturns200 verifies the /metrics surface +// returns 200 + Prometheus-shaped text. +func TestServer_MetricsEndpointReturns200(t *testing.T) { + t.Parallel() + _, base := newServer(t) + + require.Equal(t, 200, getStatus(t, base+"/metrics")) +} + +// TestServer_HealthzReturns200WhenRunning pins the liveness contract: +// once the server is running, /healthz returns 200 until Shutdown. +func TestServer_HealthzReturns200WhenRunning(t *testing.T) { + t.Parallel() + _, base := newServer(t) + + require.Equal(t, 200, getStatus(t, base+"/healthz")) +} + +// TestServer_ReadyzReflectsReadyFn pins the readiness contract: +// /readyz returns 200 only when the supplied ReadyFn returns true. +// Once the predicate flips false, /readyz returns 503. +func TestServer_ReadyzReflectsReadyFn(t *testing.T) { + t.Parallel() + var ready atomic.Bool + _, base := newServer(t, telemetry.WithReadyFn(ready.Load)) + + require.Equal(t, 503, getStatus(t, base+"/readyz"), "not ready until predicate flips true") + + ready.Store(true) + require.Equal(t, 200, getStatus(t, base+"/readyz")) + + ready.Store(false) + require.Equal(t, 503, getStatus(t, base+"/readyz"), "predicate flip is observable on next request") +} + +// TestServer_HealthzReadyz_JSONContentNegotiation pins the content +// negotiation contract: Accept: application/json returns a JSON +// body, otherwise text/plain. k8s probes + curl with no Accept get +// the default (text); tools opting in get structured payloads. +func TestServer_HealthzReadyz_JSONContentNegotiation(t *testing.T) { + t.Parallel() + var ready atomic.Bool + ready.Store(true) + _, base := newServer(t, telemetry.WithReadyFn(ready.Load)) + + cases := []struct { + path string + accept string + wantStatus int + wantCType string + wantBodyHas string + }{ + {"/healthz", "", 200, "text/plain", "ok\n"}, + {"/healthz", "application/json", 200, "application/json", `{"status":"ok"}`}, + {"/readyz", "", 200, "text/plain", "ready\n"}, + {"/readyz", "application/json", 200, "application/json", `{"status":"ready"}`}, + // text/plain in Accept beats application/json if it comes + // earlier — defensive parsing. + {"/healthz", "text/plain, application/json", 200, "text/plain", "ok\n"}, + // Malformed Accept falls back to text. + {"/healthz", "garbage", 200, "text/plain", "ok\n"}, + } + for _, tc := range cases { + req, err := http.NewRequestWithContext(t.Context(), http.MethodGet, base+tc.path, http.NoBody) + require.NoError(t, err) + if tc.accept != "" { + req.Header.Set("Accept", tc.accept) + } + resp, err := http.DefaultClient.Do(req) + require.NoError(t, err, "%s with Accept=%q", tc.path, tc.accept) + body, _ := io.ReadAll(resp.Body) + _ = resp.Body.Close() + require.Equal(t, tc.wantStatus, resp.StatusCode, "%s %q", tc.path, tc.accept) + require.Contains(t, resp.Header.Get("Content-Type"), tc.wantCType, "%s %q", tc.path, tc.accept) + require.Contains(t, string(body), tc.wantBodyHas, "%s %q", tc.path, tc.accept) + } +} + +// TestServer_RecoversFromHandlerPanic pins PRINCIPLES §1 ("trust +// under load is the product — never crash the workload"): a panic +// inside any handler — whether from a buggy SLOSource callback, a +// receiver self-telemetry increment, or arbitrary instrumentation — +// must NOT crash the binary. The recover middleware logs the panic +// and returns 500. +func TestServer_RecoversFromHandlerPanic(t *testing.T) { + t.Parallel() + + mp, err := telemetry.NewMeterProvider() + require.NoError(t, err) + t.Cleanup(func() { _ = mp.Shutdown(context.Background()) }) + + var logBuf syncBuffer + logger := slog.New(slog.NewTextHandler(&logBuf, &slog.HandlerOptions{Level: slog.LevelDebug})) + + srv, err := telemetry.NewServer(telemetry.ServerConfig{ + Listen: freePort(t), + MeterProvider: mp, + Paths: telemetry.Paths{Metrics: "/metrics", Healthz: "/healthz", Readyz: "/readyz"}, + Logger: logger, + // Force /readyz to panic so the recover path is exercised + // deterministically. + ReadyFn: func() bool { panic("simulated handler panic") }, + }) + require.NoError(t, err) + require.NoError(t, srv.Start(t.Context())) + t.Cleanup(func() { _ = srv.Shutdown(context.Background()) }) + + addr := srv.HTTPAddr() + resp, err := http.Get(addr + "/readyz") //nolint:gosec,noctx // test code + require.NoError(t, err, "binary must remain alive after handler panic") + require.NoError(t, resp.Body.Close()) + require.Equal(t, http.StatusInternalServerError, resp.StatusCode, + "panicking handler must surface as 500, not a crash") + + require.Eventually(t, func() bool { + return strings.Contains(logBuf.String(), "handler panic") + }, 200*time.Millisecond, 25*time.Millisecond, + "panic value must surface in cfg.Logger") +} + +// TestServer_HealthzStays200WhileDraining pins the operator policy: +// when ReadyFn returns false (graceful drain begun, but the HTTP +// server itself hasn't started Shutdown yet), /readyz must return 503 +// (so k8s removes the pod from service endpoints) while /healthz must +// stay 200 (so livenessProbe doesn't kill the pod mid-drain). +// +// This is the window between cmd/tracecore's `ready.Store(false)` and +// `Server.Shutdown` — typically the full rt.Shutdown drain budget, +// possibly up to 30s. k8s wants exactly this signal: "alive, but +// don't send me new traffic." +func TestServer_HealthzStays200WhileDraining(t *testing.T) { + t.Parallel() + var ready atomic.Bool + ready.Store(true) // running + _, base := newServer(t, telemetry.WithReadyFn(ready.Load)) + + require.Equal(t, 200, getStatus(t, base+"/healthz"), "alive while running") + require.Equal(t, 200, getStatus(t, base+"/readyz"), "ready while running") + + // Simulate drain: ready flips false but Server.Shutdown hasn't + // been called yet (the rt.Shutdown window). + ready.Store(false) + + require.Equal(t, 200, getStatus(t, base+"/healthz"), + "liveness must NOT trip during drain — k8s should keep the pod alive") + require.Equal(t, 503, getStatus(t, base+"/readyz"), + "readiness must trip during drain — k8s removes from service endpoints") +} + +// TestServer_HealthzReturns503DuringShutdown pins that /healthz fails +// after Shutdown begins so an operator probe sees the transition. +func TestServer_HealthzReturns503DuringShutdown(t *testing.T) { + t.Parallel() + srv, base := newServer(t) + + // Run Shutdown in the background — it'll block briefly closing + // the listener; we want to hit /healthz once Shutdown has + // flipped the "shutting down" flag but before the listener is + // gone. + done := make(chan error, 1) + go func() { done <- srv.Shutdown(context.Background()) }() + + // Best-effort: /healthz may either return 503 OR fail with + // connection-reset depending on listener-close timing. Either is + // fine — the contract is "no 200 once shutting down." + req, err := http.NewRequestWithContext(t.Context(), http.MethodGet, base+"/healthz", http.NoBody) + require.NoError(t, err) + resp, err := http.DefaultClient.Do(req) + if err == nil { + require.NotEqual(t, 200, resp.StatusCode, "must not be 200 during shutdown") + _ = resp.Body.Close() + } + + require.NoError(t, <-done, "shutdown must complete cleanly") +} + +// TestServer_LogsServeErrorOnUncleanExit pins the diagnostics fix: +// when http.Server.Serve returns a non-ErrServerClosed error, the +// operator sees a log line. Uses a fault-injection listener whose +// Accept returns a non-ErrServerClosed error, forcing the Serve +// loop to exit unclean. +func TestServer_LogsServeErrorOnUncleanExit(t *testing.T) { + t.Parallel() + + mp, err := telemetry.NewMeterProvider() + require.NoError(t, err) + t.Cleanup(func() { _ = mp.Shutdown(context.Background()) }) + + var logBuf syncBuffer + logger := slog.New(slog.NewTextHandler(&logBuf, &slog.HandlerOptions{Level: slog.LevelDebug})) + + srv, err := telemetry.NewServer(telemetry.ServerConfig{ + Listen: "irrelevant", // bypassed by StartWithListenerForTest + MeterProvider: mp, + Paths: telemetry.Paths{Metrics: "/metrics", Healthz: "/healthz", Readyz: "/readyz"}, + Logger: logger, + }) + require.NoError(t, err) + + require.NoError(t, srv.StartWithListenerForTest(&faultyListener{ + acceptErr: errors.New("simulated listener fault"), + })) + t.Cleanup(func() { _ = srv.Shutdown(context.Background()) }) + + // Goroutine logs when Serve returns the listener's error. + require.Eventually(t, func() bool { + return strings.Contains(logBuf.String(), "Serve exited with error") && + strings.Contains(logBuf.String(), "simulated listener fault") + }, 500*time.Millisecond, 25*time.Millisecond, + "unclean Serve exit must surface in the configured Logger") +} + +// faultyListener returns a fixed error on every Accept so http.Server +// exits with a non-ErrServerClosed error. Implements net.Listener. +type faultyListener struct { + acceptErr error +} + +func (l *faultyListener) Accept() (net.Conn, error) { return nil, l.acceptErr } +func (l *faultyListener) Close() error { return nil } +func (l *faultyListener) Addr() net.Addr { return &net.TCPAddr{IP: net.IPv4(127, 0, 0, 1)} } + +// syncBuffer aliases pipelinetest.SyncBuffer — the canonical +// concurrent-safe buffer used across tracecore's integration tests. +type syncBuffer = pipelinetest.SyncBuffer + +// TestServer_ShutdownAfterFailedStart_DoesNotHang pins the bug fix for +// the race between net.Listen failing and Shutdown's doneCh wait: +// when Start returns a bind error, started is already true but no +// Serve goroutine was spawned. A naive impl waits on doneCh forever. +// Shutdown must return promptly. +func TestServer_ShutdownAfterFailedStart_DoesNotHang(t *testing.T) { + t.Parallel() + + mp, err := telemetry.NewMeterProvider() + require.NoError(t, err) + t.Cleanup(func() { _ = mp.Shutdown(context.Background()) }) + + // Bind a port to ensure the next net.Listen on the same address + // fails (EADDRINUSE). + blocker, err := net.Listen("tcp", "127.0.0.1:0") + require.NoError(t, err) + t.Cleanup(func() { _ = blocker.Close() }) + addr := blocker.Addr().String() + + srv, err := telemetry.NewServer(telemetry.ServerConfig{ + Listen: addr, + MeterProvider: mp, + Paths: telemetry.Paths{Metrics: "/metrics", Healthz: "/healthz", Readyz: "/readyz"}, + }) + require.NoError(t, err) + + require.Error(t, srv.Start(t.Context()), "Start must fail when port is busy") + + // Critical: Shutdown after a failed Start must return promptly, + // not block on a never-closed doneCh. + done := make(chan error, 1) + go func() { done <- srv.Shutdown(context.Background()) }() + + select { + case err := <-done: + require.NoError(t, err, "Shutdown after failed Start must succeed cleanly") + case <-time.After(500 * time.Millisecond): + t.Fatal("Shutdown after failed Start hung for >500ms (the bug)") + } +} + +// TestServer_ShutdownIsIdempotent pins the contract Component.Shutdown +// inherits: safe to call without Start (failure unwinding) and safe +// to call twice. +func TestServer_ShutdownIsIdempotent(t *testing.T) { + t.Parallel() + mp, err := telemetry.NewMeterProvider() + require.NoError(t, err) + t.Cleanup(func() { _ = mp.Shutdown(context.Background()) }) + + srv, err := telemetry.NewServer(telemetry.ServerConfig{ + Listen: freePort(t), + MeterProvider: mp, + Paths: telemetry.Paths{Metrics: "/metrics", Healthz: "/healthz", Readyz: "/readyz"}, + }) + require.NoError(t, err) + + // Shutdown without Start. + require.NoError(t, srv.Shutdown(context.Background())) + + // Start then double-Shutdown. + srv2, err := telemetry.NewServer(telemetry.ServerConfig{ + Listen: freePort(t), + MeterProvider: mp, + Paths: telemetry.Paths{Metrics: "/metrics", Healthz: "/healthz", Readyz: "/readyz"}, + }) + require.NoError(t, err) + require.NoError(t, srv2.Start(t.Context())) + require.NoError(t, srv2.Shutdown(context.Background())) + require.NoError(t, srv2.Shutdown(context.Background())) +} + +// TestServer_ShutdownWithin1s pins the PRINCIPLES §1 1s shutdown +// budget. The HTTP server's idle connections must not stretch it. +func TestServer_ShutdownWithin1s(t *testing.T) { + t.Parallel() + srv, _ := newServer(t) + + deadline := time.Now().Add(1500 * time.Millisecond) + done := make(chan error, 1) + go func() { done <- srv.Shutdown(context.Background()) }() + + select { + case err := <-done: + require.NoError(t, err) + require.True(t, time.Now().Before(deadline), "must complete in <1.5s") + case <-time.After(1500 * time.Millisecond): + t.Fatal("server Shutdown took >1.5s") + } +} diff --git a/internal/telemetry/slo.go b/internal/telemetry/slo.go new file mode 100644 index 00000000..e8721cda --- /dev/null +++ b/internal/telemetry/slo.go @@ -0,0 +1,186 @@ +// SPDX-License-Identifier: Apache-2.0 + +package telemetry + +import ( + "context" + "errors" + "fmt" + "time" + + "go.opentelemetry.io/otel/metric" + + "github.com/tracecoreai/tracecore/internal/selftelemetry" +) + +// DefaultSLOWindow is the rolling window over which +// `tracecore.exporter.failure_rate` is computed. Sized to match the +// k8s liveness/readiness probe cadence (≤60s for typical probes); +// long enough that a single transient failure doesn't pin the gauge +// indefinitely, short enough that an operator alerting on >0 sees the +// problem within a probe-or-two of it starting. +const DefaultSLOWindow = 60 * time.Second + +// SLOSource supplies the values reported by the three M2 O2 SLO +// observable gauges. Implementations are queried at scrape time +// (via OTel observable callbacks); the returned values should be +// cheap to compute — micros, not millis. +// +// Per RFC-0006 + MILESTONES.md M2: failure_rate is wired to real +// exporter signal as of M2; queue_depth_ratio + restart_count_per_hour +// stay at 0 (carry-forward to milestones that introduce a queue and +// a restart mechanism, respectively). +type SLOSource interface { + // ExporterFailureRate returns the ratio of failed Consume* calls + // to total Consume* calls, in [0, 1]. Implementations aggregate + // across all registered exporters. + ExporterFailureRate() float64 + + // QueueDepthRatio returns the queue depth / capacity ratio, in + // [0, 1]. Returns 0 until tracecore has a queue mechanism. + QueueDepthRatio() float64 + + // ComponentRestartCountPerHour returns the windowed component + // restart rate scaled to per-hour. Returns 0 until tracecore has + // a restart mechanism. + ComponentRestartCountPerHour() float64 +} + +// ExporterRegistry is the contract `cmd/tracecore` implements to feed +// the SLOSource with live exporter counters. Each Exporter the +// runtime constructs registers itself; ExporterFailureRate aggregates +// the success/failure totals across all of them. +type ExporterRegistry interface { + RegisteredExporters() []selftelemetry.FailureRateReader +} + +// AggregateSLOSource is the production SLOSource. `failure_rate` +// is a rolling-window rate over the last `Window` of time, computed +// via the reusable WindowedRate primitive; queue + restart are 0 +// (M2 carry-forward). +// +// The windowing matters: operators alert on `> 0` over a recent +// window. A cumulative-over-all-time ratio (the obvious "naive" +// failure_rate) pins above 0 forever after a single failure, which +// makes alerting useless. +type AggregateSLOSource struct { + registry ExporterRegistry + rate *WindowedRate +} + +// NewAggregateSLOSource constructs the production SLOSource with the +// given registry and window. A window of zero falls back to +// DefaultSLOWindow. Must be used by pointer so the in-callback ring +// buffer stays shared across observations. +func NewAggregateSLOSource(registry ExporterRegistry, window time.Duration) *AggregateSLOSource { + return &AggregateSLOSource{ + registry: registry, + rate: NewWindowedRate(window), + } +} + +// ExporterFailureRate returns the rolling-window failure rate over +// the configured Window. Returns 0 while warming up (no anchor sample +// yet ≥ window-old) and when total calls in the window is 0. +// +// Walk cost is O(n) in the number of registered exporters: each +// scrape loads two atomic counters per reader. Benchmarked at <5µs +// for 100 readers on Apple M4 Pro (see internal/telemetry/ +// slo_bench_test.go). For tracecore's expected scale (1–10 exporters +// per pipeline) this stays comfortably below the +// "self-telemetry must not spike CPU" PRINCIPLES §1 bound. +func (s *AggregateSLOSource) ExporterFailureRate() float64 { + if s.registry == nil { + return 0 + } + var success, failure uint64 + for _, r := range s.registry.RegisteredExporters() { + success += r.SuccessCount() + failure += r.FailureCount() + } + return s.rate.Observe(failure, success+failure) +} + +// QueueDepthRatio returns 0 — carry-forward from M2. +func (*AggregateSLOSource) QueueDepthRatio() float64 { return 0 } + +// ComponentRestartCountPerHour returns 0 — carry-forward from M2. +func (*AggregateSLOSource) ComponentRestartCountPerHour() float64 { return 0 } + +// ZeroSLOSource reports 0 for all three gauges. Used by tests + by +// `tracecore validate`, which doesn't register any exporters. +type ZeroSLOSource struct{} + +// ExporterFailureRate satisfies SLOSource; always 0. +func (ZeroSLOSource) ExporterFailureRate() float64 { return 0 } + +// QueueDepthRatio satisfies SLOSource; always 0. +func (ZeroSLOSource) QueueDepthRatio() float64 { return 0 } + +// ComponentRestartCountPerHour satisfies SLOSource; always 0. +func (ZeroSLOSource) ComponentRestartCountPerHour() float64 { return 0 } + +// RegisterSLOMetrics registers the three O2 SLO observable gauges +// against mp, with values supplied by src. The names match the +// MILESTONES.md M2 acceptance list exactly: +// +// - tracecore.exporter.failure_rate +// - tracecore.queue.depth_ratio +// - tracecore.component.restart_count_per_hour +// +// Each is registered as a Float64ObservableGauge whose callback reads +// from src on every scrape. +func RegisterSLOMetrics(mp metric.MeterProvider, src SLOSource) error { + if mp == nil { + return errors.New("telemetry.RegisterSLOMetrics: MeterProvider is nil") + } + if src == nil { + return errors.New("telemetry.RegisterSLOMetrics: SLOSource is nil") + } + + meter := mp.Meter("tracecore.slo") + + for _, g := range []sloGaugeSpec{ + { + name: "tracecore.exporter.failure_rate", + desc: "Failure ratio (0..1) of exporter Consume* calls over the last 60s rolling window; 0 while warming up or with no in-window calls. Use the raw tracecore_exporter_calls_total counter for custom windows.", + read: src.ExporterFailureRate, + }, + { + name: "tracecore.queue.depth_ratio", + desc: "Queue depth / capacity ratio (0..1); always 0 until a queue mechanism is added in a future release.", + read: src.QueueDepthRatio, + }, + { + name: "tracecore.component.restart_count_per_hour", + desc: "Component restart rate scaled to per-hour; always 0 until a runtime restart mechanism is added in a future release.", + read: src.ComponentRestartCountPerHour, + }, + } { + if err := registerObservableFloat64Gauge(meter, g); err != nil { + return err + } + } + return nil +} + +type sloGaugeSpec struct { + name string + desc string + read func() float64 +} + +func registerObservableFloat64Gauge(meter metric.Meter, g sloGaugeSpec) error { + _, err := meter.Float64ObservableGauge( + g.name, + metric.WithDescription(g.desc), + metric.WithFloat64Callback(func(_ context.Context, obs metric.Float64Observer) error { + obs.Observe(g.read()) + return nil + }), + ) + if err != nil { + return fmt.Errorf("register %s: %w", g.name, err) + } + return nil +} diff --git a/internal/telemetry/slo_bench_test.go b/internal/telemetry/slo_bench_test.go new file mode 100644 index 00000000..065afd5b --- /dev/null +++ b/internal/telemetry/slo_bench_test.go @@ -0,0 +1,70 @@ +// SPDX-License-Identifier: Apache-2.0 + +package telemetry_test + +import ( + "testing" + + "github.com/tracecoreai/tracecore/internal/selftelemetry" + "github.com/tracecoreai/tracecore/internal/telemetry" +) + +// BenchmarkAggregateSLOSource_ExporterFailureRate measures the +// scrape-path cost of the rolling-window failure-rate computation +// across N registered exporters. PRINCIPLES §1 ("self-telemetry +// must never spike CPU") wants this on the order of micros, not +// millis, even under realistic exporter counts. +// +// Run with `go test -bench BenchmarkAggregateSLOSource -benchmem +// ./internal/telemetry/`. +func BenchmarkAggregateSLOSource_ExporterFailureRate_1Reader(b *testing.B) { + benchAggregateSLO(b, 1) +} + +func BenchmarkAggregateSLOSource_ExporterFailureRate_10Readers(b *testing.B) { + benchAggregateSLO(b, 10) +} + +func BenchmarkAggregateSLOSource_ExporterFailureRate_100Readers(b *testing.B) { + benchAggregateSLO(b, 100) +} + +func benchAggregateSLO(b *testing.B, nReaders int) { + b.Helper() + + readers := make([]selftelemetry.FailureRateReader, nReaders) + for i := range readers { + readers[i] = &fakeFailureReader{success: 1_000, failure: 5} + } + src := telemetry.NewAggregateSLOSource( + fakeRegistry{readers: readers}, + telemetry.DefaultSLOWindow, + ) + + // Warm the sample buffer so we're benchmarking steady-state, + // not the first-scrape "warming up" path. + for i := 0; i < 3; i++ { + _ = src.ExporterFailureRate() + } + + b.ResetTimer() + for b.Loop() { + _ = src.ExporterFailureRate() + } +} + +// BenchmarkWindowedRate_Observe isolates the rolling-window math +// from the registry walk. Useful for sizing future SLI gauges that +// reuse the primitive without an ExporterRegistry layer. +func BenchmarkWindowedRate_Observe(b *testing.B) { + w := telemetry.NewWindowedRate(telemetry.DefaultSLOWindow) + for i := 0; i < 3; i++ { + _ = w.Observe(uint64(i), uint64(i*10)) + } + b.ResetTimer() + var i uint64 + for b.Loop() { + i++ + _ = w.Observe(i, i*10) + } +} diff --git a/internal/telemetry/slo_test.go b/internal/telemetry/slo_test.go new file mode 100644 index 00000000..4f60ef64 --- /dev/null +++ b/internal/telemetry/slo_test.go @@ -0,0 +1,256 @@ +// SPDX-License-Identifier: Apache-2.0 + +package telemetry_test + +import ( + "context" + "net/http/httptest" + "sync" + "testing" + "time" + + "github.com/stretchr/testify/require" + + "github.com/tracecoreai/tracecore/internal/pipeline" + "github.com/tracecoreai/tracecore/internal/selftelemetry" + "github.com/tracecoreai/tracecore/internal/telemetry" +) + +// fakeFailureReader is the test stub for FailureRateReader. Concurrent- +// safe because the SLO callback may fire from any goroutine. +type fakeFailureReader struct { + mu sync.Mutex + success uint64 + failure uint64 +} + +func (f *fakeFailureReader) SuccessCount() uint64 { + f.mu.Lock() + defer f.mu.Unlock() + return f.success +} + +func (f *fakeFailureReader) FailureCount() uint64 { + f.mu.Lock() + defer f.mu.Unlock() + return f.failure +} + +type fakeRegistry struct { + readers []selftelemetry.FailureRateReader +} + +func (r fakeRegistry) RegisteredExporters() []selftelemetry.FailureRateReader { + return r.readers +} + +// TestRegisterSLOMetrics_AllThreeNamesAppear pins the criterion-10 +// contract: after registration, the scrape contains all three +// named gauges. +func TestRegisterSLOMetrics_AllThreeNamesAppear(t *testing.T) { + t.Parallel() + + mp, err := telemetry.NewMeterProvider() + require.NoError(t, err) + t.Cleanup(func() { _ = mp.Shutdown(context.Background()) }) + + require.NoError(t, telemetry.RegisterSLOMetrics(mp.Provider, telemetry.ZeroSLOSource{})) + + srv := httptest.NewServer(mp.PromHandler()) + t.Cleanup(srv.Close) + + body := scrapeText(t, srv.URL) + require.Contains(t, body, "tracecore_exporter_failure_rate") + require.Contains(t, body, "tracecore_queue_depth_ratio") + require.Contains(t, body, "tracecore_component_restart_count_per_hour") +} + +// TestAggregateSLOSource_WindowedRate exercises the rolling-window +// failure-rate math. Operators want "rate over the last N seconds," +// not lifetime cumulative — a single failure on call #1 must not +// hold the SLO gauge above 0 forever. +func TestAggregateSLOSource_WindowedRate(t *testing.T) { + t.Parallel() + + reader := &fakeFailureReader{} + src := telemetry.NewAggregateSLOSource( + fakeRegistry{readers: []selftelemetry.FailureRateReader{reader}}, + 200*time.Millisecond, // tight window for the test + ) + + // First sample at counts (0,0) — anchor for the window. + require.InDelta(t, 0.0, src.ExporterFailureRate(), 1e-9, "warming up returns 0") + + // Generate signal: 8 success, 2 failure. Rate should be 0.2 + // once we cross the window boundary. + reader.success, reader.failure = 8, 2 + + // Wait past the window so the anchor falls into "≥ window ago." + time.Sleep(250 * time.Millisecond) + + require.InDelta(t, 0.2, src.ExporterFailureRate(), 1e-9, + "windowed rate = (2-0)/(10-0) = 0.2") + + // Generate more signal so cumulative is now 80/20 (0.2 lifetime) + // but recent window is 90% success: 72 new success, 18 new fail. + // Wait long enough for the previous anchor to fall outside. + time.Sleep(250 * time.Millisecond) + reader.success, reader.failure = 80, 20 + + rate := src.ExporterFailureRate() + // In the window we added 72 success + 18 failure → 0.2; total + // is still 0.2 cumulative but the windowed rate exposes only + // the recent change. + require.InDelta(t, 0.2, rate, 1e-9) +} + +// TestAggregateSLOSource_WindowedRate_LifetimeRatioNotReflected pins +// the criterion-driving property: a single failure long ago must NOT +// keep the gauge above 0 once it has rolled out of the window. +func TestAggregateSLOSource_WindowedRate_LifetimeRatioNotReflected(t *testing.T) { + t.Parallel() + + reader := &fakeFailureReader{} + src := telemetry.NewAggregateSLOSource( + fakeRegistry{readers: []selftelemetry.FailureRateReader{reader}}, + 150*time.Millisecond, + ) + + // One failure long ago. + reader.success, reader.failure = 0, 1 + src.ExporterFailureRate() // seed sample + time.Sleep(200 * time.Millisecond) + + // Many successes since. + for i := 0; i < 10; i++ { + reader.success++ + src.ExporterFailureRate() + time.Sleep(20 * time.Millisecond) + } + time.Sleep(200 * time.Millisecond) + + // Lifetime ratio is 1/11 ≈ 0.09. Windowed (last 150ms) should be + // 0 because the only failure rolled out. + require.InDelta(t, 0.0, src.ExporterFailureRate(), 1e-9, + "windowed rate must shed the long-ago failure") +} + +// TestAggregateSLOSource_NoCallsIsZero pins the div-by-zero guard: +// total=0 must return 0, not NaN — otherwise Prometheus alerts +// firing on `> 0` poison the alerting pipeline. +func TestAggregateSLOSource_NoCallsIsZero(t *testing.T) { + t.Parallel() + + src := telemetry.NewAggregateSLOSource(fakeRegistry{readers: nil}, 0) + require.InDelta(t, 0.0, src.ExporterFailureRate(), 1e-9) +} + +// TestAggregateSLOSource_QueueAndRestart_AreZero pins the +// carry-forward contract: until queue + restart infrastructure +// lands, these report 0. +func TestAggregateSLOSource_QueueAndRestart_AreZero(t *testing.T) { + t.Parallel() + + src := telemetry.NewAggregateSLOSource(fakeRegistry{readers: nil}, 0) + require.InDelta(t, 0.0, src.QueueDepthRatio(), 1e-9) + require.InDelta(t, 0.0, src.ComponentRestartCountPerHour(), 1e-9) +} + +// TestRegisterSLOMetrics_RejectsNilArgs pins the error path. +func TestRegisterSLOMetrics_RejectsNilArgs(t *testing.T) { + t.Parallel() + + mp, err := telemetry.NewMeterProvider() + require.NoError(t, err) + t.Cleanup(func() { _ = mp.Shutdown(context.Background()) }) + + require.Error(t, telemetry.RegisterSLOMetrics(nil, telemetry.ZeroSLOSource{})) + require.Error(t, telemetry.RegisterSLOMetrics(mp.Provider, nil)) +} + +// TestRegisterSLOMetrics_RealExporter_EndToEnd exercises the entire +// chain end-to-end with the REAL `selftelemetry.NewExporter` +// (not a fakeFailureReader stub): construct a MeterProvider, build +// a real Exporter, wire it into the AggregateSLOSource registry, +// register SLO gauges, generate success+failure signal, scrape, and +// assert the windowed failure_rate reflects the real exporter's +// internal counters. +// +// This is the test that pins the wiring cmd/tracecore relies on — +// if `selftelemetry.NewExporter` ever stops satisfying +// FailureRateReader, this test catches it; if AggregateSLOSource's +// math drifts from the exporter's counts, this test catches it. +func TestRegisterSLOMetrics_RealExporter_EndToEnd(t *testing.T) { + t.Parallel() + + mp, err := telemetry.NewMeterProvider() + require.NoError(t, err) + t.Cleanup(func() { _ = mp.Shutdown(context.Background()) }) + + id := pipeline.MustNewID(pipeline.MustNewType("stdoutexporter"), "e2e") + exp, err := selftelemetry.NewExporter(id, mp.Provider) + require.NoError(t, err) + frr, ok := exp.(selftelemetry.FailureRateReader) + require.True(t, ok, "real Exporter must satisfy FailureRateReader") + + src := telemetry.NewAggregateSLOSource( + fakeRegistry{readers: []selftelemetry.FailureRateReader{frr}}, + 150*time.Millisecond, + ) + require.NoError(t, telemetry.RegisterSLOMetrics(mp.Provider, src)) + + srv := httptest.NewServer(mp.PromHandler()) + t.Cleanup(srv.Close) + + // Seed scrape at 0/0 — anchor baseline. + _ = scrapeText(t, srv.URL) + + // Generate 9 success + 1 failure through the REAL exporter. + for range 9 { + exp.IncCallSuccess() + } + exp.IncCallFailure("io") + + // Wait past the window so the next scrape sees a real anchor. + time.Sleep(200 * time.Millisecond) + + body := scrapeText(t, srv.URL) + require.Regexp(t, `tracecore_exporter_failure_rate\{[^}]*\}\s+0\.1`, + body, "windowed rate = 1/10 = 0.1") + // And the raw counter must also surface. + require.Regexp(t, `tracecore_exporter_calls_total\{[^}]*result="success"[^}]*\}\s+9`, body) + require.Regexp(t, `tracecore_exporter_calls_total\{[^}]*result="failure"[^}]*\}\s+1`, body) +} + +// TestRegisterSLOMetrics_FailureRateReflectsRegistry covers the +// end-to-end happy path: register, generate exporter signal, +// wait past the window, scrape, see the windowed rate. +func TestRegisterSLOMetrics_FailureRateReflectsRegistry(t *testing.T) { + t.Parallel() + + mp, err := telemetry.NewMeterProvider() + require.NoError(t, err) + t.Cleanup(func() { _ = mp.Shutdown(context.Background()) }) + + reader := &fakeFailureReader{} + src := telemetry.NewAggregateSLOSource( + fakeRegistry{readers: []selftelemetry.FailureRateReader{reader}}, + 150*time.Millisecond, + ) + require.NoError(t, telemetry.RegisterSLOMetrics(mp.Provider, src)) + + srv := httptest.NewServer(mp.PromHandler()) + t.Cleanup(srv.Close) + + // Seed-scrape at (0,0) — anchor establishes baseline. + _ = scrapeText(t, srv.URL) + + // Generate signal then wait past the window. + reader.success, reader.failure = 3, 1 // 25% failure in-window + time.Sleep(200 * time.Millisecond) + + body := scrapeText(t, srv.URL) + require.Contains(t, body, "tracecore_exporter_failure_rate") + // 0.25 should appear as the value on the metric line. + require.Regexp(t, `tracecore_exporter_failure_rate\{[^}]*\}\s+0\.25`, body) +} diff --git a/internal/telemetry/testdata/bench-baseline.txt b/internal/telemetry/testdata/bench-baseline.txt new file mode 100644 index 00000000..3f338425 --- /dev/null +++ b/internal/telemetry/testdata/bench-baseline.txt @@ -0,0 +1,24 @@ +goos: darwin +goarch: arm64 +pkg: github.com/tracecoreai/tracecore/internal/telemetry +cpu: Apple M4 Pro +BenchmarkAggregateSLOSource_ExporterFailureRate_1Reader-14 182142 3790 ns/op 268 B/op 0 allocs/op +BenchmarkAggregateSLOSource_ExporterFailureRate_1Reader-14 165151 3821 ns/op 268 B/op 0 allocs/op +BenchmarkAggregateSLOSource_ExporterFailureRate_1Reader-14 182319 3810 ns/op 269 B/op 0 allocs/op +BenchmarkAggregateSLOSource_ExporterFailureRate_10Readers-14 179829 3872 ns/op 268 B/op 0 allocs/op +BenchmarkAggregateSLOSource_ExporterFailureRate_10Readers-14 162873 3881 ns/op 268 B/op 0 allocs/op +BenchmarkAggregateSLOSource_ExporterFailureRate_10Readers-14 177326 3871 ns/op 268 B/op 0 allocs/op +BenchmarkAggregateSLOSource_ExporterFailureRate_10Readers-14 175969 3930 ns/op 268 B/op 0 allocs/op +BenchmarkAggregateSLOSource_ExporterFailureRate_10Readers-14 174262 3876 ns/op 269 B/op 0 allocs/op +BenchmarkAggregateSLOSource_ExporterFailureRate_100Readers-14 145010 4536 ns/op 267 B/op 0 allocs/op +BenchmarkAggregateSLOSource_ExporterFailureRate_100Readers-14 151926 4542 ns/op 268 B/op 0 allocs/op +BenchmarkAggregateSLOSource_ExporterFailureRate_100Readers-14 143517 4526 ns/op 269 B/op 0 allocs/op +BenchmarkAggregateSLOSource_ExporterFailureRate_100Readers-14 150547 4512 ns/op 268 B/op 0 allocs/op +BenchmarkAggregateSLOSource_ExporterFailureRate_100Readers-14 145849 4534 ns/op 268 B/op 0 allocs/op +BenchmarkWindowedRate_Observe-14 182289 3809 ns/op 269 B/op 0 allocs/op +BenchmarkWindowedRate_Observe-14 183813 3825 ns/op 268 B/op 0 allocs/op +BenchmarkWindowedRate_Observe-14 176248 3817 ns/op 269 B/op 0 allocs/op +BenchmarkWindowedRate_Observe-14 179674 3877 ns/op 268 B/op 0 allocs/op +BenchmarkWindowedRate_Observe-14 179763 3824 ns/op 268 B/op 0 allocs/op +PASS +ok github.com/tracecoreai/tracecore/internal/telemetry 13.721s diff --git a/internal/telemetry/testhelpers_test.go b/internal/telemetry/testhelpers_test.go new file mode 100644 index 00000000..9bc716ab --- /dev/null +++ b/internal/telemetry/testhelpers_test.go @@ -0,0 +1,32 @@ +// SPDX-License-Identifier: Apache-2.0 + +package telemetry_test + +import ( + "fmt" + "io" + "net/http" + "testing" + + "github.com/stretchr/testify/require" +) + +func scrapeText(t *testing.T, url string) string { + t.Helper() + resp, err := http.Get(url) //nolint:gosec,noctx // test code + require.NoError(t, err) + t.Cleanup(func() { _ = resp.Body.Close() }) + body, err := io.ReadAll(resp.Body) + require.NoError(t, err) + require.Equal(t, http.StatusOK, resp.StatusCode, "/metrics must return 200; body=%q", string(body)) + return string(body) +} + +func newRequest(t *testing.T, method, url string, body io.Reader) (*http.Request, error) { + t.Helper() + req, err := http.NewRequestWithContext(t.Context(), method, url, body) + if err != nil { + return nil, fmt.Errorf("new request: %w", err) + } + return req, nil +} diff --git a/internal/telemetry/windowed_rate.go b/internal/telemetry/windowed_rate.go new file mode 100644 index 00000000..f9ff6a22 --- /dev/null +++ b/internal/telemetry/windowed_rate.go @@ -0,0 +1,139 @@ +// SPDX-License-Identifier: Apache-2.0 + +package telemetry + +import ( + "sync" + "time" +) + +// maxRateSamples caps the rolling-window ring buffer regardless of +// observation cadence. 4096 samples × ~40B = ~160KiB resident even +// under a 1ms misconfigured cadence over an hour. Legitimate cadences +// (≥1s) produce ≪ 120 samples across 2× window so the cap never +// fires in practice; it's defense-in-depth against clock skew or a +// caller observing in a tight loop. +const maxRateSamples = 4096 + +// WindowedRate computes a rolling-window ratio over monotonic counter +// snapshots: `delta(numerator) / delta(denominator)` between a sample +// taken now and the most recent sample ≥ window-old. +// +// Operators alert on `> 0` over a recent window; lifetime cumulative +// ratios stay above 0 forever after a single failure, which makes +// alerting useless. WindowedRate is what tracecore exposes for +// `tracecore.exporter.failure_rate` today and will be reused for +// queue / restart / future SLI gauges as those mechanisms land. +// +// Why not golang.org/x/time/rate or Prometheus's rate() PromQL? +// `golang.org/x/time/rate` is a token-bucket limiter — different +// semantics. PromQL `rate()` runs server-side at query time over +// counter samples; we need a *producer-side* derived value because +// the SLO gauge must emit during boot before any backend has +// scraped enough samples to evaluate `rate()`. The math is small +// enough that a custom type is simpler than a dep. +// +// Safe for concurrent use; multiple Observe calls from different +// goroutines serialise on an internal mutex. Bounded memory: prunes +// to 2× window every Observe and caps at maxRateSamples. +type WindowedRate struct { + window time.Duration + + mu sync.Mutex + samples []windowedSample +} + +type windowedSample struct { + at time.Time + numerator uint64 + denominator uint64 +} + +// NewWindowedRate constructs a WindowedRate over the given window. +// A non-positive window falls back to DefaultSLOWindow. +func NewWindowedRate(window time.Duration) *WindowedRate { + if window <= 0 { + window = DefaultSLOWindow + } + return &WindowedRate{window: window} +} + +// Observe records the current cumulative counters and returns the +// rolling-window rate `delta(numerator) / delta(denominator)` from +// the most recent in-window anchor to now. +// +// Returns 0 in three cases: +// +// - warming up (no anchor sample ≥ window-old yet) +// - zero in-window observations (delta denominator is 0) +// - counter underflow (numerator or denominator went backward, +// which violates the monotonic-counter contract but can happen +// if a future hot-reload swaps in a fresh reader) +// +// The "return 0 on underflow" choice matches the +// `tracecore.exporter.failure_rate` semantics: prefer "warming up" +// signal over NaN/garbage that would poison Prometheus alerts. +func (w *WindowedRate) Observe(numerator, denominator uint64) float64 { + now := time.Now() + + w.mu.Lock() + defer w.mu.Unlock() + + // Drop samples older than 2× window into a fresh slice so the + // old backing array can be reclaimed. + cutoff := now.Add(-2 * w.window) + drop := 0 + for ; drop < len(w.samples); drop++ { + if !w.samples[drop].at.Before(cutoff) { + break + } + } + if drop > 0 { + kept := make([]windowedSample, len(w.samples)-drop) + copy(kept, w.samples[drop:]) + w.samples = kept + } + + // Defense-in-depth cap. + if len(w.samples) >= maxRateSamples { + half := len(w.samples) / 2 + kept := make([]windowedSample, len(w.samples)-half) + copy(kept, w.samples[half:]) + w.samples = kept + } + + // Anchor = latest sample ≥ window-old. Capture by VALUE so the + // append below can reallocate the backing array without leaving + // us with a dangling pointer. + target := now.Add(-w.window) + var ( + anchor windowedSample + anchorOK bool + ) + for i := len(w.samples) - 1; i >= 0; i-- { + if !w.samples[i].at.After(target) { + anchor = w.samples[i] + anchorOK = true + break + } + } + + w.samples = append(w.samples, windowedSample{ + at: now, + numerator: numerator, + denominator: denominator, + }) + + if !anchorOK { + return 0 + } + if numerator < anchor.numerator || denominator < anchor.denominator { + return 0 + } + deltaNum := numerator - anchor.numerator + deltaDen := denominator - anchor.denominator + if deltaDen == 0 { + return 0 + } + return float64(deltaNum) / float64(deltaDen) +} diff --git a/internal/telemetry/windowed_rate_test.go b/internal/telemetry/windowed_rate_test.go new file mode 100644 index 00000000..6980bfdc --- /dev/null +++ b/internal/telemetry/windowed_rate_test.go @@ -0,0 +1,75 @@ +// SPDX-License-Identifier: Apache-2.0 + +package telemetry_test + +import ( + "testing" + "time" + + "github.com/stretchr/testify/require" + + "github.com/tracecoreai/tracecore/internal/telemetry" +) + +// TestWindowedRate_WarmingUpReturnsZero pins the contract: until an +// anchor sample ages out of the window, Observe returns 0 — operators +// alerting on `> 0` don't get false positives during boot. +func TestWindowedRate_WarmingUpReturnsZero(t *testing.T) { + t.Parallel() + w := telemetry.NewWindowedRate(200 * time.Millisecond) + require.InDelta(t, 0.0, w.Observe(1, 10), 1e-9, "first observation") + require.InDelta(t, 0.0, w.Observe(2, 20), 1e-9, "still warming up") +} + +// TestWindowedRate_RateOverWindow pins the rolling math: once an +// anchor is ≥window old, the rate reflects the delta since. +func TestWindowedRate_RateOverWindow(t *testing.T) { + t.Parallel() + w := telemetry.NewWindowedRate(150 * time.Millisecond) + + // Anchor at (0, 0). + w.Observe(0, 0) + + time.Sleep(200 * time.Millisecond) + + // Δ = (5 failures) / (100 calls) = 0.05 + require.InDelta(t, 0.05, w.Observe(5, 100), 1e-9) +} + +// TestWindowedRate_UnderflowReturnsZero pins the safety guard for a +// future hot-reload/restart that resets the underlying counters +// non-monotonically. uint64 subtraction would wrap to ~2^64 and pin +// the gauge at garbage; WindowedRate returns 0 instead. +func TestWindowedRate_UnderflowReturnsZero(t *testing.T) { + t.Parallel() + w := telemetry.NewWindowedRate(100 * time.Millisecond) + + w.Observe(10, 100) + time.Sleep(150 * time.Millisecond) + + require.InDelta(t, 0.0, w.Observe(5, 50), 1e-9, + "counter reset → underflow → zero, not garbage") +} + +// TestWindowedRate_ZeroDeltaReturnsZero pins the div-by-zero guard. +func TestWindowedRate_ZeroDeltaReturnsZero(t *testing.T) { + t.Parallel() + w := telemetry.NewWindowedRate(100 * time.Millisecond) + + w.Observe(0, 0) + time.Sleep(150 * time.Millisecond) + + // Same counters: no delta, no calls in window → 0. + require.InDelta(t, 0.0, w.Observe(0, 0), 1e-9) +} + +// TestWindowedRate_ZeroWindowDefaults pins the constructor's +// fallback contract. +func TestWindowedRate_ZeroWindowDefaults(t *testing.T) { + t.Parallel() + w := telemetry.NewWindowedRate(0) + // Just exercising that construction with 0 doesn't blow up. + // Default is DefaultSLOWindow (60s) so any immediate observation + // returns 0 (warming up). + require.InDelta(t, 0.0, w.Observe(1, 1), 1e-9) +} diff --git a/internal/version/version.go b/internal/version/version.go index 0d4f3988..516b8f62 100644 --- a/internal/version/version.go +++ b/internal/version/version.go @@ -29,18 +29,34 @@ func Info() Build { GoVersion: runtime.Version(), Platform: runtime.GOOS + "/" + runtime.GOARCH, } + info, ok := debug.ReadBuildInfo() + if !ok { + return b + } if b.Revision == "" { - if info, ok := debug.ReadBuildInfo(); ok { - for _, s := range info.Settings { - switch s.Key { - case "vcs.revision": - b.Revision = s.Value - case "vcs.time": - b.BuildDate = s.Value - } + for _, s := range info.Settings { + switch s.Key { + case "vcs.revision": + b.Revision = s.Value + case "vcs.time": + b.BuildDate = s.Value } } } + // Operators debugging an OTel-side bug need the SDK version the + // binary actually shipped with. Reading from build info means the + // number stays accurate across `go.mod` bumps without doc rot. + for _, m := range info.Deps { + if m == nil { + continue + } + switch m.Path { + case "go.opentelemetry.io/otel/sdk/metric": + b.OTelSDKMetric = m.Version + case "go.opentelemetry.io/otel/exporters/prometheus": + b.OTelPromExporter = m.Version + } + } return b } @@ -52,6 +68,15 @@ type Build struct { BuildDate string GoVersion string Platform string + + // OTelSDKMetric is the resolved + // `go.opentelemetry.io/otel/sdk/metric` module version (e.g. + // "v1.43.0"). Empty under `go run` without a module graph. + OTelSDKMetric string + + // OTelPromExporter is the resolved + // `go.opentelemetry.io/otel/exporters/prometheus` version. + OTelPromExporter string } // String formats the build as @@ -63,7 +88,7 @@ type Build struct { // ldflags, no VCS info) is dropped silently so `go run` still emits // a sensible line. func (b Build) String() string { - parts := make([]string, 0, 4) + parts := make([]string, 0, 6) if b.Revision != "" { parts = append(parts, "sha="+b.Revision) } @@ -76,6 +101,12 @@ func (b Build) String() string { if b.Platform != "" { parts = append(parts, "platform="+b.Platform) } + if b.OTelSDKMetric != "" { + parts = append(parts, "otel-sdk-metric="+b.OTelSDKMetric) + } + if b.OTelPromExporter != "" { + parts = append(parts, "otel-prom-exporter="+b.OTelPromExporter) + } out := "tracecore v" + b.Version if len(parts) > 0 { out += " (" + strings.Join(parts, ", ") + ")"