1 change: 1 addition & 0 deletions docs/README.md
@@ -52,6 +52,7 @@ Backend (exporter-side) recipes:
| [integrations/clickhouse-direct.md](integrations/clickhouse-direct.md) | 👤 | Self-hosted ClickHouse via the bundled `clickhouseexporter`. |
| [integrations/loki.md](integrations/loki.md) | 👤 | Grafana Loki via OTLP/HTTP native ingestion (`otlphttp` exporter, `X-Scope-OrgID` tenant header); labels-vs-structured-metadata mapping for `pattern.*` verdict attributes. |
| [integrations/tempo.md](integrations/tempo.md) | 👤 | Grafana Tempo (OSS, AGPL-3.0) trace backend via the in-tree `otlphttp` exporter. |
| [integrations/multi-cluster.md](integrations/multi-cluster.md) | 👤 | Multi-cluster federation v0 (read-only roll-up): N source clusters stamp `cluster.id` via OTTL transform, forward OTLP/HTTP to a central aggregation collector that fans out to backends. |

Source (receiver-side) recipes — RFC-0013 §migration PR-J replacements for the deleted in-tree receivers:

98 changes: 98 additions & 0 deletions docs/integrations/examples/multi-cluster-aggregation.yaml
@@ -0,0 +1,98 @@
# Multi-cluster federation v0 — AGGREGATION-cluster role config (one
# central collector). Receives OTLP/HTTP from N source-cluster
# tracecore collectors and fans out to backends (Loki for verdict logs,
# Tempo/Datadog/ClickHouse/etc. for metrics + traces). Every record on
# the wire already carries the `cluster.id` resource attribute stamped
# by the source-cluster collector (see multi-cluster-source.yaml).
#
# Adoption matrix (RFC-0013 §1, §2): every component is upstream OTel
# collector core or contrib. The aggregation tier deliberately does NOT
# re-run patterndetectorprocessor — verdicts arrive as logs that the
# source clusters already emitted, and re-detecting on top would
# require write-path dedup (v1+ roadmap C4, intentionally out of scope
# for federation v0).
#
# receiver/otlp — collector core. The aggregation listener;
# accepts OTLP/HTTP from every source cluster.
# exporter/otlphttp — collector core. Backend egress; this example
# targets Loki (verdict logs); add additional
# exporters per integrations/{tempo,datadog,
# clickhouse-direct}.md for metrics + traces.
# processor/batch — collector core. Standard.
#
# v0 READ-ONLY contract:
# - Aggregation receives verdicts (already-emitted logs) from every
# source cluster.
# - Aggregation does NOT re-emit deduplicated verdicts across
# clusters; per-cluster verdicts roll up side-by-side, keyed by
# `cluster.id` at query time on the backend.
# - patterndetector is intentionally absent from the pipelines below.
# Re-running detection at the aggregation tier would either
# duplicate verdicts (same input → same output) or require
# write-path dedup, which is deferred to v1+.
#
# Endpoint (Loki distributor): see docs/integrations/loki.md for the
# full breakdown. The `otlphttp` exporter appends `/v1/logs` per the
# OTLP spec, so the endpoint resolves to `…/otlp/v1/logs`.

receivers:
  # OTLP listener for source-cluster traffic. mTLS is the recommended
  # deploy posture (configure with `tls:` and the CA bundle the gateway
  # uses to issue source-cluster client certs). The listener below
  # accepts unauthenticated HTTP for local validation only; rendering
  # the TLS settings is the operator's responsibility at deploy time.
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318
        # max_request_body_size guards against pathological
        # source-cluster bursts. Default is 20 MiB; raise only if source
        # batchprocessor flush sizes legitimately exceed it.
        max_request_body_size: 20971520
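        # A minimal mTLS sketch for the listener above, assuming the
        # upstream collector's server-side `tls:` settings. The file
        # paths are hypothetical placeholders, not shipped defaults;
        # uncomment and render real paths at deploy time. Setting
        # `client_ca_file` is what makes the listener require and verify
        # source-cluster client certificates.
        # tls:
        #   cert_file: /etc/tracecore/tls/server.crt
        #   key_file: /etc/tracecore/tls/server.key
        #   client_ca_file: /etc/tracecore/tls/source-clusters-ca.crt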

processors:
  batch: {}

exporters:
  # Loki for verdict logs. The recipe in docs/integrations/loki.md
  # documents the X-Scope-OrgID tenant header model and the
  # labels-vs-structured-metadata mapping that `cluster.id` lands in
  # (resource attribute, so it is a candidate for the Loki label index
  # if added to `default_resource_attributes_as_index_labels` on the
  # Loki side).
  otlphttp/loki:
    endpoint: http://loki-distributor.observability.svc.cluster.local:3100/otlp
    compression: gzip
    headers:
      X-Scope-OrgID: tracecore
    sending_queue:
      enabled: true
      num_consumers: 4
      queue_size: 1000
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s
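  # A hedged sketch of the exporter that the commented-out
  # traces/federation pipeline further down refers to, assuming a Tempo
  # distributor reachable over OTLP/HTTP. The service address is a
  # hypothetical placeholder; follow integrations/tempo.md for the
  # actual recipe before uncommenting.
  # otlphttp/traces_backend:
  #   endpoint: http://tempo-distributor.observability.svc.cluster.local:4318
  #   compression: gzip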

service:
  pipelines:
    # Verdict logs from every source cluster roll up here. The
    # `cluster.id` resource attribute is preserved by the otlphttp
    # exporter and lands in Loki as either a structured-metadata field
    # (default) or a stream label (if the Loki operator adds
    # `cluster.id` to `default_resource_attributes_as_index_labels`).
    logs/federation:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/loki]
    # Metrics + traces roll-up is optional at v0; uncomment and add a
    # backend exporter per
    # integrations/{tempo,datadog,clickhouse-direct}.md when the
    # backend is provisioned.
    # metrics/federation:
    #   receivers: [otlp]
    #   processors: [batch]
    #   exporters: [otlphttp/metrics_backend]
    # traces/federation:
    #   receivers: [otlp]
    #   processors: [batch]
    #   exporters: [otlphttp/traces_backend]
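#
# Loki-side sketch (this belongs in the Loki configuration, not in this
# collector config): promoting `cluster.id` from structured metadata to
# a stream label, assuming Loki's
# `distributor.otlp_config.default_resource_attributes_as_index_labels`
# setting. The list below is illustrative only; confirm on your Loki
# version whether setting it replaces or extends the built-in defaults.
# distributor:
#   otlp_config:
#     default_resource_attributes_as_index_labels:
#       - service.name
#       - cluster.id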
121 changes: 121 additions & 0 deletions docs/integrations/examples/multi-cluster-source.yaml
@@ -0,0 +1,121 @@
# Multi-cluster federation v0 — SOURCE-cluster role config (one per
# source cluster). The collector runs the standard tracecore signal
# pipeline (receivers, patterndetectorprocessor, batch) and exports
# OTLP/HTTP to an aggregation-cluster collector instead of directly to
# a backend. The aggregation cluster fans out to Loki / Tempo / etc.
#
# Adoption matrix (RFC-0013 §1, §2): every federation component below is
# upstream OTel collector core or contrib. patterndetector is the only
# tracecore component, and there is no tracecore-specific federation code.
#
# receiver/otlp — collector core, accepts traffic from the
# cluster's signal sources (or from a
# cluster-internal feeder collector). Keeping an
# OTLP listener here makes this file role-symmetric
# with multi-cluster-aggregation.yaml; replace it
# with filelog / journald / k8sobjects / prometheus
# per the source recipes if this collector also
# ingests directly.
# processor/transform — contrib, stamps the `cluster.id` resource
# attribute that the aggregation tier uses to
# tell verdicts from different clusters apart.
# processor/patterndetector — tracecore; detects verdicts AT the source
# cluster so the aggregation tier sees verdict
# logs rather than raw signals (verdict-routing
# model documented in docs/multi-cluster.md).
# processor/batch — collector core, standard.
# exporter/otlphttp — collector core, OTLP/HTTP egress to the
# aggregation cluster's OTLP receiver. Same
# component the otel-backend / honeycomb / loki
# recipes use; only the endpoint changes.
#
# cluster.id injection: the `transform/cluster_id` processor stamps the
# resource attribute on every log/metric/trace record at this collector
# before it hits the OTLP exporter. The literal value must be replaced
# per cluster (REPLACE_WITH_CLUSTER_ID, e.g. `cluster-a`). Render at
# deploy time via Helm template or envsubst; tracecore does not expand
# environment variables in YAML (doc-check.sh banned-placeholder gate).
#
# Endpoint: the aggregation cluster's OTLP/HTTP receiver. The default
# port for OTLP/HTTP is 4318; the path suffix `/v1/{logs,metrics,traces}`
# is appended by the exporter automatically per the OTLP spec — do NOT
# include it in `endpoint`.
#
# Auth: a static `Authorization: Bearer …` header is the minimum
# cross-cluster auth this recipe sends. The stock OTLP receiver does not
# validate headers on its own, so the aggregation side (gateway or an
# authenticator on the receiver) must check the token. mTLS is the
# recommended posture for cross-cluster ingress; configure the `tls:`
# block on both sides with a CA / cert bundle issued by the aggregation
# cluster's gateway. The marker below is a placeholder so
# `tracecore validate` succeeds offline; render the real token / cert
# at deploy time.

receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318

processors:
  # Stamp `cluster.id` on every resource. The aggregation tier relies on
  # this attribute (and only this attribute) to map every verdict to
  # its source cluster. Resource context is the right scope because OTLP
  # resource attributes travel with every record in the batch, so no
  # per-datapoint re-stamping is needed.
  transform/cluster_id:
    log_statements:
      - context: resource
        statements:
          - set(attributes["cluster.id"], "REPLACE_WITH_CLUSTER_ID")
    metric_statements:
      - context: resource
        statements:
          - set(attributes["cluster.id"], "REPLACE_WITH_CLUSTER_ID")
    trace_statements:
      - context: resource
        statements:
          - set(attributes["cluster.id"], "REPLACE_WITH_CLUSTER_ID")

  # Detect verdicts AT this source cluster. Emitting verdicts here
  # (rather than at the aggregation tier) keeps the federation v0
  # contract READ-ONLY at the aggregation cluster: aggregation never
  # re-runs detection on the same raw signal, so there is no per-cluster
  # vs cross-cluster verdict-dedup problem at this milestone (v1+
  # roadmap C4 covers write-path dedup).
  patterndetector: {}

  batch: {}

exporters:
  # Egress to the aggregation cluster's OTLP receiver. The endpoint
  # below is a placeholder; replace with the gateway / service address
  # exposed by the aggregation cluster. The `headers:` block carries
  # the cross-cluster auth token; mTLS via `tls:` is the recommended
  # posture for production deploys.
  otlphttp/aggregation:
    endpoint: https://REPLACE_WITH_AGGREGATION_HOST:4318
    compression: gzip
    timeout: 10s
    headers:
      Authorization: REPLACE_WITH_BEARER_TOKEN
    sending_queue:
      enabled: true
      num_consumers: 4
      queue_size: 1000
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s
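    # A minimal client-side mTLS sketch for the exporter above, assuming
    # the upstream collector's client-side `tls:` settings. The file
    # paths are hypothetical placeholders; render real paths at deploy
    # time alongside the token. `ca_file` verifies the aggregation
    # endpoint, while `cert_file` / `key_file` present this cluster's
    # client certificate.
    # tls:
    #   ca_file: /etc/tracecore/tls/aggregation-ca.crt
    #   cert_file: /etc/tracecore/tls/client.crt
    #   key_file: /etc/tracecore/tls/client.key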

service:
  pipelines:
    logs/federation:
      receivers: [otlp]
      processors: [transform/cluster_id, patterndetector, batch]
      exporters: [otlphttp/aggregation]
    metrics/federation:
      receivers: [otlp]
      processors: [transform/cluster_id, batch]
      exporters: [otlphttp/aggregation]
    traces/federation:
      receivers: [otlp]
      processors: [transform/cluster_id, batch]
      exporters: [otlphttp/aggregation]