Merged
4 changes: 3 additions & 1 deletion CHANGELOG.md
@@ -9,7 +9,9 @@ Pre-alpha. **Distribution-first pivot adopted ([RFC-0013](docs/rfcs/0013-distro-
Pivot landed across three waves of PRs:
- Wave 1 (#166 RFC doc accepted, #168 delete kueue + kineto receivers, #169 pre-PR-A drift sweep + Helm security tighten, #170 containerstdout deletion explicit in §7, #171 PR-A OCB skeleton + `builder-config.yaml` + `make build-ocb`, #172 dedup gate execution, #173 rename check tiers + add PR body-artifact guard, #174 PR-C release pipeline → goreleaser stack + RFC supersession + top-level doc alignment, #175 wave-1 self-review fixes + delete archive folder).
- Wave 2 (#176 PR-D image build → ko + `_build/` walker fix + PR-B reframe as side-effect of binary swap, #177 build-ocb CI gate, #178 post-wave-2 drift sweep, #179 v0.1→v0.2 migration guide skeleton).
- Wave 3 (PR-E: bench heartbeat swap `clockreceiver` → `hostmetricsreceiver`).
- Wave 3 (PR-E: bench heartbeat swap `clockreceiver` → `hostmetricsreceiver`; PR-J: ship the four receiver-side recipes that replace the deleted in-tree receivers — filelog+container, journald+filelog+OTTL, k8sobjects+transform, prometheusreceiver).

**PR-J landed: four receiver-side integration recipes for the v0.2.0 swap.** New docs ship under `docs/integrations/`: [`filelog-container.md`](docs/integrations/filelog-container.md) (replaces `containerstdout` — `filelogreceiver` with the container parser stanza, `k8sattributesprocessor`, and `file_storage` for restart-safe checkpoints), [`journald-kernel.md`](docs/integrations/journald-kernel.md) (replaces `kernelevents` — `journaldreceiver` + `filelogreceiver` on `/dev/kmsg` + OTTL `transform` that preserves the customer-stable `kernelevents.xid` and `gpu.id` attributes from RFC-0013 §3), [`k8sobjects-events.md`](docs/integrations/k8sobjects-events.md) (replaces `k8sevents` — `k8sobjectsreceiver` watch mode + OTTL `transform` that derives the eleven-entry `k8s.event.hint` enum), and [`prometheus-scrape.md`](docs/integrations/prometheus-scrape.md) (replaces `dcgm` + `kueue` — generic `prometheusreceiver` scrape with the four GPU vendor exporters tabulated and an OTTL stamp of the `gpu.vendor` resource attribute). Every recipe ships a matching `docs/integrations/examples/*.yaml` validated end-to-end by `make validator-recipe` against the OCB-built `./_build/tracecore validate`. The k8sobjects recipe introduces a new `<!-- tested-against: requires-k8s-cluster -->` marker recognized by both `scripts/doc-check.sh` (accepted) and `scripts/validator-recipe.sh` (skipped with a named log line) because the upstream `k8sobjectsreceiver`'s `Validate()` enumerates server-preferred resources via the discovery client and therefore cannot be exercised offline — its example is gated by the kind-cluster job that runs the chart. Updates `docs/migration/v0.1-to-v0.2.md` to flip the PR-J open-item to done with file pointers. CHANGELOG only — no operator-visible runtime change; v0.2.0 release still gates on PR-K (in-tree-receiver deletion) and PR-L (final migration guide body).

**PR-E unblocked.** Original RFC-0013 §migration plan named `telemetrygeneratorreceiver` as the upstream replacement for `clockreceiver`. Verified 2026-05-30: the receiver does not exist in `opentelemetry-collector-contrib` at any tag from v0.95.0 through v0.130.0; two community proposals (contrib issues #41687 and #43657) were closed `not_planned`. Replacement landed on `hostmetricsreceiver` (loadscraper @ 1s) — an upstream OCB-bundled receiver that emits 3 low-cardinality series (`system.cpu.load_average.{1m,5m,15m}`) at the cadence the bench's pass condition needs (first parseable JSON line at the sink — see `bench/install/run.sh`). This PR adds `hostmetricsreceiver` to `builder-config.yaml`, adds a `receivers.hostmetrics` opt-in block to the chart values (default disabled — chart default stays `clockreceiver` this release), and flips `bench/install/tracecore-values.yaml` to enable hostmetrics + disable clockreceiver. RFC-0013 §migration PR-E + §4 + §7 deletion table updated. Chart-default flip from `clockreceiver` to `hostmetrics` + source-deletion of `components/receivers/clockreceiver/` are deferred to PR-K (in-tree-receiver deletion wave) so the values-keys migration ships together with `NOTES.txt` deprecation warnings and the coordinated migration of ~92 in-tree test-fixture references in one cut rather than two operator-visible changes.
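
The `hostmetricsreceiver` heartbeat described above can be sketched at the collector-config level as follows. This is a minimal sketch, not the chart's actual values block: the keys under the chart's `receivers.hostmetrics` are not shown in this changelog, so the fragment uses the upstream receiver's own config shape, and the pipeline name and exporter wiring are assumptions for illustration.

```yaml
receivers:
  hostmetrics:
    # 1s cadence, matching "loadscraper @ 1s" above.
    collection_interval: 1s
    scrapers:
      # The load scraper emits system.cpu.load_average.1m / .5m / .15m.
      load: {}

exporters:
  otlphttp:
    endpoint: REPLACE_WITH_OTLP_HTTP_ENDPOINT

service:
  pipelines:
    metrics/heartbeat:   # hypothetical pipeline name
      receivers: [hostmetrics]
      exporters: [otlphttp]
```

Enabling only the `load` scraper is what keeps the series count at three; adding further scrapers (cpu, memory, and so on) would raise cardinality beyond what the bench heartbeat needs.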

11 changes: 11 additions & 0 deletions docs/README.md
@@ -40,13 +40,24 @@ Legend: 👤 operator · 🛠️ contributor · 🏛️ maintainer · 🌐 exter

## Integrations

Backend (exporter-side) recipes:

| File | Audience | Purpose |
|---|---|---|
| [integrations/otel-backend.md](integrations/otel-backend.md) | 👤 | OTLP/HTTP to a generic OpenTelemetry Collector via the in-tree `otlphttp` exporter. |
| [integrations/honeycomb.md](integrations/honeycomb.md) | 👤 | Direct OTLP/HTTP to Honeycomb via the in-tree `otlphttp` exporter. |
| [integrations/datadog.md](integrations/datadog.md) | 👤 | Datadog via the bundled `datadogexporter`. |
| [integrations/clickhouse-direct.md](integrations/clickhouse-direct.md) | 👤 | Self-hosted ClickHouse via the bundled `clickhouseexporter`. |

Source (receiver-side) recipes — RFC-0013 §migration PR-J replacements for the deleted in-tree receivers:

| File | Audience | Purpose |
|---|---|---|
| [integrations/filelog-container.md](integrations/filelog-container.md) | 👤 | Container stdout/stderr tailing via `filelogreceiver` + container parser + `k8sattributesprocessor` + `file_storage`. Replaces `containerstdout`. |
| [integrations/journald-kernel.md](integrations/journald-kernel.md) | 👤 | Kernel + systemd events via `journaldreceiver` + `filelogreceiver` (kmsg) + OTTL transform preserving `kernelevents.xid` / `gpu.id`. Replaces `kernelevents`. |
| [integrations/k8sobjects-events.md](integrations/k8sobjects-events.md) | 👤 | Kubernetes events via `k8sobjectsreceiver` + OTTL transform preserving the eleven-entry `k8s.event.hint` enum. Replaces `k8sevents`. |
| [integrations/prometheus-scrape.md](integrations/prometheus-scrape.md) | 👤 | Generic Prometheus scrape via `prometheusreceiver` (dcgm-exporter, AMD/Intel/Habana exporters, Kueue) + OTTL `gpu.vendor` normalization. Replaces `dcgm` and `kueue`. |

## Per-component docs

| Path | Audience | Purpose |
94 changes: 94 additions & 0 deletions docs/integrations/examples/filelog-container.yaml
@@ -0,0 +1,94 @@
# Container stdout/stderr tailing via the upstream `filelogreceiver`
# with a container parser stanza, k8sattributes enrichment, and the
# `file_storage` extension for restart-safe checkpointing. Replaces
# the in-tree `containerstdout` receiver scheduled for deletion at
# v0.2.0 per RFC-0013 §migration PR-K + §7. The OCB-assembled
# tracecore binary bundles every component below; `tracecore
# validate` covers this file.
#
# Deployment shape: tracecore DaemonSet -> reads /var/log/pods/*/* on
# each node -> filelogreceiver parses CRI/JSON -> k8sattributesprocessor
# enriches with pod/namespace/labels -> otlphttpexporter to backend.
# Replace the OTLP endpoint placeholder at deploy time; tracecore does
# not expand environment variables in YAML, so render the literal with
# a secret-injection tool (envsubst, External Secrets, sealed-secrets)
# before `helm install`. See docs/integrations/filelog-container.md.

extensions:
  file_storage/checkpoints:
    directory: /var/lib/tracecore/filelog
    create_directory: true
    timeout: 1s
    compaction:
      directory: /var/lib/tracecore/filelog
      on_start: true
      on_rebound: true

receivers:
  filelog/container:
    include:
      - /var/log/pods/*/*/*.log
    exclude:
      - /var/log/pods/*/tracecore/*.log
    start_at: end
    include_file_path: true
    include_file_name: false
    storage: file_storage/checkpoints
    operators:
      - id: container-parser
        type: container
        format: auto
        add_metadata_from_filepath: true
      # The container operator records the stream (stdout/stderr)
      # under the `log.iostream` attribute, so severity is derived
      # from that key.
      - id: severity-parser
        type: severity_parser
        parse_from: attributes["log.iostream"]
        mapping:
          error: stderr
          info: stdout
        if: 'attributes["log.iostream"] != nil'

processors:
  k8sattributes:
    auth_type: serviceAccount
    passthrough: false
    extract:
      metadata:
        - k8s.namespace.name
        - k8s.pod.name
        - k8s.pod.uid
        - k8s.deployment.name
        - k8s.statefulset.name
        - k8s.daemonset.name
        - k8s.node.name
        - k8s.container.name
      labels:
        - tag_name: app
          key: app.kubernetes.io/name
          from: pod
    pod_association:
      - sources:
          - from: resource_attribute
            name: k8s.pod.uid
      - sources:
          - from: resource_attribute
            name: k8s.namespace.name
          - from: resource_attribute
            name: k8s.pod.name
  batch:
    send_batch_size: 8192
    timeout: 5s
    send_batch_max_size: 16384

exporters:
  otlphttp:
    endpoint: REPLACE_WITH_OTLP_HTTP_ENDPOINT
    compression: gzip
    timeout: 10s

service:
  extensions: [file_storage/checkpoints]
  pipelines:
    logs/container:
      receivers: [filelog/container]
      processors: [k8sattributes, batch]
      exporters: [otlphttp]
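
The config above reads `/var/log/pods` and writes checkpoints under `/var/lib/tracecore/filelog`, so the DaemonSet pod needs both paths mounted from the host. The fragment below is a sketch of that wiring under stated assumptions: the volume names are hypothetical, the container is assumed to be named `tracecore` (which is what the `exclude` glob above matches on), and the real chart may template these differently.

```yaml
# Fragment of a DaemonSet pod template; not a complete manifest.
spec:
  template:
    spec:
      containers:
        - name: tracecore              # matches the exclude glob's container segment
          volumeMounts:
            - name: varlogpods         # hypothetical volume name
              mountPath: /var/log/pods
              readOnly: true
            - name: filelog-checkpoints  # hypothetical volume name
              mountPath: /var/lib/tracecore/filelog
      volumes:
        - name: varlogpods
          hostPath:
            path: /var/log/pods
        - name: filelog-checkpoints
          hostPath:
            path: /var/lib/tracecore/filelog
            type: DirectoryOrCreate
```

Backing the checkpoint directory with a `hostPath` rather than an `emptyDir` is what lets offsets survive a pod restart on the same node, which is the point of the `file_storage` extension here.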
100 changes: 100 additions & 0 deletions docs/integrations/examples/journald-kernel.yaml
@@ -0,0 +1,100 @@
# Kernel + systemd events via upstream `journaldreceiver` (journald
# stream) + `filelogreceiver` (raw /dev/kmsg for kernel ring buffer),
# normalized to OTel log records via OTTL `transform` so operator
# alerts against `kernelevents.xid` survive the swap per RFC-0013
# §3 (customer-stable contracts). Replaces the in-tree
# `kernelevents` receiver scheduled for deletion at v0.2.0 per
# RFC-0013 §migration PR-K + §7. The OCB-assembled tracecore binary
# bundles every component below; `tracecore validate` covers this
# file.
#
# Deployment shape: tracecore DaemonSet -> reads /var/log/journal
# + /dev/kmsg on each node (hostPath mounts) -> OTTL transform
# normalizes severity + extracts NVRM Xid -> otlphttpexporter to
# backend. Replace the OTLP endpoint placeholder at deploy time;
# tracecore does not expand environment variables in YAML, so render
# the literal with a secret-injection tool (envsubst, External
# Secrets, sealed-secrets) before `helm install`. See
# docs/integrations/journald-kernel.md.

receivers:
  journald:
    directory: /var/log/journal
    units:
      - kubelet.service
      - containerd.service
      - systemd-networkd.service
    priority: info
  filelog/kmsg:
    include:
      - /dev/kmsg
    start_at: end
    include_file_path: false
    operators:
      - id: kmsg-parser
        type: regex_parser
        # The kmsg flags field is `-` by default, which `\w` does not
        # match, and the kernel may append further comma-separated
        # fields before the `;`, so the flag group and the prefix
        # tail are matched permissively.
        regex: '^(?P<priority>\d+),(?P<seq>\d+),(?P<usec>\d+),(?P<flag>[^,;]+)[^;]*;(?P<message>.*)$'
        # Map syslog numeric priority (0-7) to OTel severity. The
        # numeric values are quoted because severity_parser's
        # mapping accepts string keys.
        severity:
          parse_from: attributes.priority
          mapping:
            error: "3"
            warn: "4"
            info: "6"
            debug: "7"

processors:
  # Kmsg lines arrive with body=string (the raw `/dev/kmsg` line)
  # so IsMatch / ExtractPatterns on body type-check correctly. The
  # NVRM Xid signal only appears here, never in journald output.
  transform/kmsg_xid:
    log_statements:
      - context: log
        # OTTL statements are single-quoted to prevent YAML from
        # interpreting embedded `:` (inside regex character classes)
        # as a map-key separator. Without the quotes the parser
        # rejects the value as `type=string cannot be used as a
        # Conf` at validate time.
        statements:
          # NVRM Xid extraction preserves the customer-stable
          # `kernelevents.xid` attribute across the receiver swap.
          - 'set(attributes["kernelevents.xid"], Int(ExtractPatterns(body, "NVRM: Xid \\(PCI:[0-9a-fA-F:.]+\\): (?P<xid>\\d+)")["xid"])) where IsMatch(body, "NVRM: Xid")'
          # gpu.id (PCI BDF) extracted from the Xid line where
          # present; matches §3 customer-stable contract.
          - 'set(attributes["gpu.id"], ExtractPatterns(body, "NVRM: Xid \\((?P<bdf>PCI:[0-9a-fA-F:.]+)\\)")["bdf"]) where IsMatch(body, "NVRM: Xid")'
  # Journald records arrive with body=map (every journald field
  # keyed verbatim, e.g. body["_SYSTEMD_UNIT"]). Copying the unit
  # name into a `service.name` log attribute lets downstream
  # filters use the OTel convention name instead of the
  # journald-specific underscored names.
  transform/journald_service_name:
    log_statements:
      - context: log
        statements:
          - 'set(attributes["service.name"], body["_SYSTEMD_UNIT"]) where body["_SYSTEMD_UNIT"] != nil'
  batch:
    send_batch_size: 4096
    timeout: 5s

exporters:
  otlphttp:
    endpoint: REPLACE_WITH_OTLP_HTTP_ENDPOINT
    compression: gzip
    timeout: 10s

service:
  pipelines:
    # Separate pipelines per receiver so each OTTL transform sees
    # the body shape it was written against (string for kmsg, map
    # for journald). A single shared pipeline would force the NVRM
    # Xid IsMatch to run against a map and runtime-error.
    logs/kmsg:
      receivers: [filelog/kmsg]
      processors: [transform/kmsg_xid, batch]
      exporters: [otlphttp]
    logs/journald:
      receivers: [journald]
      processors: [transform/journald_service_name, batch]
      exporters: [otlphttp]
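
As a worked example of the two `transform/kmsg_xid` statements, the block below pairs one kmsg line with the attributes those statements would add. It is illustrative data rather than collector config: the sample line is made up (the sequence number, timestamp, pid, and PCI address are assumptions), and only the two attributes set by the OTTL statements are shown.

```yaml
# Illustrative only: a made-up kmsg line and the attributes the two
# OTTL statements would add to it.
body: "4,1201,53211872,-;NVRM: Xid (PCI:0000:3b:00): 79, pid=4172, GPU has fallen off the bus."
attributes:
  # Int(ExtractPatterns(...)["xid"]) converts the capture to an integer.
  kernelevents.xid: 79
  # The "bdf" capture group spans the "PCI:" prefix as well as the address.
  gpu.id: "PCI:0000:3b:00"
```

Note that `gpu.id` keeps the `PCI:` prefix because the capture group opens before it; alerts that compare against a bare BDF would need to account for that.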
64 changes: 64 additions & 0 deletions docs/integrations/examples/k8sobjects-events.yaml
@@ -0,0 +1,64 @@
# Kubernetes Events API via upstream `k8sobjectsreceiver` in watch
# mode, normalized through OTTL `transform` to populate the
# customer-stable `k8s.event.hint` attribute (RFC-0013 §3 - the
# 11-entry enum pod_evicted / mount_failure / backoff / oom_killed /
# node_unhealthy / schedule_failure / create_failure /
# volume_attach_failure / container_status_unknown / node_pressure /
# image_pull_failure). Replaces the in-tree `k8sevents` receiver
# scheduled for deletion at v0.2.0 per RFC-0013 §migration PR-K +
# §7. The OCB-assembled tracecore binary bundles every component
# below; `tracecore validate` covers this file.
#
# Deployment shape: tracecore Deployment (single replica, not
# DaemonSet - one watcher per cluster) -> watches core/v1 Events ->
# OTTL transform derives `k8s.event.hint` from event.reason ->
# otlphttpexporter to backend. Replace the OTLP endpoint placeholder
# at deploy time; tracecore does not expand environment variables in
# YAML, so render the literal with a secret-injection tool
# (envsubst, External Secrets, sealed-secrets) before `helm
# install`. See docs/integrations/k8sobjects-events.md.

receivers:
  k8sobjects:
    auth_type: serviceAccount
    objects:
      - name: events
        mode: watch
        group: ""

processors:
  # Derive the 11-entry `k8s.event.hint` enum from the Kubernetes
  # Event.reason field so operator alerts against §3's customer-stable
  # contract survive the receiver swap. Reason values come from
  # kubernetes/kubernetes/pkg/kubelet/events/event.go.
  transform/hint:
    log_statements:
      - context: log
        statements:
          - set(attributes["k8s.event.hint"], "pod_evicted") where body["object"]["reason"] == "Evicted"
          - set(attributes["k8s.event.hint"], "oom_killed") where body["object"]["reason"] == "OOMKilling"
          - set(attributes["k8s.event.hint"], "backoff") where body["object"]["reason"] == "BackOff"
          - set(attributes["k8s.event.hint"], "create_failure") where body["object"]["reason"] == "FailedCreatePodSandBox" or body["object"]["reason"] == "FailedCreate"
          - set(attributes["k8s.event.hint"], "schedule_failure") where body["object"]["reason"] == "FailedScheduling"
          - set(attributes["k8s.event.hint"], "mount_failure") where body["object"]["reason"] == "FailedMount"
          - set(attributes["k8s.event.hint"], "volume_attach_failure") where body["object"]["reason"] == "FailedAttachVolume"
          - set(attributes["k8s.event.hint"], "image_pull_failure") where body["object"]["reason"] == "Failed" or body["object"]["reason"] == "ErrImagePull" or body["object"]["reason"] == "ImagePullBackOff"
          - set(attributes["k8s.event.hint"], "container_status_unknown") where body["object"]["reason"] == "ContainerStatusUnknown"
          - set(attributes["k8s.event.hint"], "node_pressure") where body["object"]["reason"] == "EvictionThresholdMet" or body["object"]["reason"] == "NodeHasInsufficientMemory" or body["object"]["reason"] == "NodeHasDiskPressure" or body["object"]["reason"] == "NodeHasInsufficientPID"
          - set(attributes["k8s.event.hint"], "node_unhealthy") where body["object"]["reason"] == "NodeNotReady" or body["object"]["reason"] == "NodeNotSchedulable"
  batch:
    send_batch_size: 1024
    timeout: 10s

exporters:
  otlphttp:
    endpoint: REPLACE_WITH_OTLP_HTTP_ENDPOINT
    compression: gzip
    timeout: 10s

service:
  pipelines:
    logs/k8sevents:
      receivers: [k8sobjects]
      processors: [transform/hint, batch]
      exporters: [otlphttp]
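
Because the receiver above authenticates with `auth_type: serviceAccount` and watches core/v1 Events cluster-wide, the service account it runs under needs read access to events. The manifests below are a sketch of that RBAC, assuming a service account named `tracecore` in a namespace named `tracecore`; both names, and the role name, are hypothetical and the chart may already provide equivalent rules.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: tracecore-k8sobjects        # hypothetical name
rules:
  - apiGroups: [""]                 # core group, matching `group: ""` above
    resources: ["events"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tracecore-k8sobjects        # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: tracecore-k8sobjects
subjects:
  - kind: ServiceAccount
    name: tracecore                 # hypothetical service account
    namespace: tracecore            # hypothetical namespace
```

A cluster-scoped role is used here because a single-replica watcher is meant to see events from every namespace; a namespaced `Role` would only fit a deployment that sets `namespaces` on the receiver.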
67 changes: 67 additions & 0 deletions docs/integrations/examples/prometheus-scrape.yaml
@@ -0,0 +1,67 @@
# Generic Prometheus scrape via upstream `prometheusreceiver` -
# the adoption shape for every vendor GPU exporter per RFC-0013 §2
# (NVIDIA `dcgm-exporter`, AMD `ROCm/device-metrics-exporter`,
# Intel `intel/xpumanager`, Habana Prometheus Metric Exporter) and
# for Kueue scheduler metrics. Replaces the in-tree `dcgm` and
# `kueue` receivers (deleted at v0.1.0 per RFC-0013 §7 deletion
# table). The OCB-assembled tracecore binary bundles every
# component below; `tracecore validate` covers this file.
#
# Deployment shape: tracecore DaemonSet (for per-node scrape
# targets like dcgm-exporter) or Deployment (for cluster-scoped
# targets like the Kueue control-plane) -> Prometheus-style scrape
# at the configured interval -> OTTL transform normalizes
# customer-stable resource attributes (`gpu.vendor`, `gpu.id`) ->
# otlphttpexporter to backend. The example below scrapes a
# dcgm-exporter DaemonSet at the conventional :9400/metrics
# endpoint. Replace the OTLP endpoint and dcgm-exporter target
# placeholders at deploy time; tracecore does not expand
# environment variables in YAML, so render the literals with a
# secret-injection tool (envsubst, External Secrets,
# sealed-secrets) before `helm install`. See
# docs/integrations/prometheus-scrape.md.

receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: dcgm-exporter
          scrape_interval: 15s
          scrape_timeout: 10s
          metrics_path: /metrics
          static_configs:
            - targets:
                - REPLACE_WITH_DCGM_EXPORTER_TARGET
          # Add bearer_token / tls_config blocks here for
          # authenticated targets such as Kueue's controller-manager
          # metrics endpoint; see the recipe markdown for the full
          # field list.

processors:
  # Normalize the cross-vendor customer-stable attributes from
  # RFC-0013 §3 so dashboards survive a future swap from
  # dcgm-exporter to AMD/Intel/Habana equivalents.
  transform/gpu_vendor:
    metric_statements:
      - context: datapoint
        statements:
          - set(resource.attributes["gpu.vendor"], "nvidia") where IsMatch(metric.name, "^DCGM_")
          - set(resource.attributes["gpu.vendor"], "amd") where IsMatch(metric.name, "^amdsmi_")
          - set(resource.attributes["gpu.vendor"], "intel") where IsMatch(metric.name, "^xpum_")
          - set(resource.attributes["gpu.vendor"], "habana") where IsMatch(metric.name, "^habanalabs_")
  batch:
    send_batch_size: 8192
    timeout: 10s

exporters:
  otlphttp:
    endpoint: REPLACE_WITH_OTLP_HTTP_ENDPOINT
    compression: gzip
    timeout: 10s

service:
  pipelines:
    metrics/scrape:
      receivers: [prometheus]
      processors: [transform/gpu_vendor, batch]
      exporters: [otlphttp]
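
The scrape config above covers an unauthenticated target. For an authenticated endpoint such as Kueue's controller-manager metrics, a second job might look like the sketch below. It uses the standard Prometheus `authorization` and `tls_config` scrape fields with the in-pod service-account token and CA paths; the job name, interval, and `REPLACE_WITH_KUEUE_METRICS_TARGET` placeholder are assumptions, and whether the service-account CA actually signs the endpoint's certificate depends on how Kueue is installed.

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: kueue                      # hypothetical job name
          scrape_interval: 30s
          scheme: https
          metrics_path: /metrics
          authorization:
            type: Bearer
            # Standard in-pod service-account token path.
            credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          tls_config:
            # Assumes the endpoint's cert chains to the cluster CA.
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          static_configs:
            - targets:
                - REPLACE_WITH_KUEUE_METRICS_TARGET
```

Reading the token from `credentials_file` keeps the secret out of the rendered YAML, which matters here because tracecore does not expand environment variables in config.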