Merged
76 changes: 26 additions & 50 deletions install/kubernetes/tracecore/README.md
@@ -2,8 +2,8 @@

Minimal-privilege DaemonSet for the [tracecore](https://github.com/tracecoreai/tracecore)
OpenTelemetry collector. Renders a `restricted`-class Pod Security Standard
pod spec by default; per-receiver toggles let an operator opt into
hardware-coupled receivers (DCGM, kernelevents) without changing the
pod spec by default; per-receiver toggles let an operator wire upstream
OCB-bundled components (hostmetrics, otlphttp, …) without changing the
template.

| Chart attribute | Value |
@@ -30,11 +30,12 @@ helm template tracecore install/kubernetes/tracecore \
| kubectl apply --dry-run=server -f -
```

The default values enable the hardware-free `clockreceiver` paired with
the in-tree `stdoutexporter`; the DaemonSet boots cleanly on a no-GPU
cluster. To enable the GPU `dcgm` receiver or the host-kernel
`kernelevents` receiver, see `values.yaml` and the deviations table in
"Pod Security Standard compliance" below.
The default values enable the upstream OCB-bundled `hostmetrics`
receiver (`load` scraper) paired with the `debug` exporter; the DaemonSet
boots cleanly on a no-GPU cluster and writes load-average metrics to
pod stdout. Swap `debug` for `otlphttp` (also OCB-bundled) before
treating the DaemonSet as a steady-state production deployment; see
the worked overlay below.
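
For orientation, the default wiring described above would correspond to a values fragment roughly like the one below. This is a sketch assembled from the key names in `ci/one-receiver-on-values.yaml`, not a copy of the shipped `values.yaml`; the real defaults may carry extra fields (a collection interval, for example) that are not shown here.

```yaml
# Sketch of the default wiring; key names follow ci/one-receiver-on-values.yaml.
# Assumption: the shipped values.yaml may set additional fields not shown here.
receivers:
  hostmetrics:
    enabled: true
    scrapers:
      load: {}          # load-average scraper
exporters:
  debug:
    enabled: true       # writes the scraped metrics to pod stdout
pipelines:
  metrics:
    receivers: [hostmetrics]
    exporters: [debug]
```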

## Upgrade

@@ -112,9 +113,9 @@ automatically; PersistentVolumeClaims (if any are added via the
| `containerSecurityContext.capabilities.add` | list | `[]` | SYS_PTRACE is the only allowed addition; conftest rejects any other. |
| `telemetry.enabled` | bool | `true` | tracecore `/metrics`+`/healthz`+`/readyz` listener. |
| `telemetry.listen` | string | `0.0.0.0:8888` | Pod-IP listener; kubelet probes hit the pod IP. |
| `receivers.<name>.enabled` | bool | varies | Toggle per receiver. `clockreceiver` on by default. |
| `exporters.<name>.enabled` | bool | varies | Toggle per exporter. `stdoutexporter` on by default. |
| `pipelines.<key>` | map | `metrics: {receivers:[clockreceiver], exporters:[stdoutexporter]}` | Pipeline wiring. References to disabled components are silently dropped at render time. |
| `receivers.<name>.enabled` | bool | varies | Toggle per receiver. `hostmetrics` on by default. |
| `exporters.<name>.enabled` | bool | varies | Toggle per exporter. `debug` on by default. |
| `pipelines.<key>` | map | `metrics: {receivers:[hostmetrics], exporters:[debug]}` | Pipeline wiring. References to disabled components are silently dropped at render time. |
| `config` | map | `{}` | Free-form override deep-merged INTO the rendered tracecore config last. Do NOT place credentials here; ConfigMaps are unencrypted in etcd. |
| `resources.requests` | map | `{cpu: 10m, memory: 32Mi}` | Conservative defaults; tune for receiver load. |
| `resources.limits` | map | `{cpu: 100m, memory: 128Mi}` | Conservative defaults; tune for receiver load. |
@@ -135,41 +136,23 @@ A few worked examples for typical adopter overlays. Save each as a
file and pass with `-f <file>`; `--reuse-values` preserves anything
not overridden.

**Enable DCGM on every node (requires `nv-hostengine` reachable):**

```yaml
# dcgm-overlay.yaml
receivers:
clockreceiver:
enabled: false
dcgm:
enabled: true
endpoint: localhost:5555
pipelines:
metrics:
receivers: [dcgm]
exporters: [stdoutexporter]
```

Apply: `helm upgrade tracecore install/kubernetes/tracecore -n tracecore-system -f dcgm-overlay.yaml`

**Route output to an OTLP backend (structured `exporters.otlphttp` toggle):**

```yaml
# otlp-overlay.yaml
exporters:
stdoutexporter:
debug:
enabled: false
otlphttp:
enabled: true
endpoint: https://collector.example.com:4318
pipelines:
metrics:
receivers: [clockreceiver]
receivers: [hostmetrics]
exporters: [otlphttp]
```

The full otlphttp field reference (headers, compression, timeout, max_retries, insecure_skip_verify, ...) lives at [`components/exporters/otlphttp/README.md`](../../../components/exporters/otlphttp/README.md). For fields the structured block doesn't expose, use the free-form `config.exporters.otlphttp.*` deep-merge block.
The full otlphttp field reference (headers, compression, timeout, retry_on_failure, sending_queue, tls.*, ...) lives in the upstream [`otlphttpexporter`](https://github.com/open-telemetry/opentelemetry-collector/tree/main/exporter/otlphttpexporter) README. For fields the structured block doesn't expose, use the free-form `config.exporters.otlphttp.*` deep-merge block.

**Run on every node including tainted ones (control plane, GPU pools):**

@@ -212,12 +195,12 @@ template in a way that violates the minimum-privilege charter. Re-read
and the fixture set under `policies/conftest/testdata/` before
patching the template.

**`OOMKilled` after enabling `receivers.kernelevents`.** The kernel
event source can buffer large batches under journald load; the chart's
default `resources.limits.memory: 128Mi` is sized for the default
hardware-free configuration. Bump to `256Mi` or higher
(`--set resources.limits.memory=256Mi`) and monitor RSS with
`kubectl top pod`.
**`OOMKilled` after wiring a high-volume upstream receiver.** The
chart's default `resources.limits.memory: 128Mi` is sized for the
hardware-free hostmetrics + debug pairing. Receivers that buffer large
batches (`filelog`, `journald`, `kafka`) can push RSS well above that
limit. Bump to `256Mi` or higher (`--set resources.limits.memory=256Mi`)
and monitor RSS with `kubectl top pod`.
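
The same bump can be kept in a values overlay rather than a `--set` flag, which keeps the change reviewable alongside the other overlays. A minimal sketch, assuming a file named `memory-overlay.yaml` (the name is illustrative) and treating `256Mi` as a starting point rather than a measured figure:

```yaml
# memory-overlay.yaml (hypothetical file name)
resources:
  limits:
    memory: 256Mi   # starting point from the text above; size from observed usage
```

Apply: `helm upgrade tracecore install/kubernetes/tracecore -n tracecore-system -f memory-overlay.yaml`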

**Rollout takes hours on fleets above ~500 nodes.** Default
`updateStrategy.rollingUpdate.maxUnavailable: 1` × per-node readiness
@@ -290,19 +273,12 @@ chart's deviations from a literal reading of `restricted`:
capability is in the conftest allowlist; any other addition rejects
the build.

2. **Host-path mounts are required for some receivers.** Enabling
`receivers.kernelevents` requires `hostPath` mounts (`/dev/kmsg`
read-only, optionally `/var/log/journal` and
`/run/systemd/journal` for the journald source). The chart does
not render those mounts by default; operators opt in via the
`config:` override and accept the deviation.

3. **DCGM standalone mode connects to `nv-hostengine`.** When
`receivers.dcgm.enabled=true` and `mode=standalone`, the
DaemonSet connects to an external DCGM endpoint specified in
values. The chart does not run `nv-hostengine` in-process and does
not add capabilities for it. Embedded mode is out of scope for the
default chart.
2. **Host-path mounts are required for some upstream receivers.**
Wiring `journaldreceiver` or `filelogreceiver` (via the `config:`
override) typically requires `hostPath` mounts (`/var/log/journal`,
`/run/systemd/journal`, `/var/log/pods`, …). The chart does not
render those mounts by default; operators opt in via a custom
DaemonSet patch and accept the deviation.
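
As an illustration of the opt-in in item 2, the receiver side could be wired through the `config:` override roughly as follows. This is a sketch under several assumptions, not a tested recipe: it assumes the `journald` receiver is registered in the binary (check with `tracecore components`), that the override follows the upstream `service.pipelines` layout, and that the `debug` exporter is still enabled. The matching `hostPath` volume and mount for `/var/log/journal` are not rendered by the chart and would have to come from a separate DaemonSet patch.

```yaml
# journald-overlay.yaml (hypothetical file name; a sketch, not a tested recipe)
config:
  receivers:
    journald:
      directory: /var/log/journal   # needs a matching hostPath mount
  service:
    pipelines:
      logs:
        receivers: [journald]
        exporters: [debug]
```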

Each deviation is bounded by the conftest policy: the policy only
permits SYS_PTRACE, never relaxes hostPID/hostIPC/hostNetwork, and
17 changes: 4 additions & 13 deletions install/kubernetes/tracecore/ci/all-receivers-off-values.yaml
@@ -1,25 +1,16 @@
# Used by the chart-render CI gate. Disables every receiver and
# exporter; rendered config has no pipelines and only the
# service.telemetry self-metrics + health_check extension.
# The chart.yml workflow skips `tracecore validate` on this fixture
# (RFC-0013 PR-A2 disabled the gate while the chart still emits the
# legacy `telemetry:` top-level key the OCB binary does not recognise;
# PR-K reinstates the gate after the chart shape migrates to upstream
# `service.telemetry`).
# The chart.yml workflow does NOT run `tracecore validate` on this
# fixture: upstream OTel requires at least one receiver configuration,
# so an all-off render cannot pass validate by design. The fixture's
# purpose is the helm render + conftest gates.
receivers:
hostmetrics:
enabled: false
clockreceiver:
enabled: false
dcgm:
enabled: false
kernelevents:
enabled: false

exporters:
debug:
enabled: false
stdoutexporter:
enabled: false

pipelines: {}
8 changes: 0 additions & 8 deletions install/kubernetes/tracecore/ci/one-receiver-on-values.yaml
@@ -7,18 +7,10 @@ receivers:
collection_interval: 1s
scrapers:
load: {}
clockreceiver:
enabled: false
dcgm:
enabled: false
kernelevents:
enabled: false

exporters:
debug:
enabled: true
stdoutexporter:
enabled: false

pipelines:
metrics:
18 changes: 5 additions & 13 deletions install/kubernetes/tracecore/ci/pyspy-on-values.yaml
@@ -6,23 +6,17 @@
# addition because the helper does the frame walking inside the
# target Python process.
#
# Pyspy is an in-tree-only receiver pending PR-K deletion; this
# fixture exists ONLY to exercise the conftest + caps assertions on
# the DaemonSet. The rendered configmap is NOT fed to `tracecore
# validate` (the OCB binary does not register pyspy). The chart-
# render workflow's validate gate uses one-receiver-on-values.yaml.
# Pyspy is an in-tree-only receiver pending an upstream OTel Profiles
# replacement; this fixture exists ONLY to exercise the conftest + caps
# assertions on the DaemonSet. The rendered configmap is NOT fed to
# `tracecore validate` (the OCB binary does not register pyspy). The
# chart-render workflow's validate gate uses one-receiver-on-values.yaml.
receivers:
hostmetrics:
enabled: true
collection_interval: 1s
scrapers:
load: {}
clockreceiver:
enabled: false
dcgm:
enabled: false
kernelevents:
enabled: false
pyspy:
enabled: true
target:
@@ -34,8 +28,6 @@ receivers:
exporters:
debug:
enabled: true
stdoutexporter:
enabled: false

pipelines:
logs:
80 changes: 0 additions & 80 deletions install/kubernetes/tracecore/policies/conftest/tracecore.rego
@@ -44,32 +44,16 @@ deny contains msg if {
msg := sprintf("%s/%s sets hostUsers=true; user-namespace sharing with the host is forbidden", [input.kind, input.metadata.name])
}

# containerstdout-allowlist — the M15 containerstdout receiver is the
# only sanctioned path to runAsUser=0 in the chart because the kubelet
# CRI symlink tree under /var/log/pods is root-owned on every distro
# tracecore supports. The exemption is gated on the *presence of the
# containerstdout-pod-logs hostPath volume*, NOT on a values flag, so
# a custom DaemonSet patch that drops the volume cannot smuggle root
# through this rule. The receiver's RUNBOOK enumerates the privilege
# tradeoff; values.yaml comments document operator opt-in.
containerstdout_enabled if {
some v in object.get(pod_spec, "volumes", [])
v.name == "containerstdout-pod-logs"
}

# Pod-level runAsNonRoot — restricted PSS requires this OR every container's
# securityContext.runAsNonRoot=true. The chart sets it at pod level; the
# policy enforces that contract so a values-override can't downgrade.
#
# Guarded with input.spec.template.spec so the rule only fires on
# pod-bearing objects (Deployment, DaemonSet, StatefulSet, Job, …);
# ConfigMap / ServiceAccount documents are exempt.
#
# Exempts containerstdout-enabled DaemonSets — they MUST run as root.
deny contains msg if {
input.spec.template.spec
not pod_spec.securityContext.runAsNonRoot == true
not containerstdout_enabled
msg := sprintf("%s/%s must set pod securityContext.runAsNonRoot=true", [input.kind, input.metadata.name])
}

@@ -99,20 +83,15 @@ all_containers_have_seccomp if {
# UID 0 / GID 0 forbidden — restricted PSS requires non-root. The chart
# default sets runAsUser/Group to 65532; this rule rejects a values
# override that downgrades to root.
#
# Exempts containerstdout-enabled DaemonSets — they MUST run as root
# to read kubelet-owned CRI symlinks under /var/log/pods.
deny contains msg if {
input.spec.template.spec
pod_spec.securityContext.runAsUser == 0
not containerstdout_enabled
msg := sprintf("%s/%s sets runAsUser=0; root execution is forbidden", [input.kind, input.metadata.name])
}

deny contains msg if {
input.spec.template.spec
pod_spec.securityContext.runAsGroup == 0
not containerstdout_enabled
msg := sprintf("%s/%s sets runAsGroup=0; root group is forbidden", [input.kind, input.metadata.name])
}

@@ -154,62 +133,3 @@ deny contains msg if {
msg := sprintf("container %q adds capability %q; only SYS_PTRACE is allowed", [c.name, cap])
}

# ─── containerstdout (M15) operational invariants ──────────────────
#
# The containerstdout receiver opts into root execution + a per-node
# Pod informer via the chart's values knob. These rules pin the
# operational shape so a chart edit that omits the volumes, RBAC, or
# downward-API env can't ship — the receiver would crash-loop in any
# of those states, but the failure would land at runtime in
# `kubectl logs` rather than at `helm install --dry-run`.

# Required hostPath: /var/log/pods. The tailer cannot resolve any
# CRI symlink without it; absence is a hard fail.
deny contains msg if {
input.spec.template.spec
containerstdout_enabled
not has_volume_named("containerstdout-pod-logs")
msg := sprintf("%s/%s enables containerstdout but missing hostPath volume 'containerstdout-pod-logs' (/var/log/pods); the tailer cannot read CRI symlinks without it", [input.kind, input.metadata.name])
}

# containerstdout-pod-logs hostPath must point at /var/log/pods. Any
# other path defeats the CRI symlink resolution contract and risks
# tailing an attacker-controlled directory.
deny contains msg if {
some vol in object.get(pod_spec, "volumes", [])
vol.name == "containerstdout-pod-logs"
vol.hostPath.path != "/var/log/pods"
msg := sprintf("containerstdout-pod-logs volume must mount /var/log/pods, got %q", [vol.hostPath.path])
}

# Required hostPath: cursor directory. Without it cursor writes go to
# the read-only rootfs and every checkpoint increments
# KindCursorWriteFailed — verified by TestFailure_CursorWriteFailedReadOnlyFs.
deny contains msg if {
input.spec.template.spec
containerstdout_enabled
not has_volume_named("containerstdout-cursor")
msg := sprintf("%s/%s enables containerstdout but missing hostPath volume 'containerstdout-cursor' (cursor persistence dir); restarts will double-emit lines", [input.kind, input.metadata.name])
}

# Required env: K8S_NODE_NAME from downward API. The per-node Pod
# informer scope-filters on it; an empty value falls back to a
# cluster-wide watch and blows the RFC-0010 §Egress budget.
deny contains msg if {
input.spec.template.spec
containerstdout_enabled
not containerstdout_has_node_name_env
msg := sprintf("%s/%s enables containerstdout but the tracecore container is missing K8S_NODE_NAME env (downward API fieldRef spec.nodeName); the per-node informer cannot scope-filter", [input.kind, input.metadata.name])
}

has_volume_named(name) if {
some v in object.get(pod_spec, "volumes", [])
v.name == name
}

containerstdout_has_node_name_env if {
some c in pod_spec.containers
some e in object.get(c, "env", [])
e.name == "K8S_NODE_NAME"
e.valueFrom.fieldRef.fieldPath == "spec.nodeName"
}
17 changes: 6 additions & 11 deletions install/kubernetes/tracecore/templates/NOTES.txt
@@ -26,16 +26,11 @@ OTLP/Datadog/ClickHouse exporter (otlphttp / datadog / clickhouse — all
OCB-bundled) before treating the DaemonSet as a steady-state
production deployment.
{{- end }}
{{- if or .Values.receivers.dcgm.enabled .Values.receivers.kernelevents.enabled .Values.receivers.containerstdout.enabled .Values.receivers.clockreceiver.enabled .Values.receivers.pyspy.enabled .Values.exporters.stdoutexporter.enabled }}
{{- if .Values.receivers.pyspy.enabled }}

WARNING (RFC-0013 PR-A2, 2026-05-30): you have enabled an in-tree-only
component that the OCB-assembled binary does NOT register. Enabling any
of {clockreceiver, dcgm, kernelevents, pyspy, containerstdout,
stdoutexporter} will cause the pod to crash at startup because the
factory is unknown. The chart preserves these toggles for migration
tooling; PR-J (RFC-0013 §migration) delivers the upstream recipes
(hostmetrics already shipped, journald + filelog + k8sobjects +
prometheus follow). Until then, use the `config:` free-form override
block to wire OCB-supported components (run `./_build/tracecore
components` to see the live registry).
WARNING: pyspy is an in-tree-only receiver pending an upstream OTel
Profiles GA replacement. The OCB-assembled binary does NOT register
pyspy yet, so enabling it crashes the pod at startup. Use the
`config:` free-form override block to wire OCB-supported components
instead (run `tracecore components` to see the live registry).
{{- end }}
19 changes: 5 additions & 14 deletions install/kubernetes/tracecore/templates/_helpers.tpl
@@ -59,11 +59,10 @@ values.
RFC-0013 PR-A2 (2026-05-30): output shape is the upstream OpenTelemetry
config schema. The OCB-assembled binary registers only upstream/contrib
factories — see `tracecore components` for the live registry. The
chart's per-receiver toggles for in-tree-only components
(clockreceiver, dcgm, kernelevents, pyspy, containerstdout,
stdoutexporter) still render their blocks when enabled so the validate
gate in chart.yml falsifies the regression, but those code paths fail
fast at boot until PR-J delivers the upstream replacements.
chart still ships a `pyspy` toggle (in-tree-only; no upstream
equivalent exists until OTel Profiles reaches GA). Enabling it crashes
the pod at startup, but the toggle survives so the values shape doesn't
break for operators who pinned it.

Intermediate dict shape:
receivers: { <name>: <factory-config> }
@@ -77,19 +76,11 @@ Intermediate dict shape:
{{- define "tracecore.renderedConfig" -}}
{{- $built := dict -}}

{{/* Receivers — include only enabled blocks; strip the `enabled` key.
containerstdout carries chart-only keys (rbac, hostPath) that
control RBAC + DaemonSet volume rendering but are NOT part of the
receiver's runtime config schema; omit them from the rendered
tracecore config so config.Load does not reject the chart output
with an unknown-field error. */}}
{{/* Receivers — include only enabled blocks; strip the `enabled` key. */}}
{{- $recvs := dict -}}
{{- range $name, $cfg := .Values.receivers -}}
{{- if $cfg.enabled -}}
{{- $body := omit $cfg "enabled" -}}
{{- if eq $name "containerstdout" -}}
{{- $body = omit $body "rbac" "hostPath" -}}
{{- end -}}
{{- $_ := set $recvs $name $body -}}
{{- end -}}
{{- end -}}