diff --git a/install/kubernetes/tracecore/README.md b/install/kubernetes/tracecore/README.md
index d1e70620..4b9026d3 100644
--- a/install/kubernetes/tracecore/README.md
+++ b/install/kubernetes/tracecore/README.md
@@ -2,8 +2,8 @@
 Minimal-privilege DaemonSet for the
 [tracecore](https://github.com/tracecoreai/tracecore) OpenTelemetry
 collector. Renders a `restricted`-class Pod Security Standard
-pod spec by default; per-receiver toggles let an operator opt into
-hardware-coupled receivers (DCGM, kernelevents) without changing the
+pod spec by default; per-receiver toggles let an operator wire upstream
+OCB-bundled components (hostmetrics, otlphttp, …) without changing the
 template.
 
 | Chart attribute | Value |
@@ -30,11 +30,12 @@ helm template tracecore install/kubernetes/tracecore \
   | kubectl apply --dry-run=server -f -
 ```
 
-The default values enable the hardware-free `clockreceiver` paired with
-the in-tree `stdoutexporter`; the DaemonSet boots cleanly on a no-GPU
-cluster. To enable the GPU `dcgm` receiver or the host-kernel
-`kernelevents` receiver, see `values.yaml` and the deviations table in
-"Pod Security Standard compliance" below.
+The default values enable the upstream OCB-bundled `hostmetrics`
+receiver (loadscraper) paired with the `debug` exporter; the DaemonSet
+boots cleanly on a no-GPU cluster and writes load-average metrics to
+pod stdout. Swap `debug` for `otlphttp` (also OCB-bundled) before
+treating the DaemonSet as a steady-state production deployment — see
+the worked overlay below.
 
 ## Upgrade
 
@@ -112,9 +113,9 @@ automatically; PersistentVolumeClaims (if any are added via the
 | `containerSecurityContext.capabilities.add` | list | `[]` | SYS_PTRACE is the only allowed addition; conftest rejects any other. |
 | `telemetry.enabled` | bool | `true` | tracecore `/metrics`+`/healthz`+`/readyz` listener. |
 | `telemetry.listen` | string | `0.0.0.0:8888` | Pod-IP listener; kubelet probes hit the pod IP. |
-| `receivers..enabled` | bool | varies | Toggle per receiver. `clockreceiver` on by default. |
-| `exporters..enabled` | bool | varies | Toggle per exporter. `stdoutexporter` on by default. |
-| `pipelines.` | map | `metrics: {receivers:[clockreceiver], exporters:[stdoutexporter]}` | Pipeline wiring. References to disabled components are silently dropped at render time. |
+| `receivers..enabled` | bool | varies | Toggle per receiver. `hostmetrics` on by default. |
+| `exporters..enabled` | bool | varies | Toggle per exporter. `debug` on by default. |
+| `pipelines.` | map | `metrics: {receivers:[hostmetrics], exporters:[debug]}` | Pipeline wiring. References to disabled components are silently dropped at render time. |
 | `config` | map | `{}` | Free-form override deep-merged INTO the rendered tracecore config last. Do NOT place credentials here; ConfigMaps are unencrypted in etcd. |
 | `resources.requests` | map | `{cpu: 10m, memory: 32Mi}` | Conservative defaults; tune for receiver load. |
 | `resources.limits` | map | `{cpu: 100m, memory: 128Mi}` | Conservative defaults; tune for receiver load. |
@@ -135,41 +136,23 @@ A few worked examples for typical adopter overlays. Save each as a file
 and pass with `-f `; `--reuse-values` preserves anything not
 overridden.
 
-**Enable DCGM on every node (requires `nv-hostengine` reachable):**
-
-```yaml
-# dcgm-overlay.yaml
-receivers:
-  clockreceiver:
-    enabled: false
-  dcgm:
-    enabled: true
-    endpoint: localhost:5555
-pipelines:
-  metrics:
-    receivers: [dcgm]
-    exporters: [stdoutexporter]
-```
-
-Apply: `helm upgrade tracecore install/kubernetes/tracecore -n tracecore-system -f dcgm-overlay.yaml`
-
 **Route output to an OTLP backend (structured `exporters.otlphttp` toggle):**
 
 ```yaml
 # otlp-overlay.yaml
 exporters:
-  stdoutexporter:
+  debug:
     enabled: false
   otlphttp:
     enabled: true
     endpoint: https://collector.example.com:4318
 pipelines:
   metrics:
-    receivers: [clockreceiver]
+    receivers: [hostmetrics]
     exporters: [otlphttp]
 ```
 
-The full otlphttp field reference (headers, compression, timeout, max_retries, insecure_skip_verify, ...) lives at [`components/exporters/otlphttp/README.md`](../../../components/exporters/otlphttp/README.md). For fields the structured block doesn't expose, use the free-form `config.exporters.otlphttp.*` deep-merge block.
+The full otlphttp field reference (headers, compression, timeout, retry_on_failure, sending_queue, tls.*, ...) follows the upstream [`otlphttpexporter`](https://github.com/open-telemetry/opentelemetry-collector/tree/main/exporter/otlphttpexporter) README. For fields the structured block doesn't expose, use the free-form `config.exporters.otlphttp.*` deep-merge block.
 
 **Run on every node including tainted ones (control plane, GPU pools):**
 
@@ -212,12 +195,12 @@ template in a way that violates the minimum-privilege charter. Re-read
 and the fixture set under `policies/conftest/testdata/` before patching
 the template.
 
-**`OOMKilled` after enabling `receivers.kernelevents`.** The kernel
-event source can buffer large batches under journald load; the chart's
-default `resources.limits.memory: 128Mi` is sized for the default
-hardware-free configuration. Bump to `256Mi` or higher
-(`--set resources.limits.memory=256Mi`) and monitor RSS with
-`kubectl top pod`.
+**`OOMKilled` after wiring a high-volume upstream receiver.** The
+chart's default `resources.limits.memory: 128Mi` is sized for the
+hardware-free hostmetrics + debug pairing. Receivers that buffer large
+batches (filelog, journald, kafkareceiver) push RSS well above that.
+Bump to `256Mi` or higher (`--set resources.limits.memory=256Mi`) and
+monitor RSS with `kubectl top pod`.
 
 **Rollout takes hours on fleets above ~500 nodes.** Default
 `updateStrategy.rollingUpdate.maxUnavailable: 1` × per-node readiness
@@ -290,19 +273,12 @@ chart's deviations from a literal reading of `restricted`:
    capability is in the conftest allowlist; any other addition rejects
    the build.
 
-2. **Host-path mounts are required for some receivers.** Enabling
-   `receivers.kernelevents` requires `hostPath` mounts (`/dev/kmsg`
-   read-only, optionally `/var/log/journal` and
-   `/run/systemd/journal` for the journald source). The chart does
-   not render those mounts by default; operators opt in via the
-   `config:` override and accept the deviation.
-
-3. **DCGM standalone mode connects to `nv-hostengine`.** When
-   `receivers.dcgm.enabled=true` and `mode=standalone`, the
-   DaemonSet connects to an external DCGM endpoint specified in
-   values. The chart does not run `nv-hostengine` in-process and does
-   not add capabilities for it. Embedded mode is out of scope for the
-   default chart.
+2. **Host-path mounts are required for some upstream receivers.**
+   Wiring `journaldreceiver` or `filelogreceiver` (via the `config:`
+   override) typically requires `hostPath` mounts (`/var/log/journal`,
+   `/run/systemd/journal`, `/var/log/pods`, …). The chart does not
+   render those mounts by default; operators opt in via a custom
+   DaemonSet patch and accept the deviation.
 
 Each deviation is bounded by the conftest policy: the policy only
 permits SYS_PTRACE, never relaxes hostPID/hostIPC/hostNetwork, and
diff --git a/install/kubernetes/tracecore/ci/all-receivers-off-values.yaml b/install/kubernetes/tracecore/ci/all-receivers-off-values.yaml
index 50b2a368..7680e54f 100644
--- a/install/kubernetes/tracecore/ci/all-receivers-off-values.yaml
+++ b/install/kubernetes/tracecore/ci/all-receivers-off-values.yaml
@@ -1,25 +1,16 @@
 # Used by the chart-render CI gate. Disables every receiver and
 # exporter; rendered config has no pipelines and only the
 # service.telemetry self-metrics + health_check extension.
-# The chart.yml workflow skips `tracecore validate` on this fixture
-# (RFC-0013 PR-A2 disabled the gate while the chart still emits the
-# legacy `telemetry:` top-level key the OCB binary does not recognise;
-# PR-K reinstates the gate after the chart shape migrates to upstream
-# `service.telemetry`).
+# The chart.yml workflow does NOT run `tracecore validate` on this
+# fixture: upstream OTel requires at least one receiver configuration,
+# so an all-off render cannot pass validate by design. The fixture's
+# purpose is the helm render + conftest gates.
 receivers:
   hostmetrics:
     enabled: false
-  clockreceiver:
-    enabled: false
-  dcgm:
-    enabled: false
-  kernelevents:
-    enabled: false
 
 exporters:
   debug:
     enabled: false
-  stdoutexporter:
-    enabled: false
 
 pipelines: {}
diff --git a/install/kubernetes/tracecore/ci/one-receiver-on-values.yaml b/install/kubernetes/tracecore/ci/one-receiver-on-values.yaml
index 9e87f162..d6c49f51 100644
--- a/install/kubernetes/tracecore/ci/one-receiver-on-values.yaml
+++ b/install/kubernetes/tracecore/ci/one-receiver-on-values.yaml
@@ -7,18 +7,10 @@ receivers:
     collection_interval: 1s
     scrapers:
       load: {}
-  clockreceiver:
-    enabled: false
-  dcgm:
-    enabled: false
-  kernelevents:
-    enabled: false
 
 exporters:
   debug:
     enabled: true
-  stdoutexporter:
-    enabled: false
 
 pipelines:
   metrics:
diff --git a/install/kubernetes/tracecore/ci/pyspy-on-values.yaml b/install/kubernetes/tracecore/ci/pyspy-on-values.yaml
index 5728a9fe..19ab20d0 100644
--- a/install/kubernetes/tracecore/ci/pyspy-on-values.yaml
+++ b/install/kubernetes/tracecore/ci/pyspy-on-values.yaml
@@ -6,23 +6,17 @@
 # addition because the helper does the frame walking inside the
 # target Python process.
 #
-# Pyspy is an in-tree-only receiver pending PR-K deletion; this
-# fixture exists ONLY to exercise the conftest + caps assertions on
-# the DaemonSet. The rendered configmap is NOT fed to `tracecore
-# validate` (the OCB binary does not register pyspy). The chart-
-# render workflow's validate gate uses one-receiver-on-values.yaml.
+# Pyspy is an in-tree-only receiver pending an upstream OTel Profiles
+# replacement; this fixture exists ONLY to exercise the conftest + caps
+# assertions on the DaemonSet. The rendered configmap is NOT fed to
+# `tracecore validate` (the OCB binary does not register pyspy). The
+# chart-render workflow's validate gate uses one-receiver-on-values.yaml.
 receivers:
   hostmetrics:
     enabled: true
     collection_interval: 1s
     scrapers:
       load: {}
-  clockreceiver:
-    enabled: false
-  dcgm:
-    enabled: false
-  kernelevents:
-    enabled: false
   pyspy:
     enabled: true
     target:
@@ -34,8 +28,6 @@ receivers:
 exporters:
   debug:
     enabled: true
-  stdoutexporter:
-    enabled: false
 
 pipelines:
   logs:
diff --git a/install/kubernetes/tracecore/policies/conftest/tracecore.rego b/install/kubernetes/tracecore/policies/conftest/tracecore.rego
index 0696f114..ce596db1 100644
--- a/install/kubernetes/tracecore/policies/conftest/tracecore.rego
+++ b/install/kubernetes/tracecore/policies/conftest/tracecore.rego
@@ -44,19 +44,6 @@ deny contains msg if {
 	msg := sprintf("%s/%s sets hostUsers=true; user-namespace sharing with the host is forbidden", [input.kind, input.metadata.name])
 }
 
-# containerstdout-allowlist — the M15 containerstdout receiver is the
-# only sanctioned path to runAsUser=0 in the chart because the kubelet
-# CRI symlink tree under /var/log/pods is root-owned on every distro
-# tracecore supports. The exemption is gated on the *presence of the
-# containerstdout-pod-logs hostPath volume*, NOT on a values flag, so
-# a custom DaemonSet patch that drops the volume cannot smuggle root
-# through this rule. The receiver's RUNBOOK enumerates the privilege
-# tradeoff; values.yaml comments document operator opt-in.
-containerstdout_enabled if {
-	some v in object.get(pod_spec, "volumes", [])
-	v.name == "containerstdout-pod-logs"
-}
-
 # Pod-level runAsNonRoot — restricted PSS requires this OR every container's
 # securityContext.runAsNonRoot=true. The chart sets it at pod level; the
 # policy enforces that contract so a values-override can't downgrade.
@@ -64,12 +51,9 @@ containerstdout_enabled if {
 # Guarded with input.spec.template.spec so the rule only fires on
 # pod-bearing objects (Deployment, DaemonSet, StatefulSet, Job, …);
 # ConfigMap / ServiceAccount documents are exempt.
-#
-# Exempts containerstdout-enabled DaemonSets — they MUST run as root.
 deny contains msg if {
 	input.spec.template.spec
 	not pod_spec.securityContext.runAsNonRoot == true
-	not containerstdout_enabled
 	msg := sprintf("%s/%s must set pod securityContext.runAsNonRoot=true", [input.kind, input.metadata.name])
 }
 
@@ -99,20 +83,15 @@ all_containers_have_seccomp if {
 # UID 0 / GID 0 forbidden — restricted PSS requires non-root. The chart
 # default sets runAsUser/Group to 65532; this rule rejects a values
 # override that downgrades to root.
-#
-# Exempts containerstdout-enabled DaemonSets — they MUST run as root
-# to read kubelet-owned CRI symlinks under /var/log/pods.
 deny contains msg if {
 	input.spec.template.spec
 	pod_spec.securityContext.runAsUser == 0
-	not containerstdout_enabled
 	msg := sprintf("%s/%s sets runAsUser=0; root execution is forbidden", [input.kind, input.metadata.name])
 }
 
 deny contains msg if {
 	input.spec.template.spec
 	pod_spec.securityContext.runAsGroup == 0
-	not containerstdout_enabled
 	msg := sprintf("%s/%s sets runAsGroup=0; root group is forbidden", [input.kind, input.metadata.name])
 }
 
@@ -154,62 +133,3 @@ deny contains msg if {
 	msg := sprintf("container %q adds capability %q; only SYS_PTRACE is allowed", [c.name, cap])
 }
 
-# ─── containerstdout (M15) operational invariants ──────────────────
-#
-# The containerstdout receiver opts into root execution + a per-node
-# Pod informer via the chart's values knob. These rules pin the
-# operational shape so a chart edit that omits the volumes, RBAC, or
-# downward-API env can't ship — the receiver would crash-loop in any
-# of those states, but the failure would land at runtime in
-# `kubectl logs` rather than at `helm install --dry-run`.
-
-# Required hostPath: /var/log/pods. The tailer cannot resolve any
-# CRI symlink without it; absence is a hard fail.
-deny contains msg if {
-	input.spec.template.spec
-	containerstdout_enabled
-	not has_volume_named("containerstdout-pod-logs")
-	msg := sprintf("%s/%s enables containerstdout but missing hostPath volume 'containerstdout-pod-logs' (/var/log/pods); the tailer cannot read CRI symlinks without it", [input.kind, input.metadata.name])
-}
-
-# containerstdout-pod-logs hostPath must point at /var/log/pods. Any
-# other path defeats the CRI symlink resolution contract and risks
-# tailing an attacker-controlled directory.
-deny contains msg if {
-	some vol in object.get(pod_spec, "volumes", [])
-	vol.name == "containerstdout-pod-logs"
-	vol.hostPath.path != "/var/log/pods"
-	msg := sprintf("containerstdout-pod-logs volume must mount /var/log/pods, got %q", [vol.hostPath.path])
-}
-
-# Required hostPath: cursor directory. Without it cursor writes go to
-# the read-only rootfs and every checkpoint increments
-# KindCursorWriteFailed — verified by TestFailure_CursorWriteFailedReadOnlyFs.
-deny contains msg if {
-	input.spec.template.spec
-	containerstdout_enabled
-	not has_volume_named("containerstdout-cursor")
-	msg := sprintf("%s/%s enables containerstdout but missing hostPath volume 'containerstdout-cursor' (cursor persistence dir); restarts will double-emit lines", [input.kind, input.metadata.name])
-}
-
-# Required env: K8S_NODE_NAME from downward API. The per-node Pod
-# informer scope-filters on it; an empty value falls back to a
-# cluster-wide watch and blows the RFC-0010 §Egress budget.
-deny contains msg if {
-	input.spec.template.spec
-	containerstdout_enabled
-	not containerstdout_has_node_name_env
-	msg := sprintf("%s/%s enables containerstdout but the tracecore container is missing K8S_NODE_NAME env (downward API fieldRef spec.nodeName); the per-node informer cannot scope-filter", [input.kind, input.metadata.name])
-}
-
-has_volume_named(name) if {
-	some v in object.get(pod_spec, "volumes", [])
-	v.name == name
-}
-
-containerstdout_has_node_name_env if {
-	some c in pod_spec.containers
-	some e in object.get(c, "env", [])
-	e.name == "K8S_NODE_NAME"
-	e.valueFrom.fieldRef.fieldPath == "spec.nodeName"
-}
diff --git a/install/kubernetes/tracecore/templates/NOTES.txt b/install/kubernetes/tracecore/templates/NOTES.txt
index 0d1b578f..fabe7f11 100644
--- a/install/kubernetes/tracecore/templates/NOTES.txt
+++ b/install/kubernetes/tracecore/templates/NOTES.txt
@@ -26,16 +26,11 @@ OTLP/Datadog/ClickHouse exporter (otlphttp / datadog / clickhouse — all
 OCB-bundled) before treating the DaemonSet as a steady-state production
 deployment.
 {{- end }}
-{{- if or .Values.receivers.dcgm.enabled .Values.receivers.kernelevents.enabled .Values.receivers.containerstdout.enabled .Values.receivers.clockreceiver.enabled .Values.receivers.pyspy.enabled .Values.exporters.stdoutexporter.enabled }}
+{{- if .Values.receivers.pyspy.enabled }}
 
-WARNING (RFC-0013 PR-A2, 2026-05-30): you have enabled an in-tree-only
-component that the OCB-assembled binary does NOT register. Enabling any
-of {clockreceiver, dcgm, kernelevents, pyspy, containerstdout,
-stdoutexporter} will cause the pod to crash at startup because the
-factory is unknown. The chart preserves these toggles for migration
-tooling; PR-J (RFC-0013 §migration) delivers the upstream recipes
-(hostmetrics already shipped, journald + filelog + k8sobjects +
-prometheus follow). Until then, use the `config:` free-form override
-block to wire OCB-supported components (run `./_build/tracecore
-components` to see the live registry).
+WARNING: pyspy is an in-tree-only receiver pending an upstream OTel
+Profiles GA replacement. The OCB-assembled binary does NOT register
+pyspy yet, so enabling it crashes the pod at startup. Use the
+`config:` free-form override block to wire OCB-supported components
+instead (run `tracecore components` to see the live registry).
 {{- end }}
diff --git a/install/kubernetes/tracecore/templates/_helpers.tpl b/install/kubernetes/tracecore/templates/_helpers.tpl
index 8079da8f..81caf116 100644
--- a/install/kubernetes/tracecore/templates/_helpers.tpl
+++ b/install/kubernetes/tracecore/templates/_helpers.tpl
@@ -59,11 +59,10 @@ values.
 RFC-0013 PR-A2 (2026-05-30): output shape is the upstream OpenTelemetry
 config schema. The OCB-assembled binary registers only upstream/contrib
 factories — see `tracecore components` for the live registry. The
-chart's per-receiver toggles for in-tree-only components
-(clockreceiver, dcgm, kernelevents, pyspy, containerstdout,
-stdoutexporter) still render their blocks when enabled so the validate
-gate in chart.yml falsifies the regression, but those code paths fail
-fast at boot until PR-J delivers the upstream replacements.
+chart still ships a `pyspy` toggle (in-tree-only; no upstream
+equivalent pending OTel Profiles GA) — enabling it crashes the pod at
+startup, but the toggle survives so the values shape doesn't break
+operators that pinned it.
 
 Intermediate dict shape:
   receivers: { : }
@@ -77,19 +76,11 @@ Intermediate dict shape:
 {{- define "tracecore.renderedConfig" -}}
 {{- $built := dict -}}
 
-{{/* Receivers — include only enabled blocks; strip the `enabled` key.
-     containerstdout carries chart-only keys (rbac, hostPath) that
-     control RBAC + DaemonSet volume rendering but are NOT part of the
-     receiver's runtime config schema; omit them from the rendered
-     tracecore config so config.Load does not reject the chart output
-     with an unknown-field error. */}}
+{{/* Receivers — include only enabled blocks; strip the `enabled` key. */}}
 {{- $recvs := dict -}}
 {{- range $name, $cfg := .Values.receivers -}}
   {{- if $cfg.enabled -}}
     {{- $body := omit $cfg "enabled" -}}
-    {{- if eq $name "containerstdout" -}}
-      {{- $body = omit $body "rbac" "hostPath" -}}
-    {{- end -}}
     {{- $_ := set $recvs $name $body -}}
   {{- end -}}
 {{- end -}}
diff --git a/install/kubernetes/tracecore/templates/containerstdout-rbac.yaml b/install/kubernetes/tracecore/templates/containerstdout-rbac.yaml
deleted file mode 100644
index 4f342676..00000000
--- a/install/kubernetes/tracecore/templates/containerstdout-rbac.yaml
+++ /dev/null
@@ -1,59 +0,0 @@
-{{- /*
-  RBAC for the containerstdout receiver (M15, alpha).
-
-  Scope: get,list,watch on core/v1.Pods cluster-wide AND get on
-  core/v1.Nodes. The Pod informer reads container env on the local
-  node (kubelet labels filter to NODE_NAME at the informer layer);
-  the Node read is the node-name discovery used by the cursor path
-  scheme. No write verbs, no Secret / ConfigMap reads — RFC-0010
-  §Egress model pins this list.
-
-  Render gated by `.Values.receivers.containerstdout.enabled AND
-  .Values.receivers.containerstdout.rbac.create` so operators with bring-your-
-  own RBAC (or strict cluster-admin policy) can ship the receiver
-  without the chart-managed Role.
-
-  Mirrors components/receivers/k8sevents/rbac.yaml in shape; the
-  binding subject targets the same ServiceAccount the chart's
-  DaemonSet uses (serviceAccountName helper), so no second SA is
-  rendered for this receiver — one identity, one audit trail.
-*/}}
-{{- if and .Values.receivers.containerstdout.enabled .Values.receivers.containerstdout.rbac.create }}
----
-apiVersion: rbac.authorization.k8s.io/v1
-kind: ClusterRole
-metadata:
-  name: {{ include "tracecore.fullname" . }}-containerstdout
-  labels: {{- include "tracecore.labels" . | nindent 4 }}
-rules:
-  # core/v1.Pods — the per-node informer watches Pods on NODE_NAME so
-  # the receiver can map (pod_uid, container) → namespace/name/
-  # container-env at attribution time. Cluster-scoped get,list,watch
-  # because Pod metadata is the join surface RFC-0010 §Attribution
-  # mandates.
-  - apiGroups: [""]
-    resources: ["pods"]
-    verbs: ["get", "list", "watch"]
-  # core/v1.Nodes get — node-name discovery + a single-shot read of
-  # the Node's labels (cursor path scheme + future kubelet log-group
-  # alignment). No list/watch — the receiver knows its NODE_NAME from
-  # the downward API and reads only that node's record.
-  - apiGroups: [""]
-    resources: ["nodes"]
-    # TODO(RFC-0013): scope to per-node Node via aggregator pattern when refactoring RBAC for OCB swap
-    verbs: ["get"]
----
-apiVersion: rbac.authorization.k8s.io/v1
-kind: ClusterRoleBinding
-metadata:
-  name: {{ include "tracecore.fullname" . }}-containerstdout
-  labels: {{- include "tracecore.labels" . | nindent 4 }}
-subjects:
-  - kind: ServiceAccount
-    name: {{ include "tracecore.serviceAccountName" . }}
-    namespace: {{ .Values.namespace }}
-roleRef:
-  apiGroup: rbac.authorization.k8s.io
-  kind: ClusterRole
-  name: {{ include "tracecore.fullname" . }}-containerstdout
-{{- end }}
diff --git a/install/kubernetes/tracecore/templates/daemonset.yaml b/install/kubernetes/tracecore/templates/daemonset.yaml
index c9e95f2a..e3918e82 100644
--- a/install/kubernetes/tracecore/templates/daemonset.yaml
+++ b/install/kubernetes/tracecore/templates/daemonset.yaml
@@ -21,15 +21,7 @@ spec:
       {{- end }}
     spec:
       serviceAccountName: {{ include "tracecore.serviceAccountName" . }}
-      # automountServiceAccountToken flips true when containerstdout
-      # is enabled — its per-node Pod informer is an in-cluster
-      # apiserver client and needs the projected SA token mount.
-      # Otherwise honour the explicit values.yaml knob (defaults
-      # false because the alpha receiver set is API-server-free).
-      {{- if and .Values.receivers.containerstdout.enabled (not .Values.receivers.containerstdout.rbac.create) }}
-      {{- fail "containerstdout.enabled=true requires containerstdout.rbac.create=true; otherwise the ServiceAccount token is mounted without backing ClusterRole" }}
-      {{- end }}
-      automountServiceAccountToken: {{ or .Values.serviceAccount.automount .Values.receivers.containerstdout.enabled }}
+      automountServiceAccountToken: {{ .Values.serviceAccount.automount }}
       {{- with .Values.imagePullSecrets }}
       imagePullSecrets: {{- toYaml . | nindent 8 }}
       {{- end }}
@@ -39,24 +31,7 @@ spec:
       {{- with .Values.priorityClassName }}
       priorityClassName: {{ . | quote }}
       {{- end }}
-      {{- if .Values.receivers.containerstdout.enabled }}
-      # containerstdout requires root to read /var/log/pods symlinks
-      # owned by root on every distro tracecore supports. The
-      # podSecurityContext override is conditional so the chart
-      # default (UID 65532, non-root) still applies to all other
-      # receivers — operators MUST treat enabling containerstdout as
-      # an explicit privilege escalation decision (the conftest
-      # policy's containerstdout-allowlist rule documents this).
-      securityContext:
-        runAsNonRoot: false
-        runAsUser: 0
-        runAsGroup: 0
-        fsGroup: 0
-        seccompProfile:
-          type: RuntimeDefault
-      {{- else }}
       securityContext: {{- toYaml .Values.podSecurityContext | nindent 8 }}
-      {{- end }}
       {{- with .Values.nodeSelector }}
       nodeSelector: {{- toYaml . | nindent 8 }}
       {{- end }}
@@ -77,23 +52,6 @@ spec:
           args:
             - --config=/etc/tracecore/config.yaml
           securityContext: {{- toYaml .Values.containerSecurityContext | nindent 12 }}
-          # Downward-API env vars for receivers that stamp pod/node
-          # context onto emitted attributes.
-          {{- if or .Values.receivers.containerstdout.enabled .Values.receivers.kernelevents.enabled }}
-          env:
-            - name: K8S_POD_NAME
-              valueFrom:
-                fieldRef:
-                  fieldPath: metadata.name
-            - name: K8S_POD_NAMESPACE
-              valueFrom:
-                fieldRef:
-                  fieldPath: metadata.namespace
-            - name: K8S_NODE_NAME
-              valueFrom:
-                fieldRef:
-                  fieldPath: spec.nodeName
-          {{- end }}
           {{- if .Values.telemetry.enabled }}
           {{/* RFC-0013 PR-A2: two listener ports.
                - `telemetry` (default :8888) = collector self-metrics
@@ -134,40 +92,9 @@ spec:
               readOnly: true
             - name: tmp
               mountPath: /tmp
-            {{- if .Values.receivers.containerstdout.enabled }}
-            # containerstdout mounts. /var/log/pods is the kubelet's
-            # symlink tree; /var/log/containers carries the
-            # containerd target on some distros (resolved per the
-            # RFC-0010 §Path attribution table). Both read-only.
-            # cursorDir is the persisted cursor.json location —
-            # read-write hostPath so a pod restart doesn't double-
-            # emit lines from the same inode.
-            - name: containerstdout-pod-logs
-              mountPath: {{ .Values.receivers.containerstdout.hostPath.podLogs }}
-              readOnly: true
-            - name: containerstdout-container-logs
-              mountPath: {{ .Values.receivers.containerstdout.hostPath.containers }}
-              readOnly: true
-            - name: containerstdout-cursor
-              mountPath: {{ .Values.receivers.containerstdout.hostPath.cursorDir }}
-            {{- end }}
       volumes:
         - name: config
           configMap:
             name: {{ include "tracecore.fullname" . }}-config
         - name: tmp
           emptyDir: {}
-        {{- if .Values.receivers.containerstdout.enabled }}
-        - name: containerstdout-pod-logs
-          hostPath:
-            path: {{ .Values.receivers.containerstdout.hostPath.podLogs }}
-            type: Directory
-        - name: containerstdout-container-logs
-          hostPath:
-            path: {{ .Values.receivers.containerstdout.hostPath.containers }}
-            type: DirectoryOrCreate
-        - name: containerstdout-cursor
-          hostPath:
-            path: {{ .Values.receivers.containerstdout.hostPath.cursorDir }}
-            type: DirectoryOrCreate
-        {{- end }}
diff --git a/install/kubernetes/tracecore/values.yaml b/install/kubernetes/tracecore/values.yaml
index 5be696d3..112abf5e 100644
--- a/install/kubernetes/tracecore/values.yaml
+++ b/install/kubernetes/tracecore/values.yaml
@@ -103,33 +103,10 @@ receivers:
     collection_interval: 1s
     scrapers:
       load: {}
-  # clockreceiver — in-tree heartbeat retired by RFC-0013 PR-A2; the
-  # OCB binary does not register it. The toggle survives for one
-  # release so the values shape doesn't break for operators that pin
-  # it explicitly; enabling it now fails `helm install --dry-run`
-  # via the validate gate in .github/workflows/chart.yml. Removed
-  # entirely in PR-K.
-  clockreceiver:
-    enabled: false
-    interval: 1s
-  # dcgm — in-tree NVIDIA GPU receiver. NOT registered by OCB; will
-  # be replaced by dcgm-exporter DaemonSet + prometheusreceiver in
-  # PR-J. Enabling now fails validate. Toggle preserved for migration
-  # tooling; removed in PR-K.
-  dcgm:
-    enabled: false
-    mode: standalone
-    endpoint: localhost:5555
-    collection_interval: 15s
-  # kernelevents — in-tree. NOT registered by OCB; replaced by
-  # journaldreceiver + filelogreceiver + OTTL Xid transform in PR-J.
-  # Enabling now fails validate. Removed in PR-K.
-  kernelevents:
-    enabled: false
-    min_severity: info
-  # pyspy — in-tree. NOT registered by OCB; no upstream equivalent yet
-  # (deferred until OTel Profiles GA). Enabling now fails validate.
-  # Toggle survives until the deferral lands a replacement.
+  # pyspy — in-tree-only receiver pending upstream replacement (OTel
+  # Profiles GA). The OCB binary does NOT register pyspy yet; enabling
+  # it crashes the pod at startup. Toggle survives until the deferral
+  # lands an upstream recipe.
   pyspy:
     enabled: false
     target:
@@ -137,47 +114,14 @@ receivers:
     cadence:
       full_dump_interval: 60s
       main_dump_interval: 15s
-  # containerstdout — in-tree. NOT registered by OCB; replaced by
-  # filelogreceiver + container stanza + file_storage extension in
-  # PR-J. Enabling now fails validate.
-  #
-  # NOTE: when re-enabled via the PR-J recipe, the DaemonSet's pod
-  # securityContext still flips to root (UID 0) because /var/log/pods
-  # symlinks land under root-owned directories on every distro
-  # tracecore supports. The chart's conftest policy (deny
-  # runAsUser==0) is bypassed for this opt-in path by a dedicated
-  # allowlist rule in tracecore.rego. The chart ALSO adds a
-  # ClusterRole granting core/v1.pods get,list,watch + core/v1.nodes
-  # get so the per-node informer can resolve container env vars.
-  containerstdout:
-    enabled: false
-    include:
-      - /var/log/pods/*/*/*.log
-    namespaces: []
-    max_log_size: 1048576
-    rank_source: informer
-    egress_rate_limit:
-      rate: 200
-      burst: 1000
-    rbac:
-      create: true
-    hostPath:
-      podLogs: /var/log/pods
-      containers: /var/log/containers
-      cursorDir: /var/lib/tracecore/container_stdout
 
 exporters:
-  # debug — upstream OCB-bundled. Replaces the in-tree `stdoutexporter`
-  # as the chart default. Writes pipeline data to pod stdout (visible
-  # via `kubectl logs`); swap for otlphttp / datadog / clickhouse via
-  # this block before treating the DaemonSet as a steady-state
-  # production deployment.
+  # debug — upstream OCB-bundled. Writes pipeline data to pod stdout
+  # (visible via `kubectl logs`); swap for otlphttp / datadog /
+  # clickhouse via this block before treating the DaemonSet as a
+  # steady-state production deployment.
   debug:
     enabled: true
-  # stdoutexporter — in-tree. NOT registered by OCB; replaced by
-  # `debug` above. Enabling now fails validate. Removed in PR-K.
-  stdoutexporter:
-    enabled: false
   # otlphttp — upstream OCB-bundled otlphttpexporter. Schema follows
   # the upstream README. Required field when enabled: `endpoint`.
   # Common optional fields (compression, headers, tls.*, timeout,
@@ -191,10 +135,10 @@ exporters:
     endpoint: ""
 
 # service.pipelines wiring. The defaults pair hostmetrics to debug.
-# Override to wire dcgm + kernelevents + custom exporters (when PR-J
-# delivers their upstream recipes); entries whose components are not
-# enabled above are silently dropped at render time so a partial
-# override still validates.
+# Override to wire other receivers/exporters from the OCB-bundled
+# registry (run `tracecore components` to see the live list); entries
+# whose components are not enabled above are silently dropped at
+# render time so a partial override still validates.
 pipelines:
   metrics:
     receivers: [hostmetrics]
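The render rules this change leans on, as described in the `_helpers.tpl` comment and the `values.yaml` pipeline note (only enabled blocks are kept, with the `enabled` key stripped; pipeline entries naming a component that is not enabled are silently dropped), can be modelled outside Helm to reason about overlays. The Python below is a hypothetical sketch, not the chart's template code: `render_config`, its input shape, and the `example` values are illustrative assumptions, and it deliberately omits everything else the real helper emits (for example the `service.telemetry` self-metrics and the `health_check` extension mentioned in the all-off fixture).

```python
from copy import deepcopy
from typing import Any

# Hypothetical model of the chart's render rules; NOT the real
# `tracecore.renderedConfig` template.


def render_config(values: dict[str, Any]) -> dict[str, Any]:
    rendered: dict[str, Any] = {}
    for kind in ("receivers", "exporters"):
        kept: dict[str, Any] = {}
        for name, cfg in values.get(kind, {}).items():
            # Mirrors `{{- if $cfg.enabled -}}`: skip disabled blocks.
            if cfg.get("enabled"):
                # Mirrors `omit $cfg "enabled"`: copy every other key.
                kept[name] = {k: deepcopy(v) for k, v in cfg.items() if k != "enabled"}
        rendered[kind] = kept

    pipelines: dict[str, Any] = {}
    for signal, wiring in values.get("pipelines", {}).items():
        # Silently drop references to components that were not enabled.
        pipelines[signal] = {
            kind: [n for n in wiring.get(kind, []) if n in rendered[kind]]
            for kind in ("receivers", "exporters")
        }
    rendered["service"] = {"pipelines": pipelines}
    return rendered


# Illustrative overlay: hostmetrics + debug enabled; the pipeline also
# names the disabled pyspy receiver and the disabled otlphttp exporter.
example = {
    "receivers": {
        "hostmetrics": {"enabled": True, "collection_interval": "1s",
                        "scrapers": {"load": {}}},
        "pyspy": {"enabled": False},
    },
    "exporters": {
        "debug": {"enabled": True},
        "otlphttp": {"enabled": False, "endpoint": ""},
    },
    "pipelines": {
        "metrics": {"receivers": ["hostmetrics", "pyspy"],
                    "exporters": ["debug", "otlphttp"]},
    },
}

print(render_config(example)["service"])
# {'pipelines': {'metrics': {'receivers': ['hostmetrics'], 'exporters': ['debug']}}}
```

Because references are filtered rather than rejected, a misspelled pipeline entry also vanishes silently in this model; whether the real helper treats unknown names the same way is not stated in the diff.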