Merged
2 changes: 2 additions & 0 deletions cmd/tracecore/components.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 2 additions & 0 deletions components.yaml
@@ -16,6 +16,8 @@ receivers:
package: github.com/tracecoreai/tracecore/components/receivers/dcgm
- type: kernelevents
package: github.com/tracecoreai/tracecore/components/receivers/kernelevents
- type: k8s_events
package: github.com/tracecoreai/tracecore/components/receivers/k8sevents
- type: nccl_fr
package: github.com/tracecoreai/tracecore/components/receivers/nccl_fr

207 changes: 207 additions & 0 deletions components/receivers/k8sevents/README.md
@@ -0,0 +1,207 @@
# k8sevents

**Stability:** alpha. Public config keys MAY change with a
one-minor-cycle deprecation warning. Schema URL pinned at
`https://tracecore.ai/schemas/k8sevents/v0`; downstream pattern
detectors version-gate on this string. See the
[Schema versioning policy](#schema-versioning-policy) section.

Watches the `events.k8s.io/v1` Events stream via a client-go
`SharedInformer` with resync ≥10 min, and emits one `plog.LogRecord`
per Event with the typed-attribute schema pinned by `Record` and the
`Attr*` constants. Ships a typed `Record` struct so pattern detectors
can join on a compile-time-stable shape instead of grepping
attributes.

## Overview

| Aspect | Detail |
|---|---|
| Upstream API | `events.k8s.io/v1/events` |
| Watch primitive | client-go `SharedInformer` (one per process) |
| Resync floor | 10 minutes (API-courtesy) |
| Client-side limits | `QPS=5`, `Burst=10` pinned in code |
| Auth | `kubeconfig:` field → `KUBECONFIG` env → in-cluster (see [Auth resolution](#auth-resolution)) |
| Deployment shape | cluster-singleton `Deployment` `replicas: 1` (NOT DaemonSet) |
| Egress model | `events.k8s.io` only; no Pod / Secret / ConfigMap reads |

## Configuration reference

| Key | Type | Default | Notes |
|---|---|---|---|
| `kubeconfig` | string | "" | Absolute path to a kubeconfig file. Takes precedence over the `KUBECONFIG` env var. Rejected with exit 2 (`ErrAmbiguousAuth`) when in-cluster service-account credentials are also present; see [Auth resolution](#auth-resolution). |
| `namespaces` | []string | [] | Optional. Length=1 → server-side scope; ≥2 → cluster-wide watch + in-process filter (documented egress cost). |
| `resync_interval` | duration | `10m` | Informer full-resync cadence. Floor 10 minutes (API-courtesy). |
| `note_max_bytes` | int | `0` (off) | Truncate `Event.Note` bytes; 64–4096. Operator-controlled defence-in-depth against unbounded message bodies (PII, exec args). |
| `min_event_type` | enum | `""` | `""` / `"Normal"` / `"Warning"`. `Warning` drops Normal events at the source. |
| `reason_regex` | RE2 string | "" | Compiled at Validate; bad regex → exit 2 with named-field error. |
| `include_namespaces` | []string | [] | In-process namespace allowlist. |
| `exclude_namespaces` | []string | [] | In-process namespace denylist (applied after include). |
| `max_attributes` | int | `16` | Cardinality cap. Floor 9 keeps the 7 join keys (`event.uid`, `event.reason`, `k8s.event.hint`, `regarding.{kind,namespace,name,uid}`) plus `event.time` and `series.count` intact. |
| `channel_cap` | int | `1024` | Bounded internal channel. Floor 64. |

`qps` / `burst` are surfaced for HW-validation overrides only. The
API-courtesy contract pins them in code at `5` / `10`; operator overrides are
discouraged.

## Emitted attribute schema

Every emitted `plog.LogRecord` carries the canonical typed attributes
plus the tracecore-canonical hint:

| Key | Source |
|---|---|
| `event.uid` | `metadata.uid` |
| `event.reason` | `Event.Reason` |
| `event.action` | `Event.Action` |
| `event.type` | `Event.Type` (`Normal` / `Warning`) |
| `k8s.event.hint` | derived from `Reason` via the Hint taxonomy below |
| `regarding.kind` | `Event.Regarding.Kind` |
| `regarding.namespace` | `Event.Regarding.Namespace` |
| `regarding.name` | `Event.Regarding.Name` |
| `regarding.uid` | `Event.Regarding.UID` |
| `reporting.controller` | `Event.ReportingController` |
| `note` | `Event.Note` (also `Body`) |
| `series.count` | `Event.Series.Count` |
| `event.time` | RFC3339Nano from `Event.EventTime` |
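
To illustrate how the `max_attributes` cap can coexist with this schema, here is a rough sketch of a cap that always keeps the nine protected keys and drops the rest once the budget is spent. The `attr` type, `capAttrs`, and the keep-protected-first ordering are assumptions for illustration; the receiver's `buildLogRecord` may order and count attributes differently.

```go
package main

import "fmt"

// attr is a hypothetical key/value pair standing in for a log-record attribute.
type attr struct{ Key, Value string }

// schemaKeys lists the 13 documented attribute keys in table order.
var schemaKeys = []string{
	"event.uid", "event.reason", "event.action", "event.type",
	"k8s.event.hint", "regarding.kind", "regarding.namespace",
	"regarding.name", "regarding.uid", "reporting.controller",
	"note", "series.count", "event.time",
}

// protected holds the seven join keys plus event.time and series.count,
// i.e. the nine attributes the documented floor keeps intact.
var protected = map[string]bool{
	"event.uid": true, "event.reason": true, "k8s.event.hint": true,
	"regarding.kind": true, "regarding.namespace": true,
	"regarding.name": true, "regarding.uid": true,
	"event.time": true, "series.count": true,
}

// capAttrs keeps every protected attribute, then fills the remaining
// slots with the other attributes in input order. It returns the kept
// attributes and the number dropped.
func capAttrs(in []attr, maxAttrs int) ([]attr, int) {
	out := make([]attr, 0, len(in))
	for _, a := range in {
		if protected[a.Key] {
			out = append(out, a)
		}
	}
	dropped := 0
	for _, a := range in {
		if protected[a.Key] {
			continue
		}
		if len(out) < maxAttrs {
			out = append(out, a)
		} else {
			dropped++
		}
	}
	return out, dropped
}

func main() {
	in := make([]attr, 0, len(schemaKeys))
	for _, k := range schemaKeys {
		in = append(in, attr{Key: k, Value: "example"})
	}
	kept, dropped := capAttrs(in, 9)
	fmt.Println("kept:", len(kept), "dropped:", dropped)
}
```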

### Hint taxonomy

Pinned by a table-driven test (`TestHintTaxonomy`). The 12 supported
reasons map to 11 hints:

| `event.reason` | `k8s.event.hint` | Go constant |
|---|---|---|
| `Evicted` | `pod_evicted` | `HintPodEvicted` |
| `FailedMount` | `mount_failure` | `HintMountFailure` |
| `BackOff` | `backoff` | `HintBackoff` |
| `SystemOOM` (kubelet) / `OOMKilled` (CRI) | `oom_killed` | `HintOOMKilled` |
| `NodeNotReady` | `node_unhealthy` | `HintNodeUnhealthy` |
| `FailedScheduling` | `schedule_failure` | `HintScheduleFailure` |
| `FailedCreate` | `create_failure` | `HintCreateFailure` |
| `FailedAttachVolume` | `volume_attach_failure` | `HintVolumeAttachFailure` |
| `ContainerStatusUnknown` | `container_status_unknown` | `HintContainerStatusUnknown` |
| `NodeAllocatableEnforced` | `node_pressure` | `HintNodePressure` |
| `ImagePullBackOff` | `image_pull_failure` | `HintImagePullFailure` |

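`Hint` is a named string type. Downstream pattern detectors should
switch on the `Hint*` constants. Note that Go still accepts a raw
string literal in a `case` (untyped constants convert implicitly to
`Hint`); only a value of type `string` is rejected without an explicit
conversion, so the constants are a convention that review or linting
must enforce. Full switch-arm exhaustiveness requires the
`exhaustive` linter; consumers wanting that wire it into their own
pipeline.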

`SystemOOM` is the kubelet's node-level OOM Event reason
(`pkg/kubelet/oom/oom_watcher_linux.go` in `kubernetes/kubernetes`).
The prior `OOMKilling` row was a typo — there is no `OOMKilling`
event reason upstream.
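
As a rough illustration of the taxonomy, the sketch below declares the named type, the constants from the table, and a lookup. The constant names and values come from the table above; `hintByReason`, `HintFromReason`, and `describe` are hypothetical helpers, and how the receiver treats reasons outside the taxonomy is not specified here.

```go
package main

import "fmt"

// Hint is a named string type, as described above.
type Hint string

const (
	HintPodEvicted             Hint = "pod_evicted"
	HintMountFailure           Hint = "mount_failure"
	HintBackoff                Hint = "backoff"
	HintOOMKilled              Hint = "oom_killed"
	HintNodeUnhealthy          Hint = "node_unhealthy"
	HintScheduleFailure        Hint = "schedule_failure"
	HintCreateFailure          Hint = "create_failure"
	HintVolumeAttachFailure    Hint = "volume_attach_failure"
	HintContainerStatusUnknown Hint = "container_status_unknown"
	HintNodePressure           Hint = "node_pressure"
	HintImagePullFailure       Hint = "image_pull_failure"
)

// hintByReason is a hypothetical lookup table built from the README rows.
var hintByReason = map[string]Hint{
	"Evicted":                 HintPodEvicted,
	"FailedMount":             HintMountFailure,
	"BackOff":                 HintBackoff,
	"SystemOOM":               HintOOMKilled,
	"OOMKilled":               HintOOMKilled,
	"NodeNotReady":            HintNodeUnhealthy,
	"FailedScheduling":        HintScheduleFailure,
	"FailedCreate":            HintCreateFailure,
	"FailedAttachVolume":      HintVolumeAttachFailure,
	"ContainerStatusUnknown":  HintContainerStatusUnknown,
	"NodeAllocatableEnforced": HintNodePressure,
	"ImagePullBackOff":        HintImagePullFailure,
}

// HintFromReason is a hypothetical helper; ok is false for reasons
// outside the taxonomy.
func HintFromReason(reason string) (h Hint, ok bool) {
	h, ok = hintByReason[reason]
	return h, ok
}

// describe shows a downstream switch on the constants.
func describe(h Hint) string {
	switch h {
	case HintPodEvicted:
		return "pod was evicted"
	case HintOOMKilled:
		return "out-of-memory kill"
	default:
		return "unclassified: " + string(h)
	}
}

func main() {
	if h, ok := HintFromReason("SystemOOM"); ok {
		fmt.Println(describe(h))
	}
}
```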

## Auth resolution

1. If `kubeconfig:` config field is set → load that file.
2. Else if `KUBECONFIG` env var is set → load that file.
3. Else → `rest.InClusterConfig()` (service-account mount).

If the in-cluster service-account token file
(`/var/run/secrets/kubernetes.io/serviceaccount/token`) is present
**AND** either `kubeconfig:` or `KUBECONFIG` is set, Validate
returns `ErrAmbiguousAuth` and the binary exits 2 with the offending
field named. The receiver refuses to silently choose because the
chosen identity determines what the receiver can see.
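
The resolution order and the ambiguity check can be sketched as a pure function. This is a minimal sketch, assuming the three inputs have already been gathered (config field, env var value, and whether the token file exists); `resolveAuth` and the source labels are hypothetical, and the error text is borrowed from the message quoted in the RUNBOOK.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrAmbiguousAuth mirrors the documented sentinel; the message text
// follows the RUNBOOK wording.
var ErrAmbiguousAuth = errors.New("k8sevents: both in-cluster service-account credentials AND out-of-cluster kubeconfig are present")

// resolveAuth is a hypothetical helper. field is the `kubeconfig:` config
// value, env is the KUBECONFIG env var value, and tokenPresent reports
// whether the in-cluster service-account token file exists.
func resolveAuth(field, env string, tokenPresent bool) (source, path string, err error) {
	// Refuse to choose when both identities are available.
	if tokenPresent && (field != "" || env != "") {
		return "", "", ErrAmbiguousAuth
	}
	switch {
	case field != "":
		return "kubeconfig field", field, nil
	case env != "":
		return "KUBECONFIG env", env, nil
	default:
		return "in-cluster", "", nil
	}
}

func main() {
	src, path, err := resolveAuth("", "/home/dev/.kube/config", false)
	fmt.Println(src, path, err)

	_, _, err = resolveAuth("/etc/tracecore/kubeconfig", "", true)
	fmt.Println(errors.Is(err, ErrAmbiguousAuth))
}
```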

## RBAC + Deployment

Manifests live alongside the receiver:

- [`rbac.yaml`](./rbac.yaml) — `ServiceAccount`, `ClusterRole`
(verbs `get,list,watch` on `events.k8s.io/v1/events` only — the
legacy core/v1 events alias is NOT granted), `ClusterRoleBinding`.
- [`rbac.can-i.golden`](./rbac.can-i.golden) — the permitted verb
list, CI-asserted by `TestRBAC_MatchesGolden`.
- [`example-deployment.yaml`](./example-deployment.yaml) —
cluster-singleton `Deployment` (`replicas: 1`, not DaemonSet),
non-root, read-only root FS, no host PID/IPC/network, plus a
custom `tracecore-cluster-critical` PriorityClass (the reserved
`system-cluster-critical` is admission-restricted to the
`kube-system` namespace) and a sibling PodDisruptionBudget.
Voluntary disruption via the eviction API (node drain,
cluster-autoscaler) is blocked; direct deletion and
`--disable-eviction` bypass the PDB. Involuntary disruption
(node failure) causes a brief Events-observability gap that the
`K8sEventsReceiverDegraded` alert surfaces.

## Schema versioning policy

`SchemaURL = "https://tracecore.ai/schemas/k8sevents/v0"` is the
current attribute-vocabulary URL. The receiver is alpha, so the
following rules apply:

- **Additive fields on `Record`** (e.g. adding `Related ObjectRef`
in a later milestone) do NOT bump the URL. Existing consumer Go code
keeps compiling unchanged and reads unset fields as zero values.
- **Field renames or removals** bump the URL (`/v0` → `/v1`). The
old URL constant remains exported alongside the new one until the
alpha-stability deprecation window closes.
- Downstream pattern detectors should reference `k8sevents.SchemaURL`
(current) when stamping derived records, and string-literal-pin
against the URL they were authored against when behaviour depends
on a specific field set.
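
A detector that depends on a specific field set might gate on the URL roughly as follows. This is only a sketch: the local `SchemaURL` constant stands in for `k8sevents.SchemaURL`, and `authoredAgainst`, `supports`, and the `/v1` URL used in `main` are hypothetical.

```go
package main

import "fmt"

// SchemaURL stands in for k8sevents.SchemaURL (the current URL); in a
// real detector it would be imported rather than redeclared.
const SchemaURL = "https://tracecore.ai/schemas/k8sevents/v0"

// authoredAgainst is the string-literal pin: the URL this hypothetical
// detector was written and tested against.
const authoredAgainst = "https://tracecore.ai/schemas/k8sevents/v0"

// supports reports whether a record stamped with schemaURL carries the
// field set this detector expects.
func supports(schemaURL string) bool {
	return schemaURL == authoredAgainst
}

func main() {
	fmt.Println(supports(SchemaURL))
	// A hypothetical future URL after a rename or removal.
	fmt.Println(supports("https://tracecore.ai/schemas/k8sevents/v1"))
}
```

Pinning the literal rather than the exported constant means the gate keeps failing closed after the package's current URL moves on.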

## Degraded mode

Informer `WatchErrorHandler` failures:

- Increment `tracecore_receiver_errors_total{kind="watch"}` once per
failure.
- Set `Degraded()=true`; cleared on the next successful emission.

The receiver stays alive; client-go's `cache.Reflector` reconnects
in the background.

The schedule pinned in `degraded.go` — `1s`, `2s`, `5s`, then `30s`
ceiling — drives the `K8sEventsReceiverDegraded` alert and the
runbook narrative (log lines emit `next_backoff` per failure).
**It does not drive the network-level reconnect cadence.** The
reflector owns retry timing via its own `ExponentialBackoff`
(`1s` initial, `30s` cap); the receiver-side schedule is the
**observable** layer that operators alert on, not the **enforcing**
layer.

## Semantic-convention divergence

The receiver stamps attributes under the `event.*`, `regarding.*`,
and `reporting.*` namespaces (see [Emitted attribute schema](#emitted-attribute-schema)).
The OpenTelemetry semantic-convention v1.32 `k8s.event.*` /
`k8s.object.*` keys use a different prefix.

The divergence is deliberate:

1. **Downstream pattern detectors** join on the typed `Record`
struct, not on attribute string keys. The wire-format attribute
names exist for backends that consume `plog.LogRecord` without
the typed package import; pinning the names to a stable prefix
tracecore owns insulates those backends from upstream semconv
churn.
2. **The taxonomy hint (`k8s.event.hint`)** uses the upstream
prefix because it is the cross-receiver join key the
pod-evicted pattern detector reads — it's the one attribute
where ecosystem-standard naming matters more than tracecore's
internal stability.

A `semconv_compat: true` config knob that emits BOTH namespaces is
a deliberate followup (see `docs/FOLLOWUPS.md`); it is not in the
alpha-stability surface to keep the cardinality budget honest.

## Limitations

- **Linux Getrusage benchmark deferred.** The NFR budget
(`≤0.02% CPU, ≤0.02 Mbps egress, ≤10 MB RSS` at 1k events/min) is
bench-falsifiable today via `BenchmarkEmitOne` (~700 ns/op on
Apple M4 Pro). A full Linux-runner Getrusage harness lands in a
follow-up under `test-extras`.
- **Multi-namespace watch is cluster-wide.** When `namespaces:`
length is ≥2, the informer falls back to a cluster-wide watch
with in-process filtering. Operators paying for FieldSelector
efficiency should use a single namespace.
- **`Related` ObjectReference is not emitted.** Only `Regarding` is
in the current schema; if a future pattern detector needs `Related`,
extend the `Record` shape. Under the
[Schema versioning policy](#schema-versioning-policy), an additive
field does not bump `SchemaURL`.
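
The multi-namespace limitation follows from how the watch scope is chosen. A minimal sketch, assuming a hypothetical `planWatch` helper: one namespace becomes the server-side scope, two or more fall back to a cluster-wide watch (the empty string, which is what `metav1.NamespaceAll` equals) plus an in-process filter. Treating an empty list as cluster-wide with no filter is an assumption.

```go
package main

import "fmt"

// planWatch is a hypothetical helper. It returns the namespace to hand to
// the informer ("" means all namespaces) and an optional in-process
// filter set (nil means no filtering).
func planWatch(namespaces []string) (watchNamespace string, filter map[string]bool) {
	switch len(namespaces) {
	case 0:
		// Assumption: an empty list means a cluster-wide watch.
		return "", nil
	case 1:
		// Server-side scope: only this namespace is delivered.
		return namespaces[0], nil
	default:
		// Cluster-wide watch; unwanted namespaces are dropped in-process,
		// so their Events still cross the wire.
		filter = make(map[string]bool, len(namespaces))
		for _, ns := range namespaces {
			filter[ns] = true
		}
		return "", filter
	}
}

func main() {
	ns, filter := planWatch([]string{"app"})
	fmt.Printf("scope=%q filter=%v\n", ns, filter)

	ns, filter = planWatch([]string{"app", "batch"})
	fmt.Printf("scope=%q filter=%v\n", ns, filter)
}
```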
143 changes: 143 additions & 0 deletions components/receivers/k8sevents/RUNBOOK.md
@@ -0,0 +1,143 @@
# k8sevents RUNBOOK

Operator-facing playbook for the k8sevents receiver (alpha
stability).

## First 15 minutes

- `kubectl logs -n tracecore deploy/tracecore-k8sevents --tail=200`
— receiver logs `"k8sevents started"` once and `"k8sevents stopped"`
once. Anything else is a symptom.
- Match the symptom to a section below by `grep`:
  - `"watch error; degraded"` → [K8sEventsReceiverDegraded](#k8seventsreceiverdegraded)
  - `"backpressure_drop"` counter rising → [K8sEventsBackpressureDrops](#k8seventsbackpressuredrops)
  - `ErrAmbiguousAuth` on boot, CrashLoopBackOff → [Receiver fails to start with ambiguous-auth error](#receiver-fails-to-start-with-ambiguous-auth-error)
  - `"k8sevents started"` line present but zero downstream Events
    → [Receiver started but no events emitted](#receiver-started-but-no-events-emitted)

## K8sEventsReceiverDegraded

The receiver has been in degraded state ≥5 minutes — the informer's
underlying watch has been failing. The reflector reconnects on its
own schedule (client-go `cache.Reflector` exponential backoff, `1s`
initial through `30s` cap); the receiver-side schedule pinned in
`degraded.go` (`1s → 2s → 5s → 30s` ceiling) drives the log line
the alert references and the narrative below, NOT the actual
network retry.

Triage:

1. Check `tracecore_receiver_errors_total{component="k8s_events",kind="watch"}`
— a steady climb means the apiserver is rejecting the watch.
2. `kubectl auth can-i get events.events.k8s.io --as=system:serviceaccount:tracecore:tracecore-k8sevents`
should return `yes` (`can-i` takes the resource as `<resource>.<group>`).
If `no`, RBAC has drifted; re-apply
`components/receivers/k8sevents/rbac.yaml`.
3. `kubectl logs -n tracecore deploy/tracecore-k8sevents` and grep for
`"k8sevents: watch error; degraded"` — the wrapped error names the
underlying client-go failure (network reset, 401, etc.).
4. Verified by: `TestReceiver_WatchErrorIncrementsDegradedAndCounter`.

## K8sEventsBackpressureDrops

More than 1 in 1000 incoming Events is being dropped, AND the
absolute drop rate is ≥1/min. The bounded internal channel (default
`channel_cap: 1024`) is full because the downstream consumer can't
drain fast enough.

Triage:

1. Look at the downstream exporter's
`tracecore_exporter_failure_rate{component="<exporter>"}` — a
stuck exporter is the most common cause.
2. If the volume is legitimately high and the downstream is healthy,
raise `channel_cap` (floor 64; default 1024) — the channel can
absorb larger bursts at the cost of memory.
3. Verified by: `TestReceiver_BackPressureDropsPastChannelCap`.
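
The drop behaviour behind this alert is the usual non-blocking send on a bounded channel. The sketch below shows the pattern only; `boundedQueue`, `offer`, and the string payload are hypothetical stand-ins, and the receiver's actual types are not shown in this document.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// boundedQueue is a hypothetical stand-in for the receiver's internal
// channel plus its drop counter.
type boundedQueue struct {
	ch    chan string
	drops atomic.Int64 // stands in for the kind="backpressure_drop" counter
}

func newBoundedQueue(capacity int) *boundedQueue {
	return &boundedQueue{ch: make(chan string, capacity)}
}

// offer never blocks: if the channel is full the event is dropped and
// counted, so the informer callback returns immediately.
func (q *boundedQueue) offer(ev string) bool {
	select {
	case q.ch <- ev:
		return true
	default:
		q.drops.Add(1)
		return false
	}
}

func main() {
	q := newBoundedQueue(2) // tiny capacity for the demo; the documented floor is 64
	for _, ev := range []string{"a", "b", "c"} {
		fmt.Println(ev, "accepted:", q.offer(ev))
	}
	fmt.Println("drops:", q.drops.Load())
}
```

Raising `channel_cap` only enlarges the buffer; if the consumer stays slower than the producer, the `default` arm is still reached eventually.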

## Receiver fails to start with ambiguous-auth error

The binary crashes immediately with
`k8sevents: both in-cluster service-account credentials AND
out-of-cluster kubeconfig are present` and an exit code of 2. The
receiver refuses to silently pick one identity because the choice
determines what Events it can see.

Triage:

1. `kubectl exec -n tracecore deploy/tracecore-k8sevents -- env | grep
KUBECONFIG`: if non-empty, the Pod's environment was injected
(downward API, sidecar mutation, custom controller). If the container
is crash-looping and `exec` cannot attach, inspect `env` in the output
of `kubectl get pod -n tracecore <pod> -o yaml` instead. Either unset
the env var in the Deployment, or remove the `kubeconfig:` field
from the receiver config.
2. The receiver's `automountServiceAccountToken: true` mounts the
in-cluster credentials at
`/var/run/secrets/kubernetes.io/serviceaccount/token`. If you
want to *deliberately* use a kubeconfig from a Secret, set
`automountServiceAccountToken: false` on the Pod spec.
3. Verified by: `TestConfig_AmbiguousAuth_InClusterPlusKubeconfigField`
and `TestConfig_AmbiguousAuth_InClusterPlusKubeconfigEnv`.

## Receiver started but no events emitted

The receiver logs `"k8sevents started"` and stays up, but no Events
appear in the downstream exporter and
`tracecore_receiver_emissions_total{component="k8s_events"}` stays
at 0.

Triage:

1. `namespaces:` plus `include_namespaces:` mismatch — if you set
`namespaces: [app]` (server-side scope) AND
`include_namespaces: [other]` (in-process allowlist), every Event
is dropped because `other` is never delivered to the informer.
Remove one of the lists, or make them consistent.
2. `reason_regex:` under-matches: a too-restrictive regex silently
drops everything. Temporarily set `reason_regex: ""` and recheck.
3. `min_event_type: Warning` drops Normal events at the source. If
you expected `kubectl get events` to flow through, set
`min_event_type: Normal` (or omit).
4. RBAC drift: see K8sEventsReceiverDegraded triage step 2;
`can-i get events.events.k8s.io` MUST return `yes`.

## Disruption semantics (cluster-singleton)

The receiver runs as a singleton Deployment with a sibling
PodDisruptionBudget (`minAvailable: 1`). The PDB blocks the
eviction API path — which covers:

- `kubectl drain` (default, eviction-based)
- cluster-autoscaler scale-down on the receiver's node
- Vertical Pod Autoscaler-driven recreations

The PDB does NOT block:

- `kubectl drain --disable-eviction` — deletes the Pod directly,
bypassing the eviction subresource and the PDB.
- `kubectl delete pod tracecore-k8sevents-<hash>` — same.
- Force node deletion (`kubectl delete node --force`).

If an operator must drain a node hosting the receiver during an
outage, an eviction-based drain is blocked by the PDB, so one of the
bypass paths above is required. The receiver then accepts the
disruption and logs `"k8sevents stopped"`; the
`K8sEventsReceiverDegraded` alert will not fire (the gap is a brief
absence, not a degraded state). Plan for a few-second
Events-observability gap during such operations.

## ServiceAccount token rotation

The example Deployment sets `automountServiceAccountToken: true`.
On Kubernetes 1.22+ this provisions a bound, projected token with
automatic rotation (no operator action needed). On older clusters,
the token is a long-lived Secret — operators on those clusters
should add an explicit `projected` volume with
`serviceAccountToken { expirationSeconds: 3600 }` to opt into the
modern path.

## Failure mode inventory

| Failure | Behaviour | Test |
|---|---|---|
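| Informer watch fails | `kind="watch"` ticks; `Degraded()=true`; client-go reflector reconnects on its own exponential backoff (`1s` initial, `30s` cap) while the receiver logs the `1s`/`2s`/`5s`/`30s` schedule; receiver stays alive. | `TestReceiver_WatchErrorIncrementsDegradedAndCounter` |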
| Bounded channel saturates | Drop with `kind="backpressure_drop"`; informer never blocks. | `TestReceiver_BackPressureDropsPastChannelCap` + `TestReceiver_GoleakNoLeakAfterShutdown` |
| Informer callback panic | Recovered via `defer/recover`; `kind="panic"` ticks; process stays up. | `TestReceiver_GoroutineDeferRecover_KeepsProcessAlive` |
| Auth ambiguity at config-load | `ErrAmbiguousAuth` exit 2 with offending field named. | `TestConfig_AmbiguousAuth_*` |
| Bad RE2 in `reason_regex` | Exit 2 with `k8sevents.reason_regex:` named-field error. | `TestConfig_RejectsBadReasonRegex` |
| Cardinality cap exceeded | Drop past `max_attributes`; join keys preserved. | `TestBuildLogRecord_CapPreservesJoinKeys` |
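
The "Informer callback panic" row relies on the standard `defer`/`recover` guard. A minimal sketch of that pattern, assuming a hypothetical `counters` type and `guard` method; the receiver's own wrapper and metric plumbing are not shown in this document.

```go
package main

import "fmt"

// counters is a hypothetical stand-in for the receiver's error metrics.
type counters struct {
	panics int // stands in for the kind="panic" counter
}

// guard runs handler and converts a panic into a counter tick so the
// calling goroutine, and therefore the process, keeps running.
func (c *counters) guard(handler func()) {
	defer func() {
		if r := recover(); r != nil {
			c.panics++
		}
	}()
	handler()
}

func main() {
	var c counters
	c.guard(func() { panic("malformed event") })
	c.guard(func() { fmt.Println("next callback still runs") })
	fmt.Println("panics:", c.panics)
}
```

The deferred function must be registered in the same goroutine that runs the handler; `recover` has no effect on a panic raised in a different goroutine.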
19 changes: 19 additions & 0 deletions components/receivers/k8sevents/bench_export.go
@@ -0,0 +1,19 @@
// SPDX-License-Identifier: Apache-2.0

package k8sevents

import (
	"go.opentelemetry.io/collector/pdata/plog"
	eventsv1 "k8s.io/api/events/v1"
)

// BuildLogRecordForBench re-exports buildLogRecord for benchmarks
// in a `_test` package.
func BuildLogRecordForBench(lr plog.LogRecord, rec Record, maxAttrs, noteMaxBytes int) int {
	return buildLogRecord(lr, rec, maxAttrs, noteMaxBytes)
}

// ConvertEventForBench re-exports convertEvent for benchmarks.
func ConvertEventForBench(e *eventsv1.Event) Record {
	return convertEvent(e)
}