Skip to content

[docs] FOLLOWUPS: close squash-merge-trailer item with observation#14

Merged
trilamsr merged 1 commit into
mainfrom
chore/m1.6-postmerge
May 14, 2026
Merged

[docs] FOLLOWUPS: close squash-merge-trailer item with observation#14
trilamsr merged 1 commit into
mainfrom
chore/m1.6-postmerge

Conversation

@trilamsr

Copy link
Copy Markdown
Contributor

What this PR does

Closes the one FOLLOWUPS item from PR #13 that could only be observed
post-merge: whether GitHub's squash-merge preserves the per-commit
Assisted-by: trailers. Observed on squash commit a57883f: it
does not.
The audit trail survives via the PR description (embedded
in the squash commit body, listing every source SHA), but the
trailer-level disclosure AGENTS.md describes is gone.

Documents the finding + flags two governance paths that aren't being
chosen unilaterally:

  • Update AGENTS.md to treat the PR-description SHA list as canonical
    squash-merge disclosure, or
  • Switch AI-assisted PRs to merge-commit / rebase-merge so per-commit
    trailers survive.

Tag v0.1.0-m1 was also created on a57883f (dev-preview, not a
release artifact) — pushed alongside this branch but separately.

Linked issue(s)

Refs PR #13 (post-merge observation). N/A — no GitHub issue.

Release notes

NONE

Checklist

  • N/A — docs-only change
  • make ci passes locally
  • Commits are signed off (git commit -s)
  • Commits cryptographically signed (SSH-based, see scripts/setup-signing.sh)
  • N/A — no new components in this PR

PR-13 (commit a57883f) merged via GitHub's squash-merge. Observed:
the per-commit Assisted-by trailers on the 43 source commits did
NOT survive as trailers on the squash commit. The squash body
includes the PR description with every source SHA, so the audit
trail is intact; the trailer-level disclosure AGENTS.md describes
is not.

Two governance paths, neither chosen here:
- update AGENTS.md to treat the PR-description SHA list as
  canonical squash-merge disclosure
- switch AI-assisted PRs to merge-commit / rebase-merge so
  per-commit trailers survive

Flagging for governance review; not a code fix.

Signed-off-by: tree <tree@lumalabs.ai>
Assisted-by: Anthropic:claude-opus-4-7 [Claude Code]
Signed-off-by: Tri Lam <trilamsr@gmail.com>
@trilamsr trilamsr enabled auto-merge (squash) May 14, 2026 09:09
@trilamsr trilamsr merged commit e410234 into main May 14, 2026
9 checks passed
@trilamsr trilamsr deleted the chore/m1.6-postmerge branch May 14, 2026 09:10
trilamsr added a commit that referenced this pull request May 14, 2026
Address items #9 (--version OTel SDK), #13 (Example_* tests),
#14 (Kind constant coverage + helper), #15 (shutdown error logging).
Also tactical: extract initTelemetryStack from runCollect so the
silent-shutdown-error fix didn't push cyclomatic complexity past 15.

- internal/version: Build gains OTelSDKMetric + OTelPromExporter
  fields, populated from debug.ReadBuildInfo. Surfaces in --version
  output as `otel-sdk-metric=vX.Y.Z, otel-prom-exporter=vA.B.C`.
  Operators debugging OTel-side bugs see which SDK actually shipped
  without grep'ing go.mod.

- internal/selftelemetry: new CanonicalKinds() returning the
  Kind* constant set and IsCanonicalKind(s) helper. Test pins the
  set + asserts clockreceiver's KindDownstream is canonical. Future
  receivers (M8/M9) plug in the same test pattern; eventual lint
  enforcement is documented in FOLLOWUPS.

- internal/selftelemetry/example_test.go: ExampleNewReceiver +
  ExampleNewExporter render in pkg.go.dev for M8/M9 authors and
  compile-check the 6-line wiring pattern every CI run. The
  RECEIVER-PATTERNS.md snippet can no longer drift from a working
  example without breaking this test.

- cmd/tracecore.runCollect: previously swallowed mp.Shutdown errors
  in the failure-unwinder paths with `_ = mp.Shutdown(ctx)`. Now
  logs at Warn level so a real drain failure surfaces. Extracted
  the telemetry-init block into initTelemetryStack helper to keep
  runCollect's cyclomatic complexity under the gocyclo 15 ceiling
  (the helper's own error-handling carries the rest of the
  cleanup branches).

make ci clean. Coverage unchanged on most pkgs; selftelemetry up
to 91.0% via the new IsCanonicalKind test.

Assisted-by: Anthropic:claude-opus-4-7 [Claude Code]
Signed-off-by: Tri Lam <trilamsr@gmail.com>
trilamsr added a commit that referenced this pull request May 19, 2026
…rd surface

## What this PR does

Lands M19 (pattern #14 pod-evicted) end-to-end against the rubric's
hermetic replay-test gate, plus the M10 NodeRecord surface the
detector consumes.

### Pattern detector (internal/synthesis/patterns/)

- `pod_evicted.go`: `PodEvictedDetector{}.Evaluate(events, nodeConds)`
  joins each Pod-Evicted Record against the most-recent per-node
  pressure transition within a 30s sliding window (configurable via
  `JoinWindow`). Returns one `PodEvictedVerdict` per evicted pod,
  sorted by event time for golden-test determinism.
- Headline matches the rubric's regex contract for the canonical
  disk-pressure fixture (`/Pod .* evicted at .* due to disk pressure/`);
  remediation matches `/relocate.*NVMe/`. Three other pressure roots
  (memory/pid/notready) get distinct operator-facing remediations.
- Missing-layer path: when no NodeRecord joins, verdict falls back to
  `confidence: partial` with `missing_layers: ["node_condition"]` and
  the pressure kind is parsed heuristically from the Event Note.
- `verdict.go`: `PodEvictedVerdict` + `EvidenceRef` + `Confidence`
  with JSON tags pinned by `testdata/verdict.schema.json` (additional-
  Properties:false; drift rejected in `verdict_schema_test.go`).
- Generic `Detector` interface deferred per PRINCIPLES §3 rule-of-three
  — M17 / M18 land as the second and third consumers.

### Replay runner + corpus (internal/synthesis/replay/)

- `runner.go`: `LoadFixture(dir)` / `LoadFixturesUnder(root)` walk a
  directory tree, load `manifest.json` + `events.json` +
  `node_conditions.json` + `golden.json`. Pure stdlib; hermetic by
  design per rubric L414.
- `runner_test.go`: drives every fixture under `pod_evicted/`,
  JSONEq-diffs detector output against `golden.json`. Asserts the
  canonical + three negative fixtures all exist as a structural gate.
- Canonical fixture: `pod_evicted/canonical/` (Evicted Pod on
  `gpu-node-0001` at T=10:00:00Z; DiskPressure transition at T-5s →
  full-confidence verdict).
- Negative fixtures under `_negative/`: `killing/`, `preempted/`,
  `failed_scheduling/` — each produces an empty verdict slice (rubric
  L407 FP-rate contract).

### M10 NodeRecord surface (components/receivers/k8sevents/)

- `node_record.go`: typed `NodeRecord` + `NodePressureKind` constants
  (memory/disk/pid/notready) + distinct `NodeSchemaURL` so downstream
  consumers version-gate the two surfaces independently. `HintForPressure`
  inversion: NotReady → HintNodeUnhealthy; rest → HintNodePressure.
- `node_emit.go`: `buildNodeLogRecord` analogous to `buildLogRecord`.
- `node_informer.go`: pure `convertNodeConditions(*corev1.Node)
  []NodeRecord` (only alarming statuses surface: True for the three
  pressures, False for Ready) + `nodeConditionDedup` that tracks
  (nodeUID, conditionType, transitionNano) so an unrelated Node-object
  update doesn't re-emit.
- `receiver.go`: Node SharedInformer registered when
  `Config.NodeConditions.Enabled=true` (default false — wider RBAC).
  `nodeRecords chan NodeRecord` joins the existing `events chan` in
  `run()`'s select; nil channel blocks when disabled.
- `rbac.yaml`: adds `core/v1.nodes get,list,watch`. `rbac.can-i.golden`
  updated to match (sorted verb-then-resource).
- `Record.ReportingInstance` added (kubelet sets to node name; M19's
  join key); `MaxAttributesFloor` bumped 9→10 to keep it intact.
- `pattern_consumer_test.go`: 4 new compile-gate tests pin the
  NodeRecord field walk, the four-kind exhaustiveness, the pressure→
  hint inversion, and `reporting.instance` wire-format pinning.
- `node_record_test.go`: mirrors `hint_test.go` for the
  conditionType→NodePressureKind taxonomy and the pressure→Hint map.

### CLI

- `tracecore failure-inject pod-evict --reason=... --seed=N --out=path`
  wraps `tools/failure-inject/podevict/podevict.go` for deterministic
  Event YAML generation. Closes FOLLOWUPS L806-815 carry-forward.
  `--allow-cluster-write` still returns `ErrClusterWriteUnsupported`
  (exit 70) — captured in M19's new carry-forward #5.

### Hygiene

- CHANGELOG entry under [Unreleased] / ### Added.
- MILESTONES §M19 flipped ☐ → ⧗ partial; rubric items annotated ☑/⧗
  with carry-forwards listed.
- FOLLOWUPS L806-815 closed; L927-934 (chaos.yml pattern_evicted
  matrix row) trigger-fired annotation added.

## Linked issue(s)

_No linked issue._

## Release notes

```release-notes
NONE
```

## Checklist

- [x] `make ci` exit 0 (verified twice; runtime budget under 60s)
- [x] All new unit + replay tests pass; no regressions across repo suite
- [x] JSON Schema gates verdict shape (additionalProperties:false)
- [x] Negative fixtures produce zero verdicts
- [x] RBAC golden regenerated; pattern_consumer compile-gate updated
- [x] Commits signed off

Signed-off-by: Tri Lam <trilamsr@gmail.com>
trilamsr added a commit that referenced this pull request May 19, 2026
…h error handler; defer 6 findings

Phase 2 dispatched 8 lens reviewers (performance, SRE/Infra,
maintainer, contributor, operator, adopter, security, researcher)
+ 1 mechanical compliance gate in parallel. Real bugs applied;
load-bearing-but-deferable findings logged in FOLLOWUPS.

## Pushback table

| ID  | Lens(es) | Beneficiary | Severity | Finding | Proof | Contradict | TDD record | Rubric+? | Action |
|-----|----------|-------------|----------|---------|-------|------------|------------|----------|--------|
| P2.1 | security + mechanical | operator (DCO policy) | BLOCKER | 4 of 5 commits on branch carried `Co-Authored-By: Claude` + `🤖 Generated` trailers, violating AGENTS.md "Load-bearing lessons" forward-comply rule. | `git log origin/main..HEAD --pretty=%B \| grep -c 'Co-Authored-By:'` returned 4. | Rule is advisory / grandfathered? Falsified — AGENTS.md explicit forward-comply, commits dated 2026-05-18/19. | n/a (process). | yes (gate added) | applied — operator-authorized interactive rebase + force-push-with-lease. New gate at `git log origin/main..HEAD --pretty=%B \| grep -cE 'Co-Authored-By:\|🤖 Generated' == 0`. |
| P2.4 | researcher | repo-long-term + operator | BLOCKER | `node.name` / `node.uid` collide with OTel semconv canonical `k8s.node.name` / `k8s.node.uid`; package already uses `k8s.event.hint`, so the chosen `node.*` broke local convention. | Existing `AttrEventHint = "k8s.event.hint"` precedent; OTel semconv k8s resource attributes doc. | tracecore-private namespace per `NodeSchemaURL` scope? Partial — but two of five attrs have direct upstream canonicals (name, uid); the other three (`node.pressure.kind`, `node.condition.transition_at`, `node.condition.message`) ARE tracecore-private and remain in `node.*`. | RED: `TestPatternConsumer_NodeRecordTypeCompiles` updated to expect `k8s.node.name`; ran, failed with `expected k8s.node.name got node.name`. GREEN: `AttrNodeName` / `AttrNodeUID` constants flipped; test passes; integration test propagates via Go-name reference (no string callsite drift). | yes (semconv-compat rubric addition) | applied (this commit) |
| P2.2 | operator | operator (alert visibility) | BLOCKER | Node informer registered AddEventHandler but never SetWatchErrorHandler — when nodes RBAC is missing the watch fails silently, the K8sEventsReceiverDegraded alert never fires, `tracecore_receiver_errors_total{kind="watch"}` does not move. | `git diff origin/main -- receiver.go` showed only eventInformer carried SetWatchErrorHandler. | Symptom only visible without RBAC? Confirmed — that's the production failure mode the alert exists for. | Shared `onWatchError` handler now wired on the Node informer (mirrors eventInformer). The handler itself is already tested via `TriggerWatchError`. Integration coverage for "no-nodes-RBAC → KindWatch increments" needs apiserver-side watch-fail injection; deferred to FOLLOWUPS as Hard-proof falsifier with concrete reproducer command. | no | applied (this commit) |
| P2.3 | operator | operator (alert correctness) | CONCERN → deferred | `emitNode` unconditionally `SetDegraded(false)` — a successful node emit clears degraded state set by an Event-watch failure, masking the Event-side outage. | Receiver.go `run()` calls `SetDegraded(false)` on either-side emit success; degraded is shared receiver state. | Both-informer-broken case fires both error paths and re-sets degraded? True — but a healthy node informer + broken event informer is the lasting masked-failure case. | Falsifier sketch logged in FOLLOWUPS: `TriggerWatchError → inject NodeRecord → assert Degraded()=true`. | no | deferred (FOLLOWUPS, M21-audit trigger) |
| P2.6 | operator | operator (telemetry triage) | CONCERN → deferred | Self-telemetry counters `KindParse`/`KindPanic`/`KindBackpressureDrop` shared between event + node deliverers; operator cannot disambiguate via `kind` label. | `IncError(selftelemetry.KindPanic)` callsites in `deliver` and `deliverNode`. | Adopters look at the message body, not the kind label? Falsifier: RUNBOOK triage steps that say "filter by kind label" lose specificity. | Fix shape (add `KindNodeParse` etc.) logged in FOLLOWUPS. | no | deferred (FOLLOWUPS, M21-audit trigger) |
| A1 | adopter | adopter (first-touch) | CONCERN → deferred | Helm-installed chart has no RBAC templates; `helm install` → silent zero events. Chart README Troubleshooting section has 8 entries; none mention RBAC. | `ls install/kubernetes/tracecore/templates/` shows no ClusterRole. | Already in FOLLOWUPS? Yes, but trigger ("first operator reports") is post-failure; adopter lens raised it as a first-touch gap. | n/a | no | deferred (FOLLOWUPS — existing entry sufficient, Troubleshooting README enhancement is the smaller follow-up) |
| A2 | adopter | adopter | NIT → deferred | `failure-inject pod-evict` undocumented in `docs/` or `getting-started.md`. | `grep -rn failure-inject docs/` zero hits. | Chaos tool, not setup step? Defensible. | n/a | no | deferred (FOLLOWUPS, M21 docs-sweep trigger) |
| F4 | operator | operator | NIT → deferred | Exit-70 contract for `--allow-cluster-write` not in `--help` or RUNBOOK. | Kingpin description omits exit code; RUNBOOK has no failure-inject section. | n/a | n/a | no | deferred (FOLLOWUPS, M21 docs-sweep trigger) |
| Contrib-F1 | contributor | repo-long-term | NIT → deferred | `testdata/verdict.schema.json` is pattern-#14-specific (`pattern.id const: "14"`); M17/M18 cannot reuse the path. | File body L12. | rule-of-three: don't anticipate the 2nd schema? Counter: 1-line rename is cheap. Counter-counter: every M17/M18 PR can rename in its own first commit. Verdict: defer. | n/a | no | deferred (FOLLOWUPS, M17/M18 PR-start trigger) |
| Contrib-F3 | contributor | repo-long-term | NIT → deferred | No `docs/patterns/pattern-14-pod-evicted.md`; precedent broken silently. | `ls docs/patterns/` shows pattern-1, -3, -4, -5 but no -14. | M19 MILESTONES entry IS the spec doc? Partially — pre-existing pattern docs are research notes; this PR didn't write one. | n/a | no | deferred (FOLLOWUPS, M21 docs-sweep trigger) |
| Mtnr-M1 | maintainer | repo-long-term | NIT → deferred | `Manifest.Description`/`ExpectedTiming` are dead fields (no consumer reads them). | `grep -rn Manifest.Description` returns zero non-test hits. | YAGNI per §2; trim later. | n/a | no | explicitly-skipped (operator-prose value in manifest body justifies retention even if no Go reader yet) |
| Perf-1..5 | performance | repo-long-term | NIT → APPROVED | Detector hot-path allocation, RFC3339Nano formatting, dedup mutex contention, golden bytes retention, nil-channel select cost — all within v0 scale. | Reviewer's per-finding benchmark thresholds documented in their report. | n/a | n/a | no | explicitly-skipped — PRINCIPLES §10 / §2 |

## Verdicts

| Lens | Verdict |
|------|---------|
| Performance | APPROVED |
| Maintainer | APPROVED (nits) |
| Contributor | APPROVED (doc-followups) |
| Operator | CONCERNS-REQUIRE-FIX → applied (P2.2) + deferred (P2.3, P2.6, F4) |
| Adopter | CONCERNS → deferred (chart gap, doc gap) |
| Security | BLOCKER-REQUIRES-RESOLUTION → applied (P2.1 DCO trailer purge) |
| Researcher | CHANGES-REQUESTED → applied (P2.4 semconv align) |

## Rubric evolution

- [P2-researcher] semconv-compat — attributes with OTel canonical use the `k8s.*` prefix; private attrs keep project namespace.
- [P2-security] forward-comply DCO trailer gate — `git log origin/main..HEAD | grep -cE 'Co-Authored-By:|🤖 Generated' == 0`.
- [P2-operator] node-side watch errors must surface via SetWatchErrorHandler.

## Verification

`make ci` exit 0. Force-pushed origin via interactive rebase
(operator-authorized). Gate 1 (AI-vocab grep) clean. Gate 2 (DCO
trailer) now clean.

Signed-off-by: Tri Lam <trilamsr@gmail.com>
trilamsr added a commit that referenced this pull request May 19, 2026
…rd surface (#75)

## Summary

First tracecore pattern detector + the supporting receiver-side surface,
with the operator-facing failure modes fully instrumented:

- **`PodEvictedDetector`** joins Pod-Evicted Events against per-node
pressure transitions inside a configurable sliding window (30s default)
and emits `PodEvictedVerdict` with structured evidence trail,
remediation prose pinned to the offending node, and `pattern.id="14"`.
- **Hermetic replay corpus** under
`internal/synthesis/replay/pod_evicted/`: one canonical disk-pressure
fixture + three negatives (Killing, Preempted, FailedScheduling). Runner
JSONEq-diffs detector output against per-fixture `golden.json`; runs
inside `make ci` budget.
- **`NodeRecord` surface on k8sevents** — typed record +
`NodePressureKind` constants + opt-in Node SharedInformer behind
`node_conditions.enabled` (default false). `core/v1.nodes` RBAC delta.
Attribute names follow OTel semconv (`k8s.node.name`, `k8s.node.uid`).
- **Per-source operability discipline** — `degradedEvent` /
`degradedNode` atomic bits OR'd to the receiver-wide flag, so a
successful Node emit cannot mask an Event-watch outage. Distinct
`KindNodeWatch` / `KindNodeParse` / `KindNodePanic` /
`KindNodeBackpressureDrop` labels on `tracecore_receiver_errors_total`.
`K8sEventsBackpressureDrops` alert matches both kinds via regex.
- **`tracecore failure-inject pod-evict`** CLI wraps the YAML generator
with O_NOFOLLOW-guarded output, deterministic-by-seed, exit 70 sentinel
for the live-cluster stub.
- **`Record.ReportingInstance`** added so detectors can join Pod-Evicted
events to nodes without inspecting the message body.
`MaxAttributesFloor` bumped 9 → 10 (breaking, flagged under `###
Changed` in CHANGELOG).
- **Operator runbook section** for the Node-informer failure modes,
triage by `kind` label, and the disable-in-a-hurry config knob.

## Test plan

- [x] `make ci` exit 0 — full lint, vet, race-enabled tests, doc-check,
alert-check, banned-phrase scan, govulncheck, deterministic build.
- [x] `go test -race ./... -count=1` clean across the whole repo.
- [x] `BenchmarkPodEvictedDetector_1kEventWindow` baseline: 865μs/op on
Apple M4 Pro against the ≤1ms/eval budget. `.github/workflows/bench.yml`
runs the same bench on GHA `ubuntu-latest` for a native Linux x86_64
datapoint after merge.
- [x] JSON Schema rejects extra fields, out-of-enum confidence, empty
evidence trail, numeric `pattern.id`, and unknown evidence kinds (five
negative-coverage cases).
- [x] Replay corpus round-trips deterministically; negative fixtures
produce empty verdict slices.
- [x] Node informer end-to-end through fake clientset: emits
`NodeRecord` with the correct `NodeSchemaURL`, opt-in gate enforced.
- [x] Node-watch RBAC denial via `PrependWatchReactor("nodes",
forbidden)` increments `KindNodeWatch` and not `KindWatch`; removing the
SetWatchErrorHandler line on the Node informer flips this test red.
- [x] Event-watch failure followed by successful Node emit keeps the
receiver-wide degraded flag true (alert silencing regression test).
- [x] `tracecore failure-inject pod-evict --reason=DiskPressure
--seed=42` produces byte-identical YAML; symlink targets refused with
ELOOP on Linux.

## Scope and follow-ups

Generic `Detector` interface deferred per the rule of three — M17/M18
land as the second and third consumers. `MILESTONES §M19` ships at ⧗
partial: **7/7 functional and 5/5 non-functional rubrics ☑**; the only
remaining work flagged on the milestone is anonymized-real-world replay
corpus (needs external capture). Eleven items captured in
`docs/FOLLOWUPS.md` each with a concrete trigger — these cover the Helm
chart RBAC gap, live-cluster detection wiring (M20+ scope),
distro-patched kubelet variants, OTel processor integration,
custom-controller NodeCondition Message variants, the `failure-inject
pod-evict` core/v1 vs events.k8s.io/v1 API-version mismatch, and four
smaller doc/refactor items.

```release-notes
NONE
```

---------

Signed-off-by: Tri Lam <trilamsr@gmail.com>
trilamsr added a commit that referenced this pull request May 31, 2026
…2b) (#246)

## Summary

Two net-new OTel processors land in the in-repo submodule under
`module/processor/` (RFC-0013 §M19 / PR-I.2b):

- **`rankjoinprocessor`** — 5s windowed cross-signal join. NCCL FR
  rank records + kubelet pod-evicted Events for the same pod
  cross-stamp `m19.joined`, `gen_ai.training.rank`, and
  `k8s.pod_evicted_at`. Same-batch joins stamp both records;
  across-batch joins stamp the eviction-side record (the rank record
  was already forwarded — the eviction-side stamp is the
  M19-canonical surface). Component type: `rankjoin`.

- **`patterndetectorprocessor`** — wraps `module/pkg/patterns/` +
  `module/pkg/replay/`. Projects OTel log records into the typed
  pattern-library model, runs the pod-evicted detector, and appends
  one verdict log record per match. The verdict carries
  `pattern.id`, `pattern.confidence`, `pattern.headline`,
  `pattern.remediation`, and `pattern.verdict_json` (full evidence
  trail). Golden test against the canonical `pod_evicted/canonical`
  fixture pins the OTel-attrs → verdict round-trip. Component type:
  `patterndetector`.

Closes #163 — M19 cross-signal join integration test now lives at
`module/processor/rankjoinprocessor/rankjoin_test.go` and exercises
the real 5s windowed join (not just the compile-pin the prior
containerstdout test asserted).

`builder-config.yaml` adds the two processor entries pointing at
`github.com/tracecoreai/tracecore/module v0.2.0`. The
`ncclfrreceiver` line moves from `v0.1.0` to `v0.2.0` in lockstep so
OCB's MVS picks one submodule version per build. The submodule tag
bump from `v0.1.0` to `v0.2.0` is **post-merge** via `git tag` —
not part of this PR.

`module/go.mod` gains
`go.opentelemetry.io/collector/processor v0.115.0` (+ its
`processortest` test dep).

### Adversarial review (self, fallback path)

The orchestrator requested an `Agent`-tool review subagent. The
`Agent` tool was not available in this session's tool surface; the
spec explicitly authorizes self-adversarial review as a fallback.
Findings reviewed against:

- **Race on `nowFn`:** `-race` tag passed; tests overwrite `nowFn`
  sequentially with no concurrent reads. Production code reads
  `nowFn` under `p.mu.Lock()`.
- **Across-batch pointer-mutation safety:** the rank/eviction
  buffers retain `plog.LogRecord` pointers ONLY for the in-batch
  case; `invalidateBufferPointers` at the tail of every
  `ConsumeLogs` call drops the pointers and flags the entries
  pointer-stale before forwarding downstream (the next consumer
  declared `MutatesData=true`).
- **Memory bound:** `MaxBufferedRecords` (default 4096) caps both
  buffers with FIFO eviction at capacity. `evictExpired` runs at
  the start of every `ConsumeLogs` call so the buffer also drains
  on the time axis.
- **Patterndetector across-call statelessness:** the detector
  evaluates records arriving in ONE `ConsumeLogs` call together. The
  README + RUNBOOK both call this out; cross-call buffering is on the
  v0.3 roadmap. The canonical fixture's events fit in one call so
  the golden test exercises the realistic shape.

```release-notes
- new processor: `rankjoin` — windowed cross-signal join between NCCL FlightRecorder rank records and kubelet pod-evicted Events (default window 5s, configurable). M19 integration-test home.
- new processor: `patterndetector` — wraps `module/pkg/patterns/`; emits one verdict log record per matched pattern. Today's set: pod-evicted (pattern #14).
- bump: `github.com/tracecoreai/tracecore/module` baseline in `builder-config.yaml` moves from `v0.1.0` → `v0.2.0` to track the processor additions; submodule tag cuts post-merge.
```

## Test plan

- [x] `cd module && GOWORK=off go test -race -count=1
./processor/rankjoinprocessor/...
./processor/patterndetectorprocessor/... ./pkg/...`
- [x] root `GOWORK=off go test ./...`
- [x] `make build` (OCB end-to-end, ./_build/tracecore registers nccl_fr
+ rankjoin + patterndetector)
- [x] `docker run --rm -v $(pwd)/install/kubernetes/tracecore:/chart
alpine/helm:3.16.4 lint /chart`
- [x] `make check` (fmt, lint, vet, tidy-check, mod-verify)

---------

Signed-off-by: Tri Lam <tri@maydow.com>
Co-authored-by: Tri Lam <tri@maydow.com>
trilamsr added a commit that referenced this pull request Jun 1, 2026
…ORTHSTAR pattern #3) (#255)

## Summary

Third real detector in `module/pkg/patterns/` — closes v0.3.0 NORTHSTAR
O1 ("3 patterns at v0.1.0") alongside `pod_evicted` (#14) and
`nccl_hang` (#15).

**Pattern**: a GPU Xid kernel event (carrying the customer-stable
`kernelevents.xid` attribute) followed within a configurable window
(default 60s) by a pod eviction on the same node → one verdict per (Xid,
evicted_pod) tuple. Operator-actionable: "Xid 79 on gpu-node-0001 →
training/job-rank-3 evicted 10s later" with remediation pinning the node
to drain and the pod to reschedule.

**Detector** (`module/pkg/patterns/xid_correlation.go`):
- `XidCorrelationDetector` zero-value usable; `CorrelationWindow`
overrideable (defaults to `DefaultXidCorrelationWindow = 60s`).
- `Evaluate(xids, events)` joins on `(node, time-within-window)` with
the most-recent Xid winning the proximate-cause attribution.
- Causal-direction guard: eviction BEFORE Xid does not correlate
(mirrors pod_evicted's future-transition exclusion).
- Deterministic output sorted by `(eviction_time, EventUID)`.

**Verdict shape** — single vs multi-layer decision: structurally two
evidence surfaces (kernel_event + pod_event in causal order in the
evidence trail), but emission rule is "both layers joined or no
verdict". Xid-without-eviction is already operator-visible via the raw
`kernelevents.xid` telemetry; an Xid-only verdict would duplicate that
signal. So `Confidence` / `MissingLayers` are omitted (dead-fields
discipline from PR #250 review). Mirrors the `nccl_hang` shape.

**Multi-pod decision**: one verdict PER evicted pod (not one verdict
listing all). Per-pod is the operator-actionable shape — each verdict's
`Remediation` pins a specific pod to drain/recreate so alert routing
fans out by pod owner; collapsing would force operators to parse a list
and lose per-pod routing.

**Schema** (`testdata/xid_correlation_verdict.schema.json`):
`additionalProperties:false`, `pattern.id` const="16",
`evidence_trail.kind` enum=`["kernel_event","pod_event"]`, `minItems:2`
(both layers required). 7-row drift-rejection battery covers
re-introduced `confidence`/`missing_layers`, evidence-minimum violation,
kind-enum, and `pattern.id` numeric vs string drift.

**Wiring** (`patterndetectorprocessor`):
- New `Config.XidCorrelationWindow` YAML field (default 60s, floor 1s),
validated alongside the existing window fields.
- `collectInputs` returns a fourth typed slice (`[]patterns.XidRecord`)
built by `projectXidRecord`, which gates on `kernelevents.xid` and reads
the host node from the resource attribute `k8s.node.name` (the standard
k8sattributes / resourcedetection stamp on a DaemonSet — same pattern
`projectNodeCondition` uses). Per-record `k8s.node.name` fallback for
non-DaemonSet emitters.
- `appendXidCorrelationVerdict` mirrors `appendNCCLHangVerdict`'s wire
format — broken-out scalar attrs plus full `pattern.verdict_json`.

**Recipe alignment**: `docs/integrations/journald-kernel.md`
§"Customer-stable attribute mapping" already documents
`kernelevents.xid` (int) as the OTTL-stamped surface. No recipe update
needed; the existing pipeline emits exactly what the detector consumes.
`gpu.id` (PCI BDF) is also stamped but intentionally not projected — the
detector doesn't use it yet, so it stays available on the raw record
without entering the pattern library as a dead field.

## Test plan

- [x] `cd module && go test ./pkg/patterns/... ./processor/... -race
-count=1` — green
- [x] `make build` — collector binary compiles
- [x] `make check` (golangci-lint + go vet + go mod verify) — 0 issues
- [x] TDD red-test-first: both `xid_correlation_test.go` files failed to
compile/assert before their impl landed
- [x] Detector test cases: positive Xid 79 → eviction, no-eviction
negative, cross-node negative, out-of-window edge (61s default),
multi-pod per Xid (3 verdicts), window-configurable, pre-Xid eviction
excluded, non-evicted-hint ignored, deterministic order,
most-recent-Xid-wins
- [x] Schema-conformance + 7-row drift-rejection battery
- [x] Wiring tests: positive verdict emission, healthy non-emission,
window override
- [x] `TestConfig_Validate` extended with sub-1s xid_correlation_window
floor case
- [x] `TestFactory_Surface` pins `DefaultXidCorrelationWindow` (and the
previously-unpinned `DefaultNCCLHangThreshold`) on the factory's default
config

```release-notes
feat(patterns): add xid_correlation detector (v0.3.0 NORTHSTAR pattern #3) — correlates GPU Xid kernel events (`kernelevents.xid` attribute per RFC-0013 §3) with downstream pod evictions on the same node within a configurable window (default 60s). Emits one verdict per evicted pod so alert routing can fan out per pod owner. Wired into patterndetectorprocessor; configurable via `xid_correlation_window` YAML field.
```

---------

Signed-off-by: Tri Lam <tri@maydow.com>
Co-authored-by: Tri Lam <tri@maydow.com>
trilamsr pushed a commit that referenced this pull request Jun 1, 2026
Adds install/kubernetes/tracecore/dashboards/patterns.json — an
operator-facing Grafana 10+ dashboard for the three shipped pattern
verdicts (pod_evicted #14, nccl_hang #15, xid_correlation #16) plus
future-proof templating on pattern.id.

Seven panels:

1. Verdict rate by pattern_id (timeseries, LogQL)
2. Top 10 evicted pods — pod_evicted (table, LogQL)
3. Top 10 hung NCCL collectives — nccl_hang (table, LogQL)
4. Top 10 Xid+eviction correlations — xid_correlation (table, LogQL)
5. Verdict count by node (bargauge, LogQL)
6. Confidence distribution full vs partial (piechart, LogQL)
7. patterndetector processor throughput (timeseries, PromQL)

The verdict signal lives in OTLP log records (the processor's README
explicitly defers self-tel metrics to v0.3 — tracked in #261), so the
six verdict-derived panels query Loki via LogQL and the throughput
panel queries Prometheus for the upstream-standard otelcol_processor_*
items metrics. Two datasource template vars (prometheus_datasource,
loki_datasource) keep both backends pluggable on import.

Bundled .lint config waives target-promql-rule on the six LogQL
panels — the dashboard-linter parses every target as PromQL
irrespective of target.datasource.type and fails on the first | in a
LogQL pipeline. The target-logql-rule still validates each query.

dashboard-linter lint --strict --config .lint patterns.json → exit 0.
json.load → 7 panels, 6 templating vars.

README documents prerequisites (Prometheus + Loki datasources, the
patterndetector processor wired into service.pipelines.logs), three
install paths (manual upload, grafana-cli API, kube-prometheus-stack
ConfigMap), the pattern → panel coverage matrix, the self-tel gap, and
the lint-exclusion justification.

Refs #261

Signed-off-by: Tri Lam <tri@maydow.com>
trilamsr added a commit that referenced this pull request Jun 1, 2026
## Summary

Adds `install/kubernetes/tracecore/dashboards/patterns.json` — an
operator-facing Grafana 10+ dashboard for the three shipped pattern
verdicts (pod_evicted #14, nccl_hang #15, xid_correlation #16), with
future-proof templating on `pattern.id` for M17/M18 detectors.

Closes #280 (LogQL drift vs Loki native OTLP). Issue #270 (scalar
attribute promotion) was closed by PR #275 upstream of this rebase.

### LogQL shape: Loki native OTLP structured metadata

Verdict log records carry the verdict scalars as OTLP log-record
attributes (PR #275): `pattern.id`, `pattern.confidence`,
`k8s.pod.name`, `k8s.pod.namespace`, `k8s.node.name`,
`k8s.event.reason`, `nccl.fr.pg_id`, `nccl.fr.collective_seq_id`,
`nccl.fr.hanging_ranks_count`, `kernelevents.xid`. The Loki backend
recipe (PR #278, `docs/integrations/loki.md`) sends these to Loki
3.0+'s native OTLP endpoint (`/otlp/v1/logs`) via the bundled
`otlphttp` exporter.

Loki's OTLP receiver lands log attributes as **structured metadata**,
queryable as direct LogQL label filters — no `| json` parser stage,
no `attributes_` prefix. Dots in attribute names normalize to
underscores at the LogQL surface (per Loki upstream docs
`docs/sources/shared/otel.md` "Format considerations"):

- `pattern.id` → `pattern_id`
- `pattern.confidence` → `pattern_confidence`
- `k8s.pod.name` → `k8s_pod_name`
- `nccl.fr.pg_id` → `nccl_fr_pg_id`
- `kernelevents.xid` → `kernelevents_xid`

All six verdict panels are written against this shape (e.g.
`{job=~"$job"} | pattern_id="14" | k8s_node_name=~".+" [$__auto]`).
README documents the Loki native-OTLP install path (cross-refs PR
#278) and notes the Promtail/Alloy fallback for operators not on
native OTLP (those need `| json pattern_id="attributes_pattern_id"`
extraction; not shipped in-tree, fork the JSON).

### Self-telemetry counter still on v0.3 roadmap

The spec proposed PromQL queries against
`otelcol_processor_patterndetector_verdicts_emitted_total`. That
metric does not exist yet — the patterndetectorprocessor README says
*"No self-telemetry yet. Self-telemetry is on the v0.3 roadmap"*.
Per the spec's constraint (*"DO NOT touch detector code or processor
code"*), root-causing the missing metric is out of scope for this
PR; the six verdict-derived panels query Loki via LogQL in the
meantime. Tracked under #261. When that lands, the six LogQL panels
swap to PromQL and the Loki dependency drops.

Panel 7 (throughput) queries Prometheus against the upstream
OTel-Collector standard `otelcol_processor_{incoming,outgoing}_items`
which the collector emits for every processor automatically.

### Panels shipped (7)

| # | Title | Datasource | Patterns covered |
|---|---|---|---|
| 1 | Verdict rate by pattern_id | Loki / LogQL | 14, 15, 16 (templated)
|
| 2 | Top 10 evicted pods | Loki / LogQL | 14 (pod_evicted) |
| 3 | Top 10 hung NCCL collectives | Loki / LogQL | 15 (nccl_hang) |
| 4 | Top 10 Xid+eviction correlations | Loki / LogQL | 16
(xid_correlation) |
| 5 | Verdict count by node | Loki / LogQL | all (templated) |
| 6 | Confidence distribution (full vs partial) | Loki / LogQL | all
(templated) |
| 7 | patterndetector processor throughput | Prom / PromQL | (pipeline
liveness) |

### Templating vars (6)

- `prometheus_datasource`, `loki_datasource` — datasource selectors (no
  hardcoded UIDs).
- `job`, `instance` — linter-mandated PromQL matchers, populated from
  `otelcol_process_uptime`.
- `cluster` — multi-cluster slice, populated from
`otelcol_process_uptime`.
- `pattern_id` — custom-options var seeded with the three shipped IDs
  (14/15/16). Extend in-place when new detectors land.

### Linter exclusion (justified)

`.lint` config waives `target-promql-rule` on the six Loki panels. The
dashboard-linter parses every target as PromQL irrespective of
`target.datasource.type` and fails on the first `|` in a valid LogQL
pipeline. `target-logql-rule` still validates each query as LogQL and
passes. Full justification in
`install/kubernetes/tracecore/dashboards/README.md` §"Linter exclusions"
and inline in the `.lint` `reason:` block. Removable once issue #261
swaps the panels to PromQL.

## Test plan

- [x] `dashboard-linter lint --strict --config .lint patterns.json`
      → exit 0 (built from source — `go install` rejects the linter's
      own go.mod replace directives; build steps in
      `install/kubernetes/tracecore/dashboards/README.md`).
- [x] `python3 -c "import json;
json.load(open('install/kubernetes/tracecore/dashboards/patterns.json'))"`
      → exit 0; 7 panels, 6 templating vars confirmed.
- [x] `make check` (golangci-lint + vet + tidy-check + mod verify) →
      exit 0.
- [x] `make verify` (license-check + doc-check + register-lint +
      actionlint + zizmor + no-autoupdate) → exit 0.
- [x] LogQL shape verified against Loki upstream docs
      (`docs/sources/shared/otel.md` "Format considerations" — dots
      and special characters normalize to underscores; no
      `attributes_` prefix on the native OTLP surface) and
      `docs/sources/send-data/otel/native_otlp_vs_loki_exporter.md`
      query examples (`{service_name="auth"} | severity_text="INFO"`).
- [x] Promoted scalar attrs verified against
      `module/processor/patterndetectorprocessor/patterndetector.go`
      `VerdictAttr*` constants (lines 25-88) and `appendVerdict` /
      `appendNCCLHangVerdict` / `appendXidCorrelationVerdict` emitters
      (lines 510-517+).
- [ ] Manual smoke test against a live cluster with Loki 3.0+ native
      OTLP receiver (deferred — adversarial reviewer to verify panels
      render against actual OTLP-native Loki output before merge).

```release-notes
Ship Grafana dashboard for tracecore's pattern verdicts: install/kubernetes/tracecore/dashboards/patterns.json. Seven panels cover the three shipped pattern detectors (pod_evicted #14, nccl_hang #15, xid_correlation #16) plus templated pattern.id for future M17/M18 patterns. LogQL queries target Loki 3.0+ native OTLP structured metadata (pairs with the Loki backend recipe). Includes README install guide (manual upload, grafana-cli, kube-prometheus-stack ConfigMap), Promtail/Alloy fallback notes, and pattern coverage matrix.
```

---------

Signed-off-by: Tri Lam <tri@maydow.com>
Co-authored-by: Tri Lam <tri@maydow.com>
trilamsr added a commit that referenced this pull request Jun 1, 2026
## Summary

Closes #331. Replaces the single `pattern-pod-evicted` chaos job with a
matrix-strategy `pattern-detectors` job covering every detector named
in v1-rc1 cut-criterion §1: `hbm_ecc`, `thermal_throttle`, `pcie_aer`,
`pod_evicted`, `nccl_hang`, `xid_correlation`.

Each row runs the detector's positive + negative + schema `*_test.go`
prefix under `-race`:

- **Positive / Edge asserts** — verdict emitted with correct
  `pattern.id` within bounded time.
- **Negative / SchemaRejectsDrift asserts** — zero verdicts on
  baseline / confounder input (the false-positive gate the chaos
  matrix exists to enforce).

The `pod_evicted` row additionally runs the on-disk replay corpus under
`module/pkg/replay/pod_evicted/`; sibling rows skip that step until
their `replay/<pattern>/` fixtures land — tracked as Path A in
[`docs/v1-rc1-test-audit.md`](../blob/main/docs/v1-rc1-test-audit.md)
§4.

## Why this matrix shape, not new `failure-inject` modes

Audit §4 explicitly recommends Path A (matrix over existing
hermetic detector tests) for rc1 and defers Path B (per-pattern
`failure-inject` CLI modes) to v1.x. Path A reuses the
race-detector + golden-SHA shape already validated by
`pattern-pod-evicted` and costs ~60 LoC of workflow YAML instead of
five new CLI subcommands.

## Wall-time budget

Observed per-row runtime under `-race` (measured locally on this
worktree):

| Row | runtime |
|---|---|
| hbm_ecc | 1.35s |
| thermal_throttle | 1.33s |
| pcie_aer | 1.30s |
| pod_evicted (incl. replay corpus) | 1.35s + 1.34s |
| nccl_hang | 1.31s |
| xid_correlation | 1.32s |

Per-row `timeout-minutes: 5` is ~3× headroom; matrix parallelism
keeps total wall-time well inside the chaos-gate budget.

## Test plan

- [x] `actionlint .github/workflows/chaos.yml` — clean.
- [x] `python3 -c "import yaml;
yaml.safe_load(open('.github/workflows/chaos.yml'))"` — parses, 3 jobs
(`harness-determinism`, `cpu-steal-mpstat`, `pattern-detectors`), 6-row
matrix.
- [x] For every matrix row, `cd module && go test -race -count=1 -run
'<regex>' ./pkg/patterns/...` — all green (counts: HBMECC=16,
ThermalThrottle=17, PCIeAER=15, PodEvicted+PressureFromNote=12,
NCCLHang=10, XidCorrelation=14).
- [x] `cd module && go test -race -count=1 ./pkg/replay/...`
(pod_evicted corpus step) — green.
- [x] Pre-commit hook: golangci-lint + go vet + go mod verify +
attribute-namespace-check — all green.
- [ ] Workflow runs green on `main` after merge (verifiable only
post-merge — nightly cron + push-to-main).

```release-notes
ci: chaos.yml now matrix-tests all six shipped detectors (#14
pod_evicted, #8 nccl_hang, #3 hbm_ecc, #4 thermal_throttle, #5
pcie_aer, xid_correlation) instead of just pod_evicted. Closes the
chaos-coverage gap called out in v1-rc1-test-audit §4. No
operator-visible behavior change.
```

---------

Signed-off-by: Tri Lam <tri@maydow.com>
Co-authored-by: Tri Lam <tri@maydow.com>
trilamsr added a commit that referenced this pull request Jun 1, 2026
## Summary

Lands replay corpora for the seven pattern detectors that lacked one,
closing the Path-A test gap named in the v1-rc1 audit (#366). Each
pattern now ships `module/pkg/replay/<pattern>/{canonical,_negative,
_real_world}/` fixtures plus a `*_replay_test.go` runner that JSON-eqs
detector output against the on-disk golden.

Detectors covered: `hbm_ecc` (#3), `thermal_throttle` (#4),
`pcie_aer` (#5), `ib_link_flap` (#2), `nccl_hang` (#15), `cuda_oom`
(#10), `xid_correlation` (#16). `pod_evicted` (#14) corpus is
unchanged.

## Design

- Each detector takes a different input shape (e.g. `HBMECCRecord +
  XidRecord` for `hbm_ecc`, `ThermalThrottleRecord` for
  `thermal_throttle`, ...), so the existing `LoadFixturesUnder` helper
  (typed on `Record + NodeRecord`) cannot be reused. Each detector
  gets its own `*_replay_test.go` that inlines the per-detector JSON
  read; shared fixture-discovery and golden-assert helpers live in
  `helpers_test.go`.
- Two detectors (`nccl_hang`, `ib_link_flap`) take a `Now` reference.
  Tests pin `Now` to a fixed timestamp matching the fixture's
  `started_ns` so hang-age and flap-window inclusion stay
  deterministic across replay runs (otherwise wall-clock drift would
  silently flip the verdicts as the fixture aged).
- Goldens were generated from the live detectors via
  `UPDATE_REPLAY_GOLDEN=1 go test ./module/pkg/replay/...` and pinned;
  future drift in detector output (headline / remediation prose,
  evidence-trail UID shape, scalar-field rename) surfaces as a
  `JSONEq` diff against the fixture. Operators can also eyeball the
  golden to assert what they EXPECT vs what fires.
- Negative fixtures each exercise a distinct discriminator (wrong Xid
  code, single GPU, no AER, no eviction, completed state, single
  transition, no OOM log) so a regression in one false-positive guard
  lights up the corresponding row only.
- Flipped `run_corpus: true` on every row of the chaos.yml
  pattern-detectors matrix now that every detector has a corpus.

## Test plan

- [x] `make check` — clean
- [x] `go test -race -count=1 ./pkg/replay/...` — 35 tests pass
  (28 new + 7 pre-existing pod_evicted)
- [x] `go test -count=1 ./processor/patterndetectorprocessor/...` —
  unchanged, still green
- [x] `make verify` (pre-push) — clean
- [ ] CI: chaos.yml `pattern-detectors` matrix — 8 rows, each runs
  hermetic regex + replay-corpus step

Closes #366.

```release-notes
NONE
```

Signed-off-by: Tri Lam <tri@maydow.com>
Co-authored-by: Tri Lam <tri@maydow.com>
trilamsr added a commit that referenced this pull request Jun 4, 2026
Signed-off-by: Tri Lam <tree@lumalabs.ai>
trilamsr added a commit that referenced this pull request Jun 4, 2026
## Summary

Closes the missing-pattern-docs gap in `docs/patterns/` for the two
numerically-missing IDs (#6, #14) against NORTHSTARS Appendix A's
15-pattern roster.

## Investigation

Surveyed `module/pkg/patterns/`, NORTHSTARS Appendix A, and
MILESTONES M18/M19 to pin actual detector state per ID:

| ID | NORTHSTARS Appendix A claim | Actual repo state | Decision |
|---|---|---|---|
| #6 Stragglers (M18) | `☑ in-tree detector` | **No detector exists** —
no `straggler*.go`, no `data_time` aggregation, no replay fixture. Only
`failure-inject cpu-steal` chaos hook (M4b) lands the symptom.
MILESTONES line 237 acknowledges M18 is at-risk and build-time-coupled
to M17's `cross_rank.go`. The Appendix A mark is aspirational, not
current state. | **Option B** — README gap-note explicitly documenting
the unimplemented state + linking the M18 milestone. Reconciliation of
Appendix A's claim is left for the detector-shipping PR. |
| #14 Pod evicted (M19) | `☑ in-tree detector` | Shipped —
`patterns.PodEvictedDetector` in `module/pkg/patterns/pod_evicted.go`,
processor wiring in `module/processor/patterndetectorprocessor/`,
canonical replay fixture at `module/pkg/replay/pod_evicted/canonical/`.
| **Option A** — backfill full operator walkthrough
`14-pod-evicted-walkthrough.md` following the
`02-ib-link-flap-walkthrough.md` template (Symptom → Signal → Query →
Alert → Escalation → Replay → Detector status → Verdict shape →
Integration recipe). |

Also surfaced #15 (Image pull / `FailedMount` on restart) — present
in Appendix A but absent from this directory. Added to the same
"Reserved / unfilled" table for completeness so every Appendix A
ID is accounted for somewhere in `docs/patterns/`.

## Changes

- `docs/patterns/14-pod-evicted-walkthrough.md` (new, ~210 lines) —
  full operator walkthrough sourced from
  `module/pkg/patterns/pod_evicted.go` godoc, `verdict.go`
  `PodEvictedVerdict` struct, processor config knobs
  (`join_window`, `emit_partial_verdicts`), the
  `k8sobjectsreceiver` integration recipe, and the canonical replay
  fixture's headline regex contract.
- `docs/patterns/README.md` — adds the #14 row to the Operator
  walkthroughs (shipped) table; rewrites the coverage prose
  (5 → 7 patterns walkthrough'd); adds a new "Reserved / unfilled"
  section documenting #6 and #15 explicitly so the numeric gaps in
  the existing tables are no longer silent.

## Test plan

- [x] `make doc-check` green — all markdown links + anchors resolve,
  banned-phrase lint clean, walkthrough renders.
- [x] Pre-commit hooks pass (`golangci-lint`, `go vet`,
  `go mod verify`, `attribute-namespace-check`,
  `deprecation-check`).
- [x] Cross-checked every link in the new walkthrough resolves to a
  real on-disk file (`pod_evicted.go`, `pod_evicted_test.go`,
  `pod_evicted_bench_test.go`, `replay/pod_evicted/canonical/`,
  `k8sobjects-events.md`, `ATTRIBUTES.md`,
  `example_config.yaml`).

```release-notes
docs(patterns): add operator walkthrough for pattern #14 (pod evicted); document pattern #6 + #15 gaps in the patterns README.
```

---------

Signed-off-by: Tri Lam <tree@lumalabs.ai>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant