Skip to content

refactor(patterns): introduce Detector registry seam#520

Merged
trilamsr merged 2 commits into
mainfrom
refactor/pattern-registry-seam
Jun 4, 2026
Merged

refactor(patterns): introduce Detector registry seam#520
trilamsr merged 2 commits into
mainfrom
refactor/pattern-registry-seam

Conversation

@trilamsr

@trilamsr trilamsr commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

Summary

Wave-end audit flagged the patterndetectorprocessor fanout site as an unmet refactor: ConsumeLogs hand-rolled dispatch for every shipped detector (12 today: 7 inline + 5 wrapped), so adding pattern #13 required editing the fanout body — not registering a new entry. Past the rule-of-three by 9x.

This PR introduces a minimal Detector registry seam:

  • module/pkg/patterns/detector.go: new Detector interface (PatternID() string) + Registered slice that pins all 12 detector pointers. Each *Detector struct gets a one-line PatternID() method.
  • module/pkg/patterns/detector_test.go: TestRegistered_PinsAllPatterns (exact PatternID set + count), TestRegistered_UniquePatternIDs, TestRegistered_NonEmptyPatternIDs. Drift gate — accidental drops fail in CI.
  • patterndetector.go: introduces detectorRunners []detectorRunner closure list iterated by ConsumeLogs. ConsumeLogs body drops from ~77 lines to 12. Adding pattern [general] M1.6 hardening: lint + coverage + panic recovery + pipelinebuilder #13 = one append to Registered + one append to detectorRunners, no fanout-site edit.

Design decision: metadata-only interface

The Detector interface is intentionally PatternID() string only — not a uniform Evaluate method. Each detector's Evaluate signature is intrinsically heterogeneous (different input record shapes — events+nodeConds, ncclRecs, xidRecs+events, etc. — and different verdict types). A uniform Evaluate would force a lossy any-typed contract that the typed test suite has been fighting for 12 patterns. The closure-per-detector approach keeps the typed Evaluate calls at their concrete-runner sites while letting the registry pin identity + iteration.

Behavior preservation

  • Same telemetry vocabulary: PodEvicted and IBLinkFlap still IncVerdict with string(v.Confidence) (they gate on partial); the other 5 inline detectors still pass "". The 5 wrapped runners still tick inside their own helpers (unchanged).
  • Same emission order: detectorRunners is declared in the legacy emission order.
  • Same partial-confidence gating: emitPodEvicted and emitIBLinkFlap preserve the !emitPartial skip.

Test plan

  • cd module && go build ./... clean
  • cd module && go test ./pkg/patterns/ green (incl. 3 new pin tests)
  • cd module && go test ./processor/patterndetectorprocessor/ green except pre-existing test: synthetic-2026-06-multi-rank-disk-pressure fixture mis-labelled as negative #497 (TestPatternDetector_NegativeFixturesEmitNoVerdicts/synthetic-2026-06-multi-rank-disk-pressure, fixed in Lane J)
  • make lint clean (0 issues)
  • make vet, go mod verify, attribute-namespace-check all green (pre-push hook)

LoC delta

  • +321 / -79 across 3 files.
  • ConsumeLogs body: 77 → 12 lines.
  • Growth is in: registry plumbing (164 lines, mostly comments + the pin tests), runner closures (one per detector). The seam earns its bytes — adding pattern #N is now O(append) instead of O(edit-fanout).

Closes-the-loop

Closes wave-end-audit next-wave item #2 (pattern registry seam).

- refactor(patterns): introduce `patterns.Detector` interface + `patterns.Registered` slice. The patterndetectorprocessor now iterates a registry-driven runner list instead of hand-rolled fanout — adding a new pattern is one append, not a processor edit. Behavior-preserving; no operator-facing change.

Wave-end audit flagged the patterndetectorprocessor fanout site as an
unmet refactor: ConsumeLogs hand-rolled the dispatch for every shipped
detector (12 today, 7 inline + 5 wrapped), so adding pattern #13
required editing the fanout, not registering a new entry.

This change introduces a minimal Detector interface + Registered slice
in module/pkg/patterns/ that pins the full set, plus a detectorRunners
closure list in the processor that ConsumeLogs iterates. ConsumeLogs
drops from ~77 lines to 12. Adding pattern #13 = one append to each
list, drift-pinned by TestRegistered_PinsAllPatterns.

Detector contract is deliberately metadata-only (PatternID() string) -
each detector's Evaluate signature is heterogeneous (different record
shapes, different verdict types), so a uniform Evaluate method on the
interface would force a lossy any-typed contract.

Behavior preserved: same telemetry vocabulary, same emission order,
same partial-confidence gating. Only pre-existing #497 failure
(synthetic-2026-06-multi-rank-disk-pressure, fixed in Lane J) remains.

Closes wave-end-audit next-wave item: pattern registry seam.

Signed-off-by: Tri Lam <tree@lumalabs.ai>
@trilamsr trilamsr enabled auto-merge (squash) June 4, 2026 01:07
Signed-off-by: Tri Lam <tree@lumalabs.ai>
@trilamsr trilamsr merged commit 3b56d5e into main Jun 4, 2026
27 checks passed
@trilamsr trilamsr deleted the refactor/pattern-registry-seam branch June 4, 2026 01:21
trilamsr added a commit that referenced this pull request Jun 4, 2026
## Summary

Finishes the `appendVerdict[V verdictAttrer]` consolidation that Lane M
(#508 / merged as #520) started. Before this change, only 7 of 12
verdict types satisfied the `verdictAttrer` interface — the other 5
(cuda_oom, checkpointer_hang, dataloader_hang, nccl_bootstrap,
silent_data_corruption) bypassed the generic emitter and called the
lower-level `appendVerdictRecord` directly with hand-built
`verdictCommon{...}` literals + per-pattern stamp closures.

## Root cause

The half-refactor wasn't an interface limitation — it was an unfinished
migration. The 5 verdict types simply never got `Common()` +
`PutAttrs()` methods written for them, and the per-pattern attribute-key
constants stayed in the consumer (`patterndetectorprocessor`) package
instead of moving to the producer (`patterns`) package alongside the
keys for the other 7 detectors. The result was three-tier indirection
(runner → `appendXxxVerdict` wrapper → `appendVerdictRecord`) for half
the detectors and one-tier (runner → `appendVerdict`) for the other half
— inconsistent enough that adding a new pattern required a coin-flip on
which template to follow.

## What changed

- Adds `Common()` + `PutAttrs(pcommon.Map)` methods to the 5 remaining
verdict types in `module/pkg/patterns/`.
- Moves the per-pattern attribute-key constants into the canonical block
in `module/pkg/patterns/verdict.go` next to the keys for the other 7 —
wire-format vocabulary is now entirely owned by the producer.
- Replaces the 5 inline `appendXxxVerdict` wrappers with
`appendVerdict[V]` call-site uses.
- Inlines `appendVerdictRecord` into `appendVerdict` (single caller
after the consolidation).
- Drops the now-dead `verdictCommon` type alias, `putStrIfSet` helper,
and orphaned `verdictAttrConfidence` / `verdictAttrPCIeAERGPUID`
constants from the processor.

All 12 detector emit paths now route through one generic emitter.

## Wire-format preservation

Verdict wire-format is preserved bit-for-bit: `PutAttrs` methods stamp
the same keys with the same values in the same order; concrete-type
`json.Marshal` still honors per-verdict json tags (because `V` is
concrete at the generic call site); the diagnostic kind label flows from
`VerdictCommon.Kind` set by each verdict's `Common()`.

## LoC delta

+266 / -369 (-103 net) across 12 files. Net deletion because 5 wrapper
functions + their per-pattern key-constant blocks collapsed into
method-based stamping with shared constants.

## Closes-the-loop

Wave-end-audit next-wave item #6 (full `appendVerdict` consolidation).

## Test plan

- [x] `cd module && go test ./...` green (full module, including
pattern-detector golden + verdict-attribute tests that pin the wire
format).
- [x] `make lint` green (0 issues; golangci-lint catches dead constants
+ unused helpers).
- [x] `go vet ./...` clean.
- [x] Manual diff sweep confirms wire-format equivalence: each
`PutAttrs` method stamps the same key/value/order as the previous inline
closure.

```release-notes
NONE
```

Signed-off-by: Tri Lam <tri@maydow.com>
Co-authored-by: Tri Lam <tri@maydow.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant