refactor(patterndetector): move per-pattern projectors out of patterndetector.go #375

Description

@trilamsr

Context

Post-wave-audit finding #1 (see docs/v1-rc1-post-wave-audit.md).

module/processor/patterndetectorprocessor/patterndetector.go is 850+ lines and still hosts the nine surviving projector functions for the pre-wave patterns:

  • projectThermalThrottleRecord (line 452)
  • projectXidRecord (line 499)
  • projectHBMECCRecord (line 547)
  • projectPCIeAERRecord (line 604)
  • projectPCIeIORecord (line 648)
  • projectIBPortStateRecord (line 708)
  • projectNCCLFRRecord (line 761)
  • projectPodEvent (line 803)
  • projectObjectRef (line 849)

Post-wave detectors (checkpointer_hang, dataloader_hang, nccl_bootstrap, cuda_oom, silent_data_corruption) have moved their projectors into their sibling files. The pre-wave detectors have not, so the package layout is inconsistent.

Fix

Move each pattern-specific projector into its sibling file (thermal_throttle.go, xid_correlation.go, hbm_ecc.go, pcie_aer.go, ib_link_flap.go, nccl_hang.go, pod_evicted.go). Leave the orchestration loop + shared helpers in patterndetector.go.
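A minimal sketch of the intended split, shown as one file with comments marking where each part would live. The function names projectThermalThrottleRecord and projectXidRecord come from the list above; the record type, the projector signature, the registry map, and the attribute keys are all assumptions made for illustration, not the real code.

```go
// package main is used only so the sketch compiles in a scratch file;
// the real code lives in the patterndetectorprocessor package.
package main

// --- would stay in patterndetector.go: shared types and orchestration ---

// record is a hypothetical stand-in for whatever the processor consumes.
type record struct {
	Pattern string
	Attrs   map[string]string
}

// projector is an assumed signature: raw record in, projected view out.
type projector func(record) (string, bool)

// projectors is a hypothetical registry. The orchestration code only
// refers to projectors by name; their bodies live in sibling files.
var projectors = map[string]projector{
	"thermal_throttle": projectThermalThrottleRecord,
	"xid_correlation":  projectXidRecord,
}

// project dispatches to the per-pattern projector and holds no
// pattern-specific logic itself.
func project(r record) (string, bool) {
	p, ok := projectors[r.Pattern]
	if !ok {
		return "", false
	}
	return p(r)
}

// --- would move to thermal_throttle.go ---

func projectThermalThrottleRecord(r record) (string, bool) {
	gpu, ok := r.Attrs["gpu"] // assumed attribute key
	if !ok {
		return "", false
	}
	return "thermal_throttle gpu=" + gpu, true
}

// --- would move to xid_correlation.go ---

func projectXidRecord(r record) (string, bool) {
	xid, ok := r.Attrs["xid"] // assumed attribute key
	if !ok {
		return "", false
	}
	return "xid_correlation xid=" + xid, true
}
```

Because all files share one Go package, moving a function between them needs no import or visibility changes, which is what keeps this a no-behavior-change refactor.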

Pairs well with #2 (consolidate shared projectors into projectors_shared.go) and #3 (hoist k8s scope helper).

Acceptance

  • patterndetector.go drops to ≤ ~300 lines (orchestration + shared-projector calls only).
  • Each pattern's projector lives next to its detector glue.
  • No behavior change; existing tests pass.
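The line-count criterion could be checked mechanically if that is useful. The sketch below is one possible guard, not something the issue asks for: countLines and withinBudget are hypothetical helpers, and the 300 limit simply mirrors the approximate target above.

```go
// package main is used only so the sketch compiles in a scratch file.
package main

import (
	"bufio"
	"strings"
)

// maxLines mirrors the "≤ ~300 lines" target; the exact cutoff is a
// judgment call, since the issue states it only approximately.
const maxLines = 300

// countLines returns the number of lines in src. A final line without a
// trailing newline still counts. Scanner errors (for example, a single
// line over the default 64 KiB token limit) are ignored in this sketch.
func countLines(src string) int {
	sc := bufio.NewScanner(strings.NewReader(src))
	n := 0
	for sc.Scan() {
		n++
	}
	return n
}

// withinBudget reports whether src fits in limit lines.
func withinBudget(src string, limit int) bool {
	return countLines(src) <= limit
}
```

In a real test the source would come from reading patterndetector.go with os.ReadFile and passing string(data) to withinBudget with maxLines.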

Discipline

Bias toward deletion. Ask: "is there a smaller way?" If a projector can collapse into the shared helper from #3, do that instead of moving it.
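As an illustration of collapsing rather than moving, the sketch below assumes two projectors that differ only in a label and otherwise read the same Kubernetes scope fields. Everything here is hypothetical: scopeFrom stands in for the helper from #3 (whose real shape is not given in this issue), projectFoo and projectBar are placeholder names, and the attribute keys are assumed.

```go
// package main is used only so the sketch compiles in a scratch file.
package main

// record is a hypothetical stand-in for the processor's input type.
type record struct {
	Attrs map[string]string
}

// scopeFrom is a hypothetical shared helper that reads the fields
// several projectors would otherwise each extract on their own.
func scopeFrom(r record) (string, string, bool) {
	node, okNode := r.Attrs["k8s.node.name"] // assumed key
	pod, okPod := r.Attrs["k8s.pod.name"]    // assumed key
	return node, pod, okNode && okPod
}

// projectWithScope builds a projector from the only part that differs
// between patterns in this sketch: the label.
func projectWithScope(label string) func(record) (string, bool) {
	return func(r record) (string, bool) {
		node, pod, ok := scopeFrom(r)
		if !ok {
			return "", false
		}
		return label + " node=" + node + " pod=" + pod, true
	}
}

// Two formerly separate functions become one-line declarations, so
// there is nothing left to move into a sibling file.
var (
	projectFoo = projectWithScope("foo")
	projectBar = projectWithScope("bar")
)
```

Whether any of the nine real projectors are similar enough for this is something only a read of their bodies can settle; where they are not, moving them as-is remains the fallback.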

Labels

  • enhancement: New feature or request
  • post-wave-audit: Filed by docs/v1-rc1-post-wave-audit.md (2026-06-01)
