chore(config): document pattern-prefixed knob naming convention (v1.0-rc1 cosmetic / v2.0 nest) #379

Description

@trilamsr

Context

Post-wave-audit finding #5 (see docs/v1-rc1-post-wave-audit.md).

module/processor/patterndetectorprocessor/config.go (540 lines) carries three knob-naming styles:

  • Bare (pre-wave): JoinWindow, NCCLHangThreshold, XidCorrelationWindow, HBMECCWindow, HBMECCDeltaThreshold, ThermalThrottleWindow, PCIeAERWindow, IBLinkFlapWindow, CUDAOOMCorrelationWindow
  • Prefixed (post-wave): CheckpointerHangBackwardWindow, CheckpointerHangForwardWindow, DataLoaderHangStallThreshold, NCCLBootstrapDeadline, NCCLBootstrapCorrelationWindow
  • Inconsistent flat (pre-wave): EmitPartialVerdicts, JoinWindow (top-level)

The prefixed form is the right shape (it avoids collisions on *_window and *_threshold), but renaming bare names to prefixed ones for v1.0-rc1 would break every existing values.yaml.

Options

  1. Pre-RC1 (cosmetic only): leave the existing names alone. Document the convention in the config.go top-of-file comment so future detectors follow the prefixed shape.
  2. v2.0: nest knobs into per-pattern blocks such as nccl_hang: { threshold: ... }. Cleaner, but requires a migration helper.
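
A rough sketch of what the option 2 migration helper could look like, operating on an already-decoded values map. Everything here is an assumption for illustration: the function name MigrateFlat, the snake_case key spellings (nccl_hang_threshold and friends), the prefix list, and the duration values are hypothetical and are not taken from config.go.

```go
package main

import (
	"fmt"
	"strings"
)

// patternPrefixes is a hypothetical list of per-pattern block names.
// No entry may be a prefix of another; otherwise matching order would matter.
var patternPrefixes = []string{
	"checkpointer_hang",
	"data_loader_hang",
	"nccl_bootstrap",
	"nccl_hang",
}

// splitKnob splits "<pattern>_<knob>" into its two parts when the key
// starts with a known pattern prefix followed by a non-empty knob name.
func splitKnob(key string) (prefix, knob string, ok bool) {
	for _, p := range patternPrefixes {
		if strings.HasPrefix(key, p+"_") && len(key) > len(p)+1 {
			return p, strings.TrimPrefix(key, p+"_"), true
		}
	}
	return "", "", false
}

// MigrateFlat moves flat "<pattern>_<knob>" keys into nested
// "<pattern>: {<knob>: ...}" blocks. Keys that match no known prefix
// (for example join_window) are copied through unchanged.
// It assumes the input is purely flat, i.e. not already partly nested.
func MigrateFlat(flat map[string]any) map[string]any {
	out := make(map[string]any, len(flat))
	for key, val := range flat {
		prefix, knob, ok := splitKnob(key)
		if !ok {
			out[key] = val
			continue
		}
		block, _ := out[prefix].(map[string]any)
		if block == nil {
			block = map[string]any{}
			out[prefix] = block
		}
		block[knob] = val
	}
	return out
}

func main() {
	// Placeholder values; real durations would come from values.yaml.
	flat := map[string]any{
		"join_window":                       "30s",
		"nccl_hang_threshold":               "120s",
		"nccl_bootstrap_deadline":           "60s",
		"nccl_bootstrap_correlation_window": "10s",
	}
	fmt.Println(MigrateFlat(flat))
}
```

A helper along these lines would let v2.0 accept old flat values.yaml files for a deprecation window while emitting the nested form.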

Recommendation

Take option 1 for v1.0-rc1. File this issue against the v2.0 milestone.

Acceptance

  • config.go top-of-file comment: "New pattern knobs MUST use <pattern>_<knob> prefix; bare names exist for backward-compat with pre-v0.4 values.yaml."
  • Lint check (optional): make ci flags new struct fields without a recognized prefix.
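
One way the optional lint check could work is a small go/ast pass over config.go, sketched below. The allowlist reuses the bare names listed in this issue; the struct name Config, the function name UnprefixedFields, the recognized-prefix list, and the RetryWindow field in the sample are assumptions for illustration, and wiring this into make ci is left out.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

// recognizedPrefixes is a hypothetical list of pattern prefixes.
var recognizedPrefixes = []string{"CheckpointerHang", "DataLoaderHang", "NCCLBootstrap"}

// legacyBare holds the pre-wave names that stay as-is for backward compatibility.
var legacyBare = map[string]bool{
	"JoinWindow": true, "EmitPartialVerdicts": true,
	"NCCLHangThreshold": true, "XidCorrelationWindow": true,
	"HBMECCWindow": true, "HBMECCDeltaThreshold": true,
	"ThermalThrottleWindow": true, "PCIeAERWindow": true,
	"IBLinkFlapWindow": true, "CUDAOOMCorrelationWindow": true,
}

// hasRecognizedPrefix reports whether name is "<Prefix><Knob>" with a non-empty knob.
func hasRecognizedPrefix(name string) bool {
	for _, p := range recognizedPrefixes {
		if strings.HasPrefix(name, p) && len(name) > len(p) {
			return true
		}
	}
	return false
}

// UnprefixedFields parses Go source and returns the named fields of the given
// struct that neither carry a recognized prefix nor appear in the legacy
// allowlist. Embedded (unnamed) fields are skipped.
func UnprefixedFields(src, structName string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "config.go", src, 0)
	if err != nil {
		return nil, err
	}
	var bad []string
	ast.Inspect(file, func(n ast.Node) bool {
		ts, ok := n.(*ast.TypeSpec)
		if !ok || ts.Name.Name != structName {
			return true // keep walking until the target type is found
		}
		st, ok := ts.Type.(*ast.StructType)
		if !ok {
			return false
		}
		for _, field := range st.Fields.List {
			for _, name := range field.Names {
				if !legacyBare[name.Name] && !hasRecognizedPrefix(name.Name) {
					bad = append(bad, name.Name)
				}
			}
		}
		return false
	})
	return bad, nil
}

func main() {
	// Sample source standing in for config.go; RetryWindow is a made-up offender.
	src := `package patterndetectorprocessor

import "time"

type Config struct {
	JoinWindow            time.Duration
	NCCLBootstrapDeadline time.Duration
	RetryWindow           time.Duration
}
`
	bad, err := UnprefixedFields(src, "Config")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, name := range bad {
		fmt.Printf("field %s lacks a recognized pattern prefix\n", name)
	}
}
```

Keeping the legacy names in an explicit allowlist means the check only fires on newly added fields, which matches the "new pattern knobs" wording in the proposed comment.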

Metadata

    Labels

    documentation (Improvements or additions to documentation)
    post-wave-audit (Filed by docs/v1-rc1-post-wave-audit.md (2026-06-01))
