Context
Post-wave-audit finding #5 (see docs/v1-rc1-post-wave-audit.md).
module/processor/patterndetectorprocessor/config.go (540 lines) carries three knob-naming styles:
- Bare (pre-wave): JoinWindow, NCCLHangThreshold, XidCorrelationWindow, HBMECCWindow, HBMECCDeltaThreshold, ThermalThrottleWindow, PCIeAERWindow, IBLinkFlapWindow, CUDAOOMCorrelationWindow
- Prefixed (post-wave): CheckpointerHangBackwardWindow, CheckpointerHangForwardWindow, DataLoaderHangStallThreshold, NCCLBootstrapDeadline, NCCLBootstrapCorrelationWindow
- Inconsistent flat (pre-wave): EmitPartialVerdicts, JoinWindow (top-level)
The prefixed form is the right shape (no collisions on *_window, *_threshold), but renaming bare→prefixed for v1.0-rc1 would break every existing values.yaml.
Options
- Option 1, pre-RC1 (cosmetic only): leave the names alone, and document the convention in the config.go top-of-file comment so future detectors follow the prefixed shape.
- Option 2, v2.0: nest knobs into per-pattern blocks such as nccl_hang: { threshold: ... }. Cleaner, but requires a migration helper.
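A migration helper for option 2 might look roughly like the sketch below. It is hypothetical throughout: the flat YAML key spellings (snake_case forms of the Go field names), the nested block and knob names, and the migrateFlatToNested function are illustrative assumptions, not existing code. It uses an explicit mapping table rather than splitting keys on the first underscore, because names such as hbm_ecc_window or nccl_bootstrap_correlation_window would otherwise be split in the wrong place.

```go
// Package main sketches a hypothetical v2.0 migration helper that moves
// flat v1.x knobs into per-pattern blocks such as nccl_hang: { threshold: ... }.
package main

import (
	"fmt"
	"sort"
)

// nestedKey is the v2.0 location of a knob: block name plus knob name.
type nestedKey struct{ block, knob string }

// legacyKeys maps flat v1.x YAML keys to their nested v2.0 location.
// ASSUMPTION: key spellings are illustrative; the real ones come from the
// struct tags in config.go.
var legacyKeys = map[string]nestedKey{
	"nccl_hang_threshold":               {"nccl_hang", "threshold"},
	"xid_correlation_window":            {"xid", "correlation_window"},
	"hbm_ecc_window":                    {"hbm_ecc", "window"},
	"hbm_ecc_delta_threshold":           {"hbm_ecc", "delta_threshold"},
	"checkpointer_hang_backward_window": {"checkpointer_hang", "backward_window"},
	"checkpointer_hang_forward_window":  {"checkpointer_hang", "forward_window"},
}

// migrateFlatToNested rewrites recognized flat keys into per-pattern blocks.
// Unrecognized keys are copied through unchanged and returned (sorted) so the
// caller can warn about them instead of silently dropping them.
func migrateFlatToNested(flat map[string]any) (map[string]any, []string, error) {
	out := make(map[string]any, len(flat))
	var passthrough []string
	for k, v := range flat {
		dest, known := legacyKeys[k]
		if !known {
			// A top-level key must not clash with a block created by migration.
			if _, taken := out[k]; taken {
				return nil, nil, fmt.Errorf("top-level key %q collides with a migrated block", k)
			}
			out[k] = v
			passthrough = append(passthrough, k)
			continue
		}
		existing, exists := out[dest.block]
		if !exists {
			existing = map[string]any{}
			out[dest.block] = existing
		}
		block, isBlock := existing.(map[string]any)
		if !isBlock {
			return nil, nil, fmt.Errorf("key %q: %q is already a non-block value", k, dest.block)
		}
		block[dest.knob] = v
	}
	sort.Strings(passthrough)
	return out, passthrough, nil
}

func main() {
	flat := map[string]any{
		"nccl_hang_threshold":     "5m",
		"hbm_ecc_window":          "10m",
		"hbm_ecc_delta_threshold": 3,
		"emit_partial_verdicts":   true,
	}
	nested, passthrough, err := migrateFlatToNested(flat)
	if err != nil {
		fmt.Println("migration failed:", err)
		return
	}
	fmt.Println(nested)
	fmt.Println("left at top level:", passthrough)
}
```

Reporting untouched keys keeps genuinely top-level knobs such as EmitPartialVerdicts visible during migration rather than guessing a block for them.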
Recommendation
Take option 1 for v1.0-rc1, and file this issue against the v2.0 milestone.
Acceptance
- config.go top-of-file comment: "New pattern knobs MUST use <pattern>_<knob> prefix; bare names exist for backward-compat with pre-v0.4 values.yaml."
- Lint check (optional): make ci flags new struct fields without a recognized prefix.