
docs(patterns): correlation-window semantics rationale (#367) #388

Closed
trilamsr wants to merge 1 commit into main from docs/correlation-window-audit-367

trilamsr commented Jun 1, 2026

Summary

The patterndetector ships 11 detectors with 14 time-bounded knobs. The join shape varies across patterns, and the rationale lived only in code comments and PR review threads, so operators tuning windows had to read the source of each detector. The PR #346 reviewer flagged this on three named detectors; the actual scope is wider.

Root-cause finding

Five distinct join shapes exist across the 11 detectors. They are load-bearing rather than bugs; each shape matches the causal physics of its signal:

Decision

Document the existing reality; do not converge. The original suggestion (a unified asymmetric two-knob form) was explored and rejected: it would force forward_window=0s on the one-sided detectors (a footgun when clocks are skewed across nodes) and would not apply to #13 at all (its bounds come from the eval record itself, not from an operator knob).

Changes

  • docs/patterns/README.md — new "Correlation-window semantics" section covering all 11 detectors with the 5-shape vocabulary, per-detector predicate / anchor / default / rationale table, and a "why not converge" explanation.
  • docs/patterns/07-dataloader-hang.md — "Why this correlation shape" subsection (one-sided lookback rationale, predicate location, default).
  • docs/patterns/11-checkpointer-hang.md — "Why this correlation shape" subsection (asymmetric two-sided rationale, both legs explained).
  • docs/patterns/13-silent-data-corruption.md — "Why this correlation shape" subsection (job-bounded rationale, why no *_window knob).

The README table cross-links into the three per-pattern sections via in-page anchors.
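The in-page anchors depend on how GitHub turns a heading into a slug. The Go sketch below approximates that rule (lowercase, drop punctuation other than hyphens and underscores, replace each space with a hyphen) and checks an anchor against a list of headings. slugify and anchorResolves are hypothetical helpers, and GitHub's suffixes for duplicate headings (-1, -2 and so on) are deliberately not modeled.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// slugify approximates GitHub's heading-anchor rule: lowercase, keep letters,
// digits, '-' and '_', turn each space into '-', drop everything else.
// Duplicate-heading suffixes are not handled.
func slugify(heading string) string {
	var b strings.Builder
	for _, r := range strings.ToLower(strings.TrimSpace(heading)) {
		switch {
		case r == ' ':
			b.WriteRune('-')
		case r == '-' || r == '_' || unicode.IsLetter(r) || unicode.IsDigit(r):
			b.WriteRune(r)
		}
	}
	return b.String()
}

// anchorResolves reports whether "#anchor" matches the slug of any heading.
func anchorResolves(anchor string, headings []string) bool {
	target := strings.TrimPrefix(anchor, "#")
	for _, h := range headings {
		if slugify(h) == target {
			return true
		}
	}
	return false
}

func main() {
	headings := []string{"Correlation-window semantics", "Why this correlation shape"}
	fmt.Println(slugify(headings[1]))                                       // why-this-correlation-shape
	fmt.Println(anchorResolves("#correlation-window-semantics", headings)) // true
}
```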

Test plan

  • make check clean (gofumpt, golangci-lint 0 issues, go vet, go mod verify, attribute-namespace-check 100/100).
  • Source-of-truth claims verified by direct grep: cited predicate locations (module/pkg/patterns/*.go line numbers) checked, defaults match module/processor/patterndetectorprocessor/config.go, and bound inclusivity (> for correlation, < for freshness) confirmed by GOWORK=off go test ./module/pkg/patterns/... -run Window, which is green on main.
  • All cross-links resolve (in-page anchors validated by header slug match).
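The bound-inclusivity note in the test plan is terse. One reading, sketched below in Go under that assumption, is that the correlation check rejects a pair only when the gap is strictly greater than the window (so a gap exactly equal to the window still correlates), while the freshness check accepts only ages strictly less than the limit (so an age exactly at the limit is already stale). correlated and fresh are hypothetical helpers, not the detectors' actual functions.

```go
package main

import (
	"fmt"
	"time"
)

// correlated is a hypothetical helper: it rejects a pair only when the gap
// strictly exceeds the window (the "gap > window" check), so a gap exactly
// equal to the window still correlates.
func correlated(gap, window time.Duration) bool {
	return gap <= window
}

// fresh is a hypothetical helper: a sample counts as fresh only while
// age < maxAge, so an age exactly equal to maxAge is already stale.
func fresh(age, maxAge time.Duration) bool {
	return age < maxAge
}

func main() {
	limit := 30 * time.Second
	fmt.Println(correlated(limit, limit)) // true: the boundary is inside the correlation window
	fmt.Println(fresh(limit, limit))      // false: the boundary is already stale
}
```

Under this reading, the two checks deliberately disagree at the exact boundary, which is why a Window-filtered test run is a useful guard.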

No code changes; no detector behavior changes.

Closes #367.

NONE

Three v1 detectors use three different correlation-window shapes:
pattern #7 (dataloader_hang) uses a one-sided look-back, pattern
#11 (checkpointer_hang) uses asymmetric back/forward legs, and
pattern #13 (silent_data_corruption) uses a symmetric job window.
The shapes were chosen independently, and the rationale lived only
in code comments and review threads.

Decision: (A) document the existing reality, do not converge.
Each shape matches a distinct physical event-ordering: strict
cause→symptom for #7, bidirectional log/threshold race for #11,
order-free job-scoped attribution for #13. A unified asymmetric
two-knob form (issue suggestion) would silently zero one leg for
#7 and would not apply to #13 at all; operators would tune knobs
that have no physical meaning for their pattern.

Adds a "Why this correlation shape" subsection to each pattern doc
and a "Correlation-window semantics" comparison table to
docs/patterns/README.md cross-linking the three rationales. No
code changes; no detector behavior changes.

Closes #367.

Signed-off-by: Tri Lam <tri@maydow.com>

trilamsr commented Jun 1, 2026

Empty diff: the content was discarded during worktree-collision recovery. PR #392 bundles the same #367-adjacent work; #367 will be reopened fresh as a separate PR.

Development

Successfully merging this pull request may close these issues.

[audit] cross-detector correlation-window semantics consistency