Skip to content

chore: cleanup boilerplate doc.go + factory.go + unexport VerdictAttr#310

Merged
trilamsr merged 2 commits into
mainfrom
chore/arch-cleanup-ship-now
Jun 1, 2026
Merged

chore: cleanup boilerplate doc.go + factory.go + unexport VerdictAttr#310
trilamsr merged 2 commits into
mainfrom
chore/arch-cleanup-ship-now

Conversation

@trilamsr

@trilamsr trilamsr commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

Summary

Mechanical cleanup of three accumulated boilerplate patterns surfaced by an arch audit of module/. All changes are no-op for tests and the OCB-built binary; net -47 LOC across 20 files.

  • Unexport patterndetectorprocessor.VerdictAttr* (13 consts). A repo-wide grep confirmed zero consumers outside the package; the consts only existed for in-package + in-package-test use. Renamed VerdictAttr*verdictAttr* across 8 .go files. Shrinks the package API by 13 symbols with no behavior change.
  • Drop the duplicated rankjoinprocessor.HintPodEvicted const + stale "import cycle" comment. go list -deps ./processor/rankjoinprocessor/... showed no patterns in the dep graph today, so the dodge was no longer paying for itself. rankjoin.go and rankjoin_test.go now reference patterns.HintPodEvicted directly.
  • Delete 5 boilerplate doc.go files; move each package comment inline to an existing file. module/doc.go deliberately skipped — it is load-bearing for the Go module proxy (the genesis tag module/v0.0.1 needs at least one .go file at the module root for proxies to validate the module).
  • Merge factory.go into the main package file for both patterndetectorprocessor and rankjoinprocessor. Factory funcs now sit at the top of patterndetector.go / rankjoin.go, immediately after imports — one file per package instead of two with circular cross-reads to follow NewFactory → newProcessor.

Root cause

This is a refactor of cumulative boilerplate, not a bug fix:

  1. Early-pivot pattern-detector PRs each landed a public VerdictAttr* const set anticipating cross-package consumers that never materialized.
  2. The rankjoinprocessor HintPodEvicted const was duplicated from patterns.HintPodEvicted to dodge an import cycle that the package graph no longer has.
  3. The Go convention of doc.go for the package comment + factory.go for the OCB factory both fragmented packages that are otherwise 1–2 files. The package comments cost a whole file each; the factory funcs cost a circular cross-read.

All three trace to a "PR-shaped boilerplate" reflex during the v0.1.x → v0.2.x velocity push. No external blocker; the fix is in this PR.

Test plan

  • go vet ./... (root) — clean
  • go vet ./... (module/) — clean
  • go build ./... (root) — clean
  • go build ./... (module/) — clean
  • go test ./... -count=1 -short (root) — all packages PASS (12 packages)
  • go test ./... -count=1 -short (module/) — all packages PASS (6 packages)
  • Pre-commit hook: golangci-lint (0 issues) + go vet + go mod verify (all modules verified)
  • CI: full lint + race tests on PR
NONE

Mechanical cleanup of three accumulated boilerplate patterns surfaced
by an arch audit. All changes are no-op for tests + binary; net -47 LOC.

- Unexport patterndetectorprocessor.VerdictAttr* (13 consts) — zero
  consumers outside the package (verified by grep across the repo).
  Renames carry across 8 .go files in the package (patterndetector.go
  + 7 _test.go). +0/-0 behavior; -13 exports from the package API.

- Drop the duplicated rankjoinprocessor.HintPodEvicted const + its
  stale "import cycle" comment. Verified no cycle via
  `go list -deps ./processor/rankjoinprocessor/...` (no `patterns`
  in dep graph today). rankjoin.go + rankjoin_test.go now reference
  patterns.HintPodEvicted directly. config.go -6, rankjoin.go +1 net.

- Delete 5 boilerplate doc.go files; move package comments inline
  to an existing file in each dir. module/doc.go skipped — it is
  load-bearing for the Go module proxy (genesis tag resolution).
  - module/pkg/patterns/doc.go        -> verdict.go      (-17/+14)
  - module/processor/patterndetectorprocessor/doc.go
                                      -> patterndetector.go header
  - module/processor/rankjoinprocessor/doc.go
                                      -> rankjoin.go header
  - module/receiver/ncclfrreceiver/doc.go
                                      -> nccl_fr.go header (-13/+10)
  - internal/integration/doc.go       -> already duplicated at top
                                         of ocb_scrape_test.go; just
                                         delete (-7)

- Merge factory.go into the main package file (factory funcs land
  at the top, immediately after imports, before instrumentationScope).
  - module/processor/patterndetectorprocessor/factory.go
                                      -> patterndetector.go (-42)
  - module/processor/rankjoinprocessor/factory.go
                                      -> rankjoin.go (-48)

Verification (mandatory): both modules vet + build + test green
under `go vet ./... && go build ./... && go test ./... -count=1 -short`.

Signed-off-by: Tri Lam <tri@maydow.com>
@trilamsr trilamsr enabled auto-merge (squash) June 1, 2026 06:29
Resolve conflicts with #309 verdict-writer consolidation by applying
the VerdictAttr* -> verdictAttr* unexport rename to main's new code.

Signed-off-by: Tri Lam <tri@maydow.com>
@trilamsr trilamsr merged commit 63e4098 into main Jun 1, 2026
17 checks passed
@trilamsr trilamsr deleted the chore/arch-cleanup-ship-now branch June 1, 2026 06:57
trilamsr pushed a commit that referenced this pull request Jun 1, 2026
Resolve 5 conflicts post-PR #310 / #312 / #313:
- factory.go deleted on main (merged into patterndetector.go);
  port wave's selftel wiring (#261) into the merged createLogs
- VerdictAttr* unexported per #310; rename 16 wave-added consts
  + all callers across cuda_oom + ib_link_flap + pcie_aer tests
- docs/{MILESTONES,FOLLOWUPS,patterns/README}.md path + content
  reconcile after MILESTONES.md moved to docs/

Address reviewer findings before PR:
- docs/THREAT-MODEL.md case-mismatch -> docs/threat-model.md
  (Linux CI is case-sensitive)
- pattern.id schema drift: 8 specs said `ib_link_flap`/`cuda_oom`,
  code emits "2"/"10"/.../"13"; rewrite spec attribute tables to
  match shipped customer-stable namespace
- pattern.confidence: 8 specs said `high|partial`, code emits
  `full|partial`; rewrite
- 02-ib-link-flap.md attribute drift: spec said
  tracecore.alert.ib_link_flap.{hca_device,port}, code emits
  hw.network.ib.{device,port.num}; align spec to shipped code
- v1-rc1-cut-criteria criterion #1 status stale-on-arrival
  ("6 patterns shipped" -> "8 patterns shipped, 4 remaining")
- NetPol UX trap: NOTES.txt warning when networkPolicy.enabled=true
  with empty allowedEgressEndpoints (silently kills OTLP exporter)
  + warning when ServiceMonitor scraper in different namespace
- File #337 for missing OTTL recipe projecting DCGM FB_USED/FREE
  -> hw.gpu.memory.{free,total} log shape (CUDA OOM detector
  consumes but recipe gap means it ships dark)

Tests: ./module/processor/patterndetectorprocessor/... +
./module/pkg/patterns/... both ok.

Signed-off-by: Tri Lam <tri@maydow.com>
trilamsr added a commit that referenced this pull request Jun 1, 2026
…ts (#338)

## Summary

15-agent parallel wave bridging v1.0-rc1 knowledge gaps + closing
horizon backlog. 31 commits, 81 files, +8650/-180.

**Code (5 detectors / features):**
- `feat(iblinkflap)` pattern #2 IB link flap detector — 13 tests,
cross-rank helper extracted for reuse by patterns #7/#9
- `feat(cudaoom)` pattern #10 CUDA OOM detector +
fragmentation-vs-true-OOM discriminator — 35 tests, 0/6 false-positive
rate on fixture corpus (#303 wiring — recipe gap tracked at #337)
- `feat(verdict)` deprecate EvictedPod, co-emit PodName + PodNamespace
(#277) with regression-pinning test
- `feat(chart)` opt-in default-deny NetworkPolicy + cert-manager mTLS
reference (#301); ServiceMonitor + scrape annotations (#296); NOTES.txt
UX warnings for empty-egress / cross-ns scraper traps
- `feat(bench)` per-detector allocs/event harness + soft ratchet gate,
graduation criterion documented (#302)
- `feat(patterndetector)` verdict counter metric for dashboard panels
(#261)
- `fix(slo-rules)` correct otelcol_* label set + drop silent-no-op
`unless on (instance)` join (#298)

**8 pattern design specs (`docs/patterns/{02,07-13}-*.md`):**
- Per pattern: symptom, layers crossed, signal sources, detector
evaluation rule, verdict attrs, edge cases, open questions.
- 7 load-bearing spec gaps flagged for future TDD red-test work
(multi-vendor SDC signal, cohort grouping, processor metrics path, etc).

**9 v1.0-rc1 audit / knowledge-gap docs:**
- `docs/v1-rc1-cut-criteria.md` — 12 falsifiable cut gates derived from
O1-O7
- `docs/v1-rc1-operational-gaps.md` — SLSA L3 + air-gap +
upgrade-rollback audit (8 issues filed #314-#321)
- `docs/v1-rc1-governance-gaps.md` — CODEOWNERS 0%, lint-principles
4/16, retros, `make ci` 148s (5 issues #322-#325, #327)
- `docs/v1-rc1-test-audit.md` — 82.9% coverage, fuzz harness inventory
(5 issues #328-#332)
- `docs/v1-rc1-simplification-audit.md` — top deletion candidates ~9.6K
LOC (3 issues #333-#335)
- `docs/threat-model.md` — STRIDE per trust boundary + audit RFP scope
(#336)
- `docs/reference-environments.md` — Tier 1 kind + Tier 2 32×H100
binding spec for O2 hero KPI
- `docs/adoption-pipeline.md` — S0-S3 funnel + comms templates for O5
hero KPI
- `docs/standards-roadmap.md` — 10 `gen_ai.training.*` attributes
proposed upstream (#326)

**Doc-drift cleanup:** 11 issues closed (#265, #268, #269, #276, #283,
#287, #292-295, #299).

**OTTL recipe wiring:** 6 issues closed (#260, #261, #273, #282, #284,
#285); #272 deferred to standards-roadmap.

**Multi-cluster auth:** bearer-token + mTLS examples (#297).

**Merge resolution + reviewer fixes:**
- Resolved 5 conflicts post-PR #310/#312/#313 (factory.go delete,
VerdictAttr* unexport, MILESTONES.md → docs/, FOLLOWUPS, patterns
README)
- Adversarial reviewer found 1 BLOCKER + 6 MAJOR; all addressed before
push:
  - Renamed 16 `VerdictAttr*` → `verdictAttr*` per #310 convention
  - Re-ported selftel wiring (#261) into main's merged `createLogs`
- Fixed case-mismatch `docs/THREAT-MODEL.md` → `docs/threat-model.md`
(Linux CI is case-sensitive)
- 8 pattern specs schema drift: `pattern.id` slug → numeric (`"2"`,
`"7"`...`"13"`), `pattern.confidence` `high` → `full`
- `02-ib-link-flap.md` attribute drift: spec said
`tracecore.alert.ib_link_flap.{hca_device,port}`, code emits
`hw.network.ib.{device,port.num}`
- `v1-rc1-cut-criteria` criterion #1 status stale-on-arrival ("6
patterns shipped" → "8 patterns shipped, 4 remaining")
- NetPol UX trap: NOTES.txt warns when `enabled=true` with empty
`allowedEgressEndpoints` (silently kills OTLP) or cross-ns Prometheus
- Filed #337 for missing OTTL recipe projecting `DCGM_FI_DEV_FB_*` →
`hw.gpu.memory.{free,total}` (CUDA OOM detector consumes but recipe gap)
- Post-merge stale-relative-path sweep: 6 wave docs + NORTHSTARS.md +
MILESTONES.md (`docs/`, `../`, `docs/docs/` drift after MILESTONES +
NORTHSTARS moved to docs/)
- Documented 5 newly-emitted attributes in ATTRIBUTES.md (drop_ratio +
IB tier — `attribute-namespace-check` now 67/67)

## Test plan

- [x] `go test ./module/processor/patterndetectorprocessor/...
./module/pkg/patterns/...` — ok
- [x] `make lint` (golangci-lint via goreleaser-style gate) — 0 issues
- [x] `go vet ./...` — clean
- [x] `make doc-check` — passes after stale-link sweep
- [x] `scripts/attribute-namespace-check.sh` — 67/67 documented
- [x] `helm lint install/kubernetes/tracecore` — 0 chart(s) failed
- [x] `promtool check rules` on slo-rules.yaml — 13 rules / SUCCESS
- [ ] CI compat-matrix (rc1 criterion #6) — gated on next wave
- [ ] manual smoke install on real cluster — owner clearance pending

```release-notes
Lands two new pattern detectors (#2 IB link flap, #10 CUDA OOM
fragmentation-vs-true discriminator), 8 pattern design specs for the
remaining v1.0 root-cause patterns, opt-in default-deny NetworkPolicy
+ Prometheus Operator ServiceMonitor on the Helm chart, the
EvictedPod → PodName/PodNamespace verdict-attribute deprecation
co-emit, per-detector allocs/event bench harness, SLO-rules label
fix, and the v1.0-rc1 knowledge-gap audit set (cut criteria, ops gaps,
governance gaps, test audit, simplification audit, threat model,
reference envs, adoption pipeline, standards roadmap).
```

---------

Signed-off-by: Tri Lam <tri@maydow.com>
Co-authored-by: Tri Lam <tri@maydow.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant