docs: move MILESTONES + NORTHSTARS into docs/ #313
Merged
Audit flagged that MILESTONES.md (108K) and NORTHSTARS.md (28K) lived at
the repo root while every other narrative doc (STRATEGY, FAILURE-MODES,
maintainership, RELEASE-CHECKLIST, etc.) lives under docs/. Both files
are project-internal narrative, not user-facing surface — they belong in
docs/ alongside their siblings.
Mechanical move + cross-reference sweep:
- git mv MILESTONES.md docs/MILESTONES.md
- git mv NORTHSTARS.md docs/NORTHSTARS.md
- inside-file relative-link rewrites in both moved files (docs/* paths
lose their docs/ prefix; root-level targets SECURITY.md, go.mod,
CODE_OF_CONDUCT.md, CONTRIBUTING.md, PRINCIPLES.md, STYLE.md gain a
../ prefix)
- cross-file link-syntax rewrites: 52 markdown links across README,
CONTRIBUTING, AGENTS, docs/README, docs/RELEASE-CHECKLIST,
docs/maintainership, docs/patterns/README, docs/followups/README +
M3, docs/research/{m5-m6,m15,m16}, docs/rfcs/{0002,0003,0008,0009,
0010,0011,0012}, components/receivers/pyspy/README, docs/proposals/
gen-ai-training-semconv
- CI/script load-bearing paths: scripts/doc-check.sh milestones_doc
variable, install/kubernetes/tracecore/Chart.yaml artifacthub URL,
.github/{branch-protection.yml,workflows/chaos.yml,ISSUE_TEMPLATE/
feature_request.md} comments
- Go-comment factual references in bench/overhead, module/pkg/nccl/
fr_parser, module/receiver/ncclfrreceiver, tools/failure-inject
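The relative-link rewrites in the list above follow mechanically from the two `git mv` operations. A minimal Python sketch of that computation, not the tooling used in this PR: the helper names `resolve`, `relink`, and `rewrite_after_move` are hypothetical, the `.md` extensions on `docs/STRATEGY.md` and `docs/README.md` are assumed, all paths are repo-root-relative POSIX paths, and `#fragment` suffixes are not handled.

```python
import posixpath

# Repo-root-relative old path -> new path, mirroring the two `git mv` calls.
MOVED = {
    "MILESTONES.md": "docs/MILESTONES.md",
    "NORTHSTARS.md": "docs/NORTHSTARS.md",
}


def resolve(src_file, link):
    """Return the repo-root-relative path that `link` targets when written in `src_file`."""
    return posixpath.normpath(posixpath.join(posixpath.dirname(src_file), link))


def relink(src_file, target):
    """Return the relative link from `src_file` to the repo-root-relative `target`."""
    return posixpath.relpath(target, start=posixpath.dirname(src_file) or ".")


def rewrite_after_move(link, old_src, moved=MOVED):
    """Recompute `link`, as written in `old_src` before the move, for the post-move tree."""
    target = resolve(old_src, link)
    target = moved.get(target, target)     # the link target may itself have moved
    new_src = moved.get(old_src, old_src)  # the file containing the link may have moved
    return relink(new_src, target)


# Links inside a moved file: docs/* targets lose the prefix, root targets gain ../
print(rewrite_after_move("docs/STRATEGY.md", "MILESTONES.md"))
print(rewrite_after_move("SECURITY.md", "MILESTONES.md"))
# Links in files that stayed put but point at a moved file
print(rewrite_after_move("MILESTONES.md", "README.md"))
print(rewrite_after_move("../MILESTONES.md", "docs/README.md"))
```

Resolving every link to a repo-root-relative target first, then re-deriving the relative form, covers both the inside-file and the cross-file cases with one code path.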
Files kept at root per the audit charter: README, CONTRIBUTING,
CODE_OF_CONDUCT, SECURITY, LICENSE, CHANGELOG, PRINCIPLES, STYLE,
AGENTS, CLAUDE.
Verification: doc-check.sh passes (554 markdown links resolve, 7
unverified markers still detected at the new path, banned-phrase lint
clean across 114 files). The 112 remaining bare-text mentions of the
filename (e.g., "the MILESTONES.md M15 rubric") are prose references,
not links, and still readable.
Signed-off-by: Tri Lam <tri@maydow.com>
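The verification note separates link-syntax references (which must resolve) from bare-text mentions (which were left alone). A hedged Python sketch of that distinction follows; it is not how `doc-check.sh` works, the function name `classify_mentions` and the `#m15` fragment are hypothetical, and only inline `[text](target)` links are recognized.

```python
import posixpath
import re

# Inline markdown link: [text](target). Reference-style links are not handled.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def classify_mentions(text, filename):
    """Split mentions of `filename` into link-syntax targets and a bare-text count."""
    link_targets = [
        target
        for target in LINK_RE.findall(text)
        if posixpath.basename(target.split("#", 1)[0]) == filename
    ]
    # Drop whole links (text and target) so only prose mentions remain.
    prose = LINK_RE.sub("", text)
    return link_targets, prose.count(filename)


sample = (
    "See [the rubric](docs/MILESTONES.md#m15). "
    "The MILESTONES.md M15 rubric names the gate."
)
links, bare = classify_mentions(sample, "MILESTONES.md")
print(links, bare)
```

Only the entries in `links` need a path rewrite after the move; the bare count corresponds to prose mentions that identify the doc by name.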
added 2 commits
May 31, 2026 23:51
…o-docs-dir # Conflicts: # docs/MILESTONES.md
…o-docs-dir # Conflicts: # module/receiver/ncclfrreceiver/doc.go
trilamsr pushed a commit that referenced this pull request on Jun 1, 2026

Resolve 5 conflicts post-PR #310 / #312 / #313:
- factory.go deleted on main (merged into patterndetector.go); port wave's selftel wiring (#261) into the merged createLogs
- VerdictAttr* unexported per #310; rename 16 wave-added consts + all callers across cuda_oom + ib_link_flap + pcie_aer tests
- docs/{MILESTONES,FOLLOWUPS,patterns/README}.md path + content reconcile after MILESTONES.md moved to docs/

Address reviewer findings before PR:
- docs/THREAT-MODEL.md case-mismatch -> docs/threat-model.md (Linux CI is case-sensitive)
- pattern.id schema drift: 8 specs said `ib_link_flap`/`cuda_oom`, code emits "2"/"10"/.../"13"; rewrite spec attribute tables to match shipped customer-stable namespace
- pattern.confidence: 8 specs said `high|partial`, code emits `full|partial`; rewrite
- 02-ib-link-flap.md attribute drift: spec said tracecore.alert.ib_link_flap.{hca_device,port}, code emits hw.network.ib.{device,port.num}; align spec to shipped code
- v1-rc1-cut-criteria criterion #1 status stale-on-arrival ("6 patterns shipped" -> "8 patterns shipped, 4 remaining")
- NetPol UX trap: NOTES.txt warning when networkPolicy.enabled=true with empty allowedEgressEndpoints (silently kills OTLP exporter) + warning when ServiceMonitor scraper in different namespace
- File #337 for missing OTTL recipe projecting DCGM FB_USED/FREE -> hw.gpu.memory.{free,total} log shape (CUDA OOM detector consumes but recipe gap means it ships dark)

Tests: ./module/processor/patterndetectorprocessor/... + ./module/pkg/patterns/... both ok.

Signed-off-by: Tri Lam <tri@maydow.com>
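The `docs/THREAT-MODEL.md` finding is the kind of mismatch that passes on a case-insensitive filesystem and fails on Linux CI. A small Python sketch of one way to catch it ahead of time; the function name `find_case_mismatch` is hypothetical, and the set of tracked paths is assumed to come from something like `git ls-files`.

```python
def find_case_mismatch(link_target, tracked_paths):
    """Return the tracked path that matches `link_target` only when case is ignored.

    Returns None when the target matches exactly or does not match at all.
    """
    if link_target in tracked_paths:
        return None
    by_folded = {path.casefold(): path for path in tracked_paths}
    return by_folded.get(link_target.casefold())


# Illustrative tracked set; a real check would read the repository's file list.
tracked = {"docs/threat-model.md", "docs/MILESTONES.md"}
print(find_case_mismatch("docs/THREAT-MODEL.md", tracked))  # docs/threat-model.md
print(find_case_mismatch("docs/MILESTONES.md", tracked))    # None
```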
trilamsr pushed a commit that referenced this pull request on Jun 1, 2026

The audit docs were authored when NORTHSTARS.md + MILESTONES.md lived at the repo root. main moved them to docs/ in PR #313 just before this wave landed. Sibling docs reference these by relative path; 22 links were stale. Replaced ../{NORTHSTARS,MILESTONES}.md → {NORTHSTARS,MILESTONES}.md across three files. doc-check passes.

Signed-off-by: Tri Lam <tri@maydow.com>
trilamsr pushed a commit that referenced this pull request on Jun 1, 2026

After PR #313 moved NORTHSTARS.md into docs/, the Spec column links added in the pattern-spec commit kept the pre-move ../docs/patterns/ prefix; from docs/NORTHSTARS.md the correct relative path is just patterns/. 12 links fixed; doc-check clears.

Signed-off-by: Tri Lam <tri@maydow.com>
trilamsr pushed a commit that referenced this pull request on Jun 1, 2026

Authoring drift: the v1-rc1-cut-criteria bullets pre-existed the PR #313 MILESTONES.md → docs/MILESTONES.md move, so links carried docs/ prefix that now double-resolves to docs/docs/. Strip to sibling-relative.

Signed-off-by: Tri Lam <tri@maydow.com>
trilamsr added a commit that referenced this pull request on Jun 1, 2026

…ts (#338)

## Summary

15-agent parallel wave bridging v1.0-rc1 knowledge gaps + closing horizon backlog. 31 commits, 81 files, +8650/-180.

**Code (5 detectors / features):**
- `feat(iblinkflap)` pattern #2 IB link flap detector: 13 tests, cross-rank helper extracted for reuse by patterns #7/#9
- `feat(cudaoom)` pattern #10 CUDA OOM detector + fragmentation-vs-true-OOM discriminator: 35 tests, 0/6 false-positive rate on fixture corpus (#303 wiring; recipe gap tracked at #337)
- `feat(verdict)` deprecate EvictedPod, co-emit PodName + PodNamespace (#277) with regression-pinning test
- `feat(chart)` opt-in default-deny NetworkPolicy + cert-manager mTLS reference (#301); ServiceMonitor + scrape annotations (#296); NOTES.txt UX warnings for empty-egress / cross-ns scraper traps
- `feat(bench)` per-detector allocs/event harness + soft ratchet gate, graduation criterion documented (#302)
- `feat(patterndetector)` verdict counter metric for dashboard panels (#261)
- `fix(slo-rules)` correct otelcol_* label set + drop silent-no-op `unless on (instance)` join (#298)

**8 pattern design specs (`docs/patterns/{02,07-13}-*.md`):**
- Per pattern: symptom, layers crossed, signal sources, detector evaluation rule, verdict attrs, edge cases, open questions.
- 7 load-bearing spec gaps flagged for future TDD red-test work (multi-vendor SDC signal, cohort grouping, processor metrics path, etc).

**9 v1.0-rc1 audit / knowledge-gap docs:**
- `docs/v1-rc1-cut-criteria.md`: 12 falsifiable cut gates derived from O1-O7
- `docs/v1-rc1-operational-gaps.md`: SLSA L3 + air-gap + upgrade-rollback audit (8 issues filed #314-#321)
- `docs/v1-rc1-governance-gaps.md`: CODEOWNERS 0%, lint-principles 4/16, retros, `make ci` 148s (5 issues #322-#325, #327)
- `docs/v1-rc1-test-audit.md`: 82.9% coverage, fuzz harness inventory (5 issues #328-#332)
- `docs/v1-rc1-simplification-audit.md`: top deletion candidates ~9.6K LOC (3 issues #333-#335)
- `docs/threat-model.md`: STRIDE per trust boundary + audit RFP scope (#336)
- `docs/reference-environments.md`: Tier 1 kind + Tier 2 32×H100 binding spec for O2 hero KPI
- `docs/adoption-pipeline.md`: S0-S3 funnel + comms templates for O5 hero KPI
- `docs/standards-roadmap.md`: 10 `gen_ai.training.*` attributes proposed upstream (#326)

**Doc-drift cleanup:** 11 issues closed (#265, #268, #269, #276, #283, #287, #292-295, #299).

**OTTL recipe wiring:** 6 issues closed (#260, #261, #273, #282, #284, #285); #272 deferred to standards-roadmap.

**Multi-cluster auth:** bearer-token + mTLS examples (#297).

**Merge resolution + reviewer fixes:**
- Resolved 5 conflicts post-PR #310/#312/#313 (factory.go delete, VerdictAttr* unexport, MILESTONES.md → docs/, FOLLOWUPS, patterns README)
- Adversarial reviewer found 1 BLOCKER + 6 MAJOR; all addressed before push:
- Renamed 16 `VerdictAttr*` → `verdictAttr*` per #310 convention
- Re-ported selftel wiring (#261) into main's merged `createLogs`
- Fixed case-mismatch `docs/THREAT-MODEL.md` → `docs/threat-model.md` (Linux CI is case-sensitive)
- 8 pattern specs schema drift: `pattern.id` slug → numeric (`"2"`, `"7"`...`"13"`), `pattern.confidence` `high` → `full`
- `02-ib-link-flap.md` attribute drift: spec said `tracecore.alert.ib_link_flap.{hca_device,port}`, code emits `hw.network.ib.{device,port.num}`
- `v1-rc1-cut-criteria` criterion #1 status stale-on-arrival ("6 patterns shipped" → "8 patterns shipped, 4 remaining")
- NetPol UX trap: NOTES.txt warns when `enabled=true` with empty `allowedEgressEndpoints` (silently kills OTLP) or cross-ns Prometheus
- Filed #337 for missing OTTL recipe projecting `DCGM_FI_DEV_FB_*` → `hw.gpu.memory.{free,total}` (CUDA OOM detector consumes but recipe gap)
- Post-merge stale-relative-path sweep: 6 wave docs + NORTHSTARS.md + MILESTONES.md (`docs/`, `../`, `docs/docs/` drift after MILESTONES + NORTHSTARS moved to docs/)
- Documented 5 newly-emitted attributes in ATTRIBUTES.md (drop_ratio + IB tier; `attribute-namespace-check` now 67/67)

## Test plan

- [x] `go test ./module/processor/patterndetectorprocessor/... ./module/pkg/patterns/...`: ok
- [x] `make lint` (golangci-lint via goreleaser-style gate): 0 issues
- [x] `go vet ./...`: clean
- [x] `make doc-check`: passes after stale-link sweep
- [x] `scripts/attribute-namespace-check.sh`: 67/67 documented
- [x] `helm lint install/kubernetes/tracecore`: 0 chart(s) failed
- [x] `promtool check rules` on slo-rules.yaml: 13 rules / SUCCESS
- [ ] CI compat-matrix (rc1 criterion #6): gated on next wave
- [ ] manual smoke install on real cluster: owner clearance pending

```release-notes
Lands two new pattern detectors (#2 IB link flap, #10 CUDA OOM fragmentation-vs-true discriminator), 8 pattern design specs for the remaining v1.0 root-cause patterns, opt-in default-deny NetworkPolicy + Prometheus Operator ServiceMonitor on the Helm chart, the EvictedPod → PodName/PodNamespace verdict-attribute deprecation co-emit, per-detector allocs/event bench harness, SLO-rules label fix, and the v1.0-rc1 knowledge-gap audit set (cut criteria, ops gaps, governance gaps, test audit, simplification audit, threat model, reference envs, adoption pipeline, standards roadmap).
```

---------

Signed-off-by: Tri Lam <tri@maydow.com>
Co-authored-by: Tri Lam <tri@maydow.com>
Summary
Audit flagged that `MILESTONES.md` (108K) and `NORTHSTARS.md` (28K) lived at the repo root while every other narrative doc (`STRATEGY`, `FAILURE-MODES`, `maintainership`, `RELEASE-CHECKLIST`, etc.) lives under `docs/`. Both are project-internal narrative, not user-facing surface: same audience and same lifecycle as their siblings already in `docs/`. The inconsistency was the root cause; this PR fixes it.

Root at `/` reserved for the conventional top-level files only: `README`, `CONTRIBUTING`, `CODE_OF_CONDUCT`, `SECURITY`, `LICENSE`, `CHANGELOG`, `PRINCIPLES`, `STYLE`, `AGENTS`, `CLAUDE`.

Changes
- `git mv MILESTONES.md docs/MILESTONES.md`
- `git mv NORTHSTARS.md docs/NORTHSTARS.md`
- Inside-file relative-link rewrites in both moved files: `docs/...` paths lose their prefix; `SECURITY.md`, `go.mod`, `CODE_OF_CONDUCT.md`, `CONTRIBUTING.md`, `PRINCIPLES.md`, `STYLE.md` gain `../`.
- Cross-file link-syntax rewrites (52 markdown links): `README`, `CONTRIBUTING`, `AGENTS`, `docs/README`, `docs/RELEASE-CHECKLIST`, `docs/maintainership`, `docs/patterns/README`, `docs/followups/{README,M3}`, `docs/research/{m5-m6,m15,m16}`, `docs/rfcs/{0002,0003,0008,0009,0010,0011,0012}`, `components/receivers/pyspy/README`, `docs/proposals/gen-ai-training-semconv`.
- CI/script load-bearing paths: `scripts/doc-check.sh` `milestones_doc` variable now `docs/MILESTONES.md`; `install/kubernetes/tracecore/Chart.yaml` artifacthub URL bumped; `.github/{branch-protection.yml, workflows/chaos.yml, ISSUE_TEMPLATE/feature_request.md}` comment refs updated.
- Go-comment factual references in `bench/overhead`, `module/pkg/nccl/fr_parser`, `module/receiver/ncclfrreceiver`, `tools/failure-inject` brought current.

112 remaining bare-text mentions of the filename in prose (e.g., "the MILESTONES.md M15 rubric names...") were left alone: they identify the doc by name, not as a clickable link, and remain readable.
Test plan
- `bash scripts/doc-check.sh`: passes (554 markdown links resolve, 64 non-md intra-repo paths resolve, 7 unverified markers still detected at the new `docs/MILESTONES.md` baseline location, banned-phrase lint clean across 114 files, all required top-level docs present)
- `golangci-lint run ./...`: 0 issues
- `go vet ./...`: clean
- `go mod verify`: all modules verified
- `no-autoupdate-check_test`: all assertions pass
- `[X](path)` links referencing either moved file: 52/52 link-syntax targets resolve to on-disk files; 0 broken
- No `MILESTONES.md` or `NORTHSTARS.md` at repo root