[docs] MILESTONES: re-grade post-merge rubrics by evidence #59
Merged
Self-review of the previous status flip found the per-rubric ☑ markers were too generous. A code merge does not satisfy a rubric that requires measurement; a CI gate satisfies it only if the gate actually exists and is fail-closed. Auditing each rubric against its file:line evidence:

- M4b: top-line ⧗→☑ partial. The nccl-hang CLI shim is a stub returning ErrPending (tools/failure-inject/ncclhang/ncclhang.go); two nccl-hang functional rubrics flipped to ⧗. Cross-arch SHA-256 equality is not explicitly enforced: a single-arch SHA gate exists, but cross-arch comparison is carry-forward. Determinism rubric flipped to ⧗.
- M5b: top-line stays ☑ delivered. Single rubric flipped to ⧗: the ≤5min-median-across-10-runs hero-KPI rubric is only approximated by a single-run ≤300s gate today; 10-run aggregation is carry-forward.
- M10: top-line stays ☑ alpha. Overhead-budget rubric flipped to ⧗: per-component benches (BenchmarkEmitOne, BenchmarkConvertOne, rusage_linux_test.go) exist, but an end-to-end 1k-events/min run asserting CPU + egress + RSS budgets together is the open work.
- M11: top-line stays ☑ alpha. Five rubrics flipped to ⧗:
  - nccl-hang CLI reachability (blocked on M4b carry-forward)
  - 5s dump-watcher emit timing assertion
  - 2.31-drift fixture + parser-diff CI gate
  - overhead bench promoted from advisory to fail-closed gate
  - make generate-fixtures byte-identical regen gate
- M3: 11/11 rubrics keep ☑. Every gate is wired into release.yml and fail-closed; there is no real v0.X.Y tag yet, but the rubrics gate on workflow existence, not on past releases.

Side cleanup: the "Foundation entries (M1/M2/M4/M9) predate this convention" line in the MILESTONES preamble belonged in FOLLOWUPS per the currency rule. Moved to docs/FOLLOWUPS.md § Documentation as an opportunistic backfill item.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Follow-up to merged PR #53. Self-review found the per-rubric ☑ markers were too generous: a code merge doesn't satisfy a rubric that requires measurement, and a CI gate only satisfies a rubric if the gate is actually fail-closed. This PR audits each shipped rubric against its file:line evidence and downgrades the ones that need it.
Re-grading
M4b (failure-injection harness): top-line ☑ delivered → ☑ partial. The `nccl-hang` CLI shim is a stub returning `ErrPending` (`tools/failure-inject/ncclhang/ncclhang.go`); the underlying capability exists in `pkg/nccl/fr_parser/synthesize.go` from M11. Three rubrics flipped to ⧗:
- two `nccl-hang` functional rubrics (the CLI shim is still a stub)
- the determinism rubric (cross-arch SHA-256 equality is not explicitly enforced; only a single-arch SHA gate exists)
M5b (Helm chart): top-line stays ☑ delivered. One rubric flipped to ⧗: the ≤5-min hero-KPI median across 10 CI runs is only approximated by a single-run ≤300s gate today; 10-run aggregation is the open work.
M10 (k8s events receiver): top-line stays ☑ alpha. Overhead-budget rubric flipped to ⧗: `BenchmarkEmitOne`, `BenchmarkConvertOne`, and `rusage_linux_test.go` exist, but an end-to-end 1k-events/min run asserting CPU + egress + RSS budgets together is the open work.
M11 (NCCL FlightRecorder): top-line stays ☑ alpha. Five rubrics flipped to ⧗:
- `nccl-hang` CLI reachability (blocked on M4b carry-forward)
- 5s dump-watcher emit timing assertion
- 2.31-drift fixture + parser-diff CI gate
- overhead bench promoted from advisory to fail-closed gate
- `make generate-fixtures` byte-identical regen gate
M3 (reproducible-build CI): all 11 rubrics keep ☑. Every gate is wired into `release.yml` and fail-closed; the lack of a real `v0.X.Y` tag doesn't invalidate the rubrics, because they gate on workflow existence, not on past published releases.
Side cleanup
The "Foundation entries (M1/M2/M4/M9) predate this convention" line in the MILESTONES preamble belonged in FOLLOWUPS per the currency rule. Moved to `docs/FOLLOWUPS.md` § Documentation as an opportunistic backfill item.
Test plan
`bash scripts/doc-check.sh` exits 0; the unverified marker count is stable at 7.
Note
This PR was the third commit on the (now-merged) PR #53 branch and didn't make it into the squash. Re-applying as a focused follow-up against post-merge main.
🤖 Generated with Claude Code