
[docs] MILESTONES: re-grade post-merge rubrics by evidence#59

Merged: trilamsr merged 1 commit into main from worktree-milestones-rubric-regrade on May 19, 2026

@trilamsr

Contributor

Summary

Follow-up to merged PR #53. Self-review found the per-rubric markers were too generous — a code merge doesn't satisfy a rubric that requires measurement, and a CI gate only satisfies a rubric if the gate is actually fail-closed. This PR audits each shipped rubric against its file:line evidence and downgrades the ones that need it.

Re-grading

M4b (failure-injection harness): top-line delivered → ☑ partial. The nccl-hang CLI shim is a stub returning ErrPending (tools/failure-inject/ncclhang/ncclhang.go); the underlying capability exists in pkg/nccl/fr_parser/synthesize.go from M11. Three rubrics flipped to ⧗:

  • nccl-hang byte-identical round-trip (CLI not wired)
  • nccl-hang safe-opcode-only (CLI not wired)
  • determinism on amd64 + arm64 (single-arch SHA gate; cross-arch equality is carry-forward)
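
For reference, the carry-forward cross-arch check could look roughly like the sketch below. It is not code from this repo: `digest`, `crossArchEqual`, and the in-memory artifact map are hypothetical names, and it assumes the per-arch build outputs are already available as byte slices.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// digest returns the hex-encoded SHA-256 of an artifact's bytes.
func digest(artifact []byte) string {
	sum := sha256.Sum256(artifact)
	return hex.EncodeToString(sum[:])
}

// crossArchEqual (hypothetical helper) returns an error unless every
// architecture produced the same digest. Fewer than two architectures is
// treated as a failure, so a missing build cannot pass the gate silently.
func crossArchEqual(byArch map[string][]byte) error {
	if len(byArch) < 2 {
		return fmt.Errorf("need artifacts from at least 2 architectures, got %d", len(byArch))
	}
	// Sort the architecture names so error messages are deterministic.
	arches := make([]string, 0, len(byArch))
	for arch := range byArch {
		arches = append(arches, arch)
	}
	sort.Strings(arches)

	ref := digest(byArch[arches[0]])
	for _, arch := range arches[1:] {
		if got := digest(byArch[arch]); got != ref {
			return fmt.Errorf("digest mismatch: %s=%s, %s=%s", arches[0], ref, arch, got)
		}
	}
	return nil
}

func main() {
	// Placeholder bytes standing in for the per-arch build outputs.
	artifacts := map[string][]byte{
		"amd64": []byte("synthetic dump"),
		"arm64": []byte("synthetic dump"),
	}
	if err := crossArchEqual(artifacts); err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println("ok: digests match across architectures")
}
```

The single-arch SHA gate only proves a build is stable against itself; a comparison like this is what would turn the determinism rubric back to ☑.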

M5b (Helm chart): top-line stays ☑ delivered. One rubric flipped to ⧗: the ≤5-min hero-KPI median across 10 CI runs is currently covered only by a single-run ≤300s gate; 10-run aggregation is the open work.

M10 (k8s events receiver): top-line stays ☑ alpha. Overhead-budget rubric flipped to ⧗: BenchmarkEmitOne, BenchmarkConvertOne, and rusage_linux_test.go exist, but an end-to-end 1k-events/min run asserting CPU + egress + RSS budgets together is the open work.

M11 (NCCL FlightRecorder): top-line stays ☑ alpha. Five rubrics flipped to ⧗:

  • nccl-hang CLI reachability (blocked on M4b carry-forward)
  • 5s dump-watcher emit timing assertion
  • 2.31-drift fixture + parser-diff CI gate
  • overhead bench promoted from advisory to fail-closed gate
  • make generate-fixtures byte-identical regen gate

M3 (reproducible-build CI): all 11 rubrics keep ☑. Every gate is wired into release.yml and fail-closed; the lack of a real v0.X.Y tag doesn't invalidate the rubrics because they gate on workflow existence, not on past published releases.

Side cleanup

The "Foundation entries (M1/M2/M4/M9) predate this convention" line in the MILESTONES preamble belonged in FOLLOWUPS per the currency rule. Moved to docs/FOLLOWUPS.md § Documentation as an opportunistic backfill item.

Test plan

  • bash scripts/doc-check.sh exits 0 — unverified marker count stable at 7
  • CI green
  • MILESTONES.md renders correctly on GitHub
  • Every re-graded rubric has its evidence-gap noted inline in italics

Note

This PR was the third commit on the (now-merged) PR #53 branch and didn't make it into the squash. Re-applying as a focused follow-up against post-merge main.

🤖 Generated with Claude Code

Self-review of the previous status flip found the per-rubric ☑
markers were too generous. A code-merge does not satisfy a rubric
that requires measurement; a CI gate satisfies it only if the gate
actually exists and is fail-closed. Auditing each rubric against
its file:line evidence:

M4b: top-line ⧗→☑ partial. nccl-hang CLI shim is a stub returning
ErrPending (tools/failure-inject/ncclhang/ncclhang.go); two
nccl-hang functional rubrics flipped to ⧗. Cross-arch SHA-256
equality is not explicitly enforced — single-arch SHA gate exists,
cross-arch comparison is carry-forward. Determinism rubric flipped
to ⧗.

M5b: top-line stays ☑ delivered. Single rubric flipped to ⧗ — the
≤5min-median-across-10-runs hero-KPI rubric is satisfied by a
single-run ≤300s gate today; 10-run aggregation is carry-forward.

M10: top-line stays ☑ alpha. Overhead-budget rubric flipped to ⧗
— per-component benches (BenchmarkEmitOne, BenchmarkConvertOne,
rusage_linux_test.go) exist but an end-to-end 1k-events/min run
asserting CPU + egress + RSS budgets together is the open work.

M11: top-line stays ☑ alpha. Five rubrics flipped to ⧗:
- nccl-hang CLI reachability (blocked on M4b carry-forward)
- 5s dump-watcher emit timing assertion
- 2.31-drift fixture + parser-diff CI gate
- overhead bench promoted from advisory to fail-closed gate
- make generate-fixtures byte-identical regen gate

M3: 11/11 rubrics keep ☑ — every gate is wired into release.yml
and fail-closed; no real v0.X.Y tag yet but rubrics gate on
workflow existence, not on past releases.

Side cleanup: the "Foundation entries (M1/M2/M4/M9) predate this
convention" line in MILESTONES preamble belonged in FOLLOWUPS per
the currency rule. Moved to docs/FOLLOWUPS.md § Documentation as
an opportunistic backfill item.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@trilamsr trilamsr merged commit 24dea82 into main May 19, 2026
5 checks passed
@trilamsr trilamsr deleted the worktree-milestones-rubric-regrade branch May 19, 2026 02:00