[docs] split FOLLOWUPS.md into per-milestone shards + union merge #132
Merged
The single 2,837-line docs/FOLLOWUPS.md conflicted on nearly every merge when two branches happened to touch it concurrently. The split addresses the conflict surface directly:

- 19 shards under docs/followups/ (one per milestone plus opportunistic / skipped / otlphttp / cross-cutting buckets).
- .gitattributes declares docs/followups/*.md merge=union so concurrent appends auto-merge; only same-line edits still raise a real conflict.
- _needs-prod-data.md and _needs-gpu.md collect resource-gated items independent of milestone, so the resource gate is visible at a glance.
- docs/FOLLOWUPS.md kept as a redirect stub with a migration table so existing inbound links resolve.
- docs/followups/README.md documents the filing convention, conflict-free append rules, and decay rule.

Cross-references updated in AGENTS.md, CONTRIBUTING.md, MILESTONES.md, docs/README.md, docs/STYLE-docs.md, docs/maintainership.md, and three docs/notes/ files so contributors land items in the right shard going forward.

Integrity: 211 checkboxes in, 211 out, zero duplicates. doc-check green (410 markdown links resolve).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
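The "211 checkboxes in, 211 out, zero duplicates" integrity claim can be illustrated with a small counting sketch. This is not the check the author ran (the commit message does not say how the count was produced); the function names `checkbox_rows` and `integrity_report` are hypothetical, and the regex assumes rows use the usual `- [ ]` / `- [x]` task-list syntax.

```python
import re
from collections import Counter

# Matches a markdown task-list row such as "- [ ] text" or "- [x] text".
CHECKBOX = re.compile(r"^\s*[-*] \[[ xX]\] (.+)$")


def checkbox_rows(text):
    """Return the text of every task-list row in `text`, in order."""
    return [m.group(1).strip()
            for line in text.splitlines()
            if (m := CHECKBOX.match(line))]


def integrity_report(monolith_text, shard_texts):
    """Compare rows in the old single file against the union of all shards."""
    before = Counter(checkbox_rows(monolith_text))
    after = Counter()
    for shard in shard_texts:
        after.update(checkbox_rows(shard))
    return {
        "in": sum(before.values()),
        "out": sum(after.values()),
        # Rows present before the split but absent (or fewer) afterwards.
        "missing": sorted((before - after).elements()),
        # Rows that appear more often after the split than before.
        "duplicated": sorted((after - before).elements()),
    }


# Hypothetical miniature of the split: three rows, two shards.
monolith = "## M3\n- [ ] row a\n- [x] row b\n## M8\n- [ ] row c\n"
shards = ["- [ ] row a\n- [x] row b\n", "- [ ] row c\n"]
report = integrity_report(monolith, shards)
```

A clean split reports equal `in` and `out` counts with empty `missing` and `duplicated` lists. Comparing row text rather than bare totals also catches a dropped row that happens to be offset by a duplicated one.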
… from em-dash gate

Rebase onto main introduced 4 broken relative links (one extra level deep under docs/followups/) and surfaced em-dashes in newly-added lines across AGENTS.md / CONTRIBUTING.md / MILESTONES.md / the docs/FOLLOWUPS.md stub / two docs/notes/ files.

scripts/doc-check.sh adds docs/followups/ to the em-dash diff-gate exemption list. The shards are a mechanical back-fill from the (grandfathered) monolithic docs/FOLLOWUPS.md; their prose carries pre-existing em-dashes that the split surfaced as "additions" to git diff. The gate's purpose is to prevent NEW em-dash drift; the shards are not new content.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
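To make the diff-gate idea concrete, here is a minimal sketch of a gate that flags the character only on added lines of a unified diff and skips exempted paths. It is not the logic in `scripts/doc-check.sh` (a bash script whose contents are not shown here); the function name `em_dash_hits` and the pattern form of the exemption list are assumptions.

```python
from fnmatch import fnmatch

EM_DASH = "\u2014"
# Hypothetical exemption list; the real gate's format is not shown in this PR.
EXEMPT = ["docs/followups/*"]


def em_dash_hits(diff_text, exempt=EXEMPT):
    """Return (path, added_line) pairs for added lines containing an em-dash."""
    hits = []
    path = None
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            # "+++ b/path" names the post-change file for the hunks that follow.
            # (An added line whose content starts with "++ " would be misread
            # here; acceptable for a sketch.)
            target = line[4:].strip()
            path = target[2:] if target.startswith("b/") else target
            continue
        if line.startswith("+") and EM_DASH in line:
            if path and any(fnmatch(path, pattern) for pattern in exempt):
                continue  # grandfathered content, not new drift
            hits.append((path, line[1:]))
    return hits


sample_diff = "\n".join([
    "--- a/AGENTS.md",
    "+++ b/AGENTS.md",
    "@@ -1 +1,2 @@",
    " unchanged",
    "+new line \u2014 here",
    "--- a/docs/followups/M3.md",
    "+++ b/docs/followups/M3.md",
    "@@ -1 +1,2 @@",
    "+back-filled row \u2014 old prose",
])
hits = em_dash_hits(sample_diff)
```

Because only `+` lines are inspected, existing em-dashes in untouched text never trip the gate, which is why a mechanical file move shows up as "new" content unless its destination is exempted.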
f6fca46 to
a6d63b3
Compare
4 tasks
trilamsr
added a commit
that referenced
this pull request
May 20, 2026
…k Item 5 shipped
Validation pass on docs/followups/opportunistic.md "Next up" items
against the current code tree (per the pr-workflow.md "audit stale
rows before implementing" lesson):
- Item 5 (DCGM adopt lifecycle.Lifecycle): SHIPPED. receiver.go
L67-70 + L81 confirm the helper is in use. Replaced the row with
a "Closed" marker citing the validation date.
- Item 8 (resolveIncErrorCall CallExpr): anchor corrected
(kernelevents/runbook_kinds_test.go, not dcgm/). Trigger fired:
selftelemetry.Kind("...") call sites now exist in
capturing_test.go and classify_internal_test.go.
- Items 2, 7, 8, 10, 11, 12, 13: still active. Promoted to GitHub
Issues (#135-#141) with enhancement + help wanted labels. Each
row in opportunistic.md gains a *Tracked:* link.
- Item 4 (tracecore.dev/schemas migration): still active but
blocked on external hosting, not contributor pickup. No Issue.
Resource-bucket re-audit across every milestone shard: zero net
moves. Other shards' GPU/production mentions are trigger conditions
or describe what the receiver observes, not resource gates on the
follow-up itself. Documented as "Coverage scan" notes in
_needs-prod-data.md and _needs-gpu.md so a future curator knows
the audit ran without leaving the buckets empty-looking.
This builds on PR #132 (which split the monolithic FOLLOWUPS.md
into per-milestone shards); merging order does not strictly matter
since the changes are append-style under the union-merge driver.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
trilamsr
added a commit
that referenced
this pull request
May 20, 2026
…e shipped (#142)

## Summary

Curation pass on `docs/followups/opportunistic.md` "Next up" rows. Builds on #132 (the shard split, now merged) and lands alongside #133 (RUNBOOK + chart-appversion audit, also merged).

- **Validated every Next-up row against the current code tree.** Per the `pr-workflow.md` "audit stale rows *before* implementing" lesson: any row naming a specific file/symbol gets a grep/ls check before pickup.
- **Item 5 (DCGM adopt lifecycle.Lifecycle): SHIPPED.** `components/receivers/dcgm/receiver.go` L67-70 + L81 already use `lifecycle.Lifecycle`. Row replaced with a "Closed" marker citing the validation date.
- **Item 8 (resolveIncErrorCall CallExpr): anchor corrected.** Cited as `dcgm/runbook_kinds_test.go`; correct path is `kernelevents/runbook_kinds_test.go`. Also: trigger fired; `selftelemetry.Kind("...")` call sites now exist (`capturing_test.go`, `classify_internal_test.go`).
- **Items 2, 7, 8, 10, 11, 12, 13: promoted to GitHub Issues** ([#135](#135), [#136](#136), [#137](#137), [#138](#138), [#139](#139), [#140](#140), [#141](#141)) with `enhancement` + `help wanted` labels. Each opportunistic row gains a `*Tracked:* #NN` back-link.
- **Item 4 (tracecore.dev/schemas migration): no Issue.** Trigger is external hosting, not contributor pickup. Stays in the shard.

## Resource-bucket re-audit

Scanned every milestone shard for follow-ups gated on GPU hardware or production data; **zero net moves**. Other shards' GPU/production mentions are trigger conditions ("operator reports X") or describe what the receiver observes, not resource gates on the follow-up itself. M14 explicitly self-notes this. Documented as "Coverage scan (2026-05-20)" notes in `_needs-prod-data.md` and `_needs-gpu.md` so the next curator knows the audit ran.

## Files changed

- `docs/followups/opportunistic.md`: 7 Tracked-links, Item 5 strike, Item 8 anchor + trigger update.
- `docs/followups/_needs-prod-data.md`: coverage-scan note.
- `docs/followups/_needs-gpu.md`: coverage-scan note.

## Test plan

- [x] `bash scripts/doc-check.sh` green locally (436 markdown links resolve, em-dash gate clean, 7 baseline unverified markers non-growing).
- [x] All 7 created Issues exist on GitHub (linked above).
- [x] Item 5 ship-verification re-confirmed: `grep -n lifecycle.Lifecycle components/receivers/dcgm/receiver.go` shows the field at L81.
- [ ] CI green on this branch.

## Rebase note

Branch was originally based on the pre-merge `worktree-followups-split` branch (#132). After #132 squash-merged + #133 landed, a straight rebase tried to replay #132's already-merged commits and conflicted with their squashed form on `docs/notes/`. Resolution: reset branch to `origin/main` and cherry-pick only the self-contained curation commit (`fdebda8` → `897acc8`). Lesson: branch follow-on work off the same base the upstream PR targets (main), not off the feature branch.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
4 tasks
trilamsr
added a commit
that referenced
this pull request
May 21, 2026
…ards (#143)

## Summary

Cross-shard anchor + trigger audit of `docs/followups/*.md`. Targets the same staleness class that the opportunistic.md curation pass (#142) addressed, applied to every other milestone shard.

**Method:** for each of 192 rows across 14 milestone/component shards, extracted backticked paths/symbols/Make-targets and verified each anchor against the current tree. For rows with explicit `*Trigger:*` conditions, evaluated whether the condition has fired.

## Findings applied

- **M13 §pyspy `Operator RUNBOOK.md expansion`: SHIPPED.** PR #133 created `components/receivers/pyspy/RUNBOOK.md` with per-kind triage for all 12 RFC-0009 IncError kinds. Row marked `[x]` with ship reference; original ask text struck through.
- **M3 §`chart-appversion` drift gate: PARTIALLY SHIPPED.** PR #133's `scripts/chart-appversion-check.sh` compares `Chart.yaml.appVersion` against `internal/version/version.go::Version` (in-tree drift). Original row asked for drift against the actual binary release tag (`gh release view`). In-tree half is now covered; binary-tag half still gated on M21 publishing a real tag. Row kept open with explicit partial-ship line.

## Findings not applied (with reasoning)

Total: 72 candidate rows surfaced by the sweep, partitioned 29 + 9 + 34. Each bucket's rejection reasoning below.

- **29 candidate "all anchors exist" rows**: anchors resolved because they're common files (`receiver.go`, `config.go`), not because the specific feature shipped. Examples: `docs/followups/M14.md` L70 concurrent ingest race-detector test (counters exist, test does not); `docs/followups/otlphttp.md` L25 `MaxBodyBytes` config (zero grep hits for the feature); `docs/followups/M8.md` L35 `tracecore validate --show-defaults` (subcommand absent from `cmd/tracecore/`).
- **9 trigger candidates** (5 + 4):
  - **5 rows triggered by "M11 NVML lands"**: M11 in `MILESTONES.md` is the NCCL_fr receiver, not NVML. NVML is a future trigger.
  - **4 rows triggered by "cgo client lands"**: `pkg/dcgm/client_cgo.go` is an explicit placeholder ("PLACEHOLDER" caps in the doc comment; exposes `cgo-placeholder` variant string in `tracecore receivers list`).
- **34 "stale path" candidates**: most resolved to common Go identifiers (`process.pid`, `plog.LogRecord`) or files with the same name in multiple packages. These are false positives from path-stripped greps; the rows' true anchors are correct in context.

## Root cause

The two findings applied came from PR #133 shipping work that closed two specific follow-up rows but didn't update the matching shards. The `MILESTONES.md § "Keeping this document current"` rule (updated in #132) now mandates this, but #133 was authored against the older single-file convention. Going forward, the rule covers `docs/followups/*.md` directly.

## Files changed

- `docs/followups/M13.md`: strike-through `Operator RUNBOOK.md expansion` row.
- `docs/followups/M3.md`: partial-ship note on chart-appversion drift gate.

## Test plan

- [x] `bash scripts/doc-check.sh` green locally (436 markdown links resolve, em-dash + en-dash diff gate clean, comment-noise diff gate clean).
- [x] Strike target verified on `main`: `components/receivers/pyspy/RUNBOOK.md` (blob `2e189a2`) exists and covers all 12 IncError kinds.
- [x] Partial-ship target verified on `main`: `scripts/chart-appversion-check.sh` (blob `9d8e2a5`) exists and is wired into `make doc-check` (Makefile L227-231).
- [x] CI green on this branch (8/8 checks passing as of body-edit time: verify, verify-test, verify-lint, verify-static, build, pr-lint, CodeQL Analyze, CodeQL).

## Sequencing

Builds on PR #132 (shard split, merged) and PR #133 (RUNBOOK + chart-appversion gate, merged). Independent of #142 (opportunistic.md curation, currently auto-merging) since they touch different shards. Merging order does not matter.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
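The audit method described above (pull backticked anchors out of a row, then check them against the tree) might look roughly like the sketch below. The actual sweep tooling is not shown in the PR, so `audit_row` and `path_like` are hypothetical names, and the "contains a slash, no spaces" heuristic is an assumption.

```python
import re
from pathlib import Path

BACKTICKED = re.compile(r"`([^`]+)`")


def path_like(token):
    """Crude heuristic: treat tokens with a slash and no spaces as paths."""
    return "/" in token and " " not in token


def audit_row(row, repo_root):
    """Map each path-like backticked anchor in `row` to whether it exists."""
    root = Path(repo_root)
    result = {}
    for token in BACKTICKED.findall(row):
        if not path_like(token):
            continue  # symbols and Make targets need a different check
        # "file.go::Symbol" anchors: only the file part is checked here.
        rel = token.split("::", 1)[0]
        result[token] = (root / rel).exists()
    return result
```

For example, `audit_row(row_text, ".")` run from a checkout root returns a dict such as `{"scripts/doc-check.sh": True}`. As the findings above stress, an anchor that exists only shows the file is there, not that the feature shipped, so a result like this is a candidate list for manual review rather than a verdict.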
5 tasks
trilamsr
added a commit
that referenced
this pull request
May 21, 2026
#147)

## Summary

Single-PR bundle of 10 low-risk follow-up actions. Each row was anchor-verified on `main` before editing; no production behavior change. Diff is 10 files, +91/-37, dominated by markdown.

**Breakdown:**

- 3 strikes (anchor shipped, row was stale)
- 1 test-only struct add (k8sevents `NodeWatchErrors`)
- 1 bash test add (`no-autoupdate-check` hit-line format lock)
- 5 doc-only clarifications / partial-ship / audit notes

## Items applied

### Strikes: anchor on `main` confirms shipped

1. **M3.md L188 `make doc-check`.** `scripts/doc-check.sh` header reads "verify every Test\*/Fuzz\*/Benchmark\* name referenced in docs"; wired into `make doc-check` and `make ci`.
2. **M8.md L103 `docs/HARDWARE-TESTING.md` libdcgm + nv-hostengine setup.** File exists (28 hits for libdcgm/dcgm/nv-hostengine); covers Ubuntu 22.04 driver / `libdcgm-dev` / `nv-hostengine` provisioning, x86_64 + aarch64-SBSA build matrix, and the `//go:build dcgm,hardware` distinction. Doc shipped ahead of cgo client to unblock GPU-less contributors.
3. **M19.md L18 `nodeWatchErrCount` not in SnapshotCounters.** Closed by item 4 below: added `NodeWatchErrors` field symmetrically.

### Test-only struct add

4. **components/receivers/k8sevents/export_test.go.** Added `NodeWatchErrors int64` field on `CountersForTest`; `SnapshotCounters` now reads `rr.nodeWatchErrCount.Load()` symmetrically with `rr.watchErrCount.Load()`. Both call sites are keyed-init inside the same file; no external positional callers to break (grep confirmed: only 2 hits, both in `export_test.go`).

### Bash test add (M23 grep-gate format lock)

5. **scripts/no-autoupdate-check_test.sh "hit-line-format-stable".** New assertion that runs the gate against the hyphenated-go-update fixture, captures stdout (existing tests discard it), and asserts at least one line matches `^[^:]+:[0-9]+:`. Locks the parseable hit-line shape *before* the first automation consumer (CI summary, dashboard, Slack notifier) wires up, so a cosmetic tweak to the gate's message body now fails CI instead of silently breaking downstream parsers. M23.md row struck.

### Doc-only clarifications

6. **M15.md L185 falsifying-check backfill.** Anchored the "/var/lib/tracecore/ subdir governance" row's grep-falsifying-check to RFC-0010 §Proposal: `docs/rfcs/0010-containerstdout-receiver-scope.md` L177/L217/L274/L393/L407 already carry the convention ("M15 owns `/var/lib/tracecore/container_stdout/`. Future siblings reserve their own subdirectories."). Row marked `[x]`.
7. **M15.md L192 + RFC-0010 §Pod-attribution forward-pointer.** Appended one-line cross-reference at RFC-0010 L158 → `docs/followups/M15.md` "Cross-receiver rank-label reconciliation" so the deferred audit trail is discoverable from the RFC. Row marked `[x]`.
8. **M8.md L30 `tracecore debug dump` partial-ship.** `cmd/tracecore/debug.go::runDebugDump` already writes version + revision + branch + build date + Go runtime stats + registered components + redacted config to `tracecore debug dump > diagnostic.txt`. Remaining gap is "last N samples", which needs a receiver-side ring buffer (M2 carry-forward). Row kept open with partial-ship line + remaining-trigger.
9. **M3.md L153 SUPPLY-CHAIN-IDENTITY.md scope clarification.** Added one sentence noting the consolidation is a copy-and-deduplicate pass against existing `release.yml` comment blocks (cosign-sign-blob, gh-attestation-sign), not net-new authoring, so the next reader sees the actual scope of work, not a misleading "30-min write" estimate that implies green-field.
10. **otlphttp.md L182 workflow paths audit + M14.md L88 test pointer.**
    - **otlphttp**: inlined audit findings (2026-05-20). `chart.yml` and `install-bench.yml` are substrate-aware (include `cmd/tracecore/**`, `internal/**`); `kernelevents-integration.yml` and `pyspy-integration.yml` cover only `components/receivers/<name>/**` + `internal/runtime/lifecycle/**`, so a `cmd/tracecore` factory wiring or `internal/pipeline` contract change can land without re-running these integration jobs. `chaos.yml` covers `tools/failure-inject/**` + `internal/synthesis/**` only (indirect coupling, acceptable). Remaining: 6-line YAML edit per integration workflow.
    - **M14**: added inline pointer from the multi-retry slow-write fixture row to the existing single-retry baseline at `components/receivers/kineto/shutdown_test.go::TestIngest_RetryOnTruncated` so the future author has the test-shape anchor.

## Files changed

| File | LOC | Kind |
|---|---|---|
| `components/receivers/k8sevents/export_test.go` | +2 | test struct field |
| `scripts/no-autoupdate-check_test.sh` | +20 | bash test add |
| `docs/rfcs/0010-containerstdout-receiver-scope.md` | +1/-1 | inline cross-ref |
| `docs/followups/M3.md` | +9/-5 | strike + scope clarification |
| `docs/followups/M8.md` | +16/-5 | strike + partial-ship |
| `docs/followups/M14.md` | +1/-1 | test pointer |
| `docs/followups/M15.md` | +15/-8 | 2 strikes |
| `docs/followups/M19.md` | +5/-9 | strike (anchored to test add) |
| `docs/followups/M23.md` | +9/-7 | strike |
| `docs/followups/otlphttp.md` | +13/-1 | audit findings inline |

## Test plan

- [x] `go test ./components/receivers/k8sevents/...` green.
- [x] `bash scripts/no-autoupdate-check_test.sh` 10/10 assertions pass (added "hit-line-format-stable", the 10th).
- [x] `bash scripts/doc-check.sh` green (437 markdown links resolve, em-dash + en-dash diff gate clean, comment-noise diff gate clean).
- [x] Pre-commit hook ran full `make check` + `make ci` (all package tests cached/passing).
- [ ] CI green on this branch.

## Release notes

```release-notes
NONE
```

## Sequencing

Builds on `main` after PRs #132 (shard split), #133 (RUNBOOK + chart-appversion), #142 (opportunistic curation), #134 (chaos.yml row), #143 (cross-shard audit). Independent of currently-open PRs #144 (m6 integration recipes) and #145 (m3 GHCR image publish).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: Tri Lam <tri@maydow.com>
Co-authored-by: Tri Lam <tri@maydow.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
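The hit-line format lock in item 5 above protects consumers that parse the gate's output. As an illustration only, a downstream parser relying on the `^[^:]+:[0-9]+:` shape could be sketched as follows; the function name `parse_hits` and the sample output are hypothetical, since the gate's real message body is not quoted in this PR.

```python
import re

# The shape asserted by the "hit-line-format-stable" test: path:line:...
HIT_LINE = re.compile(r"^[^:]+:[0-9]+:")


def parse_hits(stdout):
    """Extract (path, line_number, message) from lines shaped like grep -n output."""
    hits = []
    for line in stdout.splitlines():
        if not HIT_LINE.match(line):
            continue  # banners, summaries, blank lines
        path, lineno, rest = line.split(":", 2)
        hits.append((path, int(lineno), rest.strip()))
    return hits


# Hypothetical gate output; only the path:line: prefix is grounded in the PR.
sample = "cmd/tool/main.go:12: go-update call found\nsummary: 1 hit\n"
parsed = parse_hits(sample)
```

If the gate's prefix changed (say, to `path (line 12):`), a parser like this would silently return an empty list, which is the failure mode the new assertion turns into a CI failure.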
6 tasks
trilamsr
added a commit
that referenced
this pull request
May 21, 2026
…149)

## Summary

Two commits, both follow-on from PR #147:

1. **`[ci] integration paths: add substrate to kernelevents + pyspy`**: closes the workflow-paths audit gap surfaced in PR #147's `docs/followups/otlphttp.md` "Workflow paths trigger" row.
2. **`[docs] MILESTONES: backfill rubric blocks for M1, M2, M4, M9`**: closes `docs/followups/M3.md` "Backfill Foundation milestone rubrics" row.

Both follow-up rows are marked `[x]` in their respective shards with strike-through and ship-evidence.

## Commit 1: integration workflow paths

PR #147's audit found that `.github/workflows/kernelevents-integration.yml` and `pyspy-integration.yml` `paths:` filters cover only:

- `components/receivers/<name>/**`
- `internal/runtime/lifecycle/**`

So a change to `cmd/tracecore` factory wiring, `internal/pipeline` contract, or `internal/selftelemetry` surface could land without re-running either integration suite, even though receiver behavior depends on all three substrates.

This commit adds the three substrate path patterns to both `push:` and `pull_request:` filters on both workflows. Symmetric with `install-bench.yml` (P3-Rev1 #10 fix) and `chart.yml`. `chaos.yml` was audited and intentionally not changed; its substrate coupling runs via `tools/failure-inject/**` + `internal/synthesis/**` only.

`actionlint` clean on both workflows.

## Commit 2: MILESTONES rubric backfill

M1, M2, M4, M9 predate the per-rubric `☑` convention adopted in PR #53 and shipped as prose-only delivery summaries. This commit reformats each section to match M3 / M5b / M10+ shape:

- **Functional rubrics:** block with `☑` bullets citing RFC sections or shipped file paths.
- **Non-functional rubrics:** block for budget / policy / overhead guarantees.

Every claim was extracted from the existing prose summary; no new guarantees added. Source citations:

- **M1**: RFC-0003 (Component / Host / Factory contracts, two-phase shutdown, push-based consumers, factory map, `safe.Call`, operator UX).
- **M2**: RFC-0006 (`/metrics` + `/healthz` + `/readyz`, `selftelemetry.Receiver`, O2 SLO gauges, three OTel divergences closed).
- **M4**: `.golangci.yml` + `Makefile` + `scripts/` (no RFC; convention is the tooling files themselves).
- **M9**: RFC-0007 (`/dev/kmsg` + journald via one `source` interface, NVRM Xid extraction, RE2 filters compile at Start, trace context propagation, non-Linux stubs, overhead budget).

Also fixed three stale `docs/FOLLOWUPS.md` references that survived the shard split (PR #132):

- MILESTONES.md L210 M21 carry-forward → `docs/followups/M3.md`
- MILESTONES.md L269 benchstat → `docs/followups/opportunistic.md`
- MILESTONES.md L552 M8 carry-forward → `docs/followups/M8.md`

## Files changed

| File | Commit | LOC |
|---|---|---|
| `.github/workflows/kernelevents-integration.yml` | 1 | +6 |
| `.github/workflows/pyspy-integration.yml` | 1 | +6 |
| `docs/followups/otlphttp.md` | 1 | +5/-5 (strike + audit closure) |
| `MILESTONES.md` | 2 | +48/-12 (4 rubric blocks + 3 link fixes) |
| `docs/followups/M3.md` | 2 | +5/-4 (strike) |

## Release notes

```release-notes
NONE
```

## Test plan

- [x] `make ci` green (verify, verify-lint, verify-static, verify-test, build, vet, golangci-lint, zizmor, actionlint, govulncheck, fuzz 30s).
- [x] `bash scripts/doc-check.sh` green (em-dash + en-dash diff gate clean, comment-noise diff gate clean, 452 markdown links resolve).
- [x] `actionlint` clean on both modified workflows.
- [x] `bash scripts/no-autoupdate-check_test.sh` 10/10 assertions pass.
- [x] `release-doc-parity` clean, `chart-appversion-check` clean, `alert-check` clean.
- [ ] CI green on this branch.

## Sequencing

Builds on PR #147 (merged) which surfaced both follow-up items. Independent of any currently-open PRs.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Signed-off-by: Tri Lam <tri@maydow.com>
Co-authored-by: Tri Lam <tri@maydow.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
docs/FOLLOWUPS.md was a 2,837-line single file that generated merge conflicts on nearly every parallel branch that happened to touch it. Root cause: monolithic file = one conflict surface.

- Split into 19 shards under docs/followups/ (per milestone + opportunistic / skipped / otlphttp + cross-cutting _needs-prod-data.md / _needs-gpu.md) so concurrent branches touching different milestones touch different files: zero conflict.
- .gitattributes declares docs/followups/*.md merge=union so even concurrent edits to the same shard auto-merge as long as the edits are on different lines; only same-line edits still raise a real conflict.
- _needs-prod-data.md and _needs-gpu.md collect resource-gated items independent of milestone, so the resource gate is visible at a glance (and the M13 pyspy "empirical validation" phase no longer hides 4 mixed-gate items inside a milestone shard).
- docs/FOLLOWUPS.md is preserved as a redirect stub with a migration table so existing inbound links across the repo still resolve.
- docs/followups/README.md is the new convention doc: which shard owns what, conflict-free append rules, decay rule.

Documentation updates

Convention-bearing docs updated to point future contributors at the shard layout:

- AGENTS.md: anchors pointing into FOLLOWUPS now point at specific shards (docs/followups/M3.md etc.).
- CONTRIBUTING.md: "Keep the tracking docs current" rule now references shards + filing convention.
- MILESTONES.md: "Keeping this document current" rule covers docs/followups/*.md and notes the append-to-bottom + union-merge mechanics.
- docs/README.md: index entry redirects; subdirectory entry added for followups/.
- docs/STYLE-docs.md: "trigger condition in FOLLOWUPS" → "trigger condition in a docs/followups/ shard".
- docs/maintainership.md: currency rule pointer updated.
- docs/notes/{pr-workflow,reviews,autonomous-feature-flow}.md: three lessons rebased onto the shard layout.

Integrity

- scripts/doc-check.sh green: 410 markdown links resolve, 53 test references verified, baseline (unverified) marker count = 7, banned-phrase lint clean, em-dash diff gate clean.

How items are filed going forward

- Resource-gated items: _needs-prod-data.md / _needs-gpu.md.
- Milestone items: M*.md.
- Skipped items: skipped.md (or milestone-scoped skip subsection).
- Opportunistic items: opportunistic.md.

Append to the bottom of the relevant section so union-merge handles concurrent additions cleanly.
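Why append-to-bottom pairs well with `merge=union` can be seen in a toy model. The sketch below is a deliberate simplification, not Git's merge algorithm: it only handles the case where both branches extend the same base by appending, and the function name `union_append` is hypothetical.

```python
def union_append(base, ours, theirs):
    """Combine two append-only edits of `base`, keeping lines from both sides.

    A toy model of what the filing convention relies on; real union merges
    operate on diff hunks and do not guarantee any particular line order.
    """
    if ours[:len(base)] != base or theirs[:len(base)] != base:
        raise ValueError("sketch only handles append-only edits")
    ours_added = ours[len(base):]
    theirs_added = theirs[len(base):]
    return base + ours_added + theirs_added


# Two branches each append one row to the same shard section.
base = ["## Next up", "- [ ] existing row"]
branch_a = base + ["- [ ] row from branch A"]
branch_b = base + ["- [ ] row from branch B"]
merged = union_append(base, branch_a, branch_b)
```

Both appended rows survive with no conflict markers. Git's gitattributes documentation describes the built-in union driver as taking lines from both versions instead of leaving conflict markers, and warns that added lines may end up in arbitrary order, so merges where both sides edited the same existing line are worth reviewing by hand.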
Test plan
- scripts/doc-check.sh passes locally.
- docs/FOLLOWUPS.md redirect stub's migration table covers every old section heading.

🤖 Generated with Claude Code