[docs] follow-up curation: validate Next-up, promote to Issues, strike shipped #142
Merged
…k Item 5 shipped
Validation pass on docs/followups/opportunistic.md "Next up" items
against the current code tree (per the pr-workflow.md "audit stale
rows before implementing" lesson):
- Item 5 (DCGM adopt lifecycle.Lifecycle): SHIPPED. receiver.go
L67-70 + L81 confirm the helper is in use. Replaced the row with
a "Closed" marker citing the validation date.
- Item 8 (resolveIncErrorCall CallExpr): anchor corrected
(kernelevents/runbook_kinds_test.go, not dcgm/). Trigger fired:
selftelemetry.Kind("...") call sites now exist in
capturing_test.go and classify_internal_test.go.
- Items 2, 7, 8, 10, 11, 12, 13: still active. Promoted to GitHub
Issues (#135-#141) with enhancement + help wanted labels. Each
row in opportunistic.md gains a *Tracked:* link.
- Item 4 (tracecore.dev/schemas migration): still active but
blocked on external hosting, not contributor pickup. No Issue.
Resource-bucket re-audit across every milestone shard: zero net
moves. Other shards' GPU/production mentions are trigger conditions
or describe what the receiver observes, not resource gates on the
follow-up itself. Documented as "Coverage scan" notes in
_needs-prod-data.md and _needs-gpu.md so a future curator knows
the audit ran without leaving the buckets empty-looking.
This builds on PR #132 (which split the monolithic FOLLOWUPS.md
into per-milestone shards); merging order does not strictly matter
since the changes are append-style under the union-merge driver.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
fdebda8 to
897acc8
trilamsr added a commit that referenced this pull request on May 21, 2026
…ards (#143)

## Summary

Cross-shard anchor + trigger audit of `docs/followups/*.md`. Targets the same staleness class that the opportunistic.md curation pass (#142) addressed, applied to every other milestone shard.

**Method:** for each of 192 rows across 14 milestone/component shards, extracted backticked paths/symbols/Make-targets and verified each anchor against the current tree. For rows with explicit `*Trigger:*` conditions, evaluated whether the condition has fired.

## Findings applied

- **M13 §pyspy `Operator RUNBOOK.md expansion`: SHIPPED.** PR #133 created `components/receivers/pyspy/RUNBOOK.md` with per-kind triage for all 12 RFC-0009 IncError kinds. Row marked `[x]` with ship reference; original ask text struck through.
- **M3 §`chart-appversion` drift gate: PARTIALLY SHIPPED.** PR #133's `scripts/chart-appversion-check.sh` compares `Chart.yaml.appVersion` against `internal/version/version.go::Version` (in-tree drift). Original row asked for drift against the actual binary release tag (`gh release view`). In-tree half is now covered; binary-tag half still gated on M21 publishing a real tag. Row kept open with explicit partial-ship line.

## Findings not applied (with reasoning)

Total: 72 candidate rows surfaced by the sweep, partitioned 29 + 9 + 34. Each bucket's rejection reasoning below.

- **29 candidate "all anchors exist" rows**: anchors resolved because they're common files (`receiver.go`, `config.go`), not because the specific feature shipped. Examples: `docs/followups/M14.md` L70 concurrent ingest race-detector test (counters exist, test does not); `docs/followups/otlphttp.md` L25 `MaxBodyBytes` config (zero grep hits for the feature); `docs/followups/M8.md` L35 `tracecore validate --show-defaults` (subcommand absent from `cmd/tracecore/`).
- **9 trigger candidates** (5 + 4):
  - **5 rows triggered by "M11 NVML lands"**: M11 in `MILESTONES.md` is the NCCL_fr receiver, not NVML. NVML is a future trigger.
  - **4 rows triggered by "cgo client lands"**: `pkg/dcgm/client_cgo.go` is an explicit placeholder ("PLACEHOLDER" caps in the doc comment; exposes `cgo-placeholder` variant string in `tracecore receivers list`).
- **34 "stale path" candidates**: most resolved to common Go identifiers (`process.pid`, `plog.LogRecord`) or files with the same name in multiple packages. These are false positives from path-stripped greps; the rows' true anchors are correct in context.

## Root cause

The two findings applied came from PR #133 shipping work that closed two specific follow-up rows but didn't update the matching shards. The `MILESTONES.md § "Keeping this document current"` rule (updated in #132) now mandates this, but #133 was authored against the older single-file convention. Going forward, the rule covers `docs/followups/*.md` directly.

## Files changed

- `docs/followups/M13.md`: strike-through `Operator RUNBOOK.md expansion` row.
- `docs/followups/M3.md`: partial-ship note on chart-appversion drift gate.

## Test plan

- [x] `bash scripts/doc-check.sh` green locally (436 markdown links resolve, em-dash + en-dash diff gate clean, comment-noise diff gate clean).
- [x] Strike target verified on `main`: `components/receivers/pyspy/RUNBOOK.md` (blob `2e189a2`) exists and covers all 12 IncError kinds.
- [x] Partial-ship target verified on `main`: `scripts/chart-appversion-check.sh` (blob `9d8e2a5`) exists and is wired into `make doc-check` (Makefile L227-231).
- [x] CI green on this branch (8/8 checks passing as of body-edit time: verify, verify-test, verify-lint, verify-static, build, pr-lint, CodeQL Analyze, CodeQL).

## Sequencing

Builds on PR #132 (shard split, merged) and PR #133 (RUNBOOK + chart-appversion gate, merged). Independent of #142 (opportunistic.md curation, currently auto-merging) since they touch different shards. Merging order does not matter.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
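The anchor-audit method described in the #143 summary can be sketched roughly as follows. This is a minimal illustration, not the tooling the sweep actually used: the function names `extractAnchors` and `missingAnchors`, the path heuristic, and the sample row are all assumptions made for the example. It treats a backticked token as a path only if it contains a slash, and it strips a `::Symbol` suffix of the kind the follow-up rows use.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"regexp"
	"strings"
)

// backtick matches one inline-code span such as `scripts/doc-check.sh`.
var backtick = regexp.MustCompile("`([^`]+)`")

// extractAnchors (hypothetical helper) returns the backticked tokens in a
// follow-up row that look like repo-relative paths. A `file.go::Symbol`
// token is reduced to its file part. Bare names such as `receiver.go` are
// skipped because a basename alone matches files in many packages.
func extractAnchors(row string) []string {
	var out []string
	for _, m := range backtick.FindAllStringSubmatch(row, -1) {
		tok := m[1]
		if i := strings.Index(tok, "::"); i >= 0 {
			tok = tok[:i]
		}
		if strings.Contains(tok, "/") && !strings.ContainsAny(tok, " *<>") {
			out = append(out, tok)
		}
	}
	return out
}

// missingAnchors (hypothetical helper) reports which anchors do not exist
// under root.
func missingAnchors(root string, anchors []string) []string {
	var missing []string
	for _, a := range anchors {
		if _, err := os.Stat(filepath.Join(root, filepath.FromSlash(a))); err != nil {
			missing = append(missing, a)
		}
	}
	return missing
}

func main() {
	// Illustrative row, not copied from any shard.
	row := "- [ ] Extend `cmd/tracecore/debug.go::runDebugDump`; see `receiver.go` and `make ci`."
	anchors := extractAnchors(row)
	fmt.Println("anchors:", anchors)
	fmt.Println("missing:", missingAnchors(".", anchors))
}
```

A check like this can only flag anchors that are absent. As the "Findings not applied" buckets show, an anchor that resolves does not prove the feature shipped, so every hit still needs a manual read of the row in context.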
trilamsr added a commit that referenced this pull request on May 21, 2026
#147)

## Summary

Single-PR bundle of 10 low-risk follow-up actions. Each row was anchor-verified on `main` before editing; no production behavior change. Diff is 10 files, +91/-37, dominated by markdown.

**Breakdown:**

- 3 strikes (anchor shipped, row was stale)
- 1 test-only struct add (k8sevents `NodeWatchErrors`)
- 1 bash test add (`no-autoupdate-check` hit-line format lock)
- 5 doc-only clarifications / partial-ship / audit notes

## Items applied

### Strikes: anchor on `main` confirms shipped

1. **M3.md L188 `make doc-check`.** `scripts/doc-check.sh` header reads "verify every Test\*/Fuzz\*/Benchmark\* name referenced in docs"; wired into `make doc-check` and `make ci`.
2. **M8.md L103 `docs/HARDWARE-TESTING.md` libdcgm + nv-hostengine setup.** File exists (28 hits for libdcgm/dcgm/nv-hostengine); covers Ubuntu 22.04 driver / `libdcgm-dev` / `nv-hostengine` provisioning, x86_64 + aarch64-SBSA build matrix, and the `//go:build dcgm,hardware` distinction. Doc shipped ahead of cgo client to unblock GPU-less contributors.
3. **M19.md L18 `nodeWatchErrCount` not in SnapshotCounters.** Closed by item 4 below, which added the `NodeWatchErrors` field symmetrically.

### Test-only struct add

4. **components/receivers/k8sevents/export_test.go.** Added `NodeWatchErrors int64` field on `CountersForTest`; `SnapshotCounters` now reads `rr.nodeWatchErrCount.Load()` symmetrically with `rr.watchErrCount.Load()`. Both call sites are keyed-init inside the same file; no external positional callers to break (grep confirmed: only 2 hits, both in `export_test.go`).

### Bash test add (M23 grep-gate format lock)

5. **scripts/no-autoupdate-check_test.sh "hit-line-format-stable".** New assertion that runs the gate against the hyphenated-go-update fixture, captures stdout (existing tests discard it), and asserts at least one line matches `^[^:]+:[0-9]+:`. Locks the parseable hit-line shape *before* the first automation consumer (CI summary, dashboard, Slack notifier) wires up: a cosmetic tweak to the gate's message body now fails CI instead of silently breaking downstream parsers. M23.md row struck.

### Doc-only clarifications

6. **M15.md L185 falsifying-check backfill.** Anchored the "/var/lib/tracecore/ subdir governance" row's grep-falsifying-check to RFC-0010 §Proposal: `docs/rfcs/0010-containerstdout-receiver-scope.md` L177/L217/L274/L393/L407 already carry the convention ("M15 owns `/var/lib/tracecore/container_stdout/`. Future siblings reserve their own subdirectories."). Row marked `[x]`.
7. **M15.md L192 + RFC-0010 §Pod-attribution forward-pointer.** Appended one-line cross-reference at RFC-0010 L158 → `docs/followups/M15.md` "Cross-receiver rank-label reconciliation" so the deferred audit trail is discoverable from the RFC. Row marked `[x]`.
8. **M8.md L30 `tracecore debug dump` partial-ship.** `cmd/tracecore/debug.go::runDebugDump` already writes version + revision + branch + build date + Go runtime stats + registered components + redacted config to `tracecore debug dump > diagnostic.txt`. Remaining gap is "last N samples", which needs a receiver-side ring buffer (M2 carry-forward). Row kept open with partial-ship line + remaining-trigger.
9. **M3.md L153 SUPPLY-CHAIN-IDENTITY.md scope clarification.** Added one sentence noting the consolidation is a copy-and-deduplicate pass against existing `release.yml` comment blocks (cosign-sign-blob, gh-attestation-sign), not net-new authoring, so the next reader sees the actual scope of work rather than a misleading "30-min write" estimate that implies green-field.
10. **otlphttp.md L182 workflow paths audit + M14.md L88 test pointer.**
    - **otlphttp**: inlined audit findings (2026-05-20). `chart.yml` and `install-bench.yml` are substrate-aware (include `cmd/tracecore/**`, `internal/**`); `kernelevents-integration.yml` and `pyspy-integration.yml` cover only `components/receivers/<name>/**` + `internal/runtime/lifecycle/**`, so a `cmd/tracecore` factory wiring or `internal/pipeline` contract change can land without re-running these integration jobs. `chaos.yml` covers `tools/failure-inject/**` + `internal/synthesis/**` only (indirect coupling, acceptable). Remaining: 6-line YAML edit per integration workflow.
    - **M14**: added inline pointer from the multi-retry slow-write fixture row to the existing single-retry baseline at `components/receivers/kineto/shutdown_test.go::TestIngest_RetryOnTruncated` so the future author has the test-shape anchor.

## Files changed

| File | LOC | Kind |
|---|---|---|
| `components/receivers/k8sevents/export_test.go` | +2 | test struct field |
| `scripts/no-autoupdate-check_test.sh` | +20 | bash test add |
| `docs/rfcs/0010-containerstdout-receiver-scope.md` | +1/-1 | inline cross-ref |
| `docs/followups/M3.md` | +9/-5 | strike + scope clarification |
| `docs/followups/M8.md` | +16/-5 | strike + partial-ship |
| `docs/followups/M14.md` | +1/-1 | test pointer |
| `docs/followups/M15.md` | +15/-8 | 2 strikes |
| `docs/followups/M19.md` | +5/-9 | strike (anchored to test add) |
| `docs/followups/M23.md` | +9/-7 | strike |
| `docs/followups/otlphttp.md` | +13/-1 | audit findings inline |

## Test plan

- [x] `go test ./components/receivers/k8sevents/...` green.
- [x] `bash scripts/no-autoupdate-check_test.sh` 10/10 assertions pass (added "hit-line-format-stable", the 10th).
- [x] `bash scripts/doc-check.sh` green (437 markdown links resolve, em-dash + en-dash diff gate clean, comment-noise diff gate clean).
- [x] Pre-commit hook ran full `make check` + `make ci` (all package tests cached/passing).
- [ ] CI green on this branch.
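The "hit-line-format-stable" assertion from item 5 is a bash test in the repository; the same check can be sketched in Go to show what the format lock actually constrains. This is an illustration under assumptions: `hasParseableHit` and the sample gate output are hypothetical, and only the regular expression `^[^:]+:[0-9]+:` comes from the PR text.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// hitLine is the shape the PR locks: "<path>:<line>:" at the start of a line.
var hitLine = regexp.MustCompile(`^[^:]+:[0-9]+:`)

// hasParseableHit (hypothetical helper) reports whether at least one line of
// the gate's captured stdout starts with a path:line: prefix.
func hasParseableHit(stdout string) bool {
	for _, line := range strings.Split(stdout, "\n") {
		if hitLine.MatchString(line) {
			return true
		}
	}
	return false
}

func main() {
	// Illustrative gate output; the real fixture and message text may differ.
	stable := "scanning fixtures\ncmd/tool/update.go:42: found go-update import\n"
	drifted := "found go-update import in cmd/tool/update.go\n"

	fmt.Println(hasParseableHit(stable))  // true
	fmt.Println(hasParseableHit(drifted)) // false
}
```

The second input stands for the kind of cosmetic message rewording the lock is meant to catch: the information is still present, but a downstream parser splitting on `path:line:` would no longer find it.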
## Release notes

```release-notes
NONE
```

## Sequencing

Builds on `main` after PRs #132 (shard split), #133 (RUNBOOK + chart-appversion), #142 (opportunistic curation), #134 (chaos.yml row), #143 (cross-shard audit). Independent of currently-open PRs #144 (m6 integration recipes) and #145 (m3 GHCR image publish).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: Tri Lam <tri@maydow.com>
Co-authored-by: Tri Lam <tri@maydow.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
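The test-only struct add from item 4 of #147 follows a common Go pattern: a snapshot struct built with keyed fields from atomic counters. A minimal sketch, assuming the counters are `atomic.Int64` values (suggested by the `.Load()` calls quoted in the PR) and using a stand-in `receiver` type, a hypothetical `WatchErrors` field name, and a guessed `SnapshotCounters` signature:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// receiver is a stand-in for the k8sevents receiver; only the two counters
// named in the PR text are modeled here.
type receiver struct {
	watchErrCount     atomic.Int64
	nodeWatchErrCount atomic.Int64
}

// CountersForTest mirrors the test-only snapshot struct. NodeWatchErrors is
// the field the PR adds; WatchErrors is an assumed name for its sibling.
type CountersForTest struct {
	WatchErrors     int64
	NodeWatchErrors int64
}

// SnapshotCounters reads both counters the same way. Keyed initialization
// means adding a field cannot silently shift positional arguments.
func SnapshotCounters(rr *receiver) CountersForTest {
	return CountersForTest{
		WatchErrors:     rr.watchErrCount.Load(),
		NodeWatchErrors: rr.nodeWatchErrCount.Load(),
	}
}

func main() {
	rr := &receiver{}
	rr.watchErrCount.Add(1)
	rr.nodeWatchErrCount.Add(2)
	fmt.Printf("%+v\n", SnapshotCounters(rr))
}
```

Because every construction site uses keyed fields, the grep in item 4 (two hits, both in `export_test.go`) is enough to show the new field breaks no caller.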
trilamsr pushed a commit that referenced this pull request on May 21, 2026
Sync feature branch with main per the merge-not-rebase policy documented in CONTRIBUTING.md (commit ddf86f7). Main moved 6 PRs ahead during this branch's lifetime:

- PR #143 (followups sweep)
- PR #134 (chaos.yml pattern-pod-evicted)
- PR #142 (follow-up curation)
- PR #144 (M6 integration recipes)
- PR #146 (kineto MaxEvents stub)
- PR #147 (followups bundle)

Conflicts expected in CHANGELOG.md and docs/followups/M3.md (both additive).

# Conflicts:
#	CHANGELOG.md
## Summary

Curation pass on `docs/followups/opportunistic.md` "Next up" rows. Builds on #132 (the shard split, now merged) and lands alongside #133 (RUNBOOK + chart-appversion audit, also merged).

Per the `pr-workflow.md` "audit stale rows before implementing" lesson: any row naming a specific file/symbol gets a grep/ls check before pickup.

- **Item 5 (DCGM adopt `lifecycle.Lifecycle`): shipped.** `components/receivers/dcgm/receiver.go` L67-70 + L81 already use `lifecycle.Lifecycle`. Row replaced with a "Closed" marker citing the validation date.
- **Item 8 (`resolveIncErrorCall` CallExpr): anchor corrected.** The row cited `dcgm/runbook_kinds_test.go`; correct path is `kernelevents/runbook_kinds_test.go`. Also: trigger fired, since `selftelemetry.Kind("...")` call sites now exist (`capturing_test.go`, `classify_internal_test.go`).
- **Items 2, 7, 8, 10, 11, 12, 13: still active.** Promoted to GitHub Issues (#135-#141) with `enhancement` + `help wanted` labels. Each opportunistic row gains a `*Tracked:* #NN` back-link.

## Resource-bucket re-audit

Scanned every milestone shard for follow-ups gated on GPU hardware or production data; zero net moves. Other shards' GPU/production mentions are trigger conditions ("operator reports X") or describe what the receiver observes, not resource gates on the follow-up itself. M14 explicitly self-notes this. Documented as "Coverage scan (2026-05-20)" notes in `_needs-prod-data.md` and `_needs-gpu.md` so the next curator knows the audit ran.

## Files changed

- `docs/followups/opportunistic.md`: 7 Tracked-links, Item 5 strike, Item 8 anchor + trigger update.
- `docs/followups/_needs-prod-data.md`: coverage-scan note.
- `docs/followups/_needs-gpu.md`: coverage-scan note.

## Test plan

- `bash scripts/doc-check.sh` green locally (436 markdown links resolve, em-dash gate clean, 7 baseline unverified markers non-growing).
- `grep -n lifecycle.Lifecycle components/receivers/dcgm/receiver.go` shows the field at L81.

## Rebase note

Branch was originally based on the pre-merge `worktree-followups-split` branch (#132). After #132 squash-merged + #133 landed, a straight rebase tried to replay #132's already-merged commits and conflicted with their squashed form on `docs/notes/`. Resolution: reset branch to `origin/main` and cherry-pick only the self-contained curation commit (fdebda8→897acc8). Lesson: branch follow-on work off the same base the upstream PR targets (`main`), not off the feature branch.

🤖 Generated with Claude Code