chore(pivot): PR-K.3 — drop helm dead toggles for deleted receivers#234
Merged
PR-K.2 deleted clockreceiver/dcgm/kernelevents/containerstdout in-tree receivers + stdoutexporter; the chart still carried their toggles, RBAC, volume mounts, and policy exemptions. Wipe them so the values surface matches what the OCB binary can actually load.

values.yaml drops the 5 retired blocks; helpers/daemonset/NOTES drop the conditional branches; containerstdout-rbac.yaml deleted entirely; rego policy drops the runAsUser=0 exemption and the M15 operational invariants that no longer have a receiver to gate. README replaces the dcgm overlay example with the otlphttp pattern. CI fixtures shrink to just the toggles that still resolve.

helm lint, helm template against all 3 ci fixtures, conftest against the rendered DaemonSet + every bad-*.yaml fixture, make check, and make doc-check all green.

Refs: #220
Signed-off-by: Tri Lam <tri@maydow.com>
This was referenced May 31, 2026
trilamsr pushed a commit that referenced this pull request Jun 1, 2026
trilamsr added a commit that referenced this pull request Jun 1, 2026
## Summary

Release-prep cut for v0.2.0. Pure version-string bump; no code or dep changes.

- `builder-config.yaml` `dist.version`: `0.1.0-m9-alpha` → `0.2.0`.
- `install/kubernetes/tracecore/Chart.yaml` `version`: `0.1.0` → `0.2.0`; `appVersion`: `"0.1.0-m9-alpha"` → `"0.2.0"` (kept in lockstep by `scripts/chart-appversion-check.sh`).
- `CHANGELOG.md`: promote the existing `[Unreleased]` body to a new `## [0.2.0] - 2026-05-31` section with a one-paragraph user-facing summary on top (OCB-built binary, OTel v0.130, in-repo `module/` submodule, self-tel rename, ko-published image, RFC-0013 pointer). A fresh empty `[Unreleased]` header sits above it.
- `docs/migration/v0.1-to-v0.2.md`: flip 3 stale `[ ]` checkboxes (PR-I.1b #224, PR-I.2 #246, PR-K.3 #234) to `[x]`; all three landed pre-bump.

After this merges, the operator follow-up is the manual `git tag -s v0.2.0 && git push origin v0.2.0`; that triggers `.github/workflows/release.yml` (goreleaser + ko + cosign + SLSA provenance). The tag push is intentionally NOT in this PR per release-sequencing discipline.

## Verification

- `make check`: green (fmt, tidy-check, lint, vet, mod-verify) pre-bump AND post-bump.
- `bash scripts/chart-appversion-check.sh`: green: `Chart.yaml appVersion (0.2.0) matches builder-config.yaml dist.version`.
- `make build`: succeeded; OCB-built `./_build/tracecore` linked.
- `./_build/tracecore --version`: prints `tracecore version v0.1.0-m1-206-g69f8981-dev` pre-tag (expected: no v0.2.0 tag yet; `git describe --match 'v*' --dirty=-dev` resolves against `v0.1.0-m1`). The release pipeline pins `TRACECORE_VERSION="$TAG"` for both goreleaser and ko-publish, so the in-image binary published from the v0.2.0 tag will report `v0.2.0` verbatim. The hardcoded `dist.version` fallback covers tarball-extract / no-git scenarios.
- `make test`: green (root unit tests, race detector, all packages).
- `cd module && go test ./...`: green (submodule: nccl_fr parser, patterns, replay, patterndetectorprocessor, rankjoinprocessor, ncclfrreceiver).

## Known follow-up (not in this PR)

- CHANGELOG `[0.2.0]` section still carries pre-pivot M1-M11 `Added` rows that describe code PR-F.1/F.2/K.2 (same section) explicitly delete. Honest history but operators reading top-down can't tell what landed in v0.2.0 vs what was pre-pivot scaffolding. Filed for a separate sweep PR; kept out of this scope-of-3 contract.

## Test plan

- [ ] Reviewer confirms `Chart.yaml::appVersion` matches `builder-config.yaml::dist.version` after merge (`bash scripts/chart-appversion-check.sh`).
- [ ] Reviewer confirms `./_build/tracecore --version` will report `v0.2.0` when built from the v0.2.0 tag commit.
- [ ] Operator follow-up post-merge: `git tag -s v0.2.0 -m "tracecore v0.2.0" <merge-sha> && git push origin v0.2.0`; release workflow fires automatically from the `push: tags: ['v*']` trigger.

## Release notes

```release-notes
- release(v0.2.0): bump builder-config.yaml dist.version and install/kubernetes/tracecore/Chart.yaml version + appVersion from 0.1.0-m9-alpha to 0.2.0; promote CHANGELOG [Unreleased] body to a tagged [0.2.0] - 2026-05-31 section; sweep stale migration-guide checkboxes.
```

---------

Signed-off-by: Tri Lam <tri@maydow.com>
Co-authored-by: Tri Lam <tri@maydow.com>
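The lockstep rule between `Chart.yaml` `appVersion` and `builder-config.yaml` `dist.version` can be sketched as a small shell check. The real `scripts/chart-appversion-check.sh` is not shown in this thread, so the function names and the awk-based parsing below are assumptions for illustration; only the two file names and the success message are taken from the commit body.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of an appVersion / dist.version lockstep check.
# This is NOT the project's actual scripts/chart-appversion-check.sh.

# Print the first appVersion value found in a Chart.yaml, quotes stripped.
chart_app_version() {
  awk '$1 == "appVersion:" { gsub(/"/, "", $2); print $2; exit }' "$1"
}

# Print the version key nested under the top-level dist: block.
dist_version() {
  awk '
    /^dist:/   { in_dist = 1; next }   # entering the dist mapping
    /^[^ \t#]/ { in_dist = 0 }         # any other top-level key ends it
    in_dist && $1 == "version:" { gsub(/"/, "", $2); print $2; exit }
  ' "$1"
}

# Compare the two values; return non-zero when they are missing or differ.
# $1 = path to Chart.yaml, $2 = path to builder-config.yaml
check_lockstep() {
  lock_app=$(chart_app_version "$1")
  lock_dist=$(dist_version "$2")
  if [ -n "$lock_app" ] && [ "$lock_app" = "$lock_dist" ]; then
    echo "Chart.yaml appVersion ($lock_app) matches builder-config.yaml dist.version"
  else
    echo "mismatch: appVersion='$lock_app' dist.version='$lock_dist'" >&2
    return 1
  fi
}
```

Under these assumptions, a call such as `check_lockstep install/kubernetes/tracecore/Chart.yaml builder-config.yaml` would fail the build whenever one of the two strings is bumped without the other.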
trilamsr added a commit that referenced this pull request Jun 2, 2026
…460) (#466)

## Summary

Closes #460. The `exit 0` on `scripts/doc-check.sh` ran unconditionally whenever `docs/FAILURE-MODES.md` carried no `Test*`/`Fuzz*`/`Benchmark*` identifiers (its current state on `main`: `grep -c` = 0), silently bypassing every gate below it. Fix scopes the skip to the Go-test parity block only (if/else, not `exit`), then surfaces and fixes the dead refs the gates were supposed to be catching.

## Root cause

Commit a57883f (#13) shipped `doc-check.sh` with one gate (the Go-test name parity check), so `[ -z "$referenced" ] && exit 0` was correct then. PRs #28, #56, #115, #131, #144, #149, #195, #234, #241, #443, #455, #459 (and others) appended gates **below** that line without recognising they'd become dead code whenever `FAILURE-MODES.md` lost its `Test*` references. PR #459 worked around the bug by placing its new YAML gate *above* line 99 and tracked the root cause separately as #460.

## What surfaced

Once `exit 0` was removed, three real issues fired:

1. **Dead `.md` link**: `docs/FOLLOWUPS.md` → `followups/otlphttp.md`. The shard was never committed to `main`'s ancestry. Folded into the existing "Shards deleted post-v0.2.0 as fully resolved-via-pivot" prose block (sibling treatment to M9, M14, M16).
2. **Banned-phrase hits** (3x `production-grade`): reworded in `docs/cut-criteria.yaml.md` (2x) and `install/kubernetes/tracecore/README.md` (1x) to falsifiable language.
3. **`docs/getting-started.md` block cap**: 7 fenced bash/sh blocks. The M6 cap of 5 was set for the quickstart only; `## Install via Helm` and `## Air-gapped install` are alternate deployment paths that landed post-M6 and aren't part of the quickstart budget. Rescoped the gate to count blocks inside the `## Walkthrough` H2 section only (1 block, well under cap).

## Gate count

Empirically verified via `grep -c '^doc-check: '` on `make doc-check` output on a clean tree:

| State | Status lines emitted | Gates the early-exit was hiding |
|---|---|---|
| Pre-fix on `main` (post-#459) | 3 (trust-posture, YAML cross-link, parity-skip) | 14 |
| Post-fix this PR (post-rebase) | 17 | 0 |

The "14 gates hidden" number is invariant across the rebase: it counts gates placed below the early-exit line. The "3 → 17" total reflects post-#459 reality on `main`; pre-#459 baseline was "2 → 16" (the figure originally in this PR body), and #459 itself worked around the bug by placing its YAML gate above line 99.

## Mutation tests

Each gate below the original early-exit was confirmed to fire post-fix:

| Mutation | Gate expected to fire | Exit code post-mutation | Exit code post-restore |
|---|---|---|---|
| Inject `[bad](nonexistent-ghost.md)` into `docs/FOLLOWUPS.md` | markdown link-rot | 1 | 0 |
| Append `blazing-fast` + `rock-solid` to `docs/getting-started.md` | banned-phrase lint | 1 | 0 |
| Delete `<!-- tested-against: ... -->` from `docs/integrations/datadog.md` | M6 recipe markers | 1 | 0 |

## Test plan

- [x] `make doc-check` exits 0 on clean tree (re-run post-rebase onto origin/main; 17 status lines)
- [x] 3 mutation tests above each toggle exit 1 → 0 across mutate / restore
- [x] Pre-push hooks green: golangci-lint (0 issues), `go vet ./...`, `go mod verify`, `attribute-namespace-check` (100 attrs, all documented), `register-lint`, `actionlint`, `zizmor`, `deprecation-check`, `no-autoupdate-check`
- [x] Rebased onto current `origin/main` (includes #459, #461, #462, #456); no conflicts; gate count re-verified empirically post-rebase
- [x] No changes to gates above line 99 (the trust-posture callout + YAML cross-link gate from #459 still run and emit unchanged status lines)

## Self-grade

**A+**: root cause named in commit body (a57883f #13 with one gate; gates appended below without exit-path awareness); 3 mutation tests (success criteria required 1–2); rescoped the getting-started gate to match M6 intent rather than papering over the surfaced overflow; the `[ -z "$referenced" ]` legitimate skip is preserved via if/else (not `:` no-op, which would have left the `defined=` / `orphans=` block running on empty input); gate count corrected empirically post-rebase per reviewer B feedback.

```release-notes
- fix(ci): `scripts/doc-check.sh` no longer exits 0 at the Go-test parity gate when `docs/FAILURE-MODES.md` carries no `Test*` references. 14 gates below that line (link-rot, banned-phrase, M6 recipe markers, etc.) are now actually enforced on every `make doc-check` invocation. Closes #460.
```

---------

Signed-off-by: Tri Lam <tree@lumalabs.ai>
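The early-exit bug described in this commit can be reproduced in miniature. The sketch below is not the real `scripts/doc-check.sh`: the function names and the three gate names are hypothetical, and each function body runs in a subshell so that `exit` ends only that function. It shows why `[ -z "$referenced" ] && exit 0` hides every later gate, and how an if/else keeps the skip local to the parity block.

```shell
#!/usr/bin/env bash
# Hypothetical miniature of a multi-gate check script; gate names are illustrative.

# Buggy shape: an empty $referenced ends the whole run, so the gates
# appended below the early exit never execute.
run_gates_buggy() (
  referenced="$1"
  echo "doc-check: gate-above ok"
  [ -z "$referenced" ] && exit 0
  echo "doc-check: parity ok"
  echo "doc-check: link-rot ok"
)

# Fixed shape: the skip is scoped to the parity block only, and later
# gates run regardless of whether anything was referenced.
run_gates_fixed() (
  referenced="$1"
  echo "doc-check: gate-above ok"
  if [ -z "$referenced" ]; then
    echo "doc-check: parity skipped (nothing referenced)"
  else
    echo "doc-check: parity ok"
  fi
  echo "doc-check: link-rot ok"
)

# Count status lines the same way the commit body does.
run_gates_buggy "" | grep -c '^doc-check: '   # prints 1
run_gates_fixed "" | grep -c '^doc-check: '   # prints 3
```

Counting status lines with `grep -c '^doc-check: '` is the same measurement the gate-count table relies on, which makes a silently skipped gate visible as a drop in the count.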
PR-K.2 (#217) deleted the in-tree `clockreceiver`, `dcgm`, `kernelevents`, `containerstdout` receivers and `stdoutexporter` from the binary; the chart still carried their values blocks, RBAC, volume mounts, and policy exemptions. Operators turning any toggle on would get a render that the OCB binary cannot load. This drops the dead surface: pre-1.0, no operator-deprecation tax owed (per single-contributor latitude).

Partial close of #220: chart bits only. The `k8sevents` values key referenced in the issue is not touched here (the receiver still ships); the `component-bug-kernelevents.yml` issue template referenced in #220 was already absent on `main`.

## What changed

- `values.yaml`: deletes the 5 retired toggle blocks (`clockreceiver`, `dcgm`, `kernelevents`, `containerstdout`, `stdoutexporter`); keeps `pyspy` (in-tree-only pending OTel Profiles GA, per scope).
- `templates/_helpers.tpl`: drops the `containerstdout` chart-only-keys omit branch in `renderedConfig`; refreshes the retire-soon comment to present-state language.
- `templates/daemonset.yaml`: drops the conditional `automountServiceAccountToken`, conditional root `securityContext`, conditional downward-API env block, and conditional volumes/volumeMounts. The mainline `restricted`-PSS path is the only path left.
- `templates/containerstdout-rbac.yaml`: deleted.
- `templates/NOTES.txt`: replaces the multi-toggle WARNING with a pyspy-only note.
- `policies/conftest/tracecore.rego`: drops `containerstdout_enabled` exemption from the `runAsNonRoot` / `runAsUser=0` / `runAsGroup=0` rules, and removes the full M15 operational-invariants block (4 deny rules + 2 helpers) that no longer have a receiver to gate.
- `README.md`: replaces the dcgm overlay worked example with the otlphttp pattern; updates the defaults table + lead paragraph from clockreceiver+stdoutexporter to hostmetrics+debug; rewrites the kernelevents-OOMKilled troubleshooting note as a generic high-volume-receiver bullet; trims deviations table.
- `ci/{all-receivers-off,one-receiver-on,pyspy-on}-values.yaml`: drops `enabled: false` entries for the deleted toggles (no render impact, just cleanup).

## Verification

Locally, against the worktree:

- `helm lint install/kubernetes/tracecore/`: 0 failures, 0 WARNINGs.
- `helm template` against the default values + all 3 ci fixtures: renders clean; default `automountServiceAccountToken: false`; `pyspy-on` still emits the pyspy receiver block; `all-receivers-off` correctly produces no `receivers:` and no `pipelines:`.
- `conftest` against the rendered default DaemonSet: 39/39 pass. Each `bad-*.yaml` fixture in `policies/conftest/testdata/` still denies (12/13 pass with 1 expected deny each; `bad-runasuser-0` / `bad-runasgroup-0` / `bad-runasroot` now deny without the exemption escape hatch). `good-baseline.yaml` + `good-sys-ptrace.yaml` still pass: 26/26.
- `make check` + `make doc-check`: green.

CI (`chart` workflow): exercises every gate above against an actual `helm` + `conftest` install, plus the kind-cluster e2e install, plus `tracecore validate` on the `one-receiver-on` rendered config.

Refs: #220
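The expected-deny half of the fixture check inverts the usual exit-status convention: a `bad-*.yaml` fixture is healthy only when `conftest` rejects it. A minimal sketch of that inversion follows, assuming `conftest test` exits non-zero on a deny. The helper names `expect_deny` and `check_bad_fixtures` are hypothetical and do not come from the repository, and the policy directory is passed in as an argument rather than assumed.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not part of the tracecore repository.

# Succeed only when the wrapped command fails, i.e. the policy denied.
# Exit statuses 126/127 mean the command could not run at all, which
# must not be mistaken for a deny.
expect_deny() {
  "$@" >/dev/null 2>&1 && deny_status=0 || deny_status=$?
  case "$deny_status" in
    0)       echo "expected a deny, but the check passed: $*" >&2; return 1 ;;
    126|127) echo "could not run the check at all: $*" >&2; return 2 ;;
    *)       return 0 ;;
  esac
}

# Run conftest against each fixture and require a deny from every one.
# $1 is the policy directory; the remaining arguments are fixture files.
check_bad_fixtures() {
  policy_dir="$1"
  shift
  for fixture in "$@"; do
    expect_deny conftest test --policy "$policy_dir" "$fixture" || return 1
  done
}
```

A call might look like `check_bad_fixtures policies/conftest policies/conftest/testdata/bad-*.yaml`. Treating "command not found" separately matters here because a missing `conftest` binary would otherwise look like a successful deny.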