docs(integration): PR-J — filelog/journald/k8sobjects/prometheus recipes#195
Merged
Ship the four receiver-side integration recipes named in RFC-0013 §migration PR-J. Each recipe replaces an in-tree receiver scheduled for deletion at v0.2.0:

- filelog-container.md → replaces containerstdout
- journald-kernel.md → replaces kernelevents (preserves kernelevents.xid + gpu.id per RFC-0013 §3)
- k8sobjects-events.md → replaces k8sevents (derives the 11-entry k8s.event.hint enum per RFC-0013 §3)
- prometheus-scrape.md → replaces dcgm + kueue (stamps the cross-vendor gpu.vendor resource attribute per RFC-0013 §3)

Each recipe ships a matching docs/integrations/examples/*.yaml validated end-to-end by `make validator-recipe` against the OCB-built `./_build/tracecore validate`.

New `requires-k8s-cluster` tested-against marker recognized by both scripts/doc-check.sh and scripts/validator-recipe.sh covers the k8sobjects recipe: the upstream k8sobjectsreceiver.Validate() enumerates server-preferred resources via the discovery client and cannot be exercised offline. The marker is mutation-verified: the gate fires on an unknown marker, the validator-recipe job skips with a named log line, and the recipe documents the cluster-side verification path.

Updates docs/migration/v0.1-to-v0.2.md to flip the PR-J open-item to done with file pointers. CHANGELOG entry under Wave 3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Tri Lam <tri@maydow.com>
trilamsr added a commit that referenced this pull request on May 31, 2026
) ## Summary Four amendments to `docs/rfcs/0013-distro-first-pivot.md` (plus a one-line sweep across 6 companion docs) per the scope-review findings staged before PR-I.1 / PR-K / PR-M code work begins. Pre-stages each decision in the RFC so the autonomous code PRs don't escalate mid-flight. ## Root cause #181 (RFC-0013 PR-I in-repo submodule rescope) was incomplete: 1. Sweep missed 6 companion docs still pointing at the original-design external `tracecoreai/tracecore-components` repo. 2. §7 listed 3 GitHub workflows for deletion that were already removed pre-RFC and 1 issue template (`component-bug-dcgm.yml`) that was already removed pre-RFC. 3. PR-K was a single 4-receiver-delete-plus-chart-migration mega-PR with no decoupling of the `internal/synthesis/patterns/` k8sevents dep break, which is on PR-I.2's critical path. 4. PR-I.1 conflated the `module/go.mod` scaffolding with the `git mv` + package rename, blocking PR-I.1a from landing without PR-B2 even though the scaffolding step has no nccl_fr dep. Mid-flight discovery during merge cycle: merged commit #188 (`feat(pivot): PR-B2 — port dcgm off internal selftel + lifecycle`) reused the `PR-B2` slug for a PR-B1-shape dcgm port (which is moot since dcgm is deleted entirely in PR-F), creating a naming collision against the canonical PR-B2 defined in the RFC — the nccl_fr `internal/{pipeline,consumer,runtime/lifecycle}` → upstream port that hard-gates the PR-I.1b `git mv`. ## Amendments 1. **§6/§7 sweep miss (Amendment 1)**: Remove surviving `tracecoreai/tracecore-components` external-repo references across `docs/getting-started.md`, `docs/followups/M11.md`, `docs/followups/M19.md`, `docs/FOLLOWUPS.md`, `docs/rfcs/0003-pipeline-runtime-and-component-contract.md`, `AGENTS.md`. All re-pointed at `github.com/tracecoreai/tracecore/module` per RFC-0013 §6. Verified zero surviving stale refs. 2. 
**§7 nonexistent workflow entries (Amendment 2)**: Collapse `pyspy-integration.yml`, `python-publish.yml`, `kernelevents-integration.yml` deletion rows into one row marked "already removed pre-RFC". `component-bug-dcgm.yml` also already removed. Only `component-bug-kernelevents.yml` survives for PR-K. §4 v0.3.0 row + PR-M slug cleaned for consistency. 3. **§migration PR-K sub-slice (Amendment 3)**: - **PR-K.1** — sever `internal/synthesis/patterns/` from `components/receivers/k8sevents` via local model types in `internal/synthesis/patterns/model.go`. No deletions. **Unblocks PR-I.2.** - **PR-K.2** — delete `components/receivers/{clockreceiver,kernelevents,k8sevents,containerstdout}` + migrate ~86 test fixtures + delete `tools/failure-inject/xidgen/` + keep `tools/failure-inject/ncclhang/`. - **PR-K.3** — chart cleanup: flip `containerstdout-on-values.yaml` to filelog+container-stanza, delete `containerstdout-rbac.yaml`, delete `.github/ISSUE_TEMPLATE/component-bug-kernelevents.yml`, ship `NOTES.txt` deprecation + values-key removal. 4. **§migration PR-I sub-slice + PR-B2 promotion (Amendment 4)**: - **PR-B2** reframed as hard gate for PR-I.1b: port `components/receivers/nccl_fr` off `internal/{pipeline,consumer,runtime/lifecycle}` to upstream `go.opentelemetry.io/collector/{component,receiver,consumer,pipeline}`. Slug-collision note added re: merged #188. - **PR-I.1a** — `module/go.mod` + root `go.work` + `builder-config.yaml` `replaces:` skeleton. No file movement. Tag `module/v0.0.1` (genesis tag, validates the tagging contract). - **PR-I.1b** — `git mv components/receivers/nccl_fr → module/receiver/ncclfrreceiver` + `git mv pkg/nccl/fr_parser → module/pkg/nccl/fr_parser` + rename Go package `nccl_fr` → `ncclfrreceiver` + update all importers. Hard-gated on PR-B2. No new tag; next bump is `module/v0.1.0` at PR-I.2. - **PR-I.2** — `rankjoinprocessor` + `patterndetectorprocessor` net-new. Hard-gated on PR-K.1. 
Tag `module/v0.1.0` (first version pinned in `builder-config.yaml` for v0.2.0). Also: PR-J marked `(landed, #195)` with note that recipe docs landed but chart-values compat map follows in PR-K.3. ## Adversarial review (5 lenses, inline) - **(a) PR slug internal consistency**: PR-I.1b ↔ PR-B2 ↔ PR-I.2 ↔ PR-K.1 bidirectional gates all match. PR-J landed marker consistent with #195. §4 v0.2.0 row has pre-existing drift (mentions dcgm+kueue in v0.2.0 when PR-F+#168 already deleted them in v0.1.0) — out of these 4 amendments' scope; flag for follow-up. - **(b) PR-B2 hard-gate naming**: tightened from "Hard gate for PR-I.1" to "Hard gate for PR-I.1b" — accurate because PR-I.1a is scaffolding-only with no file movement. - **(c) Sub-PR numbering collision**: #188 explicitly addressed in slug-collision note. #185/#186/#187/#193/#194/#196 are PR-B1-shape ports for non-nccl receivers (no PR-slug label in their commits), no collision. - **(d) Stale external-repo refs**: `grep -rn "tracecoreai/tracecore-components" docs/ AGENTS.md README.md` returns zero hits post-amendment. - **(e) Cross-reference link integrity**: `docs/migration/v0.1-to-v0.2.md` references `#migration--rollout` and `#3-customer-stable-telemetry-contracts`; both anchors preserved (§ headers unchanged). `make doc-check` confirms 526 markdown links resolve. ## Test plan - [x] `make doc-check` — 526 markdown links resolve, 0 stale refs, banned-phrase lint clean, alert-check + chart-appversion gates green. - [x] Pre-push hooks: golangci-lint clean, go vet clean, go mod verify clean. - [ ] CI doc-check + actionlint + zizmor gates pass on PR. ```release-notes NONE ``` Signed-off-by: Tri Lam <tri@maydow.com> Co-authored-by: Tri Lam <tri@maydow.com>
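The stale-reference check in lens (d) can be turned into a small repeatable gate. The following is a minimal sketch, not the repository's actual script: the function name `check_stale_refs` is hypothetical, and it assumes a `grep` that supports `-r` (GNU and BSD grep both do).

```shell
#!/bin/sh
# check_stale_refs PATTERN PATH...
# Hypothetical helper: returns 1 and lists the hits when the fixed string
# PATTERN still appears under any PATH; returns 0 when nothing matches.
# Caveat: unreadable or missing paths are silently treated as "no hits".
check_stale_refs() {
  pattern="$1"
  shift
  if hits="$(grep -rnF -- "$pattern" "$@" 2>/dev/null)"; then
    printf 'stale refs found:\n%s\n' "$hits"
    return 1
  fi
  echo "no stale refs for $pattern"
}

# Example invocation, mirroring the grep quoted in the review notes:
#   check_stale_refs "tracecoreai/tracecore-components" docs/ AGENTS.md README.md
```

Returning a status code instead of exiting lets a calling script decide whether a hit is fatal or only a warning.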
trilamsr added a commit that referenced this pull request on May 31, 2026
## Summary Reconcile the four pivot-tracking docs (`docs/rfcs/0013-distro-first-pivot.md`, `CHANGELOG.md`, `MILESTONES.md`, `docs/migration/v0.1-to-v0.2.md`) with the wave-3 (PR-B1-shape sibling ports) and wave-4 (PR-B2-shape upstream-only ports + PR-F.1 + PR-J + PR-L + PR-N) landings. Pure doc sweep — no code or config touched. ## What changed ### `docs/rfcs/0013-distro-first-pivot.md` §migration PR sequence rows updated with PR-number citations and landed markers: - **PR-A2** (landed, #189, 2026-05-30) - **PR-B2** (landed, #201) — also enumerates sibling-receiver follow-ups under PR-B2 to dispel the slug collision with #188's PR-B2-labelled dcgm port: stdoutexporter (#202), pyspy (#203), kernelevents (#208), containerstdout (#209) - **PR-F.1** (landed) — fleshed-out delete list (`internal/{selftelemetry,telemetry}` + `components/receivers/dcgm/` + `pkg/dcgm/` + one orphan clockreceiver integration test) - **PR-F.2** re-scoped — now deletes the whole `internal/{componentstatus,pipeline,pipelinebuilder,consumer,fanout,runtime/lifecycle}` bundle in one cut once the last three pipeline+consumer-importing receivers land (#204 k8sevents, #205 clockreceiver, #207 otlphttp). Per the import-graph state — `internal/componentstatus`'s only non-test consumer is `internal/pipeline`, so they delete together - **PR-G** (landed, #182), **PR-H** (landed, #183) - **PR-I.1a** (in flight — scaffold agent), **PR-I.1b** (pre-staged; gate satisfied by #201) - **PR-J** (landed, #195) — kept existing marker - **PR-K.1** (in flight — separate agent landing) - **PR-L** (landed, skeleton #179 + body #191) — flagged as living document - **PR-N** (landed, #200) — shipped at v0.1.0 ahead of v0.3.0 as a doc-only update at `docs/migration/v0.2-to-v0.3.md` ### `CHANGELOG.md` [Unreleased] - Restructured the pivot wave list as **four waves** (was three). Wave 3 enumerates PR-B1-shape sibling ports + support infra (#180-#194/#196). 
Wave 4 enumerates PR-B2-shape upstream-only ports + PR-J (#195) + PR-F.1 (#206) + PR-N (#200) + lint/TOCTOU hardening (#198/#210). - Tightened the PR-F.2 deferred note to point at the three open ports (#204/#205/#207) as the gate. ### `MILESTONES.md` - **M1** (pipeline runtime) — status row now cites PR-A2 (#189), PR-F.1 (#206), PR-F.2 gate (#204/#205/#207), PR-E (#180), retains `internal/config/` (still load-bearing for `tracecore validate`). - **M2** (self-telemetry) — status row now cites PR-F.1 (#206); flags `internal/componentstatus` as travelling with `internal/pipeline` in PR-F.2. - **M8** (DCGM receiver) — status flipped to *landed-and-replaced*: cites PR-F.1 (#206) deletion + PR-J (#195) `docs/integrations/prometheus-scrape.md` recipe. Notes the inert chart toggle retention until PR-K.3. ### `docs/migration/v0.1-to-v0.2.md` - §`internal/*` package deletion (PR-F) status flips from "not yet open" to "PR-F.1 landed (#206), PR-F.2 gated on three open ports". - Open-items checklist expanded from 5 to 13 entries — tracks every PR letter the migration guide cares about (A2 / E / F.1 / F.2 / I.1a-c / J / K.1-3 / L / N) with PR numbers and links. ## Why now Tracking docs accumulated drift across wave-3 + wave-4 because every sibling-port PR (and the support-infra PRs around them) updated the bottom of `CHANGELOG.md` but did not always touch the upstream sequencing section in RFC-0013. Per memory rule `[Keeping this document current]`: status drift is a review blocker. This PR is the consolidated catch-up; future port PRs include their RFC-row flip in-PR. ## What this PR does NOT change - No code, no config, no YAML, no chart — only the four tracking docs. - No new doc gates added; existing gates pass. - No PRs other than the four named docs are modified. ## Test plan - [x] `bash scripts/doc-check.sh` clean (33 test refs, 528 links resolve, comment-noise diff gate clean vs `origin/main`, all 13 gates green). 
- [x] Pre-commit hook (`commitlint` 72-char subject limit + DCO + AI-trailer gates) passed. - [x] Pre-push hook (`make ci-fast` equivalent: `golangci-lint`, `go vet`, `go mod verify`, `no-autoupdate-check`, `doc-check.sh`) passed on second attempt after `git fetch origin main` populated the worktree's `origin/main` ref — first push failed because the worktree previously tracked the (gone) `pr-a2-ocb-main-swap` branch, so `doc-check.sh`'s comment-noise diff-scope gate exited 128 on the missing ref. Root cause fixed by the fetch; not a workaround. - [ ] CI green on this branch. ```release-notes NONE ``` Signed-off-by: Tri Lam <tri@maydow.com> Co-authored-by: Tri Lam <tri@maydow.com>
trilamsr added a commit that referenced this pull request on May 31, 2026
## Summary RFC-0013 §migration PR-K.2: delete the four in-tree receivers (`clockreceiver`, `kernelevents`, `k8sevents`, `containerstdout`) plus the `xidgen` failure-injector (whose sole consumer was `kernelevents`'s wire shape per RFC §migration L253). PR-K.1 (#211) just severed `internal/synthesis/patterns/` + `replay` from `k8sevents`, unblocking the source-tree cut. PR-J (#195) shipped the four upstream-OTel recipes that replace the in-tree receivers in the bundled Helm-chart pipeline; this PR retires the behind-the-curtain code. Chart cleanup (values keys, DaemonSet template refs, `NOTES.txt` deprecation warnings) intentionally stays in PR-K.3 so operators get one minor of deprecation telemetry before the values shape breaks. > **Branch state note:** PR-F.2 (#215, `internal/{componentstatus,pipeline,pipelinebuilder,config,consumer,fanout,runtime/lifecycle}` deletion) and PR-I.1a (#214, `module/` Go submodule scaffold + `go.work`) both landed on main mid-flight and have been merged into this branch. The `_test.go` placeholder-name migration originally scoped here (RFC §migration's "~86 fixture refs" line) is now moot — PR-F.2 deleted every `internal/*` file that held those refs, so the migration target evaporated. `make ci` + `make verify` + `make build` re-run green against the merged tree. ```release-notes [CHANGE] In-tree receivers clockreceiver/kernelevents/k8sevents/containerstdout deleted in favor of PR-J upstream-OTel recipes. xidgen failure-injector deleted alongside kernelevents (sole consumer). ``` ## What lands ### Deletions — receivers (4) | Path | Files | LOC | Replacement (PR-J recipe) | |---|---|---|---| | `components/receivers/clockreceiver/` | 10 | ~1.4k | `hostmetricsreceiver` (loadscraper @ 1s) — PR-E landed in #180. RFC-0013 §migration originally named `telemetrygeneratorreceiver` but that receiver does not exist in opentelemetry-collector-contrib (contrib #41687 + #43657 both closed `not_planned`). 
| | `components/receivers/kernelevents/` | 49 | ~13.2k | [`journaldreceiver` + `filelogreceiver` (kmsg) + OTTL Xid transform](../blob/main/docs/integrations/journald-kernel.md). Customer-stable `kernelevents.xid` + `gpu.id` attributes preserved via the OTTL transform per RFC-0013 §3. | | `components/receivers/k8sevents/` | 37 | ~6.8k | [`k8sobjectsreceiver` (watch mode on `events`) + OTTL `k8s.event.hint` transform](../blob/main/docs/integrations/k8sobjects-events.md). 11-entry hint enum preserved via the OTTL transform per RFC-0013 §3. The typed `internal/synthesis/patterns/Record` + `NodeRecord` (severed in PR-K.1) keeps M19's pod-evicted detector pinned. | | `components/receivers/containerstdout/` | 56 | ~6.4k | [`filelogreceiver` + container stanza + `file_storage` extension](../blob/main/docs/integrations/filelog-container.md). Per-rank attribution + dataloader-timing extraction move to OTTL transforms in the bundled recipe per RFC-0013 §3. | ### Deletions — supporting infra | Path | Why | |---|---| | `tools/failure-inject/xidgen/` (2 files, 281 LOC) | Sole consumer was the kernelevents wire shape per RFC-0013 §migration L253. Operators inject NVRM Xid via real `/dev/kmsg` (`sudo tee`) or `systemd-cat` against the journald recipe. | | `install/kubernetes/tracecore/ci/containerstdout-on-values.yaml` (41 LOC) | Chart-render fixture for the deleted receiver. Remaining chart fixtures (`all-receivers-off-values.yaml`, `one-receiver-on-values.yaml`, `pyspy-on-values.yaml`) untouched; their `clockreceiver: enabled: false` / `kernelevents: enabled: false` rows survive to PR-K.3's chart cleanup. | | `.github/ISSUE_TEMPLATE/component-bug-kernelevents.yml` | Receiver-specific bug template; no surviving receiver. | ### failure-inject CLI surface `failure-inject xid --code=N [--format=…] [--count=N]` removed. `failure-inject {pod-evict,nccl-hang,cpu-steal}` unchanged. `tools/failure-inject/testdata/golden.sha256` drops the two `xid` golden rows. 
`.github/workflows/chaos.yml` drops the two two-run-determinism steps that exercised the xid byte-determinism contract; pod-evict's determinism + the golden-SHA replay loop survive. ### Tooling shed - `Makefile`: retire `test-extras-sustained` body (was kernelevents- only; now `@true` — target retained so downstream automation has a stable name; future sustained-load suites slot back in). Retire `test-extras-fuzz-kmsg` / `test-extras-fuzz-journald`; `test-extras-fuzz` loop drops to nccl-fr only. Drop the kernelevents row from `test-extras-race`. Empty the `bench-check` for-loop (k8sevents was the only baseline; PR-F.2 already rewrote the comment block to reflect this — the merge keeps both edits aligned). - `go.mod`: sheds `k8s.io/{api,apimachinery,client-go,klog/v2,kube-openapi,utils}`, `sigs.k8s.io/{json,randfill,structured-merge-diff/v6,yaml}`, `gopkg.in/{evanphx/json-patch.v4,inf.v0}` — the dep cluster k8sevents dragged in. `go.uber.org/goleak` also dropped post- merge (was held only by PR-F.2's-deleted `internal/pipeline/chaos_test.go`). ### Doc + comment sweep Comment-only references to deleted receivers in `components/exporters/{otlphttp,stdoutexporter}/`, `components/receivers/{nccl_fr,pyspy}/`, `internal/synthesis/patterns/{doc,model,verdict}.go` rewired to surviving references (or to upstream recipe pointers). `docs/README.md` per-component-docs table drops five dead links (caught by `doc-check.sh`'s rotten-link gate — that is in fact how I caught the last drift). `bench/install/README.md` tick-alias note + schema-v2-rename note updated. `tools/failure-inject/README.md` xid section removed; status banner rewritten. 
CHANGELOG (new PR-K.2 entry under Unreleased + wave-4 paragraph re-balanced), MILESTONES (M1 + M9 + M10 + M15 status lines flipped from "DELETED at v0.2.0" → "DELETED in PR-K.2" with file pointers to the integration recipes), `docs/migration/v0.1-to-v0.2.md` (PR-K.1 + PR-K.2 checkboxes flipped, PR-F.2 + PR-I.1a status block updated post-merge), AGENTS.md (queued-for-deletion paragraph updated to current reality) all swept. ### What evaporated mid-flight Originally this PR was also going to migrate a handful of `_test.go` files that held placeholder-name string references to `clockreceiver` (in `internal/pipeline/saferun_test.go`, `internal/config/fuzz_test.go`, `internal/pipelinebuilder/fuzz_test.go`). PR-F.2 (#215) deleted those files outright while this branch was being drafted, so the migration target disappeared. No follow-up needed. ## Net LOC delta ``` 192 files changed, 120 insertions(+), 28,927 deletions(-) ``` ## What is intentionally NOT in this PR - **Helm-chart `receivers.clockreceiver` / `receivers.kernelevents` / `receivers.containerstdout` toggles + DaemonSet template refs + `containerstdout-rbac.yaml` template** — stays for PR-K.3 so operators see `NOTES.txt` deprecation warnings before the values shape breaks. The toggles are already inert post-PR-A2 (enabling any of them in `values.yaml` crashes the OCB binary at boot because the factories are not registered) — keeping them as no-ops for one minor preserves operator UX. - **`internal/{componentstatus,pipeline,…}/` deletion** — already done in PR-F.2 (#215) before this branch landed; merged in. - **Chart `values.yaml` `# clockreceiver — in-tree heartbeat retired by RFC-0013 PR-A2` style retire-banners** — stays for PR-K.3 alongside the actual values-keys cleanup. - **`tools/failure-inject/ncclhang/`** — KEPT. Used by `pkg/nccl/fr_parser/synthesize_test.go` + `bench/overhead/nccl_fr_bench_test.go`; this is the canonical example of a failure-injector that survives the v0.2.0 cut. 
## Root cause The four receivers + xidgen survived only because PR-K.1's pattern- library severance from `k8sevents` was still in flight. With #211 merged at the start of this session, the deletion is unblocked. There is no workaround being applied here — this PR is the root-cause deletion of the in-tree receivers themselves, which RFC-0013 §migration set as the v0.2.0 deletion target. ## Test plan - [x] `make ci` green post-merge (verify + license + nccl-fr-rce-gate + register-lint + actionlint + zizmor + coverage-check + ci-fuzz-nccl-fr + govulncheck + doc-check + no-autoupdate-check + build). - [x] `make verify` green post-merge. - [x] `make build` green post-merge (OCB compile against `builder-config.yaml` with `GOWORK=off` per the PR-I.1a isolation guard yields `./_build/tracecore`). - [x] `go test ./...` green across the post-merge tree. - [x] Hard pre-flight: zero external Go importers for any deletion target (`clockreceiver`, `kernelevents`, `k8sevents`, `containerstdout`, `xidgen`) — re-verified after the merge. - [ ] CI: `chart-render` job validates the surviving chart fixtures (`all-receivers-off-values.yaml`, `one-receiver-on-values.yaml`, `pyspy-on-values.yaml`); the deleted `containerstdout-on-values.yaml` drop-out should not regress conftest coverage of the containerstdout-allowlist / operational-invariant rules because the template guard still fires when `containerstdout.enabled=true` is set in any future values file. PR-K.3 will reassess once the chart-side keys go. - [ ] CI: `install (kind)` job continues to render the bench tracecore-values.yaml against the OCB binary (hostmetricsreceiver heartbeat surface). - [ ] CI: `harness-determinism (amd64/arm64)` job no longer runs the xid byte-determinism steps; pod-evict + golden-SHA loop survive. Expected: 2 fewer steps per matrix arm. 
## Gates that should fail this PR if I missed something - `doc-check`'s "dead markdown link" gate would catch any surviving link into the deleted dirs (caught the `docs/README.md` regressions on the first run; fixed and re-verified). - `go vet ./...` would catch a stale import; ran clean. - `golangci-lint run ./...` would catch unused imports or dead code introduced by the sweep; 0 issues reported. - `go mod tidy -diff` would catch a missing dep prune; ran clean after the post-merge prune of `go.uber.org/goleak`. Refs RFC-0013 §migration PR-K.2. Signed-off-by: Tri Lam <tri@maydow.com> Co-authored-by: Tri Lam <tri@maydow.com>
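The "zero external Go importers" pre-flight from the test plan can be approximated with a text search. This is a rough sketch under stated assumptions: `find_importers` is a hypothetical helper, the import path passed in is whatever the module actually uses (the example path in the comment is illustrative only), and `grep --include` is a GNU/BSD extension rather than POSIX. A text match is weaker than a real import-graph query, so treat it as a first pass.

```shell
#!/bin/sh
# find_importers ROOT TARGET_DIR IMPORT_PATH
# Hypothetical helper: prints every .go file under ROOT, outside
# ROOT/TARGET_DIR, that contains a quoted string starting with IMPORT_PATH.
# Prints nothing when the deletion target has no outside importers.
find_importers() {
  root="$1"
  target="$2"
  import_path="$3"
  grep -rlF --include='*.go' -- "\"$import_path" "$root" 2>/dev/null |
    grep -vF -- "$root/$target/" || true
}

# Example invocation (the module path shown is illustrative):
#   find_importers . components/receivers/clockreceiver \
#     example.com/mod/components/receivers/clockreceiver
```

An empty result for every deletion target is the condition the pre-flight is looking for.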
This was referenced Jun 1, 2026
trilamsr added a commit that referenced this pull request on Jun 2, 2026
…460) (#466) ## Summary Closes #460. The `exit 0` on `scripts/doc-check.sh` ran unconditionally whenever `docs/FAILURE-MODES.md` carried no `Test*`/`Fuzz*`/`Benchmark*` identifiers (its current state on `main` — `grep -c` = 0), silently bypassing every gate below it. Fix scopes the skip to the Go-test parity block only (if/else, not `exit`), then surfaces and fixes the dead refs the gates were supposed to be catching. ## Root cause Commit a57883f (#13) shipped `doc-check.sh` with one gate — the Go-test name parity check — so `[ -z "$referenced" ] && exit 0` was correct then. PRs #28, #56, #115, #131, #144, #149, #195, #234, #241, #443, #455, #459 (and others) appended gates **below** that line without recognising they'd become dead code whenever `FAILURE-MODES.md` lost its `Test*` references. PR #459 worked around the bug by placing its new YAML gate *above* line 99 and tracked the root cause separately as #460. ## What surfaced Once `exit 0` was removed, three real issues fired: 1. **Dead `.md` link**: `docs/FOLLOWUPS.md` → `followups/otlphttp.md`. The shard was never committed to `main`'s ancestry. Folded into the existing "Shards deleted post-v0.2.0 as fully resolved-via-pivot" prose block (sibling treatment to M9, M14, M16). 2. **Banned-phrase hits** (3x `production-grade`): reworded in `docs/cut-criteria.yaml.md` (2x) and `install/kubernetes/tracecore/README.md` (1x) to falsifiable language. 3. **`docs/getting-started.md` block cap**: 7 fenced bash/sh blocks. The M6 cap of 5 was set for the quickstart only — `## Install via Helm` and `## Air-gapped install` are alternate deployment paths that landed post-M6 and aren't part of the quickstart budget. Rescoped the gate to count blocks inside the `## Walkthrough` H2 section only (1 block, well under cap). 
## Gate count Empirically verified via `grep -c '^doc-check: '` on `make doc-check` output on a clean tree: | State | Status lines emitted | Gates the early-exit was hiding | |---|---|---| | Pre-fix on `main` (post-#459) | 3 (trust-posture, YAML cross-link, parity-skip) | 14 | | Post-fix this PR (post-rebase) | 17 | 0 | The "14 gates hidden" number is invariant across the rebase: it counts gates placed below the early-exit line. The "3 → 17" total reflects post-#459 reality on `main`; pre-#459 baseline was "2 → 16" (the figure originally in this PR body), and #459 itself worked around the bug by placing its YAML gate above line 99. ## Mutation tests Each gate below the original early-exit was confirmed to fire post-fix: | Mutation | Gate expected to fire | Exit code post-mutation | Exit code post-restore | |---|---|---|---| | Inject `[bad](nonexistent-ghost.md)` into `docs/FOLLOWUPS.md` | markdown link-rot | 1 | 0 | | Append `blazing-fast` + `rock-solid` to `docs/getting-started.md` | banned-phrase lint | 1 | 0 | | Delete `<!-- tested-against: ... 
-->` from `docs/integrations/datadog.md` | M6 recipe markers | 1 | 0 | ## Test plan - [x] `make doc-check` exits 0 on clean tree (re-run post-rebase onto origin/main; 17 status lines) - [x] 3 mutation tests above each toggle exit 1 → 0 across mutate / restore - [x] Pre-push hooks green: golangci-lint (0 issues), `go vet ./...`, `go mod verify`, `attribute-namespace-check` (100 attrs, all documented), `register-lint`, `actionlint`, `zizmor`, `deprecation-check`, `no-autoupdate-check` - [x] Rebased onto current `origin/main` (includes #459, #461, #462, #456); no conflicts; gate count re-verified empirically post-rebase - [x] No changes to gates above line 99 (the trust-posture callout + YAML cross-link gate from #459 still run and emit unchanged status lines) ## Self-grade **A+** — root cause named in commit body (a57883f #13 with one gate; gates appended below without exit-path awareness); 3 mutation tests (success criteria required 1–2); rescoped the getting-started gate to match M6 intent rather than papering over the surfaced overflow; the `[ -z "$referenced" ]` legitimate skip is preserved via if/else (not `:` no-op, which would have left the `defined=` / `orphans=` block running on empty input); gate count corrected empirically post-rebase per reviewer B feedback. ```release-notes - fix(ci): `scripts/doc-check.sh` no longer exits 0 at the Go-test parity gate when `docs/FAILURE-MODES.md` carries no `Test*` references. 14 gates below that line (link-rot, banned-phrase, M6 recipe markers, etc.) are now actually enforced on every `make doc-check` invocation. Closes #460. ``` --------- Signed-off-by: Tri Lam <tree@lumalabs.ai>
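The early-exit bug described in this commit is easy to reproduce in miniature. The sketch below is illustrative only: the two gate functions are hypothetical stand-ins for the real gates in `scripts/doc-check.sh`, and each runner uses a subshell function body so that `exit` ends only that run.

```shell
#!/bin/sh
# Hypothetical stand-ins for two gates in a doc-check style script.
gate_parity() { echo "doc-check: parity ok ($1)"; }
gate_link_rot() { echo "doc-check: link-rot ok"; }

# Buggy shape: an empty $referenced ends the whole run, so every gate
# placed after this line becomes dead code.
run_buggy() (
  referenced="$1"
  [ -z "$referenced" ] && exit 0
  gate_parity "$referenced"
  gate_link_rot
)

# Fixed shape: the skip is scoped to the parity gate via if/else, and the
# gates after it always run.
run_fixed() (
  referenced="$1"
  if [ -z "$referenced" ]; then
    echo "doc-check: parity skipped (no Test*/Fuzz*/Benchmark* references)"
  else
    gate_parity "$referenced"
  fi
  gate_link_rot
)

run_buggy ""   # emits no status lines at all
run_fixed ""   # emits the skip line, then the link-rot line
```

Counting `doc-check: ` status lines on each variant, as the commit does with `grep -c`, makes the number of hidden gates visible.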
What this PR does
Ships the four receiver-side integration recipes named in RFC-0013 §migration PR-J. Each recipe replaces an in-tree receiver scheduled for deletion at v0.2.0 (PR-K), wires the matching upstream contrib component(s) already bundled into the OCB binary, and preserves every customer-stable telemetry contract from RFC-0013 §3 so operator alerts survive the receiver swap.
- `filelog-container.md` → replaces `containerstdout` (`k8sattributesprocessor`)
- `journald-kernel.md` → replaces `kernelevents` (`kernelevents.xid`, `gpu.id`)
- `k8sobjects-events.md` → replaces `k8sevents` (`k8s.event.hint` enum)
- `prometheus-scrape.md` → replaces `dcgm` + `kueue` (`gpu.vendor`)

Each recipe ships a matching `docs/integrations/examples/*.yaml` validated end-to-end by `make validator-recipe` against the OCB-built `./_build/tracecore validate`.

Adversarial findings caught + fixed in this PR (root causes, not symptoms)
- k8sobjectsreceiver puts the Event in `body`, not `attributes["object"]`. Initial draft used `attributes["object"]["reason"]` in every OTTL `where` clause. Reading the upstream emit path (`unstructured_to_logdata.go:75-78`) confirmed the receiver does `record.Body().SetEmptyMap()` + `FromRaw(e.Object)`: the entire watch payload (`type` + `object`) lives in `body`, not attributes (attributes hold only `k8s.resource.name`, `event.domain`, `event.name`). All eleven hint statements rewritten to `body["object"]["reason"]`.
- journald and kmsg have different body shapes; a single shared OTTL transform would runtime-error. `journaldreceiver` emits `body=map` (every journald field keyed verbatim, confirmed in `pkg/stanza/operator/input/journald/input.go:175-208`). `filelogreceiver` on `/dev/kmsg` keeps `body=string` because the regex_parser writes captures to `attributes` by default (`helper/parser.go:21` `ParseTo: NewAttributeField()`). The initial draft fanned both receivers into one pipeline with one transform; `IsMatch(body, ...)` would type-error against the journald map at runtime. Restructured into two pipelines (`logs/kmsg` with `transform/kmsg_xid`, `logs/journald` with `transform/journald_service_name`) so each transform sees the body shape it was written against.
- YAML `:` inside OTTL regex character classes breaks confmap parse. The error `cannot resolve the configuration: retrieved value (type=string) cannot be used as a Conf` surfaced at validate time. Root cause: an unquoted `[0-9a-fA-F:.]` OTTL regex value triggered YAML mapping-key parsing on the embedded `:`. Single-quoted every OTTL statement in `journald-kernel.yaml`; added a comment + matching failure-mode row so future edits keep the quoting.
- `k8sobjectsreceiver.Validate()` makes a live API call. The receiver's `Config.Validate()` calls `getValidObjects()` → `getDiscoveryClient()` → `ServerPreferredResources()` to enumerate valid object names BEFORE iterating `c.Objects`. Without a reachable API server `tracecore validate` exits non-zero with `KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT must be defined`. This is upstream-by-design; there is no in-repo workaround. Fix: new `<!-- tested-against: requires-k8s-cluster -->` marker recognized by both `scripts/doc-check.sh` (accepts) and `scripts/validator-recipe.sh` (skips with a named log line). The k8sobjects recipe carries this marker AND documents the cluster-side verification path operators can run to validate the recipe in their own environment. Mutation-verified: removing the marker fails the gate; an unknown marker fails the gate; the known marker passes both gates.
- `file_storage` extension rejects `directory must exist` at validate. Fixed by adding `create_directory: true` (upstream-supported field) so the checkpoint path is materialized at boot without an initContainer.

Linked issue(s)
Refs RFC-0013 §migration PR-J. Closes the corresponding open-item in `docs/migration/v0.1-to-v0.2.md`.

Release notes
Docs-only PR. No operator-visible runtime change in this cut — v0.2.0 still gates on PR-K (in-tree-receiver deletion) and PR-L (final migration guide body). Recipes ship now so operators can prepare migration plans against the actual configs that will be the supported path.
Checklist
- `make validator-recipe` gate (5 validated, 3 skipped on this non-Linux dev host) and shape-asserted by `make doc-check` (8/8 recipes carry examples-file + tested-against + last-verified ≤180d).
- `make check` runs green continuously while editing; `make verify` (= check + license + generate-fixtures-check + build-tags + nccl-fr-rce-gate + register-lint + actionlint + zizmor + doc-check + no-autoupdate-check) passes before pushing.
- `git commit -s`).
- `STYLE.md`: N/A (no new components; recipes are docs + YAML examples consumed by existing M6 gates).

Test plan
- `make build`: OCB-assembles `./_build/tracecore` from `builder-config.yaml`.
- `./_build/tracecore validate --config=docs/integrations/examples/filelog-container.yaml` → exit 0.
- `./_build/tracecore validate --config=docs/integrations/examples/journald-kernel.yaml` → exit 0.
- `./_build/tracecore validate --config=docs/integrations/examples/prometheus-scrape.yaml` → exit 0.
- `./_build/tracecore validate --config=docs/integrations/examples/k8sobjects-events.yaml` → expected non-zero on a dev laptop (no API server); validator-recipe gate skips with the `requires-k8s-cluster` log line. Operator-side verification path documented in the recipe.
- `make validator-recipe`: `5 validated, 3 skipped (non-linux host) of 8 recipe(s)`.
- `make doc-check`: 8 integration recipes + 8 examples pass all M6 gates; `docs/README.md` indexes every recipe.
- `make verify`: full pre-push gate green (check, license-check, generate-fixtures-check, build-tags, nccl-fr-rce-gate, register-lint, actionlint, zizmor `No findings to report`, doc-check, no-autoupdate-check).
- `requires-k8s-cluster` marker: replace it with `some-unknown-marker` in `k8sobjects-events.md`, watch doc-check name the failing file; restore.
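The marker handling exercised by the mutation test can be sketched as a three-way classification. This is a hedged illustration, not the code in `scripts/doc-check.sh` or `scripts/validator-recipe.sh`: the function names are hypothetical, `example-known-marker` is a made-up placeholder for whatever markers the gates already accept, and only `requires-k8s-cluster` and the `<!-- tested-against: ... -->` comment shape come from this PR.

```shell
#!/bin/sh
# marker_of FILE
# Hypothetical helper: prints the value of the first
# "<!-- tested-against: VALUE -->" comment found in FILE.
marker_of() {
  sed -n 's/.*<!-- tested-against: \(.*\) -->.*/\1/p' "$1" | head -n 1
}

# classify_marker MARKER
# Prints the action a validator-style gate would take for MARKER.
classify_marker() {
  case "$1" in
    requires-k8s-cluster) echo "skip" ;;      # needs a live API server
    example-known-marker) echo "validate" ;;  # placeholder, not a real marker
    *) echo "fail" ;;                         # unknown or missing marker
  esac
}

# Example invocation:
#   classify_marker "$(marker_of docs/integrations/k8sobjects-events.md)"
```

Failing on anything unrecognized is what makes the `some-unknown-marker` mutation observable; a permissive default would let a typo silently disable validation.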