docs(integrations): add trust-posture callout to 11 recipes (#443) #453
Recipes default to a single-cluster ClusterIP-only OTLP receiver on 0.0.0.0:4318 with no auth wired in. That is the correct shape for the sidecar/DaemonSet ingestion path the recipes document, but an operator can silently expose any of them via LoadBalancer/Ingress without realising the receiver is unauthenticated. PR #447 (issue #297) added bearer-token + mTLS shapes to docs/integrations/multi-cluster.md; this PR points operators at that upgrade path from every non-auth recipe.

Adds an identical 3-line "Trust posture" callout at the top of each of the 11 files enumerated by #443. Cross-link uses the repo-relative docs/integrations/multi-cluster.md path so the existing doc-check link gate keeps it honest.

Adds a doc-check.sh lint gate that asserts every docs/integrations/examples/*.yaml carries the callout marker (with the auth recipes and cert-manager-mtls.yaml exempted by basename).

Mutation-verified: delete the callout from any non-exempt file -> gate fails naming the file; restore -> gate passes.

Closes #443. Refs: #297, #447.

Signed-off-by: Tri Lam <tree@lumalabs.ai>
B/A/A+ Criteria
Findings
Simplification Sweep: Clean
VERDICT: A
Justification:
Merge when ready.
## Summary

Closes #455. Surfaced by PR #453 review (finding 2).

The existing `scripts/doc-check.sh` markdown link-rot gates walk `*.md` files only (`git ls-files '*.md'`), so the trust-posture callouts #453 added to 11 recipe YAMLs, every one of which cross-links `docs/integrations/multi-cluster.md`, would silently rot if that target is renamed. The "doc-check link gate keeps it honest" claim in #453's body did not actually cover the YAML side.

## Root cause

Scope gap in `doc-check.sh`. Both markdown link-rot blocks iterate `git ls-files '*.md'` and resolve link targets relative to the source markdown file. YAML files were out of scope, and the cross-link convention in `docs/**/*.yaml` is repo-root-relative (not source-file-relative), so even if scope were extended naively the resolver would point at the wrong base.

A second root cause was surfaced by reviewer B and addressed in the BLOCK-fix commit: the original gate's unanchored regex would match `docs/foo.md` substrings inside URL path segments like `https://example.com/docs/foo.md`, producing false positives. Fixed by anchoring the regex on a leading line-start or non-URL boundary character (whitespace, quote, backtick, paren, square bracket).

## Changes

- `scripts/doc-check.sh`: adds a YAML cross-link rot gate. Scope is every tracked YAML under `docs/`, both top-level (`docs/*.yaml`, e.g. `docs/cut-criteria.yaml`) and nested (`docs/**/*.yaml`). Bash globstar with `**` requires ≥1 subdir so the top-level pattern is enumerated separately; same for `.yml`. Extracts every `docs/...(md|yaml|yml)` reference (anchored regex rejects URL path segments) and asserts the target exists on disk. Single batched `grep -HoE` keeps the gate under 1s.
- `docs/integrations/examples/multi-cluster-source.yaml`: fixes one pre-existing dead reference the original gate surfaced (`docs/multi-cluster.md` → `docs/integrations/multi-cluster.md`).
- `docs/cut-criteria.yaml`: two side-effects of extending scope to top-level YAMLs.
  - `partner-outreach` rubric_check hard-coded to `"false"` (was a literal `test -f docs/partner-outreach.md` ref to a file that intentionally does not exist; rubric_check semantics are identical, since both render the criterion as PLANNED).
  - Migration-guide prose updated to point at real per-minor files (`v0.1-to-v0.2.md`, `v0.2-to-v0.3.md`) instead of the `v0.X-to-v0.Y.md` placeholder template.
- `docs/v1-rc1-cut-criteria.md`: regenerated via `make cut-criteria-render` to track the YAML edit (CI gate `cut-criteria-check` enforces parity).

## Placement note

The new gate sits *above* the test-identifier resolution block (`referenced=...`) because that block's `[ -z "$referenced" ] && exit 0` short-circuit currently skips every gate below it when `scan_paths` has no `Test*`/`Fuzz*`/`Benchmark*` references, which is its current state on `main`. That short-circuit is a separate pre-existing bug; it also silently bypasses the markdown link-rot gates this PR did NOT touch. Tracked as #465 (not #460, which was a stale reference; #465 is the actual filed issue).

## Test plan

- [x] `make doc-check` exits 0 on clean tree (output: `46 YAML cross-link(s) under docs/ resolve to on-disk files`; runtime ~0.84s wall)
- [x] Mutation A (rename target): rename `docs/integrations/multi-cluster.md` → gate exits 1, names all 12 YAMLs referencing the now-gone target; restore → gate exits 0
- [x] Mutation B (ghost path): inject `# ghost: docs/this-does-not-exist.md` into `loki.yaml` → gate exits 1; restore → gate exits 0
- [x] Mutation C (URL false-positive): inject `# upstream: https://example.com/docs/never-exists.md` into `loki.yaml` → gate exits 0 (no false positive; anchored regex rejects URL path segment)
- [x] Mutation D (delete target file): `mv docs/integrations/multi-cluster.md` to a temp dir → gate exits 1
- [x] `make cut-criteria-check` exits 0 (rendered markdown matches YAML)
- [x] Pre-push hooks green: golangci-lint, go vet, go mod verify, attribute-namespace-check, no-autoupdate-check

## Grade

B/A/A+ self-grade: **A+**. Scope swept ALL `docs/*.yaml` + `docs/**/*.yaml` + `.yml` variants (not just the two example dirs the issue named), anchored regex prevents URL false-positives, mutation-verified across 4 cases (rename / ghost / URL / delete), surfaced and fixed two pre-existing dead refs in `docs/cut-criteria.yaml`, and held the <1s timing budget.

```release-notes
- ci(doc-check): YAML files under `docs/` are now scanned for repo-relative cross-link rot. Every `docs/...(md|yaml|yml)` reference in tracked YAMLs under `docs/` (top-level and nested) must resolve to a real on-disk file or `doc-check.sh` exits non-zero. Anchored regex rejects URL path segments to avoid false positives. Closes #455.
```

---------

Signed-off-by: Tri Lam <tree@lumalabs.ai>
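
For illustration, the anchored extraction and on-disk check described in this commit message might look roughly like the sketch below. It is not the code in `scripts/doc-check.sh`: the function name `check_yaml_links` and the `dead link:` message are hypothetical, and `find` stands in for `git ls-files` so the sketch runs outside a git checkout.

```shell
# Sketch only: check_yaml_links and its messages are hypothetical names.
# Usage: check_yaml_links REPO_ROOT
# Exits 0 when every repo-root-relative docs/ reference found in YAML files
# under docs/ exists on disk, 1 otherwise.
check_yaml_links() (
  cd "$1" || exit 2

  # Boundary characters allowed directly before "docs/": whitespace, quotes,
  # backtick, open paren, open square bracket. A "/" is deliberately absent,
  # so "https://example.com/docs/foo.md" does not match.
  boundary="[[:space:]\"'\`([]"
  pattern="(^|${boundary})docs/[A-Za-z0-9_./-]+\.(md|yaml|yml)"

  # One batched grep over all YAML files; -H prefixes each hit with its file.
  refs=$(find docs -type f \( -name '*.yaml' -o -name '*.yml' \) \
           -exec grep -HoE "$pattern" {} + || true)

  status=0
  while IFS= read -r line; do
    [ -n "$line" ] || continue
    file=${line%%:*}                 # text before the first colon
    match=${line#*:}                 # matched text, maybe with a boundary char
    target="docs/${match#*docs/}"    # drop the boundary char, keep the path
    if [ ! -e "$target" ]; then
      echo "dead link: $file -> $target"
      status=1
    fi
  done <<EOF
$refs
EOF
  exit "$status"
)
```

Running `check_yaml_links .` from a repository root would print one `dead link:` line per unresolved reference. The subshell body keeps the `cd` and the working variables from leaking into the caller.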
Reviewer B flagged the PR body's "2 → 16" gate count as inaccurate and asked whether the test plan had been re-run against post-#453 state. Resolution recorded here:

Gate count, empirically verified via `grep -c '^doc-check: '` on `make doc-check` output on a clean tree:

- Pre-fix on main (after #459 landed): 3 status lines
- Post-fix this PR (after rebase onto origin): 17 status lines

The PR body's "2 → 16" was correct at PR open (before #459 merged). After rebasing onto origin/main, the YAML cross-link gate from #459 adds one more enforced gate (it was placed above the early-exit line specifically to work around the bug this PR fixes, so it ran on the pre-fix baseline too). The "14 gates below that line" framing in the release-notes block is correct for either base: it counts gates that the early-exit was hiding, which is invariant across the rebase.

Test plan re-run post-rebase: `make doc-check` exits 0 on clean tree, 17 status lines emitted, no mutation regressions.

Refresh the script header docstring to record the empirical count and reflect that doc-check now covers 17 drift patterns, not just the original Go-test-name parity check it shipped with in #13. No functional change to gate logic.

Signed-off-by: Tri Lam <tree@lumalabs.ai>
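
To make the early-exit problem concrete, here is a minimal sketch of how a `[ -z "$referenced" ] && exit 0` line hides every gate below it, and one way to scope the skip to a single gate. The function names and status messages are hypothetical; only the short-circuit shape and the `grep -c '^doc-check: '` counting method come from the text above.

```shell
# Sketch only: function names and status strings are hypothetical.

# Buggy shape: when $referenced is empty the script exits before any later
# gate runs, so no status line is emitted at all.
run_gates_buggy() (
  referenced=$1
  [ -z "$referenced" ] && exit 0
  echo "doc-check: test identifiers resolve"
  echo "doc-check: markdown links resolve"
)

# Fixed shape: the emptiness test guards only the gate that needs
# $referenced; every gate below it still runs.
run_gates_fixed() (
  referenced=$1
  if [ -n "$referenced" ]; then
    echo "doc-check: test identifiers resolve"
  fi
  echo "doc-check: markdown links resolve"
)

# Count emitted status lines the same way the commit message does.
count_status_lines() {
  grep -c '^doc-check: ' || true
}
```

For example, `run_gates_buggy '' | count_status_lines` and `run_gates_fixed '' | count_status_lines` show the difference in how many gates report when no test identifiers are referenced.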
…454) (#458)

## Summary

`docs/examples/with-telemetry.yaml` failed `tracecore validate` against the OCB binary because it referenced components and a config shape retired by RFC-0013:

```
'receivers' unknown type: "clockreceiver"
'exporters' unknown type: "stdoutexporter"
'' has invalid keys: telemetry
```

Swapped to the canonical post-pivot shape, matching `docs/getting-started.md` and the in-repo Helm chart default:

- `hostmetrics` (load scraper @ 1s) replaces `clockreceiver`. Note: `telemetrygeneratorreceiver` does not exist in opentelemetry-collector-contrib (RFC-0013 PR-E; upstream contrib issues #41687 and #43657 both closed `not_planned`). `hostmetrics` is the actual successor.
- `debug` (verbosity: detailed) replaces `stdoutexporter`.
- `service.telemetry.metrics.readers.pull.exporter.prometheus.{host,port}` replaces the legacy top-level `telemetry:` block (gone in PR-A2).
- `healthcheckextension` at `:13133/` carries liveness + readiness (the legacy block's `healthz`/`readyz` paths) via `service.extensions`.

## Root cause

Two layers:

1. **Direct**: RFC-0013 PR-A2 + PR-E retired `clockreceiver` and `stdoutexporter` and folded the top-level `telemetry:` block into upstream `service.telemetry`; the operator example was not migrated.
2. **Why it drifted undetected**: `scripts/validator-recipe.sh` only walks `docs/integrations/*.md`; `docs/examples/` has no CI gate, so the file stayed broken on main until PR #453 reviewer noticed.

A follow-up to add a `docs/examples/*.yaml` validate sweep to `make validator-recipe` (or a sibling `validator-examples` target) is the right structural fix. I'd file it as a separate issue if reviewer agrees the scope is correct.

## A+ sweep finding

Scanned all repo YAMLs for the retired symbols. One additional broken example surfaced: `module/receiver/ncclfrreceiver/example_config.yaml` still references `stdoutexporter`, and its README invokes the retired `tracecore collect` subcommand. Filed as #457 (out of scope for #454).

`docs/examples/prometheus-alerts.example.yaml` mentions `stdoutexporter` in a regex-example comment string, but the file is a Prometheus rules file (not a collector config) and structurally validates fine. Noted in #457.

## Verification

```
$ ./_build/tracecore validate --config=docs/examples/with-telemetry.yaml
# (was exit 1 pre-fix; now exit 0)
$ make validator-recipe
validator-recipe: 14 validated, 3 skipped (non-linux host) of 12 recipe(s)
PASS test 1..5; validator-recipe_test: all assertions passed
$ make smoke-quickstart
smoke-quickstart: validated 2 YAML payload(s) and shell-syntax-checked 8 shell block(s) across 2 quickstart doc(s)
```

Pre-commit gates (DCO, AI-trailer, golangci-lint, go vet, go mod verify, attribute-namespace-check) all passed.

## Test plan

- [x] `./_build/tracecore validate --config=docs/examples/with-telemetry.yaml` exits 0 (was 1)
- [x] `make validator-recipe` passes (sibling-example sanity)
- [x] `make smoke-quickstart` passes (quickstart doc sanity)
- [x] A+ sweep across `**/*.{yaml,yml}` for `clockreceiver|stdoutexporter|telemetrygeneratorreceiver`; #457 filed

```release-notes
docs(examples): fix with-telemetry.yaml so it validates against the OCB-assembled tracecore binary. Swap retired clockreceiver / stdoutexporter / top-level telemetry: block for hostmetrics + debug + service.telemetry + healthcheckextension per RFC-0013.
```

Closes #454.

Signed-off-by: Tri Lam <tree@lumalabs.ai>
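
The sweep for retired symbols across YAML files could be reproduced with something like the sketch below. The function name `find_retired_symbols` is hypothetical, and the commit message does not say which command was actually used; the sketch relies on `--include`, a GNU/BSD `grep` extension, as an assumption about the host.

```shell
# Sketch only: find_retired_symbols is a hypothetical helper name.
# Usage: find_retired_symbols ROOT
# Prints path:line:content for every YAML line that mentions a component
# name retired by RFC-0013. Exits 0 when hits exist, 1 when the tree is clean.
find_retired_symbols() {
  grep -rnE --include='*.yaml' --include='*.yml' \
    'clockreceiver|stdoutexporter|telemetrygeneratorreceiver' "$1"
}
```

A plain text match like this also reports mentions inside comments, such as the regex-example string in `docs/examples/prometheus-alerts.example.yaml`, so each hit still needs a human to decide whether the file is a collector config.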
## Summary

Closes #443. Follow-up to PR #447 (issue #297).

The 11 non-auth recipes under `docs/integrations/examples/` and `docs/examples/with-telemetry.yaml` default to a ClusterIP-only OTLP receiver on `0.0.0.0:4318` with no `auth:` extension. That is the correct shape for the sidecar/DaemonSet ingestion path each recipe documents, but an operator can silently expose any of them via LoadBalancer/Ingress and never realise the receiver is unauthenticated.

This PR points operators at the upgrade path (`docs/integrations/multi-cluster.md` §Cross-cluster authentication) from every non-auth recipe, and adds a CI lint gate so future recipes can't ship without the callout.

## Changes

- Adds a `# Trust posture: ...` callout block to the top of all 11 files listed in #443 (docs(integrations): non-multi-cluster recipes ship unauth OTLP receivers, audit + scope guidance):
  - `docs/integrations/examples/{honeycomb,tempo,clickhouse-direct,datadog,journald-kernel,k8sobjects-events,filelog-container,loki,otel-backend,prometheus-scrape}.yaml`
  - `docs/examples/with-telemetry.yaml`
- Cross-link uses the repo-relative `docs/integrations/multi-cluster.md`, so `doc-check.sh`'s existing markdown link-rot gate keeps it honest.
- Adds a `doc-check.sh` lint gate that asserts every `docs/integrations/examples/*.yaml` (plus `docs/examples/with-telemetry.yaml`) carries the `# Trust posture:` marker. Exempted by basename: the six `multi-cluster-*.yaml` files (the upgrade target itself) and `cert-manager-mtls.yaml` (not a tracecore receiver config; it holds Kubernetes manifests).

## Root cause

#447 closed the cross-cluster auth gap by adding bearer-token + mTLS shapes to `multi-cluster.md`, but did not annotate the other recipes. An operator landing on `honeycomb.md` or `tempo.md` (etc.) sees a naked `otlp` receiver on `0.0.0.0:4318` and no prose telling them the recipe assumes a single-cluster trust boundary. The fix names that trust assumption at the top of each file and points to the auth upgrade shape one click away.

## Test plan

- `bash scripts/doc-check.sh` reports `11 integration example(s) carry trust-posture callout` and exits 0
- Mutation-verified: deleted the callout from `honeycomb.yaml` → gate fails naming the file; restored → gate passes
- `make validator-recipe`: 14 recipes validated, 3 OS-skipped (`requires-k8s-cluster`, `requires-linux`), 0 failures
- `make build` succeeds (tracecore OCB binary)

Note: `docs/examples/with-telemetry.yaml` fails `tracecore validate` against the current OCB binary because it still references the retired `clockreceiver` + `stdoutexporter` (RFC-0013 cleanup). That failure pre-dates this PR (reproduced on `main` via `git stash`), is unrelated to the trust-posture callout, and is out of scope for #443. The callout was still added per the issue spec.
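
As a rough illustration of the lint gate listed under Changes, the sketch below checks a set of files for the `# Trust posture:` marker and skips the exempt basenames. It is an assumption about shape, not the code added to `doc-check.sh`: the function name `check_trust_posture` and the `missing trust-posture callout:` message are hypothetical, and the marker is matched anywhere in the file, not only at the top.

```shell
# Sketch only: check_trust_posture and its failure message are hypothetical.
# Usage: check_trust_posture FILE...
# Exits 0 when every non-exempt file carries the callout marker, 1 otherwise.
check_trust_posture() (
  missing=0
  checked=0
  for f in "$@"; do
    [ -f "$f" ] || continue
    # Exempt by basename: the auth recipes and the Kubernetes manifest file.
    case "${f##*/}" in
      multi-cluster-*.yaml|cert-manager-mtls.yaml) continue ;;
    esac
    checked=$((checked + 1))
    if ! grep -q '^# Trust posture:' "$f"; then
      echo "missing trust-posture callout: $f"
      missing=1
    fi
  done
  if [ "$missing" -eq 0 ]; then
    echo "$checked integration example(s) carry trust-posture callout"
  fi
  exit "$missing"
)
```

It would be invoked as `check_trust_posture docs/integrations/examples/*.yaml docs/examples/with-telemetry.yaml`. Taking the file list as arguments lets the one extra file outside the examples directory ride along without a second code path.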