Skip to content

docs(integrations): add trust-posture callout to 11 recipes (#443) #453

Merged
trilamsr merged 1 commit into main from docs/443-trust-posture-callouts
Jun 2, 2026

Conversation

@trilamsr trilamsr commented Jun 2, 2026

Contributor

Summary

Closes #443. Follow-up to PR #447 (issue #297). The 11 non-auth recipes under docs/integrations/examples/ and docs/examples/with-telemetry.yaml default to a ClusterIP-only OTLP receiver on 0.0.0.0:4318 with no auth: extension. That is the correct shape for the sidecar/DaemonSet ingestion path each recipe documents, but an operator can silently expose any of them via LoadBalancer/Ingress and never realise the receiver is unauthenticated.

This PR points operators at the upgrade path (docs/integrations/multi-cluster.md §Cross-cluster authentication) from every non-auth recipe, and adds a CI lint gate so future recipes can't ship without the callout.

Changes

  • Adds the identical 4-line # Trust posture: ... callout block to the top of all 11 files listed in #443:
    • docs/integrations/examples/{honeycomb,tempo,clickhouse-direct,datadog,journald-kernel,k8sobjects-events,filelog-container,loki,otel-backend,prometheus-scrape}.yaml
    • docs/examples/with-telemetry.yaml
  • Cross-link uses repo-relative path docs/integrations/multi-cluster.md, so doc-check.sh's existing markdown link-rot gate keeps it honest.
  • Adds a doc-check.sh lint gate that asserts every docs/integrations/examples/*.yaml (plus docs/examples/with-telemetry.yaml) carries the # Trust posture: marker. Exempted by basename: the six multi-cluster-*.yaml files (the upgrade target itself) and cert-manager-mtls.yaml (not a tracecore receiver config — Kubernetes manifests).
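
A minimal sketch of how a gate like this could be written, assuming the `# Trust posture:` marker and the basename exemptions described above. The function name `check_trust_posture` and the exact status-line wording are hypothetical; this is not the code that landed in `doc-check.sh`.

```shell
# Hypothetical sketch of a trust-posture lint gate; not the real doc-check.sh.

# check_trust_posture DIR [EXTRA_FILE...]
# Prints each offending file to stderr and returns 1 if any non-exempt
# recipe lacks the marker; otherwise prints a one-line summary.
check_trust_posture() {
  local dir="$1" marker='# Trust posture:' missing=0 checked=0 f base
  shift

  for f in "$dir"/*.yaml "$@"; do
    [ -f "$f" ] || continue                 # skip an unmatched glob
    base="$(basename "$f")"
    case "$base" in
      multi-cluster-*.yaml|cert-manager-mtls.yaml) continue ;;  # exempt by basename
    esac
    checked=$((checked + 1))
    if ! grep -qF -- "$marker" "$f"; then   # fixed-string match, no regex
      echo "doc-check: $f is missing the '$marker' callout" >&2
      missing=$((missing + 1))
    fi
  done

  [ "$missing" -eq 0 ] || return 1
  echo "doc-check: $checked integration example(s) carry trust-posture callout"
}
```

Under these assumptions the call would look like `check_trust_posture docs/integrations/examples docs/examples/with-telemetry.yaml`. Matching on basename keeps the exemption list readable at the cost of having to update it by hand when a new exempt file appears.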

Root cause

#447 closed the cross-cluster auth gap by adding bearer-token + mTLS shapes to multi-cluster.md, but did not annotate the other recipes. An operator landing on honeycomb.md or tempo.md (etc.) sees a naked otlp receiver on 0.0.0.0:4318 and no prose telling them the recipe assumes a single-cluster trust boundary. The fix names that trust assumption at the top of each file and points to the auth upgrade shape one click away.

Test plan

  • bash scripts/doc-check.sh reports "11 integration example(s) carry trust-posture callout" and exits 0
  • Mutation-verified: deleted callout from honeycomb.yaml → gate fails naming the file; restored → gate passes
  • make validator-recipe: 14 recipes validated, 3 OS-skipped (requires-k8s-cluster, requires-linux), 0 failures
  • make build succeeds (tracecore OCB binary)
  • Pre-push hooks green: golangci-lint, go vet, go mod verify, attribute-namespace-check, no-autoupdate-check

Note: docs/examples/with-telemetry.yaml fails tracecore validate against the current OCB binary because it still references the retired clockreceiver + stdoutexporter (RFC-0013 cleanup). That failure pre-dates this PR (reproduced on main via git stash), is unrelated to the trust-posture callout, and is out of scope for #443. The callout was still added per the issue spec.

- docs(integrations): every non-auth OTLP-receiver recipe now carries a top-of-file "Trust posture" callout naming the single-cluster ClusterIP-only assumption and cross-linking the bearer-token / mTLS upgrade shape in `multi-cluster.md`. CI gate (`doc-check.sh`) blocks future recipes that ship without the callout. Closes #443.

Recipes default to a single-cluster ClusterIP-only OTLP receiver on
0.0.0.0:4318 with no auth wired in. That is the correct shape for the
sidecar/DaemonSet ingestion path the recipes document, but an operator
can silently expose any of them via LoadBalancer/Ingress without
realising the receiver is unauthenticated. PR #447 (issue #297) added
bearer-token + mTLS shapes to docs/integrations/multi-cluster.md; this
PR points operators at that upgrade path from every non-auth recipe.

Adds an identical 4-line "Trust posture" callout at the top of each of
the 11 files enumerated by #443. Cross-link uses the repo-relative
docs/integrations/multi-cluster.md path so the existing doc-check link
gate keeps it honest.

Adds a doc-check.sh lint gate that asserts every
docs/integrations/examples/*.yaml carries the callout marker (with
the auth recipes and cert-manager-mtls.yaml exempted by basename).
Mutation-verified: delete the callout from any non-exempt file ->
gate fails naming the file; restore -> gate passes.

Closes #443.

Refs: #297, #447.
Signed-off-by: Tri Lam <tree@lumalabs.ai>

trilamsr commented Jun 2, 2026

Contributor Author

B/A/A+ Criteria

  • Operator UX: Copy-paste recipe → operator sees trust boundary callout immediately before deploy (solves UX target)
  • Regression guard: CI gate prevents future recipes from shipping without the callout (mutation-verifiable)
  • Simplicity: Identical 4-line callout across all 11 files; gate uses straightforward grep -qF pattern match

Findings

  1. Scope: 11 files, 44 identical callout lines — operator noise or necessary? Each recipe file now has the 4-line (blank comment included) identical prose block. For operators copy-pasting recipes, this is first-thing-seen UX. For reviewers / maintainers, it's 44 lines of boilerplate. No simpler alternative observed (single 1-line callout would require conditional formatting in the YAML that breaks the examples). Acceptable tradeoff for UX-first priority.

  2. Gate placed early, but comment is misleading. Builder notes: "Placed early in the script so it runs unconditionally; the test-name parity gate below short-circuits on referenced empty." This is factually correct in outcome but misdirected in reasoning: the trust-posture gate runs unconditionally because it comes FIRST (line 18), not because later gates short-circuit. The short-circuit behavior is a property of the test-name gate (lines 24–26), which only affects gates AFTER it. Not a functional bug, but the comment could confuse future maintainers. Suggest rewording: "Placed early in the script so it runs before the test-name gate (which may short-circuit on empty references)."

  3. Cross-link rot risk: YAML callouts point to docs/integrations/multi-cluster.md, but doc-check.sh only scans markdown files. PR body claims "Cross-link uses repo-relative path docs/integrations/multi-cluster.md, so doc-check.sh's existing markdown link-rot gate keeps it honest." False. The gate at lines 46–98 only walks git ls-files '*.md' (line 88). If multi-cluster.md is renamed/moved, the links in the 11 YAML callouts will rot silently — they won't trigger doc-check.sh. This is a coverage gap. The markdown link-rot gate does NOT currently catch cross-links from YAML files into markdown targets. Risk: Future doc-cleanup sweeps could rename multi-cluster.md without realizing 11 recipe files now reference it by path.

  4. Exemption list verified: 6 multi-cluster files + cert-manager-mtls are correctly excluded. File count: 11 in-scope (10 under docs/integrations/examples/ plus docs/examples/with-telemetry.yaml) + 7 exempt = 18 total. Mutation test scenario works as documented.

  5. with-telemetry.yaml pre-existing validation failure noted correctly. PR acknowledges the file fails tracecore validate due to retired clockreceiver/stdoutexporter (pre-dates this PR). Callout was added anyway per issue spec (correct scope decision).

Simplification Sweep: Clean

  • No dead code, redundant abstractions, or premature helpers.
  • Comment structure is clear and mutation-test pattern is documented.
  • Exemption list is hardcoded (good — unlikely to change; avoids regex fragility).

VERDICT: A

Justification:

  • Closes #443 per spec: callout on all 11 recipes, CI gate blocks future omissions, cross-link to auth upgrade path.
  • Operator UX is strong (first-thing-seen callout names trust assumption).
  • Gate is mutation-verifiable and runs unconditionally.
  • Minor: misleading comment on short-circuit reasoning (low severity, UX-only).
  • Risk: Cross-link rot in YAML files isn't caught by existing doc-check.sh scope, but this is a pre-existing architectural limitation of the gate (YAML files were explicitly out of scope). Not this PR's regression — noted for future doc-check enhancement (separate item).

Merge when ready.

@trilamsr trilamsr enabled auto-merge (squash) June 2, 2026 02:19
@trilamsr trilamsr merged commit 4373fca into main Jun 2, 2026
12 checks passed
@trilamsr trilamsr deleted the docs/443-trust-posture-callouts branch June 2, 2026 02:21
trilamsr added a commit that referenced this pull request Jun 2, 2026
## Summary

Closes #455. Surfaced by PR #453 review (finding 3). The existing
`scripts/doc-check.sh` markdown link-rot gates walk `*.md` files only
(`git ls-files '*.md'`), so the trust-posture callouts #453 added to 11
recipe YAMLs — every one of which cross-links
`docs/integrations/multi-cluster.md` — would silently rot if that target
is renamed. The "doc-check link gate keeps it honest" claim in #453's
body did not actually cover the YAML side.

## Root cause

Scope gap in `doc-check.sh`. Both markdown link-rot blocks iterate `git
ls-files '*.md'` and resolve link targets relative to the source
markdown file. YAML files were out of scope, and the cross-link
convention in `docs/**/*.yaml` is repo-root-relative (not
source-file-relative), so even if scope were extended naively the
resolver would point at the wrong base.

A second root cause was surfaced by reviewer B and addressed in the
BLOCK-fix commit: the original gate's unanchored regex would match
`docs/foo.md` substrings inside URL path segments like
`https://example.com/docs/foo.md`, producing false positives. Fixed by
anchoring the regex on a leading line-start or non-URL boundary
character (whitespace, quote, backtick, paren, square bracket).
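
To make the false positive concrete, here is a small sketch contrasting an unanchored pattern with an anchored one. Both patterns and the `extract_refs` helper are hypothetical stand-ins, not the expressions used in `doc-check.sh`; the boundary set mirrors the characters listed above.

```shell
# Hypothetical patterns: a docs/ path counts as a reference only at line
# start or after whitespace, a quote, a backtick, "(" or "[".
boundary="[[:space:]\"'\`([]"
anchored_re="(^|${boundary})docs/[A-Za-z0-9_./-]+\\.(md|yaml|yml)"
unanchored_re="docs/[A-Za-z0-9_./-]+\\.(md|yaml|yml)"

# extract_refs REGEX
# Reads text on stdin and prints one docs/... path per line.
extract_refs() {
  # grep -o emits the boundary character too, so strip one leading non-"d".
  grep -oE "$1" | sed 's/^[^d]//'
}
```

Feeding `# upstream: https://example.com/docs/never-exists.md` through `extract_refs "$unanchored_re"` yields a bogus reference, while the anchored variant yields nothing because the `docs/` segment follows a `/`.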

## Changes

- `scripts/doc-check.sh`: adds a YAML cross-link rot gate. Scope is
every tracked YAML under `docs/` — both top-level (`docs/*.yaml`, e.g.
`docs/cut-criteria.yaml`) and nested (`docs/**/*.yaml`). Bash globstar
with `**` requires ≥1 subdir so the top-level pattern is enumerated
separately; same for `.yml`. Extracts every `docs/...(md|yaml|yml)`
reference (anchored regex rejects URL path segments) and asserts the
target exists on disk. Single batched `grep -HoE` keeps the gate under
1s.
- `docs/integrations/examples/multi-cluster-source.yaml`: fixes one
pre-existing dead reference the original gate surfaced
(`docs/multi-cluster.md` → `docs/integrations/multi-cluster.md`).
- `docs/cut-criteria.yaml`: two side-effects of extending scope to
top-level YAMLs.
  - `partner-outreach` rubric_check hard-coded to `"false"` (was a literal
  `test -f docs/partner-outreach.md` ref to a file that intentionally does
  not exist; rubric_check semantics are identical, since both render the
  criterion as PLANNED).
  - Migration-guide prose updated to point at real per-minor files
  (`v0.1-to-v0.2.md`, `v0.2-to-v0.3.md`) instead of the `v0.X-to-v0.Y.md`
  placeholder template.
- `docs/v1-rc1-cut-criteria.md`: regenerated via `make
cut-criteria-render` to track the YAML edit (CI gate
`cut-criteria-check` enforces parity).
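
The gate described in the first bullet could be sketched roughly as follows. This is an illustration under assumptions, not the shipped code: `check_yaml_links` is a hypothetical name, `find` stands in for the tracked-file enumeration, and the regex is a guess at the anchored shape. It does follow the two properties stated above: references resolve against the repo root, and matching is done in one batched `grep -HoE`.

```shell
# Hypothetical sketch of a YAML cross-link rot gate; names are assumptions.

# check_yaml_links REPO_ROOT
# Scans YAML files under REPO_ROOT/docs for repo-root-relative docs/...
# references and returns 1 if any referenced file is missing on disk.
check_yaml_links() {
  local root="$1" boundary ref_re matches line src ref dead=0 total=0
  boundary="[[:space:]\"'\`([]"
  ref_re="(^|${boundary})docs/[A-Za-z0-9_./-]+\\.(md|yaml|yml)"

  # One batched grep over every YAML file; output lines look like FILE:MATCH.
  matches="$(cd "$root" && find docs -type f \( -name '*.yaml' -o -name '*.yml' \) \
    -exec grep -HoE "$ref_re" {} + 2>/dev/null)"

  while IFS= read -r line; do
    [ -n "$line" ] || continue
    src="${line%%:*}"            # source file reported by grep -H
    ref="${line#*:}"             # matched text, possibly with a boundary char
    ref="docs/${ref#*docs/}"     # normalise to a repo-root-relative path
    total=$((total + 1))
    if [ ! -e "$root/$ref" ]; then
      echo "doc-check: $src references missing $ref" >&2
      dead=$((dead + 1))
    fi
  done <<EOF
$matches
EOF

  [ "$dead" -eq 0 ] || return 1
  echo "doc-check: $total YAML cross-link(s) under docs/ resolve to on-disk files"
}
```

Resolving against `$root` rather than the directory of the source file is the part that differs from the markdown gates, which resolve links relative to the file that contains them.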

## Placement note

The new gate sits *above* the test-identifier resolution block
(`referenced=...`) because that block's `[ -z "$referenced" ] && exit 0`
short-circuit currently skips every gate below it when `scan_paths` has
no `Test*`/`Fuzz*`/`Benchmark*` references — which is its current state
on `main`. That short-circuit is a separate pre-existing bug; it also
silently bypasses the markdown link-rot gates this PR did NOT touch.
Tracked as #465 (not #460 — that was a stale reference; #465 is the
actual filed issue).
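
The effect of that short-circuit can be reproduced in miniature. The sketch below is purely illustrative: the gate names are placeholders, and the second function shows only one possible shape for a fix, not necessarily what #465 will do.

```shell
# Hypothetical miniature of the script layout; gate names are illustrative.

# Mirrors the current layout: an empty $referenced ends the whole run.
run_with_early_exit() (
  referenced="$1"
  echo "gate: yaml-cross-links"        # placed above the short-circuit
  [ -z "$referenced" ] && exit 0       # exits the (sub)shell when empty
  echo "gate: test-name-parity"
  echo "gate: markdown-link-rot"       # never reached when referenced is empty
)

# One possible alternative: skip only the gate that needs the list.
run_with_scoped_skip() (
  referenced="$1"
  echo "gate: yaml-cross-links"
  if [ -n "$referenced" ]; then
    echo "gate: test-name-parity"
  fi
  echo "gate: markdown-link-rot"       # still runs
)
```

With an empty argument, the first function prints one gate line and the second prints two, which is the difference between gates that are enforced and gates that are silently bypassed.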

## Test plan

- [x] `make doc-check` exits 0 on clean tree (output: `46 YAML
cross-link(s) under docs/ resolve to on-disk files`; runtime ~0.84s
wall)
- [x] Mutation A (rename target): rename
`docs/integrations/multi-cluster.md` → gate exits 1, names all 12 YAMLs
referencing the now-gone target; restore → gate exits 0
- [x] Mutation B (ghost path): inject `# ghost:
docs/this-does-not-exist.md` into `loki.yaml` → gate exits 1; restore →
gate exits 0
- [x] Mutation C (URL false-positive): inject `# upstream:
https://example.com/docs/never-exists.md` into `loki.yaml` → gate exits
0 (no false positive; anchored regex rejects URL path segment)
- [x] Mutation D (delete target file): `mv
docs/integrations/multi-cluster.md` to a temp dir → gate exits 1
- [x] `make cut-criteria-check` exits 0 (rendered markdown matches YAML)
- [x] Pre-push hooks green: golangci-lint, go vet, go mod verify,
attribute-namespace-check, no-autoupdate-check

## Grade

B/A/A+ self-grade: **A+** — scope swept ALL `docs/*.yaml` +
`docs/**/*.yaml` + `.yml` variants (not just the two example dirs the
issue named), anchored regex prevents URL false-positives,
mutation-verified across 4 cases (rename / ghost / URL / delete),
surfaced and fixed two pre-existing dead refs in
`docs/cut-criteria.yaml`, and held the <1s timing budget.

```release-notes
- ci(doc-check): YAML files under `docs/` are now scanned for repo-relative cross-link rot. Every `docs/...(md|yaml|yml)` reference in tracked YAMLs under `docs/` (top-level and nested) must resolve to a real on-disk file or `doc-check.sh` exits non-zero. Anchored regex rejects URL path segments to avoid false positives. Closes #455.
```

---------

Signed-off-by: Tri Lam <tree@lumalabs.ai>
trilamsr added a commit that referenced this pull request Jun 2, 2026
Reviewer B flagged the PR body's "2 → 16" gate count as inaccurate
and asked whether the test plan had been re-run against post-#453
state. Resolution recorded here:

Gate count, empirically verified via `grep -c '^doc-check: '` on
`make doc-check` output on a clean tree:

  Pre-fix on main (after #459 landed)           3 status lines
  Post-fix this PR (after rebase onto origin)  17 status lines

The PR body's "2 → 16" was correct at PR open (before #459 merged).
After rebasing onto origin/main, the YAML cross-link gate from #459
adds one more enforced gate (it was placed above the early-exit line
specifically to work around the bug this PR fixes, so it ran on the
pre-fix baseline too). The "14 gates below that line" framing in
the release-notes block is correct for either base: it counts gates
that the early-exit was hiding, which is invariant across the rebase.

Test plan re-run post-rebase: `make doc-check` exits 0 on clean tree,
17 status lines emitted, no mutation regressions.

Refresh the script header docstring to record the empirical count
and reflect that doc-check now covers 17 drift patterns, not just
the original Go-test-name parity check it shipped with in #13.

No functional change to gate logic.

Signed-off-by: Tri Lam <tree@lumalabs.ai>
trilamsr added a commit that referenced this pull request Jun 2, 2026
…454) (#458)

## Summary

`docs/examples/with-telemetry.yaml` failed `tracecore validate` against
the OCB binary because it referenced components and a config shape
retired by RFC-0013:

```
'receivers' unknown type: "clockreceiver"
'exporters' unknown type: "stdoutexporter"
'' has invalid keys: telemetry
```

Swapped to the canonical post-pivot shape, matching
`docs/getting-started.md` and the in-repo Helm chart default:

- `hostmetrics` (load scraper @ 1s) replaces `clockreceiver`. Note:
`telemetrygeneratorreceiver` does not exist in
opentelemetry-collector-contrib (RFC-0013 PR-E; upstream contrib issues
#41687 and #43657 both closed `not_planned`). `hostmetrics` is the
actual successor.
- `debug` (verbosity: detailed) replaces `stdoutexporter`.
- `service.telemetry.metrics.readers.pull.exporter.prometheus.{host,port}`
replaces the legacy top-level `telemetry:` block (gone in PR-A2).
- `healthcheckextension` at `:13133/` carries liveness + readiness (the
legacy block's `healthz`/`readyz` paths) via `service.extensions`.

## Root cause

Two layers:

1. **Direct**: RFC-0013 PR-A2 + PR-E retired `clockreceiver` and
`stdoutexporter` and folded the top-level `telemetry:` block into
upstream `service.telemetry`; the operator example was not migrated.
2. **Why it drifted undetected**: `scripts/validator-recipe.sh` only
walks `docs/integrations/*.md`; `docs/examples/` has no CI gate, so the
file stayed broken on main until the PR #453 reviewer noticed it.

A follow-up to add a `docs/examples/*.yaml` validate sweep to `make
validator-recipe` (or a sibling `validator-examples` target) is the
right structural fix. I'd file it as a separate issue if the reviewer
agrees the scope is correct.
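
As a rough sketch of what such a sweep might look like, assuming the `validate --config=` invocation shown under Verification. The function name `validate_examples` and its messages are hypothetical, and no such target exists yet.

```shell
# Hypothetical sketch of a docs/examples validate sweep; not an existing target.

# validate_examples BINARY DIR
# Runs "BINARY validate --config=FILE" for each YAML in DIR and returns 1
# if any file fails validation.
validate_examples() {
  local bin="$1" dir="$2" failed=0 f
  for f in "$dir"/*.yaml; do
    [ -f "$f" ] || continue                      # skip an unmatched glob
    if ! "$bin" validate --config="$f" >/dev/null 2>&1; then
      echo "validator-examples: $f failed validation" >&2
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]
}
```

A real sweep would need an exclusion list, since not every YAML under `docs/examples/` is a collector config (`prometheus-alerts.example.yaml` is a Prometheus rules file).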

## A+ sweep finding

Scanned all repo YAMLs for the retired symbols. One additional broken
example surfaced: `module/receiver/ncclfrreceiver/example_config.yaml`
still references `stdoutexporter`, and its README invokes the retired
`tracecore collect` subcommand. Filed as #457 (out of scope for #454).

`docs/examples/prometheus-alerts.example.yaml` mentions `stdoutexporter`
in a regex-example comment string, but the file is a Prometheus rules
file (not a collector config) and structurally validates fine. Noted in
#457.
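
A sweep of this kind can be approximated with a recursive grep. The helper below is a sketch; `sweep_retired_symbols` is a hypothetical name and the symbol list is taken from the test plan in this message.

```shell
# Hypothetical helper: list YAML files that mention retired component names.

# sweep_retired_symbols ROOT
# Prints the path of every *.yaml / *.yml file under ROOT that contains one
# of the retired symbols, sorted for stable output.
sweep_retired_symbols() {
  local root="$1"
  grep -rlE --include='*.yaml' --include='*.yml' \
    'clockreceiver|stdoutexporter|telemetrygeneratorreceiver' "$root" | sort
}
```

Hits still need manual triage: a plain text match cannot tell a live component reference from a mention inside a comment, which is the situation with the Prometheus rules file noted above.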

## Verification

```
$ ./_build/tracecore validate --config=docs/examples/with-telemetry.yaml
# (was exit 1 pre-fix; now exit 0)

$ make validator-recipe
validator-recipe: 14 validated, 3 skipped (non-linux host) of 12 recipe(s)
PASS test 1..5; validator-recipe_test: all assertions passed

$ make smoke-quickstart
smoke-quickstart: validated 2 YAML payload(s) and shell-syntax-checked 8 shell block(s) across 2 quickstart doc(s)
```

Pre-commit gates (DCO, AI-trailer, golangci-lint, go vet, go mod verify,
attribute-namespace-check) all passed.

## Test plan

- [x] `./_build/tracecore validate
--config=docs/examples/with-telemetry.yaml` exits 0 (was 1)
- [x] `make validator-recipe` passes (sibling-example sanity)
- [x] `make smoke-quickstart` passes (quickstart doc sanity)
- [x] A+ sweep across `**/*.{yaml,yml}` for
`clockreceiver|stdoutexporter|telemetrygeneratorreceiver`; #457 filed

```release-notes
docs(examples): fix with-telemetry.yaml so it validates against the OCB-assembled tracecore binary. Swap retired clockreceiver / stdoutexporter / top-level telemetry: block for hostmetrics + debug + service.telemetry + healthcheckextension per RFC-0013.
```

Closes #454.

Signed-off-by: Tri Lam <tree@lumalabs.ai>

Development

Successfully merging this pull request may close these issues.

docs(integrations): non-multi-cluster recipes ship unauth OTLP receivers — audit + scope guidance

1 participant