From f02410203bdb73204f6575bcee8a778eef03975e Mon Sep 17 00:00:00 2001
From: Tri Lam
Date: Mon, 1 Jun 2026 21:47:49 -0700
Subject: [PATCH 1/3] docs(audit): fix 15 broken cross-ref anchors + add anchor-drift gate
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Deep cross-ref audit of docs/**/*.md after the v1-rc1 wave. The 15
broken anchor fragments trace to two heading renames whose call sites
were never updated, plus one wrong section number:

- ATTRIBUTES.md `Soft-lock policy → Lock policy` (6 sites across
  ATTRIBUTES, DEPRECATION, standards-roadmap, migration/v0.x-to-v1.0).
- RFC-0013 `Migration / rollout` heading, whose GitHub slug carries a
  double dash around the stripped `/` (8 sites across integrations,
  notes/ci, reproducibility, migration/v0.{1,2}-to-v0.{2,3}).
- reference-architectures/README §-pointer to RFC-0013 §1 retargeted
  to §2 (1 site; the prose describes the adoption matrix, not "binary
  assembled by OCB").

New gate: scripts/md-anchor-check.py, wired into scripts/doc-check.sh
as a sibling to the existing .md-file-exists and YAML-link gates. It
holds the GitHub-flavored heading-slug algorithm, walks every tracked
.md outside docs/rfcs/** and docs/research/**, and validates every
#anchor fragment against the resolved target's heading inventory. It
skips #L blob-line anchors and code-span literals. Mutation test
(break a link, expect exit 1) confirmed during introduction.

Audit report at docs/audits/2026-06-cross-ref.md tracks before/after
counts and the rename traces.

`make doc-check` exits 0; 217 anchor refs resolve to heading slugs.

Signed-off-by: Tri Lam --- docs/ATTRIBUTES.md | 2 +- docs/DEPRECATION.md | 6 +- docs/audits/2026-06-cross-ref.md | 139 +++++++++++++++++++ docs/integrations/filelog-container.md | 2 +- docs/integrations/journald-kernel.md | 2 +- docs/integrations/k8sobjects-events.md | 2 +- docs/migration/v0.1-to-v0.2.md | 4 +- docs/migration/v0.2-to-v0.3.md | 2 +- docs/migration/v0.x-to-v1.0.md | 2 +- docs/notes/ci.md | 2 +- docs/reference-architectures/README.md | 2 +- docs/reproducibility.md | 2 +- docs/standards-roadmap.md | 2 +- scripts/doc-check.sh | 19 +++ scripts/md-anchor-check.py | 184 +++++++++++++++++++++++++ 15 files changed, 357 insertions(+), 15 deletions(-) create mode 100644 docs/audits/2026-06-cross-ref.md create mode 100755 scripts/md-anchor-check.py diff --git a/docs/ATTRIBUTES.md b/docs/ATTRIBUTES.md index a137825f..4a1c08d7 100644 --- a/docs/ATTRIBUTES.md +++ b/docs/ATTRIBUTES.md @@ -99,7 +99,7 @@ predates the k8s-semconv split into `k8s.pod.name` + `k8s.pod.namespace` documented above. Both names co-emit through the v0.4–v0.5 deprecation window; the legacy name is removed in v0.6 per the two-minor removal-without-rename contract in -[Soft-lock policy](#soft-lock-policy). Migrate dashboards and +[Soft-lock policy](#lock-policy). Migrate dashboards and LogQL queries off `evicted_pod` onto `k8s.pod.{name,namespace}` before v0.6. diff --git a/docs/DEPRECATION.md b/docs/DEPRECATION.md index 73f14cdb..dbf560dc 100644 --- a/docs/DEPRECATION.md +++ b/docs/DEPRECATION.md @@ -7,7 +7,7 @@ binding document for v1.0-rc1 cut criterion 4 and is enforced by `scripts/deprecation-check.sh` (wired into `make deprecation-check` and CI). -Read [`docs/ATTRIBUTES.md` §Soft-lock policy](ATTRIBUTES.md#soft-lock-policy) +Read [`docs/ATTRIBUTES.md` §Soft-lock policy](ATTRIBUTES.md#lock-policy) first for the attribute-namespace-specific rename mechanics that this policy generalises. @@ -87,7 +87,7 @@ before it can move to `removed`: value): half the window — 1 minor for v1.x. 
The old name co-emits as `deprecated` for one minor, then is `removed` in the next minor. The attribute-namespace soft-lock policy -([`docs/ATTRIBUTES.md`](ATTRIBUTES.md#soft-lock-policy)) already +([`docs/ATTRIBUTES.md`](ATTRIBUTES.md#lock-policy)) already documents this for `pattern.*` / `tracecore.*` / `gen_ai.*` keys; that policy is the binding one for attribute renames specifically. @@ -266,7 +266,7 @@ symbol (and add the migration guide entry). — the binding rubric this doc satisfies. - [`docs/RELEASE-CHECKLIST.md`](RELEASE-CHECKLIST.md) RC gates — "Deprecation policy doc in place at `docs/DEPRECATION.md`". -- [`docs/ATTRIBUTES.md` §Soft-lock policy](ATTRIBUTES.md#soft-lock-policy) +- [`docs/ATTRIBUTES.md` §Soft-lock policy](ATTRIBUTES.md#lock-policy) — attribute-namespace-specific rename mechanics; this policy generalises it to non-attribute surfaces. - [`docs/STYLE-docs.md`](STYLE-docs.md) — doc-prose conventions diff --git a/docs/audits/2026-06-cross-ref.md b/docs/audits/2026-06-cross-ref.md new file mode 100644 index 00000000..06a5c64c --- /dev/null +++ b/docs/audits/2026-06-cross-ref.md @@ -0,0 +1,139 @@ +# Cross-reference audit — 2026-06 + +## Scope + +Deep audit of internal cross-references across `docs/**/*.md` after the +v1-rc1 wave (partner-outreach, launch/, NetworkPolicy README, +threat-model §6.G, security-audit-rfp). Five reference kinds inspected: + +1. Markdown links to other `.md` files (relative + repo-root-absolute paths). +2. Markdown links with `#fragment` anchors (heading slugs). +3. PR / issue numbers (`#NNN`). +4. RFC numbers (`RFC-0NNN`). +5. Memory-style slugs (`[[name]]`). + +## Method + +`scripts/md-anchor-check.py` (introduced by this audit, see §"Enforcement" +below) holds the GitHub-flavored markdown heading-slug algorithm and +walks every git-tracked `.md` file, validating relative links and +their `#anchor` fragments against the resolved target's heading +inventory. 
RFC numbers were validated by spot-check against
+`docs/rfcs/`. PR/issue numbers were not validated (no offline source of
+truth; they resolve to github.com).
+
+Historical-record exemptions (consistent with the rest of
+`scripts/doc-check.sh`):
+
+- `docs/rfcs/**` — RFCs intentionally reference deleted / pre-merge
+  artifacts as part of the design record.
+- `docs/research/**` — raw extracts from external sources.
+
+## Before / after
+
+| Reference kind                       | Total | Broken (pre-audit) | Broken (post-audit) |
+| ------------------------------------ | ----- | ------------------ | ------------------- |
+| `.md` file targets                   | 1099  | 0                  | 0                   |
+| Non-`.md` intra-repo targets         | 235   | 0                  | 0                   |
+| `#anchor` fragments (this audit)     | 217   | 15                 | 0                   |
+| YAML cross-links into `docs/`        | 48    | 0                  | 0                   |
+| RFC-style references                 | 677   | 0                  | 0                   |
+| `[[memory-slug]]` references         | 6     | 0                  | 0                   |
+
+The file-existence and YAML-link gates were already green before this
+audit — `make doc-check` was exit-0 from PR #459 onward. The drift
+this audit caught lived one layer down: the fragment portion of the
+`.md` link, which the existing gate intentionally stripped.
+
+## Fixes applied (15 broken anchors → 0)
+
+All fifteen breaks trace to two heading renames and one mis-numbered
+section reference that the consuming docs never picked up:
+
+### 1. `ATTRIBUTES.md` heading rename: `Soft-lock policy → Lock policy`
+
+The page's anchor became `#lock-policy` but six call sites still
+referenced `#soft-lock-policy`. Anchor updated; the link-text "Soft-lock
+policy" was preserved because it remains semantically accurate (the
+section documents both soft-lock add/rename/remove mechanics and the
+v1.0 hard-lock CI gate that consolidated under it).
+
+| File                                | Sites |
+| ----------------------------------- | ----- |
+| `docs/ATTRIBUTES.md`                | 1     |
+| `docs/DEPRECATION.md`               | 3     |
+| `docs/standards-roadmap.md`         | 1     |
+| `docs/migration/v0.x-to-v1.0.md`    | 1     |
+
+### 2. `RFC-0013` heading slug: `migration-rollout → migration--rollout`
+
+GitHub's slug algorithm drops the `/` in `Migration / rollout` but
+turns each of the two spaces around it into a dash, which yields
+`migration--rollout`. The consuming docs all used the single-dash
+form `#migration-rollout`, a manual guess at the slug.
+
+| File                                       | Sites |
+| ------------------------------------------ | ----- |
+| `docs/integrations/k8sobjects-events.md`   | 1     |
+| `docs/integrations/filelog-container.md`   | 1     |
+| `docs/integrations/journald-kernel.md`     | 1     |
+| `docs/notes/ci.md`                         | 1     |
+| `docs/reproducibility.md`                  | 1     |
+| `docs/migration/v0.1-to-v0.2.md`           | 2     |
+| `docs/migration/v0.2-to-v0.3.md`           | 1     |
+
+### 3. `RFC-0013 §1 (adoption posture)` — wrong section number
+
+`docs/reference-architectures/README.md` linked
+`#1-adoption-posture` but RFC-0013 §1 is "Binary assembled by OCB" and
+§2 is "Adoption matrix" — the latter is what the prose calls out
+(upstream OTel core/contrib vs tracecore in-tree). Retargeted to
+`#2-adoption-matrix` and link text updated to match.
+
+## Intentional / known-stale references retained
+
+- **`#L` blob-line anchors** (e.g. the
+  `reproducibility.md#L36` citations in
+  `docs/v1-rc1-operational-gaps.md`). These resolve on github.com's
+  blob view, not the rendered markdown view, and are an intentional
+  citation style across operational-gap memos. The anchor gate
+  introduced in this audit skips `#L` so this style does not
+  false-positive.
+- **`` `[text](path)` `` inside double backticks** in
+  `docs/notes/pr-workflow.md` line 73 — that is meta-documentation
+  explaining the link-rot gate's own scope, not a real link.
+ +## De-duplication candidates + +The audit surfaced two heavily-reused cross-refs but neither warrants +single-sourcing today: + +- `NORTHSTARS.md#o2-convenience--quality` (12 sites) and + `NORTHSTARS.md#o5-distribution--community` (11 sites) — these are + the project's stable goal anchors; high call-site count is the + feature, not the bug. Inlining or includes would obscure the + one-hop traceability that the doc set leans on. +- `v1-rc1-cut-criteria.md` per-criterion anchors (multiple) — same + reasoning. The cut criteria are deliberately the single source of + truth and every consuming doc one-hops to a specific criterion. + +## Enforcement + +This audit introduces `scripts/md-anchor-check.py`, wired into +`scripts/doc-check.sh` as a sibling to the existing +`.md-file-exists` and YAML-link gates. Mutation test (break a link, +expect gate fail) verified during introduction. + +The gate's scope: + +- Walks every git-tracked `*.md` file outside `docs/rfcs/**` and + `docs/research/**`. +- For every relative markdown link with an `#anchor` fragment whose + target is a `.md` file, computes the target file's + heading-slug set with the GitHub-flavored algorithm and asserts the + anchor is present. +- Skips `#L` blob-line anchors (intentional doc-style). +- Skips anchors inside fenced or inline-code spans (meta-doc + references to link syntax). + +The gate ran clean on 217 anchor references at audit close. diff --git a/docs/integrations/filelog-container.md b/docs/integrations/filelog-container.md index a111a06f..756dbe5c 100644 --- a/docs/integrations/filelog-container.md +++ b/docs/integrations/filelog-container.md @@ -20,7 +20,7 @@ unit-normalized) + `cuda_oom.gpu_index` (Int) attributes that [pattern #10's detector](../patterns/10-cuda-oom-deceptive.md) consumes. 
Replaces the in-tree `containerstdout` receiver scheduled for deletion at v0.2.0 per -[RFC-0013 §migration PR-K](../rfcs/0013-distro-first-pivot.md#migration-rollout) +[RFC-0013 §migration PR-K](../rfcs/0013-distro-first-pivot.md#migration--rollout) and §7 (Deletion list). ## Config diff --git a/docs/integrations/journald-kernel.md b/docs/integrations/journald-kernel.md index 0567908e..12c06073 100644 --- a/docs/integrations/journald-kernel.md +++ b/docs/integrations/journald-kernel.md @@ -10,7 +10,7 @@ journald and `/dev/kmsg` and normalizing the records through an OTTL [RFC-0013 §3](../rfcs/0013-distro-first-pivot.md#3-customer-stable-telemetry-contracts) so existing operator alerts survive the swap. Replaces the in-tree `kernelevents` receiver scheduled for deletion at v0.2.0 per -[RFC-0013 §migration PR-K](../rfcs/0013-distro-first-pivot.md#migration-rollout) +[RFC-0013 §migration PR-K](../rfcs/0013-distro-first-pivot.md#migration--rollout) and §7 (Deletion list). ## Config diff --git a/docs/integrations/k8sobjects-events.md b/docs/integrations/k8sobjects-events.md index 4b15a9c8..38e646db 100644 --- a/docs/integrations/k8sobjects-events.md +++ b/docs/integrations/k8sobjects-events.md @@ -13,7 +13,7 @@ Tracecore watches the Kubernetes Events API via the upstream `node_pressure`, `image_pull_failure`) via an OTTL `transform` processor. Replaces the in-tree `k8sevents` receiver scheduled for deletion at v0.2.0 per -[RFC-0013 §migration PR-K](../rfcs/0013-distro-first-pivot.md#migration-rollout) +[RFC-0013 §migration PR-K](../rfcs/0013-distro-first-pivot.md#migration--rollout) and §7 (Deletion list). > **Validation note.** The upstream `k8sobjectsreceiver` calls diff --git a/docs/migration/v0.1-to-v0.2.md b/docs/migration/v0.1-to-v0.2.md index 698499b1..451d7bff 100644 --- a/docs/migration/v0.1-to-v0.2.md +++ b/docs/migration/v0.1-to-v0.2.md @@ -1,6 +1,6 @@ # Migration: v0.1.x → v0.2.0 -This guide tells operators how to move from a `v0.1.x` deployment to `v0.2.0`. 
Every operator-visible break gets a row; everything not listed below is unchanged. Sections below mirror [RFC-0013 §migration](../rfcs/0013-distro-first-pivot.md#migration-rollout) (PR-A through PR-L). +This guide tells operators how to move from a `v0.1.x` deployment to `v0.2.0`. Every operator-visible break gets a row; everything not listed below is unchanged. Sections below mirror [RFC-0013 §migration](../rfcs/0013-distro-first-pivot.md#migration--rollout) (PR-A through PR-L). ## TL;DR @@ -79,7 +79,7 @@ To verify what's actually registered in the binary you're running: ./_build/tracecore components ``` -The chart does not yet ship a per-receiver `recipe:` switch — that mechanism arrives in [RFC-0013 PR-J](../rfcs/0013-distro-first-pivot.md#migration-rollout) along with the upstream-recipe templates. Until PR-J lands, the migration path for the in-tree receivers other than `clockreceiver`/`stdoutexporter` is: pin v0.1.x → wait for PR-J → cut over to the upstream-recipe values shape in one minor. +The chart does not yet ship a per-receiver `recipe:` switch — that mechanism arrives in [RFC-0013 PR-J](../rfcs/0013-distro-first-pivot.md#migration--rollout) along with the upstream-recipe templates. Until PR-J lands, the migration path for the in-tree receivers other than `clockreceiver`/`stdoutexporter` is: pin v0.1.x → wait for PR-J → cut over to the upstream-recipe values shape in one minor. 
## Self-telemetry metric vocabulary diff --git a/docs/migration/v0.2-to-v0.3.md b/docs/migration/v0.2-to-v0.3.md index 49288799..764962df 100644 --- a/docs/migration/v0.2-to-v0.3.md +++ b/docs/migration/v0.2-to-v0.3.md @@ -167,7 +167,7 @@ helm upgrade tracecore install/kubernetes/tracecore \ - [#222: PR-M deferral memo](https://github.com/TraceCoreAI/tracecore/issues/222) — current PR-M status + re-evaluation triggers (OTel Profiles → Beta, parca-agent OTLP export) - [RFC-0013 §Adoption matrix](../rfcs/0013-distro-first-pivot.md#2-adoption-matrix) — why pyspy is on the eventual deletion path in favour of parca-agent (note: timing in the RFC predates the #222 deferral) -- [RFC-0013 §Migration / rollout](../rfcs/0013-distro-first-pivot.md#migration-rollout) — original PR-M and PR-N sequencing (supersede with #222 for current timeline) +- [RFC-0013 §Migration / rollout](../rfcs/0013-distro-first-pivot.md#migration--rollout) — original PR-M and PR-N sequencing (supersede with #222 for current timeline) - [RFC-0009 §Safety properties](../rfcs/0009-pyspy-receiver-scope.md#proposal) — design record of the cooperative receiver's zero-capability posture (still in force at v0.3.0) - [`components/receivers/pyspy/README.md`](../../components/receivers/pyspy/README.md) — cooperative receiver's user-facing docs (the receiver ships in v0.3.0) - [`components/receivers/pyspy/RUNBOOK.md`](../../components/receivers/pyspy/RUNBOOK.md) — per-kind operator triage for the cooperative receiver diff --git a/docs/migration/v0.x-to-v1.0.md b/docs/migration/v0.x-to-v1.0.md index a23e234e..a8a49d08 100644 --- a/docs/migration/v0.x-to-v1.0.md +++ b/docs/migration/v0.x-to-v1.0.md @@ -570,7 +570,7 @@ rejected. 
**What changed.** v0.x did not publish a binding deprecation policy; the per-attribute soft-lock note in -[`docs/ATTRIBUTES.md` §Soft-lock policy](../ATTRIBUTES.md#soft-lock-policy) +[`docs/ATTRIBUTES.md` §Soft-lock policy](../ATTRIBUTES.md#lock-policy) named windows in prose but the lint enforcing it was advisory. v1.0.0-rc1 promotes both the policy document and the CI gate to binding status. diff --git a/docs/notes/ci.md b/docs/notes/ci.md index eeb10a00..2115309e 100644 --- a/docs/notes/ci.md +++ b/docs/notes/ci.md @@ -31,7 +31,7 @@ Anchor: issue #327; `PRINCIPLES.md` §10; `Makefile` targets The release + integration workflow set changes across the v0.1.0 → v0.3.0 migration window per -[RFC-0013 §Migration](../rfcs/0013-distro-first-pivot.md#migration-rollout) +[RFC-0013 §Migration](../rfcs/0013-distro-first-pivot.md#migration--rollout) and §7 (Deletion list). Concrete schedule: - **v0.1.0** - `.github/workflows/release.yml` is rewritten on top diff --git a/docs/reference-architectures/README.md b/docs/reference-architectures/README.md index 8f763201..85e52064 100644 --- a/docs/reference-architectures/README.md +++ b/docs/reference-architectures/README.md @@ -60,7 +60,7 @@ lives where: 2. **Required components.** Receivers + exporters + extensions that must be enabled. Every named component is upstream OTel collector core or contrib unless explicitly flagged as tracecore in-tree - (per [RFC-0013 §1](../rfcs/0013-distro-first-pivot.md#1-adoption-posture)). + (per [RFC-0013 §2 adoption matrix](../rfcs/0013-distro-first-pivot.md#2-adoption-matrix)). 3. **`values.yaml` snippet.** Drop-in overlay against the in-repo chart. Snippets are bounded — anything bigger goes into [`docs/integrations/`](../integrations/) and is cross-linked. diff --git a/docs/reproducibility.md b/docs/reproducibility.md index e02a6675..08ea688a 100644 --- a/docs/reproducibility.md +++ b/docs/reproducibility.md @@ -3,7 +3,7 @@ Verify a published `tracecore` release end-to-end from source. 
The release pipeline lives in [`.github/workflows/release.yml`](../.github/workflows/release.yml) -(`package` job per [RFC-0013 §Migration PR-C](rfcs/0013-distro-first-pivot.md#migration-rollout)): +(`package` job per [RFC-0013 §Migration PR-C](rfcs/0013-distro-first-pivot.md#migration--rollout)): inline shell builds each `linux/{amd64,arm64}` OCB binary via `make build`, archives it with `tar --sort=name --owner=0 --group=0 --numeric-owner --mtime=@$SOURCE_DATE_EPOCH | gzip -n` diff --git a/docs/standards-roadmap.md b/docs/standards-roadmap.md index e23de55b..5c5ae28a 100644 --- a/docs/standards-roadmap.md +++ b/docs/standards-roadmap.md @@ -185,7 +185,7 @@ post-training), and predates this filing. owning the namespace. 3. **Neither lands; OTel ships a *third* shape.** We adopt the third shape. Tracecore's renames are governed by [ATTRIBUTES.md - §Soft-lock policy](ATTRIBUTES.md#soft-lock-policy) (one-minor + §Soft-lock policy](ATTRIBUTES.md#lock-policy) (one-minor deprecation window at v0.4, two-minor at v1.0). ### 4.2 Scope-creep risk diff --git a/scripts/doc-check.sh b/scripts/doc-check.sh index 0a80c3f0..e3f96626 100755 --- a/scripts/doc-check.sh +++ b/scripts/doc-check.sh @@ -263,6 +263,25 @@ fi echo "doc-check: $link_count markdown link(s) resolve to on-disk files" +# --- Markdown anchor-fragment integrity ------------------------------------- +# +# Drift pattern this gate closes: a heading rename leaves +# [text](path.md#old-slug) links pointing at a fragment the file no +# longer contains. The .md-file-exists gate above (#frag stripped) +# doesn't see this — the file resolves; the anchor doesn't. +# +# Caught during 2026-06 cross-ref audit (docs/audits/2026-06-cross-ref.md): +# 15 such breaks across 11 files traced to two renames +# (ATTRIBUTES.md `Soft-lock policy → Lock policy`, RFC-0013 +# `Migration / rollout` heading slug doubled the dash, never updated +# in the consuming docs). +# +# Sibling to PR #459's YAML link-rot gate but for markdown anchors. 
+# Python script holds the GitHub-flavored slug algorithm; this block +# is just the wrapper that exits non-zero on dead anchors. + +scripts/md-anchor-check.py + # --- Markdown link integrity: non-.md intra-repo targets -------------------- # # Drift pattern this gate closes: post-wave-5 sweep deleted ~30 source diff --git a/scripts/md-anchor-check.py b/scripts/md-anchor-check.py new file mode 100755 index 00000000..673bc28a --- /dev/null +++ b/scripts/md-anchor-check.py @@ -0,0 +1,184 @@ +#!/usr/bin/env python3 +"""md-anchor-check.py — markdown anchor-fragment drift gate. + +Closes a drift pattern that scripts/doc-check.sh §"Markdown link +integrity" intentionally skipped: the file resolves but the +#section-slug anchor does not match any heading slug in the target. + +Sibling to PR #459's YAML link-rot gate; sibling to the .md-file-exists +gate already in doc-check.sh. + +Scope: every git-tracked *.md file's relative links of shape +[text](path.md#anchor). Validates anchor against the GitHub-flavored +markdown slug of every heading in the resolved file. + +Exempts (consistent with the rest of doc-check.sh): + - docs/rfcs/** — historical record, intentionally references + deleted / pre-merge artifacts + - docs/research/** — raw extracts from external sources + +False-positive exemptions (intentional doc-style, validated on +github.com blob view, not the rendered markdown view): + - #L anchors point at blob line numbers, not heading slugs. + Skipped so that line-anchor citations + ([`reproducibility.md:36`](reproducibility.md#L36)) don't + false-positive. + +Exit 0 on green; exit 1 with the dead-anchor list on the first failure +(consistent with doc-check.sh's exit shape). 
+""" +from __future__ import annotations + +import os +import re +import subprocess +import sys +from collections import defaultdict +from urllib.parse import unquote + + +REPO = subprocess.check_output( + ["git", "rev-parse", "--show-toplevel"], text=True +).strip() + +EXEMPT_PREFIXES = ("docs/rfcs/", "docs/research/") + +HEADING_RE = re.compile(r"^(#{1,6})\s+(.+?)\s*$", re.MULTILINE) +MD_LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)]+)\)") + + +def github_slug(text: str) -> str: + """Approximate GitHub-flavored markdown heading slug. + + Matches the algorithm well enough for the heading shapes this repo + uses (verified against every cross-ref under docs/ as of the audit + that introduced this gate). + """ + s = text.strip().lower() + # Inline-code spans: keep content + s = re.sub(r"`([^`]*)`", r"\1", s) + # Inline links: keep link text + s = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", s) + out: list[str] = [] + for ch in s: + if ch.isalnum() or ch in "-_": + out.append(ch) + elif ch.isspace(): + out.append("-") + # else: dropped (period, ampersand, plus, slash, ×, →, …) + return "".join(out) + + +def anchors_for(path: str) -> set[str]: + if not os.path.exists(path): + return set() + with open(path, encoding="utf-8") as fp: + txt = fp.read() + # Strip fenced code so headings inside ``` blocks don't register + txt_no_code = re.sub(r"```.*?```", "", txt, flags=re.DOTALL) + anchors: set[str] = set() + seen: dict[str, int] = defaultdict(int) + for m in HEADING_RE.finditer(txt_no_code): + s = github_slug(m.group(2)) + seen[s] += 1 + anchors.add(s if seen[s] == 1 else f"{s}-{seen[s] - 1}") + # Explicit / + for m in re.finditer( + r" str: + """Remove fenced + inline-code spans so meta passages quoting + `[text](path)` as a literal example don't false-positive.""" + txt = re.sub(r"```.*?```", "", txt, flags=re.DOTALL) + # Double-backtick literal first (so the single-backtick rule below + # doesn't shred them). 
+    txt = re.sub(r"``[^`]*(?:`[^`]+)*``", "", txt)
+    txt = re.sub(r"`[^`]+`", "", txt)
+    return txt
+
+
+def is_exempt(rel_path: str) -> bool:
+    return any(rel_path.startswith(p) for p in EXEMPT_PREFIXES)
+
+
+def main() -> int:
+    md_files = subprocess.check_output(
+        ["git", "ls-files", "*.md"], text=True, cwd=REPO
+    ).splitlines()
+    dead: list[str] = []
+    checked = 0
+    anchors_cache: dict[str, set[str]] = {}
+    for rel in md_files:
+        if is_exempt(rel):
+            continue
+        src = os.path.join(REPO, rel)
+        if not os.path.exists(src):
+            continue
+        with open(src, encoding="utf-8") as fp:
+            txt = fp.read()
+        txt = strip_link_meta(txt)
+        for m in MD_LINK_RE.finditer(txt):
+            url = m.group(2).strip()
+            if url.startswith(("http://", "https://", "mailto:", "tel:")):
+                continue
+            u = unquote(url)
+            if "#" not in u:
+                continue
+            path_part, _, anchor = u.partition("#")
+            if not anchor:
+                continue
+            # Blob-line-anchor citation (#L) is an intentional
+            # doc-style — points at GitHub blob view, not rendered .md.
+            if re.fullmatch(r"L\d+(?:-L?\d+)?", anchor):
+                continue
+            # Resolve target file (default to self if path_part empty)
+            if path_part == "":
+                target = src
+            elif path_part.startswith("/"):
+                target = os.path.join(REPO, path_part.lstrip("/"))
+            else:
+                target = os.path.normpath(
+                    os.path.join(os.path.dirname(src), path_part)
+                )
+            # Only validate anchors against .md targets (blob-anchor
+            # behaviour for non-.md is out of scope, matches the
+            # .md-only rule the file-existence gate uses).
+            if not target.endswith(".md"):
+                continue
+            if not os.path.exists(target):
+                # The file-existence gate already covers this; skip.
+                continue
+            if target not in anchors_cache:
+                anchors_cache[target] = anchors_for(target)
+            if anchor.lower() not in anchors_cache[target]:
+                tgt_rel = os.path.relpath(target, REPO)
+                dead.append(
+                    f"{rel} → {url} (anchor #{anchor} not in {tgt_rel})"
+                )
+            checked += 1
+    if dead:
+        print("doc-check: dead markdown anchor(s) detected:")
+        for entry in dead:
+            print(f" - {entry}")
+        print()
+        print(
+            "Either fix the #anchor, point at the new heading slug, "
+            "or drop the fragment."
+        )
+        return 1
+    print(
+        f"doc-check: {checked} markdown anchor(s) resolve to heading "
+        "slugs"
+    )
+    return 0
+
+
+if __name__ == "__main__":
+    sys.exit(main())

From 7ff34752c44c3daeecffb583713abd2f47ddc562 Mon Sep 17 00:00:00 2001
From: Tri Lam
Date: Mon, 1 Jun 2026 22:21:37 -0700
Subject: [PATCH 2/3] =?UTF-8?q?docs(audit):=20correct=20file-count=20comme?=
 =?UTF-8?q?nt=2011=E2=86=9212=20(B=20fix)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Tri Lam
---
 scripts/doc-check.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/doc-check.sh b/scripts/doc-check.sh
index e3f96626..98972395 100755
--- a/scripts/doc-check.sh
+++ b/scripts/doc-check.sh
@@ -271,7 +271,7 @@ echo "doc-check: $link_count markdown link(s) resolve to on-disk files"
 # doesn't see this — the file resolves; the anchor doesn't.
 #
 # Caught during 2026-06 cross-ref audit (docs/audits/2026-06-cross-ref.md):
-# 15 such breaks across 11 files traced to two renames
+# 15 such breaks across 12 files traced to two renames
 # (ATTRIBUTES.md `Soft-lock policy → Lock policy`, RFC-0013
 # `Migration / rollout` heading slug doubled the dash, never updated
 # in the consuming docs).

From 8b7e6f47d002aad1c6558132cbd0b62c9e280447 Mon Sep 17 00:00:00 2001
From: Tri Lam
Date: Mon, 1 Jun 2026 22:28:43 -0700
Subject: [PATCH 3/3] docs(audit): note three issues (renames + section-ref) (B fix #2)

Signed-off-by: Tri Lam
---
 scripts/doc-check.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/doc-check.sh b/scripts/doc-check.sh
index 98972395..c9554a9b 100755
--- a/scripts/doc-check.sh
+++ b/scripts/doc-check.sh
@@ -271,7 +271,7 @@ echo "doc-check: $link_count markdown link(s) resolve to on-disk files"
 # doesn't see this — the file resolves; the anchor doesn't.
 #
 # Caught during 2026-06 cross-ref audit (docs/audits/2026-06-cross-ref.md):
-# 15 such breaks across 12 files traced to two renames
+# 15 such breaks across 12 files traced to two renames + one section-ref fix
 # (ATTRIBUTES.md `Soft-lock policy → Lock policy`, RFC-0013
 # `Migration / rollout` heading slug doubled the dash, never updated
 # in the consuming docs).
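
Reviewer note on the double-dash slug: the behaviour behind the RFC-0013 fixes in this series can be checked in isolation. The snippet below is a minimal sketch that mirrors the character rules of `github_slug` in `scripts/md-anchor-check.py` (lowercase, keep alphanumerics plus `-` and `_`, map each whitespace character to `-`, drop everything else). It is an approximation of GitHub's slugifier, not its actual implementation, and it omits the duplicate-heading suffix handling that `anchors_for` adds.

```python
import re


def github_slug(heading: str) -> str:
    """Approximate GitHub heading slug (sketch of the gate's rules)."""
    s = heading.strip().lower()
    # Inline-code spans keep their content, minus the backticks.
    s = re.sub(r"`([^`]*)`", r"\1", s)
    out = []
    for ch in s:
        if ch.isalnum() or ch in "-_":
            out.append(ch)
        elif ch.isspace():
            # Every whitespace character becomes its own dash; nothing
            # collapses runs, so "a / b" yields two adjacent dashes.
            out.append("-")
        # Anything else (slash, period, ampersand, ...) is dropped.
    return "".join(out)


if __name__ == "__main__":
    for heading in ("Migration / rollout", "Lock policy"):
        print(f"{heading!r} -> #{github_slug(heading)}")
```

Because the `/` is dropped while both surrounding spaces survive as dashes, `Migration / rollout` slugs to `migration--rollout`, which is why the hand-written `#migration-rollout` links never resolved.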