diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index b35c2137..adb2862c 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -42,6 +42,8 @@ jobs:
         run: make lint
       - name: nccl_fr RCE gate
         run: make nccl-fr-rce-gate
+      - name: register-lint
+        run: make register-lint
       - name: test (race) + coverage-check
         run: make coverage-check
       - name: 30s fuzz (nccl_fr parser)
diff --git a/Makefile b/Makefile
index a437559b..a685ec80 100644
--- a/Makefile
+++ b/Makefile
@@ -1,4 +1,4 @@
-.PHONY: help build run test test-extras test-extras-sustained test-extras-fuzz test-extras-fuzz-kmsg test-extras-fuzz-journald test-extras-fuzz-nccl-fr test-extras-race bench bench-check fmt fmt-fix vet lint lint-fix tidy tidy-check mod-verify license-check license-fix govulncheck dco-check hooks clean check ci ci-fuzz-nccl-fr nccl-fr-rce-gate generate generate-check generate-fixtures coverage coverage-check doc-check smoke build-tags
+.PHONY: help build run test test-extras test-extras-sustained test-extras-fuzz test-extras-fuzz-kmsg test-extras-fuzz-journald test-extras-fuzz-nccl-fr test-extras-race bench bench-check fmt fmt-fix vet lint lint-fix tidy tidy-check mod-verify license-check license-fix govulncheck dco-check hooks clean check ci ci-fuzz-nccl-fr nccl-fr-rce-gate register-lint generate generate-check generate-fixtures coverage coverage-check doc-check smoke build-tags
 
 BIN := tracecore
 PKG := ./cmd/tracecore
@@ -193,7 +193,10 @@ doc-check: ## Verify test identifiers referenced in rot-prone docs exist in the
 	@scripts/doc-check.sh
 	@scripts/alert-check.sh
 
-ci: license-check generate-check vet build-tags tidy-check mod-verify lint nccl-fr-rce-gate coverage-check ci-fuzz-nccl-fr govulncheck doc-check build ## Everything CI runs. Run before opening a PR.
+register-lint: ## Verify `func Register*` symbols live only under components/** (or an explicit allowlist). Enforces STRATEGY.md §"Each component package owns its own Factory var".
+	@scripts/register-lint.sh
+
+ci: license-check generate-check vet build-tags tidy-check mod-verify lint nccl-fr-rce-gate register-lint coverage-check ci-fuzz-nccl-fr govulncheck doc-check build ## Everything CI runs. Run before opening a PR.
 
 smoke: build ## End-to-end smoke test: validate the dcgm example config, run the binary for 1.5s, kill, assert lifecycle logs appear. No hardware required; receiver degrades cleanly on macOS/CI.
 	@scripts/smoke.sh
diff --git a/docs/FOLLOWUPS.md b/docs/FOLLOWUPS.md
index 22eb98d7..d049ad70 100644
--- a/docs/FOLLOWUPS.md
+++ b/docs/FOLLOWUPS.md
@@ -550,11 +550,12 @@ predicate. Documented so they aren't re-litigated.
 
 ### Tooling
 
-- [ ] **`make register-lint`** — fail CI if `func Register*(`
-  appears outside `components/**` (or a `Register*` call to a
-  centralized registry). Converts STRATEGY.md's "Each component
-  package owns its own Factory var" rule from policy into
-  enforcement. *Target:* opportunistic, ~1 hour.
+
+- *Closed (see comment above): `make register-lint` shipped and CI-gated.*
 - [ ] **OSS-Fuzz integration.** Tracecore fuzz targets currently run
   only inside `go test`. Continuous fuzzing is ~1 day of setup but
   premature pre-v0. *Trigger:* v0 cut — integrate or write
diff --git a/scripts/register-lint.sh b/scripts/register-lint.sh
new file mode 100755
index 00000000..19c5d230
--- /dev/null
+++ b/scripts/register-lint.sh
@@ -0,0 +1,89 @@
+#!/usr/bin/env bash
+# register-lint.sh — enforce that `func Register*` symbols live only
+# under `components/**` (or an explicit allowlist).
+#
+# Converts STRATEGY.md §"Each component package owns its own Factory
+# var" from policy into enforcement. The hard rule there:
+#
+#   "PRs that introduce a centralized Register*() API, a global
+#   factory map outside components.yaml + the generated
+#   components.go, or anything that looks like a plugin loading
+#   mechanism MUST be rejected without an accepted RFC."
+#
+# This gate trips the moment such an API surface re-appears outside
+# `components/**`, instead of waiting for a reviewer to spot it.
+#
+# Scope: Go source files only (*.go), excluding vendor/, .git/, and
+# nested worktrees under .claude/worktrees/. Test files are in scope
+# — a test helper named `RegisterFactories` would be the same drift.
+#
+# Allowlist: files whose `Register*` functions are OTel-instrument
+# registration helpers (different verb domain — they register metric
+# instruments on a MeterProvider, not component factories). Keep this
+# list narrow; each entry needs a one-line rationale.
+#
+# Exits 0 if no violations; exits 1 with a list of offenders.
+
+set -euo pipefail
+
+# Allowlist — paths (relative to repo root) where `func Register*` is
+# legitimate. Each entry is OTel-instrument registration on a
+# MeterProvider, NOT component-factory registration. Adding here
+# requires a one-line rationale in this list AND review attention.
+allowlist=(
+  # Registers `tracecore.build.info` observable gauge on a MeterProvider.
+  'internal/telemetry/build_info.go'
+  # Registers tracecore.exporter.failure_rate / queue.depth_ratio /
+  # component.restart_count_per_hour gauges on a MeterProvider.
+  'internal/telemetry/slo.go'
+)
+
+# Find every Go file outside vendor/, .git/, .claude/worktrees/, and
+# components/. The `find` flags exclude paths before grep ever sees them
+# — cheaper than letting grep walk and prune later. Use a while-read
+# loop instead of `mapfile` so the script runs on macOS bash 3.2.
+violations=""
+while IFS= read -r f; do
+  [ -n "$f" ] || continue
+  if ! grep -q '^func Register' "$f"; then
+    continue
+  fi
+  # Is this file allowlisted?
+  is_allowed=0
+  for allowed in "${allowlist[@]}"; do
+    if [ "$f" = "$allowed" ]; then
+      is_allowed=1
+      break
+    fi
+  done
+  if [ "$is_allowed" -eq 1 ]; then
+    continue
+  fi
+  # Capture the offending lines for the error message.
+  hits=$(grep -n '^func Register' "$f")
+  violations="${violations}${f}:\n${hits}\n"
+done < <(
+  find . -name '*.go' \
+    -not -path './.git/*' \
+    -not -path './vendor/*' \
+    -not -path './.claude/worktrees/*' \
+    -not -path './components/*' \
+    | sed 's|^\./||' \
+    | sort
+)
+
+if [ -n "$violations" ]; then
+  echo "register-lint: 'func Register*' found outside components/ (and outside the allowlist):"
+  printf '%b' "$violations" | sed 's/^/ /'
+  echo
+  echo "Per STRATEGY.md §\"Each component package owns its own Factory var\","
+  echo "a centralized Register*() API for components is banned without an RFC."
+  echo "If this is OTel-instrument registration (not component-factory"
+  echo "registration), add the file to the allowlist in scripts/register-lint.sh"
+  echo "with a one-line rationale."
+  exit 1
+fi
+
+# Success: count the allowlisted files for visibility.
+allow_count=${#allowlist[@]}
+echo "register-lint: no 'func Register*' outside components/ (allowlist: $allow_count file(s))"
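
The gate's core check can be tried on a throwaway tree before trusting it in CI. The following is a minimal sketch, not part of this change: the directory layout and file names (`components/demo/demo.go`, `internal/bad/bad.go`) are hypothetical, and it reuses only the `find`/`grep` shape from `scripts/register-lint.sh`, leaving out the allowlist and the `.git/`, `vendor/`, and worktree exclusions.

```shell
#!/usr/bin/env bash
# Hypothetical smoke check for the pattern used by register-lint.sh.
# The temp-tree layout and file names are illustrative assumptions.
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

mkdir -p "$tmp/components/demo" "$tmp/internal/bad"
# Allowed: a Register* function under components/.
printf 'package demo\n\nfunc RegisterDemo() {}\n' > "$tmp/components/demo/demo.go"
# Violation: a Register* function outside components/.
printf 'package bad\n\nfunc RegisterFactories() {}\n' > "$tmp/internal/bad/bad.go"

# Same shape as the script: list *.go files outside components/, then
# keep those that declare a top-level `func Register*`.
offenders=$(
  cd "$tmp" &&
  find . -name '*.go' -not -path './components/*' \
    | sed 's|^\./||' \
    | sort \
    | while IFS= read -r f; do
        if grep -q '^func Register' "$f"; then
          echo "$f"
        fi
      done
)

echo "$offenders"
```

Because the pattern is anchored with `^func Register`, methods with a receiver (for example `func (r *Registry) Register(...)`) and indented declarations are not matched. Only top-level functions count as violations.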