ci(makefile): wire bench-allocs-check into ci-full (#424)#442
Conversation
PR #416 shipped `make bench-allocs-check` as the absolute allocs/event ceiling gate but the target was standalone — not invoked from local `make ci-full`. CI's verify-static job already runs `make bench-check` (which transitively runs the same script), so per-PR enforcement was in place — but local `ci-full` divergence surfaced ceiling breaches only post-PR-open, defeating the per-PR enforcement intent of #302. Adds `bench-allocs-check` to `ci-full`'s dependency list. Wall-time on M1 Max grows from ~3.8m to ~4.7m (+~80s for `count=10 × 500ms` across six per-detector benches). PRINCIPLES.md §10's stale ~2.5m budget bumped to ~5m with the dominant steps named. Mutation-tested locally: tightened HBM ceiling 2 → 1, ran `make bench-allocs-check`, gate exit 1 with `BenchmarkHBMECCDetector allocs/event=1.3955 > ceiling=1 BREACH`; reverted; `make ci-full` PASS with cold testcache in 283s. Signed-off-by: Tri Lam <tree@lumalabs.ai>
Independent Adversarial ReviewB/A/A+ Criteria for THIS PRB (Acceptable):
A (Good):
A+ (Exemplary):
FindingsFinding 1: Budget correction (2.5m → 5m) conflates pre-existing drift with new cost Makefile.382 & PRINCIPLES.md: PR body claims "historical ~2.5m budget was already stale (cold-cache Per decision-priority rule 1 (UX/CI throughput pain): This is load-bearing. If 2.5m was aspirational/historical with no enforcement, correction is honest. If it was a taut constraint, the PR quietly relaxes it to fit the new gate without acknowledging the pre-existing problem. Status: Likely legitimate (no enforcement mechanism ever existed for the historical budget), but git-blame on PRINCIPLES.md §10 would confirm. Finding 2: Makefile comment (9 lines) is load-bearing, not noise The 9-line comment block explains why +80s wall-time is acceptable: local pre-PR enforcement > wall-time budget, because local/CI divergence defeats per-PR gate intent (#302). This is genuine architectural reasoning. Comment references issue #424, PRINCIPLES.md §10, and PR #302. Verdict: Clean; no simplification. Finding 3: Audit claims verified PR body audit of other bench targets is accurate:
No over-specification. Finding 4: Follow-up issue NOT filed (deferred ratchet path) PR body: "Will file separately: ratchet path to fold Per policy [[file-issues-for-unaddressed]]: load-bearing findings deferred "out-of-scope" → Recommendation: File issue before merge (e.g., "[follow-up #442] ratchet Simplification SweepMakefile comment and PRINCIPLES.md wording are tight. No redundancy, noise, scaffolding, or dead code. Verdict: clean. VERDICT: A — Functionally correct, wall-time tradeoff justified, deferred ratchet must be filed as issue before merge.✅ Closes #424 (functional gap). |
|
Filed follow-up issue #445 for ci-full wall-time ratchet (per reviewer action). Auto-merge armed. |
Closes #445. ## Root cause The `changes` job in `.github/workflows/ci.yml` has existed since #355 (Mon 2026-06-01) emitting an `outputs.code` boolean that flags docs-only PRs. The accompanying comment claimed `verify-test`, `verify-lint`, and `build` skip on doc-only changes — **but no job actually consumed the output**. Every PR has been running the full ci-full suite (coverage-check, vet+lint, build×2 arches, verify-static, sdk-python) regardless of diff scope. PR #442 (#424) wired `bench-allocs-check` into `make ci-full` and the CI `bench-check` step picked up the +80s standalone cost (+54s effective with warm cache). That made the long-orphaned `changes`-job miswiring visible enough to file as #445, but the underlying defect is older: a path-filter shipped without consumers. ## Change Two outputs on the `changes` job, consumed by `if:` conditions: - **`code`** (true iff any non-`docs/*`-non-`*.md` file changed) gates `verify-test`, `verify-lint`, `build`, `smoke-test-binary`, `sdk-python`. Go gates cannot regress on doc-only edits. - **`bench`** (true iff `module/pkg/patterns/**`, `bench/**`, `scripts/bench-*.sh`, or `Makefile` changed) gates the `bench-check` step *inside* `verify-static`. `bench-allocs-check` cannot regress on a diff that doesn't touch detector source, the bench harness, the bench scripts, or the bench-target wiring. Both filters **fail open** on an empty diff or unreachable base ref, so every push to `main` and every bench-touching PR keeps the gate mandatory. `verify-static` and `validator-recipe` keep running on every PR — they own the doc-touching gates (`doc-check`, `cut-criteria-check`, `slo-rules-check`, recipe-YAML validation under `docs/integrations/examples/`). The `verify` aggregator now treats `result=skipped` as satisfied on the doc-only path for the skippable sub-jobs (verify-test, verify-lint, sdk-python). verify-static + validator-recipe still must be `success` on every PR — encoded in the aggregator, not relying on branch-protection SKIPPED-is-OK semantics. ## Wall-time impact | PR shape | Before | After | Δ | |---|---|---|---| | Docs-only (e.g. #423) | ~283s (full suite) | bounded by `verify-static` minus bench-check step | drops the +54s bench delta + the verify-test 125s pole + the verify-lint 60s pole | | Bench-touching (e.g. #442, this one) | ~283s | ~283s | 0 (gate still runs) | | Non-bench code (e.g. processor change) | ~283s | ~203s | -80s (bench-check skip) | ## TDD (red→green) Mirrored the workflow filter logic in `/tmp/test-changes-filter.sh` (12 cases × code+bench outputs = 24 assertions). Covered: docs-only, detector source, bench-script, bench-registry-via-`bench-*.sh`-glob, Makefile, non-detector go, bench-baseline.txt under testdata, mixed docs+code, empty-diff fail-open, bench/ dir, top-level `.md` (PRINCIPLES.md), workflow-file change. All 24 green. Real-PR validation: - `gh pr diff 423` (docs-only) → `code=false bench=false` (heavy gates + bench-check skip) - `gh pr diff 442` (Makefile + PRINCIPLES.md) → `code=true bench=true` (gates run) ## Verify (green) - `make actionlint` — clean (initial run flagged SC2221/SC2222 on a redundant `scripts/bench-registry.sh` pattern that the `scripts/bench-*.sh` glob already subsumed; removed) - `make lint` — 0 issues - `make ci-fast` — clean (lint, vet, mod-verify, attribute-namespace-check, doc-check, alert-check, chart-appversion-check, rfc-status-check, slo-rules-check) - `make doc-check` — clean - Pre-commit hooks — clean (golangci-lint, vet, mod-verify, attribute-namespace-check, hit-line-format-stable, no-autoupdate-check_test) ## A+ audit notes Audited other ci-full jobs for docs-only skip eligibility: - `validator-recipe` — NOT skipped. Recipe YAMLs live under `docs/integrations/examples/`, so docs-classified diffs can include functional content. Out-of-scope for safe skip. - `verify-static` — runs always. Contains `doc-check`, `cut-criteria-check`, `slo-rules-check`, `actionlint`, `zizmor`, `register-lint`, `deprecation-check` — all can regress on docs. Step-level skip applied only to the `install benchstat` + `bench-check` pair via `changes.outputs.bench`. - `validator-recipe` and `smoke-test-binary` not eligible for `bench` output (don't touch detector code). PRINCIPLES.md §10 updated with the path-filter policy (which jobs skip on docs-only, why verify-static stays, fail-open semantics) so the next reader can see the policy without grepping the workflow. ```release-notes - CI bench-check step now skips on PRs that don't touch the detector tree, the bench harness, the bench scripts, or the Makefile. Heavy Go gates (verify-test, verify-lint, build, smoke-test-binary, sdk-python) skip on docs-only PRs. The verify aggregator treats SKIPPED-on-docs-only as success; verify-static and validator-recipe still run on every PR. Closes #445. ``` Signed-off-by: Tri Lam <tree@lumalabs.ai>
Closes #424.
Root cause
PR #416 shipped
make bench-allocs-checkas the absolute allocs/event ceiling gate and wired it intomake bench-check(so CI'sverify-staticjob runs it). It was not wired intomake ci-full. Result: local pre-PRmake ci-fulldiverges from what CI enforces — a per-event allocs ceiling breach passes locally and only fails after the PR is opened, defeating the per-PR enforcement intent of #302.(The issue's framing claims per-PR enforcement is entirely missing; in fact CI's verify-static already enforces it via the
bench-checkstep. The actual gap is local/CI divergence, which is what this PR closes.)Change
One-line dep addition:
bench-allocs-checkappended toci-full's prerequisite list. Plus comment + PRINCIPLES.md §10 update naming the new wall-time floor.Wall-time impact (M1 Max)
make ci-full(cold testcache)make bench-allocs-checkstandaloneDelta: +54s (effective, with warm Go build cache from prior
ci-fullsteps; cold-only standalone bench is ~80s).PRINCIPLES.md §10's historical "~2.5m" budget was already stale (cold-cache
ci-fullmeasured 229s = 3.8m before this PR). Updated to ~5m and named the dominant steps (coverage-check,build,bench-allocs-check).Mutation test (TDD red)
Tightened HBM ceiling 2 → 1 in
scripts/bench-registry.sh, ranmake bench-allocs-check:Reverted. Confirmed
make -n ci-full | grep bench-allocs-checkresolves toscripts/bench-allocs-check.sh(i.e., the new dep is inci-full's graph).Verify (green)
make ci-full(cold testcache) → PASS in 283.54s; tail shows all six detectors at-or-below ceiling.make lint,make vet, pre-commit hooks → green.Audit (A+ criterion)
Other bench targets and CI status:
bench-check(composite: % delta + allocs ceiling) — CI's verify-static runs it; localci-fullnow runs the allocs-ceiling half. The % delta half (bench-check-all.sh) is not added toci-fullbecause it requiresbenchstat(CI installs it; local devs would need a manualgo install). Acceptable: CI is the ground truth; local divergence on the % delta gate is acknowledged.bench-detectors-check— intentionally SOFT gate (script comment), runs in.github/workflows/bench.ymlas a PR-comment annotation. Graduates to hard-fail perbench/detectors/README.mdroadmap. No action.bench,bench-detectors,bench-baseline,bench-detectors-baseline— manual exploration / regen helpers. Not gates. No action.Why not
verify(pre-push)?verifyis documented as<30s; +80s busts the budget 3x.Follow-up
Will file separately: ratchet path to fold
bench-allocs-checkwall-time down (fasterbenchtimeorcount) so the gate eventually folds back under the historical 2.5m local budget. Tracked-against-#424 in the issue's existing "Cost" line.