ci(makefile): wire bench-allocs-check into ci-full (#424) by trilamsr · Pull Request #442 · TraceCoreAI/tracecore

trilamsr · 2026-06-02T01:28:24Z

Closes #424.

Root cause

PR #416 shipped make bench-allocs-check as the absolute allocs/event ceiling gate and wired it into make bench-check (so CI's verify-static job runs it). It was not wired into make ci-full. Result: local pre-PR make ci-full diverges from what CI enforces — a per-event allocs ceiling breach passes locally and only fails after the PR is opened, defeating the per-PR enforcement intent of #302.

(The issue's framing claims per-PR enforcement is entirely missing; in fact CI's verify-static already enforces it via the bench-check step. The actual gap is local/CI divergence, which is what this PR closes.)

Change

One-line dep addition: bench-allocs-check appended to ci-full's prerequisite list. Plus comment + PRINCIPLES.md §10 update naming the new wall-time floor.

-ci-full: ... build smoke-quickstart  ## ...
+ci-full: ... build smoke-quickstart bench-allocs-check  ## ...

Wall-time impact (M1 Max)

Run	Before	After
`make ci-full` (cold testcache)	229s	283s
`make bench-allocs-check` standalone	n/a	~80s

Delta: +54s (effective, with warm Go build cache from prior ci-full steps; cold-only standalone bench is ~80s).

PRINCIPLES.md §10's historical "~2.5m" budget was already stale (cold-cache ci-full measured 229s = 3.8m before this PR). Updated to ~5m and named the dominant steps (coverage-check, build, bench-allocs-check).

Mutation test (TDD red)

Tightened HBM ceiling 2 → 1 in scripts/bench-registry.sh, ran make bench-allocs-check:

BenchmarkHBMECCDetector ceiling=1 allocs/op=1429 allocs/ev=1.3955 BREACH
FAIL: the following detectors exceeded their allocs/event ceiling:
  BenchmarkHBMECCDetector allocs/event=1.3955 > ceiling=1 (allocs/op=1429, window=1024)
make: *** [bench-allocs-check] Error 1

Reverted. Confirmed make -n ci-full | grep bench-allocs-check resolves to scripts/bench-allocs-check.sh (i.e., the new dep is in ci-full's graph).

Verify (green)

make ci-full (cold testcache) → PASS in 283.54s; tail shows all six detectors at-or-below ceiling.
make lint, make vet, pre-commit hooks → green.

Audit (A+ criterion)

Other bench targets and CI status:

bench-check (composite: % delta + allocs ceiling) — CI's verify-static runs it; local ci-full now runs the allocs-ceiling half. The % delta half (bench-check-all.sh) is not added to ci-full because it requires benchstat (CI installs it; local devs would need a manual go install). Acceptable: CI is the ground truth; local divergence on the % delta gate is acknowledged.
bench-detectors-check — intentionally SOFT gate (script comment), runs in .github/workflows/bench.yml as a PR-comment annotation. Graduates to hard-fail per bench/detectors/README.md roadmap. No action.
bench, bench-detectors, bench-baseline, bench-detectors-baseline — manual exploration / regen helpers. Not gates. No action.

Why not verify (pre-push)? verify is documented as <30s; +80s busts the budget 3x.

Follow-up

Will file separately: ratchet path to fold bench-allocs-check wall-time down (faster benchtime or count) so the gate eventually folds back under the historical 2.5m local budget. Tracked-against-#424 in the issue's existing "Cost" line.

- `make ci-full` now invokes `bench-allocs-check`, closing the local/CI gap that let per-event allocs-ceiling breaches pass local pre-PR but fail in CI (#424).

PR #416 shipped `make bench-allocs-check` as the absolute allocs/event ceiling gate but the target was standalone — not invoked from local `make ci-full`. CI's verify-static job already runs `make bench-check` (which transitively runs the same script), so per-PR enforcement was in place — but local `ci-full` divergence surfaced ceiling breaches only post-PR-open, defeating the per-PR enforcement intent of #302. Adds `bench-allocs-check` to `ci-full`'s dependency list. Wall-time on M1 Max grows from ~3.8m to ~4.7m (+~80s for `count=10 × 500ms` across six per-detector benches). PRINCIPLES.md §10's stale ~2.5m budget bumped to ~5m with the dominant steps named. Mutation-tested locally: tightened HBM ceiling 2 → 1, ran `make bench-allocs-check`, gate exit 1 with `BenchmarkHBMECCDetector allocs/event=1.3955 > ceiling=1 BREACH`; reverted; `make ci-full` PASS with cold testcache in 283s. Signed-off-by: Tri Lam <tree@lumalabs.ai>

trilamsr · 2026-06-02T01:31:44Z

Independent Adversarial Review

B/A/A+ Criteria for THIS PR

B (Acceptable):

Closes issue [follow-up #416] wire bench-allocs-check into ci-full / verify #424 (local/CI divergence on per-event allocs ceiling enforcement)
One-line mechanical Makefile change wires bench-allocs-check into ci-full
Comment and PRINCIPLES.md update acknowledge the wall-time impact and new budget floor

A (Good):

All of B, PLUS:
Mutation test (TDD red/green cycle) confirms gate catches ceiling breaches
Explicit wall-time delta measured: +54s effective on M1 Max (80s standalone bench, overlapped with build cache)
Audit verifies other bench targets (bench-check, bench-detectors-check) not added (with rationale)
Follow-up acknowledged: ratchet path to fold wall-time down

A+ (Exemplary):

All of A, PLUS:
No simplification opportunities AND deferred follow-up filed as issue before merge

Findings

Finding 1: Budget correction (2.5m → 5m) conflates pre-existing drift with new cost

Makefile.382 & PRINCIPLES.md: PR body claims "historical ~2.5m budget was already stale (cold-cache ci-full measured 229s = 3.8m before this PR)." This is internally incoherent: 229s is higher than 2.5m (3m49s > 2.5m), not "stale" due to drift-downward. The real story: budget was set at commit 8b98e69 (~362) and unchanged for ~6mo; CI ran slower than budgeted. This PR updates budget to reality (~5m) but doesn't cleanly separate: (a) pre-existing +90s drift, vs (b) new +54s from this change.

Per decision-priority rule 1 (UX/CI throughput pain): This is load-bearing. If 2.5m was aspirational/historical with no enforcement, correction is honest. If it was a taut constraint, the PR quietly relaxes it to fit the new gate without acknowledging the pre-existing problem.

Status: Likely legitimate (no enforcement mechanism ever existed for the historical budget), but git-blame on PRINCIPLES.md §10 would confirm.

Finding 2: Makefile comment (9 lines) is load-bearing, not noise

The 9-line comment block explains why +80s wall-time is acceptable: local pre-PR enforcement > wall-time budget, because local/CI divergence defeats per-PR gate intent (#302). This is genuine architectural reasoning. Comment references issue #424, PRINCIPLES.md §10, and PR #302.

Verdict: Clean; no simplification.

Finding 3: Audit claims verified

PR body audit of other bench targets is accurate:

bench-check (composite: % delta + allocs ceiling) — CI runs it; local now runs allocs-ceiling half only ✓
bench-detectors-check — soft gate, PR-comment annotation only ✓
Other bench-* — manual exploration only ✓

No over-specification.

Finding 4: Follow-up issue NOT filed (deferred ratchet path)

PR body: "Will file separately: ratchet path to fold bench-allocs-check wall-time down…" Builder self-graded A not A+, honestly.

Per policy [[file-issues-for-unaddressed]]: load-bearing findings deferred "out-of-scope" → gh issue create BEFORE merge. The +54s wall-time ratchet is load-bearing and deferred.

Recommendation: File issue before merge (e.g., "[follow-up #442] ratchet bench-allocs-check wall-time down: faster benchtime/count or alternative gate").

Simplification Sweep

Makefile comment and PRINCIPLES.md wording are tight. No redundancy, noise, scaffolding, or dead code. Verdict: clean.

VERDICT: A — Functionally correct, wall-time tradeoff justified, deferred ratchet must be filed as issue before merge.

✅ Closes #424 (functional gap).
✅ Mutation test confirms gate works (TDD met).
✅ Wall-time impact measured.
✅ No over-specification.
✅ Comment is load-bearing.
⚠️ Deferred ratchet path issue NOT filed → file before merge per policy.
❓ Budget correction (2.5m → 5m) likely legitimate but unconfirmed; git-blame §10 would clarify.

trilamsr · 2026-06-02T01:32:35Z

Filed follow-up issue #445 for ci-full wall-time ratchet (per reviewer action). Auto-merge armed.

Closes #445. ## Root cause The `changes` job in `.github/workflows/ci.yml` has existed since #355 (Mon 2026-06-01) emitting an `outputs.code` boolean that flags docs-only PRs. The accompanying comment claimed `verify-test`, `verify-lint`, and `build` skip on doc-only changes — **but no job actually consumed the output**. Every PR has been running the full ci-full suite (coverage-check, vet+lint, build×2 arches, verify-static, sdk-python) regardless of diff scope. PR #442 (#424) wired `bench-allocs-check` into `make ci-full` and the CI `bench-check` step picked up the +80s standalone cost (+54s effective with warm cache). That made the long-orphaned `changes`-job miswiring visible enough to file as #445, but the underlying defect is older: a path-filter shipped without consumers. ## Change Two outputs on the `changes` job, consumed by `if:` conditions: - **`code`** (true iff any non-`docs/*`-non-`*.md` file changed) gates `verify-test`, `verify-lint`, `build`, `smoke-test-binary`, `sdk-python`. Go gates cannot regress on doc-only edits. - **`bench`** (true iff `module/pkg/patterns/**`, `bench/**`, `scripts/bench-*.sh`, or `Makefile` changed) gates the `bench-check` step *inside* `verify-static`. `bench-allocs-check` cannot regress on a diff that doesn't touch detector source, the bench harness, the bench scripts, or the bench-target wiring. Both filters **fail open** on an empty diff or unreachable base ref, so every push to `main` and every bench-touching PR keeps the gate mandatory. `verify-static` and `validator-recipe` keep running on every PR — they own the doc-touching gates (`doc-check`, `cut-criteria-check`, `slo-rules-check`, recipe-YAML validation under `docs/integrations/examples/`). The `verify` aggregator now treats `result=skipped` as satisfied on the doc-only path for the skippable sub-jobs (verify-test, verify-lint, sdk-python). verify-static + validator-recipe still must be `success` on every PR — encoded in the aggregator, not relying on branch-protection SKIPPED-is-OK semantics. ## Wall-time impact | PR shape | Before | After | Δ | |---|---|---|---| | Docs-only (e.g. #423) | ~283s (full suite) | bounded by `verify-static` minus bench-check step | drops the +54s bench delta + the verify-test 125s pole + the verify-lint 60s pole | | Bench-touching (e.g. #442, this one) | ~283s | ~283s | 0 (gate still runs) | | Non-bench code (e.g. processor change) | ~283s | ~203s | -80s (bench-check skip) | ## TDD (red→green) Mirrored the workflow filter logic in `/tmp/test-changes-filter.sh` (12 cases × code+bench outputs = 24 assertions). Covered: docs-only, detector source, bench-script, bench-registry-via-`bench-*.sh`-glob, Makefile, non-detector go, bench-baseline.txt under testdata, mixed docs+code, empty-diff fail-open, bench/ dir, top-level `.md` (PRINCIPLES.md), workflow-file change. All 24 green. Real-PR validation: - `gh pr diff 423` (docs-only) → `code=false bench=false` (heavy gates + bench-check skip) - `gh pr diff 442` (Makefile + PRINCIPLES.md) → `code=true bench=true` (gates run) ## Verify (green) - `make actionlint` — clean (initial run flagged SC2221/SC2222 on a redundant `scripts/bench-registry.sh` pattern that the `scripts/bench-*.sh` glob already subsumed; removed) - `make lint` — 0 issues - `make ci-fast` — clean (lint, vet, mod-verify, attribute-namespace-check, doc-check, alert-check, chart-appversion-check, rfc-status-check, slo-rules-check) - `make doc-check` — clean - Pre-commit hooks — clean (golangci-lint, vet, mod-verify, attribute-namespace-check, hit-line-format-stable, no-autoupdate-check_test) ## A+ audit notes Audited other ci-full jobs for docs-only skip eligibility: - `validator-recipe` — NOT skipped. Recipe YAMLs live under `docs/integrations/examples/`, so docs-classified diffs can include functional content. Out-of-scope for safe skip. - `verify-static` — runs always. Contains `doc-check`, `cut-criteria-check`, `slo-rules-check`, `actionlint`, `zizmor`, `register-lint`, `deprecation-check` — all can regress on docs. Step-level skip applied only to the `install benchstat` + `bench-check` pair via `changes.outputs.bench`. - `validator-recipe` and `smoke-test-binary` not eligible for `bench` output (don't touch detector code). PRINCIPLES.md §10 updated with the path-filter policy (which jobs skip on docs-only, why verify-static stays, fail-open semantics) so the next reader can see the policy without grepping the workflow. ```release-notes - CI bench-check step now skips on PRs that don't touch the detector tree, the bench harness, the bench scripts, or the Makefile. Heavy Go gates (verify-test, verify-lint, build, smoke-test-binary, sdk-python) skip on docs-only PRs. The verify aggregator treats SKIPPED-on-docs-only as success; verify-static and validator-recipe still run on every PR. Closes #445. ``` Signed-off-by: Tri Lam <tree@lumalabs.ai>

trilamsr mentioned this pull request Jun 2, 2026

perf(ci): ratchet ci-full wall-time back under 5m budget #445

Closed

trilamsr enabled auto-merge (squash) June 2, 2026 01:32

trilamsr merged commit 531515f into main Jun 2, 2026
12 checks passed

trilamsr deleted the ci/424-wire-bench-allocs-check branch June 2, 2026 01:36

trilamsr mentioned this pull request Jun 2, 2026

perf(ci): skip bench + heavy gates on docs-only PRs (#445) #452

Merged

trilamsr mentioned this pull request Jun 2, 2026

audit(wave-2026-06-02): autonomous-wave cross-cut review #488

Closed

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

ci(makefile): wire bench-allocs-check into ci-full (#424)#442

ci(makefile): wire bench-allocs-check into ci-full (#424)#442
trilamsr merged 1 commit into
mainfrom
ci/424-wire-bench-allocs-check

trilamsr commented Jun 2, 2026

Uh oh!

trilamsr commented Jun 2, 2026

Uh oh!

trilamsr commented Jun 2, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

trilamsr commented Jun 2, 2026

Root cause

Change

Wall-time impact (M1 Max)

Mutation test (TDD red)

Verify (green)

Audit (A+ criterion)

Follow-up

Uh oh!

trilamsr commented Jun 2, 2026

Independent Adversarial Review

B/A/A+ Criteria for THIS PR

Findings

Simplification Sweep

VERDICT: A — Functionally correct, wall-time tradeoff justified, deferred ratchet must be filed as issue before merge.

Uh oh!

trilamsr commented Jun 2, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant