Skip to content

ci(makefile): wire bench-allocs-check into ci-full (#424)#442

Merged
trilamsr merged 1 commit into
mainfrom
ci/424-wire-bench-allocs-check
Jun 2, 2026
Merged

ci(makefile): wire bench-allocs-check into ci-full (#424)#442
trilamsr merged 1 commit into
mainfrom
ci/424-wire-bench-allocs-check

Conversation

@trilamsr

@trilamsr trilamsr commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Closes #424.

Root cause

PR #416 shipped make bench-allocs-check as the absolute allocs/event ceiling gate and wired it into make bench-check (so CI's verify-static job runs it). It was not wired into make ci-full. Result: local pre-PR make ci-full diverges from what CI enforces — a per-event allocs ceiling breach passes locally and only fails after the PR is opened, defeating the per-PR enforcement intent of #302.

(The issue's framing claims per-PR enforcement is entirely missing; in fact CI's verify-static already enforces it via the bench-check step. The actual gap is local/CI divergence, which is what this PR closes.)

Change

One-line dep addition: bench-allocs-check appended to ci-full's prerequisite list. Plus comment + PRINCIPLES.md §10 update naming the new wall-time floor.

-ci-full: ... build smoke-quickstart  ## ...
+ci-full: ... build smoke-quickstart bench-allocs-check  ## ...

Wall-time impact (M1 Max)

Run Before After
make ci-full (cold testcache) 229s 283s
make bench-allocs-check standalone n/a ~80s

Delta: +54s (effective, with warm Go build cache from prior ci-full steps; cold-only standalone bench is ~80s).

PRINCIPLES.md §10's historical "~2.5m" budget was already stale (cold-cache ci-full measured 229s = 3.8m before this PR). Updated to ~5m and named the dominant steps (coverage-check, build, bench-allocs-check).

Mutation test (TDD red)

Tightened HBM ceiling 2 → 1 in scripts/bench-registry.sh, ran make bench-allocs-check:

BenchmarkHBMECCDetector ceiling=1 allocs/op=1429 allocs/ev=1.3955 BREACH
FAIL: the following detectors exceeded their allocs/event ceiling:
  BenchmarkHBMECCDetector allocs/event=1.3955 > ceiling=1 (allocs/op=1429, window=1024)
make: *** [bench-allocs-check] Error 1

Reverted. Confirmed make -n ci-full | grep bench-allocs-check resolves to scripts/bench-allocs-check.sh (i.e., the new dep is in ci-full's graph).

Verify (green)

  • make ci-full (cold testcache) → PASS in 283.54s; tail shows all six detectors at-or-below ceiling.
  • make lint, make vet, pre-commit hooks → green.

Audit (A+ criterion)

Other bench targets and CI status:

  • bench-check (composite: % delta + allocs ceiling) — CI's verify-static runs it; local ci-full now runs the allocs-ceiling half. The % delta half (bench-check-all.sh) is not added to ci-full because it requires benchstat (CI installs it; local devs would need a manual go install). Acceptable: CI is the ground truth; local divergence on the % delta gate is acknowledged.
  • bench-detectors-check — intentionally SOFT gate (script comment), runs in .github/workflows/bench.yml as a PR-comment annotation. Graduates to hard-fail per bench/detectors/README.md roadmap. No action.
  • bench, bench-detectors, bench-baseline, bench-detectors-baseline — manual exploration / regen helpers. Not gates. No action.

Why not verify (pre-push)? verify is documented as <30s; +80s busts the budget 3x.

Follow-up

Will file separately: ratchet path to fold bench-allocs-check wall-time down (faster benchtime or count) so the gate eventually folds back under the historical 2.5m local budget. Tracked-against-#424 in the issue's existing "Cost" line.

- `make ci-full` now invokes `bench-allocs-check`, closing the local/CI gap that let per-event allocs-ceiling breaches pass local pre-PR but fail in CI (#424).

PR #416 shipped `make bench-allocs-check` as the absolute allocs/event
ceiling gate but the target was standalone — not invoked from local
`make ci-full`. CI's verify-static job already runs `make bench-check`
(which transitively runs the same script), so per-PR enforcement was
in place — but local `ci-full` divergence surfaced ceiling breaches
only post-PR-open, defeating the per-PR enforcement intent of #302.

Adds `bench-allocs-check` to `ci-full`'s dependency list. Wall-time
on M1 Max grows from ~3.8m to ~4.7m (+~80s for `count=10 × 500ms`
across six per-detector benches). PRINCIPLES.md §10's stale ~2.5m
budget bumped to ~5m with the dominant steps named.

Mutation-tested locally: tightened HBM ceiling 2 → 1, ran
`make bench-allocs-check`, gate exit 1 with
`BenchmarkHBMECCDetector allocs/event=1.3955 > ceiling=1 BREACH`;
reverted; `make ci-full` PASS with cold testcache in 283s.

Signed-off-by: Tri Lam <tree@lumalabs.ai>
@trilamsr

trilamsr commented Jun 2, 2026

Copy link
Copy Markdown
Contributor Author

Independent Adversarial Review

B/A/A+ Criteria for THIS PR

B (Acceptable):

A (Good):

  • All of B, PLUS:
  • Mutation test (TDD red/green cycle) confirms gate catches ceiling breaches
  • Explicit wall-time delta measured: +54s effective on M1 Max (80s standalone bench, overlapped with build cache)
  • Audit verifies other bench targets (bench-check, bench-detectors-check) not added (with rationale)
  • Follow-up acknowledged: ratchet path to fold wall-time down

A+ (Exemplary):

  • All of A, PLUS:
  • No simplification opportunities AND deferred follow-up filed as issue before merge

Findings

Finding 1: Budget correction (2.5m → 5m) conflates pre-existing drift with new cost

Makefile.382 & PRINCIPLES.md: PR body claims "historical ~2.5m budget was already stale (cold-cache ci-full measured 229s = 3.8m before this PR)." This is internally incoherent: 229s is higher than 2.5m (3m49s > 2.5m), not "stale" due to drift-downward. The real story: budget was set at commit 8b98e69 (~362) and unchanged for ~6mo; CI ran slower than budgeted. This PR updates budget to reality (~5m) but doesn't cleanly separate: (a) pre-existing +90s drift, vs (b) new +54s from this change.

Per decision-priority rule 1 (UX/CI throughput pain): This is load-bearing. If 2.5m was aspirational/historical with no enforcement, correction is honest. If it was a taut constraint, the PR quietly relaxes it to fit the new gate without acknowledging the pre-existing problem.

Status: Likely legitimate (no enforcement mechanism ever existed for the historical budget), but git-blame on PRINCIPLES.md §10 would confirm.


Finding 2: Makefile comment (9 lines) is load-bearing, not noise

The 9-line comment block explains why +80s wall-time is acceptable: local pre-PR enforcement > wall-time budget, because local/CI divergence defeats per-PR gate intent (#302). This is genuine architectural reasoning. Comment references issue #424, PRINCIPLES.md §10, and PR #302.

Verdict: Clean; no simplification.


Finding 3: Audit claims verified

PR body audit of other bench targets is accurate:

  • bench-check (composite: % delta + allocs ceiling) — CI runs it; local now runs allocs-ceiling half only ✓
  • bench-detectors-check — soft gate, PR-comment annotation only ✓
  • Other bench-* — manual exploration only ✓

No over-specification.


Finding 4: Follow-up issue NOT filed (deferred ratchet path)

PR body: "Will file separately: ratchet path to fold bench-allocs-check wall-time down…" Builder self-graded A not A+, honestly.

Per policy [[file-issues-for-unaddressed]]: load-bearing findings deferred "out-of-scope" → gh issue create BEFORE merge. The +54s wall-time ratchet is load-bearing and deferred.

Recommendation: File issue before merge (e.g., "[follow-up #442] ratchet bench-allocs-check wall-time down: faster benchtime/count or alternative gate").


Simplification Sweep

Makefile comment and PRINCIPLES.md wording are tight. No redundancy, noise, scaffolding, or dead code. Verdict: clean.


VERDICT: A — Functionally correct, wall-time tradeoff justified, deferred ratchet must be filed as issue before merge.

✅ Closes #424 (functional gap).
✅ Mutation test confirms gate works (TDD met).
✅ Wall-time impact measured.
✅ No over-specification.
✅ Comment is load-bearing.
⚠️ Deferred ratchet path issue NOT filed → file before merge per policy.
❓ Budget correction (2.5m → 5m) likely legitimate but unconfirmed; git-blame §10 would clarify.

@trilamsr

trilamsr commented Jun 2, 2026

Copy link
Copy Markdown
Contributor Author

Filed follow-up issue #445 for ci-full wall-time ratchet (per reviewer action). Auto-merge armed.

@trilamsr trilamsr merged commit 531515f into main Jun 2, 2026
12 checks passed
@trilamsr trilamsr deleted the ci/424-wire-bench-allocs-check branch June 2, 2026 01:36
trilamsr added a commit that referenced this pull request Jun 2, 2026
Closes #445.

## Root cause

The `changes` job in `.github/workflows/ci.yml` has existed since #355
(Mon 2026-06-01) emitting an `outputs.code` boolean that flags docs-only
PRs. The accompanying comment claimed `verify-test`, `verify-lint`, and
`build` skip on doc-only changes — **but no job actually consumed the
output**. Every PR has been running the full ci-full suite
(coverage-check, vet+lint, build×2 arches, verify-static, sdk-python)
regardless of diff scope.

PR #442 (#424) wired `bench-allocs-check` into `make ci-full` and the CI
`bench-check` step picked up the +80s standalone cost (+54s effective
with warm cache). That made the long-orphaned `changes`-job miswiring
visible enough to file as #445, but the underlying defect is older: a
path-filter shipped without consumers.

## Change

Two outputs on the `changes` job, consumed by `if:` conditions:

- **`code`** (true iff any non-`docs/*`-non-`*.md` file changed) gates
`verify-test`, `verify-lint`, `build`, `smoke-test-binary`,
`sdk-python`. Go gates cannot regress on doc-only edits.
- **`bench`** (true iff `module/pkg/patterns/**`, `bench/**`,
`scripts/bench-*.sh`, or `Makefile` changed) gates the `bench-check`
step *inside* `verify-static`. `bench-allocs-check` cannot regress on a
diff that doesn't touch detector source, the bench harness, the bench
scripts, or the bench-target wiring.

Both filters **fail open** on an empty diff or unreachable base ref, so
every push to `main` and every bench-touching PR keeps the gate
mandatory.

`verify-static` and `validator-recipe` keep running on every PR — they
own the doc-touching gates (`doc-check`, `cut-criteria-check`,
`slo-rules-check`, recipe-YAML validation under
`docs/integrations/examples/`).

The `verify` aggregator now treats `result=skipped` as satisfied on the
doc-only path for the skippable sub-jobs (verify-test, verify-lint,
sdk-python). verify-static + validator-recipe still must be `success` on
every PR — encoded in the aggregator, not relying on branch-protection
SKIPPED-is-OK semantics.

## Wall-time impact

| PR shape | Before | After | Δ |
|---|---|---|---|
| Docs-only (e.g. #423) | ~283s (full suite) | bounded by
`verify-static` minus bench-check step | drops the +54s bench delta +
the verify-test 125s pole + the verify-lint 60s pole |
| Bench-touching (e.g. #442, this one) | ~283s | ~283s | 0 (gate still
runs) |
| Non-bench code (e.g. processor change) | ~283s | ~203s | -80s
(bench-check skip) |

## TDD (red→green)

Mirrored the workflow filter logic in `/tmp/test-changes-filter.sh` (12
cases × code+bench outputs = 24 assertions). Covered: docs-only,
detector source, bench-script, bench-registry-via-`bench-*.sh`-glob,
Makefile, non-detector go, bench-baseline.txt under testdata, mixed
docs+code, empty-diff fail-open, bench/ dir, top-level `.md`
(PRINCIPLES.md), workflow-file change. All 24 green.

Real-PR validation:
- `gh pr diff 423` (docs-only) → `code=false bench=false` (heavy gates +
bench-check skip)
- `gh pr diff 442` (Makefile + PRINCIPLES.md) → `code=true bench=true`
(gates run)

## Verify (green)

- `make actionlint` — clean (initial run flagged SC2221/SC2222 on a
redundant `scripts/bench-registry.sh` pattern that the
`scripts/bench-*.sh` glob already subsumed; removed)
- `make lint` — 0 issues
- `make ci-fast` — clean (lint, vet, mod-verify,
attribute-namespace-check, doc-check, alert-check,
chart-appversion-check, rfc-status-check, slo-rules-check)
- `make doc-check` — clean
- Pre-commit hooks — clean (golangci-lint, vet, mod-verify,
attribute-namespace-check, hit-line-format-stable,
no-autoupdate-check_test)

## A+ audit notes

Audited other ci-full jobs for docs-only skip eligibility:
- `validator-recipe` — NOT skipped. Recipe YAMLs live under
`docs/integrations/examples/`, so docs-classified diffs can include
functional content. Out-of-scope for safe skip.
- `verify-static` — runs always. Contains `doc-check`,
`cut-criteria-check`, `slo-rules-check`, `actionlint`, `zizmor`,
`register-lint`, `deprecation-check` — all can regress on docs.
Step-level skip applied only to the `install benchstat` + `bench-check`
pair via `changes.outputs.bench`.
- `validator-recipe` and `smoke-test-binary` not eligible for `bench`
output (don't touch detector code).

PRINCIPLES.md §10 updated with the path-filter policy (which jobs skip
on docs-only, why verify-static stays, fail-open semantics) so the next
reader can see the policy without grepping the workflow.

```release-notes
- CI bench-check step now skips on PRs that don't touch the detector tree, the bench harness, the bench scripts, or the Makefile. Heavy Go gates (verify-test, verify-lint, build, smoke-test-binary, sdk-python) skip on docs-only PRs. The verify aggregator treats SKIPPED-on-docs-only as success; verify-static and validator-recipe still run on every PR. Closes #445.
```

Signed-off-by: Tri Lam <tree@lumalabs.ai>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[follow-up #416] wire bench-allocs-check into ci-full / verify

1 participant