[ci] split verify into 3 parallel jobs + aggregator (~7m -> ~2:45m)#72
Merged
Conversation
The verify job ran 16 steps sequentially for 427s wall time on PR #70. Split into three parallel jobs grouped by the longest pole: - verify-test: coverage-check (race tests) ~125s - verify-lint: vet + lint ~155s - verify-static: license/generate/build-tags/tidy/register-lint/ nccl-fr RCE/actionlint/zizmor/fuzz/govulncheck/ doc-check/build ~140s A final aggregator job `verify` declares `needs: [verify-test, verify-lint, verify-static]` and echoes success. Branch protection already requires `verify`; keeping the name means zero protection churn while the three sub-jobs surface as informational rows. Wall time drops from ~7m to ~2:45m (bounded by verify-lint). Runner-minute cost is ~3.3x current — each job pays setup-go cache. `make ci` is unchanged — still sequential developer convenience. Signed-off-by: Tri Lam <trilamsr@gmail.com>
Adds one sentence to the partition comment block so a contributor adding a new lint/test/scanner gate knows the default bucket (verify-static) and the rebalance trigger (when it pushes verify-static past the verify-lint pole). Surfaced during review of the parallelization change. Signed-off-by: Tri Lam <trilamsr@gmail.com>
6 tasks
trilamsr
added a commit
that referenced
this pull request
May 19, 2026
## Summary Two root-cause fixes plus two compressions to the `pr-review-loop` skill. **Drops the ralph-loop plugin dependency.** The old path passed the full ~17 KB prompt body through `/ralph-loop:ralph-loop` as `$ARGUMENTS`, which Claude Code's permission classifier aborted with "parser timeout, resource limit, or over-length" on real-world invocation (caught when I tried to invoke the skill on PR #72). The documented manual-paste fallback hits the same classifier limit and was itself a dead end. The skill now orchestrates the 5 phases directly in the current session via `Agent` tool dispatches; no plugin invocation. **Adds a diff-size + path precondition.** Under 200 changed lines AND all paths in `*.md` / `.github/**` / `docs/**` / `.claude/**` auto-offers a single-subagent review instead of paying for ~14 subagent dispatches and 5 review-pass commits. The skill's own "When to run / Skip" guidance already named these as the wrong fit; the gate now enforces it. Operator opts in explicitly to keep the heavy loop. **Cuts.** The two-form `M<X>` / `(no-milestone)` substitution rule (resolved in the precondition step now, not in the prompt body). The enumerated MEMORY.md slug list (slugs are already in context via auto-memory). The reviewer-brief's motivational prose. The duplicate stopping-rule block (merged with acceptance into one canonical checklist). **Keeps.** All 5 phase definitions (Phase 1 self → Phase 2 stakeholder-lens parallel → Phase 3 adversarial → Phase 4 A+ aspiration → Phase 5 simplify + PR title/body sync). Validation cycle. TDD discipline. Reviewer-brief output format. Rubric evolution flow. Final-report format. Readiness-audit gate before completion promise. Net diff: -131 lines (-347/+216), 670 → 538 line file. The bigger change is structural: the skill is now self-contained instead of delegating to a third-party plugin. ## Test plan - [ ] Skim the precondition block — `#4` correctly auto-routes small CI/docs/`.claude/` PRs to a single-subagent path and only continues to 5 phases on explicit operator confirmation. - [ ] Skim "How the loop runs" — direct orchestration is clear, no leftover ralph-loop invocation language. - [ ] Spot-check that all 5 phase definitions are intact and reference the same `<reviewer-brief>`. - [ ] Confirm the merged acceptance/stopping-rule block still gates the completion promise on every prior criterion (PR title sync, AI-vocab clean, `make ci` green, `gh pr checks` green, DCO trailers, no `main` commits). - [ ] `make doc-check` clean on this branch (banned-phrase lint, link resolution, no growth in `(unverified)` count). - [ ] On the next `/pr-review-loop` invocation against a real PR, verify: (a) milestone auto-resolves from the branch name; (b) small/low-risk diff triggers the single-subagent prompt; (c) explicit override starts the 5 phases without trying to invoke `/ralph-loop:ralph-loop`. ## Out of scope Two items from the optimization sketch were not in this PR's ask and remain as follow-ups: dropping the rubric-evolution machinery if it isn't load-bearing in practice, and a lightweight sibling skill (`/pr-review`) for the everyday case. Signed-off-by: Tri Lam <trilamsr@gmail.com>
4 tasks
trilamsr
added a commit
that referenced
this pull request
May 19, 2026
## Summary PR #72's verify split introduced a silent branch-protection bypass. The aggregator job uses default `needs:` semantics, so any sub-job failure (`verify-test`, `verify-lint`, or `verify-static`) results in the aggregator being SKIPPED. GitHub treats SKIPPED required checks as satisfied, which means any failing sub-job allowed the PR to merge as if all checks passed. Caught on PR #73, which merged at 05:58:35Z despite `verify-test` failing at 05:57:19Z. The required check `verify` showed as SKIPPED rather than FAILURE, and protection treated the SKIPPED state as a pass. ## Fix The aggregator now runs `if: always()` (so it runs even when sub-jobs fail) and explicitly inspects each `needs.<sub-job>.result`, exiting non-zero when any sub-job did not return `success`. Each non-success result emits a `::error::` annotation naming the offending sub-job for operator debuggability. ```yaml verify: needs: [verify-test, verify-lint, verify-static] if: always() steps: - run: | fail=0 if [ "${{ needs.verify-test.result }}" != "success" ]; then ... ... [ "$fail" -eq 1 ] && exit 1 echo "all verify-* gates passed" ``` ## Why it matters Without this fix, the entire branch protection on `main` is effectively trivial — any flaky sub-job silently lets a merge through. The hole was open for the window between PR #72 (merged ~05:36Z) and this fix. ## Test plan - [ ] `make actionlint` clean ✅ - [ ] `make zizmor` clean ✅ (no findings, suppressed count went up because `${{ needs.*.result }}` is correctly recognized as a safe template expression). - [ ] CI on this PR exercises the new aggregator path: all three sub-jobs run, aggregator inspects results. If all pass, aggregator passes. If any fails, aggregator will FAIL (the previous behavior would have skipped here). - [ ] After merge, the next PR that hits a sub-job failure (e.g., a flaky test) will correctly block on `verify` reporting FAILURE. ## Out of scope The `TestReceiver_SLIBudget` flake observed on PR #73 (`p99 539ms > 500ms target` under race-on runner contention) is unrelated to this fix and worth a separate follow-up — either widen the budget under race or skip the SLI assertion in race mode. Signed-off-by: Tri Lam <trilamsr@gmail.com>
2 tasks
trilamsr
added a commit
that referenced
this pull request
May 19, 2026
## Summary After PR #72's verify split (wall 427s → 242s actual, vs 155s projected), investigated whether setup-go cache sharing could close the gap. It can't: all three `verify-*` jobs already share the same cache key and hit it. The gap is `verify-test` bounded at 235s by `make coverage-check` (race-on `go test ./...` across 50+ packages). Records two follow-ups in `docs/FOLLOWUPS.md` so future work doesn't re-chase setup-go: 1. `coverage-check` sharding across multiple jobs — real wall-time win (~150s target) but a real refactor (split test execution + coverage aggregation). Triggered by PR throughput becoming binding. 2. Move `build-tags` and `build` from verify-static into verify-lint — saves ~17% runner-minutes via dedupe of compile work, zero wall-time. Cheap, but no speed benefit; triggered by runner-minute pressure. ## Test plan - [x] `make doc-check` clean (banned-phrase, link-resolution, `(unverified)` baseline). - [ ] FOLLOWUPS-only PR; no code or workflow changes. ```release-notes No user-facing change. Records a CI wall-time investigation under docs/FOLLOWUPS.md so future agents don't re-chase setup-go cache sharing as a speedup lever. ``` Signed-off-by: Tri Lam <trilamsr@gmail.com>
4 tasks
trilamsr
added a commit
that referenced
this pull request
May 19, 2026
## Summary Adds two load-bearing lessons to `AGENTS.md` from this session's CI work. Both prevent a future contributor from repeating the same trap. **Aggregator bypass.** GitHub Actions short-circuits an aggregator job's `needs:` to SKIPPED on any sub-job failure, and treats SKIPPED required checks as satisfied. PR #73 silently merged past a failed `verify-test` because the aggregator from PR #72's verify split was SKIPPED rather than FAILURE. The fix shape (`if: always()` + `needs.*.result` check) shipped in PR #74; this lesson documents the trap and the fix so anyone splitting CI jobs in the future doesn't repeat it. **Perf-budget regex flake class.** `require.Regexp` with implicit upper bounds (e.g. `0\.0[0-9]+`) on values whose only invariant is `>0` flake on slow CI runners. Two of these hit in one session: `TestReceiver_SLIBudget` (emit-latency, observed 539ms) and `TestReceiver_SetDegraded` (degraded-seconds, observed 0.126s). The fix shape is the same in both — relax to any positive value (`\d+\.[0-9]*[1-9]`) or use baseline-relative comparisons. File goes from 128 to 148 lines (cap is 150, with 2 lines of remaining headroom — next addition should consider demoting an older entry to a topic note per the file's own promotion rule). ## Test plan - [x] `wc -l AGENTS.md` reports 148, under the 150-line cap. - [x] `make doc-check` clean (banned-phrase lint, 250 links resolve, `(unverified)` count = 7 baseline). - [x] Capture-flow format check (`learn-from-mistakes` skill): banned vocabulary absent, no first-person AI phrasing, no AI attribution, both entries carry `Anchor:` citations. - [ ] CI on this PR exercises the same gates plus the aggregator that's now itself an anchor of lesson #1. ```release-notes NONE — documentation only. Adds two load-bearing lessons to `AGENTS.md` covering GitHub Actions aggregator semantics and a recurring perf-budget regex flake class. No runtime behavior change. ``` Signed-off-by: Tri Lam <trilamsr@gmail.com>
3 tasks
trilamsr
added a commit
that referenced
this pull request
May 19, 2026
## Summary Three observations from the recent session that didn't fit the structured surfaces (already used for the load-bearing AGENTS.md entries in PR #81 and the agent-internal notes in PR #82). Each captured via the `learn-from-mistakes` flow and lands in its existing topic note. **`.claude/notes/automation.md`** — *Memory captures rationale; hooks enforce.* The pre-PR checklist personal memory landed mid-session was followed by a lint failure shipped to CI within the hour. The same gap closed reliably by the `PreToolUse` hook installed shortly after. For any "always do X before Y" pattern, prefer the hook; the memory documents *why* the hook exists. **`docs/notes/ci.md`** — *Frame CI / perf projections as ranges, not single numbers.* PR #72's 155s wall-time projection vs 242s actual cost an investigation round (later landed in PR #77) because the projection's setup-go-cache amortization assumption was unverified. Either verify assumptions empirically before publishing the number, or frame as a range. **`.claude/notes/review-patterns.md`** — *Self-rate work, then write criteria for the next grade up.* Forces articulation of measurable improvements rather than free-form "anything else?". PR #76's B+ → A → A+ came from two iterations of this exact pattern; each iteration closed real structural gaps. A fourth lesson — "fix existing tools before proposing new ones" — was captured to personal memory (no PR, lives in `~/.claude/projects/.../memory/`), not the repo, because it's a judgment heuristic about my own decision-making rather than a repo-resident convention. ## Test plan - [x] `make doc-check` clean (banned-phrase lint, link resolution, all gates). - [x] `learn-from-mistakes` format check: banned vocabulary absent, no first-person AI phrasing, no AI attribution, all three entries carry `Anchor:` lines pointing at concrete PRs. - [ ] CI on this PR exercises `doc-check` + `pr-lint`. ## Rollback Each entry is a self-contained `### title` + body + `Anchor:` block at the top of its file. No dependents elsewhere; reverting is a single Edit per file. ```release-notes NONE — documentation only. Three meta-lessons from a recent session retrospective land in their existing topic notes (`automation.md`, `ci.md`, `review-patterns.md`). No runtime behavior change. ``` Signed-off-by: Tri Lam <trilamsr@gmail.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
The
verifyCI job currently runs 16 steps sequentially — 427s wall time on PR #70 (the most recent merged PR). Split into three parallel jobs grouped by the longest single step, with an aggregator namedverifythat preserves the existing branch-protection required check.Partition
verify-testcoverage-check(race tests)verify-lintvet,lintverify-staticlicense-check,generate-check,build-tags,tidy-check,register-lint,nccl-fr RCE,actionlint, install+zizmor,ci-fuzz-nccl-fr,govulncheck,doc-check,buildverify(aggregator)needs: [verify-test, verify-lint, verify-static];echo successWall time: max(125, 155, 140) ≈ 155s vs 427s today — saves ~4.5 minutes per PR.
Why this shape
coverage-checkis the longest single step (~125s) — kept on its own job so it bounds the lower wall-time edge.vetandlintshare golangci-lint setup; pairing them keeps that overhead amortized.verify-staticis a grab-bag bounded bybuild(~55s) +fuzz(~40s). If it grows past the lint pole, rebalance later.Branch protection
Unchanged. The required check
verifycontinues to exist as the aggregator; sub-job failures propagate vianeeds:. Nogh apicalls or admin steps required.Cost
Runner-minute cost rises from 1x to ~3.3x current (each job pays setup-go; Go module cache amortizes the rest). For a repo at this PR cadence the wall-clock win pays for itself in operator time.
Local invariant
make ciis unchanged — still sequential developer convenience. CI just spreads the same Make targets across 3 jobs.Test plan
verify-test,verify-lint,verify-staticall green;verifyaggregator green.needs:semantics.make actionlintandmake zizmorboth clean locally on the diff.