Skip to content

[docs] AGENTS.md: aggregator bypass + perf-regex flake lessons#81

Merged
trilamsr merged 1 commit into
mainfrom
docs/agents-md-aggregator-and-flake-lessons
May 19, 2026
Merged

[docs] AGENTS.md: aggregator bypass + perf-regex flake lessons#81
trilamsr merged 1 commit into
mainfrom
docs/agents-md-aggregator-and-flake-lessons

Conversation

@trilamsr

Copy link
Copy Markdown
Contributor

Summary

Adds two load-bearing lessons to AGENTS.md from this session's CI work. Both prevent a future contributor from repeating the same trap.

Aggregator bypass. GitHub Actions short-circuits an aggregator job's needs: to SKIPPED on any sub-job failure, and treats SKIPPED required checks as satisfied. PR #73 silently merged past a failed verify-test because the aggregator from PR #72's verify split was SKIPPED rather than FAILURE. The fix shape (if: always() + needs.*.result check) shipped in PR #74; this lesson documents the trap and the fix so anyone splitting CI jobs in the future doesn't repeat it.

Perf-budget regex flake class. require.Regexp with implicit upper bounds (e.g. 0\.0[0-9]+) on values whose only invariant is >0 flake on slow CI runners. Two of these hit in one session: TestReceiver_SLIBudget (emit-latency, observed 539ms) and TestReceiver_SetDegraded (degraded-seconds, observed 0.126s). The fix shape is the same in both — relax to any positive value (\d+\.[0-9]*[1-9]) or use baseline-relative comparisons.

File goes from 128 to 148 lines (cap is 150, with 2 lines of remaining headroom — next addition should consider demoting an older entry to a topic note per the file's own promotion rule).

Test plan

  • wc -l AGENTS.md reports 148, under the 150-line cap.
  • make doc-check clean (banned-phrase lint, 250 links resolve, (unverified) count = 7 baseline).
  • Capture-flow format check (learn-from-mistakes skill): banned vocabulary absent, no first-person AI phrasing, no AI attribution, both entries carry Anchor: citations.
  • CI on this PR exercises the same gates plus the aggregator that's now itself an anchor of lesson ci(deps): bump the gh-actions group with 5 updates #1.
NONE — documentation only. Adds two load-bearing lessons to `AGENTS.md` covering GitHub Actions aggregator semantics and a recurring perf-budget regex flake class. No runtime behavior change.

Two universal rules learned at real cost this session:

1. GitHub Actions aggregator jobs under default `needs:` semantics
   silently bypass branch protection. PR #73 merged past a failed
   `verify-test` because the aggregator was SKIPPED and GitHub
   treats SKIPPED required checks as satisfied. Fix shape lives
   already in main (PR #74); the lesson now lives where the next
   contributor splitting CI jobs will see it before repeating.

2. `require.Regexp` with implicit upper bounds (e.g. `0\.0[0-9]+`)
   is a flake class. Two assertions of identical shape hit in one
   session (`TestReceiver_SLIBudget`, `TestReceiver_SetDegraded`).
   Fix shape: assert only the actual invariant.

File at 148 lines (cap is 150). `make doc-check` clean.

Signed-off-by: Tri Lam <trilamsr@gmail.com>
@trilamsr trilamsr merged commit 9201b67 into main May 19, 2026
8 checks passed
@trilamsr trilamsr deleted the docs/agents-md-aggregator-and-flake-lessons branch May 19, 2026 08:02
trilamsr added a commit that referenced this pull request May 19, 2026
…83)

## Summary

Flake-pattern audit follow-up to PR #76 + #78. Two assertions in
`tools/failure-inject/cpusteal/cpusteal_test.go` match the same shape we
fixed in `TestReceiver_SLIBudget` and `TestReceiver_SetDegraded`: hard
absolute upper bound on observed timing, calibrated to fast-runner
expectations.

| Before | After | What changed |
|---|---|---|
| `require.Less(elapsed, 500ms)` for 100ms request |
`require.Less(elapsed, 2s)` | Hang sentinel, not perf bound — busy-loop
scheduler delay under contention can run a 100ms request to 300-400ms |
| `require.Less(elapsed, 250ms)` for cancel response |
`require.Less(elapsed, 2s)` | Same — context-cancellation latency varies
by an order of magnitude under contention |

The lower-bound assertion on `TestRun_HonorsDuration` (`elapsed >=
95ms`) still pins the real contract (busy-loop runs for the requested
time). The upper bounds only catch "never returned." This matches the
lesson landed in `AGENTS.md` via PR #81 — *match perf-budget assertions
by the invariant only*.

## Test plan

- [x] Local: `go test -race -count=3 -v
./tools/failure-inject/cpusteal/` — all 4 tests PASS each iteration.
- [x] `make lint` clean.
- [x] `make vet` clean.
- [x] Audit completeness verified: broader grep sweep
(`require.Less.*Millisecond`, `assert.Less.*Millisecond`, `elapsed >
N*time.X`, `WithinDuration`, `Budget` callsites, `isRaceBuild`
callsites) found no other instances of the same shape outside the
kernelevents SLI test we already covered.
- [ ] CI on this PR.

## Rollback

Single Edit to restore the original numeric bounds. No dependents; the
bounds are local to two test functions.

```release-notes
NONE — test stability only. Relaxes two absolute-time assertions in cpusteal's test to hang sentinels rather than performance bounds, matching the pattern landed in PR #76 and #78. No production behavior change.
```

Signed-off-by: Tri Lam <trilamsr@gmail.com>
trilamsr added a commit that referenced this pull request May 19, 2026
## Summary

Three observations from the recent session that didn't fit the
structured surfaces (already used for the load-bearing AGENTS.md entries
in PR #81 and the agent-internal notes in PR #82). Each captured via the
`learn-from-mistakes` flow and lands in its existing topic note.

**`.claude/notes/automation.md`** — *Memory captures rationale; hooks
enforce.* The pre-PR checklist personal memory landed mid-session was
followed by a lint failure shipped to CI within the hour. The same gap
closed reliably by the `PreToolUse` hook installed shortly after. For
any "always do X before Y" pattern, prefer the hook; the memory
documents *why* the hook exists.

**`docs/notes/ci.md`** — *Frame CI / perf projections as ranges, not
single numbers.* PR #72's 155s wall-time projection vs 242s actual cost
an investigation round (later landed in PR #77) because the projection's
setup-go-cache amortization assumption was unverified. Either verify
assumptions empirically before publishing the number, or frame as a
range.

**`.claude/notes/review-patterns.md`** — *Self-rate work, then write
criteria for the next grade up.* Forces articulation of measurable
improvements rather than free-form "anything else?". PR #76's B+ → A →
A+ came from two iterations of this exact pattern; each iteration closed
real structural gaps.

A fourth lesson — "fix existing tools before proposing new ones" — was
captured to personal memory (no PR, lives in
`~/.claude/projects/.../memory/`), not the repo, because it's a judgment
heuristic about my own decision-making rather than a repo-resident
convention.

## Test plan

- [x] `make doc-check` clean (banned-phrase lint, link resolution, all
gates).
- [x] `learn-from-mistakes` format check: banned vocabulary absent, no
first-person AI phrasing, no AI attribution, all three entries carry
`Anchor:` lines pointing at concrete PRs.
- [ ] CI on this PR exercises `doc-check` + `pr-lint`.

## Rollback

Each entry is a self-contained `### title` + body + `Anchor:` block at
the top of its file. No dependents elsewhere; reverting is a single Edit
per file.

```release-notes
NONE — documentation only. Three meta-lessons from a recent session retrospective land in their existing topic notes (`automation.md`, `ci.md`, `review-patterns.md`). No runtime behavior change.
```

Signed-off-by: Tri Lam <trilamsr@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant