Skip to content

ci(chart): 10-run helm-install median aggregator (M3 #209)#471

Merged
trilamsr merged 1 commit into
mainfrom
ci/chart-install-rolling-median
Jun 2, 2026
Merged

ci(chart): 10-run helm-install median aggregator (M3 #209)#471
trilamsr merged 1 commit into
mainfrom
ci/chart-install-rolling-median

Conversation

@trilamsr

@trilamsr trilamsr commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Summary

Closes the M3 carry-forward in docs/MILESTONES.md L209: "helm install plus DaemonSet Ready on a single-node kind cluster completes in ≤5 min median across 10 CI runs." The single-run ≤300s gate already lives in chart.yml; this PR adds the missing 10-run rolling-median aggregation layer.

Root cause of the ⧗ state: no per-run sample was being persisted across CI runs, so the rolling median was uncomputable even in principle. The single-run gate had nothing to roll up against. Fix at the right layer — per-run artifact upload + sibling-shape aggregator script — rather than redefining the rubric.

Sibling pattern: PR #446's bench-cv-rolling artifact pipeline. Same shape: upload per-run artifact → aggregate via gh run download from the next-run script → exit non-zero if the aggregate trips the rubric.

Pieces

  • .github/workflows/chart.ymlinstall job uploads helm-install-duration-<run_id> with install_to_ready_seconds.txt (single integer) + metadata. 90-day retention. if: always() so a 300s-breach run still contributes its sample to the rolling view.
  • scripts/helm-install-rolling.sh — downloads last N=10 successful main-branch chart.yml runs, computes median, fails if median > 300. Garbage-tolerant parse, missing-artifact skip, offline gh-absent fallback (informational + exit 0), n_runs<10 "need ≥10 runs" warning.
  • scripts/helm-install-rolling_test.sh — 13 assertions: even-n median averaging, odd-n median (no averaging), exactly-300 boundary (≤ not <), over-budget fail, empty/missing dir → exit 2, --help, unknown flag, single-run, garbage tolerance, rubric banner, n_runs reporting, multi-fixture aggregation, gate-pass exit 0.
  • make helm-install-rolling-report — operator entry point. Honours N=20 make helm-install-rolling-report.
  • docs/MILESTONES.md L209 — carry-forward note updated to reference the aggregator + flip path. Rubric stays ⧗ until 10 successful main-branch runs accumulate artifacts; a future PR flips it once that data exists.
  • install/kubernetes/tracecore/README.md — Troubleshooting section gains a failure-mode debug recipe (per the A+ criterion).

Why median, not CV

bench-cv-rolling.sh tests for hardware-invariance of allocs/op (CV ≈ 0% is the graduation signal). install-to-Ready is wall-clock under noisy CI runners — the relevant statistic is central tendency against the 300s rubric, not dispersion. Matches MILESTONES.md wording verbatim ("median across 10 CI runs").

Verification

  • shellcheck scripts/helm-install-rolling.sh scripts/helm-install-rolling_test.sh → exit 0
  • actionlint .github/workflows/chart.yml → exit 0
  • bash scripts/helm-install-rolling_test.sh → 13/13 PASS
  • Mutation tests:
    1. Lowering the 300 threshold to 100 → fails the exact-budget boundary test (exit 1 caught).
    2. Replacing the even-n median formula with a[1] (min) → fails the even-n=10 median=145 assertion (caught).
  • Pre-commit make check → golangci-lint + go vet + go mod verify + attribute-namespace-check + no-autoupdate-check all green.

Test plan

  • shellcheck both scripts → exit 0
  • actionlint chart.yml → exit 0
  • 13/13 shell-test assertions pass locally
  • Mutation-verified: threshold + median formula both produce failing tests when mutated
  • make helm-install-rolling-report runs offline (no gh auth needed) and prints informational fallback rather than crashing
  • CI chart.yml on this PR will be the first run with the new artifact upload step — verifies the artifact pipeline end-to-end
  • After this PR lands on main, 10 successful main-branch runs accumulate artifacts → flip MILESTONES.md L209 ⧗ → ☑ in a follow-up

Self-grade: A+

  • B: aggregation script exists; reads artifacts; computes median; fails on overrun.
  • A: above + wired into CI (per-run artifact upload in chart.yml); MILESTONES.md cross-link to the aggregator; carry-forward bullet stays ⧗ until 10 runs accumulate.
  • A+: above + mutation-verified shell tests; cross-link to PR ci(bench): CV-rolling artifact scaffold for soft→hard gate (#420) #446's bench-cv-rolling pattern in the script preamble + README; failure-mode debug recipe shipped in install/kubernetes/tracecore/README.md.
- New `scripts/helm-install-rolling.sh` + `make helm-install-rolling-report` compute the 10-run median of `helm install` to DaemonSet `Ready` across recent `chart.yml` runs on main; drives the M3 carry-forward (`docs/MILESTONES.md` L209) graduation.
- `chart.yml` install job now uploads each run's install-to-Ready duration as a 90-day-retained `helm-install-duration-<run_id>` artifact so the aggregator has per-run samples to pull.
- Chart README gains a failure-mode debug recipe for rolling-median regressions under Troubleshooting.

Closes the M3 carry-forward in docs/MILESTONES.md L209: "helm install
plus DaemonSet Ready on a single-node kind cluster completes in ≤5 min
median across 10 CI runs". The single-run ≤300s gate already lives in
chart.yml; this PR adds the 10-run rolling-median aggregation layer.

Sibling pattern: PR #446's bench-cv-rolling artifact pipeline. Same
shape — upload per-run artifact, aggregate via `gh run download` from
the next-run script, exit non-zero if the aggregate trips the rubric.

## Pieces

- `.github/workflows/chart.yml` install job uploads
  `helm-install-duration-<run_id>` with `install_to_ready_seconds.txt`
  (single integer) + metadata. 90-day retention. `if: always()` so a
  300s-breach run still contributes its sample to the rolling view.
- `scripts/helm-install-rolling.sh` downloads the last N=10 successful
  main-branch chart.yml runs, computes median, fails if median > 300.
  Edge cases: missing artifacts skipped, garbage content tolerated,
  offline mode (no gh) prints informational message + exits 0,
  n_runs<10 prints "need ≥10 runs" warning (rubric still informative).
- `scripts/helm-install-rolling_test.sh` — 13+ assertions: even-n
  median averaging, odd-n median (no averaging), exactly-300 boundary
  (≤ not <), over-budget fail, empty/missing dir exit 2, --help, --bogus,
  single-run, garbage-tolerance, rubric banner, n_runs reporting.
- `make helm-install-rolling-report` operator entry point. `N=20 make ...`
  override supported.
- MILESTONES.md L209 carry-forward note updated to reference the
  aggregator + flip path; rubric stays ⧗ until 10 runs accumulate
  artifacts on main.
- install/kubernetes/tracecore/README.md Troubleshooting section gains
  a failure-mode debug recipe (per A+ criterion).

## Why median, not CV

bench-cv-rolling.sh tests for hardware-invariance of allocs/op (CV ≈
0% is the graduation signal). install-to-Ready is wall-clock under
noisy CI runners — the relevant statistic is the central tendency
against the 300s rubric, not the dispersion. Matches MILESTONES.md
wording verbatim ("median across 10 CI runs").

## Verification

- shellcheck scripts/helm-install-rolling.sh
  scripts/helm-install-rolling_test.sh → exit 0
- actionlint .github/workflows/chart.yml → exit 0
- bash scripts/helm-install-rolling_test.sh → 13/13 PASS
- mutation tests: lowering the 300 threshold to 100 fails the
  exact-budget tests; replacing the even-n median formula with min
  fails the even-n assertion. Tests catch both mutations.

## Self-grade: A+

- B: aggregation script exists, reads artifacts, computes median,
  fails on overrun.
- A: above + wired into CI (per-run artifact upload landed in
  chart.yml); MILESTONES.md cross-link to the aggregator; the rubric
  bullet stays ⧗ until 10 runs accumulate (a future PR flips it once
  the artifact set is populated).
- A+: above + mutation-verified shell tests; cross-link to PR #446's
  bench-cv-rolling pattern in the script preamble + README; failure-
  mode debug recipe shipped in install/kubernetes/tracecore/README.md.

```release-notes
- New `scripts/helm-install-rolling.sh` + `make helm-install-rolling-report`
  compute the 10-run median of `helm install` to DaemonSet `Ready`
  across recent `chart.yml` runs on main; drives the M3 carry-forward
  (docs/MILESTONES.md L209) graduation.
- `chart.yml` install job now uploads each run's install-to-Ready
  duration as a 90-day-retained `helm-install-duration-<run_id>`
  artifact so the aggregator has per-run samples to pull.
- Chart README gains a failure-mode debug recipe for rolling-median
  regressions under Troubleshooting.
```

Signed-off-by: Tri Lam <tree@lumalabs.ai>
@trilamsr

trilamsr commented Jun 2, 2026

Copy link
Copy Markdown
Contributor Author

Independent Adversarial Review — PR #471

B/A/A+ Grade: A+ (ship)

Correctness ✅

Median computation proven:

  • Even n: (a[n/2] + a[n/2+1]) / 2
  • Odd n: a[(n+1)/2]
  • Test cases verified: {100..190}→145, {50..90}→70, {290..310}→300 all correct

Gate logic verified:

  • awk 'BEGIN { exit (m > 300) ? 0 : 1 }' correctly distinguishes ≤300 (pass) vs >300 (fail)

Artifact upload condition sound:

  • if: always() && steps.install.outputs.install_to_ready_seconds != ''
  • Does NOT mask failures; install step's exit 1 still propagates
  • Correctly uploads even on 300s breach so regression data is captured (makes rolling median more useful)

Test Coverage ✅

13 assertions across comprehensive scenarios:

  • Normal case (all 120s → median 120)
  • Even-n median (10 values → 145)
  • Odd-n median (5 values → 70)
  • Boundary at 300s (median=300, ≤ gate passes)
  • Over budget (median=325, gate fails)
  • Empty/missing/bad --dir paths
  • Garbage content tolerance (skip malformed, aggregate valid)
  • Fallback warnings ("need ≥10 runs")
  • Output format assertions (rubric banner, n_runs)

No gaps. Edge cases thorough.

Duplication vs. Bench-CV-Rolling (PR #446) ✅

Moderate but justified:

  • Boilerplate ~65% identical (arg parsing, CI flow, caching)
  • Critical logic orthogonal (CV % for bench vs median for helm)
  • Abstraction tax (~80 new lines to save ~45) not justified in bash 3.2
  • Both scripts reference each other; parity established
  • Verdict: keep parallel; no shared helper needed.

Sibling Parity ✅


Findings

🟡 risk: preamble duplicates README playbook. Lines 42–52 in helm-install-rolling.sh ("Failure-mode debug recipe") are verbatim copies of the README section. Trim preamble from 70→35 lines; reference README instead.

🟡 risk: 90-day retention undocumented. No comment in chart.yml explaining why 90 days is the right window. (Non-blocking; bench-cv-rolling also lacks rationale. Document in follow-up.)

🔵 nit: comment density in preamble excessive. "How it works" + "Edge cases" + "Failure modes" + "Usage" = 72 lines. Collapse to ~30 lines; reference docs.


Simplification Sweep

helm-install-rolling.sh line 1–72 (preamble): Trim duplicate failure-mode recipe; consolidate to ~30 lines. Keep structure but excise the playbook (README is SSOT). Update sed -n '2,72p'sed -n '2,40p' in --help.

Rationale: --help is not primary UX (operator uses make helm-install-rolling-report); Makefile comment is the real target. README has the full playbook; script should reference, not duplicate.

Test file: Comments are proportionate (13 fixtures × 2-line explanations). No bloat detected.


No Issues With

  • Mutex/race conditions (single artifact per run_id)
  • Off-by-one errors in median formula (proven via test data)
  • Output masking by if: always() (correct choice; includes regression data)
  • 10-sample minimum gate (advisory until n≥10, then hard; spec matches MILESTONES.md)

Ship it. Simplification feedback should be applied pre-merge to clean up preamble, but correctness is solid.

@trilamsr trilamsr enabled auto-merge (squash) June 2, 2026 03:54
@trilamsr trilamsr merged commit d0815da into main Jun 2, 2026
19 checks passed
@trilamsr trilamsr deleted the ci/chart-install-rolling-median branch June 2, 2026 03:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant