ci(chart): 10-run helm-install median aggregator (M3 #209) #471
Closes the M3 carry-forward in docs/MILESTONES.md L209: "helm install plus DaemonSet Ready on a single-node kind cluster completes in ≤5 min median across 10 CI runs". The single-run ≤300s gate already lives in chart.yml; this PR adds the 10-run rolling-median aggregation layer.

Sibling pattern: PR #446's bench-cv-rolling artifact pipeline. Same shape: upload a per-run artifact, aggregate via `gh run download` from the next-run script, exit non-zero if the aggregate trips the rubric.

## Pieces

- `.github/workflows/chart.yml`: the install job uploads `helm-install-duration-<run_id>` with `install_to_ready_seconds.txt` (single integer) + metadata. 90-day retention. `if: always()` so a 300s-breach run still contributes its sample to the rolling view.
- `scripts/helm-install-rolling.sh` downloads the last N=10 successful main-branch chart.yml runs, computes the median, and fails if the median > 300. Edge cases: missing artifacts skipped, garbage content tolerated, offline mode (no gh) prints an informational message + exits 0, n_runs<10 prints a "need ≥10 runs" warning (rubric still informative).
- `scripts/helm-install-rolling_test.sh`: 13 assertions covering even-n median averaging, odd-n median (no averaging), exactly-300 boundary (≤ not <), over-budget fail, empty/missing dir exit 2, --help, --bogus, single-run, garbage tolerance, rubric banner, n_runs reporting.
- `make helm-install-rolling-report` operator entry point. `N=20 make ...` override supported.
- MILESTONES.md L209 carry-forward note updated to reference the aggregator + flip path; rubric stays ⧗ until 10 runs accumulate artifacts on main.
- install/kubernetes/tracecore/README.md Troubleshooting section gains a failure-mode debug recipe (per A+ criterion).

## Why median, not CV

bench-cv-rolling.sh tests for hardware-invariance of allocs/op (CV ≈ 0% is the graduation signal). install-to-Ready is wall-clock under noisy CI runners; the relevant statistic is the central tendency against the 300s rubric, not the dispersion. Matches MILESTONES.md wording verbatim ("median across 10 CI runs").

## Verification

- `shellcheck scripts/helm-install-rolling.sh scripts/helm-install-rolling_test.sh` → exit 0
- `actionlint .github/workflows/chart.yml` → exit 0
- `bash scripts/helm-install-rolling_test.sh` → 13/13 PASS
- Mutation tests: lowering the 300 threshold to 100 fails the exact-budget tests; replacing the even-n median formula with min fails the even-n assertion. Tests catch both mutations.

## Self-grade: A+

- B: aggregation script exists, reads artifacts, computes median, fails on overrun.
- A: above + wired into CI (per-run artifact upload landed in chart.yml); MILESTONES.md cross-link to the aggregator; the rubric bullet stays ⧗ until 10 runs accumulate (a future PR flips it once the artifact set is populated).
- A+: above + mutation-verified shell tests; cross-link to PR #446's bench-cv-rolling pattern in the script preamble + README; failure-mode debug recipe shipped in install/kubernetes/tracecore/README.md.

```release-notes
- New `scripts/helm-install-rolling.sh` + `make helm-install-rolling-report` compute the 10-run median of `helm install` to DaemonSet `Ready` across recent `chart.yml` runs on main; drives the M3 carry-forward (docs/MILESTONES.md L209) graduation.
- `chart.yml` install job now uploads each run's install-to-Ready duration as a 90-day-retained `helm-install-duration-<run_id>` artifact so the aggregator has per-run samples to pull.
- Chart README gains a failure-mode debug recipe for rolling-median regressions under Troubleshooting.
```

Signed-off-by: Tri Lam <tree@lumalabs.ai>
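A minimal sketch of the download half of the pipeline described above: list recent main-branch `chart.yml` runs, pull each run's `helm-install-duration-<run_id>` artifact with `gh run download`, and fall back to an informational exit 0 when `gh` is unavailable. This is not the code in `scripts/helm-install-rolling.sh`; the function name `fetch_samples` and the `GH` override variable are hypothetical (the override exists only so the offline path can be exercised), and only standard `gh run list` / `gh run download` flags are used.

```shell
#!/usr/bin/env bash
set -euo pipefail

# fetch_samples DEST [N]   (hypothetical helper, not the real script)
# Downloads the helm-install-duration-<run_id> artifact from each of the last
# N successful chart.yml runs on main into DEST/<run_id>/.
fetch_samples() {
  local dest=$1 n=${2:-10} gh=${GH:-gh} run_id
  if ! command -v "$gh" >/dev/null 2>&1; then
    # Offline mode: informational message, then exit 0 (the whole script stops).
    echo "info: '$gh' not found; offline mode, no samples downloaded" >&2
    exit 0
  fi
  mkdir -p "$dest"
  # --status success mirrors "successful main-branch runs"; --status completed
  # would also include runs that failed the single-run 300s gate.
  "$gh" run list --workflow chart.yml --branch main --status success \
      --limit "$n" --json databaseId --jq '.[].databaseId' |
    while read -r run_id; do
      # Runs that predate the upload step have no artifact: warn and skip.
      "$gh" run download "$run_id" --name "helm-install-duration-$run_id" \
          --dir "$dest/$run_id" 2>/dev/null ||
        echo "warn: no artifact for run $run_id; skipping" >&2
    done
}
```

The per-run `|| echo warn` keeps one missing artifact (for example, a run from before the upload step existed) from aborting the whole fetch, which is the "missing artifacts skipped" behavior listed under Pieces.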
## Independent Adversarial Review: PR #471

B/A/A+ Grade: A+ (ship)

### Correctness ✅

Median computation proven:
Gate logic verified:
Artifact upload condition sound:
### Test Coverage ✅

13 assertions across comprehensive scenarios:
No gaps. Edge cases thorough.

### Duplication vs. Bench-CV-Rolling (PR #446) ✅

Moderate but justified:
### Sibling Parity ✅
### Findings

- 🟡 risk: preamble duplicates README playbook. Lines 42–52 in `helm-install-rolling.sh` ("Failure-mode debug recipe") are verbatim copies of the README section. Trim the preamble from 70→35 lines; reference the README instead.
- 🟡 risk: 90-day retention undocumented. No comment in `chart.yml` explains why 90 days is the right window. (Non-blocking; bench-cv-rolling also lacks a rationale. Document in a follow-up.)
- 🔵 nit: comment density in the preamble is excessive. "How it works" + "Edge cases" + "Failure modes" + "Usage" = 72 lines. Collapse to ~30 lines; reference docs.

### Simplification Sweep

- `helm-install-rolling.sh` lines 1–72 (preamble): trim the duplicate failure-mode recipe; consolidate to ~30 lines. Keep the structure but excise the playbook (README is SSOT). Update Rationale: `--help` is not primary UX (operator uses
- Test file: comments are proportionate (13 fixtures × 2-line explanations). No bloat detected.

No Issues With
Ship it. Simplification feedback should be applied pre-merge to clean up the preamble, but correctness is solid.
## Summary
Closes the M3 carry-forward in `docs/MILESTONES.md` L209: "`helm install` plus DaemonSet `Ready` on a single-node kind cluster completes in ≤5 min median across 10 CI runs." The single-run ≤300s gate already lives in `chart.yml`; this PR adds the missing 10-run rolling-median aggregation layer.

Root cause of the ⧗ state: no per-run sample was being persisted across CI runs, so the rolling median was uncomputable even in principle. The single-run gate had nothing to roll up against. Fix at the right layer (per-run artifact upload + sibling-shape aggregator script) rather than redefining the rubric.
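To make the root cause concrete: each run has to leave behind one integer, even when it blows the budget. A minimal sketch of such a recorder, assuming the sample is written by a shell step that runs before the artifact upload; `record_duration` is a hypothetical name, and the actual `chart.yml` step may measure the interval differently.

```shell
#!/usr/bin/env bash
set -euo pipefail

# record_duration OUT CMD [ARGS...]   (hypothetical helper)
# Runs CMD, writes its wall-clock duration in whole seconds to OUT as a single
# integer, then returns CMD's exit status.
record_duration() {
  local out=$1
  shift
  local start rc=0
  start=$(date +%s)
  "$@" || rc=$?
  # Write the sample before propagating failure, so a failed or timed-out
  # install still contributes a data point (the role `if: always()` plays
  # for the upload step).
  echo "$(( $(date +%s) - start ))" > "$out"
  return "$rc"
}

# Example wiring in a CI step (release and DaemonSet names are hypothetical):
#   record_duration install_to_ready_seconds.txt bash -c \
#     'helm install tracecore ./install/kubernetes/tracecore &&
#      kubectl rollout status daemonset/tracecore --timeout=300s'
```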
Sibling pattern: PR #446's `bench-cv-rolling` artifact pipeline. Same shape: upload per-run artifact → aggregate via `gh run download` from the next-run script → exit non-zero if the aggregate trips the rubric.

## Pieces
- `.github/workflows/chart.yml`: the `install` job uploads `helm-install-duration-<run_id>` with `install_to_ready_seconds.txt` (single integer) + metadata. 90-day retention. `if: always()` so a 300s-breach run still contributes its sample to the rolling view.
- `scripts/helm-install-rolling.sh`: downloads the last N=10 successful main-branch `chart.yml` runs, computes the median, and fails if the median > 300. Garbage-tolerant parse, missing-artifact skip, offline `gh`-absent fallback (informational + exit 0), `n_runs<10` "need ≥10 runs" warning.
- `scripts/helm-install-rolling_test.sh`: 13 assertions covering even-n median averaging, odd-n median (no averaging), exactly-300 boundary (≤ not <), over-budget fail, empty/missing dir → exit 2, `--help`, unknown flag, single-run, garbage tolerance, rubric banner, `n_runs` reporting, multi-fixture aggregation, gate-pass exit 0.
- `make helm-install-rolling-report`: operator entry point. Honours `N=20 make helm-install-rolling-report`.
- `docs/MILESTONES.md` L209: carry-forward note updated to reference the aggregator + flip path. Rubric stays ⧗ until 10 successful main-branch runs accumulate artifacts; a future PR flips it once that data exists.
- `install/kubernetes/tracecore/README.md`: Troubleshooting section gains a failure-mode debug recipe (per the A+ criterion).

## Why median, not CV
`bench-cv-rolling.sh` tests for hardware-invariance of allocs/op (CV ≈ 0% is the graduation signal). `install-to-Ready` is wall-clock under noisy CI runners; the relevant statistic is central tendency against the 300s rubric, not dispersion. Matches `MILESTONES.md` wording verbatim ("median across 10 CI runs").

## Verification
- `shellcheck scripts/helm-install-rolling.sh scripts/helm-install-rolling_test.sh` → exit 0
- `actionlint .github/workflows/chart.yml` → exit 0
- `bash scripts/helm-install-rolling_test.sh` → 13/13 PASS
- Mutation: lowering the `300` threshold to `100` → fails the `exact-budget` boundary test (exit 1 caught).
- Mutation: replacing the even-n median formula with `a[1]` (min) → fails the `even-n=10 median=145` assertion (caught).
- `make check` → golangci-lint + go vet + go mod verify + attribute-namespace-check + no-autoupdate-check all green.

## Test plan
- `chart.yml` → exit 0
- `make helm-install-rolling-report` runs offline (no `gh` auth needed) and prints the informational fallback rather than crashing
- `chart.yml` on this PR will be the first run with the new artifact upload step; verifies the artifact pipeline end-to-end

## Self-grade: A+
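- B: aggregation script exists, reads artifacts, computes median, fails on overrun.
- A: above + wired into CI (per-run artifact upload landed in `chart.yml`); `MILESTONES.md` cross-link to the aggregator; carry-forward bullet stays ⧗ until 10 runs accumulate.
- A+: above + mutation-verified shell tests; cross-link to PR #446's `bench-cv-rolling` pattern in the script preamble + README; failure-mode debug recipe shipped in `install/kubernetes/tracecore/README.md`.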
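The B criterion (reads artifacts, computes median, fails on overrun) plus the edge cases listed under Pieces can be sketched as a single function. This is a hypothetical `rolling_median_gate`, not the code in `scripts/helm-install-rolling.sh`; it assumes each downloaded artifact lands in its own `DIR/<run_id>/` subdirectory and borrows the exit codes the PR describes (2 for an empty or missing dir, 1 on overrun).

```shell
#!/usr/bin/env bash
set -euo pipefail

BUDGET_SECONDS=300  # docs/MILESTONES.md L209: median install-to-Ready <= 5 min
MIN_RUNS=10         # below this, the median is reported but only informational

# rolling_median_gate DIR   (hypothetical helper, not the real script)
# Reads DIR/<run_id>/install_to_ready_seconds.txt and gates on the median.
# Returns 0 if median <= budget, 1 if over budget, 2 if DIR is missing or
# contains no usable samples.
rolling_median_gate() {
  local dir=$1 f line
  local re='^[[:space:]]*([0-9]+)[[:space:]]*$'
  local -a samples=()
  if [ ! -d "$dir" ]; then
    echo "error: $dir not found" >&2
    return 2
  fi
  for f in "$dir"/*/install_to_ready_seconds.txt; do
    [ -f "$f" ] || continue          # missing artifact or no glob match
    line=$(head -n 1 "$f")
    [[ $line =~ $re ]] || continue   # garbage content: skip, don't fail
    samples+=("${BASH_REMATCH[1]}")
  done
  local n=${#samples[@]}
  if [ "$n" -eq 0 ]; then
    echo "error: no usable samples in $dir" >&2
    return 2
  fi
  if [ "$n" -lt "$MIN_RUNS" ]; then
    echo "warning: need >=$MIN_RUNS runs, have $n; median is informational" >&2
  fi

  # Odd n: the middle sample. Even n: mean of the two middle samples.
  local median
  median=$(printf '%s\n' "${samples[@]}" | sort -n | awk '
    { a[NR] = $1 }
    END {
      if (NR % 2) { m = a[(NR + 1) / 2] } else { m = (a[NR / 2] + a[NR / 2 + 1]) / 2 }
      print m
    }')
  echo "n_runs=$n median=${median}s budget=${BUDGET_SECONDS}s"

  # Inclusive gate (<=, not <): a median of exactly 300 passes.
  if awk -v m="$median" -v b="$BUDGET_SECONDS" 'BEGIN { exit !(m <= b) }'; then
    return 0
  fi
  return 1
}
```

Averaging the two middle samples for even n can yield a fractional median such as `150.5`, so the gate compares numerically in `awk` rather than with bash integer arithmetic.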