ci(compat): publish v1.0-rc1 compat-matrix workflow#355
Merged
Closes v1.0-rc1 cut criteria 5 + 6:

- .github/workflows/compat-matrix.yml: weekly cron + workflow_dispatch + v* tag triggers run (k8s 1.30/1.31/1.32/1.33 x pinned OTel v0.130) plus a non-blocking (k8s 1.32 x previous-minor OTel v0.129) cell. Each cell does OCB build, docker build, kind install, helm install, port-forward healthcheck + self-metrics. report job opens a dedupe-by-label tracking issue on red per the criterion 6 rubric.
- docs/SUPPORT-MATRIX.md: k8s rows 1.30/1.31/1.33 flipped from 'Tier 2 - compat-matrix pending' to Tier 1 with the new workflow as source of truth; 1.28/1.29 stay Tier 2 (out of upstream support window); OTel previous-minor row tagged informational until N-back policy adopted.
- docs/v1-rc1-cut-criteria.md: criterion 5 unboxed (support matrix published) and criterion 6 unboxed (compat CI matrix wired).
- docs/RELEASE-CHECKLIST.md: RC additional gates cite the new workflow + manual workflow_dispatch step during release-prep.

Disciplines:

- actionlint clean
- python3 yaml.safe_load clean
- zizmor --min-severity=high clean (workflow-level permissions scoped down to read; issues:write granted only on the report job; setup-go cache disabled on this publishing workflow)
- All actions pinned to commit SHA matching the canonical references in chart.yml + install-bench.yml

Signed-off-by: Tri Lam <tree@lumalabs.ai>
Signed-off-by: Tri Lam <tri@maydow.com>
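The dedupe-by-label behaviour of the `report` job could look roughly like the sketch below. It is not taken from the workflow: the function name `report_red`, the issue title, and the choice to comment on an existing issue are all assumptions for illustration. It assumes the GitHub CLI (`gh`) is available and authenticated with `issues: write`, and uses the `compat-matrix-red` label named in the PR summary.

```shell
# Hypothetical sketch of a dedupe-by-label report step (not the real job).
# $1 = URL of the red workflow run.
report_red() {
  local label=compat-matrix-red run_url=$1 existing
  # Look for an open tracking issue that already carries the label.
  existing=$(gh issue list --label "$label" --state open \
    --json number --jq '.[0].number // empty')
  if [ -n "$existing" ]; then
    # Assumption: later red runs are appended as comments on the same issue.
    gh issue comment "$existing" --body "compat-matrix red again: $run_url"
  else
    # No open issue yet: open one and label it so the next run finds it.
    gh issue create --title "compat-matrix red" --label "$label" \
      --body "First red run: $run_url"
  fi
}
```

Keying on the label rather than the title means the issue can be renamed by a maintainer without breaking the dedupe.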
Addressed reviewer findings: Fixed:
# Conflicts:
#	docs/RELEASE-CHECKLIST.md
#	docs/SUPPORT-MATRIX.md
#	docs/v1-rc1-cut-criteria.md
trilamsr added a commit that referenced this pull request Jun 2, 2026
Closes #445.

## Root cause

The `changes` job in `.github/workflows/ci.yml` has existed since #355 (Mon 2026-06-01) emitting an `outputs.code` boolean that flags docs-only PRs. The accompanying comment claimed `verify-test`, `verify-lint`, and `build` skip on doc-only changes, **but no job actually consumed the output**. Every PR has been running the full ci-full suite (coverage-check, vet+lint, build×2 arches, verify-static, sdk-python) regardless of diff scope.

PR #442 (#424) wired `bench-allocs-check` into `make ci-full` and the CI `bench-check` step picked up the +80s standalone cost (+54s effective with warm cache). That made the long-orphaned `changes`-job miswiring visible enough to file as #445, but the underlying defect is older: a path-filter shipped without consumers.

## Change

Two outputs on the `changes` job, consumed by `if:` conditions:

- **`code`** (true iff any non-`docs/*`-non-`*.md` file changed) gates `verify-test`, `verify-lint`, `build`, `smoke-test-binary`, `sdk-python`. Go gates cannot regress on doc-only edits.
- **`bench`** (true iff `module/pkg/patterns/**`, `bench/**`, `scripts/bench-*.sh`, or `Makefile` changed) gates the `bench-check` step *inside* `verify-static`. `bench-allocs-check` cannot regress on a diff that doesn't touch detector source, the bench harness, the bench scripts, or the bench-target wiring.

Both filters **fail open** on an empty diff or unreachable base ref, so every push to `main` and every bench-touching PR keeps the gate mandatory.

`verify-static` and `validator-recipe` keep running on every PR: they own the doc-touching gates (`doc-check`, `cut-criteria-check`, `slo-rules-check`, recipe-YAML validation under `docs/integrations/examples/`).

The `verify` aggregator now treats `result=skipped` as satisfied on the doc-only path for the skippable sub-jobs (verify-test, verify-lint, sdk-python). verify-static + validator-recipe still must be `success` on every PR; this is encoded in the aggregator, not relying on branch-protection SKIPPED-is-OK semantics.

## Wall-time impact

| PR shape | Before | After | Δ |
|---|---|---|---|
| Docs-only (e.g. #423) | ~283s (full suite) | bounded by `verify-static` minus bench-check step | drops the +54s bench delta + the verify-test 125s pole + the verify-lint 60s pole |
| Bench-touching (e.g. #442, this one) | ~283s | ~283s | 0 (gate still runs) |
| Non-bench code (e.g. processor change) | ~283s | ~203s | -80s (bench-check skip) |

## TDD (red→green)

Mirrored the workflow filter logic in `/tmp/test-changes-filter.sh` (12 cases × code+bench outputs = 24 assertions). Covered: docs-only, detector source, bench-script, bench-registry-via-`bench-*.sh`-glob, Makefile, non-detector go, bench-baseline.txt under testdata, mixed docs+code, empty-diff fail-open, bench/ dir, top-level `.md` (PRINCIPLES.md), workflow-file change. All 24 green.

Real-PR validation:

- `gh pr diff 423` (docs-only) → `code=false bench=false` (heavy gates + bench-check skip)
- `gh pr diff 442` (Makefile + PRINCIPLES.md) → `code=true bench=true` (gates run)

## Verify (green)

- `make actionlint`: clean (initial run flagged SC2221/SC2222 on a redundant `scripts/bench-registry.sh` pattern that the `scripts/bench-*.sh` glob already subsumed; removed)
- `make lint`: 0 issues
- `make ci-fast`: clean (lint, vet, mod-verify, attribute-namespace-check, doc-check, alert-check, chart-appversion-check, rfc-status-check, slo-rules-check)
- `make doc-check`: clean
- Pre-commit hooks: clean (golangci-lint, vet, mod-verify, attribute-namespace-check, hit-line-format-stable, no-autoupdate-check_test)

## A+ audit notes

Audited other ci-full jobs for docs-only skip eligibility:

- `validator-recipe`: NOT skipped. Recipe YAMLs live under `docs/integrations/examples/`, so docs-classified diffs can include functional content. Out-of-scope for safe skip.
- `verify-static`: runs always. Contains `doc-check`, `cut-criteria-check`, `slo-rules-check`, `actionlint`, `zizmor`, `register-lint`, `deprecation-check`; all can regress on docs. Step-level skip applied only to the `install benchstat` + `bench-check` pair via `changes.outputs.bench`.
- `validator-recipe` and `smoke-test-binary` not eligible for `bench` output (don't touch detector code).

PRINCIPLES.md §10 updated with the path-filter policy (which jobs skip on docs-only, why verify-static stays, fail-open semantics) so the next reader can see the policy without grepping the workflow.

```release-notes
- CI bench-check step now skips on PRs that don't touch the detector tree, the bench harness, the bench scripts, or the Makefile. Heavy Go gates (verify-test, verify-lint, build, smoke-test-binary, sdk-python) skip on docs-only PRs. The verify aggregator treats SKIPPED-on-docs-only as success; verify-static and validator-recipe still run on every PR. Closes #445.
```

Signed-off-by: Tri Lam <tree@lumalabs.ai>
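As a rough illustration of the policy this commit describes, the sketch below mirrors the two path-filter outputs and the aggregator rule in plain bash. It is not the contents of `ci.yml` or of `/tmp/test-changes-filter.sh`; the function names `classify_changes` and `job_satisfied` are hypothetical, and the patterns are transcribed from the commit message on the assumption that the workflow uses shell `case` globs.

```shell
# Hypothetical mirror of the `changes` filter. Reads changed paths on stdin
# (one per line) and prints "code=<bool> bench=<bool>".
classify_changes() {
  local code=false bench=false seen=false path
  while IFS= read -r path || [ -n "$path" ]; do
    [ -n "$path" ] || continue
    seen=true
    # `code`: any file that is neither under docs/ nor a Markdown file.
    case "$path" in
      docs/*|*.md) ;;
      *) code=true ;;
    esac
    # `bench`: detector source, bench harness, bench scripts, or Makefile.
    case "$path" in
      module/pkg/patterns/*|bench/*|scripts/bench-*.sh|Makefile) bench=true ;;
    esac
  done
  # Fail open: an empty diff turns both gates on.
  if [ "$seen" = false ]; then
    code=true
    bench=true
  fi
  echo "code=$code bench=$bench"
}

# Hypothetical aggregator rule. $1 = job name, $2 = job result,
# $3 = the `code` output. Returns 0 when the job counts as satisfied.
job_satisfied() {
  local job=$1 result=$2 code=$3
  case "$job" in
    verify-test|verify-lint|sdk-python)
      # Skippable gates: `skipped` is accepted only on the docs-only path.
      [ "$result" = success ] ||
        { [ "$result" = skipped ] && [ "$code" = false ]; } ;;
    *)
      # Everything else (verify-static, validator-recipe) must succeed.
      [ "$result" = success ] ;;
  esac
}
```

For example, `printf 'Makefile\nPRINCIPLES.md\n' | classify_changes` yields `code=true bench=true`, matching the #442 validation above, and `job_satisfied verify-static skipped false` fails because the always-on gates never accept `skipped`.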
## Summary

Closes v1.0-rc1 cut criteria 5 (support matrix published) and 6 (compat CI matrix green) together: ships the missing `.github/workflows/compat-matrix.yml` and creates `docs/SUPPORT-MATRIX.md` with every Tier-1 row backed by a named CI workflow per the criterion 5 rubric.

- `.github/workflows/compat-matrix.yml`: `schedule:` weekly Mon 04:00 UTC + `workflow_dispatch` + `push tags: v*`. Matrix is skinny by design (~10-15 min wall-time):
  - Blocking cells: k8s 1.30 / 1.31 / 1.32 / 1.33 × pinned OTel v0.130.
  - Non-blocking cell: k8s 1.32 × previous-minor OTel v0.129 (`make bump-otel` reverse), `continue-on-error: true`, informational until a published N-back support policy lands.
  - Each cell: OCB build → docker build → kind install → `helm install --wait` → port-forward smoke (healthcheck on `:13133` + self-metrics on `:8888`, asserting non-zero `otelcol_*` counters) → uninstall + tear-down.
  - `report` aggregator job dedupes red runs onto a single `compat-matrix-red`-labeled tracking issue (criterion 6 rubric: "failures opening an issue automatically").
- `docs/SUPPORT-MATRIX.md`: new file. k8s 1.30 / 1.31 / 1.33 flip from "Tier 2 - compat-matrix pending" to Tier 1 (CI-gated by `compat-matrix.yml`); 1.32 stays Tier 1 (already gated by `chart.yml`, now also by the matrix); 1.28 / 1.29 stay Tier 2 with an explicit "compat-matrix not exercised" note (outside the upstream-supported window). Note: this file is also created by #350; whichever PR merges second resolves the trivial conflict by adopting both edits.
- `docs/v1-rc1-cut-criteria.md`: criterion 5 ☐ → ☑ and criterion 6 ☐ → ☑ with the workflow + matrix cells named.
- `docs/RELEASE-CHECKLIST.md`: RC "Compat CI" line cites the new workflow + a manual `workflow_dispatch` step during release-prep.

## Why this design
- `ci.yml` every PR; OTel axis is gated by `ci.yml` + `chart.yml` + `install-bench.yml` every PR. The compat matrix's job is the (k8s × OTel) cross-product, which nothing else exercises.
- `v*` tag re-validates before `goreleaser` ships: catches a release that would crash on a newer k8s minor we haven't bumped to in months.

## Disciplines
- `actionlint .github/workflows/compat-matrix.yml`: clean
- `python3 -c "import yaml; yaml.safe_load(open(...))"`: clean
- `bash scripts/zizmor.sh` (`--min-severity=high`): clean. Workflow-level perms scoped to `contents: read`; `issues: write` granted only on the `report` job. `setup-go` cache disabled (this workflow publishes on tag, so the cache-poisoning audit fires on `cache: true`).
- `make doc-check`: clean

## Test plan
- `workflow_dispatch` once to confirm all 5 cells run end-to-end on `main`.
- `report` job opens an issue labeled `compat-matrix-red`.
- `v*` tag (release-prep PR), confirm the workflow gates the tag push before `goreleaser` ships.
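The port-forward smoke step from the Summary (healthcheck on `:13133`, self-metrics on `:8888`, non-zero `otelcol_*` counters) might be checked along these lines. This is a sketch under assumptions, not the workflow's actual step: the function names are hypothetical, the `/` and `/metrics` paths are assumed collector defaults rather than values stated in this PR, and the parser assumes Prometheus text lines carry no trailing timestamp.

```shell
# Succeeds iff stdin (Prometheus text format) contains at least one
# otelcol_* sample whose value is non-zero. Comment lines (# HELP, # TYPE)
# never match because they do not start with "otelcol_".
has_nonzero_otelcol() {
  awk '
    /^otelcol_/ && ($NF + 0) != 0 { found = 1 }
    END { exit (found ? 0 : 1) }
  '
}

# Hypothetical smoke check against an already port-forwarded collector.
# Assumed endpoints: healthcheck at :13133/ and self-metrics at :8888/metrics.
smoke_check() {
  curl -fsS "http://127.0.0.1:13133/" >/dev/null &&
    curl -fsS "http://127.0.0.1:8888/metrics" | has_nonzero_otelcol
}
```

Splitting the metric assertion from the `curl` calls keeps the parsing logic testable without a running cluster.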