[ci+docs] integration paths: substrate + MILESTONES rubric backfill#149
Merged
Conversation
PR #147 audit found that kernelevents-integration.yml and pyspy-integration.yml `paths:` filters cover only `components/receivers/<name>/**` + `internal/runtime/lifecycle/**`. A change to `cmd/tracecore` factory wiring, `internal/pipeline` contract, or `internal/selftelemetry` surface could land without re-running either integration suite, even though the receiver's behavior depends on all three substrates. Add the three substrate path patterns to both push and pull_request filters on both workflows. Symmetric with install-bench.yml (P3-Rev1 #10 fix) and chart.yml. `chaos.yml` audited and intentionally not changed — its substrate-coupling is via `tools/failure-inject/**` + `internal/synthesis/**` only. Mark `docs/followups/otlphttp.md` "Workflow paths trigger" row shipped with the explicit list of paths added. `actionlint` clean on both workflows. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Tri Lam <tri@maydow.com>
M1, M2, M4, M9 predate the per-rubric `☑` convention adopted in PR #53 and shipped as prose-only delivery summaries. Reformat each section to match M3 / M5b / M10+ shape: - **Functional rubrics:** block with `☑` bullets citing RFC sections or shipped file paths. - **Non-functional rubrics:** block for budget / policy / overhead guarantees. Every claim was extracted from the existing prose summary; no new guarantees added. Source citations point at RFC-0003 (M1), RFC-0006 (M2), `.golangci.yml` + `Makefile` + `scripts/` (M4 has no RFC; convention is the tooling files themselves), RFC-0007 (M9). Strike `docs/followups/M3.md` "Backfill Foundation milestone rubrics (M1, M2, M4, M9)" row — landed. Also fix three stale `docs/FOLLOWUPS.md` references that survived the shard split (PR #132): - L210 M21 carry-forward → `docs/followups/M3.md` - L269 benchstat → `docs/followups/opportunistic.md` - L552 M8 carry-forward → `docs/followups/M8.md` `make doc-check` green: em-dash + en-dash diff gate clean, comment-noise diff gate clean, 437+ markdown links resolve. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Tri Lam <tri@maydow.com>
7 tasks
trilamsr
added a commit
that referenced
this pull request
May 21, 2026
## Summary Closes the long-standing chart-default-image gap. The chart's `install/kubernetes/tracecore/values.yaml` has shipped with `image.repository: ghcr.io/tracecoreai/tracecore` as the default since M5b, but `release.yml` only ever published the binary + SBOM + cosign-bundle + provenance as GitHub Release artifacts. Operators following the chart's defaults could not `helm install`. RFC-0008 names this path as the target operator-pull surface. ### Root cause The chart's default `image.repository` and `release.yml`'s output set drifted. The chart was deliberately specified against a future-state image registry; the registry-publish job was tracked as a M3 follow-up and not yet built. This PR closes the gap at the source by adding the publish job, not by walking back the chart default. ### Architecture - **Dockerfile** pins `gcr.io/distroless/static-debian12:nonroot` by digest (`sha256:d093aa3e30...`). Non-root UID 65532 matches the chart's `runAsUser`. CGO_ENABLED=0 makes `scratch` viable too, but distroless gives a working CA bundle for the `otlphttp` exporter's HTTPS path and tzdata for RFC3339 stamping with zero shell-attack surface. The Dockerfile also declares `ARG SOURCE_DATE_EPOCH` so the determinism contract is visible to a Dockerfile-only reader. - The image consumes the **pre-built reproducible binary** from the `build` job (`COPY release/$BINARY_BASENAME`), not a recompile. Image reproducibility reduces to binary reproducibility (already gated) plus the digest-pinned base layer plus `SOURCE_DATE_EPOCH` threaded through buildkit's layer-rewrite (via the step `env:` block, not just `--build-arg`). ### `release.yml` `image` job - `needs: build` (downloads binary artifact, verifies SHA-256 matches `build.outputs.digest` before push). - `docker/build-push-action@v6.19.2` with `SOURCE_DATE_EPOCH` set via both the step `env:` block AND `--build-arg` so buildkit's layer-rewrite kicks in. - Always tags `:TAG`. Floats `:latest` **only** on stable releases (no `-` in the SemVer pre-release field), so a pre-release cannot silently promote alpha bits to the chart's default-pull surface. - `cosign sign --yes "$IMAGE_REPO@$DIGEST"`: signs by **digest**, not tag. A registry rebuild of a floating tag would otherwise let an attacker replace what `cosign verify` resolves. - `cosign verify` smoke check pins the same identity binding the binary blob already uses (`--certificate-github-workflow-ref refs/tags/$TAG`, `--trigger push`). - `attest-build-provenance` with `push-to-registry: true` attaches the SLSA v1.0 provenance to the manifest in the registry, so a verifier pulls everything from one place via `gh attestation verify oci://`. Permissions: `id-token: write`, `attestations: write`, `packages: write`. No long-lived registry credentials (GHCR auth uses the workflow's `GITHUB_TOKEN`); no long-lived signing keys (cosign keyless via OIDC). ### Docs - `docs/reproducibility.md` grows two steps (8: resolve digest with `crane digest`, then `cosign verify` by digest; 9: `gh attestation verify oci://`) with the same identity-binding flags as the binary-side steps. `crane` added to prerequisites. - `install/kubernetes/tracecore/README.md` "Pre-release note" replaced with the live-publish contract. Troubleshooting "ImagePullBackOff on first install" entry updated with the Dockerfile-based local-build workaround (was: "M3 release stream has not landed yet"). - `docs/followups/M3.md` "Container-image publish" item closed with the HTML-comment + struck-italic convention used by the rows already closed in that shard. New section "Items impossible to accomplish locally" added for the three M21-trigger items (end-to-end push, oci:// attestation smoke, two-build image-digest equality) so a future contributor does not file a "missing test" issue assuming the gap is oversight. - `CHANGELOG.md [Unreleased] ### Added` gains an M3 entry. ### Self-review fixes (commits 2-4) Two rounds of self-review surfaced and closed: **Round 1 (commit 2 — `7578feb`):** - **F3:** `cosign triangulate --type digest` was the wrong tool. It resolves the signature reference for a subject, not the subject's own digest. Replaced with `crane digest` (canonical tag→digest resolver); added `crane` to prerequisites. - **F5:** `SOURCE_DATE_EPOCH` did not actually reach buildkit. Build-args undeclared in the Dockerfile are silently ignored, so the COPY layer's mtime was non-deterministic. Now threaded through both `env:` block (buildkit layer-rewrite) and `ARG SOURCE_DATE_EPOCH` (Dockerfile contract). - **F1:** `release-doc-parity.sh` only covered the binary surface. Extended with a parallel block for image-side `cosign verify`. Mutation-verified. - **F4:** Force-push comment overstated the SHA pin's guarantee. Reworded to match the actual (binary-digest guard + tree-checkout) closure. **Round 2 (commit 3 — `7034e1a`, commit 4 — `459b686`):** - **R1 (gh CLI semantic drift):** New `scripts/gh-attestation-flag-lint.sh` parses `gh attestation verify --help` and asserts every long flag used in `release.yml` + `reproducibility.md` is still recognised by the installed CLI. Wired into `make doc-check`. Mutation-verified (mutated `--help` output that drops one flag → script exits 1 with fix hint). - **R2 (distroless base digest rotation):** New `scripts/base-digest-check.sh` compares the Dockerfile pin against `crane digest gcr.io/distroless/static-debian12:nonroot`. Two modes: `--warn` (default, exits 0) for periodic cadences and `--strict` (exits non-zero) for M21 release-prep via `make base-digest-check`. Deliberately NOT in `doc-check` (network + legitimate-lag). Mutation-verified. - **A++ #1 (gate-the-gate):** `scripts/testdata/release-doc-parity/{intact,drift-binary,drift-image}/` fixtures exercise both parity blocks; `scripts/test-release-doc-parity.sh` drives them with WORKFLOW/DOC env overrides and asserts expected exit codes. Mutation-verified: breaking the image-side awk anchor in the gate makes the `intact` fixture fail. - **R3 (`timeout-minutes`):** Out of scope (no per-job timeouts exist anywhere in `release.yml` today). Documented as a M3 follow-up with concrete per-job minute suggestions. - Commit 4 fixes one em-dash the doc-check em-dash gate flagged in the fixture README. ### Items impossible to accomplish locally (documented in `docs/followups/M3.md`) Three checks only become exercisable at M21 v0.1.0 (or any `vX.Y.Z` tag) push time, because the `image` job is tag-triggered: 1. **End-to-end image push smoke against `ghcr.io/tracecoreai/tracecore`.** Mitigations in place: `actionlint`, `release-doc-parity.sh` image block, `gh-attestation-flag-lint.sh`, binary-digest guard. 2. **`gh attestation verify "oci://$DIGEST"` against a real attestation in the shape this pipeline emits.** No public OCI image carries a GitHub Actions provenance attestation in matching shape, so the verifier walkthrough cannot be smoke-tested end-to-end before M21. `gh-attestation-flag-lint.sh` partially covers this by asserting flag-name compatibility; semantic flag changes are the residual risk. 3. **Two-build digest equality for the image.** The `SOURCE_DATE_EPOCH` plumbing claims image reproducibility, but the claim is only verifiable by building twice at the same SHA and diff'ing the manifest digests. The local dev environment currently lacks a working `docker buildx`; CI has buildx but doubling the runner-time at every tag is a tradeoff worth revisiting post-M21. ## Release notes ```release-notes [FEATURE] Container images publish to ghcr.io/tracecoreai/tracecore:<TAG> on every release tag, signed and attested (cosign keyless + SLSA v1.0 provenance, both pushed to the registry). The Helm chart's default image.repository is now a live pull path. Verification walkthrough in docs/reproducibility.md steps 8-9. ``` ## Test plan - [x] `make ci` clean: golangci-lint, govulncheck, vet, mod-verify, RCE gate, register-lint, actionlint, zizmor, all unit/race tests. - [x] `make doc-check` clean (14 sub-gates including 3 new ones from round 2: `test-release-doc-parity` (3 fixtures), `gh-attestation-flag-lint` (6 flags), image-side `release-doc-parity` block). - [x] `actionlint` clean on `release.yml` after the `env:` block addition. - [x] `make base-digest-check` clean against live gcr.io (pinned digest is current). - [x] Mutation-verified: every new gate's failure mode (gh CLI flag rename, Dockerfile digest forge, parity-script regex break). - [x] Dockerfile validates by inspection: distroless base pinned by digest; UID 65532 matches chart `runAsUser`; ENTRYPOINT/CMD shape allows the chart's `args: [collect, --config=/etc/tracecore/config.yaml]` to override cleanly; `ARG SOURCE_DATE_EPOCH` declared for local-reproducibility. - [ ] End-to-end image push exercise + `gh attestation verify oci://` against a real attestation + two-build image-digest equality: impossible locally; see "Items impossible to accomplish locally" above. First real exercise will be M21 v0.1.0 (or any pre-release tag). 🤖 Generated with [Claude Code](https://claude.com/claude-code) ### Update (commits 5-6, after main moved further) While this PR was open, `main` advanced an additional 4 PRs (#143, #144, #146, #147, #148, #149). Branch caught up via `git merge origin/main` per the merge-not-rebase policy this PR also documents (see commit `ddf86f7`). - **Commit 5 — `ddf86f7`:** Adds the explicit branch-sync guidance to `CONTRIBUTING.md`. Triggered by direct observation in this session that the implicit "rebase to keep main linear" assumption was wrong (`required_linear_history` on `main` only governs PR landing; squash-merge collapses any feature-branch shape). - **Commit 6 — `59e675c`:** Merge commit resolving conflicts in `CHANGELOG.md` (additive) and `docs/followups/M3.md` (PR #143 partially-shipped row vs my closure HTML comment). `Makefile` auto-merged cleanly with the new gate wires from commits 2-4 plus the `validator-recipe` target from #144. All 14 doc-check gates still green post-merge. The merge commit is preserved on the branch (`git merge` with `--no-ff`); GitHub squash-merge on the PR button will collapse it into the same single-commit-on-main shape every other tracecore PR lands as. ### Update (commit 7 — `285640c`, A+ polish) Self-review pass after the merge surfaced two cross-cutting hardening items both worth one-line-per-job to land, and both gaps that would have made the surface incomplete: 1. **`timeout-minutes` on every `release.yml` job** (build=20, sbom=15, sign=10, provenance=10, image=20, release=10). GitHub's default ceiling is 360m / 6h; a wedged push or hung Sigstore round-trip now fails fast inside the per-job cap rather than burning a runner-hour. Caps chosen at 2-4x observed real wall-clock so transient ghcr/Sigstore weather doesn't trip on healthy runs. Closes the M3.md row that previously held this out as "opportunistic." 2. **`cosign verify-attestation --type slsaprovenance1` smoke check** in the `image` job after `attest-build-provenance` pushes the SLSA v1 attestation to the registry. Uses the same identity binding (refs/tags/$TAG + release.yml workflow path + push trigger) the manifest-signature verify already enforces. Now every artifact this pipeline publishes — binary blob, image manifest, image provenance — is CI-verified inside the same run that produced it, against the same identity claims a third-party verifier would reproduce offline. `docs/followups/M3.md` also gains a new explicit "Out of scope for M3" section rowing three items the self-review asked about: multi-arch image build (`linux/arm64`), container vulnerability scan gate (trivy/grype), and image SBOM sub-attestation (syft/cyclonedx with `--upload`). Each is rowed with a trigger so a future audit can find them without commit archaeology rather than ambiguously deferred. `actionlint` clean on `release.yml`; `make doc-check` clean across all gates including the new `release-doc-parity` image block, `test-release-doc-parity` (3/3 fixtures), `gh-attestation-flag-lint` (6 flags), and `chart-appversion-check`. --------- Signed-off-by: Tri Lam <tri@maydow.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Tri Lam <tri@maydow.com>
5 tasks
trilamsr
added a commit
that referenced
this pull request
Jun 2, 2026
…460) (#466) ## Summary Closes #460. The `exit 0` on `scripts/doc-check.sh` ran unconditionally whenever `docs/FAILURE-MODES.md` carried no `Test*`/`Fuzz*`/`Benchmark*` identifiers (its current state on `main` — `grep -c` = 0), silently bypassing every gate below it. Fix scopes the skip to the Go-test parity block only (if/else, not `exit`), then surfaces and fixes the dead refs the gates were supposed to be catching. ## Root cause Commit a57883f (#13) shipped `doc-check.sh` with one gate — the Go-test name parity check — so `[ -z "$referenced" ] && exit 0` was correct then. PRs #28, #56, #115, #131, #144, #149, #195, #234, #241, #443, #455, #459 (and others) appended gates **below** that line without recognising they'd become dead code whenever `FAILURE-MODES.md` lost its `Test*` references. PR #459 worked around the bug by placing its new YAML gate *above* line 99 and tracked the root cause separately as #460. ## What surfaced Once `exit 0` was removed, three real issues fired: 1. **Dead `.md` link**: `docs/FOLLOWUPS.md` → `followups/otlphttp.md`. The shard was never committed to `main`'s ancestry. Folded into the existing "Shards deleted post-v0.2.0 as fully resolved-via-pivot" prose block (sibling treatment to M9, M14, M16). 2. **Banned-phrase hits** (3x `production-grade`): reworded in `docs/cut-criteria.yaml.md` (2x) and `install/kubernetes/tracecore/README.md` (1x) to falsifiable language. 3. **`docs/getting-started.md` block cap**: 7 fenced bash/sh blocks. The M6 cap of 5 was set for the quickstart only — `## Install via Helm` and `## Air-gapped install` are alternate deployment paths that landed post-M6 and aren't part of the quickstart budget. Rescoped the gate to count blocks inside the `## Walkthrough` H2 section only (1 block, well under cap). ## Gate count Empirically verified via `grep -c '^doc-check: '` on `make doc-check` output on a clean tree: | State | Status lines emitted | Gates the early-exit was hiding | |---|---|---| | Pre-fix on `main` (post-#459) | 3 (trust-posture, YAML cross-link, parity-skip) | 14 | | Post-fix this PR (post-rebase) | 17 | 0 | The "14 gates hidden" number is invariant across the rebase: it counts gates placed below the early-exit line. The "3 → 17" total reflects post-#459 reality on `main`; pre-#459 baseline was "2 → 16" (the figure originally in this PR body), and #459 itself worked around the bug by placing its YAML gate above line 99. ## Mutation tests Each gate below the original early-exit was confirmed to fire post-fix: | Mutation | Gate expected to fire | Exit code post-mutation | Exit code post-restore | |---|---|---|---| | Inject `[bad](nonexistent-ghost.md)` into `docs/FOLLOWUPS.md` | markdown link-rot | 1 | 0 | | Append `blazing-fast` + `rock-solid` to `docs/getting-started.md` | banned-phrase lint | 1 | 0 | | Delete `<!-- tested-against: ... -->` from `docs/integrations/datadog.md` | M6 recipe markers | 1 | 0 | ## Test plan - [x] `make doc-check` exits 0 on clean tree (re-run post-rebase onto origin/main; 17 status lines) - [x] 3 mutation tests above each toggle exit 1 → 0 across mutate / restore - [x] Pre-push hooks green: golangci-lint (0 issues), `go vet ./...`, `go mod verify`, `attribute-namespace-check` (100 attrs, all documented), `register-lint`, `actionlint`, `zizmor`, `deprecation-check`, `no-autoupdate-check` - [x] Rebased onto current `origin/main` (includes #459, #461, #462, #456); no conflicts; gate count re-verified empirically post-rebase - [x] No changes to gates above line 99 (the trust-posture callout + YAML cross-link gate from #459 still run and emit unchanged status lines) ## Self-grade **A+** — root cause named in commit body (a57883f #13 with one gate; gates appended below without exit-path awareness); 3 mutation tests (success criteria required 1–2); rescoped the getting-started gate to match M6 intent rather than papering over the surfaced overflow; the `[ -z "$referenced" ]` legitimate skip is preserved via if/else (not `:` no-op, which would have left the `defined=` / `orphans=` block running on empty input); gate count corrected empirically post-rebase per reviewer B feedback. ```release-notes - fix(ci): `scripts/doc-check.sh` no longer exits 0 at the Go-test parity gate when `docs/FAILURE-MODES.md` carries no `Test*` references. 14 gates below that line (link-rot, banned-phrase, M6 recipe markers, etc.) are now actually enforced on every `make doc-check` invocation. Closes #460. ``` --------- Signed-off-by: Tri Lam <tree@lumalabs.ai>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Two commits, both follow-on from PR #147:
[ci] integration paths: add substrate to kernelevents + pyspy— closes the workflow-paths audit gap surfaced in PR [docs] followups bundle: 10 easy items (3 strikes, 1 test add, 6 docs) #147'sdocs/followups/otlphttp.md"Workflow paths trigger" row.[docs] MILESTONES: backfill rubric blocks for M1, M2, M4, M9— closesdocs/followups/M3.md"Backfill Foundation milestone rubrics" row.Both follow-up rows are marked
[x]in their respective shards with strike-through and ship-evidence.Commit 1: integration workflow paths
PR #147's audit found that
.github/workflows/kernelevents-integration.ymlandpyspy-integration.ymlpaths:filters cover only:components/receivers/<name>/**internal/runtime/lifecycle/**So a change to
cmd/tracecorefactory wiring,internal/pipelinecontract, orinternal/selftelemetrysurface could land without re-running either integration suite — even though receiver behavior depends on all three substrates.This commit adds the three substrate path patterns to both
push:andpull_request:filters on both workflows. Symmetric withinstall-bench.yml(P3-Rev1 #10 fix) andchart.yml.chaos.ymlwas audited and intentionally not changed — its substrate coupling runs viatools/failure-inject/**+internal/synthesis/**only.actionlintclean on both workflows.Commit 2: MILESTONES rubric backfill
M1, M2, M4, M9 predate the per-rubric
☑convention adopted in PR #53 and shipped as prose-only delivery summaries. This commit reformats each section to match M3 / M5b / M10+ shape:☑bullets citing RFC sections or shipped file paths.Every claim was extracted from the existing prose summary; no new guarantees added. Source citations:
safe.Call, operator UX)./metrics+/healthz+/readyz,selftelemetry.Receiver, O2 SLO gauges, three OTel divergences closed)..golangci.yml+Makefile+scripts/(no RFC; convention is the tooling files themselves)./dev/kmsg+ journald via onesourceinterface, NVRM Xid extraction, RE2 filters compile at Start, trace context propagation, non-Linux stubs, overhead budget).Also fixed three stale
docs/FOLLOWUPS.mdreferences that survived the shard split (PR #132):docs/followups/M3.mddocs/followups/opportunistic.mddocs/followups/M8.mdFiles changed
.github/workflows/kernelevents-integration.yml.github/workflows/pyspy-integration.ymldocs/followups/otlphttp.mdMILESTONES.mddocs/followups/M3.mdRelease notes
Test plan
make cigreen (verify, verify-lint, verify-static, verify-test, build, vet, golangci-lint, zizmor, actionlint, govulncheck, fuzz 30s).bash scripts/doc-check.shgreen (em-dash + en-dash diff gate clean, comment-noise diff gate clean, 452 markdown links resolve).actionlintclean on both modified workflows.bash scripts/no-autoupdate-check_test.sh10/10 assertions pass.release-doc-parityclean,chart-appversion-checkclean,alert-checkclean.Sequencing
Builds on PR #147 (merged) which surfaced both follow-up items. Independent of any currently-open PRs.
🤖 Generated with Claude Code