Merged
16 changes: 8 additions & 8 deletions docs/2026-Q1-retrospective.md
@@ -1,6 +1,6 @@
# 2026-Q1 retrospective

**Scope.** Backfilled per [`v1-rc1-governance-gaps.md`](v1-rc1-governance-gaps.md) §3.
**Scope.** Backfilled per [`history/v1-rc1/governance-gaps.md`](history/v1-rc1/governance-gaps.md) §3.
Covers the project's first epoch: repo scaffold (2026-05-07) through end of
the v0.2.0 / v0.3.x pivot wave (2026-05-31). The strict calendar quarter
(Jan–Mar 2026) is pre-repo; this retro adopts the project-quarter
@@ -62,15 +62,15 @@ targets, then this retro for the snapshot.
≥80% by M12; current state is governance-doc-only coverage with
zero rules touching `components/`, `internal/`, `module/`,
`install/`, `scripts/`, `tools/`, `python/`, `docs/`, or `bench/`.
Audited in [`v1-rc1-governance-gaps.md`](v1-rc1-governance-gaps.md) §1.
Audited in [`history/v1-rc1/governance-gaps.md`](history/v1-rc1/governance-gaps.md) §1.
Remediation lands inside the rc1 cap as an in-repo issue.

2. **Lint-enforced principles at 4 of 16, not the ≥6 target.** Strict
`golangci-lint` coverage hits §3 (depguard), §8 (revive), §9
(errcheck/errorlint), §13 (loggercheck/contextcheck). PRINCIPLES.md
grew §16 ("Adopt > build") this quarter without the KPI denominator
being updated from the NORTHSTARS-original "15" → "16" — drift
recorded in [`v1-rc1-governance-gaps.md`](v1-rc1-governance-gaps.md) §2.
recorded in [`history/v1-rc1/governance-gaps.md`](history/v1-rc1/governance-gaps.md) §2.

3. **Quarterly retro discipline missed (this doc is the backfill).**
NORTHSTARS O6 lists the retro as a supporting KPI; absence is a
@@ -156,17 +156,17 @@ memory). Source for each row in parentheses.
| Quarterly ship-commitment hit-rate (O6 hero) | ≥80% | v0.1.0-m1, v0.2.0, module/v0.3.0 cut; pattern library exceeded 6-pattern goal (12 docs shipped) | `git tag --sort=creatordate`; [`docs/patterns/`](patterns/) |
| Quarterly retro published | published | this doc (backfilled) | [`docs/2026-Q1-retrospective.md`](2026-Q1-retrospective.md) |
| RFC accept/reject/supersede log | logged per quarter in retro | RFC section above | RFC `Status:` headers |
| `make ci` runtime | <60s on dev laptop | 148s measured (split mitigation in #362) | [`v1-rc1-governance-gaps.md`](v1-rc1-governance-gaps.md) §5 |
| CODEOWNERS coverage of code paths (O7) | ≥80% by M12 | 0% (governance docs only) | [`v1-rc1-governance-gaps.md`](v1-rc1-governance-gaps.md) §1 |
| Lint-enforced principles (O7) | ≥6 of 16 | 4 of 16 strict; 9 of 16 if counting scripted gates | [`v1-rc1-governance-gaps.md`](v1-rc1-governance-gaps.md) §2 |
| Maintainer count with merge authority (O7) | ≥3 by M9 | 1 | [`docs/maintainership.md`](maintainership.md); [`v1-rc1-governance-gaps.md`](v1-rc1-governance-gaps.md) §6 |
| `make ci` runtime | <60s on dev laptop | 148s measured (split mitigation in #362) | [`history/v1-rc1/governance-gaps.md`](history/v1-rc1/governance-gaps.md) §5 |
| CODEOWNERS coverage of code paths (O7) | ≥80% by M12 | 0% (governance docs only) | [`history/v1-rc1/governance-gaps.md`](history/v1-rc1/governance-gaps.md) §1 |
| Lint-enforced principles (O7) | ≥6 of 16 | 4 of 16 strict; 9 of 16 if counting scripted gates | [`history/v1-rc1/governance-gaps.md`](history/v1-rc1/governance-gaps.md) §2 |
| Maintainer count with merge authority (O7) | ≥3 by M9 | 1 | [`docs/maintainership.md`](maintainership.md); [`history/v1-rc1/governance-gaps.md`](history/v1-rc1/governance-gaps.md) §6 |
| Release cadence (O6) | ≥1 minor / quarter | 1 minor (v0.2.0) + 1 module-tag (module/v0.3.0) | `git tag` |
| Time-to-merge p50 / p90 (O6) | <7d / <14d | not measured this quarter; instrument in 2026-Q2 retro | gh PR API (deferred) |

**Carry-forward into 2026-Q2 retro:**

- Close CODEOWNERS coverage gap (in-repo, follow-up issue from
`v1-rc1-governance-gaps.md` §1).
`history/v1-rc1/governance-gaps.md` §1).
- Reconcile PRINCIPLES.md §10 line with current `make ci` budget.
- Measure PR time-to-merge p50/p90 across the v0.3.x wave.
- Track RFC-0015+ decisions (if any) and re-snapshot the RFC table.
2 changes: 1 addition & 1 deletion docs/MILESTONES.md
@@ -149,7 +149,7 @@ M20a/b/c are gates against the same artifact (`bench/install/run.sh`) at progres
- **Depends on:** M3, M5b, M6, ≥3 receivers at alpha (M8 partial; M10/M13/M15/M16 from Lanes 4-5; M11/M12 from Lane 6 if flood gate open)
- **NORTHSTARS coupling:** NORTHSTARS.md O1 targets 3 patterns covered at M6/v0. If the flood gate has not opened by M21, only M19 (pattern #14, GPU-independent) is guaranteed; M17 (pattern #1) and M18 (pattern #6, build-time coupled to M17's `cross_rank.go`) are at risk. Either the flood gate opens before M21 or NORTHSTARS O1 is explicitly relaxed in M21's release notes with written reason - silent divergence is a blocker per PRINCIPLES §15.
- **Carry-forward from M3:** asset-shape reconciliation owed at the v0.1.0 cut. M3's `release.yml` publishes raw `tracecore_<tag>_linux_amd64` (not the `.tar.gz` line 156 names), `*.cosign.bundle` (not the detached `*.sig` line 156 names), and `*.intoto.jsonl` (file is a Sigstore bundle JSON, not in-toto JSONL; extension is the de-facto convention but misleads sniff-by-extension tooling). M21 decides: keep raw-binary + bundle, switch to tar.gz + detached `.sig`, and pick a stable name for the Sigstore-bundle artifact. The hardening backlog (SLSA L3, build-env sanitization, CycloneDX `mod`→`app`, cosign / `gh attestation` flag tightening, nightly drift cron, repo tag-protection on `v*`, CI Actions linter, github-actions Dependabot, Rekor log-index in release notes) lives in [`docs/followups/M3.md`](followups/M3.md) "M3 release-pipeline hardening (post-PR #28)".
- **v1.0-rc1 operational-gap dependency:** the three gates between the current pipeline and a `v1.0-rc1` cut (SLSA L3 prerequisites, air-gapped install path, DaemonSet upgrade-rollback) are audited in [`docs/v1-rc1-operational-gaps.md`](v1-rc1-operational-gaps.md) with per-section remediation steps, effort estimates, and blockers. The doc's "Cross-cut" minimum bar (air-gap docs + M20 row; upgrade-rollback doc-reconcile + `minReadySeconds`) gates rc1; SLSA L3 stays an O3 stretch goal.
- **v1.0-rc1 operational-gap dependency:** the three gates between the current pipeline and a `v1.0-rc1` cut (SLSA L3 prerequisites, air-gapped install path, DaemonSet upgrade-rollback) are audited in [`docs/history/v1-rc1/operational-gaps.md`](history/v1-rc1/operational-gaps.md) with per-section remediation steps, effort estimates, and blockers. The doc's "Cross-cut" minimum bar (air-gap docs + M20 row; upgrade-rollback doc-reconcile + `minReadySeconds`) gates rc1; SLSA L3 stays an O3 stretch goal.

**Rubric summary:** Signed annotated tag `v0.1.0` (`git tag -v` passes). GitHub release ships `tracecore_v0.1.0_linux_amd64.tar.gz` + CycloneDX SBOM + SLSA `*.intoto.jsonl` provenance + cosign `*.sig`; post-release CI asserts presence + `cosign verify-blob` + `slsa-verifier verify-artifact` succeed on fresh checkout. `CHANGELOG.md` `## [0.1.0]` with `### Added`; release notes link `getting-started.md` + ≥1 `integrations/*`; ISSUE_TEMPLATE YAML lints clean; synthesis gate enumerates contributing milestones. NFR: `make release` on `v0.1.0` byte-identical via `diffoscope` (P0); pinned action SHAs + `id-token:write`/`contents:write` only + zero `zizmor`/`actionlint` findings; SBOM `len(components) >= direct_dep_count`; `SECURITY.md` referenced + private-vuln reporting verified enabled at release time.

2 changes: 1 addition & 1 deletion docs/NORTHSTARS.md
@@ -197,7 +197,7 @@ Seven lines of work. Each has one accountable owner role, one hero KPI, supporti
**Operating rule:** *Trust under load is the product* ([`PRINCIPLES.md`](../PRINCIPLES.md) §1). Any supply-chain regression - broken reproducibility, missing signatures, lapsed SBOM, missed disclosure SLA - is P0 and blocks the next release.

**Caveats:**
- SLSA L3 requires hermetic, parameterless builds signed by trusted infrastructure. The build system needs the work - not just the policy. **rc1 posture (2026-06):** L2 build platform + L3-grade provenance attestation infrastructure (Sigstore-Fulcio keyless under GHA OIDC, recorded in Rekor) — see [`v1-rc1-operational-gaps.md`](v1-rc1-operational-gaps.md#1-slsa-l3-prerequisites) §1 for the full posture and the upstream blocker (OCB-style generated entrypoints + missing pre-build hook in the trusted reusable workflow, tracked at [slsa-framework/slsa-github-generator#2483](https://github.com/slsa-framework/slsa-github-generator/issues/2483)). The architectural gap applies to both the build job and the user-defined sign job (`sign` consumes `package`'s artifact across a job boundary — the same "build influences signing" pattern L3 forbids), so both inherit the same deferral against M12.
- SLSA L3 requires hermetic, parameterless builds signed by trusted infrastructure. The build system needs the work - not just the policy. **rc1 posture (2026-06):** L2 build platform + L3-grade provenance attestation infrastructure (Sigstore-Fulcio keyless under GHA OIDC, recorded in Rekor) — see [`history/v1-rc1/operational-gaps.md`](history/v1-rc1/operational-gaps.md#1-slsa-l3-prerequisites) §1 for the full posture and the upstream blocker (OCB-style generated entrypoints + missing pre-build hook in the trusted reusable workflow, tracked at [slsa-framework/slsa-github-generator#2483](https://github.com/slsa-framework/slsa-github-generator/issues/2483)). The architectural gap applies to both the build job and the user-defined sign job (`sign` consumes `package`'s artifact across a job boundary — the same "build influences signing" pattern L3 forbids), so both inherit the same deferral against M12.
- Reproducibility policy lives here (P0 designation, response); the CI gate that *enforces* it lives in O2. Same artifact, two homes - by design.
- Disclosure SLAs are aspirational until tracecore has a security inbox in place; [`SECURITY.md`](../SECURITY.md) must name the contact before the SLA clock starts publicly.

2 changes: 1 addition & 1 deletion docs/README.md
@@ -28,7 +28,7 @@ Legend: 👤 operator · 🛠️ contributor · 🏛️ maintainer · 🌐 exter
| [maintainership.md](maintainership.md) | 🏛️ 🛠️ | Governance: who has commit access, how RFCs are sponsored, how security issues are handled. |
| [ATTRIBUTES.md](ATTRIBUTES.md) | 👤 🛠️ | Customer-stable attribute namespace inventory + soft-lock policy. Every `pattern.*` / `tracecore.*` / `hw.gpu.*` / `k8s.*` / `nccl.fr.*` / `kernelevents.*` / `gen_ai.training.*` key the collector emits or consumes, with stability tags and the v0.4-advisory → v1.0-enforced rename policy. |
| [v1-rc1-cut-criteria.md](v1-rc1-cut-criteria.md) | 🏛️ | Twelve falsifiable rubrics for the `v1.0.0-rc1` cut (deriving from NORTHSTARS O1-O7) + Tier-2 GA path-clearing items + out-of-scope deferrals. Authoritative rubric source for `MILESTONES.md` M22. |
| [v1-rc1-governance-gaps.md](v1-rc1-governance-gaps.md) | 🏛️ | Audit of O6 velocity + O7 governance supporting KPIs (CODEOWNERS coverage, lint-enforced principles, quarterly retros, RFC log, `make ci` budget, maintainer count). One section per gap; closing action list. |
| [history/v1-rc1/](history/v1-rc1/) | 🏛️ | Archived v1.0-rc1 audit snapshots — governance-gaps + operational-gaps. See [history/v1-rc1/README.md](history/v1-rc1/README.md). |
| [standards-roadmap.md](standards-roadmap.md) | 🏛️ | NORTHSTARS O4 tracking artifact for the `gen_ai.training.*` semconv upstream motion. Inventory of upstream + tracecore-emitted training keys, proposal set for PR-1/PR-2, SIG cadence (Tuesdays 09:00 PT), competing-proposal risk (`rl.*` Issue #88), and cross-ref to in-repo work that depends on each PR landing. |

## Subdirectories
2 changes: 1 addition & 1 deletion docs/audits/wave-2026-06-01.md
@@ -14,7 +14,7 @@ Audit of 27 PRs merged this session (#339–#374) per `feedback_review_disciplin
| 6 | **`TrainingStepStallRecord` is the only intentionally shared `Record` type.** No other accidental sharing detected. Other shared types in `model.go` are intentional core abstractions. **Clean.** | none | `module/pkg/patterns/checkpointer_hang.go:105` (canonical) | **No action.** |
| 7 | **12 stale "future PR-B" comments** across `module/` + `docs/` reference RFC-0014 `WithMetrics` bridge. Resolved: ADR-0001 PR-B shipped for cuda_oom (#10) via [#437](https://github.com/TraceCoreAI/tracecore/issues/437) / PR #461; comment sweep landed via [#380](https://github.com/TraceCoreAI/tracecore/issues/380) — surviving references now point at the cuda_oom precedent and name patterns #1 / #2 / #3 / #4 / #5 as pending sibling-consumer follow-ups under #260. | low | 10 hits in `module/`, 2 in `docs/` | **Closed.** |
| 8 | **`module/doc.go` references PR-I.1a / PR-I.1b / PR-I.2 as "Contents land in…"** — these PRs all landed long ago. Comment is a historical artifact. | low | `module/doc.go:5-13` | **Trim** — 8-line deletion. **Issue [#381](https://github.com/TraceCoreAI/tracecore/issues/381).** |
| 9 | **`docs/v1-rc1-simplification-audit.md` is now historical** (references #333/#334 which closed). Compare to `v1-rc1-cut-criteria.md` (current source of truth) + `v1-rc1-test-audit.md` + `v1-rc1-governance-gaps.md` + `v1-rc1-operational-gaps.md`. | low | `docs/v1-rc1-simplification-audit.md` | **No action pre-RC1** — flip to "Status: ☑ shipped (historical)" at RC1 tag time. Bundled into #383. |
| 9 | **`docs/v1-rc1-simplification-audit.md` is now historical** (references #333/#334 which closed). Compare to `v1-rc1-cut-criteria.md` (current source of truth) + `v1-rc1-test-audit.md` + `history/v1-rc1/governance-gaps.md` + `history/v1-rc1/operational-gaps.md`. | low | `docs/v1-rc1-simplification-audit.md` | **No action pre-RC1** — flip to "Status: ☑ shipped (historical)" at RC1 tag time. Bundled into #383. |
| 10 | **`chart.yml` workflow is 496 lines with 3 jobs that each re-setup helm + kind + image-build.** Triple `Install helm`, double `Build tracecore image`, double `Create kind cluster`, double `Load image into kind`. | medium | `.github/workflows/chart.yml:292,378` | **Refactor** — extract `.github/actions/kind-tracecore-up/`. ~60-line reduction. **Issue [#382](https://github.com/TraceCoreAI/tracecore/issues/382).** Not blocking RC1. |
| 11 | **`docs/rfcs/archived/0004-clockreceiver-stdoutexporter.md`** — only archived RFC. Convention OK. | none | — | **No action.** |
| 12 | **No `.bak` / `_unused` / `.orig` files** outside `archived/`. | none | — | **No action.** |
6 changes: 3 additions & 3 deletions docs/followups/M3.md
@@ -53,16 +53,16 @@ tracked at slsa-framework/slsa-github-generator#2483 ("Pre and Post
Build Actions for BYOB" — `slsa-prebuild-action-path` proposal;
open, type:feature, area:BYOB). (Corrects an earlier cite of #3033,
which is actually a Maven E2E test issue, not the pre-build hook
proposal — docs/v1-rc1-operational-gaps.md §1 corrected in the same
proposal — docs/history/v1-rc1/operational-gaps.md §1 corrected in the same
PR.) The rc1 posture is documented in
docs/v1-rc1-operational-gaps.md §1 (L2 build platform + L3-grade
docs/history/v1-rc1/operational-gaps.md §1 (L2 build platform + L3-grade
provenance attestation infrastructure), and docs/reproducibility.md
steps 6 + 9 now cite Build L2 rather than Build L1. Re-open if
upstream #2483 ships, or if M12's L3 binding becomes load-bearing
before then. -->
- *Closed (see comment above): L3 re-evaluated at rc1 cut and
deferred upstream-blocked (slsa-framework/slsa-github-generator#2483).
rc1 posture documented in docs/v1-rc1-operational-gaps.md §1; the
rc1 posture documented in docs/history/v1-rc1/operational-gaps.md §1; the
same architectural gap applies to the `sign` job migration, so
both inherit the same deferral. docs/reproducibility.md steps 6 + 9
now cite SLSA Build L2 (build platform) + L3-grade provenance
13 changes: 13 additions & 0 deletions docs/history/v1-rc1/README.md
@@ -0,0 +1,13 @@
# v1.0-rc1 Audit History

Archived audit snapshots from the `v1.0.0-rc1` cut window. Preserved
for traceability; do not edit historical claims.

| File | Scope |
|---|---|
| [governance-gaps.md](governance-gaps.md) | O6 velocity + O7 governance KPI audit (CODEOWNERS, lint-enforced principles, retros, RFC log, `make ci` budget, maintainer count). |
| [operational-gaps.md](operational-gaps.md) | SLSA L3 prerequisites, air-gapped install path, DaemonSet upgrade-rollback. |

The live rubric `docs/v1-rc1-cut-criteria.md` is generated by
`make cut-criteria-render` (path pinned in `scripts/cut_criteria.py`)
and stays in `docs/` for the render gate.
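
Reviewer note: every hunk in this PR rewrites a reference from the flat `v1-rc1-governance-gaps.md` / `v1-rc1-operational-gaps.md` names to the `history/v1-rc1/` layout, so a quick scan for leftover old-style names may help confirm nothing was missed. The following is a minimal sketch, not part of the repo: the helper names `find_stale_refs` and `scan_tree` and the default `docs` root are assumptions made for illustration.

```python
import re
from pathlib import Path

# Matches only the pre-move flat filenames. The new paths use
# "v1-rc1/<name>-gaps.md" (slash, not hyphen), so they do not match.
STALE = re.compile(r"v1-rc1-(?:governance|operational)-gaps\.md")


def find_stale_refs(text: str) -> list[tuple[int, str]]:
    """Return (line_number, matched_name) for each old-style reference."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in STALE.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


def scan_tree(root: str = "docs") -> dict[str, list[tuple[int, str]]]:
    """Scan every Markdown file under root (hypothetical default: docs/)."""
    results = {}
    for path in sorted(Path(root).rglob("*.md")):
        hits = find_stale_refs(path.read_text(encoding="utf-8"))
        if hits:
            results[str(path)] = hits
    return results
```

Running `scan_tree()` from the repo root after the move should ideally return an empty dict; any remaining entry names a file and line that still uses an old path. Note the scan only covers `.md` files under the chosen root, so references in scripts or workflows would need a wider glob.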