
chore(pivot): pre-PR-A drift sweep + Helm security tighten #169

Merged
trilamsr merged 2 commits into main from cleanup-pre-pivot on May 30, 2026


@trilamsr


Summary

Pre-PR-A drift sweep + Helm security tighten. All zero-blocker edits, no OCB or recipe dependency. Sets the stage for PR-A (OCB skeleton) by clearing pre-pivot build-first language and tightening operator-facing surfaces.

LOC delta: +75 / -53 across 19 files.

Doc drift (10 files)

| File | Edit |
| --- | --- |
| README.md | Quickstart: components.yaml is current; builder-config.yaml lands at v0.1.0 |
| CONTRIBUTING.md | Drop clockreceiver/dcgm/docs_parity walkthrough refs; route via RFC-0013 §6 + tracecore-components |
| STYLE.md | components.yaml + components-gen marked superseded by RFC-0013 |
| docs/STRATEGY.md | Divergence row: metric names adopt `otelcol_*` per RFC-0013 §3 |
| docs/FAILURE-MODES.md | `tracecore_*` → `otelcol_*`; deprecated-receiver alert rows flagged |
| docs/getting-started.md | Stale tracecore.container.lines_per_s flagged as pre-v0.1.0 placeholder |
| docs/FLAKY-TESTS.md | `Clockreceiver*` test row marked scheduled-for-deletion |
| docs/README.md | Receiver index: dcgm/kernelevents/k8sevents/clockreceiver/pyspy rows annotated per RFC-0013 §7 |
| MILESTONES.md | M5/M9/M13/M14/M16 lane rows marked obsolete-post-RFC-0013 |
| AGENTS.md | No edit; already RFC-0013-aware |

Helm + chart (5 files)

| File | Edit |
| --- | --- |
| Chart.yaml | appVersion sync comment: post-PR-A target is builder-config.yaml dist.version |
| install/.../README.md | Pre-OCB build instructions flagged transitional |
| values.schema.json | additionalProperties: false on podSecurityContext + containerSecurityContext; typo overrides now error |
| templates/daemonset.yaml | Downward-API env gated on containerstdout/k8s_events/kernelevents enabled flags; fail-guard when containerstdout.enabled=true but rbac.create=false |
| templates/containerstdout-rbac.yaml | TODO(RFC-0013) on cluster-wide Node get (least-privilege scoping deferred to the OCB-era RBAC refactor) |
| policies/conftest/tracecore.rego | hostPath.path validator for containerstdout-pod-logs volume |
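
The effect of `additionalProperties: false` can be illustrated outside Helm. The following is a minimal Go sketch, not the chart's schema: it uses `encoding/json`'s `DisallowUnknownFields` as an analogy for strict key checking, and the `podSecurityContext` struct is a hypothetical two-field subset chosen only for the example.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// podSecurityContext is a hypothetical subset used for illustration only;
// the real schema in values.schema.json is not reproduced here.
type podSecurityContext struct {
	RunAsNonRoot *bool  `json:"runAsNonRoot,omitempty"`
	RunAsUser    *int64 `json:"runAsUser,omitempty"`
}

// decodeStrict rejects any key that is not declared on the struct, which is
// roughly what additionalProperties: false does for a JSON Schema object.
func decodeStrict(raw []byte) (podSecurityContext, error) {
	var sc podSecurityContext
	dec := json.NewDecoder(bytes.NewReader(raw))
	dec.DisallowUnknownFields()
	err := dec.Decode(&sc)
	return sc, err
}

func main() {
	// A correctly spelled override decodes without error.
	_, err := decodeStrict([]byte(`{"runAsUser": 1000}`))
	fmt.Println("valid override error:", err)

	// "runAsUsr" is a typo. Without strict decoding it would be dropped
	// silently; with it, the override fails loudly.
	_, err = decodeStrict([]byte(`{"runAsUsr": 1000}`))
	fmt.Println("typo override error:", err)
}
```

The design point is the same in both settings: a silently ignored security override is worse than a failed install, because the operator believes a hardening knob is applied when it is not.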

Code quality (3 files)

| File | Edit |
| --- | --- |
| internal/runtime/lifecycle/lifecycle.go | WARN log runtime.NumGoroutine() on Shutdown leak (silent goroutine leak no longer hidden) |
| cmd/tracecore/integration_test.go | goleak.Find() runs unconditionally (was gated on code == 0, which hid hangs when an unrelated test failed) |
| components/receivers/nccl_fr/factory.go | Early cfg.Validate() in factory (mirrors containerstdout/factory.go:77) |

Meta

  • .gitignore: .claude/scheduled_tasks.lock ignored

Skipped (justified)

  • Self-tel wiring extract: the surviving receivers (containerstdout + nccl_fr) have divergent patterns, so a shared helper would be a forced abstraction. Reassess post-PR-F when only moat code remains.
  • Containerstdout API surface trim: every exported symbol is referenced by black-box _test.go files in the same dir. Unexporting forces test rewrites for an M15 alpha. Defer to post-PR-A when the test scaffold gets rewritten for OCB anyway.

Test plan

  • go build ./...
  • go test -race -count=1 ./internal/runtime/lifecycle/... ./cmd/tracecore/... ./components/receivers/nccl_fr/...
  • make ci exit=0 (coverage, generate-check, doc-check, build, lint, license, govulncheck, register-lint, actionlint, zizmor, nccl-fr-rce-gate, ci-fuzz-nccl-fr, no-autoupdate-check)
  • CI gates green on PR
chore: pre-PR-A drift sweep + Helm security tighten (doc rewrite, downward-API gating, schema strict, lifecycle leak log).

Tri Lam added 2 commits May 30, 2026 02:37
Doc + Helm + small code-quality wins before PR-A (OCB skeleton).
All edits are zero-blocker, no OCB or recipe dependency.

Doc drift (10 files):
- README quickstart: components.yaml is current; builder-config.yaml at v0.1.0
- CONTRIBUTING: drop clockreceiver/dcgm walkthrough refs; route via
  RFC-0013 §6 + tracecore-components
- STYLE: mark components.yaml + components-gen superseded by RFC-0013
- STRATEGY: reconcile divergence row (metric names adopt otelcol_*)
- FAILURE-MODES: tracecore_* -> otelcol_*; flag deprecated-receiver rows
- getting-started: stale metric flagged as pre-v0.1.0 placeholder
- FLAKY-TESTS: clockreceiver test row marked scheduled-for-deletion
- docs/README: receiver index annotated per RFC-0013 §7
- MILESTONES: M5/M9/M13/M14/M16 lane rows marked obsolete-post-RFC-0013

Helm + chart (5 files):
- Chart.yaml: appVersion sync comment updated for post-PR-A target
- install README: pre-OCB build instructions flagged transitional
- values.schema.json: additionalProperties:false on pod/container
  securityContext (catches typo overrides)
- daemonset.yaml: downward-API env now gated on
  containerstdout/k8s_events/kernelevents enabled flags; fail-guard
  when containerstdout.enabled=true but rbac.create=false
- containerstdout-rbac.yaml: TODO(RFC-0013) on cluster-wide Node get
  (least-priv scoping defer to OCB-era RBAC refactor)
- conftest tracecore.rego: hostPath path validator for
  containerstdout-pod-logs volume

Code quality (3 files):
- internal/runtime/lifecycle/lifecycle.go: WARN log
  runtime.NumGoroutine() on Shutdown leak
- cmd/tracecore/integration_test.go: goleak.Find() runs
  unconditionally (was gated on test-suite exit=0)
- components/receivers/nccl_fr/factory.go: early cfg.Validate() in
  factory (mirrors containerstdout pattern)

Skipped (justified):
- Self-tel wiring extract: surviving receivers (containerstdout,
  nccl_fr) have divergent patterns; a helper would be a forced
  abstraction. Reassess post-PR-F.
- Containerstdout API surface trim: every exported symbol referenced
  by black-box _test.go files in same dir. Unexporting forces test
  rewrites for an M15 alpha. Defer to post-PR-A when test scaffold
  gets rewritten anyway.

LOC delta: +74 / -53. make ci green.

Refs RFC-0013.

Signed-off-by: Tri Lam <tri@maydow.com>
Root cause: my downward-API env gate referenced
.Values.receivers.k8s_events.enabled but values.yaml has no
k8s_events block (k8s_events receiver factory exists but never
got a values.yaml knob; receiver runs with factory defaults).

Result: helm lint + helm template both failed with "nil pointer
evaluating interface {}.enabled". Both render and install-bench
CI jobs surfaced this.

Fix: tighten the gate to (containerstdout || kernelevents) only.
k8s_events receiver does not consume downward-API env vars in
the current factory wiring, so excluding it is correct.

Verified: alpine/helm:3.16.4 lint clean post-fix.

Signed-off-by: Tri Lam <tri@maydow.com>
@trilamsr trilamsr merged commit b10ff20 into main May 30, 2026
13 checks passed
@trilamsr trilamsr deleted the cleanup-pre-pivot branch May 30, 2026 09:56
trilamsr added a commit that referenced this pull request May 31, 2026
## What this PR does

Bundles three RFC-0013 PR slices that have zero file overlap with each
other.

### PR-C: release pipeline → goreleaser stack

- New `.goreleaser.yaml`: linux/amd64 + linux/arm64 builds; reproducible
via `SOURCE_DATE_EPOCH`; LDFLAGS shape matches the Makefile build
target.
- Rewritten `.github/workflows/release.yml`: invokes goreleaser,
`anchore/sbom-action`, `sigstore/cosign-installer`,
`slsa-framework/slsa-github-generator` (tag-pinned per SLSA OIDC subject
identity requirement; all other actions SHA-pinned per repo security
policy), `actions/attest-build-provenance`.
- Old `release.yml` moved to
`.github/workflows/archived/release.yml.legacy`.
- Goreleaser builds the **legacy** `cmd/tracecore` binary; OCB-output
migration deferred to PR-D (image build → ko), per inline comment in
`.goreleaser.yaml`.

### PR-G + PR-H: RFC supersession + top-level docs alignment

- Audit confirmed all 12 RFCs already carry the correct supersedence
headers from prior pivot work (PRs #166/#168/#169/#170). Only two
top-level docs needed alignment:
  - `NORTHSTARS.md` O1 caveat: replaced "own-binary architecture"
    assumption wording with OCB-distribution-posture wording; closed Open
    Question #1 by RFC-0013 ref.
  - `CHANGELOG.md`: appended pivot-wave-1 PR list
    (#166/#168/#169/#170/#171/#172/#173) citing PR-A as the prior step
    before this commit.
- No edits needed to
README/STRATEGY/PRINCIPLES/MILESTONES/CONTRIBUTING/AGENTS/docs/README —
all already aligned.

### PR-E: clockreceiver swap — BLOCKED

- `telemetrygeneratorreceiver` does not exist in
`opentelemetry-collector-contrib` at any version. Verified against the
Go module proxy, GitHub tree API at v0.95→v0.130, and the full receiver
listing at v0.110.0 (94 receivers; no `telemetrygenerator`, `loadgen`,
`mockreceiver`, `dummyreceiver`, or any `*generator*`). The RFC-0013 §1
example shape referenced it speculatively; it was never upstreamed.
- `builder-config.yaml`: replaced the misleading "no v0.110.0 tag"
omission comment with a verified TODO block describing the actual
blocker (receiver doesn't exist anywhere) and decision rationale.
- `bench/install/tracecore-values.yaml`: appended `[BLOCKED]` marker on
the clockreceiver→telgen mapping; bench continues to use in-tree
clockreceiver until PR-F deletes it (likely rewires to
`hostmetricsreceiver`).

## Root cause (PR-E blocker)

RFC-0013 §1 listed `telemetrygeneratorreceiver` as the swap target
without verifying the receiver existed upstream. Reality: the OTel
contrib repo has no such module path at any tag. PR-E cannot complete
until either (a) the receiver lands upstream, or (b) a different
replacement is chosen (e.g., `hostmetricsreceiver` for heartbeat
semantics). Tracked in the in-file TODO block; revisit in PR-F (delete
clockreceiver) or as a separate followup.
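
The kind of upstream check that would have caught this earlier can be sketched against the Go module proxy protocol, where `GET <proxy>/<module>/@v/list` returns the known versions of a module and a 404 or 410 means the proxy has no such module. The Go sketch below only builds the URL; the helper names are hypothetical, and the module path in `main` is an assumption that follows the contrib repo's usual `receiver/<name>` layout.

```go
package main

import (
	"fmt"
	"strings"
)

// escapeModulePath applies the module proxy's case-encoding: each uppercase
// ASCII letter becomes '!' followed by its lowercase form.
func escapeModulePath(path string) string {
	var b strings.Builder
	for _, r := range path {
		if r >= 'A' && r <= 'Z' {
			b.WriteByte('!')
			b.WriteRune(r + 'a' - 'A')
			continue
		}
		b.WriteRune(r)
	}
	return b.String()
}

// versionListURL (hypothetical helper) returns the proxy endpoint that lists
// the known versions of a module.
func versionListURL(proxy, modulePath string) string {
	return strings.TrimSuffix(proxy, "/") + "/" + escapeModulePath(modulePath) + "/@v/list"
}

func main() {
	// Assumed module path, following the contrib repo's receiver layout.
	const mod = "github.com/open-telemetry/opentelemetry-collector-contrib/receiver/telemetrygeneratorreceiver"
	fmt.Println(versionListURL("https://proxy.golang.org", mod))
}
```

Fetching the printed URL with any HTTP client and checking the status code is enough to confirm whether a proposed swap target exists before an RFC names it.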

## Release notes

```release-notes
[CHANGE] Release pipeline migrated to goreleaser + SBOM + SLSA provenance + cosign signing. The release.yml workflow now invokes goreleaser instead of building binaries directly. Operators consuming release artifacts: artifact shape (filename, archive contents, checksum file format) follows goreleaser defaults; see CHANGELOG.md for the migration note.
```

## Test plan

- [x] `make verify` runs and passes
- [x] `make actionlint` passes (new release.yml workflow +
suppression-block YAML valid)
- [x] `make zizmor` passes (SLSA reusable-workflow tag-pin justified
inline + accepted)
- [x] `make build` (legacy) still works
- [x] `make build-ocb` (OCB) still works
- [ ] Goreleaser dry-run in CI on first push to a tag (gated until a tag
exists)

Signed-off-by: Tri Lam <tri@maydow.com>
Co-authored-by: Tri Lam <tri@maydow.com>