docs(migration): PR-L — flesh v0.1.x→v0.2.0 guide body #191
Merged
Expand the migration guide from the PR-179 skeleton (and the PR-A2 metric-name / chart-values rows landed inline) into a comprehensive v0.1.x → v0.2.0 cutover guide. Every post-wave-2 landing now has an operator-facing migration row:

- CLI surface table: every removed subcommand and flag from PR-A2, with the upstream replacement.
- Self-telemetry metric vocabulary table: `tracecore_*` → `otelcol_*` for receiver / exporter / queue / component-status / build-info families. Names the two contract metrics `ocb_scrape_test` pins.
- Helm chart values rename: `telemetry.listen` + `telemetry.paths.*` → `telemetry.metricsListen` + `telemetry.healthListen` + `telemetry.healthPath`. Default-port values inline.
- Probes: `:8888/healthz` + `:8888/readyz` → `:13133/`.
- Default pipeline: `clockreceiver→stdoutexporter` → `hostmetrics→debug` snippet.
- Orphan components: all 9 (7 receivers + 2 exporters) mapped to their upstream replacement + PR-J recipe.
- `stdoutexporter` failure-rate gap: debugexporter pins `send_failed_*` at zero, so debug-only pipelines lose the signal.
- Build / CI changes: Makefile, output path, source tree, smoke, image build, release pipeline, version source.
- `internal/*` package deletion (PR-F, in flight): selftelemetry, lifecycle, componentstatus, telemetry public-surface migration map.
- Reproducibility note: `0.1.0-m9-alpha` hardcoded; cross-ref to `docs/reproducibility.md` workaround.
- Verification: adds probe smoke test + `tracecore components` parity check against the rendered config.
- Rollback: recipe-toggle path doesn't exist for the deleted set; pin chart + image at v0.1.x.

`make doc-check` clean (banned-phrase, link resolution, test-name parity, etc.).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Tri Lam <tri@maydow.com>
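The `tracecore_*` → `otelcol_*` vocabulary change means dashboards and alert rules written against v0.1.x can silently go blank. A minimal sketch of how an operator might flag leftovers, assuming only that legacy names carry the `tracecore_` prefix; it does not attempt the rename itself, since the guide's table is the source of truth for each replacement name. `findLegacyMetrics` and the sample expression are hypothetical, not part of tracecore.

```go
package main

import (
	"fmt"
	"regexp"
)

// legacyMetric matches identifiers that still carry the v0.1.x prefix.
var legacyMetric = regexp.MustCompile(`\btracecore_[A-Za-z0-9_]+`)

// findLegacyMetrics returns each distinct tracecore_* name found in text,
// in order of first appearance. Hypothetical helper for illustration.
func findLegacyMetrics(text string) []string {
	seen := make(map[string]bool)
	var names []string
	for _, m := range legacyMetric.FindAllString(text, -1) {
		if !seen[m] {
			seen[m] = true
			names = append(names, m)
		}
	}
	return names
}

func main() {
	// Placeholder expression; the metric names are made up for the example.
	expr := `rate(tracecore_example_total[5m]) / rate(otelcol_example_total[5m])`
	for _, name := range findLegacyMetrics(expr) {
		fmt.Println("still on old vocabulary:", name)
	}
}
```

Each flagged name would then be looked up in the guide's metric vocabulary table rather than rewritten mechanically.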
Code review

Found 2 issues:

1. tracecore/docs/migration/v0.1-to-v0.2.md Lines 65 to 83 in acf3767
2. tracecore/docs/migration/v0.1-to-v0.2.md Lines 188 to 190 in acf3767

🤖 Generated with Claude Code - If this code review was useful, please react with 👍. Otherwise, react with 👎.
Drop the fictional `<receiver>.recipe: upstream` advice from the orphan-components table and the closing paragraph. That switch lands in PR-J, not v0.2.0, so today the only operator action for those receivers is to leave them disabled and pin v0.1.x if they need the signal.

Rewrite the rollback section to point at the actual `v0.1.0-m1` tag instead of the non-existent `0.1.0-m8-alpha`, and drop the chart `appVersion` pin: charts are pinned by `--version` (chart-package version), not by `appVersion`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Tri Lam <tri@maydow.com>
trilamsr added a commit that referenced this pull request May 31, 2026
…de (#200)

## Summary

Adds `docs/migration/v0.2-to-v0.3.md` covering the v0.3.0 security-posture migration per [RFC-0013 §migration](https://github.com/TraceCoreAI/tracecore/blob/main/docs/rfcs/0013-distro-first-pivot.md#migration--rollout). The single operator-visible break at v0.3.0 is the Python-profiling story: the cooperative `pyspy` receiver (zero capabilities added; in-process `faulthandler.dump_traceback` over UDS per [RFC-0009](https://github.com/TraceCoreAI/tracecore/blob/main/docs/rfcs/0009-pyspy-receiver-scope.md)) is deleted in PR-M, and the replacement `parca-agent` (eBPF) requires `CAP_SYS_ADMIN` (or root) + `hostPID: true` + a BTF-enabled kernel ≥5.3.

The guide names:

- The exact upstream capability requirement (`root` or `CAP_SYS_ADMIN` per [parca-dev/parca-agent](https://www.parca.dev/docs/parca-agent-security)) and why `CAP_BPF`/`CAP_PERFMON` is **not** yet a documented narrower alternative (conservative grant remains `CAP_SYS_ADMIN`).
- Failure shapes by **syscall + errno** (`bpf(BPF_PROG_LOAD,…)` → `EPERM`, `perf_event_open(…)` → `EACCES`, `open("/sys/kernel/btf/vmlinux",…)` → `ENOENT`, etc.). These are the stable surface across parca-agent versions, not paraphrased agent log strings.
- A minimum-grant container `SecurityContext` snippet (DaemonSet-shaped, `add: [SYS_ADMIN]` not `privileged: true`) with explicit disclaimers about `readOnlyRootFilesystem` (deferred to upstream manifest verification) and PSS interactions.
- A clean rollback path (pin v0.2.x chart + image; `tracecore-pyspy` PyPI helper remains installable one minor past v0.3.0).

### Why this PR exists despite cooperative pyspy needing zero capabilities

tracecore's `pyspy` receiver does **not** use `CAP_SYS_PTRACE`. Per [RFC-0009 §Safety properties](https://github.com/TraceCoreAI/tracecore/blob/main/docs/rfcs/0009-pyspy-receiver-scope.md#proposal) and the chart's `conftest` policy (`add: []` asserted in `chart.yml`), the cooperative design walks Python frames in-process via `faulthandler.dump_traceback` and ships them over UDS, with no `ptrace` and no `process_vm_readv`. The security-posture change at v0.3.0 is the **delta** from the cooperative path (zero capabilities, tracecore pod) to the eBPF path (`CAP_SYS_ADMIN`, separate `parca-agent` pod). PR-N documents that delta.

### Why now (not deferred with PR-M)

parca-agent research confirms the OTel Profiles signal is still **Alpha** (Mar 2026 release) and parca-agent has no OTLP profiles exporter yet. That means operators upgrading to v0.3.0 will run parca-agent **alongside** tracecore for at least one more minor, so they need the security-posture delta documented before PR-M cuts the receiver. PR-N landing ahead of PR-M gives operators an evaluation window.

### Drift fix

`docs/migration/v0.1-to-v0.2.md`'s `pyspy` row claimed "Deferred until OTel Profiles GA. No upstream replacement exists today; the toggle survives until contrib ships `pprofreceiver`." This contradicted RFC-0013, which has named `parca-agent` (separate DaemonSet) as the replacement since the pivot landed. Updated the row to forward-reference the new guide and removed the stale "no upstream replacement" claim.

Root cause: the v0.1-to-v0.2.md skeleton (PR #179, then fleshed in PR #191) predated the RFC-0013 §2 adoption-matrix line that explicitly maps `components/receivers/pyspy/` → `parca-agent` at v0.3.0. Fixed at the row, not paved over.

## Test plan

- [x] `make doc-check`: 510 markdown links resolve to on-disk files; banned-phrase lint clean across 109 markdown files; new file's RFC + pyspy README/RUNBOOK cross-links verified.
- [x] `make check`: `golangci-lint run ./...` 0 issues; `go vet ./...` clean; `go mod verify` clean.
- [x] Pre-commit hooks (no-autoupdate-check, license-check) green at push time.
- [ ] Reviewer sanity-check the SecurityContext YAML snippet renders as valid Kubernetes `apps/v1.DaemonSet` (apiVersion + spec.template structure).
- [ ] Reviewer sanity-check the failure-mode table syscall + errno columns match Linux man-page conventions (`bpf(2)`, `perf_event_open(2)`).

```release-notes
NONE
```

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: Tri Lam <tri@maydow.com>
Co-authored-by: Tri Lam <tri@maydow.com>
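Keying failure modes on syscall + errno rather than on log text is straightforward to mirror in tooling. A minimal sketch, assuming only the three syscall/errno pairs quoted in the summary above; `failureShapes` and `errnoName` are hypothetical names, and nothing here reproduces parca-agent's own behaviour or output.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"syscall"
)

// failureShapes lists the syscall/errno pairs named in the guide summary.
var failureShapes = []struct {
	call  string
	errno syscall.Errno
	name  string
}{
	{"bpf(BPF_PROG_LOAD, ...)", syscall.EPERM, "EPERM"},
	{"perf_event_open(...)", syscall.EACCES, "EACCES"},
	{`open("/sys/kernel/btf/vmlinux", ...)`, syscall.ENOENT, "ENOENT"},
}

// errnoName returns the symbolic errno for err if it matches one of the
// listed shapes, "ok" for a nil error, and "other" for anything else.
// errors.Is unwraps wrappers such as *os.PathError before comparing.
func errnoName(err error) string {
	if err == nil {
		return "ok"
	}
	for _, s := range failureShapes {
		if errors.Is(err, s.errno) {
			return s.name
		}
	}
	return "other"
}

func main() {
	// Probing the BTF file only exercises the open() shape; the result
	// depends on the kernel this runs on.
	f, err := os.Open("/sys/kernel/btf/vmlinux")
	if err == nil {
		f.Close()
	}
	fmt.Println("open btf:", errnoName(err))
}
```

Matching on the errno value keeps the check stable even if an agent rewords its log lines between releases.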
trilamsr added a commit that referenced this pull request May 31, 2026
## Summary

Reconcile the four pivot-tracking docs (`docs/rfcs/0013-distro-first-pivot.md`, `CHANGELOG.md`, `MILESTONES.md`, `docs/migration/v0.1-to-v0.2.md`) with the wave-3 (PR-B1-shape sibling ports) and wave-4 (PR-B2-shape upstream-only ports + PR-F.1 + PR-J + PR-L + PR-N) landings. Pure doc sweep: no code or config touched.

## What changed

### `docs/rfcs/0013-distro-first-pivot.md` §migration

PR sequence rows updated with PR-number citations and landed markers:

- **PR-A2** (landed, #189, 2026-05-30)
- **PR-B2** (landed, #201): also enumerates sibling-receiver follow-ups under PR-B2 to dispel the slug collision with #188's PR-B2-labelled dcgm port, namely stdoutexporter (#202), pyspy (#203), kernelevents (#208), containerstdout (#209)
- **PR-F.1** (landed): fleshed-out delete list (`internal/{selftelemetry,telemetry}` + `components/receivers/dcgm/` + `pkg/dcgm/` + one orphan clockreceiver integration test)
- **PR-F.2** re-scoped: now deletes the whole `internal/{componentstatus,pipeline,pipelinebuilder,consumer,fanout,runtime/lifecycle}` bundle in one cut once the last three pipeline+consumer-importing receivers land (#204 k8sevents, #205 clockreceiver, #207 otlphttp). Per the import-graph state, `internal/componentstatus`'s only non-test consumer is `internal/pipeline`, so they delete together
- **PR-G** (landed, #182), **PR-H** (landed, #183)
- **PR-I.1a** (in flight, scaffold agent), **PR-I.1b** (pre-staged; gate satisfied by #201)
- **PR-J** (landed, #195): kept existing marker
- **PR-K.1** (in flight, separate agent landing)
- **PR-L** (landed, skeleton #179 + body #191): flagged as living document
- **PR-N** (landed, #200): shipped at v0.1.0 ahead of v0.3.0 as a doc-only update at `docs/migration/v0.2-to-v0.3.md`

### `CHANGELOG.md` [Unreleased]

- Restructured the pivot wave list as **four waves** (was three). Wave 3 enumerates PR-B1-shape sibling ports + support infra (#180-#194/#196). Wave 4 enumerates PR-B2-shape upstream-only ports + PR-J (#195) + PR-F.1 (#206) + PR-N (#200) + lint/TOCTOU hardening (#198/#210).
- Tightened the PR-F.2 deferred note to point at the three open ports (#204/#205/#207) as the gate.

### `MILESTONES.md`

- **M1** (pipeline runtime): status row now cites PR-A2 (#189), PR-F.1 (#206), PR-F.2 gate (#204/#205/#207), PR-E (#180), retains `internal/config/` (still load-bearing for `tracecore validate`).
- **M2** (self-telemetry): status row now cites PR-F.1 (#206); flags `internal/componentstatus` as travelling with `internal/pipeline` in PR-F.2.
- **M8** (DCGM receiver): status flipped to *landed-and-replaced*; cites PR-F.1 (#206) deletion + PR-J (#195) `docs/integrations/prometheus-scrape.md` recipe. Notes the inert chart toggle retention until PR-K.3.

### `docs/migration/v0.1-to-v0.2.md`

- §`internal/*` package deletion (PR-F) status flips from "not yet open" to "PR-F.1 landed (#206), PR-F.2 gated on three open ports".
- Open-items checklist expanded from 5 to 13 entries; tracks every PR letter the migration guide cares about (A2 / E / F.1 / F.2 / I.1a-c / J / K.1-3 / L / N) with PR numbers and links.

## Why now

Tracking docs accumulated drift across wave-3 + wave-4 because every sibling-port PR (and the support-infra PRs around them) updated the bottom of `CHANGELOG.md` but did not always touch the upstream sequencing section in RFC-0013. Per memory rule `[Keeping this document current]`: status drift is a review blocker. This PR is the consolidated catch-up; future port PRs include their RFC-row flip in-PR.

## What this PR does NOT change

- No code, no config, no YAML, no chart; only the four tracking docs.
- No new doc gates added; existing gates pass.
- No files other than the four named docs are modified.

## Test plan

- [x] `bash scripts/doc-check.sh` clean (33 test refs, 528 links resolve, comment-noise diff gate clean vs `origin/main`, all 13 gates green).
- [x] Pre-commit hook (`commitlint` 72-char subject limit + DCO + AI-trailer gates) passed.
- [x] Pre-push hook (`make ci-fast` equivalent: `golangci-lint`, `go vet`, `go mod verify`, `no-autoupdate-check`, `doc-check.sh`) passed on second attempt after `git fetch origin main` populated the worktree's `origin/main` ref. The first push failed because the worktree previously tracked the (gone) `pr-a2-ocb-main-swap` branch, so `doc-check.sh`'s comment-noise diff-scope gate exited 128 on the missing ref. Root cause fixed by the fetch; not a workaround.
- [ ] CI green on this branch.

```release-notes
NONE
```

Signed-off-by: Tri Lam <tri@maydow.com>
Co-authored-by: Tri Lam <tri@maydow.com>
Summary
RFC-0013 PR-L: expand `docs/migration/v0.1-to-v0.2.md` from the PR-179 skeleton (plus the metric-name + chart-values rows landed inline in PR-A2 / #189) into a comprehensive v0.1.x → v0.2.0 cutover guide. Every post-wave-2 landing now has a corresponding operator-facing migration row.

What landed

- CLI surface: every removed subcommand (`collect`, `receivers list`, `debug dump`, `failure-inject`) and removed flag (`--log.format=text`, `--shutdown.drain-budget`, `--version-short`) + their upstream replacements (`cmd/tracecore/` tree)
- Helm chart values: `telemetry.listen` + `telemetry.paths.*` → `telemetry.metricsListen` + `telemetry.healthListen` + `telemetry.healthPath` with default-port values (`install/kubernetes/tracecore/values.yaml` HEAD)
- Probes: `/healthz` + `/readyz` on `:8888` → `healthcheckextension` at `:13133/` (`install/kubernetes/tracecore/templates/daemonset.yaml` HEAD)
- Default pipeline: `clockreceiver → stdoutexporter` → `hostmetrics → debug` snippet (`values.yaml` `pipelines:` block HEAD)
- Orphan components (`components/receivers/`, `components/exporters/` directory inventory + RFC-0013 §2 adoption matrix)
- Self-telemetry metric vocabulary: `tracecore_*` → `otelcol_*` for receiver / exporter / queue / component-status / build-info families with per-signal split (`internal/integration/ocb_scrape_test.go` contract metrics)
- `stdoutexporter` failure-rate gap: debugexporter pins `otelcol_exporter_send_failed_*` at zero; debug-only pipelines lose the signal (`debugexporter` semantics)
- Build / CI changes: Makefile, output path (`./_build/tracecore`), source tree, smoke, image build, release pipeline, version source (`.goreleaser.yaml` + `.ko.yaml` + workflows HEAD)
- `internal/*` package deletion (PR-F, in flight): per-package public-surface migration map for `selftelemetry`, `runtime/lifecycle`, `componentstatus`, `telemetry`, `pipeline`, `pipelinebuilder`, `consumer`, `fanout` (`internal/` tree inventory)
- Reproducibility note: `0.1.0-m9-alpha` hardcoded in `builder-config.yaml dist.version`; cross-ref to `docs/reproducibility.md` workaround (`builder-config.yaml` HEAD)
- Verification: probe smoke test + `tracecore components` parity check against the rendered config

Closes RFC-0013 PR-L. Open follow-ups (PR-I in-repo submodule, PR-J upstream recipes, PR-K in-tree-receiver delete, PR-F internal/* delete) are referenced inline in the guide so the next agent picking up any of them lands the corresponding doc update in the same PR.
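The probe move to `:13133/` is easy to smoke-test outside the chart. A minimal sketch, assuming the health endpoint answers a plain HTTP GET on `/` with 200 when healthy and that the collector is reachable on localhost; `statusGetter`, `httpStatus`, and `probeOK` are hypothetical names, not part of tracecore or the chart.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// statusGetter abstracts an HTTP GET so the check can be exercised
// without a live collector.
type statusGetter func(url string) (int, error)

// httpStatus performs a real GET with a short timeout and returns the
// response status code.
func httpStatus(url string) (int, error) {
	client := &http.Client{Timeout: 2 * time.Second}
	resp, err := client.Get(url)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

// probeOK reports whether GET base+"/" answers 200 OK.
func probeOK(get statusGetter, base string) bool {
	code, err := get(base + "/")
	return err == nil && code == http.StatusOK
}

func main() {
	// Assumed address: port 13133 as named in the guide, on localhost.
	fmt.Println("health probe ok:", probeOK(httpStatus, "http://localhost:13133"))
}
```

Injecting the getter keeps the pass/fail rule testable on its own; a real smoke test would simply pass `httpStatus`.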
Adversarial pre-review notes
- `builder-config.yaml`: 6 receivers, 4 exporters, 3 extensions, 4 processors) against `awk` count of `gomod:` lines.
- `hostmetrics` is the default in `values.yaml` (`enabled: true`, `load` scraper, 1s).
- `cmd/tracecore`, `tools/components-gen`, `components.yaml` all deleted from HEAD (`git ls-files` returns empty).
- `internal/integration/ocb_scrape_test.go` is present and asserts the two contract metrics named in the guide.
- `port: health` (not `port: telemetry`) at `healthPath`.
- `builder-config.yaml`, `ocb_scrape_test.go`, RFC-0013 §3, RFC-0013 §migration, in-doc anchor).
- `install/kubernetes/tracecore/README.md` still references `/healthz` + `/readyz` on three lines (chart-doc rot from PR-A2 that didn't sweep the README). Out of scope for PR-L; flagging for the next chart-doc sweep.

Test plan

- `make doc-check` passes (banned-phrase lint, link resolution, test-name parity, all 15 sub-checks green)
- `chart`, `ci`, `install-bench` workflows do not gate on this file; only `doc-check` matters for a docs-only PR
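The component-count check in the pre-review notes (counting `gomod:` lines per section of `builder-config.yaml`) can also be done without `awk`. A minimal sketch, assuming the usual OCB layout where each component is a `- gomod:` list item under a top-level key; it is a line scanner rather than a YAML parser, and `countGomod` plus the `example.com` module paths are placeholders for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// countGomod counts "- gomod:" entries under each top-level key of an OCB
// builder config. It scans lines and does not parse YAML.
func countGomod(config string) map[string]int {
	counts := make(map[string]int)
	section := ""
	sc := bufio.NewScanner(strings.NewReader(config))
	for sc.Scan() {
		line := sc.Text()
		trimmed := strings.TrimSpace(line)
		if trimmed == "" || strings.HasPrefix(trimmed, "#") {
			continue
		}
		// A top-level key starts in column 0 and ends with a colon.
		first := line[0]
		if first != ' ' && first != '\t' && first != '-' && strings.HasSuffix(trimmed, ":") {
			section = strings.TrimSuffix(trimmed, ":")
			continue
		}
		if section != "" && strings.HasPrefix(trimmed, "- gomod:") {
			counts[section]++
		}
	}
	return counts
}

func main() {
	// Placeholder module paths; a real builder-config.yaml lists actual components.
	sample := `dist:
  name: demo
receivers:
  - gomod: example.com/receiver/a v0.0.0
  - gomod: example.com/receiver/b v0.0.0
exporters:
  - gomod: example.com/exporter/a v0.0.0
`
	counts := countGomod(sample)
	for _, kind := range []string{"receivers", "exporters", "extensions", "processors"} {
		fmt.Printf("%s: %d\n", kind, counts[kind])
	}
}
```

Run against the real file, the per-section totals would be compared with the counts quoted in the guide.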