25 changes: 7 additions & 18 deletions .github/workflows/chaos.yml
@@ -4,16 +4,20 @@ name: Chaos
# - harness-determinism: same argv + --seed produces byte-identical
# output across two runs, matching tools/failure-inject/testdata/
# golden.sha256.
- # - chaos-pipeline-test: internal/pipeline/chaos_test.go runs under
- # -tags=chaos and proves the receiver-exporter panic-or-error
- # pairing leaks no goroutines across ≥100 iterations.
# - cpu-steal-mpstat: failure-inject cpu-steal pins a busy-loop and
# mpstat reports %steal+%user ≥ 95% on the pinned core for ≥ D-1
# seconds.
# - pattern-pod-evicted (M19): runs the hermetic replay-corpus
# detector test plus pins the pod-evict CLI output SHA so harness
# drift and detector drift are caught in the same workflow.
#
+ # The legacy chaos-pipeline-test job ran internal/pipeline/chaos_test.go
+ # under -tags=chaos to prove the in-tree receiver-exporter panic-or-error
+ # pairing leaked no goroutines. Deleted in RFC-0013 PR-F.2 along with the
+ # in-tree pipeline runtime; the equivalent panic-recovery contract now
+ # rides on upstream `go.opentelemetry.io/collector/service` and is
+ # covered by upstream's own chaos tests.
+ #
# Matrix-of-patterns rule: per MILESTONES.md §M4b the workflow grows
# a row when each pattern lands. M17 / M18 are still open and will
# add their own rows when they land.
@@ -27,13 +31,11 @@ on:
- main
paths:
- "tools/failure-inject/**"
- - "internal/pipeline/chaos_test.go"
- "internal/synthesis/**"
- ".github/workflows/chaos.yml"
pull_request:
paths:
- "tools/failure-inject/**"
- - "internal/pipeline/chaos_test.go"
- "internal/synthesis/**"
- ".github/workflows/chaos.yml"

@@ -105,19 +107,6 @@ jobs:
echo "OK [$argv]"
done < tools/failure-inject/testdata/golden.sha256

- chaos-pipeline-test:
- name: chaos-pipeline-test
- runs-on: ubuntu-latest
- timeout-minutes: 10
- steps:
- - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- - uses: actions/setup-go@4a3601121dd01d1626a1e23e37211e3254c1c06c # v6.4.0
- with:
- go-version-file: go.mod
- cache: true
- - name: go test -tags=chaos (panic-or-error × goleak)
- run: go test -tags=chaos -race -count=1 -run TestChaos ./internal/pipeline/...
-
cpu-steal-mpstat:
name: cpu-steal-mpstat (linux)
runs-on: ubuntu-latest
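
The harness-determinism contract described in the workflow header (same argv plus `--seed` gives byte-identical output, pinned by a SHA-256 in `golden.sha256`) can be illustrated with a minimal Go sketch. This is not the `failure-inject` implementation: `run` and `digest` are hypothetical names, and the output format is invented for illustration. The only assumption carried over from the workflow is that all randomness derives from the seed.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"math/rand"
)

// run is a hypothetical stand-in for one harness invocation. Every random
// choice is drawn from a generator seeded by the caller, so equal seeds
// produce equal bytes.
func run(seed int64, events int) []byte {
	rng := rand.New(rand.NewSource(seed))
	out := make([]byte, 0, events*24)
	for i := 0; i < events; i++ {
		out = append(out, fmt.Sprintf("event=%d delay_ms=%d\n", i, rng.Intn(1000))...)
	}
	return out
}

// digest returns the hex-encoded SHA-256 of b, the form a golden file pins.
func digest(b []byte) string {
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

func main() {
	first := digest(run(42, 5))
	second := digest(run(42, 5))
	// Two runs with the same seed must hash identically.
	fmt.Println("byte-identical across runs:", first == second)
}
```

A golden-file check then reduces to comparing `digest` of a fresh run against the recorded value, which is what the shell loop over `golden.sha256` does per argv line.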
6 changes: 0 additions & 6 deletions .github/workflows/install-bench.yml
@@ -16,9 +16,6 @@ on:
- 'components/exporters/otlphttp/**'
- 'install/kubernetes/tracecore/**'
- 'builder-config.yaml'
- - 'internal/pipeline/**'
- - 'internal/runtime/**'
- - 'internal/selftelemetry/**'
- 'go.mod'
- 'go.sum'
- '.github/workflows/install-bench.yml'
@@ -29,9 +26,6 @@ on:
- 'components/exporters/otlphttp/**'
- 'install/kubernetes/tracecore/**'
- 'builder-config.yaml'
- - 'internal/pipeline/**'
- - 'internal/runtime/**'
- - 'internal/selftelemetry/**'
- 'go.mod'
- 'go.sum'
- '.github/workflows/install-bench.yml'
4 changes: 2 additions & 2 deletions .golangci.yml
@@ -113,8 +113,8 @@ linters:
# Interface forwarders (Consume*, Component.Shutdown) are designed
# to pass errors through unmodified — wrapping would stutter.
ignore-interface-regexps:
- - ^github\.com/tracecoreai/tracecore/internal/consumer\.(Metrics|Traces|Logs)$
- - ^github\.com/tracecoreai/tracecore/internal/pipeline\.Component$
+ - ^go\.opentelemetry\.io/collector/consumer\.(Metrics|Traces|Logs)$
+ - ^go\.opentelemetry\.io/collector/component\.Component$
revive:
rules:
- name: var-naming
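
The effect of the `ignore-interface-regexps` swap in this hunk can be checked with a small Go sketch. The two patterns are copied from the added lines; the assumption (not stated in the diff) is that the linter matches each pattern against the fully qualified interface name. `ignored` is a hypothetical helper, not part of any linter API.

```go
package main

import (
	"fmt"
	"regexp"
)

// Patterns copied from the added ignore-interface-regexps entries.
var ignorePatterns = []*regexp.Regexp{
	regexp.MustCompile(`^go\.opentelemetry\.io/collector/consumer\.(Metrics|Traces|Logs)$`),
	regexp.MustCompile(`^go\.opentelemetry\.io/collector/component\.Component$`),
}

// ignored reports whether a fully qualified interface name matches any
// pattern. Hypothetical helper: the real linter's matching may differ.
func ignored(name string) bool {
	for _, re := range ignorePatterns {
		if re.MatchString(name) {
			return true
		}
	}
	return false
}

func main() {
	names := []string{
		"go.opentelemetry.io/collector/consumer.Metrics",
		"go.opentelemetry.io/collector/component.Component",
		"github.com/tracecoreai/tracecore/internal/pipeline.Component",
	}
	for _, n := range names {
		fmt.Printf("%s ignored=%v\n", n, ignored(n))
	}
}
```

Under that assumption the upstream interfaces match and the deleted in-tree `internal/pipeline.Component` name no longer does, which is consistent with the package being removed in this PR.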
11 changes: 8 additions & 3 deletions AGENTS.md
@@ -31,9 +31,14 @@ Concrete implications for any edit you make:
`github.com/tracecoreai/tracecore/module`) per RFC-0013 §6, via
CONTRIBUTING.md "RFC routing" guidance.
- **Self-telemetry internals** (`internal/componentstatus`,
- `internal/selftelemetry`, `internal/telemetry`) delete at v0.1.0 -
- replaced by upstream `componentstatus` + `service/telemetry` + the
- standard `otelcol_*` metric surface.
+ `internal/selftelemetry`, `internal/telemetry`) deleted at v0.1.0
+ (PR-F.1 + PR-F.2) - replaced by upstream `componentstatus` +
+ `service/telemetry` + the standard `otelcol_*` metric surface.
+ - **Pipeline / boot-path internals** (`internal/pipeline`,
+ `internal/pipelinebuilder`, `internal/config`, `internal/consumer`,
+ `internal/fanout`, `internal/runtime/lifecycle`) deleted at v0.1.0
+ (PR-F.2) - replaced by the OCB-generated boot path off
+ `builder-config.yaml`.
- **Release pipeline** rewrites at v0.1.0 to goreleaser + slsa-github-generator
+ cosign-installer + sbom-action + actions/attest-build-provenance.
- **Customer-stable contracts** (`k8s.event.hint` enum,
12 changes: 9 additions & 3 deletions CHANGELOG.md
@@ -18,7 +18,13 @@ Pivot landed across four waves of PRs:
- `internal/telemetry/` — was the in-tree `MeterProvider` + probe-server (`/metrics`, `/healthz`, `/readyz`) wrapper. Probes now flow through the upstream `healthcheckextension`; meter-provider is upstream `service.telemetry`. Only remaining consumers were `internal/selftelemetry/*_test.go` (deleted together with selftelemetry) and one orphan clockreceiver integration test.
- `components/receivers/clockreceiver/errors_integration_test.go` — orphan integration test from #185's PR-B1 clockreceiver port; bootstrapped via the now-deleted `selftelemetry.Receiver` interface but never migrated to the receiver-scoped sibling `selftel.go`. The covered behaviour ("errors_total surfaces on downstream failure") is now exercised through clockreceiver's sibling tests.

- PR-F.2 (deferred - pending three open ports): Delete `internal/{componentstatus,pipeline,pipelinebuilder,consumer,fanout,runtime/lifecycle}`. Gated on the last three pipeline+consumer-importing receivers landing - k8sevents (#204), clockreceiver (#205), otlphttp (#207) - all three open as of this entry, all three following the PR-B2 (#201) shape. Once they merge, the entire `internal/*` runtime bundle has zero non-test consumers and drops in a single cut. The `clockreceiver` source deletion stays in PR-K (chart + values-keys deprecation cycle) - PR-F.2 only deletes `internal/*` packages, not the canonical-example receivers themselves.
+ **PR-F.2 landed: `internal/{pipeline,pipelinebuilder,config,consumer,fanout,componentstatus,runtime/lifecycle}` deleted.** Net deletion of the full in-tree boot-path infrastructure across 56 files / -6,888 LOC. The nine sibling upstream-port PRs (#201 nccl_fr, #202 stdoutexporter, #203 pyspy, #204 k8sevents, #205 clockreceiver, #207 otlphttp, #208 kernelevents, #209 containerstdout, plus PR-A2 boot-path retirement) cleared every external caller of these packages; with PR-F.1's selftel/telemetry deletions and PR-A2's `cmd/tracecore` rewire to the OCB-generated main, none of these packages has a non-test consumer in the tree. Replacements:
+ - `internal/pipeline/` + `internal/pipelinebuilder/` + `internal/consumer/` + `internal/fanout/` - assembly + per-signal consumer chain + fanout cloning are now provided by upstream `go.opentelemetry.io/collector/service`. The OCB-generated `_build/main.go` consumes `builder-config.yaml` and produces an equivalent collector instance.
+ - `internal/config/` - YAML loader with `file:line:col` errors replaced by upstream `confmap` providers (`file`, `yaml`, `env`).
+ - `internal/componentstatus/` - replaced by upstream `go.opentelemetry.io/collector/component/componentstatus.ReportStatus` (same free-function shape).
+ - `internal/runtime/lifecycle/` - the `Lifecycle{Add, Start, Shutdown, PanicCallback}` helper was already ported into each receiver as a package-local sibling during the PR-B1 wave (#184/#185/#186/#187/#194/#196/#197); after the wave-3 PRs landed, the in-tree package had no non-test consumers and is now gone. `kernelevents/lifecycle.go` was inherited from k8sevents (#208).
+
+ `.golangci.yml` `ignore-interface-regexps` now points at upstream `consumer.{Metrics,Traces,Logs}` and upstream `component.Component`. `.github/workflows/chaos.yml` drops the `chaos-pipeline-test` job (the in-tree `internal/pipeline/chaos_test.go` is gone; the equivalent panic-recovery contract is now provided by upstream `service`); harness-determinism + cpu-steal-mpstat + pattern-pod-evicted jobs preserved. `.github/workflows/install-bench.yml` drops the `internal/{pipeline,runtime,selftelemetry}/**` path-filter rows. `docs/FAILURE-MODES.md` Lifecycle / Data-flow / Shutdown-timing / Backend-connectivity tables rewired from in-tree test pointers to upstream-delegated wording matching PR-A2's pattern. `docs/STRATEGY.md` "Stable interfaces in `internal/pipeline/`" graduation row rewritten to point at the upstream surface. `MILESTONES.md` M1 + M4b + M19 rubric details annotated. `docs/migration/v0.1-to-v0.2.md` `internal/*` deletion section's status banner flipped from deferred to landed.

Build-tag `dcgm` retired (`make build-tags` no longer vets `-tags dcgm`). `make bench-check` loop drops both deleted package rows (dcgm + internal/telemetry). `scripts/register-lint.sh` allowlist emptied (the two `internal/telemetry/{build_info,slo}.go` entries are gone with the package). Chart `receivers.dcgm` toggle + `_helpers.tpl` doc-list + `NOTES.txt` warning retained until PR-K removes them outright (toggle is already inert — operators enabling `receivers.dcgm.enabled=true` have crashed at boot since PR-A2). `internal/runtime/lifecycle/` doc-comment updated. `docs/FAILURE-MODES.md` self-tel-surface rows rewired to upstream-delegated wording. `docs/patterns/{README,pattern-{1,3,4,5}}.md` replay-test pointers updated.

@@ -28,7 +34,7 @@ Build-tag `dcgm` retired (`make build-tags` no longer vets `-tags dcgm`). `make

**PR-E unblocked.** Original RFC-0013 §migration plan named `telemetrygeneratorreceiver` as the upstream replacement for `clockreceiver`. Verified 2026-05-30: the receiver does not exist in `opentelemetry-collector-contrib` at any tag from v0.95.0 through v0.130.0; two community proposals (contrib issues #41687 and #43657) were closed `not_planned`. Replacement landed on `hostmetricsreceiver` (loadscraper @ 1s) — an upstream OCB-bundled receiver that emits 3 low-cardinality series (`system.cpu.load_average.{1m,5m,15m}`) at the cadence the bench's pass condition needs (first parseable JSON line at the sink — see `bench/install/run.sh`). This PR adds `hostmetricsreceiver` to `builder-config.yaml`, adds a `receivers.hostmetrics` opt-in block to the chart values (default disabled — chart default stays `clockreceiver` this release), and flips `bench/install/tracecore-values.yaml` to enable hostmetrics + disable clockreceiver. RFC-0013 §migration PR-E + §4 + §7 deletion table updated. Chart-default flip from `clockreceiver` to `hostmetrics` + source-deletion of `components/receivers/clockreceiver/` are deferred to PR-K (in-tree-receiver deletion wave) so the values-keys migration ships together with `NOTES.txt` deprecation warnings and the coordinated migration of ~92 in-tree test-fixture references in one cut rather than two operator-visible changes.

- Remaining v0.1.0 work: PR-F.1 (delete `components/receivers/dcgm/` + `pkg/dcgm/` + `internal/selftelemetry/` + `internal/telemetry/`) landed in this Unreleased section; PR-F.2 (delete `internal/componentstatus/`) deferred until `internal/pipeline` migrates to upstream `componentstatus`. Chart default pipeline still hardwires the to-be-deleted receivers, so the receiver-side deletions (clockreceiver / containerstdout / kernelevents / k8sevents) ride with PR-K alongside the v0.2.0 recipe migration to avoid an interim chart break.
+ Remaining v0.1.0 work: PR-F.1 (delete `components/receivers/dcgm/` + `pkg/dcgm/` + `internal/selftelemetry/` + `internal/telemetry/`) and PR-F.2 (delete `internal/{pipeline,pipelinebuilder,config,consumer,fanout,componentstatus,runtime/lifecycle}`) both landed in this Unreleased section. Chart default pipeline still hardwires the to-be-deleted receivers, so the receiver-side deletions (clockreceiver / containerstdout / kernelevents / k8sevents) ride with PR-K alongside the v0.2.0 recipe migration to avoid an interim chart break.

**RFC-0013 §migration rescoped (doc-only).** Headline: **PR-I is now an in-repo Go submodule at `module/`, not an external `tracecoreai/tracecore-components` repo.** Open-source project — one fork, one CI, one issue tracker, one DCO wins. Go submodule tags give independent version line; OCB `gomod:` + `replaces: ./module` for dev-loop resolves identical to external repo.

@@ -86,7 +92,7 @@ Files updated in this PR: `docs/rfcs/0013-distro-first-pivot.md` (§1, §6, §mi
- [RFC-0004](docs/rfcs/archived/0004-clockreceiver-stdoutexporter.md): clockreceiver + stdoutexporter (Option C scope adoption - `Capabilities()`, fan-out, `ComponentState` mixin, factory-as-package-var). Archived under RFC-0013.
- [`docs/STRATEGY.md`](docs/STRATEGY.md): long-term repo posture. The single load-bearing principle: "Tracecore is OpenTelemetry-Collector-compatible by default. Every divergence is deliberate and documented."
- [`docs/research/otel-graph-notes.md`](docs/research/otel-graph-notes.md): synthesized findings from reading OTel Collector v0.152.0's `service/internal/graph` + `testcomponents` + `fanoutconsumer` source.
- - Receiver-author quickstart in [`internal/pipeline/README.md`](internal/pipeline/README.md).
+ - Receiver-author quickstart in `internal/pipeline/README.md` (deleted at v0.1.0 with the in-tree pipeline runtime per RFC-0013 PR-F.2; superseded by upstream [`go.opentelemetry.io/collector`](https://pkg.go.dev/go.opentelemetry.io/collector) docs).
- `PRINCIPLES.md`: 15 design and engineering principles distilled from the foundation work; the *why* behind every rule in `STYLE.md`.
- **`internal/selftelemetry`** - producer-side `Receiver` interface (`IncError`, `IncEmissions`, `ObserveLatency`, `SetDegraded`, `MarkActivity`) that components write to when reporting their own health, plus a noop default. The `/metrics` endpoint that surfaces these to operators is owned by M2; this package lets M8+ receivers wire to self-telemetry from day one without waiting for M2.
- **M9 - kernelevents receiver (alpha)** - tails `/dev/kmsg` and the systemd journal, filters by severity / facility / regex, preserves trace context. Two sources behind a common interface, both `//go:build linux` with non-Linux stubs that degrade silently. Subprocess crash → backoff restart (1s/2s/5s, max 3 retries, 60s window). Emits records with a stable, dereferenceable SchemaURL pointing at [`docs/schemas/kernelevents/v0.md`](docs/schemas/kernelevents/v0.md); resolves NVIDIA Xid codes to canonical descriptions via the `kernelevents.xid.description` attribute (40 codes in the alpha subset). 16 KiB body cap with `...`-suffix truncation on pathological inputs. **Depends on M2** - receivers acquire their `selftelemetry.Receiver` from `TelemetrySettings.MeterProvider`; the M9 receiver will run without M2 (noop telemetry) but operators won't see `/metrics` / `/healthz` / `/readyz` until M2 is wired in `cmd/tracecore`. See [RFC-0007](docs/rfcs/0007-kernelevents-receiver-scope.md). (Originally numbered 0005; renamed to 0007 in the M8↔M9 merge to resolve a collision with the dcgm RFC that landed on main as RFC-0005.)