[pipeline] M1 runtime + first canonical receiver/exporter #12
Merged
RFC-0003 defines the M1 keystone foundation:
* Component, Host, and per-signal Factory interfaces in `internal/pipeline`;
* two-phase shutdown (1s receiver-ingest cessation + operator-configurable exporter drain);
* push-based pdata consumer interfaces between stages (no channels in the data path);
* explicit factory registration via `components.yaml`, codegen'd to `cmd/tracecore/components.go`;
* a `pipelinetest` stub-host package consumed by every downstream receiver in M8-M16;
* `safe.Call(ctx, op, fn)` for cgo / vendor SDK calls.
Five Operator-UX patterns are elevated to M1 acceptance, not optional polish:
1. line-numbered YAML errors
2. named-op `safe.Call`
3. empty-pipeline boot
4. first-data log line per pipeline
5. `pipelinetest.NewRuntime(t)`
They're cheap to ship at M1, set the project's UX tone, and become hard to retrofit once receivers start landing in M8.
Verification evidence captured inline:
* pdata transitive footprint bounded;
* Component Type regex copied verbatim from OTel v0.152.0;
* safeCall panic-recover and errgroup-cause-propagation behavior confirmed by spike code;
* default exporter-drain budget grounded in OTel's own internal defaults.
MILESTONES.md M1 acceptance criteria updated to reference the RFC and enumerate the must-ship UX patterns.
Assisted-by: Anthropic:claude-opus-4-7 [Claude Code]
Signed-off-by: Tri Lam <trilamsr@gmail.com>
First scaffold commit for M1, kept stdlib-only so each commit builds
on its own and reviewers see the contract before the pdata dependency
lands in the follow-up.
internal/pipeline:
* Type / NewType / MustNewType with the OTel-v0.152.0 component-type
regex copied verbatim. NewType returns an error; MustNewType panics
and is reserved for compile-time constants.
* ID = (Type, optional instance name). String() renders "kind" or
"kind/name" to match the YAML config form operators will write.
* Component lifecycle interface (Start, Shutdown). Doc comment names
the idempotency requirement explicitly so the M8-M16 receiver
authors don't have to re-derive it.
* Host interface with the deferrals from RFC-0003 written into the
godoc (GetFactory deferred; StatusEvent shape opaque until M2).
* TelemetrySettings carries *slog.Logger only at M1; Resource lands
with the pdata import in the next commit.
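The Type / ID rules above fit in a few lines of Go. This is a minimal sketch, not the shipped code: the regex is assumed to match OTel's component-type pattern (the tree copies the authoritative one verbatim), and the unexported field names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// typeRegexp is an assumed copy of OTel's component-type pattern: a letter,
// then up to 62 letters, digits, or underscores.
var typeRegexp = regexp.MustCompile(`^[a-zA-Z][0-9a-zA-Z_]{0,62}$`)

// Type is a validated component type such as "otlp".
type Type struct{ name string }

// NewType validates s and returns an error instead of panicking.
func NewType(s string) (Type, error) {
	if !typeRegexp.MatchString(s) {
		return Type{}, fmt.Errorf("invalid component type %q: must match %s", s, typeRegexp)
	}
	return Type{name: s}, nil
}

// MustNewType panics on invalid input; reserve it for compile-time constants.
func MustNewType(s string) Type {
	t, err := NewType(s)
	if err != nil {
		panic(err)
	}
	return t
}

func (t Type) String() string { return t.name }

// ID pairs a Type with an optional instance name.
type ID struct {
	typ  Type
	name string
}

// String renders "kind" or "kind/name", the form operators write in YAML.
func (id ID) String() string {
	if id.name == "" {
		return id.typ.String()
	}
	return id.typ.String() + "/" + id.name
}

func main() {
	fmt.Println(ID{typ: MustNewType("otlp")})                  // otlp
	fmt.Println(ID{typ: MustNewType("otlp"), name: "primary"}) // otlp/primary
	if _, err := NewType("1otlp"); err != nil {
		fmt.Println(err)
	}
}
```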
internal/safe:
* Call(ctx, opName, fn) wraps cgo / vendor SDK calls. Recovers
panics, races fn against ctx.Done, tags every returned error with
opName so operator-visible errors say `"DCGM_FieldGroupGetAll: ..."`
rather than `"vendor call: ..."`. Package doc is explicit that
SIGSEGV in cgo escapes recover() — that is the runtime's problem,
not this helper's.
* Tests cover success, error pass-through, panic recovery, deadline
bypass when fn ignores ctx, and the empty-opName guard.
`make ci` passes (vet + lint + race tests + reproducible build).
Assisted-by: Anthropic:claude-opus-4-7 [Claude Code]
Signed-off-by: Tri Lam <trilamsr@gmail.com>
Wires the pdata dependency declared in RFC-0003 and the contract
pieces it enables.
internal/consumer:
* Metrics / Traces / Logs push interfaces — one method each,
ConsumeX(ctx, x). Shape copied verbatim from
go.opentelemetry.io/collector/consumer at v1.58.0 so receivers
port one-to-one between OTel and tracecore.
* Doc comment names pdata's non-thread-safe contract and tells
fan-out callees to clone before crossing goroutines.
internal/pipeline:
* TelemetrySettings grows a Resource (pcommon.Resource) field.
* Receiver/Processor/Exporter factory interfaces with per-signal
Create methods. Receivers and processors take a next consumer.X;
exporters are pipeline leaves and take none.
* ErrSignalUnsupported sentinel for factories whose Component
doesn't implement a given signal — match with errors.Is so the
runtime can surface a clear operator-facing message.
* Config marker interface (Validate() error) the loader calls
before handing config to a factory.
* README documents the contract and points at RFC-0003 for the
rationale.
internal/pipeline/pipelinetest:
* Host stub with atomic call recording and an extension map that
GetExtensions copies before returning (mutating the returned map
must not corrupt host state).
* NewRuntime(t) returns a pre-wired Runtime with sensible defaults:
stub Host, discarding logger, empty Resource, and a CreateSettings
whose ID name is t.Name() for distinguishable log lines in parallel
tests.
* Compile-time assertion that *Host satisfies pipeline.Host.
pdata v1.58.0 lands in go.mod. Transitive footprint matches the
verification in RFC-0003 (bounded; uses the slim OTLP proto path,
no full gRPC).
`make ci` passes.
Assisted-by: Anthropic:claude-opus-4-7 [Claude Code]
Signed-off-by: Tri Lam <trilamsr@gmail.com>
Runtime owns a graph of pre-wired Components and exposes Start /
Shutdown. NewRuntime does not invoke factories; the caller assembles
each Pipeline with consumers already plumbed and hands the list in.
Start order: exporters → processors (last-stage first) → receivers.
Consumers are ready before producers; a Start failure stops the
sequence and the caller invokes Shutdown to unwind only the
Components that actually started.
Shutdown is two-phase:
* Phase 1 (≤1s): every Receiver Shutdown is invoked so ingest stops
fast. Deadline-exceeded receivers are abandoned and logged at WARN.
Matches STYLE.md §Concurrency — training nodes can't wait.
* Phase 2 (≤DrainBudget, default 10s, hard ceiling 30s): processors
and exporters Shutdown in LIFO of their start order so each tier
drains into a still-running downstream tier. Budget-exceeded
returns a sentinel error so the caller can return non-zero.
Empty-pipelines case logs `"no pipelines configured"` once and returns
nil — first of the M1 Operator-UX patterns to land in code.
`pipeline.Kind` enumerates the signal type (Metrics / Traces / Logs);
String() renders the YAML form operators write.
Tests cover start order, shutdown order (receivers first, non-receivers
LIFO), partial-Start unwind, double-Start guard, drain-budget clamp,
phase-2 deadline elapse, and the no-op Host substituted when
Settings.Host is nil.
`make ci` passes (vet, lint, race tests, reproducible build).
Assisted-by: Anthropic:claude-opus-4-7 [Claude Code]
Signed-off-by: Tri Lam <trilamsr@gmail.com>
internal/config loads the operator-facing collector YAML. It is structural-only at M1: per-component config bodies are stored as opaque yaml.Nodes for the matching factory to decode at runtime, which lands when receivers arrive in M8.
Operator-UX criterion #1 lands here:
* KnownFields(true) rejects unknown top-level keys with the line number from yaml.v3's TypeError, re-shaped as `path:line: <message>` so an editor can jump to it.
* Cross-section validation catches undefined component references from inside service.pipelines and pinpoints the offending name.
* Pipeline-key parsing (`<kind>` or `<kind>/<name>`) rejects unknown signal kinds with a message naming the bad token.
Empty file → zero-valued Config. This is required for the empty-pipeline boot UX criterion, which lands when cmd/tracecore is rewired.
Tests cover: empty file, minimal valid config, missing file, unknown top-level field with line number, unknown pipeline kind, undefined-receiver reference, plus a table-driven ParsePipelineID.
`make ci` passes.
Assisted-by: Anthropic:claude-opus-4-7 [Claude Code]
Signed-off-by: Tri Lam <trilamsr@gmail.com>
Factory registration uses the explicit-map pattern from STYLE.md
§Component registration. To let receiver work (M8-M16) land in
parallel without conflicting on a single import block, the map
itself is generated:
* components.yaml — the single source of truth for which
components the binary ships. Empty lists at M1; receiver PRs
add one entry each.
* tools/components-gen: text/template generator that reads the
manifest and writes cmd/tracecore/components.go. Output runs
through go/format (gofmt-canonical); `make generate` then applies
gofumpt on top.
* cmd/tracecore/components.go: generated file, committed. M1
contents are an empty Factories struct; `make ci` flags the file
if it ever drifts from the manifest.
* pipeline.Factories type — what the generated function returns.
Resolves YAML component keys to the right factory.
Makefile gains:
* `make generate` — re-runs the generator and gofumpts the result.
* `make generate-check` — fails if `make generate` would produce
a diff; wired into `make ci` so a stale components.go can't
sneak past review.
`make ci` passes.
Assisted-by: Anthropic:claude-opus-4-7 [Claude Code]
Signed-off-by: Tri Lam <trilamsr@gmail.com>
Replaces the stub runCollect with the load-config → build-pipelines → start-runtime → wait-for-signal → two-phase-shutdown path.
`tracecore collect --config=<empty.yaml>` now boots, logs `"no pipelines configured"` once, idles until SIGINT/SIGTERM, runs two-phase shutdown, and exits 0. This is the M1 Operator-UX criterion for empty-pipeline behavior, verified end-to-end:
time=... msg="tracecore starting" version=...
time=... msg="no pipelines configured"
time=... msg="tracecore running; waiting for signal"
time=... msg="shutdown signal received" drain_budget=10s
time=... msg="tracecore stopped cleanly"
New flags:
* `--config` (required; removes the foot-gun of a default Run with no config).
* `--shutdown.drain-budget` (operator knob for Phase 2, default 10s).
buildPipelines is honest about M1's state: with the empty Factories struct this build ships, only empty configs are accepted. A config that references a receiver/processor/exporter is rejected with a clear "no component factories registered; only an empty config is supported until receivers land (M8+)". The alternative would have been silently dropping configured components on the floor.
Exit codes: 0 ok, 1 runtime failure, 2 data error (bad config), 64 usage error. Of these, sysexits.h defines 0 (EX_OK) and 64 (EX_USAGE); its data-error code is 65 (EX_DATAERR), so 1 and 2 follow common CLI convention instead.
Tests cover: empty-config boot exits 0 and logs the right lines; bad config path returns 2; non-empty-config rejection; build-pipelines branches for empty / receiver-only / pipeline-ref / empty-service-block configs.
`make ci` passes.
Assisted-by: Anthropic:claude-opus-4-7 [Claude Code]
Signed-off-by: Tri Lam <trilamsr@gmail.com>
WrapFirstDataMetrics / Traces / Logs wrap a consumer.X so the first push through it emits a single `"pipeline first data"` log line:
msg="pipeline first data" pipeline=metrics/primary item_count=42
Subsequent pushes hit only `next`. sync.Once gives once-per-wrapper semantics (each wrapped pipeline logs once for the life of the process) that survive concurrent first pushes, verified by a 32-goroutine race test.
The wrappers aren't used yet, because this build has no receivers, but they're the load-bearing building block for the M1 Operator-UX criterion. Receivers wrap their `next` consumer at construction time when they land (M8+), and operators get a no-external-tooling "the thing is alive and pumping" signal for free on every pipeline.
Tests cover: single-fire across N calls, three signals, concurrent N-goroutine first-push, and error pass-through from `next`.
`make ci` passes.
Assisted-by: Anthropic:claude-opus-4-7 [Claude Code]
Signed-off-by: Tri Lam <trilamsr@gmail.com>
5 fixes surfaced by line-by-line self-review:
1. Phase 2 shutdown was parallel + reverse-ordered slice, which is
theatre — concurrent goroutines have no ordering. Now serial-LIFO,
mirroring OTel Collector v0.152.0 service.Shutdown. Upstream stages
genuinely drain into a still-running downstream stage. Phase 1
(receivers) stays parallel — receivers have no inter-receiver
ordering dependency. New test
TestRuntime_PhaseTwoSerialLIFO_SkipsRemainingOnBudgetElapse pins
the contract: when the budget is exhausted, remaining Components
are skipped (not invoked then abandoned).
2. shutdownGroup had a race: when ctx.Done and the work-finished
channel fired at the same instant, it could append a spurious
"budget elapsed" sentinel even though every Shutdown succeeded.
Now a closed done channel is treated as authoritative when ctx fires.
3. config.Load discarded the *yaml.TypeError by joining its entries
into a plain string. Replaced with a LoadError struct that wraps
the underlying error (preserves the chain for errors.As) and
formats per-line on Error(). Test verifies errors.As reaches the
*yaml.TypeError after wrapping.
4. cmd/tracecore swallowed shutdown errors when Start failed —
`_ = rt.Shutdown(ctx)` hid any unwind problem. Now logged at WARN
so operators see the full picture, not just the initial failure.
5. component.go godoc said `pipelinetest.NewHost(t)`; the function
takes no args. Fixed.
Also dropped: the defensive copy in shutdownGroup's return path
(errs is a function-local slice; the copy didn't help correctness),
and the reverse() helper (now dead code).
`make ci` passes.
Assisted-by: Anthropic:claude-opus-4-7 [Claude Code]
Signed-off-by: Tri Lam <trilamsr@gmail.com>
Ten low-risk simplifications surfaced by the line-by-line review:
* CreateSettings: drop the speculative "keeps the door open for
future fields" rationale. If/when those fields arrive, the
motivation lives at that commit.
* Pipeline.Kind: drop the unused struct field. The ID encodes the
kind already (stringified as "metrics/primary"); the field was
referenced nowhere in Runtime or its consumers.
* pipelinetest.Runtime: drop the standalone Telemetry field — it
was a copy of CreateSettings.Telemetry, so mutating one didn't
update the other (foot-gun). Tests now access via
rt.CreateSettings.Telemetry.
* pipelinetest.NewRuntime godoc: drop the wrong "captures t for
future cleanup hooks" claim — t is only used for Helper() and
Name(), not stored anywhere.
* firstdata.go: extract a shared onceLog struct so the three
signal-specific wrappers each carry one embedded helper instead
of duplicating the (pipeline, logger, once) trio. ~30 lines
shorter; identical observable behaviour.
* tools/components-gen: replace three near-identical receiver /
processor / exporter loops with a single buildEntries helper.
* Makefile generate-check: replace shasum (Perl-dependent, varies
by distro) with `git diff --exit-code`. Simpler, and no Perl dependency.
* safe.Call: drop the empty-opName guard. Callers pass string
literals; runtime check on a compile-time discipline was noise.
Doc note added that emptiness isn't validated.
* config.Config.validate: stop threading the file path through —
Load wraps the validation error in a *LoadError with the path
once at the top, not via every callee's signature.
* Test rename: TestRunCollect_NonEmptyConfig_RejectsUntilM8 →
RejectsWithoutFactories. Test names that hard-code future
milestone numbers rot when scope shifts.
`make ci` passes.
Assisted-by: Anthropic:claude-opus-4-7 [Claude Code]
Signed-off-by: Tri Lam <trilamsr@gmail.com>
Seven naming / docstring / consistency tweaks:
* pipelinetest.Runtime → pipelinetest.Fixture; NewRuntime → New.
Two adjacent packages both having a "Runtime" type was confusing —
pipeline.Runtime is production lifecycle code, pipelinetest is
test scaffolding. Files renamed runtime.go → fixture.go in lockstep.
README, MILESTONES, and RFC text updated; RFC adds a one-line note
that the original draft used NewRuntime, since the audit trail
matters more than the original name.
* pipeline.NewID now validates the instance name against the same
regex as Type. Prevents slashes from leaking into ID.String(),
which would collide with the kind/name separator. Added MustNewID
for compile-time-known instance names. pipelinetest.New flattens
`/` in t.Name() to `_` so subtest IDs stay valid.
* Receiver / Processor / Exporter alias godocs: trim the false
claim of a "receiver-specific contract" that doesn't exist in the
interface. Honest one-line description instead.
* internal/consumer/*.go: drop the per-file thread-safety repeats;
the note lives in package doc only.
* DefaultDrainBudget rationale shortened from a paragraph to one
line. The numeric value is the load-bearing part.
* Trivial method docstrings on Type.String, Type.IsZero, ID.Kind,
ID.Name kept as one-liners. The revive linter insists on them, so
the middle ground is the shortest description that passes lint.
End-to-end smoke test confirmed empty-pipeline boot still works
after the rename:
msg="tracecore starting" config=/tmp/empty.yaml
msg="no pipelines configured"
msg="tracecore running; waiting for signal"
msg="shutdown signal received" drain_budget=10s
msg="tracecore stopped cleanly"
`make ci` passes.
Assisted-by: Anthropic:claude-opus-4-7 [Claude Code]
Signed-off-by: Tri Lam <trilamsr@gmail.com>
pipeline.Kind (the signal enum) and id.Kind() (returns the
Component's Type) were two unrelated meanings of "kind" living next
to each other. Renaming the enum to Signal removes the overload:
* pipeline.Kind → pipeline.Signal
* pipeline.KindMetrics → pipeline.SignalMetrics
* pipeline.KindTraces → pipeline.SignalTraces
* pipeline.KindLogs → pipeline.SignalLogs
* config.ParsePipelineID returns (pipeline.Signal, name, err) —
error message now reads `"unknown pipeline signal"` so operators
see consistent vocabulary.
id.Kind() keeps its existing name; "the component's kind" reads
naturally now that "kind" isn't also used for signal type.
`make ci` passes; test renamed accordingly.
Assisted-by: Anthropic:claude-opus-4-7 [Claude Code]
Signed-off-by: Tri Lam <trilamsr@gmail.com>
Eight smaller fixes from the round-2 self-review:
* cmd/tracecore.buildPipelines: replace the runtime "not yet
implemented" error with a panic. The factory-arm is unreachable
in this build (zero factories ship), so a future contributor who
adds a factory without extending buildPipelines hits a loud crash
in dev rather than emitting an error operators have to puzzle
over. Test pins the panic message so the guard is regression-
proof.
* cmd/tracecore.hasOperatorIntent (renamed from hasComponents):
a bare `service: { pipelines: { metrics/primary: {} } }` now
rejects with the same operator-facing message. Empty service
blocks (no pipeline declarations at all) still pass through.
* tools/components-gen: validate component type strings at gen-time
against the same regex pipeline.NewType uses. A bad string in
components.yaml now fails `make generate` instead of panicking
at binary startup via MustNewType.
* tools/components-gen: add a main_test.go covering buildEntries —
valid specs produce sane entries, bad types and missing packages
fail.
* pipelinetest.New: inline the single-use sanitizeName helper. The
`strings.ReplaceAll(t.Name(), "/", "_")` is more honest in-place.
* pipeline.Signal.String godoc: drop the stale "Used for log lines"
claim — the runtime doesn't log it. Only YAML form remains.
* config.LoadError godoc: clarify that yaml.v3 surfaces line
numbers only (no columns), so the rendered format is
`<path>:<line>: <msg>` rather than gcc/clang's
`<path>:<line>:<col>:`.
* Drop the "(M8+)" milestone literal from the operator-facing
"no factories registered" message. Phrased as "until the first
receiver factory is wired" so the message doesn't rot when M8
lands.
End-to-end smoke test confirmed: empty config boots clean (exit 0),
non-empty config rejects with the clearer message (exit 2).
`make ci` passes; tools/components-gen now has tests so every package
in the build has at least one.
Assisted-by: Anthropic:claude-opus-4-7 [Claude Code]
Signed-off-by: Tri Lam <trilamsr@gmail.com>
Seven fixes from a third self-review pass.
* safe.Call panic path doubled opName: the panic handler prefixed
with opName, then the receive case prefixed again, producing
"DCGM_X: DCGM_X panic: ...". Tests passed because Contains was
satisfied by either occurrence — the actual error string was
ugly. Panic handler now sends just "panic: <r>"; receive wraps
once. Test now uses Equal to pin the exact format so the
regression can't sneak back.
* tools/components-gen: imports were keyed by package path. A
receiver and a processor with the same package name (legal Go,
rare but possible) lost the first alias to the second and emitted
an unresolved name. Imports now tracked by alias; Go permits
aliased duplicate imports of the same package. Regression test
added.
* Pipeline godoc previously claimed receivers and exporters must
be in data-flow order. They're concurrent peers at runtime; only
processor order matters. Doc trimmed to reflect that.
* Three separate `const X = ...` declarations for the shutdown
timing constants merged into one block (gofumpt-canonical).
* Added TestRuntime_NilLogger_FallsBackToSlogDefault: pins the
nil-Logger substitution that NewRuntime does to avoid panics.
* Dropped the unused `order int32` field from runtime_test's event
struct, and the unused atomic.Int32 sequence counter on eventLog
— only label and kind are read anywhere.
* safe.Call signature dropped the unused named return.
`make ci` passes; smoke test confirms empty-pipeline boot still
exits 0.
Assisted-by: Anthropic:claude-opus-4-7 [Claude Code]
Signed-off-by: Tri Lam <trilamsr@gmail.com>
Document the M1 scaffold's deviations from the RFC's original design.
Future readers checking the RFC against the tree shouldn't have to
play spot-the-difference, but the original design history is worth
preserving — the deltas live in a new section rather than rewriting
the body.
Deviations recorded:
* consumer/ hoisted from internal/pipeline/ to internal/consumer/
(avoids import cycle when components/<role>/<name>/ imports it).
* safe/ hoisted similarly to internal/safe/.
* fanout.go deferred to M8 (no receivers exist yet to fan out from).
* pipelinetest.Runtime renamed to Fixture during M1 self-review to
avoid colliding with pipeline.Runtime.
* pipeline.Kind renamed Signal (collided with id.Kind() which returns
the component's Type).
* Phase 2 shutdown is serial-LIFO, not parallel — parallel was the
RFC's drafted shape but doesn't actually drain into a still-running
downstream tier.
* errgroup not used in Runtime.Shutdown. The "Verifications" table's
errgroup entry was a research finding, not a commitment to use it;
raw sync.WaitGroup + mutex-guarded error slice gives us multi-error
collection that errgroup.Group can't.
* safeCall is spelled safe.Call in shipped code.
* buildPipelines panics on the unreachable factory-arm rather than
returning a runtime "not implemented" error.
`make ci` passes (doc-only change).
Assisted-by: Anthropic:claude-opus-4-7 [Claude Code]
Signed-off-by: Tri Lam <trilamsr@gmail.com>
Step back from internal polish and pin the M1 contract — the surface
M8-M16 receiver authors and operators actually consume.
internal/pipeline/m1_contract_test.go:
* Package doc lists the nine M1 contract claims and points at where
each is pinned (this file + the unit tests scattered across the
tree). Future contributors get one place to see what M1 promises.
* TestM1Contract_ReceiverAuthorEndToEnd covers the story the unit tests leave under-tested.
Existing unit tests use stub components that just record calls;
this test wires the kind of types an M8 receiver author will
write (fakeReceiver implements Component + pushes via
consumer.Metrics; fakeProcessor reads + forwards; fakeExporter is
the leaf). The Runtime drives lifecycle, data flows stage→stage,
WrapFirstDataMetrics fires once across N pushes, two-phase
Shutdown returns clean.
Three contract claims tested together (#1 lifecycle, #2 data
flow, #3 first-data once-fire): a regression in any layer trips
this test even when the individual unit tests still pass.
Factory-based assembly (buildPipelines with non-empty Factories) is
deliberately not exercised — that path panics today and is the M8
deferred work. Receiver authors writing the first integration test
will use the hand-wired Pipeline shape this acceptance test pins.
Status: contract test passes against current code. The PR's scope
(M1 keystone + 5 UX patterns + codegen + main wiring) is validated.
`make ci` passes.
Assisted-by: Anthropic:claude-opus-4-7 [Claude Code]
Signed-off-by: Tri Lam <trilamsr@gmail.com>
Milestone numbers rot. The pipeline contract doesn't.
* internal/pipeline/m1_contract_test.go → contract_test.go
* TestM1Contract_ReceiverAuthorEndToEnd → TestContract_ReceiverAuthorEndToEnd
* Package doc reworded from "M1 contract claims" to "Contract claims".
* The inline comment about which path is "M8 deferred work" rephrased as "until the first receiver lands": same intent, no rotting label.
`make ci` passes.
Assisted-by: Anthropic:claude-opus-4-7 [Claude Code]
Signed-off-by: Tri Lam <trilamsr@gmail.com>
Five edge-case tests added to capture behaviors that surprised me on
re-read. Three describe behavior the code doesn't yet implement; two
pin current semantics so any future regression trips a test.
GREEN (current behavior matches desired):
* TestWrapFirstDataMetrics_EmptyPayload_StillLogs — "first data"
means first PUSH ATTEMPT, not first non-empty push. Empty payload
fires once with item_count=0. Pinned so a future "wait for non-zero"
refactor catches the semantic change.
* TestRuntime_SharedComponentAcrossPipelines_StartedTwice — same
Component instance in two Pipelines is Started twice, Shutdown
twice (Component contract is idempotent). Pinned so any future
dedup is a deliberate decision, not an accident.
RED, skipped with TODO (iterate next session):
* TestParsePipelineID_TrailingSlash_Rejected — `metrics/` parses
identically to `metrics`. Silent normalization hides operator
intent (typo vs. missing instance). Iterate: add a trailing-slash
check to ParsePipelineID.
* TestParsePipelineID_MultipleSlashes_Rejected — `metrics/primary/secondary`
parses with name="primary/secondary", which slips past validate
(NewID isn't constructed here) and would only fail at runtime in
the future builder. Iterate: reject at parse time so the error
surfaces at config load with operator-visible line context.
* TestBuildTmplData_DuplicateTypeWithinRole_Rejected — two entries
with the same type in the same role silently collapse to one (Go
map semantics). Operator's first entry becomes invisible. Iterate:
add a per-role `seenType` set in buildEntries so `make generate`
fails loudly.
`t.Skip("TODO: …")` keeps `make ci` green while preserving the
gap-pinning intent — skipped status with the TODO surfaces in test
output. Removing each `t.Skip` is the next iteration's red→green
step.
`make ci` passes (5 new tests run; 2 pass, 3 skip).
Assisted-by: Anthropic:claude-opus-4-7 [Claude Code]
Signed-off-by: Tri Lam <trilamsr@gmail.com>
Phase 1 — implement the 3 TODOs surfaced by the previous commit:
* ParsePipelineID rejects trailing slash. `metrics/` distinguished
from `metrics`; operator typos no longer silently normalized.
* ParsePipelineID rejects multi-slash forms. `metrics/primary/secondary`
fails at config load with a clear message instead of slipping past
validation to fail later at NewID construction.
* components-gen's buildEntries rejects duplicate types within a role.
Operator copy-paste no longer silently loses an entry to Go map
semantics in the generated file.
The three previously skipped tests drop their t.Skip; all green now.
Phase 2 — six more edge cases pinned:
* TestCall_RecoversPanicNil — Go 1.21+'s `panic(nil)` becomes a
*runtime.PanicNilError; verify our wrapper still produces a
readable "opName: panic: ..." rather than "opName: panic: <nil>".
* TestRuntime_ShutdownTwice_IsIdempotent — second Shutdown is a
no-op; Component.Shutdown is invoked exactly once even with two
Runtime.Shutdown calls.
* TestRuntime_ShutdownWithoutStart_IsNoOp — Shutdown before Start
returns nil cleanly (the path main.go's "shutdown after failed
Start" relies on).
* TestRuntime_TinyDrainBudget_SkipsAllPhase2 — DrainBudget so small
(1ns) the deadline fires before any Component.Shutdown call; the
serial loop bails via ctx.Err() check at the top.
* TestParsePipelineID_LeadingSlash_Rejected — `/primary` falls into
the "unknown signal" branch with signalStr = "".
* TestLoad_PathIsDirectory_ReturnsClearError — `--config=<dir>`
(operator typo: forgot the filename) produces an error mentioning
the path so the cause is operator-pinable.
All 9 tests (3 un-skipped, 6 new) green. `make ci` passes.
Assisted-by: Anthropic:claude-opus-4-7 [Claude Code]
Signed-off-by: Tri Lam <trilamsr@gmail.com>
Four parallel audit agents combed the M1 surfaces (Runtime, config, safe.Call+codegen, CLI). Acting on the highest-impact findings:
1. Concurrent Start+Shutdown race (runtime.go). Start released the lock between `started=true` and the per-startOne append. A concurrent Shutdown could clear startedReceivers/NonReceivers mid-Start, leaving late-appended Components running but untracked for teardown.
Fix: consolidate to one lifecycleMu held by Start for its full duration; Shutdown waits to acquire it so it observes the complete set. Contract: Component.Start MUST respect ctx, since a hung Start will block Shutdown.
New test: TestRuntime_ConcurrentStartShutdown_NoLostComponents (race-detector verified).
2. Component.Shutdown panic escapes (runtime.go). Neither shutdownGroup nor shutdownSerial wrapped the Shutdown call in recover(). A panicking exporter killed the process mid-teardown.
Fix: extract a safeShutdown helper used by both paths.
New test: TestRuntime_PanickingShutdown_RecoveredAsError.
3. Multi-document YAML silently discarded (config/load.go). yaml.Decoder reads one `---` block per call; trailing blocks were silently lost. Operators copy-pasting from OTel collector configs (which sometimes ship multi-doc) would lose data without warning.
Fix: a second Decode call after the first; success means a trailing doc exists → reject with a clear message.
New test: TestLoad_MultiDocumentYAML_Rejected.
4. Leading-digit pipeline instance name slipped past load-time validation (config/config.go + pipeline/id.go). `metrics/1primary` passed ParsePipelineID but would fail at NewID construction later.
Fix: expose pipeline.ValidateInstanceName and call it from ParsePipelineID so the error surfaces at config load.
New test: TestParsePipelineID_LeadingDigitInstanceName_Rejected.
5. runtime.Goexit inside fn hung safe.Call until ctx was done (safe/safe.go). recover() returns nil for Goexit; the defer fell through without sending to `done`; the outer select waited for ctx.Done.
Fix: track a normalReturn flag in the goroutine's defer; treat the no-panic-no-return case as an explicit error.
New test: TestCall_GoexitInFn_DoesNotHangUntilCtx.
Bonus fix: a nil Component in Pipeline slices now returns an error from startOne instead of nil-derefing on c.Start. Test added.
`make ci` passes (8 new tests; smoke test confirms the binary still boots with an empty config and shuts down cleanly on SIGTERM).
Assisted-by: Anthropic:claude-opus-4-7 [Claude Code]
Signed-off-by: Tri Lam <trilamsr@gmail.com>
docs/loops/m1.5-build-minimal-receiver.md is the prompt fed to
/ralph-loop to drive M1.5 iteratively. Phases 1-8 cover:
1. Read OTel v0.152.0 graph + test fixtures; write findings
2. Design RFC-0004 (clockreceiver + stdoutexporter)
3. Implement components/receivers/clockreceiver
4. Implement components/exporters/stdoutexporter
5. Real factory-based assembly in cmd/tracecore.buildPipelines
(replaces the unreachable-arm panic)
6. Integration test that spawns the binary end-to-end
7. Receiver-author quickstart in internal/pipeline/README.md
8. Verify + emit <promise>M1.5 COMPLETE</promise>
Each phase has a file-existence checkpoint so iterations resume
deterministically. Safety rails: no force-push, no history rewrite,
DCO sign-off on every commit, STOP on unexpected state or repeated
test failures.
Goal of M1.5: exercise the M1 contract end-to-end via factory-based
assembly, which the contract test deliberately doesn't cover. Until
the factory-arm runs, the M8 unblocker is theoretical.
Assisted-by: Anthropic:claude-opus-4-7 [Claude Code]
Signed-off-by: Tri Lam <trilamsr@gmail.com>
Original prompt had phases and safety rails but was light on what
"done well" looks like. Each iteration's Claude needs explicit
quality criteria, not just step lists.
Added:
* **North star** stated up front: every decision serves "will a
future M8-M16 receiver author find this surface ergonomic,
obvious, and hard to misuse?" Secondary stars: operator UX
first-class, code is read more than written, honest history.
* **Coding standards** section beyond "see STYLE.md": hard rules
(SPDX, gofumpt, no globals, errors.Is wrapping, ctx respect,
idempotent lifecycle) + style choices (WHY-not-WHAT comments,
operator-actionable errors, no half-finished implementations,
no premature abstractions, no backwards-compat shims).
* **Testing standards**: testify/require (never assert),
table-driven, t.Parallel by default, race detector, t.Context().
What a good test asserts: behavior not implementation, happy +
failure + edge, operator-visible error messages. Coverage
targets from STYLE.md.
* **Commit standards**: imperative subject ≤72 chars, body wrapped at 72,
DCO sign-off, Assisted-by trailer, never amend after push.
* **Decision-making heuristics** for ambiguity: re-read RFC, match
OTel v0.152.0, choose easier for receiver-author, surface
rather than guess.
* **Anti-patterns to actively avoid**: features beyond phase
scope, "while I'm here" refactors, comments restating code,
premature abstractions, silent error swallowing, mock-testing.
* **When you're stuck** algorithm: re-read, grep prior patterns,
leave TODO + WIP commit if truly stuck.
* **Per-phase quality bars** for each of the 8 phases:
- Phase 1 (OTel research): cite file:line, ≥300/≤1500 words
- Phase 2 (RFC-0004): operator YAML example, alternatives section
- Phase 3 (clockreceiver): canonical-example bar, copy-worthy
- Phase 4 (stdoutexporter): io.Writer injection, JSON-per-line
- Phase 5 (buildPipelines): every error names pipeline+component
- Phase 6 (integration test): ≤2s wall-clock, no time.Sleep
- Phase 7 (docs): code blocks valid Go, ≤300 words added
- Phase 8 (verify): full smoke + clean commits before promise
* **Anti-patterns flagged per phase** where specific to that phase.
* Three explicit **STOP conditions**: unexpected repo state,
repeated test failures (3+ attempts), undefined design decision.
Prompt is now 415 lines (was 200). Longer is fine for Ralph Loop —
the whole point is repeated context — and the additions are
scannable by section.
`make ci` passes (doc-only).
Assisted-by: Anthropic:claude-opus-4-7 [Claude Code]
Signed-off-by: Tri Lam <trilamsr@gmail.com>
…anout
Doing Phase 1 of the Ralph Loop manually so its findings inform the
prompt update for the remaining phases — better than letting the loop
discover surprises mid-iteration.
Three parallel agents read OTel v0.152.0 source at:
* service/internal/graph/ — how Components become a runnable graph
* service/internal/testcomponents/ — OTel's minimal-receiver pattern
* internal/fanoutconsumer/ — multi-exporter cloning logic
Synthesized into docs/research/otel-graph-notes.md (~280 lines). Key
findings that affect M1.5 design decisions:
* OTel ALWAYS inserts a fan-out node between processors and exporters,
even for single-exporter pipelines. Stable seam for capability
aggregation + multi-exporter.
* OTel builds factories in REVERSE topological order so each
factory.CreateX call has its `next consumer.X` already wired.
No two-phase construct-then-patch dance.
* consumer.Metrics/Traces/Logs need Capabilities() with MutatesData
bool. Without it, fan-out cloning is impossible and adding the
method later is breaking to every component.
* pdata's MarkReadOnly() / IsReadOnly() gates the "donate-to-last-
mutator" optimization. Wire it through.
* Topo sort + cycle detection is ~30 LOC and pays off the moment
multi-pipeline orchestration arrives.
* OTel's testcomponents pattern: package-var factory, &struct{}{}
default config, componentState mixin for lifecycle, and
consumer.ConsumeXFunc embedding as a shortcut for signal handling.
Their entire example_receiver.go is 108 lines.
The doc ends with 6 open questions for RFC-0004 + a list of patterns
to copy verbatim and patterns to deliberately not copy (multierr, zap,
errgroup in shutdown).
This satisfies the file-existence checkpoint for Phase 1 of
docs/loops/m1.5-build-minimal-receiver.md.
Assisted-by: Anthropic:claude-opus-4-7 [Claude Code]
Signed-off-by: Tri Lam <trilamsr@gmail.com>
The single load-bearing principle stated up front:
Tracecore is OpenTelemetry-Collector-compatible by default. Every
divergence is deliberate and documented.
Why: OTel's runtime is 5+ years of receiver-author and operator
feedback. Mirroring captures their R&D for free. Diverging costs us
and every contributor who has to learn our flavor.
Current accepted divergences listed in a table (slog over zap,
errors.Join over multierr, two-phase shutdown semantics, raw
WaitGroup over errgroup, pipelinetest.Fixture naming). Anything not
in the table should be identical to OTel or actively becoming so.
Covers nine areas:
* Adoption posture for M1.5 (Option C: cheap-to-add-now patterns
adopted; expensive-additive patterns deferred). Concrete lists
of what to adopt vs what to defer (Connector, exporterhelper,
multi-instance receivers, status reporting).
* Repo structure: single repo for collector core; vendor SDK
wrappers (pkg/dcgm etc.) move out-of-tree at integration time;
internal/ default; pkg/ requires accepted RFC.
* Public API discipline: stay pre-1.0 through M22+; lock 1.0 only
when ≥6 receivers ship alpha, ≥3 root-cause patterns are covered,
reproducible build is in place, SLSA Build Level 1 achieved.
* Testing tiers: unit (every PR), integration (every PR after
M1.5), hardware (nightly, gated), fuzz (PR-triggered + nightly),
benchmark (advisory, M5+).
* Receiver-author onboarding: clockreceiver is the canonical
example; don't extract patterns before the fourth receiver.
* AI collaboration patterns: codifies /learn, /loop, /ralph-loop,
/review usage. What AI is good at vs what it isn't.
* Release cadence: ad-hoc pre-M3, monthly post-M3 matching OTel.
* Community surface: defer Discord/forum/triage until M22.
* When this strategy is wrong: open an RFC, don't quietly diverge.
This is the doc future contributors and AI collaborators read first
to understand the why behind every code-level decision.
Assisted-by: Anthropic:claude-opus-4-7 [Claude Code]
Signed-off-by: Tri Lam <trilamsr@gmail.com>
Reflects the strategic decision in docs/STRATEGY.md and the Phase 1
findings in docs/research/otel-graph-notes.md: adopt the OTel patterns
that are cheap to add now and expensive to retrofit. Specifically:
Capabilities() on consumer interfaces, always-fanout node, MarkReadOnly
propagation, topo sort, componentstate mixin, factory-as-package-var.
Changes to the prompt:
* Required reading list adds docs/STRATEGY.md and
docs/research/otel-graph-notes.md as Phase-0 context.
* New "Option C adoption checklist" section enumerates the contract
additions M1.5 brings in. Each appears as a concrete requirement
in its phase.
* Phase 1 is marked DONE since the research notes are already
committed. Loop iterations skip it.
* Phase 2 (RFC-0004) scope expanded to cover:
- Capabilities + Capabilities struct (MutatesData bool)
- internal/fanout/ package shape
- componentstate mixin location and contract
- Topo sort decision (adopt or defer)
- Why the deferred entries in RFC-0003 are superseded
* NEW Phase 3 (contract additions): adds Capabilities() to
consumer.Metrics/Traces/Logs interfaces, updates every existing
implementation (firstdata.go + all test stubs), introduces
internal/pipeline/componentstate.go with the OTel mixin pattern.
Happens BEFORE any Component is written so the first one uses
them naturally.
* Phases 4 (clockreceiver) and 5 (stdoutexporter) refined to:
- Use the package-var factory style (var Factory = ...)
- Embed pipeline.ComponentState for lifecycle
- Declare Capabilities() returning {MutatesData: false}
- Anti-patterns list adds "reimplementing lifecycle instead of
embedding ComponentState"
* NEW Phase 6 (internal/fanout/): mirror OTel's fanoutconsumer
shape. Per-signal New constructors, mutable/readonly split,
donate-to-last-mutator optimization, single-readonly fast path.
Uses errors.Join (stdlib), not multierr — matches our
divergence-from-OTel table.
* Phase 7 (buildPipelines factory assembly) rewritten:
- Bottom-up wiring matching OTel's reverse-topological build
- Always wrap exporters in fanout.NewMetrics, even for 1 exporter
- WrapFirstDataMetrics at the LAST hop before processors
- Optional topo-sort step with cycle detection
- New test: fanout invoked even for single-exporter pipelines
* Phase 9 (receiver-author quickstart) doc updates point at the
new patterns: package-var factory, ComponentState embedding,
Capabilities() declaration.
* Phase numbers shift: 1-8 → 1-10 (Phase 1 done, two new phases
inserted). Promise unchanged: <promise>M1.5 COMPLETE</promise>.
Prompt is now 518 lines (was 415). Longer is fine — repeated context
is the whole point — and each phase has explicit quality bars +
anti-patterns.
`make ci` passes (doc-only change).
Assisted-by: Anthropic:claude-opus-4-7 [Claude Code]
Signed-off-by: Tri Lam <trilamsr@gmail.com>
Critical review pass on the 518-line prompt before launching. Real
concerns surfaced; addressed in this commit.
What changed:
* NEW "Loop hygiene" section near the top:
- One iteration = one phase = one commit (explicit rule)
- Phase-transition checklist (clean git status, ci green,
git log -1 to confirm prior phase, checkpoint holds)
- "Don't touch what you didn't break" — no scope creep
- No new failures or skips without TODO+issue ref
- Self-review sub-step at phase end (re-read quality bar,
check anti-patterns, scan git diff --stat)
- Token budget is NOT a constraint — read fully, don't skim
(per user direction; replaces an earlier read-budget hint)
- Promise tag hygiene — only in final output, never committed
inside files or code blocks
* NEW "When to stop the loop cleanly (BLOCKED state)" section:
- Instead of vague "STOP and surface", write
docs/loops/m1.5-LOOP_BLOCKED.md with phase, attempts, guess,
and what a human must decide
- Commit as `[m1.5] BLOCKED: <phase> <summary>`
- Emit <promise>BLOCKED</promise> so the loop halts with a
visible signal humans can find
- Lists 4 blocked conditions explicitly
* Phase 2 (RFC-0004) load-bearing addition:
- Quality bar now requires a "Phase deliverables" table
enumerating every file Phases 3-7 will create or modify
- If a later phase surprises us by needing a file the RFC
didn't list, the RFC was incomplete
- Explicit row for Phase 7: "remove
TestBuildPipelines_PanicsWhenFactoryRegisteredWithoutAssembly"
is critical — that test pins the panic Phase 7 removes
* Phase 3 (contract additions) exhaustive consumer-impl list:
- Run `grep -rlE "ConsumeMetrics|ConsumeTraces|ConsumeLogs"
--include='*.go' .` FIRST to find every impl
- Listed the baseline implementations (firstdata.go,
contract_test.go stubs, firstdata_test.go sinks)
- Explicit "stubComponent does NOT implement consumer.X —
leave it alone" to prevent over-eager edits
* Phase 7 (buildPipelines) much stronger:
- "Remove the panic-pinning test FIRST" before any other work
- Bottom-up wiring algorithm given as concrete pseudocode
(exporters → fanout → processors reversed → first-data
wrap → receivers); each step matches OTel's reverse-topo
build per docs/research/otel-graph-notes.md §1
* Phase 8 (integration test) clarified:
- Capture stdout AND stderr separately (stdoutexporter
writes to stdout; slog writes to stderr per STYLE.md)
- Assert lifecycle log lines on stderr ("tracecore starting",
"pipeline first data", "tracecore stopped cleanly") so the
operator-UX criteria are observable end-to-end
- On test failure, t.Logf BOTH stdout and stderr
- Poll/select instead of fixed time.Sleep for sync
* NEW "Common failure modes" section near the bottom — pattern-
match table for 7 likely failures:
- Capabilities method missing after Phase 3
- panic-pinning test still present after Phase 7
- generate-check fails (components.yaml edited without regen)
- integration test hangs (SIGTERM not honored)
- fanout clone count surprise
- bottom-up wiring nil consumer panic
- two iterations producing different RFC-0004 (stale state)
Prompt grew from 518 to 641 lines. Token cost is not a concern per
user direction; signal-to-noise of the prompt is.
Assisted-by: Anthropic:claude-opus-4-7 [Claude Code]
Signed-off-by: Tri Lam <trilamsr@gmail.com>
Implements Phase 2 of the M1.5 Ralph Loop. The RFC supersedes
RFC-0003's deferred entries for MutatesData and fan-out — both
adopted in M1.5 because they're cheap now and expensive to
retrofit across N receivers.
Contract additions specified:
* Capabilities{MutatesData bool} struct in internal/consumer/
* Capabilities() method on consumer.Metrics/Traces/Logs
* ComponentState mixin in internal/pipeline/ for receiver
authors to embed (lifecycle bookkeeping for free)
* internal/fanout/ package with per-signal New constructors
mirroring OTel's fanoutconsumer cloning strategy
* Topological sort deferred to M17 (when Connectors arrive);
M1.5's linear-pipeline shape doesn't need it
Components specified:
* clockreceiver — ticker-driven metric emitter, gauge
tracecore.clock.now. Reject <10ms intervals.
* stdoutexporter — JSON-line writer with injectable io.Writer
(testable without capturing global stdout).
buildPipelines bottom-up wiring algorithm specified:
exporters → fanout → processors-reversed → first-data wrap →
receivers. Mirrors OTel v0.152.0's reverse-topological build.
Phase deliverables table enumerates every file Phases 3-7 will
create or modify. Critical Phase 7 note: delete (not skip)
TestBuildPipelines_PanicsWhenFactoryRegisteredWithoutAssembly
since Phase 7 removes the panic it pins.
RFC-0003's Deferred Patterns table updated:
* MutatesData → superseded by RFC-0004 (adopted M1.5)
* Fan-out → superseded by RFC-0004 (adopted M1.5)
* Topological sort → added; deferred to M17
229 lines (≤300 quality bar). make ci passes (doc-only change).
Assisted-by: Anthropic:claude-opus-4-7 [Claude Code]
Signed-off-by: Tri Lam <trilamsr@gmail.com>
Phase 3 of the M1.5 Ralph Loop. Closes RFC-0004's contract-additions
section: the cheap-now / expensive-later patterns from OTel v0.152.0
land before any Component is written, so the first Component uses
them naturally.
internal/consumer:
* capabilities.go — Capabilities{MutatesData bool} struct. Mirrors
go.opentelemetry.io/collector/consumer.Capabilities. Doc explains
fan-out's cloning decision.
* metrics.go / traces.go / logs.go — each consumer interface gains
Capabilities() Capabilities. One-method addition; not a redesign.
internal/pipeline:
* componentstate.go — ComponentState mixin with mutex-guarded
started/stopped bools and default Start/Shutdown methods.
Exported so M8+ receiver authors embed directly. Mirrors OTel's
service/internal/testcomponents/stateful_component.go.
* componentstate_test.go — 6 tests covering zero-value-usable,
Start-flips-started, Shutdown-flips-stopped, both-flags-after-
sequence, concurrent-access (race-detector-pinned), and the
canonical "embedding satisfies Component" pattern.
* firstdata.go — firstDataMetrics/Traces/Logs gain Capabilities()
that transparently delegate to the wrapped next consumer. The
instrumentation wrapper doesn't change capability semantics.
Test stubs updated (matches RFC-0004 Phase 3 deliverables row):
* contract_test.go: fakeProcessor + fakeExporter get
Capabilities() returning zero-valued Capabilities.
* firstdata_test.go: metricsSink + tracesSink + logsSink get the
same. consumer import added.
No new failures or skips. make ci passes (race detector clean).
Assisted-by: Anthropic:claude-opus-4-7 [Claude Code]
Signed-off-by: Tri Lam <trilamsr@gmail.com>
Phase 4 of the M1.5 Ralph Loop. Implements the canonical example
receiver — the shape M8+ receiver authors will copy when writing
real components.
components/receivers/clockreceiver/ (6 files per STYLE.md):
* config.go — Config{Interval time.Duration} with Validate() that
rejects zero, negative, and <10ms intervals (rate-limit footgun).
* factory.go — package-var Factory = &factory{}. Type() returns
"clockreceiver". CreateMetrics constructs a real receiver;
CreateTraces and CreateLogs return pipeline.ErrSignalUnsupported.
* clockreceiver.go — embeds pipeline.ComponentState for lifecycle
bookkeeping. Custom Start spawns a ticker goroutine bound to an
internal ctx (not Start's short-lived authorize-ctx); Shutdown
cancels and waits with the caller's ctx as a safety net. Emits
one gauge per tick: tracecore.clock.now = current Unix time,
unit "s". Errors from downstream logged but don't stop the
ticker.
* clockreceiver_test.go — 8 tests covering:
- Config.Validate happy path + 3 reject paths
- Factory.Type() returns "clockreceiver"
- Factory.CreateDefaultConfig returns valid config (Interval=1s)
- Factory.CreateTraces / CreateLogs return ErrSignalUnsupported
- Receiver emits expected metric name within 500ms (no sleeps)
- Shutdown stops the ticker within 100ms (drains in-flight,
verifies silence for 100ms post-shutdown)
- ComponentState embedding exposes Started/Stopped accessors
* README.md — config table, supported signals matrix, limitations,
implementation notes pointing at the package-var factory and the
ComponentState embedding for M8+ authors.
* example_config.yaml — minimum working config.
Anti-patterns avoided per prompt: no time.Sleep in tests (channels
+ time.After in select); no hardcoded os.Stdout (exporter's job);
example_config.yaml present; ComponentState embedded not
reimplemented.
The receiver isn't useful in production — the metric value is
something any consumer can read from its own clock. The point is to
exercise the M1 contract end-to-end and provide a copy-worthy
example for M8.
make ci passes (race detector clean).
Assisted-by: Anthropic:claude-opus-4-7 [Claude Code]
Signed-off-by: Tri Lam <trilamsr@gmail.com>
Phase 5 of the M1.5 Ralph Loop. Implements the canonical example
exporter — pairs with clockreceiver to exercise the M1 pipeline
contract end-to-end and to act as the copy-worthy shape for real
exporters (M8+).
components/exporters/stdoutexporter/ (6 files per STYLE.md):
* config.go — Config{Out io.Writer `yaml:"-"`}. Out is exported
so tests inject a *bytes.Buffer but yaml-skipped so it is not
an operator-configurable knob. Validate() is a no-op.
* factory.go — package-var Factory = &factory{}. Type returns
"stdoutexporter". CreateMetrics returns a real Exporter;
CreateTraces and CreateLogs return pipeline.ErrSignalUnsupported.
CreateDefaultConfig sets Out: os.Stdout.
* stdoutexporter.go — embeds pipeline.ComponentState for lifecycle.
Capabilities() returns {MutatesData: false} so fan-out can share
a read-only payload with us instead of cloning. ConsumeMetrics
marshals via pmetric.JSONMarshaler (OTLP/JSON), writes one line
+ newline. Empty pmetric.Metrics produces zero output (no
empty-line noise). writeMu serializes concurrent writes so JSON
lines don't interleave.
* stdoutexporter_test.go — 9 tests covering:
- Config.Validate (no-op passes)
- Factory.Type returns "stdoutexporter"
- Factory.CreateDefaultConfig sets Out=os.Stdout
- Factory.CreateTraces / CreateLogs return ErrSignalUnsupported
- Exporter.Capabilities reports MutatesData=false
- One JSON line per ConsumeMetrics call (parseable)
- Empty metrics produce no output (documented behaviour)
- Concurrent writes don't interleave (writeMu pinning,
race-detector clean, errors channeled back from goroutines)
- ComponentState embedding exposes Started/Stopped accessors
* README.md — config note, signal-supported matrix, capability,
limitations, implementation notes pointing at the embedding +
writeMu pattern for M8+ authors.
* example_config.yaml — empty `stdoutexporter: {}` block.
Anti-patterns avoided per prompt: single-line JSON not multi-line
(via pmetric.JSONMarshaler); no hardcoded os.Stdout (injectable via
Config.Out); no buffering across calls.
make ci passes (race detector clean).
Assisted-by: Anthropic:claude-opus-4-7 [Claude Code]
Signed-off-by: Tri Lam <trilamsr@gmail.com>
…loning
Phase 6 of the M1.5 Ralph Loop. Mirrors OTel Collector v0.152.0's
internal/fanoutconsumer/ shape (per docs/research/otel-graph-notes.md
§3) so M8+ multi-exporter pipelines become additive rather than a
refactor.
internal/fanout/ (5 files):
* doc.go — package overview + cloning rules.
* metrics.go / traces.go / logs.go — one NewX constructor per
signal. Each:
- Fast-path returns the input unwrapped when there is exactly
one consumer and it does not mutate data.
- Otherwise partitions consumers into mutable / readonly slices
at construction time so the hot path skips the Capabilities
lookup.
- At each Consume<Signal>: clones for every mutator except the
last; last mutator gets the original when there are zero
readonly consumers AND the payload isn't already
MarkReadOnly'd. Readonly consumers share the original; if
there are ≥2 of them, MarkReadOnly is called first so
pdata's debug-time checks catch violators.
- Capabilities() reports MutatesData=true only when every
downstream mutates (no readonly behind us) — so upstream
may donate the payload to this fanout without cloning.
- Errors aggregated with errors.Join (stdlib), never multierr.
- Strictly serial in caller's goroutine (no goroutines).
* fanout_test.go — 10 tests covering:
- Metrics: fast-path returns the consumer unwrapped
- Single mutating consumer is wrapped so Capabilities surface
- All-readonly shares payload + MarkReadOnly when ≥2
- All-mutating: last gets donated original (verified by
post-fanout mutation visibility behavioural test)
- Read-only input forces a clone even for the last mutator
- Mixed: last mutator clones, readonly gets original
- Errors joined; every consumer invoked even on partial failure
- Traces fast-path + MarkReadOnly (representative)
- Logs fast-path + MarkReadOnly (representative)
The behavioural identity tests (mutate after fanout, observe which
recorder sees the change) replaced an earlier attempt at structural
equality — pmetric.Metrics values share state via internal pointers,
so testify's value-equality misled. Mutation-visibility is the
direct test of "got the original vs got a clone."
Anti-patterns avoided per RFC-0004: no goroutines (OTel doesn't,
this is hot-path), no multierr (depguard-banned + STRATEGY.md
divergence table).
make ci passes (race-detector clean).
Assisted-by: Anthropic:claude-opus-4-7 [Claude Code]
Signed-off-by: Tri Lam <trilamsr@gmail.com>
Phase 7 of the M1.5 Ralph Loop. The keystone M8 unblocker:
buildPipelines now does real bottom-up wiring instead of panicking
on a non-empty Factories struct.
Removed:
* TestBuildPipelines_PanicsWhenFactoryRegisteredWithoutAssembly —
that test pinned the panic Phase 7 removes, deleted entirely
per RFC-0004 Phase 7 deliverables row.
* The panic itself, in cmd/tracecore/main.go's buildPipelines.
Added (cmd/tracecore/main.go):
* Real buildPipelines that iterates cfg.Service.Pipelines (sorted
for deterministic order), parses each pipeline ID, dispatches
to a per-signal helper (buildMetricsPipeline, buildTracesPipeline,
buildLogsPipeline).
* Each per-signal helper follows the bottom-up algorithm from
RFC-0004:
1. Build exporters first via factory.Create<Signal>(ctx, set, cfg).
2. Wrap exporters in fanout.New<Signal> (always, even for one).
3. Build processors in reverse data-flow order, each receiving
the previous stage as `next`.
4. Wrap the receivers' next with pipeline.WrapFirstData<Signal>
so the first-data UX log line fires once per pipeline.
5. Build receivers, each with the wrapped consumer as `next`.
* splitName helper: parses "type" or "type/instance" component
references with operator-actionable errors on multi-slash or
invalid type chars.
* resolveComponent helper: bundles the per-component decode +
validate + ID + Settings work that every signal repeats.
* Errors at any step name the offending pipeline + component +
role for operator-actionable feedback.
* Type assertions from pipeline.Receiver/Processor/Exporter to
consumer.Metrics/Traces/Logs return clear errors rather than
panicking on contract violations.
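A sketch of the reference-parsing rule `splitName` is described as enforcing. The `typeRe` pattern is an assumption modelled on the component-type rule this PR says was copied from OTel (a letter first, then letters, digits, or underscores); treat the exact pattern and the error wording as illustrative, not as the real implementation.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// typeRe is an assumed component-type pattern; the real check lives in the
// pipeline package's Type constructor.
var typeRe = regexp.MustCompile(`^[a-zA-Z][0-9a-zA-Z_]{0,62}$`)

// splitName parses "type" or "type/instance" into its two parts.
func splitName(ref string) (string, string, error) {
	typ, name, _ := strings.Cut(ref, "/")
	if strings.Contains(name, "/") {
		return "", "", fmt.Errorf("component %q: expected \"type\" or \"type/name\", got more than one '/'", ref)
	}
	if !typeRe.MatchString(typ) {
		return "", "", fmt.Errorf("component %q: invalid type %q: must match %s", ref, typ, typeRe)
	}
	return typ, name, nil
}

func main() {
	for _, ref := range []string{"clockreceiver", "clockreceiver/fast", "a/b/c", "9lives"} {
		typ, name, err := splitName(ref)
		fmt.Printf("%q -> type=%q name=%q err=%v\n", ref, typ, name, err)
	}
}
```

Each error quotes the full reference the operator wrote, so the offending YAML entry is easy to find.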
Updated:
* components.yaml: register clockreceiver and stdoutexporter.
* cmd/tracecore/components.go: regenerated by `make generate` to
pick up the manifest changes. Committed in the same commit per
the make generate-check workflow.
* components/receivers/clockreceiver/factory.go,
components/exporters/stdoutexporter/factory.go: add a thin
NewFactory() function returning the package-var Factory.
Required by components-gen, which generates calls like
`clockreceiver.NewFactory()`. Receiver/exporter authors should
still reference the package-var Factory directly; the function
exists only as the codegen entry point. This was an oversight
in RFC-0004's Phase 4/5 deliverables — surfaced during Phase 7
when the codegen output needed a callable, and the package-var
can't be called like a function. Documented inline; the RFC's
"Implementation notes" gets an update in Phase 9.
* TestRunCollect_NonEmptyConfig_RejectsWithoutFactories renamed
to TestRunCollect_UnknownComponentType_OperatorActionableError:
with factories now registered, the error message is "unknown
receiver type" instead of "no factories registered". Body
asserts the clearer message and references the pipeline name.
New tests (factory-assembly):
* TestBuildPipelines_ValidConfig_BuildsPipelines: clockreceiver →
stdoutexporter pipeline assembles cleanly.
* TestBuildPipelines_UnknownReceiverType_OperatorActionableError:
"ghost" receiver errors with pipeline + type names.
* TestBuildPipelines_BadPerComponentConfig_FieldName: interval=1ms
surfaces clockreceiver.Validate's "interval" field name.
* TestBuildPipelines_SignalMismatch_ErrSignalUnsupported:
clockreceiver under a traces pipeline returns the sentinel.
* TestBuildPipelines_UndeclaredComponentRef: pipeline references
a name not in the top-level section.
Existing buildPipelines tests updated for new (ctx, logger, cfg,
factories) signature; discardLogger + parseConfig helpers added.
Anti-patterns avoided per prompt: every error message names the
offending pipeline + component; no silent dropping; the unreachable
panic is gone; existing TestRunCollect_*, TestBuildPipelines_*, and
TestContract_* tests still pass.
vet/lint/test all pass (race-detector clean). make ci will pass
after this commit lands: generate-check fails pre-commit while
components.yaml and components.go both have uncommitted changes;
post-commit they match HEAD and the check is a no-op.
Assisted-by: Anthropic:claude-opus-4-7 [Claude Code]
Signed-off-by: Tri Lam <trilamsr@gmail.com>
Phase 8 of the M1.5 Ralph Loop. The first test that actually runs
the binary end-to-end — verifying that the contract validated by
unit tests holds at the binary boundary too.
TestIntegration_ClockreceiverToStdoutexporter:
1. Builds the tracecore binary into t.TempDir() via `go build -o`.
Errors from go build include stderr so failures are debuggable.
2. Writes a config (clockreceiver every 100ms → stdoutexporter)
in t.TempDir(); runs the binary with --log.format=text so
stderr lifecycle lines are tail-able.
3. Captures stdout and stderr separately via a sync.Mutex-protected
buffer (bytes.Buffer alone races with exec.Cmd's writer goroutine
under -race).
4. Polls stdout at 25ms intervals until ≥3 metric lines appear,
bounded by a 1.5s scenario deadline. No fixed time.Sleep.
5. Sends syscall.SIGTERM via cmd.Process.Signal; cmd.Wait()s.
6. Asserts:
- cmd.ProcessState.ExitCode() == 0 (clean exit)
- ≥3 stdout lines, each valid JSON
- "tracecore.clock.now" appears in at least one line
- Stderr contains "tracecore starting", "pipeline first data",
and "tracecore stopped cleanly" — the operator-UX criteria
are observable end-to-end, not just inside unit tests
7. On test failure, t.Cleanup t.Logf's both stdout and stderr so
CI logs alone suffice to diagnose.
Quality bar per RFC-0004 / loop prompt Phase 8:
* Skipped on `go test -short` — building the binary takes ~2s; not
worth paying on every TDD iteration.
* No fixed time.Sleep — select+ticker polling with a deadline.
* Scenario wall-clock ~1s in practice (3-line threshold hit
quickly; SIGTERM + Wait fast); local test run ~1.07s.
* t.Context() is the outer safety net: it is canceled when the test
finishes, so exec.CommandContext kills the binary even if the test
fails early.
* Race-detector clean.
Two gosec false positives suppressed with targeted nolint:
binPath and cfgPath both come from t.TempDir() and are
fully test-controlled, not operator input.
make ci passes. `go test -race -run Integration ./cmd/tracecore/...`
exits 0 in ~1.1s.
Assisted-by: Anthropic:claude-opus-4-7 [Claude Code]
Signed-off-by: Tri Lam <trilamsr@gmail.com>
Phase 9 of the M1.5 Ralph Loop. Documents the contract from a
receiver author's perspective so M8+ contributors can write a real
receiver without trial-and-error against the codebase.
internal/pipeline/README.md changes:
* NEW "Writing a receiver" section (228 words, ≤300 cap):
- 6 files per STYLE.md §Component layout
- Factory pattern: package-var + NewFactory() bridge for codegen
- Lifecycle: embedding pipeline.ComponentState + override pattern
- Capabilities: default false, set true only for mutating processors
- Testing: pipelinetest.New(t) returning Fixture
- Registration: components.yaml + make generate workflow
- Three pitfalls, each tied to a specific clockreceiver.go line:
no globals + TelemetrySettings.Logger (line 45)
ctx-respecting goroutines (line 111)
idempotent Shutdown (line 80)
* "What's here" updated to reflect post-Phase-7 layout
(componentstate.go, firstdata.go, runtime.go now listed;
pipelinetest description fixed: New(t) returning Fixture,
not NewRuntime(t)).
* Sibling packages list adds internal/fanout.
* "Deferred from M1" updated:
- MutatesData + fan-out removed (adopted in M1.5 per RFC-0004)
- Footnote referencing RFC-0004 for the supersession
docs/rfcs/0003 implementation notes updated with two new bullets:
* Note that the buildPipelines panic was superseded in M1.5 by
RFC-0004 Phase 7 (real factory-based assembly).
* NewFactory() bridge — package-var Factory pattern needed a
thin NewFactory() function for components-gen to call. This
was an RFC-0004 Phase 4/5 oversight; surfaced during Phase 7
codegen integration. Documented retroactively here for future
contributors.
make ci passes (doc-only change).
Assisted-by: Anthropic:claude-opus-4-7 [Claude Code]
Signed-off-by: Tri Lam <trilamsr@gmail.com>
Signed-off-by: Tri Lam <trilamsr@gmail.com>
# Conflicts:
# Makefile
# go.sum
The /ralph-loop plugin writes /.claude/ralph-loop.local.md during a
run; it's cleaned up when the loop emits its completion promise but
lingers if the loop is cancelled mid-flight. Adding the pattern to
.gitignore so subsequent loop runs don't show as a dirty working
tree. The glob `/.claude/*.local.md` catches any future per-session
state files the same plugin family might write.
Assisted-by: Anthropic:claude-opus-4-7 [Claude Code]
Signed-off-by: Tri Lam <trilamsr@gmail.com>
The scoped logger built by resolveComponent (cmd/tracecore/main.go)
already attaches `component=<id>` to every log line. Three call sites
in clockreceiver.go were ALSO logging `pipeline=<r.id>` explicitly:
double-scoping AND mislabelling, since r.id is a Component ID, not a
pipeline ID.
Smoke-test output before:
    msg="clockreceiver started" component=clockreceiver pipeline=clockreceiver interval=100ms
After:
    msg="clockreceiver started" component=clockreceiver interval=100ms
Removed:
* line 71 (Start log)
* line 95 (Shutdown ctx-expired warn)
* line 137 (downstream-rejected warn)
The `r.id` field was used only by those log calls; removed from the
struct + constructor. The scoped logger remains the single source of
component identity in log output.
make ci passes; manual smoke test still produces clean lifecycle log
lines on stderr.
Assisted-by: Anthropic:claude-opus-4-7 [Claude Code]
Signed-off-by: Tri Lam <trilamsr@gmail.com>
Per STYLE.md §Changelog (Keep a Changelog format), every user-visible
change gets an [Unreleased] entry. M1 + M1.5 introduced a substantial
surface; documenting before the PR lands so reviewers can scan the
changelog instead of the 30+ commits.
Added section covers:
* M1 keystone (internal/pipeline, Runtime, two-phase shutdown,
pipelinetest fixture, first-data wrappers)
* internal/consumer push interfaces with Capabilities()
* internal/safe.Call cgo wrapper
* internal/config YAML loader with line-numbered errors
* internal/fanout per-signal cloning helpers
* pipeline.ComponentState mixin
* tools/components-gen + components.yaml codegen
* components/receivers/clockreceiver (canonical example)
* components/exporters/stdoutexporter (canonical example)
* cmd/tracecore collect with factory-based assembly
* Five operator-UX patterns
* Integration + acceptance tests
* RFC-0003, RFC-0004, STRATEGY.md, OTel research notes,
receiver-author quickstart
Changed section:
* Makefile generate / generate-check targets; ci gates on both
* --config required, --shutdown.drain-budget flag added
* sysexits.h exit codes
Assisted-by: Anthropic:claude-opus-4-7 [Claude Code]
Signed-off-by: Tri Lam <trilamsr@gmail.com>
Three artifacts:
docs/loops/m1.5-build-minimal-receiver.md (modified): Added a STATUS banner at the top marking the loop COMPLETE with the commit range (c2747ff..09c422c) and a forward pointer to ROADMAP. Preserved as a historical record; do NOT re-run.
docs/loops/ROADMAP.md (new): Sequence of upcoming Ralph Loop work.
* Tier 1: m1.6 cleanup (this PR's follow-up loop).
* Tier 2: m2 self-telemetry.
* Tier 3: m3 reproducible-build, m4 lint+test harness, m4b failure injection.
* Tier 4: m8+ receivers.
Each loop is independently launchable; the file is the index.
docs/loops/m1.6-cleanup.md (new): The next loop's prompt. Six phases addressing the M1.5 self-review compromises:
1. Deduplicate 3-signal buildXPipeline functions
2. Deduplicate 3-signal fanout structs
3. Cache integration-test binary build
4. Fuzz targets for ParsePipelineID, Load, splitName
5. Decide NewFactory() bridge long-term
6. Verify + promise M1.6 COMPLETE
Follows the same loop-hygiene + BLOCKED-state contract M1.5 used.
Assisted-by: Anthropic:claude-opus-4-7 [Claude Code]
Signed-off-by: Tri Lam <trilamsr@gmail.com>
Six additions surfaced when auditing what I generate locally that
shouldn't ship:
* /coverage/: some tools dump coverage HTML into a directory
variant; the existing coverage.html / coverage.txt globs miss
this.
* *.pprof: `go test -cpuprofile=foo.pprof` and friends.
* __debug_bin*: GoLand writes these for delve-attached runs.
* /.claude/projects/: claude-mem's per-project local memory DB
(this run accumulated ~50 observations; personal, not shareable).
* /.claude/settings.local.json: local permission allowlist
(contains user-specific paths + deny rules).
* *.local.*: catch-all for `<thing>.local.<ext>` overrides
(direnv, .envrc.local, per-machine config patterns).
Together these prevent operator-private state from leaking into the
repo when contributors hit `git add .`.
Assisted-by: Anthropic:claude-opus-4-7 [Claude Code]
Signed-off-by: Tri Lam <trilamsr@gmail.com>
Personal workflow prompts for multi-phase tasks are scaffolding, not
project artifacts. Untracking the directory keeps individual
contributors free to write their own task prompts without committing
them; architectural decisions still flow through docs/rfcs/.
Changes:
* .gitignore: add /docs/loops/ to untracked patterns; tighten
/.claude/*.local.md to use the glob alone (drop the specific
plugin-state file mention).
* docs/STRATEGY.md: remove the AI-collaboration subsection that
named a specific workflow plugin as a "default for iteration".
The remaining /learn /loop /review /security-review bullets are
sufficient — those are tools we lean on; the meta-process for
multi-phase work doesn't need to be in the strategy doc.
* docs/rfcs/0004-clockreceiver-stdoutexporter.md: remove the
References entry pointing at the M1.5 task prompt (no longer
in-tree).
* docs/loops/* — untracked (git rm --cached); files remain on
disk for personal use, just not version-controlled.
No code changes; make ci passes (doc-only).
Assisted-by: Anthropic:claude-opus-4-7 [Claude Code]
Signed-off-by: Tri Lam <trilamsr@gmail.com>
trilamsr added a commit that referenced this pull request May 14, 2026
Closes the four actionable items from PR-13 v3 review:
A. TestIntegration_SignalDuringBoot retry loop:
- Track `signaled` bool; fail with a meaningful message if all
250 attempts return errors (was: misleading "process hung" via
the cmd.Wait deadline)
- Detect os.ErrProcessDone and break early (process exited
before we delivered the signal — nothing to wait for)
B. validate.go ctx-cancel exit-code comment:
- runValidate returns exitFailure on ctx-cancel; runCollect
returns exitOK. Inconsistency was intentional: validate's job
is to PROVE config validity; cancelled validate hasn't proved
anything and shouldn't claim success. Comment makes this
intentional asymmetry visible.
C. Pointer comment at top of signalops.go warning future
contributors away from adding the signalConsumer generic
constraint. The PR-13 v2 attempt dropped coverage 74%→43%
under -coverpkg; the comment + FOLLOWUPS entry should stop the
next person from rediscovering it.
D. New `make doc-check` (scripts/doc-check.sh): verifies every
Test/Fuzz/Benchmark identifier referenced in rot-prone docs
(docs/FAILURE-MODES.md + internal/pipeline/README.md)
corresponds to a real function in the test tree. Scoped tight
on purpose — RFCs intentionally reference deleted-test names
as historical record. Wired into `make ci` between govulncheck
and build.
Plus two minor v3 review tweaks:
- internal/pipelinebuilder/doc.go gains an "As of 2026-05" date so
future readers can judge staleness of its "where the tests live"
guidance (review item #4).
- FOLLOWUPS "considered and skipped" DRY-wrap entry now says
"Revisit if: never — this is a style commitment" (review #12).
Current state: doc-check finds 23 valid references across 2 docs;
all CI gates pass.
Signed-off-by: tree <tree@lumalabs.ai>
Assisted-by: Anthropic:claude-opus-4-7 [Claude Code]
Signed-off-by: Tri Lam <trilamsr@gmail.com>
trilamsr added a commit that referenced this pull request May 14, 2026
Address self-review items #5 (pkg-name confusion), #12 (MILESTONES glyph), #16+#17 (commit/scope discipline), #18 (telemetry pkg architecture intent), #19 (Resource auto-pop); all doc-only.
internal/telemetry/doc.go + internal/selftelemetry/doc.go: explicit "where do I add code" guidance with the conceptual split (process-level SURFACE vs per-component PRODUCER CONTRACT). Includes the architecture intent for future milestones: when tracing lands, the TracerProvider goes in `telemetry`, and the split stays signal-direction-driven (consumer/producer) rather than modality-driven (metrics/traces).
selftelemetry/doc.go also pins the cardinality contract explicitly (every label value must be low-cardinality; canonical Kind* constants exist for that purpose) and the Resource auto-population contract.
MILESTONES.md: new ☒ glyph for "policy-declined, not deferred." Applied to the /readyz SLO-threshold criterion: RFC-0006 explicitly chose degraded ≠ not-ready, which is a policy, not a missing feature.
FOLLOWUPS.md: new "M2 process lessons" section captures three honest process gaps from this milestone as input for the next loop's prompt:
- Scope-creep (Go-bump + CI-fix-deadlines shipped under [telemetry])
- Commit history discipline (25 commits incl. 13 review-fixes)
- Self-assessment optimism caught by gates four separate times
These lessons feed the next milestone's prompt; they are not corrective action for M2 (a retroactive split is blocked by the no-force-push rule).
Assisted-by: Anthropic:claude-opus-4-7 [Claude Code]
Signed-off-by: Tri Lam <trilamsr@gmail.com>
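One way to make the cardinality contract hold by construction is to let metric label values come only from a closed set of constants. A minimal sketch, assuming a `Kind` string type; the constant names, `labelValue`, and the `other` fallback are hypothetical and not the actual `selftelemetry` API.

```go
package main

import "fmt"

// Kind is a low-cardinality label value. Only the constants below should
// reach a metric label; free-form strings (paths, IDs, error text) must not.
type Kind string

// Hypothetical canonical kinds; the real Kind* set lives in selftelemetry.
const (
	KindTimeout   Kind = "timeout"
	KindRejected  Kind = "rejected"
	KindMalformed Kind = "malformed"
	KindOther     Kind = "other"
)

var knownKinds = map[Kind]bool{
	KindTimeout: true, KindRejected: true, KindMalformed: true, KindOther: true,
}

// labelValue clamps any Kind to the canonical set, so a stray value cannot
// create a new time series per distinct string.
func labelValue(k Kind) string {
	if knownKinds[k] {
		return string(k)
	}
	return string(KindOther)
}

func main() {
	fmt.Println(labelValue(KindTimeout), labelValue(Kind("/var/run/socket-1234")))
}
```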
trilamsr added a commit that referenced this pull request May 15, 2026
Closes the last two A+ ceiling items in scope.
Floor #12: SchemaURL link rot. TestSchemaURL_ResolvesInRepo reads source.go for the kernelEventsSchemaURL constant, extracts the /docs/schemas/... suffix, and asserts the file exists at the resolved repo path. A future SchemaURL bump that forgets to add the backing doc file fails the gate at PR time instead of on the first downstream 404.
Ceiling C1 / C2 / C9: docs/M9-AGRADE-GAP.md. Mirrors the M8 AGRADE-GAP shape so cross-receiver comparison is direct. Walks:
- Composite ≈ 4.35 across six rubric lenses (Correctness 4.5, Operator UX 4.4, Maintainability 4.5, Honesty 4.6, Precedent 4.0, Ownership 4.1). Above the A+ threshold (4.25); no lens below 4.0; no kill-switch failures.
- Floor table: 14 of 18 known bug classes fully pinned by M9 tests; 2 partial (live k8s, probabilistic TOCTOU) with honest disclosure; 2 N/A.
- Cross-receiver pattern audit: explicit table of what M9 inherits from M8, what M11+ inherits from M9, and where DCGM should backport (RUNBOOK Kind parity test + warnOnce gate).
- Deferred items table: each FOLLOWUPS row reproduced with its falsifiable trigger.
- Verification commands listed so the composite score is reproducible.
Self-rated A+, NOT externally ratified. C8 (independent reviewer score within ±0.2 of self-grade) remains process-gated.
Assisted-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Tri Lam <trilamsr@gmail.com>
trilamsr pushed a commit that referenced this pull request Jun 1, 2026
Eight 1-page pattern-design specs covering #2 IB link flap, #7 dataloader hang, #8 NCCL timeout no-HW, #9 NCCL bootstrap timeout, #10 CUDA OOM deceptive allocator, #11 checkpointer hang, #12 loss spike NaN, #13 silent data corruption. Each carries the standard detector-design shape (symptom, layers, signal sources, evaluation rule, verdict attrs, edge cases, status, open questions) so the next contributor can write a TDD red test directly off the spec. Status: all 8 marked planned. #10 already has issue #303; the spec frames the design alongside. NORTHSTARS Appendix A gains a Spec column; docs/README + patterns README link the new specs. Signed-off-by: Tri Lam <tri@maydow.com>
trilamsr added a commit that referenced this pull request Jun 4, 2026
## Summary

Document the intentional producer/SDK split between `module/pkg/patterns/verdict.go` (producer-side) and `module/sdk/verdict/verdict.go` (client SDK). PR #508 review plus several follow-up audits repeatedly asked whether the two files are duplicates; they are not. This PR puts the rationale at every touch-point so the next reviewer does not re-ask.

Changes:

- `module/sdk/verdict/README.md`: new section **"Why two `verdict.go` files?"** covering audience split, the JSON-schema binding contract (v1.0-rc1 cut-criterion #12), what goes where, versioning policy, and the **anti-pattern of "dedupe by deletion"**.
- `module/pkg/patterns/verdict.go`: 1-line godoc pointer → SDK README.
- `module/sdk/verdict/verdict.go`: 1-line godoc back-ref → SDK README.

No code changes. No schema changes. No public API changes.

## Why two files (TL;DR for reviewers)

| File | Audience | Holds |
| --- | --- | --- |
| `pkg/patterns/verdict.go` | Producers (detectors) | Per-pattern typed verdicts; producer-internal hooks free to evolve. |
| `sdk/verdict/verdict.go` | Consumers (alert routers, webhooks) | Single typed envelope subset; per-pattern extras via `Verdict.Extras`. |

Binding contract: `docs/schemas/verdict-1.0.0-rc1.json`. Collapsing them would (a) couple producer-internal evolution to SDK release cadence, (b) erase typed pattern-specific fields detectors unit-test against, and (c) regress producer type safety into the untyped extras bag.

## Test plan

- [x] `make doc-check`: clean (markdown links resolve, anchors resolve, no comment-noise drift).
- [x] `golangci-lint run ./...`: clean (commit-time hook).
- [x] `go vet ./...`: clean.
- [x] `go mod verify`: clean.
- [x] Six-months-cold-reader test: README section + both godocs name the rationale at the touch-points, so the next reviewer asking "why two files?" hits the answer without searching.

```release-notes
docs: explain the intentional producer/SDK split between module/pkg/patterns/verdict.go and module/sdk/verdict/verdict.go, including the v1.0-rc1 schema binding contract and the anti-pattern of "dedupe by deletion".
```

Signed-off-by: Tri Lam <tree@lumalabs.ai>
What this PR does
M1 ships the pipeline runtime contract; M1.5 proves it works end-to-end with one canonical receiver, one canonical exporter, and an integration test that runs the binary.
| Area | What |
| --- | --- |
| M1 keystone (`internal/pipeline/`) | `Component`, `Host`, per-signal `Factory` interfaces; `Runtime` with two-phase shutdown (1s receivers + operator-configurable drain budget); `pipelinetest.New(t)` fixture; `WrapFirstData*` consumers; `ComponentState` mixin |
| Push consumers (`internal/consumer/`) | `Metrics`/`Traces`/`Logs` push interfaces with `Capabilities() Capabilities` (`MutatesData` flag) |
| Fan-out helpers (`internal/fanout/`) | `errors.Join` aggregation; mirrors OTel v0.152.0 `fanoutconsumer` |
| cgo wrapper (`internal/safe/`) | `panic(nil)` + `runtime.Goexit` handling |
| YAML loader (`internal/config/`) | |
| Codegen (`tools/components-gen/`, `components.yaml`) | `make generate` produces `cmd/tracecore/components.go`; `make generate-check` gates on freshness |
| Canonical receiver (`components/receivers/clockreceiver/`) | `tracecore.clock.now` gauge at configurable interval. M8+ receivers mirror this shape |
| Canonical exporter (`components/exporters/stdoutexporter/`) | `ConsumeMetrics` to a configurable `io.Writer` |
| CLI (`cmd/tracecore/`) | `tracecore collect --config=<path>` boots, builds pipelines bottom-up (exporters → fanout → processors reversed → first-data wrap → receivers), runs until SIGTERM/SIGINT, two-phase shutdown; sysexits.h exit codes |
| Operator-UX patterns | `safe.Call`; empty-pipeline boot logs once + idles; first-data log per pipeline; `pipelinetest.New(t)` fixture |
| Integration test (`cmd/tracecore/integration_test.go`) | |
| Contract tests (`internal/pipeline/contract_test.go`) | `Capabilities()`, fan-out, `ComponentState`, package-var factory |
| `docs/STRATEGY.md` | |
| `docs/research/otel-graph-notes.md` | `service/internal/graph` + `testcomponents` + `fanoutconsumer` |
| Receiver-author quickstart (`internal/pipeline/README.md`) | `ComponentState` embedding, `Capabilities`, `pipelinetest.New`, registration, pitfalls (each tied to a `clockreceiver.go` line) |
| `CHANGELOG.md` | `[Unreleased]` section documenting all user-visible additions and changes |
| `Makefile` | `generate` + `generate-check` targets; merged with `origin/main`'s `tidy-check`, `check`, `hooks` (PRs #10/#11) |
| `.gitignore` | `*.local.*`, pprof, GoLand debug bins |

Linked issue(s)
N/A — foundation work establishing the keystone M8+ receivers plug into.
Release notes
Checklist
- `contract_test.go`, and integration (`integration_test.go`)
- `make ci` passes locally
- `git commit -s`: DCO verified on all 41 commits
- `STYLE.md` §Component layout (clockreceiver + stdoutexporter each ship the 6 files)