diff --git a/docs/research/m15-container-stdout.md b/docs/research/m15-container-stdout.md
new file mode 100644
index 00000000..07662e91
--- /dev/null
+++ b/docs/research/m15-container-stdout.md
@@ -0,0 +1,2664 @@
+# M15 — Container-stdout receiver: research
+
+Synthesis of six parallel research passes for [`M15` in
+MILESTONES.md](../../MILESTONES.md#m15-container-stdout-receiver),
+performed 2026-05-19 against current upstream sources. Purpose: address
+the architectural knowledge gaps **before** any design or code work.
+
+This file is not a design doc. It is the evidence base for one. Open
+decisions are listed in §10. Rubric-edit asks are listed in §11.
+
+## 1. Decision sheet
+
+| Area | Finding | Decision implication |
+|---|---|---|
+| CRI on-disk format | Space-delimited `TS STREAM TAG BODY\n`, RFC3339Nano timestamp, `F`/`P` partial-line tags. Identical across containerd and CRI-O. | Parser is well-specified. Use the off-the-shelf parser, not a hand-rolled regex. |
+| Path topology | Runtime writes `/var/log/pods/<ns>_<pod>_<uid>/<container>/<restart>.log`. `/var/log/containers/*.log` is a kubelet-created symlink farm. | Tail `/var/log/pods/**/*.log` directly. Mount `/var/log` (parent) read-only, not just `/var/log/pods`. |
+| Rotation driver | Kubelet (not the runtime) rotates via `rename → ReopenContainerLog → gzip`. Timestamped suffix, not numeric. | Tailer must handle inode changes and a brief window where `0.log` is absent. |
+| Tailer strategy | OTel filelog uses **poll + fingerprint**, not `fsnotify`/`tail` libs. Both common Go tail libs (`nxadm/tail`, `hpcloud/tail`) are stale or unmaintained. | Depend on `pkg/stanza/fileconsumer` rather than rolling our own or pulling a stale dep. |
+| Build approach | Filelog receiver + `container` stanza operator already do glob, rotation, partial-line recombine, pod attribution from path. | **Depend** on filelog + container operator. Add tracecore features as downstream processors. |
+| Per-rank attribution | No upstream OTel component reads pod env vars. Filelog + `k8sattributes` only attributes by IP / UID / labels. | Net-new for tracecore. Read `Pod.spec.containers[].env` via informer, not by execing in containers. |
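+| SemConv `gen_ai.*` | Upstream `naming.md` recommends against (but does not forbid) reusing an existing SemConv namespace such as `gen_ai.*` for project-specific attributes. No `gen_ai.training.*` exists upstream. Active proposal puts training under `rl.*`. | **Contested; owner decision needed.** The earlier rename ask (`gen_ai.training.rank` → `tracecore.training.rank`) is withdrawn in favor of an O4-owner call. See §7 and §11. |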
+| Dataloader regex | Only torchvision and detectron2 emit `data_time` by default. Lightning / NeMo / HF Trainer / Composer do not. | Ship a multi-pattern default covering torchvision + detectron2. Document the rest as user-instrumentation territory. |
+| containerd #11149 | Open upstream bug, last updated 2025-05-30, marked Stale. Mechanism (per the 2025-01-22 reproducer in the issue): container stdout can be silently dropped when an in-container process reads from FD 1 (e.g. application self-tee, `cat /proc/1/fd/1`). Shared-pipe contention with containerd's log copier. Standard workloads that do not read FD 1 are unaffected. | Receiver README must enumerate the narrow failure mode rather than claim universal lossless delivery, but the practical surface is small. |
+
+## 2. CRI log format and path topology
+
+The wire format is defined in the [kubelet CRI logging design
+proposal](https://github.com/kubernetes/design-proposals-archive/blob/main/node/kubelet-cri-logging.md),
+not the CRI protobuf spec itself. The CRI proto only covers RPCs like
+`ReopenContainerLog`; the on-disk encoding is a kubelet convention each
+runtime implements.
+
+```
+2016-10-06T00:17:09.669794202Z stdout F The content of the log entry 1
+```
+
+- **Separator:** single ASCII space between the four fields; the body may
+  contain spaces, so parsers split on the first three spaces only.
+- **Timestamp:** `time.RFC3339Nano`, always UTC.
+- **Stream:** literal `stdout` or `stderr`.
+- **Tag:** single rune `F` (full) or `P` (partial), but the proposal
+  reserves comma-extension (`P,foo`); split tag field on `,` and look
+  for `F`/`P` rather than match the whole token.
+- **Partial-line reassembly:** runtime emits `P` lines until the final
+  segment is tagged `F`. Consumers concatenate consecutive same-stream
+  `P` lines with the trailing `F`. Containerd's stdout buffer is 16 KiB;
+  CRI-O via conmon is the same historically.
+- **Docker JSON format** (`{"log":...,"stream":...,"time":...}`) is
+  obsolete; dockershim was removed in Kubernetes 1.24. If we target
+  CRI runtimes only (containerd, CRI-O, cri-dockerd), assume CRI text.
+
+### Path topology
+
+The runtime writes directly to
+`/var/log/pods/<ns>_<pod>_<uid>/<container>/<restart>.log`.
+`/var/log/containers/<podName>_<podNamespace>_<containerName>-<containerID>.log`
+is a symlink kubelet creates via
+[`legacyLogSymlink` in `pkg/kubelet/kuberuntime/legacy.go`](https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kuberuntime/legacy.go),
+pointing at the real pod-log file. Both paths exist on every modern
+distro (GKE/COS, EKS AL2/Bottlerocket, kind, k3s, OpenShift).
+
+Note the **field-order inversion**: the pod-log directory is
+`<ns>_<pod>_<uid>` but the symlink filename is `<pod>_<ns>_<container>-<id>`.
+
+Pod and namespace names cannot contain `_` per
+[Kubernetes object names](https://kubernetes.io/docs/concepts/overview/working-with-objects/names/)
+(namespaces: RFC 1123 label; pods: DNS subdomain), so the underscore
+split is unambiguous. Defensively split on the **last two** underscores
+(UID first, then pod) so future CRDs that produce pod-like objects with
+relaxed naming don't break the parser silently.
+
+### Mount strategy
+
+Mount `/var/log` (the parent) read-only. Mounting only `/var/log/pods`
+breaks on OpenShift and Bottlerocket where `/var/log/pods` may be a
+symlink into `/var/lib/kubelet/...` and the resolved target lives
+elsewhere under `/var/log`. Read-only is sufficient: kubelet and the
+runtime own writes and rotation; we never write.
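+
+A minimal sketch of that mount, assuming the receiver runs as a
+DaemonSet. The container name, volume name, and image reference are
+illustrative placeholders, not taken from the tracecore manifests:
+
+```yaml
+# Pod template fragment (all names are illustrative).
+spec:
+  containers:
+    - name: tracecore
+      image: example.invalid/tracecore:dev   # placeholder image reference
+      volumeMounts:
+        - name: varlog
+          mountPath: /var/log   # the parent, not just /var/log/pods
+          readOnly: true        # kubelet and the runtime own all writes
+  volumes:
+    - name: varlog
+      hostPath:
+        path: /var/log
+        type: Directory
+```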
+
+## 3. Rotation mechanics
+
+Driven by the kubelet `containerLogManager` goroutine
+(`pkg/kubelet/logs/container_log_manager.go`), not by the runtime. The
+runtime only responds to a CRI `ReopenContainerLog` RPC.
+
+Tunables: `containerLogMaxSize` (default `10Mi`), `containerLogMaxFiles`
+(default 5), `containerLogMaxWorkers`, `containerLogMonitorInterval`.
+Trigger is **size-only**; no time or line-count thresholds.
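+
+These tunables live in the kubelet config file. A sketch, assuming the
+`kubelet.config.k8s.io/v1beta1` `KubeletConfiguration` schema; the size
+and file count match the defaults quoted above, while the worker count
+and monitor interval are example values to check against the cluster's
+Kubernetes version:
+
+```yaml
+apiVersion: kubelet.config.k8s.io/v1beta1
+kind: KubeletConfiguration
+containerLogMaxSize: 10Mi          # size-only rotation trigger
+containerLogMaxFiles: 5            # total files kept per container
+containerLogMaxWorkers: 1          # example value; concurrent rotation workers
+containerLogMonitorInterval: 10s   # example value; how often sizes are checked
+```
+
+Lowering `containerLogMaxSize` on a test node is one way to force
+frequent rotations when exercising the tailer.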
+
+Sequence on rotation:
+
+1. `rename("0.log", "0.log.20060102-150405")` — atomic on same FS; the
+   inode is now reachable only via the timestamped path. Any open `fd`
+   on the old name stays valid (POSIX semantics).
+2. Kubelet calls `ReopenContainerLog` over CRI.
+3. Runtime closes its old `fd` and opens a fresh `0.log` (new inode).
+4. Later, a separate `compressLog` step gzips older rotated files via a
+   `.tmp` intermediate, leaving `0.log.20060102-150405.gz`.
+
+Observable on disk in sequence:
+`0.log` (live) → `0.log.<ts>` (just rotated, plain) → `0.log.<ts>.gz`.
+
+Edge cases:
+
+- **Brief absence window.** Between rename and runtime reopen, `0.log`
+  does not exist. A tailer that treats "file missing" as fatal will
+  mis-handle this. Container writes are not lost during the window; the
+  runtime buffers through the shim until reopen.
+- **`MaxFiles=2`.** Retention math is `MaxFiles - 2 = 0` retained
+  rotated files. A slow tailer can lose the tail of the rotated file
+  before draining it.
+- **Compression race.** If the tailer is slow, the renamed plain file
+  becomes `.gz` mid-read. Either drain promptly or decompress.
+- **Fast writers.** Kubelet does not truncate; it only rotates. A
+  pod writing faster than the monitor interval can grow `0.log` past
+  `MaxSize` until the next tick. No drop at the kubelet layer.
+- **containerd #11149.** Container stdout can be silently dropped when
+  **anything inside the container reads from FD 1** (e.g. the
+  application tees its own stdout, or another in-container process
+  `cat /proc/1/fd/1`). Reproducer in the issue (2025-01-22): writing
+  100 lines from nginx with an in-container `cat /proc/1/fd/1 | tee` —
+  the container's tee saw 90 lines, the kubelet log file saw only 10.
+  Without the in-container reader, all 100 lines reached the log.
+  The mechanism is shared-pipe contention between the in-container
+  reader and containerd's log copier, not generic disk-I/O backpressure.
+  Open, marked Stale, last updated 2025-05-30, no PR. Narrow failure
+  mode: standard workloads that don't read their own FD 1 are
+  unaffected. We still cannot claim universal lossless delivery, but
+  the practical reliability surface is much smaller than a generic
+  "0.log is lossy" framing would suggest.
+
+## 4. Tailer strategy
+
+OTel filelog uses neither `fsnotify` nor a `tail`-style library. Its
+[`pkg/stanza/fileconsumer`](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/pkg/stanza/fileconsumer)
+is a **poll-based reader** that scans the glob each `poll_interval`
+(default 200 ms), fingerprints file-head bytes to detect rotation
+(handles both move/create and copy/truncate), and persists byte offsets
+via the `storage` extension. It uses no OS-level file notification
+(inotify directly or via cgo), so it works identically on Linux, macOS,
+and Windows.
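+
+A hedged sketch of how those knobs surface in collector config,
+assuming the contrib `file_storage` extension backs the `storage`
+setting. The storage directory is an assumed path, and the
+`fingerprint_size` shown is believed to be the default but should be
+checked against the collector version in use:
+
+```yaml
+extensions:
+  file_storage:
+    directory: /var/lib/otelcol/file_storage   # assumed writable path
+
+receivers:
+  filelog:
+    include: [/var/log/pods/*/*/*.log]
+    poll_interval: 200ms     # glob re-scan period
+    fingerprint_size: 1kb    # file-head bytes used for identity
+    storage: file_storage    # persist offsets across collector restarts
+    operators:
+      - type: container
+
+service:
+  extensions: [file_storage]
+  # pipelines omitted for brevity
+```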
+
+Go tail-library survey:
+
+| Library | Last commit | Inode-follow | Verdict |
+|---|---|---|---|
+| `github.com/nxadm/tail` | 2023-10 | `Config.ReOpen` works | Maintained but stale; ~400 stars |
+| `github.com/hpcloud/tail` | 2018 (abandoned) | Buggy edges | Don't use |
+| `github.com/fsnotify/fsnotify` | active | N/A (low-level) | Foundation if rolling our own |
+| `pkg/stanza/fileconsumer` (OTel) | active | Fingerprint-based | Production-grade, what filelog uses |
+
+If we depend on filelog (see §5), we inherit `fileconsumer` for free.
+Rolling our own on `fsnotify` only makes sense if we explicitly reject
+filelog; pulling `nxadm/tail` adds a stale dependency for no gain.
+
+**Rubric note.** [`MILESTONES.md` §M15](../../MILESTONES.md#m15-container-stdout-receiver)
+says the receiver "follows inode, not path". Poll-plus-fingerprint
+satisfies the *intent* (correctly track rotation) but does it by file
+identity (fingerprint hash) rather than by `fstat` inode number. The
+rubric phrasing predates this finding; the integration-test gate should
+verify "zero record loss across rotation", not specifically inode
+semantics. See §11.
+
+## 5. Build approach: depend on filelog + container operator
+
+Three options were evaluated: **depend** (import filelog as a Go
+dependency), **borrow** (fork parsing code), **rewrite** (own the
+stack).
+
+Recommendation: **depend**. Licensing is clean: Apache-2.0 on both sides.
+
+The [`container` stanza operator](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/pkg/stanza/docs/operators/container.md)
+already handles: auto-detection across docker/crio/containerd, CRI
+partial-line recombine (`P`/`F`, `max_log_size: 1MiB`), path parsing of
+`NAMESPACE_PODNAME_UID/CONTAINERNAME/RESTARTCOUNT.log`, and extraction
+of `k8s.pod.name`, `k8s.pod.uid`, `k8s.container.name`,
+`k8s.container.restart_count`, `k8s.namespace.name`, plus `time`,
+`logtag`, `log.iostream`.
+
+Minimal config that gets us most of the rubric for free:
+
+```yaml
+receivers:
+  filelog:
+    include: [/var/log/pods/*/*/*.log]
+    start_at: end
+    operators:
+      - type: container
+```
+
+The three tracecore differentiators all compose **downstream** of this:
+
+- **Per-rank attribution.** New processor reading
+  `Pod.spec.containers[*].env` via an informer, mapping configured
+  env names (e.g. `RANK`, `WORLD_SIZE`, `TORCHELASTIC_RUN_ID`) to
+  attributes keyed on `(k8s.pod.uid, k8s.container.name)`.
+  No upstream OTel-collector-contrib processor we surveyed does this
+  today;
+  [`resourcedetectionprocessor`'s `env` detector](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/resourcedetectionprocessor)
+  reads only `OTEL_RESOURCE_ATTRIBUTES`, not arbitrary env vars.
+  Components not surveyed include `k8sobjectsreceiver` (could be
+  configured to project pod-spec env into attributes), Vector's
+  `kubernetes_logs` source, and Datadog Agent autodiscovery; each
+  may approximate the behavior with different ergonomics. Confirm
+  that each of these is unsuitable before declaring this work greenfield.
+- **Dataloader regex tagging.** Either a `transform` processor with OTTL
+  statements, or a dedicated processor with a configurable regex map.
+  See §6 for the regex landscape.
+- **Rate limiting.** New per-key token-bucket processor keyed on
+  `(k8s.pod.uid, k8s.container.name)`. Filelog has none today; the
+  closest existing processors are `probabilisticsampler` and
+  `tailsampling`, neither of which solves per-key budgeting.
+
+Trade-offs accepted:
+
+- **Binary size.** Filelog + `pkg/stanza` adds dependency weight, but
+  tracecore's binary already ships an OTel-derived pipeline; the delta
+  is small relative to current footprint. Measure before claiming.
+- **API stability risk.** `pkg/stanza` Go API is less stable than the
+  YAML config surface. If we hit churn, the fallback is to wrap filelog
+  by config rather than by Go import. Lower risk if we only consume
+  filelog's receiver factory.
+
+## 6. Pod attribution and dataloader regex
+
+### Env-var landscape
+
+Six launchers surveyed; the canonical ground truth is **the in-process
+env at training-script time**, not the PodSpec.
+
+- **torchrun / torch.distributed.run** sets `RANK`, `LOCAL_RANK`,
+  `GROUP_RANK`, `ROLE_RANK`, `WORLD_SIZE`, `LOCAL_WORLD_SIZE`,
+  `ROLE_WORLD_SIZE`, `MASTER_ADDR`, `MASTER_PORT`,
+  `TORCHELASTIC_RESTART_COUNT`, `TORCHELASTIC_RUN_ID`. From
+  [`torch/distributed/run.py` lines 205-241](https://github.com/pytorch/pytorch/blob/main/torch/distributed/run.py):
+  `TORCHELASTIC_RUN_ID` is documented as "equal to the rendezvous
+  `run_id`" — stable only if the operator passes `--rdzv-id=$JOB_ID`.
+- **Kubeflow PyTorchJob v1** injects `RANK`, `WORLD_SIZE`, `MASTER_ADDR`,
+  `MASTER_PORT`, plus `PET_*` mirrors for torchelastic. Worker pods are
+  offset by +1 (Master is rank 0). PodSpec-level `RANK` is **wrong** as
+  soon as `nproc_per_node > 1`, because torchrun overwrites it
+  per-process. Confirmed at
+  [`pkg/controller.v1/pytorch/envvar.go` (release-1.9)](https://github.com/kubeflow/training-operator/blob/release-1.9/pkg/controller.v1/pytorch/envvar.go).
+- **torchx Kubernetes scheduler** sets only `TORCHX_RANK0_HOST` and
+  `TORCHX_IMAGE` at pod level; rank vars come from the inner `torchrun`.
+- **MPI Operator** sets nothing rank-like on the PodSpec.
+  `OMPI_COMM_WORLD_RANK` is set per-process by `orted` at spawn; it is
+  not visible in `Pod.spec.containers[].env`. Must read from
+  `/proc/<pid>/environ` or rely on the training script to log it.
+- **Ray Train** sets `RANK`/`WORLD_SIZE`/`LOCAL_RANK` etc. **inside the
+  actor process**, not on the pod. Mirrors the torchrun contract
+  deliberately ([`ray/train/torch/config.py` line 167](https://github.com/ray-project/ray/blob/master/python/ray/train/torch/config.py)).
+  Run-id via `RAY_JOB_ID` only when using KubeRay's `RayJob`.
+- **JobSet** sets no custom env vars. Attribution by pod labels
+  (`jobset.sigs.k8s.io/job-index`, `replicatedjob-name`) and the
+  standard `JOB_COMPLETION_INDEX` from kube's IndexedJob feature.
+
+**Implication for tracecore.** Reading `Pod.spec.containers[].env` via an
+informer gives us correct *node-level* attribution (job ID, replica index) but
+wrong *process-level* rank when `nproc_per_node > 1`. The robust
+attribution chain is:
+
+1. Pod metadata for: namespace, pod name, container name, node, job
+   labels.
+2. Pod env for: job ID / run ID (Kubeflow injects `TORCHELASTIC_RUN_ID`
+   when in elastic mode; vanilla pods can be configured via downward
+   API).
+3. Training-script log content (regex on `body`) for: per-process
+   `RANK` and `LOCAL_RANK`. Most training scripts print these on
+   startup (e.g., `[Rank 3] starting epoch 0`).
+
+Document this in the receiver README so operators understand why the
+rubric's "derives `RANK` from Pod env vars" is correct for the common
+case (1 process per pod) but degenerate for `nproc_per_node > 1`. See
+§10 open decision OD-2.
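+
+Step 2 of the chain above mentions the downward API for vanilla pods.
+A sketch of what that could look like on the training container,
+assuming the pod carries the `job-name` label that the Job controller
+sets. The variable name `TRAINING_JOB_ID` is hypothetical, not a
+tracecore or launcher convention:
+
+```yaml
+# Container-level env on the training pod (names are illustrative).
+env:
+  - name: NODE_NAME
+    valueFrom:
+      fieldRef:
+        fieldPath: spec.nodeName
+  - name: TRAINING_JOB_ID            # hypothetical variable name
+    valueFrom:
+      fieldRef:
+        fieldPath: metadata.labels['job-name']
+```
+
+Values injected this way are pod-level, so they are suitable for job
+and node identity but not for per-process rank.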
+
+### Dataloader log formats
+
+Only torchvision references and detectron2 emit per-step `data_time`
+out of the box. Surveyed in detail:
+
+| Framework | Per-step `data_time`? | Sample line |
+|---|---|---|
+| torchvision `MetricLogger` | Yes | `... time: 2.0156  data: 1.4523  max mem: 14523` |
+| detectron2 `CommonMetricPrinter` | Yes | `... time: 0.2540  ... data_time: 0.0084  ...` |
+| PyTorch Lightning | No (only SimpleProfiler post-train table) | `[_TrainingEpochLoop].train_dataloader_next | 0.012345 | 12.345` |
+| NVIDIA NeMo | No (only `train_step_timing in s=`) | `... train_step_timing in s=0.512]` |
+| HF `transformers.Trainer` | No (only end-of-train aggregates) | `'train_runtime': 1234.5678, 'train_steps_per_second': 8.04` |
+| MosaicML Composer | No (throughput only) | `throughput/batches_per_sec: 1234.567` |
+
+The default `dataloader_regex` should be a multi-pattern alternation
+matching torchvision and detectron2. For Lightning / NeMo / HF /
+Composer, the only path is user-side instrumentation (log a line with
+the same shape from the training script). Document in the receiver
+README that the **placeholder regex is a starting point**, not a
+silver bullet.
+
+Proposed default (covers torchvision and detectron2; validated against
+Go's `regexp` package on 2026-05-19):
+
+```regex
+\btime:\s+(?P<iter_time_s>\d+(?:\.\d+)?)\b.*?\b(?:data_time|data):\s+(?P<data_time_s>\d+(?:\.\d+)?)\b
+```
+
+**Validation:** the regex compiles under Go RE2 and produces correct
+captures across five test inputs. Test fixture:
+
+| Input | iter_time_s | data_time_s |
+|---|---|---|
+| `... time: 2.0156  data: 1.4523  max mem: 14523` (torchvision) | `2.0156` | `1.4523` |
+| `... time: 0.2540  last_time: 0.2491  data_time: 0.0084  ...` (detectron2) | `0.2540` | `0.0084` |
+| `dataloader_idle_time: 0.05  unrelated:0` (false-positive guard) | (no match) | (no match) |
+| `time: 2 data: 1` (integer seconds) | `2` | `1` |
+| `no match here` | (no match) | (no match) |
+
+Design notes:
+
+- **`\d+(?:\.\d+)?` accepts integer and decimal seconds.** Round-1's
+  `\d+\.\d+` rejected `time: 2 data: 1`; the new form is more
+  defensive.
+- **Alternation order is `data_time` first, then `data`.** Go's
+  `regexp` tries alternatives in the order written (leftmost-first),
+  so `data_time:` takes the longer branch. Because each branch must be
+  followed by `:`, `data` can never match as a prefix of `data_time:`.
+- **Trailing `\b` after each numeric capture anchors the end.** Round-2
+  used `(?=\s|$)` lookahead, which Go's RE2 engine does not support
+  (`error parsing regexp: invalid or unsupported Perl syntax: '(?='`).
+  `\b` is the RE2-compatible substitute: it matches between a digit
+  and any non-word character (space, comma, end-of-string,
+  punctuation), which is what we want at the end of a numeric token.
+- **`.*?` is intentionally non-greedy.** If a log line emits multiple
+  `time:`/`data:` pairs, the non-greedy match preferentially binds the
+  first pair, so iter and data are correctly paired.
+- **No multi-line concerns.** OTel `container` operator recombines
+  `P`/`F` partials into a single record before this regex sees the
+  body. The same holds under any BA-* build approach: CRI partial-line
+  reassembly happens before the regex.
+- **Go RE2 lookahead is unavailable.** Do not re-introduce `(?=…)` in
+  any tracecore-emitted regex; the regex must run unchanged in both
+  tracecore-native code and (under BA-1) OTel `ExtractPatterns`, both
+  of which use Go's stdlib `regexp` package.
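+
+As a sketch of the OTTL route, the default pattern could be wired into
+a `transform` processor roughly as follows. The processor instance name
+is illustrative, and the doubled backslashes assume OTTL string-literal
+escaping (the YAML single quotes pass them through untouched); verify
+both against the collector version in use:
+
+```yaml
+processors:
+  transform/dataloader:        # instance name is illustrative
+    error_mode: ignore
+    log_statements:
+      - context: log
+        statements:
+          # Merge the named captures into log attributes when the body is a string.
+          - 'merge_maps(attributes, ExtractPatterns(body, "\\btime:\\s+(?P<iter_time_s>\\d+(?:\\.\\d+)?)\\b.*?\\b(?:data_time|data):\\s+(?P<data_time_s>\\d+(?:\\.\\d+)?)\\b"), "upsert") where IsString(body)'
+```
+
+In this form the captures land as string attributes named after the
+capture groups (`iter_time_s`, `data_time_s`); renaming them into the
+chosen namespace and converting them to doubles would be separate
+statements.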
+
+The lit-GPT format in §13.7 is **a separate pattern**, not covered by
+this default. Multi-pattern support would need a config map
+(`framework_name → regex`), not regex alternation.
+
+## 7. SemConv namespace decision
+
+**Status: contested. The current M15 rubric uses `gen_ai.training.rank`
+deliberately as part of NORTHSTARS objective O4 (shepherd the
+`gen_ai.training.*` namespace into upstream SemConv before the
+ecosystem standardizes elsewhere). The naming.md "recommendation" cuts
+against it. This is a strategic bet, not an oversight. R-1 below is
+revised accordingly.**
+
+### 7.1 Evidence that `gen_ai.training.*` is a deliberate project goal
+
+[`NORTHSTARS.md`](../../NORTHSTARS.md) O4 row (line 38) names "Standards"
+as a top-level objective with `gen_ai.training.*` external
+implementations as the hero KPI. Line 202 states the goal verbatim:
+"author and shepherd the OpenTelemetry `gen_ai.training.*` semantic
+conventions through to wide adoption, before the ecosystem standardizes
+on someone else's vocabulary." The O4 commitments include:
+
+- First-draft PR filed on `open-telemetry/semantic-conventions` by M1
+  (M1 has shipped; PR status not verified in this research pass).
+- First merged `gen_ai.training.*` upstream PR by M6 (M6 in progress).
+- Tracecore receivers emit semconv attribute names: 100% per release.
+
+[`MILESTONES.md`](../../MILESTONES.md) line 29 reinforces this:
+"M7 is absent by design — OTel `gen_ai.training.*` semconv work lives
+in `open-telemetry/semantic-conventions`, not this repo (recurring
+cadence in NORTHSTARS.md O4)."
+
+So `gen_ai.training.rank` in M15's rubric is consistent with a stated
+project objective, not a naming mistake. The earlier framing in this
+section (round 1) missed this entirely.
+
+### 7.2 Evidence that the naming.md rule cuts the other way
+
+The
+[`docs/general/naming.md` rule](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/general/naming.md)
+states verbatim (re-verified 2026-05-19): "It is not recommended to use
+existing OpenTelemetry semantic convention namespace as a prefix for a
+new company- or application-specific attribute name. Doing so may
+result in a name clash in the future."
+
+This is a **recommendation**, not a hard prohibition. The risk it warns
+against is a name clash if upstream later defines the same attribute
+with different semantics. Tracecore's bet under O4 is that **tracecore
+is the upstream definer**, so the clash risk reduces to "the upstream
+PR fails to land or lands with different semantics."
+
+The `gen_ai.*` registry today is inference-only
+(`request.*`/`response.*`/`usage.*`/`agent.*`/`tool.*`/`operation.*`).
+Issue
+[semantic-conventions-genai #88](https://github.com/open-telemetry/semantic-conventions-genai/issues/88)
+proposes a sibling `rl.*` namespace for training, which is a competing
+proposal to O4's plan. Either could land first.
+
+### 7.3 Verdict and revised recommendation
+
+The decision is not "which namespace is technically correct" but "how
+much risk does tracecore want to carry that O4 doesn't land." Three
+postures, of which the first two are defensible within this doc's scope:
+
+| Posture | Emit | Risk |
+|---|---|---|
+| **Hold the bet** (status quo) | `gen_ai.training.*` | If `rl.*` lands first or `gen_ai.training.*` PR is rejected, we own a clash. Mitigation: collector-side rename via attributes processor at that point. |
+| **Hedge** | `tracecore.training.*` primary, `gen_ai.training.*` aliased | Doubles attribute cardinality. Reads heavier in dashboards. Operationally pre-pays the clash risk. |
+| **Concede** | `tracecore.training.*` only | Cleanest under naming.md. Abandons O4's namespace bet. Out of scope for a research doc to recommend. |
+
+This research pass cannot decide the bet; it requires sign-off from the
+O4 owner (per NORTHSTARS.md line 204, "OTel/semconv lead"). The
+research finding here is to **stop treating this as a research-resolved
+question**. The rubric edit framing in §11 R-1 (round-1's "rename to
+`tracecore.training.*`") is **withdrawn** in favor of recommending the
+owner make the call.
+
+If O4 status is "stalled or rejected" at design-doc time, the fallback
+to `tracecore.training.*` is well-evidenced. If O4 status is "draft PR
+open and tracking", hold the bet.
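+
+The collector-side rename named as the "hold the bet" mitigation could
+look like the sketch below, assuming the contrib `attributes`
+processor. Only `rank` is shown, and the processor instance name is
+illustrative:
+
+```yaml
+processors:
+  attributes/training_rename:   # instance name is illustrative
+    actions:
+      # Copy the value to the fallback name...
+      - key: tracecore.training.rank
+        from_attribute: gen_ai.training.rank
+        action: upsert
+      # ...then drop the contested name.
+      - key: gen_ai.training.rank
+        action: delete
+```
+
+Omitting the `delete` action would leave both names on each record,
+which approximates the "hedge" posture at the cost noted in the table.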
+
+This is a [`MILESTONES.md`](../../MILESTONES.md) rubric question with
+8 cross-receiver call sites (lines 29, 358, 360, 433, 453, 459, 460,
+481 — see §11 R-1). Any change must be a single cross-cutting PR, not
+an M15-scope edit.
+
+### 7.4 O4 status check: no upstream PR exists as of 2026-05-19
+
+Round-2 noted the M1-target first-draft PR's status was unverified.
+This pass verified directly via `gh search` against both upstream
+repos:
+
+- `open-telemetry/semantic-conventions`: zero PRs containing
+  `tracecore` (open or closed); zero PRs containing
+  `gen_ai.training` (open or closed).
+- `open-telemetry/semantic-conventions-genai` (the post-split repo
+  hosting `gen_ai.*`): zero PRs containing `tracecore`; one open PR
+  containing `training` (PR #172 by `renovate[bot]`, dependency update,
+  unrelated).
+
+**Verdict: the NORTHSTARS O4 "First-draft PR filed on
+`open-telemetry/semantic-conventions` by M1" commitment is not
+fulfilled.** M1 has shipped per MILESTONES.md. Either the PR was filed
+under different keywords this search missed (verify by re-running with
+the O4 owner's GitHub handle and the actual PR title if known), the PR
+exists in a different repo, or the commitment is overdue.
+
+**Implication for §7.3 verdict.** The "hold the bet" posture in §7.3
+assumed an active upstream effort. Without a filed PR, the bet is
+unfunded. The defensible postures now are:
+
+- **Hedge** (`tracecore.training.*` primary, `gen_ai.training.*`
+  aliased on emit): pre-pays the clash risk, doubles attribute
+  cardinality, but lets the receiver ship without depending on an
+  upstream effort that has not started.
+- **Concede** (`tracecore.training.*` only): cleanest under naming.md;
+  acknowledges O4 as dormant.
+- **Re-activate O4** (escalate to the O4 owner): file the draft PR
+  before M15 ships so the upstream effort is at least visible. If this
+  happens within the M15 design window, the original "hold the bet"
+  recommendation becomes defensible again.
+
+The doc cannot resolve which posture; the call must go to the project
+lead and the O4 owner. The negative finding above is the load-bearing
+input — it changes the strategic-bet framing from "we're in flight"
+to "we're not yet."
+
+### 7.5 Attribute table
+
+The attributes M15 would emit under the current rubric are scattered
+across §6, §11, §13.11, §14, §16; they are consolidated here for the
+O4 owner:
+
+| Attribute | Type | Cardinality | Source | Stability | Rationale |
+|---|---|---|---|---|---|
+| `k8s.namespace.name` | string | low (≤cluster namespaces) | path parse / `container` operator | stable (upstream SemConv) | Standard pod attribution |
+| `k8s.pod.name` | string | medium (pods/cluster) | path parse | stable (upstream SemConv) | Standard |
+| `k8s.pod.uid` | string | medium (pods/cluster lifetime) | path parse | stable (upstream SemConv) | Globally unique join key |
+| `k8s.container.name` | string | low (containers/pod) | path parse | stable (upstream SemConv) | Per-container disambiguation |
+| `k8s.container.restart_count` | int | low | path parse | stable (upstream SemConv) | Restart correlation |
+| `k8s.node.name` | string | low (nodes/cluster) | env via downward API | stable (upstream SemConv) | Per-node sharding |
+| `log.iostream` | string | 2 (stdout/stderr) | CRI parse | stable (upstream SemConv) | Stream attribution |
+| `log.file.path` | string | medium | tailer | stable (upstream SemConv) | Path traceability |
+| `tracecore.training.rank` | int | medium (≤world_size) | pod env via informer | tracecore-coined; contested namespace per §7 | Per-process join key |
+| `tracecore.training.world_size` | int | low (≤jobs × replicas) | pod env via informer | tracecore-coined | Cluster-size context |
+| `tracecore.training.local_rank` | int | low | pod env / body regex | tracecore-coined | Per-node process index |
+| `tracecore.training.job.id` | string | medium | pod env or label | tracecore-coined | Cross-restart correlation; UUID4 under torchrun `--standalone` per §13.13 |
+| `tracecore.training.data_time_s` | double, nullable | per-record | body regex (§6) | tracecore-coined; emitted only on dataloader-line match | M18 straggler input |
+| `tracecore.training.iter_time_s` | double, nullable | per-record | body regex (§6) | tracecore-coined; emitted only on dataloader-line match | M18 straggler input |
+| `tracecore.container.lines_per_s` | double | per-(pod_uid, container) per 15s | derived metric | tracecore-coined | Straggler-pattern feed; rate-derived |
+| `tracecore.dropped_lines` | int | per-record (sampled) | rate-limiter | tracecore-coined | Rate-limit observability |
+| `k8s.event.hint` (joined) | string | low | from k8sevents M10 record | tracecore-coined | Cross-receiver pattern input |
+
+**Cardinality budget:** the `tracecore.training.*` set is bounded by
+`(jobs × world_size)` cluster-wide, which for typical training
+workloads (≤32 jobs × ≤8192 ranks) caps the attribute-value
+cardinality at 32 × 8192 = 262,144 (256 Ki), within Prometheus's
+default scrape sanity limits. Document the budget in the receiver README.
+
+**Stability classes:**
+
+- **stable (upstream SemConv):** the attribute name and semantics are
+  fixed by upstream OpenTelemetry SemConv; tracecore must align.
+- **tracecore-coined:** the name was introduced by tracecore. Two
+  sub-classes under §7:
+  - `tracecore.training.*` attributes: the name is committed but the
+    namespace choice is contested per §7.4. If the bet ends as
+    "hedge" or "concede", the prefix changes; the semantics do not.
+  - `tracecore.container.*` and `tracecore.dropped_lines`: pure
+    receiver-specific telemetry, no upstream contention.
+
+### 7.6 Draft semantic-conventions YAML sketch (for the O4 owner)
+
+If the O4 effort is re-activated, the upstream PR shape would be a
+new file under `model/gen-ai/training/registry.yaml` in
+`open-telemetry/semantic-conventions-genai`. Sketch:
+
+```yaml
+groups:
+  - id: registry.gen_ai.training
+    type: attribute_group
+    display_name: Generative AI Training Attributes
+    brief: >
+      Describes attributes for distributed-training observability.
+      Applies to receivers and processors that emit per-process,
+      per-job training-loop telemetry from Kubernetes-orchestrated
+      jobs.
+    attributes:
+      - id: gen_ai.training.rank
+        type: int
+        stability: experimental
+        brief: >
+          Global rank of the worker within the training job
+          (0 ≤ rank < gen_ai.training.world_size). Sourced from
+          torchrun `RANK`, Ray Train's mirrored env, or Kubeflow
+          PyTorchJob worker offset (+1 for workers in v1).
+        examples: [0, 1, 7, 4095]
+      - id: gen_ai.training.world_size
+        type: int
+        stability: experimental
+        brief: Total number of workers in the training job.
+        examples: [8, 64, 4096]
+      - id: gen_ai.training.local_rank
+        type: int
+        stability: experimental
+        brief: >
+          Rank of the worker within its node. 0 ≤ local_rank
+          < local_world_size. Sourced from torchrun `LOCAL_RANK`.
+        examples: [0, 1, 7]
+      - id: gen_ai.training.job.id
+        type: string
+        stability: experimental
+        brief: >
+          Stable identifier for the training job across worker
+          restarts. Sourced from orchestrator (Kubeflow
+          `training.kubeflow.org/job-name`, JobSet
+          `jobset.sigs.k8s.io/jobset-uid`, Slurm `SLURM_JOB_ID`).
+          NOT `TORCHELASTIC_RUN_ID` in `--standalone` mode (random
+          UUID per launch; see implementation notes).
+        examples: ["pytorch-train-12345", "jobset-abc-uid"]
+      - id: gen_ai.training.data_time_s
+        type: double
+        stability: experimental
+        brief: >
+          Seconds the dataloader spent producing the batch consumed
+          by this training step. Emitted per-step when the training
+          framework logs a recognizable pattern (torchvision
+          MetricLogger, detectron2 CommonMetricPrinter).
+      - id: gen_ai.training.iter_time_s
+        type: double
+        stability: experimental
+        brief: >
+          Seconds for the full training iteration (forward + backward
+          + optimizer step). Pairs with data_time_s for straggler
+          detection (M18 in tracecore).
+```
+
+This is **a sketch, not a proposal text.** The actual PR would need:
+- Reference implementation link (tracecore as the first emitter).
+- Cross-references to existing `gen_ai.*` attribute groups.
+- A `stability: experimental` lifecycle commitment.
+- Engagement with the `rl.*` proposal (issue #88) for scope overlap.
+
+Filing this as a draft PR closes the §7.4 negative finding and
+re-enables the §7.3 "hold the bet" posture.
+
+## 8. Internal repo prior art
+
+[`components/receivers/k8sevents/`](../../components/receivers/k8sevents)
+is the closest template. Patterns below verified against source on
+2026-05-19 (not just relayed from sub-agent report).
+
+- **Lifecycle.** Embed `pipeline.ComponentState` and own a
+  `*lifecycle.Lifecycle` per independent source
+  (`components/receivers/k8sevents/receiver.go:61, 73`).
+- **Optional sub-sources.** k8sevents ships a `NodeConditions.Enabled`
+  toggle that registers an additional Node SharedInformer **with its
+  own typed record, channel, and degraded flag**
+  (`receiver.go:75, 99-101`; `config.go:90-99`). Pattern relevant for
+  M15's OD-1b Path B (informer-based env-var projection): ship the
+  pod-spec informer as an opt-in sub-source, not as a hard dependency,
+  so the receiver still works for operators using OD-1b Path A
+  (downward API).
+- **Config.** Public struct, YAML-stable field names, `Validate()` at
+  load time (not Start time). Ambiguity errors are exit-2 and name the
+  offending field (`config.go:18-80, 269-288`). Cap the internal
+  channel at `2^20` with an error message redirecting to a persistent
+  queue (`config.go:197-206`).
+- **Typed record export.** `record.go:20-69` defines `Record` and
+  `ObjectRef` structs; downstream consumers (M19 pattern detector)
+  import these for compile-time joins. Same approach for M15: export
+  a `containerstdout.Record` with the canonical attribute fields
+  including `tracecore.training.rank` once §11 lands.
+- **Self-telemetry.** `selftelemetry.Receiver` interface has five
+  methods (verified at source 2026-05-19,
+  `internal/selftelemetry/interface.go:193-237`):
+  `IncError(Kind)` (line 200), `IncEmissions(n int64)` (line 206),
+  `ObserveLatency(d time.Duration)` (line 218),
+  `SetDegraded(degraded bool)` (line 231),
+  `MarkActivity()` (line 237). Package-level
+  `selftelemetry.RecordInitError(ctx, mp, kind, id, reason)` handles
+  the noop-fallback case when MeterProvider is unavailable (used in
+  k8sevents' factory.go:57-58). Kinds are receiver-local typed
+  constants (e.g. `KindBackpressureDrop`, `KindCardinality`), never
+  raw error strings; the package documents this cardinality contract
+  but does not guard it at runtime.
+- **Degraded transitions.** Track `lastWatchSeen` atomically;
+  `SetDegraded(false)` fires exactly once per error→recovery edge
+  (`receiver.go:276-305`). Don't call it on every successful emit.
+- **RBAC pinning.** Hand-authored `rbac.yaml` per receiver; golden
+  file `rbac.can-i.golden` validated by Go test (`rbac_test.go`).
+  M15 needs a new ClusterRole granting pod-list/watch (and namespace
+  list for filtering) — see §10 OD-1.
+- **Backpressure vs downstream errors.** `KindBackpressureDrop` and
+  `KindDownstream` are distinct kinds with different remediations.
+  M15 should partition: rate-limit drops vs file-read errors vs pod
+  informer flap.
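+
+The degraded-transition rule above (emit `SetDegraded(false)` once per
+error→recovery edge, never on every successful emit) can be sketched
+as follows. This is a minimal illustration, not the k8sevents code:
+`degradedGauge` and `edgeTracker` are hypothetical names, and the
+`atomic.Bool` flag is an assumed stand-in for the `lastWatchSeen`
+bookkeeping.
+
+```go
+package main
+
+import (
+	"fmt"
+	"sync/atomic"
+)
+
+// degradedGauge is a hypothetical stand-in for the SetDegraded method of
+// selftelemetry.Receiver; it records each call so the edges are visible.
+type degradedGauge struct{ calls []bool }
+
+func (g *degradedGauge) SetDegraded(d bool) { g.calls = append(g.calls, d) }
+
+// edgeTracker forwards SetDegraded only when the state actually flips.
+type edgeTracker struct {
+	degraded atomic.Bool
+	sink     *degradedGauge
+}
+
+// OnError marks the source degraded; repeated errors are no-ops.
+func (e *edgeTracker) OnError() {
+	if e.degraded.CompareAndSwap(false, true) {
+		e.sink.SetDegraded(true)
+	}
+}
+
+// OnSuccess clears the flag once per error->recovery edge.
+func (e *edgeTracker) OnSuccess() {
+	if e.degraded.CompareAndSwap(true, false) {
+		e.sink.SetDegraded(false)
+	}
+}
+
+func main() {
+	g := &degradedGauge{}
+	tr := &edgeTracker{sink: g}
+	tr.OnSuccess() // no prior error: nothing emitted
+	tr.OnError()
+	tr.OnError()   // already degraded: nothing emitted
+	tr.OnSuccess() // recovery edge
+	tr.OnSuccess() // steady state: nothing emitted
+	fmt.Println(g.calls)
+}
+```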
+
+**Gaps with no in-tree helper:**
+
+- No file-tailer code anywhere (kernelevents reads journald + kmsg, not
+  files). M15 brings the first file-watch path; depending on filelog
+  inherits `pkg/stanza/fileconsumer` and removes the question.
+- No generic per-key token-bucket rate limiter.
+  [`internal/telemetry/windowed_rate.go`](../../internal/telemetry)
+  exists but is for self-telemetry, not data-plane rate limiting.
+  Bring in `golang.org/x/time/rate` with a per-key map and eviction.
+- No cursor/checkpoint helper. K8sevents is stateless. M15 needs to
+  persist byte offsets; if we depend on filelog, the `storage`
+  extension contract provides this — pick a backend (likely the
+  `file_storage` extension at a configurable path).
+- No NODE_NAME downward-API wiring in the chart.
+  [`install/kubernetes/tracecore/`](../../install/kubernetes/tracecore)
+  does not expose `NODE_NAME`; the nccl_fr example documents
+  `${POD_NAME}` / `${POD_UID}` as operator-provided. Per chart
+  convention, the tracecore config loader does not expand env vars —
+  the DaemonSet template must inject them via `fieldRef:` and the
+  receiver reads `os.Getenv("NODE_NAME")`.
+
+### 8.1 Typed-record contract for downstream consumers
+
+M18 (straggler pattern) and M19 (pod-evicted pattern) must consume M15's
+records via a compile-time-stable Go schema. The k8sevents
+`Record` / `ObjectRef` / `SchemaURLv0` pattern (verified at
+`components/receivers/k8sevents/record.go:20-107`) is the template.
+
+Sketch (Go pseudocode for the design phase; field names contingent on
+§7 namespace decision — `tracecore.training.*` if conceded, otherwise
+`gen_ai.training.*`):
+
+```go
+// Record is the typed projection of a single CRI log line plus
+// receiver-derived attribution. Exported for M18/M19 to import
+// without grepping plog.LogRecord attributes.
+type Record struct {
+    // CRI parse output (always populated).
+    Timestamp time.Time       // RFC3339Nano from CRI line
+    Stream    StreamType      // StreamStdout / StreamStderr
+    LogTag    LogTag          // LogTagFinal / LogTagPartial; recombined upstream
+    Body      string          // post-recombine; may be JSON-parsed if §6 auto-detect fires
+
+    // Kubernetes attribution (always populated; sourced from filepath).
+    Pod       PodRef          // namespace, name, uid, container, restartCount
+    NodeName  string          // from $NODE_NAME via downward API
+
+    // Training attribution (populated when discoverable; the sentinel
+    // values documented on TrainingRef mean "not discovered").
+    Training  TrainingRef     // see TrainingRef below
+
+    // Receiver-derived metrics (per-record samples; aggregated by the
+    // derived-metric path).
+    DroppedLines int64        // 0 unless this record is a rate-limiter sample
+
+    // Schema URL pin for version-gated joins (mirrors k8sevents pattern).
+    // Frozen via SchemaURLv0 constant; bumping is a deprecation hook.
+}
+
+type PodRef struct {
+    Namespace    string
+    Name         string
+    UID          string
+    Container    string
+    RestartCount int
+}
+
+// TrainingRef holds the per-process training attribution. A field the
+// receiver couldn't discover holds the sentinel noted beside it. Rank 0
+// is a valid value, so consumers must test WorldSize > 0 (not
+// Rank != 0) to decide whether Rank is meaningful.
+type TrainingRef struct {
+    Rank       int    // -1 if undiscovered, 0..N-1 otherwise
+    WorldSize  int    // 0 if undiscovered; >0 means Rank is meaningful
+    LocalRank  int    // -1 if undiscovered
+    JobID      string // empty if undiscovered
+
+    // Body-regex outputs (per-record; nullable via pointer).
+    DataTimeS *float64 // nil when no dataloader match on this record
+    IterTimeS *float64 // nil when no dataloader match on this record
+}
+
+type StreamType uint8
+const (
+    StreamUnknown StreamType = iota
+    StreamStdout
+    StreamStderr
+)
+
+type LogTag uint8
+const (
+    LogTagFinal LogTag = iota
+    LogTagPartial
+)
+
+// SchemaURLv0 freezes the v0 join schema for M18/M19 consumers.
+// Pattern follows components/receivers/k8sevents/record.go:100.
+const SchemaURLv0 = "https://tracecore.ai/schemas/containerstdout/v0"
+const SchemaURL   = SchemaURLv0
+```
+
+**Decisions baked into the sketch:**
+
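+- **Rank uses `-1` as the "undiscovered" sentinel, not `*int`.** Rank
+  0 is a valid global rank, and a nullable pointer would force every
+  consumer to nil-check and dereference. Consumers test
+  `WorldSize > 0` to decide whether Rank is meaningful rather than
+  comparing Rank itself; that check also rejects a zero-valued
+  `TrainingRef{}`, whose Rank reads as 0.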
+- **`DataTimeS` and `IterTimeS` ARE nullable pointers.** Most records
+  do not match the dataloader regex; emitting zero would be a false
+  signal. M18's input contract should accept "match present" / "match
+  absent" cleanly.
+- **`Body` post-recombine, not pre-recombine.** Under any BA-* the
+  CRI partial-line reassembly happens before this Record is emitted.
+- **`SchemaURLv0` frozen pattern.** Mirrors k8sevents
+  `record.go:90-107`'s explicit "do not redefine in terms of SchemaURL"
+  comment; future v1 bumps require a separate constant.
+
+**M18 join-key contract (load-bearing for §11 R-2):**
+
+M18's straggler detector requires the following attribute set co-present
+on a single record for the join to fire:
+
+- `Training.Rank` (valid: `WorldSize > 0 && Rank >= 0`)
+- `Training.WorldSize` (valid: `> 0`)
+- One of (`Training.DataTimeS`, `Training.IterTimeS`) non-nil
+- `Pod.UID` (always present)
+
+Records missing any of these are not M18 input; the receiver emits them
+anyway for log-fidelity reasons (per PRINCIPLES §1, never drop data
+just because one consumer wouldn't use it).
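+
+A minimal sketch of the M18 input predicate, assuming the §8.1 field
+names. The types are trimmed to the join-key fields, and `IsM18Input`
+is a hypothetical helper name, not an agreed API. The explicit
+`UID != ""` guard is defensive; the contract states `Pod.UID` is always
+present.
+
+```go
+package main
+
+import "fmt"
+
+// Trimmed copies of the §8.1 sketch types (join-key fields only).
+type PodRef struct{ UID string }
+
+type TrainingRef struct {
+	Rank      int // -1 if undiscovered
+	WorldSize int // 0 if undiscovered
+	DataTimeS *float64
+	IterTimeS *float64
+}
+
+type Record struct {
+	Pod      PodRef
+	Training TrainingRef
+}
+
+// IsM18Input reports whether r carries the full M18 join-key set.
+func IsM18Input(r Record) bool {
+	tr := r.Training
+	rankValid := tr.WorldSize > 0 && tr.Rank >= 0
+	hasTiming := tr.DataTimeS != nil || tr.IterTimeS != nil
+	return rankValid && hasTiming && r.Pod.UID != ""
+}
+
+func main() {
+	iter := 0.42
+	ok := Record{
+		Pod:      PodRef{UID: "uid-1"},
+		Training: TrainingRef{Rank: 0, WorldSize: 8, IterTimeS: &iter},
+	}
+	undiscovered := Record{
+		Pod:      PodRef{UID: "uid-2"},
+		Training: TrainingRef{Rank: -1, IterTimeS: &iter},
+	}
+	fmt.Println(IsM18Input(ok), IsM18Input(undiscovered))
+}
+```
+
+Records for which the predicate is false are still emitted; the
+predicate only decides whether M18 consumes them.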
+
+**M19 join-key contract:**
+
+M19's pod-evicted detector consumes M15's records as one of two
+inputs (the other is k8sevents' Record with `Hint == HintEvicted`).
+M19's join is:
+
+- `Pod.UID` (matches against k8sevents `Regarding.UID`)
+- `Timestamp` (within evictionMatchWindow of k8sevents `EventTime`)
+
+The receiver SLA for M19 (§17.4) is: **on pod-eviction event, the
+receiver MUST emit any pending records from that pod within `2 ×
+poll_interval` (default `400 ms`) of receiving the informer's
+deletion event.** This is enforced by the
+`TestContainerStdout_PodEvictionTailFlush` fixture.
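+
+The M19 join and the flush deadline reduce to two small checks. The
+sketch below assumes a symmetric match window around the eviction
+time; the document does not say whether `evictionMatchWindow` is
+one-sided, so treat that as an open assumption. `joinsEviction` and
+`flushDeadline` are hypothetical names.
+
+```go
+package main
+
+import (
+	"fmt"
+	"time"
+)
+
+// joinsEviction reports whether a log record and an eviction event refer
+// to the same pod and fall within window of each other (assumed symmetric).
+func joinsEviction(recUID string, recTS time.Time, evUID string, evTime time.Time, window time.Duration) bool {
+	if recUID == "" || recUID != evUID {
+		return false
+	}
+	d := recTS.Sub(evTime)
+	if d < 0 {
+		d = -d
+	}
+	return d <= window
+}
+
+// flushDeadline is the SLA bound: 2 x poll_interval after the informer's
+// deletion event.
+func flushDeadline(pollInterval time.Duration) time.Duration {
+	return 2 * pollInterval
+}
+
+func main() {
+	ev := time.Date(2026, 5, 19, 12, 0, 0, 0, time.UTC)
+	fmt.Println(joinsEviction("uid-1", ev.Add(-3*time.Second), "uid-1", ev, 5*time.Second))
+	fmt.Println(flushDeadline(200 * time.Millisecond))
+}
+```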
+
+### components.go is generated
+
+[`cmd/tracecore/components.go`](../../cmd/tracecore/components.go) is
+code-generated from
+[`cmd/tracecore/components.yaml`](../../cmd/tracecore/components.yaml)
+by [`tools/components-gen/main.go`](../../tools/components-gen/main.go).
+Sorted alphabetically. The receiver registration is a one-line edit to
+`components.yaml`, then `make generate`. Never hand-edit
+`components.go`.
+
+## 9. Chart and RBAC additions
+
+### 9.0 Operator-facing values.yaml schema
+
+The operator-visible configuration surface, derived from the rubric
+and the open decisions in §10. Field names are alpha-stable contracts;
+renames go through the §18.1 deprecation flow.
+
+```yaml
+receivers:
+  containerstdout:
+    enabled: false                        # Alpha; opt-in (§18.1)
+
+    # File discovery
+    include: ["/var/log/pods/*/*/*.log"]  # Default; rarely changed
+    namespaces: []                        # Allowlist (empty = all); §16.1
+
+    # CRI parser knobs (rubric lines 358-359)
+    max_log_size: 1MiB                    # Per partial-line cap
+    max_attributes: 16                    # JSON parse output cap
+
+    # Pod attribution (§13.4, OD-1b)
+    rank_source: downward_api             # downward_api | informer
+    process_rank_regex: "\\bRANK[=:]\\s*(\\d+)\\b"   # OD-2; user-override expected
+
+    # Dataloader timing (§6)
+    dataloader_regex: |
+      \btime:\s+(?P<iter_time_s>\d+(?:\.\d+)?)\b.*?\b(?:data_time|data):\s+(?P<data_time_s>\d+(?:\.\d+)?)\b
+
+    # Rate limiting (OD-3; §10.1 OD-3 expanded below)
+    egress_rate_limit:
+      rate: 200                           # lines/s per (pod_uid, container)
+      burst: 1000                         # token bucket depth
+      lru_cap: 8192                       # max tracked (pod_uid, container) keys
+      lru_evict_after: 5m                 # idle TTL before eviction
+      namespace_budgets: {}               # optional per-namespace sub-budget
+
+    # Cursor persistence (§13.5, R-6)
+    cursor:
+      dir: /var/lib/tracecore/container_stdout
+      fsync: true                         # BA-1 forwards to file_storage; BA-2/3 native
+
+    # Tailer tuning (§4)
+    poll_interval: 200ms
+    fingerprint_size: 1KiB
+    max_concurrent_files: 1024
+
+    # Compression (R-7; BA-1 only)
+    compression: auto                     # gzip | auto | ""
+```
+
+### 9.1 NODE_NAME downward-API injection
+
+M15 is the first tracecore receiver to require `NODE_NAME` via
+downward API. The DaemonSet template addition (sketch; the chart
+maintainer is the owner of the exact YAML):
+
+```yaml
+spec:
+  containers:
+    - name: tracecore
+      env:
+        - name: NODE_NAME
+          valueFrom:
+            fieldRef:
+              fieldPath: spec.nodeName
+        - name: POD_NAMESPACE
+          valueFrom:
+            fieldRef:
+              fieldPath: metadata.namespace
+        - name: POD_UID
+          valueFrom:
+            fieldRef:
+              fieldPath: metadata.uid
+```
+
+`POD_NAMESPACE` and `POD_UID` are pre-existing tracecore downward-API
+patterns (per nccl_fr's example config); `NODE_NAME` is the new
+addition for M15.
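+
+On the receiver side, the read is a plain `os.Getenv("NODE_NAME")`.
+A minimal sketch follows; failing fast on an empty value is an
+assumption about the desired behavior, and `nodeName` /
+`errNoNodeName` are hypothetical names. The lookup is injected so the
+function is testable without mutating the process environment.
+
+```go
+package main
+
+import (
+	"errors"
+	"fmt"
+	"os"
+)
+
+var errNoNodeName = errors.New("NODE_NAME is empty; inject it via fieldRef spec.nodeName")
+
+// nodeName resolves the node name through the supplied lookup
+// (os.Getenv in production).
+func nodeName(getenv func(string) string) (string, error) {
+	n := getenv("NODE_NAME")
+	if n == "" {
+		return "", errNoNodeName
+	}
+	return n, nil
+}
+
+func main() {
+	n, err := nodeName(os.Getenv)
+	if err != nil {
+		fmt.Println("config error:", err)
+		return
+	}
+	fmt.Println("node:", n)
+}
+```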
+
+### 9.2 Post-deployment smoke test
+
+After enabling M15 on a node, operators should verify M15 sees the
+same records `kubectl logs` does. Recommended snippet for the
+`docs/integrations/containerstdout.md` runbook:
+
+```bash
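+# Pick a pod with stdout activity
+POD=$(kubectl get pod -o name | head -n 1)
+
+# Capture 10 lines via kubectl; --timestamps keeps each line's
+# timestamp so body and timestamp can both be compared
+kubectl logs --tail=10 --timestamps "$POD" > /tmp/kubectl.log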
+
+# Wait for M15 to flush (≈poll_interval + force_flush_period)
+sleep 1
+
+# Compare against M15-emitted records (via Prometheus or downstream sink)
+# Expect: 10 records in M15's output with matching body+timestamp.
+```
+
+Failures here indicate misconfigured `include` glob, RBAC denying pod
+informer, or NODE_NAME not injected. Cross-reference RUNBOOK §
+"Smoke test failure" (to be written in M15 design phase).
+
+### 9.3 Helm-chart asks
+
+
+Helm-chart changes M15 will require:
+
+1. `receivers.containerstdout.enabled` toggle in `values.yaml`.
+2. New host-path mount on the DaemonSet template:
+   ```yaml
+   - name: var-log
+     hostPath:
+       path: /var/log
+       type: Directory
+   ```
+   with `readOnly: true` `volumeMount`.
+3. New volume for cursor persistence:
+   ```yaml
+   - name: tracecore-state
+     hostPath:
+       path: /var/lib/tracecore/container_stdout
+       type: DirectoryOrCreate
+   ```
+4. Downward-API env injection for `NODE_NAME` (new for M15; the chart
+   does not inject `NODE_NAME` today).
+5. New `rbac.yaml` per receiver (pods list+watch in target namespaces;
+   namespaces list for filtering). Pinned by `rbac.can-i.golden`.
+6. `conftest`/`kyverno` policy delta: M5b rejects extra capabilities
+   and writable mounts. The two new hostPath mounts above must be
+   read-only or `DirectoryOrCreate`, never writable across pod
+   boundaries.
+
+## 10. Open decisions for design phase
+
+Round-2 research closed several of these. Resolved ones are kept with their
+verdict for the design-doc trail; surviving open items have `OPEN` in the
+status column.
+
+| ID | Status | Decision | Verdict / notes |
+|---|---|---|---|
+| OD-1 | RESOLVED | Pod-attribution mechanism. | **Hybrid.** Pod metadata (namespace, pod, container, node, owner refs, training labels) via `k8sattributesprocessor` with `pod_association` keyed on `k8s.pod.uid` (the `container` operator already emits this; default `from: connection` does not work for local file tails). RANK / WORLD_SIZE / JOB_ID via a tracecore-owned env-projection mechanism — see OD-1b. |
+| OD-1b | OPEN | Env-var projection path: tracecore processor reading `Pod.spec.containers[].env` via an informer, **or** ask training launchers to set `OTEL_RESOURCE_ATTRIBUTES=tracecore.training.rank=$(RANK),...` via downward API and ride `resourcedetectionprocessor` env detector. | Zero upstream prior art either way. The downward-API path is one shell line per launch script with no extra RBAC; the informer path is invisible to operators but adds pod-list/watch RBAC and a cardinality budget. Lean toward the downward-API path as the default and the informer as an opt-in for unmodified workloads. |
+| OD-2 | OPEN | Process-level `RANK` recovery when `nproc_per_node > 1`. | Pod env reports the launcher's view (master rank only); per-process rank requires regex on body content. Ship a configurable `process_rank_regex` that defaults to a tolerant pattern (`\bRANK[=:]\s*(\d+)\b`); document that pod-level rank is the join key when `nproc_per_node = 1` and operators must instrument their script otherwise. |
+| OD-3 | RESOLVED (semantics specified below) | Rate-limit processor vs inline. | **Processor.** Per-key token-bucket keyed on `(k8s.pod.uid, k8s.container.name)`. Reusable beyond filelog. Receiver-inline would tighten coupling for no architectural gain. Concrete semantics: `golang.org/x/time/rate.Limiter` per key; default `rate=200` lines/s, `burst=1000`; per-key state stored in an LRU map with `lru_cap=8192` keys and `lru_evict_after=5m` idle TTL; over-budget records sampled (1 in N) emitted with `tracecore.dropped_lines=N` attribute and `IncError(KindBackpressureDrop)` counter; namespace-scoped sub-budgets supported via a parent-bucket-and-child-buckets structure (per-namespace budget consumed before per-(pod,container) budget). Cardinality cap on the LRU prevents runaway memory under pod churn. |
+| OD-4 | RESOLVED | Dataloader-regex tagging: OTTL `transform` processor or dedicated processor. | **OTTL first.** Config-only, no new Go code, and the regex set is small (~2 default patterns). Re-evaluate if the regex set grows beyond a half-dozen patterns or needs stateful book-keeping. |
+| OD-5 | RESOLVED (with rubric edit) | Cursor persistence backend. | **Use OTel `file_storage` extension (bbolt) as authoritative.** Production landscape is split (SQLite in Fluent Bit, JSON in Vector/Datadog, YAML in Promtail, bbolt in OTel). The rubric's JSON-at-`cursor.json` predates the depend-on-filelog finding; reconciling means rubric edit R-6. Optionally emit a low-frequency JSON snapshot for human inspection — read-only, not authoritative. |
+| OD-6 | RESOLVED | Gzip-compressed rotated files. | **Decompress and continue** via filelog `compression: auto`. Supported since contrib v0.144+; gzip-corruption bugs (#46105, #45572) are fixed. Pin contrib >= v0.144.0 and add an integration test that gzips a rotated file mid-tail and asserts zero record loss. |
+| OD-7 | RESOLVED | Filelog dependency surface. | **Factory-only.** Consume `filelogreceiver.NewFactory()` and configure via YAML; do not embed `pkg/stanza` operators directly in Go. Round-2 found ~3 BREAKING changes in 18 months, all in adjacent code (OTTL, Windows event logs); `pkg/stanza/fileconsumer` surface has been stable but the deprecation policy is not documented. Factory-only keeps the blast radius bounded. |
+| OD-8 | OPEN | Bench harness coupling with M5. | M5 (install + overhead benchmark harness) is still planned; M15's overhead-budget rubrics (≤0.10% CPU, ≤20 MB RSS, ≤0.3 Mbps) must land through M5's framework, not a parallel one. Sequence: M5 lands the harness, M15 lands a benchmark fixture under it. |
+| OD-9 | OPEN | Filelog feature-gate posture. | Two relevant gates: `filelog.protobufCheckpointEncoding` (new bbolt key encoding, stable timeline TBD) and `filelog.decompressFingerprint` (stable at v0.142). Decide whether tracecore opts in by default or follows upstream defaults; affects upgrade churn. |
+| OD-10 | OPEN | Custom builder manifest for tree-shaking. | If the binary-size delta from `filelogreceiver` + `pkg/stanza` (~15–30 MB estimated, unmeasured) is unacceptable, the only path is a custom builder manifest excluding unused operators. Measure before committing engineering effort. |
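+
+The OD-3 per-key semantics can be sketched with the standard library
+alone. This is a stand-in for `golang.org/x/time/rate.Limiter`, not a
+replacement for it: the refill arithmetic is hand-rolled, the type is
+not concurrency-safe, and idle keys are evicted only when the cap is
+hit rather than through a true LRU. `Limiter`, `key`, and `Allow` are
+hypothetical names; time is passed in so the behavior is deterministic.
+
+```go
+package main
+
+import (
+	"fmt"
+	"time"
+)
+
+type key struct{ PodUID, Container string }
+
+type bucket struct {
+	tokens   float64
+	lastSeen time.Time
+}
+
+// Limiter is a per-key token bucket with an idle TTL and a hard key cap.
+type Limiter struct {
+	rate, burst float64
+	maxKeys     int
+	idleTTL     time.Duration
+	buckets     map[key]*bucket
+}
+
+func NewLimiter(rate, burst float64, maxKeys int, idleTTL time.Duration) *Limiter {
+	return &Limiter{rate: rate, burst: burst, maxKeys: maxKeys,
+		idleTTL: idleTTL, buckets: map[key]*bucket{}}
+}
+
+// Allow consumes one token for k at time now; false means over budget.
+func (l *Limiter) Allow(k key, now time.Time) bool {
+	b, ok := l.buckets[k]
+	if !ok {
+		if len(l.buckets) >= l.maxKeys {
+			l.evictIdle(now)
+		}
+		if len(l.buckets) >= l.maxKeys {
+			return false // cap still reached; a real LRU would evict the oldest key
+		}
+		b = &bucket{tokens: l.burst, lastSeen: now}
+		l.buckets[k] = b
+	}
+	// Refill proportionally to elapsed time, clamped to the burst depth.
+	b.tokens += now.Sub(b.lastSeen).Seconds() * l.rate
+	if b.tokens > l.burst {
+		b.tokens = l.burst
+	}
+	b.lastSeen = now
+	if b.tokens < 1 {
+		return false
+	}
+	b.tokens--
+	return true
+}
+
+// evictIdle drops keys that have not been seen for longer than idleTTL.
+func (l *Limiter) evictIdle(now time.Time) {
+	for k, b := range l.buckets {
+		if now.Sub(b.lastSeen) > l.idleTTL {
+			delete(l.buckets, k)
+		}
+	}
+}
+
+func main() {
+	l := NewLimiter(200, 1000, 8192, 5*time.Minute)
+	k := key{PodUID: "uid-1", Container: "trainer"}
+	now := time.Now()
+	dropped := 0
+	for i := 0; i < 1500; i++ {
+		if !l.Allow(k, now) {
+			dropped++
+		}
+	}
+	fmt.Println("dropped:", dropped)
+}
+```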
+
+## 11. Rubric edits to propose against MILESTONES.md
+
+**Status note.** Round-1 of this research recommended opening these as a
+small PR before any M15 implementation. The §15 correction and the
+NORTHSTARS O4 finding in §7 changed the picture substantially:
+- R-1 and R-2 are namespace edits, which §7 has now reframed as a
+  cross-cutting strategic question, not an M15-local rubric mistake.
+  They are **withdrawn as PR proposals**; the call is the O4 owner's.
+- R-3 and R-7 are gated on tests that do not yet exist. They should
+  not be opened as MILESTONES.md edits until the corresponding fixtures
+  are written and pass. Marked as **deferred-pending-fixture**.
+- R-6 is conditional on the build-approach choice (§15.1 BA-1 vs the
+  others). Only valid under BA-1 (adapter to upstream filelog).
+- R-5 and R-8 are independent of build approach and remain proposable.
+- R-4 is cosmetic and low-priority.
+
+| Edit | Status | Current rubric | Proposed | Rationale |
+|---|---|---|---|---|
+| R-1 | **WITHDRAWN; recast as cross-cutting strategic question.** Originally proposed to rename `gen_ai.training.rank` → `tracecore.training.rank` in M15's rubric. §7 now reframes this as a NORTHSTARS O4 bet (own the namespace upstream). MILESTONES.md uses `gen_ai.training.*` at **lines 29, 358, 360, 433, 453, 459, 460, 481** — eight sites across M7-absence-note, M15, M13, M14, M18. Any rename must be a cross-receiver PR with the O4 owner's sign-off, not an M15-local edit. | "per-rank attribution: derives `gen_ai.training.rank` (canonical join key across receivers) from Pod env vars ... falls back to Pod labels `tracecore.io/rank`, `tracecore.io/job-id`; missing → record emitted with `rank=unknown`." (line 358 verbatim) | No change at M15 level; surface decision to O4 owner. | §7. The namespace is a strategic bet, not an oversight. |
+| R-2 | **REVISED.** Original proposal misquoted the rubric (dropped the `keyed by 'gen_ai.training.rank' for M18's straggler detector to join on` clause). Corrected quote below. Status now ties to R-1: if R-1 is concluded as "hedge" or "concede", R-2 mirrors that choice on the M18-join-key clause; if R-1 is "hold the bet", R-2 is unchanged. | "When a log line matches, receiver emits `tracecore.training.data_time_s` and `tracecore.training.iter_time_s` keyed by `gen_ai.training.rank` for M18's straggler detector to join on; schema lives in fixture." (line 360 verbatim) | If R-1 = "hold the bet": no change. If R-1 = "hedge"/"concede": rename `gen_ai.training.rank` in the keyed-by clause to match R-1's chosen target. | §7 + §6 dataloader survey. |
+| R-3 | **DEFERRED-PENDING-FIXTURE.** Round-2's "fingerprint detection satisfies the intent" is research, not a passing test. Should not open as a rubric edit until an integration test demonstrates zero-record-loss across both inode-rename and copy-truncate rotation on a kind cluster. | "Rotation: kubelet rotates by renaming `0.log` → `0.log.<N>` and creating new `0.log`; receiver follows inode, not path; integration test asserts zero record loss." | (Pending) "Rotation: receiver preserves zero-record-loss across both inode-rename and copy-truncate rotation, validated by integration test at `TestContainerStdout_RotationInodeRename` and `TestContainerStdout_RotationCopyTruncate`." | §4. Edit gates on the named tests existing. |
+| R-4 | **PROPOSABLE.** Cosmetic; documents the prior art. Low priority. | "`max_log_size` (default 1 MiB)" | Add citation: "`max_log_size` (default 1 MiB; matches OTel `container` stanza operator default)". | Cosmetic. |
+| R-5 | **PROPOSABLE, but revised mechanism.** Round-1 framed #11149 as disk-I/O backpressure; the actual mechanism (per the 2025-01-22 reproducer) is shared-pipe contention when something inside the container reads FD 1. Narrower failure surface than originally implied. | (No current row; new rubric.) | New: "Reliability caveat: containerd #11149 (open upstream) causes log-loss in `/var/log/pods/.../0.log` when an in-container process reads from its own FD 1 (e.g. application self-tee, sidecar reading `/proc/1/fd/1`). The mechanism is shared-pipe contention, not generic backpressure. Standard workloads that do not read FD 1 are unaffected. README must enumerate this failure mode." | §3 / §13.2. No CRI-RPC mitigation exists. |
+| R-6 | **CONDITIONAL on §15.1 BA-1.** Only valid if we adopt the OTel-adapter build approach. Under BA-2 (port `fileconsumer`) or BA-3 (reimplement), the rubric's original JSON-at-`cursor.json` framing remains correct because there is no `file_storage` extension to delegate to. | "Checkpoint persistence: cursor stored under `/var/lib/tracecore/container_stdout/cursor.json` (atomic rename); on restart resumes within 1 record of last-acknowledged position." (line 364) | **Under BA-1 only:** "Checkpoint persistence: cursor stored via OTel `file_storage` extension (bbolt) at `/var/lib/tracecore/container_stdout/`; on restart resumes within 1 record of last-acknowledged position. Optional read-only JSON snapshot for human inspection." Otherwise: no change. | §13.5 + OD-5. |
+| R-7 | **DEFERRED-PENDING-FIXTURE.** Also build-approach-conditional. The integration test must exist before the rubric edit can be proposed. | (No current row; would be new.) | (Pending fixture, BA-1 only) "Compressed rotated files (`0.log.<ts>.gz`) are read transparently via filelog `compression: auto`; an integration test at `TestContainerStdout_RotationCompressed` gzips a rotated file mid-tail and asserts zero record loss." | §13.6. |
+| R-8 | **PROPOSABLE.** Build-approach-independent; the receiver must surface kubelet rotation failure regardless of whether it consumes filelog or rolls its own. | (No current row; new rubric.) | New: "Degraded mode for kubelet rotation failure: receiver tracks observed `0.log` size and surfaces `IncError(KindRotationStalled)` plus `SetDegraded(true)` when size > `containerLogMaxSize` for ≥ 30 s (3× kubelet default `containerLogMonitorInterval`); receiver stays alive; FAILURE-MODES.md row references `TestContainerStdout_RotationStalled`." | §13.1. Kubelet emits klog-only on rotation failure; no K8s Event or metric. Tracecore is the observability layer that surfaces this. |
+
+**Net proposable edits at this time: R-4, R-5, R-8.** Three small,
+build-approach-independent edits that can land as a precursor PR with
+no architecture commitment. R-1/R-2/R-3/R-6/R-7 are all gated on
+decisions or fixtures that do not yet exist.
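+
+The R-8 trigger condition (size above `containerLogMaxSize` for at
+least 30 s) is a small piece of state per tailed file. A minimal
+sketch, assuming the receiver samples the live file's size on each
+poll; `stallDetector` is a hypothetical name and the 10 MiB limit in
+`main` is only an example value.
+
+```go
+package main
+
+import (
+	"fmt"
+	"time"
+)
+
+// stallDetector flags a file that stays above maxSize for at least grace.
+type stallDetector struct {
+	maxSize   int64
+	grace     time.Duration
+	overSince time.Time // zero while the file is within budget
+}
+
+// Observe records one size sample and reports whether rotation looks
+// stalled. The caller would map true to IncError(KindRotationStalled)
+// plus SetDegraded(true).
+func (d *stallDetector) Observe(size int64, now time.Time) bool {
+	if size <= d.maxSize {
+		d.overSince = time.Time{} // back under budget: reset the clock
+		return false
+	}
+	if d.overSince.IsZero() {
+		d.overSince = now
+	}
+	return now.Sub(d.overSince) >= d.grace
+}
+
+func main() {
+	d := &stallDetector{maxSize: 10 << 20, grace: 30 * time.Second}
+	t0 := time.Date(2026, 5, 19, 12, 0, 0, 0, time.UTC)
+	fmt.Println(d.Observe(11<<20, t0))
+	fmt.Println(d.Observe(11<<20, t0.Add(30*time.Second)))
+}
+```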
+
+### 11.1 Coverage of M15 rubric lines not addressed elsewhere
+
+Reviewer feedback (round 4) flagged that 5 M15 rubric lines had no
+analysis in this doc. Analyzed here:
+
+**MILESTONES.md line 359 — Structured-log JSON auto-detection.**
+Rubric: "first non-whitespace byte `{` triggers JSON parse; on success
+emits parsed fields as attributes capped at `max_attributes` (default
+16); on failure or non-`{`, passthrough as `body`."
+
+- **Implementation under each BA:** under BA-1, OTel's `json_parser`
+  stanza operator (`pkg/stanza/operator/parser/json`) does exactly
+  this; wire it after the `container` operator with a routing rule on
+  `body[0] == '{'`. Under BA-2/3, native code calling
+  `encoding/json.Decoder` on the body, with same routing.
+- **Cardinality risk:** `max_attributes: 16` is the existing
+  k8sevents default. Reasonable for structured logs from training
+  scripts (most emit ≤8 fields).
+- **Failure mode:** JSON parse failure on body that starts with `{`
+  but is not valid JSON. Receiver MUST fall through to body
+  passthrough, not drop the record. Test target:
+  `TestContainerStdout_JSONParseFailureFallthrough`.
+- **Rubric stands as written.** No edit proposed.
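+
+For the BA-2/3 native path, the auto-detect rule can be sketched as
+below. Two choices here are assumptions, not rubric text: the sketch
+uses `json.Unmarshal` rather than a streaming `json.Decoder` (so
+trailing garbage counts as a parse failure), and when the object has
+more than `max_attributes` keys it keeps the lexicographically first
+ones. `parseBody` is a hypothetical name.
+
+```go
+package main
+
+import (
+	"encoding/json"
+	"fmt"
+	"sort"
+	"strings"
+)
+
+// parseBody returns parsed attributes when body is a JSON object.
+// parsed == false means the caller passes body through unchanged.
+func parseBody(body string, maxAttrs int) (attrs map[string]any, parsed bool) {
+	trimmed := strings.TrimLeft(body, " \t\r\n")
+	if !strings.HasPrefix(trimmed, "{") {
+		return nil, false
+	}
+	var obj map[string]any
+	if err := json.Unmarshal([]byte(trimmed), &obj); err != nil {
+		return nil, false // starts with '{' but invalid: fall through, never drop
+	}
+	// Cap deterministically: sort keys and keep the first maxAttrs.
+	keys := make([]string, 0, len(obj))
+	for k := range obj {
+		keys = append(keys, k)
+	}
+	sort.Strings(keys)
+	if len(keys) > maxAttrs {
+		keys = keys[:maxAttrs]
+	}
+	attrs = make(map[string]any, len(keys))
+	for _, k := range keys {
+		attrs[k] = obj[k]
+	}
+	return attrs, true
+}
+
+func main() {
+	attrs, ok := parseBody(`  {"step": 3, "loss": 0.5}`, 16)
+	fmt.Println(ok, len(attrs))
+	_, ok = parseBody(`{not json`, 16)
+	fmt.Println(ok)
+}
+```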
+
+**MILESTONES.md line 363 — `tracecore.container.lines_per_s` derived
+metric.**
+Rubric: "emits per-rank line-rate (`tracecore.container.lines_per_s`)
+as derived metric on 15s window."
+
+- **Aggregation key:** the rubric says "per-rank" but `Training.Rank`
+  is only populated when discoverable (see §8.1 typed-record schema).
+  Records with `WorldSize == 0` (undiscovered rank) need a bucket
+  key — `Pod.UID` is the natural fallback. Decision for design phase:
+  emit one metric per `(rank, pod_uid)` tuple, with `rank = "unknown"`
+  for the undiscovered case.
+- **Window mechanics:** 15s sliding window or 15s tumbling?
+  Tumbling is simpler and matches Prometheus scrape cadence. Sliding
+  would smooth bursts but doubles state. Recommend tumbling for v0.
+- **Cardinality:** `(rank, pod_uid)` cardinality on a node is
+  `pod_count × ranks_per_pod`. Default cardinality cap (§13.4 pattern)
+  should bound this; emit `IncError(KindCardinality)` on overflow.
+- **Test target:** `TestContainerStdout_LinesPerSDerivedMetric`.
+- **Rubric stands.** No edit proposed; designate the undiscovered-rank
+  bucket key in the design doc.
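+
+A tumbling-window counter for the derived metric might look like the
+sketch below. It assumes the `(rank, pod_uid)` series key proposed
+above with `"unknown"` for undiscovered ranks, flushes lazily when a
+line arrives past the window boundary, and does not report windows in
+which no line arrived at all. `tumbling` and `seriesKey` are
+hypothetical names.
+
+```go
+package main
+
+import (
+	"fmt"
+	"time"
+)
+
+type seriesKey struct{ Rank, PodUID string }
+
+// tumbling counts lines per key in fixed, non-overlapping windows.
+type tumbling struct {
+	width  time.Duration
+	start  time.Time
+	counts map[seriesKey]int64
+}
+
+func newTumbling(width time.Duration, start time.Time) *tumbling {
+	return &tumbling{width: width, start: start, counts: map[seriesKey]int64{}}
+}
+
+// Add counts one line at now. When now is past the current window it
+// returns the closed window's per-key rates (lines/s) and opens the
+// window that contains now; otherwise it returns nil.
+func (w *tumbling) Add(k seriesKey, now time.Time) map[seriesKey]float64 {
+	var flushed map[seriesKey]float64
+	if elapsed := now.Sub(w.start); elapsed >= w.width {
+		flushed = make(map[seriesKey]float64, len(w.counts))
+		for key, n := range w.counts {
+			flushed[key] = float64(n) / w.width.Seconds()
+		}
+		w.counts = map[seriesKey]int64{}
+		// Advance by a whole number of windows so boundaries stay aligned.
+		w.start = w.start.Add(elapsed / w.width * w.width)
+	}
+	w.counts[k]++
+	return flushed
+}
+
+func main() {
+	t0 := time.Date(2026, 5, 19, 12, 0, 0, 0, time.UTC)
+	w := newTumbling(15*time.Second, t0)
+	k := seriesKey{Rank: "unknown", PodUID: "uid-1"}
+	for i := 0; i < 30; i++ {
+		w.Add(k, t0.Add(time.Second))
+	}
+	fmt.Println(w.Add(k, t0.Add(16*time.Second)))
+}
+```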
+
+**MILESTONES.md line 370 — Multi-tenancy namespace allowlist.**
+Rubric: "`namespaces:` allowlist filters Pod discovery before file
+watch opened; per-namespace egress sub-budget configurable."
+
+- **Filesystem-level filter is required.** §16.1 emphasized this:
+  filter at file-open time, not at emit time. The Pod namespace lives
+  in the directory-name prefix (`<ns>_<pod>_<uid>`), not in a
+  directory of its own, so the filter has to match on that prefix.
+  Practical implementation: a pre-glob filter that walks
+  `/var/log/pods/*` and rejects entries whose `<ns>_` prefix is not
+  in the allowlist BEFORE the watcher opens the file. Under BA-1 the
+  filter has to go through filelog's `include` / `exclude` globs.
+  Globs have no negation, and a regex-style `^(?!ns1_|ns2_).*` is not
+  an option either (Go's RE2-based `regexp` has no negative
+  lookahead), so an "everything except the allowlist" `exclude`
+  cannot be written. The config must either enumerate namespaces
+  explicitly in the globs (for example one `include` glob per allowed
+  namespace, such as `/var/log/pods/ns1_*/*/*.log`) OR use filelog's
+  filter-after-glob path. Confirm during design.
+- **Per-namespace sub-budget:** rate-limit processor (OD-3) keys on
+  `(k8s.pod.uid, k8s.container.name)`. Per-namespace budget is a
+  second-tier aggregator: token bucket per namespace, sub-budgets per
+  pod within. Implementation under any BA: a `namespace_budgets` map
+  in the rate-limit processor config.
+- **Test targets:** `TestContainerStdout_NamespaceAllowlistFiltersAtOpen`,
+  `TestContainerStdout_PerNamespaceSubBudget`.
+- **Rubric stands.** Worth a follow-up note in §10 OD-3 about the
+  two-tier bucket structure.
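+
+The pre-glob filter described above amounts to parsing the namespace
+out of the pod directory name. A minimal sketch, assuming the
+`<ns>_<pod>_<uid>` layout and relying on the fact that Kubernetes
+namespace and pod names cannot contain `_`; `namespaceOf` and
+`allowed` are hypothetical names. An empty allowlist admits
+everything, matching the `namespaces: []` default in §9.0.
+
+```go
+package main
+
+import (
+	"fmt"
+	"strings"
+)
+
+const podsRoot = "/var/log/pods/"
+
+// namespaceOf extracts <ns> from
+// /var/log/pods/<ns>_<pod>_<uid>/<container>/<restart>.log.
+func namespaceOf(logPath string) (string, bool) {
+	if !strings.HasPrefix(logPath, podsRoot) {
+		return "", false
+	}
+	podDir, _, _ := strings.Cut(strings.TrimPrefix(logPath, podsRoot), "/")
+	parts := strings.Split(podDir, "_")
+	if len(parts) != 3 || parts[0] == "" {
+		return "", false
+	}
+	return parts[0], true
+}
+
+// allowed decides, before any file is opened, whether logPath may be
+// tailed. Unparseable paths are rejected when an allowlist is set.
+func allowed(logPath string, allow map[string]bool) bool {
+	if len(allow) == 0 {
+		return true
+	}
+	ns, ok := namespaceOf(logPath)
+	return ok && allow[ns]
+}
+
+func main() {
+	allow := map[string]bool{"training": true}
+	p := "/var/log/pods/training_worker-0_1234-abcd/trainer/0.log"
+	fmt.Println(allowed(p, allow))
+}
+```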
+
+**MILESTONES.md line 371 — `goleak` test for back-pressure.**
+Rubric: "1M-line burst from one rank MUST NOT block sibling streams;
+bounded per-file goroutine + bounded channel (1024); `goleak` test."
+
+- **goleak setup:** tracecore already uses `go.uber.org/goleak` (per
+  go.mod), so the test integration is straightforward. The `goleak`
+  test wraps the receiver Start/Shutdown lifecycle and asserts no
+  leaked goroutines post-shutdown.
+- **Bounded channel: 1024.** Matches k8sevents' default per
+  `config.go:55-60`. The k8sevents ceiling of `2^20` (cap against
+  operator typos that allocate the channel into swap territory) is
+  the right ceiling.
+- **Per-file goroutine isolation:** under BA-1, filelog manages this
+  internally via `pkg/stanza/fileconsumer`'s reader pool. Under
+  BA-2/3, tracecore owns it; pattern is "spawn one goroutine per
+  tailed file, bound the total by `max_concurrent_files` (default
+  1024), drop on overflow with `IncError(KindMaxFilesExceeded)`."
+- **Test target:** `TestContainerStdout_BackPressureGoLeak` already
+  named in §17.
+- **Rubric stands.** No edit proposed.
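+
+One way to keep a burst from blocking sibling streams is a
+non-blocking send into the bounded channel. The sketch below shows
+the drop-on-full variant only; whether M15 drops or instead stalls
+the one noisy file's reader (the data is still on disk) is a design
+choice this document does not settle. `trySend` is a hypothetical
+name.
+
+```go
+package main
+
+import "fmt"
+
+// trySend enqueues line without blocking. false means the buffer was
+// full and the line was not enqueued; the caller would count it, for
+// example via IncError(KindBackpressureDrop).
+func trySend(ch chan<- string, line string) bool {
+	select {
+	case ch <- line:
+		return true
+	default:
+		return false
+	}
+}
+
+func main() {
+	ch := make(chan string, 1024) // bounded channel size from the rubric
+	accepted, dropped := 0, 0
+	for i := 0; i < 1500; i++ {
+		if trySend(ch, "line") {
+			accepted++
+		} else {
+			dropped++
+		}
+	}
+	fmt.Println(accepted, dropped)
+}
+```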
+
+**MILESTONES.md line 372 — File-handle hygiene with `lsof` golden.**
+Rubric: "≤2× pod-count open fds steady-state; closed within 30s of
+Pod `Terminated`; verified by `lsof` golden."
+
+- **2× pod-count rationale:** ~1 fd per `0.log` plus ~1 fd per
+  currently-rotating `0.log.<ts>` (transient during the kubelet-
+  rotation window of §13.1).
+- **`lsof` golden:** parse `lsof -p $(pidof tracecore) -F n` output;
+  assert count of `/var/log/pods/...` entries ≤ `2 × len(pods)`.
+  Easy CI gate but requires `lsof` in the test image.
+- **30s close-after-Terminated:** the Pod informer's deletion event
+  triggers cursor-GC (§17.3) and fd-close. 30s window covers
+  drain-then-close. Cursor-GC must complete in this window.
+- **Test target:** `TestContainerStdout_FdHygieneAfterPodTermination`.
+- **Rubric stands.** Worth adding `lsof` to the test-image
+  requirements in the design doc.
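+
+The golden check above could be sketched as follows. This is an
+illustration, not the test's actual implementation: `countPodLogFDs` and
+`withinFDBudget` are hypothetical helpers, and the sample output is
+hand-written in the `lsof -F n` field format (one record per line, name
+records prefixed with `n`).
+
+```go
+package main
+
+import (
+	"bufio"
+	"fmt"
+	"strings"
+)
+
+// countPodLogFDs counts name records ("n"-prefixed lines) in `lsof -F n`
+// output that point under /var/log/pods/.
+func countPodLogFDs(lsofOutput string) int {
+	n := 0
+	sc := bufio.NewScanner(strings.NewReader(lsofOutput))
+	for sc.Scan() {
+		if strings.HasPrefix(sc.Text(), "n/var/log/pods/") {
+			n++
+		}
+	}
+	return n
+}
+
+// withinFDBudget applies the rubric's steady-state bound: open <= 2 x pods.
+func withinFDBudget(open, pods int) bool { return open <= 2*pods }
+
+func main() {
+	// Hypothetical sample: two pod log files and one unrelated file.
+	sample := "p4242\n" +
+		"f7\nn/var/log/pods/ns_pod-a_uid1/train/0.log\n" +
+		"f8\nn/var/lib/tracecore/container_stdout/db\n" +
+		"f9\nn/var/log/pods/ns_pod-b_uid2/train/0.log\n"
+	open := countPodLogFDs(sample)
+	fmt.Println(open, withinFDBudget(open, 2))
+}
+```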
+
+## 12. Follow-up gaps
+
+Closed gaps and the round-2 evidence that resolved them are recorded here
+for the audit trail.
+
+### Closed
+
+1. **containerd #11149 mitigation path.** **No CRI-RPC mitigation exists.**
+   The CRI `RuntimeService` has no `GetContainerLog` / `ContainerLog` /
+   `StreamLogs` RPC; only `ReopenContainerLog` (a notification). `kubectl
+   logs` tails the same `/var/log/pods` file via kubelet's `/containerLogs`
+   handler, which calls `m.ReadLogs(...)` on the file path returned by
+   `ContainerStatus(...).log_path`. Real fixes have to live upstream of the
+   disk write (ring-buffered shim logger, sidecar logger). Resolved.
+2. **`MaxFiles=1` validation.** **Rejected at two layers.**
+   `KubeletConfiguration.ContainerLogMaxFiles <= 1` returns
+   `"containerLogMaxFiles must be greater than 1"` at config validation
+   (`pkg/kubelet/apis/config/validation/validation.go`); `NewContainerLogManager`
+   also rejects `<= 1` at constructor time. `MaxFiles=2` is accepted but is
+   degenerate (zero retained rotated files; the rotated file is deleted at
+   the next monitor tick before compression can run). Resolved.
+3. **Host FS-full rotation behavior.** **`Rename` fails synchronously; no
+   retry, no K8s Event, no metric.** `rotateLatestLog` returns immediately
+   on `Rename` error; klog logs at error level. The live `0.log` stays
+   open via the runtime's fd and continues to grow past `MaxSize` until
+   the rename succeeds on a later tick or the container exits.
+   `processContainer`'s `defer Forget(key)` neutralizes the workqueue's
+   exponential backoff, so retry happens on every monitor tick. Resolved.
+4. **Containerd vs CRI-O reopen timing.** Both implement
+   `ReopenContainerLog` synchronously (gRPC response IS the ack).
+   Containerd does the dup2 in its CRI plugin; CRI-O signals conmon
+   (historically `SIGUSR1`). Behavior is observably identical to the
+   tailer. Resolved.
+5. **`pkg/stanza` Go API stability.** **~3 BREAKING in 18 months
+   (v0.140–v0.152), all in adjacent code (`pkg/ottl`, `receiver/windowseventlog`).**
+   `pkg/stanza/fileconsumer` public surface has not had a tracked
+   breaking change in this window. No formal deprecation policy
+   documented; rely on collector-wide stability matrix + feature gates.
+   Budget ~1 day/quarter for upgrades, pin contrib version, integration-test
+   each bump. Resolved.
+6. **`fileconsumer` fingerprint default.** **1 KiB default, 16 B minimum.**
+   CRI lines as short as 32 B produce a 32 B fingerprint; collision risk
+   exists only across pods sharing identical first-line prefixes that then
+   idle. Resolves automatically with normal log flow. Recommendation: keep
+   default; raise `fingerprint_size` only if tailing very-low-volume pods.
+   Resolved.
+7. **lit-GPT log format.** Format:
+   `Epoch N | iter N step N | loss train: X, val: Y | iter time: Z ms`
+   with optional ` (step)` suffix on optimizer-step boundaries. **No
+   per-step data_time.** Source:
+   [`litgpt/pretrain.py`](https://github.com/Lightning-AI/litgpt/blob/main/litgpt/pretrain.py)
+   `fit()` format string. The line reports milliseconds, so the default
+   dataloader extraction must convert ms → s. Resolved.
+8. **`TORCHELASTIC_RUN_ID` in `--standalone` mode.** **Random `uuid4()` per
+   invocation.** No correlation to PID, hostname, or start time. Useful as
+   intra-launch grouping key only; do not treat as a stable job
+   identifier. Always pair with an orchestrator-provided ID (Kubeflow
+   `training.kubeflow.org/job-name`, Slurm `SLURM_JOB_ID`, JobSet
+   `jobset-uid`). Resolved.
+
+### Closed (round 3)
+
+11. **Filestorage corruption observability.** Extension emits **zero
+    metrics** and 5 log lines. Tracecore alerts must bind to log-string
+    matches (`"Database corruption detected"`, `"compaction on start
+    failed"`) or host metrics on the storage directory. See §13.10.
+14. **`monitoringPeriod` default value.** **10 seconds.** Exposed as
+    `ContainerLogMonitorInterval` and set in
+    `pkg/kubelet/apis/config/v1beta1/defaults.go`
+    `SetDefaults_KubeletConfiguration`; unchanged since 2024-02-09. See
+    §13.8.
+15. **PodLogs API KEP status.** **Dead.** KEP-3059 was auto-closed via
+    `lifecycle/rotten` without reaching alpha. No replacement filed.
+    Status quo (file tail via `/var/log/pods`) is the only supported
+    path. See §13.9.
+
+### Still open (deferred to future iterations or measurement)
+
+9. **MPI `OMPI_COMM_WORLD_RANK` extraction.** Reading from
+   `/proc/<pid>/environ` is the only path; needs hostPID +
+   `CAP_SYS_PTRACE` or equivalent. Conflicts with M5b's minimal-privilege
+   policy. **Defer MPI attribution to a future receiver iteration;
+   document the gap.**
+10. **Empirical binary-size delta of `filelogreceiver` import.** Estimate
+    is 15–30 MB unstripped; `-ldflags='-s -w'` saves 20–25%; only
+    custom-builder-manifest tree-shaking removes unused operators.
+    **Measure on a tracecore build before merging M15.**
+12. **Compression-window race.** Kubelet's compress-after-rotate happens
+    in the next monitor tick (~10 s after rename) and writes via
+    `.tmp` + rename (confirmed round-2). Verify filelog only opens after
+    the final rename completes by stat-ing for `.gz` suffix without
+    `.tmp`. Add to integration test.
+13. **Cross-pod fingerprint collision under realistic CRI prefixes.**
+    Write a property-style test: N pods, identical first-line timestamps
+    truncated to second precision + same stream marker, idle, force
+    rotation, assert no offset cross-pollination.
+
+## 13. Round-2 deeper-dive findings
+
+These sections record the evidence that resolved most of the §12 gaps and
+several §10 decisions. They sit here (rather than expanding earlier
+sections) so the original first-pass research stays readable as written.
+
+### 13.1 Kubelet rotation source dive
+
+Source: `pkg/kubelet/logs/container_log_manager.go` (kubernetes/kubernetes,
+master), `pkg/kubelet/apis/config/validation/validation.go`.
+
+- **`MaxFiles <= 1` is rejected.** Two layers: KubeletConfiguration
+  validation returns `"containerLogMaxFiles must be greater than 1"`;
+  `NewContainerLogManager` returns `"invalid MaxFiles N, must be > 1"`.
+  `MaxFiles=2` is the degenerate case — `removeExcessLogs` deletes every
+  rotated file at the next tick (`maxRotatedFiles = MaxFiles - 2 = 0`),
+  before compression can run.
+- **FS-full rotation rollback.** `rotateLatestLog` returns immediately on
+  `Rename` failure; no retry, no K8s Event, no metric — only klog at
+  error level. The live `0.log` stays open via the runtime's fd and
+  grows past `MaxSize` until either rename succeeds on a later tick or
+  the container exits.
+- **Rename → Reopen sequence.** `Rename(log, rotated)` is synchronous on
+  the same FS; if it succeeds, kubelet calls
+  `runtimeService.ReopenContainerLog(ctx, id)` (synchronous gRPC). On
+  reopen failure, kubelet attempts a rollback rename (`rotated → log`).
+  Pathological case: if the rollback itself fails AND kubelet then
+  restarts, the original log is orphaned (containerd/CRI-O still writing
+  to the rotated-file inode, but `0.log` does not exist; the comment in
+  source warns "we'll lose original log").
+- **Workqueue backoff neutralized.**
+  `processContainer`'s `defer queue.Forget(key)` resets the
+  rate-limiter history; failed rotations retry on every monitor tick
+  with no backoff.
+- **Compression timing.** `compressLog` runs in the **same goroutine as
+  rotation** and, within a pass, before `rotateLatestLog`. A file
+  renamed on one tick is therefore compressed on the next (default
+  `monitoringPeriod` ~ 10 s), so a renamed plain file sits uncompressed
+  for roughly one monitor period before becoming `.gz`. Compression
+  uses `<log>.tmp` + rename for atomicity; a mid-write `.gz` is
+  therefore not observable.
+
+**Receiver implication.** Add a degraded-mode signal when `0.log` size
+exceeds `containerLogMaxSize` for sustained periods — this is the only
+observable a tailer has for upstream rotation failure (rubric R-8).
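+
+One way to express that degraded-mode signal, as a sketch only:
+`stallDetector` is a hypothetical type, and the 10 MiB cap and 30 s
+sustain window are the defaults and threshold discussed in §13.8, not
+values taken from tracecore code.
+
+```go
+package main
+
+import (
+	"fmt"
+	"time"
+)
+
+// stallDetector flags a live log whose size has stayed above maxSize for
+// at least the sustain window.
+type stallDetector struct {
+	maxSize   int64
+	sustain   time.Duration
+	overSince time.Time // zero while the file is at or under maxSize
+}
+
+// observe records one size sample and reports whether rotation looks stalled.
+func (d *stallDetector) observe(now time.Time, size int64) bool {
+	if size <= d.maxSize {
+		d.overSince = time.Time{} // rotation caught up: reset
+		return false
+	}
+	if d.overSince.IsZero() {
+		d.overSince = now
+	}
+	return now.Sub(d.overSince) >= d.sustain
+}
+
+func main() {
+	d := &stallDetector{maxSize: 10 << 20, sustain: 30 * time.Second}
+	t0 := time.Unix(0, 0)
+	fmt.Println(d.observe(t0, 11<<20))                     // just crossed the cap
+	fmt.Println(d.observe(t0.Add(30*time.Second), 12<<20)) // still over after 30 s
+	fmt.Println(d.observe(t0.Add(40*time.Second), 1<<20))  // rotated: back under
+}
+```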
+
+### 13.2 CRI `GetContainerLogs` is not a thing
+
+The CRI v1 `RuntimeService`
+([`pkg/apis/runtime/v1/api.proto`](https://github.com/kubernetes/cri-api/blob/master/pkg/apis/runtime/v1/api.proto))
+has no `GetContainerLog`, `ContainerLog`, or `StreamLogs` RPC. The only
+log-related RPC is `ReopenContainerLog(ReopenContainerLogRequest)
+returns (ReopenContainerLogResponse)` — a notification, not a read.
+Log content lives only as a file path (`ContainerStatus.log_path`).
+
+`kubectl logs` is a file tail behind kubelet's HTTP API:
+`pkg/kubelet/server/server.go`'s `/containerLogs/{ns}/{pod}/{container}`
+calls `HostInterface.GetKubeletContainerLogs`, which retrieves
+`log_path` from `ContainerStatus(...)` and then calls `m.ReadLogs(...)`
+in `pkg/kubelet/kuberuntime/logs/logs.go` — a standard `os.Open` on the
+CRI text file.
+
+Implication for #11149: **no CRI consumer can recover bytes lost when
+an in-container reader contends on the shared stdout pipe**, because
+the loss happens before anything observable at the CRI surface. The
+fix has to live in containerd's IO copier (move from a shared pipe to
+a non-blocking ring-buffer or separate copy targets) or in operator
+discipline (don't `cat /proc/1/fd/1` from inside training containers).
+Document the failure mode in the receiver README; do not propose
+CRI-RPC mitigation in the design doc.
+
+### 13.3 `pkg/stanza` API stability and binary cost
+
+CHANGELOG audit of contrib v0.140–v0.152 (~18 months):
+
+- **3 BREAKING changes, all in adjacent code:** OTTL semantics changes
+  in v0.150 / v0.152 and a Windows event_data shape change in v0.148.
+  None in `pkg/stanza/fileconsumer` itself.
+- **6 enhancements + 3 bug fixes** tagged `pkg/stanza` or
+  `receiver/file_log` in the same window — active maintenance.
+- **Feature gates** in flight: `filelog.protobufCheckpointEncoding`
+  (v0.148, alpha — new bbolt key encoding) and
+  `filelog.decompressFingerprint` (stable at v0.142).
+- **No formal deprecation policy** documented in the CHANGELOG; the
+  project relies on collector-wide stability matrix and feature gates.
+- **Production users:** `otelcol-k8s` distribution ships filelog +
+  `filestorage` by default
+  ([manifest](https://github.com/open-telemetry/opentelemetry-collector-releases/blob/main/distributions/otelcol-k8s/manifest.yaml)).
+  Strong signal that filelog is production-load-tested.
+
+**Binary-size estimate:** 15–30 MB unstripped (qualitative, from package
+count and reflection-driven decode patterns); `-ldflags='-s -w'` saves
+20–25 % by stripping DWARF/symbol tables; **only custom-builder-manifest
+tree-shaking removes unused operators**, because `pkg/stanza`
+type-registers all operators at init. Measure before claiming.
+
+**Filestorage internals:** bbolt single-writer per file (one file per
+component, sharing a directory); default directory
+`/var/lib/otelcol/file_storage`; `create_directory: true` /
+`recreate_on_error: true` toggles; on corruption, the extension renames
+the database to `<name>.<ISO8601>.backup` and starts fresh.
+Single-collector-per-node is fine; the shared-PVC anti-pattern would
+deadlock on the file lock.
+
+**Fingerprint default:** 1 KiB (`pkg/stanza/fileconsumer/internal/fingerprint/fingerprint.go`),
+minimum 16 B. CRI lines as short as 32 B yield 32 B fingerprints —
+collision risk exists across pods that share an identical first-line
+prefix and then idle, but resolves under normal log flow because the
+fingerprint grows with the file.
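+
+The collision window can be illustrated with a simplified model. This is
+a sketch, not the `fileconsumer` implementation: `fingerprint` here is a
+hypothetical helper that just takes the leading bytes, and the CRI lines
+are made-up samples.
+
+```go
+package main
+
+import (
+	"bytes"
+	"fmt"
+)
+
+const fingerprintSize = 1024 // the 1 KiB default cited above
+
+// fingerprint returns the leading bytes of a file's content, capped at
+// fingerprintSize.
+func fingerprint(content []byte) []byte {
+	if len(content) > fingerprintSize {
+		content = content[:fingerprintSize]
+	}
+	return content
+}
+
+func main() {
+	line := []byte("2026-05-19T00:00:00Z stdout F x\n") // 32-byte CRI line
+	podA := append([]byte{}, line...)
+	podB := append([]byte{}, line...)
+	// Identical first line, then idle: the two fingerprints collide.
+	fmt.Println(len(fingerprint(podA)), bytes.Equal(fingerprint(podA), fingerprint(podB)))
+
+	// Normal log flow resumes and the contents diverge inside the window.
+	podA = append(podA, "2026-05-19T00:00:01Z stdout F a\n"...)
+	podB = append(podB, "2026-05-19T00:00:02Z stdout F b\n"...)
+	fmt.Println(bytes.Equal(fingerprint(podA), fingerprint(podB)))
+}
+```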
+
+### 13.4 Pod attribution: `k8sattributes` pairing + env-var prior art
+
+**Pairing config** for the `container` stanza operator's output:
+
+```yaml
+processors:
+  k8sattributes:
+    pod_association:
+      - sources:
+          - from: resource_attribute
+            name: k8s.pod.uid
+    extract:
+      metadata:
+        - k8s.node.name
+        - k8s.deployment.name
+        - k8s.daemonset.name
+        - k8s.job.name
+        - k8s.statefulset.name
+        - k8s.pod.start_time
+      labels:
+        - tag_name: training.kubeflow.org/job-name
+          key: training.kubeflow.org/job-name
+          from: pod
+        - tag_name: training.kubeflow.org/replica-type
+          key: training.kubeflow.org/replica-type
+          from: pod
+        - tag_name: training.kubeflow.org/replica-index
+          key: training.kubeflow.org/replica-index
+          from: pod
+        - tag_name: jobset.sigs.k8s.io/jobset-name
+          key: jobset.sigs.k8s.io/jobset-name
+          from: pod
+        - tag_name: jobset.sigs.k8s.io/job-index
+          key: jobset.sigs.k8s.io/job-index
+          from: pod
+```
+
+Critical: the **default `pod_association`** is `from: connection` (the
+OTLP client's source IP). For a local file-tail pipeline, the
+"connection" is the agent itself, so the default does not work; explicit
+`k8s.pod.uid` association is required.
+
+**RBAC for k8sattributes (under §15.1 BA-1 only):**
+`get,list,watch` on `pods`, `namespaces`, `nodes` (core);
+`apps` `replicasets,deployments,statefulsets,daemonsets`
+for owner-kind resolution; `batch` `jobs,cronjobs` for job ownership.
+ClusterRoleBinding to tracecore's ServiceAccount.
+
+This is the **upstream `k8sattributesprocessor`'s default RBAC**, which
+is what BA-1 inherits. Under BA-2 or BA-3, tracecore implements its
+own informer and SHOULD scope the watch to node-local pods via
+`FieldSelector=spec.nodeName=$NODE_NAME` (see §16.3 — this contradicts
+the BA-1 default and intentionally so, because under BA-2/BA-3 we
+control the informer scope and should use the tighter setting).
+Resolution of the apparent §13.4 vs §16.3 conflict: §13.4 documents
+the upstream-OTel default (BA-1 inherits this); §16.3 documents
+tracecore's recommended posture (BA-2/BA-3 own this choice).
+
+**Env-var-as-attributes prior art: not found in the components
+surveyed.** The components I checked
+(`resourcedetectionprocessor`, OTel Operator, `k8sattributesprocessor`,
+OpenInference / LangSmith / Arize semantic conventions) do not project
+arbitrary running-pod env vars into attributes. The OTel Operator
+injects only the OTel SDK config set (`OTEL_RESOURCE_ATTRIBUTES`,
+`OTEL_NODE_IP`, etc.). Components NOT surveyed and worth validating
+before declaring greenfield: `k8sobjectsreceiver` (a generic K8s API
+object collector), Datadog Agent's autodiscovery, Vector's
+`kubernetes_logs` source. The strength of the "greenfield" framing
+is therefore "no prior art among the components surveyed", not "no
+prior art exists."
+
+Implication for OD-1b: two viable paths.
+
+- **Path A (downward API + resourcedetection).** Operators add an env
+  block to their PodSpec / training launcher script:
+  ```yaml
+  env:
+    - name: OTEL_RESOURCE_ATTRIBUTES
+      value: "tracecore.training.rank=$(RANK),tracecore.training.world_size=$(WORLD_SIZE),tracecore.training.job.id=$(TORCHELASTIC_RUN_ID)"
+  ```
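+  Then `resourcedetectionprocessor` with the `env` detector lifts these
+  to resource attributes. No new tracecore code; one shell line per
+  launch script. Caveat to validate before adopting: the `env` detector
+  reads `OTEL_RESOURCE_ATTRIBUTES` from the environment of the process
+  hosting the processor, and `$(VAR)` in a PodSpec only expands
+  variables declared in that container's `env` list, so values set at
+  launch time (e.g. `RANK` from `torchrun`) may not reach the agent.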
+- **Path B (tracecore env-projection processor).** New processor reading
+  `Pod.spec.containers[*].env` via informer; maps configured env names
+  to attributes keyed on `(k8s.pod.uid, k8s.container.name)`. Invisible
+  to operators but adds pod-watch RBAC and a cardinality surface.
+
+Recommend shipping Path A as the default (documented in the receiver
+README) and Path B as an opt-in for unmodified workloads.
+
+### 13.5 Cursor persistence: bbolt wins over JSON
+
+Production landscape (round-2 survey):
+
+| Shipper | Backend | Format | Notes |
+|---|---|---|---|
+| Fluent Bit `tail` | SQLite (WAL) | binary | `.db-shm` / `.db-wal` companions |
+| Vector `file` | JSON | `checkpoints.json` | tmp + atomic rename, human-readable |
+| Datadog Agent | JSON | `registry.json` | optional atomic write toggle |
+| Promtail | YAML | `positions.yaml` | flat path → offset map |
+| OTel filelog | bbolt | binary | via `file_storage` extension |
+
+The rubric's JSON-at-`cursor.json` matches Vector/Datadog convention,
+not OTel. The rubric predates the depend-on-filelog decision; the
+cleanest resolution is to adopt filelog's `file_storage` (bbolt) as
+authoritative and update the rubric (R-6). Optional read-only JSON
+snapshot for human inspection can be a low-frequency side-channel —
+not the source of truth, to avoid dual-cursor consistency bugs.
+
+Config:
+
+```yaml
+extensions:
+  file_storage:
+    directory: /var/lib/tracecore/container_stdout
+    fsync: true
+    compaction:
+      on_start: true
+
+receivers:
+  filelog:
+    storage: file_storage
+    ...
+```
+
+### 13.6 Gzip-compressed rotated files: decompress and continue
+
+Filelog supports `compression: auto` (since contrib v0.142) and
+`compression: gzip`; with `auto`, files matching the gzip suffix are
+read through a transparent decompressing reader.
+
+Known gzip-corruption bugs (`#46105` rotation-with-compression
+corruption; `#45572` last-line-without-newline) are **fixed in contrib
+v0.144+**. Pin and integration-test.
+
+Other shippers:
+
+- Fluent Bit `tail`: no gzip decompression; users typically
+  `exclude_path: *.gz` and accept the gap.
+- Vector `file`: no gzip decompression.
+- Filebeat: supports `.gz` via the `tail` reader with a separate
+  `gzip` processor.
+
+Recommendation: **decompress.** Kubelet's rename-then-gzip flow makes
+"accept the gap" a real data-loss surface on chatty pods during
+pod-death bursts. Add an integration test that writes 50 MB to `0.log`,
+triggers rotation + gzip + new `0.log` in sequence, and asserts zero
+record loss across the seam.
+
+### 13.7 lit-GPT log format
+
+Real samples from
+[`litgpt` issues #1110 and #1607](https://github.com/Lightning-AI/litgpt/issues):
+
+```
+Epoch 4 | iter 962 step 962 | loss train: 0.937, val: 1.057 | iter time: 503.53 ms
+Epoch 34 | iter 3051050 step 610210 | loss train: 2.895, val: 2.861 | iter time: 111.24 ms (step)
+```
+
+Source format string in `litgpt/pretrain.py:fit()`:
+
+```python
+f"Epoch {metrics['epoch'] + 1} | iter {metrics['iter']} step {metrics['step']} |"
+f" loss train: {metrics['loss']:.3f}, val: {val_loss} |"
+f" iter time: {metrics['iter_time'] * 1000:.2f} ms"
+f"{' (step)' if not is_accumulating else ''}"
+```
+
+Fields present: `epoch`, `iter`, `step`, `loss train`, `val`,
+`iter time` (ms), optional ` (step)` marker on optimizer-step
+boundaries. **No per-step data_time** in the printed line.
+
+Regex for extraction (note: source emits ms; receiver must divide):
+
+```regex
+^Epoch\s+\d+\s+\|\s+iter\s+\d+\s+step\s+\d+\s+\|\s+loss train:\s+[\d.]+,\s+val:\s+\S+\s+\|\s+iter time:\s+(?P<iter_time_ms>[\d.]+)\s+ms
+```
+
+Use the trailing `(step)` flag to gate on true optimizer steps; do not
+emit derived metrics on accumulation-only iterations.
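+
+A short Go sketch of that extraction, using the regex above unchanged.
+`parseIterTime` is a hypothetical helper name; the two input lines are
+the samples quoted earlier in this section.
+
+```go
+package main
+
+import (
+	"fmt"
+	"regexp"
+	"strconv"
+	"strings"
+)
+
+// litgptLine is the pattern from the text, compiled with Go's regexp.
+var litgptLine = regexp.MustCompile(`^Epoch\s+\d+\s+\|\s+iter\s+\d+\s+step\s+\d+\s+\|\s+loss train:\s+[\d.]+,\s+val:\s+\S+\s+\|\s+iter time:\s+(?P<iter_time_ms>[\d.]+)\s+ms`)
+
+// parseIterTime returns the iteration time in seconds and whether the line
+// ends with the "(step)" marker. ok is false for non-matching lines.
+func parseIterTime(line string) (seconds float64, isStep, ok bool) {
+	m := litgptLine.FindStringSubmatch(line)
+	if m == nil {
+		return 0, false, false
+	}
+	ms, err := strconv.ParseFloat(m[litgptLine.SubexpIndex("iter_time_ms")], 64)
+	if err != nil {
+		return 0, false, false
+	}
+	// The source prints milliseconds; divide to get seconds.
+	return ms / 1000, strings.HasSuffix(strings.TrimSpace(line), "(step)"), true
+}
+
+func main() {
+	for _, l := range []string{
+		"Epoch 4 | iter 962 step 962 | loss train: 0.937, val: 1.057 | iter time: 503.53 ms",
+		"Epoch 34 | iter 3051050 step 610210 | loss train: 2.895, val: 2.861 | iter time: 111.24 ms (step)",
+	} {
+		s, isStep, ok := parseIterTime(l)
+		fmt.Printf("%.5f %v %v\n", s, isStep, ok)
+	}
+}
+```
+
+A caller would emit derived metrics only when `isStep` is true.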
+
+### 13.8 Kubelet `containerLogMonitorInterval` default
+
+**10 seconds.** Set in `pkg/kubelet/apis/config/v1beta1/defaults.go` in
+`SetDefaults_KubeletConfiguration`; introduced 2024-02-09 in commit
+`ab8c784ee970d72b03fd1c2ed7c228914e17e954` ("kubelet: enable configurable
+rotation duration and parallel rotate"), unchanged since.
+
+```go
+if obj.ContainerLogMonitorInterval == nil {
+    obj.ContainerLogMonitorInterval = &metav1.Duration{Duration: 10 * time.Second}
+}
+```
+
+Worst-case rotation latency on defaults: ~10 s (one monitor period) plus
+single-worker queue drain (`ContainerLogMaxWorkers` default `1`). A
+container emitting >`ContainerLogMaxSize` (10 MiB default) in <10 s can
+briefly exceed the cap before kubelet rotates. Relevant for rubric R-8's
+"sustained period" definition: 30 s is a safe threshold for "rotation
+stalled" alerting (3× the monitor period).
+
+### 13.9 PodLogs API KEP status: dead
+
+**No upstream replacement path.** KEP-3059 ("Add pod level logs API")
+was filed 2021-11-27 as `[WIP]`, never assigned a sig-node owner, and
+auto-closed via `lifecycle/rotten` without reaching alpha
+(<https://github.com/kubernetes/enhancements/issues/3059>). No
+replacement KEP has been opened. Adjacent work (KEP-2411 CRI log
+rotation, KEP-1602 structured logging, KEP-1753 logs sanitization,
+KEP-3077 contextual logging) does not promote container logs to a
+streaming API.
+
+The status quo — kubelet `/containerLogs/{ns}/{pod}/{container}`
+proxied through apiserver `GET .../log` plus node-local file tailing —
+is the only supported pattern. **Tracecore M15 design does not need to
+anticipate a near-term upstream shift.**
+
+### 13.10 `file_storage` extension observability surface
+
+**Zero metrics. Five log lines. No collector-side health metric.**
+
+Source: `extension/storage/filestorage/extension.go`, `factory.go`.
+
+Log lines emitted (via component logger):
+
+1. `Warn` — "filename too long, using hashed filename instead"
+2. `Warn` — "Database corruption detected, recreating database file"
+   (fires from `recover()` panic handler)
+3. `Info` — "Corrupted database file renamed" (backup path)
+4. `Error` — "compaction on start failed"
+5. `Info`/`Debug` — "cleanup" / "cleanup error listing temporary files"
+
+Notably absent:
+
+- No lock-timeout log when bbolt can't acquire the file lock within the
+  configured `Timeout` (default 1 s); the error propagates to the
+  caller (filelog) as a component error.
+- No metric instrumenting rebuild count, lock-wait, or db-size growth.
+
+**Tracecore alerting strategy (M15 RUNBOOK):**
+
+- **Log-string alert** on `"Database corruption detected"` and
+  `"compaction on start failed"` from logger name
+  `extension/file_storage`. This is the only collector-side signal.
+- **Host metrics** on the storage directory via `hostmetricsreceiver`
+  (`filesystem` scraper): disk free, inode pressure, db file mtime
+  drift as liveness proxy.
+- **Downstream signal:** filelog's
+  `otelcol_receiver_refused_log_records` rising while the storage
+  path is unreachable. Indirect but observable.
+
+Defaults that matter: `Timeout: 1s`, `Compaction.OnStart: false`,
+`Compaction.OnRebound: false`, `Compaction.CheckInterval: 5s`,
+`FSync: false`, `DirectoryPermissions: 0750`. Recommend
+`Compaction.OnStart: true` and `FSync: true` for tracecore's
+production posture; both add small startup/write cost but improve
+crash safety.
+
+### 13.11 `container` operator vs `k8sattributes` attribute overlap
+
+**Status banner (read before continuing):** the pipeline YAML below is
+**illustrative of the semantic shape**, not the deployable
+configuration. Tracecore's pipeline runtime is not the OTel Collector
+(see §15). Under any build approach other than BA-1, the components
+named here (filelog, k8sattributes, resourcedetection, transform,
+file_storage) cannot run as-shown. The YAML is preserved as a
+specification of *which concerns* the receiver and its surrounding
+processors must address, not *how* they are wired. Under BA-2/BA-3,
+each concern becomes a tracecore-native component with equivalent
+behavior.
+
+**Operator wins by default; processor fills in gaps.** Three attributes
+overlap (`k8s.namespace.name`, `k8s.pod.name`, `k8s.pod.uid`) but
+`k8sattributesprocessor.setResourceAttribute` skips non-empty
+existing values:
+
+```go
+func setResourceAttribute(attributes pcommon.Map, key, val string) {
+    attr, found := attributes.Get(key)
+    if !found || attr.AsString() == "" {
+        attributes.PutStr(key, val)
+    }
+}
+```
+
+**Canonical pipeline config** that avoids double-extraction and
+exercises both components' strengths:
+
+```yaml
+receivers:
+  filelog/containerstdout:
+    include: [/var/log/pods/*/*/*.log]
+    start_at: end
+    storage: file_storage
+    compression: auto                      # OD-6: decompress rotated .gz
+    operators:
+      - type: container                    # writes k8s.{pod.uid,pod.name,namespace.name,container.name,container.restart_count}
+
+processors:
+  k8sattributes/training:
+    auth_type: serviceAccount
+    pod_association:
+      - sources:
+          - from: resource_attribute
+            name: k8s.pod.uid              # operator already set this
+    extract:
+      metadata:
+        - k8s.pod.start_time               # not duplicated by operator
+        - k8s.deployment.name
+        - k8s.statefulset.name
+        - k8s.daemonset.name
+        - k8s.job.name
+        - k8s.node.name
+      labels:
+        - tag_name: training.kubeflow.org/job-name
+          key: training.kubeflow.org/job-name
+          from: pod
+        - tag_name: training.kubeflow.org/replica-type
+          key: training.kubeflow.org/replica-type
+          from: pod
+        - tag_name: training.kubeflow.org/replica-index
+          key: training.kubeflow.org/replica-index
+          from: pod
+        - tag_name: jobset.sigs.k8s.io/jobset-name
+          key: jobset.sigs.k8s.io/jobset-name
+          from: pod
+
+  resourcedetection/training:
+    detectors: [env]                       # OD-1b Path A: lifts $OTEL_RESOURCE_ATTRIBUTES
+    override: false                        # preserve any operator-set values
+
+  transform/training_dataloader:           # §14.1 — OTTL recipe for dataloader regex
+    error_mode: ignore
+    log_statements:
+      - context: log
+        statements:
+          - merge_maps(attributes,
+              ExtractPatterns(body,
+                "\\btime:\\s+(?P<iter_time_s_str>\\d+(?:\\.\\d+)?)\\b.*?\\b(?:data_time|data):\\s+(?P<data_time_s_str>\\d+(?:\\.\\d+)?)\\b"),
+              "upsert")
+            where IsString(body) and IsMatch(body, "\\btime:.*\\b(?:data_time|data):")
+          - set(attributes["tracecore.training.iter_time_s"], Double(attributes["iter_time_s_str"]))
+            where attributes["iter_time_s_str"] != nil
+          - set(attributes["tracecore.training.data_time_s"], Double(attributes["data_time_s_str"]))
+            where attributes["data_time_s_str"] != nil
+          - delete_key(attributes, "iter_time_s_str")
+          - delete_key(attributes, "data_time_s_str")
+
+extensions:
+  file_storage:                            # OD-5: bbolt cursor persistence
+    directory: /var/lib/tracecore/container_stdout
+    fsync: true
+    compaction:
+      on_start: true
+
+service:
+  extensions: [file_storage]
+  pipelines:
+    logs/containerstdout:
+      receivers: [filelog/containerstdout]
+      processors:
+        - k8sattributes/training
+        - resourcedetection/training
+        - transform/training_dataloader
+        - tracecore_ratelimit                # OD-3: per-(pod_uid, container) token bucket
+      exporters: [...]
+```
+
+This is **not the final M15 config** — it's the canonical shape that
+the receiver wrapper should produce or document, with tracecore-specific
+defaults baked in.
+
+### 13.12 Industry training-observability namespace survey
+
+Surveyed 10 platforms; **no shared namespace exists for distributed-training
+signals.** Each platform invents its own:
+
+| Platform | Namespace | Rank attribution |
+|---|---|---|
+| NVIDIA Triton | `nv_*` (flat snake_case Prometheus) | `gpu_uuid` label, no rank |
+| NVIDIA NeMo | None (flat PTL keys: `train_step_timing`) | Not first-class |
+| Google Vertex AI | `AIP_*` env vars; no metric namespace | JSON `CLUSTER_SPEC` env |
+| AWS SageMaker | `/aws/sagemaker/TrainingJobs` CW namespace | Encoded in `Host` dimension |
+| AWS Bedrock fine-tuning | None — no metric stream | API/EventBridge only |
+| MosaicML Composer | Slash-path (`throughput/batches_per_sec`) | W&B/MLflow run tag |
+| Together fine-tuning | Flat object fields | Not exposed |
+| OpenAI fine-tuning | Flat fields under `data` | Not exposed |
+| Weights & Biases | `gpu.*`, `gpu.process.*` | Per-rank run |
+| MLflow | Flat snake_case (`gpu_{i}_*`) | Per-rank run |
+
+**Three robust patterns emerge from the survey:**
+
+1. **Rank is universally a resource-level identifier**, not a per-metric
+   label. SageMaker bakes it into `Host`; W&B/MLflow tag the run;
+   Composer stores it as a run attribute. Tracecore should follow:
+   `tracecore.training.rank` is a *resource attribute*, set once per
+   process, never a record-level label.
+2. **No vendor uses a shared namespace.** `nv_*` is Triton-only. `gpu.*`
+   is W&B-only. Tracecore's `tracecore.training.*` is consistent with
+   the prevailing pattern (vendor-prefixed for unstandardized concepts).
+3. **`process.runtime.*` does not fit.** SemConv's process registry is
+   explicitly OS-process + language-VM metadata; zero collective-comm
+   vocabulary. Filing an upstream PR is unlikely to land given the
+   GenAI SIG's inference-only scope. `system.*` is forbidden by spec
+   for non-host metrics. There is no upstream group that accepts
+   distributed-training attributes today.
+
+**Verdict (caveated): the surveyed industry has no shared
+distributed-training namespace.** The sample is biased: it covers the
+ten most visible commercial / cloud-vendor platforms (Triton, NeMo,
+Vertex, SageMaker, Bedrock, Composer, Together, OpenAI, W&B, MLflow),
+which are exactly the actors with the market power to invent and
+sustain a vendor-prefixed namespace. Smaller / newer platforms not
+surveyed in this pass include **ClearML, Determined AI, SkyPilot,
+lit-Lightning, Anyscale**, any of which may use shared conventions
+inherited from PyTorch / MLflow rather than inventing their own.
+
+The conclusion "no shared namespace exists" is therefore "no shared
+namespace exists among the platforms surveyed." Under §7's revised
+framing, this verdict is also less decisive than it appeared: tracecore
+is actively trying to *create* the shared namespace via NORTHSTARS O4.
+A small-platform follow-up survey would either reveal an existing
+convention worth aligning with or confirm O4's hypothesis that no
+convention exists yet.
+
+Round-1 used this survey to corroborate `tracecore.training.*` as the
+right namespace. §7's revision withdraws that recommendation pending
+the O4 owner's call. The survey remains a useful negative-evidence
+artifact, not a positive recommendation.
+
+### 13.13 `TORCHELASTIC_RUN_ID` in `--standalone` mode
+
+Source: `torch/distributed/run.py` in
+[pytorch/pytorch](https://github.com/pytorch/pytorch).
+
+```python
+if args.standalone:
+    args.rdzv_backend = "c10d"
+    args.rdzv_endpoint = "localhost:0"
+    args.rdzv_id = str(uuid.uuid4())
+```
+
+That `rdzv_id` is passed to `LaunchConfig(run_id=args.rdzv_id, ...)` and
+surfaced to workers as `TORCHELASTIC_RUN_ID`.
+
+- **Random `uuid4()` per launcher invocation.** Siblings within one
+  launch share the value; cross-launch correlation is zero.
+- **No correlation to PID, hostname, or wall clock.** Purely random
+  bits.
+- **Practical implication.** Intra-launch grouping key only. Do not
+  treat as a stable job identifier; a crash-loop produces a new UUID
+  every restart. For job-level attribution, pair with an orchestrator
+  identifier: `training.kubeflow.org/job-name`, `SLURM_JOB_ID`, or
+  `jobset.sigs.k8s.io/jobset-uid`.
+
+Receiver README must document this so operators don't store
+`tracecore.training.job.id = $TORCHELASTIC_RUN_ID` and expect cross-restart
+joins to work.
+
+## 14. Pipeline-shape illustration and OTTL recipe (semantics-only)
+
+**Status banner.** The OTTL recipe in this section is **specification
+of the desired semantic behavior**, not a runnable Collector config. It
+has not been validated against `otelcol validate` or executed against a
+real pipeline. Under §15.1 BA-0 (sidecar collector), the recipe would
+run inside the sidecar; under BA-1, it would run via the adapter;
+under BA-2 or BA-3, the equivalent must be reimplemented as a native
+tracecore processor using `regexp` from the Go standard library. The
+syntax shown was checked against the OTTL `ottlfuncs` README on
+2026-05-19 but not exercised. Treat as a design hint, not as a
+deliverable.
+
+The pipeline config sketched in §13.11 captures the canonical shape.
+Notable elements:
+
+- **`compression: auto`** on the filelog receiver (OD-6).
+- **`storage: file_storage`** + `fsync: true` + `compaction.on_start: true`
+  (OD-5).
+- **`k8sattributes` with `pod_association` keyed on `k8s.pod.uid`**.
+  Default IP-based association doesn't work for local file tails.
+- **`resourcedetection` with `env` detector** for OD-1b Path A: lifts
+  `OTEL_RESOURCE_ATTRIBUTES=tracecore.training.rank=$(RANK),...` set
+  by the training launcher via downward API.
+- **OTTL `transform` processor** for dataloader regex extraction, with
+  named capture groups, body-type guard (`IsString`), match-shortcut
+  (`IsMatch`), explicit `Double(...)` coercion, and temp-key cleanup.
+  This resolves OD-4 as config-only; no new Go code required.
+- **`tracecore_ratelimit` processor** (new) for per-key token bucket
+  on `(k8s.pod.uid, k8s.container.name)` (OD-3).
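+
+The per-key token bucket in the last bullet can be sketched in a few
+lines of stdlib Go. This is a minimal illustration under assumptions,
+not the `tracecore_ratelimit` design: the `limiter` type, its `rate`
+and `burst` fields, and the string key format are hypothetical, and
+the clock is passed in so the behavior is deterministic.
+
+```go
+package main
+
+import (
+	"fmt"
+	"time"
+)
+
+// bucket holds the token balance for one (pod UID, container) key.
+type bucket struct {
+	tokens float64
+	last   time.Time
+}
+
+// limiter is a hypothetical per-key token bucket. It is not safe for
+// concurrent use; a real processor would add locking.
+type limiter struct {
+	rate    float64 // tokens added per second
+	burst   float64 // maximum stored tokens
+	buckets map[string]*bucket
+}
+
+func newLimiter(rate, burst float64) *limiter {
+	return &limiter{rate: rate, burst: burst, buckets: map[string]*bucket{}}
+}
+
+// at builds a timestamp from whole seconds, for deterministic examples.
+func at(sec int64) time.Time { return time.Unix(sec, 0) }
+
+// allow reports whether one record for key may pass at time now.
+func (l *limiter) allow(key string, now time.Time) bool {
+	b, ok := l.buckets[key]
+	if !ok {
+		b = &bucket{tokens: l.burst, last: now}
+		l.buckets[key] = b
+	}
+	// Refill in proportion to elapsed time, capped at burst.
+	b.tokens += now.Sub(b.last).Seconds() * l.rate
+	if b.tokens > l.burst {
+		b.tokens = l.burst
+	}
+	b.last = now
+	if b.tokens < 1 {
+		return false
+	}
+	b.tokens--
+	return true
+}
+
+func main() {
+	l := newLimiter(1, 2) // 1 record/s with a burst of 2
+	key := "pod-uid-a/trainer"
+	fmt.Println(l.allow(key, at(0)), l.allow(key, at(0)), l.allow(key, at(0)))
+	fmt.Println(l.allow(key, at(1)))
+}
+```
+
+Keying the map on pod UID plus container name means one noisy
+container exhausts only its own bucket; the bucket map itself still
+needs a cardinality bound under pod churn.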
+
+### 14.1 OTTL recipe nuances
+
+Worth calling out for the design phase:
+
+- **`ExtractPatterns` requires named capture groups** (Go regex
+  identifier rules; no dots in the name). Use temp keys then rename in
+  a second step.
+- **Body-type guard is mandatory.** Structured logs where `body` is a
+  map will crash the extract under `error_mode: propagate`. Under
+  `error_mode: ignore` it silently no-ops but is slower; the `IsString`
+  + `IsMatch` guards short-circuit cleanly.
+- **Type coercion is explicit.** `ExtractPatterns` always returns
+  strings; downstream gauges/histograms need floats. `Double(...)`
+  returns `nil` on parse failure (per OTTL `ottlfuncs` README — verified
+  2026-05-19), so the `!= nil` guard is required to skip rather than
+  emit a null-valued attribute.
+- **Pattern configurability:** OTTL has no env-var interpolation;
+  expose the regex pattern via Collector-config template substitution
+  (`confmap` providers) at startup, not inside OTTL.
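+
+For the BA-2/BA-3 case, where the recipe has to become a native
+processor, the same guards translate to stdlib Go roughly as follows.
+This is a sketch under assumptions: `extractWaitMs`, the `wait_ms`
+group name, and the log-line pattern are hypothetical and are not
+taken from the recipe in §13.11.
+
+```go
+package main
+
+import (
+	"fmt"
+	"regexp"
+	"strconv"
+)
+
+// dataloaderRE is a hypothetical pattern. The named group uses an
+// identifier-style name (no dots), matching the first bullet above.
+var dataloaderRE = regexp.MustCompile(`dataloader wait=(?P<wait_ms>[0-9.]+)ms`)
+
+// extractWaitMs applies the guards in order: the body must be a
+// string, the pattern must match, and the capture must parse as a
+// float. Any failure skips the record instead of emitting a value.
+func extractWaitMs(body any) (float64, bool) {
+	s, ok := body.(string) // body-type guard (IsString equivalent)
+	if !ok {
+		return 0, false
+	}
+	m := dataloaderRE.FindStringSubmatch(s) // nil when there is no match
+	if m == nil {
+		return 0, false
+	}
+	raw := m[dataloaderRE.SubexpIndex("wait_ms")]
+	v, err := strconv.ParseFloat(raw, 64) // explicit coercion
+	if err != nil {
+		return 0, false // skip rather than emit a null-valued attribute
+	}
+	return v, true
+}
+
+func main() {
+	fmt.Println(extractWaitMs("step 12 dataloader wait=41.5ms"))
+	fmt.Println(extractWaitMs(map[string]any{"msg": "structured body"}))
+	fmt.Println(extractWaitMs("dataloader wait=1.2.3ms"))
+}
+```
+
+The third call matches the pattern but fails float parsing, which is
+the case the `!= nil` guard covers in the OTTL version.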
+
+## 15. Correction: tracecore is not an OTel Collector distribution
+
+**Discovered during the 2026-05-19 confidence-raising pass; invalidates
+significant parts of §5, §13.11, and §14 as originally written. Those
+sections are left in place for the audit trail; this section is the
+load-bearing one.**
+
+Tracecore has its **own** pipeline runtime under
+[`internal/pipeline/`](../../internal/pipeline). Quoting
+`internal/pipeline/factory.go:72-79`:
+
+```go
+type ReceiverFactory interface {
+    Type() Type
+    CreateDefaultConfig() Config
+
+    CreateMetrics(ctx context.Context, set CreateSettings, cfg Config, next consumer.Metrics) (Receiver, error)
+    CreateTraces(ctx context.Context, set CreateSettings, cfg Config, next consumer.Traces) (Receiver, error)
+    CreateLogs(ctx context.Context, set CreateSettings, cfg Config, next consumer.Logs) (Receiver, error)
+}
+```
+
+This **mirrors** upstream `go.opentelemetry.io/collector/receiver.Factory`
+at v1.55.0 in shape (per the source comment in `factory.go:37-38`), but
+is **a different Go type**. Upstream filelog's `receiver.Factory` does
+not satisfy tracecore's `pipeline.ReceiverFactory`; the interfaces have
+different method names (`CreateLogs` vs `CreateLogsReceiver`), different
+parameter types (`pipeline.CreateSettings` vs `receiver.Settings`,
+`pipeline.Config` vs `component.Config`), and tracecore's version omits
+the `*ReceiverStability()` methods.
+
+Confirmed by grep across the repo: only test files in
+`components/receivers/kernelevents/otelcontrib_e2e_test.go` import the
+upstream `go.opentelemetry.io/collector/receiver` packages, and only as
+test-time end-to-end fixtures. No runtime adapter or shim exists.
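+
+The shape of the missing shim can be shown with stand-in types. The
+sketch below is an assumption-laden illustration, not tracecore or
+upstream code: every type here is a local stand-in, the settings
+structs carry a single made-up `ID` field, and the upstream method
+name is simply the one quoted in the paragraph above.
+
+```go
+package main
+
+import (
+	"context"
+	"errors"
+	"fmt"
+)
+
+// Stand-ins for illustration only; none of these are the real types.
+type logsConsumer interface {
+	ConsumeLogs(ctx context.Context, batch string) error
+}
+type receiver interface{ Start(ctx context.Context) error }
+type tcSettings struct{ ID string } // plays pipeline.CreateSettings
+type upSettings struct{ ID string } // plays receiver.Settings
+
+// tcFactory plays tracecore's pipeline.ReceiverFactory (logs only).
+type tcFactory interface {
+	CreateLogs(ctx context.Context, set tcSettings, next logsConsumer) (receiver, error)
+}
+
+// upFactory plays the upstream factory as the text describes it.
+type upFactory interface {
+	CreateLogsReceiver(ctx context.Context, set upSettings, next logsConsumer) (receiver, error)
+}
+
+// adapter bridges upFactory to tcFactory by translating settings.
+type adapter struct{ up upFactory }
+
+func (a adapter) CreateLogs(ctx context.Context, set tcSettings, next logsConsumer) (receiver, error) {
+	if a.up == nil {
+		return nil, errors.New("adapter: nil upstream factory")
+	}
+	return a.up.CreateLogsReceiver(ctx, upSettings{ID: set.ID}, next)
+}
+
+// Compile-time check: the adapter satisfies the tracecore-side interface.
+var _ tcFactory = adapter{}
+
+// noopFactory and noopReceiver stand in for an upstream component.
+type noopReceiver struct{ id string }
+
+func (noopReceiver) Start(context.Context) error { return nil }
+
+type noopFactory struct{}
+
+func (noopFactory) CreateLogsReceiver(_ context.Context, set upSettings, _ logsConsumer) (receiver, error) {
+	return noopReceiver{id: set.ID}, nil
+}
+
+// buildWith creates a receiver through the adapter.
+func buildWith(up upFactory, id string) (receiver, error) {
+	return adapter{up: up}.CreateLogs(context.Background(), tcSettings{ID: id}, nil)
+}
+
+func main() {
+	r, err := buildWith(noopFactory{}, "container_stdout")
+	fmt.Println(r, err)
+}
+```
+
+The real difficulty lies in the settings translation, which this
+sketch reduces to one field; the actual structs carry telemetry and
+build-info fields whose mapping is the open risk named under BA-1.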
+
+### 15.1 What this means for the build approach
+
+**You cannot "depend on filelog" as §5 originally said.** Five viable
+paths. The adversarial review surfaced two that round-2 missed (BA-0,
+BA-4); they are included here.
+
+| ID | Approach | Cost | Trade-offs |
+|---|---|---|---|
+| BA-0 | **Sidecar otelcol DaemonSet.** Run upstream `otelcol-k8s` as a sibling DaemonSet (separate from the tracecore DaemonSet); configure its filelog receiver and exporters; have it ship OTLP to tracecore as an upstream producer. Tracecore consumes that OTLP via its existing receiver path. | Lowest. No tracecore code change. Operator-config-only. | Adds a second pod per node. Two binaries to keep current. Two RBAC sets. Resource overhead duplicates what tracecore already provides for hostmetrics. Loses any tracecore-specific receiver knobs (per-rank attribution, dataloader regex, tracecore rate-limiting) unless we also build them as native processors downstream. |
+| BA-1 | Build a generic `pipeline.OTelReceiverFactory` adapter in `internal/pipeline/` that wraps any upstream `receiver.Factory`. Import `filelogreceiver` through the adapter. | High up front; pays back across future imports of OTel components. | Once-and-done infrastructure. Future M16 (Kueue scraper) and other receivers benefit. Risk: subtle semantic mismatches in `CreateSettings`/`TelemetrySettings` field mapping. Reasonable mitigation: `plog.Logs` IS what tracecore's `consumer.Logs` already takes (verified at `internal/consumer/logs.go:13`), so data-model interop is not the problem; factory-interface bridging is. |
+| BA-2 | Port `pkg/stanza/fileconsumer` and the `container` operator as a vendored Go dependency, write tracecore-native wiring (factory, config, lifecycle) around them. The consumed types are `plog.Logs` and `pcommon.Map`, both already in tracecore's data path (`internal/consumer/logs.go:13`). | Medium; one-time port + tracking upstream releases. | Inherits filelog's tail mechanics without the upstream-receiver-interface coupling. Vendor-update burden on every contrib release we care about. Data-model interop verified clean per `plog.Logs` finding above. |
+| BA-3 | Reimplement on top of `fsnotify` or polling + custom CRI parser. | Highest; we own the rotation/fingerprint/partial-line bugs forever. | No upstream dependency surface. Reinvents proven code. Round-1 finding was that this is exactly what we shouldn't do. |
+| BA-4 | **Defer M15. Promote the adapter to its own milestone first.** If BA-1 is the right answer architecturally but only justifies its cost when ≥2 upstream components ride on it, ship the adapter as its own milestone (M-Adapter, between Lane 1 and Lane 4), then ship M15 against it as the first user. | Low for the M15 sequencing decision; the adapter cost moves to a new milestone. | Scope-discipline win: M15 doesn't carry the adapter's design overhead. Calendar cost: M15 ships later. Surfaces the build-approach question at a milestone-planning level (the right level), not as a Day-1 M15 design decision. |
+
+**No single recommendation in this research pass.** The choice depends
+on stakeholder questions this doc cannot answer alone:
+
+1. Is M16 (Kueue receiver) intended to consume upstream OTel
+   components? Verified evidence (MILESTONES.md lines 377-395): M16 is
+   a Prometheus scrape against `kueue-controller-manager`, emits OTLP
+   metrics, has a custom `cluster_queue` cardinality cap (default 256),
+   and registers via `components.go` one-line factory. This rubric
+   **could** be implemented either way: (a) tracecore-native scraper
+   (no BA-1 dependency, ~mid-size work), or (b) wrap upstream OTel
+   `prometheusreceiver` via BA-1's adapter (reuses upstream Prom +
+   histogram + label-translation machinery, but commits to BA-1's
+   adapter cost). The rubric does not mandate either path. **Net:
+   M16's owner has a real choice.** If they pick (b), BA-1 amortizes
+   across both receivers; if they pick (a), BA-1 serves only M15 and
+   BA-2 becomes the cheaper choice for tracecore overall.
+2. Is the additional pod-per-node footprint of BA-0 acceptable in
+   tracecore's "minimal-privilege, single-binary" positioning? Per
+   `NORTHSTARS.md` O2 (Convenience) — likely no, but it's a stakeholder
+   call.
+3. Is scope-discipline more valuable than calendar speed? BA-4 explicitly
+   trades the latter for the former.
+
+Round-1 recommended BA-1; round-2's correction implied BA-1 or BA-2;
+round-3 (this) recognizes BA-0 and BA-4 as legitimate options that
+should be on the table before any commitment.
+
+This decision should be made before any M15 implementation work, with
+M16's owner, the O2 stakeholder, and the milestone-planning lead in the
+room.
+
+### 15.2 What this means for the pipeline config in §13.11
+
+**Most of §13.11 is illustrative-only.** Tracecore's pipeline runtime
+does not load an `otelcol`-style YAML config; it has its own loader and
+component graph. The example pipeline shows what an equivalent
+upstream-OTel pipeline would look like, useful for understanding which
+*concerns* need to be addressed, but **not deployable as-shown**.
+
+The concerns that still apply, with adjusted owners:
+
+- **CRI parsing + partial-line recombine.** Either ported from
+  `pkg/stanza/operator/parser/container` (BA-2) or invoked through the
+  adapter (BA-1). Owner: the M15 receiver.
+- **Cursor persistence.** No filelog `file_storage` extension; the
+  receiver owns this. Reconsider OD-5: under BA-1 the bbolt path is
+  re-attainable through the adapter; under BA-2/BA-3 the rubric's
+  original JSON-at-`cursor.json` becomes the sane choice. **Rubric edit
+  R-6 is now conditional on BA-1.**
+- **Pod attribution.** `k8sattributesprocessor` is also an upstream
+  component; same adapter question. If we don't have BA-1, tracecore
+  owns the informer + attribute-projection logic. The k8sevents
+  informer wiring (verified in §8) is the template.
+- **Dataloader regex.** OTTL `transformprocessor` is upstream-only too.
+  Without BA-1, this needs to be a native tracecore processor (config:
+  `dataloader_regex` string; behavior: ExtractPatterns-equivalent
+  using `regexp` stdlib). The OTTL recipe in §14 is illustrative of
+  the *semantics* but cannot run as-written.
+- **Rate limiting.** Native tracecore processor regardless of build
+  approach; no upstream equivalent existed anyway (OD-3 conclusion
+  unchanged).
+- **Env-var projection.** Native tracecore (Path B in OD-1b);
+  upstream `resourcedetectionprocessor` only reads
+  `OTEL_RESOURCE_ATTRIBUTES`, which is Path A.
+
+### 15.3 What this means for §10 open decisions
+
+New open decision and updates:
+
+| ID | Status | Update |
+|---|---|---|
+| OD-11 | OPEN (new) | **Build approach choice: BA-0 (sidecar) / BA-1 (adapter) / BA-2 (port `fileconsumer`) / BA-3 (reimplement) / BA-4 (defer).** See §15.1 and §15.6. Decision needs M16 owner input. |
+| OD-3 | RESOLVED | Unchanged: native tracecore processor. |
+| OD-4 | OPEN (re-opened) | OTTL `transformprocessor` is upstream-only. Native tracecore processor with `regexp` is the new default unless BA-1 lands first. |
+| OD-5 | OPEN (re-opened) | bbolt vs JSON depends on BA-1 vs BA-2/BA-3. Under BA-1, keep R-6 (file_storage). Under BA-2/BA-3, revert to rubric's JSON cursor. |
+| OD-7 | RESOLVED → SUPERSEDED by OD-11 | "Factory-only vs Go embedding" was an upstream-OTel framing; the actual question is now BA-1 vs BA-2 vs BA-3. |
+
+### 10.1 Open-decision owner table
+
+| OD | Open question | Primary owner | Stakeholders | Target decision date |
+|---|---|---|---|---|
+| OD-1b | Env-var projection path: downward-API or informer | Receiver owner | Operator (deployment cost), Security reviewer (RBAC scope) | Before M15 design doc opens |
+| OD-2 | Process-rank regex default | Receiver owner | Operator (regex override workflow) | Before M15 alpha |
+| OD-8 | Bench harness coupling with M5 | M5 owner | M15 receiver owner | Before M5 starts harness work |
+| OD-9 | Filelog feature-gate posture | Receiver owner (BA-1 conditional) | Upstream-tracking lead | After OD-11 lands |
+| OD-10 | Binary-size delta measurement | Receiver owner | Project lead (binary-size budget) | After BA-1 vs BA-2 spike |
+| OD-11 | Build-approach (BA-0..BA-4) | **Milestone-planning lead** | M16 owner, O2 stakeholder, Receiver owner | **Blocking; resolve in 1 week per §15.6** |
+| OD-12 (new) | O4 namespace posture (hold/hedge/concede) | **O4 owner** per NORTHSTARS.md line 204 | Project lead, M13/M14/M18 owners (cross-receiver join contract) | Before any rubric edit affecting `gen_ai.training.*` |
+
+"OD-12" is added in this pass to make the namespace decision an
+explicit open item rather than embedded in §7's narrative. The §11
+R-1 row references this OD.
+
+### 15.6 How to resolve OD-11 in one week
+
+OD-11 (build approach) is the load-bearing open decision blocking
+six other ODs (OD-1b informer choice, OD-3 rate-limit processor, OD-4
+OTTL vs native, OD-5 cursor backend, OD-9 feature gates, OD-10 binary
+size). The schedule and decision matrix below compress the decision
+to about three working days plus one async review.
+
+**Day 1: BA-1 feasibility spike (1 day).**
+Owner: Receiver implementer. Goal: prove or disprove that a
+`pipeline.OTelReceiverFactory` adapter can wrap an upstream
+`receiver.Factory` cleanly. Deliverable: 200-LoC adapter skeleton
+that satisfies the tracecore `pipeline.ReceiverFactory` interface and
+delegates to upstream-OTel calls. Pass = adapter compiles and unit-tests
+against a no-op upstream factory. Fail = field-mapping mismatch that
+requires invasive changes to `internal/pipeline/`. Time cap: if the
+spike takes more than 1 day, BA-1 is implicitly downgraded.
+
+**Day 2: M16 owner async sign-off (~2 hours total).**
+The doc has the evidence (MILESTONES.md lines 377-395; §15.1, question 1)
+that M16 *could* go either tracecore-native or upstream-`prometheusreceiver`-via-BA-1.
+Ask M16's owner: "If BA-1 lands as M15 infra, would you build M16 on
+top of it, or build M16 native?" A two-line answer is sufficient. If
+the answer is "on BA-1", BA-1 amortizes across both receivers (and is
+the recommended path, subject to the Day 1 spike). If the answer is
+"native", BA-1 serves only M15 and BA-2 is the cheaper alternative.
+
+**Day 3: stakeholder decision (1 hour).**
+M15 receiver owner + Project lead + O2 stakeholder + (optional) M16
+owner. Decision matrix:
+
+| M16 owner says | BA-1 spike result | Decision |
+|---|---|---|
+| Will use BA-1 | Pass | **BA-1.** Amortization confirmed; ship adapter as M15-precursor work. |
+| Will use BA-1 | Fail | **BA-2.** Adapter is too costly; port `fileconsumer` natively. M16 also goes native. |
+| Will go native | Pass | **BA-2.** Adapter only serves M15; not worth standing infrastructure for one user. |
+| Will go native | Fail | **BA-2 or BA-3.** Both carry M15-only cost; lean BA-2 for its prior-art benefit. |
+| No answer | Pass | **BA-2 with adapter spike preserved.** Future M16 owner can retroactively adopt. |
+| No answer | Fail | **BA-2.** Default to the tractable option. |
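+
+The matrix is small enough to state as a pure function, which is a
+convenient way to check that no row is missing. This is only a
+restatement of the table; the `m16Answer` type and `decide` name are
+hypothetical.
+
+```go
+package main
+
+import "fmt"
+
+// m16Answer is the Day 2 response from M16's owner.
+type m16Answer int
+
+const (
+	willUseBA1 m16Answer = iota
+	willGoNative
+	noAnswer
+)
+
+// decide returns the Day 3 outcome for one row of the matrix.
+func decide(m16 m16Answer, spikePassed bool) string {
+	switch {
+	case m16 == willUseBA1 && spikePassed:
+		return "BA-1"
+	case m16 == willGoNative && !spikePassed:
+		return "BA-2 or BA-3 (lean BA-2)"
+	case m16 == noAnswer && spikePassed:
+		return "BA-2 with adapter spike preserved"
+	default:
+		// Remaining rows: willUseBA1+fail, willGoNative+pass, noAnswer+fail.
+		return "BA-2"
+	}
+}
+
+func main() {
+	for _, a := range []m16Answer{willUseBA1, willGoNative, noAnswer} {
+		for _, pass := range []bool{true, false} {
+			fmt.Printf("answer=%d pass=%v -> %s\n", a, pass, decide(a, pass))
+		}
+	}
+}
+```
+
+Five of the six rows land on some form of BA-2, which matches the
+fallback rule later in this section.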
+
+**BA-0 and BA-4 as escape hatches.**
+- BA-0 (sidecar otelcol) becomes the recommended path if the Day 1
+  spike fails AND O2 stakeholder accepts the additional pod-per-node
+  footprint AND operator UX cost is acceptable. Lower likelihood;
+  needs explicit stakeholder agreement.
+- BA-4 (defer M15) becomes the recommended path if the Day 1 spike
+  succeeds but Day 2 + Day 3 leave BA-1 amortization ambiguous AND
+  milestone-planning lead prefers to ship adapter-as-infrastructure
+  first. Lower likelihood; needs explicit milestone-planning sign-off.
+
+**Day 4 (async): write the design-doc resolution.**
+Receiver owner drafts the single-paragraph "we chose X because Y"
+section that locks the choice. Reviewers async-sign. Design phase
+proceeds.
+
+**Fallback rule for stakeholder unavailability (per Reviewer B P1):**
+If M16's owner is unavailable within the Day 2 window, the Receiver
+owner defaults to BA-2 (port `fileconsumer`). Rationale: BA-2's cost is
+borne by tracecore unilaterally; M16 owner can later retroactively
+adopt the same model. Choosing BA-2 by default never blocks M16; the
+inverse (defaulting to BA-1) commits adapter infrastructure that may
+not be amortized.
+
+### 15.4 Why this wasn't caught earlier
+
+The round-1 internal-repo agent surveyed the k8sevents receiver
+structurally (lifecycle, config, factory pattern, RBAC golden) but
+**did not surface that the `pipeline.ReceiverFactory` interface is
+tracecore-owned, not upstream-OTel-derived**. The factory.go file
+header comment names "OTel component.Settings shape at v1.55.0" as the
+mirror reference, which made the interface look like it might be the
+upstream one. Lesson: when relaying interface-membership claims through
+a sub-agent, verify the actual import path of the interface, not just
+its shape.
+
+This is exactly the kind of layer-of-indirection error my §1 confidence
+self-assessment flagged for §8 ("trusted the agent's report, not read
+the source"). Reading the source on the confidence-raising pass caught
+it.
+
+### 15.5 Updated confidence
+
+| Section | Pre-correction | Post-correction |
+|---|---|---|
+| §2 CRI format | 90% | 90% |
+| §3 Rotation mechanics | 90% | 90% |
+| §4 Tailer strategy | 75% | 75% |
+| §5 Build approach (Depend) | 70% | **20%** (recommendation was based on a false premise) |
+| §6 Pod attribution | 80% | 80% (substance unchanged; implementation owner shifts) |
+| §7 SemConv namespace | 90% | 90% (re-verified naming.md) |
+| §8 Internal repo prior art | 65% | **88%** (source read; agent claims validated; interface ownership corrected) |
+| §11 Rubric edits | 85% | **75%** (R-6 conditional; R-3 unchanged; others fine) |
+| §13.11 Pipeline config | 80% | **35%** (illustrative-only) |
+| §14 OTTL recipe | 50% | **20%** (cannot run in tracecore's pipeline as-shown) |
+| §15 Build-approach correction | — | 85% (load-bearing new finding) |
+
+Overall: confidence shifts from ~75% to ~70%, with the gain in some
+sections more than offset by the build-approach correction. The doc
+is now substantively more *correct*; the lower headline number
+reflects honest re-assessment, not regression.
+
+## 16. Security threat model (deferred from §9)
+
+§9 named the chart-and-RBAC additions M15 requires but did not reason
+about the trust implications. The adversarial review correctly flagged
+this as a load-bearing gap. The threats below are not exhaustive; they
+are the surface a design-doc threat-model section should expand on.
+
+### 16.0 Adversary model
+
+The threat-model lens for M15. Without a named adversary, "mitigation"
+is performative.
+
+**Assets:**
+1. Log content from every pod on the node (read).
+2. Pod-spec env vars (read, possibly secret-bearing) via the informer
+   if OD-1b Path B is chosen.
+3. Cursor state at `/var/lib/tracecore/container_stdout/` (read/write,
+   host-local).
+4. The tracecore binary itself on each node.
+
+**Trust boundary:**
+The tracecore-binary process running as the DaemonSet pod is inside
+the trust boundary. Anything outside the binary's process address
+space, including all co-tenant pods, all in-container processes of
+non-tracecore pods, and the apiserver, is outside.
+
+**Adversaries:**
+
+| ID | Adversary | Capability | Goal |
+|---|---|---|---|
+| A-1 | **Co-tenant pod, no node-level access** | Can run arbitrary code in a sibling pod on the same node. Cannot read tracecore's process memory or `/var/lib/tracecore`. | Read another pod's logs via M15 as a confused-deputy attack. |
+| A-2 | **Pod with crafted name** | Can request to create a pod with a name that contains shell metacharacters / path separators (Kubernetes name rules prevent most of this, but not all). | Trigger path-traversal in M15's filename parser. |
+| A-3 | **In-container reader of FD 1** | Can run `cat /proc/1/fd/1` or `tee` inside their own container, triggering containerd #11149. | Silently drop their own logs to evade M15-based detection. Within their own trust boundary; not an attack on others. |
+| A-4 | **High-volume logger** | Can emit 1M+ lines/s to stdout. | Exhaust receiver resources (rate-limit drops, fingerprint cardinality, cursor-write FS pressure). DoS the receiver or sibling tracecore-binary functions. |
+| A-5 | **Compromised tracecore image** | Replaced the published image SHA. | Exfiltrate every log on every node. Full compromise of all assets. |
+| A-6 | **Compromised supply chain** | Injected malicious dep into tracecore's `go.mod` (filelog, `pkg/stanza`, bbolt, fsnotify, etc.). | Same as A-5 but with smaller blast radius (one dep). |
+| A-7 | **Operator with kubeapi access** | Can patch the DaemonSet pod's env or config. | Disable the receiver, alter the namespace allowlist, exfiltrate cursor. Equivalent to legitimate admin; out-of-scope by definition. |
+
+**Out of scope:** node-level attackers (already have root on the node;
+M15 adds no surface), apiserver attackers (compromised cluster control
+plane is a project-level concern, not M15-local), DoS against the
+kubelet itself (not M15's path).
+
+**Mitigations matrix:**
+
+| Adversary | Primary mitigation | Residual risk |
+|---|---|---|
+| A-1 | Co-tenant cannot read M15's process or `/var/lib/tracecore` because file permissions + PSS `restricted` prevent privilege escalation. The receiver does not expose an inbound network surface that a co-tenant could query. | Low. Co-tenant can still read its own logs and the kubelet log surface they were already allowed to see. |
+| A-2 | §2 "split on last two underscores" defensive parse + Kubernetes name validation (RFC 1123 DNS subdomain for pod names, RFC 1123 label for namespaces; neither admits `_` or `/`). | Low. A non-conformant CRD-created object would have to bypass kubelet's own object validation, which is a cluster-level break. |
+| A-3 | Cannot be mitigated at M15 (root cause is in containerd). README enumerates the failure. | Medium for self-tee pattern; low for standard workloads. |
+| A-4 | Per-key token bucket (OD-3), bounded channel (k8sevents pattern), cursor compaction cap (§16.2), informer cardinality cap (§13.4 + §16.3). | Sustained DoS at world-size > rate-limiter capacity degrades the receiver gracefully (drops + degraded mode), does not crash. |
+| A-5 | M3 reproducible-build + SBOM + cosign chain. Image SHA pinning at deploy time. | High by design; the binary's compromise compromises all assets. No M15-local mitigation. |
+| A-6 | `go.mod` checksum DB, `pkg/stanza` version pin, Renovate-style automated dep audit. | Medium. Standard supply-chain risk. |
+
+### 16.1 Cluster-wide log-read surface from `/var/log` hostPath
+
+A read-only `hostPath: /var/log` mount on the tracecore DaemonSet
+gives the receiver process **read access to every container's stdout
+on the node**, including kube-system pods (kube-apiserver client
+errors, kubelet, scheduler), every tenant's workload, and any sidecar
+that logs secrets to stdout (token-rotator pods, image-pull-secret
+controllers, mTLS-injection sidecars). On a multi-tenant cluster this
+is **read-equivalent to cluster-admin for log content**. Effective
+controls:
+
+- **Namespace allowlist** at the receiver level
+  (`include_namespaces` config), even though the filesystem read is
+  unrestricted. The receiver MUST filter at the source-of-truth level
+  (the file path it opens), not just at the emission level, to keep
+  records out of the in-process channel entirely.
+- **Document the trust boundary** explicitly in the receiver README:
+  operators deploying M15 are granting the receiver-binary equivalent
+  of read-only access to every container's logs. Single-tenant
+  clusters: low risk. Multi-tenant: needs policy review.
+- **Avoid `/var/log/containers/`** for tailing if it adds symlink-
+  resolution attack surface. Direct `/var/log/pods/**/*.log` reads
+  are simpler and equally functional per §2.
+- **No supplementary capabilities.** No `CAP_SYS_PTRACE`, no hostPID,
+  no privileged container. The file read works at the kubelet's
+  log-group fsGroup with no escalation.
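+
+Filtering at the path level, before any file is opened, can be
+sketched as below. It combines the namespace allowlist with the §2
+"split on last two underscores" parse. The function names, the
+hard-coded `/var/log/pods` root, and the exact three-component layout
+check are assumptions for illustration, and the code assumes a Linux
+node (forward-slash paths).
+
+```go
+package main
+
+import (
+	"fmt"
+	"path/filepath"
+	"strings"
+)
+
+// parsePodDir splits "<ns>_<pod>_<uid>" on the last two underscores.
+func parsePodDir(dir string) (ns, pod, uid string, ok bool) {
+	i := strings.LastIndexByte(dir, '_')
+	if i <= 0 {
+		return "", "", "", false
+	}
+	j := strings.LastIndexByte(dir[:i], '_')
+	if j <= 0 {
+		return "", "", "", false
+	}
+	ns, pod, uid = dir[:j], dir[j+1:i], dir[i+1:]
+	if pod == "" || uid == "" {
+		return "", "", "", false
+	}
+	return ns, pod, uid, true
+}
+
+// shouldOpen decides from the path alone whether a log file belongs to
+// an allowed namespace. Anything that does not look exactly like
+// <ns>_<pod>_<uid>/<container>/<restart>.log is rejected.
+func shouldOpen(path string, allowed map[string]bool) bool {
+	rel, err := filepath.Rel("/var/log/pods", filepath.Clean(path))
+	if err != nil {
+		return false
+	}
+	parts := strings.Split(rel, "/")
+	if len(parts) != 3 || parts[0] == ".." || !strings.HasSuffix(parts[2], ".log") {
+		return false
+	}
+	ns, _, _, ok := parsePodDir(parts[0])
+	return ok && allowed[ns]
+}
+
+func main() {
+	allowed := map[string]bool{"ml": true}
+	fmt.Println(shouldOpen("/var/log/pods/ml_trainer-0_1234-uid/trainer/0.log", allowed))
+	fmt.Println(shouldOpen("/var/log/pods/kube-system_etcd-a_9999-uid/etcd/0.log", allowed))
+	fmt.Println(shouldOpen("/var/log/pods/../secrets/ml_x_y/c/0.log", allowed))
+}
+```
+
+Because the check runs before `open(2)`, records from disallowed
+namespaces never enter the in-process channel.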
+
+### 16.2 hostPath write surface
+
+`/var/lib/tracecore/container_stdout/` (cursor persistence) with
+`DirectoryOrCreate` writes to host filesystem **without size limits**.
+Runaway bbolt growth (e.g. corrupted DB triggering rebuild loops,
+fingerprint cardinality explosion under churn) can fill host root FS,
+which evicts every pod on the node. Effective controls:
+
+- **Bound cursor DB size**. Either via filelog's `compaction.on_start`
+  (BA-1) or via a tracecore-native LRU eviction policy capped at a
+  configurable byte limit (BA-2/3). Default 100 MiB on a node-local
+  bbolt is a reasonable starting budget.
+- **emptyDir is not viable** for this volume because cursor must
+  survive pod restart. The hostPath is structurally required.
+- **Surface the FS-usage metric** to the receiver's self-telemetry
+  surface so tracecore alerting (not just host-side) can catch growth
+  before it evicts pods.
+
+### 16.3 Pod-list/watch RBAC scope
+
+Under §15.1 BA-2/BA-3 the receiver embeds a Pod informer for env-var
+projection (OD-1b Path B) or k8sattributes-equivalent metadata
+hydration. The RBAC scope choice is load-bearing:
+
+- **Cluster-wide pod list/watch** is the simplest and lets the receiver
+  attribute logs from any pod whose container the tailer sees. It
+  also means the receiver SA can read every PodSpec in the cluster,
+  including env vars that may contain secrets (Kubernetes does not
+  prevent operators from putting tokens in env). Trust boundary
+  equivalent to cluster-admin read on PodSpec.
+- **Namespace-scoped** requires one informer per allowed namespace,
+  more RBAC machinery, and breaks the "single SharedInformer per node"
+  pattern from §13.4. But it caps the blast radius if the receiver is
+  compromised.
+- **Node-scoped via FieldSelector** is the right answer, mirroring the
+  rubric: `FieldSelector=spec.nodeName=$NODE_NAME` on a cluster-wide
+  pod informer reduces apiserver load and receiver attack surface to
+  pods on this node. RBAC remains cluster-wide list/watch (k8s does
+  not scope by node), but watched data is filtered server-side.
+
+### 16.4 seccomp / AppArmor delta from M5b
+
+M5b ships a Pod Security Standard (PSS) `restricted` policy with the
+DaemonSet running `runAsNonRoot`, `RuntimeDefault` seccomp,
+`allowPrivilegeEscalation: false`. M15's deltas:
+
+- **fsGroup matching kubelet log group.** Distros vary (often `root`,
+  GID 0; sometimes a kubelet-specific group). Configurable in
+  `values.yaml` with a default sentinel and an operator-override.
+- **hostPath read-only `/var/log`** is not permitted by PSS
+  `restricted` (or by `baseline`): both profiles forbid `hostPath`
+  volumes, read-only or not. Operators will need a policy exception
+  (for example, an exempted namespace), which is the practical cost
+  of any node-local log tailer.
+- **No `procMount: Unmasked`, no `hostPID`, no `hostIPC`**. M15 stays
+  inside `restricted` modulo the hostPath exception.
+
+### 16.4a Env-var redaction (OD-1b Path B only)
+
+If OD-1b Path B (tracecore informer reads Pod env) is selected,
+`Pod.spec.containers[].env` is a credential-leak surface. Kubernetes
+does not prevent operators from setting `AWS_SECRET_ACCESS_KEY`,
+`DATABASE_PASSWORD`, or `OPENAI_API_KEY` as PodSpec env. If M15
+naively projects every env var, those values land in attributes
+exported downstream.
+
+Mitigations:
+
+- **Allowlist mode (recommended).** Operators name the env vars to
+  project (e.g., `["RANK", "WORLD_SIZE", "LOCAL_RANK",
+  "TORCHELASTIC_RUN_ID", "JOB_ID"]`). Default empty; missing env vars
+  silently skipped. Cannot leak unnamed env.
+- **Pattern blocklist (defense in depth).** Block any env-var name
+  matching `(?i).*(SECRET|TOKEN|PASSWORD|KEY|CREDENTIAL).*` even if
+  allowlisted; emit `IncError(KindEnvRedacted)` for observability.
+- **Value-length cap.** Env values are length-capped at, say, 256
+  bytes before projection. Long values are nearly always credentials
+  or paths; rank-style values are at most a few bytes.
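+
+The three mitigations compose into one small filter. The sketch below
+is illustrative: the `projector` type and its `redacted` counter are
+hypothetical (the counter stands in for `IncError(KindEnvRedacted)`),
+and truncating over-long values, rather than dropping them, is one
+reading of "length-capped".
+
+```go
+package main
+
+import (
+	"fmt"
+	"regexp"
+)
+
+// sensitiveName is the blocklist pattern from the list above.
+var sensitiveName = regexp.MustCompile(`(?i).*(SECRET|TOKEN|PASSWORD|KEY|CREDENTIAL).*`)
+
+// maxValueLen is the value-length cap in bytes.
+const maxValueLen = 256
+
+// projector is a hypothetical env-var filter.
+type projector struct {
+	allow    map[string]bool // operator-named env vars; empty by default
+	redacted int             // stand-in for IncError(KindEnvRedacted)
+}
+
+// project returns only allowlisted, non-sensitive env vars, with
+// values truncated to maxValueLen bytes.
+func (p *projector) project(env map[string]string) map[string]string {
+	out := map[string]string{}
+	for name, val := range env {
+		if !p.allow[name] {
+			continue // not named by the operator: never projected
+		}
+		if sensitiveName.MatchString(name) {
+			p.redacted++ // allowlisted but sensitive-looking: block and count
+			continue
+		}
+		if len(val) > maxValueLen {
+			val = val[:maxValueLen] // byte truncation; may split a UTF-8 rune
+		}
+		out[name] = val
+	}
+	return out
+}
+
+func main() {
+	p := &projector{allow: map[string]bool{"RANK": true, "WORLD_SIZE": true, "API_KEY": true}}
+	env := map[string]string{"RANK": "3", "WORLD_SIZE": "64", "API_KEY": "s3cr3t", "HOME": "/root"}
+	fmt.Println(p.project(env), p.redacted)
+}
+```
+
+With an empty allowlist the filter projects nothing, so the default
+configuration cannot leak.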
+
+**Body-content redaction is out of scope for M15.** A training script
+that prints AWS credentials to stdout is the operator's responsibility
+to handle (typically via container-level secret masking or
+stdout-aware log scrubbers upstream). M15 README documents this as a
+non-goal.
+
+### 16.5 Trust boundary summary
+
+Operators deploying M15 are granting the tracecore-binary on each node
+the practical equivalent of:
+1. Read-only access to every container's stdout (`/var/log`).
+2. Read-only access to PodSpec for pods on the node (via informer).
+3. Read/write access to a host-local bbolt at
+   `/var/lib/tracecore/container_stdout/`.
+
+None of these alone is a privilege escalation. Combined, the binary's
+compromise would leak log content cluster-wide. The mitigation surface
+is image provenance + the existing M3 reproducible-build + SBOM +
+cosign chain. README must spell this out so operators don't deploy
+M15 to multi-tenant clusters without a policy review.
+
+## 17. Failure-mode coverage (deferred from §13)
+
+### 17.0 Receiver-runtime contract
+
+**Asserted: every failure mode below preserves the
+`pipeline.Receiver` runtime contract.** Concretely:
+
+- The receiver process does NOT panic out of any failure path. Per
+  PRINCIPLES §1 ("never crash the workload"), every per-file
+  goroutine installs a deferred function that calls `recover()` (a
+  bare `defer recover()` does not stop a panic in Go); malformed CRI
+  lines do not cascade.
+- The receiver does NOT block kubelet's SIGTERM beyond the 1s phase-1
+  shutdown budget (per the rubric).
+- The receiver continues to attempt forward progress in degraded
+  states; `SetDegraded(true)` is a signal, not a stop.
+- The receiver MUST surface every failure mode via `IncError(kind)`
+  with a canonical or receiver-local typed `Kind` constant. Untyped
+  error-string passing is forbidden (cardinality risk per
+  `internal/selftelemetry/interface.go:200`).
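+
+The first bullet's panic containment depends on calling `recover()`
+from inside a deferred function. A minimal sketch follows; the
+`runTailer` and `runAndWait` names are hypothetical, and the `onPanic`
+callback stands in for an `IncError(kind)` call with a typed `Kind`.
+
+```go
+package main
+
+import (
+	"fmt"
+	"sync"
+)
+
+// runTailer runs fn in its own goroutine. A panic inside fn is
+// contained and reported through onPanic instead of crashing the
+// process.
+func runTailer(wg *sync.WaitGroup, fn func(), onPanic func(v any)) {
+	wg.Add(1)
+	go func() {
+		defer wg.Done()
+		// recover must be called directly by the deferred function.
+		defer func() {
+			if v := recover(); v != nil {
+				onPanic(v)
+			}
+		}()
+		fn()
+	}()
+}
+
+// runAndWait runs fn as a tailer and reports whether it panicked.
+func runAndWait(fn func()) (panicked bool) {
+	var wg sync.WaitGroup
+	runTailer(&wg, fn, func(any) { panicked = true })
+	wg.Wait()
+	return panicked
+}
+
+func main() {
+	fmt.Println(runAndWait(func() { panic("malformed CRI line") }))
+	fmt.Println(runAndWait(func() {}))
+}
+```
+
+Deferred calls run last-in, first-out, so the recovery function runs
+before `wg.Done()` and the caller observes the report after `Wait`.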
+
+The subsections below enumerate the failure modes; each row in §17.1
+through §17.7 satisfies the contract above. Receiver tests must
+verify the assertion holds for the named failure mode.
+
+The round-2 §13.1 dive covered kubelet's internal rotation failures
+but did not cover **receiver-side failure surfaces under realistic
+operational events**. The adversarial review correctly flagged the
+gap. Each failure mode below should map to a row in the receiver's
+RUNBOOK and (where the behavior is observable) a `Test*` identifier
+in FAILURE-MODES.md per §6 doc-check rubric.
+
+### 17.1 Node drain (`kubectl drain`)
+
+The tracecore DaemonSet pod is evicted with
+`terminationGracePeriodSeconds` (default 30 s). Receiver behavior:
+
+- Open per-file tailer goroutines should drain in-flight reads up to
+  the grace period, flush the cursor to disk, exit cleanly.
+- Records already in the bounded channel are best-effort flushed to
+  the exporter pipeline within the grace period; overflow drops on
+  exporter back-pressure rather than blocking shutdown.
+- Cursor durability across drain depends on whether the host volume
+  survives pod replacement. `hostPath` persists across pod restart on
+  the same node, so resume-after-drain is well-defined. The new pod
+  reads the cursor and resumes at the recorded offset; any records
+  the runtime wrote during the drain window are picked up.
+- Receiver MUST NOT block kubelet's grace-period termination. SIGTERM
+  → 1 s phase-1 budget per the rubric is appropriate.
+
+Test target: `TestContainerStdout_GracefulShutdown`.
+
+### 17.2 Kubelet restart (without node restart)
+
+Containerd / CRI-O keep writing during kubelet downtime (they own the
+log fd via `ReopenContainerLog`'s last call). Rotation cannot happen
+while kubelet is down (kubelet drives it per §13.1). When kubelet
+restarts:
+
+- Pre-restart writes accumulated in the live `0.log` past
+  `containerLogMaxSize` are still readable to the tailer.
+- Kubelet's first monitor tick after restart triggers a backlog of
+  rotations across all containers. Tailer's fingerprint-based rotation
+  detection should handle the burst without losing records.
+- The receiver's Pod informer would lose its watch during the
+  kubelet-restart window only if it connected through kubelet; it
+  connects to the apiserver directly (as most informers do), so pod
+  attribution remains uninterrupted as long as the apiserver is
+  reachable.
+
+Test target: `TestContainerStdout_KubeletRestartBacklog`.
+
+### 17.3 Pod deletion
+
+Kubelet removes `/var/log/pods/<ns>_<pod>_<uid>/` after the
+TerminationGracePeriod elapses. Tailer behavior:
+
+- The per-file tailer holding an open fd on `0.log` reads until EOF
+  (POSIX rename / unlink leaves the fd valid).
+- After EOF, the tailer must close the fd, remove the corresponding
+  cursor entry from bbolt / JSON, and exit.
+- The pod-deletion event from the Pod informer is the canonical
+  trigger to garbage-collect the cursor. Without GC the cursor file
+  grows without bound under pod churn.
+
+Test target: `TestContainerStdout_PodDeletionCursorGC`.
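+
+The cursor GC step can be sketched as follows. This is an illustration
+only: `cursorStore`, its methods, and the choice to key entries by pod
+UID are assumptions, not tracecore's cursor format.
+
+```go
+package main
+
+import (
+	"fmt"
+	"sync"
+)
+
+// cursorStore is a hypothetical in-memory view of the persisted
+// cursor, keyed by pod UID and then by log-file path.
+type cursorStore struct {
+	mu      sync.Mutex
+	offsets map[string]map[string]int64
+}
+
+func newCursorStore() *cursorStore {
+	return &cursorStore{offsets: make(map[string]map[string]int64)}
+}
+
+// Set records the read offset for one log file of one pod.
+func (c *cursorStore) Set(podUID, path string, off int64) {
+	c.mu.Lock()
+	defer c.mu.Unlock()
+	if c.offsets[podUID] == nil {
+		c.offsets[podUID] = make(map[string]int64)
+	}
+	c.offsets[podUID][path] = off
+}
+
+// OnPodDeleted drops every entry for the pod and reports how many
+// were removed. It is meant to run from the informer's delete handler
+// once the pod's tailers have read to EOF and closed their fds.
+func (c *cursorStore) OnPodDeleted(podUID string) int {
+	c.mu.Lock()
+	defer c.mu.Unlock()
+	n := len(c.offsets[podUID])
+	delete(c.offsets, podUID)
+	return n
+}
+
+// Len returns the number of pods that still have cursor entries.
+func (c *cursorStore) Len() int {
+	c.mu.Lock()
+	defer c.mu.Unlock()
+	return len(c.offsets)
+}
+
+func main() {
+	c := newCursorStore()
+	c.Set("uid-a", "/var/log/pods/ns_a_uid-a/app/0.log", 1024)
+	c.Set("uid-b", "/var/log/pods/ns_b_uid-b/app/0.log", 2048)
+	removed := c.OnPodDeleted("uid-a")
+	fmt.Println(removed, c.Len())
+}
+```
+
+Keying by UID rather than pod name keeps a recreated pod with the same
+name from inheriting a stale offset.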
+
+### 17.4 Pod eviction
+
+A special case of pod deletion: the pod object is removed but the
+container may have produced its final log lines just before eviction.
+The receiver needs to drain the file before the directory is removed.
+M19's "pod evicted" pattern relies on M15 here: M19 must see the tail
+of the evicted container's stdout to determine why it died.
+
+- Kubelet's deletion sequence: stop containers → delete container
+  filesystem → delete pod-log directory. There is a short window
+  between the last write and the directory removal.
+- Tailer's poll interval (default 200 ms in filelog) is fast enough
+  to catch the tail bytes if it gets one more tick before the
+  directory removal.
+- Edge case: a very-short-lived container may write its last lines
+  less than one poll interval before removal, so the records exist in
+  `0.log` but the tailer never reads them. Mitigation: fast-flush on
+  the pod-deletion event from the informer; force one final read to
+  EOF before cursor GC.
+
+Test target: `TestContainerStdout_PodEvictionTailFlush`.
+
+### 17.5 Sibling-receiver interaction on `/var/lib/tracecore/`
+
+k8sevents (M10), kernelevents (M9), and future receivers share
+`/var/lib/tracecore/`. M15's cursor lives at
+`/var/lib/tracecore/container_stdout/`. Concerns:
+
+- **Filesystem permissions.** Each receiver should own a subdirectory
+  with no cross-receiver write. This matches the current layout:
+  k8sevents has no cursor today; if it adds one as M10 evolves, it
+  goes under `k8sevents/`, not the root.
+- **bbolt single-writer per file** (per §13.3). Multiple receivers
+  writing to **different** bbolt files in the same directory is fine.
+  Multiple receivers sharing a single bbolt file is not. Cursor-per-
+  receiver-subdirectory keeps the lock surface trivially isolated.
+- **Backup-on-corruption renames** (`<name>.<ISO8601>.backup`) can
+  produce filesystem-level garbage. Receiver should clean up backups
+  older than N days (configurable, default 7).
+
+Test target: `TestContainerStdout_SiblingReceiverIsolation`.
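+
+A sketch of the backup-retention check, under stated assumptions: the
+timestamp layout `20060102T150405Z` (ISO 8601 basic format, UTC) is a
+guess at what `<ISO8601>` means here, and `expiredBackups` is a
+hypothetical helper name. It only selects candidates; deleting them is
+left to the caller.
+
+```go
+package main
+
+import (
+	"fmt"
+	"strings"
+	"time"
+)
+
+// backupLayout is an assumed layout for the <ISO8601> part of
+// "<name>.<ISO8601>.backup"; the source does not pin one down.
+const backupLayout = "20060102T150405Z"
+
+// expiredBackups returns the names whose embedded timestamp is older
+// than maxAge relative to now. Names that do not match the pattern
+// are skipped rather than treated as expired.
+func expiredBackups(names []string, now time.Time, maxAge time.Duration) []string {
+	var out []string
+	for _, n := range names {
+		if !strings.HasSuffix(n, ".backup") {
+			continue
+		}
+		base := strings.TrimSuffix(n, ".backup")
+		i := strings.LastIndex(base, ".")
+		if i < 0 {
+			continue
+		}
+		ts, err := time.Parse(backupLayout, base[i+1:])
+		if err != nil {
+			continue
+		}
+		if now.Sub(ts) > maxAge {
+			out = append(out, n)
+		}
+	}
+	return out
+}
+
+func main() {
+	now := time.Date(2026, 5, 19, 0, 0, 0, 0, time.UTC)
+	names := []string{
+		"cursor.db.20260501T000000Z.backup", // 18 days old
+		"cursor.db.20260518T000000Z.backup", // 1 day old
+		"cursor.db",                         // live file, never a candidate
+	}
+	// Seven days mirrors the default retention suggested above.
+	fmt.Println(expiredBackups(names, now, 7*24*time.Hour))
+}
+```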
+
+### 17.6 Container-runtime crash / restart
+
+If containerd or CRI-O crashes and is restarted by systemd:
+
+- The runtime's open fd to `0.log` is closed at crash. Bytes in the
+  shim's stdout pipe buffer that hadn't yet been written are lost
+  (this is mostly orthogonal to #11149 but related — same shared-pipe
+  surface).
+- After restart, the runtime calls neither `ReopenContainerLog` nor
+  re-stat; it reopens via the path it had cached. The tailer sees a
+  brief absence of new writes followed by a resumption.
+
+Test target: not feasible at unit-test level; covered by chaos.yml
+integration if at all.
+
+### 17.7 Filesystem full at `/var/lib/tracecore/`
+
+If the host volume backing the cursor directory is full:
+
+- bbolt write fails. Filestorage's recovery path under BA-1 is
+  recreate-from-corruption; under BA-2/3, tracecore-owned cursor
+  format must handle ENOSPC gracefully.
+- Receiver MUST surface this via `IncError(KindCursorWriteFailed)`
+  and continue tailing in-memory; on next successful cursor write,
+  the offsets catch up. Loss-on-restart in this regime is bounded
+  by the time the FS was full.
+- Alerting on `KindCursorWriteFailed` is essential because the only
+  observable downstream symptom otherwise is "records re-played after
+  pod restart" (cursor not durable).
+
+Test target: `TestContainerStdout_CursorWriteFailureGraceful`.
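+
+The "continue in memory, catch up on the next successful write"
+behavior can be sketched as below. The `cursor` type, its `persist`
+hook, and the `writeFailures` counter (standing in for
+`IncError(KindCursorWriteFailed)`) are illustrative assumptions, not
+tracecore's API.
+
+```go
+package main
+
+import (
+	"errors"
+	"fmt"
+)
+
+// cursor keeps offsets in memory and mirrors them to durable storage
+// through persist, a hypothetical hook (bbolt, a JSON file, ...).
+type cursor struct {
+	offsets       map[string]int64
+	persist       func(map[string]int64) error
+	writeFailures int // stands in for IncError(KindCursorWriteFailed)
+}
+
+// Advance always updates the in-memory offset so tailing continues,
+// then attempts a durable write. A failed write is counted, not
+// fatal; the next successful write carries the latest offsets.
+func (c *cursor) Advance(path string, off int64) {
+	c.offsets[path] = off
+	if err := c.persist(c.offsets); err != nil {
+		c.writeFailures++
+	}
+}
+
+func main() {
+	diskFull := true
+	var durable map[string]int64
+	c := &cursor{
+		offsets: map[string]int64{},
+		persist: func(m map[string]int64) error {
+			if diskFull {
+				return errors.New("no space left on device")
+			}
+			durable = make(map[string]int64, len(m))
+			for k, v := range m {
+				durable[k] = v
+			}
+			return nil
+		},
+	}
+	c.Advance("0.log", 100) // write fails; offset held in memory only
+	diskFull = false
+	c.Advance("0.log", 250) // write succeeds; durable copy catches up
+	fmt.Println(c.writeFailures, durable["0.log"])
+}
+```
+
+A restart while `diskFull` is true would replay from the last durable
+offset, which is the bounded loss-on-restart window described above.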
+
+## 18. Rollout posture
+
+This section addresses the design-team / project-lead gap flagged by
+Reviewer B P1-13.
+
+### 18.1 Stability stage and default
+
+M15 ships at **alpha**, with `receivers.containerstdout.enabled: false`
+as the Helm chart default. Operators opt in per the alpha-receiver
+contract documented in `docs/STABILITY.md` (if missing, document the
+contract as part of M15's design doc):
+
+- Backward-compat is opt-in (per PRINCIPLES §11).
+- Config field names may be renamed through a 1-minor-version
+  deprecation: the new name ships alongside the old, the old name
+  emits a warning, and the old name is removed on the next minor.
+- Attribute names follow the §7 namespace decision; if R-1 is later
+  conceded, the deprecated names are emitted via a `tracecore_compat`
+  processor with a 1-version overlap window.
+
+### 18.2 Coexistence with sibling receivers
+
+M15 coexists with kernelevents (M9, shipped) and k8sevents (M10,
+alpha) on the same DaemonSet pod. Boundaries:
+
+- **Filesystem scope:** kernelevents reads `/dev/kmsg` + journald;
+  M15 reads `/var/log/pods`; k8sevents reads apiserver only. Zero
+  shared file surface.
+- **Cursor directory:** M15 owns
+  `/var/lib/tracecore/container_stdout/`. kernelevents and k8sevents
+  do not have cursors today. Reserved sibling subdirectories prevent
+  any future cross-receiver write collision.
+- **Self-telemetry namespace:** all three receivers emit
+  `tracecore_receiver_*` metrics partitioned by `receiver_id`. Per
+  k8sevents' `KindBackpressureDrop` / `KindWatch` pattern (§8),
+  M15 introduces `KindRotationStalled` / `KindCursorWriteFailed`;
+  these MUST NOT alias any kernelevents or k8sevents kinds — verify at
+  PR time by grep.
+- **RBAC namespacing:** each receiver ships its own ClusterRole with
+  a unique name. M15 introduces
+  `tracecore-containerstdout-clusterrole`.
+
+### 18.3 Upgrade path across `pkg/stanza` BREAKING changes (BA-1 only)
+
+Under §15.1 BA-1, M15 depends on upstream `pkg/stanza` evolution.
+Per §13.3, ~3 BREAKING changes per 18 months, all in adjacent code
+(OTTL, windows event logs), none in `fileconsumer` surface. Tracecore's
+upgrade contract:
+
+- Pin contrib version in `go.mod`.
+- CHANGELOG entry at every contrib bump describes operator-visible
+  changes.
+- Feature-gate posture (OD-9): tracecore opts in to stable gates
+  (`filelog.decompressFingerprint`) by default; tracks alpha gates
+  (`filelog.protobufCheckpointEncoding`) but does not flip until
+  upstream marks beta. Default tracking matches upstream defaults to
+  minimize divergence.
+
+Under BA-2, M15 is decoupled from contrib churn at the receiver level
+but inherits any `fileconsumer` algorithmic improvements only via
+manual port. Document the upstream-port cadence in the receiver
+README.
+
+### 18.4 Migration from alternative loggers
+
+Operators currently using Fluent Bit / Vector / Promtail to tail
+`/var/log/pods` can run M15 alongside without conflict (read-only
+mounts, distinct cursor paths). Migration to M15-only is a deployment
+choice, not a contract requirement. The doc does not currently
+recommend M15 as a replacement for general-purpose log shippers (it
+is training-observability-focused per O1 scope); a future RFC could
+revisit this.
+
+## 19. Alerts catalog
+
+The §17 failure modes name test identifiers; this section names the
+corresponding alerts so operators can wire monitoring at deploy time.
+Per docs/STYLE-docs.md §5, every alert binds to a RUNBOOK section.
+
+| Alert name | Trigger | Severity | RUNBOOK | Notes |
+|---|---|---|---|---|
+| `M15RotationStalled` | `tracecore_receiver_errors_total{receiver_id="containerstdout", kind="rotation_stalled"} > 0` for 5m | Warning | RUNBOOK § Rotation stall | Kubelet rotation has not happened in 30s after `0.log` exceeded `containerLogMaxSize`. |
+| `M15CursorWriteFailed` | `tracecore_receiver_errors_total{receiver_id="containerstdout", kind="cursor_write_failed"} > 0` for 1m | Warning | RUNBOOK § Cursor write failure | Host FS at cursor dir failed; in-memory tailing continues but durability lost. |
+| `M15BackpressureDrop` | `rate(tracecore_receiver_errors_total{receiver_id="containerstdout", kind="backpressure_drop"}[5m]) > 100` | Warning | RUNBOOK § Backpressure drop | Per-key rate-limit dropping records; investigate noisy pod or raise budget. |
+| `M15Degraded` | `tracecore_receiver_degraded_seconds_total{receiver_id="containerstdout"}` increasing | Warning | RUNBOOK § Degraded mode | Receiver in degraded state; any failure mode from §17 could be the cause. |
+| `M15PodInformerDisconnected` | `tracecore_receiver_errors_total{receiver_id="containerstdout", kind="watch"} > 0` for 2m | Warning | RUNBOOK § Pod informer disconnect | apiserver unreachable; attribution falls back to filepath-only. |
+| `M15Cardinality` | rate of `tracecore_receiver_errors_total{receiver_id="containerstdout", kind="cardinality"}` > 0 | Critical | RUNBOOK § Cardinality cap | Fingerprint set or rank set exceeded cap; data loss. |
+| `M15FileStorageCorruption` *(BA-1 only)* | log-string match on `"Database corruption detected"` from `extension/file_storage` | Critical | RUNBOOK § Filestorage corruption | bbolt rebuild; offsets lost; resume-from-EOF until next write. |
+| `M15FdLeak` | host metric: open fds for tracecore process > 2× current pod count for 5m | Warning | RUNBOOK § fd hygiene | Per rubric line 372; investigate slow-closing tailers. |
+| `M15HighDroppedLines` | `rate(tracecore_dropped_lines_total{receiver_id="containerstdout"}[5m]) > 1000` | Warning | RUNBOOK § Rate-limit drops | Per-pod rate limit hit sustained; tune budget or investigate. |
+
+**Operator-side runbook prose** must be drafted as part of M15's
+RUNBOOK.md per the k8sevents template. Examples of triage steps per
+alert live in `components/receivers/k8sevents/RUNBOOK.md`.
+
+## 20. Overhead-budget methodology (deferred from OD-8)
+
+MILESTONES.md line 368 gates M15 at ≤0.10% CPU, ≤20 MB RSS, ≤0.3 Mbps
+egress. Without methodology, these numbers are unfalsifiable.
+
+### 20.1 Workload spec
+
+A reference workload that exercises M15's hot paths:
+
+- **Topology.** Single-node kind cluster, tracecore DaemonSet, 100
+  fixture pods.
+- **Per-pod log rate.** Configurable; default 100 lines/s × 256 B avg
+  line = 25.6 KB/s per pod → 2.56 MB/s aggregate on the node.
+- **Rotation cadence.** Each pod hits `containerLogMaxSize` roughly
+  every 6.8 minutes at the default 10Mi (10 MiB ÷ 25.6 KB/s ≈ 410 s,
+  ignoring the CRI per-line prefix). Aggregate ~15 rotations/min on
+  the node.
+- **Training-pattern coverage.** 10% of pods emit dataloader-format
+  lines (`time:` / `data:`); 10% emit JSON-structured logs; 80% emit
+  free-text.
+- **Pod churn.** 1 pod restart every 30s (kubelet-driven).
+- **Duration.** 60-minute steady-state measurement after a 10-minute
+  warmup.
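+
+The rate and rotation figures above follow from simple arithmetic,
+reproduced here as a small sketch. The `workload` type and its method
+names are illustrative; the 10 MiB rotation threshold assumes the
+kubelet default `containerLogMaxSize` of `10Mi` and ignores the CRI
+timestamp/stream/tag prefix on each line. With 10 MiB the interval
+comes out near 6.8 minutes; a decimal 10 MB threshold would give about
+6.5.
+
+```go
+package main
+
+import "fmt"
+
+// workload holds the reference parameters from the list above.
+type workload struct {
+	pods         int
+	linesPerSec  float64
+	avgLineBytes float64
+	maxLogBytes  float64 // rotation threshold (assumed 10 MiB)
+}
+
+// perPodBytesPerSec is the log body rate written by one pod.
+func (w workload) perPodBytesPerSec() float64 {
+	return w.linesPerSec * w.avgLineBytes
+}
+
+// nodeBytesPerSec is the aggregate rate across all pods on the node.
+func (w workload) nodeBytesPerSec() float64 {
+	return float64(w.pods) * w.perPodBytesPerSec()
+}
+
+// rotationIntervalMin is how long one pod takes to fill maxLogBytes.
+func (w workload) rotationIntervalMin() float64 {
+	return w.maxLogBytes / w.perPodBytesPerSec() / 60
+}
+
+// rotationsPerMin is the aggregate rotation rate on the node.
+func (w workload) rotationsPerMin() float64 {
+	return float64(w.pods) / w.rotationIntervalMin()
+}
+
+func main() {
+	w := workload{pods: 100, linesPerSec: 100, avgLineBytes: 256, maxLogBytes: 10 << 20}
+	fmt.Printf("per pod:   %.1f KB/s\n", w.perPodBytesPerSec()/1e3)
+	fmt.Printf("node:      %.2f MB/s\n", w.nodeBytesPerSec()/1e6)
+	fmt.Printf("rotation:  every %.1f min per pod\n", w.rotationIntervalMin())
+	fmt.Printf("aggregate: %.1f rotations/min\n", w.rotationsPerMin())
+}
+```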
+
+### 20.2 Measurement points
+
+- **CPU%:** cgroup-derived process CPU% averaged over the 60-min
+  window. Source:
+  `/sys/fs/cgroup/cpu.stat` for the tracecore container.
+- **RSS:** `/proc/self/status` `VmRSS` field sampled every 30s,
+  reported as p50, p95, and p99 (§20.3 gates on p95 and p99).
+- **Egress Mbps:** OTLP-out bytes summed at the exporter, normalized
+  per second.
+
+### 20.3 Pass/fail
+
+- **CPU.** Window-average ≤0.10% of one CPU core. Hard fail at any
+  30s window exceeding 0.50%.
+- **RSS.** p95 ≤20 MB AND p99 ≤30 MB. Hard fail at any 30s sample
+  >50 MB.
+- **Egress.** Window-average ≤0.3 Mbps. Hard fail at any 30s window
+  >1 Mbps.
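+
+As an illustration of how the RSS gate could be evaluated from the
+30s samples, the sketch below uses a nearest-rank percentile. The
+percentile method is an assumption (the criteria above do not name
+one), and `percentile` and `rssPasses` are hypothetical helper names
+rather than part of the M5 harness.
+
+```go
+package main
+
+import (
+	"fmt"
+	"sort"
+)
+
+// percentile returns the nearest-rank p-th percentile (1..100) of xs.
+// It copies the input so the caller's slice order is preserved.
+func percentile(xs []float64, p int) float64 {
+	s := append([]float64(nil), xs...)
+	sort.Float64s(s)
+	rank := (p*len(s) + 99) / 100 // integer ceil(p*n/100)
+	if rank < 1 {
+		rank = 1
+	}
+	return s[rank-1]
+}
+
+// rssPasses applies the RSS criteria: p95 <= 20 MB, p99 <= 30 MB,
+// and no single sample above 50 MB. Empty input fails.
+func rssPasses(samplesMB []float64) bool {
+	if len(samplesMB) == 0 {
+		return false
+	}
+	for _, v := range samplesMB {
+		if v > 50 {
+			return false // hard fail on any sample
+		}
+	}
+	return percentile(samplesMB, 95) <= 20 && percentile(samplesMB, 99) <= 30
+}
+
+func main() {
+	// 60 minutes at one sample per 30s gives 120 samples.
+	samples := make([]float64, 120)
+	for i := range samples {
+		samples[i] = 18
+	}
+	fmt.Println(rssPasses(samples))
+	samples[0] = 60 // a single spike above 50 MB
+	fmt.Println(rssPasses(samples))
+}
+```
+
+With only 120 samples, nearest-rank p99 is the second-largest sample,
+so the p99 gate is close to a max check at this sampling rate.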
+
+### 20.4 Capacity-model extrapolation
+
+The reference workload sizes the receiver at ~25 MB RSS for 100 pods
+under the assumed rates, which is already above the ≤20 MB gate in
+§20.3 and needs confirming by measurement. Naive linear extrapolation
+(not empirically validated): each tailed pod adds ~200 KB of
+`fileconsumer` state plus ~0.001% CPU, so doubling to a 200-pod node
+would land at ~50 MB RSS / 0.20% CPU, well over the rubric's per-pod
+budget. Either raise the budget in proportion to pod density, or
+design the receiver for an explicit 100-pod assumption. Recommend the
+latter (PRINCIPLES §13: do not over-engineer for the wide tail).
+Document the assumption in the receiver README.
+
+### 20.5 Owner
+
+M5 owns the harness. M15 contributes the workload spec (this section)
+and the fixture pods. The 60-min steady-state run is M5's CI gate
+once the harness ships.
+
+## 21. Follow-ups beyond research scope
+
+Items raised by Reviewers A / B / C that cannot be closed by more
+research alone. Tagged by requirement type so the design phase and
+project-lead can route each to the right owner.
+
+### 21.1 Requires stakeholder decision
+
+| # | Item | Owner | Notes |
+|---|---|---|---|
+| 1 | Resolve OD-11 (BA-0..BA-4 build approach) | Milestone-planning lead | Day-1-spike + Day-2 M16-owner async + Day-3 stakeholder meeting per §15.6. |
+| 2 | Resolve OD-12 (`gen_ai.training.*` namespace posture: hold / hedge / concede / re-activate O4) | O4 owner per NORTHSTARS.md line 204 | Load-bearing on §7.4 negative finding (no upstream PR exists). |
+| 3 | Confirm M16's build approach (will M16 use BA-1's adapter or go native?) | M16 owner | Two-line async answer suffices per §15.6 Day 2. |
+| 4 | Confirm O2 stakeholder accepts BA-0 sidecar pod footprint (if BA-1 spike fails) | O2 stakeholder | Conditional; only if §15.6 Day 1 spike fails. |
+| 5 | Sign off on §18.4 migration posture (no recommendation to replace Fluent Bit / Vector / Promtail) | Project lead | Statement of scope, not architectural. |
+| 6 | Decide OD-1b Path A vs Path B at design time | Receiver owner | Operator-facing trade-off (downward API requires opt-in vs informer requires RBAC). |
+
+### 21.2 Requires kind cluster / fixture infrastructure (no production data, no GPU)
+
+| # | Item | Notes |
+|---|---|---|
+| 7 | Empirical binary-size delta of `filelogreceiver` import | Needs `go build` with filelog linked; requires committing to BA-1 path. Estimated 15–30 MB unstripped (§13.3). |
+| 8 | Cross-pod fingerprint-collision property test | kind cluster + N pods with identical first-line prefixes, force rotation, assert no offset cross-pollination (§12 #13). |
+| 9 | Real-cluster overhead measurement at §20 workload spec | kind cluster + 100 fixture pods + 60-min steady-state. Validates rubric line 368. |
+| 10 | Containerd vs CRI-O reopen timing window | Instrumented kind cluster with both runtimes; measure rename→reopen gap (§13.1 follow-up). |
+| 11 | Container-runtime crash recovery test (§17.6) | chaos.yml integration; SIGKILL containerd, observe receiver behavior. |
+| 12 | OTTL recipe validation against `otelcol validate` (BA-1 only) | Download `otelcol-contrib`, instantiate the §13.11 pipeline, verify all components resolve. Only meaningful under BA-1 (under BA-2/3 the recipe is reimplemented natively). |
+| 13 | `lsof`-golden fd-hygiene test | kind cluster + pod churn; assert `lsof -p $(pidof tracecore)` shows ≤ 2× pod-count entries (rubric line 372). |
+
+None of these require production data (synthetic kind-cluster fixtures
+suffice). None require GPU.
+
+### 21.3 Requires external action / contribution
+
+| # | Item | Owner | Notes |
+|---|---|---|---|
+| 14 | File the upstream `gen_ai.training.*` draft PR | O4 owner | §7.4 finding: no PR exists as of 2026-05-19. Closes the §7.3 "hold the bet" posture's evidence gap. |
+| 15 | Engage semantic-conventions-genai issue #88 (`rl.*` proposal) for scope overlap | O4 owner / O7 governance | Determines whether training observability lands under `rl.*` or `gen_ai.training.*` upstream. |
+| 16 | Small-platform training-namespace survey (ClearML, Determined.AI, Skypilot, Anyscale) | Research follow-up | Reviewer C P2-22; weakens or confirms §13.12 "no shared namespace" finding. |
+| 17 | Path-traversal hardening analysis on pod-name filepath parse | Security reviewer | Reviewer C P2-23; defensive last-two-underscores split is documented (§2) but no explicit attack-tree. |
+
+### 21.4 Consciously deferred (out-of-scope for M15 v0)
+
+| # | Item | Reason |
+|---|---|---|
+| 18 | MPI `OMPI_COMM_WORLD_RANK` extraction | Requires `CAP_SYS_PTRACE` + hostPID, conflicts with M5b minimal-privilege policy. Future receiver iteration. |
+| 19 | PodLogs API KEP tracking | KEP-3059 dead; no replacement filed. Re-check in ~6 months (§13.9). |
+| 20 | Body-content credential redaction | Operator responsibility per §16.4a. M15 README documents as non-goal. |
+
+### 21.5 What about GPU? Production data?
+
+**No M15 follow-up requires GPU.** M15 tails container stdout; the
+GPU surface (DCGM, NCCL FlightRecorder, Kineto) lives in Lane 6,
+which is GPU-hardware-gated and does not block M15.
+
+**No M15 follow-up requires production data.** Every measurement need
+above is satisfiable on a kind cluster with synthetic fixtures, per
+PRINCIPLES §6's "test against real components, not mocks" balanced
+against the M3 reproducible-build constraint that production data
+must not enter CI.
+
+## Sources
+
+Primary references consulted during this research pass (all current as
+of 2026-05-19):
+
+- Kubernetes logging architecture: <https://kubernetes.io/docs/concepts/cluster-administration/logging/>
+- CRI logging design proposal: <https://github.com/kubernetes/design-proposals-archive/blob/main/node/kubelet-cri-logging.md>
+- Kubelet log manager source: `pkg/kubelet/logs/container_log_manager.go`
+- Kubelet symlink construction: `pkg/kubelet/kuberuntime/legacy.go`
+- Kubernetes object naming rules: <https://kubernetes.io/docs/concepts/overview/working-with-objects/names/>
+- containerd #11149: <https://github.com/containerd/containerd/issues/11149>
+- OTel filelog receiver: <https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/filelogreceiver>
+- OTel container operator: <https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/pkg/stanza/docs/operators/container.md>
+- OTel `pkg/stanza/fileconsumer`: <https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/pkg/stanza/fileconsumer>
+- OTel `k8sattributesprocessor`: <https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/processor/k8sattributesprocessor/README.md>
+- OTel SemConv naming rule: <https://github.com/open-telemetry/semantic-conventions/blob/main/docs/general/naming.md>
+- OTel SemConv GenAI: <https://github.com/open-telemetry/semantic-conventions-genai>
+- RL SemConv proposal: <https://github.com/open-telemetry/semantic-conventions-genai/issues/88>
+- torchrun env vars: <https://github.com/pytorch/pytorch/blob/main/torch/distributed/run.py>
+- Kubeflow PyTorchJob env injection: <https://github.com/kubeflow/training-operator/blob/release-1.9/pkg/controller.v1/pytorch/envvar.go>
+- Kubeflow MPI Operator: <https://github.com/kubeflow/mpi-operator/blob/master/pkg/controller/mpi_job_controller.go>
+- torchx Kubernetes scheduler: <https://github.com/pytorch/torchx/blob/main/torchx/schedulers/kubernetes_scheduler.py>
+- Ray Train env injection: <https://github.com/ray-project/ray/blob/master/python/ray/train/torch/config.py>
+- JobSet concepts: <https://github.com/kubernetes-sigs/jobset/blob/main/site/content/en/docs/concepts/_index.md>
+- torchvision MetricLogger: <https://github.com/pytorch/vision/blob/main/references/classification/utils.py>
+- detectron2 events: <https://github.com/facebookresearch/detectron2/blob/main/detectron2/utils/events.py>
+- PyTorch Lightning SimpleProfiler: <https://github.com/Lightning-AI/pytorch-lightning/blob/master/src/lightning/pytorch/profilers/simple.py>
+- NeMo TimingCallback: <https://github.com/NVIDIA-NeMo/NeMo/blob/main/nemo/utils/exp_manager.py>
+- HF Trainer speed_metrics: <https://github.com/huggingface/transformers/blob/main/src/transformers/trainer_utils.py>
+- MosaicML Composer SpeedMonitor: <https://github.com/mosaicml/composer/blob/main/composer/callbacks/speed_monitor.py>
+
+Round-2 additions:
+
+- CRI v1 proto (no log RPC): <https://github.com/kubernetes/cri-api/blob/master/pkg/apis/runtime/v1/api.proto>
+- kubelet `/containerLogs` handler: `pkg/kubelet/server/server.go`, `pkg/kubelet/kuberuntime/kuberuntime_container.go`, `pkg/kubelet/kuberuntime/logs/logs.go`
+- KubeletConfiguration validation: `pkg/kubelet/apis/config/validation/validation.go`
+- OTel filelog README: <https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/filelogreceiver>
+- OTel `file_storage` extension: <https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/extension/storage/filestorage>
+- otelcol-k8s distribution manifest: <https://github.com/open-telemetry/opentelemetry-collector-releases/blob/main/distributions/otelcol-k8s/manifest.yaml>
+- contrib CHANGELOG (stanza/filelog churn): <https://raw.githubusercontent.com/open-telemetry/opentelemetry-collector-contrib/main/CHANGELOG.md>
+- lit-GPT pretrain.py: <https://github.com/Lightning-AI/litgpt/blob/main/litgpt/pretrain.py>
+- lit-GPT issue #1110 (sample logs): <https://github.com/Lightning-AI/litgpt/issues/1110>
+- lit-GPT issue #1607 (sample logs): <https://github.com/Lightning-AI/litgpt/issues/1607>
+- torch/distributed/run.py (standalone branch): <https://github.com/pytorch/pytorch/blob/main/torch/distributed/run.py>
+
+Round-3 additions:
+
+- Kubelet defaults: <https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/apis/config/v1beta1/defaults.go>
+- KEP-3059 (closed): <https://github.com/kubernetes/enhancements/issues/3059>
+- `file_storage` extension source: `extension/storage/filestorage/extension.go`, `factory.go`
+- `k8sattributesprocessor` overlap behavior: <https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/processor/k8sattributesprocessor/processor.go>
+- Triton metrics: <https://github.com/triton-inference-server/server/blob/main/docs/user_guide/metrics.md>
+- SageMaker CloudWatch monitoring: <https://docs.aws.amazon.com/sagemaker/latest/dg/monitoring-cloudwatch.html>
+- Bedrock model-customization monitor: <https://docs.aws.amazon.com/bedrock/latest/userguide/model-customization-monitor.html>
+- W&B GPU asset source: <https://github.com/wandb/wandb/blob/main/wandb/sdk/internal/system/assets/gpu.py>
+- MLflow system metrics: <https://github.com/mlflow/mlflow/tree/master/mlflow/system_metrics/metrics>
+- OTel process registry (no `distributed.*`): <https://github.com/open-telemetry/semantic-conventions/blob/main/model/process/registry.yaml>
+- OTel system-metrics spec (`system.*` host-only): <https://github.com/open-telemetry/semantic-conventions/blob/main/docs/system/system-metrics.md>
+- OTel community SIG list (no ML SIG): <https://github.com/open-telemetry/community>
+- OTTL ExtractPatterns / IsMatch: <https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/pkg/ottl/ottlfuncs/README.md>
+- transformprocessor README: <https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/processor/transformprocessor/README.md>