Tracking issue for NORTHSTARS O4 ("Standards" objective): file the first gen_ai.training.* semconv PR upstream at open-telemetry/semantic-conventions-genai.

Roadmap (binding): docs/standards-roadmap.md.
Technical proposal body (ready to copy-paste into the PR description): docs/proposals/gen-ai-training-semconv.md.

Hero KPI gate
External (non-tracecore) implementations of gen_ai.training.*:

Pre-filing prerequisites
semantic-conventions-genai Issue #88 (rl.* proposal) with a scope-overlap comment: alignment or graceful coexistence stance.
Recruit one non-tracecore co-author or sponsor from a SIG-regular vendor (Pyroscope, Datadog, Honeycomb, Microsoft, Google, Splunk approvers per CONTRIBUTING.md).
Reconcile two internal naming inconsistencies before PR-1:
gen_ai.training.job_id (current emit) → gen_ai.training.job.id (dotted, consistent with gen_ai.tool.call.id).
gen_ai.training.step_id (M14 plan) → gen_ai.training.step (int counter, consistent with proposal).

PR-1: minimal viable scope (M1 target)
gen_ai.training.run.id (string, development): logical training run id, stable across restart/pre-emption.
gen_ai.training.job.id (string, development): orchestrator-assigned job id.
gen_ai.training.rank (int, development): global zero-indexed process rank.
gen_ai.training.world_size (int, development): total process count (constant per job.id).
gen_ai.training.local_rank (int, development): per-node rank.

PR-2: step and collective (M3 target)
gen_ai.training.step (int, development): current optimizer step (monotonic per run.id).
gen_ai.training.collective.op (string, development): collective op enum (all_reduce, all_gather, reduce_scatter, broadcast, send, recv, barrier).
gen_ai.training.collective.tag (string, development): application-supplied tag distinguishing collectives in a step.
gen_ai.training.group_rank (int, development): rank within a parallelism group (DP/TP/PP/EP).
gen_ai.training.group_kind (string, development): required when group_rank present; enum data/tensor/pipeline/expert.

Cadence
docs/standards-roadmap.md §6 meeting log on each attend.

Risk fallbacks (per O4 Operating Rule "External implementations matter, not the chair seat")
rl.* proposal (Issue #88) lands at top level → tracecore engages on #88, arguing for nesting under gen_ai.training.rl.*.
SIG declines parallel gen_ai.training.* → tracecore adopts rl.* for the overlap subset and keeps internal names for the rest, marked alpha per docs/ATTRIBUTES.md soft-lock policy.
O4 stalls entirely past M12 → tracecore continues emitting the namespace as tracecore-stable and engages vendor partners directly for adoption ahead of the spec.

Cross-ref: in-repo dependents
rankjoinprocessor rank stamping (live, M19)
gen_ai.training.{rank,job_id} (live, RFC-0013 §3); job_id → job.id migration follows
gen_ai.training.world_size (deferred)
gen_ai.training.step_id → .step (planned)
patterndetectorprocessor nccl_hang rank fallback chain (live); nccl.rank / nccl.fr.rank preserved

Closing condition: PR-1 merged upstream + ≥1 external implementation observed (verified per docs/adoption-pipeline.md methodology).
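
For reviewers who want to sanity-check emitted attributes against the lists above, the naming reconciliation (job_id → job.id, step_id → step) and the PR-1/PR-2 constraints can be sketched as a small normalization and validation pass. This is a minimal sketch under stated assumptions, not tracecore's implementation: the helper names `migrate_attributes` and `validate_training_attributes`, the plain-dict attribute representation, and the "new key wins on conflict" rule are hypothetical. The `0 <= rank < world_size` check is inferred from "zero-indexed" and "total process count" rather than stated in the proposal. Only the attribute keys and enum values are taken from this issue.

```python
# Illustrative sketch only: helper names and the dict representation are
# assumptions, not part of tracecore or the upstream proposal.

PREFIX = "gen_ai.training."

# Legacy key -> proposed key, per the pre-filing naming reconciliation.
RENAMES = {
    PREFIX + "job_id": PREFIX + "job.id",
    PREFIX + "step_id": PREFIX + "step",
}

# Enum values as listed for PR-2.
COLLECTIVE_OPS = {
    "all_reduce", "all_gather", "reduce_scatter",
    "broadcast", "send", "recv", "barrier",
}
GROUP_KINDS = {"data", "tensor", "pipeline", "expert"}

INT_ATTRS = ("rank", "world_size", "local_rank", "step", "group_rank")


def migrate_attributes(attrs):
    """Return a copy of attrs with legacy keys renamed.

    Assumption: if both the legacy and the new key are present,
    the new key's value is kept and the legacy value is dropped.
    """
    out = {k: v for k, v in attrs.items() if k not in RENAMES}
    for old, new in RENAMES.items():
        if old in attrs and new not in out:
            out[new] = attrs[old]
    return out


def _is_int(value):
    # bool is a subclass of int in Python; reject it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def validate_training_attributes(attrs):
    """Return a list of problems; an empty list means all checks passed."""
    problems = []

    for name in INT_ATTRS:
        key = PREFIX + name
        if key in attrs and not _is_int(attrs[key]):
            problems.append(f"{key} must be an int")

    rank = attrs.get(PREFIX + "rank")
    world = attrs.get(PREFIX + "world_size")
    if _is_int(rank) and _is_int(world) and not 0 <= rank < world:
        # Inferred constraint: rank is zero-indexed, world_size is the count.
        problems.append(f"{PREFIX}rank must satisfy 0 <= rank < world_size")

    op = attrs.get(PREFIX + "collective.op")
    if op is not None and op not in COLLECTIVE_OPS:
        problems.append(f"{PREFIX}collective.op has unknown value {op!r}")

    kind = attrs.get(PREFIX + "group_kind")
    if PREFIX + "group_rank" in attrs and kind is None:
        problems.append(f"{PREFIX}group_kind is required when group_rank is present")
    if kind is not None and kind not in GROUP_KINDS:
        problems.append(f"{PREFIX}group_kind has unknown value {kind!r}")

    return problems


if __name__ == "__main__":
    legacy = {PREFIX + "job_id": "job-7", PREFIX + "step_id": 12, PREFIX + "rank": 0}
    migrated = migrate_attributes(legacy)
    print(migrated)
    print(validate_training_attributes(migrated))
```

Returning a list of problems instead of raising lets a caller report every violation in one pass, which may be convenient when auditing an external implementation for the closing condition.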