
[m14] kineto profiler receiver: RFC-0012 + pkg/kineto/ parser + receiver + bench scaffold #107

Closed
trilamsr wants to merge 15 commits into main from worktree-m14-kineto-design-spec

@trilamsr

Contributor

Summary

Ships M14 (Kineto profiler receiver) end-to-end across the four phases of the M14 plan (the plan is gitignored locally; the design lives in RFC-0012 and the spec). The branch is a single logical PR because of the autonomous-run constraint; it can be split into four cuts before merge if preferred.

  • RFC-0012 design-locks the receiver. It was originally drafted as RFC-0010 but was renumbered after [docs] rfc-0010: containerstdout receiver scope (M15, alpha; design-locked draft) #94 and an in-flight M16 RFC claimed the slot.
  • MILESTONES.md M14 rubric corrections: step-id source (the ProfilerStep#N interval, not args.Iteration; the latter was falsified during design review against the PyTorch source); sampling determinism (per-(rank, step) repeated-lookup equality, not cross-rank set equality; mod-10000 granularity); fr_trace analogy disclaimer.
  • pkg/kineto/ streaming Chrome-trace parser: stdlib-only, no internal/ imports. Includes typed sentinel errors, a deterministic Synthesize generator, the checked-in toy_2step fixture + golden + SHA256SUMS, FuzzParseKinetoTrace (30s in-tree), and an advisory BenchmarkParse_50MB. make generate-fixtures-check is wired into make ci.
  • components/receivers/kineto/ receiver: single-pass ProfilerStep#N stack + complete-phase interval tracker; /proc/<pid>/environ rank discovery (with a filename fallback for the tensorboard_trace_handler worker prefix and a receiver-env fallback); deterministic per-(rank, step) sampling; optional cpu_op aggregation within a step; fsnotify IN_CLOSE_WRITE trigger (gz-aware glob); safe.Call wrap; degraded-state machine; README + RUNBOOK.
  • Factory registered in components.yaml; cmd/tracecore/components.go regenerated.
  • MILESTONES.md M14 flipped ☐ → ☑ alpha; functional rubrics ☑, non-functional rubrics pending the PR D wall-time runs.
  • Bench scaffold: bench/overhead/kineto_bench_test.go (10-min CPU gate), soak_test.go (//go:build soak, 100 × 50MB RSS gate), stress_2gb_test.go (//go:build !race, 2GB HeapAlloc ceiling). Wall-time runs are deferred: the code lands now, and ops can flip the rubrics once the gates run.
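The sampling-determinism rubric (per-(rank, step) repeated-lookup equality at mod-10000 granularity) can be illustrated with a minimal Go sketch. It assumes a hash-then-modulo scheme; the helper name `keepStep`, the FNV-1a hash, and the basis-point `rateBP` parameter are illustrative assumptions, not the receiver's actual API.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// sampleModulus reflects the rubric's "mod 10000" granularity: rates are in
// basis points, so rateBP = 500 means 5%. (Assumption for this sketch.)
const sampleModulus = 10000

// keepStep (hypothetical name) decides whether one (rank, step) pair is kept.
// It is a pure function of its inputs, so repeated lookups always agree.
func keepStep(rank, step, rateBP uint64) bool {
	var buf [16]byte
	binary.LittleEndian.PutUint64(buf[0:8], rank)
	binary.LittleEndian.PutUint64(buf[8:16], step)
	h := fnv.New64a()
	h.Write(buf[:]) // Write on a hash.Hash never returns an error
	return h.Sum64()%sampleModulus < rateBP
}

func main() {
	fmt.Println(keepStep(3, 42, 500) == keepStep(3, 42, 500)) // true

	// Different ranks hash differently, so their kept-step sets may differ.
	for rank := uint64(0); rank < 2; rank++ {
		kept := 0
		for step := uint64(0); step < 10000; step++ {
			if keepStep(rank, step, 500) {
				kept++
			}
		}
		fmt.Printf("rank %d kept %d of 10000 steps at 500 bp\n", rank, kept)
	}
}
```

In this sketch the decision depends only on (rank, step), so asking twice always gives the same answer, while two ranks can keep different step sets. That is consistent with the rubric checking repeated-lookup equality rather than cross-rank set equality.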
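Rank discovery from `/proc/<pid>/environ` comes down to splitting the file on NUL bytes and reading a rank variable. The minimal Go sketch below assumes the `RANK` variable that `torchrun` exports; the helper names `rankFromEnviron` and `rankForPID` are hypothetical, and the PR does not say which variables the receiver reads or how its filename and receiver-env fallbacks are ordered.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"strconv"
)

// rankFromEnviron (hypothetical helper) parses the NUL-separated KEY=VALUE
// records of a /proc/<pid>/environ file. Keys are tried in priority order;
// the value of the first key that parses as a non-negative integer wins.
func rankFromEnviron(environ []byte, keys ...string) (int, bool) {
	vars := make(map[string]string)
	for _, rec := range bytes.Split(environ, []byte{0}) {
		k, v, ok := bytes.Cut(rec, []byte("="))
		if !ok {
			continue // empty trailing record or malformed entry
		}
		if _, seen := vars[string(k)]; !seen {
			vars[string(k)] = string(v) // first occurrence wins
		}
	}
	for _, key := range keys {
		// A missing key yields "", which fails Atoi and is skipped.
		if n, err := strconv.Atoi(vars[key]); err == nil && n >= 0 {
			return n, true
		}
	}
	return 0, false
}

// rankForPID reads the environment of process pid. Linux-only; the caller
// needs permission to read that process's environ file, and the file shows
// the environment the program was started with, not later changes.
func rankForPID(pid int) (int, bool) {
	data, err := os.ReadFile(fmt.Sprintf("/proc/%d/environ", pid))
	if err != nil {
		return 0, false
	}
	return rankFromEnviron(data, "RANK") // assumes torchrun's RANK variable
}

func main() {
	env := []byte("PATH=/usr/bin\x00RANK=3\x00WORLD_SIZE=8\x00")
	fmt.Println(rankFromEnviron(env, "RANK")) // 3 true

	// Outside a torchrun launch RANK is usually unset, so this reports false.
	fmt.Println(rankForPID(os.Getpid()))
}
```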

Verified during design (May 2026)

Cited verbatim in RFC-0012:

  • pytorch/pytorch torch/profiler/profiler.py:44, 651, 655, 1165-1169, 1230-1233: PROFILER_STEP_NAME, default filename {worker}.{ts_ns}.pt.trace.json, direct-write export_chrome_trace (no tmp+rename, so IN_CLOSE_WRITE is the safe trigger), record_function("ProfilerStep#N").
  • pytorch/kineto libkineto/src/output_json.cpp:341-352, 543, 557, 580, 593: top-level key order; traceEvents is written between handleTraceStart and finalizeTrace; there are no top-level samples or stackFrames keys, so a decoder that streams from the top is safe.

Test plan

  • `go test -race ./pkg/kineto/...` — PASS
  • `go test -race -short ./components/receivers/kineto/...` — PASS (all sub-tests: config, step tracker, rank, emit, sampling, ingest golden, aggregation, factory, shutdown, degraded)
  • `go test -fuzz=FuzzParseKinetoTrace -fuzztime 30s ./pkg/kineto/` — 506k execs, no failures, 154 corpus entries
  • `make lint` — 0 issues
  • `make vet` — clean
  • `make doc-check` — banned-phrase + section lint clean
  • `make generate-fixtures-check` — no fixture drift
  • `make generate-check` — `cmd/tracecore/components.go` regenerated, no drift
  • `go list -deps ./pkg/kineto/ | grep tracecoreai/tracecore/internal` — empty (no `internal/` leak from `pkg/`)
  • `go build ./cmd/tracecore/` — kineto factory linked
  • PR D wall-time runs (10-min CPU bench, 100-file soak, 2GB stress): deferred; the code lands now, and ops will run the gates and flip the non-functional rubrics

Carry-forward

  • Non-functional rubric ☑ flips (CPU ≤0.50%, egress ≤0.5Mbps, RSS ≤30MB, peak HeapAlloc <80MB) after PR D wall-time runs.
  • `tools/kineto-lint/strace_test.go` (read-only assertion). Trigger: any change to `ingest.go`'s `os.Open` line.
  • `pattern_consumer_test.go` compile-time gate for M18 downstream consumer. Trigger: M18 plan starts.
  • Upstream `gen_ai.training.*` semconv ratification per NORTHSTARS O4.

🤖 Generated with Claude Code

trilamsr added 15 commits May 19, 2026 13:16. The expanded commit bodies each end with Signed-off-by: Tri Lam <trilamsr@gmail.com>. Visible (truncated) commit-message fragments:

  • …nism, fr_trace disclaimer
  • …els (Tasks 3-4)
  • …maxevents/golden-stub) (Tasks 5-8, 10-stub)
  • …ip tests (Task 9)
  • …nerate-fixtures CI gate (Task 10)
  • …ocyclo (Task 13)
  • …gest, factory, watch loop (Tasks 14-22)
  • …(Tasks 21-25)
  • …o extract, testifylint, forcetypeassert helper
  • …OK/example_config (Tasks 26-27)
  • …erred (Tasks 30-33)
@trilamsr

Contributor Author

Closing in favor of stacked PRs A→B→C→D per the post-shipping review (gap #11). Replacements: m14-a-rfc, m14-b-parser, m14-c-receiver, m14-d-bench. Each is reviewable independently; the merge order matches the stack.

@trilamsr trilamsr closed this May 20, 2026
@trilamsr trilamsr deleted the worktree-m14-kineto-design-spec branch May 30, 2026 09:10