From 87280bf6be2c2ea0ab50f77378f6c4177008ecbe Mon Sep 17 00:00:00 2001 From: Tri Lam Date: Wed, 20 May 2026 20:15:23 -0700 Subject: [PATCH] [chore] kineto: stub MaxEvents in cap test, cut verify-test by ~390s MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit TestParse_MaxEventsCap built a 16M-event / ~1.1 GB JSON string and ran ~398s under -race, dominating verify-test wallclock (pkg/kineto = 392s of ~470s total). MaxEvents is excluded from -coverpkg, so the test contributed zero coverage signal — only the cap-path assertion. Promote MaxEvents from const to var (mirrors existing MaxBytes idiom), have the test lower the cap to 100, and remove the testing.Short skip since the cheap form always belongs in CI. Override pattern matches TestParse_MaxBytesCap_DistinguishedFromTruncated. Receipts (local, M-series): pkg/kineto race tests: 392s -> 1.5s TestParse_MaxEventsCap: 398s -> <0.01s make coverage-check total: 470s -> 25s Per-package coverage % identical pre/post (verified by diffing both runs). On CI ubuntu-latest, verify-test's coverage-check step (~125s) is expected to drop ~80-90s; pkg/kineto no longer dominates and the tail (cmd/tracecore, receivers/kineto, receivers/kernelevents at ~17s each) bounds the job. RFC-0012 wording updated: "package-level constants" -> "package-level bounds" with a note that MaxBytes and MaxEvents are vars (test overrides only); MaxStringBytes and MaxDepth remain const. stress_2gb_test.go (14M events) was audited — already gated with //go:build !race and testing.Short(), so it is not in the race path. Signed-off-by: Tri Lam Co-Authored-By: Claude Opus 4.7 (1M context) Signed-off-by: Tri Lam --- docs/rfcs/0012-kineto-receiver-scope.md | 4 ++-- pkg/kineto/decoder.go | 9 +++++++-- pkg/kineto/decoder_test.go | 16 ++++++++++++---- 3 files changed, 21 insertions(+), 8 deletions(-) diff --git a/docs/rfcs/0012-kineto-receiver-scope.md b/docs/rfcs/0012-kineto-receiver-scope.md index cbaae4d2..997fce9e 100644 --- a/docs/rfcs/0012-kineto-receiver-scope.md +++ b/docs/rfcs/0012-kineto-receiver-scope.md @@ -236,7 +236,7 @@ Every error path calls `selftel.IncError(kind)` exactly once per file (not per e ### Resource bounds -Pinned in `pkg/kineto/decoder.go` package-level constants: +Pinned in `pkg/kineto/decoder.go` as package-level bounds: | Bound | Default | Derivation | |---|---|---| @@ -245,7 +245,7 @@ Pinned in `pkg/kineto/decoder.go` package-level constants: | `MaxStringBytes` | `1<<20` (1 MiB) | Per-field cap; matches the M11 precedent; defends against malformed `args.Stream` strings | | `MaxDepth` | `64` | Chrome-trace structure is flat; no nesting beyond `args.{}` | -Arithmetic for `MaxEvents`: `2.2e9 bytes / 140 bytes/event ≈ 15.7e6 events`, rounded up to the nearest power of two (`1<<24 = 16.78e6`). Exceed any bound triggers `ErrLimitExceeded`, aborts the file, bumps `IncError(KindLimitExceeded)`, and continues to the next file. +`MaxBytes` and `MaxEvents` are declared as `var` (not `const`) so tests can lower them to exercise the cap path on a tiny input. Production code does not mutate either; `MaxStringBytes` and `MaxDepth` remain `const`. Arithmetic for `MaxEvents`: `2.2e9 bytes / 140 bytes/event ≈ 15.7e6 events`, rounded up to the nearest power of two (`1<<24 = 16.78e6`). Exceed any bound triggers `ErrLimitExceeded`, aborts the file, bumps `IncError(KindLimitExceeded)`, and continues to the next file. ### `safe.Call` boundary diff --git a/pkg/kineto/decoder.go b/pkg/kineto/decoder.go index a4c9a932..a49a3003 100644 --- a/pkg/kineto/decoder.go +++ b/pkg/kineto/decoder.go @@ -16,9 +16,14 @@ import ( // See RFC-0012 §Resource bounds. var MaxBytes int64 = 1 << 32 // 4 GiB -// Per-event and per-field resource bounds. See RFC-0012 §Resource bounds. +// MaxEvents caps the number of traceEvents we will parse. Package-level +// var (not const); mirrors MaxBytes so tests can lower it to exercise +// the cap path without allocating a 16M-event input. Production code does +// not mutate it. See RFC-0012 §Resource bounds. +var MaxEvents = 1 << 24 // ~16.7M; derived from 2.2 GB / ~140 bytes-per-event median (pytorch#130622) + +// Per-field resource bounds. See RFC-0012 §Resource bounds. const ( - MaxEvents = 1 << 24 // ~16.7M; derived from 2.2 GB / ~140 bytes-per-event median (pytorch#130622) MaxStringBytes = 1 << 20 // 1 MiB per string field (Name/Cat/Ph/Args.ExternalID) MaxDepth = 64 // Chrome-trace structure is flat; 64 is generous ) diff --git a/pkg/kineto/decoder_test.go b/pkg/kineto/decoder_test.go index 5a154afc..3a5d6e24 100644 --- a/pkg/kineto/decoder_test.go +++ b/pkg/kineto/decoder_test.go @@ -81,12 +81,20 @@ func TestParse_VisitErrorPropagates(t *testing.T) { } func TestParse_MaxEventsCap(t *testing.T) { - if testing.Short() { - t.Skip("skipping MaxEvents stress under -short") - } + // Stub MaxEvents down to ~100 to exercise the cap path on a tiny + // input. Override pattern mirrors the MaxBytes test below + // (TestParse_MaxBytesCap_DistinguishedFromTruncated). The naive form + // (build MaxEvents+1 = 16M events as a ~1.1 GB JSON string and run + // the parser under -race) took ~400s and dominated verify-test + // wallclock; this preserves the assertion at sub-millisecond cost. + const stubCap = 100 + orig := MaxEvents + MaxEvents = stubCap + t.Cleanup(func() { MaxEvents = orig }) + var sb strings.Builder sb.WriteString(`{"traceEvents":[`) - for i := 0; i < MaxEvents+1; i++ { + for i := 0; i < stubCap+1; i++ { if i > 0 { sb.WriteString(",") }