Skip to content

fix(core): allow PERCY_GZIP to bypass raw-size cap on asset discovery (PER-8648)#2250

Merged
ninadbstack merged 8 commits into
masterfrom
fix/per-8648-gzip-bypasses-size-cap
Jun 1, 2026
Merged

fix(core): allow PERCY_GZIP to bypass raw-size cap on asset discovery (PER-8648)#2250
ninadbstack merged 8 commits into
masterfrom
fix/per-8648-gzip-bypasses-size-cap

Conversation

@pranavz28

@pranavz28 pranavz28 commented May 28, 2026

Copy link
Copy Markdown
Contributor

Fixes PER-8648.

Problem

Resources whose raw bytes exceed MAX_RESOURCE_SIZE (~15.75 MB) are rejected at network capture time in network.js, even when PERCY_GZIP=true is set. Because discovery.js only gzips resources after they've been added to resources[], PERCY_GZIP could never compress a resource that the size cap rejected first — making the env var effectively useless for the case it is most needed: large AEM/Sitecore clientlib-* CSS bundles on enterprise sites.

Real customer impact today on PER-8648: cooperlighting.com (Signify, AEM) has a single 29.3 MB clientlib-main.min.ACSHASH<hash>.css that Percy can never capture. The renderer falls back to fetching from origin, but the AEM ACSHASH cache-busts on every publish — by render time the hash has rolled and origin returns 404 — so the page renders unstyled. None of the previously-suggested workarounds (PERCY_GZIP, enableJavaScript, visual_allow_all_hostname) bypass network.js:643 by design.

Change

packages/core/src/network.js

  • New shouldSkipForSize(size) helper. When PERCY_GZIP is set it relaxes the raw cap to an 80 MB ceiling (kept well under the 100 MB CDP WebSocket payload limit, bounds CLI memory). Without PERCY_GZIP, behavior is unchanged — still MAX_RESOURCE_SIZE.
  • All four > MAX_RESOURCE_SIZE sites (inspectContentLength, captureResourceDirectly, saveResponseResource body check, and via tooLarge in the Fetch interceptor) now go through shouldSkipForSize.
  • Log message text (- Skipping resource larger than 25MB) and the structured asset_not_uploaded / resource_too_large instrumentation are preserved — downstream parsers and existing tests are unaffected.

packages/core/src/discovery.js

  • In the existing PERCY_GZIP loop, after gzipping each resource, drop it if the gzipped bytes still exceed MAX_RESOURCE_SIZE. This preserves the original guarantee on the upload payload (oversized resources cannot reach the server) and surfaces the rejection in CLI logs at the gzip stage instead of at capture time.

Tests added (packages/core/test/discovery.test.js)

  • captures resource larger than 25MB raw when PERCY_GZIP is enabled — 30 MB text/css of 'A'.repeat(30_000_000) is captured into the snapshot, gzipped, and uploaded.
  • still skips resource above PERCY_GZIP raw-size ceiling — 90 MB resource is still rejected even with PERCY_GZIP=true (verifies the ceiling).
  • All existing size-cap tests (does not capture large files, checks if no header is send, does not capture remote files with content-length NAN…, the three Content-Length casing cases, skips file greater than 100MB) continue to pass unchanged — shouldSkipForSize falls through to the original MAX_RESOURCE_SIZE check when PERCY_GZIP is unset.

Verification (live builds on prod Percy)

Same Python Playwright snapshot of https://www.cooperlighting.com/global, identical except for PERCY_GZIP:

Baseline (no PERCY_GZIP) — build 50211556 Fix path (PERCY_GZIP=true) — build 50211623
CLI log for 29.3 MB clientlib-main.min.ACSHASH<hash>.css [ASSET_NOT_UPLOADED] Resource too large → skipped Processing resource: …/clientlib-main.min.ACSHASH<hash>.css
Renderer head status on the CSS 404 (origin fallback failed on cache-busted hash) bundled in snapshot — fallback not needed
Renderer success rate 91.7% (1 critical failure) 100.0%
Post-gzip cap rejections (new check) n/a 0 (29.3 MB gzipped → ~30 KB, well under cap)

Discussion / alignment in Slack thread above with Ninad. cc @rishigupta1599 (author of the original c0cafd33 / PR #1803 that tightened the cap to account for base64 inflation), @rahul-barnwal, @aryan2611.

🤖 Generated with Claude Code

… (PER-8648)

Resources whose raw bytes exceed MAX_RESOURCE_SIZE (~15.75 MB) were rejected
at network capture time even when PERCY_GZIP=true was set. Because
discovery.js applies gzip only AFTER resources are added to resources[],
PERCY_GZIP could never compress a resource that the size cap rejected
first — making the env var effectively useless for the case it's most
needed (large AEM/Sitecore clientlib CSS bundles).

This change introduces shouldSkipForSize(): when PERCY_GZIP is enabled,
the raw cap is relaxed up to an 80 MB ceiling (kept well under the CDP
WebSocket payload limit). Resources are gzipped as before in
discovery.js, with a new post-gzip cap re-check at the existing
MAX_RESOURCE_SIZE so the upload payload still respects the original
limit.

Verified on cooperlighting.com (29.3 MB clientlib CSS):
- Baseline build 50211556 (no PERCY_GZIP): CSS rejected at capture,
  renderer 404 on origin fallback, 91.7% success rate.
- Fix path build 50211623 (PERCY_GZIP=true): CSS captured + gzipped
  to ~30 KB, no origin fallback needed, 100% success rate.

Slack thread: https://browserstack.slack.com/archives/C0543RYTFGB/p1779966728108129

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@pranavz28 pranavz28 requested a review from a team as a code owner May 28, 2026 12:08
Addresses P1/P2 review findings on PR #2250:

P1 — production safety:
- Lower MAX_RAW_RESOURCE_SIZE_WITH_GZIP from 80 MB → 50 MB. CDP returns
  binary bodies as base64 (+33%); 80 MB raw became ~107 MB on the wire,
  exceeding the ws library's 100 MB maxPayload default and tearing down
  the CDP socket on bodies near the ceiling. 50 MB raw → ~67 MB base64,
  comfortably under the limit. (Final value pending storage-cost review.)
- browser.js: explicitly set ws maxPayload to 150 MB so the channel
  survives base64 + framing for any body up to the new ceiling.
- Wrap Pako.gzip + sha256 in try/catch inside the PERCY_GZIP loop. Before
  this, a single resource failing to gzip would abort the entire
  snapshot; now it logs + skips and continues.
- Scale DIRECT_FETCH_TIMEOUT from 5 s → 30 s when PERCY_GZIP is set.
  5 s could not download tens of MB on typical CI bandwidth; failures
  were swallowed silently.

P2 — correctness and observability:
- Wrap response.buffer() in raceWithTimeout (30 s / 60 s) so a hung CDP
  socket on a large body can't stall the discovery queue.
- New direct-fetch catch and buffer-fetch catch emit
  logAssetInstrumentation('asset_load_missing', 'network_error', …) so
  these failures are visible to the support tooling (analyse_build,
  analyze_cli_logs_tool) instead of being log.debug-only.
- Post-gzip drop in discovery.js emits the structured
  [ASSET_NOT_UPLOADED] resource_too_large signal that every other
  size-skip uses; previously it was log.debug-only.
- Exempt root (DOM HTML) and log resources from the post-gzip drop —
  shipping an oversized root and surfacing an API error is safer than
  silently producing a broken snapshot.
- Export MAX_RESOURCE_SIZE and logAssetInstrumentation from network.js;
  discovery.js imports MAX_RESOURCE_SIZE instead of redeclaring it.
- Rename MAX_RESOURCE_SIZE_GZIP_CEILING → MAX_RAW_RESOURCE_SIZE_WITH_GZIP
  (it gates raw bytes, not the gzipped output).
- Log message text changed from "Skipping resource larger than 25MB" to
  "Skipping resource larger than allowed size"; the actual size is in
  the structured logAssetInstrumentation payload.
- ByteLRU cache floor: raise the --max-cache-ram minimum to 50 MB when
  PERCY_GZIP is set so the cache can hold raw bodies up to the new
  ceiling instead of silently dropping them.

Tests:
- New: drops resource post-gzip when gzipped size still exceeds the cap
  (incompressible 20 MB random payload exercises the new branch).
- All "Skipping resource larger than 25MB" assertions updated to the new
  "allowed size" wording.
- afterEach now cleans up PERCY_GZIP so state can't leak between tests.

End-to-end re-verified on cooperlighting.com (build 50215292):
clientlib-main.min.ACSHASH<hash>.css (29.3 MB) processed into resources,
no [ASSET_NOT_UPLOADED] for it, only the unrelated Signify CDN 206s remain.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@pranavz28

Copy link
Copy Markdown
Contributor Author

Pushed review-fix commit (5f5a744) addressing P1/P2 findings from /ce:review:

P1 (production safety):

  • Ceiling lowered 80 MB → 50 MB raw. CDP returns binary bodies as base64 (+33%), so 80 MB became ~107 MB on the wire — over the ws library's 100 MB maxPayload default. 50 MB raw → ~67 MB on the wire, comfortably under. (Final value pending storage-cost review.)
  • browser.js: explicit maxPayload: 150 MB on the CDP WebSocket so it survives base64+framing for any body up to the new ceiling.
  • Pako.gzip + sha now wrapped in try/catch per resource — one bad resource can no longer abort the whole snapshot.
  • DIRECT_FETCH_TIMEOUT scaled to 30 s when PERCY_GZIP=true (was 5 s — far too short for tens of MB).

P2 (correctness + observability):

  • response.buffer() wrapped in raceWithTimeout (30 s / 60 s under PERCY_GZIP) — a hung CDP socket can no longer stall the discovery queue indefinitely.
  • Direct-fetch catch and buffer-fetch catch emit logAssetInstrumentation('asset_load_missing', 'network_error', …) so support tools (analyse_build, analyze_cli_logs_tool) see the failure instead of just a debug log.
  • Post-gzip drop in discovery.js now emits the same structured [ASSET_NOT_UPLOADED] resource_too_large signal as every other size-skip site.
  • Root and log resources are exempted from the post-gzip drop — shipping an oversized root and letting the API surface a clear error is safer than silently producing a broken snapshot.
  • MAX_RESOURCE_SIZE is now exported from network.js and imported in discovery.js — single source of truth, eliminates the drift hazard.
  • Constant renamed MAX_RESOURCE_SIZE_GZIP_CEILINGMAX_RAW_RESOURCE_SIZE_WITH_GZIP (it gates raw bytes, not gzipped output — the previous name read as the opposite).
  • Log message "Skipping resource larger than 25MB""Skipping resource larger than allowed size" (the previous wording was factually wrong under PERCY_GZIP; the actual size is already in the structured logAssetInstrumentation payload).
  • ByteLRU cache floor: minimum --max-cache-ram is now 50 MB when PERCY_GZIP is set (was hard-coded 25 MB; large raw bodies were being silently dropped from the cache, forcing re-gzip on every snapshot).

Tests:

  • New: drops resource post-gzip when gzipped size still exceeds the cap — uses a 20 MB crypto.randomBytes payload (incompressible), exercising the new discovery.js post-gzip branch that the previous tests bypassed.
  • All Skipping resource larger than 25MB assertions updated to the new wording.
  • afterEach now delete process.env.PERCY_GZIP so state can't leak between tests.

Re-verified end-to-end: cooperlighting.com build 50215292clientlib-main.min.ACSHASH<hash>.css (29.3 MB) is processed into the resource bundle, no [ASSET_NOT_UPLOADED] for it; only the unrelated Signify CDN HTTP 206 entries remain (separate, pre-existing issue).

Knowingly deferred (with notes):

  • Synchronous Pako.gzip on large bodies still blocks the event loop (~300–500 ms for a 30–40 MB body). Swapping to node:zlib.gzip is async/off-thread but would require making processSnapshotResources async — out of scope for this PR. Tracking as follow-up.
  • await response.buffer() materializes body into V8 heap before any --max-cache-ram accounting. With the new 50 MB ceiling + default concurrency 10, peak transient heap can reach ~500 MB. Real-world impact bounded by discovery.concurrency and snapshot composition. Worth a follow-up to do Content-Length-based budgeting before buffering.

🤖 Generated with Claude Code

pranavz28 and others added 5 commits May 28, 2026 19:08
…them

The previous commit's .catch on response.buffer() silently consumed any
underlying CDP error (returning null), breaking the existing
"Encountered an error processing resource" log path that the
"logs unhandled response errors gracefully" test asserts on.

Drop the .catch — errors (including the timeout from raceWithTimeout)
propagate naturally to the outer try/catch at the top of
saveResponseResource, which already logs the failure. Move the
structured logAssetInstrumentation call into that outer catch so
support tooling still sees buffer/timeout failures as
asset_load_missing/network_error events.

Net behavior:
- Existing test path preserved (CDP-level rejection → outer catch
  logs "Encountered an error processing resource: <url>" + stack).
- New observability: every failure in the body-processing block now
  also emits an [ASSET_LOAD_MISSING] structured event for support tools.
- Timeout protection from raceWithTimeout is retained.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI failure on the previous push was 100% coverage threshold misses, not
test failures (all 1071 specs passed). Two branches added in earlier
commits weren't exercised:
- discovery.js: the try/catch around Pako.gzip (P1.2 — one bad resource
  must not fail the whole snapshot)
- network.js:824: the PERCY_GZIP-aware DIRECT_FETCH_TIMEOUT ternary
  (P1.3 — 5s is too short for tens-of-MB direct fetches under gzip mode)

Two new tests:
- "skips resource and continues when Pako.gzip throws" — uses spyOn
  to make Pako.gzip throw on the second call, asserts the catch path
  logs the expected message and the snapshot completes.
- "uses the longer direct-fetch timeout when PERCY_GZIP is set" —
  drops Network.responseReceived to force captureResourceDirectly,
  returns 400 to trigger the catch; with PERCY_GZIP=true the
  truthy branch of the timeout ternary is exercised.

Both branches now have coverage; istanbul-ignore comments removed
since the catch and ternary are testable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The 'uses the longer direct-fetch timeout when PERCY_GZIP is set' test
asserted on a log.debug-level message ('Direct fetch failed for ...')
but never set percy.loglevel('debug'), so the message was filtered out
and the assertion failed. The sibling test at line 3359 inherits debug
via its describe-block beforeEach; mine is in the outer describe so
needs the explicit call.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two branches surfaced by the coverage gate:
- discovery.js:245 — the `?? 0` fallback in `resource.content?.length ?? 0`
  was dead defensiveness (the upstream try block guarantees `.content` is
  a Buffer at this point: either freshly assigned from Pako.gzip or
  alreadyZipped was true). Tighten to `resource.content.length` so the
  branch is gone instead of uncovered.
- discovery.js:514 — the PERCY_GZIP=true side of the cache-floor ternary
  (`process.env.PERCY_GZIP ? 50 : MAX_RESOURCE_SIZE_MB`) had no test.
  Added "clamps to the 50MB floor when PERCY_GZIP is set" alongside the
  existing 25MB-floor tests in the with --max-cache-ram describe block.

No source-behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wire browser.js's WebSocket maxPayload to the MAX_CDP_PAYLOAD constant
(set to 150 MB) instead of a magic literal, removing the previously
unused export and the stale 100 MB reference in the comment.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@pranavz28

Copy link
Copy Markdown
Contributor Author

PR Review — PER-8648

Summary: Allows PERCY_GZIP runs to capture raw asset bodies up to 50 MB (instead of the ~15.75 MB base64-adjusted cap), gzips them during snapshot processing, then re-applies the upload cap on the compressed size. Also raises the WebSocket maxPayload, adds timeouts around body buffering and direct fetches, and adds structured instrumentation on failure paths.

Review Table

Priority Category Check Status Notes
High Security No hardcoded secrets/credentials ✅ Pass None introduced
High Security AuthN/AuthZ checks present N/A No auth surface touched
High Security Input validation & sanitization ✅ Pass shouldSkipForSize guards non-finite sizes; size caps bound memory
High Security No IDOR N/A
High Security No SQL injection N/A
High Correctness Logic correct, edge cases ✅ Pass gzip try/catch, root/log resources preserved, finite-size guard all sound
High Correctness Explicit error handling, no swallowed exceptions ✅ Pass 42edeb24 fixed swallowed response.buffer() errors; timeouts propagate to outer catch
High Correctness No race conditions/concurrency ✅ Pass raceWithTimeout clears its timer via .finally()
Medium Testing New code has tests ✅ Pass Specs for raw-bypass, post-gzip drop, gzip throw, longer timeout, cache floor
Medium Testing Error/edge paths tested ✅ Pass gzip-throws + timeout branches covered
Medium Testing Existing tests pass (no regressions) ✅ Pass CI green
Medium Performance No unbounded data fetching ✅ Pass This PR adds the timeouts that bound buffering
Medium Performance Long tasks backgrounded N/A
Medium Quality Follows codebase patterns ✅ Pass Reuses logAssetInstrumentation, existing constants style
Medium Quality Focused/single concern ✅ Pass All scoped to PER-8648
Low Quality Meaningful names, no dead code ✅ Fixed MAX_CDP_PAYLOAD was exported-but-unused → now wired up
Low Quality Comments explain why ✅ Pass Comments are genuinely explanatory
Low Quality No unnecessary deps ✅ Pass No new deps

Findings (both addressed in 004724e)

1 — Dead constant (Low): MAX_CDP_PAYLOAD (network.js) was exported but never referenced; the real ws limit lived as a magic literal 150 * (1024 ** 2) in browser.js. → Set MAX_CDP_PAYLOAD = 150 MB and browser.js now imports and uses it, so the constant and the literal can't drift.

2 — Stale comment (Low): The network.js comment said "keep the raw ceiling small enough … under [the 100 MB default]", but browser.js overrides the default to 150 MB. → Comment now accurately states the 100 MB default is raised to MAX_CDP_PAYLOAD.

Observations (non-blocking)

  • Two cap semantics (intentional): network.js caps the raw body at 50 MB during capture; discovery.js caps the gzipped size at MAX_RESOURCE_SIZE (~15.75 MB). Text compresses and is kept; incompressible 15–50 MB assets are captured then dropped post-gzip — slightly wasteful but correct and well-commented.
  • Known TODO: The 50 MB ceiling is documented as "pending a storage-cost review (PER-8648)" — fine as a documented interim value.

Verdict

✅ Approve — solid, well-tested fix with correct error propagation and bounded memory. The two Low-severity cleanups have been applied.

🤖 Generated with Claude Code

/* istanbul ignore next: very hard to mock true */
if (!alreadyZipped) {
resource.content = Pako.gzip(resource.content);
resource.sha = sha256hash(resource.content);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can add one log line after we set resouce sha that resource has been zipped

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a debug log line right after the SHA is set: log.debug(- Gzipped resource: ${resource.url}). Pushed in 18feb25.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@pranavz28 pranavz28 added the 🧹 maintenance General maintenance label Jun 1, 2026
@ninadbstack ninadbstack merged commit 20a1b13 into master Jun 1, 2026
45 checks passed
@ninadbstack ninadbstack deleted the fix/per-8648-gzip-bypasses-size-cap branch June 1, 2026 13:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

🧹 maintenance General maintenance

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants