fix(core): allow PERCY_GZIP to bypass raw-size cap on asset discovery (PER-8648) #2250
… (PER-8648)

Resources whose raw bytes exceed MAX_RESOURCE_SIZE (~15.75 MB) were rejected at network capture time even when PERCY_GZIP=true was set. Because discovery.js applies gzip only AFTER resources are added to resources[], PERCY_GZIP could never compress a resource that the size cap rejected first, making the env var effectively useless for the case where it is most needed (large AEM/Sitecore clientlib CSS bundles).

This change introduces shouldSkipForSize(): when PERCY_GZIP is enabled, the raw cap is relaxed up to an 80 MB ceiling (kept well under the CDP WebSocket payload limit). Resources are gzipped as before in discovery.js, with a new post-gzip re-check against the existing MAX_RESOURCE_SIZE so the upload payload still respects the original limit.

Verified on cooperlighting.com (29.3 MB clientlib CSS):
- Baseline build 50211556 (no PERCY_GZIP): CSS rejected at capture, renderer 404 on origin fallback, 91.7% success rate.
- Fix path build 50211623 (PERCY_GZIP=true): CSS captured and gzipped to ~30 KB, no origin fallback needed, 100% success rate.

Slack thread: https://browserstack.slack.com/archives/C0543RYTFGB/p1779966728108129

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
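A minimal Node.js sketch of the raw-size gate described above. It assumes `MAX_RESOURCE_SIZE` is 25 MiB × 0.63 (which reproduces the ~15.75 MB quoted), treats MB as MiB, and uses the 50 MB ceiling that the review follow-up settled on (this commit originally used 80 MB). The `env` parameter is a hypothetical convenience for testing; the real helper in network.js may read `process.env` directly.

```javascript
// Assumed constants. 25 MiB * 0.63 reproduces the ~15.75 MB MAX_RESOURCE_SIZE quoted above.
const MiB = 1024 ** 2;
const MAX_RESOURCE_SIZE = 25 * MiB * 0.63;
// Raw-byte ceiling under PERCY_GZIP: 80 MB in this commit, lowered to 50 MB in the review follow-up.
const MAX_RAW_RESOURCE_SIZE_WITH_GZIP = 50 * MiB;

// Returns true when a resource of `size` raw bytes should be skipped at capture time.
// `env` is a parameter only to make the sketch testable; it defaults to process.env.
function shouldSkipForSize(size, env = process.env) {
  // Any non-empty PERCY_GZIP value enables the relaxed cap (a plain truthiness check).
  const limit = env.PERCY_GZIP ? MAX_RAW_RESOURCE_SIZE_WITH_GZIP : MAX_RESOURCE_SIZE;
  return size > limit;
}

// Example: a ~29.3 MB body, like the clientlib CSS in the report.
const cssBytes = Math.round(29.3 * MiB);
console.log(shouldSkipForSize(cssBytes, {}));                     // true: rejected without gzip
console.log(shouldSkipForSize(cssBytes, { PERCY_GZIP: 'true' })); // false: passes the raw gate
```

Passing the raw gate is only half of the fix: the gzipped bytes are still checked against `MAX_RESOURCE_SIZE` in discovery.js before upload.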
Addresses P1/P2 review findings on PR #2250:

P1 (production safety):
- Lower MAX_RAW_RESOURCE_SIZE_WITH_GZIP from 80 MB → 50 MB. CDP returns binary bodies as base64 (+33%); 80 MB raw became ~107 MB on the wire, exceeding the ws library's 100 MB maxPayload default and tearing down the CDP socket for bodies near the ceiling. 50 MB raw → ~67 MB base64, comfortably under the limit. (Final value pending storage-cost review.)
- browser.js: explicitly set ws maxPayload to 150 MB so the channel survives base64 + framing for any body up to the new ceiling.
- Wrap Pako.gzip + sha256 in try/catch inside the PERCY_GZIP loop. Before this, a single resource failing to gzip would abort the entire snapshot; now it logs, skips, and continues.
- Scale DIRECT_FETCH_TIMEOUT from 5 s → 30 s when PERCY_GZIP is set. 5 s was not enough to download tens of MB on typical CI bandwidth, and the failures were swallowed silently.

P2 (correctness and observability):
- Wrap response.buffer() in raceWithTimeout (30 s / 60 s) so a hung CDP socket on a large body can't stall the discovery queue.
- The new direct-fetch and buffer-fetch catches emit logAssetInstrumentation('asset_load_missing', 'network_error', …) so these failures are visible to the support tooling (analyse_build, analyze_cli_logs_tool) instead of being log.debug-only.
- The post-gzip drop in discovery.js emits the structured [ASSET_NOT_UPLOADED] resource_too_large signal that every other size-skip uses; previously it was log.debug-only.
- Exempt root (DOM HTML) and log resources from the post-gzip drop: shipping an oversized root and surfacing an API error is safer than silently producing a broken snapshot.
- Export MAX_RESOURCE_SIZE and logAssetInstrumentation from network.js; discovery.js imports MAX_RESOURCE_SIZE instead of redeclaring it.
- Rename MAX_RESOURCE_SIZE_GZIP_CEILING → MAX_RAW_RESOURCE_SIZE_WITH_GZIP (it gates raw bytes, not the gzipped output).
- Change the log message text from "Skipping resource larger than 25MB" to "Skipping resource larger than allowed size"; the actual size is in the structured logAssetInstrumentation payload.
- ByteLRU cache floor: raise the --max-cache-ram minimum to 50 MB when PERCY_GZIP is set so the cache can hold raw bodies up to the new ceiling instead of silently dropping them.

Tests:
- New: drops resource post-gzip when the gzipped size still exceeds the cap (an incompressible 20 MB random payload exercises the new branch).
- All "Skipping resource larger than 25MB" assertions updated to the new "allowed size" wording.
- afterEach now cleans up PERCY_GZIP so state can't leak between tests.

End-to-end re-verified on cooperlighting.com (build 50215292): clientlib-main.min.ACSHASH<hash>.css (29.3 MB) processed into resources with no [ASSET_NOT_UPLOADED] for it; only the unrelated Signify CDN 206s remain.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
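Whether a resource survives the post-gzip cap depends entirely on its content, which is why the new test reaches that branch with random bytes. A minimal Node.js sketch, using the built-in `zlib.gzipSync` as a stand-in for Pako and a made-up repeated CSS rule as a stand-in for a clientlib bundle (both are illustrative assumptions, not the test fixtures), compares how much each input shrinks:

```javascript
const zlib = require('zlib');
const crypto = require('crypto');

// Ratio of gzipped size to raw size: close to 0 for repetitive text,
// about 1 (slightly above, due to gzip overhead) for random bytes.
function gzipRatio(buf) {
  return zlib.gzipSync(buf).length / buf.length;
}

// Illustrative inputs only: a repeated CSS rule (~3.3 MB) and the same number of random bytes.
const css = Buffer.from('.btn{color:#fff;background:#0a5}\n'.repeat(100000));
const noise = crypto.randomBytes(css.length);

console.log(gzipRatio(css).toFixed(4), gzipRatio(noise).toFixed(4));
```

Highly repetitive text collapses by orders of magnitude (consistent with the 29.3 MB CSS shrinking to ~30 KB), while random data stays at roughly its raw size and therefore still trips the cap.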
Pushed review-fix commit (5f5a744) addressing the P1/P2 review findings.
P1 (production safety):
P2 (correctness + observability):
Tests:
Re-verified end-to-end: cooperlighting.com build 50215292.
Knowingly deferred (with notes):
🤖 Generated with Claude Code
…them

The previous commit's .catch on response.buffer() silently consumed any underlying CDP error (returning null), breaking the existing "Encountered an error processing resource" log path that the "logs unhandled response errors gracefully" test asserts on.

Drop the .catch: errors (including the timeout from raceWithTimeout) propagate naturally to the outer try/catch at the top of saveResponseResource, which already logs the failure. Move the structured logAssetInstrumentation call into that outer catch so support tooling still sees buffer/timeout failures as asset_load_missing/network_error events.

Net behavior:
- Existing test path preserved (CDP-level rejection → outer catch logs "Encountered an error processing resource: <url>" + stack).
- New observability: every failure in the body-processing block now also emits an [ASSET_LOAD_MISSING] structured event for support tools.
- Timeout protection from raceWithTimeout is retained.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
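A minimal sketch of the resulting control flow. `raceWithTimeout` is named in the commit, but its signature and implementation here are assumptions; `saveBody`, `logError`, and `logInstrumentation` are hypothetical stand-ins for the body-processing block, the existing "Encountered an error processing resource" log, and `logAssetInstrumentation`. There is deliberately no `.catch` on the buffer call, so both a CDP rejection and the timeout reach the single outer catch.

```javascript
// Hypothetical helper: settle like `promise`, or reject once `ms` elapses first.
function raceWithTimeout(promise, ms, message = `Timed out after ${ms}ms`) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(message)), ms);
  });
  // Clear the timer either way so a settled race does not keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Sketch of the body-processing block: no inner .catch, a single outer try/catch.
// `getBuffer` stands in for response.buffer(); the two loggers are injected stand-ins.
async function saveBody(url, getBuffer, { timeoutMs, logError, logInstrumentation }) {
  try {
    // `return await` keeps rejections inside this try block.
    return await raceWithTimeout(getBuffer(), timeoutMs);
  } catch (error) {
    // Existing log path, preserved for the "logs unhandled response errors gracefully" test.
    logError(`Encountered an error processing resource: ${url}`, error);
    // New structured event so support tooling sees buffer/timeout failures.
    logInstrumentation('asset_load_missing', 'network_error', { url, message: error.message });
    return null;
  }
}
```

Keeping a single catch means one place owns both the human-readable log and the structured event, so neither can be skipped by an inner handler swallowing the error.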
The CI failure on the previous push was caused by misses against the 100% coverage threshold, not by test failures (all 1071 specs passed). Two branches added in earlier commits weren't exercised:
- discovery.js: the try/catch around Pako.gzip (P1.2: one bad resource must not fail the whole snapshot)
- network.js:824: the PERCY_GZIP-aware DIRECT_FETCH_TIMEOUT ternary (P1.3: 5s is too short for tens-of-MB direct fetches under gzip mode)

Two new tests:
- "skips resource and continues when Pako.gzip throws": uses spyOn to make Pako.gzip throw on the second call, and asserts that the catch path logs the expected message and the snapshot completes.
- "uses the longer direct-fetch timeout when PERCY_GZIP is set": drops Network.responseReceived to force captureResourceDirectly and returns 400 to trigger the catch; with PERCY_GZIP=true, the truthy branch of the timeout ternary is exercised.

Both branches now have coverage; the istanbul-ignore comments were removed since the catch and the ternary are testable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
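The gzip loop these tests exercise can be sketched in Node.js as follows. The built-in `zlib.gzipSync` stands in for `Pako.gzip` and `crypto` for `sha256hash`; the resource shape (`url`, `content`, `root`, `log`), the `log` object with `debug` and `notUploaded`, the failure message, and `gzipResources` itself are hypothetical. A resource that fails to gzip is logged and skipped without aborting the rest; after gzipping, the `- Gzipped resource` debug line from the review thread is emitted, and an oversized result is dropped unless it is the root DOM or a log resource (per the earlier review commit).

```javascript
const zlib = require('zlib');
const crypto = require('crypto');

// Assumed: ~15.75 MB, matching MAX_RESOURCE_SIZE exported from network.js.
const MAX_RESOURCE_SIZE = 25 * 1024 ** 2 * 0.63;

const sha256hash = (buf) => crypto.createHash('sha256').update(buf).digest('hex');

// Gzips each resource in place and returns those that may still be uploaded.
// `log` is a hypothetical logger exposing debug(msg) and notUploaded(url, reason);
// `gzip` defaults to zlib.gzipSync, standing in for Pako.gzip.
function gzipResources(resources, log, gzip = zlib.gzipSync) {
  const kept = [];
  for (const resource of resources) {
    try {
      resource.content = gzip(resource.content);
      resource.sha = sha256hash(resource.content);
      log.debug(`- Gzipped resource: ${resource.url}`);
    } catch (error) {
      // One bad resource must not fail the whole snapshot: log it and move on.
      log.debug(`Skipping resource that failed to gzip: ${resource.url} (${error.message})`);
      continue;
    }
    // Post-gzip re-check; the root DOM and log resources are exempt from the drop.
    if (resource.content.length > MAX_RESOURCE_SIZE && !resource.root && !resource.log) {
      log.notUploaded(resource.url, 'resource_too_large');
      continue;
    }
    kept.push(resource);
  }
  return kept;
}
```

Injecting `gzip` mirrors what the spyOn-based test does: make one call throw and check that the other resources still come through.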
The 'uses the longer direct-fetch timeout when PERCY_GZIP is set' test
asserted on a log.debug-level message ('Direct fetch failed for ...')
but never set percy.loglevel('debug'), so the message was filtered out
and the assertion failed. The sibling test at line 3359 inherits debug
via its describe-block beforeEach; mine is in the outer describe so
needs the explicit call.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two branches surfaced by the coverage gate:
- discovery.js:245: the `?? 0` fallback in `resource.content?.length ?? 0` was dead defensiveness (the upstream try block guarantees `.content` is a Buffer at this point: either freshly assigned from Pako.gzip, or alreadyZipped was true). Tighten to `resource.content.length` so the branch is gone instead of uncovered.
- discovery.js:514: the PERCY_GZIP=true side of the cache-floor ternary (`process.env.PERCY_GZIP ? 50 : MAX_RESOURCE_SIZE_MB`) had no test. Added "clamps to the 50MB floor when PERCY_GZIP is set" alongside the existing 25MB-floor tests in the "with --max-cache-ram" describe block.

No source-behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
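A sketch of the clamp that the new test covers, assuming the floor is applied with `Math.max` to the user's `--max-cache-ram` value in MB. `MAX_RESOURCE_SIZE_MB = 25` is inferred from the existing 25MB-floor tests; `effectiveCacheRamMB` and its `env` parameter are hypothetical names for illustration.

```javascript
const MAX_RESOURCE_SIZE_MB = 25; // assumed from the existing "25MB floor" tests

// Clamp a requested --max-cache-ram (in MB) so the cache can hold at least one resource.
// With PERCY_GZIP set, raw bodies up to the 50 MB ceiling must fit, so the floor rises to 50.
function effectiveCacheRamMB(requestedMB, env = process.env) {
  const floor = env.PERCY_GZIP ? 50 : MAX_RESOURCE_SIZE_MB;
  return Math.max(requestedMB, floor);
}
```

Values above the floor pass through unchanged; only undersized settings are raised.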
Wire browser.js's WebSocket maxPayload to the MAX_CDP_PAYLOAD constant (set to 150 MB) instead of a magic literal, removing the previously unused export and the stale 100 MB reference in the comment.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
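The sizing arithmetic behind the 50 MB ceiling and the 150 MB maxPayload can be checked directly. This sketch assumes standard padded base64 (`4 * ceil(n / 3)` output bytes), treats MB as MiB, and ignores the JSON envelope and WebSocket framing overhead; `base64Length` and `fitsInPayload` are hypothetical helpers, not part of browser.js. The 100 MiB figure is the `ws` library's documented default `maxPayload`.

```javascript
const MiB = 1024 ** 2;
const WS_DEFAULT_MAX_PAYLOAD = 100 * MiB; // ws library default maxPayload
const MAX_CDP_PAYLOAD = 150 * MiB;        // value wired into browser.js (assumed to mean MiB)

// Length of standard (padded) base64 output for n raw bytes.
function base64Length(n) {
  return 4 * Math.ceil(n / 3);
}

// True if a raw body of n bytes, once base64-encoded by CDP, fits under maxPayload.
// Ignores JSON envelope and WebSocket framing overhead.
function fitsInPayload(n, maxPayload) {
  return base64Length(n) <= maxPayload;
}
```

Under these assumptions an 80 MB body encodes to about 107 MB and overflows the 100 MiB default, while a 50 MB body encodes to about 67 MB; the 150 MB limit leaves room for the envelope on top of that.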
PR Review: PER-8648
Summary: Allows …
Review Table
Findings (both addressed in 004724e)
1. Dead constant (Low):
2. Stale comment (Low): The …
Observations (non-blocking)
Verdict: ✅ Approve. Solid, well-tested fix with correct error propagation and bounded memory. The two Low-severity cleanups have been applied.
🤖 Generated with Claude Code
```js
/* istanbul ignore next: very hard to mock true */
if (!alreadyZipped) {
  resource.content = Pako.gzip(resource.content);
  resource.sha = sha256hash(resource.content);
}
```
We can add a log line after we set the resource SHA, noting that the resource has been zipped.
Added a debug log line right after the SHA is set: ``log.debug(`- Gzipped resource: ${resource.url}`)``. Pushed in 18feb25.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Fixes PER-8648.
Problem
Resources whose raw bytes exceed `MAX_RESOURCE_SIZE` (~15.75 MB) are rejected at network capture time in `network.js`, even when `PERCY_GZIP=true` is set. Because `discovery.js` only gzips resources after they've been added to `resources[]`, `PERCY_GZIP` could never compress a resource that the size cap rejected first, making the env var effectively useless for the case where it is most needed: large AEM/Sitecore `clientlib-*` CSS bundles on enterprise sites.

Real customer impact today on PER-8648: cooperlighting.com (Signify, AEM) has a single 29.3 MB `clientlib-main.min.ACSHASH<hash>.css` that Percy can never capture. The renderer falls back to fetching from origin, but the AEM `ACSHASH` cache-busts on every publish, so by render time the hash has rolled, origin returns 404, and the page renders unstyled. None of the previously suggested workarounds (`PERCY_GZIP`, `enableJavaScript`, `visual_allow_all_hostname`) bypass `network.js:643`, by design.

Change
`packages/core/src/network.js`
- New `shouldSkipForSize(size)` helper. When `PERCY_GZIP` is set, it relaxes the raw cap to an 80 MB ceiling (kept well under the 100 MB CDP WebSocket payload limit, and bounding CLI memory). Without `PERCY_GZIP`, behavior is unchanged: still `MAX_RESOURCE_SIZE`.
- The `> MAX_RESOURCE_SIZE` sites (`inspectContentLength`, `captureResourceDirectly`, the `saveResponseResource` body check, and via `tooLarge` in the Fetch interceptor) now go through `shouldSkipForSize`.
- The log message (`- Skipping resource larger than 25MB`) and the structured `asset_not_uploaded / resource_too_large` instrumentation are preserved, so downstream parsers and existing tests are unaffected.

`packages/core/src/discovery.js`
- In the `PERCY_GZIP` loop, after gzipping each resource, drop it if the gzipped bytes still exceed `MAX_RESOURCE_SIZE`. This preserves the original guarantee on the upload payload (oversized resources cannot reach the server) and surfaces the rejection in CLI logs at the gzip stage instead of at capture time.

Tests added (`packages/core/test/discovery.test.js`)
- `captures resource larger than 25MB raw when PERCY_GZIP is enabled`: 30 MB of `text/css` (`'A'.repeat(30_000_000)`) is captured into the snapshot, gzipped, and uploaded.
- `still skips resource above PERCY_GZIP raw-size ceiling`: a 90 MB resource is still rejected even with `PERCY_GZIP=true` (verifies the ceiling).
- Existing tests (`does not capture large files`, `checks if no header is send`, `does not capture remote files with content-length NAN…`, the three `Content-Length casing` cases, `skips file greater than 100MB`) continue to pass unchanged: `shouldSkipForSize` falls through to the original `MAX_RESOURCE_SIZE` check when `PERCY_GZIP` is unset.

Verification (live builds on prod Percy)
Same Python Playwright snapshot of https://www.cooperlighting.com/global, identical except for `PERCY_GZIP`:

| | Baseline (no `PERCY_GZIP`), build 50211556 | Fix (`PERCY_GZIP=true`), build 50211623 |
|---|---|---|
| `clientlib-main.min.ACSHASH<hash>.css` | `[ASSET_NOT_UPLOADED] Resource too large` → skipped | `Processing resource: …/clientlib-main.min.ACSHASH<hash>.css` ✅ |

Discussion / alignment in the Slack thread above with Ninad. cc @rishigupta1599 (author of the original `c0cafd33` / PR #1803 that tightened the cap to account for base64 inflation), @rahul-barnwal, @aryan2611.

🤖 Generated with Claude Code