chore(rrweb): Replace fast-mhtml with inline MHTML parser #274

Open
chargome wants to merge 2 commits into sentry-v2 from chargome/chore/remove-fast-mhtml

chargome commented Mar 26, 2026

Remove the fast-mhtml dependency, which was only used in one test utility function (packages/rrweb/test/utils.ts) to parse MHTML snapshots in replayer E2E tests.

Replace it with a minimal inline parser (~30 lines) that handles multipart MIME boundary splitting and quoted-printable content decoding. All 47 replayer tests pass.
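
The boundary-splitting step described above can be sketched roughly as follows. This is not the code from this PR: `MhtmlPart` and `splitMhtmlParts` are hypothetical names, and the sketch assumes a well-formed file whose boundary is declared in the top-level Content-Type header. Quoted-printable decoding of each body would be a separate step.

```typescript
// Hypothetical sketch of multipart boundary splitting; not the PR's parser.
interface MhtmlPart {
  headers: Record<string, string>; // header names are lower-cased
  body: string; // still transfer-encoded (e.g. quoted-printable)
}

function splitMhtmlParts(mhtml: string): MhtmlPart[] {
  // The boundary token is declared in the top-level Content-Type header.
  const match = /boundary="?([^";\r\n]+)"?/i.exec(mhtml);
  if (!match) return [];
  const delimiter = `--${match[1]}`;

  return mhtml
    .split(delimiter)
    .slice(1) // drop the top-level headers before the first delimiter
    .filter((chunk) => !chunk.startsWith('--')) // drop what follows the closing delimiter
    .map((chunk) => {
      const normalized = chunk.replace(/\r\n/g, '\n').replace(/^\n/, '');
      // A blank line separates a part's headers from its body.
      const sep = normalized.indexOf('\n\n');
      const rawHeaders = sep === -1 ? normalized : normalized.slice(0, sep);
      const body = sep === -1 ? '' : normalized.slice(sep + 2);

      const headers: Record<string, string> = {};
      for (const line of rawHeaders.split('\n')) {
        const colon = line.indexOf(':');
        if (colon > 0) {
          headers[line.slice(0, colon).trim().toLowerCase()] = line
            .slice(colon + 1)
            .trim();
        }
      }
      return { headers, body: body.replace(/\n$/, '') };
    });
}
```

The sketch does not handle folded (multi-line) headers or nested multipart bodies, which a test helper for browser-generated snapshots may never encounter.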

fast-mhtml pulled in cheerio, express, undici, qs, bluebird, and cookie, a large transitive dependency tree for a simple test helper. Removing it drops ~584 lines from yarn.lock.

Dependabot alerts resolved

Fully resolved (vulnerable package completely removed from lockfile):

| Alert | Severity | Package | Summary |
| --- | --- | --- | --- |
| #166 | MEDIUM | qs | arrayLimit bypass allows DoS via memory exhaustion |
| #183 | LOW | qs | arrayLimit bypass in comma parsing allows DoS |

Partially resolved (some entries removed, but package still exists via other dependency chains):

| Alert | Severity | Package | Remaining source |
| --- | --- | --- | --- |
| #225, #224, #223, #222, #221, #170, #130, #112 | HIGH/MEDIUM/LOW | undici | Still pulled in by puppeteer (Phase 3) |
| #100 | LOW | cookie | Still pulled in by @sveltejs/kit (Phase 5) |

closes https://linear.app/getsentry/issue/SDK-1097/replace-fast-mhtml-9-alerts

chargome and others added 2 commits March 26, 2026 15:29
Remove the fast-mhtml dependency which was only used in one test utility
function for parsing MHTML snapshots. Replace with a minimal inline
parser (~30 lines) that handles multipart MIME boundary splitting and
quoted-printable decoding.

fast-mhtml pulled in cheerio, express, undici, qs, and bluebird — a
massive transitive tree for a simple test helper. This removes ~584
lines from yarn.lock.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

linear-code bot commented Mar 27, 2026

@chargome chargome marked this pull request as ready for review March 27, 2026 09:30

cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

}
if (filename.includes('frame')) {
  content = format(content, { parser: 'html' });
}

Frame formatting condition is now dead code

Low Severity

The filename.includes('frame') check is now dead code. The old fast-mhtml library generated internal names starting with "frame" for frame parts, producing rewritten names like file-frame-4. The new parseMhtml only extracts Content-Location headers — frame parts either lack this header (yielding empty filenames → file-null-0) or have URL-based values (yielding file-http-N). Neither ever contains "frame", so format(content, { parser: 'html' }) is never called. The updated snapshots confirm this: previously pretty-printed HTML is now single-line.
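
One possible direction, sketched here under assumptions rather than taken from this PR, is to decide whether to pretty-print a part from its Content-Type header instead of its derived filename. `isHtmlPart` is a hypothetical helper, and it assumes the parser keeps each part's headers in a map with lower-cased names.

```typescript
// Hypothetical helper: classify a part by its own headers, not its filename.
function isHtmlPart(headers: Record<string, string>): boolean {
  const contentType = headers['content-type'] ?? '';
  // Compare only the media type; parameters such as charset are ignored.
  return contentType.split(';')[0].trim().toLowerCase() === 'text/html';
}
```

Note that this would match every HTML part, not only frames, so it may format more parts than the old filename check did; whether that is acceptable depends on what the snapshots are meant to capture.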

    .replace(/=([0-9A-Fa-f]{2})/g, (_, hex) =>
      String.fromCharCode(parseInt(hex, 16)),
    );
}

Quoted-printable decoder mishandles multi-byte UTF-8 sequences

Low Severity

decodeQuotedPrintable uses String.fromCharCode(parseInt(hex, 16)), which treats each encoded byte as an independent character. For multi-byte UTF-8 sequences (e.g., =C3=A9 for "é"), this produces two incorrect Latin-1 characters ("Ã©") instead of one correct Unicode character. Current tests use ASCII-only content so this doesn't surface, but any future test with non-ASCII text in MHTML snapshots would decode incorrectly.
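
A byte-accurate alternative can be sketched as follows: collect the raw bytes first, then decode them as UTF-8 in a single pass. This is an illustration, not the PR's implementation. `decodeQuotedPrintableUtf8` is a hypothetical name, and the sketch assumes the part's charset is UTF-8 and that all unencoded characters are ASCII.

```typescript
// Hypothetical sketch: quoted-printable decoding that preserves multi-byte UTF-8.
function decodeQuotedPrintableUtf8(input: string): string {
  // Soft line breaks ("=" at end of line) join lines and carry no data.
  const joined = input.replace(/=\r?\n/g, '');
  const bytes: number[] = [];
  for (let i = 0; i < joined.length; i++) {
    const hex = joined.slice(i + 1, i + 3);
    if (joined[i] === '=' && /^[0-9A-Fa-f]{2}$/.test(hex)) {
      // "=XX" encodes exactly one byte.
      bytes.push(parseInt(hex, 16));
      i += 2;
    } else {
      // Assumption: literal characters are ASCII, so one char is one byte.
      bytes.push(joined.charCodeAt(i) & 0xff);
    }
  }
  // Decode the whole byte sequence at once so multi-byte characters stay intact.
  return new TextDecoder('utf-8').decode(new Uint8Array(bytes));
}
```

With this approach, `decodeQuotedPrintableUtf8('caf=C3=A9')` yields "café", because the two bytes 0xC3 0xA9 are decoded together rather than one at a time.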
