fix: prevent tableFromIPC from hanging on corrupted files#378
fix: prevent tableFromIPC from hanging on corrupted files#378Divyanshu-s13 wants to merge 2 commits intoapache:mainfrom
Conversation
|
@kou I’ve fixed the issue. Could you please merge it? |
|
Why do we need max size check for this fix? |
There was a problem hiding this comment.
Pull request overview
This pull request fixes a critical issue where tableFromIPC() could hang indefinitely when reading corrupted IPC files containing invalid metadata or body length values. The fix adds validation to reject unreasonably large values (>256MB for metadata, >2GB for body), throwing clear error messages instead of hanging.
Changes:
- Added MAX_METADATA_LENGTH (256MB) and MAX_BODY_LENGTH (2GB) constants as safeguards
- Implemented validation checks in both MessageReader and AsyncMessageReader classes
- Added early error throwing with descriptive messages indicating potential file corruption
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/ipc/message.ts
Outdated
| if (len < 0 || len > MAX_METADATA_LENGTH) { | ||
| throw new Error(`Invalid metadata length ${len}. The file may be corrupted.`); | ||
| } |
There was a problem hiding this comment.
The new validation logic for MAX_METADATA_LENGTH lacks test coverage. Consider adding test cases that verify the error is thrown when metadata length exceeds MAX_METADATA_LENGTH (256MB) or is negative (excluding the valid -1 continuation indicator), and that valid metadata lengths still work correctly.
src/ipc/message.ts
Outdated
| const bb = buf && new ByteBuffer(buf); | ||
| const len = bb?.readInt32(0) || 0; | ||
| // Guard against corrupted files with unreasonable metadata lengths | ||
| if (len < 0 || len > MAX_METADATA_LENGTH) { |
There was a problem hiding this comment.
The validation condition len < 0 will break the continuation indicator logic used for Arrow IPC format v0.15+. According to the comment on lines 56-58 (and 132-134 for async), a value of -1 is a valid continuation indicator per ARROW-6313. This validation should exclude -1 from being treated as an error. Consider changing the condition to len < -1 or (len < 0 && len !== -1).
| if (len < 0 || len > MAX_METADATA_LENGTH) { | |
| if (len < -1 || len > MAX_METADATA_LENGTH) { |
src/ipc/message.ts
Outdated
| const bb = buf && new ByteBuffer(buf); | ||
| const len = bb?.readInt32(0) || 0; | ||
| // Guard against corrupted files with unreasonable metadata lengths | ||
| if (len < 0 || len > MAX_METADATA_LENGTH) { |
There was a problem hiding this comment.
The validation condition len < 0 will break the continuation indicator logic used for Arrow IPC format v0.15+. According to the comment on lines 132-134, a value of -1 is a valid continuation indicator per ARROW-6313. This validation should exclude -1 from being treated as an error. Consider changing the condition to len < -1 or (len < 0 && len !== -1).
| if (len < 0 || len > MAX_METADATA_LENGTH) { | |
| if ((len < 0 && len !== -1) || len > MAX_METADATA_LENGTH) { |
src/ipc/message.ts
Outdated
| if (bodyLength > MAX_BODY_LENGTH) { | ||
| throw new Error(`Message body length ${bodyLength} exceeds maximum allowed size of ${MAX_BODY_LENGTH} bytes. The file may be corrupted.`); | ||
| } |
There was a problem hiding this comment.
The new validation logic for MAX_BODY_LENGTH lacks test coverage. Consider adding test cases that verify the error is thrown when bodyLength exceeds MAX_BODY_LENGTH (2GB), and that valid body lengths just under the limit still work correctly. This is especially important given that this fix addresses a critical hanging issue reported in issue #315.
src/ipc/message.ts
Outdated
| if (len < 0 || len > MAX_METADATA_LENGTH) { | ||
| throw new Error(`Invalid metadata length ${len}. The file may be corrupted.`); | ||
| } |
There was a problem hiding this comment.
The new validation logic for MAX_METADATA_LENGTH lacks test coverage. Consider adding test cases that verify the error is thrown when metadata length exceeds MAX_METADATA_LENGTH (256MB) or is negative (excluding the valid -1 continuation indicator), and that valid metadata lengths still work correctly.
src/ipc/message.ts
Outdated
| if (bodyLength > MAX_BODY_LENGTH) { | ||
| throw new Error(`Message body length ${bodyLength} exceeds maximum allowed size of ${MAX_BODY_LENGTH} bytes. The file may be corrupted.`); | ||
| } |
There was a problem hiding this comment.
The new validation logic for MAX_BODY_LENGTH lacks test coverage. Consider adding test cases that verify the error is thrown when bodyLength exceeds MAX_BODY_LENGTH (2GB), and that valid body lengths just under the limit still work correctly.
src/ipc/message.ts
Outdated
| /** | ||
| * Maximum allowed metadata length (256 MB). This is a safeguard against corrupted | ||
| * files that could cause the reader to hang or consume excessive memory. | ||
| * @ignore | ||
| */ | ||
| const MAX_METADATA_LENGTH = 256 * 1024 * 1024; | ||
|
|
||
| /** | ||
| * Maximum allowed message body length (2 GB). This is a safeguard against corrupted | ||
| * files that could cause the reader to hang or consume excessive memory. | ||
| * @ignore | ||
| */ | ||
| const MAX_BODY_LENGTH = 2 * 1024 * 1024 * 1024; | ||
|
|
There was a problem hiding this comment.
Yeah these are wrong, Arrow tables can be larger than this. It'd be better to figure out why your IPC stream is corrupted when it's written.
There was a problem hiding this comment.
Yeah these are wrong, Arrow tables can be larger than this. It'd be better to figure out why your IPC stream is corrupted when it's written.
Thank you for the feedback! You're right - I've reverted the size limits as they were too restrictive for valid Arrow files.
I have fixed this. |
What's Changed
Fixed an issue where [tableFromIPC()] could hang indefinitely when reading corrupted IPC files. The problem occurred when a corrupted file contained invalid metadata length or body length values, causing the reader to wait forever trying to read bytes that don't exist.
Added validation to reject unreasonably large metadata lengths (>256MB) and body lengths (>2GB), throwing a clear error message instead of hanging.
Closes #315.