fix: cap format parser partial buffers to prevent OOM#402
Conversation
Found by CodeRabbit in PR #375. JsonParser, RawParser, and CriParser partial buffers previously grew without limit until a newline arrived, risking memory exhaustion from a single malformed or extremely long line. This change: 1. Adds `max_line_size` to `JsonParser` and `RawParser` constructors. 2. Caps `partial` buffer growth in `process()` for all parsers. 3. Discards data exceeding `max_line_size` until the next newline. 4. Updates `CriParser` to also cap its outer partial buffer. 5. Sets a default 2MB limit in the production pipeline. 6. Updates tests and fuzzers to accommodate the new constructor signatures. Co-authored-by: strawgate <6384545+strawgate@users.noreply.github.com>
|
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (3)
WalkthroughParsers (JsonParser, RawParser, CriParser) were changed to accept and store a Possibly related PRs
🚥 Pre-merge checks | ✅ 2✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
crates/logfwd-io/src/format.rs (2)
49-79:⚠️ Potential issue | 🟠 MajorSilent truncation produces invalid JSON without signaling errors.
When a line exceeds
max_line_size, truncation yields malformed JSON (e.g.,{"key":"truncated_valwith no closing brace/quote). The method always returnsparse_errors = 0, so callers have no indication data was corrupted. The downstream scanner will fail on these lines.Consider either:
- Discard over-limit lines entirely and increment a truncation/error counter.
- Track truncation state and skip emitting the incomplete line.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/logfwd-io/src/format.rs` around lines 49 - 79, The process method silently truncates lines when a chunk or accumulated self.partial exceeds self.max_line_size and still returns (count, 0), producing malformed JSON without signaling errors; update process (and use self.partial and max_line_size) to detect when truncation/overflow occurs and avoid emitting the incomplete/truncated line: either drop the over-limit line instead of pushing it to out and increment the parse_errors return value, or track a truncation flag/state so you skip emitting the partial buffer and return a non-zero parse_errors count so callers know data was corrupted. Ensure the updated return uses the second tuple field to reflect the number of truncations/errors and that any emitted lines are only complete, non-truncated JSON.
131-134: 🧹 Nitpick | 🔵 Trivial
format_args!in byte loop for control characters.This allocates on each control character
< 0x20. While rare in typical logs, adversarial input could trigger many allocations. Pre-computed hex lookup table would avoid this.Suggestion: static hex lookup
const HEX: &[u8; 16] = b"0123456789abcdef"; // ... b if b < 0x20 => { out.extend_from_slice(b"\\u00"); out.push(HEX[(b >> 4) as usize]); out.push(HEX[(b & 0xf) as usize]); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/logfwd-io/src/format.rs` around lines 131 - 134, The current control-character branch uses std::io::Write::write_fmt with format_args! (write_fmt(out, format_args!("\\u{:04x}", b))) which allocates per character; replace it with a non-allocating static hex table and direct byte writes: add a static const HEX: &[u8;16] = b"0123456789abcdef"; then in the b if b < 0x20 branch write the four-byte sequence "\\u00" followed by HEX[(b >> 4) as usize] and HEX[(b & 0xF) as usize] to out (e.g., via out.extend_from_slice(b"\\u00") and out.push(...)), removing the format_args! / write_fmt call and using the existing out and b variables.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/logfwd/src/pipeline.rs`:
- Around line 637-643: The magic number 2 * 1024 * 1024 used in
build_format_parser for CriParser::new, RawParser::new, and JsonParser::new
should be extracted to a named constant (e.g., DEFAULT_PARSER_BUF_SIZE or
PARSER_CAPACITY) declared near the top of the module; replace the three
occurrences with that constant to centralize the default buffer size and improve
readability, keeping the same numeric value and visibility (pub if needed
elsewhere).
---
Outside diff comments:
In `@crates/logfwd-io/src/format.rs`:
- Around line 49-79: The process method silently truncates lines when a chunk or
accumulated self.partial exceeds self.max_line_size and still returns (count,
0), producing malformed JSON without signaling errors; update process (and use
self.partial and max_line_size) to detect when truncation/overflow occurs and
avoid emitting the incomplete/truncated line: either drop the over-limit line
instead of pushing it to out and increment the parse_errors return value, or
track a truncation flag/state so you skip emitting the partial buffer and return
a non-zero parse_errors count so callers know data was corrupted. Ensure the
updated return uses the second tuple field to reflect the number of
truncations/errors and that any emitted lines are only complete, non-truncated
JSON.
- Around line 131-134: The current control-character branch uses
std::io::Write::write_fmt with format_args! (write_fmt(out,
format_args!("\\u{:04x}", b))) which allocates per character; replace it with a
non-allocating static hex table and direct byte writes: add a static const HEX:
&[u8;16] = b"0123456789abcdef"; then in the b if b < 0x20 branch write the
four-byte sequence "\\u00" followed by HEX[(b >> 4) as usize] and HEX[(b & 0xF)
as usize] to out (e.g., via out.extend_from_slice(b"\\u00") and out.push(...)),
removing the format_args! / write_fmt call and using the existing out and b
variables.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 0bfb4ef3-e46b-4cfc-998a-13843fa9257d
📒 Files selected for processing (4)
crates/logfwd-core/fuzz/fuzz_targets/format_parser.rscrates/logfwd-core/tests/compliance_data.rscrates/logfwd-io/src/format.rscrates/logfwd/src/pipeline.rs
Found by CodeRabbit in PR #375. JsonParser, RawParser, and CriParser partial buffers previously grew without limit until a newline arrived, risking memory exhaustion from a single malformed or extremely long line. This change: 1. Adds `max_line_size` to `JsonParser` and `RawParser` constructors. 2. Caps `partial` buffer growth in `process()` for all parsers. 3. Discards data exceeding `max_line_size` until the next newline. 4. Updates `CriParser` to also cap its outer partial buffer. 5. Sets a default 2MB limit in the production pipeline. 6. Updates tests and fuzzers to accommodate the new constructor signatures. 7. Fixed code formatting issues found in CI. Co-authored-by: strawgate <6384545+strawgate@users.noreply.github.com>
There was a problem hiding this comment.
There is one blocking regression to address before merge.
crates/logfwd-io/src/format.rs (around lines 205–211): in CriParser::process, when self.partial is already at limit and the next chunk begins with \n, to_add becomes 0, process_cri_to_buf(&self.partial, ...) receives a buffer with no newline, returns (0, 0), and self.partial.clear() drops the oversized pending line silently with no parse error.
Please preserve/error-account for that terminated over-limit line (e.g., track overflow state and increment parse_errors when newline arrives), and add a regression test for: oversized no-newline chunk -> newline in next chunk -> valid line.
Note
🔒 Integrity filtering filtered 1 item
Integrity filtering activated and filtered the following item during workflow execution.
This happens when a tool call accesses a resource that does not meet the required integrity or secrecy level of the workflow.
- pr:#402 (
pull_request_read: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".)
What is this? | From workflow: AI: PR Review
Give us feedback! React with 🚀 if perfect, 👍 if helpful, 👎 if not.
|
Addressed the unresolved review feedback and pushed a follow-up commit.
Validation
Note 🔒 Integrity filtering filtered 1 itemIntegrity filtering activated and filtered the following item during workflow execution.
What is this? | From workflow: AI: Address PR Review Feedback Give us feedback! React with 🚀 if perfect, 👍 if helpful, 👎 if not. |
This PR fixes a potential OOM vulnerability where format parsers (
JsonParser,RawParser,CriParser) would buffer data indefinitely if no newline was encountered.Key changes:
JsonParserandRawParsernow accept amax_line_sizeparameter.processmethod for these parsers now enforces this limit, truncating lines that exceed it.CriParser's outer partial buffer is now also capped (with a small overhead allowance for the CRI envelope).crates/logfwd-io/src/format.rsto verify the fix.Fixes #384
PR created automatically by Jules for task 6343512594639654685 started by @strawgate