fix: cap format parser partial buffers to prevent OOM#402

Closed
strawgate wants to merge 2 commits into master from fix/format-parser-unbounded-growth-6343512594639654685

Conversation

@strawgate

Owner

This PR fixes a potential OOM vulnerability where format parsers (JsonParser, RawParser, CriParser) would buffer data indefinitely if no newline was encountered.

Key changes:

  • JsonParser and RawParser now accept a max_line_size parameter.
  • The process method for these parsers now enforces this limit, truncating lines that exceed it.
  • CriParser's outer partial buffer is now also capped (with a small overhead allowance for the CRI envelope).
  • Default limits of 2 MiB (2 * 1024 * 1024 bytes) are applied in the pipeline configuration.
  • Tests and fuzzers have been updated and new reproduction tests were added to crates/logfwd-io/src/format.rs to verify the fix.
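
As a rough illustration of the capping behaviour listed above, here is a minimal sketch of a line buffer that stops growing once it holds `max_line_size` bytes. `CappedLineBuffer` is a hypothetical name, and the sketch assumes the excess bytes are simply dropped until the next newline; the real parsers in crates/logfwd-io/src/format.rs may differ in detail.

```rust
/// Hypothetical stand-in for a capped partial buffer; not the PR's actual code.
struct CappedLineBuffer {
    partial: Vec<u8>,
    max_line_size: usize,
}

impl CappedLineBuffer {
    fn new(max_line_size: usize) -> Self {
        Self { partial: Vec::new(), max_line_size }
    }

    /// Feeds one chunk and appends every completed line (truncated to
    /// `max_line_size` bytes) to `out`. Returns the number of lines emitted.
    fn process(&mut self, chunk: &[u8], out: &mut Vec<Vec<u8>>) -> usize {
        let mut emitted = 0;
        for segment in chunk.split_inclusive(|&b| b == b'\n') {
            // Separate the line body from its terminating newline, if present.
            let (body, complete) = match segment.split_last() {
                Some((&b'\n', rest)) => (rest, true),
                _ => (segment, false),
            };
            // Copy only as many bytes as still fit under the cap; drop the rest.
            let room = self.max_line_size.saturating_sub(self.partial.len());
            let take = body.len().min(room);
            self.partial.extend_from_slice(&body[..take]);
            if complete {
                out.push(std::mem::take(&mut self.partial));
                emitted += 1;
            }
        }
        emitted
    }
}
```

With this shape, a stream that never contains a newline can hold at most `max_line_size` bytes in `partial`, which is the property the PR is after.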

Fixes #384


PR created automatically by Jules for task 6343512594639654685 started by @strawgate

Found by CodeRabbit in PR #375. JsonParser, RawParser, and CriParser
partial buffers previously grew without limit until a newline arrived,
risking memory exhaustion from a single malformed or extremely long line.

This change:
1. Adds `max_line_size` to `JsonParser` and `RawParser` constructors.
2. Caps `partial` buffer growth in `process()` for all parsers.
3. Discards data exceeding `max_line_size` until the next newline.
4. Updates `CriParser` to also cap its outer partial buffer.
5. Sets a default 2MB limit in the production pipeline.
6. Updates tests and fuzzers to accommodate the new constructor signatures.

Co-authored-by: strawgate <6384545+strawgate@users.noreply.github.com>
@google-labs-jules

Contributor

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

@coderabbitai

coderabbitai Bot commented Mar 31, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8dc20a1a-3e87-4ab3-9847-32424a51e854

📥 Commits

Reviewing files that changed from the base of the PR and between 03a5fa7 and 1c36b05.

📒 Files selected for processing (3)
  • crates/logfwd-core/src/structural.rs
  • crates/logfwd-io/src/diagnostics.rs
  • crates/logfwd-io/src/format.rs

Walkthrough

Parsers (JsonParser, RawParser, CriParser) were changed to accept and store a max_line_size parameter and enforce bounded growth of their partial buffers during process(). JsonParser/RawParser truncate incoming segments to max_line_size (respecting existing partial content), and CriParser applies an outer cap of max_line_size + 128 while processing every newline-terminated chunk immediately. Call sites in pipeline, fuzz targets, and tests were updated to pass explicit sizes (examples: 64 KiB for fuzz, 2 MiB for runtime). Tests added to assert capped partial growth.


🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Linked Issues check | ✅ Passed | The PR successfully implements all required changes from #384: max_line_size parameters added to JsonParser/RawParser, partial buffers capped in process(), CriParser's outer buffer limited, 2MB defaults applied, and tests/fuzzers updated. |
| Out of Scope Changes check | ✅ Passed | Minor formatting changes in structural.rs and diagnostics.rs (indentation, import reordering) are tangential but acceptable as cleanup during active refactoring. |


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
crates/logfwd-io/src/format.rs (2)

49-79: ⚠️ Potential issue | 🟠 Major

Silent truncation produces invalid JSON without signaling errors.

When a line exceeds max_line_size, truncation yields malformed JSON (e.g., {"key":"truncated_val with no closing brace/quote). The method always returns parse_errors = 0, so callers have no indication data was corrupted. The downstream scanner will fail on these lines.

Consider either:

  1. Discard over-limit lines entirely and increment a truncation/error counter.
  2. Track truncation state and skip emitting the incomplete line.
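
The first option could look roughly like the sketch below. `emit_within_limit` is a hypothetical free function, not part of the codebase, and for brevity it ignores cross-chunk buffering: it only shows complete lines being either emitted untouched or dropped and counted in the second tuple field.

```rust
/// Hypothetical helper: splits `chunk` into newline-terminated lines, emits the
/// ones that fit within `max_line_size`, and counts the rest as errors.
/// Returns `(lines_emitted, lines_dropped)`.
fn emit_within_limit(chunk: &[u8], max_line_size: usize, out: &mut Vec<u8>) -> (usize, usize) {
    let mut emitted = 0;
    let mut dropped = 0;
    for line in chunk.split_inclusive(|&b| b == b'\n') {
        // An unterminated tail would be buffered by a real parser; this sketch skips it.
        let Some(body) = line.strip_suffix(b"\n") else { continue };
        if body.len() > max_line_size {
            // Never emit a truncated, syntactically invalid JSON fragment.
            dropped += 1;
        } else {
            out.extend_from_slice(line);
            emitted += 1;
        }
    }
    (emitted, dropped)
}
```

Callers that previously saw `parse_errors = 0` would then see a non-zero count whenever a line was discarded for exceeding the limit.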
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/logfwd-io/src/format.rs` around lines 49 - 79, The process method
silently truncates lines when a chunk or accumulated self.partial exceeds
self.max_line_size and still returns (count, 0), producing malformed JSON
without signaling errors; update process (and use self.partial and
max_line_size) to detect when truncation/overflow occurs and avoid emitting the
incomplete/truncated line: either drop the over-limit line instead of pushing it
to out and increment the parse_errors return value, or track a truncation
flag/state so you skip emitting the partial buffer and return a non-zero
parse_errors count so callers know data was corrupted. Ensure the updated return
uses the second tuple field to reflect the number of truncations/errors and that
any emitted lines are only complete, non-truncated JSON.

131-134: 🧹 Nitpick | 🔵 Trivial

format_args! in byte loop for control characters.

This allocates on each control character < 0x20. While rare in typical logs, adversarial input could trigger many allocations. A precomputed hex lookup table would avoid this.

Suggestion: static hex lookup
const HEX: &[u8; 16] = b"0123456789abcdef";
// ...
b if b < 0x20 => {
    out.extend_from_slice(b"\\u00");
    out.push(HEX[(b >> 4) as usize]);
    out.push(HEX[(b & 0xf) as usize]);
}
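
For context, the suggested match arm can be placed in a complete function as in the sketch below. `escape_control_bytes` is a hypothetical name, and the sketch only handles bytes below 0x20; escaping of quotes and backslashes, which a full JSON string escaper also needs, is left out.

```rust
const HEX: &[u8; 16] = b"0123456789abcdef";

/// Hypothetical helper: appends `input` to `out`, writing each byte below 0x20
/// as a six-byte `\u00XX` escape built from the lookup table.
fn escape_control_bytes(input: &[u8], out: &mut Vec<u8>) {
    for &b in input {
        match b {
            b if b < 0x20 => {
                out.extend_from_slice(b"\\u00");
                out.push(HEX[(b >> 4) as usize]); // high nibble
                out.push(HEX[(b & 0x0f) as usize]); // low nibble
            }
            b => out.push(b),
        }
    }
}
```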
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/logfwd-io/src/format.rs` around lines 131 - 134, The current
control-character branch uses std::io::Write::write_fmt with format_args!
(write_fmt(out, format_args!("\\u{:04x}", b))) which allocates per character;
replace it with a non-allocating static hex table and direct byte writes: add a
static const HEX: &[u8;16] = b"0123456789abcdef"; then in the b if b < 0x20
branch write the four-byte sequence "\\u00" followed by HEX[(b >> 4) as usize]
and HEX[(b & 0xF) as usize] to out (e.g., via out.extend_from_slice(b"\\u00")
and out.push(...)), removing the format_args! / write_fmt call and using the
existing out and b variables.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/logfwd/src/pipeline.rs`:
- Around line 637-643: The magic number 2 * 1024 * 1024 used in
build_format_parser for CriParser::new, RawParser::new, and JsonParser::new
should be extracted to a named constant (e.g., DEFAULT_PARSER_BUF_SIZE or
PARSER_CAPACITY) declared near the top of the module; replace the three
occurrences with that constant to centralize the default buffer size and improve
readability, keeping the same numeric value and visibility (pub if needed
elsewhere).
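
A minimal sketch of that extraction follows. The constant name `DEFAULT_MAX_LINE_SIZE` is the one reported later in this thread; the `FormatParser` enum and the signature of `build_format_parser` are simplified assumptions made so the example compiles on its own, not the pipeline's real types.

```rust
/// Default cap on a single buffered line: 2 MiB.
const DEFAULT_MAX_LINE_SIZE: usize = 2 * 1024 * 1024;

/// Hypothetical stand-ins for the three parsers; only the size parameter matters here.
#[derive(Debug, PartialEq)]
enum FormatParser {
    Cri { max_line_size: usize },
    Raw { max_line_size: usize },
    Json { max_line_size: usize },
}

/// Assumed signature: picks a parser by format name, always using the shared default.
fn build_format_parser(format: &str) -> Option<FormatParser> {
    let max_line_size = DEFAULT_MAX_LINE_SIZE;
    match format {
        "cri" => Some(FormatParser::Cri { max_line_size }),
        "raw" => Some(FormatParser::Raw { max_line_size }),
        "json" => Some(FormatParser::Json { max_line_size }),
        _ => None,
    }
}
```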


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0bfb4ef3-e46b-4cfc-998a-13843fa9257d

📥 Commits

Reviewing files that changed from the base of the PR and between af06963 and 03a5fa7.

📒 Files selected for processing (4)
  • crates/logfwd-core/fuzz/fuzz_targets/format_parser.rs
  • crates/logfwd-core/tests/compliance_data.rs
  • crates/logfwd-io/src/format.rs
  • crates/logfwd/src/pipeline.rs

Comment thread crates/logfwd/src/pipeline.rs
Found by CodeRabbit in PR #375. JsonParser, RawParser, and CriParser
partial buffers previously grew without limit until a newline arrived,
risking memory exhaustion from a single malformed or extremely long line.

This change:
1. Adds `max_line_size` to `JsonParser` and `RawParser` constructors.
2. Caps `partial` buffer growth in `process()` for all parsers.
3. Discards data exceeding `max_line_size` until the next newline.
4. Updates `CriParser` to also cap its outer partial buffer.
5. Sets a default 2MB limit in the production pipeline.
6. Updates tests and fuzzers to accommodate the new constructor signatures.
7. Fixed code formatting issues found in CI.

Co-authored-by: strawgate <6384545+strawgate@users.noreply.github.com>
@strawgate

Owner Author

Conflicts with #399 which removes format parsers from the pipeline. The OOM fix for format.rs is still valid for test/fuzz consumers — will file a follow-up issue. The pipeline's remainder buffer in #399 should also have a size cap — adding that now.

@strawgate strawgate closed this Mar 31, 2026

@github-actions github-actions Bot left a comment

Contributor


There is one blocking regression to address before merge.

crates/logfwd-io/src/format.rs (around lines 205–211): in CriParser::process, when self.partial is already at limit and the next chunk begins with \n, to_add becomes 0, process_cri_to_buf(&self.partial, ...) receives a buffer with no newline, returns (0, 0), and self.partial.clear() drops the oversized pending line silently with no parse error.

Please preserve/error-account for that terminated over-limit line (e.g., track overflow state and increment parse_errors when newline arrives), and add a regression test for: oversized no-newline chunk -> newline in next chunk -> valid line.
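
One way to track that overflow state is sketched below. `OverflowTrackingParser` is a hypothetical type that handles plain newline-delimited lines rather than CRI envelopes, and it assumes an over-limit line is dropped and reported as one parse error when its newline finally arrives.

```rust
/// Hypothetical parser that remembers an overflow until the terminating newline.
struct OverflowTrackingParser {
    partial: Vec<u8>,
    max_line_size: usize,
    overflowed: bool,
}

impl OverflowTrackingParser {
    fn new(max_line_size: usize) -> Self {
        Self { partial: Vec::new(), max_line_size, overflowed: false }
    }

    /// Returns `(lines_emitted, parse_errors)` for this chunk.
    fn process(&mut self, chunk: &[u8], out: &mut Vec<Vec<u8>>) -> (usize, usize) {
        let (mut lines, mut errors) = (0, 0);
        for segment in chunk.split_inclusive(|&b| b == b'\n') {
            let (body, complete) = match segment.split_last() {
                Some((&b'\n', rest)) => (rest, true),
                _ => (segment, false),
            };
            if !self.overflowed {
                if self.partial.len() + body.len() > self.max_line_size {
                    // Remember the overflow and free the buffer; this line is lost.
                    self.overflowed = true;
                    self.partial.clear();
                } else {
                    self.partial.extend_from_slice(body);
                }
            }
            if complete {
                if self.overflowed {
                    // The over-limit line just ended: account for it as one error.
                    errors += 1;
                    self.overflowed = false;
                } else {
                    out.push(std::mem::take(&mut self.partial));
                    lines += 1;
                }
            }
        }
        (lines, errors)
    }
}
```

The requested regression sequence (oversized chunk without a newline, then a chunk starting with `\n`, then a valid line) maps onto two `process` calls: the first should return `(0, 0)` and the second `(1, 1)`.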

Note

🔒 Integrity filtering filtered 1 item

Integrity filtering activated and filtered the following item during workflow execution.
This happens when a tool call accesses a resource that does not meet the required integrity or secrecy level of the workflow.

  • pr:#402 (pull_request_read: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".)

What is this? | From workflow: AI: PR Review


@github-actions

Contributor

Addressed the unresolved review feedback and pushed a follow-up commit.

  • Resolved with code change: PRRT_kwDORzg8fM53_vyd (extracted repeated 2 * 1024 * 1024 into DEFAULT_MAX_LINE_SIZE and reused it in build_format_parser in crates/logfwd/src/pipeline.rs).
  • Replied instead of fixing: none.
Validation
  • just ci could not be run in this environment (just: command not found).
  • Full-repo cargo fmt --check && cargo clippy -- -D warnings && taplo check && cargo deny check && cargo test was attempted and stopped by pre-existing formatting diffs outside this PR change.
  • Focused crate checks were attempted with RUSTC_WRAPPER='' to bypass missing sccache; compilation for cargo test -p logfwd --no-run started successfully but exceeded practical runtime in this environment before completion.

Note

🔒 Integrity filtering filtered 1 item

Integrity filtering activated and filtered the following item during workflow execution.
This happens when a tool call accesses a resource that does not meet the required integrity or secrecy level of the workflow.


What is this? | From workflow: AI: Address PR Review Feedback



Development

Successfully merging this pull request may close these issues.

Fix: format parsers have unbounded partial buffer growth

1 participant