Add fuzz testing harness (3 targets, found scanner crash)#57
Conversation
WalkthroughA new fuzzing test suite is added under Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/logfwd-core/fuzz/fuzz_targets/format_parser.rs`:
- Around line 52-60: The output-size invariant is only checked after the final
parser run because `out` is reused and cleared; move or duplicate the size
assertion to run immediately after each parser invocation (i.e., after each time
`out` is populated following calls that exercise `JsonParser`, `RawParser`, and
the CRI invocation) so that amplification in any individual parser is caught;
reference the `out` buffer, the existing assert that compares `out.len()` to
`data.len() * 20 + 4096`, and the places where `out.clear()` is called and each
parser is invoked, and perform the same length check right after each parser
produces output.
In `@crates/logfwd-core/fuzz/fuzz_targets/pipe_event.rs`:
- Line 37: The code unsafely creates a reference to a PipeEvent from
data.as_ptr() which can be misaligned and causes undefined behavior; instead
copy the struct out using std::ptr::read_unaligned (or
std::ptr::read_unaligned::<PipeEvent>(data.as_ptr() as *const PipeEvent)) into a
local PipeEvent value and then use that value (replace the unsafe reference
creation to event with a read_unaligned-based copy), ensuring you no longer
dereference a potentially misaligned pointer.
In `@crates/logfwd-core/fuzz/fuzz_targets/scanner.rs`:
- Around line 29-65: Extend the same RecordBatch invariants to the pushdown
path: after creating scanner2 and computing batch2, assert batch2.num_columns()
and batch2.num_rows() are non-negative and, if batch2.num_rows() > 0, iterate
0..batch2.num_columns() and assert batch2.column(i).len() == batch2.num_rows()
(use schema = batch2.schema() and include schema.field(i).name() in the
assertion message) so the field-pushdown path (Scanner::new, ScanConfig,
FieldSpec, scanner2, batch2) enforces the same column-length consistency as the
original batch checks.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: e74a1e91-36f4-4533-b1bf-c27b8468349f
⛔ Files ignored due to path filters (1)
crates/logfwd-core/fuzz/Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (14)
crates/logfwd-core/fuzz/.gitignorecrates/logfwd-core/fuzz/Cargo.tomlcrates/logfwd-core/fuzz/artifacts/fuzz_scanner/crash-47eef4b87ea82d35dc7da973c3391a3f15800a30crates/logfwd-core/fuzz/artifacts/fuzz_scanner/minimized-from-2d5b3aeca551c891189e4856971b929d4fefab9ecrates/logfwd-core/fuzz/artifacts/fuzz_scanner/minimized-from-47eef4b87ea82d35dc7da973c3391a3f15800a30crates/logfwd-core/fuzz/artifacts/fuzz_scanner/minimized-from-5258ff094acbab2d15a3be0c8f89be128454d4bccrates/logfwd-core/fuzz/artifacts/fuzz_scanner/minimized-from-878968522d930a45a393a40ae4b00fe2a2b95e1ccrates/logfwd-core/fuzz/artifacts/fuzz_scanner/minimized-from-ab479c7a0300377f0cdfa22164d1db0a0a326df2crates/logfwd-core/fuzz/artifacts/fuzz_scanner/minimized-from-aed4337a5c6da1ffddc9b3acc7c42a86377e2420crates/logfwd-core/fuzz/artifacts/fuzz_scanner/minimized-from-cb529b0dba4299132befe93a913113cb667acaa2crates/logfwd-core/fuzz/artifacts/fuzz_scanner/minimized-from-e7f9636f4b6c13489c3188335bd2e0a3d33f2a04crates/logfwd-core/fuzz/fuzz_targets/format_parser.rscrates/logfwd-core/fuzz/fuzz_targets/pipe_event.rscrates/logfwd-core/fuzz/fuzz_targets/scanner.rs
| } | ||
|
|
||
| // Simulate what userspace does: cast raw bytes to PipeEvent. | ||
| let event = unsafe { &*(data.as_ptr() as *const PipeEvent) }; |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Checking for direct '&*(data.as_ptr() as *const T)' patterns in fuzz targets..."
rg -nP --type rust '&\*\(\s*data\.as_ptr\(\)\s+as\s+\*const\s+[A-Za-z0-9_:]+' crates/logfwd-core/fuzz/fuzz_targets -C2 || true
echo
echo "Checking for any '&*(...as_ptr() as *const T)' patterns in fuzz targets..."
rg -nP --type rust '&\*\(\s*[^)]*as_ptr\(\)\s+as\s+\*const\s+[A-Za-z0-9_:]+' crates/logfwd-core/fuzz/fuzz_targets -C2 || trueRepository: strawgate/memagent
Length of output: 1131
🏁 Script executed:
cat crates/logfwd-core/fuzz/fuzz_targets/pipe_event.rsRepository: strawgate/memagent
Length of output: 1999
🏁 Script executed:
# Search for PipeEvent struct definition
fd -t f '\.rs$' crates/logfwd-core | xargs grep -l 'struct PipeEvent' | head -5Repository: strawgate/memagent
Length of output: 112
🏁 Script executed:
# Check PipeEvent definition and alignment
rg -n 'struct PipeEvent' crates/logfwd-core -A 10 -B 2Repository: strawgate/memagent
Length of output: 1011
Fix undefined behavior in raw-byte struct cast.
Line 37 creates a reference from data.as_ptr() without alignment guarantee. PipeEvent has #[repr(C)] with a u64 field (8-byte alignment), but libfuzzer's input is an arbitrary byte slice (1-byte aligned). Dereferencing a misaligned pointer is undefined behavior.
Use std::ptr::read_unaligned() to safely copy the value:
Fix
- let event = unsafe { &*(data.as_ptr() as *const PipeEvent) };
+ let event = unsafe { std::ptr::read_unaligned(data.as_ptr() as *const PipeEvent) };🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/logfwd-core/fuzz/fuzz_targets/pipe_event.rs` at line 37, The code
unsafely creates a reference to a PipeEvent from data.as_ptr() which can be
misaligned and causes undefined behavior; instead copy the struct out using
std::ptr::read_unaligned (or std::ptr::read_unaligned::<PipeEvent>(data.as_ptr()
as *const PipeEvent)) into a local PipeEvent value and then use that value
(replace the unsafe reference creation to event with a read_unaligned-based
copy), ensuring you no longer dereference a potentially misaligned pointer.
Fuzz targets (cargo-fuzz / libfuzzer): - fuzz_format_parser: all FormatParser impls, output size checked per run - fuzz_scanner: JSON scanner with column-length validation on both modes - fuzz_pipe_event: PipeEvent deserialization with safe aligned copy Scanner crash found: malformed JSON with binary keys causes column length mismatch panic (#56, likely fixed by SIMD parser rewrite #23). Run: cd crates/logfwd-core && cargo +nightly fuzz run <target> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
♻️ Duplicate comments (1)
crates/logfwd-core/fuzz/fuzz_targets/pipe_event.rs (1)
35-40:⚠️ Potential issue | 🔴 CriticalFix UB:
Vec<u8>pointer cast is still potentially misalignedLine 39 dereferences a
*const PipeEventfrombuf.as_ptr();Vec<u8>has byte alignment, so this can be misaligned and undefined behavior.🔧 Proposed fix
- // Safe deserialization: copy into aligned buffer instead of raw pointer cast - // (fuzz input may not be aligned to PipeEvent's alignment requirement). - let mut buf = vec![0u8; event_size]; - buf.copy_from_slice(&data[..event_size]); - let event = unsafe { &*(buf.as_ptr() as *const PipeEvent) }; + // Safe deserialization from potentially unaligned fuzz input. + let event = unsafe { std::ptr::read_unaligned(data.as_ptr() as *const PipeEvent) };Use this read-only check to confirm the risky pattern is removed:
#!/bin/bash set -euo pipefail echo "Looking for reference casts from u8 buffers to PipeEvent:" rg -nP --type rust '&\*\(\s*(buf|data)\.as_ptr\(\)\s+as\s+\*const\s+PipeEvent\s*\)' -C2 crates/logfwd-core/fuzz/fuzz_targets echo echo "Looking for unaligned reads to PipeEvent:" rg -nP --type rust 'read_unaligned\s*\(\s*(buf|data)\.as_ptr\(\)\s+as\s+\*const\s+PipeEvent\s*\)' -C2 crates/logfwd-core/fuzz/fuzz_targets/pipe_event.rs🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/logfwd-core/fuzz/fuzz_targets/pipe_event.rs` around lines 35 - 40, The current unsafe cast from buf.as_ptr() to *const PipeEvent can still be misaligned; replace the direct reference creation (the unsafe line creating event from buf.as_ptr() as *const PipeEvent) with a safe unaligned read: allocate a properly aligned local PipeEvent value (or use std::ptr::read_unaligned) and copy bytes from buf into that aligned instance before taking a reference, or use a safe byte-to-struct conversion helper (e.g. bytemuck::try_from_bytes or core::slice::align_to) to produce a &PipeEvent; update the code around buf, event_size and the created event so no pointer-cast from Vec<u8>::as_ptr() to *const PipeEvent remains.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@crates/logfwd-core/fuzz/fuzz_targets/pipe_event.rs`:
- Around line 35-40: The current unsafe cast from buf.as_ptr() to *const
PipeEvent can still be misaligned; replace the direct reference creation (the
unsafe line creating event from buf.as_ptr() as *const PipeEvent) with a safe
unaligned read: allocate a properly aligned local PipeEvent value (or use
std::ptr::read_unaligned) and copy bytes from buf into that aligned instance
before taking a reference, or use a safe byte-to-struct conversion helper (e.g.
bytemuck::try_from_bytes or core::slice::align_to) to produce a &PipeEvent;
update the code around buf, event_size and the created event so no pointer-cast
from Vec<u8>::as_ptr() to *const PipeEvent remains.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 35af5d7e-43fd-4cdd-9739-237dc28c72e6
⛔ Files ignored due to path filters (1)
crates/logfwd-core/fuzz/Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (14)
crates/logfwd-core/fuzz/.gitignorecrates/logfwd-core/fuzz/Cargo.tomlcrates/logfwd-core/fuzz/artifacts/fuzz_scanner/crash-47eef4b87ea82d35dc7da973c3391a3f15800a30crates/logfwd-core/fuzz/artifacts/fuzz_scanner/minimized-from-2d5b3aeca551c891189e4856971b929d4fefab9ecrates/logfwd-core/fuzz/artifacts/fuzz_scanner/minimized-from-47eef4b87ea82d35dc7da973c3391a3f15800a30crates/logfwd-core/fuzz/artifacts/fuzz_scanner/minimized-from-5258ff094acbab2d15a3be0c8f89be128454d4bccrates/logfwd-core/fuzz/artifacts/fuzz_scanner/minimized-from-878968522d930a45a393a40ae4b00fe2a2b95e1ccrates/logfwd-core/fuzz/artifacts/fuzz_scanner/minimized-from-ab479c7a0300377f0cdfa22164d1db0a0a326df2crates/logfwd-core/fuzz/artifacts/fuzz_scanner/minimized-from-aed4337a5c6da1ffddc9b3acc7c42a86377e2420crates/logfwd-core/fuzz/artifacts/fuzz_scanner/minimized-from-cb529b0dba4299132befe93a913113cb667acaa2crates/logfwd-core/fuzz/artifacts/fuzz_scanner/minimized-from-e7f9636f4b6c13489c3188335bd2e0a3d33f2a04crates/logfwd-core/fuzz/fuzz_targets/format_parser.rscrates/logfwd-core/fuzz/fuzz_targets/pipe_event.rscrates/logfwd-core/fuzz/fuzz_targets/scanner.rs
Summary
Adds
cargo-fuzz/ libfuzzer fuzz testing for logfwd-core. Three fuzz targets that test the boundaries where untrusted data enters the system.Fuzz targets
fuzz_format_parserfuzz_scannerfuzz_pipe_eventBug found: scanner crash on binary keys (#56)
The scanner panics when processing malformed JSON with binary (non-UTF-8) key bytes. The batch builder produces columns with mismatched lengths. Minimized to 18 bytes.
Likely resolved by the upcoming SIMD parser rewrite (#23).
Usage
Test plan
cargo test --workspace— 102 tests pass (fuzz crate is separate workspace)cargo clippy --workspace -- -D warnings— cleancargo +nightly fuzz build🤖 Generated with Claude Code