Summary
Create logfwd-arrow crate and move Arrow-dependent code out of logfwd-core.
No backward compatibility needed — we have no external users. Just move the code and update all imports.
What moves
| File | What | Why |
| --- | --- | --- |
| streaming_builder.rs | StreamingBuilder (entire file) | Arrow StringViewArray, bytes::Bytes |
| storage_builder.rs | StorageBuilder (entire file) | Arrow arrays, HashMap |
| scanner.rs (structs only) | SimdScanner, StreamingSimdScanner | Return RecordBatch |
| chunk_classify.rs (SIMD only) | AVX2/SSE2/NEON platform impls | unsafe intrinsics |
What stays in logfwd-core
- ScanBuilder trait (will become FieldSink in Phase 3)
- scan_into, scan_line, skip_ws (generic scan loop)
- ChunkIndex struct + compute_real_quotes + prefix_xor (scalar logic)
- Scalar find_char_mask fallback
- All Kani proofs
Hard part: SIMD extraction from chunk_classify.rs
chunk_classify.rs currently has both scalar logic (stays in core) and SIMD platform impls (moves to logfwd-arrow). The split point:
Stays in core:
- ChunkIndex struct + new(), next_quote(), is_in_string(), scan_string(), skip_nested()
- compute_real_quotes(), prefix_xor()
- Scalar find_char_mask() (the #[cfg(not(any(x86_64, aarch64)))] fallback)
- The #[cfg(kani)] mod verification block
Moves to logfwd-arrow:
- mod x86 (AVX2 + SSE2 impls)
- mod aarch64_impl (NEON impl)
- The platform dispatch function find_quotes_and_backslashes()
- All SIMD-specific tests
The connection between them: core's ChunkIndex::new() currently calls find_quotes_and_backslashes() directly. After the split, core should define a CharDetector trait and ChunkIndex::new() should be generic over it (or take a function pointer). logfwd-arrow implements the trait with SIMD.
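A minimal sketch of what the trait-based split could look like, to make the design question concrete. Everything here is an assumption for illustration: the CharDetector method signature, the 64-byte block size, the (quote_mask, backslash_mask) return shape, the ScalarDetector name, and the simplified ChunkIndex fields are hypothetical and do not reflect the existing code in chunk_classify.rs.

```rust
/// Hypothetical trait that core would define. The signature is an assumption,
/// not the current find_quotes_and_backslashes() API.
pub trait CharDetector {
    /// Returns (quote_mask, backslash_mask) for one 64-byte block.
    /// Bit i of a mask is set when block[i] is the matching byte.
    fn find_quotes_and_backslashes(&self, block: &[u8; 64]) -> (u64, u64);
}

/// Scalar detector that could live in core as the portable fallback.
pub struct ScalarDetector;

impl CharDetector for ScalarDetector {
    fn find_quotes_and_backslashes(&self, block: &[u8; 64]) -> (u64, u64) {
        let mut quotes = 0u64;
        let mut backslashes = 0u64;
        for (i, &b) in block.iter().enumerate() {
            quotes |= u64::from(b == b'"') << i;
            backslashes |= u64::from(b == b'\\') << i;
        }
        (quotes, backslashes)
    }
}

/// Simplified stand-in for ChunkIndex: it only stores the raw masks.
/// The real struct also resolves escaped quotes (compute_real_quotes, prefix_xor).
pub struct ChunkIndex {
    pub quote_masks: Vec<u64>,
    pub backslash_masks: Vec<u64>,
}

impl ChunkIndex {
    /// Generic over the detector, so core never names a SIMD implementation.
    pub fn new<D: CharDetector>(detector: &D, buf: &[u8]) -> Self {
        let mut quote_masks = Vec::new();
        let mut backslash_masks = Vec::new();
        for chunk in buf.chunks(64) {
            // Pad the final partial block with spaces so it is always 64 bytes.
            let mut block = [b' '; 64];
            block[..chunk.len()].copy_from_slice(chunk);
            let (q, b) = detector.find_quotes_and_backslashes(&block);
            quote_masks.push(q);
            backslash_masks.push(b);
        }
        Self { quote_masks, backslash_masks }
    }
}
```

Under this sketch, logfwd-arrow would provide its own CharDetector implementation backed by the AVX2/SSE2/NEON code and pass it to ChunkIndex::new(). The function-pointer alternative would replace the generic parameter with something like fn(&[u8; 64]) -> (u64, u64). That avoids monomorphization but may cost inlining in the hot loop, which is part of what the design review should settle.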
Steps
- Create crates/logfwd-arrow/Cargo.toml (deps: logfwd-core, arrow, bytes)
- Move streaming_builder.rs, storage_builder.rs to logfwd-arrow/src/
- Extract SimdScanner + StreamingSimdScanner from scanner.rs → logfwd-arrow/src/scanner.rs
- Extract SIMD from chunk_classify.rs → logfwd-arrow/src/simd.rs
- Define CharDetector trait in core, make ChunkIndex::new generic over it
- Implement CharDetector with SIMD in logfwd-arrow
- Update all imports across workspace (logfwd, logfwd-transform, logfwd-bench, logfwd-output)
- Verify that all existing tests pass
Assignability
The file moves are Copilot-friendly. The CharDetector trait + SIMD split needs design review.
Consider splitting into two PRs:
- PR A: Move builders + scanner structs (purely mechanical)
- PR B: SIMD extraction + CharDetector trait (needs thought)
Parent: #262