Skip to content

feat: StructuralIndex — portable SIMD via wide, replace ChunkIndex#360

Merged
strawgate merged 16 commits into
masterfrom
feat/structural-index-continued
Mar 31, 2026
Merged

feat: StructuralIndex — portable SIMD via wide, replace ChunkIndex#360
strawgate merged 16 commits into
masterfrom
feat/structural-index-continued

Conversation

@strawgate
Copy link
Copy Markdown
Owner

Summary

  • Replace chunk_classify.rs with structural.rs — portable SIMD via wide crate
  • Scanner uses StructuralIndex instead of ChunkIndex
  • Delete ~1,040 lines of hand-rolled unsafe SIMD (NEON, AVX2, SSE2 modules)
  • Add ~460 lines of safe, portable structural detection + streaming classifier
  • Net: -580 lines, zero unsafe in scanner path

What changed

Before After
chunk_classify.rs (1,026 lines) structural.rs (490 lines)
3 platform SIMD modules (unsafe) wide::u8x16 (safe, portable)
ChunkIndex (stored Vec<u64>) StructuralIndex (same API, wide backend)
2 chars detected (quote, backslash) 10 chars detected (+ newline, space, comma, colon, braces, brackets)
memchr for newlines in scanner Newlines from SIMD pass (no separate memchr)
get_unchecked in accessors Safe indexing throughout

New types

  • RawBlockMasks — 10 u64 bitmasks from SIMD detection (stack-local)
  • ProcessedBlock — escape-aware, string-masked bitmasks
  • StreamingClassifier — carries 2 u64s between 64-byte blocks
  • StructuralIndex — whole-buffer classification with next_quote/scan_string/skip_nested

Benchmarks (vs previous ChunkIndex)

Benchmark Change
Narrow 5f pushdown +1.2% (noise)
Wide 55f pushdown -1.4% (faster)
K8s 12f SELECT * -1.6% (faster)
Batch 10K rows -2.0% (faster)
Nested JSON +12.5% (regression)

Nested regression is from safe bounds checking replacing get_unchecked in the skip_nested tight loop. All real-world workloads are neutral or faster.

Test plan

  • 262 tests passing (120 core lib + 66 core integration + 34 arrow + 42 other)
  • Clippy clean
  • SIMD ≡ scalar equivalence verified (6 cases + 100 pseudo-random blocks)
  • End-to-end pipeline tests (line extraction, field counting with string masking)
  • Fuzz target updated for StructuralIndex
  • Vectorization canary updated
  • All dev-docs updated (no stale ChunkIndex/chunk_classify references)
  • Benchmark on CI (criterion)

🤖 Generated with Claude Code

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 31, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 10d33cb3-d0bb-4a59-bf05-a365e5edd214

📥 Commits

Reviewing files that changed from the base of the PR and between 3270f21 and 4696b53.

📒 Files selected for processing (5)
  • .github/agents/issue-worker.agent.md
  • .github/copilot-instructions.md
  • AGENTS.md
  • crates/logfwd-core/src/structural.rs
  • dev-docs/research/wide-crate-evaluation.md

Walkthrough

This PR replaces the whole-buffer ChunkIndex classifier with a new SIMD-assisted pipeline: adds crates/logfwd-core/src/structural.rs (introducing StructuralIndex, StreamingClassifier, raw/processed block types, and portable-SIMD detection via the wide crate), deletes crates/logfwd-core/src/chunk_classify.rs, updates the scanner to use StructuralIndex and its helpers (next_quote, is_in_string, scan_string, skip_nested), removes the structural_detect benchmark, updates fuzz/bench/Cargo metadata and CI bench caching, and revises developer docs to reference StructuralIndex.

Possibly related PRs

  • strawgate/memagent PR 360 — Implements the same module-level replacement: deletes chunk_classify.rs and adds structural.rs with StructuralIndex/StreamingClassifier.
  • strawgate/memagent PR 307 — Refactors scanner integration to adopt StructuralIndex in place of ChunkIndex and updates scan flow.
  • strawgate/memagent PR 329 — Updates benchmarks and bench configuration overlapping the removed structural_detect benchmark and StructuralIndex changes.

Comment @coderabbitai help to get the list of available commands and usage tips.

strawgate and others added 7 commits March 31, 2026 00:22
New structural.rs module implementing the simdjson two-stage pattern:
- Stage 1 (SIMD): find_structural_chars_scalar detects 10 chars
- Stage 2 (scalar): StreamingClassifier processes bitmasks per-block

Key types:
- RawBlockMasks: 10 u64 bitmasks from SIMD detection (stack-local)
- ProcessedBlock: escape-aware, string-masked bitmasks
- StreamingClassifier: carries only 2 u64s between blocks

Includes 6 unit tests (string masking, escapes, cross-block carry,
tail blocks) and 5 Kani proof harnesses (correctness, consistency,
no-panic, tail masking, string exclusion).

Makes compute_real_quotes and prefix_xor pub in chunk_classify.rs
for reuse. ChunkIndex is preserved — migration happens in next step.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add NEON, AVX2, and SSE2 backends for find_structural_chars(),
detecting all 10 structural characters in one SIMD pass. Each
backend loads 64 bytes once, then runs 10 comparisons against
the loaded data — same pattern as existing chunk_classify.rs
but extended from 2 to 10 characters.

New tests:
- simd_matches_scalar: 6 representative inputs verified identical
- simd_matches_scalar_random: 100 pseudo-random blocks verified
- end_to_end_ndjson_line_extraction: full buffer → line ranges
- end_to_end_structural_field_counting: comma/colon counting
  with string-interior masking (comma in "hello, world" masked)

10 tests total, all passing. Clippy clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comprehensive evaluation of the `wide` crate as replacement for
hand-rolled std::arch intrinsics. Covers platform dispatch, API
surface, performance, Kani implications, and alternatives.

Recommendation: use wide. Eliminates ~285 lines of unsafe
platform-specific code, same generated assembly, works on
x86_64 + aarch64 + WASM with one codebase.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace 3 platform-specific SIMD modules (NEON ~55 lines, AVX2 ~30
lines, SSE2 ~40 lines, all unsafe) with one 15-line safe function
using `wide::u8x16`. Same codegen, all platforms, zero unsafe.

Before: simd_neon + simd_avx2 + simd_sse2 + cfg dispatch = 165 lines
After:  mask64() + find_structural_chars() = 25 lines

All 10 tests pass including SIMD ≡ scalar equivalence.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace ChunkIndex with StructuralIndex in the scanner. One SIMD
pass (via `wide` crate) now detects 10 structural characters and
extracts line boundaries simultaneously. Scanner uses the same
next_quote/is_in_string/scan_string/skip_nested API.

Key changes:
- scan_into() uses StructuralIndex::new() instead of ChunkIndex::new()
- Line ranges extracted during SIMD pass (no separate memchr call)
- No unsafe code in the new path (wide handles SIMD safely)
- Removes memchr import from scanner.rs

100 tests passing (66 core + 34 arrow). Clippy clean.

chunk_classify.rs is now dead code — will be deleted in follow-up.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
chunk_classify.rs is replaced by structural.rs. All hand-rolled SIMD
(NEON, AVX2, SSE2 modules) eliminated — `wide` crate handles
portable SIMD. compute_real_quotes and prefix_xor moved to
structural.rs.

Deleted: ~740 lines of chunk_classify.rs (including ~285 lines of
unsafe platform-specific SIMD code).

Updated: fuzz target and vectorization canary to reference
structural.rs instead of chunk_classify.rs.

262 tests passing. Clippy clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace ChunkIndex → StructuralIndex across all dev-docs
- Replace chunk_classify.rs → structural.rs references
- Update "Only dependency: memchr" → includes wide crate
- Fix copilot-instructions: "no unsafe" → "unsafe only in SIMD layer"
- Update PHASES.md: add Phase 9 (allocation-free kernel)
- Add decision: pipeline state machine over linear BatchToken
- Update PROOF_AUDIT.md section headings
- Update ZERO_COPY_PIPELINE.md pipeline diagram

New issues created:
- #357 — Decommission shadow JSON parser in otlp.rs
- #358 — Allocation-free kernel (no extern crate alloc)
- #359 — proptest: SQL pushdown integrity

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@strawgate strawgate force-pushed the feat/structural-index-continued branch from 2a02865 to a91b882 Compare March 31, 2026 05:23
strawgate and others added 2 commits March 31, 2026 00:27
- Fix AGENTS.md: remove phantom investigation links, update crate
  structure, add architecture decisions section, fix reference paths
- Create CLAUDE.md symlink → AGENTS.md
- Slim .github/copilot-instructions.md to pointer (was duplicating AGENTS.md)
- Delete docs/ARCHITECTURE.md (duplicate of dev-docs/ARCHITECTURE.md)
- Delete 4 orphaned compass artifacts from references/
- Move docs/investigation/ → dev-docs/research/ (one file)
- Fix typo flagged by CI ("ue" in test comment)

Net: -1,317 lines of stale/duplicate content.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- CONTRIBUTING.md: pre-commit/pre-push checklists, PR process, key commands
- CODE_STYLE.md: naming, error handling, hot path rules, abstraction
  preferences, comment style, test conventions — for CodeRabbit and
  human reviewers (not linting rules)
- issue-worker.agent.md: slimmed from 130 to 40 lines, references
  AGENTS.md + CONTRIBUTING.md instead of duplicating project info

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@strawgate
Copy link
Copy Markdown
Owner Author

@coderabbitai please do a full review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 31, 2026

@strawgate Sure, kicking off a full review of the PR now!


🧠 Learnings used
Learnt from: CR
Repo: strawgate/memagent PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-31T04:48:42.229Z
Learning: Applies to **/{scanner,format,parse}*/*.rs : Avoid per-line heap allocations in hot paths (scanner, format parser)

Learnt from: CR
Repo: strawgate/memagent PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-03-30T16:21:10.222Z
Learning: Applies to **/*.rs : Avoid per-line heap allocations in hot paths (scanner, CRI parser, OTLP encoder, compress)
✅ Actions performed

Full review triggered.

strawgate and others added 2 commits March 31, 2026 00:38
structural_detect.rs benchmark referenced chunk_classify::ChunkIndex
which was deleted. The benchmark served its purpose (research
comparison documented in dev-docs/research/). Remove it rather than
port to StructuralIndex — a new benchmark should be written against
the final API.

Also fix fuzz target doc comment: "NEON intrinsics" → "portable
SIMD via wide crate".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove restating comments (Stage 2/3 labels, "tail mask", "extract
  line ranges") that just echo the code below them
- Add WHY comment on mask64: explains u8x16 vs u8x32 choice (NEON
  has no native 256-bit, u8x32 silently degrades to 2× u8x16)
- Add WHY comment on StructuralIndex: explains why only 2 of 10
  bitmasks are stored (others consumed during construction)
- Keep the one useful inline comment: newlines skip string masking

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Owner Author

@strawgate strawgate left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall

Excellent PR — this is a clean replacement of ~1,040 lines of hand-rolled unsafe SIMD with ~460 lines of safe, portable code. The streaming classifier design is well-structured, thoroughly tested, and the simdjson-style architecture (detect → process → consume) is the right approach. Net -580 lines, zero unsafe, good Kani proofs.

I have a few items to flag, ranging from a potential broken doc link to performance observations.


🔴 Broken documentation reference

AGENTS.md now references dev-docs/ARCHITECTURE.md but this PR deletes docs/ARCHITECTURE.md and no new file at dev-docs/ARCHITECTURE.md appears in the changeset. Unless that file already exists on the branch, this is a dangling link.

🟡 skip_nested regression path

The 12.5% nested-JSON regression is acknowledged and attributed to safe bounds checking. But there's a deeper opportunity: skip_nested currently calls is_in_string(pos) on every byte, each of which does a block lookup + bit shift. Since StructuralIndex already stores in_string bitmasks, skip_nested could extract the bitmask once per 64-byte block and iterate within it using bit operations (e.g., use stored open_brace/close_brace bitmasks to jump between structural positions rather than scanning byte-by-byte). This wouldn't just recover the regression — it could make nested skipping significantly faster than the old get_unchecked path.

Not a blocker for this PR, but worth filing as a follow-up since the data structures are already in place to support it.

🟡 Benchmark deleted without replacement

structural_detect.rs (824 lines) was deleted. There's no micro-benchmark for the new find_structural_chars + StreamingClassifier::process_block pipeline. The scanner benchmark still exists (good), but without a dedicated structural detection benchmark, performance regressions in the SIMD layer would only show up indirectly. Consider adding a minimal criterion bench for StructuralIndex::new() over representative buffers.

🟢 mask64 re-loads block data 10 times

find_structural_chars calls mask64 10 times, each creating 4 × u8x16::new() from the same block. LLVM will almost certainly inline mask64 and hoist the loads, but an explicit "load once, compare 10 times" version would be more robust against compiler regressions and would make the intent clearer:

pub fn find_structural_chars(block: &[u8; 64]) -> RawBlockMasks {
    let c0 = u8x16::new(block[0..16].try_into().unwrap());
    let c1 = u8x16::new(block[16..32].try_into().unwrap());
    let c2 = u8x16::new(block[32..48].try_into().unwrap());
    let c3 = u8x16::new(block[48..64].try_into().unwrap());
    
    let mask = |needle: u8| -> u64 {
        let n = u8x16::splat(needle);
        (c0.simd_eq(n).to_bitmask() as u64)
            | ((c1.simd_eq(n).to_bitmask() as u64) << 16)
            | ((c2.simd_eq(n).to_bitmask() as u64) << 32)
            | ((c3.simd_eq(n).to_bitmask() as u64) << 48)
    };
    
    RawBlockMasks {
        newline: mask(b'\n'),
        space: mask(b' '),
        // ...
    }
}

🟢 Minor observations

  • ProcessedBlock computes space, open_bracket, close_bracket during StructuralIndex::new() but these are never read — only newline, real_quotes, and in_string are stored. The compiler should DCE these, but it's worth a comment noting they're currently unused in the stored-index path (they'd be consumed in a future streaming-only path).
  • StructuralIndex::new() returns Vec<(usize, usize)> for line ranges. For very large buffers, an iterator/callback pattern would avoid the allocation, but this matches the existing scanner pattern and is fine for now.
  • Newlines not string-masked is documented and correct for NDJSON, but worth a debug_assert! or test that verifies the NDJSON invariant (no literal newlines inside JSON strings) so a future reader doesn't "fix" this.
  • compute_real_quotes — the iterative O(num_backslashes) approach is a correct replacement for the O(1) prefix_xor method. The escape carry logic at block boundaries looks correct. Nice that the Kani proof covers this exhaustively.

🟢 Docs & CI

  • bench.yml binary caching — good improvement, will speed up CI significantly when only non-code files change.
  • CONTRIBUTING.md + CODE_STYLE.md — clean separation of concerns. The agent/copilot instruction trimming removes a lot of duplication.
  • PROOF_AUDIT.md updates correctly reflect the rename from ChunkIndex to StructuralIndex.
  • Deleting references/ compass artifacts — fine, these were LLM-generated research docs that have been superseded.

Summary

Ship it after confirming the dev-docs/ARCHITECTURE.md reference isn't broken and ideally filing a follow-up for the skip_nested bitmask optimization and a replacement structural detection benchmark. The core code is solid.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 17

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/agents/issue-worker.agent.md:
- Around line 34-40: The condensed checklist removed two hard rules; restore the
repo-wide constraints by re-adding “Do not use unsafe code” and expanding the
hot-path allocation line to cover all required components (e.g., change the
existing “No per-line heap allocations in hot paths (scanner, OTLP encoder, CRI
parser)” entry to explicitly include format parser and compress — e.g.,
“scanner, format parser, CRI parser, OTLP encoder, compress”), ensuring the
checklist in issue-worker.agent.md matches the canonical repo rules and
language.

In @.github/workflows/bench.yml:
- Around line 81-86: Add a fail-fast verification step before the "Upload
binaries" action to ensure the expected artifacts exist in cached-binaries/;
create a new job step (e.g., name: "Verify cached binaries") that checks for the
presence of the required binary files (or at least that the directory is
non-empty) and exits non‑zero if any are missing so the build job fails
immediately instead of allowing later matrix jobs to error on a partial cache
restore.
- Line 55: The cache key for benchmark binaries is under-scoped; update the
hashFiles call used in the key (the line containing key: bench-binaries-${{
hashFiles('Cargo.lock', 'crates/*/src/**', 'crates/*/Cargo.toml') }}) to also
include root Cargo.toml, rust-toolchain.toml, .cargo/config.toml (or
.cargo/config) and the workflow file itself so changes to workspace manifest,
toolchain, cargo config or RUSTFLAGS invalidate the cache; modify the hashFiles
argument list to include those filenames accordingly.

In `@AGENTS.md`:
- Around line 12-25: The fenced code block that lists crates (starting with the
line "logfwd-core         Parsing, structural SIMD detection, pipeline logic")
lacks a language specifier; update the opening fence from ``` to ```text (or
```plaintext) so the block becomes "```text" to satisfy the static analysis and
ensure correct rendering/syntax highlighting.
- Around line 44-49: The doc currently claims "Streaming per-block: no stored
bitmask vectors" as a key decision but that misstates the current
implementation—StructuralIndex still stores Vec<u64> bitmasks—so update the text
to make this a target/goal rather than current state: change the phrase to
something like "Target: streaming per-block (no stored bitmask vectors) —
current implementation (StructuralIndex) still uses Vec<u64> bitmasks" and/or
add a short note referencing StructuralIndex and its Vec<u64> bitmask storage to
avoid confusion.

In `@crates/logfwd-core/fuzz/fuzz_targets/chunk_index.rs`:
- Around line 1-28: Rename the fuzz target file from chunk_index.rs to
structural_index.rs to match the exercised type `StructuralIndex`; move/rename
the file in the fuzz target directory and update any references to the old
filename in fuzz manifest/config (e.g., fuzz target registrations in Cargo.toml
or any scripts) so the fuzz harness still builds and runs the
`fuzz_target!(|data: &[u8]| { ... })` that constructs `StructuralIndex::new` and
invokes `SimdScanner`, and update any README/docs that mention the old name for
consistency.

In `@crates/logfwd-core/src/lib.rs`:
- Line 15: The new public module export pub mod structural lacks documentation;
add a Rustdoc comment above the declaration describing the purpose and public
surface of the structural module (what it contains and when to use it), mention
any key types or traits exported by structural (so callers know what to expect),
and include a brief usage example or pointer to more detailed docs if
appropriate; place the comment immediately above the pub mod structural
declaration in lib.rs using /// style so it satisfies the crate's "All public
APIs must have doc comments" rule.

In `@crates/logfwd-core/src/structural.rs`:
- Around line 779-781: The mask calculation for after_nl_mask panics when
bit_pos == 63 due to shifting by 64; change the logic in the block around
after_nl_mask (where after_nl_mask is computed and used with processed.colon and
processed.comma) to handle the boundary: if bit_pos == 63 set after_nl_mask = 0,
otherwise compute after_nl_mask = !((1u64 << (bit_pos + 1)) - 1); keep the
subsequent uses of after_nl_mask with processed.colon and
processed.comma.count_ones() unchanged.
- Around line 335-350: The skip_nested() loop only tracks numeric depth and
treats '}' and ']' the same, which allows mismatched pairs like "{]" to silently
desynchronize the scanner; update skip_nested() to record the opener kind (e.g.,
push a boolean or small stack/bitset when you see '{' vs '[') and on
encountering a closer check it matches the last opener via that recorded kind
(or return an error/None on mismatch) rather than decrementing depth blindly;
use the existing is_in_string(pos), buf, pos and len variables and return
behavior of skip_nested() to locate and handle mismatches consistently.
- Around line 439-451: find_structural_chars currently calls mask64 ten times
which rebuilds the same four SIMD/load lanes from the 64-byte block on each
call; hoist the four lane loads out of mask64 by loading the block into the lane
variables once at the start of find_structural_chars and reuse them for every
comparison, then call a new helper or inline mask computation that accepts those
preloaded lanes to produce each RawBlockMasks field (newline, space, quote,
backslash, comma, colon, open_brace, close_brace, open_bracket, close_bracket)
so the block is loaded only once per invocation instead of ten times.
- Around line 79-89: Public API elements RawBlockMasks (and its public fields
newline, space, quote, backslash, comma, colon, open_brace, close_brace,
open_bracket, close_bracket) and the public constructor
StreamingClassifier::new() lack rustdoc; add concise doc comments describing the
purpose and semantics of the RawBlockMasks struct and each field (what bitmask
each represents and how callers should use them), and add a doc comment for
StreamingClassifier::new() explaining what the constructor does, its parameters,
and any invariants or error conditions so the public surface is fully
documented.
- Around line 245-246: Replace the production .unwrap() calls on fixed-size
slice conversions with .expect(...) that documents the invariant (or refactor to
avoid the conversion) so failures carry a descriptive message; specifically, in
structural.rs where you build a 64-byte block (the let block: [u8; 64] = if
remaining >= 64 { buf[offset..offset + 64].try_into().unwrap() }), change the
unwrap to .expect("slice length invariant: remaining >= 64 ensures 64-byte
conversion") or similar, and apply the same change to the other
try_into().unwrap() occurrences referenced around lines 424-427 so all
fixed-size try_into() calls assert the invariant rather than panicking with
unwrap().
- Around line 544-546: The test fixture contains a duplicated substring "ues"
causing the typos check to fail; update the comment and the raw-string payload
so the escaped-quote example no longer contains "ues". Specifically, edit the
comment line and the variable json (the raw string bound to json) to use a
different escaped-quote sample such as {"k":"val\"ue"} (keeping the escaped
quote) and leave buf unchanged, so the symbols to edit are the comment above buf
and the json variable assignment.

In `@dev-docs/DIRECTION.md`:
- Line 20: The SIMD architecture descriptions are inconsistent: Line 20 mentions
portable SIMD via `wide` but nearby lines still reference `NEON/AVX2/SSE2` in
the `logfwd-arrow` text; update the adjacent lines so the section consistently
describes a single SIMD approach. Edit the `logfwd-arrow` mentions of
`NEON/AVX2/SSE2` to reference portable SIMD (e.g., `wide`) or conversely change
Line 20 to match `NEON/AVX2/SSE2`—whichever single architecture you choose—and
make the wording uniform across the paragraph, keeping references to `memchr`
and `wide` consistent with that chosen architecture.

In `@dev-docs/PROOF_AUDIT.md`:
- Around line 6-24: The documented proof names and coverage in PROOF_AUDIT.md
are out-of-sync with the actual Kani proofs in structural.rs; replace references
to verify_prefix_xor and verify_compute_real_quotes with the real proof names
(verify_find_char_mask_correct, verify_structural_scalar_consistent,
verify_process_block_no_panic, verify_process_block_tail_mask,
verify_in_string_exclusion) and update each proof’s coverage note to reflect
what those functions actually prove (e.g., map prefix-xor/quote-byte coverage to
verify_find_char_mask_correct and verify_structural_scalar_consistent where
appropriate, state that multi-block chaining/carry propagation is only tested by
proptest and not Kani-proven and that verify_process_block_* proofs address
no-panic, tail mask and in-string exclusion properties). Ensure the prose
clearly links each listed proof name to its true verified properties and
preserves the existing “Gap” notes about process_block multi-block chaining.

In `@dev-docs/research/wide-crate-evaluation.md`:
- Around line 28-38: The wide crate now exposes simd_eq() instead of cmp_eq(),
so update all calls to use simd_eq: in the mask64 function replace lo.cmp_eq(n)
and hi.cmp_eq(n) with lo.simd_eq(n) and hi.simd_eq(n), replace any
data.cmp_eq(needle) calls with data.simd_eq(needle), and replace
u8x32::cmp_eq(...) usages with u8x32::simd_eq(...); keep the existing CmpEq
import and to_bitmask() usage unchanged.

In `@DEVELOPING.md`:
- Around line 68-69: Update the bullet describing chunk-level classification to
reflect the new portable SIMD implementation: change the NEON/SSE wording to
portable SIMD (`wide`) and replace "pre-computes all quote positions" with
"pre-computes a structural-character mask" so that `StructuralIndex` is
described as using `wide` to build a structural-character bitmask for the buffer
and then `scan_string` consumes that mask with a single `trailing_zeros`
bit-scan; reference `StructuralIndex`, `wide`, and `scan_string` in the updated
description.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 662fd3a8-b096-4896-ac7b-658e06cc5cd4

📥 Commits

Reviewing files that changed from the base of the PR and between fb87b82 and 77fb6be.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (28)
  • .github/agents/issue-worker.agent.md
  • .github/copilot-instructions.md
  • .github/workflows/bench.yml
  • AGENTS.md
  • CLAUDE.md
  • CODE_STYLE.md
  • CONTRIBUTING.md
  • DEVELOPING.md
  • crates/logfwd-core/Cargo.toml
  • crates/logfwd-core/benches/structural_detect.rs
  • crates/logfwd-core/fuzz/fuzz_targets/chunk_index.rs
  • crates/logfwd-core/src/chunk_classify.rs
  • crates/logfwd-core/src/lib.rs
  • crates/logfwd-core/src/scanner.rs
  • crates/logfwd-core/src/structural.rs
  • crates/logfwd-core/tests/vectorization_canary.rs
  • dev-docs/DECISIONS.md
  • dev-docs/DIRECTION.md
  • dev-docs/PHASES.md
  • dev-docs/PROOF_AUDIT.md
  • dev-docs/ZERO_COPY_PIPELINE.md
  • dev-docs/research/arrow-ipc-feasibility.md
  • dev-docs/research/wide-crate-evaluation.md
  • docs/ARCHITECTURE.md
  • references/compass_artifact_wf-2dfe88a4-5edb-4626-bb0f-86e075888b3a_text_markdown.md
  • references/compass_artifact_wf-8cf54a98-da0f-43c0-a58d-004071580569_text_markdown.md
  • references/compass_artifact_wf-b72cb9f9-d224-4cf4-8352-9f4acf1d2f3c_text_markdown.md
  • references/compass_artifact_wf-c5800b72-6964-4704-aa02-05d6ca9b3c55_text_markdown.md
💤 Files with no reviewable changes (7)
  • references/compass_artifact_wf-b72cb9f9-d224-4cf4-8352-9f4acf1d2f3c_text_markdown.md
  • docs/ARCHITECTURE.md
  • references/compass_artifact_wf-2dfe88a4-5edb-4626-bb0f-86e075888b3a_text_markdown.md
  • crates/logfwd-core/benches/structural_detect.rs
  • references/compass_artifact_wf-c5800b72-6964-4704-aa02-05d6ca9b3c55_text_markdown.md
  • references/compass_artifact_wf-8cf54a98-da0f-43c0-a58d-004071580569_text_markdown.md
  • crates/logfwd-core/src/chunk_classify.rs

Comment on lines +34 to +40
- Run `just ci` before considering any change complete
- Zero clippy warnings, all tests pass, `cargo fmt` clean
- No `.unwrap()` in production code — use `?` or `.expect("reason")`
- No per-line heap allocations in hot paths (scanner, OTLP encoder, CRI parser)
- No new dependencies without justification
- No unnecessary abstractions — write the simplest correct code
- Doc comments on public APIs
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Keep the condensed checklist aligned with hard repo rules.

This version drops the explicit no unsafe code rule and narrows the no-allocation hot-path list by omitting format parser and compress. Since this file is the agent's working checklist, it shouldn't relax repo-wide constraints.

Based on learnings Avoid per-line heap allocations in hot paths (scanner, format parser) and Avoid per-line heap allocations in hot paths (scanner, CRI parser, OTLP encoder, compress). As per coding guidelines Do not use unsafe code.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/agents/issue-worker.agent.md around lines 34 - 40, The condensed
checklist removed two hard rules; restore the repo-wide constraints by re-adding
“Do not use unsafe code” and expanding the hot-path allocation line to cover all
required components (e.g., change the existing “No per-line heap allocations in
hot paths (scanner, OTLP encoder, CRI parser)” entry to explicitly include
format parser and compress — e.g., “scanner, format parser, CRI parser, OTLP
encoder, compress”), ensuring the checklist in issue-worker.agent.md matches the
canonical repo rules and language.

uses: actions/cache@v4
with:
path: cached-binaries
key: bench-binaries-${{ hashFiles('Cargo.lock', 'crates/*/src/**', 'crates/*/Cargo.toml') }}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cd /tmp && cat -n .github/workflows/bench.yml | sed -n '40,70p'

Repository: strawgate/memagent

Length of output: 121


🏁 Script executed:

find . -name "bench.yml" -path "*/.github/workflows/*" | head -5

Repository: strawgate/memagent

Length of output: 91


🏁 Script executed:

cat -n ./.github/workflows/bench.yml | sed -n '40,70p'

Repository: strawgate/memagent

Length of output: 1340


🏁 Script executed:

# Check for root Cargo.toml and other potential build inputs
ls -la Cargo.toml Cargo.lock 2>/dev/null | head -10
echo "---"
# Check for .cargo and rust-toolchain files
find . -maxdepth 2 -name ".cargo" -o -name "rust-toolchain*" 2>/dev/null | grep -v ".git"

Repository: strawgate/memagent

Length of output: 220


🏁 Script executed:

# Check root Cargo.toml content to see if it affects build
echo "=== Root Cargo.toml ==="
cat Cargo.toml
echo ""
echo "=== rust-toolchain.toml ==="
cat rust-toolchain.toml
echo ""
echo "=== .cargo directory structure ==="
find .cargo -type f 2>/dev/null | head -20

Repository: strawgate/memagent

Length of output: 1075


🏁 Script executed:

# Check .cargo/config.toml
echo "=== .cargo/config.toml ==="
cat .cargo/config.toml
echo ""
# Check if RUSTFLAGS in workflow or other env vars affect build
echo "=== Full workflow around build step ==="
cat -n ./.github/workflows/bench.yml | sed -n '66,85p'

Repository: strawgate/memagent

Length of output: 1228


Cache key is under-scoped for reproducible benchmark binaries.

Line 55 omits root Cargo.toml (workspace, dependencies, profile settings), rust-toolchain.toml (toolchain channel), .cargo/config.toml (rustc-wrapper, default rustflags), and the workflow file itself (contains RUSTFLAGS). Changes to any of these will not invalidate the cache, causing stale binaries to be reused with outdated dependencies or compiler flags.

Suggested key hardening
-          key: bench-binaries-${{ hashFiles('Cargo.lock', 'crates/*/src/**', 'crates/*/Cargo.toml') }}
+          key: bench-binaries-${{ runner.os }}-${{ hashFiles('Cargo.lock', 'Cargo.toml', 'crates/*/src/**', 'crates/*/Cargo.toml', '.cargo/**', 'rust-toolchain*', '.github/workflows/bench.yml') }}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
key: bench-binaries-${{ hashFiles('Cargo.lock', 'crates/*/src/**', 'crates/*/Cargo.toml') }}
key: bench-binaries-${{ runner.os }}-${{ hashFiles('Cargo.lock', 'Cargo.toml', 'crates/*/src/**', 'crates/*/Cargo.toml', '.cargo/**', 'rust-toolchain*', '.github/workflows/bench.yml') }}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/bench.yml at line 55, The cache key for benchmark binaries
is under-scoped; update the hashFiles call used in the key (the line containing
key: bench-binaries-${{ hashFiles('Cargo.lock', 'crates/*/src/**',
'crates/*/Cargo.toml') }}) to also include root Cargo.toml, rust-toolchain.toml,
.cargo/config.toml (or .cargo/config) and the workflow file itself so changes to
workspace manifest, toolchain, cargo config or RUSTFLAGS invalidate the cache;
modify the hashFiles argument list to include those filenames accordingly.

Comment on lines 81 to 86
- name: Upload binaries
uses: actions/upload-artifact@v4
with:
name: bench-binaries
path: |
target/release/logfwd
target/release/logfwd-dhat
target/release/logfwd-competitive-bench
path: cached-binaries/
retention-days: 1
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Add a fail-fast integrity check before uploading cached binaries.

When build is skipped, there’s no guard that all expected binaries exist in cached-binaries/. A partial cache restore will fail later in matrix jobs instead of failing the build job immediately.

Suggested guard step
+      - name: Verify cached binaries
+        run: |
+          test -f cached-binaries/logfwd
+          test -f cached-binaries/logfwd-dhat
+          test -f cached-binaries/logfwd-competitive-bench
+
       - name: Upload binaries
         uses: actions/upload-artifact@v4
         with:
           name: bench-binaries
           path: cached-binaries/
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- name: Upload binaries
uses: actions/upload-artifact@v4
with:
name: bench-binaries
path: |
target/release/logfwd
target/release/logfwd-dhat
target/release/logfwd-competitive-bench
path: cached-binaries/
retention-days: 1
- name: Verify cached binaries
run: |
test -f cached-binaries/logfwd
test -f cached-binaries/logfwd-dhat
test -f cached-binaries/logfwd-competitive-bench
- name: Upload binaries
uses: actions/upload-artifact@v4
with:
name: bench-binaries
path: cached-binaries/
retention-days: 1
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/bench.yml around lines 81 - 86, Add a fail-fast
verification step before the "Upload binaries" action to ensure the expected
artifacts exist in cached-binaries/; create a new job step (e.g., name: "Verify
cached binaries") that checks for the presence of the required binary files (or
at least that the directory is non-empty) and exits non‑zero if any are missing
so the build job fails immediately instead of allowing later matrix jobs to
error on a partial cache restore.

Comment thread AGENTS.md Outdated
Comment thread AGENTS.md
Comment thread crates/logfwd-core/src/structural.rs Outdated
Comment on lines +779 to +781
let after_nl_mask = !((1u64 << (bit_pos + 1)) - 1);
let colons_after = (processed.colon & after_nl_mask).count_ones();
let commas_after = (processed.comma & after_nl_mask).count_ones();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Handle the block-end newline case in this mask calculation.

If bit_pos == 63, 1u64 << (bit_pos + 1) becomes 1u64 << 64 and this test panics instead of validating the boundary case. Return 0 for after_nl_mask in that case.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/logfwd-core/src/structural.rs` around lines 779 - 781, The mask
calculation for after_nl_mask panics when bit_pos == 63 due to shifting by 64;
change the logic in the block around after_nl_mask (where after_nl_mask is
computed and used with processed.colon and processed.comma) to handle the
boundary: if bit_pos == 63 set after_nl_mask = 0, otherwise compute
after_nl_mask = !((1u64 << (bit_pos + 1)) - 1); keep the subsequent uses of
after_nl_mask with processed.colon and processed.comma.count_ones() unchanged.

Comment thread dev-docs/DIRECTION.md
Comment thread dev-docs/PROOF_AUDIT.md Outdated
Comment thread dev-docs/research/wide-crate-evaluation.md
Comment thread DEVELOPING.md Outdated
pub fn find_char_mask(block: &[u8; 64], needle: u8) -> u64 {
let mut mask: u64 = 0;
for (i, &b) in block.iter().enumerate() {
if b == needle {
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Load the four u8x16 chunks once in find_structural_chars and close over them, rather than re-loading in each mask64 call. LLVM will likely inline and hoist, but the explicit version is clearer and more robust against compiler regressions:

pub fn find_structural_chars(block: &[u8; 64]) -> RawBlockMasks {
    let c0 = u8x16::new(block[0..16].try_into().unwrap());
    let c1 = u8x16::new(block[16..32].try_into().unwrap());
    let c2 = u8x16::new(block[32..48].try_into().unwrap());
    let c3 = u8x16::new(block[48..64].try_into().unwrap());
    let mask = |needle: u8| -> u64 {
        let n = u8x16::splat(needle);
        (c0.simd_eq(n).to_bitmask() as u64)
            | ((c1.simd_eq(n).to_bitmask() as u64) << 16)
            | ((c2.simd_eq(n).to_bitmask() as u64) << 32)
            | ((c3.simd_eq(n).to_bitmask() as u64) << 48)
    };
    RawBlockMasks { newline: mask(b'\n'), space: mask(b' '), /* ... */ }
}

}
}

if line_start < len {
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the source of the 12.5% nested-JSON regression — is_in_string(pos) is called per byte, each doing a block lookup + bit shift.

Since the index stores in_string bitmasks, a future optimization could extract the bitmask once per 64-byte block and use trailing_zeros to jump between structural positions rather than scanning byte-by-byte. Worth a follow-up issue.

let real_q = compute_real_quotes(raw.quote, raw.backslash, &mut self.prev_odd_backslash);
let raw_string_bits = prefix_xor(real_q) ^ self.prev_in_string;

self.prev_in_string = if (raw_string_bits >> 63) & 1 == 1 {
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct for NDJSON. Worth a brief doc note or debug_assert! that this module assumes NDJSON input (no literal newlines inside JSON strings) — a future reader might try to "fix" this by adding string masking.

Comment thread AGENTS.md
- `README.md` — what logfwd does, performance targets
- `DEVELOPING.md` — build/test/bench commands, codebase structure
- `docs/ARCHITECTURE.md` — v2 Arrow pipeline design
- `dev-docs/ARCHITECTURE.md` — pipeline data flow, scanner stages
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dev-docs/ARCHITECTURE.md is referenced here but docs/ARCHITECTURE.md was deleted in this PR and I don't see dev-docs/ARCHITECTURE.md created in the changeset. Is this a dangling reference, or does that file already exist on the branch?

Major fixes:
- Preload SIMD chunks once in find_structural_chars instead of 10×
  per block (mask64 → cmp4, chunks loaded once at top)
- Fix potential panic: 1u64 << 64 when newline at bit position 63
- Add NDJSON assumption doc on process_block (newlines not string-masked)
- Add rustdoc on pub mod structural in lib.rs

Minor fixes:
- Rename fuzz target: chunk_index.rs → structural_index.rs
- Add language specifier to AGENTS.md code block
- Clarify AGENTS.md: stored bitmasks is current state, not zero yet
- Update DEVELOPING.md scanner description for portable SIMD

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 31, 2026

@strawgate Sure, triggering a full review of PR #360 now!


🧠 Learnings used
Learnt from: CR
Repo: strawgate/memagent PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-31T04:48:42.229Z
Learning: Applies to **/{scanner,format,parse}*/*.rs : Avoid per-line heap allocations in hot paths (scanner, format parser)

Learnt from: CR
Repo: strawgate/memagent PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-03-30T16:21:10.222Z
Learning: Applies to **/*.rs : Avoid per-line heap allocations in hot paths (scanner, CRI parser, OTLP encoder, compress)
✅ Actions performed

Full review triggered.

- DIRECTION.md: remove stale "NEON/AVX2/SSE2 backends in logfwd-arrow"
  reference, align with portable SIMD via wide
- PROOF_AUDIT.md: add 5 new structural.rs proofs (find_char_mask,
  structural_scalar_consistent, process_block_no_panic,
  process_block_tail_mask, in_string_exclusion), update count to 36
- wide-crate-evaluation.md: fix API names (cmp_eq → simd_eq, CmpEq
  trait not needed — simd_eq is an inherent method)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (6)
crates/logfwd-core/src/structural.rs (6)

239-240: ⚠️ Potential issue | 🟡 Minor

Replace these production unwrap()s with invariant-bearing expect()s.

These conversions only fail if an internal slice-length invariant is broken, so the panic should say which invariant failed. .unwrap() here violates the repo rule and makes field triage needlessly opaque.

🛠️ Minimal fix
-                buf[offset..offset + 64].try_into().unwrap()
+                buf[offset..offset + 64]
+                    .try_into()
+                    .expect("remaining >= 64 guarantees a 64-byte slice")
-    let c0 = u8x16::new(block[0..16].try_into().unwrap());
-    let c1 = u8x16::new(block[16..32].try_into().unwrap());
-    let c2 = u8x16::new(block[32..48].try_into().unwrap());
-    let c3 = u8x16::new(block[48..64].try_into().unwrap());
+    let c0 = u8x16::new(
+        block[0..16]
+            .try_into()
+            .expect("64-byte block splits into exact 16-byte lanes"),
+    );
+    let c1 = u8x16::new(
+        block[16..32]
+            .try_into()
+            .expect("64-byte block splits into exact 16-byte lanes"),
+    );
+    let c2 = u8x16::new(
+        block[32..48]
+            .try_into()
+            .expect("64-byte block splits into exact 16-byte lanes"),
+    );
+    let c3 = u8x16::new(
+        block[48..64]
+            .try_into()
+            .expect("64-byte block splits into exact 16-byte lanes"),
+    );

As per coding guidelines Do not use '.unwrap()' in production code paths — use '?' operator or '.expect()' with a descriptive reason.

Also applies to: 417-420

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/logfwd-core/src/structural.rs` around lines 239 - 240, Replace the
production unwrap() calls used for slice-to-array conversions with
invariant-bearing expect() calls that include a descriptive message about the
broken internal invariant (e.g., when constructing `block: [u8; 64]` from
`buf[offset..offset + 64]` and the other occurrences around lines handling
similar `try_into()` conversions); update the conversions in the `structural.rs`
functions that build fixed-size arrays so that each `.try_into().unwrap()`
becomes `.try_into().expect("internal invariant: expected slice length X for
<describe field>")`, clearly naming the expected length and the field/operation
so a panic identifies the violated invariant.

770-773: ⚠️ Potential issue | 🟡 Minor

Handle the block-end newline case in after_nl_mask.

When bit_pos == 63, 1u64 << (bit_pos + 1) shifts by 64 and this test panics instead of covering the boundary. Return 0 for after_nl_mask in that case.

🧪 Minimal fix
-                let after_nl_mask = !((1u64 << (bit_pos + 1)) - 1);
+                let after_nl_mask = if bit_pos == 63 {
+                    0
+                } else {
+                    !((1u64 << (bit_pos + 1)) - 1)
+                };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/logfwd-core/src/structural.rs` around lines 770 - 773, The computation
of after_nl_mask panics for bit_pos == 63 due to shifting by 64; update the code
that sets after_nl_mask so it returns 0 when bit_pos == 63, otherwise compute
!((1u64 << (bit_pos + 1)) - 1). Specifically change the after_nl_mask assignment
(the expression using bit_pos and 1u64 << (bit_pos + 1)) to a conditional: if
bit_pos == 63 { 0 } else { !((1u64 << (bit_pos + 1)) - 1) }, leaving the
subsequent colons_after = (processed.colon & after_nl_mask).count_ones()
unchanged.

537-539: ⚠️ Potential issue | 🟡 Minor

Rename this escaped-quote fixture so typos passes.

The current comment and raw string both contain ues, which is already failing CI.

✏️ Minimal fix
-        // {"k":"val\"ues"}  — escaped quote should not close the string
+        // {"k":"hello\"world"} — escaped quote should not close the string
         let mut buf = [b' '; 64];
-        let json = br#"{"k":"val\"ues"}"#;
+        let json = br#"{"k":"hello\"world"}"#;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/logfwd-core/src/structural.rs` around lines 537 - 539, The test
fixture uses the substring "ues" in both the comment and the raw string which is
causing typos CI to fail; update the escaped-quote fixture by renaming the
comment and the raw string value (the variable json defined as
br#"{"k":"val\"ues"}"#) to avoid the sequence "ues" (for example change
"val\"ues" to "val\"Xes" or "val_quoted") and ensure the explanatory comment
above the buffer (the line containing // {"k":"val\"ues"} ...) is updated to
match the new example.

79-89: ⚠️ Potential issue | 🟡 Minor

Finish rustdoc on the remaining public API surface.

RawBlockMasks still exposes undocumented public fields, and StreamingClassifier::new() is still a public constructor without rustdoc. As per coding guidelines All public APIs must have doc comments.

Also applies to: 142-147

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/logfwd-core/src/structural.rs` around lines 79 - 89, Public API items
lack rustdoc: add descriptive doc comments for the RawBlockMasks struct and each
of its public fields (newline, space, quote, backslash, comma, colon,
open_brace, close_brace, open_bracket, close_bracket) describing what each mask
represents and any invariants, and add rustdoc to the public constructor
StreamingClassifier::new() explaining its purpose, parameters, return value, and
any panics/constraints; update the struct-level comment on RawBlockMasks and the
impl/associated function doc for StreamingClassifier::new() so all public
surfaces have documentation.

415-425: ⚠️ Potential issue | 🟠 Major

find_structural_chars() still reloads the block ten times.

Each mask64() call rebuilds c0..c3, so this is 40 lane loads per 64-byte block unless LLVM happens to CSE them. In this hot path that's a material perf risk, and it contradicts the “Loads the 64-byte block once” doc comment; hoist the four u8x16 loads into find_structural_chars() and reuse them for every needle.

Also applies to: 427-444

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/logfwd-core/src/structural.rs` around lines 415 - 425, The mask64
function rebuilds c0..c3 on every call causing repeated loads; change mask64 to
accept the already-loaded u8x16 vectors (e.g., c0, c1, c2, c3 or a small struct)
instead of a &[u8;64], and hoist the four u8x16 loads out of
find_structural_chars() so find_structural_chars() loads the 64-byte block once
into c0..c3 and reuses them for each needle (also apply the same pattern to the
other mask64/usage sites mentioned around the 427-444 area).

323-343: ⚠️ Potential issue | 🟠 Major

Reject mismatched closers in skip_nested().

This only tracks depth, so {] and [} both pop successfully. In crates/logfwd-core/src/scanner.rs:136-145, the caller trusts that boundary, slices &buf[s..pos], and keeps parsing later fields, so a malformed nested value can desynchronize the rest of the line. Track opener kind in a small bitstack or fail fast on a mismatched closer.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/logfwd-core/src/structural.rs` around lines 323 - 343, skip_nested
currently only tracks depth so mismatched closers like `{]` are accepted; change
skip_nested(buf, pos) to track opener kinds (e.g., push b'{' or b'[' onto a
small stack/Vec<u8> when you see an opener and pop/validate on b'}' or b']'),
still honoring is_in_string(pos), and if a closer does not match the stack top
return immediately (fail fast) rather than treating it as a valid pop; update
logic in skip_nested to push on b'{'|b'[' and on b'}'|b']' check the top matches
the corresponding opener, otherwise return current pos to signal the mismatch so
the caller parsing (&buf[s..pos]) won’t be desynchronized.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/logfwd-core/src/structural.rs`:
- Around line 311-320: The current scan_string(&self, buf, pos) can return a
slice that crosses into the next NDJSON row; add a bounded variant (e.g.,
scan_string_bounded(&self, buf: &[u8], pos: usize, end: usize) -> Option<(&[u8],
usize)>) that limits the search for the closing quote to <end> and uses a
next_quote_bounded(start, end) helper (or reuse next_quote but only accept a
close <= end) to avoid returning cross-line slices; update callers (or replace
uses in scanner.rs) to call the bounded variant or explicitly reject results
where close > end before appending the slice.

---

Duplicate comments:
In `@crates/logfwd-core/src/structural.rs`:
- Around line 239-240: Replace the production unwrap() calls used for
slice-to-array conversions with invariant-bearing expect() calls that include a
descriptive message about the broken internal invariant (e.g., when constructing
`block: [u8; 64]` from `buf[offset..offset + 64]` and the other occurrences
around lines handling similar `try_into()` conversions); update the conversions
in the `structural.rs` functions that build fixed-size arrays so that each
`.try_into().unwrap()` becomes `.try_into().expect("internal invariant: expected
slice length X for <describe field>")`, clearly naming the expected length and
the field/operation so a panic identifies the violated invariant.
- Around line 770-773: The computation of after_nl_mask panics for bit_pos == 63
due to shifting by 64; update the code that sets after_nl_mask so it returns 0
when bit_pos == 63, otherwise compute !((1u64 << (bit_pos + 1)) - 1).
Specifically change the after_nl_mask assignment (the expression using bit_pos
and 1u64 << (bit_pos + 1)) to a conditional: if bit_pos == 63 { 0 } else {
!((1u64 << (bit_pos + 1)) - 1) }, leaving the subsequent colons_after =
(processed.colon & after_nl_mask).count_ones() unchanged.
- Around line 537-539: The test fixture uses the substring "ues" in both the
comment and the raw string which is causing typos CI to fail; update the
escaped-quote fixture by renaming the comment and the raw string value (the
variable json defined as br#"{"k":"val\"ues"}"#) to avoid the sequence "ues"
(for example change "val\"ues" to "val\"Xes" or "val_quoted") and ensure the
explanatory comment above the buffer (the line containing // {"k":"val\"ues"}
...) is updated to match the new example.
- Around line 79-89: Public API items lack rustdoc: add descriptive doc comments
for the RawBlockMasks struct and each of its public fields (newline, space,
quote, backslash, comma, colon, open_brace, close_brace, open_bracket,
close_bracket) describing what each mask represents and any invariants, and add
rustdoc to the public constructor StreamingClassifier::new() explaining its
purpose, parameters, return value, and any panics/constraints; update the
struct-level comment on RawBlockMasks and the impl/associated function doc for
StreamingClassifier::new() so all public surfaces have documentation.
- Around line 415-425: The mask64 function rebuilds c0..c3 on every call causing
repeated loads; change mask64 to accept the already-loaded u8x16 vectors (e.g.,
c0, c1, c2, c3 or a small struct) instead of a &[u8;64], and hoist the four
u8x16 loads out of find_structural_chars() so find_structural_chars() loads the
64-byte block once into c0..c3 and reuses them for each needle (also apply the
same pattern to the other mask64/usage sites mentioned around the 427-444 area).
- Around line 323-343: skip_nested currently only tracks depth so mismatched
closers like `{]` are accepted; change skip_nested(buf, pos) to track opener
kinds (e.g., push b'{' or b'[' onto a small stack/Vec<u8> when you see an opener
and pop/validate on b'}' or b']'), still honoring is_in_string(pos), and if a
closer does not match the stack top return immediately (fail fast) rather than
treating it as a valid pop; update logic in skip_nested to push on b'{'|b'[' and
on b'}'|b']' check the top matches the corresponding opener, otherwise return
current pos to signal the mismatch so the caller parsing (&buf[s..pos]) won’t be
desynchronized.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: e532502a-d9e1-46bf-b5e5-041a3ec74d82

📥 Commits

Reviewing files that changed from the base of the PR and between 77fb6be and a6ba1c3.

📒 Files selected for processing (1)
  • crates/logfwd-core/src/structural.rs

Comment on lines +311 to +320
/// Scan a JSON string starting at `pos` (pointing to opening `"`).
#[inline(always)]
pub fn scan_string<'a>(&self, buf: &'a [u8], pos: usize) -> Option<(&'a [u8], usize)> {
debug_assert!(pos < buf.len() && buf[pos] == b'"');
let start = pos + 1;
if let Some(close) = self.next_quote(start) {
Some((&buf[start..close], close + 1))
} else {
None
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Add a line-bounded string scan path.

Line 316 searches for the next quote anywhere in buf. On an unterminated row, scan_string() can close on the next NDJSON row's first " and return a cross-line slice before the caller notices; in crates/logfwd-core/src/scanner.rs:123-139, the value branch can append that slice before breaking. Please add a bounded variant here, or have the callers reject after > end before using the slice.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/logfwd-core/src/structural.rs` around lines 311 - 320, The current
scan_string(&self, buf, pos) can return a slice that crosses into the next
NDJSON row; add a bounded variant (e.g., scan_string_bounded(&self, buf: &[u8],
pos: usize, end: usize) -> Option<(&[u8], usize)>) that limits the search for
the closing quote to <end> and uses a next_quote_bounded(start, end) helper (or
reuse next_quote but only accept a close <= end) to avoid returning cross-line
slices; update callers (or replace uses in scanner.rs) to call the bounded
variant or explicitly reject results where close > end before appending the
slice.

strawgate and others added 2 commits March 31, 2026 01:11
- Replace unwrap() with expect() in find_structural_chars and
  StructuralIndex::new (production paths, safe by construction
  but expect() states the invariant)
- Restore format parser and compress to issue-worker hot-path list

Intentionally not addressed (same behavior as old ChunkIndex,
separate issues):
- scan_string cross-line safety on unterminated strings
- skip_nested delimiter kind tracking ({] mismatch)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (8)
crates/logfwd-core/src/structural.rs (6)

78-90: 🧹 Nitpick | 🔵 Trivial

Add doc comments to RawBlockMasks public fields.

The struct has a doc comment but the 10 public fields don't. Each field should briefly document what character it detects.

As per coding guidelines All public APIs must have doc comments.

,

Example
 pub struct RawBlockMasks {
+    /// Bitmask of newline (`\n`) positions.
     pub newline: u64,
+    /// Bitmask of space (` `) positions.
     pub space: u64,
     // ... etc
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/logfwd-core/src/structural.rs` around lines 78 - 90, The public struct
RawBlockMasks exposes ten public bitmask fields without documentation; add
concise doc comments to each field (newline, space, quote, backslash, comma,
colon, open_brace, close_brace, open_bracket, close_bracket) describing which
character each mask detects (e.g., "Mask for newline characters", "Mask for
space characters", etc.) so that the RawBlockMasks public API complies with the
project's doc-comment requirement.

141-147: 🧹 Nitpick | 🔵 Trivial

Add doc comment to StreamingClassifier::new().

The constructor is public but undocumented.

As per coding guidelines All public APIs must have doc comments.

,

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/logfwd-core/src/structural.rs` around lines 141 - 147, Add a public
doc comment for the constructor StreamingClassifier::new() describing what a
StreamingClassifier represents/does and what the returned instance's initial
fields (prev_odd_backslash and prev_in_string) mean; place the doc comment
directly above the impl or the fn signature for StreamingClassifier::new so it
appears in generated docs and follows crate coding guidelines for public APIs.

428-432: ⚠️ Potential issue | 🟡 Minor

Replace .unwrap() with .expect() on SIMD lane conversions.

These conversions are infallible (fixed 16-byte slices from a 64-byte array), but use .expect("16-byte slice") for consistency with repo rules.

As per coding guidelines Do not use '.unwrap()' in production code paths.

,

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/logfwd-core/src/structural.rs` around lines 428 - 432, In
find_structural_chars replace the four infallible .try_into().unwrap() calls
used for u8x16::new (c0, c1, c2, c3) with .try_into().expect("16-byte slice") to
follow repo rules against unwrap; update each u8x16::new(block[...]
.try_into().expect("16-byte slice")) so the conversion remains unchanged but
uses expect with the message "16-byte slice".

243-244: ⚠️ Potential issue | 🟡 Minor

Replace .unwrap() with .expect() documenting the invariant.

The try_into().unwrap() is safe here (guarded by remaining >= 64), but should use .expect("slice is exactly 64 bytes") per repo rules.

As per coding guidelines Do not use '.unwrap()' in production code paths — use '?' operator or '.expect()' with a descriptive reason.

,

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/logfwd-core/src/structural.rs` around lines 243 - 244, Replace the
unsafe .unwrap() on the slice conversion in the block construction with
.expect(...) and document the invariant; specifically, in the code that sets let
block: [u8; 64] inside the branch guarded by remaining >= 64 (in structural.rs),
change the buf[offset..offset + 64].try_into().unwrap() call to use
.expect("slice is exactly 64 bytes") so the reason for safety is recorded
alongside the conversion.

315-325: ⚠️ Potential issue | 🟠 Major

scan_string can cross NDJSON line boundaries.

As flagged previously, an unterminated string will find the next quote in a subsequent line. Consider a bounded variant or have callers validate close <= line_end.

,

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/logfwd-core/src/structural.rs` around lines 315 - 325, scan_string
currently calls next_quote from start and may return a closing quote found in a
later NDJSON line, allowing strings to span lines; fix by bounding the search to
the current line or adding a bounded variant: change scan_string or add
scan_string_bounded(&self, buf: &[u8], pos: usize, line_end: usize) that calls
next_quote(start, line_end) (or use an existing next_quote_with_limit) and only
returns Some if close <= line_end, otherwise return None; alternatively update
callers of scan_string to validate the returned close index against their
line_end before accepting the slice (reference: scan_string, next_quote).

329-357: ⚠️ Potential issue | 🟠 Major

skip_nested doesn't validate delimiter matching.

As flagged previously, {] or [} pairs will decrement depth without checking opener kind. On malformed input this can return at the wrong boundary. Consider tracking opener kind or failing fast on mismatch.

,

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/logfwd-core/src/structural.rs` around lines 329 - 357, The skip_nested
function currently only tracks nesting depth and so treats any closing '}' or
']' as matching any opener; update skip_nested to track opener kinds (e.g., push
the byte b'{' or b'[' onto a small stack or last-opener variable) when you see
an opener in the match arm for b'{' | b'[' (and use is_in_string(pos) as
currently done), and when you encounter a closing b'}' or b']' verify it matches
the most recent opener kind (using the stored opener or stack) — on mismatch
return a failure boundary (e.g., len or a sentinel) instead of decrementing
depth, and keep the existing string handling via scan_string(buf, pos) and
is_in_string(pos).
dev-docs/PROOF_AUDIT.md (1)

6-24: ⚠️ Potential issue | 🟡 Minor

Documented proof names still don't match actual Kani proofs in structural.rs.

The section header says "7 proofs" but verify_prefix_xor (lines 8-13) and verify_compute_real_quotes (lines 15-24) don't exist in the codebase. The actual proofs in structural.rs are:

  • verify_find_char_mask_correct
  • verify_structural_scalar_consistent
  • verify_process_block_no_panic
  • verify_process_block_tail_mask
  • verify_in_string_exclusion

That's 5 proofs, not 7. Either add the missing proofs or update the doc to reflect reality.

,

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dev-docs/PROOF_AUDIT.md` around lines 6 - 24, The PROOF_AUDIT.md section for
structural.rs is out of sync: it claims "7 proofs" and lists non-existent proofs
verify_prefix_xor and verify_compute_real_quotes; instead update the document to
list the actual Kani proof functions found in structural.rs
(verify_find_char_mask_correct, verify_structural_scalar_consistent,
verify_process_block_no_panic, verify_process_block_tail_mask,
verify_in_string_exclusion) and correct the proof count to 5, or alternatively
add the missing Kani proofs to the codebase if the doc is intended to reflect
additional guarantees—ensure the header, proof names, and any "Doesn't
prove"/"Gap" notes match the actual functions and their behaviors.
.github/agents/issue-worker.agent.md (1)

32-40: ⚠️ Potential issue | 🟡 Minor

Checklist still missing repo-wide constraints.

The "no unsafe code" rule isn't listed, and the hot-path allocation scope (line 37) omits format parser and compress. Since this is the agent's working checklist, it should match the canonical rules.

As per coding guidelines Do not use unsafe code. Based on learnings: Avoid per-line heap allocations in hot paths (scanner, CRI parser, OTLP encoder, compress).

,

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/agents/issue-worker.agent.md around lines 32 - 40, Update the Code
Quality checklist under the "Code Quality" section to include the
repository-wide rule "Do not use unsafe code" and expand the hot-path allocation
item to explicitly list "scanner, CRI parser, OTLP encoder, format parser,
compress" so the agent's working checklist matches the canonical rules; ensure
the wording mirrors the existing checklist style (e.g., "No unsafe code" and
"Avoid per-line heap allocations in hot paths (scanner, CRI parser, OTLP
encoder, format parser, compress)").
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/copilot-instructions.md:
- Line 7: Update the single-line language note to reflect that the scanner no
longer uses unsafe and that logfwd-core is now marked #![forbid(unsafe_code)]:
replace the old phrase "Rust (stable toolchain, unsafe only in SIMD layer)" with
something like "Rust (stable toolchain; scanner contains zero unsafe;
logfwd-core is #![forbid(unsafe_code)]; SIMD functionality provided safely via
the wide crate)" and reference the Scanner, logfwd-core, dev-docs/PHASES.md and
wide crate in the updated note so readers know where this policy is enforced.

In `@AGENTS.md`:
- Around line 74-75: The hot-path guidance is narrower than repo learnings;
update the list to include all hot-path components that must avoid per-line heap
allocations by expanding the parenthetical after the rule to include scanner,
builders, OTLP encoder, CRI parser, format parser, and compress; locate the rule
text that currently lists "(scanner, builders, OTLP encoder)" and replace it
with the full set so the guidance matches the documented learnings.

In `@dev-docs/research/wide-crate-evaluation.md`:
- Around line 177-196: Add explicit language specifiers to the two unlabeled
code blocks in the "Before" and "After" sections by changing their
triple-backtick fences to include a language token (use `text` or `plaintext`)
so the file listings for structural.rs and chunk_classify.rs are treated as
plain text; update the blocks that list symbols like mod simd_neon, mod
simd_avx2, mod simd_sse2, fn find_structural_chars, mod aarch64_impl, mod x86,
and fn find_quotes_and_backslashes (and the "After" block listing fn mask64 and
fn find_structural_chars) to start with ```text (or ```plaintext) instead of
just ```.

---

Duplicate comments:
In @.github/agents/issue-worker.agent.md:
- Around line 32-40: Update the Code Quality checklist under the "Code Quality"
section to include the repository-wide rule "Do not use unsafe code" and expand
the hot-path allocation item to explicitly list "scanner, CRI parser, OTLP
encoder, format parser, compress" so the agent's working checklist matches the
canonical rules; ensure the wording mirrors the existing checklist style (e.g.,
"No unsafe code" and "Avoid per-line heap allocations in hot paths (scanner, CRI
parser, OTLP encoder, format parser, compress)").

In `@crates/logfwd-core/src/structural.rs`:
- Around line 78-90: The public struct RawBlockMasks exposes ten public bitmask
fields without documentation; add concise doc comments to each field (newline,
space, quote, backslash, comma, colon, open_brace, close_brace, open_bracket,
close_bracket) describing which character each mask detects (e.g., "Mask for
newline characters", "Mask for space characters", etc.) so that the
RawBlockMasks public API complies with the project's doc-comment requirement.
- Around line 141-147: Add a public doc comment for the constructor
StreamingClassifier::new() describing what a StreamingClassifier represents/does
and what the returned instance's initial fields (prev_odd_backslash and
prev_in_string) mean; place the doc comment directly above the impl or the fn
signature for StreamingClassifier::new so it appears in generated docs and
follows crate coding guidelines for public APIs.
- Around line 428-432: In find_structural_chars replace the four infallible
.try_into().unwrap() calls used for u8x16::new (c0, c1, c2, c3) with
.try_into().expect("16-byte slice") to follow repo rules against unwrap; update
each u8x16::new(block[...] .try_into().expect("16-byte slice")) so the
conversion remains unchanged but uses expect with the message "16-byte slice".
- Around line 243-244: Replace the unsafe .unwrap() on the slice conversion in
the block construction with .expect(...) and document the invariant;
specifically, in the code that sets let block: [u8; 64] inside the branch
guarded by remaining >= 64 (in structural.rs), change the buf[offset..offset +
64].try_into().unwrap() call to use .expect("slice is exactly 64 bytes") so the
reason for safety is recorded alongside the conversion.
- Around line 315-325: scan_string currently calls next_quote from start and may
return a closing quote found in a later NDJSON line, allowing strings to span
lines; fix by bounding the search to the current line or adding a bounded
variant: change scan_string or add scan_string_bounded(&self, buf: &[u8], pos:
usize, line_end: usize) that calls next_quote(start, line_end) (or use an
existing next_quote_with_limit) and only returns Some if close <= line_end,
otherwise return None; alternatively update callers of scan_string to validate
the returned close index against their line_end before accepting the slice
(reference: scan_string, next_quote).
- Around line 329-357: The skip_nested function currently only tracks nesting
depth and so treats any closing '}' or ']' as matching any opener; update
skip_nested to track opener kinds (e.g., push the byte b'{' or b'[' onto a small
stack or last-opener variable) when you see an opener in the match arm for b'{'
| b'[' (and use is_in_string(pos) as currently done), and when you encounter a
closing b'}' or b']' verify it matches the most recent opener kind (using the
stored opener or stack) — on mismatch return a failure boundary (e.g., len or a
sentinel) instead of decrementing depth, and keep the existing string handling
via scan_string(buf, pos) and is_in_string(pos).

In `@dev-docs/PROOF_AUDIT.md`:
- Around line 6-24: The PROOF_AUDIT.md section for structural.rs is out of sync:
it claims "7 proofs" and lists non-existent proofs verify_prefix_xor and
verify_compute_real_quotes; instead update the document to list the actual Kani
proof functions found in structural.rs (verify_find_char_mask_correct,
verify_structural_scalar_consistent, verify_process_block_no_panic,
verify_process_block_tail_mask, verify_in_string_exclusion) and correct the
proof count to 5, or alternatively add the missing Kani proofs to the codebase
if the doc is intended to reflect additional guarantees—ensure the header, proof
names, and any "Doesn't prove"/"Gap" notes match the actual functions and their
behaviors.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 047dd1de-b5e5-4286-bee9-cfc93268d236

📥 Commits

Reviewing files that changed from the base of the PR and between fb87b82 and 3270f21.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (29)
  • .github/agents/issue-worker.agent.md
  • .github/copilot-instructions.md
  • .github/workflows/bench.yml
  • AGENTS.md
  • CLAUDE.md
  • CODE_STYLE.md
  • CONTRIBUTING.md
  • DEVELOPING.md
  • crates/logfwd-core/Cargo.toml
  • crates/logfwd-core/benches/structural_detect.rs
  • crates/logfwd-core/fuzz/Cargo.toml
  • crates/logfwd-core/fuzz/fuzz_targets/structural_index.rs
  • crates/logfwd-core/src/chunk_classify.rs
  • crates/logfwd-core/src/lib.rs
  • crates/logfwd-core/src/scanner.rs
  • crates/logfwd-core/src/structural.rs
  • crates/logfwd-core/tests/vectorization_canary.rs
  • dev-docs/DECISIONS.md
  • dev-docs/DIRECTION.md
  • dev-docs/PHASES.md
  • dev-docs/PROOF_AUDIT.md
  • dev-docs/ZERO_COPY_PIPELINE.md
  • dev-docs/research/arrow-ipc-feasibility.md
  • dev-docs/research/wide-crate-evaluation.md
  • docs/ARCHITECTURE.md
  • references/compass_artifact_wf-2dfe88a4-5edb-4626-bb0f-86e075888b3a_text_markdown.md
  • references/compass_artifact_wf-8cf54a98-da0f-43c0-a58d-004071580569_text_markdown.md
  • references/compass_artifact_wf-b72cb9f9-d224-4cf4-8352-9f4acf1d2f3c_text_markdown.md
  • references/compass_artifact_wf-c5800b72-6964-4704-aa02-05d6ca9b3c55_text_markdown.md
💤 Files with no reviewable changes (7)
  • docs/ARCHITECTURE.md
  • references/compass_artifact_wf-c5800b72-6964-4704-aa02-05d6ca9b3c55_text_markdown.md
  • references/compass_artifact_wf-2dfe88a4-5edb-4626-bb0f-86e075888b3a_text_markdown.md
  • references/compass_artifact_wf-8cf54a98-da0f-43c0-a58d-004071580569_text_markdown.md
  • crates/logfwd-core/benches/structural_detect.rs
  • references/compass_artifact_wf-b72cb9f9-d224-4cf4-8352-9f4acf1d2f3c_text_markdown.md
  • crates/logfwd-core/src/chunk_classify.rs

Comment thread .github/copilot-instructions.md Outdated
Comment thread AGENTS.md Outdated
Comment thread dev-docs/research/wide-crate-evaluation.md
- copilot-instructions: "no hand-written unsafe — SIMD via wide crate"
- AGENTS.md: expand hot-path list to include format parser, CRI
  parser, builders, compress
- wide-crate-evaluation.md: add language specifiers to code blocks

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@strawgate strawgate merged commit 4f0a828 into master Mar 31, 2026
9 of 11 checks passed
@strawgate strawgate deleted the feat/structural-index-continued branch March 31, 2026 06:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant