Skip to content

fix: increment parse_errors for invalid JSON lines in passthrough JSON mode#755

Merged
strawgate merged 2 commits into
masterfrom
copilot/fix-invalid-json-handling
Apr 3, 2026
Merged

fix: increment parse_errors for invalid JSON lines in passthrough JSON mode#755
strawgate merged 2 commits into
masterfrom
copilot/fix-invalid-json-handling

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Apr 2, 2026

Invalid JSON lines (e.g. not json at all) in Format::Json mode silently became all-null records — the scanner produces an empty row when a line doesn't start with {, and nothing incremented parse_errors.

Changes

FormatProcessor redesign (logfwd-io/src/format.rs)

  • Split the unit Passthrough variant into two struct variants:
    • Passthrough { stats } — raw format, forwards verbatim, never counts errors
    • PassthroughJson { stats } — JSON format, forwards verbatim and increments parse_errors for any non-empty line whose first non-whitespace byte is not {
  • Added passthrough(stats) and passthrough_json(stats) constructors (consistent with cri()/auto() pattern)
  • Added count_json_parse_errors() helper that scans a chunk line-by-line without copying

Pipeline wiring (logfwd/src/pipeline.rs)

  • Format::Jsonpassthrough_json(stats) (counts errors)
  • Format::Rawpassthrough(stats) (no validation — non-JSON is expected)

Call sites updated (framed.rs, allocation_churn.rs) to use passthrough(stats) constructor.

Behavior

Lines are still forwarded to the scanner unchanged (option b from the issue spec). The parse_errors counter now increments per invalid line, making data-quality issues visible in diagnostics without dropping any data.

# Before: parse_errors stays 0, record emitted with all fields null
# After:  parse_errors = 1, same record emitted (data preserved, error surfaced)
echo 'not json at all' | logfwd ...

@strawgate strawgate marked this pull request as ready for review April 3, 2026 00:34
Invalid JSON lines silently became all-null records because
FormatProcessor::Passthrough had no access to ComponentStats and
performed no validation.

- Split Passthrough into Passthrough (raw, no validation) and
  PassthroughJson (counts parse_errors for non-JSON-object lines)
- passthrough_json() increments parse_errors for any non-empty line
  that doesn't start with '{' (after leading whitespace)
- pipeline.rs: Format::Json uses passthrough_json, Format::Raw uses passthrough
- Lines are still forwarded to the scanner (behavior is option (b) from
  the issue: forwarded as raw event AND parse_errors incremented)
- All 143 logfwd-io tests pass; clippy and rustfmt clean

Agent-Logs-Url: https://github.com/strawgate/memagent/sessions/0604966e-3556-4cc7-b3c7-0eab67b7ceaa

Co-authored-by: strawgate <6384545+strawgate@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix invalid JSON lines handling in logfwd fix: increment parse_errors for invalid JSON lines in passthrough JSON mode Apr 3, 2026
Copilot AI requested a review from strawgate April 3, 2026 00:55
@strawgate
Copy link
Copy Markdown
Owner

@coderabbitai please do a full review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 3, 2026

@strawgate Sure, kicking off a full review now!


🧠 Learnings used
Learnt from: CR
Repo: strawgate/memagent PR: 0
File: coderabbit-custom-pre-merge-checks-unique-id-file-non-traceable-F7F2B60C-1728-4C9A-8889-4F2235E186CA.txt:0-0
Timestamp: 2026-04-03T01:38:58.608Z
Learning: Applies to crates/logfwd-core/src/**/*.rs : New async pipeline code must have proptest coverage for: arbitrary event ordering; acks out of order; drain while in-flight

Learnt from: CR
Repo: strawgate/memagent PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-02T07:03:19.858Z
Learning: Applies to **/{scanner,*parser*,builder*,encoder*}.rs : Avoid per-line heap allocations in hot paths: scanner.rs, format parser, CRI parser, builders, OTLP encoder, and compression code
✅ Actions performed

Full review triggered.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 3, 2026

Caution

Review failed

Pull request was closed or merged during review

Walkthrough

The FormatProcessor enum was refactored to carry Arc<ComponentStats> in both Passthrough and new PassthroughJson variants. Two constructors were added: passthrough() and passthrough_json(). A helper function count_json_parse_errors() scans lines and increments error stats for lines not matching JSON object format. The make_format function now differentiates between JSON and Raw formats by selecting distinct passthrough processors. Test usages were updated to call the new constructors instead of using the variant directly.

Possibly related PRs


Caution

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

  • Ignore

❌ Failed checks (1 error, 2 warnings)

Check name Status Explanation Resolution
High-Quality Rust Practices ❌ Error Logic bug in count_json_parse_errors: whitespace-only lines incorrectly increment parse_errors when first_nonws is None, violating documented behavior to ignore empty/whitespace-only lines. Restructure condition to if let Some(p) = line.iter().position(...) { if line[p] != b'{' { parse_errors += 1 } } and add regression tests for whitespace-only inputs.
Documentation Thoroughly Updated ⚠️ Warning PR adds public API functions with doc comments but documentation updates incomplete: CONFIG_REFERENCE.md missing json format parse_errors behavior, DESIGN.md lacks ADR for PassthroughJson split, DEVELOPING.md lacks whitespace edge case documentation, critical bug in count_json_parse_errors incorrectly counts whitespace-only lines as parse errors. Fix count_json_parse_errors bug, update CONFIG_REFERENCE.md for parse_errors behavior, add ADR to DESIGN.md, document whitespace edge case in DEVELOPING.md, add regression test.
Maintainer Fitness ⚠️ Warning PR modifies hot-path format processing without benchmark results and has unaddressed code review feedback identifying a whitespace-only line handling bug. Fix count_json_parse_errors to properly handle whitespace-only lines, run just bench before/after, and update PR description with risk surface and test coverage details.
✅ Passed checks (2 passed)
Check name Status Explanation
Formal Verification Coverage ✅ Passed PR contains no logfwd-core modifications; new public functions in logfwd-io are trivial constructors exempt from verification requirements.
Crate Boundary And Dependency Integrity ✅ Passed PR maintains crate boundary integrity with no new crates, no unsafe code, and correct dependency hierarchy preserved.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@strawgate strawgate merged commit 26bf01e into master Apr 3, 2026
12 of 13 checks passed
@strawgate strawgate deleted the copilot/fix-invalid-json-handling branch April 3, 2026 01:40
strawgate added a commit that referenced this pull request Apr 3, 2026
- verify_encode_tag, verify_bytes_field_size: add 2 kani::cover!() guards
  each (CodeRabbit requirement: proofs using assume() must have ≥2 cover!
  guards to confirm the constrained space is non-vacuous)
- framed.rs tests: FormatProcessor::Passthrough became a struct variant in
  #755 but 3 EOF-flush tests were not updated; use passthrough(stats.clone())

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
strawgate added a commit that referenced this pull request Apr 3, 2026
…dr from start()

Two test failures on macOS CI, both introduced by switching to nextest (#761):

1. diagnostics.rs — free_port() + TEST_LOCK failed because nextest runs each
   test as its own process: the static AtomicU16 resets to 19100 per process,
   so concurrent tests pick the same starting port → 'Address already in use'.
   Fix: DiagnosticsServer::start() now returns (JoinHandle<()>, SocketAddr).
   Tests use "127.0.0.1:0" and read the OS-assigned port from the result.
   Removes free_port(), TEST_LOCK, and all per-test port management.

2. framed.rs — FormatProcessor::Passthrough became a struct variant in #755
   but 3 EOF-flush tests were not updated → test compile failure on master.
   Fix: use FormatProcessor::passthrough(stats.clone()).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: invalid JSON lines silently become all-null records — parse_errors counter stays 0

2 participants