feat: file tailing audit + TLA+ spec + Kani checkpoint proofs #802
Comprehensive audit of logfwd's file reading path compared with three production collectors. Documents 7 findings with priority fixes.

Critical: shared remainder buffer (#797), truncation event (#796)
High: unbounded read (#800), glob dedup (#799), fingerprint (#798)
Medium: read fairness (#801), partial flush timer

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Caution: Review failed. Pull request was closed or merged during review.

Walkthrough

Added a new end-to-end audit document (dev-docs/research/file-tailing-audit.md) describing logfwd's file-tailing pipeline and decision points for discovery/identity, disk reads, truncation/rotation, checkpoint identity, remainder buffering/line splitting, glob rescans/deduplication, batching, and channel backpressure. The audit contrasts current behavior with Vector, OpenTelemetry Collector (

Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@dev-docs/research/file-tailing-audit.md`:
- Line 68: The document's "### 6. No partial line flush timer" heading (and the
subsequent "Findings 6 and 7") lack references to issues in the PR objectives
table; either add the corresponding issue links/IDs next to those findings in
the markdown or explicitly note that no issue was filed and why; update the
“Findings 6 and 7” entries to include the PR objectives issue table reference
(or a clear statement that they are intentionally untracked) so reviewers can
find or verify the associated issue(s).
- Around line 115-116: Update the priority list entry that reads "Per-file read
fairness budget — MEDIUM" to append the issue reference so it becomes "Per-file
read fairness budget — MEDIUM (see issue `#801`)"; locate the exact list item text
in dev-docs/research/file-tailing-audit.md and modify that line accordingly to
include the "#801" reference.
- Line 86: Update the heading "8. No per-file read fairness" to include the
issue reference `#801` (e.g., change to "8. No per-file read fairness (see issue
`#801`)"), so the finding clearly links to that issue; locate the heading text "8.
No per-file read fairness" in dev-docs/research/file-tailing-audit.md and modify
it to append the issue number as shown.
- Line 35: Update the heading "Unbounded read in read_new_data" to include the
issue reference `#800` (e.g., "Unbounded read in read_new_data — issue `#800`") so
the finding is linked to the tracked issue; search for the exact heading text
"Unbounded read in read_new_data" in the document and modify it to append "issue
`#800`" or a similar clear reference to issue 800.
- Line 112: Update the line that currently reads "3. Bound read_new_data — HIGH,
new issue needed" to reflect that an issue already exists by changing it to
something like "3. Bound read_new_data — HIGH, tracked in issue `#800`
(read_new_data allocates unbounded Vec — OOM on large files)"; locate the entry
by the unique phrase "Bound read_new_data" and replace the "new issue needed"
status with the reference to issue `#800` and a brief descriptor.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: ASSERTIVE
Plan: Pro
Run ID: 7ef42886-d8b8-4908-8881-4c1897f9c1f6
📒 Files selected for processing (1)
dev-docs/research/file-tailing-audit.md
**Fix**: Emit `TailEvent::Truncated` when truncation detected.

### 3. Unbounded read in read_new_data
Add issue reference #800 to finding 3 heading.
PR objectives indicate issue #800 covers "read_new_data allocates unbounded Vec — OOM on large files," which matches this finding.
📝 Suggested fix
-### 3. Unbounded read in read_new_data
+### 3. Unbounded read in read_new_data (`#800`)
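The failure mode behind finding 3 (an unbounded `Vec` allocated while catching up on a large file) can be avoided by capping each read. The following is a minimal sketch, not logfwd's actual `read_new_data`: the helper name `read_bounded` and the constant `MAX_READ_BYTES` are hypothetical, and the cap value is an arbitrary assumption.

```rust
use std::io::{self, Read};

/// Hypothetical cap on bytes read per call; not a value taken from logfwd.
const MAX_READ_BYTES: u64 = 256 * 1024;

/// Read at most `limit` bytes from `reader` into a fresh buffer.
/// An empty Vec means no new data was available.
fn read_bounded<R: Read>(reader: &mut R, limit: u64) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    // `take` stops after `limit` bytes, so `read_to_end` cannot grow
    // `buf` beyond the cap no matter how large the file is.
    reader.by_ref().take(limit).read_to_end(&mut buf)?;
    Ok(buf)
}

fn main() -> io::Result<()> {
    // Stand-in for a large log file.
    let mut file = io::Cursor::new(vec![b'x'; 1_000_000]);
    let chunk = read_bounded(&mut file, MAX_READ_BYTES)?;
    println!("read {} bytes, position now {}", chunk.len(), file.position());
    Ok(())
}
```

The caller keeps invoking the helper on later poll cycles until it returns an empty buffer, so memory per read stays bounded while the file is still drained eventually.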
OTel Collector flushes with a 5s timeout. Fluent Bit and logfwd both
discard partial lines on shutdown.

### 8. No per-file read fairness
Add issue reference #801 to finding 8 heading.
PR objectives indicate issue #801 covers "No per-file read fairness budget," which matches this finding.
📝 Suggested fix
-### 8. No per-file read fairness
+### 8. No per-file read fairness (`#801`)
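A per-file read fairness budget, as named in finding 8, can be illustrated with a round-robin loop that takes a bounded slice from each source per pass. This is a sketch under assumptions: `read_round_robin` and `PER_FILE_BUDGET` are hypothetical names, the budget is tiny for demonstration, and in-memory cursors stand in for tailed files.

```rust
use std::io::{self, Cursor, Read};

/// Hypothetical per-file budget per pass; not a logfwd setting.
const PER_FILE_BUDGET: usize = 4;

/// Visit every source in turn, taking at most `budget` bytes from any one
/// source per pass, so a busy file cannot starve quiet ones.
/// Returns (source index, bytes) chunks in the order they were read.
fn read_round_robin<R: Read>(
    sources: &mut [R],
    budget: usize,
) -> io::Result<Vec<(usize, Vec<u8>)>> {
    let mut out = Vec::new();
    let mut done = vec![false; sources.len()];
    while done.iter().any(|d| !d) {
        for (i, src) in sources.iter_mut().enumerate() {
            if done[i] {
                continue;
            }
            let mut buf = vec![0u8; budget];
            let n = src.read(&mut buf)?;
            if n == 0 {
                // Nothing more to read; this sketch stops polling the source.
                done[i] = true;
                continue;
            }
            buf.truncate(n);
            out.push((i, buf));
        }
    }
    Ok(out)
}

fn main() -> io::Result<()> {
    let mut files = vec![Cursor::new(&b"aaaaaaaaaa"[..]), Cursor::new(&b"bb"[..])];
    for (idx, chunk) in read_round_robin(&mut files, PER_FILE_BUDGET)? {
        println!("file {idx}: {} bytes", chunk.len());
    }
    Ok(())
}
```

With a budget in place, the small second file is served after the first budget-sized slice of the large one instead of waiting for the large file to be fully drained.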
Replaces comparison table with:
- ASCII data flow diagram showing all 8 decision points
- Per-decision options with pros/cons and industry context
- Recommendations for each
- Priority-ordered fix table

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…audit

Address CodeRabbit review feedback on PR #802: explicitly mark the partial line flush timer and partial line on shutdown sections as having no filed issue (future enhancements, not bugs).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
End-to-end audit of logfwd's file and network input paths, formal verification of the checkpoint design, and Kani-proven checkpoint state machine.
Audits
TLA+ Specification
`dev-docs/tla/FileCheckpoint.tla`: 12 actions, per-source-identity checkpoints.

Two design bugs found by TLC before any code was written:
- `read_offset - framer_buf.len()`

Safety: 10,939 states, 8 invariants + CheckpointMonotonicity, ALL PASS
Liveness: 288 states, Progress + EventualEmission, ALL PASS
Kani Proofs
`crates/logfwd-core/src/checkpoint_tracker.rs`: pure state machine for checkpoint-remainder coordination.

5 Kani harnesses proving: offset ordering invariants, checkpoint monotonicity, no data loss on crash, processed advances only on newline, overflow safety. 15 unit tests.
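To make the listed properties concrete, here is a minimal sketch of a checkpoint tracker in which the checkpoint never runs past the last complete line, so bytes still sitting in the remainder buffer are re-read after a crash. It is an illustration only: the struct, field, and method names are hypothetical and are not the contents of `checkpoint_tracker.rs`.

```rust
/// Hypothetical tracker; names are illustrative, not logfwd's API.
#[derive(Debug, Default)]
struct CheckpointTracker {
    read_offset: u64,      // bytes read from the file so far
    processed_offset: u64, // end of the last complete (newline-terminated) line
    checkpoint: u64,       // last committed checkpoint
}

impl CheckpointTracker {
    /// Record a chunk read from the file. `processed_offset` only advances
    /// to just past the last newline in the chunk; trailing bytes stay in
    /// the remainder. Returns None if the offset would overflow u64.
    fn on_read(&mut self, chunk: &[u8]) -> Option<()> {
        let start = self.read_offset;
        let end = start.checked_add(chunk.len() as u64)?;
        if let Some(pos) = chunk.iter().rposition(|&b| b == b'\n') {
            self.processed_offset = start + pos as u64 + 1;
        }
        self.read_offset = end;
        Some(())
    }

    /// Bytes read but not yet terminated by a newline.
    fn remainder_len(&self) -> u64 {
        self.read_offset - self.processed_offset
    }

    /// Commit a checkpoint. It never moves backwards and never exceeds
    /// `processed_offset`, i.e. `read_offset - remainder_len()`.
    fn commit(&mut self) -> u64 {
        self.checkpoint = self.checkpoint.max(self.processed_offset);
        self.checkpoint
    }
}

fn main() {
    let mut t = CheckpointTracker::default();
    t.on_read(b"ab\ncd").expect("no overflow");
    println!("remainder = {}, checkpoint = {}", t.remainder_len(), t.commit());
}
```

In this sketch the ordering `checkpoint <= processed_offset <= read_offset` holds after every call, which is the kind of invariant a Kani harness can check over arbitrary inputs.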
Verification Plans
- `checkpoint-kani-plan.md`: Kani proof design
- `checkpoint-proptest-plan.md`: proptest state machine with crash injection
- `checkpoint-tla-plan.md`: TLA+ design doc
- `per-source-remainder-design.md`: implementation plan for per-source remainder

Refs #806, #796, #797, #803
Test plan
- `cargo test -p logfwd-core --lib checkpoint_tracker`: 15 tests pass
- `cargo clippy -p logfwd-core -- -D warnings`: clean

🤖 Generated with Claude Code