feat: file tailing audit + TLA+ spec + Kani checkpoint proofs by strawgate · Pull Request #802 · strawgate/fastforward

strawgate · 2026-04-03T04:53:20Z

Summary

End-to-end audit of logfwd's file and network input paths, formal verification of the checkpoint design, and Kani-proven checkpoint state machine.

Audits

File tailing audit — data flow diagram with options analysis for 8 decision points
Network input audit — TCP cross-connection corruption found, UDP verified correct
8 bugs filed (bug: TailEvent::Truncated never emitted — copytruncate corrupts remainder #796-enhancement: per-file read fairness budget #801, bug: TCP input merges all connections into one stream — cross-connection data corruption #803-bug: TCP client disconnect loses partial line — no EndOfFile emitted #804), tracked by epic epic: File tailing correctness — per-source remainder + checkpoint safety #806

TLA+ Specification

dev-docs/tla/FileCheckpoint.tla — 12 actions, per-source-identity checkpoints.

Two design bugs found by TLC before any code was written:

Single-checkpoint model can't handle rotation during in-flight batches → fixed with per-source checkpoints keyed by file identity
Checkpoint = read_offset skips buffered lines → fixed with read_offset - framer_buf.len()

Safety: 10,939 states, 8 invariants + CheckpointMonotonicity — ALL PASS
Liveness: 288 states, Progress + EventualEmission — ALL PASS

Kani Proofs

crates/logfwd-core/src/checkpoint_tracker.rs — pure state machine for checkpoint-remainder coordination.

5 Kani harnesses proving: offset ordering invariants, checkpoint monotonicity, no data loss on crash, processed advances only on newline, overflow safety. 15 unit tests.

Verification Plans

checkpoint-kani-plan.md — Kani proof design
checkpoint-proptest-plan.md — proptest state machine with crash injection
checkpoint-tla-plan.md — TLA+ design doc
per-source-remainder-design.md — implementation plan for per-source remainder

Refs #806, #796, #797, #803

Test plan

cargo test -p logfwd-core --lib checkpoint_tracker — 15 tests pass
cargo clippy -p logfwd-core -- -D warnings — clean
TLC safety model — 10,939 states, no errors
TLC liveness model — 288 states, no errors
Kani proofs (requires nightly toolchain)
Proptest state machine (in progress)

🤖 Generated with Claude Code

Comprehensive audit of logfwd's file reading path compared with three production collectors. Documents 7 findings with priority fixes. Critical: shared remainder buffer (#797), truncation event (#796) High: unbounded read (#800), glob dedup (#799), fingerprint (#798) Medium: read fairness (#801), partial flush timer Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-04-03T04:53:33Z

Caution

Review failed

Pull request was closed or merged during review

Walkthrough

Added a new end-to-end audit document (dev-docs/research/file-tailing-audit.md) describing logfwd’s file-tailing pipeline and decision points for discovery/identity, disk reads, truncation/rotation, checkpoint identity, remainder buffering/line splitting, glob rescans/deduplication, batching, and channel backpressure. The audit contrasts current behavior with Vector, OpenTelemetry Collector (filelog), and Fluent Bit, and enumerates concrete issues (shared remainder buffer across globbed files, truncation not emitting TailEvent::Truncated, unbounded read-buffer allocation, pathname-based glob dedup, xxh64 fingerprinting without device/inode) plus related timing/shutdown gaps and a prioritized remediation table referencing issues #796–#801.

Possibly related PRs

feat: Phase 5c — checkpoint wiring (#584, #585, #586, #587, #589) #637 — Modifies FileTailer/InputSource and pipeline checkpoint wiring (file_offsets/file_paths/ByteOffset, ChannelMsg, PipelineMachine, source lifecycle/registry), addressing many of the same file-tailing, checkpointing, and source-identity components discussed in the audit.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Maintainer Fitness	⚠️ Warning	PR description is auto-generated placeholder lacking substantive context about audit scope, findings, issues, and relationships to parallel work.	Expand PR description to include audit scope summary, finding enumeration with severities and issue numbers, relationship to parallel production issues, and note that audit supports remediation PRs.

✅ Passed checks (4 passed)

Check name	Status	Explanation
High-Quality Rust Practices	✅ Passed	The custom check for High-Quality Rust Practices applies to Rust code quality patterns. This PR is a documentation-only change with no modifications to any Rust source files.
Formal Verification Coverage	✅ Passed	Pull request adds only documentation (306 lines) with no production code, logfwd-core modules, or .rs/.toml file changes.
Documentation Thoroughly Updated	✅ Passed	This initial commit adds a complete, well-documented codebase with all required documentation files present and substantive, including architecture, design, verification, development, roadmap, and reference documentation.
Crate Boundary And Dependency Integrity	✅ Passed	PR is purely a documentation addition with no code changes, maintaining all crate boundary and dependency integrity constraints.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dev-docs/research/file-tailing-audit.md`:
- Line 68: The document's "### 6. No partial line flush timer" heading (and the
subsequent "Findings 6 and 7") lack references to issues in the PR objectives
table; either add the corresponding issue links/IDs next to those findings in
the markdown or explicitly note that no issue was filed and why; update the
“Findings 6 and 7” entries to include the PR objectives issue table reference
(or a clear statement that they are intentionally untracked) so reviewers can
find or verify the associated issue(s).
- Around line 115-116: Update the priority list entry that reads "Per-file read
fairness budget — MEDIUM" to append the issue reference so it becomes "Per-file
read fairness budget — MEDIUM (see issue `#801`)"; locate the exact list item text
in dev-docs/research/file-tailing-audit.md and modify that line accordingly to
include the "#801" reference.
- Line 86: Update the heading "8. No per-file read fairness" to include the
issue reference `#801` (e.g., change to "8. No per-file read fairness (see issue
`#801`)"), so the finding clearly links to that issue; locate the heading text "8.
No per-file read fairness" in dev-docs/research/file-tailing-audit.md and modify
it to append the issue number as shown.
- Line 35: Update the heading "Unbounded read in read_new_data" to include the
issue reference `#800` (e.g., "Unbounded read in read_new_data — issue `#800`") so
the finding is linked to the tracked issue; search for the exact heading text
"Unbounded read in read_new_data" in the document and modify it to append "issue
`#800`" or a similar clear reference to issue 800.
- Line 112: Update the line that currently reads "3. Bound read_new_data — HIGH,
new issue needed" to reflect that an issue already exists by changing it to
something like "3. Bound read_new_data — HIGH, tracked in issue `#800`
(read_new_data allocates unbounded Vec — OOM on large files)"; locate the entry
by the unique phrase "Bound read_new_data" and replace the "new issue needed"
status with the reference to issue `#800` and a brief descriptor.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7ef42886-d8b8-4908-8881-4c1897f9c1f6

📥 Commits

Reviewing files that changed from the base of the PR and between 429fac8 and c5e3a48.

📒 Files selected for processing (1)

dev-docs/research/file-tailing-audit.md

coderabbitai · 2026-04-03T04:58:47Z

+
+**Fix**: Emit `TailEvent::Truncated` when truncation detected.
+
+### 3. Unbounded read in read_new_data


⚠️ Potential issue | 🟡 Minor

Add issue reference #800 to finding 3 heading.

PR objectives indicate issue #800 covers "read_new_data allocates unbounded Vec — OOM on large files," which matches this finding.

📝 Suggested fix

-### 3. Unbounded read in read_new_data +### 3. Unbounded read in read_new_data (`#800`)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

### 3. Unbounded read in read_new_data

### 3. Unbounded read in read_new_data (`#800`)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@dev-docs/research/file-tailing-audit.md` at line 35, Update the heading "Unbounded read in read_new_data" to include the issue reference `#800` (e.g., "Unbounded read in read_new_data — issue `#800`") so the finding is linked to the tracked issue; search for the exact heading text "Unbounded read in read_new_data" in the document and modify it to append "issue `#800`" or a similar clear reference to issue 800.

coderabbitai · 2026-04-03T04:58:47Z

+OTel Collector flushes with a 5s timeout. Fluent Bit and logfwd both
+discard partial lines on shutdown.
+
+### 8. No per-file read fairness


⚠️ Potential issue | 🟡 Minor

Add issue reference #801 to finding 8 heading.

PR objectives indicate issue #801 covers "No per-file read fairness budget," which matches this finding.

📝 Suggested fix

-### 8. No per-file read fairness +### 8. No per-file read fairness (`#801`)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

### 8. No per-file read fairness

### 8. No per-file read fairness (`#801`)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@dev-docs/research/file-tailing-audit.md` at line 86, Update the heading "8. No per-file read fairness" to include the issue reference `#801` (e.g., change to "8. No per-file read fairness (see issue `#801`)"), so the finding clearly links to that issue; locate the heading text "8. No per-file read fairness" in dev-docs/research/file-tailing-audit.md and modify it to append the issue number as shown.

Replaces comparison table with: - ASCII data flow diagram showing all 8 decision points - Per-decision options with pros/cons and industry context - Recommendations for each - Priority-ordered fix table Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…audit Address CodeRabbit review feedback on PR #802: explicitly mark the partial line flush timer and partial line on shutdown sections as having no filed issue (future enhancements, not bugs). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai Bot requested changes Apr 3, 2026

View reviewed changes

strawgate merged commit 4488fad into master Apr 3, 2026
8 of 9 checks passed

strawgate deleted the docs/file-tailing-audit branch April 3, 2026 05:04

strawgate mentioned this pull request Apr 3, 2026

epic: File tailing correctness — per-source remainder + checkpoint safety #806

Closed

8 tasks

coderabbitai Bot mentioned this pull request Apr 3, 2026

fix: file tailing reliability — truncation, OOM, glob dedup, checkpoint validation, warnings (#796, #800, #799, #656, #730) #808

Merged

4 tasks

strawgate changed the title ~~docs: file tailing audit — comparison with industry collectors~~ feat: file tailing audit + TLA+ spec + Kani checkpoint proofs Apr 3, 2026

This was referenced Apr 3, 2026

feat: wire FileCheckpointStore into pipeline — persist and restore file offsets (#588) #784

Merged

feat: per-source remainder tracking in FramedInput #837

Merged

Copilot AI mentioned this pull request Apr 3, 2026

fix: TCP cross-connection corruption, fingerprint collision, per-file read fairness #868

Closed

coderabbitai Bot mentioned this pull request Apr 4, 2026

refactor(logfwd-io): split FileTailer + Bytes buffer ownership #939

Merged

5 tasks

github-actions Bot mentioned this pull request Apr 5, 2026

arch: property-based tests for checkpoint state machine at-least-once invariant #1109

Closed

coderabbitai Bot mentioned this pull request Apr 6, 2026

fix(tail): normalize empty glob root, fix depth for relative patterns, remove unsafe #1350

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: file tailing audit + TLA+ spec + Kani checkpoint proofs#802

feat: file tailing audit + TLA+ spec + Kani checkpoint proofs#802
strawgate merged 2 commits into
masterfrom
docs/file-tailing-audit

strawgate commented Apr 3, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented Apr 3, 2026 •

edited

Loading

Review failed

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Apr 3, 2026

Uh oh!

Uh oh!

coderabbitai Bot Apr 3, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant


		Fix: Emit `TailEvent::Truncated` when truncation detected.

		### 3. Unbounded read in read_new_data

	### 3. Unbounded read in read_new_data
	### 3. Unbounded read in read_new_data (`#800`)

	### 8. No per-file read fairness
	### 8. No per-file read fairness (`#801`)

Conversation

strawgate commented Apr 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Audits

TLA+ Specification

Kani Proofs

Verification Plans

Test plan

Uh oh!

coderabbitai Bot commented Apr 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review failed

Walkthrough

Possibly related PRs

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 3, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot Apr 3, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

strawgate commented Apr 3, 2026 •

edited

Loading

coderabbitai Bot commented Apr 3, 2026 •

edited

Loading