Skip to content

feat: file tailing audit + TLA+ spec + Kani checkpoint proofs#802

Merged
strawgate merged 2 commits into
masterfrom
docs/file-tailing-audit
Apr 3, 2026
Merged

feat: file tailing audit + TLA+ spec + Kani checkpoint proofs#802
strawgate merged 2 commits into
masterfrom
docs/file-tailing-audit

Conversation

@strawgate
Copy link
Copy Markdown
Owner

@strawgate strawgate commented Apr 3, 2026

Summary

End-to-end audit of logfwd's file and network input paths, formal verification of the checkpoint design, and Kani-proven checkpoint state machine.

Audits

TLA+ Specification

dev-docs/tla/FileCheckpoint.tla — 12 actions, per-source-identity checkpoints.

Two design bugs found by TLC before any code was written:

  1. Single-checkpoint model can't handle rotation during in-flight batches → fixed with per-source checkpoints keyed by file identity
  2. Checkpoint = read_offset skips buffered lines → fixed with read_offset - framer_buf.len()

Safety: 10,939 states, 8 invariants + CheckpointMonotonicity — ALL PASS
Liveness: 288 states, Progress + EventualEmission — ALL PASS

Kani Proofs

crates/logfwd-core/src/checkpoint_tracker.rs — pure state machine for checkpoint-remainder coordination.

5 Kani harnesses proving: offset ordering invariants, checkpoint monotonicity, no data loss on crash, processed advances only on newline, overflow safety. 15 unit tests.

Verification Plans

  • checkpoint-kani-plan.md — Kani proof design
  • checkpoint-proptest-plan.md — proptest state machine with crash injection
  • checkpoint-tla-plan.md — TLA+ design doc
  • per-source-remainder-design.md — implementation plan for per-source remainder

Refs #806, #796, #797, #803

Test plan

  • cargo test -p logfwd-core --lib checkpoint_tracker — 15 tests pass
  • cargo clippy -p logfwd-core -- -D warnings — clean
  • TLC safety model — 10,939 states, no errors
  • TLC liveness model — 288 states, no errors
  • Kani proofs (requires nightly toolchain)
  • Proptest state machine (in progress)

🤖 Generated with Claude Code

Comprehensive audit of logfwd's file reading path compared with three
production collectors. Documents 7 findings with priority fixes.

Critical: shared remainder buffer (#797), truncation event (#796)
High: unbounded read (#800), glob dedup (#799), fingerprint (#798)
Medium: read fairness (#801), partial flush timer

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 3, 2026

Caution

Review failed

Pull request was closed or merged during review

Walkthrough

Added a new end-to-end audit document (dev-docs/research/file-tailing-audit.md) describing logfwd’s file-tailing pipeline and decision points for discovery/identity, disk reads, truncation/rotation, checkpoint identity, remainder buffering/line splitting, glob rescans/deduplication, batching, and channel backpressure. The audit contrasts current behavior with Vector, OpenTelemetry Collector (filelog), and Fluent Bit, and enumerates concrete issues (shared remainder buffer across globbed files, truncation not emitting TailEvent::Truncated, unbounded read-buffer allocation, pathname-based glob dedup, xxh64 fingerprinting without device/inode) plus related timing/shutdown gaps and a prioritized remediation table referencing issues #796#801.

Possibly related PRs

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Maintainer Fitness ⚠️ Warning PR description is auto-generated placeholder lacking substantive context about audit scope, findings, issues, and relationships to parallel work. Expand PR description to include audit scope summary, finding enumeration with severities and issue numbers, relationship to parallel production issues, and note that audit supports remediation PRs.
✅ Passed checks (4 passed)
Check name Status Explanation
High-Quality Rust Practices ✅ Passed The custom check for High-Quality Rust Practices applies to Rust code quality patterns. This PR is a documentation-only change with no modifications to any Rust source files.
Formal Verification Coverage ✅ Passed Pull request adds only documentation (306 lines) with no production code, logfwd-core modules, or .rs/.toml file changes.
Documentation Thoroughly Updated ✅ Passed This initial commit adds a complete, well-documented codebase with all required documentation files present and substantive, including architecture, design, verification, development, roadmap, and reference documentation.
Crate Boundary And Dependency Integrity ✅ Passed PR is purely a documentation addition with no code changes, maintaining all crate boundary and dependency integrity constraints.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dev-docs/research/file-tailing-audit.md`:
- Line 68: The document's "### 6. No partial line flush timer" heading (and the
subsequent "Findings 6 and 7") lack references to issues in the PR objectives
table; either add the corresponding issue links/IDs next to those findings in
the markdown or explicitly note that no issue was filed and why; update the
“Findings 6 and 7” entries to include the PR objectives issue table reference
(or a clear statement that they are intentionally untracked) so reviewers can
find or verify the associated issue(s).
- Around line 115-116: Update the priority list entry that reads "Per-file read
fairness budget — MEDIUM" to append the issue reference so it becomes "Per-file
read fairness budget — MEDIUM (see issue `#801`)"; locate the exact list item text
in dev-docs/research/file-tailing-audit.md and modify that line accordingly to
include the "#801" reference.
- Line 86: Update the heading "8. No per-file read fairness" to include the
issue reference `#801` (e.g., change to "8. No per-file read fairness (see issue
`#801`)"), so the finding clearly links to that issue; locate the heading text "8.
No per-file read fairness" in dev-docs/research/file-tailing-audit.md and modify
it to append the issue number as shown.
- Line 35: Update the heading "Unbounded read in read_new_data" to include the
issue reference `#800` (e.g., "Unbounded read in read_new_data — issue `#800`") so
the finding is linked to the tracked issue; search for the exact heading text
"Unbounded read in read_new_data" in the document and modify it to append "issue
`#800`" or a similar clear reference to issue 800.
- Line 112: Update the line that currently reads "3. Bound read_new_data — HIGH,
new issue needed" to reflect that an issue already exists by changing it to
something like "3. Bound read_new_data — HIGH, tracked in issue `#800`
(read_new_data allocates unbounded Vec — OOM on large files)"; locate the entry
by the unique phrase "Bound read_new_data" and replace the "new issue needed"
status with the reference to issue `#800` and a brief descriptor.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7ef42886-d8b8-4908-8881-4c1897f9c1f6

📥 Commits

Reviewing files that changed from the base of the PR and between 429fac8 and c5e3a48.

📒 Files selected for processing (1)
  • dev-docs/research/file-tailing-audit.md

Comment thread dev-docs/research/file-tailing-audit.md Outdated

**Fix**: Emit `TailEvent::Truncated` when truncation detected.

### 3. Unbounded read in read_new_data
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add issue reference #800 to finding 3 heading.

PR objectives indicate issue #800 covers "read_new_data allocates unbounded Vec — OOM on large files," which matches this finding.

📝 Suggested fix
-### 3. Unbounded read in read_new_data
+### 3. Unbounded read in read_new_data (`#800`)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
### 3. Unbounded read in read_new_data
### 3. Unbounded read in read_new_data (`#800`)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dev-docs/research/file-tailing-audit.md` at line 35, Update the heading
"Unbounded read in read_new_data" to include the issue reference `#800` (e.g.,
"Unbounded read in read_new_data — issue `#800`") so the finding is linked to the
tracked issue; search for the exact heading text "Unbounded read in
read_new_data" in the document and modify it to append "issue `#800`" or a similar
clear reference to issue 800.

Comment thread dev-docs/research/file-tailing-audit.md Outdated
Comment thread dev-docs/research/file-tailing-audit.md Outdated
OTel Collector flushes with a 5s timeout. Fluent Bit and logfwd both
discard partial lines on shutdown.

### 8. No per-file read fairness
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add issue reference #801 to finding 8 heading.

PR objectives indicate issue #801 covers "No per-file read fairness budget," which matches this finding.

📝 Suggested fix
-### 8. No per-file read fairness
+### 8. No per-file read fairness (`#801`)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
### 8. No per-file read fairness
### 8. No per-file read fairness (`#801`)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dev-docs/research/file-tailing-audit.md` at line 86, Update the heading "8.
No per-file read fairness" to include the issue reference `#801` (e.g., change to
"8. No per-file read fairness (see issue `#801`)"), so the finding clearly links
to that issue; locate the heading text "8. No per-file read fairness" in
dev-docs/research/file-tailing-audit.md and modify it to append the issue number
as shown.

Comment thread dev-docs/research/file-tailing-audit.md Outdated
Comment thread dev-docs/research/file-tailing-audit.md Outdated
Replaces comparison table with:
- ASCII data flow diagram showing all 8 decision points
- Per-decision options with pros/cons and industry context
- Recommendations for each
- Priority-ordered fix table

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@strawgate strawgate merged commit 4488fad into master Apr 3, 2026
8 of 9 checks passed
@strawgate strawgate deleted the docs/file-tailing-audit branch April 3, 2026 05:04
@strawgate strawgate changed the title docs: file tailing audit — comparison with industry collectors feat: file tailing audit + TLA+ spec + Kani checkpoint proofs Apr 3, 2026
strawgate added a commit that referenced this pull request Apr 3, 2026
…audit

Address CodeRabbit review feedback on PR #802: explicitly mark the
partial line flush timer and partial line on shutdown sections as having
no filed issue (future enhancements, not bugs).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant