feat: logfwd-core is now #![no_std] + #![forbid(unsafe_code)]#375
Conversation
The proven kernel. logfwd-core compiles with no_std and forbids all unsafe code. The compiler enforces: no IO, no filesystem, no threads, no network, no unsafe — structurally impossible, not just linted. Impure modules moved to new logfwd-io crate: - diagnostics, checkpoint, enrichment, tail, input (IO/threads) - compress (std::io), format (old parser), filter_hints logfwd-core now contains only proven pure logic: - structural.rs — SIMD detection via wide (safe) - scanner.rs — JSON field extraction - framer.rs — newline framing (Kani-proven) - aggregator.rs — CRI P/F reassembly (Kani-proven) - otlp.rs — protobuf encoding helpers - cri.rs — CRI format parsing - scan_config.rs — field selection - byte_search.rs — proven byte search Dependencies: only memchr + wide (both no_std compatible). 439 workspace tests pass. Zero compilation warnings. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Warning Rate limit exceeded
Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 3 minutes and 17 seconds. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (8)
WalkthroughA new crate, Possibly related PRs
Comment |
With no_std + forbid(unsafe_code) in place, add missing_docs warning to enforce documentation on all public items in the kernel. Fixed 30 missing doc items: - Crate-level doc comment explaining the kernel's guarantees - Module-level docs on all 8 pub mod declarations - RawBlockMasks: all 10 field docs - Severity enum: all 7 variant docs - FieldSpec: struct + field docs - CriLine, CriAggregator, CriReassembler::new: struct/fn docs - StreamingClassifier::new: constructor doc - Fixed misplaced `use alloc` statements breaking doc comment chains logfwd-core now enforces: no_std, forbid(unsafe_code), warn(missing_docs). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
@coderabbitai this is a very important PR please be very thorough and accurate |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain
🏁 Script executed: Length of output: 79483 Let me get the rest of the diff and key files. 🏁 Script executed: Length of output: 75733
Analysis Summary✅ What's correct and well-executed
|
The old files (diagnostics, checkpoint, enrichment, tail, input, compress, format, filter_hints, dashboard.html) were copied to logfwd-io but not deleted from logfwd-core/src/. They weren't compiled (not in lib.rs) but were dead files in the directory. Also: fix fuzz target format_parser.rs to import from logfwd-io. -4,340 lines of dead code removed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
@coderabbitai this is a very important PR please be very thorough and accurate |
There was a problem hiding this comment.
Actionable comments posted: 14
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/logfwd-core/src/lib.rs`:
- Around line 6-14: Remove the stale moved modules left in logfwd-core/src so
there's only one source of truth: delete the old files compress.rs,
diagnostics.rs, and enrichment.rs from logfwd-core/src (and any other files that
were moved to logfwd-io) and ensure lib.rs has no remaining pub mod lines
referencing them; after removal, run a build/test (cargo build / cargo test) to
confirm no references remain.
In `@crates/logfwd-core/src/scan_config.rs`:
- Around line 7-8: Remove the stray doc comment that precedes the import
statement "use alloc::{string::String, vec, vec::Vec};" (do not document use
statements); either delete that doc comment or move the import above the doc
comment for the "FieldSpec" struct so the struct retains its doc comment and the
use line has no documentation attached.
In `@crates/logfwd-io/src/checkpoint.rs`:
- Around line 186-217: libc_getuid() does file I/O on every call (reading
/proc/self/status), so change it to compute the UID once and cache it or call
the libc syscall directly; specifically, update libc_getuid() (and any callers
like default_data_dir()) to use a static one-time cache (e.g.,
once_cell::sync::OnceCell or std::sync::Once) that stores the parsed u32 on
first invocation and returns the cached value thereafter, or replace the parsing
branch with a direct libc::getuid() call under cfg(unix) if you prefer adding
the libc dependency—ensure the function still returns a fallback (1000) when
detection fails.
- Around line 123-146: The flush method currently writes to a tmp file and
renames it but misses syncing the parent directory; after std::fs::rename(&tmp,
&final_path)? call, open the parent directory of final_path and call sync_all()
on that directory File to ensure rename is durable (use
final_path.parent().unwrap_or(Path::new(".")) to locate the dir), propagate any
io errors as io::Result so flush still returns Err on failure; reference the
flush function and the tmp_path() / checkpoints_path() names when making the
change.
In `@crates/logfwd-io/src/dashboard.html`:
- Around line 118-134: The poll() loop can overlap requests and allow
out-of-order responses to make rate() return negative values and leave the UI
showing "live" on errors; fix by making poll() non-reentrant (use an inFlight
flag or start the next poll from xhr.onloadend instead of setInterval) so a new
XHR is not issued until the prior request completes, update xhr.onload to treat
non-200 responses as disconnected (set the "status" element to "disconnected"
and skip render), move the JSON.parse try/catch to set status to "disconnected"
on parse errors (inside render call handling), and harden rate() to guard dt > 0
and clamp negative deltas to zero before computing (or return 0 on invalid data)
to avoid negative rates.
In `@crates/logfwd-io/src/diagnostics.rs`:
- Around line 647-660: esc() currently only handles a few characters and fails
to escape control codepoints (e.g., U+0000, U+0008, U+000C), which can produce
invalid JSON; update the esc function to, after matching '"', '\\', '\n', '\r',
'\t', check the character's Unicode scalar value (as u32) and if it's less than
0x20 emit a \u00XX style escape (four hex digits) for that codepoint, otherwise
fall back to pushing the character; keep the existing explicit escapes for
quotes/backslashes/newlines/r/ t and ensure the \u escape uses zero-padded
two-digit hex in the lower byte (formatted as \u00NN) so all control characters
are safely encoded.
In `@crates/logfwd-io/src/enrichment.rs`:
- Around line 356-365: The current code reads CSV headers into let headers:
Vec<String> using csv_reader.headers() and only checks headers.is_empty();
update this block to also reject empty header names and duplicate header names
up front: after building headers, check for any header.trim().is_empty() and
return Err("CSV has empty column name".to_string()) if found, and check for
duplicates by inserting header strings into a HashSet (or using
headers.iter().collect::<HashSet<_>>() length comparison) and return Err("CSV
has duplicate column names".to_string()) if duplicates exist; keep using
csv_reader.headers() and the same headers variable so callers and the downstream
RecordBatch building logic remain unchanged.
- Around line 369-404: The CSV padding currently writes empty strings into
columns, collapsing empty vs missing; change the column storage and array
construction to preserve nulls: make columns a Vec<Vec<Option<String>>> (instead
of Vec<Vec<String>>), push Some(field.to_string()) for present fields and push
None when padding short rows (in the loop over csv_reader.records and the
padding loop), then when creating Arrow arrays convert each Option<String> to
Option<&str> (e.g., via as_deref()) and collect into a StringArray so missing
fields become nulls; keep the existing Field::new(..., DataType::Utf8, true) and
RecordBatch::try_new usage.
In `@crates/logfwd-io/src/format.rs`:
- Around line 107-125: The current guard `if !line.is_empty()` drops empty raw
lines, losing explicit blank records; remove that condition so the code always
emits a JSON record for every `line` (including empty ones) by writing out the
prefix `{"_raw":"`, running the existing escaping loop (which is a no-op for
empty slices), appending the closing `"}\n"`, and incrementing `count` for every
line; update the block around `line`, `out`, and `count` in format.rs
accordingly so blank `b"\n"` lines produce `{"_raw":""}\n` instead of being
skipped.
- Around line 68-69: The partial buffer growth must be bounded: wherever you
call self.partial.extend_from_slice(&bytes[start..]) (seen in the JsonParser,
RawParser and CriParser parsing paths), check and enforce max_line_size before
extending — compute new_len = self.partial.len() + slice.len(), and if new_len >
self.max_line_size only extend up to the remaining allowed bytes (or stop
extending and mark/truncate the line) instead of unboundedly appending; ensure
CriParser uses its outer max_line_size when extending its outer partial as well
and propagate a truncation/overflow flag or behavior consistently so a single
newline-free input cannot grow memory without bound.
In `@crates/logfwd-io/src/tail.rs`:
- Around line 406-408: Avoid allocating a new Vec each poll by not calling
self.files.keys().cloned().collect(); instead iterate the key iterator or
maintain a persistent order vector. Replace the current allocation with a loop
like `for path in self.files.keys().cloned()` (no collect) or add a persistent
`read_order: Vec<PathBuf>` field and iterate that, updating it when files
change; update the loop that uses `paths` (the current `paths: Vec<PathBuf>`
variable) to use the new iteration approach and ensure borrows of `self.files`
inside the loop are handled correctly.
- Around line 502-511: The loop reads into result incrementally causing
reallocations; before the loop allocate result with Vec::with_capacity by
computing the expected remaining bytes (e.g., let remaining =
(file_size.saturating_sub(tailed.offset)) as usize) using the tailed file's
metadata and tailed.offset, then call Vec::with_capacity(remaining) so the
subsequent loop (tailed.file.read into self.read_buf, extending result, and
updating tailed.offset) won't reallocate; ensure you clamp or check remaining to
a sane max (and handle metadata errors) to avoid panic or huge allocations.
- Around line 345-347: The code currently clones the entire Vec<PathBuf> each
poll by calling watch_paths.clone(); change the loop to iterate over a reference
to the existing collection to avoid the allocation: stop cloning and use for
path in &self.watch_paths (or self.watch_paths.iter()) inside the polling code
where watch_paths is defined, ensuring any mutations that previously required an
owned Vec are adjusted (e.g., collect indices or clone only when absolutely
needed).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: da038166-9ab5-4490-a1c2-16edbc9c62cf
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (35)
Cargo.tomlcrates/logfwd-bench/Cargo.tomlcrates/logfwd-bench/benches/pipeline.rscrates/logfwd-bench/src/e2e_profile.rscrates/logfwd-core/Cargo.tomlcrates/logfwd-core/src/aggregator.rscrates/logfwd-core/src/byte_search.rscrates/logfwd-core/src/cri.rscrates/logfwd-core/src/lib.rscrates/logfwd-core/src/otlp.rscrates/logfwd-core/src/scan_config.rscrates/logfwd-core/src/scanner.rscrates/logfwd-core/src/structural.rscrates/logfwd-core/tests/compliance_data.rscrates/logfwd-io/Cargo.tomlcrates/logfwd-io/src/checkpoint.rscrates/logfwd-io/src/compress.rscrates/logfwd-io/src/dashboard.htmlcrates/logfwd-io/src/diagnostics.rscrates/logfwd-io/src/enrichment.rscrates/logfwd-io/src/filter_hints.rscrates/logfwd-io/src/format.rscrates/logfwd-io/src/input.rscrates/logfwd-io/src/lib.rscrates/logfwd-io/src/tail.rscrates/logfwd-output/Cargo.tomlcrates/logfwd-output/src/otlp_sink.rscrates/logfwd-transform/Cargo.tomlcrates/logfwd-transform/src/lib.rscrates/logfwd-transform/src/udf/geo_lookup.rscrates/logfwd-transform/tests/scanner_datafusion_boundary.rscrates/logfwd/Cargo.tomlcrates/logfwd/src/main.rscrates/logfwd/src/pipeline.rscrates/logfwd/tests/integration.rs
| fn flush(&mut self) -> io::Result<()> { | ||
| use std::io::Write as _; | ||
|
|
||
| let list: Vec<&SourceCheckpoint> = self.checkpoints.values().collect(); | ||
| let json = serde_json::to_vec_pretty(&list) | ||
| .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?; | ||
|
|
||
| let tmp = self.tmp_path(); | ||
| let final_path = self.checkpoints_path(); | ||
|
|
||
| { | ||
| let mut file = std::fs::OpenOptions::new() | ||
| .write(true) | ||
| .create(true) | ||
| .truncate(true) | ||
| .open(&tmp)?; | ||
| file.write_all(&json)?; | ||
| file.sync_all()?; | ||
| } | ||
|
|
||
| std::fs::rename(&tmp, &final_path)?; | ||
|
|
||
| Ok(()) | ||
| } |
There was a problem hiding this comment.
Missing directory fsync after rename for crash-safe durability.
The atomic write pattern (tmp → fsync → rename) is correct, but on Linux/ext4, rename durability requires an additional fsync on the parent directory. Without it, a crash after rename but before journal commit can lose the file.
🛡️ Proposed fix to add directory fsync
std::fs::rename(&tmp, &final_path)?;
+ // Sync the directory to ensure the rename is durable.
+ if let Ok(dir) = std::fs::File::open(&self.data_dir) {
+ let _ = dir.sync_all();
+ }
+
Ok(())
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/logfwd-io/src/checkpoint.rs` around lines 123 - 146, The flush method
currently writes to a tmp file and renames it but misses syncing the parent
directory; after std::fs::rename(&tmp, &final_path)? call, open the parent
directory of final_path and call sync_all() on that directory File to ensure
rename is durable (use final_path.parent().unwrap_or(Path::new(".")) to locate
the dir), propagate any io errors as io::Result so flush still returns Err on
failure; reference the flush function and the tmp_path() / checkpoints_path()
names when making the change.
| #[cfg(unix)] | ||
| fn libc_getuid() -> u32 { | ||
| // Use the raw syscall via std rather than pulling in libc. | ||
| // std::os::unix doesn't expose getuid directly, so we use a cfg-guarded | ||
| // approach: on non-root systems HOME is always set, so the root check is | ||
| // mostly a documentation hint. We fall back to HOME-based path if unsure. | ||
| // | ||
| // Using nix or libc would be cleaner but they aren't dependencies. | ||
| // Instead we read /proc/self/status (Linux) or skip the check (macOS). | ||
| #[cfg(target_os = "linux")] | ||
| { | ||
| if let Ok(status) = std::fs::read_to_string("/proc/self/status") { | ||
| for line in status.lines() { | ||
| #[allow(clippy::collapsible_if)] | ||
| if let Some(rest) = line.strip_prefix("Uid:") { | ||
| if let Some(uid_str) = rest.split_whitespace().next() { | ||
| if let Ok(uid) = uid_str.parse::<u32>() { | ||
| return uid; | ||
| } | ||
| } | ||
| } | ||
| } | ||
| } | ||
| // Couldn't determine, assume non-root. | ||
| 1000 | ||
| } | ||
| #[cfg(not(target_os = "linux"))] | ||
| { | ||
| // On macOS / other unices, default to non-root path. | ||
| 1000 | ||
| } | ||
| } |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
libc_getuid() reads /proc/self/status on every call.
This performs I/O each time default_data_dir() is called. If called frequently, consider caching the result or using the libc crate directly (single syscall vs. file read + parse).
Current impact is likely minimal since default_data_dir() is typically called once at startup.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/logfwd-io/src/checkpoint.rs` around lines 186 - 217, libc_getuid()
does file I/O on every call (reading /proc/self/status), so change it to compute
the UID once and cache it or call the libc syscall directly; specifically,
update libc_getuid() (and any callers like default_data_dir()) to use a static
one-time cache (e.g., once_cell::sync::OnceCell or std::sync::Once) that stores
the parsed u32 on first invocation and returns the cached value thereafter, or
replace the parsing branch with a direct libc::getuid() call under cfg(unix) if
you prefer adding the libc dependency—ensure the function still returns a
fallback (1000) when detection fails.
| function poll() { | ||
| var xhr = new XMLHttpRequest(); | ||
| xhr.open("GET", "/api/pipelines"); | ||
| xhr.onload = function() { | ||
| if (xhr.status === 200) { | ||
| document.getElementById("status").textContent = "live"; | ||
| try { render(JSON.parse(xhr.responseText)); } catch(e) {} | ||
| } | ||
| }; | ||
| xhr.onerror = function() { | ||
| document.getElementById("status").textContent = "disconnected"; | ||
| }; | ||
| xhr.send(); | ||
| } | ||
|
|
||
| poll(); | ||
| setInterval(poll, 1000); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, let's see the structure and find the dashboard.html file
find . -name "dashboard.html" -type fRepository: strawgate/memagent
Length of output: 139
🏁 Script executed:
# Read the dashboard.html file to see the code and understand the context
wc -l ./crates/logfwd-io/src/dashboard.htmlRepository: strawgate/memagent
Length of output: 103
🏁 Script executed:
# Read the entire dashboard.html file to see the polling logic and rate calculation
cat -n ./crates/logfwd-io/src/dashboard.htmlRepository: strawgate/memagent
Length of output: 6417
🏁 Script executed:
# Let me trace through a specific scenario to confirm the rate calculation vulnerability
# Check if there are any guards against negative values anywhere else
rg -i "negative|rate|delta" ./crates/logfwd-io/src/dashboard.htmlRepository: strawgate/memagent
Length of output: 749
Prevent overlapping polls from rendering stale or negative rates.
setInterval fires a new XHR every second even if the previous /api/pipelines request is still running. Under a slow diagnostics endpoint, responses can arrive out of order, overwrite a newer snapshot with an older one, and make the delta math in rate() go negative. The rate() function at line 46-50 calculates (cur - prevVal) / dt with no safeguard—if a newer response is overwritten by an older one, counters decrease and rates become negative.
Additionally, non-200 responses silently skip rendering while leaving the "live" status intact, and JSON parse failures (line 124) are caught without updating the status to reflect a disconnected state.
Drive the next poll from onloadend or guard with an in-flight flag to ensure requests complete before the next one starts. Treat non-200 and JSON parse failures as disconnected errors.
Suggested fix
(function() {
var prev = null;
var prevTime = null;
+ var inFlight = false;
function fmt(n) {
@@
function poll() {
+ if (inFlight) return;
+ inFlight = true;
+
var xhr = new XMLHttpRequest();
xhr.open("GET", "/api/pipelines");
xhr.onload = function() {
- if (xhr.status === 200) {
- document.getElementById("status").textContent = "live";
- try { render(JSON.parse(xhr.responseText)); } catch(e) {}
- }
+ if (xhr.status !== 200) {
+ document.getElementById("status").textContent = "disconnected";
+ return;
+ }
+
+ try {
+ render(JSON.parse(xhr.responseText));
+ document.getElementById("status").textContent = "live";
+ } catch (e) {
+ document.getElementById("status").textContent = "disconnected";
+ }
};
xhr.onerror = function() {
document.getElementById("status").textContent = "disconnected";
};
+ xhr.onloadend = function() {
+ inFlight = false;
+ setTimeout(poll, 1000);
+ };
xhr.send();
}
poll();
- setInterval(poll, 1000);
})();🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/logfwd-io/src/dashboard.html` around lines 118 - 134, The poll() loop
can overlap requests and allow out-of-order responses to make rate() return
negative values and leave the UI showing "live" on errors; fix by making poll()
non-reentrant (use an inFlight flag or start the next poll from xhr.onloadend
instead of setInterval) so a new XHR is not issued until the prior request
completes, update xhr.onload to treat non-200 responses as disconnected (set the
"status" element to "disconnected" and skip render), move the JSON.parse
try/catch to set status to "disconnected" on parse errors (inside render call
handling), and harden rate() to guard dt > 0 and clamp negative deltas to zero
before computing (or return 0 on invalid data) to avoid negative rates.
| if start < bytes.len() { | ||
| self.partial.extend_from_slice(&bytes[start..]); |
There was a problem hiding this comment.
Cap partial growth on newline-free input.
These buffers grow until a newline arrives. JsonParser and RawParser have no limit at all, and CriParser does not apply max_line_size to its outer partial buffer. A single malformed or adversarial line can therefore grow memory without bound before any truncation happens.
Also applies to: 132-133, 166-170
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/logfwd-io/src/format.rs` around lines 68 - 69, The partial buffer
growth must be bounded: wherever you call
self.partial.extend_from_slice(&bytes[start..]) (seen in the JsonParser,
RawParser and CriParser parsing paths), check and enforce max_line_size before
extending — compute new_len = self.partial.len() + slice.len(), and if new_len >
self.max_line_size only extend up to the remaining allowed bytes (or stop
extending and mark/truncate the line) instead of unboundedly appending; ensure
CriParser uses its outer max_line_size when extending its outer partial as well
and propagate a truncation/overflow flag or behavior consistently so a single
newline-free input cannot grow memory without bound.
| if !line.is_empty() { | ||
| out.extend_from_slice(b"{\"_raw\":\""); | ||
| for &b in line { | ||
| match b { | ||
| b'"' => out.extend_from_slice(b"\\\""), | ||
| b'\\' => out.extend_from_slice(b"\\\\"), | ||
| b'\n' => out.extend_from_slice(b"\\n"), | ||
| b'\r' => out.extend_from_slice(b"\\r"), | ||
| b'\t' => out.extend_from_slice(b"\\t"), | ||
| b if b < 0x20 => { | ||
| // Escape control characters per RFC 8259. | ||
| let _ = std::io::Write::write_fmt(out, format_args!("\\u{:04x}", b)); | ||
| } | ||
| _ => out.push(b), | ||
| } | ||
| } | ||
| out.extend_from_slice(b"\"}\n"); | ||
| count += 1; | ||
| } |
There was a problem hiding this comment.
Preserve blank raw lines instead of dropping them.
The if !line.is_empty() guard turns b"\n" into no output. In raw mode that is data loss: explicit blank records never reach downstream consumers.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/logfwd-io/src/format.rs` around lines 107 - 125, The current guard `if
!line.is_empty()` drops empty raw lines, losing explicit blank records; remove
that condition so the code always emits a JSON record for every `line`
(including empty ones) by writing out the prefix `{"_raw":"`, running the
existing escaping loop (which is a no-op for empty slices), appending the
closing `"}\n"`, and incrementing `count` for every line; update the block
around `line`, `out`, and `count` in format.rs accordingly so blank `b"\n"`
lines produce `{"_raw":""}\n` instead of being skipped.
| // Check for new/rotated files. | ||
| let watch_paths = self.watch_paths.clone(); | ||
| for path in &watch_paths { |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Minor: watch_paths.clone() on every poll.
This clones the entire Vec<PathBuf> each poll cycle. For large numbers of watched paths, consider iterating over a reference or using indices.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/logfwd-io/src/tail.rs` around lines 345 - 347, The code currently
clones the entire Vec<PathBuf> each poll by calling watch_paths.clone(); change
the loop to iterate over a reference to the existing collection to avoid the
allocation: stop cloning and use for path in &self.watch_paths (or
self.watch_paths.iter()) inside the polling code where watch_paths is defined,
ensuring any mutations that previously required an owned Vec are adjusted (e.g.,
collect indices or clone only when absolutely needed).
| // Read new data from all tailed files. | ||
| let paths: Vec<PathBuf> = self.files.keys().cloned().collect(); | ||
| for path in paths { |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Minor: files.keys().cloned().collect() on every poll.
Same pattern — allocates a new Vec<PathBuf> each poll. Could iterate keys directly with careful borrow management or use a separate read_order Vec.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/logfwd-io/src/tail.rs` around lines 406 - 408, Avoid allocating a new
Vec each poll by not calling self.files.keys().cloned().collect(); instead
iterate the key iterator or maintain a persistent order vector. Replace the
current allocation with a loop like `for path in self.files.keys().cloned()` (no
collect) or add a persistent `read_order: Vec<PathBuf>` field and iterate that,
updating it when files change; update the loop that uses `paths` (the current
`paths: Vec<PathBuf>` variable) to use the new iteration approach and ensure
borrows of `self.files` inside the loop are handled correctly.
| // Read ALL available bytes in a loop until we drain everything. | ||
| let mut result = Vec::new(); | ||
| loop { | ||
| let n = tailed.file.read(&mut self.read_buf)?; | ||
| if n == 0 { | ||
| break; | ||
| } | ||
| result.extend_from_slice(&self.read_buf[..n]); | ||
| tailed.offset += n as u64; | ||
| } |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Consider pre-allocating result Vec based on available bytes.
The loop extends result incrementally. Pre-allocating with Vec::with_capacity((current_size - tailed.offset) as usize) would avoid reallocations when reading large chunks.
♻️ Suggested optimization
// Read ALL available bytes in a loop until we drain everything.
- let mut result = Vec::new();
+ let expected = (current_size - tailed.offset) as usize;
+ let mut result = Vec::with_capacity(expected);
loop {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/logfwd-io/src/tail.rs` around lines 502 - 511, The loop reads into
result incrementally causing reallocations; before the loop allocate result with
Vec::with_capacity by computing the expected remaining bytes (e.g., let
remaining = (file_size.saturating_sub(tailed.offset)) as usize) using the tailed
file's metadata and tailed.offset, then call Vec::with_capacity(remaining) so
the subsequent loop (tailed.file.read into self.read_buf, extending result, and
updating tailed.offset) won't reallocate; ensure you clamp or check remaining to
a sane max (and handle metadata errors) to avoid panic or huge allocations.
scan_config.rs had a doc comment attached to the `use alloc` import instead of the struct. Removed the duplicate. Pre-existing bugs in moved code filed as separate issues: #380 — diagnostics uptime/CPU mismatch #381 — diagnostics JSON escape incomplete #382 — enrichment CSV empty/duplicate headers #383 — enrichment CSV missing fields as empty vs null #384 — format parser unbounded partial buffer #385 — RawParser drops blank lines #386 — checkpoint missing directory fsync Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
The automated review will add inline comments to the diff. The most actionable items before merge are #1 (incorrect metric semantics) and #2 (potential MSRV breakage on let-chains). Let me know if you'd like me to dig deeper into any of these. 🧠 Learnings used✅ Actions performedFull review triggered. |
CodeRabbit Round 1 — All addressedFixed in this PR:
Pre-existing bugs in moved code — filed as separate issues (not fixing in this PR to avoid scope creep):
Trivial nitpicks (tail.rs clone/collect per poll, checkpoint libc_getuid): All CI checks pass. 439 tests green. |
Found by CodeRabbit in PR #375. JsonParser, RawParser, and CriParser partial buffers previously grew without limit until a newline arrived, risking memory exhaustion from a single malformed or extremely long line. This change: 1. Adds `max_line_size` to `JsonParser` and `RawParser` constructors. 2. Caps `partial` buffer growth in `process()` for all parsers. 3. Discards data exceeding `max_line_size` until the next newline. 4. Updates `CriParser` to also cap its outer partial buffer. 5. Sets a default 2MB limit in the production pipeline. 6. Updates tests and fuzzers to accommodate the new constructor signatures. Co-authored-by: strawgate <6384545+strawgate@users.noreply.github.com>
Found by CodeRabbit in PR #375. JsonParser, RawParser, and CriParser partial buffers previously grew without limit until a newline arrived, risking memory exhaustion from a single malformed or extremely long line. This change: 1. Adds `max_line_size` to `JsonParser` and `RawParser` constructors. 2. Caps `partial` buffer growth in `process()` for all parsers. 3. Discards data exceeding `max_line_size` until the next newline. 4. Updates `CriParser` to also cap its outer partial buffer. 5. Sets a default 2MB limit in the production pipeline. 6. Updates tests and fuzzers to accommodate the new constructor signatures. 7. Fixed code formatting issues found in CI. Co-authored-by: strawgate <6384545+strawgate@users.noreply.github.com>
Summary
The proven kernel. logfwd-core compiles with
no_stdand forbids all unsafe code. The compiler enforces: no IO, no filesystem, no threads, no unsafe — structurally impossible, not just linted.What moved
Impure modules → new
logfwd-iocrate:diagnostics.rs— HTTP server, metrics, threadscheckpoint.rs— filesystem persistenceenrichment.rs— CSV/JSON file loading, Arrow tablestail.rs— file watching, inotifyinput.rs— file input pollingcompress.rs— zstd compression (std::io)format.rs— old format parserfilter_hints.rs— glob-based filteringWhat remains in logfwd-core
Pure logic only, 52 Kani proofs:
structural.rs— SIMD detection viawide(safe)scanner.rs— JSON field extractionframer.rs— newline framing (Kani-proven)aggregator.rs— CRI P/F reassembly (Kani-proven)otlp.rs— protobuf encoding helperscri.rs— CRI format parsingscan_config.rs— field selectionbyte_search.rs— proven byte searchDependencies:
memchr+wide(both no_std compatible).Test plan
cargo check -p logfwd-corecompiles with no_std#![forbid(unsafe_code)]— compiler rejects any unsafe in core🤖 Generated with Claude Code