
Research: eBPF pipe_write interception prototype#58

Merged: strawgate merged 1 commit into master from ebpf-prototype on Mar 29, 2026

@strawgate (Owner) commented Mar 29, 2026

Summary

eBPF prototype for capturing container stdout/stderr by intercepting vfs_write via kprobe. Bypasses CRI log file path.

Tested: zero data loss at 10K-1M lines, 81K events/sec on 2 vCPU.

Dev workflow

docker build -t ebpf-dev -f Dockerfile.dev .
docker run -it --rm --privileged -v $(pwd):/src ebpf-dev bash
# Inside: cargo build, run, iterate (~17s incremental)

See pipe-capture/README.md for full instructions.

🤖 Generated with Claude Code

@coderabbitai (Bot) commented Mar 29, 2026

Walkthrough

This pull request introduces a new Rust crate logfwd-ebpf-proto containing an eBPF-based log capture prototype for intercepting container stdout/stderr at the kernel level. It includes: (1) a design specification crate defining shared types (PipeWriteEvent, MAX_DATA = 4096) and architecture documentation; (2) a pipe-capture workspace with an eBPF kernel program that attaches a kprobe to vfs_write, reads user-buffer data into a 64MB ring buffer, and filters events by cgroup and PID; and (3) a userspace loader that reads the compiled eBPF binary, attaches the probe, and continuously drains captured events to an output log file with throughput metrics.
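
To give a feel for what a 64MB ring buffer buys at the reported throughput, here is a rough back-of-the-envelope sketch. It assumes "64MB" means 64 MiB, that every event reserves a full fixed-size record, and that the record is about 4128 bytes (a 4096-byte payload plus a small header); none of these figures are confirmed by the PR. The 8-byte per-record header and 8-byte rounding are how the kernel's BPF ring buffer lays out records. All names are hypothetical.

```rust
/// Ring buffer size from the walkthrough, read here as 64 MiB (assumption).
pub const RING_BYTES: u64 = 64 * 1024 * 1024;
/// Per-record header that the BPF ring buffer places before each record.
pub const RECORD_HEADER_BYTES: u64 = 8;
/// Assumed size of one event struct (4096-byte payload plus header fields).
pub const ASSUMED_EVENT_BYTES: u64 = 4128;

/// Ring buffer records are rounded up to a multiple of 8 bytes.
pub const fn round_up_8(n: u64) -> u64 {
    (n + 7) / 8 * 8
}

/// Upper bound on how many fixed-size records fit in the ring at once.
pub const fn max_records(ring_bytes: u64, event_bytes: u64) -> u64 {
    ring_bytes / round_up_8(event_bytes + RECORD_HEADER_BYTES)
}

/// How long (in whole milliseconds) the consumer may stall before a full
/// ring starts rejecting reservations, at a steady producer rate.
pub fn stall_budget_ms(records: u64, events_per_sec: u64) -> u64 {
    records * 1000 / events_per_sec
}
```

Under these assumptions `max_records(RING_BYTES, ASSUMED_EVENT_BYTES)` is 16225, and at 81,000 events/sec `stall_budget_ms` gives about 200 ms of headroom. Reserving only the captured length per event, rather than the full struct, would stretch that budget considerably for short log lines.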

@coderabbitai (Bot) left a comment

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/logfwd-ebpf-proto/Cargo.toml`:
- Around line 12-15: The Cargo.toml currently lists serde_json under
[dependencies] but the comment says it's only for test helpers; update the
manifest by removing the serde_json entry from [dependencies] and adding it
under [dev-dependencies] instead (i.e., add serde_json = "1" to
[dev-dependencies] and delete the serde_json line from [dependencies]) so it is
only pulled in for tests.

In `@crates/logfwd-ebpf-proto/src/bpf/pipe_capture.rs`:
- Around line 31-44: The comment text is inconsistent about whether fd==1/2
filtering is required or must be skipped; pick one canonical strategy and make
code/comments match: either (A) enforce the cheap fd check (only accept fd 1 or
2) then run the cgroup map check (bpf_get_current_cgroup_id and watched-cgroup
hashmap) and optionally the pipe f_op resolution, or (B) drop the fd==1/2
restriction entirely and instead accept any write from watched cgroups (using
bpf_get_current_cgroup_id + watched-cgroup map) with an optional expensive f_op
pipe check; update the prose at the top of pipe_capture.rs and all places
implementing the checks to reflect the chosen strategy (references: the "fd"
check description, the cgroup check using bpf_get_current_cgroup_id and the
watched-cgroup hashmap, and the optional pipe f_op resolution) so comments,
condition order, and early returns in the corresponding check functions are
consistent.

In `@crates/logfwd-ebpf-proto/src/lib.rs`:
- Around line 82-90: The example in EbpfInput::poll reinterprets raw bytes
unsafely; update the loop to first check the raw slice length is at least
std::mem::size_of::<PipeWriteEvent>(), validate that event.captured_len as usize
does not exceed the backing data length before slicing, and avoid direct
unaligned dereference by copying the bytes into an aligned buffer or using
ptr::read_unaligned to build a PipeWriteEvent safely; on any validation failure
return or skip with an io::Error (or continue) instead of panicking, and keep
producing InputEvent::Data only when the checks pass (referencing InputSource,
EbpfInput::poll, ring_buf.next(), PipeWriteEvent, captured_len, and InputEvent).
- Around line 193-221: Replace the loose ABI checks in event_struct_size and
event_is_repr_c with precise assertions: compute expected_total = 4096 +
header_bytes and assert std::mem::size_of::<PipeWriteEvent>() == expected_total
(and keep the <= 8192 guard), and in event_is_repr_c use
offset_of!(PipeWriteEvent, field) to assert exact byte offsets for pid, tgid,
cgroup_id, write_len, captured_len, stream, _pad and data plus assert data.len()
== MAX_CAPTURE_BYTES; ensure you reference PipeWriteEvent, event_struct_size,
event_is_repr_c and MAX_CAPTURE_BYTES, add a comment about requiring Rust 1.77+
for offset_of! and if MSRV is lower either gate the test behind cfg or implement
manual offset checks (e.g., using raw pointers and addr casts) so the ABI tests
remain strict and deterministic.

📥 Commits

Reviewing files that changed from the base of the PR and between 42e5629 and 0a94dae.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (6)
  • Cargo.toml
  • crates/logfwd-ebpf-proto/Cargo.toml
  • crates/logfwd-ebpf-proto/src/bpf/mod.rs
  • crates/logfwd-ebpf-proto/src/bpf/pipe_capture.rs
  • crates/logfwd-ebpf-proto/src/common.rs
  • crates/logfwd-ebpf-proto/src/lib.rs

@strawgate force-pushed the ebpf-prototype branch 2 times, most recently from 53bfc8f to b776a0c, on March 29, 2026 16:12
Working eBPF program + userspace loader for capturing container
stdout/stderr by intercepting vfs_write via kprobe.

Tested on GCE e2-small (kernel 6.17): zero data loss at 10K-1M lines,
80K events/sec throughput. See #32 for full results.

Contents:
- pipe-capture/ — standalone eBPF project (aya framework)
  - pipe-capture-ebpf/ — kernel program with PID exclusion + ring buffer
  - pipe-capture-common/ — shared PipeEvent struct (no_std)
  - src/main.rs — userspace loader
- Design docs and integration notes in logfwd-ebpf-proto crate

Build requires Linux + nightly Rust + bpfel-unknown-none target.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai (Bot) left a comment

Actionable comments posted: 7

♻️ Duplicate comments (3)
crates/logfwd-ebpf-proto/src/lib.rs (2)

193-221: ⚠️ Potential issue | 🟠 Major

Strengthen ABI tests with exact size and offsets.

Lines 193–221 allow broad size ranges and do not verify field offsets, so wire-format regressions can pass undetected.

Proposed fix
     #[test]
     fn event_struct_size() {
-        // Verify the event struct is a reasonable size for ring buffer events.
-        let size = std::mem::size_of::<PipeWriteEvent>();
-        // Header (pid, tgid, cgroup_id, write_len, captured_len, stream, pad)
-        // + 4096 bytes data + alignment padding.
-        assert!(
-            size >= 4096 + 20 && size <= 4096 + 64,
-            "unexpected event size: {size}"
-        );
+        let size = std::mem::size_of::<PipeWriteEvent>();
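+        // 28-byte header + 4096-byte data = 4124, rounded up to 4128 by the
+        // 8-byte alignment of cgroup_id (assuming it is a u64 under repr(C)).
+        assert_eq!(size, 4128, "unexpected event size: {size}");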
         assert!(size <= 8192, "must fit in a single ring buffer slot");
     }
@@
     fn event_is_repr_c() {
-        // Verify offsets are predictable (repr(C) layout).
-        let event = PipeWriteEvent {
-            pid: 0,
-            tgid: 0,
-            cgroup_id: 0,
-            write_len: 0,
-            captured_len: 0,
-            stream: 0,
-            _pad: [0; 3],
-            data: [0; MAX_CAPTURE_BYTES],
-        };
-        // Should be constructable with known layout.
-        assert_eq!(event.pid, 0);
-        assert_eq!(event.data.len(), MAX_CAPTURE_BYTES);
+        use core::mem::offset_of;
+        assert_eq!(offset_of!(PipeWriteEvent, pid), 0);
+        assert_eq!(offset_of!(PipeWriteEvent, tgid), 4);
+        assert_eq!(offset_of!(PipeWriteEvent, cgroup_id), 8);
+        assert_eq!(offset_of!(PipeWriteEvent, write_len), 16);
+        assert_eq!(offset_of!(PipeWriteEvent, captured_len), 20);
+        assert_eq!(offset_of!(PipeWriteEvent, stream), 24);
+        assert_eq!(offset_of!(PipeWriteEvent, _pad), 25);
+        assert_eq!(offset_of!(PipeWriteEvent, data), 28);
+        assert_eq!(MAX_CAPTURE_BYTES, 4096);
     }
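
A standalone sketch of the strict layout check follows. The struct is a hypothetical mirror whose field types are inferred from the offsets listed in the proposed fix (in particular, `cgroup_id` is assumed to be a `u64`); it is not copied from the crate. It requires Rust 1.77+ for `core::mem::offset_of!`. Note that the header (28 bytes) plus the payload (4096 bytes) sums to 4124, but on targets where `u64` is 8-byte aligned `repr(C)` rounds `size_of` up to 4128.

```rust
use core::mem::{align_of, offset_of};

pub const MAX_CAPTURE_BYTES: usize = 4096;

/// Hypothetical mirror of the event struct; field types are assumptions
/// derived from the offsets in the review, not the crate's definition.
#[repr(C)]
pub struct PipeWriteEventSketch {
    pub pid: u32,
    pub tgid: u32,
    pub cgroup_id: u64,
    pub write_len: u32,
    pub captured_len: u32,
    pub stream: u8,
    pub _pad: [u8; 3],
    pub data: [u8; MAX_CAPTURE_BYTES],
}

/// Number of header bytes that precede the payload.
pub const HEADER_BYTES: usize = offset_of!(PipeWriteEventSketch, data);

/// Header plus payload, rounded up to the struct's alignment, which is
/// what `size_of` reports for a `repr(C)` struct.
pub const fn expected_size() -> usize {
    let align = align_of::<PipeWriteEventSketch>();
    let unpadded = HEADER_BYTES + MAX_CAPTURE_BYTES;
    (unpadded + align - 1) / align * align
}
```

Asserting `size_of::<PipeWriteEventSketch>() == expected_size()` alongside the per-field offsets keeps the test exact without hard-coding a total that silently ignores tail padding.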
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/logfwd-ebpf-proto/src/lib.rs` around lines 193 - 221, The ABI tests
are too loose: tighten event_struct_size() to assert the exact size of
PipeWriteEvent (use std::mem::size_of::<PipeWriteEvent>() and assert_eq! against
the expected constant total size) and extend event_is_repr_c() to verify field
offsets and layout deterministically by asserting offsets for pid, tgid,
cgroup_id, write_len, captured_len, stream, and data (use a reliable offset
macro like memoffset::offset_of! or core::mem::offset_of! to compute offsets) and
assert the data array length equals MAX_CAPTURE_BYTES; reference the
PipeWriteEvent type, event_struct_size, and event_is_repr_c when adding these
exact size and offset checks.

82-90: ⚠️ Potential issue | 🟠 Major

Make the decode example safe-by-default.

Lines 88–89 demonstrate an unchecked raw-pointer cast and an unchecked slice length. Even in a `rust,ignore` doc example, this is hazardous guidance for a kernel/userspace ABI boundary.

Proposed fix
 ///         while let Some(raw) = self.ring_buf.next() {
-///             let event: &PipeWriteEvent = unsafe { &*(raw.as_ptr() as *const _) };
-///             let data = event.data[..event.captured_len as usize].to_vec();
+///             if raw.len() < core::mem::size_of::<PipeWriteEvent>() {
+///                 continue;
+///             }
+///             let event = unsafe { (raw.as_ptr() as *const PipeWriteEvent).read_unaligned() };
+///             let captured = (event.captured_len as usize).min(MAX_CAPTURE_BYTES);
+///             let data = event.data[..captured].to_vec();
 ///             events.push(InputEvent::Data { bytes: data });
 ///         }
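
The same idea can be written as a standalone function that rejects bad input instead of clamping it. This is a minimal sketch: the struct is a hypothetical mirror with assumed field types, and `DecodeError` and `decode_payload` are illustrative names that do not appear in the PR.

```rust
use core::mem::size_of;
use core::ptr;

pub const MAX_CAPTURE_BYTES: usize = 4096;

/// Hypothetical mirror of the event struct (field types are assumptions).
#[repr(C)]
pub struct PipeWriteEventSketch {
    pub pid: u32,
    pub tgid: u32,
    pub cgroup_id: u64,
    pub write_len: u32,
    pub captured_len: u32,
    pub stream: u8,
    pub _pad: [u8; 3],
    pub data: [u8; MAX_CAPTURE_BYTES],
}

#[derive(Debug, PartialEq, Eq)]
pub enum DecodeError {
    /// The ring buffer item is smaller than one event struct.
    TooShort { got: usize, need: usize },
    /// `captured_len` claims more bytes than the payload array holds.
    BadCapturedLen(u32),
}

/// Copy the captured payload out of a raw ring buffer item.
pub fn decode_payload(raw: &[u8]) -> Result<Vec<u8>, DecodeError> {
    let need = size_of::<PipeWriteEventSketch>();
    if raw.len() < need {
        return Err(DecodeError::TooShort { got: raw.len(), need });
    }
    // SAFETY: `raw` holds at least `need` initialized bytes (checked above),
    // every bit pattern is valid for these integer and byte-array fields,
    // and `read_unaligned` imposes no alignment requirement on the pointer.
    let event = unsafe { ptr::read_unaligned(raw.as_ptr() as *const PipeWriteEventSketch) };
    let captured = event.captured_len as usize;
    if captured > MAX_CAPTURE_BYTES {
        return Err(DecodeError::BadCapturedLen(event.captured_len));
    }
    Ok(event.data[..captured].to_vec())
}
```

Returning an error for an out-of-range `captured_len` surfaces a kernel/userspace layout mismatch early, whereas clamping with `.min(MAX_CAPTURE_BYTES)` would hide it; which behavior is preferable depends on how the caller wants to treat corrupt events.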
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/logfwd-ebpf-proto/src/lib.rs` around lines 82 - 90, The example in
EbpfInput::poll unsafely casts a raw ring buffer pointer to &PipeWriteEvent and
slices event.data using event.captured_len without validating bounds; change
this to a safe-by-default pattern: after obtaining raw = self.ring_buf.next(),
convert the pointer using a checked, documented unsafe block (e.g., use
std::ptr::read_unaligned or from_raw_parts only inside an unsafe block) then
validate that (event.captured_len as usize) <= event.data.len() before slicing;
if the length is invalid, handle it by returning an Err or skipping the event,
otherwise copy the bytes with a safe slice like &event.data[..cap].to_vec() and
push InputEvent::Data { bytes } (references: EbpfInput::poll, ring_buf.next(),
PipeWriteEvent, event.captured_len, InputEvent::Data).
crates/logfwd-ebpf-proto/src/bpf/pipe_capture.rs (1)

31-44: ⚠️ Potential issue | 🟡 Minor

Unify the filter strategy narrative in one canonical flow.

The doc still conflicts: Lines 31–44 require fd 1/2 filtering, while Lines 91–102 say to skip fd resolution and capture all watched-cgroup writes. Pick one strategy and align both sections to it.

Also applies to: 91-102
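
The two candidate flows can be contrasted with a small userspace model. This is only a sketch of the decision order: a `HashSet` stands in for the BPF hash map of watched cgroup IDs, the `cgroup_id` argument stands in for the value returned by `bpf_get_current_cgroup_id()`, and `Verdict`, `strategy_a`, and `strategy_b` are hypothetical names.

```rust
use std::collections::HashSet;

/// Outcome of the filter chain, named for illustration only.
#[derive(Debug, PartialEq, Eq)]
pub enum Verdict {
    Capture,
    SkipFd,
    SkipCgroup,
}

/// Strategy A: cheap fd test first (stdout = 1, stderr = 2), then the
/// watched-cgroup lookup.
pub fn strategy_a(fd: i32, cgroup_id: u64, watched: &HashSet<u64>) -> Verdict {
    if fd != 1 && fd != 2 {
        return Verdict::SkipFd;
    }
    if !watched.contains(&cgroup_id) {
        return Verdict::SkipCgroup;
    }
    Verdict::Capture
}

/// Strategy B: no fd restriction; any write from a watched cgroup is kept.
pub fn strategy_b(cgroup_id: u64, watched: &HashSet<u64>) -> Verdict {
    if watched.contains(&cgroup_id) {
        Verdict::Capture
    } else {
        Verdict::SkipCgroup
    }
}
```

Whichever flow is chosen, documenting it as a single ordered chain of early returns like this makes it easy to keep the prose and the probe code in step.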

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/logfwd-ebpf-proto/src/bpf/pipe_capture.rs` around lines 31 - 44, The
README/commentary currently describes two conflicting capture strategies: strict
fd filtering (only fd 1/2) plus optional pipe checks, and an alternative that
skips fd resolution and captures all writes from watched cgroups; reconcile them
by choosing one canonical flow and updating all mentions to match it. Decide
whether to enforce stdout/stderr filtering (fd 1/2) or to capture all writes in
watched cgroups, then edit the narrative blocks that reference fd 1/2,
bpf_get_current_cgroup_id(), the BPF hashmap of watched cgroup IDs, and the
optional pipe check so they consistently describe the chosen approach (e.g., if
choosing fd filtering, state explicitly that bpf_get_current_cgroup_id() +
watched-cgroup map further narrow processes but fd must be 1/2; if choosing
watched-cgroup-only, remove the fd 1/2 requirement and clarify skipping fd
resolution). Ensure mentions of "pipe check", "fd 1/2",
"bpf_get_current_cgroup_id()", and "watched-cgroup writes" are updated to a
single, non-conflicting description.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/logfwd-ebpf-proto/pipe-capture/pipe-capture-ebpf/Cargo.toml`:
- Around line 11-13: The binary name in this crate's Cargo.toml ([[bin]] name =
"pipe-capture") collides with the userspace loader binary; change the [[bin]]
name to a distinct identifier (e.g., "pipe-capture-ebpf") so build artifacts
don't conflict, keeping the path = "src/main.rs" unchanged; update any internal
references or docs that invoke the old binary name (search for "pipe-capture" in
this crate) to the new name.

In `@crates/logfwd-ebpf-proto/pipe-capture/pipe-capture-ebpf/src/main.rs`:
- Around line 26-29: The handler currently only checks EXCLUDE_PIDS and
therefore still captures vfs_write from the host; update the vfs_write probe
(the code paths where you check EXCLUDE_PIDS and perform event
reservation/creation) to first check the current task's cgroup against your
watched-cgroup set (e.g., a map like WATCHED_CGROUPS or a cgroup_id map) and
return immediately if it is not watched, before any event reservation or
allocation occurs; ensure this same early cgroup-filter check is added in the
other similar handler locations referenced around the EXCLUDE_PIDS usage so no
event reservation happens for non-watched cgroups.
- Around line 78-79: The assignment to (*event).len uses count (usize) cast to
u32 and can overflow silently; clamp count to u32::MAX before storing just like
capture_len is clamped to MAX_DATA: compute a clamped_len = core::cmp::min(count,
u32::MAX as usize) (or equivalent) and assign (*event).len = clamped_len as u32
while leaving (*event).captured = capture_len as u32 unchanged; update the same
function/block where count, capture_len, (*event).len and (*event).captured are
set.
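
The length handling described above can be sketched as a pure function. `MAX_DATA` comes from the PR; `clamp_lengths` is a hypothetical helper, and `core::cmp::min` is used because eBPF programs are built without `std`.

```rust
/// Maximum number of payload bytes copied per event (value from the PR).
pub const MAX_DATA: usize = 4096;

/// Returns `(len, captured)`: the original write size saturated to `u32`,
/// and the number of bytes actually copied, capped at `MAX_DATA`.
pub fn clamp_lengths(count: usize) -> (u32, u32) {
    // Saturate instead of truncating: a bare `count as u32` would wrap,
    // so a write of exactly 2^32 bytes would be reported as length 0.
    let len = core::cmp::min(count, u32::MAX as usize) as u32;
    let captured = core::cmp::min(count, MAX_DATA) as u32;
    (len, captured)
}
```

For example, `clamp_lengths(5000)` yields `(5000, 4096)`, so userspace can tell from `len > captured` that the event was truncated.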

In `@crates/logfwd-ebpf-proto/pipe-capture/README.md`:
- Around line 49-54: The fenced code block in the README's ASCII art lacks a
language tag; update the block opener for the diagram (the triple-backtick
before "Container write(stdout)" in the pipe-capture README) to use the text
language specifier (i.e., ```text) so the ASCII diagram is treated as plain text
by linters and renderers.

In `@crates/logfwd-ebpf-proto/pipe-capture/src/main.rs`:
- Around line 41-49: Replace the panic paths on map/program lookup so missing
artifacts return a typed error instead of aborting: handle the result of
ebpf.map_mut("EXCLUDE_PIDS") and ebpf.program_mut("pipe_write_probe") without
using unwrap/expect, propagate a descriptive error (e.g., with map_err or a
match that returns Err) when the map or program is not found, and only call
try_into() after a successful lookup; reference the EXCLUDE_PIDS map lookup and
the pipe_write_probe program retrieval (ebpf.map_mut and ebpf.program_mut) and
ensure the function returns a Result with the appropriate error rather than
panicking.
- Around line 66-70: Validate the ring buffer item size before doing the unsafe
cast to PipeEvent: check item.len() (or equivalent) is at least
size_of::<PipeEvent>() and matches expectations, then perform the unsafe
reinterpret; after reading ev.captured, enforce a bound (e.g., if ev.captured as
usize > PipeEvent::DATA_LEN or 4096) and return an error instead of panicking;
when writing use the Result from out.write_all(...) (and/or out.flush()) and
propagate or return the error instead of ignoring it. Reference: ring.next(),
PipeEvent, ev.captured, and out.write_all.
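
The lookup half of this advice might look like the following sketch. It deliberately avoids the real loader API: `LoadedObject` is a stand-in whose `HashMap`s play the role of the object's maps and programs, and `LoaderError` and `resolve` are hypothetical names. Only the map name `EXCLUDE_PIDS` and the program name `pipe_write_probe` come from the PR.

```rust
use std::collections::HashMap;
use std::fmt;

/// Typed errors returned instead of panicking on a missing artifact.
#[derive(Debug, PartialEq, Eq)]
pub enum LoaderError {
    MissingMap(&'static str),
    MissingProgram(&'static str),
}

impl fmt::Display for LoaderError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            LoaderError::MissingMap(name) => write!(f, "eBPF map `{name}` not found"),
            LoaderError::MissingProgram(name) => write!(f, "eBPF program `{name}` not found"),
        }
    }
}

impl std::error::Error for LoaderError {}

/// Stand-in for a loaded eBPF object; values are opaque handles.
pub struct LoadedObject {
    pub maps: HashMap<&'static str, u32>,
    pub programs: HashMap<&'static str, u32>,
}

/// Look up both artifacts, propagating a descriptive error with `?`.
pub fn resolve(obj: &LoadedObject) -> Result<(u32, u32), LoaderError> {
    let map = obj
        .maps
        .get("EXCLUDE_PIDS")
        .copied()
        .ok_or(LoaderError::MissingMap("EXCLUDE_PIDS"))?;
    let prog = obj
        .programs
        .get("pipe_write_probe")
        .copied()
        .ok_or(LoaderError::MissingProgram("pipe_write_probe"))?;
    Ok((map, prog))
}
```

With an `Option`-returning lookup, the same `ok_or(...)?` pattern replaces `unwrap`/`expect`, and any later conversion step runs only after the lookup has succeeded.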

In `@crates/logfwd-ebpf-proto/src/common.rs`:
- Around line 12-29: The PR introduces two diverging event
structs—PipeWriteEvent and PipeEvent—causing maintenance drift; decide which is
canonical and consolidate accordingly by either (A) updating PipeWriteEvent (the
struct in common.rs) to match the simpler PipeEvent shape used in
pipe-capture-common (remove stream/_pad and align field names and sizes) or (B)
standardizing pipe-capture-common to the richer PipeWriteEvent (keep stream,
_pad, captured_len/write_len semantics and MAX_CAPTURE_BYTES layout), then
update all serializers/deserializers and consumers to the chosen type (search
for PipeWriteEvent and PipeEvent in your codebase and make their field
names/types identical and add a short doc comment on the chosen canonical struct
to prevent future divergence).


📥 Commits

Reviewing files that changed from the base of the PR and between 0a94dae and 3a1ec42.

📒 Files selected for processing (12)
  • crates/logfwd-ebpf-proto/Cargo.toml
  • crates/logfwd-ebpf-proto/pipe-capture/Cargo.toml
  • crates/logfwd-ebpf-proto/pipe-capture/README.md
  • crates/logfwd-ebpf-proto/pipe-capture/pipe-capture-common/Cargo.toml
  • crates/logfwd-ebpf-proto/pipe-capture/pipe-capture-common/src/lib.rs
  • crates/logfwd-ebpf-proto/pipe-capture/pipe-capture-ebpf/Cargo.toml
  • crates/logfwd-ebpf-proto/pipe-capture/pipe-capture-ebpf/src/main.rs
  • crates/logfwd-ebpf-proto/pipe-capture/src/main.rs
  • crates/logfwd-ebpf-proto/src/bpf/mod.rs
  • crates/logfwd-ebpf-proto/src/bpf/pipe_capture.rs
  • crates/logfwd-ebpf-proto/src/common.rs
  • crates/logfwd-ebpf-proto/src/lib.rs

@strawgate strawgate merged commit 46786e2 into master Mar 29, 2026
1 of 2 checks passed
@strawgate strawgate deleted the ebpf-prototype branch March 29, 2026 16:22