Skip to content

sensor: use ebpf/es input names and Arrow-native path#1733

Merged
strawgate merged 24 commits into
mainfrom
codex/platform-sensor-beta-3os-ci-lanes
Apr 10, 2026
Merged

sensor: use ebpf/es input names and Arrow-native path#1733
strawgate merged 24 commits into
mainfrom
codex/platform-sensor-beta-3os-ci-lanes

Conversation

@strawgate

@strawgate strawgate commented Apr 9, 2026

Copy link
Copy Markdown
Owner

Summary

  • renamed platform sensor input types to:
    • linux_ebpf_sensor
    • windows_ebpf_sensor
    • macos_es_sensor
  • made sensor inputs Arrow-native only:
    • sensor inputs now reject format (no raw/json mode)
    • runtime sensor path no longer routes through FramedInput format decoding
  • removed heartbeat-row behavior from sensor emission path:
    • startup/control + optional periodic per-family sample rows remain
    • deprecated emit_heartbeat is accepted only as a no-op compatibility field
  • removed beta branding from code/docs/CI sensor surfaces:
    • renamed module platform_sensor_beta -> platform_sensor
    • updated CI workflow test selectors and changed paths

Compatibility

  • legacy config aliases remain accepted:
    • input types: linux_sensor_beta, macos_sensor_beta, windows_sensor_beta
    • block key: sensor_beta: (alias of sensor:)

Validation

  • cargo check -p logfwd
  • cargo test -p logfwd-io platform_sensor::tests -- --nocapture
  • cargo test -p logfwd-config sensor -- --nocapture
  • cargo test -p logfwd-runtime input_build::tests -- --nocapture
  • cargo test -p logfwd validate_pipelines_read_only_rejects_sensor_format -- --nocapture
  • cargo test -p logfwd validate_pipelines_read_only_accepts_arrow_native_sensor -- --nocapture

Note

Replace beta sensor inputs with Arrow-native linux_ebpf_sensor, macos_es_sensor, and windows_ebpf_sensor

  • Renames linux_sensor_beta, macos_sensor_beta, and windows_sensor_beta input types to linux_ebpf_sensor, macos_es_sensor, and windows_ebpf_sensor in config, runtime, and docs; legacy names remain accepted as serde aliases for backward compatibility.
  • Replaces PlatformSensorBetaInput (JSON heartbeat emitter) with a new PlatformSensorInput that emits Arrow RecordBatch rows directly, including control, capability, and signal sample rows on startup and at each poll interval.
  • Adds new sensor config fields: control_path, control_reload_interval_ms, enabled_families, and emit_signal_rows; validation rejects unknown families, zero intervals, empty paths, and any format setting (Arrow-native inputs do not support format).
  • Removes libc::isatty calls in logfwd and logfwd-output, replacing them with std::io::IsTerminal.
  • Risk: any pipeline that specifies a format field on a sensor input will now fail validation with an explicit error.

Macroscope summarized ffe2beb.

Copilot AI review requested due to automatic review settings April 9, 2026 06:11
@chatgpt-codex-connector

Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@coderabbitai

coderabbitai Bot commented Apr 9, 2026

Copy link
Copy Markdown

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds end-to-end beta platform sensor support and OS capability probes: three new input types (linux_sensor_beta, macos_sensor_beta, windows_sensor_beta) with config schema, validation, and pipeline integration; a new exported platform_sensor_beta module implementing a cross-platform Arrow-backed sensor input that emits heartbeat/control/signal batches; pipeline changes to construct and run sensor-beta inputs and treat their formats as raw by default; a README and reference docs for the beta inputs; a GitHub Actions workflow plus scripts/ci-ebpf-capabilities.sh to probe eBPF/OS capabilities and upload artifacts; and a workspace dependency-audit document. Several public enums/structs and one exported module were added.

Possibly related PRs


Caution

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

  • Ignore

❌ Failed checks (1 error, 2 warnings, 1 inconclusive)

Check name Status Explanation Resolution
High-Quality Rust Practices ❌ Error The review comment identified a real bug where emit_signal_rows is ignored when emit_heartbeat=false due to coupled conditions gating signal emission. Decouple heartbeat and signal-row emission logic: check emit_heartbeat only for the control row, then separately check poll_interval and emit_signal_rows for sample rows. Add regression test signal_rows_emit_without_heartbeat().
Documentation Thoroughly Updated ⚠️ Warning New public API items lack doc comments; public structs/enums in platform_sensor_beta.rs and config variants missing /// documentation. Add /// doc comments to all new public items (PlatformSensorTarget, PlatformSensorBetaConfig, PlatformSensorBetaInput, config variants) describing purpose and usage.
Maintainer Fitness ⚠️ Warning PR adds 1,716 lines without justification and leaves a decoupling bug unresolved: signal rows coupled to heartbeat condition, preventing emission when emit_heartbeat=false. Decouple signal row/heartbeat emission, add regression test, document justification for size, run benchmarks, and expand PR description with risk surface and limitations.
Formal Verification Coverage ❓ Inconclusive Referenced verification documentation does not exist in repository; no Kani infrastructure present. Confirm whether formal verification is required and provide or reference the missing guidance documents.
✅ Passed checks (1 passed)
Check name Status Explanation
Crate Boundary And Dependency Integrity ✅ Passed PR maintains strict crate boundary and dependency integrity with no new crates or external dependencies added, all validation logic properly placed in appropriate crates, and no unsafe code introduced.

Comment @coderabbitai help to get the list of available commands and usage tips.

Comment thread crates/logfwd-io/src/platform_sensor_beta.rs Outdated
@macroscopeapp

macroscopeapp Bot commented Apr 9, 2026

Copy link
Copy Markdown

Approvability

Verdict: Needs human review

This PR introduces significant new runtime behavior by replacing simple JSON heartbeat emission with a 1100+ line Arrow-native sensor implementation featuring a state machine, runtime control plane reload, and per-platform signal families. While backward-compatible config aliases are preserved, the fundamental change in what sensor inputs emit and how they behave at runtime warrants human review.

You can customize Macroscope's approvability policy. Learn more.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a dedicated, cross-platform CI workflow to probe/record platform-specific sensor capabilities (Linux eBPF, macOS EndpointSecurity/OpenBSM, Windows eBPF-for-Windows surfaces) and introduces new *_sensor_beta inputs with lightweight heartbeat/control-event emission plus corresponding config validation and documentation.

Changes:

  • Add eBPF Capabilities GitHub Actions workflow with explicit Linux/macOS/Windows lanes and artifacted capability snapshots.
  • Introduce linux_sensor_beta / macos_sensor_beta / windows_sensor_beta inputs end-to-end (config parsing/validation, pipeline wiring, and a new logfwd-io beta input source).
  • Update user docs (README + mdBook config reference) and add a dependency-structure audit research note.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
scripts/ci-ebpf-capabilities.sh Adds a Linux runner eBPF capability probe producing TSV/env/markdown summaries + raw outputs.
.github/workflows/ebpf-capabilities.yml Adds a scheduled/PR workflow with per-OS lanes and uploaded capability artifacts.
crates/logfwd-io/src/platform_sensor_beta.rs New beta input source emitting startup/heartbeat control events with platform gating.
crates/logfwd-io/src/lib.rs Exposes the new platform_sensor_beta module publicly.
crates/logfwd-config/src/lib.rs Adds new InputType variants and sensor_beta config + validation/tests.
crates/logfwd/src/pipeline.rs Wires new *_sensor_beta inputs into pipeline input creation and format validation.
crates/logfwd/src/main.rs Updates read-only format validation and input labeling for new input types.
book/src/config/reference.md Documents new beta inputs and supported fields.
README.md Adds a user-facing overview and example config for platform sensor betas.
dev-docs/research/dependency-structure-audit-2026-04-08.md Adds internal research notes on dependency/tooling/doc drift.

Comment thread crates/logfwd-io/src/platform_sensor_beta.rs Outdated
Comment thread crates/logfwd-io/src/platform_sensor_beta.rs Outdated
Comment thread scripts/ci-ebpf-capabilities.sh Outdated
Comment thread .github/workflows/ebpf-capabilities.yml Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/ebpf-capabilities.yml:
- Around line 188-190: The "Add macOS probe summary" step currently uses if:
always() and unconditionally cats artifacts/macos-sensor/summary.md which fails
when the probe step was skipped; update that step to first check for the file's
existence (e.g., test -f artifacts/macos-sensor/summary.md on macOS/Linux and
Test-Path on Windows) and only append the file to $GITHUB_STEP_SUMMARY if
present, otherwise append a small fallback message; apply the same
existence-check + fallback logic to the corresponding Windows/other summary
steps referenced around lines 284-287 so they don't error when the probe didn't
run.
- Around line 79-83: The workflow's tests only run crate-level tests for
logfwd-io and logfwd-config and therefore miss the full runtime lane implemented
in crates/logfwd/src/pipeline.rs; add a logfwd-level smoke per OS so the
pipeline and OS-gating are exercised end-to-end by either adding a targeted
cargo invocation such as "cargo test -p logfwd <specific_smoke_test_or_module>
-- --nocapture" for each platform job or by running a validate/effective-config
command against a platform-sensor config fixture (e.g., invoke the logfwd binary
or test helper that loads the fixture and runs the pipeline) so the workflow
covers the full runtime lane alongside logfwd-io and logfwd-config (apply the
same change to the other two similar job blocks mentioned).
- Around line 6-20: The workflow path filters currently omit runtime wiring
files that can affect sensor format checks; update both the push and
pull_request "paths" arrays in .github/workflows/ebpf-capabilities.yml to also
include crates/logfwd/src/main.rs and crates/logfwd/src/pipeline.rs so changes
to PlatformSensorBetaInput construction and the sensor format checks in
main.rs/pipeline.rs will trigger this workflow.

In `@crates/logfwd-config/src/lib.rs`:
- Around line 984-1026: The validation branch for InputType::LinuxSensorBeta |
InputType::MacosSensorBeta | InputType::WindowsSensorBeta must additionally
reject non-JSON formats up front so Config::load_str() fails fast: inside the
same branch (where you already check
generator/listen/path/max_open_files/glob_rescan_interval_ms and
sensor_beta.poll_interval_ms) add a check for input.format (e.g., ensure
input.format is Some(Format::Json) or the enum variant your code uses) and
return ConfigError::Validation with a clear message like "pipeline '{name}'
input '{label}': only JSON format is supported for *_sensor_beta inputs" if it's
not JSON; then add a regression test next to
sensor_beta_rejects_zero_poll_interval that constructs a *_sensor_beta input
with format: raw and asserts Config::load_str() returns a validation error.

In `@crates/logfwd-io/src/platform_sensor_beta.rs`:
- Around line 183-218: poll() takes self.machine with take(), so if
Init::start() returns Err the machine is lost and name() later panics; fix by
preserving/restoring state: before calling init.start() save init.common (or
clone needed fields) and call start() on a non-consuming borrow or, simpler,
don't take the machine—match on self.machine.as_mut() or temporarily replace it
then on Err put back PlatformSensorMachine::Init(init) into self.machine; update
PlatformSensorBetaInput::poll to restore self.machine on any Err from
init.start() or use as_mut() to avoid consuming Init so name() cannot observe
None.

In `@crates/logfwd/src/pipeline.rs`:
- Around line 1654-1663: The function build_platform_sensor_beta_config silently
clamps poll_interval_ms to 1 via poll_interval_ms.max(1); change it to detect
when clamping happens and emit a warning (e.g., using tracing::warn!) including
the original configured value and the clamped value, then use
Duration::from_millis(clamped); specifically, read cfg.and_then(|c|
c.poll_interval_ms) into an Option/variable original_poll, compute let clamped =
original_poll.unwrap_or(10_000).max(1), if original_poll.is_some() &&
original_poll.unwrap() == 0 { tracing::warn!(original = original_poll.unwrap(),
clamped = clamped, "poll_interval_ms was below minimum and was clamped to 1ms");
} and then construct PlatformSensorBetaConfig with poll_interval:
Duration::from_millis(clamped) (or alternatively add equivalent documentation to
CONFIG_REFERENCE.md if logging is not desired).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: ef8ee97f-a998-41ea-986e-d3719d4bc372

📥 Commits

Reviewing files that changed from the base of the PR and between 24c6596 and 210b27a.

📒 Files selected for processing (10)
  • .github/workflows/ebpf-capabilities.yml
  • README.md
  • book/src/config/reference.md
  • crates/logfwd-config/src/lib.rs
  • crates/logfwd-io/src/lib.rs
  • crates/logfwd-io/src/platform_sensor_beta.rs
  • crates/logfwd/src/main.rs
  • crates/logfwd/src/pipeline.rs
  • dev-docs/research/dependency-structure-audit-2026-04-08.md
  • scripts/ci-ebpf-capabilities.sh

Comment thread .github/workflows/ebpf-capabilities.yml Outdated
Comment thread .github/workflows/ebpf-capabilities.yml Outdated
Comment thread .github/workflows/ebpf-capabilities.yml Outdated
Comment thread crates/logfwd-config/src/lib.rs Outdated
Comment thread crates/logfwd-io/src/platform_sensor_beta.rs Outdated
Comment thread crates/logfwd/src/pipeline.rs Outdated

Copy link
Copy Markdown
Owner Author

Pushed follow-up commit b05b0b43 to make substantial control-plane + per-OS sensor progress in this PR.

What landed:

  • platform_sensor_beta now emits Arrow InputEvent::Batch rows only (no JSON Data events), with typed control/sensor columns.
  • Added runtime control-plane reload via optional sensor_beta.control_path + control_reload_interval_ms.
  • Added per-target family model and enforcement:
    • Linux: process,file,network,dns,authz
    • macOS: process,file,network,dns,module,authz
    • Windows: process,file,network,dns,module,registry,authz
  • Added OS-aware config validation for enabled_families with targeted error messages.
  • Pipeline wiring updates:
    • sensor beta defaults to format: raw
    • allows raw|json (rejects CRI/auto/etc)
    • bypasses FramedInput for sensor batch-native path
    • resolves relative sensor_beta.control_path against base_path
  • Control-plane bug fixes from review:
    • no repeated control_reload_applied spam when generation is present but unchanged
    • enabled_families: [] is now explicit “disable all families” (distinct from omission/defaults)

Validation run locally after push:

  • cargo test -p logfwd-config sensor_beta -- --nocapture
  • cargo test -p logfwd-io platform_sensor_beta::tests -- --nocapture
  • cargo check -p logfwd --all-targets
  • cargo test -p logfwd sensor_beta_formats_allow_raw_and_json_only -- --nocapture

Comment thread crates/logfwd/src/pipeline.rs Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/logfwd/src/pipeline.rs (1)

1632-1717: 🛠️ Refactor suggestion | 🟠 Major

Move the new sensor-beta runtime wiring out of the binary crate.

This branch adds more long-lived runtime orchestration to crates/logfwd, which is exactly the layer the repo wants kept to CLI/bootstrap only. Please push the sensor-beta builder/config-translation logic into logfwd-runtime (or the IO/runtime layer) and keep this file as thin wiring. As per coding guidelines, crates/logfwd/**/*.rs: Binary crate (logfwd) must contain only CLI/bootstrap logic — no long-lived runtime orchestration logic. Move orchestration to logfwd-runtime.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/logfwd/src/pipeline.rs` around lines 1632 - 1717, This change wires
sensor-beta runtime orchestration into the binary crate; move the
builder/translation logic for sensor-beta into the runtime/io crate and keep
pipeline.rs as thin wiring: extract build_platform_sensor_beta_config (and any
validation like is_sensor_beta_input_type usage and the creation of
PlatformSensorBetaInput::new) into a new public builder/util in logfwd-runtime
(or logfwd-io) that returns a ready-to-use PlatformSensorBetaConfig and an
initialized source (or a factory function that returns Box<dyn InputSource>,
format and buf_cap), then in pipeline.rs call that runtime API to receive
raw_source/format/buf_cap and only perform the minimal wiring to wrap with
make_format/FramedInput and return InputState; update imports and error mapping
in pipeline.rs to delegate initialization errors to the runtime helper.
♻️ Duplicate comments (2)
crates/logfwd-config/src/lib.rs (1)

994-1078: ⚠️ Potential issue | 🟠 Major

Reject unsupported *_sensor_beta formats here too.

This branch validates sensor-beta fields, but it still lets format: cri, auto, text, etc. through because the only global input-format rejection is logfmt/syslog. That means Config::load_str() can still accept configs that the logfwd runtime later refuses, even though the sensor beta lane now only supports raw or json.

Suggested fix
                     InputType::LinuxSensorBeta
                     | InputType::MacosSensorBeta
                     | InputType::WindowsSensorBeta => {
+                        if let Some(fmt) = &input.format
+                            && !matches!(fmt, Format::Raw | Format::Json)
+                        {
+                            return Err(ConfigError::Validation(format!(
+                                "pipeline '{name}' input '{label}': {} inputs only support format raw or json",
+                                input.input_type
+                            )));
+                        }
                         if input.generator.is_some() {
                             return Err(ConfigError::Validation(format!(
                                 "pipeline '{name}' input '{label}': 'generator' settings are only supported for generator inputs"
                             )));
                         }

Also applies to: 1089-1094

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/logfwd-config/src/lib.rs` around lines 994 - 1078, In the
InputType::LinuxSensorBeta | MacosSensorBeta | WindowsSensorBeta validation
branch (around the sensor_beta checks), also reject inputs whose format is not
"raw" or "json": inspect input.format (or input.format.as_deref()) and if
Some(fmt) and fmt != "raw" && fmt != "json" return ConfigError::Validation with
a message like "pipeline '{name}' input '{label}': sensor_beta inputs only
support 'raw' or 'json' format". Apply the same check in the other sensor_beta
branch referenced (the one at lines ~1089-1094) so both sensor_beta input types
consistently validate supported formats; keep references to
InputType::LinuxSensorBeta, InputType::MacosSensorBeta,
InputType::WindowsSensorBeta, sensor_beta, validate_sensor_beta_family_name and
sensor_beta_supported_families_csv to locate the code.
.github/workflows/ebpf-capabilities.yml (1)

83-87: ⚠️ Potential issue | 🟠 Major

The logfwd smoke still stops at format validation.

cargo test -p logfwd sensor_beta_formats_allow_raw_and_json_only only exercises validate_input_format() in crates/logfwd/src/pipeline.rs:1767-1780; it never builds the sensor input path. Regressions in control-path resolution, default raw, or the framing bypass can still leave all three lanes green. Add one fixture-driven logfwd smoke per OS that reaches pipeline/input construction.

Also applies to: 136-138, 236-238

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/ebpf-capabilities.yml around lines 83 - 87, The current CI
step only runs the unit test sensor_beta_formats_allow_raw_and_json_only which
exercises validate_input_format() in crates/logfwd/src/pipeline.rs but does not
construct the pipeline input path; add (per-OS) a fixture-driven smoke test
invocation that exercises pipeline/input construction (e.g., a test that calls
the pipeline input builder or the higher-level integration test that creates a
sensor input) so regressions in control-path resolution, default raw framing, or
framing bypass are caught; update the .github/workflows/ebpf-capabilities.yml
job to run that new/alternative logfwd test (in addition to
sensor_beta_formats_allow_raw_and_json_only) for each OS instead of relying
solely on sensor_beta_formats_allow_raw_and_json_only.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/ebpf-capabilities.yml:
- Around line 6-13: Update the workflow's push.paths and pull_request.paths
lists to include the logfwd-io lib file that exports pub mod
platform_sensor_beta so changes to that module re-trigger the ebpf-capabilities
workflow; specifically, add an entry for the logfwd-io lib (the file that
contains "pub mod platform_sensor_beta") to both push.paths and
pull_request.paths under the ebpf-capabilities workflow (the keys to edit are
push.paths and pull_request.paths).

In `@crates/logfwd-config/src/lib.rs`:
- Around line 1379-1381: Rename the boolean helper
validate_sensor_beta_family_name to a predicate-style name (e.g.
is_sensor_beta_family_supported) and update all call sites and any
documentation/comments to use the new name; change the function
signature/function identifier from validate_sensor_beta_family_name to
is_sensor_beta_family_supported while keeping the implementation (calling
sensor_beta_supported_families(input_type).contains(&name)) unchanged to
preserve behavior.
- Around line 1359-1377: The platform-specific family lists in
sensor_beta_supported_families and sensor_beta_supported_families_csv are
duplicated from the runtime matrix and must be unified: introduce a single
shared source (e.g., a const map/array per InputType like SENSOR_BETA_FAMILIES
or a function get_sensor_beta_families) and update
sensor_beta_supported_families to return references into that shared constant,
and update sensor_beta_supported_families_csv to derive the CSV string by
joining the same shared data rather than hardcoding strings; reference
InputType, sensor_beta_supported_families, and
sensor_beta_supported_families_csv when making the change so both functions use
the single definition.

In `@crates/logfwd-io/src/platform_sensor_beta.rs`:
- Around line 29-37: Add a triple-slash doc comment for the public method
PlatformSensorTarget::as_str describing that it returns a stable, lowercase
string identifier ("linux", "macos", or "windows") for the target platform and
instruct callers to use it when a stable textual platform key is required (e.g.,
for config keys, logging, or interop with external APIs), including that the
returned &'static str is stable and should not be parsed for other semantics;
follow existing doc style and place the comment directly above the pub const fn
as_str declaration.

In `@crates/logfwd/src/pipeline.rs`:
- Around line 340-353: The code currently calls std::fs::canonicalize(...) on
sensor_cfg.control_path which resolves symlinks and pins the watcher to the
symlink target; stop resolving symlinks and instead make the path absolute by
joining with base when relative (keep the joined PathBuf as-is) and assign
sensor_cfg.control_path = Some(path.to_string_lossy().into_owned()) directly; in
other words remove the std::fs::canonicalize(...) branch and simply set
control_path from the computed PathBuf (after the relative -> base.join(...)
adjustment) so symlinked control files are preserved (referencing
resolved_cfg.sensor_beta, sensor_cfg.control_path and the std::fs::canonicalize
call to locate the change).

---

Outside diff comments:
In `@crates/logfwd/src/pipeline.rs`:
- Around line 1632-1717: This change wires sensor-beta runtime orchestration
into the binary crate; move the builder/translation logic for sensor-beta into
the runtime/io crate and keep pipeline.rs as thin wiring: extract
build_platform_sensor_beta_config (and any validation like
is_sensor_beta_input_type usage and the creation of
PlatformSensorBetaInput::new) into a new public builder/util in logfwd-runtime
(or logfwd-io) that returns a ready-to-use PlatformSensorBetaConfig and an
initialized source (or a factory function that returns Box<dyn InputSource>,
format and buf_cap), then in pipeline.rs call that runtime API to receive
raw_source/format/buf_cap and only perform the minimal wiring to wrap with
make_format/FramedInput and return InputState; update imports and error mapping
in pipeline.rs to delegate initialization errors to the runtime helper.

---

Duplicate comments:
In @.github/workflows/ebpf-capabilities.yml:
- Around line 83-87: The current CI step only runs the unit test
sensor_beta_formats_allow_raw_and_json_only which exercises
validate_input_format() in crates/logfwd/src/pipeline.rs but does not construct
the pipeline input path; add (per-OS) a fixture-driven smoke test invocation
that exercises pipeline/input construction (e.g., a test that calls the pipeline
input builder or the higher-level integration test that creates a sensor input)
so regressions in control-path resolution, default raw framing, or framing
bypass are caught; update the .github/workflows/ebpf-capabilities.yml job to run
that new/alternative logfwd test (in addition to
sensor_beta_formats_allow_raw_and_json_only) for each OS instead of relying
solely on sensor_beta_formats_allow_raw_and_json_only.

In `@crates/logfwd-config/src/lib.rs`:
- Around line 994-1078: In the InputType::LinuxSensorBeta | MacosSensorBeta |
WindowsSensorBeta validation branch (around the sensor_beta checks), also reject
inputs whose format is not "raw" or "json": inspect input.format (or
input.format.as_deref()) and if Some(fmt) and fmt != "raw" && fmt != "json"
return ConfigError::Validation with a message like "pipeline '{name}' input
'{label}': sensor_beta inputs only support 'raw' or 'json' format". Apply the
same check in the other sensor_beta branch referenced (the one at lines
~1089-1094) so both sensor_beta input types consistently validate supported
formats; keep references to InputType::LinuxSensorBeta,
InputType::MacosSensorBeta, InputType::WindowsSensorBeta, sensor_beta,
validate_sensor_beta_family_name and sensor_beta_supported_families_csv to
locate the code.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2779166c-f2d7-490a-b66d-bdd7b5587210

📥 Commits

Reviewing files that changed from the base of the PR and between 210b27a and 6dd12ba.

📒 Files selected for processing (5)
  • .github/workflows/ebpf-capabilities.yml
  • crates/logfwd-config/src/lib.rs
  • crates/logfwd-io/src/platform_sensor_beta.rs
  • crates/logfwd/src/pipeline.rs
  • scripts/ci-ebpf-capabilities.sh

Comment thread .github/workflows/ebpf-capabilities.yml Outdated
Comment thread crates/logfwd-config/src/lib.rs Outdated
Comment thread crates/logfwd-config/src/lib.rs Outdated
Comment thread crates/logfwd-io/src/platform_sensor_beta.rs
Comment thread crates/logfwd/src/pipeline.rs Outdated

Copy link
Copy Markdown
Owner Author

Did a deeper branch-wide audit pass and pushed follow-up commit 0270d7e.

What I found and fixed:

  • main.rs read-only validation path was still treating *_sensor_beta inputs as JSON-only.
    • Runtime pipeline now allows and defaults these inputs to format: raw.
    • Fixed read-only default format + validation to accept raw|json for sensor beta inputs.
    • Added regression test: validate_pipelines_read_only_accepts_sensor_beta_raw.
  • Updated docs to match current behavior:
    • README sensor-beta example now uses format: raw and Arrow-native wording.
    • Config reference now states raw (default) or json for platform sensor betas and documents control-plane fields (control_path, control_reload_interval_ms, enabled_families, emit_signal_rows).

Verification run after this commit:

  • cargo fmt --all
  • cargo check -p logfwd --all-targets
  • cargo test -p logfwd-config sensor_beta -- --nocapture
  • cargo test -p logfwd-io platform_sensor_beta::tests -- --nocapture
  • cargo test -p logfwd validate_pipelines_read_only_rejects_otlp_raw -- --nocapture
  • cargo test -p logfwd validate_pipelines_read_only_accepts_sensor_beta_raw -- --nocapture

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@book/src/config/reference.md`:
- Around line 204-222: Change the heading paragraph for the windows_sensor_beta
input so its opening sentence matches the other sensor sections: replace "This
input only supports `format: raw` (default) or `format: json`." with the
consistent phrasing "This input supports `format: raw` (default) or `format:
json`." in the windows_sensor_beta input block to align wording with the other
sensor types.

In `@crates/logfwd/src/main.rs`:
- Around line 1959-1974: Add additional unit tests alongside
validate_pipelines_read_only_accepts_sensor_beta_raw to cover the full
sensor-beta platform/format matrix: create separate tests that call
validate_pipelines_read_only with yaml inputs for macos_sensor_beta+raw,
windows_sensor_beta+raw, and at least one sensor-beta (linux/macos/windows) with
format json to assert Ok. Use one test per behavior with descriptive test names
(e.g., validate_pipelines_read_only_accepts_macos_sensor_beta_raw,
validate_pipelines_read_only_accepts_windows_sensor_beta_raw,
validate_pipelines_read_only_accepts_sensor_beta_json) and reuse the same call
pattern to validate_pipelines_read_only(&config, None, |_name| {}, |_err| {}).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: d7a33c44-f73d-407c-8f72-b9b502ee7b01

📥 Commits

Reviewing files that changed from the base of the PR and between 6dd12ba and 0270d7e.

📒 Files selected for processing (3)
  • README.md
  • book/src/config/reference.md
  • crates/logfwd/src/main.rs

Comment thread book/src/config/reference.md Outdated
Comment thread crates/logfwd/src/main.rs
Comment thread crates/logfwd-io/src/platform_sensor_beta.rs
@strawgate

Copy link
Copy Markdown
Owner Author

Why do we have a raw and a json mode?

Let's also remove beta from the names and we'll need to figure out how to test in CI

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (2)
crates/logfwd/src/main.rs (1)

1959-1982: 🧹 Nitpick | 🔵 Trivial

Split this matrix test and add a json case.

This still leaves validate_pipelines_read_only(..., format: json) untested, and the loop makes failures less localizable than separate scenario tests.

As per coding guidelines: "Write one test per behavior with descriptive names describing the scenario (e.g., empty_input_returns_none). Test names should make the expected behavior clear without reading the test body."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/logfwd/src/main.rs` around lines 1959 - 1982, Split the existing
matrix test validate_pipelines_read_only_accepts_sensor_beta_raw into individual
tests for each sensor_beta platform and add a separate test for the json format;
specifically create one test per behavior (e.g.,
validate_pipelines_read_only_accepts_linux_sensor_beta_raw,
validate_pipelines_read_only_accepts_macos_sensor_beta_raw,
validate_pipelines_read_only_accepts_windows_sensor_beta_raw) that each build
the YAML with format: raw and call logfwd_config::Config::load_str(...) and
validate_pipelines_read_only(&config, None, |_name| {}, |_err| {}) asserting
is_ok(), and add another test (e.g.,
validate_pipelines_read_only_accepts_sensor_beta_json) that uses format: json
and asserts the same; this makes failures localizable and ensures the json case
is covered.
crates/logfwd-config/src/lib.rs (1)

994-1077: ⚠️ Potential issue | 🟠 Major

Reject non-raw/json formats for *_sensor_beta during config load.

This branch still lets cri, auto, text, and console through because the only later format gate rejects logfmt/syslog. That means Config::load_str() can accept sensor-beta configs the runtime is documented to refuse.

Suggested fix
                     InputType::LinuxSensorBeta
                     | InputType::MacosSensorBeta
                     | InputType::WindowsSensorBeta => {
+                        if let Some(fmt) = &input.format
+                            && !matches!(fmt, Format::Raw | Format::Json)
+                        {
+                            return Err(ConfigError::Validation(format!(
+                                "pipeline '{name}' input '{label}': {} inputs only support format raw or json",
+                                input.input_type
+                            )));
+                        }
                         if input.generator.is_some() {
                             return Err(ConfigError::Validation(format!(
                                 "pipeline '{name}' input '{label}': 'generator' settings are only supported for generator inputs"
                             )));
                         }

Also applies to: 1088-1093

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/logfwd-config/src/lib.rs` around lines 994 - 1077, The validation for
InputType::LinuxSensorBeta|MacosSensorBeta|WindowsSensorBeta currently doesn't
reject unsupported output formats early; update the sensor-beta branch in the
validator (the match arm handling those InputType variants and the sensor_beta
field) to check input.format (or however format is stored) and return
ConfigError::Validation if the format is not "raw" or "json" (i.e., disallow
"cri", "auto", "text", "console", etc.) so Config::load_str() fails for
sensor-beta inputs with non-raw/json formats; apply the same check in the
sibling branch around the later sensor-beta validation (the second occurrence
mentioned for lines 1088-1093).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@book/src/config/reference.md`:
- Around line 167-174: Update the documentation for sensor_beta.enabled_families
to explicitly state that an empty array is a valid override that disables all
signal families (i.e., [] means “no families enabled” rather than falling back
to defaults); modify the existing entry for sensor_beta.enabled_families and the
other duplicated occurrences of this field so they each mention this behavior
and the possible values (e.g., process,file,network,dns,authz) and that omitting
the field uses the default set while providing [] disables everything.

In `@crates/logfwd-io/src/platform_sensor_beta.rs`:
- Around line 582-586: The constructor new() currently calls
read_control_file(cfg.control_path) and returns Err on parse/IO failures, so a
malformed or partial control file prevents the input from starting; change new()
to treat failures from read_control_file(path) as non-fatal: when
cfg.control_path.is_some() and read_control_file(...) returns an Err, log or
record a control_reload_failed-style event but keep using the static control
value (do not replace control or return Err), and schedule/allow subsequent
reload attempts to surface the parsing error via the existing reload/error path
(i.e., leave control unchanged and only apply apply_control_file(control,
file_cfg, target) when read_control_file succeeds). Ensure the code references
cfg.control_path, read_control_file, apply_control_file and the new()
initialization path so the startup no longer fails on bad control files.
- Around line 983-1010: The test
control_file_reload_updates_generation_and_families uses raw tempfile and
std::fs::write to create and modify control files; replace those direct
filesystem calls by using the crate's shared tempfiles test helpers (the
tempfiles module) to create the control file and write its contents so tests go
through the standardized helpers; update the creation of control_path and the
write step (used with PlatformSensorBetaInput::new, control_path, and the
subsequent write/sleep) to call the appropriate tempfile helper functions
instead of tempfile::tempdir and std::fs::write, and ensure any helper cleanup
or sync semantics the helper provides are used in place of the explicit sleep.

In `@crates/logfwd/src/pipeline.rs`:
- Around line 1774-1787: Split the single test
sensor_beta_formats_allow_raw_and_json_only into three focused tests, each
asserting one behavior: (1) sensor_beta_accepts_raw should call
validate_input_format("sensor", InputType::LinuxSensorBeta, &Format::Raw) and
assert is_ok(), (2) sensor_beta_accepts_json should call
validate_input_format("sensor", InputType::WindowsSensorBeta, &Format::Json) and
assert is_ok(), and (3) sensor_beta_rejects_cri should call
validate_input_format("sensor", InputType::MacosSensorBeta, &Format::Cri) and
assert expect_err with a message containing "expected raw or json"; keep the
same validate_input_format and InputType variants but give each test a
descriptive name so failures point to the exact behavior.

---

Duplicate comments:
In `@crates/logfwd-config/src/lib.rs`:
- Around line 994-1077: The validation for
InputType::LinuxSensorBeta|MacosSensorBeta|WindowsSensorBeta currently doesn't
reject unsupported output formats early; update the sensor-beta branch in the
validator (the match arm handling those InputType variants and the sensor_beta
field) to check input.format (or however format is stored) and return
ConfigError::Validation if the format is not "raw" or "json" (i.e., disallow
"cri", "auto", "text", "console", etc.) so Config::load_str() fails for
sensor-beta inputs with non-raw/json formats; apply the same check in the
sibling branch around the later sensor-beta validation (the second occurrence
mentioned for lines 1088-1093).

In `@crates/logfwd/src/main.rs`:
- Around line 1959-1982: Split the existing matrix test
validate_pipelines_read_only_accepts_sensor_beta_raw into individual tests for
each sensor_beta platform and add a separate test for the json format;
specifically create one test per behavior (e.g.,
validate_pipelines_read_only_accepts_linux_sensor_beta_raw,
validate_pipelines_read_only_accepts_macos_sensor_beta_raw,
validate_pipelines_read_only_accepts_windows_sensor_beta_raw) that each build
the YAML with format: raw and call logfwd_config::Config::load_str(...) and
validate_pipelines_read_only(&config, None, |_name| {}, |_err| {}) asserting
is_ok(), and add another test (e.g.,
validate_pipelines_read_only_accepts_sensor_beta_json) that uses format: json
and asserts the same; this makes failures localizable and ensures the json case
is covered.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: 3565b579-30cc-421e-8a3b-f51442c3420e

📥 Commits

Reviewing files that changed from the base of the PR and between 0270d7e and 753222d.

📒 Files selected for processing (6)
  • .github/workflows/ebpf-capabilities.yml
  • book/src/config/reference.md
  • crates/logfwd-config/src/lib.rs
  • crates/logfwd-io/src/platform_sensor_beta.rs
  • crates/logfwd/src/main.rs
  • crates/logfwd/src/pipeline.rs

Comment thread book/src/config/reference.md Outdated
Comment thread crates/logfwd-io/src/platform_sensor_beta.rs Outdated
Comment thread crates/logfwd-io/src/platform_sensor.rs
Comment thread crates/logfwd/src/pipeline.rs Outdated
coderabbitai[bot]
coderabbitai Bot previously requested changes Apr 9, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
crates/logfwd-io/src/platform_sensor_beta.rs (2)

582-586: ⚠️ Potential issue | 🟠 Major

Don't make an optional control file a startup dependency.

Lines 582-586 still fail new() on malformed or partially-written control JSON. That makes the optional control plane able to prevent the input from starting, even though the reload path already models the same condition as control_reload_failed. Fall back to the static config at startup and surface the file error on reload instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/logfwd-io/src/platform_sensor_beta.rs` around lines 582 - 586, The
startup path in new() currently propagates errors from read_control_file(path)
and can abort startup; change the logic so cfg.control_path.as_ref() &&
read_control_file only applies when read_control_file returns Ok(file_cfg)
(i.e., do not ?-propagate read_control_file errors from new()). Instead, on
startup fall back to the static control value and defer surfacing file
parsing/IO errors to the reload path (emit a control_reload_failed/log the error
or set the same error state used by reload). Update the conditional around
cfg.control_path, read_control_file, and apply_control_file so malformed/partial
control JSON does not cause new() to fail but still gets reported via the
existing reload/error handling.

987-1014: 🛠️ Refactor suggestion | 🟠 Major

Use the shared tempfiles helpers in these file-I/O tests.

These tests still create temp dirs/files with tempfile::tempdir and std::fs::write directly. In logfwd-io, file-I/O tests are supposed to go through the crate’s tempfiles helpers instead. As per coding guidelines, In logfwd-io, tests must use tempfiles for file I/O testing — no raw payload injection. See #1615.

Also applies to: 1046-1073

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/logfwd-io/src/platform_sensor_beta.rs` around lines 987 - 1014, The
test control_file_reload_updates_generation_and_families uses
tempfile::tempdir() and std::fs::write for creating the control file; replace
those raw file-I/O calls with the crate's shared tempfiles helpers (use the
crate's temp dir/file creation and write helpers) so the control file used by
PlatformSensorBetaInput is created via the tempfiles utilities instead of
tempfile::tempdir and std::fs::write; update the control_path initialization and
the write step in control_file_reload_updates_generation_and_families (and the
similar block around the other lines noted) to call the tempfiles helper APIs
that create a temp directory and write the JSON payload into control_path.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/logfwd-io/src/platform_sensor_beta.rs`:
- Around line 279-293: The periodic signal-sample emission is currently gated by
self.state.control.emit_heartbeat; change the logic in the emission block so
that signal samples (self.common.signal_sample_rows call) are evaluated and
emitted independently when emit_signal_rows is enabled and last_emit.elapsed()
>= self.common.cfg.poll_interval, while heartbeat rows (self.common.control_row)
remain gated by emit_heartbeat; update self.state.last_emit when either path
emits to avoid silence, and add a regression test covering emit_heartbeat =
false with emit_signal_rows = true to verify samples continue after startup.

---

Duplicate comments:
In `@crates/logfwd-io/src/platform_sensor_beta.rs`:
- Around line 582-586: The startup path in new() currently propagates errors
from read_control_file(path) and can abort startup; change the logic so
cfg.control_path.as_ref() && read_control_file only applies when
read_control_file returns Ok(file_cfg) (i.e., do not ?-propagate
read_control_file errors from new()). Instead, on startup fall back to the
static control value and defer surfacing file parsing/IO errors to the reload
path (emit a control_reload_failed/log the error or set the same error state
used by reload). Update the conditional around cfg.control_path,
read_control_file, and apply_control_file so malformed/partial control JSON does
not cause new() to fail but still gets reported via the existing reload/error
handling.
- Around line 987-1014: The test
control_file_reload_updates_generation_and_families uses tempfile::tempdir() and
std::fs::write for creating the control file; replace those raw file-I/O calls
with the crate's shared tempfiles helpers (use the crate's temp dir/file
creation and write helpers) so the control file used by PlatformSensorBetaInput
is created via the tempfiles utilities instead of tempfile::tempdir and
std::fs::write; update the control_path initialization and the write step in
control_file_reload_updates_generation_and_families (and the similar block
around the other lines noted) to call the tempfiles helper APIs that create a
temp directory and write the JSON payload into control_path.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: dac1abb1-3077-4268-9744-221b80bb4906

📥 Commits

Reviewing files that changed from the base of the PR and between 753222d and 8b2b75c.

📒 Files selected for processing (1)
  • crates/logfwd-io/src/platform_sensor_beta.rs

Comment thread crates/logfwd-io/src/platform_sensor_beta.rs Outdated
@strawgate strawgate changed the title ci: add platform-specific sensor capability lanes ci: stabilize cross-platform sensor lanes and resolve review feedback Apr 9, 2026
Comment thread crates/logfwd-runtime/src/pipeline/input_build.rs
@strawgate strawgate changed the title ci: stabilize cross-platform sensor lanes and resolve review feedback sensor: use ebpf/es input names and Arrow-native path Apr 9, 2026
…-beta-3os-ci-lanes

# Conflicts:
#	.github/workflows/ebpf-capabilities.yml
#	crates/logfwd-config/src/lib.rs
#	crates/logfwd-config/src/validate.rs
#	crates/logfwd-runtime/src/pipeline/input_build.rs
#	scripts/ci-ebpf-capabilities.sh
@github-actions github-actions Bot dismissed coderabbitai[bot]’s stale review April 10, 2026 03:49

Auto-dismissed because every review thread opened from this change request is now resolved. If additional changes are still required, please leave a new review.

@strawgate strawgate merged commit ad0261c into main Apr 10, 2026
22 of 24 checks passed
@strawgate strawgate deleted the codex/platform-sensor-beta-3os-ci-lanes branch April 10, 2026 04:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants