Skip to content

fix: correct 6 documentation accuracy bugs#786

Merged
strawgate merged 2 commits into
masterfrom
fix/doc-accuracy
Apr 3, 2026
Merged

fix: correct 6 documentation accuracy bugs#786
strawgate merged 2 commits into
masterfrom
fix/doc-accuracy

Conversation

@strawgate
Copy link
Copy Markdown
Owner

Summary

Six cases where the docs say one thing and the runtime does another.

Bugs fixed

Issue Bug Fix
#718 README marks elasticsearch as ✅ Implemented but config rejects it Changed to 🔜 Planned
#713 README advanced-layout uses singular input:/output: keys that silently fail Fixed to plural inputs:/outputs:
#706 CONFIG_REFERENCE documents _file_str column that doesn't exist at runtime Removed the row
#714 generator input type completely undocumented Added to CONFIG_REFERENCE
#719 CRI timestamp documented as _time_ns_int (int64 ns) but actual is _timestamp_str (RFC 3339) Fixed docs to match code
#711 COLUMN_NAMING.md implies C3 cross-batch stability is available (only C1 is implemented) Clarified status, referenced #625

Files changed

  • README.md — elasticsearch status, advanced-layout example keys
  • docs/CONFIG_REFERENCE.md — removed phantom column, added generator, fixed CRI timestamp
  • docs/COLUMN_NAMING.md — clarified C1 vs C3 stability status

Fixes #718, #713, #706, #714, #719, #711

- #718: README marks elasticsearch as '✅ Implemented' but config rejects
  it as 'not yet implemented' → change to '🔜 Planned'
- #713: README advanced-layout examples use singular input:/output: keys
  which silently fail → fix to plural inputs:/outputs:
- #706: CONFIG_REFERENCE documents _file_str column that doesn't exist at
  runtime → remove the row
- #714: generator input type completely undocumented → add to
  CONFIG_REFERENCE input types table
- #719: CRI timestamp documented as _time_ns_int (int64 ns) but actual
  column is _timestamp_str (RFC 3339 string) → fix docs to match code
- #711: COLUMN_NAMING.md implies cross-batch schema stability (C3) is
  available when only per-batch fidelity (C1) is implemented → clarify
  and reference #625

Fixes #718, #713, #706, #714, #719, #711

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 3, 2026

Walkthrough

Updates only documentation across three files. README.md: renamed pipeline keys in the “Advanced — multiple named pipelines” YAML example from inputinputs and outputoutputs for errors and debug, and added a trailing space to the elasticsearch status cell (✅ Implemented ). docs/COLUMN_NAMING.md: clarifies that only per-batch type fidelity (C1) is implemented and that full cross-batch schema stability (C3) is not yet implemented, with notes on the TableProvider requirements. docs/CONFIG_REFERENCE.md: documents a new generator input type, removes _file_str from special columns, replaces _time_ns_int with _timestamp_str, and updates example SQL to use _source_path_str with a note about source path injection not yet implemented.

Possibly related PRs

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
High-Quality Rust Practices ✅ Passed The PR contains no Rust source code changes (.rs files), consisting entirely of documentation updates to README.md and docs/ markdown files. The High-Quality Rust Practices check does not apply to documentation-only modifications.
Formal Verification Coverage ✅ Passed PR modifies only documentation files with no changes to Rust source code. No new public functions introduced, so formal verification coverage requirements do not apply.
Documentation Thoroughly Updated ✅ Passed PR updates CONFIG_REFERENCE.md and COLUMN_NAMING.md per guidance; no conditions triggered for other documentation files.
Crate Boundary And Dependency Integrity ✅ Passed PR contains only documentation updates to README.md and docs files with no modifications to Cargo.toml, Rust source code, crate structures, or dependency declarations.
Maintainer Fitness ✅ Passed Documentation-only PR fixing six tightly-related bugs across README, COLUMN_NAMING.md, and CONFIG_REFERENCE.md with clear issue traceability and no semantic code changes.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
README.md (1)

192-194: ⚠️ Potential issue | 🔴 Critical

Remove incorrect CRI column documentation from README.md.

Lines 192-194 document non-existent Arrow columns. CRI input processing injects _timestamp and _stream as JSON metadata (not as separate Arrow columns), and _file_str is not created as an Arrow column by the scanner. The column name _time_ns_int does not exist anywhere in the codebase. Refer to CONFIG_REFERENCE.md lines 316-317 for accurate documentation of actual CRI columns.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` around lines 192 - 194, Remove the three incorrect README table
rows that document `_file_str`, `_time_ns_int`, and `_stream_str`; instead
update the CRI section to state that CRI processing injects `_timestamp` and
`_stream` as JSON metadata (not Arrow columns) and that `_file_str` and
`_time_ns_int` are not created by the scanner, and point readers to the
CONFIG_REFERENCE.md CRI section for the authoritative column/metadata list.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/CONFIG_REFERENCE.md`:
- Around line 316-317: The docs still reference and join on the removed
`_file_str`; update the examples at the affected snippets to stop using
`_file_str` and instead use the correct remaining fields (e.g., `_stream_str`
for stream name or `_timestamp_str` for time) or construct paths using the new
canonical field(s) used elsewhere in the doc; search for occurrences of
`_file_str` in the examples and replace the join logic with usage of
`_stream_str` (or the appropriate field) and adjust surrounding text to remove
mention of `_file_str`.

In `@README.md`:
- Line 216: The README row incorrectly labels Elasticsearch as "🔜 Planned"
despite implemented code; update the table entry for `elasticsearch` to indicate
it is implemented (e.g., "✅ Implemented" or "Implemented") and keep or slightly
adjust the description to match existing behavior (mentioning Elasticsearch bulk
API with retry logic and per-document error handling) to reflect that
`ElasticsearchSink`, `OutputType::Elasticsearch`, and deserialization are
present and active.

---

Outside diff comments:
In `@README.md`:
- Around line 192-194: Remove the three incorrect README table rows that
document `_file_str`, `_time_ns_int`, and `_stream_str`; instead update the CRI
section to state that CRI processing injects `_timestamp` and `_stream` as JSON
metadata (not Arrow columns) and that `_file_str` and `_time_ns_int` are not
created by the scanner, and point readers to the CONFIG_REFERENCE.md CRI section
for the authoritative column/metadata list.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: a1984235-7a75-4447-9df6-9a6fbaed76b0

📥 Commits

Reviewing files that changed from the base of the PR and between 8a9c452 and 0aa0337.

📒 Files selected for processing (3)
  • README.md
  • docs/COLUMN_NAMING.md
  • docs/CONFIG_REFERENCE.md

Comment thread docs/CONFIG_REFERENCE.md
Comment thread README.md Outdated
- Replace _file_str with _source_path_str in k8s enrichment SQL examples
  and note the column requires source path injection (not yet implemented)
- Mark Elasticsearch as Implemented in README (ElasticsearchSink exists
  with bulk API, retry logic, and per-document error handling)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
README.md (1)

192-202: ⚠️ Potential issue | 🔴 Critical

Critical: CRI column names remain incorrect in README.

PR objectives state that issues #706 and #719 fixed documentation bugs where _file_str doesn't exist at runtime and _time_ns_int should be _timestamp_str (RFC 3339 format). Those fixes were applied to CONFIG_REFERENCE.md but not here. Lines 192-194 and the SQL example on line 199 still document the incorrect column names, leaving README.md inconsistent with the corrected CONFIG_REFERENCE.md.

📋 Proposed fix based on PR objectives `#706` and `#719`
 | Column | Description |
 |--------|-------------|
-| `_file_str` | Absolute path of the source log file |
-| `_time_ns_int` | CRI timestamp in nanoseconds |
+| `_timestamp_str` | CRI timestamp in RFC 3339 format |
 | `_stream_str` | `stdout` or `stderr` |
 
-Use `_file_str` to identify which pod and container generated a record, or filter by stream:
+Filter by stream or timestamp:
 
 ```sql
-SELECT _time_ns_int, _stream_str, level_str, message_str
+SELECT _timestamp_str, _stream_str, level_str, message_str
 FROM logs
 WHERE _stream_str = 'stderr'
   AND level_str = 'ERROR'
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` around lines 192 - 202, Update README.md to match
CONFIG_REFERENCE.md by replacing incorrect CRI column names: change `_file_str`
and `_time_ns_int` usages to the runtime-correct `_timestamp_str` (RFC 3339) and
the actual file/identifier column used in runtime (remove or replace `_file_str`
with the correct column documented elsewhere), and update the SQL example SELECT
(replace `_time_ns_int` with `_timestamp_str`) so the `SELECT _timestamp_str,
_stream_str, level_str, message_str FROM logs WHERE _stream_str = 'stderr' AND
level_str = 'ERROR'` example reflects the corrected column names.
♻️ Duplicate comments (2)
docs/CONFIG_REFERENCE.md (2)

492-492: ⚠️ Potential issue | 🟠 Major

Duplicate issue: unavailable column in SQL example.

Same problem as line 372-373. The inline comment warns about "not yet implemented" but the SQL still uses _source_path_str, which will fail or return unexpected results.

📝 Suggested documentation fix
-      LEFT JOIN k8s k ON l._source_path_str = k.log_path_prefix  -- requires source path injection
+      -- LEFT JOIN k8s k ON l._source_path_str = k.log_path_prefix  -- source path injection not yet implemented
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/CONFIG_REFERENCE.md` at line 492, The SQL example uses a non-existent
column `_source_path_str` in the LEFT JOIN condition (LEFT JOIN k8s k ON
l._source_path_str = k.log_path_prefix) which will fail; edit the example to
either remove that join until source path injection is implemented or replace
`_source_path_str` with the actual available column (e.g., `source_path` or
whichever column exists on alias `l`), and keep the inline comment about
"requires source path injection" so the example is accurate and doesn't
reference a missing symbol (`_source_path_str`) while still showing the intended
k8s join (`k.log_path_prefix`).

372-373: ⚠️ Potential issue | 🟠 Major

SQL examples use unavailable column, will mislead users.

The note states "not yet implemented" but then immediately shows SQL using _source_path_str. Users will copy these queries expecting them to work. Either:

  1. Remove the SQL examples entirely and replace with a placeholder comment, OR
  2. If _source_path_str exists but is empty/null, clarify that explicitly

As written, this creates confusion about whether the column exists at all.

📝 Suggested documentation fix
--- Requires source path column injection (not yet implemented).
--- Once available, join on the source file path:
-SELECT l.level_str, l.message_str, k.namespace, k.pod_name, k.container_name
-FROM logs l
-JOIN k8s k ON l._source_path_str = k.log_path_prefix
+-- Source path column injection is not yet implemented.
+-- Future join syntax will be:
+--   SELECT l.level_str, l.message_str, k.namespace, k.pod_name, k.container_name
+--   FROM logs l
+--   JOIN k8s k ON l._source_path_str = k.log_path_prefix
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/CONFIG_REFERENCE.md` around lines 372 - 373, The SQL snippet references
a not-yet-implemented column `_source_path_str` and will mislead users; update
the documentation around the join/SQL examples to either remove the example SQL
entirely and replace it with a short placeholder mentioning "column not yet
implemented", or explicitly state that `_source_path_str` does not exist / will
be null/empty until implemented and show a non-working illustrative example with
a clear warning; ensure the text around the join on source file path and any
example queries reference the symbol `_source_path_str` so readers know exactly
which column is affected.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@README.md`:
- Line 216: Remove the trailing space after "Implemented" in the Markdown table
cell for the `elasticsearch` row (the line containing "| `elasticsearch` | ✅
Implemented   | Elasticsearch bulk API with retry logic, per-document error
handling |"); edit that cell so "Implemented" has no trailing spaces between it
and the closing pipe.

---

Outside diff comments:
In `@README.md`:
- Around line 192-202: Update README.md to match CONFIG_REFERENCE.md by
replacing incorrect CRI column names: change `_file_str` and `_time_ns_int`
usages to the runtime-correct `_timestamp_str` (RFC 3339) and the actual
file/identifier column used in runtime (remove or replace `_file_str` with the
correct column documented elsewhere), and update the SQL example SELECT (replace
`_time_ns_int` with `_timestamp_str`) so the `SELECT _timestamp_str,
_stream_str, level_str, message_str FROM logs WHERE _stream_str = 'stderr' AND
level_str = 'ERROR'` example reflects the corrected column names.

---

Duplicate comments:
In `@docs/CONFIG_REFERENCE.md`:
- Line 492: The SQL example uses a non-existent column `_source_path_str` in the
LEFT JOIN condition (LEFT JOIN k8s k ON l._source_path_str = k.log_path_prefix)
which will fail; edit the example to either remove that join until source path
injection is implemented or replace `_source_path_str` with the actual available
column (e.g., `source_path` or whichever column exists on alias `l`), and keep
the inline comment about "requires source path injection" so the example is
accurate and doesn't reference a missing symbol (`_source_path_str`) while still
showing the intended k8s join (`k.log_path_prefix`).
- Around line 372-373: The SQL snippet references a not-yet-implemented column
`_source_path_str` and will mislead users; update the documentation around the
join/SQL examples to either remove the example SQL entirely and replace it with
a short placeholder mentioning "column not yet implemented", or explicitly state
that `_source_path_str` does not exist / will be null/empty until implemented
and show a non-working illustrative example with a clear warning; ensure the
text around the join on source file path and any example queries reference the
symbol `_source_path_str` so readers know exactly which column is affected.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: e1d41827-71dc-4bde-a442-423c7206b155

📥 Commits

Reviewing files that changed from the base of the PR and between 0aa0337 and 0145e26.

📒 Files selected for processing (2)
  • README.md
  • docs/CONFIG_REFERENCE.md

Comment thread README.md
| `http` | ✅ Implemented | JSON lines over HTTP POST, optional zstd compression |
| `stdout` | ✅ Implemented | JSON or colored console output — great for local debugging |
| `elasticsearch` | ✅ Implemented | Elasticsearch bulk API with retry logic, per-document error handling |
| `elasticsearch` | ✅ Implemented | Elasticsearch bulk API with retry logic, per-document error handling |
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Remove trailing whitespace.

The elasticsearch status cell has an unnecessary trailing space after "Implemented".

🧹 Remove trailing space
-| `elasticsearch` | ✅ Implemented   | Elasticsearch bulk API with retry logic, per-document error handling |
+| `elasticsearch` | ✅ Implemented  | Elasticsearch bulk API with retry logic, per-document error handling |
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
| `elasticsearch` | ✅ Implemented | Elasticsearch bulk API with retry logic, per-document error handling |
| `elasticsearch` | ✅ Implemented | Elasticsearch bulk API with retry logic, per-document error handling |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` at line 216, Remove the trailing space after "Implemented" in the
Markdown table cell for the `elasticsearch` row (the line containing "|
`elasticsearch` | ✅ Implemented   | Elasticsearch bulk API with retry logic,
per-document error handling |"); edit that cell so "Implemented" has no trailing
spaces between it and the closing pipe.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: README shows elasticsearch as implemented but --validate rejects it as 'not yet implemented'

1 participant