fix: correct 6 documentation accuracy bugs#786
Conversation
- #718: README marks elasticsearch as '✅ Implemented' but config rejects it as 'not yet implemented' → change to '🔜 Planned' - #713: README advanced-layout examples use singular input:/output: keys which silently fail → fix to plural inputs:/outputs: - #706: CONFIG_REFERENCE documents _file_str column that doesn't exist at runtime → remove the row - #714: generator input type completely undocumented → add to CONFIG_REFERENCE input types table - #719: CRI timestamp documented as _time_ns_int (int64 ns) but actual column is _timestamp_str (RFC 3339 string) → fix docs to match code - #711: COLUMN_NAMING.md implies cross-batch schema stability (C3) is available when only per-batch fidelity (C1) is implemented → clarify and reference #625 Fixes #718, #713, #706, #714, #719, #711 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
WalkthroughUpdates only documentation across three files. README.md: renamed pipeline keys in the “Advanced — multiple named pipelines” YAML example from Possibly related PRs
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
README.md (1)
192-194:⚠️ Potential issue | 🔴 CriticalRemove incorrect CRI column documentation from README.md.
Lines 192-194 document non-existent Arrow columns. CRI input processing injects
_timestampand_streamas JSON metadata (not as separate Arrow columns), and_file_stris not created as an Arrow column by the scanner. The column name_time_ns_intdoes not exist anywhere in the codebase. Refer to CONFIG_REFERENCE.md lines 316-317 for accurate documentation of actual CRI columns.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@README.md` around lines 192 - 194, Remove the three incorrect README table rows that document `_file_str`, `_time_ns_int`, and `_stream_str`; instead update the CRI section to state that CRI processing injects `_timestamp` and `_stream` as JSON metadata (not Arrow columns) and that `_file_str` and `_time_ns_int` are not created by the scanner, and point readers to the CONFIG_REFERENCE.md CRI section for the authoritative column/metadata list.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/CONFIG_REFERENCE.md`:
- Around line 316-317: The docs still reference and join on the removed
`_file_str`; update the examples at the affected snippets to stop using
`_file_str` and instead use the correct remaining fields (e.g., `_stream_str`
for stream name or `_timestamp_str` for time) or construct paths using the new
canonical field(s) used elsewhere in the doc; search for occurrences of
`_file_str` in the examples and replace the join logic with usage of
`_stream_str` (or the appropriate field) and adjust surrounding text to remove
mention of `_file_str`.
In `@README.md`:
- Line 216: The README row incorrectly labels Elasticsearch as "🔜 Planned"
despite implemented code; update the table entry for `elasticsearch` to indicate
it is implemented (e.g., "✅ Implemented" or "Implemented") and keep or slightly
adjust the description to match existing behavior (mentioning Elasticsearch bulk
API with retry logic and per-document error handling) to reflect that
`ElasticsearchSink`, `OutputType::Elasticsearch`, and deserialization are
present and active.
---
Outside diff comments:
In `@README.md`:
- Around line 192-194: Remove the three incorrect README table rows that
document `_file_str`, `_time_ns_int`, and `_stream_str`; instead update the CRI
section to state that CRI processing injects `_timestamp` and `_stream` as JSON
metadata (not Arrow columns) and that `_file_str` and `_time_ns_int` are not
created by the scanner, and point readers to the CONFIG_REFERENCE.md CRI section
for the authoritative column/metadata list.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: ASSERTIVE
Plan: Pro
Run ID: a1984235-7a75-4447-9df6-9a6fbaed76b0
📒 Files selected for processing (3)
README.mddocs/COLUMN_NAMING.mddocs/CONFIG_REFERENCE.md
- Replace _file_str with _source_path_str in k8s enrichment SQL examples and note the column requires source path injection (not yet implemented) - Mark Elasticsearch as Implemented in README (ElasticsearchSink exists with bulk API, retry logic, and per-document error handling) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
README.md (1)
192-202:⚠️ Potential issue | 🔴 CriticalCritical: CRI column names remain incorrect in README.
PR objectives state that issues
#706and#719fixed documentation bugs where_file_strdoesn't exist at runtime and_time_ns_intshould be_timestamp_str(RFC 3339 format). Those fixes were applied to CONFIG_REFERENCE.md but not here. Lines 192-194 and the SQL example on line 199 still document the incorrect column names, leaving README.md inconsistent with the corrected CONFIG_REFERENCE.md.📋 Proposed fix based on PR objectives `#706` and `#719`
| Column | Description | |--------|-------------| -| `_file_str` | Absolute path of the source log file | -| `_time_ns_int` | CRI timestamp in nanoseconds | +| `_timestamp_str` | CRI timestamp in RFC 3339 format | | `_stream_str` | `stdout` or `stderr` | -Use `_file_str` to identify which pod and container generated a record, or filter by stream: +Filter by stream or timestamp: ```sql -SELECT _time_ns_int, _stream_str, level_str, message_str +SELECT _timestamp_str, _stream_str, level_str, message_str FROM logs WHERE _stream_str = 'stderr' AND level_str = 'ERROR'🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@README.md` around lines 192 - 202, Update README.md to match CONFIG_REFERENCE.md by replacing incorrect CRI column names: change `_file_str` and `_time_ns_int` usages to the runtime-correct `_timestamp_str` (RFC 3339) and the actual file/identifier column used in runtime (remove or replace `_file_str` with the correct column documented elsewhere), and update the SQL example SELECT (replace `_time_ns_int` with `_timestamp_str`) so the `SELECT _timestamp_str, _stream_str, level_str, message_str FROM logs WHERE _stream_str = 'stderr' AND level_str = 'ERROR'` example reflects the corrected column names.
♻️ Duplicate comments (2)
docs/CONFIG_REFERENCE.md (2)
492-492:⚠️ Potential issue | 🟠 MajorDuplicate issue: unavailable column in SQL example.
Same problem as line 372-373. The inline comment warns about "not yet implemented" but the SQL still uses
_source_path_str, which will fail or return unexpected results.📝 Suggested documentation fix
- LEFT JOIN k8s k ON l._source_path_str = k.log_path_prefix -- requires source path injection + -- LEFT JOIN k8s k ON l._source_path_str = k.log_path_prefix -- source path injection not yet implemented🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/CONFIG_REFERENCE.md` at line 492, The SQL example uses a non-existent column `_source_path_str` in the LEFT JOIN condition (LEFT JOIN k8s k ON l._source_path_str = k.log_path_prefix) which will fail; edit the example to either remove that join until source path injection is implemented or replace `_source_path_str` with the actual available column (e.g., `source_path` or whichever column exists on alias `l`), and keep the inline comment about "requires source path injection" so the example is accurate and doesn't reference a missing symbol (`_source_path_str`) while still showing the intended k8s join (`k.log_path_prefix`).
372-373:⚠️ Potential issue | 🟠 MajorSQL examples use unavailable column, will mislead users.
The note states "not yet implemented" but then immediately shows SQL using
_source_path_str. Users will copy these queries expecting them to work. Either:
- Remove the SQL examples entirely and replace with a placeholder comment, OR
- If
_source_path_strexists but is empty/null, clarify that explicitlyAs written, this creates confusion about whether the column exists at all.
📝 Suggested documentation fix
--- Requires source path column injection (not yet implemented). --- Once available, join on the source file path: -SELECT l.level_str, l.message_str, k.namespace, k.pod_name, k.container_name -FROM logs l -JOIN k8s k ON l._source_path_str = k.log_path_prefix +-- Source path column injection is not yet implemented. +-- Future join syntax will be: +-- SELECT l.level_str, l.message_str, k.namespace, k.pod_name, k.container_name +-- FROM logs l +-- JOIN k8s k ON l._source_path_str = k.log_path_prefix🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/CONFIG_REFERENCE.md` around lines 372 - 373, The SQL snippet references a not-yet-implemented column `_source_path_str` and will mislead users; update the documentation around the join/SQL examples to either remove the example SQL entirely and replace it with a short placeholder mentioning "column not yet implemented", or explicitly state that `_source_path_str` does not exist / will be null/empty until implemented and show a non-working illustrative example with a clear warning; ensure the text around the join on source file path and any example queries reference the symbol `_source_path_str` so readers know exactly which column is affected.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@README.md`:
- Line 216: Remove the trailing space after "Implemented" in the Markdown table
cell for the `elasticsearch` row (the line containing "| `elasticsearch` | ✅
Implemented | Elasticsearch bulk API with retry logic, per-document error
handling |"); edit that cell so "Implemented" has no trailing spaces between it
and the closing pipe.
---
Outside diff comments:
In `@README.md`:
- Around line 192-202: Update README.md to match CONFIG_REFERENCE.md by
replacing incorrect CRI column names: change `_file_str` and `_time_ns_int`
usages to the runtime-correct `_timestamp_str` (RFC 3339) and the actual
file/identifier column used in runtime (remove or replace `_file_str` with the
correct column documented elsewhere), and update the SQL example SELECT (replace
`_time_ns_int` with `_timestamp_str`) so the `SELECT _timestamp_str,
_stream_str, level_str, message_str FROM logs WHERE _stream_str = 'stderr' AND
level_str = 'ERROR'` example reflects the corrected column names.
---
Duplicate comments:
In `@docs/CONFIG_REFERENCE.md`:
- Line 492: The SQL example uses a non-existent column `_source_path_str` in the
LEFT JOIN condition (LEFT JOIN k8s k ON l._source_path_str = k.log_path_prefix)
which will fail; edit the example to either remove that join until source path
injection is implemented or replace `_source_path_str` with the actual available
column (e.g., `source_path` or whichever column exists on alias `l`), and keep
the inline comment about "requires source path injection" so the example is
accurate and doesn't reference a missing symbol (`_source_path_str`) while still
showing the intended k8s join (`k.log_path_prefix`).
- Around line 372-373: The SQL snippet references a not-yet-implemented column
`_source_path_str` and will mislead users; update the documentation around the
join/SQL examples to either remove the example SQL entirely and replace it with
a short placeholder mentioning "column not yet implemented", or explicitly state
that `_source_path_str` does not exist / will be null/empty until implemented
and show a non-working illustrative example with a clear warning; ensure the
text around the join on source file path and any example queries reference the
symbol `_source_path_str` so readers know exactly which column is affected.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: ASSERTIVE
Plan: Pro
Run ID: e1d41827-71dc-4bde-a442-423c7206b155
📒 Files selected for processing (2)
README.mddocs/CONFIG_REFERENCE.md
| | `http` | ✅ Implemented | JSON lines over HTTP POST, optional zstd compression | | ||
| | `stdout` | ✅ Implemented | JSON or colored console output — great for local debugging | | ||
| | `elasticsearch` | ✅ Implemented | Elasticsearch bulk API with retry logic, per-document error handling | | ||
| | `elasticsearch` | ✅ Implemented | Elasticsearch bulk API with retry logic, per-document error handling | |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Remove trailing whitespace.
The elasticsearch status cell has an unnecessary trailing space after "Implemented".
🧹 Remove trailing space
-| `elasticsearch` | ✅ Implemented | Elasticsearch bulk API with retry logic, per-document error handling |
+| `elasticsearch` | ✅ Implemented | Elasticsearch bulk API with retry logic, per-document error handling |📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| | `elasticsearch` | ✅ Implemented | Elasticsearch bulk API with retry logic, per-document error handling | | |
| | `elasticsearch` | ✅ Implemented | Elasticsearch bulk API with retry logic, per-document error handling | |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@README.md` at line 216, Remove the trailing space after "Implemented" in the
Markdown table cell for the `elasticsearch` row (the line containing "|
`elasticsearch` | ✅ Implemented | Elasticsearch bulk API with retry logic,
per-document error handling |"); edit that cell so "Implemented" has no trailing
spaces between it and the closing pipe.
Summary
Six cases where the docs say one thing and the runtime does another.
Bugs fixed
input:/output:keys that silently failinputs:/outputs:_file_strcolumn that doesn't exist at runtimegeneratorinput type completely undocumented_time_ns_int(int64 ns) but actual is_timestamp_str(RFC 3339)Files changed
README.md— elasticsearch status, advanced-layout example keysdocs/CONFIG_REFERENCE.md— removed phantom column, added generator, fixed CRI timestampdocs/COLUMN_NAMING.md— clarified C1 vs C3 stability statusFixes #718, #713, #706, #714, #719, #711