Remove EloqUnitRecovery::_kvPair by xiexiaoy · Pull Request #387 · eloqdata/eloqdoc

xiexiaoy · 2025-12-21T09:41:10Z

MongoDB assumes storage engines are architected as heap-organized tables. Both _id and skey are mapped to a loc, and that loc is used to fetch a record from a base table. However, txservice is architected as index-organized tables. EloqDoc takes _id as loc, which violates MongoDB's assumption.

When accessing a table with index _id, EloqIndexCursor::seekExact() or EloqIndexCursor::next() are called to get loc for the given _id, then EloqRecordStore::seekExact() will be called to fetch a record with loc. This PR eliminates the redundant call to EloqRecordStore::seekExact() by:

EloqIndexCursor::seekExact() and EloqIndexCursor::next() return record alongwith loc for index _id.
IDHackStage::doWork(), IndexScan::doWork() and IndexIteratorStage::doWork() fill the record to WorkSetMember.

Summary by CodeRabbit

Improvements
- Index, scan and iterator paths can now return full document payloads when available, reducing extra lookups and improving query performance.
- Cursors and retrieval flows cache and surface record contents directly, streamlining result construction and advancing efficiency.
Chores
- Introduced a minor public API update to optionally retrieve record payloads during index lookup.
- Removed some legacy internal utilities and simplified internal record-handling paths.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

coderabbitai · 2025-12-21T09:41:16Z

Walkthrough

Index lookups and iterators can now optionally return stored RecordData alongside a RecordId. IndexAccessMethod::findSingle gained an additional RecordData* out-parameter; execution stages and Eloq storage code paths propagate and cache RecordData to populate WorkingSetMember with full BSON when available.

Changes

Cohort / File(s)	Change Summary
Index Access Method API `src/mongo/db/index/index_access_method.h`, `src/mongo/db/index/index_access_method.cpp`	Extended `findSingle` to `RecordId findSingle(OperationContext* opCtx, const BSONObj& key, RecordData* record = nullptr) const`. When `record` is provided and a matching entry carries payload, the stored `RecordData` is moved into `*record`.
Execution Stages / Index Emitters `src/mongo/db/exec/idhack.cpp`, `src/mongo/db/exec/index_scan.cpp`, `src/mongo/db/exec/index_iterator.cpp`	Call sites now pass an optional `RecordData` to index lookup. If RecordData is returned, populate `WorkingSetMember::obj` with `{snapshotId, record.releaseToBson()}`, transition to RecordId+Obj and immediately return via `advance(...)`; otherwise continue using RecordId+Idx path.
Eloq Storage & Cursor internals `src/mongo/db/modules/eloq/src/eloq_index.cpp`, `src/mongo/db/modules/eloq/src/eloq_record_store.cpp`, `src/mongo/db/modules/eloq/src/eloq_recovery_unit.*`	Replaced EloqKVPair usage with direct `Eloq::MongoKey`/`Eloq::MongoRecord` buffers (`_recordPtr`, `_idReadRecord`), removed `getKVInternal` and `_kvPair` users, updated seek/read and cursor logic to populate and return record blob data from the new buffer.
Index entry payload support `src/mongo/db/storage/index_entry_comparison.h`	Added `#include "mongo/db/storage/record_data.h"`, a constructor `IndexKeyEntry(BSONObj key, RecordId loc, RecordData record)`, and `boost::optional<RecordData> record` member to carry optional record payloads with index entries.

sequenceDiagram
    participant ES as Execution Stage (idhack/index_scan/index_iterator)
    participant IAM as IndexAccessMethod
    participant IS as Index Storage
    participant RBUF as RecordData Buffer
    participant WSM as WorkingSetMember

    ES->>IAM: findSingle(opCtx, key, &recordData)
    activate IAM
    IAM->>IS: lookup key
    activate IS
    IS-->>IAM: RecordId (+ RecordData?)
    deactivate IS
    IAM->>RBUF: move RecordData into output (if present)
    IAM-->>ES: return RecordId
    deactivate IAM

    ES->>ES: check RecordData present?
    alt RecordData present
        ES->>RBUF: releaseToBson() → BSON
        ES->>WSM: set obj = {snapshotId, BSON}
        ES->>WSM: transitionToRecordIdAndObj(RecordId)
        ES-->>ES: return advance(RecordId, member, out)
    else no RecordData
        ES->>WSM: transitionToRecordIdAndIdx(RecordId)
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Areas needing extra attention:
- Eloq module files (eloq_index.cpp, eloq_record_store.cpp, eloq_recovery_unit.*) for correct lifetime, ownership, and removed KVPair semantics.
- Execution-stage transitions in idhack.cpp, index_scan.cpp, index_iterator.cpp to ensure WorkingSetMember state, snapshotId correctness, and that early returns don't break surrounding control flow.
- index_access_method signature change impact on callers and proper move semantics of RecordData.
- index_entry_comparison.h new RecordData member: ownership and optional semantics.

Possibly related PRs

Converge support eloqdoc #279 — Modifies Eloq internals touching record/kv handling and recovery-unit logic; strongly related to the Eloq changes here.

Suggested reviewers

lzxddz
liunyl

Poem

🐰 I hopped through indexes with a curious nod,
Found blobs in my pocket and skipped over the sod.
From id to object I danced on the log,
Caching payloads snug in a tiny soft bog.
Code nibble, code hop — a rabbit’s small jog. 🐇

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 5.88% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title accurately describes the main objective: removing EloqRecoveryUnit::_kvPair and related infrastructure. This is supported by changes across multiple files where _kvPair is eliminated and replaced with new patterns.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/mongo/db/exec/idhack.cpp (1)

144-167: Unreachable dead code after early return.

Lines 144-167 are now unreachable due to the unconditional return advance(id, member, out); on line 142. This includes:

Record cursor initialization
Page-in/yield logic
WorkingSetCommon::fetch path

This dead code should be removed to avoid confusion and maintenance burden.

🔎 Proposed fix - remove dead code

         _workingSet->transitionToRecordIdAndObj(id);
         return advance(id, member, out);
-
-        if (!_recordCursor)
-            _recordCursor = _collection->getCursor(getOpCtx());
-
-        // We may need to request a yield while we fetch the document.
-        if (auto fetcher = _recordCursor->fetcherForId(recordId)) {
-            // There's something to fetch. Hand the fetcher off to the WSM, and pass up a
-            // fetch request.
-            _idBeingPagedIn = id;
-            member->setFetcher(fetcher.release());
-            *out = id;
-            return NEED_YIELD;
-        }
-
-        // The doc was already in memory, so we go ahead and return it.
-        if (!WorkingSetCommon::fetch(getOpCtx(), _workingSet, id, _recordCursor)) {
-            // _id is immutable so the index would return the only record that could
-            // possibly match the query.
-            _workingSet->free(id);
-            _commonStats.isEOF = true;
-            _done = true;
-            return IS_EOF;
-        }
-
-        return advance(id, member, out);
     } catch (const WriteConflictException&) {

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e8c34ce and 7db3428.

📒 Files selected for processing (7)

src/mongo/db/exec/idhack.cpp (2 hunks)
src/mongo/db/exec/index_scan.cpp (1 hunks)
src/mongo/db/index/index_access_method.cpp (2 hunks)
src/mongo/db/index/index_access_method.h (2 hunks)
src/mongo/db/modules/eloq/src/eloq_index.cpp (5 hunks)
src/mongo/db/modules/eloq/src/eloq_record_store.cpp (2 hunks)
src/mongo/db/storage/index_entry_comparison.h (2 hunks)

🧰 Additional context used

🧬 Code graph analysis (3)

src/mongo/db/modules/eloq/src/eloq_index.cpp (2)

src/mongo/db/modules/eloq/src/eloq_record_store.cpp (5)

mongoKey (714-714)

MONGO_UNREACHABLE (122-124)

MONGO_UNREACHABLE (126-128)

MONGO_UNREACHABLE (130-132)

MONGO_UNREACHABLE (134-136)

src/mongo/db/modules/eloq/src/base/eloq_util.cpp (2)

TxErrorCodeToMongoStatus (11-75)

TxErrorCodeToMongoStatus (11-11)

src/mongo/db/exec/idhack.cpp (1)

src/mongo/db/modules/eloq/src/eloq_record_store.cpp (4)

id (118-120)

id (118-118)

id (475-507)

id (475-475)

src/mongo/db/index/index_access_method.h (1)

src/mongo/db/index/index_access_method.cpp (2)

findSingle (260-293)

findSingle (260-262)

🔇 Additional comments (14)

src/mongo/db/modules/eloq/src/eloq_record_store.cpp (2)

485-506: LGTM - seekExact refactored to use MongoKey-based read path.

The implementation correctly:

Creates a MongoKey from the RecordId

Uses _ru->getKV() to fetch the record into _idReadRecord

Returns the record data from _idReadRecord, which is a class member ensuring the data remains valid after the method returns

596-597: LGTM - Member storage for record data.

Using a class member to store the read record ensures the data remains valid after seekExact returns, which is essential since RecordData returned to callers only holds pointers to this buffer.

src/mongo/db/modules/eloq/src/eloq_index.cpp (5)

108-108: LGTM - Reset properly initializes _recordPtr.

Correctly resets _recordPtr to nullptr during cursor reset to avoid stale pointers.

238-255: LGTM - _idRead refactored to use MongoKey/MongoRecord workflow.

The implementation:

Uses _ru->getKV() to populate _idReadRecord

Sets _recordPtr = &_idReadRecord on success, enabling record data to be returned via _curr()

The pointer is safe since _idReadRecord is a class member

335-360: LGTM - Index type-specific record pointer handling.

Correctly differentiates:

ID index: Sets _recordPtr = _scanTupleRecord since primary index stores full document

UNIQUE/STANDARD: Sets _recordPtr = nullptr since secondary indexes don't store document data

This ensures _curr() only returns record data when it's actually available.

379-388: LGTM - _curr returns IndexKeyEntry with optional record data.

Uses the appropriate IndexKeyEntry constructor based on whether _recordPtr is available, enabling callers to access pre-fetched document data when available.

441-446: LGTM - Well-documented member additions.

The comment clearly explains the optimization purpose of caching record data to avoid re-fetching from the main table for index-organized tables.

src/mongo/db/storage/index_entry_comparison.h (3)

38-38: LGTM - Required include added.

49-51: LGTM - New constructor for IndexKeyEntry with RecordData.

Clean addition of a 3-parameter constructor that properly initializes the new record field.

54-59: Consider whether operator== should account for the new record field.

The operator== (lines 63-65) only compares key and loc, not the new record field. This is likely intentional since two entries with the same key and location are semantically equal regardless of cached data, but worth confirming this is the desired behavior.

src/mongo/db/index/index_access_method.h (1)

185-187: LGTM - Backward-compatible API extension.

The findSingle method signature is extended with an optional RecordData* record = nullptr parameter, maintaining backward compatibility with existing callers while enabling new callers to retrieve the record payload.

src/mongo/db/exec/idhack.cpp (1)

123-142: LGTM - IDHackStage now uses pre-fetched record data.

The optimization correctly:

Retrieves RecordData directly from findSingle

Populates member->obj using recordData.releaseToBson()

Transitions to object state and returns immediately

This avoids the additional record store fetch that was previously required.

src/mongo/db/index/index_access_method.cpp (1)

260-262: LGTM on the signature extension.

The addition of the optional RecordData* record parameter allows callers to retrieve the full document alongside the RecordId, which aligns with the PR's objective to propagate RecordData through index lookups.

src/mongo/db/exec/index_scan.cpp (1)

227-232: The code is correct and follows established patterns.

The RecordData::releaseToBson() method exists and returns a BSONObj, which is the correct type for initializing the Snapshotted<BSONObj> member. The conditional guard if (kv->record) safely protects the method call, and the usage pattern matches similar constructs elsewhere in the codebase (e.g., collection_scan.cpp, idhack.cpp). The snapshot ID is correctly obtained and paired with the released BSON object.

src/mongo/db/index/index_access_method.cpp

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/mongo/db/exec/idhack.cpp (1)

139-167: Dead code after early return must be removed.

The early return advance(id, member, out); on line 142 makes lines 144-167 unreachable. This dead code includes the page-in logic for handling NEED_YIELD scenarios and the WorkingSetCommon::fetch path.

If the intent is that findSingle now always provides the record data (eliminating the need for a separate fetch), this dead code and the commented-out line 140 should be removed for clarity. Otherwise, if the page-in/yield logic is still needed for certain edge cases, the control flow needs restructuring.

🔎 Proposed fix - remove dead code

         member->obj = {getOpCtx()->recoveryUnit()->getSnapshotId(), recordData.releaseToBson()};
-        // _workingSet->transitionToRecordIdAndIdx(id);
         _workingSet->transitionToRecordIdAndObj(id);
         return advance(id, member, out);
-
-        if (!_recordCursor)
-            _recordCursor = _collection->getCursor(getOpCtx());
-
-        // We may need to request a yield while we fetch the document.
-        if (auto fetcher = _recordCursor->fetcherForId(recordId)) {
-            // There's something to fetch. Hand the fetcher off to the WSM, and pass up a
-            // fetch request.
-            _idBeingPagedIn = id;
-            member->setFetcher(fetcher.release());
-            *out = id;
-            return NEED_YIELD;
-        }
-
-        // The doc was already in memory, so we go ahead and return it.
-        if (!WorkingSetCommon::fetch(getOpCtx(), _workingSet, id, _recordCursor)) {
-            // _id is immutable so the index would return the only record that could
-            // possibly match the query.
-            _workingSet->free(id);
-            _commonStats.isEOF = true;
-            _done = true;
-            return IS_EOF;
-        }
-
-        return advance(id, member, out);

♻️ Duplicate comments (1)

src/mongo/db/index/index_access_method.cpp (1)
286-288: Missing optional check before accessing kv->record.value().

This issue was already flagged in a previous review. The code checks if the record pointer is non-null but does not verify that kv->record contains a value before calling .value(). If kv->record is an empty std::optional, this will throw std::bad_optional_access at runtime.
🔎 Proposed fix
-        if (record) {
-            *record = std::move(kv->record.value());
-        }
+        if (record && kv->record) {
+            *record = std::move(*kv->record);
+        }

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7db3428 and 01029ee.

📒 Files selected for processing (8)

src/mongo/db/exec/idhack.cpp (2 hunks)
src/mongo/db/exec/index_iterator.cpp (1 hunks)
src/mongo/db/exec/index_scan.cpp (1 hunks)
src/mongo/db/index/index_access_method.cpp (2 hunks)
src/mongo/db/index/index_access_method.h (2 hunks)
src/mongo/db/modules/eloq/src/eloq_index.cpp (5 hunks)
src/mongo/db/modules/eloq/src/eloq_record_store.cpp (2 hunks)
src/mongo/db/storage/index_entry_comparison.h (2 hunks)

🚧 Files skipped from review as they are similar to previous changes (2)

src/mongo/db/modules/eloq/src/eloq_record_store.cpp
src/mongo/db/index/index_access_method.h

🧰 Additional context used

🧠 Learnings (2)

📓 Common learnings

Learnt from: githubzilla
Repo: eloqdata/eloqdoc PR: 211
File: src/mongo/db/modules/eloq/cmake/build_eloq_log_service.cmake:26-80
Timestamp: 2025-09-25T12:24:06.434Z
Learning: The user githubzilla implemented a fix in commit fe98aaf to address the MEMORY state incorrectly triggering RocksDB discovery in build_eloq_log_service.cmake. This was a control flow issue where MEMORY builds were falling into an else branch that still attempted RocksDB discovery.

📚 Learning: 2025-11-28T10:55:21.020Z

Learnt from: githubzilla
Repo: eloqdata/eloqdoc PR: 279
File: src/mongo/db/modules/eloq/CMakeLists.txt:54-55
Timestamp: 2025-11-28T10:55:21.020Z
Learning: In src/mongo/db/modules/eloq/CMakeLists.txt, the CMake options EXT_TX_PROC_ENABLED, FORK_HM_PROCESS, and STATISTICS are used by the data_substrate submodule build system and should be retained as configuration interface options.

Applied to files:

src/mongo/db/modules/eloq/src/eloq_index.cpp

🧬 Code graph analysis (1)

src/mongo/db/modules/eloq/src/eloq_index.cpp (2)

src/mongo/db/modules/eloq/src/eloq_record_store.cpp (1)

mongoKey (714-714)

src/mongo/db/modules/eloq/src/base/eloq_util.cpp (2)

TxErrorCodeToMongoStatus (11-75)

TxErrorCodeToMongoStatus (11-11)

🔇 Additional comments (10)

src/mongo/db/exec/index_iterator.cpp (1)

69-75: LGTM - Conditional record population is well-structured.

The branching logic correctly handles both cases: when a cached record is available (transitioning to RecordIdAndObj with the full document), and when it's not (preserving the previous RecordIdAndIdx behavior). This aligns with the broader optimization to avoid redundant record fetches in index-organized tables.

src/mongo/db/modules/eloq/src/eloq_index.cpp (3)

338-360: Verify _scanTupleRecord lifetime when used through _recordPtr.

In the ID case, _recordPtr is assigned to _scanTupleRecord which is a pointer obtained from scanTuple->record_. Ensure the scan batch tuple remains valid until after _curr() is called and the IndexKeyEntry with record data is consumed. If the batch is invalidated before consumption, this would lead to a use-after-free.

The code path in _idRead is safe since _recordPtr points to _idReadRecord which is owned by the cursor.

379-388: LGTM - Record data propagation in _curr() is well-implemented.

The conditional construction of IndexKeyEntry with or without record data correctly supports the optimization goal. The pattern matches the approach used in EloqRecordStoreCursor::seekExact.

441-446: LGTM - New member variables for record caching.

The addition of _recordPtr for bookkeeping and _idReadRecord for holding blob data from the KV read path supports the index-organized table optimization. The comment clearly explains the purpose.

src/mongo/db/exec/index_scan.cpp (1)

227-232: LGTM - Consistent pattern with other execution stages.

The conditional record population logic mirrors the implementation in index_iterator.cpp, maintaining consistency across execution stages. When a cached record is available, it correctly constructs the full object and transitions to RecordIdAndObj.

src/mongo/db/index/index_access_method.cpp (1)

260-262: LGTM - API extension with backward compatibility.

The optional RecordData* record parameter with default nullptr correctly extends the API while preserving backward compatibility for existing call sites that don't need the record data.

src/mongo/db/storage/index_entry_comparison.h (4)

38-38: LGTM: Include addition is necessary.

The new include for record_data.h is required for the RecordData type used in the new optional member field.

49-50: LGTM: Constructor addition is well-implemented.

The new three-parameter constructor properly uses move semantics and follows the same pattern as the existing two-parameter constructor.

55-58: LGTM: Optional member field is well-documented.

The use of boost::optional<RecordData> is appropriate for optionally caching record data, and the documentation clearly explains the optimization rationale.

63-65: Operator== correctly excludes the record field; no changes needed.

The record field is documented as cached data for optimization ("EloqDoc: Store record data along with index entry to avoid fetching from the main table"). Two IndexKeyEntry objects with identical key and loc but different cached record values represent the same index entry semantically, so excluding record from the equality comparison is intentional and correct. Index entries are uniquely identified by their key and location pair, not by cached data.

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/mongo/db/exec/idhack.cpp (1)

123-142: Remove commented-out code and verify the early return is intentional.

Line 140 contains commented-out code (// _workingSet->transitionToRecordIdAndIdx(id);) that should be removed. Additionally, the early return at line 142 makes the entire code block from lines 144-167 unreachable. If this old fetch-on-demand logic is no longer needed, please remove it to improve maintainability. If it's being retained temporarily for rollback purposes, add a TODO comment explaining the intent.

🔎 Proposed cleanup

         member->recordId = recordId;
         member->obj = {getOpCtx()->recoveryUnit()->getSnapshotId(), recordData.releaseToBson()};
-        // _workingSet->transitionToRecordIdAndIdx(id);
         _workingSet->transitionToRecordIdAndObj(id);
         return advance(id, member, out);
-
-        if (!_recordCursor)
-            _recordCursor = _collection->getCursor(getOpCtx());
-
-        // We may need to request a yield while we fetch the document.
-        if (auto fetcher = _recordCursor->fetcherForId(recordId)) {
-            // There's something to fetch. Hand the fetcher off to the WSM, and pass up a
-            // fetch request.
-            _idBeingPagedIn = id;
-            member->setFetcher(fetcher.release());
-            *out = id;
-            return NEED_YIELD;
-        }
-
-        // The doc was already in memory, so we go ahead and return it.
-        if (!WorkingSetCommon::fetch(getOpCtx(), _workingSet, id, _recordCursor)) {
-            // _id is immutable so the index would return the only record that could
-            // possibly match the query.
-            _workingSet->free(id);
-            _commonStats.isEOF = true;
-            _done = true;
-            return IS_EOF;
-        }
-
-        return advance(id, member, out);
     } catch (const WriteConflictException&) {

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 01029ee and dc8d1c3.

📒 Files selected for processing (8)

src/mongo/db/exec/idhack.cpp (2 hunks)
src/mongo/db/exec/index_iterator.cpp (1 hunks)
src/mongo/db/exec/index_scan.cpp (1 hunks)
src/mongo/db/index/index_access_method.cpp (2 hunks)
src/mongo/db/index/index_access_method.h (2 hunks)
src/mongo/db/modules/eloq/src/eloq_index.cpp (5 hunks)
src/mongo/db/modules/eloq/src/eloq_record_store.cpp (2 hunks)
src/mongo/db/storage/index_entry_comparison.h (2 hunks)

🧰 Additional context used

🧬 Code graph analysis (4)

src/mongo/db/modules/eloq/src/eloq_record_store.cpp (1)

src/mongo/db/modules/eloq/src/base/eloq_util.cpp (2)

TxErrorCodeToMongoStatus (11-75)

TxErrorCodeToMongoStatus (11-11)

src/mongo/db/modules/eloq/src/eloq_index.cpp (1)

src/mongo/db/modules/eloq/src/eloq_record_store.cpp (1)

mongoKey (714-714)

src/mongo/db/index/index_access_method.h (1)

src/mongo/db/index/index_access_method.cpp (2)

findSingle (260-293)

findSingle (260-262)

src/mongo/db/exec/index_scan.cpp (1)

src/mongo/db/modules/eloq/src/eloq_record_store.cpp (4)

id (118-120)

id (118-118)

id (475-507)

id (475-475)

🔇 Additional comments (13)

src/mongo/db/exec/index_iterator.cpp (1)

69-75: LGTM! Conditional record data handling is correct.

The logic appropriately branches based on whether a cached record is available. When entry->record exists, the code populates the working set member with the full BSON object and transitions to RecordIdAndObj state; otherwise, it falls back to the original RecordIdAndIdx state. This optimization avoids redundant fetches from the record store.

src/mongo/db/index/index_access_method.cpp (1)

260-262: LGTM! Optional parameter and guarded access are correctly implemented.

The expanded signature maintains backward compatibility with a default nullptr parameter. The conditional block at lines 286-288 correctly guards both the pointer validity and the optional's presence before accessing .value(), addressing the concern from previous reviews.

Also applies to: 286-288

src/mongo/db/index/index_access_method.h (2)

185-187: LGTM! Backward-compatible API extension.

The optional RecordData* parameter with a default value of nullptr allows callers to optionally retrieve the record payload without breaking existing call sites. This is a clean, non-breaking API change.

1-27: No action needed. This is a newly created file, not a modification of an existing one. The copyright header lines are marked as additions (+), not modifications. There are no whitespace or formatting changes to the header itself—it is the standard MongoDB copyright boilerplate for a new file.

Likely an incorrect or invalid review comment.

src/mongo/db/exec/index_scan.cpp (1)

227-232: LGTM! Consistent record data optimization.

The conditional logic matches the pattern introduced in index_iterator.cpp, providing a consistent approach across execution stages. When a cached record is available, the code avoids redundant record store lookups by populating the working set member directly with the BSON object.

src/mongo/db/modules/eloq/src/eloq_record_store.cpp (2)

485-506: LGTM! Record caching optimizes exact seeks.

The refactored implementation correctly uses the per-cursor _idReadRecord member to store the retrieved record, eliminating redundant fetches. The _lastMongoKey updates (via Copy or emplace) ensure the cursor position is tracked properly. The returned RecordData correctly references the cached _idReadRecord's encoded blob data.

596-596: Verify _idReadRecord lifetime safety.

The _idReadRecord member stores the retrieved record data and its lifetime spans across potential cursor operations. Ensure that operations like save(), saveUnpositioned(), and restore() properly handle scenarios where the cached data might be invalidated. Based on the existing cursor lifecycle methods (lines 509-544), this appears to be handled correctly, but please confirm.

src/mongo/db/modules/eloq/src/eloq_index.cpp (3)

108-108: LGTM! Per-cursor record caching for ID index lookups.

The implementation correctly caches the record pointer (_recordPtr) for ID index cursors where the record data is co-located with the index entry. For UNIQUE and STANDARD indexes (lines 348, 355), _recordPtr is set to nullptr, which is correct since those index types store only the RecordId in the value. The reset() method properly initializes _recordPtr to nullptr.

Also applies to: 238-250, 341-360

380-387: LGTM! Conditional record data return is correct.

The _curr() method appropriately returns an IndexKeyEntry with the embedded record data when _recordPtr is non-null, and without it otherwise. This optimization allows downstream execution stages to use the cached record without an additional fetch.

441-445: Verify cursor state management for cached record pointer.

The _recordPtr points to either _idReadRecord or _scanTupleRecord, which have different lifetimes. When pointing to _scanTupleRecord, ensure that cursor operations like save(), saveUnpositioned(), and restore() (lines 189-210) don't leave _recordPtr dangling. Based on the code, _recordPtr is reassigned during _updateIdAndTypeBits() and _idRead(), but please confirm that no operation can invalidate the pointed-to record while _recordPtr is still in use.

src/mongo/db/storage/index_entry_comparison.h (3)

38-38: LGTM! Required dependency for RecordData support.

Adding the record_data.h include is necessary for the new RecordData member in IndexKeyEntry.

49-50: LGTM! Backward-compatible constructor addition.

The new constructor allows creating IndexKeyEntry instances with associated record data. The existing two-parameter constructor remains available, ensuring backward compatibility.

55-58: Verify performance impact of adding optional record member to IndexKeyEntry.

The addition of boost::optional<RecordData> to IndexKeyEntry is technically sound:

Copy/move semantics: Confirmed correct. IndexKeyEntry uses compiler-generated constructors, and RecordData provides nothrow move constructors and assignments, ensuring the optional member is properly moved. The two-argument constructor already uses std::move() for all members.

Memory overhead: boost::optional stores the value and a bool flag in aligned_storage with placement-new lifetime management. For RecordData (~24-32 bytes), the overhead is modest—approximately 1-8 bytes plus alignment padding. However, in large index structures, this compounds across millions of entries.

Performance impact: Verify that the EloqDoc optimization benefit (avoiding main table fetches) outweighs the per-entry overhead for non-EloqDoc storage engines (e.g., WiredTiger). Benchmark index operations: creation, scanning, and comparison performance.

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/mongo/db/exec/idhack.cpp (1)

139-167: Dead code after unconditional early return.

The return advance(id, member, out); at line 142 makes lines 144-167 unreachable. This includes the page-in logic (_idBeingPagedIn, fetcher handling) and the WorkingSetCommon::fetch fallback.

If EloqDoc always returns record data via findSingle, this dead code should be removed. If other storage engines may not populate recordData, the early return should be conditional on recordData being valid.

🔎 Option 1: Remove dead code (if record is always available)

         member->obj = {getOpCtx()->recoveryUnit()->getSnapshotId(), recordData.releaseToBson()};
-        // _workingSet->transitionToRecordIdAndIdx(id);
         _workingSet->transitionToRecordIdAndObj(id);
         return advance(id, member, out);
-
-        if (!_recordCursor)
-            _recordCursor = _collection->getCursor(getOpCtx());
-
-        // We may need to request a yield while we fetch the document.
-        if (auto fetcher = _recordCursor->fetcherForId(recordId)) {
-            // There's something to fetch. Hand the fetcher off to the WSM, and pass up a
-            // fetch request.
-            _idBeingPagedIn = id;
-            member->setFetcher(fetcher.release());
-            *out = id;
-            return NEED_YIELD;
-        }
-
-        // The doc was already in memory, so we go ahead and return it.
-        if (!WorkingSetCommon::fetch(getOpCtx(), _workingSet, id, _recordCursor)) {
-            // _id is immutable so the index would return the only record that could
-            // possibly match the query.
-            _workingSet->free(id);
-            _commonStats.isEOF = true;
-            _done = true;
-            return IS_EOF;
-        }
-
-        return advance(id, member, out);

🔎 Option 2: Conditional early return (if record may not always be available)

         member->recordId = recordId;
-        member->obj = {getOpCtx()->recoveryUnit()->getSnapshotId(), recordData.releaseToBson()};
-        // _workingSet->transitionToRecordIdAndIdx(id);
-        _workingSet->transitionToRecordIdAndObj(id);
-        return advance(id, member, out);
+        if (recordData.isValid()) {
+            member->obj = {getOpCtx()->recoveryUnit()->getSnapshotId(), recordData.releaseToBson()};
+            _workingSet->transitionToRecordIdAndObj(id);
+            return advance(id, member, out);
+        }
+        _workingSet->transitionToRecordIdAndIdx(id);

         if (!_recordCursor)
             _recordCursor = _collection->getCursor(getOpCtx());

🧹 Nitpick comments (2)

src/mongo/db/modules/eloq/src/eloq_record_store.cpp (1)
483-504: LGTM with minor log message typo.

The refactored seekExact implementation correctly uses the new getKV path with _idReadRecord to fetch records directly. The flow is clear: construct key, call getKV, handle not-found case, update cursor state, and return the record data.

Minor typo on line 489: "no found" should be "not found".
🔎 Suggested fix
-            MONGO_LOG(1) << "no found. id: " << id << ". Txservice error code: " << err;
+            MONGO_LOG(1) << "not found. id: " << id << ". Txservice error code: " << err;
src/mongo/db/exec/idhack.cpp (1)

140-140: Remove commented-out code.

The commented line // _workingSet->transitionToRecordIdAndIdx(id); is no longer needed and should be removed for cleanliness.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dc8d1c3 and 79f54c4.

📒 Files selected for processing (10)

src/mongo/db/exec/idhack.cpp
src/mongo/db/exec/index_iterator.cpp
src/mongo/db/exec/index_scan.cpp
src/mongo/db/index/index_access_method.cpp
src/mongo/db/index/index_access_method.h
src/mongo/db/modules/eloq/src/eloq_index.cpp
src/mongo/db/modules/eloq/src/eloq_record_store.cpp
src/mongo/db/modules/eloq/src/eloq_recovery_unit.cpp
src/mongo/db/modules/eloq/src/eloq_recovery_unit.h
src/mongo/db/storage/index_entry_comparison.h

💤 Files with no reviewable changes (2)

src/mongo/db/modules/eloq/src/eloq_recovery_unit.cpp
src/mongo/db/modules/eloq/src/eloq_recovery_unit.h

🚧 Files skipped from review as they are similar to previous changes (3)

src/mongo/db/index/index_access_method.cpp
src/mongo/db/modules/eloq/src/eloq_index.cpp
src/mongo/db/exec/index_iterator.cpp

🧰 Additional context used

🧬 Code graph analysis (3)

src/mongo/db/modules/eloq/src/eloq_record_store.cpp (1)

src/mongo/db/modules/eloq/src/base/eloq_util.cpp (2)

TxErrorCodeToMongoStatus (11-75)

TxErrorCodeToMongoStatus (11-11)

src/mongo/db/index/index_access_method.h (1)

src/mongo/db/index/index_access_method.cpp (2)

findSingle (260-293)

findSingle (260-262)

src/mongo/db/exec/idhack.cpp (1)

src/mongo/db/modules/eloq/src/eloq_record_store.cpp (4)

id (116-118)

id (116-116)

id (473-505)

id (473-473)

🔇 Additional comments (4)

src/mongo/db/modules/eloq/src/eloq_record_store.cpp (1)

594-595: Verify RecordData lifetime assumption.

The _idReadRecord member holds the record data, and seekExact returns a RecordData pointing to _idReadRecord.EncodedBlobData(). This pattern is consistent with _metadata in EloqCatalogRecordStoreCursor (line 149). Ensure callers do not hold the returned RecordData beyond the next cursor operation, as the underlying buffer will be overwritten.

src/mongo/db/storage/index_entry_comparison.h (1)

46-59: Well-structured extension to IndexKeyEntry.

The addition of the optional RecordData field and the new constructor is clean. The use of boost::optional is appropriate for the optional payload. Note that operator== (line 64) intentionally compares only key and loc, not record, which is correct since the record is auxiliary data and identity is determined by key/loc.

src/mongo/db/index/index_access_method.h (1)

185-187: Backward-compatible API extension.

The addition of the optional RecordData* record parameter with a default of nullptr ensures existing callers are unaffected. The implementation (per the relevant snippet from index_access_method.cpp) correctly populates the out-parameter only when non-null and when record data is available.

src/mongo/db/exec/index_scan.cpp (1)

227-232: Effective optimization path for record-backed index entries.

The branching logic correctly handles both cases: when kv->record is available, it populates the object directly via releaseToBson() and transitions to RecordIdAndObj, avoiding a subsequent fetch. When no record is present, it falls back to the prior RecordIdAndIdx behavior. This aligns well with the PR's objective to eliminate redundant record store lookups.

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

src/mongo/db/exec/idhack.cpp (2)

123-142: RecordData-based flow correctly implemented, but creates dead code below.

The new flow correctly obtains RecordData from findSingle, populates the WorkingSetMember with the full object, and returns early via advance(). However, this early return makes all code at lines 144-167 unreachable.

144-167: Remove unreachable dead code.

Lines 144-167 are unreachable due to the early return on line 142. This dead code includes:

_recordCursor initialization
Fetcher logic for paging in documents
WorkingSetCommon::fetch call

This code path is no longer needed since the record is now populated directly from RecordData.

🔎 Proposed fix

         member->obj = {getOpCtx()->recoveryUnit()->getSnapshotId(), recordData.releaseToBson()};
-        // _workingSet->transitionToRecordIdAndIdx(id);
         _workingSet->transitionToRecordIdAndObj(id);
         return advance(id, member, out);
-
-        if (!_recordCursor)
-            _recordCursor = _collection->getCursor(getOpCtx());
-
-        // We may need to request a yield while we fetch the document.
-        if (auto fetcher = _recordCursor->fetcherForId(recordId)) {
-            // There's something to fetch. Hand the fetcher off to the WSM, and pass up a
-            // fetch request.
-            _idBeingPagedIn = id;
-            member->setFetcher(fetcher.release());
-            *out = id;
-            return NEED_YIELD;
-        }
-
-        // The doc was already in memory, so we go ahead and return it.
-        if (!WorkingSetCommon::fetch(getOpCtx(), _workingSet, id, _recordCursor)) {
-            // _id is immutable so the index would return the only record that could
-            // possibly match the query.
-            _workingSet->free(id);
-            _commonStats.isEOF = true;
-            _done = true;
-            return IS_EOF;
-        }
-
-        return advance(id, member, out);
     } catch (const WriteConflictException&) {

🧹 Nitpick comments (1)

src/mongo/db/index/index_access_method.cpp (1)
286-288: Optional access is safe but could use more idiomatic dereference.

The check on line 286 correctly verifies both that the record pointer is non-null and that kv->record contains a value before calling .value() on line 287. However, using the dereference operator *kv->record would be more idiomatic once you've confirmed the optional has a value.
🔎 Suggested refactor
         if (record && kv->record) {
-            *record = std::move(kv->record.value());
+            *record = std::move(*kv->record);
         }

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 79f54c4 and fb6af58.

📒 Files selected for processing (10)

src/mongo/db/exec/idhack.cpp
src/mongo/db/exec/index_iterator.cpp
src/mongo/db/exec/index_scan.cpp
src/mongo/db/index/index_access_method.cpp
src/mongo/db/index/index_access_method.h
src/mongo/db/modules/eloq/src/eloq_index.cpp
src/mongo/db/modules/eloq/src/eloq_record_store.cpp
src/mongo/db/modules/eloq/src/eloq_recovery_unit.cpp
src/mongo/db/modules/eloq/src/eloq_recovery_unit.h
src/mongo/db/storage/index_entry_comparison.h

💤 Files with no reviewable changes (2)

src/mongo/db/modules/eloq/src/eloq_recovery_unit.cpp
src/mongo/db/modules/eloq/src/eloq_recovery_unit.h

🚧 Files skipped from review as they are similar to previous changes (2)

src/mongo/db/index/index_access_method.h
src/mongo/db/exec/index_scan.cpp

🧰 Additional context used

🧠 Learnings (2)

📓 Common learnings

Learnt from: githubzilla
Repo: eloqdata/eloqdoc PR: 211
File: src/mongo/db/modules/eloq/cmake/build_eloq_log_service.cmake:26-80
Timestamp: 2025-09-25T12:24:06.434Z
Learning: The user githubzilla implemented a fix in commit fe98aaf to address the MEMORY state incorrectly triggering RocksDB discovery in build_eloq_log_service.cmake. This was a control flow issue where MEMORY builds were falling into an else branch that still attempted RocksDB discovery.

📚 Learning: 2025-11-28T10:55:21.020Z

Learnt from: githubzilla
Repo: eloqdata/eloqdoc PR: 279
File: src/mongo/db/modules/eloq/CMakeLists.txt:54-55
Timestamp: 2025-11-28T10:55:21.020Z
Learning: In src/mongo/db/modules/eloq/CMakeLists.txt, the CMake options EXT_TX_PROC_ENABLED, FORK_HM_PROCESS, and STATISTICS are used by the data_substrate submodule build system and should be retained as configuration interface options.

Applied to files:

src/mongo/db/modules/eloq/src/eloq_record_store.cpp
src/mongo/db/modules/eloq/src/eloq_index.cpp

🧬 Code graph analysis (2)

src/mongo/db/modules/eloq/src/eloq_record_store.cpp (2)

src/mongo/db/modules/eloq/src/base/eloq_util.cpp (2)

TxErrorCodeToMongoStatus (11-75)

TxErrorCodeToMongoStatus (11-11)

src/mongo/db/modules/eloq/src/eloq_index.cpp (6)

MONGO_LOG (187-189)

MONGO_LOG (191-195)

MONGO_LOG (197-208)

MONGO_LOG (210-215)

MONGO_LOG (331-363)

MONGO_LOG (389-413)

src/mongo/db/modules/eloq/src/eloq_index.cpp (2)

src/mongo/db/modules/eloq/src/eloq_record_store.cpp (5)

mongoKey (712-712)

MONGO_UNREACHABLE (120-122)

MONGO_UNREACHABLE (124-126)

MONGO_UNREACHABLE (128-130)

MONGO_UNREACHABLE (132-134)

src/mongo/db/modules/eloq/src/base/eloq_util.cpp (2)

TxErrorCodeToMongoStatus (11-75)

TxErrorCodeToMongoStatus (11-11)

🔇 Additional comments (12)

src/mongo/db/exec/index_iterator.cpp (1)

69-75: LGTM! Conditional record population implemented correctly.

The branching logic appropriately handles both cases: when the index entry contains record data (populating the full object and transitioning to RecordIdAndObj), and when it doesn't (falling back to the RecordIdAndIdx path). This aligns with the PR's objective to avoid redundant fetches for index-organized tables.

src/mongo/db/storage/index_entry_comparison.h (2)

49-50: LGTM! Constructor correctly uses std::move for efficiency.

The new constructor appropriately moves the RecordData parameter, avoiding unnecessary copies.

55-58: LGTM! Optional RecordData field clearly documented.

The comment effectively explains the EloqDoc-specific use case for storing RecordData alongside the index entry to avoid main table fetches.

src/mongo/db/index/index_access_method.cpp (1)

260-262: LGTM! Signature change is backward compatible.

The new optional RecordData* record parameter with default nullptr maintains backward compatibility with existing call sites while enabling the new payload-return path.

src/mongo/db/modules/eloq/src/eloq_record_store.cpp (2)

483-505: LGTM! Direct KV read correctly replaces KVPair-based flow.

The refactored seekExact implementation correctly:

Constructs a MongoKey from the RecordId

Populates _idReadRecord via getKV

Updates cursor state (_lastMongoKey)

Returns RecordData referencing _idReadRecord's internal buffer

The lifetime of the returned RecordData is properly scoped to the cursor member _idReadRecord, matching the pattern used elsewhere in this file (see lines 145-149 for _metadata).

594-594: New member _idReadRecord correctly stores blob data.

The private member serves as stable storage for the record fetched by seekExact, enabling RecordData to reference its buffer without requiring immediate ownership transfer.

src/mongo/db/modules/eloq/src/eloq_index.cpp (6)

65-65: LGTM! Constructor simplified to remove _kvPair initialization.

Only _query needs explicit initialization now that _kvPair has been removed.

236-252: LGTM! Direct KV read correctly populates _idReadRecord.

The refactored _idRead flow:

Creates a MongoKey from the KeyString

Calls getKV to populate _idReadRecord

On success, sets _recordPtr to reference _idReadRecord

On failure, sets _recordPtr to nullptr and marks EOF

This correctly replaces the _kvPair-based approach with direct record storage.

340-340: LGTM! ID index path correctly exposes record data.

For the primary index (ID type), _recordPtr is set to _scanTupleRecord, enabling the caller to access the full document without an additional fetch.

347-359: LGTM! UNIQUE and STANDARD paths correctly omit record data.

Secondary indexes (UNIQUE and STANDARD types) set _recordPtr to nullptr since they don't carry the full record. The explicit break and default: MONGO_UNREACHABLE ensure defensive handling of the enum.

379-386: LGTM! Conditional IndexKeyEntry construction based on record availability.

The _curr method correctly returns:

IndexKeyEntry with record data when _recordPtr is non-null (primary index case)

IndexKeyEntry without record data when _recordPtr is null (secondary index case)

This enables the caller to populate WorkingSetMember appropriately without redundant fetches.

439-443: LGTM! New members support index-organized table semantics.

_recordPtr: Tracks the current record pointer (either _idReadRecord or _scanTupleRecord)

_idReadRecord: Stores blob data for exact-match reads via _idRead

Together, these members enable returning RecordData alongside the index lookup for primary indexes, avoiding the redundant RecordStore fetch described in the PR objectives.

xiexiaoy marked this pull request as ready for review December 21, 2025 09:41

xiexiaoy added the trigger-ci label Dec 21, 2025

coderabbitai bot reviewed Dec 21, 2025

View reviewed changes

src/mongo/db/index/index_access_method.cpp Outdated Show resolved Hide resolved

xiexiaoy force-pushed the refactor branch from 7db3428 to 01029ee Compare December 21, 2025 10:13

coderabbitai bot reviewed Dec 21, 2025

View reviewed changes

xiexiaoy force-pushed the refactor branch from 01029ee to dc8d1c3 Compare December 21, 2025 10:18

coderabbitai bot reviewed Dec 21, 2025

View reviewed changes

xiexiaoy mentioned this pull request Dec 22, 2025

Batch lookup primary table for secondary index scan #327

Closed

xiexiaoy linked an issue Dec 22, 2025 that may be closed by this pull request

[Bug]: EloqRecordStoreCursor::seekExact store_record set to nullptr by concurrent EloqIndexCursor::seekExact #374

Closed

xiexiaoy force-pushed the refactor branch from dc8d1c3 to 79f54c4 Compare December 22, 2025 06:21

coderabbitai bot reviewed Dec 22, 2025

View reviewed changes

Remove KVPair

fb6af58

xiexiaoy force-pushed the refactor branch from 79f54c4 to fb6af58 Compare December 22, 2025 06:29

coderabbitai bot reviewed Dec 22, 2025

View reviewed changes

xiexiaoy requested a review from githubzilla December 22, 2025 06:38

xiexiaoy linked an issue Dec 22, 2025 that may be closed by this pull request

[Bug]: tests/jstests/core/remove9.js trigger Segmentation fault #376

Closed

xiexiaoy requested a review from lzxddz December 22, 2025 10:54

lzxddz approved these changes Dec 22, 2025

View reviewed changes

xiexiaoy merged commit e3c4f64 into eloqdata:main Dec 22, 2025
2 of 3 checks passed

xiexiaoy deleted the refactor branch December 22, 2025 11:12

This was referenced Dec 30, 2025

align bulk write error with mongodb #402

Merged

Batch lookup primary table for secondary index scan #404

Merged

coderabbitai bot mentioned this pull request Feb 26, 2026

batch read catalog #464

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove EloqUnitRecovery::_kvPair#387

Remove EloqUnitRecovery::_kvPair#387
xiexiaoy merged 1 commit intoeloqdata:mainfrom
xiexiaoy:refactor

xiexiaoy commented Dec 21, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Dec 21, 2025 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

xiexiaoy commented Dec 21, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Dec 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

Pre-merge checks and finishing touches

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

xiexiaoy commented Dec 21, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Dec 21, 2025 •

edited

Loading