Retry auto commit upserts on out-of-memory error#369
Conversation
WalkthroughAdds per-call retry loops (up to 5 attempts) around EloqDoc-enabled command execution and data-modifying receiver paths, augments Eloq module logs to include transaction IDs and error details, and enforces an early return when session connections exceed the configured maximum. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~40 minutes
Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: Organization UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
e2bde35 to
51545be
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
src/mongo/db/service_entry_point_common.cpp (1)
1181-1280: Consider extracting a reusable retry helper.The retry logic is duplicated across
receivedInsert,receivedUpdate, andreceivedDelete. Consider extracting a template or lambda-based helper to reduce duplication and ensure consistency:template <typename Op> void executeWithRetry(OperationContext* opCtx, Op&& operation, const char* opName) { int retry = RETRY_NUM; while (retry-- > 0) { try { WriteUnitOfWork wuow(opCtx); operation(); if (opCtx->getRecoveryUnitState() == WriteUnitOfWork::kActiveUnitOfWork) { wuow.commit(); } break; } catch (const DBException& ex) { if ((ex.code() == ErrorCodes::WriteConflict || ex.code() == ErrorCodes::ExceededMemoryLimit) && retry > 0) { LOG(1) << opName << " throw DBException " << ex.what() << " retrying " << (RETRY_NUM - retry) << "/" << RETRY_NUM; continue; } LOG(1) << opName << " throw DBException " << ex.what(); throw; } } }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/mongo/db/modules/eloq/src/eloq_recovery_unit.cpp(1 hunks)src/mongo/db/service_entry_point_common.cpp(5 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: githubzilla
Repo: eloqdata/eloqdoc PR: 211
File: src/mongo/db/modules/eloq/cmake/build_eloq_log_service.cmake:26-80
Timestamp: 2025-09-25T12:24:06.434Z
Learning: The user githubzilla implemented a fix in commit fe98aaf to address the MEMORY state incorrectly triggering RocksDB discovery in build_eloq_log_service.cmake. This was a control flow issue where MEMORY builds were falling into an else branch that still attempted RocksDB discovery.
📚 Learning: 2025-09-25T12:24:06.434Z
Learnt from: githubzilla
Repo: eloqdata/eloqdoc PR: 211
File: src/mongo/db/modules/eloq/cmake/build_eloq_log_service.cmake:26-80
Timestamp: 2025-09-25T12:24:06.434Z
Learning: The user githubzilla implemented a fix in commit fe98aaf to address the MEMORY state incorrectly triggering RocksDB discovery in build_eloq_log_service.cmake. This was a control flow issue where MEMORY builds were falling into an else branch that still attempted RocksDB discovery.
Applied to files:
src/mongo/db/modules/eloq/src/eloq_recovery_unit.cpp
📚 Learning: 2025-11-28T10:55:21.020Z
Learnt from: githubzilla
Repo: eloqdata/eloqdoc PR: 279
File: src/mongo/db/modules/eloq/CMakeLists.txt:54-55
Timestamp: 2025-11-28T10:55:21.020Z
Learning: In src/mongo/db/modules/eloq/CMakeLists.txt, the CMake options EXT_TX_PROC_ENABLED, FORK_HM_PROCESS, and STATISTICS are used by the data_substrate submodule build system and should be retained as configuration interface options.
Applied to files:
src/mongo/db/modules/eloq/src/eloq_recovery_unit.cpp
📚 Learning: 2025-11-14T03:10:22.027Z
Learnt from: githubzilla
Repo: eloqdata/eloqdoc PR: 279
File: src/mongo/db/dbmain.cpp:0-0
Timestamp: 2025-11-14T03:10:22.027Z
Learning: In this repository, avoid defining --config via gflags in src/mongo/db/dbmain.cpp; MongoDB’s options_parser owns --config. Prefer gflags flags with a distinct prefix (e.g., --data_substrate_config) parsed before options_parser, with options_parser configured to allow unregistered options.
Applied to files:
src/mongo/db/service_entry_point_common.cpp
🔇 Additional comments (2)
src/mongo/db/modules/eloq/src/eloq_recovery_unit.cpp (1)
484-494: LGTM - OUT_OF_MEMORY added as retryable error.This correctly groups
OUT_OF_MEMORYwith other transient/retryable errors for logging purposes, consistent with the retry mechanism being added inservice_entry_point_common.cpp.src/mongo/db/service_entry_point_common.cpp (1)
107-108: LGTM - Retry constant defined.Good practice to define the retry count as a named constant.
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (1)
src/mongo/db/service_entry_point_common.cpp (1)
107-107: Retry loops: missingbreakon success and exceptions are swallowed; also hard‑coded retry countAcross
receivedInsert,receivedUpdate, andreceivedDelete:
- After a successful
wuow.commit(), the loop does notbreak, so the same write is re‑executed up toRETRY_NUMtimes. For non‑idempotent inserts/updates/deletes this can cause duplicate or repeated modifications.- In the
catch (const DBException& ex)block, when the error is non‑retryable or retries are exhausted, you just log andbreak. The function then returns as if the operation succeeded, so callers (andLastError) won’t see the failure. This is a behavior change from the previous code where the exception propagated.- You now catch
DBExceptionhere, so the exception never reaches the outerhandleRequestcatch (AssertionException&), further hiding failures.- The log message hard‑codes
5and(5 - retry), which will diverge from the actual retry count ifRETRY_NUMis changed.These issues were already called out in earlier review comments and still apply here. Please adjust the loops so that:
- On success, you immediately
breakout of the loop.- On a retryable error (
WriteConflict/ExceededMemoryLimit) with attempts left, you log andcontinue.- Otherwise you log and
throw;to propagate the failure.- Use
RETRY_NUMconsistently in the log message instead of the literal5.For example, for
receivedInsert(apply the same pattern to update/delete):- int retry = RETRY_NUM; - while (retry-- > 0) { - try { - - WriteUnitOfWork wuow(opCtx); - performInserts(opCtx, insertOp); - if (opCtx->getRecoveryUnitState() == WriteUnitOfWork::kActiveUnitOfWork) { - wuow.commit(); - } - } catch (const DBException& ex) { - if ((ex.code() == ErrorCodes::WriteConflict || - ex.code() == ErrorCodes::ExceededMemoryLimit) && - retry > 0) { - LOG(1) << "performInserts throw DBException " << ex.what() << " retrying " - << (5 - retry) << "/5"; - continue; - } else { - LOG(1) << "performInserts throw DBException " << ex.what(); - break; - } - } - } + int retry = RETRY_NUM; + while (retry-- > 0) { + try { + WriteUnitOfWork wuow(opCtx); + performInserts(opCtx, insertOp); + if (opCtx->getRecoveryUnitState() == WriteUnitOfWork::kActiveUnitOfWork) { + wuow.commit(); + } + break; // Success – exit retry loop + } catch (const DBException& ex) { + if ((ex.code() == ErrorCodes::WriteConflict || + ex.code() == ErrorCodes::ExceededMemoryLimit) && + retry > 0) { + LOG(1) << "performInserts throw DBException " << ex.what() + << " retrying " << (RETRY_NUM - retry) << "/" << RETRY_NUM; + continue; + } else { + LOG(1) << "performInserts throw DBException " << ex.what(); + throw; // Propagate final / non‑retryable failure + } + } + }Same structure should be applied to the update and delete handlers to avoid duplicate writes and to preserve error propagation semantics.
Also applies to: 1181-1200, 1226-1244, 1260-1278
🧹 Nitpick comments (1)
src/mongo/db/service_entry_point_common.cpp (1)
46-46: Unusedwrite_conflict_exceptionincludeThis header isn’t used in this TU (we only reference
ErrorCodes::WriteConflict). If you don’t plan to usewriteConflictRetry/WriteConflictExceptionhere, consider dropping the include to avoid an extra dependency.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/mongo/db/service_entry_point_common.cpp(5 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-14T03:10:22.027Z
Learnt from: githubzilla
Repo: eloqdata/eloqdoc PR: 279
File: src/mongo/db/dbmain.cpp:0-0
Timestamp: 2025-11-14T03:10:22.027Z
Learning: In this repository, avoid defining --config via gflags in src/mongo/db/dbmain.cpp; MongoDB’s options_parser owns --config. Prefer gflags flags with a distinct prefix (e.g., --data_substrate_config) parsed before options_parser, with options_parser configured to allow unregistered options.
Applied to files:
src/mongo/db/service_entry_point_common.cpp
51545be to
7e249eb
Compare
7e249eb to
0e5fe19
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (3)
src/mongo/db/service_entry_point_common.cpp (3)
1195-1217: Critical: Exception silently swallowed on failure.The retry loop has the same critical issue flagged in the previous review: when retries are exhausted or a non-retryable exception occurs, the code
breaks without re-throwing, so clients won't know the operation failed.Apply this diff to fix:
} else { LOG(1) << "performInserts throw DBException " << ex.what(); - break; + throw; // Re-throw to propagate error to caller }
1241-1262: Critical: Exception silently swallowed on failure.This has the same critical bug as the insert retry loop: exceptions are silently swallowed when retries are exhausted or a non-retryable exception occurs.
Apply this diff to fix:
} else { LOG(1) << "performUpdates throw DBException " << ex.what(); - break; + throw; // Re-throw to propagate error to caller }
1276-1297: Critical: Exception silently swallowed on failure.This has the same critical bug as the insert and update retry loops: exceptions are silently swallowed when retries are exhausted or a non-retryable exception occurs.
Apply this diff to fix:
} else { LOG(1) << "performDeletes throw DBException " << ex.what(); - break; + throw; // Re-throw to propagate error to caller }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/mongo/db/service_entry_point_common.cpp(6 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-14T03:10:22.027Z
Learnt from: githubzilla
Repo: eloqdata/eloqdoc PR: 279
File: src/mongo/db/dbmain.cpp:0-0
Timestamp: 2025-11-14T03:10:22.027Z
Learning: In this repository, avoid defining --config via gflags in src/mongo/db/dbmain.cpp; MongoDB’s options_parser owns --config. Prefer gflags flags with a distinct prefix (e.g., --data_substrate_config) parsed before options_parser, with options_parser configured to allow unregistered options.
Applied to files:
src/mongo/db/service_entry_point_common.cpp
🔇 Additional comments (2)
src/mongo/db/service_entry_point_common.cpp (2)
46-46: LGTM!The include is necessary for handling
WriteConflictexceptions in the retry logic.
107-107: LGTM!The retry constant is appropriately defined in the anonymous namespace.
8861aa0 to
62db8bc
Compare
4214dca to
1fc5831
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (3)
src/mongo/db/service_entry_point_common.cpp (3)
1232-1235: Critical: Exception still swallowed -breakshould bethrow.The past review flagged this issue (marked as addressed) but the code still shows
breakinstead ofthrow. When a non-retryable exception occurs or retries are exhausted, the exception is silently swallowed and the function returns normally.Apply this diff:
} else { LOG(1) << "performInserts throw DBException " << ex.what(); - break; + throw; }
1278-1281: Critical: Same issue -breakshould bethrow.Same problem as
receivedInsert. The exception is silently swallowed when retries are exhausted or a non-retryable exception occurs.Apply this diff:
} else { LOG(1) << "performUpdates throw DBException " << ex.what(); - break; + throw; }
1314-1317: Critical: Same issue -breakshould bethrow.Same problem as the other receiver functions. The exception is silently swallowed.
Apply this diff:
} else { LOG(1) << "performDeletes throw DBException " << ex.what(); - break; + throw; }
🧹 Nitpick comments (1)
src/mongo/db/service_entry_point_common.cpp (1)
1214-1237: Consider extracting a common retry helper to ensure consistency.There are two slightly different retry patterns in this file:
runCommandImpl: doesn't checkretry > 0in catch, throws for non-retryablereceivedInsert/Update/Delete: checksretry > 0, but breaks (should throw)Both have bugs that result in swallowed exceptions. A common helper function would prevent these inconsistencies:
template <typename Func> void retryOnTransientError(OperationContext* opCtx, const char* opName, Func&& fn) { for (int attempt = 1; attempt <= RETRY_NUM; ++attempt) { try { fn(); return; // Success } catch (const DBException& ex) { if ((ex.code() == ErrorCodes::WriteConflict || ex.code() == ErrorCodes::ExceededMemoryLimit) && attempt < RETRY_NUM) { LOG(1) << opName << " throw DBException " << ex.what() << " retrying " << attempt << "/" << RETRY_NUM; opCtx->sleepFor(Milliseconds(1)); continue; } throw; // Re-throw on final attempt or non-retryable } } }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
src/mongo/db/modules/eloq/src/eloq_record_store.cpp(1 hunks)src/mongo/db/modules/eloq/src/eloq_recovery_unit.cpp(2 hunks)src/mongo/db/service_entry_point_common.cpp(7 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- src/mongo/db/modules/eloq/src/eloq_recovery_unit.cpp
- src/mongo/db/modules/eloq/src/eloq_record_store.cpp
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: githubzilla
Repo: eloqdata/eloqdoc PR: 211
File: src/mongo/db/modules/eloq/cmake/build_eloq_log_service.cmake:26-80
Timestamp: 2025-09-25T12:24:06.434Z
Learning: The user githubzilla implemented a fix in commit fe98aaf to address the MEMORY state incorrectly triggering RocksDB discovery in build_eloq_log_service.cmake. This was a control flow issue where MEMORY builds were falling into an else branch that still attempted RocksDB discovery.
📚 Learning: 2025-11-14T03:10:22.027Z
Learnt from: githubzilla
Repo: eloqdata/eloqdoc PR: 279
File: src/mongo/db/dbmain.cpp:0-0
Timestamp: 2025-11-14T03:10:22.027Z
Learning: In this repository, avoid defining --config via gflags in src/mongo/db/dbmain.cpp; MongoDB’s options_parser owns --config. Prefer gflags flags with a distinct prefix (e.g., --data_substrate_config) parsed before options_parser, with options_parser configured to allow unregistered options.
Applied to files:
src/mongo/db/service_entry_point_common.cpp
🔇 Additional comments (2)
src/mongo/db/service_entry_point_common.cpp (2)
46-46: LGTM!Include is necessary for the
WriteConflicterror code used in the retry logic.
107-108: LGTM!Defining the retry count as a named constant is good practice for maintainability.
1fc5831 to
d6487d9
Compare
c469469 to
20fc04a
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/mongo/transport/service_entry_point_impl.cpp (1)
116-124: Minor optimization: Skip coroutine configuration for rejected connections.The coroutine executor configuration executes even when
connectionCount > _maxNumConnections, performing unnecessary work for connections that will be rejected at line 134. Consider moving this block after the capacity check to avoid wasted cycles.} - if (_coroutineExecutor) { - MONGO_LOG(0) << "use coroutine service executor"; - ssm->setServiceExecutor(_coroutineExecutor.get()); - - // work balance - size_t targetThreadGroupId = connectionCount % serverGlobalParams.reservedThreadNum; - ssm->setThreadGroupId(targetThreadGroupId); - MONGO_LOG(0) << "Current ssm is assigned to thread group " << targetThreadGroupId; - } - // Checking if we successfully added a connection above. Separated from the lock so we don't log // while holding it. if (connectionCount > _maxNumConnections) { if (!quiet) { // log() << "connection refused because too many open connections: " << connectionCount; log() << "too many open connections: " << connectionCount; } return; } + if (_coroutineExecutor) { + MONGO_LOG(0) << "use coroutine service executor"; + ssm->setServiceExecutor(_coroutineExecutor.get()); + + // work balance + size_t targetThreadGroupId = connectionCount % serverGlobalParams.reservedThreadNum; + ssm->setThreadGroupId(targetThreadGroupId); + MONGO_LOG(0) << "Current ssm is assigned to thread group " << targetThreadGroupId; + } +
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/mongo/transport/service_entry_point_impl.cpp(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: githubzilla
Repo: eloqdata/eloqdoc PR: 211
File: src/mongo/db/modules/eloq/cmake/build_eloq_log_service.cmake:26-80
Timestamp: 2025-09-25T12:24:06.434Z
Learning: The user githubzilla implemented a fix in commit fe98aaf to address the MEMORY state incorrectly triggering RocksDB discovery in build_eloq_log_service.cmake. This was a control flow issue where MEMORY builds were falling into an else branch that still attempted RocksDB discovery.
🔇 Additional comments (1)
src/mongo/transport/service_entry_point_impl.cpp (1)
134-134: Early return prevents critical bugs, socket cleanup verified.The early return is essential and correct. Without it, the code would:
- Log "connection accepted" despite rejecting the connection (incorrect behavior).
- Set up a cleanup hook that calls
_sessions.erase(ssmIt)on an uninitialized iterator (undefined behavior/crash).- Start an untracked session (resource leak).
Socket cleanup is properly handled when returning early: the
sessionobject goes out of scope, triggering~Session(), which closes the socket via ASIO. The explicitreleaseSessionHandler()call in the cleanup hook (line 158) simply nulls the pointer and is not necessary for socket closure.
Summary by CodeRabbit
Bug Fixes
Chores
✏️ Tip: You can customize this high-level summary in your review settings.