
Update cluster config #290

Merged
githubzilla merged 6 commits into eloqdata:main from githubzilla:update_cluster_config
Nov 6, 2025

Conversation

@githubzilla
Collaborator

@githubzilla githubzilla commented Nov 4, 2025

Summary by CodeRabbit

  • Refactor

    • Streamlined data store service initialization to use explicit node and topology context for more predictable startup.
    • Replaced file-based config loading with in-memory topology derivation and optional peer-based discovery for topology retrieval.
    • Improved startup logic to better handle single-node, bootstrap, and multi-node scenarios, including clearer leader/node determination and adjusted startup conditions.
  • Chores

    • Updated submodule references (metadata only) with no behavioral changes.

@coderabbitai

coderabbitai bot commented Nov 4, 2025

Walkthrough

DataStoreService startup moved from file-based config to in-memory topology derivation. EloqKVEngine::initDataStoreService now requires node/topology parameters and computes dss_leader_id/dss_node_id; DataStoreService::StartService gains leader/node ID parameters. Submodule pointers updated (metadata-only).

Changes

Cohort / File(s) / Summary

  • DataStoreService init & API
    src/mongo/db/modules/eloq/src/eloq_kv_engine.cpp, src/mongo/db/modules/eloq/src/eloq_kv_engine.h
    Replaced the no-arg initDataStoreService() with initDataStoreService(bool isSingleNode, uint32_t nodeId, uint32_t native_ng_id, const std::unordered_map<uint32_t, std::vector<txservice::NodeConfig>>& ng_configs). Removed file-based dss_config.ini loading; topology is derived from a DS peer if provided, or from ng_configs (via TxConfigsToDssClusterConfig). Introduced dss_leader_id and dss_node_id. DataStoreService::StartService signature changed to StartService(bool, uint32_t dss_leader_id, uint32_t dss_node_id) and is invoked with the computed IDs (bootstrap logic conditional on build). Added includes for unordered_map and sharder.h.
  • Submodule pointer updates
    src/mongo/db/modules/eloq/store_handler, src/mongo/db/modules/eloq/tx_service
    Submodule commit hashes updated (metadata-only; no source-code changes).

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant KV as EloqKVEngine
    participant DSS as DataStoreService

    rect rgb(245,250,255)
    note over KV,DSS: initDataStoreService (new flow)
    KV->>KV: Determine isSingleNode from ipList\nCompute dss_node_id and dss_leader_id
    alt DS peer provided
        KV->>KV: Fetch topology from DS peer
    else No peer
        KV->>KV: Derive topology from ng_configs (TxConfigsToDssClusterConfig)
    end
    alt rocksdb build
        KV->>DSS: StartService(true, dss_leader_id, dss_node_id)
    else other builds
        KV->>DSS: StartService((opt_bootstrap || isSingleNode), dss_leader_id, dss_node_id)
    end
    DSS->>DSS: Initialize internals with explicit IDs
    end
```
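The branching in the diagram can be condensed into a small decision helper. The sketch below is illustrative only: `UNKNOWN_DSS_LEADER_NODE_ID`, the struct, and the `rocksdbBuild` flag are stand-ins for the real build-time conditionals, not the PR's actual code.

```cpp
#include <cstdint>

// Sentinel for "leader not yet known" (assumed name, not the real constant).
constexpr uint32_t UNKNOWN_DSS_LEADER_NODE_ID = UINT32_MAX;

struct StartDecision {
    bool bootstrap;          // first argument to StartService
    uint32_t dss_leader_id;  // leader id passed alongside dss_node_id
};

// Mirrors the walkthrough: the leader id is pinned to this node only when
// bootstrapping or running single-node; RocksDB builds always bootstrap.
StartDecision decideStart(bool rocksdbBuild,
                          bool optBootstrap,
                          bool isSingleNode,
                          uint32_t nodeId) {
    uint32_t leader = UNKNOWN_DSS_LEADER_NODE_ID;
    if (optBootstrap || isSingleNode) {
        leader = nodeId;
    }
    const bool bootstrap = rocksdbBuild || optBootstrap || isSingleNode;
    return {bootstrap, leader};
}
```

Note that in a multi-node, non-bootstrap start the leader id stays at the sentinel, which is the gap one of the review comments below flags.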

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Inspect topology derivation paths (peer fetch vs ng_configs) and error handling.
  • Verify dss_leader_id / dss_node_id computation and bootstrap boolean across build variants.
  • Ensure signature changes propagate to any other call sites and that added includes compile.

Possibly related PRs

Poem

🐰 I hopped from .ini to in-memory trees,
Leaders and nodes found on a breeze.
No file to rustle, no disk to comb,
Topology stitched — the datastore's home.
I nibble config crumbs and leave things neat.

Pre-merge checks and finishing touches

✅ Passed checks (2 passed)
  • Description Check — ✅ Passed — Check skipped: CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed — The title 'Update cluster config' is directly related to the main change: updating cluster configuration handling in the DataStoreService initialization by replacing file-based loading with in-memory topology derivation and adding parameterized configuration.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2f2e8ab and 6e37eff.

📒 Files selected for processing (1)
  • src/mongo/db/modules/eloq/store_handler (1 hunks)
🔇 Additional comments (1)
src/mongo/db/modules/eloq/store_handler (1)

1-1: Submodule commit verified as valid and accessible.

The updated submodule pointer to 0f1c1243db27802ff96087b1bc868b37010f8511 has been verified:

  • Commit exists remotely and is accessible ✓
  • Contains a focused, legitimate change: adds a no-op check for non-shard data store service configurations
  • Modifies only one file (eloq_data_store_service/data_store_service.cpp) with 7 lines added
  • Low-risk defensive programming pattern with clear intent

The submodule update is sound.




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2f00dc2 and 143af2a.

📒 Files selected for processing (4)
  • src/mongo/db/modules/eloq/src/eloq_kv_engine.cpp (3 hunks)
  • src/mongo/db/modules/eloq/src/eloq_kv_engine.h (3 hunks)
  • src/mongo/db/modules/eloq/store_handler (1 hunks)
  • src/mongo/db/modules/eloq/tx_service (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-25T11:58:50.446Z
Learnt from: githubzilla
Repo: eloqdata/eloqdoc PR: 211
File: src/mongo/db/modules/eloq/cmake/build_log_service.cmake:116-119
Timestamp: 2025-09-25T11:58:50.446Z
Learning: The build_log_service.cmake file is specifically for "open log" functionality and only supports the ROCKSDB log state type (LOG_STATE_TYPE_RKDB). The full log state configuration with cloud variants (ROCKSDB_CLOUD_S3, ROCKSDB_CLOUD_GCS) is handled in build_eloq_log_service.cmake.

Applied to files:

  • src/mongo/db/modules/eloq/src/eloq_kv_engine.h
🧬 Code graph analysis (1)
src/mongo/db/modules/eloq/src/eloq_kv_engine.h (1)
src/mongo/db/modules/eloq/src/eloq_kv_engine.cpp (2)
  • initDataStoreService (584-738)
  • initDataStoreService (584-588)
🔇 Additional comments (1)
src/mongo/db/modules/eloq/tx_service (1)

1-1: Verify submodule commit aligns with DataStoreService initialization refactor.

This is a submodule pointer update. Per the enriched summary, this is a metadata-only change with no functional code modifications in the submodule itself. However, the PR refactors initDataStoreService() and StartService() signatures in the parent module (eloq_kv_engine).

Ensure that:

  1. The referenced tx_service commit hash (5cd5be353827a58263b21e1118e37209c00c4cad) is intended and represents a compatible state.
  2. Any tx_service usage of the updated DataStoreService initialization reflects the new parameterized signatures.
  3. The submodule update does not introduce accidental commits or point to an unexpected revision.

```cpp
    bool isSingleNode,
    uint32_t nodeId,
    uint32_t nativeNgId,
    const std::unordered_map<uint32_t, std::vector<txservice::NodeConfig>>& ng_configs);
```
Collaborator

Pick one naming convention, either camelCase or snake_case (the parameter list mixes nativeNgId with ng_configs).

```cpp
// Fetch ds topology from peer node
if (!EloqDS::DataStoreService::FetchConfigFromPeer(ds_peer_node, ds_config)) {
    LOG(ERROR) << "Failed to fetch config from peer node: " << ds_peer_node;
    return;
}
```
Collaborator

Use uasserted to raise an exception, or return an Error, instead of a bare return.
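A minimal sketch of the pattern this comment asks for. The real mongo::uasserted throws an AssertionException carrying a status code; uassertedStub below is a hedged stand-in using std::runtime_error, and fetchConfigFromPeerOrThrow is an illustrative wrapper, not the PR's function:

```cpp
#include <stdexcept>
#include <string>

// Stand-in for MongoDB's uasserted(code, msg): it never returns, it throws.
[[noreturn]] void uassertedStub(int code, const std::string& msg) {
    throw std::runtime_error("ErrorCode " + std::to_string(code) + ": " + msg);
}

// Raising instead of a bare `return;` means callers cannot silently continue
// with an uninitialized ds_config after a failed peer fetch.
void fetchConfigFromPeerOrThrow(bool fetchSucceeded, const std::string& peer) {
    if (!fetchSucceeded) {
        uassertedStub(/*InternalError*/ 1,
                      "unable to fetch config from peer " + peer);
    }
}
```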


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 143af2a and 5a9da72.

📒 Files selected for processing (1)
  • src/mongo/db/modules/eloq/src/eloq_kv_engine.cpp (3 hunks)
🔇 Additional comments (1)
src/mongo/db/modules/eloq/src/eloq_kv_engine.cpp (1)

686-696: LGTM! Appropriate handling of create_if_missing flag.

The differentiation between RocksDB (non-shared storage) and shared storage builds for the create_if_missing parameter is correct. RocksDB can safely create if missing since there are no conflicts in multi-node deployments, while shared storage must be more careful.

```cpp
    uasserted(ErrorCodes::InternalError, "Current node does not belong to any node group.");
}

bool isSingleNode = eloqGlobalOptions.ipList.find(',') == eloqGlobalOptions.ipList.npos;
```

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify edge cases for single-node detection.

The comma-based check assumes a well-formed comma-separated list. Consider what happens if ipList is empty, has trailing commas, or contains only whitespace.

Run the following script to check how ipList is validated elsewhere in the codebase:


🏁 Script executed:

```shell
#!/bin/bash
# Search for ipList validation or parsing logic
rg -nP --type=cpp -C3 'ipList\s*='
rg -nP --type=cpp -C3 'ParseNgConfig.*ipList'
```

Length of output: 1725


Now let me search the codebase directly for the ParseNgConfig implementation:


🏁 Script executed:

```shell
#!/bin/bash
# Find ParseNgConfig definition
rg -nP --type=cpp 'ParseNgConfig\s*\(' -A 10 -B 2 | head -80
```

Length of output: 1478



🏁 Script executed:

```shell
#!/bin/bash
# Search for ParseNgConfig definition in header files
fd -e h -e hpp | xargs grep -l "ParseNgConfig" | head -5
# Also search for txservice namespace definition
rg -nP 'namespace txservice' -A 5 | head -20
```

Length of output: 564



🏁 Script executed:

```shell
#!/bin/bash
# Search for how isSingleNode is used after line 376
rg -nP --type=cpp 'isSingleNode' -A 2 -B 2
```

Length of output: 2220



🏁 Script executed:

```shell
#!/bin/bash
# Search for ngConfigs.size() usage to understand relationship with isSingleNode
rg -nP --type=cpp 'ngConfigs\.size\(\)|ngConfigs\.empty\(\)' -B 3 -A 3
```

Length of output: 1354


Use ngConfigs.size() instead of comma-based ipList check to determine single-node deployment.

The current logic at line 376 determines isSingleNode by checking if ipList contains a comma, but ngConfigs may be populated from ReadClusterConfigFile() (line 334) independently of ipList. This creates a logic inconsistency: if the cluster config file contains multiple node groups but ipList defaults to localAddr.toString() (which has no comma), isSingleNode would incorrectly evaluate to true. Use ngConfigs.size() == 1 instead to reliably detect single-node deployments.

🤖 Prompt for AI Agents
In src/mongo/db/modules/eloq/src/eloq_kv_engine.cpp around line 376, the code
determines isSingleNode by checking for a comma in eloqGlobalOptions.ipList
which is inconsistent with ngConfigs populated from ReadClusterConfigFile();
change the determination to use ngConfigs.size() == 1 instead. Replace the
comma-based ipList check with a check that ngConfigs.size() == 1, and ensure
ngConfigs is the authoritative source (fall back to existing ipList logic only
if ngConfigs is empty), updating any related comments and keeping variable types
and includes unchanged. Ensure the new check preserves existing behavior when
ngConfigs is empty and add a brief comment explaining why ngConfigs is used for
single-node detection.

Comment on lines +603 to 624
```diff
         if (opt_bootstrap || isSingleNode) {
             dss_leader_id = node_id;
         }

         if (!ds_peer_node.empty()) {
             ds_config.SetThisNode(localIp, EloqDS::DataStoreServiceClient::TxPort2DssPort(localPort));
             // Fetch ds topology from peer node
             if (!EloqDS::DataStoreService::FetchConfigFromPeer(ds_peer_node, ds_config)) {
                 error() << "Failed to fetch config from peer node: " << ds_peer_node;
-                uasserted(ErrorCodes::InternalError,
-                          "DataStoreService initialization failed, config file not found: " +
-                              dss_config_file_path);
+                uasserted(ErrorCodes::InternalError,
+                          +"DataStoreService initialization failed: unable to fetch config from peer " +
+                              ds_peer_node);
             }
         } else {
             if (ngConfigs.size() > 1) {
                 error() << "DSS peer node must be provided in multi-node deployment.";
                 uasserted(ErrorCodes::InternalError, "DataStoreService initialization failed");
             }

             EloqDS::DataStoreServiceClient::TxConfigsToDssClusterConfig(
                 dss_node_id, native_ng_id, ngConfigs, dss_leader_id, ds_config);
         }
```

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Clarify leader ID initialization logic for multi-node deployments.

The dss_leader_id is only set when opt_bootstrap || isSingleNode (line 603-605). For multi-node non-bootstrap scenarios, it remains UNKNOWN_DSS_LEADER_NODE_ID unless a peer is provided. However, the validation at lines 617-620 only enforces that a peer must be provided when ngConfigs.size() > 1, which could lead to an inconsistent state if ngConfigs.size() == 1 but isSingleNode is false.

Consider adding explicit validation to ensure that:

  • Multi-node deployments (where !isSingleNode) always have either a peer node configured OR are in bootstrap mode
  • The relationship between isSingleNode, ngConfigs.size(), and ds_peer_node is clearly enforced
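One way to enforce the invariant the two bullets ask for, sketched as a standalone check. The function name and parameters are illustrative, not part of the PR:

```cpp
#include <cstddef>
#include <string>

// A non-single-node start must have a leader source: bootstrap mode, or a
// DSS peer to fetch the topology (and hence the leader) from.
bool startConfigIsValid(bool isSingleNode,
                        bool optBootstrap,
                        std::size_t ngCount,
                        const std::string& dsPeerNode) {
    if (isSingleNode) {
        // Consistency: a single-node start should not claim multiple groups.
        return ngCount <= 1;
    }
    return optBootstrap || !dsPeerNode.empty();
}
```

The last rejected case below (one node group, not single-node, no peer, no bootstrap) is exactly the inconsistent state the comment describes.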

🏁 Script executed:

```shell
#!/bin/bash
# Check how UNKNOWN_DSS_LEADER_NODE_ID is handled in StartService
ast-grep --pattern $'StartService($$$, $LEADER_ID, $$$)'

# Check DataStoreService implementation for leader ID handling
rg -nP --type=cpp -C5 'UNKNOWN_DSS_LEADER_NODE_ID'
```

Length of output: 1021


I need to gather more information to verify the review comment's concerns. Let me search for where dss_leader_id is used and check for validation logic:


---

🏁 Script executed:

```shell
#!/bin/bash
# Find where dss_leader_id is used after initialization
rg -nP --type=cpp 'dss_leader_id' src/mongo/db/modules/eloq/src/eloq_kv_engine.cpp

# Search for StartService calls in the file
rg -nP --type=cpp 'StartService' src/mongo/db/modules/eloq/src/eloq_kv_engine.cpp -A3 -B3

# Check for any validation that enforces consistency between isSingleNode, ngConfigs, and ds_peer_node
rg -nP --type=cpp '(isSingleNode|ngConfigs\.size|ds_peer_node)' src/mongo/db/modules/eloq/src/eloq_kv_engine.cpp -B2 -A2
```

Length of output: 2744


I need to verify whether UNKNOWN_DSS_LEADER_NODE_ID is valid when passed to StartService and TxConfigsToDssClusterConfig:


---

🏁 Script executed:

```shell
#!/bin/bash
# Find StartService implementation to see if it validates dss_leader_id
find src/mongo/db/modules/eloq -type f -name "*.cpp" -o -name "*.h" | xargs rg -l 'StartService' | head -5

# Look for TxConfigsToDssClusterConfig to see how it handles dss_leader_id
rg -nP --type=cpp 'TxConfigsToDssClusterConfig' src/mongo/db/modules/eloq/ -A10 | head -30

# Check if there's any constraint relating isSingleNode and ngConfigs.size()
rg -nP --type=cpp 'isSingleNode.*ngConfigs|ngConfigs.*isSingleNode' src/mongo/db/modules/eloq/ -B3 -A3
```

Length of output: 1795


Comment on lines +613 to +614
```cpp
                      +"DataStoreService initialization failed: unable to fetch config from peer " +
                          ds_peer_node);
```

⚠️ Potential issue | 🟡 Minor

Remove unnecessary unary + operator.

The unary + operator before the string literal on line 613 is unnecessary and appears to be a typo.

Apply this diff:

```diff
-            uasserted(ErrorCodes::InternalError,
-                      +"DataStoreService initialization failed: unable to fetch config from peer " +
-                          ds_peer_node);
+            uasserted(ErrorCodes::InternalError,
+                      "DataStoreService initialization failed: unable to fetch config from peer " +
+                          ds_peer_node);
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```diff
-+"DataStoreService initialization failed: unable to fetch config from peer " +
-    ds_peer_node);
+if (!EloqDS::DataStoreService::FetchConfigFromPeer(ds_peer_node, ds_config)) {
+    error() << "Failed to fetch config from peer node: " << ds_peer_node;
+    uasserted(ErrorCodes::InternalError,
+              "DataStoreService initialization failed: unable to fetch config from peer " +
+                  ds_peer_node);
+}
```
🤖 Prompt for AI Agents
In src/mongo/db/modules/eloq/src/eloq_kv_engine.cpp around lines 613 to 614,
there is an unnecessary unary plus operator before a string literal
("+\"DataStoreService initialization failed: ...\"") — remove the leading '+' so
the string concatenation is written normally (e.g., "DataStoreService
initialization failed: unable to fetch config from peer " + ds_peer_node) to
eliminate the typo and potential warning.

@githubzilla githubzilla merged commit 06ebb54 into eloqdata:main Nov 6, 2025
3 checks passed
This was referenced Nov 9, 2025
@coderabbitai coderabbitai bot mentioned this pull request Dec 2, 2025