docs(design): propose scaling roadmap#954
Conversation
|
Warning Review limit reached
More reviews will be available in 41 minutes and 59 seconds. Learn how PR review limits work. Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file). ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits. 🚦 How do rate limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate. For paid Pro and Pro+ PR reviews, CodeRabbit uses rolling per-developer review limits. Reviews become available again as older review attempts age out of the rolling limit window. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@claude review |
|
Claude finished @bootjp's task in 3m 55s —— View job Claude's Review of PR #954 — Scaling Roadmap Design DocTasks:
SummaryThis is a solid doc-only PR with well-structured analysis and honest gap identification. The overall framing (current envelope → dimension analysis → gap list → dependency-ordered sequencing) is clear and the design-doc conventions are followed. Most code-level claims check out, but there are two stale line-number citations, one missing companion document that the text promises exists, and one open question that could materially affect the sequencing. Issues Found1. Stale line-number citations in
|
There was a problem hiding this comment.
Code Review
This pull request introduces a comprehensive scaling roadmap document (2026_06_12_proposed_scaling_roadmap.md) that analyzes the current scaling limits of elastickv and proposes a prioritized sequence of design proposals to address identified gaps. The reviewer's feedback focuses on strengthening the architectural and operational aspects of the roadmap. Specifically, the feedback highlights the need to address live-migration and rolling-upgrade strategies, apply decoupling patterns (like interceptors) for follower/learner reads, use layered mitigations for upgrade risks during sequencing, and explicitly note the requirement for shared start timestamps in cross-group transactions to ensure MVCC consistency.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
|
|
||
| --- | ||
|
|
||
| ## 5. Open Questions |
There was a problem hiding this comment.
When designing a production-grade system migration (such as moving from single-node multi-group to multi-node multi-group), the plan must consider and detail strategies for live or zero-downtime cutovers (e.g., blue/green, dual-write proxy, logical replication) to avoid service interruption. Additionally, if there are significant operational risks like the inability to perform rolling upgrades, the document should detail potential mitigation strategies (such as implementing a temporary 'bridge' or 'proxy' mode). Consider adding an open question or a dedicated section to address these live-migration and rolling-upgrade strategies.
References
- When designing a production-grade system migration, the plan must consider and detail strategies for live or zero-downtime cutovers (e.g., blue/green, dual-write proxy, logical replication) to avoid service interruption.
- When a design document identifies a significant operational risk, such as the inability to perform rolling upgrades, it must also detail potential mitigation strategies, like implementing a temporary 'bridge' or 'proxy' mode.
| ### Gap 3 — Follower / learner reads | ||
| **Problem.** Reads are leader-only; the learner primitive exists but its | ||
| consumer (off-leader reads) has no design. Read throughput cannot scale past | ||
| one node's leader capacity. **Rough milestones:** (M1) leader-issued read-ts | ||
| pipeline + a replica apply-watermark so a follower/learner can serve a | ||
| snapshot read at a leader-vouched timestamp (CLAUDE.md HLC rule). (M2) lease | ||
| invalidation interaction + staleness bound + adapter read routing to replicas. | ||
| (M3) Jepsen stale-read bound workload. **Depends-on:** Gap 1 (multi-node | ||
| groups to have a remote replica); the learner primitive (already in-tree). |
There was a problem hiding this comment.
When designing the follower/learner reads architecture to decouple low-level generic packages (such as raftengine or store) from high-level domain packages (such as kv or distribution), we should use an interceptor or delegate pattern with interfaces defined in the low-level package. Additionally, ensure that any such interceptor is optional (nil-safe) so that builds or environments with the high-level feature disabled can run without modification.
References
- To prevent tight coupling between low-level generic packages and high-level domain packages, use an interceptor or delegate pattern with interfaces defined in the low-level package. Ensure the interceptor is optional (nil-safe) so that builds or environments with the high-level feature disabled can run without modification.
|
|
||
| --- | ||
|
|
||
| ## 4. Sequencing (dependency-ordered rollout) |
There was a problem hiding this comment.
When planning the sequencing and rollout of these major architectural changes, we must mitigate rolling-upgrade risks. It is recommended to use layered mitigations (such as operational guidelines and admin RPC capability gates) and defer complex coordination protocols (like cluster-version Raft entries) to separate design slices if they have a broader scope. This keeps the initial rollout slices focused and manageable.
References
- When mitigating rolling-upgrade risks in design documents, use layered mitigations (such as operational guidelines and admin RPC capability gates) and defer complex coordination protocols (like cluster-version Raft entries) to separate design slices if they have a broader scope.
| (TSO doc §6 "Guarantee"). Cross-group transactions (`kv/transaction.go`, | ||
| `kv/txn_codec.go`) that span groups led by different nodes need a single | ||
| ordering source for OCC commit-ts comparability. So: per-group renewal fix |
There was a problem hiding this comment.
When designing cross-group transactions at scale, we must ensure that all operations within a single transaction share the same start timestamp to guarantee atomicity and a consistent snapshot view across the involved groups. This is a core correctness requirement for our MVCC system and should be explicitly highlighted in the transaction scaling analysis.
References
- In an MVCC system, all operations within a single transaction must share the same start timestamp to ensure atomicity and a consistent snapshot view.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a1abe3500e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| `--raftJoinAsLearner` exists (`main.go:104`). The learner doc itself scopes | ||
| follower-served reads as an explicit non-goal (§2 "Non-goals", §8 OQ-5): | ||
| learners forward `LinearizableRead` to the leader today. So the read-replica |
There was a problem hiding this comment.
Correct the learner read behavior description
When an attached learner receives LinearizableRead today it does not forward the read to the leader; internal/raftengine/etcd.Engine.handleRead returns ErrNotLeader for any non-leader state, and docs/raft_learner_operations.md documents that the caller must forward to the leader. Leaving this as “learners forward” misstates the current primitive that Gap 3 is supposed to build on, so the follower-read roadmap may omit required forwarding/proxy or API work.
Useful? React with 👍 / 👎.
Issue 1: correct stale line anchors in kv/sharded_coordinator.go - keyviz sampler observeMutation: 1795-1824 -> 1841-1846 - RunHLCLeaseRenewal: 1914-1953 -> 1960-1985; defaultGroup access :1915 -> :1961 Issue 2: soften companion-doc references to in-flight PR #955 form, matching the #945/#951/#953 branch-reference style (3 sites + map line). Issue 3: ground OQ-1 in the actual commitTS logic (nextStartTS/ resolveTxnCommitTS/nextTxnTSAfter all from one c.clock) and annotate §4 step 11 as deferred-pending-OQ-1 with an explicit trigger condition. Inline: fix learner LinearizableRead behaviour (engine returns ErrNotLeader, caller forwards; engine.go:1583); note shared-startTS invariant; add OQ-7 for the single-node->multi-node live cutover / rolling-upgrade strategy.
|
@claude review Addressed in Issue 1 — stale line anchors (fixed). Re-verified against current
The other spot-checked anchors ( Issue 2 — companion doc reference (softened, Option B). The companion is PR #955 ( Issue 3 — OQ-1 / TSO sequencing (took the substantive route: sketched the commitTS logic AND annotated step 11). Grounded in code:
Inline findings:
|
|
Claude finished @bootjp's task in 1m 58s —— View job Re-review of PR #954 — Scaling Roadmap (post
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 09031487bc
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| is in-scope-soon and load-bearing; the dedicated TSO group is only justified | ||
| once (i) multi-node multi-group is real and (ii) cross-group transactions | ||
| are common enough that the batch-allocator amortization pays for the extra | ||
| Raft group. Until both hold, the per-group fix plus the shared-`*HLC` | ||
| property is adequate. The recommendation in §4 sequences the per-group fix |
There was a problem hiding this comment.
Align TSO sequencing with first cross-node txn
This paragraph still frames the dedicated TSO as a throughput/amortization decision and says the per-group fix is adequate until cross-group transactions are common. That conflicts with the later OQ-1 analysis in this same doc: once Gap 1 permits participating groups to be led by different nodes, even the first such cross-group transaction can draw commitTS from incomparable per-node HLCs, so deferring the TSO/interim single-oracle plan based on workload frequency leaves a correctness hole in the roadmap.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bca519a604
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| **Problem.** No startup wiring produces a Raft group whose voters span more | ||
| than one node (`main.go:746` `len(groups) != 1` guard; | ||
| `multiraft_runtime.go:246-254` single-member bootstrap). This blocks |
There was a problem hiding this comment.
Narrow the bootstrap gap to multi-group configs
This overstates the missing bootstrap support: resolveBootstrapServers only rejects --raftBootstrapMembers when len(groups) != 1, and main_bootstrap_test.go has a single group fixed members case with n1,n2, so a single Raft group can already be bootstrapped with voters on multiple nodes from flags and then passed to buildRuntimeForGroup. As written, the roadmap misclassifies existing single-group multi-node/learner-read work as blocked on Gap 1; this should be scoped to multi-group configurations that need per-group voter sets.
Useful? React with 👍 / 👎.
| **Rough milestones:** (M1) data-balance policy (move a range's replica set from | ||
| an over-full node to an under-full one) reusing the M2 migration plane for the | ||
| actual move. (M2) compose with leader balance so the two schedulers don't | ||
| fight. **Depends-on:** M2 migration plane (PR #945) for the data-movement | ||
| mechanism, **and** Gap 1 (multi-node bootstrap) for somewhere to move replicas |
There was a problem hiding this comment.
Separate replica placement from range migration
For the common multi-group topology where each group is bootstrapped with the same voter set, reusing the M2 range-migration plane only changes which Raft group owns the key range; it does not move any replicas off an over-full node because the route model points at a group_id while node placement is controlled by Raft membership. A region-balance design needs an explicit target-group placement constraint or membership-change step, otherwise it can claim to fix per-node data skew while merely moving data between groups colocated on the same nodes.
Useful? React with 👍 / 👎.
|
@codex review Addressed the current-head roadmap review comments.
Verification:
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ee8c13622b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| 8. **Region balance scheduler** (Gap 4). After step 4 (migration plane) and | ||
| step 2 (multi-node). Complement to step 3's leader balance. |
There was a problem hiding this comment.
Add the missing replica-placement prerequisite
This sequencing still unblocks the region-balance scheduler after only the M2 migration plane and multi-node bootstrap, but §3 Gap 4 says the scheduler also needs a replica-placement / membership-change design to create or reshape groups when existing target groups share the same voter set. In that common topology, following this order would ship a scheduler that can move range ownership between groups but still cannot move any replicas off an over-full node, so the rollout list should include that prerequisite here or split it into a prior step.
Useful? React with 👍 / 👎.
| (`main.go:1606-1620`, listener at `:1613`). But a *group whose voter set | ||
| spans more than one node is not deployable from the startup flags*: | ||
| `resolveBootstrapServers` rejects `--raftBootstrapMembers` whenever | ||
| `len(groups) != 1` (`main.go:746`, `ErrBootstrapMembersRequireSingleGroup` |
There was a problem hiding this comment.
Scope bootstrap gap to multi-group configs
resolveBootstrapServers only rejects --raftBootstrapMembers when len(groups) != 1; the single-group path can already bootstrap a multi-voter group from startup flags, which this doc later acknowledges in Gap 1. As written, this current-envelope bullet says no group can span nodes at startup at all, so readers will treat existing single-group multi-voter deployments as unsupported; scope this claim to multi-group configurations.
Useful? React with 👍 / 👎.
| 1. **HLC per-group ceiling renewal fix** (TSO doc §6 / M1). Smallest correct | ||
| change; closes the cross-group monotonicity gap (§1.5) *before* the | ||
| topology that exposes it exists. Land first so multi-node groups are safe | ||
| on arrival. |
There was a problem hiding this comment.
Gate multi-node safety on the TSO decision
This says the per-group HLC renewal fix makes multi-node groups safe on arrival, but §5/OQ-1 and the later step 11 text conclude that the fix only gives per-node monotonicity and that a TSO/single-oracle bridge is required before cross-group transactions led by different nodes. If an implementer follows steps 1–2 as sufficient, the existing cross-group transaction path can issue commitTS values from different HLC instances, so this step should be narrowed to the renewal gap or explicitly gate enabling that topology on the TSO/bridge decision.
Useful? React with 👍 / 👎.
|
|
||
| Status: Proposed | ||
| Author: bootjp | ||
| Date: 2026-06-12 |
There was a problem hiding this comment.
docs/design/README.md requires a new design doc's header date and filename propose date to match the author date of the first commit that added the file. This file is added by c6018d4 with AuthorDate June 23, 2026, but the header says June 12 (and the filename repeats that), so the standard git log --follow --diff-filter=A ... | tail -1 check will disagree with the doc convention from day one.
Useful? React with 👍 / 👎.
| 2. **Operator-driven in-place expansion.** For clusters that can tolerate a | ||
| controlled maintenance window, upgrade all binaries first, verify capability | ||
| convergence, then add voters / learners one group at a time with live | ||
| `AddVoter` / `AddLearner` and keep the old single-voter leader serving until | ||
| the new voter set catches up. Do not permit a flag-only restart to reinterpret |
There was a problem hiding this comment.
Use learners for in-place replica expansion
For expanding an existing single-voter group with a fresh node, allowing direct AddVoter here is operationally unsafe: the learner runbook explains that AddLearner keeps the voter quorum unchanged during catch-up, while AddVoter immediately makes the new node count toward quorum and can stall commits until it catches up. The guardrail should prescribe AddLearner → verify catch-up → PromoteLearner for new replicas, not AddVoter / AddLearner interchangeably.
Useful? React with 👍 / 👎.
|
@codex review Addressed the current rollout-guardrail review items:
Validation: git diff --check. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 360abdc0ce
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| load-bearing for one node leading multiple groups; before enabling the first | ||
| cross-group transaction whose participating groups are led by different | ||
| nodes, the roadmap must either pull the dedicated TSO group forward or land a | ||
| narrower single-oracle bridge that pins cross-group timestamp allocation to |
There was a problem hiding this comment.
Gate cross-group txns by coordinator, not leader placement
The trigger is too narrow for the current request path: ShardedCoordinator.Dispatch allocates StartTS and resolveTxnCommitTS before each group's LeaderProxy.Commit forwards the already-stamped pb.Requests, and Internal.Forward preserves non-zero timestamps. In a multi-node deployment, two clients can therefore issue cross-group commit timestamps from different ingress/coordinator nodes even when the participating groups are all led by the same node, so gating only on groups “led by different nodes” leaves that follower-forwarded case outside the TSO/single-oracle guardrail.
Useful? React with 👍 / 👎.
| What is the supported path — operator-driven `AddVoter` expansion of an | ||
| existing single-voter group, blue/green with a dual-write proxy |
There was a problem hiding this comment.
Keep live expansion on learner promotion
This OQ reintroduces direct live AddVoter as a supported in-place path even though §4.1 now says live expansion must add a learner, wait for catch-up, then promote, reserving direct AddVoter for bootstrap/offline fully caught-up peers. Leaving PR #955 to choose “operator-driven AddVoter expansion” makes the guardrail ambiguous and can lead implementers back to the unsafe path this revision was meant to exclude; this option should be phrased as learner-based expansion or explicitly rejected.
Useful? React with 👍 / 👎.
| Both calls go through the same `c.clock`. The apply-time OCC/ownership check | ||
| then compares these timestamps against stored `CommitTS` values (the | ||
| Composed-1 guard, `docs/design/2026_05_29_partial_composed1_cross_group_commit_guard.md` | ||
| §4.2(a)/§4.4; FSM `latest > startTS` write-conflict check). OCC | ||
| serializability depends on those `commitTS` values being **mutually | ||
| comparable** across all participating groups. |
There was a problem hiding this comment.
Include read-only shard validation in txn guardrail
For transactions that read a shard they do not mutate, the current path uses validateReadOnlyShards, whose own comment notes a TOCTOU window between the linearizable barrier and LatestCommitTS check where a concurrent write can go undetected. A TSO/single-oracle bridge does not close that SSI gap, so the roadmap should not describe commit-ts comparability as the remaining OCC serializability requirement for cross-group txns without also gating those read-only-shard shapes or adding the dedicated read-validate FSM phase.
Useful? React with 👍 / 👎.
| - **Range split — distribute a range across groups.** Same-group split | ||
| shipped in M1 (`distribution/`). Cross-group migration (the part that | ||
| actually relocates data and reduces per-node volume) is **PR #945** | ||
| (`docs/design/2026_06_11_proposed_hotspot_split_milestone2_migration.md`, |
There was a problem hiding this comment.
Don't promise M2 reduces per-node data alone
This overstates the scaling effect of the M2 migration plane: moving ownership between Raft groups only reduces per-node data when the target group has a different replica placement. Today's multi-group process hosts every group locally, and the roadmap itself later notes that M2 does not move replicas off an over-full node when groups share the same voter set, so this section should make Gap 1/region-placement a prerequisite for actual per-node volume reduction rather than attributing it to M2 alone.
Useful? React with 👍 / 👎.
| 1. **Capability-gated admin operations.** `raftadmin` / coordinator admin RPCs | ||
| that enable multi-node bootstrap, leader transfers, follower reads, | ||
| migration/import, write fences, or a cross-group timestamp bridge must first | ||
| observe that every target voter advertises the matching capability. Mixed | ||
| binary clusters run in compatibility mode with these features disabled. |
There was a problem hiding this comment.
Check capability on learners before expansion
For the live in-place expansion flow immediately below, the new replica is first added as a learner, so checking only “target voter” capabilities can miss the joiner until after it has already been admitted. To keep mixed-version clusters in compatibility mode, the guardrail should require capability convergence for every target replica/server involved in AddLearner and promotion, not just current or future voters.
Useful? React with 👍 / 👎.
|
@codex review Addressed the new P1 guardrail findings:
Validation: git diff --check; commit hook golangci-lint --config=.golangci.yaml run --fix. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 87572881ef
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| - `startTS` via `nextStartTS` (`kv/sharded_coordinator.go:1429`): | ||
| `Observe(maxLatestCommitTS(keys))` then `c.clock.NextFenced()`. The | ||
| `maxLatestCommitTS` floor is a per-key read against the store — it pins | ||
| `startTS` above the latest commit on *the keys this txn touches*, nothing | ||
| more. |
There was a problem hiding this comment.
Limit startTS floor claim to mutation keys
For transactions with ReadKeys that are not also mutation Elems, this overstates what the code observes: Dispatch calls nextStartTS(ctx, reqs.Elems) and maxLatestCommitTS only appends each Elem.Key before calling MaxLatestCommitTS, so read-only participant keys and adapter pre-dispatch reads are not included in this floor. Leaving the roadmap saying the floor covers the keys the txn touches/read can mislead the TSO/read-validation follow-ups into relying on a guarantee that only exists for mutation keys.
Useful? React with 👍 / 👎.
|
|
||
| Status: Proposed | ||
| Author: bootjp | ||
| Date: 2026-06-12 |
There was a problem hiding this comment.
Use the add-commit date in the design header
The design-doc README requires both the filename date and header Date to match the author date of the first commit that added the file. This new file is added by commit 1e510336 on 2026-06-23, but the path/header use 2026-06-12; tooling or review processes following the documented convention will misidentify the proposal's historical anchor. Please rename the file and update this header date to 2026-06-23.
Useful? React with 👍 / 👎.
| - **keyviz** — the per-route load sampler is wired and allocation-free on the | ||
| hot path (`observeMutation`, `kv/sharded_coordinator.go:1841-1846`), with proposed extensions | ||
| for cluster fan-out (`2026_04_27_proposed_keyviz_cluster_fanout.md`), | ||
| subrange sampling (`2026_05_25_proposed_keyviz_subrange_sampling.md`), | ||
| hot-key top-K (`2026_05_28_proposed_keyviz_hot_key_topk.md`), and per-cell | ||
| conflict (implemented). It is the detection signal M3 reuses. |
There was a problem hiding this comment.
Add the missing read-path caveat to keyviz reuse
When M3 or later balancing consumes keyviz as the hotspot signal, this summary omits an important limitation: the in-tree coordinator comment says adapter-direct Redis/DynamoDB/S3 reads that hit MVCCStore.GetAt bypass keyviz, so read-heavy hotspots on those normal adapter paths will be invisible unless read-path sampling is added or listed as a dependency. Without that caveat, the roadmap can incorrectly treat keyviz-driven automation as covering both read and write load.
Useful? React with 👍 / 👎.
| across processes. (M3) Jepsen multi-group multi-node workload. **Depends-on:** | ||
| nothing (it is the root unblocker). *A companion proposal is in review as **PR #955** |
There was a problem hiding this comment.
List the HLC renewal fix as a bootstrap dependency
This section says Gap 1 has no dependencies, but the roadmap later makes the per-group HLC ceiling renewal fix the step that must land before multi-node multi-group bootstrap because default-group-only renewal is unsafe once replicas move across nodes. If PR #955 follows this dependency list rather than the later sequence, it can enable the topology that exposes stale ceilings without the guard; please either scope “depends-on nothing” to design-doc authoring only or list the HLC renewal fix as a rollout prerequisite.
Useful? React with 👍 / 👎.
| pull the dedicated TSO group forward or land a narrower bridge that | ||
| allocates all cross-group `commitTS` values from one designated oracle. The |
There was a problem hiding this comment.
Require the bridge to allocate startTS too
This guardrail narrows the bridge to allocating only commitTS, but in the multi-coordinator case StartTS is also currently allocated before forwarding by Dispatch/nextStartTS. If a single-oracle bridge leaves startTS on each node's local HLC, a transaction can start on one coordinator with a snapshot timestamp below a commit just issued by another coordinator and then read stale data; the bridge needs to pin both startTS and commitTS to the oracle, or the roadmap should explicitly keep those shapes disabled.
Useful? React with 👍 / 👎.
|
@codex review Addressed the current scaling-roadmap followups:
Validation: |
|
Codex Review: Didn't find any major issues. Hooray! Reviewed commit: ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
What
Adds
docs/design/2026_06_12_proposed_scaling_roadmap.md— a doc-only scaling roadmap that surveys what exists, what is proposed, and what is missing for elastickv to scale further (more data, more throughput, more nodes).This is a roadmap / implementation-plan doc, which
docs/design/README.mdexplicitly permits ("Concrete implementation plans"). No code changes.Structure
main.Verified-on-tree claims (selection)
RunHLCLeaseRenewalproposes the ceiling to the default group only (kv/sharded_coordinator.go:1915) — the TSO doc's near-term "all led groups" fix is not yet implemented.--raftBootstrapMembersis rejected for multi-group (main.go:746,len(groups) != 1); each group bootstraps single-member (multiraft_runtime.go:246-254) — the multi-node multi-group gap PR docs(design): propose leader balance scheduler #953 §1.1a (PR0) documents.store/lsm_store.go:273-279); shared-cache is a TODO comment (:117-120), not a design.kv/fsm.go:123-124).AddLearner/PromoteLearner, raftadmin CLI,--raftJoinAsLearner) even though the doc filename still readsproposed— but follower-served reads (its consumer) have no design.Notes
2026_06_12_proposed_multinode_multigroup_bootstrap.md, by name (it is the authoritative spec for Gap 1).docs/hotspot-split-m2-proposal, docs(design): propose hotspot split M3 — automation #951design/hotspot-split-m3-automation, docs(design): propose leader balance scheduler #953design/leader-balance-scheduler).