redis: handle EXEC transient errors #973
📝 Walkthrough
Flips DynamoDB one-phase transaction deduplication from opt-in (
Changes
DynamoDB One-Phase Dedup Default-On
Redis NOTLEADER Error Promotion
Sequence Diagram(s)
sequenceDiagram
participant Client
participant proxyTransactionToLeader
participant handleProxyTxnError
participant handleProxyTxnCommandError
participant writeRedisError
Client->>proxyTransactionToLeader: MULTI/EXEC pipeline
proxyTransactionToLeader->>proxyTransactionToLeader: execute pipelined commands
proxyTransactionToLeader->>handleProxyTxnError: check pipeline-level error
alt transient leader error
handleProxyTxnError->>writeRedisError: write NOTLEADER reply (no double-prefix)
writeRedisError-->>Client: -NOTLEADER <msg>
else TxFailedErr
handleProxyTxnError-->>Client: null-array
else no terminal error
proxyTransactionToLeader->>handleProxyTxnCommandError: scan per-command result handles
alt per-command transient leader error
handleProxyTxnCommandError->>writeRedisError: promote to top-level NOTLEADER
writeRedisError-->>Client: -NOTLEADER <msg>
else ordinary command error
handleProxyTxnCommandError-->>proxyTransactionToLeader: stop=false
proxyTransactionToLeader-->>Client: EXEC array response
end
end
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Code Review
This pull request improves error handling for Redis transactions by promoting transient leadership failures returned by proxied EXEC targets to top-level EXEC errors, preventing clients from treating leadership churn as command data. It also prevents double-prefixing of NOTLEADER errors and updates the Jepsen workload to correctly parse and label transaction errors. Feedback on the changes suggests removing an unreachable nil guard (cmd == nil) in handleProxyTxnCommandError to avoid dead defensive code.
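One way to avoid the double prefix mentioned above is to add the `NOTLEADER` code only when the message does not already carry it. The sketch below assumes that approach; `notLeaderLine` and `notLeaderPrefix` are hypothetical names, and the real `writeRedisError` may work differently. The only fixed fact relied on is the RESP simple-error wire format (`-` followed by the message and CRLF).

```go
package main

import (
	"fmt"
	"strings"
)

// notLeaderPrefix is the error code clients are expected to match on.
const notLeaderPrefix = "NOTLEADER "

// notLeaderLine renders msg as a RESP simple-error line.
// The prefix is added only when missing, so an upstream error that already
// starts with "NOTLEADER " is not turned into "NOTLEADER NOTLEADER ...".
func notLeaderLine(msg string) string {
	if !strings.HasPrefix(msg, notLeaderPrefix) {
		msg = notLeaderPrefix + msg
	}
	return "-" + msg + "\r\n"
}

func main() {
	fmt.Printf("%q\n", notLeaderLine("leader changed"))
	fmt.Printf("%q\n", notLeaderLine("NOTLEADER leader changed"))
}
```

Both calls in `main` produce the same wire line, which is the property a prefix test would check.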
for _, cmd := range cmds {
	if cmd == nil {
		continue
	}
	err := cmd.Err()
According to the repository's general rules, we should avoid adding dead defensive code, such as unreachable nil guards, if upstream checks or callers already guarantee that the variables are non-nil. Since execTxPipeline always populates cmds with non-nil *redis.Cmd instances, the cmd == nil check is unreachable and can be safely removed.
for _, cmd := range cmds {
	err := cmd.Err()
References
- Avoid adding dead defensive code, such as unreachable nil guards, if upstream checks or callers already guarantee that the variables are non-nil.
Addressed. Removed the unreachable cmd == nil guard from handleProxyTxnCommandError.
Validation:
go test ./adapter -run 'Test(IsTransientLeaderRedisError|WriteRedisError|HandleProxyTxnError|HandleProxyTxnCommandError)$'
git diff --check
|
Changed DynamoDB single-item one-phase dedup to default-on, and
Validation:
@codex review |
|
Addressed Gemini's inline finding, and
Validation:
@codex review |
|
Switched the PR to ready for review. The latest HEAD is
@codex review |
🧹 Nitpick comments (1)
adapter/dynamodb_onephase_dedup_test.go (1)
250-266: ⚡ Quick win
Consider adding test for option-override-to-disable.
The new tests cover default-on, env opt-out, and option-override-to-enable. For completeness, consider adding a test case where the env var enables dedup (or is unset) but the option disables it:
func TestItemWriteDedup_OptionOverridesEnvToDisable(t *testing.T) {
	t.Setenv("ELASTICKV_DYNAMODB_ONEPHASE_DEDUP", "1")
	server := NewDynamoDBServer(nil, store.NewMVCCStore(), newDedupTestCoordinator(store.NewMVCCStore(), 0, false), WithDynamoOnePhaseTxnDedup(false))
	require.False(t, server.onePhaseTxnDedup)
}
This validates that the option can override in both directions, not just enable when env disables.
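The precedence that these tests exercise can be sketched as a small resolver. This is an illustration under assumptions, not the adapter's actual code: `dedupEnabled` is a hypothetical helper, dedup is assumed to default to on, only the environment value "0" is assumed to opt out, and an explicitly supplied option is assumed to always win over the environment.

```go
package main

import "fmt"

// dedupEnabled resolves the effective one-phase dedup setting.
// envValue is the raw value of ELASTICKV_DYNAMODB_ONEPHASE_DEDUP ("" when unset).
// option is nil when the caller passed no explicit option.
func dedupEnabled(envValue string, option *bool) bool {
	if option != nil {
		// An explicit option overrides the environment in both directions.
		return *option
	}
	// Default-on: only an explicit "0" opts out (assumed parsing rule).
	return envValue != "0"
}

func main() {
	on, off := true, false
	fmt.Println(dedupEnabled("", nil))   // default
	fmt.Println(dedupEnabled("0", nil))  // env opt-out
	fmt.Println(dedupEnabled("0", &on))  // option enables despite env
	fmt.Println(dedupEnabled("1", &off)) // option disables despite env
}
```

The last call corresponds to the suggested TestItemWriteDedup_OptionOverridesEnvToDisable case: the environment asks for dedup, the option turns it off, and the option wins.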
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@adapter/dynamodb_onephase_dedup_test.go` around lines 250 - 266, Add a new test function called TestItemWriteDedup_OptionOverridesEnvToDisable to complete the test coverage for bidirectional option override behavior. In this test, set the environment variable ELASTICKV_DYNAMODB_ONEPHASE_DEDUP to "1" to enable dedup, then create a NewDynamoDBServer with the WithDynamoOnePhaseTxnDedup(false) option to override and disable it. Assert that server.onePhaseTxnDedup is False, validating that the option parameter can override the environment variable to disable dedup, not just enable it.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: cefa2c14-ec60-47bb-a4c8-fbf602cfa329
📒 Files selected for processing (10)
.github/workflows/jepsen-test-scheduled-dedup.yml
.github/workflows/jepsen-test-scheduled.yml
adapter/dynamodb.go
adapter/dynamodb_item_write.go
adapter/dynamodb_onephase_dedup_test.go
adapter/redis.go
adapter/redis_error_prefix_test.go
docs/design/2026_06_03_partial_dynamodb_onephase_dedup.md
jepsen/src/elastickv/redis_workload.clj
jepsen/test/elastickv/redis_workload_test.clj
|
Codex Review: Didn't find any major issues. More of your lovely PRs please.
Reviewed commit:
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
|
Addressed CodeRabbit's nitpick, and
Validation:
@codex review |
|
Codex Review: Didn't find any major issues. Delightful!
Reviewed commit:
Summary
:info instead of parsing them as read values
ELASTICKV_DYNAMODB_ONEPHASE_DEDUP=0 / WithDynamoOnePhaseTxnDedup(false) as rollback switches
Root Cause
The scheduled Redis Jepsen run hit leader churn during MULTI/EXEC. A transient leadership error could surface as an EXEC array element, and the Jepsen client then passed that exception object into parse-read, crashing the worker instead of recording an operation outcome.
DynamoDB one-phase dedup was still opt-in even though the probe-aware FSM reader has already shipped everywhere. That left the safe retry path behind an unnecessary adapter option.
Validation
lein test
go test ./adapter -run 'Test(IsTransientLeaderRedisError|WriteRedisError|HandleProxyTxnError|HandleProxyTxnCommandError)$'
go test ./adapter -run 'TestRedis_MultiExec|TestRedisTxnMULTIEXECRetriesOnCoordinatorConflict|TestExecDedup'
go test ./adapter -run 'TestItemWriteDedup'
go test ./adapter -run 'Test(Dynamo|ItemWriteDedup)'
git diff --check
Note: go test ./adapter was started as a broader smoke test but produced no output for over 5 minutes, so it was interrupted; the targeted DynamoDB/default-on coverage above completed successfully.
Summary by CodeRabbit
New Features
Tests