Merged
248 changes: 0 additions & 248 deletions .github/workflows/jepsen-test-scheduled-dedup.yml

This file was deleted.

10 changes: 0 additions & 10 deletions .github/workflows/jepsen-test-scheduled.yml
@@ -39,16 +39,6 @@ jobs:
runs-on: ubuntu-latest
env:
GOCACHE: /tmp/go-build
# Explicit dedup-OFF control baseline. The Redis and DynamoDB adapter
# onePhaseTxnDedup flags are default-on, so this workflow is preserved
# as legacy-path coverage. Pair with the dedup-ON workflow
# (.github/workflows/jepsen-test-scheduled-dedup.yml) which pins both
# env vars to 1. Retirement of this workflow is a follow-up after 30
# days of post-flip data; until then, do NOT remove these env vars —
# without them the two workflows would exercise the same path under
# the new defaults.
ELASTICKV_REDIS_ONEPHASE_DEDUP: "0"
ELASTICKV_DYNAMODB_ONEPHASE_DEDUP: "0"
steps:
- uses: actions/checkout@v6
with:
9 changes: 4 additions & 5 deletions docs/design/2026_05_21_proposed_txn_secondary_idempotency.md
@@ -542,11 +542,10 @@ preserves availability and adds correctness.
`cmd/server/demo.go` with `ELASTICKV_REDIS_ONEPHASE_DEDUP=1`.
- **Scheduled Jepsen run criterion.** 7 consecutive days without
`:duplicate-elements` / `:G-single-item-realtime` in the dedup-mode
workflow (`.github/workflows/jepsen-test-scheduled-dedup.yml`,
daily at 03:17 UTC). The general scheduled workflow
(`.github/workflows/jepsen-test-scheduled.yml`, every 6 h) continues to run *without*
the gate so the legacy path stays covered — both must stay green
for option-2 to be safe to default-on.
workflow. During rollout this was a dedicated daily workflow; after the
default-on soak period, the dedicated workflow was retired and the general
scheduled workflow (`.github/workflows/jepsen-test-scheduled.yml`, every 6 h)
now covers the default dedup-on path.
- **Workflow scope rationale.** The dedup-mode workflow exercises only
the Redis workload. The dedup feature ships behind the Redis
adapter's `onePhaseTxnDedup` flag (RPUSH/LPUSH via
20 changes: 9 additions & 11 deletions docs/design/2026_06_03_partial_dynamodb_onephase_dedup.md
@@ -410,15 +410,11 @@ every replica applying the same log entry.
timeouts (`cmd/server/demo.go`) so leadership flaps during the DynamoDB
workload. The default path has DynamoDB dedup enabled; set
`ELASTICKV_DYNAMODB_ONEPHASE_DEDUP=0` only to reproduce the legacy path.
- **CI — LANDED.** The DynamoDB list-append workload is added to the dedup-mode
workflow (`.github/workflows/jepsen-test-scheduled-dedup.yml`) with
`ELASTICKV_DYNAMODB_ONEPHASE_DEDUP=1` pinned at the job env (read by
`adapter.NewDynamoDBServer` in the demo cluster), a fail-closed gate
assertion before the listeners come up (mirroring the Redis assertion), and
the launch step now also waits on the dynamo listeners (63801-63803). The
general workflow (`.github/workflows/jepsen-test-scheduled.yml`) explicitly
sets `ELASTICKV_DYNAMODB_ONEPHASE_DEDUP=0` so the legacy path stays covered
as a control baseline after default-on.
- **CI — LANDED.** The DynamoDB list-append workload was added to the
dedup-mode workflow during rollout. After the default-on soak period, the
dedicated dedup workflow was retired; the general scheduled workflow
(`.github/workflows/jepsen-test-scheduled.yml`) now runs the default
`DynamoDBServer.onePhaseTxnDedup` path without an env-var opt-out.
- Criterion to default-on: 7 consecutive days without `:duplicate-elements` in
the dedup-mode DynamoDB workload, both workflows green. **Satisfied; this PR
flips `DynamoDBServer.onePhaseTxnDedup`'s default and the env-var sense to
@@ -473,8 +469,10 @@ change (the probe already exists), no proto change, no new store primitive.
- (2026-06-18) Default-on follow-up: `DynamoDBServer.onePhaseTxnDedup` now
defaults on because the probe-aware FSM reader is everywhere. Operators can
still set `ELASTICKV_DYNAMODB_ONEPHASE_DEDUP=0` or
`WithDynamoOnePhaseTxnDedup(false)` for rollback; the general scheduled
Jepsen workflow pins that opt-out to keep legacy-path coverage.
`WithDynamoOnePhaseTxnDedup(false)` for rollback.
- (2026-06-26) Post-flip CI cleanup: retired the legacy-path scheduled control
by removing the `ELASTICKV_DYNAMODB_ONEPHASE_DEDUP=0` opt-out from the general
scheduled Jepsen workflow and deleting the dedicated dedup-mode workflow.
- (2026-06-03, PR #920 round-1) **Leader-only dedup guard added** per codex P1:
the adapter-local `commitTS` allocation is only safe on the leader, so the
dedup path is gated on `d.coordinator.IsLeader()` (+ `NextFenced` ceiling
34 changes: 20 additions & 14 deletions docs/design/2026_06_10_proposed_redis_onephase_dedup_default_on.md
@@ -35,30 +35,30 @@ The parent design landed an FSM-side dedup probe and an adapter-side
write-set reuse path keyed on a stale `commit_ts` ridden through
`OperationGroup.PrevCommitTS` into a V2 `TxnMeta`. The probe is always
present; emission of `prev_commit_ts != 0` is gated by
`RedisServer.onePhaseTxnDedup` (constructor: `WithOnePhaseTxnDedup`, env:
`ELASTICKV_REDIS_ONEPHASE_DEDUP=1`). The gate stays default-off until
the cluster has uniformly upgraded — the parent's R5 "ship the reader
before the writer" sequencing.
`RedisServer.onePhaseTxnDedup` (constructor: `WithOnePhaseTxnDedup`, env
rollback: `ELASTICKV_REDIS_ONEPHASE_DEDUP=0`). The gate now defaults on
because the cluster has uniformly upgraded — the parent's R5 "ship the reader
before the writer" sequencing is satisfied.
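
A minimal Go sketch of the default-on gate described above, for illustration only. The helper name `dedupEnabled` and the injected `lookup` function are hypothetical and are not taken from the elastickv source. The sketch also assumes that only the literal value `0` opts out and that any other value leaves dedup on; the real adapter may parse the variable differently.

```go
package main

import (
	"fmt"
	"os"
)

// envKey is the rollback variable named in the design doc.
const envKey = "ELASTICKV_REDIS_ONEPHASE_DEDUP"

// dedupEnabled is a hypothetical helper (not the project's API).
// It models a default-on gate: an unset variable means "on", and
// only an explicit "0" switches the dedup path off (assumption).
func dedupEnabled(lookup func(string) (string, bool)) bool {
	v, ok := lookup(envKey)
	if !ok {
		return true // unset: the new default applies
	}
	return v != "0" // explicit opt-out for rollback
}

func main() {
	// os.LookupEnv distinguishes "unset" from "set to empty".
	fmt.Println("one-phase dedup enabled:", dedupEnabled(os.LookupEnv))
}
```

Passing the lookup function in, rather than calling `os.LookupEnv` directly, keeps the gate testable without mutating the process environment.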

Two Jepsen workflows run the same stress profile (`--time-limit 150
--rate 10 --concurrency 8 --key-count 16 --max-writes-per-key 250
--max-txn-length 4`) every day against `main`:
During rollout, two Jepsen workflows ran the same stress profile
(`--time-limit 150 --rate 10 --concurrency 8 --key-count 16
--max-writes-per-key 250 --max-txn-length 4`) every day against `main`:

| Workflow | Env | Purpose |
|---|---|---|
| [`jepsen-test-scheduled.yml`][off] | `ELASTICKV_REDIS_ONEPHASE_DEDUP` unset (off) | Legacy-path baseline — expected to surface the parent's anomaly class until default-on lands. |
| [`jepsen-test-scheduled-dedup.yml`][on] | `ELASTICKV_REDIS_ONEPHASE_DEDUP=1` (on) | M4 validation — must stay green to authorize default-on. |
| [`jepsen-test-scheduled.yml`][scheduled] | `ELASTICKV_REDIS_ONEPHASE_DEDUP=0` during the temporary control window | Legacy-path baseline after the default flip. Retired on 2026-06-26. |
| `jepsen-test-scheduled-dedup.yml` | `ELASTICKV_REDIS_ONEPHASE_DEDUP=1` (on) | M4 validation to authorize default-on. Deleted on 2026-06-26 after dedup-on became the standard scheduled path. |

[off]: ../../.github/workflows/jepsen-test-scheduled.yml
[on]: ../../.github/workflows/jepsen-test-scheduled-dedup.yml
[scheduled]: ../../.github/workflows/jepsen-test-scheduled.yml

## M4 evidence

The parent design's `M4` criterion is *"7 consecutive days without
`:duplicate-elements` / `:G-single-item-realtime` in the dedup-mode
workflow."*
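
As a rough illustration of how the M4 criterion could be checked mechanically, the Go sketch below counts the trailing streak of clean daily runs. The `dailyRun` type, the function names, and the sample data are all hypothetical; the sketch assumes the slice holds exactly one run per calendar day, ordered oldest first, with no gaps.

```go
package main

import "fmt"

// dailyRun is a hypothetical record of one scheduled Jepsen run.
type dailyRun struct {
	Day       string   // illustrative label, not real run data
	Anomalies []string // anomaly tags reported by the checker
}

// blocking lists the anomaly tags named in the M4 criterion.
var blocking = map[string]bool{
	":duplicate-elements":     true,
	":G-single-item-realtime": true,
}

// trailingCleanStreak counts how many of the most recent runs,
// walking backwards from the newest, report no blocking anomaly.
func trailingCleanStreak(runs []dailyRun) int {
	streak := 0
	for i := len(runs) - 1; i >= 0; i-- {
		for _, a := range runs[i].Anomalies {
			if blocking[a] {
				return streak
			}
		}
		streak++
	}
	return streak
}

// meetsM4 reports whether the last 7 daily runs were all clean.
func meetsM4(runs []dailyRun) bool {
	return trailingCleanStreak(runs) >= 7
}

func main() {
	// Made-up history: one failing day followed by seven clean days.
	runs := []dailyRun{{Day: "day-0", Anomalies: []string{":duplicate-elements"}}}
	for i := 1; i <= 7; i++ {
		runs = append(runs, dailyRun{Day: fmt.Sprintf("day-%d", i)})
	}
	fmt.Println("M4 satisfied:", meetsM4(runs))
}
```

A single blocking anomaly resets the streak, which matches the "consecutive days" wording of the criterion.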

Dedup-mode (`jepsen-test-scheduled-dedup.yml`) run history on `main`:
Dedup-mode run history on `main` from the retired
`jepsen-test-scheduled-dedup.yml` workflow:

| Date (UTC) | Run | Conclusion |
|---|---|---|
@@ -145,7 +145,7 @@ subsequent `GET` returns `v` — it fails on the pre-fix build
### M2 — Control workflow disposition

After default-on, `jepsen-test-scheduled.yml` would silently exercise
the same path as `jepsen-test-scheduled-dedup.yml` (unset env true),
the same path as `jepsen-test-scheduled-dedup.yml` (unset env -> true),
so the two workflows would collapse to the same coverage. Two options:

| Option | Effect | Recommendation |
@@ -158,6 +158,10 @@ to `jepsen-test-scheduled.yml`'s top-level `env:` so the control retains
its meaning across the default flip. The 30-day retirement decision
becomes a follow-up issue.

Post-flip cleanup on 2026-06-26 retires that temporary control: the
`ELASTICKV_REDIS_ONEPHASE_DEDUP=0` opt-out was removed from
`jepsen-test-scheduled.yml`, and the dedicated dedup-mode workflow was deleted.

### M3 — Issue #937 closure

Update [#937](https://github.com/bootjp/elastickv/issues/937) with the
@@ -245,5 +249,7 @@ One PR, two commits:
After merge: monitor the next 2–3 daily runs of both scheduled
workflows. The dedup-mode workflow must stay green; the control
workflow may or may not surface anomalies — both outcomes are
informative. Roll back via env var (no binary revert) if anything
informative. After the post-flip soak period, the control workflow was
retired and the standard scheduled workflow now covers the dedup-on path.
Roll back via env var (no binary revert) if anything
unexpected appears.