Collapse in-flight reconnects by coverage to close the SequentialQueue dedup gap#94210
Collapse in-flight reconnects by coverage to close the SequentialQueue dedup gap#94210adhorodyski wants to merge 7 commits into
Conversation
…e dedup gap Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Codecov Report❌ Looks like you've decreased code coverage for some files. Please write tests to increase, or at least maintain, the existing level of code coverage. See our documentation here for how to interpret this table.
|
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 00775b5dfb
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
…aller-specific side effects always run Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…connectApp on the coverage resolver Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fc7717e6dd
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
|
@mkhutornyi @blimpich One of you needs to copy/paste the Reviewer Checklist from here into a new comment on this PR and complete it. If you have the K2 extension, you can simply click: [this button] |
|
@codex review |
|
@MelvinBot review |
ReviewVerdict: The approach is sound and well-tested. I traced the two open codex P2s and the core logic — the design holds up. One of the P2s is already fixed; the other is not a real bug. Details below. The unaddressed codex P2 (full-reconnect cutoff freshness) is a false alarmCodex flagged that dropping a newer full reconnect could let an older in-flight one write a stale
The only residual edge is server cutoff set ahead of a skewed client clock — bounded to one extra, self-correcting reconnect, and it predates this PR (it's exactly what the Core logic checks out
Notes (non-blocking)
Investigation methodSpawned two researchers against Next Steps: Reply with |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fc7717e6dd
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fc7717e6dd
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
…narrow readUpdateIDFrom return type Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Reviewer Checklist
Screenshots/VideosAndroid: HybridAppandroid.movAndroid: mWeb ChromeiOS: HybridAppios.moviOS: mWeb SafariMacOS: Chrome / Safariweb.mov |
|
🤖 Review: looks correct and well-tested. The coverage model is sound, the wiring is right, and the test matrix covers the meaningful cases. One safe-but-worth-noting behavioral change vs. Core logic ✅
One behavioral change to be aware of (safe)This resolver only ever returns Minor notes
✅ All CI green (typecheck, ESLint, tests, perf). The Next Steps: Reply with |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 016db62b15
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| */ | ||
| function resolveReconnectDuplicationConflictAction(persistedRequests: AnyRequest[], ongoingRequest: AnyRequest | null, incomingRequest: AnyRequest): ConflictActionData { | ||
| const incomingCoverage = reconnectCoverageFrom(incomingRequest); | ||
| const isCovered = [ongoingRequest, ...persistedRequests].some((live) => isReconnectFamilyRequest(live) && reconnectCoverageFrom(live) <= incomingCoverage); |
There was a problem hiding this comment.
Preserve reconnect success data when dropping duplicates
When a full ReconnectApp was created before the sidebar loaded and a duplicate full reconnect is requested after isSidebarLoaded becomes true, this coverage check drops the later request. That later request is the one whose getOnyxDataForOpenOrReconnect(..., isSidebarLoaded) contains the public-room successData merges; the earlier queued/in-flight request does not, so its full reconnect response can remove public rooms instead of preserving them. Please either include/merge the reconnect Onyx side effects in the dedupe decision or keep the later request when those side effects differ.
Useful? React with 👍 / 👎.
|
🤖 Regression scan: no data-loss regressions. One confirmed minor over-fetch regression, plus one edge I checked and cleared. ✅ No correctness / data-loss regressionThe dedup only ever drops a reconnect that is a coverage subset of a request that is live (in-flight or queued). The covering request fetches the same-or-wider range and, if it fails, is retried by the queue (durable ongoing request). So a dropped duplicate can never cause a missed backfill. The new behavior where an incoming
|
Not consistent but sometimes this test is failing on iOS. Loading bar doesn't disappear. Also happening on production bug.mov |
mkhutornyi
left a comment
There was a problem hiding this comment.
Please check above comments
|
Hm I Think this should also remove ones skipped by the resolver (as Melvin mentioned it's safe but better to delete them), not sure why I missed this one scenario. We have control there so there should be no over-fetch either, this needs correction. |
@mountiny
Explanation of Change
The SequentialQueue sends one request at a time and parks the rest in a waiting line. The old duplicate-check only looked at the waiting line, never at the request already on the wire. So if a
ReconnectAppwas mid-send and another identical one came in, the second one sailed through. TheNetworkStatecomment even claimed the opposite ("Duplicate reconnectApp() calls are safe, SQ deduplicates them"), which was wrong in exactly that window.This PR adds a resolver,
resolveReconnectDuplicationConflictAction, that also looks at the in-flight request and compares by coverage: how far back a reconnect re-fetches (updateIDFrom, where a full reconnect re-fetches everything). It has three outcomes:ReconnectAppalready covered by one in flight or queued is dropped (noAction).push) and runs after, so we never re-fetch too little.OpenAppcounts as covering an incoming reconnect, sinceOpenAppre-fetches everything.Concrete example: you are on a flaky connection that flips offline then online twice in a second. The first online edge fires a
ReconnectAppand it starts sending. Before it finishes, the second edge fires another identicalReconnectApp. Before this PR, both go out, two large downloads. After, the second one sees the first is already fetching everything it would, and drops itself. One download. If the second reconnect were wider instead (a full reconnect arriving while a small incremental one is in flight), it survives and runs next, so nothing is missed.Only the
RECONNECT_APPcall site uses the new resolver.OpenAppkeeps the behavior it had before this PR: it dedupes through the genericwriteWithNoDuplicatesConflictAction(a queue-only replace where the latest one wins), because itssuccessDatacarries caller-specific preservation writes that coverage cannot see, so anOpenAppthat lands mid-flight still runs and nothing is silently dropped. No other command is affected.This is a safety net, not the main throttle. The bulk of the storm reduction lives in two other PRs that cut the cause: "Stop repeated ReconnectApp calls when the device clock is behind the server" (#93426) and "coalesce redundant GetMissingOnyxMessages while a fetch is in flight" (#93948). This PR adds the queue-level correctness on top: if a duplicate reconnect still slips through, the queue now catches it and drops it. It also fixes the wrong
NetworkStatecomment and documents the resolver inSEQUENTIAL_QUEUE.md.Fixed Issues
$ #92541
PROPOSAL:
Tests
RequestConflictUtilsTest.ts. It covers the coverage-table cases: full/full drops, full/incremental drops, incremental/incremental compared by coverage, incremental/full keeps, andOpenAppduring an in-flight reconnect always runs.SequentialQueueTest.ts. It checks the full flow: a duplicateReconnectAppis dropped so exactly one request goes on the wire; an in-flight incremental plus an incoming full means the full one runs after; a different command is not collapsed; a waiting full is not overwritten by an incoming incremental;IS_LOADING_REPORT_DATAstill settles tofalseafter a drop; a failed in-flight reconnect is retried with no data loss; and an incomingOpenAppduring an in-flight reconnect still runs.Offline tests
N/A
QA Steps
OpenAppruns, the home screen loads fully, and no loading state is stuck.PR Author Checklist
### Fixed Issuessection aboveTestssectionOffline stepssectionQA stepssectiontoggleReportand notonIconClick)Avatar, I verified the components usingAvatarare working as expected)StyleUtils.getBackgroundAndBorderStyle(theme.componentBG))npm run compress-svg)Avataris modified, I verified thatAvataris working as expected in all cases)Designlabel and/or tagged@Expensify/designso the design team can review the changes.mainbranch was merged into this PR after a review, I tested again and verified the outcome was still expected according to theTeststeps.Screenshots/Videos
Android: Native
Android: mWeb Chrome
iOS: Native
iOS: mWeb Safari
MacOS: Chrome / Safari