Description
Separate metadata syncs from the main queue.
Follow-up for #491
Why do this?
Currently, whenever a queue job other than MetadataSyncJob sees an ApiInaccessibleError or RequestTimeoutError, we pause the queue. The queue can be unpaused by the following actions:
- Clicking on the "Retry" link from the client. This unpauses the queue and retries the last failed job. If it fails again, the queue will pause once again.
- Initiating a sync, either via our 5-minute background process or by clicking the refresh icon. A sync adds a MetadataSyncJob to the queue before unpausing it. Since MetadataSyncJob is the highest-priority job, it cuts to the front of the line and keeps retrying until it succeeds.
The downside to this approach is that many MetadataSyncJobs can pile up in the queue, and they always cut in line, so other jobs have to wait longer and longer before getting processed. For example, if a user clicks refresh 5 times and the background process kicks off a sync, there will be 6 MetadataSyncJobs at the front of the queue. And the longer the client fails to connect to the server, the more MetadataSyncJobs pile up.

Even when there are no network issues and the queue is not paused, other tasks kick off syncs. For instance, the reply job triggers a sync so that a conversation isn't displayed out of order for long (this can happen if a message arrives between syncs). A sync is also triggered when a user tries to open a file that no longer exists on the file system.
Also, if we want to sync with the server more often, to decrease the likelihood of journalists viewing out-of-date information when they send a reply, this means more MetadataSyncJobs cutting in line.
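The cut-in-line behaviour described above can be illustrated with Python's standard `queue.PriorityQueue`, used here as a simplified stand-in for the client's real job queue (the priority values, sequence numbers, and job names are made up for the example):

```python
import queue

# Hypothetical priorities: lower number = higher priority, mirroring how
# MetadataSyncJob (highest priority) cuts ahead of other jobs.
METADATA_SYNC_PRIORITY = 0
OTHER_JOB_PRIORITY = 10

q = queue.PriorityQueue()

# Other jobs are enqueued first...
q.put((OTHER_JOB_PRIORITY, 1, "DownloadJob"))
q.put((OTHER_JOB_PRIORITY, 2, "ReplyJob"))

# ...then five refresh clicks plus one background sync add six sync jobs,
# all of which jump the line (the second tuple element just breaks ties).
for i in range(6):
    q.put((METADATA_SYNC_PRIORITY, 3 + i, "MetadataSyncJob"))

order = [q.get()[2] for _ in range(q.qsize())]
print(order)
# All six MetadataSyncJobs drain before DownloadJob or ReplyJob.
```

This is why the other jobs starve: every new sync is sorted ahead of everything already waiting.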
What's not in scope?
Before tackling this issue, we should do the following (once all of these issues are closed, our background sync will be the only place MetadataSyncJobs are created):
- Remove the creation of MetadataSyncJobs from other areas of the code (these tasks are being tracked here: Sync icon should be disabled during active sync #388, No longer sync on successful delete or star operation #658, No longer sync on reply success #660, No longer trigger sync when files are missing #670, Trigger sync without calling sync_api after login #671).
- Create a UserMetadataSyncJob for when the user manually makes a sync request by clicking the refresh icon (this task is being tracked here: [refactor] Separate MetadataSync and UserMetadataSync #655, also Sync icon should be disabled during active sync #388 should be done ASAP). This is to ensure that the client can be responsive to user actions.
What's in scope for this issue?
So the scope of this issue is to run MetadataSyncJob outside of the main queue (until we add async job support) and to sync more frequently to make up for removing syncs from other areas of the code. What still needs to be decided is:
- How often to sync (idea: every 15 seconds or until a MetadataSyncJob is successful, whichever one is longer)
- Whether or not to create another queue just for MetadataSyncJob processing (using a queue fits our current architecture of processing jobs via queues; however, it would also make sense not to use a queue, since we never need to line up more than one MetadataSyncJob at a time)
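As a rough illustration of the "no queue" option, here is a minimal sketch of a dedicated runner that executes metadata syncs on its own thread, at most one at a time, on a fixed interval. The `MetadataSyncRunner` name and `sync_fn` callback are hypothetical, not part of the client's actual API:

```python
import threading
import time

class MetadataSyncRunner:
    """Hypothetical sketch: run metadata syncs on a dedicated thread
    instead of the main job queue, with at most one sync in flight."""

    def __init__(self, sync_fn, interval=15):
        self._sync_fn = sync_fn    # performs one metadata sync
        self._interval = interval  # idea from the issue: every 15 seconds
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def _run(self):
        while not self._stop.is_set():
            started = time.monotonic()
            try:
                self._sync_fn()   # a failed sync is simply retried on the
            except Exception:     # next tick; nothing ever queues up
                pass
            # Wait out the remainder of the interval (or until stopped).
            elapsed = time.monotonic() - started
            self._stop.wait(max(0, self._interval - elapsed))
```

Because the loop only starts the next sync after the previous one finishes, the "never more than one MetadataSyncJob at a time" criterion falls out for free, with no queue to manage.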
Criteria
- No longer use the main queue to process MetadataSyncJobs
- Never add more than one MetadataSyncJob at a time
- MetadataSyncJob should continue to derive from ApiJob (the only type of job with API access)
- Decrease the time between syncs so that we're never out of sync for too long
- If the queues are paused when a MetadataSyncJob succeeds, unpause the queues
Regression Criteria
Given that some operations have repeatedly failed due to network issues
When I do nothing
Then the client should periodically test connectivity
And the error message should disappear when connectivity is restored
And the queue should be resumed when connectivity is restored
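A minimal sketch of that regression behaviour, with hypothetical `probe`, `clear_error`, and `resume_queue` callbacks standing in for the client's real connectivity check, error banner, and queue controls:

```python
import time

def wait_for_connectivity(probe, clear_error, resume_queue, interval=15):
    """Hypothetical sketch: while the queue is paused, periodically
    probe the server; once it responds, clear the error and resume.

    probe() returns True when the server is reachable; clear_error()
    and resume_queue() are stand-ins for client callbacks.
    """
    while not probe():
        time.sleep(interval)   # "periodically test connectivity"
    clear_error()              # "the error message should disappear"
    resume_queue()             # "the queue should be resumed"
```

The key point is simply that recovery requires no user action: the loop itself notices when connectivity is restored.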