Make timer task non-blocking#161
Conversation
cgillum
left a comment
There was a problem hiding this comment.
Looking at the new implementation for createTimer, it appears to me that we are creating multiple timers at once, each with an increasing duration. For example, if the max timer duration is 3 days, and a user schedules a timer for 10 days, we'll create 4 timers all at once:
- Fires in 3 days
- Fires in 6 days
- Fires in 9 days
- Fires in 10 days
Am I understanding the implementation correctly, and is this the intended design?
|
Thank you for pointing that out @cgillum. I see how creating the timers in advance would not work since that would mean creating timers that are long. I will work on fixing this implementation. Thank you. |
|
@cgillum @davidmrdavid @kaibocai @shreyas-gopalakrishna Can you please help me review this PR whenever you get a chance? Thank you! |
cgillum
left a comment
There was a problem hiding this comment.
I think this change looks good! I'd love to get another person to double-check for correctness.
I do wonder if we should update the existing longTimer test to help us confirm the timing of the timers? For example, the test currently uses an AtomicInteger counter to ensure that we get the right number of timer-fired events. Should we enhance this test to also keep track of the timing of each of these events to ensure that each one comes in increments of 3 seconds?
|
Good point @cgillum! I just enhanced the |
davidmrdavid
left a comment
There was a problem hiding this comment.
The implementation looks good to me! I just have a question about why we're using System.nanoTime() in the orchestration instead of the "currentUtcDateTime` API.
| CompletableTask<Void> firstTimer = createTimerTask(finalFireAt); | ||
| CompletableFuture<Void> timerChain = createTimerChain(finalFireAt, firstTimer.future); |
There was a problem hiding this comment.
nit: can we add some comments here explaining how the timerChain behaves in both the short timer and long timer cases?
| AtomicReferenceArray<Long> timestamps = new AtomicReferenceArray<>(4); | ||
| DurableTaskGrpcWorker worker = this.createWorkerBuilder() | ||
| .addOrchestrator(orchestratorName, ctx -> { | ||
| timestamps.set(counter.get(), System.nanoTime()); |
There was a problem hiding this comment.
I'm a bit confused as to why System.nanoTime() works here. Shouldn't we be obtaining deterministic timestamps from the ctx object?
There was a problem hiding this comment.
Nvm, I re-read the test and this makes sense. You're measuring the time between replays, not the deterministic time.
Issue describing the changes in this PR
Resolves #136
Cause: When timers are longer than 3 days, we break them up into smaller timers that are 3-day long. However, there was a bug in this logic where we were blocking execution for those smaller timers. This explains why when we have an
anyOfstatement between an external event and a timer longer than 3 days, the external event is missed - we were still just waiting for that long timer to complete.Solution: We have implemented the timer logic to be asynchronous.
Pull request checklist
CHANGELOG.md