-
Notifications
You must be signed in to change notification settings - Fork 135
Have stuck detection account for worker-level timeouts as well as client-level #1126
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
a76ba97 to
11838bf
Compare
9134eaf to
a0cd70e
Compare
…ent-level As reported by #1125, the stuck job detection mechanism currently only considers client-level timeouts and without properly accounting for the possibility of worker-level overrides. Here, make sure to account for worker-level timeouts as well by hosting the job timeout calculation further up in the executor logic. Fixes #1125.
a0cd70e to
7eebbb3
Compare
| } | ||
|
|
||
| return context.WithCancel(ctx) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was doing very little (it's the same as a if jobTimeout > 0 { check) and I needed the conditional for something else, so I just pulled it into execute.
bgentry
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, great quick fix. Probably worthy of a quick .1 release?
| watchStuckCancel := e.watchStuck(ctx) | ||
| defer watchStuckCancel() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO this is so much more readable than the large inline block before. I don't feel like I lose the context and flow of the surrounding code, while the fairly complex stuck job watching logic is nicely contained in a helper. 👏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah +1. Should have done this from the beginning.
Thx. And yeah, let's get a patch release in given there's no upcoming plans for another release anytime soon. |
Prepare patch release largely containing #1126. [skip ci]
Prepare patch release largely containing #1126. [skip ci]
As reported by #1125, the stuck job detection mechanism currently only
considers client-level timeouts and without properly accounting for the
possibility of worker-level overrides.
Here, make sure to account for worker-level timeouts as well by hosting
the job timeout calculation further up in the executor logic.
Fixes #1125.