Rewrite the park/unpark mechanism#528
Conversation
ghost
left a comment
There was a problem hiding this comment.
The two implementations in tokio-threadpool and tokio-executor are literally the same - copy/paste and return Ok(()) from park in tokio-executor.
| NOTIFY => return Ok(()), | ||
| IDLE => {}, | ||
| _ => unreachable!(), | ||
| if self.state.compare_exchange(NOTIFY, IDLE, SeqCst, SeqCst).is_ok() { |
There was a problem hiding this comment.
| } | ||
|
|
||
| // If the duration is zero, then there is no need to actually block | ||
| if let Some(ref dur) = timeout { |
There was a problem hiding this comment.
This check is new in tokio-executor. @carllerche, any idea why the previous implementation didn't have it? Maybe just an oversight?
There was a problem hiding this comment.
Oversight... it would be nice to unify the two impls instead of duplicating code.
| // Transition to sleeping | ||
| match self.state.compare_and_swap(IDLE, SLEEP, Ordering::SeqCst) { | ||
| NOTIFY => { | ||
| match self.state.compare_exchange(IDLE, SLEEP, SeqCst, SeqCst) { |
There was a problem hiding this comment.
| } | ||
| } | ||
| None => { | ||
| loop { |
There was a problem hiding this comment.
| None => self.condvar.wait(m).unwrap(), | ||
| match timeout { | ||
| Some(timeout) => { | ||
| m = self.condvar.wait_timeout(m, timeout).unwrap().0; |
There was a problem hiding this comment.
|
+1 to unifying the implementations. I'm a little hesitant to test this out though because we don't have a consistent reproducer, nor a good theory about why this change would help with the bug. I'm concerned about this turning into a "changing things until the problem goes away" exercise. |
| // Notified before we could sleep, consume the notification and | ||
| // exit | ||
| self.state.store(IDLE, Ordering::SeqCst); | ||
| self.state.store(IDLE, SeqCst); |
There was a problem hiding this comment.
This seems problematic for the same reason stated in the issue.
Given that this is intended to acquire all memory from unpark threads, it needs to establish a "happens-before" relationship with
tokio/tokio-executor/src/park.rs
Line 303 in e433282
I do not think that this is done (though, again, MFENCE on x86 will work).
|
@jsgf Well, does reverting back to tokio-threadpool 0.1.4 fix the problem? |
|
@carllerche I've been holding off until we're sure we can get a clear signal. It repros relatively rarely, and it might have been happening for a while at some rate. I'm considering the possibility that the bug is actually elsewhere, and the changes to park/unpark are just changing the timing or something else so it presents differently. I'm currently re-learning TLA+ so I can model park/unpark and see if there's something subtle we're all missing - with luck that will turn something up and so we can move forward with some confidence. If it doesn't then either the problem is somewhere else or the model is bad (but model+inspection gives more confidence than inspection alone). |
|
@jsgf I agree that I am skeptical the bug is w/ park / unpark. |
|
@stjepang This looks good to me, but I am a bit hesitant to merge w/o a repro or at least an understanding as to what the bug is. Do you have thoughts on ways we might be able to repro? |
|
So far I haven't been able to find a problem with either the 0.1.4 or 0.1.5 park/unpark implementations. |
| // Notified before we could sleep, consume the notification and | ||
| // exit | ||
| self.state.store(IDLE, Ordering::SeqCst); | ||
| self.state.swap(IDLE, SeqCst); |
There was a problem hiding this comment.
Using swap as a store seems like overkill since the value is discarded. On x86 this will generate an unnecessary locked instruction. (ie, the std version should use store.)
There was a problem hiding this comment.
I believe that the version in std is incorrect. The swap is required to establish ordering with this line in unpark. Without doing a "read" operation here, visibility cannot be acquired.
There was a problem hiding this comment.
I'm not sure I follow. I didn't check std, but I assume the swap here came from there.
Also, my understanding is that SeqCst should establish total ordering with other SeqCst loads and stores, so this can be a plain store and still be ordered with respect to compare_exchange. (The std::atomic::Ordering docs are irritatingly imprecise here though, so I don't know if I'm actually right - it seems to imply that SeqCst isn't meaningful for store.)
There was a problem hiding this comment.
A store operation with Release ordering (or stronger) writes to memory and allows other future load operations to synchronize with it.
However, here we also want to perform a load operation at the same time (hence swap) in order to synchronize with whoever last wrote to this location (i.e. acquire their writes to memory).
There was a problem hiding this comment.
@jsgf, from an x86 codegen perspective, I don’t think a SeqCst store or swap is going to matter - it’s going to be some instruction that entails LOCK like behavior for both. In fact, I see xchg reg, mem on the playground for both.
There was a problem hiding this comment.
Without doing a "read" operation here, visibility cannot be acquired.
Isn't the read at line 257 sufficient?
There was a problem hiding this comment.
Isn't the read at line 257 sufficient?
I was about to ask the same. The read of NOTIFY with SeqCst is sufficient to establish synchronization with the thread that notified.
Of course release/acquire would be sufficient as well but I am not going to try playing that game again ;)
There was a problem hiding this comment.
No, because a third thread can come along and unpark while the park fn is between 257 and 262. The bug is specifically that the park thread must acquire the memory from a third thread that calls unpark before the park thread hits line 262.
|
Did this ever get anywhere once it ended up becoming a rust-lang discussion? Is the erroneous behavior that led us to believe this was the bug still present? |
|
In #525, a deadlock was reported where #459 seems like the only meaningful change. Since the issue seems to be so difficult to track down, I think we should merge this PR and go with the same park/unpark implementation that has already been battle-tested in I really wonder if this will fix deadlocks. @carllerche, what do you think? |
|
I think unpark() needs to be changed as well. See rust-lang/rust#53366 and rust-lang/rust#54174. |
|
@stjepang What do you think about closing this since it seems unlikely that the bug is in our code. Also the std version has evolved since this PR. |
|
Or at least updating to track std's latest changes. |
|
@carllerche I'm going to update to reflect the std's latest implementation. A question regarding
Originally I thought of putting an instance of |
|
@stjepang right, well the one that is stored in the thread local is to avoid having to recreate all the state each time when calling |
Right. However, |
|
@stjepang it probably should be in a separate crate to be honest... maybe |
235: Add Parker for thread parking r=stjepang a=stjepang This is just an extracted copy of the current implementation of `thread::park()` and `thread::unpark()`. `Parker` is a low-level thread synchronization primitive useful for building others (like locks etc.). It would be useful in tokio-rs/tokio#528 and might remove some cruft and unnecessary TLS access from `context.rs` in `crossbeam-channel`. I've also added a fast-path check for `timeout == Duration::from_secs(0)`, which Tokio relies on. An interesting peculiarity of `Parker` is that it's not split into `Parker` and `Unparker`, which means any thread can call `park()` at any time. However, if multiple threads call `park()` at the same time, expect deadlocks or panics - that is the user's problem. Splitting the primitive into an owned `Parker` and shared `Unparker` would require us to wrap the inner structure into an `Arc`, which comes at a cost of allocation and indirection. Since this is a very low-level primitive, I chose not to do that. The reason why this is in `crossbeam-utils` is because it's a really simple primitive with no dependencies. Also, `tokio-executor` and `tokio-threadpool` really don't want to pull in the whole `crossbeam` in order to use this. cc @carllerche Co-authored-by: Stjepan Glavina <stjepang@gmail.com>
|
I've updated the PR to use |
|
@carllerche Just a ping for review :) |
The
park/unparkmechanism now has identical implementation tostd::thread::{park,park_timeout,unpark}.Our previous implementation was cutting corners in a few places and while it seemed obviously correct, we decided to avoid playing with fire and just copy what
std::threaddoes line by line.