[peer_handler] Take the peers lock before getting messages to send #891
Merged
TheBlueMatt merged 1 commit into lightningdevkit:main on Apr 22, 2021
Conversation
Previously, if a user simultaneously called `PeerHandler::process_events()` from two threads, we'd race, which ended up sending messages out-of-order in the real world. Specifically, we first called `get_and_clear_pending_msg_events`, then took the `peers` lock and pushed the messages we got into the sending queue. Two threads could each get some set of messages to send, but then race each other to the `peers` lock and send the messages in random order. Because we already hold the `peers` lock when calling most message handler functions, we can simply take the lock before calling `get_and_clear_pending_msg_events`, solving the race.
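The shape of the fix can be sketched in miniature. This is an illustrative model only, not rust-lightning's actual API: the struct, field names, and `u32` "messages" are all placeholders. The point is that draining pending events and pushing them onto the send queue happens as one step under a single lock, so two racing threads cannot interleave their batches.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical simplified model of the race fix: pending message
// events and the outbound send queue sit behind one lock, so
// draining one into the other is a single atomic step.
struct PeerManager {
    peers: Mutex<(Vec<u32>, Vec<u32>)>, // (pending_events, send_queue)
}

impl PeerManager {
    fn process_events(&self) {
        // Take the lock *before* reading pending events, so two
        // threads cannot interleave their drained batches.
        let mut guard = self.peers.lock().unwrap();
        let events = std::mem::take(&mut guard.0);
        guard.1.extend(events);
    }
}

fn main() {
    let pm = Arc::new(PeerManager {
        peers: Mutex::new((vec![1, 2, 3], Vec::new())),
    });
    // Two threads racing into process_events() still produce an
    // in-order send queue, because drain + push happen under one lock.
    let handles: Vec<_> = (0..2)
        .map(|_| {
            let pm = Arc::clone(&pm);
            thread::spawn(move || pm.process_events())
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    assert_eq!(pm.peers.lock().unwrap().1, vec![1, 2, 3]);
    println!("ok");
}
```

In the buggy version, the drain happened outside the lock, so each thread held a private batch of events before racing for the lock; with both steps under one critical section, ordering is preserved regardless of thread scheduling.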
TheBlueMatt added a commit to TheBlueMatt/rust-lightning that referenced this pull request on Apr 21, 2021
While trying to debug the issue ultimately tracked down to a `PeerHandler` locking bug in lightningdevkit#891, the ability to deliver only individual messages at a time in chanmon_consistency looked important. Specifically, it initially appeared there may be a race when an update_add_htlc was delivered, then a node sent a payment, and only after that, the corresponding commitment-signed was delivered. This commit adds such an ability, greatly expanding the potential for chanmon_consistency to identify channel state machine bugs.
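The single-message delivery ability described in that commit can be illustrated with a toy model. This is not chanmon_consistency's real harness; the `MessageQueue` type and string "messages" are assumptions for illustration. The idea is that delivering one message per step, instead of flushing a peer's whole queue at once, lets the fuzzer interleave other actions (like initiating a payment) between an `update_add_htlc` and its `commitment_signed`.

```rust
use std::collections::VecDeque;

// Illustrative model only (not chanmon_consistency's real harness):
// a per-peer message queue that can deliver either everything at
// once or a single message per step.
struct MessageQueue {
    pending: VecDeque<&'static str>,
}

impl MessageQueue {
    // Flush the whole queue in one step (coarse-grained delivery).
    #[allow(dead_code)]
    fn deliver_all(&mut self) -> Vec<&'static str> {
        self.pending.drain(..).collect()
    }

    // Deliver at most one message, leaving the rest queued, so other
    // node actions can interleave between deliveries.
    fn deliver_one(&mut self) -> Option<&'static str> {
        self.pending.pop_front()
    }
}

fn main() {
    let mut q = MessageQueue {
        pending: VecDeque::from(["update_add_htlc", "commitment_signed"]),
    };
    // Deliver the update_add_htlc alone...
    assert_eq!(q.deliver_one(), Some("update_add_htlc"));
    // ...a node could initiate a payment here, between deliveries...
    // ...then deliver the corresponding commitment_signed.
    assert_eq!(q.deliver_one(), Some("commitment_signed"));
    assert_eq!(q.deliver_one(), None);
    println!("ok");
}
```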
Codecov Report

@@           Coverage Diff            @@
##             main     #891    +/-  ##
========================================
- Coverage   90.30%   90.29%   -0.01%
========================================
  Files          57       57
  Lines       29225    29225
========================================
- Hits        26392    26390       -2
- Misses       2833     2835       +2
Continue to review full report at Codecov.
jkczyz
approved these changes
Apr 22, 2021
Member
utACK If I understand correctly, read handling locks this in
Member
Fixes #888