Description
Version
v22.14.0
Platform
Darwin YJF7M4-i4Xp 24.6.0 Darwin Kernel Version 24.6.0: Mon Jul 14 11:30:55 PDT 2025; root:xnu-11417.140.69~1/RELEASE_ARM64_T6031 arm64
Linux b2dae3f0c05d 6.10.14-linuxkit #1 SMP Thu Mar 20 16:32:56 UTC 2025 aarch64 aarch64 aarch64 GNU/
Subsystem
cluster
What steps will reproduce the bug?
- Run the attached program:
  node listen-retry-issue.js
- After all processes start (this takes some time), send SIGTERM to the master process to initiate a rolling restart. This switches the interface (initially from :: to 0.0.0.0, toggling each time a SIGTERM is sent):
  kill -SIGTERM <masterPid>
- After all workers restart (this takes some time), notice that only a subset of them listen successfully.
  For example, with 10 processes, if process 5 rebinds first, processes 5, 6, 7, 8, 9 rebind successfully, while processes 0, 1, 2, 3, 4 do not rebind and retry forever.
Depending on the platform, you may need to send 2 SIGTERMs to trigger the EADDRINUSE scenario.
Also note that we are not using the ipv6Only listen option, as our application must support dual stack when binding to IPv6 addresses.
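The interface toggle and the batching described in the steps above can be sketched roughly as follows. This is not the attached program: nextHost and restartBatches are hypothetical helper names, and the batch size of 2 is taken from the "2 processes at a time" condition stated in this report.

```javascript
// Hypothetical helper: flip between the dual-stack wildcard (::)
// and the IPv4 wildcard (0.0.0.0) on each SIGTERM.
function nextHost(current) {
  return current === '::' ? '0.0.0.0' : '::';
}

// Hypothetical helper: split worker ids into rolling-restart batches.
// The default of 2 mirrors the "2 processes at a time" condition.
function restartBatches(workerIds, batchSize = 2) {
  const batches = [];
  for (let i = 0; i < workerIds.length; i += batchSize) {
    batches.push(workerIds.slice(i, i + batchSize));
  }
  return batches;
}
```

While one batch is restarting on the new interface, the remaining workers are still bound to the old one, which is the overlap the report depends on.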
How often does it reproduce? Is there a required condition?
Using the program above it seems to reproduce most of the time. The only scenario that would cause the issue to not surface is if process 0 is the first to listen successfully.
The required conditions are:
- Multiple processes bound to all interfaces with dual stack (::)
- Processes retry listen when an EADDRINUSE error occurs
- Switch interface from :: to 0.0.0.0 (or vice versa)
- Perform a rolling restart (2 processes at a time) so some processes are still bound to the old interface and port during restart.
- After all processes restart, the problem should surface.
- Only a subset of workers will successfully bind to the new interface and port
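For readers without the attachment, the retry-on-EADDRINUSE condition might look roughly like the sketch below. It is an assumption about the shape of the worker code, not a copy of it: listenWithRetry and retryDelayMs are hypothetical names, and the 500 to 599 ms delay range is only inferred from the retry delays visible in the log output. The close-then-listen retry follows the pattern shown in the Node.js net documentation.

```javascript
const net = require('node:net');

// Hypothetical: jittered delay; the range is inferred from the log, not from the program.
function retryDelayMs(base = 500, jitter = 100) {
  return base + Math.floor(Math.random() * jitter);
}

// Hypothetical: keep retrying listen() while the address is reported in use.
function listenWithRetry(server, host, port, onListening) {
  server.on('error', (err) => {
    if (err.code !== 'EADDRINUSE') throw err;
    const delay = retryDelayMs();
    console.log(`detected address ${host}:${port} in use, retrying in ${delay} MS...`);
    setTimeout(() => {
      server.close();
      // ipv6Only is deliberately not set, so "::" stays dual stack.
      server.listen({ host, port });
    }, delay);
  });
  server.on('listening', () => onListening(server.address()));
  server.listen({ host, port });
}
```

A worker would call it as listenWithRetry(net.createServer(), host, 8000, cb), where host is :: or 0.0.0.0 depending on the current toggle state.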
What is the expected behavior? Why is that the expected behavior?
Expected behavior is that all workers successfully rebind to the interface and port, not a subset.
What do you see instead?
A subset of worker processes rebind successfully, and there is a strange pattern that dictates which workers rebind. For example:
When 10 processes are restarted and worker 6 is the first to bind, workers 6, 7, 8, and 9 successfully rebind to the interface and port. Workers 0, 1, 2, 3, 4, and 5 fail to rebind and stay in a retry loop forever.
Additional information
Here is annotated log output from the attached program demonstrating the problem.
Scenario
- 12 processes
- Process 6 listens first, and we end up with:
  - processes 6, 7, 8, 9, 10, 11 successfully listening
  - processes 0, 1, 2, 3, 4, 5 in a retry forever loop
- Expecting all processes to successfully rebind
...
The final 2 processes are restarted here
Master process 0 (25286) detected worker 10 (27562) died
Master process 0 (25286) detected worker 11 (27563) died
Process 6 and 7 bind successfully
Worker process 6 (29061) listening on 0.0.0.0:8000
Worker process 7 (29062) listening on 0.0.0.0:8000
Worker process 11 (29207) detected address 0.0.0.0:8000 in use, retrying in 517 MS...
Worker process 10 (29206) detected address 0.0.0.0:8000 in use, retrying in 526 MS...
Worker process 9 (29131) detected address 0.0.0.0:8000 in use, retrying in 537 MS...
Worker process 8 (29130) detected address 0.0.0.0:8000 in use, retrying in 573 MS...
Worker process 5 (28984) detected address 0.0.0.0:8000 in use, retrying in 572 MS...
Worker process 4 (28985) detected address 0.0.0.0:8000 in use, retrying in 580 MS...
Worker process 3 (28898) detected address 0.0.0.0:8000 in use, retrying in 548 MS...
Worker process 2 (28897) detected address 0.0.0.0:8000 in use, retrying in 549 MS...
Worker process 1 (28820) detected address 0.0.0.0:8000 in use, retrying in 528 MS...
Worker process 0 (28821) detected address 0.0.0.0:8000 in use, retrying in 567 MS...
Master process 0 (25286) rolling restart completed, 12 processes restarted
Process 8 and 9 bind successfully
Worker process 8 (29130) listening on 0.0.0.0:8000
Worker process 9 (29131) listening on 0.0.0.0:8000
Worker process 11 (29207) detected address 0.0.0.0:8000 in use, retrying in 555 MS...
Worker process 10 (29206) detected address 0.0.0.0:8000 in use, retrying in 554 MS...
Worker process 5 (28984) detected address 0.0.0.0:8000 in use, retrying in 574 MS...
Worker process 4 (28985) detected address 0.0.0.0:8000 in use, retrying in 556 MS...
Worker process 3 (28898) detected address 0.0.0.0:8000 in use, retrying in 570 MS...
Worker process 2 (28897) detected address 0.0.0.0:8000 in use, retrying in 519 MS...
Worker process 1 (28820) detected address 0.0.0.0:8000 in use, retrying in 535 MS...
Worker process 0 (28821) detected address 0.0.0.0:8000 in use, retrying in 559 MS...
Process 10 and 11 bind successfully
Worker process 10 (29206) listening on 0.0.0.0:8000
Worker process 11 (29207) listening on 0.0.0.0:8000
Processes 0, 1, 2, 3, 4, 5 are in a retry forever loop
Worker process 5 (28984) detected address 0.0.0.0:8000 in use, retrying in 523 MS...
Worker process 4 (28985) detected address 0.0.0.0:8000 in use, retrying in 580 MS...
Worker process 3 (28898) detected address 0.0.0.0:8000 in use, retrying in 531 MS...
Worker process 2 (28897) detected address 0.0.0.0:8000 in use, retrying in 566 MS...
Worker process 1 (28820) detected address 0.0.0.0:8000 in use, retrying in 550 MS...
Worker process 0 (28821) detected address 0.0.0.0:8000 in use, retrying in 574 MS...
Worker process 5 (28984) detected address 0.0.0.0:8000 in use, retrying in 514 MS...
Worker process 4 (28985) detected address 0.0.0.0:8000 in use, retrying in 512 MS...
Worker process 3 (28898) detected address 0.0.0.0:8000 in use, retrying in 570 MS...
Worker process 2 (28897) detected address 0.0.0.0:8000 in use, retrying in 573 MS...
Worker process 1 (28820) detected address 0.0.0.0:8000 in use, retrying in 575 MS...
Worker process 0 (28821) detected address 0.0.0.0:8000 in use, retrying in 563 MS...
Worker process 5 (28984) detected address 0.0.0.0:8000 in use, retrying in 567 MS...
Worker process 4 (28985) detected address 0.0.0.0:8000 in use, retrying in 538 MS...
Worker process 3 (28898) detected address 0.0.0.0:8000 in use, retrying in 540 MS...
Worker process 2 (28897) detected address 0.0.0.0:8000 in use, retrying in 529 MS...
Worker process 1 (28820) detected address 0.0.0.0:8000 in use, retrying in 563 MS...
Worker process 0 (28821) detected address 0.0.0.0:8000 in use, retrying in 528 MS...
Worker process 5 (28984) detected address 0.0.0.0:8000 in use, retrying in 549 MS...
Worker process 4 (28985) detected address 0.0.0.0:8000 in use, retrying in 577 MS...
Worker process 3 (28898) detected address 0.0.0.0:8000 in use, retrying in 577 MS...
Worker process 2 (28897) detected address 0.0.0.0:8000 in use, retrying in 557 MS...
Worker process 0 (28821) detected address 0.0.0.0:8000 in use, retrying in 515 MS...
Worker process 1 (28820) detected address 0.0.0.0:8000 in use, retrying in 544 MS...
Worker process 5 (28984) detected address 0.0.0.0:8000 in use, retrying in 571 MS...
Worker process 4 (28985) detected address 0.0.0.0:8000 in use, retrying in 542 MS...
Worker process 3 (28898) detected address 0.0.0.0:8000 in use, retrying in 537 MS...
Worker process 2 (28897) detected address 0.0.0.0:8000 in use, retrying in 548 MS...
Worker process 0 (28821) detected address 0.0.0.0:8000 in use, retrying in 518 MS...
Worker process 1 (28820) detected address 0.0.0.0:8000 in use, retrying in 569 MS...
...
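To check which workers ended up listening in a long log like the one above, a small parser can help. This is a minimal sketch that assumes the exact line formats shown in this log; summarizeWorkers is a hypothetical helper and not part of the attached program.

```javascript
// Hypothetical helper: classify each worker as listening or still retrying,
// based on the "Worker process N (pid) ..." lines in the log above.
function summarizeWorkers(logText) {
  const state = new Map();
  const pattern = /^Worker process (\d+) \((\d+)\) (listening on|detected address) /;
  for (const line of logText.split('\n')) {
    const m = pattern.exec(line.trim());
    if (!m) continue; // skip master lines and annotations
    const id = Number(m[1]);
    if (m[3] === 'listening on') {
      state.set(id, 'listening');
    } else if (state.get(id) !== 'listening') {
      // A worker that already listened is never downgraded to retrying.
      state.set(id, 'retrying');
    }
  }
  const byState = (wanted) =>
    [...state]
      .filter(([, s]) => s === wanted)
      .map(([id]) => id)
      .sort((a, b) => a - b);
  return { listening: byState('listening'), retrying: byState('retrying') };
}
```

Feeding it the captured output of a run returns two sorted arrays of worker ids, which makes the split between rebound and stuck workers easy to compare across runs.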