Not all processes rebind when switching from :: to 0.0.0.0 or vice versa #60086

@hgardneriv

Description

Version

v22.14.0

Platform

Darwin YJF7M4-i4Xp 24.6.0 Darwin Kernel Version 24.6.0: Mon Jul 14 11:30:55 PDT 2025; root:xnu-11417.140.69~1/RELEASE_ARM64_T6031 arm64

Linux b2dae3f0c05d 6.10.14-linuxkit #1 SMP Thu Mar 20 16:32:56 UTC 2025 aarch64 aarch64 aarch64 GNU/

Subsystem

cluster

What steps will reproduce the bug?

  1. Run the attached program: node listen-retry-issue.js

  2. After all processes start (this takes some time), send SIGTERM to the master process to initiate a rolling restart: kill -SIGTERM <masterPid>

    Each SIGTERM toggles the bind interface, initially from :: to 0.0.0.0.

  3. After all workers restart (this also takes some time), notice that only a subset of them listen successfully.

For example, with 10 processes, if process 5 rebinds first, processes 5, 6, 7, 8, and 9 rebind successfully,
while processes 0, 1, 2, 3, and 4 fail to rebind and retry forever.

Depending on the platform, it may be necessary to send two SIGTERMs to trigger the EADDRINUSE scenario.
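
The restart behaviour described in step 2 can be sketched with two small helpers. This is not the attached listen-retry-issue.js, which is not reproduced here. The names `nextHost` and `restartBatches` are hypothetical, and the batch size of 2 is taken from the rolling-restart condition listed later in this report.

```javascript
'use strict';

// Hypothetical helper: flip the bind host each time a SIGTERM is handled.
function nextHost(current) {
  return current === '::' ? '0.0.0.0' : '::';
}

// Hypothetical helper: split worker ids into rolling-restart batches.
// The default size of 2 mirrors "2 processes at a time" from the report.
function restartBatches(workerIds, size = 2) {
  const batches = [];
  for (let i = 0; i < workerIds.length; i += size) {
    batches.push(workerIds.slice(i, i + size));
  }
  return batches;
}
```

In the master process, a `process.on('SIGTERM', ...)` handler would presumably call `nextHost` once and then restart each batch in turn, so workers in later batches are still bound to the old interface while earlier batches try the new one.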

Also note that we are not using the ipv6Only listen option, as our application must support dual stack when binding to IPv6 addresses.
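
For reference, the two bind configurations being toggled can be written as option objects for `server.listen()`. This is a minimal sketch, assuming port 8000 as seen in the log output. `listenOptionsFor` is a hypothetical helper, not part of the attached program. `ipv6Only` already defaults to `false` in Node.js, so it is spelled out here only to make the dual-stack intent explicit.

```javascript
'use strict';

// Hypothetical helper: build listen options for the given wildcard host.
function listenOptionsFor(host, port = 8000) {
  if (host === '::') {
    // ipv6Only: false keeps dual-stack support, so binding to ::
    // is also expected to cover the IPv4 wildcard address.
    return { host, port, ipv6Only: false };
  }
  // IPv4 wildcard: ipv6Only does not apply.
  return { host, port };
}
```

A worker would pass the result straight to `net.createServer().listen(options)`. Whether `::` really accepts IPv4 traffic also depends on operating system settings, so treat the dual-stack comment as the intended behaviour rather than a guarantee.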

listen-retry-issue.js

How often does it reproduce? Is there a required condition?

Using the program above, it reproduces most of the time. The only scenario in which the issue does not surface is when process 0 is the first to listen successfully.

The required conditions are:

  1. Multiple processes are bound to all interfaces with dual stack (::)
  2. Processes retry listen when an EADDRINUSE error occurs
  3. Switch the interface from :: to 0.0.0.0 (or vice versa)
  4. Perform a rolling restart (2 processes at a time) so some processes are still bound to the old interface and port during the restart.
  5. After all processes restart, the problem should surface.
  6. Only a subset of workers will successfully bind to the new interface and port
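
Condition 2 can be sketched as follows. This is a minimal sketch and not the attached program: `retryDelayMs` and `listenWithRetry` are hypothetical names, the 500 to 599 ms delay range is only a guess based on the retry delays visible in the log output, and the log message is a simplified version of the one shown there.

```javascript
'use strict';

// Hypothetical delay: 500 ms base plus up to 99 ms of jitter (assumed range).
function retryDelayMs(random = Math.random) {
  return 500 + Math.floor(random() * 100);
}

// Hypothetical helper: call listen() and retry whenever EADDRINUSE is raised.
// `server` is expected to behave like a net.Server.
function listenWithRetry(server, options, log = console.log) {
  server.on('error', (err) => {
    // Anything other than "address in use" is rethrown as-is.
    if (err.code !== 'EADDRINUSE') throw err;
    const delay = retryDelayMs();
    log(`detected address ${options.host}:${options.port} in use, retrying in ${delay} MS...`);
    // listen() may be called again after a failed listen() attempt.
    setTimeout(() => server.listen(options), delay);
  });
  server.listen(options);
}
```

With this shape, a worker that never gets a successful bind keeps scheduling another attempt indefinitely, which matches the "retry forever" behaviour reported for the stuck workers.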

What is the expected behavior? Why is that the expected behavior?

Expected behavior is that all workers successfully rebind to the interface and port, not a subset.

What do you see instead?

A subset of worker processes rebinds successfully, and a strange pattern dictates which workers rebind. For example:

When 10 processes are restarted and worker 6 is the first to bind, workers 6, 7, 8, and 9 successfully rebind to the interface and port. Workers 0, 1, 2, 3, 4, and 5 fail to rebind and stay in a retry loop forever.

Additional information

Here is annotated log output from the attached program demonstrating the problem.

Scenario

  • 12 processes
  • Process 6 listens first, ending up with:
    • processes 6, 7, 8, 9, 10, 11 successfully listening
    • processes 0, 1, 2, 3, 4, 5 in a retry forever loop
  • Expected: all processes successfully rebind

...

The final 2 processes are restarted here

Master process 0 (25286) detected worker 10 (27562) died
Master process 0 (25286) detected worker 11 (27563) died

Processes 6 and 7 bind successfully

Worker process 6 (29061) listening on 0.0.0.0:8000
Worker process 7 (29062) listening on 0.0.0.0:8000
Worker process 11 (29207) detected address 0.0.0.0:8000 in use, retrying in 517 MS...
Worker process 10 (29206) detected address 0.0.0.0:8000 in use, retrying in 526 MS...
Worker process 9 (29131) detected address 0.0.0.0:8000 in use, retrying in 537 MS...
Worker process 8 (29130) detected address 0.0.0.0:8000 in use, retrying in 573 MS...
Worker process 5 (28984) detected address 0.0.0.0:8000 in use, retrying in 572 MS...
Worker process 4 (28985) detected address 0.0.0.0:8000 in use, retrying in 580 MS...
Worker process 3 (28898) detected address 0.0.0.0:8000 in use, retrying in 548 MS...
Worker process 2 (28897) detected address 0.0.0.0:8000 in use, retrying in 549 MS...
Worker process 1 (28820) detected address 0.0.0.0:8000 in use, retrying in 528 MS...
Worker process 0 (28821) detected address 0.0.0.0:8000 in use, retrying in 567 MS...
Master process 0 (25286) rolling restart completed, 12 processes restarted

Processes 8 and 9 bind successfully

Worker process 8 (29130) listening on 0.0.0.0:8000
Worker process 9 (29131) listening on 0.0.0.0:8000
Worker process 11 (29207) detected address 0.0.0.0:8000 in use, retrying in 555 MS...
Worker process 10 (29206) detected address 0.0.0.0:8000 in use, retrying in 554 MS...
Worker process 5 (28984) detected address 0.0.0.0:8000 in use, retrying in 574 MS...
Worker process 4 (28985) detected address 0.0.0.0:8000 in use, retrying in 556 MS...
Worker process 3 (28898) detected address 0.0.0.0:8000 in use, retrying in 570 MS...
Worker process 2 (28897) detected address 0.0.0.0:8000 in use, retrying in 519 MS...
Worker process 1 (28820) detected address 0.0.0.0:8000 in use, retrying in 535 MS...
Worker process 0 (28821) detected address 0.0.0.0:8000 in use, retrying in 559 MS...

Processes 10 and 11 bind successfully

Worker process 10 (29206) listening on 0.0.0.0:8000
Worker process 11 (29207) listening on 0.0.0.0:8000

Processes 0, 1, 2, 3, 4, 5 are in a retry forever loop

Worker process 5 (28984) detected address 0.0.0.0:8000 in use, retrying in 523 MS...
Worker process 4 (28985) detected address 0.0.0.0:8000 in use, retrying in 580 MS...
Worker process 3 (28898) detected address 0.0.0.0:8000 in use, retrying in 531 MS...
Worker process 2 (28897) detected address 0.0.0.0:8000 in use, retrying in 566 MS...
Worker process 1 (28820) detected address 0.0.0.0:8000 in use, retrying in 550 MS...
Worker process 0 (28821) detected address 0.0.0.0:8000 in use, retrying in 574 MS...
Worker process 5 (28984) detected address 0.0.0.0:8000 in use, retrying in 514 MS...
Worker process 4 (28985) detected address 0.0.0.0:8000 in use, retrying in 512 MS...
Worker process 3 (28898) detected address 0.0.0.0:8000 in use, retrying in 570 MS...
Worker process 2 (28897) detected address 0.0.0.0:8000 in use, retrying in 573 MS...
Worker process 1 (28820) detected address 0.0.0.0:8000 in use, retrying in 575 MS...
Worker process 0 (28821) detected address 0.0.0.0:8000 in use, retrying in 563 MS...
Worker process 5 (28984) detected address 0.0.0.0:8000 in use, retrying in 567 MS...
Worker process 4 (28985) detected address 0.0.0.0:8000 in use, retrying in 538 MS...
Worker process 3 (28898) detected address 0.0.0.0:8000 in use, retrying in 540 MS...
Worker process 2 (28897) detected address 0.0.0.0:8000 in use, retrying in 529 MS...
Worker process 1 (28820) detected address 0.0.0.0:8000 in use, retrying in 563 MS...
Worker process 0 (28821) detected address 0.0.0.0:8000 in use, retrying in 528 MS...
Worker process 5 (28984) detected address 0.0.0.0:8000 in use, retrying in 549 MS...
Worker process 4 (28985) detected address 0.0.0.0:8000 in use, retrying in 577 MS...
Worker process 3 (28898) detected address 0.0.0.0:8000 in use, retrying in 577 MS...
Worker process 2 (28897) detected address 0.0.0.0:8000 in use, retrying in 557 MS...
Worker process 0 (28821) detected address 0.0.0.0:8000 in use, retrying in 515 MS...
Worker process 1 (28820) detected address 0.0.0.0:8000 in use, retrying in 544 MS...
Worker process 5 (28984) detected address 0.0.0.0:8000 in use, retrying in 571 MS...
Worker process 4 (28985) detected address 0.0.0.0:8000 in use, retrying in 542 MS...
Worker process 3 (28898) detected address 0.0.0.0:8000 in use, retrying in 537 MS...
Worker process 2 (28897) detected address 0.0.0.0:8000 in use, retrying in 548 MS...
Worker process 0 (28821) detected address 0.0.0.0:8000 in use, retrying in 518 MS...
Worker process 1 (28820) detected address 0.0.0.0:8000 in use, retrying in 569 MS...
...
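
To check a long log like the one above for this pattern, the worker lines can be tallied with a small script. This is a sketch that assumes the exact line format shown above. `classifyWorkers` is a hypothetical helper and not part of the attached program.

```javascript
'use strict';

// Hypothetical helper: report which workers logged "listening on" and
// which only ever logged EADDRINUSE retries.
function classifyWorkers(logText) {
  const listening = new Set();
  const retrying = new Set();
  const pattern = /^Worker process (\d+) \(\d+\) (listening on|detected address)/;
  for (const line of logText.split('\n')) {
    const match = pattern.exec(line.trim());
    if (!match) continue; // skip master lines and annotations
    const id = Number(match[1]);
    if (match[2] === 'listening on') {
      listening.add(id);
      retrying.delete(id); // a later success clears earlier retries
    } else if (!listening.has(id)) {
      retrying.add(id);
    }
  }
  const sorted = (set) => [...set].sort((a, b) => a - b);
  return { listening: sorted(listening), retrying: sorted(retrying) };
}
```

Running it over the full output of a reproduction should make it easy to see whether the workers left in `retrying` are always the ones with ids below the first worker to bind.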
