Skip to content

Stop leaving sockets in CLOSE_WAIT on failed TLS connections#3634

Merged
bogdan-iancu merged 1 commit intoOpenSIPS:masterfrom
jes:proto-tls-fd
Jun 25, 2025
Merged

Stop leaving sockets in CLOSE_WAIT on failed TLS connections#3634
bogdan-iancu merged 1 commit intoOpenSIPS:masterfrom
jes:proto-tls-fd

Conversation

@jes
Copy link
Copy Markdown
Contributor

@jes jes commented Apr 24, 2025

Summary

Prevent outbound TLS sockets from being left in CLOSE_WAIT after failed handshakes.

Details

When an outbound TLS connection fails during the handshake (e.g. due to missing certificates), a file descriptor sticks around without being closed, and ends up stuck in CLOSE_WAIT.

Although tcp_conn_release() is called, it does not actually close the file descriptor. In other paths this is handled explicitly with close(fd) if the connection's proc_id differs from the current process. This fix follows the same pattern.

Testing confirmed that TLS sockets now close promptly on failure and that working TLS connections continue to function correctly.

Solution

Add a conditional close(fd) in proto_tls_send() after calling tcp_conn_release(), using the pattern already present elsewhere: only close if c->proc_id != process_no.

Compatibility

I don't think this change breaks any other scenarios.

We've been running this in prod for over a year, I unfortunately missed creating a PR for it until now.

@bogdan-iancu bogdan-iancu removed the request for review from liviuchircu April 24, 2025 11:17
con_release:
sh_log(c->hist, TCP_SEND2MAIN, "send 1, (%d)", c->refcnt);
/* close the fd if this process is not meant to own it */
if (c->proc_id != process_no)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jes , are you sure about this fix? the con_release label is jumped (from above) only if a connection was not already found, so the connection was locally (to the process) started. If the conn (and fd) are local to the process, I assume c->proc_id == process_no, right? so the code you added will be all the time false, not executed. Or maybe I'm missing something here :D

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I worked on this a long time ago but in my notes it says that c->proc_id was 0 in the offending cases. So maybe the problem is just that c->proc_id wasn't getting set correctly?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jes , I had to time to dig a bit more into this. And the only valid case (according to your description and also to the code), is this one here https://github.com/OpenSIPS/opensips/blob/master/modules/proto_tls/proto_tls.c#L556, when the TCP conn is created (and we have a fd), but the TSL init fails. Here we should do the close on the fd, before jumping to con_release. IMHO this is the proper fix, meaning the fix in the right spot.

@github-actions
Copy link
Copy Markdown

Any updates here? No progress has been made in the last 30 days, marking as stale.

@github-actions github-actions Bot added the stale label May 25, 2025
@jes
Copy link
Copy Markdown
Contributor Author

jes commented May 25, 2025

No updates here.

@stale stale Bot removed the stale label May 25, 2025
@bogdan-iancu bogdan-iancu self-assigned this Jun 12, 2025
@bogdan-iancu bogdan-iancu merged commit fde3ff3 into OpenSIPS:master Jun 25, 2025
72 checks passed
bogdan-iancu added a commit that referenced this pull request Jun 25, 2025
Stop leaving sockets in CLOSE_WAIT on failed TLS connections

(cherry picked from commit fde3ff3)
bogdan-iancu added a commit that referenced this pull request Jun 25, 2025
Stop leaving sockets in CLOSE_WAIT on failed TLS connections

(cherry picked from commit fde3ff3)
bogdan-iancu added a commit that referenced this pull request Jun 25, 2025
Stop leaving sockets in CLOSE_WAIT on failed TLS connections

(cherry picked from commit fde3ff3)
NormB pushed a commit to NormB/opensips that referenced this pull request May 9, 2026
Stop leaving sockets in CLOSE_WAIT on failed TLS connections
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants