Stop leaving sockets in CLOSE_WAIT on failed TLS connections#3634
Stop leaving sockets in CLOSE_WAIT on failed TLS connections#3634bogdan-iancu merged 1 commit intoOpenSIPS:masterfrom
Conversation
| con_release: | ||
| sh_log(c->hist, TCP_SEND2MAIN, "send 1, (%d)", c->refcnt); | ||
| /* close the fd if this process is not meant to own it */ | ||
| if (c->proc_id != process_no) |
There was a problem hiding this comment.
@jes , are you sure about this fix? the con_release label is jumped (from above) only if a connection was not already found, so the connection was locally (to the process) started. If the conn (and fd) are local to the process, I assume c->proc_id == process_no, right? so the code you added will be all the time false, not executed. Or maybe I'm missing something here :D
There was a problem hiding this comment.
I worked on this a long time ago but in my notes it says that c->proc_id was 0 in the offending cases. So maybe the problem is just that c->proc_id wasn't getting set correctly?
There was a problem hiding this comment.
@jes , I had to time to dig a bit more into this. And the only valid case (according to your description and also to the code), is this one here https://github.com/OpenSIPS/opensips/blob/master/modules/proto_tls/proto_tls.c#L556, when the TCP conn is created (and we have a fd), but the TSL init fails. Here we should do the close on the fd, before jumping to con_release. IMHO this is the proper fix, meaning the fix in the right spot.
|
Any updates here? No progress has been made in the last 30 days, marking as stale. |
|
No updates here. |
Stop leaving sockets in CLOSE_WAIT on failed TLS connections (cherry picked from commit fde3ff3)
Stop leaving sockets in CLOSE_WAIT on failed TLS connections (cherry picked from commit fde3ff3)
Stop leaving sockets in CLOSE_WAIT on failed TLS connections (cherry picked from commit fde3ff3)
Stop leaving sockets in CLOSE_WAIT on failed TLS connections
Summary
Prevent outbound TLS sockets from being left in
CLOSE_WAITafter failed handshakes.Details
When an outbound TLS connection fails during the handshake (e.g. due to missing certificates), a file descriptor sticks around without being closed, and ends up stuck in
CLOSE_WAIT.Although
tcp_conn_release()is called, it does not actually close the file descriptor. In other paths this is handled explicitly withclose(fd)if the connection'sproc_iddiffers from the current process. This fix follows the same pattern.Testing confirmed that TLS sockets now close promptly on failure and that working TLS connections continue to function correctly.
Solution
Add a conditional
close(fd)inproto_tls_send()after callingtcp_conn_release(), using the pattern already present elsewhere: only close ifc->proc_id != process_no.Compatibility
I don't think this change breaks any other scenarios.
We've been running this in prod for over a year, I unfortunately missed creating a PR for it until now.