Adjust connection timeout for TLS by shinrich · Pull Request #4903 · apache/trafficserver

shinrich · 2019-01-31T23:01:47Z

Found the following crash.

[ 0 ] traffic_server      write_to_net_io                    ( UnixNetVConnection.cc:445 )
[ 1 ] traffic_server      NetHandler::waitForActivity(long)  ( UnixNet.cc:514 )
[ 2 ] traffic_server      EThread::execute_regular()         ( UnixEThread.cc:277 )
[ 3 ] traffic_server      spawn_thread_internal              ( Thread.cc:85 )
[ 4 ] libpthread-2.17.so  start_thread

Line numbers on our branch are off from open source master, but the top of the stack, in write_to_net_io, is in the code below.

  if (!vc->getSSLHandShakeComplete()) {
    if (vc->trackFirstHandshake()) {
      // Send the write ready on up to the state machine
      write_signal_and_update(VC_EVENT_WRITE_READY, vc);
      vc->write.triggered = 0;
      nh->write_ready_list.remove(vc);
    }

    int err, ret;

    if (vc->get_context() == NET_VCONNECTION_OUT) {
      ret = vc->sslStartHandShake(SSL_EVENT_CLIENT, err);
    } else {
      ret = vc->sslStartHandShake(SSL_EVENT_SERVER, err);
    }

Specifically, the crash is at "ret = vc->sslStartHandShake(SSL_EVENT_CLIENT, err);", and the core shows that "vc" has been freed (the vtable pointer is bogus).

I think the issue is how we notify the state machine that the socket is write-ready (the SYN exchange has completed) via the call to write_signal_and_update. We assume that the vc is still in a good state after this call, but it is quite possible that the HttpSM has determined it is in an error state and closed the vc.
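To make the ordering problem concrete, here is a minimal sketch of the "signal, then keep using the object" pattern. Everything in it (`FakeVC`, `on_write_ready`, `signal_then_handshake`) is a hypothetical stand-in, not the real ATS classes. In the real code the vc is freed, so there is no flag left to check; the sketch keeps the object alive and only marks it closed, so the hazard is observable without undefined behavior.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-in for a net vconnection; NOT the real ATS class.
struct FakeVC {
  bool closed = false;
  // Stand-in for the state machine's handler invoked on write-ready.
  std::function<void(FakeVC &)> on_write_ready;

  bool start_handshake()
  {
    // Starting a handshake on a closed vc is the bug being modeled.
    assert(!closed);
    return true;
  }
};

// Signal first, then continue using vc. Returns true if the handshake was
// started, false if the callback closed the vc in the meantime.
bool signal_then_handshake(FakeVC &vc)
{
  if (vc.on_write_ready) {
    vc.on_write_ready(vc); // the callee may decide to close the vc
  }
  if (vc.closed) {
    return false; // without this guard, the next line touches a dead vc
  }
  return vc.start_handshake();
}
```

A callback that sets `closed = true` makes `signal_then_handshake` return false; with no callback it returns true. The crashing code path has no equivalent of the guard, which is why removing the early signal avoids the problem.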

I think we should reconsider how the connect timeout applies to a TLS connection. Rather than covering just the SYN exchange, I would argue that it should cover the entire TLS handshake. If the TLS handshake stalls, that should be covered by the connection timeout, not the data no-activity timeout. Making that change removes the write_signal_and_update call here. Instead, we don't notify the state machine until the read_complete signal is sent at the end of the TLS handshake.

This PR makes that change.
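The intended change in timeout coverage can be sketched as a small model. The names here (`Phase`, `Timeouts`, `timeout_before`, `timeout_after`) are assumptions made for illustration and do not appear in the ATS source; the sketch only shows which timeout governs each connection phase before and after the change described above.

```cpp
// Hypothetical connection phases; not ATS identifiers.
enum class Phase { TcpConnecting, TlsHandshaking, Established };

// Hypothetical pair of timeouts, in seconds.
struct Timeouts {
  int connect_s;    // connect timeout
  int inactivity_s; // data no-activity timeout
};

// Before: the connect timeout covers only the SYN exchange, so a stalled
// TLS handshake falls under the no-activity timeout.
constexpr int timeout_before(Phase p, const Timeouts &t)
{
  return p == Phase::TcpConnecting ? t.connect_s : t.inactivity_s;
}

// After: the connect timeout covers everything up to the end of the TLS
// handshake; the no-activity timeout applies only once established.
constexpr int timeout_after(Phase p, const Timeouts &t)
{
  return p == Phase::Established ? t.inactivity_s : t.connect_s;
}
```

With a connect timeout of 10 and a no-activity timeout of 30, `timeout_before(Phase::TlsHandshaking, ...)` yields 30 while `timeout_after(Phase::TlsHandshaking, ...)` yields 10; the other two phases are unchanged.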

We haven't seen this crash very often, but the original TTFB timeout fix was backed out of the 8.0.x branch, I assume due to instability issues.

zwoop · 2019-02-11T21:07:54Z

@shinrich So with this fix, we don't need to back out the TTFB PR from 8.0.x? If so, we should put that back in for 8.1.x as well, IMO.

bryancall · 2020-06-12T17:40:08Z

This is a fix for #4028

Adjust connection timeout for TLS

62c2665

shinrich added TLS Crash labels Jan 31, 2019

shinrich added this to the 9.0.0 milestone Jan 31, 2019

shinrich self-assigned this Jan 31, 2019

SolidWallOfCode approved these changes Feb 1, 2019

shinrich merged commit 33818ba into apache:master Feb 4, 2019

masaori335 mentioned this pull request Jun 3, 2020

Crash in write_to_net_io in 8.1.x #6715

Closed

masaori335 mentioned this pull request May 12, 2021

Adjust connection timeout for TLS #7810

Merged

shinrich merged 1 commit into apache:master from shinrich:fix-tls-connect-timeout