Add autest to make sure H2 grace shutdown is working as intended#11046
Add autest to make sure H2 grace shutdown is working as intended#11046duke8253 merged 1 commit intoapache:masterfrom
Conversation
ea36083 to
173c9b1
Compare
src/proxy/http2/Http2Stream.cc
Outdated
| // Schedule session shutdown if response header has "Connection: close", | ||
| // then remove "Connection: close" which is a protocol violation for HTTP/2. | ||
| if (!this->_send_header.is_keep_alive_set()) { | ||
| this->_send_header.field_delete(MIME_FIELD_CONNECTION, MIME_LEN_CONNECTION); |
There was a problem hiding this comment.
Do we really need to delete the header here? I think it (and other headers that are invalid for H2) will be deleted by VersionConverter later.
There was a problem hiding this comment.
I think it does get deleted, added that mostly for testing purpose. I'll remove that.
| case VC_EVENT_INACTIVITY_TIMEOUT: | ||
| case VC_EVENT_INACTIVITY_TIMEOUT: { | ||
| if (this->connection_state.get_shutdown_state() == HTTP2_SHUTDOWN_NONE) { | ||
| this->connection_state.set_shutdown_state(HTTP2_SHUTDOWN_INITIATED, Http2ErrorCode::HTTP2_ERROR_NO_ERROR); |
There was a problem hiding this comment.
I may not understand the issue, but the state should not be changed directly here. It should be set by HTTP2_SESSION_EVENT_SHUTDOWN_INIT event handler. Changing the state at multiple places is going to be super confusing.
There was a problem hiding this comment.
Will sending this with the SHUTDOWN_CONT event below trigger SHUTDOWN_INIT? Or are you suggesting that this case should send the SHUTDOWN_INIT event instead and rely on that flow to set the appropriate state?
There was a problem hiding this comment.
I was suggesting scheduling SHUTDOWN_INIT event instead. But if canceling grace shutdown works we don't need it.
Aren't these immediate close? Why do we want to initiate grace shutdown on timeout? |
We got a complaint from a property saying they don't have a way to determine whether a connection is closed, since the HTTP/2 doesn't have the The problem with the About that timeout part, was doing some testing and wasn't sure whether we need that, left it in there to get some comments. |
|
Ok, I wonder why the property can't check if the connection is closed on TCP layer, but let's leave it aside. The RFC says:
and
So, a client cannot expect a GOAWAY. A connection close without GOAWAY can happen. I don't think current behavior of ATS is wrong. That said, the RFC also says:
I'd take this as "A server can cancel graceful shutdown and send a GOAWAY without waiting for the 1 RTT of grace time, to close a connection immediately". This may be just a different way of saying what you do on this PR, but I think this is what we should do, and I'd take a different approach. In most cases, TLDR; Isn't canceling graceful shutdown process enough? |
| shutdown_cont_event->cancel(); | ||
| } | ||
| this->_shutdown_cont(); | ||
| } |
There was a problem hiding this comment.
So cleaning up the in progress shutdown events, since we just shutdown cleanly here?
There was a problem hiding this comment.
Yes, since we already started a grace shutdown, I think we should finish the whole process.
There was a problem hiding this comment.
I still don't get why we need to call finish_grace_shutdown() here.
When we schdule HTTP2_SESSION_EVENT_FINI, we send GOAWAY with latest_streamid_in, right? Doesn't this send an extra GOAWAY?
- GOAWAY with the big number for graceful shutdown
- GOAWAY with latest_streamid_in at the place we schedule
HTTP2_SESSION_EVENT_FINI - GOAWAY with latest_streamid_in here in the
HTTP2_SESSION_EVENT_FINIhandler .
There was a problem hiding this comment.
Because when we get a HTTP2_SESSION_EVENT_FINI we set the handler to Http2ConnectionState::state_closed, which just does shutdown_cont_event = nullptr, so the GOAWAY with latest_streamid_in never gets sent out.
There was a problem hiding this comment.
We only want to send two GOAWAY frames at most, A) one with the big number, and B) one with latest_streamid_in, right?
B is sent when we schedule HTTP2_SESSION_EVENT_FINI event. For example:
trafficserver/src/proxy/http2/Http2ConnectionState.cc
Lines 1451 to 1455 in c49e60f
So when we reach this line, which is in HTTP2_SESSION_EVENT_FINI handler, B is already sent (or it's in the send buffer at least). Calling finish_grace_shutdown would send third GOAWAY.
There was a problem hiding this comment.
The only way we get an event HTTP2_SESSION_EVENT_SHUTDOWN_INIT is when shutdown_state == HTTP2_SHUTDOWN_NOT_INITIATED, and that is the trigger for a grace shutdown since we only schedule that event when checking for the state like this.
if (this->connection_state.get_shutdown_state() == HTTP2_SHUTDOWN_NOT_INITIATED) {
send_connection_event(&this->connection_state, HTTP2_SESSION_EVENT_SHUTDOWN_INIT, this);
}
And from testing I've done so far, I was never able to get the second GOAWAY frame as described in the RFC when a grace shutdown is initiated, and that's why I created this PR. But I can move this->_finish_grace_shutdown(); two lines above where it is at now, since shutdown_cont_event is only scheduled during a grace shutdown. I'm also fine with adding that flag you mentioned.
There was a problem hiding this comment.
I just checked the behavior on the latest master (without the change on this PR), and I see the second GOAWAY.
Unfortunately the test was not done with ProxyVerifier but with my own tool though, I see this:
$ ./h2seq.js inactive.json | openssl s_client -quiet -connect 127.0.0.1:8443 | ./h2res.js
Connecting to 127.0.0.1
Can't use SSL_get_servername
depth=1 O=foo, L=Denver, ST=Colorado, C=US, CN=ca.example.com
verify error:num=19:self-signed certificate in certificate chain
verify return:1
depth=1 O=foo, L=Denver, ST=Colorado, C=US, CN=ca.example.com
verify return:1
depth=0 C=US, ST=Colorado, O=foo, CN=server.example.com
verify return:1
Seq#0 done
(node:98221) [DEP0005] DeprecationWarning: Buffer() is deprecated due to security and usability issues. Please use the Buffer.alloc(), Buffer.allocUnsafe(), or Buffer.from() methods instead.
(Use `node --trace-deprecation ...` to show where the warning was created)
Seq#1 done
Seq#2 done
Seq#3 done
(node:98219) [DEP0005] DeprecationWarning: Buffer() is deprecated due to security and usability issues. Please use the Buffer.alloc(), Buffer.allocUnsafe(), or Buffer.from() methods instead.
(Use `node --trace-deprecation ...` to show where the warning was created)
[Lenght: 18, Type: SETTINGS(4), Flags: 0, StreamID: 0]
Params:
SETTINGS_ENABLE_PUSH: 0
SETTINGS_MAX_CONCURRENT_STREAMS: 100
SETTINGS_MAX_HEADER_LIST_SIZE: 16383
[Lenght: 0, Type: SETTINGS(4), Flags: 1, StreamID: 0]
Flags: ACK
[Lenght: 131, Type: HEADERS(1), Flags: 4, StreamID: 1]
Flags: END_HEADERS
[Lenght: 9, Type: DATA(0), Flags: 1, StreamID: 1]
Flags: END_STREAM
Data:
Not Foun���
[Lenght: 8, Type: GOAWAY(7), Flags: 0, StreamID: 0]
Last-Stream-ID: 2147483647
Error Code: NO_ERROR
[Lenght: 8, Type: GOAWAY(7), Flags: 0, StreamID: 0]
Last-Stream-ID: 1
Error Code: NO_ERROR
^C
And this is ATS side:
[Jun 25 10:07:11.935] [ET_NET 0] DIAG: <Http2ConnectionState.cc:2384 (send_headers_frame)> (http2_con) [2] [1] Send HEADERS frame
[Jun 25 10:07:11.935] [ET_NET 0] DIAG: <Http2ConnectionState.cc:2461 (send_headers_frame)> (http2_con) [2] [1] Send HEADERS frame flags: 0x4 length: 160
[Jun 25 10:07:11.935] [ET_NET 0] DIAG: <Http2ConnectionState.cc:2165 (cancel_retransmit)> (http2_con) [2] [0] Canceling retransmitting data frames
[Jun 25 10:07:11.935] [ET_NET 0] DIAG: <HttpTunnel.cc:1035 (producer_run)> (http_tunnel) Start read vio 9 bytes
[Jun 25 10:07:11.935] [ET_NET 0] DIAG: <HttpTunnel.cc:1168 (producer_handler)> (http_tunnel) [1] producer_handler [cache read VC_EVENT_READ_COMPLETE/TS_EVENT_VCONN_READ_COMPLETE]
[Jun 25 10:07:11.935] [ET_NET 0] DIAG: <HttpTunnel.cc:1204 (producer_handler)> (http_redirect) [1] enable_redirection: [1 1 0] event: 102, state: 0
[Jun 25 10:07:11.935] [ET_NET 0] DIAG: <HttpSM.cc:3644 (tunnel_handler_cache_read)> (http) [1] [&HttpSM::tunnel_handler_cache_read, VC_EVENT_READ_COMPLETE/TS_EVENT_VCONN_READ_COMPLETE]
[Jun 25 10:07:11.935] [ET_NET 0] DIAG: <Http2Stream.cc:802 (update_write_request)> (http2_stream) [2] [1] update_write_request parse_done=1
[Jun 25 10:07:11.935] [ET_NET 0] DIAG: <Http2ConnectionState.cc:2292 (send_a_data_frame)> (http2_con) [2] [1] End of Data Frame
[Jun 25 10:07:11.935] [ET_NET 0] DIAG: <Http2ConnectionState.cc:2302 (send_a_data_frame)> (http2_con) [2] [1] Send a DATA frame - peer window con: 65526 stream: 65526 payload: 9 flags: 0x1
[Jun 25 10:07:11.935] [ET_NET 0] DIAG: <Http2ConnectionState.cc:2165 (cancel_retransmit)> (http2_con) [2] [0] Canceling retransmitting data frames
[Jun 25 10:07:11.935] [ET_NET 0] DIAG: <Http2ConnectionState.cc:2308 (send_a_data_frame)> (http2_con) [2] [1] END_STREAM
[Jun 25 10:07:11.935] [ET_NET 0] DIAG: <Http2Stream.cc:495 (change_state)> (http2_stream) [2] [1] Http2StreamState::HTTP2_STREAM_STATE_HALF_CLOSED_REMOTE -> Http2StreamState::HTTP2_STREAM_STATE_CLOSED
[Jun 25 10:07:11.935] [ET_NET 0] DIAG: <Http2ConnectionState.cc:2350 (send_data_frames)> (http2_con) [2] [1] Shutdown stream
[Jun 25 10:07:11.935] [ET_NET 0] DIAG: <HttpTunnel.cc:1375 (consumer_handler)> (http_tunnel) [1] consumer_handler [user agent VC_EVENT_WRITE_COMPLETE/TS_EVENT_VCONN_WRITE_COMPLETE]
[Jun 25 10:07:11.935] [ET_NET 0] DIAG: <HttpSM.cc:3412 (tunnel_handler_ua)> (http) [1] [&HttpSM::tunnel_handler_ua, VC_EVENT_WRITE_COMPLETE/TS_EVENT_VCONN_WRITE_COMPLETE]
[Jun 25 10:07:11.935] [ET_NET 0] DIAG: <Http2Stream.cc:554 (do_io_close)> (http2_stream) [2] [1] do_io_close
[Jun 25 10:07:11.935] [ET_NET 0] DIAG: <HttpSM.cc:2608 (main_handler)> (http) [1] HTTP_TUNNEL_EVENT_DONE, 2301
[Jun 25 10:07:11.935] [ET_NET 0] DIAG: <HttpSM.cc:2984 (tunnel_handler)> (http) [1] [&HttpSM::tunnel_handler, HTTP_TUNNEL_EVENT_DONE]
[Jun 25 10:07:11.935] [ET_NET 0] DIAG: <HttpSM.cc:8729 (clear)> (http_redirect) [PostDataBuffers::clear]
[Jun 25 10:07:11.935] [ET_NET 0] DIAG: <HttpSM.cc:7485 (kill_this)> (http_seq) [1] Logging transaction
[Jun 25 10:07:11.935] [ET_NET 0] DIAG: <Http2Stream.cc:83 (~Http2Stream)> (http2_stream) [2] [1] Destroy stream, sent 9 bytes, registered: true
[Jun 25 10:07:11.935] [ET_NET 0] DIAG: <Http2ConnectionState.cc:1993 (delete_stream)> (http2_con) [2] [1] Delete stream
[Jun 25 10:07:11.935] [ET_NET 0] DIAG: <HttpSM.cc:7534 (kill_this)> (http) [1] deallocating sm
[Jun 25 10:07:11.935] [ET_NET 0] DIAG: <SSLNetVConnection.cc:310 (_ssl_read_from_net)> (ssl) amount_to_read=4024
[Jun 25 10:07:11.935] [ET_NET 0] DIAG: <SSLNetVConnection.cc:315 (_ssl_read_from_net)> (ssl) nread=0
[Jun 25 10:07:11.935] [ET_NET 0] DIAG: <SSLNetVConnection.cc:338 (_ssl_read_from_net)> (ssl.error) SSL_ERROR_WOULD_BLOCK(read)
[Jun 25 10:07:11.935] [ET_NET 0] DIAG: <SSLNetVConnection.cc:396 (_ssl_read_from_net)> (ssl) bytes_read == 0
[Jun 25 10:07:11.935] [ET_NET 0] DIAG: <SSLNetVConnection.cc:759 (net_read_io)> (ssl) read finished - would block
[Jun 25 10:07:11.935] [ET_NET 0] DIAG: <Http2ConnectionState.cc:2700 (send_goaway_frame)> (http2_con) [2] Send GOAWAY frame: Error Code: 0, Last Stream Id: 2147483647
[Jun 25 10:07:11.935] [ET_NET 0] DIAG: <Http2ConnectionState.cc:2165 (cancel_retransmit)> (http2_con) [2] [0] Canceling retransmitting data frames
[Jun 25 10:07:11.935] [ET_NET 0] DIAG: <SSLNetVConnection.cc:814 (load_buffer_and_write)> (ssl) towrite=204
[Jun 25 10:07:11.935] [ET_NET 0] DIAG: <SSLNetVConnection.cc:870 (load_buffer_and_write)> (ssl) try_to_write=204 written=204 total_written=204
[Jun 25 10:07:13.932] [ET_NET 0] DIAG: <Http2ConnectionState.cc:2700 (send_goaway_frame)> (http2_con) [2] Send GOAWAY frame: Error Code: 0, Last Stream Id: 1
[Jun 25 10:07:13.932] [ET_NET 0] DIAG: <Http2ConnectionState.cc:2165 (cancel_retransmit)> (http2_con) [2] [0] Canceling retransmitting data frames
[Jun 25 10:07:13.932] [ET_NET 0] DIAG: <Http2CommonSession.cc:153 (set_half_close_local_flag)> (http2_cs) [2] session half-close local
[Jun 25 10:07:13.932] [ET_NET 0] DIAG: <SSLNetVConnection.cc:814 (load_buffer_and_write)> (ssl) towrite=17
[Jun 25 10:07:13.932] [ET_NET 0] DIAG: <SSLNetVConnection.cc:870 (load_buffer_and_write)> (ssl) try_to_write=17 written=17 total_written=17
inactive.json is below, if you want to try it by yourself:
[
{ "action": "send", "stream": 0, "type": "preface"},
{ "action": "send", "stream": 0, "type": "settings", "flags": "", "settings": [] },
{ "action": "send", "stream": 0, "type": "settings", "flags": "ACK", "settings": [] },
{ "action": "send", "stream": 1, "type": "headers", "flags": "END_HEADERS,END_STREAM", "headers": [
[":method", "GET"],
[":scheme", "https"],
[":authority", "localhost"],
[":path", "/httpbin/get"]
]},
{ "action": "pause", "duration": 300000 }
]
There was a problem hiding this comment.
Maybe some other changes in the past few months have fixed this in a different way then, I just checked with Proxy Verifier and it is also getting the second GOAWAY frame now. So it's your call then, do you want to add a flag to make sure the second GOAWAY gets sent during a grace shutdown, or I can remove this part of the change and just add the test to check things are working as intended.
There was a problem hiding this comment.
Okay, nice. Then the current code works as I expect. I would not add the flag, because I think something is wrong if the flag has to work.
Let's just add the test. Thank you for writing the test.
173c9b1 to
dc8fb70
Compare
|
@maskit forgot to mention one thing, when doing a |
|
Closing a connection without sending GOAWAY is not polite, but I don't think it's a violation. Depends on why the connection was closed.
|
46f497d to
a71a773
Compare
include/proxy/http2/Http2Stream.h
Outdated
| void set_expect_send_trailer() override; | ||
| bool expect_receive_trailer() const override; | ||
| void set_expect_receive_trailer() override; | ||
| void set_close_connection(HTTPHdr &hdr) const override; |
There was a problem hiding this comment.
Ah, nvm. I just realized it's override.
There was a problem hiding this comment.
Does this relay Connection header value between a request and a response? I think that should not happen.
Client <-- (H2) -- ATS <-- (H1) -- Origin
When an origin server returns Connection: close, ATS should not close the connection between the client and ATS, right? It's just the origin wants to close the connection between ATS and the origin. The client should be allowed to make another request, which can be served from the cache, on the same connection.
There was a problem hiding this comment.
This sets the Connection: close, but it doesn't decide whether the connection will be closed. HttpTransact::handle_response_keep_alive_headers actually decides whether that close will remain in the response or be removed. But before we never set it in the first place, so all the logic there does nothing for h2 connections.
trafficserver/src/proxy/http/HttpTransact.cc
Line 6865 in 36dac7a
It removes the connection header, and added it back if needed.
There was a problem hiding this comment.
Ok, I think I was confused by this line. It seemed like to modify a client response based on a server response, but it doesn't.
trafficserver/src/proxy/http/HttpTransact.cc
Line 6978 in cbe9169
Http1Transaction and Http2Stream do the same thing in set_close_connection. It's not conditional at all. I assume Http3Transaction should do the same. Now I wonder why we don't modify the heads directly in HttpTransact. What's the reason for letting transactions to set the header? I think we should remove this confusing function.
There was a problem hiding this comment.
This was added in #8178 , maybe at the time we had some issues with our H2 streams and this header was causing trouble?
There was a problem hiding this comment.
So we are going back to the original behavior (do the same thing for H1 and H2). Are you sure you don't have the issue that was resolved by #8178, with this change?
If we don't have to to differentiate the behavior, I prefer the original code that modifies heads directly in HttpTransact.
There was a problem hiding this comment.
Good point, let me dig a bit deeper.
There was a problem hiding this comment.
The intent with #8178 was to not drain the session if the HttpSM logic sets the connection close due to detected transaction failures. Shutting down the connection in HTTP/1 makes sense since the TCP connection may be polluted in the case of bad request bodies. However, this is not an issue with HTTP/2. An error with a request body should not require shutting down the session due to cross-contamination.
Explicitly setting the Connection: close header in the HTTP/2 processing logic when w want to close the session seems reasonable if it makes our client's lives easier. But blindly interpreting Connection: Close passed from the state machine processing will cause HTTP/2 sessions to shut down needlessly in some cases.
c729304 to
e187962
Compare
| SCOPED_MUTEX_LOCK(lock, this->session->get_mutex(), this_ethread()); | ||
| this->session->set_half_close_local_flag(true); | ||
| this->_shutdown_cont(); | ||
| } break; |
There was a problem hiding this comment.
Looks like we no longer need a scope here. Let's remove the braces.
|
@duke8253 What's the status of this PR? Is it ready for review? |
99ff1fc to
64a77df
Compare
|
The AuTest dns_host_down failed: |
|
@maskit It's ready for another review. |
64a77df to
44782b7
Compare
| @@ -0,0 +1,71 @@ | |||
| ''' | |||
| Abort HTTP/2 connection using RST_STREAM frame. | |||
There was a problem hiding this comment.
Please update the comment and the summary (it looks like truncated).
44782b7 to
5784d0d
Compare
|
Cherry-picked to v10.0.x to avoid conflicts with another PR |
(cherry picked from commit 36bbb0e)
This PR addresses a few issues:
- If aHTTP2_SESSION_EVENT_FINIis received after aHTTP2_SESSION_EVENT_SHUTDOWN_INIT, theGOAWAYframe with the correctlast_stream_idisn't sent out.Correctly set and remove theConnection: closeheader for HTTP/2 (this header is only used as a internal flag to indicate a grace shutdown is needed).Initiate a grace shutdown when status is 429.