Skip to content

Conversation

@cmcfarlen
Copy link
Contributor

This autest demonstrates an issue with cache alternates and updating from a response with a content-length to one with content-length 0.

When an alternate is replaced, and previously had a content-length, there is a bug where then old content is preserved and only this headers are changed (and the new 0-length content is not "written"). When this half-updated asset is served from cache, the content length is mismatched, delivered via chunked encoding and the old contents are shipped to the client.

@cmcfarlen cmcfarlen requested a review from bryancall as a code owner April 4, 2022 19:38
@elsloo
Copy link
Contributor

elsloo commented Apr 4, 2022

When digging into this issue, it appears to stem from setting a CacheVC instance variable called update_len to the value of the existing cached object during the call to Cache::open_write() which occurs prior to the request being sent upstream on cache miss.

Later in the transaction, when a response with a Content-Length of zero is received and being handled, the length of the resulting object is set incorrectly to the prior length using update_len. The first request to go through this transition will be served as expected, while any subsequent request will hang, waiting on the response because of ATS unexpectedly switching to Transfer-Encoding: chunked. The length mismatch within ATS arises from the newly cached headers that contain Content-Length: 0 and the length of the fragment on disk which was set incorrectly using update_len. Clients requesting objects in this state will eventually abort.

Because we are already reading the Content-Length header in CacheVC::set_http_info() and specifically handling the zero length case, we could easily handle the issue there. However, there could be other issues that arise from a mismatch in this scenario where the Content-Length is non-zero but greater than or less than the value set on update_len so we should consider the spirit of the intent of the alternate update process that sets update_len during Cache::open_write() while mitigating the underlying cause.

bneradt
bneradt previously approved these changes Apr 5, 2022
Copy link
Contributor

@bneradt bneradt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test looks good. Thanks for adding a regression test for this.

@ywkaras
Copy link
Contributor

ywkaras commented Apr 12, 2022

Looks like this is causing the alternate-caching Au test to fail.

Running Test all_headers:... Passed
Running Test alternate-caching:F Failed
Running Test background_fill:... Passed

@cmcfarlen
Copy link
Contributor Author

Yes, this PR is just an autest that demonstrates the bug with alternate cache updates so that people could reason about it. A fix is forthcoming and then this autest will pass.

@bneradt
Copy link
Contributor

bneradt commented Apr 20, 2022

Looks good.

Copy link
Contributor

@bneradt bneradt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@elsloo
Copy link
Contributor

elsloo commented May 11, 2022

Merging this PR on behalf of @cmcfarlen due to the approval from @bneradt and my own knowledge of how this works.

@elsloo elsloo merged commit 4c5b182 into apache:master May 11, 2022
@zwoop
Copy link
Contributor

zwoop commented May 16, 2022

Cherry-picked to v9.2.x

@zwoop zwoop modified the milestones: 10.0.0, 9.2.0 May 16, 2022
zwoop pushed a commit that referenced this pull request May 16, 2022
* Add autest to cover updates to cache with alternates

* remove trailing line(autopep8)

* add comments describing the requests

* reduce delay and max-age on test

* set cache object size to zero when incoming content length is zero

* add demystifying comment

Co-authored-by: Chris McFarlen <cmcfarlen@apple.com>
(cherry picked from commit 4c5b182)
@cmcfarlen cmcfarlen deleted the alt-cache-update branch July 5, 2022 15:15
@masaori335
Copy link
Contributor

We faced another symptom made by this bug. If the half-updated object is big enough (> 1GB) and it's on the RAM Cache, you might see the below crash made by a huge loop on cache read.

[ 00 ] libpthread-2.17.so  waitpid                                                           
[ 01 ] traffic_server      crash_logger_invoke(int, siginfo_t*, void*)                       ( Crash.cc:168 )
[ 02 ] libpthread-2.17.so                                                                    
[ 03 ] libtscore.so.9.0.3  freelist_new(_InkFreeList*)                                       ( ink_queue.cc:239 )
[ 04 ] libtscore.so.9.0.3  ink_freelist_new                                                  ( ink_queue.cc:187 )
[ 05 ] traffic_server      IOBufferBlock::alloc(long)                                        ( Allocator.h:131 )
[ 06 ] traffic_server      MIOBuffer::write(void const*, long)                               ( P_IOBuffer.h:839 )
[ 07 ] traffic_server      ChunkedHandler::generate_chunked_content()                        ( HttpTunnel.cc:371 )
[ 08 ] traffic_server      HttpTunnel::producer_handler_dechunked(int, HttpTunnelProducer*)  ( HttpTunnel.cc:1019 )
[ 09 ] traffic_server      HttpTunnel::producer_handler(int, HttpTunnelProducer*)            ( HttpTunnel.cc:1116 )
[ 10 ] traffic_server      HttpTunnel::main_handler(int, void*)                              ( HttpTunnel.cc:1640 )
[ 11 ] traffic_server      CacheVC::calluser(int)                                            ( eventsystem/I_Continuation.h:219 )
[ 12 ] traffic_server      CacheVC::openReadMain(int, Event*)                                ( CacheRead.cc:782 )
[ 13 ] traffic_server      CacheVC::openReadReadDone(int, Event*)                            ( CacheRead.cc:639 )
[ 14 ] traffic_server      CacheVC::openReadMain(int, Event*)                                ( eventsystem/I_Continuation.h:219 )
[ 15 ] traffic_server      CacheVC::openReadReadDone(int, Event*)                            ( CacheRead.cc:639 )
[ 16 ] traffic_server      CacheVC::openReadMain(int, Event*)                                ( eventsystem/I_Continuation.h:219 )
[ 17 ] traffic_server      CacheVC::openReadReadDone(int, Event*)                            ( CacheRead.cc:639 )
[ 18 ] traffic_server      CacheVC::openReadMain(int, Event*)                                ( eventsystem/I_Continuation.h:219 )
[ 19 ] traffic_server      CacheVC::openReadReadDone(int, Event*)                            ( CacheRead.cc:639 )
[ 20 ] traffic_server      CacheVC::openReadMain(int, Event*)                                ( eventsystem/I_Continuation.h:219 )
[ 21 ] traffic_server      CacheVC::openReadReadDone(int, Event*)                            ( CacheRead.cc:639 )
[ 22 ] traffic_server      CacheVC::openReadMain(int, Event*)                                ( eventsystem/I_Continuation.h:219 )
[ 23 ] traffic_server      CacheVC::openReadReadDone(int, Event*)                            ( CacheRead.cc:639 )
[ 24 ] traffic_server      CacheVC::openReadMain(int, Event*)                                ( eventsystem/I_Continuation.h:219 )
[ 25 ] traffic_server      CacheVC::openReadReadDone(int, Event*)                            ( CacheRead.cc:639 )
[ 26 ] traffic_server      CacheVC::openReadMain(int, Event*)                                ( eventsystem/I_Continuation.h:219 )
[ 27 ] traffic_server      CacheVC::openReadReadDone(int, Event*)                            ( CacheRead.cc:639 )
[ 28 ] traffic_server      CacheVC::openReadMain(int, Event*)                                ( eventsystem/I_Continuation.h:219 )
[ 29 ] traffic_server      CacheVC::openReadReadDone(int, Event*)                            ( CacheRead.cc:639 )
[ 30 ] traffic_server      CacheVC::openReadMain(int, Event*)                                ( eventsystem/I_Continuation.h:219 )
[ 31 ] traffic_server      CacheVC::openReadReadDone(int, Event*)                            ( CacheRead.cc:639 )
[ 32 ] traffic_server      CacheVC::openReadMain(int, Event*)                                ( eventsystem/I_Continuation.h:219 )
[ 33 ] traffic_server      CacheVC::openReadReadDone(int, Event*)                            ( CacheRead.cc:639 )
[ 34 ] traffic_server      CacheVC::openReadMain(int, Event*)                                ( eventsystem/I_Continuation.h:219 )
[ 35 ] traffic_server      CacheVC::openReadReadDone(int, Event*)                            ( CacheRead.cc:639 )
[ 36 ] traffic_server      CacheVC::openReadMain(int, Event*)                                ( eventsystem/I_Continuation.h:219 )
[ 37 ] traffic_server      CacheVC::openReadReadDone(int, Event*)                            ( CacheRead.cc:639 )
[ 38 ] traffic_server      CacheVC::openReadMain(int, Event*)                                ( eventsystem/I_Continuation.h:219 )
[ 39 ] traffic_server      CacheVC::openReadReadDone(int, Event*)                            ( CacheRead.cc:639 )
[ 40 ] traffic_server      CacheVC::openReadMain(int, Event*)                                ( eventsystem/I_Continuation.h:219 )
[ 41 ] traffic_server      CacheVC::openReadReadDone(int, Event*)                            ( CacheRead.cc:639 )
[ 42 ] traffic_server      CacheVC::openReadMain(int, Event*)                                ( eventsystem/I_Continuation.h:219 )
[ 43 ] traffic_server      CacheVC::openReadReadDone(int, Event*)                            ( CacheRead.cc:639 )
[ 44 ] traffic_server      CacheVC::openReadMain(int, Event*)                                ( eventsystem/I_Continuation.h:219 )
[ 45 ] traffic_server      CacheVC::openReadReadDone(int, Event*)                            ( CacheRead.cc:639 )
[ 46 ] traffic_server      CacheVC::openReadMain(int, Event*)                                ( eventsystem/I_Continuation.h:219 )
[ 47 ] traffic_server      CacheVC::openReadReadDone(int, Event*)                            ( CacheRead.cc:639 )
[ 48 ] traffic_server      CacheVC::openReadMain(int, Event*)                                ( eventsystem/I_Continuation.h:219 )
[ 49 ] traffic_server      CacheVC::openReadReadDone(int, Event*)                            ( CacheRead.cc:639 )
[ 50 ] traffic_server      CacheVC::openReadMain(int, Event*)                                ( eventsystem/I_Continuation.h:219 )
[ 51 ] traffic_server      CacheVC::openReadReadDone(int, Event*)                            ( CacheRead.cc:639 )
[ 52 ] traffic_server      CacheVC::openReadMain(int, Event*)                                ( eventsystem/I_Continuation.h:219 )
[ 53 ] traffic_server      CacheVC::openReadReadDone(int, Event*)                            ( CacheRead.cc:639 )
[ 54 ] traffic_server      CacheVC::openReadMain(int, Event*)                                ( eventsystem/I_Continuation.h:219 )
[ 55 ] traffic_server      CacheVC::openReadReadDone(int, Event*)                            ( CacheRead.cc:639 )
[ 56 ] traffic_server      CacheVC::openReadMain(int, Event*)                                ( eventsystem/I_Continuation.h:219 )
[ 57 ] traffic_server      CacheVC::openReadReadDone(int, Event*)                            ( CacheRead.cc:639 )
[ 58 ] traffic_server      CacheVC::openReadMain(int, Event*)                                ( eventsystem/I_Continuation.h:219 )
[ 59 ] traffic_server      CacheVC::openReadReadDone(int, Event*)                            ( CacheRead.cc:639 )
[ 60 ] traffic_server      CacheVC::openReadMain(int, Event*)                                ( eventsystem/I_Continuation.h:219 )
[ 61 ] traffic_server      CacheVC::openReadReadDone(int, Event*)                            ( CacheRead.cc:639 )
[ 62 ] traffic_server      CacheVC::openReadMain(int, Event*)                                ( eventsystem/I_Continuation.h:219 )
[ 63 ] traffic_server      CacheVC::openReadReadDone(int, Event*)                            ( CacheRead.cc:639 )
...                                                           

( the number of call stack is over 8000 )

moonchen pushed a commit to moonchen/trafficserver that referenced this pull request Jul 26, 2022
…pache#356)

* Add autest to cover updates to cache with alternates

* remove trailing line(autopep8)

* add comments describing the requests

* reduce delay and max-age on test

* set cache object size to zero when incoming content length is zero

* add demystifying comment

Co-authored-by: Chris McFarlen <cmcfarlen@apple.com>
(cherry picked from commit 4c5b182)

Co-authored-by: Chris McFarlen <chris@mcfarlen.us>
masaori335 pushed a commit to masaori335/trafficserver that referenced this pull request Feb 21, 2023
* asf/9.2.x:
  Updated ChangeLog
  Fix parent_select optional scheme (apache#8831)
  Add `#pragma once` for PendingAction.h (apache#8846)
  Promote class PendingAction from HttpSM.h for use in other classes. (apache#8423)
  Fixes leak in SNIAction name globbing (apache#8827)
  Handle opentelemetry-cpp v1.3.0 upgrade for otel_tracer plugin (apache#8834)
  Fix overflow conditions in prefetch plugin (apache#8660)
  Remove incorrect comment from base64 functions (apache#8835)
  Add autest to cover updates to cache with alternates (apache#8779)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants