Summary
We have observed a lock contention issue with the Stripe mutex. This is an umbrella issue to track related changes.
Problem
`Cache::open_read()` has severe lock contention: every read operation (including cache lookups) acquires an exclusive lock on `stripe->mutex`, serializing all cache operations and limiting throughput.

From `src/iocore/cache/Cache.cc` (line 344 at 7e366fa):

```cpp
CACHE_TRY_LOCK(lock, stripe->mutex, mutex->thread_holding);
```
Difficulties
I attempted to use a reader-writer lock instead of a mutex lock, and some proof-of-concept tests showed significant performance improvements. However, I found that we cannot simply replace this mutex lock with a reader-writer lock or a lock-free data structure. The main reason is that StripeSM, as a Continuation, requires the mutex lock when called from the event system.
#12601 is another recent attempt by @bryancall.
Event Handlers
- Event handlers of `StripeSM` (from `src/iocore/cache/StripeSM.h`, lines 118 to 122 at 7e366fa):

```cpp
int handle_dir_clear(int event, void *data);
int handle_dir_read(int event, void *data);
int handle_recover_from_data(int event, void *data);
int handle_recover_write_dir(int event, void *data);
int handle_header_read(int event, void *data);
```
Dir operations
Some `Dir` functions appear to be read-only, but they actually perform writes under some conditions. For example, `Directory::probe()` deletes invalid entries (from `src/iocore/cache/CacheDir.cc`, lines 528 to 534 at 7e366fa):

```cpp
} else { // delete the invalid entry
  ts::Metrics::Gauge::decrement(cache_rsb.direntries_used);
  ts::Metrics::Gauge::decrement(stripe->cache_vol->vol_rsb.direntries_used);
  ATS_PROBE7(cache_dir_remove_invalid, stripe->fd, s, dir_to_offset(e, seg), dir_offset(e), dir_approx_size(e),
             key->slice64(0), key->slice64(1));
  e = dir_delete_entry(e, p, s, this);
  continue;
```
Proposed Solution
Implement a two-tier locking architecture by decoupling `StripeSM` and `Stripe`:

- Separate `StripeSM` (Continuation) and `Stripe` (shared data)

  `StripeSM` (a Continuation) contains the event handlers, while `Stripe` contains the shared data. Half of this change has already been completed by #11565 and related PRs, but we still need to make the separation between event handling and shared data access more explicit.
- Add a reader-writer lock to `Stripe`

  Access to the shared data requires a reader-writer lock to allow concurrent reads. Alternatively, making `Stripe` a lock-free data structure (using RCU or hazard pointers) is another option.
- Allocate `StripeSM` per transaction

  Each cache operation gets a lightweight `StripeSM` instance with its own mutex for event handling; it acquires the reader-writer lock on the shared `Stripe` for data access.
Architecture Diagram
