Skip to content

Replace CAS retry loop with Interlocked.Add in MemoryCache size tracking#124430

Closed
lufen wants to merge 1 commit into
dotnet:mainfrom
lufen:fix/memorycache-size-contention
Closed

Replace CAS retry loop with Interlocked.Add in MemoryCache size tracking#124430
lufen wants to merge 1 commit into
dotnet:mainfrom
lufen:fix/memorycache-size-contention

Conversation

@lufen

@lufen lufen commented Feb 14, 2026

Copy link
Copy Markdown
Contributor

Fix #111959 — UpdateCacheSizeExceedsCapacity uses a CompareExchange loop with 100 retries to atomically update _cacheSize. Under high thread contention this causes CPU spikes, retry storms, and silent entry drops when retries are exhausted.

Replace with Interlocked.Add (wait-free) + post-check rollback:

  • Optimistically add delta via Interlocked.Add
  • Check capacity; if exceeded, roll back with Interlocked.Add(-delta)

Benchmark results (isolated contention microbenchmark, 50k ops/thread):

| Threads | BEFORE (CAS loop) | AFTER (Interlocked.Add) | Speedup | |---------|-------------------|------------------------|---------|
| 1 | 447 us | 312 us | 1.4x |
| 2 | 3,549 us | 1,729 us | 2.1x |
| 4 | 13,208 us | 4,148 us | 3.2x |
| 8 | 26,885 us | 8,119 us | 3.3x |
| 16 | 105,903 us | 15,365 us | 6.9x |

Zero allocation difference. CAS scales ~O(N^2), Add scales ~O(N).
Benchmark PR: dotnet/performance#5120

Performance trade-off assessment:

The only trade-off is a nanosecond-scale transient over-count window in _cacheSize between Add and rollback. This is acceptable because:

  • _cacheSize is documented as "eventually consistent" (line 91-92)
  • The other two write sites (SetEntry rollback line 182, RemoveEntry line 790) already use plain Interlocked.Add without CAS protection
  • The old CAS code has the same race: two threads can both pass the capacity check and together exceed the limit
  • The over-count is bounded to entry.Size per thread (~1-2 CPU instructions of exposure)

Additional benefits beyond throughput:

  • Fixes correctness bug: old code silently drops entries after 100 CAS failures even when capacity is available
  • 30% faster single-threaded (less branching, no loop)
  • Simpler code: 14 lines vs 20 lines

Fix dotnet#111959 — UpdateCacheSizeExceedsCapacity uses a
CompareExchange loop with 100 retries to atomically update _cacheSize.
Under high thread contention this causes CPU spikes, retry storms, and
silent entry drops when retries are exhausted.

Replace with Interlocked.Add (wait-free) + post-check rollback:
- Optimistically add delta via Interlocked.Add
- Check capacity; if exceeded, roll back with Interlocked.Add(-delta)

Benchmark results (isolated contention microbenchmark, 50k ops/thread):

| Threads | BEFORE (CAS loop) | AFTER (Interlocked.Add) | Speedup |
|---------|-------------------|------------------------|---------|
| 1       | 447 us            | 312 us                 | 1.4x    |
| 2       | 3,549 us          | 1,729 us               | 2.1x    |
| 4       | 13,208 us         | 4,148 us               | 3.2x    |
| 8       | 26,885 us         | 8,119 us               | 3.3x    |
| 16      | 105,903 us        | 15,365 us              | 6.9x    |

Zero allocation difference. CAS scales ~O(N^2), Add scales ~O(N).

Performance trade-off assessment:

The only trade-off is a nanosecond-scale transient over-count window
in _cacheSize between Add and rollback. This is acceptable because:
- _cacheSize is documented as "eventually consistent" (line 91-92)
- The other two write sites (SetEntry rollback line 182, RemoveEntry
  line 790) already use plain Interlocked.Add without CAS protection
- The old CAS code has the same race: two threads can both pass the
  capacity check and together exceed the limit
- The over-count is bounded to entry.Size per thread (~1-2 CPU
  instructions of exposure)

Additional benefits beyond throughput:
- Fixes correctness bug: old code silently drops entries after 100
  CAS failures even when capacity is available
- 30% faster single-threaded (less branching, no loop)
- Simpler code: 14 lines vs 20 lines

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings February 14, 2026 20:59
@github-actions github-actions Bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Feb 14, 2026
@dotnet-policy-service dotnet-policy-service Bot added the community-contribution Indicates that the PR has been added by a community member label Feb 14, 2026

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to improve MemoryCache performance under high thread contention by replacing a CompareExchange retry loop with an optimistic Interlocked.Add approach for cache size tracking. The change is motivated by CPU spikes observed when many threads compete to update cache size, with benchmark results showing 2-7x speedup under contention.

Changes:

  • Replaced CompareExchange loop (with 100-retry limit) with optimistic Interlocked.Add + post-check rollback pattern
  • Simplified the logic from ~20 lines to ~14 lines
  • Added inline documentation referencing the motivating GitHub issue

Comment on lines +504 to +510
long newSize = Interlocked.Add(ref coherentState._cacheSize, delta);

long original = Interlocked.CompareExchange(ref coherentState._cacheSize, newSize, sizeRead);
if (sizeRead == original)
{
return false;
}
sizeRead = original;
if ((ulong)newSize > (ulong)sizeLimit)
{
// Exceeded capacity — roll back the optimistic addition
Interlocked.Add(ref coherentState._cacheSize, -delta);
return true;

Copilot AI Feb 14, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Critical bug: When arithmetic overflow occurs in the optimistic Interlocked.Add, the rollback cannot restore the original _cacheSize value.

Example: If _cacheSize is long.MaxValue - 10 and delta is 100, the Add overflows to long.MinValue + 89. The rollback with Interlocked.Add(ref _cacheSize, -100) makes it long.MinValue - 11, not the original long.MaxValue - 10. This corrupts the cache size tracking permanently.

The old CompareExchange loop avoided this by computing newSize locally first and only updating _cacheSize if no overflow would occur. To maintain correctness, you need to check for potential overflow BEFORE calling Interlocked.Add, similar to the old approach but without the retry loop.

Copilot uses AI. Check for mistakes.
Comment on lines +506 to +510
if ((ulong)newSize > (ulong)sizeLimit)
{
// Exceeded capacity — roll back the optimistic addition
Interlocked.Add(ref coherentState._cacheSize, -delta);
return true;

Copilot AI Feb 14, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The optimistic approach introduces false rejections under concurrent load. When multiple threads add simultaneously, later additions see the temporarily inflated size and may incorrectly reject entries that should fit.

Example: With _cacheSize=50 and limit=100, Thread A adds 60 (making size 110), then Thread B adds 30 (seeing size 140). Thread B checks 140 > 100 and rolls back, but the entry (size 30) should have been accepted since 50+30=80 < 100.

The old CompareExchange loop avoided this by checking capacity BEFORE committing the update, ensuring entries are rejected based on the actual cache state, not temporarily inflated values. This change could increase cache miss rates under high concurrency, undermining the performance improvement goal.

Copilot uses AI. Check for mistakes.
@lufen lufen closed this Feb 14, 2026
@github-actions github-actions Bot locked and limited conversation to collaborators Mar 17, 2026
cincuranet pushed a commit that referenced this pull request Jun 16, 2026
…size-limited cache (#129215)

## Summary

Fixes #129186. When `MemoryCacheOptions.SizeLimit` is set, the internal
`_cacheSize` counter can drift **negative** under concurrent
`Set`/`Get`/`Remove` on string keys. Once negative, the capacity check
`(ulong)... > (ulong)sizeLimit` is permanently true, so **every
subsequent `Set` is silently rejected** (no exception) and nothing is
retained for the lifetime of the cache — `CurrentEntryCount` stuck at 0,
every `Get` a miss. This is permanent, silent data loss.

## Regression source

Introduced by #103931 (merged for 9.0, "Subtract prior entry size when
adding entry to cache"), which fixed #36039 — allowing an entry at the
size limit to be replaced by a same-or-smaller one. That PR made two
changes to the replace path:

1. It made the capacity **check** prior-aware (`newSize -=
priorEntry.Size`) so a same-or-smaller replace fits at the limit. This
is correct and desirable.
2. It also **moved the `_cacheSize` decrement of the prior entry** out
of the post-swap `if (entryAdded)` block and into the speculative
capacity computation, *before* the `TryUpdate` swap.

Change (2) is the bug. Pre-#103931 (i.e., 8.x), the prior entry's size
was decremented **only after** a successful `TryUpdate`, atomically tied
to the swap, so a concurrent removal of the prior entry could not
double-count it.

## Root cause

In `SetEntry`, the prior entry's size is subtracted **speculatively**
inside `UpdateCacheSizeExceedsCapacity`, before the entry is actually
swapped in by `TryUpdate`. If another thread runs
`RemoveEntry(priorEntry)` (expiration scan, explicit `Remove`, or
eviction) in the window between that subtraction and the `TryUpdate`,
`priorEntry.Size` is subtracted **twice** — once speculatively and once
by the real removal in `CoherentState.RemoveEntry`. Each leaked
decrement drives `_cacheSize` negative, and the `(ulong)` cast then
latches the cache permanently.

## Fix

Keep #103931's prior-aware capacity **check** (so the #36039 behavior
and its test are preserved), but commit only `+entry.Size` to
`_cacheSize` there. Decrement `priorEntry.Size` exactly once, atomically
with the `TryUpdate` that performs the swap — restoring the pre-#103931
(8.x) ordering. On the failure path, roll back only `entry.Size`.

This intentionally keeps the existing `CompareExchange` retry loop and
the "check capacity *before* committing" ordering, so it avoids both
problems that closed #124430 (overflow-rollback corruption, and false
rejections from a transiently-inflated size). The only transient window
now over-counts by `priorEntry.Size` (positive), which is
self-correcting and can never latch the cache.

## Validation

Reproduced and validated against the genuine `MemoryCache` source
compiled standalone (concurrent Set/Get/Remove storm on a small string
keyspace under a generous `SizeLimit`, with a fresh-key retention
probe):

- **Before**: `_cacheSize` goes negative, 0/512 fresh entries retained.
- **After**: no drift (size stays >= 0), 512/512 retained across
repeated runs; `SizeLimit` eviction still enforced (a tiny limit still
bounds count/size); and the #36039 replace-at-limit behavior
(`ReplaceOldEntryWithSameSizeOrLessNewEntryAtSizeLimitCapacity`, new
value size 6/5/2 at limit 6) still passes.

Bisection (same harness): **8.0.x not affected; 9.0.x and 10.0.x
affected**, on both `net462` and `net10.0` builds — consistent with
#103931 shipping in 9.0.

A single-threaded pure-replace microbenchmark (the only path this change
touches) shows the difference vs the current code is within run-to-run
noise (~100 ns/op either way) — per-replace cost is dominated by entry
allocation and dictionary operations; the fix only restores one
`Interlocked.Add` on the replace path. Happy to add a `Set`-throughput
benchmark to dotnet/performance if desired (per the discussion on
#111959).

## Tests

Adds `MemoryCacheConcurrentSizeTrackingTests` with:
- a concurrency regression test asserting the tracked size never goes
negative and the cache keeps retaining entries after a Set/Get/Remove
storm, and
- a guard that `SizeLimit` eviction is still enforced.

Related: #36039 (behavior preserved), #103931 (regression source),
#111959, #124430.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

community-contribution Indicates that the PR has been added by a community member needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners

Projects

None yet

Development

Successfully merging this pull request may close these issues.

High Contention in MemoryCache _cacheSize Updates Causing Performance Degradation

2 participants