Skip to content

MemoryCache with SizeLimit permanently stops caching (silent Set drops) after _cacheSize drifts negative under concurrency #129186

@sablancoleis

Description

@sablancoleis

Description

When MemoryCacheOptions.SizeLimit is set, the internal running size counter (CoherentState._cacheSize) is maintained non-atomically with respect to the entry collection: increments happen inside the Interlocked.CompareExchange retry loop in UpdateCacheSizeExceedsCapacity, while decrements happen via Interlocked.Add(ref _cacheSize, -entry.Size) in Remove, CoherentState.RemoveEntry (eviction/expiration), and the failed-add rollback in SetEntry.

Under sustained concurrent Set/Get/Remove on a small set of string keys with short expirations, _cacheSize can drift negative. Once it is negative, the capacity check rejects every subsequent insert, because the comparison casts to unsigned:

// UpdateCacheSizeExceedsCapacity
long newSize = sizeRead + entry.Size;
if ((ulong)newSize > (ulong)sizeLimit)   // negative newSize -> huge ulong -> always true
{
    return true;                          // entry silently rejected (EvictionReason.Capacity)
}

A negative newSize becomes a very large unsigned value, so the check is always true and SetEntry takes the over-capacity branch. SetEntry returns void and does not throw, so callers see success while nothing is retained. Because nothing in normal operation resets _cacheSize to a non-negative value, the cache stays in this state permanentlyCurrentEntryCount stuck at 0, every Get a miss — until the process is restarted.

This is a correctness issue (permanent, silent loss of all cached data), and is closely related to the size-accounting problems already tracked in #88733 (faulty/negative CurrentEstimatedSize) and #111959 / #124430 (contended size tracking, "silent entry drops when retries are exhausted"). The novel point here is that the negative-size/cache-empties symptom is reachable purely from concurrent Set/Remove/expiration races — without setting Size after insertion (the trigger in #88733).

Reproduction Steps

Self-contained console app. Reproduces within a few seconds on a multi-core machine; if it does not latch on the first run, re-run or increase Threads/StormFor.

using System.Diagnostics;
using Microsoft.Extensions.Caching.Memory;

var cache = new MemoryCache(new MemoryCacheOptions
{
    SizeLimit = 200L * 1024 * 1024, // 200 MB — far larger than the working set below
    TrackStatistics = true,
});

const int Threads = 64;
const int Keys = 16;            // tiny keyspace -> heavy contention on the same entries
const int ValueSize = 4096;     // working set ~= Keys * ValueSize ~= 64 KB (<< SizeLimit)
var payload = new byte[ValueSize];
var expiry = TimeSpan.FromMilliseconds(15);   // short -> expiry-driven removals race with sets
var stormFor = TimeSpan.FromSeconds(8);

var workers = new Task[Threads];
for (int t = 0; t < Threads; t++)
{
    workers[t] = Task.Run(() =>
    {
        var rnd = new Random(Environment.CurrentManagedThreadId);
        var sw = Stopwatch.StartNew();
        while (sw.Elapsed < stormFor)
        {
            string key = "k" + rnd.Next(Keys);
            int roll = rnd.Next(100);
            if (roll < 65)
            {
                using var e = cache.CreateEntry(key);
                e.AbsoluteExpirationRelativeToNow = expiry;
                e.Size = payload.Length;
                e.Value = payload;
            }
            else if (roll < 85)
            {
                cache.TryGetValue(key, out _);
            }
            else
            {
                cache.Remove(key);
            }
        }
    });
}
await Task.WhenAll(workers);

// Drain the working set and let any remaining entries expire.
for (int k = 0; k < Keys; k++) cache.Remove("k" + k);
await Task.Delay(200);

var afterDrain = cache.GetCurrentStatistics()!;
Console.WriteLine($"After drain : Count={afterDrain.CurrentEntryCount}, EstimatedSize={afterDrain.CurrentEstimatedSize}");

// Retention probe: cache is now logically empty. Set fresh, NON-expiring keys and read each back.
int retained = 0;
for (int i = 0; i < 1000; i++)
{
    string key = "fresh-" + i;
    using (var e = cache.CreateEntry(key)) { e.Size = payload.Length; e.Value = payload; }
    if (cache.TryGetValue(key, out _)) retained++;
}
var final = cache.GetCurrentStatistics()!;
Console.WriteLine($"Retained    : {retained} / 1000");
Console.WriteLine($"Final       : Count={final.CurrentEntryCount}, EstimatedSize={final.CurrentEstimatedSize}");

Expected behavior

After drain : Count=0, EstimatedSize=0      (or any value >= 0)
Retained    : 1000 / 1000
Final       : Count=1000, EstimatedSize=4096000

Set never silently drops entries; CurrentEstimatedSize never goes negative.

Actual behavior (once the race triggers)

After drain : Count=0, EstimatedSize=-40960     <-- negative
Retained    : 0 / 1000                          <-- every fresh Set silently dropped
Final       : Count=0, EstimatedSize=-40960     <-- stuck; never recovers

No exception is thrown by any Set. The cache is permanently unusable for the lifetime of the MemoryCache instance.

Regression?

Yes -- introduced in 9.0 by #103931 ("Subtract prior entry size when adding entry to cache", fixing #36039). Bisected empirically: 8.0.x is not affected; 9.0.x and 10.0.x are. #103931 moved the prior entry's _cacheSize decrement out of the post-swap if (entryAdded) block into the speculative capacity computation (before the TryUpdate swap), which is what races with a concurrent removal and double-counts.

Known Workarounds

Configuration

  • Microsoft.Extensions.Caching.Memory 10.0.7
  • Observed on a .NET Framework 4.7.2 host (the net462 build of the package), x64.
  • The size-tracking code path is TFM-independent, so the same race is expected on .NET 8/9/10.

Other information

Metadata

Metadata

Assignees

Type

No fields configured for Bug.

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions