feat: chunk-index variable but fixed length, chunk-index repair and updated thin chunk format#95
Conversation
…TreeValidationResult, archive invalidates chunk-index on mismatch, parallel flush uses an internal worker const and partial failure keeps snapshot unpublished, and prefix length lives as a const on ChunkIndexService.
…Shared/ChunkIndex coverage acceptance criterion
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughIntroduces explicit chunk-index repair and refactors chunk-index internals (shard cache, reader, write-session), makes blob listing metadata-aware, standardizes thin-chunk metadata (empty body + parent hash in metadata), changes FileTreeService validation to return mismatch info, runs chunk-index flush and filetree sync concurrently in archive tail, adds a repair-index CLI/handler, and updates specs, tests, and docs. ChangesChunk index, storage API, pipeline, CLI, specs
Sequence Diagram(s)sequenceDiagram
participant User
participant CLI as arius repair-index
participant Mediator
participant RepairHandler
participant ChunkIndexService
participant BlobStore
User->>CLI: invoke
CLI->>Mediator: RepairChunkIndexCommand
Mediator->>RepairHandler: Dispatch
RepairHandler->>ChunkIndexService: RepairAsync
ChunkIndexService->>BlobStore: ListAsync(chunks/, includeMetadata:true)
ChunkIndexService->>BlobStore: Upload rebuilt shards / Delete stale
ChunkIndexService-->>RepairHandler: Repair result
RepairHandler-->>CLI: Success/Failure
CLI-->>User: Output
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related issues
Possibly related PRs
✨ Finishing Touches🧪 Generate unit tests (beta)
|
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #95 +/- ##
==========================================
- Coverage 78.52% 77.91% -0.61%
==========================================
Files 78 70 -8
Lines 5300 4900 -400
Branches 731 626 -105
==========================================
- Hits 4162 3818 -344
+ Misses 940 899 -41
+ Partials 198 183 -15
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
…touched shard uploads succeed, preserving current retry semantics on partial failure
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/Arius.Core/Shared/ChunkIndex/ChunkIndexService.cs (1)
239-248:⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy liftLegacy thin chunks without ParentChunkHash metadata will break repair.
CreateThinRepairEntryrequiresParentChunkHashin blob metadata (line 242) and throwsChunkIndexRepairExceptionif missing or invalid. According to the past review comment (marked "✅ Addressed") and the PR description ("updated thin chunk format"), legacy thin chunks stored the parent hash in the blob body, not metadata. Repairing a repository with legacy thin chunks will throw here, leave the repair marker on disk, and block all archive/restore/list operations until the marker is manually removed.Add a fallback that downloads and parses the legacy thin-body format when
ParentChunkHashmetadata is absent:🔧 Proposed fix to support legacy thin chunks
- private static ShardEntry CreateThinRepairEntry(BlobListItem item) + private static async Task<ShardEntry> CreateThinRepairEntryAsync(BlobListItem item, IBlobContainerService blobs, CancellationToken cancellationToken) { var contentHash = ContentHash.Parse(item.Name.Name.ToString()); - if (!item.Metadata!.TryGetValue(BlobMetadataKeys.ParentChunkHash, out var parentChunkHashValue) || !ChunkHash.TryParse(parentChunkHashValue, out var parentChunkHash)) - throw new ChunkIndexRepairException(item.Name, $"missing or invalid {BlobMetadataKeys.ParentChunkHash} metadata"); + + ChunkHash parentChunkHash; + if (item.Metadata!.TryGetValue(BlobMetadataKeys.ParentChunkHash, out var parentChunkHashValue) && ChunkHash.TryParse(parentChunkHashValue, out var parsedHash)) + { + // New format: parent hash in metadata + parentChunkHash = parsedHash; + } + else + { + // Legacy format: parent hash in body (empty body was not used previously) + if (item.ContentLength == 0) + throw new ChunkIndexRepairException(item.Name, $"missing or invalid {BlobMetadataKeys.ParentChunkHash} metadata and empty body (cannot parse legacy format)"); + + await using var stream = await blobs.DownloadAsync(item.Name, cancellationToken); + using var reader = new StreamReader(stream, System.Text.Encoding.UTF8); + var parentHashValue = await reader.ReadToEndAsync(cancellationToken); + if (!ChunkHash.TryParse(parentHashValue.Trim(), out var legacyHash)) + throw new ChunkIndexRepairException(item.Name, $"missing or invalid {BlobMetadataKeys.ParentChunkHash} metadata and failed to parse legacy thin-body format"); + + parentChunkHash = legacyHash; + } var originalSize = ReadRequiredLongMetadata(item, BlobMetadataKeys.OriginalSize); var compressedSize = ReadRequiredLongMetadata(item, BlobMetadataKeys.CompressedSize); return new ShardEntry(contentHash, parentChunkHash, originalSize, compressedSize); }Update the caller at line 226 and the return type at line 217 to await the async call.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/Arius.Core/Shared/ChunkIndex/ChunkIndexService.cs` around lines 239 - 248, CreateThinRepairEntry currently throws if BlobMetadataKeys.ParentChunkHash is missing; change it to an async CreateThinRepairEntry that first tries metadata then falls back to downloading and parsing the legacy thin-body to extract the parent chunk hash when metadata is absent or invalid, then return the same ShardEntry using ContentHash.Parse(item.Name.Name.ToString()), ReadRequiredLongMetadata(item, BlobMetadataKeys.OriginalSize) and ReadRequiredLongMetadata(item, BlobMetadataKeys.CompressedSize); if the legacy body cannot be read or parsed still throw ChunkIndexRepairException with context. Also update the caller that invokes CreateThinRepairEntry to await the new async method and adjust its return type accordingly so the async flow compiles.src/Arius.Core/Shared/ChunkIndex/ShardSerializer.cs (1)
10-10: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick winMake
ShardSerializerinternal to align with architecture boundary.
ShardSerializeris a chunk-index implementation detail and is only used by other internal chunk-index components (e.g.,ChunkIndexShardCache) and tests (which haveInternalsVisibleToaccess). Per the architecture requirement in the spec, extracted chunk-index components should remain internal behind theChunkIndexServicefacade. Change the class declaration tointernal static class ShardSerializer. As per coding guidelines, make non-test classes internal and only make them public when they must be consumed by another non-test assembly.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/Arius.Core/Shared/ChunkIndex/ShardSerializer.cs` at line 10, Change the public visibility of the ShardSerializer class to internal since it is an implementation detail of the chunk-index; update the class declaration for ShardSerializer from "public static class ShardSerializer" to "internal static class ShardSerializer" so it remains accessible to internal consumers like ChunkIndexShardCache and tests (which use InternalsVisibleTo) but is not exposed publicly beyond the assembly.
♻️ Duplicate comments (1)
src/Arius.Core.Tests/Shared/ChunkIndex/ChunkIndexServiceRepairTests.cs (1)
82-86:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winVerify that rebuilt shards were actually persisted, not just cached in memory.
The test currently only checks
LookupAsyncon the sameChunkIndexServiceinstance that performed the repair. To definitively prove the rebuilt shard was persisted to blob storage (L2/L3) and not just held in memory (L1), create a freshChunkIndexServiceinstance after repair and verify the lookup succeeds on that new instance.🧪 Proposed fix to add persistence verification
var result = await index.RepairAsync(); result.RebuiltShardCount.ShouldBe(1); - (await index.LookupAsync(rebuiltHash)).ShouldBe(new ShardEntry(rebuiltHash, ChunkHash.Parse(rebuiltHash), 100, 3)); - (await index.LookupAsync(staleHash)).ShouldBeNull(); + + // Verify rebuilt shard was persisted by loading fresh instance + index.Dispose(); + using var freshIndex = new ChunkIndexService(blobs, s_encryption, repositoryKey, repositoryKey); + (await freshIndex.LookupAsync(rebuiltHash)).ShouldBe(new ShardEntry(rebuiltHash, ChunkHash.Parse(rebuiltHash), 100, 3)); + (await freshIndex.LookupAsync(staleHash)).ShouldBeNull();As per coding guidelines,
**/*.Tests/**: Test projects. Focus on test coverage gaps and assertion quality rather than style.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/Arius.Core.Tests/Shared/ChunkIndex/ChunkIndexServiceRepairTests.cs` around lines 82 - 86, The test currently verifies rebuilt shards only against the same ChunkIndexService instance that performed RepairAsync, so update the test to assert persistence by instantiating a fresh ChunkIndexService after calling index.RepairAsync(); then call LookupAsync(rebuiltHash) and LookupAsync(staleHash) on the new service and assert the rebuilt shard returns the expected ShardEntry (new ShardEntry(rebuiltHash, ChunkHash.Parse(rebuiltHash), 100, 3)) and the staleHash returns null; ensure you dispose/close the original index if required before creating the new instance so the lookup truly hits persisted storage.
🧹 Nitpick comments (1)
src/Arius.Core.Tests/Shared/ChunkIndex/ChunkIndexServiceLookupTests.cs (1)
216-255: 💤 Low valueConsider moving
ThrowOnMetadataBlobContainerServiceinto a localFakes/subfolder.This scenario-specific test double is declared inline in the test file. The repository convention places such doubles in a
Fakes/subfolder beside the tests that use them (as done for the other blob-container fakes in this PR), which improves discoverability and reuse.As per coding guidelines: "Put scenario-specific test doubles in a local Fakes/ subfolder beside the tests that use them."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/Arius.Core.Tests/Shared/ChunkIndex/ChunkIndexServiceLookupTests.cs` around lines 216 - 255, The inline test double ThrowOnMetadataBlobContainerService should be moved into a local Fakes/ subfolder beside these tests; create a new file named ThrowOnMetadataBlobContainerService.cs under the test's Fakes/ directory, move the class implementation there, adjust the namespace to match the existing tests' namespace (or add an appropriate using) and ensure the test references compile (update accessibility if needed); no behavior changes—just relocate the class to follow the repository convention for scenario-specific fakes.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/Arius.Core/Shared/ChunkIndex/ChunkIndexWriteSession.cs`:
- Around line 13-19: AddEntry has a TOCTOU race with FlushAsync: it checks
Volatile.Read(ref _flushInProgress) then writes into the shared dictionary
_sessionEntries so FlushAsync may snapshot and clear entries while AddEntry
writes after the snapshot, causing silent loss; fix by making _sessionEntries
non-readonly and changing FlushAsync to atomically swap the current dictionary
with a new empty one (e.g., Interlocked.Exchange on the _sessionEntries
reference), then operate on the snapshot for clearing/persisting; keep the
Volatile.Read check in AddEntry and continue to write into the current
_sessionEntries reference so any concurrent flush will operate on the swapped
snapshot, eliminating the TOCTOU window (refer to AddEntry, FlushAsync,
_sessionEntries, and _flushInProgress).
---
Outside diff comments:
In `@src/Arius.Core/Shared/ChunkIndex/ChunkIndexService.cs`:
- Around line 239-248: CreateThinRepairEntry currently throws if
BlobMetadataKeys.ParentChunkHash is missing; change it to an async
CreateThinRepairEntry that first tries metadata then falls back to downloading
and parsing the legacy thin-body to extract the parent chunk hash when metadata
is absent or invalid, then return the same ShardEntry using
ContentHash.Parse(item.Name.Name.ToString()), ReadRequiredLongMetadata(item,
BlobMetadataKeys.OriginalSize) and ReadRequiredLongMetadata(item,
BlobMetadataKeys.CompressedSize); if the legacy body cannot be read or parsed
still throw ChunkIndexRepairException with context. Also update the caller that
invokes CreateThinRepairEntry to await the new async method and adjust its
return type accordingly so the async flow compiles.
In `@src/Arius.Core/Shared/ChunkIndex/ShardSerializer.cs`:
- Line 10: Change the public visibility of the ShardSerializer class to internal
since it is an implementation detail of the chunk-index; update the class
declaration for ShardSerializer from "public static class ShardSerializer" to
"internal static class ShardSerializer" so it remains accessible to internal
consumers like ChunkIndexShardCache and tests (which use InternalsVisibleTo) but
is not exposed publicly beyond the assembly.
---
Duplicate comments:
In `@src/Arius.Core.Tests/Shared/ChunkIndex/ChunkIndexServiceRepairTests.cs`:
- Around line 82-86: The test currently verifies rebuilt shards only against the
same ChunkIndexService instance that performed RepairAsync, so update the test
to assert persistence by instantiating a fresh ChunkIndexService after calling
index.RepairAsync(); then call LookupAsync(rebuiltHash) and
LookupAsync(staleHash) on the new service and assert the rebuilt shard returns
the expected ShardEntry (new ShardEntry(rebuiltHash,
ChunkHash.Parse(rebuiltHash), 100, 3)) and the staleHash returns null; ensure
you dispose/close the original index if required before creating the new
instance so the lookup truly hits persisted storage.
---
Nitpick comments:
In `@src/Arius.Core.Tests/Shared/ChunkIndex/ChunkIndexServiceLookupTests.cs`:
- Around line 216-255: The inline test double
ThrowOnMetadataBlobContainerService should be moved into a local Fakes/
subfolder beside these tests; create a new file named
ThrowOnMetadataBlobContainerService.cs under the test's Fakes/ directory, move
the class implementation there, adjust the namespace to match the existing
tests' namespace (or add an appropriate using) and ensure the test references
compile (update accessibility if needed); no behavior changes—just relocate the
class to follow the repository convention for scenario-specific fakes.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: b06e78e8-8aff-40e9-805c-81dff31dc094
📒 Files selected for processing (53)
.agents/skills/grill-me/SKILL.md.agents/skills/grill-with-openspec/SKILL.mddocs/cache.mdopenspec/changes/archive/2026-06-02-split-chunk-index-responsibilities/.openspec.yamlopenspec/changes/archive/2026-06-02-split-chunk-index-responsibilities/design.mdopenspec/changes/archive/2026-06-02-split-chunk-index-responsibilities/proposal.mdopenspec/changes/archive/2026-06-02-split-chunk-index-responsibilities/specs/architecture-tests/spec.mdopenspec/changes/archive/2026-06-02-split-chunk-index-responsibilities/specs/chunk-index-service/spec.mdopenspec/changes/archive/2026-06-02-split-chunk-index-responsibilities/tasks.mdopenspec/specs/architecture-tests/spec.mdopenspec/specs/chunk-index-service/spec.mdsrc/Arius.Architecture.Tests/DependencyTests.cssrc/Arius.AzureBlob/AzureBlobContainerService.cssrc/Arius.Cli.Tests/MediatorEventRoutingIntegrationTests.cssrc/Arius.Core.Tests/Fakes/FakeMetadataOnlyBlobContainerService.cssrc/Arius.Core.Tests/Fakes/FakeRecordingBlobContainerService.cssrc/Arius.Core.Tests/Fakes/FakeSeededBlobContainerService.cssrc/Arius.Core.Tests/Fakes/ThrowOnCreateBlobContainerService.cssrc/Arius.Core.Tests/Features/ArchiveCommand/ArchiveRecoveryTests.cssrc/Arius.Core.Tests/Features/ChunkHydrationStatusQuery/ResolveFileHydrationStatusesHandlerTests.cssrc/Arius.Core.Tests/Features/ListQuery/ListQueryHandlerTests.cssrc/Arius.Core.Tests/Shared/ChunkIndex/ChunkIndexReaderTests.cssrc/Arius.Core.Tests/Shared/ChunkIndex/ChunkIndexServiceLookupTests.cssrc/Arius.Core.Tests/Shared/ChunkIndex/ChunkIndexServiceRepairTests.cssrc/Arius.Core.Tests/Shared/ChunkIndex/ChunkIndexShardCacheTests.cssrc/Arius.Core.Tests/Shared/ChunkIndex/ChunkIndexWriteSessionTests.cssrc/Arius.Core.Tests/Shared/ChunkIndex/ShardSerializerLocalTests.cssrc/Arius.Core.Tests/Shared/ChunkIndex/ShardSerializerTests.cssrc/Arius.Core.Tests/Shared/ChunkIndex/ShardTests.cssrc/Arius.Core.Tests/Shared/ChunkStorage/Fakes/BlobAlreadyExistsOnSetMetadataOnceBlobContainerService.cssrc/Arius.Core.Tests/Shared/ChunkStorage/Fakes/BlockingDeleteBlobContainerService.cssrc/Arius.Core.Tests/Shared/ChunkStorage/Fakes/ContentTypeCapturingBlobContainerService.cssrc/Arius.Core.Tests/Shared/FileTree/Fakes/BlockingFileTreeUploadBlobContainerService.cssrc/Arius.Core.Tests/Shared/FileTree/Fakes/FaultingAndBlockingFileTreeUploadBlobContainerService.cssrc/Arius.Core.Tests/Shared/FileTree/Fakes/SlowDownloadBlobContainerService.cssrc/Arius.Core.Tests/Shared/FileTree/Fakes/ThrowingDownloadBlobContainerService.cssrc/Arius.Core.Tests/Shared/FileTree/Fakes/UnsortedSnapshotBlobContainerService.cssrc/Arius.Core.Tests/Shared/Storage/FakeInMemoryBlobContainerServiceTests.cssrc/Arius.Core/Features/ArchiveCommand/ArchiveCommandHandler.cssrc/Arius.Core/ServiceCollectionExtensions.cssrc/Arius.Core/Shared/ChunkIndex/ChunkIndexReader.cssrc/Arius.Core/Shared/ChunkIndex/ChunkIndexService.cssrc/Arius.Core/Shared/ChunkIndex/ChunkIndexShardCache.cssrc/Arius.Core/Shared/ChunkIndex/ChunkIndexWriteSession.cssrc/Arius.Core/Shared/ChunkIndex/CustomExceptions.cssrc/Arius.Core/Shared/ChunkIndex/Shard.cssrc/Arius.Core/Shared/ChunkIndex/ShardSerializer.cssrc/Arius.Core/Shared/Storage/IBlobContainerService.cssrc/Arius.Explorer/Infrastructure/RepositorySession.cssrc/Arius.Integration.Tests/Pipeline/Fakes/FaultingBlobService.cssrc/Arius.Integration.Tests/Pipeline/Fakes/RehydrationSimulatingBlobService.cssrc/Arius.Integration.Tests/Storage/BlobStorageServiceTests.cssrc/Arius.Tests.Shared/Storage/FakeInMemoryBlobContainerService.cs
💤 Files with no reviewable changes (1)
- .agents/skills/grill-me/SKILL.md
✅ Files skipped from review due to trivial changes (5)
- openspec/changes/archive/2026-06-02-split-chunk-index-responsibilities/proposal.md
- src/Arius.Core.Tests/Features/ChunkHydrationStatusQuery/ResolveFileHydrationStatusesHandlerTests.cs
- openspec/changes/archive/2026-06-02-split-chunk-index-responsibilities/.openspec.yaml
- openspec/changes/archive/2026-06-02-split-chunk-index-responsibilities/specs/architecture-tests/spec.md
- src/Arius.Core.Tests/Shared/ChunkIndex/ShardSerializerLocalTests.cs
🚧 Files skipped from review as they are similar to previous changes (9)
- src/Arius.Core.Tests/Features/ListQuery/ListQueryHandlerTests.cs
- src/Arius.Explorer/Infrastructure/RepositorySession.cs
- src/Arius.Core.Tests/Fakes/ThrowOnCreateBlobContainerService.cs
- src/Arius.Core.Tests/Shared/ChunkStorage/Fakes/ContentTypeCapturingBlobContainerService.cs
- src/Arius.Core.Tests/Shared/FileTree/Fakes/BlockingFileTreeUploadBlobContainerService.cs
- src/Arius.Core.Tests/Shared/FileTree/Fakes/FaultingAndBlockingFileTreeUploadBlobContainerService.cs
- src/Arius.Core.Tests/Fakes/FakeSeededBlobContainerService.cs
- src/Arius.Core.Tests/Shared/ChunkStorage/Fakes/BlobAlreadyExistsOnSetMetadataOnceBlobContainerService.cs
- src/Arius.Integration.Tests/Pipeline/Fakes/RehydrationSimulatingBlobService.cs
| public void AddEntry(ShardEntry entry) | ||
| { | ||
| if (Volatile.Read(ref _flushInProgress) != 0) | ||
| throw new InvalidOperationException("Cannot record chunk-index entries while a flush is in progress."); | ||
|
|
||
| _sessionEntries[entry.ContentHash] = entry; | ||
| } |
There was a problem hiding this comment.
TOCTOU race can cause silent data loss when AddEntry/FlushAsync run concurrently.
AddEntry checks _flushInProgress (line 15) then writes to _sessionEntries (line 18). If FlushAsync sets the flag and snapshots entries between those two operations, the new entry will be written after the snapshot but cleared at line 44, causing silent data loss instead of throwing the expected exception.
The archive handler upholds the "drain workers before flush" contract (ArchiveCommandHandler lines 506–523), so this race won't occur in current usage. However, the fail-fast check exists to catch future contract violations, and the TOCTOU window makes it unreliable.
Fix by atomically swapping _sessionEntries in FlushAsync:
🔒 Proposed fix to eliminate TOCTOU race
Remove readonly from the field declaration and swap the dictionary atomically in FlushAsync:
- private readonly ConcurrentDictionary<ContentHash, ShardEntry> _sessionEntries = [];
+ private ConcurrentDictionary<ContentHash, ShardEntry> _sessionEntries = new(); public async Task FlushAsync(ChunkIndexShardCache shardCache, CancellationToken cancellationToken = default)
{
if (Interlocked.Exchange(ref _flushInProgress, 1) != 0)
throw new InvalidOperationException("Chunk-index flush is already in progress.");
try
{
- if (_sessionEntries.IsEmpty)
+ var entriesToFlush = Interlocked.Exchange(ref _sessionEntries, new ConcurrentDictionary<ContentHash, ShardEntry>());
+
+ if (entriesToFlush.IsEmpty)
return;
- var byPrefix = _sessionEntries.Values
+ var byPrefix = entriesToFlush.Values
.GroupBy(e => Shard.PrefixOf(e.ContentHash))
.Select(g => new KeyValuePair<PathSegment, ShardEntry[]>(g.Key, [.. g]))
.ToArray();
await Parallel.ForEachAsync(
byPrefix,
new ParallelOptions { MaxDegreeOfParallelism = ChunkIndexService.FlushWorkers, CancellationToken = cancellationToken },
async (group, ct) =>
{
await shardCache.UpdateShardAsync(group.Key, group.Value, ct);
});
- _sessionEntries.Clear();
+ // No need to clear - already swapped to new dictionary
}
finally
{
Volatile.Write(ref _flushInProgress, 0);
}
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/Arius.Core/Shared/ChunkIndex/ChunkIndexWriteSession.cs` around lines 13 -
19, AddEntry has a TOCTOU race with FlushAsync: it checks Volatile.Read(ref
_flushInProgress) then writes into the shared dictionary _sessionEntries so
FlushAsync may snapshot and clear entries while AddEntry writes after the
snapshot, causing silent loss; fix by making _sessionEntries non-readonly and
changing FlushAsync to atomically swap the current dictionary with a new empty
one (e.g., Interlocked.Exchange on the _sessionEntries reference), then operate
on the snapshot for clearing/persisting; keep the Volatile.Read check in
AddEntry and continue to write into the current _sessionEntries reference so any
concurrent flush will operate on the swapped snapshot, eliminating the TOCTOU
window (refer to AddEntry, FlushAsync, _sessionEntries, and _flushInProgress).
Summary by CodeRabbit
New Features
Documentation