feat: introduce TreeCacheService for centralized filetree caching#49
feat: introduce TreeCacheService for centralized filetree caching#49woutervanranst wants to merge 7 commits into
Conversation
…h-and-tree-upload tree-cache-service: proposal, design, delta specs, and tasks for a shared TreeCacheService with disk-materialized prefetch and snapshot-based validation via timestamp-named marker files. parallel-flush-and-tree-upload: proposal only (parked).
Centralizes all filetree blob reads/writes through a new TreeCacheService with snapshot-based cache validation. On snapshot mismatch, remote blob listing is materialized as empty marker files on disk (no in-memory HashSet). All handlers (Archive, Restore, List) and TreeBuilder now use the service. Registered as singleton in AddArius.
…sk cache SnapshotService now mirrors TreeCacheService: CreateAsync writes plain JSON to disk first, then uploads to Azure. ResolveAsync checks disk before downloading from Azure. Removes WriteSnapshotMarkerAsync from TreeCacheService (snapshot dir ownership moves to SnapshotService). All call sites updated to pass accountName/containerName to SnapshotService constructor.
📝 WalkthroughWalkthroughAdds a new TreeCacheService (disk + remote cache) and integrates it across Archive, List, and Restore flows; refactors TreeBuilder and SnapshotService to use the cache and write local snapshot markers; removes several .serena project/memory files; adds OpenSpec design/proposals (tree-cache and parallel flush/tree upload) and comprehensive unit/integration tests for the cache behavior. Changes
Sequence Diagram(s)sequenceDiagram
participant CLI as Client/CLI
participant Archive as ArchiveCommandHandler
participant TreeB as TreeBuilder
participant TreeC as TreeCacheService
participant Index as ChunkIndexService
participant Snapshot as SnapshotService
participant Blob as BlobStorage
CLI->>Archive: Start archive
Archive->>TreeC: ValidateAsync()
TreeC-->>Archive: Validation result (fast/slow)
Archive->>TreeB: Build tree (calls EnsureUploadedAsync)
TreeB->>TreeC: ExistsInRemote(hash)
alt not exists
TreeB->>TreeC: WriteAsync(hash, treeBlob)
TreeC->>Blob: Upload filetrees/{hash}
Blob-->>TreeC: Upload ack or BlobAlreadyExists
TreeC->>TreeB: Write-through disk cache completed
end
Archive->>Index: Flush chunk-index (may be parallel per spec)
Index->>Blob: Upload shard blobs
Archive->>Snapshot: CreateAsync(...)
Snapshot->>Snapshot: Write marker JSON to disk
Snapshot->>Blob: Upload snapshot
Snapshot-->>Archive: Snapshot created
Archive-->>CLI: Archive completed
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #49 +/- ##
==========================================
- Coverage 71.04% 68.48% -2.57%
==========================================
Files 59 60 +1
Lines 4687 4794 +107
Branches 624 636 +12
==========================================
- Hits 3330 3283 -47
- Misses 1225 1372 +147
- Partials 132 139 +7
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 13
🧹 Nitpick comments (4)
src/Arius.Core.Tests/FileTree/TreeCacheServiceTests.cs (2)
379-393: Inconsistent cleanup: uses inlineDirectory.Deleteinstead of shared helper.Test
ExistsInRemote_BeforeValidateAsync_ThrowsInvalidOperationExceptionuses inline cleanup (lines 390-391) instead ofCleanupAsync. Consider using the shared helper for consistency.♻️ Suggested fix
finally { - if (Directory.Exists(cacheDir)) Directory.Delete(cacheDir, recursive: true); - if (Directory.Exists(snapshotsDir)) Directory.Delete(snapshotsDir, recursive: true); + await CleanupAsync(cacheDir, snapshotsDir); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/Arius.Core.Tests/FileTree/TreeCacheServiceTests.cs` around lines 379 - 393, The test ExistsInRemote_BeforeValidateAsync_ThrowsInvalidOperationException currently performs inline cleanup with Directory.Delete; replace those inline deletions with the shared helper by invoking CleanupAsync with the cacheDir and snapshotsDir returned from MakeService (if the test is synchronous call CleanupAsync(...).GetAwaiter().GetResult() or make the test async and await CleanupAsync(...)) so cleanup is consistent with other tests.
51-56: Unnecessaryawait Task.CompletedTaskin cleanup helper.
CleanupAsyncperforms only synchronousDirectory.Deleteoperations, then awaitsTask.CompletedTaskwhich adds no value. Either make it synchronous or useTask.Runfor the directory operations if async behavior is truly needed.♻️ Suggested simplification
- private static async Task CleanupAsync(string cacheDir, string snapshotsDir) + private static void Cleanup(string cacheDir, string snapshotsDir) { if (Directory.Exists(cacheDir)) Directory.Delete(cacheDir, recursive: true); if (Directory.Exists(snapshotsDir)) Directory.Delete(snapshotsDir, recursive: true); - await Task.CompletedTask; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/Arius.Core.Tests/FileTree/TreeCacheServiceTests.cs` around lines 51 - 56, The helper CleanupAsync currently marks the method async and awaits Task.CompletedTask after performing synchronous Directory.Delete calls; remove the pointless async/await by turning CleanupAsync into a non-async method that returns Task.CompletedTask (i.e., drop the async keyword and the await) or, if you need true async IO, run the deletes inside Task.Run and await that task; reference the CleanupAsync method and the Directory.Delete calls when making the change.openspec/changes/tree-cache-service/proposal.md (1)
39-39: Minor: "CLI interface" is redundant."CLI" already stands for "Command Line Interface", so "CLI interface" is tautological. Consider changing to just "CLI" or "command-line interface".
📝 Suggested fix
-- **No breaking changes**: All changes are internal; the CLI interface, blob storage layout, and wire formats are unchanged. +- **No breaking changes**: All changes are internal; the CLI, blob storage layout, and wire formats are unchanged.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@openspec/changes/tree-cache-service/proposal.md` at line 39, The sentence containing the redundant phrase "CLI interface" should be updated to remove the tautology: replace "CLI interface" with either "CLI" or "command-line interface" in the line that currently reads "- **No breaking changes**: All changes are internal; the CLI interface, blob storage layout, and wire formats are unchanged." so it becomes "- **No breaking changes**: All changes are internal; the CLI, blob storage layout, and wire formats are unchanged." (or use "command-line interface" if you prefer the spelled-out form).openspec/changes/tree-cache-service/specs/tree-cache-service/spec.md (1)
90-103: Spec-implementation mismatch: marker file content.The spec states "write an empty marker file" at line 91-95, but
SnapshotService.CreateAsyncwrites the full plain-JSON manifest to disk (not an empty file). The implementation behavior is actually better for debugging and cache validation, but the spec should be updated to reflect "full plain-JSON snapshot manifest" instead of "empty marker file" for consistency.,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@openspec/changes/tree-cache-service/specs/tree-cache-service/spec.md` around lines 90 - 103, Update the spec to reflect that SnapshotService.CreateAsync writes the full plain-JSON snapshot manifest (not an empty file) into ~/.arius/{repo}/snapshots/ using SnapshotService.TimestampFormat; change the requirement text and each Scenario outcome to state that a plain-JSON snapshot manifest file named by the snapshot timestamp (e.g., 2026-03-22T150000.000Z) is created, ensure the directory creation clause remains, and keep references to SnapshotService.CreateAsync and SnapshotService.TimestampFormat for clarity.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@openspec/changes/tree-cache-service/design.md`:
- Around line 39-50: Decision 2 incorrectly describes the on-disk snapshot
artifacts as empty timestamp-named marker files; update the text to reflect that
SnapshotService writes plain JSON snapshot manifests into
~/.arius/{repo}/snapshots/ and that local discovery/validation reads those JSON
files (not empty markers) and still uses SnapshotService.ListBlobNamesAsync and
SnapshotService.TimestampFormat for naming/sorting consistency; also apply the
same correction to the repeated text at the other occurrence (lines referenced
as 89-90) so the document matches the current implementation.
In `@openspec/changes/tree-cache-service/specs/archive-pipeline/spec.md`:
- Line 52: Remove the separate snapshot marker-file requirement from the spec:
delete the sentence that mandates writing an empty marker file named by the
snapshot timestamp to `~/.arius/{repo}/snapshots/` and any duplicate mentions on
lines 62-64, and instead reference that SnapshotService's write-through disk
caching handles cache validation; update the text around the snapshot storage
location `snapshots/<UTC-timestamp>` and the statement "Snapshots SHALL NEVER be
deleted" so the spec no longer requires or describes the marker-file mechanism
and points implementers to use SnapshotService for fast-path cache validation.
In `@openspec/changes/tree-cache-service/tasks.md`:
- Line 9: Remove the stale checklist entries that add WriteSnapshotMarkerAsync
responsibilities: delete the checklist line referencing "Implement
`WriteSnapshotMarkerAsync(string timestamp, CancellationToken)`" (and any
duplicate at the other location) from tasks.md so marker-file creation is not
reintroduced; rely on the refactored SnapshotService write-through behavior
instead and ensure no new task items or references to the
WriteSnapshotMarkerAsync method remain in the file.
- Line 45: Update the checklist wording to reflect actual behavior: change the
phrase "chunk-index L2 directory deleted" to "chunk-index L2 files deleted
(files under the L2 directory removed)". Also ensure the test description for
ValidateAsync mentions verifying ListAsync("filetrees/") was called and that
empty marker files were created on disk and that chunk-index L2 files (not the
directory itself) were removed; reference the ValidateAsync logic in
TreeCacheService (and the ListAsync("filetrees/") call) so it's clear the test
should assert file removals rather than directory deletion.
In `@openspec/specs/tree-cache-service/spec.md`:
- Line 21: Update the spec language for TreeCacheService.ReadAsync to stop
calling the on-disk cache "plaintext JSON" and instead describe it as the
existing line-based text entry format used by EnsureUploadedAsync; state that on
cache miss the downloaded blob SHALL be serialized using TreeBlobSerializer into
the same line-based text format and written to disk before returning, and on
cache hit the file SHALL be read and deserialized via TreeBlobSerializer without
any remote call.
- Around line 31-33: The TreeCacheService.ReadAsync implementation can have
concurrent writers on cache-miss and must be changed to serialize or make writes
atomic per hash: introduce a per-hash lock/mutex map (keyed by the cache hash)
or a concurrent lock striping inside TreeCacheService, acquire the lock when you
detect a cache miss before writing, write to a temporary file and atomically
rename into the cache path (or hold the lock for the entire write+rename), and
finally release and clean up the per-hash lock; update ReadAsync and any helper
methods used for cache population to use this locking/atomic-temp-file pattern
so multiple concurrent callers for the same hash cannot corrupt the disk cache.
In `@src/Arius.Core.Tests/ListQuery/ListQueryHandlerTests.cs`:
- Around line 527-528: Make the ListQueryHandler tests hermetic by not
hardcoding "account"/"container" in MakeHandler; instead generate a fresh
account/container pair per test (e.g. Guid-based names) and pass the same pair
into the ChunkIndexService, TreeCacheService, and the ListQueryHandler so they
share the same repo id and use isolated disk paths; update MakeHandler to accept
account and container parameters (or create a small factory that returns
(handler, account, container)) and ensure any SnapshotService/ChunkIndexService
construction in the tests is created with the same generated account/container
values.
In `@src/Arius.Core/Features/ArchiveCommand/ArchiveCommandHandler.cs`:
- Around line 604-605: Move the tree-cache validation so it runs before any
index lookups to avoid reading stale L2 shard data: call
_treeCache.ValidateAsync(...) at the start of the ArchiveCommandHandler method
(before the first _index.LookupAsync call) so that stale chunk-index entries are
invalidated prior to deduplication; update any control flow that assumes
validated cache state and verify that dedup logic around _index.LookupAsync and
subsequent pointer-only/missing-chunk handling still operates correctly and
remains thread-safe after this reorder.
In `@src/Arius.Core/ServiceCollectionExtensions.cs`:
- Around line 58-64: The TreeCacheService is registered as a singleton while it
contains mutable validation state (_validated) that is shared across concurrent
archive/list/restore handlers, causing race conditions and stale state leaks;
fix by removing mutable instance state from TreeCacheService or change its DI
lifetime so each command gets an isolated instance (e.g., stop using
services.AddSingleton for TreeCacheService and register
per-request/per-operation via services.AddScoped or services.AddTransient), and
update any consumers/constructors that assume singleton behavior; ensure the
validation flag (_validated) is either made thread-safe (immutable or protected
by synchronization) or moved to a per-operation context so archive/list/restore
handlers do not share it.
In `@src/Arius.Core/Shared/FileTree/TreeCacheService.cs`:
- Around line 33-36: The _validated boolean on TreeCacheService is accessed
without synchronization; change ValidateAsync to atomically set the flag using
Interlocked.CompareExchange (or Interlocked.Exchange) so only the first caller
runs the slow validation path and subsequent callers skip it, and remove the
non-atomic assignment at the end of ValidateAsync; in ExistsInRemote read the
flag with a thread-safe read (Volatile.Read or an Interlocked comparison) to
ensure callers see the updated state. Reference the private field _validated and
the methods ValidateAsync and ExistsInRemote when making the change.
- Around line 155-215: The ValidateAsync method can race when multiple callers
run concurrently because _validated may be set before the slow-path work
completes; introduce a SemaphoreSlim (e.g., _validationSemaphore = new
SemaphoreSlim(1,1)) and wrap the body of ValidateAsync with await
_validationSemaphore.WaitAsync(cancellationToken) and a try/finally that
releases the semaphore, so only one caller performs the slow-path
materialization (the blob enumeration and File.WriteAllBytesAsync calls and
chunk-index cleanup) while other callers await and then see _validated==true;
ensure _validated is only set to true inside the try block after all work
(including listing _blobs.ListAsync, creating marker files in _diskCacheDir, and
deleting files in _chunkIndexL2Dir) completes, and handle cancellation via
cancellationToken.ThrowIfCancellationRequested() as already done.
- Around line 170-174: In TreeCacheService where you iterate
_blobs.ListAsync(BlobPaths.Snapshots, cancellationToken) and set latestRemote =
Path.GetFileName(name), don't assume ListAsync yields sorted results; either
remove the "last one is newest" comment or explicitly collect the snapshot
names, parse their ISO8601 timestamps (same approach SnapshotService uses) and
sort by that timestamp before picking the newest; update the logic around
latestRemote and the loop to use the sorted result set (referencing
TreeCacheService, _blobs.ListAsync, BlobPaths.Snapshots, latestRemote and mimic
SnapshotService's parsing/sorting behavior).
In `@src/Arius.Core/Shared/Snapshot/SnapshotService.cs`:
- Around line 266-272: Concurrent calls to WriteToDiskAsync (from
CreateAsync/ResolveAsync) can cause corruption when both call
File.WriteAllBytesAsync to the same path; change WriteToDiskAsync to write to a
unique temp file (e.g., path + ".tmp."+Guid), flush/close it, then atomically
replace the target using File.Replace (or File.Move with overwrite semantics) to
move the temp into path; reference SnapshotManifest, WriteToDiskAsync,
_diskCacheDir, TimestampFormat and ensure you still pass the cancellationToken
to the write and clean up the temp file on error.
---
Nitpick comments:
In `@openspec/changes/tree-cache-service/proposal.md`:
- Line 39: The sentence containing the redundant phrase "CLI interface" should
be updated to remove the tautology: replace "CLI interface" with either "CLI" or
"command-line interface" in the line that currently reads "- **No breaking
changes**: All changes are internal; the CLI interface, blob storage layout, and
wire formats are unchanged." so it becomes "- **No breaking changes**: All
changes are internal; the CLI, blob storage layout, and wire formats are
unchanged." (or use "command-line interface" if you prefer the spelled-out
form).
In `@openspec/changes/tree-cache-service/specs/tree-cache-service/spec.md`:
- Around line 90-103: Update the spec to reflect that
SnapshotService.CreateAsync writes the full plain-JSON snapshot manifest (not an
empty file) into ~/.arius/{repo}/snapshots/ using
SnapshotService.TimestampFormat; change the requirement text and each Scenario
outcome to state that a plain-JSON snapshot manifest file named by the snapshot
timestamp (e.g., 2026-03-22T150000.000Z) is created, ensure the directory
creation clause remains, and keep references to SnapshotService.CreateAsync and
SnapshotService.TimestampFormat for clarity.
In `@src/Arius.Core.Tests/FileTree/TreeCacheServiceTests.cs`:
- Around line 379-393: The test
ExistsInRemote_BeforeValidateAsync_ThrowsInvalidOperationException currently
performs inline cleanup with Directory.Delete; replace those inline deletions
with the shared helper by invoking CleanupAsync with the cacheDir and
snapshotsDir returned from MakeService (if the test is synchronous call
CleanupAsync(...).GetAwaiter().GetResult() or make the test async and await
CleanupAsync(...)) so cleanup is consistent with other tests.
- Around line 51-56: The helper CleanupAsync currently marks the method async
and awaits Task.CompletedTask after performing synchronous Directory.Delete
calls; remove the pointless async/await by turning CleanupAsync into a non-async
method that returns Task.CompletedTask (i.e., drop the async keyword and the
await) or, if you need true async IO, run the deletes inside Task.Run and await
that task; reference the CleanupAsync method and the Directory.Delete calls when
making the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 316d4c4a-3665-4664-966c-30a629d1bac8
📒 Files selected for processing (32)
.serena/.gitignore.serena/memories/project_overview.md.serena/memories/style_conventions.md.serena/memories/suggested_commands.md.serena/project.ymlopenspec/changes/parallel-flush-and-tree-upload/.openspec.yamlopenspec/changes/parallel-flush-and-tree-upload/proposal.mdopenspec/changes/tree-cache-service/.openspec.yamlopenspec/changes/tree-cache-service/design.mdopenspec/changes/tree-cache-service/proposal.mdopenspec/changes/tree-cache-service/specs/archive-pipeline/spec.mdopenspec/changes/tree-cache-service/specs/list-query/spec.mdopenspec/changes/tree-cache-service/specs/restore-pipeline/spec.mdopenspec/changes/tree-cache-service/specs/tree-cache-service/spec.mdopenspec/changes/tree-cache-service/tasks.mdopenspec/specs/archive-pipeline/spec.mdopenspec/specs/list-query/spec.mdopenspec/specs/restore-pipeline/spec.mdopenspec/specs/tree-cache-service/spec.mdsrc/Arius.Core.Tests/Archive/ArchiveRecoveryTests.cssrc/Arius.Core.Tests/FileTree/TreeCacheServiceTests.cssrc/Arius.Core.Tests/FileTree/TreeServiceTests.cssrc/Arius.Core.Tests/ListQuery/ListQueryHandlerTests.cssrc/Arius.Core/Features/ArchiveCommand/ArchiveCommandHandler.cssrc/Arius.Core/Features/ListQuery/ListQueryHandler.cssrc/Arius.Core/Features/RestoreCommand/RestoreCommandHandler.cssrc/Arius.Core/ServiceCollectionExtensions.cssrc/Arius.Core/Shared/FileTree/TreeBlobSerializer.cssrc/Arius.Core/Shared/FileTree/TreeCacheService.cssrc/Arius.Core/Shared/FileTree/TreeService.cssrc/Arius.Core/Shared/Snapshot/SnapshotService.cssrc/Arius.Integration.Tests/Snapshot/SnapshotServiceIntegrationTests.cs
💤 Files with no reviewable changes (5)
- .serena/.gitignore
- .serena/memories/suggested_commands.md
- .serena/project.yml
- .serena/memories/style_conventions.md
- .serena/memories/project_overview.md
| ### Decision 2: Snapshot validation via timestamp-named marker files in `~/.arius/{repo}/snapshots/` | ||
|
|
||
| **Choice**: After each archive, write an empty marker file named by the snapshot timestamp to `~/.arius/{repo}/snapshots/` (e.g., `2026-03-22T150000.000Z`). To determine the latest local snapshot, enumerate the directory and sort by parsed timestamp — the same approach `SnapshotService.ListBlobNamesAsync` uses for remote snapshots. | ||
|
|
||
| **Rationale**: The `snapshots/` directory mirrors the blob container's virtual directory layout, keeping the local cache structure intuitive (`ls ~/.arius/{repo}/` shows `chunk-index/`, `filetrees/`, `snapshots/` — same as the container). Marker files named by timestamp give a natural local history of all snapshots this machine has written, and inferring the latest requires only enumerating and sorting — trivially cheap for a directory with typically < 100 entries. The `SnapshotService.TimestampFormat` (`"yyyy-MM-ddTHHmmss.fffZ"`) sorts lexicographically, so a simple string sort suffices. | ||
|
|
||
| On archive start: | ||
| 1. Enumerate `snapshots/` directory → sort → latest local timestamp (or null if empty). | ||
| 2. Call `ListAsync("snapshots/")` to get the latest remote snapshot name → remote timestamp. (This is a lightweight call — typically < 100 snapshots.) | ||
| 3. Compare: if local == remote, **fast path**. If mismatch (or no local markers), **slow path**. | ||
|
|
||
| **Alternatives considered**: Using a single `.last` file containing the latest timestamp. Rejected — timestamp-named marker files mirror the remote layout more faithfully and provide a natural local history without additional complexity (the timestamp format already sorts lexicographically). |
There was a problem hiding this comment.
Document snapshots/ as JSON cache files, not marker files.
These sections still describe empty marker files in ~/.arius/{repo}/snapshots/, but the implementation now stores plain-JSON snapshot manifests there via SnapshotService. Leaving the old mechanism here will misdocument how local snapshot discovery and validation actually work.
Also applies to: 89-90
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@openspec/changes/tree-cache-service/design.md` around lines 39 - 50, Decision
2 incorrectly describes the on-disk snapshot artifacts as empty timestamp-named
marker files; update the text to reflect that SnapshotService writes plain JSON
snapshot manifests into ~/.arius/{repo}/snapshots/ and that local
discovery/validation reads those JSON files (not empty markers) and still uses
SnapshotService.ListBlobNamesAsync and SnapshotService.TimestampFormat for
naming/sorting consistency; also apply the same correction to the repeated text
at the other occurrence (lines referenced as 89-90) so the document matches the
current implementation.
| - **THEN** the tree hash SHALL be SHA256 of the UTF-8 encoded text bytes, optionally passphrase-seeded via `IEncryptionService.ComputeHash` | ||
|
|
||
| ### Requirement: Snapshot creation | ||
| The system SHALL create a snapshot manifest after tree construction completes. The snapshot SHALL be stored at `snapshots/<UTC-timestamp>` (e.g., `2026-03-22T150000.000Z`) and SHALL contain the root tree hash, timestamp, file count, total size, and Arius version. Snapshots SHALL NEVER be deleted. After successful snapshot creation, the pipeline SHALL write an empty marker file named by the snapshot timestamp to `~/.arius/{repo}/snapshots/` to enable fast-path cache validation on subsequent runs. |
There was a problem hiding this comment.
Remove the separate snapshot marker-file requirement.
This conflicts with the refactor objective that SnapshotService write-through disk caching removes separate marker-file usage. Keeping this requirement will drift implementation/tests back toward redundant marker handling.
Also applies to: 62-64
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@openspec/changes/tree-cache-service/specs/archive-pipeline/spec.md` at line
52, Remove the separate snapshot marker-file requirement from the spec: delete
the sentence that mandates writing an empty marker file named by the snapshot
timestamp to `~/.arius/{repo}/snapshots/` and any duplicate mentions on lines
62-64, and instead reference that SnapshotService's write-through disk caching
handles cache validation; update the text around the snapshot storage location
`snapshots/<UTC-timestamp>` and the statement "Snapshots SHALL NEVER be deleted"
so the spec no longer requires or describes the marker-file mechanism and points
implementers to use SnapshotService for fast-path cache validation.
| - [x] 1.4 Implement `WriteAsync(string hash, TreeBlob, CancellationToken)`: serialize via `TreeBlobSerializer.SerializeForStorageAsync` → upload to Azure (`overwrite: false`, catch `BlobAlreadyExistsException`) → write plaintext to disk cache. | ||
| - [x] 1.5 Implement `ValidateAsync(CancellationToken)`: enumerate `snapshots/` directory for timestamp-named marker files → sort → latest local timestamp. Call `ListAsync("snapshots/")` to find latest remote → compare. On mismatch: `ListAsync("filetrees/")` → create empty marker file on disk for each remote blob not already cached; delete chunk-index L2 directory files. | ||
| - [x] 1.6 Implement `ExistsInRemote(string hash)`: always `File.Exists` on disk (works on both paths after `ValidateAsync`). Throw `InvalidOperationException` if called before `ValidateAsync`. | ||
| - [x] 1.7 Implement `WriteSnapshotMarkerAsync(string timestamp, CancellationToken)`: create empty file at `~/.arius/{repo}/snapshots/{timestamp}`. Create directory if needed. |
There was a problem hiding this comment.
Drop WriteSnapshotMarkerAsync tasks; they’re stale with the SnapshotService refactor.
These checklist items reintroduce separate marker-file responsibilities that the SnapshotService write-through change was meant to eliminate.
Also applies to: 32-32
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@openspec/changes/tree-cache-service/tasks.md` at line 9, Remove the stale
checklist entries that add WriteSnapshotMarkerAsync responsibilities: delete the
checklist line referencing "Implement `WriteSnapshotMarkerAsync(string
timestamp, CancellationToken)`" (and any duplicate at the other location) from
tasks.md so marker-file creation is not reintroduced; rely on the refactored
SnapshotService write-through behavior instead and ensure no new task items or
references to the WriteSnapshotMarkerAsync method remain in the file.
| - [x] 7.3 Unit test: `WriteAsync` — verify Azure upload and disk cache write. | ||
| - [x] 7.4 Unit test: `WriteAsync` with existing blob — verify `BlobAlreadyExistsException` is caught and disk cache is still written. | ||
| - [x] 7.5 Unit test: `ValidateAsync` snapshot match — verify no `ListAsync("filetrees/")` call. | ||
| - [x] 7.6 Unit test: `ValidateAsync` snapshot mismatch — verify `ListAsync("filetrees/")` called, empty marker files created on disk, chunk-index L2 directory deleted. |
There was a problem hiding this comment.
Adjust invalidation wording: files are deleted, not the L2 directory.
Current implementation in src/Arius.Core/Shared/FileTree/TreeCacheService.cs removes files under chunk-index L2; it does not delete the directory itself.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@openspec/changes/tree-cache-service/tasks.md` at line 45, Update the
checklist wording to reflect actual behavior: change the phrase "chunk-index L2
directory deleted" to "chunk-index L2 files deleted (files under the L2
directory removed)". Also ensure the test description for ValidateAsync mentions
verifying ListAsync("filetrees/") was called and that empty marker files were
created on disk and that chunk-index L2 files (not the directory itself) were
removed; reference the ValidateAsync logic in TreeCacheService (and the
ListAsync("filetrees/") call) so it's clear the test should assert file removals
rather than directory deletion.
| - **THEN** `TreeCacheService` SHALL follow the same `{Thing}Service` pattern as `ChunkIndexService`, located in `Shared/FileTree/` | ||
|
|
||
| ### Requirement: TreeCacheService.ReadAsync | ||
| `TreeCacheService.ReadAsync(string hash, CancellationToken)` SHALL return a deserialized `TreeBlob` by checking the local disk cache first, then falling back to Azure download. On cache miss, the downloaded blob SHALL be deserialized and written to the disk cache (plaintext JSON, matching the existing `EnsureUploadedAsync` cache format) before returning. On cache hit, the blob SHALL be read from disk and deserialized without any remote call. The method SHALL handle deserialization and decryption internally via `TreeBlobSerializer`. |
There was a problem hiding this comment.
Fix tree-blob cache format wording ("plaintext JSON" is inaccurate).
Tree blobs are specified elsewhere in this PR as line-based text entries; this line should not describe the on-disk tree cache as JSON.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@openspec/specs/tree-cache-service/spec.md` at line 21, Update the spec
language for TreeCacheService.ReadAsync to stop calling the on-disk cache
"plaintext JSON" and instead describe it as the existing line-based text entry
format used by EnsureUploadedAsync; state that on cache miss the downloaded blob
SHALL be serialized using TreeBlobSerializer into the same line-based text
format and written to disk before returning, and on cache hit the file SHALL be
read and deserialized via TreeBlobSerializer without any remote call.
| // Tree cache | ||
| services.AddSingleton(sp => | ||
| new TreeCacheService( | ||
| sp.GetRequiredService<IBlobContainerService>(), | ||
| sp.GetRequiredService<IEncryptionService>(), | ||
| accountName, | ||
| containerName)); |
There was a problem hiding this comment.
Singleton TreeCacheService shares unsynchronized mutable state across handlers.
Line 59 registers TreeCacheService as a singleton, but the service keeps mutable validation state (_validated). With archive/list/restore handlers all holding this shared instance, concurrent command execution can race validation and leak stale state across operations. Consider removing mutable state from the singleton or changing lifetimes so each command gets an isolated cache service instance.
As per coding guidelines, src/Arius.Core/**: “Pay close attention to deduplication logic, encryption correctness, and thread safety.”
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/Arius.Core/ServiceCollectionExtensions.cs` around lines 58 - 64, The
TreeCacheService is registered as a singleton while it contains mutable
validation state (_validated) that is shared across concurrent
archive/list/restore handlers, causing race conditions and stale state leaks;
fix by removing mutable instance state from TreeCacheService or change its DI
lifetime so each command gets an isolated instance (e.g., stop using
services.AddSingleton for TreeCacheService and register
per-request/per-operation via services.AddScoped or services.AddTransient), and
update any consumers/constructors that assume singleton behavior; ensure the
validation flag (_validated) is either made thread-safe (immutable or protected
by synchronization) or moved to a per-operation context so archive/list/restore
handlers do not share it.
| /// <summary> | ||
| /// Guard ensuring <see cref="ExistsInRemote"/> is not called before <see cref="ValidateAsync"/>. | ||
| /// </summary> | ||
| private bool _validated; |
There was a problem hiding this comment.
Thread safety concern: _validated flag is not synchronized.
The _validated boolean is read and written without synchronization. If ValidateAsync is called concurrently from multiple threads (e.g., during parallel handler initialization), a race condition could cause multiple threads to execute the slow path simultaneously.
Consider using Interlocked.CompareExchange or a SemaphoreSlim for proper synchronization.
🔒 Suggested fix using Interlocked
- /// <summary>
- /// Guard ensuring <see cref="ExistsInRemote"/> is not called before <see cref="ValidateAsync"/>.
- /// </summary>
- private bool _validated;
+ /// <summary>
+ /// Guard ensuring <see cref="ExistsInRemote"/> is not called before <see cref="ValidateAsync"/>.
+ /// 0 = not validated, 1 = validated.
+ /// </summary>
+ private int _validated;Then in ValidateAsync:
- if (_validated) return;
+ if (Interlocked.CompareExchange(ref _validated, 1, 0) == 1) return;And at the end of ValidateAsync, remove the assignment since it's done atomically above.
In ExistsInRemote:
- if (!_validated)
+ if (Volatile.Read(ref _validated) == 0)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/Arius.Core/Shared/FileTree/TreeCacheService.cs` around lines 33 - 36, The
_validated boolean on TreeCacheService is accessed without synchronization;
change ValidateAsync to atomically set the flag using
Interlocked.CompareExchange (or Interlocked.Exchange) so only the first caller
runs the slow validation path and subsequent callers skip it, and remove the
non-atomic assignment at the end of ValidateAsync; in ExistsInRemote read the
flag with a thread-safe read (Volatile.Read or an Interlocked comparison) to
ensure callers see the updated state. Reference the private field _validated and
the methods ValidateAsync and ExistsInRemote when making the change.
| public async Task ValidateAsync(CancellationToken cancellationToken = default) | ||
| { | ||
| if (_validated) return; | ||
|
|
||
| // Latest local snapshot filename (lexicographic = chronological due to timestamp format) | ||
| var latestLocal = Directory.Exists(_snapshotsDir) | ||
| ? Directory.EnumerateFiles(_snapshotsDir) | ||
| .Select(Path.GetFileName) | ||
| .Where(n => n is not null) | ||
| .OrderByDescending(n => n, StringComparer.Ordinal) | ||
| .FirstOrDefault() | ||
| : null; | ||
|
|
||
| // Latest remote snapshot | ||
| string? latestRemote = null; | ||
| await foreach (var name in _blobs.ListAsync(BlobPaths.Snapshots, cancellationToken)) | ||
| { | ||
| // ListAsync returns names sorted; last one is newest | ||
| latestRemote = Path.GetFileName(name); // strip "snapshots/" prefix | ||
| } | ||
|
|
||
| // If there are no remote snapshots, the repository is empty — fast path. | ||
| if (latestRemote is null) | ||
| { | ||
| _validated = true; | ||
| return; | ||
| } | ||
|
|
||
| // Fast path: this machine wrote the last snapshot. | ||
| if (latestLocal is not null && | ||
| string.Equals(latestLocal, latestRemote, StringComparison.Ordinal)) | ||
| { | ||
| _validated = true; | ||
| return; | ||
| } | ||
|
|
||
| // Slow path: snapshot mismatch (or no local snapshot at all). | ||
| // Materialize empty marker files for all remote filetree blobs not yet cached. | ||
| await foreach (var blobName in _blobs.ListAsync(BlobPaths.FileTrees, cancellationToken)) | ||
| { | ||
| cancellationToken.ThrowIfCancellationRequested(); | ||
| var hash = Path.GetFileName(blobName); // strip "filetrees/" prefix | ||
| var diskPath = Path.Combine(_diskCacheDir, hash); | ||
| if (!File.Exists(diskPath)) | ||
| { | ||
| // Create an empty marker file (will be filled by ReadAsync on demand) | ||
| await File.WriteAllBytesAsync(diskPath, [], cancellationToken); | ||
| } | ||
| } | ||
|
|
||
| // Invalidate chunk-index L2 (another machine may have updated shards) | ||
| if (Directory.Exists(_chunkIndexL2Dir)) | ||
| { | ||
| foreach (var file in Directory.EnumerateFiles(_chunkIndexL2Dir)) | ||
| { | ||
| try { File.Delete(file); } catch { /* ignore individual failures */ } | ||
| } | ||
| } | ||
|
|
||
| _validated = true; | ||
| } |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major
ValidateAsync sets _validated = true before completing slow-path work.
If using the atomic approach, note that the current code sets _validated = true at line 214, which is correct placement (after all work is done). However, with the Interlocked.CompareExchange pattern, you'd need to restructure to avoid the second call executing before the first completes its slow-path work.
A safer pattern would use a SemaphoreSlim(1,1) to serialize validation:
🔒 Alternative: SemaphoreSlim for serialization
+ private readonly SemaphoreSlim _validateLock = new(1, 1);
private bool _validated;
public async Task ValidateAsync(CancellationToken cancellationToken = default)
{
- if (_validated) return;
+ if (_validated) return;
+
+ await _validateLock.WaitAsync(cancellationToken);
+ try
+ {
+ if (_validated) return; // double-check after acquiring lock
+
+ // ... existing validation logic ...
+
+ _validated = true;
+ }
+ finally
+ {
+ _validateLock.Release();
+ }
- _validated = true;
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/Arius.Core/Shared/FileTree/TreeCacheService.cs` around lines 155 - 215,
The ValidateAsync method can race when multiple callers run concurrently because
_validated may be set before the slow-path work completes; introduce a
SemaphoreSlim (e.g., _validationSemaphore = new SemaphoreSlim(1,1)) and wrap the
body of ValidateAsync with await
_validationSemaphore.WaitAsync(cancellationToken) and a try/finally that
releases the semaphore, so only one caller performs the slow-path
materialization (the blob enumeration and File.WriteAllBytesAsync calls and
chunk-index cleanup) while other callers await and then see _validated==true;
ensure _validated is only set to true inside the try block after all work
(including listing _blobs.ListAsync, creating marker files in _diskCacheDir, and
deleting files in _chunkIndexL2Dir) completes, and handle cancellation via
cancellationToken.ThrowIfCancellationRequested() as already done.
| await foreach (var name in _blobs.ListAsync(BlobPaths.Snapshots, cancellationToken)) | ||
| { | ||
| // ListAsync returns names sorted; last one is newest | ||
| latestRemote = Path.GetFileName(name); // strip "snapshots/" prefix | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for ListAsync documentation or implementation to verify sorting guarantee
rg -n "ListAsync" --type cs -A 5 -B 2 | head -100Repository: woutervanranst/Arius7
Length of output: 9995
🏁 Script executed:
rg -n "BlobPaths" --type cs -A 3 -B 1 | head -80Repository: woutervanranst/Arius7
Length of output: 8119
🏁 Script executed:
rg -n "ParseTimestamp|Snapshots" --type cs -B 2 -A 8 | head -120Repository: woutervanranst/Arius7
Length of output: 11460
🏁 Script executed:
rg -n "TimestampFormat" --type cs -B 2 -A 2Repository: woutervanranst/Arius7
Length of output: 3785
ListAsync does not guarantee sorted ordering — TreeCacheService should not assume "last item is newest".
The IBlobContainerService.ListAsync interface provides no ordering guarantee. While snapshot blob names follow an ISO 8601 format (yyyy-MM-ddTHHmmss.fffZ) that is lexicographically sortable, the interface contract itself does not document this. SnapshotService explicitly sorts results from ListAsync by parsed timestamp (lines 209–212), which indicates this codebase treats ListAsync order as unreliable. The comment assumes undocumented behavior and should either be removed or the code should sort results to be consistent with SnapshotService.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/Arius.Core/Shared/FileTree/TreeCacheService.cs` around lines 170 - 174,
In TreeCacheService where you iterate _blobs.ListAsync(BlobPaths.Snapshots,
cancellationToken) and set latestRemote = Path.GetFileName(name), don't assume
ListAsync yields sorted results; either remove the "last one is newest" comment
or explicitly collect the snapshot names, parse their ISO8601 timestamps (same
approach SnapshotService uses) and sort by that timestamp before picking the
newest; update the logic around latestRemote and the loop to use the sorted
result set (referencing TreeCacheService, _blobs.ListAsync, BlobPaths.Snapshots,
latestRemote and mimic SnapshotService's parsing/sorting behavior).
| private async Task WriteToDiskAsync(SnapshotManifest manifest, CancellationToken cancellationToken) | ||
| { | ||
| var fileName = manifest.Timestamp.UtcDateTime.ToString(TimestampFormat); | ||
| var path = Path.Combine(_diskCacheDir, fileName); | ||
| var json = JsonSerializer.SerializeToUtf8Bytes(manifest, s_localJsonOptions); | ||
| await File.WriteAllBytesAsync(path, json, cancellationToken); | ||
| } |
There was a problem hiding this comment.
Potential race condition on concurrent writes to same snapshot file.
If two CreateAsync or ResolveAsync calls target the same timestamp concurrently (e.g., during testing or edge-case recovery), File.WriteAllBytesAsync could corrupt the file. Since snapshots are typically created sequentially and timestamps include milliseconds, this is low risk in practice.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/Arius.Core/Shared/Snapshot/SnapshotService.cs` around lines 266 - 272,
Concurrent calls to WriteToDiskAsync (from CreateAsync/ResolveAsync) can cause
corruption when both call File.WriteAllBytesAsync to the same path; change
WriteToDiskAsync to write to a unique temp file (e.g., path + ".tmp."+Guid),
flush/close it, then atomically replace the target using File.Replace (or
File.Move with overwrite semantics) to move the temp into path; reference
SnapshotManifest, WriteToDiskAsync, _diskCacheDir, TimestampFormat and ensure
you still pass the cancellationToken to the write and clean up the temp file on
error.
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/Arius.Integration.Tests/Pipeline/PipelineFixture.cs (1)
118-131: Reuse a fixture-scopedTreeCacheServiceinstead of allocating one per handler call.Current factories recreate the cache service every time, which resets its internal validation state and adds avoidable overhead. A single fixture instance would better match production DI behavior.
♻️ Suggested refactor
public sealed class PipelineFixture : IAsyncDisposable { + private TreeCacheService TreeCache { get; set; } = null!; @@ private async Task InitAsync(string? passphrase, CancellationToken ct) { @@ Index = new ChunkIndexService(BlobContainer, Encryption, Account, container.Name); + TreeCache = new TreeCacheService(BlobContainer, Encryption, Account, container.Name); Mediator = Substitute.For<IMediator>(); @@ private async Task InitAsyncWithEncryption( IEncryptionService encryption, BlobContainerClient? existingContainer, CancellationToken ct) { @@ Encryption = encryption; Index = new ChunkIndexService(BlobContainer, Encryption, Account, Container.Name); + TreeCache = new TreeCacheService(BlobContainer, Encryption, Account, Container.Name); Mediator = Substitute.For<IMediator>(); @@ public ArchiveCommandHandler CreateArchiveHandler() => - new(BlobContainer, Encryption, Index, new TreeCacheService(BlobContainer, Encryption, Account, Container.Name), Mediator, + new(BlobContainer, Encryption, Index, TreeCache, Mediator, NullLogger<ArchiveCommandHandler>.Instance, Account, Container.Name); public RestoreCommandHandler CreateRestoreHandler() => - new(BlobContainer, Encryption, Index, new TreeCacheService(BlobContainer, Encryption, Account, Container.Name), Mediator, + new(BlobContainer, Encryption, Index, TreeCache, Mediator, NullLogger<RestoreCommandHandler>.Instance, Account, Container.Name); public ListQueryHandler CreateListQueryHandler() => - new(BlobContainer, Encryption, Index, new TreeCacheService(BlobContainer, Encryption, Account, Container.Name), + new(BlobContainer, Encryption, Index, TreeCache, NullLogger<ListQueryHandler>.Instance, Account, Container.Name); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/Arius.Integration.Tests/Pipeline/PipelineFixture.cs` around lines 118 - 131, The handlers (CreateArchiveHandler, CreateRestoreHandler, CreateListQueryHandler) each instantiate a new TreeCacheService causing resets and extra work; instead add a fixture-scoped TreeCacheService field on PipelineFixture, instantiate it once (e.g., in the fixture constructor or setup), and pass that single TreeCacheService instance into each handler factory rather than creating new ones inline so the cache's validation/state persists and DI behavior is mirrored.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/Arius.Integration.Tests/Pipeline/PipelineFixture.cs`:
- Around line 118-131: The handlers (CreateArchiveHandler, CreateRestoreHandler,
CreateListQueryHandler) each instantiate a new TreeCacheService causing resets
and extra work; instead add a fixture-scoped TreeCacheService field on
PipelineFixture, instantiate it once (e.g., in the fixture constructor or
setup), and pass that single TreeCacheService instance into each handler factory
rather than creating new ones inline so the cache's validation/state persists
and DI behavior is mirrored.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: ccfc7b26-3d04-46ea-8fd3-f256fad17f1b
📒 Files selected for processing (7)
src/Arius.E2E.Tests/E2ETests.cssrc/Arius.E2E.Tests/RehydrationE2ETests.cssrc/Arius.Integration.Tests/FileTree/TreeBuilderIntegrationTests.cssrc/Arius.Integration.Tests/Pipeline/ContainerCreationTests.cssrc/Arius.Integration.Tests/Pipeline/CrashRecoveryTests.cssrc/Arius.Integration.Tests/Pipeline/PipelineFixture.cssrc/Arius.Integration.Tests/Pipeline/RehydrationStateTests.cs
…date test fixtures Register SnapshotService as singleton in ServiceCollectionExtensions alongside TreeCacheService and ChunkIndexService, inject it via constructor into ArchiveCommandHandler, RestoreCommandHandler, and ListQueryHandler, and update all test files (PipelineFixture, CrashRecoveryTests, ContainerCreationTests, RehydrationStateTests, E2ETests, RehydrationE2ETests) to pass the new arg. Also notes DI singleton registration in the tree-cache-service spec.
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/Arius.Core/Features/RestoreCommand/RestoreCommandHandler.cs (1)
98-118:⚠️ Potential issue | 🔴 CriticalValidate the repo caches before restore starts using them.
TreeCacheService.ValidateAsync()is the code path that invalidates stale chunk-index L2 data when the local snapshot markers no longer match remote. This handler never calls it, so Line 187 can still read old shard data and fall into the current “Content hash not found in index, skipping” path for files that do exist remotely. Please run validation once near the top ofHandlebefore snapshot/tree/index work begins.🔁 Suggested fix
// ── Ensure container exists ─────────────────────────────────────────── await _blobs.CreateContainerIfNotExistsAsync(cancellationToken); try { + await _treeCache.ValidateAsync(cancellationToken); + // ── Step 1: Resolve snapshot ────────────────────────────────────── var snapshot = await _snapshotSvc.ResolveAsync(opts.Version, cancellationToken);As per coding guidelines,
src/Arius.Core/**: “Core library. Pay close attention to deduplication logic, encryption correctness, and thread safety. Flag any changes that could cause data loss or corruption.”🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/Arius.Core/Features/RestoreCommand/RestoreCommandHandler.cs` around lines 98 - 118, Call TreeCacheService.ValidateAsync once at the start of RestoreCommandHandler.Handle (before any snapshot/tree/index work such as _snapshotSvc.ResolveAsync or CollectFilesAsync) to ensure local caches are validated and stale L2 chunk-index data is invalidated; invoke it with the current cancellationToken (e.g. await TreeCacheService.ValidateAsync(cancellationToken)) and propagate or log any errors per existing error-handling patterns so validation runs before reading shards/indexes.
♻️ Duplicate comments (2)
src/Arius.Core/ServiceCollectionExtensions.cs (1)
60-65:⚠️ Potential issue | 🟠 MajorDon't capture
TreeCacheServicein singleton handlers.
TreeCacheServicecarries mutable repo state (_validated) and cache-file side effects, but this registration makes one instance shared by all archive/list/restore operations. Once one operation flips_validated, later operations in the same process can skip revalidation even after the remote repo changes. Either makeTreeCacheServicestateless/thread-safe, or stop registering both it and the handlers that capture it as singletons.As per coding guidelines,
src/Arius.Core/**: “Core library. Pay close attention to deduplication logic, encryption correctness, and thread safety. Flag any changes that could cause data loss or corruption.”Also applies to: 83-113
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/Arius.Core/ServiceCollectionExtensions.cs` around lines 60 - 65, The TreeCacheService is stateful (field _validated and cache-file side effects) and must not be registered as a process-wide singleton; change the DI registration in ServiceCollectionExtensions (the AddSingleton that constructs new TreeCacheService) to a non-singleton lifetime (e.g., AddScoped or AddTransient) and ensure any handlers/services that currently capture TreeCacheService as singletons are also converted to a compatible shorter lifetime (or refactored to not hold state) so each archive/list/restore operation gets its own TreeCacheService instance; locate registrations and handler constructors that accept TreeCacheService to update their lifetimes accordingly.src/Arius.Core/Features/ArchiveCommand/ArchiveCommandHandler.cs (1)
607-608:⚠️ Potential issue | 🔴 CriticalMove cache validation ahead of the first dedup lookup.
ValidateAsyncis what clears stale chunk-index L2 state when the latest remote snapshot no longer matches local. Here it still runs after dedup has already hit Line 227, so those lookups can use stale shard data. The downstream effect is missed remote dedup hits and pointer-only files being skipped as “missing chunk”. Call it once near the start ofHandle, before any_index.LookupAsync(...).🔁 Suggested move
// ── Ensure container exists ─────────────────────────────────────────── await _blobs.CreateContainerIfNotExistsAsync(cancellationToken); + await _treeCache.ValidateAsync(cancellationToken); // Validate options (task 8.13) if (opts is { RemoveLocal: true, NoPointers: true }) return new ArchiveResult @@ - // Task 5.1: Validate tree cache before building tree (compare local/remote snapshot markers) - await _treeCache.ValidateAsync(cancellationToken); - // Task 8.10: Index flush await _index.FlushAsync(cancellationToken);As per coding guidelines,
src/Arius.Core/**: “Core library. Pay close attention to deduplication logic, encryption correctness, and thread safety. Flag any changes that could cause data loss or corruption.”🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/Arius.Core/Features/ArchiveCommand/ArchiveCommandHandler.cs` around lines 607 - 608, Move the cache validation earlier in the request flow: call _treeCache.ValidateAsync(cancellationToken) at the start of the ArchiveCommandHandler.Handle method (before any calls to _index.LookupAsync) so stale L2 shard state is cleared prior to dedup lookups; locate the Handle method and the first _index.LookupAsync usage and reposition the ValidateAsync call to execute immediately after entering Handle and before any dedup/index operations.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@src/Arius.Core/Features/RestoreCommand/RestoreCommandHandler.cs`:
- Around line 98-118: Call TreeCacheService.ValidateAsync once at the start of
RestoreCommandHandler.Handle (before any snapshot/tree/index work such as
_snapshotSvc.ResolveAsync or CollectFilesAsync) to ensure local caches are
validated and stale L2 chunk-index data is invalidated; invoke it with the
current cancellationToken (e.g. await
TreeCacheService.ValidateAsync(cancellationToken)) and propagate or log any
errors per existing error-handling patterns so validation runs before reading
shards/indexes.
---
Duplicate comments:
In `@src/Arius.Core/Features/ArchiveCommand/ArchiveCommandHandler.cs`:
- Around line 607-608: Move the cache validation earlier in the request flow:
call _treeCache.ValidateAsync(cancellationToken) at the start of the
ArchiveCommandHandler.Handle method (before any calls to _index.LookupAsync) so
stale L2 shard state is cleared prior to dedup lookups; locate the Handle method
and the first _index.LookupAsync usage and reposition the ValidateAsync call to
execute immediately after entering Handle and before any dedup/index operations.
In `@src/Arius.Core/ServiceCollectionExtensions.cs`:
- Around line 60-65: The TreeCacheService is stateful (field _validated and
cache-file side effects) and must not be registered as a process-wide singleton;
change the DI registration in ServiceCollectionExtensions (the AddSingleton that
constructs new TreeCacheService) to a non-singleton lifetime (e.g., AddScoped or
AddTransient) and ensure any handlers/services that currently capture
TreeCacheService as singletons are also converted to a compatible shorter
lifetime (or refactored to not hold state) so each archive/list/restore
operation gets its own TreeCacheService instance; locate registrations and
handler constructors that accept TreeCacheService to update their lifetimes
accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: dbe888b0-8eea-48cb-adb0-e6df22a85b16
📒 Files selected for processing (11)
openspec/specs/tree-cache-service/spec.mdsrc/Arius.Core/Features/ArchiveCommand/ArchiveCommandHandler.cssrc/Arius.Core/Features/ListQuery/ListQueryHandler.cssrc/Arius.Core/Features/RestoreCommand/RestoreCommandHandler.cssrc/Arius.Core/ServiceCollectionExtensions.cssrc/Arius.E2E.Tests/E2ETests.cssrc/Arius.E2E.Tests/RehydrationE2ETests.cssrc/Arius.Integration.Tests/Pipeline/ContainerCreationTests.cssrc/Arius.Integration.Tests/Pipeline/CrashRecoveryTests.cssrc/Arius.Integration.Tests/Pipeline/PipelineFixture.cssrc/Arius.Integration.Tests/Pipeline/RehydrationStateTests.cs
✅ Files skipped from review due to trivial changes (2)
- src/Arius.E2E.Tests/E2ETests.cs
- openspec/specs/tree-cache-service/spec.md
🚧 Files skipped from review as they are similar to previous changes (4)
- src/Arius.Integration.Tests/Pipeline/PipelineFixture.cs
- src/Arius.Core/Features/ListQuery/ListQueryHandler.cs
- src/Arius.Integration.Tests/Pipeline/ContainerCreationTests.cs
- src/Arius.E2E.Tests/RehydrationE2ETests.cs
Summary
TreeCacheService— a singleton that centralizes all filetree blob reads/writes through a local disk cache (~/.arius/{repo}/filetrees/), eliminating redundant Azure downloads across archive, restore, and list operationsSnapshotServiceto use a write-through local plain-JSON disk cache (~/.arius/{repo}/snapshots/) with disk-firstResolveAsync, removing the need for separate empty marker filesopenspec/specs/, reflecting both the original design and the evolved implementationChanges
New
TreeCacheService(Shared/FileTree/) withReadAsync,WriteAsync,ExistsInRemote,ValidateAsyncTreeCacheServiceTests(13 unit tests)openspec/specs/tree-cache-service/spec.mdModified
SnapshotService: 4-param constructor,GetDiskCacheDirectory, write-throughCreateAsync, disk-firstResolveAsyncTreeService/TreeBlobSerializer: useTreeCacheService; addedDeserialize(byte[])ArchiveCommandHandler,RestoreCommandHandler,ListQueryHandler: route throughTreeCacheServiceand updatedSnapshotServiceconstructorServiceCollectionExtensions: registerTreeCacheServiceas singletonarchive-pipeline,restore-pipeline,list-queryTreeServiceTests,ArchiveRecoveryTests,ListQueryHandlerTests,SnapshotServiceIntegrationTestsAll 168 tests pass.
Summary by CodeRabbit
New Features
Chores
Tests