Skip to content

feat: introduce TreeCacheService for centralized filetree caching#49

Closed
woutervanranst wants to merge 7 commits into
masterfrom
optimize-chunkindex-filetrees
Closed

feat: introduce TreeCacheService for centralized filetree caching#49
woutervanranst wants to merge 7 commits into
masterfrom
optimize-chunkindex-filetrees

Conversation

@woutervanranst

@woutervanranst woutervanranst commented Apr 4, 2026

Copy link
Copy Markdown
Owner

Summary

  • Introduces TreeCacheService — a singleton that centralizes all filetree blob reads/writes through a local disk cache (~/.arius/{repo}/filetrees/), eliminating redundant Azure downloads across archive, restore, and list operations
  • Refactors SnapshotService to use a write-through local plain-JSON disk cache (~/.arius/{repo}/snapshots/) with disk-first ResolveAsync, removing the need for separate empty marker files
  • Syncs all delta specs back to main specs in openspec/specs/, reflecting both the original design and the evolved implementation

Changes

New

  • TreeCacheService (Shared/FileTree/) with ReadAsync, WriteAsync, ExistsInRemote, ValidateAsync
  • TreeCacheServiceTests (13 unit tests)
  • openspec/specs/tree-cache-service/spec.md

Modified

  • SnapshotService: 4-param constructor, GetDiskCacheDirectory, write-through CreateAsync, disk-first ResolveAsync
  • TreeService / TreeBlobSerializer: use TreeCacheService; added Deserialize(byte[])
  • ArchiveCommandHandler, RestoreCommandHandler, ListQueryHandler: route through TreeCacheService and updated SnapshotService constructor
  • ServiceCollectionExtensions: register TreeCacheService as singleton
  • Main specs updated: archive-pipeline, restore-pipeline, list-query
  • Tests updated: TreeServiceTests, ArchiveRecoveryTests, ListQueryHandlerTests, SnapshotServiceIntegrationTests

All 168 tests pass.

Summary by CodeRabbit

  • New Features

    • Local disk write-through caching for file-tree blobs with snapshot-based validation for faster archive/ls/restore workflows.
    • Parallelized end-of-pipeline processing (concurrent shard flush + tree uploads) with configurable concurrency and progress events.
  • Chores

    • Centralized tree-blob persistence and snapshot marker write/read to enable the fast-path cache behavior.
  • Tests

    • Extensive unit and integration tests covering cache behavior, snapshot validation, parallel pipeline, and progress reporting.

…h-and-tree-upload

tree-cache-service: proposal, design, delta specs, and tasks for a shared
TreeCacheService with disk-materialized prefetch and snapshot-based
validation via timestamp-named marker files.

parallel-flush-and-tree-upload: proposal only (parked).
Centralizes all filetree blob reads/writes through a new TreeCacheService
with snapshot-based cache validation. On snapshot mismatch, remote blob
listing is materialized as empty marker files on disk (no in-memory HashSet).
All handlers (Archive, Restore, List) and TreeBuilder now use the service.
Registered as singleton in AddArius.
…sk cache

SnapshotService now mirrors TreeCacheService: CreateAsync writes plain JSON
to disk first, then uploads to Azure. ResolveAsync checks disk before
downloading from Azure. Removes WriteSnapshotMarkerAsync from TreeCacheService
(snapshot dir ownership moves to SnapshotService). All call sites updated to
pass accountName/containerName to SnapshotService constructor.
@coderabbitai

coderabbitai Bot commented Apr 4, 2026

Copy link
Copy Markdown
Contributor
📝 Walkthrough

Walkthrough

Adds a new TreeCacheService (disk + remote cache) and integrates it across Archive, List, and Restore flows; refactors TreeBuilder and SnapshotService to use the cache and write local snapshot markers; removes several .serena project/memory files; adds OpenSpec design/proposals (tree-cache and parallel flush/tree upload) and comprehensive unit/integration tests for the cache behavior.

Changes

Cohort / File(s) Summary
Serena config & memories
\.serena/.gitignore, \.serena/project.yml, \.serena/memories/project_overview.md, \.serena/memories/style_conventions.md, \.serena/memories/suggested_commands.md
Deleted Serena project config and repository-local memory/docs; removed two gitignore entries in .serena/.gitignore.
Tree cache implementation & serializer
src/Arius.Core/Shared/FileTree/TreeCacheService.cs, src/Arius.Core/Shared/FileTree/TreeBlobSerializer.cs
New public TreeCacheService with ReadAsync/WriteAsync/ValidateAsync/ExistsInRemote/GetDiskCacheDirectory; added plaintext TreeBlobSerializer.Deserialize(byte[]) to support disk cache format.
TreeBuilder / TreeService refactor
src/Arius.Core/Shared/FileTree/TreeService.cs
TreeBuilder now delegates existence checks and uploads to TreeCacheService (removed inline disk-cache and direct blob logic; removed GetDiskCacheDirectory).
Snapshot service changes
src/Arius.Core/Shared/Snapshot/SnapshotService.cs
SnapshotService now has a disk JSON cache, new constructor parameters (account/container), GetDiskCacheDirectory, write-through snapshot marker behavior, and disk-first ResolveAsync/CreateAsync flows.
Handler DI and wiring
src/Arius.Core/Features/ArchiveCommand/ArchiveCommandHandler.cs, src/Arius.Core/Features/ListQuery/ListQueryHandler.cs, src/Arius.Core/Features/RestoreCommand/RestoreCommandHandler.cs, src/Arius.Core/ServiceCollectionExtensions.cs
Registered TreeCacheService and SnapshotService as singletons; handlers now accept/inject them; archive pipeline calls _treeCache.ValidateAsync(...) before flush/tree build; handlers use _treeCache.ReadAsync(...) and injected _snapshotSvc.
Specs / OpenSpec docs
openspec/changes/tree-cache-service/..., openspec/changes/parallel-flush-and-tree-upload/..., openspec/specs/*
Added design, proposal, and spec documents for TreeCacheService and for parallel flush/tree upload; updated archive/list/restore specs to require use of TreeCacheService and snapshot marker semantics.
Tests & test wiring
src/Arius.Core.Tests/FileTree/TreeCacheServiceTests.cs, src/Arius.Core.Tests/FileTree/TreeServiceTests.cs, src/Arius.Core.Tests/Archive/ArchiveRecoveryTests.cs, src/Arius.Core.Tests/ListQuery/ListQueryHandlerTests.cs, src/Arius.Integration.Tests/*, src/Arius.E2E.Tests/*
Added comprehensive unit tests for TreeCacheService; updated many unit/integration/E2E tests and fixtures to construct and pass TreeCacheService/SnapshotService into handlers and to use GetDiskCacheDirectory helpers.

Sequence Diagram(s)

sequenceDiagram
    participant CLI as Client/CLI
    participant Archive as ArchiveCommandHandler
    participant TreeB as TreeBuilder
    participant TreeC as TreeCacheService
    participant Index as ChunkIndexService
    participant Snapshot as SnapshotService
    participant Blob as BlobStorage

    CLI->>Archive: Start archive
    Archive->>TreeC: ValidateAsync()
    TreeC-->>Archive: Validation result (fast/slow)
    Archive->>TreeB: Build tree (calls EnsureUploadedAsync)
    TreeB->>TreeC: ExistsInRemote(hash)
    alt not exists
        TreeB->>TreeC: WriteAsync(hash, treeBlob)
        TreeC->>Blob: Upload filetrees/{hash}
        Blob-->>TreeC: Upload ack or BlobAlreadyExists
        TreeC->>TreeB: Write-through disk cache completed
    end
    Archive->>Index: Flush chunk-index (may be parallel per spec)
    Index->>Blob: Upload shard blobs
    Archive->>Snapshot: CreateAsync(...)
    Snapshot->>Snapshot: Write marker JSON to disk
    Snapshot->>Blob: Upload snapshot
    Snapshot-->>Archive: Snapshot created
    Archive-->>CLI: Archive completed
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 60.49% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'feat: introduce TreeCacheService for centralized filetree caching' is a concise, clear summary that directly reflects the main change in the changeset: the addition of a TreeCacheService for centralized filetree blob caching. The title accurately captures the primary feature being introduced.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch optimize-chunkindex-filetrees

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov

codecov Bot commented Apr 4, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 81.48148% with 30 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.48%. Comparing base (ad9bc84) to head (e87ae61).

Files with missing lines Patch % Lines
src/Arius.Core/Shared/FileTree/TreeCacheService.cs 69.62% 20 Missing and 4 partials ⚠️
src/Arius.Core/Shared/Snapshot/SnapshotService.cs 82.35% 3 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master      #49      +/-   ##
==========================================
- Coverage   71.04%   68.48%   -2.57%     
==========================================
  Files          59       60       +1     
  Lines        4687     4794     +107     
  Branches      624      636      +12     
==========================================
- Hits         3330     3283      -47     
- Misses       1225     1372     +147     
- Partials      132      139       +7     
Flag Coverage Δ
linux 76.93% <81.48%> (-3.30%) ⬇️
windows 54.13% <72.22%> (-12.87%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 13

🧹 Nitpick comments (4)
src/Arius.Core.Tests/FileTree/TreeCacheServiceTests.cs (2)

379-393: Inconsistent cleanup: uses inline Directory.Delete instead of shared helper.

Test ExistsInRemote_BeforeValidateAsync_ThrowsInvalidOperationException uses inline cleanup (lines 390-391) instead of CleanupAsync. Consider using the shared helper for consistency.

♻️ Suggested fix
         finally
         {
-            if (Directory.Exists(cacheDir))    Directory.Delete(cacheDir,     recursive: true);
-            if (Directory.Exists(snapshotsDir)) Directory.Delete(snapshotsDir, recursive: true);
+            await CleanupAsync(cacheDir, snapshotsDir);
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Arius.Core.Tests/FileTree/TreeCacheServiceTests.cs` around lines 379 -
393, The test ExistsInRemote_BeforeValidateAsync_ThrowsInvalidOperationException
currently performs inline cleanup with Directory.Delete; replace those inline
deletions with the shared helper by invoking CleanupAsync with the cacheDir and
snapshotsDir returned from MakeService (if the test is synchronous call
CleanupAsync(...).GetAwaiter().GetResult() or make the test async and await
CleanupAsync(...)) so cleanup is consistent with other tests.

51-56: Unnecessary await Task.CompletedTask in cleanup helper.

CleanupAsync performs only synchronous Directory.Delete operations, then awaits Task.CompletedTask which adds no value. Either make it synchronous or use Task.Run for the directory operations if async behavior is truly needed.

♻️ Suggested simplification
-    private static async Task CleanupAsync(string cacheDir, string snapshotsDir)
+    private static void Cleanup(string cacheDir, string snapshotsDir)
     {
         if (Directory.Exists(cacheDir))    Directory.Delete(cacheDir,     recursive: true);
         if (Directory.Exists(snapshotsDir)) Directory.Delete(snapshotsDir, recursive: true);
-        await Task.CompletedTask;
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Arius.Core.Tests/FileTree/TreeCacheServiceTests.cs` around lines 51 - 56,
The helper CleanupAsync currently marks the method async and awaits
Task.CompletedTask after performing synchronous Directory.Delete calls; remove
the pointless async/await by turning CleanupAsync into a non-async method that
returns Task.CompletedTask (i.e., drop the async keyword and the await) or, if
you need true async IO, run the deletes inside Task.Run and await that task;
reference the CleanupAsync method and the Directory.Delete calls when making the
change.
openspec/changes/tree-cache-service/proposal.md (1)

39-39: Minor: "CLI interface" is redundant.

"CLI" already stands for "Command Line Interface", so "CLI interface" is tautological. Consider changing to just "CLI" or "command-line interface".

📝 Suggested fix
-- **No breaking changes**: All changes are internal; the CLI interface, blob storage layout, and wire formats are unchanged.
+- **No breaking changes**: All changes are internal; the CLI, blob storage layout, and wire formats are unchanged.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@openspec/changes/tree-cache-service/proposal.md` at line 39, The sentence
containing the redundant phrase "CLI interface" should be updated to remove the
tautology: replace "CLI interface" with either "CLI" or "command-line interface"
in the line that currently reads "- **No breaking changes**: All changes are
internal; the CLI interface, blob storage layout, and wire formats are
unchanged." so it becomes "- **No breaking changes**: All changes are internal;
the CLI, blob storage layout, and wire formats are unchanged." (or use
"command-line interface" if you prefer the spelled-out form).
openspec/changes/tree-cache-service/specs/tree-cache-service/spec.md (1)

90-103: Spec-implementation mismatch: marker file content.

The spec states "write an empty marker file" at line 91-95, but SnapshotService.CreateAsync writes the full plain-JSON manifest to disk (not an empty file). The implementation behavior is actually better for debugging and cache validation, but the spec should be updated to reflect "full plain-JSON snapshot manifest" instead of "empty marker file" for consistency.

,

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@openspec/changes/tree-cache-service/specs/tree-cache-service/spec.md` around
lines 90 - 103, Update the spec to reflect that SnapshotService.CreateAsync
writes the full plain-JSON snapshot manifest (not an empty file) into
~/.arius/{repo}/snapshots/ using SnapshotService.TimestampFormat; change the
requirement text and each Scenario outcome to state that a plain-JSON snapshot
manifest file named by the snapshot timestamp (e.g., 2026-03-22T150000.000Z) is
created, ensure the directory creation clause remains, and keep references to
SnapshotService.CreateAsync and SnapshotService.TimestampFormat for clarity.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@openspec/changes/tree-cache-service/design.md`:
- Around line 39-50: Decision 2 incorrectly describes the on-disk snapshot
artifacts as empty timestamp-named marker files; update the text to reflect that
SnapshotService writes plain JSON snapshot manifests into
~/.arius/{repo}/snapshots/ and that local discovery/validation reads those JSON
files (not empty markers) and still uses SnapshotService.ListBlobNamesAsync and
SnapshotService.TimestampFormat for naming/sorting consistency; also apply the
same correction to the repeated text at the other occurrence (lines referenced
as 89-90) so the document matches the current implementation.

In `@openspec/changes/tree-cache-service/specs/archive-pipeline/spec.md`:
- Line 52: Remove the separate snapshot marker-file requirement from the spec:
delete the sentence that mandates writing an empty marker file named by the
snapshot timestamp to `~/.arius/{repo}/snapshots/` and any duplicate mentions on
lines 62-64, and instead reference that SnapshotService's write-through disk
caching handles cache validation; update the text around the snapshot storage
location `snapshots/<UTC-timestamp>` and the statement "Snapshots SHALL NEVER be
deleted" so the spec no longer requires or describes the marker-file mechanism
and points implementers to use SnapshotService for fast-path cache validation.

In `@openspec/changes/tree-cache-service/tasks.md`:
- Line 9: Remove the stale checklist entries that add WriteSnapshotMarkerAsync
responsibilities: delete the checklist line referencing "Implement
`WriteSnapshotMarkerAsync(string timestamp, CancellationToken)`" (and any
duplicate at the other location) from tasks.md so marker-file creation is not
reintroduced; rely on the refactored SnapshotService write-through behavior
instead and ensure no new task items or references to the
WriteSnapshotMarkerAsync method remain in the file.
- Line 45: Update the checklist wording to reflect actual behavior: change the
phrase "chunk-index L2 directory deleted" to "chunk-index L2 files deleted
(files under the L2 directory removed)". Also ensure the test description for
ValidateAsync mentions verifying ListAsync("filetrees/") was called and that
empty marker files were created on disk and that chunk-index L2 files (not the
directory itself) were removed; reference the ValidateAsync logic in
TreeCacheService (and the ListAsync("filetrees/") call) so it's clear the test
should assert file removals rather than directory deletion.

In `@openspec/specs/tree-cache-service/spec.md`:
- Line 21: Update the spec language for TreeCacheService.ReadAsync to stop
calling the on-disk cache "plaintext JSON" and instead describe it as the
existing line-based text entry format used by EnsureUploadedAsync; state that on
cache miss the downloaded blob SHALL be serialized using TreeBlobSerializer into
the same line-based text format and written to disk before returning, and on
cache hit the file SHALL be read and deserialized via TreeBlobSerializer without
any remote call.
- Around line 31-33: The TreeCacheService.ReadAsync implementation can have
concurrent writers on cache-miss and must be changed to serialize or make writes
atomic per hash: introduce a per-hash lock/mutex map (keyed by the cache hash)
or a concurrent lock striping inside TreeCacheService, acquire the lock when you
detect a cache miss before writing, write to a temporary file and atomically
rename into the cache path (or hold the lock for the entire write+rename), and
finally release and clean up the per-hash lock; update ReadAsync and any helper
methods used for cache population to use this locking/atomic-temp-file pattern
so multiple concurrent callers for the same hash cannot corrupt the disk cache.

In `@src/Arius.Core.Tests/ListQuery/ListQueryHandlerTests.cs`:
- Around line 527-528: Make the ListQueryHandler tests hermetic by not
hardcoding "account"/"container" in MakeHandler; instead generate a fresh
account/container pair per test (e.g. Guid-based names) and pass the same pair
into the ChunkIndexService, TreeCacheService, and the ListQueryHandler so they
share the same repo id and use isolated disk paths; update MakeHandler to accept
account and container parameters (or create a small factory that returns
(handler, account, container)) and ensure any SnapshotService/ChunkIndexService
construction in the tests is created with the same generated account/container
values.

In `@src/Arius.Core/Features/ArchiveCommand/ArchiveCommandHandler.cs`:
- Around line 604-605: Move the tree-cache validation so it runs before any
index lookups to avoid reading stale L2 shard data: call
_treeCache.ValidateAsync(...) at the start of the ArchiveCommandHandler method
(before the first _index.LookupAsync call) so that stale chunk-index entries are
invalidated prior to deduplication; update any control flow that assumes
validated cache state and verify that dedup logic around _index.LookupAsync and
subsequent pointer-only/missing-chunk handling still operates correctly and
remains thread-safe after this reorder.

In `@src/Arius.Core/ServiceCollectionExtensions.cs`:
- Around line 58-64: The TreeCacheService is registered as a singleton while it
contains mutable validation state (_validated) that is shared across concurrent
archive/list/restore handlers, causing race conditions and stale state leaks;
fix by removing mutable instance state from TreeCacheService or change its DI
lifetime so each command gets an isolated instance (e.g., stop using
services.AddSingleton for TreeCacheService and register
per-request/per-operation via services.AddScoped or services.AddTransient), and
update any consumers/constructors that assume singleton behavior; ensure the
validation flag (_validated) is either made thread-safe (immutable or protected
by synchronization) or moved to a per-operation context so archive/list/restore
handlers do not share it.

In `@src/Arius.Core/Shared/FileTree/TreeCacheService.cs`:
- Around line 33-36: The _validated boolean on TreeCacheService is accessed
without synchronization; change ValidateAsync to atomically set the flag using
Interlocked.CompareExchange (or Interlocked.Exchange) so only the first caller
runs the slow validation path and subsequent callers skip it, and remove the
non-atomic assignment at the end of ValidateAsync; in ExistsInRemote read the
flag with a thread-safe read (Volatile.Read or an Interlocked comparison) to
ensure callers see the updated state. Reference the private field _validated and
the methods ValidateAsync and ExistsInRemote when making the change.
- Around line 155-215: The ValidateAsync method can race when multiple callers
run concurrently because _validated may be set before the slow-path work
completes; introduce a SemaphoreSlim (e.g., _validationSemaphore = new
SemaphoreSlim(1,1)) and wrap the body of ValidateAsync with await
_validationSemaphore.WaitAsync(cancellationToken) and a try/finally that
releases the semaphore, so only one caller performs the slow-path
materialization (the blob enumeration and File.WriteAllBytesAsync calls and
chunk-index cleanup) while other callers await and then see _validated==true;
ensure _validated is only set to true inside the try block after all work
(including listing _blobs.ListAsync, creating marker files in _diskCacheDir, and
deleting files in _chunkIndexL2Dir) completes, and handle cancellation via
cancellationToken.ThrowIfCancellationRequested() as already done.
- Around line 170-174: In TreeCacheService where you iterate
_blobs.ListAsync(BlobPaths.Snapshots, cancellationToken) and set latestRemote =
Path.GetFileName(name), don't assume ListAsync yields sorted results; either
remove the "last one is newest" comment or explicitly collect the snapshot
names, parse their ISO8601 timestamps (same approach SnapshotService uses) and
sort by that timestamp before picking the newest; update the logic around
latestRemote and the loop to use the sorted result set (referencing
TreeCacheService, _blobs.ListAsync, BlobPaths.Snapshots, latestRemote and mimic
SnapshotService's parsing/sorting behavior).

In `@src/Arius.Core/Shared/Snapshot/SnapshotService.cs`:
- Around line 266-272: Concurrent calls to WriteToDiskAsync (from
CreateAsync/ResolveAsync) can cause corruption when both call
File.WriteAllBytesAsync to the same path; change WriteToDiskAsync to write to a
unique temp file (e.g., path + ".tmp."+Guid), flush/close it, then atomically
replace the target using File.Replace (or File.Move with overwrite semantics) to
move the temp into path; reference SnapshotManifest, WriteToDiskAsync,
_diskCacheDir, TimestampFormat and ensure you still pass the cancellationToken
to the write and clean up the temp file on error.

---

Nitpick comments:
In `@openspec/changes/tree-cache-service/proposal.md`:
- Line 39: The sentence containing the redundant phrase "CLI interface" should
be updated to remove the tautology: replace "CLI interface" with either "CLI" or
"command-line interface" in the line that currently reads "- **No breaking
changes**: All changes are internal; the CLI interface, blob storage layout, and
wire formats are unchanged." so it becomes "- **No breaking changes**: All
changes are internal; the CLI, blob storage layout, and wire formats are
unchanged." (or use "command-line interface" if you prefer the spelled-out
form).

In `@openspec/changes/tree-cache-service/specs/tree-cache-service/spec.md`:
- Around line 90-103: Update the spec to reflect that
SnapshotService.CreateAsync writes the full plain-JSON snapshot manifest (not an
empty file) into ~/.arius/{repo}/snapshots/ using
SnapshotService.TimestampFormat; change the requirement text and each Scenario
outcome to state that a plain-JSON snapshot manifest file named by the snapshot
timestamp (e.g., 2026-03-22T150000.000Z) is created, ensure the directory
creation clause remains, and keep references to SnapshotService.CreateAsync and
SnapshotService.TimestampFormat for clarity.

In `@src/Arius.Core.Tests/FileTree/TreeCacheServiceTests.cs`:
- Around line 379-393: The test
ExistsInRemote_BeforeValidateAsync_ThrowsInvalidOperationException currently
performs inline cleanup with Directory.Delete; replace those inline deletions
with the shared helper by invoking CleanupAsync with the cacheDir and
snapshotsDir returned from MakeService (if the test is synchronous call
CleanupAsync(...).GetAwaiter().GetResult() or make the test async and await
CleanupAsync(...)) so cleanup is consistent with other tests.
- Around line 51-56: The helper CleanupAsync currently marks the method async
and awaits Task.CompletedTask after performing synchronous Directory.Delete
calls; remove the pointless async/await by turning CleanupAsync into a non-async
method that returns Task.CompletedTask (i.e., drop the async keyword and the
await) or, if you need true async IO, run the deletes inside Task.Run and await
that task; reference the CleanupAsync method and the Directory.Delete calls when
making the change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 316d4c4a-3665-4664-966c-30a629d1bac8

📥 Commits

Reviewing files that changed from the base of the PR and between ad9bc84 and 86f61ac.

📒 Files selected for processing (32)
  • .serena/.gitignore
  • .serena/memories/project_overview.md
  • .serena/memories/style_conventions.md
  • .serena/memories/suggested_commands.md
  • .serena/project.yml
  • openspec/changes/parallel-flush-and-tree-upload/.openspec.yaml
  • openspec/changes/parallel-flush-and-tree-upload/proposal.md
  • openspec/changes/tree-cache-service/.openspec.yaml
  • openspec/changes/tree-cache-service/design.md
  • openspec/changes/tree-cache-service/proposal.md
  • openspec/changes/tree-cache-service/specs/archive-pipeline/spec.md
  • openspec/changes/tree-cache-service/specs/list-query/spec.md
  • openspec/changes/tree-cache-service/specs/restore-pipeline/spec.md
  • openspec/changes/tree-cache-service/specs/tree-cache-service/spec.md
  • openspec/changes/tree-cache-service/tasks.md
  • openspec/specs/archive-pipeline/spec.md
  • openspec/specs/list-query/spec.md
  • openspec/specs/restore-pipeline/spec.md
  • openspec/specs/tree-cache-service/spec.md
  • src/Arius.Core.Tests/Archive/ArchiveRecoveryTests.cs
  • src/Arius.Core.Tests/FileTree/TreeCacheServiceTests.cs
  • src/Arius.Core.Tests/FileTree/TreeServiceTests.cs
  • src/Arius.Core.Tests/ListQuery/ListQueryHandlerTests.cs
  • src/Arius.Core/Features/ArchiveCommand/ArchiveCommandHandler.cs
  • src/Arius.Core/Features/ListQuery/ListQueryHandler.cs
  • src/Arius.Core/Features/RestoreCommand/RestoreCommandHandler.cs
  • src/Arius.Core/ServiceCollectionExtensions.cs
  • src/Arius.Core/Shared/FileTree/TreeBlobSerializer.cs
  • src/Arius.Core/Shared/FileTree/TreeCacheService.cs
  • src/Arius.Core/Shared/FileTree/TreeService.cs
  • src/Arius.Core/Shared/Snapshot/SnapshotService.cs
  • src/Arius.Integration.Tests/Snapshot/SnapshotServiceIntegrationTests.cs
💤 Files with no reviewable changes (5)
  • .serena/.gitignore
  • .serena/memories/suggested_commands.md
  • .serena/project.yml
  • .serena/memories/style_conventions.md
  • .serena/memories/project_overview.md

Comment on lines +39 to +50
### Decision 2: Snapshot validation via timestamp-named marker files in `~/.arius/{repo}/snapshots/`

**Choice**: After each archive, write an empty marker file named by the snapshot timestamp to `~/.arius/{repo}/snapshots/` (e.g., `2026-03-22T150000.000Z`). To determine the latest local snapshot, enumerate the directory and sort by parsed timestamp — the same approach `SnapshotService.ListBlobNamesAsync` uses for remote snapshots.

**Rationale**: The `snapshots/` directory mirrors the blob container's virtual directory layout, keeping the local cache structure intuitive (`ls ~/.arius/{repo}/` shows `chunk-index/`, `filetrees/`, `snapshots/` — same as the container). Marker files named by timestamp give a natural local history of all snapshots this machine has written, and inferring the latest requires only enumerating and sorting — trivially cheap for a directory with typically < 100 entries. The `SnapshotService.TimestampFormat` (`"yyyy-MM-ddTHHmmss.fffZ"`) sorts lexicographically, so a simple string sort suffices.

On archive start:
1. Enumerate `snapshots/` directory → sort → latest local timestamp (or null if empty).
2. Call `ListAsync("snapshots/")` to get the latest remote snapshot name → remote timestamp. (This is a lightweight call — typically < 100 snapshots.)
3. Compare: if local == remote, **fast path**. If mismatch (or no local markers), **slow path**.

**Alternatives considered**: Using a single `.last` file containing the latest timestamp. Rejected — timestamp-named marker files mirror the remote layout more faithfully and provide a natural local history without additional complexity (the timestamp format already sorts lexicographically).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Document snapshots/ as JSON cache files, not marker files.

These sections still describe empty marker files in ~/.arius/{repo}/snapshots/, but the implementation now stores plain-JSON snapshot manifests there via SnapshotService. Leaving the old mechanism here will misdocument how local snapshot discovery and validation actually work.

Also applies to: 89-90

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@openspec/changes/tree-cache-service/design.md` around lines 39 - 50, Decision
2 incorrectly describes the on-disk snapshot artifacts as empty timestamp-named
marker files; update the text to reflect that SnapshotService writes plain JSON
snapshot manifests into ~/.arius/{repo}/snapshots/ and that local
discovery/validation reads those JSON files (not empty markers) and still uses
SnapshotService.ListBlobNamesAsync and SnapshotService.TimestampFormat for
naming/sorting consistency; also apply the same correction to the repeated text
at the other occurrence (lines referenced as 89-90) so the document matches the
current implementation.

- **THEN** the tree hash SHALL be SHA256 of the UTF-8 encoded text bytes, optionally passphrase-seeded via `IEncryptionService.ComputeHash`

### Requirement: Snapshot creation
The system SHALL create a snapshot manifest after tree construction completes. The snapshot SHALL be stored at `snapshots/<UTC-timestamp>` (e.g., `2026-03-22T150000.000Z`) and SHALL contain the root tree hash, timestamp, file count, total size, and Arius version. Snapshots SHALL NEVER be deleted. After successful snapshot creation, the pipeline SHALL write an empty marker file named by the snapshot timestamp to `~/.arius/{repo}/snapshots/` to enable fast-path cache validation on subsequent runs.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Remove the separate snapshot marker-file requirement.

This conflicts with the refactor objective that SnapshotService write-through disk caching removes separate marker-file usage. Keeping this requirement will drift implementation/tests back toward redundant marker handling.

Also applies to: 62-64

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@openspec/changes/tree-cache-service/specs/archive-pipeline/spec.md` at line
52, Remove the separate snapshot marker-file requirement from the spec: delete
the sentence that mandates writing an empty marker file named by the snapshot
timestamp to `~/.arius/{repo}/snapshots/` and any duplicate mentions on lines
62-64, and instead reference that SnapshotService's write-through disk caching
handles cache validation; update the text around the snapshot storage location
`snapshots/<UTC-timestamp>` and the statement "Snapshots SHALL NEVER be deleted"
so the spec no longer requires or describes the marker-file mechanism and points
implementers to use SnapshotService for fast-path cache validation.

- [x] 1.4 Implement `WriteAsync(string hash, TreeBlob, CancellationToken)`: serialize via `TreeBlobSerializer.SerializeForStorageAsync` → upload to Azure (`overwrite: false`, catch `BlobAlreadyExistsException`) → write plaintext to disk cache.
- [x] 1.5 Implement `ValidateAsync(CancellationToken)`: enumerate `snapshots/` directory for timestamp-named marker files → sort → latest local timestamp. Call `ListAsync("snapshots/")` to find latest remote → compare. On mismatch: `ListAsync("filetrees/")` → create empty marker file on disk for each remote blob not already cached; delete chunk-index L2 directory files.
- [x] 1.6 Implement `ExistsInRemote(string hash)`: always `File.Exists` on disk (works on both paths after `ValidateAsync`). Throw `InvalidOperationException` if called before `ValidateAsync`.
- [x] 1.7 Implement `WriteSnapshotMarkerAsync(string timestamp, CancellationToken)`: create empty file at `~/.arius/{repo}/snapshots/{timestamp}`. Create directory if needed.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Drop WriteSnapshotMarkerAsync tasks; they’re stale with the SnapshotService refactor.

These checklist items reintroduce separate marker-file responsibilities that the SnapshotService write-through change was meant to eliminate.

Also applies to: 32-32

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@openspec/changes/tree-cache-service/tasks.md` at line 9, Remove the stale
checklist entries that add WriteSnapshotMarkerAsync responsibilities: delete the
checklist line referencing "Implement `WriteSnapshotMarkerAsync(string
timestamp, CancellationToken)`" (and any duplicate at the other location) from
tasks.md so marker-file creation is not reintroduced; rely on the refactored
SnapshotService write-through behavior instead and ensure no new task items or
references to the WriteSnapshotMarkerAsync method remain in the file.

- [x] 7.3 Unit test: `WriteAsync` — verify Azure upload and disk cache write.
- [x] 7.4 Unit test: `WriteAsync` with existing blob — verify `BlobAlreadyExistsException` is caught and disk cache is still written.
- [x] 7.5 Unit test: `ValidateAsync` snapshot match — verify no `ListAsync("filetrees/")` call.
- [x] 7.6 Unit test: `ValidateAsync` snapshot mismatch — verify `ListAsync("filetrees/")` called, empty marker files created on disk, chunk-index L2 directory deleted.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Adjust invalidation wording: files are deleted, not the L2 directory.

Current implementation in src/Arius.Core/Shared/FileTree/TreeCacheService.cs removes files under chunk-index L2; it does not delete the directory itself.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@openspec/changes/tree-cache-service/tasks.md` at line 45, Update the
checklist wording to reflect actual behavior: change the phrase "chunk-index L2
directory deleted" to "chunk-index L2 files deleted (files under the L2
directory removed)". Also ensure the test description for ValidateAsync mentions
verifying ListAsync("filetrees/") was called and that empty marker files were
created on disk and that chunk-index L2 files (not the directory itself) were
removed; reference the ValidateAsync logic in TreeCacheService (and the
ListAsync("filetrees/") call) so it's clear the test should assert file removals
rather than directory deletion.

- **THEN** `TreeCacheService` SHALL follow the same `{Thing}Service` pattern as `ChunkIndexService`, located in `Shared/FileTree/`

### Requirement: TreeCacheService.ReadAsync
`TreeCacheService.ReadAsync(string hash, CancellationToken)` SHALL return a deserialized `TreeBlob` by checking the local disk cache first, then falling back to Azure download. On cache miss, the downloaded blob SHALL be deserialized and written to the disk cache (plaintext JSON, matching the existing `EnsureUploadedAsync` cache format) before returning. On cache hit, the blob SHALL be read from disk and deserialized without any remote call. The method SHALL handle deserialization and decryption internally via `TreeBlobSerializer`.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Fix tree-blob cache format wording ("plaintext JSON" is inaccurate).

Tree blobs are specified elsewhere in this PR as line-based text entries; this line should not describe the on-disk tree cache as JSON.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@openspec/specs/tree-cache-service/spec.md` at line 21, Update the spec
language for TreeCacheService.ReadAsync to stop calling the on-disk cache
"plaintext JSON" and instead describe it as the existing line-based text entry
format used by EnsureUploadedAsync; state that on cache miss the downloaded blob
SHALL be serialized using TreeBlobSerializer into the same line-based text
format and written to disk before returning, and on cache hit the file SHALL be
read and deserialized via TreeBlobSerializer without any remote call.

Comment on lines +58 to +64
// Tree cache
services.AddSingleton(sp =>
new TreeCacheService(
sp.GetRequiredService<IBlobContainerService>(),
sp.GetRequiredService<IEncryptionService>(),
accountName,
containerName));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Singleton TreeCacheService shares unsynchronized mutable state across handlers.

Line 59 registers TreeCacheService as a singleton, but the service keeps mutable validation state (_validated). With archive/list/restore handlers all holding this shared instance, concurrent command execution can race validation and leak stale state across operations. Consider removing mutable state from the singleton or changing lifetimes so each command gets an isolated cache service instance.

As per coding guidelines, src/Arius.Core/**: “Pay close attention to deduplication logic, encryption correctness, and thread safety.”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Arius.Core/ServiceCollectionExtensions.cs` around lines 58 - 64, The
TreeCacheService is registered as a singleton while it contains mutable
validation state (_validated) that is shared across concurrent
archive/list/restore handlers, causing race conditions and stale state leaks;
fix by removing mutable instance state from TreeCacheService or change its DI
lifetime so each command gets an isolated instance (e.g., stop using
services.AddSingleton for TreeCacheService and register
per-request/per-operation via services.AddScoped or services.AddTransient), and
update any consumers/constructors that assume singleton behavior; ensure the
validation flag (_validated) is either made thread-safe (immutable or protected
by synchronization) or moved to a per-operation context so archive/list/restore
handlers do not share it.

Comment on lines +33 to +36
/// <summary>
/// Guard ensuring <see cref="ExistsInRemote"/> is not called before <see cref="ValidateAsync"/>.
/// </summary>
private bool _validated;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Thread safety concern: _validated flag is not synchronized.

The _validated boolean is read and written without synchronization. If ValidateAsync is called concurrently from multiple threads (e.g., during parallel handler initialization), a race condition could cause multiple threads to execute the slow path simultaneously.

Consider using Interlocked.CompareExchange or a SemaphoreSlim for proper synchronization.

🔒 Suggested fix using Interlocked
-    /// <summary>
-    /// Guard ensuring <see cref="ExistsInRemote"/> is not called before <see cref="ValidateAsync"/>.
-    /// </summary>
-    private bool _validated;
+    /// <summary>
+    /// Guard ensuring <see cref="ExistsInRemote"/> is not called before <see cref="ValidateAsync"/>.
+    /// 0 = not validated, 1 = validated.
+    /// </summary>
+    private int _validated;

Then in ValidateAsync:

-        if (_validated) return;
+        if (Interlocked.CompareExchange(ref _validated, 1, 0) == 1) return;

And at the end of ValidateAsync, remove the assignment since it's done atomically above.

In ExistsInRemote:

-        if (!_validated)
+        if (Volatile.Read(ref _validated) == 0)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Arius.Core/Shared/FileTree/TreeCacheService.cs` around lines 33 - 36, The
_validated boolean on TreeCacheService is accessed without synchronization;
change ValidateAsync to atomically set the flag using
Interlocked.CompareExchange (or Interlocked.Exchange) so only the first caller
runs the slow validation path and subsequent callers skip it, and remove the
non-atomic assignment at the end of ValidateAsync; in ExistsInRemote read the
flag with a thread-safe read (Volatile.Read or an Interlocked comparison) to
ensure callers see the updated state. Reference the private field _validated and
the methods ValidateAsync and ExistsInRemote when making the change.

Comment on lines +155 to +215
public async Task ValidateAsync(CancellationToken cancellationToken = default)
{
if (_validated) return;

// Latest local snapshot filename (lexicographic = chronological due to timestamp format)
var latestLocal = Directory.Exists(_snapshotsDir)
? Directory.EnumerateFiles(_snapshotsDir)
.Select(Path.GetFileName)
.Where(n => n is not null)
.OrderByDescending(n => n, StringComparer.Ordinal)
.FirstOrDefault()
: null;

// Latest remote snapshot
string? latestRemote = null;
await foreach (var name in _blobs.ListAsync(BlobPaths.Snapshots, cancellationToken))
{
// ListAsync returns names sorted; last one is newest
latestRemote = Path.GetFileName(name); // strip "snapshots/" prefix
}

// If there are no remote snapshots, the repository is empty — fast path.
if (latestRemote is null)
{
_validated = true;
return;
}

// Fast path: this machine wrote the last snapshot.
if (latestLocal is not null &&
string.Equals(latestLocal, latestRemote, StringComparison.Ordinal))
{
_validated = true;
return;
}

// Slow path: snapshot mismatch (or no local snapshot at all).
// Materialize empty marker files for all remote filetree blobs not yet cached.
await foreach (var blobName in _blobs.ListAsync(BlobPaths.FileTrees, cancellationToken))
{
cancellationToken.ThrowIfCancellationRequested();
var hash = Path.GetFileName(blobName); // strip "filetrees/" prefix
var diskPath = Path.Combine(_diskCacheDir, hash);
if (!File.Exists(diskPath))
{
// Create an empty marker file (will be filled by ReadAsync on demand)
await File.WriteAllBytesAsync(diskPath, [], cancellationToken);
}
}

// Invalidate chunk-index L2 (another machine may have updated shards)
if (Directory.Exists(_chunkIndexL2Dir))
{
foreach (var file in Directory.EnumerateFiles(_chunkIndexL2Dir))
{
try { File.Delete(file); } catch { /* ignore individual failures */ }
}
}

_validated = true;
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

ValidateAsync sets _validated = true before completing slow-path work.

If using the atomic approach, note that the current code sets _validated = true at line 214, which is correct placement (after all work is done). However, with the Interlocked.CompareExchange pattern, you'd need to restructure to avoid the second call executing before the first completes its slow-path work.

A safer pattern would use a SemaphoreSlim(1,1) to serialize validation:

🔒 Alternative: SemaphoreSlim for serialization
+    private readonly SemaphoreSlim _validateLock = new(1, 1);
     private bool _validated;

     public async Task ValidateAsync(CancellationToken cancellationToken = default)
     {
-        if (_validated) return;
+        if (_validated) return;
+        
+        await _validateLock.WaitAsync(cancellationToken);
+        try
+        {
+            if (_validated) return; // double-check after acquiring lock
+            
+            // ... existing validation logic ...
+            
+            _validated = true;
+        }
+        finally
+        {
+            _validateLock.Release();
+        }
-        _validated = true;
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Arius.Core/Shared/FileTree/TreeCacheService.cs` around lines 155 - 215,
The ValidateAsync method can race when multiple callers run concurrently because
_validated may be set before the slow-path work completes; introduce a
SemaphoreSlim (e.g., _validationSemaphore = new SemaphoreSlim(1,1)) and wrap the
body of ValidateAsync with await
_validationSemaphore.WaitAsync(cancellationToken) and a try/finally that
releases the semaphore, so only one caller performs the slow-path
materialization (the blob enumeration and File.WriteAllBytesAsync calls and
chunk-index cleanup) while other callers await and then see _validated==true;
ensure _validated is only set to true inside the try block after all work
(including listing _blobs.ListAsync, creating marker files in _diskCacheDir, and
deleting files in _chunkIndexL2Dir) completes, and handle cancellation via
cancellationToken.ThrowIfCancellationRequested() as already done.

Comment on lines +170 to +174
await foreach (var name in _blobs.ListAsync(BlobPaths.Snapshots, cancellationToken))
{
// ListAsync returns names sorted; last one is newest
latestRemote = Path.GetFileName(name); // strip "snapshots/" prefix
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for ListAsync documentation or implementation to verify sorting guarantee
rg -n "ListAsync" --type cs -A 5 -B 2 | head -100

Repository: woutervanranst/Arius7

Length of output: 9995


🏁 Script executed:

rg -n "BlobPaths" --type cs -A 3 -B 1 | head -80

Repository: woutervanranst/Arius7

Length of output: 8119


🏁 Script executed:

rg -n "ParseTimestamp|Snapshots" --type cs -B 2 -A 8 | head -120

Repository: woutervanranst/Arius7

Length of output: 11460


🏁 Script executed:

rg -n "TimestampFormat" --type cs -B 2 -A 2

Repository: woutervanranst/Arius7

Length of output: 3785


ListAsync does not guarantee sorted ordering — TreeCacheService should not assume "last item is newest".

The IBlobContainerService.ListAsync interface provides no ordering guarantee. While snapshot blob names follow an ISO 8601 format (yyyy-MM-ddTHHmmss.fffZ) that is lexicographically sortable, the interface contract itself does not document this. SnapshotService explicitly sorts results from ListAsync by parsed timestamp (lines 209–212), which indicates this codebase treats ListAsync order as unreliable. The comment assumes undocumented behavior and should either be removed or the code should sort results to be consistent with SnapshotService.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Arius.Core/Shared/FileTree/TreeCacheService.cs` around lines 170 - 174,
In TreeCacheService where you iterate _blobs.ListAsync(BlobPaths.Snapshots,
cancellationToken) and set latestRemote = Path.GetFileName(name), don't assume
ListAsync yields sorted results; either remove the "last one is newest" comment
or explicitly collect the snapshot names, parse their ISO8601 timestamps (same
approach SnapshotService uses) and sort by that timestamp before picking the
newest; update the logic around latestRemote and the loop to use the sorted
result set (referencing TreeCacheService, _blobs.ListAsync, BlobPaths.Snapshots,
latestRemote and mimic SnapshotService's parsing/sorting behavior).

Comment on lines +266 to +272
private async Task WriteToDiskAsync(SnapshotManifest manifest, CancellationToken cancellationToken)
{
var fileName = manifest.Timestamp.UtcDateTime.ToString(TimestampFormat);
var path = Path.Combine(_diskCacheDir, fileName);
var json = JsonSerializer.SerializeToUtf8Bytes(manifest, s_localJsonOptions);
await File.WriteAllBytesAsync(path, json, cancellationToken);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Potential race condition on concurrent writes to same snapshot file.

If two CreateAsync or ResolveAsync calls target the same timestamp concurrently (e.g., during testing or edge-case recovery), File.WriteAllBytesAsync could corrupt the file. Since snapshots are typically created sequentially and timestamps include milliseconds, this is low risk in practice.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Arius.Core/Shared/Snapshot/SnapshotService.cs` around lines 266 - 272,
Concurrent calls to WriteToDiskAsync (from CreateAsync/ResolveAsync) can cause
corruption when both call File.WriteAllBytesAsync to the same path; change
WriteToDiskAsync to write to a unique temp file (e.g., path + ".tmp."+Guid),
flush/close it, then atomically replace the target using File.Replace (or
File.Move with overwrite semantics) to move the temp into path; reference
SnapshotManifest, WriteToDiskAsync, _diskCacheDir, TimestampFormat and ensure
you still pass the cancellationToken to the write and clean up the temp file on
error.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
src/Arius.Integration.Tests/Pipeline/PipelineFixture.cs (1)

118-131: Reuse a fixture-scoped TreeCacheService instead of allocating one per handler call.

Current factories recreate the cache service every time, which resets its internal validation state and adds avoidable overhead. A single fixture instance would better match production DI behavior.

♻️ Suggested refactor
 public sealed class PipelineFixture : IAsyncDisposable
 {
+    private TreeCacheService TreeCache { get; set; } = null!;
@@
     private async Task InitAsync(string? passphrase, CancellationToken ct)
     {
@@
         Index      = new ChunkIndexService(BlobContainer, Encryption, Account, container.Name);
+        TreeCache  = new TreeCacheService(BlobContainer, Encryption, Account, container.Name);
         Mediator   = Substitute.For<IMediator>();
@@
     private async Task InitAsyncWithEncryption(
         IEncryptionService   encryption,
         BlobContainerClient? existingContainer,
         CancellationToken    ct)
     {
@@
         Encryption = encryption;
         Index      = new ChunkIndexService(BlobContainer, Encryption, Account, Container.Name);
+        TreeCache  = new TreeCacheService(BlobContainer, Encryption, Account, Container.Name);
         Mediator   = Substitute.For<IMediator>();
@@
     public ArchiveCommandHandler CreateArchiveHandler() =>
-        new(BlobContainer, Encryption, Index, new TreeCacheService(BlobContainer, Encryption, Account, Container.Name), Mediator,
+        new(BlobContainer, Encryption, Index, TreeCache, Mediator,
             NullLogger<ArchiveCommandHandler>.Instance,
             Account, Container.Name);

     public RestoreCommandHandler CreateRestoreHandler() =>
-        new(BlobContainer, Encryption, Index, new TreeCacheService(BlobContainer, Encryption, Account, Container.Name), Mediator,
+        new(BlobContainer, Encryption, Index, TreeCache, Mediator,
             NullLogger<RestoreCommandHandler>.Instance,
             Account, Container.Name);

     public ListQueryHandler CreateListQueryHandler() =>
-        new(BlobContainer, Encryption, Index, new TreeCacheService(BlobContainer, Encryption, Account, Container.Name),
+        new(BlobContainer, Encryption, Index, TreeCache,
             NullLogger<ListQueryHandler>.Instance,
             Account, Container.Name);
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Arius.Integration.Tests/Pipeline/PipelineFixture.cs` around lines 118 -
131, The handlers (CreateArchiveHandler, CreateRestoreHandler,
CreateListQueryHandler) each instantiate a new TreeCacheService causing resets
and extra work; instead add a fixture-scoped TreeCacheService field on
PipelineFixture, instantiate it once (e.g., in the fixture constructor or
setup), and pass that single TreeCacheService instance into each handler factory
rather than creating new ones inline so the cache's validation/state persists
and DI behavior is mirrored.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/Arius.Integration.Tests/Pipeline/PipelineFixture.cs`:
- Around line 118-131: The handlers (CreateArchiveHandler, CreateRestoreHandler,
CreateListQueryHandler) each instantiate a new TreeCacheService causing resets
and extra work; instead add a fixture-scoped TreeCacheService field on
PipelineFixture, instantiate it once (e.g., in the fixture constructor or
setup), and pass that single TreeCacheService instance into each handler factory
rather than creating new ones inline so the cache's validation/state persists
and DI behavior is mirrored.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ccfc7b26-3d04-46ea-8fd3-f256fad17f1b

📥 Commits

Reviewing files that changed from the base of the PR and between 86f61ac and 71b5c25.

📒 Files selected for processing (7)
  • src/Arius.E2E.Tests/E2ETests.cs
  • src/Arius.E2E.Tests/RehydrationE2ETests.cs
  • src/Arius.Integration.Tests/FileTree/TreeBuilderIntegrationTests.cs
  • src/Arius.Integration.Tests/Pipeline/ContainerCreationTests.cs
  • src/Arius.Integration.Tests/Pipeline/CrashRecoveryTests.cs
  • src/Arius.Integration.Tests/Pipeline/PipelineFixture.cs
  • src/Arius.Integration.Tests/Pipeline/RehydrationStateTests.cs

…date test fixtures

Register SnapshotService as singleton in ServiceCollectionExtensions alongside
TreeCacheService and ChunkIndexService, inject it via constructor into
ArchiveCommandHandler, RestoreCommandHandler, and ListQueryHandler, and update
all test files (PipelineFixture, CrashRecoveryTests, ContainerCreationTests,
RehydrationStateTests, E2ETests, RehydrationE2ETests) to pass the new arg.
Also notes DI singleton registration in the tree-cache-service spec.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/Arius.Core/Features/RestoreCommand/RestoreCommandHandler.cs (1)

98-118: ⚠️ Potential issue | 🔴 Critical

Validate the repo caches before restore starts using them.

TreeCacheService.ValidateAsync() is the code path that invalidates stale chunk-index L2 data when the local snapshot markers no longer match remote. This handler never calls it, so Line 187 can still read old shard data and fall into the current “Content hash not found in index, skipping” path for files that do exist remotely. Please run validation once near the top of Handle before snapshot/tree/index work begins.

🔁 Suggested fix
         // ── Ensure container exists ───────────────────────────────────────────
         await _blobs.CreateContainerIfNotExistsAsync(cancellationToken);

         try
         {
+            await _treeCache.ValidateAsync(cancellationToken);
+
             // ── Step 1: Resolve snapshot ──────────────────────────────────────
             var snapshot    = await _snapshotSvc.ResolveAsync(opts.Version, cancellationToken);

As per coding guidelines, src/Arius.Core/**: “Core library. Pay close attention to deduplication logic, encryption correctness, and thread safety. Flag any changes that could cause data loss or corruption.”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Arius.Core/Features/RestoreCommand/RestoreCommandHandler.cs` around lines
98 - 118, Call TreeCacheService.ValidateAsync once at the start of
RestoreCommandHandler.Handle (before any snapshot/tree/index work such as
_snapshotSvc.ResolveAsync or CollectFilesAsync) to ensure local caches are
validated and stale L2 chunk-index data is invalidated; invoke it with the
current cancellationToken (e.g. await
TreeCacheService.ValidateAsync(cancellationToken)) and propagate or log any
errors per existing error-handling patterns so validation runs before reading
shards/indexes.
♻️ Duplicate comments (2)
src/Arius.Core/ServiceCollectionExtensions.cs (1)

60-65: ⚠️ Potential issue | 🟠 Major

Don't capture TreeCacheService in singleton handlers.

TreeCacheService carries mutable repo state (_validated) and cache-file side effects, but this registration makes one instance shared by all archive/list/restore operations. Once one operation flips _validated, later operations in the same process can skip revalidation even after the remote repo changes. Either make TreeCacheService stateless/thread-safe, or stop registering both it and the handlers that capture it as singletons.

As per coding guidelines, src/Arius.Core/**: “Core library. Pay close attention to deduplication logic, encryption correctness, and thread safety. Flag any changes that could cause data loss or corruption.”

Also applies to: 83-113

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Arius.Core/ServiceCollectionExtensions.cs` around lines 60 - 65, The
TreeCacheService is stateful (field _validated and cache-file side effects) and
must not be registered as a process-wide singleton; change the DI registration
in ServiceCollectionExtensions (the AddSingleton that constructs new
TreeCacheService) to a non-singleton lifetime (e.g., AddScoped or AddTransient)
and ensure any handlers/services that currently capture TreeCacheService as
singletons are also converted to a compatible shorter lifetime (or refactored to
not hold state) so each archive/list/restore operation gets its own
TreeCacheService instance; locate registrations and handler constructors that
accept TreeCacheService to update their lifetimes accordingly.
src/Arius.Core/Features/ArchiveCommand/ArchiveCommandHandler.cs (1)

607-608: ⚠️ Potential issue | 🔴 Critical

Move cache validation ahead of the first dedup lookup.

ValidateAsync is what clears stale chunk-index L2 state when the latest remote snapshot no longer matches local. Here it still runs after dedup has already hit Line 227, so those lookups can use stale shard data. The downstream effect is missed remote dedup hits and pointer-only files being skipped as “missing chunk”. Call it once near the start of Handle, before any _index.LookupAsync(...).

🔁 Suggested move
         // ── Ensure container exists ───────────────────────────────────────────
         await _blobs.CreateContainerIfNotExistsAsync(cancellationToken);
+        await _treeCache.ValidateAsync(cancellationToken);

         // Validate options (task 8.13)
         if (opts is { RemoveLocal: true, NoPointers: true })
             return new ArchiveResult
@@
-            // Task 5.1: Validate tree cache before building tree (compare local/remote snapshot markers)
-            await _treeCache.ValidateAsync(cancellationToken);
-
             // Task 8.10: Index flush
             await _index.FlushAsync(cancellationToken);

As per coding guidelines, src/Arius.Core/**: “Core library. Pay close attention to deduplication logic, encryption correctness, and thread safety. Flag any changes that could cause data loss or corruption.”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Arius.Core/Features/ArchiveCommand/ArchiveCommandHandler.cs` around lines
607 - 608, Move the cache validation earlier in the request flow: call
_treeCache.ValidateAsync(cancellationToken) at the start of the
ArchiveCommandHandler.Handle method (before any calls to _index.LookupAsync) so
stale L2 shard state is cleared prior to dedup lookups; locate the Handle method
and the first _index.LookupAsync usage and reposition the ValidateAsync call to
execute immediately after entering Handle and before any dedup/index operations.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@src/Arius.Core/Features/RestoreCommand/RestoreCommandHandler.cs`:
- Around line 98-118: Call TreeCacheService.ValidateAsync once at the start of
RestoreCommandHandler.Handle (before any snapshot/tree/index work such as
_snapshotSvc.ResolveAsync or CollectFilesAsync) to ensure local caches are
validated and stale L2 chunk-index data is invalidated; invoke it with the
current cancellationToken (e.g. await
TreeCacheService.ValidateAsync(cancellationToken)) and propagate or log any
errors per existing error-handling patterns so validation runs before reading
shards/indexes.

---

Duplicate comments:
In `@src/Arius.Core/Features/ArchiveCommand/ArchiveCommandHandler.cs`:
- Around line 607-608: Move the cache validation earlier in the request flow:
call _treeCache.ValidateAsync(cancellationToken) at the start of the
ArchiveCommandHandler.Handle method (before any calls to _index.LookupAsync) so
stale L2 shard state is cleared prior to dedup lookups; locate the Handle method
and the first _index.LookupAsync usage and reposition the ValidateAsync call to
execute immediately after entering Handle and before any dedup/index operations.

In `@src/Arius.Core/ServiceCollectionExtensions.cs`:
- Around line 60-65: The TreeCacheService is stateful (field _validated and
cache-file side effects) and must not be registered as a process-wide singleton;
change the DI registration in ServiceCollectionExtensions (the AddSingleton that
constructs new TreeCacheService) to a non-singleton lifetime (e.g., AddScoped or
AddTransient) and ensure any handlers/services that currently capture
TreeCacheService as singletons are also converted to a compatible shorter
lifetime (or refactored to not hold state) so each archive/list/restore
operation gets its own TreeCacheService instance; locate registrations and
handler constructors that accept TreeCacheService to update their lifetimes
accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: dbe888b0-8eea-48cb-adb0-e6df22a85b16

📥 Commits

Reviewing files that changed from the base of the PR and between 71b5c25 and e87ae61.

📒 Files selected for processing (11)
  • openspec/specs/tree-cache-service/spec.md
  • src/Arius.Core/Features/ArchiveCommand/ArchiveCommandHandler.cs
  • src/Arius.Core/Features/ListQuery/ListQueryHandler.cs
  • src/Arius.Core/Features/RestoreCommand/RestoreCommandHandler.cs
  • src/Arius.Core/ServiceCollectionExtensions.cs
  • src/Arius.E2E.Tests/E2ETests.cs
  • src/Arius.E2E.Tests/RehydrationE2ETests.cs
  • src/Arius.Integration.Tests/Pipeline/ContainerCreationTests.cs
  • src/Arius.Integration.Tests/Pipeline/CrashRecoveryTests.cs
  • src/Arius.Integration.Tests/Pipeline/PipelineFixture.cs
  • src/Arius.Integration.Tests/Pipeline/RehydrationStateTests.cs
✅ Files skipped from review due to trivial changes (2)
  • src/Arius.E2E.Tests/E2ETests.cs
  • openspec/specs/tree-cache-service/spec.md
🚧 Files skipped from review as they are similar to previous changes (4)
  • src/Arius.Integration.Tests/Pipeline/PipelineFixture.cs
  • src/Arius.Core/Features/ListQuery/ListQueryHandler.cs
  • src/Arius.Integration.Tests/Pipeline/ContainerCreationTests.cs
  • src/Arius.E2E.Tests/RehydrationE2ETests.cs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant