feat: optimize restoring duplicate files from TAR chunk #73
📝 Walkthrough
The restore loop now pattern-matches tar entries, streams the first destination directly from the tar entry's DataStream, creates duplicates by copying the written file (or creates an empty file when there is no stream), and derives the restored size from the on-disk file length. Tests switched to direct
Changes
Sequence Diagram(s)
sequenceDiagram
participant Handler as RestoreCommandHandler
participant Tar as TarEntry
participant FS as FileSystem
participant Bus as EventBus
Handler->>Tar: enumerate tar entries (pattern match)
alt entry has DataStream
Handler->>FS: stream-write first destination from DataStream
FS-->>Handler: file written
Handler->>FS: copy written file for duplicate paths
FS-->>Handler: duplicates created
else no DataStream
Handler->>FS: create empty file at path
FS-->>Handler: empty file created
end
Handler->>FS: stat file -> get on-disk length
FS-->>Handler: FileInfo.Length
Handler->>Bus: publish FileRestoredEvent (size from disk)
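The flow in the diagram can be sketched roughly as follows. This is a minimal illustration, not the actual `RestoreCommandHandler` code: the class `DuplicateRestoreSketch` and the method `RestoreToAll` are hypothetical names, and the sketch assumes the tar entries come from `System.Formats.Tar` (.NET 7 or later), as the `DataStream` property suggests.

```csharp
using System.Collections.Generic;
using System.Formats.Tar;
using System.IO;

// Hypothetical helper for illustration only; names are assumptions, not Arius APIs.
public static class DuplicateRestoreSketch
{
    // Restores one tar entry to every destination path that shares its content
    // and returns the size of the first restored file as reported by the file system.
    public static long RestoreToAll(TarEntry entry, IReadOnlyList<string> destinations)
    {
        if (destinations.Count == 0)
            return 0;

        // Make sure every target directory exists before writing.
        foreach (var path in destinations)
        {
            var directory = Path.GetDirectoryName(Path.GetFullPath(path));
            if (!string.IsNullOrEmpty(directory))
                Directory.CreateDirectory(directory);
        }

        var first = destinations[0];

        if (entry.DataStream is { } data)
        {
            // Stream the tar payload once, straight into the first destination.
            using (var output = File.Create(first))
            {
                data.CopyTo(output);
            }

            // Every further path is a plain file copy; the tar stream is not read again.
            for (var i = 1; i < destinations.Count; i++)
                File.Copy(first, destinations[i], overwrite: true);
        }
        else
        {
            // No data stream: create an empty file at each path.
            foreach (var path in destinations)
                File.Create(path).Dispose();
        }

        // The size comes from disk, not from the tar header.
        return new FileInfo(first).Length;
    }
}
```

Copying the already written file keeps the tar stream forward-only, so the entry does not have to be buffered in memory or read a second time to serve the duplicate paths.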
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related issues
Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/Arius.Core.Tests/Features/RestoreCommand/RestoreCommandHandlerTests.cs`:
- Around line 108-166: The test
Handle_Restores_DuplicateTarEntryContent_ToAllPaths_WithPerPathMetadata should
be extended to add a pair of zero-byte files (e.g., WriteFile(localRoot,
"zero/a.txt", Array.Empty<byte>(), created, modified) and WriteFile(localRoot,
"zero/b.txt", Array.Empty<byte>(), created2, modified2)) so the archive/restore
flow hits the tarEntry.DataStream is null branch in RestoreCommandHandler and
exercises pointer-file creation and per-path timestamps; after archiving and
restoring assert restoreResult.FilesRestored increments accordingly and add
AssertRestoredFile checks for both zero-byte paths with their respective
created/modified timestamps to validate the per-path metadata behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 61924f74-ffe2-4875-9055-5b2ca91ac7d7
📒 Files selected for processing (2)
src/Arius.Core.Tests/Features/RestoreCommand/RestoreCommandHandlerTests.cs
src/Arius.Core/Features/RestoreCommand/RestoreCommandHandler.cs
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@           Coverage Diff           @@
##           master      #73   +/-   ##
=======================================
  Coverage   78.23%   78.24%
=======================================
  Files          70       70
  Lines        4931     4932    +1
  Branches      680      679    -1
=======================================
+ Hits         3858     3859    +1
  Misses        909      909
  Partials      164      164
🧹 Nitpick comments (1)
src/Arius.Core.Tests/Features/RestoreCommand/RestoreCommandHandlerTests.cs (1)
164-165: Also assert pointer file payload content, not only existence/timestamps.
Right now these tests can pass even if the `.pointer.arius` file contains the wrong hash. Add a payload assertion to tighten regression protection.
♻️ Proposed test assertion enhancement
-        AssertRestoredFile("archives/duplicates/copy-a.bin", content, firstCreated, firstModified);
-        AssertRestoredFile("nested/deep/a/b/c/copy-b.bin", content, secondCreated, secondModified);
+        var expectedPointerHash = ContentHash.Parse(encryption.ComputeHash(content).ToString()).ToString();
+        AssertRestoredFile("archives/duplicates/copy-a.bin", content, expectedPointerHash, firstCreated, firstModified);
+        AssertRestoredFile("nested/deep/a/b/c/copy-b.bin", content, expectedPointerHash, secondCreated, secondModified);

-        void AssertRestoredFile(string relativePath, byte[] expectedContent, DateTime expectedCreated, DateTime expectedModified)
+        void AssertRestoredFile(string relativePath, byte[] expectedContent, string expectedPointerHash, DateTime expectedCreated, DateTime expectedModified)
         {
             var restoredPath = Path.Combine(restoreRoot, relativePath.Replace('/', Path.DirectorySeparatorChar));
             var pointerPath = restoredPath + ".pointer.arius";

             File.ReadAllBytes(restoredPath).ShouldBe(expectedContent);
             File.Exists(pointerPath).ShouldBeTrue($"Pointer file should exist for {relativePath}");
+            File.ReadAllText(pointerPath).Trim().ShouldBe(expectedPointerHash, $"Pointer payload for {relativePath}");
As per coding guidelines:
**/*.Tests/**: Test projects. Focus on test coverage gaps and assertion quality rather than style.
Also applies to: 174-190, 248-249, 258-274
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/Arius.Core.Tests/Features/RestoreCommand/RestoreCommandHandlerTests.cs` around lines 164 - 165, The tests currently only check pointer file existence/timestamps; update RestoreCommandHandlerTests to also read and assert the pointer file payload matches the expected payload/hash for each restored file: after calling AssertRestoredFile("archives/duplicates/copy-a.bin", ...) and AssertRestoredFile("nested/deep/a/b/c/copy-b.bin", ...) open the corresponding ".pointer.arius" file, deserialize or parse its payload and assert it equals the expected payload/hash (use the same expected value you derive for content e.g. expectedHash or expectedPayload variable), and apply the same additional assertion for the other test cases referenced (lines 174-190, 248-249, 258-274) so each pointer file's content is validated not just its existence/timestamps.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: d17d4239-9a29-4e9d-a642-23133550a1e9
📒 Files selected for processing (2)
.github/workflows/ci.yml
src/Arius.Core.Tests/Features/RestoreCommand/RestoreCommandHandlerTests.cs
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/Arius.Benchmarks/benchmark-tail.md`:
- Line 8: Edit the markdown row that starts with "| woutbook6 |
2026-04-28T12:24:00.9455930+00:00 | f053596521cec1c1b5322c39cec1cc3d89d7d7d5 | 1
| 3 | 1.404 m | 0.1645 m | 0.0090 m | 409000.0000 | 73000.0000 | 43000.0000 |
4.77 GB | 459541.0000 | 7.0000 | src/Arius.Benchmarks/raw/20260428T122400.945Z"
so it ends with a trailing pipe ("|") to keep the correct 15 cells, and move the
inline comment "# Larger woutbook run after refactor to hashes" off the table
(place it on the following line as a plain paragraph or an HTML comment) so it
is not parsed as a table cell; finally ensure the file ends with a trailing
newline.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 5bb5c649-94fb-42b3-900c-21bc5fb142be
📒 Files selected for processing (1)
src/Arius.Benchmarks/benchmark-tail.md
| | runnervmhkfpo | 2026-04-28T04:51:52.2915132+00:00 | 0ee7db665f5f425a1a77a69e1c63778789648460 | 8 | 3 | 38.54 s | 6.373 s | 0.349 s | 49000.0000 | 17000.0000 | 14000.0000 | 829.57 MB | 70331.0000 | 21.0000 | src/Arius.Benchmarks/raw/20260428T045152.291Z | # Run on GH runner after refactor | ||
| | woutbook6 | 2026-04-28T12:24:00.9455930+00:00 | f053596521cec1c1b5322c39cec1cc3d89d7d7d5 | 1 | 3 | 1.404 m | 0.1645 m | 0.0090 m | 409000.0000 | 73000.0000 | 43000.0000 | 4.77 GB | 459541.0000 | 7.0000 | src/Arius.Benchmarks/raw/20260428T122400.945Z | # Larger woutbook run after refactor to hashes | ||
| | woutbook6 | 2026-04-28T12:24:00.9455930+00:00 | f053596521cec1c1b5322c39cec1cc3d89d7d7d5 | 1 | 3 | 1.404 m | 0.1645 m | 0.0090 m | 409000.0000 | 73000.0000 | 43000.0000 | 4.77 GB | 459541.0000 | 7.0000 | src/Arius.Benchmarks/raw/20260428T122400.945Z | # Larger woutbook run after refactor to hashes No newline at end of file |
Fix markdown table formatting.
The table row has two formatting issues:
- Missing trailing pipe (`|`) at the end of the row
- The comment is being treated as a table cell (causing 16 cells instead of the expected 15)
The file also ends without a trailing newline, which violates common text file best practices.
📝 Proposed fix for table formatting
-| woutbook6 | 2026-04-28T12:24:00.9455930+00:00 | f053596521cec1c1b5322c39cec1cc3d89d7d7d5 | 1 | 3 | 1.404 m | 0.1645 m | 0.0090 m | 409000.0000 | 73000.0000 | 43000.0000 | 4.77 GB | 459541.0000 | 7.0000 | src/Arius.Benchmarks/raw/20260428T122400.945Z | # Larger woutbook run after refactor to hashes
+| woutbook6 | 2026-04-28T12:24:00.9455930+00:00 | f053596521cec1c1b5322c39cec1cc3d89d7d7d5 | 1 | 3 | 1.404 m | 0.1645 m | 0.0090 m | 409000.0000 | 73000.0000 | 43000.0000 | 4.77 GB | 459541.0000 | 7.0000 | src/Arius.Benchmarks/raw/20260428T122400.945Z | # Larger woutbook run after refactor to hashes
+Note: Added trailing newline and consistent spacing to match the formatting of other rows.
🧰 Tools
🪛 LanguageTool
[grammar] ~8-~8: Ensure spelling is correct
Context: ...rks/raw/20260428T122400.945Z | # Larger woutbook run after refactor to hashes
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🪛 markdownlint-cli2 (0.22.1)
[warning] 8-8: Table pipe style
Expected: leading_and_trailing; Actual: leading_only; Missing trailing pipe
(MD055, table-pipe-style)
[warning] 8-8: Table column count
Expected: 15; Actual: 16; Too many cells, extra data will be missing
(MD056, table-column-count)
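The two markdownlint findings can be reproduced with a small check like the one below. It is only a sketch: `MarkdownTableSketch`, `CountCells`, and `HasTrailingPipe` are hypothetical names, and the cell count is a simplification that ignores escaped pipes (`\|`) and pipes inside code spans.

```csharp
public static class MarkdownTableSketch
{
    // Counts the cells in a single pipe-delimited Markdown table row.
    // Simplification: escaped pipes and pipes inside code spans are not handled.
    public static int CountCells(string row)
    {
        var trimmed = row.Trim();

        // The leading and trailing pipes delimit the row; they are not cells.
        if (trimmed.StartsWith('|'))
            trimmed = trimmed.Substring(1);
        if (trimmed.EndsWith('|'))
            trimmed = trimmed.Substring(0, trimmed.Length - 1);

        return trimmed.Split('|').Length;
    }

    // True when the row is closed with a pipe (the style MD055 expects here).
    public static bool HasTrailingPipe(string row) => row.TrimEnd().EndsWith('|');
}
```

For a shortened row, `CountCells("| a | b | # note")` returns 3 while `CountCells("| a | b |")` returns 2, which mirrors how the trailing comment pushes the benchmark row from 15 to 16 cells.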
fixes #12
Fix for #12
Summary by CodeRabbit
Improvements
Tests
Chores