fix: reimplement push parsing to prevent Z_DATA_ERROR #1187
kriswest merged 39 commits into finos:main from 1040-Z-DATA-ERROR-during-push-parsing
Conversation
Codecov Report
Additional details and impacted files:
@@            Coverage Diff             @@
##             main    #1187      +/-   ##
==========================================
+ Coverage   83.88%   84.11%   +0.22%
==========================================
  Files          68       68
  Lines        2904     2958      +54
  Branches      364      373       +9
==========================================
+ Hits         2436     2488      +52
- Misses        409      410       +1
- Partials       59       60       +1
jescalada left a comment:
I'm not a low-level code expert, so I'll just leave some general suggestions!
I've experimented a bit to check whether the code in this PR slows down the push process, and it seems to be slightly slower (10-20%) for very large pushes:
(Collapsed details: basic timing method and timing scripts, with timings for a regular commit and for a massive commit (~12 MB of text changes) on both main and this PR.)
This might not be something we want to optimize right now, though.
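For reference, a minimal sketch of a basic timing harness along these lines, assuming a remote that points at a local git-proxy instance (the remote and branch names here are placeholders, not taken from the PR):

import { execSync } from 'child_process';
import { performance } from 'perf_hooks';

// Push the given branch through the proxy remote and report the elapsed time.
// 'proxy' is a placeholder remote name assumed to point at a local git-proxy.
function timePush(remote = 'proxy', branch = 'main'): number {
  const start = performance.now();
  execSync(`git push ${remote} ${branch}`, { stdio: 'ignore' });
  return performance.now() - start;
}

console.log(`push took ${timePush().toFixed(0)} ms`);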
I think there's not a lot more I can do on this PR without heading into a port to fast-zlib or zlib-sync, which I think is a separate challenge - we probably need to do something similar to fast-zlib and use the internals of zlib.
coopernetes left a comment:
One very minor nit but otherwise LGTM. I think @kriswest knows the packet format best at this point now that this refactoring has been done lol.
My only suggestion would be to add some real (binary) receive-pack files to use as input for the tests.
# Create some new content to generate a pack
echo "New content for pack testing $(date)" > new-file.txt
echo "# Updated README $(date)" >> README.md
git add .
git commit -m "Test commit for pack capture $(date)"
echo "Capturing receive-pack output to $output_file..."
# Method 1: Capture the actual pack data being sent
# This captures the raw pack file that would be sent to receive-pack
git format-patch --stdout HEAD~1..HEAD > "$output_file.patch"
# Method 2: Capture pack file from push operation with verbose output
GIT_TRACE_PACKET=1 git push origin main 2> "$output_file.trace" || true
# Method 3: Create a pack file directly
git pack-objects "$output_file" < <(git rev-list --objects HEAD)
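A captured fixture could then be read straight back into a test. A minimal sketch, assuming a mocha/chai setup and a placeholder fixture path (not the actual test added in this PR):

import * as fs from 'fs';
import { expect } from 'chai';

describe('captured receive-pack fixture', () => {
  it('contains pkt-lines followed by PACK data', () => {
    // Raw body of a real push: pkt-lines, then the PACK data.
    const body = fs.readFileSync('test/fixtures/sample-push.bin');
    // The PACK signature marks where the pkt-line section ends.
    const packStart = body.indexOf('PACK');
    expect(packStart).to.be.greaterThan(0);
    // The first four bytes are the hex length prefix of the first pkt-line.
    expect(body.subarray(0, 4).toString('ascii')).to.match(/^[0-9a-f]{4}$/);
  });
});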
@coopernetes - test added that uses a captured push. I ended up popping a temporary change into parsePush to capture it, as I had trouble with the methods given (and alternatives that an AI came up with):
Method 1 (git format-patch) produces a patch file rather than a pack file.
Method 2 (GIT_TRACE_PACKET), and a number of variations I tried, produces a trace file that you can in theory extract the pack data from. The output didn't quite match what I was expecting and was missing the packet lines that precede the PACK data in a push.
Method 3 (git pack-objects) creates PACK and IDX files, but for the entire content of the repo (so about 16 MB for git-proxy). Again, it is missing the packet lines that a push will have preceding the PACK file data. However, there is probably a variation on this one that would have been quite close.
In the end, just capturing a request body as a buffer and writing it to a file worked and is easy to replicate. Details are included in a readme file adjacent to the captured binary file.
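For illustration, the capture amounts to something like the following minimal sketch (the function name, the output path, and how the Buffer reaches it are placeholders, not git-proxy's actual API):

import * as fs from 'fs';
import * as path from 'path';

// Temporary debugging hook: dump the raw receive-pack request body
// (pkt-lines plus PACK data) to disk so it can be replayed as a test fixture.
// `body` is assumed to already be the full push payload as a single Buffer.
export function captureRawPush(body: Buffer, outDir = 'test/fixtures'): string {
  fs.mkdirSync(outDir, { recursive: true });
  const file = path.join(outDir, `push-${Date.now()}.bin`);
  fs.writeFileSync(file, body);
  return file;
}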
resolves #1040
Reimplements parsing of PACK file contents to resolve a number of issues that were causing a loss of sync between the object headers and the compressed data streams that make up the pack file. That loss of sync resulted in Z_DATA_ERROR being thrown and the rest of the pack file failing to decode, so only partial content was extracted from push pack files and commits were missed/not processed by later steps. The missing commits can also interact with the checkHiddenCommits task to block a subset of pushes, while others proceed with missing data.
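For background on where that sync has to be held: each object in a PACK file starts with a small variable-length header (object type plus inflated size) and is immediately followed by a zlib deflate stream, so the parser can only locate the next header by knowing exactly where the previous compressed stream ended. A minimal sketch of reading one such header, for illustration only (not the code in this PR):

// Reads the type/size header of one pack object starting at `offset`.
// First byte: bit 7 = continuation, bits 6-4 = object type, bits 3-0 = low size bits.
// Each following byte (while bit 7 is set) contributes 7 more size bits.
function readPackObjectHeader(buf: Buffer, offset: number) {
  let byte = buf[offset++];
  const type = (byte >> 4) & 0x07; // 1=commit, 2=tree, 3=blob, 4=tag, 6=ofs_delta, 7=ref_delta
  let size = byte & 0x0f;
  let shift = 4;
  while (byte & 0x80) {
    byte = buf[offset++];
    size += (byte & 0x7f) * 2 ** shift; // multiply rather than << to avoid 32-bit overflow
    shift += 7;
  }
  // A ref_delta object is then followed by a 20-byte base SHA before its deflate stream.
  return { type, size, nextOffset: offset };
}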
After this PR, Git Proxy will correctly process a handful of pushes I have queued up that either parse with missing data or are blocked by checkHiddenCommits.
This PR: