Fix leading-slash ZIP entry names breaking ZIP-based publications #809
Some archives — notably CBZ comics packed by tools that don't strip the
leading slash, and Internet Archive comic uploads — store entry names as
absolute-looking paths like `/001.jpg`. Per APPNOTE.TXT 4.4.17.1 a ZIP
entry name is a *relative* path and "MUST NOT contain ... a leading
slash", so these are spec-noncompliant but common in the wild.
Left untouched, the leading slash makes the manifest href an absolute-path
reference: when resolved against the publication server's base URL it
replaces the base path entirely, producing a request that no longer maps
to the archive entry. Every page 404s and the reader shows broken images.
This bites image-based (Divina/CBZ) publications in particular, because
they address resources by the entry's path rather than by index.
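
The resolution behavior described above can be reproduced with Foundation alone. This is a minimal sketch, not code from this PR: the `https://example.com/publication/` base is a hypothetical stand-in for the publication server's base URL.

```swift
import Foundation

// Hypothetical base URL standing in for the publication server's base.
let base = URL(string: "https://example.com/publication/")!

// Leading slash: an absolute-path reference, so the base path is replaced.
let absolutePath = URL(string: "/001.jpg", relativeTo: base)!.absoluteString

// No leading slash: a relative-path reference, resolved under the base path.
let relativePath = URL(string: "001.jpg", relativeTo: base)!.absoluteString

print(absolutePath) // https://example.com/001.jpg
print(relativePath) // https://example.com/publication/001.jpg
```

Only the second form keeps the request under the base path, which is why the stored entry name has to be normalized before it becomes an href.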
Strip leading slashes when interpreting a stored entry name, via a new
`RelativeURL(zipEntryPath:)`, in both ZIP backends:
- MinizipContainer / ZIPFoundationContainer now key entries under the
normalized, slash-free relative URL — so `entries` and the manifest
reading order are slash-free and resolve correctly.
- MinizipContainer additionally preserves the *original* stored name in
its entry metadata and locates by it verbatim when reading
(`unzLocateFile` matches the central-directory name exactly), with a
slash-stripped fallback. ZIPFoundation already reads via the retained
`Entry`, so it needs no read-path change.
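
The two bullets above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: `strippingLeadingSlashes`, `EntryIndex`, and `lookupCandidates(for:)` are hypothetical names, and plain strings stand in for the toolkit's URL types and for the minizip calls.

```swift
import Foundation

/// Hypothetical helper: drops every leading "/" from a stored entry name.
func strippingLeadingSlashes(_ storedName: String) -> String {
    String(storedName.drop(while: { $0 == "/" }))
}

/// Hypothetical index: entries are keyed by the normalized name, while the
/// original stored name is kept so an exact-match lookup can still find it.
struct EntryIndex {
    private var storedNames: [String: String] = [:]

    init(storedNames names: [String]) {
        for name in names {
            storedNames[strippingLeadingSlashes(name)] = name
        }
    }

    /// Slash-free names exposed to callers (and, in turn, to the manifest).
    var entries: Set<String> { Set(storedNames.keys) }

    /// Names to try when locating an entry: the original stored name first,
    /// then the slash-stripped form as a fallback.
    func lookupCandidates(for href: String) -> [String] {
        let normalized = strippingLeadingSlashes(href)
        if let original = storedNames[normalized], original != normalized {
            return [original, normalized]
        }
        return [normalized]
    }
}

let index = EntryIndex(storedNames: ["/root.txt", "/folder/file.txt", "normal.txt"])
print(index.entries.sorted())
print(index.lookupCandidates(for: "root.txt"))
```

Keeping the original name next to the normalized key lets an exact-match locator find the entry as it was written, while callers only ever see slash-free names.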
This mirrors the Kotlin toolkit, which strips the prefix for the same
reason (`fromDecodedPath(href.removePrefix("/"))`), and the wider
ecosystem (Python's `ZipFile` strips leading slashes on extraction).
Adds a forged `leading-slash.zip` fixture and tests covering both the
entries set and the read path in each backend.
Pull request overview
This PR fixes interoperability with ZIP/CBZ archives whose entry names are stored with a leading slash (e.g. /001.jpg). By normalizing entry names to slash-free relative URLs, manifest hrefs resolve correctly against the publication server base URL, preventing widespread 404s (notably in image-based/Divina publications).
Changes:
- Added `RelativeURL(zipEntryPath:)` to strip leading slashes from stored ZIP entry names.
- Updated both ZIP backends (`MinizipContainer`, `ZIPFoundationContainer`) to key entries using normalized, slash-free relative URLs, and added read-path handling for Minizip.
- Added test coverage for leading-slash entry names in both container backends and documented the fix in the changelog.
Reviewed changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| Tests/SharedTests/Toolkit/ZIP/ZIPFoundationContainerTests.swift | Adds tests ensuring leading-slash ZIP entry names are exposed slash-free and remain readable. |
| Tests/SharedTests/Toolkit/ZIP/MinizipContainerTests.swift | Adds equivalent leading-slash normalization + read tests for the Minizip backend. |
| Sources/Shared/Toolkit/ZIP/ZIPFoundation/ZIPFoundationContainer.swift | Switches entry URL creation to RelativeURL(zipEntryPath:) so entries are keyed without leading slashes. |
| Sources/Shared/Toolkit/ZIP/ZIPEntryURL.swift | Introduces RelativeURL(zipEntryPath:) helper to normalize ZIP entry paths (strip leading /). |
| Sources/Shared/Toolkit/ZIP/Minizip/MinizipContainer.swift | Keys entries by normalized slash-free URLs while preserving original stored names for reading; adjusts entry location logic. |
| CHANGELOG.md | Documents the behavior fix under Unreleased. |
mickael-menu
left a comment
Thank you @raphi011! I've simplified the CHANGELOG a bit, removed the reference to the Internet Archive (as I couldn't observe it), and relaxed the claim that most CBZ files are impacted.
Summary
ZIP archives sometimes store entry names with a leading slash (e.g. `/001.jpg`). This is spec-noncompliant (APPNOTE.TXT §4.4.17.1 defines an entry name as "the name of the file, with optional relative path" that "MUST NOT contain a drive or device letter, or a leading slash"), but it's common in the wild, especially CBZ comics packed by tools that don't strip the slash, and Internet Archive comic uploads.

Left untouched, the leading slash turns the manifest href into an absolute-path reference: when `/001.jpg` is resolved against the publication server's base URL (e.g. `readium://<uuid>/`), the slash replaces the base path entirely, producing a request that no longer maps to the archive entry. Every resource 404s. This is most visible with image-based (Divina/CBZ) publications, which address resources by the entry's path, so every page renders as a broken image.

This was discovered while opening a real Pepper&Carrot CBZ from the Internet Archive (entries stored as `/001.jpg`, …) in a Readium-based reader: import and cover extraction succeeded (direct, symmetric container lookups), but every page 404'd.

Fix

Interpret a stored ZIP entry name as the relative path the format intends, by stripping leading slashes via a new `RelativeURL(zipEntryPath:)`, in both ZIP backends:

- `MinizipContainer` / `ZIPFoundationContainer` now key entries under the normalized, slash-free relative URL, so `entries` and the manifest reading order are slash-free and resolve correctly against the base URL.
- `MinizipContainer` additionally preserves the original stored name in its entry metadata and locates by it verbatim when reading (`unzLocateFile` matches the central-directory name exactly), with a slash-stripped fallback. `ZIPFoundationContainer` already reads via the retained `Entry`, so it needs no read-path change.

This aligns Swift with the Kotlin toolkit, which strips the prefix for the same reason (`Url.fromLegacyHref` → `fromDecodedPath(href.removePrefix("/"))`, migration guide: "We dropped the `/` prefix to avoid issues when resolving to a base URL"), and with the wider ecosystem (Python's `ZipFile` strips leading slashes on extraction). It also complements #432, which removed the leading-`/` prefix for exploded archives but not for compressed ZIP/CBZ containers.

Tests

Adds a forged `leading-slash.zip` fixture (entries `/root.txt`, `/folder/file.txt`, plus a well-formed `normal.txt`) and, for each backend, tests covering both the `entries` set, which is exposed slash-free, and the read path.

All `MinizipContainerTests` + `ZIPFoundationContainerTests` pass (28/28) on the iOS simulator. Verified end-to-end in a downstream app: the original broken CBZ renders all pages with no re-import.