Fix leading-slash ZIP entry names breaking ZIP-based publications #809

Merged: mickael-menu merged 2 commits into readium:develop from raphi011:fix/zip-leading-slash-entries on Jun 14, 2026

@raphi011 (Contributor) commented Jun 7, 2026

Summary

ZIP archives sometimes store entry names with a leading slash (e.g. /001.jpg). This is spec-noncompliant — APPNOTE.TXT §4.4.17.1 defines an entry name as "the name of the file, with optional relative path" that "MUST NOT contain a drive or device letter, or a leading slash" — but it's common in the wild, especially CBZ comics packed by tools that don't strip the slash, and Internet Archive comic uploads.

Left untouched, the leading slash turns the manifest href into an absolute-path reference: when /001.jpg is resolved against the publication server's base URL (e.g. readium://<uuid>/), the slash replaces the base path entirely, producing a request that no longer maps to the archive entry. Every resource 404s. This is most visible with image-based (Divina/CBZ) publications, which address resources by the entry's path — every page renders as a broken image.
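
The resolution behavior described above can be reproduced with Foundation alone. This is a minimal sketch, not toolkit code: the `https://example.com/publication/` base is a hypothetical stand-in for the publication server's base URL, and the variable names are illustrative.

```swift
import Foundation

// Hypothetical base URL standing in for the publication server's base.
let base = URL(string: "https://example.com/publication/")!

// A slash-free href is resolved relative to the base path.
let relativeHref = URL(string: "001.jpg", relativeTo: base)!.absoluteString

// A leading slash makes the href an absolute-path reference, so the
// base path ("/publication/") is replaced entirely.
let absolutePathHref = URL(string: "/001.jpg", relativeTo: base)!.absoluteString

print(relativeHref)      // https://example.com/publication/001.jpg
print(absolutePathHref)  // https://example.com/001.jpg
```

The second result no longer contains the base path, which is why the request stops mapping to the archive entry.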

This was discovered while opening a real Pepper&Carrot CBZ from the Internet Archive (entries stored as /001.jpg, …) in a Readium-based reader: import and cover extraction succeeded (direct, symmetric container lookups), but every page 404'd.

Fix

Interpret a stored ZIP entry name as the relative path the format intends, by stripping leading slashes via a new RelativeURL(zipEntryPath:), in both ZIP backends:

  • MinizipContainer / ZIPFoundationContainer now key entries under the normalized, slash-free relative URL — so entries and the manifest reading order are slash-free and resolve correctly against the base URL.
  • MinizipContainer additionally preserves the original stored name in its entry metadata and locates by it verbatim when reading (unzLocateFile matches the central-directory name exactly), with a slash-stripped fallback. ZIPFoundationContainer already reads via the retained Entry, so it needs no read-path change.
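
The two bullets above can be illustrated with a small, self-contained sketch. It is not the PR's implementation: `stripLeadingSlashes`, `EntryIndex`, and `nameToLocate(for:)` are hypothetical names, and plain strings stand in for the toolkit's `RelativeURL` and entry metadata. The idea shown is keying entries by the normalized path while retaining the stored name for an exact-match lookup, with a slash-stripped fallback.

```swift
// Hypothetical helper: drops every leading "/" from a stored entry name.
func stripLeadingSlashes(_ name: String) -> String {
    String(name.drop(while: { $0 == "/" }))
}

// Hypothetical index: normalized path -> name exactly as stored in the archive.
struct EntryIndex {
    private var storedNames: [String: String] = [:]

    init(storedEntryNames: [String]) {
        for name in storedEntryNames {
            storedNames[stripLeadingSlashes(name)] = name
        }
    }

    // Slash-free paths exposed to callers (sorted for a stable order).
    var entries: [String] { storedNames.keys.sorted() }

    // Name to pass to an exact-match lookup: the original stored name if
    // known, otherwise the slash-stripped path as a fallback.
    func nameToLocate(for path: String) -> String {
        let key = stripLeadingSlashes(path)
        return storedNames[key] ?? key
    }
}

// Entry names taken from the leading-slash.zip fixture described in Tests.
let index = EntryIndex(storedEntryNames: ["/root.txt", "/folder/file.txt", "normal.txt"])
print(index.entries)                          // ["folder/file.txt", "normal.txt", "root.txt"]
print(index.nameToLocate(for: "root.txt"))    // /root.txt
print(index.nameToLocate(for: "normal.txt"))  // normal.txt
```

Keeping the original name matters only for backends that locate entries by exact name; a backend that holds on to the entry object itself does not need this mapping.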

This aligns Swift with the Kotlin toolkit, which strips the prefix for the same reason (Url.fromLegacyHref → fromDecodedPath(href.removePrefix("/")), migration guide: "We dropped the / prefix to avoid issues when resolving to a base URL"), and with the wider ecosystem (Python's ZipFile strips leading slashes on extraction). It also complements #432, which removed the leading-/ prefix for exploded archives but not for compressed ZIP/CBZ containers.

Tests

Adds a forged leading-slash.zip fixture (entries /root.txt, /folder/file.txt, plus a well-formed normal.txt) and, for each backend, tests covering:

  • the entries set is exposed slash-free, and
  • the read path resolves a leading-slash entry to its bytes.

All MinizipContainerTests + ZIPFoundationContainerTests pass (28/28) on the iOS simulator. Verified end-to-end in a downstream app: the original broken CBZ renders all pages with no re-import.

Some archives — notably CBZ comics packed by tools that don't strip the
leading slash, and Internet Archive comic uploads — store entry names as
absolute-looking paths like `/001.jpg`. Per APPNOTE.TXT 4.4.17.1 a ZIP
entry name is a *relative* path and "MUST NOT contain ... a leading
slash", so these are spec-noncompliant but common in the wild.

Left untouched, the leading slash makes the manifest href an absolute-path
reference: when resolved against the publication server's base URL it
replaces the base path entirely, producing a request that no longer maps
to the archive entry. Every page 404s and the reader shows broken images.
This bites image-based (Divina/CBZ) publications in particular, because
they address resources by the entry's path rather than by index.

Strip leading slashes when interpreting a stored entry name, via a new
`RelativeURL(zipEntryPath:)`, in both ZIP backends:

- MinizipContainer / ZIPFoundationContainer now key entries under the
  normalized, slash-free relative URL — so `entries` and the manifest
  reading order are slash-free and resolve correctly.
- MinizipContainer additionally preserves the *original* stored name in
  its entry metadata and locates by it verbatim when reading
  (`unzLocateFile` matches the central-directory name exactly), with a
  slash-stripped fallback. ZIPFoundation already reads via the retained
  `Entry`, so it needs no read-path change.

This mirrors the Kotlin toolkit, which strips the prefix for the same
reason (`fromDecodedPath(href.removePrefix("/"))`), and the wider
ecosystem (Python's `ZipFile` strips leading slashes on extraction).

Adds a forged `leading-slash.zip` fixture and tests covering both the
entries set and the read path in each backend.

Copilot AI left a comment


Pull request overview

This PR fixes interoperability with ZIP/CBZ archives whose entry names are stored with a leading slash (e.g. /001.jpg). By normalizing entry names to slash-free relative URLs, manifest hrefs resolve correctly against the publication server base URL, preventing widespread 404s (notably in image-based/Divina publications).

Changes:

  • Added RelativeURL(zipEntryPath:) to strip leading slashes from stored ZIP entry names.
  • Updated both ZIP backends (MinizipContainer, ZIPFoundationContainer) to key entries using normalized, slash-free relative URLs, and added read-path handling for Minizip.
  • Added test coverage for leading-slash entry names in both container backends and documented the fix in the changelog.

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| Tests/SharedTests/Toolkit/ZIP/ZIPFoundationContainerTests.swift | Adds tests ensuring leading-slash ZIP entry names are exposed slash-free and remain readable. |
| Tests/SharedTests/Toolkit/ZIP/MinizipContainerTests.swift | Adds equivalent leading-slash normalization + read tests for the Minizip backend. |
| Sources/Shared/Toolkit/ZIP/ZIPFoundation/ZIPFoundationContainer.swift | Switches entry URL creation to RelativeURL(zipEntryPath:) so entries are keyed without leading slashes. |
| Sources/Shared/Toolkit/ZIP/ZIPEntryURL.swift | Introduces RelativeURL(zipEntryPath:) helper to normalize ZIP entry paths (strip leading /). |
| Sources/Shared/Toolkit/ZIP/Minizip/MinizipContainer.swift | Keys entries by normalized slash-free URLs while preserving original stored names for reading; adjusts entry location logic. |
| CHANGELOG.md | Documents the behavior fix under Unreleased. |

@mickael-menu changed the title from "fix(zip): normalize leading-slash entry names in ZIP containers" to "Fix leading-slash ZIP entry names breaking ZIP-based publications" on Jun 14, 2026

@mickael-menu (Member) left a comment

Thank you @raphi011! I've simplified the CHANGELOG a bit, removed the reference to the Internet Archive (as I couldn't observe it), and relaxed the claim that most CBZ files are impacted.

@mickael-menu mickael-menu merged commit ec6795f into readium:develop Jun 14, 2026
5 checks passed