
fix: populate missing tarball checksums (#84) + init --provider=postgresql (#85)#86

Open
renecannao wants to merge 5 commits into master from fix/checksums-and-init-provider

Conversation


@renecannao renecannao commented Apr 10, 2026

Summary

Addresses both issues #84 and #85, plus adds macOS PostgreSQL support as a follow-up to the #85 fix.

#84 — Missing checksums for 91 tarballs

91 out of 254 tarball entries had no checksum field, so CompareTarballChecksum silently skipped verification and users downloaded without any integrity check.

Populated 89 checksums from upstream sources:

  • MySQL (70 entries): MD5 scraped from dev.mysql.com/downloads/mysql archive pages
  • MariaDB (5 entries): SHA256 from archive.mariadb.org sha256sums.txt files
  • Percona Server (14 entries): SHA256 from per-tarball .sha256sum sidecars on downloads.percona.com

The 2 remaining uncheckable entries are TiDB 3.0.0 "master" rolling tarballs, which the code already exempts.

Verified end-to-end for one MySQL entry:

scraped: MD5:43cd4e14b688e9364ec4dbd58dd37437
actual:  43cd4e14b688e9364ec4dbd58dd37437  mysql-8.4.8-linux-glibc2.17-x86_64.tar.xz
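
That verification boils down to hashing the downloaded file and comparing the digest against the registry value. A minimal Python sketch of the idea (function names are illustrative, not dbdeployer's actual Go API):

```python
import hashlib

def file_checksum(path: str, algo: str = "md5") -> str:
    """Hash a file in chunks so large tarballs don't load into memory."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def checksum_matches(expected: str, path: str) -> bool:
    """Compare case-insensitively, tolerating an 'MD5:' / 'SHA256:' style
    prefix like the one in the scraped value above."""
    algo, _, digest = expected.rpartition(":")
    return file_checksum(path, (algo or "md5").lower()) == digest.lower()
```

The prefix handling lets one comparison routine serve both the MD5 (MySQL) and SHA256 (MariaDB/Percona) entries.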

Other changes:

  • CompareTarballChecksum now prints a warning to stderr when called with an empty checksum, instead of silently skipping
  • Scraper scripts committed under scripts/checksums/ with a README so this can be repeated when new versions are added
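
The idempotent update pattern the scrapers follow can be sketched as below; the JSON key names ("tarballs", "name", "checksum") are assumptions for illustration, not necessarily the exact tarball_list.json schema:

```python
import json

def fill_missing_checksums(registry_path: str, scraped: dict[str, str]) -> int:
    """Fill only empty checksum fields; entries that already have one are
    left untouched, so re-running the script is always safe."""
    with open(registry_path) as f:
        data = json.load(f)
    updated = 0
    for entry in data.get("tarballs", []):  # key names are assumptions
        if not entry.get("checksum") and entry.get("name") in scraped:
            entry["checksum"] = scraped[entry["name"]]
            updated += 1
    with open(registry_path, "w") as f:
        json.dump(data, f, indent=1)
    return updated
```

Skipping already-populated entries is what makes it repeatable when new versions are added later.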

#85 — dbdeployer init doesn't support PostgreSQL (Linux + macOS)

init was hardcoded to download a MySQL tarball even though PostgreSQL is a first-class provider for deploy/unpack/downloads.

Added --provider flag to dbdeployer init with full provider support:

Linux — prints clear instructions for obtaining PG .deb files and running dbdeployer unpack --provider=postgresql. (Automating .deb download from PGDG is non-trivial — multiple distro/arch variants per release — so it's deferred to a follow-up.)

macOS — fully automated via Postgres.app:

  1. Query the Postgres.app GitHub releases API
  2. Pick the newest single-major-version .dmg asset
  3. Download + mount headlessly via hdiutil attach -nobrowse -readonly
  4. Query the bundled postgres --version to detect exact X.Y version
  5. Copy bin/lib/share/include into ~/opt/postgresql/<X.Y>/
  6. Detach the DMG

No sudo, no GUI, no Homebrew. Postgres.app is a universal binary so the same path works on Apple Silicon and Intel.

For both: --provider is validated (mysql, postgresql) and errors on unknown values.
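
The validation can be sketched like this (Python for brevity; the real implementation lives in the Go init path):

```python
VALID_PROVIDERS = ("mysql", "postgresql")

def validate_provider(provider: str) -> str:
    """Accept only known providers; anything else errors out, mirroring
    the init --provider=oracle failure exercised in CI."""
    p = provider.strip().lower()
    if p not in VALID_PROVIDERS:
        raise ValueError(
            f"unknown provider {provider!r} (valid: {', '.join(VALID_PROVIDERS)})"
        )
    return p
```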

CI coverage

  • Linux PG init test (in PostgreSQL 16 job): verifies dirs created, Linux instructions printed, invalid provider rejected
  • macOS PG init + deploy test (new postgresql-macos-test job, matrix: macos-14, macos-15):
    • runs init --provider=postgresql → downloads real Postgres.app DMG
    • verifies binaries land at ~/opt/postgresql/<X.Y>/bin
    • verifies directory name is in X.Y format
    • runs deploy postgresql <X.Y> → real sandbox
    • runs CREATE TABLE / INSERT / SELECT roundtrip

Test plan

  • Locally verified MySQL 8.4.8 scraped MD5 matches actual file
  • Locally verified init --provider=postgresql on Linux creates dirs, prints instructions
  • Locally verified init --provider=oracle errors
  • Locally verified init --provider=mysql --dry-run still works
  • Unit test for Postgres.app asset filtering (offline httptest server)
  • CI: PostgreSQL job passes the new init test
  • CI: new postgresql-macos-test job passes on macos-14 and macos-15
  • CI: rest of Integration Tests / ProxySQL / ts still green

Closes #84
Closes #85

Summary by CodeRabbit

  • New Features

    • Add provider selection for init (MySQL default / PostgreSQL) and macOS PostgreSQL install support
    • Utilities to fetch and populate tarball checksums from upstream
  • Bug Fixes

    • Emit a warning when a tarball checksum is missing
  • Tests

    • New CI integration tests covering PostgreSQL init and macOS end-to-end flow
    • Unit tests for macOS PostgreSQL release parsing
  • Documentation

    • Added README for checksum scraping utilities
  • Chores

    • Updated tarball metadata with checksum entries

91 tarball entries (all MySQL 8.4.x/9.x, MariaDB, Percona 5.7+/8.x,
TiDB) had no checksum, so CompareTarballChecksum silently returned nil
and users downloaded without any integrity verification.

Scraped 89 checksums from upstream sources:
- MySQL (70 entries): MD5 from dev.mysql.com/downloads/mysql pages
- MariaDB (5 entries): SHA256 from archive.mariadb.org sha256sums.txt
- Percona Server (14 entries): SHA256 from per-tarball .sha256sum sidecars

The two remaining uncheckable entries are TiDB 3.0.0 "master" rolling
tarballs, which CompareTarballChecksum already exempts.

Also:
- CompareTarballChecksum now prints a warning to stderr when checksum
  is empty instead of silently skipping verification
- Scraper scripts committed under scripts/checksums/ for repeatability
  when new versions are added
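
The warn-instead-of-skip behavior described above can be sketched as (Python for brevity; the real function is Go, and the signature here is illustrative):

```python
import sys

def compare_tarball_checksum(expected: str, path: str, flavor: str = "") -> None:
    """Verify when a checksum exists; warn loudly on stderr instead of
    silently returning when it is missing. Rolling tarballs stay exempt."""
    if flavor == "tidb":  # "master" rolling tarballs are exempt
        return
    if not expected:
        print(f"WARNING: no checksum recorded for {path}; "
              "skipping integrity verification", file=sys.stderr)
        return
    # ... hash the file and compare against `expected` here ...
```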

Verified one MySQL checksum end-to-end:
  scraped: MD5:43cd4e14b688e9364ec4dbd58dd37437
  actual:  43cd4e14b688e9364ec4dbd58dd37437  mysql-8.4.8-linux-glibc2.17-x86_64.tar.xz

dbdeployer init was hardcoded to download a MySQL tarball, even though
PostgreSQL is a first-class provider for deploy/unpack/downloads. Add
a --provider flag so users can initialize a PostgreSQL environment:

  dbdeployer init --provider=postgresql

For PostgreSQL, init:
  - creates $SANDBOX_HOME as usual
  - points $SANDBOX_BINARY at ~/opt/postgresql (unless overridden)
  - does NOT try to auto-download MySQL
  - prints instructions for obtaining the PG .deb files and running
    `dbdeployer unpack --provider=postgresql` to populate the binary dir

Auto-downloading PostgreSQL .debs (to make init fully hands-off for
PG) is left for a follow-up — this commit resolves #85 by making init
not blow up when the user's target is PostgreSQL.

Also adds a CI step that:
  - runs init --provider=postgresql in an isolated dir
  - verifies directories are created
  - verifies the PG instructions are printed
  - verifies init does not reference a MySQL tarball
  - verifies init --provider=oracle errors out

coderabbitai bot commented Apr 10, 2026

Warning

Rate limit exceeded

@renecannao has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 10 minutes and 11 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 10 minutes and 11 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 96af3d23-d65d-4083-95ec-518285e7954e

📥 Commits

Reviewing files that changed from the base of the PR and between 20a9f79 and 2bb81a1.

📒 Files selected for processing (2)
  • .github/workflows/integration_tests.yml
  • providers/postgresql/macos_install.go
📝 Walkthrough


Adds PostgreSQL support to dbdeployer init via a new --provider flag, warns when tarball checksums are missing, populates many tarball checksums via new scraping scripts, implements a macOS Postgres.app installer, and adds CI jobs testing PostgreSQL init on Linux and macOS.

Changes

Cohort / File(s) Summary
Workflows & CI
.github/workflows/integration_tests.yml
Adds postgresql-test Linux step and postgresql-macos-test job validating dbdeployer init --provider=postgresql, sandbox creation, macOS Postgres.app extraction, simple query smoke tests, negative invalid-provider test, and cleanup.
CLI / Init Flow
cmd/init.go, ops/init.go
Adds --provider flag (defaults to mysql), propagates to InitEnvironment via InitOptions.Provider, validates provider values, branches init flow for postgresql vs mysql, and may override sandbox-binary to ~/opt/postgresql when appropriate.
PostgreSQL macOS installer
providers/postgresql/macos_install.go, providers/postgresql/macos_install_test.go
New GitHub releases parser for Postgres.app assets, selection of single-version .dmg assets, installer that mounts DMG, extracts exact X.Y version and copies bin/lib/share/include into sandboxBinaryDir/<X.Y>/; includes parsing/ordering logic and offline unit test fixture.
Downloads / Checksums
downloads/remote_registry.go, downloads/tarball_list.json
CompareTarballChecksum() now warns to stderr when checksum is empty; tarball_list.json updated with many checksum entries (MD5/SHA256) across MySQL, MariaDB, Percona tarballs.
Checksum scraping tooling
scripts/checksums/README.md, scripts/checksums/scrape_mysql_checksums.py, scripts/checksums/scrape_mariadb_percona_checksums.py
Adds documentation and two idempotent Python scrapers that fetch MD5/SHA256 checksums from upstream MySQL/MariaDB/Percona sources and update downloads/tarball_list.json; scripts skip already-populated entries and report remaining misses.

Sequence Diagrams

sequenceDiagram
    participant User
    participant CLI as dbdeployer (init)
    participant Cmd as cmd/init.go
    participant Ops as ops/init.go
    participant Installer as providers/postgresql (macOS/Linux)
    participant Filesystem as FS

    User->>CLI: dbdeployer init --provider=postgresql
    CLI->>Cmd: parse flags (--provider)
    Cmd->>Ops: InitEnvironment(provider)
    Ops->>Ops: validate provider (mysql|postgresql)
    alt provider == postgresql
        Ops->>Installer: choose platform-specific installer
        Installer->>FS: create/modify sandbox-binary (~/opt/postgresql/<X.Y>)
        Installer-->>Ops: success / instructions (Linux)
    else provider == mysql
        Ops->>Downloads: find/download/unpack tarball
        Downloads-->>Ops: tarball extracted
    end
    Ops-->>User: init complete (or printed instructions)
sequenceDiagram
    participant Downloads as dbdeployer downloads
    participant Registry as downloads/tarball_list.json
    participant Checksum as CompareTarballChecksum
    participant User

    User->>Downloads: request tarball
    Downloads->>Registry: read metadata (checksum?)
    Registry-->>Downloads: checksum present/empty
    Downloads->>Checksum: CompareTarballChecksum(...)
    alt checksum present
        Checksum-->>Downloads: verify hashes
    else checksum missing
        Checksum-->>User: WARN: checksum missing (stderr)
    end
    Downloads-->>User: deliver tarball

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰 I dug through tarballs, checksums in tow,
I fetched Postgres.app from releases aglow,
I nudged init to accept a new seed,
and warned when a checksum was missing indeed.
Hop, deploy, verify—now both databases grow!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): docstring coverage is 68.18%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.

✅ Passed checks (4 passed)
  • Description Check (✅ Passed): check skipped because CodeRabbit's high-level summary is enabled.
  • Title check (✅ Passed): the title accurately reflects the two main changes: populating missing tarball checksums (#84) and adding PostgreSQL provider support to init (#85).
  • Linked Issues check (✅ Passed): all coding requirements from #84 and #85 are met: checksums populated (89 of 91), CompareTarballChecksum warns on missing checksums, --provider flag added, PostgreSQL init support implemented with directory setup and instructions.
  • Out of Scope Changes check (✅ Passed): all changes are in scope: checksum scraping scripts, tarball_list.json updates, init command modifications, PostgreSQL installer implementation, and CI tests for both features.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a --provider flag to the init command, enabling support for PostgreSQL alongside the default MySQL. It updates the initialization logic to handle PostgreSQL-specific paths and instructions, adds checksums for various database tarballs, and includes Python scripts to automate checksum scraping from official sources. Review feedback recommends enhancing the MySQL scraper's error handling and portability, and suggests making the PostgreSQL setup instructions more generic regarding version numbers.

Comment on lines +30 to +36
def fetch(url: str, cookies: str | None = None) -> str:
    """Fetch a URL via curl and return the body."""
    cmd = ["curl", "-sL"]
    if cookies:
        cmd += ["-b", cookies]
    cmd.append(url)
    return subprocess.check_output(cmd, text=True)

high

The fetch function does not handle errors from curl. If curl fails (e.g., with a 404 error), subprocess.check_output will raise an exception and crash the script. It would be more robust to handle this error, similar to how it's done in scrape_mariadb_percona_checksums.py.

Suggested change
def fetch(url: str, cookies: str | None = None) -> str:
    """Fetch a URL via curl and return the body."""
    cmd = ["curl", "-sL"]
    if cookies:
        cmd += ["-b", cookies]
    cmd.append(url)
    return subprocess.check_output(cmd, text=True)

def fetch(url: str, cookies: str | None = None) -> str:
    """Fetch a URL via curl and return the body."""
    cmd = ["curl", "-fsL"]
    if cookies:
        cmd += ["-b", cookies]
    cmd.append(url)
    try:
        return subprocess.check_output(cmd, text=True, stderr=subprocess.DEVNULL)
    except subprocess.CalledProcessError:
        return ""

ops/init.go Outdated
Comment on lines +264 to +271
fmt.Println("PostgreSQL binaries are not auto-downloaded.")
fmt.Println("Fetch the server + client .deb packages from either:")
fmt.Println(" - https://apt.postgresql.org/pub/repos/apt/pool/main/p/postgresql-16/")
fmt.Println(" - your distribution's PostgreSQL package mirror")
fmt.Println()
fmt.Println("Then unpack them into the sandbox-binary directory:")
fmt.Printf(" dbdeployer unpack --provider=postgresql \\\n")
fmt.Printf(" postgresql-16_*.deb postgresql-client-16_*.deb\n")

medium

The setup instructions for PostgreSQL hardcode version 16 in the URL and example commands. This could be confusing for users wanting to install a different version. It would be better to clarify that this is an example and provide more generic guidance.

Suggested change
fmt.Println("PostgreSQL binaries are not auto-downloaded.")
fmt.Println("Fetch the server + client .deb packages from either:")
fmt.Println(" - https://apt.postgresql.org/pub/repos/apt/pool/main/p/postgresql-16/")
fmt.Println(" - your distribution's PostgreSQL package mirror")
fmt.Println()
fmt.Println("Then unpack them into the sandbox-binary directory:")
fmt.Printf(" dbdeployer unpack --provider=postgresql \\\n")
fmt.Printf(" postgresql-16_*.deb postgresql-client-16_*.deb\n")
fmt.Println("PostgreSQL binaries are not auto-downloaded.")
fmt.Println("Fetch the server + client .deb packages for your desired version (e.g. 16) from either:")
fmt.Println(" - https://apt.postgresql.org/pub/repos/apt/ (e.g. .../pool/main/p/postgresql-16/)")
fmt.Println(" - your distribution's PostgreSQL package mirror")
fmt.Println()
fmt.Println("Then unpack them into the sandbox-binary directory (example for version 16):")
fmt.Printf(" dbdeployer unpack --provider=postgresql \\\n")
fmt.Printf(" postgresql-16_*.deb postgresql-client-16_*.deb\n")


def prime_cookies() -> str:
    """Prime cookies by fetching the archives page once."""
    jar = "/tmp/mysql-scrape-cookies.txt"

medium

The cookie jar path /tmp/mysql-scrape-cookies.txt is hardcoded. This is not portable and will fail on systems without a /tmp directory, such as Windows. Consider using a more portable way to define the temporary file path, for example by using the tempfile module.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/integration_tests.yml:
- Around line 382-425: The CI currently exercises PostgreSQL init but lacks the
PR-time gate to reject changes that add non-TiDB tarballs without checksums; add
a new job step in .github/workflows/integration_tests.yml (near the existing
"Test init --provider=postgresql" step) that runs the checksum-completeness
validator against downloads/tarball_list.json and fails the job if any non-TiDB
entry lacks a checksum; invoke the same validation logic used at runtime (the
code in downloads/remote_registry.go) — e.g., call the repository's checksum
validation command or a small Go script that loads downloads/tarball_list.json
and enforces checksum presence for new tarballs — so PRs are blocked when a
missing checksum is detected.

In `@ops/init.go`:
- Around line 70-79: The code currently overwrites the global default
sandboxBinary when provider == "postgresql" causing unintended persistence;
change the behavior so init only uses ~/opt/postgresql as a local runtime value
(sandboxBinary) and does not mark it as an explicit global default unless the
user actually passed --sandbox-binary. Concretely: in the provider switch where
sandboxBinary is set for PostgreSQL, do not modify persistence-related defaults
or globals; instead set only the local sandboxBinary variable, and in the
defaults-update/persistence block (the later defaults-update logic) consult
flags.Changed(globals.SandboxBinaryLabel) (populate
globals.SandboxBinaryExplicit in cmd/init.go) before writing the sandbox-binary
into defaults so that only an explicit flag causes persistence. Ensure the same
fix is applied to the second occurrence referenced (lines ~166-173) to avoid
accidental global updates.

In `@scripts/checksums/scrape_mysql_checksums.py`:
- Around line 30-36: fetch currently returns only the body and hides HTTP
errors; change fetch(url, cookies) to capture and return the HTTP status
alongside the response body (e.g., return (status_code, body)) instead of
swallowing non-200 responses, by invoking curl to output the status code (or
using --write-out) and parsing it; then update scrape_page to check the returned
status_code from fetch and bail (raise an error or return early) on any non-200
response so the script fails explicitly rather than treating error pages as “no
checksum found.”

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2fd7a9d1-b635-4fec-9a2b-4a91e86d8790

📥 Commits

Reviewing files that changed from the base of the PR and between fbc739b and 76c8d9a.

📒 Files selected for processing (8)
  • .github/workflows/integration_tests.yml
  • cmd/init.go
  • downloads/remote_registry.go
  • downloads/tarball_list.json
  • ops/init.go
  • scripts/checksums/README.md
  • scripts/checksums/scrape_mariadb_percona_checksums.py
  • scripts/checksums/scrape_mysql_checksums.py

Comment on lines +382 to +425
- name: Test init --provider=postgresql
  run: |
    # Run init against isolated dirs, verify it creates them and
    # prints the PostgreSQL setup instructions.
    OUT=$(./dbdeployer init --provider=postgresql \
      --sandbox-binary=/tmp/init-pg-bin \
      --sandbox-home=/tmp/init-pg-home \
      --skip-shell-completion 2>&1)
    echo "$OUT"

    # Verify directories were created
    [ -d /tmp/init-pg-bin ] || { echo "FAIL: sandbox-binary not created"; exit 1; }
    [ -d /tmp/init-pg-home ] || { echo "FAIL: sandbox-home not created"; exit 1; }

    # Verify the PostgreSQL setup instructions were printed
    echo "$OUT" | grep -q "PostgreSQL binaries are not auto-downloaded" || {
      echo "FAIL: expected PostgreSQL setup instructions in output"
      exit 1
    }
    echo "$OUT" | grep -q "dbdeployer unpack --provider=postgresql" || {
      echo "FAIL: expected unpack instructions in output"
      exit 1
    }

    # Verify it didn't try to download a MySQL tarball
    echo "$OUT" | grep -q "mysql-" && {
      echo "FAIL: output references mysql tarball"
      exit 1
    }

    # Sanity: invalid provider should fail
    ./dbdeployer init --provider=oracle --dry-run \
      --sandbox-binary=/tmp/init-bad \
      --sandbox-home=/tmp/init-bad-home 2>&1 | grep -q "unknown provider" || {
      echo "FAIL: invalid provider did not error as expected"
      exit 1
    }

    # Cleanup init side-effects so the rest of the job uses its own dirs
    rm -rf /tmp/init-pg-bin /tmp/init-pg-home /tmp/init-bad /tmp/init-bad-home
    # Restore sandbox defaults so subsequent steps find ~/opt/postgresql
    ./dbdeployer defaults update sandbox-binary "$HOME/opt/mysql"
    ./dbdeployer defaults update sandbox-home "$HOME/sandboxes"
    echo "OK: dbdeployer init --provider=postgresql works"

⚠️ Potential issue | 🟠 Major

Add the checksum-completeness gate to CI.

This workflow now covers the PostgreSQL init path, but it still doesn’t implement #84’s requirement to fail PRs when a new non-TiDB tarball is added without a checksum. downloads/tarball_list.json can still regress if it’s edited directly, because the runtime validation in downloads/remote_registry.go is not a PR-time guard.

Suggested CI step
+      - name: Verify tarball checksums are populated
+        run: |
+          missing=$(jq -r '
+            .Tarballs[]
+            | select((.flavor != "tidb") and (.checksum == null or .checksum == ""))
+            | .name
+          ' downloads/tarball_list.json)
+          [ -z "$missing" ] || {
+            echo "Missing checksums for:"
+            echo "$missing"
+            exit 1
+          }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/integration_tests.yml around lines 382 - 425, The CI
currently exercises PostgreSQL init but lacks the PR-time gate to reject changes
that add non-TiDB tarballs without checksums; add a new job step in
.github/workflows/integration_tests.yml (near the existing "Test init
--provider=postgresql" step) that runs the checksum-completeness validator
against downloads/tarball_list.json and fails the job if any non-TiDB entry
lacks a checksum; invoke the same validation logic used at runtime (the code in
downloads/remote_registry.go) — e.g., call the repository's checksum validation
command or a small Go script that loads downloads/tarball_list.json and enforces
checksum presence for new tarballs — so PRs are blocked when a missing checksum
is detected.

Comment on lines +70 to +79
// For PostgreSQL, point sandbox-binary at ~/opt/postgresql instead of
// the MySQL default, unless the user explicitly overrode it via
// --sandbox-binary.
if provider == "postgresql" && sandboxBinary == defaults.Defaults().SandboxBinary {
	home, herr := os.UserHomeDir()
	if herr != nil {
		return fmt.Errorf("cannot determine home directory: %s", herr)
	}
	sandboxBinary = path.Join(home, "opt", "postgresql")
}

⚠️ Potential issue | 🟠 Major

Don’t persist the PostgreSQL path as the new global default unless the user explicitly asked for it.

Right now init --provider=postgresql rewrites sandboxBinary to ~/opt/postgresql, and the later defaults-update block persists that as the global sandbox-binary. That changes subsequent MySQL commands too, which is why the workflow has to manually restore defaults afterward. It also can’t distinguish an inherited custom MySQL path from an explicit --sandbox-binary override.

Suggested direction
 type InitOptions struct {
 	SandboxBinary       string
 	SandboxHome         string
 	DryRun              bool
 	SkipDownloads       bool
 	SkipTarballDownload bool
 	SkipCompletion      bool
 	Provider            string
+	SandboxBinaryExplicit bool
 }
-	if provider == "postgresql" && sandboxBinary == defaults.Defaults().SandboxBinary {
+	if provider == "postgresql" && !options.SandboxBinaryExplicit {
 		home, herr := os.UserHomeDir()
 		if herr != nil {
 			return fmt.Errorf("cannot determine home directory: %s", herr)
 		}
 		sandboxBinary = path.Join(home, "opt", "postgresql")
 	}
-	if sandboxBinary != defaults.Defaults().SandboxBinary {
+	if options.SandboxBinaryExplicit && sandboxBinary != defaults.Defaults().SandboxBinary {
 		// update defaults
 	}

cmd/init.go can populate SandboxBinaryExplicit via flags.Changed(globals.SandboxBinaryLabel).

Also applies to: 166-173

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ops/init.go` around lines 70 - 79, The code currently overwrites the global
default sandboxBinary when provider == "postgresql" causing unintended
persistence; change the behavior so init only uses ~/opt/postgresql as a local
runtime value (sandboxBinary) and does not mark it as an explicit global default
unless the user actually passed --sandbox-binary. Concretely: in the provider
switch where sandboxBinary is set for PostgreSQL, do not modify
persistence-related defaults or globals; instead set only the local
sandboxBinary variable, and in the defaults-update/persistence block (the later
defaults-update logic) consult flags.Changed(globals.SandboxBinaryLabel)
(populate globals.SandboxBinaryExplicit in cmd/init.go) before writing the
sandbox-binary into defaults so that only an explicit flag causes persistence.
Ensure the same fix is applied to the second occurrence referenced (lines
~166-173) to avoid accidental global updates.

Comment on lines +30 to +36
def fetch(url: str, cookies: str | None = None) -> str:
    """Fetch a URL via curl and return the body."""
    cmd = ["curl", "-sL"]
    if cookies:
        cmd += ["-b", cookies]
    cmd.append(url)
    return subprocess.check_output(cmd, text=True)

⚠️ Potential issue | 🟠 Major

Fail explicitly on HTTP fetch errors instead of treating them as “no checksum found.”

curl -sL will happily return HTML for 404/interstitial pages, so scrape_page() can silently interpret a fetch failure as an empty result set and under-populate the registry. The MariaDB/Percona scraper in scripts/checksums/scrape_mariadb_percona_checksums.py:19-33 already preserves HTTP status; this one should do the same.

Suggested direction
-def fetch(url: str, cookies: str | None = None) -> str:
-    """Fetch a URL via curl and return the body."""
-    cmd = ["curl", "-sL"]
+def fetch(url: str, cookies: str | None = None) -> tuple[int, str]:
+    """Fetch a URL via curl and return (http_status, body)."""
+    cmd = ["curl", "-sL", "-w", "\n%{http_code}"]
     if cookies:
         cmd += ["-b", cookies]
     cmd.append(url)
-    return subprocess.check_output(cmd, text=True)
+    try:
+        out = subprocess.check_output(cmd, text=True, stderr=subprocess.DEVNULL)
+    except subprocess.CalledProcessError:
+        return (0, "")
+    body, _, code = out.rpartition("\n")
+    try:
+        return (int(code), body)
+    except ValueError:
+        return (0, "")

Then make scrape_page() bail out on non-200 responses.

🧰 Tools
🪛 Ruff (0.15.9)

[error] 36-36: subprocess call: check for execution of untrusted input

(S603)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/checksums/scrape_mysql_checksums.py` around lines 30 - 36, fetch
currently returns only the body and hides HTTP errors; change fetch(url,
cookies) to capture and return the HTTP status alongside the response body
(e.g., return (status_code, body)) instead of swallowing non-200 responses, by
invoking curl to output the status code (or using --write-out) and parsing it;
then update scrape_page to check the returned status_code from fetch and bail
(raise an error or return early) on any non-200 response so the script fails
explicitly rather than treating error pages as “no checksum found.”

On macOS there is no official PostgreSQL tarball distribution. The
closest to a drop-in binary package is Postgres.app, which bundles
full PG major versions inside a .dmg with the same bin/lib/share
layout that dbdeployer already uses.

This commit wires Postgres.app into `dbdeployer init --provider=postgresql`
for darwin builds:

  1. Query the Postgres.app GitHub releases API for the latest release
  2. Pick the newest single-major-version .dmg asset
     (skipping the large multi-version bundle)
  3. Download to a tempdir
  4. Mount headlessly with `hdiutil attach -nobrowse -readonly -noverify`
  5. Run the bundled `postgres --version` to detect the exact X.Y
     version (deploy postgresql requires X.Y format)
  6. Copy bin/lib/share/include into ~/opt/postgresql/<X.Y>/
  7. Detach the DMG

No sudo, no GUI, no Homebrew. Postgres.app is a universal binary so
the same code path works on both Apple Silicon and Intel Macs.

Linux behavior is unchanged — `init --provider=postgresql` still
prints instructions to download .deb files manually, since the PGDG
pool structure (distro-specific, architecture-specific, multiple
build variants per release) is too messy to resolve generically.

Tests:
- Unit test for the GitHub release asset filtering
  (offline, using httptest.NewServer)
- CI job `postgresql-macos-test` on macos-14 and macos-15:
  * runs `init --provider=postgresql`
  * verifies binaries land at ~/opt/postgresql/<X.Y>/bin
  * verifies the directory name is in X.Y format
  * runs `deploy postgresql <X.Y>`
  * verifies CREATE TABLE / INSERT / SELECT roundtrip

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (4)
providers/postgresql/macos_install_test.go (1)

54-57: Consider handling the JSON encoding error.

The error from json.NewEncoder(w).Encode(fakeGithubRelease) is silently discarded. While unlikely to fail in a test with a static struct, explicitly handling it would prevent silent test corruption if the fixture becomes invalid.

♻️ Suggested improvement
 	server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
 		w.Header().Set("Content-Type", "application/json")
-		_ = json.NewEncoder(w).Encode(fakeGithubRelease)
+		if err := json.NewEncoder(w).Encode(fakeGithubRelease); err != nil {
+			http.Error(w, err.Error(), http.StatusInternalServerError)
+		}
 	}))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@providers/postgresql/macos_install_test.go` around lines 54 - 57, The test's
HTTP handler discards the error from
json.NewEncoder(w).Encode(fakeGithubRelease), which can silently mask encoding
failures; update the httptest.NewServer handler function used to construct
server to capture the error returned by
json.NewEncoder(w).Encode(fakeGithubRelease) and call t.Fatalf or t.Errorf (or
write an appropriate HTTP 500) when encoding fails so the test fails loudly;
locate the anonymous handler passed to httptest.NewServer and modify the json
encoding call there to check and handle the returned error.
providers/postgresql/macos_install.go (3)

121-129: Sorting assumes major versions remain ≤2 digits.

The string comparison assets[j].Major > assets[i].Major works correctly for PostgreSQL majors 14-18 (and up to 99), but would sort incorrectly if PostgreSQL ever reaches major version 100+ (e.g., "100" < "99" lexicographically).

This is unlikely to be an issue in practice for many years, but consider parsing to integers if future-proofing is desired.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@providers/postgresql/macos_install.go` around lines 121 - 129, The current
sorting loop compares assets[j].Major and assets[i].Major as strings which can
misorder majors with three+ digits; modify the sort to compare numeric majors by
parsing assets[].Major into integers (handle parse errors safely, e.g., treat
non-numeric or parse failure as 0 or skip) and then use the integer comparison
in the nested loop where assets are swapped; update any references in this block
that use assets and the Major field to use the parsed int for sorting.

279-286: File close error is discarded.

If out.Close() fails (e.g., due to disk full on final flush), the error is silently ignored because defer out.Close() doesn't capture the return value. For large DMG downloads, this could mask data corruption.

♻️ Handle close error
-	out, err := os.Create(dst) //nolint:gosec // dst is a path we constructed in a tempdir
-	if err != nil {
-		return err
-	}
-	defer out.Close()
-
-	_, err = io.Copy(out, resp.Body)
-	return err
+	out, err := os.Create(dst) //nolint:gosec // dst is a path we constructed in a tempdir
+	if err != nil {
+		return err
+	}
+
+	_, copyErr := io.Copy(out, resp.Body)
+	closeErr := out.Close()
+	if copyErr != nil {
+		return copyErr
+	}
+	return closeErr
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@providers/postgresql/macos_install.go` around lines 279 - 286, The file
currently defers out.Close() and returns the io.Copy error, discarding any Close
error; change the logic around the writer "out" so you capture the result of
io.Copy (the current "err"), then call out.Close() explicitly, capture its error
(e.g., closeErr) and if closeErr is non-nil and the copy error is nil, return
closeErr (or combine/annotate both errors), ensuring errors from io.Copy and
out.Close() (final flush) are both handled before returning; locate the block
that creates "out" and calls io.Copy(out, resp.Body) to implement this change.

169-177: Consider the security implications of -noverify.

The -noverify flag skips DMG checksum verification. While this speeds up mounting, it bypasses macOS's built-in integrity check for the downloaded image. Since the DMG is downloaded over HTTPS from GitHub, the transport is secure, but an attacker who compromises the GitHub release could provide a corrupted/malicious DMG that would not be detected.

If performance is acceptable, consider removing -noverify to let hdiutil verify the DMG's internal checksum.

♻️ Consider removing -noverify
 	mountCmd := exec.Command("hdiutil", "attach",
-		"-nobrowse", "-readonly", "-noverify",
+		"-nobrowse", "-readonly",
 		"-mountpoint", mountPoint,
 		dmgPath)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@providers/postgresql/macos_install.go` around lines 169 - 177, The code
creates mountCmd using exec.Command("hdiutil", "attach", "-nobrowse",
"-readonly", "-noverify", "-mountpoint", mountPoint, dmgPath) which uses
-noverify (skips DMG checksum verification); remove the "-noverify" argument so
hdiutil performs integrity checks when mounting. Update the argument list passed
to exec.Command in the mountCmd creation (where hdiutil attach is invoked) to
exclude "-noverify" and keep "-nobrowse" and "-readonly" as before, then run
tests/flows to ensure mounting still works under expected performance
constraints.


📥 Commits

Reviewing files that changed from the base of the PR and between 76c8d9a and 20a9f79.

📒 Files selected for processing (4)
  • .github/workflows/integration_tests.yml
  • ops/init.go
  • providers/postgresql/macos_install.go
  • providers/postgresql/macos_install_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • ops/init.go

dbdeployer invokes initdb with `-L <basedir>/share`, which expects
postgres.bki directly at that path. Postgres.app nests its share
data under share/postgresql/, so initdb failed with:

    initdb: error: file "share/postgres.bki" does not exist

Flatten share/postgresql/* into share/ after the main copy, in
addition to the nested copy that the postgres runtime needs (via
compiled-in sharedir relocation). Mirrors what the Linux .deb
extractor already does.

The unauthenticated GitHub API allows 60 req/hour per IP. Shared CI
runners on GitHub Actions routinely exhaust this: the first macOS
init run got HTTP 403.

Send GITHUB_TOKEN as a Bearer token if set in the environment
(automatic on every GitHub Actions runner). This raises the limit to
5000 req/hour and is harmless for unauthenticated local use where
the token is simply absent.

Also expose GITHUB_TOKEN to the macOS init CI step.


Development

Successfully merging this pull request may close these issues.

- dbdeployer init doesn't support PostgreSQL (hardcoded to MySQL)
- Missing checksums for MySQL 8.4+, 9.x, MariaDB, Percona 5.7+/8.x (91/254 tarballs)

1 participant