feat: add auto-update mechanism for CLI #825
Open
m-abdelwahab wants to merge 61 commits into master
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ck issues - Don't await download_and_stage on exit; spawn it as fire-and-forget so commands like --help return instantly even when an update is available - Add writability check before attempting self-update download, skipping the download entirely for root-owned paths like /usr/local/bin - Replace acquire-then-drop file lock with PID file guard for package manager updates to prevent duplicate concurrent spawns Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Move download_and_stage back into the awaited task so the tokio runtime doesn't cancel it on exit, but cap handle_update_task with a 5-second timeout so short commands like --help aren't blocked - Write "PID TIMESTAMP" to the lock file and treat entries older than 10 minutes as stale, fixing the permanent block on Windows where is_pid_alive always returned true Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… control flow - Write last_update_check even when CLI is up-to-date, preventing a GitHub API call on every invocation - Use separate file paths for self-update flock (update.lock) and package-manager PID guard (package-update.pid) - Return Ok(None) instead of bail! for "already checked today" since it's normal control flow, not an error - Make Preferences::write atomic via temp file + rename Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…date - Bypass same-day gate for known-pending versions so timed-out downloads retry on the next invocation - Skip spawning update task when running `autoupdate` subcommand to avoid racing with preference changes - Use nix crate for Unix PID liveness check; add Windows implementation via winapi instead of conservative fallback - Detach child update process from console Ctrl+C on Windows - Add Windows CREATE_NEW_PROCESS_GROUP flag to package manager spawning - Use scopeguard for write-probe cleanup in install method detection - Warn when checksums.txt is missing rather than silently skipping - Improve rollback: back up current binary first, support multi-candidate selection via inquire prompt - Remove freebsd target triple (no release asset published) - Use nanosecond-precision tmp file names to reduce collision risk Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…on-writable shell installs, and preserve retry signal - Skip try_apply_staged() for both `upgrade` and `autoupdate` subcommands so `railway autoupdate disable` doesn't swap the binary before disabling - Add can_write_binary() check in `railway upgrade` for shell installs in non-writable locations (e.g. /usr/local/bin with sudo), guiding users to use sudo or reinstall instead of failing with a permission error - Move clear_latest() from eager call in main to post-success in the background task, so a timed-out download preserves the retry signal for the next invocation instead of losing it behind the same-day gate Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ownload_and_stage, and add FreeBSD target - For installs that can't self-update or auto-run a package manager (Homebrew, Cargo, Unknown, non-writable Shell), clear latest_version after the notification so the next day's check_update() can discover newer releases instead of freezing on the first cached version - Wrap download_and_stage() with an exclusive file lock (update.lock) using double-checked locking to prevent concurrent CLI processes from racing on the staged-update directory - Add FreeBSD x86_64 to detect_target_triple() to match install.sh support, preventing shell-installed FreeBSD users from hitting an "Unsupported platform" error on upgrade Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…fe rename, custom bin dir detection, and rollback target tracking - download_and_stage returns Result<bool> so callers distinguish "lock held" (no work done) from a real staging; failed package-manager spawns no longer fall through to the notification-only branch that clears the cache - Add rename_replacing() helper that removes the destination on Windows before renaming, fixing silent write failures for preferences.json and update.json after the first successful write - Shell install detection now falls back to any parent directory named "bin", covering custom --bin-dir installs (~/bin, /opt/bin, etc.) - Backup filenames include the target triple; rollback filters candidates by current architecture to prevent cross-arch restoration Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…g, preference persistence, and FreeBSD self-update - Use rename_replacing() in UpdateCheck::write() so Windows cache clears work - Add interact_or! and can_write_binary() guards to rollback path - Report CI-disabled state in autoupdate status - Create ~/.railway dir in Preferences::write() for clean HOME directories - Remove FreeBSD from self-update targets until release pipeline publishes assets Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…D self-update, and preserve notifications when auto-updates disabled Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…, and use atomic Windows rename - Notification-only installs (Homebrew, Cargo, Unknown) now preserve latest_version in the cache so the "new version available" notice actually shows on the next invocation instead of being cleared before the user sees it. - Detached package-manager updates no longer clear the retry signal on spawn — the cached version persists until the user is actually on the new version. - Failed staged-update applies preserve the staged payload for retry instead of deleting it; the 7-day staleness TTL handles permanent failures. - `railway upgrade` now clears the version cache after a successful update so the next invocation doesn't redundantly re-download. - Preferences::write() returns Result so `railway autoupdate disable` and `railway telemetry disable` report failures instead of silently succeeding. - Standardize timestamp_nanos_opt() on unwrap_or_default() everywhere. - Windows rename_replacing uses MoveFileExW(MOVEFILE_REPLACE_EXISTING) for a single-syscall atomic replace instead of remove+rename. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…lper Consolidate repeated UpdateCheck write blocks in spawn_update_task into a single persist_latest() method, and skip redundant writes when check_update() already persisted the timestamp. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
list_backups relied on filesystem modification times, which required write access to set in tests and is fragile across platforms. Sort by the semver version embedded in the backup filename instead. Also removes low-value tests that only assert trivial string formatting or compiler-guaranteed match exhaustiveness. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…, and clarify update message - Spawn background downloads in a detached child process so they survive beyond the parent's exit timeout instead of restarting from zero - Add download_failures counter to version.json; after 3 consecutive failures, clear the cached version to force a fresh API re-check (fixes infinite retry loop on yanked/stale releases) - Reduce exit timeout from 5s to 2s since it now only gates the fast API version check, not the download - Append "(active on next run)" to the auto-update message since the current process still executes the old binary Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ndant spawns Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rollback prune timing, and check_update timeout - Correct "5 s" → "2 s" in download_and_stage comment to match handle_update_task - Simplify can_write_binary probe to unconditional create-then-remove - Skip spawn_update_task in non-TTY when auto-update is disabled - Clean up leftover .old.exe on Windows at the start of try_apply_staged - Defer backup pruning until after rollback succeeds so candidates aren't removed before the picker - Add 30 s timeout to check_update reqwest client to prevent indefinite hangs Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…output println! in the auto-update and version-banner paths wrote to stdout, which breaks JSON parsing when the CLI is invoked programmatically (e.g. by Claude Code piping `railway status --json`). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Once the installed binary reaches or surpasses the cached latest_version, clear the cache so spawn_update_task falls through to a fresh check_update(). This prevents repeated package-manager spawns on every invocation after an update lands and allows discovery of newer releases on manual install paths. Also null out stdin on the detached package-manager update process to match spawn_background_download and fully detach from the terminal. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… permanent Previously spawn_update_task skipped check_update() entirely when a cached version existed, and re-persisted the timestamp on every invocation. This prevented the same-day gate from ever expiring, so the CLI would advertise a stale cached version forever and never discover newer releases. Now we always call check_update() first (gated to once per UTC day), falling back to the cached version only within the same day for download retries. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
background_stage_update() was clearing the cached latest_version immediately after staging succeeded, but the binary isn't replaced until try_apply_staged() runs on a later invocation. If that apply step fails, the user loses both the upgrade banner and future download retries until the 7-day stale TTL expires. The cache is already cleared on successful apply in try_apply_staged(), so the staging path should leave it alone. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
self_update_interactive() dropped the update lock after staging but before applying, letting a concurrent try_apply_staged() race the binary replacement. Now the lock is held across both staging and apply. rollback() was mutating the binary and cleaning staged state without acquiring update.lock at all, which could interleave with a background auto-updater. Now it acquires the same lock before replacing the binary. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ose stage-apply race update_strategy() now checks can_self_update() and can_write_binary() so `autoupdate status` no longer claims "Background download + auto-swap" when the binary directory is not writable or the platform is unsupported. self_update_interactive() now holds a single lock across both download_and_stage_inner() and apply_staged_update(), eliminating the window where a concurrent try_apply_staged() could consume the staged binary between staging and applying. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…lock race Removing the lock file while the handle is still held unlinks the inode on Unix, allowing a concurrent process to create a new file at the same path and acquire its own "exclusive" lock. Replacing remove_file with drop releases the lock via the OS and leaves the file as an inert sentinel. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Extract shared spawn_detached() into util/mod.rs, deduplicating the detached-process setup from spawn_background_download and spawn_package_manager_update - Remove redundant staged-update check in spawn_background_download (child process already checks in download_and_stage) - Consolidate scattered from_cache/persist_latest branches into a single needs_persist flag - Cache is_auto_update_disabled() result in main() instead of calling it twice - Use persist_latest(None) in check_update's up-to-date branch - Trim narrating comments Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rely After rollback, record the rolled-back-from version as skipped in version.json. Auto-update skips only that version and resumes normally once a newer release is published. This avoids silent staleness from a full pause while still respecting the user's rollback intent. - Add skipped_version field to UpdateCheck (serde-default for compat) - Record skip in rollback(), guard try_apply_staged() against the race where a pre-rollback detached download stages the skipped version - Clear skip on successful update (clear_after_update) and autoupdate enable - Suppress "new version available" notification for skipped version - Show skipped version in autoupdate status Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ope) The missing checksums.txt in the release pipeline predates this PR. Reverting the workflow change to keep this PR focused on auto-update fixes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ut of scope)" This reverts commit e1e76a1.
…rmissions - Gate background updates and staged-apply on is_tty so cron jobs, scripts, and piped invocations never silently self-mutate - Resolve symlinks in InstallMethod::detect() to prevent Intel Homebrew misclassification when current_exe() returns the symlink path - Replace catch-all bin/ Shell detection with explicit allowlist of known shell-installer directories to avoid overwriting version-manager binaries - Probe binary directory writability in can_auto_run_package_manager() to prevent repeated doomed npm/Bun/Scoop updates on sudo installs - Acquire package-update lock in autoupdate disable to wait for in-flight package manager updates before returning Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…, fix status accuracy - autoupdate disable now polls the actual child PID instead of acquiring a lock the detached process never holds, so it truly waits for in-flight npm/Bun/Scoop updates (30s timeout with warning) - Replace install-method allowlist with version-manager exclusion list (asdf, mise, rtx, proto, volta, fnm, nodenv, rbenv, pyenv) and restore the */bin/ catch-all so custom --bin-dir installs from install.sh are correctly classified as Shell again - update_strategy() now reflects the writability check for npm/Bun/Scoop, showing "Notification only" when the binary directory is not writable instead of overpromising Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…inary, quiesce detached child on disable - `railway autoupdate enable` no longer clears skipped_version; the rollback guard persists until a newer release is applied - Staged-update fast paths now verify the binary exists before short-circuiting; missing binary cleans stale metadata immediately - Detached background child re-checks auto-update preferences before downloading and again after acquiring the update lock Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
check_updates is a read-only command but was not in the update-management guard, so it could apply staged binaries and spawn background downloads. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ter rollback, fix Windows upgrade instructions - spawn_package_manager_update re-checks is_auto_update_disabled() after acquiring its lock, preventing a concurrent invocation from spawning an updater after the user ran autoupdate disable - skip_version() now clears last_update_check so the next invocation performs a fresh API check and can discover a newer release published the same day as the rollback - Upgrade/rollback non-writable instructions now show Windows-appropriate guidance (run as Administrator) instead of sudo/bash commands Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ire package-update lock in disable - check_update() no longer arms the daily gate when the discovered version matches skipped_version, so a fix release published later the same day is discovered on the next invocation - autoupdate disable now acquires package_update_lock (blocking) before reading the PID file, closing the race where spawn_package_manager_update writes the PID after disable already looked Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…eout disable was unconditionally removing the PID file even when the detached updater was still alive. This left no in-flight marker, so re-enabling auto-updates could launch a duplicate updater. Now the PID file is only removed once the child has actually exited. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… file On a fresh install where ~/.railway has never been written, rollback would fail with "Failed to create update lock file" instead of reporting no backups available. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The */bin catch-all could misclassify a custom Cargo install root (e.g. CARGO_INSTALL_ROOT=~/tools) as a shell install, allowing the auto-updater to self-replace a Cargo-managed binary. Now checks for Cargo's .crates.toml marker in the parent of bin/ before falling through to Shell. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…p loss, check Cargo marker before shell-path heuristic check_update() loaded version.json, did a network call (up to 30s), then wrote the stale snapshot back — silently overwriting a skipped_version set by a concurrent rollback. Now re-reads from disk after the network call. InstallMethod::detect() matched /usr/local/bin before the .crates.toml marker check, so CARGO_INSTALL_ROOT=/usr/local installs were misclassified as Shell. Moved the marker check above the shell-path heuristic. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… from npm detection clear_latest() called persist_latest(None) which set last_update_check=now, preventing the background task from discovering a hotfix published later the same day after a manual upgrade. Now clears the cached version without stamping the gate. pnpm global paths contain "npm" as a substring, causing misclassification as InstallMethod::Npm and driving updates through the wrong package manager. Added an early pnpm check that falls back to Unknown. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…dates run_upgrade_command() now acquires the same file lock and checks the PID file used by spawn_package_manager_update(), preventing concurrent package-manager processes against the same global install when a user runs `railway upgrade` while a background auto-update is in flight. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…d version on API failure Use raw args to detect update-management subcommands so that clap DisplayHelp/DisplayVersion error paths (e.g. `railway upgrade --help`) no longer trigger try_apply_staged() as a side effect. Match on the Result from check_update() instead of using `?` so that API errors fall back to the cached known_version for retry rather than short-circuiting the entire update task. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…aths --help, --version, and invalid-input paths now bypass both try_apply_staged() and the background update spawn, ensuring they are truly read-only with zero added latency. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…check - Extract `parse_pid_file()` in check_update.rs to deduplicate PID file parsing across autoupdate, upgrade, and check_update modules - Remove unreachable `.crates.toml` guard in install_method.rs catch-all (the same check already fires earlier in detect()) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… idempotent - After rollback, an API failure would fall back to the cached (skipped) version and call persist_latest(), arming the daily gate. This prevented re-checking for a newer release until the next day. Now skipped versions skip persist_latest so the API is re-checked on every invocation. - Add --clobber to gh release upload so re-running the checksums job doesn't fail on an existing asset. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…get triple - Bare `railway` (no subcommand) now skips try_apply_staged() and the background updater, matching the read-only behavior of --help/--version. Previously a first-time user typing `railway` to explore would trigger update side effects before seeing help. - detect_target_triple() now uses the compile-time BUILD_TARGET from build.rs instead of runtime OS/arch guessing. This ensures the self-updater fetches the correct ABI variant (e.g. a binary built for x86_64-unknown-linux-gnu will not be replaced with a musl build). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… format - Add "help" to the read-only invocation guard so `railway help` skips try_apply_staged() and the background updater, matching the behavior of --help and bare `railway`. - i686-pc-windows-gnu is cross-compiled on Linux and only ships as .tar.gz (no .zip). The asset name logic now accounts for this, and extraction derives the format from the asset name rather than re-checking the target. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…, eager download spawn - clear_latest() now resets last_update_check so that same-day hotfixes are discovered after the user catches up to a cached version. - Shell installs on unsupported self-update platforms (e.g. FreeBSD) now get a dedicated match arm showing the reinstall command instead of falling through to the vague catch-all message. - spawn_update_task now starts the background download from the cached version before the API call, so the 2s exit timeout cannot strand the spawn on slow networks. The API check still runs to refresh the cache. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The eager download spawn could race with a newer version discovered by the API: the cached-version child holds the update lock, so the newer-version child exits immediately, and try_apply_staged would apply the stale cached release on the next run. Fix: only eagerly spawn when the same-day gate is armed (last_update_check is today). In that case check_update(false) returns Ok(None) instantly without a network request, so the API cannot discover a newer version that would conflict. When the gate is NOT armed, the API call goes over the network and may return a newer release — defer the spawn to after the check completes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…log file spawn_detached() creates the log file without ensuring the parent directory exists. On a fresh install where ~/.railway doesn't exist yet, this fails silently (the spawn result is ignored), preventing the background download from starting. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ne, add PID TTL to disable - Version discovery and the "New version available" banner now run regardless of auto-update preference. Disabling auto-update stops automatic installation, not release awareness. - `railway upgrade` now falls back to an already-staged update when the network check fails, so offline users can apply a previously downloaded binary. - `autoupdate disable` now applies the 10-minute PID file TTL before trusting the stored PID, consistent with upgrade and spawn_package_manager_update. Prevents blocking 30s on a recycled PID. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The fallback path in self_update_interactive now applies the same guards as try_apply_staged: rejects stale entries, wrong-platform binaries, versions <= current, and versions the user rolled back from. Prevents offline `railway upgrade` from downgrading or re-applying a skipped version. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Switch version check gate from calendar-day UTC to fixed 12h window - Remove checksums.txt machinery (code, tests, workflow job) — co-located checksums don't add security; TLS handles integrity in transit - Reduce exit-delay timeout from 2s to 1s for version check - Simplify `autoupdate disable` — set flag and clean staged, no PID polling - Run version check in non-TTY to keep cache fresh for script-heavy users - Suppress update banner when disabled via env var or CI, keep for preference - Extract `validate_staged()` to deduplicate safety checks - Enrich `autoupdate status` with pipeline state (latest version, staged, last check time, in-flight PID) - Add `autoupdate skip` subcommand for package-manager installs - Split download timeout: 30s background, 120s interactive - Replace count-based package-manager retry (5 attempts) with time gate (1/hr) - Add telemetry event for silent auto-update apply - Clean up orphaned generate-checksums workflow job Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Extract `is_package_update_running()` helper to replace 3 identical PID-file-parse + age-check + liveness-check blocks - Extract `PID_STALENESS_TTL_SECS` constant (was magic number 600) - Remove unnecessary clone in `autoupdate skip` - Remove TOCTOU `.exists()` check before `hard_link` in backup - Remove unnecessary comment Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Move validate_staged() inside the exclusive lock in try_apply_staged() to close TOCTOU window where another process could delete the staged binary between validation and apply - Use std::mem::forget on detached Child handles to avoid leaking OS resources on Windows (both package-manager and background-download paths) - Use rename_replacing() in replace_binary() on Unix for consistency - Replace silent `let _ = write()` in cache mutation methods with try_write() that logs warnings on failure Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Resolves conflicts: - src/telemetry.rs: keep auto-update additions, drop Notices struct and show_notice_if_needed (moved to install script in #832) - src/main.rs: keep auto-update startup logic, drop show_notice call Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Adds a complete auto-update system to the Railway CLI that detects the install method, picks the right update strategy, and applies updates in the background, transparently and safely. Users no longer need to manually run `railway upgrade` to stay current.
New commands
- `railway autoupdate enable`: Enable auto-updates (preserves any rollback skip; prints a notice if one is active)
- `railway autoupdate disable`: Disable auto-updates and clean up staged binaries (returns instantly, no blocking wait)
- `railway autoupdate status`: Show current state: enabled/disabled reason, install method, update strategy, latest known version, staged update, last check time, in-flight background PID, and skipped version
- `railway autoupdate skip`: Skip the current pending version so auto-update stops trying to install it (essential for package-manager installs where `--rollback` isn't available)
- `railway upgrade --rollback`: Revert to a previous CLI version from local backups (shell installs only)
Design
Two-phase update: stage now, apply next run
The core mechanism is stage → apply:
… `~/.railway/staged-update/`. This process is fully independent: it survives after the parent exits, so slow downloads never block the user. Background downloads time out after 30 seconds; interactive `railway upgrade` allows 120 seconds.
Why not download-and-exec in one shot? Replacing a running binary and exec'ing into it is fragile, especially on Windows where the OS locks the running executable. The two-phase approach avoids self-replacement races, keeps each invocation's behavior predictable (you always run the binary you launched), and makes rollback straightforward since the old binary is backed up before replacement.
Why detached child processes? A CLI command should return in milliseconds. Downloads can take seconds. Spawning a detached process (`process_group(0)` on Unix, `CREATE_NEW_PROCESS_GROUP` on Windows) means the download runs independently of the parent's lifetime. A 1-second timeout on the version-check task ensures even `railway --help` returns instantly; the only work gated on that timeout is the fast GitHub API call, never the download.
12-hour API check gate
The CLI queries the GitHub Releases API at most once per 12 hours. The check timestamp is stored in `~/.railway/version.json` and compared as a fixed duration from the last check.
Why 12 hours? It balances hotfix discovery speed (a critical fix is picked up within half a day) against API call volume. Users who work in morning and afternoon sessions get a fresh check each session. The fixed-duration approach avoids the timezone-dependent edge cases of a calendar-day gate (e.g., checking at 11:59pm UTC and not rechecking until the next calendar day).
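The fixed-duration gate can be sketched in a few lines. This is a minimal illustration, not the PR's code: the names `CHECK_INTERVAL` and `check_is_due` are hypothetical, and the choice to treat a backwards clock jump as "check is due" is an assumption.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical constant for the 12-hour window described above.
const CHECK_INTERVAL: Duration = Duration::from_secs(12 * 60 * 60);

/// Returns true when a fresh release check is due.
/// `last_check` is `None` when no check has been recorded (or the gate was cleared).
fn check_is_due(last_check: Option<SystemTime>, now: SystemTime) -> bool {
    match last_check {
        None => true,
        Some(prev) => match now.duration_since(prev) {
            // Compare elapsed time, not calendar days, so timezones never matter.
            Ok(elapsed) => elapsed >= CHECK_INTERVAL,
            // Assumption: if the clock moved backwards, distrust the stored timestamp.
            Err(_) => true,
        },
    }
}
```

Because the comparison is on elapsed time only, a check made late in one UTC day and another made 12 hours later are treated the same as any other pair of checks.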
The version check runs in all environments (including non-TTY and piped output) to keep the cache fresh. Only the staged-binary apply and update banner are suppressed in non-interactive contexts.
If the user rolls back from a bad version, the check gate is explicitly cleared so the CLI re-checks on every invocation until a newer (non-skipped) release appears — ensuring a hotfix is discovered promptly.
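The interaction between rollback and the check gate might look roughly like the sketch below. The field names `latest_version`, `skipped_version`, and `last_update_check` and the method name `skip_version` appear in the commit messages; the struct layout, the `record_check` and `pending_version` helpers, and the use of Unix seconds are assumptions made for illustration.

```rust
/// Simplified stand-in for the state kept in ~/.railway/version.json.
#[derive(Debug, Default)]
struct UpdateCheck {
    latest_version: Option<String>,
    skipped_version: Option<String>,
    /// Unix seconds of the last API check; `None` means the gate is not armed.
    last_update_check: Option<u64>,
}

impl UpdateCheck {
    /// Called on rollback: remember the bad version and clear the gate
    /// so the next invocation performs a fresh API check.
    fn skip_version(&mut self, version: &str) {
        self.skipped_version = Some(version.to_string());
        self.last_update_check = None;
    }

    /// Hypothetical helper: record the result of an API check. The gate is
    /// armed only when the discovered version is not the skipped one.
    fn record_check(&mut self, discovered: &str, now: u64) {
        self.latest_version = Some(discovered.to_string());
        if self.skipped_version.as_deref() != Some(discovered) {
            self.last_update_check = Some(now);
        }
    }

    /// Hypothetical helper: the version worth installing or announcing, if any.
    fn pending_version(&self) -> Option<&str> {
        match (self.latest_version.as_deref(), self.skipped_version.as_deref()) {
            (Some(latest), Some(skipped)) if latest == skipped => None,
            (latest, _) => latest,
        }
    }
}
```

With this shape, a check that rediscovers the skipped version leaves `last_update_check` empty, so the API is queried again on the next run, while a newer release arms the gate as usual.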
Install method detection
The CLI resolves `current_exe()`, follows symlinks, and matches the path against known patterns:
- `~/.railway/bin`, `/usr/local/bin`, generic `*/bin`
- … `sudo` / Administrator
- `node_modules`, `.bun`, or `scoop` paths
- `homebrew` or `Cellar`
- `.cargo/bin` or custom `CARGO_INSTALL_ROOT` (detected via `.crates.toml` marker)
- … `/usr/bin`, `/nix/`, `/snap/`), version managers (`.asdf`, `.mise`, `.volta`, etc.)
Why notification-only for Homebrew and Cargo? Both package managers can take minutes to complete (Homebrew resolves the full dependency graph; Cargo recompiles from source). Spawning them in the background would leave a long-running process that's invisible to the user, consuming CPU and potentially conflicting with other brew/cargo operations. npm, Bun, and Scoop are fast enough for background use.
Why a write probe instead of checking file permissions? Unix file permissions don't account for ACLs, mount flags, or SELinux policies. Creating a temp file in the binary's directory and immediately removing it is the most reliable cross-platform writability check.
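A write probe of this kind can be sketched with the standard library alone. The function name `can_write_dir` and the probe file name are hypothetical (the PR's own helper is called `can_write_binary()` and may differ in detail); the sketch only shows the create-then-remove idea.

```rust
use std::fs::{self, OpenOptions};
use std::path::Path;
use std::process;

/// Returns true if a file can actually be created in `dir`.
/// This exercises the real write path instead of inspecting permission bits.
fn can_write_dir(dir: &Path) -> bool {
    // Hypothetical probe name; the PID reduces collisions between concurrent probes.
    let probe = dir.join(format!(".write-probe-{}", process::id()));
    let created = OpenOptions::new()
        .write(true)
        .create_new(true) // fail rather than truncate an existing file
        .open(&probe)
        .is_ok();
    if created {
        // Best-effort cleanup; a leftover probe file is harmless.
        let _ = fs::remove_file(&probe);
    }
    created
}
```

A caller would pass the parent directory of the resolved executable path and fall back to notification-only behavior when the probe fails.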
Version skipping
When a release is broken, users need a way to say "stop trying to install this version" without disabling all future updates. The `skipped_version` field in `version.json` serves this purpose:
- `railway upgrade --rollback` swaps the binary back and sets `skipped_version` automatically.
- `railway autoupdate skip` marks the currently cached pending version as skipped. This is essential because `--rollback` isn't available for package-manager installs; without `skip`, the next CLI invocation would re-run the package manager and put the user right back on the broken version.
The skip clears automatically when a newer release is successfully applied (via auto-update or `railway upgrade`). `autoupdate enable` does not clear it; the user can explicitly take the skipped version with `railway upgrade` if they choose.
Failure recovery
Shell-download failures are tracked by a download_failures counter. After 3 consecutive failures, the cached version is cleared to force a fresh API check, preventing a stale or unreachable version from blocking discovery of newer releases.
Package-manager spawns are time-gated: at most one spawn per hour per cached version. The gate resets when the GitHub API discovers a new release. This prevents rapid-fire retries when multiple CLI invocations happen before the update finishes (e.g., 5 quick commands during a slow npm update), while still retrying periodically.
sudo / Administrator
Concurrency & safety
- File lock (fs2) on ~/.railway/update.lock prevents concurrent processes from racing on download/stage/apply. A separate package-update.lock serializes package manager spawns.
- PID liveness check (Unix kill(pid, 0), Windows OpenProcess).
- is_auto_update_disabled() is checked before downloading and again after acquiring the lock, so autoupdate disable from another terminal is respected immediately.
- Atomic replacement (MoveFileExW with MOVEFILE_REPLACE_EXISTING on Windows).
- railway, railway help, railway upgrade, railway autoupdate, and check_updates never trigger staged-binary apply or background spawns. Non-TTY environments stage updates but never apply. CI environments never self-mutate.
- validate_staged() is the single source of truth for staged-update safety checks (stale, wrong platform, not newer, skipped version), used by both the silent startup apply and the interactive railway upgrade fallback path.
Telemetry
Auto-update emits an autoupdate_apply telemetry event when a staged update is silently applied at startup. The event includes the old CLI version (cli_version) and the new version (sub_command), enabling tracking of update adoption rates and success.
All other auto-update actions (enable, disable, skip, status, upgrade, rollback) are tracked through the existing command telemetry system.
User controls
- railway autoupdate [enable|disable|status|skip] for persistent preferences
- RAILWAY_NO_AUTO_UPDATE=1 env var override: fully silent (no banner, no updates)
- railway autoupdate disable (preference): stops installation but still shows "new version available" banner
Why different banner behavior for env var vs. preference? The env var and CI are scripted environments where any extra output is noise. The preference is for cautious users who want to know about releases but control when they install; they still benefit from the notification.
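The difference between the env var, CI, and the stored preference can be summarized as a small decision function. This is a sketch only: the struct, the function names, the exact-match parsing of RAILWAY_NO_AUTO_UPDATE against "1", and the default of showing the banner when updates are enabled are assumptions for illustration.

```rust
/// Hypothetical summary of what the updater may do in this run.
#[derive(Debug, PartialEq)]
struct UpdateBehavior {
    install: bool,
    show_banner: bool,
}

/// Assumed parsing: only the literal value "1" opts out.
fn env_opt_out(value: Option<&str>) -> bool {
    value == Some("1")
}

fn update_behavior(env_opt_out: bool, is_ci: bool, pref_disabled: bool) -> UpdateBehavior {
    if env_opt_out || is_ci {
        // Scripted environments: no installs and no extra output.
        UpdateBehavior { install: false, show_banner: false }
    } else if pref_disabled {
        // Cautious users: no installs, but still told about releases.
        UpdateBehavior { install: false, show_banner: true }
    } else {
        UpdateBehavior { install: true, show_banner: true }
    }
}
```

In a real binary the first argument might come from env_opt_out(std::env::var("RAILWAY_NO_AUTO_UPDATE").ok().as_deref()). The env var is checked before the preference so that a scripted opt-out stays silent regardless of what is stored on disk.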
Backup & rollback
- Backups in ~/.railway/backups/, sorted by semver
- railway upgrade --rollback presents an interactive picker if multiple candidates exist
- autoupdate skip to prevent re-upgrade
Platform support
- Windows: .old.exe rename-then-copy strategy (OS locks running binaries), cleanup on next run, CREATE_NEW_PROCESS_GROUP for detached processes, platform-appropriate elevation instructions
- BUILD_TARGET env var (set by build.rs) ensures the self-updater fetches the exact ABI variant (gnu vs musl, msvc vs gnu)
Test plan
- railway upgrade downloads and applies interactively
- railway upgrade --rollback reverts and skips the rolled-back version
- npm update -g
- railway autoupdate skip stops re-upgrading after manual downgrade
- railway autoupdate disable returns instantly and cleans staged binaries
- railway autoupdate enable re-enables without clearing rollback skip
- railway autoupdate status shows install method, strategy, latest version, staged update, last check, and in-flight PID
- RAILWAY_NO_AUTO_UPDATE=1: verify fully silent (no banner, no updates)
- autoupdate_apply telemetry event fires after silent startup apply
🤖 Generated with Claude Code