feat(cli): llama.cpp-compatible --ngl/--device GPU selection flags#244
Conversation
Reimplements the stale PR #144 fresh on master. Adds llama.cpp flag aliases and single-GPU device selection to both `run` and `image`: - `--ngl`/`--gpu-layers` as aliases for the existing `-g`/`--n-gpu-layers` (kept `-g` — non-breaking, unlike #144 which dropped it). - `--model-draft` (alias of `--draft-model`) and `--repeat-penalty` (alias of `--rep-penalty`) for llama.cpp parity. - `--device <auto|none|cpu|INDEX|NAMEINDEX>` (e.g. `0`, `CUDA0`, `Vulkan1`). Parsing lives in the new `GpuDevice.Resolve`: it pins CUDA via CUDA_VISIBLE_DEVICES (process-wide, only when the user hasn't set it) and returns the index for Vulkan's explicit physical-device selector. `none`/`cpu` forces the CPU path, overriding `--ngl`. Single-device only (no multi-GPU split) — comma lists are rejected. VulkanBackend gains `VulkanBackend(int deviceIndex = -1)` + `SelectPhysicalDevice(int)` with an explicit-index path (bounds-checked, compute-queue-checked); the default `-1` preserves the prior discrete-GPU-preferred auto-select, so existing callers are unchanged. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
There was a problem hiding this comment.
Code Review
This pull request introduces support for the --device option to pin a single GPU or disable GPU offloading via --device none, mirroring llama.cpp behavior. It adds a GpuDevice helper to parse the option, updates the CLI commands to support the new flag and its aliases, and updates VulkanBackend to select the specified physical device. The feedback suggests simplifying the device string parsing logic in GpuDevice.cs and adding a status message in RunZImage when the upscaler falls back to CPU under --device none to match the behavior in RunFlux.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| // Strip an optional leading backend name (CUDA/Vulkan/GPU/…) and read the trailing index. | ||
| int i = value.Length; | ||
| while (i > 0 && char.IsDigit(value[i - 1])) i--; | ||
| var digits = value[i..]; | ||
| // The part before the index must be empty or a plain backend name (letters only) — | ||
| // this rejects things like "-1" (which would otherwise parse as device 1). | ||
| bool prefixOk = true; | ||
| for (int j = 0; j < i; j++) | ||
| if (!char.IsLetter(value[j])) { prefixOk = false; break; } | ||
| if (!prefixOk || digits.Length == 0 || !int.TryParse(digits, out int index) || index < 0) | ||
| throw new InvalidOperationException( | ||
| $"--device '{device}': expected a device index (0, 1, …), a named device " + | ||
| "(CUDA0, Vulkan1), 'auto', or 'none'."); |
There was a problem hiding this comment.
The logic to parse the trailing index and validate the prefix can be simplified and made more robust. Currently, the loop checks if each character in the prefix is a letter, but we can use int.TryParse on the trailing digits and check if the remaining prefix is entirely alphabetic more cleanly. Additionally, we should ensure that we handle cases where there are no digits at all or if the index parsing fails due to overflow (which is already handled by int.TryParse returning false, but we can make the error message more specific). Let's refactor this parsing logic to be more readable and robust.
// Strip an optional leading backend name (CUDA/Vulkan/GPU/…) and read the trailing index.
int i = value.Length;
while (i > 0 && char.IsDigit(value[i - 1])) i--;
var digits = value[i..];
bool prefixOk = true;
for (int j = 0; j < i; j++)
{
if (!char.IsLetter(value[j]))
{
prefixOk = false;
break;
}
}
if (!prefixOk || digits.Length == 0 || !int.TryParse(digits, out int index) || index < 0)
{
throw new InvalidOperationException(
$"--device '{device}': expected a device index (0, 1, …), a named device " +
"(CUDA0, Vulkan1), 'auto', or 'none'.");
}| // default (-1) → CUDA → Vulkan → CPU fallback | ||
| string backendChoice = (s.Backend ?? "auto").ToLowerInvariant(); | ||
| bool forceCpu = s.NGpuLayers == 0 || backendChoice == "cpu"; | ||
| bool forceCpu = s.NGpuLayers == 0 || backendChoice == "cpu" || deviceNone; |
There was a problem hiding this comment.
In RunZImage, when deviceNone is true, the main compute backend correctly falls back to CPU. However, unlike RunFlux (which explicitly logs [dim]Upscaler backend:[/] CPU (--device none)), RunZImage silently falls back to CPU for the upscaler without printing any status message. We should add a check for deviceNone or gpu == null to print a clear status message when the upscaler falls back to CPU.
bool forceCpu = s.NGpuLayers == 0 || backendChoice == Addresses PR review (silent-failure-hunter + pr-test-analyzer): - ImageCommand swallowed an explicit `--device N` failure into a silent CPU fallback (Z-Image printed the actively-wrong "no GPU detected"; FLUX printed nothing at all). Now the backend-creation fallback `catch` blocks are gated on `deviceIndex < 0` (auto-select only), so an explicit device error propagates to the outer handler and is reported. Auto-select behavior is unchanged. - Warn when `--device none` contradicts an explicit GPU request (`--ngl/-g` in run, `--backend cuda|vulkan` in image) instead of silently dropping to CPU. - Add SharpInference.Tests.Cli with 25 GpuDevice.Resolve cases covering every parse branch (incl. the `-1`-must-not-mis-parse-as-device-1 guard and int overflow) and the CUDA_VISIBLE_DEVICES pin/preserve side effect. Exposed via InternalsVisibleTo; xunit collection parallelism is off so the env-mutating cases stay deterministic. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
RunZImage silently used the CPU upscaler when no GPU backend was selected (incl. --device none), while RunFlux logged it. Print the matching "Upscaler backend: CPU (--device none)" / "CPU" line. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Addressed the review feedback across three commits: Silent-failure-hunter (CRITICAL #1/#2): An explicit Silent-failure-hunter (#5): pr-test-analyzer (#9): Added gemini (ImageCommand.cs:249): Z-Image now logs the upscaler CPU fallback ( gemini (GpuDevice.cs:57): Declined — the suggested refactor is byte-for-byte identical to the current loop (same digit-strip + letters-only prefix check). The existing logic is now locked down by the 25 unit tests above (the silent-failure-hunter (#3, CUDA index validation): Not adding a CUDA-side device-count check. |
Reimplements stale PR #144 fresh on current master (that branch had conflicts in hot files; the core here is a clean new
GpuDevicehelper).What
Adds llama.cpp flag aliases and single-GPU device selection to both the
runandimagecommands.--ngl/--gpu-layers— aliases for the existing-g/--n-gpu-layers.-gis kept (non-breaking; cli: match llama.cpp GPU flag conventions (--ngl, --device) and align flag names #144 had dropped it).--model-draft(alias of--draft-model) and--repeat-penalty(alias of--rep-penalty) — llama.cpp parity.--device <auto|none|cpu|INDEX|NAMEINDEX>— e.g.0,CUDA0,Vulkan1. Parsing lives in the newGpuDevice.Resolve:CUDA_VISIBLE_DEVICES(process-wide; only set when the user hasn't already constrained it),none/cpuforces the CPU path (overrides--ngl),VulkanBackendgainsVulkanBackend(int deviceIndex = -1)+SelectPhysicalDevice(int)with a bounds-checked, compute-queue-checked explicit-index path. The default-1preserves the prior discrete-GPU-preferred auto-select, so all existing callers (server, tests, FLUX) are byte-for-byte unchanged.Verification
--deviceparse branch:--device foo→ "expected a device index…"--device 0,1→ "multi-device split is not supported"--device=-1→ rejected (won't mis-parse as device 1)--device CUDA3/--device none→ valid, proceeds to model load--ngl,--model-draft,--repeat-penaltyrecognized and shown in--help.🤖 Generated with Claude Code