feat(cli): llama.cpp-compatible --ngl/--device GPU selection flags by pekkah · Pull Request #244 · pekkah/SharpInference

pekkah · 2026-06-14T09:09:58Z

Reimplements stale PR #144 fresh on current master (that branch had conflicts in hot files; the core here is a clean new GpuDevice helper).

What

Adds llama.cpp flag aliases and single-GPU device selection to both the run and image commands.

--ngl / --gpu-layers: aliases for the existing -g / --n-gpu-layers. -g is kept, so the change is non-breaking (#144, "cli: match llama.cpp GPU flag conventions (--ngl, --device) and align flag names", had dropped it).
--model-draft (alias of --draft-model) and --repeat-penalty (alias of --rep-penalty), for llama.cpp parity.
--device <auto|none|cpu|INDEX|NAMEINDEX> — e.g. 0, CUDA0, Vulkan1. Parsing lives in the new GpuDevice.Resolve:
- pins CUDA via CUDA_VISIBLE_DEVICES (process-wide; only set when the user hasn't already constrained it),
- returns the index for Vulkan's explicit physical-device selector,
- none / cpu forces the CPU path (overrides --ngl),
- single-device only — comma-separated multi-GPU lists are rejected (SharpInference has no tensor/row split).
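
The parse rules above can be sketched roughly as follows. This is an illustration, not the PR's actual GpuDevice.Resolve: the DeviceChoice type, the DeviceSpecSketch class, and the return shape are hypothetical, and the sketch assumes the CUDA pin is applied for any explicit index regardless of the backend-name prefix. Only the digit-strip and letters-only prefix check mirror the code quoted in the review.

```csharp
using System;
using System.Globalization;

// Hypothetical result type; the real helper's return shape is not shown in this thread.
public readonly record struct DeviceChoice(bool ForceCpu, int Index);

public static class DeviceSpecSketch
{
    public static DeviceChoice Resolve(string? device)
    {
        // No flag or "auto": keep the default auto-select (index -1).
        if (string.IsNullOrWhiteSpace(device)) return new DeviceChoice(false, -1);
        string value = device.Trim();
        if (value.Equals("auto", StringComparison.OrdinalIgnoreCase))
            return new DeviceChoice(false, -1);

        // "none" / "cpu": force the CPU path.
        if (value.Equals("none", StringComparison.OrdinalIgnoreCase) ||
            value.Equals("cpu", StringComparison.OrdinalIgnoreCase))
            return new DeviceChoice(true, -1);

        // Single-device only: reject comma-separated lists.
        if (value.Contains(','))
            throw new InvalidOperationException(
                $"--device '{device}': multi-device split is not supported.");

        // Strip an optional backend name (letters only) and read the trailing index.
        int i = value.Length;
        while (i > 0 && char.IsDigit(value[i - 1])) i--;
        string digits = value[i..];
        bool prefixOk = true;
        for (int j = 0; j < i; j++)
            if (!char.IsLetter(value[j])) { prefixOk = false; break; }
        if (!prefixOk || digits.Length == 0 || !int.TryParse(digits, out int index) || index < 0)
            throw new InvalidOperationException(
                $"--device '{device}': expected a device index (0, 1, …), a named device " +
                "(CUDA0, Vulkan1), 'auto', or 'none'.");

        // Pin CUDA process-wide, but only when the user has not already constrained it.
        if (string.IsNullOrEmpty(Environment.GetEnvironmentVariable("CUDA_VISIBLE_DEVICES")))
            Environment.SetEnvironmentVariable(
                "CUDA_VISIBLE_DEVICES", index.ToString(CultureInfo.InvariantCulture));

        return new DeviceChoice(false, index);
    }
}
```

With this sketch, `Resolve("CUDA3")` yields index 3 and sets CUDA_VISIBLE_DEVICES to `3` if it was unset, while `Resolve("-1")` and `Resolve("0,1")` throw.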

VulkanBackend gains VulkanBackend(int deviceIndex = -1) + SelectPhysicalDevice(int) with a bounds-checked, compute-queue-checked explicit-index path. The default -1 preserves the prior discrete-GPU-preferred auto-select, so all existing callers (server, tests, FLUX) are byte-for-byte unchanged.
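
As a rough model of that selection logic, the sketch below works on a plain list of device descriptions instead of real Vulkan handles. PhysicalDeviceInfo and DeviceSelectionSketch are hypothetical names, and the auto-select tie-breaking (first discrete device with a compute queue, else first device with one) is an assumption about what "discrete-GPU-preferred" means here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for what the backend learns about each enumerated physical device.
public sealed record PhysicalDeviceInfo(string Name, bool IsDiscrete, bool HasComputeQueue);

public static class DeviceSelectionSketch
{
    // Returns the position of the chosen device in the enumeration order.
    public static int SelectPhysicalDevice(IReadOnlyList<PhysicalDeviceInfo> devices, int deviceIndex = -1)
    {
        if (devices.Count == 0)
            throw new InvalidOperationException("No Vulkan devices present.");

        if (deviceIndex >= 0)
        {
            // Explicit index: bounds-check, then require a compute-capable queue.
            if (deviceIndex >= devices.Count)
                throw new InvalidOperationException(
                    $"--device {deviceIndex}: only {devices.Count} Vulkan device(s) present " +
                    $"(valid indices 0..{devices.Count - 1}).");
            if (!devices[deviceIndex].HasComputeQueue)
                throw new InvalidOperationException(
                    $"--device {deviceIndex}: '{devices[deviceIndex].Name}' has no compute queue.");
            return deviceIndex;
        }

        // Auto-select (-1): prefer a discrete GPU, otherwise take the first compute-capable device.
        int fallback = -1;
        for (int i = 0; i < devices.Count; i++)
        {
            if (!devices[i].HasComputeQueue) continue;
            if (devices[i].IsDiscrete) return i;
            if (fallback < 0) fallback = i;
        }
        if (fallback < 0)
            throw new InvalidOperationException("No Vulkan device with a compute queue found.");
        return fallback;
    }
}
```

Because the explicit path only runs for a non-negative index, callers that never pass an index go through the same auto-select branch as before.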

Verification

Full solution builds clean (TreatWarningsAsErrors, 0 warnings).
CLI smoke tests of every --device parse branch:
- --device foo → "expected a device index…"
- --device 0,1 → "multi-device split is not supported"
- --device=-1 → rejected (won't mis-parse as device 1)
- --device CUDA3 / --device none → valid, proceeds to model load
- --ngl, --model-draft, --repeat-penalty recognized and shown in --help.

🤖 Generated with Claude Code

Reimplements the stale PR #144 fresh on master. Adds llama.cpp flag aliases and single-GPU device selection to both `run` and `image`:
- `--ngl`/`--gpu-layers` as aliases for the existing `-g`/`--n-gpu-layers` (kept `-g`: non-breaking, unlike #144, which dropped it).
- `--model-draft` (alias of `--draft-model`) and `--repeat-penalty` (alias of `--rep-penalty`) for llama.cpp parity.
- `--device <auto|none|cpu|INDEX|NAMEINDEX>` (e.g. `0`, `CUDA0`, `Vulkan1`). Parsing lives in the new `GpuDevice.Resolve`: it pins CUDA via CUDA_VISIBLE_DEVICES (process-wide, only when the user hasn't set it) and returns the index for Vulkan's explicit physical-device selector. `none`/`cpu` forces the CPU path, overriding `--ngl`. Single-device only (no multi-GPU split); comma lists are rejected.
VulkanBackend gains `VulkanBackend(int deviceIndex = -1)` + `SelectPhysicalDevice(int)` with an explicit-index path (bounds-checked, compute-queue-checked); the default `-1` preserves the prior discrete-GPU-preferred auto-select, so existing callers are unchanged.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

gemini-code-assist

Code Review

This pull request introduces support for the --device option to pin a single GPU or disable GPU offloading via --device none, mirroring llama.cpp behavior. It adds a GpuDevice helper to parse the option, updates the CLI commands to support the new flag and its aliases, and updates VulkanBackend to select the specified physical device. The feedback suggests simplifying the device string parsing logic in GpuDevice.cs and adding a status message in RunZImage when the upscaler falls back to CPU under --device none to match the behavior in RunFlux.


gemini-code-assist · 2026-06-14T09:13:39Z

+        // Strip an optional leading backend name (CUDA/Vulkan/GPU/…) and read the trailing index.
+        int i = value.Length;
+        while (i > 0 && char.IsDigit(value[i - 1])) i--;
+        var digits = value[i..];
+        // The part before the index must be empty or a plain backend name (letters only) —
+        // this rejects things like "-1" (which would otherwise parse as device 1).
+        bool prefixOk = true;
+        for (int j = 0; j < i; j++)
+            if (!char.IsLetter(value[j])) { prefixOk = false; break; }
+        if (!prefixOk || digits.Length == 0 || !int.TryParse(digits, out int index) || index < 0)
+            throw new InvalidOperationException(
+                $"--device '{device}': expected a device index (0, 1, …), a named device " +
+                "(CUDA0, Vulkan1), 'auto', or 'none'.");


The logic to parse the trailing index and validate the prefix can be simplified and made more robust. Currently, the loop checks if each character in the prefix is a letter, but we can use int.TryParse on the trailing digits and check if the remaining prefix is entirely alphabetic more cleanly. Additionally, we should ensure that we handle cases where there are no digits at all or if the index parsing fails due to overflow (which is already handled by int.TryParse returning false, but we can make the error message more specific). Let's refactor this parsing logic to be more readable and robust.

// Strip an optional leading backend name (CUDA/Vulkan/GPU/…) and read the trailing index.
int i = value.Length;
while (i > 0 && char.IsDigit(value[i - 1])) i--;
var digits = value[i..];
bool prefixOk = true;
for (int j = 0; j < i; j++)
{
    if (!char.IsLetter(value[j])) { prefixOk = false; break; }
}
if (!prefixOk || digits.Length == 0 || !int.TryParse(digits, out int index) || index < 0)
{
    throw new InvalidOperationException(
        $"--device '{device}': expected a device index (0, 1, …), a named device " +
        "(CUDA0, Vulkan1), 'auto', or 'none'.");
}

gemini-code-assist · 2026-06-14T09:13:39Z

                        //   default (-1)               → CUDA → Vulkan → CPU fallback
                        string backendChoice = (s.Backend ?? "auto").ToLowerInvariant();
-                        bool forceCpu    = s.NGpuLayers == 0 || backendChoice == "cpu";
+                        bool forceCpu    = s.NGpuLayers == 0 || backendChoice == "cpu" || deviceNone;


In RunZImage, when deviceNone is true, the main compute backend correctly falls back to CPU. However, unlike RunFlux (which explicitly logs [dim]Upscaler backend:[/] CPU (--device none)), RunZImage silently falls back to CPU for the upscaler without printing any status message. We should add a check for deviceNone or gpu == null to print a clear status message when the upscaler falls back to CPU.

bool forceCpu = s.NGpuLayers == 0 || backendChoice ==

Addresses PR review (silent-failure-hunter + pr-test-analyzer):
- ImageCommand swallowed an explicit `--device N` failure into a silent CPU fallback (Z-Image printed the actively-wrong "no GPU detected"; FLUX printed nothing at all). Now the backend-creation fallback `catch` blocks are gated on `deviceIndex < 0` (auto-select only), so an explicit device error propagates to the outer handler and is reported. Auto-select behavior is unchanged.
- Warn when `--device none` contradicts an explicit GPU request (`--ngl/-g` in run, `--backend cuda|vulkan` in image) instead of silently dropping to CPU.
- Add SharpInference.Tests.Cli with 25 GpuDevice.Resolve cases covering every parse branch (incl. the `-1`-must-not-mis-parse-as-device-1 guard and int overflow) and the CUDA_VISIBLE_DEVICES pin/preserve side effect. Exposed via InternalsVisibleTo; xunit collection parallelism is off so the env-mutating cases stay deterministic.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

RunZImage silently used the CPU upscaler when no GPU backend was selected (incl. --device none), while RunFlux logged it. Print the matching "Upscaler backend: CPU (--device none)" / "CPU" line.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

pekkah · 2026-06-14T09:21:16Z

Addressed the review feedback across three commits:

Silent-failure-hunter (CRITICAL #1/#2): An explicit --device N that was invalid/absent was being swallowed into a silent CPU fallback in ImageCommand (Z-Image printed the actively-wrong "no GPU detected"; FLUX printed nothing). Fixed by gating every backend-creation fallback catch on deviceIndex < 0 — auto-select still falls back, but an explicitly-named device now propagates its error to the outer handler and is reported. Verified: image --device 9 --backend vulkan → Error: --device 9: only 2 Vulkan device(s) present (valid indices 0..1).
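
A minimal sketch of that gating, assuming it is expressed as a C# exception filter (the actual ImageCommand code is not shown in this thread, and BackendFallbackSketch, CreateBackend, and the string-valued backend are hypothetical placeholders):

```csharp
using System;

public static class BackendFallbackSketch
{
    // createGpu stands in for whatever constructs the CUDA/Vulkan backend.
    // deviceIndex < 0 means auto-select; >= 0 means the user named a device explicitly.
    public static string CreateBackend(Func<int, string> createGpu, int deviceIndex)
    {
        try
        {
            return createGpu(deviceIndex);
        }
        catch (Exception) when (deviceIndex < 0)
        {
            // Auto-select only: falling back to CPU is acceptable here.
            return "CPU";
        }
        // With an explicit index the filter is false, so the exception
        // propagates to the caller's outer handler and gets reported.
    }
}
```

The filter leaves the auto-select fallback untouched and changes only the explicit-device case, which matches the behavior described above.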

Silent-failure-hunter (#5): --device none contradicting an explicit GPU request now prints a Note: (run: vs --ngl/-g; image: vs --backend) instead of silently dropping to CPU.

pr-test-analyzer (#9): Added SharpInference.Tests.Cli (25 cases) covering every GpuDevice.Resolve parse branch — including the -1-must-not-mis-parse-as-device-1 guard and int overflow — plus the CUDA_VISIBLE_DEVICES pin/preserve side effect. Exposed via InternalsVisibleTo; collection parallelism is off so the env-mutating cases are deterministic.

gemini (ImageCommand.cs:249): Z-Image now logs the upscaler CPU fallback (Upscaler backend: CPU (--device none) / CPU) to match FLUX.

gemini (GpuDevice.cs:57): Declined — the suggested refactor is byte-for-byte identical to the current loop (same digit-strip + letters-only prefix check). The existing logic is now locked down by the 25 unit tests above (the -1/0x1F/CUDA/overflow rejection cases specifically).

silent-failure-hunter (#3, CUDA index validation): Not adding a CUDA-side device-count check. GpuDevice.Resolve runs before backend selection and can't know whether the target is CUDA or Vulkan, so a CUDA-count check there would wrongly reject a valid Vulkan-only index. After the #1/#2 fix every invalid explicit device does surface an error (Vulkan validates in-ctor; an explicit CUDA index that doesn't exist fails loudly at cublasCreate). The residual is message wording on the CUDA path, not a silent failure — left as a known limitation.

gemini-code-assist Bot reviewed Jun 14, 2026

pekkah and others added 2 commits June 14, 2026 12:19

pekkah merged commit ee1fd37 into master Jun 14, 2026
1 check passed

pekkah deleted the feat/cli-ngl-device-flags branch June 14, 2026 09:21

pekkah merged 3 commits into master from feat/cli-ngl-device-flags
