cli: match llama.cpp GPU flag conventions (--ngl, --device) and align flag names #144

Closed: pekkah wants to merge 4 commits into master from claude/cli-args-llama-cpp-Z9boe
pekkah (Owner) commented Jun 6, 2026

What & why

Aligns the CLI flag names with llama.cpp / llama-cli. Previously the short flag for --n-gpu-layers was -g, which did not match llama.cpp's -ngl. Backwards compatibility was intentionally dropped per request.

Flag changes (text generation + image commands)

| Concept | llama.cpp | Before | Now |
| --- | --- | --- | --- |
| GPU layers | -ngl / --n-gpu-layers | -g | --ngl / --n-gpu-layers / --gpu-layers |
| Device select | --device | (none) | --device (functional, single-GPU) |
| Repetition penalty | --repeat-penalty | --rep-penalty | --repeat-penalty |
| Draft model | --model-draft | --draft-model | --model-draft (+ --draft-model alias) |

Note: Spectre.Console.Cli requires single-dash options to be one character, so llama's single-dash multi-char spellings (-ngl, -dev, …) are exposed as double-dash long options (--ngl, --device). No custom argument parsing was added.
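To illustrate the note above, long-option aliases can be declared directly on a Spectre.Console.Cli settings class. This is a minimal sketch, not the code from this PR: the class name `GenerateSettings` and the property names are hypothetical, and it assumes the pipe-separated option template accepts several long names for one property.

```csharp
using Spectre.Console.Cli;

// Hypothetical settings class; the names are illustrative, not taken from the PR.
public sealed class GenerateSettings : CommandSettings
{
    // Three double-dash spellings bound to a single property.
    // A single-dash "-ngl" cannot be registered here, because
    // single-dash (short) options must be exactly one character.
    [CommandOption("--ngl|--n-gpu-layers|--gpu-layers <LAYERS>")]
    public int GpuLayers { get; set; }

    // Device spec: an index ("0"), a backend name ("CUDA0"), or "none".
    [CommandOption("--device <DEVICE>")]
    public string? Device { get; set; }

    [CommandOption("--repeat-penalty <VALUE>")]
    public float? RepeatPenalty { get; set; }

    // llama.cpp spelling first, with the previous name kept as an alias.
    [CommandOption("--model-draft|--draft-model <PATH>")]
    public string? DraftModelPath { get; set; }
}
```

Declaring the aliases in the attribute keeps all spellings in one place and avoids any argv rewriting before the parser runs.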

Device selection (--device, new)

Accepts a single device: index (0, 1), name (CUDA0, Vulkan1), or none (CPU). Single-device only (no multi-GPU split).

  • CUDA: pinned via CUDA_VISIBLE_DEVICES set before first CUDA init — robust across the prefetch worker threads (a per-thread cudaSetDevice would not have been).
  • Vulkan: new VulkanBackend(int deviceIndex) physical-device selector with bounds + compute-queue validation.
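The accepted --device forms described above (index, backend name, or none) could be parsed roughly as follows. This is a sketch under stated assumptions, not the PR's GpuDevice.cs: `DeviceBackend`, `DeviceSpec`, and the `Auto` value used for a bare index are hypothetical names, and the actual CUDA/Vulkan pinning is not shown.

```csharp
using System;
using System.Globalization;

// Hypothetical types for illustration; not the PR's GpuDevice.cs.
public enum DeviceBackend { Auto, Cuda, Vulkan, None }

public readonly record struct DeviceSpec(DeviceBackend Backend, int Index)
{
    // Parses "0", "1", "CUDA0", "Vulkan1", "none", or "cpu" (case-insensitive).
    public static DeviceSpec Parse(string text)
    {
        string s = text.Trim();

        // "none" / "cpu" disable GPU use entirely; the index is unused.
        if (s.Equals("none", StringComparison.OrdinalIgnoreCase) ||
            s.Equals("cpu", StringComparison.OrdinalIgnoreCase))
            return new DeviceSpec(DeviceBackend.None, -1);

        // Bare index: the backend choice is left to the caller (assumption).
        if (TryIndex(s, out int bare))
            return new DeviceSpec(DeviceBackend.Auto, bare);

        // Backend name followed by a non-negative index, e.g. "CUDA0".
        foreach (var (prefix, backend) in new[]
                 {
                     ("CUDA", DeviceBackend.Cuda),
                     ("Vulkan", DeviceBackend.Vulkan),
                 })
        {
            if (s.StartsWith(prefix, StringComparison.OrdinalIgnoreCase) &&
                TryIndex(s.Substring(prefix.Length), out int idx))
                return new DeviceSpec(backend, idx);
        }

        throw new FormatException($"Unrecognized --device value: '{text}'");
    }

    // Digits only: rejects signs, whitespace, and empty strings.
    private static bool TryIndex(string s, out int index) =>
        int.TryParse(s, NumberStyles.None, CultureInfo.InvariantCulture, out index);
}
```

With this sketch, `DeviceSpec.Parse("CUDA0")` yields the CUDA backend with index 0, `DeviceSpec.Parse("none")` yields the `None` backend, and a value such as `"CUDA"` or `"-1"` throws a `FormatException`. Bounds checking against the devices actually present would still have to happen in the backend, as the Vulkan bullet describes.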

Also updated

User-facing console/error strings that referenced -g, plus the CLI README, root README benchmark table, CLAUDE.md, the ToolCall sample (its own parser + docs), and bench-129-ab.ps1.

Testing

⚠️ Not compiled. The session environment's network policy blocks the .NET SDK CDN (all Microsoft download hosts return HTTP 403) and only .NET 8 is available via apt, so I could not run dotnet build / dotnet test. Changes are verified by inspection only. Please run a local dotnet build -c Release && dotnet test before merging.

https://claude.ai/code/session_01RvSxRhAddVVMd4DGMkvV4d


Generated by Claude Code

claude added 2 commits June 5, 2026 07:14
… names

- Rename the GPU-layers short flag -g to -ngl (matches llama.cpp); also accept
  --gpu-layers and --ngl long forms. Add -dev/--device for single-GPU device
  selection (index, CUDAn/Vulkann name, or 'none'); CUDA is pinned via
  CUDA_VISIBLE_DEVICES (robust across worker threads), Vulkan via a new
  VulkanBackend(deviceIndex) physical-device selector.
- Rename --rep-penalty to --repeat-penalty and --draft-model to
  --model-draft (keeping --draft-model as an alias).
- Spectre.Console.Cli forbids multi-char single-dash options, so add an argv
  shim in Program.cs translating llama's -ngl/-md/-st/-sys/-dev spellings to
  the registered long options; -st/-sys now reach --single-turn/--system-prompt.
- Update user-facing strings, CLI/root README, CLAUDE.md, the ToolCall sample,
  and bench-129-ab.ps1 to the new flags.

Backwards compatibility intentionally dropped per request.

Per review: no custom arg parsing. Register the llama.cpp names directly as
long options (--ngl, --device, --model-draft, --repeat-penalty) and remove the
Program.cs flag-translation shim. Single-dash llama spellings (-ngl, -dev, …)
are not accepted; their double-dash equivalents are. Updated docs, the ToolCall
sample parser/examples, and bench-129-ab.ps1 accordingly.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request updates the CLI flags to match llama.cpp by renaming the GPU layers option from -g to --ngl / --n-gpu-layers and introducing a new --device option to target a specific GPU. It also updates the Vulkan backend to support explicit device selection. The review comments correctly identify two issues: first, in RunFlux, the deviceNone flag is discarded, which prevents --device none from disabling the GPU upscaler; second, the usage help string in the tool-call sample still references the deprecated -g flag.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread src/SharpInference.Cli/ImageCommand.cs
Comment thread samples/SharpInference.Sample.ToolCall/Program.cs
claude added 2 commits June 6, 2026 07:05
- RunFlux now captures deviceNone and keeps the RRDBNet upscaler on CPU when
  --device none/cpu is requested, instead of auto-selecting a GPU.
- ToolCall sample usage string: -g <layers> -> --ngl <layers>.

Addresses review feedback on PR #144.
pekkah (Owner, Author) commented Jun 14, 2026

Closing in favor of a fresh reimplementation on current master. This branch is 8 days behind and conflicts (DIRTY) across README.md, RunCommand.cs (now collides with the #233 -f/--file work), CudaHybridForwardPass.cs / VulkanBackend.cs (rewritten by the #215/#235/#238 perf arc), and CLAUDE.md, so a rebase would be an error-prone 3-way merge through hot files. The valuable core (the GpuDevice.cs --device parser and the VulkanBackend device-index selection) carries over verbatim. The reimplementation will KEEP -g and ADD the llama.cpp aliases (--ngl/--device/--model-draft) so it is non-breaking, avoiding the -g → --ngl example churn this PR had.

pekkah closed this Jun 14, 2026
pekkah deleted the claude/cli-args-llama-cpp-Z9boe branch June 14, 2026 08:56
2 participants