convert : avoid dequantizing mxfp4 for GPT-OSS by compilade · Pull Request #16756 · ggml-org/llama.cpp

compilade · 2025-10-24T11:59:08Z

Follow-up to #14810.

I accidentally broke MXFP4 GPT-OSS conversion in #14810:

  File "/.../llama.cpp/convert_hf_to_gguf.py", line 656, in __init__
    super().__init__(*args, **kwargs)
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/.../llama.cpp/convert_hf_to_gguf.py", line 155, in __init__
    self.dequant_model()
    ~~~~~~~~~~~~~~~~~~^^
  File "/.../llama.cpp/convert_hf_to_gguf.py", line 375, in dequant_model
    raise NotImplementedError(f"Quant method is not yet supported: {quant_method!r}")
NotImplementedError: Quant method is not yet supported: 'mxfp4'

This PR makes the convert script avoid running dequant_model() for GPT-OSS when the quant_method of the quantization_config is mxfp4.
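
A minimal sketch of that skip, not the actual diff: the class names `BaseConverter` and `GptOssConverter` are hypothetical stand-ins for the converter classes in convert_hf_to_gguf.py, and the `hparams` dictionary layout is assumed to mirror the `quantization_config` entry of the model's config. Only `dequant_model`, `quant_method`, `quantization_config`, and the error message come from the description and traceback above.

```python
class BaseConverter:
    """Hypothetical stand-in for the converter base class."""

    def __init__(self, hparams: dict):
        self.hparams = hparams
        # As in the traceback above, dequantization runs during construction.
        self.dequant_model()

    def dequant_model(self) -> None:
        quant_config = self.hparams.get("quantization_config")
        if quant_config is None:
            return  # nothing to dequantize
        quant_method = quant_config.get("quant_method")
        # Assumption: this sketch supports no quant method at all, so every
        # pre-quantized model ends up here, like 'mxfp4' did in the traceback.
        raise NotImplementedError(f"Quant method is not yet supported: {quant_method!r}")


class GptOssConverter(BaseConverter):
    """Hypothetical GPT-OSS converter that leaves MXFP4 tensors untouched."""

    def dequant_model(self) -> None:
        quant_config = self.hparams.get("quantization_config")
        if quant_config is not None and quant_config.get("quant_method") == "mxfp4":
            return  # skip dequantization for MXFP4 checkpoints
        super().dequant_model()


if __name__ == "__main__":
    config = {"quantization_config": {"quant_method": "mxfp4"}}
    GptOssConverter(config)  # constructs without raising
    try:
        BaseConverter(config)
    except NotImplementedError as err:
        print(err)
```

Overriding the method in the model-specific class keeps the skip limited to GPT-OSS, so other pre-quantized models still go through the generic path.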

Tested with a --dry-run conversion of https://huggingface.co/openai/gpt-oss-20b, which no longer fails.


* model-conversion : add trust_remote_code for orig model run [no ci] (ggml-org#16751)

  This commit add the trust_remote_code=True argument when loading models using AutoConfig, AutoTokenizer, and AutoModelForCausalLM for the run original model script.

  The motivation for this is that some models require custom code to be loaded properly, and setting trust_remote_code=True avoids a prompt asking for user confirmation:

  ```console
  (venv) $ make causal-run-original-model
  The repository /path/to/model contains custom code which must be executed to correctly load the model. You can inspect the repository content at /path/to/model.
  Do you wish to run the custom code? [y/N] N
  ```

  Having this as the default seems like a safe choice as we have to clone or download the models we convert and would be expecting to run any custom code they have.

* webui: support q URL parameter (ggml-org#16728)

  * webui: support q URL parameter

    Fixes ggml-org#16722
    I’ve checked that it works with Firefox’s AI tools

  * webui: apply suggestions from code review

    Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

  * chore: update webui static build

  ---------

  Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* CUDA: use CUB for arbitary size argsort (ggml-org#16754)

* ggml: fix CUDA grid launch condition for large block_nums.y in binbcast (ggml-org#16742)

  * Fix CUDA grid launch condition for large block_nums.y
  * add backend ops test
  * reduce test repetitions

* convert : avoid dequantizing mxfp4 for GPT-OSS (ggml-org#16756)

* vulkan: Optimize SSM_SCAN (ggml-org#16645)

* vulkan: delete dead code (ggml-org#16732)

  ggml_vk_create_buffer_temp is not used anywhere, and it is the only caller for ggml_vk_pool_malloc.

  Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>

* model : set res->t_embd in PLaMo2 models (ggml-org#16766)

---------

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Co-authored-by: Daniel Bevenius <daniel.bevenius@gmail.com>
Co-authored-by: Florian Badie <florianbadie@odrling.xyz>
Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
Co-authored-by: Aman Gupta <amangupta052@gmail.com>
Co-authored-by: leejet <leejet714@gmail.com>
Co-authored-by: compilade <git@compilade.net>
Co-authored-by: Jeff Bolz <jbolz@nvidia.com>
Co-authored-by: Giuseppe Scrivano <gscrivan@redhat.com>
Co-authored-by: Shunta Saito <shunta.saito@gmail.com>

convert : avoid dequantizing mxfp4 for GPT-OSS

d7f794e

compilade requested a review from CISC as a code owner October 24, 2025 11:59

compilade added the bugfix (fixes an issue or bug) and python (python script changes) labels Oct 24, 2025

danbev approved these changes Oct 24, 2025

ngxson approved these changes Oct 24, 2025

compilade merged commit 5cca254 into master Oct 25, 2025
10 checks passed

Anico2 added a commit to Anico2/llama.cpp that referenced this pull request Jan 15, 2026

convert : avoid dequantizing mxfp4 for GPT-OSS (ggml-org#16756)

48311c7

compilade merged 1 commit into master from compilade/fix-prequant-mxfp4-gpt-oss