
support Qwen3.5 quantization#1230

Open
deepindeed2022 wants to merge 1 commit into NVIDIA:main from deepindeed2022:main

Conversation


@deepindeed2022 deepindeed2022 commented Apr 10, 2026

What does this PR do?

Adds support for Qwen3.5 post-training quantization.

Usage

python3 examples/llm_ptq/hf_ptq.py --pyt_ckpt_path ./Qwen3.5-0.8B/ --qformat int4_awq --export_path Qwen3.5-0.8B-AWQ 

python3 examples/llm_ptq/hf_ptq.py --pyt_ckpt_path ./Qwen3.5-0.8B/ --qformat fp8 --export_path Qwen3.5-0.8B-FP8

Testing

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ✅ / ❌ / N/A
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: ✅ / ❌ / N/A
  • Did you write any new necessary tests?: ✅ / ❌ / N/A
  • Did you update Changelog?: ✅ / ❌ / N/A

Additional Information

Summary by CodeRabbit

  • New Features

    • Added support for Qwen3.5 models in post-training quantization workflows with compatibility for fp8, int4_awq, w4a8_awq, and nvfp4 quantization methods.
  • Documentation

    • Updated quantization support matrices for both LLM and VLM to reflect Qwen3.5 model compatibility.


copy-pr-bot bot commented Apr 10, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Contributor

coderabbitai bot commented Apr 10, 2026

📝 Walkthrough

Walkthrough

This change adds comprehensive support for the Qwen3.5 model family across the quantization framework, including configuration mappings, quantization exclusions for specific layer projections, documentation updates reflecting support status, and corresponding test utilities and test cases.

Changes

Cohort / File(s) / Summary

  • Documentation Updates — examples/llm_ptq/README.md, examples/vlm_ptq/README.md
    Added Qwen3.5 model support matrix entries indicating support for quantization modes (fp8, int4_awq, w4a8_awq) in both the LLM and VLM PTQ documentation.
  • Core Quantization Configuration — examples/llm_ptq/example_utils.py
    Extended build_quant_cfg() to add quantizer exclusions for *in_proj_b* and *in_proj_a* parameters when model_type == "qwen3_5", addressing narrow projection layers in hybrid attention mechanisms.
  • Model Type Mappings — modelopt/torch/export/model_utils.py
    Added two new entries to the MODEL_NAME_TO_TYPE dictionary, Qwen3_5Moe → qwen3_5moe and Qwen3_5 → qwen3_5, for model type recognition.
  • Quantization Utility Updates — modelopt/torch/export/quant_utils.py
    Updated the fusion mapping in _update_svdquant() to apply linear-to-linear scale adjustments for Qwen3_5MLP alongside the existing Qwen3 MLP variants.
  • Test Utilities — tests/_test_utils/torch/transformers_models.py
    Added a get_tiny_qwen3_5() function that provides minimal Qwen3.5 model instances for testing, with configurable hybrid-attention and linear-attention parameters.
  • Quantization Tests — tests/unit/torch/quantization/plugins/test_huggingface.py
    Introduced a parametrized test, test_qwen3_5_hybrid_attention_quantize(), that validates quantization of Qwen3.5 hybrid-attention models, verifying inference correctness and proper handling of GatedDeltaNet projection-layer quantization exclusions.
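The per-layer exclusion described above (skipping the narrow *in_proj_a*/*in_proj_b* projections while quantizing the remaining linear layers) boils down to wildcard matching on module names. The sketch below illustrates that selection logic with plain `fnmatch`; the `EXCLUDE_PATTERNS` list and `is_quantized` helper are hypothetical stand-ins for the config entries that `build_quant_cfg()` adds, not ModelOpt's actual API:

```python
from fnmatch import fnmatch

# Hypothetical exclusion patterns mirroring the "*in_proj_a*" / "*in_proj_b*"
# entries this PR adds for model_type == "qwen3_5"; the real keys live in
# build_quant_cfg() in examples/llm_ptq/example_utils.py.
EXCLUDE_PATTERNS = ["*in_proj_a*", "*in_proj_b*"]

def is_quantized(module_name: str, patterns=EXCLUDE_PATTERNS) -> bool:
    """Return False when the module name matches any exclusion pattern."""
    return not any(fnmatch(module_name, p) for p in patterns)

# Narrow GatedDeltaNet projections are skipped; ordinary attention
# projections remain quantized.
print(is_quantized("model.layers.0.linear_attn.in_proj_a"))  # False
print(is_quantized("model.layers.0.self_attn.q_proj"))       # True
```

This mirrors why the new unit test checks both a `linear_attn` projection and a `self_attn` projection: one side must be excluded, the other must stay quantized.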

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning. Docstring coverage is 14.29%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (3 passed)

  • Description Check — ✅ Passed. Check skipped: CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed. The title 'support Qwen3.5 quantization' directly and accurately summarizes the main purpose of the changeset, which adds quantization support for the Qwen3.5 model across multiple files and configurations.
  • Security Anti-Patterns — ✅ Passed. The PR adds Qwen3.5 model support through configuration updates and a new test utility function. No unsafe serialization calls, trust_remote_code hardcoding, or external dependencies were introduced.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/unit/torch/quantization/plugins/test_huggingface.py`:
- Around line 286-293: The test currently sets has_gdn_quantized /
has_attn_quantized based only on module name and presence of
module.weight_quantizer, which can give false positives when quantization is
disabled; update the loop over model.named_modules() to also check
module.weight_quantizer.is_enabled (or truthiness of that property) before
setting the flags and ensure the final assertions verify that the found modules
have weight_quantizer.is_enabled true (i.e., assert the quantizer is enabled for
"linear_attn.in_proj_qkv" and "self_attn.q_proj" modules).
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b1ad8b23-7859-407c-8231-f295b28c6ba4

📥 Commits

Reviewing files that changed from the base of the PR and between 3baa2da and 5acee3e.

📒 Files selected for processing (7)
  • examples/llm_ptq/README.md
  • examples/llm_ptq/example_utils.py
  • examples/vlm_ptq/README.md
  • modelopt/torch/export/model_utils.py
  • modelopt/torch/export/quant_utils.py
  • tests/_test_utils/torch/transformers_models.py
  • tests/unit/torch/quantization/plugins/test_huggingface.py

Comment on lines +286 to +293

    for name, module in model.named_modules():
        if hasattr(module, "weight_quantizer") and hasattr(module, "weight"):
            if "linear_attn.in_proj_qkv" in name:
                has_gdn_quantized = True
            if "self_attn.q_proj" in name:
                has_attn_quantized = True
    assert has_gdn_quantized, "GatedDeltaNet linear layers should be quantized"
    assert has_attn_quantized, "Attention linear layers should be quantized"

⚠️ Potential issue | 🟡 Minor

Strengthen quantization assertions to avoid false positives.

The current flags only confirm module-name presence. They should verify weight_quantizer.is_enabled so the test fails when quantization is unexpectedly disabled.

✅ Suggested test fix

    -    for name, module in model.named_modules():
    -        if hasattr(module, "weight_quantizer") and hasattr(module, "weight"):
    -            if "linear_attn.in_proj_qkv" in name:
    -                has_gdn_quantized = True
    -            if "self_attn.q_proj" in name:
    -                has_attn_quantized = True
    +    for name, module in model.named_modules():
    +        if hasattr(module, "weight_quantizer") and hasattr(module, "weight"):
    +            if "linear_attn.in_proj_qkv" in name and module.weight_quantizer.is_enabled:
    +                has_gdn_quantized = True
    +            if "self_attn.q_proj" in name and module.weight_quantizer.is_enabled:
    +                has_attn_quantized = True
