
support Qwen3.5 quantization#1230

Open
deepindeed2022 wants to merge 1 commit into NVIDIA:main from deepindeed2022:main

Conversation


@deepindeed2022 deepindeed2022 commented Apr 10, 2026

What does this PR do?

Adds support for Qwen3.5 post-training quantization.

Usage

python3 examples/llm_ptq/hf_ptq.py --pyt_ckpt_path ./Qwen3.5-0.8B/ --qformat int4_awq --export_path Qwen3.5-0.8B-AWQ 

python3 examples/llm_ptq/hf_ptq.py --pyt_ckpt_path ./Qwen3.5-0.8B/ --qformat fp8 --export_path Qwen3.5-0.8B-FP8

Testing

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ✅ / ❌ / N/A
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: ✅ / ❌ / N/A
  • Did you write any new necessary tests?: ✅ / ❌ / N/A
  • Did you update Changelog?: ✅ / ❌ / N/A

Additional Information

Summary by CodeRabbit

  • New Features

    • Added support for Qwen3.5 models in post-training quantization workflows with compatibility for fp8, int4_awq, w4a8_awq, and nvfp4 quantization methods.
  • Documentation

    • Updated quantization support matrices for both LLM and VLM to reflect Qwen3.5 model compatibility.


copy-pr-bot bot commented Apr 10, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Contributor

coderabbitai bot commented Apr 10, 2026

📝 Walkthrough

Walkthrough

This change adds comprehensive support for the Qwen3.5 model family across the quantization framework, including configuration mappings, quantization exclusions for specific layer projections, documentation updates reflecting support status, and corresponding test utilities and test cases.

Changes

Cohort / File(s) / Summary

  • Documentation Updates — examples/llm_ptq/README.md, examples/vlm_ptq/README.md
    Added Qwen3.5 model support matrix entries indicating support for quantization modes (fp8, int4_awq, w4a8_awq) in both the LLM and VLM PTQ documentation.
  • Core Quantization Configuration — examples/llm_ptq/example_utils.py
    Extended build_quant_cfg() to add quantizer exclusions for *in_proj_b* and *in_proj_a* parameters when model_type == "qwen3_5", addressing narrow projection layers in hybrid attention mechanisms.
  • Model Type Mappings — modelopt/torch/export/model_utils.py
    Added two new entries to the MODEL_NAME_TO_TYPE dictionary, Qwen3_5Moe → qwen3_5moe and Qwen3_5 → qwen3_5, for model type recognition.
  • Quantization Utility Updates — modelopt/torch/export/quant_utils.py
    Updated the fusion mapping in _update_svdquant() to apply linear-to-linear scale adjustments for Qwen3_5MLP alongside the existing Qwen3 MLP variants.
  • Test Utilities — tests/_test_utils/torch/transformers_models.py
    Added a get_tiny_qwen3_5() function that provides minimal Qwen3.5 model instances for testing, with configurable hybrid-attention and linear-attention parameters.
  • Quantization Tests — tests/unit/torch/quantization/plugins/test_huggingface.py
    Introduced a parametrized test, test_qwen3_5_hybrid_attention_quantize(), that validates quantization of Qwen3.5 hybrid-attention models, verifying inference correctness and proper handling of GatedDeltaNet projection-layer quantization exclusions.
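The per-layer exclusion described above (skipping the narrow *in_proj_a*/*in_proj_b* projections while quantizing the remaining linear layers) boils down to wildcard matching on module names. The sketch below illustrates that selection logic with plain `fnmatch`; the `EXCLUDE_PATTERNS` list and `is_quantized` helper are hypothetical stand-ins for the config entries that `build_quant_cfg()` adds, not ModelOpt's actual API:

```python
from fnmatch import fnmatch

# Hypothetical exclusion patterns mirroring the "*in_proj_a*" / "*in_proj_b*"
# entries this PR adds for model_type == "qwen3_5"; the real keys live in
# build_quant_cfg() in examples/llm_ptq/example_utils.py.
EXCLUDE_PATTERNS = ["*in_proj_a*", "*in_proj_b*"]

def is_quantized(module_name: str, patterns=EXCLUDE_PATTERNS) -> bool:
    """Return False when the module name matches any exclusion pattern."""
    return not any(fnmatch(module_name, p) for p in patterns)

# Narrow GatedDeltaNet projections are skipped; ordinary attention
# projections remain quantized.
print(is_quantized("model.layers.0.linear_attn.in_proj_a"))  # False
print(is_quantized("model.layers.0.self_attn.q_proj"))       # True
```

This mirrors why the new unit test checks both a `linear_attn` projection and a `self_attn` projection: one side must be excluded, the other must stay quantized.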

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning. Docstring coverage is 14.29%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (3 passed)

  • Description Check — ✅ Passed. Check skipped: CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed. The title 'support Qwen3.5 quantization' directly and accurately summarizes the main purpose of the changeset, which adds quantization support for the Qwen3.5 model across multiple files and configurations.
  • Security Anti-Patterns — ✅ Passed. The PR adds Qwen3.5 model support through configuration updates and a new test utility function. No unsafe serialization calls, trust_remote_code hardcoding, or external dependencies were introduced.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/unit/torch/quantization/plugins/test_huggingface.py`:
- Around line 286-293: The test currently sets has_gdn_quantized /
has_attn_quantized based only on module name and presence of
module.weight_quantizer, which can give false positives when quantization is
disabled; update the loop over model.named_modules() to also check
module.weight_quantizer.is_enabled (or truthiness of that property) before
setting the flags and ensure the final assertions verify that the found modules
have weight_quantizer.is_enabled true (i.e., assert the quantizer is enabled for
"linear_attn.in_proj_qkv" and "self_attn.q_proj" modules).
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b1ad8b23-7859-407c-8231-f295b28c6ba4

📥 Commits

Reviewing files that changed from the base of the PR and between 3baa2da and 5acee3e.

📒 Files selected for processing (7)
  • examples/llm_ptq/README.md
  • examples/llm_ptq/example_utils.py
  • examples/vlm_ptq/README.md
  • modelopt/torch/export/model_utils.py
  • modelopt/torch/export/quant_utils.py
  • tests/_test_utils/torch/transformers_models.py
  • tests/unit/torch/quantization/plugins/test_huggingface.py

Comment on lines +286 to +293

    for name, module in model.named_modules():
        if hasattr(module, "weight_quantizer") and hasattr(module, "weight"):
            if "linear_attn.in_proj_qkv" in name:
                has_gdn_quantized = True
            if "self_attn.q_proj" in name:
                has_attn_quantized = True
    assert has_gdn_quantized, "GatedDeltaNet linear layers should be quantized"
    assert has_attn_quantized, "Attention linear layers should be quantized"

⚠️ Potential issue | 🟡 Minor

Strengthen quantization assertions to avoid false positives.

The current flags only confirm module-name presence. They should verify weight_quantizer.is_enabled so the test fails when quantization is unexpectedly disabled.

✅ Suggested test fix

    -    for name, module in model.named_modules():
    -        if hasattr(module, "weight_quantizer") and hasattr(module, "weight"):
    -            if "linear_attn.in_proj_qkv" in name:
    -                has_gdn_quantized = True
    -            if "self_attn.q_proj" in name:
    -                has_attn_quantized = True
    +    for name, module in model.named_modules():
    +        if hasattr(module, "weight_quantizer") and hasattr(module, "weight"):
    +            if "linear_attn.in_proj_qkv" in name and module.weight_quantizer.is_enabled:
    +                has_gdn_quantized = True
    +            if "self_attn.q_proj" in name and module.weight_quantizer.is_enabled:
    +                has_attn_quantized = True
