Fix quantized model initialization for int8 dtypes by Krish0909 · Pull Request #39456 · huggingface/transformers

Krish0909 · 2025-07-16T16:26:30Z

Fix quantized model initialization for int8 dtypes

This PR resolves a critical issue where loading quantized models (particularly llmcompressor W8A8 models) fails with:
RuntimeError: expected a floating-point or complex dtype, but got dtype=torch.int8

Root Cause: The model initialization code calls normal_() on int8 tensors during weight initialization, but PyTorch only supports this operation on floating-point tensors.
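The failure mode described above can be reproduced in isolation. This is a minimal sketch, assuming only that PyTorch is installed; the tensor names are illustrative and not taken from the transformers code base.

```python
import torch

# An int8 tensor standing in for a quantized weight (illustrative only).
w_int8 = torch.zeros(4, dtype=torch.int8)

try:
    # In-place normal initialization is only defined for floating-point
    # (and complex) dtypes, so this call is expected to raise.
    w_int8.normal_(mean=0.0, std=0.02)
    raised = False
except RuntimeError as err:
    raised = True
    print(f"int8 init failed: {err}")

# The same call succeeds on a floating-point tensor.
w_fp32 = torch.zeros(4, dtype=torch.float32)
w_fp32.normal_(mean=0.0, std=0.02)
```

Checking `tensor.is_floating_point()` before calling `normal_()` is one way to tell the two cases apart without relying on the exception.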

Solution: Skip weight initialization for quantized models since their weights are already loaded from checkpoints. Changed the conditional in _load_pretrained_model from else: to elif not is_quantized: to prevent calling initialize_weights() on quantized models.
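The effect of changing `else:` to `elif not is_quantized:` can be sketched with a small stand-in function. Everything here is hypothetical: `load_and_maybe_init`, the `use_zero3` flag, and the return strings are illustrative names, and the condition on the first branch is an assumption rather than the actual code in `_load_pretrained_model`.

```python
def load_and_maybe_init(use_zero3, is_quantized, initialize_weights):
    """Hypothetical stand-in for the branch touched by the diff."""
    if use_zero3:
        # Stand-in for the deepspeed GatheredParameters branch.
        initialize_weights()
        return "zero3"
    elif not is_quantized:  # previously a plain `else:`
        initialize_weights()
        return "initialized"
    # Quantized model: weights come from the checkpoint, so nothing runs.
    return "skipped"


init_calls = []
outcome_fp = load_and_maybe_init(False, False, lambda: init_calls.append("fp"))
outcome_q = load_and_maybe_init(False, True, lambda: init_calls.append("quantized"))
```

With the `elif`, the initializer runs for the non-quantized call only, so `init_calls` records a single entry.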

Impact:

- Enables loading of llmcompressor quantized models without crashes
- No impact on non-quantized models
- Minimal, targeted fix with backward compatibility

Fixes the issue reported in the original GitHub discussion about RedHatAI/Qwen2.5-VL-7B-Instruct-quantized.w8a8 model loading.

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? (No doc changes needed for this internal fix)
Did you write any new necessary tests?

Who can review?

@SunMarc @MekkCyber (quantization experts)

@yonigozlan

- Fix KeyError when do_image_splitting=False
- Move split_images_grouped assignment inside loop
- Ensures all image shapes are stored, not just the last one
- This fixes the bug in both Idefics3 and generated SmolVLM processors

cc @yonigozlan

Skip weight initialization for quantized models to prevent normal_() call on int8 tensors which causes RuntimeError. Fixes initialization error when loading llmcompressor quantized models.

Krish0909 · 2025-07-16T16:28:07Z

@SunMarc Saw this as a good first issue, fixed it.

SunMarc · 2025-07-16T16:42:13Z

            with deepspeed.zero.GatheredParameters(not_initialized_parameters, modifier_rank=0):
                self.initialize_weights()
-        else:
+        elif not is_quantized:


not the right fix, not sure why we are initializing the tensor in int8 when all the quantized weights should not be random

Let me investigate why _initialize_missing_keys is being called for quantized models and fix the actual loading flow that's causing this. I'll trace back through the quantized model loading path to find where missing keys are incorrectly identified.
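A first step in that kind of tracing is to list which parameter names are being treated as missing. The sketch below is a generic illustration, not the transformers loading code: `find_missing_keys` and the parameter names are hypothetical.

```python
def find_missing_keys(expected_keys, checkpoint_keys):
    """Return names the model expects but the checkpoint does not provide."""
    loaded = set(checkpoint_keys)
    return sorted(k for k in expected_keys if k not in loaded)


# Hypothetical parameter names for a single quantized layer.
expected = ["layer.weight", "layer.bias", "layer.weight_scale"]
checkpoint = ["layer.weight", "layer.weight_scale"]

missing = find_missing_keys(expected, checkpoint)
print(missing)
```

Any int8 parameter that shows up in such a list would be a candidate for the re-initialization the reviewer questions.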

- Root cause: quantized models should preserve pre-quantized values
- Fix: prevent _initialize_missing_keys call when model is quantized
- Resolves RuntimeError from normal_() on int8 tensors

github-actions · 2025-07-16T18:13:57Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: idefics3, smolvlm

Krishnan Vignesh added 2 commits July 16, 2025 19:53

Fix quantized model initialization for int8 dtypes

9d6fcd4


SunMarc reviewed Jul 16, 2025


fix: skip missing key initialization for quantized models

7958e72


Krish0909 force-pushed the fix-quantized-model-initialization branch from 15d5a10 to 7958e72 Compare July 16, 2025 18:12

evalstate mentioned this pull request Apr 29, 2026

Cumulative feature and defect updates from recent Transformers PRs evalstate/transformers#42

Open

Krish0909 wants to merge 3 commits into huggingface:main from Krish0909:fix-quantized-model-initialization

2 participants
