Fix quantized model initialization for int8 dtypes #39456
Open
Krish0909 wants to merge 3 commits into
Added 2 commits (July 16, 2025 19:53):
- Fix KeyError when do_image_splitting=False
- Move split_images_grouped assignment inside loop
- Ensures all image shapes are stored, not just the last one
- This fixes the bug in both Idefics3 and generated SmolVLM processors
cc @yonigozlan
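The loop-scoping bug described in this commit can be sketched in plain Python. The function names, the shape keys, and the "split(...)" placeholder are hypothetical stand-ins, not the actual Idefics3 or SmolVLM processor code. The sketch only shows why an assignment placed after the loop keeps just the last image shape, so a later lookup of an earlier shape raises KeyError.

```python
def group_buggy(images_by_shape):
    """Hypothetical: the assignment sits after the loop."""
    split_images_grouped = {}
    for shape, images in images_by_shape.items():
        processed = [f"split({img})" for img in images]
    # Runs once, after the loop: only the final shape/processed pair survives.
    split_images_grouped[shape] = processed
    return split_images_grouped


def group_fixed(images_by_shape):
    """Hypothetical: the assignment is moved inside the loop."""
    split_images_grouped = {}
    for shape, images in images_by_shape.items():
        processed = [f"split({img})" for img in images]
        # Stored on every iteration, so every shape is kept.
        split_images_grouped[shape] = processed
    return split_images_grouped


batch = {(224, 224): ["a", "b"], (448, 448): ["c"]}
print(sorted(group_buggy(batch)))  # [(448, 448)]
print(sorted(group_fixed(batch)))  # [(224, 224), (448, 448)]
```

Indexing the buggy result with (224, 224) is what surfaces as the KeyError mentioned above.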
Skip weight initialization for quantized models to prevent a normal_() call on int8 tensors, which raises a RuntimeError. Fixes the initialization error when loading llmcompressor quantized models.
Krish0909 (author) commented:
@SunMarc Saw this as a good first issue and fixed it.
SunMarc reviewed Jul 16, 2025
         with deepspeed.zero.GatheredParameters(not_initialized_parameters, modifier_rank=0):
             self.initialize_weights()
-    else:
+    elif not is_quantized:
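A minimal model of the control flow this diff changes, as a sketch under stated assumptions: load_model and the returned list of call names are hypothetical, the enclosing if is assumed to be a DeepSpeed ZeRO-3 check (the diff only shows its GatheredParameters body), and the original else: branch is assumed to call initialize_weights() as well.

```python
def load_model(is_deepspeed_zero3: bool, is_quantized: bool) -> list:
    """Hypothetical loader: returns which initialization paths would run."""
    calls = []
    if is_deepspeed_zero3:
        # Stands in for the deepspeed.zero.GatheredParameters(...) branch;
        # the diff leaves this path unchanged.
        calls.append("initialize_weights (gathered)")
    elif not is_quantized:  # previously: `else:`
        # Assumption: the original `else:` branch also calls initialize_weights().
        calls.append("initialize_weights")
    # Quantized, non-DeepSpeed models now skip initialization entirely.
    return calls


for zero3 in (False, True):
    for quantized in (False, True):
        print(f"zero3={zero3}, quantized={quantized}: {load_model(zero3, quantized)}")
```

Under these assumptions the guard only changes the non-DeepSpeed branch; the ZeRO-3 path still initializes regardless of is_quantized.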
Not the right fix. I'm not sure why we are initializing the tensor in int8 in the first place, since none of the quantized weights should be random.
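The reviewer's point can be illustrated with a simplified view of missing-key detection, assuming that only parameters absent from the checkpoint need random initialization. find_missing_keys and all key names below are hypothetical and are not the transformers implementation. If a pre-quantized checkpoint supplies every weight, the missing set is empty and no initializer should touch the int8 tensors; a loaded weight that still shows up as missing points to a key-matching problem rather than a genuine need to initialize.

```python
def find_missing_keys(expected_keys, checkpoint_keys):
    """Hypothetical: parameters the model expects but the checkpoint lacks."""
    return sorted(set(expected_keys) - set(checkpoint_keys))


expected = ["layer0.weight", "layer0.weight_scale", "lm_head.weight"]

# A complete pre-quantized checkpoint: nothing needs initialization.
full_ckpt = ["layer0.weight", "layer0.weight_scale", "lm_head.weight"]
print(find_missing_keys(expected, full_ckpt))  # []

# Hypothetical mismatch: the checkpoint stores the same weight under another
# name, so a weight that is actually present gets reported as missing.
renamed_ckpt = ["layer0.qweight", "layer0.weight_scale", "lm_head.weight"]
print(find_missing_keys(expected, renamed_ckpt))  # ['layer0.weight']
```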
Krish0909 (author) replied:
Let me investigate why _initialize_missing_keys is being called for quantized models and fix the actual loading flow that causes this. I'll trace the quantized model loading path to find where missing keys are being incorrectly identified.
- Root cause: quantized models should preserve pre-quantized values
- Fix: prevent _initialize_missing_keys call when model is quantized
- Resolves RuntimeError from normal_() on int8 tensors
Force-pushed from 15d5a10 to 7958e72.
[For maintainers] Suggested jobs to run (before merge): run-slow: idefics3, smolvlm
Fix quantized model initialization for int8 dtypes
This PR resolves a critical issue where loading quantized models (particularly llmcompressor W8A8 models) fails with:
RuntimeError: expected a floating-point or complex dtype, but got dtype=torch.int8
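The failure can be reproduced in isolation with a few lines of PyTorch. This is a sketch, not the PR's test: it only assumes, as the description states, that initialization ends up calling normal_() on an int8 weight. The tensor shapes and the std value are arbitrary, and the exact error text may vary across PyTorch versions.

```python
import torch

# An int8 tensor like the weights stored in a W8A8 checkpoint (values arbitrary).
w_int8 = torch.zeros(4, 4, dtype=torch.int8)

try:
    # normal_() samples from a Gaussian, which requires a floating-point
    # (or complex) dtype, so this call fails on int8.
    w_int8.normal_(mean=0.0, std=0.02)
except RuntimeError as err:
    print(f"RuntimeError: {err}")

# The same in-place call works on a floating-point tensor.
w_float = torch.zeros(4, 4, dtype=torch.float32)
w_float.normal_(mean=0.0, std=0.02)
print(w_float.dtype)  # torch.float32
```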
Root Cause: The model initialization code calls normal_() on int8 tensors during weight initialization, but PyTorch only supports this operation on floating-point (and complex) tensors.
Solution: Skip weight initialization for quantized models, since their weights are already loaded from the checkpoint. The conditional in _load_pretrained_model was changed from "else:" to "elif not is_quantized:" so that initialize_weights() is not called on quantized models.
Impact: Fixes the issue reported in the original GitHub discussion about loading the RedHatAI/Qwen2.5-VL-7B-Instruct-quantized.w8a8 model.
Who can review?
@SunMarc @MekkCyber (quantization experts)