Commit 4b1e42f

realAsma and sugunav14 authored and committed
Track global_amax for weight FP4 MSE sweep; Refactor to NVFP4StaticQuantizer, NVFP4MSECalibrator (#849)
Release notes (auto-generated by coderabbit.ai):

* **New Features**
  * Added NVFP4StaticQuantizer for improved 4-bit quantization with enhanced precision control
  * Introduced NVFP4MSECalibrator with flexible candidate generation for calibration optimization
* **Improvements**
  * Optimized GPU kernels for Hopper+ graphics cards with better performance
  * Extended Triton support to broader GPU compatibility
  * Enhanced backward compatibility for restoring previously quantized models
* **Tests**
  * Added comprehensive test coverage for new quantizers and calibration methods

Signed-off-by: realAsma <akuriparambi@nvidia.com>
Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>
1 parent 5aee517 commit 4b1e42f
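The commit title mentions tracking `global_amax` for a weight FP4 MSE sweep. As a rough illustration of what an MSE-based amax sweep does, the sketch below fake-quantizes a weight tensor to FP4 (E2M1) at several candidate amax values and keeps the candidate with the lowest mean-squared error. All names here (`fake_quantize_fp4`, `mse_sweep`) and the candidate-generation scheme are illustrative assumptions, not ModelOpt's actual `NVFP4MSECalibrator` API.

```python
import numpy as np

# E2M1 (FP4) representable magnitudes; illustrative, matches the common
# NVFP4 level set with a maximum magnitude of 6.0.
_FP4_POS = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
FP4_LEVELS = np.concatenate([-_FP4_POS[::-1], _FP4_POS])

def fake_quantize_fp4(x, amax):
    """Simulated quantize-dequantize: scale x so that amax maps to 6.0,
    round each value to the nearest FP4 level, then scale back."""
    scale = amax / 6.0
    idx = np.abs(x[:, None] / scale - FP4_LEVELS[None, :]).argmin(axis=1)
    return FP4_LEVELS[idx] * scale

def mse_sweep(x, candidates):
    """Return the candidate amax that minimizes quantization MSE."""
    errors = [np.mean((x - fake_quantize_fp4(x, a)) ** 2) for a in candidates]
    return candidates[int(np.argmin(errors))]

rng = np.random.default_rng(0)
w = rng.normal(size=4096).astype(np.float32)

# The tracked global amax is the starting point; the sweep tries shrunken
# versions of it (the shrink factors are an assumption for illustration).
global_amax = float(np.abs(w).max())
candidates = [global_amax * f for f in (1.0, 0.9, 0.8, 0.7, 0.6, 0.5)]
best = mse_sweep(w, candidates)
```

Clipping below the global amax often reduces overall MSE for bell-shaped weight distributions, because the error saved on the many small values outweighs the clipping error on a few outliers.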

File tree

1 file changed: +4 additions, −0 deletions


modelopt/torch/quantization/triton/__init__.py

Lines changed: 4 additions & 0 deletions
```diff
@@ -34,6 +34,10 @@
 from .fp4_kernel import *
 from .fp8_kernel import *
 
+# fp4_kernel_hopper requires compute >= 8.9 (uses tl.float8e4nv)
+if torch.cuda.get_device_capability() >= (8, 9):
+    from .fp4_kernel_hopper import *
+
 # fp4_kernel_hopper requires compute >= 8.9 (uses tl.float8e4nv)
 if torch.cuda.get_device_capability() >= (8, 9):
     from .fp4_kernel_hopper import *
```
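The guard in the diff works because `torch.cuda.get_device_capability()` returns a `(major, minor)` tuple and Python compares tuples lexicographically, so `(9, 0)` (Hopper) and `(8, 9)` (Ada) pass while `(8, 6)` (Ampere) does not. A minimal sketch of that comparison, with a hypothetical helper name standing in for the import guard:

```python
def supports_fp8_e4m3(capability: tuple) -> bool:
    """True when a (major, minor) compute capability is at least (8, 9),
    the minimum needed for tl.float8e4nv. Tuple comparison is
    lexicographic: major is compared first, then minor."""
    return capability >= (8, 9)
```

In the real module the check also implicitly requires a CUDA device to be present, since `get_device_capability()` queries the current GPU.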
