Support MXINT4 scheme #1666

Merged: mengniwang95 merged 16 commits into main from mengni/mx_int4 on Apr 13, 2026

mengniwang95 commented Apr 7, 2026

Description

Add support for the MXINT4 quantization scheme.

How to use:

Model quantization:

CUDA_VISIBLE_DEVICES=0 auto-round --model /models/Llama-3.2-3B/ --scheme MXINT4 --iters 0 --format auto_round
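
For readers unfamiliar with microscaling (MX) formats, here is a minimal pure-Python sketch of what an MXINT4-style weight format does conceptually: each block of values shares one power-of-two scale, and every element is stored as a signed 4-bit integer. This is an illustration under stated assumptions, not the code added in this PR. The block size of 32 is inferred from the "w4g32" suffix of the output directory, the rule for picking the shared exponent is one simple choice among several, and the function names are hypothetical.

```python
import math

BLOCK_SIZE = 32      # assumption: taken from the "g32" suffix, not from the PR code
QMIN, QMAX = -8, 7   # signed 4-bit integer range


def quantize_block(block):
    """Return (int4 codes, shared exponent) for one block of floats."""
    max_abs = max(abs(v) for v in block)
    if max_abs == 0.0:
        return [0] * len(block), 0
    # Assumed rule: smallest power-of-two scale that keeps max_abs within QMAX.
    exp = math.ceil(math.log2(max_abs / QMAX))
    scale = 2.0 ** exp
    # Python's round() rounds halves to even; clamp to the int4 range.
    codes = [min(QMAX, max(QMIN, round(v / scale))) for v in block]
    return codes, exp


def dequantize_block(codes, exp):
    """Reconstruct floats from int4 codes and the shared exponent."""
    scale = 2.0 ** exp
    return [c * scale for c in codes]


def fake_quantize(values, block_size=BLOCK_SIZE):
    """Quantize then dequantize a flat list, block by block."""
    out = []
    for start in range(0, len(values), block_size):
        codes, exp = quantize_block(values[start:start + block_size])
        out.extend(dequantize_block(codes, exp))
    return out


if __name__ == "__main__":
    weights = [0.1 * i for i in range(-16, 16)]
    codes, exp = quantize_block(weights)
    print("shared exponent:", exp)
    print("first codes:", codes[:8])
```

Because the scale is a power of two, the per-element rounding error in this sketch is at most half the scale. If the shared exponent is stored in 8 bits, as the E8M0 scale in the OCP Microscaling specification is, a block of 32 costs 4 + 8/32 = 4.25 bits per weight; whether the checkpoint written by this PR uses exactly that layout is not stated here.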

Inference with transformers:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Path to the quantized model directory
model_name = "tmp_autoround/Llama-3.2-3B-mxint-w4g32/"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# Tokenize the prompt and move it to the same device as the model
input_ids = tokenizer("Hello my name is", return_tensors="pt").input_ids.to(
    model.device
)
# Generate up to 100 new tokens and print the decoded sequence
output = model.generate(input_ids, max_new_tokens=100)
print(tokenizer.decode(output[0]))

Results:

BF16
hf ({'pretrained': '/models/Llama-3.1-8B-Instruct', 'add_bos_token': True}), gen_kwargs: ({}), limit: None, num_fewshot: None, batch_size: 64

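| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| hellaswag | 1 | none | 0 | acc | 0.5977 | ± 0.0049 |
| | | none | 0 | acc_norm | 0.7954 | ± 0.0040 |
| piqa | 1 | none | 0 | acc | 0.8014 | ± 0.0093 |
| | | none | 0 | acc_norm | 0.8101 | ± 0.0092 |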

hf ({'pretrained': '/models/Llama-3.1-8B-Instruct', 'add_bos_token': True}), gen_kwargs: ({}), limit: None, num_fewshot: None, batch_size: 16

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| gsm8k_llama | 3 | flexible_extract | 8 | exact_match | 0.8620 | ± 0.0095 |
| | | strict_match | 8 | exact_match | 0.8567 | ± 0.0097 |
| mmlu_llama | 1 | strict_match | | exact_match | 0.6946 | ± 0.0037 |

MXINT4
hf ({'pretrained': 'tmp_autoround/Llama-3.1-8B-Instruct-mxint-w4g32/', 'add_bos_token': True}), gen_kwargs: ({}), limit: None, num_fewshot: None, batch_size: 64

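| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| hellaswag | 1 | none | 0 | acc | 0.5460 | ± 0.0050 |
| | | none | 0 | acc_norm | 0.7396 | ± 0.0044 |
| piqa | 1 | none | 0 | acc | 0.7535 | ± 0.0101 |
| | | none | 0 | acc_norm | 0.7693 | ± 0.0098 |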

hf ({'pretrained': 'tmp_autoround/Llama-3.1-8B-Instruct-mxint-w4g32', 'add_bos_token': True}), gen_kwargs: ({}), limit: None, num_fewshot: None, batch_size: 16

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| gsm8k_llama | 3 | flexible_extract | 8 | exact_match | 0.6285 | ± 0.0133 |
| | | strict_match | 8 | exact_match | 0.5663 | ± 0.0137 |
| mmlu_llama | 1 | strict_match | | exact_match | 0.5786 | ± 0.0040 |
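
To make the BF16 and MXINT4 results easier to compare, the short script below computes the absolute change and the fraction of the BF16 score retained for each metric. The scores are copied from the tables above; the dictionary layout and the `compare` helper are only illustrative and are not part of the PR.

```python
# Scores copied from the BF16 and MXINT4 tables (Llama-3.1-8B-Instruct).
bf16 = {
    ("hellaswag", "acc"): 0.5977,
    ("hellaswag", "acc_norm"): 0.7954,
    ("piqa", "acc"): 0.8014,
    ("piqa", "acc_norm"): 0.8101,
    ("gsm8k_llama", "flexible_extract"): 0.8620,
    ("gsm8k_llama", "strict_match"): 0.8567,
    ("mmlu_llama", "strict_match"): 0.6946,
}
mxint4 = {
    ("hellaswag", "acc"): 0.5460,
    ("hellaswag", "acc_norm"): 0.7396,
    ("piqa", "acc"): 0.7535,
    ("piqa", "acc_norm"): 0.7693,
    ("gsm8k_llama", "flexible_extract"): 0.6285,
    ("gsm8k_llama", "strict_match"): 0.5663,
    ("mmlu_llama", "strict_match"): 0.5786,
}


def compare(baseline, quantized):
    """Map each (task, metric) to (absolute delta, fraction of baseline retained)."""
    rows = {}
    for key, base in baseline.items():
        quant = quantized[key]
        rows[key] = (quant - base, quant / base)
    return rows


for (task, metric), (delta, ratio) in compare(bf16, mxint4).items():
    print(f"{task:12s} {metric:17s} delta={delta:+.4f} retained={ratio:.1%}")
```

On these numbers, the largest absolute drop is gsm8k_llama strict_match (about 0.29) and the smallest is piqa acc_norm (about 0.04).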

Type of Change

  • Bug fix
  • New feature
  • Documentation update
  • Performance improvement
  • Code refactoring
  • Other (please specify):

mengniwang95 and others added 5 commits April 7, 2026 17:33
Signed-off-by: Mengni Wang <mengni.wang@intel.com>
Signed-off-by: Mengni Wang <mengni.wang@intel.com>
Signed-off-by: Mengni Wang <mengni.wang@intel.com>
Signed-off-by: Mengni Wang <mengni.wang@intel.com>
Signed-off-by: Mengni Wang <mengni.wang@intel.com>

hshen14 commented Apr 10, 2026

@lvliang-intel, please review as well.

mengniwang95 requested a review from wenhuach21 on April 10, 2026 06:32
Comment thread auto_round/experimental/qmodules/mxint4_utils.py
Comment thread auto_round/inference/backend.py Outdated
Signed-off-by: Mengni Wang <mengni.wang@intel.com>

mengniwang95 commented

/azp run Unit-Test-CUDA-AutoRound

azure-pipelines commented

Azure Pipelines successfully started running 1 pipeline(s).

Comment thread docs/step_by_step_CN.md
Comment thread auto_round/formats.py
Comment thread auto_round/inference/backend.py Outdated
Comment thread auto_round/__main__.py
Comment thread auto_round/inference/backend.py

lkk12014402 left a comment

LGTM

mengniwang95 commented

/azp run Unit-Test-CUDA-AutoRound

azure-pipelines commented

Azure Pipelines successfully started running 1 pipeline(s).

mengniwang95 merged commit c817d49 into main on Apr 13, 2026
42 checks passed
mengniwang95 deleted the mengni/mx_int4 branch on April 13, 2026 13:30

4 participants