
Add compressed-tensors format export support for W4A16 and W8A16 #1669

Merged
thuang6 merged 14 commits into main from thuang6/int4-ct
Apr 13, 2026

Conversation

@thuang6
Contributor

@thuang6 thuang6 commented Apr 9, 2026

Description

Added compressed-tensors format export support for W4A16 and W8A16. Replaced the previous INT W8A8 support, which went through the internal NaiveQuantizationCompressor interface, with the new compress_module interface (requires >= 0.15.0).

Updated the PR to use the BaseCompressor class method to stay compatible with older versions.
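The scheme routing this PR describes (W4A16/W8A16 through a new weight-only path, alongside the existing INT8 W8A8 path) can be sketched roughly as follows. The "wint_a16" selector name comes from this PR's formats.py change; the function name and the "int8_w8a8" label are illustrative assumptions, not the actual auto-round code.

```python
# Rough sketch of the llm_compressor export routing described above.
# Only the "wint_a16" backend name is taken from this PR; everything
# else is an illustrative assumption.

WEIGHT_ONLY_INT_SCHEMES = {"W4A16": 4, "W8A16": 8}  # scheme -> weight bits

def select_llmcompressor_backend(scheme: str) -> str:
    """Pick an llm_compressor export path for a quantization scheme name."""
    if scheme in WEIGHT_ONLY_INT_SCHEMES:
        # Integer weight-only (WOQ): activations stay at 16 bit, so the
        # compressed-tensors scheme omits activation quantization entirely.
        return "wint_a16"
    if scheme == "INT8_W8A8":
        # Weight + activation INT8 path, exported via compress_module.
        return "int8_w8a8"
    # Other schemes (e.g. FP8_BLOCK) are handled by separate paths in the
    # real exporter; this simplified sketch rejects them.
    raise ValueError(f"scheme {scheme!r} is not handled in this sketch")
```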

Type of Change

  • Bug fix
  • New feature
  • Documentation update
  • Performance improvement
  • Code refactoring
  • Other (please specify):

Related Issues

Fixes or relates to #1567

Checklist Before Submitting

  • My code has been tested locally.
  • Documentation has been updated as needed.
  • New or updated tests are included where applicable.

thuang6 added 2 commits April 9, 2026 11:35
@thuang6 thuang6 added this to the 0.13.0 milestone Apr 9, 2026
Contributor

Copilot AI left a comment

Pull request overview

Adds llm_compressor (compressed-tensors) export support for INT weight-only schemes (W4A16, W8A16), and updates docs/tests accordingly.

Changes:

  • Extend llm_compressor format to accept W4A16/W8A16 and route them through a new backend path.
  • Update compressed-tensors scheme construction to omit activation quantization for weight-only exports.
  • Add/adjust CPU export tests and document the newly supported schemes (EN + CN).

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

Summary per file:

  • auto_round/formats.py: Adds W4A16/W8A16 to llm_compressor support and introduces a WOQ backend selector (wint_a16).
  • auto_round/export/export_to_llmcompressor/export.py: Treats W*A16 as weight-only in compressed-tensors scheme creation; tightens dependency expectations around compress_module.
  • auto_round/compressors/utils.py: Adds a helper to detect integer weight-only quantization (WOQ).
  • test/test_cpu/export/test_export.py: Refactors the INT8_W8A8 export test and adds new W4A16/W8A16 llm_compressor export assertions.
  • README.md: Documents llm_compressor support for FP8_BLOCK, INT8_W8A8, W4A16, W8A16.
  • README_CN.md: Mirrors the README support-matrix update in Chinese.
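The WOQ-detection helper added to auto_round/compressors/utils.py might look roughly like this; the function name and signature are assumptions for illustration, not the actual code from this PR.

```python
def is_int_woq(data_type: str, weight_bits: int, act_bits: int) -> bool:
    """Heuristic for integer weight-only quantization (WOQ): integer
    weights at 4 or 8 bits with activations left unquantized (>= 16 bit).

    Illustrative sketch only; the real helper in
    auto_round/compressors/utils.py may differ in name and arguments.
    """
    return data_type == "int" and weight_bits in (4, 8) and act_bits >= 16
```

Under this sketch, W4A16 (int weights at 4 bits, 16-bit activations) is detected as WOQ, while INT8_W8A8 (8-bit activations) is not, which matches the routing split described in the file summary above.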

Contributor

@yiliu30 yiliu30 left a comment

LGTM

@thuang6
Contributor Author

thuang6 commented Apr 13, 2026

/azp run Unit-Test-CUDA-AutoRound

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@thuang6
Contributor Author

thuang6 commented Apr 13, 2026

/azp run Unit-Test-CUDA-AutoRound

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@thuang6
Contributor Author

thuang6 commented Apr 13, 2026

/azp run Unit-Test-CUDA-AutoRound

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@thuang6 thuang6 merged commit bbda38a into main Apr 13, 2026
42 checks passed
@thuang6 thuang6 deleted the thuang6/int4-ct branch April 13, 2026 06:21
