[Bugfix] Enhance LowerAsyncCopy Pass to handle INT8 dma copy with predicate by LeiWang1999 · Pull Request #219 · microsoft/BitBLAS

LeiWang1999 · 2024-10-11T11:06:50Z

This pull request touches several files: it raises the maximum error message length, refines the GPU matrix multiplication scheduling logic, and updates the integration benchmarks.

Error Handling Improvements:

Increased MAX_ERROR_MESSAGE_LENGTH from 200 to 500 in bitblas/common.py.
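
A minimal sketch of what a limit like this usually controls, assuming the constant caps how much of an error message is kept. The helper `truncate_error_message` is hypothetical, used here only for illustration, and is not taken from bitblas/common.py:

```python
MAX_ERROR_MESSAGE_LENGTH = 500  # value after this PR (previously 200)


def truncate_error_message(message: str, limit: int = MAX_ERROR_MESSAGE_LENGTH) -> str:
    """Hypothetical helper: keep at most `limit` characters of `message`."""
    if len(message) <= limit:
        return message
    # Mark the cut so the reader knows the message was shortened.
    return message[:limit] + "..."
```

With the larger limit, a long compiler diagnostic keeps more of its trailing context before being cut.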

GPU Matrix Multiplication Logic Enhancements:

Refined the condition to check block_reduction_depth and added a default value of 1 if block_reduction_depth is None in bitblas/gpu/matmul_mma_dequantize.py.
Updated thread binding and loop splitting logic based on the reduce_k value in bitblas/gpu/matmul_mma_dequantize.py.
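
The None fallback described above can be sketched as follows. This is an illustration under stated assumptions, not the code in matmul_mma_dequantize.py: the helper names `resolve_reduce_k` and `needs_reduce_k_binding` are hypothetical, and the `reduce_k > 1` test is an assumed criterion for when the extra thread binding applies.

```python
from typing import Optional


def resolve_reduce_k(block_reduction_depth: Optional[int]) -> int:
    """Hypothetical helper: treat a missing depth as 1 (no block reduction)."""
    return 1 if block_reduction_depth is None else block_reduction_depth


def needs_reduce_k_binding(block_reduction_depth: Optional[int]) -> bool:
    """Assumption: extra thread binding is only needed when reduce_k exceeds 1."""
    return resolve_reduce_k(block_reduction_depth) > 1
```

Normalizing None to 1 up front lets later scheduling code compare reduce_k as a plain integer instead of checking for None at every use.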

Integration Benchmark Updates:

Updated integration benchmarks to use model.quantize() and torch.compile(model) in integration/BitNet/benchmark_inference_latency.py.

Import Optimization:

Optimized imports in integration/pytorch/bitblas_linear.py by updating the import statement for MatmulConfig and Matmul.

Submodule Update:

Updated the submodule commit for 3rdparty/tvm.

Ref to Issue #218

LeiWang1999 added 6 commits October 4, 2024 05:49

Merge TL Update

acb4aa4

submodule update

519e21f

cutlass submodule update

ea24d45

Refactor model quantization and compilation for improved performance

50b8faa

Merge branch 'main' of https://github.com/microsoft/BitBLAS into merg…

3b53be0

…e_tl

Refactor model quantization and compilation for improved performance

5bb9c13

Increase MAX_ERROR_MESSAGE_LENGTH from 200 to 500 Improve thread binding in MatmulTensorizationMMAWithDequantizeInfo

LeiWang1999 marked this pull request as ready for review October 11, 2024 11:06

Refactor GPU schedule rules and benchmark script

cd2931f

LeiWang1999 merged commit 5f10b44 into microsoft:main Oct 11, 2024

LeiWang1999 mentioned this pull request Oct 11, 2024

[BUG] Dynamic symoblic may block the lowering phase of async copy #218

Closed

LeiWang1999 merged 7 commits into microsoft:main from LeiWang1999:fix-async-copy-on-bitnet
