
Conversation

@kyuyeunk (Contributor) commented Jul 11, 2025

This PR optimizes the performance of the quantized matmul kernel with the following optimizations (illustrative sketches of the main ideas follow the list):

  • Minimize run-time branching by generating multiple specialized functions at compile time
  • Use bf16 arithmetic during input quantization
  • When possible, cache the input quantization result for later use to avoid re-quantization
  • Only save the accumulation output to scratch memory when necessary
  • Only create scratch memory when necessary
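
As a rough illustration of the first two points (compile-time specialization instead of run-time branching, and bf16 arithmetic during input quantization), here is a minimal plain-JAX sketch. It is not the Pallas kernel from this PR; the function name `w8a8_matmul_sketch`, the per-row absmax quantization scheme, and the argument shapes are assumptions made for illustration.

```python
import functools

import jax
import jax.numpy as jnp


@functools.partial(jax.jit, static_argnames=("quantize_activation",))
def w8a8_matmul_sketch(x, w_int8, w_scale, quantize_activation=True):
    """x: [M, K] float, w_int8: [N, K] int8, w_scale: [N] float (illustrative shapes)."""
    if quantize_activation:  # resolved at trace time, so the compiled code has no branch
        # Per-row absmax int8 quantization of the input, computed in bf16.
        x_bf16 = x.astype(jnp.bfloat16)
        absmax = jnp.maximum(jnp.max(jnp.abs(x_bf16), axis=-1, keepdims=True), 1e-6)
        x_scale = absmax / 127.0
        x_int8 = jnp.clip(jnp.round(x_bf16 / x_scale), -128, 127).astype(jnp.int8)
        # Integer matmul, then rescale the accumulator back to bf16.
        acc = jnp.dot(x_int8.astype(jnp.int32), w_int8.astype(jnp.int32).T)
        return acc.astype(jnp.bfloat16) * x_scale * w_scale
    # Weight-only path: dequantize the weights and skip activation quantization entirely.
    return jnp.dot(x.astype(jnp.bfloat16), w_int8.astype(jnp.bfloat16).T) * w_scale
```

Because `quantize_activation` is a static argument, each flag value is traced into its own compiled function, which mirrors the idea of moving branching from run time to compile time.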

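In the same spirit, a hypothetical sketch of the caching idea: quantize the shared activation once and reuse the int8 values and scales across several weight matrices (for example Q/K/V projections) instead of re-quantizing inside every matmul. All names below are illustrative and do not come from the PR.

```python
import jax.numpy as jnp


def quantize_input_bf16(x, eps=1e-6):
    """Per-row absmax int8 quantization with bf16 arithmetic (illustrative)."""
    x_bf16 = x.astype(jnp.bfloat16)
    scale = jnp.maximum(jnp.max(jnp.abs(x_bf16), axis=-1, keepdims=True), eps) / 127.0
    q = jnp.clip(jnp.round(x_bf16 / scale), -128, 127).astype(jnp.int8)
    return q, scale


def qkv_projection_sketch(x, w_q, w_k, w_v, s_q, s_k, s_v):
    # Quantize the shared activation once; every projection reuses the cached result.
    x_q, x_scale = quantize_input_bf16(x)

    def int8_matmul(w_int8, w_scale):
        acc = jnp.dot(x_q.astype(jnp.int32), w_int8.astype(jnp.int32).T)
        return acc.astype(jnp.bfloat16) * x_scale * w_scale

    return int8_matmul(w_q, s_q), int8_matmul(w_k, s_k), int8_matmul(w_v, s_v)
```
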
@vanbasten23 self-requested a review July 11, 2025 17:45
@kyuyeunk force-pushed the optimize-w8a8-pallas-kernel branch 7 times, most recently from c246fa6 to 9a965b0, on July 14, 2025 21:56
@vanbasten23 (Collaborator) left a comment

Looks good. Thanks @kyuyeunk
Will merge once the TPU CI finishes.

@vanbasten23 (Collaborator) commented

The TPU CI seems to be blocking the merge.

@kyuyeunk (Contributor, Author) commented

The TPU CI seems to be blocking the merge.

Do you know what is wrong & how to resolve it?

Commit message:

Adds the following optimizations:
- Minimize run-time branching by generating multiple specialized functions at compile time
- Use bf16 arithmetic during input quantization
- When possible, cache the input quantization result for later use to avoid re-quantization
- Only save the accumulation output to scratch memory when necessary
- Only create scratch memory when necessary
@kyuyeunk force-pushed the optimize-w8a8-pallas-kernel branch from 9a965b0 to bdc7787 on July 15, 2025 17:59
@lsy323 merged commit 36ff641 into pytorch:master Jul 16, 2025
23 of 24 checks passed
@kyuyeunk deleted the optimize-w8a8-pallas-kernel branch July 16, 2025 16:44