@triple-Mu
Collaborator

Add support for a per_token_quant_fp8 Triton kernel.
Input x: [b, s, n, d], bfloat16
Output quant_x: [b, s, n, d+2], float8 + bfloat16
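
The shapes above suggest one possible reading of the layout: each token of length d becomes d float8 bytes followed by 2 bytes that hold one bfloat16 scale, which gives d+2. The pure-Python sketch below illustrates that reading only. It is not the kernel from this PR. The e4m3 format, the scale rule (amax / 448), the position of the scale at the end of the row, and all function names are assumptions made for illustration.

```python
import struct

FP8_E4M3_MAX = 448.0  # largest finite float8 e4m3fn value (format choice is an assumption)


def fp8_e4m3_decode(code: int) -> float:
    """Decode one e4m3fn byte: 1 sign bit, 4 exponent bits (bias 7), 3 mantissa bits."""
    sign = -1.0 if code & 0x80 else 1.0
    exp = (code >> 3) & 0xF
    man = code & 0x7
    if exp == 0:  # subnormal range
        return sign * (man / 8.0) * 2.0 ** -6
    return sign * (1.0 + man / 8.0) * 2.0 ** (exp - 7)


# All byte codes except the two NaN encodings (0x7F and 0xFF).
_FINITE_CODES = [c for c in range(256) if (c & 0x7F) != 0x7F]


def fp8_e4m3_encode(value: float) -> int:
    """Brute-force nearest e4m3fn code; ties go to the even code."""
    value = max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, value))
    return min(_FINITE_CODES,
               key=lambda c: (abs(fp8_e4m3_decode(c) - value), c & 1))


def bf16_bytes(value: float) -> bytes:
    """Encode a float as bfloat16 by keeping the top 16 bits of its float32 form
    (truncation, chosen here for simplicity)."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    return struct.pack("<H", bits >> 16)


def quantize_token(token: list) -> bytes:
    """Quantize one token of d values into d fp8 bytes + 2 bf16 scale bytes.

    Assumed scale rule: scale = amax / 448, so the largest element maps to
    the fp8 maximum. Assumed layout: the scale is stored after the values.
    """
    amax = max(abs(v) for v in token)
    scale = max(amax, 1e-12) / FP8_E4M3_MAX  # guard against all-zero tokens
    codes = bytes(fp8_e4m3_encode(v / scale) for v in token)
    return codes + bf16_bytes(scale)


row = quantize_token([1.0, -2.0, 0.5, 4.0])
print(len(row))  # d + 2 bytes for d = 4
```

A real kernel would run this per token over the [b, s, n] leading dimensions in parallel; the sketch processes a single token so the byte layout is easy to inspect.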

@DefTruth marked this pull request as draft on December 3, 2025, 08:16
@DefTruth self-requested a review on December 3, 2025, 10:18
@DefTruth marked this pull request as ready for review on December 3, 2025, 10:18
@DefTruth changed the title from "[feat]: support per_token_quant_fp8 triton kernel" to "feat: support per_token_quant_fp8 triton kernel" on Dec 3, 2025
Member

@DefTruth left a comment

LGTM~

@DefTruth merged commit 85549dc into vipshop:main on Dec 3, 2025
@triple-Mu deleted the triplemu/per_token_quant_fp8 branch on December 3, 2025, 12:27

2 participants