@yarongmu-google
Contributor

| num_q_heads | num_kv_heads | max_num_batched_tokens | max_num_seqs | input_len | output_len | best_block_size |
|---|---|---|---|---|---|---|
| 10 | 2 | 4096 | 128 | 1800 | 128 | (128, 32) |
| 5 | 1 | 4096 | 128 | 1800 | 128 | (128, 64) |
| 10 | 2 | 1024 | 1024 | 2000 | 48 | (128, 32) |
| 5 | 1 | 1024 | 1024 | 2000 | 48 | (128, 32) |
| 10 | 2 | 2048 | 128 | 1800 | 128 | (128, 32) |
| 5 | 1 | 2048 | 128 | 1800 | 128 | (128, 32) |
| 10 | 2 | 1024 | 128 | 1800 | 128 | (128, 32) |
| 5 | 1 | 1024 | 128 | 1800 | 128 | (128, 32) |
| 10 | 2 | 4096 | 1024 | 2000 | 48 | (128, 32) |
| 5 | 1 | 4096 | 1024 | 2000 | 48 | (128, 64) |
| 10 | 2 | 2048 | 1024 | 2000 | 48 | (128, 32) |
| 5 | 1 | 2048 | 1024 | 2000 | 48 | (128, 32) |
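
For readers who want to use these measurements programmatically, here is a minimal sketch of the results above as a lookup table. The names `BEST_BLOCK_SIZES`, `DEFAULT_BLOCK_SIZE`, and `lookup_block_size` are hypothetical and do not come from this PR. The key omits `input_len` and `output_len` because, in the rows above, they are fully determined by `max_num_seqs`. The fallback value is an assumption: it is simply the most common result in the table. The thread does not say what the two elements of `best_block_size` mean, so the sketch treats the tuple as opaque.

```python
from typing import Dict, Tuple

# Key: (num_q_heads, num_kv_heads, max_num_batched_tokens, max_num_seqs).
# Hypothetical layout for illustration; not the structure used in the PR.
TuningKey = Tuple[int, int, int, int]
BlockSize = Tuple[int, int]

# Values copied from the benchmark table above.
BEST_BLOCK_SIZES: Dict[TuningKey, BlockSize] = {
    (10, 2, 4096, 128): (128, 32),
    (5, 1, 4096, 128): (128, 64),
    (10, 2, 1024, 1024): (128, 32),
    (5, 1, 1024, 1024): (128, 32),
    (10, 2, 2048, 128): (128, 32),
    (5, 1, 2048, 128): (128, 32),
    (10, 2, 1024, 128): (128, 32),
    (5, 1, 1024, 128): (128, 32),
    (10, 2, 4096, 1024): (128, 32),
    (5, 1, 4096, 1024): (128, 64),
    (10, 2, 2048, 1024): (128, 32),
    (5, 1, 2048, 1024): (128, 32),
}

# Assumption: fall back to the most frequent result for unmeasured configs.
DEFAULT_BLOCK_SIZE: BlockSize = (128, 32)


def lookup_block_size(
    num_q_heads: int,
    num_kv_heads: int,
    max_num_batched_tokens: int,
    max_num_seqs: int,
) -> BlockSize:
    """Return the measured best block size, or the default if unmeasured."""
    key = (num_q_heads, num_kv_heads, max_num_batched_tokens, max_num_seqs)
    return BEST_BLOCK_SIZES.get(key, DEFAULT_BLOCK_SIZE)
```

In this data, only the 5-query-head, 1-KV-head configurations with `max_num_batched_tokens` of 4096 pick `(128, 64)`; every other measured configuration picks `(128, 32)`.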

Collaborator

@yaochengji yaochengji left a comment

LGTM, thanks, Yarong!

Signed-off-by: Yarong Mu <[email protected]>
@yaochengji yaochengji enabled auto-merge (squash) April 11, 2025 22:06
@yaochengji yaochengji merged commit 4eea5e1 into pytorch:master Apr 12, 2025
24 checks passed

3 participants