Add zero dim tensor check when using flash_attention by ranzhejiang · Pull Request #38280 · huggingface/transformers

ranzhejiang · 2025-05-22T05:20:17Z

The CUDA or Triton kernels cannot handle tensors that have a dimension of size 0, whereas traditional SDPA is implemented with PyTorch tensor operations, which support zero-size dimensions robustly. We need to add this check, along with an error message for developers, when using flash_attention. Related issue: deepspeedai/DeepSpeed#7275.
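A minimal sketch of the kind of check described here, assuming the inputs expose a `.shape` (as `torch.Tensor` does). The helper `check_no_zero_dims` and its error wording are hypothetical and are not the code merged in this PR. `SimpleNamespace` objects stand in for tensors so the example runs without PyTorch.

```python
from types import SimpleNamespace


def check_no_zero_dims(tensor, name="tensor"):
    """Hypothetical guard: reject inputs that have any zero-size dimension.

    FlashAttention's CUDA/Triton kernels do not support zero-size dims,
    while the SDPA path built on PyTorch tensor ops does, so fail early
    with an actionable message rather than passing them to the kernel.
    """
    shape = tuple(tensor.shape)
    if any(dim == 0 for dim in shape):
        raise ValueError(
            f"{name} has shape {shape} with a zero-size dimension. "
            "FlashAttention does not support such inputs; "
            "check your input shapes or use SDPA instead."
        )


# Stand-ins that only carry a shape; real callers would pass torch.Tensor.
ok = SimpleNamespace(shape=(2, 8, 16, 64))
empty = SimpleNamespace(shape=(2, 8, 0, 64))

check_no_zero_dims(ok, "query")  # passes silently
try:
    check_no_zero_dims(empty, "query")
except ValueError as err:
    print(err)
```

Raising a `ValueError` that names the tensor and its shape, and points to SDPA, is one way to provide the "error tips for developers" the description asks for.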

Rocketknight1 · 2025-05-22T16:38:36Z

cc @ArthurZucker for FA2

ArthurZucker

Thanks, I'd rather we write 3 explicit checks as only query, key and value need this check. Maybe even just testing key?

ranzhejiang · 2025-05-23T08:10:35Z

> Thanks, I'd rather we write 3 explicit checks as only query, key and value need this check. Maybe even just testing key?

Do you mean we don't need to write a function for this check and can just call explicit checks directly? If so, I agree. I can write three explicit checks for query, key, and value, or one only for key, and tell users which shape is wrong.
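To illustrate the alternative discussed here, a sketch with three explicit inline checks instead of a generic helper, each naming the tensor whose shape is wrong. `validate_qkv` is a hypothetical name, and shape-only stand-ins again replace real tensors.

```python
from types import SimpleNamespace


def validate_qkv(query, key, value):
    # Hypothetical: one explicit check per input instead of a generic helper,
    # so the error names exactly which tensor has the zero-size dimension.
    if 0 in tuple(query.shape):
        raise ValueError(f"query has a zero-size dimension: shape {tuple(query.shape)}")
    if 0 in tuple(key.shape):
        raise ValueError(f"key has a zero-size dimension: shape {tuple(key.shape)}")
    if 0 in tuple(value.shape):
        raise ValueError(f"value has a zero-size dimension: shape {tuple(value.shape)}")


q = k = SimpleNamespace(shape=(1, 4, 8, 64))
v = SimpleNamespace(shape=(1, 4, 0, 64))

validate_qkv(q, k, k)  # all shapes non-empty: no error
try:
    validate_qkv(q, k, v)
except ValueError as err:
    print(err)  # reports that `value` is the offending tensor
```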

ArthurZucker · 2025-05-26T07:34:52Z

Yeah, I mean only checking q should be enough as well no?


ranzhejiang · 2025-05-30T09:38:56Z

> Yeah, I mean only checking q should be enough as well no?

Yes, I agree and have changed my code. Thanks for the review.
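The approach settled on in this exchange checks only the query. Below is a sketch of such a guard at the top of an attention wrapper; `flash_attention_forward_sketch` and the `kernel` callable are illustrative assumptions, not the transformers API.

```python
from types import SimpleNamespace


def flash_attention_forward_sketch(query, key, value, kernel):
    # Hypothetical wrapper: a single guard on the query only, as agreed in
    # review, before handing the inputs to the FlashAttention kernel.
    if any(dim == 0 for dim in query.shape):
        raise ValueError(
            f"query has shape {tuple(query.shape)} with a zero-size dimension; "
            "FlashAttention does not support this, consider SDPA instead."
        )
    return kernel(query, key, value)


def fake_kernel(q, k, v):
    # Placeholder for the real FlashAttention call.
    return "kernel called"


good = SimpleNamespace(shape=(1, 4, 8, 64))
bad = SimpleNamespace(shape=(1, 4, 0, 64))

print(flash_attention_forward_sketch(good, good, good, fake_kernel))  # kernel called
try:
    flash_attention_forward_sketch(bad, good, good, fake_kernel)
except ValueError as err:
    print(err)
```

Checking only `query` keeps the guard to a single cheap shape inspection. The trade-off is that a zero-size dimension present only in `key` or `value` would pass this guard and reach the kernel.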


ranzhejiang · 2025-06-16T01:32:06Z

@ArthurZucker Hi ArthurZucker, I have changed my code following your advice. Could you review this PR? Thanks.

ArthurZucker · 2025-06-25T07:48:59Z

Sorry, I was out for the holidays.

ranzhejiang force-pushed the zhejiang_fix_flash_attn branch 2 times, most recently from 6afac40 to a9fdcf4, May 22, 2025 05:43

ranzhejiang mentioned this pull request on May 22, 2025 in deepspeedai/DeepSpeed#7275, "[BUG] Qwen3: model loading failed when using meta device" (Open)

ArthurZucker reviewed May 23, 2025


ranzhejiang force-pushed the zhejiang_fix_flash_attn branch 3 times, most recently from 3d36772 to 8365960, May 30, 2025 09:37

Commit 8365960: Add zero dim tensor check when using flash_attention

Signed-off-by: ranzhejiang <zhejiang.ran@intel.com>

Commit 827b230: Add zero dim tensor check when using flash_attention

Signed-off-by: ranzhejiang <zhejiang.ran@intel.com>

ArthurZucker merged commit ae32f1a into huggingface:main Jun 25, 2025
18 checks passed

ArthurZucker merged 2 commits into huggingface:main from ranzhejiang:zhejiang_fix_flash_attn

3 participants
