Skip to content

Add a unit test for BartModel to compare eager, sdpa on one particular set of inputs#39435

Open
xadupre wants to merge 51 commits into
huggingface:mainfrom
xadupre:eaf
Open

Add a unit test for BartModel to compare eager, sdpa on one particular set of inputs#39435
xadupre wants to merge 51 commits into
huggingface:mainfrom
xadupre:eaf

Conversation

@xadupre

@xadupre xadupre commented Jul 15, 2025

Copy link
Copy Markdown
Contributor

What does this PR do?

Fixes #39365. Not a fix yet but introducing a unit test failing for the reason explained in this issue. Either the inputs are wrong, either the fix from issue #39365 is needed.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

xadupre added 21 commits May 22, 2025 23:40
@github-actions

Copy link
Copy Markdown
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: bart

@vasqu vasqu left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some initial thoughts on the test: Does this only happen if we pass invalid masks? I.e. the mask was too long?

Added some comments on the test design to make it more simple / reuse things we have.

context = torch.tensor(
[[71, 82, 18, 33, 46, 91, 2], [68, 34, 26, 58, 30, 2, 1]], device=torch_device, dtype=torch.long
)
mask = torch.ones((context.shape[0], context.shape[1] + 2), device=context.device, dtype=torch.int64)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it fails when we give it a mask that is too long for the original input?

mask = torch.ones((context.shape[0], context.shape[1] + 2), device=context.device, dtype=torch.int64)
shape1 = (2, 2, 7, 7)
shape2 = (2, 2, 2, 7)
past_key_values = EncoderDecoderCache(

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather if we could just pass use_cache in the forward call.

Comment on lines +322 to +323
@parameterized.expand(["sdpa", "eager"])
def test_lm_uneven_forward_with_mask(self, attn_implementation):

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to parametrize this; just set the config accordingly and check one after the other 👀

Other than that, I think it makes sense to move this test under BartModelTester/BartModelTest, you can look at test_encoder_decoder_model_standalone for reference. We should possibly only modify the inputs (attention mask) and keep the rest from the defaults, i.e. config_and_inputs = self.model_tester.prepare_config_and_inputs_for_common().

@vasqu

vasqu commented Aug 7, 2025

Copy link
Copy Markdown
Collaborator

@xadupre Still any interest in this PR/bug? Otherwise, I'd make another PR to apply this fix and propagate it 👀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug in modeling_bart.eager_attention_forward

2 participants