Skip to content

Enable granite 4 hybrid integration tests#39222

Open
alex-jw-brooks wants to merge 1 commit into
huggingface:mainfrom
alex-jw-brooks:enable_granite4_tests
Open

Enable granite 4 hybrid integration tests#39222
alex-jw-brooks wants to merge 1 commit into
huggingface:mainfrom
alex-jw-brooks:enable_granite4_tests

Conversation

@alex-jw-brooks

Copy link
Copy Markdown
Contributor

Enables granite moe hybrid integration tests using the tiny preview model. Tolerance is adjusted to be more lenient for bfloat16.

Fixes # #38542

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@ydshieh can you please take a look?

@github-actions

github-actions Bot commented Jul 4, 2025

Copy link
Copy Markdown
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: granitemoehybrid

@ydshieh

ydshieh commented Jul 4, 2025

Copy link
Copy Markdown
Collaborator

Thank you a lot. I will check the tests in our runner.

@ydshieh

ydshieh commented Jul 4, 2025

Copy link
Copy Markdown
Collaborator

@alex-jw-brooks

Do you know why test_model_logits have different out.logits.float().mean(-1) in different runs?

It's just forward and should give deterministic outputs no?

@alex-jw-brooks

alex-jw-brooks commented Jul 7, 2025

Copy link
Copy Markdown
Contributor Author

@ydshieh I agree, but this doesn't seem to be the case, and it seems to diverge a lot more when it's run with bf16. Please let me investigate a bit - I'll follow up once I at least have a better understanding of the root cause of the non-determinism 🙂

@ydshieh

ydshieh commented Jul 7, 2025

Copy link
Copy Markdown
Collaborator

It's known bf16 have less precision (to have wilder range). I'm happy to change it to fp32 (if it fits into A10).
The worrying part is more about the non-deterministic outputs in different runs.

That would be nice if we can figure out why?

(Maybe first check if all model's weights are loaded from the model checkpoint - if not we will see some warning from the log. But I think it is correctly loaded otherwise the output would be even non-sense).

Thank you for helping 🙏

@alex-jw-brooks

alex-jw-brooks commented Jul 7, 2025

Copy link
Copy Markdown
Contributor Author

Unfortunately, it won't without CPU offload, it's about 30 gb of vram to run it in full precision 😞 This model is pretty much combining elements from bamba and granite moe shared - I'm guessing it's something to do with bamba and hopefully expected since the bamba integration tests use similarly high tolerance for these checks, but just need to dig a bit to understand since I'm less familiar with SSMs in general 😅

I'll follow up as soon as possible, and also see if anything can be done about getting a smaller model for testing!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants