Skip to content

Fix deepseekv3#38562

Merged
ydshieh merged 5 commits into
mainfrom
fix_deepseekv3
Jun 4, 2025
Merged

Fix deepseekv3#38562
ydshieh merged 5 commits into
mainfrom
fix_deepseekv3

Conversation

@ydshieh

@ydshieh ydshieh commented Jun 3, 2025

Copy link
Copy Markdown
Collaborator

What does this PR do?

See comments on the PR changes

]
super().test_past_key_values_format(custom_all_cache_shapes=all_cache_shapes)

@require_torch_large_accelerator

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OOM on T4

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And from time to time, this OOM affect subsequent tests (that will pass if this test is not OOM)

Comment on lines +517 to +518
"Simply put, the theory of relativity states that Frojekecdytesాలు sicʰtinaccianntuala breej的效率和质量的控制lavestock-PraccuraciesOTTensorialoghismos的思路astiomotivityosexualriad TherapeuticsoldtYPEface Kishsatellite-TV",
"My favorite all time favorite condiment is ketchup.ieden沟渠係室温 Fryrok般地Segmentation Cycle/physicalwarenkrautempsాలు蹈梗 Mesomac一等asan lethality suspended Causewaydreamswith Fossilsdorfాలు蹈 ChristiansenHOMEbrew",

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already failing when the model is added. Don't know what machine was used when this test is written. The updated value works for both T4 and A10, but it looks like gibberish.

@bzantium If you know some details back then?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is this comment above with

# Note on `EXPECTED_TEXT_COMPLETION`'s diff: the current value matches the original test if the original test
# was changed to have a cache of 53 tokens (as opposed to 4096), on Ampere GPUs.

Not sure what they did there then tho

@ydshieh

ydshieh commented Jun 3, 2025

Copy link
Copy Markdown
Collaborator Author

There are still 2 remaining issues

FAILED tests/models/deepseek_v3/test_modeling_deepseek_v3.py::DeepseekV3ModelTest::test_torch_compile_for_training - torch._dynamo.exc.Unsupported: dynamic shape operator: aten.nonzero.default; to enable, set torch._dynamo.config.capture_dynamic_output_shape_ops = True

FAILED tests/models/deepseek_v3/test_modeling_deepseek_v3.py::DeepseekV3IntegrationTest::test_compile_static_cache - RuntimeError: index_copy_(): Source/destination tensor must have same slice shapes. Destination slice shape: 2 128 64 at dimension 2 and source slice shape: 2 128 192 at dimension 0.

(failing since the day when the model is added)

cc @gante when you have the bandwidth.

@ydshieh ydshieh requested a review from vasqu June 3, 2025 13:44
@HuggingFaceDocBuilderDev

Copy link
Copy Markdown

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@vasqu vasqu left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it makes sense to change the values just to make the tests happy here. My gripe is that the output is too gibberish - I'd like to hear from @bzantium

If there is no response, I'm ok with changing it under a TODO / extra note of what the previous value has been.

Comment on lines +517 to +518
"Simply put, the theory of relativity states that Frojekecdytesాలు sicʰtinaccianntuala breej的效率和质量的控制lavestock-PraccuraciesOTTensorialoghismos的思路astiomotivityosexualriad TherapeuticsoldtYPEface Kishsatellite-TV",
"My favorite all time favorite condiment is ketchup.ieden沟渠係室温 Fryrok般地Segmentation Cycle/physicalwarenkrautempsాలు蹈梗 Mesomac一等asan lethality suspended Causewaydreamswith Fossilsdorfాలు蹈 ChristiansenHOMEbrew",

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is this comment above with

# Note on `EXPECTED_TEXT_COMPLETION`'s diff: the current value matches the original test if the original test
# was changed to have a cache of 53 tokens (as opposed to 4096), on Ampere GPUs.

Not sure what they did there then tho

@bzantium

bzantium commented Jun 4, 2025

Copy link
Copy Markdown
Contributor

@ydshieh @vasqu
Thanks for the question! The reason why the output is too gibberish is because the testing model bzantium/tiny-deepseek-v3 is not trained one. Since original DeepSeek-V3 model is too big to debug and test, there was no testing with the original one.

@ydshieh

ydshieh commented Jun 4, 2025

Copy link
Copy Markdown
Collaborator Author

@bzantium thank you for confirming (that is what I guess) too.
(It would be nice to add a comment in the test to avoid confusion 😅 )

@vasqu I will add the comment and we are good to go?

@vasqu vasqu left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks :) perfect, then let's add a comment for clarification and then it's good to go

@ydshieh ydshieh merged commit ff3fad6 into main Jun 4, 2025
15 checks passed
@ydshieh ydshieh deleted the fix_deepseekv3 branch June 4, 2025 09:40
bvantuan pushed a commit to bvantuan/transformers that referenced this pull request Jun 12, 2025
* fix 1

* fix 2

* fix 3

* fix 4

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants