Document tensor parallelism configuration with Trainer #42876
Added a section on configuring tensor parallelism with Trainer, including an example script for TP-only training.
Yeah, you can pass tp_plan="auto", which will shard the model across the GPUs you have, and Trainer will create the correct ParallelismConfig. You can also pass a device_mesh created from ParallelismConfig when initializing the model. As for
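For readers following the thread, the tp_plan="auto" suggestion above might look roughly like the sketch below. It is not the example script from this PR: the script name, the process count, and the helper names are hypothetical, and it assumes the script is started by a distributed launcher such as torchrun so that there is one process per GPU.

```python
# Sketch only. Launch with one process per GPU, for example:
#   torchrun --nproc-per-node 4 train_tp.py
# "train_tp.py", the process count, and the helper names are hypothetical.


def tp_load_kwargs():
    """Keyword arguments for from_pretrained when tensor parallelism is wanted."""
    # tp_plan="auto" asks Transformers to apply the model's predefined
    # sharding plan across the processes started by the launcher.
    # device_map is intentionally left out of these arguments.
    return {"tp_plan": "auto"}


def load_tp_model(checkpoint):
    """Load a causal LM sharded with tensor parallelism."""
    # Imported lazily so the helper above can be inspected without
    # Transformers installed.
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(checkpoint, **tp_load_kwargs())
```

The model returned by load_tp_model would then be passed to Trainer in the usual way; per the comment above, Trainer is expected to create the matching ParallelismConfig itself.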
Thanks very much for the review and constructive feedback! I’m happy to make the suggested adjustments:
If it’s helpful, I can update the doc text to reflect these points — just let me know if there’s a preferred location/heading for the Trainer optimization section. Thanks again for guiding this consolidated effort — I’m glad the content has come together nicely and addresses the original issue (#41141)! |
sounds good, you can add a "tensor parallelism" section after this
Thanks for the guidance! I’ve added the Tensor Parallelism section immediately after the
This PR adds a concise documentation section explaining how to configure tensor parallelism (TP) when training with Trainer. It clarifies that TP must be managed by Accelerate rather than during model loading, and explains why device_map="auto" should not be used with distributed training. A minimal, runnable TP-only Trainer example is included.
Fixes #41141
Who can review?
@stevhliu