
xm.all_to_all working for TPU Pod? #2601

@shz0116

Description


Hi,

I am trying to learn how xm.all_to_all works on a TPU Pod (across multiple nodes).
I see examples that use xmp.spawn to fork 1 or 8 processes on a single node (xla/test/test_mp_all_to_all.py).
I assume torch_xla.distributed.xla_multiprocessing only works on a single TPU node, not across a pod, right?
Is there an example of using xm.all_to_all on a pod, such as TPU v3-128?
Do I need to use a cluster setup? Thanks.
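
For reference, here is a minimal single-host sketch of the pattern the referenced test uses (this is an illustrative reconstruction, not the exact contents of xla/test/test_mp_all_to_all.py); whether the same pattern carries over to a multi-host pod is exactly what I am asking about:

```python
import torch
import torch_xla.core.xla_model as xm
import torch_xla.distributed.xla_multiprocessing as xmp


def _mp_fn(index):
    device = xm.xla_device()
    world_size = xm.xrt_world_size()
    # Each process contributes a 1-D tensor of length `world_size`,
    # filled with its own ordinal.
    value = torch.full((world_size,), xm.get_ordinal(),
                       dtype=torch.int32, device=device)
    # Split `value` into `world_size` chunks along dim 0, exchange one chunk
    # with every other replica, and concatenate the received chunks along dim 0.
    result = xm.all_to_all(value, split_dimension=0, concat_dimension=0,
                           split_count=world_size)
    print('[{}] {}'.format(index, result.cpu().tolist()))


if __name__ == '__main__':
    # On a single host this forks one process per local TPU core.
    xmp.spawn(_mp_fn)
```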
