Hi,
I am trying to learn how xm.all_to_all works on a TPU Pod (across multiple nodes).
I see examples that use xmp.spawn to fork 1 or 8 processes on a single node (xla/test/test_mp_all_to_all.py); a minimal sketch of what I am running is below.
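This is just a sketch based on that test, not a verified pod setup: the nprocs value and tensor shape are placeholders I chose, and it only shows the single-host pattern I understand so far.

```python
import torch
import torch_xla.core.xla_model as xm
import torch_xla.distributed.xla_multiprocessing as xmp


def _mp_fn(index):
    # Each spawned process drives one TPU core on this host.
    device = xm.xla_device()
    world_size = xm.xrt_world_size()
    ordinal = xm.get_ordinal()

    # One element per participating device; all_to_all splits dim 0 into
    # `split_count` chunks and exchanges them across devices.
    value = torch.tensor([ordinal] * world_size, dtype=torch.int32, device=device)
    result = xm.all_to_all(
        value, split_dimension=0, concat_dimension=0, split_count=world_size)
    xm.mark_step()
    print(ordinal, result.cpu())


if __name__ == '__main__':
    # nprocs=8 assumes a single v3-8 host; unclear to me how this extends to a pod.
    xmp.spawn(_mp_fn, args=(), nprocs=8)
```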
I assume torch_xla.distributed.xla_multiprocessing only works on a single TPU node, not on a Pod, right?
Is there an example of using xm.all_to_all on a Pod, such as a TPU v3-128?
Do I need to set up a cluster? Thanks.