Commit 62ba7ad

WIP
1 parent 3ec6c8e commit 62ba7ad

4 files changed: +55 −54 lines changed

docs/source/learn/xla-advanced.md

Lines changed: 6 additions & 2 deletions
@@ -1,6 +1,8 @@
-# Advanced PyTorch XLA
+# Advanced Topics in PyTorch XLA
 
-- HLO is fed to the XLA compiler
+## Compilation, caching and execution
+
+HLO is fed to the XLA compiler
 for compilation and optimization. Compilation is then cached by PyTorch
 XLA to be reused later if/when needed. The compilation of the graph is
 done on the host (CPU), which is the machine that runs the Python code.
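
The caching behaviour described in the hunk above can be made concrete with a
small, hypothetical sketch (illustrative only, not part of this commit; it
assumes the long-standing `xm.mark_step()` and `torch_xla.debug.metrics` APIs):

``` python
# Keep shapes and computation identical across steps so the cached
# compilation is reused instead of recompiling on the host every iteration.
import torch
import torch_xla.core.xla_model as xm
import torch_xla.debug.metrics as met

device = xm.xla_device()
w = torch.randn(128, 128, device=device)

for step in range(5):
    x = torch.randn(64, 128, device=device)  # same shape every step
    y = (x @ w).relu().sum()                 # same computation every step
    xm.mark_step()  # cut the graph: compile on the first step, then reuse

# CompileTime in the report should show only a handful of compilations,
# not one per loop iteration, because the executable was cached.
print(met.metrics_report())
```
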
@@ -14,6 +16,8 @@ host does the compilation for XLA devices it is attached to. If SPMD is
 used, then the code is compiled only once (for given shapes and
 computations) on each host for all the devices.
 
+## Synchronous execution and blocking
+
 The *synchronous* operations in PyTorch XLA, like printing, logging,
 checkpointing or callbacks block tracing and result in slower execution.
 In the case when an operation requires a specific value of an XLA
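
The blocking behaviour this new section describes can be sketched roughly as
follows (a hypothetical example, not taken from the docs; it assumes the
`xm.xla_device()` API):

``` python
# Fetching a concrete value mid-trace forces an early compile-and-execute.
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()
x = torch.randn(1024, 1024, device=device)
loss = (x @ x).sum()

# .item() needs the materialized value, so tracing stops here and the graph
# built so far is compiled and executed synchronously before Python continues.
print(loss.item())

# A cheaper pattern is to keep such fetches out of the hot loop, e.g. log the
# loss only every N steps so tracing is not interrupted on every iteration.
```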

docs/source/learn/xla-examples.md

Lines changed: 5 additions & 0 deletions
@@ -256,3 +256,8 @@ gcloud compute tpus tpu-vm ssh ${TPU_NAME} \
   --worker=all \
   --command="python3 your_script.py"
 ```
+
+## Reference implementations
+
+The [AI-Hypercomputer/tpu-recipes](https://github.com/AI-Hypercomputer/tpu-recipes)
+repository contains examples for training and serving many LLM and diffusion models.

docs/source/learn/xla-overview.md

Lines changed: 44 additions & 5 deletions
@@ -9,7 +9,7 @@ models with minimal code changes from their existing PyTorch workflows.
 
 At its core, PyTorch/XLA acts as a bridge between the familiar PyTorch Python
 frontend and the XLA compiler. When you run PyTorch operations on XLA
-devices using this library, the following steps are performed:
+devices using this library, you get the following key features:
 
 1. **Lazy Evaluation**: Operations are not executed immediately. Instead,
    PyTorch/XLA records these operations in an intermediate representation (IR)
@@ -34,7 +34,7 @@ devices using this library, the following steps are performed:
 
 ![img](../_static/img/pytorchXLA_flow.svg)
 
-This process allows PyTorch/XLA to unlock significant performance benefits,
+This process allows PyTorch/XLA to provide significant performance benefits,
 especially for large models and distributed training scenarios. For a deeper
 dive into the lazy tensor system, see our
 [LazyTensor guide](https://pytorch.org/blog/understanding-lazytensor-system-performance-with-pytorch-xla-on-cloud-tpu/).
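
The lazy-evaluation flow summarized in this hunk can be illustrated with a
minimal, hypothetical sketch (not part of this commit; it assumes the
`torch_xla.core.xla_model` API):

``` python
# Operations on XLA tensors are only recorded; nothing runs on the device
# until the graph is cut and handed to the XLA compiler.
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()

a = torch.ones(2, 2, device=device)
b = torch.ones(2, 2, device=device)
c = a + b       # recorded in the IR graph, not executed yet

xm.mark_step()  # the recorded graph is lowered to HLO, compiled and executed

print(c)        # the materialized result is now available on the host
```
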
@@ -44,7 +44,7 @@ dive into the lazy tensor system, see our
 * **High Performance on TPUs**: PyTorch/XLA is optimized to deliver exceptional performance for training and inference on Google Cloud TPUs, which are custom-designed AI accelerators.
 * **Scalability**: Seamlessly scale your models from a single device to large TPU Pods with minimal code changes, enabling you to tackle more ambitious projects.
 * **Familiar PyTorch Experience**: Continue using the PyTorch APIs and ecosystem you know and love. PyTorch/XLA aims to make the transition to XLA devices as smooth as possible, often requiring only minor modifications to existing PyTorch code.
-* **Cost-Efficiency**: TPUs offer a compelling price/performance ratio for many AI workloads. PyTorch/XLA helps you harness this efficiency.
+* **Cost-Efficiency**: TPUs offer a compelling price/performance ratio for many AI workloads. PyTorch/XLA helps you take advantage of this efficiency.
 * **Versatility**: Accelerate a wide range of AI workloads, including chatbots, code generation, media content generation, vision services, and recommendation engines.
 * **Support for Leading Frameworks**: While focused on PyTorch, XLA itself is a compiler backend used by other major frameworks like JAX and TensorFlow.
 
@@ -55,10 +55,49 @@ While PyTorch/XLA can theoretically run on any XLA-compatible backend, its prima
 * **Google Cloud TPUs**: Including various generations like TPU v5 and v6. [Learn more about TPUs](https://cloud.google.com/tpu/docs/intro-to-tpu).
 * **GPUs via XLA**: PyTorch/XLA also supports running on NVIDIA GPUs through the OpenXLA PJRT plugin, providing an alternative execution path. [Learn more about GPUs on Google Cloud](https://cloud.google.com/compute/docs/gpus).
 
+## TPU Setup
+
+Create a TPU VM either with the base image, to use nightly wheels, or from a
+stable release, by specifying the `RUNTIME_VERSION`.
+
+``` bash
+export ZONE=us-central2-b
+export PROJECT_ID=your-project-id
+export ACCELERATOR_TYPE=v4-8 # v4-16, v4-32, …
+export RUNTIME_VERSION=tpu-vm-v4-pt-2.0 # or tpu-vm-v4-base
+export TPU_NAME=your_tpu_name
+
+gcloud compute tpus tpu-vm create ${TPU_NAME} \
+  --zone=${ZONE} \
+  --accelerator-type=${ACCELERATOR_TYPE} \
+  --version=${RUNTIME_VERSION} \
+  --subnetwork=tpusubnet
+```
+
+If you have a single-host VM (e.g. v4-8), you can SSH into the VM and run the
+following commands there directly. Otherwise, for TPU Pods, run the commands
+on all workers with `--worker=all --command=""`, similar to:
+
+``` bash
+gcloud compute tpus tpu-vm ssh ${TPU_NAME} \
+  --zone=us-central2-b \
+  --worker=all \
+  --command="pip3 install https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch-nightly-cp38-cp38-linux_x86_64.whl"
+```
+
+Next, if you are using the base image, install the nightly packages and the
+required libraries:
+
+``` bash
+pip3 install https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch-nightly-cp38-cp38-linux_x86_64.whl
+pip3 install https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-nightly-cp38-cp38-linux_x86_64.whl
+sudo apt-get install libopenblas-dev -y
+
+sudo apt-get update && sudo apt-get install libgl1 -y # diffusion specific
+```
+
 ## Next Steps
 
-- [Quickstart Guide](./xla-quickstart.md): Get started with PyTorch/XLA on Google Cloud TPUs.
 - [Examples](./xla-examples.md): Explore example code for training and inference on TPUs.
 - [Profiling and Performance](./xla-profiling.md): Learn how to profile and optimize your PyTorch/XLA applications.
 - [Advanced Topics](./xla-advanced.md): Dive deeper into advanced concepts like graph optimization, data loading, and distributed training with PyTorch/XLA.
-
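
As a quick, hypothetical sanity check after the setup commands added above (it
assumes the `xm.get_xla_supported_devices()` helper), you can confirm that
torch_xla sees the TPU devices:

``` python
# Print the XLA devices visible to this VM/worker; on a healthy TPU VM this
# should list one or more 'xla:N' devices rather than raising an import error.
import torch_xla.core.xla_model as xm

print(xm.get_xla_supported_devices())
```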

docs/source/learn/xla-quickstart.md

Lines changed: 0 additions & 47 deletions
This file was deleted.
