Question:
When I use torch_xla to accelerate model training, I find that in-place ops go through PyTorch's functionalization. The in-place op is dispatched to the Python-level decomposition registered here:

`pytorch/torch/_refs/__init__.py`, line 504 at commit 8f70bf7: `def _make_inplace(fn):`

and the corresponding C++-level implementation is `void FunctionalTensorWrapper::replace_(const Tensor& other, bool from_lazy_regenerate)`.
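For reference, my rough understanding of what these two pieces do (a hedged sketch of the idea, not the actual PyTorch source) is that functionalization rewrites an in-place op into its out-of-place counterpart followed by a write-back into the original tensor:

```python
import torch

def sketch_functionalized_mul_(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Hypothetical illustration only: compute the out-of-place result first...
    result = torch.mul(a, b)
    # ...then write it back into `a`; at the C++ level this write-back is the
    # job of FunctionalTensorWrapper::replace_.
    a.copy_(result)
    return a
```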
I found a PR related to the Python-level implementation:
and a PR related to the C++-level implementation:
And I guess that any backend which relies on functionalization will run into this behavior with in-place ops.
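As a cross-check (a sketch I wrote using `torch.func.functionalize` and `make_fx`; not from the XLA timeline below), the same rewrite can be observed on CPU without XLA, which suggests the behavior comes from functionalization itself rather than from torch_xla:

```python
import torch
from torch.func import functionalize
from torch.fx.experimental.proxy_tensor import make_fx

def f(a, b):
    a.mul_(b)  # in-place op that functionalization must rewrite
    return a

a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([1.0, 2.0, 3.0])

# The traced graph shows the functional mul plus a copy back into the
# input, with no XLA involved.
graph = make_fx(functionalize(f))(a, b)
print(graph.code)
```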
System info:
PyTorch version: 2.2.2
torch_xla version: 2.3.0
Python version: 3.10.13
Reproducible code:
```python
import torch
import torch_xla.core.xla_model as xm

dev = xm.xla_device()
tensor = torch.tensor([1.0, 2.0, 3.0])
xla_tensor_a = tensor.to(dev)
xla_tensor_b = tensor.to(dev)
xla_tensor_a.mul_(xla_tensor_b)
```

You can check the timeline, or add some logging in the Python-level and C++-level implementations.
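One hedged way to add such logging from pure Python is `torch.utils._python_dispatch.TorchDispatchMode` (note this intercepts ops above the functionalization layer, so it shows `aten.mul_` before it is decomposed; log statements inside the Python and C++ implementations above would still be needed to see the decomposition itself):

```python
import torch
from torch.utils._python_dispatch import TorchDispatchMode

class DispatchLogger(TorchDispatchMode):
    # Prints every ATen op as it reaches the Python dispatch layer.
    def __torch_dispatch__(self, func, types, args=(), kwargs=None):
        print(f"dispatch: {func}")
        return func(*args, **(kwargs or {}))

with DispatchLogger():
    a = torch.tensor([1.0, 2.0, 3.0])
    b = torch.tensor([1.0, 2.0, 3.0])
    a.mul_(b)  # logged as aten.mul_.Tensor at this layer
```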
Here's the timeline of the mul_ op:

I want to know: is there a problem with this implementation? Or can someone explain why PyTorch's functionalization does this? Thank you!