
[Functionalization] The inplace decompose op is meaningless #130021

@hanwen-sun

Description


Question:

When I use torch_xla to accelerate model training, I find that in-place ops go through PyTorch's functionalization. The in-place op is dispatched to the Python-level decompose implementation registered here:

def _make_inplace(fn):

However, after the Python-level implementation runs, execution then moves to the C++-level implementation:

void FunctionalTensorWrapper::replace_(const Tensor& other, bool from_lazy_regenerate) {

and the output is determined by this C++-level implementation. It seems that the output of the decomposed op is meaningless.
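To make sure I understand the flow, here is a minimal runnable sketch of what I think functionalization does with an in-place op (FakeFunctionalWrapper and functionalized_mul_ are hypothetical names I made up for illustration; the real logic is split between the _make_inplace decomposition in Python and FunctionalTensorWrapper::replace_ in C++):

import torch

class FakeFunctionalWrapper:
    # Hypothetical stand-in for FunctionalTensorWrapper: holds the
    # underlying "real" tensor.
    def __init__(self, value):
        self.value = value

    def replace_(self, new_value):
        # Analogue of FunctionalTensorWrapper::replace_: the mutation
        # becomes a swap of the wrapped tensor, not an in-place write.
        self.value = new_value

def functionalized_mul_(a, b):
    out = torch.mul(a.value, b.value)  # out-of-place equivalent
    a.replace_(out)                    # wrapper now points at the result
    return a                           # callers observe the wrapper, not
                                       # the decomposition's return value

a = FakeFunctionalWrapper(torch.tensor([1.0, 2.0, 3.0]))
b = FakeFunctionalWrapper(torch.tensor([1.0, 2.0, 3.0]))
functionalized_mul_(a, b)
print(a.value)  # tensor([1., 4., 9.])

If this model is right, it matches what I observe: the value that ends up visible comes from replace_ in C++, not from whatever the Python-level decomposition returns.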
I found a PR related to the Python-level implementation:

and a PR related to the C++-level implementation:

I guess that any backend relying on functionalization will run into this problem with in-place ops.

System info:

PyTorch version: 2.2.2
torch_xla version: 2.3.0
Python version: 3.10.13

Reproducible code:

import torch
import torch_xla.core.xla_model as xm

dev = xm.xla_device()

tensor = torch.tensor([1.0, 2.0, 3.0])

xla_tensor_a = tensor.to(dev)
xla_tensor_b = tensor.to(dev)

# the in-place op below is what goes through functionalization on XLA
xla_tensor_a.mul_(xla_tensor_b)

You can check the timeline, or add some logging in the Python-level and C++-level implementations. Here is the timeline of the mul_ op:

[screenshot: timeline of the mul_ op]
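The same rewrite can also be observed on CPU without an XLA device, using the public torch.func.functionalize API together with make_fx (a sketch to show the decomposition only; it does not go through the XLA path):

import torch
from torch.func import functionalize
from torch.fx.experimental.proxy_tensor import make_fx

def f(a, b):
    a.mul_(b)
    return a

a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([1.0, 2.0, 3.0])

# The traced graph shows mul_ rewritten into an out-of-place mul
# followed by a copy back into the input buffer.
print(make_fx(functionalize(f))(a, b).code)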

I want to know: is there a problem with this implementation? Or can someone explain why PyTorch's functionalization does this? Thank you!

cc @bdhirsh @ezyang
