
Conversation

@oliverholworthy
Contributor

Support serving in container without tensorflow, torch, or cudf.

tensorflow/torch

The Python packages tensorflow and torch are not required to be installed to serve a Systems Ensemble with Triton, since we're using the TensorFlow and PyTorch Triton backends, which don't require the Python packages.

cudf

Adding a check for cudf to convert_format enables us to run an NVTabular workflow in an environment without cudf installed.
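The kind of check described above is typically an optional import that degrades gracefully; a minimal sketch (the helper name `is_cudf_dataframe` is illustrative, not the actual convert_format code):

```python
# Minimal sketch of an optional cudf dependency: import it if available,
# otherwise fall back to None and branch on that at call sites.
try:
    import cudf
except ImportError:
    cudf = None


def is_cudf_dataframe(df):
    """Return True only when cudf is importable and df is a cudf DataFrame."""
    return cudf is not None and isinstance(df, cudf.DataFrame)
```

With this pattern, code paths that need cudf can be skipped in CPU-only environments instead of failing at import time.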

@oliverholworthy oliverholworthy added the chore Maintenance for the repository label Apr 13, 2023
@oliverholworthy oliverholworthy added this to the Merlin 23.04 milestone Apr 13, 2023
@oliverholworthy oliverholworthy self-assigned this Apr 13, 2023
@github-actions

Documentation preview

https://nvidia-merlin.github.io/systems/review/pr-326

try:
    import torch
except ImportError:
    torch = None
Contributor

These should probably import from merlin.core.compat.torch and merlin.core.compat.tensorflow
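The suggested compat modules wrap this same try/except centrally; a sketch of what using them might look like (the exact import paths and the fallback are assumptions so the sketch runs without merlin-core installed):

```python
# Sketch: take the (possibly-None) framework handles from merlin-core's
# compat modules instead of repeating a local try/except in each file.
# The fallback is only for environments without merlin-core; the real
# compat modules already return None when the framework is missing.
try:
    from merlin.core.compat import tensorflow, torch
except ImportError:
    tensorflow = torch = None

HAS_TF = tensorflow is not None
HAS_TORCH = torch is not None
```

Centralizing the guard keeps the "framework or None" convention consistent across the codebase.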

Contributor Author

Updated. There is a side effect in merlin.core.compat.tensorflow that does more than import tensorflow: the configure_tensorflow function, which I'm not sure about. It may be worth making the device configuration something that is called explicitly instead of running as a side effect of importing that module.

Collaborator

There are arguments for both sides here. We have had customer complaints before about TF reserving more GPU memory than required. The issue manifests itself as an OOM error, and it takes a while to investigate given the back and forth required with the customer. We would then have to walk them through using the configure_tensorflow method to clamp down TF's GPU memory reservation. That ended up confusing customers, so we wanted to handle it for the user.
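For context, the clamping being described is usually done with TensorFlow's standard device-configuration calls; a hedged sketch, guarded so it is a no-op when tensorflow or a GPU is unavailable (the helper name is illustrative):

```python
# Sketch of clamping TF GPU memory reservation using tf.config, guarded so
# it degrades gracefully when tensorflow is missing or no GPU is visible.
def limit_tf_gpu_memory(limit_mb=1024):
    """Cap TF's per-GPU memory; returns False if it couldn't be applied."""
    try:
        import tensorflow as tf
    except ImportError:
        return False
    try:
        for gpu in tf.config.list_physical_devices("GPU"):
            tf.config.set_logical_device_configuration(
                gpu,
                [tf.config.LogicalDeviceConfiguration(memory_limit=limit_mb)],
            )
    except RuntimeError:
        # GPUs already initialized; configuration can no longer be changed.
        return False
    return True
```

Note this must run before TensorFlow initializes its GPUs, which is exactly why import-time configuration is tempting.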

Contributor

We used to do it explicitly, but it was a giant pain to remember to do it everywhere, and if you didn't remember to do it before importing tensorflow, things would break. I'm honestly not sure that doing it this way is entirely better, because you can't really set the parameters anymore, but 🤷🏻 six of one, half a dozen of the other.

@karlhigley karlhigley requested a review from jperez999 April 13, 2023 14:37
@oliverholworthy
Contributor Author

It would be nice to have a test that checks this. One option would be to configure a CI workflow that runs Triton as a service container, so that we can test with a more minimal Triton Docker image that doesn't have tensorflow, torch, or cudf installed.
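One cheap guard such a workflow could run inside the minimal image is a check that the heavy frameworks really are absent, so a regression that reintroduces a hard dependency fails fast. A sketch (hypothetical test helper, not an existing one in the repo):

```python
# Sketch of a guard for a minimal serving image: confirm the heavy
# frameworks are not installed before exercising the serving path.
import importlib.util


def frameworks_absent(names=("tensorflow", "torch", "cudf")):
    """Return True when none of the named packages are importable."""
    return all(importlib.util.find_spec(name) is None for name in names)
```

In the minimal-image CI job this would be asserted True before running the ensemble-serving tests.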

@karlhigley karlhigley merged commit da739e2 into NVIDIA-Merlin:main Apr 13, 2023