Support serving in container without tensorflow, torch, or cudf
#326
Conversation
merlin/systems/dag/ops/pytorch.py
Outdated
    try:
        import torch
    except ImportError:
        torch = None
These should probably import from merlin.core.compat.torch and merlin.core.compat.tensorflow
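A minimal sketch of the suggested change, assuming the merlin.core.compat modules re-export each framework as None when the package is not installed:

    from merlin.core.compat.tensorflow import tensorflow as tf
    from merlin.core.compat.torch import torch

    # Both names are None in a serving container without the frameworks,
    # so operator code can guard on them instead of wrapping imports in
    # try/except blocks at every call site.
    if torch is not None:
        ...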
Updated. There is a side effect in merlin.core.compat.tensorflow that does more than import tensorflow, which I'm not sure about: the configure_tensorflow function. It may be worth making the device configuration something that is called explicitly instead of something that runs as a result of importing that module.
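A hypothetical sketch of the explicit style being proposed here (whether configure_tensorflow would keep this exact name and signature is an assumption):

    from merlin.core.compat.tensorflow import configure_tensorflow, tensorflow as tf

    if tf is not None:
        # Called by the application at startup, rather than running as a
        # side effect of the import above; an explicit call site would also
        # leave room to pass configuration parameters.
        configure_tensorflow()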
There are arguments for both sides here. We have had customer complaints before about TF reserving more GPU memory than required. The issue manifests as an OOM error, which takes a while to investigate given the back and forth required with the customer. We then have to walk them through the whole explanation that they need to use the configure_tensorflow method to clamp down TF's GPU memory reservation. That ends up confusing the customer, so we wanted to handle it for the user.
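For context, a sketch of the kind of clamping being described, using TensorFlow's public device-configuration API (that configure_tensorflow uses exactly these calls is an assumption):

    import tensorflow as tf

    # By default TF reserves nearly all free GPU memory at startup; enabling
    # memory growth makes it allocate on demand instead, which is the behavior
    # customers were previously being asked to opt into manually.
    for gpu in tf.config.list_physical_devices("GPU"):
        tf.config.experimental.set_memory_growth(gpu, True)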
We used to do it explicitly, but it was a giant pain to remember to do it everywhere, and if you didn't remember to do it before importing tensorflow then things would break. I'm honestly not sure that doing it this way is entirely better, because you can't really set the parameters anymore, but 🤷🏻 six of one, half dozen of the other.
It would be nice to have a test that checks this. One option would involve configuring a workflow that uses Triton as a service container, so that we can test with a more minimal Triton docker image that doesn't have tensorflow/torch or cudf installed.
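A hypothetical sketch of what such a test might look like inside a minimal image (the skip guard and test name are illustrative, not part of this PR):

    import importlib.util

    import pytest

    @pytest.mark.skipif(
        any(importlib.util.find_spec(pkg) is not None for pkg in ("tensorflow", "torch", "cudf")),
        reason="only meaningful in a minimal serving image",
    )
    def test_serving_imports_without_frameworks():
        # Importing the serving operators should succeed even when the
        # optional frameworks are absent from the environment.
        import merlin.systems.dag.ops.pytorch  # noqa: F401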
Support serving in container without tensorflow, torch, or cudf.

tensorflow/torch

The python packages tensorflow and torch are not required to be installed to serve a Systems Ensemble with Triton, since we're using the tensorflow and pytorch Triton backends, which don't require the python packages.

cudf

Adding a check for cudf to convert_format enables us to run an NVTabular workflow in an environment without cudf installed.
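A minimal sketch of the kind of check described (the helper shape is an assumption; the real convert_format logic is more involved):

    from merlin.core.compat import cudf  # None when cudf is not installed

    def _supports_gpu_dataframes():
        # Hypothetical guard illustrating the check added to convert_format:
        # only consider the cudf/GPU representation when cudf is importable,
        # so NVTabular workflows can run in a CPU-only container.
        return cudf is not None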