Update with compiling optimization and DT_UINT32 support for HDF5 #379

yongtang merged 3 commits into tensorflow:master
Conversation
For compilation optimization flags, the default (`-march=native`) optimizes the generated code for your machine's CPU type ([see here](https://www.tensorflow.org/install/source#configuration_options)).
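For reference, a minimal sketch of passing the flag explicitly when building from source with bazel; the target pattern `//tensorflow_io/...` is illustrative, and the exact configure/build steps may differ per release:

```sh
# -march=native is typically already the default; shown explicitly here.
bazel build --copt=-march=native //tensorflow_io/...
```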
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed (or fixed any issues), please reply here.

CLAs look good, thanks!

I signed it!
yongtang left a comment
LGTM. Thanks for the fix!
We already had some discussion about batch size and the overall column-based data (e.g., Parquet, Feather, HDF5) pipeline in #366 (comment).
Previously, batching was a limitation of the overall tf.data.Dataset pipeline, which generates each record one by one. This is not an issue for large records such as image files, but it really slows everything down when each record is, say, one integer or one float32.
We added a batch concept in tensorflow-io to speed this up, but we were reusing the same batch as tf.keras, which actually has a different concept (the number of samples).
My way of thinking is that we may want to:
- read and process as much as possible in one chunk of big memory, if not the whole file, for each "batch process" in the tf.data pipeline;
- then rebatch() to align with tf.keras' batch if needed (see the sketch below).
That likely needs some changes in the overall tf.data pipeline (or moving much of the logic out of the tf.data pipeline). With TF 2.0, I think the effort will be smaller.
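To make the idea concrete, here is a minimal sketch (not tensorflow-io's actual implementation) of reading in large chunks and then rebatching to the tf.keras sample count, using the stock TF 2.0 tf.data API; the chunk shape and sizes are placeholders:

```python
import tensorflow as tf

# Stand-in for a chunked reader: each element is one large chunk of
# 100 records rather than a single record (values are placeholders).
chunks = tf.data.Dataset.from_tensor_slices(
    tf.reshape(tf.range(1000, dtype=tf.uint32), [10, 100]))

# Flatten the chunks back into individual records, then rebatch to
# the tf.keras batch size (number of samples) the model expects.
records = chunks.unbatch()
keras_batches = records.batch(32)

for batch in keras_batches.take(1):
    print(batch.shape)  # (32,)
```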
/cc @BryanCutler
Thanks for your review and detailed reply!
Update with compiling optimization and DT_UINT32 support for HDF5 (tensorflow#379)

* Update README.md
* Update with compiling optimization: for compilation optimization flags, the default (-march=native) optimizes the generated code for your machine's CPU type. [see here](https://www.tensorflow.org/install/source#configuration_options)
* add DT_UINT32 support
Compiling optimization achieves more than a 50% speedup when reading compressed HDF5 files, and even more with a large batch_size. In addition, DT_UINT32 is now supported.
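For illustration, a hedged sketch of reading a uint32 HDF5 dataset with tensorflow-io; the HDF5Dataset constructor arguments shown here (file name, dataset path, chunk batch size) are assumptions and may differ across tensorflow-io releases:

```python
import tensorflow_io.hdf5 as hdf5_io  # assumed module path

# Hypothetical file: "data.h5" containing a uint32 dataset at "/dset1".
# `batch` here would be the tensorflow-io chunk size discussed above,
# not the tf.keras batch size; the exact signature is an assumption.
dataset = hdf5_io.HDF5Dataset("data.h5", "/dset1", batch=1024)

for chunk in dataset:
    print(chunk.dtype)  # expected: uint32
```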