Add read_avro and list_avro_columns for rework on Splittable Avro support by yongtang · Pull Request #399 · tensorflow/io

yongtang · 2019-07-31T04:55:37Z

This PR is part of the effort to rework Dataset so that large files are read into Tensors first, to speed up performance. See #382 and #366 for related discussions.

Summary:

1) read_avro can read an Avro file within the range [offset, offset+length] (splittable).
2) The primitive read_avro C++ op reads the file in big chunks, which are then wired up with tf.data.Dataset.
3) read_avro could be used in other places.
4) AvroDataset automatically finds out the dtype in eager mode; in graph mode, the user has to specify the dtype in kwargs.
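
To illustrate what "splittable" means in item 1, here is a minimal sketch of how a file could be divided into (offset, length) ranges that are then read independently. The helper name `split_ranges` and the sizes used are hypothetical and are not part of this PR. The sketch only computes the ranges; it does not call read_avro, and it makes no claim about how read_avro handles a range boundary that falls in the middle of an Avro block.

```python
from typing import Iterator, Tuple


def split_ranges(file_size: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (offset, length) pairs that cover [0, file_size) without overlap.

    Hypothetical helper for illustration only; not part of tensorflow-io.
    """
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for offset in range(0, file_size, chunk_size):
        # The last chunk may be shorter than chunk_size.
        yield offset, min(chunk_size, file_size - offset)


# Example: a 1000-byte file split into chunks of at most 400 bytes.
ranges = list(split_ranges(file_size=1000, chunk_size=400))
for offset, length in ranges:
    print(f"offset={offset} length={length}")
```

Each pair could then be handed to a range-based reader, so that chunks are read in parallel or lazily rather than loading the whole file at once.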

Signed-off-by: Yong Tang [email protected]


yongtang · 2019-08-04T15:49:05Z

I also plan to merge this PR, as it exposes a primitive op (read_avro) which could be more useful than the dataset (unless it is passed directly to tf.keras).


yongtang force-pushed the avro branch from 98bfe0f to 1de1d12 Compare July 31, 2019 06:16

yongtang mentioned this pull request Jul 31, 2019

Discuss Batch Standards in TFIO with Keras #382

Open

yongtang force-pushed the avro branch from 1de1d12 to aada0a5 Compare August 4, 2019 00:41

yongtang merged commit 77ee1da into tensorflow:master Aug 4, 2019

yongtang deleted the avro branch August 4, 2019 20:11

yongtang merged 1 commit into tensorflow:master from yongtang:avro
