feat: Add ability to read and process complex/nested data types from a parquet file in Parquet Data Source: enums, repeated enums, message, repeated message, repeated primitives #100

Dagger has been processing real-time Kafka streams for years. With parquet file processing, we aim to add the capability of performing Dagger operations over historical data, making Dagger a complete solution for data processing from historical to real-time.

As part of this feature, we want to extend <a href="https://github.com/odpf/dagger/issues/99">#99</a> and add the capability to read repeated types (repeated primitives, repeated enums and repeated simple groups) and some complex types (enums and nested simple groups).

All the current features of Dagger, like transformers and UDFs, continue to work on the data. Downstream components need not know what kind of source produced the data.

**Acceptance criteria**

GIVEN | WHEN | THEN
-- | -- | --
Dagger job is created | Data source is selected as parquet. One or more parquet files are provided as input. The parquet file has a parent simple group which contains one or more enum fields. | Dagger should process the data from the local parquet file instead and then exit gracefully. The enum fields should be added into a Flink row. A suitable default value should be used when data is not present in the parquet file but is present in the schema.
Dagger job is created | Data source is selected as parquet. One or more parquet files are provided as input. The parquet file has a parent simple group which contains one or more repeated enum fields (array of enums). | Dagger should process the data from the local parquet file instead and then exit gracefully. Each list of enums should be added into a Flink row as a list. A suitable default value should be used when data is not present in the parquet file but is present in the schema.
Dagger job is created | Data source is selected as parquet. One or more parquet files are provided as input. The parquet file has a parent simple group which contains one or more nested simple groups (i.e., a simple group within another simple group). | Dagger should process the data from the local parquet file instead and then exit gracefully. The nested simple groups should be parsed into nested Flink rows. A suitable default value should be used when data is not present in the parquet file but is present in the schema.
Dagger job is created | Data source is selected as parquet. One or more parquet files are provided as input. The parquet file has a parent simple group which contains one or more fields of type repeated simple group (i.e., array of simple groups). | Dagger should process the data from the local parquet file instead and then exit gracefully. The repeated simple groups should be parsed into an array of Flink rows. A suitable default value should be used when data is not present in the parquet file but is present in the schema.
Dagger job is created | Data source is selected as parquet. One or more parquet files are provided as input. The parquet file has a parent simple group which contains one or more fields of type repeated primitive (i.e., array of parquet primitive types like int64, boolean, etc.). | Dagger should process the data from the local parquet file instead and then exit gracefully. Repeated primitive types should be parsed into an array of Flink rows. A suitable default value should be used when data is not present in the parquet file but is present in the schema.
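
The criteria above can be illustrated with a minimal sketch in plain Java. It is not Dagger's implementation: a `Map` stands in for a parquet simple group, an `Object[]` stands in for a Flink row, and the names `ParquetRowSketch`, `Kind`, `Field` and `toRow` are hypothetical. The defaults used here (empty string, empty array, nested row of defaults) are also assumptions, since this issue only asks for a "suitable default value".

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: Map stands in for a parquet simple group, Object[] for a Flink row.
final class ParquetRowSketch {

    // Hypothetical field kinds, one per acceptance criterion.
    enum Kind { ENUM, REPEATED_ENUM, MESSAGE, REPEATED_MESSAGE, REPEATED_INT64 }

    // Hypothetical schema entry; children are only used for (repeated) messages.
    static final class Field {
        final String name;
        final Kind kind;
        final List<Field> children;

        Field(String name, Kind kind, Field... children) {
            this.name = name;
            this.kind = kind;
            this.children = Arrays.asList(children);
        }
    }

    // Builds one row from the schema, substituting an assumed default
    // whenever a field is in the schema but missing from the data.
    @SuppressWarnings("unchecked")
    static Object[] toRow(List<Field> schema, Map<String, Object> group) {
        Object[] row = new Object[schema.size()];
        for (int i = 0; i < schema.size(); i++) {
            Field field = schema.get(i);
            Object value = (group == null) ? null : group.get(field.name);
            switch (field.kind) {
                case ENUM: // enum kept as its string name; "" is an assumed default
                    row[i] = (value == null) ? "" : value.toString();
                    break;
                case REPEATED_ENUM: // list of enums becomes a String[]
                    row[i] = (value == null)
                            ? new String[0]
                            : ((List<Object>) value).stream()
                                    .map(Object::toString).toArray(String[]::new);
                    break;
                case MESSAGE: // nested group becomes a nested row (of defaults if absent)
                    row[i] = toRow(field.children, (Map<String, Object>) value);
                    break;
                case REPEATED_MESSAGE: // list of groups becomes an array of rows
                    row[i] = (value == null)
                            ? new Object[0][]
                            : ((List<Map<String, Object>>) value).stream()
                                    .map(child -> toRow(field.children, child))
                                    .toArray(Object[][]::new);
                    break;
                case REPEATED_INT64: // list of int64 becomes a long[]
                    row[i] = (value == null)
                            ? new long[0]
                            : ((List<Long>) value).stream()
                                    .mapToLong(Long::longValue).toArray();
                    break;
                default:
                    throw new IllegalArgumentException("Unsupported kind: " + field.kind);
            }
        }
        return row;
    }

    public static void main(String[] args) {
        List<Field> schema = Arrays.asList(
                new Field("status", Kind.ENUM),
                new Field("tags", Kind.REPEATED_ENUM),
                new Field("address", Kind.MESSAGE, new Field("type", Kind.ENUM)),
                new Field("items", Kind.REPEATED_MESSAGE, new Field("type", Kind.ENUM)),
                new Field("ids", Kind.REPEATED_INT64));

        // Only two of the five schema fields are present in the data.
        Map<String, Object> record = new HashMap<>();
        record.put("status", "ACTIVE");
        record.put("ids", Arrays.asList(1L, 2L));

        System.out.println(Arrays.deepToString(toRow(schema, record)));
    }
}
```

In this sketch a missing nested group still produces a nested row filled with defaults, so the row shape always matches the schema. Whether Dagger should do the same or emit null for absent groups is a design choice this issue leaves open.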

**Out of scope**

* Struct

* Repeated Struct

* Maps

* Timestamp of type SimpleGroup
