Arrow2 test fix by Igosuki · Pull Request #1733 · apache/datafusion

Igosuki · 2022-02-02T20:34:47Z

Which issue does this PR close?

None

Rationale for this change

Stop depending on the arrow2 git repository and use a released version instead.

What changes are included in this PR?

Use arrow2 0.9 instead of master, integrate the latest datafusion, and make a few cosmetic changes to ease future merges.

Are there any user-facing changes?

No

…ry consumers (apache#1691) * Memory manager no longer track consumers, update aggregatedMetricsSet * Easy memory tracking with metrics * use tracking metrics in SPMS * tests * fix * doc * Update datafusion/src/physical_plan/sorts/sort.rs Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * make tracker AtomicUsize Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* Add TableProvider impl for DataFrameImpl * Add physical plan in * Clean up plan construction and names construction * Remove duplicate comments * Remove unused parameter * Add test * Remove duplicate limit comment * Use cloned instead of individual clone * Reduce the amount of code to get a schema Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * Add comments to test * Fix plan comparison * Compare only the results of execution * Remove println * Refer to df_impl instead of table in test Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * Fix the register_table test to use the correct result set for comparison * Consolidate group/agg exprs * Format * Remove outdated comment Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* feat: implement TDigest for approx quantile
  Adds a [TDigest] implementation providing approximate quantile estimations of large inputs using a small amount of (bounded) memory. A TDigest is most accurate near either "end" of the quantile range (that is, 0.1, 0.9, 0.95, etc) due to the use of a scaling function that increases resolution at the tails. The paper claims single digit part per million errors for q ≤ 0.001 or q ≥ 0.999 using 100 centroids, and in practice I have found accuracy to be more than acceptable for an approximate function across the entire quantile range. The implementation is a modified copy of https://github.com/MnO2/t-digest, itself a Rust port of [Facebook's C++ implementation]. Both Facebook's implementation and MnO2's Rust port are Apache 2.0 licensed.
  [TDigest]: https://arxiv.org/abs/1902.04023
  [Facebook's C++ implementation]: https://github.com/facebook/folly/blob/main/folly/stats/TDigest.h
* feat: approx_quantile aggregation
  Adds the ApproxQuantile physical expression, plumbing & test cases. The function signature is `approx_quantile(column, quantile)`, where column can be any numeric type (that can be cast to a float64) and quantile is a float64 literal between 0 and 1.
* feat: approx_quantile dataframe function
  Adds the approx_quantile() dataframe function, and exports it in the prelude.
* refactor: ballista approx_quantile support
  Adds ballista wire encoding for approx_quantile. Adding support for this required modifying the AggregateExprNode proto message to support propagating multiple LogicalExprNode aggregate arguments. All the existing aggregations take a single argument, so this wasn't needed before. This commit adds "repeated" to the expr field, which I believe is backwards compatible as described here: https://developers.google.com/protocol-buffers/docs/proto3#updating Specifically, adding "repeated" to an existing message field: "For ... message fields, optional is compatible with repeated". No existing tests needed fixing, and a new roundtrip test is included that covers the change to allow multiple expr.
* refactor: use input type as return type
  Casts the calculated quantile value to the same type as the input data.
* fixup! refactor: ballista approx_quantile support
* refactor: rebase onto main
* refactor: validate quantile value
  Ensures the quantile value is between 0 and 1, emitting a plan error if not.
* refactor: rename to approx_percentile_cont
* refactor: clippy lints
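The quantile validation described in the commit message above could look roughly like the sketch below. It is not the DataFusion implementation: `validate_quantile` and `exact_quantile` are hypothetical names, the error is a plain `String` rather than a plan error, and the quantile is computed exactly by sorting (nearest rank) instead of with a TDigest. It only illustrates the "reject values outside 0 to 1" rule and gives an exact baseline that an approximate result could be compared against.

```rust
/// Hypothetical helper (not a DataFusion function): accepts a quantile only
/// if it lies in the closed range [0, 1]. NaN is rejected as well, because
/// a range never contains NaN.
fn validate_quantile(q: f64) -> Result<f64, String> {
    if (0.0..=1.0).contains(&q) {
        Ok(q)
    } else {
        Err(format!("quantile must be between 0 and 1, got {}", q))
    }
}

/// Hypothetical exact baseline: sorts a copy of the input and picks the
/// nearest-rank element. This is NOT a TDigest; it uses memory proportional
/// to the input size.
fn exact_quantile(values: &[f64], q: f64) -> Result<Option<f64>, String> {
    let q = validate_quantile(q)?;
    if values.is_empty() {
        return Ok(None);
    }
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    // Map q onto an index in 0..=len-1 and round to the nearest rank.
    let idx = ((sorted.len() - 1) as f64 * q).round() as usize;
    Ok(Some(sorted[idx]))
}
```

Calling `exact_quantile(&[5.0, 1.0, 3.0, 2.0, 4.0], 0.5)` selects the middle element of the sorted input, while a quantile such as `1.5` is rejected before any work is done.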

houqp

Left one minor comment, the rest looks good to me. Thanks @Igosuki for this work. Sorry that I wasn't able to help with the migration last weekend because something urgent came up :(

datafusion/src/physical_plan/expressions/cast.rs

Igosuki · 2022-02-03T12:36:36Z

Fixed the parquet tests as well as some of the decimal tests. The rest of the failures are due to weird behavior when casting decimals...

houqp · 2022-02-06T03:32:11Z

The float-to-decimal failure is caused by an upstream bug; I will send a PR to fix it shortly.

alamb · 2022-02-08T21:17:11Z

I don't see any reason not to merge this into the arrow2 branch! ✅

xudong963 and others added 28 commits January 27, 2022 10:17

feat: add join type for logical plan display (apache#1674)

7b8d72c

(minor) Reduce memory manager and disk manager logs from info! to `…

18ced8d

…debug!` (apache#1689)

Move information_schema tests out of execution/context.rs to `sql_i…

ed1de63

…ntegration` tests (apache#1684) * Move tests from context.rs to information_schema.rs * Fix up tests to compile

Move timestamp related tests out of context.rs and into sql integrati…

ab145c8

…on test (apache#1696) * Move some tests out of context.rs and into sql * Move support test out of context.rs and into sql tests * Fixup tests and make them compile

Fix parquet projection

39632dd

fix pruning casting

a34213e

fix test based on debug strings

530f4f4

revert read_spill method by getting schema from file

b95044e

refine test in repartition.rs & coalesce_batches.rs (apache#1707)

75c7578

Fuzz test for spillable sort (apache#1706)

a7f0156

Lazy TempDir creation in DiskManager (apache#1695)

fecce97

Incorporate dyn scalar kernels (apache#1685)

3494e9c

* Rebase * impl ToNumeric for ScalarValue * Update macro to be based on * Add floats * Cleanup * Newline

add annotation for select_to_plan (apache#1714)

2512608

Support create_physical_expr and ExecutionContextState or `Defaul…

1caf52a

…tPhysicalPlanner` for faster speed (apache#1700) * Change physical_expr creation API * Refactor API usage to avoid creating ExecutionContextState * Fixup ballista * clippy!

Fix can not load parquet table form spark in datafusion-cli. (apache#…

f849968

…1665) * fix can not load parquet table form spark * add Invalid file in log. * fix fmt

add upper bound for pub fn (apache#1713)

d01d8d5

Signed-off-by: remzi <13716567376yh@gmail.com>

Create SchemaAdapter trait to map table schema to file schemas (apach…

7bec762

…e#1709) * Create SchemaAdapter trait to map table schema to file schemas * Linting fix * Remove commented code

suppport bitwise and as an example (apache#1653)

940d4eb

* suppport bitwise and as an example * Use $OP in macro rather than `&` * fix: change signature to &dyn Array * fmt Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
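The "Use $OP in macro rather than `&`" step in the commit above refers to passing the operator to a macro as a token. A minimal sketch of that general technique follows; it assumes nothing about the actual DataFusion macro, and the names `bitwise_kernel`, `bitwise_and` and `bitwise_or` are hypothetical. Plain slices of `Option<i64>` stand in for nullable arrays.

```rust
/// Hypothetical macro: generates an element-wise kernel over two nullable
/// i64 slices, with the binary operator supplied as a token (`$op`).
macro_rules! bitwise_kernel {
    ($name:ident, $op:tt) => {
        fn $name(left: &[Option<i64>], right: &[Option<i64>]) -> Vec<Option<i64>> {
            left.iter()
                .zip(right.iter())
                .map(|(l, r)| match (l, r) {
                    // Both sides present: apply the operator.
                    (Some(l), Some(r)) => Some(*l $op *r),
                    // Either side null: the result is null.
                    _ => None,
                })
                .collect()
        }
    };
}

// One macro, several operators; no per-operator copy of the loop.
bitwise_kernel!(bitwise_and, &);
bitwise_kernel!(bitwise_or, |);
```

Because the operator is a parameter, adding another bitwise operation is one more macro invocation rather than another hand-written function.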

fix: substr - correct behaivour with negative start pos (apache#1660)

b6ace16

minor: fix cargo run --release error (apache#1723)

bacf10d

Convert boolean case expressions to boolean logic (apache#1719)

b9a8f15

* Convert boolean case expressions to boolean logic * Review feedback
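One plausible form of such a rewrite turns `CASE WHEN c THEN a ELSE b END` over booleans into `(c AND a) OR (NOT c AND b)`. The sketch below only checks that identity for plain two-valued Rust `bool`s; it ignores SQL NULL semantics, and the function names are hypothetical rather than taken from DataFusion.

```rust
/// The CASE form: pick `then_val` when `cond` holds, otherwise `else_val`.
fn case_expr(cond: bool, then_val: bool, else_val: bool) -> bool {
    if cond { then_val } else { else_val }
}

/// The same selection written with boolean operators only.
fn as_boolean_logic(cond: bool, then_val: bool, else_val: bool) -> bool {
    (cond && then_val) || (!cond && else_val)
}

/// Exhaustively compares both forms over all eight input combinations.
fn rewrite_is_equivalent() -> bool {
    let values = [false, true];
    values.iter().all(|&c| {
        values.iter().all(|&a| {
            values
                .iter()
                .all(|&b| case_expr(c, a, b) == as_boolean_logic(c, a, b))
        })
    })
}
```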

substitute parking_lot::Mutex for std::sync::Mutex (apache#1720)

46879f1

* Substitute parking_lot::Mutex for std::sync::Mutex * enable parking_lot feature in tokio

Add Expression Simplification API (apache#1717)

e4a056f

* Add Expression Simplification API * fmt

Use arrow2 0.9 and pull master from Feb 2 2022

469731b

use from_slice(&[T]) instead of from_slice(Vec<T>) to prevent future …

5ad5f7c

…merge conflicts
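As a rough illustration of the call-site change named in the commit title above, the sketch below uses a hypothetical `Int32Column` type (not the arrow2 array API) whose `from_slice` accepts anything that can be viewed as a slice. Under that assumption both `from_slice(vec![..])` and `from_slice(&[..])` compile, and the borrowed form avoids allocating a `Vec` that is immediately copied.

```rust
/// Hypothetical stand-in for an array type; this is not arrow2's API.
#[derive(Debug, PartialEq)]
struct Int32Column {
    values: Vec<i32>,
}

impl Int32Column {
    /// Copies the borrowed values into the column. Accepting `AsRef<[i32]>`
    /// lets callers pass `&[i32]`, `&[i32; N]` or `Vec<i32>`.
    fn from_slice<P: AsRef<[i32]>>(values: P) -> Self {
        Self {
            values: values.as_ref().to_vec(),
        }
    }
}

/// Both spellings build the same column; the first needs no temporary Vec.
fn build_both_ways() -> (Int32Column, Int32Column) {
    let from_borrowed = Int32Column::from_slice(&[1, 2, 3]);
    let from_vec = Int32Column::from_slice(vec![1, 2, 3]);
    (from_borrowed, from_vec)
}
```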

github-actions bot added ballista labels Feb 2, 2022

github-actions bot added the sql SQL Planner label Feb 2, 2022

houqp approved these changes Feb 3, 2022

fix decimal add because arrow2 doesn't include decimal add in arithme…

b8f9bc2

…tics::add

Igosuki force-pushed the arrow2_test_fix branch from 1577d92 to b8f9bc2 on February 3, 2022 09:31

Igosuki added 2 commits February 3, 2022 10:53

fix decimal scale for cast test

80078b5

fix parquet file format adapted projection by providing the proper sc…

f2debbb

…hema to the RecordBatch

Igosuki force-pushed the arrow2_test_fix branch from 8ddbc47 to f2debbb on February 3, 2022 18:22

houqp mentioned this pull request Feb 6, 2022

Fixed float to i128 cast jorgecarleitao/arrow2#817

Merged

alamb merged commit 83f937a into apache:arrow2 Feb 8, 2022

alamb merged 31 commits into apache:arrow2 from Igosuki:arrow2_test_fix

14 participants
