Conversation
|
cc @andygrove @alamb @yjshen |
| "+--------------+-----------------------+", | ||
| "| AVG(test.c1) | SUM(DISTINCT test.c2) |", | ||
| "+--------------+-----------------------+", | ||
| "| 1.75 | 3 |", |
| (ScalarValue::Int64(lhs), ScalarValue::Int8(rhs)) => { | ||
| typed_sum!(lhs, rhs, Int64, i64) | ||
| } | ||
| (ScalarValue::Int64(lhs), ScalarValue::UInt64(rhs)) => { |
There was a problem hiding this comment.
This looks like a fine change in this PR -- though it is strange to me that we have to be doing these casts in sum.rs as it duplicates some non trivial amount of the logic in coercion -- maybe it would be possible to make this code cleaner / consolidate more of the coercion logic.
Again, no changes needed for this PR but I figured I would point it out while reading this
There was a problem hiding this comment.
Thanks @alamb
I'd look into this coercion logic in some follow up pr
There was a problem hiding this comment.
Coercion u64 to i64 seems irrational to me. Why do we need this kind of coercion in sum distinct?
There was a problem hiding this comment.
I think @WinkerDu is just following the same pre-existing pattern with this PR
I agree that the pre-existing pattern doesn't make sense to me (it should have already been done by the time the distinct aggregate code is being executed)
|
|
||
| fn update(&mut self, values: &[ScalarValue]) -> Result<()> { | ||
| values.iter().for_each(|v| { | ||
| // If the value is NULL, it is not included in the final sum. |
| generic_test_sum_distinct!( | ||
| array, | ||
| DataType::Int32, | ||
| ScalarValue::from(6i64), |
| generic_test_sum_distinct!( | ||
| array, | ||
| DataType::UInt32, | ||
| ScalarValue::from(4i64), |
|
cc @andygrove @tustvold @Dandandan |
|
Thanks @WinkerDu for the code and @yjshen @Ted-Jiang and @xudong963 for the reviews. It will be nice to clean up some of the coercion logic -- @WinkerDu let me know if I should file a ticket to track that work |
Which issue does this PR close?
Closes #2404 .
Rationale for this change
For now, DataFusion hasn't support
sum(distinct)yet. Though optimizerSingleDistinctToGroupBy(#1315) supports single distinct usage, there are more SQL scenes have not been covered.For example, runs the following unit test,
then error raises like
What changes are included in this PR?
Introduces
expressions::DistinctSuminto DFHashSetto record unique numeric list, updateHashSetwhen new item input.ScalarValue::List, which Built fromHashSet.DistinctSumby computing sum from numeric stored inHashSetAre there any user-facing changes?
No.