Conversation
@comphead The AMD64 build issue can be fixed by merging the latest from master.
typed_sum!(lhs, rhs, UInt64, u64)
(DataType::Float32, _) | (_, DataType::Float32) => {
    let data: ArrayRef =
        union_arrays!(lhs, rhs, &DataType::Float32, Float32Array, "f32");
There was a problem hiding this comment.
this is an interesting idea, but I suspect the performance will be fairly low (as it creates arrays for each value 🤔 )
I wonder if we could move the sum logic into scalar.rs and instead add some sort of coercion logic
Not sure.
There was a problem hiding this comment.
Do you mean adding a sum function to the ScalarValue struct? Something like
sum(self, other: ScalarValue) -> ScalarValue
There was a problem hiding this comment.
I went through those PRs. Also, we currently have a negate function at the ScalarValue level:
pub fn arithmetic_negate(&self) -> Self {
We could probably extend this trend and add other common math functions to ScalarValue too, like sum, minus, multiply, divide?
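To make the suggestion concrete, here is a minimal sketch of what a scalar-level `add` could look like. `Scalar` is a toy stand-in for ScalarValue with only two variants, and `add` and `merge` are hypothetical names, not existing DataFusion functions. The null handling (a null operand is skipped, as an accumulator might do) and the choice to reject mismatched types instead of coercing them are assumptions of this sketch.

```rust
// Toy stand-in for ScalarValue; only two variants are modelled here.
#[derive(Debug, Clone, PartialEq)]
enum Scalar {
    Int32(Option<i32>),
    Float64(Option<f64>),
}

impl Scalar {
    // Hypothetical scalar-level addition for same-type operands.
    // Mismatched types are reported as an error rather than coerced.
    // Integer overflow is not handled in this sketch.
    fn add(&self, other: &Scalar) -> Result<Scalar, String> {
        match (self, other) {
            (Scalar::Int32(a), Scalar::Int32(b)) => {
                Ok(Scalar::Int32(merge(*a, *b, |x, y| x + y)))
            }
            (Scalar::Float64(a), Scalar::Float64(b)) => {
                Ok(Scalar::Float64(merge(*a, *b, |x, y| x + y)))
            }
            _ => Err(format!("cannot add {:?} and {:?}", self, other)),
        }
    }
}

// Combines two nullable values: a null operand is skipped (assumed
// semantics), and two nulls produce a null.
fn merge<T>(a: Option<T>, b: Option<T>, f: impl Fn(T, T) -> T) -> Option<T> {
    match (a, b) {
        (Some(x), Some(y)) => Some(f(x, y)),
        (Some(x), None) | (None, Some(x)) => Some(x),
        (None, None) => None,
    }
}
```

With this shape, `Scalar::Int32(Some(1)).add(&Scalar::Int32(Some(2)))` yields `Ok(Scalar::Int32(Some(3)))`, while adding an `Int32` to a `Float64` returns an error. Other operations (minus, multiply, divide) would reuse `merge` with a different closure.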
There was a problem hiding this comment.
I am torn -- basically I worry that adding the common math functions to ScalarValue will result in them being used more (and they are very slow). See more discussion on #1525 (review)
However, if the alternative is a bunch of replicated code over the codebase, consolidating that all into ScalarValue seems like a much better outcome
There was a problem hiding this comment.
I think we had a bunch of discussion on a related topic (how to handle constants and arrays similarly) on #1248
Maybe that will provide some insight
There was a problem hiding this comment.
I came up with an idea like this: it puts the boilerplate in one place and then applies external functions over the values.
use std::ops::Sub;
use std::ops::Add;
use std::fmt::Debug;
use std::any::Any;

#[derive(Debug, Copy, Clone)]
enum DataType {
    Int32(Option<i32>),
    Float64(Option<f64>)
}

// boilerplate comes here
macro_rules! op {
    ($ARG1: expr, $ARG2: expr, $FUNC: block) => {{
        let res = match ($ARG1, $ARG2) {
            (DataType::Int32(Some(v1)), DataType::Float64(Some(v2))) =>
                DataType::Float64(Some($FUNC(v1 as f64, v2 as f64))),
            _ => panic!("123")
        };
        res
    }};
}

fn sum<T:Add<Output = T> + Copy> (num1: T, num2: T) -> T {
    return num1 + num2;
}

fn minus<T:Sub<Output = T> + Copy> (num1: T, num2: T) -> T {
    return num1 - num2;
}

fn main() {
    let i_32 = DataType::Int32(Some(1));
    let f_64 = DataType::Float64(Some(2.0));
    let s = op!(i_32, f_64, {sum});
    dbg!(&s);
    let m = op!(i_32, f_64, {minus});
    dbg!(&m);
}
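The snippet above only handles the (Int32, Float64) pair and panics otherwise. As a rough sketch of how the same idea might scale, the variant below lists every supported type pair once inside the macro and returns `None` for unsupported or null operands. The enum is renamed to `Scalar` here to avoid confusion with arrow's DataType; that name, the `Option` return type, and the set of supported pairs are all assumptions of this sketch.

```rust
use std::ops::{Add, Sub};

// Toy scalar type, renamed from the snippet above for clarity.
#[derive(Debug, Copy, Clone, PartialEq)]
enum Scalar {
    Int32(Option<i32>),
    Float64(Option<f64>),
}

// All type pairs live here once. `$func` names a generic function, so each
// arm instantiates it with the operand type that arm works on.
// Integer overflow is not handled in this sketch.
macro_rules! op {
    ($lhs:expr, $rhs:expr, $func:ident) => {{
        match ($lhs, $rhs) {
            (Scalar::Int32(Some(a)), Scalar::Int32(Some(b))) => {
                Some(Scalar::Int32(Some($func(a, b))))
            }
            (Scalar::Float64(Some(a)), Scalar::Float64(Some(b))) => {
                Some(Scalar::Float64(Some($func(a, b))))
            }
            (Scalar::Int32(Some(a)), Scalar::Float64(Some(b))) => {
                Some(Scalar::Float64(Some($func(f64::from(a), b))))
            }
            (Scalar::Float64(Some(a)), Scalar::Int32(Some(b))) => {
                Some(Scalar::Float64(Some($func(a, f64::from(b)))))
            }
            // Null operands are not handled; callers get `None`.
            _ => None,
        }
    }};
}

fn sum<T: Add<Output = T>>(a: T, b: T) -> T {
    a + b
}

fn minus<T: Sub<Output = T>>(a: T, b: T) -> T {
    a - b
}
```

A macro is used rather than a function taking a closure because a single closure cannot be generic over both `i32` and `f64`, whereas the macro expands `$func` separately in each arm. For example, `op!(Scalar::Int32(Some(1)), Scalar::Float64(Some(2.0)), sum)` evaluates to `Some(Scalar::Float64(Some(3.0)))`.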
There was a problem hiding this comment.
What about just cutting all this coercion logic? I've investigated all the occurrences of sum(); it's only used to accumulate aggregator state in sum, sum_distinct and average, where the operand types of sum() are deterministic. And sum() is an internal function (pub(crate)), so an API change to this function is acceptable.
I tried removing all the match arms with different operand types, and only two cases fail (sum_distinct_i32_with_nulls and sum_distinct_u32_with_nulls). I think this is acceptable. I also found that the min/max calculator already does this.
As for how to apply an operator over different types, I think we can extract our coercion rule into something like
fn coercion(lhs: DataType, rhs: DataType) -> DataType {}
and cast both operands to the result type before the calculation.
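A minimal sketch of the "coerce, cast, then calculate" idea follows. `DataType` and `Value` are toy stand-ins defined locally, not arrow's DataType or DataFusion's ScalarValue, and the coercion rule itself (a float wins, otherwise the wider integer) is an assumption chosen only for illustration.

```rust
// Toy stand-in for a data type tag (not arrow's DataType).
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int32,
    Int64,
    Float64,
}

// Toy non-null scalar value.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Value {
    Int32(i32),
    Int64(i64),
    Float64(f64),
}

impl Value {
    fn data_type(self) -> DataType {
        match self {
            Value::Int32(_) => DataType::Int32,
            Value::Int64(_) => DataType::Int64,
            Value::Float64(_) => DataType::Float64,
        }
    }

    // Widening casts only; anything else is rejected with `None`.
    fn cast(self, to: DataType) -> Option<Value> {
        Some(match (self, to) {
            (Value::Int32(v), DataType::Int32) => Value::Int32(v),
            (Value::Int32(v), DataType::Int64) => Value::Int64(i64::from(v)),
            (Value::Int32(v), DataType::Float64) => Value::Float64(f64::from(v)),
            (Value::Int64(v), DataType::Int64) => Value::Int64(v),
            (Value::Int64(v), DataType::Float64) => Value::Float64(v as f64),
            (Value::Float64(v), DataType::Float64) => Value::Float64(v),
            _ => return None,
        })
    }
}

// Assumed rule: a float wins, otherwise the wider integer type.
fn coercion(lhs: DataType, rhs: DataType) -> DataType {
    match (lhs, rhs) {
        (DataType::Float64, _) | (_, DataType::Float64) => DataType::Float64,
        (DataType::Int64, _) | (_, DataType::Int64) => DataType::Int64,
        _ => DataType::Int32,
    }
}

// After casting, only same-type arms are needed.
fn sum(lhs: Value, rhs: Value) -> Option<Value> {
    let target = coercion(lhs.data_type(), rhs.data_type());
    match (lhs.cast(target)?, rhs.cast(target)?) {
        (Value::Int32(a), Value::Int32(b)) => a.checked_add(b).map(Value::Int32),
        (Value::Int64(a), Value::Int64(b)) => a.checked_add(b).map(Value::Int64),
        (Value::Float64(a), Value::Float64(b)) => Some(Value::Float64(a + b)),
        _ => None,
    }
}
```

The point of the design is that the number of match arms in `sum` grows with the number of types rather than the number of type pairs; the pairwise knowledge is confined to `coercion` and `cast`. Here `sum(Value::Int32(1), Value::Float64(2.5))` returns `Some(Value::Float64(3.5))`, and an integer overflow returns `None`.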
There was a problem hiding this comment.
Thanks @waynexia for your response.
I think we already have similar coercion in type_coercion.rs
/// Returns the data types that each argument must be coerced to match
/// `signature`.
///
/// See the module level documentation for more detail on coercion.
pub fn data_types(
current_types: &[DataType],
signature: &Signature,
) -> Result<Vec<DataType>> {
IMHO, AggregateFunction::Sum => sum_return_type(&coerced_data_types[0]), already does what the proposed solution describes.
I'm still afraid the problem is not the result type coercion, but how to operate on the underlying values using the correct data types without boilerplate.
This PR is more than 6 months old, so closing it down for now to clean up the PR list. Please reopen if this is a mistake and you plan to work on it more.
Which issue does this PR close?
Closes #2447.
Rationale for this change
The current sum function is quite boilerplate-heavy; this is an attempt to refactor it.
What changes are included in this PR?
Are there any user-facing changes?