Change flatten so it does only a level, not recursively #15160
alamb merged 8 commits into apache:main
| _ => arg_types[0].clone(), | ||
| }, | ||
| LargeList(field) => match field.data_type() { | ||
| LargeList(field) => LargeList(Arc::clone(field)), |
This should return LargeList if the nested data is a List:
| LargeList(field) => LargeList(Arc::clone(field)), | |
| List(field) | FixedSizeList(field, _) | LargeList(field) => { | |
| LargeList(Arc::clone(field)) | |
| } |
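To make the suggested return-type rule concrete, here is a minimal sketch. It assumes a hypothetical `Ty` enum standing in for Arrow's `DataType` (it is not the DataFusion API), and it assumes the `List` arm accepts only `List` and `FixedSizeList` children, which the diff above does not fully show. The idea is that exactly one nesting level is stripped and the outer list kind is kept.

```rust
// Hypothetical stand-in for Arrow's DataType; only the variants needed here.
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int64,
    List(Box<Ty>),
    LargeList(Box<Ty>),
    FixedSizeList(Box<Ty>, usize),
}

/// Return type of a single-level flatten: strip exactly one nesting level
/// and keep the outer list kind (List stays List, LargeList stays LargeList).
fn flatten_return_type(arg: &Ty) -> Ty {
    match arg {
        Ty::List(inner) => match inner.as_ref() {
            Ty::List(elem) | Ty::FixedSizeList(elem, _) => Ty::List(elem.clone()),
            // Not a list of lists: nothing to flatten, type is unchanged.
            _ => arg.clone(),
        },
        Ty::LargeList(inner) => match inner.as_ref() {
            // Any nested list kind collapses into the outer LargeList.
            Ty::List(elem) | Ty::LargeList(elem) | Ty::FixedSizeList(elem, _) => {
                Ty::LargeList(elem.clone())
            }
            _ => arg.clone(),
        },
        _ => arg.clone(),
    }
}
```

With this rule, a `LargeList(List(Int64))` argument yields `LargeList(Int64)`, and a three-level `List` loses only its outermost nesting.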
Ah ok, I'll need to make some changes to support this. Currently, trying to flatten LargeList(List) fails when casting the inner list to a generic list with i64 offsets (using the type parameter O).
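The offset mismatch described here can be illustrated with plain vectors. This is only a sketch under assumptions: `flatten_offsets_large` is a hypothetical helper, not a DataFusion function, and offsets are modeled as `Vec`s rather than Arrow buffers. An outer LargeList carries i64 offsets that index into the inner lists, while an inner List carries i32 offsets that index into the values, so the inner offsets are widened before the two are combined.

```rust
/// Combine the i64 offsets of an outer LargeList with the i32 offsets of
/// an inner List. Each outer offset selects an inner-list boundary; the
/// inner offset at that boundary, widened to i64, becomes the new offset.
fn flatten_offsets_large(inner: &[i32], outer: &[i64]) -> Vec<i64> {
    outer
        .iter()
        .map(|&i| i64::from(inner[i as usize]))
        .collect()
}
```

For example, outer offsets `[0, 2, 3]` over inner offsets `[0, 2, 3, 6]` describe two rows; after flattening one level the rows cover values `0..3` and `3..6`.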
| signature: Signature { | ||
| // TODO (https://github.com/apache/datafusion/issues/13757) flatten should be single-step, not recursive | ||
| type_signature: TypeSignature::ArraySignature( | ||
| ArrayFunctionSignature::RecursiveArray, |
Is this signature still applicable?
Maybe we should switch to Array, and mark RecursiveArray as deprecated?
What would the deprecation note be? From what I understand, this was added specifically so that flatten could recursively coerce FixedSizeList to List. I'm wondering whether any users rely on that downstream.
Hey @delamarch3, are you still tracking merging this? I'm interested in this new behavior.
Weijun-H left a comment
Thanks @delamarch3 👍, I think the PR looks good. Waiting for other comments.
| let list_arr = as_list_array(&array)?; | ||
| let flattened_array = flatten_internal::<i32>(list_arr.clone(), None)?; | ||
| Ok(Arc::new(flattened_array) as ArrayRef) | ||
| let (field, offsets, values, _) = as_list_array(&array)?.clone().into_parts(); |
| let (field, offsets, values, _) = as_list_array(&array)?.clone().into_parts(); | |
| let (field, offsets, values, nulls) = as_list_array(&array)?.clone().into_parts(); |
| inner_field, | ||
| offsets, | ||
| inner_values, | ||
| None, |
| None, | |
| nulls, |
| let list_arr = as_large_list_array(&array)?; | ||
| let flattened_array = flatten_internal::<i64>(list_arr.clone(), None)?; | ||
| Ok(Arc::new(flattened_array) as ArrayRef) | ||
| let (field, offsets, values, _) = |
| let (field, offsets, values, _) = | |
| let (field, offsets, values, nulls) = |
| inner_field, | ||
| offsets, | ||
| inner_values, | ||
| None, |
| None, | |
| nulls, |
| Ok(Arc::new(flattened_array) as ArrayRef) | ||
| } | ||
| LargeList(_) => { | ||
| let (inner_field, inner_offsets, inner_values, _) = |
| let (inner_field, inner_offsets, inner_values, _) = | |
| let (inner_field, inner_offsets, inner_values, nulls) = |
| inner_field, | ||
| offsets, | ||
| inner_values, | ||
| None, |
| None, | |
| nulls, |
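The suggestions above all make the same change: keep the outer array's null buffer instead of passing `None`. A minimal sketch of why that matters follows. It assumes a hypothetical `ListParts` struct that loosely mirrors the tuple returned by `into_parts()` in the diff (offsets, child values, validity); it is not the Arrow type, and the inner validity is ignored here just as the diff discards it with `_`.

```rust
// Hypothetical model of a list array's parts: offsets, child values,
// and validity. `nulls == None` means every row is valid.
#[derive(Debug, Clone, PartialEq)]
struct ListParts<T> {
    offsets: Vec<i32>,
    values: T,
    nulls: Option<Vec<bool>>,
}

/// Flatten one level of a list of lists.
/// Row `i` of the result spans the values between
/// `inner.offsets[outer.offsets[i]]` and `inner.offsets[outer.offsets[i + 1]]`.
/// The outer validity is carried over so that a NULL row stays NULL.
fn flatten_once<T>(outer: ListParts<ListParts<T>>) -> ListParts<T> {
    let inner = outer.values;
    let offsets = outer
        .offsets
        .iter()
        .map(|&i| inner.offsets[i as usize])
        .collect();
    ListParts {
        offsets,
        values: inner.values,
        nulls: outer.nulls,
    }
}
```

If `nulls` were replaced with `None` in the result, a NULL input row would come back as an empty list rather than NULL, which is the behavior the review comments guard against.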
Thanks for the reviews!
@alamb can we get this merged, please?
Change flatten so it does only a level, not recursively
* flatten array in a single step instead of recursive
* clippy
* update flatten type signature to Array
* add fixed list to list coercion to flatten signature
* support LargeList(List) and LargeList(FixedSizeList) in flatten
* add test for LargeList(FixedSizeList)
* handle nulls
* uncomment flatten(NULL) test - it already works
Which issue does this PR close?
flatten should be single-step, not recursive #13757

Rationale for this change
Parity with the flatten implementation in DuckDB.

What changes are included in this PR?
Remove the recursion in flatten_internal so that only the top-level elements are flattened.

Are these changes tested?
Existing sqllogictests have been updated.

Are there any user-facing changes?
Yes, flatten no longer recursively flattens the array.
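The user-facing difference can be sketched with a toy model. The `Nested` enum and both functions below are hypothetical illustrations, not DataFusion code: `flatten_one_level` mirrors the new single-step behavior, while `flatten_recursive` mirrors the behavior being removed.

```rust
// Hypothetical nested value used only to illustrate the semantics.
#[derive(Debug, Clone, PartialEq)]
enum Nested {
    Int(i64),
    List(Vec<Nested>),
}

/// New behavior: splice the children of each inner list into the result,
/// leaving any deeper nesting untouched.
fn flatten_one_level(items: &[Nested]) -> Vec<Nested> {
    let mut out = Vec::new();
    for item in items {
        match item {
            Nested::List(children) => out.extend(children.iter().cloned()),
            other => out.push(other.clone()),
        }
    }
    out
}

/// Old behavior: keep descending until only non-list values remain.
fn flatten_recursive(items: &[Nested]) -> Vec<Nested> {
    let mut out = Vec::new();
    for item in items {
        match item {
            Nested::List(children) => out.extend(flatten_recursive(children)),
            other => out.push(other.clone()),
        }
    }
    out
}
```

For the input `[[[1, 2], [3]], [[4]]]`, the single-level version yields `[[1, 2], [3], [4]]`, whereas the recursive version yields `[1, 2, 3, 4]`. Callers who relied on the old result would need to apply flatten once per nesting level.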