Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Upstream in DataFusion, there is a common pattern where we have multiple input RecordBatches and want to produce an output RecordBatch with some subset of the rows from the input batches. This happens in:
FilterExec --> CoalesceBatchesExec when filtering
RepartitionExec --> CoalesceBatchesExec
The kernels used here are:
FilterExec uses filter, which takes a single input Array and produces a single output Array
RepartitionExec uses take, which also takes a single input Array and produces a single output Array
CoalesceBatchesExec calls concat, which takes multiple Arrays and produces a single Array as output
The use of these kernels and patterns has two downsides:
Performance overhead due to a second copy: Calling filter/take immediately copies the data, which is copied again in CoalesceBatches (see illustration below)
Memory overhead / performance overhead for garbage-collecting StringView: Buffering up several RecordBatches with StringView may consume significant amounts of memory for mostly filtered rows, which requires us to run gc periodically, which actually slows some things down (see Reduce copying in CoalesceBatchesExec for StringViews datafusion#11628)
Here is an ascii art picture (from apache/datafusion#7957) that shows the extra copy in action
Describe the solution you'd like
I would like to apply filter/take to each incoming RecordBatch as it arrives, copying the data to an in-progress output array, in a way that is as fast as the filter and take operations. This would reduce the extra copy that is currently required.
Note this is somewhat like the interleave kernel, except that:
We only need the output rows to be in the same order as the input batches (so the second usize batch index is not needed)
We don't want to have to buffer all the input
Describe alternatives you've considered
One thing I have thought about is extending the builders so they can append more than one row at a time. For example:
Builder::append_filtered
Builder::append_take
So for example, to filter a stream of StringViewArrays I might do something like:

```rust
let mut builder = StringViewBuilder::new();
while let Some(input) = stream.next() {
    // compute some subset of input rows that make it to the output
    let filter: BooleanArray = compute_filter(&input, ...);
    // append all rows from input where filter[i] is true
    builder.append_filtered(&input, &filter);
}
```
And also add an equivalent for append_take
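A RepartitionExec-style consumer might then look like the following sketch, where append_take is the proposed (not-yet-existing) API and compute_partition_indices is a hypothetical helper:

```rust
// Sketch only: `append_take` is the proposed API and does not exist yet
let mut builder = StringViewBuilder::new();
while let Some(input) = stream.next() {
    // hypothetical helper: indices of the rows of `input`
    // destined for this output partition
    let indices: UInt32Array = compute_partition_indices(&input);
    // copy the selected rows directly into the in-progress output
    builder.append_take(&input, &indices);
}
```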
I think if we did this right, it wouldn't be a lot of new code: we could just refactor the existing filter/take implementations. For example, I would expect that the filter kernel would then devolve into a thin wrapper around the new builder API.
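As a minimal sketch of what that refactor might look like, assuming the proposed append_filtered method (not part of arrow-rs today):

```rust
// Sketch only: `append_filtered` is the proposed API
fn filter(values: &StringViewArray, predicate: &BooleanArray) -> StringViewArray {
    // reserve room for exactly the rows that pass the predicate
    let mut builder = StringViewBuilder::with_capacity(predicate.true_count());
    builder.append_filtered(values, predicate);
    builder.finish()
}
```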
Additional context
CoalesceBatchesExecto improve performance datafusion#7957