implement lead and lag built-in window function #429
Codecov Report
@@ Coverage Diff @@
## master #429 +/- ##
==========================================
- Coverage 75.16% 75.15% -0.02%
==========================================
Files 150 152 +2
Lines 25144 25357 +213
==========================================
+ Hits 18899 19056 +157
- Misses 6245 6301 +56
I plan to review this PR tomorrow.
Actually, let's park this pull request for a while - I plan to implement sort and partition first and then window frame, after which the window shift approach might no longer be relevant.
Now that #520 is implemented, this PR is ready.
Putting this back to draft, as it relies on apache/arrow-rs#388, which is not yet in arrow 4.3.
Oh no! Can we possibly use the API that is in Arrow 4.3 (and then we can upgrade datafusion to use the new API when the next version of Arrow comes out)?
I don't mind parking this one here for a while, since there is a lot of other window frame work to be done before revisiting this, and by that time a newer version will have been released.
Ok, thank you. The plan is to do a 4.4 release in ~ 2 weeks.
@alamb and @Dandandan this pull request is ready now.
alamb left a comment
Looks like a great start @jimexist -- I do have a question if this will generate the correct answer with multiple partitions.
Also, I suggest an end-to-end integration test using your great harness, but I suspect you plan to do so in a subsequent PR :) 👍
impl PartitionEvaluator for WindowShiftEvaluator {
    fn evaluate_partition(&self, _partition: Range<usize>) -> Result<ArrayRef> {
        let value = &self.values[0];
        shift(value.as_ref(), self.shift_offset).map_err(DataFusionError::ArrowError)
Do you need to restrict the window to the partition bounds? If the input array had 10 rows in 2 partitions, wouldn't this code produce 2 output partitions of 10 rows each (rather than 2 output partitions of 5 rows each), since `_partition` is ignored and the whole array is shifted?
@alamb good catch, this is fixed now, and I added integration tests.
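For readers following the thread, here is a minimal sketch of the point raised in the review: slice the column to the partition range first, then shift only that slice. It is not the code from this PR. It uses plain `Vec<Option<i64>>` in place of Arrow arrays so it stays self-contained, and `shift_slice` and `evaluate_partition` are hypothetical helper names. The sign convention (a positive offset behaves like lag, a negative one like lead) is an assumption chosen for illustration.

```rust
use std::ops::Range;

/// Shift `values` by `offset` positions, padding vacated slots with `None`.
/// Assumed convention: a positive offset moves values toward higher indices
/// (lag), a negative offset moves them toward lower indices (lead).
fn shift_slice(values: &[Option<i64>], offset: i64) -> Vec<Option<i64>> {
    let len = values.len() as i64;
    (0..len)
        .map(|i| match i.checked_sub(offset) {
            // The source index falls inside the slice: copy the value.
            Some(src) if (0..len).contains(&src) => values[src as usize],
            // The source index falls outside the slice: emit a null.
            _ => None,
        })
        .collect()
}

/// Evaluate a single partition: restrict to the partition bounds first,
/// so the output has exactly `partition.len()` rows and no value leaks
/// in from a neighbouring partition.
fn evaluate_partition(
    values: &[Option<i64>],
    partition: Range<usize>,
    offset: i64,
) -> Vec<Option<i64>> {
    shift_slice(&values[partition], offset)
}
```

With a 10-row column holding 1 through 10 and the partitions `0..5` and `5..10`, an offset of 1 yields 5 rows per partition, each starting with `None`. Shifting the whole column instead would return 10 rows for every partition and carry the value 5 into the second partition.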
Thanks @jimexist -- I ran out of time today but will check this out tomorrow.
Which issue does this PR close?
implement lead and lag built-in window function.
based on #520 so review that first
Closes #553
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?