DataFrame supports window function #1167
Merged
jimexist merged 1 commit into apache:master on Oct 26, 2021
Conversation
jimexist (Member)
It might be worthwhile to note down the code duplication here; I did plan to optimize the reordering of sorts in the future if multiple window functions exist.
houqp approved these changes on Oct 25, 2021
houqp (Member) left a comment:
Agree with what @jimexist said. I think the logical plan builder is probably the best place to host the duplicated code, for example: LogicalPlanBuilder::from(plan).windows(window_func_exprs), which takes a list of window expressions and then calls self.window over them.
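For illustration, a minimal sketch of the builder-hosted helper being suggested, assuming the LogicalPlanBuilder::from / window / build API of the time; the helper name `windows` is taken from the comment above, and exact signatures differ between DataFusion versions:

```rust
use datafusion::error::Result;
use datafusion::logical_plan::{Expr, LogicalPlan, LogicalPlanBuilder};

/// Hypothetical home for the previously duplicated logic: take a list of
/// window expressions and wrap the input plan in a Window node via the builder.
fn windows(plan: LogicalPlan, window_func_exprs: Vec<Expr>) -> Result<LogicalPlan> {
    if window_func_exprs.is_empty() {
        // Nothing to wrap; hand the plan back unchanged.
        return Ok(plan);
    }
    LogicalPlanBuilder::from(plan)
        .window(window_func_exprs)?
        .build()
}
```

Hosting the helper on the logical plan builder would let the SQL planner and the DataFrame API share one code path, which is the deduplication both reviewers are after.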
Force-pushed from d5e1335 to b29fd4a
houqp approved these changes on Oct 25, 2021
houqp (Member) left a comment:
LGTM, thanks @xudong963. I will let @jimexist do the final review and merge :)
datafusion/src/sql/planner.rs (outdated)
@@ -840,24 +839,7 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
    /// Wrap a plan in a window
    fn window(&self, input: LogicalPlan, window_exprs: Vec<Expr>) -> Result<LogicalPlan> {
Member
Perhaps we could get rid of this function as well, since it now just does a single function call :)
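The shape being described, shown as a fragment rather than a standalone example: with the shared helper in place (assumed here to be the `window_plan` seen in the next diff), the planner-side method reduces to one delegating call, which is why removing it is suggested.

```rust
/// Wrap a plan in a window: after the refactor this is only a thin wrapper,
/// so call sites could invoke the shared helper directly and drop this method.
fn window(&self, input: LogicalPlan, window_exprs: Vec<Expr>) -> Result<LogicalPlan> {
    window_plan(input, window_exprs)
}
```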
Force-pushed from b29fd4a to 686e7d7
jimexist reviewed on Oct 26, 2021
    }

    /// Wrap a plan in a window
    pub fn window_plan(
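A minimal sketch of what such a `window_plan` helper can do, in line with the earlier note about reordering sorts when several window functions exist: group the window expressions by their sort key and wrap the input plan in one Window node per group. The grouping key, the `Expr::WindowFunction` shape, and the by-value `LogicalPlanBuilder::from` are assumptions that vary across DataFusion versions; this is not the merged implementation.

```rust
use std::collections::BTreeMap;

use datafusion::error::Result;
use datafusion::logical_plan::{Expr, LogicalPlan, LogicalPlanBuilder};

/// Wrap a plan in a window (sketch): add one Window node per group of window
/// expressions that share the same ORDER BY key, so expressions with a common
/// sort can be evaluated together.
pub fn window_plan(input: LogicalPlan, window_exprs: Vec<Expr>) -> Result<LogicalPlan> {
    if window_exprs.is_empty() {
        return Ok(input);
    }

    // Naive grouping key: the debug form of the expression's ORDER BY clause.
    // Real code can use a structural key instead of a string.
    let mut groups: BTreeMap<String, Vec<Expr>> = BTreeMap::new();
    for expr in window_exprs {
        let key = match &expr {
            Expr::WindowFunction { order_by, .. } => format!("{:?}", order_by),
            _ => String::new(),
        };
        groups.entry(key).or_default().push(expr);
    }

    // Wrap the plan once per group of window expressions.
    let mut plan = input;
    for (_key, exprs) in groups {
        plan = LogicalPlanBuilder::from(plan).window(exprs)?.build()?;
    }
    Ok(plan)
}
```

Grouping by sort key means expressions that share an ORDER BY can be evaluated over a single sorted pass, which is the reordering optimization jimexist mentions planning for.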
Force-pushed from 686e7d7 to df510e0
jgoday pushed a commit to jgoday/arrow-datafusion that referenced this pull request on Oct 27, 2021
unkloud pushed a commit to unkloud/datafusion that referenced this pull request on Mar 23, 2025
unkloud pushed a commit to unkloud/datafusion that referenced this pull request on Mar 23, 2025
H0TB0X420 pushed a commit to H0TB0X420/datafusion that referenced this pull request on Oct 7, 2025
Which issue does this PR close?
Closes #1147
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?