[SPARK-23160][SQL][TEST] Port window.sql by DylanGuedes · Pull Request #24881 · apache/spark

DylanGuedes · 2019-06-16T00:23:13Z

What changes were proposed in this pull request?

This PR ports window.sql from PostgreSQL regression tests. https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/sql/window.sql

The expected results can be found in the link: https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/expected/window.out

How was this patch tested?

Pass the Jenkins.

DylanGuedes · 2019-06-16T01:42:17Z

Question: should I check the existing window.sql in src/test/resources/sql-tests/inputs and remove overlapping tests here, or is it OK to have duplicates?

dongjoon-hyun · 2019-06-16T03:40:20Z

ok to test

dongjoon-hyun

Thank you for making a PR, @DylanGuedes .

You don't need to avoid test coverage duplication. The purpose of this porting is to ensure Apache Spark's capability.
For errors and differing results, please file Apache Spark JIRA issues after checking for duplicates.

cc @gatorsmile

dongjoon-hyun · 2019-06-16T03:46:03Z

+
+SELECT depname, empno, salary, sum(salary) OVER w FROM empsalary WINDOW w AS (PARTITION BY depname);
+
+-- I get an error when trying `order by rank() over w`, however it works for `order by r' if column rank is renamed to r


Please keep the following original query as a comment. And file a JIRA.

SELECT depname, empno, salary, rank() OVER w FROM empsalary WINDOW w AS (PARTITION BY depname ORDER BY salary) ORDER BY rank() OVER w;

dongjoon-hyun · 2019-06-16T03:46:52Z

+
+SELECT ntile(3) OVER (ORDER BY ten, four), ten, four FROM tenk1 WHERE unique2 < 10;
+
+-- Spark does not accept null as input for `ntile`


Please file an Apache Spark JIRA issue if it doesn't exist.

SparkQA · 2019-06-16T05:50:48Z

Test build #106545 has finished for PR 24881 at commit 7ec6b76.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

wangyum · 2019-06-16T12:58:19Z

+SELECT i,SUM(v) OVER (ORDER BY i ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)
+  FROM (VALUES(1,1),(2,2),(3,3),(4,4)) t(i,v);
+
+-- bool_and?


Add SPARK-27880 here.
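
For reference, the frame in the quoted query (`ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING`) sums each row's `v` with that of its immediate neighbors in `i` order. Below is a minimal Python sketch of that frame semantics, not Spark's or PostgreSQL's implementation; `rows_frame_sum` is a hypothetical helper name introduced only for illustration.

```python
def rows_frame_sum(rows, preceding=1, following=1):
    """Sum v over a ROWS frame around each row, with rows ordered by i.

    Hypothetical helper: mimics
    SUM(v) OVER (ORDER BY i ROWS BETWEEN <preceding> PRECEDING AND <following> FOLLOWING).
    """
    ordered = sorted(rows, key=lambda r: r[0])
    result = []
    for idx, (i, _) in enumerate(ordered):
        lo = max(0, idx - preceding)                 # frame start, clamped to the first row
        hi = min(len(ordered), idx + following + 1)  # frame end (exclusive), clamped to the last row
        result.append((i, sum(v for _, v in ordered[lo:hi])))
    return result


# Same inline table as the quoted query: VALUES(1,1),(2,2),(3,3),(4,4)
print(rows_frame_sum([(1, 1), (2, 2), (3, 3), (4, 4)]))
# [(1, 3), (2, 6), (3, 9), (4, 7)]
```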

DylanGuedes · 2019-06-16T14:15:16Z

Hey guys, I added new references to JIRAs. I removed the WIP tag so that the CI could run, because I wanted to be sure that I wasn't breaking other things - and I was lmao. I forgot to drop the tables at the end of the file. However, I still need to pay attention to a few things to finish this. For instance, PostgreSQL has tons of tests with UDF+Window, but I don't know if we should port them too, because I don't think that we can create UDFs in SQL.

Btw, a question:
I checked and the order of the results is different from PostgreSQL. I.e., if I don't explicitly order by some column, the results from PostgreSQL and Spark come back in different orders. For me, this makes a lot of sense: Spark is a distributed processing engine, so the row order may differ unless we explicitly specify one. Should this be documented, or is it too obvious?
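
One way to check that two engines agree on a query without an explicit ORDER BY is to compare the rows as multisets. The sketch below illustrates that idea in Python; `same_rows_ignoring_order` is a hypothetical helper and the rows are placeholder values, not actual output from either engine.

```python
from collections import Counter


def same_rows_ignoring_order(actual, expected):
    """Return True when both results hold the same rows, duplicates included,
    regardless of the order in which the rows were returned."""
    return Counter(actual) == Counter(expected)


# Placeholder rows for illustration only (not real query output).
engine_a_rows = [("develop", 7), ("sales", 1), ("personnel", 5)]
engine_b_rows = [("sales", 1), ("personnel", 5), ("develop", 7)]

print(same_rows_ignoring_order(engine_a_rows, engine_b_rows))      # True
print(same_rows_ignoring_order(engine_a_rows, engine_b_rows[:2]))  # False
```

A plain `sorted(...)` comparison would also work when all column values are mutually comparable; `Counter` avoids that requirement as long as the rows are hashable.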

SparkQA · 2019-06-16T16:20:48Z

Test build #106556 has finished for PR 24881 at commit 4b87073.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2019-06-16T21:34:27Z

Test build #106561 has finished for PR 24881 at commit fe280f3.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

DylanGuedes · 2019-06-16T22:03:00Z

Btw, another question: the CI isn't passing and I can't identify which query causes it: https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/106561/testReport/junit/org.apache.spark.sql/SQLQueryTestSuite/sql/
I think that it has some relation with the new tempview that I created (tenk2) but I'm not sure. Suggestions?

SparkQA · 2019-06-17T00:13:46Z

Test build #106563 has finished for PR 24881 at commit b679e58.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

wangyum · 2019-06-17T07:12:35Z

@DylanGuedes Please regenerate the golden file with:

SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only *SQLQueryTestSuite -- -z window.sql"

You can verify it with:

build/sbt "sql/test-only *SQLQueryTestSuite -- -z window.sql"

DylanGuedes · 2019-06-17T12:21:27Z

Hmm, I was always regenerating them. I think that at the end I made a minor change and forgot to rerun it because I had only changed a comment. Whatever, thank you!

DylanGuedes · 2019-06-17T20:28:02Z

I updated with a few changes. I just noticed (with @wangyum's help) that I was having problems in udf-inner-join.sql, but I haven't figured out why, since I didn't make any changes to it. Do you guys have any idea of the reason?

SparkQA · 2019-06-17T22:30:04Z

Test build #106596 has finished for PR 24881 at commit 8273331.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun · 2019-06-19T20:56:56Z

Retest this please.

dongjoon-hyun · 2019-06-19T20:58:56Z

Ur, please regenerate this test file.
This output is wrong.

dongjoon-hyun · 2019-06-19T20:59:53Z

The failure is due to the wrong generated output. Please configure your Python environment and regenerate this.

SparkQA · 2019-06-19T23:24:09Z

Test build #106687 has finished for PR 24881 at commit 8273331.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

DylanGuedes · 2019-06-20T12:11:34Z

You are right, I had an environment variable that was setting ipython instead. Whatever, I regenerated the golden files, looks fine now.

SparkQA · 2019-06-20T15:08:48Z

Test build #106722 has finished for PR 24881 at commit dc8915c.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

DylanGuedes · 2019-07-02T12:43:36Z

By the way, the CI looks fine now.

dongjoon-hyun · 2019-07-05T08:12:23Z

Hi, @DylanGuedes .
The output file (21,485 lines) looks ridiculously bigger than the original one (3,823 lines).

https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/window.out

Could you compare the result with PostgreSQL? Is the output correct?

DylanGuedes · 2019-07-05T13:09:35Z

@dongjoon-hyun You are absolutely right - for some reason, results from two queries were not being truncated. For now I just commented them out, and I will probably create a JIRA for them.

SparkQA · 2019-07-05T16:15:23Z

Test build #107284 has finished for PR 24881 at commit 48bd010.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

Signed-off-by: DylanGuedes <djmgguedes@gmail.com>

SparkQA · 2019-08-14T23:23:52Z

Test build #109122 has finished for PR 24881 at commit 826708d.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2019-08-26T05:11:24Z

cc @maropu @dongjoon-hyun @wangyum

maropu · 2019-08-26T05:28:48Z

Yea, I'll check now.

maropu

I personally think that this PR is too big for a single PR. So, how about splitting it into smaller PRs, as was done for the aggregate tests? WDYT? @dongjoon-hyun @wangyum

maropu · 2019-08-26T05:39:01Z

+-- !query 145 schema
+struct<>
+-- !query 145 output
+org.apache.spark.sql.AnalysisException


Why did this fail? https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/expected/window.out#L3791

Because Spark does not handle 'NaN' for inline tables.
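
Assuming the failing query is the `'NaN'` one quoted elsewhere in this review (`SUM(b) OVER(ORDER BY A ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)` over `VALUES(1,1),(2,2),(3,'NaN'),(4,3),(5,4)`), the sketch below shows what the frame sums look like once `'NaN'` is read as a floating-point NaN. It only illustrates IEEE 754 float arithmetic in Python; it says nothing about how Spark parses inline tables, and `rows_preceding_sum` is a hypothetical helper name.

```python
import math


def rows_preceding_sum(rows, preceding=1):
    """Hypothetical helper mimicking
    SUM(b) OVER (ORDER BY a ROWS BETWEEN <preceding> PRECEDING AND CURRENT ROW)."""
    ordered = sorted(rows, key=lambda r: r[0])
    out = []
    for idx, (a, _) in enumerate(ordered):
        lo = max(0, idx - preceding)  # frame start, clamped to the first row
        out.append((a, sum(b for _, b in ordered[lo:idx + 1])))
    return out


# 'NaN' is modeled here as a float NaN; this mapping is an assumption of the sketch.
rows = [(1, 1.0), (2, 2.0), (3, float("nan")), (4, 3.0), (5, 4.0)]
result = rows_preceding_sum(rows)

# Any frame that contains the NaN row sums to NaN; the other frames are unaffected.
for a, total in result:
    print(a, "NaN" if math.isnan(total) else total)
```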

maropu · 2019-08-26T05:41:15Z

+       SUM(b) OVER(ORDER BY A ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
+FROM (VALUES(1,1),(2,2),(3,'NaN'),(4,3),(5,4)) t(a,b);
+
+select f_float4, sum(f_float4) over (order by f_float8 rows between 1


Where does this test come from? https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/sql/window.sql#L1249

Lol, I don't know what happened there. Whatever, it is fixed now.

maropu · 2019-08-26T05:55:46Z

+  FROM (VALUES(1,1.5),(2,2.5),(3,NULL),(4,NULL)) t(i,v);
+
+-- [SPARK-28602] Spark does not recognize 'interval' type as 'numeric'
+-- SELECT i,AVG(cast(v as interval)) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)


Please keep the original query: https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/sql/window.sql#L1133

maropu · 2019-08-26T06:01:21Z

+-- WINDOW wnd AS (ORDER BY i ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
+-- ORDER BY i;
+
+-- SELECT


Can you state the reason for commenting out each query where possible?

maropu · 2019-08-26T06:04:14Z

+struct<>
+-- !query 104 output
+org.apache.spark.sql.AnalysisException
+Undefined function: 'range'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 1 pos 7


This error is not the expected one: https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/expected/window.out#L2982

Although the message is misleading, I think that they mean the same thing: there is no existing range aggregate function. Should I file a JIRA for that?

gatorsmile · 2019-09-03T03:11:46Z

@DylanGuedes any update?

DylanGuedes · 2019-09-03T12:57:25Z

@gatorsmile @maropu Sorry guys, although I noticed some of your comments, I didn't get any notification about the new review suggestions, so I thought you were still discussing whether to split the PR or not. I'll try to finish all the suggestions by next week. Also, if you prefer a split PR, I can work on it, for sure.

maropu · 2019-09-04T01:57:28Z

ping @dongjoon-hyun @wangyum

HyukjinKwon · 2019-09-04T10:11:37Z

+-- Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
+--
+-- Window Functions Testing
+-- https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/sql/window.sql


REL_12_BETA2 -> REL_12_BETA3. Let's use REL_12_BETA3.

Signed-off-by: DylanGuedes <djmgguedes@gmail.com>

SparkQA · 2019-09-07T18:35:45Z

Test build #110280 has finished for PR 24881 at commit 80f2915.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2019-09-16T23:58:01Z

ok to test

HyukjinKwon · 2019-09-16T23:58:43Z

Looks like we're blocked. If the current status doesn't look good enough to merge, shall we split? @wangyum, @maropu WDYT? I am willing to help as well.

HyukjinKwon · 2019-09-16T23:59:07Z

This seems to be the last ticket in the PostgreSQL umbrella. Let's get this done.

maropu · 2019-09-17T00:00:20Z

+1 for the split....

maropu · 2019-09-17T00:49:52Z

@DylanGuedes Are you still here? Can you split this PR (~1400 lines) into 4 parts (each file having 300-400 lines), using pgSQL/aggregates_partX.sql as a reference?
https://github.com/apache/spark/tree/master/sql/core/src/test/resources/sql-tests/inputs/pgSQL

DylanGuedes · 2019-09-17T00:53:36Z

@maropu Yes, I'm following everything. Yes, I can split the PR.

maropu · 2019-09-17T01:10:16Z

Thanks!

SparkQA · 2019-09-17T04:14:38Z

Test build #110683 has finished for PR 24881 at commit 80f2915.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2019-09-19T04:21:37Z

Closing this PR because I saw your split PRs.

DylanGuedes changed the title ~~[SPARK-23160][SQL][WIP] Port window.sql~~ [SPARK-23160][SQL] Port window.sql Jun 16, 2019

dongjoon-hyun added SQL TESTS labels Jun 16, 2019

dongjoon-hyun changed the title ~~[SPARK-23160][SQL] Port window.sql~~ [SPARK-23160][SQL][TEST] Port window.sql Jun 16, 2019

dongjoon-hyun reviewed Jun 16, 2019


wangyum reviewed Jun 16, 2019


dongjoon-hyun reviewed Jun 19, 2019


DylanGuedes added 9 commits August 14, 2019 15:16

update golden file

76d2820

Signed-off-by: DylanGuedes <djmgguedes@gmail.com>

temporarily turns off ansi

bea5ba6

Signed-off-by: DylanGuedes <djmgguedes@gmail.com>

exchange jira numbers

d7b1f77

Signed-off-by: DylanGuedes <djmgguedes@gmail.com>

remove query

9150d0d

Signed-off-by: DylanGuedes <djmgguedes@gmail.com>

uncomment wrong queries

25181fd

Signed-off-by: DylanGuedes <djmgguedes@gmail.com>

update with casting types

e776cc3

Signed-off-by: DylanGuedes <djmgguedes@gmail.com>

adds new jira

ef23020

Signed-off-by: DylanGuedes <djmgguedes@gmail.com>

adds groups jira

60c626a

Signed-off-by: DylanGuedes <djmgguedes@gmail.com>

update new PRs

826708d

Signed-off-by: DylanGuedes <djmgguedes@gmail.com>

DylanGuedes force-pushed the SPARK-23160 branch from de9361c to 826708d Compare August 14, 2019 19:54

maropu reviewed Aug 26, 2019


HyukjinKwon reviewed Sep 4, 2019


adds comment for some queries

80f2915

Signed-off-by: DylanGuedes <djmgguedes@gmail.com>

maropu closed this Sep 19, 2019

