[SPARK-23160][SQL][TEST] Port window.sql #24881
Question: should I check the |
|
ok to test |
Thank you for making a PR, @DylanGuedes.
- You don't need to avoid duplicating test coverage. The purpose of this port is to verify Apache Spark's capability.
- For errors and differing results, please file Apache Spark JIRA issues after checking for duplicates.
cc @gatorsmile
|
|
||
| SELECT depname, empno, salary, sum(salary) OVER w FROM empsalary WINDOW w AS (PARTITION BY depname); | ||
|
|
||
| -- I get an error when trying `order by rank() over w`, however it works for `order by r` if column rank is renamed to r |
Please keep the following original query as a comment. And file a JIRA.
SELECT depname, empno, salary, rank() OVER w FROM empsalary WINDOW w AS (PARTITION BY depname ORDER BY salary) ORDER BY rank() OVER w;
|
|
||
| SELECT ntile(3) OVER (ORDER BY ten, four), ten, four FROM tenk1 WHERE unique2 < 10; | ||
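For reference, a minimal Python sketch of the bucket assignment that `ntile(3)` performs on rows already sorted by the window's `ORDER BY`. It assumes the standard SQL rule that the first `N mod n` buckets receive one extra row, and that the `unique2 < 10` filter leaves 10 rows. The function name `ntile` is only illustrative here and is not Spark or PostgreSQL code.

```python
from typing import List


def ntile(num_rows: int, buckets: int) -> List[int]:
    """Return 1-based bucket numbers for rows that are already sorted."""
    if buckets <= 0:
        raise ValueError("buckets must be positive")
    # Every bucket gets `base` rows; the first `extra` buckets get one more.
    base, extra = divmod(num_rows, buckets)
    result: List[int] = []
    for bucket in range(1, buckets + 1):
        size = base + (1 if bucket <= extra else 0)
        result.extend([bucket] * size)
    return result


# 10 rows split into 3 buckets: sizes 4, 3, 3.
print(ntile(10, 3))  # [1, 1, 1, 1, 2, 2, 2, 3, 3, 3]
```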
|
|
||
| -- Spark does not accept null as input for `ntile` |
Please file an Apache Spark JIRA issue if it doesn't exist.
|
Test build #106545 has finished for PR 24881 at commit
|
| SELECT i,SUM(v) OVER (ORDER BY i ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) | ||
| FROM (VALUES(1,1),(2,2),(3,3),(4,4)) t(i,v); | ||
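To make the frame in the query above concrete, here is a small Python sketch of what `ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING` sums for each row. It is only an illustration of the frame arithmetic on the four inline rows; the helper `sliding_sum` is a hypothetical name and does not reflect how either engine evaluates window frames.

```python
def sliding_sum(values, preceding, following):
    """Sum each row's frame: `preceding` rows before through `following` rows after."""
    out = []
    for idx in range(len(values)):
        lo = max(0, idx - preceding)                # clamp at the first row
        hi = min(len(values), idx + following + 1)  # clamp at the last row
        out.append(sum(values[lo:hi]))
    return out


# Rows (i, v) from the inline table, ordered by i as in ORDER BY i.
rows = sorted([(1, 1), (2, 2), (3, 3), (4, 4)], key=lambda r: r[0])
sums = sliding_sum([v for _, v in rows], preceding=1, following=1)
print(list(zip([i for i, _ in rows], sums)))  # [(1, 3), (2, 6), (3, 9), (4, 7)]
```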
|
|
||
| -- bool_and? |
|
Hey guys, I added new references to JIRAs. I removed the WIP tag so that the CI could run, because I wanted to be sure that I wasn't breaking other things - and I was lmao. I forgot to drop the tables at the end of the file. However, I still need to look at a few things to finish this. For instance, PostgreSQL has many tests that combine UDFs with window functions, but I don't know whether we should port them too, because I don't think we can create UDFs in SQL. Btw, a question: |
|
Test build #106556 has finished for PR 24881 at commit
|
|
Test build #106561 has finished for PR 24881 at commit
|
|
Btw, another question: the CI isn't passing and I can't identify which query causes it: https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/106561/testReport/junit/org.apache.spark.sql/SQLQueryTestSuite/sql/ |
|
Test build #106563 has finished for PR 24881 at commit
|
|
@DylanGuedes Please re-generate the golden file with: `SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only *SQLQueryTestSuite -- -z window.sql"` You can then verify it with: `build/sbt "sql/test-only *SQLQueryTestSuite -- -z window.sql"` |
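The two commands follow the usual golden-file pattern: one run overwrites the stored expected output, and a later run compares fresh output against it. The Python sketch below illustrates that pattern only. `check_golden` and the temporary file name are hypothetical and are not taken from Spark's `SQLQueryTestSuite`.

```python
import tempfile
from pathlib import Path


def check_golden(golden_path: Path, actual: str, regenerate: bool) -> bool:
    """Overwrite the golden file when regenerating, otherwise compare against it."""
    if regenerate:
        golden_path.write_text(actual)
        return True
    return golden_path.exists() and golden_path.read_text() == actual


with tempfile.TemporaryDirectory() as tmp:
    golden = Path(tmp) / "window.sql.out"  # illustrative file name
    # Regeneration run: stores the current output as the expected output.
    first = check_golden(golden, "3\n6\n9\n7\n", regenerate=True)
    # Verification run with unchanged output: passes.
    same = check_golden(golden, "3\n6\n9\n7\n", regenerate=False)
    # Verification run after the output changed: fails until regenerated.
    stale = check_golden(golden, "3\n6\n9\n8\n", regenerate=False)

print(first, same, stale)  # True True False
```

This is why a change made after the last regeneration, or output produced in a different environment, shows up as a test failure.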
|
Hmm, I was always regenerating them. I think that at the end I made a minor change and forgot to rerun them because I had only changed a comment. Anyway, thank you! |
|
I pushed an update with a few changes. I just noticed (with @wangyum's help) that I was having problems in |
|
Test build #106596 has finished for PR 24881 at commit
|
|
Retest this please. |
Ur, please regenerate this test file.
This output is wrong.
|
The failure is due to the wrong generated output. Please configure your Python environment and regenerate this. |
|
Test build #106687 has finished for PR 24881 at commit
|
|
You are right, I had an environment variable that pointed to ipython instead. Anyway, I regenerated the golden files and the output looks fine now. |
|
Test build #106722 has finished for PR 24881 at commit
|
|
By the way, the CI looks fine now. |
|
Hi, @DylanGuedes. Could you compare the results with PostgreSQL? Is the output correct? |
|
@dongjoon-hyun You are absolutely right - for some reason, the results from two queries were not being truncated. I commented them out for now and will probably create a JIRA for them. |
|
Test build #107284 has finished for PR 24881 at commit
|
Signed-off-by: DylanGuedes <djmgguedes@gmail.com>
de9361c to 826708d
|
Test build #109122 has finished for PR 24881 at commit
|
|
Yea, I'll check now. |
maropu left a comment
I personally think this PR is too big for a single PR. So, how about splitting it into smaller PRs, as with the aggregate tests? WDYT? @dongjoon-hyun @wangyum
| -- !query 145 schema | ||
| struct<> | ||
| -- !query 145 output | ||
| org.apache.spark.sql.AnalysisException |
Because Spark does not handle 'NaN' for inline tables.
| SUM(b) OVER(ORDER BY A ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) | ||
| FROM (VALUES(1,1),(2,2),(3,'NaN'),(4,3),(5,4)) t(a,b); | ||
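As a rough illustration of why the `'NaN'` row matters in this query, the Python sketch below applies a `ROWS BETWEEN 1 PRECEDING AND CURRENT ROW` sum to the same five values using IEEE 754 floats, where any frame that contains NaN sums to NaN. It assumes `'NaN'` is read as a floating-point NaN. `frame_sum_preceding` is a hypothetical helper, and the sketch shows the arithmetic only, not either engine's behavior.

```python
import math


def frame_sum_preceding(values, preceding):
    """Sum each row together with up to `preceding` rows before it."""
    out = []
    for idx in range(len(values)):
        lo = max(0, idx - preceding)  # clamp at the first row
        out.append(sum(values[lo:idx + 1]))
    return out


# Column b from the inline table, ordered by a; 'NaN' is modeled as float NaN.
b = [1.0, 2.0, float("nan"), 3.0, 4.0]
sums = frame_sum_preceding(b, preceding=1)

# Rows 3 and 4 both have the NaN row in their frame, so both sums are NaN.
print(["NaN" if math.isnan(s) else s for s in sums])  # [1.0, 3.0, 'NaN', 'NaN', 7.0]
```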
|
|
||
| select f_float4, sum(f_float4) over (order by f_float8 rows between 1 |
Where does this test come from? https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/sql/window.sql#L1249
Lol, I don't know what happened there. Whatever, it is fixed now.
| FROM (VALUES(1,1.5),(2,2.5),(3,NULL),(4,NULL)) t(i,v); | ||
|
|
||
| -- [SPARK-28602] Spark does not recognize 'interval' type as 'numeric' | ||
| -- SELECT i,AVG(cast(v as interval)) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) |
Please keep the original query: https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/sql/window.sql#L1133
| -- WINDOW wnd AS (ORDER BY i ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) | ||
| -- ORDER BY i; | ||
|
|
||
| -- SELECT |
Can you add the reason each query is commented out, where possible?
| struct<> | ||
| -- !query 104 output | ||
| org.apache.spark.sql.AnalysisException | ||
| Undefined function: 'range'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 1 pos 7 |
This is not the expected error: https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/expected/window.out#L2982
Although the message is misleading, I think both errors mean the same thing: there is no `range` aggregate function. Should I file a JIRA for that?
|
@DylanGuedes any update? |
|
@gatorsmile @maropu Sorry guys, although I noticed some of your comments, I didn't get any notification about the new review suggestions, so I thought you were still discussing whether to split the PR. I'll try to address all the suggestions by next week. Also, if you prefer a split PR, I can certainly work on it. |
|
ping @dongjoon-hyun @wangyum |
| -- Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group | ||
| -- | ||
| -- Window Functions Testing | ||
| -- https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/sql/window.sql |
REL_12_BETA2 -> REL_12_BETA3. Let's use REL_12_BETA3.
Signed-off-by: DylanGuedes <djmgguedes@gmail.com>
|
Test build #110280 has finished for PR 24881 at commit
|
|
ok to test |
|
This seems to be the last ticket under the PostgreSQL umbrella. Let's get this done. |
|
+1 for the split.... |
|
@DylanGuedes Are you still here? Can you split this pr (~1400 lines) into 4 parts (each file has 300-400 lines) by referring |
|
@maropu Yes, I'm following everything. Yes, I can split the PR. |
|
Thanks! |
|
Test build #110683 has finished for PR 24881 at commit
|
|
Closing this PR because I saw your split PRs. |
What changes were proposed in this pull request?
This PR ports window.sql from PostgreSQL regression tests. https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/sql/window.sql
The expected results can be found in the link: https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/expected/window.out
How was this patch tested?
Pass the Jenkins.