[SPARK-26057][SQL] Transform also analyzed plans when dedup references by mgaido91 · Pull Request #23035 · apache/spark

mgaido91 · 2018-11-14T15:20:22Z

What changes were proposed in this pull request?

SPARK-24865 removed AnalysisBarrier and, to speed up resolution, (re-)introduced the analyzed flag so that only plans which are not yet analyzed get processed. Attribute deduplication must not follow that shortcut: it also has to transform plans that were already analyzed, otherwise some attributes may be left unrewritten, which leads to invalid plans.
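To make the failure mode concrete, here is a minimal sketch in plain Scala. It is a toy model, not Spark code: `Plan`, `Relation`, `Project`, `resolveUp`, `rewriteRule`, `isValid` and `analyzedView` are all hypothetical names invented for illustration, and attributes are reduced to integer ids. It only assumes what the description says: one traversal skips subtrees flagged as analyzed, the other visits every node.

```scala
object DedupSketch {
  // Toy plan node. `analyzed` mimics the flag described above (assumption: a simple mutable marker).
  sealed trait Plan {
    var analyzed: Boolean = false
    def children: Seq[Plan]
    def withChildren(cs: Seq[Plan]): Plan

    // Visits every node bottom-up, ignoring the analyzed flag.
    def transformUp(rule: PartialFunction[Plan, Plan]): Plan = {
      val rewritten = withChildren(children.map(_.transformUp(rule)))
      rule.applyOrElse(rewritten, identity[Plan])
    }

    // Same traversal, but returns an already analyzed subtree untouched.
    def resolveUp(rule: PartialFunction[Plan, Plan]): Plan =
      if (analyzed) this
      else {
        val rewritten = withChildren(children.map(_.resolveUp(rule)))
        rule.applyOrElse(rewritten, identity[Plan])
      }
  }

  // Leaf that produces the given attribute ids.
  final case class Relation(attrIds: Seq[Int]) extends Plan {
    def children: Seq[Plan] = Nil
    def withChildren(cs: Seq[Plan]): Plan = this
  }

  // Node that references attribute ids which its child must produce.
  final case class Project(refs: Seq[Int], child: Plan) extends Plan {
    def children: Seq[Plan] = Seq(child)
    def withChildren(cs: Seq[Plan]): Plan = copy(child = cs.head)
  }

  // Deduplication step: replace old attribute ids with fresh ones.
  def rewriteRule(mapping: Map[Int, Int]): PartialFunction[Plan, Plan] = {
    case Relation(ids) => Relation(ids.map(id => mapping.getOrElse(id, id)))
    case Project(refs, child) => Project(refs.map(id => mapping.getOrElse(id, id)), child)
  }

  def output(p: Plan): Seq[Int] = p match {
    case Relation(ids) => ids
    case Project(refs, _) => refs
  }

  // A plan is valid when every reference is produced by the child below it.
  def isValid(p: Plan): Boolean = p match {
    case Relation(_) => true
    case Project(refs, child) =>
      isValid(child) && refs.forall(r => output(child).contains(r))
  }

  // Stands in for a temp view whose plan was analyzed earlier.
  def analyzedView(): Plan = {
    val v = Project(Seq(1), Relation(Seq(1)))
    v.analyzed = true
    v
  }
}
```

In this sketch, rewriting id 1 to 2 with `resolveUp` updates the outer `Project` but leaves the analyzed view producing id 1, so `isValid` returns false. Running the same rule with `transformUp` rewrites the whole subtree and the plan stays valid.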

How was this patch tested?

Added a unit test.

Please review http://spark.apache.org/contributing.html before opening a pull request.

mgaido91 · 2018-11-14T15:20:42Z

cc @cloud-fan @gatorsmile @rxin

SparkQA · 2018-11-14T18:58:40Z

Test build #98829 has finished for PR 23035 at commit 62a895f.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2018-11-15T02:43:39Z

  }
+
+  test("SPARK-26057: attribute deduplication on already analyzed plans") {
+    withTempView("cc", "p", "c") {


if we don't care about naming, how about a, b, c instead of cc, p, c?

cloud-fan · 2018-11-15T02:48:01Z

+          |  WHERE c.id = cc.id AND c.layout = cc.layout AND c.ts > p.ts)
+          |GROUP BY cc.id, cc.layout
+        """.stripMargin).createOrReplaceTempView("pcc")
+      val res = spark.sql(


good catch on the problem! Do you think it's possible to simplify the test? I think we just need a temp view with subquery, and use it in a join.

mgaido91 · 2018-11-15

Yes, I simplified it as much as I could. I hope it is fine now. Thanks.

SparkQA · 2018-11-15T11:50:09Z

Test build #98861 has finished for PR 23035 at commit 98d91a3.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2018-11-15T12:10:57Z

thanks, merging to master/2.4!

## What changes were proposed in this pull request?

In SPARK-24865 `AnalysisBarrier` was removed and in order to improve resolution speed, the `analyzed` flag was (re-)introduced in order to process only plans which are not yet analyzed. This should not be the case when performing attribute deduplication as in that case we need to transform also the plans which were already analyzed, otherwise we can miss to rewrite some attributes leading to invalid plans.

## How was this patch tested?

added UT

Please review http://spark.apache.org/contributing.html before opening a pull request.

Closes #23035 from mgaido91/SPARK-26057.

Authored-by: Marco Gaido <marcogaido91@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit b46f75a)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

In SPARK-24865 `AnalysisBarrier` was removed and in order to improve resolution speed, the `analyzed` flag was (re-)introduced in order to process only plans which are not yet analyzed. This should not be the case when performing attribute deduplication as in that case we need to transform also the plans which were already analyzed, otherwise we can miss to rewrite some attributes leading to invalid plans.

added UT

Please review http://spark.apache.org/contributing.html before opening a pull request.

Closes apache#23035 from mgaido91/SPARK-26057.

Authored-by: Marco Gaido <marcogaido91@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit b46f75a)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

Conflicts: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala

[SPARK-26054][SQL] Trasform also analyzed plans when dedup references

62a895f

mgaido91 changed the title ~~[SPARK-26054][SQL] Transform also analyzed plans when dedup references~~ [SPARK-26057][SQL] Transform also analyzed plans when dedup references Nov 14, 2018

cloud-fan reviewed Nov 15, 2018

mgaido91 added 3 commits November 15, 2018 09:10

simplify ut

63c70e5

fix

f23a46e

remove newline

98d91a3

asfgit closed this in b46f75a Nov 15, 2018

mgaido91 wants to merge 4 commits into apache:master from mgaido91:SPARK-26057
