[SPARK-19471] AggregationIterator does not initialize the generated result projection before using it by yangw1234 · Pull Request #16820 · apache/spark

yangw1234 · 2017-02-06T12:21:24Z

What changes were proposed in this pull request?

When AggregationIterator generates the result projection, it does not call the initialize method of the Projection class. This causes a runtime NullPointerException when the projection involves nondeterministic expressions.

This problem was introduced by #15567.
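
To illustrate why the missing call matters, here is a minimal sketch in plain Scala. It is a toy model, not Spark's actual code: `Expr`, `ColumnRef`, `RandLike`, `ToyProjection`, and `ToyProjectionDemo` are hypothetical names chosen for this example. The only assumption carried over from the description above is that a nondeterministic expression holds state that stays unset until the projection's initialize method runs.

```scala
// Toy model of the failure mode; every name here is hypothetical, not Spark API.
trait Expr { def eval(row: Seq[Int]): Double }

// Deterministic expression: needs no setup before evaluation.
final class ColumnRef(i: Int) extends Expr {
  def eval(row: Seq[Int]): Double = row(i).toDouble
}

// Nondeterministic expression: its generator is only created in initialize.
final class RandLike(seed: Long) extends Expr {
  private var rng: java.util.Random = null // stays null until initialize runs
  def initialize(partitionIndex: Int): Unit =
    rng = new java.util.Random(seed + partitionIndex)
  def eval(row: Seq[Int]): Double = rng.nextDouble() // throws NPE if rng is null
}

// The projection must forward initialize to its nondeterministic children.
final class ToyProjection(exprs: Seq[Expr]) {
  def initialize(partitionIndex: Int): Unit =
    exprs.foreach {
      case r: RandLike => r.initialize(partitionIndex)
      case _           => ()
    }
  def apply(row: Seq[Int]): Seq[Double] = exprs.map(_.eval(row))
}

object ToyProjectionDemo {
  def main(args: Array[String]): Unit = {
    val row = Seq(10, 20)

    // Using the projection without initializing it fails at evaluation time.
    val broken = new ToyProjection(Seq(new ColumnRef(0), new RandLike(42L)))
    val failed = scala.util.Try(broken(row)).isFailure
    println(s"uninitialized projection failed: $failed")

    // Calling initialize first makes the same projection usable.
    val fixed = new ToyProjection(Seq(new ColumnRef(0), new RandLike(42L)))
    fixed.initialize(partitionIndex = 0)
    println(fixed(row))
  }
}
```

In this sketch the deterministic column alone would never expose the bug, which is why a projection with only deterministic expressions works even when initialize is skipped.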

How was this patch tested?

unit test

yangw1234 · 2017-02-06T12:24:05Z

@mengxr @rxin

hvanhovell · 2017-02-06T13:12:40Z

ok to test

hvanhovell · 2017-02-06T13:13:10Z

@yangw1234 could you also check if we need to do this for whole stage code generation?

...and you really need to add tests.

SparkQA · 2017-02-06T13:18:11Z

Test build #72445 has finished for PR 16820 at commit 7ec5ebf.

This patch fails to build.
This patch merges cleanly.
This patch adds no public classes.

yangw1234 · 2017-02-06T14:35:09Z

@hvanhovell thanks for your review. Whole-stage code generation seems fine, and a unit test has been added.

SparkQA · 2017-02-06T16:25:16Z

Test build #72450 has finished for PR 16820 at commit 97b07a1.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-02-06T16:49:21Z

Test build #72451 has finished for PR 16820 at commit b9b9693.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

hvanhovell · 2017-02-07T15:41:45Z

+  private def assertNoExceptions(c: Column): Unit = {
+    for (wholeStage <- Seq(true, false)) {
+      withSQLConf((SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key, wholeStage.toString)) {
+        spark.range(0, 5).toDF("a").agg(sum("a")).withColumn("v", c).collect()


This test also passes without your fix. I think you need to reference a NonDeterministic expression in the aggregate.

Could you also make sure that we test all aggregation paths:

HashAggregate

ObjectHashAggregate

SortAggregate

gatorsmile · 2017-05-23T16:36:47Z

@yangw1234 Could you address the comment by @hvanhovell ? Thanks!

yangw1234 · 2017-05-24T03:18:47Z

@gatorsmile Sorry, I totally forgot about this PR. I will try to address the comment this week (I need a little time to re-familiarize myself with the context).

yangw1234 · 2017-06-06T02:39:03Z

Sorry, I could not find time to finish this PR recently, so I am closing it for now. If you need this fix, please feel free to build on it and finish it.

…ted result projection before using it

## What changes were proposed in this pull request?

Recently, we have also encountered such NPE issues in our production environment as described in:
https://issues.apache.org/jira/browse/SPARK-19471

This issue can be reproduced by the following examples:

    val df = spark.createDataFrame(Seq(("1", 1), ("1", 2), ("2", 3), ("2", 4))).toDF("x", "y")

    // HashAggregate, SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key=false
    df.groupBy("x").agg(rand(), sum("y")).show()

    // ObjectHashAggregate, SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key=false
    df.groupBy("x").agg(rand(), collect_list("y")).show()

    // SortAggregate, SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key=false && SQLConf.USE_OBJECT_HASH_AGG.key=false
    df.groupBy("x").agg(rand(), collect_list("y")).show()

This PR is based on PR-16820 (apache#16820) with test cases for all aggregation paths. We want to push it forward.

> When AggregationIterator generates result projection, it does not call the initialize method of the Projection class. This will cause a runtime NullPointerException when the projection involves nondeterministic expressions.

## How was this patch tested?

unit test
verified in production environment

Author: donnyzone <wellfengzhu@gmail.com>

Closes apache#18920 from DonnyZone/Branch-spark-19471.

SPARK-19471

7ec5ebf

wangyang added 2 commits February 6, 2017 22:00

fix build

97b07a1

add test

b9b9693

hvanhovell reviewed Feb 7, 2017

yangw1234 closed this Jun 6, 2017

This was referenced Aug 11, 2017

[SPARK-19471][SQL]AggregationIterator does not initialize the generated result projection before using it #18919

Closed

[SPARK-19471][SQL]AggregationIterator does not initialize the generated result projection before using it #18920

Closed

yangw1234 wants to merge 3 commits into apache:master from yangw1234:proj


4 participants
