[SPARK-20450] [SQL] Unexpected first-query schema inference cost with 2.1.1 by ericl · Pull Request #17749 · apache/spark

ericl · 2017-04-24T19:35:14Z

What changes were proposed in this pull request?

https://issues.apache.org/jira/browse/SPARK-19611 fixes a regression from 2.0 where Spark silently fails to read case-sensitive fields when the table properties lack a case-sensitive schema. That fix detects this situation, infers the schema, and writes the case-sensitive schema into the metastore.

However, this can incur an unexpected performance hit the first time such a problematic table is queried, and the false-positive rate is high because most tables don't actually have case-sensitive fields.

This PR changes the default to NEVER_INFER (the same behavior as 2.1.0). In 2.2, we can consider leaving the default as INFER_AND_SAVE.
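The trade-off between the two defaults can be sketched in a few lines. The Python model below is illustrative only and is not Spark's implementation: `resolve_schema`, `infer_from_files`, and `save_to_metastore` are hypothetical names, and only the mode names NEVER_INFER and INFER_AND_SAVE come from this PR.

```python
from enum import Enum
from typing import Callable, List


class InferenceMode(Enum):
    INFER_AND_SAVE = "INFER_AND_SAVE"  # infer from data files, then persist
    NEVER_INFER = "NEVER_INFER"        # always use the metastore schema as-is


def resolve_schema(
    metastore_schema: List[str],
    has_case_sensitive_schema: bool,
    mode: InferenceMode,
    infer_from_files: Callable[[], List[str]],       # hypothetical, expensive
    save_to_metastore: Callable[[List[str]], None],  # hypothetical
) -> List[str]:
    """Return the schema a query would use under the given mode."""
    # Tables that already carry a case-sensitive schema never need inference,
    # and NEVER_INFER skips it unconditionally (the 2.1.0 behavior).
    if has_case_sensitive_schema or mode is InferenceMode.NEVER_INFER:
        return metastore_schema
    # INFER_AND_SAVE: pay the inference cost once on the first query...
    inferred = infer_from_files()
    # ...and persist the result so later queries can skip this step.
    save_to_metastore(inferred)
    return inferred
```

Under NEVER_INFER the first query never pays the inference cost, at the price of possibly missing case-sensitive fields. The setting behind these modes should be `spark.sql.hive.caseSensitiveInferenceMode` (not named in this thread), so users who need the SPARK-19611 behavior can opt back in.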

How was this patch tested?

Unit tests.

mallman · 2017-04-24T21:40:54Z

LGTM (pending tests, of course)

SparkQA · 2017-04-24T21:56:55Z

Test build #76116 has finished for PR 17749 at commit 4c0ff63.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

hvanhovell · 2017-04-24T22:32:42Z

LGTM - merging to 2.1. Thanks!

hvanhovell · 2017-04-24T22:33:23Z

can you close?

…2.1.1

## What changes were proposed in this pull request?

https://issues.apache.org/jira/browse/SPARK-19611 fixes a regression from 2.0 where Spark silently fails to read case-sensitive fields missing a case-sensitive schema in the table properties. The fix is to detect this situation, infer the schema, and write the case-sensitive schema into the metastore.

However this can incur an unexpected performance hit the first time such a problematic table is queried (and there is a high false-positive rate here since most tables don't actually have case-sensitive fields).

This PR changes the default to NEVER_INFER (same behavior as 2.1.0). In 2.2, we can consider leaving the default to INFER_AND_SAVE.

## How was this patch tested?

Unit tests.

Author: Eric Liang <ekl@databricks.com>

Closes #17749 from ericl/spark-20450.

[SPARK-20450] [SQL] Unexpected first-query schema inference cost with 2.1.1 RC

4c0ff63

ericl closed this Apr 24, 2017

ericl wants to merge 1 commit into apache:branch-2.1 from ericl:spark-20450
