@@ -40,7 +40,7 @@ import org.apache.spark.sql.catalyst.util.sideBySide
 import org.apache.spark.sql.errors.QueryExecutionErrors
 import org.apache.spark.sql.execution._
 import org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec._
-import org.apache.spark.sql.execution.bucketing.DisableUnnecessaryBucketedScan
+import org.apache.spark.sql.execution.bucketing.{CoalesceBucketsInJoin, DisableUnnecessaryBucketedScan}
 import org.apache.spark.sql.execution.columnar.InMemoryTableScanExec
 import org.apache.spark.sql.execution.exchange._
 import org.apache.spark.sql.execution.ui.{SparkListenerSQLAdaptiveExecutionUpdate, SparkListenerSQLAdaptiveSQLMetricUpdates, SQLPlanMetric}
@@ -117,7 +117,10 @@ case class AdaptiveSparkPlanExec(
     // around this case.
     val ensureRequirements =
       EnsureRequirements(requiredDistribution.isDefined, requiredDistribution)
+    // CoalesceBucketsInJoin can help eliminate shuffles and must be run before
+    // EnsureRequirements
     Seq(
Comment thread
zzzzming95 marked this conversation as resolved.
+      CoalesceBucketsInJoin,
Contributor

Shall we put it in queryStageOptimizerRules?

Contributor

Rules in queryStageOptimizerRules are invoked less often, which is more efficient. CoalesceBucketsInJoin does not change the plan's partitioning, so it seems it could be put in queryStageOptimizerRules.

Contributor Author

In my testing, the unit test fails if CoalesceBucketsInJoin is added to queryStageOptimizerRules.

Contributor

@cloud-fan cloud-fan Apr 10, 2023

Can we spend a bit of time understanding why? Then we can write a code comment to explain it and future developers won't try to move this rule to queryStageOptimizerRules ever.

Contributor Author

Yes, I will look into it and add a detailed explanation.

Contributor Author

@zzzzming95 zzzzming95 Apr 11, 2023

Because the rules in queryStageOptimizerRules are not applied when the initial plan is prepared. Instead, they are applied in the createQueryStages() method. createQueryStages() works bottom-up, so the exchange that should be eliminated is first wrapped in a ShuffleQueryStage, and CoalesceBucketsInJoin can no longer recognize the pattern it looks for. I have added this to the comment at the top. Thanks @cloud-fan.
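
A minimal sketch of the ordering problem described above, using a toy plan model. Every name in it (`Scan`, `Exchange`, `QueryStage`, `Join` and the three rule functions) is a hypothetical stand-in and not Spark's actual classes or rule implementations. The toy `ensureRequirements` simply shuffles the right side whenever the bucket counts differ, which is an assumption made for illustration.

```scala
// Toy model only: hypothetical stand-ins, NOT Spark's physical operators or rules.
object RuleOrderToy {
  sealed trait Plan
  final case class Scan(table: String, buckets: Int) extends Plan
  final case class Exchange(child: Plan) extends Plan    // stands in for a shuffle exchange
  final case class QueryStage(child: Plan) extends Plan  // stands in for a shuffle query stage
  final case class Join(left: Plan, right: Plan) extends Plan

  // Only recognizes a join whose children are both bare scans.
  def coalesceBuckets(plan: Plan): Plan = plan match {
    case Join(Scan(l, lb), Scan(r, rb))
        if lb != rb && math.max(lb, rb) % math.min(lb, rb) == 0 =>
      val n = math.min(lb, rb)
      Join(Scan(l, n), Scan(r, n))
    case other => other
  }

  // Assumption: shuffle the right side when bucket counts differ.
  def ensureRequirements(plan: Plan): Plan = plan match {
    case Join(l @ Scan(_, lb), r @ Scan(_, rb)) if lb != rb => Join(l, Exchange(r))
    case other => other
  }

  // Bottom-up: every exchange is wrapped before the join above it is visited.
  def createQueryStages(plan: Plan): Plan = plan match {
    case Join(l, r)  => Join(createQueryStages(l), createQueryStages(r))
    case e: Exchange => QueryStage(e)
    case other       => other
  }

  def countExchanges(plan: Plan): Int = plan match {
    case Join(l, r)    => countExchanges(l) + countExchanges(r)
    case Exchange(c)   => 1 + countExchanges(c)
    case QueryStage(c) => countExchanges(c)
    case _: Scan       => 0
  }

  val initial: Plan = Join(Scan("t1", 8), Scan("t2", 4))

  // Coalescing runs first, so no exchange is ever inserted.
  val early: Plan = ensureRequirements(coalesceBuckets(initial))

  // Coalescing runs after stage creation, so the join pattern no longer matches.
  val late: Plan = coalesceBuckets(createQueryStages(ensureRequirements(initial)))

  def main(args: Array[String]): Unit = {
    println(countExchanges(early)) // prints 0
    println(countExchanges(late))  // prints 1
  }
}
```

In the toy model the late ordering keeps the shuffle because the join's right child is a `QueryStage`, not a `Scan`, by the time the coalescing rule sees it.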

Member

CoalesceBucketsInJoin should run before EnsureRequirements.

       RemoveRedundantProjects,
       ensureRequirements,
       AdjustShuffleExchangePosition,
@@ -25,7 +25,7 @@ import org.apache.spark.sql.catalyst.expressions
 import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.catalyst.plans.physical.HashPartitioning
 import org.apache.spark.sql.execution.{FileSourceScanExec, SortExec, SparkPlan}
-import org.apache.spark.sql.execution.adaptive.{AdaptiveSparkPlanExec, AdaptiveSparkPlanHelper, DisableAdaptiveExecution}
+import org.apache.spark.sql.execution.adaptive.{AdaptiveSparkPlanExec, AdaptiveSparkPlanHelper}
 import org.apache.spark.sql.execution.datasources.BucketingUtils
 import org.apache.spark.sql.execution.exchange.ShuffleExchangeExec
 import org.apache.spark.sql.execution.joins.SortMergeJoinExec
@@ -1010,8 +1010,7 @@ abstract class BucketedReadSuite extends QueryTest with SQLTestUtils with Adapti
     }
   }
 
-  test("bucket coalescing is applied when join expressions match with partitioning expressions",
-    DisableAdaptiveExecution("Expected shuffle num mismatched")) {
+  test("bucket coalescing is applied when join expressions match with partitioning expressions") {
     withTable("t1", "t2") {
       df1.write.format("parquet").bucketBy(8, "i", "j").saveAsTable("t1")
       df2.write.format("parquet").bucketBy(4, "i", "j").saveAsTable("t2")
@@ -1023,18 +1022,22 @@ abstract class BucketedReadSuite extends QueryTest with SQLTestUtils with Adapti
           query: String,
           expectedNumShuffles: Int,
           expectedCoalescedNumBuckets: Option[Int]): Unit = {
-        val plan = sql(query).queryExecution.executedPlan
-        val shuffles = plan.collect { case s: ShuffleExchangeExec => s }
-        assert(shuffles.length == expectedNumShuffles)
-
-        val scans = plan.collect {
-          case f: FileSourceScanExec if f.optionalNumCoalescedBuckets.isDefined => f
-        }
-        if (expectedCoalescedNumBuckets.isDefined) {
-          assert(scans.length == 1)
-          assert(scans.head.optionalNumCoalescedBuckets == expectedCoalescedNumBuckets)
-        } else {
-          assert(scans.isEmpty)
+        Seq(true, false).foreach { aqeEnabled =>
+          withSQLConf(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> aqeEnabled.toString) {
+            val plan = sql(query).queryExecution.executedPlan
+            val shuffles = collect(plan) { case s: ShuffleExchangeExec => s }
+            assert(shuffles.length == expectedNumShuffles)
+
+            val scans = collect(plan) {
+              case f: FileSourceScanExec if f.optionalNumCoalescedBuckets.isDefined => f
+            }
+            if (expectedCoalescedNumBuckets.isDefined) {
+              assert(scans.length == 1)
+              assert(scans.head.optionalNumCoalescedBuckets == expectedCoalescedNumBuckets)
+            } else {
+              assert(scans.isEmpty)
+            }
+          }
         }
       }
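
The updated test replaces `plan.collect` with `collect(plan)`, presumably the helper from the `AdaptiveSparkPlanHelper` trait imported in this file. With adaptive execution enabled, the executed plan is wrapped in a node that a plain tree traversal treats as a leaf. The sketch below illustrates that difference on a toy tree. All names in it (`Node`, `Shuffle`, `Scan`, `AdaptiveWrapper`, `plainCollect`, `helperCollect`) are hypothetical and are not Spark's API.

```scala
// Toy model only: hypothetical stand-ins, NOT Spark's plan classes or helpers.
object CollectToy {
  sealed trait Node { def children: Seq[Node] }
  final case class Scan(table: String) extends Node {
    def children: Seq[Node] = Nil
  }
  final case class Shuffle(child: Node) extends Node {
    def children: Seq[Node] = Seq(child)
  }
  // Reports no children, so a plain traversal cannot see the wrapped plan.
  final case class AdaptiveWrapper(executedPlan: Node) extends Node {
    def children: Seq[Node] = Nil
  }

  // Plain pre-order collect that follows `children` only.
  def plainCollect[A](n: Node)(pf: PartialFunction[Node, A]): Seq[A] =
    pf.lift(n).toList ++ n.children.flatMap(c => plainCollect(c)(pf))

  // Collect that also looks inside the wrapper node.
  def helperCollect[A](n: Node)(pf: PartialFunction[Node, A]): Seq[A] = {
    val kids: Seq[Node] = n match {
      case AdaptiveWrapper(inner) => Seq(inner)
      case other                  => other.children
    }
    pf.lift(n).toList ++ kids.flatMap(c => helperCollect(c)(pf))
  }

  val plan: Node = AdaptiveWrapper(Shuffle(Scan("t1")))

  def main(args: Array[String]): Unit = {
    println(plainCollect(plan) { case s: Shuffle => s }.length)  // prints 0
    println(helperCollect(plan) { case s: Shuffle => s }.length) // prints 1
  }
}
```

This is why a shuffle count taken with a plain traversal could come out as zero once the test also runs with adaptive execution enabled.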
