[SPARK-35794][SQL] Allow custom plugin for AQE cost evaluator #32944
c21 wants to merge 5 commits into
|
cc @cloud-fan could you help take a look when you have time? Thanks. |
|
Does it work well with #32816? |
@cloud-fan - I think so. If we decide to merge this first, then in #32816, we don't need the extra config |
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
@c21 thank you for pinging me. I'm not sure it's worth making the cost evaluator a plugin. You mentioned sort (I think it's local sort, isn't it?); can you provide a real use case for it? |
|
Test build #139911 has finished for PR 32944 at commit
|
I don't think it's that simple. If force-skew-join-handling is enabled, Spark must use |
@ulysses-you - e.g. AQE might change it to With our Cosco remote shuffle service, we already implemented the sorted shuffle ( |
@cloud-fan - from my checking of #32816, it looks like the only logic controlled by the new config |
|
@c21 thanks for the explanation, the example |
@ulysses-you - sure, I agree that a boolean config is more intuitive and easier to use. If we do need the boolean config, we can add special logic in |
We can make it an optional conf: spark.sql.adaptive.customCostEvaluatorClass. If not set, we use the builtin impl.
We can use the standard API in Spark: Utils.loadExtensions
This can still be an object, if we follow https://github.com/apache/spark/pull/32944/files#r662513062
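The suggestions above (an optional config, a standard loader, and a builtin default that stays an object) can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the trait here is a simplified stand-in for Spark's `CostEvaluator`, `BuiltinCostEvaluator`, `DoubledCostEvaluator` and `CostEvaluatorLoader` are hypothetical names, and plain Java reflection stands in for the `Utils.loadExtensions` helper mentioned in the review.

```scala
// Stand-in type for illustration only; this is NOT Spark's real CostEvaluator.
trait CostEvaluator {
  // Lower is better. The real interface evaluates a physical plan;
  // a plain shuffle count stands in for the plan here.
  def evaluateCost(numShuffles: Int): Long
}

// Builtin default, kept as a singleton object.
object BuiltinCostEvaluator extends CostEvaluator {
  override def evaluateCost(numShuffles: Int): Long = numShuffles.toLong
}

// Hypothetical custom evaluator. It needs a public no-arg constructor
// so that it can be created from its class name.
class DoubledCostEvaluator extends CostEvaluator {
  override def evaluateCost(numShuffles: Int): Long = 2L * numShuffles
}

object CostEvaluatorLoader {
  // `configured` models the optional config value: None when the config is unset.
  def instantiate(configured: Option[String]): CostEvaluator = configured match {
    case None => BuiltinCostEvaluator // fall back to the builtin implementation
    case Some(className) =>
      Class.forName(className).getDeclaredConstructor().newInstance() match {
        case evaluator: CostEvaluator => evaluator
        case other =>
          throw new IllegalArgumentException(
            s"${other.getClass.getName} does not implement CostEvaluator")
      }
  }
}
```

With these definitions, `CostEvaluatorLoader.instantiate(None)` returns the builtin object, while `CostEvaluatorLoader.instantiate(Some(classOf[DoubledCostEvaluator].getName))` creates the custom evaluator by reflection.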
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
nit: we don't have to create a method here if it's only called once
Does this custom cost evaluator change the query plan? It seems to be the same as the builtin cost evaluator.
@cloud-fan - this evaluator does not change the plan; it behaves the same as the builtin evaluator for this query. Do we want to come up with a different one here? I think this just validates that the custom evaluator works.
SGTM, let's leave it then
|
Test build #140547 has finished for PR 32944 at commit
|
|
@c21 can you fix the code conflicts? |
|
@cloud-fan - thanks, just rebased to latest master. |
|
Kubernetes integration test starting |
|
@c21, can you at least mark |
|
Kubernetes integration test status success |
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
Test build #140567 has finished for PR 32944 at commit
|
|
Test build #140570 has finished for PR 32944 at commit
|
|
@HyukjinKwon - updated per discussion, and this is ready for review again, thanks. |
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
Test build #140618 has finished for PR 32944 at commit
|
buildConf("spark.sql.adaptive.customCostEvaluatorClass")
  .doc("The custom cost evaluator class to be used for adaptive execution. If not being set," +
    " Spark will use its own SimpleCostEvaluator by default.")
  .version("3.2.0")
The only thing is that the version has to be 3.3.0 since the branch has been cut now. Since this PR is unlikely to affect anything in the main code, I am okay with merging to 3.2.0 as well, though. I will leave it to @cloud-fan and you.
3.2 is the first version that enables AQE by default, and this seems to be a useful extension. Let's include it in 3.2.
|
thanks, merging to master/3.2! |
### What changes were proposed in this pull request?

AQE currently uses a cost evaluator to decide whether to adopt the new plan after replanning. The evaluator in use is `SimpleCostEvaluator`, which makes the decision based on the number of shuffles in the query plan. This is not a perfect cost evaluator, and different production environments might want to use different custom evaluators. E.g., sometimes we might still want to do skew join even though it might introduce an extra shuffle (trading resources for better latency); sometimes we might want to take sort into consideration for cost as well. Taking our own setting as an example, we use a custom remote shuffle service (Cosco), and the cost model is more complicated. So we want to make the cost evaluator pluggable, so that developers can implement their own `CostEvaluator` subclass and plug it in dynamically based on configuration.

The approach is to introduce a new config that specifies the class name of a `CostEvaluator` subclass, `spark.sql.adaptive.customCostEvaluatorClass`, and to add `CostEvaluator.instantiate` to instantiate the cost evaluator class in `AdaptiveSparkPlanExec.costEvaluator`.

### Why are the changes needed?

Make AQE cost evaluation more flexible.

### Does this PR introduce _any_ user-facing change?

No, but an internal config is introduced, `spark.sql.adaptive.customCostEvaluatorClass`, to allow a custom implementation of `CostEvaluator`.

### How was this patch tested?

Added unit test in `AdaptiveQueryExecSuite.scala`.

Closes #32944 from c21/aqe-cost.

Authored-by: Cheng Su <chengsu@fb.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 044dddf)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
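To make the motivation concrete, here is a small sketch of how a shuffle-only cost and a cost that also counts sorts can rank the same two plans differently. It assumes a toy plan tree: `PlanNode`, `Scan`, `Shuffle`, `Sort`, `Join`, `ToyCostEvaluator` and the two evaluator objects are all hypothetical stand-ins rather than Spark's `SparkPlan` or `CostEvaluator` classes, and the example plans are made up for illustration.

```scala
// Toy plan model: hypothetical stand-ins, not Spark's physical plan nodes.
sealed trait PlanNode { def children: Seq[PlanNode] }
final case class Scan(table: String) extends PlanNode {
  def children: Seq[PlanNode] = Nil
}
final case class Shuffle(child: PlanNode) extends PlanNode {
  def children: Seq[PlanNode] = Seq(child)
}
final case class Sort(child: PlanNode) extends PlanNode {
  def children: Seq[PlanNode] = Seq(child)
}
final case class Join(left: PlanNode, right: PlanNode) extends PlanNode {
  def children: Seq[PlanNode] = Seq(left, right)
}

object PlanNode {
  // Count the nodes in the tree (including the root) that satisfy `p`.
  def count(plan: PlanNode)(p: PlanNode => Boolean): Int =
    (if (p(plan)) 1 else 0) + plan.children.map(c => count(c)(p)).sum
}

// Simplified evaluator interface: a lower cost means a cheaper plan.
trait ToyCostEvaluator { def evaluateCost(plan: PlanNode): Long }

// Mirrors the idea of costing a plan by its number of shuffles only.
object ShuffleOnlyEvaluator extends ToyCostEvaluator {
  def evaluateCost(plan: PlanNode): Long =
    PlanNode.count(plan)(_.isInstanceOf[Shuffle]).toLong
}

// Custom variant that also charges for sorts.
object ShuffleAndSortEvaluator extends ToyCostEvaluator {
  def evaluateCost(plan: PlanNode): Long =
    PlanNode.count(plan)(n => n.isInstanceOf[Shuffle] || n.isInstanceOf[Sort]).toLong
}

object Example {
  // Join over sorted, shuffled inputs: 2 shuffles and 2 sorts.
  val sortedJoin: PlanNode =
    Join(Sort(Shuffle(Scan("a"))), Sort(Shuffle(Scan("b"))))
  // Replanned join that skips the sorts: 2 shuffles and 0 sorts.
  val unsortedJoin: PlanNode =
    Join(Shuffle(Scan("a")), Shuffle(Scan("b")))
}
```

Under the shuffle-only cost the two plans tie, since each has two shuffles. The sort-aware cost ranks the replanned join as cheaper, which is the kind of distinction a custom evaluator can express.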
|
Thank you @cloud-fan and @HyukjinKwon for review! |