[SPARK-40193][SQL][FOLLOWUP] Restrict cached-side If wrapping to original cached range#55500
Closed
cloud-fan wants to merge 2 commits into
Closed
[SPARK-40193][SQL][FOLLOWUP] Restrict cached-side If wrapping to original cached range#55500cloud-fan wants to merge 2 commits into
cloud-fan wants to merge 2 commits into
Conversation
…inal cached range Fix `PlanMerger.mergeNamedExpressions` to wrap only the original cached expressions with the cached plan's filter. The loop previously iterated over all of `mergedExpressions`, including new-plan entries that were appended earlier in the same call and already wrapped with the new plan's filter; re-wrapping them with the cached plan's filter produced double-wrapped `If(cpFilter, If(npFilter, expr, null), null)` expressions, stale `newNPMapping` targets, and analysis failures (missing attribute). Also tighten the `(np: Filter, cp)` and `(np, cp: Filter)` cases in `tryMergePlans` to match only the structurally reachable child results (`cpFilter`/`npFilter` always `None` because the recursion keeps the non-Filter side unchanged), and drop the associated dead-code appends. Co-authored-by: Isaac
Replace `.collect` with `.map` + explicit `assert` in the `(np: Filter, cp)` and `(np, cp: Filter)` cases so a future refactor that surfaces a filter on the unchanged side fails loudly instead of silently dropping the merge. Inline the single-use `cachedPlanLength` local. Co-authored-by: Isaac
Contributor
|
@cloud-fan , I opened a code cleanup and simplification PR yesterday: #55482. It also handles 1. of this PR, but probably the new test of this PR should be added to there. |
Contributor
Author
|
@peter-toth ah didn't see that PR, feel free to take the test here! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What changes were proposed in this pull request?
Follow-up to #55298 (SPARK-40193). Two related cleanups to
PlanMerger's filter propagation:Correctness fix in
mergeNamedExpressions. Wrapping of unmatched cached expressions with the cached plan's filter now iterates only over the original cached range[0, cachedPlanExpressions.length), not over all ofmergedExpressions. The previous loop also touched new-plan entries that were appended earlier in the same call and already wrapped with the new plan's filter.Tighten the
(np: Filter, cp)/(np, cp: Filter)cases intryMergePlans. Drop the structurally unreachable branches that appendedcpFilter.toSeq/npFilter.map(_._1).toSeqto the newProjectand the correspondingsymmetricFilterPropagationEnabledescape in the guard. In both cases the recursion keeps the non-Filter side unchanged, so no deeper case can expose aFilteron that side — the child result always hascpFilter = None/npFilter = None. MatchingNoneexplicitly makes the invariant explicit and removes dead code that would have produced aProjectwith duplicate attributes if ever reached.Why are the changes needed?
For (1): with symmetric filter propagation enabled (
spark.sql.optimizer.mergeSubplans.symmetricFilterPropagation.enabled = true) and non-attributeProjectexpressions on both sides of the merge, the cached-side loop double-wrapped new-plan-appended expressions withIf(cpFilter, If(npFilter, expr, null), null)and replaced the slot inmergedExpressionswith a newAlias(freshexprId). ThenewNPMappingbuilt earlier in the same call still pointed at the single-wrap alias's attribute, so the parentAggregatewas rewritten to reference an attribute that was no longer in the mergedProject's output. The resulting plan failed analysis withMISSING_ATTRIBUTES.RESOLVED_ATTRIBUTE_APPEAR_IN_OPERATION.Minimal reproducer (fails on master before this PR):
For (2): the branches in question are unreachable by case analysis and the appended
cpFilter.toSeq/npFilter.map(_._1).toSeqwould duplicate an attribute already present inmergedChild.output. Removing them makes the reachable contract explicit.Does this PR introduce any user-facing change?
No. The bug was only observable as an analyzer failure, and only when
spark.sql.optimizer.mergeSubplans.symmetricFilterPropagation.enabled(which defaults tofalse) was enabled together with subqueries whose merge path exercises non-attributeProjectexpressions on both sides. Behavior otherwise matches the released master.How was this patch tested?
MergeSubplansSuite:"SPARK-40193: Merge non-grouping subqueries with different filter conditions and non-attribute Project expressions on both sides"— fails on master without the fix (analysis error), passes with the fix.MergeSubplansSuite(42 tests) andPlanMergeSuite(12 tests) continue to pass.Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Sonnet 4.5
This pull request and its description were written by Isaac.