[SPARK-50593][SQL] SPJ: Support truncate transform#49211
Conversation
|
Putting this PR up for discussion about this problem, as I may lack the context. This PR has one way to support truncate and any future transform in this category (single attribute reference but multi-argument), but wondering if its overkill or is there any better way. Or do we need to add an explicit TruncateTransform, and support it like BucketTransform? |
There was a problem hiding this comment.
This is to fix the 'non-clustered distribution: V2 function with multiple args' test below, which now needs to compare the actual partitionRow by values (and not just reference).
7862f50 to
2f37bc9
Compare
|
Note : This unfortunately doesn't work to identify the negative case (not making SPJ when truncate has different arguments), without breaking some existing test in EnsureRequirementSuite. |
|
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |
What changes were proposed in this pull request?
Truncate(col, len) partition transform is not supported in SPJ.
It seems for transforms with literal args, support was added for write distribution and ordering in #37749 but somehow storage-partition join does not support it despite the comment in : https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/sql/TestStoragePartitionedJoins.java#L129
This change extends SPJ support to transforms where there is a singular column reference attribute and other literal attributes. Note, it seems bucket (another instance of this) is supported in SPJ via another mechanism, by having an explicit BucketTransform, but not sure it is necessary here.
Why are the changes needed?
A join between tables partitioned by truncate transform, on the partition column, do not benefit from skipping shuffle.
Does this PR introduce any user-facing change?
No
How was this patch tested?
New unit test in KeyGroupedPartitioningSuite
Was this patch authored or co-authored using generative AI tooling?
No