[SPARK-47410][SQL] Refactor UTF8String and CollationFactory #45978
Closed
uros-db wants to merge 14 commits into
dbatomic reviewed Apr 10, 2024
uros-db commented Apr 10, 2024
uros-db commented Apr 10, 2024
uros-db commented Apr 10, 2024
dbatomic reviewed Apr 10, 2024
dbatomic reviewed Apr 10, 2024
Member: Can you please fill the PR description?
cloud-fan reviewed Apr 11, 2024
cloud-fan reviewed Apr 11, 2024
cloud-fan approved these changes Apr 11, 2024
Contributor (Author): updated PR description, stand by for some more small changes before merging
dbatomic reviewed Apr 11, 2024
dbatomic reviewed Apr 11, 2024
Contributor: Just wanted to thank you for doing this. IMO, things are much cleaner than they used to be.
dbatomic approved these changes Apr 11, 2024
Contributor (Author): @cloud-fan all checks look good, ready to merge
Contributor: thanks, merging to master!
This was referenced Apr 11, 2024
cloud-fan pushed a commit that referenced this pull request on May 9, 2024:
…unctions/expressions (for UTF8_BINARY & LCASE)

Recreating [original PR](#45749) because code has been reorganized in [this PR](#45978).

### What changes were proposed in this pull request?
This PR adds support for collations to the StringTrim family of functions/expressions, specifically:
- `StringTrim`
- `StringTrimBoth`
- `StringTrimLeft`
- `StringTrimRight`

Changes:
- `CollationSupport.java`
  - Add new `StringTrim`, `StringTrimLeft` and `StringTrimRight` classes with corresponding logic.
- `CollationAwareUTF8String`
  - Add new `trim`, `trimLeft` and `trimRight` methods that implement the trim logic.
- `UTF8String.java`
  - Expose some of the methods publicly.
- `stringExpressions.scala`
  - Change input types.
  - Change eval and code gen logic.
- `CollationTypeCasts.scala`
  - Add `StringTrim*` expressions to `CollationTypeCasts` rules.

### Why are the changes needed?
We are incrementally adding collation support to built-in string functions in Spark.

### Does this PR introduce _any_ user-facing change?
Yes:
- Users should now be able to use non-default collations in string trim functions.

### How was this patch tested?
Existing tests + new unit/e2e tests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #46206 from davidm-db/string-trim-functions.

Authored-by: David Milicevic <david.milicevic@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
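To illustrate what collation-aware trimming means for the two collations named in the commit title, here is a minimal Java sketch. It works on plain `java.lang.String` rather than Spark's `UTF8String`, and the `CollationTrimSketch` class and its `Collation` enum are hypothetical. The method names mirror the `trim`, `trimLeft` and `trimRight` methods mentioned above, but not their real signatures. It assumes that LCASE matching means comparing code points after `Character.toLowerCase`, which may differ from Spark's actual `CollationAwareUTF8String` logic.

```java
import java.util.HashSet;
import java.util.Set;

public final class CollationTrimSketch {
    /** Hypothetical collation identifiers, for illustration only (not Spark's API). */
    enum Collation { UTF8_BINARY, LCASE }

    // Maps a code point to its comparison key: identity for UTF8_BINARY,
    // lowercase for LCASE (an assumed simplification of LCASE semantics).
    private static int normalize(int codePoint, Collation collation) {
        return collation == Collation.LCASE ? Character.toLowerCase(codePoint) : codePoint;
    }

    // Collects the normalized code points that are eligible for trimming.
    private static Set<Integer> trimSet(String trimChars, Collation collation) {
        Set<Integer> set = new HashSet<>();
        trimChars.codePoints().forEach(cp -> set.add(normalize(cp, collation)));
        return set;
    }

    /** Removes leading code points that match any code point in trimChars. */
    static String trimLeft(String src, String trimChars, Collation collation) {
        Set<Integer> set = trimSet(trimChars, collation);
        int start = 0;
        while (start < src.length()) {
            int cp = src.codePointAt(start);
            if (!set.contains(normalize(cp, collation))) break;
            start += Character.charCount(cp);
        }
        return src.substring(start);
    }

    /** Removes trailing code points that match any code point in trimChars. */
    static String trimRight(String src, String trimChars, Collation collation) {
        Set<Integer> set = trimSet(trimChars, collation);
        int end = src.length();
        while (end > 0) {
            int cp = src.codePointBefore(end);
            if (!set.contains(normalize(cp, collation))) break;
            end -= Character.charCount(cp);
        }
        return src.substring(0, end);
    }

    /** Trims both ends. */
    static String trim(String src, String trimChars, Collation collation) {
        return trimRight(trimLeft(src, trimChars, collation), trimChars, collation);
    }

    public static void main(String[] args) {
        System.out.println(trim("xXabcXx", "x", Collation.UTF8_BINARY)); // XabcX
        System.out.println(trim("xXabcXx", "x", Collation.LCASE));       // abc
    }
}
```

Under UTF8_BINARY only the exact `x` characters are stripped, so the uppercase `X` stops the scan; under the assumed LCASE rule both cases match and are removed.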
What changes were proposed in this pull request?
This PR restructures support for collation-aware expressions in Spark, focusing on code structure, clarity, and test coverage for several expressions (including Contains, StartsWith, and EndsWith).
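As a rough illustration of routing each predicate through one collation-aware entry point, here is a minimal Java sketch. It assumes only two collations, UTF8_BINARY and LCASE, and uses plain `java.lang.String`; `CollationPredicatesSketch` and its methods are hypothetical and are not Spark's `CollationSupport` API. Lowercasing whole strings is a simplification: some characters change length when lowercased, which a production implementation would need to handle.

```java
import java.util.Locale;

public final class CollationPredicatesSketch {
    /** Hypothetical collation identifiers, for illustration only (not Spark's API). */
    enum Collation { UTF8_BINARY, LCASE }

    // Lowercases with a fixed locale so results do not depend on the JVM default.
    private static String lower(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    /** Collation-aware substring check. */
    static boolean contains(String left, String right, Collation collation) {
        switch (collation) {
            case UTF8_BINARY: return left.contains(right);
            case LCASE:       return lower(left).contains(lower(right));
            default: throw new IllegalArgumentException("Unsupported collation: " + collation);
        }
    }

    /** Collation-aware prefix check. */
    static boolean startsWith(String left, String right, Collation collation) {
        switch (collation) {
            case UTF8_BINARY: return left.startsWith(right);
            case LCASE:       return lower(left).startsWith(lower(right));
            default: throw new IllegalArgumentException("Unsupported collation: " + collation);
        }
    }

    /** Collation-aware suffix check. */
    static boolean endsWith(String left, String right, Collation collation) {
        switch (collation) {
            case UTF8_BINARY: return left.endsWith(right);
            case LCASE:       return lower(left).endsWith(lower(right));
            default: throw new IllegalArgumentException("Unsupported collation: " + collation);
        }
    }

    public static void main(String[] args) {
        String s = "Spark SQL";
        System.out.println(contains(s, "sql", Collation.UTF8_BINARY)); // false
        System.out.println(contains(s, "sql", Collation.LCASE));       // true
        System.out.println(startsWith(s, "SPARK", Collation.LCASE));   // true
        System.out.println(endsWith(s, "Sql", Collation.UTF8_BINARY)); // false
    }
}
```

Keeping the per-collation branching in one class means a new collation-aware expression needs one new method here instead of ad hoc collation checks inside each expression; this is one plausible reading of the consolidation the description refers to.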
Why are the changes needed?
These changes improve the maintainability and readability of collation-related code in Spark expressions. By restructuring and centralizing collation support, we simplify the addition of new collation-aware operations and ensure consistent testing across different collation types.
Does this PR introduce any user-facing change?
No, this PR is focused on internal refactoring and testing enhancements for collation-aware expression support.
How was this patch tested?
- Unit tests in `CollationSupportSuite.java`
- E2E tests in `CollationStringExpressionsSuite.scala`
Was this patch authored or co-authored using generative AI tooling?
Yes.