[SPARK-48403][SQL] Fix Lower & Upper expressions for UTF8_BINARY_LCASE & ICU collations by uros-db · Pull Request #46720 · apache/spark

uros-db · 2024-05-23T18:01:39Z

What changes were proposed in this pull request?

Lowercase and uppercase conversion for strings in the UTF8_BINARY_LCASE collation now uses ICU with the default locale, the same way ICU collations already work in Spark.

Why are the changes needed?

All collations other than UTF8_BINARY should go through the same interface (UCharacter), which uses the ICU toLowerCase/toUpperCase implementation, rather than mixing JVM and ICU implementations.
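The routing described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the class `CaseMappingSketch`, the enum `CollationKind`, and all method names are hypothetical. So that the block runs without ICU4J on the classpath, the JDK's root-locale `String.toLowerCase`/`String.toUpperCase` stand in for the ICU calls; with ICU4J available, the shared path would presumably delegate to `com.ibm.icu.lang.UCharacter.toLowerCase(String)` and `UCharacter.toUpperCase(String)` instead.

```java
import java.util.Locale;

// Illustrative sketch only: these names are hypothetical and are not Spark's API.
final class CaseMappingSketch {

    // Collation families named in the PR description (the enum itself is an assumption).
    enum CollationKind { UTF8_BINARY, UTF8_BINARY_LCASE, ICU }

    // Stand-in for the UTF8_BINARY path, which keeps its own implementation.
    static String binaryLower(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    static String binaryUpper(String s) {
        return s.toUpperCase(Locale.ROOT);
    }

    // Stand-in for the single shared path used by every other collation.
    // Assumption: with ICU4J present this would call UCharacter.toLowerCase(s).
    static String sharedLower(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    // Assumption: with ICU4J present this would call UCharacter.toUpperCase(s).
    static String sharedUpper(String s) {
        return s.toUpperCase(Locale.ROOT);
    }

    // Only UTF8_BINARY takes the separate path; UTF8_BINARY_LCASE and ICU share one.
    static String lower(String s, CollationKind kind) {
        return kind == CollationKind.UTF8_BINARY ? binaryLower(s) : sharedLower(s);
    }

    static String upper(String s, CollationKind kind) {
        return kind == CollationKind.UTF8_BINARY ? binaryUpper(s) : sharedUpper(s);
    }

    public static void main(String[] args) {
        System.out.println(lower("HELLO", CollationKind.UTF8_BINARY_LCASE)); // hello
        // U+00DF (sharp s) uppercases to two characters.
        System.out.println(upper("stra\u00DFe", CollationKind.ICU)); // STRASSE
        // U+0130 (capital I with dot above) lowercases to 'i' plus U+0307.
        System.out.println(lower("\u0130", CollationKind.UTF8_BINARY_LCASE).length()); // 2
    }
}
```

The last two calls show that case mapping is not always one character to one character, which is one reason a single consistent implementation for all non-binary collations is easier to reason about than a mix of two.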

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing unit tests and end-to-end SQL tests.

Was this patch authored or co-authored using generative AI tooling?

No.

mkaravel

LGTM. Please add a few more interesting test cases for uppercasing, either in this PR or in a follow-up.

cloud-fan · 2024-06-10T16:13:27Z

thanks, merging to master!

uros-db added 2 commits May 23, 2024 19:59

Initial commit

949eb7c

Tests

182c2c5

uros-db changed the title ~~[SPARK-48403][SQL] Fix Lower, Upper, InitCap expressions for UTF8_BINARY_LCASE collation~~ [WIP][SPARK-48403][SQL] Fix Lower, Upper, InitCap expressions for UTF8_BINARY_LCASE collation May 23, 2024

github-actions Bot added the SQL label May 23, 2024

mkaravel reviewed May 24, 2024

uros-db added 3 commits May 24, 2024 08:53

Update doc comments

d9a0b11

Remove InitCap

df12951

Undo unnecessary changes

3e70e6d

uros-db changed the title ~~[WIP][SPARK-48403][SQL] Fix Lower, Upper, InitCap expressions for UTF8_BINARY_LCASE collation~~ [WIP][SPARK-48403][SQL] Fix Lower & Upper expressions for UTF8_BINARY_LCASE collation May 24, 2024

Correct naming

f5a3939

uros-db changed the title ~~[WIP][SPARK-48403][SQL] Fix Lower & Upper expressions for UTF8_BINARY_LCASE collation~~ [SPARK-48403][SQL] Fix Lower & Upper expressions for UTF8_BINARY_LCASE collation May 27, 2024

Merge branch 'apache:master' into lower-upper-initcap

aa74fff

uros-db requested a review from mkaravel May 31, 2024 12:20

dbatomic reviewed Jun 4, 2024

Comment thread common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java

dbatomic reviewed Jun 4, 2024

Comment thread common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java

Small fixes

a2d16c0

uros-db requested a review from dbatomic June 5, 2024 07:58

uros-db changed the title ~~[SPARK-48403][SQL] Fix Lower & Upper expressions for UTF8_BINARY_LCASE collation~~ [SPARK-48403][SQL] Fix Lower & Upper expressions for UTF8_BINARY_LCASE & ICU collations Jun 5, 2024

mkaravel approved these changes Jun 7, 2024

Comment thread common/unsafe/src/test/java/org/apache/spark/unsafe/types/CollationSupportSuite.java

uros-db added 2 commits June 7, 2024 12:24

Add tests

3646785

Merge branch 'apache:master' into lower-upper-initcap

799a2a0

cloud-fan approved these changes Jun 10, 2024

cloud-fan closed this in 61fd936 Jun 10, 2024

uros-db wants to merge 10 commits into apache:master from uros-db:lower-upper-initcap

4 participants
