[WIP][VL] Fix inconsistency issue of PartitionFile path unescaping & GPL issue#8793

Merged
FelixYBW merged 2 commits into apache:main from yaooqinn:GPL
Feb 20, 2025
Conversation

@yaooqinn
Member

@yaooqinn yaooqinn commented Feb 20, 2025

What changes were proposed in this pull request?

GlutenURLDecoder.java was copied from OpenJDK and is licensed under GPL v2, which falls under Category X, so it cannot be included in Apache releases.

URLDecoder's decoding/encoding is not fully compatible with the Hive catalog's path escaping/unescaping, which Spark also follows.
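To illustrate the incompatibility, here is a minimal sketch comparing java.net.URLDecoder with a Hive-style path unescape. The class name PathUnescapeSketch and its helper are hypothetical and written for illustration only; this is not the code in this PR or Spark's actual implementation. It assumes Hive-style unescaping only rewrites %XX hex sequences and leaves everything else (including '+' and malformed '%') untouched.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of Gluten or Spark.
final class PathUnescapeSketch {

    // Returns the value of an ASCII hex digit, or -1 if c is not one.
    private static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    // Hive-style unescape (assumed behavior): decode %XX, copy everything else verbatim.
    static String unescapePathName(String path) {
        StringBuilder sb = new StringBuilder(path.length());
        int i = 0;
        while (i < path.length()) {
            char c = path.charAt(i);
            if (c == '%' && i + 2 < path.length()) {
                int hi = hexValue(path.charAt(i + 1));
                int lo = hexValue(path.charAt(i + 2));
                if (hi >= 0 && lo >= 0) {
                    sb.append((char) (hi * 16 + lo));
                    i += 3;
                    continue;
                }
            }
            // Not a valid escape: keep the character as-is.
            sb.append(c);
            i++;
        }
        return sb.toString();
    }

    // URL-style decode for comparison (the Charset overload needs Java 10+).
    static String urlDecode(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String escaped = "dt=2025-02-20 10%3A00/name=a+b";
        System.out.println(unescapePathName(escaped)); // dt=2025-02-20 10:00/name=a+b
        System.out.println(urlDecode(escaped));        // dt=2025-02-20 10:00/name=a b
    }
}
```

The two functions agree on %3A but disagree on '+': URLDecoder treats it as an encoded space, so a partition value such as a+b would resolve to a different path. URLDecoder also throws IllegalArgumentException on a trailing or malformed '%', whereas the sketch above keeps it literally.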

Besides, apache/spark#46938 made unescapePathName about 10x faster on the Spark side, so this PR also improves Gluten's performance on datasets with a large number of partitions.

How was this patch tested?

The existing tests should cover this change.

@github-actions github-actions Bot added the VELOX label Feb 20, 2025
@github-actions

Thanks for opening a pull request!

Could you open an issue for this pull request on GitHub Issues?

https://github.com/apache/incubator-gluten/issues

Then could you also rename the commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

@FelixYBW
Contributor

Thank you for the catch.

@FelixYBW FelixYBW merged commit 18cbb27 into apache:main Feb 20, 2025
@yaooqinn yaooqinn deleted the GPL branch February 21, 2025 02:15