feat: Support murmur3_hash and sha2 family hash functions #226
viirya merged 4 commits into apache:main
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #226 +/- ##
============================================
- Coverage 33.41% 33.35% -0.06%
- Complexity 768 770 +2
============================================
Files 107 107
Lines 36329 37057 +728
Branches 7935 8110 +175
============================================
+ Hits 12138 12361 +223
- Misses 21643 22097 +454
- Partials 2548 2599 +51
spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala
  case Md5(child) =>
-   val childExpr = exprToProtoInternal(Cast(child, StringType), inputs)
+   val childExpr = exprToProtoInternal(child, inputs)
I don't think we need a cast here: Spark already casts the input to binary type, and DataFusion now supports both utf8 and binary input.
Was binary support added to DataFusion's md5 recently?
Hmm, I think binary support was added by apache/datafusion#3124, which is pretty old.
Anyway, all the digest functions currently support both utf8 and binary input.
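To illustrate why the cast is redundant: a digest only ever consumes bytes, so a utf8 value and a binary value holding the same bytes hash identically. A minimal Rust sketch, assuming a hypothetical `DigestInput` enum and using 64-bit FNV-1a as a short stand-in for md5 (neither is DataFusion's actual API):

```rust
/// Input to a digest function: either UTF-8 text or raw bytes.
/// Hypothetical type for illustration only; not DataFusion's API.
enum DigestInput<'a> {
    Utf8(&'a str),
    Binary(&'a [u8]),
}

/// Stand-in digest (64-bit FNV-1a), chosen only because it is short.
/// A real md5 behaves the same way at this level: it consumes bytes.
fn digest(input: DigestInput) -> u64 {
    // Both variants reduce to the same byte slice, so no cast is needed.
    let bytes: &[u8] = match input {
        DigestInput::Utf8(s) => s.as_bytes(),
        DigestInput::Binary(b) => b,
    };
    let mut hash: u64 = 0xcbf29ce484222325; // FNV-1a offset basis
    for &byte in bytes {
        hash ^= byte as u64;
        hash = hash.wrapping_mul(0x100000001b3); // FNV-1a prime
    }
    hash
}

fn main() {
    let from_text = digest(DigestInput::Utf8("comet"));
    let from_bytes = digest(DigestInput::Binary(b"comet"));
    // Same bytes in, same digest out, regardless of the declared type.
    assert_eq!(from_text, from_bytes);
    println!("{from_text:016x}");
}
```

The same reasoning applies to any byte-oriented digest, which is why passing `child` through unchanged should be enough here.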
Force-pushed from 1c63018 to e55f8fc.
Force-pushed from e55f8fc to 8931c10.
I will review this soon. Thanks.
- // TODO: enable this when we add md5 function to Comet
- ignore("md5") {
+ test("md5") {
Hmm, I remember we explicitly disabled it because DataFusion's crypto_expressions feature includes blake3, which cannot be built on the Mac platform.
I don't see crypto_expressions added to Cargo.toml. How does it work?
Hmm, I think crypto_expressions is enabled as a default feature in DataFusion?
> I remember we explicitly disabled it because DataFusion's crypto_expressions feature includes blake3, which cannot be built on the Mac platform.
Do you have an issue tracking this? I think I can build DataFusion with its default features on Apple Silicon.
> Hmm, I think crypto_expressions is enabled as a default feature in DataFusion?
Yes, but we don't use default features:
default-features = false
> > I remember we explicitly disabled it because DataFusion's crypto_expressions feature includes blake3, which cannot be built on the Mac platform.
> Do you have an issue tracking this? I think I can build DataFusion with its default features on Apple Silicon.
Not sure whether the crate has been updated to fix that. We hit the issue and disabled crypto_expressions a year ago (internally, before we open-sourced Comet).
Maybe it is okay now. But since we haven't added the crypto_expressions feature back, I wonder whether the md5 function works at all. I think it is guarded by this feature in DataFusion.
> But since we haven't added the crypto_expressions feature back, I wonder whether the md5 function works at all. I think it is guarded by this feature in DataFusion.
That's new.
I did some quick digging. It seems the crypto expressions are enabled in datafusion-functions, where they are part of the default features.
Let me make sure the crypto_expressions feature is enabled, then.
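For context, a rough sketch of how Cargo feature gating plays out in Rust code. The function `available_hash_functions` is hypothetical, and treating murmur3_hash as independent of the feature is an assumption based on this PR adding its own spark_murmur3_hash; only the feature name comes from the discussion above. When the crate is built without `crypto_expressions` (for example under `default-features = false` with no explicit entry), the gated branch is never taken:

```rust
/// Names of the hash functions this build can offer.
/// Hypothetical helper for illustration; not part of Comet or DataFusion.
#[allow(unexpected_cfgs)]
fn available_hash_functions() -> Vec<&'static str> {
    // Assumption: murmur3_hash does not depend on the feature.
    let mut names = vec!["murmur3_hash"];
    // `cfg!` is resolved at compile time: it is `true` only when the
    // crate is built with the `crypto_expressions` feature enabled.
    if cfg!(feature = "crypto_expressions") {
        names.extend(["md5", "sha224", "sha256", "sha384", "sha512"]);
    }
    names
}

fn main() {
    // Prints only the ungated names unless the feature is switched on.
    println!("{:?}", available_hash_functions());
}
```

This is why a test such as `test("md5")` can pass or fail depending purely on which features the dependency was compiled with.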
}

case Murmur3Hash(children, seed) if children.forall(c => supportedDataType(c.dataType)) =>
  // TODO: support list/map/struct type for murmur3 hash
This TODO seems unnecessary, as other expressions don't support nested types either.
I see.
Let me create an issue to track support for complex (list/map/struct) types, then. I think we can start by supporting them in literal and scalar expressions.
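For reference, a sketch of the hashing that a Spark-compatible murmur3_hash has to reproduce for primitive inputs. It is written from Spark's Murmur3_x86_32 as I understand it, not copied from this PR's spark_murmur3_hash, and the function names are illustrative. Two details matter for compatibility: trailing bytes are mixed one at a time as sign-extended blocks, and with several children the hash of one child becomes the seed for the next (Spark's default seed is 42).

```rust
fn mix_k1(k1: u32) -> u32 {
    k1.wrapping_mul(0xcc9e2d51)
        .rotate_left(15)
        .wrapping_mul(0x1b873593)
}

fn mix_h1(h1: u32, k1: u32) -> u32 {
    (h1 ^ k1)
        .rotate_left(13)
        .wrapping_mul(5)
        .wrapping_add(0xe6546b64)
}

/// Final avalanche step; `len` is the input length in bytes.
fn fmix(mut h1: u32, len: u32) -> u32 {
    h1 ^= len;
    h1 ^= h1 >> 16;
    h1 = h1.wrapping_mul(0x85ebca6b);
    h1 ^= h1 >> 13;
    h1 = h1.wrapping_mul(0xc2b2ae35);
    h1 ^ (h1 >> 16)
}

/// Hash a 32-bit integer as a single 4-byte block.
fn hash_int(value: i32, seed: u32) -> u32 {
    fmix(mix_h1(seed, mix_k1(value as u32)), 4)
}

/// Hash a byte slice (e.g. the UTF-8 bytes of a string).
fn hash_bytes(data: &[u8], seed: u32) -> u32 {
    let mut h1 = seed;
    let mut chunks = data.chunks_exact(4);
    // Full 4-byte blocks are read as little-endian words.
    for chunk in &mut chunks {
        let word = u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
        h1 = mix_h1(h1, mix_k1(word));
    }
    // Unlike standard Murmur3, each trailing byte is sign-extended
    // and mixed in as its own block.
    for &byte in chunks.remainder() {
        h1 = mix_h1(h1, mix_k1(byte as i8 as i32 as u32));
    }
    fmix(h1, data.len() as u32)
}

fn main() {
    let seed = 42;
    let first = hash_int(1, seed);
    println!("{}", first as i32); // prints -559580957
    // Chaining: the hash of the first child seeds the second.
    let row = hash_bytes(b"comet", first);
    println!("{}", row as i32);
}
```

With seed 42, `hash_int(1, 42)` reinterpreted as `i32` is -559580957, which is what Spark returns for `SELECT hash(1)`. Nested types would need their own traversal order on top of these primitives, which is the part the tracking issue would cover.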
Force-pushed from 8931c10 to 991fec0.
@viirya do you have any other comments?
I will take another look today.
Merged. Thanks.
Thanks for your comments and review.
* feat: Support murmur3_hash and sha2 family hash functions
* address comments
* apply scalafix
* ensure crypto_expressions feature is enabled

Which issue does this PR close?
This partially closes #205
Rationale for this change
More expression coverage for comet
What changes are included in this PR?
spark_murmur3_hash to support murmur3_hash in comet
How are these changes tested?
Added new test code