[SPARK-17495] [SQL] Add more tests for hive hash #17049
ok to test
Test build #73395 has finished for PR 17049 at commit
def checkHiveHash(value: Any, dataType: DataType, expected: Long): Unit = {
  // Note: All expected hashes need to be computed using Hive 1.2.1
  val actual = HiveHashFunction.hash(value, dataType, seed = 0)
  assert(actual == expected)
We should add a clue; otherwise we will never be able to tell what is going on if the tests fail on those randomized values.
withClue(s"value is $value") {
assert(..
}
Looks good except for that comment.
  val length = struct.numFields
  while (i < length) {
-   result = (31 * result) + hash(struct.get(i, types(i)), types(i), seed + 1).toInt
+   result = (31 * result) + hash(struct.get(i, types(i)), types(i), 0).toInt
Could you explain the reason?
The seed is used by the murmur3 hash; hive hash does not need it. See the original implementation in the Hive codebase: https://github.com/apache/hive/blob/4ba713ccd85c3706d195aeef9476e6e6363f1c21/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java#L638
Since the hashing methods in Spark already take a seed, I had to add one to hive-hash as well. When I compute the hash, I always need to set the seed to 0, which is what is done here.
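To make the point about the seed concrete, here is a minimal, self-contained sketch of Hive-style hashing for a few primitive types and a struct. It is not Spark's `HiveHashFunction` or Hive's `ObjectInspectorUtils`: the names `HiveStyleHashSketch` and `hiveStyleHash` are hypothetical, a struct is modelled as a plain `Seq` of field values, and only null, boolean, int and long are handled. What it illustrates is that each field is hashed independently and folded in with `31 * result + fieldHash`, with no seed threaded through.

```scala
// Illustrative sketch only; names are hypothetical and not part of Spark or Hive.
object HiveStyleHashSketch {

  // Hive-style hash for a small subset of types. No seed parameter is needed.
  def hiveStyleHash(value: Any): Int = value match {
    case null       => 0                      // null hashes to 0
    case b: Boolean => if (b) 1 else 0        // true -> 1, false -> 0
    case i: Int     => i                      // an int hashes to itself
    case l: Long    => (l ^ (l >>> 32)).toInt // fold the high bits into the low bits
    case fields: Seq[_] =>
      // Struct (modelled as a Seq for this sketch): combine field hashes with
      // 31 * result + fieldHash. Every field is hashed the same way, which is
      // why a per-field "seed + 1" has no counterpart here.
      var result = 0
      for (field <- fields) {
        result = 31 * result + hiveStyleHash(field)
      }
      result
    case other =>
      throw new IllegalArgumentException(
        s"type not covered by this sketch: ${other.getClass.getName}")
  }
}
```

Under these assumptions, `HiveStyleHashSketch.hiveStyleHash(Seq(1, 2L, true))` evaluates to `((0 * 31 + 1) * 31 + 2) * 31 + 1 = 1024`, and a nested struct such as `Seq(Seq(1), 2)` hashes its inner struct first, giving `31 * 1 + 2 = 33`.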
Jenkins retest this please
Test build #73398 has started for PR 17049 at commit
Jenkins retest this please
The failure in the last run was from the SparkR tests. All SQL tests had passed.
Merging in master.
## What changes were proposed in this pull request?

This PR adds tests for hive-hash by comparing the outputs generated against Hive 1.2.1. The following datatypes are covered by this PR:
- null
- boolean
- byte
- short
- int
- long
- float
- double
- string
- array
- map
- struct

Datatypes that I have _NOT_ covered but will work on separately are:
- Decimal (handled separately in apache#17056)
- TimestampType
- DateType
- CalendarIntervalType

## How was this patch tested?

NA

Author: Tejas Patil <tejasp@fb.com>

Closes apache#17049 from tejasapatil/SPARK-17495_remaining_types.