I recently improved the badly chosen load factor (old: 0.9, new: 0.75) and the table initial bucket count (old: 1024, new: 1031). Experiments have shown this gave a major speedup, but it is still unknown whether the hashtable works as it should (O(1) lookup speed) or just works much better as before, but still not at maximum performance.
Insert: statistics about how long the linear search usually is to find a free bucket.
Have: key x, bucket = hash(x), is it free? if not, how many buckets did we need to search until we found a free one?
also: how well did the hash function spread over all the buckets?
Lookup: statistics about how long the linear search usually is to find the correct bucket.
Wanted: key x, bucket = hash(x), is it there? if not, how many buckets did we need to visit until we found key x?
Implementation note: data gathering needs to be coded in C, analysis could by in Cython or Python.
I recently improved the badly chosen load factor (old: 0.9, new: 0.75) and the table initial bucket count (old: 1024, new: 1031). Experiments have shown this gave a major speedup, but it is still unknown whether the hashtable works as it should (O(1) lookup speed) or just works much better as before, but still not at maximum performance.
Insert: statistics about how long the linear search usually is to find a free bucket.
Have: key x, bucket = hash(x), is it free? if not, how many buckets did we need to search until we found a free one?
also: how well did the hash function spread over all the buckets?
Lookup: statistics about how long the linear search usually is to find the correct bucket.
Wanted: key x, bucket = hash(x), is it there? if not, how many buckets did we need to visit until we found key x?
Implementation note: data gathering needs to be coded in C, analysis could by in Cython or Python.