Taking a closer look yesterday at how key-value pairs are stored in Python dictionaries, I noticed that alongside the key and value, the hash of the key is also stored.
Why would that be?
It can't be to speed up the lookup, as we cannot get to the data (the tuple where the hash+key+value is stored) until we know its address, and the address can only be found by computing the hash first.
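To make that concrete, here's a minimal sketch of the layout these snippets assume -- a sparse addresses vector whose slots point into a dense items list of (hash, key, value) tuples. The names addresses, items and n match the code further down; the insert helper is just an illustration, not actual CPython code:

n = 8
addresses = [None] * n  # slot -> index into items, or None for an empty slot
items = []              # dense list of (hash, key, value) tuples

def insert(key, value):
    perturb = h = hash(key)
    addr = h % n
    while addresses[addr] is not None:  # open addressing: probe until a free slot
        addr = (5 * addr + 1 + perturb) % n
        perturb >>= 5
    addresses[addr] = len(items)
    items.append((h, key, value))  # the key's hash is cached right in the item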
Why then?
Turns out that when resizing the addresses vector, the hash of every key in items would otherwise need to be recomputed. Hashing can be expensive, which would slow down the resize operation. Caching the hash speeds things up:
def resize_to(n):
    new_addresses = [None] * n
    for index, entry in enumerate(items):
        h = perturb = entry[0]  # Cached hash -- no need to compute it.
        addr = h % n
        while new_addresses[addr] is not None:
            addr = (5 * addr + 1 + perturb) % n
            perturb >>= 5
        new_addresses[addr] = index
    return new_addresses
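As a rough illustration of when this would run (the growth check below is just a plausible trigger, not from the earlier post):

# Hypothetical growth check: grow and rehash once the table is about 2/3 full.
if len(items) * 3 >= 2 * n:
    n *= 2
    addresses = resize_to(n)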
Storing the hash has another benefit as well -- during lookup, an inequality test is needed to confirm that we've arrived at the correct hash+key+value pair and need not probe any further.
The lookup code from the earlier post:
def lookup(key):
    perturb = h = hash(key)
    addr = h % n
    address = addresses[addr]
    while items[address][1] != key:  # <- inequality test
        addr = (5 * addr + 1 + perturb) % n
        perturb >>= 5
        address = addresses[addr]
    return items[address][2]
Comparing two objects for [in]equality can be an expensive operation. The solution here is to rely on the fact that "if two objects have unequal hashes, then the objects must be unequal as well", and use an equality checker like below:
def fast_is_equal(
    key, target_key, key_hash, target_key_hash
):
    if key is target_key: return True  # Identity implies equality.
    if key_hash != target_key_hash: return False  # Unequal hashes imply unequal objects.
    return key == target_key  # Fall back to the full (potentially expensive) comparison.
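A quick, contrived illustration of the payoff -- inside the dict both hashes are already cached, so a mismatch costs only one integer comparison instead of an expensive full comparison:

# Two long strings that differ only in the last character.
a = "x" * 10_000_000
b = "x" * 9_999_999 + "y"
# Their hashes almost certainly differ, so this returns False from a single
# integer comparison, without scanning the strings character by character.
# (The hashes are computed inline only for the demo; in a dict they'd be cached.)
fast_is_equal(a, b, hash(a), hash(b))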
And then during the lookup:
    # ...
    while not fast_is_equal(
        # reference: items[address] -> (hash, key, value)
        items[address][1], key, items[address][0], h
    ):
        # ...
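Putting the pieces together, the full lookup would look something like this -- only the loop condition changes from the earlier version:

def lookup(key):
    perturb = h = hash(key)
    addr = h % n
    address = addresses[addr]
    while not fast_is_equal(
        items[address][1], key, items[address][0], h
    ):
        addr = (5 * addr + 1 + perturb) % n
        perturb >>= 5
        address = addresses[addr]
    return items[address][2]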
So, my initial guess was right in a way: storing the hash does indeed speed up the lookup, just not in the way I thought it would.