Open addressing

#computerscience #datastructure #tutorial

If you ever wondered how collisions are handled in hash tables, chances are you've heard about open addressing.

The goal of a hash table is to construct a mapping from a set of keys to a set of values.
Hash table uses a hash function to compute an index (hash) that we store in an array.

When two keys have the same hash value hashf(k1) = hashf(k2)
collision happens. Open addressing is one of ways to avoid it.
As opposed to separate chaining where we use some sort of a list for entries with the same index, in open addressing we keep all the key-value pairs in the array itself.
That means that we have to pay attention on the size of the array in order to have enough space for all entries.

Personally, I would keep the load factor ( α ) less than 0.5 and once we reach the threshold, simply double the array size.

Now let's see how open addressing works. Firstly, when a new entry has to be inserted, we check if the corresponding slot is empty, if it is we insert the element. If not, we proceed with a probing sequence.

x = 1
keyHash = H(k)
index = keyHash

while table[index] != null:
    index = (keyHash + P(k, x)) mod N
    x = x + 1

N -> table size
H -> hash function
P -> Probing function

Be wary when choosing a probing sequence since some of them may produce cycle shorter than N and as a result you'll get stuck in an infinite loop.

Open Addressing techniques

Linear Probing

When collision occurs, we linearly probe for the next bucket.
We keep probing until an empty bucket is found.
Linear probing function could look like this

P(x) = a * x

Now if we had a table of size N = 9 and probing function P(x) = a * x, where a = 6 we wouldn't be able to reach all the buckets in the array and as a consequence, would be stuck in an infinite loop.
In order to avoid this, N and a should be relatively prime (Their global common denominator should be equal to 1).

Quadratic Probing

Similar as linear probing but instead of a linear function we use a quadratic one.

P(x) = a * x² + b * x

There are a few ways to avoid infinite loop

P(x) = x² => keep the table size a prime number and > 3 and keep α <= 1/2
P(x) = (x² + x) / 2 => keep the table size a power of two.
P(x) = (-1^x) * x² => keep the table size a prime N where N ≡ 3 mod 4

Double Hashing

Double hashing uses the idea of applying a second hash function to key when a collision occurs.
Probing method in double hashing could look like this

P(k, x) = x * secondHashFunc(k)

To avoid infinite loop when using double hashes make sure that the table size N is a prime number and calculate the delta
δ = secondHashFunc(k) mod N.
If δ happens to be 0, set it to 1.

There are many variations of open addressing such as Coalesced Hashing, Cuckoo Hashing, Robin Hood Hashing... but they are beyond the scope of this post. I recommend that you explore them on your own.

Key points to take from this post: