## DEV Community is a community of 871,998 amazing developers

We're a place where coders share, stay up-to-date and grow their careers.

JB

Posted on • Updated on

# Hash Tables

## Resources

1. Hash table overview
2. Hashing explanation
3. Hash table explanation
4. Hash table overview + implementation
5. In depth explanation - warning this one is a long read (~60 mins). But there is too much good, approachable, info in there to not mention it.

Takeaways:

• A hash table is essentially an array of buckets.
• Each bucket contains a key value pair (KVP).
• When inserting a KVP into a hash table, the key is hashed and modulo'd by the max size/capacity of the array. The resulting integer is the index of the array the KVP should be inserted at.
• Hashing of keys can be done using a variety of hashing functions*.
• No hashing function is perfect and collisions do happen.
• Collisions are when we try to insert a KVP at a bucket that is not empty. This can happen when the same hash is generated for different keys. When the generated hash is the same, the resulting indexes are the same.
• There are two ways to perform collision resolution: Separate Chaining and Open Addressing (also known as Closed Hashing).
• Separate Chaining means that each bucket in our array is a Linked List. When a collision occurs the KVP gets added to the Linked List. This means multiple KVPs can reside at the same index.
• Open Addressing uses a technique called probing to find the next available index to insert a KVP at. The order you search for an available bucket is the probe sequence. One type of probe sequence is linear probing - where we just increment the index by 1 until we find a free bucket in the array.
• One thing that determines how likely collisions are is the Load Factor of a hash table's underlying array. Load factor is: Number of KVPs/Number of Buckets.
• More robust hash table implementations will resize the underlying array when the Load Factor has reached a certain point. In my implementations I resize the arrays when the Load Factor reaches 0.75.
• Searching, Inserting, and Deleting are all worst case `O(n)` (linear), but for each the amortized time is `O(1)` (constant). This is because collisions should be rare (if you use a good hashing function and keep the Load Factor below a certain threshold).
• Space for hash tables is `O(n)`.

*The most common hashing functions I came across were: DJB2, FNV-1a, and Murmur. FNV-1a is pretty simple, as is DJB2. As it turns out FNV-1a and Murmur are probably the best hashing functions to rely on. I personally like FNV-1a for how simple, yet effective, it is.

Below is the finished implementation of two Hash Tables (with test code). One is implemented using separate chaining and the other open addressing:

As always, if you found any errors in this post please let me know!

## Discussion (1)

Thorsten Hirsch

Thank you so much for this series. The takeaways at the beginning are really helpful!