Hash Tables
JB Updated on ãƒ»2 min read
CS Level Up Series (29 Part Series)
1) CS Level Up Series Introduction
2) Dynamic Arrays
3 ... 27
3) Linked Lists
4) Stacks
5) Queues
6) Hash Tables
7) Binary Search Trees
8) Binary Heaps
9) Priority Queues
10) Graphs
11) Tries
12) Binary & Bit Manipulation
13) Common Sorting Algorithms
14) Searching Algorithms
15) Permutations, Combinations, & Subsets
16) NP-Complete & Fibonacci Heap
17) Detecting Graph Cycles With Depth-First Search
18) Finding Shortest Paths In Graphs (using Dijkstra's & BFS)
19) Topological Sorting of Directed Acyclic Graphs (DAGs)
20) Finding Articulation Points & Bridges in Undirected Graphs
21) Finding Strongly Connected Components in Directed Graphs using Tarjan's Algorithm
22) Checking If An Undirected Graph Is Bipartite
23) Extending An Iterator
24) Union-find (Disjoint-set)
25) Minimum Spanning Tree (Kruskal's Algorithm)
26) Sliding Window Technique
27) String Searching Using Rabin-Karp
28) Fenwick Tree (Binary Indexed Tree)
29) Dynamic Programming
Resources
- Hash table overview
- Hashing explanation
- Hash table explanation
- Hash table overview + implementation
- In depth explanation - warning this one is a long read (~60 mins). But there is too much good, approachable, info in there to not mention it.
Takeaways:
- A hash table is essentially an array of buckets.
- Each bucket contains a key value pair (KVP).
- When inserting a KVP into a hash table, the key is hashed and modulo'd by the max size/capacity of the array. The resulting integer is the index of the array the KVP should be inserted at.
- Hashing of keys can be done using a variety of hashing functions*.
- No hashing function is perfect and collisions do happen.
- Collisions are when we try to insert a KVP at a bucket that is not empty. This can happen when the same hash is generated for different keys. When the generated hash is the same, the resulting indexes are the same.
- There are two ways to perform collision resolution: Separate Chaining and Open Addressing (also known as Closed Hashing).
- Separate Chaining means that each bucket in our array is a Linked List. When a collision occurs the KVP gets added to the Linked List. This means multiple KVPs can reside at the same index.
- Open Addressing uses a technique called probing to find the next available index to insert a KVP at. The order you search for an available bucket is the probe sequence. One type of probe sequence is linear probing - where we just increment the index by 1 until we find a free bucket in the array.
- One thing that determines how likely collisions are is the Load Factor of a hash table's underlying array. Load factor is: Number of KVPs/Number of Buckets.
- More robust hash table implementations will resize the underlying array when the Load Factor has reached a certain point. In my implementations I resize the arrays when the Load Factor reaches 0.75.
- Searching, Inserting, and Deleting are all worst case O(n) (linear), but for each the amortized time is O(1) (constant). This is because collisions should be rare (if you use a good hashing function and keep the Load Factor below a certain threshold).
- Space for hash tables is O(n).
*The most common hashing functions I came across were: DJB2, FNV-1a, and Murmur. FNV-1a is pretty simple, as is DJB2. As it turns out FNV-1a and Murmur are probably the best hashing functions to rely on. I personally like FNV-1a for how simple, yet effective, it is.
Below is the finished implementation of two Hash Tables (with test code). One is implemented using separate chaining and the other open addressing:
As always, if you found any errors in this post please let me know!
CS Level Up Series (29 Part Series)
1) CS Level Up Series Introduction
2) Dynamic Arrays
3 ... 27
3) Linked Lists
4) Stacks
5) Queues
6) Hash Tables
7) Binary Search Trees
8) Binary Heaps
9) Priority Queues
10) Graphs
11) Tries
12) Binary & Bit Manipulation
13) Common Sorting Algorithms
14) Searching Algorithms
15) Permutations, Combinations, & Subsets
16) NP-Complete & Fibonacci Heap
17) Detecting Graph Cycles With Depth-First Search
18) Finding Shortest Paths In Graphs (using Dijkstra's & BFS)
19) Topological Sorting of Directed Acyclic Graphs (DAGs)
20) Finding Articulation Points & Bridges in Undirected Graphs
21) Finding Strongly Connected Components in Directed Graphs using Tarjan's Algorithm
22) Checking If An Undirected Graph Is Bipartite
23) Extending An Iterator
24) Union-find (Disjoint-set)
25) Minimum Spanning Tree (Kruskal's Algorithm)
26) Sliding Window Technique
27) String Searching Using Rabin-Karp
28) Fenwick Tree (Binary Indexed Tree)
29) Dynamic Programming
Thank you so much for this series. The takeaways at the beginning are really helpful!