A hash table is type of key-value data store. Nearly every coding language includes some form of hash table by default. In Ruby, this is called a "hash", in python a "dictionary", in JavaScript an "object" or "map".
Understanding the Basics
Typically, a hash table is implemented with an associative array where each index in the array provides a slot for storing data, in this case key-value pairs. The basic operations we'll discuss here are:
- Insertion: adding key-value pairs
- Access/Search: finding a value in the store based on a key
- Deletion: deleting a key-value pair
Hash Function
All of these operations (insertion, removal, and access) depend on a well written hash function. On a high level, a hash function takes an input of any size and converts that to an output of a fixed size. The output of the hash function is used as the index in the associative array for storing the data.
For a hash table, the hash function must do the following:
- Convert the input (key) to an index that exists in the associative array
- Distribute hashed values evenly to reduce collisions
- Perform in constant time
- Be deterministic - always convert the same input to the same output
Handling Collisions
A collision results when the hash of multiple keys result in the same output (or index in our associative array).
There are multiple solutions for handling collisions. One solution is called Separate Chaining, where we use a second data structure at each index, perhaps an array or linked list, to list all pairs whose key is hashed to that index. In order to perform any of our basic functions, we first iterate through the list at that index to check for the key.
If every hashed key "collides" or results in the same index in the associative array, we end up performing an O(N) check for each function. With minimal collisions however, we typically will have such few items at each index that each operation is just O(1).
The best hash tables, such as the built-in hash tables provided by coding languages, minimize these collisions in order to ensure the best time complexity for standard operations. A well written hash table ensures that:
- It allocates a large enough associative array to provide many slots for data .
- It uses a high quality hash function to evenly distribute keys within the associative array.
Using the Hash Table
Insertion
To add an element, the key is hashed to get an index in the associative array. If the key already exists in the array at that index, the existing key's value is updated. Otherwise, the new key-value pair is pushed onto the inner array.
Access/Search (By Key)
Getting the value of a key follows a similar process to insertion. First, the key is hashed to get the index. That index in the array is then visited. If the key is found at that index, the value of the key is returned. If the key is not found, it does not exist and some falsey signifier is returned.
Note: It is also possible to search a hash table by value. However, this is not an optimal use of a hash table, and the Big O will be linear or O(N).
Removal
Again, the key is hashed to get the index where it is stored in the associative array. The array at that index is searched. If the key is found, the key-value pair is removed. If it's not found, nothing else happens.
The Benefits of a Hash Table:
A hash table is optimized for insertion, removal, and searching based on the key. A well written hash table has constant (O(1)) "amortized" time for these operations. Thus, this can be an excellent choice for storing unordered data. It provides much faster access than an array, and is virtually guaranteed to be faster that an array for insertion or deletion, when the data's index in an array is unknown.
Top comments (0)