Hashmaps
In C++, hashmaps plays a crucial role in efficient data storage and retrieval. To understand how hashmaps work, let's delve into the concepts of hashing and Hash Tables.
Hashing
First, let's start with the concept of hashing, it is a process of converting data (text, numbers, files, or anything) into a fixed-length string of letters and numbers known as a Hash Code. This transformation is achieved through a specialized algorithm called a Hash Function.
Hashing Techniques in PPT format
How hashing works?
Input data is the key
The input data to be hashed is also referred to as the key. Keys can be in various formats, such as a string of text, a list of numbers, an image, or even an application file.
The hash function
The core component of any hashing process is the hash function. This function takes the key and converts it into a fixed-length string of characters.
Where hashing works?
- In large database structures, searching through all index values across different levels to reach the desired data block can be time-consuming. Hashing addresses this issue by allowing faster searches using shorter hashed keys instead of the original values.
- Hashing is employed to index and retrieve items in a database, providing a quicker way to search for specific items using their hashed keys.
- It proves to be an efficient method for determining the direct location of a data record on a disk without relying on complex index structures.
- Hashing is also beneficial for implementing dictionaries.
Hash Tables
Hash tables are like organized boxes where we can quickly put things and find them later. They are handy for storing and finding information quickly. Imagine a bunch of labeled boxes (buckets) where we keep our stuff (data elements). We use a special method called hashing to decide which box to put each thing in based on its characteristics. This helps us easily locate and retrieve our items when needed.
Structure of Hash Tables
- Array: The backbone of a hash table is an array where data elements are stored. Each element in the array is often referred to as a bucket or slot.
- Hash Function: Hash tables use a hash function to compute an index or hash code for each data element. This index determines the position where the element will be stored in the array.
- Collision Handling: Since multiple data elements may hash to the same index, collisions can occur. Hash tables employ various collision resolution techniques to handle this issue, such as chaining or open addressing.
- Key-Value Pairs: Data elements stored in a hash table are typically in key-value pairs. The key is used to compute the hash code, and the value is the associated data.
Complexity Analysis
The complexity of a hash table depends on the hash function picked. The more collisions it generates, the more the complexity tends toward Θ(n). Hash tables have a Θ(1) complexity in best and average-case scenarios but fall to Θ(n) in their worst-case scenario.
Hashing Concept in C++
We can say that Hashing is pre-storing and then fetch when needed. let's check some hashing methods:
Number Hashing
Consider an array:
Given Number lies in the range <=0 and >=8
arr[] = {2, 3, 3, 1, 2, 3, 4, 6, 7, 1}
First, we have to do prefetching for that we will make a has array as shown in the diagram
hash[9] = {0}
new hash array initialized with 0
Now We have to prefetch the values in the has array
according to the number of times, the value appears like this:
The provided information is clear and provides insights into the occurrences of values in the given array. However, I suggest a slight modification for improved clarity and organization:
From the diagram above, we can easily extract several details about the given array:
- 1 appears 2 times
- 2 appears 2 times
- 3 appears 3 times
- 4 appears 1 time
- 6 appears 1 time
- 7 appears 1 time For all other values, the occurrences are 0 times. This can be expressed as hash[1] = 2, hash[2] = 2, hash[3] = 3, hash[4] = 0, and so on.
Additionally, it's noteworthy to consider the maximum array size in C++ under different scenarios:
Case 1: (Declared Inside Main Function)
The maximum size for an array declared inside the main function is:
-
arr[10^6]
(for an integer array) -
arr[10^7]
(for a boolean array) Case 2: (Declared Globally) - If declared globally, the maximum size can extend up to
arr[10^7]
. In this case, every element in the array is initialized to 0 by default if not explicitly assigned.
Try to solve this problem with this approach and you can see how easily hashing solves our problem: Count Frequency
Character Hashing
In character hashing, a process similar to number hashing, characters are mapped into an array.
The approach to achieving this in programming involves utilizing ASCII values.
ASCII aka American Standard Code for Information Interchange
Consider an example:
int x = 'a'; //Automatically typecast 'a' to 97 which is ASCII code
i.e. similar to int x = 97;
therefore, 'b' is 98, 'c' is 99, ..., y is 121 and z is 122
Let's get back to how can we hash it,
A quick question for you, if you what is the value of x here
x = 'a' - 'a'
Yes! you guessed it right, x = 0 here
if we do x = 'b' - 'a', then x = 1
.... x = 'z' - 'a' gives 25 i.e. x = 25
here we get our range i.e. 0 to 25 (and there are 26 letters in English alphabets)
Consider an example
s = "abcdbeadf"
Similar to number hashing, we have to do prefetching for that we will make a has array
as shown in diagram:
hash[26] = {0}
new hash array initialized with 0
Now We have to prefetch the values in the has array
according to the number of times the value appears like this:
This exploration of hashmaps and hash tables is just the tip of the iceberg. There's a vast sea of knowledge and applications waiting to be uncovered. Keep diving deeper, learning more, and discovering the multitude of possibilities that these powerful data structures bring to your programming toolkit. Happy coding! 🚀📚
Top comments (0)