This script demonstrates different hash functions and tests how they distribute elements in a hash table. It includes functions for measuring distribution, collisions, execution time, and sensitivity to minor changes.
- Simple Hash Function (
simple_hash
):
This hash function sums the ASCII values of each character in the input string and then takes the result modulo the table size. It's a basic form of hashing and can lead to poor distribution.
def simple_hash(input_str, table_size):
hash_value = 0
for char in input_str:
hash_value += ord(char)
return hash_value % table_size
- Polynomial Hash (
polynomial_hash
):
Uses a polynomial accumulation of ASCII values, with a prime number (default is 31) to give more weight to characters earlier in the string.
def polynomial_hash(input_str, table_size, prime=31):
hash_value = 0
for i, char in enumerate(input_str):
hash_value += ord(char) * (prime ** i)
return hash_value % table_size
- FNV-1a Hash (
fnv1a_hash
):
A 32-bit version of the FNV-1a hash, which multiplies and XORs the input characters using constants to generate a hash value. It's commonly used for its good distribution characteristics.
def fnv1a_hash(key, table_size):
FNV_prime = 16777619
FNV_offset_basis = 2166136261
hash_value = FNV_offset_basis
for char in key:
hash_value ^= ord(char)
hash_value *= FNV_prime
hash_value &= 0xffffffff # Ensure 32-bit hash
return hash_value % table_size
- XXHash (
xx_hash
):
Utilizes the xxhash library to compute a fast, non-cryptographic hash. It’s very efficient for large datasets.
def xx_hash(input_str, table_size):
return xxhash.xxh32(input_str).intdigest() % table_size
- SipHash (
sip_hash
):
Uses the hmac library to wrap the input string with a secret key and apply the sha256 hash function. This can provide security, though it's slower than non-cryptographic hashes.
def sip_hash(input_str, table_size, key=b'secretkey'):
hash_value = hmac.new(key, input_str.encode(), digestmod='sha256').hexdigest()
return int(hash_value, 16) % table_size
- MurmurHash (
murmur_hash
):
A fast, non-cryptographic hash function using the mmh3 library. It's widely used in hash tables and bloom filters.
def murmur_hash(input_str, table_size):
hash_value = mmh3.hash(input_str) % table_size
return hash_value
Generating Random Strings (generate_random_strings
):
This function generates a list of random alphanumeric strings of a specified length. It's used for testing the hash functions.
def generate_random_strings(n, length=12):
random_strings = []
for _ in range(n):
random_str = ''.join(random.choices(string.ascii_letters + string.digits, k=length))
random_strings.append(random_str)
return random_strings
Printing the Distribution (pretty_print_distribution
):
Displays how many elements are present in each location of the hash table, giving insight into the distribution and potential collisions.
def pretty_print_distribution(distribution):
print("Element distribution in the table:")
for num_elements, count in sorted(distribution.items()):
print(f" Locations with {num_elements} element(s): {count}")
Testing Distribution and Collisions (test_distribution_and_collisions)
:
This function evaluates how well a hash function distributes a set of elements across the hash table. It checks for collisions and calculates the final distribution.
def test_distribution_and_collisions(hash_func, table_size, num_elements):
hash_table = [0] * table_size
elements = generate_random_strings(num_elements)
collisions = 0
for elem in elements:
hash_index = hash_func(elem, table_size)
if hash_table[hash_index] != 0:
collisions += 1
hash_table[hash_index] += 1
distribution = collections.Counter(hash_table)
print(f"Total number of collisions: {collisions}")
pretty_print_distribution(distribution)
Testing Execution Time (test_execution_time
):
Measures the time it takes to apply a hash function to a set of elements, helping compare performance between different hash functions.
def test_execution_time(hash_func, table_size, num_elements):
elements = generate_random_strings(num_elements)
start_time = time.time()
for elem in elements:
hash_func(elem, table_size)
end_time = time.time()
execution_time = end_time - start_time
print(f"Total execution time for {num_elements} elements: {execution_time:.6f} seconds")
Testing Sensitivity to Minor Changes (test_sensitivity
):
This function checks whether the hash function is sensitive to small changes in input. It compares the hash values of two very similar strings.
def test_sensitivity(hash_func, table_size):
original_str = "example"
modified_str = "exampLe" # Changing a single character
original_hash = hash_func(original_str, table_size)
modified_hash = hash_func(modified_str, table_size)
print(f"Original hash for '{original_str}': {original_hash}")
print(f"Modified hash for '{modified_str}': {modified_hash}")
if original_hash != modified_hash:
print("The hash function is sensitive to small changes.")
else:
print("The hash function is NOT sensitive to small changes.")
Top comments (0)