DEV Community

Gagan
Gagan

Posted on

Bloom filters

A Bloom filter is a probabilistic data structure used to test whether an element is a member of a set. The Bloom filter was invented by Burton Howard Bloom in 1970. It is a space-efficient data structure that is used to test whether an element is a member of a set or not. It is used in many applications where it is necessary to quickly determine whether an element is in a set or not, without having to store all of the elements of the set.

A Bloom filter is implemented as a bit array of m bits, all initially set to 0. It also has k hash functions, each of which maps or hashes an element to one of the m bits in the array. To add an element to the set, the element is passed through each of the k hash functions, and the resulting bits in the array are set to 1. To check if an element is in the set, the element is passed through each of the k hash functions, and if any of the resulting bits in the array are 0, the element is definitely not in the set. If all of the resulting bits are 1, the element may be in the set, but it could also be a false positive due to the probabilistic nature of the Bloom filter.

Bloom filters are used in many applications, such as in spell-checking, network routers, and databases, where it is necessary to quickly determine whether an element is in a set without having to store all of the elements in the set. They are also used in bioinformatics to quickly identify whether a DNA sequence has been previously observed.

Here's an example of a simple implementation of a Bloom filter in Python:

import mmh3

class BloomFilter:
    def __init__(self, size, hash_count):
        self.size = size
        self.hash_count = hash_count
        self.bit_array = [0] * size

    def add(self, item):
        for seed in range(self.hash_count):
            result = mmh3.hash(item, seed) % self.size
            self.bit_array[result] = 1

    def check(self, item):
        for seed in range(self.hash_count):
            result = mmh3.hash(item, seed) % self.size
            if self.bit_array[result] == 0:
                return "Nope"
        return "Maybe"

bf = BloomFilter(32, 5)
bf.add("hello")
bf.add("world")
print(bf.check("hello")) # prints "Maybe"
print(bf.check("hi")) # prints "Nope"

Enter fullscreen mode Exit fullscreen mode

In this example, the BloomFilter class has a constructor that takes in two arguments: the size of the bit array and the number of hash functions to be used. The add method takes an item and adds it to the filter by passing it through each of the hash functions and setting the bits.

Top comments (0)