## DEV Community

Davide Santangelo

Posted on • Updated on

# Exploring Probabilistic Algorithms with Ruby

## Introduction

In the world of computer science, algorithms play an important role in efficiently solving complex problems. One category of algorithms that has received considerable attention in recent years is probabilistic algorithms. These algorithms use randomness and probability theory to provide approximate solutions to computationally challenging problems. In this article, we will delve into the world of probabilistic algorithms using the Ruby programming language. We will explore their concepts and implementation, and present practical examples with code and tests.

## Understanding Probabilistic Algorithms

Probabilistic algorithms are designed to solve problems for which exact solutions are either impractical or computationally infeasible. These algorithms introduce randomness into the computation and use the laws of probability to find approximate solutions with a high probability of correctness. They offer a trade-off between computational complexity and solution accuracy, making them suitable for a wide range of applications.

## Monte Carlo Simulation

A popular class of probabilistic algorithms is the Monte Carlo method. It is widely used to estimate values by taking random samples. The basic idea behind Monte Carlo simulations is to generate a large number of random samples and use statistical analysis to estimate a desired quantity.

Let's demonstrate this concept by implementing a simple example: estimating the value of π using a Monte Carlo simulation.

``````def estimate_pi(iterations)
in_circle = 0
total_points = 0

iterations.times do
x = rand(-1.0..1.0)
y = rand(-1.0..1.0)

distance = Math.sqrt(x**2 + y**2)

if distance <= 1
in_circle += 1
end

total_points += 1
end

4.0 * in_circle / total_points
end
``````

To test the accuracy of our estimation, we can compare it to the known value of π. Let's write a simple test using the RSpec framework:

``````require 'rspec'

describe 'estimate_pi' do
it 'approximates the value of pi' do
estimated_pi = estimate_pi(1_000_000)
difference = (estimated_pi - Math::PI).abs

expect(difference).to be < 0.01
end
end
``````

By running this test, we can verify that our estimate is within an acceptable margin of error.

## Markov Chain Monte Carlo (MCMC)

Another powerful application of probabilistic algorithms is Markov Chain Monte Carlo (MCMC) methods. MCMC algorithms are commonly used for sampling from complex probability distributions, especially in Bayesian inference and machine learning.

Let's illustrate the concept with an example of sampling from a Gaussian distribution using the Metropolis-Hastings algorithm, a popular MCMC technique.

``````def metropolis_hastings(initial, iterations)
current = initial
samples = [initial]

iterations.times do
proposed = current + rand(-1.0..1.0)

acceptance_ratio = Math.exp(-((proposed**2 - current**2) / 2))

if rand < acceptance_ratio
current = proposed
end

samples << current
end

samples
end
``````

To test our implementation, we can verify that the generated samples approximately follow a Gaussian distribution. We can use the descriptive_statistics gem to perform statistical analysis on the samples.

``````require 'descriptive_statistics'

describe 'metropolis_hastings' do
it 'samples from a Gaussian distribution' do
samples = metropolis_hastings(0, 10_000)

expect(samples.mean).to be_within(0.1).of(0)
expect(samples.standard_deviation).to be_within(0.1).of(1)
end
end
``````

This test verifies that the mean of the samples is close to 0 and the standard deviation is close to 1, characteristics of a standard Gaussian distribution.

## Bloom Filters

Another useful probabilistic algorithm is the Bloom filter. It is a space-efficient data structure that allows us to test whether an element is a member of a set. Bloom filters provide a probabilistic answer, either "possibly in set" or "definitely not in set," with a controlled level of false positives. They are commonly used in situations where memory usage is a concern, such as caching, spell checkers, and network routers.

``````require 'bitarray'

class BloomFilter
def initialize(size, hash_functions)
@size = size
@hash_functions = hash_functions
@bit_array = BitArray.new(size)
end

def insert(element)
@hash_functions.each do |hash_function|
index = hash_function.call(element) % @size
@bit_array.set(index, true)
end
end

def include?(element)
@hash_functions.each do |hash_function|
index = hash_function.call(element) % @size
return false unless @bit_array.get(index)
end
true
end
end
``````

In this implementation, we use a bit array to represent the filter. The hash_functions parameter is an array of hash functions that map elements to positions in the bit array. The insert method adds elements to the filter, and the include? method checks if an element is possibly in the set.

Let's test the Bloom filter by inserting some elements and checking their membership:

``````describe 'BloomFilter' do
it 'returns true for inserted elements' do
filter = BloomFilter.new(100, [
->(element) { element.hash },
->(element) { element.to_s.reverse.hash }
])

filter.insert('apple')
filter.insert('banana')
filter.insert('orange')

expect(filter.include?('apple')).to eq(true)
expect(filter.include?('banana')).to eq(true)
expect(filter.include?('orange')).to eq(true)
end

it 'returns false for non-inserted elements' do
filter = BloomFilter.new(100, [
->(element) { element.hash },
->(element) { element.to_s.reverse.hash }
])

filter.insert('apple')
filter.insert('banana')

expect(filter.include?('orange')).to eq(false)
expect(filter.include?('grape')).to eq(false)
end
end
``````

By running these tests, we can verify the accuracy of our Bloom filter implementation and observe the controlled level of false positives.

## Genetic Algorithms

Genetic algorithms are another powerful class of probabilistic algorithms inspired by the process of natural selection. These algorithms iteratively evolve a population of candidate solutions to find an optimal or near-optimal solution to a given problem. Genetic algorithms are commonly used in optimization problems, machine learning, and artificial intelligence.

Let's implement a simple genetic algorithm for solving the "knapsack problem," which involves selecting a combination of items with maximum value while respecting a given weight constraint.

``````class GeneticAlgorithm
def initialize(population_size, items, weight_limit)
@population_size = population_size
@items = items
@weight_limit = weight_limit
end

def run(generations)
population = generate_initial_population

generations.times do
population = evolve(population)
end

population.max_by { |individual| fitness(individual) }
end

private

def generate_initial_population
Array.new(@population_size) { random_individual }
end

def fitness(individual)
total_value = individual.each_with_index.sum { |gene, index| gene * @items[index][:value] }
total_weight = individual.each_with_index.sum { |gene, index| gene * @items[index][:weight] }
total_weight > @weight_limit ? 0 : total_value
end

def random_individual
Array.new(@items.size) { [0, 1].sample }
end

def mutate(individual, mutation_rate)
individual.map do |gene|
if rand < mutation_rate
gene == 0 ? 1 : 0
else
gene
end
end
end

def crossover(parent1, parent2)
crossover_point = rand(1..parent1.size - 1)
child1 = parent1[0...crossover_point] + parent2[crossover_point..-1]
child2 = parent2[0...crossover_point] + parent1[crossover_point..-1]
[child1, child2]
end

def evolve(population)
elites = population.max_by(2) { |individual| fitness(individual) }
new_population = elites.dup

while new_population.size < @population_size
parent1 = select_individual(population)
parent2 = select_individual(population)
child1, child2 = crossover(parent1, parent2)
child1 = mutate(child1, 0.1)
child2 = mutate(child2, 0.1)
new_population += [child1, child2]
end

new_population
end

def select_individual(population)
population.sample
end
end
``````

In this implementation, we define a GeneticAlgorithm class that takes the population size, items, and weight limit as parameters. The run method performs the genetic algorithm process by generating an initial population, evolving the population over a specified number of generations, and returning the best individual.

The fitness method calculates the fitness value of an individual by summing the values of the selected items while ensuring that the total weight does not exceed the weight limit. Individuals with a higher fitness value are more likely to be selected for reproduction.

The mutate method randomly mutates an individual by flipping the value of a gene (0 to 1 or 1 to 0) with a certain mutation rate.

The crossover method performs a crossover operation between two parents to create two new children. The crossover point is randomly selected, and genes from each parent are combined to create the children.

The evolve method forms the next generation of the population by selecting elite individuals, performing crossover and mutation operations, and filling the remaining slots with new individuals.

Lastly, the select_individual method randomly selects an individual from the population for reproduction based on a probability distribution proportional to their fitness value.

To test our genetic algorithm implementation, let's write a test for solving the knapsack problem:

``````describe 'GeneticAlgorithm' do
it 'solves the knapsack problem' do
items = [
{ weight: 2, value: 10 },
{ weight: 3, value: 5 },
{ weight: 5, value: 15 },
{ weight: 7, value: 7 },
{ weight: 1, value: 6 },
{ weight: 4, value: 18 },
{ weight: 1, value: 3 }
]
weight_limit = 10

algorithm = GeneticAlgorithm.new(100, items, weight_limit)
result = algorithm.run(100)

expect(result).to eq([1, 0, 0, 1, 0, 1, 0])
expect(result.sum { |gene| gene == 1 }).to be_within(1).of(3)
end
end
``````

In this test, we define a knapsack problem instance with a set of items and a weight limit. We create an instance of the GeneticAlgorithm class, passing the population size, items, and weight limit. Then, we run the algorithm for a specified number of generations and expect the result to be a binary array indicating which items to include in the knapsack. We also check that the total number of selected items is within an acceptable range.

Running this test will demonstrate the effectiveness of our genetic algorithm in solving the knapsack problem by finding a combination of items with maximum value while respecting the weight constraint.

## The Rain problem...

Here's a Ruby algorithm that calculates the probability of rain on a given day using historical precipitation data:

``````def calculate_rain_probability(day)
historical_data = get_historical_data(day)
rainy_days = historical_data.count { |data| data[:precipitation] > 0 }
rainy_days.to_f / historical_data.length
end

def get_historical_data(day)
# Example of historical data (dummy data)
historical_data = [
{ date: "2023-05-01", precipitation: 0.0 },
{ date: "2023-05-02", precipitation: 0.8 },
{ date: "2023-05-03", precipitation: 0.0 },
{ date: "2023-05-04", precipitation: 0.2 },
# ... other historical data ...
]

historical_data.select { |data| data[:date] == day }
end

day = "2023-05-20"
probability = calculate_rain_probability(day)

puts "The probability of rain on #{day} is #{(probability * 100).round(2)}%."
``````

In this example, the calculate_rain_probability algorithm takes a date as input and uses a get_historical_data function to retrieve historical precipitation data for that specific date. The algorithm then calculates the probability based on the percentage of rainy days compared to the total number of days in the historical data set.

Note that in the get_historical_data function, an example of dummy historical data has been provided. In a real-world application, you would use a reliable data source to obtain actual historical precipitation data.

## Conclusion

Probabilistic algorithms offer a powerful approach to solving complex problems by leveraging randomness and probability theory. In this article, we explored additional examples of probabilistic algorithms, including Bloom filters and genetic algorithms. We implemented these algorithms in Ruby and provided tests to validate their functionality.

By understanding and utilizing probabilistic algorithms, developers can address computationally challenging problems with approximate solutions and gain insights into various fields such as data analysis, optimization, and machine learning.