Davide Santangelo

Posted on May 15, 2023 • Edited on May 20, 2023

Exploring Probabilistic Algorithms with Ruby

#ruby #math #computerscience #programming

Introduction

In the world of computer science, algorithms play an important role in efficiently solving complex problems. One category of algorithms that has received considerable attention in recent years is probabilistic algorithms. These algorithms use randomness and probability theory to provide approximate solutions to computationally challenging problems. In this article, we will delve into the world of probabilistic algorithms using the Ruby programming language. We will explore their concepts and implementation, and present practical examples with code and tests.

Understanding Probabilistic Algorithms

Probabilistic algorithms are designed to solve problems for which exact solutions are either impractical or computationally infeasible. These algorithms introduce randomness into the computation and use the laws of probability to find approximate solutions with a high probability of correctness. They offer a trade-off between computational complexity and solution accuracy, making them suitable for a wide range of applications.

Monte Carlo Simulation

A popular class of probabilistic algorithms is the Monte Carlo method. It is widely used to estimate values by taking random samples. The basic idea behind Monte Carlo simulations is to generate a large number of random samples and use statistical analysis to estimate a desired quantity.

Let's demonstrate this concept by implementing a simple example: estimating the value of π using a Monte Carlo simulation.

def estimate_pi(iterations)
  in_circle = 0
  total_points = 0

  iterations.times do
    x = rand(-1.0..1.0)
    y = rand(-1.0..1.0)

    distance = Math.sqrt(x**2 + y**2)

    if distance <= 1
      in_circle += 1
    end

    total_points += 1
  end

  4.0 * in_circle / total_points
end

To test the accuracy of our estimation, we can compare it to the known value of π. Let's write a simple test using the RSpec framework:

require 'rspec'

describe 'estimate_pi' do
  it 'approximates the value of pi' do
    estimated_pi = estimate_pi(1_000_000)
    difference = (estimated_pi - Math::PI).abs

    expect(difference).to be < 0.01
  end
end

By running this test, we can verify that our estimate is within an acceptable margin of error.

Markov Chain Monte Carlo (MCMC)

Another powerful application of probabilistic algorithms is Markov Chain Monte Carlo (MCMC) methods. MCMC algorithms are commonly used for sampling from complex probability distributions, especially in Bayesian inference and machine learning.

Let's illustrate the concept with an example of sampling from a Gaussian distribution using the Metropolis-Hastings algorithm, a popular MCMC technique.

def metropolis_hastings(initial, iterations)
  current = initial
  samples = [initial]

  iterations.times do
    proposed = current + rand(-1.0..1.0)

    acceptance_ratio = Math.exp(-((proposed**2 - current**2) / 2))

    if rand < acceptance_ratio
      current = proposed
    end

    samples << current
  end

  samples
end

To test our implementation, we can verify that the generated samples approximately follow a Gaussian distribution. We can use the descriptive_statistics gem to perform statistical analysis on the samples.

require 'descriptive_statistics'

describe 'metropolis_hastings' do
  it 'samples from a Gaussian distribution' do
    samples = metropolis_hastings(0, 10_000)

    expect(samples.mean).to be_within(0.1).of(0)
    expect(samples.standard_deviation).to be_within(0.1).of(1)
  end
end

This test verifies that the mean of the samples is close to 0 and the standard deviation is close to 1, characteristics of a standard Gaussian distribution.

Bloom Filters

Another useful probabilistic algorithm is the Bloom filter. It is a space-efficient data structure that allows us to test whether an element is a member of a set. Bloom filters provide a probabilistic answer, either "possibly in set" or "definitely not in set," with a controlled level of false positives. They are commonly used in situations where memory usage is a concern, such as caching, spell checkers, and network routers.

require 'bitarray'

class BloomFilter
  def initialize(size, hash_functions)
    @size = size
    @hash_functions = hash_functions
    @bit_array = BitArray.new(size)
  end

  def insert(element)
    @hash_functions.each do |hash_function|
      index = hash_function.call(element) % @size
      @bit_array.set(index, true)
    end
  end

  def include?(element)
    @hash_functions.each do |hash_function|
      index = hash_function.call(element) % @size
      return false unless @bit_array.get(index)
    end
    true
  end
end

In this implementation, we use a bit array to represent the filter. The hash_functions parameter is an array of hash functions that map elements to positions in the bit array. The insert method adds elements to the filter, and the include? method checks if an element is possibly in the set.

Let's test the Bloom filter by inserting some elements and checking their membership:

describe 'BloomFilter' do
  it 'returns true for inserted elements' do
    filter = BloomFilter.new(100, [
      ->(element) { element.hash },
      ->(element) { element.to_s.reverse.hash }
    ])

    filter.insert('apple')
    filter.insert('banana')
    filter.insert('orange')

    expect(filter.include?('apple')).to eq(true)
    expect(filter.include?('banana')).to eq(true)
    expect(filter.include?('orange')).to eq(true)
  end

  it 'returns false for non-inserted elements' do
    filter = BloomFilter.new(100, [
      ->(element) { element.hash },
      ->(element) { element.to_s.reverse.hash }
    ])

    filter.insert('apple')
    filter.insert('banana')

    expect(filter.include?('orange')).to eq(false)
    expect(filter.include?('grape')).to eq(false)
  end
end

By running these tests, we can verify the accuracy of our Bloom filter implementation and observe the controlled level of false positives.

Genetic Algorithms

Genetic algorithms are another powerful class of probabilistic algorithms inspired by the process of natural selection. These algorithms iteratively evolve a population of candidate solutions to find an optimal or near-optimal solution to a given problem. Genetic algorithms are commonly used in optimization problems, machine learning, and artificial intelligence.

Let's implement a simple genetic algorithm for solving the "knapsack problem," which involves selecting a combination of items with maximum value while respecting a given weight constraint.

class GeneticAlgorithm
  def initialize(population_size, items, weight_limit)
    @population_size = population_size
    @items = items
    @weight_limit = weight_limit
  end

  def run(generations)
    population = generate_initial_population

    generations.times do
      population = evolve(population)
    end

    population.max_by { |individual| fitness(individual) }
  end

  private

  def generate_initial_population
    Array.new(@population_size) { random_individual }
  end

  def fitness(individual)
    total_value = individual.each_with_index.sum { |gene, index| gene * @items[index][:value] }
    total_weight = individual.each_with_index.sum { |gene, index| gene * @items[index][:weight] }
    total_weight > @weight_limit ? 0 : total_value
  end

  def random_individual
    Array.new(@items.size) { [0, 1].sample }
  end

  def mutate(individual, mutation_rate)
    individual.map do |gene|
      if rand < mutation_rate
        gene == 0 ? 1 : 0
      else
        gene
      end
    end
  end

  def crossover(parent1, parent2)
    crossover_point = rand(1..parent1.size - 1)
    child1 = parent1[0...crossover_point] + parent2[crossover_point..-1]
    child2 = parent2[0...crossover_point] + parent1[crossover_point..-1]
    [child1, child2]
  end

  def evolve(population)
    elites = population.max_by(2) { |individual| fitness(individual) }
    new_population = elites.dup

    while new_population.size < @population_size
      parent1 = select_individual(population)
      parent2 = select_individual(population)
      child1, child2 = crossover(parent1, parent2)
      child1 = mutate(child1, 0.1)
      child2 = mutate(child2, 0.1)
      new_population += [child1, child2]
    end

    new_population
  end

  def select_individual(population)
    population.sample
  end
end

In this implementation, we define a GeneticAlgorithm class that takes the population size, items, and weight limit as parameters. The run method performs the genetic algorithm process by generating an initial population, evolving the population over a specified number of generations, and returning the best individual.

The fitness method calculates the fitness value of an individual by summing the values of the selected items while ensuring that the total weight does not exceed the weight limit. Individuals with a higher fitness value are more likely to be selected for reproduction.

The mutate method randomly mutates an individual by flipping the value of a gene (0 to 1 or 1 to 0) with a certain mutation rate.

The crossover method performs a crossover operation between two parents to create two new children. The crossover point is randomly selected, and genes from each parent are combined to create the children.

The evolve method forms the next generation of the population by selecting elite individuals, performing crossover and mutation operations, and filling the remaining slots with new individuals.

Lastly, the select_individual method randomly selects an individual from the population for reproduction based on a probability distribution proportional to their fitness value.

To test our genetic algorithm implementation, let's write a test for solving the knapsack problem:

describe 'GeneticAlgorithm' do
  it 'solves the knapsack problem' do
    items = [
      { weight: 2, value: 10 },
      { weight: 3, value: 5 },
      { weight: 5, value: 15 },
      { weight: 7, value: 7 },
      { weight: 1, value: 6 },
      { weight: 4, value: 18 },
      { weight: 1, value: 3 }
    ]
    weight_limit = 10

    algorithm = GeneticAlgorithm.new(100, items, weight_limit)
    result = algorithm.run(100)

    expect(result).to eq([1, 0, 0, 1, 0, 1, 0])
    expect(result.sum { |gene| gene == 1 }).to be_within(1).of(3)
  end
end

In this test, we define a knapsack problem instance with a set of items and a weight limit. We create an instance of the GeneticAlgorithm class, passing the population size, items, and weight limit. Then, we run the algorithm for a specified number of generations and expect the result to be a binary array indicating which items to include in the knapsack. We also check that the total number of selected items is within an acceptable range.

Running this test will demonstrate the effectiveness of our genetic algorithm in solving the knapsack problem by finding a combination of items with maximum value while respecting the weight constraint.

The Rain problem...

Here's a Ruby algorithm that calculates the probability of rain on a given day using historical precipitation data:

def calculate_rain_probability(day)
  historical_data = get_historical_data(day)
  rainy_days = historical_data.count { |data| data[:precipitation] > 0 }
  rainy_days.to_f / historical_data.length
end

def get_historical_data(day)
  # Example of historical data (dummy data)
  historical_data = [
    { date: "2023-05-01", precipitation: 0.0 },
    { date: "2023-05-02", precipitation: 0.8 },
    { date: "2023-05-03", precipitation: 0.0 },
    { date: "2023-05-04", precipitation: 0.2 },
    # ... other historical data ...
  ]

  historical_data.select { |data| data[:date] == day }
end

day = "2023-05-20"
probability = calculate_rain_probability(day)

puts "The probability of rain on #{day} is #{(probability * 100).round(2)}%."

In this example, the calculate_rain_probability algorithm takes a date as input and uses a get_historical_data function to retrieve historical precipitation data for that specific date. The algorithm then calculates the probability based on the percentage of rainy days compared to the total number of days in the historical data set.

Note that in the get_historical_data function, an example of dummy historical data has been provided. In a real-world application, you would use a reliable data source to obtain actual historical precipitation data.

Conclusion

Probabilistic algorithms offer a powerful approach to solving complex problems by leveraging randomness and probability theory. In this article, we explored additional examples of probabilistic algorithms, including Bloom filters and genetic algorithms. We implemented these algorithms in Ruby and provided tests to validate their functionality.

By understanding and utilizing probabilistic algorithms, developers can address computationally challenging problems with approximate solutions and gain insights into various fields such as data analysis, optimization, and machine learning.

Top comments (1)

Kevin Luo • May 20 '23

Nice job! That's what the ruby community lacks. Use ruby for science and math! 👏