stevensonmt

Posted on

# Prim's algorithm in Elixir

Prim's algorithm is a greedy algorithm for finding the lowest cost tree that includes all nodes of a graph, also called a minimum spanning tree. There are lots of blog posts and tutorials out there about implementing this, but I found it difficult to find information on implementations in functional languages or a functional style with immutable data structures.

In the implementation below, the nodes of the graph are given as an unordered list. I found it easiest to work with the list as a tuple that could be accessed by index. The collection of nodes is not going to be mutated (no additions or insertions) but is going to be frequently accessed. Because accessing a tuple by index is quite fast I felt this was a good approach. That said, it is not necessary and may actually make this less "functional" in style.

In Prim's algorithm you need two mathematical sets -- the set of all visited nodes (i.e., the nodes included in the minimum spanning tree) and the set of all nodes in the graph. You choose an arbitrary node to start and iterate over the unvisited nodes to find the minimum distance to connect to the MST. The algorithm is completed when these two sets are equivalent. In the implementation below I have chosen to simply track the set of unvisited nodes, with the algorithm being complete when the set is empty. This is because in this case I'm only interested in the cost of the MST, not the MST itself. If I needed to return the MST I would track the visited nodes alongside the original set of nodes.

You also need a way to know the cost of any edge in the graph. In this case I am assuming the graph is given as a series of {x, y} coordinate pairs and the weight of an edge is given by the Manhattan distance between them.

To begin we instantiate the set of unvisited nodes, the running "cost" of the minimum spanning tree, and a priority queue. In this instance I'm using a general balanced tree as the priority queue structure. I know they are not exactly equivalent, but Erlang has builtin GBT and I did not want to implement a priority queue from scratch.

The keys of the GBT are the distances, and the values of the GBT are a list of nodes that connects with at least one edge at that key distance. Because I'm using a GBT as a queue and the keys must be unique, if I don't use a list as the value of each GBT node the queue will drop nodes and end up being incorrect. This is the only issue I encountered with using GBT as the priority queue.

``````  def min_cost_connect_points(points) do
vertices =
points
|> Enum.with_index()
|> Enum.map(fn {pt, i} -> {i, pt} end)
|> Enum.into(%{})

len = length(points)

candidates = MapSet.new(0..len - 1)
# This will be our set of unvisited nodes, well, indices
# of nodes.

queue = :gb_trees.enter(0, , :gb_trees.empty())
# We initiate the priority queue with an arbitrary node
# that has a minimum distance of zero. The node index is
# wrapped in a list and entered into the GBT with key 0.

cost = 0
do_prims(vertices, candidates, queue, cost)
end

defp do_prims(vertices, candidates, queue, cost) do
if MapSet.size(candidates) == 0 do
# When the set of unvisited nodes (`candidates`) is
# empty we're done.
cost
else
{min_dist, [next_vertex | rest], new_queue} =
:gb_trees.take_smallest(queue)
# take_smallest/1 returns a 3 tuple of {key, value, new_tree}
# where the new_tree is the original tree with the node that
# has the smallest key removed.
# The next line is necessary to avoid 'deduplicating' the
# queue of nodes that have min_dist as a potential edge cost.
# Arbitrarily taking the head of the list does not impact the
# correctness of the algorithm.
new_queue = if rest == [] do
new_queue
else
# If multiple nodes have an edge where distance == min_dist,
# we need to put the min_dist key back in the queue with the
# rest of those nodes as the value.
:gb_trees.enter(min_dist, rest, new_queue)
end
if not MapSet.member?(candidates, next_vertex) do
# If the node we chose as the current smallest distance edge
# from the queue has already been visited (i.e., removed from
# the candidates set), then we skip next_vertex and try again
do_prims(vertices, candidates, new_queue, cost)
else
# If next_vertex has not been visited yet, we mark it as
# visited and update the queue with edges connecting to
# next_vertex. Note that while the queue may include edges
# having been visited, this step will only ever introduce
# edges for nodes that have not been visited yet.
candidates = MapSet.delete(candidates, next_vertex)
updated_queue =
candidates
|> Enum.reduce(new_queue, fn i, q ->
d =
distance(vertices[i], vertices[next_vertex])
# Here's where :gb_trees could really use an ergonomic
# update_or_insert function. There is an update/3 but it
# crashes if the key is not present. It also does not take
# a function as a parameter for the update, so it is less an
# update as a replace method. That's why we first check if the
# distance is already a key in the GBT. If it is, we have to
# get the value of that key and add the current node to the
# list. Otherwise we can just wrap the current node in a list.
is = if :gb_trees.is_defined(d, q) do
is = :gb_trees.get(d, q)
[i | is]
else
[i]
end
:gb_trees.enter(d, is, q)
end)
# We have updated the candidates list and the queue, so we
# recurse with the updated cost.
do_prims(vertices,
candidates,
updated_queue,
cost + min_dist)
end
end
end

defp distance([a, b], [c, d]), do: abs(a - c) + abs(b - d)
``````

I hope someone finds this helpful. If anyone has tips on optimizations or corrections to my understanding of Prim's algorithm I'd love to hear them. Thanks for reading!