DEV Community

Leo Pfeiffer
Leo Pfeiffer

Posted on

Topological Sort with Kahn's Algorithm

On a recent project, we were working on a spreadsheet-like tool where users could cross-reference different rows in formula, just like in Excel. We ran into the question of how to order the evaluation of the rows such that the dependencies do not cause any conflict.

For example, imagine we have three rows:

Row id Formula Rendered values
row1 manual input 1, 2, 3
row2 row3 * 2 6, 8, 6
row3 row1 + 1 2, 3, 4

row1 can be rendered directly, as its formula does not depend on any other rows. row2 however, references row3. row3 references row1. Thus in order to render row2, we first need to render row3.

It turns out, this is quite a common problem, for example in resolving the import order of different modules in a project.

We can solve these types of problems using topological sorting, which takes a DAG (directed acyclical graph) and orders its nodes in a way that for every edge x -> y, x comes before y in the ordering.

Kahn's algorithm is one algorithm that can establish this topological order.

Kahn's Algorithm: Theory

The idea behind Kahn's algorithm is simple:

  1. Put all nodes that have no dependencies in your order
  2. Remove all occurrences of those nodes from the dependencies of all remaining nodes. If there are no cycles, this will give you new nodes without dependencies.
  3. Repeat until there are no more nodes left.

For example:

Node Depends on
a b, c
b d
c d
d -

Here's the dependency graph (where x -> y means that x depends on y.

Example DAG

Thus:

  1. d has no dependencies. Put it in the order.
    Order = [d]

  2. Remove occurrences of d from the other dependencies. d occurs in the dependencies of b, c. Thus, after removing d from those dependencies, neither b nor c have any dependencies.

Example DAG Step 1

  1. b and c have no dependencies. Put then in the order.
    Order = [d, b, c]

  2. Remove occurrences of b and c from the other dependencies. Both b and c occur in the dependencies of a. Thus, after removing b and c from those dependencies, a has no more dependencies.

Example DAG Step 2

  1. a has no dependencies. Put it in the order. Order = [d, b, c, a]

Example DAG Step 3

  1. No more nodes left. Done. Final order = [d, b, c, a]

Kahn's Algorithm: Implementation

For our implementation we assume as input a dictionary where the keys are the node names and the values are a list of all nodes that the depend on the key.

For example, if nodes B and C depend on A, then on entry in the dictionary would be A: [B, C].

Translating the theory from the previous section into Python code, a possible implementation looks like this:

def topological_sort(nodes: dict[str, list[str]]) -> list[str]:
    """
    Topological sort for a network of nodes

        nodes = {"A": ["B", "C"], "B": [], "C": ["B"]}
        topological_sort(nodes)
        # ["A", "C", "B"]

    :param nodes: Nodes of the network
    :return: nodes in topological order
    """

    # Calculate the indegree for each node
    indegrees = {k: 0 for k in nodes.keys()}
    for name, dependencies in nodes.items():
        for dependency in dependencies:
            indegrees[dependency] += 1

    # Place all elements with indegree 0 in queue
    queue = [k for k in nodes.keys() if indegrees[k] == 0]

    final_order = []

    # Continue until all nodes have been dealt with
    while len(queue) > 0:

        # node of current iteration is the first one from the queue
        curr = queue.pop(0)
        final_order.append(curr)

        # remove the current node from other dependencies
        for dependency in nodes[curr]:
            indegrees[dependency] -= 1

            if indegrees[dependency] == 0:
                queue.append(dependency)

    # check for circular dependencies
    if len(final_order) != len(nodes):
        raise Exception("Circular dependency found.")

    return final_order

Enter fullscreen mode Exit fullscreen mode

Let's try this with a slightly more evolved network:

Graph #2

The corresponding input looks like this:

nodes = {
    "A": ["B", "C"],
    "B": ["C", "D"],
    "C": ["F"],
    "D": ["F"],
    "E": ["F"],
    "F": []
}
Enter fullscreen mode Exit fullscreen mode

If we run this we get the following order:

order = topological_sort(nodes)
print(order)

# ['A', 'E', 'B', 'C', 'D', 'F']
Enter fullscreen mode Exit fullscreen mode

Take another look at the graph and convince yourself that that's indeed a correct order that avoids dependency clashes!

Top comments (0)