DEV Community

Deepak Rajamohan
Deepak Rajamohan

Posted on • Originally published at Medium on

Learning Tarjans algorithm with a NY subway map

Recently I had to go through many implementations provided for Tarjan algorithm for solving graph questions in coding websites (of course while preparing for interviews). It took me a while to understand the basics of the algorithm and once it was clear, the simplicity of it’s idea was impressive.

The Problem:

Say, we are taking the Subway from Columbus circle to JFK airport using the red or green lines. How to find if an outage in the green line will block our journey altogether ?

In graph theory, a group of nodes are part of a “ Sub Tree” if the nodes are strongly connected i.e, all nodes are accessible even after one edge or one node is removed. A critical connection is a loose connection between two subtrees and if removed creates two separate graphs. Sub-trees are vital to identification of articulation points and critical connections.

Directed graph problems like “find strongly connected nodes”, “critical connections”, “articulation points” are all asking a similar question in different contexts.

The problem statement can thus be written as, “How do we identify if there is a critical connection during our travel to JFK from Columbus circle ?”

The Solution:

The Tarjan algorithm starts with doing a Depth first search (DFS) on each node and attempts to identify a group of “subtrees” in a directed graph.

from collections import defaultdict

class Node:
    def __init__ (self) :
        self.succ = []
        self.id = None
        self.name = None

def tarjans(graph) :
    rank_index = [-1 for _ in range(len(graph) + 1)]
    low_index = [-1 for _ in range(len(graph) + 1)]
    result = []
    counter = {
        "index" : 0
    }
Enter fullscreen mode Exit fullscreen mode

So let’s see what each of those declared variables above are; “counter” acts as a global counter that is incremented for every newly discovered node in the dfs. The incremented value of the counter variable is used to rank the newly visited node and assigned to the node using the “rank_index” variable.

In the above map, all stations to the right of woodside are strongly connected. But the 31st station is loosely connected to the woodside subtree (i.e., stations in Brooklyn and Queens). Can we find that with an algorithm ? Tarjans ranking discovery of nodes helps with that.

Let’s start traversing the stations from Columbus (assign rank 0) and assign rank indexes incrementally 48th street - (1), 31st -(2) and Penn station -(3).

At its foundation, the Tarjans algorithm attempts to find a “low rank Index” value for each node. This “LOW” function is discussed in detail below.

    def strongconnect(v, parent) :
        rank_index[v.index] = count["index"]
        low_index[v.index] = count["index"]
        count["index"] += 1

        for w in v.succ:
            if low_index[w.index] == -1 :
                strongconnect(w, v) #dfs to w from parent v
                low_index[v.index] = min(low_index[v.index], low_index[w.index])
            else :
                low_index[v.index] = min(low_index[v.index], low_index[w.index])
Enter fullscreen mode Exit fullscreen mode

Moving from Columbus to Penn station, we attempt to find what is the lowest rank value that we are able to discover for every dfs. For this we use the “low_index” variable.

For our initial search the low rank index for all the nodes in the dfs path 48th street, 31st and Penn station is ‘0’ since the dfs path reached back to Columbus which had a rank (0). The “LOW” function helps to track cycles in the graph and thus identifies a subtree !

Next, let’s keep proceeding with the original dfs that started from Columbus circle to the right of the graph.

Here the children in the dfs path from Woodside discovers that the lowest rank is (4) since the path winds back to woodside. But the dfs starting at woodside knows that the originally assigned rank has not changed, which makes it a critical node, since “ Woodside ” is the origin of a new subtree !

The most important aspect of the ranking discovery of lowest indexes is the ability to find if a predecessor node is reachable from the current node, which indicates a strongly connected subtree !

    def strongconnect(v, parent) :
        rank_index[v.index] = count["index"]
        low_index[v.index] = count["index"]
        count["index"] += 1

        for w in v.succ:
            if low_index[w.index] == -1 :
                strongconnect(w, v)
                low_index[v.index] = min(low_index[v.index], low_index[w.index])
            else :
                low_index[v.index] = min(low_index[v.index], low_index[w.index])

        if low_index[v.index] == index[v.index] and v != parent :
            result.append([parent.name, v.name])
Enter fullscreen mode Exit fullscreen mode

In the last line in the “strongconnect” function above, after iterating all the child nodes, we compare the low ranking index value of the current node with it’s originally assigned value. If they match, it indicates that none of the DFS led to the discovery of a predecessor and this node is root of a subtree.

Now lets right some test cases and check if our algorithm works !

edges_list = [[
            ('Columbus', '48th'),
            ('48th', '31st'),
            ('31st', 'Penn'),
            ('Penn', 'Columbus'),
            ('31st', 'woodside'),
            ('woodside', 'JFK'),
            ('JFK', 'EastNY'),
            ('EastNY', '14th'),
            ('14th', 'woodside'),
           ]]

for edges in (edges_list) :
    for critical_connection in tarjans(create_graph(edges)) :
        print(critical_connection)
    print()
Enter fullscreen mode Exit fullscreen mode

Running the above code would give a result as [‘31st’, ‘woodside’] !! Following is the complete source code.

from collections import defaultdict

class Node:
    def __init__ (self) :
        self.succ = []
        self.index = None
        self.name = None

def tarjans(graph) :
    rank_index = [-1 for _ in range(len(graph) + 1)]
    low_index = [-1 for _ in range(len(graph) + 1)]
    result = []
    count = {
        "index" : 0
    }

    def strongconnect(v, parent) :
        rank_index[v.index] = count["index"]
        low_index[v.index] = count["index"]
        count["index"] += 1

        for w in v.succ:
            if low_index[w.index] == -1 :
                strongconnect(w, v)
                low_index[v.index] = min(low_index[v.index], low_index[w.index])
            else :
                low_index[v.index] = min(low_index[v.index], low_index[w.index])

        if low_index[v.index] == index[v.index] and v != parent :
            result.append([parent.name, v.name])

    for v in graph :
        # not yet visited
        if low_index[v.index] == -1 :
            strongconnect(v, v)
    return result

def create_graph(edges):
    nodes = defaultdict(Node)
    index = 0
    for v,w in edges :
        nodes[v].succ.append(nodes[w])
        if not nodes[v].index :
            nodes[v].index = index
            nodes[v].name = v
            index += 1
        if not nodes[w].index :
            nodes[w].index = index
            nodes[w].name = w
            index += 1
    return nodes.values()
Enter fullscreen mode Exit fullscreen mode

References:

  1. https://www.geeksforgeeks.org/articulation-points-or-cut-vertices-in-a-graph/
  2. https://en.wikipedia.org/wiki/Tarjan%27s_strongly_connected_components_algorithm

Top comments (0)