Nikhil Thomas

Posted on May 16, 2020 • Originally published at nthomas.org

Learning Tarjan's Algorithm - solving Leetcode to learn graph theory

#computerscience #graphs #algorithms

Intro

After reading source code for bundlers, linters, compilers, and other projects I've been convinced that everything is a graph. Well, not everything but enough. So I began digging into practice algorithm questions on Leetcode, AlgoExpert and other websites. I'm pretty comfortable at coding up BFS and DFS when it's obviously a matter of "search for this element" but knowing when to model something as a graph when it's a little more vague is a challenge.

One of my personal goals is being able to code up most Leetcode hard questions in under 20 minutes. My hope is that practicing repeatedly will give me enough exposure to graph and tree traversal algorithms, array sorting and searching, etc so when I do write new projects at work or contribute to open source, patterrn recognition will kick in. Most Leetcode hard questions are fairly well written, and if not at least have a well authored solution. However, no matter what, Leetcode 1192 stumped me. It's flagged as hard, rated fairly highly on frequency of interviews seen, yet there's no canonical solution and many of the user answers aren't too clear.

I spent some time doing my own reading outside of Leetcode. I didn't find many of the Youtube videos to be helpful. However, in a bit of luck I stumbled onto these lecture notes from an NYU course on data structures. It happens to be the final lecture for the entire course, which is somewhat satisfying to know that at least my graph theory knowledge only has holes at the later stages of education.

Description

First off, I highly recommend reading the problem once or twice. Then, take a best effort at solving it!

Now let's dig in.

There are n servers numbered from 0 to n-1 connected by undirected server-to-server connections forming a network where connections[i] = [a, b] represents a connection between servers a and b. Any server can reach any other server directly or indirectly through the network.

A critical connection is a connection that, if removed, will make some server unable to reach some other server.

Return all critical connections in the network in any order.

Our input parameters are n, the number of nodes in our graph, and an edge list. We know the graph is undirected (that is, node A points to node B and node B points to node A).

We are looking for critical connections, or the edges that will effectively break our graph into two smaller graphs. That is

before

    1 <----> 2 <----> 3
    ^        ^
    |        |
    v        v
    4 <----> 5

If we broke the connection between 2 and 3, and then our graph breaks into two parts and nodes 1, 2, 4 and 5 are unreachable from 3.

after

    1 <----> 2      3
    ^        ^
    |        |
    v        v
    4 <----> 5

The key insight here is that we're looking for the bridge between strongly connected components. Luckily this is a solved problem, and Tarjan's algorithm will come to the rescue.

Let's come up with a quick definition of a strongly connected component that works for us. A strongly connected component can be thought of as a graph in which no one edge is the weak link in the graph. Or, all edges have at least one backup to keep the graph tied together if it disappears.

If we take a set of nodes and number them in the order we visit them (like a timestamp, or just a monotonically increasing id), we can end up building an array of IDs. Let's call that the dfsNumber array, is for any node i the order we visited it in a depth first search.

Additionally, we want to know how far "backwards" into a graph a node can reach. If we know nothing about a node, the "oldest" seen node that node N can see is itself. Once we start to look at neighbor nodes, we can assert that the oldest node a neighbor can reach is the oldest node any of its neighbors can reach (ignoring the direct parent we came from). We can hold oldestReachable in an array, where oldestReachable[i] indicates the oldest timestamp reachable from node i

Let's visualize this. I'll use letters for node names so dfsNumbers are clear.

    A  <--->  B
    ^         ^
    |         |
    v         v
    D  <--->  C

If we start at node named A, we label it with a dfsNumber of 1, and we also assert that right now, the oldestReachable for A is also 1. Now we look at its neighbors and travel ONLY if they have no dfsNumber. Once we label it's neighbors, we look to see how far back thhe neighbor can reach.

So we'll pick B, give it a dfsNumber and oldestReachable of two. Then we dfs into C, labeled with 3, then D as 4. Remember this is DFS, so we have a call stack. So

1) Visit and label A
2) visit and label B
3) visit and label D
4) visit and label D. Then we visit A.
5) A has a label so we do nothing. What's the oldest D can reach? Currently, it's 4. But we look. Can a neighbor reach older? Yes! A can reach older (itself)! So we declare that, oldestReachable for D is equal to oldestReachable for A, or oldestReachable[D] = 1

now we unwind the call stack.

What's the oldest C can reach? Well, since we said D can reach to A and C can reach D, C can reach A or oldestReachable[C] = 1

As we unwind the stack, we keep assigning all oldestReachable values to 1.

Now when would this be false? What's an example of a graph where some components do not have neighbors that can reach back into the graph? Another visualization and example.

    A  <--->  B
    ^      /  ^
    |    /    |
    v  /      v
    D         C

A points to B and D. B points to A, D and C. C points to B. D points to B and D. Let's again begin our traversal, starting at A

1) visit and label A. dfsNumber of 1, oldestReachable is itself of 1. Pick a neighbor - randomly select D
2) Visit and label D. dfsNumber of 2, oldestReachable is itself of 2. Pick a neighbor. We just came from A, so we have to go to B.
3) Visit and label B. dfsNumber of 3, oldestReachable is itself of 3. Pick a neighbor. We just came from D, so we have to go to C.
4) Visit and label C. dfsNumber of 4, oldestReachable is itself of 4. Pick a neighbor. We just came from B, so we have nowhere to go. Unwind the call stack and go back to B
5) B's oldestReachable is the lowest of its own oldestReachable and the oldestReachable of the neighbor we just procesesed (D). D = 4, B = 3. So D cannot reach further back into the graph on its own. Therefore we know that, if we remove the edge between B and D, D becomes part of a new graph.

    A  <--->  B
    ^      /
    |    /
    v  /
    D         C  (ALONE)

Pseudocode

Let's try and write this in pseudocode

  numberOfNodes = n
  dfsNumber = new Array, numberOfNodes in length
  oldestReachable = new Array, numberOfNodes in length
  timestamp = 0

  dfs(node, parent, graph)
    timestamp++
    dfsNumber[node] = timestamp
    oldestReachable[node] = timestamp

    for neighbor of graph[node]
      if neighbor is parent, continue
      if !dfsNumber(neighbor) dfs(neighbor, node, graph) // unlabeled neighbor

      // check if neighbor can reach deeper. If yes, reset our oldestReachable

      oldestReachable[node] = min of oldestReachable[node], oldestReachable[neighbor]

      // check if this neighbor's oldest reachable isn't as old as where we are now

      if oldestReachable[neighbor] > dfsNumber[node]
        collect [node, neighbor]


  dfs(idx, null, graph)

Solution

Now, Leetcode gives us an edge list but Tarjan's algorithm requires an adjacency list. So the first thing we do is process our data. Otherwise we follow our steps.

var criticalConnections = function(n, connections) {
  const adjacency = {}
  for (let i = 0; i < n; i++) {
    adjacency[i] = []
  }

  for (let [node1, node2] of connections) {
    adjacency[node1].push(node2)
    adjacency[node2].push(node1)
  }

  const dfsNumber = new Array(n).fill(0)
  // this is what we called oldestReachable
  // the lowest number reachable by dfs from a node
  const dfsLow = new Array(n).fill(0)
  const criticalEdges = []
  let timestamp = 0

  function tarjan(node, parent, adjacency, edges) {
    timestamp++
    dfsNumber[node] = timestamp
    dfsLow[node] = timestamp

    for (let neighbor of adjacency[node]) {
      if (neighbor === parent) continue
      if (!dfsNumber[neighbor]) tarjan(neighbor, node, adjacency, edges)

      // resetting oldestReachable if neighbor can cycle back
      dfsLow[node] = Math.min(dfsLow[node], dfsLow[neighbor])

      // if neighbor cannot reach back to me or oldest, we have a critical edge
      if (dfsLow[neighbor] > dfsNumber[node]) edges.push([node, neighbor])
    }
  }

  tarjan(0, null, adjacency, criticalEdges)

  return criticalEdges
}

And lo and behold, our solution works.

Let me know what you thought. If these kinds of articles are useful, and I should write more of them I'll happily do so. I'm most easily reached on Twitter (@nikhilthomas90).

DEV Community

Learning Tarjan's Algorithm - solving Leetcode to learn graph theory

Intro

Description

Pseudocode

Solution

Top comments (0)

Read next

DSA: Heap - Key Questions and Challenges

Representing and Manipulating Information in modern computer - Part 1

Java Inner Classes and Nested Classes

1277. Count Square Submatrices with All Ones