In the previous article, we explored the definition of a graph and gave some brief examples of how they are represented in computer memory. Today's article will focus on some basic graph algorithms showcasing graph traversals.
Graph traversal is the process of visiting each node of a graph. Such traversals fall into two categories, based on the order in which the nodes are visited: depth-first search (which we will look at today) and breadth-first search (for next week!). For the rest of the article, let G be a graph with N nodes and E edges.
Before talking about traversals themselves, there is some graph terminology we have to introduce.
Below, I will be using the notion of degree when speaking of a node. The degree of a node is the number of direct connections that node has to other nodes in the graph.
For example, in the image above, the degree of node 1 is 3, the degree of nodes 2 and 4 is 2, and the degree of node 3 is 1.
In an undirected graph, the sum of the degrees of all nodes equals twice the number of edges (for any nodes i and j such that there is an edge between them, that edge contributes 1 to the degree of i and 1 to the degree of j, so it is counted twice overall).
When speaking about directed graphs, for any node i we distinguish an in-degree (the number of edges that lead into i) and an out-degree (the number of edges that leave from i).
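To make the notion concrete, here is a small sketch computing degrees from adjacency lists. The edges (1, 2), (1, 3), (1, 4) and (2, 4) are my assumption, chosen to match the degrees stated for the example graph, since the image itself is not reproduced here:

```python
# Undirected example graph, assumed edges: (1, 2), (1, 3), (1, 4), (2, 4).
adjacency = {
    1: [2, 3, 4],
    2: [1, 4],
    3: [1],
    4: [1, 2],
}

# The degree of a node is simply the length of its adjacency list.
degrees = {node: len(neighbours) for node, neighbours in adjacency.items()}
print(degrees)  # {1: 3, 2: 2, 3: 1, 4: 2}

# Sanity check: the sum of all degrees is twice the number of edges (here, 4 edges).
assert sum(degrees.values()) == 2 * 4
```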
A walk in a graph is a sequence of nodes such that for any two consecutive nodes i and j, the edge (i, j) exists. Note that both nodes and edges may be repeated. For example, in the graph we presented above, 1 -> 3 -> 1 -> 2 -> 4 is a walk.
A trail in a graph is a walk with no repeated edges. For example, in the graph above, 1 -> 2 -> 4 or 2 -> 4 -> 1 -> 3 are trails.
A cycle in a graph is a trail whose first and last nodes coincide, with no other node repeated. In our graph, 1 -> 2 -> 4 -> 1 is a cycle.
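These definitions translate almost directly into code. The sketch below checks whether a sequence of nodes is a walk and whether it is also a trail; the helper functions are my own illustrations, and the edges are assumed from the example graph:

```python
# Undirected example graph, assumed edges: (1, 2), (1, 3), (1, 4), (2, 4).
adjacency = {1: [2, 3, 4], 2: [1, 4], 3: [1], 4: [1, 2]}

def is_walk(nodes):
    """Every pair of consecutive nodes must be joined by an edge."""
    return all(b in adjacency[a] for a, b in zip(nodes, nodes[1:]))

def is_trail(nodes):
    """A trail is a walk that repeats no edge (undirected, so we sort each pair)."""
    if not is_walk(nodes):
        return False
    edges = [tuple(sorted(pair)) for pair in zip(nodes, nodes[1:])]
    return len(edges) == len(set(edges))

print(is_walk([1, 3, 1, 2, 4]))   # True: a walk may repeat nodes and edges
print(is_trail([1, 3, 1, 2, 4]))  # False: the edge (1, 3) is used twice
print(is_trail([2, 4, 1, 3]))     # True: no edge is repeated
```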
The depth-first search algorithm follows a recursive approach and aims to traverse the graph in depth, by following these steps:
- start the traversal from an arbitrary node, let it be i; mark i as visited.
- search for the unvisited neighbours of i.
- if an unvisited neighbour, let it be j, is found, then the algorithm makes a recursive call to step 1, now starting from node j instead of i.
- if there are no unvisited neighbours left, the call ends and the algorithm moves up the recursive stack.
Applying this recursive approach to a node i (finding its neighbours and making the corresponding recursive calls) can also be called, for short, expanding i.
Now, let's have a look at the time complexity of the DFS algorithm when representing G using each of the methods showcased in the previous article:
- G is represented using an adjacency matrix, let it be A of size N * N. To find the neighbours of an arbitrary node i, we have to loop through all N columns of the i-th line of the matrix. We do this exactly N times, as we can't expand a node more than once, and therefore the complexity of DFS using an adjacency matrix is O(N*N).
- G is represented using adjacency lists, let A[i] denote the list of the neighbours of i. Then, for each node i, we loop through its neighbours, equal in number to the degree of i. Summed over all nodes, the degrees add up to twice the number of edges, so the total cost of the traversal is O(N+E).
As we said in the previous article, for a dense graph (that is, when E approaches N*N) the adjacency matrix approach can prove more effective, as it uses considerably less memory per edge while keeping the same asymptotic complexity: when E approaches N*N, O(N+E) is O(N*N).
Let us now see an implementation of DFS, using a graph represented by adjacency lists. To avoid duplicate code, during this series, I will use the adjacency_list.py I wrote for the first article, which includes a basic class definition for managing graphs represented by adjacency lists.
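Since adjacency_list.py is not reproduced here, the sketch below bundles a minimal version of such a graph manager class together with the recursive DFS; the class and method names are my own, so the article's actual snippet may differ in detail:

```python
class Graph:
    """Minimal undirected graph managed through adjacency lists."""

    def __init__(self, number_of_nodes):
        self.adjacency = {node: [] for node in range(1, number_of_nodes + 1)}

    def add_edge(self, i, j):
        # Undirected: the edge is recorded in both adjacency lists.
        self.adjacency[i].append(j)
        self.adjacency[j].append(i)

    def dfs(self, node, visited=None):
        """Recursive depth-first search; returns the nodes in visiting order."""
        if visited is None:
            visited = []
        visited.append(node)                      # mark the node as visited
        for neighbour in self.adjacency[node]:    # search its neighbours
            if neighbour not in visited:          # expand only unvisited ones
                self.dfs(neighbour, visited)
        return visited                            # call ends, move up the stack

# Assumed edges for the example graph used when explaining node degree.
graph = Graph(4)
for i, j in [(1, 2), (1, 3), (1, 4), (2, 4)]:
    graph.add_edge(i, j)
print(" ".join(str(node) for node in graph.dfs(1)))  # 1 2 4 3
```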
This snippet runs on the graph used above when explaining node degree and its output is "1 2 4 3".
The algorithm starts from 1, searching for its neighbours (which are 2, 3, 4). The first one found is 2, and then the algorithm searches for 2's unvisited neighbours, finding 4. Node 4 has no unvisited neighbours left, and the algorithm moves up the stack. Same goes for 2. Node 3 is the last of node 1's unvisited neighbours, resulting in the output.
If G is a directed graph with no cycles (also called a directed acyclic graph, or DAG for short), then the topological sorting of the nodes of graph G is a linear ordering of nodes such that for any edge (i, j), i comes before j in the ordering.
This sorting has a wide range of real world uses, most notably helping organize a schedule of tasks based on their dependencies.
For example, the graph above can represent dependencies between tasks (e.g., task 2 should be started only after finishing task 1). By finding a topological sorting of the graph, we can determine an order in which to start the tasks so that every task's dependencies are finished before it begins.
A simple way to do this is to first find a node with in-degree 0 (in our case, node 1) and start a depth-first search with it as the root. Instead of outputting a node as soon as it is visited, we push it onto a stack only after all of its neighbours have been visited. The reversed stack then holds a topological sorting of our graph, as the last node pushed onto the stack is the root.
For implementing topological sorting, I have modified the graph class we used before to handle a directed graph, and also calculated the in-degree of each node at initialisation:
Then, I implemented the modified DFS using the modified version of the graph manager class as the parent:
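As the modified class and snippet are not embedded here, below is a self-contained sketch of the idea. The edges 1 -> 2, 1 -> 4, 2 -> 3 and 3 -> 5 are my assumption, chosen to be consistent with the output quoted for the example; the article's actual directed graph may differ:

```python
class DirectedGraph:
    """Directed graph with adjacency lists; in-degrees computed at initialisation."""

    def __init__(self, edges):
        self.adjacency = {}
        self.in_degree = {}
        for i, j in edges:
            self.adjacency.setdefault(i, []).append(j)
            self.adjacency.setdefault(j, [])
            self.in_degree[j] = self.in_degree.get(j, 0) + 1
            self.in_degree.setdefault(i, 0)

    def topological_sort(self):
        visited, stack = set(), []

        def dfs(node):
            visited.add(node)
            for neighbour in self.adjacency[node]:
                if neighbour not in visited:
                    dfs(neighbour)
            stack.append(node)  # pushed only after all neighbours are visited

        # Root the search at nodes with in-degree 0 (nothing depends on them).
        for node in self.adjacency:
            if self.in_degree[node] == 0 and node not in visited:
                dfs(node)
        return stack[::-1]  # the reversed stack is a topological ordering

graph = DirectedGraph([(1, 2), (1, 4), (2, 3), (3, 5)])
print(" ".join(str(node) for node in graph.topological_sort()))  # 1 4 2 3 5
```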
For the directed graph above, the output of the snippet is "1 4 2 3 5", an ordering that satisfies the conditions of our definition. Note, though, that a topological sorting of a graph is not necessarily unique. For our example, "1 2 3 5 4" is also a correct ordering.
This was a brief introduction into depth first search and one of its real world applications. Next week, we will be talking about the other graph traversal - breadth first search (or BFS for short), and we will also showcase some of its powers.
Until then, stay tuned, and why don't you delight yourself with some of the other articles in The Magic of Computing series? The part about the Fibonacci numbers is quite a hit!