DEV Community

loading...
Cover image for Graph Analytics with Python -Graph Search-

Graph Analytics with Python -Graph Search-

Shitian Daxiang
Bio := struct {Identity, Undergrad, Research}{ "Ainu descendants", "📊 Data Science", "🔬 Privacy Preserving Data Mining"}
・5 min read

The attempt to find the shortest path on a graph has been applied in real life in devising railroad networks and delivery routes. Given a network $G=(V,E)$ and two vertices $v_s$ and $v_t$, we want to find a path from $v_s$ to $v_t$.

Breadth Firtst Search and Depth First Search

We start with a breadth-first search (BFS), which searches for all vertices adjacent to $v_s$ at distance 1, then all vertices at distance 2, and then all vertices at distance 3, in that order. We use a graph model called Zachary's karate club, which is available in Networkx.

Karate is a Japanese martial art, and the sister of Ran in the popular Japanese anime Detective Conan is a national karate champion.

Alt Text

In bfs_edges, a function of Networkx, we give the target network, the starting point of the search, and an upper limit on the depth of the search. By computing the difference between the edges searched when the upper limit is n and the edges searched when the upper limit is n-1, we can obtain only the edges that lead to vertices with distance n. Since the list is empty at depth 4, we know that from vertex 0 of this network, we can reach all other vertices up to distance 3.

G = nx.karate_club_graph()
plt.figure(figsize=(5,5))
nx.draw_spring(G, node_size=400, node_color="#00C98D", with_labels=True, font_weight="bold")
print("BFS", list(nx.bfs_edges(G, source=0)))

# Depth = 1
d1 = list(nx.bfs_edges(G, source=0, depth_limit=1))
print("depth 1:", d1)

# Depth = 2
d2 = list(nx.bfs_edges(G, source=0, depth_limit=2))
print("depth 2:", list(set(d2) - set(d1)))

# Depth = 3
d3 = list(nx.bfs_edges(G, source=0, depth_limit=3))
print("depth 3:", list(set(d3) - set(d2)))

# Depth = 4
d4 = list(nx.bfs_edges(G, source=0, depth_limit=4))
print("depth 4:", list(set(d4) - set(d3)))
Enter fullscreen mode Exit fullscreen mode

Alt Text

BFS [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (0, 7), (0, 8), (0, 10), (0, 11), (0, 12), (0, 13), (0, 17), (0, 19), (0, 21), (0, 31), (1, 30), (2, 9), (2, 27), (2, 28), (2, 32), (5, 16), (8, 33), (31, 24), (31, 25), (27, 23), (32, 14), (32, 15), (32, 18), (32, 20), (32, 22), (32, 29), (33, 26)]
depth 1: [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (0, 7), (0, 8), (0, 10), (0, 11), (0, 12), (0, 13), (0, 17), (0, 19), (0, 21), (0, 31)]
depth 2: [(1, 30), (2, 28), (2, 9), (5, 16), (8, 33), (2, 32), (31, 24), (31, 25), (2, 27)]
depth 3: [(27, 23), (32, 22), (32, 15), (32, 20), (33, 26), (32, 14), (32, 18), (32, 29)]
depth 4: []
Enter fullscreen mode Exit fullscreen mode

Next is Depth First Search (DFS). This is a search method that selects one vertex at distance 1 adjacent to vertex V_s, then selects one vertex at distance 2 adjacent to it that has not yet been visited, and so on.

print("DFS:", list(nx.dfs_edges(G, source=0)))
print("traversed nodes:", list(nx.dfs_preorder_nodes(G, source=0)))
Enter fullscreen mode Exit fullscreen mode

It shows the list of edges and the list of vertices visited during the depth-first search from vertex 0, computed by dfs_edges and dfs_preorder_nodes, respectively. We can see that from vertex 0, we search 1, 2, 3, 7, backtrack there, and then search 12, 13 from vertex 3.

DFS: [(0, 1), (1, 2), (2, 3), (3, 7), (3, 12), (3, 13), (13, 33), (33, 8), (8, 30), (30, 32), (32, 14), (32, 15), (32, 18), (32, 20), (32, 22), (32, 23), (23, 25), (25, 24), (24, 27), (24, 31), (31, 28), (23, 29), (29, 26), (33, 9), (33, 19), (1, 17), (1, 21), (0, 4), (4, 6), (6, 5), (5, 10), (5, 16), (0, 11)]
traversed nodes: [0, 1, 2, 3, 7, 12, 13, 33, 8, 30, 32, 14, 15, 18, 20, 22, 23, 25, 24, 27, 31, 28, 29, 26, 9, 19, 17, 21, 4, 6, 5, 10, 16, 11]
Enter fullscreen mode Exit fullscreen mode

Alt Text

Dijkstra's Algorithm

If the cost of all edges is the same, width-first search or depth-first search can be used, but there are cases where the distances between edges are different. In such a case, Dijkstra's algorithm is an effective search method. Dijkstra's algorithm is an algorithm to find the shortest path length to each vertex, given a graph and a starting vertex.

#Define Graph
G = nx.Graph()
G.add_nodes_from(range(0, 5))
G.add_weighted_edges_from([(0, 1, 7), (0, 2, 9), (0, 5, 14), (1, 2, 10), (1, 3, 15), (2, 3, 11), (2, 5, 2), (3, 4, 6), (4, 5, 9)])

plt.figure(figsize=(5, 5))
pos = nx.spring_layout(G)
nx.draw_networkx_edges(G, pos)
nx.draw_networkx_nodes(G, pos)
nx.draw_networkx_edge_labels(G, pos, font_size=16, edge_labels={(u, v): d["weight"] for u, v, d in G.edges(data=True)})
nx.draw_networkx_labels(G, pos)
plt.axis("off")
plt.show()
Enter fullscreen mode Exit fullscreen mode

Alt Text

dist_estimate = [999] * nx.number_of_nodes(G)
dist_certainty = [0] * nx.number_of_nodes(G)
dist_estimate[1] = 0

while functools.reduce(operator.mul, dist_certainty) == 0:
  print("Estimation: ", dist_estimate)
  print("Certaintly: ", dist_certainty)
  min_v = 999
  for n in nx.nodes(G):
    if (dist_certainty[n] ==0) and (dist_estimate[n] <= min_v):
      min_v = dist_estimate[n]
      min_id = n
    dist_certainty[min_id] = 1
    for nb in G.neighbors(min_id):
      new_estimate = G[min_id][nb]["weight"] + dist_estimate[min_id]
      if new_estimate < dist_estimate[nb]:
        dist_estimate[nb] = new_estimate
        print("Estimation: ", dist_estimate)
        print("Certaintly: ", dist_certainty)
Enter fullscreen mode Exit fullscreen mode

Initially, the shortest path length estimates are very large integers, and their certainty is zero. The shortest path in the array of path length estimates is considered as a reliable estimate, and the distance of the path through that vertex to the neighboring vertex is calculated. If it is shorter than the previous estimate of the shortest path length, the process is repeated to update the estimate.

Estimation:  [999, 0, 999, 999, 999, 999]
Certaintly:  [0, 0, 0, 0, 0, 0]
Estimation:  [7, 0, 999, 999, 999, 999]
Certaintly:  [1, 1, 0, 0, 0, 0]
Estimation:  [7, 0, 10, 999, 999, 999]
Certaintly:  [1, 1, 0, 0, 0, 0]
Estimation:  [7, 0, 10, 15, 999, 999]
Certaintly:  [1, 1, 0, 0, 0, 0]
Estimation:  [7, 0, 10, 15, 999, 999]
Certaintly:  [1, 1, 0, 0, 0, 0]
Estimation:  [7, 0, 10, 15, 999, 12]
Certaintly:  [1, 1, 1, 0, 0, 0]
Estimation:  [7, 0, 10, 15, 999, 12]
Certaintly:  [1, 1, 1, 0, 0, 0]
Estimation:  [7, 0, 10, 15, 21, 12]
Certaintly:  [1, 1, 1, 1, 0, 0]
Estimation:  [7, 0, 10, 15, 21, 12]
Certaintly:  [1, 1, 1, 1, 0, 1]
Enter fullscreen mode Exit fullscreen mode

Conclusion

In this article, I explained the main search methods for graphs: breadth-first search, depth-first search, and Dijkstra's algorithm. There are other search methods for graphs, so please refer to the following links and try to implement them.

Graph Data Structure

Discussion (0)