DEV Community

loading...

Graph Analytics with Python -Node Similarity-

Shitian Daxiang
Bio := struct {Identity, Undergrad, Research}{ "Ainu descendants", "📊 Data Science", "🔬 Privacy Preserving Data Mining"}
・2 min read

Node Similarity

There are many link prediction methods to predict what edges will be formed from a given graph. One of the simplest approaches is to calculate the similarity between nodes, assuming that edges are likely to be formed between nodes with high similarity. The following four methods are used to calculate the similarity of nodes.

  1. Common Neighbors
  2. Jaccard Coefficient
  3. Admic/Adar
  4. Preferential Attachment

1. Common Neighbors

In Common Neighbors, given a graph G and two vertices v and w, the set of nodes adjacent to v is denoted by
Γ(v), and the following formula is used to calculate the similarity between v and w.

Score(v,w)=Γ(v)Γ(w) Score(v,w) = |\Gamma (v) \cup \Gamma (w)|

In Common Neighbors, given a graph G and two vertices v and w, the set of nodes adjacent to v is denoted by Γ(v), and the following formula is used to calculate the similarity between v and w.

import networkx as nx
import matplotlib.pyplot as plt
G = nx.karate_club_graph()
plt.figure(figsize=(5,5))
nx.draw_spring(G, node_size=400, node_color="#00C98D", with_labels=True, font_weight="bold")
x = 4
y = 5
print("vertex pair:", x, "and", y)
print("neighbors of", x, ":", list(G.neighbors(x)))
print("neighbors of", y, ":", list(G.neighbors(y)))
print("degree of", x, ":", G.degree(x))
print("degree of", y, ":", G.degree(y))
print("common neighbors:", len(list(nx.common_neighbors(G, x, y))))
Enter fullscreen mode Exit fullscreen mode
vertex pair: 4 and 5
neighbors of 4 : [0, 6, 10]
neighbors of 5 : [0, 6, 10, 16]
degree of 4 : 3
degree of 5 : 4
common neighbors: 3
Enter fullscreen mode Exit fullscreen mode

2. Jaccard Coefficient

Common Neighbors was simply the number of common neighboring nodes. However, this does not tell us how much of the total is made up of neighboring nodes.

On the other hand, Jaccard Coefficient is the ratio of common neighbors to all neighbors of v and w.

Score(v,w)=Γ(v)Γ(w)Γ(v)Γ(w) Score(v,w) = \frac{|\Gamma (v) \cap \Gamma (w)|}{|\Gamma (v) \cup \Gamma (w)|}
print("Jaccard coefficient:", list(nx.jaccard_coefficient(G, [(x, y)]))[0][2])
Enter fullscreen mode Exit fullscreen mode
Jaccard coefficient: 0.75
Enter fullscreen mode Exit fullscreen mode

3. Adamic/Adar

Adamic/Adar is a modification of Common Neighbors. Adamic/Adar is an improvement on Common Neighbors in that it does not simply count common neighbors as the same, but rather focuses on the nodes with the lowest degree among the common neighbors.

Score(v,w)=xΓ(v)Γ(w)1logΓ(x) Score(v,w) = \sum_{x \in \Gamma (v) \cap \Gamma (w)} \frac{1}{\log |\Gamma (x)|}
print("Adamic/Adar:", list(nx.adamic_adar_index(G, [(x, y)]))[0][2])
Enter fullscreen mode Exit fullscreen mode
Adamic/Adar: 1.9922605072935597
Enter fullscreen mode Exit fullscreen mode

Preferential Attachment

In Preferential Attachment, nodes with more neighbors are considered to be more similar to each other.

Score(v,w)=Γ(v)Γ(w) Score(v,w) = |\Gamma(v)| ・ |\Gamma(w)|
print("preferential attachment:", list(nx.preferential_attachment(G, [(x,y)]))[0][2])
Enter fullscreen mode Exit fullscreen mode
preferential attachment: 12
Enter fullscreen mode Exit fullscreen mode

Conclusion

In this article, we have explained the method for calculating the similarity of nodes used in link prediction. In the next article, we will discuss link prediction.

Discussion (0)