DEV Community is a community of 696,672 amazing developers

We're a place where coders share, stay up-to-date and grow their careers.

Graph Analytics with Python -Node Similarity-

Shitian Daxiang
Bio := struct {Identity, Undergrad, Research}{ "Ainu descendants", "📊 Data Science", "🔬 Privacy Preserving Data Mining"}

Node Similarity

There are many link prediction methods to predict what edges will be formed from a given graph. One of the simplest approaches is to calculate the similarity between nodes, assuming that edges are likely to be formed between nodes with high similarity. The following four methods are used to calculate the similarity of nodes.

1. Common Neighbors
2. Jaccard Coefficient
4. Preferential Attachment

1. Common Neighbors

In Common Neighbors, given a graph G and two vertices v and w, the set of nodes adjacent to v is denoted by
Γ(v), and the following formula is used to calculate the similarity between v and w.

$Score(v,w) = |\Gamma (v) \cup \Gamma (w)|$

In Common Neighbors, given a graph G and two vertices v and w, the set of nodes adjacent to v is denoted by Γ(v), and the following formula is used to calculate the similarity between v and w.

import networkx as nx
import matplotlib.pyplot as plt
G = nx.karate_club_graph()
plt.figure(figsize=(5,5))
nx.draw_spring(G, node_size=400, node_color="#00C98D", with_labels=True, font_weight="bold")
x = 4
y = 5
print("vertex pair:", x, "and", y)
print("neighbors of", x, ":", list(G.neighbors(x)))
print("neighbors of", y, ":", list(G.neighbors(y)))
print("degree of", x, ":", G.degree(x))
print("degree of", y, ":", G.degree(y))
print("common neighbors:", len(list(nx.common_neighbors(G, x, y))))

vertex pair: 4 and 5
neighbors of 4 : [0, 6, 10]
neighbors of 5 : [0, 6, 10, 16]
degree of 4 : 3
degree of 5 : 4
common neighbors: 3


2. Jaccard Coefficient

Common Neighbors was simply the number of common neighboring nodes. However, this does not tell us how much of the total is made up of neighboring nodes.

On the other hand, Jaccard Coefficient is the ratio of common neighbors to all neighbors of v and w.

$Score(v,w) = \frac{|\Gamma (v) \cap \Gamma (w)|}{|\Gamma (v) \cup \Gamma (w)|}$
print("Jaccard coefficient:", list(nx.jaccard_coefficient(G, [(x, y)]))[0][2])

Jaccard coefficient: 0.75


Adamic/Adar is a modification of Common Neighbors. Adamic/Adar is an improvement on Common Neighbors in that it does not simply count common neighbors as the same, but rather focuses on the nodes with the lowest degree among the common neighbors.

$Score(v,w) = \sum_{x \in \Gamma (v) \cap \Gamma (w)} \frac{1}{\log |\Gamma (x)|}$
print("Adamic/Adar:", list(nx.adamic_adar_index(G, [(x, y)]))[0][2])

Adamic/Adar: 1.9922605072935597


Preferential Attachment

In Preferential Attachment, nodes with more neighbors are considered to be more similar to each other.

$Score(v,w) = |\Gamma(v)| ・ |\Gamma(w)|$
print("preferential attachment:", list(nx.preferential_attachment(G, [(x,y)]))[0][2])

preferential attachment: 12


Conclusion

In this article, we have explained the method for calculating the similarity of nodes used in link prediction. In the next article, we will discuss link prediction.