karapto

Posted on Feb 9, 2021

Graph Analytics with Python -Node Similarity-

#python #datascience #graph #networkx

Node Similarity

There are many link prediction methods to predict what edges will be formed from a given graph. One of the simplest approaches is to calculate the similarity between nodes, assuming that edges are likely to be formed between nodes with high similarity. The following four methods are used to calculate the similarity of nodes.

Common Neighbors
Jaccard Coefficient
Admic/Adar
Preferential Attachment

1. Common Neighbors

In Common Neighbors, given a graph G and two vertices v and w, the set of nodes adjacent to v is denoted by
Γ(v), and the following formula is used to calculate the similarity between v and w.

Score(v,w) = |\Gamma (v) \cup \Gamma (w)|

In Common Neighbors, given a graph G and two vertices v and w, the set of nodes adjacent to v is denoted by Γ(v), and the following formula is used to calculate the similarity between v and w.

import networkx as nx
import matplotlib.pyplot as plt
G = nx.karate_club_graph()
plt.figure(figsize=(5,5))
nx.draw_spring(G, node_size=400, node_color="#00C98D", with_labels=True, font_weight="bold")
x = 4
y = 5
print("vertex pair:", x, "and", y)
print("neighbors of", x, ":", list(G.neighbors(x)))
print("neighbors of", y, ":", list(G.neighbors(y)))
print("degree of", x, ":", G.degree(x))
print("degree of", y, ":", G.degree(y))
print("common neighbors:", len(list(nx.common_neighbors(G, x, y))))

vertex pair: 4 and 5
neighbors of 4 : [0, 6, 10]
neighbors of 5 : [0, 6, 10, 16]
degree of 4 : 3
degree of 5 : 4
common neighbors: 3

2. Jaccard Coefficient

Common Neighbors was simply the number of common neighboring nodes. However, this does not tell us how much of the total is made up of neighboring nodes.

On the other hand, Jaccard Coefficient is the ratio of common neighbors to all neighbors of v and w.

Score(v,w) = \frac{|\Gamma (v) \cap \Gamma (w)|}{|\Gamma (v) \cup \Gamma (w)|}

print("Jaccard coefficient:", list(nx.jaccard_coefficient(G, [(x, y)]))[0][2])

Jaccard coefficient: 0.75

3. Adamic/Adar

Adamic/Adar is a modification of Common Neighbors. Adamic/Adar is an improvement on Common Neighbors in that it does not simply count common neighbors as the same, but rather focuses on the nodes with the lowest degree among the common neighbors.

Score(v,w) = \sum_{x \in \Gamma (v) \cap \Gamma (w)} \frac{1}{\log |\Gamma (x)|}

print("Adamic/Adar:", list(nx.adamic_adar_index(G, [(x, y)]))[0][2])

Adamic/Adar: 1.9922605072935597

Preferential Attachment

In Preferential Attachment, nodes with more neighbors are considered to be more similar to each other.

Score(v,w) = |\Gamma(v)| ・ |\Gamma(w)|

print("preferential attachment:", list(nx.preferential_attachment(G, [(x,y)]))[0][2])

preferential attachment: 12

Conclusion

In this article, we have explained the method for calculating the similarity of nodes used in link prediction. In the next article, we will discuss link prediction.

DEV Community

Graph Analytics with Python -Node Similarity-

Node Similarity

1. Common Neighbors

2. Jaccard Coefficient

3. Adamic/Adar

Preferential Attachment

Conclusion

Top comments (0)

Read next

Building a Full Stack Web Application using Flask (Python Web Framework) - Part Two

Convert Images Into Pencil Sketch Using Python.

Crafting an Image to PDF Converter App Using Python.

Calculator with GUI Using Python Tkinter.