José Thomaz


Math for Devs - Cosine Similarity in Python

In this article, you will learn what cosine similarity is and how to calculate it in Python.

What is cosine similarity?

Cosine similarity is a metric used to measure how similar two entities are, irrespective of their size. It measures the cosine of the angle between two vectors projected in a multi-dimensional space.

In this context, the two vectors are arrays of numbers (like a list in Python), and the angle between them measures how similar they are. The more closely the vectors point in the same direction, the smaller the angle and the closer the cosine is to 1; the further apart they point, the lower the cosine. The metric measures orientation, not magnitude.

If the vectors are at a 90-degree angle, the data sets are treated as unrelated, giving a cosine similarity of 0. In short, cosine similarity measures how related two sets of data are. The similarity ranges from -1 to 1, where:

  • 1 means the vectors point in exactly the same direction (they may still differ in length)
  • 0 means the vectors are perpendicular, so they are treated as unrelated (not similar)
  • -1 means the vectors are diametrically opposed (completely dissimilar)
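
The three reference values can be checked with a few hand-picked 2D vectors. This is a minimal sketch: `cos_sim` is a throwaway helper introduced here for illustration, and the vectors are arbitrary examples. It also shows that scaling a vector leaves the result unchanged, because only the orientation matters.

```python
from math import sqrt

def cos_sim(a, b):
    """Hypothetical helper: dot product divided by the product of magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

same_direction = cos_sim([1, 2], [2, 4])    # parallel vectors of different length
perpendicular = cos_sim([1, 0], [0, 1])     # 90-degree angle
opposite = cos_sim([1, 2], [-1, -2])        # pointing in opposite directions

# Scaling one vector changes its magnitude but not its direction.
original = cos_sim([1, 2, 3], [2, 3, 4])
scaled = cos_sim([10, 20, 30], [2, 3, 4])

print(same_direction, perpendicular, opposite)  # approximately 1, 0 and -1
print(original, scaled)                         # the two values agree up to rounding
```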

Cosine Similarity representation

The image above shows the cosine similarity of two distinct vectors and how the result is classified.

If you want to learn more about vectors, I have an article that explains them in more detail: What is a vector embedding?

 

Cosine Similarity formula

The mathematical formula for calculating cosine similarity is:

$$\text{cosine similarity}(a, b) = \cos(\theta) = \frac{a \cdot b}{\|a\| \, \|b\|}$$

Where:

  • a and b are our vectors
  • The dot product (a . b) of a and b is calculated as $a \cdot b = \sum_{i=1}^{n} a_i b_i$
  • ||a|| and ||b|| are the magnitudes (lengths) of the vectors, where $\|a\| = \sqrt{\sum_{i=1}^{n} a_i^2}$
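
As a worked example of the formula, the sketch below evaluates each term for two small vectors chosen here purely for illustration, a = [3, 4] and b = [4, 3]. The variable names are assumptions made for this example.

```python
from math import sqrt

a = [3, 4]
b = [4, 3]

# Dot product: multiply matching components and add the results.
dot_product = sum(x * y for x, y in zip(a, b))   # 3*4 + 4*3 = 24

# Magnitudes: square root of the sum of squared components.
magnitude_a = sqrt(sum(x * x for x in a))        # sqrt(9 + 16) = 5.0
magnitude_b = sqrt(sum(y * y for y in b))        # sqrt(16 + 9) = 5.0

similarity = dot_product / (magnitude_a * magnitude_b)
print(similarity)  # 24 / 25 = 0.96
```

The result is close to 1 because both vectors point in roughly the same direction.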

 

Calculating it with Python

The Python function cosine_similarity(vector1: list[float], vector2: list[float]) -> float takes two vectors as input and returns their cosine similarity.

Let's see the full code:

from math import sqrt, pow

def cosine_similarity(vector1: list[float], vector2: list[float]) -> float:
    """Returns the cosine of the angle between two vectors."""
    # the cosine similarity between two vectors is their dot product divided by the product of their magnitudes

    dot_product = 0
    magnitude_vector1 = 0
    magnitude_vector2 = 0

    vector1_length = len(vector1)
    vector2_length = len(vector2)

    if vector1_length > vector2_length:
        # fill vector2 with 0s until it is the same length as vector1 (required for dot product)
        vector2 = vector2 + [0] * (vector1_length - vector2_length)
    elif vector2_length > vector1_length:
        # fill vector1 with 0s until it is the same length as vector2 (required for dot product)
        vector1 = vector1 + [0] * (vector2_length - vector1_length)

    # dot product calculation
    for i in range(len(vector1)):
        dot_product += vector1[i] * vector2[i]

    # vector1 magnitude calculation
    for i in range(len(vector1)):
        magnitude_vector1 += pow(vector1[i], 2)

    # vector2 magnitude calculation
    for i in range(len(vector2)):
        magnitude_vector2 += pow(vector2[i], 2)

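    # final magnitude calculation
    magnitude = sqrt(magnitude_vector1) * sqrt(magnitude_vector2)

    # cosine similarity is undefined when either vector has zero magnitude
    if magnitude == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")

    # return cosine similarity
    return dot_product / magnitude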

The code begins by initializing the variables for dot product and magnitudes of the vectors. It then checks the lengths of the two input vectors and pads the shorter one with zeros so that they have the same length. This step is necessary for calculating the dot product.
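
The padding step can be looked at in isolation. The helper below, `pad_to_same_length`, is a hypothetical name used only for this sketch; it mirrors the `if`/`elif` branch of the function above. Zero-padding treats the missing components as 0, which is an assumption about the data rather than a rule of cosine similarity.

```python
def pad_to_same_length(vector1, vector2):
    """Hypothetical helper: right-pad the shorter list with zeros."""
    target = max(len(vector1), len(vector2))
    padded1 = vector1 + [0] * (target - len(vector1))
    padded2 = vector2 + [0] * (target - len(vector2))
    return padded1, padded2

padded1, padded2 = pad_to_same_length([1, 2, 3], [4, 5])
print(padded1)  # [1, 2, 3]
print(padded2)  # [4, 5, 0]
```

Because the added components are 0, they contribute nothing to the dot product or to the magnitude of the padded vector.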

Then, the dot product of the two vectors and the magnitude of each vector are calculated using for loops. Finally, the cosine similarity is calculated by dividing the dot product by the product of the magnitudes.
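
For comparison, the same steps can be sketched more compactly with `sum` and `itertools.zip_longest`, which supplies a fill value for the shorter input while iterating. `cosine_similarity_compact` is a hypothetical name, and this is an alternative sketch rather than the article's implementation. It assumes neither vector is all zeros, since that would make the product of the magnitudes 0.

```python
from itertools import zip_longest
from math import sqrt

def cosine_similarity_compact(vector1, vector2):
    """Hypothetical compact variant: zero-pads, then applies the formula."""
    # Pair up components, filling the shorter vector with 0s.
    pairs = list(zip_longest(vector1, vector2, fillvalue=0))

    dot_product = sum(x * y for x, y in pairs)
    magnitude1 = sqrt(sum(x * x for x, _ in pairs))
    magnitude2 = sqrt(sum(y * y for _, y in pairs))

    # Assumes both magnitudes are non-zero.
    return dot_product / (magnitude1 * magnitude2)

print(cosine_similarity_compact([1, 2, 3], [2, 3, 4]))  # roughly 0.9926
```

The explicit loops in the original version are easier to follow step by step; the compact form trades that for brevity.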

 

Usage

You can use this function in your code as follows:

vector1 = [1, 2, 3]
vector2 = [2, 3, 4]
similarity = cosine_similarity(vector1, vector2)

print("The cosine similarity between vector1 and vector2 is:", similarity)

 

Conclusion

For a more complete explanation of cosine similarity and the code, I published a video on YouTube that walks through it step by step.
