Data Science Term: Variance

Hello! Let's talk about an important Data Science term, variance, and how to calculate it with Python. Variance describes how spread out the points in a data set are.
Here you see two histograms: the data in the second histogram is more spread out than the data in the first one. To measure the variance exactly, we need to take into account the distance of each data point from the mean. So first we calculate the mean and subtract it from each data point.

After this we square each difference, so that negative and positive differences cannot cancel each other out and distort the result. Finally we calculate the average of all the squared differences. The farther the individual data points are from the mean, the larger this average becomes and the more spread out the data is. Variance is represented by the symbol σ² (sigma squared). This is the impressive-looking equation of the variance.
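
Written out in symbols, the steps above give the following formula. This assumes the population form of the variance (dividing by the number of data points N), which matches the "average" described above. The names x_i for the data points and μ for their mean are notation chosen here for illustration.

```latex
\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2,
\qquad
\mu = \frac{1}{N} \sum_{i=1}^{N} x_i
```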

🐍 You can easily calculate the variance with Python like this:

import numpy as np

dataset = [3, 5, -2, 49, 10]
# By default np.var divides by the number of data points (population variance)
variance = np.var(dataset)
print(variance)

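
To see what happens under the hood, the steps described above can also be reproduced by hand in plain Python. This is only a minimal sketch: the helper name manual_variance is made up for illustration, and it assumes the population variance (dividing by the number of data points), which is what np.var computes by default.

```python
def manual_variance(values):
    """Population variance, following the steps from the text (hypothetical helper)."""
    # Step 1: the mean of all data points
    mean = sum(values) / len(values)
    # Step 2: difference of each point to the mean, squared so that
    # negative and positive differences cannot cancel out
    squared_differences = [(x - mean) ** 2 for x in values]
    # Step 3: the average of the squared differences
    return sum(squared_differences) / len(values)


dataset = [3, 5, -2, 49, 10]
print(manual_variance(dataset))  # 338.8
```

For this data set the mean is 13, the squared differences add up to 1694, and dividing by the 5 data points gives 338.8, which should agree with the result of np.var(dataset).
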
If you want to learn more about statistics with Python, check out the link at the end. What are you currently learning? Have a nice day. :)


