DEV Community

Data Science term Variance

Anja
Software Engineer | Linux fan | lawyer | Sharing what I'm learning 😊

Hello! Let's talk about an important data science term, variance, and how to calculate it with Python. Variance describes how spread out the points in a data set are.
[Image: two histograms]
Here you see two histograms. The data in the second histogram is more spread out than the data in the first one. To measure the variance exactly, we need to take into account the distance of each data point from the mean, so we first subtract the mean from each data point.
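To make the idea of "more spread out" concrete, here is a small sketch. The two data sets are made up for illustration only (they are not the data behind the histograms), and it uses `statistics.pvariance` from the Python standard library as one possible way to get the variance.

```python
from statistics import mean, pvariance

# Hypothetical data sets, chosen only for illustration.
# Both have the same mean, but the second one is more spread out.
narrow = [9, 10, 10, 10, 11]
wide = [2, 6, 10, 14, 18]

print("means:", mean(narrow), mean(wide))
print("variance of narrow:", pvariance(narrow))
print("variance of wide:", pvariance(wide))

# The more spread out data set has the larger variance.
print(pvariance(wide) > pvariance(narrow))
```

Both lists are centered on the same value, so the difference in variance comes only from how far the points sit from the mean.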

After this we square each difference, so that negative and positive distances cannot cancel each other out, which would falsify our result. Finally we calculate the average of all the squared differences. The larger this average is, the more spread out the data is. Variance is represented by the symbol sigma squared (σ²). This is the impressive-looking equation of the variance.

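$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$$

Here N is the number of data points, x_i is a single data point, and μ is the mean of the data set.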
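The steps described above can also be written out by hand in plain Python. This is only a minimal sketch of the calculation, using the same data set as the example that follows. The variable names are my own choice.

```python
dataset = [3, 5, -2, 49, 10]

# Step 1: calculate the mean of the data set
mean = sum(dataset) / len(dataset)

# Step 2: calculate the distance of each data point from the mean
differences = [x - mean for x in dataset]

# Step 3: square each difference so that negative and
# positive distances cannot cancel each other out
squared_differences = [d ** 2 for d in differences]

# Step 4: average the squared differences
variance = sum(squared_differences) / len(dataset)

print(mean)      # 13.0
print(variance)  # 338.8
```

The single value 49 lies far away from the mean of 13, and its squared difference (1296) makes up most of the variance.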

🐍You can easily calculate the variance with Python like this:

import numpy as np

dataset = [3, 5, -2, 49, 10]
# By default np.var divides by the number of data points (ddof=0)
variance = np.var(dataset)
print(variance)

If you want to learn more about Statistics with Python, check out the link at the end. What are you currently learning? Have a nice day. :)

📚Sources:

https://www.codecademy.com/learn/learn-statistics-with-python
