# Data Science term Variance

Anja
Software Engineer | Linux fan | lawyer | Sharing what I'm learning 😊

Hello! Let's talk about an important Data Science term, variance, and how to calculate it with Python. Variance describes how spread out the points in a data set are. Here you see two histograms: the data of the second histogram is more spread out than that of the first one. If we want to measure the variance exactly, we need to take into account the distance of each data point to the mean.

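First we subtract the mean from each data point. After this we square each difference, so that negative and positive differences cannot cancel each other out, which would falsify our result. Finally we calculate the average of all the squared differences. The larger the squared distances of the single data points to the mean are, the more spread out the data is. Variance is represented by the symbol sigma squared ($\sigma^2$). This is the impressive-looking equation of the variance:

$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$$

Here $N$ is the number of data points, $x_i$ is a single data point and $\mu$ is the mean of the data set.

🐍 You can easily calculate the variance with Python like this: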

```python
import numpy as np

dataset = [3, 5, -2, 49, 10]
# np.var divides by the number of data points by default (ddof=0)
variance = np.var(dataset)
print(variance)
```
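To see what happens behind that one-liner, here is a minimal sketch that walks through the steps described above by hand, using the same data set. The variable names (`mean`, `differences`, `squared_differences`, `manual_variance`) are my own, chosen for illustration. The sketch assumes the default behaviour of `np.var`, which divides by the number of data points.

```python
import numpy as np

dataset = [3, 5, -2, 49, 10]

# Step 1: the mean of the data set
mean = sum(dataset) / len(dataset)

# Step 2: the distance of each data point to the mean
differences = [x - mean for x in dataset]

# Step 3: square each difference so negative and positive values cannot cancel out
squared_differences = [d ** 2 for d in differences]

# Step 4: the average of the squared differences is the variance
manual_variance = sum(squared_differences) / len(dataset)

print(manual_variance)
# Compare with NumPy's result (np.isclose allows for tiny floating point differences)
print(np.isclose(manual_variance, np.var(dataset)))
```

Both approaches should agree. The single large value 49 sits far from the mean of 13, so it contributes most of the variance.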

If you want to learn more about Statistics with Python, check out the link at the end. What are you currently learning? Have a nice day. :)

📚Sources: