Navigating the world of data often means working in scenarios where not all data points are equally important.
This is where the weighted average, a statistical tool that assigns an importance (a weight) to each value, helps us incorporate the context of a situation into our average calculations!
With Python's versatile ecosystem, we can leverage tools such as numpy to quickly and efficiently calculate the weighted average in our analyses and data projects.
- Prerequisites and installation
- What is the weighted average?
- Examining a simple example
- Using np.average to calculate weighted mean
- Additional resources
The numpy package is a prerequisite for following along with this blog post!
To install it, open your preferred terminal/console and run:
pip3 install numpy
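To confirm the installation worked, a quick check like the one below (a minimal sketch) imports numpy and prints its version. The exact version string will depend on your environment.

```python
import numpy as np

# If this import succeeds, numpy is installed and visible to this Python interpreter
print(np.__version__)
```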
The weighted average is an extension of the typical arithmetic mean that includes the importance (or weight) of each data point when calculating the average.
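Written out, for values x_1 through x_n with non-negative weights w_1 through w_n (not all zero), the weighted average divides the weighted sum of the values by the sum of the weights:

```latex
\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}
```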
In scenarios where all data points have the same importance, the weighted average simplifies to the standard arithmetic mean. However, when the significance of each data point varies, the weighted average becomes a vital tool.
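A quick way to check the equal-importance case is to compare np.mean with np.average using equal weights. The values below are made up purely for illustration. Any constant weight (1 or 5 here) gives the same result, because the common factor cancels out of the numerator and the denominator.

```python
import numpy as np

# Illustrative values (made up for this example)
values = [70, 85, 90]

plain_mean = np.mean(values)                            # ordinary arithmetic mean
equal_weights = np.average(values, weights=[1, 1, 1])   # every value weighted equally
scaled_weights = np.average(values, weights=[5, 5, 5])  # same equal weights, scaled by 5

print(plain_mean, equal_weights, scaled_weights)
```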
Let's consider an example where we are data scientists employed by a university to calculate the average student grade across all classes in the school.
To preserve the privacy of individual students, we are only provided data aggregated at the class level, so for each class we are given its:
- average grade
- number of students
Our initial instinct might be to just take the usual average across all classes, but what happens when we compare small classes with very large ones?
If a class has an average test score of 20/100 but only 4 students, is it fair to give it the same weight as a class with an average test score of 93/100 and 500 students? No!
If we did that, the small class would be given an outsized level of importance: the test grades of just 4 students should not impact the overall mean as much as the grades of 500 students.
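Using just the two classes from this example, a short sketch makes the difference concrete: the unweighted mean treats both classes as equally important, while weighting by class size lets the 500-student class dominate.

```python
import numpy as np

class_averages = [20, 93]  # average grade per class
class_sizes = [4, 500]     # number of students per class

# Unweighted: each class counts once, regardless of its size
unweighted = np.mean(class_averages)

# Weighted: each class counts in proportion to its number of students
weighted = np.average(class_averages, weights=class_sizes)

print(f"unweighted: {unweighted:.2f}")  # unweighted: 56.50
print(f"weighted:   {weighted:.2f}")    # weighted:   92.42
```

The weighted result lands close to 93 because the small class contributes only 4 of the 504 students.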
So how do we incorporate the number of students into our university grade average?
With the weighted average!
Continuing with the previous example, let's say these are the grades and their respective number_of_students per class:
grades = [20, 93, 56, 79, 100, 86]
number_of_students = [4, 500, 93, 274, 12, 30]
To get the weighted average across the entire university using numpy, all we have to do is pass the class sizes to the weights parameter of np.average:
import numpy as np

university_average = np.average(grades, weights=number_of_students)
print(university_average)
# Output: 84.57174151150055
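To see what np.average is doing under the hood, the sketch below recomputes the same number by hand with the formula (the sum of each grade times its class size, divided by the total number of students) and compares it with np.average. It also uses the returned=True option of np.average, which returns the sum of the weights (here, the total number of students) alongside the average.

```python
import numpy as np

grades = [20, 93, 56, 79, 100, 86]
number_of_students = [4, 500, 93, 274, 12, 30]

# Manual weighted average: sum of (grade * students) divided by total students
weighted_sum = sum(g * n for g, n in zip(grades, number_of_students))
total_students = sum(number_of_students)
manual_average = weighted_sum / total_students

# With returned=True, np.average also gives back the sum of the weights
np_average, sum_of_weights = np.average(
    grades, weights=number_of_students, returned=True
)

print(weighted_sum, total_students)  # 77214 913
print(manual_average)
print(np_average, sum_of_weights)
```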
And just like that, we're able to quickly incorporate the weighted average into our projects by leveraging numpy's np.average function!
Thanks so much for reading! If you liked my content, be sure to check out some of my other work or connect with me on social media or my personal website 😄