DEV Community

Chris Greening

Posted on • Updated on

Calculating weighted averages with numpy and Python!

Introduction

Navigating the world of data often means operating in scenarios where not all data points have the same importance as one another

This is where the weighted average, a statistical tool that assigns importance to each value, helps us incorporate the context of a situation into our average calculations!

``````import numpy as np
``````

With Python's versatile ecosystem we're able to leverage tools such as `numpy` to quickly and efficiently calculate the weighted average in our analyses and data projects

Chris Greening - Software Developer

Hey! My name's Chris Greening and I'm a software developer from the New York metro area with a diverse range of engineering experience - beam me a message and let's build something great!

christophergreening.com

Prerequisites and installation

The following package is a prerequisite installation for following along with this blog post!

To install it open your preferred terminal/console and run:

``````pip3 install numpy
``````

What is the weighted average?

The weighted average is an extension of a typical arithmetic mean that includes the importance (or weight) of each data point when calculating the average

In scenarios where all data points have the same importance, the weighted average simplifies to the standard arithmetic mean. However, when the significance of each data point varies the weighted average becomes a vital tool

Examining a simple example

Let's consider an example where we are a data scientist employed by a university to calculate the average student grade across all classes in the school

To preserve the privacy of individual students we are only provided data aggregated at the class level and are thus given each individual class'

• number of students

Our initial instinct might be to just take the usual average across all classes but what happens when comparing small classes to very large classes?

If a class has an average test score of 20/100 but only has 4 students is it fair to compare it to a class that has an average test score of 93 and 500 students? No!

If we did that the small class would be given an outsized level of importance as the test grades of just 4 students should not impact the overall mean as much as 500 students

So how do we incorporate the number of students into our university grade average?

With the weighted average!

Using np.average to calculate weighted average

Continuing with the previous example let's say these are the `grades` and their respective `number_of_students` per class:

``````grades = [20, 93, 56, 79, 100, 86]
number_of_students = [4, 500, 93, 274, 12, 30]
``````

To get the weighted average across the entire university using `numpy` all we have to do is incorporate the weights into the `np.average`:

``````import numpy as np
print(university_average)

>>> 84.57174151150055
``````

Conclusion

And just like that we're able to quickly incorporate the weighted average into our projects by leveraging the `np.average`'s `weights` argument

Thanks so much for reading and if you liked my content, be sure to check out some of my other work or connect with me on social media or my personal website ðŸ˜„

Chris Greening - Software Developer

Hey! My name's Chris Greening and I'm a software developer from the New York metro area with a diverse range of engineering experience - beam me a message and let's build something great!

christophergreening.com

Cheers!