Thales Bruno

Posted on Jul 14, 2020 • Edited on Jul 20, 2020 • Originally published at thalesbr.uno

Mean, Median, Mode

#python #datascience #statistics #beginners

In this article, we are talking about the common methods to measure the central tendency of the data, that is a way to explain our data in some manner. Although these measurements are quite simple, they are a foundation for a lot of other more complex measurements.

We will use this fake list of grades of a class below to demonstrate the concepts:

Grades
9.2
7.5
8
9
8.5
8
2.5

We create a Python variable to store the grades:

grades = [9.2, 7.5, 8, 9, 8.5, 8, 2.5]

Mean

The first one is the mean, which is the arithmetic average of the data. To calculate the mean of a set of numeric data, we take the sum of all observations divided by the number of observations.

In Python:

from typing import List

def mean(xs: List[float]) -> float:
  return sum(xs)/len(xs)

So, applying in our grades data: mean(grades) gives us ~7.52.

Median

The second one is the median, which gives us something more positional. The median is the exact central value of a data, so to calculate it we need to:

Sort data from smaller to largest, and
If ODD length: pick up the middle value
If EVEN length: calculate the mean of the two middle values

So, in Python could be like this:

def median(xs: List[float]) -> float:
  s = sorted(xs)
  # If odd lenght, just pick up the middle value
  if len(xs) % 2:
    p = len(xs)//2
    return xs[p]
  # If even lenght, take the mean of the left and right middle values
  else:
    l_p = (len(xs) // 2) - 1
    r_p = len(xs) // 2
    mid_values = [l_p, r_p]
    return mean(mid_values)

This way, the grades sorted looks like this: [2.5, 7.5, 8, 8, 8.5, 9, 9.2] and their median, median(grades), is 8.

Mode

Finally, we have the mode, i.e., the observation that occurs the most in a dataset. Thus, a dataset may have one, multiple, or even none mode.

Python mode function:

from collections import Counter
def mode(xs: List[float]) -> List[float]:
  counter = Counter(xs)
  return [x_i for x_i, count in counter.items() if count == max(counter.values())]

The grade that occurs the most in our dataset is [8] as we can see calling mode(grades)

Python mean, median, and mode

In this article, we made some Python code to illustrate our measurements, just as didactic purposes. By the way, we already have functions in Python libraries that make the job. Bellow, some of them: