DEV Community

Mariana Medeiros
Mariana Medeiros

Posted on

Standardization | Normalization

Preprocessing Data

First things first! When we come across some dataset (raw data), it is recommended to follow these steps to understand and, if necessary, adjust your data for further analyses:

  1. Read Data
  2. Checking for missing values, categorical data, and outliers (and what to do with them)
  3. Apply preprocessing techniques as necessary

Some of the techniques are:

  • Mean removal
  • Standardization
  • Normalization
  • Binarization
  • One Hot Encoding
  • Label Encoding

In this post, I will cover the differences between Normalization and Standardization 🤓

Standardization vs Normalization

One of the things that we do is to understand what type of variables (which we call features, simplifying) our dataset have and in what scale/unit/range/magnitude they are presented.

If the features have a significant variability, we can use the concept of feature scaling as features with high magnitudes will weigh in a lot more in the distance calculations than features with low magnitudes. To address that, we need to bring them to similar magnitude levels by using methods such as standardization and normalization.

Normalization

This method scales the numeric variables in a certain range, usually from 0 to 1 or -1 to 1 when having negative values. It is also known as Min-Max scaling.

03_MaxMin.png

Through this method, we calculate the proportion of the distance between the value itself to the minimum value, and the distance of the maximum and minimum values (so we need to be careful regarding outliers).

03_normalize.png
not nice to have outliers right?

Standardization

This method uses the z-score formula to scale the values, resulting in a dataset of zero mean and unit variance.

03_z-score.png

For a brief explanation, the z-score represents how many standard deviations a value is from the mean.

03_standardize.png
thanks to Math is Fun


Quick notes:
- I am not a big fan of too long articles as I find it easier to write and absorb information by smaller ones. So I would rather split information along more articles about the subject.
- Writing has been challenging yet rewarding as it is a good way to learn as well.
- Any comment that can improve my writing/knowledge is more than welcome!

Thanks for reading 😊

Also posted on Hashnode

Top comments (0)