DEV Community

Nitin-bhatt46


Day 33 of My Learning Journey: Setting Sail into Data Excellence! Today's Focus: Mathematics for Data Analysis (Stats Day 12)

STATISTICS FOR DATA ANALYTICS - 12

Standard Normal Distribution and Z-Score

Feature Scaling: Normalisation and Standardisation

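In a dataset, features are often recorded in different units and on different scales. For example, a single dataset may hold quantities in both kilograms and grams.

We use feature scaling to bring features onto a comparable scale, mainly through normalisation or standardisation.

It is used with linear regression, k-means, KNN, PCA, models trained with gradient descent, and similar methods.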
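Here is a minimal Python sketch of why a distance-based method such as KNN needs scaling. The numbers are made up: the three points, and the choice of weight in grams plus height in metres as features, are hypothetical. With raw values the Euclidean distance is dominated by the gram column, so the height difference barely counts. After rescaling each column to [0, 1] with min-max scaling, the nearest neighbour of point `a` changes.

```python
import math

# Hypothetical points: (weight in grams, height in metres)
points = {
    "a": (70000.0, 1.60),
    "b": (70500.0, 1.90),
    "c": (71000.0, 1.62),
}

def euclidean(p, q):
    """Plain Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

def min_max_columns(data):
    """Rescale every column to [0, 1] using that column's own min and max."""
    columns = list(zip(*data.values()))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return {
        name: tuple((v - lo) / (hi - lo) for v, lo, hi in zip(row, lows, highs))
        for name, row in data.items()
    }

def nearest(target, data):
    """Name of the point closest to `target`, excluding the target itself."""
    others = [name for name in data if name != target]
    return min(others, key=lambda name: euclidean(data[target], data[name]))

scaled = min_max_columns(points)
print("raw nearest to a:   ", nearest("a", points))   # b
print("scaled nearest to a:", nearest("a", scaled))   # c
```

On the raw data, `b` wins only because its weight is 500 g closer. Once both columns span [0, 1], the 0.3 m height gap matters and `c` becomes the nearest neighbour.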

Normalisation
Normalisation (also called min-max scaling) is a scaling technique that rescales values into the range [0, 1].
$$X' = \frac{X - X_{\min}}{X_{\max} - X_{\min}}$$

Points to remember
The minimum and maximum values of each feature are used for scaling.
It is used when features are on different scales.
It scales values to [0, 1], or to [-1, 1] with a variant of the formula.
It is strongly affected by outliers, because a single extreme value sets Xmin or Xmax.
It is useful when we don’t know the distribution of the data.
It is often called scaling normalisation.
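The min-max formula can be sketched in Python as below. The `min_max_scale` helper and its `new_min`/`new_max` parameters are illustrative, not from any library. The general form X' = new_min + (X - Xmin) / (Xmax - Xmin) × (new_max - new_min) reduces to the [0, 1] formula when new_min = 0 and new_max = 1, and gives the [-1, 1] variant when new_min = -1. The sample data is hypothetical, with 100 included as a deliberate outlier.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Linearly rescale `values` so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max scaling is undefined when all values are equal")
    return [new_min + (x - lo) / (hi - lo) * (new_max - new_min) for x in values]

data = [2, 4, 6, 8, 100]   # hypothetical feature; 100 is an outlier

print(min_max_scale(data))              # scaled to [0, 1]
print(min_max_scale(data, -1.0, 1.0))   # scaled to [-1, 1]
```

Because the outlier defines Xmax, the four ordinary values all land below about 0.07 in the [0, 1] version, which is the outlier sensitivity listed above.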

Standardisation
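Unlike normalisation, standardisation does not have a bounding range. So, outliers in your data are not forced into a fixed interval such as [0, 1]; they simply end up with large positive or negative values. Note that outliers still influence the mean and standard deviation used in the formula.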

$$X' = \frac{X - \mu}{\sigma}$$

where μ is the mean and σ is the standard deviation of the feature.
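A minimal Python sketch of the standardisation formula, using only the standard library. The `standardize` helper and the sample data are hypothetical. This sketch assumes the population standard deviation (`statistics.pstdev`); using the sample standard deviation (`statistics.stdev`) instead would change the values slightly.

```python
import statistics

def standardize(values):
    """Return z-scores (x - mean) / standard deviation, using the population standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        raise ValueError("standardisation is undefined when the standard deviation is 0")
    return [(x - mu) / sigma for x in values]

data = [50, 60, 70, 80, 90]   # hypothetical feature values
z = standardize(data)

print([round(v, 3) for v in z])   # [-1.414, -0.707, 0.0, 0.707, 1.414]
# The standardised values have mean 0 and standard deviation 1 (up to rounding error).
print(statistics.fmean(z), statistics.pstdev(z))
```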

Points to remember
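The mean and standard deviation are used for scaling.
It is used when we want to ensure zero mean and unit standard deviation.
It is not bound to a certain range.
It is less sensitive to outliers than min-max scaling, in the sense that no single extreme value fixes the output range, although outliers still shift the mean and inflate the standard deviation.
It is useful when the feature distribution is normal (Gaussian).
It is often called z-score normalisation.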

A normal distribution is converted into the standard normal distribution (mean 0, standard deviation 1) by computing z-scores.

A z-score tells us how many standard deviations a value lies away from the mean.
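As a worked example with hypothetical numbers (mean 70, standard deviation 10), a value of 85 lies 1.5 standard deviations above the mean. The sketch below computes that z-score and then uses Python's `statistics.NormalDist` to show that the area to the left of z under the standard normal curve equals the area to the left of 85 under the original normal distribution. The `z_score` helper is illustrative.

```python
from statistics import NormalDist

def z_score(x, mean, std):
    """How many standard deviations x lies above (positive) or below (negative) the mean."""
    return (x - mean) / std

z = z_score(85, mean=70, std=10)                 # hypothetical value, mean and std
below = NormalDist(mu=0, sigma=1).cdf(z)         # area left of z under the standard normal
direct = NormalDist(mu=70, sigma=10).cdf(85)     # area left of 85 under the original normal

print(z)                  # 1.5
print(round(below, 4))    # 0.9332
print(round(direct, 4))   # 0.9332, the same probability
```

In other words, about 93% of values in this distribution fall below 85.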

Central Limit Theorem
It states that the sampling distribution of the mean approaches a normal distribution as the sample size grows large enough, provided the population has a finite variance. Regardless of whether the population has a normal, Poisson, binomial, or any other such distribution, the sampling distribution of the mean will be approximately normal.
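A small simulation can make the theorem concrete. This sketch swaps in an exponential population (rate 1, so mean 1 and standard deviation 1), chosen only because it is clearly skewed and not normal; the sample size `n = 50` and the 2,000 repetitions are arbitrary assumptions. The sample means should centre on the population mean, have a spread close to σ/√n, and put roughly 68% of their values within one such standard deviation, as a normal distribution would.

```python
import random
import statistics

random.seed(42)   # fixed seed so the run is reproducible

n = 50              # size of each sample (arbitrary choice)
num_samples = 2000  # number of repeated samples (arbitrary choice)

# Draw many samples from a skewed exponential population and record each sample's mean.
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.pstdev(sample_means)
expected_sd = 1.0 / n ** 0.5   # sigma / sqrt(n), with sigma = 1 for this population

# Share of sample means within one expected standard deviation of the population mean.
within_one_sd = sum(abs(m - 1.0) <= expected_sd for m in sample_means) / num_samples

print(f"mean of sample means: {mean_of_means:.3f} (population mean 1.0)")
print(f"sd of sample means:   {sd_of_means:.3f} (sigma/sqrt(n) = {expected_sd:.3f})")
print(f"share within 1 sd:    {within_one_sd:.3f} (normal: about 0.683)")
```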

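ESTIMATE: an observed numerical value, computed from sample data, that is used to estimate an unknown population parameter.

Types:

Point estimate - a single numerical value (for example, the sample mean).

Interval estimate - a range of values (for example, a confidence interval).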
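A minimal sketch of both kinds of estimate for a population mean, using hypothetical measurements. The point estimate is the sample mean; the interval estimate is an approximate 95% confidence interval built as mean ± z × standard error, with z taken from the standard normal distribution. This relies on the normal approximation; with only eight observations, a t critical value would usually be preferred, so treat the interval as illustrative.

```python
import statistics
from statistics import NormalDist

# Hypothetical sample measurements (e.g. weights in kg)
sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7]

# Point estimate: a single number, here the sample mean.
point_estimate = statistics.fmean(sample)

# Interval estimate: approximate 95% interval, mean +/- z * standard error.
z = NormalDist().inv_cdf(0.975)                              # about 1.96
standard_error = statistics.stdev(sample) / len(sample) ** 0.5
low = point_estimate - z * standard_error
high = point_estimate + z * standard_error

print(f"point estimate: {point_estimate:.3f}")
print(f"95% interval:   ({low:.3f}, {high:.3f})")
```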

Follow me here; I add a new post every day whenever I learn something new: https://dev.to/nitinbhatt46

Thank you for your time.
