Nitin-bhatt46

Day 32 of My Learning Journey: Setting Sail into Data Excellence! Today's Focus: Mathematics for Data Analysis (Stats Day - 11)

STATISTICS FOR DATA ANALYTICS - 11

Sampling

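Why do we need sampling?
We usually cannot measure an entire population, so we study a sample of it. To draw valid conclusions about the population from the results, the sample must be selected carefully so that it represents the population.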

Types of Sampling
Probability sampling
Simple Random sampling
Systematic sampling
Stratified sampling
Cluster sampling
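
As a rough illustration, the four probability sampling methods listed above could be sketched with Python's standard library as follows. The function names, the `(id, group)` record layout, and the toy population are assumptions made for this example only, not part of any particular library.

```python
import random
from collections import defaultdict


def simple_random_sample(population, k, rng):
    """Every item has the same chance of being picked."""
    return rng.sample(population, k)


def systematic_sample(population, k, rng):
    """Pick a random start, then take every step-th item."""
    if k <= 0 or k > len(population):
        raise ValueError("k must be between 1 and len(population)")
    step = len(population) // k
    start = rng.randrange(step)
    return [population[start + i * step] for i in range(k)]


def stratified_sample(population, key, fraction, rng):
    """Split into groups (strata) and sample the same fraction from each."""
    strata = defaultdict(list)
    for item in population:
        strata[key(item)].append(item)
    sample = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, size))
    return sample


def cluster_sample(population, key, n_clusters, rng):
    """Pick whole groups (clusters) at random and keep all their members."""
    clusters = defaultdict(list)
    for item in population:
        clusters[key(item)].append(item)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [item for name in chosen for item in clusters[name]]


# Hypothetical population: 100 records of (id, group), groups A to D.
population = [(i, "ABCD"[i % 4]) for i in range(100)]


def group_of(record):
    return record[1]
```

For example, `stratified_sample(population, group_of, 0.2, random.Random(0))` takes 5 records from each of the four groups, while `cluster_sample(population, group_of, 2, random.Random(0))` keeps every record from two randomly chosen groups.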

Non-probability sampling (selection is based on non-random criteria)
Convenience sampling
Voluntary response sampling
Purposive sampling
Snowball sampling

Methods to Detect Outliers

Outliers strongly affect the mean, variance, and standard deviation.
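
To see this effect, here is a small sketch that compares summary statistics with and without one extreme value. The numbers are made up for illustration, and `summary` is a hypothetical helper name.

```python
import statistics


def summary(values):
    """Return basic statistics (population variance and standard deviation)."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "variance": statistics.pvariance(values),
        "stdev": statistics.pstdev(values),
    }


clean = [10, 11, 12, 13, 14]
with_outlier = clean + [100]  # one extreme value added

print(summary(clean))
print(summary(with_outlier))
```

The mean and the standard deviation jump sharply once `100` is added, while the median barely moves (from 12 to 12.5).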

Three Methods

Visualisation of data
Z-score
IQR (interquartile range)

Visualisation of data

Box-plot

Histogram

Distribution Plot

Scatter Plot

Z-score

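The Z-score of a value x is (x - mean) / standard deviation. Any data point whose Z-score lies more than 3 standard deviations from the mean (above 3 or below -3) is called an outlier.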
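
A minimal sketch of the Z-score rule is shown below. It assumes the population standard deviation (`statistics.pstdev`) and a threshold of 3. The function name and the sample data are hypothetical.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Return the values whose absolute Z-score exceeds the threshold."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # all values are identical, so nothing stands out
    return [v for v in values if abs((v - mean) / std) > threshold]


data = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11,
        12, 10, 11, 13, 12, 11, 10, 12, 11, 95]

print(zscore_outliers(data))  # [95]
```

Note that with very small samples no point can reach a Z-score of 3, so this rule works better on larger data sets.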

IQR (interquartile range)

For this method we need
Min
Max
25th percentile (Q1)
75th percentile (Q3)

To find out IQR

IQR = Q3 - Q1

Lower fence = Q1 - IQR * 1.5

Higher fence = Q3 + IQR * 1.5

The multiplier 1.5 is a commonly used rule of thumb, not a fixed constant.

Any value above the higher fence or below the lower fence is taken as an outlier.
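
The fence calculation above could be sketched like this. It assumes quartiles computed with `statistics.quantiles` using the `"inclusive"` method (other quartile conventions give slightly different fences). The function names and the data are hypothetical.

```python
import statistics


def iqr_fences(values, k=1.5):
    """Return (lower fence, higher fence) using IQR = Q3 - Q1."""
    q1, _q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def iqr_outliers(values, k=1.5):
    """Return the values that fall outside the fences."""
    lower, upper = iqr_fences(values, k)
    return [v for v in values if v < lower or v > upper]


data = [4, 5, 6, 7, 8, 9, 10, 11, 12, 50]

print(iqr_fences(data))    # (-0.5, 17.5)
print(iqr_outliers(data))  # [50]
```

Here Q1 = 6.25 and Q3 = 10.75, so IQR = 4.5 and the fences are 6.25 - 6.75 = -0.5 and 10.75 + 6.75 = 17.5.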

How can we handle outliers?

We can handle them with

IQR METHOD

TRANSFORMATION
LOG TRANSFORMATION
BOX COX TRANSFORMATION

IMPUTATION
MEAN IMPUTATION
MEDIAN IMPUTATION
ZERO VALUE IMPUTATION
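
The handling options above could look roughly like the sketch below. Several choices here are assumptions for illustration: the "IQR method" is interpreted as capping values at the fences, the Box-Cox parameter `lam` is passed in by hand (in practice it is usually estimated from the data), and the fill value for imputation is computed from the non-outlier values only. All function names are hypothetical.

```python
import math
import statistics


def iqr_fences(values, k=1.5):
    """Return (lower fence, higher fence) using IQR = Q3 - Q1."""
    q1, _q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def cap_to_fences(values):
    """IQR method (assumed here to mean capping): clip values to the fences."""
    lower, upper = iqr_fences(values)
    return [min(max(v, lower), upper) for v in values]


def log_transform(values):
    """Log transformation; every value must be greater than 0."""
    return [math.log(v) for v in values]


def box_cox(values, lam):
    """Box-Cox transformation for positive values with a given lambda."""
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]


def impute_outliers(values, strategy="median"):
    """Replace outliers with the mean or median of the inliers, or with zero."""
    lower, upper = iqr_fences(values)
    inliers = [v for v in values if lower <= v <= upper]
    if strategy == "mean":
        fill = statistics.mean(inliers)
    elif strategy == "median":
        fill = statistics.median(inliers)
    elif strategy == "zero":
        fill = 0
    else:
        raise ValueError("strategy must be 'mean', 'median' or 'zero'")
    return [v if lower <= v <= upper else fill for v in values]


data = [4, 5, 6, 7, 8, 9, 10, 11, 12, 50]

print(cap_to_fences(data))              # 50 is capped at the higher fence
print(impute_outliers(data, "median"))  # 50 is replaced by the inlier median
```

Capping and imputation change only the outlying values, while the log and Box-Cox transformations rescale every value and pull a long right tail closer to the rest of the data.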

Follow me here, where I add a new post every day that I learn something new: https://dev.to/nitinbhatt46

Follow me on LinkedIn:

Thank you for your time.
