STATISTICS FOR DATA ANALYTICS - 11
Sampling
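Why do we need sampling?
We can rarely measure an entire population, so we study a sample of it instead. To draw valid conclusions about the population from the results, we need to select that sample carefully.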
Types of Sampling
Probability sampling
Simple Random sampling
Systematic sampling
Stratified Sampling
Cluster sampling
Non-probability sampling (selection is based on non-random criteria)
Convenience sampling
Voluntary response sampling
Purposive sampling
Snowball sampling
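The four probability sampling methods listed above can be sketched with Python's standard `random` module. This is only an illustration: the function names, the population of 20 IDs, and the cluster labels are made up for the example and are not from any library.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Simple random sampling: every member has the same chance of being picked."""
    rng = random.Random(seed)
    return rng.sample(population, k)

def systematic_sample(population, step, seed=None):
    """Systematic sampling: random start, then every `step`-th member."""
    rng = random.Random(seed)
    start = rng.randrange(step)
    return population[start::step]

def stratified_sample(population, stratum_of, per_stratum, seed=None):
    """Stratified sampling: split into groups (strata), sample inside each one."""
    rng = random.Random(seed)
    strata = {}
    for member in population:
        strata.setdefault(stratum_of(member), []).append(member)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample

def cluster_sample(clusters, n_clusters, seed=None):
    """Cluster sampling: pick whole groups at random and keep everyone in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

people = list(range(1, 21))  # hypothetical population of 20 IDs
print(simple_random_sample(people, 5, seed=1))
print(systematic_sample(people, 4, seed=1))
print(stratified_sample(people, lambda p: p % 2, 3, seed=1))  # strata: odd / even
print(cluster_sample({"A": [1, 2], "B": [3, 4], "C": [5, 6]}, 2, seed=1))
```

The non-probability methods depend on human judgment or availability rather than on a random draw, so they are not sketched here.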
Methods to Detect Outliers
Why outliers matter:
The mean, variance, and standard deviation are all highly affected by outliers.
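A small sketch of this effect, using made-up numbers and Python's standard `statistics` module. The median is included only as a contrast, since it barely reacts to a single extreme value.

```python
import statistics

values = [10, 12, 11, 13, 12]   # made-up data
with_outlier = values + [100]   # same data plus one extreme value

def summarize(data):
    """Return the statistics discussed above (population versions)."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "variance": statistics.pvariance(data),
        "std_dev": statistics.pstdev(data),
    }

before = summarize(values)
after = summarize(with_outlier)
for name in before:
    print(f"{name:>8}: {before[name]:8.2f} -> {after[name]:8.2f}")
```

One extra value more than doubles the mean and inflates the variance and standard deviation many times over, while the median stays at 12.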
Three Methods
Visualisation of data
Z-score
IQR (interquartile range)
Visualisation of data
Box-plot
Histogram
Distribution Plot
Scatter Plot
Z-score
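Z-score = (value - mean) / standard deviation.
Any data point whose Z-score lies more than 3 standard deviations from the mean (above +3 or below -3) is treated as an outlier.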
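A minimal sketch of the Z-score rule, assuming the population standard deviation is used (the sample standard deviation is an equally common choice). The function name and the readings are hypothetical.

```python
import statistics

def zscore_outliers(data, threshold=3.0):
    """Return the values whose |z-score| is greater than `threshold`."""
    mean = statistics.mean(data)
    std = statistics.pstdev(data)  # population standard deviation (an assumption)
    if std == 0:
        return []  # all values identical, so nothing can be an outlier
    return [x for x in data if abs((x - mean) / std) > threshold]

readings = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11,
            12, 10, 11, 13, 12, 11, 10, 12, 13, 100]
print(zscore_outliers(readings))  # [100]
```

Note that in a very small sample a single extreme value inflates the standard deviation so much that its own Z-score may never reach 3, so this rule works better with more data points.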
IQR (interquartile range)
For this method we need
Min
Max
25th percentile (Q1)
75th percentile (Q3)
To find the IQR:
IQR = Q3 - Q1
Lower fence = Q1 - 1.5 * IQR
Upper fence = Q3 + 1.5 * IQR
The multiplier 1.5 is a commonly used convention (a rule of thumb), not a fixed law.
Any value above the upper fence or below the lower fence is treated as an outlier.
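The fence calculation can be sketched with `statistics.quantiles` from the standard library (Python 3.8+). The `"inclusive"` quartile method is an assumption here; other quartile conventions give slightly different fences. The function names and scores are made up for the example.

```python
import statistics

def iqr_fences(data, k=1.5):
    """Return (lower fence, upper fence) using Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def iqr_outliers(data, k=1.5):
    """Return the values that fall outside the two fences."""
    lower, upper = iqr_fences(data, k)
    return [x for x in data if x < lower or x > upper]

scores = [2, 4, 4, 5, 6, 7, 8, 9, 10, 50]
print(iqr_fences(scores))    # (-2.5, 15.5)
print(iqr_outliers(scores))  # [50]
```

Here Q1 = 4.25 and Q3 = 8.75, so IQR = 4.5 and the fences are 4.25 - 6.75 = -2.5 and 8.75 + 6.75 = 15.5.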
How to handle outliers
We can handle them with
IQR method
Transformation
Log transformation
Box-Cox transformation
Imputation
Mean imputation
Median imputation
Zero-value imputation
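A rough sketch of some of these options using only the standard library. Several choices are assumptions rather than the only way to do it: the IQR method is read here as capping values at the fences (dropping them is another reading), the replacement value for imputation is computed from the non-outlier values, and all function names are hypothetical. Box-Cox is left out because it needs a fitted parameter, which is usually handled by a library such as SciPy.

```python
import math
import statistics

def iqr_fences(data, k=1.5):
    """Return (lower fence, upper fence) from the quartiles."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def cap_to_fences(data):
    """IQR method (as capping): pull each outlier back to the nearest fence."""
    lower, upper = iqr_fences(data)
    return [min(max(x, lower), upper) for x in data]

def log_transform(data):
    """Log transformation: log(1 + x) compresses large values (needs x > -1)."""
    return [math.log1p(x) for x in data]

def impute_outliers(data, strategy="median"):
    """Imputation: replace outliers with the mean, the median, or zero."""
    lower, upper = iqr_fences(data)
    inliers = [x for x in data if lower <= x <= upper]
    fill = {
        "mean": statistics.mean(inliers),
        "median": statistics.median(inliers),
        "zero": 0,
    }[strategy]
    return [x if lower <= x <= upper else fill for x in data]

scores = [2, 4, 4, 5, 6, 7, 8, 9, 10, 50]  # made-up data, 50 is the outlier
print(cap_to_fences(scores))
print([round(v, 2) for v in log_transform(scores)])
print(impute_outliers(scores, "median"))
```

Capping and imputation change only the outlying values, while a transformation changes every value, so the right choice depends on what the data will be used for next.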
Follow me here, where I add a new post every day that I learn something new about this topic: https://dev.to/nitinbhatt46
Thank you for your time.