Statistics is one of the four pillars of machine learning; the other three are linear algebra, calculus, and probability. To excel in machine learning or data science, one of the things you should master is statistics.
Here I have written up the common terms, with examples.
Statistics is divided into two parts:
- Descriptive Statistics
- Inferential Statistics
Descriptive statistics: explore the data (no point of view yet). Understand what type of data you have, how many samples there are, and whether the data is qualitative or quantitative.
Inferential statistics: frame hypotheses and test them. Once you have described your data with the tools of descriptive statistics, it's time to go further with inferential statistics.
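As a minimal sketch of the descriptive step, here is what "understanding what data you have" can look like in plain Python (the sample values are made up for illustration):

```python
import statistics

# A made-up quantitative sample: daily cups of tea sold
sample = [12, 15, 11, 14, 18, 13, 16, 15]

# Descriptive statistics: summarize what the data looks like
print("n      =", len(sample))
print("mean   =", statistics.mean(sample))
print("median =", statistics.median(sample))
print("stdev  =", round(statistics.stdev(sample), 2))
```

No hypothesis is framed yet; we are only describing the sample before any inferential step.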
Once you have accepted or rejected your hypothesis, you can then set up either of the following:
Rule-based models: you can frame rules based on the data and the relationships you discovered. These rules are generally formulated by subject-matter experts, and this can be a little dangerous because of the risk of too much certainty: if you are the expert who formulated the rules, you will find it hard to reject them even when an alternative explanation arrives. This is why machine learning models are much more powerful than rule-based models.
Machine learning models: build models that change with the data.
Let's understand some common terms of Statistics.
Hypothesis: a proposed explanation for a phenomenon that is objectively testable.
Singular - Hypothesis
Plural - Hypotheses
Null Hypothesis H0: It is considered true until it is proven false.
Alternative Hypothesis H1: It is the negation of the null hypothesis.
Apply a statistical test to the available data. The output of the test is the test statistic.
The test statistic is used to decide whether to reject the null hypothesis in favour of the alternative hypothesis you proposed.
You then convert this test statistic to a p-value.
The p-value tells you whether the test statistic you got is due to random chance (luck) or whether it is actually significant.
Before you perform hypothesis testing, you need to specify a threshold for luck: the significance level.
We compare the p-value of the test to this significance level. A significance level of 5%, for example, says that if there is less than a 5% chance that this test statistic is due to luck, we consider the test significant.
Now we check the p-value: if it is small, below the significance level, you reject the null hypothesis and accept the alternative hypothesis.
And the alternative hypothesis is our proposed explanation: a difference exists, there is a relationship.
When the p-value comes out above the threshold, you fail to reject the null hypothesis, which posits no relationship and no difference.
A large p-value indicates that the results are consistent with chance.
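The decision rule just described can be sketched as a tiny helper; the function name and example p-values here are made up for illustration:

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Compare a p-value to the significance level alpha."""
    if p_value < alpha:
        return "reject H0"      # result is statistically significant
    return "fail to reject H0"  # result is consistent with chance

print(decide(0.014))  # small p-value, below alpha
print(decide(0.30))   # large p-value, above alpha
```

Note the asymmetry: a large p-value never lets us "accept" H0, only fail to reject it.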
The Lady Tasting Tea is a famous experiment designed by Ronald Fisher.
Let's say tea is served to you, and the cup contains both tea and milk. Is it possible for you to tell whether the tea was added before or after the milk?
Let's see this in the form of a hypothesis.
Null Hypothesis H0: The lady cannot tell if milk was poured first. (The null hypothesis posits no relationship and no difference.)
Alternative Hypothesis H1: The lady can tell if milk was poured first. It is good practice to assume that the null hypothesis is correct unless proven otherwise.
Here the lady was given 8 cups of tea, 4 of each type.
The lady got all 8 correct - this is our test statistic, and we need to convert it into a p-value.
8C4 = 70 combinations, and only one of them gets all 8 cups right.
p-value = 1/70 = 1.4%
Let's say our significance level was 5%.
1.4% < 5% => reject H0
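The p-value above can be checked with a few lines of Python using `math.comb`:

```python
from math import comb

# 8 cups, 4 of each type: there are C(8, 4) ways to pick which
# 4 cups had milk first, and only 1 of them is entirely correct.
combinations = comb(8, 4)   # 70
p_value = 1 / combinations  # 1/70 ≈ 0.014

print(f"combinations = {combinations}")
print(f"p-value = {p_value:.1%}")
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```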
Type I error: rejecting the null hypothesis when it was actually true - for example, claiming the lady can tell the difference based on spurious test results that are not statistically significant.
Type II error: failing to reject the null hypothesis when the alternative hypothesis is true - failing to realize that the test result was actually statistically significant.
Power: the probability of rejecting H0 when H1 is true. It is a score in the range 0 to 1, and high values of power are good. If the power of a statistical test is high, the probability of a Type II error is low.
When we work with a binary classifier, the power of the test corresponds to recall (a metric used to evaluate the model).
Alpha is the probability of rejecting H0 when H0 is true.
Alpha = probability of committing a Type I error. It ranges from 0 to 1.
A test with a high value of alpha is not good.
The p-value is compared to alpha to decide whether to reject H0.
The significance level or the threshold that we decide as a part of the experiment design is basically the alpha threshold.
The p-value should be as small as possible, and in any case below the alpha threshold.
A low p-value indicates that the test statistic was not obtained due to chance.
Typical cut-off values for statistical significance are 1% and 5%.
Degrees of freedom: the number of values in the final calculation of a statistic that are free to vary.
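A classic example: the sample variance divides by n - 1 rather than n because, once the sample mean is fixed, only n - 1 of the n deviations are free to vary. Python's `statistics.variance` already uses this divisor (the data values below are made up):

```python
import statistics

data = [4.0, 6.0, 8.0, 10.0]
n = len(data)
mean = sum(data) / n

# Sample variance by hand: divide by n - 1 degrees of freedom,
# since the n deviations from the mean must sum to zero.
sample_var = sum((x - mean) ** 2 for x in data) / (n - 1)

print(sample_var)                 # computed with n - 1 in the denominator
print(statistics.variance(data))  # the library uses n - 1 as well
```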
It is very important to understand these terms at a zoomed-out level so that you can excel in statistics and move toward machine learning. Terms we studied: descriptive and inferential statistics, rule-based models, machine learning models, hypothesis, selecting a test, significance level, p-value, Type I error, Type II error, power, alpha, degrees of freedom.