When we build a Machine Learning model and train it on the training data, we want to know how the model will perform in the real world. Hence we also test it on unseen data before deploying it to production. But how can we be sure that our model is actually performing well? Evaluating a Machine Learning model prior to deployment is a crucial step in judging its performance, and this is where the Confusion Matrix comes to the rescue.
When I first encountered this term while understanding Machine Learning, my first reaction was similar to as described below. 😝
Confusion matrix_? Seriously? How do they even come up with these terms?_
As vague and confusing as this may sound, a Confusion Matrix, also known as an Error Matrix, is one of the most useful tools for evaluating the performance of your classifier (classification model).
A confusion matrix is basically an n×n square matrix that describes a model’s performance, where n is the number of classes the model is trying to predict. The task could either be a binary classification (a 2×2 matrix) or a multi-class classification (n×n).
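To make the n×n idea concrete, here is a minimal sketch of how such a matrix could be tallied by hand in plain Python for a made-up 3-class example (the labels and predictions below are purely illustrative; in practice you would typically use scikit-learn's `confusion_matrix`, which puts actual values on the rows rather than the columns):

```python
from collections import Counter

def confusion_matrix(actual, predicted, n_classes):
    """Build an n x n confusion matrix following this article's layout:
    rows = predicted labels, columns = actual labels."""
    counts = Counter(zip(predicted, actual))
    return [[counts[(pred, act)] for act in range(n_classes)]
            for pred in range(n_classes)]

# Made-up 3-class example
actual    = [0, 1, 2, 2, 0, 1]
predicted = [0, 2, 2, 2, 0, 0]
print(confusion_matrix(actual, predicted, 3))
# → [[2, 1, 0], [0, 0, 0], [0, 1, 2]]
```

Each cell (i, j) counts how often the model predicted class i when the actual class was j, so correct predictions accumulate on the diagonal.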
Let us now understand how the confusion matrix helps us evaluate the performance of a model on a 2×2 binary classification.
For example, let’s say we want to predict whether a person is suffering from Diabetes or not, which is clearly a binary classification (0/1 or Yes/No).
The rows represent the Predicted Values our model is predicting.
The columns represent the Actual Values that we know beforehand.
The primary diagonal, i.e., the cells (1, 1) and (2, 2), represents the values that are predicted correctly by our model.
While all the other cells represent values that are incorrectly predicted by our model.
Let’s take a more in-depth look at each cell.
- The cell (1, 1) means our model classifies a person as diabetic and he/she is actually diabetic. This is known as a True Positive.
- The cell (2, 2) means our model classifies a person as NOT diabetic and he/she is actually NOT diabetic. This is known as a True Negative.
- The cell (1, 2) means our model classifies a person as diabetic but he/she is actually NOT diabetic. This is known as a False Positive.
- The cell (2, 1) means our model classifies a person as NOT diabetic but he/she is actually diabetic. This is known as a False Negative.
A False Positive is also known as a Type I Error.
Similarly, a False Negative is also known as a Type II Error.
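The four cells above can be counted directly from a list of actual and predicted labels. Here is a small sketch in plain Python, using made-up data where 1 = diabetic and 0 = not diabetic:

```python
def binary_counts(actual, predicted):
    """Count TP, TN, FP, FN for a binary problem (1 = positive class)."""
    tp = sum(1 for a, p in zip(actual, predicted) if p == 1 and a == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if p == 0 and a == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if p == 1 and a == 0)
    fn = sum(1 for a, p in zip(actual, predicted) if p == 0 and a == 1)
    return tp, tn, fp, fn

# Made-up predictions for eight patients
actual    = [1, 1, 1, 0, 0, 1, 0, 0]
predicted = [1, 0, 1, 0, 1, 1, 0, 0]
print(binary_counts(actual, predicted))
# → (3, 3, 1, 1)
```

In this toy example the model made one Type I error (a healthy person flagged as diabetic) and one Type II error (a diabetic person missed).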
Among the simplest measures you can derive from the confusion matrix are the accuracy and error percentages.
- accuracy(%) = (True Positive + True Negative) * 100 / Total_values
- error(%) = (False Positive + False Negative) * 100 / Total_values
where Total_values = TP + TN + FP + FN
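These two formulas translate directly into code. A minimal sketch, using the same made-up counts as before (TP = 3, TN = 3, FP = 1, FN = 1):

```python
def accuracy_pct(tp, tn, fp, fn):
    """Percentage of all predictions that were correct."""
    return (tp + tn) * 100 / (tp + tn + fp + fn)

def error_pct(tp, tn, fp, fn):
    """Percentage of all predictions that were wrong."""
    return (fp + fn) * 100 / (tp + tn + fp + fn)

print(accuracy_pct(3, 3, 1, 1))  # → 75.0
print(error_pct(3, 3, 1, 1))     # → 25.0
```

Note that the two percentages always sum to 100, since every prediction is either correct or incorrect.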
Ideally, in our Machine Learning model the number of Type I and Type II errors should be as low as possible.
Depending on the problem statement and its use case, we should know what error percentage is acceptable for our model.
For example, the problem statement of Detecting Breast Cancer needs an extremely low error. Just imagine the situation if we tell someone she does NOT have cancer when she actually does. 😲
There are certain methods to increase the performance of a model, but in this post we’re focusing only on the evaluation aspect.
From the above Confusion Matrix we can also calculate some more advanced measures to evaluate and understand our model even further. Some of these are listed below:
- Sensitivity and Specificity
Sensitivity measures how well the model is able to detect events in the positive class. It is calculated as the number of correct positive predictions divided by the total number of actual positive values.
Sensitivity = True Positive / (True Positive + False Negative)
Specificity measures how well the model is able to detect events in the negative class. It is calculated as the number of correct negative predictions divided by the total number of actual negative values.
Specificity = True Negative / (True Negative + False Positive)
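Both measures are one-liners once you have the four counts. A quick sketch, again reusing the made-up counts TP = 3, TN = 3, FP = 1, FN = 1:

```python
def sensitivity(tp, fn):
    """Fraction of actual positives the model caught (true positive rate)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Fraction of actual negatives the model caught (true negative rate)."""
    return tn / (tn + fp)

print(sensitivity(3, 1))  # → 0.75
print(specificity(3, 1))  # → 0.75
```

For the Breast Cancer example above, sensitivity is the number to watch: a low sensitivity means many actual patients are being told they are healthy.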
- Precision and Recall
Precision is the ratio of the number of correct positive predictions to the total number of predicted positive values.
Precision = True Positive / (True Positive + False Positive)
Recall is the ratio of the number of correct positive predictions to the total number of actual positive values; it is the same quantity as Sensitivity.
Recall = True Positive / (True Positive + False Negative)
The F-Score (or F-Measure) combines Precision and Recall: it is their Harmonic Mean.
F Score = 2 * (Recall * Precision) / (Recall + Precision)
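Putting the last three formulas together in code, here is a minimal sketch with made-up counts chosen so the arithmetic comes out clean (TP = 9, FP = 3, FN = 3):

```python
def precision(tp, fp):
    """Of everything predicted positive, how much really was positive."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Of everything actually positive, how much the model found."""
    return tp / (tp + fn)

def f_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

p = precision(9, 3)       # → 0.75
r = recall(9, 3)          # → 0.75
print(f_score(p, r))      # → 0.75
```

Because the F-Score is a harmonic mean, it is dragged down by whichever of Precision or Recall is lower, which is exactly why it is a more honest single-number summary than a plain average.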