Ajaykrishnan Selucca
Machine Learning - Performance Metrics

ACCURACY OF THE MODEL:

Accuracy is the most common metric across the machine learning models we work with, and it is especially important in classification problems. Accuracy is defined as the ratio of the number of correct predictions made by the model to the total number of predictions made. It can be expressed as follows:

                   Number of Correct predictions           
       Accuracy =  ------------------------------
                   Total No. of Predictions made

The major disadvantage of accuracy is that it does not work well when we have an imbalanced dataset. It is reliable only when there is a roughly equal number of samples belonging to each class.

Let us consider a training set with 98% samples of class A and 2% samples of class B. A model can then easily reach 98% training accuracy by simply predicting class A for every sample. When the same model is evaluated on a test set with 60% class A samples and 40% class B samples, test accuracy drops to 60%. The high classification accuracy merely gave us a false sense of a well-performing model.

The real problem arises when the cost of misclassifying the minority-class samples is very high. If we deal with a rare but fatal disease, the cost of failing to diagnose a sick person is much higher than the cost of sending a healthy person for more tests. The same applies to bank fraud detection.
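The imbalance problem above can be sketched in a few lines of plain Python. The label counts below mirror the 98%/2% training split and 60%/40% test split from the example; the "model" is a hypothetical one that always predicts class A:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Training set: 98 samples of class A (0), 2 of class B (1)
y_train = [0] * 98 + [1] * 2
always_a = [0] * len(y_train)          # predicts class A for everything
print(accuracy(y_train, always_a))     # 0.98

# Test set: 60 samples of class A, 40 of class B
y_test = [0] * 60 + [1] * 40
print(accuracy(y_test, [0] * len(y_test)))  # 0.6
```

The model has learned nothing about class B, yet accuracy alone makes it look excellent on the training set.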

CALCULATING AN F1 SCORE:

The F1 Score combines precision and recall relative to a specific positive class. It conveys the balance between precision and recall and remains informative even under an uneven class distribution. The F1 Score reaches its best value at 1 and its worst at 0.

                            Precision * Recall
           F1 Score = 2 *  ---------------------
                            Precision + Recall

                True (+ve)                             True (+ve)
 Precision = -------------------          Recall = -------------------
            True (+ve) + False (+ve)              True (+ve) + False (-ve)

Now let us understand: what are precision and recall, and when should each be used?

Precision gives us information about the model's performance with respect to false positives (of everything we flagged, how much was correct). It is about being precise: for example, if we captured only one cancer case and we captured it correctly, then we are 100% precise.

Recall gives us information about the model's performance with respect to false negatives (how many positives we missed). Recall is not so much about capturing cases correctly as it is about capturing all of them: if we label every actual case of "cancer" as "cancer", then we have 100% recall.

So basically, if we want to focus on minimizing false negatives, we want recall to be as close to 100% as possible without precision becoming too bad. If we want to focus on minimizing false positives, we want precision to be as close to 100% as possible.
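The two formulas above, and the F1 combination of them, translate directly into code. Here is a minimal sketch in pure Python; the cancer-screening counts at the bottom are made up purely for illustration:

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from raw counts.

    tp: true positives, fp: false positives, fn: false negatives.
    Guards against division by zero when a denominator is 0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1

# Hypothetical screen: 8 cancers caught, 2 healthy people flagged,
# 2 cancers missed.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
print(p, r, f1)  # 0.8 0.8 0.8
```

Because precision and recall happen to be equal here, F1 equals both; in general F1 is their harmonic mean and is pulled toward the smaller of the two.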

CONFUSION MATRIX:

                          Actually (+ve) (1)      Actually (-ve)(0)

    Predicted (+ve)(1)       True (+ve)               False (+ve)

    Predicted (-ve)(0)       False (-ve)               True (-ve)

The confusion matrix, as the name suggests, gives us an N*N matrix as output, where N is the number of classes being predicted. This metric works even for imbalanced datasets. The correctness and accuracy of the model can be derived entirely from the numbers in this table.

The confusion matrix is a table with two dimensions ('Actual' and 'Predicted') and a set of classes along each dimension. Our actual classes are the columns and the predicted ones are the rows.

What are these Positives and Negatives?

True Positives : The cases in which we predicted YES and the actual output was also YES.
True Negatives : The cases in which we predicted NO and the actual output was NO.
False Positives : The cases in which we predicted YES and the actual output was NO.
False Negatives : The cases in which we predicted NO and the actual output was YES.
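The four definitions above can be counted directly from a pair of label lists. A minimal sketch for the binary case (with 1 as the positive class; the example labels are invented for illustration):

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, with 1 = positive."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(confusion_counts(y_true, y_pred))  # (3, 1, 1, 3)
```

From these four counts every metric in this article follows: accuracy is (tp + tn) / total, precision is tp / (tp + fp), and recall is tp / (tp + fn).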
