I have been following a Machine Learning course on Udemy. Here is a quick explanation of how
predict_proba() works and how it can be useful to us.
predict_proba() returns, for each input sample, the probability of every class label rather than just a single predicted label.
Official documentation (for scikit-learn's random forest classifier): "The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the trees in the forest. The class probability of a single tree is the fraction of samples of the same class in a leaf."
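To make the "mean of the trees" part concrete, here is a minimal sketch that rebuilds the forest's probabilities by averaging the per-tree probabilities by hand. The synthetic data from make_classification, the variable names, and the choice of 10 trees are assumptions for illustration only; they are not the course dataset or settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic two-class data, used only for illustration (not the course dataset).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A small forest; n_estimators=10 is an arbitrary choice for this sketch.
forest = RandomForestClassifier(n_estimators=10, random_state=0)
forest.fit(X_train, y_train)

# Probabilities reported by the forest for the first 5 test rows.
forest_proba = forest.predict_proba(X_test[:5])

# The same probabilities rebuilt by hand: average the per-tree probabilities.
tree_probas = [tree.predict_proba(X_test[:5]) for tree in forest.estimators_]
mean_tree_proba = np.mean(tree_probas, axis=0)

print(forest_proba)
print(mean_tree_proba)
```

The two printed arrays should agree up to floating-point rounding, which matches the description in the documentation.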
Let's try and understand this with an example.
The first 5 records of X_test:
Output 1, predict_proba() on those records:
Output 2, predict() on the same records:
Observations from the two outputs:
- In output 1 (predict_proba()), the values in each row sum to 1 (for example, 0.89 + 0.11 = 1).
- When output 2 (predict()) gives a prediction of 0, the column for class 0 in output 1 holds the higher probability.
- In other words, the first column of output 1 is higher when the prediction is 0, and the second column is higher when the prediction is 1.
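The observations above can be checked in code. This is a minimal sketch that assumes a RandomForestClassifier and synthetic data from make_classification, since the original dataset is not reproduced here; the names model, proba, and labels are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic two-class data, used only for illustration (not the course dataset).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Output 1: one row per sample, one column per class (ordered as model.classes_).
proba = model.predict_proba(X_test[:5])
# Output 2: the predicted class label for the same 5 rows.
labels = model.predict(X_test[:5])

# Observation 1: the probabilities in each row add up to 1.
row_sums = proba.sum(axis=1)

# Observations 2 and 3: the predicted label is the class whose column
# holds the highest probability in that row.
most_likely = model.classes_[np.argmax(proba, axis=1)]

print(proba)
print(labels)
print(row_sums)
print(most_likely)
```

Here labels and most_likely should be identical, because for this classifier predict() picks the class with the highest predicted probability.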
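predict_proba() takes the feature values in each row of our dataset and returns the probability of each possible result for that row. This shows how confident the model is in each prediction, and comparing rows with high and low probabilities can help us explore which factors push the probability of a result up or down.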