Shaurya Lalwani

What is Stacked Generalization in ML?

Classification algorithms are generally evaluated by looking at the AUC-ROC curve and by examining precision and recall. As an example, suppose we are working with a decision tree. This decision tree is built on all of the data and can take various hyper-parameters.

One way to achieve a better model, or to obtain the required predictions, is to tune the tree's hyper-parameters. There is a way to take this technique a step further: build lots of decision trees, each grown on a different random sample of the data and considering a random subset of features, and then average out the predictions from all of these trees. This is the random forest algorithm.
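
To make this concrete, here is a minimal sketch using scikit-learn, comparing a single decision tree with a random forest. The dataset, hyper-parameters, and random seeds below are illustrative choices of mine, not part of any prescribed setup:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Toy dataset, purely for illustration.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# One tree, built on all of the training data.
tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# 100 trees, each grown on a bootstrap sample with random feature
# subsets considered at each split; their predictions are averaged.
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)

print("single tree accuracy:", tree.score(X_test, y_test))
print("random forest accuracy:", forest.score(X_test, y_test))
```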

An important thing to note here is that which algorithm performs best often depends on the data itself. What if our data isn't modeled well by a decision tree?
In that case, we need to look at other algorithms, and where possible, an even better approach is to combine the best instances of several different algorithms.

As described above, the random forest algorithm is a homogeneous ensemble, meaning that it averages the predictions of many instances of the same algorithm.
Now we come to the heterogeneous ensemble. A heterogeneous ensemble averages the predictions of different algorithms, often applied to different samples of the data. This is widely used in the industry, as combining diverse models greatly reduces the risk that any single model's bias dominates the result.
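
One way to sketch a heterogeneous ensemble is with scikit-learn's VotingClassifier, which averages the class probabilities of several different algorithms. The choice of the three models and the soft-voting setup here are my own assumptions for illustration:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Three different algorithms; "soft" voting averages their predicted
# class probabilities instead of taking a hard majority vote.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("tree", DecisionTreeClassifier(random_state=42)),
        ("knn", KNeighborsClassifier()),
    ],
    voting="soft",
)
ensemble.fit(X_train, y_train)
print("heterogeneous ensemble accuracy:", ensemble.score(X_test, y_test))
```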

Now for the topic of the day: Stacked Generalization. It falls under advanced machine learning because it goes a step further than heterogeneous ensembling. After building the heterogeneous ensemble, the predicted values obtained from the base models are used as input features to train a second-level model (a neural net, a logistic regression, etc.). In other words, a model is built on predicted values, at a point where those predictions already represent many different algorithms (the ensemble).
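
scikit-learn ships a StackingClassifier that follows this pattern. The sketch below is illustrative: the base models, the logistic-regression meta-model, and the dataset are my own choices, not prescribed by the technique itself:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

stack = StackingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(random_state=42)),
        ("knn", KNeighborsClassifier()),
    ],
    # The meta-model is trained on the base models' cross-validated
    # predictions, not on the raw input features.
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)
print("stacked model accuracy:", stack.score(X_test, y_test))
```

Note that StackingClassifier generates the base models' predictions via cross-validation, which keeps the meta-model from overfitting to predictions made on data the base models have already seen.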

Happy learning!
