Welcome to yet another exciting series of blogs. In this series, we are going to talk about dimensionality reduction techniques, and our thinking throughout should be oriented towards optimization; every technique discussed here keeps that in mind. This section does involve a bit of math, but to be inclusive we shall keep it simple and straightforward, discussing only the concepts and how to perceive them.
In the last article we discussed Principal Component Analysis. Without delving into the mathematics behind it, we understood how it works through intuition and a small example at each step. In this article, we shall discuss yet another important dimensionality reduction technique that is ubiquitous in machine learning projects. We shall see why we need it, where we use it, and much more. As always, let's start the discussion by asking the fundamental question of this blog.
I suggest that this article is a must-read on this topic. Still, to simplify it a little more, here we go...
If you happen to read about it somewhere else, or just hit Google for it, you may receive a completely unexpected answer. What I learned on my first expedition into LDA was that it is a linear classifier. You must be acquainted with what a classifier is by now, and as the name suggests, this one is linear; it is often used as an extension to Logistic Regression. But a few moments later, from another source of course, I learned that it is a dimensionality reduction technique. That was confusing! But as this article puts it together beautifully, it can actually be used as both. The author clearly states the reasons for using it either way and the conditions under which you can use each. This is what he says:
the characteristics of the dataset that you are working on will guide you about the decision to apply LDA as a classifier or a dimensionality reduction algorithm to perform a classification task.
The main task of Linear Discriminant Analysis is to linearly separate the examples of the classes by moving them into a different feature space. Therefore, if your dataset is linearly separable, applying LDA alone as a classifier will get you great results. But if that isn't the case, then Linear Discriminant Analysis (LDA) can act as an extra tool applied over the dataset to try to “make things better”, or to facilitate the job of the classifier that follows.
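Both uses can be sketched with scikit-learn's `LinearDiscriminantAnalysis`, which exposes a classifier interface (`fit`/`predict`) and a transformer interface (`fit_transform`) on the same object. A minimal sketch, using the iris dataset purely as a stand-in:

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# 1) LDA as a linear classifier: fit and predict directly.
clf = LinearDiscriminantAnalysis()
clf.fit(X, y)
print("classifier accuracy:", clf.score(X, y))

# 2) LDA as dimensionality reduction: project the 4 iris features
#    onto 2 discriminant directions (at most n_classes - 1 of them).
reducer = LinearDiscriminantAnalysis(n_components=2)
X_2d = reducer.fit_transform(X, y)
print("reduced shape:", X_2d.shape)
```

Note that unlike PCA, the transform is supervised: `fit_transform` needs the class labels `y` to find the discriminant directions.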
In some cases, it can be used as a reinforcement of PCA, applying a further linear dimensionality reduction to the reduced set of variables that PCA produces. This can give a much better dataset on which to apply the modeling.
The theory is pretty simple. LDA finds the linear combination of the predictors such that the variance between the groups is maximized while, at the same time, the variance within each group is minimized.
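We can check that objective numerically: along the first discriminant direction, the spread of the class means (between-group variance) should dwarf the average spread inside each class (within-group variance). A small sketch, again with iris as placeholder data:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# Project onto the single best discriminant direction.
z = LinearDiscriminantAnalysis(n_components=1).fit_transform(X, y).ravel()

classes = np.unique(y)
class_means = np.array([z[y == c].mean() for c in classes])

between = class_means.var()                               # spread of the class means
within = np.mean([z[y == c].var() for c in classes])      # average spread inside each class
print("between/within variance ratio:", between / within)
```

A large ratio means the classes sit far apart relative to how tightly each one clusters, which is exactly what LDA optimizes.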
This video should help you understand the more intricate details and the working of the model.
I hope this was helpful and put things down in a simple way. Please feel free to reach out to me on Twitter @AashishLChaubey in case you need more clarity or have any suggestions.
Until next time...