In this article I will explain the difference between the two most common types of learning algorithms: supervised and unsupervised. Whether you start with machine learning on your own or at college, this is probably one of the first topics you will cover.
The goal of machine learning is to make predictions, such as estimating the price of a house, filtering out spam messages, detecting fraud and many more. Those predictions are based on previous data that the algorithm learns from. Based on how an algorithm learns, we can split these methods into the following groups:
- Unsupervised learning
- Supervised learning
- Reinforcement learning
For someone starting in machine learning, supervised and unsupervised learning are the two most commonly used types, and that is why I will cover the difference between them in the rest of this post.
As said before, our goal when using machine learning is to predict something. In order to do that, you need to train a model, and that is what training data is used for. The distinctive thing about training data in supervised learning is that the values we want to predict are already known for every example, and the algorithm uses them to learn.
To better understand this, let’s take one of the most common examples: the price of a house. To predict what a house is worth, we need to collect data on other houses. That data could be the number of rooms, the size of the house, the area, the type and many others. One necessary piece of data for each of those houses is its price. Given all that data, a learning algorithm can analyze which price goes with which house and create a model. This model can then be used to predict the price of a new house. Other problems we could solve this way are filtering out new spam mails given a set of mails already marked as spam, or deciding if an animal is a cat given a set of cat images.
Some commonly used supervised learning algorithms are:
- Linear Regression
- Nearest Neighbor
- Gaussian Naive Bayes
- Decision Trees
- Support Vector Machine (SVM)
- Random Forest
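To make the house price example a bit more concrete, here is a minimal sketch of linear regression with a single feature, written in plain Python. The sizes and prices are made up for illustration, and the helper names `fit_line` and `predict_price` are my own. In practice you would more likely use a library such as scikit-learn and more than one feature.

```python
# Made-up training data: house size in square meters and its known sale price.
# The known prices are what makes this a supervised problem.
sizes = [50.0, 70.0, 90.0, 110.0, 130.0]
prices = [100_000.0, 140_000.0, 180_000.0, 220_000.0, 260_000.0]


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_price(size, slope, intercept):
    """Use the learned line to estimate the price of a house not seen before."""
    return slope * size + intercept


slope, intercept = fit_line(sizes, prices)
predicted = predict_price(100.0, slope, intercept)
print(f"Estimated price for a 100 m^2 house: {predicted:.0f}")
```

The model here is only the pair `slope` and `intercept`. Training means finding the values that best match the known prices, and prediction means applying them to a new house.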
Now that I have covered supervised learning, unsupervised learning is easier to understand by comparison. In supervised learning, the training data contained both features and the values we want to predict. The difference with unsupervised learning is that the training data doesn’t contain the value we are predicting. So how does that work? What are we predicting? Given the data, the algorithm uses different methods to find patterns in it. Some of those methods are clustering and dimensionality reduction, for example principal component analysis.
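A small sketch can show what that difference looks like in the data itself. The rows below are made up, and the variable names are only for illustration.

```python
# Made-up supervised training set: each row pairs features with a known price.
labeled_houses = [
    ((3, 70.0), 140_000.0),   # (rooms, size in m^2), known price
    ((4, 90.0), 180_000.0),
    ((5, 130.0), 260_000.0),
]

# A supervised algorithm receives both parts: the features and the targets.
features = [house for house, _ in labeled_houses]
targets = [price for _, price in labeled_houses]

# An unsupervised algorithm only ever receives the features.
unlabeled_houses = features

print("features:", features)
print("targets: ", targets)
```

Everything a supervised algorithm learns is measured against `targets`. An unsupervised one has no such column and has to find structure in the features alone.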
Say we have a set of news articles that need to be grouped. We don’t know what the groups are or how many of them there are. What we could do is give the articles to a clustering algorithm like k-means. The algorithm would analyze them, try to find which ones are similar and put them into different buckets. Note that k-means needs the number of groups chosen up front, so in practice you would try a few values. The same approach could be used for recommendations in shopping, news and movies.
Other techniques used for unsupervised learning include:
- Anomaly detection
- Neural Networks
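To illustrate the article grouping idea, here is a minimal sketch of k-means in plain Python. I am assuming each article has already been turned into two numeric features, which is a big simplification. The points are made up, and the `kmeans` helper is my own toy version, not a library function. It starts from the first `k` points so the result is reproducible, while real implementations usually pick starting points more carefully.

```python
import math


def kmeans(points, k, iterations=10):
    """Toy k-means: returns the final centroids and a cluster label per point."""
    # Assumption: the first k points serve as the starting centroids.
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: give each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return centroids, labels


# Made-up "articles", each described by two hypothetical numeric features.
articles = [(1.0, 1.0), (8.0, 9.0), (1.5, 2.0), (9.0, 8.0), (1.0, 0.5), (8.5, 9.5)]

centroids, labels = kmeans(articles, k=2)
print("cluster per article:", labels)
print("centroids:", centroids)
```

Notice that no correct answer is passed in anywhere. The algorithm only sees the points and the number of groups `k`, and the buckets come from which points lie close together.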
When talking about supervised and unsupervised learning we could go into much more detail. But I wanted to keep it simple and focus on the basic difference between them, and that is the training data. With supervised learning we know the value to predict for the examples in the training set; with unsupervised learning we don’t. Understanding that can already help with deciding which algorithm is right for which problem.