Unsupervised learning describes a class of machine learning problems where we want to find relationships in data, but we have no target outputs with which to "teach" or "supervise" the model. This is what distinguishes it from supervised learning: the data is not labeled. Instead, you give the model your data as input, and the model tries to derive structure from the inherent similarities and groupings among the inputs. In unsupervised learning, the model is looking for a description that summarizes the data.
Here are some common types of unsupervised learning tasks:
- Clustering: no labels are provided to the algorithm. Instead, it separates the data into groups based on similarities between features in the data.
- Density estimation: summarizing the probability distribution that appears to give rise to the observed patterns in the data.
- Dimensionality reduction: finding a smaller set of input variables that preserves the essential structure of the original dataset, often to improve the performance of a predictive model you might later feed these inputs to.
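To make these three tasks concrete, here is a minimal sketch using scikit-learn on synthetic data (the two-blob dataset, the cluster counts, and the bandwidth are all illustrative choices, not recommendations):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
# Synthetic unlabeled data: two well-separated blobs in 5 dimensions
data = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(50, 5)),
    rng.normal(loc=6.0, scale=1.0, size=(50, 5)),
])

# Clustering: group points by similarity -- no labels involved
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(data)
print(kmeans.labels_[:5])

# Density estimation: model the distribution that generated the data
kde = KernelDensity(bandwidth=1.0).fit(data)
log_density = kde.score_samples(data[:5])  # log-likelihood of each point

# Dimensionality reduction: compress 5 features down to 2
pca = PCA(n_components=2).fit(data)
reduced = pca.transform(data)
print(reduced.shape)  # (100, 2)
```

Notice that at no point did we tell the model which blob a point came from; each technique works purely from the structure of the inputs.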
Unsupervised learning also shows up in several practical settings:
- Annotating large datasets: say we have a massive amount of data and there is far too much to label manually. Unsupervised learning can help fill that gap.
- Data mining: you have data, but you aren't yet sure what classes are present in it, or even how many different classes there are.
- Data preparation: perhaps you want to get an idea of the structure of your data before devising a classifier model.
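As a sketch of the annotation use case above, one common workflow is to cluster the unlabeled data and treat the cluster ids as provisional labels, so a human only has to inspect and name each cluster rather than label every point (the dataset and cluster count here are hypothetical):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# 400 unlabeled points drawn from two distinct groups
unlabeled = np.vstack([
    rng.normal(0.0, 1.0, size=(200, 3)),
    rng.normal(6.0, 1.0, size=(200, 3)),
])

# Cluster the data, then use the cluster ids as provisional labels
provisional = KMeans(n_clusters=2, n_init=10, random_state=1).fit_predict(unlabeled)

# A human now inspects a handful of points per cluster and names the cluster,
# instead of hand-labeling all 400 points one by one
for cluster_id in np.unique(provisional):
    sample_idx = np.where(provisional == cluster_id)[0][:3]
    print(cluster_id, unlabeled[sample_idx].round(1))
```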
So unsupervised learning can be used on its own, or as an intermediary step between data preparation and predictive modeling.
There is much more to be said about unsupervised learning, and I have barely scratched the surface of it in my studies, so I won't be the one to say those things just yet. 😉
I hope you enjoyed learning a little bit about the possibilities of machine learning today.