Photo by Denys Nevozhai on Unsplash
This is a short overview of machine learning: what it is, what learning is, and what its most common concepts are. It is designed as a first step into the topic.
ML finds patterns in data and uses them to predict the future.
- identifying patterns
- recognizing those patterns
Finding patterns is easy. Finding patterns that are correct is not. Increasing the amount of data generally makes the predicted outcome more and more accurate.
|contains patterns||finds patterns||recognizes patterns||uses to recognition on other data|
Common programming languages used for ML are:
- what questions to ask
- what data helps you to answer the question
- how do you measure success
- select and prepare your data over and over to make it useable for the algorithm
- apply an algorithm to the data and create models over and over to increase your success rate
- expose successful models to different data and test them
- supervised learning (the value you want to predict is already in the data)
- unsupervised learning (the value you want to predict is not in the data)
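To make the difference concrete, here is a minimal sketch in Python. The fruit records, the `label` key and the `has_target` helper are made up for illustration. The only point is that supervised data carries the value you want to predict, and unsupervised data does not.

```python
# Supervised: every record already contains the value we want to predict.
labeled_fruits = [
    {"weight_g": 150, "color": "red", "label": "apple"},
    {"weight_g": 120, "color": "yellow", "label": "banana"},
]

# Unsupervised: the same kind of records, but without that value.
unlabeled_fruits = [
    {"weight_g": 155, "color": "red"},
    {"weight_g": 118, "color": "yellow"},
]


def has_target(records, target_key):
    """Return True if every record contains the value we want to predict."""
    return all(target_key in record for record in records)


print(has_target(labeled_fruits, "label"))    # True
print(has_target(unlabeled_fruits, "label"))  # False
```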
Raw data has to be transformed into training data by removing unnecessary items such as duplicates, wrong or false information, and useless information.
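As a rough sketch of what such a cleaning step could look like, the snippet below filters a small, invented list of records. The field names `age` and `income` and the rule that an age must lie between 0 and 120 are assumptions chosen for the example. Real data will need its own rules.

```python
def clean(raw_rows):
    """Drop duplicates and rows with missing or impossible values."""
    seen = set()
    cleaned = []
    for row in raw_rows:
        key = (row.get("age"), row.get("income"))
        if None in key:                 # useless: a value is missing
            continue
        if key[0] < 0 or key[0] > 120:  # wrong: impossible age (assumed rule)
            continue
        if key in seen:                 # duplicate of an earlier row
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


raw = [
    {"age": 34, "income": 52000},
    {"age": 34, "income": 52000},  # duplicate
    {"age": -5, "income": 41000},  # impossible age
    {"age": 51, "income": None},   # missing value
    {"age": 29, "income": 38000},
]

print(clean(raw))
# [{'age': 34, 'income': 52000}, {'age': 29, 'income': 38000}]
```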
The training data contains features, which stand for important classifications, and target values, which stand for the piece of information the model is supposed to deliver.
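In code, this usually means splitting every record into two parts. The sketch below uses a few illustrative flower measurements. The column names and the numbers are assumptions for the example, not values taken from a real data set.

```python
# Each record holds two measurements (features) and a species (target value).
training_data = [
    {"sepal_length": 5.1, "petal_length": 1.4, "species": "setosa"},
    {"sepal_length": 7.0, "petal_length": 4.7, "species": "versicolor"},
    {"sepal_length": 6.3, "petal_length": 6.0, "species": "virginica"},
]

feature_names = ["sepal_length", "petal_length"]  # what the model looks at
target_name = "species"                           # what the model should predict

features = [[row[name] for name in feature_names] for row in training_data]
targets = [row[target_name] for row in training_data]

print(features)  # [[5.1, 1.4], [7.0, 4.7], [6.3, 6.0]]
print(targets)   # ['setosa', 'versicolor', 'virginica']
```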
|Goal||trying to find a line or curve that fits the data||trying to group data into classes||trying to identify segments of the data|
|Image Credit||By Sewaqu - Own work, Public Domain, Link||By Elizabeth Goodspeed - Own work, CC BY-SA 4.0, Link||By Chire - Own work, CC BY-SA 3.0, Link|
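To illustrate the first goal, finding a line that fits the data, here is a small sketch of an ordinary least-squares line fit. The function name `fit_line` and the sample points are made up for this example, and least squares is only one of several ways to fit a line.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how x varies on its own.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]  # these points lie exactly on y = 2x + 1

slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 2.0 1.0
```

With noisy real-world data the points will not lie exactly on the line, and the fitted line is the one that keeps the squared vertical distances as small as possible.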
Common styles are:
- decision trees (construct a model based on the actual values of attributes in the data)
- neural networks (construct a model based on the recombination and reevaluation of results within the training data)
- Bayesian (filters according to probabilistic classifiers)
- K-means (constructs a model based on vector quantization, assigning each data point to the closest of k cluster means)
(Iris flower data set, clustered using k-means (left) and true species in the data set (right). Note that k-means is non-deterministic, so results vary. Cluster means are visualized using larger, semi-transparent markers. The visualization was generated using ELKI.)
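For readers who want to see the idea behind k-means in code, here is a deliberately tiny sketch for 2-D points. It is not how ELKI or any other library implements it. To keep the result repeatable, it simply uses the first k points as starting centers, which is an assumption of this example. Real implementations usually pick the starting centers randomly, which is why their results can vary.

```python
def k_means(points, k, iterations=10):
    """Tiny k-means for 2-D points; returns (centers, assignments)."""
    centers = list(points[:k])  # assumption: fixed start instead of a random one
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point goes to its closest center.
        assignments = [
            min(
                range(k),
                key=lambda c: (p[0] - centers[c][0]) ** 2
                + (p[1] - centers[c][1]) ** 2,
            )
            for p in points
        ]
        # Update step: each center moves to the mean of its points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centers[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centers, assignments


points = [
    (1.0, 1.0), (5.0, 5.0), (1.2, 0.8),
    (5.2, 4.8), (0.8, 1.2), (4.8, 5.2),
]

centers, assignments = k_means(points, k=2)
print(assignments)  # [0, 1, 0, 1, 0, 1]
```

The two cluster centers end up close to (1, 1) and (5, 5), the middle of each group of points.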
- find features that are relevant to identifying the target value
- feed a significant percentage of the feature data into the algorithm
- generate a model
- test the model with the remaining percentage of the feature data by comparing the predicted target values with the values from the actual data
- if the model is not accurate, change the features, change the algorithm or change the data
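The steps above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the data values, the 70/30 split, and the very simple "model", which just remembers the average feature value per class and predicts the class whose average is closest.

```python
def split(rows, train_fraction):
    """Split rows into a training part and a test part."""
    cut = round(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def train(rows):
    """Generate a model: the mean feature value for each class."""
    sums, counts = {}, {}
    for value, label in rows:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(model, value):
    """Predict the class whose mean is closest to the given value."""
    return min(model, key=lambda label: abs(model[label] - value))


def accuracy(model, rows):
    """Share of rows where the prediction matches the actual target value."""
    hits = sum(1 for value, label in rows if predict(model, value) == label)
    return hits / len(rows)


# (petal length in cm, species): illustrative values, already shuffled
data = [
    (1.4, "setosa"), (4.7, "versicolor"), (1.3, "setosa"), (4.5, "versicolor"),
    (1.5, "setosa"), (4.9, "versicolor"), (1.6, "setosa"), (4.0, "versicolor"),
    (1.7, "setosa"), (4.4, "versicolor"),
]

train_rows, test_rows = split(data, 0.7)  # 70% to train, 30% to test
model = train(train_rows)
print(accuracy(model, test_rows))  # 1.0
```

A score this perfect only happens because the example data is tiny and cleanly separated. With real data, a low accuracy would be the signal to go back and change the features, the algorithm or the data.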
Thanks for reading my article! Feel free to leave any feedback!