DEV Community

Daniel Deutsch

Understanding Machine Learning


Photo by Denys Nevozhai on Unsplash

This is a short overview of machine learning: what it is, what learning is, and what its most common concepts are. It is designed as a first step into the topic.


"A wise man can learn more from a foolish question than a fool can learn from a wise answer." - Bruce Lee

What is Machine Learning (ML)

ML finds patterns in data and uses them to make predictions about new, unseen data.

Learning requires:

  • identifying patterns
  • recognizing those patterns

Finding patterns is easy. Finding patterns that are correct is not. In general, the more data you have, the more accurate the predictions become.

  • Data: contains patterns
  • Algorithm: finds patterns
  • Model: recognizes patterns
  • Application: uses the model to recognize patterns in other data
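The four stages above can be sketched in a few lines of plain Python. This is only an illustration: the study-hours numbers are made up, and the "algorithm" (placing a cut-off halfway between two group averages) is an assumption chosen for simplicity, not a standard library routine.

```python
# Data: (hours studied, passed exam) pairs. These numbers are invented.
data = [(1, 0), (2, 0), (3, 0), (6, 1), (7, 1), (8, 1)]


def learn_threshold(examples):
    """Algorithm: find a pattern, here a cut-off between the two groups."""
    failed = [hours for hours, passed in examples if passed == 0]
    succeeded = [hours for hours, passed in examples if passed == 1]
    mean_failed = sum(failed) / len(failed)
    mean_succeeded = sum(succeeded) / len(succeeded)
    # Place the cut-off halfway between the two group averages.
    return (mean_failed + mean_succeeded) / 2


# Model: the learned pattern, in this case a single number.
threshold = learn_threshold(data)


def predict(hours):
    """Application: use the model to recognize the pattern in new data."""
    return 1 if hours > threshold else 0


print(threshold)                 # 4.5
print(predict(2.5), predict(9))  # 0 1
```

The model here is nothing more than the number 4.5, which is the point: a model is whatever the algorithm extracted from the data, stored in a form that can be applied to new inputs.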

img

By Megajuice - Own work, CC0, Link

Common programming languages used for ML are:

  • R
  • Python

The learning process

1. Asking questions

  • what questions to ask
  • what data helps you to answer the question
  • how do you measure success

2. Iterate

  • select and prepare your data over and over to make it usable for the algorithm
  • apply an algorithm to the data and create models over and over to increase your success rate
  • expose successful models to different data and test them on it

ML concepts

  • supervised learning (the value you want to predict is already in the data)
  • unsupervised learning (the value you want to predict is not in the data, so the algorithm has to find structure on its own)
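A small sketch may help to see the difference between the two concepts. The measurements and labels below are made up, and both helper functions (`predict_label`, `split_at_largest_gap`) are hypothetical names chosen for this illustration.

```python
# Supervised: every example carries the value we want to predict (the label).
labeled = [(1.0, "small"), (1.2, "small"), (5.0, "large"), (5.3, "large")]


def predict_label(x):
    # 1-nearest-neighbour: copy the label of the closest known example.
    nearest = min(labeled, key=lambda pair: abs(pair[0] - x))
    return nearest[1]


# Unsupervised: similar measurements, but without any labels.
unlabeled = [1.0, 5.3, 1.2, 5.0]


def split_at_largest_gap(values):
    # Group the values by cutting the sorted list at its widest gap.
    ordered = sorted(values)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    cut = gaps.index(max(gaps)) + 1
    return ordered[:cut], ordered[cut:]


print(predict_label(4.6))               # large
print(split_at_largest_gap(unlabeled))  # ([1.0, 1.2], [5.0, 5.3])
```

In the supervised case the answer ("small" or "large") comes from the labels in the data. In the unsupervised case the code only discovers that there are two groups; naming them is left to a human.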

Data preprocessing with supervised learning

Raw data has to be transformed into training data by removing unnecessary items such as duplicates, incorrect information, and irrelevant information.

The training data contains features, which are the attributes the model learns from, and target values, which are the piece of information the model is supposed to predict.
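As a rough sketch of this preprocessing step, the snippet below cleans a handful of made-up animal records and then separates features from target values. The records, the validity rules, and the `clean` helper are all assumptions for illustration; real projects decide case by case what counts as wrong or useless data.

```python
# Raw records: (height_cm, weight_kg, species). All values are invented.
raw = [
    (30.0, 4.1, "cat"),
    (30.0, 4.1, "cat"),    # exact duplicate
    (55.0, 20.3, "dog"),
    (-1.0, 3.9, "cat"),    # impossible height, so wrong information
    (60.0, None, "dog"),   # missing weight
    (28.5, 3.7, "cat"),
]


def clean(records):
    """Drop duplicates, incomplete rows and rows with impossible values."""
    seen = set()
    cleaned = []
    for record in records:
        height, weight, _species = record
        if record in seen:
            continue
        if height is None or weight is None:
            continue
        if height <= 0 or weight <= 0:
            continue
        seen.add(record)
        cleaned.append(record)
    return cleaned


training_data = clean(raw)

# Features are the inputs, the target is the value the model should predict.
features = [(height, weight) for height, weight, _ in training_data]
targets = [species for _, _, species in training_data]

print(features)  # [(30.0, 4.1), (55.0, 20.3), (28.5, 3.7)]
print(targets)   # ['cat', 'dog', 'cat']
```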

Problems

  • regression: trying to find a line or curve that fits the data (example image credit: By Sewaqu - Own work, Public Domain, Link)
  • classification: trying to group data into classes (example image credit: By Elizabeth Goodspeed - Own work, CC BY-SA 4.0, Link)
  • clustering: trying to identify segments of the data (example image credit: By Chire - Own work, CC BY-SA 3.0, Link)
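To make the regression goal concrete, here is a minimal sketch that fits a straight line to five made-up points using ordinary least squares. The data and the `fit_line` helper are assumptions for this example; libraries such as scikit-learn offer ready-made versions of the same idea.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, divided by how much x varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented measurements that roughly follow y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
print(f"{slope:.2f} {intercept:.2f}")  # 1.99 0.05

# Use the fitted line to predict a value for an unseen x.
prediction = slope * 6 + intercept
```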

Algorithms

Common styles are:

  • decision trees (construct a model based on the actual values of attributes in the data)

img

By Stephen Milborrow - Own work, CC BY-SA 3.0, Link

  • neural networks (construct a model from layers of connected nodes whose weights are adjusted again and again based on results on the training data)

img

By Glosser.ca - Own work, Derivative of File:Artificial neural network.svg, CC BY-SA 3.0, Link

  • Bayesian (classify by using Bayes' theorem to estimate the probability of each class)

img

By AnAj - Own work (Original text: self-made), Public Domain, Link

  • K-means (construct a model by partitioning the data into k clusters, assigning each point to the cluster with the nearest mean)

img

By Chire - Own work, Public Domain, Link

(Iris flower data set, clustered using k-means (left) and true species in the data set (right). Note that k-means is non-deterministic, so results vary. Cluster means are visualized using larger, semi-transparent markers. The visualization was generated using ELKI.)
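Of the algorithms above, k-means is compact enough to sketch in plain Python. The version below is a simplified illustration on made-up 2D points, not the implementation behind the image. One assumption to note: the starting centers are passed in by hand so the result is reproducible. Real implementations usually pick them at random, which is why results can vary between runs.

```python
def kmeans(points, centers, max_iterations=100):
    """Simplified k-means for 2D points with given starting centers."""
    centers = list(centers)
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for x, y in points:
            distances = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centers]
            clusters[distances.index(min(distances))].append((x, y))
        # Update step: move each center to the mean of its cluster.
        new_centers = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            if c else old  # keep the old center if a cluster ends up empty
            for c, old in zip(clusters, centers)
        ]
        if new_centers == centers:
            break  # nothing moved, so the clustering has converged
        centers = new_centers
    return centers, clusters


# Two invented groups of points, one near (1.5, 1.5) and one near (8.5, 8.5).
points = [(1, 1), (1, 2), (2, 1), (2, 2), (8, 8), (8, 9), (9, 8), (9, 9)]

centers, clusters = kmeans(points, centers=[(1, 1), (9, 9)])
print(centers)  # [(1.5, 1.5), (8.5, 8.5)]
```

With unlucky starting centers, for example `(1, 2)` and `(2, 1)`, the same code can settle on a worse grouping, so it is common to run k-means several times and keep the best result.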

Training the model

  1. find features that are relevant to identifying the target value
  2. feed a large share of the data (the training set) into the algorithm
  3. generate a model
  4. test the model on the remaining data (the test set) by comparing the target values it predicts with the actual target values
  5. if the model is not accurate, change the features, change the algorithm or change the data

img
By Docurbs - Own work, CC BY-SA 4.0, Link
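The training steps can be sketched end to end in plain Python. Everything specific here is an assumption for illustration: the data is made up, the 80/20 split and the seed are arbitrary choices, and the model is a one-threshold "decision stump", which is about the simplest decision tree possible.

```python
import random

# Made-up data: (feature, target). Low values belong to class 0, high to 1.
class_0 = [0.5, 0.8, 1.0, 1.2, 1.5, 2.0, 2.2, 2.5, 2.8, 3.0]
class_1 = [7.0, 7.3, 7.5, 8.0, 8.2, 8.5, 9.0, 9.3, 9.5, 10.0]
data = [(x, 0) for x in class_0] + [(x, 1) for x in class_1]

# Step 2: shuffle, then keep 80% for training and 20% for testing.
rng = random.Random(42)  # fixed seed so the split is repeatable
shuffled = data[:]
rng.shuffle(shuffled)
split = len(shuffled) * 8 // 10
train, test = shuffled[:split], shuffled[split:]


def accuracy(threshold, examples):
    """Share of examples where 'x > threshold' matches the actual target."""
    correct = sum(
        1 for x, target in examples if (1 if x > threshold else 0) == target
    )
    return correct / len(examples)


def fit_stump(examples):
    """Step 3: try every midpoint between neighbouring values, keep the best."""
    values = sorted({x for x, _ in examples})
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return max(candidates, key=lambda t: accuracy(t, examples))


threshold = fit_stump(train)

# Step 4: compare predictions with the actual targets of unseen data.
test_accuracy = accuracy(threshold, test)
print(len(train), len(test), test_accuracy)  # 16 4 1.0
```

The perfect score only reflects how cleanly the invented classes are separated. On real data the test accuracy is usually lower, and that is the signal for step 5: change the features, the algorithm, or the data and try again.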


Thanks for reading my article! Feel free to leave any feedback!
