(This article was originally published on the LiveEdu blog.)
Machine learning (abbreviated as ML) is a field of computer science that endows applications with the capability to automatically learn from examples and improve from that experience without relying on hard-coded rules.
With machine learning, it is possible for computers to identify hidden insights and make accurate predictions based on the examples of data provided—with minimal human intervention.
For example, AndreyBu, who has over five years of ML experience, has created a model that can predict stock market data. You can watch and learn from his project here.
In this article, we are going to illustrate how to create a simple machine learning algorithm that can differentiate between an apple and an orange.
Without machine learning, solving such a problem would require hand-writing many rules in code, and those rules might still not give the desired results. ML simplifies that process.
Put simply, we will create a machine learning algorithm that, after being given some examples, can learn the differences between the two fruits and make predictions accordingly.
We are going to use the Anaconda open source Python distribution. It comes with Scikit-learn, which is the machine learning library we’ll use to implement our algorithm.
In this example, we are going to use a supervised machine learning algorithm, which utilizes a known dataset (referred to as the training dataset) to predict future events.
The algorithm learns from the training dataset, which consists of input data and output values, and then uses that experience to distinguish between the two fruits.
Here is a general flow of the supervised learning recipe:
- Gather training data
- Train the classifier
- Make predictions
Let’s talk about each of the steps.
The training data are the examples of apples and oranges that we’ll classify according to their differences.
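Here is a table that uses the features of the two fruits to differentiate them:

| Weight (grams) | Texture | Label  |
|----------------|---------|--------|
| 155            | Rough   | Orange |
| 180            | Rough   | Orange |
| 135            | Smooth  | Apple  |
| 110            | Smooth  | Apple  |
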
As you can see above, the fruits are differentiated according to their weight and texture.
The last column in every row is what is called the label in machine learning. In this case, the label can be either orange or apple.
Every row in the table is called a data point. The whole table is called the training data.
It’s important to note that the accuracy of the outcome depends heavily on the number and variety of examples provided in the training data; a handful of examples is enough only for a demonstration.
Now, let’s use some Python code to show what is happening in the table.
We will use two variables: features (data in the first two columns) and labels (data in the last column).
In other words, features are the input data and labels are the output values.
Here is the code:
features = [[155, "rough"], [180, "rough"], [135, "smooth"], [110, "smooth"]]
labels = ["orange", "orange", "apple", "apple"]
Next, because Scikit-learn requires numerical features, let’s convert the strings into integers by defining rough as 0 and smooth as 1.
Let’s do the same for the oranges and apples by giving them integer values of 1 and 0 respectively.
Here is our new code:
features = [[155, 0], [180, 0], [135, 1], [110, 1]]
labels = [1, 1, 0, 0]
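The conversion above was done by hand, but it can also be done in code. The following is a minimal sketch, assuming two lookup dictionaries (`TEXTURE_CODES` and `LABEL_CODES` are hypothetical names introduced here for illustration) that hold the same integer codes chosen above:

```python
# Hypothetical lookup tables; the integer codes match the ones chosen in the article.
TEXTURE_CODES = {"rough": 0, "smooth": 1}
LABEL_CODES = {"apple": 0, "orange": 1}

raw_features = [[155, "rough"], [180, "rough"], [135, "smooth"], [110, "smooth"]]
raw_labels = ["orange", "orange", "apple", "apple"]

# Replace each texture string and each fruit name with its integer code.
features = [[weight, TEXTURE_CODES[texture]] for weight, texture in raw_features]
labels = [LABEL_CODES[name] for name in raw_labels]

print(features)  # [[155, 0], [180, 0], [135, 1], [110, 1]]
print(labels)    # [1, 1, 0, 0]
```

Keeping the codes in one place makes it harder to mix them up when more examples are added later.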
In ML, classification is a supervised learning technique whereby the algorithm learns from the example data and then uses that experience to categorize new observations.
Now, we’ll use the example data to train our classifier so that it can categorize new observations. There are several types of classifiers you can use in your machine learning problems.
In this case, to keep things simple, we are going to use a decision tree, which can use decision rules deduced from the data features to learn and make appropriate predictions.
Here is the code that imports the decision tree classifier and adds it to our project:
from sklearn import tree
classifier = tree.DecisionTreeClassifier()
After adding the classifier to our project, we’ll need to train it using a learning algorithm; otherwise, it may not differentiate between an apple and an orange.
The learning algorithm identifies patterns in the training data and makes suitable conclusions.
For example, it may recognize that apples tend to be smoother in texture; therefore, it will develop a rule that tends to identify any smooth fruit as an apple.
In Scikit-learn, training is done by calling the classifier's fit method (which can loosely be interpreted as “find patterns in data”).
Here is the code that adds it to our project:
classifier = classifier.fit(features, labels)
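To see which rule the tree actually learned, one option is Scikit-learn's `export_text` helper, which prints a fitted tree as plain text. The sketch below assumes a reasonably recent Scikit-learn release that includes `export_text`; the feature names "weight" and "texture" and the `random_state=0` setting are choices made here for illustration, not part of the original project. Because either feature alone separates these four examples, the printed rule may use weight or texture.

```python
from sklearn import tree

# Same training data as in the article: [weight in grams, texture code].
features = [[155, 0], [180, 0], [135, 1], [110, 1]]
labels = [1, 1, 0, 0]  # 1 = orange, 0 = apple

# random_state is fixed only so repeated runs pick the same split.
classifier = tree.DecisionTreeClassifier(random_state=0)
classifier = classifier.fit(features, labels)

# Render the learned decision rules as indented text.
rules = tree.export_text(classifier, feature_names=["weight", "texture"])
print(rules)
```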
The last step after subjecting the classifier to some training is to test it and see if it can classify a new fruit. We’ll use the predict function to make the predictions.
Let’s say that the new fruit is smooth and weighs 120 grams. Remember that we represented smooth as 1.
And, because the weight is not very high, it is likely to be an apple (0).
Furthermore, smoothness is a feature of apples.
Let’s see if our ML algorithm can make such a prediction:
print(classifier.predict([[120, 1]]))
The output is what we expected: [0], which is our code for apple.
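Since the classifier returns integer codes, it can be handy to translate the prediction back into a fruit name. Here is a small sketch, assuming a hypothetical `LABEL_NAMES` dictionary that reverses the encoding chosen earlier (0 for apple, 1 for orange):

```python
from sklearn import tree

# Same training data as in the article: [weight in grams, texture code].
features = [[155, 0], [180, 0], [135, 1], [110, 1]]
labels = [1, 1, 0, 0]  # 1 = orange, 0 = apple

classifier = tree.DecisionTreeClassifier().fit(features, labels)

# Hypothetical reverse mapping from integer code to fruit name.
LABEL_NAMES = {0: "apple", 1: "orange"}

# predict returns an array with one entry per fruit passed in.
prediction = classifier.predict([[120, 1]])
fruit = LABEL_NAMES[int(prediction[0])]
print(fruit)  # apple
```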
Here is the entire code for the project:
from sklearn import tree

# Gathering training data
# features = [[155, "rough"], [180, "rough"], [135, "smooth"], [110, "smooth"]]  # Input to classifier
features = [[155, 0], [180, 0], [135, 1], [110, 1]]  # scikit-learn requires real-valued features
# labels = ["orange", "orange", "apple", "apple"]  # output values
labels = [1, 1, 0, 0]

# Training classifier
classifier = tree.DecisionTreeClassifier()  # using decision tree classifier
classifier = classifier.fit(features, labels)  # Find patterns in data

# Making predictions
print(classifier.predict([[120, 1]]))
# Output is 0 for apple
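As mentioned earlier, a decision tree is only one of several classifier types. The sketch below swaps in Scikit-learn's `KNeighborsClassifier` to show that the same gather, train, and predict steps carry over; the choice of this classifier and of `n_neighbors=1` is an assumption made here for illustration, not something the original project uses.

```python
from sklearn.neighbors import KNeighborsClassifier

# Same training data as in the article: [weight in grams, texture code].
features = [[155, 0], [180, 0], [135, 1], [110, 1]]
labels = [1, 1, 0, 0]  # 1 = orange, 0 = apple

# A nearest-neighbor classifier labels a new fruit like its closest training example.
classifier = KNeighborsClassifier(n_neighbors=1)
classifier.fit(features, labels)

# The closest training example to a smooth, 120-gram fruit is the 110-gram apple.
prediction = classifier.predict([[120, 1]])
print(prediction)  # [0]
```

Only the line that creates the classifier changes; the calls to fit and predict stay the same.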
Programming with machine learning is not difficult. To master how to use it in your applications, you need to understand a few important concepts.
Therefore, to increase your skills, you can use practical projects from LiveEdu to learn how experts build real world-changing machine learning applications.
Happy learning machine learning!