DEV Community

Code_Jedi


Demystifying Machine Learning for Beginners

If you're a confused beginner, like I was when I started out with machine learning in Python, then stick around: today I'll do my best to demystify and simplify machine learning for you!


To start off, I'll assume you want to learn machine learning for one or more of the following reasons:

  1. Working with datasets
  2. Visualizing data
  3. Predicting data
  4. Classifying data

In this tutorial, we're going to write a Python script that will:

  • Load a dataset
  • Visualize the dataset
  • Classify a new piece of data given the dataset

Let's get started!

First, let's import the required libraries:

import pandas
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.neighbors import KNeighborsClassifier
from sklearn import preprocessing

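If you don't have some of these libraries installed, you can install them with pip install (or pip3 install). Note that sklearn is installed under the package name scikit-learn: pip install pandas matplotlib seaborn scikit-learn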
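If you're not sure which of these are already installed, a small check like the one below can tell you. It's a sketch that uses the standard library's importlib.metadata (Python 3.8+) to look up each package by the name pip installs it under; the helper name installed_versions is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version

# Package names as pip knows them (not necessarily the names you import)
PACKAGES = ["pandas", "matplotlib", "seaborn", "scikit-learn"]

def installed_versions(packages):
    """Map each package name to its installed version string, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}: {ver or 'not installed, try: pip install ' + name}")
```

Anything reported as missing can then be installed with pip before moving on.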


Next, we're going to load in the dataset we'll be using for this project:

import pandas
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.neighbors import KNeighborsClassifier
from sklearn import preprocessing

df = pandas.read_csv('IRIS.csv')

For this project, we're going to be using the classic Iris dataset, which you can download here.
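Before going further, it's worth checking that the columns are named the way the rest of this tutorial expects (sepal_length, sepal_width and species). If you don't have IRIS.csv handy, the sketch below builds a stand-in DataFrame from the copy of the Iris data that ships with scikit-learn, renaming its columns and labels to match the names used in this post. The helper name load_iris_like_csv is hypothetical, and the bundled copy may not match your downloaded CSV row for row.

```python
import pandas as pd
from sklearn.datasets import load_iris

def load_iris_like_csv() -> pd.DataFrame:
    """Hypothetical helper: rebuild a DataFrame shaped like IRIS.csv from scikit-learn's copy."""
    iris = load_iris(as_frame=True)
    df = iris.frame.rename(columns={
        "sepal length (cm)": "sepal_length",
        "sepal width (cm)": "sepal_width",
        "petal length (cm)": "petal_length",
        "petal width (cm)": "petal_width",
    })
    # Turn the integer targets (0, 1, 2) into "Iris-<name>" labels like the CSV's
    df["species"] = ["Iris-" + iris.target_names[t] for t in df["target"]]
    return df.drop(columns="target")

df = load_iris_like_csv()
print(df.shape)                      # rows, columns
print(df.head())                     # first five flowers
print(df["species"].value_counts())  # flowers per species
```

Running the same three print calls on the DataFrame from pandas.read_csv('IRIS.csv') is a quick way to confirm your file has the column names the later code relies on.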


Now comes the tricky bit...

Add these lines of code to your Python script:

model = KNeighborsClassifier(n_neighbors=3)

features = list(zip(df["sepal_length"], df["sepal_width"]))

model.fit(features,df["species"])

Let me explain...

  • First, we create a k-nearest neighbors classifier. Setting n_neighbors=3 means that, when classifying a new piece of data, the model looks at the 3 most similar flowers in the dataset and picks the species that most of them belong to. (It is not the number of classes: the 3 species come from the data itself.)
  • We then define the "features" variable, which pairs up each flower's "sepal_length" and "sepal_width" values. These are the characteristics we're going to compare in order to classify new pieces of data.
  • Finally, we fit (train) our model on these features, together with each flower's species from the "species" column.
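To make the idea behind n_neighbors=3 more concrete, here's a from-scratch sketch of k-nearest neighbors: measure the distance from the new flower to every known flower, keep the 3 closest, and let them vote. This is not scikit-learn's implementation (which can use faster search structures and may break ties differently), but with its default settings KNeighborsClassifier also uses Euclidean distance and gives each neighbor an equal vote. The function name knn_predict and the six training points are toy values chosen for illustration.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point, paired with its label
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest points
    nearest = sorted(distances)[:k]
    # Majority vote among the labels of those k points
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy (sepal_length, sepal_width) pairs with made-up labels, for illustration only
points = [(5.1, 3.5), (4.9, 3.0), (4.7, 3.2),
          (7.0, 3.2), (6.4, 3.2), (6.9, 3.1)]
labels = ["Iris-setosa"] * 3 + ["Iris-versicolor"] * 3

print(knn_predict(points, labels, (5.0, 3.4)))
print(knn_predict(points, labels, (6.8, 3.0)))
```

Calling model.fit() in scikit-learn essentially stores the training points and labels so that this kind of neighbor search can happen later, when you call predict().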

Before we start predicting new pieces of data, let's graph our dataset using a scatter plot. In our graph, the X axis will represent "sepal_length" and the Y axis will represent "sepal_width". We'll also color-code the different species of Iris flowers by adding hue='species', and finally we'll tell seaborn which data to plot by passing our Iris dataset as data=df:

sns.scatterplot(x='sepal_length', y='sepal_width',
                hue='species', data=df)

# Place the legend in the upper-right corner of the plot
plt.legend(bbox_to_anchor=(1, 1), loc=1)

plt.show()
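If you're running the script somewhere without a display (a remote server, for example), plt.show() may not be able to open a window. One option, sketched below, is to switch Matplotlib to its non-interactive Agg backend and save the same scatter plot to a PNG file instead. The filename iris_scatter.png is just an example, and the data is scikit-learn's bundled copy of Iris, renamed to match IRIS.csv, so the snippet runs on its own.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: renders to files, never opens a window
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris

# Stand-in for IRIS.csv: scikit-learn's Iris data with this tutorial's column names
iris = load_iris(as_frame=True)
df = iris.frame.rename(columns={"sepal length (cm)": "sepal_length",
                                "sepal width (cm)": "sepal_width"})
df["species"] = ["Iris-" + iris.target_names[t] for t in df["target"]]

sns.scatterplot(x="sepal_length", y="sepal_width", hue="species", data=df)

# bbox_inches="tight" trims extra whitespace around the saved image
plt.savefig("iris_scatter.png", dpi=100, bbox_inches="tight")
plt.close()  # release the figure once it has been written to disk
```

In your own script you'd keep the pandas.read_csv('IRIS.csv') line and only swap plt.show() for the savefig() and close() calls.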

Here's how the scatter plot should look:
[Figure: scatter plot of sepal_length (X axis) against sepal_width (Y axis), colored by species]


To start classifying new pieces of data, first disable the plotting code by wrapping it in triple quotes, like so:

import pandas
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.neighbors import KNeighborsClassifier
from sklearn import preprocessing


df = pandas.read_csv('IRIS.csv')
model = KNeighborsClassifier(n_neighbors=3)

features = list(zip(df["sepal_length"], df["sepal_width"]))

model.fit(features,df["species"])

"""sns.scatterplot(x='sepal_length', y='sepal_width',
                hue='species', data=df)

# Place the legend in the upper-right corner of the plot
plt.legend(bbox_to_anchor=(1, 1), loc=1)

plt.show()
"""

Then add these two lines of code to the end of your script:

predicted = model.predict([[4.6, 5.8]])
print(predicted)

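This predicts which species an Iris flower belongs to if it has a sepal_length of 4.6 and a sepal_width of 5.8. Note the double brackets: predict() expects a list of rows, even when you're classifying just one flower.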
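Because predict() takes a list of rows, you can classify several flowers in one call, and predict_proba() reports, for each flower, the fraction of its 3 neighbors that belong to each species. The sketch below builds the same model as above but trains it on scikit-learn's bundled copy of Iris (renamed to match IRIS.csv) so it runs without the CSV; the second and third measurement pairs are arbitrary example inputs.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

# Stand-in for IRIS.csv, with this tutorial's column names and labels
iris = load_iris(as_frame=True)
df = iris.frame.rename(columns={"sepal length (cm)": "sepal_length",
                                "sepal width (cm)": "sepal_width"})
df["species"] = ["Iris-" + iris.target_names[t] for t in df["target"]]

model = KNeighborsClassifier(n_neighbors=3)
features = list(zip(df["sepal_length"], df["sepal_width"]))
model.fit(features, df["species"])

# Several (sepal_length, sepal_width) pairs classified in a single call
new_flowers = [[4.6, 5.8], [6.0, 2.9], [7.5, 3.0]]
print(model.predict(new_flowers))

# One row per flower, one column per species (in the order of model.classes_)
print(model.classes_)
print(model.predict_proba(new_flowers))
```

With n_neighbors=3 and equal votes, each probability is 0, 1/3, 2/3 or 1, which is a handy reminder that these numbers are neighbor vote shares rather than calibrated confidences.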


Now if you run your code, your output should look like this:

['Iris-setosa']

This means that our new mystery Iris flower has been classified as an "Iris-setosa".
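To see why the mystery flower was classified as an "Iris-setosa", you can ask the fitted model which training flowers it compared against: KNeighborsClassifier.kneighbors() returns the distances to, and row positions of, the nearest training points. This sketch again trains on scikit-learn's bundled copy of Iris as a stand-in for IRIS.csv, so the exact neighbors it lists may differ from what your CSV gives.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

# Stand-in for IRIS.csv, with this tutorial's column names and labels
iris = load_iris(as_frame=True)
df = iris.frame.rename(columns={"sepal length (cm)": "sepal_length",
                                "sepal width (cm)": "sepal_width"})
df["species"] = ["Iris-" + iris.target_names[t] for t in df["target"]]

model = KNeighborsClassifier(n_neighbors=3)
features = list(zip(df["sepal_length"], df["sepal_width"]))
model.fit(features, df["species"])

# Distances to, and row positions of, the 3 training flowers closest to the mystery flower
distances, indices = model.kneighbors([[4.6, 5.8]])

# Look those rows up in the DataFrame (row positions match because features follow df's order)
neighbors = df.iloc[indices[0]][["sepal_length", "sepal_width", "species"]]
neighbors = neighbors.assign(distance=distances[0].round(3))
print(neighbors)
```

The species that appears most often in this small table is exactly what predict() returns for the same input.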


Congratulations!

You've made your first machine learning project!


You can now experiment with this code, as well as try some new datasets (you can find lots of great ones on https://www.kaggle.com/).
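One easy experiment is changing n_neighbors. Rather than eyeballing single predictions, the sketch below uses scikit-learn's cross_val_score to estimate accuracy for several values by repeatedly training on part of the data and testing on the rest. The list of values and cv=5 are arbitrary choices, and the data is again scikit-learn's bundled copy of Iris standing in for IRIS.csv.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Stand-in for IRIS.csv, with this tutorial's column names and labels
iris = load_iris(as_frame=True)
df = iris.frame.rename(columns={"sepal length (cm)": "sepal_length",
                                "sepal width (cm)": "sepal_width"})
df["species"] = ["Iris-" + iris.target_names[t] for t in df["target"]]

# Same two feature columns as the tutorial, as a NumPy array
features = df[["sepal_length", "sepal_width"]].to_numpy()

# Mean accuracy over 5 train/test splits for each neighbor count
scores = {}
for k in (1, 3, 5, 7, 9):
    model = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(model, features, df["species"], cv=5).mean()
    print(f"n_neighbors={k}: mean accuracy {scores[k]:.3f}")
```

You can run the same loop after adding "petal_length" and "petal_width" to the feature columns to see how much the extra measurements help.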


Byeeeee👋
