Hello! Welcome to this post, where we'll talk about a popular machine learning algorithm: the KNN Classifier. We'll go through what it is and write one ourselves using Python.
You can find the code in this post here: https://github.com/ashwins-code/machine-learning-algorithms/blob/main/knn-classifier.py
What is a KNN Classifier and how does it work?
KNN stands for K-Nearest-Neighbours. A KNN Classifier is a common machine learning algorithm that classifies pieces of data.
Classifying data means putting that data into certain categories. An example could be classifying text data as happy, sad or neutral.
A KNN Classifier starts with some training data, which contains several data points and their respective classifications. When it is asked to classify a new data point, it finds the k closest data points in the training data. Closeness is commonly measured with the Euclidean distance, which is the straight-line distance between two points.
Once it has found the k closest data points, it takes the most common classification among them and returns that as the classification of the new data point.
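To make that concrete, here's a minimal sketch of a single classification, using made-up 2D points and labels purely for illustration:

import numpy as np

# Made-up training data: 2D points and their labels
points = np.array([[1, 1], [1, 2], [8, 8]])
labels = ["a", "a", "b"]

new_point = np.array([2, 1])

# Euclidean distance from new_point to every training point
distances = np.linalg.norm(points - new_point, axis = 1)

# Labels of the k closest points; the most common one wins
k = 2
nearest_labels = [labels[i] for i in np.argsort(distances)[:k]]
print(nearest_labels) # ['a', 'a'] -> classified as "a"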
Our classification problem
Here's the classification problem we'll want to solve using the KNN Classifier we're about to create.
Given an RGB value, we want to classify it as red, blue or green.
If you don't know what an RGB value is, it describes how much red, green and blue a colour contains. Each of the three numbers ranges from 0 to 255. Here's an example:
(255, 200, 140)
The first number is how red the colour is, the second is how green it is and the third is how blue it is. Now the following should make sense:
(255, 0, 0) #Red
(0, 255, 0) #Green
(0, 0, 255) #Blue
In the context of our problem, we will consider a value such as (195, 50, 0) to be red, as it is closest to (255, 0, 0).
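If you want to verify that claim, here's a quick distance calculation (numpy is imported properly in the next section):

import numpy as np

colour = np.array([195, 50, 0])

for label, pure in [("red", [255, 0, 0]), ("green", [0, 255, 0]), ("blue", [0, 0, 255])]:
    # Euclidean distance between the colour and each pure colour
    print(label, round(np.linalg.norm(colour - np.array(pure)), 1))

# red 78.1, green 282.9, blue 324.9 -> red is the closest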
The Code!
First, let's import numpy for mathematical computations
import numpy as np
Now let's create a function which returns the mode (the most common element) of a given list
def get_mode(l):
    # Track the most common element seen so far and its count
    mode = ""
    max_count = 0
    count = {}

    for i in l:
        if i not in count:
            count[i] = 0
        count[i] += 1

        if count[i] > max_count:
            max_count = count[i]
            mode = i

    return mode
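For example:

print(get_mode(["red", "blue", "red"])) # red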
Let's create our KNN Classifier class
class knn_classifier:
    def __init__(self):
        self.data_points = []
        self.classifications = []

    def add_example(self, data_point, classification):
        # Adds a training example
        # self.data_points contains the data points themselves,
        # self.classifications contains their respective classifications
        self.data_points.append(data_point)
        self.classifications.append(classification)

    def classify(self, input_point, k = 3):
        # Classifies new data
        # Pair each training point with its classification and sort the pairs
        # by the Euclidean distance between the training point and input_point
        neighbours = sorted(zip(self.data_points, self.classifications), key = lambda pair: np.linalg.norm(np.subtract(input_point, pair[0])))

        # Keep only the classifications of the k closest data points
        classification = [pair[1] for pair in neighbours[:k]]

        # Return the most common classification among them
        return get_mode(classification)
Finally, let's set up a new classifier and classify some values!
classifier = knn_classifier()

training_data_points = [
    [[255, 0, 0], "red"],
    [[0, 255, 0], "green"],
    [[0, 0, 255], "blue"],
    [[250, 5, 5], "red"],
    [[5, 250, 5], "green"],
    [[5, 5, 250], "blue"],
    [[245, 10, 10], "red"],
    [[10, 245, 10], "green"],
    [[10, 10, 245], "blue"],
]

for point in training_data_points:
    classifier.add_example(point[0], point[1])

print(classifier.classify([250, 0, 0], k = 3))
print(classifier.classify([100, 180, 50], k = 3))
print(classifier.classify([50, 50, 190], k = 3))
Our output
red
green
blue
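As a sanity check, if you happen to have scikit-learn installed, its built-in KNeighborsClassifier should agree with our classifier on these inputs (this library isn't used anywhere else in this post):

from sklearn.neighbors import KNeighborsClassifier

X = [p[0] for p in training_data_points]
y = [p[1] for p in training_data_points]

clf = KNeighborsClassifier(n_neighbors = 3)
clf.fit(X, y)
print(clf.predict([[250, 0, 0], [100, 180, 50], [50, 50, 190]])) # ['red' 'green' 'blue']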
It works! Here's all the code together
import numpy as np

def get_mode(l):
    # Track the most common element seen so far and its count
    mode = ""
    max_count = 0
    count = {}

    for i in l:
        if i not in count:
            count[i] = 0
        count[i] += 1

        if count[i] > max_count:
            max_count = count[i]
            mode = i

    return mode

class knn_classifier:
    def __init__(self):
        self.data_points = []
        self.classifications = []

    def add_example(self, data_point, classification):
        # Adds a training example
        # self.data_points contains the data points themselves,
        # self.classifications contains their respective classifications
        self.data_points.append(data_point)
        self.classifications.append(classification)

    def classify(self, input_point, k = 3):
        # Classifies new data
        # Pair each training point with its classification and sort the pairs
        # by the Euclidean distance between the training point and input_point
        neighbours = sorted(zip(self.data_points, self.classifications), key = lambda pair: np.linalg.norm(np.subtract(input_point, pair[0])))

        # Keep only the classifications of the k closest data points
        classification = [pair[1] for pair in neighbours[:k]]

        # Return the most common classification among them
        return get_mode(classification)

classifier = knn_classifier()

training_data_points = [
    [[255, 0, 0], "red"],
    [[0, 255, 0], "green"],
    [[0, 0, 255], "blue"],
    [[250, 5, 5], "red"],
    [[5, 250, 5], "green"],
    [[5, 5, 250], "blue"],
    [[245, 10, 10], "red"],
    [[10, 245, 10], "green"],
    [[10, 10, 245], "blue"],
]

for point in training_data_points:
    classifier.add_example(point[0], point[1])

print(classifier.classify([250, 0, 0], k = 3))
print(classifier.classify([100, 180, 50], k = 3))
print(classifier.classify([50, 50, 190], k = 3))