Mr Codeslinger

K-Nearest Neighbors with Scikit-learn

Before we start talking about K-Nearest Neighbors, I'm going to list other common classification algorithms in Machine Learning:

  • Logistic regression
  • Support Vector Machines
  • Decision trees
  • Random forests
  • Naive Bayes classifier

Now I'm gonna focus on the questions that are probably in your head right now.

What is the K-Nearest Neighbors algorithm?

This algorithm is used to solve classification problems. The K-Nearest Neighbors (K-NN) algorithm doesn't learn an explicit boundary; it just stores the training data. When a new data point comes in, the algorithm finds the k training points closest to it and assigns the class that most of those neighbors belong to.

It could look like this:

Image

From this image, you can see that:

  • When k=3, the new data point (the star) is classified into Class B because most of its 3 nearest neighbors are Class B points.

  • When k=6, the new data point (the star) is classified into Class A because most of its 6 nearest neighbors are Class A points.
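To make the voting idea concrete, here's a minimal from-scratch sketch in plain Python. The points, labels, and the helper name knn_predict are made up for illustration (they are not the data from the image), but they're arranged so that the same star lands in Class B with k=3 and in Class A with k=6.

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k):
    """Return the majority label among the k training points closest to query."""
    # Pair every training point's distance to the query with its label, nearest first
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Keep the labels of the k nearest points and let them vote
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two B points sit closest to the star,
# then four A points, then one far-away B point
points = [(1.0, 0.0), (0.0, 1.2), (1.5, 0.0), (0.0, 2.0),
          (-2.2, 0.0), (0.0, -2.4), (5.0, 5.0)]
labels = ["B", "B", "A", "A", "A", "A", "B"]
star = (0.0, 0.0)

print(knn_predict(points, labels, star, k=3))  # B: the 3 nearest are B, B, A
print(knn_predict(points, labels, star, k=6))  # A: the 6 nearest are B, B, A, A, A, A
```

Note that math.dist needs Python 3.8 or newer. scikit-learn's KNeighborsClassifier does the same kind of vote, just with faster neighbor search.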

Before we start coding, you'll need to download the dataset we're gonna use. Click here to download it, then open the file named Breast_cancer_data.csv. You should see something like this:

Image

GOAL OF THE DAY

We're gonna make a classification model that predicts whether a breast tumor is cancerous or not.

We're gonna start coding now.

Importing the needed libraries

import pandas as pd
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

Load and view the dataset

df = pd.read_csv('Breast_cancer_data.csv')
df.head()

OUTPUT

Image

Feature Extraction

# Pick the five feature columns; .values already returns a 2-D array of shape (n_samples, 5)
data = df[["mean_radius", "mean_texture", "mean_perimeter", "mean_area", "mean_smoothness"]].values
# The column we want to predict
labels = df["diagnosis"]

Making a classification model

classifier = KNeighborsClassifier(n_neighbors=100)
classifier.fit(data, labels)
print(classifier.score(data, labels))

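Just know that n_neighbors is the k in K-Nearest Neighbors: the number of neighbors that vote on each prediction. Also keep in mind that the score printed here is computed on the same data the model was trained on, so it's likely more optimistic than what you'd get on new data.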

OUTPUT

0.8945518453427065
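Since classifier.score(data, labels) scores the model on the rows it was trained on, one way to check it on unseen data is to hold out a test set. Here's a minimal sketch, assuming scikit-learn's bundled copy of the breast cancer data as a stand-in for Breast_cancer_data.csv (its first five columns are the same five mean features, but it's not guaranteed to match the CSV row for row). The 25% test size and random_state=0 are arbitrary choices for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data (assumption): scikit-learn's bundled breast cancer dataset
X, y = load_breast_cancer(return_X_y=True)
# Keep the first five columns: mean radius, texture, perimeter, area, smoothness
X = X[:, :5]

# Hold out 25% of the rows; stratify keeps the class balance similar in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

classifier = KNeighborsClassifier(n_neighbors=100)
classifier.fit(X_train, y_train)

# Accuracy on rows the model has seen vs. rows it has not
train_score = classifier.score(X_train, y_train)
test_score = classifier.score(X_test, y_test)
print("train accuracy:", train_score)
print("test accuracy:", test_score)
```

With your own CSV, you'd pass data and labels from the Feature Extraction step to train_test_split instead of X and y.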

Making predictions with your model

print(classifier.predict([[7.76,24.54,47.92,181.0,0.05263]]))

OUTPUT

[1]

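The model predicts class 1 for this sample. Before reading that as cancerous or not, check the dataset's description for what 1 encodes, because different versions of this data label the classes differently. And who knows, the prediction might be wrong anyway: even if your model has a high score, some of its predictions will still be wrong.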
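If you want a rough idea of how sure the model is about a single prediction, predict_proba reports the share of the k neighbors that voted for each class. This is a sketch under the same kind of assumption as any stand-in: it uses scikit-learn's bundled breast cancer data instead of Breast_cancer_data.csv, and in that bundled copy 0 means malignant and 1 means benign, which may not match the CSV's encoding.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data (assumption): scikit-learn's bundled dataset, first five mean features
X, y = load_breast_cancer(return_X_y=True)
X = X[:, :5]

classifier = KNeighborsClassifier(n_neighbors=100)
classifier.fit(X, y)

# The same sample used in the prediction step
sample = [[7.76, 24.54, 47.92, 181.0, 0.05263]]

# Each entry is the fraction of the 100 nearest neighbors belonging to that class
probabilities = classifier.predict_proba(sample)[0]
for label, share in zip(classifier.classes_, probabilities):
    print(f"class {label}: {share:.2f} of the neighbors")
```

A split close to 50/50 means the neighbors disagree a lot, so that prediction deserves less trust than one where nearly all neighbors agree.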

You can visit Kaggle to find more datasets to practice classification with K-Nearest Neighbors on.

Check out my Twitter or Instagram.

Feel free to ask questions in the comments.

GOOD LUCK 👍
