Anurag Verma

KNN (K-nearest neighbors) Classification

Import necessary libraries:

The code imports NumPy, pandas, seaborn, and matplotlib for data manipulation and visualization, along with the scikit-learn classes and functions used in the steps below: KNeighborsClassifier, train_test_split, StandardScaler, and accuracy_score.

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

Load the data:

The code reads the CSV file containing the dataset into a pandas DataFrame with the pd.read_csv() function.

# Load your data into a pandas dataframe
df = pd.read_csv("/kaggle/input/knn-data1/KNN_Project_Data")
df
            XVPM         GWYH         TRAT         TLLZ         IGGA         HYKR         EDFS         GUUB         MGJM         JHZC  TARGET CLASS
0    1636.670614   817.988525  2565.995189   358.347163   550.417491  1618.870897  2147.641254   330.727893  1494.878631   845.136088             0
1    1013.402760   577.587332  2644.141273   280.428203  1161.873391  2084.107872   853.404981   447.157619  1193.032521   861.081809             1
2    1300.035501   820.518697  2025.854469   525.562292   922.206261  2552.355407   818.676686   845.491492  1968.367513  1647.186291             1
3    1059.347542  1066.866418   612.000041   480.827789   419.467495   685.666983   852.867810   341.664784  1154.391368  1450.935357             0
4    1018.340526  1313.679056   950.622661   724.742174   843.065903  1370.554164   905.469453   658.118202   539.459350  1899.850792             0
...          ...          ...          ...          ...          ...          ...          ...          ...          ...          ...           ...
995  1343.060600  1289.142057   407.307449   567.564764  1000.953905   919.602401   485.269059   668.007397  1124.772996  2127.628290             0
996   938.847057  1142.884331  2096.064295   483.242220   522.755771  1703.169782  2007.548635   533.514816   379.264597   567.200545             1
997   921.994822   607.996901  2065.482529   497.107790   457.430427  1577.506205  1659.197738   186.854577   978.340107  1943.304912             1
998  1157.069348   602.749160  1548.809995   646.809528  1335.737820  1455.504390  2788.366441   552.388107  1264.818079  1331.879020             1
999  1287.150025  1303.600085  2247.287535   664.362479  1132.682562   991.774941  2007.676371   251.916948   846.167511   952.895751             1

1000 rows × 11 columns
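Before plotting, it can help to confirm that the data has no missing values and that the two classes are reasonably balanced. The sketch below is only an illustration: demo_df is a small made-up frame with random values (not the Kaggle file), and only the column names are borrowed from the table above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real dataset: random values, NOT the Kaggle data.
rng = np.random.default_rng(0)
demo_df = pd.DataFrame(
    rng.normal(1000, 300, size=(100, 3)), columns=["XVPM", "GWYH", "TRAT"]
)
demo_df["TARGET CLASS"] = rng.integers(0, 2, size=100)

# Number of missing values in each column
missing_per_column = demo_df.isna().sum()

# Share of rows that belong to each class (the shares sum to 1.0)
class_share = demo_df["TARGET CLASS"].value_counts(normalize=True)

print(demo_df.shape)
print(missing_per_column)
print(class_share)
```

On the real data, the same two checks can be run by replacing demo_df with df.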

Data visualization:

The code draws a scatter plot matrix with seaborn's sns.pairplot function, coloring each point by its TARGET CLASS value through the hue argument, and displays it with plt.show(). The matrix shows the pairwise relationships between the features and how well each pair separates the two classes.

# scatter plot matrix
sns.pairplot(df, hue='TARGET CLASS')
plt.show()

[Figure: KNN visualization]

Split the dataset into training and testing sets:

The code separates the features from the label and then splits the rows into training and testing sets. X holds every column except TARGET CLASS, and y holds the TARGET CLASS column. The train_test_split function from scikit-learn then assigns 80% of the rows to training and 20% to testing (test_size=0.2), and random_state=42 makes the split reproducible.

# Split the dataset into training and testing sets
X = df.drop("TARGET CLASS", axis=1)
y = df["TARGET CLASS"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
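To see what an 80/20 split produces, the sketch below runs train_test_split on made-up data of the same shape (1000 rows, 10 features). The names X_demo and y_demo are placeholders, and the stratify argument is an optional variation that the call above does not use; it keeps the class proportions the same in both parts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 1000 rows, 10 features, balanced binary target.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(1000, 10))
y_demo = np.array([0, 1] * 500)

# stratify=y_demo preserves the 50/50 class ratio in both the training
# and the testing part (an assumption added here, not in the original call).
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print(X_tr.shape, X_te.shape)    # (800, 10) (200, 10)
print(y_tr.mean(), y_te.mean())  # 0.5 0.5
```

Without stratify, the class ratio in the test set can drift slightly from the ratio in the full data, which matters more when the classes are imbalanced.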

Pre-process the data:

The code standardizes the features with scikit-learn's StandardScaler, so that each feature has zero mean and unit variance on the training set. fit_transform is applied to the training data and transform to the testing data, which means the scaling statistics come from the training set only and no information from the testing set leaks into the model. Scaling matters for KNN in particular because the model compares points by distance, and features with larger ranges would otherwise dominate that distance.

# Pre-process the data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
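The effect of fitting on the training data and reusing those statistics on the testing data can be checked directly. This is a minimal sketch on made-up arrays (train, test, and demo_scaler are illustrative names), with two features on deliberately different scales.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: two features on very different scales.
rng = np.random.default_rng(0)
train = np.column_stack([rng.normal(1000, 300, 200), rng.normal(0.5, 0.1, 200)])
test = np.column_stack([rng.normal(1000, 300, 50), rng.normal(0.5, 0.1, 50)])

demo_scaler = StandardScaler()
train_scaled = demo_scaler.fit_transform(train)  # learns mean/std from train only
test_scaled = demo_scaler.transform(test)        # reuses the training mean/std

# Training columns end up with mean ~0 and standard deviation ~1
print(train_scaled.mean(axis=0).round(6))
print(train_scaled.std(axis=0).round(6))

# The test set is scaled with the *training* statistics, so its mean is
# close to 0 but generally not exactly 0.
print(test_scaled.mean(axis=0).round(2))
```

Calling fit_transform on the testing data instead would rescale it with its own mean and standard deviation, so the two sets would no longer be on a common scale.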

Training and testing the KNN model:

The code creates a K-nearest neighbors (KNN) classifier with scikit-learn's KNeighborsClassifier, using n_neighbors=5, and fits it on the training data. The model then predicts labels for the testing data, and the accuracy_score function computes the fraction of test labels that were predicted correctly.

# Train the KNN model
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
KNeighborsClassifier()
# Evaluate the model
y_pred = knn.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy*100)
Accuracy: 82.0
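To make the prediction step less of a black box, here is a rough from-scratch sketch of the idea behind KNN: find the k closest training points by Euclidean distance and take a majority vote. The function knn_predict and the toy arrays are made up for illustration. This is not scikit-learn's implementation, which uses faster neighbor search and may break ties differently.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k=5):
    """Predict labels by majority vote among the k nearest training points."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    X_query = np.asarray(X_query, dtype=float)
    predictions = []
    for point in X_query:
        # Euclidean distance from this query point to every training point
        distances = np.linalg.norm(X_train - point, axis=1)
        # Indices of the k smallest distances
        nearest = np.argsort(distances)[:k]
        # Majority vote among the neighbors' labels
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        predictions.append(labels[np.argmax(counts)])
    return np.array(predictions)

# Tiny made-up example: two well-separated clusters
X_toy = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y_toy = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_toy, y_toy, [[0.5, 0.5], [5.5, 5.5]], k=3))  # [0 1]
```

Because every prediction depends on raw distances, this sketch also shows why the features were standardized first.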
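The value n_neighbors=5 above is a fixed choice. One common way to compare several values of k is cross-validation, sketched below under some assumptions: the data comes from make_classification rather than the real dataset, the candidate values (1, 5, 15, 25) are arbitrary, and the scaler is wrapped in a pipeline so that it is refit inside each fold.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data; replace with the real training split.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=42
)

mean_scores = {}
for k in (1, 5, 15, 25):
    # The pipeline refits the scaler on each fold's training rows, so the
    # validation rows never influence the scaling statistics.
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    mean_scores[k] = cross_val_score(model, X_demo, y_demo, cv=5).mean()

best_k = max(mean_scores, key=mean_scores.get)
print(mean_scores)
print("Best k on this synthetic data:", best_k)
```

The best k found on synthetic data says nothing about the real dataset. The loop would need to be run on the actual training split, with the testing set kept aside for the final accuracy check.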




GitHub link: Complete-Data-Science-Bootcamp

Main Post: Complete-Data-Science-Bootcamp
