DEV Community

Oluwafunmilola Obisesan

Diabetes Prediction using Machine Learning.

Machine learning (ML) is a subset of artificial intelligence (AI) that allows software applications to become more accurate at predicting outcomes without being explicitly programmed to do so.
Machine learning algorithms use historical data as input to predict new output values.
If you’re looking to read more about machine learning, check out this article I wrote for FreeCodeCamp: https://www.freecodecamp.org/news/what-is-machine-learning-for-beginners/

In this project, I worked on developing a machine learning model that predicts the diabetic status of a patient. This was done using two classification algorithms: Support Vector Machine (SVM) and Logistic Regression.

I decided to use both algorithms so I could compare the performance of both on the dataset.

I chose SVM in particular for this project because it handles high-dimensional data well and, with a suitable kernel, can pick up complex patterns in a dataset, which often translates into accurate predictions.

Support Vector Machine (SVM) is quite a powerful machine learning model that operates by finding an optimal hyperplane to separate data into distinct classes. My interest in SVM stems from its core principle: maximizing the margin between the hyperplane and the nearest data points of each class, which makes the classification more robust.
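
To make the hyperplane and margin idea concrete, here is a minimal sketch on a made-up, four-point toy dataset (not the diabetes data). It fits a linear SVM and reads back the hyperplane parameters. The toy points, the variable names, and the large `C` value (chosen so the fit behaves roughly like a hard-margin SVM) are all assumptions for illustration.

```python
import numpy as np
from sklearn import svm

# Toy data (hypothetical): two points per class, separated along the first feature
X_toy = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 0.0], [2.0, 1.0]])
y_toy = np.array([0, 0, 1, 1])

# A large C approximates a hard-margin SVM on linearly separable data
toy_classifier = svm.SVC(kernel='linear', C=1000)
toy_classifier.fit(X_toy, y_toy)

# The separating hyperplane is w . x + b = 0
w = toy_classifier.coef_[0]
b = toy_classifier.intercept_[0]

# The margin (distance between the two class boundaries) is 2 / ||w||
margin_width = 2 / np.linalg.norm(w)

print('w =', w, 'b =', b)
print('Margin width:', margin_width)
print('Support vectors:', toy_classifier.support_vectors_)
```

Here the classes sit at x = 0 and x = 2, so the widest possible gap between them is 2 and the fitted hyperplane should land close to x = 1.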

Data Description:
The dataset used for this project is a diabetes-focused dataset that contains columns such as age, glucose level, blood pressure, insulin level, BMI, and other data, which were used to determine whether a person is diabetic or not.

Steps:

1. Importing the necessary libraries.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn import svm
from sklearn.metrics import accuracy_score

2. Loading in the dataset:
The CSV file was loaded using the code below:

Diabetes_dataset = pd.read_csv("diabetes.csv")

A peep into what the dataset looks like:

Diabetes_dataset.head()

Checking the number of rows and columns present in the dataset.

Diabetes_dataset.shape

Statistical description of the dataset:

Diabetes_dataset.describe()

Counting the diabetic and non-diabetic records in the dataset:

Diabetes_dataset['Outcome'].value_counts()
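
Raw counts are often easier to compare as proportions. A small sketch with a made-up `Outcome` column (the values and the `toy_dataset` name are hypothetical, not the real dataset):

```python
import pandas as pd

# Hypothetical labels: 0 = non-diabetic, 1 = diabetic
toy_dataset = pd.DataFrame({'Outcome': [0, 0, 0, 1, 0, 1, 0, 1, 0, 0]})

# normalize=True returns each class's share of the rows instead of the raw count
class_shares = toy_dataset['Outcome'].value_counts(normalize=True)
print(class_shares)
```

If one class clearly dominates, that is a good reason to keep `stratify=Y` in the train/test split so both splits keep the same class balance.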

3. Extracting dependent and independent variables

X = Diabetes_dataset.drop(columns='Outcome')
Y = Diabetes_dataset['Outcome']

4. Standardizing the “X” values due to the high variation in range of numbers present in the different columns.

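scaler = StandardScaler()
scaler.fit(X)
standardized_data = scaler.transform(X)
print(standardized_data)
# Train on the standardized features so they match the scaler.transform() call used at prediction time
X = standardized_data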

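The data has now been standardized: each column has a mean of 0 and a standard deviation of 1. The values are not confined to a fixed range, so unusually large or small readings can still fall outside -1 and +1.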
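
As a quick check on what `StandardScaler` does, the sketch below computes the z-score by hand and compares it with the scaler's output. The `toy_column` values are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical feature column with one unusually large value
toy_column = np.array([[1.0], [2.0], [3.0], [4.0], [50.0]])

# z-score by hand: subtract the column mean, divide by the (population) standard deviation
manual_z = (toy_column - toy_column.mean(axis=0)) / toy_column.std(axis=0)

# StandardScaler performs the same computation per column
scaled = StandardScaler().fit_transform(toy_column)

print(scaled.ravel())
print('mean:', scaled.mean(), 'std:', scaled.std())
print('max value:', scaled.max())
```

The scaled column ends up with mean 0 and standard deviation 1, while the large value still maps to a z-score above 1.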

5. Splitting the dataset into train and test sets.

X_train, X_test, Y_train, Y_test = train_test_split(X,Y, test_size = 0.2, stratify=Y, random_state=2)
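
The `stratify=Y` argument keeps the share of diabetic and non-diabetic records the same in both splits. A minimal sketch with made-up labels (the `toy_` names and the 35/65 class mix are assumptions):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical labels: 35 positive and 65 negative records
toy_Y = np.array([1] * 35 + [0] * 65)
toy_X = np.arange(100).reshape(-1, 1)  # placeholder feature column

toy_X_train, toy_X_test, toy_Y_train, toy_Y_test = train_test_split(
    toy_X, toy_Y, test_size=0.2, stratify=toy_Y, random_state=2
)

# With stratification, each split keeps the overall positive share
print('Positive share overall:', toy_Y.mean())
print('Positive share in train:', toy_Y_train.mean())
print('Positive share in test:', toy_Y_test.mean())
```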

6. Training and fitting the model using Logistic Regression.

model = LogisticRegression()
model.fit(X_train, Y_train)
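
One variation worth considering for the scaling and training steps: fit the scaler on the training split only, so that no information from the test rows influences the scaling. The sketch below does this with a scikit-learn pipeline. It runs on synthetic data from `make_classification` rather than the diabetes CSV, and the `demo_` names are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the diabetes data: 8 numeric features, binary target
demo_X, demo_Y = make_classification(n_samples=300, n_features=8, random_state=2)

demo_X_train, demo_X_test, demo_Y_train, demo_Y_test = train_test_split(
    demo_X, demo_Y, test_size=0.2, stratify=demo_Y, random_state=2
)

# The pipeline fits the scaler on the training rows only, then trains the model
pipeline = make_pipeline(StandardScaler(), LogisticRegression())
pipeline.fit(demo_X_train, demo_Y_train)

# At prediction time the same fitted scaler is applied automatically
demo_test_accuracy = pipeline.score(demo_X_test, demo_Y_test)
print('Test accuracy:', demo_test_accuracy)
```

A side benefit of this design is that new inputs can be passed to `pipeline.predict` without a separate scaling call.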

7. Checking the accuracy score of the model using the train and test data.

Accuracy score using the train data:

X_train_prediction = model.predict(X_train)
training_data_accuracy = accuracy_score(Y_train, X_train_prediction)
print('Accuracy score on Training data : ', training_data_accuracy)

Accuracy score using the test data:

X_test_prediction = model.predict(X_test)
test_data_accuracy = accuracy_score(Y_test, X_test_prediction)
print('Accuracy score on Test Data : ', test_data_accuracy)
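
For reference, the accuracy score is simply the fraction of predictions that match the true labels. A tiny sketch with made-up labels (the arrays are hypothetical):

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical true labels and model predictions
true_labels = np.array([0, 1, 1, 0, 1])
predicted_labels = np.array([0, 1, 0, 0, 1])

# Accuracy by hand: the share of positions where prediction and truth agree
manual_accuracy = np.mean(true_labels == predicted_labels)

# accuracy_score is documented as (y_true, y_pred); accuracy itself is symmetric,
# so swapping the two arguments does not change the result
library_accuracy = accuracy_score(true_labels, predicted_labels)

print(manual_accuracy, library_accuracy)  # both are 0.8 (4 of 5 correct)
```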

8. Training and fitting the model using Support Vector Machine.

classifier = svm.SVC(kernel='linear')
classifier.fit(X_train, Y_train)

9. Checking the accuracy score of the SVM model using the train and test data.

Accuracy score using the train data:

X_train_prediction = classifier.predict(X_train)
training_data_accuracy = accuracy_score(Y_train, X_train_prediction)
print('Accuracy score on the training data : ', training_data_accuracy)

Accuracy score using the test data:

X_test_prediction = classifier.predict(X_test)
test_data_accuracy = accuracy_score(Y_test, X_test_prediction)
print('Accuracy score on the test data : ', test_data_accuracy)

From the accuracy scores of both models, we can see that the Support Vector Machine performed slightly better than the Logistic Regression model.
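
A single train/test split can make two models look further apart (or closer together) than they really are. One way to check a small gap like this is k-fold cross-validation. The sketch below is only an illustration: it uses synthetic data from `make_classification` as a stand-in for the diabetes CSV, and the 5-fold setting and variable names are arbitrary assumptions.

```python
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the diabetes data
cv_X, cv_Y = make_classification(n_samples=300, n_features=8, random_state=2)

candidates = {
    'Logistic Regression': make_pipeline(StandardScaler(), LogisticRegression()),
    'SVM (linear kernel)': make_pipeline(StandardScaler(), svm.SVC(kernel='linear')),
}

cv_results = {}
for name, candidate in candidates.items():
    # 5-fold cross-validation: one accuracy score per fold
    scores = cross_val_score(candidate, cv_X, cv_Y, cv=5, scoring='accuracy')
    cv_results[name] = scores
    print(name, 'mean accuracy:', scores.mean(), 'std:', scores.std())
```

If the difference between the mean scores is smaller than the spread across folds, the two models are probably performing about the same on that data.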

Testing the model: predicting a random individual's diabetic status using the model.

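# Step 1: the individual's feature values, in the same column order as the training data
individuals_data = (2, 141, 84, 26, 175, 34, 0.42, 36)
# Step 2: convert the tuple to a NumPy array
individuals_data_as_numpy_array = np.asarray(individuals_data)
# Step 3: reshape to a single row, since we are predicting for one individual
individuals_data_reshaped = individuals_data_as_numpy_array.reshape(1, -1)
# Step 4: standardize the input with the scaler fitted earlier
std_data = scaler.transform(individuals_data_reshaped)
print(std_data)
# Step 5: predict with the trained SVM classifier
prediction = classifier.predict(std_data)
print(prediction)
if prediction[0] == 0:
    print('The person is not diabetic')
else:
    print('The person is diabetic')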
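
The same five steps can be wrapped in a small helper so they are easy to reuse. This is a sketch only: `predict_diabetic_status` is a hypothetical name, and the `demo_` scaler and classifier are trained on synthetic data purely so the example runs on its own.

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler


def predict_diabetic_status(fitted_scaler, fitted_classifier, feature_values):
    """Return a readable label for one individual's feature values."""
    # Reshape to one row, scale with the already-fitted scaler, then predict
    row = np.asarray(feature_values, dtype=float).reshape(1, -1)
    prediction = fitted_classifier.predict(fitted_scaler.transform(row))
    if prediction[0] == 0:
        return 'The person is not diabetic'
    return 'The person is diabetic'


# Synthetic stand-in so the sketch is self-contained (8 features per record)
demo_X, demo_Y = make_classification(n_samples=200, n_features=8, random_state=2)
demo_scaler = StandardScaler().fit(demo_X)
demo_classifier = svm.SVC(kernel='linear').fit(demo_scaler.transform(demo_X), demo_Y)

demo_message = predict_diabetic_status(demo_scaler, demo_classifier, demo_X[0])
print(demo_message)
```

With the objects from this post, the call would be `predict_diabetic_status(scaler, classifier, (2, 141, 84, 26, 175, 34, 0.42, 36))`.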

For the entire code of this project, check the notebook on my GitHub:
https://github.com/heyfunmi/Diabetes-Prediction-using-SVM/blob/main/Diabetes_Prediction.ipynb

Thank you for reading, Ciao!!
