DEV Community

Anuj

My First Machine Learning Mini-Adventure with Python

As a Python developer, I have always heard about the language's applications in the latest technological advancements around the world, and I was fascinated to dig into the details of how it is done.

Coincidentally, I heard that one of my relatives was suffering from Parkinson's Disease, and that the doctors had been unable to diagnose it in its earlier stages.

This is where I got the idea for my next adventure with Python: I decided to build a system that detects Parkinson's Disease. Parkinson's Disease is a disorder of the nervous system that affects the movement of certain parts of the body.

MY MACHINE LEARNING ADVENTURE WITH PYTHON

Insights of the Project

For this Machine Learning project I used several Python libraries (scikit-learn, numpy, pandas, and xgboost) to build a model using XGBClassifier. The flow of the project goes like this: loading the data, getting the features and labels, scaling the features, splitting the dataset, building an XGBClassifier, and finally calculating the accuracy of the model.

My Dataset

I used the UCI ML Parkinsons dataset for my project. It has 24 columns and 195 records and is only 39.7 KB in size.

Prerequisites

I installed the following libraries through pip (note that scikit-learn is installed under the package name scikit-learn, not sklearn):

pip install numpy pandas scikit-learn xgboost

I also installed JupyterLab and then launched it with the following command:

C:\Users\Anuj>jupyter lab

This opened a new JupyterLab tab in the browser. Then I created a new console, typed in my code, and pressed Shift+Enter to execute the lines.

BUILDING THE PROJECT

STEP №1

The first step of any project is to make all the necessary imports for the project. I used the following commands for the same.

import numpy as np
import pandas as pd
import os, sys
from sklearn.preprocessing import MinMaxScaler
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

STEP №2

The next step was to read the data into a DataFrame and take a look at the first 5 records.

#Anuj - Read the data
df=pd.read_csv('D:\\Rinu\\parkinsons.data')
df.head()

Screenshot: the first five records of the DataFrame.

STEP №3

Then I got the features and labels from the DataFrame (dataset). The features are all the columns except 'status', and the labels are the values in the 'status' column.

#Anuj - Get the features and labels
features=df.loc[:,df.columns!='status'].values[:,1:]
labels=df.loc[:,'status'].values

STEP №4

The 'status' column has the values 0 and 1 as labels, so I counted the occurrences of each.

#Anuj— Get the count of each label (0 and 1) in labels
print(labels[labels==1].shape[0], labels[labels==0].shape[0])

There are 147 ones and 48 zeros in the status column in the dataset.

STEP №5

Now I initialized a MinMaxScaler and scaled the features to between -1 and 1 to normalize them. MinMaxScaler transforms the features by scaling each of them to a given range. The fit_transform() method does two things: it fits to the data and then transforms it. There is no need to scale the labels.

#Anuj — Scale the features to between -1 and 1
scaler=MinMaxScaler((-1,1))
x=scaler.fit_transform(features)
y=labels
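To see what the scaler actually does, here is a minimal toy example on a single made-up feature (the numbers are chosen purely for illustration): the minimum maps to -1, the maximum to 1, and everything else falls proportionally in between.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# A single feature with minimum 0 and maximum 10
data = np.array([[0.0], [5.0], [10.0]])

scaler = MinMaxScaler((-1, 1))
scaled = scaler.fit_transform(data)
print(scaled.ravel())  # [-1.  0.  1.]
```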

STEP №6

Time to split the dataset into training and testing sets, keeping 20% of the data for testing.

#Anuj — Split the dataset
x_train,x_test,y_train,y_test=train_test_split(x, y, test_size=0.2, random_state=7)
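With 195 records and test_size=0.2, this leaves 156 records for training and 39 for testing. A quick check on dummy arrays of the same shape (the zeros are placeholders, only the shapes matter here; 22 is the number of feature columns left after dropping 'name' and 'status'):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Dummy data with the same shape as the Parkinsons features (195 rows, 22 features)
x_dummy = np.zeros((195, 22))
y_dummy = np.zeros(195)

x_tr, x_te, y_tr, y_te = train_test_split(x_dummy, y_dummy, test_size=0.2, random_state=7)
print(x_tr.shape[0], x_te.shape[0])  # 156 39
```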

STEP №7

I then initialized an XGBClassifier and trained the model. The classification is done using eXtreme Gradient Boosting, i.e. gradient boosting applied to modern Data Science problems. This falls under the category of Ensemble Learning in ML, where many models are combined to produce one superior output.

#Anuj — Train the model
model=XGBClassifier()
model.fit(x_train,y_train)

Screenshot: the fitted XGBClassifier and its parameters.
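The same gradient-boosting idea can be illustrated with scikit-learn's own GradientBoostingClassifier (a stand-in, not the XGBoost library used in this project) on synthetic data; this is just a sketch of the ensemble principle, with all data and parameters made up:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data, purely for illustration
X, y = make_classification(n_samples=200, n_features=10, random_state=7)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=7)

# An ensemble of shallow trees, each correcting the errors of the previous ones
clf = GradientBoostingClassifier(random_state=7)
clf.fit(X_tr, y_tr)
print(clf.score(X_te, y_te))  # accuracy on the held-out split
```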

STEP №8

The final step is to generate y_pred (the predicted values for x_test) and calculate the model's accuracy. Finally, I printed it.

#Anuj — Calculate the accuracy
y_pred=model.predict(x_test)
print(accuracy_score(y_test, y_pred)*100)

Screenshot: the printed accuracy score.
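accuracy_score simply measures the fraction of predictions that match the true labels. A toy example with made-up labels, where one of four predictions is wrong:

```python
from sklearn.metrics import accuracy_score

y_true = [1, 0, 1, 1]
y_hat  = [1, 0, 0, 1]  # one wrong prediction out of four

print(accuracy_score(y_true, y_hat) * 100)  # 75.0
```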

THE FINAL NOTE...

In this Machine Learning project of mine, I detected the presence of Parkinson's Disease in people using various factors. For this I used an XGBClassifier and made use of the sklearn library to prepare the dataset. With this approach I achieved an accuracy of 94.87%, which is more than decent considering how few lines of code this Python project takes.
