Welcome to Data Stories, a series in which we explore various datasets and deduce insights. We shall listen to the data as it tells us its story.
We shall begin with a story about medical appointment attendance in Brazil. The data hails from Kaggle and we shall explore it to find out whether or not Brazilians attend their medical appointments.
Side Note: This project was part of the Udacity Data Analytics Nanodegree, so why not share it with fellow techies?
Table of Contents
- Questions For Analysis
- Data Wrangling
- Data Cleaning
- Exploratory Data Analysis
Introduction: Dataset Description
The no-show appointments dataset is a collection of about 100,000 medical appointments in Brazil, recording for each one whether the patient showed up. It aims to help us figure out why patients miss their doctor's visits. There is enough data to tell a story about why people don't show up, which can help answer the questions in the analysis portion and mitigate issues such as poor patient management.
Questions For Analysis
The following are some of the questions that we seek to answer from the analysis of this dataset:
- How well do patients attend their scheduled appointments?
- What is the relationship between the independent variables (such as Gender, SMS reminders, Diabetes and Hypertension) and the dependent variable, whether the patient showed up?
- How well do patients on the Scholarship program attend their medical appointments?
Data Wrangling
In this section the data is loaded into the notebook and reviewed for errors, which we'll address in the Data Cleaning section.
We begin by importing the necessary libraries.
# Import statements for all of the packages that are used.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
During wrangling, a few problems were noticed in the dataset:
- The column names Hipertension and Handcap are misspelled, SMS_received doesn't need the underscore, and No-show doesn't need the hyphen.
- The PatientId and AppointmentID columns are not relevant to this analysis.
- ScheduledDay and AppointmentDay are stored as objects (strings) even though they hold dates, so they should be converted to a datetime type.
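A minimal sketch of how the date-type problem surfaces on loading, assuming the column names of the Kaggle CSV. The two-row CSV text below is made up and only stands in for the real file.

```python
import io

import pandas as pd

# A tiny made-up CSV excerpt using the Kaggle column names (assumed); not real patient data.
csv_text = """PatientId,AppointmentID,ScheduledDay,AppointmentDay,Hipertension,Handcap,SMS_received,No-show
1,101,2016-04-29T18:38:08Z,2016-04-29T00:00:00Z,1,0,0,No
2,102,2016-04-27T08:36:51Z,2016-04-29T00:00:00Z,0,0,1,Yes
"""

df = pd.read_csv(io.StringIO(csv_text))

# read_csv does not parse dates unless asked (via parse_dates), so both day columns
# load as plain text (reported as dtype object in pandas 1.x and 2.x), not as datetimes.
print(df.dtypes)
print(pd.api.types.is_datetime64_any_dtype(df["ScheduledDay"]))  # False
```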
Data Cleaning
In this section we shall correct the aforementioned problems.
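The three corrections can be sketched in pandas as below. This is an illustration rather than the notebook's actual code: the rows are made up, the input columns are assumed to follow the Kaggle CSV, and the replacement names (Hypertension, Handicap, SMSReceived, NoShow) are just one possible choice.

```python
import pandas as pd

# Three made-up appointments using the Kaggle column names (assumed), for illustration only.
df = pd.DataFrame({
    "PatientId": [1, 2, 3],
    "AppointmentID": [101, 102, 103],
    "ScheduledDay": ["2016-04-29T18:38:08Z", "2016-04-27T08:36:51Z", "2016-04-26T11:12:30Z"],
    "AppointmentDay": ["2016-04-29T00:00:00Z", "2016-04-29T00:00:00Z", "2016-05-02T00:00:00Z"],
    "Hipertension": [1, 0, 0],
    "Handcap": [0, 0, 1],
    "SMS_received": [0, 1, 1],
    "No-show": ["No", "Yes", "No"],
})

# 1. Fix misspelled or awkward column names (the new names are a choice made for this sketch).
df = df.rename(columns={
    "Hipertension": "Hypertension",
    "Handcap": "Handicap",
    "SMS_received": "SMSReceived",
    "No-show": "NoShow",
})

# 2. Drop the identifier columns, which carry no analytical signal here.
df = df.drop(columns=["PatientId", "AppointmentID"])

# 3. Parse the day columns into datetimes; the trailing "Z" makes them UTC-aware.
for col in ["ScheduledDay", "AppointmentDay"]:
    df[col] = pd.to_datetime(df[col])

print(df.dtypes)
```

Converting the dates up front also makes later derived features, such as the wait between ScheduledDay and AppointmentDay, a simple subtraction.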
Exploratory Data Analysis
The data is now clean and ready for exploration. Here we use summary statistics and visualizations to address the questions posed in the Questions For Analysis section.
The important details from the visualizations are as follows:
- Gender: there is a stark difference between males and females in attendance. The visualization shows fewer males attending their appointments than females.
- SMS received: there is a large difference in attendance between patients who receive SMS reminders and those who do not. A large number of those who receive the text do not show up to their appointments.
- Diabetes: patients with Diabetes show a dismal trend. The visualization shows they attend their appointments less often than those without the disease.
- Hypertension: patients with Hypertension show a dismal trend. The visualization shows they attend their appointments less often than those without the disease.
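A minimal sketch of how per-group attendance rates could be computed behind comparisons like these, using a few made-up rows with the Kaggle column names (assumed). Per the Kaggle data dictionary, "No" in No-show means the patient showed up, so the sketch first derives an explicit attended flag to avoid reading the label backwards.

```python
import pandas as pd

# Six made-up appointments using the Kaggle column names (assumed), for illustration only.
df = pd.DataFrame({
    "Gender": ["F", "M", "F", "M", "F", "M"],
    "SMS_received": [1, 0, 1, 1, 0, 0],
    "Diabetes": [0, 1, 0, 0, 1, 0],
    "Hipertension": [1, 0, 0, 1, 1, 0],
    "No-show": ["No", "Yes", "No", "Yes", "Yes", "No"],
})

# Per the Kaggle data dictionary, "No" means the patient showed up; "Yes" means they missed it.
df["attended"] = df["No-show"].eq("No")

# The mean of a boolean flag within each group is that group's share of appointments attended.
factors = ["Gender", "SMS_received", "Diabetes", "Hipertension"]
rates = {factor: df.groupby(factor)["attended"].mean() for factor in factors}

for factor, series in rates.items():
    print(f"{factor}:\n{series}\n")
```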
Conclusions
This section summarizes the research in relation to the questions posed in the Questions For Analysis section.
From the study of 100,000 medical appointments by patients in Brazil, we can draw the general conclusion that most of them do not attend their scheduled appointments. This is a sad state of affairs, because medical appointments are very useful for the early detection of diseases and for managing patients' conditions.
How well do patients attend their scheduled appointments?
Attendance is poor: only 20% of patients attend the appointments they schedule, while 80% fail to attend. This makes it harder to manage conditions such as Diabetes and Hypertension and may have a domino effect, with symptoms worsening and even early deaths that could have been prevented had the patients attended their appointments.
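One way to compute an overall attendance split is value_counts(normalize=True) on the No-show column. The five outcomes below are made up; note that in the Kaggle data dictionary "No" means the patient showed up and "Yes" means they missed the appointment.

```python
import pandas as pd

# Five made-up outcomes in the Kaggle "No-show" format (assumed), for illustration only.
no_show = pd.Series(["No", "No", "Yes", "No", "Yes"], name="No-show")

# normalize=True turns raw counts into proportions that sum to 1.
shares = no_show.value_counts(normalize=True)
print(shares)

# "No" = showed up, "Yes" = missed, per the Kaggle data dictionary.
attendance_rate = shares.get("No", 0.0)
print(f"Attendance rate: {attendance_rate:.0%}")  # Attendance rate: 60%
```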
What is the relationship between the independent variables and the dependent variable?
In terms of Gender, fewer males attend their medical appointments than females.
In terms of SMS received, there is a large difference in attendance between patients who receive SMS reminders and those who do not, with a greater number of those who receive the text not showing up to their appointments.
In terms of Diabetes, patients with the disease show a dismal trend: they attend their appointments less often than those without it.
In terms of Hypertension, patients with the condition show a dismal trend: they attend their appointments less often than those without it.
How well do patients on the Scholarship program attend their medical appointments?
Attendance among patients on the Scholarship program is poor: only 10% of them attend their scheduled appointments, while an astounding 90% do not. This group was worth studying separately because the Scholarship program, known as Bolsa Família, is a welfare program that assists Brazilian families with an allowance, intended in part to help with their medical expenses. However, as the study shows, this support does not appear to translate into attendance. The government should look into this situation, put measures in place to ensure the program works for the people of Brazil, and encourage beneficiaries to attend their medical appointments.
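A sketch of a per-group split using pd.crosstab with row normalization, on made-up rows with the Kaggle Scholarship and No-show columns (assumed). As in the Kaggle data dictionary, "No" means the patient showed up.

```python
import pandas as pd

# Eight made-up appointments using the Kaggle column names (assumed), for illustration only.
df = pd.DataFrame({
    "Scholarship": [1, 1, 1, 0, 0, 0, 0, 0],
    "No-show": ["Yes", "No", "No", "No", "No", "Yes", "No", "No"],
})

# normalize="index" divides each row by its total, giving the showed-up ("No") versus
# missed ("Yes") split within each Scholarship group.
table = pd.crosstab(df["Scholarship"], df["No-show"], normalize="index")
print(table)

# A different quantity: the share of all appointments that belong to Scholarship patients.
share_enrolled = df["Scholarship"].mean()
print(f"Share of appointments on Scholarship: {share_enrolled:.1%}")
```

The row-normalized table gives attendance within each group, which is not the same as the share of appointments belonging to Scholarship patients; the last line computes the latter for comparison.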
Limitations in the research
Other factors behind Brazilian citizens' failure to show up to their medical appointments may not have been captured in this research, and they could affect the results of the study.
Additional research is needed on the Scholarship program to find out why it has not met its intended goal and why its beneficiaries are not attending their scheduled appointments.
Pereira, A. W. (2015). Bolsa Família and democracy in Brazil. Third World Quarterly, 36(9), 1682-1699.
Here is the repo with the notebook. Feel free to leave a comment with your critique or share more insights we could deduce.
Until next time, may the code be with you.