DEV Community

Webs254

Kenya's COVID-19 Curve: Peaks, Silences, and Predicting the Future

While whispers of persistent coughs and distant outbreaks linger, the global narrative of COVID-19 seems to have shifted. Yet the virus's shadow remains, particularly in regions like Kenya. As an MPH student in Epidemiology and Disease Control, I embarked on a data-driven exploration of Kenya's COVID-19 journey using the World Health Organization (WHO) dataset, with records running up to December 31, 2023.

Cleaning and Shaping the Data:

Before delving into the Kenyan story, I addressed the messy reality of data. Country names were streamlined for clarity (Tanzania replacing "United Republic of Tanzania"), and missing values were tackled.
I carved out a dedicated dataframe for Kenya, ready for focused analysis. Some of the code I used is shown below:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# df is assumed to already hold the WHO COVID-19 dataset (loading step not shown).
# Map the long official country names to shorter labels.
country_names = {
    "Bolivia (Plurinational State of)": "Bolivia",
    "Democratic Republic of the Congo": "DRC",
    "Iran (Islamic Republic of)": "Iran",
    "Kosovo (in accordance with UN Security Council resolution 1244 (1999))": "Kosovo",
    "Micronesia (Federated States of)": "Micronesia",
    "Netherlands (Kingdom of the)": "Netherlands",
    "occupied Palestinian territory, including east Jerusalem": "Palestine",
    "Republic of Korea": "South Korea",
    "Republic of Moldova": "Moldova",
    "Russian Federation": "Russia",
    "Syrian Arab Republic": "Syria",
    "United Kingdom of Great Britain and Northern Ireland": "UK",
    "United Arab Emirates": "UAE",
    "United Republic of Tanzania": "Tanzania",
    "United States of America": "USA",
    "United States Virgin Islands": "Virgin Islands",
    "Venezuela (Bolivarian Republic of)": "Venezuela",  # was mislabelled as "Bolivia"
}
# Assign the result back instead of using inplace=True on a column selection.
df['Country'] = df['Country'].replace(country_names)

# Keep only the Kenyan rows; .copy() gives an independent dataframe.
Kenya_Statistics = df[df['Country'] == 'Kenya'].copy()

Unveiling Kenya's COVID-19 Landscape:

The data revealed a captivating story:

Peak Panic: December 26, 2021, saw Kenya grapple with its highest reported caseload – a staggering 19,023.
Early Echoes: The lowest case numbers were recorded on January 5, 2020, likely reflecting limited detection efforts in the pandemic's nascent stages.
Spikes and Silences: The data displayed periods of worrying spikes, interspersed with quieter stretches. However, a concerning gap emerged after November 11, 2023, hindering further analysis and potentially impacting the accuracy of predictions.
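
For readers who want to reproduce figures like these, here is a minimal sketch of how the peak, the lowest point, and the last reported date could be pulled from a country dataframe. It assumes the column names `Date_reported` and `New_cases` used elsewhere in this post; the helper name `summarize_cases` and the toy numbers are purely illustrative and are not taken from the WHO data.

```python
import pandas as pd


def summarize_cases(country_df, date_col="Date_reported", case_col="New_cases"):
    """Return the peak, the lowest point, and the last date with a reported value.

    Hypothetical helper for illustration; column names are assumptions.
    """
    series = (
        country_df.assign(**{date_col: pd.to_datetime(country_df[date_col])})
        .set_index(date_col)[case_col]
        .sort_index()
    )
    # Ignore dates where no value was reported at all.
    reported = series.dropna()
    return {
        "peak_date": reported.idxmax(),
        "peak_cases": reported.max(),
        "lowest_date": reported.idxmin(),
        "lowest_cases": reported.min(),
        "last_reported": reported.index.max(),
    }


# Toy data (made up): the final row has no reported value.
toy = pd.DataFrame(
    {
        "Date_reported": ["2021-12-05", "2021-12-12", "2021-12-19", "2021-12-26"],
        "New_cases": [10, 40, 25, None],
    }
)
summary = summarize_cases(toy)
print(summary)
```

If `last_reported` falls well before the end of the dataset, that is a quick signal of the kind of trailing gap described above.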

A visualization of Covid-19 Deaths and Infections in Kenya

A visualization of Covid-19 Infections in Kenya

Predicting the Future with Prophet:

Despite the data gap, I ventured into the realm of prediction using Prophet, a simple yet powerful forecasting tool. The model, while projecting zero cases for later periods, highlighted the limitations of incomplete training data. This serves as a stark reminder: accurate models rely on robust and comprehensive data.

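from prophet import Prophet
from sklearn.model_selection import train_test_split

# Chronological 80/20 split: shuffle=False keeps the most recent rows for testing.
train_data, test_data = train_test_split(Kenya_Statistics, test_size=0.2, shuffle=False)

# Prophet expects a datetime column named 'ds' and a target column named 'y'.
train_prophet = (
    train_data.rename(columns={'Date_reported': 'ds', 'New_cases': 'y'})[['ds', 'y']]
    .assign(ds=lambda d: pd.to_datetime(d['ds']))
)

prophet_model = Prophet()
prophet_model.fit(train_prophet)

# Extend the training dates by 5 month-end periods and predict over the whole range.
future = prophet_model.make_future_dataframe(periods=5, freq='M')
forecast = prophet_model.predict(future)

# The model is fitted on New_cases only, so the plot is labelled accordingly.
prophet_model.plot(forecast, xlabel='Date', ylabel='New cases', figsize=(15, 6))
plt.title('Forecast: New COVID-19 Cases in Kenya, 5 Months Beyond Training Data')
plt.legend()
plt.show()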


This underscores the need to test the validity and reliability of data when developing models.
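
One way to make that check concrete is to score the forecast against the held-out rows. The sketch below is an assumption on my part rather than a step from the original analysis: it computes the mean absolute error between Prophet-style predictions (a frame with `ds` and `yhat` columns) and the actual `New_cases` values on matching dates. The helper name `holdout_mae` and the toy numbers are hypothetical. Predictions are clipped at zero because case counts cannot be negative.

```python
import pandas as pd


def holdout_mae(forecast, test_df, date_col="Date_reported", case_col="New_cases"):
    """Mean absolute error of 'yhat' on the dates present in both frames.

    Hypothetical helper for illustration; column names are assumptions.
    """
    actual = (
        test_df.rename(columns={date_col: "ds", case_col: "y"})[["ds", "y"]]
        .assign(ds=lambda d: pd.to_datetime(d["ds"]))
        .dropna(subset=["y"])
    )
    predicted = forecast[["ds", "yhat"]].assign(
        ds=lambda d: pd.to_datetime(d["ds"]),
        # Case counts cannot be negative, so clip the predictions at zero.
        yhat=lambda d: d["yhat"].clip(lower=0),
    )
    merged = actual.merge(predicted, on="ds", how="inner")
    if merged.empty:
        raise ValueError("forecast and test data share no dates")
    return float((merged["y"] - merged["yhat"]).abs().mean())


# Toy data (made up) standing in for a Prophet forecast and a test split.
toy_forecast = pd.DataFrame(
    {"ds": ["2023-01-01", "2023-01-08", "2023-01-15"], "yhat": [10.0, -5.0, 20.0]}
)
toy_test = pd.DataFrame(
    {"Date_reported": ["2023-01-08", "2023-01-15", "2023-01-22"], "New_cases": [3, 26, 7]}
)
mae = holdout_mae(toy_forecast, toy_test)
print(mae)
```

Note that the forecast has to cover the test dates for the merge to find anything. With a month-end future frame the dates may not line up with the reporting dates, so it may be simpler to pass the test dates themselves (as a `ds` column) to `prophet_model.predict`.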

Prediction of future cases using Prophet

Beyond the Numbers:

This exploration offers valuable takeaways:

Data matters: Reliable predictions depend on data quality and completeness.
Machine learning's potential: Tools like Prophet can support healthcare decision-making, provided the training data is sound.
Addressing data gaps: Continuous data collection, and filling existing gaps, is needed for accurate analysis.
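
As a starting point for spotting gaps before modelling, here is a small sketch that lists stretches where consecutive reported values are further apart than expected. The weekly reporting interval (`expected_step="7D"`), the helper name `find_reporting_gaps`, and the toy numbers are all assumptions for illustration; adjust the interval to match the dataset's actual cadence.

```python
import pandas as pd


def find_reporting_gaps(country_df, date_col="Date_reported", case_col="New_cases",
                        expected_step="7D"):
    """Return (last_seen, next_seen) date pairs that are further apart than expected_step.

    Hypothetical helper for illustration; rows with a missing case value count as unreported.
    """
    reported = country_df.dropna(subset=[case_col])
    dates = (
        pd.to_datetime(reported[date_col])
        .drop_duplicates()
        .sort_values()
        .reset_index(drop=True)
    )
    deltas = dates.diff()
    # After reset_index, index labels equal positions, so they can be used with iloc.
    gap_positions = deltas[deltas > pd.Timedelta(expected_step)].index
    return [(dates.iloc[i - 1], dates.iloc[i]) for i in gap_positions]


# Toy data (made up): two consecutive weeks have no reported value.
toy = pd.DataFrame(
    {
        "Date_reported": ["2023-10-01", "2023-10-08", "2023-10-15",
                          "2023-10-22", "2023-10-29", "2023-11-05"],
        "New_cases": [5, 7, None, None, 4, 6],
    }
)
gaps = find_reporting_gaps(toy)
print(gaps)
```

This only finds gaps that have a later report after them. A trailing gap, where reporting stops altogether, needs a separate comparison of the last reported date against the end of the dataset.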

Machine learning models could help various industries predict future results. A production facility could use historical data to predict the future output of a process. The same approach could be used to anticipate health events such as epidemics.

The Road Ahead:

My short journey through Kenya's COVID-19 data is just the beginning. Further research is needed to address data gaps, refine models, and provide reliable predictions for informed decision-making. As we navigate the pandemic's evolving landscape, let's remember that high-quality data is our compass, and machine learning tools can be powerful allies in charting a safer future.

The code I used for my models and exploratory data analysis can be found on my GitHub.
