Andisi (Roseland) Ambuku

The Ultimate Guide To Getting Started in Data Science

I am sure that if you follow #TechTwitter, you have come across this buzzword: Data Science (alongside Web 3 and JavaScript, of course).

Let us explore what Data Science is and find out what makes it so popular.

Data Science is the practice of deriving insights from large amounts of data. Data refers to the facts that are collected for analysis.

Why learn Data Science?

We live in a data-driven world, meaning that many decisions are made with data at their core.
Data also helps us understand the world better. Data on populations, trade, etc. helps us see how the world works and how to optimize it.
With the rise of digital devices such as smartphones, laptops, and even refrigerators, we constantly generate large amounts of data, which can lead to actionable insights about ourselves, for example, our social media usage (yikes! we might want to cut down on it).

What next?

Now that you know how vital Data Science is in our modern world, you might presume we will dive straight into the tools and skillset. No (we'll get there, eager learner).

The foundation for proper data science is understanding the issue or problem you are trying to solve. This guides the data science process and leads to insightful actions from the data. For example, if you have marketing and customer data, the problem could be defined as how to target marketing better.
This is a simple checklist to guide the process:

  1. Assess the issue
  2. Define the objective
  3. Define the problem

With the problem clearly defined, you can proceed to the next aspect.

The Data Science Process

[Figure: the data science process]
With the problem defined you can go ahead and solve it via this process.
Data Collection
The first step is to collect the data related to the problem you are trying to solve, for example:
Problem: The extent of Covid-19 spread in the country
Data: Reported Covid-19 cases in the country
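
As a minimal sketch of this step, the snippet below reads reported case counts from CSV text using Python's standard csv module. The column names (region, date, cases) and the sample rows are made up for illustration; real data would come from a file or a published dataset.

```python
import csv
import io

# Hypothetical sample data; in practice this would be a downloaded CSV file.
RAW_CSV = """region,date,cases
North,2021-01-01,120
North,2021-01-02,135
South,2021-01-01,40
South,2021-01-02,
"""

def load_cases(csv_text):
    """Parse CSV text into a list of dicts, one per reported row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return list(reader)

rows = load_cases(RAW_CSV)
print(len(rows), "rows collected")
print(rows[0])
```

Note that every value arrives as a string and the last row has an empty cases field. Collected data is rarely ready to use as-is.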

Data Preparation
Once you have collected the data, explore it to find patterns and points of interest that are relevant to the problem. This step is important because it shows you what you need from the data and what to eliminate.
This is where data cleaning, the backbone of data science, comes in: missing data is handled and outliers are dealt with.
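
Here is one possible way to handle both issues, sketched with the standard statistics module. It drops values outside the common 1.5 x IQR fences and then fills missing entries with the median. The sample counts and the clean_counts helper are illustrative assumptions. Whether an outlier should be dropped, capped, or investigated depends on the problem.

```python
import statistics

def clean_counts(values, k=1.5):
    """Drop IQR outliers, then fill missing entries with the median."""
    known = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(known, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr

    # Keep missing entries for now; drop only values outside the fences.
    kept = [v for v in values if v is None or low <= v <= high]

    # Impute missing entries with the median of the remaining values.
    fill = statistics.median(v for v in kept if v is not None)
    return [fill if v is None else v for v in kept]

# Hypothetical daily case counts; None marks a missing report.
raw_counts = [120, 135, None, 128, 131, 5000, 125, 130, 127]
cleaned = clean_counts(raw_counts)
print(cleaned)
```

In this made-up series the value 5000 falls far outside the fences and is removed, and the missing day is filled with the median of the remaining counts.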

Data Analysis
Once you have clean data to work with, you can analyze it and select the features that will help create the best model. A data model is a conceptual model that organizes how different features of the data relate to one another. This will assist in the analysis of your data and reveal insights.
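
One simple first pass at feature selection, shown here only as a sketch, is to rank candidate features by how strongly they correlate with the quantity you care about. The feature names and numbers below are invented for illustration. Pearson correlation only captures linear relationships, so treat the ranking as a starting point rather than a final answer.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-region data: the target and two candidate features.
cases = [10, 20, 30, 40, 50]
features = {
    "population_density": [100, 200, 300, 400, 500],
    "rainfall_mm": [5, 3, 6, 2, 4],
}

correlations = {name: pearson(vals, cases) for name, vals in features.items()}

# Rank features by the strength of their relationship with the target.
ranked = sorted(correlations, key=lambda name: abs(correlations[name]), reverse=True)
print(ranked)
```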

Report on the data findings
Once the insights have been deduced, you move on to writing a report on your findings. You present the insights and visualize the results to help your audience better understand the outcome of the whole process. You can use data visualization tools such as charts and heat maps to present findings.
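
To give a feel for what a chart does, here is a small text-based bar chart built with plain Python. It is a stand-in for a real plotting library such as Matplotlib, and the region totals are made up for illustration.

```python
def text_bar_chart(totals, width=40):
    """Return one line per label, with bars scaled to the largest value."""
    largest = max(totals.values())
    label_width = max(len(label) for label in totals)
    lines = []
    for label, value in totals.items():
        bar = "#" * round(value / largest * width)
        lines.append(f"{label:<{label_width}} | {bar} {value}")
    return lines

# Hypothetical total cases per region.
region_totals = {"North": 400, "South": 100, "East": 200}
chart = text_bar_chart(region_totals, width=20)
print("\n".join(chart))
```

Even in this crude form, the relative size of each region's total is easier to see than in a table of numbers.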

Implement Actions
At the end of the process, you turn the insights into action, for example, identifying areas that require a lockdown to curb the spread of Covid-19. You present this as a recommendation to your audience and, once you get the go-ahead, implement it.
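
A recommendation like this can be backed by a simple, transparent rule. The sketch below flags regions whose case rate per 100,000 people exceeds a threshold. The figures, the threshold, and the regions_to_flag helper are all hypothetical and are not epidemiological guidance.

```python
def regions_to_flag(cases, population, threshold_per_100k):
    """Return (region, rate) pairs above the threshold, highest rate first."""
    flagged = []
    for region, count in cases.items():
        rate = count * 100_000 / population[region]
        if rate > threshold_per_100k:
            flagged.append((region, round(rate, 1)))
    return sorted(flagged, key=lambda item: item[1], reverse=True)

# Hypothetical totals and populations per region.
cases = {"North": 400, "South": 100, "East": 200}
population = {"North": 200_000, "South": 500_000, "East": 80_000}

recommendations = regions_to_flag(cases, population, threshold_per_100k=150)
print(recommendations)
```

Normalizing by population matters here: East has fewer cases than North in this made-up data but a higher rate, so it comes first in the list.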

Tools and Skillset

After going through the data science process, you will have realized that certain tools and skills are required to carry it out. They are as follows:

  • Data Collection
    Tools: Cloud services such as AWS, Azure and Google Cloud Platform

  • Data Cleaning
    Tools and Languages: SQL, Python, R, and databases such as Postgres and MongoDB

  • Data Visualization
    Tools and Libraries: Excel, Tableau, Power BI, Matplotlib, Seaborn

Link to resources

Until next time, may the code be with you.
