DEV Community

Alvin Mustafa


Understanding Your Data: The Essentials of Exploratory Data Analysis

Before data can be turned into information, it has to be understood. That means examining it first: the number of records (rows), the features (columns), the data types, and any missing values that need to be identified and handled. Exploratory Data Analysis (EDA) is a crucial step in any data analysis project.

What Is EDA?
EDA is an abbreviation for Exploratory Data Analysis. It is the process of analyzing and visualizing data to understand its characteristics, relationships, and anomalies, and to discover patterns. The main goal of EDA is to get a general overview of the data before building predictive models.
Before beginning EDA, it is important to know the terminology:
Dataset: A collection of data organized in a structured (tabular) format.

Value: A specific piece of data, such as a number or a name.

Outlier: A data value that differs markedly from the rest of the dataset.

Steps Involved in Exploratory Data Analysis
EDA covers a range of activities. Here is a breakdown:
1. Data Observation
Start by finding out the size of your dataset: the number of rows and columns. Data observation helps determine which method of analysis to use.
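
As a minimal sketch of this first look, assuming the data is loaded into a pandas DataFrame, the snippet below reads off the size and the column types. The sample records and column names are hypothetical and stand in for your own dataset.

```python
import pandas as pd

# Hypothetical sample data; in practice you would load your own file,
# for example with pd.read_csv(...).
df = pd.DataFrame({
    "name": ["Asha", "Brian", "Chen", "Dede"],
    "age": [29, 41, 35, 52],
    "city": ["Nairobi", "Mombasa", "Nairobi", "Kisumu"],
})

# df.shape is a (rows, columns) tuple.
n_rows, n_cols = df.shape
print(f"Rows: {n_rows}, Columns: {n_cols}")

# Data type of each column.
print(df.dtypes)

# First few records, to see what the values look like.
print(df.head())
```

`df.info()` gives a similar overview in one call, including the non-null count per column.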

2. Data Cleaning
Data cleaning involves:

  • Identifying missing values and handling them. They can be handled by filling them with relevant values or by dropping the affected rows or columns.
  • Detecting outliers and handling them.
  • Transforming data to make it suitable for analysis.
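
The cleaning steps above can be sketched with pandas as follows. The data is made up, and the choices shown here (filling a numeric column with its median, dropping rows with a missing category, and flagging outliers with the 1.5 × IQR rule) are common conventions assumed for illustration, not the only valid options.

```python
import pandas as pd

# Hypothetical data with one missing age, one missing city,
# and one implausible age (300).
df = pd.DataFrame({
    "age": [29, 41, None, 35, 38, 33, 30, 36, 300],
    "city": ["Nairobi", "Mombasa", "Kisumu", None, "Nairobi",
             "Kisumu", "Mombasa", "Nairobi", "Kisumu"],
})

# Identify missing values: count them per column.
missing_counts = df.isna().sum()
print(missing_counts)

# Handle missing values: fill the numeric column with its median,
# and drop rows where the city is missing.
df["age"] = df["age"].fillna(df["age"].median())
df = df.dropna(subset=["city"])

# Detect outliers with the interquartile range (IQR) rule:
# anything beyond 1.5 * IQR from the quartiles is flagged.
q1 = df["age"].quantile(0.25)
q3 = df["age"].quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = df[(df["age"] < lower) | (df["age"] > upper)]
print(outliers)
```

Whether a flagged value should be removed, corrected, or kept depends on the dataset; the rule only points at values worth a closer look.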

3. Categorizing Your Data
Knowing the type of each column helps determine which visualization and statistical methods can be used on your dataset. Values can be placed in the following categories:

  • Numerical: Data that represents measurable quantities and is expressed in numbers.
  • Categorical: Data that represents categories or groups.
  • Date and Time: Data that represents points in time.
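
One way to sort columns into these three categories, assuming a pandas DataFrame, is `select_dtypes`. The column names below are hypothetical, and treating every column that is neither numeric nor datetime as categorical is a simplifying assumption.

```python
import pandas as pd

# Hypothetical order data with a mix of column types.
df = pd.DataFrame({
    "price": [10.5, 20.0, 15.25],
    "quantity": [3, 1, 7],
    "category": ["toys", "books", "toys"],
    "order_date": ["2024-01-05", "2024-02-11", "2024-03-20"],
})

# Dates often arrive as text; convert them so pandas treats them as datetimes.
df["order_date"] = pd.to_datetime(df["order_date"])

numerical_cols = df.select_dtypes(include="number").columns.tolist()
datetime_cols = df.select_dtypes(include="datetime").columns.tolist()
# Assumption: whatever is neither numeric nor datetime is treated as categorical.
categorical_cols = df.select_dtypes(exclude=["number", "datetime"]).columns.tolist()

print("Numerical:", numerical_cols)
print("Categorical:", categorical_cols)
print("Date and time:", datetime_cols)
```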

4. Data Visualization
Visualize the dataset using scatter plots, heatmaps, correlation matrices, etc. to determine the relationships between variables.
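
A rough sketch of two of these plots, using pandas and matplotlib: a scatter plot of two variables and a heatmap of the correlation matrix. The study-habits data and column names are invented for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: study habits and exam results.
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5, 6],
    "exam_score": [52, 55, 61, 68, 74, 79],
    "hours_slept": [8, 7, 7, 6, 6, 5],
})

# Pairwise Pearson correlation between the numeric columns.
corr = df.corr()

fig, (ax_scatter, ax_heat) = plt.subplots(1, 2, figsize=(10, 4))

# Scatter plot: relationship between two variables.
ax_scatter.scatter(df["hours_studied"], df["exam_score"])
ax_scatter.set_xlabel("hours_studied")
ax_scatter.set_ylabel("exam_score")
ax_scatter.set_title("Scatter plot")

# Heatmap of the correlation matrix, with a fixed -1 to 1 color scale.
image = ax_heat.imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
ax_heat.set_xticks(range(len(corr.columns)))
ax_heat.set_xticklabels(corr.columns, rotation=45, ha="right")
ax_heat.set_yticks(range(len(corr.columns)))
ax_heat.set_yticklabels(corr.columns)
ax_heat.set_title("Correlation matrix")
fig.colorbar(image, ax=ax_heat)

fig.tight_layout()
```

In a Jupyter Notebook the figure is displayed automatically; in a script, call `plt.show()` or `fig.savefig(...)` afterwards. seaborn's `heatmap` function produces a similar plot with less setup.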

5. Pattern Recognition
Analyze the data to look for trends and patterns.
Investigate anomalies or unusual patterns in the data and find their causes.
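
As one illustrative way to look for trends, the sketch below reshapes hypothetical monthly revenue figures by region and computes the month-over-month change. The data and column names are made up; a sudden change in one group is the kind of unusual pattern worth investigating.

```python
import pandas as pd

# Hypothetical monthly revenue per region (months as numbers 1 to 3).
sales = pd.DataFrame({
    "month": [1, 1, 2, 2, 3, 3],
    "region": ["East", "West", "East", "West", "East", "West"],
    "revenue": [100, 80, 120, 85, 150, 20],
})

# One row per month, one column per region.
pivot = sales.pivot(index="month", columns="region", values="revenue")

# Month-over-month change; the first month has no previous value (NaN).
change = pivot.diff()
print(change)
```

Here the East region grows steadily, while the West region drops sharply in month 3, which would prompt a closer look at what happened there.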

6. Data Summarization
Summarize the key observations or insights gained from your EDA and suggest the next steps for further analysis.
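
Summary statistics can supply the numbers for such a write-up. This is a small sketch with pandas on hypothetical data; the interpretation and the suggested next steps still have to come from you.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "age": [29, 41, 35, 52],
    "city": ["Nairobi", "Mombasa", "Nairobi", "Kisumu"],
})

# Count, mean, standard deviation, min, quartiles, and max for a numeric column.
numeric_summary = df["age"].describe()
print(numeric_summary)

# Frequency of each value in a categorical column.
city_counts = df["city"].value_counts()
print(city_counts)
```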

Tools Commonly Used in EDA

  • Python libraries such as NumPy, pandas, Matplotlib, and seaborn.
  • Development environments such as Jupyter Notebook and Spyder.

The information gained during EDA supports informed decisions later on, such as choosing the right model for your dataset.
