Exploratory Data Analysis (EDA) is one of the fundamental steps in a data science project. This article covers what EDA is, how it is carried out, and why it matters in data science.

### What is Exploratory Data Analysis?

Exploratory Data Analysis is a technique that data scientists and analysts use to investigate datasets and summarize their main characteristics, mostly with data visualization tools such as `matplotlib`.

EDA helps us identify errors in a dataset, understand its patterns, and detect outliers. This step is useful because results drawn from a dataset are only as valid as the data behind them.

### Steps in Exploratory Data Analysis

**1. Understand the Data and Problem**

The first step is to look at the dataset we are dealing with and understand the problem we are trying to solve. Here we set out clear objectives for what we want to achieve.

**2. Data Collection**

Here we import our dataset into the environment we are using. For example, with `pandas` (imported as `pd`) we load a CSV file using the following command:

`df = pd.read_csv('weather_data.csv')`

We then inspect the dataset, checking its rows and columns and looking for missing data or errors.
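As a minimal sketch of this loading and inspection step, the snippet below reads a small in-memory CSV in place of `weather_data.csv`. The column names (`date`, `city`, `temp_c`, `humidity`) and the values are assumptions made up for illustration; they are not taken from a real dataset.

```python
import io

import pandas as pd

# Hypothetical stand-in for weather_data.csv; columns and values are assumptions.
csv_text = """date,city,temp_c,humidity
2024-01-01,Nairobi,24.5,60
2024-01-02,Nairobi,,65
2024-01-03,Mombasa,31.0,80
2024-01-03,Mombasa,31.0,80
"""

# With a real file this would be: df = pd.read_csv("weather_data.csv")
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # (number of rows, number of columns)
print(df.head())  # first few rows
print(df.dtypes)  # data type of each column

missing_per_column = df.isna().sum()    # missing values in each column
duplicate_rows = df.duplicated().sum()  # rows that repeat an earlier row
print(missing_per_column)
print(duplicate_rows)
```

Counting missing values and duplicates at this point gives a quick picture of how much cleaning the dataset will need.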

**3. Data Cleaning**

In data cleaning we look at a few things:

- Remove any duplicates in the dataset
- Check for missing values, then impute or remove them
- Fix any apparent errors in the dataset
- Convert columns to appropriate data types
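The cleaning steps above could look roughly like this in `pandas`. The data frame, its column names, and the rule that humidity must fall between 0 and 100 are all assumptions for illustration; real cleaning rules depend on the dataset and the problem.

```python
import pandas as pd

# Hypothetical raw data; column names and values are assumptions.
raw = pd.DataFrame({
    "date": ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-03"],
    "temp_c": ["24.5", None, "31.0", "31.0"],  # numbers stored as text
    "humidity": [60, 65, 500, 500],            # 500 is an apparent entry error
})

# 1. Remove duplicate rows.
clean = raw.drop_duplicates().copy()

# 2. Convert columns to appropriate data types.
clean["date"] = pd.to_datetime(clean["date"])
clean["temp_c"] = pd.to_numeric(clean["temp_c"])

# 3. Impute missing temperatures with the median (one option among several).
clean["temp_c"] = clean["temp_c"].fillna(clean["temp_c"].median())

# 4. Treat impossible humidity values as missing, then drop those rows.
clean["humidity"] = clean["humidity"].where(clean["humidity"].between(0, 100))
clean = clean.dropna(subset=["humidity"])

print(clean)
```

Whether to impute or drop is a judgment call: imputing keeps more rows but adds values that were never observed, while dropping is safer when only a few rows are affected.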

**4. Data Visualization**

Now that we have explored and cleaned our data, we can present our findings graphically so that they can be understood by people who cannot make sense of the dataset in its raw form.

Some of the chart types we can use, among many others, include:

- Bar charts
- Box plots
- Scatter plots
- Heatmaps
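To make a few of these chart types concrete, here is a small sketch that draws a bar chart, a box plot, and a correlation heatmap with `matplotlib`. The data, column names, and output file name are assumptions for illustration only.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical weather readings; names and values are assumptions.
df = pd.DataFrame({
    "city": ["Nairobi", "Nairobi", "Mombasa", "Mombasa", "Kisumu", "Kisumu"],
    "temp_c": [24.5, 23.0, 31.0, 30.0, 28.0, 45.0],
    "humidity": [60, 65, 80, 78, 70, 72],
})

fig, (ax_bar, ax_box, ax_heat) = plt.subplots(1, 3, figsize=(12, 4))

# Bar chart: compare a summary value across categories.
mean_temp = df.groupby("city")["temp_c"].mean()
ax_bar.bar(mean_temp.index, mean_temp.values)
ax_bar.set_title("Mean temperature by city")

# Box plot: show the spread of one variable and flag extreme values.
ax_box.boxplot(df["temp_c"])
ax_box.set_title("Temperature distribution")

# Heatmap: show pairwise correlations between numeric columns.
corr = df[["temp_c", "humidity"]].corr()
image = ax_heat.imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
ax_heat.set_xticks(range(len(corr.columns)))
ax_heat.set_xticklabels(corr.columns)
ax_heat.set_yticks(range(len(corr.columns)))
ax_heat.set_yticklabels(corr.columns)
fig.colorbar(image, ax=ax_heat)
ax_heat.set_title("Correlation heatmap")

fig.tight_layout()
fig.savefig("eda_overview.png")  # or plt.show() in an interactive session
```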

### Types of Exploratory Data Analysis

There are three main types of EDA:

- Univariate Analysis
- Bivariate Analysis
- Multivariate Analysis

#### a). Univariate Analysis

Univariate analysis looks at one variable at a time, which can help you identify outliers. We can use a *histogram* to present this graphically.

Example of a univariate analysis:
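As a minimal sketch of univariate analysis, the snippet below summarizes a single column, plots its histogram, and flags outliers with the common 1.5 × IQR rule. The temperature values and the choice of that rule are assumptions for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical temperature readings; the values are assumptions.
temps = pd.Series([22.0, 23.5, 24.0, 24.5, 25.0, 25.5, 26.0, 41.0], name="temp_c")

# Summary statistics for the single variable.
print(temps.describe())

# Flag outliers using the 1.5 * IQR rule (one common convention).
q1, q3 = temps.quantile([0.25, 0.75])
iqr = q3 - q1
outliers = temps[(temps < q1 - 1.5 * iqr) | (temps > q3 + 1.5 * iqr)]
print(outliers)

# Histogram of the variable's distribution.
fig, ax = plt.subplots()
ax.hist(temps, bins=8)
ax.set_xlabel("temp_c")
ax.set_ylabel("count")
ax.set_title("Distribution of temperature")
fig.savefig("univariate_hist.png")  # or plt.show() in an interactive session
```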

#### b). Bivariate Analysis

Bivariate analysis looks at two variables together, which can help you identify the relationship between them. Graphically, we can use a *scatter plot* to represent this data.

Example of a bivariate analysis:
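As a minimal sketch of bivariate analysis, the snippet below plots one variable against another and computes their Pearson correlation. The two columns and their values are assumptions chosen so that the relationship is easy to see.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical paired readings; names and values are assumptions.
df = pd.DataFrame({
    "temp_c": [20, 22, 24, 26, 28, 30],
    "humidity": [85, 80, 76, 70, 66, 60],
})

# Pearson correlation between the two variables (ranges from -1 to 1).
r = df["temp_c"].corr(df["humidity"])
print(f"correlation: {r:.3f}")

# Scatter plot of one variable against the other.
fig, ax = plt.subplots()
ax.scatter(df["temp_c"], df["humidity"])
ax.set_xlabel("temp_c")
ax.set_ylabel("humidity")
ax.set_title("Humidity vs. temperature")
fig.savefig("bivariate_scatter.png")  # or plt.show() in an interactive session
```

A correlation close to -1 or 1 suggests a strong linear relationship, but the scatter plot is still worth checking, since a single number can hide curved patterns or outliers.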

#### c). Multivariate Analysis

Multivariate analysis takes three or more features and helps identify the relationships among them. Graphically, we can use a *pair plot* to represent this data.

Example of a multivariate analysis:
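As a minimal sketch of multivariate analysis, the snippet below draws a grid of pairwise scatter plots using `pandas.plotting.scatter_matrix`, which produces a pair-plot-style figure using only pandas and Matplotlib (Seaborn's `pairplot` is a common alternative). The three columns and their values are assumptions for illustration.

```python
import pandas as pd
from pandas.plotting import scatter_matrix

# Hypothetical readings for three features; names and values are assumptions.
df = pd.DataFrame({
    "temp_c": [20, 22, 24, 26, 28, 30],
    "humidity": [85, 80, 76, 70, 66, 60],
    "wind_kph": [12, 9, 15, 11, 8, 14],
})

# Correlation matrix across all numeric features.
corr = df.corr()
print(corr.round(2))

# Grid of pairwise scatter plots, with a histogram of each feature on the diagonal.
axes = scatter_matrix(df, figsize=(6, 6), diagonal="hist")
axes[0, 0].figure.savefig("pair_plot.png")  # or plt.show() in an interactive session
```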

### Tools used in Exploratory Data Analysis

We can use different tools for EDA, for example Python or R. This article focuses on Python.

Libraries used for EDA in Python include:

- Pandas
- NumPy
- Matplotlib
- Seaborn

### Conclusion

In conclusion, EDA is an important part of any data problem. To reach conclusive and valid results, we must perform EDA as one of the key steps in solving real-life problems.
