DEV Community

Marcos
What is Exploratory Data Analysis (EDA)?

Exploratory Data Analysis (EDA), according to IBM, is a method used by data scientists to analyze and investigate data sets and summarize their main characteristics, often employing data visualization techniques.

Put simply, EDA is the process of investigating and understanding your data set by creating visualizations and summaries. Its main goals are to:

  1. Identify emerging trends or recurring patterns within the data.
  2. Validate assumptions through data analysis.
  3. Extract meaningful insights to enhance understanding.
  4. Cleanse data by eliminating irregularities and extraneous values.

Why do we need EDA?

Frankly, EDA is so important in the data science/machine learning workflow that the real question should be "What would we do without EDA?"

EDA is our way of interrogating data to find out everything we can about it and understand why it is the way it is (for example, by identifying trends, patterns, and anomalies). In practice, that means:

  1. Detecting outliers and deciding whether to remove, correct, or keep them, to ensure data integrity.
  2. Analyzing temporal and spatial trends for comprehensive insight.
  3. Revealing correlations between variables and the target of interest.
  4. Formulating hypotheses and conducting rigorous experiments for validation.
  5. Exploring additional data sources to enrich analysis and understanding.
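To make the first item concrete, here is a minimal sketch of one common way to flag outliers: the interquartile range (IQR) rule. The function name `iqr_outliers`, the multiplier `k=1.5`, and the sample numbers are all assumptions made for illustration, not something prescribed by EDA itself.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(..., n=4) returns the three quartile cut points: Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical daily order counts; 95 is a deliberately planted extreme value.
orders = [12, 14, 13, 15, 11, 14, 13, 95, 12, 16]
print(iqr_outliers(orders))
```

Flagging a value is only the first step. Whether to drop it, fix it, or keep it depends on what you learn about where it came from.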

Types of Exploratory Data Analysis

There are various techniques within EDA, and they are usually grouped by how many variables they look at simultaneously.

Univariate Analysis

When we do univariate analysis, we're focusing on just one variable at a time.

We're not trying to figure out cause and effect or relationships here, just getting a handle on that one piece of data: its typical values, how spread out it is, and anything unusual about it.
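As a rough sketch of what a univariate look can involve, the snippet below summarizes a single numeric variable using only Python's standard library. The `describe` helper and the `ages` sample are hypothetical, made up for illustration; in practice you would probably reach for a data analysis library instead.

```python
import statistics
from collections import Counter

def describe(values):
    """Summarize one numeric variable: centre, spread, and range."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }

# Hypothetical sample: ages of ten survey respondents.
ages = [23, 25, 31, 35, 35, 38, 41, 44, 52, 67]
for name, value in describe(ages).items():
    print(f"{name:>6}: {value}")

# A crude text histogram: how many respondents fall in each decade.
decades = Counter((age // 10) * 10 for age in ages)
for decade in sorted(decades):
    print(f"{decade}s | {'#' * decades[decade]}")
```

The summary numbers and the histogram describe the same variable from two angles: one gives exact figures, the other shows the shape at a glance.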

Bivariate Analysis

With bivariate analysis, things get a bit more interesting. Now we're looking at how two variables relate to each other.

We're comparing them to see if there's any connection, for example whether one tends to go up when the other does.
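One common way to put a number on that connection is the Pearson correlation coefficient, which ranges from -1 (perfect inverse linear relationship) to 1 (perfect direct linear relationship). The sketch below is a hand-rolled version for illustration; the `pearson` function and the `hours`/`scores` data are assumptions, not part of any particular library.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: hours studied and exam score for six students.
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 55, 61, 64, 70, 74]
print(f"correlation: {pearson(hours, scores):.3f}")
```

A value close to 1 or -1 suggests a strong linear relationship, but it says nothing about which variable, if either, is causing the other.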

Multivariate Analysis

Multivariate analysis takes it even further. Now we're looking at three or more variables at once.

We're diving into a whole bunch of variables, whether they're numbers or categories. And the cool part? We can represent all this info in lots of ways: summary tables, graphs, you name it.

It's like unlocking the full story hidden in the data!
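Here is a small sketch of a multivariate summary that mixes a categorical variable with two numeric ones: it groups records by category and averages the numeric fields within each group. The `grouped_means` helper, the field names, and the records are all hypothetical, chosen only to illustrate the idea.

```python
from collections import defaultdict
from statistics import mean

def grouped_means(rows, key, fields):
    """Average each numeric field in `fields` within each category of `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return {
        group: {field: mean(r[field] for r in members) for field in fields}
        for group, members in groups.items()
    }

# Hypothetical records with one categorical and two numeric variables.
rows = [
    {"region": "north", "ad_spend": 10, "sales": 120},
    {"region": "north", "ad_spend": 20, "sales": 150},
    {"region": "north", "ad_spend": 30, "sales": 210},
    {"region": "south", "ad_spend": 10, "sales": 90},
    {"region": "south", "ad_spend": 20, "sales": 100},
    {"region": "south", "ad_spend": 30, "sales": 110},
]

summary = grouped_means(rows, key="region", fields=["ad_spend", "sales"])
for region, stats in summary.items():
    print(region, stats)
```

In this made-up data both regions spend the same on ads on average, yet their sales differ, which is the kind of pattern you would miss by looking at any one or two of these variables alone.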
