Exploratory Data Analysis (EDA) is an approach to understanding a dataset by summarizing it and representing it visually. These summaries yield essential information that can be used when building a machine learning model: EDA helps identify better features and surface more useful insights from the data. Visualizing the information also offers different perspectives on the data.
Anyone exploring or working with data needs to understand it from multiple perspectives before using it to build a machine learning model.
To analyze the data, we can combine visualization techniques with statistical analysis to get a better view of it. In this sense, EDA is the process of discovering hidden patterns and useful information in the data.
The above image shows that the EDA process comprises several steps, such as data collection, data preprocessing, data cleaning, and data analysis. EDA helps validate the questions we ask of the data, which come from a technical perspective.
Why do we need EDA?
The main aim of EDA is to use statistical techniques to summarize and visualize the data efficiently, to assess its importance and quality, and to derive new perspectives and suggestions from the analysis. EDA always tries to answer the questions we have about the data.
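As a rough illustration of summarizing data and checking its quality, here is a minimal sketch using only Python's standard `statistics` module. The `prices` list and the `summarize` helper are hypothetical, made up for this example; in practice you would likely load a real dataset with a library such as pandas.

```python
import statistics

# Hypothetical sample: house prices in thousands, with one missing value (None).
prices = [210, 250, 198, None, 305, 275, 1200, 240]


def summarize(values):
    """Return basic summary statistics, ignoring missing (None) entries."""
    present = [v for v in values if v is not None]
    return {
        "count": len(present),                    # usable observations
        "missing": len(values) - len(present),    # simple data-quality check
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "stdev": statistics.stdev(present),       # sample standard deviation
        "min": min(present),
        "max": max(present),
    }


summary = summarize(prices)
for name, value in summary.items():
    print(f"{name}: {value}")
```

In this made-up sample the mean sits well above the median, which is the kind of hint (a skewed distribution or an extreme value) that a first summary is meant to surface.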
EDA is an approach for data analysis that involves a variety of techniques to:
- Gain insight into a dataset
- Discover the underlying structures in a dataset
- Extract important features from the dataset
- Identify outliers and irregularities
- Test the various assumptions made about the dataset
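To make one item from the list above concrete, the sketch below flags outliers with the interquartile range (IQR) rule. This is one common convention among several, not a method prescribed by this article. The `iqr_outliers` helper, the multiplier `k=1.5`, and the sample values are assumptions chosen for illustration.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical sample: house prices in thousands.
prices = [210, 250, 198, 305, 275, 1200, 240]
outliers = iqr_outliers(prices)
print("Possible outliers:", outliers)
```

A flagged value is only a candidate for closer inspection. Whether it is an error or a genuine extreme observation depends on the dataset and the question being asked.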
The EDA process is iterative in nature: you form some initial thoughts and assumptions on a first look at the data, then try to extract useful insights from it to build machine learning models. Finally, you can use visualization techniques to preview the model results and tune the models for the application.
Hope this was helpful.