Philemonkipkirui

# Exploratory Data Analysis

Exploratory Data Analysis (EDA) is a technique for investigating and analysing data sets in order to extract their key features and trends. Insights from EDA inform the machine learning and deep learning models built in data science. EDA normally comes after cleaning (removing discrepancies in the data) and after the analyst has a proper understanding of the data sets.
EDA provides a better understanding of the data, summarizes its main characteristics and, most importantly, uncovers relationships between variables. More narrowly, EDA is also used to check the assumptions required for model fitting and hypothesis testing.
There are four major types of EDA: univariate non-graphical, univariate graphical, multivariate non-graphical and multivariate graphical.
Univariate non-graphical. This is the simplest form of data analysis; as the name suggests, only one variable is considered. The main goal of univariate non-graphical EDA is to describe the underlying sample distribution and make observations about the population. It involves determining measures of central tendency (mean, median and mode), measures of spread and the skewness of the distribution.
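
As a minimal sketch of univariate non-graphical EDA, the snippet below computes the central tendency, spread and skewness of a single variable using only Python's standard `statistics` module. The `ages` list is hypothetical data invented for illustration, and the skewness is the simple population form (third central moment divided by the second central moment raised to 1.5), which is one of several common definitions.

```python
import statistics

# Hypothetical sample: ages of ten survey respondents (not from the article).
ages = [23, 25, 25, 27, 29, 31, 31, 31, 38, 60]

# Central tendency
mean = statistics.fmean(ages)
median = statistics.median(ages)
mode = statistics.mode(ages)

# Spread
stdev = statistics.stdev(ages)        # sample standard deviation
value_range = max(ages) - min(ages)

# Skewness (population form): m3 / m2 ** 1.5
n = len(ages)
m2 = sum((x - mean) ** 2 for x in ages) / n
m3 = sum((x - mean) ** 3 for x in ages) / n
skewness = m3 / m2 ** 1.5

print(f"mean={mean}, median={median}, mode={mode}")
print(f"stdev={stdev:.2f}, range={value_range}, skewness={skewness:.2f}")
```

A positive skewness here would indicate a longer right tail, which is what a single large value such as 60 tends to produce.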
Univariate graphical. This process applies graphical tools and techniques to provide a fuller "picture" of a single-variable data set. Commonly used graphics include stem-and-leaf plots, box plots, histograms and quantile-normal plots.
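
To illustrate one of the graphics named above, here is a rough sketch of a text-based stem-and-leaf plot written in plain Python. The helper `stem_and_leaf` and the `ages` data are hypothetical names introduced for this example; the sketch assumes non-negative integers and uses the tens digit as the stem and the units digit as the leaf.

```python
from collections import defaultdict

# Hypothetical sample: ages of ten survey respondents (not from the article).
ages = [23, 25, 25, 27, 29, 31, 31, 31, 38, 60]


def stem_and_leaf(values):
    """Return the lines of a stem-and-leaf plot for non-negative integers.

    The stem is the value divided by ten, the leaf is the units digit.
    """
    leaves = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        leaves[stem].append(leaf)

    lines = []
    # Include empty stems so gaps in the distribution stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        row = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem} | {row}".rstrip())
    return lines


for line in stem_and_leaf(ages):
    print(line)
```

The same values could instead be handed to a plotting library to draw a histogram or box plot; the text version is used here only to keep the sketch dependency-free.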
Multivariate non-graphical. These techniques show the relationship between two or more variables, either through cross-tabulation (a two-way table whose column headings are the levels of one variable and whose row headings are the levels of the other) or through summary statistics.
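
A cross-tabulation can be sketched in a few lines with the standard library. The `records` below are hypothetical survey responses made up for illustration; in practice a helper such as `pandas.crosstab` is often used for the same job.

```python
from collections import Counter

# Hypothetical survey records: (gender, preferred_device).
records = [
    ("female", "phone"), ("male", "laptop"), ("female", "laptop"),
    ("male", "phone"), ("female", "phone"), ("male", "phone"),
]

# Count how often each (row level, column level) pair occurs.
counts = Counter(records)
rows = sorted({gender for gender, _ in records})
cols = sorted({device for _, device in records})

# Two-way table: rows are levels of one variable, columns levels of the other.
table = {r: {c: counts[(r, c)] for c in cols} for r in rows}

# Print the table with a header row.
print(f"{'':8}" + "".join(f"{c:>8}" for c in cols))
for r in rows:
    print(f"{r:8}" + "".join(f"{table[r][c]:>8}" for c in cols))
```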
Multivariate graphical. Here, graphics are used to display the relationship between two or more variables. The outcome of interest may depend on two or more variables, and several variables may be driving the change at once. Common types of multivariate graphics include scatter plots, multivariate charts, run charts, bubble charts and heat maps.
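
The sketch below shows one way a scatter plot and a correlation heat map might be produced. The `data` dictionary and the `pearson` helper are hypothetical and exist only for this example. The correlation matrix is computed with the standard library, and the plotting step assumes matplotlib is installed; it is skipped otherwise.

```python
import math
import statistics

try:
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend, so no display is needed
    import matplotlib.pyplot as plt
except ImportError:  # plotting is optional in this sketch
    plt = None

# Hypothetical measurements for the same five observations.
data = {
    "hours_studied": [1, 2, 3, 4, 5],
    "exam_score": [52, 58, 61, 70, 74],
    "hours_slept": [9, 8, 8, 6, 5],
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


names = list(data)
# Pairwise correlations: the grid of numbers a heat map colours in.
corr = [[pearson(data[a], data[b]) for b in names] for a in names]

if plt is not None:
    fig, (ax_scatter, ax_heat) = plt.subplots(1, 2, figsize=(9, 4))

    # Scatter plot of two variables against each other.
    ax_scatter.scatter(data["hours_studied"], data["exam_score"])
    ax_scatter.set_xlabel("hours_studied")
    ax_scatter.set_ylabel("exam_score")

    # Heat map of the correlation matrix.
    image = ax_heat.imshow(corr, cmap="coolwarm", vmin=-1, vmax=1)
    ax_heat.set_xticks(range(len(names)))
    ax_heat.set_xticklabels(names, rotation=45, ha="right")
    ax_heat.set_yticks(range(len(names)))
    ax_heat.set_yticklabels(names)
    fig.colorbar(image, ax=ax_heat)
    fig.tight_layout()
    # Call fig.savefig("eda.png") or plt.show() to view the result.
```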
To perform exploratory data analysis, Python is commonly used together with its numerous mathematical and visualization libraries.