DEV Community

Zaynaib (Ola) Giwa

Day 2 - 66 Days of Data Science

Yesterday I learned that the first step of mastering statistics is to master the art of exploring data.

Exploring Data Types

Categorical data places each observation into one of several groups, or categories; the values are labels. In the R programming language, categorical variables are called factors. To explore categorical data, you will generally use a bar chart or a pie chart. The distribution of a categorical variable is described by the count, frequency, or percentage of observations that fall in each category.
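To make the counts and percentages concrete, here is a minimal Python sketch using only the standard library. The `pets` data and the `frequency_table` helper are hypothetical, made up for illustration; in R you would typically get the same counts by calling `table()` on a factor.

```python
from collections import Counter

def frequency_table(values):
    """Return (category, count, percent) rows, most common category first."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(category, n, 100 * n / total) for category, n in counts.most_common()]

# Hypothetical survey: favorite pet of 8 respondents
pets = ["dog", "cat", "dog", "fish", "dog", "cat", "dog", "bird"]

for category, n, pct in frequency_table(pets):
    # A crude text bar chart: one '#' per observation
    print(f"{category:<5} {n:>2} {pct:5.1f}%  {'#' * n}")
```

The `#` bars are only a rough text stand-in for a real bar chart, but they show the same information: how many observations land in each category.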

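Quantitative data takes numerical values that you can do arithmetic with. To explore it, you would use a histogram, a line chart (for data recorded over time), or a stem plot (only if the data set is small).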
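A stem plot splits each value into a stem (all but the last digit) and a leaf (the last digit), then lists the leaves next to their stem. Here is a minimal sketch assuming small, non-negative whole numbers; the `scores` list and the `stemplot` function are hypothetical.

```python
from collections import defaultdict

def stemplot(data):
    """Stem plot for non-negative integers: stem = tens part, leaf = ones digit."""
    leaves = defaultdict(list)
    for x in sorted(data):
        leaves[x // 10].append(x % 10)
    lines = []
    # Include empty stems too, so gaps in the data stay visible
    for stem in range(min(leaves), max(leaves) + 1):
        leaf_str = "".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem:>3} | {leaf_str}")
    return lines

# Hypothetical exam scores for a class of 10
scores = [72, 85, 91, 68, 77, 85, 59, 94, 88, 73]
print("\n".join(stemplot(scores)))
```

Turned on its side, a stem plot looks like a histogram, which is why it only stays readable when there are few observations.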

Exploratory Data Analysis (EDA) workflow

  • Study each individual variable
  • Study the relationships between two variables
  • Create graphs of the distribution of variables
  • Finally, add numerical summaries of specific aspects of the data
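The workflow above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the data set is hypothetical, a variable counts as quantitative if every value is a number, and the "relationship" step compares the median exam score across sections. Graphs are left out to keep it short.

```python
import statistics
from collections import Counter, defaultdict

# Hypothetical data set: one list per variable, one entry per student
data = {
    "section": ["A", "B", "A", "B", "A", "B"],
    "exam_score": [55, 60, 68, 70, 79, 85],
}

def is_quantitative(values):
    # Assumption: a variable is quantitative if every value is a number
    return all(isinstance(v, (int, float)) for v in values)

def describe(values):
    """Step 1: summarize a single variable on its own."""
    if is_quantitative(values):
        return {"mean": statistics.mean(values), "median": statistics.median(values)}
    return dict(Counter(values))  # counts per category

def median_by_group(groups, values):
    """Step 2: relate a categorical variable to a quantitative one by
    computing the median of the quantitative variable within each group."""
    by_group = defaultdict(list)
    for group, value in zip(groups, values):
        by_group[group].append(value)
    return {group: statistics.median(vs) for group, vs in by_group.items()}

for name, values in data.items():
    print(name, describe(values))

print("median score by section:", median_by_group(data["section"], data["exam_score"]))
```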

To describe the distribution of a variable, look at four things: its shape, center, spread, and outliers.

Measures of center - mean and median
Measures of spread - ranges based on quantiles, such as the interquartile range (IQR), and the standard deviation
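Here is a minimal sketch of these measures of center and spread for one quantitative variable, using Python's `statistics` module. The commute times are made up. Software packages disagree slightly on how to compute quartiles; this sketch assumes the common textbook rule of taking the median of the lower and upper halves of the sorted data.

```python
import statistics

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max).
    Assumption: quartiles are the medians of the lower and upper halves,
    leaving the overall median out of both halves when n is odd. Needs n >= 2."""
    xs = sorted(data)
    n = len(xs)
    lower = xs[: n // 2]
    upper = xs[(n + 1) // 2 :]
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])

# Hypothetical commute times in minutes
commute = [5, 10, 10, 15, 20, 25, 30, 40, 60]

low, q1, median, q3, high = five_number_summary(commute)
mean = statistics.mean(commute)
print("center: mean =", round(mean, 2), "median =", median)
print("spread: range =", high - low, "IQR =", q3 - q1,
      "standard deviation =", round(statistics.stdev(commute), 2))
```

Here the mean (about 23.9) sits above the median (20), a hint that a few long commutes pull the mean up.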

Together, a measure of center and a measure of spread give a useful summary of the distribution of the data.
Shape - the distribution can be symmetric and bell-shaped, like the normal distribution, or it can be skewed to the right or to the left
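A quick numerical check on shape, meant as a rough rule of thumb rather than a formal test: in a right-skewed distribution the long right tail usually pulls the mean above the median, and in a left-skewed one it usually pulls the mean below. The `rough_shape` function and its `tolerance` cutoff are my own hypothetical choices, and a histogram is still the better way to judge shape.

```python
import statistics

def rough_shape(data, tolerance=0.05):
    """Guess the direction of skew by comparing the mean and the median.
    Rule of thumb only; `tolerance` is an arbitrary, hypothetical cutoff.
    Needs at least two distinct values (the standard deviation must be > 0)."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    # Divide by the standard deviation so the gap does not depend on the units
    gap = (mean - median) / statistics.stdev(data)
    if gap > tolerance:
        return "skewed to the right"
    if gap < -tolerance:
        return "skewed to the left"
    return "roughly symmetric"

print(rough_shape([1, 2, 2, 3, 3, 3, 4, 10, 20]))
print(rough_shape([1, 15, 17, 18, 18, 19, 19, 20]))
print(rough_shape([1, 2, 3, 4, 5]))
```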

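Tools for finding outliers
Standard deviation - measures roughly how far a typical observation lies from the mean; a data point that lies many standard deviations away from the mean is a possible outlier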
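A minimal sketch of using the standard deviation to flag possible outliers: measure how far each value lies from the mean and flag the ones beyond `k` standard deviations. The cutoff `k=2` and the `readings` data are hypothetical; 2 or 3 standard deviations are common rules of thumb, and a flagged point still deserves a closer look before you treat it as an error.

```python
import statistics

def flag_outliers(data, k=2.0):
    """Return the values more than k standard deviations away from the mean.
    Assumption: k is a hypothetical cutoff; needs at least two distinct values."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation
    return [x for x in data if abs(x - mean) > k * sd]

# Hypothetical sensor readings with one suspicious value
readings = [12, 14, 13, 15, 14, 13, 12, 14, 40]
print(flag_outliers(readings))
```

One caveat: an extreme value also inflates the mean and standard deviation it is measured against, so in small data sets this check can miss outliers. With `k=3`, the 40 in this example would not be flagged.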

Latest comments (2)

Arvind Padmanabhan

You may find this article on EDA useful for your learning: devopedia.org/exploratory-data-ana...

Zaynaib (Ola) Giwa

Thank you for sharing! I will check it out today.