DEV Community

Zaynaib (Ola) Giwa


Long Data Vs. Wide Data

So, lately I have had my hands on some raw, unclean data for a school assignment. Originally I thought that messy data was about cleaning up blank values and getting text, numbers, and strings into the right format. But as I proceeded to analyze my data in R, I found that R could not handle it as it was. There was a key concept I was missing about setting up data the right way: wide and long data.

What is Wide Data?

In wide data (also known as unstacked data), each attribute of a subject is in a separate column.

Person     Age  Weight
Buttercup  24   110
Bubbles    24   105
Blossom    24   107
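If you want to follow along, here is a minimal sketch of how the wide table above could be typed into R by hand. The name `df` matches the data frame used later in this post; treating Age and Weight as plain numbers is my assumption, since no units are given.

```r
# Wide format: one row per person, one column per attribute
df <- data.frame(
  Person = c("Buttercup", "Bubbles", "Blossom"),
  Age    = c(24, 24, 24),
  Weight = c(110, 105, 107)
)

# Inspect the structure: 3 observations of 3 variables
str(df)
```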

What is Long Data?

In long data (also called narrow or stacked data), one column contains all the values and another column lists the context of each value.

Person     Variable  Value
Buttercup  Age       24
Buttercup  Weight    110
Bubbles    Age       24
Bubbles    Weight    105
Blossom    Age       24
Blossom    Weight    107
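One practical payoff of this layout is that a single grouped call can summarise every variable at once. The sketch below is only an illustration in base R: the data frame name `long` and the result name `means` are made up for this example, and the table is typed in by hand rather than produced by any package.

```r
# Long format typed in by hand (base R only, no packages needed)
long <- data.frame(
  Person   = rep(c("Buttercup", "Bubbles", "Blossom"), each = 2),
  Variable = rep(c("Age", "Weight"), times = 3),
  Value    = c(24, 110, 24, 105, 24, 107)
)

# One call computes the mean of Value for each level of Variable
means <- aggregate(Value ~ Variable, data = long, FUN = mean)
print(means)
```

With the wide table you would have to name each column you want to summarise. Here the grouping column does that work, so adding a new measured variable would not change the code.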

It is easier for R to do analysis on data in the long form. This concept might seem weird at first: we are used to seeing and analyzing data in the wide form, but with practice it gets easier. R has an awesome package called reshape2 that converts your data from wide to long.

First, install the R package and load the library.

install.packages("reshape2")
library(reshape2)

Using the wide table above, we will split our variables into two groups: identifiers and measured variables.

Identifier variable: Person
Measured variables: Age, Weight

To transform this wide data into long data, we use the melt function. You "melt" data so that each row is a unique id-variable combination.

df
     Person Age Weight
1 Buttercup  24    110
2   Bubbles  24    105
3   Blossom  24    107

# id.vars names the identifier column(s); measure.vars names the columns to stack
ppg <- melt(df, id.vars = "Person", measure.vars = c("Age", "Weight"))
ppg
     Person variable value
1 Buttercup      Age    24
2   Bubbles      Age    24
3   Blossom      Age    24
4 Buttercup   Weight   110
5   Bubbles   Weight   105
6   Blossom   Weight   107
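If you ever need the wide shape back, reshape2 also ships a dcast function that goes in the other direction. This is a hedged sketch rather than part of the original walkthrough: it rebuilds `df` so the block runs on its own, and the name `wide` is just a placeholder I picked. The rows may come back in a different order than the original table.

```r
library(reshape2)

# Rebuild the wide table so this block is self-contained
df <- data.frame(
  Person = c("Buttercup", "Bubbles", "Blossom"),
  Age    = c(24, 24, 24),
  Weight = c(110, 105, 107)
)

# Wide -> long, same call as in the walkthrough
ppg <- melt(df, id.vars = "Person", measure.vars = c("Age", "Weight"))

# Long -> wide: one row per Person, one column per level of `variable`
wide <- dcast(ppg, Person ~ variable, value.var = "value")
print(wide)
```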

Resources

Official documentation for the reshape library is available from its creator, Hadley Wickham.

For more about wide vs. long data, check out The Analysis Factor.

For more information about cleaning and shaping messy data into tidy data, check out Hadley Wickham's paper Tidy Data.
