Introduction
When working with raw data in R, it is common to encounter columns that contain date information stored as strings or in a format that R does not recognize as dates. When a data set is imported, R will often read dates as character objects. This means we cannot use the data for time-based calculations such as time series analysis or computing time intervals. Dates can also be written in many different formats, so we must tell R which part of a date string represents the year, the month, the day, and the hour. Finally, note that values holding both a date and a time are stored in the POSIXct and POSIXlt classes, which share the parent class POSIXt.
This article presents a step-by-step guide to converting a date column in a data set to the Date class using R.
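As a small illustration of the difference between these classes, the base R sketch below uses made-up values (the vector `raw` is hypothetical and not part of the data set used later). It shows that a character vector carries no date meaning, while a Date vector supports arithmetic.

```r
# Hypothetical example values; on import these would be plain character data
raw <- c("2021-03-01", "2021-03-15")
class(raw)                       # "character"

# Converting to Date makes date arithmetic possible
d <- as.Date(raw)
class(d)                         # "Date"
gap <- d[2] - d[1]               # a difftime object
as.numeric(gap, units = "days")  # 14

# Values with both a date and a time can be stored as POSIXct
dt <- as.POSIXct("2021-03-01 14:30:00", tz = "UTC")
class(dt)                        # "POSIXct" "POSIXt"
```

Subtracting the two elements of `raw` directly would fail, because R has no way to know that the strings represent dates.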
1. Identifying the column
Before proceeding with the conversion, identify the column that contains the date values. Inspect the data set to locate the column and confirm that it holds date information. Verify that the values are consistent and follow a single date format.
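One way to check consistency is to test every value against the layout you expect. The sketch below is only an illustration in base R: the vector `dates_raw` and the dd/mm/yy pattern are assumptions made for the example, so adapt both to your own column.

```r
# Hypothetical column values, used only to illustrate the check
dates_raw <- c("24/11/88", "11/01/89", "1990-12-24", NA)

# Expected layout: two digits, slash, two digits, slash, two digits (dd/mm/yy)
pattern <- "^[0-9]{2}/[0-9]{2}/[0-9]{2}$"
matches <- grepl(pattern, dates_raw)

# Non-missing values that do not follow the expected layout
dates_raw[!matches & !is.na(dates_raw)]   # "1990-12-24"

# Share of non-missing values that follow the layout
mean(matches[!is.na(dates_raw)])
```

A pattern check only looks at the shape of the string. It does not confirm that the value is a real calendar date.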
2. Approach using the lubridate package
The lubridate package in R provides convenient functions for working with dates. (The parsedate package provides functions for working with messy dates.) Follow these steps to convert a column to the date data type using the lubridate package:
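Step 1: Install and load the lubridate and readr packages
# install.packages(c("lubridate", "readr"))  # run once if the packages are not installed
library(lubridate)  # functions for parsing and working with dates
library(readr)      # functions for importing data files (not needed for the example below)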
Step 2: Create the data frame
student <- data.frame(
  name     = c("Ram", "Geeta", "John", "Paul", "Cassie", "Jim", "Dwight"),
  maths    = c(7, 8, 6, 9, 10, 8, 9),
  science  = c(5, 7, 6, 8, 9, 7, 8),
  history  = c(7, 5, 7, 7, 10, 7, 7),
  Birthday = c("24/12/1990", "11-01-19", "24-12-19", "18-11-19",
               "28-02-19", "24-07-19", "24/11/21")
)
Step 3: View the data frame
student
    name maths science history   Birthday
1    Ram     7       5       7 24/12/1990
2  Geeta     8       7       5   11-01-19
3   John     6       6       7   24-12-19
4   Paul     9       8       7   18-11-19
5 Cassie    10       9      10   28-02-19
6    Jim     8       7       7   24-07-19
7 Dwight     9       8       7   24/11/21
Step 4: Check the class of the date column
class(student$Birthday)
[1] "character"
The column is stored as character data.
Step 5: Convert the Birthday column to the date data type
We can use two common functions:
ymd() converts character values in year-month-day order to dates
dmy() converts character values in day-month-year order to dates
Our column is in day-month-year order, so we will use the dmy() function.
student$Birthday <- dmy(student$Birthday)
Step 6: Check for the changes
str(student$Birthday)
 Date[1:7], format: "1990-12-24" "2019-01-11" "2019-12-24" "2019-11-18" "2019-02-28" ...
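The Birthday values in this example mix two separators and two year lengths. For comparison, here is a rough base R sketch of handling such a mix explicitly. `parse_mixed_dmy()` is a hypothetical helper written for this illustration (it is not a lubridate or base R function), and it assumes every value is in day-month-year order.

```r
# Hypothetical helper: parse day-month-year strings that use "/" or "-"
# as separator and either a two-digit or a four-digit year.
parse_mixed_dmy <- function(x) {
  x <- gsub("-", "/", x, fixed = TRUE)                 # unify the separators
  four_digit <- grepl("^[0-9]{2}/[0-9]{2}/[0-9]{4}$", x)
  out <- rep(as.Date(NA), length(x))                   # start with all NA
  out[four_digit]  <- as.Date(x[four_digit],  format = "%d/%m/%Y")
  out[!four_digit] <- as.Date(x[!four_digit], format = "%d/%m/%y")
  out
}

birthday <- c("24/12/1990", "11-01-19", "24/11/21")
parse_mixed_dmy(birthday)
# "1990-12-24" "2019-01-11" "2021-11-24"
```

Note that %y maps the two-digit years 00 to 68 onto 2000 to 2068 and 69 to 99 onto 1969 to 1999, which is why "19" becomes 2019 here. If two-digit years in your data refer to an earlier century, they need to be corrected after parsing.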
3. Approach using as.Date()
R also provides the as.Date() function as a base method for converting columns to the date data type. Follow these steps to convert a column to the date data type.
Step 1: Create the data frame
student_2 <- data.frame(
  name     = c("Ram", "Geeta", "John", "Paul", "Cassie", "Jim", "Dwight"),
  maths    = c(7, 8, 6, 9, 10, 8, 9),
  science  = c(5, 7, 6, 8, 9, 7, 8),
  history  = c(7, 5, 7, 7, 10, 7, 7),
  Birthday = c("24/11/88", "11/01/89", "24/12/90", "18/11/89",
               "28/02/91", "24/12/90", "24/07/92")
)
Step 2: View the data frame
student_2
    name maths science history Birthday
1    Ram     7       5       7 24/11/88
2  Geeta     8       7       5 11/01/89
3   John     6       6       7 24/12/90
4   Paul     9       8       7 18/11/89
5 Cassie    10       9      10 28/02/91
6    Jim     8       7       7 24/12/90
7 Dwight     9       8       7 24/07/92
Step 3: Check the class of the date column
class(student_2$Birthday)
[1] "character"
The column is stored as character data.
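Step 4: Convert the Birthday column to the date data type
student_2$Birthday <- as.Date(student_2$Birthday)
Error in charToDate(x) :
  character string is not in a standard unambiguous format
We get an error because, without a format, as.Date() only tries the layouts "%Y-%m-%d" and "%Y/%m/%d", and our values match neither.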
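We can use the as.Date() function with the format argument to specify the format explicitly. Because the years are written with two digits, the format must use %y (two-digit year) rather than %Y (four-digit year); with %Y, the year 88 would be read literally and stored as 0088.
student_2$Birthday <- as.Date(student_2$Birthday, format = "%d/%m/%y")
Step 5: Check for the changes
str(student_2$Birthday)
 Date[1:7], format: "1988-11-24" "1989-01-11" "1990-12-24" "1989-11-18" "1991-02-28" ...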
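The distinction between %y and %Y matters for two-digit years such as those in student_2. The following minimal base R sketch, using a single value taken from that column, shows what each specifier does with it.

```r
x <- "24/11/88"   # day/month/two-digit year

# %Y expects a year with century, so "88" is taken literally as the year 88
wrong <- as.Date(x, format = "%d/%m/%Y")
as.POSIXlt(wrong)$year + 1900     # 88

# %y expects a two-digit year; 69 to 99 are mapped to 1969 to 1999
right <- as.Date(x, format = "%d/%m/%y")
format(right)                     # "1988-11-24"

# A format that does not match the string gives NA rather than an error
as.Date(x, format = "%Y-%m-%d")   # NA
```

Because a wrong format can produce either NA or a silently wrong year, it is worth looking at a few converted values before continuing.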
Considerations and Validation:
When converting columns to the date data type in R, keep the following considerations in mind:
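Ensure that the data in the column follows a consistent date format. With lubridate, choose the function that matches the order of the date parts (for example dmy() or ymd()); with as.Date(), specify the format explicitly whenever it deviates from the standard year-month-day layout.
Check for missing or invalid dates. Values that cannot be parsed are returned as NA, so use is.na() on the converted column to locate them. The conversion functions do not take an na.rm argument; na.rm belongs to summary functions such as min(), max() and range().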
Validate the converted dates by performing basic checks. For example, check if the dates fall within a certain range or if they follow a specific pattern.
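These checks can be sketched in base R as follows. The input vector and the plausible date window are assumptions chosen for the illustration, not values from the data frames above.

```r
# Hypothetical raw values: one impossible date and one missing value
raw <- c("24/11/88", "31/02/89", NA, "24/07/92")
converted <- as.Date(raw, format = "%d/%m/%y")

# Values that were present but could not be parsed
failed <- is.na(converted) & !is.na(raw)
raw[failed]                        # "31/02/89"

# Range check, skipping NA values with na.rm = TRUE
range(converted, na.rm = TRUE)

# Count dates inside an assumed plausible window
lower <- as.Date("1980-01-01")
upper <- as.Date("2000-12-31")
in_window <- !is.na(converted) & converted >= lower & converted <= upper
sum(in_window)                     # 2
```

Comparing `is.na(converted)` with `is.na(raw)` separates values that were missing from the start from values that failed to parse.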
Conclusion:
Converting a column to the date data type in R is a fundamental step for time-based analysis. The lubridate package offers a convenient approach with functions such as ymd() and dmy(), while the base R function as.Date() converts a column once its format is specified. By following the steps outlined in this article, you can convert the date columns in your data set and use them for time-based analysis in your R projects.