DEV Community

Wilson Ikaba

A Step-by-Step Guide to Converting a Column to the Date Data Type in a Dataset Using R

Introduction

When working with raw data in R, it is common to encounter columns that contain date information stored as strings or in a format that R does not recognize as dates. When a data set is imported, R often reads dates as character objects. This means we cannot use the column for time-based calculations such as time series analysis or computing time intervals. Dates can also be written in many different formats, so we must tell R which part of each value represents the year, month, day, and hour. Finally, note that values holding both a date and a time are stored in the POSIXct and POSIXlt classes, which both inherit from POSIXt.
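As a rough illustration of the problem described above, the sketch below imports a tiny made-up CSV (the `members` data and its `joined` column are hypothetical, not part of the article's dataset) and shows that date arithmetic only works after conversion.

```r
# A tiny CSV held in a string stands in for an imported file (illustrative data).
csv_text <- "name,joined\nAda,2021-03-01\nLinus,2021-03-15"
members <- read.csv(text = csv_text, stringsAsFactors = FALSE)

class(members$joined)   # "character": read.csv() does not convert dates

# Subtracting character strings is an error, so intervals cannot be computed yet.
try(members$joined[2] - members$joined[1], silent = TRUE)

# After conversion the column is a Date and arithmetic works.
members$joined <- as.Date(members$joined)
gap <- members$joined[2] - members$joined[1]
gap   # Time difference of 14 days

# POSIXct stores a date together with a time of day.
stamp <- as.POSIXct("2021-03-01 14:30:00", tz = "UTC")
class(stamp)   # "POSIXct" "POSIXt"
```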

This article presents a step-by-step guide to two ways of converting a date column in a data set to the date class (data type) in R.

1. Identifying the column
Before proceeding with the conversion, identify the column that contains the date values. Inspect the dataset to locate the column and confirm that it holds date information. Verify that the values are consistent and follow a specific date format.
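One possible way to carry out this inspection is sketched below. The data frame `df` and the pattern are assumptions made for illustration; the pattern only checks the shape of each value (day, month, year), not whether it is a real calendar date.

```r
# Hypothetical example data; 'Birthday' is the column we suspect holds dates.
df <- data.frame(
  name = c("Ram", "Geeta", "John"),
  Birthday = c("24/12/1990", "11-01-19", "not recorded"),
  stringsAsFactors = FALSE
)

str(df)   # lists every column with its type and first values

# Loose pattern (an assumption for this data): day, month, then a 2- or
# 4-digit year, separated by "/" or "-".
pattern <- "^[0-9]{1,2}[/-][0-9]{1,2}[/-]([0-9]{2}|[0-9]{4})$"
looks_like_dmy <- grepl(pattern, df$Birthday)

looks_like_dmy                 # TRUE TRUE FALSE
df$Birthday[!looks_like_dmy]   # values to clean up before converting
```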

2. Approach using the lubridate package
The lubridate package in R provides convenient functions for working with dates. (The parsedate package is an alternative that provides functions for working with messy dates.) Follow these steps to convert a column to the date data type using lubridate:

Step 1: Install and load the lubridate and readr packages
# install.packages(c("lubridate", "readr")) # run once if the packages are not installed
library(lubridate) # load lubridate for parsing and working with dates
library(readr) # load readr for importing data files

Step 2: Create the data frame

student<-data.frame(name=c("Ram","Geeta","John","Paul",
                          "Cassie","Jim","Dwight")
                   ,maths=c(7,8,6,9,10,8,9)
                   ,science=c(5,7,6,8,9,7,8)
                   ,history=c(7,5,7,7,10,7,7)
                   ,Birthday=c("24/12/1990", "11-01-19", "24-12-19", 
                               "18-11-19", "28-02-19", 
                               "24-07-19", "24/11/21"))

Step 3: View the data frame

student

   name maths science history   Birthday
1    Ram     7       5       7 24/12/1990
2  Geeta     8       7       5   11-01-19
3   John     6       6       7   24-12-19
4   Paul     9       8       7   18-11-19
5 Cassie    10       9      10   28-02-19
6    Jim     8       7       7   24-07-19
7 Dwight     9       8       7 24/11/21

Step 4: Check the class of the date column

class(student$Birthday)

[1] "character"

The Birthday column is stored as character data, not as dates.

Step 5: Convert the Birthday column to the date data type
Two commonly used lubridate functions are:

ymd() converts character strings in year-month-day order to dates
dmy() converts character strings in day-month-year order to dates

Our column is in day-month-year order, so we will use the dmy() function. It handles both the "/" and "-" separators and both the two-digit and four-digit years that appear in the data.

student$Birthday<-dmy(student$Birthday)

Step 6: Check for the changes

str(student$Birthday)

 Date[1:7], format: "1990-12-24" "2019-01-11" "2019-12-24" "2019-11-18" "2019-02-28" ...

The column is now of class Date. Note that two-digit years from 00 to 68 are read as 2000 to 2068, so "11-01-19" becomes 2019-01-11.
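To make the difference between the order-based functions from Step 5 concrete, here is a small sketch that is separate from the student data. It assumes lubridate is installed; mdy() is a further lubridate function not used elsewhere in this article.

```r
library(lubridate)

# The same calendar day written in three different orders.
ymd("2019-12-24")   # year-month-day
dmy("24-12-2019")   # day-month-year
mdy("12/24/2019")   # month-day-year

# Each call returns the same Date.
same_day <- ymd("2019-12-24") == dmy("24-12-2019") &&
  dmy("24-12-2019") == mdy("12/24/2019")
same_day

# Values that cannot be parsed become NA (with a warning),
# so counting NAs is a quick sanity check after converting.
parsed <- suppressWarnings(dmy(c("24-12-2019", "not a date")))
sum(is.na(parsed))
```

Choosing the function whose name matches the order of the components is usually all that is needed; the separators themselves do not have to be specified.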

3. Approach using as.Date()
Base R also provides the as.Date() function for converting columns to the date data type. Follow these steps to convert a column with it.

Step 1: Create a second data frame

student_2<-data.frame(name=c("Ram","Geeta","John","Paul",
                           "Cassie","Jim","Dwight")
                    ,maths=c(7,8,6,9,10,8,9)
                    ,science=c(5,7,6,8,9,7,8)
                    ,history=c(7,5,7,7,10,7,7)
                    ,Birthday=c("24/11/88", "11/01/89", "24/12/90", 
                                "18/11/89", "28/02/91", 
                                "24/12/90", "24/07/92"))

Step 2: View the data frame

student_2

   name maths science history Birthday
1    Ram     7       5       7 24/11/88
2  Geeta     8       7       5 11/01/89
3   John     6       6       7 24/12/90
4   Paul     9       8       7 18/11/89
5 Cassie    10       9      10 28/02/91
6    Jim     8       7       7 24/12/90
7 Dwight     9       8       7 24/07/92

Step 3: Check the class of the date column

class(student_2$Birthday)

[1] "character"

The column is stored as character data.

Step 4: Convert the Birthday column to the date data type

student_2$Birthday<-as.Date(student_2$Birthday)

Error in charToDate(x) : 
  character string is not in a standard unambiguous format

We get an error because the dates are not in a layout that as.Date() recognizes by default.
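The sketch below illustrates why the call fails: without a format argument, as.Date() accepts year-first strings such as "1988-11-24" or "1988/11/24" and rejects anything else. The tryCatch() wrapper is only there so the example keeps running after the error.

```r
# Year-first input is recognized without any format argument.
as.Date("1988-11-24")
as.Date("1988/11/24")

# Day-first input with slashes is not; tryCatch() captures the error message
# instead of stopping the script.
msg <- tryCatch(
  as.character(as.Date("24/11/88")),
  error = function(e) conditionMessage(e)
)
msg
```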

We can use the as.Date() function with the format argument to specify the format explicitly. Because the years in this column have two digits, the format uses %y; %Y expects a four-digit year and would read "88" as the year 88.

student_2$Birthday<-as.Date(student_2$Birthday,format = "%d/%m/%y")

Step 5: Check for the changes

str(student_2$Birthday)

 Date[1:7], format: "1988-11-24" "1989-01-11" "1990-12-24" "1989-11-18" "1991-02-28" ...
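The format string passed to as.Date() is built from conversion codes. The sketch below, using standalone values rather than the student data, shows how the two year codes differ.

```r
# %d = day, %m = month number, %y = two-digit year, %Y = four-digit year.
two_digit  <- as.Date("24/11/88",   format = "%d/%m/%y")
four_digit <- as.Date("24/11/1988", format = "%d/%m/%Y")
two_digit == four_digit   # TRUE: both are 24 November 1988

# Reading a two-digit year with %Y takes it literally as the year 88.
literal <- as.Date("24/11/88", format = "%d/%m/%Y")
as.numeric(format(literal, "%Y"))   # 88

# format() turns a Date back into text in any layout.
format(two_digit, "%d %B %Y")
```

With %y, values from 69 to 99 are read as 1969 to 1999 and values from 00 to 68 as 2000 to 2068, so older or future dates with two-digit years need extra care.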

Considerations and Validation:
When converting columns to the date data type in R, keep the following considerations in mind:
Ensure that the data in the column follows a consistent date format. With lubridate, choose the function that matches the order of the components (for example dmy() or ymd()); with as.Date(), specify the format explicitly whenever it differs from the default year-month-day layout.
Check for missing or invalid dates. Values that cannot be parsed are converted to NA, so use is.na() to find them. The conversion functions have no na.rm argument; na.rm belongs to summary functions such as min(), max(), and range().
Validate the converted dates by performing basic checks. For example, check that the dates fall within an expected range.
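The checks above could be sketched as follows. The values in `raw` and the range bounds are made up for illustration; adjust them to whatever is plausible for your data.

```r
raw <- c("24/11/88", "11/01/89", "31/02/90", NA)   # illustrative values
birthdays <- as.Date(raw, format = "%d/%m/%y")

# Entries that were present in the raw data but failed to convert
# (31 February does not exist, so it becomes NA).
failed <- is.na(birthdays) & !is.na(raw)
raw[failed]

# Range check: flag anything outside the window we expect (assumed bounds).
lower <- as.Date("1900-01-01")
upper <- Sys.Date()
out_of_range <- !is.na(birthdays) & (birthdays < lower | birthdays > upper)
any(out_of_range)

# na.rm belongs to summary functions, which skip NA values when it is TRUE.
range(birthdays, na.rm = TRUE)
```

Comparing the NA pattern before and after conversion separates values that were already missing from values that failed to parse.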

Conclusion:
Converting a column to the date data type in a dataset using R is a fundamental step for conducting time-based analysis. The lubridate package offers a convenient approach with functions such as ymd() and dmy(), while the base R function as.Date() converts columns once the format is specified. By following the steps outlined in this article, you can convert the date columns in your dataset and use them for time-based analysis in your R projects.
