justin gage for Retool

Posted on Mar 4, 2020 • Originally published at retool.com

Formatting and dealing with dates in SQL

#sql #dates #formatting

It’s 3AM, and you’re sleeping soundly in your room. But slowly, your sweet dream turns into a nightmare: all of the queries you wrote earlier in the day are parsing dates wrong, your app is down, and your boss is angry. It turns out migrating from Redshift to Bigquery was not “as easy as 123” and your DBA switched all of your timestamps to unix time. Nice.

Nobody likes dates, especially programmers, but they’re a critical part of pretty much every application. In Javascript you’ve got Moment, but parsing dates in SQL is a bit more complex. This post will run through how you can effectively work with dates in SQL, resolve your issue quickly, and get back to bed.

SQL dialects and function prototypes

Part of why writing SQL is annoying is that there are hundreds of different flavors. Syntax is slightly different from MySQL to PostgreSQL (for example), and some dialects have functions that others don’t (e.g. PIVOT in Snowflake). When you’re working with dates, there are prototypes for types of functions: even though the exact syntax might differ between dialects, the idea is the same. We’ll tackle 5 broad categories:

Formatting
Time differences / deltas
Intervals
Time zones
Current times

For each function prototype, we’ll provide the right syntax and documentation for 5 of the more popular SQL dialects:

MySQL – the world’s most popular open source relational database (thanks, Oracle)
PostgreSQL – the world’s second most popular open source relational database, and a developer favorite for syntax
BigQuery – Google’s cloud based data warehouse that shares SQL syntax with other GCP databases (Standard SQL)
Redshift – Amazon’s cloud based data warehouse (or at least one of them)
Presto – a popular open source query engine built by Facebook and often used with HDFS / Hive

Something that often gets confusing is the difference between DATE and TIMESTAMP. A TIMESTAMP is just a DATE with an additional two levels of precision: fractional seconds and fractional seconds with time zones.

#DATE
2019-01-01 04.55.14 PM
#TIMESTAMP
2019-01-01 04.55.14.000148 PM

In general, we’ll use “date” in this tutorial but the distinction isn’t super important. Let’s go!

Formatting

Dates never seem to be in the format you want them to be in.

Computers interpret dates in all different formats, from unixtime to strings and timestamps, and they’re usually not friendly to each other. Here are the function prototypes:

FROM_UNIXTIME() – convert a unix time date into a normal date.
TO_UNIXTIME() – convert a regular date format into a unix time date.
TO_DATE() – convert a string to a date format. Sometimes you’ll need to specify what format the string is in through the function arguments.
FORMAT_DATE() – convert a date into a string. This function is usually used to format dates in specific ways as strings, so the arguments are the important part.

Unix time (which is also called epochs time) is kind of funky: it corresponds to the number of seconds that have elapsed since January 1st, 1970. A typical unix timestamp might look like this: 1284352323.

Here’s how these different conversion functions look across major SQL dialects:

Function	MySQL	PostgreSQL	BigQuery	Redshift	Presto
Convert unix time to date format	`FROM_UNIXTIME()`	`TO_TIMESTAMP()`	`DATE_FROM_UNIX_DATE()`	`TIMESTAMP 'epoch' + your_timestamp_column * INTERVAL '1 second'`	`FROM_UNIXTIME()`
Convert date format to unix time	`UNIX_TIMESTAMP()`	`EXTRACT(EPOCH FROM TIMESTAMP )`	`UNIX_DATE()`	`DATE_PART(EPOCH, )`	`TO_UNIXTIME()`
Convert string to date format	`STR_TO_DATE()`	`TO_DATE()`	`PARSE_DATE()`	`TO_DATE()`	`DATE_PARSE()`
Convert date to string format	`DATE_FORMAT()`	`TO_CHAR()`	`FORMAT_DATE()`	`TO_CHAR()`	`DATE_FORMAT()`

Here’s how we’d use these functions in Postgres, with expected inputs and outputs:

#Convert unix time to date format
SELECT TO_TIMESTAMP(‘1284352323’);
-- 2010-09-13T04:32:03.000Z
#Convert date format to unix time
SELECT EXTRACT(EPOCH FROM TIMESTAMP '2001-02-16 20:38:40');
-- 982355920
#Convert string to date format
SELECT TO_DATE('05 Dec 2000', 'DD Mon YYYY')
-- 2000-12-05
#Convert date to string format
SELECT TO_CHAR(DATE('2001-02-16 08:00:00-05'), 'DD Mon YYYY')
-- 16 Feb 2001

There’s a whole “language” of string formatting arguments that developers need to frantically search Google for every time they use them. Almost every single dialect follows the C strftime standard, except for Postgres. These are usually the same across TO_DATE() and FORMAT_DATE().

Date rounding

Date rounding lets you lower the specificity of your date; this is useful for aggregations and looking at trends over time. The prototype function here is DATE_TRUNC(), which truncates your date to a lower level of specificity, like month or year. Here’s how things look across dialects:

Function	MySQL	PostgreSQL	BigQuery	Redshift	Presto
Truncate date specificity	`DATE_FORMAT()`	`DATE_TRUNC()`	`DATE_TRUNC()`	`DATE_TRUNC()`	`DATE_TRUNC()`

For a more concrete example of when you’d use a date truncation function, imagine we have a table of order where each row represents an order, and each order has a date. If we want to look at order growth month over month, we’d use a truncation function in Postgres:

#Truncate date specificity
SELECT
    DATE_TRUNC(order_time, ‘month’),
    COUNT(*) as count_orders
FROM orders
GROUP BY 1
ORDER BY 1 DESC

The DATE_TRUNC() function grabs the month and year from the date so you can get a monthly view.

Date parts

If you have a timestamp, date part functions will pick out a particular part of that timestamp; this is useful if you want to display what day of the week a user logged in, what hour someone made an order, or aggregate event data by month to see which months of the year your website gets the most traffic. Our function prototype:

DATE_PART() – extract a specific part of a date, like the day of the week or year.

Here’s how it plays out across different types of SQL:

Function	MySQL	PostgreSQL	BigQuery	Redshift	Presto
Extract specific part of date	`EXTRACT()`	`DATE_PART()`	`EXTRACT()`	`DATE_PART()`	`EXTRACT()`

In Postgres, we can extract just the day of a user’s order using DATE_PART():

#Extract specific part of date
SELECT date_part('day', DATE('2001-02-16 20:38:40'))
-- 16

Each SQL dialect has different approaches for how to specify data parts to extract, so be sure to check the documentation linked in the table above.

Differences / Deltas

If you need to calculate the duration or difference between two dates, most SQL dialects have functions for that. The popular one here is DATE_DIFF():

DATE_DIFF() – get the difference between two dates in any specificity (days, years).

If you want to get the difference between two dates in days (i.e. how many days exist between date number one and date number two) you’d use something like DATE_DIFF(‘day’, ‘2019-01-01’, ‘2019-01-06’). Weirdly, in some languages, time deltas are their own data type (see, for example, the Pandas package in Python). Here’s how these functions line up:

Function	MySQL	PostgreSQL	BigQuery	Redshift	Presto
Get difference between two dates at some specificity	`DATEDIFF()`	Subtraction	`DATE_DIFF()`	`DATEDIFF()`	`DATE_DIFF()`

Here’s how we’d use this in Postgres:

#Get difference between two dates
SELECT DATE('2019-01-31') - DATE('2019-01-01')
-- 30

A closely related cousin to time deltas is interval functions.

Intervals

Intervals let you add and subtract time from dates in SQL. This kind of function is useful for calculating rolling widows, like filtering your data for every record in the past 7 days or the past year.

DATE_ADD() – add an amount of time to a date.
DATE_SUB() – subtract an amount of time from a date.
INTERVAL – use plus and minus signs to add time to a date.
PARSE_DURATION() – parse a formatted duration into a date. This is useful if you want to create a duration from scratch instead of subtracting two dates from each other.

The first two behave like normal functions, but in Postgres and Presto, using the INTERVAL operator is funky.

Function	MySQL	PostgreSQL	BigQuery	Redshift	Presto
Add time to a date	`DATE_ADD()`	`+ INTERVAL`	`DATE_ADD()`	`DATEADD()`	`DATE_ADD()`
Subtract time from a date	`DATE_SUB()`	`- INTERVAL`	`DATE_SUB()`	`DATEADD()` with negative values	`DATE_ADD()` with negative values
Extract date difference		`MAKE_INTERVAL()`			`PARSE_DURATION()`

The INTERVAL operator in SQL has unusual syntax: you usually need to use the keyword, follow it with a number in single quotes, and then a date interval. In Presto, for example, you can add one day to your date by using <some_date> + INTERVAL ‘1’ day. Notice how the string quotes are only around the quantity (1) and not the chosen interval (day).

If we wanted to see the number of orders users have made in the past 7 days:

#Count orders over the past 7 days
SELECT
    order_date,
    count(*)
FROM orders
WHERE order_date <= NOW() - INTERVAL ‘7’ day

And here’s how we’d use the MAKE_INTERVAL() function in Postgres:

#Extract date difference
SELECT MAKE_INTERVAL(days => 10)
-- { "days": 10 }

Time zones

Time zones are one of the biggest nightmares in date handling, and chances are you’re not sitting in UTC right now. Thankfully, most SQL dialects have a bunch of functions to handle TZ conversion. Our function prototypes:

AT_TIMEZONE() – add a timezone to a date. Useful if the date doesn’t have an existing timezone attached to it.
CONVERT_TZ() – convert between timezones. Useful if that date already has an existing timezone.

Timezones also rear their ugly head when dealing with string parsing and date formatting. Here’s how these functions line up across dialects:

Function	MySQL	PostgreSQL	BigQuery	Redshift	Presto
Add timezone to a date	`CONVERT_TZ()` or system settings	`AT TIME ZONE`	In strings	`AT TIME ZONE`	`AT TIME ZONE`
Convert date to timezone or convert between timezones	`CONVERT_TZ()`	`AT TIME ZONE`	`FORMAT_DATE()`	`CONVERT_TIMEZONE()`	`AT TIME ZONE`

Here’s how we’d add a timezone to a user’s click event or convert between timezones in Postgres:

#Add timezone to a date
SELECT DATE('2001-02-16 20:38:40') AT TIME ZONE 'America/Los_Angeles';
-- 2001-02-15T16:00:00.000Z
#Convert between timezones
SELECT DATE('2001-02-15T16:00:00.000Z') AT TIME ZONE 'America/Denver';
-- 2001-02-14T17:00:00.000Z

In a few of these dialects, you use the same function (AT TIMEZONE) to add a timezone to a date, as well as convert a date that already has a timezone to a different one. MySQL also has a dedicated function (UTC_DATE()) for converting dates to UTC time. For more about MySQL timezones, check out this wonderful cheatsheet.

Current times

The last big category of time related functionality is getting the current time. This kind of functionality is useful in similar cases as intervals, like building a rolling window. Because most of these functions return timestamps, they’re often used in conjunction with something like DATE_TRUNC to pick the right intervals.

NOW() – gets the current timestamp at query run time.

This is a pretty popular series of functions, so there are often a bunch of aliases. Here’s how things stack up across SQL flavors:

Function	MySQL	PostgreSQL	BigQuery	Redshift	Presto
Get the current timestamp	`NOW()`, `CURTIME()`	`CURRENT_DATE` or 10 others	`CURRENT_DATE()`	`CURRENT_DATE`	`NOW()`

General best practice is to keep all of your dates and timestamps in UTC time, and display them based on the client’s timezone. If we wanted to log the time that a user logs in using Postgres, here’s what our query might look like:

#Get the current timestamp
SELECT CURRENT_TIMESTAMP
#Display the timestamp to the user in PST
SELECT CURRENT_TIMESTAMP AT TIME ZONE 'America/Los_Angeles';

Keep in mind that some of these functions return a date that’s tied to the timestamp that your system is set to, which can vary pretty widely across database engines. Postgres returns UTC by default when using CURRENT_TIMESTAMP, so no worries here.

The truth is that as useful as some of these dialects are, scripting languages are often easier to parse dates in (Moment.js is a really popular one for Javascript). If you’d prefer to use more fluid programming languages like JS to work with your relational data that's already in a SQL database, give Retool a spin.