In this part of the series, we go through how to parse and manipulate datetimes. This can be done in the data pipeline with code, or you can specify the format when you load data into the database. In this blog post we focus on the first case.
Usually, data pipeline code is written in Python. In this blog post I'm using Clojure, though, mainly because I use it at work and it has a nice library for handling datetimes. I'm using the clojure.java-time library, version 0.3.2. As one can guess from the name, it's a wrapper library for the Java 8 time API. One of the good things about Clojure is that you can use Java libraries and classes. Java happens to have a nice time API, so why not use it?
At this point, I want to point out that this is not a coding tutorial, and the purpose of the blog post is not to teach anyone how to program with Clojure. If you're not familiar with Clojure, the syntax might look quite overwhelming at first. I suggest you take a look at the Clojure syntax documentation if the syntax prevents you from understanding the examples given in the blog post.
Datetime data
I'm going to introduce the datetime classes I've used at work to represent datetimes. What they have in common is that their string representation includes both date and time. Some classes in the time API include only a date or only a time, but I'm not going to discuss them here. The code in the examples is run in a REPL.
First, we are going to require the library, and alias it as t.
(require '[java-time :as t])
Local datetime
Local datetime is a datetime without timezone information. When there is no information about the timezone, datetime is assumed to be in local time.
For example:
(def now-time (t/local-date-time))
(str now-time)
would output for example
"2020-11-22T18:49:20.446"
You have to be mindful about when you can use local datetime and when to use some other class, which has a timezone. For example, birthday is something you could use local datetime for, but you wouldn't want to use it for example for scheduling a meeting, would you?
Local datetime is always relative to what local means. I might run some code locally and check that the datetime looks correct, but when I deploy the solution to, for example, AWS, local might mean a different thing. The method now returns a different time depending on the system clock and its default timezone, unless you provide a timezone as an argument for it.
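To make this concrete, here is a minimal sketch that reads one fixed moment as a local datetime in two zones. It calls the underlying Java classes directly through interop (`Instant/parse` and `LocalDateTime/ofInstant`), and the variable names are only for illustration.

```clojure
(require '[java-time :as t])
(import '(java.time Instant LocalDateTime))

;; A fixed instant, so the result does not depend on when the code is run.
(def fixed-instant (Instant/parse "2020-11-22T17:09:43Z"))

;; The same moment read as a local datetime in two different zones.
(def local-in-helsinki
  (LocalDateTime/ofInstant fixed-instant (t/zone-id "Europe/Helsinki")))
(def local-in-utc
  (LocalDateTime/ofInstant fixed-instant (t/zone-id "UTC")))

(str local-in-helsinki) ; "2020-11-22T19:09:43"
(str local-in-utc)      ; "2020-11-22T17:09:43"
```

Both values are valid local datetimes, but they differ by two hours, which is the kind of difference you would see between a laptop in Finland and a server running in UTC.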
Instant
Instant is a single point on the timeline. The point is stored as the number of seconds since the epoch plus a nanosecond fraction, but in string format, it looks like this:
(def instant-now (t/instant))
(str instant-now)
"2020-11-22T17:09:43.456Z"
As you may remember from the previous part of the series, Z at the end means the datetime is in UTC. An instant can be used to record an event, for example when the event was received from an external system. Because an instant is a point on the timeline, it doesn't store date and time fields, only a number representing the point in time relative to the epoch.
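A small sketch of what an instant actually holds, using Java interop on the underlying `java.time.Instant` object. The timestamp and the name `event-received` are made up for the example.

```clojure
(import '(java.time Instant))

;; Hypothetical timestamp for when an event was received.
(def event-received (Instant/parse "2020-11-22T17:09:43.456Z"))

;; Whole seconds since 1970-01-01T00:00:00Z.
(.getEpochSecond event-received) ; 1606064983

;; Nanoseconds within that second.
(.getNano event-received)        ; 456000000

;; The same point expressed as epoch milliseconds.
(.toEpochMilli event-received)   ; 1606064983456
```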
Offset datetime
Offset datetime is a datetime with an offset from UTC. As you may remember from the previous part of the series, the offset may be negative or positive, depending on whether the time is behind or ahead of UTC.
(def offset-now (t/offset-date-time))
(str offset-now)
"2020-11-22T18:58:54.961+02:00"
The offset datetime doesn't include the timezone information. Due to daylight saving time, the same timezone might have a different offset depending on the time of the year. The timezone also cannot be reliably deduced from the offset, because several timezones share the same offset, and the class itself doesn't store information about it.
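As a rough illustration of why an offset is not the same thing as a timezone, the sketch below builds two zoned datetimes in the same zone, one in winter and one in summer, and reads their offsets with the Java method `.getOffset`. The dates are arbitrary.

```clojure
(require '[java-time :as t])

(def helsinki (t/zone-id "Europe/Helsinki"))

;; Same timezone, different times of the year.
(def winter (t/zoned-date-time (t/local-date-time "2020-01-15T12:00:00") helsinki))
(def summer (t/zoned-date-time (t/local-date-time "2020-07-15T12:00:00") helsinki))

(str (.getOffset winter)) ; "+02:00"
(str (.getOffset summer)) ; "+03:00"
```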
Zoned datetime
Zoned datetime is a datetime with an offset and a timezone. It contains the most information about date and time.
(def zoned-now (t/zoned-date-time (t/zone-id "Europe/Helsinki")))
(str zoned-now)
"2020-11-22T19:13:54.682+02:00[Europe/Helsinki]"
Zoned datetime is needed, for example, when you convert a local datetime to an instant, since you need the offset to calculate the time since the epoch.
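The conversion could look roughly like this. Attaching the zone uses the same `t/zoned-date-time` call shown elsewhere in this post, and the final step uses the Java method `.toInstant` directly. The datetime is just an example value.

```clojure
(require '[java-time :as t])

;; A local datetime that we know was recorded in Helsinki time.
(def local (t/local-date-time "2020-11-22T19:13:54"))

;; Attach the zone, which also fixes the offset for that date.
(def zoned (t/zoned-date-time local (t/zone-id "Europe/Helsinki")))

;; Now the point on the timeline is unambiguous.
(str (.toInstant zoned)) ; "2020-11-22T17:13:54Z"
```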
Parsing and formatting datetimes
If the datetime in the data is already in the ISO format the class expects, you can just give the datetime string as an argument to the constructor:
(def parsed-local-date-time (t/local-date-time "2011-12-03T10:15:30.234"))
If the datetime has a space as a delimiter instead of the character T, you need to pass the datetime format as an argument, too:
(def parsed-local-date-time (t/local-date-time "yyyy-MM-dd HH:mm:ss.SSS" "2011-12-03 10:15:30.234"))
Otherwise local-date-time is not able to parse it.
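In a pipeline you often don't control the input, so it can be handy to wrap the parsing. Below is a minimal sketch of such a wrapper. `parse-local-date-time` is a hypothetical helper, not part of the library, and returning nil on failure is just one possible design choice.

```clojure
(require '[java-time :as t])

(defn parse-local-date-time
  "Hypothetical helper: returns a local datetime, or nil when the string
  does not match the pattern."
  [pattern s]
  (try
    (t/local-date-time pattern s)
    (catch Exception _ nil)))

;; The pattern matches the input.
(str (parse-local-date-time "dd.MM.yyyy HH:mm" "22.11.2020 18:49"))
;; "2020-11-22T18:49"

;; The pattern does not match the input.
(parse-local-date-time "dd.MM.yyyy HH:mm" "2020-11-22 18:49")
;; nil
```

Whether to return nil, log the row, or let the exception stop the pipeline depends on how strict the pipeline needs to be.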
Sometimes the datetime doesn't contain timezone information, but you know which timezone it should have. It might be that the documentation of the API tells you that, or you have confirmed it from the API developer, so you can set it as you parse the datetime:
(def parsed-zoned-date-time (t/zoned-date-time parsed-local-date-time (t/zone-id "Europe/Helsinki")))
Now let's have a look at how it works the other way around. The query might return for example an instant, and you need it to be in a certain format for sending it to an API.
You can do it as a one-liner
(->> (t/instant) (t/format (t/with-zone (t/formatter :iso-local-date-time) "Europe/Helsinki")))
but I personally think it looks a bit nasty, so I created a function for it:
(defn instant->iso-local-date-time-str [instant-to-format]
  (let [iso-formatter (t/formatter :iso-local-date-time)
        formatter-with-hki-zone (t/with-zone iso-formatter "Europe/Helsinki")]
    (t/format formatter-with-hki-zone instant-to-format)))
;; A more generic function for formatting an instant with any predefined formatter and timezone:
(defn instant->formatted-str [instant-to-format predefined-formatter time-zone]
  (let [formatter (t/formatter predefined-formatter)
        formatter-with-zone (t/with-zone formatter time-zone)]
    (t/format formatter-with-zone instant-to-format)))
Because we used an instant in the example, we need to provide timezone information before formatting it. That's because an instant has a different string representation depending on the timezone. If you want to have the string representation in UTC, you don't need the timezone and you can just go with
(t/format (t/formatter :iso-instant) (t/instant))
which gives you the same string as calling str on the instant:
(str (t/instant))
You can find all predefined formatters in the Java DateTimeFormatter documentation.
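Here is a short sketch of formatting one fixed instant in two ways: with another predefined formatter and with a custom pattern. Passing a pattern string to `t/formatter` is based on my understanding of the library, so check it against the version you use. The instant is created with Java interop to keep the output stable.

```clojure
(require '[java-time :as t])
(import '(java.time Instant))

(def fixed-instant (Instant/parse "2020-11-22T17:09:43.456Z"))

;; Predefined formatter, referred to by keyword.
(t/format (t/with-zone (t/formatter :iso-offset-date-time) "Europe/Helsinki")
          fixed-instant)
;; "2020-11-22T19:09:43.456+02:00"

;; Custom pattern, for an API that wants a Finnish-style date.
(t/format (t/with-zone (t/formatter "dd.MM.yyyy HH:mm") "Europe/Helsinki")
          fixed-instant)
;; "22.11.2020 19:09"
```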
Manipulating datetimes
Once the datetime data is parsed into an object, manipulating it is quite straightforward. For example, if you want to add two weeks to the date:
(-> parsed-local-date-time
    (t/plus (t/weeks 2)))
Getting the start of the month the datetime falls in would be like this:
(-> parsed-local-date-time
    (t/adjust :first-day-of-month)
    (t/adjust (t/local-time 0)))
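If you need the start of the previous month instead, one possible approach is to step back one month before adjusting. This is a sketch that assumes `t/minus` and `t/months` behave like the `t/plus` and `t/weeks` calls above. The name `start-of-previous-month` is only for illustration.

```clojure
(require '[java-time :as t])

(def parsed-local-date-time (t/local-date-time "2011-12-03T10:15:30.234"))

(def start-of-previous-month
  (-> parsed-local-date-time
      (t/minus (t/months 1))          ; 2011-11-03T10:15:30.234
      (t/adjust :first-day-of-month)  ; 2011-11-01T10:15:30.234
      (t/adjust (t/local-time 0))))   ; 2011-11-01T00:00

(str start-of-previous-month) ; "2011-11-01T00:00"
```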
Possible adjusters (such as :first-day-of-month) are:
:day-of-week-in-month
:first-day-of-month
:first-day-of-next-month
:first-day-of-next-year
:first-day-of-year
:first-in-month
:last-day-of-month
:last-day-of-year
:last-in-month
:next-day-of-week
:next-or-same-day-of-week
:previous-day-of-week
:previous-or-same-day-of-week
Note that you cannot use the adjusters on an instant, because an instant doesn't have information such as year or day of month. You need to convert the instant to another datetime class before using an adjuster.
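A minimal sketch of that conversion: the instant is first placed in a timezone with the Java method `.atZone`, and only then adjusted. The instant and the choice of zone are assumptions made for the example.

```clojure
(require '[java-time :as t])
(import '(java.time Instant))

(def fixed-instant (Instant/parse "2020-11-22T17:09:43Z"))

;; Give the instant a calendar by placing it in a zone, then adjust.
(def last-day-of-month
  (-> (.atZone fixed-instant (t/zone-id "Europe/Helsinki"))
      (t/adjust :last-day-of-month)))

(str last-day-of-month) ; "2020-11-30T19:09:43+02:00[Europe/Helsinki]"
```

The zone matters here. Close to midnight at the end of a month, the same instant can fall in different months depending on the zone you pick.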
To convert a datetime from one timezone to another, you need a zoned datetime. Converting the datetime from, for example, the Helsinki timezone to UTC can be done like this:
(-> (t/zoned-date-time (t/zone-id "Europe/Helsinki"))
    (t/with-zone-same-instant "UTC"))
If the provided datetime is not a zoned-date-time but, for example, an offset-date-time, you first need to convert it into a zoned-date-time before changing the timezone.
(-> (t/offset-date-time "2020-06-30T20:30:21.145+05:00")
    (t/zoned-date-time)
    (t/with-zone-same-instant "Europe/Helsinki"))
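It is worth being careful about which kind of zone change you ask for. The sketch below contrasts keeping the same instant with keeping the same wall-clock reading. The second one is done with the Java method `.withZoneSameLocal` through interop. The starting value is arbitrary.

```clojure
(require '[java-time :as t])

(def noon-utc
  (t/zoned-date-time (t/local-date-time "2020-06-30T12:00:00") (t/zone-id "UTC")))

;; Same moment, shown on a Helsinki wall clock.
(str (t/with-zone-same-instant noon-utc "Europe/Helsinki"))
;; "2020-06-30T15:00+03:00[Europe/Helsinki]"

;; Same wall-clock reading, which is a different moment.
(str (.withZoneSameLocal noon-utc (t/zone-id "Europe/Helsinki")))
;; "2020-06-30T12:00+03:00[Europe/Helsinki]"
```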
Pitfalls with handling datetimes
It's easy to mess up datetimes. It's easy to forget about timezones and offsets and just use a local datetime, only to realize later that the datetimes are different from what you'd expected. In some cases that's fine, but in most data and analytics platforms, timezones and offsets matter. They're also important if there is any kind of scheduling involved.
If you've set up, for example, Black Friday offers to open at midnight, and you're in Finland, you can find yourself in a situation where the offers open at 2 AM instead of at midnight because the system timezone is UTC. Probably not the most horrible situation, but you might get some angry feedback from the customers who have been waiting for the offers.
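One way to avoid this kind of surprise is to define the opening time in the zone where it matters and store the resulting instant. A rough sketch, with the date of Black Friday 2020 and the name `offer-start` chosen for the example:

```clojure
(require '[java-time :as t])

;; Midnight as the customers in Finland see it.
(def offer-start
  (t/zoned-date-time (t/local-date-time "2020-11-27T00:00:00")
                     (t/zone-id "Europe/Helsinki")))

(str offer-start)              ; "2020-11-27T00:00+02:00[Europe/Helsinki]"

;; The moment a server running in UTC should actually open the offers.
(str (.toInstant offer-start)) ; "2020-11-26T22:00:00Z"
```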
Another thing that might cause trouble is daylight saving time. Once a year, you might have a situation where you're missing data from one hour. And once a year, you might find that one hour at night has more data than usual. This might not cause any issues, but it's a good thing to acknowledge.
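The sketch below shows how the Java time API treats those two hours in Helsinki in 2020. It calls `ZonedDateTime/of` and `.withLaterOffsetAtOverlap` directly through interop, so it demonstrates the behavior of the underlying Java classes.

```clojure
(require '[java-time :as t])
(import '(java.time ZonedDateTime))

(def helsinki (t/zone-id "Europe/Helsinki"))

;; 2020-03-29 03:30 does not exist in Helsinki: clocks jump from 03:00 to 04:00.
;; The local time is moved forward by the length of the gap.
(def in-gap
  (ZonedDateTime/of (t/local-date-time "2020-03-29T03:30:00") helsinki))
(str in-gap) ; "2020-03-29T04:30+03:00[Europe/Helsinki]"

;; 2020-10-25 03:30 happens twice: clocks go back from 04:00 to 03:00.
;; By default the earlier occurrence (summer time offset) is chosen.
(def in-overlap
  (ZonedDateTime/of (t/local-date-time "2020-10-25T03:30:00") helsinki))
(str in-overlap) ; "2020-10-25T03:30+03:00[Europe/Helsinki]"

;; The later occurrence has to be requested explicitly.
(str (.withLaterOffsetAtOverlap in-overlap))
;; "2020-10-25T03:30+02:00[Europe/Helsinki]"
```

If the source system only sends local datetimes, the two occurrences of 03:30 in October cannot be told apart afterwards, which is one more reason to prefer instants or offsets in the data itself.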
And how about traveling? When we talk about wearables that produce data, the local timezone might change in the middle of a day. I do not envy the developers who need to face that type of issue when they build the system.
I hope this blog post has provoked some thoughts and given you a good overview of how datetimes can be handled with Clojure. In the next part I'm going to discuss datetimes from the perspective of the database.
Photo by Lucian Alexe on Unsplash