Why you shouldn't use current time to refine a batch run

konyu (@kon_yu) ・3 min read

First things first

I do code reviews, and about once a year I have to point out that the current time should not be used to narrow down the time range a batch job processes.
I decided to write up some material that explains why it doesn't work.

Unless the test code is written carefully, this mistake is hard to find: you may only notice it after the code has been running in production for a while, or you may not notice it for a long time.
It can also cause accidents, such as a recurring process like user billing repeatedly skipping records.

The explanation uses Rails code, but the idea applies to any programming language.

Prerequisites

Ruby: 2.5.1
Rails: 5.2
Table: the users table is assumed to have a column named updated_at that holds the time of the last update.

The problem with using the execution time to narrow down a time range in batch code

Code to get the records updated within the last day:

```ruby
User.where(updated_at: (Time.zone.now - 1.day)...Time.zone.now)

# SQL Issued
User.where(updated_at: (Time.zone.now - 1.day)...Time.zone.now).to_sql
# => "SELECT \"users\".* FROM \"users\" WHERE \"users\".\"updated_at\" >= '2019-02-11 03:26:21.992358' AND \"users\".\"updated_at\" < '2019-02-12 03:26:21.992952'"
```

It is tempting to write the same thing in code that runs as a batch (from Cron, for example):

```ruby
User.where(updated_at: (Time.zone.now - 1.day)...Time.zone.now)
```

Don't do it.

Even if you set Cron to run at 0:00:00, there is no guarantee that the target code will actually be executed at 0:00:00. By the time the batch has been launched, the libraries have been loaded, and the actual processing has started, it may well be 0:00:01.

Think of it in terms of boundary value testing.

Suppose the current time is 2/13 0:00:00 and we want the users updated on 2/12.
That is, we want the records whose updated_at is on or after 2/12 00:00:00 and before 2/13 00:00:00.

If you have data like the following, the records you want to get are 2, 3, and 4.

  1. 2/11 23:59:59
  2. 2/12 00:00:00
  3. 2/12 00:00:01
  4. 2/12 23:59:59
  5. 2/13 00:00:00

```ruby
User.where(updated_at: (Time.zone.now - 1.day)...Time.zone.now)
```

If the batch takes a while to start and this query is executed at 2/13 00:00:01, record 2 will not be retrieved and the unwanted record 5 will be.

You can't use the current time unless the query runs at exactly 00:00:00 (strictly speaking, exact down to the fraction of a second).
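The boundary case above can be checked without a database. The following is a minimal plain-Ruby sketch, not the Rails query itself: the hash `UPDATED_AT_BY_RECORD`, the helper `selected_with_now`, the year 2019, and the use of UTC are all assumptions made for illustration. The exclusive range mirrors the `>=` / `<` pair in the SQL shown earlier.

```ruby
# Stand-in for the users table: record number => updated_at.
# The year (2019) and UTC are assumptions; the list above gives only month/day.
UPDATED_AT_BY_RECORD = {
  1 => Time.utc(2019, 2, 11, 23, 59, 59),
  2 => Time.utc(2019, 2, 12, 0, 0, 0),
  3 => Time.utc(2019, 2, 12, 0, 0, 1),
  4 => Time.utc(2019, 2, 12, 23, 59, 59),
  5 => Time.utc(2019, 2, 13, 0, 0, 0)
}.freeze

ONE_DAY_IN_SECONDS = 24 * 60 * 60

# Hypothetical helper imitating
#   User.where(updated_at: (now - 1.day)...now)
# Lower bound inclusive, upper bound exclusive.
def selected_with_now(now)
  window = (now - ONE_DAY_IN_SECONDS)...now
  UPDATED_AT_BY_RECORD.select { |_record, updated_at| window.cover?(updated_at) }.keys
end

on_time         = selected_with_now(Time.utc(2019, 2, 13, 0, 0, 0))
one_second_late = selected_with_now(Time.utc(2019, 2, 13, 0, 0, 1))

puts "run at 00:00:00 -> #{on_time.inspect}"         # [2, 3, 4]
puts "run at 00:00:01 -> #{one_second_late.inspect}" # [3, 4, 5]
```

A one-second delay is enough to drop record 2 and pick up record 5, which is exactly the kind of difference a boundary value test is meant to expose.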

How to solve this problem

Write the code so that it tolerates some drift in the time at which the batch is actually executed.
In other words, make it produce the same result whether the batch runs at 00:00:01 or at 02:12:42 instead of exactly at 00:00:00 (of course, you also need to retry when a batch fails, but that's another story).

To do that, explicitly pin the boundaries of the range to 00:00:00 of the day.
Concretely, you can use Time.zone.now.beginning_of_day.

Here's the code rewritten to use it:

```ruby
User.where(updated_at: (Time.zone.now.beginning_of_day - 1.day)...Time.zone.now.beginning_of_day)

# SQL Issued
User.where(updated_at: (Time.zone.now.beginning_of_day - 1.day)...Time.zone.now.beginning_of_day).to_sql
# => "SELECT \"users\".* FROM \"users\" WHERE \"users\".\"updated_at\" >= '2019-02-11 00:00:00' AND \"users\".\"updated_at\" < '2019-02-12 00:00:00'"

# It's a bit redundant, so if you put the time in a variable, you get this
midnight = Time.zone.now.beginning_of_day
User.where(updated_at: (midnight - 1.day)...midnight)
```
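To see that pinning the range to midnight removes the dependency on the start time, here is a minimal plain-Ruby sketch under stated assumptions. It does not use Rails: `start_of_day` is a hypothetical UTC-only stand-in for `beginning_of_day`, and `TIMESTAMPS_BY_RECORD`, `selected_with_midnight`, the year 2019, and the fixed 24-hour day are illustrative choices, not part of the original code.

```ruby
# Stand-in for the users table: record number => updated_at (UTC, year assumed).
TIMESTAMPS_BY_RECORD = {
  1 => Time.utc(2019, 2, 11, 23, 59, 59),
  2 => Time.utc(2019, 2, 12, 0, 0, 0),
  3 => Time.utc(2019, 2, 12, 0, 0, 1),
  4 => Time.utc(2019, 2, 12, 23, 59, 59),
  5 => Time.utc(2019, 2, 13, 0, 0, 0)
}.freeze

SECONDS_PER_DAY = 24 * 60 * 60 # assumes no DST shift, which holds in UTC

# Hypothetical stand-in for beginning_of_day; only valid for UTC times.
def start_of_day(time)
  Time.utc(time.year, time.month, time.day)
end

# Imitates: midnight = now.beginning_of_day
#           User.where(updated_at: (midnight - 1.day)...midnight)
def selected_with_midnight(now)
  midnight = start_of_day(now)
  window = (midnight - SECONDS_PER_DAY)...midnight
  TIMESTAMPS_BY_RECORD.select { |_record, updated_at| window.cover?(updated_at) }.keys
end

run_times = [
  Time.utc(2019, 2, 13, 0, 0, 0),   # exactly on schedule
  Time.utc(2019, 2, 13, 0, 0, 1),   # one second late
  Time.utc(2019, 2, 13, 2, 12, 42)  # more than two hours late
]

results = run_times.map { |now| selected_with_midnight(now) }

run_times.zip(results).each do |now, selected|
  puts "run at #{now.strftime('%H:%M:%S')} -> #{selected.inspect}"
end
```

Every run time on 2/13 selects records 2, 3, and 4, so a delayed start or a same-day retry sees the same window.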
