DEV Community

Aldo Mendez
Aldo Mendez

Posted on • Updated on

Handle historical versions on a database record

At work, we handle data on users that change frequently as they move through production lines, shifts or departments, and we need to have historical data of the employees at all points in time and be used seamlessly, as we review historical data. From this requirement we prepare a database schema and Laravel models to manage this information in a simple way.

We begin this journey by acknowledging that we need to pull the data from a legacy system, written in VisualFoxPro and its data is in a dBase database.

We tried a lot of ways to import data to MariaDB, but what we find was the fastest, is to use a tool to push all the records to a schema in the database that is not used in production. The data is checked during import by a trigger against the data we already have in the production schema, and the "import" table is truncated at the end, as the content of this table is not important nor used.

So, this is the trigger, edited for brevity but all the important mechanics are there.

BEGIN
    set @current_id = (SELECT log_id FROM prod.people e WHERE e.id = trim(new.TR101) order by begin_at desc limit 1);

    if @current_id is null then
        insert into prod.people 
            (id, name, shift, ..., begin_at)
        values 
            ( new.id, new.name, new.shift, ..., CURRENT_TIMESTAMP());
    else 
        select id, name, shift, ... 
        into @id, @name, @shift, ... 
        from prod.people p 
        WHERE p.id = new.id
        order by begin_at desc 
        limit 1;

        if 
            -- no need to check @id as is what let us here
            new.name != @name or 
            new.shift != @shift or
            -- ... check the rest of the fields for changes
        then
            -- seal the last record
            update prod.people set end_at = CURRENT_TIMESTAMP() where log_id = @current_id;

            insert into prod.people (id, name, shift, ...) values ( new.id, new.name, new.shift, ...);

        end if;
    end if;
END
Enter fullscreen mode Exit fullscreen mode
  1. It begins checking if the record exists, if not inserts the record
  2. If exists, pull the content of the record in variables, and check those variables, against the new record information to find any changes.
  3. if a change is detected, it seal the "current" record by filling the end_at field with the current_timestamp and creates a new record with the begin_at date filled with the current_timestamp and end_at as null.

All ~3000 records are inserted in less that a second 🥳 (in the production schema). As the tool generates a single insert into temporal_table (...) values (...), (...), ... (...) query and all data is send in one shot.

This let us with data in this form:

log_id id name shift ... begin_at end_at
1 1 user1 1 ... 2020-01-01 2020-02-25
2 2 user2 1 ... 2020-01-01 null
3 1 user1 2 ... 2020-01-01 null

Now lets understand what is this information telling us:

  • log_id is the primary key of the record.
  • id is the id of the user this is not unique in this table as we need to have all the history.
  • name, shift, and the rest are self explanatory, is the information we hold of employees.
  • begin_at is the timestamp at which this record begins to be valid.
  • end_at is the timestamp at which this record is no longer valid.

To be able to query user data for a particular date we need to constraint our search to that particular date, except if we are not carefully, all records with null on any date will be discarded, so our friend coalesce comes to the rescue and we pass the current date as a fallback, so we end with:

select * 
from people 
where (
    coalesce(end_at, '2021-02-25') >= '2021-02-25' 
    and "begin_at" <= '2021-02-25'
)
Enter fullscreen mode Exit fullscreen mode

We are testing for a Point in time so all the dates should be the same, if we open the window more that one record may be returned. And undesired/unexpected things may happen.

And was until this point where we switch our SQL hats to Laravel's hats. As we need to be able to use this data from our Laravel applications. To improve in ergonomics we use basic Laravel options to provide us with a simple API to make data retrieval a simple and low psychologically taxing activity.

<?php

namespace App\Models;

use Illuminate\Database\Eloquent\Model;
use Illuminate\Database\Eloquent\Factories\HasFactory;

class Personnel extends Model
{
    use HasFactory;

    protected $primaryKey = 'log_id';
    public $timestamps = false;

    protected $hidden = ['begin_at','end_at', 'log_id'];

    public function scopeAsOfNow($q)
    {
        return $q->where(function ($query) {
            return $query->whereRaw(
                "coalesce(end_at, ?) >= ?", [
                    now()->format('Y-m-d H:i'), 
                    now()->format('Y-m-d H:i')
                ]
            )->where(
                'begin_at',
                '<=',
                now()->toDateTimeString()
            );
        });
    }

    public function scopeAt($q, $date)
    {
        return $q->where(function ($query) use ($date) {
            return $query->whereRaw(
                "coalesce(end_at, ?) >= ?", 
                [$date, $date]
            )->where('begin_at', '<=', $date);
        });
    }
}

Enter fullscreen mode Exit fullscreen mode

We ended with this Model setup (well is a little longer, but the rest is not relevant for this case).

What we did is to create two scopes at and asOfNow method. So instead of:

Personnel::where(function ($query) {
    return $query->whereRaw(
        "coalesce(end_at, ?) >= ?", [
            now()->format('Y-m-d H:i'), 
            now()->format('Y-m-d H:i')
        ]
    )->where(
        'begin_at',
        '<=',
        now()->toDateTimeString()
    );
});
Enter fullscreen mode Exit fullscreen mode

we end up using:

Personnel::asOfNow();
// or
Personnel::at($someDate);
Enter fullscreen mode Exit fullscreen mode

And that's it. After some iterations, we end with this working and very ergonomic interface.

If you have any comments, or solve this in a different way, please let me know, as will be an eye opener for all of us.

Top comments (2)

Collapse
 
darkain profile image
Vincent Milum Jr

This functionality is actually built directly into MariaDB. Check this out! mariadb.com/kb/en/system-versioned...

Collapse
 
aldomendez profile image
Aldo Mendez

Hi, is interesting how after a lot of experimentation and improvement we get to what is a feature of the database. 🥳
The good part is that we now know why. Also this implementation can be used for SQLite or other systems where this is not a database feature, I believe, given the database has the appropriate parts available (triggers and the features in the triggers we are using).

We'll try to use this feature at some point in the feature.