DEV Community

Cover image for Kalman Filtering in SQL
Danny Reed
Danny Reed

Posted on

Kalman Filtering in SQL

Kalman who?

Have you ever used a Kalman filter? My first experience with Kalman filtering was in smoothing motion sensor readings for a homemade Segway I built back in college. Many years ago it was used in the Apollo missions (ok, they win on coolness...). It's job is simple -- take noisy data and smooth it out a bit.

At work, I needed to smooth some latitude/longitude values to help make navigation paths be smoother. Normally, I'd just npm i kalmanjs and be off to the races, but this particular data pipeline presented a challenge.

Challenge

I apply several data-processing operations to the data I'm working with, and several of those operations happen in the database because SQL is a great place to do set-based operations. These operations must take effect in a specific order in order for things to come out right, and I needed to stick the Kalman filtering operation right between two specific steps in my process. Both of those steps were done in SQL.

Option 1

Do the first few operations in SQL, then return results to the API layer (Lambda) to do the Kalman filtering (using Kalmanjs) and the remaining operations. This requires porting the rest of the steps from SQL to JS and doing them in the Lambda.

Advantage: Puts some CPU load on the Lambda instead of the DB, which is a good thing in my case.

Disadvantage: I have to translate several operations from SQL to JS, and some of them are set-based, which means they're better to do in SQL.

Option 2

Try to implement Kalman filtering in SQL. This option is tricky because there's practically no information out there on Kalman filtering in SQL.

Advantage: Preserve data pipeline and existing SQL implementations for subsequent operations.

Disadvantage: Requires writing a Kalman filter in SQL, which has some specific challenges we'll discuss later. Also, I don't do much fancy math, so understanding the way Kalman filtering actually works would be a challenge.

I chose option 2 because I just can't justify doing several operations in the "wrong" place just because that's where the Kalman implementations are available.

Translation

Before attempting the implementation, I took a good look at the code in KalmanJS and tried to understand it. Honestly the code looked pretty simple, and all I really had to do was change syntax around a bit. Here's a snippet of the translation:

Original JS implementation (KalmanJS)

/**
  * Filter a new value
  * @param  {Number} z Measurement
  * @param  {Number} u Control
  * @return {Number}
  */
  filter(z, u = 0) {

    if (isNaN(this.x)) {
      this.x = (1 / this.C) * z;
      this.cov = (1 / this.C) * this.Q * (1 / this.C);
    }
    else {

      // Compute prediction
      const predX = this.predict(u);
      const predCov = this.uncertainty();

      // Kalman gain
      const K = predCov * this.C * (1 / ((this.C * predCov * this.C) + this.Q));

      // Correction
      this.x = predX + K * (z - (this.C * predX));
      this.cov = predCov - (K * this.C * predCov);
    }

    return this.x;
  }
Enter fullscreen mode Exit fullscreen mode

Essentially this just does basic ifs, and some math. In fact, I found it so clear that I decided I could probably just translate the code without even having to dive into understanding the math behind Kalman filtering.

SQL Translation:

IF (@x IS NULL)
BEGIN
    PRINT 'No existing state; proceeding without it';
    SET @x = 1 / @C * @z;
    SET @cov = 1 / @C * @Q * (1 / @C);
END
ELSE
BEGIN
    SET @predX = @A * @x + @B * @u;
    SET @predCov = @A * @cov * @A + @R;

    SET @K = @predCov * @C * (1 / (@C * @predCov * @C + @Q));

    SET @x = @predX + @K * (@z - @C * @predX);
    SET @cov = @predCov - @K * @C * @predCov;

END
Enter fullscreen mode Exit fullscreen mode

I did choose to change uncertainty() and predict() functions to inline formulas, just to minimize the need to create function declarations in SQL. Other than that, you can tell they're pretty much the same.

Speed Bump

This is where I hit a snag with the translation. Kalman works by looking at statistical properties the previous values in order to predict and smooth the next value. This means that Kalman needs to keep track of the state of the filter. In OOP you can just keep an instance of a class around in memory and use instance properties to keep track of state. In SQL, once the invocation completes, nothing is stored in memory.

Solution

Since the only obvious way to store things in SQL is in a table, I decided to create a KalmanState table in my database. I read through the JS code again to determine which values were actually stored and used in the next calculation. From what I gathered, it appeared that all it needs to store are these variables:

x, cov, predX, predCov, K
Enter fullscreen mode Exit fullscreen mode

Now it needs to store one set of these variables for each instance of the filter, so I made a simple table that looks like this:

id          x                      cov                    predX                  predCov                K                      identifier
----------- ---------------------- ---------------------- ---------------------- ---------------------- ---------------------- ----------------
43          26.9861111111111       0.618055555555556      26.9636363636364       1.61818181818182       0.618055555555555      testingLat
44          21.0416666666667       0.618055555555556      21.1090909090909       1.61818181818182       0.618055555555555      testingLon
Enter fullscreen mode Exit fullscreen mode

The last column is "identifier" which is just a string that you can use to name the instance. In my case, I'm using one filter for each of several MAC addresses of devices I'm using. So for me, I can just pass in the MAC address of the device as the identifier. This helps the SPROC retrieve the proper state when it's time to filter a new value.

Usage

Here's how I integrated this into my query.

SELECT TOP 1
    @LatIdentifier = CONCAT(@DeviceId, '-lat')
    ,@LonIdentifier = CONCAT(@DeviceId, '-lon')
    ,@rawLat = Latitude
    ,@rawLon = Longitude
    FROM #temp;

EXEC sp_kalman @LatIdentifier, @rawLat, @predictedLat OUTPUT;
EXEC sp_kalman @LonIdentifier, @rawLon, @predictedLon OUTPUT;

-- Overwrite the unfiltered lat/lon values in the temp table
UPDATE #temp 
    SET
        Latitude = @predictedLat
        ,Longitude = @predictedLon;
Enter fullscreen mode Exit fullscreen mode

Note that this is part of a much larger query, there were contextual reasons to use the temp table for this purpose, and that there is only ever one row in that table when this is executed.

Problems/Caveats

I'm not writing this to say that I've come up with some perfect solution...in fact this solution has some serious challenges.

  1. I was forced to make it a SPROC. I attempted to turn my code into a UDF. Functions are nice because you can use them right in a SELECT or even a JOIN. Sadly (but sensibly) functions are not allowed to alter data. As we discussed earlier, the Kalman filter must alter data in order to save its state. Now to integrate it with my query, I can't just flow it right in with the SELECTs but I have to EXEC the SPROC with parameters and then retrieve the output into a variable. This is just a clunky experience that makes for uglier and less readable code, in my opinion.

  2. You can't use it with ISOLATION LEVEL SNAPSHOT. This is true for the same reason as #1. You can't alter data in a SNAPSHOT-isolated transaction. This makes sense too, but it's annoying! I had to change up how my transaction worked in order to exclude the EXEC calls from it. In my case it was possible to side-step this limitation, but in other cases it may not be!

  3. You have to clean up after yourself By this I mean that there's no mechanism whereby this will ever DELETE records from the KalmanState table, which could allow for uncontrolled growth dependent on the application. In my case, I had to modify my API layer to perform cleanup when a filter is no longer useful to the system.

Should you use this?

It seems to me that the vast majority of use cases for Kalman filtering can be accomplished without doing the math in the DB. It's just easier all around :)

If you find yourself in a position like mine, where doing it in the DB really was the best option, then I hope this gives you a leg up on accomplishing that.

Next Steps

The author of the KalmanJS library (who does some cool stuff -- check out his work) has a contrib filter where he invites translations to other languages. You can find this implementation there now.

Also check out his original blog post about the development of KalmanJS here.

If you can think of improvements, please feel free to have a go at it!

(Cover image sourced from here and used with permission)

Oldest comments (0)