DEV Community

dave medlock 🧔🏻
dave medlock 🧔🏻

Posted on • Updated on • Originally published at devplaybook.com

Working with JSON in an RDBMS – Part 2, PostgreSQL

In the first installment in this series, I covered working with JSON documents in a SQL database using Microsoft SQL Server. As we saw, there are a number of functions that Microsoft has added to their database platform to allow you to store a JSON document in an nvarchar field and then validate it, query it, and manipulate it.

In part 2 of this series, we'll talk about the functionality that PostgreSQL has added to support the same requirements. First, let's do a quick refresher on the use case and structure that we'll use as an example.

Use Case for JSON in an RDBMS

I won't rehash the whole use case for this situation since that was covered in Part 1 of the series. But we'll use the same scenario. Here we're going to work with a provider (e.g. a physician, hospital, clinic) contract with a health insurance company. These provide a good analogy to a JSON document because they originate from actual documents (contracts between two parties) and they often contain contract language as well as tables and payment terms.

We'll use the same table structure in PostgreSQL that we used in MS SQL Server, with the notable exception that instead of putting our JSON document in an nvarchar field, we're going to put it in a field that uses PostgreSQL's special json data type.

  • provider
    • id (serial, pk)
    • provider_name (varchar(100))
    • npi (varchar(10))
  • provider_contract
    • id (bigint, pk)
    • provider_id (bigint, fk to provider.id)
    • contract_terms (jsonb) > this is where our JSON document will go
    • contract_eff_date (datetime)
    • contract_term_date (datetime)

Once again, we'll use the same JSON documents and provider details that we did in our previous article for consistency.

Princeton Plainsboro Teaching Hospital Contract:

{
    "out_of_area": "true",
    "service_rates": [
          {"service_description": "Office Visits 1", "service_code_range_start": "99211", "service_code_range_end": "99221", "flat_rate": "45.00"},
          {"service_description": "Office Visits 2", "service_code_range_start": "99222", "service_code_range_end": "99251", "flat_rate": "56.00"},
          {"service_description": "Anesthesia", "service_code_range_start": "00100", "service_code_range_end": "00952", "billed_percentage": "0.7401"}
    ]
}

Sacred Heart Hospital Contract:

{
    "out_of_network": "true",
    "service_rates": [
         {"service_description": "Misc Services", "service_code_range_start": "81200", "service_code_range_end": "82000", "billed_percentage": "0.6132"}
    ]
}

PostgreSQL JSON/JSONB Data Type

Unlike MS SQL Server, PostgreSQL introduced new data types to store JSON. These data types are json and jsonb data types, where jsonb is a binary representation of the JSON document that we're storing. (Lucera Del Alba has a great write-up on it here.) For this article, we'll use the jsonb data type because it gives us the ability to filter our queries using built-in JSON operators, but if you're designing and app that will store JSON data in PostgreSQL, the previously referenced article has some great pros and cons to help you make your decision about whether to use json or jsonb, including factoring in the ability to index JSON documents and whether you need your object key order to be preserved.

Setting up Our Data

Once we've created our tables, you'll notice that we can use the exact same SQL queries to insert data into our provider and provider_contract tables:

INSERT INTO provider (provider_name, npi) VALUES ('Princeton Plainsboro Teaching Hospital', '0123456789')
INSERT INTO provider (provider_name, npi) VALUES ('Sacred Heart Hospital', '1234560987')
INSERT INTO provider_contract (provider_id, contract_terms, contract_eff_date, contract_term_date)
VALUES (
1,
'{ "out_of_area": true, "service_rates": [ {"service_description": "Office Visits 1", "service_code_range_start": "99211", "service_code_range_end": "99221", "flat_rate": "45.00"}, {"service_description": "Office Visits 2", "service_code_range_start": "99222", "service_code_range_end": "99251", "flat_rate": "56.00"}, {"service_description": "Anesthesia", "service_code_range_start": "00100", "service_code_range_end": "00952", "billed_percentage": "0.7401"}] }',
'20190101 00:00:00',
'20201231 23:59:59'
)

INSERT INTO provider_contract (provider_id, contract_terms, contract_eff_date, contract_term_date)
VALUES (
2,
'{ "out_of_network": true, "service_rates": [{"service_description": "Misc Services", "service_code_range_start": "81200", "service_code_range_end": "82000", "billed_percentage": "0.6132"}] }',
'20200101 00:00:00',
'20201231 23:59:59'
)

This should work seamlessly and we're ready to start working with our JSON documents directly in PostgreSQL.

With MS SQL Server, one of the first things we did was run a query to determine if our contract_terms field had valid JSON in it. However, with PostgreSQL we don't have to do any such thing because we're defining a json data type and it won't allow invalid JSON to be entered into the table to begin with. If you modify one of the insert queries above to remove a colon or a double quote then you'll get an error when you attempt to execute the query, indicating that the JSON document you're attempting to store is invalid. I'll count this one as a win for PostgreSQL right off the bat - I always prefer that the database proactively protect itself against bad data being entered whenever possible, as opposed to leaving it up to the user or the application.

Querying JSON Data

Let's move on to the next task we wanted to perform: finding all out of area providers. PostgreSQL provides us with built-in operators for accessing values within the JSON document in our contract_terms field. So if we want to get the out of area and out of network indicators for providers who are out of area, we can run a query as simple as this:

SELECT id, contract_terms->'out_of_area' AS out_of_area,
    contract_terms->'out_of_network' AS out_of_network
FROM provider_contract
WHERE contract_terms->'out_of_area' = 'true';

Again, just as in SQL Server we're going to get a single record and the out_of_area value will be true while the out_of_network value will be null since that key isn't available in the applicable JSON document.

One thing to know about working with these operators is that the -> operator will return the JSON object whereas ->> will return the text value. This becomes important because as you start working with these queries, you may find that you need to "chain" together operators to get to the specific object that you need.

The next thing we demonstrated in SQL Server was that if I wanted to filter on some value that was contained within an array of objects in my JSON document, I was going to have to know which index specifically to go after in that array. However, PostgreSQL gives us functions and operators that really juice up the power of our JSON queries. In this case, we can use the jsonb_array_elements function combined with the operators for selecting and filtering on text values to return the Office Visits 1 rate for each provider that has that rate defined in their contract. My query then looks like this:

SELECT p.id AS provider_id,
    p.provider_name,
    pc.id AS contract_id,
    c.svcs->>'flat_rate' AS office_visit_rate
FROM provider_contract pc
INNER JOIN LATERAL (
        SELECT svc_rates.svcs
        FROM jsonb_array_elements(contract_terms->'service_rates') AS svc_rates(svcs)
        WHERE svc_rates.svcs->>'service_description' = 'Office Visits 1'
    ) c ON true
INNER JOIN provider p ON p.id = pc.provider_id;

Here, I'm starting by querying my provider contract table. I then do a lateral join to a subquery. This subquery is using the jsonb_array_elements function to "flatten" out the service_rates array and then filter it on service_description = Office Visits 1. In my select clause, I get the flat_rate text value from the svcs field. Running this query, I'm going to get my provider ID, provider name, contract ID, and the office visit rate of "45.00". So while there are some additional hoops to jump through, it is possible to go into my query knowing only a value that I need to filter on ("Office Visits 1") and come out with an answer, which wasn't possible in SQL Server. I'm going to give PostgreSQL another point for power and flexibility here. I'll also note that this is not the only way to accomplish this - PostgreSQL provides a contains operator - @> - that can be used to filter as well and might be preferable.

Returning JSON Documents

Next up, we want to return a JSON document that includes provider and contract details. You can accomplish this in one of at least two ways with PostgreSQL, both of which use the row_to_json function provided by the database. First, we can simple select the row_to_json function and pass a list of the rows and optionally the pretty_bool to indicate if we want it formatted with line breaks after the fields. That query looks like this:

SELECT row_to_json((p.id, p.provider_name, p.npi, pc.id, pc.contract_terms, pc.contract_eff_date, pc.contract_term_date), true)
FROM provider p
INNER JOIN provider_contract pc ON pc.provider_id = p.id;

That query is going to return a JSON document, but it's going to name the fields "f1", "f2", and so on - it's not going to carry through the original column names and you can't provide aliases in the function itself.

{"f1":1,
 "f2":"Princeton Plainsboro Teaching Hospital",
 "f3":"0123456789",
 "f4":3,
 "f5":{"out_of_area": true, "service_rates": [{"flat_rate": "45.00", "service_description": "Office Visits 1", "service_code_range_end": "99221", "service_code_range_start": "99211"}, {"flat_rate": "56.00", "service_description": "Office Visits 2", "service_code_range_end": "99251", "service_code_range_start": "99222"}, {"billed_percentage": "0.7401", "service_description": "Anesthesia", "service_code_range_end": "00952", "service_code_range_start": "00100"}]},
 "f6":"2019-01-01",
 "f7":"2020-12-31"}

So if you want the fields to be named in a certain way, you'll need to use a CTE query and explicitly name the rows.

WITH data(id, provider_name, npi, contract_id, contract_terms, contract_eff_date, contract_term_date) AS (
    SELECT p.id, p.provider_name, p.npi,
           pc.id AS contract_id, pc.contract_terms,
           pc.contract_eff_date, pc.contract_term_date
    FROM provider p
    INNER JOIN provider_contract pc on p.id = pc.provider_id
)
SELECT row_to_json(data, true) FROM data;

In this case, the column names will come through in the JSON document as we want them to.

{"id":1,
 "provider_name":"Princeton Plainsboro Teaching Hospital",
 "npi":"0123456789",
 "contract_id":3,
 "contract_terms":{"out_of_area": true, "service_rates": [{"flat_rate": "45.00", "service_description": "Office Visits 1", "service_code_range_end": "99221", "service_code_range_start": "99211"}, {"flat_rate": "56.00", "service_description": "Office Visits 2", "service_code_range_end": "99251", "service_code_range_start": "99222"}, {"billed_percentage": "0.7401", "service_description": "Anesthesia", "service_code_range_end": "00952", "service_code_range_start": "00100"}]},
 "contract_eff_date":"2019-01-01",
 "contract_term_date":"2020-12-31"}

For this one, I think that both Microsoft and PostgreSQL provide fairly straightforward and easily accessible ways of getting pretty much the same result, so I'm going to leave this as a toss-up for the time being, although if pressed I might say that Microsoft's way is slightly preferable since I'm simply writing my query the way I normally would plus a single line at the end. It's a bit more intuitive to me.

Final Challenge: Query for Rates with a Given Service Code

The last thing we did in the previous segment was build a query that showed how we could take a service code and find out what the rate was for that service code (whether it was a flat rate or a percentage of the billed rate). In the Microsoft world, we had to declare a variable and select our JSON document into that variable. We then had to do a SELECT query where we cross-applied the contents of the JSON document and then applied our WHERE clause to do the filtering.

Again, I'm going to give this point to PostgreSQL because we don't have to jump through any of those hoops. In fact, it was easy enough to grab the query above where we were looking for the Office Visits 1 flat rate and in a matter of seconds, it was tweaked and ready to take on this challenge:

SELECT p.provider_name,
    p.npi,
    pc.id AS contract_id,
    pc.contract_eff_date,
    pc.contract_term_date,
    c.svcs->>'service_description' AS svc_description,
    c.svcs->>'service_code_range_start' AS svc_range_start,
    c.svcs->>'service_code_range_end' AS svc_range_end,
    c.svcs->>'flat_rate' AS flat_rate,
    c.svcs->>'billed_percentage' AS billed_percentage
FROM provider_contract pc
INNER JOIN LATERAL (
        SELECT svc_rates.svcs
        FROM jsonb_array_elements(contract_terms->'service_rates') AS svc_rates(svcs)
        WHERE svc_rates.svcs->>'service_code_range_start' <= '99215'
            AND svc_rates.svcs->>'service_code_range_end' >= '99215'
    ) c ON true
INNER JOIN provider p ON p.id = pc.provider_id
WHERE pc.id = 1;

Here, we're simply changing the filtering criteria on our LATERAL subquery and then selecting a few more columns. No variables to declare or populate and no cross apply query to write. This one felt far easier and less frustrating and I wouldn't necessarily have to wrap this in a stored procedure to make it work. If I needed to tweak this even further, I feel a lot better about getting the results quickly and painlessly than I would the Microsoft query.

So the final score at this point is PostgreSQL = 3, Microsoft = 0, Tie = 1. (I didn't start out with the intention of making this a competition but things evolve...!)

We haven't even scratched the surface of the PostgreSQL JSON implementation. There's a ton of information in the documentation and PostgreSQL has introduced JSON Path querying so there's a lot more power to be explored there as well. (I've been running all these queries on 11.6, by the way. PostgreSQL introduced JSON functionality back around 9.3, I believe.)

Given the added functionality that's available in PgSQL, I will say that I think the learning curve is a little steeper. That could be more true for me personally because I've only in the last several months started to really use PgSQL more frequently as I came from a largely Microsoft background and have been using SQL Server since SQL Server 2000 was fairly new. But once you start to figure out some of those initial mechanisms for interacting with, flattening out, and filtering your JSON documents, it becomes really enjoyable to work with them in PostgreSQL.

Next up, we'll cover MySQL's JSON functionality and see how it stands up to the current front-runner. Until then, happy coding!

Top comments (0)