When working with databases especially relational ones we always consider looking to our data as structured data that can fit inside rows and columns and follow the database constraints ... so we do our best to have a great structural design in our databases
but what if we find ourselves in a situation where we do not have structured data ... we try to make it structural but we can not or it is not feasible to do that ... situations when we really need to store and manage non-relational data like JSON inside a relational database!
Fortunately, modern relational databases like Postgres support that ...
before deep diving in use JSON in postgres let's give some introduction
information about what we are trying to do here ...
What is JSON ?
- JSON stands for javascript object notation – the way that JavaScript use to represent its object
- it is an open and wide standard format that consists of key-value pairs
- defined by curly braces hold key-value pairs (object ) each key is followed by
:
(colon), and pairs are separated by,
(comma). - value can be Number, strings, boolean, null, array, or other JSON object
something like this :
{
"tags": [
"tag3",
"tag4"
],
"price": 100,
"authors": [
{
"id": 1,
"name": "author1"
},
{
"id": 2,
"name": "author2"
}
],
"publisher": "publisher1",
"published_date": "2017-04-07"
}
If you want you can learn more about JSON from json_quick_guide
When using JSON in databases ?
As I mention before the idea is “I want to store and manage data as JSON inside a relational database” … but why do we want that ??! And when use JSON in databases??
First of all please keep this common and important advice in your mind
Do not use it because you can … use it because you really need it and know what you are really doing
Here are some examples of when you might really need to use JSON in databases :
- Store non-relational data (like hierarchy or not structured you can not put it in the same shape). Examples of extra data, settings, configurations,...
- Unpredictable arbitrary data structure e.g. user add tags with different values
- Lots of attributes that are rarely used or it is in only some records
- Comes as JSON, as response comes from external data sources
- Data with dynamic values
- Schema flexibility as attributes can be added or removed in the JSON fields without having to change the DB (Database) schema
- Nested objects. If your data set has nested objects (single or multi-level)
Good use cases
Here are some good use cases for using JSON in databases:
- Maintain data that comes from an external service in the same structure and format (as JSON) that it arrived to you as
- Generate JSON response from DB directly to avoid transforming data before returning it via your JSON API because building deeply nested and has multiple arrays data in real-time on every request would be incredibly taxing on the system also when working with micro-services that depend on each other.
- In agile when a requirement is not clear we can start with JSON and then when we have a good idea about the system and DB design commit them to relational tables as needed
Postgresql & json :
Postgresql actually is a relational database but you can easily integrate relational and non-relation data as PostgreSQL care about JSON and supports it in different data types as it has json,jsonb and jsonpath :
- In version 9.2 PostgreSQL support json data type
- In version 9.4 PostgreSQL support jsonb and it provides more functions and operators for manipulating json data.
- In version 12 PostgreSQL support sql/json standard and jsonpath queries
Here we will focus on jsonb only ….
json vs jsonb
from the Postgres docs themselves:
“The json and jsonb data types accept almost identical sets of values as input. The major practical difference is one of efficiency. The json data type stores an exact copy of the input text, which processing functions must reparse on each execution; while jsonb data is stored in a decomposed binary format that makes it slightly slower to input due to added conversion overhead, but significantly faster to process, since no reparsing is needed. jsonb also supports indexing, which can be a significant advantage.”
json is faster at writing, but jsonb is faster at reading.
As a general rule, your app will read more often than it writes.
In most cases, JSONB is what you should be using
JsonB stands for binary json as it is json stored in binary format…
and to make these differences between them clearer take this example about how each type deal with data
SELECT '{"c":0, "a":2,"a":1,"d":3,"b":5}'::json, '{"c":0, "a":2,"a":1,"d":"3","b":5}'::jsonb;
-- json | jsonb
-- ------------------------------------+------------------------------------
-- {"c":0, "a":2,"a":1,"d":3,"b":5} | {"a": 1, "b": 5, "c": 0, "d": "3"}
-- (1 row)
as you can see that json data type saves the same input without parsing it so without order keys , or remove duplicated key ‘a’, but in jsonb we parse the input data so we will reorder keys and remove duplicated key...
this table summarize main difference between them ...
json | jsonb |
---|---|
stores an exact copy of the text input, which must be reparsed again | stores a binary representation that avoids reparsing the data structure |
keep duplicate keys. | remove duplicate keys. |
It doesn’t support indexes | It supports indexing |
will re-parse the stored value every time; that means that key ordering will be same from the input | content is stored in binary format and no key ordering is guaranteed. |
performed a sequential scan, also remember that the JSON content is going to be reparsed for each record, which can be slow for complex structures. | The search is not as optimized as it would be with an expression index on the path you’re querying, but still uses the index. |
Drawbacks of using JSON in database:
As I mentioned in advice about do not use JSON in the database until you know what you are really doing …
as work with these features becomes more tricky over time and rather than improve your system you may destroy it !!
Here are some drawbacks of using jsonb in db (database) :
- Larger storage : jsonb storage results in a larger storage footprint , jsonb storage does not deduplicate the key names in the json, I find some people try to solve this by
use more compact names (unix-style) instead of more descriptive names. For example, if you’re storing millions of instances of a particular key, it would be better storage-wise to name it “pb” instead of “publishername”
.
but I think this is not a good idea as We can forget things very quickly and I think remembering the meaning of this abbreviation “pb” will be difficult in the coming months and can cause a lot of confusion for new developers.
No statistics : postresql does not store column statistics for jsonb columns. When you store your data as json blobs you deny the query planner the ability to make sensible decisions based on table and column statistics
Brake normalization : from a design point of view, it makes the database denormalized .the 1nf is not even satisfied, as values are not atomic with respect to the DBMS.
No constraints : yes no clear constraints and no grantees in type and consistent of data … e.g. if you store FK inside json like this no grantees that these FK are for existing records.
Week indexing : you lose most of the indexing features
Tricky query : it becomes very tricky to make sure the query will work correctly in future like key might not exist and data types can be in different data type (dynamic values ) from a row to another. what if someone store string in key that expected to store date !!
By the way this is only and introduction about the topic, in next toturails of this series we will learn and practice more about jsonb in rails and postgres ... Be around!🔥
Most information in this post is summized from these great references, take a look if you want to read more about these concepts ...
references
https://www.postgresql.org/docs/current/datatype-json.html
http://www.thomascerqueus.fr/json-fields-relational-databases-postgresql/
https://www.youtube.com/watch?v=2dQjfdXxtJw
https://scalegrid.io/blog/using-jsonb-in-postgresql-how-to-effectively-store-index-json-data-in-postgresql/
https://nandovieira.com/using-postgresql-and-jsonb-with-ruby-on-rails
https://youtu.be/p9RItyeKbLQ
Top comments (1)
Great and clear introduction. Keep it up 🔥