Scott Houseman

Posted on Aug 14, 2023 • Updated on Oct 24, 2023

A Better Way To Store Record Status In A Relational Database

#python #postgres #database #architecture

Definition

A status field is a relational database column or relationship (foreign key) in a table that indicates the current state or condition of each record in that table.

The value of this field is expected to change throughout the lifetime of any given table record.

Use Case

For this use case example, within a contrived e-commerce system, a catalogue products table might have status values such as:


`in stock`	In warehouse stock. May be purchased and shipped.
`on order`	On back order. May be purchased and shipped.
`unavailable`	Unavailable for purchase. May be viewed in customer order history.
`deleted`	Should not be viewable to customers.

Various database structures may be used to store this status. We'll explore these below.

A naive design: Use a varchar field

The most naive database design would simply store this status field as a varchar type:

/* Postgres */
DROP    TABLE IF EXISTS products CASCADE;
CREATE  TABLE           products (
    product_id  SERIAL        PRIMARY KEY,
    title       VARCHAR(200)  NOT NULL,
    sku         VARCHAR(50)   NOT NULL UNIQUE,
    status      VARCHAR(20)   NOT NULL
);
DROP    INDEX IF EXISTS idx_product_status;
CREATE  INDEX           idx_product_status
ON                      products(status);

Adding some sample data

INSERT  INTO products
        (title,                           sku,              status)
VALUES  ('EcoBoost Portable Charger',     'SKU-ECBW-1234',  'in stock'),
        ('AquaPure Water Filter',         'SKU-AQPF-5678',  'on order'),
        ('SolarGlow Garden Lights',       'SKU-SGLA-9101',  'unavailable'),
        ('FitFlex Yoga Mat',              'SKU-FFYM-1121',  'deleted'),
        ('BreezeAir Conditioner',         'SKU-BACA-3141',  'in stock'),
        ('CrispSound Bluetooth Speaker',  'SKU-CSBS-5161',  'on order'),
        ('SmoothBlend Juicer',            'SKU-SBJG-7181',  'unavailable'),
        ('QuickCook Microwave Oven',      'SKU-QCMO-9201',  'deleted'),
        ('UltraView Binoculars',          'SKU-UVBK-1221',  'in stock'),
        ('ProFit Running Shoes',          'SKU-PFRS-3241',  'in stock');

will give us a data set that looks like this:

SELECT  product_id,
        title,
        sku,
        status
FROM    products;
 product_id |            title             |      sku      |   status
------------+------------------------------+---------------+-------------
          1 | EcoBoost Portable Charger    | SKU-ECBW-1234 | in stock
          2 | AquaPure Water Filter        | SKU-AQPF-5678 | on order
          3 | SolarGlow Garden Lights      | SKU-SGLA-9101 | unavailable
          4 | FitFlex Yoga Mat             | SKU-FFYM-1121 | deleted
          5 | BreezeAir Conditioner        | SKU-BACA-3141 | in stock
          6 | CrispSound Bluetooth Speaker | SKU-CSBS-5161 | on order
          7 | SmoothBlend Juicer           | SKU-SBJG-7181 | unavailable
          8 | QuickCook Microwave Oven     | SKU-QCMO-9201 | deleted
          9 | UltraView Binoculars         | SKU-UVBK-1221 | in stock
         10 | ProFit Running Shoes         | SKU-PFRS-3241 | in stock
(10 rows)

The problem here is that the status varchar type is unconstrained; the value could be any string and this could lead to data inconsistencies.

An improved design: use an enum field

This improved design makes use of an ENUM type to define a restricted set of valid status values:

/* Postgres */
DROP    TABLE  IF EXISTS   product_status CASCADE;
DROP    TYPE   IF EXISTS   product_status CASCADE;
CREATE  TYPE               product_status AS ENUM (
    'in stock',
    'on order',
    'unavailable',
    'deleted'
);

DROP    TABLE IF EXISTS products  CASCADE;
CREATE  TABLE           products  (
    product_id  SERIAL          PRIMARY KEY,
    title       VARCHAR(200)    NOT NULL,
    sku         VARCHAR(50)     NOT NULL UNIQUE,
    status      product_status  NOT NULL          -- **Note** type here
);
DROP    INDEX IF EXISTS idx_product_status;
CREATE  INDEX           idx_product_status
ON                      products(status);

This limits the possible value of status to one of the defined string values; 'in stock', 'on order', 'unavailable' or 'deleted'.

There are several benefits of using an enum type over a varchar:

Data Integrity: ensure that the value is always within a specific set of values. This is not possible with varchar (unless you add a CHECK constraint).
Performance: enum values are stored as integers, making comparing and filtering data more efficient..
Indexing: enum types can be more efficient than indexing varchar columns, which can lead to faster search and retrieval of data.
Storage: enum values are stored as integers, which can be more space-efficient than varchar.

However, adding new values to an enum type requires database schema changes, which may be a heavy operation if your database is large.

Metadata

Metadata is data that provides information about other data.

These enum status values have the following metadata with regards to the relevant products table record:

Value	In stock	Buyable	Active
`in stock`	Yes	Yes	Yes
`on order`	No	Yes	Yes
`unavailable`	No	No	Yes
`deleted`	No	No	No

This metadata now need to be implemented through some business logic in code.

Something like:

# status.py
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ProductStatus:
    """A data model for product status"""

    is_in_stock: bool
    is_buyable: bool
    is_active: bool

    @classmethod
    def create(cls, status: str) -> ProductStatus:
        """Create a `ProductStatus` instance derived from the given string"""

        match status.lower():
            case "in stock":
                return ProductStatus(
                    is_in_stock=True,
                    is_buyable=True,
                    is_active=True,
                )
            case "on order":
                return ProductStatus(
                    is_in_stock=False,
                    is_buyable=True,
                    is_active=True,
                )
            case "unavailable":
                return ProductStatus(
                    is_in_stock=False,
                    is_buyable=False,
                    is_active=True,
                )
            case "deleted":
                return ProductStatus(
                    is_in_stock=False,
                    is_buyable=False,
                    is_active=False,
                )
            case _:
                raise ValueError(f"Unable to determine product status '{status}'")

This works well enough, but it does split the domain between the database and the code base.
It would be better if we could represent the state within the database structure itself.

The next refactor: Add state columns

In order to store these state values better in this database, we could add a few flag columns to the products table:

/* Postgres */
DROP    TABLE IF EXISTS products  CASCADE;
CREATE  TABLE           products  (
    product_id          SERIAL        PRIMARY KEY,
    title               VARCHAR(200)  NOT NULL,
    sku                 VARCHAR(50)   NOT NULL UNIQUE,
    is_in_stock         BOOLEAN       NOT NULL,
    is_buyable          BOOLEAN       NOT NULL,
    is_active           BOOLEAN       NOT NULL
);

This is an improvement, as we now have status attributes for each products table record.

But, some limitations remain.
We cannot add any metadata to the various status flags. We also would need to add further columns if we ever needed a status that requires additional state flags. This would necessitate an ALTER operation on our large products table.

We have also lost the ability to succinctly allocate a single status value to a product record.

Good database design: Apply normalisation

The best design would be to abstract product status from the products table.

To achieve this, we normalise the database structure by adding a foreign key to a product_status table.

Using a foreign key for representing record status in a database table, rather than an enum, has several advantages:

Data Integrity: Foreign keys enforce referential integrity, preventing invalid or inconsistent status values.
Flexibility and Extensibility: Easily add or modify status values without altering the table structure.
Normalization: Foreign keys follow the principles of database normalization by reducing data redundancy.
Consistency: A foreign key ensures that the status values are consistent throughout the database, as they are stored in a single table.
Indexing and Performance: Foreign key relationships are optimised, making lookups and joins between related tables more efficient.

/* Postgres */
DROP    TYPE  IF EXISTS product_status CASCADE;         -- Drop the type that we created earlier
DROP    TABLE IF EXISTS product_status CASCADE;
CREATE  TABLE           product_status (
    product_status_id   SERIAL        PRIMARY KEY,
    product_status_usid VARCHAR(50)   NOT NULL UNIQUE,  -- unique string identifier
    description         VARCHAR(200)  NULL,
    is_in_stock         BOOLEAN       NOT NULL,
    is_buyable          BOOLEAN       NOT NULL,
    is_active           BOOLEAN       NOT NULL
);

DROP    TABLE IF EXISTS products  CASCADE;
CREATE  TABLE           products  (
    product_id          SERIAL        PRIMARY KEY,
    title               VARCHAR(200)  NOT NULL,
    sku                 VARCHAR(50)   NOT NULL UNIQUE,
    product_status_id   INTEGER       NOT NULL,
                        FOREIGN KEY (product_status_id)
                          REFERENCES  product_status (product_status_id)
);

Next, let's create records in product_status, for the various status values, and associated state flags.

/* Postgres */
INSERT  INTO product_status
        (product_status_usid, description,                is_in_stock,  is_buyable, is_active)
VALUES  ('in stock',          'Product is in stock',      true,         true,       true),
        ('on order',          'Product is on back order', false,        true,       true),
        ('unavailable',       'Product is unavailable',   false,        false,      true),
        ('deleted',           'Product is deleted',       false,        false,      false);

Which gives us:

SELECT  product_status_id           AS id,
        product_status_usid         AS usid,
        description,
        is_in_stock                 AS in_stock,
        is_buyable                  AS buyable,
        is_in_stock AND is_buyable  AS shippable, -- derived attribute
        is_active                   AS active
FROM    product_status;

 id |    usid     |            description             | in_stock | buyable | shippable | active
----+-------------+------------------------------------+----------+---------+-----------+--------
  1 | in stock    | Product is in stock                | t        | t       | t         | t
  2 | on order    | Product is on back order           | f        | t       | f         | t
  3 | unavailable | Product is unavailable             | f        | f       | f         | t
  4 | deleted     | Product is deleted                 | f        | f       | f         | f
(4 rows)

And re-add our sample product data, this time using a foreign key id for the status:

INSERT  INTO products
        (title,                           sku,              product_status_id)
VALUES  ('EcoBoost Portable Charger',     'SKU-ECBW-1234',  1),
        ('AquaPure Water Filter',         'SKU-AQPF-5678',  2),
        ('SolarGlow Garden Lights',       'SKU-SGLA-9101',  3),
        ('FitFlex Yoga Mat',              'SKU-FFYM-1121',  4),
        ('BreezeAir Conditioner',         'SKU-BACA-3141',  1),
        ('CrispSound Bluetooth Speaker',  'SKU-CSBS-5161',  2),
        ('SmoothBlend Juicer',            'SKU-SBJG-7181',  3),
        ('QuickCook Microwave Oven',      'SKU-QCMO-9201',  4),
        ('UltraView Binoculars',          'SKU-UVBK-1221',  1),
        ('ProFit Running Shoes',          'SKU-PFRS-3241',  1);

We can now use a JOIN to return fields from both tables:

SELECT  p1.title,
        p1.sku,
        p2.description          AS status
FROM    products                AS p1
        JOIN  product_status    AS p2
          ON  p1.product_status_id = p2.product_status_id;
            title             |      sku      |          status
------------------------------+---------------+--------------------------
 EcoBoost Portable Charger    | SKU-ECBW-1234 | Product is in stock
 AquaPure Water Filter        | SKU-AQPF-5678 | Product is on back order
 SolarGlow Garden Lights      | SKU-SGLA-9101 | Product is unavailable
 FitFlex Yoga Mat             | SKU-FFYM-1121 | Product is deleted
 BreezeAir Conditioner        | SKU-BACA-3141 | Product is in stock
 CrispSound Bluetooth Speaker | SKU-CSBS-5161 | Product is on back order
 SmoothBlend Juicer           | SKU-SBJG-7181 | Product is unavailable
 QuickCook Microwave Oven     | SKU-QCMO-9201 | Product is deleted
 UltraView Binoculars         | SKU-UVBK-1221 | Product is in stock
 ProFit Running Shoes         | SKU-PFRS-3241 | Product is in stock
(10 rows)

The value of a usid

The unique string identifier (usid) product_status_usid value is useful for reducing cognitive load when constructing queries.
For example:

SELECT  p1.sku                  AS sku
FROM    products                AS p1
        JOIN  product_status    AS p2
          ON  p1.product_status_id = p2.product_status_id
WHERE   p2.product_status_usid = 'in stock';              -- All products in stock

is easier to understand at a glance, than

SELECT  p1.sku                  AS sku
/* ... snipped ... */
WHERE   p1.product_status_id = 1;                         -- What does 1 denote?

Similarly, when referring to these foreign key records in code, we do not want to use a primary key integer value as a constant (as these are strictly-speaking not constant) identifier. Rather, we would want to use the usid for this.

Extensibility

Adding a new status

Should we need to add a new status (for example pre-order) to our system, it is as simple as adding a new record to the product_status table.

/* Postgres */
INSERT INTO product_status
            (product_status_usid, description, is_in_stock, is_buyable, is_active)
VALUES      ('pre-order', 'Product is available for pre-order', false, true, true);

Adding a status log

Another benefit that this abstraction offers us, is the ability to extend our architecture fairly easily.
For example, to add a table to log status changes.

/* Postgres */
DROP    TABLE IF EXISTS product_status_log CASCADE;
CREATE  TABLE           product_status_log (
    product_id          INTEGER   NOT NULL,
    product_status_id   INTEGER   NOT NULL,
    logged_at           TIMESTAMP WITH TIME ZONE  DEFAULT now(),
                        FOREIGN KEY   (product_id)
                          REFERENCES  products (product_id),
                        FOREIGN KEY   (product_status_id)
                          REFERENCES  product_status (product_status_id)
);
CREATE INDEX  idx_product_status
ON            product_status_log (product_id, product_status_id);

And we have a nice log

SELECT    p3.product_status_usid    AS status,
          p2.logged_at              AS log_timestamp
FROM      products                  AS p1
          JOIN  product_status_log  AS p2
            ON  p1.product_id=p2.product_id
          JOIN  product_status      AS p3
            ON  p2.product_status_id = p3.product_status_id
WHERE     p1.sku = 'SKU-SGL-9101'
ORDER BY  p2.logged_at ASC;
   status    |         log_timestamp
-------------+-------------------------------
 in stock    | 2023-08-07 22:46:21.388738+02
 on order    | 2023-08-07 22:46:57.509255+02
 in stock    | 2023-08-07 22:47:01.686259+02
 on order    | 2023-08-07 22:47:19.070394+02
 in stock    | 2023-08-07 22:47:26.662571+02
 unavailable | 2023-08-07 22:47:31.837687+02
 deleted     | 2023-08-07 22:47:37.574532+02
(7 rows)

Cheers!

Change log

Date	Description
2023-10-18	Add "Definition" section
2023-10-18	Add "Use Case" section
2023-10-18	Qualify "adding new values to an `enum` type ..."
2023-10-19	Apply SQL style
2023-10-20	Add images

Top comments (6)

Ramakrishnan83 • Aug 28 '23

Can you help me understand this statement
"However, adding new values to an enum type requires an ALTER TYPE statement, which can be a heavy operation if your database is large."
Adding new values to new enum types impacts the product table?

Scott Houseman • Sep 7 '23

My apologies; you are quite correct @ramakrishnan83

That statement may hold true for MySQL, which requires an ALTER TABLE to add a value to an ENUM. I do not believe it is true for Postgres, which has abstracted the enum into a TYPE that may be altered.