DEV Community

Haokang Den
Haokang Den

Posted on

Thoughts about primary key

Many of us who manages databases needs to create identities for their table. As the business grows the table schema evolves with it, but the identities often tend to stays longer.

As someone who interacts with data/database on a daily basis, I spend a lot of time thinking about how to make them better.

Let's discuss one of the pitfalls by creating two tables for an growing hypothetical e-commerce startup: products and customers. It's often tempting to model the identities using the built-in mechanisms, for example:

-- MySQL --
CREATE TABLE products (
  id INT AUTO_INCREMENT NOT NULL PRIMARY KEY, -- 🌶
  name VARCHAR(50),
  price INT
);

CREATE TABLE customers (
  id INT AUTO_INCREMENT NOT NULL PRIMARY KEY, -- 🌶
  name VARCHAR(50)
);

I'm skipping over a few details, but I hope you will get the gist. Each products and customer has an numerical ID and one or more associated columns. There are two problems in the above code, which I’ve marked with a hot pepper 🌶.

Let's add more detail to our hypothetical startup. Let's say that we used to create new products and customers by an internal admin system, perhaps a simple web form.

Original

It's very likely that the database ids are used for internal communications. After all, names just aren't precise enough. We might even exposed them in the API endpoints, for example GET /products/5487 🌶.

As the business thrives, our startup needs to hire more people and start buying advanced software to manage it's growing operation. For example:

  • Sales department might want to use Salesforce as it's primary CRM.
  • Financial department(s) might want to use some kind of inventory management system.

So our operation changed significantly, and we must integrate with external systems each with it's own database and IDs.

Enterpriseimplementations Architecture

It's quite challenging to keep different databases in sync, especially when there is no common identity key.

  • Using multiple keys simultaneously is quite confusing for humans.
  • Using foreign system IDs as primary key is not a option since systems might be replaced once in a while.

If we got the chance to really think about the requirement of a common identity key, perhaps in a cozy hammock, what makes an ID good?

Let's start with the CREATE TABLE statement. Regardless of specific database implementations AUTO_INCREMENT NOT NULL PRIMARY KEY usually indicates that the column is:

  • Unique
  • Not null
  • Indexed
  • Sequential
  • Depends on internal states of the database 🌶

The last part is not very obvious, but it have caused scaling problems in Twitter. In our case, we might want to generate our own primary key using one of the alternatives algorithms.

Several alternatives have been proposed, including GUID/UUID and their semi-sequential counterpart Squuids/CUID. Our initial table design can be modified by removing the AUTO_INCREMENT statement. Using on of the algorithms above, we can easily generate the primary key in the API server or one of the external system.

-- MySQL --
CREATE TABLE products (
  id INT NOT NULL PRIMARY KEY, -- or VARCHAR(36), depends on your implementation
  urn VARCHAR(50),
  name VARCHAR(50),
  price INT,
  UNIQUE (urn)
);

If exposing primary keys might raise some security concern. We can add a second unique resource name, perhaps in the URN format just like Amazon did for AWS.

References:

Top comments (0)