DEV Community

loading...
Cover image for Database Naming Standards

Database Naming Standards

ovid profile image Ovid ・6 min read

Introduction

Database naming conventions, despite SQL having been around for decades, are ad hoc. When moving from project to project, you often have to learn their particular quirks about database design and naming. This is because most developers don't understand databases. In fact, with my years of hiring developers, only a handful of times have I met developers who can properly normalize a database, even amongst those with strong SQL skills.

The database, for many developers, is an afterthought. It's just some place to save your data until the important (*cough*) work is done.

But this article isn't about database normalization. If you want to learn more, here's a short talk I gave which explains the basics.

Instead, when you have a working database, the question we want to know is "what standards can we apply which make it easier to use that database?" If these standards were to be widely adopted, databases would be easier to use because you wouldn't have to learn and remember a new set of standards every time you work with a new database.

Tables Names

CamelCase vs underscore_name

Let's get this out of the way quickly. I routinely see database examples online where we see table names like CustomerOrders or customer_orders. Which should you use? You probably want to use whatever standard is already in place, but if you're creating a new database, I recommend using_undercores for accessibility. The words "under value" have a different meaning from the word "undervalue", but the former, with underscores, is always under_value, while the latter is undervalue. With CamelCase, it's Undervalue versus UnderValue which, since SQL is case-insensitive, are identical. Thus, underscores help reduce ambiguity.

Underscores also improve accessibility. If you have vision problems and are constantly playing around with fonts and sizes to distinguish words, the underscores are much easier to read. And CamelCase is anecdotally harder to read for people for whom English isn't their first language.

That being said, this is a personal preference and not a strong recommendation. Many people love their CamelCase and this quickly becomes a "tabs versus spaces" debate (e.g., a waste of time).

Plural or Singular Tables?

There's long been a huge debate amongst experts in database theory about whether or not database tables should be singular (customer) or plural (customers). Without going into the theory, let me cut the Gordian Knot with a healthy dose of pragmatism: plural table names are less likely to conflict with reserved keywords.

Do you have users? SQL has a user reserved word. Do you want a table of constraints? constraint is a reserved word. Is audit a reserved word but you want an audit table? By simply using the plural form of nouns, most reserved words won't cause you grief when you're writing SQL. I've even tripped up PostgreSQL, which has an excellent SQL parser, when I've had a user table and later, an ORM automatically aliased that to user (yes, the same name) and PostgreSQL got confused.

Just use plural names and you're far less likely to have a collision.

Column Names

Don't name the primary key column id

This is a sin I've been guilty of for years. When working with a client in Paris, I had a DBA complain when I named my id columns id. I, a mere developer, thought our DBA—who was a very good DBA—was just being pedantic. After all, the customers.id column is unambiguous, but customers.customer_id is repeating information.

And later I had to debug the following from the Tau Station MMORPG:

SELECT thread.*
  FROM email thread
  JOIN email selected      ON selected.id    = thread.id
  JOIN character recipient ON recipient.id   = thread.recipient_id
  JOIN station_area sa     ON sa.id          = recipient.id
  JOIN station st          ON st.id          = sa.id
  JOIN star origin         ON origin.id      = thread.id
  JOIN star destination    ON destination.id = st.id
LEFT JOIN route
       ON ( route.from_id = origin.id
            AND 
            route.to_id = destination.id )
 WHERE selected.id                = ?
   AND ( thread.sender_id         = ? 
         OR ( thread.recipient_id = ?
              AND ( origin.id = destination.id
                    OR ( route.distance IS NOT NULL
                         AND
                         now() >= thread.datesent
                         + ( route.distance * interval '30 seconds' )
        ))))
ORDER BY datesent ASC, thread.parent_id ASC
Enter fullscreen mode Exit fullscreen mode

(As an aside, yes, I really am that careful about my SQL formatting)

Do you see the problem? If the SQL had used full id names, such as email_id, star_id, and station_id, the bugs would have stood out like a sore thumb while I was typing out this SQL, not later when I was trying to figure out both what I did wrong and why I don't drink as much as I should.

And by request of a few people who couldn't see the errors, here's the SQL after it's corrected. It's very clear that star_id and email_id, or station_id and station_area_id are probably not valid comparisons. If SQL had a decent type system, this SQL would not even have compiled.

SELECT thread.*
  FROM email thread
  JOIN email selected      ON selected.email_id      = thread.email_id
  JOIN character recipient ON recipient.character_id = thread.recipient_id
  -- station_area_id = character_id is probably wrong
  JOIN station_area sa     ON sa.station_area_id     = recipient.character_id
  -- station_id = station_area_id is probably wrong
  JOIN station st          ON st.station_id          = sa.station_area_id
  -- star_id = email_id is probably wrong
  JOIN star origin         ON origin.star_id         = thread.email_id
  JOIN star destination    ON destination.star_id    = st.star_id
LEFT JOIN route
       ON ( route.from_id = origin.star_id
            AND 
            route.to_id = destination.star_id )
 WHERE selected.email_id          = ?
   AND ( thread.sender_id         = ? 
         OR ( thread.recipient_id = ?
              AND ( origin.star_id = destination.star_id
                    OR ( route.distance IS NOT NULL
                         AND
                         now() >= thread.datesent
                         + ( route.distance * interval '30 seconds' )
        ))))
ORDER BY datesent ASC, thread.parent_id ASC
Enter fullscreen mode Exit fullscreen mode

Do yourself a favor and use full names for IDs. You can thank me later. Your DBAs will thank you now.

Avoid Ambiguity

As much as possible, name columns very descriptively. For example, a temperature column doesn't make sense for this:

SELECT name, 'too cold'
  FROM areas
 WHERE temperature < 32;  
Enter fullscreen mode Exit fullscreen mode

I live in France and for anyone here, 32 would be "too hot". Instead, name that column fahrenheit.

SELECT name, 'too cold'
  FROM areas
 WHERE fahrenheit < 32;  
Enter fullscreen mode Exit fullscreen mode

Now it's unambiguous.

Also, when you have foreign key constraints, you should name the columns on each side of the constraint identically, if possible. For example, consider this perfectly reasonable, sane, SQL.

SELECT *
  FROM some_table       s
  JOIN some_other_table o
    ON o.owner = s.person_id;
Enter fullscreen mode Exit fullscreen mode

That looks fine. There's really nothing wrong with it. But when you consult the table definition, you discover that some_other_table.owner has a foreign key constraint against companies.company_id. That SQL is, in fact, wrong. Had you used identical names:

SELECT *
  FROM some_table       s
  JOIN some_other_table o
    ON o.company_id = s.person_id;
Enter fullscreen mode Exit fullscreen mode

Again, we've avoided ambiguity. It's immediately clear that we have a bug and you can see it on a single line of code and don't have to go consult the table definition.

However, it should be noted that this isn't always possible. If you have a table with a source warehouse and a destination warehouse, you might want a source_id and a destination_id to compare with your warehouse_id. Naming them source_warehouse_id and destination_warehouse_id will make this easier to follow.

It should also be noted that in the example above, owner is more descriptive of the intent than company_id. If you feel this is likely to cause confusion, you can name the column owning_company_id. That can still embed the meaning of the column in the name while giving you a strong hint as to its intent.

Summary

Naming standards are important because it helps keep your code consistent and predictable. Anything violating that predictability tends to be suspect. However, well-chosen naming standards can make it easier to understand your code and, if done well, can often make it possible to see a error in a single line of code.

  • Use underscore_names instead of CamelCase
  • Table names should be plural
  • Spell out id fields (item_id instead of id)
  • Don't use ambiguous column names
  • When possible, name foreign key columns the same as the columns they refer to

While not perfect, the above database naming conventions will make your database world a better place.

Discussion (10)

Collapse
imperious2000 profile image
Imperious2000

I really like this article, I've used pretty much the same conventions, mostly because this is what it was taught to me, and would say some things are common sense, that common sense that sometimes we miss when working on a project(s). (I'm guilty of the id usage instead of the descriptive one but changed it on a later project, and it definitely helps)

At some point, I was feeling that naming columns shouldn't be wordy but realized it's better to be descriptive and simple if possible than trying to oversimplify things unnecessarily.

Collapse
ovid profile image
Ovid Author

In programming, we often suggest that long variable names be used for variables that have a wider scope. It's important that those be descriptive. It's even more important, in my opinion, for databases, because many people will be staring at those names over and over and over again. They tend to persist even longer than that library I wrote last week. Since databases are often the foundation of apps, I try hard to put more effort into making them as understandable as possible.

Collapse
detzam profile image
webstuff

Srry to be a pain, but the part with the primary key nit being id but tablename_id ... Well in school and a lot of books the teory sais its ok id,
Maybe what you presented was a local situation, but i don't see the ambiguity when you write the query, cuz you use the alias.id ... And tablename_id normally is a foreign key in other tables.
But as i sayed i think its a matter of situations.
I for one am with team just id as a primary key and tablebame_id as a foreign key when used in other tables.
Mytwocents

Collapse
ovid profile image
Ovid Author

When I first published this article a long time ago, people complained that they needed to see the second version of the id query because they didn't see what was wrong with the first version. After all, origin.id = thread.id looks fine.

In the second version, because we use explicit names for the ids, you can immediately see that origin.star_id = thread.email_id is wrong. That's the point of this: if your naming conventions let you immediately see that a mistake is made, that's a huge win.

Collapse
aarone4 profile image
Aaron Reese

I've been a SQL developer for 30 years. And this is SO wrong.... Table.id is the PK (or at least the unique constraint) on this table. Table.other_id is the Foreign key to the related table. Origin.id = email.id immediately stands out as being incorrect because at least one side should reference the FK field unless one of them is a straight 1:1 extension table.

Table names should be singular as each row represents a single occurrence of the modelled entity, for the same reason your class names are singular. If you are going to conflict with reserved words, pick better table names: AppUser, BillingConstraint, ShippingTable.
I personally don't like underscored table names it makes code scanning harder because fields do tend to be underscored as are system configurations.
Right at the top you said SQL databases are case insensitive. This depends on the collation settings for the instance, database and table. Case sensitive collations are a nightmare!

Thread Thread
ovid profile image
Ovid Author

While I've laid out my reasoning and you should feel free to disagree. I point out my id naming reasons, you don't like them, that's fine.

As for table names being plural, as mentioned, many experts disagree strongly on this. Joe Celko, world-famous database expert and one of the authors of the SQL-89 and SQL-92 standards, argues in SQL for Smarties, that table names should be plural because they represent collections, not individual entities.

As for case-insensitivity, section 5.1 of the SQL-92 standard states (where the angle-bracketed words refer to a previously defined grammar):

There is a one-to-one correspondence between the symbols contained in <simple Latin upper case letter> and the symbols contained in <simple Latin lower case letter> such that, for all i, the symbol defined as the i-th alternative for <simple Latin upper case letter> corresponds to the symbol defined as the i-th alternative for <simple Latin lower case letter>.

SQL the language (not the data it manipulates) has been case-insensitive for decades.

Thread Thread
aarone4 profile image
Aaron Reese

I'll grant that there is no one perfect answer but I felt it was important to put across a different set of arguments. The most important thing is to be consistent with whatever naming convention you decide to use. The ANSI standard SQL-92 - The clue is in the title. That standard will be 30 years old next year, the world has moved on. Each vendor has proprietary extensions to the standard, my preferred flavour is MSSQL which allows you to define the collation rules for your database which includes whether the database and the data it contains is case sensitive, whether it uses the standard ASCIII character set or allows for 'foreign' characters such as Cyrillic, turkish or accented characters, whether Upper and lower case characters should be treated differently and whether NULL appears at the start or end of the index. This may not be standard, but it is 'real world'.
Please don't quote 'World Renowned' Joe Celko: I have had enough run-ins with him on SQLLServerCentral over the years to know that he is an absolutist and simply cannot accept that data is impure and that applications and their databases are not always (usually) developed by people with Doctorates in data management and who have the full picture of what the end solution will need to look like in 10 years time when the business needs are not well defined and you don't have an infinite amount of time. Joe has some good points and his books are comprehensive but on a personality basis he is way of the mark with understanding how to help people sove the problems they find themselves in. His attitude is - well you shouldn't have gotten there in the first place...

Collapse
detzam profile image
webstuff

Sorry about this but, from this i see that the problem us more of ambiguous aliases than column names.
But as i saud is a matter of opinion and project.

Collapse
thibaultduponchelle profile image
Tib

FYI, first link is dead

Collapse
ovid profile image
Ovid Author

Thank you, Tib. Fixed!

Forem Open with the Forem app