Database naming conventions, despite SQL having been around for decades, are ad hoc. When moving from project to project, you often have to learn their particular quirks about database design and naming. This is because most developers don't understand databases. In fact, with my years of hiring developers, only a handful of times have I met developers who can properly normalize a database, even amongst those with strong SQL skills.
The database, for many developers, is an afterthought. It's just some place to save your data until the important (*cough*) work is done.
But this article isn't about database normalization. If you want to learn more, here's a short talk I gave which explains the basics.
Instead, when you have a working database, the question we want to know is "what standards can we apply which make it easier to use that database?" If these standards were to be widely adopted, databases would be easier to use because you wouldn't have to learn and remember a new set of standards every time you work with a new database.
Let's get this out of the way quickly. I routinely see database examples online where we see table names like
customer_orders. Which should you use? You probably want to use whatever standard is already in place, but if you're creating a new database, I recommend
using_undercores for accessibility. The words "under value" have a different meaning from the word "undervalue", but the former, with underscores, is always
under_value, while the latter is
undervalue. With CamelCase, it's
UnderValue which, since SQL is case-insensitive, are identical. Thus, underscores help reduce ambiguity.
Underscores also improve accessibility. If you have vision problems and are constantly playing around with fonts and sizes to distinguish words, the underscores are much easier to read. And CamelCase is anecdotally harder to read for people for whom English isn't their first language.
That being said, this is a personal preference and not a strong recommendation. Many people love their CamelCase and this quickly becomes a "tabs versus spaces" debate (e.g., a waste of time).
There's long been a huge debate amongst experts in database theory about whether or not database tables should be singular (
customer) or plural (
customers). Without going into the theory, let me cut the Gordian Knot with a healthy dose of pragmatism: plural table names are less likely to conflict with reserved keywords.
Do you have users? SQL has a
user reserved word. Do you want a table of constraints?
constraint is a reserved word. Is
audit a reserved word but you want an
audit table? By simply using the plural form of nouns, most reserved words won't cause you grief when you're writing SQL. I've even tripped up PostgreSQL, which has an excellent SQL parser, when I've had a
user table and later, an ORM automatically aliased that to
user (yes, the same name) and PostgreSQL got confused.
Just use plural names and you're far less likely to have a collision.
This is a sin I've been guilty of for years. When working with a client in Paris, I had a DBA complain when I named my id columns
id. I, a mere developer, thought our DBA—who was a very good DBA—was just being pedantic. After all, the
customers.id column is unambiguous, but
customers.customer_id is repeating information.
And later I had to debug the following from the Tau Station MMORPG:
SELECT thread.* FROM email thread JOIN email selected ON selected.id = thread.id JOIN character recipient ON recipient.id = thread.recipient_id JOIN station_area sa ON sa.id = recipient.id JOIN station st ON st.id = sa.id JOIN star origin ON origin.id = thread.id JOIN star destination ON destination.id = st.id LEFT JOIN route ON ( route.from_id = origin.id AND route.to_id = destination.id ) WHERE selected.id = ? AND ( thread.sender_id = ? OR ( thread.recipient_id = ? AND ( origin.id = destination.id OR ( route.distance IS NOT NULL AND now() >= thread.datesent + ( route.distance * interval '30 seconds' ) )))) ORDER BY datesent ASC, thread.parent_id ASC
(As an aside, yes, I really am that careful about my SQL formatting)
Do you see the problem? If the SQL had used full id names, such as
station_id, the bugs would have stood out like a sore thumb while I was typing out this SQL, not later when I was trying to figure out both what I did wrong and why I don't drink as much as I should.
And by request of a few people who couldn't see the errors, here's the SQL after it's corrected. It's very clear that
station_area_id are probably not valid comparisons. If SQL had a decent type system, this SQL would not even have compiled.
SELECT thread.* FROM email thread JOIN email selected ON selected.email_id = thread.email_id JOIN character recipient ON recipient.character_id = thread.recipient_id -- station_area_id = character_id is probably wrong JOIN station_area sa ON sa.station_area_id = recipient.character_id -- station_id = station_area_id is probably wrong JOIN station st ON st.station_id = sa.station_area_id -- star_id = email_id is probably wrong JOIN star origin ON origin.star_id = thread.email_id JOIN star destination ON destination.star_id = st.star_id LEFT JOIN route ON ( route.from_id = origin.star_id AND route.to_id = destination.star_id ) WHERE selected.email_id = ? AND ( thread.sender_id = ? OR ( thread.recipient_id = ? AND ( origin.star_id = destination.star_id OR ( route.distance IS NOT NULL AND now() >= thread.datesent + ( route.distance * interval '30 seconds' ) )))) ORDER BY datesent ASC, thread.parent_id ASC
Do yourself a favor and use full names for IDs. You can thank me later. Your DBAs will thank you now.
As much as possible, name columns very descriptively. For example, a
temperature column doesn't make sense for this:
SELECT name, 'too cold' FROM areas WHERE temperature < 32;
I live in France and for anyone here, 32 would be "too hot". Instead, name that column
SELECT name, 'too cold' FROM areas WHERE fahrenheit < 32;
Now it's unambiguous.
Also, when you have foreign key constraints, you should name the columns on each side of the constraint identically, if possible. For example, consider this perfectly reasonable, sane, SQL.
SELECT * FROM some_table s JOIN some_other_table o ON o.owner = s.person_id;
That looks fine. There's really nothing wrong with it. But when you consult the table definition, you discover that
some_other_table.owner has a foreign key constraint against
companies.company_id. That SQL is, in fact, wrong. Had you used identical names:
SELECT * FROM some_table s JOIN some_other_table o ON o.company_id = s.person_id;
Again, we've avoided ambiguity. It's immediately clear that we have a bug and you can see it on a single line of code and don't have to go consult the table definition.
However, it should be noted that this isn't always possible. If you have a table with a source warehouse and a destination warehouse, you might want a
source_id and a
destination_id to compare with your
warehouse_id. Naming them
destination_warehouse_id will make this easier to follow.
It should also be noted that in the example above,
owner is more descriptive of the intent than
company_id. If you feel this is likely to cause confusion, you can name the column
owning_company_id. That can still embed the meaning of the column in the name while giving you a strong hint as to its intent.
Naming standards are important because it helps keep your code consistent and predictable. Anything violating that predictability tends to be suspect. However, well-chosen naming standards can make it easier to understand your code and, if done well, can often make it possible to see a error in a single line of code.
- Table names should be plural
- Spell out id fields (
- Don't use ambiguous column names
- When possible, name foreign key columns the same as the columns they refer to
While not perfect, the above database naming conventions will make your database world a better place.