DEV Community

Database Naming Standards

Ovid on March 02, 2021

Introduction Database naming conventions, despite SQL having been around for decades, are ad hoc. When moving from project to project, y...

Read full post

webstuff • Mar 4 '21

Srry to be a pain, but the part with the primary key nit being id but tablename_id ... Well in school and a lot of books the teory sais its ok id,
Maybe what you presented was a local situation, but i don't see the ambiguity when you write the query, cuz you use the alias.id ... And tablename_id normally is a foreign key in other tables.
But as i sayed i think its a matter of situations.
I for one am with team just id as a primary key and tablebame_id as a foreign key when used in other tables.
Mytwocents

Ovid • Mar 4 '21

When I first published this article a long time ago, people complained that they needed to see the second version of the id query because they didn't see what was wrong with the first version. After all, origin.id = thread.id looks fine.

In the second version, because we use explicit names for the ids, you can immediately see that origin.star_id = thread.email_id is wrong. That's the point of this: if your naming conventions let you immediately see that a mistake is made, that's a huge win.

Aaron Reese • Mar 26 '21

I've been a SQL developer for 30 years. And this is SO wrong.... Table.id is the PK (or at least the unique constraint) on this table. Table.other_id is the Foreign key to the related table. Origin.id = email.id immediately stands out as being incorrect because at least one side should reference the FK field unless one of them is a straight 1:1 extension table.

Table names should be singular as each row represents a single occurrence of the modelled entity, for the same reason your class names are singular. If you are going to conflict with reserved words, pick better table names: AppUser, BillingConstraint, ShippingTable.
I personally don't like underscored table names it makes code scanning harder because fields do tend to be underscored as are system configurations.
Right at the top you said SQL databases are case insensitive. This depends on the collation settings for the instance, database and table. Case sensitive collations are a nightmare!

Ovid • Mar 26 '21

While I've laid out my reasoning and you should feel free to disagree. I point out my id naming reasons, you don't like them, that's fine.

As for table names being plural, as mentioned, many experts disagree strongly on this. Joe Celko, world-famous database expert and one of the authors of the SQL-89 and SQL-92 standards, argues in SQL for Smarties, that table names should be plural because they represent collections, not individual entities.

As for case-insensitivity, section 5.1 of the SQL-92 standard states (where the angle-bracketed words refer to a previously defined grammar):

There is a one-to-one correspondence between the symbols contained in <simple Latin upper case letter> and the symbols contained in <simple Latin lower case letter> such that, for all i, the symbol defined as the i-th alternative for <simple Latin upper case letter> corresponds to the symbol defined as the i-th alternative for <simple Latin lower case letter>.

SQL the language (not the data it manipulates) has been case-insensitive for decades.

Aaron Reese • Mar 26 '21

I'll grant that there is no one perfect answer but I felt it was important to put across a different set of arguments. The most important thing is to be consistent with whatever naming convention you decide to use. The ANSI standard SQL-92 - The clue is in the title. That standard will be 30 years old next year, the world has moved on. Each vendor has proprietary extensions to the standard, my preferred flavour is MSSQL which allows you to define the collation rules for your database which includes whether the database and the data it contains is case sensitive, whether it uses the standard ASCIII character set or allows for 'foreign' characters such as Cyrillic, turkish or accented characters, whether Upper and lower case characters should be treated differently and whether NULL appears at the start or end of the index. This may not be standard, but it is 'real world'.
Please don't quote 'World Renowned' Joe Celko: I have had enough run-ins with him on SQLLServerCentral over the years to know that he is an absolutist and simply cannot accept that data is impure and that applications and their databases are not always (usually) developed by people with Doctorates in data management and who have the full picture of what the end solution will need to look like in 10 years time when the business needs are not well defined and you don't have an infinite amount of time. Joe has some good points and his books are comprehensive but on a personality basis he is way of the mark with understanding how to help people sove the problems they find themselves in. His attitude is - well you shouldn't have gotten there in the first place...

rowlandholden76 • Jan 12 '23

I know I am late to the party, but I am going to jump in here. I understand that it is common practice to name the PK, ID. But I have always disagreed with this. Reason being I don't like to spend time researching stuff that should be plainly obvious by reading the code. Here is an example:

WHERE tblUser.ID = tblCar.ID

Obviously, there is a problem here, we are looking at two PK fields, which shouldn't happen except in special cases. But which one is wrong? Unless I coded this, I can't 100% know which side is wrong. The policy may be that the PK is always on the left side, but who is to say that Joe Blow actually did that. We already know he messed up the criteria, so maybe when he messed up he did it all the way wrong and put the actual key he wanted to use on the wrong side. The bottom line is I can't actually know that without doing research on the table designs. This is a complete waste of time. Given that we know the intent, what is supposed to be accomplished here, helps. It may not give us the full picture. and for giggles, lets ay the intent is to see which mechanic is assigned to the car. Now we have a little light, but it is still a mess, Perhaps this is supposed to be the PK tblCar.ID to tblUser.CarID. After all, one mechanic can be assigned to fix many cars in one day. Or maybe, it is supposed to be Pk tblUser.ID to tblCar.UserID, after all, many mechanics can be assigned to fix a car if it is tough job. See my point? These situations are rare, but as programmers shouldn't we prepare of all issues as opposed to ommitting the rare cases?

Another example:

WHERE tblUser.UserID = tblCar.ID

This is clear, this states plainly that the userID field is the PK field. If our policy is to always name Key fields, Pk and Fk respectively, we know the issue is in the design of the tblCar as there is missing a name for the key field. Perhaps this field is a PK, or maybe someone didn't name the FK. The result though, is we know EXACTLY where to find the issue from just reading the code. In this case, we open the design of tblCar and name this field appropriately. If it turns out this field is the wrong field for our criteria, then good for us, we just fixed two issues by finding one.

Aaron Reese • Jan 12 '23

OK. Lets unpack your comment as the arguments are common and as always - Make a decision and stick to it.

Firstly - don't name for tables tblUser and tblCar. table is the default object in the database. Yes User is probably a reserved word so find a better table name (like systemUser or employee). Also when you need to 'replace' the table with a view, for example if the fields get split into two separeate tables, to prevent changing ALL your code you will now need to create the view (or a synonym) called tblUser which is doubly confusing because it now isn't a table.....

A.ID = B.ID is patently wrong in the vast majority of cases. It would only really be valid if you have aliased a field as part of a view or CTE or table B is an extension table to A and is a 1:1 relationship so it is an instant code-smell and that makes it easy for a seasoned developer to spot (so it is a good thinig...) Yes you have to interrogate the code to find the intention but you are doing that anyway.

In you example you suggested that the User table in this context is the mechanic. If this was the case then I would be aliasing the tblUser and (tbl)Mechanic (I wouldn't use the tbl - but each to their own) and then presumably the tblCar should have a field MechanicID or MechanicUserID. Under your proposal the JOIN would look like

tblMechanic.UserID = Car.MechanicID

Under my proposal it would be

tblMechanic.ID = Car.MechanicID Which to me is much clearer and confirms that we are joining a PK to an FK. Your way requires more intimate knowlege of the database to be sure that UserID is the PK on the aliased table.

This becomes especially important if the tblUser is going to be used multiple times in the query (Mechanic, Salesman, Valet, Delivery driver, Service receptionist etc).

You said:

after all, many mechanics can be assigned to fix a car if it is tough job

That is a many to many relationship and therefore should be modelled appropriately in the database with a bridging table (e.g. ServiceTeam)
ID INT IDENTITY(1,1), /*yes I know....*/ Car_ID INT NOT NULL, /*FK to Car*/ Mechanic_ID NOT NULL /*FK to User in the context of the role of Mechanic*/ (alternately as the same person may take on many roles...) Role_ID /*FK to job role to identify Mechanic, Valet, Driver, Receptionist*/ User_ID /**FK to User table./

Feel free to respond as these discussions help junior developer to avoid pitfalls later on in the project.

SuperDre • Aug 30 '21

Your example is already wrong, wrong wrong. origin.id = thread.id you should immediately know there is something wrong as it both points to primary keys in different tables, so you do know something is wrong already.
And with origin.star_id = thread.email_id, yeah you immediately see something is wrong, but it's wrong due to 2 things (at least if we're going from the notion at least one of the two is the primary key), so it should be origin.star_id = thread.star_id or origin.email_id = thread.email_id or origin.origin_id = thread.origin_id or origin.id = thread.origin_id (and so on).

webstuff • Mar 5 '21

Sorry about this but, from this i see that the problem us more of ambiguous aliases than column names.
But as i saud is a matter of opinion and project.

Imperious2000 • Mar 5 '21

I really like this article, I've used pretty much the same conventions, mostly because this is what it was taught to me, and would say some things are common sense, that common sense that sometimes we miss when working on a project(s). (I'm guilty of the id usage instead of the descriptive one but changed it on a later project, and it definitely helps)

At some point, I was feeling that naming columns shouldn't be wordy but realized it's better to be descriptive and simple if possible than trying to oversimplify things unnecessarily.

Ovid • Mar 9 '21

In programming, we often suggest that long variable names be used for variables that have a wider scope. It's important that those be descriptive. It's even more important, in my opinion, for databases, because many people will be staring at those names over and over and over again. They tend to persist even longer than that library I wrote last week. Since databases are often the foundation of apps, I try hard to put more effort into making them as understandable as possible.

Michael Lawson • Oct 5 '21

Not sure where the ideas some of the commenters have regarding singular naming and camel case are coming from, but I strongly STRONGLY disagree. Plural table names imply that you're storing a collection of objects. Singular implies the opposite. Can you imagine if in code we called a collection "user" instead of "users"?
As for PascalCase or camelCase naming...this is just ugly.

The argument that it better represents what folks are coding for falls apart easy. Java uses camelCase, as does OO PHP. C, Python, and Ruby use snake_case. C# uses PascalCase. Go uses both camelCase and PascalCase depending on what you're doing. All those considered, when it comes to database columns, underscores make the most sense when we really need to focus on ultimate readability of the name in a data context. (Don't apply this to the programming language being used!!!). Heck most ORMs will convert this automatically in the first place. If anything, using the _ convention, as it's so common, is only going to help coding efforts. Take Laravel's Eloquent ORM for example It already knows that if you have underscores that it can turn those in to camelCase properties when you access the generated object. It's just such a bad argument that you tend to see from crotchety old DBAs that were working since before best practices were even an idea in tech.

I'm 100% with this article. The only disagreement I have is on the naming of the id column. This isn't one I've come across before, but here is why I disagree: It actually increases ambiguity rather than decreases it. If I see a tablename_id pattern, I'm going to assume a 1:1 relationship, not a PK. You could have a self referential column which, while I can't stand the idea, happens in legacy databases or in newer databases created by novice engineers. I get what you're saying, but I think this is where aliases really shine.

Full disclosure: I'm not a DBA, just an engineer that has to work with them. I can tell you right now which I'd prefer, as would all the engineers I have ever worked with.

SuperDre • Aug 30 '21

My god, it seems that every single point you set as THE naming convention is exactly opposite as what I think is a good naming convention.

Use underscore_names instead of CamelCase Nope, Use CamelCase as it reads more fluent and conforms better to the actual code it's being used.

*Table names should be plural
Nop, use singular as every record is a single object, not multiple, and the more likely to be the same as a keyword is bullocks ofcourse, as it would also assert to fieldnames

*Spell out id fields (item_id instead of id)
This is certainly a topic that many people disagree on, many people think every table should have an Id field and only the reference from a foreign table should have the Id. Especially if you want to make things more generic. And ofcourse there are also a lot of people who think you should have the fullname in both tables.
Personally at the moment I opt for the Id variant as it makes a lot much easier.

*Don't use ambiguous column names
Ok, I give you this, but to me this is more due to logic then anything else.

"While not perfect, the above database naming conventions will make your database world a better place"
Well, that's all in the eye of the beholder, as many people have different idea's about what makes it a better place.

Michael Lawson • Oct 5 '21

"Use underscore_names instead of CamelCase Nope, Use CamelCase as it reads more fluent and conforms better to the actual code it's being used."

I guess if the only language you use is C# then this makes sense. Otherwise why are you tying naming to the programming language in use? It should be totally agnostic. This is just bad practice in the first place.

SuperDre • Oct 5 '21

It's got nothing to do with the language I'm using, I'm using even multiple languages. But it's got all to do with what I think reads better and looks more clean, and even less typing. everything lowercase and using underscores is like writing ancient english to me, it's just ugly.
And calling lowercase/underscore best practice, is all in the eye of the beholder. Most people I work with use the CamelCasing with the database designs now.
In reality, in these cases there is no real best practice, it's just what you prefer.

Michael Lawson • Oct 6 '21

"conforms better to the actual code being used" directly implies the language you're using because it's not true in all cases (that it conforms) by a long shot.

I've never in my life seen that kind of casing in database design, like ever. Not sure who you're working with or where but yeah, I think this ain't the hill to die on.

Tamás Rátkay • Sep 10

I would like to settle the issue of plural table names once and for all here.

I know this may sound surprising and aggressive at first, but I have good reason to be so adamant. I find it shocking that database professionals are discussing this issue without understanding the theoretical concepts behind databases.

So, every Entity (database table, if you're really down to earth...) represents an abstract (!) entity TYPE (general category, concept), NOT the entity itself (there is no such thing as "user," only "John," "Jim," etc., i.e., real, flesh-and-blood users). The Entity name therefore represents the "User" category, the content of which, in turn, represents Entity OCCURRENCES, Entity instances, and therefore a kind of "singulars". Therefore, there is no place for any plural form, because the Entity name does NOT represent a collection, but the TYPE of occurrences. Of course, the attributes of Entities (table fields...) are also not given plural names, because they are also only TYPES that refer to specific property occurrences (the current value of the attribute).

I think it's time for many people to look for real knowledge when they talk (brag...) about databases.