Understanding password storage

#programming #security #backend

Storing passwords can be a challenging and painful task. With this post, I hope to highlight some of the challenges. Keep in mind that managing passwords yourself is never a good idea; it's always best to hand the job over to security experts by using one of the several auth providers out there.

Storing Passwords in plain text

So what's the deal with storing passwords? Can't they just be stored as they are, like everything else in the DB? Well, we can, but we should not, for the following two reasons:

Anyone who has access to the database can easily see and use the passwords. Sure, they might not do it, but nothing is stopping them from doing it.
If your table is somehow compromised by a hacker, they get easy access to your users' login details. They only need access to that one table in the database. This could cause a serious data breach and even legal problems.

Now we know that we should not store passwords as plain text. Let's look at some of the ways they can be stored safely.

Encryption Vs Hashing

Encryption uses a mathematical algorithm to convert text into something completely different with the help of an encryption key. Anyone who has access to the key can decrypt the encrypted data. If the encryption key is compromised, the passwords are likely compromised as well. Since the original password can be recovered with the key, encryption is not a safe way to store passwords.

Hashing is another kind of cryptographic function. It converts a string into a fixed-length combination of characters that looks completely random, although the same input always produces the same output. Unlike encryption, there is no key that turns a hash back into the original string. For this reason, hashing is the preferred way of storing passwords. Some common hashing algorithms are MD5, SHA-1, and the SHA-2 family; MD5 and SHA-1 are no longer considered secure and are best avoided in new systems.

Hashing works well for storing passwords for the following reasons:

It is impossible to know the length of the original password by looking at the hashed password, because every hash has the same length.
It is close to impossible to reverse the hash.
Even a small change in input will produce a dramatically different hash.

input - password
hash - 5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8

input - Password
hash - e7cf3ef4f17c3999a94f2c6f612e8a888e5b1026878e4e19398b23bd38ec221a
// Changing just the p to a capital produces a totally different hash
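To try this yourself, here is a minimal sketch using Python's standard `hashlib` module. It assumes SHA-256, which is what the digest shown for `password` corresponds to; the helper name `sha256_hex` is my own and not part of any library.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of `text` as a hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


lower = sha256_hex("password")
upper = sha256_hex("Password")

print(lower)
print(upper)

# The two digests differ even though only one letter changed case.
print(lower != upper)

# Every SHA-256 digest is 64 hex characters, whatever the input length.
print(len(lower), len(sha256_hex("a much longer input " * 50)))
```

Because the output length is fixed, the digest gives away nothing about how long the original password was.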

So if we don't know the original password, how do we validate that the password entered by the user is correct? We hash the password the user entered and compare it with the hash stored in the table. The same input always gives the same hash, so if the hashes match, the user has entered the right password. This is also why a site cannot show you your old password once you have forgotten it: it only has the hash, which is enough to check a login attempt but not to recover the original.
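The check described above can be sketched roughly like this. The function names are hypothetical, and the hash is plain SHA-256 only to mirror the examples in this post, not a recommendation for production.

```python
import hashlib
import hmac


def hash_password(password: str) -> str:
    """Hash a password with SHA-256 (illustration only)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def is_valid_login(entered_password: str, stored_hash: str) -> bool:
    """Hash what the user typed and compare it with the stored hash."""
    entered_hash = hash_password(entered_password)
    # compare_digest takes the same time whether or not the values match,
    # which avoids leaking information through timing.
    return hmac.compare_digest(entered_hash, stored_hash)


# At sign-up we keep only the hash, never the password itself.
stored_hash = hash_password("correct horse")

print(is_valid_login("correct horse", stored_hash))  # True
print(is_valid_login("wrong guess", stored_hash))    # False
```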
So now, even if your database is compromised, all the attacker will see is a collection of hashed passwords. Great! That means our users' data is safe, right? Well, hashing protects data to an extent, but it's not completely safe. Let's put on our hacker's cap and see what we can do with the list of hashed passwords.

Limitations of Hashing

Now we have a list of hashed passwords. You go through the list and find a few identical hashes. Identical hashes mean identical passwords, which probably means the password is a commonly used word. As a hacker, you can then hash the commonly used passwords yourself and see if any of them match. Most probably some will. That's the reason it's always advised to use a complicated password. But not all users are going to do that. And if you lose the users' data, it's going to be your responsibility.
There are also lists of commonly used passwords and their hashes that get updated with every data breach. Precomputed tables like these are called rainbow tables, and hackers can use them to look up the password behind a hash. So how can we overcome this?
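To make the weakness concrete, here is a small sketch of the attack from the hacker's side. The "leaked" table and the list of common passwords are made up for the example, and a plain dictionary stands in for the precomputed table (real rainbow tables use a more compact structure, but the lookup idea is the same).

```python
import hashlib
from collections import Counter


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical leaked table: username -> unsalted SHA-256 hash.
leaked = {
    "alice": sha256_hex("password"),
    "bob": sha256_hex("123456"),
    "carol": sha256_hex("password"),
    "dave": sha256_hex("kX9#vQ2!pLm7"),
}

# Step 1: identical hashes reveal users who share a password.
repeated = [h for h, count in Counter(leaked.values()).items() if count > 1]

# Step 2: hash a list of common passwords once, then look every leaked hash up.
common_passwords = ["123456", "password", "qwerty"]
lookup = {sha256_hex(p): p for p in common_passwords}

cracked = {user: lookup[h] for user, h in leaked.items() if h in lookup}
print(cracked)  # {'alice': 'password', 'bob': '123456', 'carol': 'password'}
```

Only the user with an uncommon password survives, and the attacker never had to reverse a single hash.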

Seasoning with Salt

Since storing the password with hashing alone might not be enough, we combine the password with a randomly generated string, which we call the "salt", and then hash them together. Because the salt is generated randomly for each user, we get a different hash even from the same password. So salting makes it much more secure to save the password. The salt has to be stored so that it can be looked up again for each user; it is commonly kept alongside the hash, although it can also live in a different database.

input1 - password
unsalted hash - 5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8
salted input - password6m:f?F!cZal-
salted hash - d60ec14145dcf8631e604b6db2a2fc1b0ef18460f8ecba77e79b2662ba36c9be

input2 - password
unsalted hash - 5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8
salted input - passwordtJgeu:nJSgvf
salted hash - 6a8cdd9c910aa70ed8bbd58eb03b1d46bc524cc0d85800b58c9bcf801b15c1d9

As you can see, with a salt we generate completely different hashes even for the same input. This makes salted hashes much better even for weak passwords, because identical passwords no longer share a hash and a precomputed list of hashes no longer matches.
To validate the user input, we fetch the salt stored for that user, append it to the given password, hash the result, and check whether it matches the hash stored in the database.
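Here is a minimal sketch of salting and validation, following the "password + salt, then hash" scheme from the examples above. The function names are my own, and SHA-256 is used only to stay consistent with this post; real systems usually reach for a dedicated, deliberately slow password-hashing function.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


def generate_salt(num_bytes: int = 16) -> str:
    """Return a fresh random salt as a hex string."""
    return secrets.token_hex(num_bytes)


def hash_with_salt(password: str, salt: str) -> str:
    """Append the salt to the password and hash the result with SHA-256."""
    return hashlib.sha256((password + salt).encode("utf-8")).hexdigest()


def register(password: str) -> Tuple[str, str]:
    """Return the (salt, hash) pair we would store for a new user."""
    salt = generate_salt()
    return salt, hash_with_salt(password, salt)


def verify(entered_password: str, salt: str, stored_hash: str) -> bool:
    """Re-hash the entered password with the stored salt and compare."""
    return hmac.compare_digest(hash_with_salt(entered_password, salt), stored_hash)


# Two users pick the same password but end up with different stored hashes.
salt_a, hash_a = register("password")
salt_b, hash_b = register("password")
print(hash_a == hash_b)  # False, unless the two random salts happen to collide

print(verify("password", salt_a, hash_a))  # True
print(verify("Password", salt_a, hash_a))  # False
```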

Spicing with pepper

It is not unheard of for the database containing the users' salts to be compromised as well. If we want to secure our passwords further, we can use one more random string that is added to every password along with its salt before hashing. This random string is called the "pepper". Unlike the salt, it is the same for all users and is kept outside the database, usually right inside the source code. Just make sure you don't push it to a publicly hosted repository.
Using a pepper might not be necessary all the time. It can also cause trouble if the pepper value is ever compromised, because every password has been hashed with this single value and it is not easy to change. So keep that in mind.
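Adding a pepper on top of the salt could look roughly like this. It is a sketch under a few assumptions of mine: the pepper is read from an environment variable with the made-up name `PASSWORD_PEPPER` (one way to keep it out of both the database and a public repository), the fallback value exists only so the demo runs, and the pepper is simply appended after the salt.

```python
import hashlib
import hmac
import os
import secrets

# Hypothetical variable name; the fallback is for demonstration only.
PEPPER = os.environ.get("PASSWORD_PEPPER", "demo-only-pepper")


def hash_with_salt_and_pepper(password: str, salt: str, pepper: str = PEPPER) -> str:
    """Hash password + per-user salt + application-wide pepper with SHA-256."""
    return hashlib.sha256((password + salt + pepper).encode("utf-8")).hexdigest()


def verify(entered_password: str, salt: str, stored_hash: str, pepper: str = PEPPER) -> bool:
    """Re-hash with the stored salt and the application's pepper, then compare."""
    candidate = hash_with_salt_and_pepper(entered_password, salt, pepper)
    return hmac.compare_digest(candidate, stored_hash)


salt = secrets.token_hex(16)  # stored in the database next to the hash
stored_hash = hash_with_salt_and_pepper("password", salt)

print(verify("password", salt, stored_hash))  # True

# An attacker who steals the salt and hash but not the pepper
# cannot reproduce the hash, even with the right password guess.
print(verify("password", salt, stored_hash, pepper="attacker-guess"))  # False
```

The second check also shows the downside mentioned above: once the pepper changes, every stored hash stops matching until the users' passwords are hashed again.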

DEV Community