Are you hashing your users' passwords? More importantly, are you doing it correctly? There's a lot of information out there on password hashing, and there are certainly more than a few different hash algorithms available for you to use.
As a full-stack engineer, I've spent plenty of time building password-based authentication mechanisms. As an ethical hacker, I've spent plenty of time trying to break those mechanisms and crack password hashes.
In this article, I'm going to provide a brief overview of secure password hashing and storage, and then I'm going to show you how to securely hash your passwords for your next application.
There's been a lot written about what a hash algorithm is, so I won't waste your time reiterating all of it. In short, a cryptographic hash algorithm is a one-way function. Let's call the hash function H. Given some data d, it's trivial to compute H(d). But given only the hash H(d), it's nearly impossible to compute d. It's also important to note that even a one-byte difference in d will result in a completely different hash.
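To make those two properties concrete, here is a minimal sketch using SHA-256 from Python's standard `hashlib` module. SHA-256 is my choice purely for illustration (it is not a recommendation for password storage), and the helper name `sha256_hex` is made up for this example.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()

first = sha256_hex(b"correct horse battery staple")
second = sha256_hex(b"correct horse battery stapld")  # only the last byte changed

# Count how many of the 64 hex characters differ between the two digests.
differing = sum(a != b for a, b in zip(first, second))

print(first)
print(second)
print(f"{differing} of {len(first)} hex characters differ")
```

Hashing the same input twice always gives the same digest, which is what makes the login comparison described next possible.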
Instead of storing passwords in "plaintext" (i.e. storing passwords directly in our database), it is more secure to hash passwords before storing them in the database. Given a password p, we compute H(p), and store that value in the database. When a user tries to log in, we hash the password that the user tried to log in with, and compare it to the hash in the database. If the two match, then the password is valid, and we log the user in. If they don't, then the user provided the wrong password.
Why do we do this? It protects passwords from hackers, lazy or malicious system administrators, and data leaks. If the database is leaked or hacked, then hackers can't easily determine what all of the users' passwords are simply by looking at them. This is even more important considering that many people use the same or similar passwords for most of their accounts. Without password hashing, one account being hacked could lead to all of a user's accounts across multiple services being compromised.
Below I've provided some Python pseudo-code to give you an idea of what your login function might look like using password hashing.

    def login(username, password):
        # fetch the user record from the database
        user = Users.get(username)

        # if no user matches the username, don't log them in
        if not user:
            return False

        # hash the supplied password
        supplied_hash = some_hash_function(password)

        # see if that hash matches the user's hash
        if supplied_hash == user.password_hash:
            return True
        else:
            return False

The above code is a little simplified, since it'll depend on how you're storing and retrieving users from your database. In this example, I also didn't touch on the actual hash function you should use, so let's dig into the details a little more.
There are a myriad of hash functions out there, and many offer different advantages. For example, some hash functions are fast, and others are slow. Fast hashing algorithms are great for building data structures like hash tables, but we want to use slow hash functions for password hashing since slow hash functions make brute force attacks more difficult. Let's look at a few common hash functions.
| Hash function | Notes |
|---|---|
| MD5 | Was commonly used for password hashing, but is now considered insecure for cryptographic purposes due to vulnerabilities that were discovered in it. |
| SHA-1 | Originally designed by the NSA for various purposes, now considered deprecated and insecure. |
| SHA-3 | Better than SHA-1, considered both safe and flexible. It is a fast hash, though, so it is not a good fit for password storage on its own. |
| NTLM | Commonly used in Windows Active Directory, but unsalted and easy to crack. Where NTLM authentication is required, prefer NTLMv2. |
| Bcrypt | A slow hash function that is resistant to brute-force cracking. Commonly used in some Linux distributions. Considered very secure. |
| Argon2 | A complicated but extremely secure hash function, resistant to brute-force attacks. Can be difficult to implement. |
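To get a feel for the difference between fast and slow hashes, here is a rough timing sketch. It uses only Python's standard library, so PBKDF2 stands in for a slow hash here instead of Bcrypt or Argon2. The iteration count of 200,000 and the helper name `time_call` are assumptions for the example, and the numbers will vary from machine to machine.

```python
import hashlib
import time

password = b"hunter2"
salt = b"\x00" * 16  # fixed salt, only so the timing comparison is repeatable

def time_call(func, repeats):
    """Return the average number of seconds one call to func takes."""
    start = time.perf_counter()
    for _ in range(repeats):
        func()
    return (time.perf_counter() - start) / repeats

# A fast, general-purpose hash: one SHA-256 computation per guess.
fast = time_call(lambda: hashlib.sha256(salt + password).digest(), repeats=1000)

# A deliberately slow hash: PBKDF2 repeats HMAC-SHA-256 200,000 times per guess.
slow = time_call(
    lambda: hashlib.pbkdf2_hmac("sha256", password, salt, 200_000), repeats=3
)

print(f"SHA-256: {fast * 1e6:.1f} microseconds per guess")
print(f"PBKDF2 (200,000 iterations): {slow * 1e3:.1f} milliseconds per guess")
```

A delay of a few milliseconds is barely noticeable for a single login, but it adds up quickly for someone trying millions of guesses.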
A "salt" is a random piece of data that is often added to the data you want to hash before you actually hash it. Adding a salt to your data before hashing it will make the output of the hash function different than it would be if you had only hashed the data.
When a user sets their password (often on signing up), a random salt should be generated and used to compute the password hash. The salt should then be stored with the password hash. When the user tries to log in, combine the salt with the supplied password, hash the combination of the two, and compare it to the hash in the database.
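The flow described above might look something like the sketch below. To keep it runnable with only the standard library, it uses PBKDF2 from `hashlib` as the hash function. The function names, the 16-byte salt, and the iteration count are my own assumptions for illustration, not requirements.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor; tune this for your hardware

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Generate a fresh random salt and return (salt, hash) for storage."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, stored: bytes) -> bool:
    """Re-hash the supplied password with the stored salt and compare."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    # compare_digest takes the same time whether or not the values match
    return hmac.compare_digest(candidate, stored)

# Two sign-ups with the same password get different salts and different hashes.
salt_a, hash_a = hash_password("s3cret")
salt_b, hash_b = hash_password("s3cret")

print(hash_a != hash_b)                           # True
print(verify_password("s3cret", salt_a, hash_a))  # True
print(verify_password("wrong", salt_a, hash_a))   # False
```

Note that the salt is not a secret. It is stored right next to the hash, and its job is only to make each stored hash unique.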
Without going into too much detail, hackers commonly use rainbow table attacks, dictionary attacks, and brute-force attacks to try to crack password hashes. While hackers can't compute the original password given only a hash, they can take a long list of possible passwords, compute hashes for them, and try to match those against the hashes in the database. This is effectively how these types of attacks work, although each of the above works somewhat differently.
A salt makes these attacks much more expensive. Because every user has a different salt, an attacker can't hash a guess once and compare it against every account; each hash has to be attacked separately, so the total work grows with the number of users. Salts also make precomputed rainbow tables impractical, since a table would have to be built for every possible salt. It's therefore important to always use salts in your hashes.
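Here is a small sketch of that idea. It deliberately uses plain SHA-256 so the "attack" runs instantly, and the tiny wordlist, the user names, and the helper functions are all hypothetical. A real attacker would use a far larger wordlist.

```python
import hashlib
import os

def unsalted(password: str) -> str:
    return hashlib.sha256(password.encode()).hexdigest()

def salted(password: str, salt: bytes) -> str:
    return hashlib.sha256(salt + password.encode()).hexdigest()

wordlist = ["123456", "password", "qwerty", "letmein"]

# The attacker builds this table once and can reuse it against any leak.
lookup = {unsalted(word): word for word in wordlist}

# Two hypothetical users who happen to share a password, stored unsalted.
leaked_unsalted = {"alice": unsalted("letmein"), "bob": unsalted("letmein")}
cracked = {user: lookup.get(h) for user, h in leaked_unsalted.items()}
print(cracked)  # {'alice': 'letmein', 'bob': 'letmein'}

# The same passwords stored with a random per-user salt.
leaked_salted = {}
for user in ("alice", "bob"):
    salt = os.urandom(16)
    leaked_salted[user] = (salt, salted("letmein", salt))

# The precomputed table no longer matches anything.
salted_lookup = {user: lookup.get(h) for user, (_, h) in leaked_salted.items()}
print(salted_lookup)  # {'alice': None, 'bob': None}

def crack_salted(salt: bytes, target: str):
    """Redo the whole wordlist for one specific salt."""
    for word in wordlist:
        if salted(word, salt) == target:
            return word
    return None

# The attacker has to repeat the full dictionary attack for every user.
recovered = {user: crack_salted(s, h) for user, (s, h) in leaked_salted.items()}
print(recovered)  # {'alice': 'letmein', 'bob': 'letmein'}
```

The last step is a useful reminder: a salt does not rescue a weak password. It only removes the attacker's ability to reuse work, which is why it should be paired with a slow hash function.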
Lots of people will tell you that there's no "right" or "wrong" answer to the question of which hash function you should use, only trade-offs. That's true, but some hash functions make better trade-offs than others.
Personally, I'm a big fan of Bcrypt and Argon2 because both are extremely secure, both require salts, and both are slow (which, as we discussed above, is a property we want for password hashing functions). Argon2 is a lot more complicated than Bcrypt, and can be more difficult to implement. Bcrypt is also a lot more common and more languages have libraries for it, so it's what I tend to use. I recommend that you use one of these two as well.
Below, I have provided two examples of how to implement everything that we've discussed today. The first example is pseudo-code, and the second one is in Python.
Most common languages should provide a bcrypt module or package, but the interface to it will invariably look different, so I've tried to be as language-agnostic as possible.

    # should be called when a user signs up or changes their password
    function calculate_hash(password)
        salt = random_bytes(16)  # bcrypt itself uses a 16-byte salt
        hash = bcrypt_hash(password, salt)
        # store this with the user in your database
        return hash

    # called whenever a user tries to log in
    function login_user(username, password)
        user = get_user_from_database(username)
        # if no user matches the username, don't log them in
        if user does not exist
            dont_log_in()
            return
        # bcrypt stores the salt with the hash; your library should manage this for you
        salt = get_salt(user.hash)
        new_hash = bcrypt_hash(password, salt)
        if new_hash == user.hash
            log_in(user)
        else
            dont_log_in()

Note that your salt should be at least 8 bytes long, but longer is more secure. Bcrypt itself always uses a 16-byte salt, and most libraries will generate it for you.
Python provides a bcrypt module that can be installed with pip, and I'm going to use that for this example. The bcrypt module handles the computation behind the scenes for you, so it's super easy to use:

    import bcrypt

    # this will create the hash that you need to store in your database
    def create_bcrypt_hash(password):
        # convert the string to bytes
        password_bytes = password.encode()
        # generate a salt; the argument is the cost factor (log2 of the
        # number of rounds), not the length of the salt
        salt = bcrypt.gensalt(rounds=14)
        # calculate a hash as bytes
        password_hash_bytes = bcrypt.hashpw(password_bytes, salt)
        # decode bytes to a string
        password_hash_str = password_hash_bytes.decode()
        # the password hash string should look similar to the one below;
        # the second field is the cost factor, so with rounds=14 yours
        # will start with $2b$14$ rather than $2b$10$
        # $2b$10$//DXiVVE59p7G5k/4Klx/ezF7BI42QZKmoOD0NDvUuqxRE5bFFBLy
        return password_hash_str

    # this will return true if the user supplied a valid password and
    # should be logged in, otherwise false
    def verify_password(password, hash_from_database):
        password_bytes = password.encode()
        hash_bytes = hash_from_database.encode()
        # this will automatically retrieve the salt from the hash,
        # then combine it with the password (parameter 1),
        # hash that, and compare it to the user's hash
        does_match = bcrypt.checkpw(password_bytes, hash_bytes)
        return does_match

- Always store your passwords as hashes, and never as plain text.
- Use a unique, random salt for every password.
- Use Bcrypt or Argon2 for your hash function.
I hope you find this article useful! Let me know what you think in the comments below.
If you're writing code for cloud applications, you need to know when things go wrong. I helped build CodeLighthouse to send real-time application error notifications straight to developers so that you can find and fix errors faster. Get started for free at codelighthouse.io today!