Kyle Mistele

Posted on Dec 27, 2020

How to securely hash and store passwords in your next application

#security #python #database #cloud

Are you hashing your users' passwords? More importantly, are you doing it correctly? There's a lot of information out there on password hashing, and there are certainly more than a few different hash algorithms available for you to use.

As a full-stack engineer, I've spent plenty of time building password-based authentication mechanisms. As an ethical hacker, I've spent plenty of time trying to break those mechanisms and crack password hashes.

In this article, I'm going to provide a brief overview of secure password hashing and storage, and then I'm going to show you how to securely hash your passwords for your next application.

Overview

Password hashing: a 30-second summary

There's been a lot written about what a hash algorithm is, so I won't waste your time reiterating all of it. In short, a cryptographic hash algorithm is a one-way function. Let's call the hash function H. Given some data d, it's trivial to compute H(d). But given only the hash H(d), it's computationally infeasible to recover d. It's also important to note that even a one-bit difference in d will result in a completely different hash H(d).
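To make that last property concrete, here is a minimal sketch using SHA-256 from Python's standard library. SHA-256 is used only because it ships with Python and illustrates how H behaves; it is not a recommendation for password hashing. The helper name `sha256_hex` and the sample inputs are mine.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a 64-character hex string."""
    return hashlib.sha256(data).hexdigest()

# Two inputs that differ only in their final byte
h1 = sha256_hex(b"correct horse battery staple")
h2 = sha256_hex(b"correct horse battery stapld")

# Count how many of the 64 hex characters differ between the two digests
differing = sum(a != b for a, b in zip(h1, h2))

print(h1)
print(h2)
print(f"{differing} of {len(h1)} hex characters differ")
```

Computing either digest is instant, but nothing in the output tells you which input produced it, and the two digests look unrelated even though the inputs are almost identical.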

Instead of storing passwords in "plaintext" (i.e. storing passwords directly in our database), it is more secure to hash passwords before storing them in the database. Given a password p, we compute H(p), and store that value in the database. When a user tries to log in, we hash the password that the user tried to log in with, and compare it to the hash in the database. If the two match, then the password is valid, and we log the user in. If they don't, then the user provided the wrong password.

Why do we do this? It protects passwords from hackers, lazy or ill-intentioned system administrators, and data leaks. If the database is leaked or hacked, then hackers can't easily determine what all of the users' passwords are simply by looking at them. This is even more important considering that many people use the same or similar passwords for most of their accounts. Without password hashing, one account being hacked could lead to all of a user's accounts across multiple services being compromised.

A brief example

Below I've provided some Python pseudo-code to give you an idea of what your login function might look like using password hashing.

def login(username, password):
    user = Users.get(username) # fetch the user record from the database

    # if no user matches the username, don't log them in
    if not user:  
        return False

    # hash the supplied password
    supplied_hash = some_hash_function(password)

    # see if that hash matches the user's hash
    if supplied_hash == user.password_hash:
        return True
    else:  
        return False

The above code is a little simplified, since it'll depend on how you're storing and retrieving users from your database. In this example, I also didn't touch on the actual hash function you should use, so let's dig into the details a little more.

Hash Functions

There are a myriad of hash functions out there, and many offer different advantages. For example, some hash functions are fast, and others are slow. Fast hashing algorithms are great for building data structures like hash tables, but we want to use slow hash functions for password hashing since slow hash functions make brute force attacks more difficult. Let's look at a few common hash functions.
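Here is a rough sketch of the fast-versus-slow difference. It assumes PBKDF2 from the standard library as a stand-in for a slow password hash (it is not bcrypt, it just needs no extra install), and the iteration count, fixed salt, and function names are my own choices for the demo.

```python
import hashlib
import time

PASSWORD = b"hunter2"
SALT = b"fixed-demo-salt!"  # fixed only so the demo is repeatable

def fast_hash(password: bytes) -> bytes:
    """A single SHA-256 pass: fast by design."""
    return hashlib.sha256(SALT + password).digest()

def slow_hash(password: bytes, iterations: int = 200_000) -> bytes:
    """PBKDF2-HMAC-SHA256: chains the underlying hash `iterations` times."""
    return hashlib.pbkdf2_hmac("sha256", password, SALT, iterations)

def seconds_per_call(fn, *args) -> float:
    """Time one call to fn(*args) with a monotonic high-resolution clock."""
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start

fast_time = seconds_per_call(fast_hash, PASSWORD)
slow_time = seconds_per_call(slow_hash, PASSWORD)

print(f"fast hash: {fast_time:.6f} s per guess")
print(f"slow hash: {slow_time:.6f} s per guess")
```

A legitimate login pays the slow cost once, which a user will not notice. An attacker pays it for every guess, so the same hardware gets far fewer guesses per second.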

Some common hash functions

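Hash Function	Description
MD5	Was commonly used for password hashing, but now considered insecure for cryptographic purposes: practical collision attacks exist, and it is fast enough to brute-force cheaply
SHA-1	Originally designed by the NSA for various purposes, now considered deprecated and insecure after practical collision attacks were demonstrated
SHA-3	Much stronger than SHA-1 and considered both safe and flexible as a general-purpose hash, but it is designed to be fast, so it is not a good fit for password hashing on its own
NTLM	Commonly used in Windows Active Directory, but unsalted and fast, so easy to crack. Note that NTLMv2 is a challenge-response authentication protocol, not a stronger way to store passwords.
Bcrypt	A deliberately slow, salted hash function with a tunable cost factor, which makes it resistant to brute-force cracking. Commonly used in some Linux distributions. Considered very secure.
Argon2	A more complicated but extremely secure password hashing function with tunable time and memory costs, resistant to brute-force attacks. Can be difficult to implement.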

What on earth is a salt?

A "salt" is a random piece of data that is often added to the data you want to hash before you actually hash it. Adding a salt to your data before hashing it will make the output of the hash function different than it would be if you had only hashed the data.

When a user sets their password (often on signing up), a random salt should be generated and used to compute the password hash. The salt should then be stored with the password hash. When the user tries to log in, combine the salt with the supplied password, hash the combination of the two, and compare it to the hash in the database.
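The sign-up and login steps above can be sketched with only the standard library. This is an illustration under assumptions, not the bcrypt approach shown later in the article: PBKDF2 stands in as the hash function, the iteration count is an arbitrary choice of mine, and the function names are hypothetical. I've also used `hmac.compare_digest` for the comparison, which is an addition on my part so that the check takes the same time whether or not the hashes match.

```python
import hashlib
import hmac
import os
from typing import Tuple

ITERATIONS = 200_000  # assumed work factor for this sketch

def make_password_record(password: str) -> Tuple[bytes, bytes]:
    """Generate a fresh random salt and return (salt, digest) for storage."""
    salt = os.urandom(16)  # 16 random bytes from the OS's secure generator
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest

def check_password(password: str, salt: bytes, stored_digest: bytes) -> bool:
    """Re-hash the supplied password with the stored salt and compare."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored_digest)

# Two users who happen to pick the same password
salt_a, digest_a = make_password_record("hunter2")
salt_b, digest_b = make_password_record("hunter2")

print(digest_a == digest_b)                          # different salts, different hashes
print(check_password("hunter2", salt_a, digest_a))   # correct password
print(check_password("hunter3", salt_a, digest_a))   # wrong password
```

Because each record gets its own salt, two users with the same password end up with different stored hashes, so a leaked database doesn't reveal who shares a password.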

Why should you use a salt?

Without going into too much detail, hackers commonly use rainbow table attacks, dictionary attacks, and brute-force attacks to try and crack password hashes. While hackers can't compute the original password given only a hash, they can take a long list of possible passwords and compute hashes for them to try and match them with the passwords in the database. This is effectively how these types of attacks work, although each of the above works somewhat differently.

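A salt makes it much more difficult for hackers to perform these types of attacks at scale. Because every user has a different salt, an attacker can no longer hash a guess once and compare it against every hash in the database; each guess has to be recomputed for each user, so the cost of cracking grows with the number of accounts. Salts also make precomputed tables, including rainbow tables, impractical, because a separate table would be needed for every possible salt. A salt does not make a single weak password harder to guess, which is why it needs to be paired with a slow hash function. It's therefore important to always use salts in your hashes.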
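A toy version of a precomputed-table attack shows why. The sketch below uses a plain lookup dictionary rather than a real rainbow table (which is a more compact time-memory trade-off), and it uses fast SHA-256 only to keep the demo short. The password list and function names are made up for illustration.

```python
import hashlib
import os

COMMON_PASSWORDS = ["123456", "password", "qwerty", "letmein"]

def unsalted(password: str) -> str:
    return hashlib.sha256(password.encode()).hexdigest()

def salted(password: str, salt: bytes) -> str:
    return hashlib.sha256(salt + password.encode()).hexdigest()

# The attacker builds this once and reuses it against every leaked database
lookup_table = {unsalted(p): p for p in COMMON_PASSWORDS}

leaked_unsalted = unsalted("letmein")
user_salt = os.urandom(16)
leaked_salted = salted("letmein", user_salt)

print(lookup_table.get(leaked_unsalted))  # letmein
print(lookup_table.get(leaked_salted))    # None

def crack_with_salt(target: str, salt: bytes):
    """With the salt known, the attacker must redo the guessing for this user."""
    for guess in COMMON_PASSWORDS:
        if salted(guess, salt) == target:
            return guess
    return None

print(crack_with_salt(leaked_salted, user_salt))  # letmein
```

The last call is the important caveat: the salt is stored next to the hash, so a weak password can still be guessed. What the salt takes away is the ability to reuse one table, or one round of guessing, across every account.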

Which hash function should you use?

Lots of people will tell you that there's no "right" or "wrong" answer to this question, only trade-offs. That's true, but some hash functions make better trade-offs than others.

Personally, I'm a big fan of Bcrypt and Argon2 because both are extremely secure, both require salts, and both are slow (which as we discussed above, is a property we want for password hashing functions). Argon2 is a lot more complicated than Bcrypt, and can be more difficult to implement. Bcrypt is also a lot more common and more languages have libraries for it, so it's what I tend to use. I recommend that you use one of these two as well.

Putting it all together

Below, I have provided two examples of how to implement everything that we've discussed today. The first example is pseudo-code, and the second one is in Python.

Password hash authentication pseudo-code

Most common languages should provide a bcrypt module or package, but the interface to it will invariably look different, so I've tried to be as language-agnostic as possible.

# should be called when a user signs up or changes their password
function calculate_hash(password) 
    salt = random_bytes(16) # bcrypt uses a 16-byte salt
    hash = bcrypt_hash(password, salt)

    # the salt is embedded in the hash; store this with the user in your database
    return hash 

# called whenever a user tries to log in
function login_user(username, password) 
    user = get_user_from_database(username)

    # if no user matches the username, don't log them in
    if user is null
        return reject_login()

    # bcrypt stores the salt with the hash, your library should manage this for you
    salt = get_salt(user.hash) 
    new_hash = bcrypt_hash(password, salt)
    if new_hash == user.hash
        return accept_login(user)
    else 
        return reject_login()

Note that bcrypt uses a fixed 16-byte (128-bit) salt, and most libraries will generate it for you. For hash functions that let you choose the salt length, 16 bytes from a cryptographically secure random source is a sensible minimum.

Password hash authentication Python code

The bcrypt package for Python can be installed with pip (pip install bcrypt), and I'm going to use that for this example. It handles the computation behind the scenes for you, so it's super easy to use:

import bcrypt

# this will create the hash that you need to store in your database
def create_bcrypt_hash(password):
    # convert the string to bytes
    password_bytes = password.encode()
    # generate a salt; the argument is the cost factor (log2 of the number
    # of rounds), not the salt length
    salt = bcrypt.gensalt(rounds=14)
    # calculate a hash as bytes
    password_hash_bytes = bcrypt.hashpw(password_bytes, salt)
    # decode bytes to a string
    password_hash_str = password_hash_bytes.decode()

    # the password hash string should look similar to this one, which has
    # a cost factor of 10 (with rounds=14 it would start with $2b$14$):
    # $2b$10$//DXiVVE59p7G5k/4Klx/ezF7BI42QZKmoOD0NDvUuqxRE5bFFBLy
    return password_hash_str

# this will return true if the user supplied a valid password and 
# should be logged in, otherwise false
def verify_password(password, hash_from_database):
    password_bytes = password.encode()
    hash_bytes = hash_from_database.encode()

    # this will automatically retrieve the salt from the hash, 
    # then combine it with the password (parameter 1)
    # and then hash that, and compare it to the user's hash
    does_match = bcrypt.checkpw(password_bytes, hash_bytes)

    return does_match
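The comments in verify_password mention that the salt is retrieved from the hash. To show where it lives, here is a small sketch that splits a bcrypt hash string into its fields, using the sample hash from the comment in create_bcrypt_hash. It is plain string handling and doesn't touch the bcrypt library; the `split_bcrypt_hash` helper is hypothetical, and in practice bcrypt.checkpw does all of this for you.

```python
from typing import NamedTuple

class BcryptParts(NamedTuple):
    version: str  # algorithm variant, e.g. "2b"
    cost: int     # log2 of the number of key-expansion rounds
    salt: str     # 16-byte salt, encoded as 22 characters
    digest: str   # 23-byte hash output, encoded as 31 characters

def split_bcrypt_hash(hash_str: str) -> BcryptParts:
    """Split "$<version>$<cost>$<salt><digest>" into its four fields."""
    _, version, cost, rest = hash_str.split("$")
    if len(rest) != 53:
        raise ValueError("expected 22 salt characters plus 31 digest characters")
    return BcryptParts(version, int(cost), rest[:22], rest[22:])

sample = "$2b$10$//DXiVVE59p7G5k/4Klx/ezF7BI42QZKmoOD0NDvUuqxRE5bFFBLy"
parts = split_bcrypt_hash(sample)

print(parts.version)    # 2b
print(parts.cost)       # 10
print(parts.salt)       # //DXiVVE59p7G5k/4Klx/e
print(2 ** parts.cost)  # 1024
```

This is why you only need one database column: the version, cost factor, and salt all travel with the hash, so old hashes keep verifying even after you raise the cost factor for new ones.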

Conclusion

Key takeaways:

Always store your passwords as hashes, and never as plain text.
Use a salt for extra security.
Use Bcrypt or Argon2 for your hash function.

I hope you find this article useful! Let me know what you think in the comments below.

If you're writing code for cloud applications, you need to know when things go wrong. I helped build CodeLighthouse to send real-time application error notifications straight to developers so that you can find and fix errors faster. Get started for free at codelighthouse.io today!

Top comments (8)

AdithyaR-afk • Dec 28 '20

1. Is there not a vulnerability when the password is being passed as an argument to the encrypt function? Assuming I'm storing the password in a variable in the first place
2. Also is it fine to use the encrypt function in the client side of the webapp?

Christopher De Jesus • Dec 28 '20

1. This would not be considered a vulnerability (on the backend at least) because if a malicious user had access to the runtime, the program would already be compromised. (After all, Python functions are values/variables themselves). As for client side, it really depends on the use case. A use case where it would be vulnerable can potentially be found in my answer for 2.
2. Yes/No. The actual encrypting is fine, though unconventional, and I would consider it bad (and even perhaps dangerous?) practice. That is, of course, assuming login authentication still happens server side. Otherwise, NO, that is in no way secure for reasons I'm sure you know.

TL;DR: Do all authentication tasks server side to save the trouble and simplify design. Doing otherwise does not accomplish much. If you are afraid of sending plaintext passwords over HTTPS, no need to be - it is already encrypted as per TLS and encrypting the password further would not accomplish much in terms of added security. (See security.stackexchange.com/questio...)

Dave Cridland • Dec 30 '20

Encrypting on the client side leads to a subtle attack, actually - the encrypted password is now a "plaintext equivalent", and the temptation then on the serverside is to do an encrypted_password == stored_hash. And then the attacker can work through, byte by byte, with a timing attack and ultimately find the entire encrypted password.

Whoops.

This is because the typical string comparison code - such as the C library strcmp() call - works through byte by byte in a for loop and bails at the first mismatch. An attacker can time the rejections carefully, and slowly but surely increase the matches.

Yes, it's a slow method - but much faster than brute force, and surprisingly plausible.

As you've noted, libraries like the Python bcrypt provide a check function that - if you dig into the code - you'll probably find will take the same time on any failure or success. These are known as "constant time" functions, and they're useful things.

So, don't encrypt on the client-side - or rather, you can but you should always encrypt server-side as well.

Typically, though, we can trust TLS and skip doing much client-side. Indeed, if you're about to send over a plaintext equivalent - a stable, repeatable transformation of the original input password - there's little point worrying - if an attacker can see "inside" the TLS channel, they can just re-use the plaintext equivalent you're sending anyway.

But in some cases that's not a valid assumption. For example, some bizarre corporate security systems essentially MITM the TLS channel with a root certificate imposed on the client system. If this is within your threat model, look at the design of SCRAM, and use that - that uses a simple, hash-based, mechanism to encrypt and transfer a plaintext equivalent hash and then verify the server knew it, all without storing plaintext passwords (or password equivalents) on the server.

And, of course, also look at implementing TOTP and getting a cheap 2FA system up - whether or not you're worrying about plaintext equivalents.

Kyle Mistele • Dec 28 '20

Yeah, this is correct. Passing a password as an argument to an encrypt function necessarily has to happen somewhere, otherwise how would you encrypt it? To Christopher's point, if someone can mess with your runtime or source code, you have bigger problems.

Encrypting client side is uncommon and probably considered bad practice. Most back-end languages have libraries for performing hashing and so forth, but I'm not aware of many implementations to be used with JavaScript on the front end. You'd either have to roll your own (never roll your own cryptography), or copy/paste someone else's which would be bad. Doing it on the back end also will ensure standardization. As long as you force HTTPS (TLS/SSL) for authentication and authorization, which most browsers will require (many will automatically upgrade from HTTP to HTTPS where possible), you're fine.

Justin Gross • Dec 28 '20 • Edited

While I appreciate the time here discussing auth and cryptography I don't appreciate the countless articles helping users roll their own auth. These articles should be hedged with "use real auth in production" and "for learning purposes only".

If you're not inventing the next auth standard or a better implementation of the current auth standard then you shouldn't be rolling your own auth. Full stop. There's a million (being hyperbolic) things you can get wrong even if you hash passwords right. This is one of the many reasons why OpenID is the standard and everyone writing "real" applications expected to be used by "real" users should use an Identity Management service. Please stop teaching people it's ok to roll their own auth.

Here's an article which does a fairly good job of explaining why.
withblue.ink/2020/04/08/stop-writi...

Kyle Mistele • Dec 28 '20 • Edited

Yeah, there's definitely something to be said for using trusted identity providers like Google - there are a lot of advantages to that, but it has some disadvantages to it as well.

I would certainly never recommend that someone roll their own hash functions or their own implementation of JWT, or heavens forbid their own cryptography, as doing so would inevitably go disastrously wrong.

That being said, federated logins aren't a one-size fits all solution. Plenty of companies choose to use existing hash libraries and well-known, well-maintained auth libraries for session or token management, and many of them have valid reasons for doing so. I personally think there's nothing wrong with doing that as long as you are both deliberate and careful about it.

Justin Gross • Dec 28 '20 • Edited

I agree to a degree. Companies with ample resources (and talented developers) should absolutely contemplate writing their own auth... when existing solutions do not fit their needs and provided that they are unable to contribute to the standards (in spec or impl) or extend an existing open solution. Many, even the "big dogs", also get it wrong; for example Apple's early impl of OpenID. You also don't have to go with a hosted/managed third party to use an IAM nor must it be federated. There are several "host it yourself" IAM solutions which provide OpenID compliant services. Since they are open source you could even extend them with your needed custom auth "additions" for your special edge cases. You can also extend the OpenID spec with your special use cases similar to HEART and UMA.

I think these articles have merit and value for learning. The issue I take is the impression these articles give the inexperienced devs who proliferate the web trying to build their own solutions and products with a team of 1 (or very few), who don't understand (and seem to have the bare minimum concern for users' security) and who are the predominant audience of these articles. For those individuals these articles, almost always without disclaimer, practically vindicate the idea that they can, and should, roll their own auth.

Should a company write their own auth? Maybe. Under very limited reasons and with the right team who have vast experience in auth. Should Joe Blow and the 3 musketeers roll their own auth? If making products for actual consumption, absolutely not. Not if you want to respect your users' privacy and provide them with real security (federated or not). Not if you want to spend time adding value to your actual business instead of rebuilding the wheel. I'm probably being hyperbolic but I would venture to say 90%+ of products don't need custom auth solutions.

If you roll your own auth please, at the very very least, make it publicly available so that the security experts and white hat hackers of the world can tear it apart to help rigorously audit and test it.

Kyle Mistele • Jan 4 '21

Hey, thanks for the feedback! Glad you enjoyed it!