DEV Community

gao-sun for Logto

Posted on • Originally published at blog.logto.io

The evolution of password hashing

Introduction

Password hashing, as the name suggests, is the process of calculating a hash value from a password. The hash value is typically stored in a database, and during the login (sign-in) process, the hash value of the password entered by the user is calculated and compared with the hash value stored in the database. If they match, the user is authenticated.

Before we dive into the evolution of password hashing algorithms, it's important to understand why it's necessary.

Plain text passwords: A major security risk

Imagine being a user of a website where you've registered an account. One day, the website gets hacked, and the database is leaked. If the website stores passwords in plaintext, the hacker can directly access your password. Since many people reuse passwords across multiple websites, the hacker can use this password to gain unauthorized access to your other accounts. The situation becomes even worse if you use the same or a similar password for your email account, as the hacker can reset your password and take over all your associated accounts.

Even without a data breach, in large teams, anyone with database access can see passwords. Compared to other information, passwords are highly sensitive, and you definitely don't want anyone to have access to them.

Storing passwords without hashing is an amateur mistake. Unfortunately, if you search for "password leak plaintext", you'll find that major corporations like Facebook, DailyQuiz, and GoDaddy have all experienced password leaks in plaintext. It's likely that many other companies have made the same error.

Encoding v.s. encryption v.s. hashing

These three terms are often confused, but they are distinct concepts.

Encoding

Encoding is the first thing to exclude for password storage. For example, Base64 is an encoding algorithm that converts binary data into a string of characters:

const data = 'Hello, world!';
const encoded = Buffer.from(data).toString('base64');
console.log(encoded); // SGVsbG8sIHdvcmxkIQ==
Enter fullscreen mode Exit fullscreen mode

Knowing the encoding algorithm allows anyone to decode the encoded string and retrieve the original data:

const encoded = 'SGVsbG8sIHdvcmxkIQ==';
const data = Buffer.from(encoded, 'base64').toString();
console.log(data); // Hello, world!
Enter fullscreen mode Exit fullscreen mode

To hackers, most encoding algorithms are equivalent to plaintext.

Encryption

Before hashing gained popularity, encryption was used to store passwords, such as with AES. Encryption involves using a key (or a pair of keys) to encrypt and decrypt data.

The problem with encryption is evident in the term "decrypt". Encryption is reversible, meaning that if a hacker obtains the key, they can decrypt the password and retrieve the plaintext password.

Hashing

The primary difference between hashing, encoding, and encryption is that hashing is irreversible. Once a password is hashed, it cannot be decrypted back to its original form.

As a website owner, you don't actually need to know the password itself, as long as the user can log in with the correct password. The registration process can be simplified as follows:

  1. User enters the password.
  2. The service uses a hashing algorithm to calculate the hash value of the password.
  3. The service stores the hash value in the database.

When the user logs in, the process is:

  1. User enters the password.
  2. The service uses the same hashing algorithm to calculate the hash value of the password.
  3. The service compares the hash value with the hash value stored in the database.
  4. If the hash values match, the user is authenticated.

Both processes avoid storing passwords in plaintext, and since hashing is irreversible, even if the database is compromised, the hacker can only obtain hash values that appear as random strings.

Hashing algorithms starter pack

Hashing may seem like the perfect solution for password storage, but it's not that simple. To understand why, let's explore the evolution of password hashing algorithms.

MD5

In 1992, Ron Rivest designed the MD5 algorithm, a message-digest algorithm that can calculate a 128-bit hash value from any data. MD5 has been widely used in various fields, including password hashing. For example, the MD5 hash value of "123456" is:

e10adc3949ba59abbe56e057f20f883e
Enter fullscreen mode Exit fullscreen mode

As mentioned earlier, the hash value appears as a random string and is irreversible. Moreover, MD5 is fast and easy to implement, making it the most popular password hashing algorithm.

However, MD5's advantages are also its weaknesses in password hashing. Its speed makes it vulnerable to brute-force attacks. If a hacker possesses a list of common passwords and your personal information, they can calculate the MD5 hash value of each combination and compare them with the hash values in the database. For example, they might combine your birthday with your name or your pet's name.

In the present day, computers are significantly more powerful than before, making it easy to brute force MD5 password hashes.

SHA family

So, why not use a different algorithm that generates longer hash values? The SHA family seems like a good choice. SHA-1 is a hashing algorithm that generates 160-bit hash values, and SHA-2 is a family of hashing algorithms that generate hash values of 224-bit, 256-bit, 384-bit, and 512-bit lengths. Let's see the SHA-256 hash value of "123456":

8d969eef6ecad3c29a3a629280e686cf0c3f5d5a86aff3ca12020c923adc6c92
Enter fullscreen mode Exit fullscreen mode

The SHA-256 hash value is much longer than MD5, and it is also irreversible. However, there's another issue: if you already know the hash value, like the one above, and you see the exact hash value in the database, you know the password is "123456". A hacker can create a list of common passwords and their corresponding hash values, and compare them against the hash values in the database. This list is known as a rainbow table.

Salt

To mitigate rainbow table attacks, the concept of salt was introduced. Salt is a random string that is added to the password before hashing. For example, if the salt is "salt", and you want to use SHA-256 to hash the password "123456" with the salt, instead of simply doing:

sha256('123456');
Enter fullscreen mode Exit fullscreen mode

You would do:

sha256('salt123456'); // 9898410d7f5045bc673db80c1a49b74f088fd7440037d8ce25c7d272a505bce5
Enter fullscreen mode Exit fullscreen mode

As you can see, the result is completely different from hashing without salt. Typically, each user is assigned a random salt during registration, which is stored in the database alongside the hash value. During the login process, the salt is used to calculate the hash value of the entered password, which is then compared to the stored hash value.

Iteration

Despite the addition of salt, the hash value is still susceptible to brute-force attacks as hardware becomes more powerful. To make it harder, iteration (i.e., running the hashing algorithm multiple times) can be introduced. For example, instead of using:

sha256('salt123456');
Enter fullscreen mode Exit fullscreen mode

You could use:

sha256('salt' + sha256('salt123456'));
Enter fullscreen mode Exit fullscreen mode

Increasing the number of iterations makes brute-forcing more difficult. However, this also affects the login process, as it becomes slower. Therefore, a balance between security and performance needs to be struck.

Halftime break

Let's take a break and summarize the characteristics of a good password hashing algorithm:

  • Irreversible (preimage resistance)
  • Difficult to brute force
  • Resistant to rainbow table attacks

As you may have noticed, salt and iteration are necessary to satisfy all these requirements. The issue is that both MD5 and the SHA family were not specifically designed for password hashing; they are widely used for integrity checks (or "message-digest"). As a result, each website may have its own implementation of salt and iteration, making standardization and migration challenging.

Password hashing algorithms

To address this problem, several hashing algorithms have been specifically designed for password hashing. Let's take a look at some of them.

bcrypt

bcrypt is a password hashing algorithm designed by Niels Provos and David Mazières. It is widely used in many programming languages. Here's an example of bcrypt hash value:

$2y$12$wNt7lt/xf8wRJgPU7kK2juGrirhHK4gdb0NiCRdsSoAxqQoNbiluu
Enter fullscreen mode Exit fullscreen mode

Although it appears as another random string, it contains additional information. Let's break it down:

[$2y][$12][$wNt7lt/xf8wRJgPU7kK2ju][GrirhHK4gdb0NiCRdsSoAxqQoNbiluu]
Enter fullscreen mode Exit fullscreen mode
  • The first section $2y indicates the algorithm, which is 2y.
  • The second section $12 indicates the number of iterations, which is 12. This means the hashing algorithm will be run 212=4096 times (iterations).
  • The third section wNt7lt/xf8wRJgPU7kK2ju is the salt.
  • The last section GrirhHK4gdb0NiCRdsSoAxqQoNbiluu is the hash value.

bcrypt has some limitations:

  • The maximum length of the password is 72 bytes.
  • The salt is limited to 16 bytes.
  • The hash value is limited to 184 bits.

Argon2

Given the debates and limitations of existing password hashing algorithms, a password hashing competition was held in 2015. Skipping the details, let's focus on the winner: Argon2.

Argon2 is a password hashing algorithm designed by Alex Biryukov, Daniel Dinu, and Dmitry Khovratovich. It introduces several new concepts:

  • Memory-hard: The algorithm is designed to be hard to parallelize, making brute-forcing with GPUs challenging.
  • Time-hard: The algorithm is designed to be hard to optimize, making brute-forcing with ASICs (Application-specific integrated circuits) difficult.
  • Side-channel resistant: The algorithm is designed to be resistant to side-channel attacks, such as timing attacks.

There are two main versions of Argon2, Argon2i and Argon2d. Argon2i is the safest against side-channel attacks, while Argon2d provides the highest resistance against GPU cracking attacks.

-- Argon2

Here's an example of an Argon2 hash value:

$argon2i$v=19$m=16,t=2,p=1$YTZ5ZnpXRWN5SlpjMHBDRQ$12oUmJ6xV5bIadzZHkuLTg
Enter fullscreen mode Exit fullscreen mode

Let's break it down:

[$argon2i][$v=19][$m=16,t=2,p=1][$YTZ5ZnpXRWN5SlpjMHBDRQ][$12oUmJ6xV5bIadzZHkuLTg]
Enter fullscreen mode Exit fullscreen mode
  • The first section $argon2i indicates the algorithm, which is argon2i.
  • The second section $v=19 indicates the version, which is 19.
  • The third section $m=16,t=2,p=1 indicates the memory cost, time cost, and parallelism degree, where are 16, 2, and 1.
  • The fourth section $YTZ5ZnpXRWN5SlpjMHBDRQ is the salt.
  • The last section $12oUmJ6xV5bIadzZHkuLTg is the hash value.

In Argon2, the maximum length of the password is 232-1 bytes, the salt is limited to 232-1 bytes, and the hash value is limited to 232-1 bytes. This should suffice for most scenarios.

Argon2 is now available in many programming languages, such as node-argon2 for Node.js and argon2-cffi for Python.

Conclusion

Over the years, password hashing algorithms have undergone significant evolution. We owe a debt of gratitude to the security community for their decades of effort in making the internet a safer place. Thanks to their contributions, developers can pay more attention to building better services without worrying about the security of password hashing. While achieving 100% security in a system may be unattainable, we can employ diverse strategies to minimize the associated risks.

If you'd like to avoid the hassle of implementing authentication and authorization, feel free to try Logto for free. We provide secure (we use Argon2!), reliable, and scalable solutions, enabling you to focus on building your product.


Try Logto Cloud today

Top comments (0)