Calling the brute(force) squad

#security #cryptography #passwords

I got this picture in my family chat recently with a the question "is this correct?" The short answer is "kinda". The long answer is this blog post 🙂

What is Brute Forcing

Put simply, brute forcing means that a password is guessed. This is as opposed to it having be stolen, or intercepted circumvented via other means. Typically its guessed without the attacker knowing any other information about the target personally (other passwords, pets names...).

In order to calculate the time needed to brute force a password you need to know two things. How long does each guess take and how many guesses on average do they need in order to guess the password. Its important to know what is being guessed and how. Is this someone trying a login form on a website? Trying to unlock a phone they have in their possession? Maybe they're trying to decrypt a file. Typically though, charts like the one above refer to cracking a hash.

What's a hash?

A hash is the result of a "one way function" that takes in some data (call it "plaintext") and returns other data of a fixed length (a hash). "One way" means its not reversible. You cannot determine the plaintext from the hash.

For the sake of illustrating lets design a simple to understand hashing algorithm (method). It takes in a number, and returns the digital root, which is the sum of all the digits, taken until there is only 1 digit. For example, the digital root of 135 is 9 because 1+3+5 = 9. The digital root of 862=7 because 8+6+2 = 16 and 1+6 = 7. This is a hashing function because it takes in arbitrary data (any number), returns a fixed length hash (1 character long) and is irreversible (a hash of 4 can come from 31, 94, 22...)

In practice (for reasons we'll discuss later) hashing algorithms are more complex. Common examples include MD5 and SHA256.

There are a few reasons why people might want to use a hash instead of storing the plaintext directly. For one, it saves space, while the plain text may be of any length a hash is fixed and usually on a few character. For instance the MD5 hash of the small phrase "it was the best of times, it was the worst of times" is 4C0D0D10C00D6794492E86E3175F0A55, which is the same length of the hash of the the entirety of Shakespeare's Henry XIII 0449dd00c18ff30e70637dbc915912d4.

When storing passwords in a database this is more secure than storing the "plaintext" because if the database is breached, the attacker does not get the whole password, just the hash. At which point, if they want to recover the password, they need to "crack the hash".

Guessing speed

If the attacker is guessing passwords via a website login form they have to wait for the request to be sent to the remote server and returned. Perhaps the remote server limits how fast an attacker can guess passwords, in that case it may take a very long time to make a guess. This is what happened in 2014 when attackers broke into the iCloud accounts of various celebrities, they simply "guessed" the passwords. Since then iCloud has implemented a limit to how many times a user could enter a wrong password before locking the user out. On an iPhone, there is a physical backing off. The more times someone guessing wrong, the longer they have to wait to try a new password. In this case, brute forcing even a 4 character password could take decades.

Limiting the possible options

I once received an encrypted document via email. In the email, there were instructions which told me that the document itself did not contain any personal information (for my security), it went on however to tell me that the password was my social security number! By saying in the email that the password was a social, the attacker now knows they can easily brute force the encryption, because a US social security number is 9 digits.

9 digits is 1 billion options, which means on average a computer will guess the password after 500 million tries. For a computer this only takes a few seconds. For comparison, if the password could have upper and lowercase letters and number then each character would have 62 options, a 6 character password then would have ~56 billion options (62^6) and would take about 56 times as long to brute force. There are also 34 special characters, so a "secure" password of 8 uppercase, lowercase, numbers and special characters would have 6,634,204,312,890,625 options. Which is on average 3 million times harder to brute force than 9 digits. This is the basis of the chart above.

(Importantly, none of this matters if your password is p@ssword1. Hackers will likely be guessing common passwords before they try brute forcing. Password reuse attacks are also more common and easier to pull off than brute forcing. So easy that after the LinkedIn breach of 2016, other non-breached sites chose to reset **their* users passwords just to proactively stop it.*)

Collisions

If a hacker is brute forcing, they still don't need to try every possible combination. In fact, because of the way that hashes work, there are infinite amounts of plain texts that will result in the same hash. This is called a "collision". To use an the digital root hashing algorithm as an example, a user may set a 6 digit password of 938533, this is stored as "4" in the database. In this case, 6 digits is 1 million options (average 500K guesses). However an attacker doesn't need to try that many, they really just have to try 1-10 (average 5) guesses. Since 4,13,22,31, and 938533 all result in the same hash (4), this made up system will accept all of them as correct passwords. In practice since hashes are usually longer than your average password, collisions are generally not this easy to find. That is unless there is something wrong with the hashing algorithm...

An "ideal hash function" is a hash function that essentially acts as a random mapping between inputs and outputs. If a hash function was ideal, that means that the only way to find a collision is by brute forcing. In practice however, hash functions are not ideal and even the best ones we have are constantly being challenged. The MD5 hashing algorithm for instance was once considered secure, but was later discovered to have serious collision vulnerabilities making even the strongest password trivially crackable.

Hashes stored incorrectly can also causes problems. Since a plaintext will always result in the same hash, an attacker could come up with a list of mappings between plaintexts and hashes. Allowing them to save time, only having to calculate the hashes once. These exist and are called rainbow tables. Rainbow tables can be fairly large, a list of all hashes for 8 characters can be several hundred terabytes. Larger than what most PCs have available but can be stored on the cloud for a few hundred dollars. To counter this, applications can "salt" the hashes. Which means it adds some random characters to the plaintext before storing it. (the random characters are stored with the password to help with verifying passwords later). Those extra characters make rainbow tables infeasible (a rainbow table of 20 character passwords would take up more space than all the hard drives ever made).

Putting it all together

So what we've learned here. Brute forcing is guessing passwords, the more characters the more possibilities and the longer it takes to guess. Strong passwords with more types of characters are harder to guess. Still, that's not the only thing that matters. Weak hashing or storage choices can compromise even the most secure passwords, and a common password is not secure regardless of how many special characters you use.

If you really want to protect yourself, the best advice is to use a password manager (lastpass, keepass and 1password are all good solutions) to generate and store your passwords. Don't reuse credentials to prevent against a reuse attack and enable two factor authentication when available.

DEV Community