DEV Community

Cover image for What is a hash function?
Pedro Aravena for Vaultree

Posted on • Updated on

What is a hash function?

What is a hash function?

A hash function or algorithm transforms an amount of data (which can be arbitrarily large) into a fixed-length hash. Let's take a dummy hash function called “H” that takes an “x” parameter. H(“love”) could be equal to 250961 and H(“roma”) equal to 970845. Both results have the same number of digits (six), however they are remarkably different and we cannot see a correlation between the two values obtained , even using the same letters in a different order.

This function is called unidirectional, since from “love” we arrive at 250961 with a predictable computational cost, but it is basically impossible to do the opposite process (find out the information that has been hashed). An example of a function whose inverse process is easy to perform would be a function S that takes an integer and adds 25 to it; we know that, given an integer x, if the output of the function S is 35, it is because the input was the number 10.

An example of a one-way function involves multiplying arbitrarily large prime numbers. Multiplying, for example, 12,637 by 223, we get 2,818,051, this is trivial. Now try to find the two prime numbers such that when multiplied together they yield 1956803. Extremely difficult, and it is this problem that makes operations with prime numbers the basis for RSA encryption.

See, for example, the sentence below, with 18 characters (spaces are also characters):

I love Vaultree

Submitting it to SHA-256 (you can do the same in some online converter), we will get the following result:

C36e8b9d0b1d406070c04a755df7c10ad8bd3ea9ab0470f30feb306d408d00db

Note that the textual output is 64 characters long and is basically a number. However, it is represented not with our decimal system, but with hexadecimal (commonly used in the field of Computer Science), which includes the digits from 0 to 9 and the letters from A to F, totaling 16 symbols — hence the name.

This same number can be represented in binary or decimal base:

Binary

1100111101011010101111000000000100101001100001000010100111000010000010110100011101101111001100110100011010011001010111000101100010001010111111101001111100110110010101110010011111111010101001000111011100000010001100011111000101011000000011100000111010101011

Decimal

9.378907345221151 * 10⁷⁶

The hash we got is nothing more than a gigantic number displayed in hexadecimal format to represent the phrase “I love Vaultree”. Now suppose we forgot to include some letter or word and we want to do it now.

I love Vaultree's SDK

This change, which in our eyes is so subtle, causes big changes in the hash that is generated, look:

Old hash:

C36e8b9d0b1d406070c04a755df7c10ad8bd3ea9ab0470f30feb306d408d00db

New hash:

A136f1bf0c29682db0a86bf871b9f4f8eb6bf3869ab1f0d8fae036ce47c87839

It is worth mentioning that the length remains the same, as expected, but the contents have little resemblance to each other. Another characteristic of the function used to generate the hash is that it is deterministic, which means that it will produce the same result whenever the same input is used.

-

At Vaultree we are building an encrypted future. We love sharing valuable information and trends to help you keep your data safe. Sign up to stay in the loop and discuss the hottest trends in cybersec with a team of experts.

Image description

Top comments (1)

Collapse
 
robconery profile image
Rob Conery • Edited

Hi Pedro - great article! One thing to consider is that prime factorization is not a one-way function as it is simply multiplication which means, by definition, it can also be divided. It's difficult to guess and if the numbers are big enough (as with RSA) it's computationally impossible to do it in a reasonable amount of time. To be fair, prime factorization and RSA have more to do with asymmetric key encryption, not hashing. If I have your public key, I can encrypt something that you can then decrypt with your private key. Hashing is different - there's no way to unwind or decrypt a hash.

A one-way function uses modular math (which I'm sure you know) which some people call "clock math". If I add 32 hours to the current time, the answer might be 0845. There's no way to go backwards on that to figure out what both the start and offset were.

Also - it's more than "quite difficult" to figure out the opposite process, it's literally impossible, thus the term "one-way".

Hashing functions are so much fun to play with but extremely hard to get right. Speed plays a huge part in the algorithm - for things like checksums you want it fast but for passwords you kind of want it slow (say 500ms or something). This has to do with rainbow hacks etc.

Thanks for the article!