A hash function takes an entry, for example, a number, phrase, file or password and calculates a sequence from it, usually represented in hexadecimal format.
The image below shows an example of how Hash works, given a value in the set on the left, a representation is calculated in the set on the right and a mathematical function is applied to calculate a new value.
This function is only didactic, as it generates collision very easily. For example: the words SECRET and SERGEDO would generate the same Hash.
Examples of hash functions are: MD4, MD5, SHA1, SHA256, SHA512.
How a Hash Function Works
The first prerogative of a hash function is that, given an entry and the hash calculated, it should not be possible to calculate the original value back. In the example function, the source word could have been any combination of the letters involved or even other letters and larger or smaller words.
Another prerogative of Hash algorithms is that, given an input, the output is always the same, for example, the Hash MD5 of the letter A will always be: bf072e9119077b4e76437a93986787ef, no matter how many times we use it.
In an efficient hash algorithm, when changing 1 bit of the input, at least half of the output bits must be changed, so while the MD5 of the letter A is bf072e9119077b4e76437a93986787ef, the MD5 of the letter B, which is only 1 bit apart is 30cf3d7d133b08543cb6c8933cdf6 , something totally different.
This is to avoid reversing the Hash, that is, trying to deduce what the entry was from a known Hash.
That is why Hash algorithms are good for storing passwords, as a user who has set his password with the word "secret" will have stored the word or password file 88dabf3d5306998c155971257440f3be in hexadecimal notation in the password database or file.
Someone who has access to this database will not know directly that the password was "secret" and in fact neither the letters that were part of it or that the password was 7 letters long.
When that same user tries to authenticate, the system should calculate the Hash MD5 of the password entered and compare it with the stored word 88dabf3d5306998c155971257440f3be, if they match, the user has set the password.
MD5 Hash Issues
But there is a problem, the Hash MD5 for the word “secret” is and always will be 88dabf3d5306998c155971257440f3be, that is, a brute force attack (where we test several words and calculate your MD5) would find short words that exist in the dictionary quickly.
This is even more serious in systems that only allow numeric passwords, many users use dates of birth, birth of children or marriage, if we consider all dates from 1900 to 2100, we have only 73048 days, in a system that asks for 8 digits where the user had 100 million possible combinations, testing only 0.073% of the combinations should break close to 99% of passwords.
That's why we ask users to add symbols, numbers and vary between upper and lower case, to increase the complexity of the attack.
Brute force attacks can be optimized for dictionary attacks. For example, we can calculate the hash of all words in English and Portuguese, names of people, pets, cities and dates with 6 and 8 digits, in American and international format.
Using more advanced hashes to prevent attacks
To avoid dictionary attacks or rainbow tables, the Salt technique is used. In a simple example, let's say the salt is the letter A for the first time and the letter B for the second time. In this case we would calculate the MD5 for "Asenha" and "Bsenha", which are respectively: bfea17d7c6a92d55f77b81a50326f421 and e6870c54d31bc0519cbc243a0cc2ccdc, that is, totally different words.
We store salt along with Hash, for example, you can see this format on some systems: $ A $ bfea17d7c6a92d55f77b81a50326f421 and $ B $ e6870c54d31bc0519cbc243a0cc2ccdc. Salt does not have to be secret, as it only serves to prevent rainbow table attacks.
To validate a user's password, we take the stored Salt, concatenate the entered password and calculate the Hash, if it matches the stored word, the user has set the password.
Of course, in production environments, Salt is larger than 1 byte in the example.
Salt does not prevent dictionary attacks or brute force attacks (trying all combinations of letters, numbers and symbols), but the computational cost of cracking a password is no longer viable. For example: a 10-character password where the customer can choose uppercase, lowercase, numbers and some symbols can take several years.
Another point about hash algorithms is that they allow collision, that is, two or more data entries can generate the same hash, although a good algorithm tends to generate few collisions.
It is impossible to avoid collision, since in the set of inputs there is an infinite possibility (we can have from small files, with a single byte, to files with dozens of terabytes, where the content is random), while in the output set the possibilities are finite (for example, in the MD5 algorithm, we have 128 bits of output, which would give 16 characters, or 32 characters in hexadecimal notation), a large but still limited amount.
There are practical examples of collision on MD5. At www.mscs.dal.ca/~selinger/md5collision/ we have two strings that generate the same MD5 (note that they are slightly different):
d131dd02c5e6eec4693d9a0698aff95c2fcab58712467eab4004583eb8fb7f89 55ad340609f4b30283e488832571415a085125e8f7cdc99fd91dbdf280373c5b d8823e3156348f5bae6dacd436c919c6dd53e2b487da03fd02396306d248cda0 e99f33420f577ee8ce54b67080a80d1ec69821bcb6a8839396f9652b6ff72a70
d131dd02c5e6eec4693d9a0698aff95c2fcab50712467eab4004583eb8fb7f89 55ad340609f4b30283e4888325f1415a085125e8f7cdc99fd91dbd7280373c5b d8823e3156348f5bae6dacd436c919c6dd53e23487da03fd02396306d248cda0 e99f33420f577ee8ce54b67080280d1ec69821bcb6a8839396f965ab6ff72a70
There are other demonstration examples, such as two PDFs with different contents, but the same size, having been changed in the Metadata of the files so that there was an MD5 collision, or two different executable programs with the same MD5.
To avoid receiving a malicious file, a tactic is to use two different hash algorithms, computationally it is very expensive to generate two files that collide with two different algorithms, for example, MD5 and SHA1.
Despite these problems, Hash is an excellent solution for storing and validating passwords and for checking data and files received (ISO downloads for example).
To avoid problems, always try to use the latest available options and libraries provided by the programming language.
IMPORTANT 1: MD5 was used here only as a didactic example, it is known that it is possible to calculate collision in MD5 from studies in 2006, that is, a long time ago…
IMPORTANT 2: the Hash calculation example shown in figure 1 is for didactic use only, do not use in production
IMPORTANT 3: the example of Salt presented here to store passwords, although correct, is very simplistic. If you need to store and authenticate passwords on your system, use the language's native libraries, you won't (like I don't consider myself) becoming a cryptography expert by reading an article on a blog.
Post by: GR3 WEB - criação de sites