You can download the code for this article from GitHub.
Elixir itself does not have functions to calculate hashes of files or data, but as usual you can use Erlang modules to do that. The crypto module offers several cryptographic services, including many hashing algorithms.
The easiest option is the hash function. It takes an atom indicating one of the supported algorithms and the data to be hashed. When hashing a file, you need to call File.read! or similar to read the data before calling hash:
data = File.read!("sample.pdf")
sha256 = :crypto.hash(:sha256, data)
The problem with using the hash function is that it only works if the whole file is loaded into memory. When working with large files, this can quickly degrade the performance of the application or even crash it.
An alternative is using the "streaming mode" of the hashing functions. Instead of feeding the data to the hashing function at once, you read the data in pieces and apply the hashing algorithm to each piece in sequence, updating its internal state until all data has been processed and the hashing algorithm has its final result. This is how these hashing algorithms actually work and this mode is available in other programming languages too.
Initialize hashing algorithm context
While there is more data:
    Feed the algorithm a piece of data
Get the final result from the hashing algorithm
In Elixir, this can be implemented using the File.stream! function and the hash_init, hash_update and hash_final functions from the crypto module:
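initial_hash_state = :crypto.hash_init(:sha256)

# The empty list is the list of file modes; 2048 is the chunk size in bytes.
# Elixir 1.16+ prefers the shorter form File.stream!("sample.pdf", 2048).
sha256 =
  File.stream!("sample.pdf", [], 2048)
  |> Enum.reduce(initial_hash_state, &:crypto.hash_update(&2, &1))
  |> :crypto.hash_final()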
hash_init creates a "hash state" object that is updated by the hashing algorithm as new data is processed. At this point, its state is equivalent to hashing an empty file.
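As a quick sanity check of that claim, the sketch below finalizes a freshly initialized state without feeding it any data and compares the result with the one-shot hash of an empty binary. The variable names are illustrative only.

```elixir
# Finalize a brand-new hash state without calling hash_update at all.
empty_state_digest =
  :sha256
  |> :crypto.hash_init()
  |> :crypto.hash_final()

# One-shot hash of an empty binary, standing in for an empty file.
empty_data_digest = :crypto.hash(:sha256, "")

# Both digests should be the same 32-byte binary.
IO.inspect(empty_state_digest == empty_data_digest, label: "same digest?")
```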
File.stream! produces an enumerable in which each item is a binary with length of up to 2048 bytes (in this example). This parameter can be tuned according to memory usage and performance requirements: larger buffers are faster but use more memory.
The enumerable returned by File.stream! is lazy, and sometimes you need to explicitly execute it by calling Stream.run. Alternatively, most functions from the Enum module will trigger the execution of the file stream.
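To make the laziness concrete, here is a small sketch that runs the same kind of file stream once with Stream.run and once with an Enum function. It writes its own throwaway file (the name lazy_stream_demo.txt is just an assumption for this example) and uses a tiny 4-byte chunk size so the pieces are easy to see.

```elixir
# Create a small temporary file so the example is self-contained.
path = Path.join(System.tmp_dir!(), "lazy_stream_demo.txt")
File.write!(path, "abcdefghij")

# Building the pipeline reads no chunks yet; Stream.each only describes the work.
pipeline =
  File.stream!(path, [], 4)
  |> Stream.each(fn chunk -> IO.inspect(chunk, label: "chunk") end)

# Stream.run forces the stream purely for its side effects and returns :ok.
run_result = Stream.run(pipeline)

# Enum functions also force the stream; here they collect every chunk in a list.
chunks = Enum.to_list(File.stream!(path, [], 4))

File.rm!(path)
```

The last chunk is shorter than the others because the file size is not a multiple of the chunk size, which is why each item is described as having a length of "up to" the configured number of bytes.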
Inside Enum.reduce, we call hash_update, passing the current hash state and the data to be processed. It returns the new state of the hasher, to be updated with the next item from the file stream or returned as the final result of the reduction.
Once we have the hash state after processing the last piece of data, we call hash_final to get the calculated digest as a binary.
The result of hashing algorithms is a sequence of bytes (16 for MD5, 20 for SHA1, 32 for SHA256, etc.), but usually we present them in hexadecimal format. To do that, use the Base.encode16 function.
formatted_sha256 = Base.encode16(sha256, case: :lower)
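Putting the pieces together, the steps above could be wrapped in a small helper. This is only a sketch: the FileHasher module, its hash_file function and the default chunk size are hypothetical names chosen for this example, not part of Elixir or Erlang. The usage part writes a temporary file and checks that the streamed digest matches the one-shot digest regardless of the chunk size.

```elixir
defmodule FileHasher do
  # Hypothetical helper for illustration; not part of Elixir or the crypto module.
  @default_chunk_size 2048

  # Streams the file in chunks and returns the digest as a lowercase hex string.
  def hash_file(path, algorithm \\ :sha256, chunk_size \\ @default_chunk_size) do
    path
    # Older argument order, kept to match the article; Elixir 1.16+ prefers
    # File.stream!(path, chunk_size).
    |> File.stream!([], chunk_size)
    |> Enum.reduce(:crypto.hash_init(algorithm), fn chunk, state ->
      :crypto.hash_update(state, chunk)
    end)
    |> :crypto.hash_final()
    |> Base.encode16(case: :lower)
  end
end

# Usage: build a throwaway file and compare streaming against one-shot hashing.
path = Path.join(System.tmp_dir!(), "file_hasher_demo.bin")
File.write!(path, :binary.copy("elixir", 1000))

streamed_small = FileHasher.hash_file(path, :sha256, 512)
streamed_default = FileHasher.hash_file(path)
one_shot = :crypto.hash(:sha256, File.read!(path)) |> Base.encode16(case: :lower)

IO.inspect(streamed_small == one_shot and streamed_default == one_shot, label: "digests match?")

File.rm!(path)
```

The chunk size only affects memory usage and speed; the resulting digest is the same for any chunk size.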