
natamacm


File checksum

A checksum is a sequence of numbers and letters used to check data for errors. You can generate a checksum for any file.

The generation of a checksum is called hashing and is done by a hashing algorithm.


Hashing is not encryption. A hash function is one-way: it produces a hash, and it is computationally infeasible to recover the original data from that hash. Encryption is different, because the plain text can be recovered with the right key.
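To make the one-way property a bit more concrete, here is a minimal sketch using Python's hashlib. The two input strings are made up for illustration. It shows that the digest has a fixed length regardless of the input, and that changing a single character gives a completely different digest.

```python
import hashlib

# Two example inputs (made up for illustration) that differ by one character
digest_a = hashlib.sha256(b"hello world").hexdigest()
digest_b = hashlib.sha256(b"hello world!").hexdigest()

# A SHA-256 digest is always 256 bits, which is 64 hexadecimal characters
print(len(digest_a), len(digest_b))

# A small change in the input produces an unrelated-looking digest
print(digest_a)
print(digest_b)
print(digest_a == digest_b)
```

Nothing in the digest tells you what the input was, which is why a hash works as a fingerprint rather than as a way to store recoverable data.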

You can get the checksum of any file in Python, and there are several algorithms you can use. One of the older ones is md5. It dates from 1992 and practical collision attacks against it are known, so it's better avoided in favor of a newer algorithm.

To see a comparison of algorithms and attacks, check this page

You could use the sha256 algorithm to find a file checksum:

# Python program to find the SHA256 hash string of a file
import hashlib

filename = input("Enter the input file name: ")
sha256_hash = hashlib.sha256()
with open(filename, "rb") as f:
    # Read the file and update the hash in blocks of 4K
    for byte_block in iter(lambda: f.read(4096), b""):
        sha256_hash.update(byte_block)

# Print the digest as a hexadecimal string
print(sha256_hash.hexdigest())
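The same block-wise approach can be wrapped in a reusable function and used for what checksums are meant for: checking a file against a published value. This is only a sketch. `file_sha256` and `verify_checksum` are hypothetical helper names, and the 64 KiB block size is an arbitrary choice.

```python
import hashlib


def file_sha256(path, block_size=65536):
    """Return the SHA-256 hex digest of the file at `path` (hypothetical helper)."""
    sha256_hash = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in blocks so large files never have to fit in memory
        for byte_block in iter(lambda: f.read(block_size), b""):
            sha256_hash.update(byte_block)
    return sha256_hash.hexdigest()


def verify_checksum(path, expected):
    """Return True if the file's digest matches `expected` (hypothetical helper)."""
    # Published checksums are sometimes upper case or padded with whitespace
    return file_sha256(path) == expected.strip().lower()
```

You would call `verify_checksum("download.iso", "<hex digest from the download page>")`, where both arguments are placeholders for your own file name and the checksum published by the file's provider.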

The hashlib module implements many hashing algorithms. You can list them in the Python shell (the exact set depends on your Python build):

>>> import hashlib
>>> hashlib.algorithms_available
{'sha256', 'blake2s', 'blake2b', 'sha384', 'md5', 'sha3_256', 'sha3_224', 'sha1', 'shake_256', 'sha3_512', 'shake_128', 'sha3_384', 'sha224', 'sha512'}
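If you want to pick the algorithm by name, for example from a command-line option, `hashlib.new()` accepts any of the names listed above. A minimal sketch, where `digest_with` is a hypothetical helper name and the sample data is made up:

```python
import hashlib


def digest_with(algorithm, data):
    """Hash `data` (bytes) with an algorithm chosen by name (hypothetical helper)."""
    if algorithm not in hashlib.algorithms_available:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    return hashlib.new(algorithm, data).hexdigest()


# These three names are in hashlib.algorithms_guaranteed on current Python 3.
# The shake_* algorithms are left out because their hexdigest() needs a length.
for name in ("sha256", "sha512", "blake2b"):
    # Each hex character encodes 4 bits of the digest
    print(name, len(digest_with(name, b"example")) * 4, "bits")
```

`hashlib.algorithms_guaranteed` is the subset that is available on every platform, so it is the safer set to rely on in code that has to run elsewhere.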
