Files can become corrupted because of faults in the storage media (disks, flash drives), errors while transmitting or copying them to another storage device, or simply because of software bugs.
The most common way to detect file corruption and verify a file's integrity is to use a checksum.
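As a rough illustration of the idea, the sketch below computes a CRC32 checksum over a byte slice and shows that flipping a single bit changes it. The `checksum` helper and the sample data are made up for this example; only `crc32.MakeTable`, `crc32.Checksum` and `crc32.Castagnoli` come from Go's standard `hash/crc32` package.

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// castagnoli is the lookup table for the Castagnoli polynomial (CRC-32C).
var castagnoli = crc32.MakeTable(crc32.Castagnoli)

// checksum is a hypothetical helper that returns the CRC-32C of data.
func checksum(data []byte) uint32 {
	return crc32.Checksum(data, castagnoli)
}

func main() {
	original := []byte("hello, checksum")
	stored := checksum(original)

	// Simulate corruption by flipping a single bit in a copy of the data.
	corrupted := append([]byte(nil), original...)
	corrupted[0] ^= 0x01

	fmt.Println("intact data matches:", checksum(original) == stored)
	fmt.Println("corrupted data matches:", checksum(corrupted) == stored)
}
```

The checksum is stored next to the data when it is written and recomputed when the data is read; a mismatch means the bytes changed somewhere in between.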
I’ll show you how to embed a CRC32 checksum of a file's content into the file itself while writing it, and how to verify that checksum while reading the file back, in Go.
WriteWithChecksum(..) takes a writeF func(w io.Writer) error function that writes the actual content to the file, and stores the calculated CRC32 checksum in the 4 bytes just before that content (the beginning of the file, when the file is freshly created).
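import (
	"encoding/binary"
	"fmt"
	"hash/crc32"
	"io"
	"os"
)

func WriteWithChecksum(file *os.File, writeF func(w io.Writer) error) error {
	// Leave space for the checksum: skip crc32.Size (4) bytes from the
	// current position
	offset, err := file.Seek(crc32.Size, io.SeekCurrent)
	if err != nil {
		return err
	}
	// Create a CRC32 hash that uses the Castagnoli polynomial (CRC-32C)
	crc := crc32.New(crc32.MakeTable(crc32.Castagnoli))
	// MultiWriter creates a writer that duplicates its writes to all the
	// provided writers, so the content goes to both the file and the hash.
	w := io.MultiWriter(file, crc)
	// Call the actual write function
	err = writeF(w)
	if err != nil {
		return err
	}
	// Encode the checksum as 4 big-endian bytes
	crcBytes := make([]byte, crc32.Size)
	binary.BigEndian.PutUint32(crcBytes, crc.Sum32())
	// Write the checksum into the space reserved before the content.
	// Note: WriteAt returns an error if the file was opened with os.O_APPEND.
	_, err = file.WriteAt(crcBytes, offset-crc32.Size)
	if err != nil {
		return err
	}
	// Flush the file contents to stable storage
	return file.Sync()
}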
Similarly, ReadWithChecksum(..) takes a readF func(r io.Reader) error function that reads the actual content, and verifies integrity by comparing the checksum stored in the file with the checksum calculated over the content once readF has read all of it.
func ReadWithChecksum(file *os.File, readF func(r io.Reader) error) error {
	// Read the stored CRC32 bytes at the beginning of the file.
	// io.ReadFull fails if fewer than crc32.Size bytes are available,
	// whereas a plain Read may silently return a short read.
	crcBytes := make([]byte, crc32.Size)
	_, err := io.ReadFull(file, crcBytes)
	if err != nil {
		return err
	}
	// Convert bytes to uint32
	storedSum := binary.BigEndian.Uint32(crcBytes)
	// Create a CRC32 hash that uses the Castagnoli polynomial (CRC-32C)
	crc := crc32.New(crc32.MakeTable(crc32.Castagnoli))
	// TeeReader returns a Reader that writes to crc hash what it reads from the file.
	r := io.TeeReader(file, crc)
	// Call the actual read function. It must read the content to the end,
	// otherwise the calculated checksum covers only part of the content.
	err = readF(r)
	if err != nil {
		return err
	}
	// Calculate CRC32 checksum of the content that was read
	computedSum := crc.Sum32()
	// Compare the stored checksum with the calculated one
	if storedSum != computedSum {
		return fmt.Errorf("crc32 validation error. stored: %d, computed: %d", storedSum, computedSum)
	}
	return nil
}
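To see the two functions working together, here is a small round-trip sketch. It is not part of the original code: `writeWithChecksum` and `readWithChecksum` are condensed variants of the functions above, included only so that the example compiles on its own, and `roundTrip` is a hypothetical helper that writes a non-empty payload to a temporary file, optionally flips one bit of the stored content, and reads the file back.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/crc32"
	"io"
	"os"
)

var table = crc32.MakeTable(crc32.Castagnoli)

// writeWithChecksum is a condensed variant of WriteWithChecksum.
func writeWithChecksum(f *os.File, writeF func(io.Writer) error) error {
	offset, err := f.Seek(crc32.Size, io.SeekCurrent)
	if err != nil {
		return err
	}
	crc := crc32.New(table)
	if err := writeF(io.MultiWriter(f, crc)); err != nil {
		return err
	}
	buf := make([]byte, crc32.Size)
	binary.BigEndian.PutUint32(buf, crc.Sum32())
	if _, err := f.WriteAt(buf, offset-crc32.Size); err != nil {
		return err
	}
	return f.Sync()
}

// readWithChecksum is a condensed variant of ReadWithChecksum.
func readWithChecksum(f *os.File, readF func(io.Reader) error) error {
	buf := make([]byte, crc32.Size)
	if _, err := io.ReadFull(f, buf); err != nil {
		return err
	}
	crc := crc32.New(table)
	if err := readF(io.TeeReader(f, crc)); err != nil {
		return err
	}
	if stored, computed := binary.BigEndian.Uint32(buf), crc.Sum32(); stored != computed {
		return fmt.Errorf("crc32 mismatch: stored %d, computed %d", stored, computed)
	}
	return nil
}

// roundTrip (hypothetical helper) writes a non-empty payload to a temporary
// file, optionally corrupts it, and returns what was read back plus any error.
func roundTrip(payload string, corrupt bool) (string, error) {
	f, err := os.CreateTemp("", "checksum-demo-*")
	if err != nil {
		return "", err
	}
	defer os.Remove(f.Name())
	defer f.Close()

	err = writeWithChecksum(f, func(w io.Writer) error {
		_, err := io.WriteString(w, payload)
		return err
	})
	if err != nil {
		return "", err
	}
	if corrupt {
		// Flip one bit of the first content byte, right after the checksum.
		if _, err := f.WriteAt([]byte{payload[0] ^ 0x01}, crc32.Size); err != nil {
			return "", err
		}
	}
	// Rewind: writing left the file offset at the end of the file.
	if _, err := f.Seek(0, io.SeekStart); err != nil {
		return "", err
	}
	var got []byte
	err = readWithChecksum(f, func(r io.Reader) error {
		var err error
		got, err = io.ReadAll(r) // read to EOF so the checksum covers everything
		return err
	})
	return string(got), err
}

func main() {
	got, err := roundTrip("hello, world", false)
	fmt.Printf("intact: %q, err = %v\n", got, err)
	_, err = roundTrip("hello, world", true)
	fmt.Printf("corrupted: err = %v\n", err)
}
```

Note that the read callback has already consumed the content by the time the mismatch is reported, so the caller should treat the data as untrusted until the function returns without an error.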
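CRC32 is designed to catch accidental corruption, not deliberate modification. If you don’t want to use CRC32, or need a cryptographic hash such as SHA-256, the hashing function is easy to replace: anything that implements hash.Hash can be plugged into io.MultiWriter and io.TeeReader in the same way.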
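One possible way to do that, sketched under a few assumptions: the functions take a `hash.Hash` instead of hard-coding CRC32, and the stored digest is `h.Size()` bytes long. `WriteWithHash`, `ReadWithHash` and `demo` are hypothetical names introduced for this example, and SHA-256 is just one choice of hash.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
	"hash"
	"io"
	"os"
)

// WriteWithHash (hypothetical) reserves h.Size() bytes, streams the content
// through h, then stores the digest in the reserved space.
func WriteWithHash(file *os.File, h hash.Hash, writeF func(w io.Writer) error) error {
	h.Reset()
	size := int64(h.Size())
	offset, err := file.Seek(size, io.SeekCurrent)
	if err != nil {
		return err
	}
	if err := writeF(io.MultiWriter(file, h)); err != nil {
		return err
	}
	if _, err := file.WriteAt(h.Sum(nil), offset-size); err != nil {
		return err
	}
	return file.Sync()
}

// ReadWithHash (hypothetical) reads the stored digest, streams the content
// through h, and compares the stored digest with the computed one.
func ReadWithHash(file *os.File, h hash.Hash, readF func(r io.Reader) error) error {
	h.Reset()
	stored := make([]byte, h.Size())
	if _, err := io.ReadFull(file, stored); err != nil {
		return err
	}
	if err := readF(io.TeeReader(file, h)); err != nil {
		return err
	}
	if computed := h.Sum(nil); !bytes.Equal(stored, computed) {
		return fmt.Errorf("digest mismatch: stored %x, computed %x", stored, computed)
	}
	return nil
}

// demo writes a non-empty payload with a SHA-256 digest, optionally corrupts
// one content byte, and returns the error from ReadWithHash (nil when intact).
func demo(payload []byte, corrupt bool) error {
	f, err := os.CreateTemp("", "hash-demo-*")
	if err != nil {
		return err
	}
	defer os.Remove(f.Name())
	defer f.Close()

	err = WriteWithHash(f, sha256.New(), func(w io.Writer) error {
		_, err := w.Write(payload)
		return err
	})
	if err != nil {
		return err
	}
	if corrupt {
		// The content starts right after the 32-byte SHA-256 digest.
		if _, err := f.WriteAt([]byte{payload[0] ^ 0xFF}, sha256.Size); err != nil {
			return err
		}
	}
	if _, err := f.Seek(0, io.SeekStart); err != nil {
		return err
	}
	return ReadWithHash(f, sha256.New(), func(r io.Reader) error {
		_, err := io.Copy(io.Discard, r)
		return err
	})
}

func main() {
	fmt.Println("intact:", demo([]byte("hello, world"), false))
	fmt.Println("corrupted:", demo([]byte("hello, world"), true))
}
```

Because `crc32.New` also returns a value that satisfies `hash.Hash`, the same two functions can still be used with CRC32. Keep in mind that a plain digest stored in the same file only detects corruption: someone who can modify the file can also recompute the digest.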