CryptoPals Crypto Challenges Using Rust: Break repeating-key XOR

#rust #programming #cryptography

This is Challenge 6 of Cryptopals challenges implemented in Rust language.

Context 💡

"It is officially on, now". Indeed.
So this is the first challenge that will require more than a couple cells of your brain to get around.

So we're given a text file which contains a message which was encrytpted with a repeating-key XOR and base64 encoded after. We don't know what is the key or even what is the size of the key. We want to crack this key. If you haven't checked it out yet, I'd highly recommend checking Challenge 5 and Challenge 3 first.

Crack 🕶

First off, we try to guess the size of the key, key_sz before guessing what the key actually is. We'll need to utilize Hamming Distance (or edit distance) between two strings. Hamming distance is nothing but number of differing bits at same positions in two inputs. So,

dist(11010110, 01011010) = 3

because of 3 differing bits at positions - 1, 5 and 6. Here's a readable implementation in Rust:

pub fn hamming_distance_bytes(b1: &[u8], b2: &[u8]) -> u32 {
    if b1.len() != b2.len() {
        panic!("Unequal byte slices!");
    }

    b1.iter().zip(b2.iter()).fold(0_u32, |dist, (x1, x2)| {
        let bin1 = format!("{:08b}", x1);
        let bin2 = format!("{:08b}", x2);

        dist + bin1
            .chars()
            .zip(bin2.chars())
            .fold(0_u32, |d, (ch1, ch2)| if ch1 == ch2 { d } else { d + 1 })
    })
}

Now, we'll loop through various key size (2 to 40) values, key_sz and try to guess the size using some intelligent guess through edit distance operation. Let's see how:

For a particular value key_sz = k (so k can be any value from 2 to 40), we'll break-up the given encrypted message bytes, into blocks of k bytes. There would be n_blocks = msg_sz / k (floored down to integer value), number of blocks, given encrypted message size is msg_sz bytes.

We'll make pairs of the blocks and calculate hamming distance between each pair. And then, get the average distance, (d1 + d2 + d3 + ....) / n_blocks. The value of k (or key_sz) for which this distance is lowest is most probably the size of actual key. Let's make a helper function to get average distance for a given key_sz:

use crate::utils::hamming_distance::hamming_distance_bytes;

fn calc_avg_edit_dist(key_sz: usize, txt_bytes: &[u8]) -> f64 {
    let len = txt_bytes.len();
    let mut i: usize = 0;
    let mut dist_sum = 0;
    let mut block1;
    let mut block2;

    loop {
        if i * 2 * key_sz >= len {
            break;
        }

        block1 = &txt_bytes[i * key_sz..(i + 1) * key_sz];
        block2 = &txt_bytes[(i + 1) * key_sz..(i + 2) * key_sz];

        dist_sum += hamming_distance_bytes(block1, block2) / (key_sz as u32);

        i += 1;
    }

    (dist_sum as f64) / (i as f64 + 1.0)

Why tho?
This is an excellent post explanation of it.

We know that XORing something with itself will result in zero, as same bits will cancel each other. Here, if we guessed the correct block size, which is the key size, key_sz = k, then we're certain that any two blocks, say b1 & b2 of the message were encrypted with the same key, k. And if b1 & b2 we're XORed with same key, then calculating hamming distance between them nullifies the key part in b1 and b2 resulting in lower distance. This is in contrast to wrong k value in which case b1 & b2 wouldn't have same key, resultin in larger distance because different keys wouldn't cancel out much.

Guessing key size was the trickiest part. Now we know (hopefully) how long (k bytes) the key size is, all's left is guessing the key itself.

For this. We divide the encrypted message bytes into k byte blocks. It can be inferred easily that the byte at ith position of every block was XORed with the same byte of key at ith position. So first character of every blocks was XORed with first character of key, second character of every blocks was XORed with second character of key & so on.
We can make a different block (not necessarily of k size) composed of first byte of each block & treat it as a single-character XOR problem from Challenge 3! Since in the newly created block each byte was XORed with same byte - the first char of key. And so we crack the first byte of key! Let's encapsulate single char xor problem in a function:

use crate::utils::letter_freq_test::calc_letter_freq_score;

fn break_single_char_xor(xor_bytes: &[u8]) -> u8 {
    let mut key: u8 = 0;
    let mut best_score = f64::MIN;
    for key_byte in 0..255 {
        let msg_bytes: Vec<u8> = xor_bytes.iter().map(|&b| b ^ key_byte).collect();

        let msg = String::from_utf8_lossy(&msg_bytes);
        let score = calc_letter_freq_score(&msg);

        if score > best_score {
            best_score = score;
            key = key_byte;
        }
    }

    key
}

Note that calc_letter_freq_score() is the same letter frequency function from Challenge 3

Now we repeated the same process composing a new block of second byte of each k-sized blocks and crack it with single-byte xor cipher. And so on.

// To read base64 encrypted message as bytes from file path
fn read_bytes(path: &str) -> Vec<u8> {
    let base64_s = fs::read_to_string(path)
        .and_then(|res| Ok(res.replace("\n", "")))
        .expect("Error reading file");
    base64::decode(base64_s).unwrap()
}

pub fn break_repeating_key_xor(path: &str) -> String {
    let text_bytes = read_bytes(path);

    // (key size, edit dist) tuples vec
    let mut edit_dist: Vec<(usize, f64)> = Vec::new();

    for key_sz in 2..=40 {
        let dist = calc_avg_edit_dist(key_sz, &text_bytes);
        edit_dist.push((key_sz, dist));
    }

    // Extract the shortest distance key
    edit_dist.sort_by(|x, y| y.1.partial_cmp(&x.1).unwrap());
    let key_sz = edit_dist.pop().and_then(|x| Some(x.0)).unwrap();

    // Key bytes
    let mut key_bytes: Vec<u8> = Vec::new();

    let mut idx;
    let mut ith_bytes: Vec<u8> = Vec::new();
    for i in 0..key_sz {
        // Take ith byte of every block of key_sz len
        idx = i;
        ith_bytes.clear();
        while idx < text_bytes.len() {
            ith_bytes.push(text_bytes[idx]);
            idx += key_sz;
        }

        let key_i = break_single_char_xor(&ith_bytes);
        key_bytes.push(key_i);
    }

    let key: String = key_bytes.iter().map(|&b| b as char).collect();

    key
}