Sergi Canal

Posted on Sep 23, 2020 • Edited on Sep 24, 2020

Cryptography for programmers 2: Blocks and Randomness

#security #webdev #typescript #startup

In the previous post of the series, I explained why it is important for programmers to have a basic knowledge of cryptography, and I gave 3 basic rules that programmers should follow to avoid the most common mistakes. The rules, however, are quite vague, and applying them effectively requires a more specific understanding.

In this episode we will cover two of the most important topics in cryptography: Block cryptography, and secure randomness. I have to admit that it has been quite hard to summarize such a big and exciting topic so much, and I hope that I have done a good job! 😄

Symmetric vs Asymmetric encryption

To understand what block crypto is, we need to first discuss the two main ways of encrypting and decrypting data.

In symmetric encryption both encryption and decryption are done with the same key. In the case of encrypted communication between two parties, they need to have a shared key between them that nobody else knows. This is the main limitation for symmetric cryptography, since in contexts such as the internet parties don't have shared keys between them.

In asymmetric cryptography the key used to decrypt is not the same one that is used for encryption. Asymmetric cryptography is mainly used for the purpose of communication between different parties, where the parties have no shared key or safe channel through which to send one. In asymmetric cryptography each participant has a key pair consisting of a public key and a private key. As the name implies the public key is publicly shared and anyone can have access to it. The private key should be kept secret and never shared. Then when someone wants to communicate with you, they would encrypt the message using your public key, and only those with knowledge of the private key are able to decrypt it.

As practical as asymmetric crypto sounds, it is rarely used to transmit data directly, as we will see in part 4 when we talk about public keys and protocols. It is usually used to share a key, which can then be used to continue the communication using symmetric crypto. The main reason is that symmetric cryptography, and block ciphers in particular, are much more efficient than asymmetric ciphers, which rely on expensive mathematical operations.
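To make the "share a key, then switch to symmetric crypto" idea concrete, here is a minimal sketch using Node's built-in crypto module. The choice of RSA with OAEP padding and a 2048 bit modulus is my assumption for illustration; real protocols usually rely on more elaborate key exchanges.

```typescript
import * as crypto from "crypto";

// Receiver: generates a key pair once and publishes the public key.
const { publicKey, privateKey } = crypto.generateKeyPairSync("rsa", {
    modulusLength: 2048,
});

// Sender: creates a fresh random 256 bit symmetric key and encrypts it
// with the receiver's public key.
const sessionKey = crypto.randomBytes(32);
const wrappedKey = crypto.publicEncrypt(
    { key: publicKey, padding: crypto.constants.RSA_PKCS1_OAEP_PADDING },
    sessionKey
);

// Receiver: only the holder of the private key can recover the symmetric key.
const unwrappedKey = crypto.privateDecrypt(
    { key: privateKey, padding: crypto.constants.RSA_PKCS1_OAEP_PADDING },
    wrappedKey
);

// From here on, both sides can use the shared key with a block cipher.
console.log(unwrappedKey.equals(sessionKey));
```

Only the small 32 byte key goes through the slow asymmetric operation; the actual messages would be encrypted symmetrically.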

Block Cryptography

Block cryptography is the default symmetric cryptography used nowadays, so it will be the only symmetric cryptography covered in the series. Most cryptography courses introduce the topic with simpler symmetric ciphers such as OTP and other XOR based methods. Studying them and their attacks is interesting from a theoretical point of view. As a programmer, however, you should not be using those, so we will skip them.

What is Block Cryptography

A block cipher is a cryptographic algorithm that is able to encrypt a fixed number of bits (a block) in a safe way with a symmetric key. Block ciphers have a different function for decryption and encryption. There are many different block ciphers to choose from, and many of them are considered safe, but some are outdated and no longer safe.

What Block cipher should you use?

We should pay special attention when choosing a block cipher to make sure that it is still considered safe nowadays. For example, DES was a very popular block cipher that was the standard for a while, and some old tutorials may use it and give you the impression that it is a good choice. It is not. In fact, using DES will already be in breach of one of the rules presented in the previous episode. There is a slightly more "modern" version called triple-DES or 3DES. It basically runs the DES algorithm over the contents 3 times, each time with a different key, making the key 3 times the size (although known attacks mean the effective security is lower than the tripled key length suggests). It is possible that you will encounter 3DES, especially in some legacy projects. The security is not that great, however, and it should not be used for a new project.

For the purpose of this post, I would say that as a programmer there is no reason you should be using anything other than AES. AES is the current standard. It is secure enough and will very likely continue to be for a long time, especially if you are using the variant with a 256 bit key, which I would recommend. AES is used so much, in fact, that most modern processors have specific instruction sets for AES operations, so apart from being secure, it is very efficient.

Modes of operation

Now that we have chosen the block cipher that we want to use, we are still not done. AES has a block size of 16 bytes, independently of the size of the key. If we want to encrypt something longer than 16 bytes, we first have to separate the message into chunks of 16 bytes, and if the last chunk does not have exactly 16 bytes, we add padding. There are various ways of combining block cipher calls to encrypt long messages split into blocks with only one key. We call them modes of operation.
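The splitting and padding step can be sketched as follows. I am assuming PKCS#7 padding here, which is one common scheme; the function names are made up for the example, and in practice the crypto library does this for you.

```typescript
const BLOCK_SIZE = 16; // AES block size in bytes, whatever the key size

// PKCS#7: append N bytes, each with the value N, so that the total length
// becomes a multiple of the block size. A message that is already aligned
// still gets a full block of padding, so the padding can always be removed.
function pkcs7Pad(message: Buffer): Buffer {
    const padLength = BLOCK_SIZE - (message.length % BLOCK_SIZE);
    return Buffer.concat([message, Buffer.alloc(padLength, padLength)]);
}

// Cut the padded message into 16 byte chunks.
function splitIntoBlocks(padded: Buffer): Buffer[] {
    const chunks: Buffer[] = [];
    for (let i = 0; i < padded.length; i += BLOCK_SIZE) {
        chunks.push(padded.subarray(i, i + BLOCK_SIZE));
    }
    return chunks;
}

// 21 bytes of text become 2 blocks; the last 11 bytes are padding.
const paddedBlocks = splitIntoBlocks(
    pkcs7Pad(Buffer.from("a message of 21 bytes", "utf8"))
);
console.log(paddedBlocks.length);
```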

ECB

This is probably the solution that somebody who does not know about cryptography would give if they encountered this problem:

ECB mode consists of splitting the message into 16 byte chunks, and encrypting each of them with the same key and the same algorithm. The assumption is that if the algorithm is secure, then none of the blocks can be decrypted without the key.

There is a problem with ECB, however, which is that it gives up too much information about the message that was encrypted. Block ciphers are deterministic, meaning that when you encrypt the same 16 byte block with the same key and the same algorithm, it will always return the same ciphertext. So if you were to encrypt something which contains repeated sequences, the repetition would also show after encryption. The most famous example of this is the result of encrypting a picture of a penguin, where sections of the same color are encrypted into the same altered colors, leaving it completely clear that it is a picture of a penguin.
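The repetition problem is easy to reproduce. The following sketch (shown only to illustrate the weakness, not as something to copy) encrypts two identical 16 byte blocks with AES-256 in ECB mode and compares the two ciphertext blocks:

```typescript
import * as crypto from "crypto";

const key = crypto.randomBytes(32);

// Two identical 16 byte plaintext blocks, one after the other.
const block = Buffer.from("YELLOW SUBMARINE", "utf8");
const plaintext = Buffer.concat([block, block]);

// ECB takes no IV, so null is passed in its place.
const cipher = crypto.createCipheriv("aes-256-ecb", key, null);
cipher.setAutoPadding(false); // the input is already a multiple of 16 bytes
const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);

// Same key + same plaintext block = same ciphertext block.
const firstBlock = ciphertext.subarray(0, 16);
const secondBlock = ciphertext.subarray(16, 32);
console.log(firstBlock.equals(secondBlock));
```

An observer who never learns the key can still tell that the two plaintext blocks were equal, which is exactly what happens with the penguin.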

Let's imagine that, say, a videoconferencing app decided to offer the security of encryption to its users, and decided to use ECB as the mode of operation. Do you think it would be a good choice? Ehem.

ECB is also susceptible to other attacks [1] [2], and while it is not important to know them in detail, it is important to never use ECB, for any application, ever. There is always a better alternative.

CBC

In the diagram above, we see the encryption process used by the CBC mode, where the circles with crosses refer to the bitwise XOR operation. In CBC the ciphertext of the previous block is XORed with the next plaintext block before that block is encrypted. This makes each ciphertext block depend on all the previous blocks, so the problem that ECB has of not masking patterns does not exist in CBC. If we were to encrypt the penguin image with CBC, we would only see what looks like random noise.
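To see that there is no magic in the chaining, here is a sketch that builds CBC by hand from the raw AES block operation and checks the result against the library's own aes-256-cbc. The helper names are hypothetical, and this is for study only: in real code, use the library's CBC directly.

```typescript
import * as crypto from "crypto";

const BLOCK = 16;

// Encrypts exactly one 16 byte block with raw AES-256. ECB on a single
// block with padding disabled is just the block cipher itself.
function encryptBlock(key: Buffer, block: Buffer): Buffer {
    const aes = crypto.createCipheriv("aes-256-ecb", key, null);
    aes.setAutoPadding(false);
    return Buffer.concat([aes.update(block), aes.final()]);
}

function xorBlocks(a: Buffer, b: Buffer): Buffer {
    const out = Buffer.alloc(a.length);
    for (let i = 0; i < a.length; i++) {
        out.writeUInt8(a.readUInt8(i) ^ b.readUInt8(i), i);
    }
    return out;
}

// CBC by hand. The plaintext length must be a multiple of 16 bytes.
function cbcEncrypt(key: Buffer, iv: Buffer, plaintext: Buffer): Buffer {
    const out: Buffer[] = [];
    let previous = iv; // the first block is chained to the IV
    for (let i = 0; i < plaintext.length; i += BLOCK) {
        const mixed = xorBlocks(plaintext.subarray(i, i + BLOCK), previous);
        previous = encryptBlock(key, mixed);
        out.push(previous);
    }
    return Buffer.concat(out);
}

const key = crypto.randomBytes(32);
const iv = crypto.randomBytes(16);
const repeated = Buffer.concat([
    Buffer.from("YELLOW SUBMARINE", "utf8"),
    Buffer.from("YELLOW SUBMARINE", "utf8"),
]);

const manual = cbcEncrypt(key, iv, repeated);

// Reference result from the library, with padding disabled to match.
const lib = crypto.createCipheriv("aes-256-cbc", key, iv);
lib.setAutoPadding(false);
const reference = Buffer.concat([lib.update(repeated), lib.final()]);

console.log(manual.equals(reference));
```

Unlike in ECB, the two identical plaintext blocks now produce different ciphertext blocks, because the second one is mixed with the ciphertext of the first before being encrypted.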

In CBC we also introduce the concept of the Initialization Vector (IV). The IV is a random block that is generated for each message and is used to mask the first block from patterns. Here are a few rules related to IVs, taken from the link in the previous episode:

don't reuse IV and key pairs

When encrypting with the same IV and Key, the first block of the ciphertext is not protected from leaking information via patterns in different messages. In other modes such as OFB and CTR, reusing an IV completely destroys the security making it trivially breakable. Never reuse an IV.

Despite having to be unique however, IVs are not secret and are usually stored or sent together with the encrypted message.

don't use a badly derived IV

As said in the previous rule, IVs are not secret. They should however be unpredictable, which means they should be generated randomly. We will talk about randomness later in this article.

don't use a static IV

I assume this refers to not hardcoding an IV into the code. In that case it would mean reusing the same IV, which we already said is not a good idea.
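The following sketch shows what the IV rules above protect against. It deliberately makes the mistake of encrypting two messages with the same key and IV (the messages and helper names are invented for the example), first with CBC and then with CTR:

```typescript
import * as crypto from "crypto";

function encryptWith(algorithm: string, key: Buffer, iv: Buffer, text: string): Buffer {
    const cipher = crypto.createCipheriv(algorithm, key, iv);
    return Buffer.concat([cipher.update(text, "utf8"), cipher.final()]);
}

function xorBuffers(a: Buffer, b: Buffer): Buffer {
    const length = Math.min(a.length, b.length);
    const out = Buffer.alloc(length);
    for (let i = 0; i < length; i++) {
        out.writeUInt8(a.readUInt8(i) ^ b.readUInt8(i), i);
    }
    return out;
}

const key = crypto.randomBytes(32);
const reusedIv = crypto.randomBytes(16); // the mistake: one IV for two messages

// Both messages start with the same 16 bytes ("user=alice;role=").
const m1 = "user=alice;role=admin;balance=10";
const m2 = "user=alice;role=guest;balance=99";

// CBC: the shared prefix shows up as an identical first ciphertext block.
const c1 = encryptWith("aes-256-cbc", key, reusedIv, m1);
const c2 = encryptWith("aes-256-cbc", key, reusedIv, m2);
const sameFirstBlock = c1.subarray(0, 16).equals(c2.subarray(0, 16));

// CTR: both messages are XORed with the same keystream, so XORing the two
// ciphertexts cancels it out and yields the XOR of the two plaintexts.
const s1 = encryptWith("aes-256-ctr", key, reusedIv, m1);
const s2 = encryptWith("aes-256-ctr", key, reusedIv, m2);
const keystreamCancelled = xorBuffers(s1, s2).equals(
    xorBuffers(Buffer.from(m1, "utf8"), Buffer.from(m2, "utf8"))
);

console.log(sameFirstBlock, keystreamCancelled);
```

In the CBC case the leak is limited to the shared prefix, while in the CTR case an attacker who knows or guesses one message can recover the other without ever touching the key.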

CBC is a more secure alternative to ECB for the majority of purposes if used correctly. CBC is perfectly safe to store encrypted data in one's server for example.

There are some cases, though, in which you should not use CBC, specifically cases where CBC encrypted data is transmitted in a client-server environment and an attacker can replay modified ciphertexts to the server and observe how it responds. This has to do with my favorite crypto attack: the padding oracle attack. It is a perfect example of how very small information leaks can lead to the complete failure of an encryption system.

CBC is also not suitable for encrypting very large files or disks. Because of the way it is designed it is impossible to parallelize the execution of the encryption (decryption can be parallelized though), and in the case of encrypting a disk, it is impossible to change a block without reencrypting all the following ones.

Others

There are many modes of operation. I would recommend only ones that are recognized by standards bodies such as NIST. I think CBC is a good example of a mode of operation for study, and it is perfectly safe as long as it is used correctly and in the use cases where it works. There are specific modes of operation for specific problems, such as modes of operation made for disk encryption. We will also see one more mode of operation (GCM) when we talk about message authentication, and we will see that it is a better option for client-server situations where CBC is not a good choice.

Randomness

When we are working with cryptography we sometimes need to create secrets that no one else knows, or just data that behaves unpredictably. For example, when generating a secure block cipher key, we should make it not only large but also random, since it does not matter how long your key is if it is just 'AAAAAA...'. IVs also need to be generated randomly. So we need a way to generate random bits with a computer. The problem is that computers are the opposite of random. The processor does exactly what it is told, always in the same way, predictably. How could it be possible for the computer to generate randomness then?

PRNGs

We call random number generators in computers PRNGs (Pseudo-random Number Generators). We add Pseudo at the beginning to signify that they are not truly random, as in completely unpredictable. But they behave in a random way, meaning that if we were to generate many random numbers, they would follow the same kind of distribution that we would expect from a truly random process.

Generic PRNGs are included in all modern programming languages. PRNGs can be used for purposes such as generating a random distribution of numbers for statistical studies, random sampling, rolling dice... For these purposes it is actually useful that the random generation is repeatable. For example, if we are making a study and taking random samples, we want the sample to be randomly distributed, but we also want to be able to share the code so that the same sample will be generated. These PRNGs are actually complicated algorithms that generate distributions that look random, in a deterministic way. They take a seed as initial input, and given the same seed they will generate the same numbers.
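As an illustration of "same seed, same numbers", here is a tiny seeded generator. It is a linear congruential generator with textbook constants, which I picked for brevity; it is an assumption for the example, not the algorithm any particular language ships, and it must never be used for cryptography.

```typescript
// A toy linear congruential generator: state = (a * state + c) mod 2^32.
function createLcg(seed: number): () => number {
    let state = seed >>> 0; // keep the state as an unsigned 32 bit integer
    return () => {
        state = (Math.imul(1664525, state) + 1013904223) >>> 0;
        return state / 4294967296; // scale to a float in [0, 1)
    };
}

const firstRun = createLcg(42);
const secondRun = createLcg(42);
const otherSeed = createLcg(43);

// Two generators with the same seed produce exactly the same sequence.
const sequenceA = [firstRun(), firstRun(), firstRun()];
const sequenceB = [secondRun(), secondRun(), secondRun()];
const sequenceC = [otherSeed(), otherSeed(), otherSeed()];

console.log(sequenceA, sequenceB, sequenceC);
```

This repeatability is what makes a shared experiment reproducible, and it is also exactly what makes such a generator dangerous for generating secrets.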

Let's say we want to generate a secure key, and we do it with a standard PRNG. If we really wanted our key to be random, we would first need a truly random seed. Here is the problem: to generate a random seed, we need a number that is random but is not itself generated with a PRNG. When using PRNGs, a common solution is to use the current timestamp as the seed. The timestamp has the property that it changes all the time, but it does NOT act randomly at all. Let's say we generate our safe key using a time-seeded PRNG, and that an attacker knows that we generated the key in 2014, which is not a narrow interval. According to Google, a calendar year has 3.154e+7 seconds, so roughly 31M seconds. All the attacker would need to do to learn our key would be to try generating it with each of the 31M possible seeds, which is very little work for a computer. We essentially lowered the security of our 256 bit key (115792089237316195423570985008687907853269984665640564039457584007913129639936 possible keys) to the security of a 25 bit key, since 2^25 is about 33.5M. And no, using millisecond (or even nanosecond) timestamps is not a good solution either.
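Here is a rough sketch of that attack. The generator is a toy linear congruential generator chosen for illustration (an assumption, not what any real library uses), the timestamp is hardcoded so the example is repeatable, and the search window is narrowed to a few hours to keep it fast. A full year only means more iterations of the same loop.

```typescript
// Derives a 32 byte "key" from a seed with a toy, fully deterministic PRNG.
function keyFromSeed(seed: number): Buffer {
    let state = seed >>> 0;
    const key = Buffer.alloc(32);
    for (let i = 0; i < key.length; i++) {
        state = (Math.imul(1664525, state) + 1013904223) >>> 0;
        key.writeUInt8(state >>> 24, i); // keep the top byte of each output
    }
    return key;
}

// Victim: seeds the generator with a Unix timestamp in seconds (from 2014).
const secretSeed = 1400000000;
const victimKey = keyFromSeed(secretSeed);

// Attacker: tries every second in a time window until the key matches.
function recoverSeed(target: Buffer, from: number, to: number): number | null {
    for (let seed = from; seed <= to; seed++) {
        if (keyFromSeed(seed).equals(target)) return seed;
    }
    return null;
}

const recovered = recoverSeed(victimKey, 1399939200, 1400025599);

// A whole year of possible seeds is worth only about 25 bits of security.
const bitsOfSecurity = Math.log2(31540000);

console.log(recovered, bitsOfSecurity.toFixed(1));
```

In a real attack the key itself is not available for comparison, so the attacker would test each candidate by trying to decrypt a captured message, but the amount of work stays the same.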

Cryptographically secure randomness

As we have seen, we need a few more requirements from our random generators before they can be used safely in cryptographic applications. We need generators whose output cannot be predicted, even by someone who knows the algorithm, which means they cannot rely on an initial input that an attacker could guess. They are called Cryptographically Secure PRNGs (CSPRNGs) and they are included in most popular cryptography libraries. They are seeded from various external sources of randomness, usually physical properties or user interaction. If we were physicists, we could probably discuss whether true randomness even exists, or whether given the state of the entire universe one could determine the future. Lucky for us, we are not protecting our systems from omniscient deities, and just need them to resist clever monkeys with current technology, so we will suppose that the physical properties are chaotic and noisy enough that they cannot be predicted. Things that can be used are the temperature of the processor, the movement of the cursor, and even videos of lava lamps.

When programming cryptographic code, it is very important that any randomness we need is provided by a cryptographically secure generator, as the normal PRNGs provided by standard libraries are not suited for this use.
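In Node, the secure generator lives in the built-in crypto module. A minimal sketch of generating a key, an IV and a random number with it (the 6 digit code is just an example use case I made up):

```typescript
import * as crypto from "crypto";

// 32 random bytes = a 256 bit key, encoded in base64 so that it can be
// stored as text, for example in an environment variable.
const key = crypto.randomBytes(32);
const keyBase64 = key.toString("base64");

// A fresh 16 byte IV, to be generated again for every message.
const iv = crypto.randomBytes(16);

// For random integers, such as a 6 digit verification code, use
// crypto.randomInt (max is exclusive) instead of Math.random().
const code = crypto.randomInt(0, 1000000);

console.log(keyBase64, iv.toString("base64"), code);
```

Note that there is no seed anywhere in this code: the generator takes care of gathering its own unpredictable input from the operating system.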

Code example

In the following code example I will provide a TypeScript implementation of an encrypt / decrypt class, using the concepts learned in this episode. It is important to note that I will be using CBC, meaning the following code should never be used in client / server communications where an attacker could replay modified ciphertexts to the server. This code would be suitable for cases such as storing encrypted data in a database, or storing a secret file on your computer.

import * as crypto from "crypto";

export interface EncryptedData {
    iv: string;
    data: string;
}

export class AES_256_CBC {
    private readonly encryptionKey: Buffer;
    private readonly ALGORITHM = 'aes-256-cbc';

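    constructor(key: string) {
        // Key is a base64 encoded 256bit key (32 bytes)
        // It should be stored safely as an environment variable
        // and never uploaded into version control
        this.encryptionKey = Buffer.from(key, 'base64');
        // Fail early, instead of at encryption time, on a wrong key size
        if (this.encryptionKey.length !== 32) {
            throw new Error('Encryption key must be 32 bytes (256 bits)');
        }
    }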

    public encrypt(text: string): EncryptedData {
        // We generate the iv with a secure PRNG
        const iv = crypto.randomBytes(16);
        // We create the cipher with algorithm aes-256-cbc
        const cipher = crypto.createCipheriv(this.ALGORITHM,
            this.encryptionKey, iv)
        let enc = cipher.update(text, 'utf8', 'base64');
        enc += cipher.final('base64');

        // We return both the encrypted text and the iv in base64.
        // If we want to store it in a DB which does not support
        // JSON format, we can serialize it by appending the data
        // to the iv and converting into a single string.
        return {
            iv: iv.toString('base64'),
            data: enc,
        }
    }

    public decrypt(enc: EncryptedData): string {
        const decipher = crypto.createDecipheriv(this.ALGORITHM,
            this.encryptionKey, Buffer.from(enc.iv, 'base64'));
        let str = decipher.update(enc.data, 'base64', 'utf8');
        str += decipher.final('utf8');
        return str;
    }

}
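The comment inside encrypt mentions serializing the result into a single string for databases without JSON support. One possible way to do it is sketched below. The ":" separator and the function names are my own choice for the example; the separator works because ":" never appears in base64 output.

```typescript
// Mirrors the EncryptedData interface of the class above.
interface EncryptedData {
    iv: string;
    data: string;
}

// Joins the base64 IV and the base64 ciphertext into one string.
function serialize(enc: EncryptedData): string {
    return `${enc.iv}:${enc.data}`;
}

// Splits the stored string back into its two parts.
function deserialize(serialized: string): EncryptedData {
    const separator = serialized.indexOf(":");
    if (separator === -1) {
        throw new Error("Malformed encrypted value");
    }
    return {
        iv: serialized.slice(0, separator),
        data: serialized.slice(separator + 1),
    };
}

const stored = serialize({ iv: "AAECAwQFBgcICQoLDA0ODw==", data: "3q2+7w==" });
const restored = deserialize(stored);
console.log(stored, restored);
```

The restored object can be passed straight to decrypt. Since the IV is not secret, storing it next to the ciphertext like this is fine.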

That's it for this episode 😊. I hope you now have a better understanding of the importance of using the right parameters and modes of operation when working with block cryptography, and the importance of using cryptographically safe randomness.

In episode 3, we will talk about securing our cryptography for communication purposes with message authentication, and we will talk about the best ways of implementing a login with JWT.

Top comments (17)

Vinicius Dias • Nov 7 '22

This series of posts is incredibly helpful. Thank you for creating it! :-D

Just one note: nowadays, XChaCha20 is getting more and more popular and is usually recommended over AES. Also, since libsodium uses it as the default for secretbox, it's a no-brainer to use it. :-D

Junxiao Shi • Sep 24 '20 • Edited

In CBC mode, IV must be random but does not have to be unique. In case PRNG generates the same IV twice, it's fine.

In CTR or GCM mode, IV must be unique but does not have to be random. I designed a protocol that uses random IV with GCM mode, and the NIST cryptographer rejected that design upon review because there's a small risk of IV collision.
The application/protocol must guarantee that IVs for the same key never repeat.
One method is to split the IV into three portions:

A fixed portion that identifies the sender. In my case, the same key is used by a client and a server, so I use 0 for messages from the client, and 1 for messages from the server. If the same key could be used across application restarts, the start timestamp goes into this portion as well.
A random portion.
A counter portion. This should be incremented by the number of blocks (not messages) encrypted by a particular key.

Sergi Canal • Sep 24 '20

Hi Junxiao, thanks for the reply. I wanted to give general rules for all iv generation to not go into too much detail, and I felt generating random ivs is ok in most cases.

About CBC, I don't agree that ivs don't have to be unique. It is not as important that they are unique as it is with stream modes, but reusing ivs in CBC can leak information about the first block / repeated prefixes in messages. For CBC the very small probability of a PRNG collision is acceptable, but that does not mean that it would be ok to use a static iv, or to reuse the same iv intentionally.

I can see why that probability is not so acceptable in stream modes. Still we should be aware that there is a very very small chance of that happening. I find your solution to guaranteeing uniqueness very interesting, thanks. I will take this into account when I explain GCM.

Pablo Barría Urenda • Oct 10 '20

Is the CBC part (the splitting into blocks) done by the crypto library behind the scenes? Shouldn't it be asynchronous, or is it assumed that the amount of data encrypted will always be relatively small?

Sergi Canal • Oct 11 '20

Yes, the library handles the blocks and padding. You could make it asynchronous depending on your use case. However, if you are encrypting large amounts of data, then maybe CBC is not the best mode, since it cannot be parallelized.