ian-a-frankel

Posted on

# Number Theory Basics

Most algorithms in cryptography depend on large prime numbers, and the fact that certain algorithms are hard. Encryption is basically the act of applying a mathematical function for which it is very hard to determine the inverse function without a key, but easy with the key.

There are many number theoretic algorithms which are fast and easy. The mother of them all number theoretic algorithms is Euclid's algorithm for finding the greatest common divisor:

Given that we can do division with remainder, if $a > b$ and we want to compute GCD(a,b) we write

$a = bq + r,$ and notice $GCD(a,b) = GCD(b,r).$ Now we can iterate this. This is very fast; in fact its time complexity is $O(log(a + b))$ !

If $a$ and $b$ are really large and hard to factor, we can compute their GCD extremely quickly without knowing how to factor them! In fact there is no known way to efficiently determine the prime factorization of a number - if I multiply two random 300-digit primes $p$ and $q$ it's impractical to find $p$ and $q$ if you only know the value of $pq,$ although it is pretty easy to detect that $pq$ is not prime. You could write code to do it, but your computer would not last long enough to execute it.

### Powers modulo $p$

If $d$ is an integer, we write $a \equiv b ~ (mod ~ d)$ if $a$ and $b$ have the same remainder when we divide both by $p.$

If $p$ is a prime number, we can define multiplication on

${0,1,...,p-1}$
by taking the result mod p, that is, taking the remainder when we divide by $p.$ We can also define exponents in a similar way: $a^n$ is just the remainder when $a^n$ is divided by $p.$

How do we actually compute $a^n$ efficiently when $a,p,n$ are all large?

The answer is repeated squaring. We can write $n$ as a sum of powers of $2$ , and calculate those powers of , i.e. if $n = 453,$

$453 = 256 + 128 + 64 + 4 + 1 = 2^8 + 2^7 + 2^6 + 2^2 + 2^0.$

So if we want to compute $17^{453},$ we actually only have to calculate the powers of $17$ whose exponents belong to

$[1,2,4,8,16,32,64,128,256].$
All told, this is only $12$ multiplication operations to compute $17^{453}$ ( $8$ to compute the powers with exponents in the list, and $4$ more to get the correct one).

### Logarithms modulo $p$

Another well known fact in number theory: for any prime number $p,$ you can find a remainder $a$ such that all of the different powers $a, a^2, a^3, a^4, ... , a^{p-1}$ are different mod $p$ ; in this case we say that $a$ is a primitive root mod $p.$

We can get at most $p-1$ different remainders, since no power of $a$ will give a remainder $0$ . Most values of $a$ will not be primitive roots, but in practice it usually not that hard to find one such value fairly quickly.) Values of $a$ with this property are called primitive roots. For example, the primitive roots mod $5$ are $2$ and $3$ .

powers of 1: 1,1,1,1 (not all distinct mod 5)
powers of 2: 2,4,8,16 (all distinct mod 5)
powers of 3: 3,9,27,81 (all distinct mod 5)
powers of 4: 4,16,64,256 (not all distinct mod 5)

For a = 1,2,3,4 we have

$a^4 \equiv 1 ~ (mod ~ 5)$

In fact, if $p$ is a large odd number and $1 < a < p - 1$ and
$a^{p-1} \equiv 1 ~ (mod ~ p)$ then $p$ is almost certainly prime, although there are a small number of "pseudoprimes" that pass this test which are not prime.

However, we can detect these pseudoprimes quickly with high probability! And we will want to do this, because even a failure rate of 1 in a million is enough to cause a lot of problems for a company like Google or American Express.

## The Miller-Rabin Primality Test

However, if $p$ is a large pseudoprime that managed to sneak past the Fermat test, there are two cases that we might have to worry about here. The first is that the pseudoprime p could be a power of a prime, say p = $p = v^k$ where $v$ is prime.

Then $p$ is already extremely unlikely to escape the Fermat test - a random guess will only pass with probability 1 in $v^{k-1}.$ A relatively small number of Fermat tests with randomly chosen values of $a$ will detect that $p$ is not prime with an acceptably low failure rate.

For any true prime $p$ and primitive root $a$ we know $a^{k(p-1)/2} \equiv p - 1 ~ (mod ~ p)$ if $k$ is odd, and 1 if $k$ is even, and 1 and $p-1$ are the only two square roots it has.

The other case is when $p = vw$ where $v,w$ are odd and greater than 1 but and have no prime factors. We can take advantage of their existence without being able to find $v$ and $w.$

Anything that is 1 (mod $v$ ) and $w-1$ (mod $w$ ) will be another square root in of 1 in mod $p$ arithmetic, which will be different from 1 and $p-1.$

Now, the key idea here is that if I pick a random integer from 0 to $p-1,$ the remainders mod v and mod w are independent and uniformly distributed. What we do here is decorate the Fermat test a little bit.

First, find the unique odd number d for which we can write

$p - 1 = d \cdot 2^e.$

Now, we will do Fermat tests, with a random number $a$ as before, but keep track of sightly more information. In what follows, all calculations are to be reduced modulo $p.$

1.) Let $m = a^d.$

2.) Calculate $m, m^2, ... , m^{2^e}.$

If the last result isn't 1, you have failed the Fermat test and you know the result is composite. However, even if you did not, you can look at the last power of m in the list for which the value was different from 1. If it's not $p-1,$ then you have failed the Rabin-Miller primality test.

There's an entirely elementary proof that the Rabin-Miller test fails with probability at least 1/2 when $p$ is not prime.

## Sharing Secrets using Primes

We can think about logarithms as the inverse of exponentiation, similarly to how we do for positive real numbers. log( $b$ ) ( with base $a$ ) is the power $n$ for which $a^n$ = $b.$

The key fact here is that when $p$ is a large prime number, the "logarithm" operation is computationally intractable.

Given $a$ and $n,$ it's easy to find $a^n.$ However, if we know the values of $a$ and $b,$ in general it is quite hard to find $n$ such that

$a^n ~ \equiv ~ b ~ (mod ~ p).$

So here's a quick way for two parties to exploit this fact and pass information while maintaining their own privacy:

Alice and Bob have a shared prime number $p,q$ and primitive root $a.$ (In fact $a$ doesn't need to be a primitive root, but needs to just have many distinct powers.)

Each has a private key, say $x$ for Alice and $y$ for Bob (they named them after their sex chromosomes).

Alice tells Bob $a^x$ and Bob tells Alice $a^y.$

The values of $x$ and $y$ are secure (difficult for either to steal from the other or for an eavesdropper to intercept), but they can both access the value of $a^{xy}$ to decrypt any information that needs to be available to both of them.

In fact the standard cryptographic system is a small variation on this: Instead of a large prime, we choose a product $q$ of two large primes in place of $p$ and a number like $a$ in place of $p$ , and those two pieces of infomration combined are Alice's public key. This has the additional layer of security that the exponent we must raise a number to to get 1 (mod $q$ ) is hard to compute.