I just wanted to write this post to help introduce (very SIMPLY) the concept of logarithms, especially because many programmers do not come from a math background. This post is not meant to get into all the possible things logarithms are good for, but just to understand what it IS and show how it is relevant in an algorithm's time (and space) complexity analysis.
Let's start from the basics. To introduce logarithms, we need to first introduce the idea of exponents, or a number to a certain 'power'.
25 = 32
The superscript (smaller, higher up number) indicates the number of times you multiply the base number (regular font and position number) with itself. Therefore, we can calculate 2 x 2 x 2 x 2 x 2 = 32.
This can be useful for many things, such as for determining how much information can be stored digitally (see bits for an introduction) or in computing compounding interest in finance.
But just as we can build up the product of 2's, we may also want to know how many times we multiplied 2 together. For example, we already know that 32 is 2 multiplied together 5 times.
We use the following notation for the logarithm:
log2(32) = 5
The subscript (smaller, lower number) is the base, which is still 2 in this example. The number we want to take the logarithm of is 32 (we want to find out how many times 2 multiplied together to get this number). The result is 5 because we started with this example, so we already know the answer.
Exponentiation and logarithms are actually inverse functions. Assuming both operations use the same base value (2, or the number being multiplied), exponentiation takes a number and multiplies the base that many number of times. Logarithms, starting with that final number, find out how many times it was multiplied (the exponent).
This means that we can feed our exponential into a logarithm and get the exponent used.
log2(2x) = x
Basically, the logarithm should be thought of as the exponent detector or calculator. (Assuming we start with the same base; very important!)
A couple of notes about logarithms before we continue:
- Most of the time, we just see 'log', such as on a calculator. This is most likely log10, which is base 10.
- Logarithms have properties that allow conversion between different bases: Formula to convert logarithm bases. This is really useful if you need to calculate log in a different base but only have a log (base 10) function (in your programming language or on a calculator).
- Sometimes you may see 'ln' or natural log. This is a logarithm in base e, or Euler's number. This is usually irrelevant for programmers and developers unless you are working with something related to mathematics but now you know what the ln button is on a graphing calculator.
Let's get to something more relevant. We know that there are many different ways algorithms can scale in time (and space) complexity, such as O(n), O(n2), O(nlog(n)), etc.
You may have seen a graph (with two curves) like the one below:
I created this graph using GeoGebra Graphing Calculator, a site where you can draw graphs using math formulae.
These curves may already be familiar with you. The higher, puke green curve that increases to infinity is 2x. The lower, dark teal curve is log2(x). In both cases, x is simply the value on the horizontal axis of the graph.
Because exponentiation and logarithms are inverse functions, and we are using base 2 for both curves, we can observe a symmetry along the a diagonal line (whose slope is 1).
From what we know about complexity analysis and big O notation, the exponential curve is really bad (increases towards infinity very fast) while the logarithm is very good (the slope tends to descend rather than ascend like the exponential).
Now that we know what the logarithm is, and why it has that descending (but not fully reaching horizontal) slope, let's take a look at a case where we find logarithms in commonly seen algorithms.
You may have seen binary search already but here is a summary:
Let's say you are given a sorted array of integers (ascending order). One way to find if a target value is held in the array is to look through every array index in order. But a more efficient way (especially for longer arrays) is to use binary search.
If the array is sorted, you can check if the median index holds the target value. But if the median index's value is lower than the target, you know that you only need to search the upper half of the array. Similarly, if the median index's value is higher, you only need to search the lower half.
This can be repeated until base cases are reached, such as when the sub-array being searched is only a length of 1.
I won't discuss how to solve or implement binary search in any more detail here. I want to use the binary search algorithm to show how we can determine that (spoiler alert!) the time complexity is O(log(n)) where n is the length of the array being sorted.
Let's use our trusty number 32 from the previous section. We already know that this is 25. We also know that when we implement binary search, we keep finding the median index (or the midpoint) between our starting and ending indices. This is because we want to rule out half of the search area each time (if we know the target is higher or lower, we know which side NOT to search).
Since dividing by 2 is the inverse of multiplying by 2, this plays into our hands because exponentiation is simply multiplying a bunch of times. If our input array has 32 numbers to look through, we don't want to look through all 32.
Let's see what we are actually binary searching through if we have an array of length 32 (indices 0 - 31).
- Iteration 1:
- We check index 15 (average of 31 and 0 is 15.5, and we take the floor).
- If our target number is found on array, we can return out of the function. Else, we binary search the upper or lower half of the array.
- Iteration 2:
- Since the sub-array is only length 16, we can take the median of 0 and 15, which is 7 (floor of 7.5).
- If target found, return. Else, binary search an array of length 8.
- Iteration 3:
- Array length 8
- Iteration 4:
- Array length 4
- Iteration 5:
- Array length 2
- Iteration 6:
- Array length 1
"6 iterations!" you may say, thinking that I am actually not that great at math. But when you think about the powers of 2 and its correlation to the sub-array lengths, you will see that 1 is actually 20. And counting 0-5 actually gives you 6 items.
Additionally, the exact amount isn't the most important thing in complexity analysis, but rather the scaling factor as the size of data increases. For example, 210 is 1024. If you had an array of around 1000 numbers, your binary search would probably take 11 iterations in the worst case (0-10 is 11 items). But when you start comparing 11 iterations to the 1000 iterations an O(n) linear algorithm would take, the benefits start to become apparent.
Also, even though most numbers are not exactly divisible by 2, as long as you are roughly dividing the array in half all the smaller pieces are either ignored (we know the target isn't in a certain half of the array) or break down to arrays of length 2 and 1 anyway. So the performance would be comparable to the next power of 2 not lower than the array length.
While not every programmer needs to know a ton of math, logarithms are seen frequently enough that I feel it should be introduced. They have many mathematical properties but most are not necessary to memorize for a programmer.
The behavior of logarithms, however, is important to understand in algorithms and their complexity analysis because the potential improvements in algorithm performance are so great, and can be thought of as low hanging fruit in many cases, such as for well known divide and conquer algorithms.