Sometimes with data structures, the cost of certain operations "will cancel each other out". that some operations are costly but will happen "less often than others", and that overall, we'll be okay.
But how do we know when? and can we use it to show our databases are better?
In this post I'm going to go through a couple of examples explaining amortized (or accumulative) complexity, including one that you might be using in oh so many programs.
I assume you already have basic knowledge of algorithms and data structures, including worst case complexity analysis and big O notation.
If not there are a couple of articles already written on the topic including
I am aware this will seem kind of tricky, or even silly at first, but I hope at least one of the examples stick.
In our first example we consider a stack with three operations
Push(value:int)which adds a new value on the top of our stack
Pop()which removes and returns the top value
MultiPop(pop_amount:int)which removes and Returns
pop_amountof the top values of the stack.
Say we implemented this structure using a simple linked list, lets analyze the worst case complexity of each of the operations.
Implementing the push/pop operations as inserting/deleting the tail will cause each operation to take a constant amount of operations, or
Then what about the multi pop? Say we pop the K top values, then it will take an order of K operations, or O(K) and because we can't pop more values than there are on the stack, the maximum worst case complexity for this operation is O(n), where n is the size of the stack.
Wait a minute.
Lets say I push n values to the stack,and then use
According to our worst-case analysis that will take us
O(n) + (n/2)*O(n) =O(n^2) operations! But if instead of using
MultiPop(2) once we used
Pop() twice then in worst case it will take
O(n) + (n/2)*2*O(1)=O(n) operations!
Excuse me what?
Before we panic, lets remember that worst case complexity analysis of
O(n) means that the action can take up to an order of n operations, not necessarily each call to the operation will be this costly, so how do we analyze the cost accurately?
Well we know multi-pop doesn't cost that much. so let's try to amortize it!
am·or·tize amortized; amortizing Definition of amortize transitive verb 1 : to pay off (an obligation, such as a mortgage) gradually usually by periodic payments ...
This is where amortized analysis comes in - we need to look at the bigger picture! Instead of looking at the operations alone, we'll look at a series of
Now, because we can't pop more than we have pushed, if we examine a series of pushes, pops and multi pops the amount of pops in total is lesser than the amount of pushes.
In other words, say we have a series of m actions,
pu of which were pushes. No matter how many of the pops were single pops or multi pops the total amount of pops can not be greater than pu. In other words the amortized amount of operations is
O(pu) + O(pu)=O(pu).
Re analyzing our last example, we made
n push operations, so
pu=n. So instead of multiPop taking
O(n^2) operations we get
O(n) + total amount of pops <= O(n) + O(n) = O(n) operations - The same as we got from using only single pops.
The world makes sense again!
Using this kind of analysis, we can say that in a series of m actions of any kind in total we will make O(m) operations, or O(1) operations per action.
Meaning our amortized time complexity for each operation in the multi-pop stack is
O(1), even for the
multiPop operation, it's "paid off" by the push operations. Intuition wins again!
In the next post I will show some more practical examples that you probably already use. as well as a more formal way of doing this kind of analysis.