DEV Community

ZhiHong Chua
ZhiHong Chua

Posted on

Heap: A Faster Way to Get Array Elements in Sorted Order

When trying to sort an array, we tend to use in-built implementation, which in python is:

arr.sort()
Enter fullscreen mode Exit fullscreen mode

However, it takes time complexity O(NlogN). Honestly, that's perfect if the product we build are not at huge scale. But there is a way to get O(N) time complexity to get sorted elements of an array.

Note: notice I don't say get a sorted array, because heap doesn't actually do this.

The tradeoff for the faster time though, is the need to understand the concept of heap, which is the purpose of this article.

Main Heap Concepts and Code Implementation

The heap does not actually sort an array in the conventional sense, but uses what I will call heap logic to access the elements so that we can get the elements in sorted order.

Heap logic is best understood as a Binary Tree representation (but doesn't actually convert it into a tree! It's still an array), where each parent node has at most 2 child nodes. Consider an array

[a,b,c,d,e,f,g,h,i]
Enter fullscreen mode Exit fullscreen mode

Heap as Binary Tree Representation

The link between Array and Binary Tree representation may not be clear, but there is a simple, beautiful equation that ensures there is never a clash

const nodeIndex // As required
const leftNodeIndex = 2 * nodeIndex + 1
const rightNodeIndex = 2 * nodeIndex + 2
Enter fullscreen mode Exit fullscreen mode

Here it is useful to mention that heap can either be Minimum Heap or Maximum Heap. Minimum will give us minimum element at index = 0 of the array, maximum gives us maximum element at index = 0 of the array. For the rest of the article let's just consider Minimum Heap to avoid confusion.

Just a reminder, you can't get the second smallest value just by accessing it conventionally like arr[1]. You have to pop the smallest element first before getting it as arr[0].

The critical rule here is that in a heap the parent has to be smaller than both children. It does not matter whether left child is larger or smaller than right child.

1. Heapify

Heapify converts an array to a heap in-place. Be sure that the result array is not of sorted order in the conventional sense, it is only sorted when accessing it using the heap logic.

array = [1,9,7,5,4,8,2,6,3]

# Implement Heapify Function
def heapify(arr):
    n = len(arr)

    for i in range(int(n/2)):
        # print(arr) # For purpose of observing swaps that occur
        leftNodeIdx = 2*i + 1
        rightNodeIdx = 2*i + 2

        # Min Heap Implementation, will be different equation for Max Heap
        if arr[i] > arr[leftNodeIdx]:
            arr[i], arr[leftNodeIdx] = arr[leftNodeIdx], arr[i]
        if arr[i] > arr[rightNodeIdx]:
            arr[i], arr[rightNodeIdx] = arr[rightNodeIdx], arr[i]

heapify(array)
Enter fullscreen mode Exit fullscreen mode

This array transformation takes up to O(N) time as seen in the range of the for-loop.

2. Heap Push

Pushing new elements to the heap are placed in the leaf node of the tree representation, or the last index of the array. Then do a recursive search (think of bubbling up the tree representation) to see if the child is smaller than the parent. If yes, swap the two elements. Also do a check with the new parent and the other child element, remember we must satisfy the condition that each parent is greater or equal to its children. Then check the parent against it's parent, until you hit the top of the tree, or index = 0 of the array. Here's the implementation:

# Implement HeapPush Function
def heappush(arr, num):
    arr.append(num)
    n = len(arr)
    i = int(n/2) - 1

    while i >= 0:
        # print(arr) # For purpose of observing swaps that occur
        leftNodeIdx = 2*i + 1
        rightNodeIdx = 2*i + 2

        # Min Heap Implementation, will be different equation for Max Heap
        if leftNodeIdx < n and arr[i] > arr[leftNodeIdx]:
            arr[i], arr[leftNodeIdx] = arr[leftNodeIdx], arr[i]
        if rightNodeIdx < n and arr[i] > arr[rightNodeIdx]:
            arr[i], arr[rightNodeIdx] = arr[rightNodeIdx], arr[i]

        i = int((i - 1) / 2) if i != 0 else -1

heappush(array, 10)
heappush(array, 0)
Enter fullscreen mode Exit fullscreen mode

This will cause O(logN) time complexity for a push to an array size N. But we can think of amortised time where the pushing the first element to heap is O(1) since there are no elements to compare with.

3. Heap Pop

For popping we have to first put the leaf element arr[-1] into the top of the tree arr[0], instead of deciding which of the 2 child nodes of arr[0], arr[1] or arr[2] need to be lifted into the 0-th position. This would create a great confusion to re-check all the element positions.

By putting the leaf node (maximum value) to 0-th position, we can do a similar recursive check as heap push, this time bubbling the maximum value down the tree representation at a time complexity of O(logN).

# Implement HeapPop Function
def heappop(arr):
    if len(arr) == 0: return "Error, no elements to pop"
    arr[0], arr[-1] = arr[-1], arr[0]
    res = arr.pop()
    n = len(arr)
    i = int(n/2) - 1

    while i >= 0:
        # print('pop', arr) # To prove that it is O(logN) to pop
        leftNodeIdx = 2*i + 1
        rightNodeIdx = 2*i + 2

        # Min Heap Implementation, will be different equation for Max Heap
        if leftNodeIdx < n and arr[i] > arr[leftNodeIdx]:
            arr[i], arr[leftNodeIdx] = arr[leftNodeIdx], arr[i]
        if rightNodeIdx < n and arr[i] > arr[rightNodeIdx]:
            arr[i], arr[rightNodeIdx] = arr[rightNodeIdx], arr[i]

        i = int((i - 1) / 2) if i != 0 else -1

    return res

# Observe how the array changes
print(heappop(array), array)
print(heappop(array), array)
print(heappop(array), array)
print(heappop(array), array)
print(heappop(array), array)
print(heappop(array), array)
print(heappop(array), array)
print(heappop(array), array)
print(heappop(array), array)
print(heappop(array), array)
print(heappop(array), array)
print(heappop(array), array)
Enter fullscreen mode Exit fullscreen mode

Again, we use consider amortised time, because popping from an array length N is O(logN), but popping from an array length 1 is O(1).

Bonus: Pop and Push Time Complexity (TC)

Technically, we can heapify an array, and then pop them one-by-one to append to an empty array to get a sorted array. How much faster could that be?

Let's do some algebra:

"""
Let's assume Time_Complexity_HeapPop < Time_Complexity_Sort, and that popping elements take time within range of [1, logN]. Using arithmetic summation, total time to pop all elements is S = N/2 * [1 + logN].

Writing this into the inequality above:
N/2 * [1 + logN] < NlogN # which becomes...
1 + logN < 2 * logN # and finally...
1 < logN

So we see that it is faster by (logN - 1) time.
log(100) = 6.64
log(100,000) = 16.61
log(1,000,000,000) = 29.90
"""
Enter fullscreen mode Exit fullscreen mode

Top comments (0)