mengqinyuan

Posted on

# Be wary of time complexity pitfalls

## Write Here

A bilibili vedio also shows this : https://www.bilibili.com/video/BV16u4m1c7cU/?spm_id_from=333.999.0.0 and I think it's a good vedio, but its language is Chinese.

## time complexity

• What does time complexity mean?
• Time complexity is a measure of how much time an algorithm takes to run as a function of the size of its input. It is a way to describe the efficiency of an algorithm, and it is used to compare different algorithms and determine which one is the most efficient.

• How to calculate time complexity?

• To calculate time complexity, we need to consider the number of operations performed by the algorithm as a function of the size of the input. The most common way to measure the number of operations is by counting the number of times a particular operation is performed.

• What are some common pitfalls when calculating time complexity?

1. Not considering the input size: If we only consider the number of operations performed by the algorithm, we may underestimate the time complexity. For example, if we count the number of times a loop runs, but we don't take into account the size of the input, then the time complexity may be too high.
1. Not considering the algorithm's efficiency: Some algorithms may perform many operations even when the input size is small, which can lead to high time complexity. For example, sorting algorithms like bubble sort and selection sort have quadratic time complexity, even though they may only need to swap two elements in a small array.
1. Not considering the algorithm's worst-case scenario: Some algorithms have a best-case scenario where they perform fewer operations than the worst-case scenario. For example, searching algorithms like binary search have a best-case scenario where they find the target element in logarithmic time, but they have a worst-case scenario where they need to examine all elements in the array.

### Let's see some examples of time complexity

Here is a question:
Find the max 10 integers in a list.

``````import random
ls = [random.randint(1, 100) for _ in range(n)]
``````

Here is the test function:

``````import time
def benchmark(func, *args) -> float:
# bench mark for testing
start = time.perf_counter()
func(*args)
end = time.perf_counter()
return end - start
``````

### Solution1: Use heap

Here is the solution which uses the heapq module:

``````def find_max_n(ls, n):
import heapq
return heapq.nlargest(n, ls)
``````

Or we write it in the python way:

``````def find_largest_n(nums, n):
if n <= 0:
return []

max_heap = []

for num in nums:
if len(max_heap) < n:
max_heap.append(num)
# call sift_down
for i in range((len(max_heap) - 2) // 2, -1, -1):
_sift_down(max_heap, i)
elif num > max_heap[0]:
max_heap[0] = num
_sift_down(max_heap, 0)

return max_heap

def _sift_down(heap, index):
left = 2 * index + 1
right = 2 * index + 2
largest = index

if left < len(heap) and heap[left] > heap[largest]:
largest = left

if right < len(heap) and heap[right] > heap[largest]:
largest = right

if largest != index:
heap[index], heap[largest] = heap[largest], heap[index]
_sift_down(heap, largest)

``````

### Solution2: Use sort

Here is the solution which uses the sort function:

``````def find_max_n(ls, n):
return sorted(ls, reverse=True)[:n]
``````

### Testing

Almost everyone know that The time complexity of the heap, is O(n log k), and the time complexity of the sort function is O(n log n).

It seems that the First solution is better than the second one. But it is not true in python.

Look at the final code:

``````import time
def benchmark(func, *args) -> float:
start = time.perf_counter()
func(*args)
end = time.perf_counter()
return end - start

def find_max_n_heapq(ls, n):
import heapq
return heapq.nlargest(n, ls)

def find_max_n_python_heap(nums, n):
if n <= 0:
return []

max_heap = []

for num in nums:
if len(max_heap) < n:
max_heap.append(num)
# call sift_down
for i in range((len(max_heap) - 2) // 2, -1, -1):
_sift_down(max_heap, i)
elif num > max_heap[0]:
max_heap[0] = num
_sift_down(max_heap, 0)

return max_heap

def _sift_down(heap, index):
left = 2 * index + 1
right = 2 * index + 2
largest = index

if left < len(heap) and heap[left] > heap[largest]:
largest = left

if right < len(heap) and heap[right] > heap[largest]:
largest = right

if largest != index:
heap[index], heap[largest] = heap[largest], heap[index]
_sift_down(heap, largest)

def find_max_n_sorted(ls, n):
return sorted(ls, reverse=True)[:n]

# test
import random
for n in range(10, 10000, 100):
ls = [random.randint(1, 100) for _ in range(n)]
print(f"n = {n}")
print(f"Use    Heapq: {benchmark(find_max_n_heapq, ls, n)}")
print(f"Python Heapq: {benchmark(find_max_n_python_heap, ls, n)}")
print(f"Sorted      : {benchmark(find_max_n_sorted, ls, n)}")
``````

I run it in Python3.11 VScode, Here is the result:

#### n is not big

n = 900
Use Heapq: 0.002430099993944168
Python Heapq: 0.06343129999004304

## Sorted : 9.280000813305378e-05

n = 910
Use Heapq: 9.220000356435776e-05
Python Heapq: 0.07715500006452203

## Sorted : 9.360001422464848e-05

n = 920
Use Heapq: 9.440002031624317e-05
Python Heapq: 0.06573969998862594

## Sorted : 0.00012450001668184996

n = 930
Use Heapq: 9.689992293715477e-05
Python Heapq: 0.06760239996947348

## Sorted : 9.66999214142561e-05

n = 940
Use Heapq: 9.600003249943256e-05
Python Heapq: 0.07372559991199523

## Sorted : 9.680003859102726e-05

n = 950
Use Heapq: 9.770004544407129e-05
Python Heapq: 0.07306570000946522

## Sorted : 0.00011979998089373112

n = 960
Use Heapq: 9.980006143450737e-05
Python Heapq: 0.0771204000338912

## Sorted : 0.00022829999215900898

n = 970
Use Heapq: 0.0001601999392732978
Python Heapq: 0.07493270002305508

## Sorted : 0.00010840001050382853

n = 980
Use Heapq: 9.949994273483753e-05
Python Heapq: 0.07698320003692061

## Sorted : 0.00010300008580088615

n = 990
Use Heapq: 9.979994501918554e-05
Python Heapq: 0.0848745999392122
Sorted : 0.00012620002962648869

#### if n is big ?

n = 10000
Use Heapq: 0.003642000025138259
Python Heapq: 9.698883199947886

## Sorted : 0.00107999995816499

n = 11000
Use Heapq: 0.0014836000045761466
Python Heapq: 10.537632800056599

## Sorted : 0.0012236000038683414

n = 12000
Use Heapq: 0.001384599949233234
Python Heapq: 12.328411899972707

## Sorted : 0.0013226999435573816

n = 13000
Use Heapq: 0.0020017001079395413
Python Heapq: 15.637207800056785

## Sorted : 0.0015075999544933438

n = 14000
Use Heapq: 0.0017026999266818166
Python Heapq: 17.298848500009626

## Sorted : 0.0016967999981716275

n = 15000
Use Heapq: 0.0017773000290617347
Python Heapq: 20.780625900020823
Sorted : 0.0017105999868363142

### What I find & how to improve it

When n is big, Sorted costs a little time (sometimes even better than use heapq), but Python Heapq costs a lot of time.

• Why Sorted costs a little time, but Python Heapq costs a lot of time?
• Because `sorted()` is a `bulit-in function` in Python, you can find python office document about it.

Bulit-in function is faster than heapq because it is written in C, which is a compiled language.

• How to improve it?
• You can use the built-in function sorted() instead of `heapq.sort()` to improve the performance of your code. The sorted() function is a built-in function in Python, which is implemented in C and is therefore much faster than `heapq.sort()`

### What about use other language?

• Use other language like C++, C#, Java, etc.
``````#include <iostream>
#include <vector>
#include <algorithm>
#include <chrono>
#include <random>

// C++ benchmark function
double benchmark(std::function<void(std::vector<int>, int)> func, std::vector<int> ls, int n) {
auto start = std::chrono::high_resolution_clock::now();
func(ls, n);
auto end = std::chrono::high_resolution_clock::now();
return std::chrono::duration<double>(end - start).count();
}

// Using std::partial_sort to find the n largest elements
void find_max_n_sorted(std::vector<int>& ls, int n) {
std::partial_sort(ls.begin(), ls.begin() + n, ls.end(), std::greater<int>());
ls.resize(n);
}

// Manual heap implementation
void _sift_down(std::vector<int>& heap, int index) {
int left = 2 * index + 1;
int right = 2 * index + 2;
int largest = index;

if (left < heap.size() && heap[left] > heap[largest])
largest = left;

if (right < heap.size() && heap[right] > heap[largest])
largest = right;

if (largest != index) {
std::swap(heap[index], heap[largest]);
_sift_down(heap, largest);
}
}

std::vector<int> find_max_n_python_heap(std::vector<int> nums, int n) {
if (n <= 0)
return {};

std::vector<int> max_heap;

for (int num : nums) {
if (max_heap.size() < n) {
max_heap.push_back(num);
for (int i = (max_heap.size() - 2) / 2; i >= 0; --i)
_sift_down(max_heap, i);
} else if (num > max_heap[0]) {
max_heap[0] = num;
_sift_down(max_heap, 0);
}
}

return max_heap;
}

int main() {
std::mt19937 gen(std::random_device{}());
std::uniform_int_distribution<> dis(1, 100);

for (int n = 10; n <= 10000; n += 100) {
std::vector<int> ls(n);
std::generate(ls.begin(), ls.end(), [&]() { return dis(gen); });

std::vector<int> ls_copy1 = ls;
std::vector<int> ls_copy2 = ls;

std::cout << "n = " << n << std::endl;
std::cout << "Use    Heapq: " << benchmark(find_max_n_sorted, ls_copy1, n) << std::endl;
std::cout << "Python Heapq: " << benchmark(find_max_n_python_heap, ls_copy2, n) << std::endl;

// Note: C++ std::nth_element or std::partial_sort could be used here for comparison
}

return 0;
}

``````
``````import java.util.*;
import java.time.Duration;
import java.time.Instant;

public class MaxElementsBenchmark {

public static double benchmark(Runnable func) {
Instant start = Instant.now();
func.run();
Instant end = Instant.now();
return Duration.between(start, end).toNanos() / 1e9;
}

public static List<Integer> findMaxNHeap(ArrayList<Integer> ls, int n) {
PriorityQueue<Integer> minHeap = new PriorityQueue<>();
for (int num : ls) {
if (minHeap.size() < n) {
minHeap.offer(-num);
} else if (-num > minHeap.peek()) {
minHeap.poll();
minHeap.offer(-num);
}
}
List<Integer> result = new ArrayList<>(minHeap);
Collections.sort(result);
Collections.reverse(result);
return result;
}

public static List<Integer> findMaxNManualHeap(ArrayList<Integer> nums, int n) {
if (n <= 0) return new ArrayList<>();

ArrayList<Integer> maxHeap = new ArrayList<>();
for (int num : nums) {
if (maxHeap.size() < n) {
siftDown(maxHeap, maxHeap.size() - 1);
} else if (num > maxHeap.get(0)) {
maxHeap.set(0, num);
siftDown(maxHeap, 0);
}
}
return maxHeap;
}

private static void siftDown(ArrayList<Integer> heap, int index) {
int left = 2 * index + 1;
int right = 2 * index + 2;
int largest = index;

if (left < heap.size() && heap.get(left) > heap.get(largest))
largest = left;

if (right < heap.size() && heap.get(right) > heap.get(largest))
largest = right;

if (largest != index) {
Collections.swap(heap, index, largest);
siftDown(heap, largest);
}
}

public static List<Integer> findMaxNSorted(ArrayList<Integer> ls, int n) {
Collections.sort(ls, Collections.reverseOrder());
return ls.subList(0, Math.min(n, ls.size()));
}

public static void main(String[] args) {
Random rand = new Random();
for (int n = 10; n <= 10000; n += 100) {
ArrayList<Integer> ls = new ArrayList<>();
for (int i = 0; i < n; i++) {
}

System.out.println("n = " + n);
System.out.println("Use    Heapq: " + benchmark(() -> findMaxNHeap(ls, n)));
System.out.println("Python Heapq: " + benchmark(() -> findMaxNManualHeap(ls, n)));
System.out.println("Sorted      : " + benchmark(() -> findMaxNSorted(ls, n)));
}
}
}

``````
• The result is different from the C++ version. The C++ version is faster than the Java version. This is because the C++ version uses a built-in function, while the Java version uses a manual heap implementation.

### Conclusion

When we are dealing with large data, we should use built-in function instead of `heapq.sort()` to improve the performance of our code. We must be wary of time complexity pitfalls when dealing with large data. Sometimes time complexity pitfalls are unavoidable, but we should try to avoid them as much as possible.

Hello, I'm mengqinyuan. I'm a student. I love to learn new things.
You can see my github: [MengQinYuan's Github][https://github.com/mengqinyuan]

### Other things

Eight Queens Problem: [https://www.geeksforgeeks.org/8-queen-problem/]
Time & Space Complexity: [https://www.geeksforgeeks.org/time-complexity-and-space-complexity/]
Heap: [https://www.geeksforgeeks.org/heap-data-structure/]