Timsort is the most popular sorting algorithm that you've never heard of. If you've spent any time studying sorting algorithms in an academic context, you're probably familiar with the usual suspects: merge sort, quick sort, binary sort, and so on. Timsort is quite unique though. If you've used the native sort methods in Python or NodeJS, you've interfaced with Timsort. Let's take a look at what Timsort is...
Timsort is a hybrid sorting algorithm. Hybrid algorithms are algorithms that use two or more sub-algorithms that solve the same problem, such as sorting. A hybrid algorithm will use one of the two sub-algorithms depending on the input data or at different points in the course of the algorithm's execution. Hybrid algorithms are great because they can allow you to combine the best of both worlds when it comes to picking an ideal solution for a problem.
Hybrid algorithms are great because they allow you to combine the best of both worlds...
Timsort uses two sub-algorithms under the hood, insertion sort and merge sort. Insertion sort is a sorting algorithm that sorts an unsorted list by iterating through each item in the list one-by-one and placing it in the correct position.
Merge sort is a divide-and-conquer sorting algorithm that sorts a list by repeatedly dividing the list into smaller lists, sorting those lists, and then merging back the sorted lists together.
Merge sort and insertion sort each have their strengths and weaknesses. Timsort uses insertion sort when the size of the input list is small. Timsort starts off by using merge sort. The input list is repeatedly divided into smaller halves.
Eventually, if the length of one of the halves is equal to the length of a run, Timsort will use insertion sort to sort the list. Then, Timsort will merge back the two lists using merge sort. However, Timsort's merge sort strategy is a little different from traditional sorting algorithms. It implements a galloping approach. Typically, when merging two sorted lists together, merge sort will look at the items in the input lists one by one to determine which one should be added to the resulting list first.
Timsort is famously implemented as the default sorting algorithm in the Python programming language. Those who are brave of heart can take a look at the implementation for Timsort in CPython on GitHub. There is a lot of sort-related code in this file, but most of it provides support for the fundamental requirements of Timsort, like the implementation of a merge sort algorithm.
Essentially, Timsort looks for these already-sorted runs and merges them together to avoid extra work when sorting through the entire list.
Timsort falls back to insertion sort for short lists because insertion sort on a small number of elements tends to perform better than merge sort. It does not have the same overhead that merge sort has when it comes to managing the recursive calls and merging the lists back together.
So there you have it. That's the end of the first edition of Algorithm Archaeology covering Timsort. For those who are fans of cliff notes:
- Timsort is an adaptive algorithm, meaning it uses two different sub-algorithms depending on the situation.
- Timsort uses merge sort to sort the list unless the length of the current list being sorted is less than a particular number N. In Python, N is 64.
- Timsort is the default sorting algorithm in Python, Java, and NodeJS.
For those curious to learn more, I recommend reading Tim Peters' original notes on the algorithm.
Stay tuned for more of these posts! I've got some fun stuff in the works. ;)