Timsort
Timsort is the most popular sorting algorithm that you've never heard of. If you've spent any time studying sorting algorithms in an academic context, you're probably familiar with the usual suspects: merge sort, quick sort, binary sort, and so on. Timsort is quite unique though. If you've used the native sort methods in Python or NodeJS, you've interfaced with Timsort. Let's take a look at what Timsort is...
The what
Timsort is a hybrid sorting algorithm. Hybrid algorithms are algorithms that use two or more sub-algorithms that solve the same problem, such as sorting. A hybrid algorithm will use one of the two sub-algorithms depending on the input data or at different points in the course of the algorithm's execution. Hybrid algorithms are great because they can allow you to combine the best of both worlds when it comes to picking an ideal solution for a problem.
Hybrid algorithms are great because they allow you to combine the best of both worlds...
Timsort uses two sub-algorithms under the hood, insertion sort and merge sort. Insertion sort is a sorting algorithm that sorts an unsorted list by iterating through each item in the list one-by-one and placing it in the correct position.
Merge sort is a divide-and-conquer sorting algorithm that sorts a list by repeatedly dividing the list into smaller lists, sorting those lists, and then merging back the sorted lists together.
Merge sort and insertion sort each have their strengths and weaknesses. Timsort uses insertion sort when the size of the input list is small. Timsort starts off by using merge sort. The input list is repeatedly divided into smaller halves.
Eventually, if the length of one of the halves is equal to the length of a run, Timsort will use insertion sort to sort the list. Then, Timsort will merge back the two lists using merge sort. However, Timsort's merge sort strategy is a little different from traditional sorting algorithms. It implements a galloping approach. Typically, when merging two sorted lists together, merge sort will look at the items in the input lists one by one to determine which one should be added to the resulting list first.
The how
Timsort is famously implemented as the default sorting algorithm in the Python programming language. Those who are brave of heart can take a look at the implementation for Timsort in CPython on GitHub. There is a lot of sort-related code in this file, but most of it provides support for the fundamental requirements of Timsort, like the implementation of a merge sort algorithm.
The why
Timsort's popularity has extended beyond the Python programming language. It's the default sorting implementation in Java, JavaScript and Node (via the V8 JavaScript engine), and Octave. Its popularity stems from the fact that it's particularly tuned for the types of lists that one might come across in real-world scenarios. Timsort is highly performant on data that is already partially sorted because it looks for "runs" in the input list. "Runs" are segments of the list, having a minimum of two items, that are in strictly descending or ascending order.
Essentially, Timsort looks for these already-sorted runs and merges them together to avoid extra work when sorting through the entire list.
Timsort falls back to insertion sort for short lists because insertion sort on a small number of elements tends to perform better than merge sort. It does not have the same overhead that merge sort has when it comes to managing the recursive calls and merging the lists back together.
Conclusion
So there you have it. That's the end of the first edition of Algorithm Archaeology covering Timsort. For those who are fans of cliff notes:
- Timsort is an adaptive algorithm, meaning it uses two different sub-algorithms depending on the situation.
- Timsort uses merge sort to sort the list unless the length of the current list being sorted is less than a particular number N. In Python, N is 64.
- Timsort is the default sorting algorithm in Python, Java, and NodeJS.
For those curious to learn more, I recommend reading Tim Peters' original notes on the algorithm.
Stay tuned for more of these posts! I've got some fun stuff in the works. ;)
Top comments (9)
That was a great article. I've always been interested in learning more about the algorithms that programming languages use internally and this helped scratch that itch.
Also wanted to point out that the Github link to the TimSort code is invalid.
Thank you!
The links should be fixed now.
Great explanation Safia! And thanks for refreshing my memory. As an old timer of Python I do remember when Tim Peters added this algorithm, for a while it seemed like only the Python community had this "super power" at their disposal 😂
ps. The link "Timsort in CPython on GitHub." is broken and the link at the bottom points indeed to the C implementation. I wonder if you meant to link this: bugs.python.org/file4451/timsort.txt
Argh! Apologies. I was trying out Notion to write my blog posts and turns out it adds a redirect to all links in the exported Markdown. This is fixed now.
Thank you!
A very nicely written guide for a person who does not like the sorting algorithms where a person has to remember many things such as different scenarios and all other things related to time complexities, best and worst cases as well.
The Timsort looks a great way to tackle all kinds of questions and especially questions related to the interview as well.
I will Surely Implement this new Algorithm by myself so that I can understand the working of this algorithm more deeply and thoroughly as well.
Once more thank you for writing such an amazing article for an ordinary programmer like me.
Safia, thanks for sharing. Never knew that.
Fun fact: I read the whole article as Timesort until I saw the name Tim at the end :D
I had to stop myself from typing "Timesort" so many times while writing this.
I read the whole thing as Timesort until I read these comments 🤪