Motivation
I am sure you recognize this loading spinner icon. I do not know anyone who likes to wait for the computer. However, when w...
For further actions, you may consider blocking this person and/or reporting abuse
Loved it and I know zero about Scala.
One thing I wanted to ask: do you think Scala's
sorted
implementation should be improved?"Unpacking" an Scala list to put it in a Java array to sort it and then to re-pack again as a Scala list seems a very inefficient algorithm.
Is it possible to improve on that? Keep in mind that I don't know how Scala works and I'm sure there's a very good reason for that being the implementation they chose.
I agree with you that it is indeed not very efficient. It basically requires you to copy the data two times. If you want to use the already highly optimized java.util.Arrays.sort, however, it is the only way to do it when you don't work with an array already.
What you could try is to use a sorting algorithm which works well for the data structure you are using, like merge sort for immutable linked lists. If you are using quick sort it probably performs very poorly, as the random access for pivoting will be linear in complexity.
It would actually be interesting to see how a specialized sorting algorithm that requires you to copy the data only once would perform.
About your question whether you can change the standard library: You can try. I don't know if the development team will like it :D But I have to admit that I also thought about it. IMO the reason for choosing this implementation is that it is good enough for most of the cases and it is general enough to work for all sequence-like data structures. Also it was easy to write and verify as it is basically just wrapping around the existing and optimized Java sort.
I'll post an update if I manage to write a merge sort and compare it.
I wrote a simple merge sort which is not tail recursive and compared the Scala sorted and mine on linked lists. Our intuition seems to be correct, you can be more efficient if you put some effort to adjust the algorithm.
I understand your point but, at least from the perspective of the user of a programming language, the expectation is (going to use Python's syntax here) that "mutablelist.sort()" or "immutablelist.sort()" or and so on are going to choose a different sorting algorithm, for each type of the caller.
I wasn't proposing to change Scala's implementation, just trying to understand why there's only one if there are so many containers with wildly different behaviours.
If you're curious, as far as i know this is still Python's sorting algorithm: timsort
I'm not sure about whether one should have this expectation. The problem is that there are so many different possible implementations of a value sequence and so many different sorting algorithms. How would you possibly know what is the best sorting algorithm when designing the library? It might not only depend on the data structure but also the data itself. The strategy to have a general-purpose algorithm in your library that works everywhere is not a bad one IMHO. It will be good enough for all the cases where performance does not matter. If performance matters you anyway have to put some thought into picking the right data structure and right sorting algorithm.
I don't know but I could imagine that it's good enough like this. Maybe the development focus of the language was just different and nobody complained / made an effort to change it.
This is also indicated by the documentation of scala.util.Sorting: "Note also that high-performance non-default sorts for numeric types are not provided. If this is required, it is advisable to investigate other libraries that cover this use case."
So it seems that if you need fast sorting, stick to arrays. If this is not fast enough, try a different library.
Java > 6 also uses timsort for sorting arrays of objects (still quick sort for primitives afaik). But does Python have different data structures or only arrays? What happens if you try to call timsort on a linked list in Python? (sorry if my question is stupid but I didn't use Python a lot).
Thanks a lot for the discussion :) It's a lot of fun to dig deeper!
Python's main container structures are immutable arrays (tuples) and mutable arrays (lists).
I don't know all the details of the implementation but they are both C arrays.
Lists: How are lists implemented? and this
Tuples: stackoverflow.com/a/14135893/4186181
Thank you for the discussion! :-)
I see. I learned a lot during the writing and discussion process :)
Awesome!!
Thank you!
Wow this is incredibly thorough. Amazing job.
Thank you so much! I'm always spending way more time than I initially wanted but glad to see that it's worth it :)