DEV Community

Cover image for Microbenchmarking your Scala Code

Microbenchmarking your Scala Code

Frank Rosner on March 21, 2018

Motivation I am sure you recognize this loading spinner icon. I do not know anyone who likes to wait for the computer. However, when w...
Collapse
 
rhymes profile image
rhymes

Loved it and I know zero about Scala.

One thing I wanted to ask: do you think Scala's sorted implementation should be improved?

"Unpacking" an Scala list to put it in a Java array to sort it and then to re-pack again as a Scala list seems a very inefficient algorithm.

Is it possible to improve on that? Keep in mind that I don't know how Scala works and I'm sure there's a very good reason for that being the implementation they chose.

Collapse
 
frosnerd profile image
Frank Rosner • Edited

I agree with you that it is indeed not very efficient. It basically requires you to copy the data two times. If you want to use the already highly optimized java.util.Arrays.sort, however, it is the only way to do it when you don't work with an array already.

What you could try is to use a sorting algorithm which works well for the data structure you are using, like merge sort for immutable linked lists. If you are using quick sort it probably performs very poorly, as the random access for pivoting will be linear in complexity.

It would actually be interesting to see how a specialized sorting algorithm that requires you to copy the data only once would perform.

About your question whether you can change the standard library: You can try. I don't know if the development team will like it :D But I have to admit that I also thought about it. IMO the reason for choosing this implementation is that it is good enough for most of the cases and it is general enough to work for all sequence-like data structures. Also it was easy to write and verify as it is basically just wrapping around the existing and optimized Java sort.

I'll post an update if I manage to write a merge sort and compare it.

Collapse
 
frosnerd profile image
Frank Rosner • Edited

I wrote a simple merge sort which is not tail recursive and compared the Scala sorted and mine on linked lists. Our intuition seems to be correct, you can be more efficient if you put some effort to adjust the algorithm.

  def merge(xs: List[Int], ys: List[Int]): List[Int] =
    (xs, ys) match {
      case (Nil, l) => l
      case (l, Nil) => l
      case (xh :: xt, yh :: yt) =>
        if (xh <= yh)
          xh :: merge(xt, ys)
        else
          yh :: merge(xs, yt)
    }

  def sort(xs: List[Int]): List[Int] =
    xs match {
      case Nil         => Nil
      case head :: Nil => head :: Nil
      case xs =>
        val (left, right) = xs.splitAt(xs.length / 2)
        merge(sort(left), sort(right))
    }
  performance of "List" in {
    measure method "sorted" in {
      using(list) in { l =>
        l.sorted
      }
    }
    measure method "merge-sort" in {
      using(list) in { l =>
        sort(l)
      }
    }
  }

graph
legend

Collapse
 
rhymes profile image
rhymes

I understand your point but, at least from the perspective of the user of a programming language, the expectation is (going to use Python's syntax here) that "mutablelist.sort()" or "immutablelist.sort()" or and so on are going to choose a different sorting algorithm, for each type of the caller.

I wasn't proposing to change Scala's implementation, just trying to understand why there's only one if there are so many containers with wildly different behaviours.

If you're curious, as far as i know this is still Python's sorting algorithm: timsort

Thread Thread
 
frosnerd profile image
Frank Rosner

the expectation is (going to use Python's syntax here) that "mutablelist.sort()" or "immutablelist.sort()" or and so on are going to choose a different sorting algorithm, for each type of the caller.

I'm not sure about whether one should have this expectation. The problem is that there are so many different possible implementations of a value sequence and so many different sorting algorithms. How would you possibly know what is the best sorting algorithm when designing the library? It might not only depend on the data structure but also the data itself. The strategy to have a general-purpose algorithm in your library that works everywhere is not a bad one IMHO. It will be good enough for all the cases where performance does not matter. If performance matters you anyway have to put some thought into picking the right data structure and right sorting algorithm.

just trying to understand why there's only one if there are so many containers with wildly different behaviours.

I don't know but I could imagine that it's good enough like this. Maybe the development focus of the language was just different and nobody complained / made an effort to change it.

This is also indicated by the documentation of scala.util.Sorting: "Note also that high-performance non-default sorts for numeric types are not provided. If this is required, it is advisable to investigate other libraries that cover this use case."

So it seems that if you need fast sorting, stick to arrays. If this is not fast enough, try a different library.

If you're curious, as far as i know this is still Python's sorting algorithm: timsort

Java > 6 also uses timsort for sorting arrays of objects (still quick sort for primitives afaik). But does Python have different data structures or only arrays? What happens if you try to call timsort on a linked list in Python? (sorry if my question is stupid but I didn't use Python a lot).

Thanks a lot for the discussion :) It's a lot of fun to dig deeper!

Thread Thread
 
rhymes profile image
rhymes • Edited

Python's main container structures are immutable arrays (tuples) and mutable arrays (lists).

I don't know all the details of the implementation but they are both C arrays.

Lists: How are lists implemented? and this

Tuples: stackoverflow.com/a/14135893/4186181

Thank you for the discussion! :-)

Thread Thread
 
frosnerd profile image
Frank Rosner

I see. I learned a lot during the writing and discussion process :)

Collapse
 
liana profile image
Liana Felt (she/her)

Awesome!!

Collapse
 
frosnerd profile image
Frank Rosner

Thank you!

Collapse
 
ben profile image
Ben Halpern

Wow this is incredibly thorough. Amazing job.

Collapse
 
frosnerd profile image
Frank Rosner

Thank you so much! I'm always spending way more time than I initially wanted but glad to see that it's worth it :)