In the last post we looked at how we could make
Array more expressive. By which I mean we added a way for people to extend the data structure to support more use cases. But while third parties now can add missing functionality, there are certain functions which should be supported out of the box, both for performance and convenience reasons.
Elm has four built-in collection types:
Set. With the exception of
Dict, these types look similar from a birds eye view. You can insert an item into the collection, get it back out, and even perform some task for every item stored. Of course, they do have different trade-offs which make them suitable for different situations.
List is great when adding and retrieving items at the beginning (left side) of the collection.
Array is great when you need fast access to any element, and
Set is good when you need a sorted collection with no duplicates.
The API for these collections are different as well. Sometimes this makes sense, but not always. For instance, there is no
sort function for
Set, which is reasonable as
Set is a sorted collection. However, there also isn't a
sort function for
Array, which means that you have to convert it to a
List or a
Set and then back again if you want it sorted.
Converting from one collection type to another is a costly operation. If you use an
Array because it suits your use case most of the time, you don't want to convert to another collection just to perform that one operation.
Let's look at how we can improve the API of
Set to reduce the need for converting from one collection type to another to support the functionality you need.
Certain operations aren't supported on a collection type with good reason. For instance, there is not
set operation on
List because the performance would be bad. If one need to
set a random item in a collection,
Array should be used. Likewise,
diff should only be supported for
Set. Here are some functions that are not supported by all three collection types, which I believe shouldn't be supported either:
push can only be supported efficiently by their respective collections. If you're in need of one of these functions, it's a tell-tale sign of which collection you should use. However, with the improvements I've mentioned in earlier posts,
Array could support all of these functions efficiently, which is yet another reason as to why
Array should be the default collection type.
toIndexedList make no sense for
Set, as it doesn't operate by indexes and doesn't retain insertion order.
repeat doesn't make much sense for
Set either, as it cannot contain duplicates.
diff can only be efficiently implemented by
head can be supported efficiently by the other collections, but the operation itself really only makes sense for
Array would use
get 0, and it isn't quite clear what
head means in the case of
Sets, is it the first/lowest/minimal element or the root/middle element?
sortWith doesn't make sense for
Array should definitely get these functions.
reverse can be implemented efficiently for
Array, but makes little sense for
Below is a list of functions which can be easily supported across all three collections:
- map2, map3, map4, map5
While these functions could be supported across all three collections, does that mean they should? Since we're already messing about, let's ask ourselves if there are certain functions that shouldn't be in core to begin with.
product are essentially shortcuts for
List.foldl (+) 0 and
List.foldl (*) 1. Are those operations common enough to grant them dedicated functions in the core library? If we introduce
slice for all three collections, do we need
Do people actually use
map5? How about
unzip? I mean, there probably are, but are those operations so common that one would expect to find them in core? Not that it necessarily does any harm, but there is something to be said of having a small, but powerful, package where you can read the entire documentation in a short sitting.
Anyway, here's a list of which functions I, personally, see no reason for having in core. I expect people will tell me that these functions definitely belong, but I think it would be nice to at least have the discussion.
Here are some minor things I think shouldn't be left out of a API renovation:
Array has a
size. This is likely due to the fact that
Set is a thin abstraction on top of
Dict and so has just copied the name. I would rename
Set.length out of consistency if nothing else.
Now that I'm on the topic, since
Set is thin abstraction on top of
Dict, maybe everything discussed up til this point applies to
Dict as well?
tail. The analogous for
pop is missing from the API, so I would add it.
I would add
find for all three collections.
Set already supports
member, which is essentially
find except that it returns a boolean instead of the value for which the predicate function returns true. It's a more useful function in general, and so if
member should be a part of the API (which I believe it should) then
find belongs as well.
Dict I would throw in
max to retrieve the minimum/first and maximum/last element.
Stack is a better name than
List and so would rename that type. It's a name that more clearly explains what that collection type is good at.
Finally, as mentioned in the previous post, I think
stoppableFoldr, or something with the same functionality but with better names, should be added to the API of
Array. I think
Set could be well served with that functionality as well.
The goal I'm trying to reach is to make the
Array type more general purpose, but when improving the
Array API, it makes sense to look at the other collection types as well. This article lays out what I believe those APIs should look like.
First, I want to say that I'm not authority on what makes it into Elm. If I actually code up the suggestions in this article and submit it as a proposal, all of it could make it in, or none of it.
Second, all these suggestions are based on my own experience. What I need is your experience. Does these suggestions make sense, or am I off my rockers? Let me know in the discussion thread on the Elm Discourse.
So, is this the end of this series? Not quite. There's a summary left the be published, but that will have to wait until after we go over the top (you didn't think I'd forgot, did you?) in the next post, which is about the literal representation of