Reading this fine article on C# GroupBy got me thinking I should try something. D provides a GroupBy equivalent in std.algorithm, but it is called chunkBy.
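As a quick taste before the real example, here is a minimal sketch with plain ints: chunkBy takes a two-argument predicate and produces a range of chunks whose elements the predicate says belong together.

import std;

void main() {
    // equal values end up chunked together
    assert([1, 1, 2, 2].chunkBy!((a, b) => a == b)
        .equal!equal([[1, 1], [2, 2]]));
}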
One of the primary subjects for grasping idiomatic D is ranges. And while material has been written on them:
- https://dlang.org/phobos/std_range_primitives.html
- https://tour.dlang.org/tour/en/basics/ranges
- http://ddili.org/ders/d.en/ranges.html
- https://en.m.wikipedia.org/wiki/Pipeline_(Unix)
I can't say I feel good about any of it as an introduction. And while many of my own articles touch on ranges, I steer clear of really defining them. I will continue in that tradition, but I hope this will help me understand what I'm missing.
Clickbait
Let's get the clickbait title out of the way.
D comes with, and defaults to, garbage collection, so your application will use memory. What this article covers are some choices that avoid additional memory allocation. And ranges themselves are not guaranteed to be free of memory allocation.
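To make that concrete, here is a minimal sketch (sumOfDoubledEvens is just an illustrative name, not something from later in the article): lazy adapters like filter and map wrap the slice without allocating, while materializing with .array would allocate, and @nogc is what lets the compiler police the difference.

import std;

@nogc int sumOfDoubledEvens(int[] xs) {
    // filter and map are lazy wrappers over the slice: no allocation here
    return xs.filter!(x => x % 2 == 0)
             .map!(x => x * 2)
             .sum;
    // materializing with .array instead would allocate from the GC heap,
    // so it would not compile inside this @nogc function
}

void main() {
    assert(sumOfDoubledEvens([1, 2, 3, 4]) == 12); // 2*2 + 4*2
}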
chunkBy
Ranges provide a means to obtain elements from a container. chunkBy is a way to group those elements, as you iterate them, based on some similarity.
Let's set the stage for some data.
import std;

struct Data { int a; string b; }

Data[] elements;

void main() {
    elements ~= Data(3, "a");
    elements ~= Data(6, "a");
    elements ~= Data(3, "b");
    elements ~= Data(6, "b");
    DoGroupDemo();
}

void DoGroupDemo() {
    // this is a running file, continuing in future blocks
Here I've established an array which will contain some data made of numbers and letters. I first want to call attention to the letters 'a' and 'b'.
@nogc
void groupString(Data[] elements, Data[][] ans) {
    auto grouping = elements.chunkBy!((x, y) => x.b == y.b);
    assert(grouping.equal!equal(ans));
}
Here I am setting up a function to show calling chunkBy, grouping on the string values.
Note that I have marked this function @nogc so the compiler proves to you that it does not allocate any additional memory.
Continuing with that proof, I have included an assertion which traverses the resulting range and compares it against the provided answer.
groupString(elements, [
    [Data(3, "a"), Data(6, "a")],
    [Data(3, "b"), Data(6, "b")]]);
This shows the grouping: the first list is everything with 'a' and the second list is everything with 'b'.
I built this separation to show off the no-GC properties, since the array literal holding the expected answer will allocate and so must be built outside the @nogc function.
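If you do want the expected answer available inside the @nogc function, one sketch of an option (groupStringStatic is my name for it, not part of the running example) is a static immutable value: it is built at compile time, so referencing it costs no runtime allocation.

@nogc
void groupStringStatic(Data[] elements) {
    // baked into the binary at compile time, no GC allocation at run time
    static immutable expected = [
        [Data(3, "a"), Data(6, "a")],
        [Data(3, "b"), Data(6, "b")]];
    auto grouping = elements.chunkBy!((x, y) => x.b == y.b);
    assert(grouping.equal!equal(expected));
}

With that aside out of the way, back to the running example.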
@nogc
void groupInt(Data[] elements, Data[][] ans) {
    auto grouping = elements.chunkBy!((x, y) => x.a == y.a);
    assert(grouping.equal!equal(ans));
}
This time we look at the grouping by the integer. Keep in mind we have established that allocation will not occur, so what should we expect...
groupInt(elements, [
    [Data(3, "a")],
    [Data(6, "a")],
    [Data(3, "b")],
    [Data(6, "b")]]);
With this new grouping it appears no grouping actually happened; each element ends up in a group by itself. This is because chunkBy only looks at adjacent elements, so it doesn't need to keep any internal memory of elements it has already seen.
Let's take one final look at another place where allocation is avoided.
@nogc
void groupInt2(Data[] elements, Data[][] ans) {
    // sort before grouping
    elements.sort!((x, y) => x.a < y.a);
    auto grouping = elements.chunkBy!((x, y) => x.a == y.a);
    assert(grouping.equal!equal(ans));
}
groupInt2(elements, [
    [Data(3, "a"), Data(3, "b")],
    [Data(6, "a"), Data(6, "b")]]);
}
Sorting the data by the desired grouping key gives us results more in line with expectations.
Sort operates by moving elements in place, avoiding any allocation of new storage. (This creates some interesting properties, like sorting across multiple arrays.)
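That parenthetical deserves a quick illustration. Since sort only needs a random-access range of swappable elements, you can chain two arrays together and sort across both of them in place (a sketch; sortAcross is my own name for it).

import std;

// chain presents the two arrays as one random-access range,
// so sort rearranges elements in place across both of them
void sortAcross(int[] a, int[] b) {
    sort(chain(a, b));
}

void main() {
    int[] a = [5, 1, 9];
    int[] b = [3, 7];
    sortAcross(a, b);
    assert(a == [1, 3, 5] && b == [7, 9]);
}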
It would be reasonable to ask why chunkBy doesn't perform an in-place sort first. But that would violate another guiding principle: don't modify the underlying data/structure (at least when that isn't the main purpose of the algorithm).
Ranges
Ranges allow for getting at data in different manners. Algorithms utilize ranges to operate on that data and possibly produce new ranges.
We build algorithms that, when combined, solve an assortment of problems. This means the range itself isn't the important part; it is an interface for general solutions to build upon.
D's standard library tries to provide low-overhead, generic algorithms. This can result in some unexpected behaviors, so understanding the principles they were created under will make them more predictable.
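As a small sketch of that composition (illustrative numbers only): each algorithm below consumes a range and hands a new range to the next, and nothing is computed until the final step asks for values.

import std;

void main() {
    auto total = iota(1, 11)          // the numbers 1 through 10
        .filter!(n => n % 2 == 0)     // keep the even ones
        .map!(n => n * n)             // square them
        .sum;                         // 4 + 16 + 36 + 64 + 100
    assert(total == 220);
}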
Other Languages
We see other languages moving in this direction.
- C++ is getting ranges and algorithms
- C# has had LINQ methods which operate on IEnumerable
- Java got streams
This next item is about flattening multiple lists. First I will show the data setup.
import std;

void main()
{
    writeln(SelectMany);
    writeln(ResultSelector);
}

struct Book {
    string title;
    string[] characters;
}

auto books = [
    Book("Sphere",
        ["Harry",
         "Norman",
         "Beth",
         "Jerry"]),
    Book("Park",
        ["Malcolm",
         "Grant",
         "Satler",
         "Nedry",
         "Hammond",
         "Gennaro",
         "Tim",
         "Lex"])
];
Each book has a list of characters. We are interested in the characters.
@nogc nothrow @safe
auto SelectMany() {
    return books
        .map!(b => b.characters)
        .joiner;
}
Selecting characters consists of mapping each book to grab the character list, then joining the lists together. Once again this is an operation where no memory is allocated.
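As a quick check of what that flattened range yields (a sketch reusing the books data above, added inside main), it compares equal to one continuous list of character names:

// inside main, after the writeln calls
assert(SelectMany().equal(
    ["Harry", "Norman", "Beth", "Jerry",
     "Malcolm", "Grant", "Satler", "Nedry",
     "Hammond", "Gennaro", "Tim", "Lex"]));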
nothrow @safe
auto ResultSelector() {
    return books
        .map!(b => b.characters
            .map!(c => tuple(b.title, c)))
        .joiner;
}
To follow along with the features of SelectMany's result selector, we perform an inner map over the characters.
In this case the inner map creates a closure over b, and that brings in the garbage collector to manage the closure, which is why this function is not marked @nogc.
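If you want to avoid that allocation, one sketch of an alternative (ResultSelectorNoGc is my own name, and I'm assuming zip, repeat, and joiner all infer @nogc here) is to build the pairs with zip and repeat instead of a nested lambda, so nothing closes over b:

@nogc
auto ResultSelectorNoGc() {
    return books
        .map!(b => zip(b.title.repeat, b.characters)) // lazy (title, character) pairs
        .joiner;                                      // no nested lambda, no closure, no GC
}

It yields the same (title, character) tuples as ResultSelector, just without the closure.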