Split, Apply, Merge in D

Jesse Phillips
Senior Quality Assurance (SDET) ¶ Avid hobby D programmer ¶ Telling people what to do because I am right.

I wanted to find Groupby, a means to iterate a list in groups (lists of lists). In that search I came across this article about split, apply, merge for datatables. This looked like what I wanted, but it being specific to data science had me confused.

In D these functions are chunkBy, map, and joiner. The pattern of consistency continues, as we just need to specify what to group on once our list is sorted.
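To see how the three pieces fit together, here is a minimal sketch of the whole pattern in one pipeline. The helper name `sumPerRun` and the choice of summing each group are my own assumptions for illustration, and the input is assumed to be sorted already.

```d
import std.algorithm : chunkBy, joiner, map, sum;
import std.array : array;

// Hypothetical helper (not from any library): sums each run of equal
// values and flattens the per-run results into one array.
// Assumes `sorted` is already sorted so equal values are adjacent.
int[] sumPerRun(int[] sorted)
{
    return sorted
        .chunkBy!((a, b) => a == b) // split: one chunk per run of equal values
        .map!(run => [run.sum])     // apply: reduce each chunk to a one-element array
        .joiner                     // merge: flatten the arrays into a single range
        .array;                     // copy the lazy result into an int[]
}

unittest
{
    assert(sumPerRun([1, 1, 2, 2, 2, 5]) == [2, 6, 5]);
}
```

Wrapping each sum in a one-element array is only there so joiner has something to flatten; if the apply step returned a range per group, joiner would stitch those together the same way.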

import std.algorithm;

void main()
{
    auto data = [1,1,2,2];
    // Adjacent elements land in the same chunk while the lambda returns true.
    assert(data.chunkBy!((a, b) => a==b)
               .equal!equal([[1,1],[2,2]]));
}

Unlike previous lambdas, this one takes two arguments, which allows elements to be grouped in interesting ways.

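import std.algorithm;

// Orders even numbers before odd ones, ascending within each parity.
bool evenGrouping(int a, int b) {
    if(a%2 == b%2)
        return a < b;
    return a%2 < b%2;
}

void main()
{
    auto data = [1,1,2,2,3,3];
    // equal!equal compares each chunk with the expected inner array.
    assert(data.sort!evenGrouping
               .chunkBy!((a,b) => a%2==b%2)
               .equal!equal([[2,2],[1,1,3,3]]));
}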

As mentioned, sorting needs to happen first.

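import std.algorithm;
import std.range;

void main()
{
    auto data = [3,3,1,1,2,2];

    // Sorting by parity alone leaves the order inside each group unspecified,
    // so each chunk is copied out and sorted before comparing.
    assert(data.sort!((a, b) => a%2 < b%2)
               .chunkBy!((a,b) => a%2==b%2)
               .map!(x => x.array.sort())
               .equal!equal([[2,2],[1,1,3,3]]));
}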

For this contrived example I decided it was best to run the code through a compiler. That was a good thing, as I found a difference in behavior. I'll save map for another day.
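The reason sorting has to come first is that chunkBy only ever looks at neighbouring elements. A small sketch makes that visible; the helper name `parityRuns` is my own invention for illustration.

```d
import std.algorithm : chunkBy, map, sort;
import std.array : array;

// Hypothetical helper: splits `values` into runs of adjacent elements
// with the same parity and copies each run into its own array.
int[][] parityRuns(int[] values)
{
    return values
        .chunkBy!((a, b) => a % 2 == b % 2)
        .map!(run => run.array)
        .array;
}

unittest
{
    // Unsorted: parity alternates, so every element is its own chunk.
    assert(parityRuns([1, 2, 3, 4]) == [[1], [2], [3], [4]]);

    // Sorted by parity first: one chunk per parity.
    auto data = [1, 2, 3, 4];
    data.sort!((a, b) => a % 2 < b % 2)();
    assert(parityRuns(data).length == 2);
}
```

The second check only counts the chunks, since sorting on parity alone does not promise any particular order inside each one.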

Two kinds of lambda are supplied to these functions. One takes a single argument and is referred to as a unary predicate; the other takes two arguments and is referred to as a binary predicate.
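As a quick side-by-side, here is one lambda of each kind. Both helper names (`doubled`, `descending`) are hypothetical and exist only for this sketch.

```d
import std.algorithm : map, sort;
import std.array : array;

// Hypothetical helper: map takes a unary lambda (one element in, one value out).
int[] doubled(int[] xs)
{
    return xs.map!(x => x * 2).array;
}

// Hypothetical helper: sort takes a binary lambda (compares two elements).
int[] descending(int[] xs)
{
    // .dup so the caller's array is left untouched.
    return xs.dup.sort!((a, b) => a > b).array;
}

unittest
{
    assert(doubled([1, 2, 3]) == [2, 4, 6]);
    assert(descending([1, 3, 2]) == [3, 2, 1]);
}
```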

When a unary predicate is supplied to chunkBy, each result is a tuple of the key the predicate produced and the chunk of elements that share it. This is an interesting optimization, but this overload should live with group, which already returns tuples.
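To make the two tuple shapes concrete, here is a small sketch that puts them next to each other. The helper names `parityGroups` and `runLengths` are assumptions of mine, not library functions.

```d
import std.algorithm : chunkBy, group, map;
import std.array : array;
import std.typecons : tuple;

// Hypothetical helper: unary chunkBy yields (key, chunk) tuples; the lazy
// chunk is copied into an array so the result can be inspected later.
auto parityGroups(int[] values)
{
    return values
        .chunkBy!(a => a % 2)
        .map!(t => tuple(t[0], t[1].array))
        .array;
}

// Hypothetical helper: group yields (element, repeat count) tuples.
auto runLengths(int[] values)
{
    return values.group.array;
}

unittest
{
    auto g = parityGroups([2, 4, 1, 3]);
    assert(g.length == 2);
    assert(g[0][0] == 0 && g[0][1] == [2, 4]);
    assert(g[1][0] == 1 && g[1][1] == [1, 3]);

    auto r = runLengths([1, 1, 2, 3, 3, 3]);
    assert(r.length == 3);
    assert(r[0][0] == 1 && r[0][1] == 2);
    assert(r[2][0] == 3 && r[2][1] == 3);
}
```

So chunkBy keeps the elements themselves next to the key, while group only keeps a count.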
