When developing software, we sometimes can't get around handling data in some way.
No matter whether you work on the frontend (where the UI you are building is basically a function of the data) or on the backend (backends tend to be rather data-heavy).
Typical tasks when processing data include (but are not limited to):
- Filtering datasets (like a Facebook search that should only give me matches for my search term, or redacting information for privacy reasons - like bank account numbers, email addresses or even passwords)
- Mapping binary data into a human readable format or vice versa
...well, okay. Facebook might not be the best example for demonstrating applied data privacy...
TL;DR: You can skip the theory if you're not interested or already know the basics.
Theory of Data and Transformations
Data
Before working on data, we need to have an idea of what the data looks like. In general, the structure can be defined as a collection of items, for example:
- Rows in a database table - where the table is the collection and rows are the items
- Elements in a set - with elements as items, the set as the collection
- Fields in an array
- Elements in a list
- Entries in a dictionary
- Nodes in a tree
Any collection could be embedded as an item into another collection. Let's formalize this statement:
-- an Item is defined as either:
-- 1. a Primitive value
-- 2. a Collection of Items
data Item = Primitive | Collection Item
Note: I also tried to find a representation for this in TypeScript notation, but TypeScript's type system doesn't seem to be powerful enough to describe this kind of recursive type.
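To make this a bit more tangible, here is a made-up JavaScript value that fits the definition - primitives and collections of items, nested arbitrarily:
// an illustrative value matching the Item definition
const item = [
  42,                                   // primitive
  "hello",                              // primitive
  [1, 2, ["deeper", [true]]],           // a collection of items is itself an item
  { tags: ["even", "objects", "work"] } // dictionaries are collections too
];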
Transformable Data
Since our definition of data consists of just two kinds of types (items and collections), we can only have transformations on the collection level or on the item level. Filtering (deciding for each element whether or not to include it in the result) is a transformation on the collection level.
A collection that can be filtered is called Filterable.
Mapping is the process of taking each item from its container (the collection), applying a transform to the item, and putting it back into a container of the same kind as the one it came from. Mapping is a transformation on the item level.
A collection (or container) whose contents you can map over is called a Functor.
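In JavaScript terms, plain arrays are both: filter works on the collection level, map on the item level.
const numbers = [1, 2, 3, 4];

// collection level: decide for each element whether it stays (Filterable)
numbers.filter(n => n % 2 === 0); // [2, 4]

// item level: transform every element, keep the same kind of container (Functor)
numbers.map(n => n * 10); // [10, 20, 30, 40]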
In Practice
The Common Approach
JavaScript comes with native array methods for mapping and filtering array data. Most notably:
Array.prototype.map(mapFn)
Array.prototype.filter(predicateFn)
Array.prototype.reduce(reducerFn [, init])
Now let's make up a simple example to see how each of them works.
We have a collection of bibliographical data. Each item represents a book or article that has a unique id, a title, was written by one or more authors and has a publication date (in the form of a Unix timestamp in ms since epoch).
type BookOrArticle = {
id: string,
title: string,
authors: Array<string>,
publicationDate: number,
};
We have an array of (let's say) 10,000 books and articles and we need to get all items that contain the word "guide" in the title, were written by J. Doe and were published in 2007. The result items should be in the form:
type Result = {
title: string,
author: string,
date: string
};
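To make the two shapes concrete, a single (made-up) input item and the result we would expect for it could look like this:
// made-up sample data, just to illustrate input and output shapes
const sampleItem = {
  id: "42",
  title: "A Guide to Data Transformations",
  authors: ["J. Doe", "A. Nyone"],
  publicationDate: 1193875200000 // some day in 2007, in ms since epoch
};

const expectedResult = {
  title: "A Guide to Data Transformations",
  author: "J. Doe, A. Nyone",
  date: new Date(1193875200000).toDateString() // e.g. "Thu Nov 01 2007", depending on time zone
};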
1. Naive Approach
Let's take a first approach:
const booksAndArticlesArray = [
/* our array of books and articles */
];
function getResults(inputArray) {
return inputArray
.map(item => ({
...item,
year: new Date(item.publicationDate).getFullYear()
}))
.filter(({ year }) => year === 2007)
.filter(({ title }) => /guide/i.test(title))
.map(({ title, authors, publicationDate }) => ({
title,
author: authors.join(", "),
date: new Date(publicationDate).toDateString()
}))
.filter(({author}) => author.includes("J. Doe"))
}
It might not be immediately visible, but each filter or map step creates an intermediate array containing its result, which is then used as the input for the next filter/map step.
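Here is the same function with every intermediate array made explicit - each const below is a brand-new array that the next step has to iterate over again:
// the naive chain with its intermediate arrays spelled out
function getResultsExplicit(inputArray) {
  const withYear = inputArray.map(item => ({
    ...item,
    year: new Date(item.publicationDate).getFullYear()
  }));
  const from2007 = withYear.filter(({ year }) => year === 2007);
  const guides = from2007.filter(({ title }) => /guide/i.test(title));
  const shaped = guides.map(({ title, authors, publicationDate }) => ({
    title,
    author: authors.join(", "),
    date: new Date(publicationDate).toDateString()
  }));
  return shaped.filter(({ author }) => author.includes("J. Doe"));
}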
2. Optimized Approach
To reduce the number of intermediate data structures created, one could try and reduce the number of transformation steps by collapsing consecutive map and filter operations:
// collapse consecutive filter operations
function getResults(inputArray) {
return inputArray
.map(item => ({
...item,
year: new Date(item.publicationDate).getFullYear()
}))
.filter(
({ year, title }) => year === 2007 && /guide/i.test(title)
)
.map(({ title, authors, publicationDate }) => ({
title,
author: authors.join(", "),
date: new Date(publicationDate).toDateString()
}))
.filter(({ author }) => author.includes("J. Doe"));
}
We could further optimize this by moving the mapping of the authors array field to the author string from the second map operation into the first map operation. This would allow us to collapse the final filter operation with the other filter:
// 1. move mapping authors array to author string into first map operation
// 2. collapse final filter operation with previous filter operation
function getResults(inputArray) {
return inputArray
.map(({ publicationDate, authors, ...item }) => ({
...item,
author: authors.join(", "),
year: new Date(publicationDate).getFullYear()
}))
.filter(
({ year, title, author }) =>
year === 2007 &&
/guide/i.test(title) &&
author.includes("J. Doe")
)
.map(({ title, author, publicationDate }) => ({
title,
author,
date: new Date(publicationDate).toDateString()
}));
}
We reduced the number of intermediate data structures created from 5 to 3, but at the cost of readability. Further reduction is possible by moving the author and year transforms from the first map operation into the filter step and removing the first map operation (this comes at the cost of reducing readability, too).
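For completeness, a sketch of that further-collapsed version (note how the year and author values now have to be computed inside the predicate):
// sketch: one filter, one map - shorter pipeline, harder to read
function getResults(inputArray) {
  return inputArray
    .filter(
      ({ title, authors, publicationDate }) =>
        new Date(publicationDate).getFullYear() === 2007 &&
        /guide/i.test(title) &&
        authors.join(", ").includes("J. Doe")
    )
    .map(({ title, authors, publicationDate }) => ({
      title,
      author: authors.join(", "),
      date: new Date(publicationDate).toDateString()
    }));
}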
Mapping and Filtering via Reduce
What if we could express filter and map in a way that allows us to compose them directly?
Reminder: Compose or function composition means that you call one function with the result of another function.
I.e.: Given the functions f: X -> Y and g: Y -> Z, we call g.f(x) := g(f(x)) the composition of f with g, where g.f: X -> Z (that means g.f takes values of X as arguments and returns values of Z as results).
Example: Composing f(x) => x + 1 and g(x) => x * x to h(x) = g(f(x)) results in h(x) => (x + 1) * (x + 1). We get this result by substituting x in g(x) with the function body of f(x).
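The same worked example in JavaScript:
const f = x => x + 1;
const g = x => x * x;
const h = x => g(f(x)); // the composition of f with g

h(2); // (2 + 1) * (2 + 1) = 9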
So it looks like function composition is the right tool to express all our map and filter transformation steps at once. As reduce is one of the most versatile operations on arrays (or any other iterable structure), let's try to express map and filter as reducers.
Reminder: Reducers are functions that can be used as a callback in Array.prototype.reduce(). The first argument of a reducer is the accumulator (the return value of the last reducer call). The second argument is the element we are looking at (just like the first argument in filter and map). Thus a reducer has the following signature: reducer(acc: A, elem: B) => A, or in Hindley-Milner notation: reducer :: A -> B -> A.
(The HM notation assumes that every function is curried. So a function that takes two arguments is expressed as a function A -> B -> C:
- that takes one argument (A) -> returns a function (B -> C)
- that takes one argument (B) -> returns the result (C))
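A small example of such a reducer, where the accumulator type (A = number) differs from the element type (B = string):
// totalLength :: number -> string -> number
const totalLength = (acc, str) => acc + str.length;

["foo", "ba", "r"].reduce(totalLength, 0); // 6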
Let's try to rebuild map and filter as reducers. For map we need a function that takes a mapping function as argument and returns a reducer function as result. For filter we need a function that takes a filter predicate and also returns a reducer function as result:
const map = mapFn => (acc, elem) => acc.concat(mapFn(elem));
const filter = predicateFn => (acc, elem) => {
if (predicateFn(elem)) {
/*
* append the element
* if predicate evaluates to a truthy result
*/
return acc.concat(elem);
}
/*
* don't append anything otherwise
*/
return acc;
};
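These drop straight into Array.prototype.reduce, with an empty array as the initial accumulator:
[1, 2, 3].reduce(map(x => x * 10), []);     // [10, 20, 30]
[1, 2, 3].reduce(filter(x => x !== 2), []); // [1, 3]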
If you are now wondering 'How would one compose this?', you are wondering right. Composing two functions requires that the argument type of the outer function matches the return type of the inner function. In the above functions, the result of map(mapFn) would fit into neither map nor filter, and vice versa. There is simply no way to do so.
What we would need is that map(mapFn) (or filter(predicateFn)) returns a function that expects a function of its own type (I know it becomes very convoluted here, but stay with me: the resolution is simple).
We resolve this problem by further generalising map and filter. The above implementations are only suited to work with arrays as output. But one could imagine having any type of data structure as output that allows adding elements (like trees, heaps, maps, sets, linked lists, etc.). So let's get rid of the acc.concat(x) calls in the above code and replace them with combine(acc, x), where combine is provided via an argument of the initially returned function:
const map = mapFn => combine => (acc, elem) => combine(acc, mapFn(elem));
const filter = predicateFn => combine => (acc, elem) => {
if (predicateFn(elem)) {
/*
* add the element to the result data structure
* if predicate evaluates to a truthy result
*/
return combine(acc, elem);
}
/*
* don't add anything otherwise
*/
return acc;
};
Now take a close look at how combine is being used:
- First argument: the accumulator (i.e. the result data structure)
- Second argument: the element that should be added to the result data structure
- Return value: the result data structure, containing the element
This not only looks like a reducer, it is a reducer!
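A quick check with plain arrays and concat as the combine function:
const concatCombine = (acc, elem) => acc.concat(elem);

[1, 2, 3, 4].reduce(map(x => x * 2)(concatCombine), []);          // [2, 4, 6, 8]
[1, 2, 3, 4].reduce(filter(x => x % 2 === 0)(concatCombine), []); // [2, 4]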
xform (1)
Let's start using our new map and filter and build our example transform from above:
const booksAndArticlesArray = [
/* our array of books and articles */
];
const xform = (acc, elem) =>
map(item => ({
...item,
year: new Date(item.publicationDate).getFullYear()
}))(
filter(({ year }) => year === 2007)(
filter(({ title }) => /guide/i.test(title))(
map(({ title, authors, publicationDate }) => ({
title,
author: authors.join(", "),
date: new Date(publicationDate).toDateString()
}))(
filter(({ author }) => author.includes("J. Doe"))(
(acc, elem) => (acc.push(elem), acc) // push returns the new length, so hand back the accumulator explicitly
)
)
)
)
)(acc, elem);
const result = booksAndArticlesArray.reduce(xform, []);
... I don't know about you, but I find this horrible. I wouldn't approve any PR that contains a thing like this.
To simplify readability, we introduce a general notion of compose that composes n functions:
const compose = (...functs) => (...args) =>
functs.length > 1
? compose(...functs.slice(0, -1))(
functs[functs.length - 1](...args)
)
: functs[0](...args);
Note: The above implementation of compose is tail-recursive. So you could rewrite the recursion into a loop. Feel free to take this as an exercise. I'm happy to review your solution in the comments.
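A quick sanity check of the ordering - compose applies its functions from right to left:
const inc = x => x + 1;
const square = x => x * x;

compose(square, inc)(2); // square(inc(2)) = 9
compose(inc, square)(2); // inc(square(2)) = 5
Note that although compose works right to left, a composed xform still processes each element in top-to-bottom order: every transducer wraps the reducer it receives, so the outermost (first listed) transformation touches the element first.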
We now have a proper implementation for compose. Let's refactor our xform from above and bring it into a readable form:
xform (2)
const xform = compose(
map(item => ({
...item,
year: new Date(item.publicationDate).getFullYear()
})),
filter(({ year }) => year === 2007),
filter(({ title }) => /guide/i.test(title)),
map(({ title, authors, publicationDate }) => ({
title,
author: authors.join(", "),
date: new Date(publicationDate).toDateString()
})),
filter(({ author }) => author.includes("J. Doe"))
);
Now that we have a clean xform, we can define the function that will use it to apply our transforms. We call the "framework" for running xforms against data xduce. xduce takes four arguments:
- the xform
- a combiner (combine)
- an initializer (getInitial)
- the input data (inputArr)
/*
* xduce draft
*/
const xduce = (xform, combine, getInitial, inputArr) =>
inputArr.reduce(xform(combine), getInitial());
As we might want to reuse the transform, we curry the last argument and default the combine and getInitial arguments (for our convenience):
/*
* xduce (reworked - curried-data-last)
*/
const xduce = (
xform,
combine = (acc, elem) => (acc.push(elem), acc),
getInitial = () => []
) => inputArr => inputArr.reduce(xform(combine), getInitial());
Note: The above code uses Array.prototype.push as the combiner. Since push is a mutating operation, we have to make sure to create a new initial collection with every call to xduce. Otherwise subsequent calls would add results to the same array. This is usually not what we want and produces unexpected results. Alternatively you could use (acc, elem) => acc.concat(elem) as the combiner. Array.prototype.concat does not mutate its source, but will be considerably slower (especially with large arrays).
Besides the provided default, you could use any data structure you like for your results. Just make sure that the combiner and your initial collection fit together (e.g. for an ES Set, use (acc, elem) => acc.add(elem) as the combiner and () => new Set() as your getInitial argument).
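For example, collecting the results into a Set instead of an array could look like this:
// same xform, different result data structure
const intoSet = xduce(
  xform,
  (acc, elem) => acc.add(elem), // Set.prototype.add returns the set itself
  () => new Set()
);

const resultSet = intoSet(booksAndArticlesArray);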
The next step is to use our xform with xduce:
/*
* reminder: xduce(xform) returns a reusable transform
* that just expects input data
*/
const bookFilterMapTransform = xduce(xform);
/*
* using bookFilterMapTransform
*/
const result = bookFilterMapTransform(booksAndArticlesArray);
The Concept
The concept explained here is also known as transducers. As "transducers" is such a complicated-sounding word, I chose a more descriptive name for the article.
The first transducer implementation was provided in Clojure. The concept gained popularity and was ported to other languages like JavaScript, Python, ReasonML/OCaml, Elixir and many more.
There are some notable library implementations of transducers available:
If you want to know more about transducers, you'll find a reading list at the end of the article.
Benchmarks
The real power of transducers will show up when using them with really large sets of data.
I did some benchmarking to give you an idea of the performance advantage of the transducers approach. The xduce, map, filter and compose implementations are exactly the ones I provided in this article.
The following transformations were used for the benchmark:
Transducer
const benchmarkTransducer = xduce(
compose(
map(function(x) {
return x + 10;
}),
map(function(x) {
return x * 2;
}),
filter(function(x) {
return x % 5 === 0;
}),
filter(function(x) {
return x % 2 === 0;
})
)
);
Native/Classic Transform
const classicTransform = arr =>
arr
.map(function(x) {
return x + 10;
})
.map(function(x) {
return x * 2;
})
.filter(function(x) {
return x % 5 === 0;
})
.filter(function(x) {
return x % 2 === 0;
});
For the time values at each array length, I took the average time from running each implementation 16 times (both on the exact same array). I created 64 equidistant steps. The smallest array had a length of 10417 items, the largest a length of 489583 items.
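A measurement loop of that shape could look roughly like this (a sketch; the performance.now() based timing, the measure helper and the testArray name are assumptions):
// hypothetical helper: average runtime of a transform over n runs on one array
const measure = (transform, arr, runs = 16) => {
  let total = 0;
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    transform(arr);
    total += performance.now() - start;
  }
  return total / runs; // average time in ms
};

// e.g. measure(benchmarkTransducer, testArray) vs. measure(classicTransform, testArray)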
Both implementations behave very similarly in their runtime characteristics below an array size of 60k values (with transducers being only minimally faster than the classical transformation chain). Between 60k and 70k we see an almost immediate increase in the running time of the classical approach. I don't know the exact reasons for this jump (if someone has an idea, please let me know in a comment).
Here is a more detailed image of this region:
Transducers also perform great in comparison with other libraries (e.g. Immutable.js):
Further Reading
- https://codeburst.io/simpler-transducers-for-javascript-4d02a0926648
- https://tgvashworth.com/2014/08/31/csp-and-transducers.html
- https://medium.freecodecamp.org/efficient-data-transformations-using-transducers-c779043ba655
- https://jlongster.com/Transducers.js--A-JavaScript-Library-for-Transformation-of-Data
Top comments (4)
Hi, thanks for the great post. I'm still learning and I really want to start using transducers soon :) but I have a question that sounds a bit too "old-school": how does that compare to an imperative approach with for loops?
I like and use a functional approach as much as possible, for various reasons (reusability, immutability, readability etc.), but I always have some doubts when it comes to performance.
How does that transducer function compare to an imperative approach with a plain for loop?
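Something along these lines (a rough, illustrative sketch of an imperative version of the benchmark transform):
// illustrative imperative equivalent of the benchmark transform
const imperativeTransform = arr => {
  const result = [];
  for (let i = 0; i < arr.length; i++) {
    const x = (arr[i] + 10) * 2;
    if (x % 5 === 0 && x % 2 === 0) {
      result.push(x);
    }
  }
  return result;
};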
For sure it cannot be used for streams, but for a static data array it can outperform the transducer version. Where and when do you think one approach can be used instead of another?
In the field of animations and interactions, where every ms counts, having optimized performance is a must. What do you think?
Hi Marco,
thanks for your comment.
The imperative approach you are describing is of course the fastest of all alternatives. It gives you roughly a 50% performance plus. So depending on your collection sizes and performance requirements, it is worth considering the imperative approach. I'd recommend reading this article if you want to get more insight.
Most of the performance penalty comes from the function call overhead. I know that there are/were some efforts going on to inline functions at build time (via a Babel plugin or the Closure Compiler). These optimizations are not very reliable (at the current time). Babel needs all functions in the same file for this optimization, and the Closure Compiler only inlines when it would save space to inline a function.
A more reliable alternative would be to write your data transforms in ReasonML and let the BuckleScript compiler do the heavy lifting. The optimizations applied by the BuckleScript compiler are more advanced than the optimizations of any JS-to-JS compiler out there. So if your transforms tend to be rather complex but you don't want to sacrifice readability or maintainability, then I'd recommend trying ReasonML.
Maybe V8 is unrolling the loops or doing some other static wizardry?
Did you try it on different JS engines?
Hi Benny,
Thanks for the comment.
I'll try with different JS engines and post the results here.