scottshipp

Posted on Jan 27, 2020 • Edited on Nov 2, 2021

Real-World Java with Predicates and Streams

#java #functional

Java is a far more expressive language today than in the past. If you are still working with Java 7-era idioms (whether or not you actually compile with Java 8 or later), it's worth checking out some of the powerful language features of today's Java.

In this article, I'll cover one example just to whet your appetite: predicates.

This article is geared towards Java developers who have been working in older codebases, or maybe just haven't tried out new Java features in a while. It will show classic ways of doing a task in Java and then show a modern equivalent.

New job at the candy factory

You've been hired at CandyCorp, the nation's smallest candy manufacturer. As a pretty new company, our factory has a lot of software needs. Welcome aboard!

To get you oriented, the first thing you should know is that you can call CandyFactory.bagOfCandy() to produce a bag of candy:

Collection<Candy> bagOfCandy = CandyFactory.bagOfCandy();

Right now we only make one kind of candy: a small piece of disc-shaped chocolate in a hard candy shell. Even though that's our only product, we do make them in many colors. Every bag of candy contains an assortment of red, blue, green, and yellow pieces.

Let's take our first story off the backlog.

Count how many pieces of candy with a given color are in a bag

As a factory line quality control manager, I want to randomly choose bags of candy coming off the line and separate out the colors. That way I can perform my tasks more quickly.

One of my tasks is to count how many pieces of candy in each bag have a given color.

This will help me to ensure that each bag has enough of any given color on average to keep fans of that color happy.

OK, that's easy enough. As a classic Java programmer, you can do that. You will just generalize this idea to a method, filterByColor, that will take a bag of candy and a given color, then separate out all the pieces matching that color into its own new collection. With that method in hand, the QC manager can perform the many tasks they have. To meet the given example, they can call the size() method on the new collection to find out how many there were.

Collection<Candy> bagOfCandy = CandyFactory.bagOfCandy();
Collection<Candy> redCandies = filterByColor(bagOfCandy, Color.RED);
int numberOfReds = redCandies.size();

Here is a classic, imperative, pre-Java-8 implementation of the filterByColor method.

Collection<Candy> filterByColor(Collection<Candy> assortedCandy, Candy.Color color) {
    Collection<Candy> results = new ArrayList<>();
    for(Candy candyPiece : assortedCandy) {
        if(candyPiece.getColor().equals(color)) {
            results.add(candyPiece);
        }
    }
    return results;
}

To accomplish its task, the filterByColor method performs the following steps.

It:

Creates a new collection, results, to hold the candy that is found to match the given color.
Iterates through the main collection of candy, which is in a variable named assortedCandy.
Checks if the given piece of candy is the given color.
Adds the piece to the new collection if it is the given color.
Returns the new collection.
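To try the steps above end to end, here is a minimal runnable sketch. The article never shows the Candy class or CandyFactory, so the nested Candy class and the sampleBag() helper below are assumptions made purely for illustration; only filterByColor mirrors the code above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;

class ClassicFilterDemo {
    // Hypothetical stand-in for the article's Candy class, which is never shown.
    static class Candy {
        enum Color { RED, BLUE, GREEN, YELLOW }

        private final Color color;

        Candy(Color color) {
            this.color = color;
        }

        Color getColor() {
            return color;
        }
    }

    // Same loop as filterByColor above, made static so main can call it.
    static Collection<Candy> filterByColor(Collection<Candy> assortedCandy, Candy.Color color) {
        Collection<Candy> results = new ArrayList<>();
        for (Candy candyPiece : assortedCandy) {
            if (candyPiece.getColor().equals(color)) {
                results.add(candyPiece);
            }
        }
        return results;
    }

    // A fixed bag standing in for CandyFactory.bagOfCandy() (an assumption).
    static Collection<Candy> sampleBag() {
        return Arrays.asList(
                new Candy(Candy.Color.RED),
                new Candy(Candy.Color.BLUE),
                new Candy(Candy.Color.RED),
                new Candy(Candy.Color.GREEN));
    }

    public static void main(String[] args) {
        int numberOfReds = filterByColor(sampleBag(), Candy.Color.RED).size();
        System.out.println("Red pieces: " + numberOfReds);
    }
}
```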

Probably you've seen a lot of code like this before, as it is a very common use case.

You send this code to production and the QC manager is happy now that they can do their job more easily.

Expanding the line

After a while, our company thinks about expanding the line of products. We decide to experiment with two new types of candy:

Peanut
Pretzel

For now these two new kinds of candy will be offered in "grab bag" packaging, which means that hungry customers get a bag of candy with all three (regular, pretzel, and peanut) types of candy, as part of a special promotion. If the promotion does well, we know demand is good and we can start making standalone bags of peanut or pretzel candy.

Your team has already added a new method, getType(), to the Candy class. When the CandyFactory makes a grab bag, we can get the color and type of each piece of candy with code like:

Collection<Candy> bagOfCandy = CandyFactory.grabBag();
for(Candy candyPiece : bagOfCandy) {
    Candy.Color color = candyPiece.getColor();
    Candy.Type type = candyPiece.getType();
    // now use the color and/or type in some way
}

The QC manager wants to have similar functionality as before to count the candy types in each bag of this promotion. They loved the color filtering method you implemented last sprint. If you could just quickly get them a type filtering method, that would be great. They want to help answer questions like "how many pretzel candies were in the grab bag?"

You copy/paste the prior method and change a couple of things . . . OK, actually you just changed every "color" to "type":

Collection<Candy> filterByType(Collection<Candy> assortedCandy, Candy.Type type) {
    Collection<Candy> results = new ArrayList<>();
    for(Candy candyPiece : assortedCandy) {
        if(candyPiece.getType().equals(type)) {
            results.add(candyPiece);
        }
    }
    return results;
}

And you provide an example usage like this:

Collection<Candy> bagOfCandy = CandyFactory.grabBag();
Collection<Candy> pretzelCandies = filterByType(bagOfCandy, Candy.Type.PRETZEL);
int numberOfPretzels = pretzelCandies.size();

...and the QC manager is happy again!

A nagging feeling

The day after you ship the new code, you're thinking about how you just copy/pasted the new method from the old method. It feels . . . wrong somehow. Looking at the two methods, side-by-side, it's clear that there should be some way to share functionality between them, as they are, in fact, almost the same.

It seems natural to think about how you might write a single method that could account for both use cases. (Or even additional ones that are bound to come up.)

Something like this:

Collection<Candy> filter(Collection<Candy> candies, Object attribute) {
    Collection<Candy> results = new ArrayList<>();
    for(Candy candyPiece : candies) {
        if(/* condition matching the corresponding attribute of the candy to the attribute variable */) {
            results.add(candyPiece);
        }
    }
    return results;
}

But you can't think of a simple way to do this, because the thing you need to generalize here isn't something you can store in a variable. It's code! The boolean condition in the if statement has to actually compare a different attribute (like color, or type) of the Candy object each time.

Usually, you write a method to share code. Is there a way to pass another method into the proposed new filter method, and call the method in the if statement instead?

Classic Java: S.A.M. interfaces

Before Java 8, functionality could only be in methods and methods were always members of a class.

A special pattern was used to share functionality for use cases like this: the Single Abstract Method interface, or S.A.M. Just like it sounds, it's simply an interface with a single abstract method. It was used all the time in classic Java. One well-known example is the Comparator interface, used to provide ordering criteria to sort algorithms.
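As a quick illustration of the Comparator case, here is a small sketch in the classic style. The sortByLength helper and the sample words are made up for this example; Collections.sort and Comparator are the standard library pieces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class ComparatorSamDemo {
    // Returns a sorted copy of the input, shortest string first.
    static List<String> sortByLength(List<String> words) {
        List<String> sorted = new ArrayList<>(words);
        // Comparator has one abstract method, compare, so this anonymous
        // class exists only to carry that one piece of behavior.
        Collections.sort(sorted, new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return Integer.compare(a.length(), b.length());
            }
        });
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(sortByLength(Arrays.asList("pretzel", "red", "peanut")));
    }
}
```

The sort algorithm stays generic, and the ordering rule is handed in as an object. That is the same trick the filter method needs for its matching rule.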

We can refactor our two methods, filterByType and filterByColor, into one method by using an S.A.M. The S.A.M. can have a boolean method, and the for loop in the filter method can call that method as it iterates through the collection of candy.

CandyMatcher will be our S.A.M. It looks like this:

interface CandyMatcher {
    boolean matches(Candy candy);
}

Using this approach, we write a new more generalized filter method:

Collection<Candy> filter(Collection<Candy> candies, CandyMatcher matcher) {
    Collection<Candy> results = new ArrayList<>();
    for(Candy candyPiece : candies) {
        if(matcher.matches(candyPiece)) {
            results.add(candyPiece);
        }
    }
    return results;
}

We can reuse the above method for both filtering by color and by type, simply by passing a different instance of CandyMatcher with code specific to the required use case.

In order to filter by color, we create a new class that implements CandyMatcher and provides the specific functionality of matching on a given color:

class ColorMatcher implements CandyMatcher {
    private final Candy.Color color;

    ColorMatcher(Candy.Color color) {
        this.color = color;
    }

    @Override
    public boolean matches(Candy c) {
        return c.getColor().equals(this.color);
    }
}

And, finally, we pass the ColorMatcher to the filter method. We know that filtering by color is already being used, so we can rewrite the filterByColor method now in terms of the filter method that uses the CandyMatcher:

Collection<Candy> filterByColor(Collection<Candy> candies, Candy.Color color) {
    ColorMatcher matcher = new ColorMatcher(color);
    return filter(candies, matcher);
}

Now that you've seen the filterByColor and ColorMatcher code, try to implement a TypeMatcher that can be used to match regular, peanut, or pretzel candies before continuing.

The drawback of the S.A.M. approach

If you implemented the class like I did, it looks like this:

class TypeMatcher implements CandyMatcher {
    private final Candy.Type type;

    TypeMatcher(Candy.Type type) {
        this.type = type;
    }

    @Override
    public boolean matches(Candy c) {
        return c.getType().equals(this.type);
    }
}

As you can see from looking at the new code, there is more code than before—not less. We gained extensibility (the ability for the code to be extended to account for future use cases) but we also lost readability since the code is both more verbose and more complicated.

One way to solve this in classic Java is to use an anonymous class instead. Rather than write a whole TypeMatcher class, just create it when you need it:

Collection<Candy> filterByType(Collection<Candy> candies, final Candy.Type type) {
    return filter(candies,
                  new CandyMatcher() {
                      @Override
                      public boolean matches(Candy c) {
                          return c.getType().equals(type);
                      }
                  });
}

This is pretty dissatisfying. The anonymous class takes up 5-6 lines of code depending on how you do it. Still, it's arguably better than having a whole class for only one use.

But the real problem here is that it's messy whichever way you do it. Anonymous classes are unworkable for anything more complicated than this, and creating a whole other class is a high overhead for something that you're not going to use more than once. With only these options, it's tempting to just go back to the simpler first example, even though it has duplicated code that gives us a nagging feeling.

Modern Java: lambdas

Both are solved problems in modern Java. The solution is lambdas. With a lambda, you only need a single line of code to express the matching concept!

Lambda syntax

This is a lambda:

c -> c.getColor().equals(color)

And this is a lambda:

c -> c.getType().equals(type)

Loosely speaking, lambdas have the following syntax:

A parenthesized list of parameters that matches the parameters of the S.A.M. interface's method. When there is exactly one parameter and its type is left out, the parentheses can be omitted.
An "arrow", which is a dash followed by the greater-than sign: ->
(optionally) an opening curly brace (used when the body is a block of statements rather than a single expression)
The code for the implementation
(optionally) the closing curly brace

You can find the formal definition for lambda expressions in section 15.27 of the Java Language Specification.
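To make those variations concrete, here is a small sketch showing each form side by side. StringMatcher is a hypothetical S.A.M. shaped like CandyMatcher but operating on strings, and the constant names are invented for this example; Comparator and Runnable are standard library interfaces.

```java
import java.util.Comparator;

class LambdaSyntaxDemo {
    // Hypothetical S.A.M. in the same shape as CandyMatcher, but for strings.
    interface StringMatcher {
        boolean matches(String s);
    }

    // One parameter, expression body: parentheses and braces both omitted.
    static final StringMatcher IS_EMPTY = s -> s.isEmpty();

    // The same kind of parameter written with parentheses and an explicit type.
    static final StringMatcher IS_BLANK = (String s) -> s.trim().isEmpty();

    // Two parameters: the parentheses are required.
    static final Comparator<String> BY_LENGTH =
            (a, b) -> Integer.compare(a.length(), b.length());

    // No parameters: an empty pair of parentheses.
    static final Runnable GREET = () -> System.out.println("Welcome to CandyCorp");

    // Block body: curly braces and an explicit return statement.
    static final StringMatcher IS_SHORT = s -> {
        int length = s.length();
        return length < 4;
    };

    public static void main(String[] args) {
        GREET.run();
        System.out.println(IS_EMPTY.matches(""));
        System.out.println(IS_BLANK.matches("   "));
        System.out.println(BY_LENGTH.compare("red", "pretzel") < 0);
        System.out.println(IS_SHORT.matches("pretzel"));
    }
}
```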

Lambda usage

Using a lambda instead of an anonymous class, filterByType now becomes:

Collection<Candy> filterByType(Collection<Candy> candies, Candy.Type type) {
    return filter(candies, c -> c.getType().equals(type));
}

One thing to note is that many Java IDEs now have a refactoring for this change. To go from the anonymous class previously mentioned to a lambda here, I simply applied the refactoring in IntelliJ IDEA, rather than performing the rewrite of the method myself.

Lambdas and Functional interfaces in the standard library

Our code is much cleaner now that we're using lambdas but something new starts to bug us. There's a remaining artifact from our S.A.M. implementation: the CandyMatcher interface. This still feels like a little scrap of boilerplate that we need in order to use lambdas.

Also a solved problem!

The Java standard library actually provides a number of interfaces for this very purpose, called functional interfaces. Functional interfaces are covered in section 9.8 of the Java Language Specification, and are defined thus:

A functional interface is an interface that has just one abstract method (aside from the methods of Object), and thus represents a single function contract.

Sometimes I think of a functional interface as an interface that can be used as the type of a lambda. For example, in order to pass a lambda to a method, you need to create a method parameter to accept it. What type is that parameter? A functional interface!

The CandyMatcher interface actually fits the technical definition of a functional interface, and that's why we were able to leave the method signature of the filter method alone when we performed our refactoring.

The method signature stayed exactly the same:

Collection<Candy> filter(Collection<Candy> candies, CandyMatcher matcher)

Yet a lambda could still be passed to it as the matcher argument.
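One way to make that intent explicit is the standard @FunctionalInterface annotation. The sketch below is an illustration under assumptions: the nested Candy class, sampleBag() and countRed() are hypothetical helpers, while CandyMatcher and filter follow the code shown earlier.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;

class FunctionalInterfaceDemo {
    // Hypothetical stand-in for the article's Candy class.
    static class Candy {
        enum Color { RED, BLUE, GREEN, YELLOW }

        private final Color color;

        Candy(Color color) {
            this.color = color;
        }

        Color getColor() {
            return color;
        }
    }

    // The annotation is optional. It asks the compiler to report an error
    // if the interface does not have exactly one abstract method.
    @FunctionalInterface
    interface CandyMatcher {
        boolean matches(Candy candy);
    }

    static Collection<Candy> filter(Collection<Candy> candies, CandyMatcher matcher) {
        Collection<Candy> results = new ArrayList<>();
        for (Candy candyPiece : candies) {
            if (matcher.matches(candyPiece)) {
                results.add(candyPiece);
            }
        }
        return results;
    }

    // The lambda is accepted because the parameter type is a functional interface.
    static int countRed(Collection<Candy> candies) {
        return filter(candies, c -> Candy.Color.RED.equals(c.getColor())).size();
    }

    // A fixed bag standing in for CandyFactory.bagOfCandy() (an assumption).
    static Collection<Candy> sampleBag() {
        return Arrays.asList(
                new Candy(Candy.Color.RED),
                new Candy(Candy.Color.BLUE),
                new Candy(Candy.Color.RED));
    }

    public static void main(String[] args) {
        System.out.println("Red pieces: " + countRed(sampleBag()));
    }
}
```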

But considering that there's an interface for this purpose already provided by the standard library, let's go with that and eliminate the CandyMatcher. The interface the standard library provides is Predicate.

The Predicate javadoc says it "represents a predicate (boolean-valued function) of one argument."

Exactly our use case! So we can now change the method signature of filter and its implementation to use Predicate:

Collection<Candy> filter(Collection<Candy> candies, Predicate<Candy> predicate) {
    Collection<Candy> results = new ArrayList<>();
    for(Candy candyPiece : candies) {
        if(predicate.test(candyPiece)) {
            results.add(candyPiece);
        }
    }
    return results;
}

The Predicate has a test method whose body we supply by passing a lambda.

Of course, we don't want to call the variable here predicate as that tells us how the code is doing its work but does not tell us what it is doing and why. A better name is the one we were using, candyMatcher. Perhaps candySelector or something like that would work well also. But I chose predicate in the example above so you can see exactly where the new concept is put into practice there.

Calling code now looks like this:

Collection<Candy> candies = CandyFactory.grabBag();
Collection<Candy> redCandies = filter(candies, c -> Color.RED.equals(c.getColor()));
int numberOfRedCandies = redCandies.size();
Collection<Candy> pretzelCandies = filter(candies, c -> Type.PRETZEL.equals(c.getType()));

Our code is now modern. It is simpler. It is more expressive. Perhaps, most importantly, it is idiomatic. It is something that other Java developers will easily understand.

We now don't even need the filterByColor and filterByType methods. We delete them. The filter method is simple enough with a lambda that we don't need that extra code.
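A side benefit of the standard interface, not needed for the QC manager's stories but worth knowing: Predicate ships with the default methods and, or, and negate, so simple conditions can be combined without writing new matcher classes. The sketch below assumes a hypothetical Candy class and sample bag; the filter method is the one shown above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.function.Predicate;

class PredicateCompositionDemo {
    // Hypothetical stand-in for the article's Candy class.
    static class Candy {
        enum Color { RED, BLUE, GREEN, YELLOW }
        enum Type { REGULAR, PEANUT, PRETZEL }

        private final Color color;
        private final Type type;

        Candy(Color color, Type type) {
            this.color = color;
            this.type = type;
        }

        Color getColor() { return color; }
        Type getType() { return type; }
    }

    static Collection<Candy> filter(Collection<Candy> candies, Predicate<Candy> candyMatcher) {
        Collection<Candy> results = new ArrayList<>();
        for (Candy candyPiece : candies) {
            if (candyMatcher.test(candyPiece)) {
                results.add(candyPiece);
            }
        }
        return results;
    }

    static final Predicate<Candy> IS_RED = c -> Candy.Color.RED.equals(c.getColor());
    static final Predicate<Candy> IS_PRETZEL = c -> Candy.Type.PRETZEL.equals(c.getType());

    // A fixed bag standing in for CandyFactory.grabBag() (an assumption).
    static Collection<Candy> sampleBag() {
        return Arrays.asList(
                new Candy(Candy.Color.RED, Candy.Type.PRETZEL),
                new Candy(Candy.Color.RED, Candy.Type.PEANUT),
                new Candy(Candy.Color.BLUE, Candy.Type.PRETZEL),
                new Candy(Candy.Color.GREEN, Candy.Type.REGULAR));
    }

    public static void main(String[] args) {
        Collection<Candy> bag = sampleBag();
        System.out.println("Red pretzels: " + filter(bag, IS_RED.and(IS_PRETZEL)).size());
        System.out.println("Red or pretzel: " + filter(bag, IS_RED.or(IS_PRETZEL)).size());
        System.out.println("Not pretzel: " + filter(bag, IS_PRETZEL.negate()).size());
    }
}
```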

A small sidebar about variables in lambdas

You may have noticed that we're using the single-character variable name c. This may make you squirm, because variables are supposed to have meaningful names and it makes code more readable when we think about variable naming.

Nevertheless, similar to how single-character variable names such as i and j are commonly used in for loops, lambdas (especially one-liners with a single input variable) are a special case. The type and meaning of the variable is very clear, and the scope is very small. Therefore, it is common for programmers using lambdas to use single-character names like here, where we use c to represent candy.

Modern Java: Streams

Speaking of idiomatic Java, there's actually an even better way to implement the filter method: using a stream.

In fact, one of the Stream interface's methods is filter, and it does exactly what our filter method does, albeit in a slightly different way.

Before we jump straight there, though, let's talk about streams.

Definition of a stream

A Stream is similar to a collection of objects, but the Javadoc notes that it "differs from collections in several ways." You can read the Stream javadoc to see those differences if you want, but I think it's easier (though incomplete) to think of a Stream as data to which a series of operations can be applied, terminating in some result. This is done in a fluent style where you call one Stream method after another. For example, if you had a Stream of candy, you might start by asking for only 100 pieces from that stream, then map each piece to its color, resulting in a Stream of colors, and finally print all the colors out, like this:

candyStream.limit(100)
           .map(c -> c.getColor())
           .forEach(c -> System.out.println(c));

As you can see, a Stream is different from a collection. With a collection, you always have to decide how you're going to iterate through it and apply different operations piece-by-piece. With a Stream, you need only think of the series of operations you want to apply.
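Here is a runnable version of that limit, map, forEach pipeline. The candyStream variable from the snippet is never defined in the article, so this sketch assumes a hypothetical Candy class and builds the stream from a small list. The describeFirstColors helper is invented so the result can be inspected instead of only printed.

```java
import java.util.Arrays;
import java.util.List;

class StreamPipelineDemo {
    // Hypothetical stand-in for the article's Candy class.
    static class Candy {
        enum Color { RED, BLUE, GREEN, YELLOW }

        private final Color color;

        Candy(Color color) {
            this.color = color;
        }

        Color getColor() {
            return color;
        }
    }

    // Takes at most `limit` pieces from the front of the bag, maps each piece
    // to its color, and joins the color names with single spaces.
    static String describeFirstColors(List<Candy> bag, int limit) {
        StringBuilder out = new StringBuilder();
        bag.stream()
           .limit(limit)
           .map(c -> c.getColor())
           .forEach(color -> out.append(color).append(' '));
        return out.toString().trim();
    }

    // A fixed bag standing in for the factory output (an assumption).
    static List<Candy> sampleBag() {
        return Arrays.asList(
                new Candy(Candy.Color.RED),
                new Candy(Candy.Color.BLUE),
                new Candy(Candy.Color.GREEN));
    }

    public static void main(String[] args) {
        // Same shape as the snippet above, printing each color on its own line.
        sampleBag().stream()
                   .limit(2)
                   .map(c -> c.getColor())
                   .forEach(color -> System.out.println(color));
        System.out.println(describeFirstColors(sampleBag(), 2));
    }
}
```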

We can use our candy use case to make a practical example. The candy starts in a "grab bag" where there's a mix of colors and types. We apply an operation to filter the candy down to only the red pieces. Then we apply a terminal operation to create a new collection of only those pieces. We can even just total up the number of pieces as the terminal operation to our pipeline, to meet the given use case.

In Java, this looks like:

bagOfCandy.stream()
       .filter(c -> Color.RED.equals(c.getColor()))
       .count();

We only need the .stream() call here because we start with a collection (bagOfCandy). If it had already been a Stream, then that wouldn't have been necessary.
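The pipeline above ends in count(). When the separated pieces themselves are needed as a new collection, one common option is the collect terminal operation with Collectors.toList(). The sketch below assumes a hypothetical Candy class, sample bag, and redPieces helper.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;

class StreamCollectDemo {
    // Hypothetical stand-in for the article's Candy class.
    static class Candy {
        enum Color { RED, BLUE, GREEN, YELLOW }

        private final Color color;

        Candy(Color color) {
            this.color = color;
        }

        Color getColor() {
            return color;
        }
    }

    // Gathers the red pieces into a new list; the original bag is not modified.
    static List<Candy> redPieces(Collection<Candy> bagOfCandy) {
        return bagOfCandy.stream()
                         .filter(c -> Candy.Color.RED.equals(c.getColor()))
                         .collect(Collectors.toList());
    }

    // A fixed bag standing in for CandyFactory.grabBag() (an assumption).
    static Collection<Candy> sampleBag() {
        return Arrays.asList(
                new Candy(Candy.Color.RED),
                new Candy(Candy.Color.YELLOW),
                new Candy(Candy.Color.RED),
                new Candy(Candy.Color.BLUE));
    }

    public static void main(String[] args) {
        Collection<Candy> bag = sampleBag();
        List<Candy> reds = redPieces(bag);
        System.out.println(reds.size() + " of " + bag.size() + " pieces are red");
    }
}
```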

Refactoring our last example to Stream usage

The imperative version of filter looks like this right now:

static Collection<Candy> filter(Collection<Candy> candies, Predicate<Candy> candyMatcher) {
    Collection<Candy> results = new ArrayList<>();
    for(Candy candyPiece : candies) {
        if(candyMatcher.test(candyPiece)) {
            results.add(candyPiece);
        }
    }
    return results;
}

We can delete this method. Let's just use streams now.

Collection<Candy> candies = CandyFactory.grabBag();

long numberOfRedCandies = candies.stream()
                         .filter(c -> Color.RED.equals(c.getColor()))
                         .count();
long numberOfPretzelCandies = candies.stream()
                            .filter(c -> Type.PRETZEL.equals(c.getType()))
                            .count();

We realize that modern Java from the standard Java library was all we needed for the QC manager's use case all along!
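If the QC manager ever wants the tally for every color at once rather than one color per call, the standard collectors can do that in a single pass. This is an optional sketch, not something the stories above require. It assumes a hypothetical Candy class, sample bag, and countByColor helper; Collectors.groupingBy and Collectors.counting are standard library methods.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Map;
import java.util.stream.Collectors;

class StreamTallyDemo {
    // Hypothetical stand-in for the article's Candy class.
    static class Candy {
        enum Color { RED, BLUE, GREEN, YELLOW }

        private final Color color;

        Candy(Color color) {
            this.color = color;
        }

        Color getColor() {
            return color;
        }
    }

    // Groups the pieces by color and counts each group.
    // Colors that do not appear in the bag have no entry in the map.
    static Map<Candy.Color, Long> countByColor(Collection<Candy> bagOfCandy) {
        return bagOfCandy.stream()
                         .collect(Collectors.groupingBy(c -> c.getColor(), Collectors.counting()));
    }

    // A fixed bag standing in for CandyFactory.grabBag() (an assumption).
    static Collection<Candy> sampleBag() {
        return Arrays.asList(
                new Candy(Candy.Color.RED),
                new Candy(Candy.Color.BLUE),
                new Candy(Candy.Color.RED),
                new Candy(Candy.Color.GREEN));
    }

    public static void main(String[] args) {
        Map<Candy.Color, Long> counts = countByColor(sampleBag());
        for (Map.Entry<Candy.Color, Long> entry : counts.entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
    }
}
```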

Why a Stream is better than its imperative cousin

Eliminating code like we just did is a pretty good reason for preferring Streams, but there's an even more important reason than that. We're eliminating code where mistakes can hide and replacing it with code from the standard library, code that is tried and true. This also offloads the cognitive overhead of the data transformation so that we can concern ourselves with the bigger picture.

Another way to put this is that the Stream is declarative rather than imperative. That is, we're instructing the computer what to do with the data rather than performing the lower-level steps.

By telling the computer what we want (stream it, filter it, collect it), we are much less likely to write code that we think does one thing but actually does another. We're not down in the weeds moving through a series of steps and hoping for the result to come out right. We let the computer deal with that.

Conclusion

I hope you enjoyed this article. You have learned about lambdas, predicates, and streams, using a real-world example. Give this article a heart and if there are enough of them, I'll add more articles with practical real-world examples where streams in Java can simplify and improve your code.

Top comments (1)

Forian Weiß • Jan 28 '20

Great read. I really like your writing style.

But I would want to argue against using the variable name c for the lambda parameter as a short for candy. Candy is short enough that there is no real benefit in shortening it, in my opinion.
Also I don't think it's comparable to i in for-loops since i is very generic and c in your example is short for candy.
But I've seen longer streams with longer potential parameter names and counts where it definitely could make sense to shorten them in some way.
