DEV Community

Lucas
Lucas

Posted on • Originally published at heylucas.net on

3 ways to filter through collections in C++

I spent some time last week reviewing a teammate’s diff (what we call pull-requests/patches where I work). The idea was pretty simple: The system processes a list of devices (i.e., hosts in our data centers) and applies a set filters to it.

Their code was straightforward: They had an abstract class called Filter that specified the interface. Each filter, in turn, received a vector of devices to process and spit out the filtered results. It looked something like this:

// Represents a host in the data center.struct Device;class Filter {public: ~Filter() {}; virtual std::vector<Device> filter(std::vector<Device>&& devices) = 0;};class MaintenanceStatusFilter : public Filter {public: std::vector<Device> filter(std::vector<Device>&& devices) override { return std::move(devices) | ranges::actions::remove\_if(somePredicate); }};class RackFilter : public Filter {public: std::vector<Device> filter(std::vector<Device>&& devices) override { return std::move(devices) | ranges::actions::remove\_if(someOtherPredicate); }};
Enter fullscreen mode Exit fullscreen mode

Upon first inspection, that code is absolutely fine. ranges::actions::remove_if makes understanding the code a breeze.

The problem is what happens when we chain filters. When you do that, with the current code, you end up reallocating the vector whenever a filter was applied:

MaintenanceStatusFilter maintenanceFilter;RackFilter rackFilter;auto filteredDevices = rackFilter.filter( maintenanceFilter.filter(std::move(devices)));
Enter fullscreen mode Exit fullscreen mode

How can we avoid that?

Views

C++ ranges offer a handy feature called View. Views differ from iterators in that they can be easily composed and are lazily evaluated. You can chain multiple of them and the result only gets computed when it is needed.

If we change our filters to be simple functions that look at a device and decide whether they should be removed or not, our code can be as simple as:

using ranges::views;auto filteredDevices = std::move(devices) | views::remove\_if(maintenanceStatusPredicate) | views::remove\_if(rackPredicate);
Enter fullscreen mode Exit fullscreen mode

It doesn’t get much simpler than that.

But what if I can’t use ranges?

C++ ranges are fairly recent and chances are, if you’re working in an older code base, you won’t be able to use it. Or maybe you’re working with some other programming language that doesn’t have that sort of functionality. What do you do?

Here’s a few things you can try.

Use a driver

The idea is simple: Change our filter classes to only operate on a single device at a time, and move the logic to apply all the filters to a driver. This allows us to decouple the filtering logic from the “iterate-over-the-vector” logic.

Here’s what the filter classes would look like:

class Filter { ... virtual bool filter(std::Device& device) = 0;};class MaintenanceStatusFilter : public Filter {public: virtual bool filter(std::Device& device) override { return somePredicate(device); }};class RackFilter : public Filter {public: virtual bool filter(std::Device& device) override { return someOtherPredicate(device); }};
Enter fullscreen mode Exit fullscreen mode

Here’s what the driver code looks like:

class Driver {private: std::vector<Filter\*> filters\_;public: Driver(std::initializer\_list<Filter\*> filters) : filters\_(filters) {} std::vector<Device> operator()(std::vector<Device>&& devices) { std::vector<Device> result; for (const auto& device : devices) { bool keep = true; for (Filter\* filter : filters\_) { if (f->filter(device)) { keep = false; break; } } if (keep) { result.push\_back(device); } } return result; }};
Enter fullscreen mode Exit fullscreen mode

Using the driver:

MaintenanceStatusFilter maintenanceStatusFilter;RackFilter rackFilter;Driver driver = {&maintenanceStatusFilter, &rackFilter};auto filteredDevices = driver(std::move(devices));
Enter fullscreen mode Exit fullscreen mode

This option is quite verbose, but I find it quite easy to follow. The driver takes a list of filters to apply and, when called, applies the filters to each device. If a filter matches, it short-circuits the whole process and excludes the device.

Template composition

This one might be the least straightforward to implement, but using it is trivial.

Just like in the previous example, we will change our filters to operate on single devices.

We can define a template class that allows us to chain our filters into a single one. That class will be responsible for ensuring all chained filters get applied.

template<typename F0, typename... F>class Chainer : Filter {private: F0 f\_; Chainer<F...> rest\_; public: Chainer(F0 f, F... rest) : f\_(f), rest\_(rest...) {} bool filter(const Device& device) override { if (f\_.filter(device)) return true; return rest\_.filter(device); }};template<typename F>class Chainer<F> : Filter {private: F f\_;public: Chainer(F f) : f\_(f) {} bool filter(const Device& device) override { return f\_.filter(device); }};template<typename... F>Chainer<F...> chain(F... fs) { return Chainer<F...>(fs...);}
Enter fullscreen mode Exit fullscreen mode

Here’s how we apply it:

auto filter = chain(MaintenanceStatusFilter(), RackFilter());std::vector<Device> filteredDevices;for (const Device& device : devices) { if (!filter.filter(device)) { filteredDevices.push\_back(device); }}
Enter fullscreen mode Exit fullscreen mode

This one is quite a mouthful. Basically what we did was define a “recursive” template class that can take multiple filters as input. We specialize the class for the case it receives a single filter.

The Chainer class is also made a child of Filter so we can use it as such. When filter is called on a Chainer, it will apply all its chained filters to the given device. If an early filter says the device should be removed, we short-circuit the whole process and don’t invoke the remaining filters.

The chain function is just a convenience so we can avoid typing the templated types.

The usage code is quite simple, not that different from the driver code we had before.

In conclusion

I can’t lie, C++ ranges are a great addition to the language and I can’t wait to use it in more places. That said, it is still important to know how to accomplish the same thing in different ways.

How about you? What are your thoughts on C++ ranges? Can you think of other ways to solve this problem?

Top comments (0)