DEV Community

Gerardo Puga

C++ lambdas, for beginners

What this is all about

When C++11 came out it brought a lot of goodies, and one of the best additions to the language repertoire was lambda functions.

Most people's first exposure to lambda functions in C++ is in the form of "throwaway functions" scattered around the code, because that's what lambdas are used for most of the time. Take for instance:

auto addition = [](int x, int y) { return x + y; };
std::cout << addition(4, 3) << std::endl;
std::cout << addition(99, 1) << std::endl;

Here a lambda function is created to perform a simple addition between two numbers, and then the lambda gets called a couple of times with different input arguments each time. When execution exits the scope where the lambda object was created, the lambda will be destroyed just like any other local variable.

Lambdas can do more than that, though. Lambdas are executable objects, and just like other objects, they can be moved around, copied, stored in containers, passed as arguments, etc. Lambdas can carry a context with them, so that they can be created in one place and transferred to another for execution. Lambdas can be stateful, and can also be linked to the state of an external object.

It must be said that lambdas don't really bring anything radically new to the C++ table. Everything that can be done with lambdas today could already be done with pre-C++11 versions of C++ using good-old functor objects (objects that can be called like functions), if you were willing to put in the effort to write them.

The real advantage of the lambda notation, though, is that it provides a compact way to describe an executable object in such a way that the compiler can do the implementation for us. The programmer can then focus on the what instead of the how, and explore design possibilities that might have been avoided otherwise.

During the rest of the article I'll try to introduce you to how lambda functions get declared, and what the ins and outs of variable capture are.

Please note that the article is aimed at beginners trying to "get it" and/or intermediate users looking for a refresher. If you already "get" lambdas, you may want to try the references cited at the very end for better material.

A Lambda's full name

Let's start by getting semantics out of the way.

In reality, there's no such thing as a lambda. "Lambda" is just a colloquial term that, depending on the context, will usually refer to one of three things.

  • A lambda expression.
  • A closure object.
  • A closure class.

All three of them are present in the following statement:

auto addition = [](int x, int y) { return x + y; };

Here,

  • [](int x, int y) { return x + y; } is the lambda expression.
  • addition is the closure object.
  • You can't see the closure class, but it's there in the form of the type of the closure object.

Whenever you hear someone talking about a lambda, they are most likely talking about one of the three above, and in most cases it'll be one of the first two (the third one gets far less press time).

I'll keep it light and talk about lambdas when it suits me better, but I'll use the terms above when the distinction is important.

Analysis of the lambda expression

The lambda expression is the most visible part of the lambda in the code.

The structure of the lambda expression is

[...capture list...](...parameters...) -> <return type> { ... body ... }

but in practice, it will be a lot simpler than this. In fact, the humblest of all lambda expressions is

[]{}

which defines a lambda that captures nothing, takes no parameters, does nothing, and returns no data; in other words, that lambda is equivalent to this function:

void f() {}

Just like in function declarations, the code of the lambda is enclosed within the {} braces, and the input parameters within the () pair. There's not a lot to say about either of these components, since they mirror their counterparts in a function declaration, both in form and in function.

The () of the parameter list can be completely omitted when the lambda is not meant to receive any parameters. That's why the humblest lambda is []{} and not [](){}. If a return type is specified, however, the () needs to be written even if the parameter list is empty (at least up to C++20).

I want to talk at length about captures, which are what separates lambdas from mere throwaway functions, but allow me to get something out of the way first: the return type of the lambda.

The return type of the lambda

The return value type of the lambda can be omitted from the expression, and most people will not write it. When the type is not present the compiler will infer it from the rest of the expression.

I won't be writing the return type in the rest of the examples in this article unless I really need it. That's why I wanted to talk about this early: it's less typing for me.

Only in a minority of cases will you see lambda expressions that explicitly state the return type. In those cases it will usually be for one of two reasons:

  • The lambda body is such that the return type is ambiguous to the compiler.
  • The coder wants to coerce the return type into something different from what the compiler would have inferred.

Whenever the return type is not written in the lambda expression the compiler uses a simple but effective set of rules to determine what type you meant to return:

  • If there are no return statements, or if the ones present are bare returns with no return value (such as return;), then the lambda is assumed to return void.
  • If there's a single return statement and it has a return expression, the return type will be the type that results from evaluating the expression.
  • If there are multiple return statements and all of them have a return expression that evaluates to the very same type, that will be the return type of the lambda.
  • If none of the above is the case, the compiler will give up and then you'll have to explicitly state the return type.

A few examples:

This declares a lambda that just returns void:

auto f1 = [](){};  // could be just []{}

Here the return expression evaluates to int, and therefore the return type of the lambda will be the same type:

auto f3 = [](int n) { return n; };

The return expression here evaluates to bool, so that'll be the type of the return value:

auto f4 = [](int n) { return n > 5; };

There are two return expressions in the following example, but both are bool. Since there's no ambiguity the return type will be bool:

auto f5 = [](int n) {
  if (n == 5) {
      return true;
  } else {
      return false;
  }
};

In this case there are two return statements with expressions that evaluate to different types (bool and int). The programmer needs to state the return type explicitly, because the compiler cannot decide what the return type should be:

auto f6 = [](int n) -> bool {                
  if (n < 0) {
      return false;
  } else {
      return n; /* returns true if n > 0 */
  }
};

In this case we state the return type to coerce it to be a bool instead of the std::size_t that size() returns:

auto f7 = [](const std::vector<int>& v) -> bool { return v.size(); };  // true if v is non-empty

The capture list

The capture list of a lambda is a list of variable names in the enclosing scope of the lambda that have to be captured to become accessible within the lambda.

In a lambda expression the capture list is written between the square brackets []. It is not optional, and the brackets must be present even if the capture list is empty.

A capture creates a variable within the scope of the lambda body that has the same name as the variable being captured, and which can be (depending on the mode of capture) either a copy of the external variable, or a reference to it.

Captures that generate a copy of the external variable will be called by-copy captures, while the ones that generate a reference to an external variable will be called by-reference captures.

  • A by-copy capture creates a variable within the scope of the lambda that is a copy of the variable being captured, with the value the latter had at the time the lambda was created. By-copy captures are read-only, unless the lambda is mutable (more on that later).
  • A by-reference capture stores a reference to the variable being captured, which can be used to read or update the value of the external variable at any time during the lifetime of the lambda.

To ground those definitions with an example, take a look at the code fragment below. In it two local variables get defined, followed by the definition of a lambda function that captures them both.

int foo = 33;
int bar = 22;
auto within_range = [foo, &bar](int n) {
  return foo + bar;
};

The capture list in the lambda expression captures both variables: foo is captured by-copy, while bar is captured by-reference.

It's important to realize that while the names of the captured variables are the same as the names of the external variables that they mirror, they are different variables. That is, the foo variable within the body of the lambda is not the same foo as the one outside.

As you probably guessed from the example, the capture mode is declared by prefixing by-reference captures with &. By-copy captures, on the other hand, have no prefix at all.

Additionally, there are two default capture modes that allow us to capture every variable that is used in the body and not explicitly mentioned in the capture list:

  • A bare & will capture by-reference anything used but not explicitly captured by-copy.
  • A bare = will capture by-copy anything used but not explicitly captured by-reference.

In a typical capture list you'll find a mixture of default capture modes with named captures. There are a few rules to this mix, however.

  • The default modes = and & cannot be both present.
  • If a default mode is present, it must lead the list.
  • If a = default capture is present, any named capture that follows must be by-reference.
  • If a & default capture is present, any named capture that follows must be by-copy.

The rules make sense if you think about them, so it's not really necessary to worry about them too much.

These are some examples of typical capture lists:

  • [] Empty capture list, nothing will be captured.
  • [foo] Capture foo by copy.
  • [&bar] Capture bar by reference.
  • [foo, &bar] Capture foo by-copy and bar by-reference.
  • [=] Capture anything named from the enclosing scope by-copy.
  • [&] Capture anything named from the enclosing scope by-reference.
  • [&, foo, bar] Capture anything named from the enclosing scope by-reference, except foo and bar, which are captured by-copy.
  • [=, &foo] Capture anything named from the enclosing scope by-copy, except foo, which is captured by-reference.

Mutable lambdas

By default, by-copy captures are not writable, and therefore the following fragment fails to compile:

int value{0};
auto bad_lambda = [value]() { value += 10; };  // error: value is a read-only copy

By-copy captures can be made writable if the lambda is declared as mutable. This makes the lambda stateful: any change you make to a by-copy capture will be carried over to the next execution of the same lambda.

For example, in the following fragment the lambda will remember any update to its copy of initial_value. The value of the external variable, however, will remain unchanged because the lambda updates a copy.

int initial_value{5};

auto counter_lambda = [initial_value]() mutable {
    std::cout << initial_value++ << std::endl;
};

// each call will increment the internal copy
// stored within the lambda, and the change carries over to
// the next call.
counter_lambda(); // will print 5
counter_lambda(); // will print 6
counter_lambda(); // will print 7

// the original variable outside of the lambda is unchanged
std::cout << initial_value << std::endl; // still prints 5

By-reference captures, on the other hand, can be both read and written regardless of whether the lambda is mutable or not.

int total{0};

auto accumulate = [&total](int n) { total += n; };

// each call updates the value of the referenced variable
accumulate(1);
accumulate(2);
accumulate(3);

// print the accumulated value, 6
std::cout << total << std::endl;

Generalized captures (C++14 and beyond)

All the talk so far has been about captures as they were introduced when C++11 came out.

These captures work great, and there's a lot you can do with them, but after a while you'll notice that there are a couple of ways in which they fall short:

  • A capture always has the same name as the variable that was captured. This is not a big deal, of course, but sometimes you'd like to be able to name them something else.
  • To capture a value it needs to be previously stored in a variable; it's not possible to capture the result of an expression.
  • You can't use move semantics with captures. Captured objects need to be copyable; if they are not then you'll need to capture them by-reference, which may create a problem of ownership, or you'll need to do some other trick. This may be particularly annoying if you use unique_ptr a lot.

To mend this, C++14 upgraded lambdas with generalized lambda captures which

  • allow you to give the capture any name you like.
  • allow you to capture not only variables, but also the result of expressions (by-copy; a by-reference capture still has to refer to an existing object).
  • more importantly, allow you to capture move-only variables like unique_ptr instances.

The price to pay for these welcome improvements is that generalized captures are a bit more verbose than regular captures, because you need to state both the expression being captured and the name of the capture variable created within the lambda. The syntax is:

  • internal_name=expression for by-copy captures.
  • &internal_name=external_name for by-reference captures.

For example, this example uses generalized captures to capture counter by reference (naming it cnt within the lambda), and also captures the result of 3 * mean_level by copy (naming the result limit within the lambda).

int mean_level{5};
int counter{0};

auto f = [&cnt = counter, limit = 3 * mean_level]() {
  if (cnt < limit) cnt++;
};

To capture a move-only object, you just need to make sure the right side of the by-copy assignment is an rvalue, which can be done by providing a temporary value, or by using std::move:

auto adapter = std::make_unique<Adapter>();

auto runner = [adapter = std::move(adapter)]() { adapter->run(); };

The extra verbosity of generalized lambda captures is a very small price, and you're not even required to pay it: you can still use regular C++11 captures when that suits you better, and mix generalized and regular captures to get the best of each:

auto f = [&counter, limit = 3 * mean_level]() {
  if (counter < limit) counter++;
};

What can and cannot be captured

Earlier I said that captures refer to variables in the enclosing scope of the lambda. I mentioned it only in passing, and it probably flew under the radar when I said it.

However, this is not a minor detail or a technicality. To see why, let's look at what cannot be captured.

  • Global scope variables and static data members can't be captured.
  • Non-static class members can't be captured directly.

The first one may strike you as obvious if you think about it: globals are accessible from within any function, and since lambdas are function-like ("callable") objects, there's no reason for them to be an exception.

Still, keep in mind that a global used within a lambda behaves just like a by-reference capture: the lambda sees its current value, not a snapshot. This may have important implications in multi-threaded programs.

Static class members are just globals in disguise, so it's no surprise that as far as lambdas are concerned, they have the exact same restrictions.

Non-static class members cannot be captured either, but here the truth is more nuanced: actually they can, kind-of, but they cannot be captured in the same sense in which captures work for regular local variables.

Before we dig deeper into this, however, we need to take a short detour to talk about the this pointer.

Captures and the this pointer

During execution each non-static class method has access to an implicitly created this pointer that references the instance on which the method was called. This is what gives methods access to non-static data members of the class.

Typically you don't need to dereference that pointer explicitly, since the compiler does it for you, but you can if you want to be more explicit. For instance, in this fragment

class Value {
 public:
  void set(const int x) { x_ = x; }
  int get() const { return x_; }
 private:
  int x_;
};

we can make the dependency on this more visible by rewriting both methods as

void set(const int x) { this->x_ = x; }
int get() const { return this->x_; }

It's important to notice that the rewrite does not change the way the code is compiled; it only makes explicit what the compiler is doing behind your back.

I took this detour to talk about this (pun intended) because, now that we have unmasked how access to data members works, it's easier to understand the nuanced version of how non-static data member capture works.

Non-static class members cannot be captured because they are not in the local scope around the lambda, they are in the class scope.

However, the this pointer can be captured, because it is a variable within the immediate scope of lambdas that get created in non-static methods of a class! By capturing this the lambda gets access to all non-static data members and also to instance methods.

In order to capture this, you just add it to the capture list:

auto is_empty = [this]() { return queue_.empty(); };

It's important to note that this is not a regular variable, and its nature imposes a limitation on the capture process: this can only be captured by-copy; trying to capture the pointer as in [&this] is a syntax error. Default capture modes will also capture this, but notice that even if this is captured by the & default capture mode, the pointer will still be copied.

Now, here comes the catch: remember that this is not the instance; this is a pointer to the instance. You're not capturing the instance by copy, you're only copying a pointer to it.

This is really important, because it means that any access to the instance members is still reference-like: by dereferencing this, the lambda accesses the original data members, not copies of them!

This is the reason some sources explain the capture [this] as capturing the object by reference. It's not exactly true, but it's close enough.

Now, this detail can bite you hard, especially if you fall into thinking that [=] means that "everything gets captured by-copy", as in the following example:

#include <iostream>
#include <functional>
#include <string>

using namespace std;
using Filter = std::function<bool(const std::string &)>;

class FilterFactory {
 public:
  Filter buildFilter(const string &name) {
    name_ = name;
    return [=](const std::string &name) { 
        return (name == name_);
    };
  }
 private:
  std::string name_;
};

int main() {
    FilterFactory factory;
    auto filter_adam = factory.buildFilter("adam");
    auto filter_eva  = factory.buildFilter("eva");
    cout << filter_adam("adam") << endl; // should have returned true, but returns false
    cout << filter_adam("eva") << endl;  // should have returned false, but returns true
    cout << filter_eva("adam") << endl;
    cout << filter_eva("eva") << endl;
}

Here the unsuspecting programmer may have expected [=] to capture everything by-copy, including class members. That would have made each lambda completely self-contained and independent of the original factory instance.

In reality, however, the default capture mode is not capturing name_ at all; it is this that's getting captured, and the access to name_ is actually being done through this->name_.

The code builds, but the behavior is not what the coder expected. For all practical purposes name_ is being accessed by reference, allowing lambdas to "see" changes to the variable made after the creation of the lambda.

And it gets worse: if the factory variable is destroyed before the lambdas, the this pointer stored in the lambdas becomes invalid and any access to the data members of the instance it used to point to becomes undefined behavior.

An intuition of the closure class

Before closing the article I'll talk a bit about the closure class. The idea is not to be rigorous, but just to provide the reader with an intuition of how lambdas get implemented by the compiler under the hood.

The first thing to state is that there's no single closure class. A custom closure class gets created automatically by the compiler for each lambda expression.

These compiler-generated classes cannot be seen or changed, because only the compiler knows what they look like. That's why it is frequently said that the closure objects have an anonymous type.

To take a peek under the hood we can implement our own closure class from the description in the lambda expression. Without loss of generality, let's say we were asked to compile the following fragment:

int foo{3};
bool bar{true};

auto lf = [foo, &bar](int factor) { return foo * bar * factor; };

From the expression we can deduce that upon construction the lambda needs to capture two variables, one by copy (an integer) and another by reference (a boolean). The lambda objects that get instantiated from the expression need to be callable with a single input parameter (an integer) and must return a value after execution (another integer).

A possible implementation of the closure type for the lambda in the example above would be the following one:

class ClosureType {
 public:
  ClosureType(int foo, bool &bar) : foo_{foo}, bar_{bar} {}

  int operator()(int factor) const {
    return foo_ * bar_ * factor;
  }

 private:
  int foo_;
  bool &bar_;
};

Where you can see that:

  • Captures become closure class constructor parameters.
  • By-copy captures like foo are stored within member variables in each instance.
  • By-reference captures like &bar are not stored themselves; a reference to them is stored in the class instead.
  • By overloading operator() the closure objects produced by the class become callable objects.
  • The code body, return type and parameter list of the lambda expression become the body, return type and parameter list of the operator() overload.
  • In this case the operator() overload is a const method because the lambda is not mutable.

In the closure example above it is readily visible that captured variables get read on construction, while lambda parameters get passed to the lambda on execution.

While not very rigorous, our example above is good enough to get an idea of what a closure class may look like and how each part of the expression affects the implementation of the lambda.

The closure class example shows how the number and mode of captures impact the size of the closure objects that get instantiated from it: lambdas are more than just code (like a function would be), they carry a context with them, and the size of the context depends on the size and type of the captures.

By-reference captures are light, typically adding no more than a pointer to the size of the lambda object, but they don't guarantee that the captured object will live at least as long as the lambda. By-copy captures, on the other hand, do guarantee the lifetime, but they can be heavy to move around because of the copy operation. Moving lambdas around will be at least as expensive as moving the most expensive by-copy capture they own.

In reality compilers have a lot of freedom in how they implement lambdas, as long as the observable behavior is the same. A lambda that captures nothing is a special case worth knowing about: its closure class carries no state, so the language gives it an implicit conversion to an ordinary pointer to a function with the same signature.

That's why no-capture lambdas can be assigned to pointer-to-function variables of the matching type, but lambdas that capture variables cannot.

int factor = 2;
auto no_capture_lambda = [](int n) { return 2 * n; }; 
auto with_capture_lambda = [factor](int n) { return factor * n; }; 

int (*f_ptr_1)(int) = no_capture_lambda;   // this is ok
int (*f_ptr_2)(int) = with_capture_lambda; // this fails to build

Conclusion

There's more to say about lambdas, of course, but this is probably enough for a good first bite.

For a deeper coverage you can visit the somewhat terse but extremely complete cppreference.com page on the topic. There you will find also lots of information on new developments in lambda functions in C++17 and beyond, which is something I intentionally left out.

If you have access to a copy of Effective Modern C++, go read it. If you don't, go get it. The chapter on lambdas is extremely well written and informative, just like the rest of the book. You'll find there a neat trick to use move semantics with lambdas when generalized captures are not available in your compiler.

I did not go much into passing lambdas around, but I did try to put some emphasis on the problems caused by capturing stuff by reference, either intentionally or by mistake. Understanding that is extremely important to safely execute lambdas in a different context from the one they were created in, and to determine how they will interact with other objects in multi-threaded code.

I hope you enjoyed this, until next time!
