DEV Community

Pierre Gradot

Let's try C++20 | Range-based for statements with initializer

Thomas Köppe wrote the proposal P0614R1 to describe a new feature called "Range-based for statements with initializer". This document has been approved as part of the C++20 standard.

The document is pretty straightforward since the feature is quite simple. If you have heard of the if statement with initializer from C++17, then you have probably already guessed what "range-based for statements with initializer" means.

To understand this new feature, let's say we want code that prints all the elements of a collection. The first idea would be to use a range-based for loop. But let's add another requirement: we also want to print the index of each element in the collection. Let's write this code before and after C++20 and compare the versions.

Before C++20

Using a range-based for loop, the code would look like this:

#include <iostream>
#include <array>

int main() {
    std::array data = {"hello", ",", "world"};

    std::size_t i = 0;
    for (auto& d : data) {
        std::cout << i++ << ' ' << d << '\n';
    }
}

Some people, including me, would argue that i's scope is too large. The variable is still available after the loop and this may not be desirable. They would then fall back to code like this:

int main() {
    std::array data = {"hello", ",", "world"};

    for (std::size_t i = 0; i < data.size(); ++i) {
        std::cout << i << ' ' << data[i] << '\n';
    }
}

Some people, including me, would argue that this code is more verbose, less generic (since not all collections have a size() member function or support indexing with []), and less explicit about its intent.

After C++20

With C++20, we can now do this:

int main() {
    std::array data = {"hello", ",", "world"};

    for (std::size_t i = 0; auto& d : data) {
        std::cout << i++ << ' ' << d << '\n';
    }
}

Close to perfection 🙂

A more complex case

The proposal also talks about an undefined behavior that we are likely to run into as we try to reduce the scope of variables with range-based loops. Let's consider this code:

#include <iostream>
#include <vector>

class Foo {
public:
    const auto& items() {
        return data;
    }

private:
    std::vector<const char*> data{"hello", ",", "world"};
};

Foo getFoo() {
    return Foo();
}

int main() {
    for (auto& d : getFoo().items()) {
        std::cout << d << '\n';
    }
}

Something is wrong with this code. Can you guess what?

The reference returned by getFoo().items() is already dangling when the loop starts iterating.

We can run this code in C++ Insights to see how the compiler expands the loop:

const std::vector<const char*, std::allocator<const char*>>& __range1 = getFoo().items();

__gnu_cxx::__normal_iterator<const char* const*, std::vector<const char*, std::allocator<const char*>>> __begin1 = __range1.begin();
__gnu_cxx::__normal_iterator<const char* const*, std::vector<const char*, std::allocator<const char*>>> __end1 = __range1.end();

for (; __gnu_cxx::operator!=(__begin1, __end1); __begin1.operator++()) {
    const char* const& d = __begin1.operator*();
    std::operator<<(std::operator<<(std::cout, d), '\n');
}
  1. A reference to the vector is saved as __range1.
  2. Iterators are created from this reference.
  3. The loop is performed using these iterators.

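We can see that the temporary object returned by getFoo(), and the vector it contains, don't survive the first statement. Lifetime extension only applies to a temporary that is bound directly to a reference; here __range1 is bound to the reference returned by items(), so the Foo temporary is destroyed at the end of that statement. Hence, __range1 is a dangling reference and iterating over it is undefined behavior.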

I have to be honest: it took me some time to understand the problem here, so I guess I would have run into this UB in a real-life situation... 😒

With C++20, we can write this UB-free code with tight scope:

int main() {
    for (auto foo = getFoo(); auto& d : foo.items()) {
        std::cout << d << '\n';
    }
}

Conclusion

I thought this article would be simple because the addition of an init-statement to range-based for loops is easy to understand. In the end, I discovered a subtle and hard-to-spot UB.

Discussion (1)

Sandor Dargo

That's a cool feature, just like the if with initializer. By the way, if you have access to Boost, you have one more option that works before C++20:

#include <iostream>
#include <array>
#include <string>
#include <boost/range/adaptor/indexed.hpp>


int main() {
    std::array<std::string, 3> data = {"hello", ",", "world"};

    using namespace boost::adaptors;
    for (auto const& d : data | indexed(0))
    {
        std::cout << (d.index() + 1) << " - " << d.value() << '\n';
    }

}