Resource Acquisition Is Initialization

Article();

Under Construction and Destruction

C++, unlike many modern languages, has no automatic memory management within the language - at least by default.

We'll use this class as an example:

#include <iostream>

class Foo {
public:
  Foo() {
    std::cout << "Foo(): Instance created" << std::endl;
  }
  void method() {
    std::cout << "Hello, World!" << std::endl;
  }
  ~Foo() {
    std::cout << "~Foo(): Instance destroyed" << std::endl;
  }
};

It's going to print when the constructor is called - when an instance is created - and when the destructor is called - when an instance is destroyed.

For example, look at this function:

void fn() {
  auto foo = new Foo;
  foo->method();
}

Every time that's called, a new Foo instance is created on the heap, and... lost. There's no way to get a reference to it again. It's still there, forgotten, but not gone.

If you call that function, you'll just see the "Instance created" text printed.

If you create an instance without the new operator, though, it gets created on the stack:

void fn2() {
  Foo foo;
  foo.method();
}

Every time that function is called, an "automatic" instance is created, and then destroyed as we leave the function. More accurately, it is destroyed when the name goes out of scope.
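To make "when the name goes out of scope" concrete, here's a minimal sketch. Tracer is a hypothetical stand-in for Foo that appends to a string instead of printing, so the order of events is easy to check; the extra braces create an inner scope purely for illustration.

```cpp
#include <string>

// Hypothetical stand-in for Foo: records events in a log instead of printing.
class Tracer {
  std::string & m_log;
public:
  explicit Tracer(std::string & log) : m_log(log) { m_log += "ctor;"; }
  ~Tracer() { m_log += "dtor;"; }
};

std::string scope_demo() {
  std::string log;
  {
    Tracer t(log);      // Constructed here.
    log += "inside;";
  }                     // t goes out of scope: its destructor runs here.
  log += "after;";
  return log;           // "ctor;inside;dtor;after;"
}
```

The destructor runs at the closing brace of the inner block, not at the end of the function.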

Unlike in a lot of languages - Java for example - the instance is destroyed completely at a defined moment. There's no garbage collection going on here - the compiler has inserted the code explicitly for us.

We can emulate this with some additional code, and the non-standard alloca() function that can get us some raw stack space:

// Needs <alloca.h> (non-standard) and <new> for placement new.
void fn3() {
  auto foo = static_cast<Foo *>(alloca(sizeof(Foo)));
  new (foo) Foo; // In-place constructor call.
  foo->method();
  foo->~Foo(); // Manual destructor call.
}

This is, of course, both horrendous and pointless, so we never have to do that. But the point is that we know precisely when the destructor is called. In fact, with additional objects in play we still know - automatic objects in a scope are always destroyed in the reverse order of their construction. It's all delightfully deterministic.
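The reverse-order rule is easy to see with a small sketch. Named is a hypothetical class invented for this example; it logs a "+" and its name on construction, and a "-" and its name on destruction.

```cpp
#include <string>

// Hypothetical helper: logs "+<name>" when built and "-<name>" when destroyed.
class Named {
  std::string & m_log;
  char m_name;
public:
  Named(std::string & log, char name) : m_log(log), m_name(name) {
    m_log += '+';
    m_log += m_name;
  }
  ~Named() {
    m_log += '-';
    m_log += m_name;
  }
};

std::string order_demo() {
  std::string log;
  {
    Named a(log, 'a');
    Named b(log, 'b');
    Named c(log, 'c');
  }              // Destroyed here in reverse order: c, then b, then a.
  return log;    // "+a+b+c-c-b-a"
}
```

This matters when one object depends on another: anything constructed later can safely use what was constructed earlier, right up until its own destructor finishes.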

Quite Exceptional

My fn3 doesn't actually work in all cases, though. If Foo::method threw an exception, the destructor would never be called, because the manual call on the last line would be skipped. Yet in fn2, the destructor would still run as the exception unwinds the stack.

You can have multiple return statements, and it'll still just work.

There's simply no ordinary way to leave that function without the destructor being run. (Well, this is C++ - there are some terrifying ways around it, such as std::abort() or std::longjmp(), but you'd have to try really hard.)
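Here's a rough sketch of those exit paths side by side. Guarded and the how parameter are made up for this example; the function leaves normally, by an early return, or by throwing, and the log shows the destructor running every time.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for Foo that logs instead of printing.
class Guarded {
  std::string & m_log;
public:
  explicit Guarded(std::string & log) : m_log(log) { m_log += "ctor;"; }
  ~Guarded() { m_log += "dtor;"; }
};

void may_leave_early(std::string & log, int how) {
  Guarded g(log);
  if (how == 1) return;                            // Early return.
  if (how == 2) throw std::runtime_error("boom");  // Exception.
  log += "end;";                                   // Normal path.
}

std::string exit_demo(int how) {
  std::string log;
  try {
    may_leave_early(log, how);
  } catch (std::runtime_error const &) {
    log += "caught;";  // By now g has already been destroyed.
  }
  return log;
}
```

Note that in the exception case the destructor runs before the catch block does - the stack is unwound on the way to the handler.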

And those destructors are just functions. You could put anything in them you want.
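As one illustration of "anything you want", here's a sketch of a destructor that has nothing to do with memory. ScopedFlag and the verbose global are hypothetical names invented for this example: the class changes a flag for the duration of a scope and puts the old value back afterwards.

```cpp
// Hypothetical helper: sets a flag for the lifetime of the object,
// then restores the previous value in the destructor.
class ScopedFlag {
  bool & m_flag;
  bool m_old;
public:
  ScopedFlag(bool & flag, bool value) : m_flag(flag), m_old(flag) {
    m_flag = value;
  }
  ~ScopedFlag() { m_flag = m_old; }
  ScopedFlag(ScopedFlag const &) = delete;
  ScopedFlag & operator=(ScopedFlag const &) = delete;
};

bool verbose = false;  // Some global setting, assumed for illustration.

bool flag_inside_scope() {
  ScopedFlag guard(verbose, true);
  return verbose;  // true here; restored to false once guard is destroyed.
}
```

However the function is left, the flag ends up back where it started.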

Point Smarter

Creating objects on the heap is useful - stack size is somewhat limited, and objects on the heap can deliberately be managed in novel ways to control their lifetime. But creating objects on the stack means the compiler does lots of automated management for us.

Modern C++ includes a suite of "smart pointers" that allow us to have the best of both worlds. They're all class templates, and those can be a little tricky to follow. Let's make a non-template one just for Foo:

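class foo_ptr {
private:
  Foo * m_ptr = nullptr;
public:
  foo_ptr() = default;
  foo_ptr(Foo * f) : m_ptr(f) {} // Takes ownership of f.
  // Caution: copying a foo_ptr would delete the Foo twice.
  // A real smart pointer forbids copying or defines it properly.
  ~foo_ptr() {
    delete m_ptr; // Deleting a null pointer is a safe no-op.
  }
  Foo * operator ->() {
    return m_ptr;
  }
};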

If we try our first function now, but use this smart pointer, we end up with a different behaviour:

void fn4() {
  foo_ptr foo = new Foo;
  foo->method();
}

This function looks like the original fn(), yet behaves like fn2() - if you run it, you'll see it prints the same as fn2(), and nothing leaks. This is because the compiler automatically destroys the foo_ptr, whose destructor in turn deletes the Foo.
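One caveat with a class like foo_ptr: if it can be copied, two objects end up owning the same pointer, and both will delete it. A minimal sketch of one way around that, assuming we want single ownership, is to forbid copies and allow moves instead. Thing and thing_ptr are hypothetical stand-ins for Foo and foo_ptr; Thing counts live instances so a leak or double delete would show up.

```cpp
#include <utility>

// Hypothetical stand-in for Foo that counts how many instances are alive.
struct Thing {
  static int alive;
  Thing() { ++alive; }
  ~Thing() { --alive; }
};
int Thing::alive = 0;

class thing_ptr {
  Thing * m_ptr = nullptr;
public:
  thing_ptr() = default;
  explicit thing_ptr(Thing * t) : m_ptr(t) {}
  thing_ptr(thing_ptr const &) = delete;              // One owner only.
  thing_ptr & operator=(thing_ptr const &) = delete;
  thing_ptr(thing_ptr && other) noexcept : m_ptr(other.m_ptr) {
    other.m_ptr = nullptr;                            // Source gives up ownership.
  }
  thing_ptr & operator=(thing_ptr && other) noexcept {
    if (this != &other) {
      delete m_ptr;
      m_ptr = other.m_ptr;
      other.m_ptr = nullptr;
    }
    return *this;
  }
  ~thing_ptr() { delete m_ptr; }
  Thing * operator->() { return m_ptr; }
};

int move_demo() {
  thing_ptr a(new Thing);
  thing_ptr b(std::move(a));  // Ownership moves from a to b.
  return Thing::alive;        // 1: exactly one Thing, owned by b.
}
```

After move_demo() returns, Thing::alive is back to zero: the single Thing was deleted exactly once.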

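In case you're wondering, this typically creates no additional overhead at runtime. With optimization enabled, a compiler can be expected to generate essentially the same machine code for the normal path of fn4() as for this hand-written version, which, unlike fn4(), leaks if method() throws: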

void fn5() {
  auto foo = new Foo;
  foo->method();
  if (foo) delete foo;
}

The standard library gives us three main smart pointer types we can use:

std::unique_ptr<Foo> holds Foo for us, and deletes it when it goes out of scope. You can't copy them, but you can move them (and, therefore, return them from a function).

std::shared_ptr<Foo> can be copied, and will destroy the Foo when the last copy goes out of scope. This is basic reference counting - a small amount of overhead but a large amount of utility and safety.

std::weak_ptr<Foo> doesn't hold the Foo for us; it just knows whether the Foo held in a std::shared_ptr has been destroyed or not, and can hand back a std::shared_ptr via lock() if it's still alive.

There's also std::experimental::observer_ptr<Foo>, which comes from the Library Fundamentals Technical Specification rather than the standard itself. It's essentially a non-smart pointer, used to indicate to other developers that you really did intend to avoid using one of the others.
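A short sketch of the three standard types in action may help. Widget is a hypothetical type made up for this example, and the functions return what they observed so the behaviour is easy to check.

```cpp
#include <memory>
#include <utility>

struct Widget { int value = 42; };  // Hypothetical example type.

// unique_ptr can be returned from a function because it can be moved.
std::unique_ptr<Widget> make_widget() {
  return std::unique_ptr<Widget>(new Widget);
}

bool unique_demo() {
  std::unique_ptr<Widget> a = make_widget();
  std::unique_ptr<Widget> b = std::move(a);  // a is left empty.
  return a == nullptr && b->value == 42;
}

// Returns the reference count with two owners, then with one.
std::pair<long, long> shared_demo() {
  auto first = std::make_shared<Widget>();
  long inner = 0;
  {
    std::shared_ptr<Widget> second = first;  // Copy: count goes up.
    inner = first.use_count();
  }                                          // second destroyed: count goes down.
  return {inner, first.use_count()};
}

bool weak_demo() {
  std::weak_ptr<Widget> watcher;
  {
    auto owner = std::make_shared<Widget>();
    watcher = owner;                         // Observes without owning.
    if (watcher.expired()) return false;
  }                                          // Last owner gone: Widget destroyed.
  return watcher.expired() && watcher.lock() == nullptr;
}
```

In C++14 and later, std::make_unique<Widget>() is the usual way to write make_widget().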

Smart pointers allow us to turn manual memory management into automatic memory management. But memory is just one resource - what about others?

Highly Resourceful

In our foo_ptr, we took ownership of the memory we wanted during initialization of an object. This meant that the object's lifetime - which, being a stack object, is automatically controlled by the compiler - also corresponded to how long we kept the memory for.

Memory leaks are so painful in programming that many languages directed their efforts toward this one resource problem - almost every modern language uses garbage collection of some kind as a result. But not C++ - instead, the language merely provides ways of managing memory that are equally applicable to almost anything else.

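Many languages suffer if you don't close a file, for example, since garbage collection gives no guarantee about when an unreachable object is cleaned up. Those languages instead tend to get you to construct an explicit scope for the object, like Python's with statement, or Java's try (...) construct.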

With C++, though, files are opened at initialisation and closed automatically the instant they fall out of scope. A simplified stand-in for ifstream - POSIX-only, and ignoring actually reading, mind - might look like this:

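#include <fcntl.h>  // open(), O_RDONLY (POSIX)
#include <unistd.h> // close() (POSIX)
#include <string>

class foo_file {
private:
  int m_fd = -1;
public:
  foo_file() = default;
  foo_file(std::string const & s) {
    m_fd = open(s.c_str(), O_RDONLY); // Returns -1 on failure.
  }
  ~foo_file() {
    if (m_fd >= 0) close(m_fd);
  }
};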

Now instead of freeing memory automatically, the compiler will close files for you.
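The standard stream classes already behave this way, so in practice you rarely write the wrapper yourself. Here's a minimal sketch using std::ofstream and std::ifstream; the function name and the idea of passing in a scratch file path are assumptions for the example.

```cpp
#include <cstdio>
#include <fstream>
#include <string>

// Writes a word to a scratch file, reads it back, then removes the file.
std::string write_then_read(std::string const & path) {
  {
    std::ofstream out(path);  // File opened at initialization.
    out << "hello";
  }                           // out destroyed: data flushed, file closed.
  std::string word;
  {
    std::ifstream in(path);   // Safe to reopen: the writer is already closed.
    in >> word;
  }                           // in destroyed: file closed again.
  std::remove(path.c_str());  // Clean up the scratch file.
  return word;
}
```

There's no close() call anywhere; the closing braces do the work.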

C++ doesn't have a synchronized block, like Java's. It turns out it's just not that useful. Instead, you can do this:

// Somewhere (needs <mutex>):
std::recursive_mutex m_mutex;

void fn6() {
  std::lock_guard<std::recursive_mutex> l(m_mutex); // Give this any name.
  // Do strange thready things.
}

lock_guard is another RAII class - its initialization acquires the lock, and when it falls out of scope it automatically releases it.

Exceptions, early returns, whatever - the compiler knows and will handle it for you.
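As a rough sketch of the exception case, the function below throws while holding a lock, and the caller then checks that the mutex can be taken again. The names demo_mutex, locked_work and lock_released_after are made up for this example, and it uses a plain std::mutex rather than a recursive one so that a lock left held would actually be noticed.

```cpp
#include <mutex>
#include <stdexcept>

std::mutex demo_mutex;  // Hypothetical mutex for this example.

void locked_work(bool fail) {
  std::lock_guard<std::mutex> lock(demo_mutex);  // Acquired here.
  if (fail) throw std::runtime_error("failed while holding the lock");
}                                                // Released here, either way.

bool lock_released_after(bool fail) {
  try {
    locked_work(fail);
  } catch (std::runtime_error const &) {
    // Swallowed for the demo; the lock_guard has already been destroyed.
  }
  // try_lock is formally allowed to fail spuriously, so treat this as a
  // demonstration rather than a robust check.
  if (!demo_mutex.try_lock()) return false;
  demo_mutex.unlock();
  return true;
}
```

With a manual lock() and unlock() pair, the throwing path would have left the mutex locked forever.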

Resource Acquisition Is Initialization

RAII is such a fundamental aspect of C++ programming that it's worth spending time really understanding it. It gives us reliable management of memory, file handles, locks, and all other system resources. It does so in a way that's exception-safe, and highly flexible.