That's so Rusty: Ownership

#rust #ownership #gc

Rust is interesting partly because of unusual design choices that achieve the same, and sometimes better, outcomes than other languages. A good example is memory management, specifically heap memory management. Memory can be managed in two ways: explicitly or automatically.

Explicit memory management is what systems programming languages such as C and C++ offer. It gives the programmer more autonomy, but with great power comes great responsibility. Heap memory is allocated dynamically and should be deallocated when it is no longer needed. The timing of deallocation matters: deallocating too early leaves invalid references behind, while deallocating too late wastes memory. The programmer must also take care not to free the same memory more than once. When these requirements are not met, the following bugs can result.

1. Memory leak

A memory leak happens when memory is freed too late or never freed at all. Below is an example where memory is never freed. new allocates an array on the heap and p points to its address. Ideally the array would be freed once it is no longer needed, but here it is not. It is only reclaimed when the OS cleans up resources after the program terminates. The cost becomes evident when this pattern repeats in long-running programs or on devices with limited memory.

#include <iostream>
using namespace std;

int main()
{
    cout << "Before allocate" << endl;
    int * p = new int[10];
    // ... other program logic
    cout << "After allocate" << endl;
    // delete[] p; is never called, so the array is leaked
}

2. Dangling pointer

Dangling pointers are invalid references to memory that has already been released. In the snippet below, returns_pointer() returns a pointer to the local variable c. Since c is a stack-allocated local variable, its storage is released when it goes out of scope at the end of the function. The returned pointer is therefore invalid, and dereferencing it is undefined behavior: the program may crash, print garbage, or appear to work. Dangling pointers can also open security holes. These bugs are usually expensive to debug and fix.

// Example program
#include <iostream>
using namespace std;

char * returns_pointer()
{
   char c = 'a';
   return &c;
}

int main()
{
   char * cp = returns_pointer();
   cout << "Result: " << *cp << endl;
}

3. Memory corruption due to double free

Similar to dangling pointers, a double free is harmful because it uses an invalid reference. The example below shows how it can cause unpredictable behavior. Because B is allocated right after A is released and has the same size, the allocator may hand B the same address A had. In that case the second attempt to free A appears to have no consequences, since it effectively frees B. The program then crashes when B is freed, because that memory was already deallocated. Formally, the second free is undefined behavior, so the exact outcome depends on the allocator, which makes this tricky to debug.

#include <iostream>
using namespace std;

int main()
{
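   int *A = new int[10];
   delete[] A;              // first (valid) free; arrays need delete[]
   cout << "After delete A: " << A << endl;
   int *B = new int[10];    // the allocator may reuse A's old address
   delete[] A;              // double free: undefined behavior
   cout << "After second delete A: " << A << endl;
   delete[] B;              // may crash here if B's memory was already freed
   cout << "After delete B: " << B << endl;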
}

It can be very hard to detect these bugs even with due diligence, code reviews and countless tests, hence the need for automatic memory management.

Automatic memory management abstracts memory handling away so the programmer can focus on logic. In most languages this is done with a garbage collector. At runtime the garbage collector typically pauses program execution to clean up objects that are no longer in use. This eliminates the bugs demonstrated above, but at a cost. The most significant cost is performance, since program execution is paused periodically. Cleanup is also non-deterministic, and there is less opportunity to customize destructors. These costs are tolerable for high-level languages like C# and Java.

The win-win situation would be a middle ground that is both fast and safe, and that is the case for Rust. Rust does automatic memory management without a garbage collector through Ownership. The simple yet powerful principles of ownership are:

All objects on the heap are owned by exactly one owning variable.
When the variable goes out of scope, the object will be dropped.
When objects are assigned to other variables, ownership is moved.

The snippet below illustrates these principles.

fn main()
{
    let s1 = String::from("hello");
    let s2  = s1;
    // println!("s1 is: {}, s2 is: {}", s1, s2);

    let s3 = s2.clone();
    println!("s2 is: {}, s3 is: {}", s2, s3);

    print_if_not_empty(s3);
    // println!("is s3 still valid? {}",s3);

    let s4 = returns_string();
    println!("Is s4 valid? {}", s4);
}

fn print_if_not_empty(s : String)
{
    if !s.is_empty()
    {
        println!("String s: {}", s)
    }
}

fn returns_string() -> String
{
    String::from("hello world!")
}

This snippet runs correctly as written, but it will not compile if the commented-out print statements are uncommented. Here is why:

When s1 is assigned to s2, the value of s1 is moved to s2. s2 now owns the heap buffer that s1 used to own, and s1 is no longer a valid variable. Accessing s1, as in the first print statement, results in a compile error.
s3 demonstrates how to make a deep copy without moving ownership. s3 is a clone of s2 but lives in a different heap location. Ownership is not moved, so both s2 and s3 are valid. Some languages, such as C++, make deep copies by default on assignment, which can be expensive for large objects. Making the copy explicit makes it a little harder to regress performance by accident.
Similar to assignment, passing an object to a function moves ownership. When s3 is passed to print_if_not_empty(), its value is moved to the local variable s, and when s goes out of scope at the end of the function it is dropped. As you might expect, printing s3 afterwards results in a compile error. If s3 is needed after the function call, there are alternatives that are out of scope here; otherwise, an unnecessary clone has been avoided.
The same rules apply to values returned from functions. Returning a value moves its ownership out of the function's local scope, so it is not dropped at the end of the function. In the example above the string "hello world!" is moved to s4.
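The points above can be checked by following the heap buffer itself. The sketch below is not from the original snippet: print_and_return() and trace_buffers() are hypothetical helpers added for illustration. It compares buffer addresses to show that a move keeps the same buffer, that a clone gets its own, and that a value stays usable after a function call if the function hands ownership back.

```rust
// Hypothetical helper (not in the article's snippet): takes ownership,
// prints, then hands the same String back to the caller.
fn print_and_return(s: String) -> String {
    if !s.is_empty() {
        println!("String s: {}", s);
    }
    s // moved out to the caller, so it is not dropped here
}

// Returns three checks:
// (move kept the buffer, clone got its own buffer, round trip kept the buffer)
fn trace_buffers() -> (bool, bool, bool) {
    let s1 = String::from("hello");
    let p1 = s1.as_ptr(); // address of the heap buffer owned by s1

    let s2 = s1; // move: s2 takes over the same buffer
    let move_kept_buffer = s2.as_ptr() == p1;

    let s3 = s2.clone(); // deep copy: a second, independent buffer
    let clone_has_own_buffer = s3.as_ptr() != s2.as_ptr();

    let p3 = s3.as_ptr();
    let s3 = print_and_return(s3); // moved in, then moved back out
    let round_trip_kept_buffer = s3.as_ptr() == p3;

    (move_kept_buffer, clone_has_own_buffer, round_trip_kept_buffer)
}

fn main() {
    println!("{:?}", trace_buffers());
}
```

Handing the value back works with ownership alone, but it gets verbose quickly, which is part of why other mechanisms exist for this case.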

At the end of main only s2 and s4 are dropped. s1 and s3 were moved out of, so there is nothing left for them to drop, and the string that was moved into s was already dropped when print_if_not_empty() returned. With the guarantee of exactly one owner, every value is cleaned up deterministically, as soon as its owner goes out of scope. This also comes with the bonus of surfacing errors at compile time.
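To see when drops actually happen, here is a minimal sketch using the standard Drop trait. The Tracked type, the consume() function and the shared log are assumptions made up for this illustration, not part of the original example. Each value records a line when it is dropped, so the order can be inspected afterwards.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Shared log so the drop order can be inspected after the fact.
type Log = Rc<RefCell<Vec<String>>>;

// Hypothetical type for illustration: records its name when dropped.
struct Tracked {
    name: &'static str,
    log: Log,
}

impl Drop for Tracked {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

// Takes ownership; `t` is dropped when this function returns.
fn consume(t: Tracked) {
    t.log.borrow_mut().push(format!("consume {}", t.name));
}

fn drop_order() -> Vec<String> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    {
        let a = Tracked { name: "first", log: Rc::clone(&log) };
        let b = a; // move: `a` is no longer valid, nothing is dropped yet
        let c = Tracked { name: "second", log: Rc::clone(&log) };
        consume(c); // `c` is moved in and dropped inside `consume`
        log.borrow_mut().push(format!("b owns {}", b.name));
    } // `b` goes out of scope here, so "first" is dropped exactly once
    let entries = log.borrow().clone();
    entries
}

fn main() {
    for line in drop_order() {
        println!("{}", line);
    }
}
```

The value moved into consume() is dropped as soon as that function returns, while the value moved from a to b is dropped once, at the end of the scope that owns b.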

When I first learned about Ownership, I thought it was overkill. If function calls move objects, then coding in Rust must be a lot more complicated than necessary. I was also suspicious that no pointers were involved in memory management. Those were of course valid concerns, and they are addressed by references, a topic for another blog post. Otherwise, it became clear to me that Rust embodies safety, performance and error surfacing at every turn. What do you think?
