Austin Aigbe for Rust Nigeria

Posted on Feb 4, 2022 • Edited on Feb 6, 2022

Rust-y Memory. How Safe?

#rust #rustc #llvm #memory

Introduction

The Rust programming language is said to be memory-safe, but how does it achieve this, and why should we care about memory safety? It turns out that memory safety issues are a major cause of security vulnerabilities in modern software systems (including desktop and mobile applications). To address this, there is a general consensus, which I personally agree with, that a memory-safe systems programming language like Rust is needed. This is one of the reasons big tech companies have decided to invest in Rust and re-engineer some of their key products and services with it.

How does `Rust` Achieve Memory Safety?

Memory safety is achieved through three key concepts: ownership (a language feature in which each value has a single owner, and the compiler frees the value when its owner's variable binding goes out of scope), borrow checking, and lifetimes. All of these analyses are done at compile time.
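To make the three concepts concrete, here is a minimal sketch. The helper names (consume, measure, longer) are made up for illustration and only the standard library is used.

```rust
/// Takes ownership of `s`; the String is freed when this function returns.
fn consume(s: String) -> usize {
    s.len()
}

/// Borrows `s` immutably; the caller keeps ownership.
fn measure(s: &str) -> usize {
    s.len()
}

/// The lifetime `'a` tells the compiler that the returned reference
/// must not outlive either of the two inputs.
fn longer<'a>(a: &'a str, b: &'a str) -> &'a str {
    if a.len() >= b.len() { a } else { b }
}

fn main() {
    let owned = String::from("Hello, Rust");

    // Borrowing: `owned` is still usable afterwards.
    let n = measure(&owned);
    println!("borrowed length: {n}");

    // Lifetimes: the result borrows from `owned` or from the literal.
    println!("longer: {}", longer(&owned, "2021"));

    // Ownership: `owned` is moved into `consume` and freed there.
    let m = consume(owned);
    println!("consumed length: {m}");

    // Uncommenting the next line fails to compile with error E0382
    // (borrow of moved value), because `owned` no longer owns the String.
    // println!("{owned}");
}
```

The commented-out line is the interesting part: the mistake is rejected at compile time, so it never becomes a use-after-free at run time.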

For simplicity, I will focus on the ownership concept and show how the compiler guarantees memory safety at compile time, using the scope of a variable binding to determine when to allocate and deallocate memory on the stack.

Before we proceed, let's review how our code is analyzed by the Rust compiler (rustc) for memory safety.

Code Compilation With Memory Management

In simple terms, memory is storage space (e.g. RAM) on your computer where the instructions to be executed by the computer's CPU, and the data they operate on, are stored. These instructions come from the lines of Rust code you have written and compiled with rustc (the Rust compiler) or cargo into a machine-executable file (e.g. the .exe file format on Windows and ELF on Linux). The executable file tells the operating system how to load your Rust program into memory for execution by the CPU.

Now, let's compile a very simple Rust program with rustc and examine how memory management (ownership) works in Rust.

// hello.rs
fn main() {
    let rust_edition = 2021;
    let message = "Hello, Rust";
    println!("{message} {rust_edition}!");
}

Compiling the above code with rustc .\hello.rs on Windows will produce an executable file, hello.exe (about 148 KB in size), in the same directory as the .rs file.

Next, let's see how the hello.exe was produced by rustc and how the memory-safety checks were done.

The Rust compiler, rustc, performed several analyses on the hello.rs file before it produced the hello.exe executable file. I will give a high-level overview of this process so we understand how memory safety is achieved by the compiler and at what phase of the compilation process it is done.

Figure 1: A simplified view of the rustc compilation process

Phase 1: hello.rs was translated to basic tokens using the rustc_lexer crate and then to Rust tokens using the rustc_parse crate. Tokens are easier for the compiler to work with than the text format of your .rs file.

Phase 2: Tokens were translated to AST (Abstract Syntax Tree) format. Syntax analysis is done here. Use cargo inspect --unpretty=ast-tree .\hello.rs to examine the AST output.

Figure 2: AST tree of hello.rs with cargo-inspect

Phase 3: The println! macro was expanded (or desugared) to a std::io::_print statement and core::fmt::Arguments function calls. The data types of our expressions were inferred and checked. We can say that type safety is guaranteed at this phase of the code compilation. The HIR (High-Level Intermediate Representation) is the output of this stage. You can use cargo inspect --unpretty=hir .\hello.rs to view the HIR representation of your Rust code.

Figure 3: println! macro desugared in the HIR.
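The desugaring can be approximated by hand with public APIs. The sketch below does not call std::io::_print, which is an internal function. Instead it uses format_args! to build the core::fmt::Arguments value and passes it to a writer through Write::write_fmt. The render helper is a made-up name for illustration.

```rust
use std::io::Write;

/// Builds the same text that `println!("{message} {rust_edition}!")` prints.
/// `format_args!` produces a `core::fmt::Arguments` value, and `write_fmt`
/// formats it into the writer (here, an in-memory byte buffer).
fn render(message: &str, rust_edition: i32) -> Vec<u8> {
    let mut out = Vec::new();
    out.write_fmt(format_args!("{message} {rust_edition}!\n"))
        .expect("writing to a Vec<u8> does not fail");
    out
}

fn main() {
    let bytes = render("Hello, Rust", 2021);
    // Send the rendered bytes to standard output.
    std::io::stdout()
        .write_all(&bytes)
        .expect("failed to write to stdout");
}
```

Writing into a buffer first is only a convenience for inspecting the result. The same format_args! value could be passed directly to std::io::stdout().write_fmt(...).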

Phase 4: HIR was converted to MIR (Mid-level Intermediate Representation). Ownership analysis, borrow checking and optimizations are done here. The MIR shows the scope of each variable and helps the compiler know at what point a variable binding (or ownership) goes out of scope and when to deallocate memory from the stack. How the compiler tracks the scope of each variable is indicated in the screenshot below. The assembly code generated by LLVM (in phase 6) is the machine representation of the MIR (after the ownership analysis, borrow checking and optimizations have been done). Later, the asm .S file is examined to see how the variables are allocated on the stack when they are in scope and how they are deallocated when they are out of scope. It is in this phase (phase 4) that memory safety is guaranteed by the compiler.
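Scope tracking can also be observed from ordinary Rust code. The sketch below uses a made-up Tracked type whose Drop implementation records when each binding goes out of scope. In the MIR, the start and end of a local's storage show up as StorageLive and StorageDead statements. Assuming a standard toolchain, rustc --emit mir .\hello.rs writes the MIR to a file for inspection.

```rust
use std::cell::RefCell;

/// Hypothetical type that records its name in a shared log when dropped.
struct Tracked<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<&'static str>>,
}

impl Drop for Tracked<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

/// Returns the order in which the bindings went out of scope.
fn drop_order() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    {
        let _outer = Tracked { name: "outer", log: &log };
        {
            let _inner = Tracked { name: "inner", log: &log };
        } // `_inner` goes out of scope here
        let _late = Tracked { name: "late", log: &log };
    } // `_late` is dropped first, then `_outer` (reverse declaration order)
    log.into_inner()
}

fn main() {
    println!("{:?}", drop_order());
}
```

No cleanup is written by hand anywhere in this code. The compiler inserts it at the end of each scope, in the reverse order of declaration.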

Figure 4: MIR representation of hello.rs

Phases 5 and 6: This is the code generation phase. LLVM was used to generate the final executable file hello.exe from the optimized and memory-safe MIR representation. Further optimizations can also be done by LLVM.

Let's briefly examine the assembly code (.S file) generated by LLVM. You can use rustc --emit asm .\hello.rs to generate the file. To keep things simple, I will only examine how the ownership memory-safety feature was achieved by the allocation and deallocation of memory on the stack.

Figure 5: A simplified memory layout of the stack from the perspective of the compiler generated main function.

The assembly code for the main function is shown above (in Figure 5). The entry point is not our hello::main function. As we will see later, Rust has a runtime (std::rt::lang_start_internal). This runtime handles a lot of complexities for us that we don't need to bother about when writing our Rust code.

Line 516: The compiler generated main function is the entry point for our program.
Line 518: 40 bytes of memory is allocated on the stack. rsp is the 64-bit stack pointer register for x86_64. It always points to the top of the stack.
Line 523: _ZN5hello4main17h0767239aa2b5c6caE is the mangled symbol for our hello.rs main() function. The address of our hello::main function is stored in %rcx and then passed to the runtime function as a reference in line 103.
Line 524: This line calls the Rust runtime function std::rt::lang_start (defined in line 94) and subsequently, the internal runtime function std::rt::lang_start_internal (defined in line 104). Rust has a runtime that executes our hello.rs main() function.

Figure 6: Rust runtime function executes our hello::main function

Figure 7: Execution of hello::main (part 1)

Figure 8: Execution of hello::main (part 2)

Figures 7 and 8 show how sufficient memory (200 bytes) was first allocated on the stack by the compiler before memory was assigned to the two local variables rust_edition (an i32) and message (a &str). Before returning from the hello::main function, the generated code deallocates the memory on the stack and frees up resources.
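As a rough cross-check of the stack usage, the sizes of the two locals can be queried with std::mem::size_of. This is a sketch and the local_sizes helper is a made-up name. The two locals account for only a small part of the 200-byte frame. The rest is presumably used by the formatting machinery behind println!, though that is an assumption and not something read off the figures.

```rust
use std::mem::size_of;

/// Returns (size of i32, size of &str) in bytes for the current target.
fn local_sizes() -> (usize, usize) {
    (size_of::<i32>(), size_of::<&str>())
}

fn main() {
    let rust_edition = 2021; // inferred as i32
    let message = "Hello, Rust"; // &'static str: a pointer plus a length

    let (int_size, str_ref_size) = local_sizes();
    println!("rust_edition ({rust_edition}) occupies {int_size} bytes on the stack");
    println!("message occupies {str_ref_size} bytes on the stack");
    // The text itself is not on the stack; it is stored in the executable.
    println!("the text is {} bytes long", message.len());
}
```

On a 64-bit target a &str is two machine words (pointer and length), so only the reference lives in the stack frame of hello::main.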

Conclusion

We have seen how Rust guarantees memory safety at compile time by examining the compilation phases and how memory is allocated and deallocated on the stack in assembly code. A very simple Rust program (hello.rs) was used to examine the output of each compilation phase and the memory management of the stack in assembly code.

We also learnt that the Rust compiler generates a main function for us as the entry point of our program. This main function uses the Rust runtime to execute the main function of our Rust program.

In summary, Rust tries to guarantee memory safety at compile time, and it does a pretty good job of ensuring this.
