DEV Community

Cover image for How does the garbage collection algorithm (Mark-Sweep) work? ๐Ÿ—‘๏ธ๐Ÿงน
Adham Niazy
Adham Niazy

Posted on

How does the garbage collection algorithm (Mark-Sweep) work? ๐Ÿ—‘๏ธ๐Ÿงน

You might have heard the word "Garbage Collection" before while learning a programming language, So Let's discuss the meaning of this word before digging into the algorithm.

(NOTE: All code examples using JavaScript)

What is the problem we have?

Let's have a look at the following example.

const age = 23;
let person = {
  fullName: 'Adham Niazy',
  age: 23
};
Enter fullscreen mode Exit fullscreen mode

You could think any programming language should handle the first two lines the same way because we are just creating two variables.
But actually, they are a little bit different from each other. Let's see why.

Memory Management Illustration

The part on the left is for visualizing how memory saves each variable with its value, so when we created a variable called age, the program found that its value is a simple value of 23.

NOTE: (Booleans, Strings, Numbers, Undefined) are simple data types

But when we created a variable called person, the program found that its value isn't simple but complex!

NOTE: (Objects, Arrays) are complex data types

So, in this case, let's imagine that we have another type of memory on the right part called Heap, this part will hold all complex data types and will give the main memory only an address (Reference) of the stored data. So when we try to access person variable, the main memory will find an address, following this address to reach the actual spot in the Heap.

The real problem is when this happen. โ†“

// Problem case
person = 'Anything';
Enter fullscreen mode Exit fullscreen mode

If you understood the last part well, you might have guessed the problem that happened here.

Lost Ref

Reassigning the person variable with another value will remove the address (Reference) that points to the Heap! but the critical point is that the person object still exist in the Heap and haven't been removed. (This will cause a memory leak!!).


What is the solution?

The solution depends on the implementation of the language itself.
Example C++ doesn't handle this case and leave it for the programmer to take care of removing the object from the Heap manually before reassign the variable.

Languages Like JavaScript, Java, Python uses automatic memory management. They allocate memory when needed and deallocate it when no longer used. The deallocation process is known as Garbage Collection

Garbage Collection has many algorithms to implement but the most popular algorithm is Mark-Sweep.


Mark-Sweep Algorithm โœ”๏ธ๐Ÿงน

Mark โœ”๏ธโœ”๏ธโœ”๏ธ:
Starting from the root reference, we traverse the entire reference graph in our program. Think of the root reference as a global object from which all other "Used" objects in our program are reachable via an existing reference.
While traversing the root reference we will mark all objects we use in our code.

// Mark
const objects = [root];
while(objects.length) {
  const currentObject = objects.pop();
  currentObject.mark = true;
  pointers(currentObject).forEach(ref => {
    objects.push(ref);
  });
}
Enter fullscreen mode Exit fullscreen mode

Sweep ๐Ÿงน๐Ÿงน๐Ÿงน:
We traverse the entire Heap. If a particular memory block is marked, we reset the mark flag for the next iteration. And If the block isn't marked this means that we no longer uses this data so we can deallocate this memory block. This phase is known as "sweep"

// Sweep
let current = heap;
while(current) {
  if (  current.mark ) {
    current.mark = false;
  } else {
    free(current);
  }
  current = current.next;
}
Enter fullscreen mode Exit fullscreen mode

โš ๏ธ The implementation above is incomplete: for example, it doesn't handle circular references. Its only purpose is to demonstrate the garbage collection algorithm in familiar syntax. and let you know how languages implemented this feature


And that's it! Hope you enjoyed this article ๐Ÿค

Top comments (0)