DEV Community

Zex
Zex

Posted on

Let's Talk About Garbage

Our city is seriously working on waste classification which is good for resource recycling and environmental protection.🐒 All waste must be dumped into the right place. Since July 1st, those
individuals don't obey the regulations could be fined up to 200RMB, for organizations could be up to 500K RMB.😢

Resource Management

Good developers borrow resource from the system and release them when work is done. But human being is lazy animal, so languages that "automatically" recycle the waste are created. Therefore, developers can focus on their business without bothering the resource stuff.

C++ doesn't have language level garbage collection due to implementation difficulties and performance consideration. A set of smart pointers are created as part of STL to serve the resource management purpose.

Languages like Java, Go, Python and JavaScript, performs GC automatically.

GC Techniques

Reference Counting

Maintains a counter for each object to denote references to this object. When an object is not referenced by anyone it would be freed. This leads to cyclic reference issue, meaning when two objects are referenced by each other, neither of them could be freed eventually.

Here come a workaround called weak reference, which doesn't protect the referenced object being recycled. It doesn't increase the reference count for the referent object.

Mark and Sweep

As a tracing garbage collector, it contains two stages: mark stage and sweep stage.

First of all the garbage collector walk through the objects graph from the root, find out those reachable objects and set the mark bit to 1, which is 0 by default.

Secondly, scans all the objects, recycle those unreachable (marked as 0), set the mark bit of those reachable to 1.

The downside of this method is it stops the program execution for quite a while.

Tri-Colour

It uses three colours, white, grey and black to differentiate states of objects.

  • The white set contains candidates to recycle.
  • The grey set contains objects that reachable from the root and have references to objects in the white set.
  • The black set contains objects that reachable from the root, have no reference to objects in the white set and aren't candidates to recycle.

Objects can only be moved from grey to black and from white to grey.

The algorithm is demonstrated as below


While grey set not empty:
  Pick an object from a grey set
  Move the object to black set
  Move white objects it references to grey set

It ensures that no black objects are referencing white objects, which are not reachable from the root and can be recycled once the grey set is empty. It can work on the fly without suspending the system for a period of time.

Implementations

Stop the World

The garbage collector comes with a
recycle threshold when closed or reached, a recycle happened.

It stops the program execution until the garbage collection is done. This brings high throughput for GC. It's suitable for non-interactive applications. By stopping the world for a noticeable period of time with a specific frequency it introduces performance issue as well.

Incremental GC

Performs garbage collection discretely by breaking the collection into pieces of processes and completes them one by one. Instead of freezing for a long time like STW does, it breaks it down to many tiny periods of time.

Concurrent GC

It runs garbage collection concurrently without stopping the world. Minimizing the pause time makes it a good choice for interactive programs. Low throughout makes it costs longer time to complete GC.

Garbage Collection in Languages

GC in Python

The gc module in Python allows you to interact with the GC system. It could be disabled by gc.disable() and manually GC by invoking gc.collect(). gc.garbage is a list of objects to be collected.

Interested in learning more about it? Check this out.😎

GC in Go

Go allows you to disable GC by running the application with GOGC=off.

Go comes with two knobs to control the GC. The first one is GCPercent. Basically this is a knob that adjusts how much CPU you want to use and how much memory you want to use. The default is 100 which means that half the heap is dedicated to live memory and half the heap is dedicated to allocation. You can modify this in either direction.

MaxHeap, which is not yet released but is being used and evaluated internally, lets the programmer set what the maximum heap size should be. 

Learn more about the Go garbage collector from source code mgc.

GC in JavaScript

GC in Javascript is based on reachability and completely automatic. Found this article explains it very well, I recommend you read through it.

Helpful Resources

Fundamentals of Garbage Collection

Wiki:Garbage Collection

Latest comments (0)