
How does your language handle memory?

Ben Halpern on February 12, 2018

It occurred to me that memory management is an area where I have only a vague understanding of the differences across languages. I really only unde...
edA‑qa mort‑ora‑y

I guess by "your language" you mean I should choose Leaf. :)

Leaf uses both scoped local variables and shared instances. Scoped local variables are the easy ones to implement: they live on the stack and disappear when the current scope ends. These are really cheap since they don't require any real memory management -- the memory just kind of disappears when not needed.

The shared variables use reference counting. The alternative here would be a scanning-collecting GC. As I need to support systems programming, a complex GC is not an option: it prevents interfacing with other libraries and the OS, gets in the way of certain real-time coding, and can lead to undesired pauses. _There are plenty of domains where these aren't issues, but I'm not programming for them. I may also just not be a fan of such GC in general :)_

Closures mess with this simple memory model. Local variables captured by a closure are automatically lifted to a shared structure that is passed along with the closure.

The overall effect in Leaf is that you never manually free memory or destroy objects. It's all automatic and can't be forgotten. The limitation is that it does not collect memory cycles (when A points to B, which points back to A). That's a hard problem to solve in a satisfying manner (scanning collectors can do it, but they have their own limitations).

Still to come in Leaf are "move" semantics. C++ and Rust both have move semantics: in Rust they are the default approach; in C++ they are still optional, but assumed in some cases.

I could talk hours about this topic. Should my scanning-time-collector free up some time I could even write more articles.

Here are some articles I've written that relate to memory:

Also be sure to check out my descriptions of the various paradigms as they all have different approaches to memory as well. I unfortunately don't talk much about that aspect in the articles.

Ben Halpern

Quite informative answer, keep up the Leaf work!

Kasey Speakman • Edited

I'll add to the .NET answer. I've coded in VB, C#, and F# on .NET but never used unmanaged code.

First, the managed side actually has two kinds of memory allocation for two kinds of objects (Reference and Value objects). Reference objects go on the Heap and are tracked by the garbage collector. They can be referenced from any other code. The GC keeps track of which code still has a reference to the objects it manages, so it knows when it can safely free the memory. The GC periodically pauses execution of code to run its mark and compact routine.

The garbage collector is also "generational". When objects are created, they are at Generation 0, which is the most frequently checked for collection. Longer-lived objects eventually move to higher generations, which don't get checked as often. This avoids needlessly scanning through objects which will not get collected anyway.

Value objects go on the stack (as in stack trace and stack overflow), at least when they are local variables, and they are not garbage collected. They are local to the stack frame of the running code, and when the stack frame exits, the Value object's memory is automatically freed.

Most devs use Reference objects (class in C#) rather than Value objects (e.g. struct in C#) to build their data structures. Value objects have the obvious advantage that no cycles are wasted on garbage collection. However, Reference objects are often better when an object is "large" (more than 16 bytes, but that is debatable) or is passed around a lot. Since Value objects live and die on the local stack frame, they have to be copied in their entirety when they are passed up and down the stack (down = calling another method with the object as an argument, up = returning the object to the calling method). Reference objects, on the other hand, only have their pointers copied between stack frames, not the entire structure, so they are much cheaper to pass around.

As for using, I have never used it for memory management. I use it to avoid having to manually code a .Close() statement on things like database connections. using basically says "call .Dispose() (and therefore Close for connection objects) whenever this variable goes out of scope." using only works on objects that implement the interface IDisposable. There's no special memory magic to it. See this reference. (I've never used destructors either.)

Nick Polyderopoulos

Cool. I don't think anyone in C# (except those who write the CLR or CoreFX itself, or really low-level code) has used finalizers or the actual memory allocation statements. I have seen them used on projects like RavenDB. Same goes for raw pointers (int*, byte*, etc.).
Most of the time, managed references (delegate, object, etc.) are what get used.

Kasey Speakman

I'll also add that lots of allocations in really "hot" code can make the GC particularly active, reducing performance. In the .NET environment it is common for performance-critical libraries to include in their performance benchmarks the # of GC allocations in each generation.

I am mindful of my usage, but in most everyday scenarios (business programming) I never even have to think about the GC.

Nick Polyderopoulos

Well, C# has the best of both worlds. It uses both managed and unmanaged code.

When you run managed code, the framework uses a garbage collector to automatically collect and free memory in the best possible way.

The garbage collector runs in two modes (Workstation, Server) and has two concurrency levels (UI thread, background thread). Here is the documentation from Microsoft: Garbage Collection Design, Garbage Collection Docs.

When you run unmanaged code, C# uses the same constructs as the C language to allocate and free memory and other resources.

edA‑qa mort‑ora‑y

It has automatic collection, but it also uses a lot of manual techniques. For example it has using clauses, and it also follows a Dispose pattern. The need for Dispose in frameworks is due to not being able to "automatically" handle object disposal.

C++ uses a different pattern where objects can be disposed. It's got its own problems of course; I just want to contradict the "best of both worlds" statement. :P

rhymes • Edited

The .NET dispose pattern and the using remind me of Python's context managers and the with statement.

with in Python automatically calls the __enter__ and __exit__ methods, which means the writer of the class can initialize and dispose of resources.

As with most everything in Python, it does not use interfaces but duck typing, so to adhere to the "context manager pattern" you just need to implement those two methods.

In practice it is used for file handling, connection management (mostly databases) and resource disposal.
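
To make that concrete, here's a minimal sketch of a context manager (the class and resource names are just illustrative, not from any real library):

class ManagedConnection:
    def __init__(self, dsn):
        self.dsn = dsn

    def __enter__(self):
        # acquire the resource; the return value is what "as" binds to
        print("opening connection to", self.dsn)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # always runs when the with-block exits, even on an exception
        print("closing connection")
        return False  # don't swallow exceptions

with ManagedConnection("db://example") as conn:
    print("using", conn.dsn)

When the with block exits, __exit__ runs whether or not an exception occurred, which is exactly the .Dispose()-on-scope-exit behaviour described above.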

Galdin Raphael

The need for Dispose in frameworks is due to not being able to "automatically" handle object disposal.

I'm not sure that's right. Garbage collection is automatic; disposing is usually done manually, because the whole point of disposables is to control when an object's resources are released.

Closing files for example. You'd want to do that manually. Why would one wait for a garbage collector to do it?

edA‑qa mort‑ora‑y

It's considered one of the strong points of C++ that you don't need to manually clean up resources of any kind; it doesn't make memory a special case.

The Dispose pattern is a contrast to this. As in C, all resources except memory must be manually released. Automatic "memory" management doesn't solve any of the problems of other resources, something that C++ actually does solve.

All resources are part of this discussion, and if we just ignore them it isn't a fair comparison between how languages handle it.

Galdin Raphael

aaah you're contrasting it with C++. Makes sense.

C# doesn't have the C++ concept of automatic storage. In C#, everything that's a reference type is garbage collected, and garbage collection means you don't have control over when the object is freed. That generally doesn't matter, but sometimes it does, like with files for example. This is where the disposable pattern comes in.

C++ doesn't have a runtime garbage collector, so things are different there. Unlike in C#, one can tell when a resource is going to be disposed (i.e. when an object's destructor will be called), so there's no need for a disposable pattern.

Nick Polyderopoulos

Well the Dispose pattern is commonly used with unmanaged code.
The using clauses use the Dispose pattern.

See Dispose pattern docs on microsoft.docs

and wiki page

For me it is the "best of both worlds". I didn't claim that this is what others think about memory management in C#.

rhymes • Edited

I know Python a little better than I know Ruby and I'm not a GC expert so I'm going to keep it as easy to understand (for me too :D) as I can:

Python

Python has garbage collection like many other high-level languages. Its GC uses a reference counting algorithm AND a generational algorithm.

Let's say variables are "mere" labels to some value in memory. Each variable pointing to the same value is a reference. When there are no variables pointing to that memory space the GC is free to collect the memory occupied by that value.

So, for example:

class Foo:
  pass

a = Foo()
b = a
c = b

In this case there are three references in memory to the same instance of Foo(); you can verify that by using the id() function:

>>> class Foo: pass
...
>>> a = Foo()
>>> b = a
>>> c = b
>>> id(a), id(b), id(c)
(4468012592, 4468012592, 4468012592)

As you can see all three point to the same object, which means it has a reference count of 3.

If I were to delete all three references, for example by using del:

>>> del a; del b; del c

at that point the reference count of that object is 0, so the memory occupied by the instance of Foo() is freed right away (reference counting releases objects as soon as their count drops to zero).

Simple values like small integers are cached by CPython (there's only one instance of each small number in memory), and since they can't contain references to other objects they are not tracked by the cyclic GC.

As you can see, variables that hold the same small number point to the same object in memory:

>>> a = 1
>>> b = 1
>>> c = 1
>>> id(a), id(b), id(c)
(4464022176, 4464022176, 4464022176)
>>> import gc
>>> gc.is_tracked(a)
False

Objects instead are tracked by the GC:

>>> gc.is_tracked(Foo())
True

There's also a way to know how many references an object has:

>>> import sys
>>> a = Foo()
>>> b = a
>>> c = a
>>> sys.getrefcount(a)
4

It says 4 because it's 3 + the temporary reference created by the call to getrefcount() itself.

Reference counting doesn't solve all the problems though: if you have circular references (two objects containing pointers to each other, sort of like a doubly linked list), the counts never drop to zero, so reference counting alone can't reclaim them.

So Python has an additional GC, the generational cycle collector (see my explanation of generational GC for Ruby below), to fix the issue.
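
Here's a small sketch of such a cycle and the collector reclaiming it (illustrative class name; gc.collect() returns the number of unreachable objects it found, and the exact count can vary by Python version since it may include the instances' __dict__s):

>>> import gc
>>> class Node: pass
...
>>> a = Node()
>>> b = Node()
>>> a.other = b        # a references b
>>> b.other = a        # b references a: a cycle
>>> del a; del b       # the counts never reach 0 because of the cycle
>>> gc.collect()       # the cycle collector detects and frees both objects
4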

To recap:

  • Reference counting as an automatic GC
  • Generational GC as an additional, optional collector that handles cycles (see the snippet just below)
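
CPython exposes its three generations through the gc module; here is a quick illustrative peek (the thresholds and counts shown are examples and vary across Python versions and sessions):

>>> import gc
>>> gc.get_threshold()   # allocation thresholds that trigger collection of gen 0, 1, 2
(700, 10, 10)
>>> gc.get_count()       # tracked-object counts per generation (will differ for you)
(53, 4, 2)
>>> gc.collect(0)        # collect only the youngest generation; returns unreachable objects found
0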

Resources:

Ruby

Ruby used to have a simple mark & sweep algorithm, which means: cycle through all objects, mark the reachable ones as living, and sweep away those that are not (the ones no longer reachable).

Since M&S can be quite slow and low performing (it traverses all the allocated memory each cycle!!), Ruby switched to a generational algorithm (introduced in 2.1, with incremental collection added in 2.2).

The difference is that generational GCs divide memory into spaces (generations) by age. The youngest objects are in one space, the older objects in another, and so on.

This means the GC can decide where to concentrate its resources: usually there are way more short-lived objects than long-lived ones (especially if you have a functional programming style ;-)), so that's where the GC starts.

It frees up the memory of the youngest generation before it moves into the space of the older ones.

So when the GC pauses to collect from the "young guns" it's faster than having to go through all of them each time.

Resources:

GCs are way more complicated than my explanation but I hope it helps :-D

Rodrigo Nonose

Elixir is compiled to run on the Erlang VM (BEAM), so this applies to Elixir, Erlang, and other languages compiled to BEAM.
On BEAM, everything runs in a process (not to be confused with OS processes or OS threads; it's a user-level process).
Each process has its own isolated memory.
The following wall of text is, in my opinion, the "architectural result" of building the runtime on that model.

Only the process itself can access its memory; communication between processes happens via asynchronous messages (hard-copied), placed in a process mailbox that has to be explicitly read by the receiving process.
There are a bunch of trade-offs in the choice of this model.

  • It introduces asynchronicity when sharing data.
  • Since messages are hard-copied, message passing consumes significantly more memory. There are some optimizations for that, like big strings (binaries) living in a shared, read-only table.
  • Inside a process, data is immutable. Meaning, if I take a piece of data (of any type) and modify it, a new one is created. This is also why (I believe) there's no array type, only linked lists (and modifying a linked list doesn't always create a full copy, just a "diff").
  • Every process can be executed in parallel and the VM does this by default: the VM runs in a single OS process, which spawns (by default) one thread per physical core to serve as schedulers. The schedulers then have separate, independent queues of processes to run. Unlike most runtimes, the scheduling is preemptive rather than cooperative, meaning the scheduler is the one that decides how long a process gets to run.
  • Processes can't have (or don't have; I still can't properly find the details on this) "goto loops": loops are necessarily recursive function calls (like in a real functional language).
  • Every process gets its turn: since every process is executed a little bit at a time, long-running processes don't block short-running ones, making the runtime highly available.
  • Garbage collection is pretty straightforward, because it runs per process, doesn't have to chase pointers into other processes, and can safely free all of a process's memory once the process finishes executing. Also, processes that are sleeping (usually waiting for messages) don't consume any CPU, because they only get woken up by the scheduler.
  • It allows process supervision. If a process dies because of an unexpected error, the supervisor restarts it. It can restart other processes that depend on the one that died as well, enabling graceful restarts. This enables the Erlang philosophy of "let it crash" (most bugs are solved by restarting stuff).
rhymes

Love this, thanks :-)

Alex Rudenko

I am not an expert :-) I know only 2 main types of memory management: 1) automatic (JS, Java, PHP), where the runtime frees the memory as soon as it is not used by any running code (garbage), and 2) manual (C, C++, Rust, ASM), where you allocate and free memory via explicit calls that are part of your program. With manual management you can have certain automation, like smart pointers, which help you free the memory automatically. I believe Rust has those smart pointers built into the language.

Leah

Rust has lifetimes: the compiler checks at compile time how long each variable stays in scope. It mostly does this automatically, but you sometimes have to specify lifetimes when they can't be inferred from context, and the compiler will tell you when that's the case.

The Rust compiler tells you a lot, and if it compiles it generally runs, as long as you don't use "unsafe" code or panicking methods like unwrap.

edA‑qa mort‑ora‑y

Automatic and manual are tricky to separate, and they aren't really language specific.

Stick with standard types and smart pointers and C++ has fully automated memory management.

C# frequently uses a Dispose system which is manual management.

Rust uses "move" semantics by default, and generally has automatic management. It of course offers manual management for places where it's necessary.

There are also other ways to manage memory, and the consideration of stack/heap/caches/etc. ensures it's a nice rich topic for endless discussion. :)

Alex Rudenko • Edited

That's right :-) Basically, I agree that Rust is probably automated... I separate it like this: manual if you write something in your program to free memory properly (smart pointers/free, etc.), and automated if it is built into the runtime/language so that you don't specifically have to care about it (unless your memory starts leaking, yeah).

Paul Lefebvre

With object-oriented programming, each new object you create takes up space in memory. Xojo uses a technique called Automatic Reference Counting (ARC) to manage the memory used by objects. ARC is typically faster and more predictable than other memory management techniques such as tracing garbage collection.

Here is how it works: when you create a new instance of a class (an object), an internal reference counter for that instance is increased. When a reference to the instance goes out of scope, the counter is decreased. When the counter reaches 0, the instance is immediately removed from memory.
This means that instances of classes are removed from memory automatically when they are no longer used. Suppose you create a class based on a ListBox. You then create an instance of that class in a window. When the window is opened, the instance of the class is created in memory automatically. When the window is closed, the instance of the class is automatically removed from memory. If you store the reference to a class in a local variable, when the method or event handler is finished executing, the instance of the class is removed from memory. If you store a reference to an instance of a class in a property, the instance will be removed from memory when the object owning the property is removed from memory.

ARC really is a great way to handle object memory management and is also used by languages such as Swift and Objective-C. With ARC you generally don't have to worry about memory management except for the special case of circular references. In those cases, you'll need to manually release an object (by setting it to Nil) so that its reference count can eventually reach 0, allowing it to be removed from memory. Otherwise, your app will create an object that is never released from memory, causing what is called a memory leak. This special case does not occur often, but when it does you can make use of Weak References to help mitigate it.

Evan Oman • Edited

I don't pretend to fully understand it but most (all?) Java Virtual Machine implementations use generational garbage collection which is pretty neat. The GC makes certain (configurable) assumptions about the distribution of object lifetimes and attempts to optimize memory cleanup accordingly.

I am sure there are some good articles online but I really loved the coverage of this topic in Chapter 6 of Java in a Nutshell.

rhymes • Edited

Same goes for Python's weakref:

A weak reference to an object is not enough to keep the object alive: when the only remaining references to a referent are weak references, garbage collection is free to destroy the referent and reuse its memory for something else. However, until the object is actually destroyed the weak reference may return the object even if there are no strong references to it.
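
A quick illustrative sketch of that behaviour (the class name is just an example):

>>> import weakref
>>> class Foo: pass
...
>>> a = Foo()
>>> r = weakref.ref(a)   # a weak reference doesn't increase the refcount
>>> r() is a             # calling the weakref returns the referent while it's alive
True
>>> del a                # the last strong reference disappears...
>>> r() is None          # ...and the weakref now reports the object is gone
True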

A lot of runtimes implement the same ideas in slightly different ways :D