DEV Community

Cover image for Zero-Compromise Arena Allocation: A Practical Approach to Arena Memory Management
Unmeinks
Unmeinks

Posted on • Edited on

Zero-Compromise Arena Allocation: A Practical Approach to Arena Memory Management

When I first began developing my game engine, I decided to challenge myself by using only arena allocators. And while it provided some nice benefits like cache locality and near instant allocation and deallocation from arenas (after initially allocating the arena itself), it wasn't without its problems.

Temporaries Problem

Temporaries. Are. Everywhere. I was blind to this before using arenas because I would just use a vector for everything. Bad, I know, but It was nice, clean, and I never had to worry about it. Any time I needed intermediate arrays (for things like string building), a vector was always there for me, but now it wasn't, and now I needed a way to explicitly manage these temporaries.

First attempt: Separate Scratch Arenas

My initial solution to this was to pass a separate "scratch" arena by value that could be used to store intermediate values so that any changed bookkeeping would get lost at function return. For example:

void* someFunc(Arena* persistentArena, Arena scratchArena) {
    // temporary allocation, lost after function returns
    void* temp = allocate(&scratchArena, 64);

    // persistent allocation, lives beyond function
    return allocate(persistentArena, 128);
}
Enter fullscreen mode Exit fullscreen mode

Since allocation in an arena is just updating the bookkeeping (i.e. an offset), this meant scratch arenas operated effectively like a stack. From what I found online, this most aligned with Chris Wellons' method of dealing with this issue.

This was nice because, for one, ownership was incredibly clear: the result lives in the arena passed by pointer, since that is the only possible way for the result to live longer than the arena. Additionally, if I only passed an arena by value, then I know for certain that the result doesn't live on the arena, because it can't (since all changed bookkeeping as a result of allocation is lost at function return).

Furthermore, if I wanted a function's results to live in a scratch arena instead of in a "persistent" arena (an arena passed by pointer, where allocations live longer than the function's stack frame as a consequence), then I could at any point just pass a pointer to my scratch arena instead of a pointer to my persistent arena, so that the called function's results will be lost after the "current" function returns (since the called function's results were placed in the scratch arena).

Nested Function Issue

The caveat to this is that if a function needs to have both a persistent and scratch arena, and if I made that function's persistent arena my scratch arena, then what do I pass for the function's scratch arena?

void* func(Arena* persistArena, Arena scratchArena) {
    void* temp = allocate(&scratchArena, 64);

    void* result = nestedFunc(
        &scratchArena,
        // what should be passed here?
    );

    return result;
}
Enter fullscreen mode Exit fullscreen mode

There are 3 options I found: to make and pass a second scratch arena, to pass the persistent arena, or to use thread local storage.

Additionally, there were 2 ways I found to easily make a second scratch arena: to split the current scratch arena into two or to allocate a whole new scratch arena. I decided against the second strategy because I liked how arena allocation reduced dynamic allocations, and I decided against the first strategy because half-ing the scratch arena for each nested function would quickly reduce the available capacity of each nested scratch arena.

So the first option was a bust, but what about the second option? Well, it actually works pretty well, so long as the persistent arena just happened to be large enough to accommodate scratch space requirements (not likely in my case), and thats not even considering the possibility of it being allocated completely at the beginning of a function or filling up over time.

Okay, so were down to the last option: thread local storage. This is actually a pretty nice solution I read from Ryan Fleury's Untangling Lifetimes article, though it involves manually grabbing a scratch arena and checking if the passed persistent arena aliased the scratch arena (could happen if you call a function and pass the TLS scratch arena if the function result is just an intermediary / temporary).

This method also requires a manual "getScratch" and "releaseScratch" call at the beginning and end of functions that need to use the scratch arena. This was a bit frustrating, I had gotten really attached to the idea of the function's stack frame just naturally resetting the scratch arena, so going back to manually managing the reset felt bad. Additionally, I also removed the CRT, so I couldn't easily use the TLS method even if I was willing to manually manage the memory.

I was at an impasse. All of the methods seemed to have some sort of issue. Moreover, that check for if the arenas alias is actually really important to ensure you dont overwrite one arena by writing into another, which is an issue / safety concern with passing two separate arenas in general.

Exploring Dual Arenas

So I thought that maybe I shouldn't use 2 separate arenas, and instead searched about some dual arena methods and found a nice article by Stefan Reinalter. The idea is that scratch allocations are placed at the top of the arena growing downwards, while persistent allocations are placed at the bottom growing upwards. Furthermore, reason for the different directions is so that the persistent and scratch allocations dont overwrite each other. One added benefit of this approach to the previous ones is that arena space is completely maximized. This is because there's no case where a scratch allocation fails while there's available space for persistent allocations and vice versa, since the spaces for persistent and scratch allocations are combined into a single arena.

However, there was still the issue of manually managing the "scratch" space, i.e. the top of the arena, as well as the issue that I couldn't easily have a function place results into the scratch space, as I was able to with separate arenas.

Final Solution

To solve these issues, my engine uses an arena allocation strategy that takes elements from each strategy discussed so far. For starters, this is what the struct looks like:

struct ArenaFrame {
    Arena* pArena;
    uint32_t* pPersistOffset;
    uint32_t scratchOffset;
    bool isFlipped;
};
Enter fullscreen mode Exit fullscreen mode

The idea is to use a single dual arena that is actually just a wrapper of 2 offsets into the same memory: a persistent offset pointer and a scratch offset value. This way, when you pass an ArenaFrame by value, any scratch allocations are automatically cleaned up when the function returns, while persistent allocations remain.

Additionally, to make ownership / effects clear, that function can take an ArenaFrame by rvalue reference, which makes it immediately obvious at the call site which pointer is being modified. For example:

void* func2(ArenaFrame&& frame) {
    // scratch allocation, cleaned up after return
    void* temp = scratchAlloc(&frame, 64);

    // persistent allocation, lives after return
    return alloc(&frame, 128); 
}

void* func1(ArenaFrame&& frame) {
    // temp lives in scratch space
    void* temp = func2(makeFrame(frame, &frame.scratchOffset));

    // final result lives in persistant space
    return func2(makeFrame(frame, frame.pPersistOffset));
}
Enter fullscreen mode Exit fullscreen mode

Furthermore, to have a function place results into your frame's scratch region, you can do something similar. When setting the new arena frame's pPersistOffset, instead of using a pointer to your frame's persist offset, you can instead use pointer to your frame's scratch offset. This way, persistent allocations (with alloc) use the scratch offset for bookkeeping, effectively allocating in your frame's scratch region. Then, you can use your frame's persistent offset (dereferenced) to set the scratchOffset so that scratch allocations are placed in the opposite side of the arena (again, to ensure no conflict / overwriting). This is because it's completely fine to have a function allocate in your frame's persist region: since you passed the persist offset (dereferenced) by value, after the function returns, the persist region goes back exactly to how it was before.

However, you can't just swap the offsets and call it a day. This is because the two offsets grow in opposite directions, and if you just swap which one is the "persist" vs "scratch" offset, the direction the offsets grow in also swaps. If this is not accounted for, and the offsets swap, then alloc and scratchAlloc will grow into allocated space instead of free space, overwriting what you've stored. To solve this issue, there's an additional "isFlipped" member variable that exists in the frame whose value flips (between true and false) if the offsets flip. The alloc and scratchAlloc functions can then use this information to allocate in the correct direction.

After implementing this system, the complete API might look something like this:

void* func(ArenaFrame&& frame)
    // I want these results to live after the function returns
    result1 = someFunc(makeFrame(frame, frame.pPersistOffset));
    result2 = alloc(&frame, 32);

    // I want these results to live for the duration of the function only
    result3 = someFunc(makeFrame(frame, &frame.scratchOffset));
    result4 = alloc(&frame, 32);
}
Enter fullscreen mode Exit fullscreen mode

The key thing being that at any given moment, the programmer doesn't have to worry about whats flipped or not, they dont have to worry about complicated lifetimes, or what region they're allocating in. The only thing they need to worry about is if they want a result to 1. live longer than the function, or 2. live for the duration of the function, while the only thing they need to know is what pPersistOffset and scratchOffset correspond to in terms of lifetime.

This is currently the method I use in my game engine, and so far it has been pretty painless. The API is, in my opinion, extremely simple and explicit since you only ever need to pass a single ArenaFrame to any function, while the effects of a function are clear as day through the rvalue references. Here's a more complex example that shows how it all works together:

void* func2(ArenaFrame&& frame) {
    // Initial state: *frame.pPersistOffset = 0, frame.scratchOffset = 1000
    void* temp = scratchAlloc(&frame, 64);
    // After: *frame.pPersistOffset = 0, frame.scratchOffset = 936
    void* persist = alloc(&frame, 32);
     // After: *frame.pPersistOffset = 32, frame.scratchOffset = 936
    void* tempResult = anotherFunc(makeFrame(frame, &frame.scratchOffset));
    // After: *frame.pPersistOffset = 32, frame.scratchOffset = 904, 
    return alloc(&frame, 128);
    // After: *frame.pPersistOffset = 160, frame.scratchOffset = 904
}

void func1(ArenaFrame&& frame) {
    // Initial state: frame.scratchOffset = 1000, *frame.pPersistOffset = 0
    void* temp1 = func2(makeFrame(frame, frame.pPersistOffset));
    // After: frame.scratchOffset = 1000, *frame.pPersistOffset = 160
    void* temp2 = func2(makeFrame(frame, &frame.scratchOffset));
    // After: frame.scratchOffset = 840, *frame.pPersistOffset = 160
}
Enter fullscreen mode Exit fullscreen mode

The stack naturally handles cleanup of temporaries without any manual management, there's no possibility of arena aliasing since there's only a single underlying memory block, and there's maximum space utilization through the dual-direction growth.

So far, this method has helped simplify a lot of my allocation code, and hopefully it can be useful to you as well!

a full godbolt implementation

Top comments (0)