DEV Community

Viacheslav Poturaev
Viacheslav Poturaev

Posted on • Edited on

Memory arenas in Go

#go

Recently released go1.20 brought a new experimental package named arena (proposal).

The purpose of this package is to isolate multiple related allocations into a single area of memory, so that they can be freed all at once.

Expected effect is improved allocation performance and reduced garbage collector (GC) pressure.

Let's see it in action. As a toy problem, we can use binary-trees.

A program that idiomatically solves this problem has to put a lot of pressure on a memory management system to allocate many small objects, so it is a great candidate to assess new arenas.

Original source code:

binary-trees-ptr.go
package main

import (
   "flag"
   "fmt"
   "strconv"
   "sync"
)

type Tree struct {
   Left  *Tree
   Right *Tree
}

// Count the nodes in the given complete binary tree.
func (t *Tree) Count() int {
   // Only test the Left node (this binary tree is expected to be complete).
   if t.Left == nil {
      return 1
   }
   return 1 + t.Right.Count() + t.Left.Count()
}

// Create a complete binary tree of `depth` and return it as a pointer.
func NewTree(depth int) *Tree {
   if depth > 0 {
      return &Tree{Left: NewTree(depth - 1), Right: NewTree(depth - 1)}
   } else {
      return &Tree{}
   }
}

func Run(maxDepth int) {

   var wg sync.WaitGroup

   // Set minDepth to 4 and maxDepth to the maximum of maxDepth and minDepth +2.
   const minDepth = 4
   if maxDepth < minDepth+2 {
      maxDepth = minDepth + 2
   }

   // Create an indexed string buffer for outputing the result in order.
   outCurr := 0
   outSize := 3 + (maxDepth-minDepth)/2
   outBuff := make([]string, outSize)

   // Create binary tree of depth maxDepth+1, compute its Count and set the
   // first position of the outputBuffer with its statistics.
   wg.Add(1)
   go func() {
      tree := NewTree(maxDepth + 1)
      msg := fmt.Sprintf("stretch tree of depth %d\t check: %d",
         maxDepth+1,
         tree.Count())

      outBuff[0] = msg
      wg.Done()
   }()

   // Create a long-lived binary tree of depth maxDepth. Its statistics will be
   // handled later.
   var longLivedTree *Tree
   wg.Add(1)
   go func() {
      longLivedTree = NewTree(maxDepth)
      wg.Done()
   }()

   // Create a lot of binary trees, of depths ranging from minDepth to maxDepth,
   // compute and tally up all their Count and record the statistics.
   for depth := minDepth; depth <= maxDepth; depth += 2 {
      iterations := 1 << (maxDepth - depth + minDepth)
      outCurr++

      wg.Add(1)
      go func(depth, iterations, index int) {
         acc := 0
         for i := 0; i < iterations; i++ {
            // Create a binary tree of depth and accumulate total counter with its
            // node count.
            a := NewTree(depth)
            acc += a.Count()
         }
         msg := fmt.Sprintf("%d\t trees of depth %d\t check: %d",
            iterations,
            depth,
            acc)

         outBuff[index] = msg
         wg.Done()
      }(depth, iterations, outCurr)
   }

   wg.Wait()

   // Compute the checksum of the long-lived binary tree that we created
   // earlier and store its statistics.
   msg := fmt.Sprintf("long lived tree of depth %d\t check: %d",
      maxDepth,
      longLivedTree.Count())
   outBuff[outSize-1] = msg

   // Print the statistics for all of the various tree depths.
   for _, m := range outBuff {
      fmt.Println(m)
   }
}

func main() {
   n := 0
   flag.Parse()
   if flag.NArg() > 0 {
      n, _ = strconv.Atoi(flag.Arg(0))
   }

   Run(n)
}

If I compile and run this program, I can get a baseline performance metric.

time ./binary-trees-ptr 21
Enter fullscreen mode Exit fullscreen mode
stretch tree of depth 22         check: 8388607
2097152  trees of depth 4        check: 65011712
524288   trees of depth 6        check: 66584576
131072   trees of depth 8        check: 66977792
32768    trees of depth 10       check: 67076096
8192     trees of depth 12       check: 67100672
2048     trees of depth 14       check: 67106816
512      trees of depth 16       check: 67108352
128      trees of depth 18       check: 67108736
32       trees of depth 20       check: 67108832
long lived tree of depth 21      check: 4194303

real    0m7.232s
user    1m9.929s
sys     0m0.959s
Enter fullscreen mode Exit fullscreen mode

The actual program output is not so important, but the execution time is.

This result was obtained with a general purpose Macbook Pro (Intel) and in consecutive runs the numbers were slightly different. However, this post does not aim for scientific accuracy, the goal is to have rough understanding what to expect from arenas. Performance impact may vary depending on the code, so please run proper benchmarks on your actual cases.

Most allocations in the app are happening in

func NewTree(depth int) *Tree {
   if depth > 0 {
      return &Tree{Left: NewTree(depth - 1), Right: NewTree(depth - 1)}
   } else {
      return &Tree{}
   }
}
Enter fullscreen mode Exit fullscreen mode

Enter arena

This new package is available in std lib with "arena" import.
The flow is

  • create arena instance with arena.NewArena(),
  • use it to allocate instances of any type with arena.New[T any](a *Arena) *T,
  • use it to allocate slices with arena.MakeSlice[T any](a *Arena, len, cap int) []T,
  • free arena instance once the data is no longer needed with (*Arena).Free().

Because this package is experimental, you'd need to explicitly enable it with GOEXPERIMENT=arenas env var and/or with equivalent build tag //go:build goexperiment.arenas during build.

Upgrade the app

Let's isolate tree constructor to allocate in an arena.

func NewTree(a *arena.Arena, depth int) *Tree {
    if depth > 0 {
        t := arena.New[Tree](a)
        t.Left = NewTree(a, depth-1)
        t.Right = NewTree(a, depth-1)
        return t
    } else {
        return arena.New[Tree](a)
    }
}
Enter fullscreen mode Exit fullscreen mode

Now when we build a tree, we have control of where to put its data and when to free it.

.....
    // Create binary tree of depth maxDepth+1, compute its Count and set the
    // first position of the outputBuffer with its statistics.
    wg.Add(1)
    go func() {
        a := arena.NewArena()
        tree := NewTree(a, maxDepth+1)
        msg := fmt.Sprintf("stretch tree of depth %d\t check: %d",
            maxDepth+1,
            tree.Count())

        outBuff[0] = msg
        a.Free()
        wg.Done()
    }()
.....
Enter fullscreen mode Exit fullscreen mode

Upgraded app:

binary-trees-ptr-arena.go
//go:build goexperiment.arenas

package main

import (
    "arena"
    "flag"
    "fmt"
    "strconv"
    "sync"
)

type Tree struct {
    Left  *Tree
    Right *Tree
}

// Count the nodes in the given complete binary tree.
func (t *Tree) Count() int {
    // Only test the Left node (this binary tree is expected to be complete).
    if t.Left == nil {
        return 1
    }
    return 1 + t.Right.Count() + t.Left.Count()
}

// Create a complete binary tree of `depth` and return it as a pointer.
func NewTree(a *arena.Arena, depth int) *Tree {
    if depth > 0 {
        t := arena.New[Tree](a)
        t.Left = NewTree(a, depth-1)
        t.Right = NewTree(a, depth-1)
        return t
    } else {
        return arena.New[Tree](a)
    }
}

func Run(maxDepth int) {

    var wg sync.WaitGroup

    // Set minDepth to 4 and maxDepth to the maximum of maxDepth and minDepth +2.
    const minDepth = 4
    if maxDepth < minDepth+2 {
        maxDepth = minDepth + 2
    }

    // Create an indexed string buffer for outputing the result in order.
    outCurr := 0
    outSize := 3 + (maxDepth-minDepth)/2
    outBuff := make([]string, outSize)

    // Create binary tree of depth maxDepth+1, compute its Count and set the
    // first position of the outputBuffer with its statistics.
    wg.Add(1)
    go func() {
        a := arena.NewArena()
        tree := NewTree(a, maxDepth+1)
        msg := fmt.Sprintf("stretch tree of depth %d\t check: %d",
            maxDepth+1,
            tree.Count())

        outBuff[0] = msg
        a.Free()
        wg.Done()
    }()

    // Create a long-lived binary tree of depth maxDepth. Its statistics will be
    // handled later.
    var longLivedTree *Tree
    la := arena.NewArena()
    wg.Add(1)
    go func() {
        longLivedTree = NewTree(la, maxDepth)
        wg.Done()
    }()

    // Create a lot of binary trees, of depths ranging from minDepth to maxDepth,
    // compute and tally up all their Count and record the statistics.
    for depth := minDepth; depth <= maxDepth; depth += 2 {
        iterations := 1 << (maxDepth - depth + minDepth)
        outCurr++

        wg.Add(1)
        go func(depth, iterations, index int) {
            acc := 0
            for i := 0; i < iterations; i++ {
                // Create a binary tree of depth and accumulate total counter with its
                // node count.
                ar := arena.NewArena()
                a := NewTree(ar, depth)
                acc += a.Count()
                ar.Free()
            }
            msg := fmt.Sprintf("%d\t trees of depth %d\t check: %d",
                iterations,
                depth,
                acc)

            outBuff[index] = msg
            wg.Done()
        }(depth, iterations, outCurr)
    }

    wg.Wait()

    // Compute the checksum of the long-lived binary tree that we created
    // earlier and store its statistics.
    msg := fmt.Sprintf("long lived tree of depth %d\t check: %d",
        maxDepth,
        longLivedTree.Count())
    outBuff[outSize-1] = msg
    la.Free()

    // Print the statistics for all of the various tree depths.
    for _, m := range outBuff {
        fmt.Println(m)
    }
}

func main() {
    n := 0
    flag.Parse()
    if flag.NArg() > 0 {
        n, _ = strconv.Atoi(flag.Arg(0))
    }

    Run(n)
}

Now is there any performance difference?

go build -tags goexperiment.arenas ./binary-trees-ptr-arena.go
time ./binary-trees-ptr-arena 21
Enter fullscreen mode Exit fullscreen mode
stretch tree of depth 22         check: 8388607
2097152  trees of depth 4        check: 65011712
524288   trees of depth 6        check: 66584576
131072   trees of depth 8        check: 66977792
32768    trees of depth 10       check: 67076096
8192     trees of depth 12       check: 67100672
2048     trees of depth 14       check: 67106816
512      trees of depth 16       check: 67108352
128      trees of depth 18       check: 67108736
32       trees of depth 20       check: 67108832
long lived tree of depth 21      check: 4194303

real    0m5.777s
user    0m38.207s
sys     0m4.537s
Enter fullscreen mode Exit fullscreen mode

Time difference is around 20% which is very impressive given such a small change to original program.

Keep in mind that a single arena can not be used concurrently (at least as of current implementation), but you can create multiple arenas for multiple goroutines.

Top comments (2)

Collapse
 
kevinsproule profile image
kevinsproule • Edited

On my Windows 10 PC (go 1.20.5) the timings are as follows:
Original: 9.473 seconds
Your version: 5.862 seconds
My version: 1.415 seconds

The new Go Arena is fairly heavy weight in its allocation of memory. I rolled my own arena based on a slice of tree nodes. The code is just two generic functions of 2 and 7 lines of code. The newArena function (7 lines) creates a new slice of 512 bytes worth of space for nodes. The newNode function (2 lines) appends a new node to the slice. When the slice goes out of scope it is released for GC.

var node Tree

// Appends a tree node to the end of arena and returns its address
func newNode[T any](t *[]T, s T) *T {
    *t = append(*t, s)
    return &(*t)[len(*t)-1]
}

// Creates a new arena storage of ~512 bytes or 4x the size of the type
func newArena[T any](T) []T {
    var sample T
    size := 512 / unsafe.Sizeof(sample)
    if size < 1 {
        size = 4
    }
    a := make([]T, 0, size)
    return a
}

// Create a new tree of a given depth
func newTree(a []Tree, depth int) *Tree {
    tree := newNode(&a, Tree{})
    if depth > 0 {
        tree.Right = newTree(a, depth-1)
        tree.Left = newTree(a, depth-1)
    }
    return tree
}

// Sample calling:

a := newArena(node)
tree := newTree(a, maxDepth+1)



Enter fullscreen mode Exit fullscreen mode
Collapse
 
vearutop profile image
Viacheslav Poturaev

Indeed, with custom array-based "arena" it is possible to get decent performance. However, I suppose, that would go against the original problem statement:

Please don't implement your own custom "arena" or "memory pool" or "free list" - they will not be accepted.

benchmarksgame-team.pages.debian.n...

The idea of the original benchmark is to measure performance of standard memory management, and hopefully, once arena passes experimental phase, it will be considered a standard memory management tool.

As of now, the fate of arena is unclear.

Note, 2023-01-17. This proposal is on hold indefinitely due to serious API concerns. The GOEXPERIMENT=arena code may be changed incompatibly or removed at any time, and we do not recommend its use in production.