Recently released go1.20
brought a new experimental package named arena
(proposal).
The purpose of this package is to isolate multiple related allocations into a single area of memory, so that they can be freed all at once.
Expected effect is improved allocation performance and reduced garbage collector (GC) pressure.
Let's see it in action. As a toy problem, we can use binary-trees
.
A program that idiomatically solves this problem has to put a lot of pressure on a memory management system to allocate many small objects, so it is a great candidate to assess new arenas
.
Original source code:
binary-trees-ptr.go
package main
import (
"flag"
"fmt"
"strconv"
"sync"
)
type Tree struct {
Left *Tree
Right *Tree
}
// Count the nodes in the given complete binary tree.
func (t *Tree) Count() int {
// Only test the Left node (this binary tree is expected to be complete).
if t.Left == nil {
return 1
}
return 1 + t.Right.Count() + t.Left.Count()
}
// Create a complete binary tree of `depth` and return it as a pointer.
func NewTree(depth int) *Tree {
if depth > 0 {
return &Tree{Left: NewTree(depth - 1), Right: NewTree(depth - 1)}
} else {
return &Tree{}
}
}
func Run(maxDepth int) {
var wg sync.WaitGroup
// Set minDepth to 4 and maxDepth to the maximum of maxDepth and minDepth +2.
const minDepth = 4
if maxDepth < minDepth+2 {
maxDepth = minDepth + 2
}
// Create an indexed string buffer for outputing the result in order.
outCurr := 0
outSize := 3 + (maxDepth-minDepth)/2
outBuff := make([]string, outSize)
// Create binary tree of depth maxDepth+1, compute its Count and set the
// first position of the outputBuffer with its statistics.
wg.Add(1)
go func() {
tree := NewTree(maxDepth + 1)
msg := fmt.Sprintf("stretch tree of depth %d\t check: %d",
maxDepth+1,
tree.Count())
outBuff[0] = msg
wg.Done()
}()
// Create a long-lived binary tree of depth maxDepth. Its statistics will be
// handled later.
var longLivedTree *Tree
wg.Add(1)
go func() {
longLivedTree = NewTree(maxDepth)
wg.Done()
}()
// Create a lot of binary trees, of depths ranging from minDepth to maxDepth,
// compute and tally up all their Count and record the statistics.
for depth := minDepth; depth <= maxDepth; depth += 2 {
iterations := 1 << (maxDepth - depth + minDepth)
outCurr++
wg.Add(1)
go func(depth, iterations, index int) {
acc := 0
for i := 0; i < iterations; i++ {
// Create a binary tree of depth and accumulate total counter with its
// node count.
a := NewTree(depth)
acc += a.Count()
}
msg := fmt.Sprintf("%d\t trees of depth %d\t check: %d",
iterations,
depth,
acc)
outBuff[index] = msg
wg.Done()
}(depth, iterations, outCurr)
}
wg.Wait()
// Compute the checksum of the long-lived binary tree that we created
// earlier and store its statistics.
msg := fmt.Sprintf("long lived tree of depth %d\t check: %d",
maxDepth,
longLivedTree.Count())
outBuff[outSize-1] = msg
// Print the statistics for all of the various tree depths.
for _, m := range outBuff {
fmt.Println(m)
}
}
func main() {
n := 0
flag.Parse()
if flag.NArg() > 0 {
n, _ = strconv.Atoi(flag.Arg(0))
}
Run(n)
}
If I compile and run this program, I can get a baseline performance metric.
time ./binary-trees-ptr 21
stretch tree of depth 22 check: 8388607
2097152 trees of depth 4 check: 65011712
524288 trees of depth 6 check: 66584576
131072 trees of depth 8 check: 66977792
32768 trees of depth 10 check: 67076096
8192 trees of depth 12 check: 67100672
2048 trees of depth 14 check: 67106816
512 trees of depth 16 check: 67108352
128 trees of depth 18 check: 67108736
32 trees of depth 20 check: 67108832
long lived tree of depth 21 check: 4194303
real 0m7.232s
user 1m9.929s
sys 0m0.959s
The actual program output is not so important, but the execution time is.
This result was obtained with a general purpose Macbook Pro (Intel) and in consecutive runs the numbers were slightly different. However, this post does not aim for scientific accuracy, the goal is to have rough understanding what to expect from arenas. Performance impact may vary depending on the code, so please run proper benchmarks on your actual cases.
Most allocations in the app are happening in
func NewTree(depth int) *Tree {
if depth > 0 {
return &Tree{Left: NewTree(depth - 1), Right: NewTree(depth - 1)}
} else {
return &Tree{}
}
}
Enter arena
This new package is available in std lib with "arena"
import.
The flow is
- create arena instance with
arena.NewArena()
, - use it to allocate instances of any type with
arena.New[T any](a *Arena) *T
, - use it to allocate slices with
arena.MakeSlice[T any](a *Arena, len, cap int) []T
, - free arena instance once the data is no longer needed with
(*Arena).Free()
.
Because this package is experimental, you'd need to explicitly enable it with GOEXPERIMENT=arenas
env var and/or with equivalent build tag //go:build goexperiment.arenas
during build.
Upgrade the app
Let's isolate tree constructor to allocate in an arena.
func NewTree(a *arena.Arena, depth int) *Tree {
if depth > 0 {
t := arena.New[Tree](a)
t.Left = NewTree(a, depth-1)
t.Right = NewTree(a, depth-1)
return t
} else {
return arena.New[Tree](a)
}
}
Now when we build a tree, we have control of where to put its data and when to free it.
.....
// Create binary tree of depth maxDepth+1, compute its Count and set the
// first position of the outputBuffer with its statistics.
wg.Add(1)
go func() {
a := arena.NewArena()
tree := NewTree(a, maxDepth+1)
msg := fmt.Sprintf("stretch tree of depth %d\t check: %d",
maxDepth+1,
tree.Count())
outBuff[0] = msg
a.Free()
wg.Done()
}()
.....
Upgraded app:
binary-trees-ptr-arena.go
//go:build goexperiment.arenas
package main
import (
"arena"
"flag"
"fmt"
"strconv"
"sync"
)
type Tree struct {
Left *Tree
Right *Tree
}
// Count the nodes in the given complete binary tree.
func (t *Tree) Count() int {
// Only test the Left node (this binary tree is expected to be complete).
if t.Left == nil {
return 1
}
return 1 + t.Right.Count() + t.Left.Count()
}
// Create a complete binary tree of `depth` and return it as a pointer.
func NewTree(a *arena.Arena, depth int) *Tree {
if depth > 0 {
t := arena.New[Tree](a)
t.Left = NewTree(a, depth-1)
t.Right = NewTree(a, depth-1)
return t
} else {
return arena.New[Tree](a)
}
}
func Run(maxDepth int) {
var wg sync.WaitGroup
// Set minDepth to 4 and maxDepth to the maximum of maxDepth and minDepth +2.
const minDepth = 4
if maxDepth < minDepth+2 {
maxDepth = minDepth + 2
}
// Create an indexed string buffer for outputing the result in order.
outCurr := 0
outSize := 3 + (maxDepth-minDepth)/2
outBuff := make([]string, outSize)
// Create binary tree of depth maxDepth+1, compute its Count and set the
// first position of the outputBuffer with its statistics.
wg.Add(1)
go func() {
a := arena.NewArena()
tree := NewTree(a, maxDepth+1)
msg := fmt.Sprintf("stretch tree of depth %d\t check: %d",
maxDepth+1,
tree.Count())
outBuff[0] = msg
a.Free()
wg.Done()
}()
// Create a long-lived binary tree of depth maxDepth. Its statistics will be
// handled later.
var longLivedTree *Tree
la := arena.NewArena()
wg.Add(1)
go func() {
longLivedTree = NewTree(la, maxDepth)
wg.Done()
}()
// Create a lot of binary trees, of depths ranging from minDepth to maxDepth,
// compute and tally up all their Count and record the statistics.
for depth := minDepth; depth <= maxDepth; depth += 2 {
iterations := 1 << (maxDepth - depth + minDepth)
outCurr++
wg.Add(1)
go func(depth, iterations, index int) {
acc := 0
for i := 0; i < iterations; i++ {
// Create a binary tree of depth and accumulate total counter with its
// node count.
ar := arena.NewArena()
a := NewTree(ar, depth)
acc += a.Count()
ar.Free()
}
msg := fmt.Sprintf("%d\t trees of depth %d\t check: %d",
iterations,
depth,
acc)
outBuff[index] = msg
wg.Done()
}(depth, iterations, outCurr)
}
wg.Wait()
// Compute the checksum of the long-lived binary tree that we created
// earlier and store its statistics.
msg := fmt.Sprintf("long lived tree of depth %d\t check: %d",
maxDepth,
longLivedTree.Count())
outBuff[outSize-1] = msg
la.Free()
// Print the statistics for all of the various tree depths.
for _, m := range outBuff {
fmt.Println(m)
}
}
func main() {
n := 0
flag.Parse()
if flag.NArg() > 0 {
n, _ = strconv.Atoi(flag.Arg(0))
}
Run(n)
}
Now is there any performance difference?
go build -tags goexperiment.arenas ./binary-trees-ptr-arena.go
time ./binary-trees-ptr-arena 21
stretch tree of depth 22 check: 8388607
2097152 trees of depth 4 check: 65011712
524288 trees of depth 6 check: 66584576
131072 trees of depth 8 check: 66977792
32768 trees of depth 10 check: 67076096
8192 trees of depth 12 check: 67100672
2048 trees of depth 14 check: 67106816
512 trees of depth 16 check: 67108352
128 trees of depth 18 check: 67108736
32 trees of depth 20 check: 67108832
long lived tree of depth 21 check: 4194303
real 0m5.777s
user 0m38.207s
sys 0m4.537s
Time difference is around 20% which is very impressive given such a small change to original program.
Keep in mind that a single arena can not be used concurrently (at least as of current implementation), but you can create multiple arenas for multiple goroutines.
Top comments (2)
On my Windows 10 PC (go 1.20.5) the timings are as follows:
Original: 9.473 seconds
Your version: 5.862 seconds
My version: 1.415 seconds
The new Go Arena is fairly heavy weight in its allocation of memory. I rolled my own arena based on a slice of tree nodes. The code is just two generic functions of 2 and 7 lines of code. The newArena function (7 lines) creates a new slice of 512 bytes worth of space for nodes. The newNode function (2 lines) appends a new node to the slice. When the slice goes out of scope it is released for GC.
Indeed, with custom array-based "arena" it is possible to get decent performance. However, I suppose, that would go against the original problem statement:
benchmarksgame-team.pages.debian.n...
The idea of the original benchmark is to measure performance of standard memory management, and hopefully, once
arena
passes experimental phase, it will be considered a standard memory management tool.As of now, the fate of
arena
is unclear.