Andy Haskell

Posted on Jul 20, 2019 • Edited on Jul 24, 2019

A closer look at Go's append function

#go #tutorial #beginners #codenewbie

Appending items onto arrays and array-like objects is common in many programming languages, with functions like JavaScript's Array.push and Python's List.append to add items. In Go, the function for adding items to its slice type is called append, and it looks pretty similar to what you'd expect from another language.



var numbers []int
numbers = append(numbers, 250)
numbers = append(numbers, 2071)
numbers = append(numbers, 1123)
// logs [250, 2071, 1123]
fmt.Println(numbers)

In this code snippet, we add the numbers 250, 2071, and 1123 to a slice of int, and then print out its content.

Most of the time, it's easy think of append as just adding items onto a slice. However, knowing what goes on behind the scenes when you append to a slice helps you both write Go code that runs faster, and avoid an easy to miss pitfall of working with slices and the append function. So in this tutorial, we're going to take a closer look at what your computer is doing when you run append.

Watching a slice grow

When you have a slice, there are five things it can tell you:

🐛 Its length; len(slice)
📦 Its capacity; cap(slice)
🎊 What's inside it; fmt.Sprintf("%v", slice)
🗺️ Where, in your computer's memory, it is; fmt.Sprintf("%p", slice)
❓ What type its items are; fmt.Sprintf("%T", slice)

So let's take a look at what happens in the first four of those if we append to a slice five times:



var numbers []int
for i := 0; i < 5; i++  {
    numbers = append(numbers, i)
    fmt.Printf("address: %p, length: %d, capacity: %d, items: %v\n",
        numbers, len(numbers), cap(numbers), numbers)
}

Here's my output when I ran this. Any time you run it, the memory addresses should be different, but everything else should be the same:



address: 0xc00009a000, length: 1, capacity: 1, items: [0]
address: 0xc00009a030, length: 2, capacity: 2, items: [0 1]
address: 0xc0000a6000, length: 3, capacity: 4, items: [0 1 2]
address: 0xc0000a6000, length: 4, capacity: 4, items: [0 1 2 3]
address: 0xc000092080, length: 5, capacity: 8, items: [0 1 2 3 4]

Notice that:

Every time we add an item, both len(numbers) and the items in the numbers slice always change, just as we'd expect.
Each time len(numbers) and cap(numbers) are the same, the slice is at maximum capacity. If we add another item to a slice at maximum capacity, then the slice's capacity grows.



address: 0xc0000a6000, length: 3, capacity: 4, items: [0 1 2]
address: 0xc0000a6000, length: 4, capacity: 4, items: [0 1 2 3]
address: 0xc000092080, length: 5, capacity: 8, items: [0 1 2 3 4]

Not only that, but when we add to a slice that's at maximum capacity, its address also changes!

On that last point, that means that when that fifth number is added to the slice, your computer is allocating a new piece of memory for our slice. The numbers variable now points to whatever section of memory just got allocated for the slice. To understand why we need that allocation, let's take a look at the data structure behind slices: Go arrays!

Arrays in Go

If you're working with a contiguous, ordered collection of items in Go, chances are you're using a slice. However, that doesn't mean there's no such thing as Go arrays. Go uses arrays, but the reason we don't often use them directly is that arrays are mainly used for storing the underlying data for slices.

One big difference between a Go array and a Go slice is that you have to say the size of an array up front when you declare it. For example, you would declare an array of two strings with the type [2]string instead of []string:



// A slice of strings, with two items in it
vehiclesSlice := []string{"car", "Pokémon that knows SURF"}
// An array with two items in it, with its size set to two
vehiclesArray := [2]string{"car", "Pokémon that knows SURF"}

These look similar, and you can access items in an array or a slice the same way. For example, to see all your vehicles, this loop would work for both vehiclesSlice and vehiclesArray:



for i, v := range vehiclesArray {
    if i == 0 {
        fmt.Printf("I get around by riding a %s\n", v)
    } else {
        fmt.Printf("My other ride is a %s\n", v)
    }
}

However, besides syntax, the big difference between a slice and an array, and why you have to declare an array's size up front, is that while a slice can change sizes, arrays are fixed-size. And because a fixed-size array is where the items in a slice live, when you append to a slice with a line of code like vehiclesSlice = append(vehiclesSlice, "Catbus"), what actually happens is:

1) The slice increments its length, so len(vehicles) is now 3
2a) If there is still space left in the underlying array, our slice isn't at maximum capacity, so the item is just added to a new slot in the array:

2b) But if the underlying array was already full, because arrays are fixed-size, that means we need to allocate a new, bigger array in memory and then copy everything over to it.

Either way, from the perspective of a slice's items, the result will be the same; the slice will have an underlying array containing this well-drawn Catbus:

A sloppy drawing of Catbus from My Neighbor Totoro, but it looks more like a turtle with ears — Boston Museum of Fine Arts, here I come!

Slices, arrays, and performance

You won't notice it in examples with just a few items, but the reason these re-allocations from appending are a big deal is that if you have millions of items in your slice, you can end up spending a lot of time on allocating memory and re-copying it. And memory allocations are a slow operation from the computer's perspective.

To see this in action, try running this Go testing benchmark that fills a slice with a billion 8-bit integers (if you have more than four gigabytes of RAM).



package main

import (
    "testing"
)

var aBillion=1000*1000*1000

// Add items to a slice with a billion-item array allocated up-front
func BenchmarkBillionPreallocatedSlice(b *testing.B) {
    b.ReportAllocs()

    for i := 0; i < b.N; i++ {
        items := make([]uint8, 0, aBillion)
        for j := 0; j < aBillion; j++ {
            items = append(items, 255)
        }
    }
}

// Add items to a slice that starts out nil; the underlying array gets
// re-allocated and re-copied several times
func BenchmarkBillionEmptySlice(b *testing.B) {
    b.ReportAllocs()

    for i := 0; i < b.N; i++ {
        var items []uint8
        for j := 0; j < aBillion; j++ {
            items = append(items, 255)
        }
    }
}

On my Mac, adding a billion items to a slice where the memory isn't pre-allocated typically took around .5-.6 seconds, while adding them to a slice that started out empty took around 2.5-2.8 seconds, with six times as many bytes being processed!



goos: darwin
goarch: amd64
pkg: benchmark
BenchmarkBillionPreallocatedSlice-8            2     583063935 ns/op    1000005728 B/op        2 allocs/op
BenchmarkBillionEmptySlice-8                   1    2799137688 ns/op    6168961152 B/op       73 allocs/op

This shows how much the extra allocations and repeated data copying can slow down your code. Because of this, if you are looking to improve the performance of code that works heavily with a slice, you should look into preventing unnecessary re-allocations.

For example, here is a code snippet where we fill a slice up and then process its data, but we use the slicing operator to allow us to reuse its underlying array when we fill it up again.



s := make([]int, 0, 1000)

// Fill the slice up and then send the items in the slice on their merry way
// down our data pipeline.
for i := 0; i < 1000; i++ {
    s = append(s, i)
}
fmt.Printf("address: %p, length: %d, capacity: %d\n", s, len(s), cap(s))
processSliceItems(s)

// Set the slice's length to 0 so we can start re-filling it. We're still using
// the same underlying array, so we can append to it and fill it up as many
// times as we want without any new allocations slowing down our code's
// performance.
s = s[:0]
fmt.Printf("address: %p, length: %d, capacity: %d\n", s, len(s), cap(s))

for i := 0; i < 1000; i++ {
    s = append(s, i)
}

When we say s = s[:0], that means that we are assigning s a slice, with the same underlying array, but with the slice's length reset. Because of this, no new memory allocations have to happen, so we'll see the same address both times we print it.

Be safe with appending to slices

In addition to performance, there is one other implication a slice's underlying array has on how we use append. It's that to use slice append safely, we need to be sure we know where in the array we are appending to. If we take this sloth family's grocery list:



groceryList := []string{
    "Cecropia leaves",     // Leaves sloths eat a lot of
    "Hibiscus flowers",    // 🌺 They're like sloth chocolate!
    "Green Sheen shampoo", // How sloths groom the algae ecosystem in their fur
}

foodstuffs := groceryList[:2]
personalCare := groceryList[2:]

fmt.Println("Food grocery items:", foodstuffs)
fmt.Println("Personal care items:", personalCare)

This looks correct so far; we print out the foodstuffs slice and get our first two items, and we print out the personalCare slice and get our third item.

At the memory level we have three slices in-play, and all of them are working with the same underlying array:

However, that can spell trouble if we try to then append to the foodstuffs slice. If we appended "avocados" to foodstuffs like this (ancient giant sloths helped disperse the seeds of what would evolve to be today's avocados):



foodstuffs := groceryList[0:2]
personalCare := groceryList[2:3]

foodstuffs = append(foodstuffs, "Avocados")

fmt.Println("Food grocery items:", foodstuffs)
fmt.Println("Personal care items:", personalCare)

Then even though the statements foodstuffs := groceryList[0:2] and foodstuffs = append(foodstuffs, "Avocados") look like they're making fresh arrays, we're still using the same sections of memory. This means when we append to the foodstuffs slice, what we're actually doing looks like this:

We put "Avocados" into the third slot of the underlying array, and in memory, that is also the location of first slot of the personalCare slice. So now the foodstuffs slice is []string{"Cecropia leaves", "Hibiscus flowers", "Avocados"}, but the personalCare slice is []string{"Avocados"}. We wiped "Green Sheen shampoo" out of the array!

Because of this, to append to slices safely in Go, you need to be sure you are appending to the slice containing every item we have added, in this case groceryList. This means that in general, the foodstuffs and personalCare slices should really only be read from, not appended to.

If you need a copy of the slice to append to, you can use Go's built-in copy function, like this:



// The foodstuffs slice now has a different underlying array from groceryList;
// we can now append to it without overwriting personalCare[0]!
foodstuffs := make([]string, 2)
copy(foodstuffs, groceryList[:2])
foodstuffs = append(foodstuffs, "Avocados")

fmt.Println("Food grocery items:", foodstuffs)
fmt.Println("Personal care items:", personalCare)

Now, with foodstuffs having a different underlying array, avocados can be added to the foodstuffs slice, and that won't overwrite Green Sheen shampoo on the grocery list! The sloths are ready to go to the store, and you're ready to write code with append that's both performant and safe. Until next time,