Viacheslav Poturaev

Estimating memory footprint of dynamic structures in Go

TL;DR You can analyze the size of particular long-living data, as seen in the Go heap, during app development with the help of Go benchmarks and/or runtime.ReadMemStats.

The problem

Understanding how much memory particular data items take on the heap is not always straightforward.
It is relatively easy for fixed-size types like int64 or struct { A bool }, but for maps, slices, strings, or pointers, static size calculation becomes impossible because the size depends on the contents.

Yet, comparing the memory efficiency of data structures may be necessary to assess a potential performance optimization or to plan capacity for a memory-hungry application.

Memory can be occupied by short-living and long-living values. Short-living temporary values are promptly removed by the garbage collector (GC) soon after they are no longer needed; they can usually be identified with a memory profiler.
Long-living values may be used many times and contribute to the stable memory usage of an application,
for example as a map of cached values.

Imagine we have an app that stores many such structures in memory.

// Untitled1 is a sample struct.
type Untitled1 struct {
    ID     int
    Weight uint8
    Height uint8
    Name   string
    IsFoo  bool
    Parent *Untitled1
    Tags   []string
}

And we want to estimate how much they take on the heap.

Measuring memory usage with Go benchmark tests

A Go benchmark aggregates the heap allocations that happen during its execution.
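
To see the mechanics in isolation before the full test, here is a minimal sketch (the avgAllocSize helper and the sink variable are made up for this example): a benchmark around a known 1KB allocation reports an average close to 1024 bytes.

```go
package main

import (
	"fmt"
	"testing"
)

// sink is a package-level variable that forces allocations to escape to the
// heap; without it the compiler could keep the slice on the stack.
var sink []byte

// avgAllocSize reports the average number of heap bytes allocated per call to alloc.
func avgAllocSize(n int, alloc func()) uint64 {
	res := testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()

		// A constant number of iterations instead of b.N, as in the article.
		for i := 0; i < n; i++ {
			alloc()
		}
	})

	return res.MemBytes / uint64(n)
}

func main() {
	// Allocating a 1KB slice per iteration; the reported average should be
	// close to 1024 bytes.
	fmt.Println("avg bytes per item:", avgAllocSize(1000, func() {
		sink = make([]byte, 1024)
	}))
}
```

Note that testing.Benchmark can be called from ordinary code, which is what the tests below rely on.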

func TestUntitled1_size(t *testing.T) {
    n := 10000
    escape := func(v interface{}) {}
    res := testing.Benchmark(func(b *testing.B) {
        b.ReportAllocs()
        // Using a constant number of iterations instead of b.N for more stable results and quicker execution.
        for i := 0; i < n; i++ {
            // Instantiating structure with relevant content.
            id := i ^ 12345
            v := Untitled1{
                ID:     id,
                Name:   "John Doe" + strconv.Itoa(i), // Using different string values to avoid interning for more realistic load.
                Weight: 80,
                Height: 180,
                IsFoo:  false,
                Parent: &Untitled1{Name: "Constantine"},
                Tags:   []string{"tag" + strconv.Itoa(i%10)},
            }

            // Escaping value to the heap as function argument.
            escape(v)
        }
    })

    s := res.MemBytes / uint64(n)

    fmt.Println("avg size of item:", s)

    // Setting safety threshold to keep struct size in reasonable boundaries.
    // This has to be aligned with actual averaged data served in production,
    // otherwise it may result in a false positive pass.
    if s > 200 {
        t.Errorf("item size is greater than 200: %d", s)
    }
}
=== RUN   TestUntitled1_size
avg size of item: 197
--- PASS: TestUntitled1_size (0.03s)

The accuracy of this result depends on how close the sample content is to the content used in production.

Now let's try to reduce the memory footprint by moving the Name field above Weight to save some padding, and validate the result with the size test.

// Untitled2 is an optimized sample struct.
type Untitled2 struct {
    ID     int
    Name   string // Let's see if moving string field before uint8 helps.
    Weight uint8
    Height uint8
    IsFoo  bool
    Parent *Untitled2
    Tags   []string
}
=== RUN   TestUntitled2_size
avg size of item: 165
--- PASS: TestUntitled2_size (0.03s)

The test shows about a 16% reduction in memory usage; this could be a nice improvement in an app that holds millions of such items.

Actual item storage may impose additional overhead, for example if items are stored in a map.

func TestUntitled1_map_size(t *testing.T) {
    for _, numItems := range []int{1000, 10000, 100000, 1000000} {
        t.Run(strconv.Itoa(numItems), func(t *testing.T) {
            res := testing.Benchmark(func(b *testing.B) {
                b.ReportAllocs()
                m := make(map[int]Untitled1, numItems)

                for i := 0; i < numItems; i++ {
                    // Instantiating structure with relevant content.
                    id := i ^ 12345
                    v := Untitled1{
                        ID:     id,
                        Name:   "John Doe" + strconv.Itoa(i), // Using different string values to avoid interning for more realistic load.
                        Weight: 80,
                        Height: 180,
                        IsFoo:  false,
                        Parent: &Untitled1{Name: "Constantine"},
                        Tags:   []string{"tag" + strconv.Itoa(i%10)},
                    }

                    m[id] = v
                }

                // No need to force escaping of the value as map is already on the heap.
            })

            s := res.MemBytes / uint64(numItems)

            fmt.Println("avg size of item in a map of", numItems, "items:", s)

            // Setting safety threshold to keep struct size in reasonable boundaries.
            // This has to be aligned with actual averaged data served in production,
            // otherwise it may result in a false positive pass.
            if s > 320 {
                t.Errorf("item size is greater than 320: %d", s)
            }
        })
    }
}
=== RUN   TestUntitled1_map_size/1000
avg size of item in a map of 1000 items: 297
    --- PASS: TestUntitled1_map_size/1000 (0.01s)
=== RUN   TestUntitled1_map_size/10000
avg size of item in a map of 10000 items: 260
    --- PASS: TestUntitled1_map_size/10000 (0.05s)
=== RUN   TestUntitled1_map_size/100000
avg size of item in a map of 100000 items: 252
    --- PASS: TestUntitled1_map_size/100000 (0.81s)
=== RUN   TestUntitled1_map_size/1000000
avg size of item in a map of 1000000 items: 310
    --- PASS: TestUntitled1_map_size/1000000 (17.96s)

Map overhead and the effective size of an item vary with the number of items in the map.

The field padding optimization is also visible in the map results.

=== RUN   TestUntitled2_map_size/1000
avg size of item in a map of 1000 items: 264
    --- PASS: TestUntitled2_map_size/1000 (0.01s)
=== RUN   TestUntitled2_map_size/10000
avg size of item in a map of 10000 items: 230
    --- PASS: TestUntitled2_map_size/10000 (0.10s)
=== RUN   TestUntitled2_map_size/100000
avg size of item in a map of 100000 items: 224
    --- PASS: TestUntitled2_map_size/100000 (0.83s)
=== RUN   TestUntitled2_map_size/1000000
avg size of item in a map of 1000000 items: 276
    --- PASS: TestUntitled2_map_size/1000000 (15.73s)

Measuring memory usage with runtime.ReadMemStats

The memory consumption of each item can be estimated from the difference in heap in use before and after item creation. This
approach is only feasible if the heap is not being used concurrently by other memory-intensive operations. It will likely
fail in an application that serves production traffic, but should work fine when executed as a standalone app or test.

Heap in use can be read with runtime.ReadMemStats.

func stableHeapInUse() int64 {
    var (
        m         = runtime.MemStats{}
        prevInUse uint64
        prevNumGC uint32
    )

    for {
        runtime.ReadMemStats(&m)

        // Considering the heap stable if the most recent GC cycle changed it by less than 10KB.
        // Signed difference avoids unsigned wraparound when the heap shrinks.
        if prevNumGC != 0 && m.NumGC > prevNumGC && math.Abs(float64(int64(m.HeapInuse)-int64(prevInUse))) < 10*1024 {
            break
        }

        prevInUse = m.HeapInuse
        prevNumGC = m.NumGC

        // Sleeping to allow GC to run a few times and collect all temporary data.
        time.Sleep(50 * time.Millisecond)

        runtime.GC()
    }

    return int64(m.HeapInuse)
}

GC is dynamic and may introduce some measurement noise by cleaning up data that is unrelated to the items in question.
Because of that, populating a batch of items may provide a more reliable average result.

Let's write a function to calculate the average size of an item in a batch (a slice, a map, or any other collection of items).

func itemSize(n int, populate func() interface{}) float64 {
    // Collect heap in use before creating a list of values.
    before := stableHeapInUse()

    // Create a list of items to occupy memory.
    items := populate()

    // Collect heap in use after creating a list of values.
    after := stableHeapInUse()

    // Keep items reachable until after the second measurement, so GC cannot collect them prematurely.
    runtime.KeepAlive(items)

    return float64(after-before) / float64(n)
}

For a slice of such items, the size of one item is about 190 bytes, because a slice holds items contiguously in memory
and adds almost no overhead (apart from a few bytes for the slice header).

fmt.Println("avg item size for []Untitled1", itemSize(n, func() interface{} {
    items := make([]Untitled1, 0, n)

    for i := 0; i < n; i++ {
        id := i ^ 12345
        items = append(items, Untitled1{
            ID:     id,
            Name:   "John Doe" + strconv.Itoa(i), // Using different string values to avoid interning for more realistic load.
            Weight: 80,
            Height: 180,
            IsFoo:  false,
            Parent: &Untitled1{Name: "Constantine"},
            Tags:   []string{"tag" + strconv.Itoa(i%10)},
        })
    }

    return items
}))
avg item size for []Untitled1 188.416

This aligns with the results obtained with the Go benchmark approach for individual items.
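
For comparison, the same approach can be applied to a map of items. The sketch below assembles the helpers from this article into a standalone program; it substitutes runtime.KeepAlive for the fmt.Sprintf trick, uses a signed difference in the stability check, and introduces a populateMap helper purely for convenience.

```go
package main

import (
	"fmt"
	"math"
	"runtime"
	"strconv"
	"time"
)

type Untitled1 struct {
	ID     int
	Weight uint8
	Height uint8
	Name   string
	IsFoo  bool
	Parent *Untitled1
	Tags   []string
}

// stableHeapInUse blocks until consecutive GC cycles change HeapInuse
// by less than 10KB, then returns the stable value.
func stableHeapInUse() int64 {
	var (
		m         runtime.MemStats
		prevInUse uint64
		prevNumGC uint32
	)

	for {
		runtime.ReadMemStats(&m)

		// Signed difference avoids unsigned wraparound when the heap shrinks.
		diff := int64(m.HeapInuse) - int64(prevInUse)
		if prevNumGC != 0 && m.NumGC > prevNumGC && math.Abs(float64(diff)) < 10*1024 {
			break
		}

		prevInUse = m.HeapInuse
		prevNumGC = m.NumGC

		time.Sleep(50 * time.Millisecond)
		runtime.GC()
	}

	return int64(m.HeapInuse)
}

// itemSize reports the average heap bytes per item for a populated collection.
func itemSize(n int, populate func() interface{}) float64 {
	before := stableHeapInUse()
	items := populate()
	after := stableHeapInUse()

	// Keep items reachable until after the second measurement.
	runtime.KeepAlive(items)

	return float64(after-before) / float64(n)
}

// populateMap builds a map of n sample items, mirroring the article's content.
func populateMap(n int) func() interface{} {
	return func() interface{} {
		m := make(map[int]Untitled1, n)

		for i := 0; i < n; i++ {
			id := i ^ 12345
			m[id] = Untitled1{
				ID:     id,
				Name:   "John Doe" + strconv.Itoa(i),
				Weight: 80,
				Height: 180,
				Parent: &Untitled1{Name: "Constantine"},
				Tags:   []string{"tag" + strconv.Itoa(i%10)},
			}
		}

		return m
	}
}

func main() {
	n := 10000
	fmt.Println("avg item size for map[int]Untitled1:", itemSize(n, populateMap(n)))
}
```

The per-item figure should come out noticeably above the slice result, reflecting the map bucket overhead observed in the benchmark tests.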
