Trading memory for fewer allocations.

Taras Tsugrii
Software Engineer, Coach, Mentor, Host and Organizer of Performance Summit and Scaling Continuous Delivery

Memory allocations are not always easy to spot, yet they are fairly expensive: each one carries allocator overhead and adds to garbage collector load. Take a seemingly innocent function, print_int:

func print_int(x int) {
    fmt.Println(x)
}

for which the compiler's escape analysis (enabled with -gcflags="-m") reports

x escapes to heap

so the runtime.convT64 function is called to box x for the interface conversion, returning a pointer to a uint64 that holds its value:

00007 (6) CALL runtime.convT64(SB)

which is implemented in runtime/iface.go:

func convT64(val uint64) (x unsafe.Pointer) {
    if val < uint64(len(staticuint64s)) {
        // Small value: point into the preallocated table, no allocation.
        x = unsafe.Pointer(&staticuint64s[val])
    } else {
        // Anything else: allocate 8 bytes on the heap and store the value there.
        x = mallocgc(8, uint64Type, false)
        *(*uint64)(x) = val
    }
    return
}

The most interesting bit is the x = unsafe.Pointer(&staticuint64s[val]) branch, which skips the allocation and returns a pointer into staticuint64s, a preallocated array holding the integers 0 through 255:

// staticuint64s is used to avoid allocating in convTx for small integer values.
var staticuint64s = [...]uint64{
    0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
    0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
    // ... 0x10 through 0xf7 ...
    0xf8, 0xf9, 0xfa, 0xfb, 0xfc, 0xfd, 0xfe, 0xff,
}

It's a fairly cheap way to trade a little memory (256 eight-byte entries, or 2 KiB) for fewer allocations, and the same idea is used in other managed languages such as Java, where Integer.valueOf caches small values. To get even more out of the table, the runtime reuses it in the convT16 and convT32 functions as well.
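The same trick carries over to application code. As a minimal sketch, assuming a program that frequently formats small non-negative integers, the hypothetical smallDecimals table and itoaCached helper below serve those values from a table built once at start-up:

```go
package main

import (
	"fmt"
	"strconv"
)

// smallDecimals is a hypothetical static cache: the decimal strings for
// 0..255, built once at start-up and never modified afterwards.
var smallDecimals [256]string

func init() {
	for i := range smallDecimals {
		smallDecimals[i] = strconv.Itoa(i)
	}
}

// itoaCached returns the decimal form of v. Values covered by the table are
// served without allocating; everything else falls back to strconv.Itoa.
func itoaCached(v int) string {
	if v >= 0 && v < len(smallDecimals) {
		return smallDecimals[v]
	}
	return strconv.Itoa(v)
}

func main() {
	fmt.Println(itoaCached(200), itoaCached(1000), itoaCached(-7))
}
```

Because the table is read-only after init, concurrent readers need no locking.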

The technique is useful enough that it also appears in a dynamic form, known as object pools. Pools are not limited to a small, fixed range of values, but that flexibility comes at the cost of synchronization, so it's important to measure this overhead when evaluating them.
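In Go, the standard library's sync.Pool is one such dynamic cache. The following is a minimal sketch rather than a recommendation: bufPool and greet are hypothetical names, and the pooled type (*bytes.Buffer) is chosen only for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out reusable *bytes.Buffer values. sync.Pool is safe for
// concurrent use, which is where the synchronization cost comes from.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// greet is a hypothetical function that needs a scratch buffer on every call.
func greet(name string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset() // a pooled buffer may still hold data from a previous user
	defer bufPool.Put(buf)

	buf.WriteString("hello, ")
	buf.WriteString(name)
	return buf.String() // String copies the bytes, so reusing buf later is safe
}

func main() {
	fmt.Println(greet("gopher"))
}
```

Whether the pool pays off depends on the workload. A benchmark run with go test -bench and -benchmem, comparing the pooled version against one that allocates a fresh buffer per call, is a reasonable way to check.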

In summary, consider using static preallocated caches to reduce allocation count and improve performance.
