DEV Community

Momchil Atanasov

Understanding the Color type in Go

While working on my personal projects, I have had to deal with Go's image package a number of times. By extension, I have also had to deal with Go's color package.

In the past, I used to struggle with alpha-premultiplied colors, which are the default ones in Go, and their representation. In this post I am hoping to shed some light on the matter as well as to show some micro-optimization techniques when implementing your own Color type.

Quick Theory

Before diving into the details, we should first start with the basics. Colors are often represented by four components - R, G, B and A - each representing the amount of Red, Green, Blue and Alpha (transparency/translucency) that a color is made up of.

In practice, the A is sometimes omitted when one is dealing with fully opaque colors. Also, this is not the only format out there (e.g. there is CMYK) but I will stick to RGBA in this blog post.

The most common machine representation of RGBA is probably R8G8B8A8 (RGBA8 for short). This means that there are 8 bits dedicated to each component. In essence, each component can range from 0 to 255. The benefit is that this is capable of representing a sufficient number of unique colors while also using little memory. In fact, a single color can be nicely stored in a single uint32 variable.
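To make the uint32 remark concrete, here is a minimal sketch of packing an RGBA8 color into a single uint32 and reading it back. The helper names packRGBA8 and unpackRGBA8 are made up for this illustration, and the byte order (R in the highest byte) is an arbitrary choice rather than something Go prescribes.

```go
package main

import "fmt"

// packRGBA8 stores four 8-bit components in one uint32, with R in the
// highest byte and A in the lowest (an arbitrary order for this sketch).
func packRGBA8(r, g, b, a uint8) uint32 {
	return uint32(r)<<24 | uint32(g)<<16 | uint32(b)<<8 | uint32(a)
}

// unpackRGBA8 reverses packRGBA8.
func unpackRGBA8(p uint32) (r, g, b, a uint8) {
	return uint8(p >> 24), uint8(p >> 16), uint8(p >> 8), uint8(p)
}

func main() {
	p := packRGBA8(255, 128, 0, 64)
	fmt.Printf("0x%08X\n", p)   // prints 0xFF800040
	fmt.Println(unpackRGBA8(p)) // prints 255 128 0 64
}
```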

Other common options are R16G16B16A16 (RGBA16 for short), where each component gets 16 bits (values range from 0 to 65535), and R32FG32FB32FA32F (RGBA32F for short), where each component is a floating point number (values range from 0.0 to 1.0, though there are exceptions), allowing for easy transformations, without losing much precision, at the cost of a higher memory footprint.

Alpha-premultiplied colors

In my personal experience, I have always needed my colors to be in non-alpha-premultiplied form so I cannot pretend to fully understand the benefits of alpha-premultiplied colors. I know that there are certain algorithms which work better/faster with alpha-premultiplied colors and I believe that Go's draw package uses such algorithms internally.

Regardless, by default Go returns colors in alpha-premultiplied form so one needs to know how such colors are constructed, if one is to use Go's color or image packages correctly.

So what are alpha-premultiplied colors? These are colors where the r, g, and b components have been multiplied by a.

...and as much as this explanation is correct, it is also very wrong. It makes an assumption that might not be obvious to the reader, unless they have deep knowledge of colors. The same assumption is made by the Go documentation as well.

If we were to take an r component from an RGBA8 color that has a value 128 (half intensity) and were to multiply it with an a component that has a value 128 (half transparency), we would get a new r value of 0 (no intensity), where in fact we would have expected 64 (quarter intensity). The problem is that the number overflows the 8 bits we have available.

In fact, what is usually understood is that the multiplication occurs in floating point space. In our example above, it would mean that we convert the 128 values to roughly 0.5 (128 / 255, half intensity), then we multiply them with each other (i.e. 0.5 x 0.5), resulting in roughly 0.25. Lastly, we convert the float back to uint8 (multiplying by 255 and truncating), resulting in 64.
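The difference between the two interpretations can be sketched in a few lines. The function names premultiplyNaive and premultiplyFloat are hypothetical and exist only for this illustration.

```go
package main

import "fmt"

// premultiplyNaive multiplies the raw 8-bit values; the product wraps
// around modulo 256 because the result is still a uint8.
func premultiplyNaive(c, a uint8) uint8 {
	return c * a
}

// premultiplyFloat treats both values as fractions of 255, multiplies
// them, and scales the result back to the 8-bit range.
func premultiplyFloat(c, a uint8) uint8 {
	cf := float32(c) / 255
	af := float32(a) / 255
	return uint8(cf * af * 255)
}

func main() {
	fmt.Println(premultiplyNaive(128, 128)) // prints 0
	fmt.Println(premultiplyFloat(128, 128)) // prints 64
}
```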

Go's built-in color types

We should have a look at some of the most popular Go color types and see how they match to the formats we discussed previously.

The most popular color type is RGBA. It represents an alpha-premultiplied RGBA8 color.

Another popular color type is NRGBA. It represents a non-alpha-premultiplied RGBA8 color.

Derived from these type names, we also have the 16 bit component types.

The first one is RGBA64, which represents an alpha-premultiplied RGBA16 color.

Note: Go takes a different approach to color naming when compared with OpenGL, for example. In Go's case, the 64 in the type name indicates that the whole struct uses 64 bits (4 x 16 bits) and not that each component is 64 bits in size.

And the second one is NRGBA64, which represents a non-alpha-premultiplied RGBA16 color.
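As a small illustration of how the two 8 bit types relate, the sketch below converts a non-alpha-premultiplied color to its premultiplied counterpart through the standard color.RGBAModel. The wrapper premultiply8 is a made-up helper for this example.

```go
package main

import (
	"fmt"
	"image/color"
)

// premultiply8 builds a non-alpha-premultiplied color.NRGBA and converts
// it to the alpha-premultiplied color.RGBA type.
func premultiply8(r, g, b, a uint8) color.RGBA {
	n := color.NRGBA{R: r, G: g, B: b, A: a}
	return color.RGBAModel.Convert(n).(color.RGBA)
}

func main() {
	// Pure red at roughly half transparency.
	fmt.Println(premultiply8(255, 0, 0, 128)) // prints {128 0 0 128}
}
```

The red component drops from 255 to 128 because it now carries the alpha with it, while the alpha itself is unchanged.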

The RGBA function

Understanding the built-in color types is good but what unifies all colors in Go is the Color interface. Each color is expected to implement that interface. As such, if you were to ever implement your own color representation, you would need to implement that interface.

The only method that the Color interface requires is the RGBA one.

RGBA() (r, g, b, a uint32)

And even if you were to never implement your own Color, you are more than likely to use the RGBA method. As such, it is important to understand how it relates to the colors.

This particular function, for me personally, has been the source of a lot of confusion in the past. Let's have a look at the official documentation.

RGBA returns the alpha-premultiplied red, green, blue and alpha values for the color. Each value ranges within [0, 0xffff], but is represented by a uint32 so that multiplying by a blend factor up to 0xffff will not overflow. An alpha-premultiplied color component c has been scaled by alpha (a), so has valid values 0 <= c <= a.

There are two important things here that are easy to get wrong.

The first one is that colors returned from the RGBA function are in the alpha-premultiplied R16G16B16A16 format. However, the components are returned inside 32 bit variables. The idea is that if one were to multiply two R16G16B16A16 colors with each other, the values would overflow the 16 bit representation, whereas 32 bits per component would handle such an operation.

Still, the important point to remember here is that each component is 16 bit, even if placed inside a 32 bit variable, and that the color components are alpha-premultiplied. It is sufficient to check the official source code for the RGBA64 type (recall that it represents an alpha-premultiplied R16G16B16A16 color).

type RGBA64 struct {
    R, G, B, A uint16
}

func (c RGBA64) RGBA() (r, g, b, a uint32) {
    return uint32(c.R), uint32(c.G), uint32(c.B), uint32(c.A)
}

As can be seen, the 16 bit components are just cast to uint32 but their values remain in the [0, 65535] range. Also, there is no transformation done, since the color is already alpha-premultiplied.
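The following sketch shows what this looks like from the caller's side: the values come back in the 16 bit range, can be reduced to 8 bits with a shift, and leave enough headroom in a uint32 for a multiplication by a 16 bit blend factor. The helper to8Bit is a hypothetical name used only here.

```go
package main

import (
	"fmt"
	"image/color"
)

// to8Bit reduces a 16-bit component returned by RGBA() to 8 bits by
// dropping the low byte.
func to8Bit(v uint32) uint8 {
	return uint8(v >> 8)
}

func main() {
	var c color.Color = color.RGBA64{R: 0x8000, G: 0, B: 0, A: 0xffff}
	r, _, _, a := c.RGBA()
	fmt.Println(r, a)                 // prints 32768 65535
	fmt.Println(to8Bit(r), to8Bit(a)) // prints 128 255

	// The largest product of a component and a 16-bit blend factor
	// still fits in a uint32 (whose maximum is 4294967295).
	maxProduct := uint32(0xffff) * uint32(0xffff)
	fmt.Println(maxProduct) // prints 4294836225
}
```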

The second important thing to note when reading the documentation is the part stating that a color component c "has been scaled by alpha (a)". What is meant by "scaled by alpha" is that the alpha is treated as a fraction in the [0.0, 1.0] range (conceptually, converted to a float) and only then multiplied with each color component. Recall the explanation in the Alpha-premultiplied colors section.

Implementing the Color interface

So, now that we have covered the fundamental aspects of colors in Go, let's try and implement our own color type. We will try to implement the non-alpha-premultiplied RGBA8 type.

Initial implementation

Following the principles from what's been discussed so far, we end up with the following implementation.

type CustomColor struct {
    R uint8
    G uint8
    B uint8
    A uint8
}

func (c CustomColor) RGBA() (uint32, uint32, uint32, uint32) {
    // convert components to float32 in the range [0.0, 1.0]
    r32f := float32(c.R) / float32(255.0)
    g32f := float32(c.G) / float32(255.0)
    b32f := float32(c.B) / float32(255.0)
    a32f := float32(c.A) / float32(255.0)

    // perform alpha-premultiplication
    r32f = r32f * a32f
    g32f = g32f * a32f
    b32f = b32f * a32f

    // convert back to 16 bit
    r16 := uint16(r32f * 65535.0)
    g16 := uint16(g32f * 65535.0)
    b16 := uint16(b32f * 65535.0)
    a16 := uint16(a32f * 65535.0)

    // return as 32 bit (without upscaling)
    return uint32(r16), uint32(g16), uint32(b16), uint32(a16)
}

If we test this implementation we will see that it works correctly. The question now is how well does it compare to an official implementation.
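One way to run such a test is to compare the float-based type against the standard color.NRGBA over every combination of a color component and alpha. The sketch below uses a condensed version of the implementation above; maxDifference is a hypothetical helper. Because float32 arithmetic rounds at each step, individual results may differ from an integer-based implementation by one unit, so the check reports the largest difference rather than insisting on exact equality.

```go
package main

import (
	"fmt"
	"image/color"
)

// CustomColor is a condensed version of the float-based type from the text.
type CustomColor struct {
	R, G, B, A uint8
}

// RGBA premultiplies in float32 space and scales to the 16-bit range.
func (c CustomColor) RGBA() (uint32, uint32, uint32, uint32) {
	a32f := float32(c.A) / 255
	r32f := float32(c.R) / 255 * a32f
	g32f := float32(c.G) / 255 * a32f
	b32f := float32(c.B) / 255 * a32f
	return uint32(r32f * 65535), uint32(g32f * 65535),
		uint32(b32f * 65535), uint32(a32f * 65535)
}

// maxDifference returns the largest absolute difference between the red
// channel of CustomColor and of color.NRGBA over all (R, A) pairs.
func maxDifference() uint32 {
	var worst uint32
	for r := 0; r < 256; r++ {
		for a := 0; a < 256; a++ {
			got, _, _, _ := CustomColor{R: uint8(r), A: uint8(a)}.RGBA()
			want, _, _, _ := color.NRGBA{R: uint8(r), A: uint8(a)}.RGBA()
			diff := got - want
			if want > got {
				diff = want - got
			}
			if diff > worst {
				worst = diff
			}
		}
	}
	return worst
}

func main() {
	fmt.Println("largest difference:", maxDifference())
}
```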

Official implementation

As mentioned already, the color.NRGBA type in Go implements a non-alpha-premultiplied RGBA8 color, exactly what we implemented above. Let us have a look at the official source code and compare how well we did.

func (c NRGBA) RGBA() (r, g, b, a uint32) {
    r = uint32(c.R)
    r |= r << 8
    r *= uint32(c.A)
    r /= 0xff
    g = uint32(c.G)
    g |= g << 8
    g *= uint32(c.A)
    g /= 0xff
    b = uint32(c.B)
    b |= b << 8
    b *= uint32(c.A)
    b /= 0xff
    a = uint32(c.A)
    a |= a << 8
    return
}

There is clearly a major difference between the two implementations. How is it that the official implementation avoids the usage of float32 values and what are all the bit-wise operations doing?

Iterative optimization

Let us try to optimize our implementation one step at a time until we get to something that resembles the official implementation. Going on that journey should expand our knowledge in micro-optimizations and give us pointers on how to optimize the Color implementations we might do in the future.

First, we should realize that we don't need to go through the uint16 type but can instead directly cast to uint32, as long as we ensure that the final values don't exceed 65535. And since our float representations are in the [0.0, 1.0] range and we multiply them by 65535.0, this should be ensured. We end up with the following code.

func (c CustomColor) RGBA() (uint32, uint32, uint32, uint32) {
    // convert components to float32 in the range [0.0, 1.0]
    r32f := float32(c.R) / float32(255.0)
    g32f := float32(c.G) / float32(255.0)
    b32f := float32(c.B) / float32(255.0)
    a32f := float32(c.A) / float32(255.0)

    // perform alpha-premultiplication
    r32f = r32f * a32f
    g32f = g32f * a32f
    b32f = b32f * a32f

    // convert back to 16 bit, stored inside 32 bit variables
    r16 := uint32(r32f * 65535.0)
    g16 := uint32(g32f * 65535.0)
    b16 := uint32(b32f * 65535.0)
    a16 := uint32(a32f * 65535.0)
    return r16, g16, b16, a16
}

Ok, this is cleaner, but is not much of an actual performance improvement.

Looking more at the code, one thing we may ask ourselves is whether we really need to convert the color components to the [0.0, 1.0] range. In fact, as long as it is of type float32 and the alpha component is in the [0.0, 1.0] range, we could do the alpha-premultiplication within the [0.0, 255.0] range, and then just scale it up to the [0.0, 65535.0] range. This will remove one division per component, meaning a total of three float divisions. We end up with the following code.

func (c CustomColor) RGBA() (uint32, uint32, uint32, uint32) {
    // convert alpha to the range [0.0, 1.0]
    a32f := float32(c.A) / float32(255.0)

    // perform alpha-premultiplication in the range [0.0, 255.0]
    r32f := float32(c.R) * a32f
    g32f := float32(c.G) * a32f
    b32f := float32(c.B) * a32f

    // convert to the [0.0, 65535.0] range
    r16 := uint32(r32f * 257.0)
    g16 := uint32(g32f * 257.0)
    b16 := uint32(b32f * 257.0)
    a16 := uint32(a32f * 65535.0)
    return r16, g16, b16, a16
}

Note: We multiply the r, g and b components by 257 since 255 x 257 = 65535, where 255 is the maximum value in the [0.0, 255.0] range. In essence, we ensure a correct mapping from the [0.0, 255.0] range to the [0.0, 65535.0] range.

Since we are multiplying each color component by 257 and it is an integer constant, we could perform that multiplication while the color component is still in its integer representation, converting three float multiplications to three integer ones.

func (c CustomColor) RGBA() (uint32, uint32, uint32, uint32) {
    // convert alpha to the range [0.0, 1.0]
    a32f := float32(c.A) / float32(255.0)

    // upscale the components to [0, 65535] as integers, then alpha-premultiply
    r32f := float32(uint32(c.R)*257) * a32f
    g32f := float32(uint32(c.G)*257) * a32f
    b32f := float32(uint32(c.B)*257) * a32f

    // truncate the results to integers in the [0, 65535] range
    r16 := uint32(r32f)
    g16 := uint32(g32f)
    b16 := uint32(b32f)
    a16 := uint32(a32f * 65535.0)
    return r16, g16, b16, a16
}

Note: We cast the color components to uint32 before performing the multiplications, as otherwise the multiplication would overflow the uint8 size that each color component has.

Let's rework the code a bit further by moving the color-component-to-alpha multiplication to the end of the code.

func (c CustomColor) RGBA() (uint32, uint32, uint32, uint32) {
    // convert alpha to the range [0.0, 1.0]
    a32f := float32(c.A) / float32(255.0)

    // perform integer upscaling of the color components
    upscaledR := uint32(c.R) * 257
    upscaledG := uint32(c.G) * 257
    upscaledB := uint32(c.B) * 257

    // alpha-premultiply and truncate to integers in the [0, 65535] range
    r16 := uint32(float32(upscaledR) * a32f)
    g16 := uint32(float32(upscaledG) * a32f)
    b16 := uint32(float32(upscaledB) * a32f)
    a16 := uint32(a32f * 65535.0)
    return r16, g16, b16, a16
}

Something else we should realize is that we can split the color component multiplication from uint32(c) * 257 to uint32(c) * 256 + uint32(c). Why would we want to do this? Instead of a single multiplication, don't we now have both a multiplication and an addition?

We need a bit of base-2 theory to spot the potential performance improvement here. Multiplication by 256 can also be expressed as a bit-shift to the left by 8 positions. A single bit-shift to the left is equal to a multiplication by 2. Shifting left 8 times is equal to doing eight multiplications by 2, that is, multiplying by 2^8, which is 256. The key point is that bit-shifts are typically at least as cheap as, and on many processors cheaper than, general multiplications or divisions.

Side note: Doing a bit-shift to the right is equal to the reverse - a division by 2.

This allows us to rewrite the code as follows.

func (c CustomColor) RGBA() (uint32, uint32, uint32, uint32) {
    // convert alpha to the range [0.0, 1.0]
    a32f := float32(c.A) / float32(255.0)

    // store the 8 bit values inside 32 bit variables so that we can
    // perform multiplications without overflowing
    upscaledR := uint32(c.R)
    upscaledG := uint32(c.G)
    upscaledB := uint32(c.B)

    // perform integer upscaling of the color components
    upscaledR = upscaledR + (upscaledR << 8)
    upscaledG = upscaledG + (upscaledG << 8)
    upscaledB = upscaledB + (upscaledB << 8)

    // alpha-premultiply and truncate to integers in the [0, 65535] range
    r16 := uint32(float32(upscaledR) * a32f)
    g16 := uint32(float32(upscaledG) * a32f)
    b16 := uint32(float32(upscaledB) * a32f)
    a16 := uint32(a32f * 65535.0)
    return r16, g16, b16, a16
}

By now you should start to see a resemblance to the official source code.

Another thing to realize is that adding an 8 bit number to any number whose lowest 8 bits are zeroes is the same as just replacing those lowest 8 bits.

For example, if we were to have a number A that has ????????00000000 as its 16 bit representation (where each ? can be either a 0 or 1) and were to add a number B that has 10110110 as its 8 bit representation, then the sum of the two would be ????????10110110. This is basically a bit-wise | (or) operation.

In our code above, we multiply each component by 256 by shifting it 8 bits to the left. This means that the 8 lowest bits are zero, so we can use bit-wise | to perform the addition. A bit-wise | needs no carry propagation, so it is never slower than an addition, although on most modern processors the two cost about the same.
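The chain of rewrites so far (multiply by 257, then shift and add, then shift and or) can be checked exhaustively, since there are only 256 possible inputs. The function names below are hypothetical and used only for this sketch.

```go
package main

import "fmt"

// Three ways of widening an 8-bit value to the 16-bit range.
func upscaleMul(v uint8) uint32 { return uint32(v) * 257 }
func upscaleAdd(v uint8) uint32 { return uint32(v) + uint32(v)<<8 }
func upscaleOr(v uint8) uint32  { return uint32(v) | uint32(v)<<8 }

// allAgree reports whether the three variants match for every 8-bit input.
func allAgree() bool {
	for i := 0; i < 256; i++ {
		v := uint8(i)
		if upscaleMul(v) != upscaleAdd(v) || upscaleAdd(v) != upscaleOr(v) {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(allAgree())      // prints true
	fmt.Println(upscaleOr(0xB6)) // prints 46774, which is 0xB6B6
}
```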

We get the following code.

func (c CustomColor) RGBA() (uint32, uint32, uint32, uint32) {
    // convert alpha to the range [0.0, 1.0]
    a32f := float32(c.A) / float32(255.0)

    // store the 8 bit values inside 32 bit variables so that we can
    // perform multiplications without overflowing
    r := uint32(c.R)
    g := uint32(c.G)
    b := uint32(c.B)

    // perform integer upscaling of the color components
    r = r | (r << 8)
    g = g | (g << 8)
    b = b | (b << 8)

    // alpha-premultiply and truncate to integers in the [0, 65535] range
    r16 := uint32(float32(r) * a32f)
    g16 := uint32(float32(g) * a32f)
    b16 := uint32(float32(b) * a32f)
    a16 := uint32(a32f * 65535.0)
    return r16, g16, b16, a16
}

Let's look at the alpha component for a bit. We do the following steps to evaluate it.

// convert alpha to the range [0.0, 1.0]
a32f := float32(c.A) / float32(255.0)
// convert to the [0.0, 65535.0] range
a16 := uint32(a32f * 65535.0)

We convert the alpha down to the [0.0, 1.0] range, only so that we can scale it back up to [0.0, 65535.0]. In essence, we divide it by 255.0 and afterwards multiply it by 65535.0. Well, (v / 255.0) * 65535.0 equals (v * 65535.0) / 255.0, which equals v * 257.0. We see the well familiar 257-multiplication paradigm. Let's apply the same logic as we did with the color components.

func (c CustomColor) RGBA() (uint32, uint32, uint32, uint32) {
    // convert alpha to the range [0.0, 1.0]
    a32f := float32(c.A) / float32(255.0)

    // store the 8 bit values inside 32 bit variables so that we can
    // perform multiplications without overflowing
    r := uint32(c.R)
    g := uint32(c.G)
    b := uint32(c.B)
    a := uint32(c.A)

    // perform integer upscaling of the components
    r = r | (r << 8)
    g = g | (g << 8)
    b = b | (b << 8)
    a = a | (a << 8)

    // alpha-premultiply and truncate to integers in the [0, 65535] range
    r16 := uint32(float32(r) * a32f)
    g16 := uint32(float32(g) * a32f)
    b16 := uint32(float32(b) * a32f)
    return r16, g16, b16, a
}

Note: We still need the a32f variable that holds the alpha in the [0.0, 1.0] range, since we are still using it for the alpha-premultiplication of the color components. Is there something we could do about this as well?

In the code above, we are using the following equation for color alpha-premultiplication.

c_alpha_premultiplied = uint32(float32(c) * (float32(a) / 255.0))

If we move the brackets a bit, we can get to the following, which is mathematically equivalent (floating-point rounding aside).

c_alpha_premultiplied = uint32((float32(c) * float32(a)) / 255.0)

In the above equation, we can multiply c and a before casting them to float32, since they are integer values.

c_alpha_premultiplied = uint32(float32(c * a) / 255.0)

Now that we have rearranged the equation in this simplified form, we can notice that we don't need to use a floating point division. Since we are casting the end result to an integer (uint32 in this case), we are losing any digits after the decimal, hence we might as well use integer division which has that same behavior instead.

c_alpha_premultiplied = uint32(c * a) / 255

Note: This is only possible and valid if c * a fits inside uint32. In our case c is an upscaled value that occupies 16 bits and a (as used here; prior to upscaling) occupies 8 bits. This means that in total a maximum of 24 bits would be needed and since we have 32 to work with, we should be fine.
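The step from float division to integer division can be checked over every input that can occur here. The sketch below uses hypothetical helpers premulFloat and premulInt. It relies on the fact that the product never exceeds 65535 x 255, which is below 2^24 and therefore converts to float32 without loss.

```go
package main

import "fmt"

// premulFloat follows the float32 form of the equation.
func premulFloat(c, a uint32) uint32 {
	return uint32(float32(c*a) / 255.0)
}

// premulInt uses integer multiplication and division only.
func premulInt(c, a uint32) uint32 {
	return c * a / 255
}

// countMismatches compares both forms for every upscaled 8-bit component
// (v * 257) and every 8-bit alpha.
func countMismatches() int {
	mismatches := 0
	for v := uint32(0); v < 256; v++ {
		c := v * 257
		for a := uint32(0); a < 256; a++ {
			if premulFloat(c, a) != premulInt(c, a) {
				mismatches++
			}
		}
	}
	return mismatches
}

func main() {
	// The largest possible product compared with 2^24.
	fmt.Println(uint32(65535)*255, 1<<24) // prints 16711425 16777216
	fmt.Println("mismatches:", countMismatches())
}
```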

Let's apply the above reasoning to our code.

func (c CustomColor) RGBA() (uint32, uint32, uint32, uint32) {
    // store the 8 bit values inside 32 bit variables so that we can
    // perform multiplications without overflowing
    r := uint32(c.R)
    g := uint32(c.G)
    b := uint32(c.B)
    a := uint32(c.A)

    // perform color alpha-premultiplication of the color components
    r = r | (r << 8)
    r = (r * a) / 255
    g = g | (g << 8)
    g = (g * a) / 255
    b = b | (b << 8)
    b = (b * a) / 255

    // do the alpha integer upscaling last
    a = a | (a << 8)

    return r, g, b, a
}

Lastly, we need to recall the following Go short-hand expressions.

c = c | (c << 8)

// is the same as

c |= (c << 8)

Similarly, for the multiplication and division:

c = (c * a) / 255

// is the same as

c *= a
c /= 255

// which is the same as

c *= a
c /= 0xff

Using these and a bit of rearranging, we get to the following code.

func (c CustomColor) RGBA() (uint32, uint32, uint32, uint32) {
    r := uint32(c.R)
    g := uint32(c.G)
    b := uint32(c.B)
    a := uint32(c.A)

    r |= (r << 8)
    r *= a
    r /= 0xff

    g |= (g << 8)
    g *= a
    g /= 0xff

    b |= (b << 8)
    b *= a
    b /= 0xff

    a |= (a << 8)

    return r, g, b, a
}

Except for a few syntactic differences, we have done it - we have simplified and optimized our Color implementation of a non-alpha-premultiplied RGBA8 color to match the official one.
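As a final sanity check, the sketch below compares the integer-only type against color.NRGBA for every combination of component value and alpha. The helpers premul and matchesNRGBA are hypothetical names, and the method body is a compacted form of the code above.

```go
package main

import (
	"fmt"
	"image/color"
)

// CustomColor is the non-alpha-premultiplied RGBA8 type from the text.
type CustomColor struct{ R, G, B, A uint8 }

// premul upscales an 8-bit component to 16 bits and scales it by alpha.
func premul(c, a uint8) uint32 {
	v := uint32(c)
	v |= v << 8
	v *= uint32(a)
	v /= 0xff
	return v
}

// RGBA returns alpha-premultiplied 16-bit components.
func (c CustomColor) RGBA() (uint32, uint32, uint32, uint32) {
	a := uint32(c.A)
	a |= a << 8
	return premul(c.R, c.A), premul(c.G, c.A), premul(c.B, c.A), a
}

// Compile-time check that CustomColor satisfies color.Color.
var _ color.Color = CustomColor{}

// matchesNRGBA reports whether CustomColor and color.NRGBA return the
// same values for every combination of component value and alpha.
func matchesNRGBA() bool {
	for v := 0; v < 256; v++ {
		for a := 0; a < 256; a++ {
			x, y := uint8(v), uint8(a)
			r1, g1, b1, a1 := CustomColor{R: x, G: x, B: x, A: y}.RGBA()
			r2, g2, b2, a2 := color.NRGBA{R: x, G: x, B: x, A: y}.RGBA()
			if r1 != r2 || g1 != g2 || b1 != b2 || a1 != a2 {
				return false
			}
		}
	}
	return true
}

func main() {
	fmt.Println(matchesNRGBA()) // prints true
}
```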

Summary

I realize that not everyone has to deal with Go's image or color packages. Even fewer people would ever need to implement their own Color type.

Still, I am hoping that this managed to shed some light on alpha-premultiplied colors and the most popular Go color formats to those readers that will need to use them.

Also, while micro-optimizations are usually the last line of defense when dealing with performance problems, in situations where a function is expected to be called a million times within a short duration of time (as is the case with RGBA), micro-optimizations and knowing how to apply them can be critical.
