Sandeep Balachandran

Machine Learning - Convolution with color images

Hey there,
Hoping you are staying safe in self-quarantine. Your efforts will pay off in the future.
Have a motivational story from me.

Once, while performing, Charlie Chaplin told the audience a wonderful joke and all the people started laughing. They were so entertained that they expected more jokes. Charlie repeated the same joke, but some of them still managed to laugh. He repeated the same joke once more, and the crowd stopped laughing and got a little tense. Then he said these beautiful lines: "When you cannot laugh at the same joke again and again, then why do you cry again and again over the same worry?"

Moral of the story

Enjoy your life instead of worrying about the past.
Try to fake a laugh if you ever hear the same joke back to back, so the one who told it won't get embarrassed.

Main Content From Here

In a previous lesson, we learned how to perform convolutions on grayscale images. But how do we perform a convolution on a color image? Let's begin with a quick recap of how we perform a convolution on a grayscale image.


We start by taking a kernel, also called a filter, of a certain size. Using the same example as in the previous lesson, here we have a three by three kernel. To perform a convolution of this kernel with the given grayscale image, we center the kernel over a pixel of the image, take each corresponding pixel and kernel value, multiply them together, sum the whole thing up, and then assign the result to the corresponding pixel in the convoluted image.

As an example, here we can see that the original pixel value of 25 gets a value of 198 after the convolution with our kernel.
Remember that we can add zero padding around the entire image in order to calculate the kernel convolution of the image without losing information.
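
To make the recap concrete, here is a minimal NumPy sketch of a grayscale convolution with zero padding. The function name convolve2d and the loop-based implementation are just an illustration of the idea, not code from the lesson.

```python
import numpy as np

def convolve2d(image, kernel):
    """Convolve a 2D grayscale image with a 2D kernel.

    Zero padding is added around the image so the output keeps the
    same height and width as the input.
    """
    kh, kw = kernel.shape
    pad_h, pad_w = kh // 2, kw // 2

    # Zero padding around the entire image, as described above.
    padded = np.pad(image, ((pad_h, pad_h), (pad_w, pad_w)), mode="constant")

    output = np.zeros(image.shape)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            # Center the kernel over pixel (i, j), multiply element-wise, and sum.
            window = padded[i:i + kh, j:j + kw]
            output[i, j] = np.sum(window * kernel)
    return output

# A toy 5x5 grayscale image and a 3x3 kernel, just to show the call.
image = np.random.randint(0, 256, size=(5, 5))
kernel = np.array([[0, 1, 0],
                   [1, -4, 1],
                   [0, 1, 0]])
print(convolve2d(image, kernel).shape)  # (5, 5)
```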


Now, let's see how we can do convolutions for color images.
Just as we did with grayscale images, we'll start by choosing a filter of a particular size. The only difference is that now the filter itself will be 3D.


The depth of the filter is chosen to match the number of color channels in our color image.


This is because we're going to convolve each color channel with its own two-dimensional filter. Therefore, if we're working with RGB images, our 3D filter will have a depth of three.

Let's see an example.

Suppose we have an RGB image and we want to convolve it with the following 3D filter. As we can see, the depth of our filter consists of three 2D filters. For simplicity, let's assume that our RGB image is 5 by 5 pixels.


Remember that each color channel corresponds to a two-dimensional array of pixel values, like the ones we see here.


Just as we did with the grayscale images, we'll add zero padding to each of these arrays in order to avoid losing information when performing the convolution.
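
For instance, with NumPy we could pad all three channels at once. This is just a small sketch assuming the image is stored as a height x width x channels array:

```python
import numpy as np

# A hypothetical 5x5 RGB image, stored as height x width x channels.
rgb_image = np.random.randint(0, 256, size=(5, 5, 3))

# Add one row/column of zeros on every side of the height and width,
# leaving the channel dimension untouched.
padded = np.pad(rgb_image, ((1, 1), (1, 1), (0, 0)), mode="constant")

print(padded.shape)  # (7, 7, 3)
```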


We're now ready to perform our convolution.


The convolutions will be carried out in exactly the same way as for grayscale images.
The only difference is that now we have to perform three convolutions instead of one.
Let's see how this is done step-by-step.


  • Let's start by looking at the red color channel, and let's begin by performing a convolution on the pixel shown here in yellow. We'll use filter number three to perform the convolution on this red color channel.


  • So as before, we will take each corresponding pixel and filter value, multiply them together, and sum the whole thing up.
  • In this case, we see that we get a value of two.


  • We now do the same for the green color channel, except that now we're going to use filter number two to perform the convolution on this channel.


It's important that we choose the same pixel in the green channel as we did in the red channel.

Here, we can see it highlighted in yellow and it has a value of zero. So now, we'll take each corresponding pixel and filter value, multiply them together, and sum the whole thing up.


We can see that we get a value of four in this case.

  • We now do the same for the blue color channel, except that now we're going to use filter number one to perform the convolution on this channel.


  • It's important that we choose the same pixel in the blue channel as we did in the red and the green channels. Here, we can see it highlighted in yellow, and it has a value of zero as well.


  • So now, we'll take each corresponding pixel and filter value, multiply them together, and sum the whole thing up. We can see that we get a value of two in this case.

Now that we've calculated the convolutions for each color channel, we'll add these numbers together. For this particular example, we have 2 plus 4 plus 2, which gives us a value of eight.
However, it's customary to add a bias value, which usually has a value of 1.


This results in a final value of 9. So 9 will be the corresponding value in our new convoluted output. To get the full convoluted output, we just perform the same operations for all the other pixels in each of our color channels.
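
Putting the whole walkthrough together, here is a rough NumPy sketch of a convolution with a single 3D filter: each channel is convolved with its own 2D slice of the filter, the three results are summed, and a bias is added. The function name convolve_rgb and the bias default are my own choices for illustration.

```python
import numpy as np

def convolve_rgb(image, filter_3d, bias=1):
    """Convolve an RGB image (H x W x 3) with one 3D filter (kh x kw x 3).

    Each color channel is convolved with its own 2D filter, the three
    results are added together, and a bias is added to every pixel,
    giving a single 2D output with the same height and width as the image.
    """
    height, width, _ = image.shape
    kh, kw, _ = filter_3d.shape
    pad_h, pad_w = kh // 2, kw // 2

    # Zero padding on height and width only, not on the channel dimension.
    padded = np.pad(image, ((pad_h, pad_h), (pad_w, pad_w), (0, 0)), mode="constant")

    output = np.zeros((height, width))
    for i in range(height):
        for j in range(width):
            window = padded[i:i + kh, j:j + kw, :]  # kh x kw x 3 region
            # Multiply-and-sum over all three channels at once, then add the bias.
            output[i, j] = np.sum(window * filter_3d) + bias
    return output
```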

In this particular example, the resulting convoluted output is a two-dimensional array with the same width and height as the RGB image.

As we can see, a convolution with a single 3D filter produces a single convoluted output. However, when working with CNNs, it's customary to use more than one 3D filter.


If we use more than one filter, then we will get one convoluted output per filter. For example, if we use three filters, then we'll get three convoluted outputs.


Therefore, we can think of the convoluted output as being 3D, where the depth will correspond to the number of filters.


In this example, since we are using three filters, the convoluted output will have a depth of three. Similarly, if we had 16 filters, the depth of the convoluted output would be 16.
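
Continuing the sketch from the worked example above (it reuses the hypothetical convolve_rgb function defined there), stacking the 2D output of each filter along a new depth axis gives exactly this 3D output:

```python
import numpy as np

# Three hypothetical 3x3x3 filters and a 5x5 RGB image.
filters = [np.random.randn(3, 3, 3) for _ in range(3)]
image = np.random.randint(0, 256, size=(5, 5, 3)).astype(float)

# One 2D convoluted output per filter, stacked along the depth axis.
outputs = np.stack([convolve_rgb(image, f) for f in filters], axis=-1)

print(outputs.shape)  # (5, 5, 3) -> depth equals the number of filters
```
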
In code, we can control how many outputs a Conv2D layer creates by specifying the number of filters. We can also specify the size of the 3D filters using the kernel_size argument.
For example, in order to create three filters with a size of three by three, like we have in our example, we'll use the arguments shown in the sketch below. Remember that when we train the CNN, the values in the 3D kernels will be updated so as to minimize the loss function.
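
A minimal sketch of that layer in TensorFlow/Keras (assuming tf.keras, since we are talking about a Conv2D layer; the 5x5 RGB input is just the running example from above):

```python
import tensorflow as tf

# Three 3x3 filters; padding='same' adds zero padding so the output
# keeps the input's height and width.
conv = tf.keras.layers.Conv2D(filters=3, kernel_size=(3, 3), padding="same")

# A batch containing one 5x5 RGB image (batch, height, width, channels).
images = tf.random.uniform((1, 5, 5, 3))

output = conv(images)
print(output.shape)  # (1, 5, 5, 3) -> depth equals the number of filters
```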

Now that we know how to perform convolutions on color images, let's see how we can apply max pooling in the next post.

Top comments (2)

Brad Messer

Great read so far! Thanks for all the excellent info! Towards the end when we take multiple 3D filters and apply that to an input image, I had some confusion around the format of that image.

In the last picture, is a single 3D filter applied to each color channel? If so, how does that work? My assumption is the single color channel is filtered through each layer of the 3D filter and then summed up similarly to how pixels were summed up in the single 3d filter case across color channels.

Really excited to learn from you! This is great material.

Channagiri Jagadish

Excellent piece of writing!