S.HARIHARA SUDHAN
Data Augmentation in CNN

In image recognition, machine learning algorithms learn to distinguish between various objects and categorize them. Data augmentation is a developing technique used to build models that perform better. A model must be able to recognize an object under every circumstance, including rotation, zoom, and blur, so researchers needed a synthetic way of extending the training data with realistic adjustments.

The process of artificially deriving new data from existing training data is known as data augmentation. Common techniques include cropping, padding, flipping, rotating, and resizing. Augmentation strengthens the model's performance and addresses problems like overfitting and a lack of data.

Data augmentation offers a variety of options for modifying the original image and can help provide enough data for larger models, so it is important to know its advantages and disadvantages. Let us jump into it.

Introduction

If there is enough data, convolutional neural networks (CNNs) are capable of doing incredible things. However, collecting the right quantity of training data for each feature that needs to be learned can be challenging, and the network may overfit the training data if there is not enough of it. Realistic images also vary in size, pose, zoom, lighting, noise, and so on.

Data augmentation is employed to make the network resilient to these commonly occurring phenomena. By rotating input images at different angles, flipping them along different axes, or translating and cropping them, the network experiences these variations during training.

As more parameters are added to a CNN, it needs additional examples to train on. Higher performance can come at the expense of deeper networks requiring more training data and longer training times.

Because of this, it is also practical to avoid having to search for or produce additional images that are appropriate for an experiment. Data augmentation can lower the cost and effort of expanding the pool of available training samples.

Data Augmentation Techniques

Some libraries implement data augmentation by making copies of the training images and storing them alongside the originals, generating fresh training data for the machine learning model. Other libraries merely specify a set of transformations to apply to the training input; these transformations are applied at random, so the optimizer effectively searches a larger input space. This approach has the benefit of not requiring extra disk space for the augmented training data.

Image data augmentation has now become a famous and common method used with CNNs, and it involves techniques such as:

Flips
Rotation (at 90 degrees and finer angles)
Translation
Scaling
Salt and Pepper noise addition

Data augmentation has become a standard part of training pipelines in image recognition.

To explain these techniques, I am going to take one original image of a cat and perform each operation on it.

Original Image

i)Flips:

Flipping prevents the optimizer from becoming biased toward features that appear only on one side of an image. To perform this augmentation, the original training image is flipped either vertically or horizontally over one axis of the image. As a result, the positions of the features keep shifting.

Flipped Image

While flipping is an augmentation comparable to rotation, it results in mirror images. A specific element, such as a person's head, either remains at the top, bottom, left, or right of the image.
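As a minimal sketch (using plain NumPy rather than a specific augmentation library), horizontal and vertical flips of an image array look like this — the tiny 2x3 "image" is just an illustration:

```python
import numpy as np

# A tiny 2x3 "image" so the mirroring effect is easy to see
img = np.array([[1, 2, 3],
                [4, 5, 6]])

horizontal_flip = np.fliplr(img)  # mirror left-right (horizontal flip)
vertical_flip = np.flipud(img)    # mirror top-bottom (vertical flip)

print(horizontal_flip)  # [[3 2 1], [6 5 4]]
print(vertical_flip)    # [[4 5 6], [1 2 3]]
```

The same operations apply unchanged to real image arrays loaded with OpenCV or Keras, since those are also NumPy arrays.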

ii)Rotation:

Rotation, another type of augmentation, is frequently carried out at 90-degree angles, but it can also use smaller, finer angles if the demand for additional data is significant. The background color is usually fixed for rotation so that it blends in when the image is rotated; otherwise, the model may infer that the changed background is a distinctive feature. This works well when the background is the same across all rotated photos.

Rotated Image

Some features naturally appear rotated: a person's head, for instance, may be turned 10, 22.7, or -8 degrees. Unlike flips, rotation does not alter the feature's orientation and does not result in mirror images. This makes it easier for models to disregard the angle as a distinguishing characteristic.
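For the common 90-degree case mentioned above, rotation can be sketched directly in NumPy (finer angles would typically need an image library such as SciPy or Pillow, which also fill the exposed background with a fixed color):

```python
import numpy as np

img = np.array([[1, 2, 3],
                [4, 5, 6]])

rotated_ccw = np.rot90(img)        # rotate 90 degrees counter-clockwise
rotated_cw = np.rot90(img, k=-1)   # rotate 90 degrees clockwise

print(rotated_ccw)  # [[3 6], [2 5], [1 4]]
print(rotated_cw)   # [[4 1], [5 2], [6 3]]
```

Note that rotating a non-square image swaps its height and width, which is why 90-degree rotations are the easy case: no background filling is needed.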

iii)Translation:

When an image is translated, the primary object is moved around the frame in different ways. Think of a person in the center of the frame with all of their parts visible, and use that as your starting point. Then move the person toward a corner and translate the image so that the legs are cropped off at the bottom.

Cropped Image

As you can see, the image is slightly cropped at the corner.
Translation ensures that the object is recognized anywhere in the image, not just in the middle or on one side. The training data can be expanded to include a range of different translations so that the network learns to identify translated objects.
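A simple translation can be sketched by shifting the pixel array and filling the vacated region with zeros (black); real pipelines, such as Keras's width_shift_range/height_shift_range, do the same thing with configurable fill modes:

```python
import numpy as np

def translate(img, dy, dx):
    """Shift a 2D image down by dy and right by dx, filling with zeros.

    Pixels shifted past the border are cropped away, mirroring what
    happens when an object is pushed toward a corner of the frame.
    """
    out = np.zeros_like(img)
    h, w = img.shape
    out[max(dy, 0):h + min(dy, 0), max(dx, 0):w + min(dx, 0)] = \
        img[max(-dy, 0):h - max(dy, 0), max(-dx, 0):w - max(dx, 0)]
    return out

img = np.arange(1, 10).reshape(3, 3)
shifted = translate(img, 1, 1)  # move the content one pixel down and right
print(shifted)  # [[0 0 0], [0 1 2], [0 4 5]]
```

Negative offsets shift the content up or left, so a range of translations can be generated by sampling dy and dx at random.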

iv)Scaling:

Scaling makes a machine learning model's training data more diverse. No matter how closely or how far the image is zoomed, scaling ensures that the object is still recognized by the network. Sometimes the object sits incredibly small in the center; at other times it is zoomed in so far that it is cropped in some places.

Zoomed image

As you can see, the image is zoomed in and also cropped in some places.
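A zoom-in can be sketched as a center crop followed by nearest-neighbour upsampling back to the original size. This illustrative version handles only integer zoom factors; libraries such as Keras (via zoom_range) interpolate arbitrary factors:

```python
import numpy as np

def zoom_in(img, factor):
    """Center-crop a 2D image by an integer factor, then upsample back."""
    h, w = img.shape
    ch, cw = h // factor, w // factor          # size of the cropped window
    top, left = (h - ch) // 2, (w - cw) // 2   # center the crop
    crop = img[top:top + ch, left:left + cw]
    # Nearest-neighbour upsampling: repeat each pixel `factor` times per axis
    return np.repeat(np.repeat(crop, factor, axis=0), factor, axis=1)

img = np.arange(1, 17).reshape(4, 4)
zoomed = zoom_in(img, 2)  # 2x zoom on the central 2x2 region
print(zoomed)
```

The output has the same shape as the input, but the border pixels are gone: exactly the "zoomed and cropped" effect described above.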

v)Salt and Pepper Noise:

The addition of black and white dots that resemble salt and pepper to the image is known as "salt and pepper noise addition." This mimics the dust and flaws found in actual photographs, so the photographer's camera does not even need to be sharp or spotless for the model to recognize the picture. It expands the training data set with more realistic visuals.

Salt and Pepper image

Simple Implementation

from tensorflow.keras.preprocessing.image import ImageDataGenerator, img_to_array, load_img

datagen = ImageDataGenerator(rotation_range=40,
                             width_shift_range=0.2,
                             height_shift_range=0.2,
                             shear_range=0.2,
                             zoom_range=0.2,
                             horizontal_flip=True,
                             fill_mode='nearest')

img = load_img('Images/Dog_or_cat.jpg')  # input image
X = img_to_array(img)                    # convert to a NumPy array
X = X.reshape((1,) + X.shape)            # add a batch dimension

# The .flow() call below generates batches of randomly transformed images
# and saves the results in the 'preview/' directory
i = 0
for batch in datagen.flow(X, batch_size=1, save_to_dir='preview',
                          save_prefix='cat', save_format='jpeg'):
    i += 1
    if i > 20:
        break  # stop after 21 images; otherwise the generator loops forever


Adding salt and pepper noise to an image:

import random
import cv2

def add_salt_noise(img):
    # Get the dimensions of the (grayscale) image
    row, col = img.shape

    # Randomly pick some pixels to color white
    # (a random count between 300 and 10000)
    number_of_pixels = random.randint(300, 10000)
    for i in range(number_of_pixels):
        y_coord = random.randint(0, row - 1)  # random y coordinate
        x_coord = random.randint(0, col - 1)  # random x coordinate
        img[y_coord][x_coord] = 255           # color that pixel white

    # Randomly pick some pixels to color black
    number_of_pixels = random.randint(300, 10000)
    for i in range(number_of_pixels):
        y_coord = random.randint(0, row - 1)
        x_coord = random.randint(0, col - 1)
        img[y_coord][x_coord] = 0             # color that pixel black

    return img

# This salt-and-pepper routine applies only to grayscale images,
# so read the color image as grayscale
img = cv2.imread('Images/Dog_or_cat.jpg',
                 cv2.IMREAD_GRAYSCALE)

# Store the noisy image
cv2.imwrite('Images/salt-and-pepper-lena.jpg',
            add_salt_noise(img))


Advantages

  • Data augmentation helps the model generalize to samples it has never seen before, making its predictions more accurate.
  • The model has access to enough data to learn all of the given parameters. This can be crucial in applications where data collection is challenging.
  • Increasing the variability of the data through augmentation helps avoid model overfitting.
  • It can speed up projects where gathering additional data would take longer.
  • It can lower the cost of acquiring different types of data when data collection is expensive.

Drawbacks

Data augmentation is of little use when the variety required by the application cannot be synthesized. For instance, suppose the training data for a bird-recognition model included only red birds. The training data could be improved by creating images with the bird's color altered.

But when there is not enough diversity in the initial data, the artificial augmentation may fail to capture the realistic color details of birds. If the approach merely substituted blue, green, etc. for red, while real non-red birds have more intricate color variations, the model could learn to ignore color entirely. Even with augmentation, having enough diverse data is crucial for it to work successfully.

Underfitting is another problem that may arise from improper data augmentation. To account for the larger amount of training data, the number of training epochs must be increased; if the optimization is not carried out over a sufficient number of samples, the model may end up in a suboptimal configuration.

Data augmentation will also not correct biases in the existing data set. In the same bird example, if the training data comprises only eagles, it would be challenging to develop an artificial augmentation technique that produces diverse species of birds.

But despite these drawbacks, data augmentation remains one of the best methods used by researchers and in industry.
