nekot0

Errors in the implementation of model training with effdet

In the previous post, I succeeded in implementing effdet model training in a simple setting and gained some understanding of the code. Now, I apply this understanding to the main problem.

My entire code still does not work, but I'd like to take a note of the errors I have met so far and their remedies.

Warning when converting numpy.ndarray to torch.tensor

This is not critical, but a warning occurs when I try to convert numpy.ndarray to torch.tensor directly. According to the message, the conversion is extremely slow because it is done element by element. In my case, the warning appeared because I tried to convert a list of numpy.ndarray to torch.tensor. The better approach is to turn the entire list into a single numpy.ndarray first and only then create the torch.tensor, as in the sketch after the warning below.

# warning
/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:47: 
UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. 
(Triggered internally at ../torch/csrc/utils/tensor_new.cpp:230.)
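
For reference, a minimal sketch of the slow and the recommended conversion, using made-up box data (not from my actual notebook):

import numpy as np
import torch

# Hypothetical example: a list of per-box coordinate arrays
boxes = [np.array([10.0, 20.0, 50.0, 60.0]) for _ in range(3)]

# Slow: torch walks the Python list element by element (this triggers the warning)
# slow_tensor = torch.tensor(boxes)

# Faster: build a single numpy.ndarray first, then convert once
fast_tensor = torch.tensor(np.array(boxes))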

Processing required when an input image has multiple bounding boxes

The dataset in this competition has images, each of which has multiple bounding boxes. Also, the number of bounding boxes varies depending on the image.

If an image has one bounding box, it is stored in form (A); if it has multiple bounding boxes, it is stored in form (B).

(A)
    [x0, y0, x1, y1]
        # x0: x-coordinate of top left
        # y0: y-coordinate of top left
        # x1: x-coordinate of bottom right
        # y1: y-coordinate of bottom right

(B)
    [[x0, y0, x1, y1],   # coordinates of bounding box 1
     [x0, y0, x1, y1],   # coordinates of bounding box 2
     ...
     [x0, y0, x1, y1]]   # coordinates of bounding box n

If images have multiple bounding boxes and the number of boxes varies, the tensor sizes also differ from image to image.

Therefore, the error below occurs when DataLoader receives tensors of different sizes.

# DataLoader instantiation
from torch.utils.data import DataLoader

dataset = MyDataset(args)
dataloader = DataLoader(dataset, batch_size=4, num_workers=4)
# error message
RuntimeError: stack expects each tensor to be equal size, but got [2, 4] at entry 0 and [1, 4] at entry 1
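
The underlying cause is that the default collate function stacks the per-sample target tensors. A minimal sketch with made-up boxes reproduces the same error outside of DataLoader:

import torch

boxes_img1 = torch.zeros((2, 4))   # an image with 2 bounding boxes
boxes_img2 = torch.zeros((1, 4))   # an image with 1 bounding box

# This is roughly what the default collate function attempts, and it fails:
# torch.stack([boxes_img1, boxes_img2])
# -> RuntimeError: stack expects each tensor to be equal size, ...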

This error can be avoided by defining a custom collate_fn. collate_fn defines how a group of samples is combined into a batch. I pad the bounding-box data with zeros when a sample has fewer boxes than the others, so that all bounding-box tensors in a batch have the same size. The code below is an example.

import torch.nn.functional as F
from torch.utils.data import DataLoader, default_collate

def pad_collate_fn(batch):
    # Check the maximum number of bounding boxes in a batch
    shapes = [item[1]['bbox'].shape[0] for item in batch]
    max_shape = max(shapes)

    padded_batch = []
    for x, y in batch:

        # Remove the data with no bounding boxes
        if any(elem == 0 for elem in y['cls']):
            continue

        # Pad with 0 if the box size is smaller than the maximum
        pad_size = max_shape - y['bbox'].shape[0]
        bbox_padding = [0, 0, 0, pad_size]
        cls_padding = [0, 0, 0, pad_size]
        padded_y = {
            'bbox': F.pad(y['bbox'], bbox_padding, mode='constant', value=0),
            'cls': F.pad(y['cls'].reshape((y['cls'].shape[0],1)), cls_padding, mode='constant', value=0)
        }
        padded_batch.append((x, padded_y))

    # Apply the default batch process before return
    return default_collate(padded_batch)


# Instantiation of dataset and dataloader
dataset = MyDataset(args)
dataloader = DataLoader(
    dataset, batch_size=4, num_workers=4, 
    collate_fn=pad_collate_fn   # Pass the collate_fn as defined above
)
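
With this collate_fn, every batch should come out with uniformly sized targets. A quick hypothetical check (assuming MyDataset returns (image, {'bbox': ..., 'cls': ...}) pairs, as above):

for images, targets in dataloader:
    # All samples in a batch now share the same (padded) number of boxes
    print(targets['bbox'].shape)   # e.g. torch.Size([4, max_boxes, 4])
    print(targets['cls'].shape)    # e.g. torch.Size([4, max_boxes, 1])
    break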

Channels and data structure of input images

Colour images usually have three channels, while DICOM images, which are often used in the medical context, sometimes have only one channel. As effdet requires three channels, we need to extend the image from one channel to three. I did this using OpenCV.

import pydicom
import cv2

dcm = pydicom.dcmread(dcm_path)
image = dcm.pixel_array.astype("float32")
image = cv2.cvtColor(image, cv2.COLOR_GRAY2RGB)   # replicate the single channel into three

We have to be careful about how the image data is structured. With this method, the channel-extended image is stored in the form (Width, Height, Channel). This form works, for example, when the image is transformed using Albumentations. However, if we read the image using OpenCV, it will be stored in a different form, (Height, Width, Channel). The different forms can cause unexpected errors.

(5 April edit)

The image obtained from the above method is actually in the form (Height, Width, Channel). The training worked with this, together with input bounding boxes in the form (x0, y0, x1, y1).
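
As a quick sanity check on the layout, here is a minimal sketch with a dummy single-channel image (the shapes are the point, not the values):

import numpy as np
import cv2

gray = np.zeros((512, 512), dtype="float32")   # dummy single-channel DICOM-like image

rgb = cv2.cvtColor(gray, cv2.COLOR_GRAY2RGB)   # replicate the channel three times
print(rgb.shape)                               # (512, 512, 3) -> (Height, Width, Channel)

# If the rest of the pipeline needs the usual PyTorch layout (Channel, Height, Width),
# the transpose is a one-liner:
chw = rgb.transpose(2, 0, 1)
print(chw.shape)                               # (3, 512, 512)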

anchors.py adjustment

After dealing with the errors above, I met another one when training the model. The error message says that the mask size does not match something, but I was not sure what it meant.

# error message
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/tmp/ipykernel_27/103307951.py in <module>
     28 for inputs, targets in t:
     29     optimizer.zero_grad()
---> 30     losses = bench(inputs, targets)
     31     loss = losses['loss']
     32     loss.backward()

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1188         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1189                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1190             return forward_call(*input, **kwargs)
   1191         # Do not call functions when jit is used
   1192         full_backward_hooks, non_full_backward_hooks = [], []

/kaggle/input/effdet-030-package-dataset/packages/effdet/bench.py in forward(self, x, target)
    140         else:
    141             cls_targets, box_targets, num_positives = self.anchor_labeler.batch_label_anchors(
--> 142                 target['bbox'], target['cls'])
    143 
    144         loss, class_loss, box_loss = self.loss_fn(class_out, box_out, cls_targets, box_targets, num_positives)

/kaggle/input/effdet-030-package-dataset/packages/effdet/anchors.py in batch_label_anchors(self, gt_boxes, gt_classes, filter_valid)
    376             if filter_valid:
    377                 valid_idx = gt_classes[i] > -1  # filter gt targets w/ label <= -1
--> 378                 gt_box_list = BoxList(gt_boxes[i][valid_idx])
    379                 gt_class_i = gt_classes[i][valid_idx]
    380             else:

IndexError: The shape of the mask [2, 1] at index 1 does not match the shape of the indexed tensor [2, 4] at index 1

According to the message, something went wrong in anchors.py. Looking for the cause by printing out the variables in the file, I found the step that removes bounding boxes with negative labels. The error occurred because the mask used in this step did not match the shape of my data (a mask of shape [2, 1] against a box tensor of shape [2, 4]). So I edited this step, and the error was resolved.

# Line 378 in anchors.py
gt_box_list = BoxList(gt_boxes[i][valid_idx])

# gt_boxes is a bundle of bounding boxes in the batch
# For example,
# gt_boxes:
#    tensor([[[17.1509, 58.3014, 51.9944, 78.0274],
#             [ 6.3188, 22.3562, 27.2609, 40.9863],
#             [38.4542, 19.5068, 53.6192, 37.4795]],
#            [[ 0.0000, 11.3287, 32.6629, 27.3655],
#             [ 0.0000,  0.0000,  0.0000,  0.0000],
#             [ 0.0000,  0.0000,  0.0000,  0.0000]], 
#            [[25.7564, 29.2258, 51.8807, 42.6300],
#             [34.2192, 82.6232, 57.5839, 96.0275],
#             [ 0.0000,  0.0000,  0.0000,  0.0000]]])
# In this case, gt_boxes[2] has the bounding box data of the 3rd image in the batch
# gt_boxes[2]:
#    tensor([[25.7564, 29.2258, 51.8807, 42.6300],
#            [34.2192, 82.6232, 57.5839, 96.0275],
#            [ 0.0000,  0.0000,  0.0000,  0.0000]])
#
# valid_idx is a mask for each gt_boxes[i] that indicates which bounding boxes are valid. 
# In the above example for gt_boxes[2], the 1st and 2nd rows have class 1 and the 3rd row has class 0, so the 3rd is removed from the bundle. 
# valid_idx: tensor([[True], [True], [False]])
# Re-write the line: keep only the boxes whose mask value is True
gt_boxes_output = []
for j in range(valid_idx.shape[0]):
    if valid_idx[j]:
        gt_boxes_output.append(np.array(gt_boxes[i][j]))
gt_box_list = torch.FloatTensor(np.array(gt_boxes_output))
gt_box_list = BoxList(gt_box_list)
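
For reference, an arguably simpler edit would be to flatten the (n, 1) mask before indexing, which keeps the original one-liner. I have not tested this, so treat it as a suggestion:

# Alternative edit of line 378 in anchors.py: squeeze the mask to shape (n,) before indexing
gt_box_list = BoxList(gt_boxes[i][valid_idx.squeeze(-1)])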

My entire programme still does not work, and I have to struggle with errors for a while...
