EfficientDet Implementation for Object Detection

I have been interested in machine learning but left it untouched for years. I finally decided to start training myself so that I could gain insight into how data is used and build the ability to code models on my own. I found an interesting image competition and started with it. The competition has already finished, but the data is still available, and I can still submit predictions and get a score.

The tasks are straightforward: object detection and classification. EfficientDet is a widely used model that handles both of these tasks, so I decided to build a model with it. However, the implementation turned out to be quite hard. Some errors required me to edit the imported packages, which I couldn't manage in a Kaggle notebook. Therefore, I restarted the implementation of EfficientDet with a simple dataset.

A useful example I found is a blog post written in Japanese in October 2021. The task is to detect a red circle on a black square background. The source code from the blog worked for the most part, but I ran into some errors when I tested it in April 2023. Below is a note of the errors and remedies, and the accuracy of the result.
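To give a concrete picture of the data, a sample like this can be generated synthetically. The sketch below is my own illustration rather than the blog's code; the image size, the radius range, and the [x1, y1, x2, y2] box format are assumptions.

# Sketch: generate one red-circle-on-black sample (illustrative, not the blog's code)
import numpy as np
import cv2

def make_sample(size=512, rng=np.random.default_rng()):
  img = np.zeros((size, size, 3), dtype=np.uint8)          # black square background
  r = int(rng.integers(10, 60))                            # circle radius
  cx = int(rng.integers(r, size - r))                      # centre x, kept inside the image
  cy = int(rng.integers(r, size - r))                      # centre y
  cv2.circle(img, (cx, cy), r, (255, 0, 0), thickness=-1)  # filled red circle
  bbox = np.array([cx - r, cy - r, cx + r, cy + r], dtype=np.float32)  # [x1, y1, x2, y2]
  label = np.array([1], dtype=np.int64)                    # one foreground class
  return img, bbox, label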

Errors and remedies

  • view size is not compatible with input tensor's size and stride

We train the model by feeding it images and bounding boxes, as shown below:

# Training loop
for epoch in range(1, args.epoch+1):
  ...
  for (inputs, targets) in t:
    ...
    losses = bench(inputs, targets)
    ...
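For reference, the bench used in this loop can be built from the effdet package roughly as follows. This is a sketch under my own assumptions: the 'tf_efficientdet_d0' variant, num_classes=1, and the optimizer choice are illustrative, not necessarily what the original blog uses.

# Sketch: creating the training bench with effdet (assumed setup)
import torch
from effdet import create_model

bench = create_model(
  'tf_efficientdet_d0',    # assumed model variant
  bench_task='train',      # wraps the network in DetBenchTrain, which computes the loss
  num_classes=1,           # a single class: the red circle
  pretrained=False,
)
optimizer = torch.optim.AdamW(bench.parameters(), lr=1e-4)

# targets is a dict of per-batch tensors, as seen in the traceback below:
# targets['bbox'] holds the bounding boxes and targets['cls'] the class labels.
# bench(inputs, targets) returns a dict of losses, with the total in losses['loss'].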

The error below occurred:

# Error message
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-7-9f3114f08672> in <cell line: 19>()
     33     targets['cls'] = targets['cls']
     34     optimizer.zero_grad()
---> 35     losses = bench(inputs, targets)
     36     loss = losses['loss']
     37     loss.backward()

/usr/local/lib/python3.9/dist-packages/effdet/anchors.py in batch_label_anchors(self, gt_boxes, gt_classes, filter_valid)
    396                     cls_targets[count:count + steps].view([feat_size[0], feat_size[1], -1]))
    397                 box_targets_out[level_idx].append(
--> 398                     box_targets[count:count + steps].view([feat_size[0], feat_size[1], -1]))
    399                 count += steps
    400                 if last_sample:

RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

The 'view' function is called to reshape the bounding box targets when the model takes images and bounding boxes as input. The error message suggests using 'reshape' instead of 'view', so I edited 'anchors.py' in the effdet package as follows:

> Line 398 Before correction
  #box_targets[count:count + steps].view([feat_size[0], feat_size[1], -1]))
> After correction
  box_targets[count:count + steps].reshape([feat_size[0], feat_size[1], -1]))
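The reason 'reshape' works where 'view' fails is that the sliced target tensor is apparently not contiguous in memory: 'view' requires a contiguous layout, while 'reshape' falls back to copying when necessary. A minimal illustration, unrelated to effdet itself:

# Sketch: view vs. reshape on a non-contiguous tensor
import torch

t = torch.arange(24).reshape(2, 3, 4)
nc = t.permute(1, 0, 2)        # same data, non-contiguous memory layout

# nc.view(3, 8)                # raises the same "view size is not compatible ..." error
print(nc.reshape(3, 8).shape)  # works, because reshape copies when it must -> torch.Size([3, 8])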
  • Labels vanish when the bounding box falls outside the image boundaries

The dataset applies augmentation before returning the data. The first step of this augmentation randomly crops the input image, which can remove the bounding box and label information entirely when the bounding box is cropped out of the original image. The sample code handles this case, but it only redefines the bounding box and not the label, which causes the error below.

class CircleDataset(Dataset):
  ...

  def __getitem__(self, idx):
    ...

    # If every bounding box was cropped away, insert a dummy box
    # (the label is not redefined here, which is the bug)
    if bboxes.shape[0] == 0:
      bboxes = torch.zeros([1, 4], dtype=bboxes.dtype)

    ...
    return x, y

  ...
# Error message
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-10-9f3114f08672> in <cell line: 19>()
     28   t = tqdm(loader, leave=False)
     29 
---> 30   for inputs, targets in t:
     31     inputs = inputs
     32     targets['bbox'] = targets['bbox']
/usr/local/lib/python3.9/dist-packages/torch/utils/data/_utils/collate.py in collate_tensor_fn(batch, collate_fn_map)
    161         storage = elem.storage()._new_shared(numel, device=elem.device)
    162         out = elem.new(storage).resize_(len(batch), *list(elem.size()))
--> 163     return torch.stack(batch, 0, out=out)
    164 
    165 
RuntimeError: stack expects each tensor to be equal size, but got [0] at entry 0 and [1] at entry 1

To avoid this, a placeholder label needs to be defined along with the placeholder bounding box when they vanish.

# After correction
if bboxes.shape[0] == 0:
    bboxes = torch.zeros([1, 4], dtype=bboxes.dtype)
    labels = torch.FloatTensor(np.array([0])) # Added
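To see why the box and the label have to be restored together, here is a sketch of a random-crop augmentation written with albumentations. It is only an illustration under my own assumptions (the blog's actual pipeline may differ): when the crop misses the box entirely, the box and its label are dropped as a pair, leaving zero-length targets.

# Sketch: a random crop can drop a bounding box and its label together (assumed pipeline)
import numpy as np
import albumentations as A

transform = A.Compose(
  [A.RandomCrop(width=256, height=256)],
  bbox_params=A.BboxParams(format='pascal_voc', label_fields=['labels']),
)

image = np.zeros((512, 512, 3), dtype=np.uint8)
bboxes = [[20, 20, 60, 60]]          # a box near the top-left corner
labels = [1]

out = transform(image=image, bboxes=bboxes, labels=labels)
print(out['bboxes'], out['labels'])  # both lists can come back empty, so both need placeholders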

Accuracy

I obtained a prediction by taking one image at random from the training set and feeding it into the trained model.

Prediction uses DetBenchPredict from the effdet package. The original image size is (3, 512, 512), while DetBenchPredict takes a batch as its input, so I added a batch dimension with the 'unsqueeze' function.

DetBenchPredict outputs an (N, 6) tensor for each image, where N is the number of predicted bounding boxes and the six elements are:

  • x-coordinate of the bounding box top-left corner
  • y-coordinate of the bounding box top-left corner
  • x-coordinate of the bounding box bottom-right corner
  • y-coordinate of the bounding box bottom-right corner
  • confidence score of the detection
  • predicted class

The code is below. A bounding box is drawn if its confidence score is over 50%.

import torch
import matplotlib.pyplot as pp
import matplotlib.patches as patches
from effdet import DetBenchPredict

image, targets = dataset.__getitem__(0)
image = image.unsqueeze(0)  # add a batch dimension: (3, 512, 512) -> (1, 3, 512, 512)

bench = DetBenchPredict(model)
with torch.no_grad():
  output = bench(image)

# Draw the predictions with over 50% confidence
fig, ax = pp.subplots()
ax.imshow(image[0].permute(1, 2, 0))  # (C, H, W) -> (H, W, C) so imshow can display it

for i in range(output.shape[1]):
  if output[0, i, 4] > 0.5:
    x1 = int(output[0, i, 0])
    y1 = int(output[0, i, 1])
    width = int(output[0, i, 2] - output[0, i, 0])
    height = int(output[0, i, 3] - output[0, i, 1])
    rect = patches.Rectangle((x1, y1), width, height, edgecolor='r', facecolor='none')
    ax.add_patch(rect)
    print(output[0, i, :])

pp.show()

The accuracy after 1 epoch looks like this:

output[0, i, :]
tensor([ 14.0453, 114.7553,  26.5884, 158.7972,   0.6781,   1.0000])
tensor([144.7045, 129.4016, 182.4770, 259.8239,   0.6156,   1.0000])
tensor([ -0.6067, 162.9664,  68.7289, 175.3027,   0.5549,   1.0000])
tensor([ -4.6260,   7.1583, 156.3810, 120.1586,   0.5246,   1.0000])
tensor([ 29.6035,  88.9964,  99.8469, 168.4458,   0.5069,   1.0000])
tensor([182.1268, 257.2897, 182.7585, 465.5251,   0.5004,   1.0000])

(Image: prediction after 1 epoch)

The accuracy after 10 epochs looks like this:

(Image: prediction after 10 epochs)
