Seth Bang

Bulk Object Detection and Cropping with DETR

Here is a link to my repo

The Python file, main.py, is an object detection application built around Facebook's DEtection TRansformer (DETR) model, loaded through the Hugging Face transformers library. It lets users detect objects in a batch of images and crop out each detection, storing the cropped images in a specified output directory.

The application provides a Graphical User Interface (GUI) developed with the tkinter library, where users can specify the input directory of images, the output directory for the cropped images, and a confidence level for the model to use in object detection.

Now, let's break down the script into sections and explain each part in detail.

Importing Required Libraries

The script begins by importing the necessary libraries; a sketch of the corresponding import block follows this list. These include:

  • tkinter: a standard Python interface to the Tk GUI toolkit, used for developing desktop applications.
  • filedialog: a tkinter module for displaying dialog boxes that let the user select files or directories.
  • PIL (Pillow): a library for opening, manipulating, and saving many different image file formats.
  • transformers: Hugging Face's library of pre-trained models for natural language processing and computer vision tasks, including the DETR model used here for object detection.
  • torch: a Python library for scientific computing, especially deep learning, providing tensors that can run on either a CPU or a GPU.
  • requests and os: standard Python libraries for handling HTTP requests and interacting with the operating system, respectively.
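
These correspond roughly to the following import block (a sketch; the exact form and order in main.py may differ):

import os
import requests

import tkinter as tk
from tkinter import filedialog

from PIL import Image
from transformers import DetrImageProcessor, DetrForObjectDetection
import torch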

Initializing the Model and Processor

The DetrImageProcessor and DetrForObjectDetection classes are imported from the transformers library. These are initialized with the pretrained DETR model from Facebook, "facebook/detr-resnet-50".
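
A minimal initialization sketch, using the checkpoint named above:

# Load the pretrained DETR checkpoint and its matching image processor
processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")
model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50")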

Defining the Image Crop Function

def image_crops(input_directory, output_directory, confidence):
    # Track how many crops have been saved per label so filenames stay unique
    counts = {}

    # Loop through every file in the input directory
    for filename in os.listdir(input_directory):
        # Get the path to the current file
        curr_path = os.path.join(input_directory, filename)

        # Open the current file as an image
        with Image.open(curr_path) as image:

            # Pass the image to the model to detect objects
            inputs = processor(images=image, return_tensors="pt")
            outputs = model(**inputs)

            # image.size is (width, height); reverse it to the (height, width)
            # order that post-processing expects
            target_sizes = torch.tensor([image.size[::-1]])

            # Post-process the model outputs to get the detected objects
            results = processor.post_process_object_detection(
                outputs, target_sizes=target_sizes, threshold=confidence)[0]

            # Loop through the detected objects
            for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
                # Round the coordinates of the detected bounding box
                box = [round(i, 2) for i in box.tolist()]
                # Get the label of the detected object
                label_text = model.config.id2label[label.item()]
                print(
                    f"Detected {label_text} with confidence "
                    f"{round(score.item(), 3)} at location {box}"
                )

                # Create a directory for the label if it does not exist
                if not os.path.exists(f"{output_directory}/{label_text}"):
                    os.mkdir(f"{output_directory}/{label_text}")

                # Crop the detected region and save it with a per-label counter;
                # convert to RGB so images with an alpha channel can be saved as JPEG
                counts[label_text] = counts.get(label_text, 0) + 1
                cropped_region = image.crop(box)
                cropped_region.convert("RGB").save(
                    f"{output_directory}/{label_text}/{label_text}_{counts[label_text]}.jpg")


The image_crops function takes three arguments: input_directory, output_directory, and confidence. It iterates through every image in input_directory, runs object detection on each one, and saves the cropped detections under output_directory. The confidence parameter is the minimum score a detection must reach to be kept; an example call is shown after the step list below.

The function performs the following steps:

  1. Loops through each image file in the input directory.
  2. Opens the image file and processes it using the DetrImageProcessor.
  3. Passes the processed image to the DetrForObjectDetection model.
  4. Gets the dimensions of the image and post-processes the model outputs to get the detected objects and their bounding boxes.
  5. Loops through each detected object and crops the image based on the bounding box coordinates.
  6. Saves the cropped image to the output directory, creating new directories for each detected object type if necessary.
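
Outside the GUI, the function could be called directly like this (the paths and threshold below are placeholders, not values from the repo):

# Crop every object detected with a score of at least 0.9
image_crops("/path/to/input_images", "/path/to/output_crops", 0.9)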

GUI Functions

Several functions are defined to interact with the GUI (a rough sketch follows the list):

  • select_input_dir: Lets the user choose the input directory.
  • select_output_dir: Lets the user choose the output directory.
  • submit: Gets the selected directories and confidence level, calls the image_crops function with these parameters, and closes the application after processing.
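
A rough sketch of these callbacks, assuming the chosen directories live in tkinter StringVar objects and the confidence comes from a Scale widget created in the next section (the actual names in main.py may differ):

def select_input_dir():
    # Ask the user for the folder containing the source images
    input_dir.set(filedialog.askdirectory(title="Select input directory"))


def select_output_dir():
    # Ask the user where the cropped images should be written
    output_dir.set(filedialog.askdirectory(title="Select output directory"))


def submit():
    # Run detection and cropping with the chosen settings, then close the window
    image_crops(input_dir.get(), output_dir.get(), confidence_slider.get())
    root.destroy()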

Building the GUI

The tkinter library is used to create the GUI. The application window is created using tk.Tk(). The GUI contains buttons for selecting the input and output directories, a slider for setting the confidence level, and a submit button to start the processing. The grid function is used to position these elements in the application window.

The mainloop function is called to start the tkinter event loop, which waits for user interaction and responds accordingly.
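
Putting those pieces together, the window setup might look roughly like this (widget labels, grid positions, and the default confidence are assumptions, not taken from the repo):

root = tk.Tk()
root.title("Bulk Object Detection and Cropping")

# Tkinter variables read and written by the callbacks above
input_dir = tk.StringVar()
output_dir = tk.StringVar()

# Buttons for choosing the input and output directories
tk.Button(root, text="Select input directory", command=select_input_dir).grid(row=0, column=0)
tk.Button(root, text="Select output directory", command=select_output_dir).grid(row=1, column=0)

# Slider for the detection confidence threshold
confidence_slider = tk.Scale(root, from_=0.0, to=1.0, resolution=0.05,
                             orient="horizontal", label="Confidence")
confidence_slider.set(0.9)
confidence_slider.grid(row=2, column=0)

# Submit button that starts the processing
tk.Button(root, text="Submit", command=submit).grid(row=3, column=0)

# Start the event loop and wait for user interaction
root.mainloop()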

The final script is a complete application that allows users to perform object detection and image cropping tasks easily. It is a great example of how powerful machine learning models can be combined with user-friendly interfaces to create practical tools.
