It is both clever and simple, and you could use this same model for many other image classification applications.
Odei Garcia-Garin et al. from the University of Barcelona have developed a deep learning-based algorithm able to detect and quantify floating garbage in aerial images. They also built a web-oriented application that lets users identify this garbage, called floating marine macro-litter, or FMML, within images of the sea surface. Floating marine macro-litter is any persistent, manufactured, or processed solid material lost or abandoned in a marine compartment. As you most certainly know, this plastic waste is dangerous for fish, turtles, and marine mammals, which can either ingest it or get entangled in it and hurt.
Traditional approaches to detecting FMML are observer-based, meaning that they require someone on a vessel or airplane to look for it. This yields precise identification but is extremely expensive and time-consuming. Fortunately, the detection can also be done using cameras or sensors on aerial vehicles. But this still requires trained scientists to manually look through the collected data, which is again extremely time-consuming. Automation is needed here, and it could help us improve the quality of our marine compartments worldwide much more effectively.
This is where machine learning and deep learning come into play. Deep learning proves over and over that it is a very powerful automation tool, especially in computer vision, where it is known to automatically identify the important features of an image without a human having to define them by hand, making this approach less time-demanding than its predecessors. As you may suspect, they used convolutional neural networks to attack this problem. This type of neural network is the most commonly used deep learning architecture in computer vision. The idea behind this architecture is to mimic the human visual system. If you want to learn more about the foundations of convolutional neural networks, or CNNs, I covered them in a separate video.
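To give a rough feel for what the convolution and pooling operations inside a CNN do, here is a minimal pure-Python sketch. It is a toy illustration and not the authors' code: the 5x5 image, the hand-picked edge filter, and the function names `convolve2d` and `max_pool2d` are all assumptions made for this example. In a real CNN the filter weights are learned from annotated data, not written by hand.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding) and sum the products.

    This is the cross-correlation that CNN "convolution" layers compute.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool2d(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window.

    Trailing rows or columns that do not fill a whole window are dropped.
    """
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Hypothetical 5x5 grayscale patch: dark sea (0) with a bright
# vertical object (9) running down the middle column.
image = [[0, 0, 9, 0, 0] for _ in range(5)]

# Hand-picked filter that responds to vertical edges (an assumption;
# a trained network would learn its own filters).
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = convolve2d(image, kernel)
pooled = max_pool2d(feature_map)

print(feature_map)  # [[27, 0, -27], [27, 0, -27], [27, 0, -27]]
print(pooled)       # [[27]]
```

The feature map is strongly positive on one side of the object and strongly negative on the other, which is how a filter "highlights" a feature. Pooling then compresses that response into a smaller summary.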
They trained their algorithm on aerial images like this one, taken by drones and aircraft, with annotations made by the same professionals who usually analyze them manually. This is a challenging task even for deep learning because of all the possible variations in color and sun reflections.
In short, their model is a regular binary classifier CNN composed of convolution and pooling layers, terms that I explained in the video I referenced earlier. It outputs a binary response telling us whether or not an input image contains FMML. The depth of the network comes from these convolution layers, which compress the image and create many feature maps (the outputs of the filters), ending with a general representation of the image that tells us “in general” what the image contains, such as FMML in this case. Note that this same architecture could be used for any other computer vision application where the task is to classify whether or not something is in the image, such as spotting a defect on a manufactured part or telling whether there is a dog or not. What they did differently, and what makes it powerful for FMML detection, is that they split each image into 25 smaller cells that each output a classification result, FMML or not, yielding much better overall accuracy.
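The cell-splitting idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions. The source says the image is split into 25 cells, but the 5x5 layout, the way cell results are combined into an image-level answer, and every function name here are hypothetical. In particular, `classify_cell` is a trivial brightness threshold standing in for the trained CNN.

```python
def split_into_cells(image, rows=5, cols=5):
    """Split a 2D image (a list of pixel rows) into rows * cols cells.

    Cells are returned in row-major order. The 5x5 default is an
    assumption; the source only states that there are 25 cells.
    """
    height, width = len(image), len(image[0])
    cells = []
    for r in range(rows):
        top, bottom = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            left, right = c * width // cols, (c + 1) * width // cols
            cells.append([row[left:right] for row in image[top:bottom]])
    return cells


def classify_cell(cell, threshold=0.5):
    """Hypothetical stand-in for the trained binary CNN.

    Flags a cell as "FMML" if any pixel is brighter than the threshold.
    """
    return any(pixel > threshold for row in cell for pixel in row)


def classify_image(image, rows=5, cols=5):
    """Classify every cell, then summarize (aggregation rule is assumed)."""
    flags = [classify_cell(cell) for cell in split_into_cells(image, rows, cols)]
    return {
        "cell_flags": flags,            # one True/False per cell
        "positive_cells": sum(flags),   # rough quantification
        "has_fmml": any(flags),         # image-level answer
    }


# Made-up 10x10 image of calm sea (0.0) with two bright pixels.
image = [[0.0] * 10 for _ in range(10)]
image[1][1] = 1.0  # lands in the top-left cell (index 0)
image[7][9] = 1.0  # lands in cell row 3, column 4 (index 19)

result = classify_image(image)
print(result["positive_cells"])  # 2
print(result["has_fmml"])        # True
```

Classifying small cells instead of the whole picture means a small piece of litter takes up a much larger share of what the classifier sees, and the per-cell results also give a rough location and count.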
Then, they used the Shiny package for R to develop their application. Their algorithm allows the detection and quantification of FMML and supports the monitoring and assessment of this environmental threat. However, it is not completely automated yet and still requires a human in the loop. As of now, they are still looking for more annotated aerial images so that their algorithm can also identify the size, color, and type of FMML, which is relevant information for planning well-targeted policy and mitigation measures.
This is still an amazing application of deep learning with a great use case that will benefit everyone.