Ayan Dutta

Retinal Fundus Image Analysis and Classification

My Final Project

Abstract

Learning effective feature representations and similarity measures is crucial to the retrieval performance of a content-based image retrieval (CBIR) system. Despite extensive research efforts over decades, it remains one of the most challenging open problems, and one that considerably hinders the success of real-world CBIR systems. The key challenge has been attributed to the well-known "semantic gap" between the low-level image pixels captured by machines and the high-level semantic concepts perceived by humans. Among various techniques, machine learning has been actively investigated as a possible long-term direction for bridging the semantic gap. Inspired by the recent successes of deep learning in computer vision and other applications, in this work we address an open question: whether deep learning offers hope for bridging the semantic gap in CBIR, and how much improvement in CBIR tasks can be achieved by applying state-of-the-art deep learning techniques to learn feature representations and similarity measures. Specifically, we investigate a deep learning framework applied to CBIR tasks through an extensive set of empirical studies, examining a state-of-the-art deep learning method (Convolutional Neural Networks) for CBIR under varied settings. From our empirical studies, we find some encouraging results and summarize some important insights for future research.

Introduction

Retinal vessel segmentation and the delineation of morphological attributes of retinal blood vessels, such as length, width, tortuosity, branching patterns, and angles, are utilized for the diagnosis, screening, treatment, and evaluation of various cardiovascular and ophthalmologic diseases such as diabetes, hypertension, arteriosclerosis, and choroidal neovascularization. Automatic detection and analysis of the vasculature can assist in the implementation of screening programs for diabetic retinopathy, can aid research on the relationship between vessel tortuosity and hypertensive retinopathy and between vessel diameter and the diagnosis of hypertension, and can support computer-assisted laser surgery. Automatic generation of retinal maps and extraction of branch points have been used for temporal or multimodal image registration and retinal image mosaic synthesis. Moreover, the retinal vascular tree has been found to be unique to each individual and can be used for biometric identification.
The retrieval performance of a content-based image retrieval system crucially depends on the feature representation and similarity measurement.
Medical imaging is fundamental to modern healthcare, and its widespread use has resulted in the creation of image databases as well as picture archiving and communication systems. These repositories now contain images from a diverse range of modalities, multidimensional (three-dimensional or time-varying) images, and co-aligned multimodality images. These image collections offer the opportunity for evidence-based diagnosis, teaching, and research; for these applications, appropriate methods are required to search the collections for images with characteristics similar to the case(s) of interest.

Content-based image retrieval (CBIR) is an image search technique that complements conventional text-based retrieval of images by using visual features, such as color, texture, and shape, as search criteria. Medical CBIR is an established field of study that is beginning to realize its promise when applied to multidimensional and multimodality medical data. State-of-the-art medical CBIR approaches fall into five main categories: two-dimensional image retrieval, retrieval of images with three or more dimensions, the use of non-image data to enhance retrieval, multimodality image retrieval, and retrieval from diverse datasets. These categories provide a framework for discussing the state of the art, focusing on the characteristics and modalities of the information used during medical image retrieval.

The key idea of distance metric learning is to learn an optimal metric that minimizes the distance between similar images while simultaneously maximizing the distance between dissimilar images.
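The pull/push behaviour of distance metric learning can be written down concretely. Below is a minimal sketch of one standard formulation, the contrastive loss, used here purely for illustration; the function name and margin value are assumptions, not part of this project's implementation:

```python
def contrastive_loss(distance, is_similar, margin=1.0):
    """Contrastive objective for distance metric learning:
    similar pairs are pulled together (penalty grows with distance),
    dissimilar pairs are pushed at least `margin` apart."""
    if is_similar:
        return distance ** 2
    return max(margin - distance, 0.0) ** 2

# A similar pair at distance 0.5 is penalised; a dissimilar pair
# already farther apart than the margin incurs no loss.
print(contrastive_loss(0.5, True))   # 0.25
print(contrastive_loss(2.0, False))  # 0.0
```

Minimizing this loss over many pairs shrinks distances between similar images and enforces a minimum separation between dissimilar ones, which is exactly the optimal-metric property described above.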
We adopt a deep learning framework for CBIR that consists of two stages: (i) training a deep learning model on a large collection of training data; and (ii) applying the trained deep model to learn feature representations for CBIR tasks in a new domain. Feature representations learned in this way have achieved state-of-the-art performance on a variety of tasks and new domains.
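As a toy sketch of how the two stages fit together, the example below substitutes a fixed random projection for the trained CNN of stage (i) (an assumption purely for illustration) and then performs stage (ii): extracting features and ranking a small database by Euclidean distance in feature space:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a trained deep model: a fixed random projection.
# In the real pipeline this would be the CNN trained in stage (i).
W = rng.normal(size=(64, 16))

def extract_features(images):
    """Stage (ii): map images (flattened to 64-dim vectors here)
    through the 'model' to get compact feature representations."""
    return images @ W

# Toy database of 5 'images' and one query that is a slightly
# perturbed copy of database item 2.
database = rng.normal(size=(5, 64))
query = database[2] + 0.01 * rng.normal(size=64)

db_feats = extract_features(database)
q_feat = extract_features(query[None, :])[0]

# Rank database items by Euclidean distance in feature space.
dists = np.linalg.norm(db_feats - q_feat, axis=1)
ranking = np.argsort(dists)
print(ranking[0])  # the near-duplicate, item 2, ranks first
```

Swapping the random projection for features taken from a trained CNN leaves the retrieval logic unchanged; only the quality of the feature space improves.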
Visual attention, a selective procedure of early human vision, plays a very important role in how humans understand a scene, intuitively emphasizing certain focused regions or objects. Motivated by this, we adopt an attention-driven image interpretation method that iteratively pops out visually attentive objects from an image by maximizing a global attention function. In this method, an image is interpreted as containing several perceptually attended objects as well as a background, where each object has an attention value. The attention values of attentive objects are then mapped to importance factors so as to facilitate the subsequent image retrieval. Experiments on 7376 Hemera color images annotated by keywords show that the retrieval results from this attention-driven approach compare favorably with conventional methods, especially when the important objects are seriously concealed by an irrelevant background.

Work Description

We trained a CNN model on the 20 DRIVE training images. Training was conducted on 19,000 patches of size 48×48 extracted from the training set. Distances to test images are likewise measured patch-wise, and patch coordinates are retained so that each patch's position in the original image can be identified. CNN features from different layers encode different levels of information: high-layer features capture more semantic information but less detail, while low-layer features contain more detail but suffer from background clutter and semantic ambiguity. We therefore measured the distance between training and test images using the four highest layers of the CNN, with the Euclidean metric as the distance measure between two images.
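The patch bookkeeping described above can be sketched as follows. The 48×48 patch size and the coordinate tracking come from the text; the non-overlapping stride and the helper names are assumptions made for this illustration:

```python
import numpy as np

def extract_patches(image, patch_size=48, stride=48):
    """Slide a window over the image, recording each patch together
    with its top-left coordinate so it can be located later in the
    original image."""
    patches, coords = [], []
    h, w = image.shape[:2]
    for y in range(0, h - patch_size + 1, stride):
        for x in range(0, w - patch_size + 1, stride):
            patches.append(image[y:y + patch_size, x:x + patch_size])
            coords.append((y, x))
    return np.array(patches), coords

def euclidean_distance(feat_a, feat_b):
    """Euclidean distance between two flattened feature vectors."""
    return np.linalg.norm(feat_a.ravel() - feat_b.ravel())

# Example: a 96x96 grayscale image yields four 48x48 patches.
img = np.zeros((96, 96), dtype=np.float32)
patches, coords = extract_patches(img)
print(patches.shape, coords)  # (4, 48, 48) [(0, 0), (0, 48), (48, 0), (48, 48)]
```

In the actual pipeline, each patch would be passed through the CNN and `euclidean_distance` applied to the resulting feature vectors from the four highest layers, rather than to raw pixels.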

Future Scope

In this project we used only the Euclidean distance to measure the similarity between two images. Further work could use other similarity measures such as the Minkowski distance (distance in a normed vector space), the Manhattan distance (rectilinear distance), or the Spearman rank correlation coefficient (a nonparametric measure of rank correlation).
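These alternative measures can be sketched directly on feature vectors. A minimal illustration follows; the rank-based Spearman implementation assumes no tied values, and all function names are choices made for this sketch:

```python
import numpy as np

def minkowski(a, b, p=3):
    """Minkowski distance in a normed vector space;
    p=1 gives Manhattan, p=2 gives Euclidean."""
    return np.sum(np.abs(a - b) ** p) ** (1.0 / p)

def manhattan(a, b):
    """Rectilinear (city-block) distance."""
    return np.sum(np.abs(a - b))

def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks.
    Assumes no ties (argsort-of-argsort ranking)."""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    ra -= ra.mean()
    rb -= rb.mean()
    return (ra @ rb) / np.sqrt((ra @ ra) * (rb @ rb))

a = np.array([1.0, 2.0, 4.0])
b = np.array([1.5, 1.0, 3.0])
print(manhattan(a, b))       # 2.5
print(minkowski(a, b, p=1))  # 2.5, matches Manhattan
print(spearman(a, np.array([10.0, 20.0, 40.0])))  # 1.0: identical ranking
```

Note that Spearman is a correlation (higher means more similar), so retrieval rankings based on it would sort in descending rather than ascending order.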
CBIR can also be applied to other datasets where a grading of images is available, such as Messidor, EyePACS, Kaggle, ROC, and DIARET DB, which may yield better results.
To increase the number of samples as well as the diversity of images, and to check the model's generalization, cross-dataset CBIR (i.e., training on one dataset and evaluating the results on another) can be applied.
