Medical image analysis is a fast-growing research area and has become an essential part of the health-care system in recent years. Successful studies have resulted in concrete products that help medical experts make decisions and help patients receive better health-care services. With machine learning and deep learning algorithms, medical experts have more tools to fight diseases.
Medical images come in different forms, such as X-ray, computed tomography (CT), magnetic resonance imaging (MRI), ultrasound imaging, and fundoscopic images. The medical expert analyzes the targeted area within one of these images to perform a diagnosis. To make this process easier, it is important to segment the regions that are meaningful for the task. Segmentation is achieved by extracting the target class while ignoring objects from other classes. There are various approaches to this segmentation problem: signal-processing-based approaches, heuristic techniques [Nguyen2013AnER], Support Vector Machine based applications, and deep learning applications [10.1007/978-3-319-46723-8_16, kamran2021rvgan]. Deep learning applications have gained popularity in recent years thanks to increasing computing power, and medical image segmentation is one of the areas where they are heavily used.
It is known that the success of deep learning models relies on the volume of input data. Supervised techniques also need annotations for this input data. Annotated data is always costly to obtain, and in medical problems it also requires a high level of expert knowledge. For this reason, the lack of annotated data is a much more serious problem in medical image analysis than in most other imaging problems. One specific area of medical imaging where this problem appears is retinal vessel images. In addition, obscured details in fundoscopic images make the decision-making process hard for medical experts. The edges of the vessels are extremely thin and quite hard to segment. The quality of the input images is also questionable: images might be noisy, and key information needed to detect diseases may be very hard to extract. A mistake in that process may cause a false positive or false negative diagnosis. The quality of the input image might be affected by illumination, sensor noise, an incorrect angle, the type of filter used in retinal fundus cameras, etc.
Retinal vessel segmentation has been widely studied and is still an active research area. Various architectures have been tailored to this specific problem, and numerous existing deep learning architectures have been used to perform the segmentation task. The scarcity of annotated data has pushed researchers to use some data augmentation in order to avoid overfitting. However, the use of data augmentation is limited in these studies. We claim that if the data augmentation strategy successfully addresses the problems of the input data, a successful segmentation model can be obtained. In our study, we look for the performance gains that can be obtained by heavy data augmentation using the U-Net architecture for retinal vessel segmentation. We use the DRIVE dataset (https://drive.grand-challenge.org/), which has become one of the standard benchmarks in retinal vessel segmentation studies. All models and methods are available at https://github.com/onurboyar/Retinal-Vessel-Segmentation
2 Related Work
The Retinal Vessel Segmentation problem has gained much interest in the literature in recent years. There are different types of segmentation strategies with varying complexities.
Medical image segmentation studies rely heavily on the U-Net architecture [ronneberger2015unet]. MResU-Net is derived from the U-Net architecture by replacing convolutional layers with residual blocks in order to achieve better accuracy and to increase the depth of the model so that it can infer more features. In [li2019iternet], the authors stack small U-Net models on top of each other in order to recover more information on the connectivity of the micro-vessels, obscured details and lost information. In [kamran2021rvgan], the authors propose a GAN [goodfellow2014generative] based approach to segment the retinal vessels, treating the segmentation problem as an image translation problem. The RV-GAN architecture has reported the best accuracy and AUC metrics so far. A cross-connected Convolutional Neural Network architecture has also been proposed to perform the segmentation. In [article], the authors propose a U-Net based model combined with a Residual U-Net. They argue that their model offers a good trade-off between model accuracy and training time. In [Jin_2019], the authors propose DU-Net, a U-Net based model which uses Deformable Convolutional Networks. To train the model, they crop the input images into small patches. Cropping the input image into small patches is a strategy to overcome the scarcity of annotated training data.
Data augmentation techniques have been widely used in various problems: not only in medical image segmentation or classification [article3], but also in a wide variety of computer vision problems and in problems which rely on tabular data. We limit our review to their usage in image segmentation problems. In [ronneberger2015unet], the U-Net architecture is proposed for the first time, together with its usage with data augmentation. In [EatonRosen2018ImprovingDA], the authors investigate the performance gain of using data augmentation in medical image segmentation and discuss the common problems of medical image segmentation caused by the scarcity of available annotated data. A comprehensive study of data augmentation in deep learning models is reported in [Shorten2019ASO]. Data augmentation in retinal image segmentation is not a widely studied area, and our study is one of the most comprehensive studies exploring the performance gains from data augmentation in retinal vessel segmentation. In [10.1145/3348416.3348425], the authors studied the effect of data augmentation in retinal image segmentation. However, their study is limited to rotation-based augmentations. In [sun2020robust], the authors studied data augmentation in retinal image segmentation and proposed a method that gives a robust segmentation model. In [10.3389/fncom.2019.00083], the authors studied data augmentation techniques for the brain tumor segmentation problem.
In addition to elastic transformations and affine image transformations such as rotation, flipping, scaling and cropping, they also study generative techniques for data augmentation. Generative Adversarial Networks have been widely used in medical image segmentation problems, both as a data augmentation technique and as the segmentation model itself [article2, 10.3389/fncom.2019.00083].
|Affine Transformation|Elastic Transformation|Pixel-level Transformation|
|---|---|---|
|Rotation|Elastic Deformations|White Noise|
|Flipping|Grid Distortion|Gamma Correction|
|Zoom Out|Optical Distortion|Equalize Histogram|
The most widely used dataset for the retinal vessel segmentation problem is the DRIVE dataset. Papers in recent years have reported their model performance metrics on this dataset. The DRIVE dataset has 20 training and 20 test images. Each image in the training set addresses a different kind of disease. It is quite challenging to create a high-performance model with this amount of annotated data. We use the DRIVE dataset to address these problems.
4 Experimental Setup
4.1 Problem Definition
The problem is the segmentation of retinal images when the input data has quality problems and the amount of annotated data is very limited. Problems with the input data might occur due to illumination, sensor noise, the filters of the retinal camera, the input image angle, and other noise. Using data with such defects limits the capability of the segmentation model. Most of the time the model is unable to segment the noisy regions because it does not have enough information about them. For example, in the DRIVE dataset, each image in the training set addresses a different disease, and each image may come with different quality problems. Combined with the scarcity of input images in medical imaging, the problem at hand becomes even more difficult. If one can identify the problems of the input images well, the quality of the segmentation model can be increased with the help of data augmentation.
4.2 Proposed Methodology
We propose that a well-chosen data augmentation strategy can be successful for retinal vessel segmentation using the simple U-Net architecture [ronneberger2015unet]. Data augmentation techniques are helpful for three reasons. The first reason is that the input data is very scarce: data augmentation increases the number of training images and gives the model extra information to learn from. The second reason is that data augmentation can recover some of the performance lost to poor image quality. If the practitioner chooses the data augmentation techniques based on the problems of the input images, the segmentation performance can be increased. The third reason is related to the segmentation model that is used. In our study, we use the U-Net architecture, which relies on pooling operations; as a result, the model learns relatively less from the corners and sides of the input image.
Data augmentation strategies can address these three problems. To address the third problem, we add versions of the input images rotated by various angles to the dataset. To let the model learn more from noisy images, we use augmentation techniques that add noise to the original image. The noise is drawn from a normal distribution with mean 0 and standard deviation $\epsilon$; in our study we use augmentations with different $\epsilon$ values, each greater than or equal to 1. Another technique we use is dropout applied to the input pixels: pixels of the input image are set to zero at random, and the proportion of pixels to be set to zero is a parameter that needs to be defined. It is well known that minor vessels are among the hardest regions to segment. Figure 2 shows the predicted and ground truth vessels together. It can be seen that the majority of the incorrect segmentations occur at the minor vessels. One way to improve the segmentation model might be to zoom into these regions and add the zoomed images to the dataset. Randomly cropping the input image at random sizes might be another strategy here. We also use shifting and flipping of the input image, two further successful data augmentation techniques. They are widely used in U-Net training because U-Net uses convolutional filters, which miss information at the edges of images. Shifting moves the edges of an image toward its center so that the U-Net model can learn the information at the edges of the original image from the augmented image. To address image quality problems caused by the brightness of the input image, we use the gamma correction technique. The full list of augmentations is shown in Table 1.
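To make the pixel-level and shifting augmentations described above concrete, the following is a minimal NumPy sketch, not the implementation used in the study. The function names, the 0 to 255 intensity scale for the noise standard deviation, the zero fill used for shifted borders, and the choice to drop all channels of a pixel together are assumptions made for illustration.

```python
import numpy as np


def add_gaussian_noise(image, epsilon, rng):
    """Add zero-mean Gaussian noise with standard deviation `epsilon`.

    Assumption: `image` is uint8 and `epsilon` is on the 0-255 scale.
    """
    noisy = image.astype(np.float64) + rng.normal(0.0, epsilon, size=image.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)


def pixel_dropout(image, rate, rng):
    """Set roughly a fraction `rate` of the pixel positions to zero at random."""
    keep = rng.random(image.shape[:2]) >= rate
    if image.ndim == 3:
        keep = keep[..., np.newaxis]  # drop all channels of a pixel together
    return image * keep.astype(image.dtype)


def gamma_correction(image, gamma):
    """Apply out = 255 * (in / 255) ** gamma; gamma < 1 brightens the image."""
    scaled = image.astype(np.float64) / 255.0
    return np.rint(255.0 * scaled ** gamma).astype(np.uint8)


def shift(image, dy, dx):
    """Translate by (dy, dx) pixels, filling the uncovered border with zeros."""
    out = np.zeros_like(image)
    h, w = image.shape[:2]
    src_y = slice(max(0, -dy), min(h, h - dy))
    src_x = slice(max(0, -dx), min(w, w - dx))
    dst_y = slice(max(0, dy), min(h, h + dy))
    dst_x = slice(max(0, dx), min(w, w + dx))
    out[dst_y, dst_x] = image[src_y, src_x]
    return out


def flip_pair(image, mask):
    """Flip image and vessel mask horizontally so they stay aligned."""
    return np.fliplr(image), np.fliplr(mask)
```

Geometric augmentations (shifting, flipping, rotation) have to be applied identically to the image and its vessel mask, whereas noise, dropout and gamma correction change only the image and leave the mask untouched.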
4.3 Implementation Details
For our method, we use the same architecture as U-Net [ronneberger2015unet]. We train our models with the Adam optimizer [kingma2017adam] with a learning rate of 1e-4. We do not use any learning rate scheduling algorithms. For experiments on DRIVE, we use mini-batches of size 3. We use a dropout probability of 0.1 on the fourth and fifth convolutional layers. Training uses the binary cross-entropy loss. We also experimented with dice loss and with the combination of binary cross-entropy loss and dice loss [Sudre_2017]. Nevertheless, the best results were obtained using binary cross-entropy loss.
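The three loss functions mentioned above can be written down compactly. The sketch below uses plain NumPy rather than Keras so that it stays self-contained; the smoothing constant in the dice loss and the equal weighting of the combined loss are assumptions, since the text does not state how the two terms were combined.

```python
import numpy as np


def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy between a 0/1 mask and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    losses = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    return float(np.mean(losses))


def dice_loss(y_true, y_pred, smooth=1.0):
    """Soft dice loss: 1 - (2|A.B| + s) / (|A| + |B| + s).

    `smooth` is an assumed constant that keeps the ratio defined for empty masks.
    """
    intersection = np.sum(y_true * y_pred)
    dice = (2.0 * intersection + smooth) / (np.sum(y_true) + np.sum(y_pred) + smooth)
    return float(1.0 - dice)


def bce_dice_loss(y_true, y_pred, weight=0.5):
    """Weighted sum of the two losses; the 0.5 weight is an assumption."""
    bce = binary_cross_entropy(y_true, y_pred)
    return weight * bce + (1.0 - weight) * dice_loss(y_true, y_pred)
```

Binary cross-entropy scores every pixel independently, while dice loss scores the overlap of the whole predicted mask with the ground truth, which is why the two are often tried together on class-imbalanced problems such as thin-vessel segmentation.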
We implement all our models in Keras [chollet2015] and train them on a single RTX 2080. We train our models for 15 epochs; each epoch takes 10-15 minutes. We do not use pre-trained weights in our experiments.
|Method|Year|AUC|Accuracy|
|---|---|---|---|
|Residual UNet|2019|0.9779|-|
|Wang et al. [10.1145/3348416.3348425]|2019|0.9814|0.9573|
|Sun et al. [sun2020robust]|2020|0.9788|0.9545|
As seen in Table 2, our approach achieves an Area Under the Curve (AUC) of 0.9855 and an accuracy of 0.9712. It outperforms most models in the literature that have more complex architectures. Our model also outperforms other data augmentation based studies in the literature [10.1145/3348416.3348425, sun2020robust]. RV-GAN [kamran2021rvgan] has the highest AUC and accuracy scores; however, training their model takes up to 48 hours. Even though we train our model for only about 3 hours, we are just 0.0032 AUC and 0.0078 accuracy points behind RV-GAN. The studies reported in Table 2 did not include the dice coefficient of their models, although it is a common metric in image segmentation tasks. Our method achieves a mean dice score of 0.8255 on the DRIVE dataset.
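For reference, the three metrics discussed here can be computed from a ground truth mask and a predicted probability map as sketched below. This is an illustrative NumPy version, not the evaluation code of the study; the 0.5 binarization threshold is an assumption, and the sketch does not restrict evaluation to the field-of-view mask, since the text does not say how either was handled.

```python
import numpy as np


def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of pixels whose thresholded prediction matches the mask."""
    return float(np.mean((y_prob >= threshold) == (y_true == 1)))


def dice_score(y_true, y_prob, threshold=0.5):
    """Dice coefficient 2|A n B| / (|A| + |B|) on the thresholded prediction."""
    pred = y_prob >= threshold
    truth = y_true == 1
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return float(2.0 * np.logical_and(pred, truth).sum() / denom)


def roc_auc(y_true, y_prob):
    """AUC via the rank-sum (Mann-Whitney U) formulation, ties counted as half."""
    truth = y_true.ravel() == 1
    scores = y_prob.ravel()
    n_pos = int(truth.sum())
    n_neg = int(truth.size - n_pos)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one vessel and one background pixel")
    # Average 1-based rank of each score, with tied scores sharing a rank.
    _, inverse, counts = np.unique(scores, return_inverse=True, return_counts=True)
    avg_rank = np.cumsum(counts) - (counts - 1) / 2.0
    ranks = avg_rank[inverse]
    u_stat = ranks[truth].sum() - n_pos * (n_pos + 1) / 2.0
    return float(u_stat / (n_pos * n_neg))
```

AUC is independent of the threshold, whereas accuracy and dice depend on it. Because vessel pixels are a small minority of each image, accuracy can stay high even when thin vessels are missed, which is one reason the dice score is a useful complement.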
In this paper, we propose a segmentation model that relies on heavy use of data augmentation techniques. We use the standard U-Net architecture and outperform most of the more complex architectures in the literature. The data augmentation strategy is guided by the problems in the input images caused by the fundus camera and the environment. Various types of data augmentation techniques are used individually and collectively. The strategy also takes the drawbacks of the U-Net architecture into account and uses various rotated versions of the images so that valuable information is not lost due to pooling operations.