Exploring The Limits Of Data Augmentation For Retinal Vessel Segmentation

05/19/2021 ∙ by Enes Sadi Uysal, et al.

Retinal vessel segmentation is important for the diagnosis of various diseases. Research on retinal vessel segmentation focuses mainly on improving the segmentation model, which is usually based on the U-Net architecture. In our study we use the U-Net architecture and rely on heavy data augmentation to achieve better performance. The success of data augmentation depends on how well it addresses the problems of the input images. By analyzing the input images and performing the augmentation accordingly, we show that the performance of the U-Net model can be increased dramatically. Results are reported on the most widely used retina dataset, DRIVE.




1 Introduction

Medical image analysis is a fast-growing research area that has become an essential part of the health-care system in recent years. The success of various studies has resulted in concrete products that help medical experts make decisions and help patients receive better health-care services. With the help of machine learning and deep learning algorithms, medical experts have more tools to fight diseases.

Medical images come in different forms, such as X-ray, computed tomography (CT), magnetic resonance imaging (MRI), ultrasound imaging, and fundoscopic images. The medical expert analyzes the targeted area within one of these images to perform a diagnosis. To make this process easier, it is important to segment the regions that are meaningful for the task. Segmentation is achieved by extracting the target class while ignoring objects from other classes. There are various approaches to this segmentation problem: signal processing based approaches [1677727], heuristic techniques, Support Vector Machine based applications [4336179], and deep learning applications [10.1007/978-3-319-46723-8_16, kamran2021rvgan]. Deep learning applications have gained popularity in recent years thanks to increasing computing power, and medical image segmentation is one of the areas in which they are heavily used.

It is known that the success of deep learning models relies on the volume of input data. Supervised techniques also need annotations for this input data. Annotated data is always costly to obtain, and in medical problems it also requires a high level of expert knowledge. For this reason, the lack of annotated data is a much more serious problem in medical image analysis than in most other imaging problems. Retinal vessel imaging is one of the specific areas of medical imaging where this problem appears. In addition, obscured details in fundoscopic images make the decision-making process hard for medical experts. The edges of the vessels are extremely thin and quite hard to segment. The quality of the input images is also questionable: they might be noisy, and the key information needed to detect diseases may be very hard to extract. A mistake in that process may cause a false positive or false negative diagnosis. The quality of the input image might be affected by illumination, sensor noise, an incorrect angle, the type of filter used in the retinal fundus camera, etc.

Retinal vessel segmentation has been widely studied and is still an active research area. Various architectures have been tailored to this specific problem, and numerous existing deep learning architectures have been used to perform the segmentation task. The scarcity of annotated data has pushed researchers to use data augmentation to some extent in order to avoid overfitting. However, the use of data augmentation in these studies is limited. We claim that if the data augmentation strategy successfully addresses the problems of the input data, a successful segmentation model can be obtained. In our study, we look for the performance gains that can be obtained from extensive data augmentation with the U-Net architecture for retinal vessel segmentation. We use the DRIVE dataset (https://drive.grand-challenge.org/), which has become one of the standard benchmarks in retinal vessel segmentation studies. All models and methods are available at https://github.com/onurboyar/Retinal-Vessel-Segmentation

Figure 1: Training sample from DRIVE dataset.

2 Related Work

The Retinal Vessel Segmentation problem has gained much interest in the literature in recent years. There are different types of segmentation strategies with varying complexities.

Medical image segmentation studies rely heavily on the U-Net architecture [ronneberger2015unet]. In [8803101], MResU-Net is derived from the U-Net architecture by replacing convolutional layers with residual blocks in order to achieve better accuracy and to increase the depth of the model so that it can infer more features. In [li2019iternet], the authors stack small U-Net models on top of each other in order to get more information on the connectivity of the micro-vessels, obscured details and lost information. In [kamran2021rvgan], the authors propose a GAN [goodfellow2014generative] based approach to segment the retinal vessels, treating the segmentation problem as an image translation problem. The RV-GAN architecture has reported the best accuracy and AUC metrics so far. In [8759448], the authors propose a cross-connected Convolutional Neural Network architecture to perform the segmentation. In [article], the authors propose a U-Net based model combined with a Residual U-Net and argue that it offers a good trade-off between model accuracy and training time. In [Jin_2019], the authors propose DU-Net, a U-Net based model that uses Deformable Convolutional Networks. To train the model, they crop the input images into small patches, which is a strategy for overcoming the scarcity of annotated training data.

Data augmentation techniques have been widely used in various problems: not only in medical image segmentation or classification [article3], but also in a wide variety of computer vision problems and in problems that rely on tabular data. We limit our review to their usage in image segmentation problems. In [ronneberger2015unet], the U-Net architecture is proposed for the first time, together with its usage with data augmentation. In [EatonRosen2018ImprovingDA], the authors investigate the performance gain of using data augmentation in medical image segmentation and discuss the common problems of medical image segmentation caused by the scarcity of annotated data. A comprehensive study of data augmentation in deep learning models is reported in [Shorten2019ASO].

Data augmentation for retinal image segmentation is not a widely studied area, and our study is one of the most comprehensive explorations of the performance gains from data augmentation in retinal vessel segmentation. In [10.1145/3348416.3348425], the authors study the effect of data augmentation in retinal image segmentation, but their study is limited to rotation-based augmentations. In [sun2020robust], the authors study data augmentation in retinal image segmentation and propose a method that gives a robust segmentation model. In [10.3389/fncom.2019.00083], the authors study data augmentation techniques for the brain tumor segmentation problem. Other than elastic transformations and affine image transformations (rotation, flipping, scaling, cropping, etc.), they also study generative techniques for data augmentation. Generative Adversarial Networks have been widely used in medical image segmentation problems, both for data augmentation and as the segmentation model itself [article2, 10.3389/fncom.2019.00083].

Affine Transformation   Elastic Transformation   Pixel-level Transformation
Rotation                Elastic Deformations     White Noise
Flipping                Grid Distortion          Gamma Correction
Zoom Out                Optical Distortion       Equalize Histogram
Random Cropping         -                        Dropout
Shifting                -                        Sharpening
Shearing                -                        Blurring
Table 1: List of augmentations we use.
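
In a segmentation setting, a geometric augmentation has to be applied identically to an image and its vessel mask, otherwise the annotation no longer lines up with the pixels. The following is a minimal NumPy sketch of four of the affine operations in Table 1 under that constraint. It is not the authors' implementation: the function names are hypothetical, rotation is restricted to multiples of 90 degrees to avoid interpolation (the paper uses various angles), and shifting assumes the offset is smaller than the image.

```python
import numpy as np

def rotate90(image, mask, k):
    """Rotate image and mask together by k * 90 degrees (simplifying assumption)."""
    return np.rot90(image, k, axes=(0, 1)), np.rot90(mask, k, axes=(0, 1))

def flip(image, mask, axis):
    """Flip image and mask along the same axis (0 = vertical, 1 = horizontal)."""
    return np.flip(image, axis=axis), np.flip(mask, axis=axis)

def shift(image, mask, dy, dx):
    """Shift both arrays by (dy, dx) pixels, filling the vacated border with zeros.

    Assumes |dy| < height and |dx| < width.
    """
    def _shift(a):
        out = np.zeros_like(a)
        h, w = a.shape[:2]
        src = a[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
        top, left = max(0, dy), max(0, dx)
        out[top:top + src.shape[0], left:left + src.shape[1]] = src
        return out
    return _shift(image), _shift(mask)

def random_crop(image, mask, size, rng):
    """Cut the same random size x size window out of image and mask."""
    h, w = image.shape[:2]
    top = int(rng.integers(0, h - size + 1))
    left = int(rng.integers(0, w - size + 1))
    window = (slice(top, top + size), slice(left, left + size))
    return image[window], mask[window]
```

Shifting moves content that was near the border toward the center of the frame, which is the motivation given for it in the methodology section.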

3 Dataset

The most widely used dataset for the retinal vessel segmentation problem is DRIVE, and recent papers routinely report their model performance metrics on it. The DRIVE dataset has 20 training and 20 test images. Each image in the training set addresses a different kind of disease. It is quite challenging to create a high-performance model with this amount of annotated data, which makes DRIVE a suitable dataset for studying these problems.

4 Experimental Setup

4.1 Problem Definition

The problem is the segmentation of retinal images when the input data has quality problems and the amount of annotated data is very limited. Problems with the input data might occur due to illumination, sensor noise, the filters of the retinal camera, the angle of the input image, and other noise sources. Using data with such defects limits the capability of the segmentation model. Most of the time the model is unable to segment the noisy regions because it does not have enough information about them. For example, in the DRIVE dataset, each image in the training set addresses a different disease, and each image may come with different quality problems. Combined with the scarcity of input images that is typical of medical imaging, this makes the problem at hand even more difficult. If one can identify the problems of the input images well, the quality of the segmentation model can be increased with the help of data augmentation.

4.2 Proposed Methodology

We propose that a well-chosen data augmentation strategy can be successful for retinal vessel segmentation using the simple U-Net architecture [ronneberger2015unet]. Data augmentation techniques are helpful for three reasons. The first is that the input data is very scarce: data augmentation increases the number of training images and provides the model with extra information to learn from. The second is that data augmentation can recover some of the performance loss caused by poor image quality. If the practitioner chooses the data augmentation techniques based on the problems of the input images, the segmentation performance can be increased. The third reason concerns the segmentation model itself. In our study, we use the U-Net architecture, which makes use of pooling operations, so the model learns relatively less from the corners and sides of the input image.

Data augmentation strategies can address these three problems. To address the third problem, we add rotated versions of the input images to the dataset using various angles. To let the model learn more from noisy images, we use data augmentation techniques that add noise to the original image. The noise is drawn from a normal distribution with mean 0 and standard deviation $\epsilon$. In our study we use augmentations with different $\epsilon$ values, each greater than or equal to 1. Another technique we use is dropout, which targets input pixels: pixels of the input image are set to zero at random, and the portion of pixels to be set to zero is a parameter that needs to be defined.

It is well known that minor vessels are among the hardest regions to segment. Figure 2 shows the predicted and ground truth vessels together. It can be seen that the majority of the incorrect segmentations occur at the minor vessels. One way to improve the segmentation model might be to zoom into these regions and add the zoomed images to the dataset. Randomly cropping the input image at random sizes might be another strategy here. We also use shifting and flipping of the input image, two further successful data augmentation techniques. They are widely used in U-Net model training because U-Net uses convolutional filters, which miss information at the edges of images. Shifting pushes the edges of an image toward the center, so that the U-Net model can learn the information at the edges of the original image from the augmented image. To address the image quality problems caused by the brightness of the input image, we use gamma correction. The full list of augmentations is shown in Table 1.
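
The pixel-level augmentations described above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than the authors' code: the function names and parameters are hypothetical, images are assumed to be 8-bit arrays on a 0-255 scale (so that a noise standard deviation of 1 or more is meaningful), and dropout is assumed to zero a pixel in all channels at once.

```python
import numpy as np

def add_white_noise(image, epsilon, rng):
    """Add zero-mean Gaussian noise with standard deviation epsilon (0-255 scale)."""
    noisy = image.astype(np.float64) + rng.normal(0.0, epsilon, size=image.shape)
    return np.clip(np.round(noisy), 0, 255).astype(np.uint8)

def pixel_dropout(image, rate, rng):
    """Set roughly a fraction `rate` of pixel positions to zero, chosen at random."""
    keep = rng.random(image.shape[:2]) >= rate
    if image.ndim == 3:
        keep = keep[..., None]  # broadcast the same decision over all channels
    return image * keep

def gamma_correction(image, gamma):
    """Apply out = 255 * (in / 255) ** gamma; gamma < 1 brightens, gamma > 1 darkens."""
    normalized = image.astype(np.float64) / 255.0
    return np.round(255.0 * normalized ** gamma).astype(np.uint8)
```

Unlike the geometric operations, these transformations change only the image, so the vessel mask can be reused unchanged for each augmented copy.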

Figure 2: Two examples of predictions overlaid on ground truths. Red pixels are predicted vessels, white pixels are ground truth pixels. In (a), the model's errors occur mainly at the minor vessels, while thick vessels are segmented well, as can be seen on the right-hand side of (a). The model performs better in (b), where minor vessels are segmented better than in (a).

4.3 Implementation Details

For our method, we use the same architecture as U-Net [ronneberger2015unet]. We train our models with the Adam optimizer [kingma2017adam] with a learning rate of 1e-4. We don't use any learning rate scheduling algorithms. For experiments on DRIVE, we use mini-batches of size 3. We use a dropout probability of 0.1 on the fourth and fifth convolutional layers. Training is done with binary cross-entropy loss. We also experimented with dice loss and with the combination of binary cross-entropy loss and dice loss [Sudre_2017]. Nevertheless, the best results were obtained using binary cross-entropy loss.
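
For reference, the three losses mentioned above can be written down as follows. This is a NumPy sketch for illustration only, not the Keras code used for training: the smoothing constant, the clipping value and the equal weighting in the combined loss are assumptions, since the text does not state how the two terms were combined.

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy; predictions are clipped to avoid log(0)."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    losses = y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred)
    return float(-np.mean(losses))

def dice_loss(y_true, y_pred, smooth=1.0):
    """1 - soft Dice coefficient; `smooth` (assumed value) avoids division by zero."""
    intersection = np.sum(y_true * y_pred)
    dice = (2.0 * intersection + smooth) / (np.sum(y_true) + np.sum(y_pred) + smooth)
    return float(1.0 - dice)

def combined_loss(y_true, y_pred, weight=0.5):
    """Weighted sum of both losses; the 0.5 weighting is an assumption."""
    bce = binary_cross_entropy(y_true, y_pred)
    return weight * bce + (1.0 - weight) * dice_loss(y_true, y_pred)
```

Here `y_true` is the binary vessel mask and `y_pred` holds the per-pixel vessel probabilities produced by the network.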

We implement all our models in Keras and train them on a single RTX 2080. We train our models for 15 epochs, and each epoch takes 10-15 minutes. We don't use pre-trained weights in our experiments.

5 Results

Paper                                    Year  AUC     Accuracy
U-Net [Jin_2019]                         2018  0.9830  0.9681
Residual UNet [8803101]                  2019  0.9779  -
IterNet [li2019iternet]                  2019  0.9816  0.9574
Wang et al. [10.1145/3348416.3348425]    2019  0.9814  0.9573
Sun et al. [sun2020robust]               2020  0.9788  0.9545
CcNet [FENG2020268]                      2020  0.9678  0.9528
SUD-GAN [sudgan]                         2020  0.9786  0.9560
RV-GAN [kamran2021rvgan]                 2020  0.9887  0.9790
This Paper                               2021  0.9855  0.9712
Table 2: Performance comparison on the DRIVE dataset.

As seen in Table 2, our approach achieves an Area Under the Curve (AUC) of 0.9855 and an accuracy of 0.9712. It outperforms most of the models in the literature that have more complex architectures. Our model also outperforms the other data augmentation based studies in the literature [10.1145/3348416.3348425, sun2020robust]. RV-GAN [kamran2021rvgan] has the highest AUC and accuracy scores. However, training their model takes up to 48 hours. Even though we train our model for only about 3 hours, we are just 0.0032 AUC and 0.0078 accuracy points behind RV-GAN. The studies reported in Table 2 did not include the dice coefficient of their models, although it is a common metric in image segmentation tasks. Our method achieves a mean dice score of 0.8255 on the DRIVE dataset.
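
To make the relation between the reported metrics concrete, the sketch below computes pixel accuracy and the dice score for a pair of binary masks. It is a minimal illustration with hypothetical function names and a toy example, not the evaluation code behind Table 2; returning 1.0 when both masks are empty is a convention assumed here.

```python
import numpy as np

def pixel_accuracy(truth, pred):
    """Fraction of pixels whose predicted label matches the ground truth."""
    return float(np.mean(truth.astype(bool) == pred.astype(bool)))

def dice_score(truth, pred):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two binary masks."""
    truth = truth.astype(bool)
    pred = pred.astype(bool)
    denom = truth.sum() + pred.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement (assumption)
    return float(2.0 * np.logical_and(truth, pred).sum() / denom)

# Toy example: two vessel pixels in the ground truth, one of them missed.
truth = np.array([[1, 1, 0, 0],
                  [0, 0, 0, 0]])
pred = np.array([[1, 0, 0, 0],
                 [0, 0, 0, 0]])
print(pixel_accuracy(truth, pred), dice_score(truth, pred))
```

In the toy example a single missed vessel pixel leaves accuracy at 7/8 while the dice score falls to 2/3. Because vessel pixels are a minority of a fundus image, accuracy is dominated by the background, which is one reason a dice score is informative alongside accuracy and AUC.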

6 Conclusion

In this paper, we propose a segmentation model that relies on the heavy usage of data augmentation techniques. We use the standard U-Net architecture and outperform most of the more complex architectures in the literature. The data augmentation strategy is governed by the problems of the input images that are caused by the fundus camera and the environment. Various types of data augmentation techniques are used individually and collectively. The strategy also takes the drawbacks of the U-Net architecture into account: it adds various rotated versions of the images so that valuable information is not lost due to pooling operations.