Automating Vitiligo Skin Lesion Segmentation Using Convolutional Neural Networks

by Makena Low, et al.

For several skin conditions such as vitiligo, accurate segmentation of lesions from skin images is the primary measure of disease progression and severity. Existing methods for vitiligo lesion segmentation require manual intervention. Unfortunately, manual segmentation is time- and labor-intensive, as well as irreproducible between physicians. We introduce a convolutional neural network (CNN) that quickly and robustly performs vitiligo skin lesion segmentation. Our CNN has a U-Net architecture with a modified contracting path. We use the CNN to generate an initial segmentation of the lesion, then refine it by running the watershed algorithm on high-confidence pixels. We train the network on 247 images with a variety of lesion sizes, complexity, and anatomical sites. The network with our modifications noticeably outperforms the state-of-the-art U-Net, with a Jaccard Index (JI) score of 73.6% (compared to 36.7%). The prediction for a new image takes less than a second, in contrast with the previously proposed semi-autonomous watershed approach, which requires 2-29 minutes per image.




1 Introduction

Vitiligo is a skin condition in which patches of skin become depigmented, as shown in Fig. 1. It affects 0.5-2% of the population, can develop in anyone, and though not physically painful, can harm patients psychologically, socially, and professionally [hazel-jemmott_hazel-jemmott_2016][amer2016quality][salzes2016vitiligo]. The body surface area (BSA) affected by vitiligo is the main measure of the condition’s severity and progression. BSA measurements must be consistent for proper clinical care, translational research efforts, and assessment of the efficacy of treatment. For instance, a physician’s visual estimation of the percentage of vitiligo-affected BSA informs the Vitiligo Area Scoring Index (VASI) and Vitiligo European Task Force (VETF) metrics. Both measures can only detect large changes in lesion area, with the smallest detectable change being between 7.1% and 10.4% of total BSA [komen2015vitiligo]. Current segmentation practices are mainly manual. Not only does this hinder accurate and reproducible readings, but it is also time-inefficient and labor-intensive. Moreover, non-dermatologists often perform these segmentations, even though they lack a rigorous background for such reviews [raina2018energy]. This study aims to introduce a novel solution to this issue.

Figure 1: Examples of vitiligo lesions with different sizes, complexity, and anatomical sites.

A convolutional neural network (CNN) is a promising approach for solving complex skin segmentation challenges. CNNs for skin cancer segmentation are already in widespread use, in large part due to the International Skin Imaging Collaboration (ISIC) Skin Lesion Analysis Towards Melanoma Detection competition [codella2018skin]. However, vitiligo is seldom the subject of such segmentation studies. One study that uses CNNs for vitiligo segmentation is very data-intensive: it presents a model trained on about 40,000 images, a dataset much larger than ours and most other medical datasets [liu2019classification].

Figure 2: Illustration of watershed algorithm with manual seeding (left), the resulting contour (middle), and segmented output (right).

Researchers have also explored less computationally intensive techniques than CNNs. One study attempted to quantify treatment efficacy by using a computerized digital imaging analysis system (C-DIAS) [shamsudin2015objective]. Sheth et al. leveraged standard color image processing techniques to create an automatic vitiligo segmentation program; however, this approach does not perform well when tested on large surface areas [sheth2015pilot]. To address this, Raina et al. created a graphical user interface (GUI) with a semi-autonomous version of the watershed algorithm for lesion segmentation [raina2018energy][roerdink2000watershed]. The tool succeeds in outputting subtle contours for full-body images, but it requires “seeds” from the user to define the background (environment and healthy skin) and foreground (affected skin), as shown in Fig. 2 (left) in red and green. This semi-manual segmentation process requires significant work when lesions are complex, as shown in Fig. 1 (right). Our work addresses these shortcomings.

We introduce a CNN that achieves a high Jaccard Index score (intersection over union) of 73.6% with 247 training images. Our models are based on the end-to-end U-Net [ronneberger2015u] architecture. We substitute the contracting path with a popular semantic segmentation CNN that serves as a feature extractor. Our work investigates VGG16, ResNet50, InceptionV3, InceptionResNetV2, and SENet154 as contracting path enhancers [simonyan2014very][he2016deep][szegedy2016rethinking][szegedy2017inception][hu2018squeeze]. We also experiment with watershed-based post-processing; after classification, the high-confidence pixels are fed as seeds to the watershed algorithm [roerdink2000watershed]. We find that an InceptionResNetV2 contracting path performs the best out of all our explored architectures. Our method drastically reduces segmentation time compared to the watershed GUI and produces reproducible output.

2 Methods

2.1 Vitiligo Image Samples and Annotation

Our dataset consists of 308 red/green/blue (RGB) images of vitiligo lesions compiled by the UC Davis Medical Center. The lesions range widely in skin tone and anatomical location. Physicians have taken the images from several angles, at different levels of brightness, and either in ultraviolet (UV) or natural lighting. We derive the ground truth segmentation output from the semi-autonomous watershed GUI and manual edits. Each ground truth output image is a binary mask of the lesion, where zero (black) represents healthy skin or the environment, and 255 (white) represents vitiligo. The dataset is split such that 60% is for training the model (188 images), 20% is for validating the model (66 images), and 20% for testing the model on unseen data (61 images).

Figure 3: U-Net with a ResNet50 contracting path.

2.2 Evaluation Metric

We evaluate the network’s performance using the pixel-wise Intersection over Union (IoU) metric:

$$\mathrm{IoU}(A, B) = \frac{|A \cap B|}{|A \cup B|}$$

where A is the set of pixels predicted as vitiligo and B is the set of pixels labeled as vitiligo in the ground truth. This metric is also known as the Jaccard Index (JI) [codella2018skin]. For each image, we calculate the JI between every classified pixel and the corresponding ground truth pixel. The JI of the image is the average of the pixel-wise JI scores. Previous analysis suggests that the JI is too optimistic because it does not account for the labor required to correct an inaccurate segmentation [codella2017deep]. Thus, we also compute a thresholded Jaccard Index to account for segmentations that do not fall within professional inter-observer variability. If the average JI is less than 65%, we set the score to 0% for the image. Otherwise, the JI is unchanged. The threshold of 65% was determined by ISIC [codella2018skin]. Although ISIC focuses on melanoma segmentation, the human labor required for a similar evaluation with vitiligo was not feasible for us; we suppose that the ISIC threshold is a fair estimate. The evaluation metric for our networks is the average of the thresholded JI scores for the images in the validation set.
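The thresholding rule described above can be illustrated with a minimal Python sketch. It assumes binary masks stored as flat lists of 0/1 values and uses the standard set-based JI, which may differ in detail from the pixel-wise averaging used in the study; the function names are hypothetical and chosen only for this example.

```python
def jaccard_index(pred, truth):
    """Set-based Jaccard Index (IoU) between two flat binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    union = sum(1 for p, t in zip(pred, truth) if p or t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


def thresholded_jaccard(ji, threshold=0.65):
    """Zero out scores below the threshold; leave the others unchanged."""
    return ji if ji >= threshold else 0.0


def mean_thresholded_jaccard(pairs, threshold=0.65):
    """Average the thresholded JI over (prediction, ground truth) pairs."""
    scores = [thresholded_jaccard(jaccard_index(p, t), threshold)
              for p, t in pairs]
    return sum(scores) / len(scores)
```

For example, a prediction with a JI of 0.5 contributes 0 to the average, while one with a JI of 0.75 contributes 0.75.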

2.3 Image Pre-processing

We perform simple pre-processing on images before feeding them into our network. We subtract the mean from each image channel and normalize each channel by its standard deviation, so that pixel values fall on a standardized scale of roughly -1 to 1. We re-scale every image to a fixed input size. We implement data augmentation during training (after pre-processing). Data augmentation includes a rotation range from 0 to 180 degrees, horizontal and vertical shifts set to 0.05, and vertical and horizontal flips. Moreover, we set the zoom range to 0.8 to 1.2 times the original image due to the varying closeness between camera and lesion. Due to the varying brightness and lighting conditions, brightness augmentation ranges from 0.7 to 1.3 times that of the original image.
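As a rough illustration of the augmentation ranges listed above, the following Python sketch samples one random set of augmentation parameters per training image. It is not the pipeline used in the study: the dictionary keys and function names are hypothetical, and interpreting the 0.05 shift as a fraction of the image size in either direction is an assumption.

```python
import random

# Ranges taken from the text; key names are illustrative only.
AUGMENTATION_RANGES = {
    "rotation_deg": (0.0, 180.0),
    "shift_fraction": (-0.05, 0.05),  # assumed: fraction of image size
    "zoom": (0.8, 1.2),
    "brightness": (0.7, 1.3),
}


def sample_augmentation(rng):
    """Draw one random set of augmentation parameters."""
    r = AUGMENTATION_RANGES
    return {
        "rotation_deg": rng.uniform(*r["rotation_deg"]),
        "shift_x": rng.uniform(*r["shift_fraction"]),
        "shift_y": rng.uniform(*r["shift_fraction"]),
        "zoom": rng.uniform(*r["zoom"]),
        "brightness": rng.uniform(*r["brightness"]),
        "flip_horizontal": rng.random() < 0.5,
        "flip_vertical": rng.random() < 0.5,
    }


def adjust_brightness(pixels, factor):
    """Scale 8-bit intensities by a factor and clip to the 0-255 range."""
    return [min(255, max(0, round(v * factor))) for v in pixels]
```

Passing a seeded `random.Random` instance makes the sampled augmentations repeatable between runs.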

2.4 U-Net Network Experiments

Our baseline is an unmodified U-Net with 512 hidden units at the bottleneck and no pre-trained weights from ImageNet. The final activation is a softmax layer. After 100 epochs, the JI score is 36.7%. We experiment with using popular semantic segmentation networks such as VGG16 and ResNet50 as modified contracting paths in our U-Net. Fig. 3 illustrates our U-Net architecture with a ResNet50 contracting path. We utilize an API based on the Keras and TensorFlow frameworks to create our test architectures listed in Table 1. For fast comparison, each modified U-Net is only trained for 30 epochs and evaluated. Table 1 shows the results of each model.

Contracting Path     Epochs   Val JI   Train JI
Unmodified           100      36.8%    44.7%
VGG16                30       61.2%    63.7%
ResNet50             30       64.2%    68.2%
InceptionV3          30       61.5%    63.9%
InceptionResNetV2    30       70.9%    67.0%
SENet154             30       61.3%    66.7%
Table 1: JI scores of U-Net architectures.

2.5 Hyperparameter Tuning

Since there are benefits to multiple methods of hyperparameter tuning, we use a three-pronged approach for finding optimal hyperparameters. (1) For initial exploration, we iterate with random search to leverage its strength in not fixating on local minima while also efficiently exploring the hyperparameter search space [bergstra2012random]. (2) Once coarse parameter tuning identifies promising ranges, we manually alter our search space for fine-tuning. (3) Finally, we employ sequential model-based optimization (SMBO), so we can try future hyperparameters based on promising past ones, as well as reduce the computational expense and iterations needed for promising results compared to random search [snoek2012practical]. Table 2 outlines our chosen hyperparameters from this optimization.

Figure 4: Original image (left), ground truth overlay (middle left), prediction overlay (middle right), ground truth overlay with prediction (red is true positive and pink is false positive) (right).

Hyperparameter               Value
LR                           0.000336375
Optimizer                    Nadam
Contracting Normalization    Batch
Contracting Hidden Units     [512,256,128,64,32]
Freeze Weights               False
Contracting Activation       ELU
Weight Decay                 0.000158
Dropout                      0.0136
LR Decay                     8.806E-05
Epochs                       165
Batch Size                   8
Table 2: Tuned hyperparameters for U-Net with InceptionResNetV2 contracting path.
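The random-search stage of the tuning procedure in Section 2.5 can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the search ranges, the three hyperparameters sampled, and the `toy_objective` stand-in for a validation JI score are all hypothetical, and in practice each trial would train and evaluate a network.

```python
import math
import random


def sample_log_uniform(rng, low, high):
    """Sample on a log scale, which suits learning rates."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def random_search(objective, n_trials, seed=0):
    """Evaluate random configurations and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {
            "lr": sample_log_uniform(rng, 1e-5, 1e-2),  # assumed range
            "dropout": rng.uniform(0.0, 0.5),           # assumed range
            "batch_size": rng.choice([4, 8, 16]),       # assumed choices
        }
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(params):
    """Hypothetical stand-in for validation JI; peaks near lr = 3e-4."""
    return -abs(math.log10(params["lr"]) - math.log10(3e-4))
```

Calling `random_search(toy_objective, n_trials=50)` returns the best configuration found and its score; the promising ranges it reveals would then be narrowed manually or handed to an SMBO library.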

2.6 Combining Datasets and Post-Processing

We combine the training and validation sets, for a total of 247 images, to train our network before evaluating it on the test set. We also experiment with watershed-based post-processing, which feeds high-confidence classifications as seeds into the watershed algorithm. High-confidence pixels are those whose predicted value falls within 30% of either end of the 0-255 output range: 0-77 for negative and 179-255 for positive for vitiligo.
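The seed selection step can be illustrated with a short Python sketch that converts an 8-bit prediction map into watershed markers using the cutoffs quoted above. The label values and function names are assumptions made for this example, and the watershed step itself is not shown.

```python
NEGATIVE_MAX = 77   # high-confidence negative band: 0-77 (from the text)
POSITIVE_MIN = 179  # high-confidence positive band: 179-255 (from the text)

# Marker labels are an assumed convention for this sketch.
UNCERTAIN, BACKGROUND_SEED, FOREGROUND_SEED = 0, 1, 2


def seed_label(value):
    """Map one 8-bit prediction value to a marker label."""
    if not 0 <= value <= 255:
        raise ValueError("prediction values must lie in 0-255")
    if value <= NEGATIVE_MAX:
        return BACKGROUND_SEED
    if value >= POSITIVE_MIN:
        return FOREGROUND_SEED
    return UNCERTAIN  # left for the watershed algorithm to assign


def seed_markers(prediction):
    """Build a marker image from a 2D list of prediction values."""
    return [[seed_label(v) for v in row] for row in prediction]
```

Using 0 for unlabeled pixels follows the convention of common watershed implementations, in which zero-valued markers are treated as regions still to be flooded.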

3 Results and Discussion

InceptionResNetV2 is the best performing contracting path, as it achieves a JI of 74.1% and threshold JI of 58.0% before hyperparameter tuning. The runtime is 97 minutes for 100 epochs on a single NVIDIA Tesla K80 GPU. SENet154 also appears to be a strong candidate; however, because the high performance came at the expense of increased training time, we did not explore it further in our study. After hyperparameter tuning, the JI score is 81.5%, and the threshold JI is 62.8%. Once we perform watershed post-processing, the count of images below the threshold falls from 16 images to 14 images. After training on the combined dataset, the InceptionResNetV2-based U-Net achieves a JI of 73.6% and threshold JI of 61.9%. Fig. 4 shows an example of the output. Though it is counterintuitive that performance decreases with our larger dataset, we believe this result may be due to the variability inherent in our small test set, which is only 61 images. The total training runtime is 108 minutes for about 200 epochs for our final network.

[Table 3 images not reproduced: rows are Our Method, Person 1, Person 2, and Person 3; columns are Simple, Moderate, and Complex lesions.]
Table 3: Segmentation by our method and three persons using semi-autonomous watershed GUI compared to the original image.

Lesion                  Simple    Moderate   Complex
Our Method   JI (%)     88.7%     86.1%      74%
             Time       <1s       <1s        <1s
Person 1     JI (%)     94.3%     92.1%      83.3%
             Time       4m 55s    9m 4s      28m 46s
Person 2     JI (%)     96.8%     95.3%      81.9%
             Time       3m 44s    6m 57s     20m 39s
Person 3     JI (%)     88.0%     85.8%      75.6%
             Time       1m 53s    4m 31s     17m 24s
Table 4: Segmentation scores (JI) and times for lesions of varying complexity for three persons using semi-autonomous watershed GUI compared with our method.

[Table 5 images not reproduced: columns are Lesion, Person 1, Person 2, and Person 3.]
Table 5: Segmentation when constrained to 10 minutes.

                Person 1   Person 2   Person 3
Accuracy (%)    75.9%      84.7%      77.6%
Table 6: Variability in segmentation accuracy for the “complex” rated lesion, with time held constant at 10 minutes.

We conduct an error analysis on the validation images that scored below the threshold JI, 16 images in total. By inspection, we believe that eight of the images have errors primarily due to ground truth labeling limitations. The semi-autonomous watershed tool is limited in its ability to identify small lesions due to the coarseness of seeds. Manual labeling addresses some of these smaller lesions. However, there are cases in which a gradient from healthy skin to vitiligo leaves ambiguity for classification.

Moreover, pixels that are not fully confident in classification receive a lower JI score due to the way the JI is calculated. For instance, a classification of 0.7 will result in a lower JI than a classification of 1, even if both are correct in being reasonably confident that the pixel is positive for vitiligo. Still, even with errors in the labeling and a lower JI, the predictions visually capture complex target regions on any skin surface with any skin tone. The strong visual prediction suggests that the proposed architecture represents a solid foundation for future work in automating vitiligo lesion segmentation. Moreover, each segmentation took less than a few seconds per image, instead of a few minutes via semi-autonomous watershed.

We also perform a case study to quantify the correlation between lesion complexity and time to segment the lesion with the watershed GUI. We asked three persons (reviewers), who were non-dermatologists, to semi-manually segment the lesions using the watershed GUI. They were allowed to gain familiarity with the GUI on practice lesions before being officially timed. We asked our reviewers to continue contouring until they felt comfortable with their segmentation being used in a clinical setting. As expected, time for segmentation increased with lesion complexity. Segmentation results are shown in Table 3. Table 4 shows that our method requires less than a second, in contrast with watershed, which requires 2-29 minutes per image. We performed a similar case study to elucidate the variability in segmentation between reviewers. After 10 minutes of segmentation on the “complex” rated lesion, we asked the reviewers to pause so that we could save their progress at that moment in time. From this study, we see that segmentation accuracy indeed varies widely, with a difference of almost 10 percentage points between reviewers, as shown in Table 6. Table 5 visually demonstrates variability between reviewers. Our method removes this variability.
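The inter-reviewer spread reported above can be checked with a few lines of Python, using the three accuracy values from Table 6. The variable and function names are illustrative only.

```python
# Accuracy (%) after 10 minutes on the "complex" lesion, from Table 6.
accuracy_by_reviewer = {
    "Person 1": 75.9,
    "Person 2": 84.7,
    "Person 3": 77.6,
}


def accuracy_spread(scores):
    """Difference between the best and worst score, in percentage points."""
    values = list(scores.values())
    return round(max(values) - min(values), 1)


spread = accuracy_spread(accuracy_by_reviewer)
```

The spread between the highest and lowest reviewer is 8.8 percentage points, which is the "almost 10" gap discussed in the text.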

4 Conclusion

We demonstrate that a U-Net with an InceptionResNetV2-based contracting path and watershed post-processing is promising for vitiligo segmentation. We quantify the variability that is possible between reviewers, as well as the time required to segment increasingly complex lesions. Our method eliminates both variability and long segmentation times, while also providing predictions that do not require much manual re-editing. There exist no conflicts of interest.