Data-Driven Color Augmentation Techniques for Deep Skin Image Analysis

03/10/2017 ∙ by Adrian Galdran, et al. ∙ 0

Dermoscopic skin images are often obtained with different imaging devices, under varying acquisition conditions. In this work, instead of attempting to perform intensity and color normalization, we propose to leverage computational color constancy techniques to build an artificial data augmentation technique suitable for this kind of image. Specifically, we apply the shades of gray color constancy technique to color-normalize the entire training set of images, while retaining the estimated illuminants. We then draw one sample from the distribution of training set illuminants and apply it to the normalized image. We employ this technique for training two deep convolutional neural networks for the tasks of skin lesion segmentation and skin lesion classification, in the context of the ISIC 2017 challenge and without using any external dermatologic image set. Our results on the validation set are promising, and will be supplemented with extended results on the hidden test set when available.




I Introduction

Melanoma is a highly aggressive form of skin tumor that can be successfully treated when diagnosed at an early stage [1]. Suspicious symptoms can be detected visually, and for that reason computer-vision techniques applied to skin image analysis are becoming increasingly important as a feasible and inexpensive method for skin care [2].

For an effective interpretation of skin lesions, color is known to be a fundamental visual feature, together with size, shape and texture [3]. In computerized skin image analysis, a representation of the image in terms of visual features is usually built in order to find relevant cues for diagnosis. Deep learning techniques, which can automate the feature design step, are outperforming traditional, hand-crafted features-based methods in most computer-aided skin disease diagnosis tasks [4, 5]. However, training these techniques requires large databases of labeled skin images. These databases often contain samples obtained at different hospitals, with different acquisition systems and under varying illumination conditions, which poses a complex challenge for robust image analysis.

Usually, the input of skin image understanding techniques undergoes a color normalization step, in order to achieve color constancy [6]. This has been recently shown to be helpful for subsequent automatic diagnosis tasks [7]. This procedure is related to the need of handcrafted feature-based methods to receive normalized input images. This is especially important at test time: because of the limited expressiveness of such methods, feeding the model images that differ significantly from those seen during training can lead to poor generalization. However, computational color constancy is a hard problem, and the output can sometimes be unpredictable.

In this work, we propose a data augmentation technique adapted for skin lesion analysis with deep neural networks, leveraging the fact that their virtually unlimited expressive power enables them to be trained with extensive variability in the data manifold, so that they learn features invariant to those factors of variation. Specifically, instead of relying on existing illuminant color compensation techniques to achieve a normalized dataset, we apply them to obtain an estimate of the color of the illuminant of the scene. These illuminants are then sampled at training time, and applied to white-balanced training images in order to generate new training images that simulate different but plausible illumination conditions at acquisition time. This augmentation technique leads to a robust training stage, which allows for promising results to be obtained both in segmentation and classification tasks. Our work is framed within the ISIC 2017 Challenge: Skin Lesion Analysis-Towards Melanoma Detection [8]. The competition comprises three tasks, of which we only consider skin lesion segmentation and classification.

II Data-Driven Artificial Data Augmentation

In order to train complex models with a small amount of data, image augmentation schemes are often applied. These consist of transforming input images into new but plausible examples that can help the network to generalize better. Although recent generative models suggest that realistic medical image generation is feasible by means of more complex techniques [9], typically basic linear transformations on the image geometry are performed, such as shifting, scaling or rotating. However, applying domain-specific knowledge on the expected images, e.g. on the image acquisition process, can help to generate a more diverse set of artificial images while keeping plausibility.

Below we describe a method to jointly normalize color in skin images and learn from the available data the distribution of scene illuminants. These illuminants are then applied to white-balanced images for generating useful artificial images.

Figure 1: White-balancing and illuminant estimation on several training images following Eq. 1

II-A Color Image Augmentation via Illuminant Estimation

A well-known property of the human visual system is the ability to perceive the color of an object as roughly constant, even when the color of the light projected on the scene is modified. This effect is known as color constancy and, although not yet fully understood, it can be partially explained by the joint interaction of low level mechanisms that act on the response of the cones in the retina and by higher cognitive-level effects [10]. In parallel, there exists a large number of techniques that attempt to reproduce such effect on image acquisition and processing systems within the field of computational color constancy. In this context, the goal is often to first estimate the illuminant of the scene, and then remove it from the input image by means of a chromatic adaptation transform. Usually this is achieved in commercial cameras by means of proprietary white-balancing algorithms. However, this general approach does not always work in a predictable manner, as the illuminant depends not only on the light cast on the scene but also on the image content.

A white-balancing technique mainly works by manipulating image intensities in such a way that the scene appears as being lit by a neutral illuminant. This is usually accomplished separately in the different spectral components of the image. Denoting an image as $I = (I_R, I_G, I_B)$, a typical white balancing technique solves the following equation:

$$I_c = \ell_c \, W_c, \qquad (1)$$

where $c \in \{R, G, B\}$, $\ell = (\ell_R, \ell_G, \ell_B)$ is the illuminant of the scene, and $W = (W_R, W_G, W_B)$ is the white-balanced version of $I$, which can be recovered by inversion of Eq. (1), i.e., $W_c = I_c / \ell_c$. In this work, we estimate the illuminant on images from the training set applying the Shades of Gray color constancy technique [11], as implemented by the general color constancy framework described in [12]. A sample of training images and their white-balanced versions is shown in Fig. 1, while Fig. 2 shows the full empirical distribution over the illuminants.
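To make the estimation and correction steps concrete, the following is a minimal NumPy sketch of Shades of Gray illuminant estimation followed by a per-channel white balance. It is an illustration only, not the framework of [12]: the function names, the Minkowski order `p=6`, the rescaling that makes a neutral illuminant equal to (1, 1, 1), and the assumption of float images in [0, 1] are all choices made here.

```python
import numpy as np


def shades_of_gray_illuminant(image, p=6):
    """Estimate the scene illuminant under the Shades of Gray assumption.

    `image` is an H x W x 3 float array in [0, 1]. The Minkowski order
    `p` is an assumed value, not one taken from the text.
    """
    img = np.asarray(image, dtype=np.float64)
    # Per-channel Minkowski mean: (mean(I_c ** p)) ** (1 / p).
    estimate = np.power(np.mean(np.power(img, p), axis=(0, 1)), 1.0 / p)
    # Keep only the color of the light: rescale so that a neutral
    # illuminant is exactly (1, 1, 1).
    return np.sqrt(3.0) * estimate / np.linalg.norm(estimate)


def white_balance(image, illuminant):
    """Remove the illuminant with a per-channel (diagonal) division."""
    img = np.asarray(image, dtype=np.float64)
    return np.clip(img / illuminant, 0.0, 1.0)


# Toy example: a gray texture seen under a warm (reddish) light.
rng = np.random.default_rng(0)
gray = rng.uniform(0.1, 0.9, size=(32, 32, 1))
cast = np.array([1.0, 0.8, 0.6])
observed = gray * cast

illuminant = shades_of_gray_illuminant(observed)
balanced = white_balance(observed, illuminant)
```

In this toy case the estimated illuminant is proportional to the simulated cast, and the three channels of `balanced` coincide again. On real skin images the estimate also depends on the image content.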

Note that in [7] the same approach was successfully employed to normalize skin images before supplying them to a classifier, observing an improvement in performance after this normalization. In this work, we proceed in a substantially different direction. Instead of only removing illuminant variations, we collect the set of illuminants extracted from the training set, $\mathcal{L} = \{\ell^{(1)}, \ldots, \ell^{(N)}\}$. At training time, we augment the white-balanced dataset by color-casting each corrected image with an illuminant extracted from $\mathcal{L}$, i.e., $\ell' \in \mathcal{L}$. This produces realistically rendered versions of each of them under a different illuminant. This technique is applied in an online fashion (randomly yielding one version of each image per epoch) and by applying a Von Kries-like diagonal chromatic adaptation transform [13], which rescales each color channel $c$ of a training image $I$ with estimated illuminant $\ell$ by the ratio $\ell'_c / \ell_c$. Notice that, if the illuminant $\ell'$ is the same one that was extracted from $I$, this transformation reduces to the identity, i.e., $I' = I$. However, if $\ell'$ comes from a different image, this transformation leads to a color-casted version of $I$. The result of applying this procedure to a white-balanced image from the training set can be observed in Fig. 3. With this strategy, the method is able to realistically augment the original dataset, preventing overfitting when training a deep neural network and achieving a more robust training procedure.
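The diagonal transform described above can be sketched in a few lines of NumPy. This is a hedged illustration under the assumption that illuminants are stored as three per-channel gains and that images are floats in [0, 1]. The function name `color_cast` and the example values are hypothetical.

```python
import numpy as np


def color_cast(image, own_illuminant, new_illuminant):
    """Re-render `image` as if it had been lit by `new_illuminant`.

    Dividing by the image's own illuminant white-balances it, and
    multiplying by the new one casts it. Both steps are diagonal, so
    they collapse into a single per-channel gain.
    """
    own = np.asarray(own_illuminant, dtype=np.float64)
    new = np.asarray(new_illuminant, dtype=np.float64)
    gains = new / own
    return np.clip(np.asarray(image, dtype=np.float64) * gains, 0.0, 1.0)


image = np.full((2, 2, 3), 0.4)
own = np.array([1.2, 1.0, 0.8])    # illuminant estimated on this image
other = np.array([0.9, 1.0, 1.1])  # illuminant estimated on another image

same = color_cast(image, own, own)      # gains are all 1: identity
recast = color_cast(image, own, other)  # bluish version of the image
```

Applying the image's own illuminant returns the image unchanged, while an illuminant from a different image shifts the balance between the channels.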

After extracting the set of illuminants from the training data, we need to access its underlying probabilistic distribution in order to generate new plausible images. The way in which that distribution is modeled is relevant to the quality of the obtained artificial images. In [14], a similar technique was applied to augment the dataset before learning a deep model that can achieve color constancy on natural images. In that work, the authors applied a $k$-means clustering to the retrieved illuminants, with $k$ being a heuristically determined free parameter. However, this approach may lead to an over-representation of areas in the illuminant space that, although representing a separate cluster, do not contain a significant amount of samples. Here we apply a different strategy and directly sample from the raw empirical distribution of illuminants. This way, each time a training example is selected we randomly choose an illuminant from the set of training illuminants with a uniform probability distribution, producing a new color-casted image.
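Sampling uniformly from the raw set of stored illuminants means that an illuminant color is drawn in proportion to how often it occurs in the training data. The sketch below illustrates this with a hypothetical four-element illuminant set. The names and values are assumptions for illustration only.

```python
import numpy as np


def sample_illuminant(illuminants, rng):
    """Pick one stored illuminant; each training image's estimate is equally likely."""
    illuminants = np.asarray(illuminants, dtype=np.float64)
    return illuminants[rng.integers(len(illuminants))]


# Hypothetical set: three images share a reddish light, one has a bluish light.
reddish = [1.2, 1.0, 0.8]
bluish = [0.8, 1.0, 1.2]
illuminant_set = np.array([reddish, reddish, reddish, bluish])

rng = np.random.default_rng(0)
draws = np.array([sample_illuminant(illuminant_set, rng) for _ in range(4000)])

# Fraction of draws that returned the reddish illuminant (close to 3/4).
reddish_fraction = np.mean(np.all(draws == reddish, axis=1))
```

Because the common reddish illuminant is stored three times, it is drawn about three times as often as the rare bluish one, so sparsely populated regions of the illuminant space are not over-represented.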

Figure 2: Distribution of illuminants estimated from the training set using the shades of gray algorithm. The RGB illuminants were converted to the CIE L*a*b* uniform color space [15] via their XYZ tristimulus values, and projected onto the a*b* plane (a* approximates redness-greenness, b* approximates yellowness-blueness). Each sample shows the color it encodes.
Figure 3: Color-casting of a white-balanced training image with different illuminants.

II-B Gamma-Correction Augmentation

In addition to the white-balancing, a digital camera applies several other normalization steps [16]. One of them is the well-known gamma correction. When a camera captures light input, theoretically the received signal is a linear function of photons reaching the device’s sensors. However, for a more natural luminance reproduction in digital screens, a non-linear transformation with a power function is usually applied:

$$V_{\text{out}} = V_{\text{in}}^{\gamma}, \qquad (2)$$

where $V_{\text{in}}$ is the luminance reaching the camera, $V_{\text{out}}$ is the corresponding post-processed image ready for display, and $\gamma$ is the correction constant. Unfortunately, the specifications of this transform depend heavily on the characteristics of the display and on the imaging device manufacturer. To compensate for this issue, we also applied gamma corrections to the images both at training and test time, aiming for a more robust representation learning of our model. Thus, after undergoing the color transformation explained above, we randomly draw for each image and epoch a correction factor $\gamma$ from a Normal distribution of mean $\mu$ and standard deviation $\sigma$, truncated at 0 and 2, and apply a power-law mapping similar to the one in Eq. (2).
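The random gamma step can be sketched as follows. This is a minimal illustration and not the authors' implementation: the mean of 1.0 and standard deviation of 0.25 are placeholder assumptions, since the values used in the paper are not available here, and the truncated Normal is drawn by simple rejection sampling.

```python
import numpy as np


def sample_truncated_normal(rng, mean, std, low, high):
    """Rejection sampling: redraw until the value falls strictly inside (low, high)."""
    while True:
        value = rng.normal(mean, std)
        if low < value < high:
            return value


def gamma_augment(image, rng, mean=1.0, std=0.25):
    """Apply a randomly drawn power-law mapping to an image in [0, 1].

    `mean` and `std` are assumed placeholder values. The truncation
    interval (0, 2) is the one stated in the text.
    """
    gamma = sample_truncated_normal(rng, mean, std, 0.0, 2.0)
    img = np.clip(np.asarray(image, dtype=np.float64), 0.0, 1.0)
    return np.power(img, gamma), gamma


rng = np.random.default_rng(0)
image = rng.uniform(0.0, 1.0, size=(8, 8, 3))
augmented, gamma = gamma_augment(image, rng)
```

A drawn value below 1 brightens the mid-tones and a value above 1 darkens them, while pure black and pure white are left unchanged.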

II-C Non-Linear Geometrical Image Augmentation

To complement the color transformation detailed above, we applied standard geometrical (linear) data augmentation techniques, namely rotation, horizontal and vertical flipping, translation and scaling of the input image. Moreover, to account for the non-linear distortions typical of soft tissues such as skin, we applied several non-linear transformations of the image geometry, similar to those proposed in [4].

III Deep Neural Networks for Skin Lesion Segmentation

Figure 4: Schematic representation of the U-Net architecture.

The described lesion segmentation problem is approached using a Convolutional Neural Network (CNN), coupled with extensive data augmentation. Data was augmented applying the transforms described in Section II and re-scaled before being employed as input to a U-Net. The U-Net architecture was first proposed in [17], and is a Fully Convolutional Network, originally designed for segmenting neuronal structures and cells in microscopy images. It is a powerful deep classifier that can be successfully trained with a relatively small amount of training data and still produces accurate segmentations.

The architecture of the U-Net is represented in Fig. 4. The network has two main paths: a contracting path, and a dimensionally symmetric expanding path. The contracting path consists of consecutive strided convolutional layers. Each convolutional layer is followed by a Rectified Linear Unit (ReLU) activation and batch normalization, except for the last layer, which is activated by a sigmoid activation function. The stride of the convolutional layers is selected so that the dimension of the output feature map of the contracting path layers decreases progressively down to a bottleneck, whose size depends on the number of filters. This point in the network marks the beginning of the expanding path. The output of each layer is upsampled so that it has the same dimension as the corresponding layer in the contracting path. To compensate for the loss in spatial resolution that results from the multiple downsampling operations, the upsampled feature map is concatenated with the feature map of the corresponding layer in the contracting path.

The network was trained using gradient descent backpropagation, with the Adam optimizer and a Jaccard index-based loss function, as in [18]. The system outputs the probability of each pixel of the input image belonging to a lesion.
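A Jaccard index-based loss can be written directly on the per-pixel probabilities. The sketch below shows one common "soft" formulation in NumPy, given here as an assumption. The exact form used in [18] may differ, and a real training setup would express it with the tensors of the deep learning framework in use.

```python
import numpy as np


def soft_jaccard_loss(pred, target, eps=1e-7):
    """1 - soft Jaccard index between predicted probabilities and a binary mask.

    The products and sums replace the set intersection and union, which
    keeps the expression differentiable with respect to `pred`.
    """
    pred = np.asarray(pred, dtype=np.float64).ravel()
    target = np.asarray(target, dtype=np.float64).ravel()
    intersection = np.sum(pred * target)
    union = np.sum(pred) + np.sum(target) - intersection
    # `eps` avoids a division by zero when both masks are empty.
    return 1.0 - (intersection + eps) / (union + eps)


mask = np.array([[1, 1], [0, 0]])
perfect = soft_jaccard_loss(mask, mask)                 # close to 0
half = soft_jaccard_loss([[1, 0], [0, 0]], mask)        # intersection 1, union 2
disjoint = soft_jaccard_loss([[0, 0], [1, 1]], mask)    # close to 1
```

The loss is low when the predicted probabilities overlap well with the ground-truth lesion mask and approaches 1 when they do not overlap at all.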

IV Deep Neural Networks for Skin Lesion Classification

The image classification task in the ISIC 2017 Challenge comprises two independently evaluated binary classification subtasks over the same set of images: 1) discriminating melanoma vs. all the other lesions and 2) discriminating seborrheic keratosis from every other kind of disease. Both tasks were independently approached by means of separately trained networks. Nevertheless, as a common strategy, and following the rationale in [5], we chose to leverage the outcome from the lesion segmentation task by feeding the classification stage with images cropped around the bounding box defined by the segmentation mask. At train time we used the ground truth masks and enlarged the tight bounding box by a fixed margin to incorporate information on the appearance of the skin around the lesion, which may contain useful information for diagnosis. At test time we relied on the predicted segmentations and applied the same margin, which can also help alleviate small segmentation errors and keep lesion border information.
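The cropping step can be illustrated with a short NumPy sketch. The function name and the 20% margin are hypothetical, as the margin used in the paper is not available here. The empty-mask fallback is also an assumption added for robustness.

```python
import numpy as np


def crop_around_mask(image, mask, margin=0.2):
    """Crop `image` to the bounding box of `mask`, enlarged by `margin`.

    `margin` is a fraction of the box height/width added on each side
    (an assumed value). The enlarged box is clipped to the image borders.
    """
    mask = np.asarray(mask) > 0
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:
        return image  # empty mask: fall back to the full image
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    pad_r = int(round(margin * (bottom - top + 1)))
    pad_c = int(round(margin * (right - left + 1)))
    top = max(top - pad_r, 0)
    bottom = min(bottom + pad_r, mask.shape[0] - 1)
    left = max(left - pad_c, 0)
    right = min(right + pad_c, mask.shape[1] - 1)
    return image[top:bottom + 1, left:right + 1]


image = np.zeros((20, 20, 3))
mask = np.zeros((20, 20), dtype=np.uint8)
mask[5:15, 8:12] = 1  # lesion spans 10 rows and 4 columns

crop = crop_around_mask(image, mask)  # 2 extra rows and 1 extra column per side
```

The same function can be applied to ground-truth masks at training time and to predicted masks at test time, so both stages see crops with a comparable amount of surrounding skin.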

We trained both tasks with 50-layer versions of deep residual networks [19]. These were initialized with the weights learned by pre-training over the ILSVRC2012 ImageNet database. We replaced the last fully connected layer of the pretrained network by a dropout stage and two dense layers, followed by the final softmax. Given the strong tendency to overfit, dropout was applied and a two-phase fine-tuning was performed. During the first phase, only the new dense layers were trained (using an Adadelta optimizer). During the second phase, the earliest layers of the network were still kept frozen, and the rest were trained using stochastic gradient descent with momentum. The loss function was categorical cross entropy, but the losses associated with samples from each class were weighted differently, to compensate for class imbalance.
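The class-weighted loss can be sketched in NumPy as follows. The text does not state how the weights were chosen, so the inverse-frequency weighting below is an assumption, as are the function names. In practice the same weights would typically be passed to the training framework instead of being computed by hand.

```python
import numpy as np


def balanced_class_weights(labels, num_classes):
    """Inverse-frequency weights (assumed scheme): rare classes get larger weights."""
    counts = np.bincount(labels, minlength=num_classes).astype(np.float64)
    return len(labels) / (num_classes * np.maximum(counts, 1.0))


def weighted_cross_entropy(probs, labels, class_weights, eps=1e-12):
    """Categorical cross entropy where each sample is weighted by its class weight."""
    probs = np.asarray(probs, dtype=np.float64)
    picked = probs[np.arange(len(labels)), labels]  # probability of the true class
    losses = -np.log(np.clip(picked, eps, 1.0))
    weights = class_weights[labels]
    return np.sum(weights * losses) / np.sum(weights)


labels = np.array([0, 0, 0, 1])  # three samples of class 0, one of class 1
weights = balanced_class_weights(labels, num_classes=2)

uniform_probs = np.full((4, 2), 0.5)
loss_uniform = weighted_cross_entropy(uniform_probs, labels, weights)

# Same confident predictions, except that one sample is misclassified.
wrong_minority = np.array([[0.9, 0.1], [0.9, 0.1], [0.9, 0.1], [0.9, 0.1]])
wrong_majority = np.array([[0.1, 0.9], [0.9, 0.1], [0.9, 0.1], [0.1, 0.9]])
loss_minority = weighted_cross_entropy(wrong_minority, labels, weights)
loss_majority = weighted_cross_entropy(wrong_majority, labels, weights)
```

With these weights, misclassifying the single minority-class sample is penalized more heavily than misclassifying one of the majority-class samples.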

V Experimental Evaluation

Both models were tested on the ISIC 2017 Challenge. The organizers provided a training set of images with ground truth for segmentation and classification, as well as a validation set and a test set. It is important to stress that we did not employ any external skin image set or categorical data (other than natural images from the ImageNet dataset in the ResNet50 pretraining). Using external data was allowed by the challenge organizers, but we wanted to explore the capability of the proposed technique to improve results when only few domain-specific images are available for training.

To perform predictions on unseen images, we applied randomly selected illuminants to the white-balanced input image, and predicted on several color-casted versions of it. The overall prediction for each pixel was computed as the median of the predictions on each tested image. No further post-processing was performed on the output.
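This test-time procedure can be sketched as follows. The `model` argument stands for any callable that maps an image to a prediction map and is hypothetical, as are the function name and the number of color-casted versions (the text only says "several"). Illuminants are assumed to be stored as per-channel gains, with (1, 1, 1) being neutral.

```python
import numpy as np


def predict_with_color_tta(model, white_balanced, illuminants, n_versions=5, seed=None):
    """Median prediction over several randomly color-casted versions of an image."""
    rng = np.random.default_rng(seed)
    illuminants = np.asarray(illuminants, dtype=np.float64)
    chosen = illuminants[rng.integers(len(illuminants), size=n_versions)]
    predictions = []
    for illuminant in chosen:
        # Diagonal color cast of the white-balanced input.
        cast = np.clip(white_balanced * illuminant, 0.0, 1.0)
        predictions.append(model(cast))
    # Pixel-wise median across the color-casted versions.
    return np.median(np.stack(predictions, axis=0), axis=0)


def toy_model(img):
    """Stand-in for a trained network: the channel mean as a per-pixel score."""
    return img.mean(axis=-1)


rng = np.random.default_rng(0)
image = rng.uniform(0.0, 0.7, size=(16, 16, 3))
illuminant_set = np.array([[1.2, 1.0, 0.8], [0.9, 1.0, 1.1], [1.0, 1.0, 1.0]])

prediction = predict_with_color_tta(toy_model, image, illuminant_set, seed=0)
```

The median is less sensitive than the mean to a single color-casted version on which the model produces an outlying prediction.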

Table I shows the results obtained for the segmentation task on the validation set, while Table II does so for the classification task. Results on the test set will be supplied for both tasks as soon as they are available.

Table I: Results for the segmentation task

Task              Acc     DC      JD      Ss      Sp
1) Segmentation   0.948   0.846   0.767   0.865   0.980

Acc = Accuracy, DC = Dice Coefficient, JD = Jaccard Distance, Ss = Sensitivity, Sp = Specificity

Table II: Results for the classification task

Task                        AUC     ACC     AP      SS      SP
3.1) Melanoma               0.791   0.580   0.482   0.833   0.517
3.2) Seborrheic keratosis   0.954   0.867   0.898   0.857   0.870
Average                     0.873   0.723   0.690   0.845   0.694

AUC = Area Under the ROC curve, ACC = Accuracy, AP = Average Precision, SS = Sensitivity, SP = Specificity
ACC, AP, SS and SP were computed at 0.5 confidence threshold

V-A Discussion and Conclusion

In this paper we presented our approach to the segmentation and classification tasks of the ISIC 2017 Challenge: Skin Lesion Analysis Towards Melanoma Detection. We focused our efforts on exploiting the information obtained from applying color constancy algorithms to the training set in order to perform extensive data augmentation that could help regularize our convolutional neural network-based learning pipeline. The obtained segmentations were leveraged to improve the full image classification, obtaining competitive results, especially on the segmentation task.

The strong shift of the distribution in Fig. 2 towards the reddish region of the a*b* plane suggests that part of the illuminant estimation may be inaccurately considering the skin’s color as being produced by an unnatural reddish illuminant in some cases. In the future, we aim to investigate this effect further.