Deep neural network ensemble by data augmentation and bagging for skin lesion classification

07/15/2018 ∙ by Manik Goyal, et al. ∙ Indian Institute of Technology ∙ Nanyang Technological University

This work summarizes our submission for Task 3: Disease Classification of the ISIC 2018 challenge in Skin Lesion Analysis Towards Melanoma Detection. We introduce a novel deep neural network (DNN) ensemble architecture that classifies skin lesions using data augmentation and bagging to address the paucity of data and prevent over-fitting. The ensemble is composed of two DNN architectures: Inception-v4 and Inception-Resnet-v2. The architectures are combined into an ensemble by using a 1×1 convolution for fusion in a meta-learning layer.




1 Introduction

Task 3 of the ISIC 2018 challenge in Skin Lesion Analysis Towards Melanoma Detection [1, 2] is to generate, for each test image, a binary classification for each of the 7 disease classes: melanoma, melanocytic nevus, basal cell carcinoma, actinic keratosis, benign keratosis, dermatofibroma and vascular lesion. The predicted responses are then scored using balanced normalized accuracy.
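As a rough illustration of the scoring, the sketch below assumes that balanced accuracy means the average of per-class recall, so that rare classes count as much as common ones. The function name and the short class labels are illustrative and are not taken from the challenge code.

```python
from collections import defaultdict

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)

# Illustrative labels only: one of the two melanoma cases is missed.
y_true = ["MEL", "MEL", "NV", "NV", "NV", "NV", "BCC"]
y_pred = ["MEL", "NV", "NV", "NV", "NV", "NV", "BCC"]
score = balanced_accuracy(y_true, y_pred)   # recalls: 0.5, 1.0, 1.0
plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Here the missed melanoma case lowers the balanced score more than it lowers plain accuracy, because melanoma has only two samples.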

In this work, we propose a novel framework, the data augmentation and bagging ensemble architecture (DABEA), that uses data augmentation and bagging in combination to generate multiple output vectors per model and then applies a 1×1 convolution layer as a meta-learner for combining the outputs of the different models. Our main contributions are (i) an ensemble of DNN models with data augmentation and bagging and (ii) a convolution layer for meta-learning over the models.

2 Methodology

Figure 1: Illustration of DABEA: (a) data augmentation and bagging of outputs for a single CNN model, and (b) concatenation of outputs of CNN models as inputs for the meta-learning fusion.

2.1 Deep neural network models

We selected two of the top-performing CNN architectures, Inception-v4 and Inception-Resnet-v2 [3], as our base models because of their high accuracy on the IMAGENET [7] challenge. Both models are variants of the Inception model, which introduced the concept of filters of multiple sizes at the same level of a CNN. Inception-v4 (Iv4) simplified the Inception architecture through further factorization and introduced reduction blocks, while Inception-Resnet-v2 (IRv2) incorporated residual blocks as in [4] to deepen the model at a computational cost similar to that of Inception-v4.

2.2 Data augmentation and bagging ensemble architecture (DABEA)

We are given a training set consisting of an input data matrix of images and a matrix of the corresponding labels over the seven skin lesion classes. We build and train an ensemble of CNN architectures, DABEA, for classification of skin lesions. DABEA combines two CNN models: (i) Inception-v4 (Iv4) and (ii) Inception-Resnet-v2 (IRv2). The parameters of both models are first learnt by training on the IMAGENET data [7] and then on the training split of the ISIC 2018 skin lesion classification dataset [2].

First, each image is normalized by subtracting its per-image mean pixel value.


Normalization is used to remove any bias present in the data [6, 8, 9]. Second, data augmentation is performed on the normalized images by cropping, random brightness and saturation changes, and flipping, which increases the number of images available for training.
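The two preprocessing steps can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the pipeline used for the submission: the crop fraction, brightness range and flip probability are hypothetical values, the helper names are invented for the example, and the saturation change is omitted for brevity.

```python
import numpy as np

def normalize(image):
    """Subtract the image's own mean pixel value (per-image normalization)."""
    return image - image.mean()

def augment(image, rng, crop=0.9, max_delta=0.1):
    """Return one randomly cropped, flipped and brightness-shifted copy.

    crop and max_delta are assumed values chosen only for this example.
    """
    h, w, _ = image.shape
    ch, cw = int(h * crop), int(w * crop)
    top = rng.integers(0, h - ch + 1)      # upper bound is exclusive
    left = rng.integers(0, w - cw + 1)
    out = image[top:top + ch, left:left + cw]
    if rng.random() < 0.5:
        out = out[:, ::-1]                 # horizontal flip
    return out + rng.uniform(-max_delta, max_delta)   # brightness shift

rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))            # stand-in for a dermoscopic image
normalized = normalize(image)
augmented = [augment(normalized, rng) for _ in range(5)]
```

Each call to `augment` yields a different view of the same lesion, which is how a single training image turns into several training samples.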


Suppose the original dataset is augmented by a fixed factor. The training images are then fed to the Inception-v4 and Inception-Resnet-v2 networks in cascade, so the ensemble consists of two models. Each model has its own transfer function and parameters, and its output is obtained by applying that transfer function to every augmented image, giving one output vector per augmented image.

Bagging is then performed by randomly selecting output vectors from each model; the selected vectors form the bagged feature output of that CNN model.


Finally, the bagged outputs of the two models are concatenated into a single feature output, with one channel per model.
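A minimal sketch of the bagging and concatenation steps is given below. The array sizes are illustrative, the model outputs are random stand-ins for real class-probability vectors, and sampling with replacement is an assumption, since the text only says the vectors are selected randomly.

```python
import numpy as np

rng = np.random.default_rng(0)
num_augmented, num_bagged, num_classes = 20, 8, 7   # illustrative sizes

# Stand-ins for the per-augmentation class-probability vectors of each model.
outputs = {
    "Iv4": rng.dirichlet(np.ones(num_classes), size=num_augmented),
    "IRv2": rng.dirichlet(np.ones(num_classes), size=num_augmented),
}

def bag(model_outputs, k, rng):
    """Randomly select k output vectors (assumed: with replacement)."""
    idx = rng.integers(0, len(model_outputs), size=k)
    return model_outputs[idx]

bagged = [bag(outputs[name], num_bagged, rng) for name in ("Iv4", "IRv2")]

# Stack the two bagged outputs along a new last axis: one channel per model.
features = np.stack(bagged, axis=-1)   # shape: (num_bagged, num_classes, 2)
```

Keeping the models on a separate channel axis is what lets a later convolution weight the two networks against each other.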


2.3 Meta-Learning Output Layer

Ensemble meta-learning is done by passing this combined feature output to a convolution layer. The convolutional layer combines the two input channels from the two CNN architectures, and the fused outputs are then pooled to produce the final output.
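With two input channels and one output channel, a 1×1 convolution reduces to a weighted sum of the channels at every position. The sketch below illustrates that under assumptions: the weights are fixed by hand rather than learnt, average pooling is used over the bagged vectors, and the final softmax is our addition for turning the pooled scores into probabilities.

```python
import numpy as np

def fuse(features, weights, bias=0.0):
    """1x1 convolution over the model channel, then average pooling over bags.

    features: (num_bagged, num_classes, num_models)
    weights:  (num_models,), one scalar weight per input channel
    """
    fused = features @ weights + bias      # (num_bagged, num_classes)
    pooled = fused.mean(axis=0)            # (num_classes,)
    exp = np.exp(pooled - pooled.max())    # assumed softmax output
    return exp / exp.sum()

rng = np.random.default_rng(1)
features = rng.random((8, 7, 2))           # stand-in for bagged model outputs
weights = np.array([0.6, 0.4])             # would be learnt; fixed for illustration
probs = fuse(features, weights)
predicted_class = int(np.argmax(probs))
```

Because the layer has only one weight per model plus a bias, it can be trained on a small held-out split without much risk of over-fitting.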


Forward propagation of activation in the DABEA architecture is illustrated in Algorithm 1.

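Algorithm 1 Forward propagation of DABEA
  Given input images
  Normalize and augment the input images
  for each CNN model (Iv4, IRv2) do
    Compute the output vectors of the model on the augmented images
    Randomly bag output vectors of the model
  end for
  Concatenate the bagged outputs, fuse them with the convolution layer, and pool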

3 Experiments and Results

We split the ISIC 2018 [1, 2] training data 90:10 into internal training and validation splits. The two base models, Iv4 and IRv2, are first trained on the internal training split, starting from weights pre-trained on IMAGENET data [7]. The ensemble convolution is then learnt from the DABEA feature outputs over the internal validation split. All base models and the ensemble convolution are trained using the cross-entropy loss. The trained DABEA model is then used to produce predictions for every image in the official ISIC 2018 validation and test data.
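A 90:10 split of this kind can be sketched with the standard library alone. The dataset size, the seed and the choice of a plain random split (rather than a stratified one) are assumptions made for the example.

```python
import random

def split_indices(num_items, train_fraction=0.9, seed=0):
    """Shuffle item indices and cut them into training and validation parts."""
    indices = list(range(num_items))
    random.Random(seed).shuffle(indices)
    cut = round(num_items * train_fraction)
    return indices[:cut], indices[cut:]

# 1000 is an illustrative dataset size, not the size of the ISIC 2018 data.
train_idx, val_idx = split_indices(1000)
```

The training part would be used to fit the base models and the validation part to fit the fusion layer, so the fusion layer never sees images the base models were trained on.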

We experimented with both normalized and un-normalized images as input for training the base models. For normalization we used the per-image normalization as suggested by [6].

For training the Inception-v4 and Inception-Resnet-v2 models, we used the Adam optimizer with an initial learning rate of 0.01, which decays every two epochs at an exponential rate of 0.94. All normalized models were trained for 20000 epochs, while the un-normalized models were trained for 40000 epochs, with a dropout probability of 0.2. While training the ensemble, we randomly bag output vectors, each produced from an augmented input.
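The learning-rate schedule for the base models can be written out as a short function. This is a sketch assuming a staircase schedule, in which the rate is multiplied by 0.94 once every two epochs; a continuous decay is available through the `staircase` flag. The function and parameter names are ours.

```python
def learning_rate(epoch, initial=0.01, decay_rate=0.94, decay_epochs=2,
                  staircase=True):
    """Exponentially decayed learning rate.

    With staircase=True the rate drops in steps every decay_epochs epochs;
    otherwise it decays smoothly as initial * decay_rate ** (epoch / decay_epochs).
    """
    exponent = epoch // decay_epochs if staircase else epoch / decay_epochs
    return initial * decay_rate ** exponent

# Rates for the first six epochs: 0.01 twice, then 0.0094 twice, then 0.008836 twice.
schedule = [learning_rate(e) for e in range(6)]
```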

We use a convolution fusion layer [5] to fuse the outputs in the proposed ensemble CNN model. The single convolutional fusion layer is optimized using the Adam optimizer with a constant learning rate and is trained for 100 epochs. The learnt weights are then used to obtain predictions on the official ISIC 2018 test and validation data. The outputs produced by the 100 different combinations are then merged using a post-pooling technique. We experimented with three pooling techniques: max-pooling, avg-pooling, and extreme-probability pooling, and finally chose avg-pooling for the test submission.
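The three post-pooling options can be compared on a toy example. Max- and avg-pooling are standard; the text does not define extreme-probability pooling, so the version below (keep, per class, the probability farthest from 0.5) is only an assumed reading. The three-class predictions are made up for brevity.

```python
def max_pool(preds):
    """Per class, keep the largest probability across predictions."""
    return [max(col) for col in zip(*preds)]

def avg_pool(preds):
    """Per class, average the probabilities across predictions."""
    return [sum(col) / len(col) for col in zip(*preds)]

def extreme_pool(preds):
    """Assumed definition: per class, keep the probability farthest from 0.5."""
    return [max(col, key=lambda p: abs(p - 0.5)) for col in zip(*preds)]

# Each row is one prediction over three classes (illustrative values).
preds = [
    [0.70, 0.20, 0.10],
    [0.60, 0.10, 0.30],
    [0.40, 0.45, 0.15],
]
pooled_max = max_pool(preds)
pooled_avg = avg_pool(preds)
pooled_extreme = extreme_pool(preds)
```

Average pooling smooths out individual over-confident predictions, whereas the other two let a single prediction decide the pooled value for a class.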

For the DABEA ensemble we used three different sets of base models: (i) un-normalized Iv4 and IRv2, (ii) normalized Iv4 and IRv2, and (iii) both normalized and un-normalized Iv4 and IRv2. The performance of the three sets on the ISIC 2018 validation data is given in Table 1.

Model                              Balanced Accuracy
CNN Ensemble (Un-Norm.)            0.729
CNN Ensemble (Norm.)               0.837
CNN Ensemble (Norm. + Un-Norm.)    0.732
Table 1: Comparison of balanced accuracy values for the convolutional ensemble using different base model sets over the official ISIC 2018 validation data [1, 2].


  • [1] Codella, N.C.F. et al.: "Skin Lesion Analysis Toward Melanoma Detection: A Challenge at the 2017 International Symposium on Biomedical Imaging (ISBI), Hosted by the International Skin Imaging Collaboration (ISIC)". arXiv
  • [2] Tschandl, P., Rosendahl, C., Kittler, H.: "The HAM10000 Dataset: A Large Collection of Multi-Source Dermatoscopic Images of Common Pigmented Skin Lesions". Sci. Data 5, 180161, doi:10.1038/sdata.2018.161, 2018.
  • [3] Szegedy, C. et al.: "Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning". In: AAAI Conference on Artificial Intelligence 2017.
  • [4] He, K. et al.: "Deep Residual Learning for Image Recognition". arXiv
  • [5] Ju, C., Bibaut, A., van der Laan, M.J.: "The Relative Performance of Ensemble Methods with Deep Convolutional Neural Networks for Image Classification". arXiv
  • [6] Menegola, A. et al.: "RECOD Titans at ISIC Challenge 2017". arXiv
  • [7] Russakovsky, O. et al.: "ImageNet Large Scale Visual Recognition Challenge". In: International Journal of Computer Vision (IJCV) 2015.
  • [8] Matsunaga, K. et al.: "Image Classification of Melanoma, Nevus and Seborrheic Keratosis by Deep Neural Network Ensemble". arXiv
  • [9] Berseth, M.: "ISIC 2017 - Skin Lesion Analysis Towards Melanoma Detection". arXiv