Task 3 of the ISIC 2018 challenge, Skin Lesion Analysis Towards Melanoma Detection [2, 1], requires generating, for each test image, a binary classification for each of the 7 disease classes: melanoma, melanocytic nevus, basal cell carcinoma, actinic keratosis, benign keratosis, dermatofibroma, and vascular lesion. The predicted responses are then scored using balanced normalized accuracy.
In this work, we propose a novel framework, the data augmentation and bagging ensemble architecture (DABEA), which uses data augmentation and bagging in combination to generate multiple output vectors per model and then applies a convolution layer as a meta-learner to combine the outputs of the different models. Our main contributions are (i) an ensemble of DNN models with data augmentation and bagging and (ii) a convolution layer for meta-learning over the models.
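The scoring metric can be illustrated with a minimal sketch, assuming that "balanced normalized accuracy" means the mean of the per-class recalls. The function name `balanced_accuracy` and the toy labels are hypothetical and are not part of the challenge tooling.

```python
import numpy as np

def balanced_accuracy(y_true, y_pred, num_classes=7):
    """Mean of per-class recall (assumed reading of the challenge metric)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = []
    for c in range(num_classes):
        mask = y_true == c
        if mask.any():  # skip classes that have no ground-truth samples
            recalls.append(np.mean(y_pred[mask] == c))
    return float(np.mean(recalls))

# Toy example with three classes; per-class recalls are 1.0, 0.5 and 0.0.
score = balanced_accuracy([0, 0, 1, 1, 2], [0, 0, 1, 0, 1], num_classes=3)
```

Because every class contributes equally to the mean, a rare class such as dermatofibroma weighs as much as a frequent one such as melanocytic nevus.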
2.1 Deep neural network models
We selected two of the top-performing CNN architectures, Inception-v4 and Inception-Resnet-v2, as our base models because of their high accuracy on the IMAGENET challenge. Both models are variants of the Inception model, which introduced filters of multiple sizes at the same level of a CNN. Inception-v4 (Iv4) simplified the Inception architecture through further factorization and introduced reduction blocks, while Inception-Resnet-v2 (IRv2) added residual blocks, as in He et al., to deepen the model at a computational cost similar to that of Inception-v4.
2.2 Data augmentation and bagging ensemble architecture (DABEA)
The training set consists of an input data matrix of images and a matrix of the corresponding labels for the seven skin lesion classes. We build and train an ensemble of CNN architectures, DABEA, for two-class classification of skin lesions. DABEA combines two CNN models: (i) Inception-v4 (Iv4) and (ii) Inception-Resnet-v2 (IRv2). The parameters of both models are first learnt by training on the IMAGENET data and then on the training split of the ISIC 2018 skin lesion classification dataset.
First, the images are normalized by subtracting the per-image pixel mean values, which yields the normalized data.
Normalization is used to remove any bias present in the data [6, 9, 8]. Second, data augmentation is performed on the normalized images by cropping, random brightness and saturation changes, and flipping, which increases the number of images available for training.
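The two preprocessing steps can be sketched as follows. This is a minimal NumPy illustration, not the authors' pipeline: the per-channel mean, the crop fraction, the jitter ranges, and the function names `normalize_per_image` and `augment` are all assumptions.

```python
import numpy as np

def normalize_per_image(image):
    """Subtract the image's own mean pixel value (per channel, by assumption)."""
    image = image.astype(np.float32)
    return image - image.mean(axis=(0, 1), keepdims=True)

def augment(image, rng, crop_frac=0.9, max_brightness=0.1, max_saturation=0.2):
    """Random crop, horizontal flip, brightness shift and saturation change."""
    h, w, _ = image.shape
    ch, cw = int(round(h * crop_frac)), int(round(w * crop_frac))
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    out = image[top:top + ch, left:left + cw]
    if rng.random() < 0.5:
        out = out[:, ::-1]  # horizontal flip
    out = out + rng.uniform(-max_brightness, max_brightness)  # brightness shift
    gray = out.mean(axis=2, keepdims=True)
    factor = 1.0 + rng.uniform(-max_saturation, max_saturation)
    return gray + factor * (out - gray)  # scale the distance from gray

rng = np.random.default_rng(0)
image = rng.random((100, 80, 3))          # stand-in for a dermoscopic image
normalized = normalize_per_image(image)
augmented = [augment(normalized, rng) for _ in range(5)]
```

Each call to `augment` yields a different view of the same normalized image, so repeating it several times multiplies the number of training samples.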
Suppose the original dataset is augmented by some factor. The training images are then fed to the Inception-v4 and Inception-Resnet-v2 networks in cascade, so the ensemble consists of two models. Each model is described by its output, its transfer function, and its parameters: the output is obtained by applying the transfer function, with the learnt parameters, to the augmented input images.
Bagging is then performed by randomly selecting output vectors from each model, which gives one bagged feature output per CNN model.
Finally, the bagged outputs of the two models are combined into a single feature output.
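The bagging and combination steps can be sketched in NumPy as below. The shapes are assumptions: each model is taken to produce one 7-dimensional output vector per augmented copy of an image, sampling is done without replacement, and the two bags are stacked as channels. The names `bag_outputs` and `combine_models` are hypothetical.

```python
import numpy as np

def bag_outputs(outputs, bag_size, rng):
    """Randomly pick `bag_size` rows (output vectors) without replacement."""
    idx = rng.choice(outputs.shape[0], size=bag_size, replace=False)
    return outputs[idx]

def combine_models(bag_a, bag_b):
    """Stack the two bags as channels: (bag_size, num_classes, 2)."""
    return np.stack([bag_a, bag_b], axis=-1)

rng = np.random.default_rng(0)
iv4_out = rng.random((10, 7))   # stand-in: 10 augmented copies, 7 classes
irv2_out = rng.random((10, 7))
features = combine_models(bag_outputs(iv4_out, 4, rng),
                          bag_outputs(irv2_out, 4, rng))
```

Drawing a fresh bag for every training step exposes the meta-learner to many different subsets of the same model outputs.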
2.3 Meta-Learning Output Layer
Ensemble meta-learning is done by feeding the combined feature output to a convolution layer. The convolution layer takes the outputs of the two CNN architectures as two input channels and fuses them by pooling to produce the final output.
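A forward pass through such a fusion layer might look like the sketch below. The kernel size (1x1 across the two model channels), the average pooling over the bag axis, and the final softmax are assumptions made for illustration; the source does not specify them, and `meta_forward` is a hypothetical name.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def meta_forward(features, weights, bias):
    """features: (bag_size, num_classes, 2); weights: (2,); bias: scalar."""
    fused = features @ weights + bias  # 1x1 convolution over the two channels
    pooled = fused.mean(axis=0)        # average pooling over the bagged vectors
    return softmax(pooled)             # class probabilities

features = np.ones((4, 7, 2))          # stand-in for a combined feature output
probs = meta_forward(features, np.array([0.5, 0.5]), 0.0)
```

With a 1x1 kernel the layer learns only how much to trust each base model, which keeps the number of meta-learner parameters very small.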
Forward propagation of activation in the DABEA architecture is illustrated in Algorithm 1.
3 Experiments and Results
We split the ISIC 2018 [2, 1] training data 90:10 to form internal training and validation splits. The two base models, Iv4 and IRv2, are first trained on the internal training split, starting from weights pre-trained on the IMAGENET data. The ensemble convolution is then learnt from the DABEA feature output over the internal validation split. All base models and the ensemble convolution are trained using cross-entropy loss. DABEA is then applied to the official ISIC 2018 validation and test data to produce a prediction for every image.
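The 90:10 split can be sketched as a seeded random permutation of sample indices. Whether the original split was random or stratified is not stated, so a plain random split is assumed here; `split_indices` and the sample count are hypothetical.

```python
import numpy as np

def split_indices(num_samples, val_fraction=0.1, seed=0):
    """Return (train_idx, val_idx) from a seeded random permutation."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_samples)
    num_val = int(round(num_samples * val_fraction))
    return perm[num_val:], perm[:num_val]

train_idx, val_idx = split_indices(1000)  # 1000 is a placeholder size
```

Fixing the seed keeps the internal validation split identical across the base models and the ensemble stage.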
We experimented with both normalized and un-normalized images as input for training the base models. For normalization, we used per-image normalization.
For training the Inception-v4 and Inception-Resnet-v2 models, we used the Adam optimizer with an initial learning rate of 0.01, decayed every two epochs at an exponential rate of 0.94. All normalized models were trained for 20000 epochs, while the un-normalized models were trained for 40000 epochs, with a dropout probability of 0.2. While training the ensemble part, we randomly bag output vectors, each produced from an augmented input.
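The learning-rate schedule for the base models can be written as a short function. This sketch assumes a staircase decay, meaning the rate drops by a factor of 0.94 once every two epochs rather than continuously; `learning_rate` is a hypothetical helper.

```python
def learning_rate(epoch, initial_lr=0.01, decay_rate=0.94, decay_epochs=2):
    """Staircase exponential decay (assumed): one decay step per two epochs."""
    return initial_lr * decay_rate ** (epoch // decay_epochs)

# Rates for the first six epochs: two epochs at each level.
schedule = [learning_rate(e) for e in range(6)]
```

Under this assumption the rate stays at 0.01 for epochs 0 and 1, then falls to 0.0094 for epochs 2 and 3, and so on.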
We use a convolution fusion layer to fuse the outputs in the proposed ensemble CNN model. This single convolutional fusion layer is optimized using the Adam optimizer with a constant learning rate and is trained for 100 epochs. The learnt weights are then used to obtain predictions on the official ISIC 2018 test and validation data. The outputs produced by the 100 different combinations are then combined using a post-pooling technique. We experimented with three pooling techniques: max-pooling, avg-pooling, and extreme-probability pooling. We finally chose avg-pooling for the test submission.
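The three post-pooling options can be compared with a small sketch. Max- and avg-pooling are standard; the source does not define extreme-probability pooling, so the version here (per class, keep the probability farthest from 0.5) is only an assumed interpretation, and `post_pool` is a hypothetical name.

```python
import numpy as np

def post_pool(preds, method="avg"):
    """Pool predictions of shape (num_combinations, num_classes) per class."""
    preds = np.asarray(preds, dtype=float)
    if method == "max":
        return preds.max(axis=0)
    if method == "avg":
        return preds.mean(axis=0)
    if method == "extreme":
        # Assumed definition: keep the value farthest from 0.5 for each class.
        idx = np.abs(preds - 0.5).argmax(axis=0)
        return preds[idx, np.arange(preds.shape[1])]
    raise ValueError(f"unknown pooling method: {method}")

preds = [[0.6, 0.2], [0.9, 0.4], [0.3, 0.45]]  # toy: 3 combinations, 2 classes
pooled = {m: post_pool(preds, m) for m in ("max", "avg", "extreme")}
```

Averaging smooths out the variation between individual bags, whereas the max and extreme variants let a single confident combination dominate the result.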
For the DABEA ensemble, we used three different sets of base models: (i) un-normalized Iv4 and IRv2, (ii) normalized Iv4 and IRv2, and (iii) both normalized and un-normalized Iv4 and IRv2. The performance measures for the three sets on the ISIC 2018 validation data are given in Table 1.
- Codella, N.C.F. et al.: "Skin Lesion Analysis Toward Melanoma Detection: A Challenge at the 2017 International Symposium on Biomedical Imaging (ISBI), Hosted by the International Skin Imaging Collaboration (ISIC)". arXiv
- Tschandl, P., Rosendahl, C., Kittler, H.: "The HAM10000 Dataset: A Large Collection of Multi-Source Dermatoscopic Images of Common Pigmented Skin Lesions". Sci. Data 5, 180161 (2018). doi:10.1038/sdata.2018.161
- He, K. et al.: "Deep Residual Learning for Image Recognition". arXiv
- Ju, C., Bibaut, A., van der Laan, M.J.: "The Relative Performance of Ensemble Methods with Deep Convolutional Neural Networks for Image Classification". arXiv
- Menegola, A. et al.: "RECOD Titans at ISIC Challenge 2017". arXiv
- Russakovsky, O. et al.: "ImageNet Large Scale Visual Recognition Challenge". International Journal of Computer Vision (IJCV) (2015).
- Matsunaga, K. et al.: "Image Classification of Melanoma, Nevus and Seborrheic Keratosis by Deep Neural Network Ensemble". arXiv
- Berseth, M.: "ISIC 2017 - Skin Lesion Analysis Towards Melanoma Detection". arXiv