Ensemble of Convolutional Neural Networks for Dermoscopic Images Classification

08/15/2018 ∙ by Tomáš Majtner, et al. ∙ 0

In this report, we present our automated prediction system for disease classification within dermoscopic images. The proposed solution is based on deep learning, where we employed a transfer learning strategy on the VGG16 and GoogLeNet architectures. The key feature of our solution is preprocessing based primarily on image augmentation and colour normalization. The solution was evaluated on Task 3: Lesion Diagnosis of the ISIC 2018: Skin Lesion Analysis Towards Melanoma Detection challenge.




1 Introduction

Malignant melanoma, commonly known as melanoma, is the most dangerous type of skin cancer [8]. It develops in melanocytes, the cells that produce melanin and give skin its colour. The risk of melanoma is increasing, especially among women under 40. On the positive side, this form of cancer can be treated successfully with a high success rate when it is detected and recognized early. However, the examination of suspicious skin lesions is time-consuming and requires expert knowledge. It is therefore natural that much research is devoted to automating the melanoma recognition process.

The International Skin Imaging Collaboration (ISIC) is an international effort to improve melanoma diagnosis, sponsored by the International Society for Digital Imaging of the Skin (ISDIS). The ISIC Archive contains the largest publicly available collection of quality-controlled dermoscopic images of skin lesions. In 2018, ISIC organized its third grand challenge focused on early melanoma detection. This challenge was divided into three separate tasks, the third of which was disease classification.

We present here our framework for automated disease classification within dermoscopic images. Data for this task was extracted from the "ISIC 2018: Skin Lesion Analysis Towards Melanoma Detection" (ISIC 2018) grand challenge datasets [2, 12]. The disease categories recognized in this challenge include: Melanoma (MEL), Melanocytic nevus (NV), Basal cell carcinoma (BCC), Actinic keratosis / Bowen's disease (AKIEC), Benign keratosis (BKL), Dermatofibroma (DF), and Vascular lesion (VASC). See Fig. 1 for illustration.

Figure 1: Illustration of disease categories recognized in this study. Image source: https://challenge2018.isic-archive.com/task3/

2 Proposed Method

Balancing Training Set: The official training set for the ISIC 2018 Task 3 competition consists of 10,015 images. As can be seen from Table 1, the number of samples in the different categories is strongly unbalanced. Moreover, some classes contain only a very small number of images, which could result in inefficient training of a convolutional neural network (CNN). Therefore, to increase the number of training images and, at the same time, the robustness of our system, we performed data augmentation by horizontal flipping of the training samples. This operation doubles the number of training images in each class.

The class balancing was achieved by sample rotation. Here, we exploited the fact that the position of a skin lesion during the acquisition process is arbitrary, and the training data should reflect that property. Rotation was performed around the image center by various rotation factors, which are specified in Table 1. A rotation factor of n implies that each image is rotated n times. This means that, for n = 10, each image is rotated by an angle of i · 36°, where i = 1, …, n.
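As a sketch of this scheme (in Python rather than the authors' Matlab, and assuming the rotation angles are evenly spaced multiples of 360°/n, consistent with the 36° step for n = 10), the angles for a given rotation factor can be enumerated as:

```python
def rotation_angles(n):
    """Rotation angles (in degrees) for a rotation factor of n, assuming the
    n rotations are evenly spaced multiples of 360/n degrees."""
    if n == 0:
        return []  # rotation factor 0: no rotated copies are produced
    step = 360.0 / n
    return [i * step for i in range(1, n + 1)]

# For n = 10, each image is rotated by 36°, 72°, ..., 360°.
print(rotation_angles(10))
```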

The presented augmentation leads to more balanced image classes and also increases the rotation invariance of the CNN. Bicubic interpolation is used when the rotation angle is not a multiple of 90°. As was shown by Goodfellow et al. [4], such data augmentation by flipping and rotation compensates for variations between the training and test sets and has a positive impact on CNN performance. The total number of training images after the balancing step is 96,274.

                 AKIEC    BCC      BKL      DF       MEL      NV       VASC
Training data    327      514      1,099    115      1,113    6,705    142
After flipping   654      1,028    2,198    230      2,226    13,410   284
Rotation factor  17       28       60       6        62       0        7
After rotation   14,388   13,364   13,188   13,800   13,356   13,410   14,768

Table 1: Total number of training images before and after augmentation.
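The counts in Table 1 can be cross-checked with a few lines of Python (the class order is taken from the training-data row, whose values match the published HAM10000 per-class sizes):

```python
# Per-class counts copied from Table 1 (column order: AKIEC, BCC, BKL, DF, MEL, NV, VASC).
training       = [327, 514, 1099, 115, 1113, 6705, 142]
after_flip     = [654, 1028, 2198, 230, 2226, 13410, 284]
after_rotation = [14388, 13364, 13188, 13800, 13356, 13410, 14768]

# Horizontal flipping doubles every class.
assert all(f == 2 * t for t, f in zip(training, after_flip))

# The balanced training-set size quoted in the text.
print(sum(after_rotation))  # → 96274
```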

Colour Normalization: Because the images come from different sources, varying lighting conditions change the visual appearance of the skin lesions. A traditional conversion to grayscale would lead to a significant loss of input information, so we turned to colour constancy algorithms instead. This topic has already been covered in a number of papers [7]. In our work, we tested several methods and decided to use max-RGB [6], which was also used in a recent study [1]. The max-RGB method is based on the assumption that the maximal reflectance achieved in each of the three colour channels is equal [13]. An illustration of this method is presented in Fig. 2.

Figure 2: Examples of image samples after colour normalization. The first row corresponds to the original images and the second row to their max-RGB normalization.
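A minimal pure-Python sketch of max-RGB (white-patch) normalization, not the authors' Matlab implementation: each channel is independently rescaled so that its maximum maps to full intensity, removing a global colour cast.

```python
def max_rgb_normalize(image, white=255.0):
    """Scale each colour channel independently so that its maximum value
    becomes `white` (white-patch / max-RGB colour constancy)."""
    # image: list of rows, each pixel an (R, G, B) tuple
    maxima = [max(px[c] for row in image for px in row) for c in range(3)]
    return [[tuple(px[c] * white / maxima[c] for c in range(3)) for px in row]
            for row in image]

# A 1x2 toy image with a greenish cast: the green channel is scaled down the most.
img = [[(100, 200, 50), (50, 100, 25)]]
out = max_rgb_normalize(img)
```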

Transfer Learning: Our classification is based on fine-tuning the VGG16 [9] and GoogLeNet [10] networks. As observed in a number of previous works [11], training a deep convolutional neural network from scratch is difficult because it requires a large amount of labeled training data, which is rarely available in medical applications. A promising alternative is to fine-tune a CNN that has been pre-trained on a large set of labeled natural images, such as the ImageNet dataset.

The use of transfer learning for skin lesion classification in this work was additionally motivated by a recent skin cancer classification study [3]. Fine-tuning was performed by retraining the weights of the last three layers, where our version has seven neurons in its output layer. For training, we used the colour-normalized, augmented training dataset. We utilized stochastic gradient descent (SGD) for optimization, with a momentum factor of 0.9 and L2 regularization. A learning rate of 0.0001 was used in a mini-batch scheme of 8 images, with 50 epochs of retraining. Our solution was implemented in Matlab R2018a.
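The optimizer described above can be sketched as follows (an illustrative scalar version of SGD with momentum and weight decay; the actual training used Matlab's built-in solver, and the weight_decay default here is a placeholder, since the paper's L2 level was not preserved in this text):

```python
def sgd_momentum_step(w, grad, velocity, lr=1e-4, momentum=0.9, weight_decay=0.0):
    """One SGD step with momentum and L2 regularization (weight decay):
    v <- momentum * v - lr * (grad + weight_decay * w);  w <- w + v."""
    v = momentum * velocity - lr * (grad + weight_decay * w)
    return w + v, v

# Two consecutive updates on a single scalar weight with a constant gradient.
w, v = 1.0, 0.0
w, v = sgd_momentum_step(w, grad=0.5, velocity=v)
w, v = sgd_momentum_step(w, grad=0.5, velocity=v)
```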

Ensembling: As a last step, we employed an ensemble of our VGG16 and GoogLeNet networks, both fine-tuned on the preprocessed images. An ensemble typically achieves higher performance than a single network [5]. In our case, the ensemble weights were set manually to 0.5 for both networks.
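The weighted soft-voting scheme can be sketched as below (class probabilities averaged with the stated 0.5/0.5 weights; the probability vectors are made up for illustration, not actual network outputs):

```python
def ensemble_predict(probs_vgg, probs_googlenet, w=(0.5, 0.5)):
    """Weighted average of the two networks' class-probability vectors."""
    return [w[0] * a + w[1] * b for a, b in zip(probs_vgg, probs_googlenet)]

# Seven classes in the order MEL, NV, BCC, AKIEC, BKL, DF, VASC (illustrative values).
p1 = [0.6, 0.1, 0.1, 0.1, 0.05, 0.03, 0.02]
p2 = [0.4, 0.3, 0.1, 0.1, 0.05, 0.03, 0.02]
avg = ensemble_predict(p1, p2)
pred = max(range(len(avg)), key=avg.__getitem__)  # index of the most probable class
```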

3 Results

The proposed solution was evaluated on the official validation set of the ISIC 2018 Task 3 competition. This set consists of 193 images and was evaluated automatically via the submission webpage. The achieved balanced accuracy was 0.801 for the VGG16 architecture, 0.797 for the GoogLeNet architecture, and 0.815 for their ensemble.
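Balanced accuracy, the challenge metric, is the mean of the per-class recalls; a minimal sketch (the toy labels are illustrative, not challenge data):

```python
def balanced_accuracy(y_true, y_pred, n_classes=7):
    """Mean per-class recall: average over classes of
    (correct predictions in the class) / (samples in the class)."""
    recalls = []
    for c in range(n_classes):
        idx = [i for i, t in enumerate(y_true) if t == c]
        if not idx:
            continue  # skip classes absent from the ground truth
        correct = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)

# Toy two-class example: class 0 recall is 1.0, class 1 recall is 0.5.
print(balanced_accuracy([0, 0, 1, 1], [0, 0, 1, 0], n_classes=2))  # → 0.75
```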


  • [1] Barata, C., Celebi, M., Marques, J.: Improving Dermoscopy Image Classification Using Color Constancy. IEEE Journal of Biomedical and Health Informatics 19(3), 1146–1152 (2015)
  • [2] Codella, N., Gutman, D., Celebi, M., Helba, B., Marchetti, M., Dusza, S., Kalloo, A., Liopyris, K., Mishra, N., Kittler, H., et al.: Skin lesion analysis toward melanoma detection: A challenge at the 2017 international symposium on biomedical imaging (ISBI), hosted by the international skin imaging collaboration (ISIC). In: IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018). pp. 168–172. IEEE (2018)
  • [3] Esteva, A., Kuprel, B., Novoa, R., Ko, J., Swetter, S., Blau, H., Thrun, S.: Dermatologist-Level Classification of Skin Cancer with Deep Neural Networks. Nature 542(7639), 115 (2017)
  • [4] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning, vol. 1. MIT press Cambridge (2016)
  • [5] Kumar, A., Kim, J., Lyndon, D., Fulham, M., Feng, D.: An Ensemble of Fine-Tuned Convolutional Neural Networks for Medical Image Classification. IEEE J. of Biomed. and Health Inf. 21(1), 31–40 (2017)
  • [6] Land, E.: The Retinex Theory of Color Vision. Scientific American 237(6), 108–129 (1977)
  • [7] Madooei, A., Drew, M.: Incorporating Colour Information for Computer-Aided Diagnosis of Melanoma from Dermoscopy Images: A Retrospective Survey and Critical Analysis. International Journal of Biomedical Imaging 2016 (2016)
  • [8] Mermelstein, R., Riesenberg, L.: Changing knowledge and attitudes about skin cancer risk factors in adolescents. Health Psychology 11(6), 371 (1992)
  • [9] Simonyan, K., Zisserman, A.: Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv preprint arXiv:1409.1556 (2014)
  • [10] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going Deeper with Convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 1–9 (2015)
  • [11] Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J.: Convolutional neural networks for medical image analysis: Full training or fine tuning? IEEE Transactions on Medical Imaging 35(5), 1299–1312 (2016)
  • [12] Tschandl, P., Rosendahl, C., Kittler, H.: The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Sci. Data 5, 180161 (2018)
  • [13] Van De Weijer, J., Gevers, T., Gijsenij, A.: Edge-Based Color Constancy. IEEE Transactions on Image Processing 16(9), 2207–2214 (2007)