1 Introduction
Over five million new cases of skin cancer are diagnosed in the United States annually [1]. Early detection of melanoma, one of the most lethal forms of skin cancer, is critical for identifying curable lesions and increasing survival rates [2, 3]. With a large influx of dermoscopy images, the arrival of inexpensive consumer dermatoscope attachments for smartphones [4], and a growing shortage of dermatologists [5], automatic dermoscopic image analysis plays an essential role in timely skin cancer diagnosis. In particular, lesion segmentation is one of the most important steps in dermoscopy image assessment.
However, automatic lesion segmentation from dermoscopy images is a challenging task. The relatively low contrast and obscure boundaries between early-stage skin lesions and normal skin make it very difficult to distinguish lesion areas from normal skin regions. The situation deteriorates further when skin lesions are blurred or occluded by artifacts such as natural hairs, veins, artificial ruler marks, color calibration charts, and air bubbles. Some example images from the “ISBI 2017 Skin Lesion Analysis Towards Melanoma Detection Challenge” (ISBI2017 SLATMDC) [6] are shown in Fig. 1.
Fig. 1. Difficult example images with low contrast, large intraclass variance, and artifacts. Ground truth contours are marked in blue. First row: low contrast between skin lesion and normal skin regions; second row: large intraclass variance; last row: artifacts in images. All images are from the ISBI2017 SLATMDC [6].
Many works have attempted to address this difficult problem. Previously, some researchers proposed segmenting lesions based on hand-crafted features [7, 8, 9, 10, 11]. Recently, convolutional neural networks (CNNs) have been adopted to improve medical image segmentation accuracy [12, 13]. Deep CNN-based skin cancer recognition has been shown to achieve dermatologist-level classification accuracy [14]. The fully convolutional neural network (FCNN) has shown promising segmentation results on natural images (PASCAL VOC) by fusing multi-scale prediction maps from multiple of its layers [15]. The FCNN is trained end to end, and multi-scale contextual information is explored automatically within its framework.
Inspired by [15], we propose a new FCNN framework for skin lesion segmentation in this paper. We take the VGG 16-layer net (publicly available from the Caffe model zoo) pretrained on ImageNet, discard its last classification layer, and replace all fully connected layers with convolution layers with randomly initialized weights. The weights in the remaining layers are kept, exploiting transfer learning to cope with the small size of medical image datasets. We add skip layers to fuse multi-scale prediction maps from multiple layers. Unlike [15], which fuses prediction maps by a pixel-wise sum operation, we treat multi-scale prediction maps as multi-scale feature maps and fuse them by concatenation. Since our concatenating fusion strategy keeps the fused features in a higher-dimensional space, we can capture more distinguishing local and global features. We then add a randomly initialized convolution layer as the last layer to produce the final prediction from the fused features. In [16], a very deep fully convolutional residual network (FCRN) was proposed for skin lesion segmentation. The authors of [16] focused on residual connections and very deep networks (more than 50 layers). In contrast, our proposed FCNN has a different structure and is relatively shallow (fewer than 20 layers), yet our experimental results remain comparable to those of [16]. Our major contributions in this paper are summarized as follows:
- We design a novel fully convolutional neural network for skin lesion segmentation in dermatoscopic images with end-to-end learning, transfer learning, and pixel-by-pixel prediction.
- We treat multi-scale prediction maps as feature maps and propose a new fusion layer with a concatenation operation to combine this multi-scale contextual information and obtain more distinguishing features. We also add a convolution layer as the last layer of our FCNN, which takes the fused features as inputs to produce the prediction.
- We evaluate our proposed FCNN on the public dermatoscopy image dataset from the “ISBI 2017 Skin Lesion Analysis Towards Melanoma Detection Challenge” [6] and submitted preliminary results to the challenge without any pre- or post-processing.
2 Proposed Fully Convolutional Network
In this section, we detail our proposed fully convolutional network for skin lesion segmentation. Because relatively few medical images and labels are available, we start from a network pretrained on an image classification task and design our own dense prediction network on top of it, making use of transfer learning. Skip layers are added to incorporate multi-scale information into the network. The architecture of our proposed FCNN is illustrated in Fig. 2.

2.1 Basic Network Architecture
We take the pretrained VGG 16-layer net (publicly available from the Caffe model zoo), which consists of 13 convolution layers, 3 fully connected layers, 15 ReLU layers, 5 pooling layers, 2 dropout layers, and 1 softmax layer. This VGG net was trained for natural image classification on ImageNet. For our dense segmentation purpose, we discard its last two classification layers (the last fully connected layer and the softmax layer) and replace the remaining 2 fully connected layers with randomly initialized convolution layers. The learned weights in the remaining layers are kept, exploiting transfer learning to cope with the small medical image dataset. The modified network is thus a fully convolutional network able to take images of arbitrary size as inputs.
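As a rough MatConvNet sketch of this step (the file and layer names follow the released VGG-16 model but are assumptions here, as are the re-initialization details), the backbone can be prepared as follows:

```matlab
% Hedged sketch: build a fully convolutional VGG-16 backbone.
% Assumes MatConvNet is on the Matlab path.
net = load('imagenet-vgg-verydeep-16.mat');   % pretrained weights
net = vl_simplenn_tidy(net);                  % fill in missing defaults
net = dagnn.DagNN.fromSimpleNN(net);          % convert to a DagNN graph

% Discard the last classification layers; 'fc8' and 'prob' are the
% layer names in the released model (an assumption here).
net.removeLayer('fc8');
net.removeLayer('prob');

% The two remaining former fully connected layers ('fc6', 'fc7') are
% stored as convolutions; re-initialize their weights randomly, e.g.:
idx = net.getParamIndex('fc6f');              % parameter name assumed
net.params(idx).value = 0.01 * randn(7, 7, 512, 4096, 'single');
```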
2.2 Adding Skip Layers
We add skip layers to fuse multi-scale prediction maps from multiple layers. Unlike [15], which fuses prediction maps by a pixel-wise sum operation, we treat multi-scale prediction maps as multi-scale feature maps and fuse them by concatenation. Specifically, after each pooling layer we add a 1×1 convolution layer to compute a dense prediction map with 2 channels (the number of classes per pixel in this paper). This 1×1 convolution layer can be considered a classifier that classifies each pixel independently [15]. A 1×1 convolution layer is also attached to the last convolution layer of the basic network from the above subsection. We thus add 6 such convolution layers to the basic network in total; their outputs are 6 dense prediction maps (2 channels each).

The above 6 dense prediction maps have different resolutions. We add multiple deconvolution layers [15] to upsample the 6 maps to the original size of the input image. The weights of the deconvolution layers are randomly initialized, and all of them are learned automatically [15]. We then add a concatenation layer that concatenates the 6 upsampled prediction maps (each the same size as the input image) along the third (channel) dimension to form a final prediction feature map with 12 channels.
With our concatenating fusion strategy, which keeps the fused features in a higher-dimensional space (12 dimensions in this paper), we can capture more distinguishing local and global features. We then add a randomly initialized convolution layer after the concatenation layer as the last layer; it computes the final prediction map from the concatenated feature map. In the training stage, we append a softmax loss layer to fine-tune our network; at test time, the loss layer is removed.
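The wiring of one skip branch can be sketched with MatConvNet's DagNN API as follows; the variable names ('pool3', 'up1'–'up6'), the 256-channel input, the 8× upsampling factor, and the 1×1 kernel of the final fusion convolution are assumptions for illustration:

```matlab
% Hedged sketch of one skip branch: 1x1 scoring convolution, learned
% deconvolution (ConvTranspose) upsampling, then fusion by concatenation.

% Score layer: 1x1 convolution mapping pool3 features (256 channels
% assumed) to a 2-channel dense prediction map.
net.addLayer('score3', dagnn.Conv('size', [1 1 256 2], 'hasBias', true), ...
             {'pool3'}, {'score3'}, {'score3_f', 'score3_b'});

% Upsampling: learned deconvolution back to the input resolution
% (8x upsampling assumed for this branch).
net.addLayer('up3', dagnn.ConvTranspose('size', [16 16 2 2], ...
             'upsample', 8, 'crop', [4 4 4 4], 'hasBias', false), ...
             {'score3'}, {'up3'}, {'up3_f'});

% After building all six branches, concatenate along the channel
% dimension (dim 3) to form the 12-channel fused feature map, then
% classify it with one last randomly initialized convolution.
net.addLayer('fuse', dagnn.Concat('dim', 3), ...
             {'up1', 'up2', 'up3', 'up4', 'up5', 'up6'}, {'fused'});
net.addLayer('final', dagnn.Conv('size', [1 1 12 2], 'hasBias', true), ...
             {'fused'}, {'prediction'}, {'final_f', 'final_b'});
```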
3 Experimental Results
3.1 Dataset
We use the public dermatoscopy image dataset from the “ISBI 2017 Skin Lesion Analysis Towards Melanoma Detection Challenge” [6] to evaluate the performance of our proposed method. This dataset consists of 2000 training images, 150 validation images, and 600 test images. Ground truth binary mask images from expert manual segmentation are also provided, with pixel values 255 and 0 indicating lesion and skin pixels, respectively.
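For training, the 255/0 masks can be mapped to the 1-based per-pixel class labels that MatConvNet's softmax loss expects; a minimal sketch, where the file name pattern and the 127 threshold are assumptions:

```matlab
% Hedged sketch: convert a ground-truth mask into per-pixel labels
% (1 = skin, 2 = lesion); threshold and file name are assumptions.
mask   = imread('ISIC_0000000_segmentation.png');
labels = single(mask > 127) + 1;
```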
3.2 Implementation
Our proposed method is implemented in Matlab using the open-source deep learning toolbox “MatConvNet” [17]. Our computer has an NVIDIA GTX 1080 GPU, runs Windows 7, and has a 4.5 GHz Intel i7-7700K processor and 64 GB of memory. To tackle the small medical dataset problem, we utilize a pretrained VGG 16-layer net (publicly available from the Caffe model zoo) to initialize some weights of our proposed FCNN, following the transfer learning idea. Stochastic gradient descent (SGD) is used to fine-tune our proposed FCNN with batch size 6, momentum 0.9, weight decay 0.0001, and learning rate 0.001.
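These hyperparameters map directly onto the options of MatConvNet's bundled cnn_train_dag training driver; in the sketch below, the imdb structure and getBatch function are user-supplied and therefore assumptions:

```matlab
% Hedged sketch: fine-tuning the network with SGD via cnn_train_dag.
% net is the DagNN built above; imdb and getBatch are user-defined.
[net, info] = cnn_train_dag(net, imdb, @getBatch, ...
    'batchSize',    6, ...
    'momentum',     0.9, ...
    'weightDecay',  1e-4, ...
    'learningRate', 1e-3, ...
    'gpus',         1);
```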
3.3 Evaluation Metrics
We use the same metrics as the challenge organizers to evaluate the performance of our proposed method: sensitivity (SE), specificity (SP), accuracy (AC), the Jaccard index (JA), and the Dice coefficient (DI). According to the challenge rules, these criteria are calculated for each test image and then averaged over the whole test set to obtain the final results. Methods are ranked by their respective JA values.
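For reference, all five per-image metrics follow directly from the confusion counts of the binary masks; a minimal sketch, with illustrative variable names:

```matlab
% Hedged sketch: per-image metrics from logical masks pred and gt
% (true = lesion pixel); variable names are illustrative.
TP = sum( pred(:) &  gt(:));          % lesion pixels found
TN = sum(~pred(:) & ~gt(:));          % skin pixels rejected
FP = sum( pred(:) & ~gt(:));          % false lesion pixels
FN = sum(~pred(:) &  gt(:));          % missed lesion pixels

SE = TP / (TP + FN);                  % sensitivity
SP = TN / (TN + FP);                  % specificity
AC = (TP + TN) / numel(gt);           % accuracy
JA = TP / (TP + FP + FN);             % Jaccard index
DI = 2 * TP / (2 * TP + FP + FN);     % Dice coefficient
```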
3.4 Performance of Our Method
We demonstrate the performance of our proposed method with both qualitative segmentation results and quantitative results.
3.4.1 Qualitative Results
To show how our proposed method works on difficult images, we present segmentation results for some challenging images in Fig. 3, where ground truth contours and our segmented contours are marked in blue and red, respectively. Images with low contrast, irregular shapes, and artifacts are shown in the first, second, and last rows of Fig. 3, respectively. As Fig. 3 shows, our method performs well on all of these challenging images.
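Overlays like those in Fig. 3 can be produced with Matlab's Image Processing Toolbox; a minimal sketch, with illustrative variable names:

```matlab
% Hedged sketch: draw ground-truth (blue) and predicted (red) contours.
imshow(img); hold on;
visboundaries(gt,   'Color', 'b');    % ground truth contour
visboundaries(pred, 'Color', 'r');    % our segmentation contour
hold off;
```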
3.4.2 Quantitative Results
We submitted our preliminary results to the challenge without any pre- or post-processing.
4 Discussion and Conclusion
In this paper, we build a new FCNN for skin lesion segmentation and submit preliminary results to the ISBI 2017 challenge without any pre- or post-processing. The performance of our proposed method could be further improved by data augmentation and by avoiding image size reduction.
References
- [1] Rebecca L. Siegel, Kimberly D. Miller, and Ahmedin Jemal, “Cancer statistics, 2016,” CA: A Cancer Journal for Clinicians, vol. 66, no. 1, pp. 7–30, 2016.
- [2] Kenneth A. Freedberg, Alan C. Geller, Donald R. Miller, Robert A. Lew, and Howard K. Koh, “Screening for malignant melanoma: A cost-effectiveness analysis,” Journal of the American Academy of Dermatology, vol. 41, no. 5, pp. 738–745, 1999.
- [3] Charles M. Balch, Antonio C. Buzaid, et al., “Final version of the American Joint Committee on Cancer staging system for cutaneous melanoma,” Journal of Clinical Oncology, vol. 19, no. 16, pp. 3635–3648, 2001.
- [4] MoleScope, https://molescope.com/.
- [5] Alexa Boer Kimball and Jack S. Resneck Jr., “The US dermatology workforce: A specialty remains in shortage,” Journal of the American Academy of Dermatology, vol. 59, no. 5, pp. 741–745, 2008.
- [6] “Skin lesion analysis toward melanoma detection: A challenge at the international symposium on biomedical imaging (ISBI) 2017, hosted by the international skin imaging collaboration (ISIC),” 2017.
- [7] Tatiana Tommasi, Elisabetta La Torre, and Barbara Caputo, “Melanoma recognition using representative and discriminative kernel classifiers,” in Proceedings of the Second ECCV International Conference on Computer Vision Approaches to Medical Image Analysis (CVAMIA’06), Berlin, Heidelberg, 2006, pp. 1–12, Springer-Verlag.
- [8] M. Emre Celebi, Hassan A. Kingravi, et al., “A methodological approach to the classification of dermoscopy images,” Computerized Medical Imaging and Graphics, vol. 31, no. 6, pp. 362–373, 2007.
- [9] H. Ganster, P. Pinz, R. Rohrer, et al., “Automated melanoma recognition,” IEEE Transactions on Medical Imaging, vol. 20, no. 3, pp. 233–239, March 2001.
- [10] Gerald Schaefer, Bartosz Krawczyk, M. Emre Celebi, and Hitoshi Iyatomi, “An ensemble classification approach for melanoma diagnosis,” Memetic Computing, vol. 6, no. 4, pp. 233–240, 2014.
- [11] M. H. Jafari, S. Samavi, et al., “Set of descriptors for skin cancer diagnosis using non-dermoscopic color images,” in 2016 IEEE International Conference on Image Processing (ICIP), Sept. 2016, pp. 2638–2642.
- [12] Olaf Ronneberger, Philipp Fischer, and Thomas Brox, “U-Net: Convolutional networks for biomedical image segmentation,” arXiv preprint arXiv:1505.04597, May 2015.
- [13] Hao Chen, Xiaojuan Qi, et al., “DCAN: Deep contour-aware networks for object instance segmentation from histology images,” Medical Image Analysis, vol. 36, pp. 135–146, 2017.
- [14] Andre Esteva, Brett Kuprel, Roberto A. Novoa, et al., “Dermatologist-level classification of skin cancer with deep neural networks,” Nature, vol. 542, 2017.
- [15] Jonathan Long, Evan Shelhamer, and Trevor Darrell, “Fully convolutional networks for semantic segmentation,” CoRR, vol. abs/1411.4038, 2015.
- [16] L. Yu, H. Chen, Q. Dou, J. Qin, and P. A. Heng, “Automated melanoma recognition in dermoscopy images via very deep residual networks,” IEEE Transactions on Medical Imaging, vol. PP, no. 99, pp. 1–1, 2016.
- [17] A. Vedaldi and K. Lenc, “MatConvNet: Convolutional neural networks for MATLAB,” in Proceedings of the ACM International Conference on Multimedia, 2015.