Beyond Visual Image: Automated Diagnosis of Pigmented Skin Lesions Combining Clinical Image Features with Patient Data

by   Jose G. M. Esgario, et al.

Skin cancer is considered one of the most common types of cancer in several countries. Due to the difficulty and subjectivity of the clinical diagnosis of skin lesions, Computer-Aided Diagnosis systems are being developed to assist experts in performing more reliable diagnoses. The clinical analysis and diagnosis of skin lesions rely not only on visual information but also on context information provided by the patient. This work addresses the problem of detecting pigmented skin lesions in images captured with smartphones. In addition to the features extracted from the images, patient context information was collected to provide a more accurate diagnosis. The experiments showed that combining visual features with context information improved the final results. Experimental results are very promising and comparable to those of experts.








1 Introduction

Skin cancer is considered one of the most common types of cancer worldwide. Among the most common types of skin cancer are basal cell carcinoma, squamous cell carcinoma and melanoma. According to the WHO (who2018), between 2 and 3 million non-melanoma skin cancers and 132,000 melanoma skin cancers currently occur every year in the world. Melanoma is by far the most dangerous form of skin cancer, causing the majority of all skin cancer deaths (allen2016). Early diagnosis of the disease plays an important role in reducing the mortality rate, with a high chance of cure when the lesion is detected early (SBD).

The diagnosis of pigmented skin lesions (PSLs) can be made by invasive and non-invasive methods. One of the most common non-invasive methods was presented by SOYER1987803. The method allows the visualization of morphological structures not visible to the naked eye through an instrument called a dermatoscope. Compared to clinical diagnosis, the use of the dermatoscope by experts makes the diagnosis of PSLs easier, increasing the diagnostic sensitivity (mayer1997). An investigation by vestergaard2008 evaluated the results of nine studies on the diagnosis of PSLs with the naked eye and with the dermatoscope. The results obtained by specialists for diagnosis with the naked eye presented a sensitivity of 0.71 and a specificity of 0.81, whereas the results with the dermatoscope presented a sensitivity of 0.90 and a specificity of 0.90 (with the corresponding confidence intervals). To diagnose PSLs, a number of methods can be used, for example, the ABCD rule (NACHBAR1994), the Menzies method (menzies1996) and the 7-Point Checklist (argenziano1998). However, diagnostic methods based on dermoscopic images are qualitative or semi-quantitative, in such a way that they are highly subjective and rely on the expertise of the specialist to obtain good results (Binder1995).

There is great research interest worldwide in the development of Computer-Aided Diagnosis (CAD) systems (Silveira2009) that can help not only specialists to perform more reliable diagnoses but also non-specialists to detect early malignant lesions in themselves. Most studies in the literature use dermoscopic images, but many times someone may want a qualified opinion about a certain lesion when the only equipment available is a conventional smartphone camera. In these cases, a CAD system that deals with macroscopic images is the most suitable option.

The basic structure of a CAD system consists of the following steps: image acquisition, segmentation, feature extraction and classification (masood2013). The first step, image acquisition, can be performed by different devices such as dermatoscopes, spectroscopes, conventional digital cameras and smartphone cameras. The second step involves artifact removal and lesion border detection. The final steps of a CAD system are to extract a set of discriminative features and to classify the PSL images from these features.

Segmentation is an essential step in image analysis and pattern recognition, such that the quality of the results of a CAD system is strongly related to the quality of segmentation (Pal1993). Nevertheless, segmenting images of skin lesions can be a very difficult task because of the variety of shapes, sizes, textures and colors. In addition, there are variations such as specular reflection, brightness differences, presence of artifacts, low contrast, etc. (Silveira2009), which make the task more complicated. Segmentation methods can be roughly grouped into the following categories: histogram thresholding, clustering, edge-based, region-based, morphological, model-based, active contours and soft computing (for example, neural networks and fuzzy logic) (Celebi2009). In addition, segmentation methods can be separated into two larger groups: automatic and interactive (semi-automatic).

Among the works on skin lesion segmentation, automatic methods are the most common in the literature. Two systematic reviews by Celebi2009 and celebi2015 address the main existing segmentation methods. New segmentation approaches are often presented in the literature, either improving on old methods or proposing new ones that obtain better and more consistent results.

Khalid2016 presented a new segmentation technique based on the Wavelet transform applied to the blue channel. It was observed in the experiments that the Wavelet transform is very useful in the removal of some artifacts. A region-based algorithm was proposed by Pennisi2016. The algorithm performs two processes in parallel: skin region detection by thresholding, and segmentation. The lesion segmentation is performed by means of edge detection with the Canny method, after which Delaunay triangulation is applied. The results of the two procedures are merged to generate the final lesion mask. Eltayef2017 used a hybrid method combining the Particle Swarm Optimization algorithm and the Markov Random Field method. The pre-processing step was performed by a bank of directional filters and image reconstruction methods. The approach showed great potential in automatically identifying lesion edges.

Recently, research interest in deep learning approaches and skin lesion datasets has increased greatly, and several works have been developed for skin lesion segmentation.

Yuan2017 proposed a variation of a fully convolutional network ensemble that explicitly includes information from multiple color spaces. The work won the skin lesion segmentation task of the challenge at the 2017 International Symposium on Biomedical Imaging (ISBI) (ISBI2017). ALMASNI2018 proposed a new segmentation method via a full resolution convolutional network. The proposed method learns the features of each input image pixel and does not require pre- and post-processing steps.

Besides the automatic methods, there are the interactive methods, which receive this name because they require some human interaction. Among the existing interactive methods for skin lesion segmentation, the most common approaches are based on active contours and region growing, which need initial seeds to perform segmentation. The level of user interaction depends on the method being applied. Silveira2009 compared six segmentation methods for dermoscopic images; the evaluated interactive methods achieved the best results.

BENSAID1996 developed a method called Semi-Supervised C-Means with the purpose of overcoming the difficulties of clustering algorithms in cases where some data of each class can be labeled. The proposed method was applied to magnetic resonance imaging and was superior to other methods in the literature. Surlakar2016 presented a comparative analysis of the K-Means and K-Nearest Neighbors (K-NN) methods in the segmentation of histopathological images of sweat gland tissues. The results were evaluated using the mutual information metric. The authors concluded that K-NN is the best alternative for interactive image segmentation. Recently, Luis2018 proposed an interactive segmentation algorithm for medical images, called Seeded Fuzzy C-Means (S-FCM). The method treats the seeds provided by the user as centroids and classifies the unlabeled pixels based on their similarity to the seeds. The proposed method was evaluated on several datasets, including a dermoscopic image dataset.

In addition to the works focused on segmentation methods, several models of CAD systems have been presented in the literature combining different approaches of segmentation, feature extraction and classification (barata2014; ferris2015; abuzaghleh2015; suganya2016; jaworek20162; bakheet2017). The feature extraction approaches used by these systems can be divided into four classes: hand-crafted features, dictionary-based features, deep learning features and clinically inspired features (fidalgo2018). The most common ones are based on hand-crafted features inspired by the ABCD rule (asymmetry, border, color and diameter) and on texture descriptors (korotkov2012). According to the review by masood2013, the most common classification methods used in the diagnosis of PSLs are: Artificial Neural Networks, Statistical Analysis, Support Vector Machines (SVM), Decision Trees and K-NN. However, most of these approaches were applied only to dermoscopic images.

Recently, several approaches have been proposed to perform PSL diagnosis from macroscopic images. The segmentation of macroscopic images is more challenging due to a variety of external factors; for instance, differences in image brightness are a common issue that segmentation approaches focused on macroscopic images attempt to address (wong2011; cavalcanti2013; cavalcanti20132).


One earlier work introduced a new system for the diagnosis of PSLs. The proposed system contains a decision-making component that combines the results of image classification and context knowledge (skin type, age, gender and part of the body) using a Bayesian network. The addition of context knowledge improved the final results.

ramezani2014 developed a system for melanoma diagnosis. A set of features based on the ABCD rule was extracted, and an SVM was used in the classification step. chang2013 collected a set of conventional photographs and compared the classification results of the proposed CAD system with the results obtained by specialists. The results were superior to those obtained by specialists, suggesting that even with conventional images the system has a high discriminative capacity for malignant and benign lesions. oliveira2016 developed a system capable of classifying not only the PSLs but also the features of asymmetry, border, color and texture, so that the system provides the expert with both the final diagnosis and the individual features of each lesion.

Due to the increase in smartphone processing power, some works have been carried out to embed complete PSL diagnosis systems in these devices. Since smartphones have become increasingly accessible, this kind of system can contribute much to the early diagnosis of malignant lesions (do2018; rat2018).

This paper proposes a new approach for the diagnosis of PSLs from macroscopic images captured by smartphones. The proposed system combines features extracted from the image with context information about the patient in order to obtain a more accurate diagnosis. In addition, a new method of interactive segmentation is presented that takes into account the similarity of color and proximity of pixels.

The remainder of this paper is organized as follows: Section 2 introduces the segmentation framework; Section 3 describes the feature extraction stage; Section 4 explains the classification method used; Section 5 describes the proposed approach; Section 6 presents the experiments and results; finally, Section 7 presents a brief conclusion with directions for future work.

2 Segmentation Framework

The proposed segmentation framework consists of the following steps: image acquisition, user inputs, pre-processing, image segmentation and post-processing. In Figure 1, a block diagram of the proposed segmentation framework is presented. The input images were standardized to a size of 300x225 pixels (4:3 ratio). The chosen size, although much smaller than that of the original images, yields a significant reduction in segmentation time without a significant loss in result quality. The final result is the mask of the region of interest (ROI). The intermediate steps are detailed in the following subsections.

Figure 1: Block diagram of the proposed segmentation framework.

2.1 User interaction

User interaction is accomplished through clicks on the image. Clicks are interpreted as labeled seeds marking the lesion and the background. The only constraint of the proposed method is that at least one seed for each region must be provided by the user, i.e., the minimum user interaction to perform segmentation is two clicks. Figure 2 shows an example of interaction where the user selected three foreground seeds (red dots) and three background seeds (blue dots).

Figure 2: Example of interaction where the user selected three foreground seeds (red dots) and three background seeds (blue dots).

2.2 Pre-processing

The motivation for applying pre-processing steps is to increase performance in the segmentation stage. Figure 3 presents the pre-processing steps performed in this work. Figure 4a was used as an example to demonstrate the results obtained in each pre-processing step. The details of each block are described in the following sections.

Figure 3: Block diagram of the pre-processing steps.

2.2.1 Hair removal

Hair is one of the most common artifacts in skin lesion images, and its removal is essential to improve image segmentation. Most hair removal methods are based on three main steps, namely, hair highlighting, segmentation and inpainting (Abbas2011). The hair was highlighted with the Laplacian of Gaussian filter with a 5x5 kernel. The hair mask was generated by fixed thresholding. Finally, each masked pixel of the original image was replaced by the mean value of the neighboring unmasked pixels (Figure 4b) within a fixed-size window.
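The three hair-removal steps above can be sketched in plain numpy. This is a minimal illustration, not the paper's implementation: the relative threshold value and the edge-padded naive convolution are assumptions of this sketch.

```python
import numpy as np

def log_kernel(size=5, sigma=1.0):
    """Laplacian-of-Gaussian kernel, shifted to zero mean so flat regions give 0."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx ** 2 + yy ** 2
    k = (r2 - 2 * sigma ** 2) / sigma ** 4 * np.exp(-r2 / (2 * sigma ** 2))
    return k - k.mean()

def convolve2d(img, kernel):
    """Naive 'same' convolution with edge padding."""
    kh, kw = kernel.shape
    padded = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    out = np.zeros_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel[::-1, ::-1])
    return out

def remove_hair(gray, rel_thresh=0.5, win=5):
    """Highlight hair with a 5x5 LoG filter, threshold the response, then
    replace masked pixels by the mean of unmasked neighbours in a win x win
    window. The relative threshold is this sketch's assumption."""
    response = np.abs(convolve2d(gray, log_kernel()))
    mask = response > rel_thresh * response.max()
    out = gray.astype(float).copy()
    r = win // 2
    for i, j in zip(*np.nonzero(mask)):
        patch = out[max(0, i - r):i + r + 1, max(0, j - r):j + r + 1]
        good = ~mask[max(0, i - r):i + r + 1, max(0, j - r):j + r + 1]
        if good.any():
            out[i, j] = patch[good].mean()  # mean of unmasked neighbours only
    return out, mask

# Toy example: a dark hair-like line across uniform skin.
gray = np.full((20, 20), 0.8)
gray[10, :] = 0.1
clean, hair_mask = remove_hair(gray)
```

On the toy image the line is detected and inpainted back to roughly the skin value.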

2.2.2 Color space conversion

The L*a*b* color space mimics the way human vision perceives colors, making it perceptually uniform (Garcia-Lamont2018). This color space was chosen due to its linear variation of hue, for which the Euclidean distance, used in the segmentation method, is a good metric for the chromaticity difference. In addition, the image luminance (L*) is stored in a separate channel, which facilitates illumination correction.

2.2.3 Illumination correction

This stage aims to normalize the illumination of the images, following the approach proposed by Cavalcanti2010. The first step consists in determining a set of pixels that belong to the skin region. In the original approach, a small region at each image corner was used to estimate the illumination map. Glaister2012 used the statistical region merging method to segment the skin lesion. In this work, the Otsu method (otsu) is used due to its ease of implementation and speed of execution. The set of skin pixels is used to adjust the quadratic function given by:

z(x, y) = P1·x² + P2·y² + P3·x·y + P4·x + P5·y + P6

where the adjustment of the function is given by the choice of the coefficients P1, …, P6 that minimize the error function ε:

ε = Σ_{k=1}^{N} [ z(x_k, y_k) − L(x_k, y_k) ]²

where x_k and y_k are the x and y coordinates of the k-th element of the skin pixel set, L is the luminance channel and N is the total number of pixels belonging to the skin region. In the approach proposed by Cavalcanti2010 the illumination correction is performed on the V channel of the HSV color space. For this work the L* channel of the L*a*b* color space is used, and the illumination correction is computed by the following equation:

L̃(x, y) = L(x, y) − z(x, y)

where L̃ is the final normalized luminance, resulting from the subtraction of the estimated illumination map z from the original luminance L. Replacing the original luminance channel with L̃ results in an image in the L*a*b* color space with normalized luminance. An example of an image normalized by the illumination correction stage can be seen in Figure 4c.
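The quadratic surface fit can be sketched as an ordinary least-squares problem over the skin pixels. This is a minimal sketch in the spirit of the approach described above; the synthetic image and the choice of adding the mean back after subtraction are assumptions of this example.

```python
import numpy as np

def fit_illumination(L, skin_mask):
    """Least-squares fit of a quadratic surface z(x, y) to the luminance
    of the pixels flagged as skin; returns the full illumination map."""
    ys, xs = np.nonzero(skin_mask)
    A = np.column_stack([xs ** 2, ys ** 2, xs * ys, xs, ys, np.ones_like(xs)])
    coeffs, *_ = np.linalg.lstsq(A, L[ys, xs], rcond=None)
    yy, xx = np.mgrid[0:L.shape[0], 0:L.shape[1]]
    return (coeffs[0] * xx ** 2 + coeffs[1] * yy ** 2 + coeffs[2] * xx * yy
            + coeffs[3] * xx + coeffs[4] * yy + coeffs[5])

# Synthetic example: flat skin under a linear illumination gradient.
yy, xx = np.mgrid[0:40, 0:40]
L = 50 + 0.5 * xx                   # luminance rises to the right
skin = np.ones_like(L, dtype=bool)
skin[15:25, 15:25] = False          # pretend the lesion sits here
z = fit_illumination(L, skin)
L_norm = L - z + z.mean()           # subtract the map, keep overall brightness
```

Because the synthetic gradient is linear, the quadratic fit recovers it exactly and the normalized luminance becomes flat.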

2.2.4 Median filter

Lastly, a median filter with a window size of 5x5 is applied to remove small artifacts that may affect the segmentation quality. This filter was chosen for its ability to remove noise while preserving image contours (oliveira2016). Figure 4d shows an example of the median filter application.

Figure 4: Pre-processing steps: (a) Original image; (b) Image after hair removal; (c) Image with normalized luminance; (d) Smoothed image by median filter.

2.3 Image Segmentation

The image segmentation step consists of separating the pixels which represent the lesion from the pixels which represent the background. Since a set of labeled seeds (pixels) is available, due to the interactive nature of the proposed method, an algorithm for pixel classification based on the nearest neighbor rule was used. This decision rule provides a simple non-parametric procedure for assigning labels to the input set based on the class label of the nearest neighbor (Keller1985). To calculate the similarity between two samples, the distance measure proposed by achanta2010, which takes into account both color similarity and pixel proximity, was used in this work. The distance measure is calculated according to

D(p, q) = sqrt( d_lab(p, q)² + (d_xy(p, q) / S)² · m² )

where d_lab is the Euclidean distance between the sample pair p and q in the L*a*b* color space and d_xy is their Euclidean distance in the x-y plane, normalized by S. S was defined as the diagonal size of the image, sqrt(width² + height²). The variable m controls the importance of the spatial position in the classification of the unlabeled pixels and impacts the compactness of the final mask.

The classification is performed by assigning to the input sample p the label of the labeled sample q that presents the shortest distance, according to the following equation:

label(p) = label( argmin_{q ∈ seeds} D(p, q) )
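The nearest-seed rule with the combined color and spatial distance can be sketched as follows. This is an illustrative sketch, not the paper's code; the weight value m and the toy image are assumptions of this example.

```python
import numpy as np

def segment_nn(lab_img, seeds, m=0.2):
    """Label every pixel with the class of its nearest seed, combining
    colour distance and spatial distance in the spirit of achanta2010.
    `seeds` is a list of (row, col, class) tuples; `m` weights spatial
    proximity and S normalises the x-y distance by the image diagonal."""
    h, w, _ = lab_img.shape
    S = np.hypot(h, w)
    yy, xx = np.mgrid[0:h, 0:w]
    best_d = np.full((h, w), np.inf)
    labels = np.zeros((h, w), dtype=int)
    for (sy, sx, cls) in seeds:
        d_color = np.sqrt(((lab_img - lab_img[sy, sx]) ** 2).sum(axis=2))
        d_xy = np.hypot(yy - sy, xx - sx)
        D = np.sqrt(d_color ** 2 + (d_xy / S) ** 2 * m ** 2)
        upd = D < best_d            # keep the nearest seed seen so far
        best_d[upd] = D[upd]
        labels[upd] = cls
    return labels

# Toy example: dark lesion on light skin, one seed per region.
img = np.ones((30, 30, 3))
img[10:20, 10:20] = 0.0
labels = segment_nn(img, seeds=[(15, 15, 1), (2, 2, 0)])
```

Pixels inside the dark square take the lesion seed's label; the rest take the background label.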
2.4 Post-processing

Undesired elements or holes may be present in the mask resulting from the image segmentation step, so three post-processing steps are applied. The first step identifies the objects in the image and checks which ones contain at least one seed provided by the user; objects without seeds are removed from the mask. Next, a morphological dilation operator with a 3x3 structuring element is applied, making the mask contour smoother. Finally, a morphological reconstruction algorithm is used to fill possible holes in the mask, i.e., regions of the image background that are not reachable starting from the edges (soille2013). Figure 5 presents an example of applying the post-processing steps.
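Two of the three steps (seeded-component selection and hole filling) can be sketched with plain flood fills; the dilation step is omitted here for brevity. This is a stdlib/numpy sketch under those assumptions, not the paper's implementation.

```python
import numpy as np
from collections import deque

def _flood(region, starts, visited, label):
    """4-connected flood fill over `region`, marking `visited` with `label`."""
    h, w = region.shape
    q = deque(starts)
    for s in starts:
        visited[s] = label
    while q:
        y, x = q.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and region[ny, nx] and visited[ny, nx] == 0:
                visited[ny, nx] = label
                q.append((ny, nx))

def postprocess(mask, seeds):
    """Keep only foreground components touching a user seed, then fill holes
    (background regions not reachable from the image border)."""
    h, w = mask.shape
    comp = np.zeros((h, w), dtype=int)
    _flood(mask, [s for s in seeds if mask[s]], comp, 1)
    kept = comp == 1
    bg = ~kept
    reach = np.zeros((h, w), dtype=int)
    border = [(y, x) for y in range(h) for x in (0, w - 1) if bg[y, x]] + \
             [(y, x) for x in range(w) for y in (0, h - 1) if bg[y, x]]
    _flood(bg, border, reach, 1)
    return kept | (bg & (reach == 0))   # fill unreachable background = holes

# Toy example: a seeded blob with a hole, plus an unseeded blob.
mask = np.zeros((10, 10), dtype=bool)
mask[2:6, 2:6] = True
mask[3, 3] = False                      # hole inside the seeded blob
mask[7:9, 7:9] = True                   # blob with no user seed
res = postprocess(mask, [(2, 2)])
```

The hole is filled, the seeded blob survives, and the unseeded blob is removed.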

Figure 5: Post-processing example. Before (a) and after (b).

3 Feature Extraction

Once the ROI is identified, the next step consists in feature extraction, i.e., relevant information is extracted from the images and information that is irrelevant to the classification step is discarded. The proposed system is based on the features analyzed by the ABCD rule and on texture analysis. The set of extracted features can be summarized into asymmetry, border irregularity, color and texture properties. Since the images were obtained under unknown light sources, the same lesion may present very different color feature values under different types of illumination. We used the color constancy method called Shades of Gray in order to normalize the colors of the images (shades2004).
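The Shades of Gray normalization mentioned above can be sketched in a few lines: the illuminant is estimated per channel with a Minkowski p-norm mean and the channels are rescaled so the corrected illuminant is grey. The norm order p=6 is a common default assumed by this sketch, not a value taken from the paper.

```python
import numpy as np

def shades_of_gray(img, p=6):
    """Shades-of-Gray colour constancy (shades2004): estimate the illuminant
    per channel with a Minkowski p-norm mean, then rescale each channel so
    the estimated illuminant becomes grey."""
    img = img.astype(float)
    illum = (img.reshape(-1, 3) ** p).mean(axis=0) ** (1.0 / p)
    illum /= np.sqrt((illum ** 2).sum())        # unit-norm illuminant direction
    out = img / (illum * np.sqrt(3))            # grey-world style per-channel gain
    return np.clip(out, 0, 1)

# Toy example: a uniform image with a reddish cast becomes grey.
img = np.ones((4, 4, 3)) * np.array([0.8, 0.4, 0.4])
out = shades_of_gray(img)
```

After correction all three channels of the uniform image are equal.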

3.1 Asymmetry

The purpose of this set of features is to quantify the asymmetry of the lesion, both in shape (messadi2014) and color (smaoui2013).

3.1.1 Shape asymmetry ()

Let M be the lesion mask and M_r the new mask obtained after rotating M. The asymmetry index is computed by the following equation

A_1 = area(M ⊕ M_r) / area(M)

where ⊕ denotes the non-overlapping (symmetric-difference) region of the two masks and the area(·) operator returns the mask area.

The lesion mask is centered on the image and rotated so as to align its major axis horizontally. A vertical cut and a horizontal cut are made in the image, resulting in 4 new masks (the left, right, top and bottom halves). Next, two new indexes are calculated from the obtained masks in the same way as in the equation presented above.


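A minimal sketch of the shape-asymmetry idea: compare the mask with a rotated copy and measure the non-overlapping area relative to the mask area. The 180° rotation and the exact ratio are assumptions of this sketch, not necessarily the paper's formula.

```python
import numpy as np

def asymmetry_index(mask):
    """Non-overlapping area between the mask and its 180-degree rotation,
    relative to the mask area (rotation angle assumed by this sketch)."""
    rotated = np.rot90(mask, 2)         # 180-degree rotation about the centre
    return np.logical_xor(mask, rotated).sum() / mask.sum()

# A centred square is perfectly symmetric; a one-sided bump is not.
sym = np.zeros((21, 21), dtype=bool)
sym[8:13, 8:13] = True
asym = sym.copy()
asym[8:13, 13:18] = True                # bump on one side only
```

The symmetric mask scores zero, the lopsided one scores high.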
3.1.2 Color asymmetry ()

To measure the asymmetry in terms of color, the sum of the Chi-square distances between the histograms of the two halves of the lesion is calculated for each RGB component.

Therefore, two measures of color asymmetry are calculated, one for the vertical cut and one for the horizontal cut.
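The per-channel Chi-square comparison of the two halves can be sketched as below. The bin count, the normalized Chi-square variant and the split at the mean x-coordinate are assumptions of this sketch.

```python
import numpy as np

def chi2(h1, h2, eps=1e-10):
    """Chi-square distance between two normalised histograms."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def color_asymmetry(img, mask, bins=16):
    """Sum over RGB channels of the chi-square distance between the colour
    histograms of the two lesion halves (vertical split assumed here)."""
    ys, xs = np.nonzero(mask)
    cx = int(xs.mean())
    left, right = mask.copy(), mask.copy()
    left[:, cx:] = False
    right[:, :cx] = False
    total = 0.0
    for c in range(3):
        h1, _ = np.histogram(img[..., c][left], bins=bins, range=(0, 1))
        h2, _ = np.histogram(img[..., c][right], bins=bins, range=(0, 1))
        total += chi2(h1 / max(h1.sum(), 1), h2 / max(h2.sum(), 1))
    return total

# Toy example: a uniform lesion versus a two-tone lesion.
mask = np.ones((10, 10), dtype=bool)
uni = np.full((10, 10, 3), 0.5)
two = np.zeros((10, 10, 3))
two[:, :5] = 0.2
two[:, 5:] = 0.8
```

The uniform lesion scores near zero; the two-tone lesion scores clearly higher.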

3.2 Border

To calculate border irregularity, measures common in the literature are used, among them: Compactness, Fractal Dimension, Radial Variance, Pigmentation Transition and Solidity (fractal1993; bhuiyan2013; jaworek2016; lynn2017; yamunarani2018). Besides these measures, the border irregularity evaluation method proposed by jaworek2015 was used.

3.2.1 Compactness ()

The lesion irregularity can be measured taking into account the perimeter P and the area A of the lesion. A circle-shaped lesion has a compactness equal to 1.

C = P² / (4·π·A)
3.2.2 Fractal Dimension ()

The edge of the lesion mask can be modeled as a fractal curve, and from this curve we can derive its fractal dimension. Fractals have their own dimension, which is usually non-integer and gives an idea of how much the object fills the space in which it “lives”. This dimension can be calculated using the box counting method. The steps that describe the calculation of the fractal dimension (deviha55) are presented below.

  • Divide the image into regular meshes with mesh size r.

  • Count the number of boxes N(r) that contain at least one pixel of the lesion border in the binary image.

  • The value N(r) is computed for different values of r.

  • The fractal dimension is represented by the slope of the line that best fits the points (log 1/r, log N(r)).
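The steps above can be sketched directly in numpy; the specific box sizes used are an assumption of this sketch.

```python
import numpy as np

def box_counting_dimension(edge, sizes=(1, 2, 4, 8)):
    """Fractal dimension of a binary edge image via box counting: count the
    boxes of side r containing at least one edge pixel, then take the slope
    of log N(r) against log(1/r)."""
    counts = []
    for r in sizes:
        h, w = edge.shape
        H, W = -(-h // r) * r, -(-w // r) * r   # pad up to a multiple of r
        padded = np.zeros((H, W), dtype=bool)
        padded[:h, :w] = edge
        boxes = padded.reshape(H // r, r, W // r, r).any(axis=(1, 3))
        counts.append(boxes.sum())
    log_inv_r = np.log(1.0 / np.asarray(sizes))
    slope, _ = np.polyfit(log_inv_r, np.log(counts), 1)
    return slope

# Sanity check: a straight line has dimension 1.
line = np.zeros((64, 64), dtype=bool)
line[32, :] = True
dim = box_counting_dimension(line)
```

A smooth border gives a dimension near 1; a very ragged one approaches 2.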

3.2.3 Radial Variance ()

The border irregularity of the lesion can be estimated by the variance of the radial distance distribution described by:

σ_r² = (1 / (N·d̄²)) · Σ_{i=1}^{N} (d_i − d̄)²

where d_i is the distance from the i-th boundary pixel to the lesion centroid, d̄ is the average distance and N is the total number of boundary pixels.

3.2.4 Pigmentation Transition ()

Since a lesion border with abrupt variation may suggest malignancy according to the ABCD rule, this feature is used in order to describe the pigmentation transition between lesion and skin. The RGB image is transformed into a single luminance component described by:

L = 0.299·R + 0.587·G + 0.114·B

From the luminance component, the magnitude of the gradient vector is calculated for each pixel belonging to the lesion edge. To describe the pigmentation transition, the mean and variance of the gradient magnitude values over the edge pixels are calculated.
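A minimal sketch of the pigmentation-transition feature: the ITU-R BT.601 luminance weights match the formula above, while the 4-neighbour edge definition and the use of np.gradient are assumptions of this sketch.

```python
import numpy as np

def pigment_transition(rgb, mask):
    """Mean and variance of the luminance-gradient magnitude along the
    lesion edge (mask pixels with at least one background 4-neighbour)."""
    lum = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    gy, gx = np.gradient(lum)
    gmag = np.hypot(gx, gy)
    near_bg = np.zeros_like(mask)
    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        near_bg |= ~np.roll(mask, (dy, dx), axis=(0, 1))
    edge = mask & near_bg               # lesion pixels touching the background
    vals = gmag[edge]
    return vals.mean(), vals.var()

# Toy example: sharp dark lesion on light skin gives a strong transition.
rgb = np.full((20, 20, 3), 0.9)
rgb[6:14, 6:14] = 0.1
mask = np.zeros((20, 20), dtype=bool)
mask[6:14, 6:14] = True
mu, var = pigment_transition(rgb, mask)
```

An abrupt border yields a large mean gradient magnitude at the edge.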




3.2.5 Solidity ()

The solidity is computed as the ratio between the area of the lesion mask A and the area of its convex hull A_h.

S = A / A_h
3.2.6 Jaworek Method ()

The method consists of four steps (jaworek2015): skin lesion rotation, borderline function computation, smoothing and irregularity detection. Initially the mask is rotated so that the major axis of the lesion is parallel to the horizontal axis. Then, the bounding box of the lesion is calculated. The borderline function is generated from the distance of the lesion border pixels to the bounding box. The function is smoothed, and the Jaworek border irregularity is counted as the total number of stationary points of the function.

3.3 Color

A lesion with heterogeneous color may be a sign of melanoma, so measures that quantify color variation in a PSL image are important for discriminating this sort of lesion. For this, histogram measures (stoecker2009) and color variegation (jaworek2016) were used.

3.3.1 Histogram Measures ()

From the histograms of each channel of the RGB and HSV color spaces, the mean, the variance and the asymmetry (skewness) are computed, where each histogram gives the occurrence probability of each pixel intensity level. With three measures per channel over the six channels, a total of 18 histogram measures are computed.


3.3.2 Color Variegation ()

The color variegation is calculated as the log of the ratio between the variance and the mean of each image channel in each color space. In this work, the RGB and HSV color spaces are used; therefore, six new measures are computed.


3.4 Texture

Texture properties are calculated from grayscale images of the lesions, in order to quantify their structural characteristics. In this work, lesion texture properties are extracted using the Gray Level Co-occurrence Matrix (GLCM) and Gray-Level Run Length Matrix (GLRLM).

3.4.1 Glcm ()

Texture properties can be computed from the GLCM, initially described by haralick1973. The GLCM is defined as a matrix of relative frequencies, i.e., the frequency with which neighboring pixels in a grayscale image, separated by a distance d with orientation θ, occur with gray levels i and j. Contrast, correlation, energy and homogeneity measures are calculated from the GLCM at four different angles, giving 16 measures in total.


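A compact GLCM sketch for a single offset (distance 1, angle 0°), with three of the four measures named above; the quantization to 8 grey levels is an assumption of this sketch (libraries such as scikit-image provide equivalent routines).

```python
import numpy as np

def glcm(gray, levels=8, offset=(0, 1)):
    """Grey-level co-occurrence matrix for one pixel offset: count how often
    level i occurs next to level j, then normalise to relative frequencies."""
    q = np.minimum((gray * levels).astype(int), levels - 1)
    dy, dx = offset
    h, w = q.shape
    a = q[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    b = q[max(0, dy):h - max(0, -dy), max(0, dx):w - max(0, -dx)]
    m = np.zeros((levels, levels))
    np.add.at(m, (a.ravel(), b.ravel()), 1)
    return m / m.sum()

def glcm_features(p):
    """Contrast, energy and homogeneity from a normalised GLCM."""
    i, j = np.indices(p.shape)
    contrast = np.sum(p * (i - j) ** 2)
    energy = np.sum(p ** 2)
    homogeneity = np.sum(p / (1.0 + np.abs(i - j)))
    return contrast, energy, homogeneity

# Flat texture: zero contrast, maximal energy. Checkerboard: high contrast.
flat = glcm_features(glcm(np.full((8, 8), 0.5)))
checker = (np.indices((8, 8)).sum(axis=0) % 2) * 0.9
rough = glcm_features(glcm(checker))
```

The flat patch concentrates all co-occurrences in one cell; the checkerboard alternates extremes.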
3.4.2 Glrlm ()

With the GLRLM, higher-order statistical features of a given texture can be calculated. Five measures were presented in GALLOWAY1975: short runs emphasis (SRE), long runs emphasis (LRE), run length non-uniformity (RLN), run percentage (RP) and gray level non-uniformity (GLN). Later, CHU1990 presented two new measures: low gray level run emphasis (LGRE) and high gray level run emphasis (HGRE). GLRLM matrices indicate the number of occurrences of each gray level with runs of a given length (total repetitions of the primitive) at a given primitive inclination angle. As in jaworek20162, the GLRLM was calculated for four orientations; the four resulting matrices were added together and the seven measures were calculated from the resulting matrix.


4 Classification

After feature extraction, the next and last step is to classify the PSLs. In this work we used the Extreme Learning Machine (ELM) algorithm, proposed by HUANG2006, with a regularization factor (deng2009), called Regularized Extreme Learning Machine (RELM). The RELM is used to train a single hidden layer feedforward neural network (SLFN) in order to overcome the problems of slow training and convergence to local minima found in the backpropagation method. As a learning algorithm, the ELM offers good generalization performance, low computational cost and ease of implementation.

Compared to the SVM, which is a fairly common approach in the classification of skin cancer, the ELM presents similar generalization ability and superior computational speed (liu2012). In addition, the ELM deals naturally with multiclass problems, whereas the SVM was originally developed for binary problems and therefore requires an additional strategy to be applied to multiclass problems (huang2012).

The RELM was used instead of ELM because it presents better generalization performance and it holds strong anti-noise ability.

The following subsections describe the data pre-processing steps and the classification method used.

4.1 Data Pre-processing

The quality of the classification results strongly depends on the quality of the training data. Regardless of the classifier used, if the training data are incorrect or poorly processed, poor models will result (Kamiran2012). Therefore, two pre-processing steps are adopted in this work: feature scaling and oversampling.

The min-max normalization method was used to rescale the data to the interval [0, 1]. The general feature scaling formula is given as:

x̃_ij = (x_ij − min_j(x)) / (max_j(x) − min_j(x))

where x_ij represents the j-th feature of the i-th dataset sample, and the minimum and maximum are taken per feature.

Another problem that the automatic diagnosis of skin cancer directly faces is dataset imbalance, due to the difficulty of collecting these data and the rarity of certain diseases. To overcome this difficulty, a methodology for generating synthetic samples, called the Synthetic Minority Over-sampling Technique (SMOTE) (SMOTE2002), was applied. This method generates synthetic samples from linear combinations of the actual samples. SMOTE was applied in all training stages and to all minority classes, so that the number of samples in every class was the same.
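The linear-combination idea behind SMOTE can be sketched in a few lines. This is a minimal illustration (k nearest neighbours, random interpolation factor), not the full SMOTE2002 algorithm or a library implementation.

```python
import numpy as np

def smote(X, n_new, k=3, rng=None):
    """Minimal SMOTE sketch: each synthetic sample interpolates between a
    random minority sample and one of its k nearest minority neighbours."""
    rng = rng or np.random.default_rng(0)
    n = len(X)
    d = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    np.fill_diagonal(d, np.inf)                 # exclude self as a neighbour
    nn = np.argsort(d, axis=1)[:, :k]
    out = []
    for _ in range(n_new):
        i = rng.integers(n)
        j = nn[i, rng.integers(min(k, n - 1))]
        gap = rng.random()                      # interpolation factor in [0, 1)
        out.append(X[i] + gap * (X[j] - X[i]))
    return np.array(out)

# Toy minority class: the four corners of the unit square.
X = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
S = smote(X, 10)
```

Synthetic samples always lie on segments between real samples, so they stay inside the unit square here.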

4.2 Classifier

This work proposes the use of RELM for the classification stage of PSLs where the pre-processed data are used in the network training.

Given a set of N arbitrary training samples (x_j, t_j), where x_j ∈ R^n and t_j ∈ R^m, a SLFN with Ñ hidden nodes can be modeled by the following equation:

Σ_{i=1}^{Ñ} β_i · g(w_i · x_j + b_i) = o_j,  j = 1, …, N

The matrix form of the previous equation can be expressed as follows:

Hβ = T

where g expresses the activation function of the hidden layer, w_i is the weights vector that connects the i-th neuron of the hidden layer to the neurons of the input layer, β_i is the weights vector that connects the i-th neuron of the hidden layer to the neurons of the output layer, and b_i is the bias of the i-th hidden neuron. The matrix H stores the hidden layer outputs and T stores the targets.

The first step of RELM is to calculate the matrix H. Once H has been calculated, the output weights β can be obtained by solving the linear system Hβ = T. Considering that the number of hidden neurons is less than the number of samples, the calculation of β is given by:

β = (HᵀH + I/C)⁻¹ Hᵀ T

where I is the identity matrix of order Ñ and C is the regularization factor. The ELM is a particular case of the RELM when C → ∞.

The RELM algorithm can be summarized in three steps:

  1. Randomly assign the input weights w_i and biases b_i;

  2. Compute the hidden layer output matrix H;

  3. Compute the output weights matrix β.
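The three steps above fit in a short numpy class. This is an illustrative sketch of the RELM recipe described in this section (random hidden layer plus a closed-form ridge solution); the tanh activation, hidden-layer size and the toy two-class problem are assumptions of this example.

```python
import numpy as np

class RELM:
    """Regularized ELM sketch: random hidden layer, then the closed-form
    output weights beta = (H^T H + I/C)^-1 H^T T (ELM is the C -> inf limit)."""
    def __init__(self, n_hidden=50, C=1.0, seed=0):
        self.n_hidden, self.C, self.seed = n_hidden, C, seed

    def fit(self, X, T):
        rng = np.random.default_rng(self.seed)
        self.W = rng.normal(size=(X.shape[1], self.n_hidden))  # random input weights
        self.b = rng.normal(size=self.n_hidden)                # random biases
        H = np.tanh(X @ self.W + self.b)                       # hidden layer outputs
        I = np.eye(self.n_hidden)
        self.beta = np.linalg.solve(H.T @ H + I / self.C, H.T @ T)
        return self

    def predict(self, X):
        return np.tanh(X @ self.W + self.b) @ self.beta

# Toy two-class problem with one-hot targets.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=-2, size=(40, 2)),
               rng.normal(loc=2, size=(40, 2))])
T = np.zeros((80, 2))
T[:40, 0] = 1
T[40:, 1] = 1
y_true = np.r_[np.zeros(40), np.ones(40)]
model = RELM(n_hidden=30, C=10.0).fit(X, T)
acc = (model.predict(X).argmax(axis=1) == y_true).mean()
```

As in the proposed approach, the predicted class is the output neuron with the highest value; on this well-separated toy problem the training accuracy is high.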

5 Proposed approach

The proposed approach encompasses all the steps mentioned above, and the union of these steps produces a complete CAD system. In addition, this paper proposes combining patient information with the information extracted from the lesion image; the combination is performed in the classifier input vector. For datasets that do not contain context information, only image features were used. The final diagnosis returned by the system is the class corresponding to the output neuron with the highest value. Figure 6 presents a block diagram of the proposed approach.

Figure 6: Block diagram of the proposed approach.

6 Experiments

6.1 Datasets

Three datasets were used to evaluate the performance of the segmentation algorithm and the classifier. The datasets present an increasing level of difficulty and were selected in order to evaluate the performance of the algorithms under the most diverse conditions. Details of the datasets used are listed in Table 1.

Name Images Lesions
PH2 200 Nevus and Melanoma
ISBI 2017 2750 Nevus, Seborrheic Keratosis and Melanoma
PAD-UFES 220 Nevus, Seborrheic Keratosis and Melanoma
Table 1: Datasets details

The PH2 (PH2) dataset is composed of dermoscopic images acquired at Hospital Pedro Hispano under a controlled environment, with the same equipment and the same conditions. It is widely used in the literature and previous works already present excellent results for these images. Among the artifacts and challenges of this dataset we can mention the presence of hair, dark corner, specular reflection and small variation of luminosity.

The second dataset, called ISBI 2017 (ISBI2017), was developed from the collaboration of several clinical centers and is currently used in one of the IEEE International Symposium on Biomedical Imaging (ISBI) challenges. This dataset is more challenging due to differences in the color of images obtained with different equipment. In addition, various artifacts are present, such as colored patches, rectangular black frames, pen demarcations, rulers, etc. Although this dataset has 2750 images, only the images that make up the test set proposed by the challenge are used. In addition to the images, this dataset provides the patient's age and sex information.

Finally, we have a new dataset developed by the Dermatological Assistance Program (in Portuguese: Programa de Assistência Dermatológica, abbreviated PAD) in cooperation with the Nature-inspired Computing Lab (Labcin) at the Federal University of Espírito Santo (UFES), which carried out a joint effort to acquire images of skin lesions with smartphones. In addition to the problems found in dermoscopic images, images acquired with smartphones are subject to external illumination, presence of shadows, low resolution, blur, etc. This dataset comprises melanocytic nevus (NEV), seborrheic keratosis (SK) and melanoma (MEL). In addition to the images, seven items of context information were collected for each patient: the patient's age and six facts about the lesion obtained through the following questions: "Did it itch?", "grow?", "hurt?", "change?", "bleed?" and "raise?". The answers to the questions are recorded as "Yes" or "No".

The PH2 and ISBI 2017 datasets are common benchmarks in the literature for evaluating new segmentation and classification methods for skin lesion images; the results for these datasets are presented in a single section. Since this is the first work performed with the PAD-UFES dataset, its results are presented in a separate section as a case study.

6.2 Evaluation methodology

6.2.1 Metrics

To evaluate the performance of the segmentation and classification methods, six metrics common in the literature were used. From the confusion matrix, five metrics were calculated: Sensitivity (36), Specificity (37), Accuracy (38), Balanced Accuracy (39) and Jaccard Index (40). The sixth metric, used to measure the diagnostic capacity of the proposed system, is based on Receiver Operating Characteristic (ROC) curve analysis, where the area under the curve (AUC) is calculated. The ROC curve is obtained by plotting the true positive rate against the false positive rate while varying the decision threshold (bradley1997); the AUC is computed as the integral of the ROC curve.


The variables TP, TN, FP and FN stand for True Positive, True Negative, False Positive and False Negative, respectively. At the pixel level, these variables are defined according to Table 2 for the segmentation results, where 1 denotes a foreground (lesion) pixel and 0 a background pixel.

Predicted Pixel Actual Pixel
TP 1 1
TN 0 0
FP 1 0
FN 0 1
Table 2: TP, TN, FP and FN at the pixel level
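As an illustration, the pixel-level counts and the five confusion-matrix metrics can be computed as follows (a minimal sketch; the function names are ours, not from the original MATLAB implementation):

```python
def confusion_counts(pred, actual):
    # pred/actual: flat sequences of 0/1 pixel labels (1 = lesion, 0 = background)
    tp = sum(1 for p, a in zip(pred, actual) if p == 1 and a == 1)
    tn = sum(1 for p, a in zip(pred, actual) if p == 0 and a == 0)
    fp = sum(1 for p, a in zip(pred, actual) if p == 1 and a == 0)
    fn = sum(1 for p, a in zip(pred, actual) if p == 0 and a == 1)
    return tp, tn, fp, fn

def metrics(tp, tn, fp, fn):
    se = tp / (tp + fn)                    # Sensitivity
    sp = tn / (tn + fp)                    # Specificity
    ac = (tp + tn) / (tp + tn + fp + fn)   # Accuracy
    bac = (se + sp) / 2                    # Balanced Accuracy
    ji = tp / (tp + fp + fn)               # Jaccard Index
    return {"SE": se, "SP": sp, "AC": ac, "BAC": bac, "JI": ji}
```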

The evaluation metrics above naturally apply to binary classification problems; however, since the classification of PSLs is often formulated as a multiclass problem, their use may not be intuitive. In multiclass problems, each metric is computed per class and the results are averaged. For more details refer to the work of
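The per-class averaging can be sketched as a one-vs-rest computation (our illustration; the class labels below are examples from the datasets, and only sensitivity and specificity are shown for brevity):

```python
def macro_metrics(y_pred, y_true, classes):
    # One-vs-rest: compute each binary metric per class, then average.
    per_class = []
    for c in classes:
        tp = sum(1 for p, t in zip(y_pred, y_true) if p == c and t == c)
        fn = sum(1 for p, t in zip(y_pred, y_true) if p != c and t == c)
        fp = sum(1 for p, t in zip(y_pred, y_true) if p == c and t != c)
        tn = len(y_true) - tp - fn - fp
        se = tp / (tp + fn) if tp + fn else 0.0
        sp = tn / (tn + fp) if tn + fp else 0.0
        per_class.append((se, sp))
    n = len(classes)
    return {"SE": sum(se for se, _ in per_class) / n,
            "SP": sum(sp for _, sp in per_class) / n}
```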


6.2.2 Segmentation methods comparison

Unlike automatic methods, evaluating the performance of interactive methods is more complicated due to the need for user interaction to provide labeled seeds to the algorithm. In this work, Algorithm 1 is proposed to simulate expert inputs. Assuming the user is an expert, he or she will be able to:

  1. Insert the most suitable seed quantity;

  2. Choose the best spatial position for each seed.

Based on these premises, the pseudo-code shown in Algorithm 1 works as follows. At each iteration of the internal loop, a set of random image seeds is selected using the ground truth mask available in the datasets. The segmentation procedure is then performed with the selected seeds and its result is stored in a vector. The external loop keeps the best result obtained with the random seeds for each number of seed points, from one up to maxInputSeeds. The inputs are the original lesion image and the ground truth mask of the lesion; the output is the segmentation result mask. The parameters maxInputSeeds and maxEvaluation were fixed for all experiments. The algorithm returns the values of the evaluation metrics for the solution that obtained the best Jaccard Index.


1:  for  to maxInputSeeds do
2:      floor()
3:      ceil()
4:     for  to maxEvaluation do
5:         SelectRandomSeeds(, , )
6:         Segmentation(, )
7:         ComputeEvaluationMetrics(, )
8:     end for
9:      max()
10:  end for
11:   max()
12:  return  
Algorithm 1 Interactive method performance evaluation
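The evaluation loop can be rendered as a simplified Python sketch of Algorithm 1. The floor/ceil split of seeds between background and foreground and the helper names are our assumptions, since the original variable symbols were lost; `segment` and `jaccard` stand in for the segmentation routine and the Jaccard Index computation:

```python
import random

def select_random_seeds(gt_mask, n_fg, n_bg, rng):
    # Draw seed pixels from the ground truth: label 1 = lesion, 0 = background.
    fg = [(i, j) for i, row in enumerate(gt_mask) for j, v in enumerate(row) if v == 1]
    bg = [(i, j) for i, row in enumerate(gt_mask) for j, v in enumerate(row) if v == 0]
    return rng.sample(fg, min(n_fg, len(fg))), rng.sample(bg, min(n_bg, len(bg)))

def evaluate_interactive(image, gt_mask, segment, jaccard,
                         max_input_seeds=6, max_evaluation=10, seed=0):
    """Simulate an expert user: for each seed count, try several random
    placements drawn from the ground truth and keep the best Jaccard Index."""
    rng = random.Random(seed)
    best = 0.0
    for n in range(1, max_input_seeds + 1):
        n_bg = n // 2        # floor(n/2) background seeds (assumed split)
        n_fg = n - n_bg      # ceil(n/2) foreground seeds
        for _ in range(max_evaluation):
            fg_seeds, bg_seeds = select_random_seeds(gt_mask, n_fg, n_bg, rng)
            result = segment(image, fg_seeds, bg_seeds)
            best = max(best, jaccard(result, gt_mask))
    return best
```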

The proposed method, called Interactive Segmentation based on Nearest Neighbor (ISNN), is compared with recent works from the literature that obtained the best results on the PH2 and ISBI 2017 datasets. The results of the automatic methods were collected directly from the literature, while the S-FCM interactive method was implemented and evaluated using Algorithm 1, with its grouping threshold parameter fixed for all experiments. The pre- and post-processing steps of the proposed framework were kept when evaluating S-FCM, so that only the image segmentation block was changed, providing the same segmentation framework structure for both methods.

6.3 Segmentation Results

The performance results of the algorithms on the PH2 and ISBI 2017 datasets are presented in Table 3. All results were calculated as the average of the individual results for each image, and the best results are shown in bold.

Method JI(%) SE(%) SP(%) AC(%)
PH2
ISNN 93.43 97.16 97.12 98.05
ISNN 93.36 97.13 97.09 97.99
ISNN 92.67 96.62 96.80 97.73
S-FCM 92.55 96.80 95.81 97.60
Al-masni et al. ALMASNI2018 84.79 93.72 95.65 95.08
Eltayef et al. Eltayef2017 93.88 97.58 94.74
Khalid et al. Khalid2016 93.87 94.57 94.72
Pennisi et al. Pennisi2016 80.24 97.22 89.66
ISBI 2017
ISNN 88.36 93.29 96.96 97.14
ISNN 87.66 93.10 96.82 96.90
ISNN 86.20 92.97 96.27 96.48
S-FCM 87.53 92.43 96.41 96.65
Al-masni et al. ALMASNI2018 77.11 85.40 96.69 94.03
Yuan and Lo Yuan2017 76.50 82.50 97.50 93.40
Berseth MattBerseth2017 76.20 82.00 97.80 93.20
Bi et al. Bi2017 76.00 80.20 98.50 93.40
Table 3: Segmentation Results

The experimental results show that the interactive methods are superior to the automatic ones, with a considerable difference, especially in the Jaccard Index. The proposed ISNN method obtained the best results on all datasets and for almost all evaluation measures. The value of the spatial weighting parameter has a direct impact on the final results, since it increases or decreases the importance of the pixels' spatial position at the moment of classification: the higher its value, the greater the influence of pixel position on the final result, making the mask more compact. Performance decreases as this value increases. However, the tests showed that high values have good application in cases of low color contrast between the lesion and the skin, where color information alone is not enough to perform a good segmentation.

A more detailed analysis of the ISNN method was performed based on the Jaccard Index, where the final segmentation result for each image was rated as Bad (JA < 0.65), Good (JA ≥ 0.65 and JA < 0.9) or Excellent (JA ≥ 0.9). The relative frequencies of these ratings for each dataset are shown in the bar chart in Figure 7. The chart makes clear the difference in difficulty between the PH2 and ISBI datasets: the percentage of excellent segmentations drops abruptly.

Figure 7: Percentage of images that got bad, good, and excellent results for each dataset based on the Jaccard Index.
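Assuming the thresholds Bad (JA < 0.65), Good (0.65 ≤ JA < 0.9) and Excellent (JA ≥ 0.9), the rating can be expressed as a small helper:

```python
def rate_segmentation(ja):
    # Rate a single segmentation by its Jaccard Index (JA), in [0, 1].
    if ja < 0.65:
        return "Bad"
    if ja < 0.9:
        return "Good"
    return "Excellent"
```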

The computational time was evaluated for each stage of the proposed segmentation framework. All experiments were developed and tested in MATLAB 2015a on a computer with an Intel Core i5 processor, 8 GB of RAM and Microsoft Windows 10. The computational time, in seconds, was measured for the steps subsequent to the user inputs; the results are presented in Table 4.

Pre-processing Segmentation Post-processing Total
time(s) 0.197 0.098 0.014 0.309
Table 4: Computational time

The framework has a fast execution time. Pre-processing is the most time-consuming step, mostly due to the hair removal and illumination correction methods.

6.4 Classification Results

In order to evaluate the discriminative capacity of the extracted features for PSL classification, the proposed system was tested on the same datasets. The input vector of the RELM network is formed by concatenating the extracted features with the context information available in each dataset. The leave-one-out cross-validation method was used; since the datasets are small and unbalanced, this method is the most appropriate. In addition, each result was computed as the average over several runs, since the RELM weights are initialized at random.
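The protocol can be sketched as follows (our illustration; `train_fn` and `predict_fn` stand in for the RELM training and prediction routines, with the run index used to reseed the random weight initialization):

```python
def leave_one_out_accuracy(X, y, train_fn, predict_fn, n_runs=5):
    """Average leave-one-out accuracy over several runs, since the
    classifier (here, a randomly initialized RELM) is stochastic."""
    total = 0.0
    for run in range(n_runs):
        correct = 0
        for i in range(len(X)):
            # Hold out sample i, train on the rest.
            X_tr = X[:i] + X[i + 1:]
            y_tr = y[:i] + y[i + 1:]
            model = train_fn(X_tr, y_tr, seed=run)
            correct += int(predict_fn(model, X[i]) == y[i])
        total += correct / len(X)
    return total / n_runs
```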

6.4.1 Ph2

Initial experiments were performed on the PH2 dataset. Since there is no well-defined methodology for adjusting the RELM parameters, a preliminary sensitivity analysis was carried out. The RELM has two parameters: the number of hidden neurons and the regularization factor. A series of tests was performed over ranges of both parameters. The AUC was calculated for each parameter pair and a surface plot (Figure 8) was generated to visualize the impact of the parameters on the final result.
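The RELM itself is compact enough to sketch. The version below is our illustration of a common regularized-ELM formulation, a random sigmoid hidden layer followed by ridge-regression output weights, beta = (HᵀH + I/C)⁻¹HᵀT, and shows how the two parameters (number of hidden neurons and regularization factor C) enter the model:

```python
import numpy as np

def train_relm(X, T, n_hidden=100, C=1.0, seed=0):
    # Random input weights and biases, fixed after initialization.
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))   # sigmoid hidden activations
    # Ridge-regression output weights: beta = (H^T H + I/C)^-1 H^T T
    beta = np.linalg.solve(H.T @ H + np.eye(n_hidden) / C, H.T @ T)
    return W, b, beta

def predict_relm(model, X):
    W, b, beta = model
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta   # diagnosis = class of the highest output value
```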

Figure 8: Sensitivity analysis of the RELM parameters.

A reading of the surface chart indicates the parameter region that presented the best results. From this region, the values of both parameters were selected empirically and used in all subsequent experiments.

The experimental results obtained on the PH2 dataset are shown in Table 5, along with results from recent works in the literature; the best results are highlighted in bold. The proposed method obtained the best specificity (SP) and a median sensitivity (SE). The extracted features were clearly able to provide a good classification of the PH2 dataset, making the proposed method competitive with current approaches.

Barata barata2014 - - - 0.960 0.800 0.823 0.880
Lei Bi bi2016 0.938 0.875 0.931 0.920 0.903
Omar abuzaghleh2015 - - - 1.000 0.915 0.932 0.958
Pennisi Pennisi2016 - - - 0.935 0.871 0.936 0.903
Proposed 0.974 0.902 0.933 0.926 0.917
Table 5: PH2 dataset classification results

6.4.2 Isbi 2017

The ISBI 2017 is a multiclass dataset; however, the challenge separated the classification of lesions into two binary problems: (1) "melanoma versus seborrheic keratosis and nevus" and (2) "seborrheic keratosis versus melanoma and nevus". In addition to the features computed from the images, patient context information (sex and age) was used to classify the images. The results obtained are presented in Table 6 and are compared with the ISBI 2017 winner (kazuhisa2017). It is worth mentioning that the winner used an ensemble of Convolutional Neural Networks as well as an external dataset; developing such an approach demands many images and a high computational cost. Despite the inferior results obtained by our approach, the average accuracy was close to the winner's and even better for the classification of seborrheic keratosis.

Melanoma vs Rest
ISBI 2017 Winner 0.868 0.735 0.851 0.828 0.793
Proposed 0.806 0.695 0.780 0.765 0.737
Seborrheic keratosis vs Rest
ISBI 2017 Winner 0.953 0.978 0.773 0.803 0.876
Proposed 0.883 0.795 0.813 0.810 0.804
ISBI 2017 Winner 0.911 0.856 0.812 0.816 0.834
Proposed 0.845 0.745 0.797 0.788 0.771
Table 6: ISBI 2017 dataset classification results

6.5 Case study

The case study was performed with the macroscopic image dataset PAD-UFES. All experiments in this study follow the same methods and parameters used in the evaluation of the PH2 and ISBI 2017 benchmarks. The segmentation results are presented first, followed by the classification results.

6.5.1 Segmentation results

The performance results of the ISNN and S-FCM algorithms on the PAD-UFES dataset are presented in Table 7. All results were calculated as the average of the individual results for each image.

Method JI(%) SE(%) SP(%) AC(%)
ISNN 84.08 94.02 98.50 98.21
ISNN 83.02 93.65 98.38 98.07
ISNN 81.00 93.68 97.93 97.69
S-FCM 82.88 91.35 98.66 98.06
Table 7: PAD-UFES - Segmentation Results

The proposed ISNN method also presented the best result on this dataset. Segmentation quality was evaluated in terms of the Jaccard Index and rated as Bad (JA < 0.65), Good (JA ≥ 0.65 and JA < 0.9) or Excellent (JA ≥ 0.9). Despite the difficulties encountered in segmenting macroscopic images, only a small percentage of the segmented images were rated as Bad.

Segmentation examples are presented in Figure 9 to illustrate the behavior of the method. Figure 9(a) shows the original image with the seeds provided by the user; Figure 9(b), the image after the pre-processing step; and Figure 9(c), the mask resulting from the segmentation step compared to the ground-truth mask generated by the expert.

Figure 9: Application examples of the proposed method: (a) Original image with user inputs; (b) Pre-processed image; (c) Segmentation result (white) and Ground Truth (red contour).

6.5.2 Classification results

Initial multiclass classification experiments were performed with the PAD-UFES dataset to verify the benefits of combining image features with context information. The classification results obtained with the proposed method are shown in Table 8. The Experiment column describes which features were used, and the second column gives the number of features in each experiment. The confusion matrix for one of the experiment runs using both image features and patient context information is shown in Figure 10.

Experiment Features SE SP AC BAC
Image features 59 0.664 0.816 0.640 0.740
Context information 7 0.684 0.836 0.692 0.760
Combined 59+7 0.744 0.872 0.753 0.808
Table 8: PAD-UFES dataset classification results
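The fusion itself is plain concatenation at the classifier input. A minimal sketch, with hypothetical field names for the seven context items (the paper does not specify the exact encoding; we assume age scaled to [0, 1] and yes/no answers mapped to 1/0):

```python
def build_input_vector(image_features, context):
    # context: dict with patient age and six yes/no answers about the lesion.
    # Field names below are illustrative, not from the original implementation.
    questions = ("itch", "grow", "hurt", "change", "bleed", "raise")
    return (list(image_features)
            + [context["age"] / 100.0]               # assumed age scaling
            + [int(context[q]) for q in questions])  # Yes/No -> 1/0
```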

Figure 10: Confusion matrix describing the performance for the PAD-UFES dataset.

The combination of features shown in Table 8 yielded a clear accuracy gain over the experiment with image features alone. These expressive gains stem from the low correlation between the context information and the image features, so that adding it to the feature vector contributes important information for PSL classification. The confusion matrix in Figure 10 shows that the classifier's main errors involve the SK class, whereas the NEV and MEL classes appear easier to distinguish.

Since we are interested in detecting melanoma, the problem can be converted to a binary classification problem by grouping the seborrheic keratosis and nevus classes into a single non-melanoma class. A univariate analysis based on the AUC metric was performed to evaluate each feature individually in the classification of melanoma versus non-melanoma; the result is shown in Figure 11. We highlight the features that obtained the highest AUC values: Shape Asymmetry, Compactness, Radial Variance, Solidity, Red Channel Variance and Variegation of Channels. Classification results between melanoma and non-melanoma are presented in Table 9. The experiments were performed for different isolated sets of features and for their combination.
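A per-feature AUC can be computed without an explicit ROC sweep by using its rank-statistic equivalence (the AUC equals the Mann-Whitney U statistic normalized by the product of class sizes). A minimal sketch, assuming labels 1 for melanoma and 0 for non-melanoma:

```python
def auc_univariate(feature_values, labels):
    """AUC of a single feature for melanoma (1) vs non-melanoma (0):
    the fraction of positive/negative pairs the feature ranks correctly,
    counting ties as half."""
    pos = [v for v, y in zip(feature_values, labels) if y == 1]
    neg = [v for v, y in zip(feature_values, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```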

Figure 11: Univariate analysis of features based on the AUC metric for classification of melanoma and non-melanoma.
Experiment Features AUC SE SP AC BAC
Image features 59 0.838 0.652 0.844 0.823 0.748
Image features* 9 0.862 0.671 0.850 0.831 0.761
Context information 7 0.853 0.667 0.803 0.788 0.735
Combined 59+7 0.904 0.726 0.902 0.883 0.814
Combined* 9+7 0.928 0.777 0.894 0.881 0.835
* Experiments with the best image features.
Table 9: PAD-UFES dataset classification results between melanoma and non-melanoma.

The experimental results presented in Table 9 show that the performance of the classifier is significantly improved by the combination of image features and context information. In addition, selecting the best image features further improved the results while expressively reducing the total number of features used. Comparing the results obtained by our system with the results of clinical diagnosis by the naked eye reported by vestergaard2008, we can observe that the diagnostic capacity of the proposed system is comparable to that of an expert.

As pointed out by haenssle2018 and reinforced by brinker2018, deep learning (e.g., convolutional neural networks) alone does not yet solve the problem of skin cancer diagnosis, and classification performance can be improved by adding clinical data as inputs to the classifiers. To the best of our knowledge, our approach combining image features with patient context information is one of the first in this direction, showing that patient information is essential to the classification process, mirroring the decision making of dermatologists. The PAD-UFES dataset of smartphone images and clinical data, along with the source code, is available from the authors upon request.

7 Conclusion

This work presented a new system for the classification of pigmented skin lesions capable of dealing with both dermoscopic and macroscopic images. A new interactive segmentation method was introduced to overcome the difficulties of automatic methods. The Regularized Extreme Learning Machine algorithm was used to train a single-hidden-layer feedforward neural network for the classification of pigmented skin lesions, using a set of features based on the ABCD rule and texture analysis. Since macroscopic images may not be very informative due to low resolution, blur and the presence of many artifacts, we investigated the use of patient context information in addition to the features extracted from the images. The classification results indicated that the performance of the proposed system is comparable to that of an expert diagnosing by the naked eye.

Future work shall investigate the system's ability to classify a larger number of classes, so that the CAD system can assist specialists in diagnosing a wider range of lesions. In addition, new features that might increase system performance should be investigated.


XXX thanks the XXX for funding this study - Finance Code XXX. XXX would like to thank the XXX and the local Agency of the state of XXX for financial support under grant No. XXX and No. XXX, respectively.