A superpixel-driven deep learning approach for the analysis of dermatological wounds

09/13/2019 · by Gustavo Blanco, et al.

Background. The image-based identification of distinct tissues within dermatological wounds enhances patients' care since it requires no intrusive evaluations. This manuscript presents an approach, which we named QTDU, that combines deep learning models with superpixel-driven segmentation methods for assessing the quality of tissues from dermatological ulcers. Method. QTDU consists of a three-stage pipeline for obtaining ulcer segmentation, tissue labeling, and wounded area quantification. We set up our approach by using a real and annotated set of dermatological ulcers, training several deep learning models for the identification of ulcered superpixels. Results. Empirical evaluations on 179,572 superpixels divided into four classes showed QTDU accurately spots wounded tissues (AUC = 0.986, sensitivity = 0.97, and specificity = 0.974) and outperformed machine-learning approaches in up to 8.2%. Last, but not least, experimental evaluations also showed QTDU correctly quantified wounded tissue areas within a 0.089 Mean Absolute Error ratio. Conclusions. Results indicate QTDU effectiveness for both tissue segmentation and wounded area quantification tasks. When compared to existing machine-learning approaches, the combination of superpixels and deep learning models outperformed the competitors within strong significance levels.







1 Introduction

The growing number of different devices that support medical image acquisition and the ever-decreasing costs of storing such images have given rise to new and larger image databases in several medical centers, way beyond the radiology room Gillies et al. (2015). For example, protocols for collecting images of lower limb ulcers with low-cost smartphones in controlled environments have shown great potential for inclusion in the clinical workflow as additional information that supports physicians' analyses Yu et al. (2017); Kamath et al. (2018). Moreover, such images can be automatically evaluated by Computer-Aided Diagnosis (CAD) tools, or even used for searching massive databases through content-only queries, as in Content-Based Image Retrieval (CBIR) applications. In both CAD and CBIR cases, the detection of abnormalities requires the extraction of patterns from images, while a decision-making strategy is necessary for juxtaposing new images to those in the database Blanco et al. (2016); Zahia et al. (2018).

Since dermatological lesions are routinely diagnosed by biopsies and surrounding skin aspects, ulcers can be computationally characterized by particular types of tissues (and their areas) within the wounded region Litjens et al. (2017); Seixas et al. (2015). For instance, Mukherjee et al. (2014) proposed a five-color classification model and applied a color-based low-level extractor whose features were labeled by a Support-Vector Machine (SVM) strategy. This idea of concatenating feature extraction and classification is found at the core of most wound segmentation strategies, as in the study of Kavitha et al. (2017), which evaluated leg ulcerations by extracting patterns based on local spectral histograms to be labeled by a Multi-Layer Perceptron (MLP) classifier. Analogously, Pereira et al. (2013) discussed the use of color descriptors and an Instance-based Learning (IbL) classifier with a 61.7% hit ratio, whereas Veredas et al. (2010) suggested the use of texture descriptors and an MLP classifier with 84.84% accuracy.

Blanco et al. (2016) and Chino et al. (2018) followed a slightly different premise for finding proper similarity measures and comparison criteria for dermatological wounds. Their approaches are based on a divide-and-conquer strategy, in which an ulcer is segmented with the support of superpixel construction methods Achanta et al. (2012). Such methods are employed for splitting an image into several pieces of ulcered tissue with well-defined borders, which are then described by feature engineering methods, such as MPEG-7 descriptors Blanco et al. (2016) and Bag-of-Signatures Chino et al. (2018). The features are labeled by a RandomForest classifier, which segments dermatological wounds Chino et al. (2018). Those classifier-driven segmentation approaches also provide the basis for measurements of the size and stage of the wound Blanco et al. (2016); Mukherjee et al. (2014); Kavitha et al. (2017); Pereira et al. (2013); Veredas et al. (2010); Dorileo et al. (2010); Khalil et al. (2019).

Recently, deep-learning (DL) models have been successfully applied to specific tissue segmentation problems, such as skin cancer and melanoma characterization Esteva et al. (2017); Li and Shen (2018); Wang et al. (2015). They usually rely on convolutional neural networks (CNNs), which combine both feature engineering and data classification into a single package and thereby eliminate the need for manual feature extraction. DL models distinguish themselves regarding the topology of the underlying CNN, e.g., VGG Simonyan and Zisserman (2014), AlexNet Russakovsky et al. (2015), ResNet He et al. (2016), or InceptionV3 Szegedy et al. (2016), and the learning algorithm, which can rely on either end-to-end training or transfer-learning. For instance, Goyal et al. (2018) proposed a transfer-learning DL model for diabetic foot segmentation, while Nejati et al. (2018) combined AlexNet and SVM for wound classification at an 86.40% hit ratio. Analogously, Zahia et al. (2018) implemented a divide-and-conquer method that splits pressure ulcers into fixed-size regions to be labeled by a CNN, reporting results as a Dice Coefficient ratio. The reviewed approaches point out that the challenges of applying DL models to the detection of wounded tissues are related to (i) finding the most suitable CNN architecture, (ii) determining the applicability of transfer-learning methods from trained CNNs, and (iii) the expert-burdened or biopsy-based labeling of a massive amount of images Kamath et al. (2018); Wang et al. (2015); Hoo-Chang et al. (2016).

This manuscript presents the QTDU approach for the assessment of the quality of tissues from dermatological ulcers. QTDU integrates previous literature efforts into DL models towards an accurate segmentation of ulcer tissues in lower limbs, adopting a divide-and-conquer strategy through superpixels that define the wounded tissue candidates to be learned by DL models. The idea is that only a few images must be labeled by experts, from which hundreds of thousands of superpixels can be automatically extracted and used for the training of CNNs. A large set of experiments was designed for finding both the most suitable CNN architecture and the transfer-learning method that fit the purpose of segmenting dermatological ulcers, and QTDU was compared to previous segmentation efforts in a testbed based on superpixels labeled by human experts. The results showed our approach outperformed the competitors by significant margins and correctly quantified the pixelwise wounded area. Accordingly, the contributions of the manuscript are as follows:

  1. The design of an approach that integrates superpixel methods into DL models for the identification of distinct tissues within ulcered areas. The strategy, coined QTDU, enables both dermatological ulcer segmentation and pixelwise area quantification, and

  2. Experimental indications of QTDU most suitable parameters: (i) raw SLIC superpixels, and (ii) ResNet DL model with the addition of six new layers as part of transfer-learning.

The remainder of the manuscript is organized as follows: Section 2 discusses preliminary concepts on wound segmentation; Section 3 addresses materials and methods, and the QTDU description; Section 4 focuses on experimental evaluations. Finally, Section 5 provides the conclusions.

2 Preliminaries

While the segmentation of wounded tissues from photographic images of dermatological ulcers has been discussed from distinct perspectives Kamath et al. (2018); Mukherjee et al. (2014); Nejati et al. (2018), most current approaches perform a three-stage pipeline for the labeling of wounds towards specific beacons and markers Pereira et al. (2013); Veredas et al. (2010). Such a pipeline is composed of (i) region segmentation, which aims to remove image noise and delimit region boundaries, (ii) feature extraction, which represents (parts of) an image in a multidimensional space, and (iii) data classification, which assigns a label to each image representation.

Unlike existing approaches, our premise is that raw image segmentation by superpixels can be combined with DL models to improve the current wound segmentation pipeline and enhance tissue labeling, since that combination removes the need for low-level feature engineering. Moreover, advances in distinct yet related areas, such as melanoma detection Litjens et al. (2017); Han et al. (2018); Youssef et al. (2018), can be incorporated into the models through transfer-learning methods. The following paragraphs describe the concepts required by our method.

Notation. An image I is a structured set of pixels {p_1, …, p_n}, such that p_i ∈ I, where n is the total number of pixels of I. A pixel p is a triple ⟨r, g, b⟩, where r, g, and b represent the pixel intensity in the RGB color space Sonka et al. (2014). Analogously, a superpixel S corresponds to a structured subset S ⊆ I, where m ≤ n is the number of pixels in S. The sets of images and superpixels are denoted by 𝕀 and 𝕊, respectively.

Region segmentation. Regions of interest within images are detached either by parameter-specific methods, e.g., FCN-Net Luo and Yang (2018) and SegNet Youssef et al. (2018), or generic methods, e.g., uniform grids Zahia et al. (2018). Superpixels are an alternative to both parameter-specific and generic methods, as they split an image into regions with well-defined borders Chino et al. (2018). Such regions are automatically selected according to a construction algorithm. Achanta et al. Achanta et al. (2012) conducted a comprehensive survey on superpixel construction algorithms and observed the SLIC method generates high-quality outputs. We follow their indication and build upon SLIC for performing raw tissue segmentation.
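As an illustration of the superpixel construction step, the sketch below applies SLIC via scikit-image to a synthetic two-tone image; the parameter values (`n_segments`, `compactness`) are assumptions for demonstration, not the authors' settings.

```python
import numpy as np
from skimage.segmentation import slic

# Synthetic 100x100 RGB image: left half dark ("background"),
# right half bright (standing in for skin tissue).
image = np.zeros((100, 100, 3), dtype=np.float64)
image[:, 50:, :] = 0.9

# SLIC clusters pixels by color similarity and spatial proximity,
# producing regions with well-defined borders.
segments = slic(image, n_segments=50, compactness=10, start_label=0)

n_superpixels = len(np.unique(segments))
```

Each entry of `segments` indexes the superpixel its pixel belongs to, so downstream stages can process each region independently regardless of shape.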

Feature extractor. A feature extractor (or image descriptor) is a non-bijective function ε: 𝕊 → ℝ^d. Given a superpixel S of an image I, ε maps it into a d-dimensional feature vector. Semantically, the numerical values of the feature vector represent low-level characteristics of the image, such as color, texture, and shape Sonka et al. (2014). Examples of feature extractors applied in wound segmentation include the MPEG-7 descriptors Color Layout, Scalable Color, and Color Structure Blanco et al. (2016).
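A minimal sketch of such a function ε: a simple per-channel color histogram stands in for the MPEG-7 descriptors (which are not reimplemented here); the bin count and the function name are illustrative assumptions.

```python
import numpy as np

def color_histogram(superpixel_pixels, bins=4):
    """Map a set of RGB pixels to a d-dimensional feature vector.

    A per-channel histogram is an illustrative stand-in for the
    MPEG-7 color descriptors mentioned in the text.
    """
    features = []
    for channel in range(3):
        hist, _ = np.histogram(superpixel_pixels[:, channel],
                               bins=bins, range=(0, 256), density=True)
        features.append(hist)
    return np.concatenate(features)  # d = 3 * bins dimensions

# Hypothetical superpixel: 400 random RGB pixels.
pixels = np.random.default_rng(0).integers(0, 256, size=(400, 3))
vec = color_histogram(pixels)
```

The function is non-bijective as the definition requires: many distinct pixel sets map to the same histogram.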

Classifier. A classifier is a function C: ℝ^d × T → L, where L is a discrete and disjoint set of labels and T is a training set that conditions the behavior of C. In the context of wounds, T summarizes the set of labeled superpixels, i.e., pairs ⟨ε(S), l ∈ L⟩ for a feature extractor ε. The classifier's learning algorithm is a biased method that induces C from T to predict a label for any superpixel S ∈ 𝕊. Examples of classifiers applied to tissue classification include Naïve-Bayes, Multi-Layer Perceptron (MLP), Support Vector Machines (SVM), Instance-based Learning (IbL), and RandomForest Mukherjee et al. (2014); Kavitha et al. (2017); Pereira et al. (2013); Chino et al. (2018).
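The definition above can be instantiated with scikit-learn's RandomForest (one of the classifiers the paper evaluates); the feature vectors, dimensionality, and class names below are synthetic assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
# Hypothetical training set T: feature vectors for labeled superpixels,
# two easily separable clusters standing in for two tissue labels.
X_train = np.vstack([rng.normal(0.0, 0.1, (50, 8)),
                     rng.normal(1.0, 0.1, (50, 8))])
y_train = np.array(["fibrin"] * 50 + ["not wound"] * 50)

# C is induced from T by the (biased) learning algorithm.
clf = RandomForestClassifier(n_estimators=100, criterion="gini",
                             random_state=0)
clf.fit(X_train, y_train)

# Predict labels for unseen superpixel features drawn near cluster 1.
pred = clf.predict(rng.normal(1.0, 0.1, (5, 8)))
```

The trained `clf` plays the role of C: it maps any feature vector ε(S) to a label in L.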

Convolutional Neural Network. A convolutional neural network (CNN) is a function N: 𝕀 × T → L, where 𝕀 is the domain of images, L is a discrete and disjoint set of labels, and T is the training set that conditions N. In ulcer images, if we set the input domain to superpixels 𝕊, then a model N can be seen as a classifier that bypasses feature extraction, i.e., it uses only the structured set of pixels within superpixels. While CNNs may present distinct internal topologies, their learning algorithm is either directly end-to-end trained from raw pixels of labeled images, or adjusted by a function that performs transfer-learning from a third-party CNN Kamath et al. (2018); Russakovsky et al. (2015).

Deep-Learning Models. A deep-learning (DL) model is the package that includes the CNN topology, the CNN learning algorithm, and the set of labeled and conditioning examples. DL models have recently surpassed human performance in image classification from basic to complex tasks Li and Shen (2018); Hoo-Chang et al. (2016). A variation of DL models uses a CNN only for feature extraction, as in the proposal of Nejati et al. (2018). Their approach uniformly divides an image into patches that are fed to five convolutional layers. Three fully-connected layers generate patch representations to be further labeled by an SVM. The strategy of Zahia et al. (2018) relies on a similar approach but uses a CNN with nine layers for the labeling of grid regions. The main drawback with those approaches is that they are tightly coupled to specific DL models, so the strategies may not benefit from isolated enhancements on segmentation, feature extraction, or classification.

ImageNet and Transfer Learning. Finding a suitable DL model for tissue classification requires a massive number of labeled examples, which, in practice, may be burdensome for experts Kamath et al. (2018). The ImageNet Challenge Russakovsky et al. (2015) is usually employed as the baseline for the definition of new DL models and contains millions of diversified and labeled images. Such a baseline enables the topology of CNNs to be adjusted for wound analysis through transfer-learning Esteva et al. (2017). End-to-end trained CNN topologies that reached outstanding results on ImageNet include VGG16 Simonyan and Zisserman (2014), InceptionV3 Szegedy et al. (2016), and ResNet He et al. (2016).

VGG16, InceptionV3, and ResNet. VGG16 Simonyan and Zisserman (2014) is a baseline CNN with 16 convolutional layers that extends AlexNet Krizhevsky et al. (2012) and enables the handling of image patches, which is suitable for the learning of features from wound images Zahia et al. (2018); Wang et al. (2015). On the other hand, InceptionV3 Szegedy et al. (2016) extends GoogLeNet Szegedy et al. (2015) and provides modules executed in parallel to represent parts of the CNN topology. This network architecture not only outperformed previous approaches, such as GoogLeNet and VGG16, in the labeling of ImageNet images but also reduced the learning effort severalfold in comparison to the same competitors. Finally, ResNet He et al. (2016) is a recent deep topology that employs residual blocks to guide the learning algorithm on convolutional layers. A ResNet convolutional neural network was able to solve ImageNet within a 3.6% error ratio, which places this CNN in the same tier as VGG16 and InceptionV3 Russakovsky et al. (2015).

Unbalanced classes. A frequent scenario in wound analysis is the uneven distribution of tissue patterns Yap et al. (2018); Kawahara et al. (2016). For instance, Nejati et al. (2018) evaluated a dataset of 350 wound images divided into labeled patches, where most of the labels were related to only three of seven possible classes. The authors addressed such imbalance by using an approximation of the NP-hard problem of dividing the instances into training and test sets. Likewise, Zahia et al. (2018) investigated 22 pressure ulcer images, which were divided into 270,762 regions of granulation, 80,636 parts of slough, and 37,146 of necrosis. The authors used weighted classification metrics for reporting the results. Studies on melanoma Kawahara et al. (2016); Esteva et al. (2017) also suggest the use of a sequence of rotations and flips as data augmentation for softening the problem of unbalanced classes.
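The rotation-and-flip augmentation mentioned above can be sketched with NumPy alone; generating all eight symmetries of a patch (the four 90° rotations and their mirrors) is one common realization of this tactic, though the exact set of transforms used by the cited studies may differ.

```python
import numpy as np

def augment(patch):
    """Generate rotated and mirrored variants of a superpixel patch,
    a common tactic for softening class imbalance."""
    variants = []
    for k in range(4):                       # 0, 90, 180, 270 degree rotations
        rotated = np.rot90(patch, k)
        variants.append(rotated)
        variants.append(np.fliplr(rotated))  # horizontal mirror of each rotation
    return variants

# Tiny 2x2 RGB patch with distinct values, so all variants differ.
patch = np.arange(2 * 2 * 3).reshape(2, 2, 3)
augmented = augment(patch)
```

Applied to a minority class such as necrosis, this multiplies its instance count by eight without collecting new images.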

3 Materials and Methods

Studies on dermatological wounds rely on small datasets, which hinders the generalization of DL models. Our method overcomes this drawback by using a divide-and-conquer strategy in which CNNs are requested to handle superpixels instead of single images. Moreover, we designed the approach in a modular fashion where every stage of the method (segmentation, extraction, and classification) can be set as an external parameter. Therefore, our method also benefits from enhancements on underlying parameters, such as superpixel construction algorithms and coupled deep-learning models.

Data source. We consider the data source ULCER_SET Pereira et al. (2013); Dorileo et al. (2010) in the design and evaluation of our proposal. The set contains photographs of arterial and venous ulcers in lower limbs with distinct sizes and different healing stages, regarding patients with varying skin colors, ages, and treatments. Images represent consecutive evaluations of subjects at the Neurovascular Ulcers Outpatient Clinic of HCFMRP/USP. During acquisition, white and blue cloths were used to emphasize the contrast between the background and the patients' skin, whereas color patches and rulers were included in the images to facilitate color calibration and normalization Dorileo et al. (2010). All photos were taken with the same camera (Canon EOS 5D0, 2MP, 50mm macro lens with a polarization filter), angle, and distance, and every ULCER_SET image has a 24-bit color depth.

Experts of HCFMRP/USP were asked to label superpixels from a subset of the ulcered images of distinct and non-related patients within ULCER_SET (labeled data is available at: github.com/gu-blanco/ulcer_set) by following the four-color class model described in Pereira et al. (2013), which includes the labels granulation, fibrin, necrosis, and not wound. Images were picked by specialists at random, aiming at maximizing diversity, e.g., tissue dominance, skin color, age, and treatment. The size of superpixels was set according to the recommendations in Blanco et al. (2016); Chino et al. (2018). As a result, 44,893 superpixels were detached and labeled as follows: (i) 37,187 superpixels with predominant healthy skin area, (ii) 3,974 fibrin superpixels, (iii) 3,284 superpixels with predominant granulation tissue, and (iv) 448 superpixels of necrosed tissue. Such instances were employed as the ground-truth for all subsequent evaluations.

Figure 1: Visual analysis of a balanced sample of ULCER_SET superpixels. (a) T-SNE visualization of Color Structure features. (b) PCA Scree Plot. (c) PCA individual explained variance plot.

Previous approaches. Features were extracted from superpixels by the Color Layout, Color Structure, and Scalable Color MPEG-7 extractors, which generated multidimensional representations of the superpixels. Such an extraction serves as data preparation for the visualization of the superpixels' space and also enables the execution of the three-stage segmentation pipelines of related studies. We applied T-Distributed Stochastic Neighbor Embedding (T-SNE) Maaten (2014) for the visualization of relationships between the superpixels and their labels. Figure 1(a) shows the T-SNE visualization of Color Structure vectors after sampling the original set of superpixels into a balanced number of instances per class. T-SNE shows the MPEG-7 extractor provides a fair representation of superpixels regarding wounded tissue segmentation, since few intersections between instances of different labels were found.
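A T-SNE projection of this kind can be produced with scikit-learn; the feature matrix below is a synthetic stand-in for the balanced sample of Color Structure vectors (the 64-dimensional size, class count per sample, and perplexity are assumptions).

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Synthetic stand-in: four tissue classes, 25 balanced samples each,
# embedded in a 64-dimensional feature space.
X = np.vstack([rng.normal(c, 0.3, (25, 64)) for c in range(4)])

# Project to 2-D for visualization; well-separated classes should
# form distinct clusters in the embedding.
embedding = TSNE(n_components=2, perplexity=10,
                 random_state=0).fit_transform(X)
```

Plotting `embedding` colored by class label reproduces the kind of view shown in Figure 1(a).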

Thus, we also performed a dimensionality reduction before the training of classifiers. Reductions are suitable for norm-based classifiers Mello and Ponti (2018), whose distances concentrate in medium-to-high dimensional spaces, as in the cases of the Color Structure and Scalable Color representations. A Principal Components Analysis (PCA) transformation was applied to such medium-to-high dimensional vectors, and the contribution of each principal component was evaluated regarding explained variances. Figures 1(b) and (c) show the behavior of cumulative and individual explained variances for MPEG-7 Color Structure vectors, respectively. Two distinct criteria were employed to avoid high-dimensional influence and to find the number of reduced dimensions: (i) Kaiser-Guttman (KG), which selects components whose individual variances are higher than 1 unit, and (ii) Scree-Plot (SP), which selects components whose accumulated variance corresponds to the three last quartiles of all variances Peres-Neto et al. (2005). The KG criterion indicated a reduced set of components is required for Color Structure vectors, whereas the Scree-Plot criterion selected components covering three quartiles of the total Color Structure variances. Analogous reductions were obtained for the Scalable Color vectors.
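The two selection criteria can be sketched with scikit-learn's PCA; the feature matrix is synthetic, and the 0.75 cumulative-variance threshold is an assumed reading of the "three last quartiles" rule.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
# Synthetic stand-in for 64-dimensional descriptor vectors; mixing by a
# random matrix induces correlated features, then standardize so
# eigenvalues are comparable to 1.
X = rng.normal(size=(200, 64)) @ rng.normal(size=(64, 64))
X = (X - X.mean(axis=0)) / X.std(axis=0)

pca = PCA().fit(X)
eigenvalues = pca.explained_variance_

# Kaiser-Guttman: keep components whose individual variance exceeds 1 unit.
kg_components = int(np.sum(eigenvalues > 1.0))

# Scree-Plot variant: keep enough leading components to cover a target
# share (assumed 0.75 here) of the cumulative explained variance.
cumulative = np.cumsum(pca.explained_variance_ratio_)
sp_components = int(np.searchsorted(cumulative, 0.75) + 1)
```

Both criteria yield a component count that replaces the original dimensionality before classifier training.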

| Extractor | RandomForest | RandomTree | SVM | MLP | Naïve-Bayes | Bayes-Net | IbL- | IbL- |
|---|---|---|---|---|---|---|---|---|
| Color Layout | 0.902 / 0.448 | 0.698 / 0.374 | 0.646 / 0.331 | 0.879 / 0.427 | 0.822 / 0.336 | 0.853 / 0.376 | 0.696 / 0.376 | 0.397 / 0.376 |
| Color Structure | 0.952 / 0.665 | 0.791 / 0.548 | 0.675 / 0.410 | 0.880 / 0.563 | 0.815 / 0.149 | 0.875 / 0.438 | 0.815 / 0.592 | 0.812 / 0.583 |
| Color Structure – KG | 0.947 / 0.641 | 0.790 / 0.542 | 0.630 / 0.331 | 0.921 / 0.568 | 0.704 / 0.140 | 0.872 / 0.351 | 0.806 / 0.577 | 0.809 / 0.582 |
| Color Structure – SP | 0.940 / 0.604 | 0.777 / 0.516 | 0.604 / 0.262 | 0.900 / 0.531 | 0.706 / 0.133 | 0.850 / 0.333 | 0.794 / 0.551 | 0.794 / 0.549 |
| Scalable Color | 0.947 / 0.606 | 0.760 / 0.485 | 0.601 / 0.207 | 0.815 / 0.381 | 0.502 / 0.000 | 0.619 / 0.058 | 0.827 / 0.613 | 0.796 / 0.551 |
| Scalable Color – KG | 0.955 / 0.645 | 0.782 / 0.534 | 0.600 / 0.200 | 0.806 / 0.330 | 0.497 / 0.000 | 0.650 / 0.000 | 0.807 / 0.569 | 0.813 / 0.584 |
| Scalable Color – SP | 0.906 / 0.507 | 0.715 / 0.414 | 0.600 / 0.200 | 0.807 / 0.310 | 0.495 / 0.000 | 0.774 / 0.243 | 0.740 / 0.439 | 0.746 / 0.452 |

Table 1: MPEG-7 extraction and classification of superpixels regarding AUC and Cohen-Kappa Coefficient (CKC). Each cell reports AUC / CKC.

Machine-learning classification. A representative set of classifiers was trained with seven multidimensional superpixel representations, namely Color Layout, Color Structure, Color Structure – KG, Color Structure – SP, Scalable Color, Scalable Color – KG, and Scalable Color – SP. We examined RandomTree, Naïve-Bayes, and Bayes-Net classifiers, as well as the RandomForest, SVM, MLP, and IbL methods employed in previous studies for superpixel labeling. A broad set of parameters was tested for the fine-tuning of the classifiers, and results indicated (i) Gini as the objective function for RandomForest, (ii) fully-connected hidden layers with the r-prop learning algorithm for MLP, (iii) nearest-neighbor labeling for IbL regarding both norms, and (iv) the Hill-Climbing heuristic for Bayes-Net construction. Other parameters were set with the default values found in the Weka framework (available at: www.cs.waikato.ac.nz/ml/weka/). The classifiers and extractors (56 combinations) were evaluated by a cross-validation procedure on the labeled images. Table 1 shows the results regarding Area under the ROC Curve (AUC) and Cohen-Kappa Coefficient (CKC). Color Structure and Scalable Color provided more suitable representations than Color Layout, whereas dimensionality reduction produced three of the eight best scenarios. Although MLP and Bayes-Net outperformed most of the competitors regarding Color Layout and Color Structure representations, they underperformed for Scalable Color vectors. RandomForest outperformed every competitor and reached the highest AUC score through the labeling of the Scalable Color – KG representation (0.955).
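The cross-validated AUC evaluation can be sketched as follows; the data is synthetic, and the tree count, fold count, and one-vs-rest AUC scoring are assumptions standing in for the paper's Weka setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
# Hypothetical superpixel feature vectors for four tissue classes.
X = np.vstack([rng.normal(c, 0.5, (40, 16)) for c in range(4)])
y = np.repeat(np.arange(4), 40)

# RandomForest with the Gini objective, as indicated by the tuning above.
clf = RandomForestClassifier(n_estimators=100, criterion="gini",
                             random_state=0)

# k-fold cross-validation with one-vs-rest AUC for the multiclass setting.
scores = cross_val_score(clf, X, y, cv=10, scoring="roc_auc_ovr")
mean_auc = scores.mean()
```

Averaging the per-fold scores yields the kind of AUC figures reported in Table 1.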

QTDU approach. Our proposal, named QTDU (available at: github.com/gu-blanco/qtdu/), bypasses feature extraction by relying on CNNs for finding the most suitable representations of wounded tissues within dermatological ulcers. Therefore, QTDU's main parameters are related to the underlying DL model used for the superpixel labeling task, i.e., the CNN topology, its learning algorithm, and the conditioning training set. In particular, QTDU builds upon the results of ImageNet Russakovsky et al. (2015) and the DL models in Esteva et al. (2017) and Han et al. (2018) for defining two CNN candidates: ResNet and InceptionV3. In the evaluations, we used the same learning parameters as those previous studies, i.e., the same learning rate, momentum, training and decay patience, the 'categorical cross-entropy' loss function, and batch size.

Figure 2: QTDU overall architecture. In the learning phase, the image is divided into superpixels that adjust the underlying CNN and the new six layers. The final “mask” is obtained by joining superpixels labeled as ‘Not Wound’.

Additionally, we prepared the underlying CNN to handle overfitting, since both ResNet and InceptionV3 include millions of adjustable parameters. Alternatives for such a task include the use of layer regularization and the insertion of several dropout tiers, which substantially modify the network topology. However, our approach requires the architecture to be as similar to the underlying topology as possible, so that a more comprehensive transfer-learning can be performed from third-party CNNs. The management of overfitting in such a context relies on (i) data augmentation, applied to superpixels for increasing the number of instances in the training set, and (ii) the careful addition of six new layers at the end of the underlying CNN. We followed the regularization hints of Hinton et al. (2012), including the new levels as three fully-connected tiers interleaved with two dropout layers (i.e., Dense–Dropout–Dense–Dropout–Dense), in which the final layer provides the label. The ReLU activation function with 512 units (nodes) was set for the first two dense layers, whereas a softmax function with four outputs was used in the last layer. As for the dropouts, we applied a 0.5 rate so that half of the activation units may be nullified by the generalization routine.
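The Dense–Dropout–Dense–Dropout–Dense head described above can be sketched in Keras on top of ResNet50; `weights=None` keeps the sketch self-contained (the paper transfers ImageNet weights instead), and the input shape, pooling bridge, and optimizer choice are assumptions.

```python
from tensorflow.keras import Model, layers
from tensorflow.keras.applications import ResNet50

# Underlying CNN; weights=None here avoids downloading ImageNet weights,
# whereas QTDU initializes from ImageNet before fine-tuning.
base = ResNet50(weights=None, include_top=False,
                input_shape=(224, 224, 3), pooling="avg")

# New head as described in the text: three fully-connected tiers
# interleaved with two 0.5-rate dropout layers.
x = layers.Dense(512, activation="relu")(base.output)
x = layers.Dropout(0.5)(x)
x = layers.Dense(512, activation="relu")(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(4, activation="softmax")(x)  # four tissue classes

model = Model(base.input, outputs)
model.compile(optimizer="sgd", loss="categorical_crossentropy")
```

The whole network (base plus head) is then end-to-end adjusted on the labeled superpixels.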

Figure 2 illustrates the three-stage QTDU pipeline, in which the image is first divided into superpixels whose construction keeps homogeneous regions in single blocks (regardless of their shape) according to similarity-based metrics. The learning phase consists of training the original CNN weights out of ImageNet along with randomly initialized variables at the new layers, which are end-to-end adjusted according to the superpixels.

Data Augmentation. The number of ULCER_SET instances was augmented through the use of (i) rotations, (ii) scaling in and out by a 0.2 factor, and (iii) vertical and horizontal mirroring. Such a choice of parameters enables the targeting of wounded tissue symmetry (horizontal/vertical flips) and different wound sizes (zoom in). As a result, 179,572 instances and four labels were used for the training of QTDU. Our implementation employs the TensorFlow ImageDataGenerator for data augmentation, which generates augmented superpixels at runtime and avoids loading all instances into main memory in the learning phase.
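A runtime-augmentation setup of this kind can be sketched with Keras' `ImageDataGenerator`; the rotation range and patch size below are assumptions, since the exact angles are not given in this excerpt.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Mirror the augmentation choices above: rotations, 0.2 zoom in/out,
# and vertical/horizontal flips (rotation_range is an assumed value).
generator = ImageDataGenerator(rotation_range=90,
                               zoom_range=0.2,
                               horizontal_flip=True,
                               vertical_flip=True)

# Hypothetical superpixel patches and one-hot labels for four classes.
patches = np.random.default_rng(0).random((8, 32, 32, 3))
labels = np.eye(4)[np.repeat(np.arange(4), 2)]

# Batches are produced on the fly, so the full augmented set never
# resides in main memory at once.
batch_x, batch_y = next(generator.flow(patches, labels, batch_size=4))
```

Feeding `generator.flow(...)` directly to model training keeps memory usage bounded by the batch size.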

Pixelwise area quantification. QTDU divides unsegmented images into superpixels that are further fused according to their labels into four distinct regions, namely (i) wound mask, (ii) granulation, (iii) fibrin, and (iv) necrotic tissues. Such fused regions are quantified regarding the number of pixels per superpixel and the total number of pixels within the image.
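The fusion-and-counting step can be sketched as follows; the tiny label map and the per-superpixel predictions are hypothetical stand-ins for a real segmented image.

```python
import numpy as np

# Hypothetical 4x4 superpixel label map (values index superpixels) and
# the tissue class predicted for each superpixel by the classifier.
segments = np.array([[0, 0, 1, 1],
                     [0, 0, 1, 1],
                     [2, 2, 3, 3],
                     [2, 2, 3, 3]])
tissue_of_superpixel = {0: "not wound", 1: "fibrin",
                        2: "granulation", 3: "not wound"}

# Fuse superpixels by predicted tissue and quantify each region
# as a pixel count and as a fraction of the whole image.
total = segments.size
areas = {}
for sp, tissue in tissue_of_superpixel.items():
    areas[tissue] = areas.get(tissue, 0) + int(np.sum(segments == sp))

fractions = {t: a / total for t, a in areas.items()}
```

Summing the wounded-tissue fractions (here fibrin plus granulation) yields the pixelwise wounded-area estimate.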

4 Experiments

QTDU was tested on Ubuntu 16.04.4 LTS running on a local cluster with two nodes, each with 2,560 GPU cores at 1,607 MHz, and 64 GB of shared RAM. The evaluations aimed at (i) determining the most suitable settings for QTDU, and (ii) quantifying QTDU's improvements regarding wound segmentation in comparison to previous machine-learning approaches and patch-based CNNs. Accordingly, we selected the best performers of Table 1 and compared them to QTDU with the underlying CNNs InceptionV3 (keras.io/applications/#inceptionv3) and ResNet (keras.io/applications/#resnet50). The next four comparisons were performed according to a cross-validation procedure.

CNN Training. We measured the time spent on training the two QTDU underlying CNNs, InceptionV3 and ResNet, with and without initial random weights. Additionally, we measured the time spent on training every classifier with the features indicated by the highest AUCs in Table 1. Figure 3 shows the average and standard deviation time over ten runs of the learning algorithms without on-the-fly data augmentation. While the quality of CNNs with random and ImageNet pretrained weights was similar, the training with random weights demanded more time, on average, with a greater standard deviation. In particular, the training of InceptionV3 required 1,356.1 ± 64.19m and 1,208.6 ± 36.41m for random and ImageNet weights, respectively. Analogously, the training of ResNet took 2,118.8 ± 89.88m and 2,050.6 ± 35.49m regarding random and ImageNet parameters, respectively. Results also showed RandomForest was nearly three orders of magnitude faster than the cheapest CNN. Finally, we measured the time spent on image classification after training the underlying CNNs. On average, QTDU with InceptionV3 required 1.41 ± 0.34s, whereas QTDU with ResNet took 1.95 ± 0.19s to label a superpixel. All evaluations were performed on the same GPU-based cluster using Python, Keras, and Scikit-learn.
Figure 3: Average and standard deviation elapsed time for the training of machine-learning classifiers and QTDU with and without random weights.
| Method | CKC | F1-Score | Sensitivity | Specificity | AUC |
|---|---|---|---|---|---|
| RandomForest w/ Scalable Color – KG |  |  |  |  |  |
| RandomTree w/ Color Structure |  |  |  |  |  |
| SVM w/ Color Structure |  |  |  |  |  |
| MLP w/ Color Structure – KG |  |  |  |  |  |
| Naïve-Bayes w/ Color Structure |  |  |  |  |  |
| Bayes-Net w/ Color Structure |  |  |  |  |  |
| IbL- w/ Scalable Color |  |  |  |  |  |
| IbL- w/ Scalable Color – KG |  |  |  |  |  |
| QTDU w/ InceptionV3 | 0.716 ± 0.001 | 0.969 ± 0.007 | 0.968 ± 0.006 | 0.971 ± 0.012 | 0.986 ± 0.018 |
| QTDU w/ ResNet | 0.721 ± 0.001 | 0.971 ± 0.004 | 0.970 ± 0.004 | 0.974 ± 0.007 | 0.986 ± 0.012 |

Table 2: QTDU vs. machine-learning-based comparison regarding the task of labeling superpixels.

QTDU vs. Machine-learning Classification. Table 2 shows an overall comparison between machine-learning-based classification and QTDU with distinct parameterizations regarding Cohen-Kappa Coefficient, F1-Score, Sensitivity, Specificity, and AUC. QTDU outperformed machine-learning classification in every scenario and metric. In particular, both QTDU with InceptionV3 and QTDU with ResNet outperformed the best machine-learning classifier (RandomForest) regarding Sensitivity and Specificity. QTDU with ResNet also outperformed the best machine-learning-based approach regarding AUC and reached substantially higher values of Cohen-Kappa Coefficient and F1-Score in all comparisons.

Figure 4 provides a comparison of machine-learning approaches and QTDU regarding F1-Scores per class. Results indicate machine-learning methods achieved measures higher than 0.7 for at most two of the four classes at the same time. On the other hand, the confusion matrices of Figures 4(i – j) show QTDU provided a more uniform result per tissue, with a lowest F1-Score of 0.739. Finally, Figures 4(a – h) show the trade-offs between machine-learning approaches, which are more effective for particular tissues depending on the inducing bias and the feature extractor. For instance, RandomForest was slightly better than QTDU for the detection of “not wound” superpixels, a scenario in which our approach delivered false positives at a very low ratio. Last, but not least, QTDU with ResNet achieved results slightly better than InceptionV3 for the labeling of not wound, fibrin, and necrosis superpixels. To investigate the significance of such differences, we applied a hypothesis test on QTDU and the machine-learning solutions.

Figure 4: F1-Scores reached by different superpixel classes according to both QTDU and machine-learning methods. The scale of values ranges in the [0–1] interval. The closer to 1, the better the score.

Ranking test. In this evaluation, we applied a leave-one-out procedure for the labeling of the uncorrelated images in our ground truth, instead of using average values from a k-fold cross-validation routine. Accordingly, a distinct Cohen-Kappa Coefficient per image was assigned to the machine-learning-based methods and QTDU. Next, we performed a hypothesis test to evaluate whether significant differences existed between the coefficients achieved by the competitors. In particular, we applied the well-known Friedman ranking test Demsar (2006) to assess such differences. Using a significance level of 0.01, we obtained a p-value below that threshold and, consequently, rejected the Friedman null hypothesis that differences are due to random sampling, concluding that at least one of the performances differs from the others.
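The Friedman ranking step can be reproduced with standard tooling. A minimal sketch, assuming SciPy and hypothetical per-image Cohen-Kappa values (one list per method, paired over the same images):

```python
from scipy.stats import friedmanchisquare

# Hypothetical per-image Cohen-Kappa Coefficients for three competing methods,
# measured on the same six images (paired samples).
kappa_method_a = [0.71, 0.68, 0.74, 0.70, 0.69, 0.73]
kappa_method_b = [0.55, 0.52, 0.58, 0.54, 0.50, 0.57]
kappa_method_c = [0.60, 0.63, 0.59, 0.61, 0.62, 0.58]

stat, p_value = friedmanchisquare(kappa_method_a, kappa_method_b, kappa_method_c)
if p_value < 0.01:  # significance level used in the paper
    print("Reject H0: at least one method performs differently")
```

Here method A ranks first on every image, so the test rejects the null hypothesis even at the 0.01 level.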

After the rejection of Friedman’s null hypothesis, we applied the Nemenyi post-test Garcia and Herrera (2008) for comparing pairs of approaches according to confidence intervals of 99%, 95%, and 90%. Table 3 shows the heatmap based on the Nemenyi p-values for each pair of compared approaches (lines vs. columns). QTDU outperformed every machine-learning competitor within strong significance levels, whereas some machine-learning methods, e.g., RandomForest, also outperformed others within significant margins. Moreover, although no significant difference was observed between QTDU with either InceptionV3 or ResNet, results indicate ResNet outperformed the other approaches with slightly better p-values in comparison to InceptionV3.
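The Nemenyi post-test declares two methods significantly different when their mean ranks differ by more than a critical difference CD = q_alpha * sqrt(k(k+1)/(6N)), where k is the number of methods and N the number of images (Demsar, 2006). A minimal sketch, using the standard q values for alpha = 0.05 for illustration:

```python
import math

# Critical values q_alpha for the Nemenyi test at alpha = 0.05, indexed by the
# number of compared methods k (tabulated values from Demsar, 2006).
Q_05 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850}

def nemenyi_cd(k, n, q=Q_05):
    """Critical difference of mean ranks for k methods over n images."""
    return q[k] * math.sqrt(k * (k + 1) / (6.0 * n))

def significantly_different(mean_rank_a, mean_rank_b, k, n):
    """True when the mean-rank gap exceeds the Nemenyi critical difference."""
    return abs(mean_rank_a - mean_rank_b) > nemenyi_cd(k, n)
```

For instance, with k = 3 methods over n = 6 images, CD is about 1.35, so mean ranks 3.0 and 1.0 differ significantly while 2.0 and 1.5 do not.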

[Table 3 body omitted: pairwise Nemenyi p-values among Random Forest, Random Tree, SVM, MLP, IbL variants, and QTDU with InceptionV3 and ResNet; cells shaded by 99%, 95–99%, and 90–95% confidence, or marked as no significant difference.]
Table 3: Heatmap of p-values regarding the pairwise comparison of machine-learning methods and QTDU in the labeling of dermatological wound photos (methods are compared as lines vs. columns).

VGG16, InceptionV3, and ResNet. Although the state-of-the-art CNNs InceptionV3 and ResNet were employed as QTDU’s underlying networks in the previous evaluations, a patch-based VGG16 can be seamlessly used as well. Figure 5 compares VGG16, InceptionV3, and ResNet as QTDU’s underlying CNNs regarding the mean and standard deviation of Cohen-Kappa Coefficient, F1-Score, Sensitivity, Specificity, and AUC, gathered from the same experimental testbed as the “QTDU vs. Machine-learning Classification” evaluation. In this experiment, VGG16 input patches were defined as superpixels, and QTDU’s six last layers were added at the end of the network topology. Results indicate QTDU with InceptionV3 and ResNet outperformed VGG16 for every mean value by a small margin. Therefore, we set ResNet as QTDU’s underlying CNN in the area quantification experiment.

Figure 5: QTDU with underlying CNNs VGG16, InceptionV3, and ResNet comparison regarding Cohen-Kappa Coefficient (CKC), F1-Score, Sensitivity, Specificity, and AUC.

Pixelwise area quantification. Five images were separated from the original ULCER_SET dataset of 217 (minus 40) images, and two experts manually segmented them according to the four-color class model. Figures 6(a–b) show a selected image and its segmentation in the ImageJ tool (https://imagej.nih.gov/ij/). Accordingly, we performed a holdout evaluation in which the new photos were used as the testing fold. Next, we matched the pixels of the QTDU with ResNet segmentation (Figure 6(c)) to the manually annotated pixels. The QTDU average accuracy was higher than the best result reported by previous machine-learning studies (see Section 1). The Mean Absolute Error (MAE) was calculated for the segmented images by counting the unmatched pixels and dividing the result by the number of wounded pixels. The average MAE ratio was 0.089.
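The MAE ratio described above can be sketched as follows; the label encoding (0 = not wound, positive integers for the wounded-tissue classes) is an assumption for illustration:

```python
import numpy as np

def mae_ratio(predicted, reference):
    """Unmatched pixels divided by the number of wounded pixels in the
    reference (manual) segmentation, as described in the text.

    Labels assumed: 0 = not wound, 1..3 = wounded-tissue classes.
    """
    predicted = np.asarray(predicted)
    reference = np.asarray(reference)
    unmatched = np.count_nonzero(predicted != reference)  # mislabeled pixels
    wounded = np.count_nonzero(reference != 0)            # reference wound area
    return unmatched / wounded
```

For a toy 2x2 reference [[1, 1], [0, 2]] and prediction [[1, 2], [0, 2]], one of the three wounded pixels is mislabeled, giving a ratio of 1/3.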

Figure 6: Examples of (a) wound photograph, (b) manual segmentation, and (c) QTDU pixelwise area quantification.

Discussion. QTDU outperformed existing three-stage approaches by significant margins in the segmentation of ulcers in lower limbs. In particular, QTDU surpassed, in both Sensitivity and Specificity, the best carefully constructed machine-learning combo: SLIC superpixels combined with the MPEG-7 Scalable Color extractor, a PCA reducer with the Kaiser-Guttman criterion, and a tuned RandomForest classifier. Such gains were corroborated by the F1-Score confusion matrices, which indicate QTDU achieved top performances for the individual classes of wounded tissues. Although no significant differences were observed between InceptionV3 and ResNet as QTDU’s underlying CNNs, ResNet outperformed the machine-learning approaches within stronger significance levels. On the other hand, InceptionV3 was faster to train than ResNet. Results also indicate QTDU is flexible enough to use any patch-based CNN, e.g., VGG16, InceptionV3, or ResNet, whereas overall accuracy may be affected by the choice. Therefore, the underlying CNN can be set towards optimizing either performance or resources. Last, but not least, experiments showed QTDU with ResNet reached an average MAE ratio of only 0.089 in comparison to human-conducted segmentation.

5 Conclusions

This manuscript presented a superpixel-driven deep learning approach for the segmentation of wounded tissues within dermatological ulcers. The method, called QTDU, is a divide-and-conquer approach designed as a modular architecture. QTDU takes advantage of superpixels for raw wound segmentation and uses coupled CNNs for feature extraction and tissue classification. As a side effect, QTDU disregards the superpixels’ spatial location in the original image. We validated our proposal using a set of ulcer photos whose pixels were labeled according to four tissue types. QTDU’s advances in comparison to existing machine-learning-based approaches include (i) bypassing the feature extraction step, and (ii) providing a segmentation “mask” by joining labeled superpixels of the same class.

Empirical evaluations over 179,572 superpixels divided into four classes indicated QTDU efficiently segmented wounded tissues and outperformed fine-tuned machine-learning strategies regarding Sensitivity, Specificity, and AUC. Additionally, unlike existing machine-learning approaches, QTDU was able to enhance the hit ratio of every type of wounded tissue simultaneously. A ranking test also indicated QTDU outperformed every one of the eight machine-learning competitors for the classification of wound images within strong significance levels, whereas no significant differences were observed between QTDU with underlying CNNs InceptionV3 and ResNet.

Experiments indicated other underlying CNNs, such as VGG16, can be seamlessly used in QTDU’s parameterization, whereas both the accuracy and the running time of the proposal are influenced by that decision. For instance, results showed ResNet’s average predictions were slightly higher than those of VGG16 and InceptionV3, but ResNet required more training and labeling time in comparison to the competing CNNs. Accordingly, QTDU can be parameterized towards performance or available resources, depending on the expert’s requirements. The final experiment showed QTDU reached a high accuracy with a Mean Absolute Error ratio of merely 0.089 when compared to a manual human segmentation. Such findings reinforce that QTDU can segment ulcers in lower limbs and delimit the pixelwise area within wounded tissues.

As future work, we are designing a protocol to increase the ULCER_SET dataset so that it can become a benchmark for comparing wound analysis methods. Additionally, we intend to extend QTDU by exploring the impact of several data augmentation strategies over superpixels and to design a feature extractor module to be coupled into a content-based image retrieval tool for dermatological wounds.

6 References


  • [1] R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Süsstrunk (2012) SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Transactions on Pattern Analysis and Machine Intelligence 34 (11), pp. 2274–2282. External Links: Document Cited by: §1, §2.
  • [2] G. Blanco, M. V. N. Bedo, M. T. Cazzolato, L. F. D. Santos, A. E. S. Jorge, C. Traina Jr., P. M. Azevedo-Marques, and A. J. M. Traina (2016) A Label-Scaled Similarity Measure for Content-Based Image Retrieval. In IEEE International Symposium on Multimedia, pp. 20–25. External Links: Document Cited by: §1, §1, §2, §3.
  • [3] D. Y. T. Chino, L. C. Scabora, M. T. Cazzolato, A. E. S. Jorge, C. Traina Jr., and A. J. M. Traina (2018) ICARUS: Retrieving Skin Ulcer Images through Bag-of-Signatures. In IEEE International Symposium on Computer-Based Medical Systems, pp. 82–87. External Links: Document Cited by: §1, §2, §2, §3.
  • [4] J. Demsar (2006) Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research 7, pp. 1–30. Cited by: §4.
  • [5] E. A. G. Dorileo, M. A. C. Frade, R. M. Rangayyan, and P. M. Azevedo-Marques (2010) Segmentation and analysis of the tissue composition of dermatological ulcers. In Canadian Conference of Electrical and Computer Engineering, pp. 1–4. External Links: Document Cited by: §1, §3.
  • [6] A. Esteva, B. Kuprel, R. A. Novoa, J. Ko, S. M. Swetter, H. M. Blau, and S. Thrun (2017) Dermatologist-level classification of skin cancer with deep neural networks. Nature 542 (7639), pp. 115. External Links: Document Cited by: §1, §2, §2, §3.
  • [7] S. Garcia and F. Herrera (2008) An extension on “Statistical comparisons of classifiers over multiple data sets” for all pairwise comparisons. Journal of Machine Learning Research 9, pp. 2677–2694. Cited by: §4.
  • [8] R. J. Gillies, P. E. Kinahan, and H. Hricak (2015) Radiomics: images are more than pictures, they are data. Radiology 278 (2), pp. 563–577. Cited by: §1.
  • [9] M. Goyal, N. D. Reeves, A. K. Davison, S. Rajbhandari, J. Spragg, and M. H. Yap (2018) Dfunet: Convolutional neural networks for diabetic foot ulcer classification. IEEE Transactions on Emerging Topics in Computational Intelligence, pp. 1–12. External Links: Document Cited by: §1.
  • [10] S. Han, M.. Kim, W. Lim, G. Park, I. Park, and S. Chang (2018) Classification of the clinical images for benign and malignant cutaneous tumors using a deep learning algorithm. Journal of Investigative Dermatology 138 (7), pp. 1529–1538. External Links: Document Cited by: §2, §3.
  • [11] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778. External Links: Document Cited by: §1, §2, §2.
  • [12] G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. R. Salakhutdinov (2012) Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint:1207.0580. External Links: Link Cited by: §3.
  • [13] S. Hoo-Chang, H. R. Roth, M. Gao, L. Lu, Z. Xu, I. Nogues, J. Yao, D. Mollura, and R. M. Summers (2016) Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE Transactions on Medical Imaging 35 (5), pp. 1285–1298. External Links: Document Cited by: §1, §2.
  • [14] S. Kamath, E. Sirazitdinova, and T. M. Deserno (2018) Machine learning for mobile wound assessment. In Imaging Informatics for Healthcare, Research, and Applications, Vol. 1, pp. 1–8. External Links: Document Cited by: §1, §1, §2, §2, §2.
  • [15] I. Kavitha, S. Suganthi, and S. Ramakrishnan (2017) Analysis of Chronic Wound Images Using Factorization Based Segmentation and Machine Learning Methods. In Proceedings of International Conference on Computational Biology and Bioinformatics, pp. 74–78. External Links: Document Cited by: §1, §1, §2.
  • [16] J. Kawahara, A. BenTaieb, and G. Hamarneh (2016) Deep features to classify skin lesions. In International Symposium on Biomedical Imaging, pp. 1397–1400. External Links: Document Cited by: §2.
  • [17] A. Khalil, M. Elmogy, M. Ghazal, C. Burns, and A. El-Baz (2019) Chronic Wound Healing Assessment System Based on Different Features Modalities and Non-Negative Matrix Factorization (NMF) Feature Reduction. IEEE Access 7, pp. 80110–80121. External Links: Document Cited by: §1.
  • [18] A. Krizhevsky, I. Sutskever, and G. E. Hinton (2012) ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pp. 1097–1105. Cited by: §2.
  • [19] Y. Li and L. Shen (2018) Skin lesion analysis towards melanoma detection using deep learning network. Sensors (Basel) 18 (2). External Links: Document, ISSN 1424-8220 Cited by: §1, §2.
  • [20] G. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoorian, J. A. Van Der Laak, B. Van Ginneken, and C. I. Sánchez (2017) A survey on deep learning in medical image analysis. Medical Image Analysis 42, pp. 60–88. External Links: Document Cited by: §1, §2.
  • [21] W. Luo and M. Yang (2018) Fast skin lesion segmentation via fully convolutional network with residual architecture and CRF. In International Conference on Pattern Recognition, pp. 1438–1443. External Links: Document, ISSN 1051-4651 Cited by: §2.
  • [22] L. V. D. Maaten (2014) Accelerating t-SNE using tree-based algorithms. Journal of Machine Learning Research 15 (1), pp. 3221–3245. Cited by: §3.
  • [23] R. F. Mello and M. A. Ponti (2018) Machine Learning: A Practical Approach on the Statistical Learning Theory. Springer. Cited by: §3.
  • [24] R. Mukherjee, D. Manohar, D. K. Das, A. Achar, A. Mitra, and C. Chakraborty (2014) Automated tissue classification framework for reproducible chronic wound assessment. BioMed Research International 2014. External Links: Document Cited by: §1, §1, §2, §2.
  • [25] H. Nejati, H. A. Ghazijahani, M. Abdollahzadeh, T. Malekzadeh, N. M. Cheung, K. H. Lee, and L. L. Low (2018) Fine-grained wound tissue analysis using deep neural network. In IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 1010–1014. External Links: Document Cited by: §1, §2, §2, §2.
  • [26] S. M. Pereira, M. A. C. Frade, R. M. Rangayyan, and P. M. Azevedo-Marques (2013) Classification of color images of dermatological ulcers. IEEE Journal of Biomedical and Health Informatics 17 (1), pp. 136–142. External Links: Document Cited by: §1, §1, §2, §2, §3, §3.
  • [27] P. R. Peres-Neto, D. A. Jackson, and K. M. Somers (2005) How many principal components? Stopping rules for determining the number of non-trivial axes revisited. Computational Statistics & Data Analysis 49 (4), pp. 974–997. External Links: Document Cited by: §3.
  • [28] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei (2015) ImageNet large scale visual recognition challenge. International Journal of Computer Vision 115 (3), pp. 211–252. External Links: Document Cited by: §1, §2, §2, §2, §3.
  • [29] J. L. Seixas, S. Barbon, and R. G. Mantovani (2015) Pattern recognition of lower member skin ulcers in medical images with machine learning algorithms. In IEEE International Symposium on Computer-Based Medical Systems, pp. 50–53. Cited by: §1.
  • [30] K. Simonyan and A. Zisserman (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. Cited by: §1, §2, §2.
  • [31] M. Sonka, V. Hlavac, and R. Boyle (2014) Image processing, analysis, and machine vision. Cengage Learning. Cited by: §2, §2.
  • [32] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, V. Vanhoucke, and A. Rabinovich (2015) Going deeper with convolutions. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9. External Links: Document Cited by: §2.
  • [33] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna (2016) Rethinking the inception architecture for computer vision. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 2818–2826. External Links: Document Cited by: §1, §2, §2.
  • [34] F. Veredas, H. Mesa, and L. Morente (2010) Binary tissue classification on wound images with neural networks and bayesian classifiers. IEEE Transactions on Medical Imaging 29 (2), pp. 410–427. External Links: Document Cited by: §1, §1, §2.
  • [35] C. Wang, X. Yan, M. Smith, K. Kochhar, M. Rubin, S. M. Warren, J. Wrobel, and H. Lee (2015) A unified framework for automatic wound segmentation and analysis with deep convolutional neural networks. In International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 2415–2418. External Links: Document Cited by: §1, §2.
  • [36] J. Yap, W. Yolland, and P. Tschandl (2018) Multimodal skin lesion classification using deep learning. Experimental Dermatology 27 (11), pp. 1261–1267. External Links: Document Cited by: §2.
  • [37] A. Youssef, D. Bloisi, M. Muscio, A. Pennisi, D. Nardi, and A. Facchiano (2018) Deep convolutional pixel-wise labeling for skin lesion image segmentation. In IEEE International Symposium on Medical Measurements and Applications, pp. 1–6. External Links: Document Cited by: §2, §2.
  • [38] L. Yu, H. Chen, Q. Dou, J. Qin, and P. Heng (2017) Automated melanoma recognition in dermoscopy images via very deep residual networks. IEEE Transactions on Medical Imaging 36 (4), pp. 994–1004. External Links: Document Cited by: §1.
  • [39] S. Zahia, D. Sierra-Sosa, B. Garcia-Zapirain, and A. Elmaghraby (2018) Tissue classification and segmentation of pressure injuries using convolutional neural networks. Computer Methods and Programs in Biomedicine 159, pp. 51–58. External Links: Document Cited by: §1, §1, §2, §2, §2, §2.