
DFUNet: Convolutional Neural Networks for Diabetic Foot Ulcer Classification

Globally, in 2016, one out of eleven adults suffered from Diabetes Mellitus. Diabetic Foot Ulcers (DFU) are a major complication of this disease, which if not managed properly can lead to amputation. Current clinical approaches to DFU treatment rely on patient and clinician vigilance, which has significant limitations such as the high cost involved in the diagnosis, treatment and lengthy care of the DFU. We collected an extensive dataset of foot images, which contain DFU from different patients. In this paper, we have proposed the use of traditional computer vision features for detecting foot ulcers among diabetic patients, which represent a cost-effective, remote and convenient healthcare solution. Furthermore, we used Convolutional Neural Networks (CNNs) for the first time in DFU classification. We have proposed a novel convolutional neural network architecture, DFUNet, with better feature extraction to identify the feature differences between healthy skin and the DFU. Using 10-fold cross-validation, DFUNet achieved an AUC score of 0.962. This outperformed both the machine learning and deep learning classifiers we have tested. Here we present the development of a novel and highly sensitive DFUNet for objectively detecting the presence of DFUs. This novel approach has the potential to deliver a paradigm shift in diabetic foot care.





I Introduction

Diabetes Mellitus (DM), commonly known as diabetes, is a lifelong condition resulting from hyperglycemia (high blood sugar levels), which leads to major life-threatening complications such as cardiovascular diseases, kidney failure, blindness and lower limb amputation, which is often preceded by Diabetic Foot Ulcers (DFU) [1]. According to the global report on diabetes, 422 million people were living with DM in 2014, compared to 108 million in 1980. Among adults over 18 years of age, the global prevalence rose from 4.7% in 1980 to 8.5% in 2014 [2]. By the end of 2035, the number of people living with DM worldwide is expected to rise to 600 million [3]. It is worth mentioning that only about 20% of these people will be from developed countries; the rest will be from developing countries, due to poor awareness and limited healthcare facilities [4]. There is about a 15%-25% chance that a diabetic patient will eventually develop DFU, and if proper care is not taken, this may result in lower limb amputation [5]. Every year, more than 1 million patients suffering from diabetes lose part of their leg due to the failure to recognize and treat DFU appropriately [6]. A diabetic patient with a ’high risk’ foot needs periodic check-ups by doctors, continuous expensive medication, and hygienic personal care to avoid the consequences discussed earlier. Hence, DFU causes a great financial burden on patients and their families, especially in developing countries, where the cost of treating this disease can be equivalent to 5.7 years of annual income [7].

In current clinical practice, the evaluation of DFU comprises various important tasks in early diagnosis, tracking of development, and a number of lengthy actions taken in the treatment and management of DFU for each particular case: 1) the medical history of the patient is evaluated; 2) a wound or diabetic foot specialist examines the DFU thoroughly; 3) additional tests such as CT scans, MRI and X-ray may be useful to help develop a treatment plan. Patients with DFU generally have a swollen leg, which can also be itchy and painful depending on the case. Usually, DFU have irregular structures and uncertain outer boundaries. The visual appearance of DFU and the surrounding skin varies with the stage of the ulcer, e.g. redness, callus formation, blisters, and significant tissue types such as granulation, slough, bleeding and scaly skin. Hence, ulcer evaluation with the help of computer vision algorithms would be based on the exact assessment of these visual signs as color descriptors and texture features.

The major challenges involved in this classification task are as follows: 1) the considerable time required for the collection and expert labelling of the DFU images; 2) high inter-class similarity between the normal (healthy skin) and abnormal (DFU) classes, and intra-class variations depending upon the classification of DFU [8], lighting conditions and the patient’s ethnicity. In this work, we tested a number of Conventional Machine Learning (CML) methods and Convolutional Neural Networks (CNNs) for the classification of ulcer and non-ulcer skin. We then propose and design a novel fast CNN architecture, named Diabetic Foot Ulcer Network (DFUNet), which outperformed GoogLeNet [9] and AlexNet [10] in terms of accuracy and sensitivity.

II Related Work

The proliferation of information and communication technologies presents both challenges and opportunities for the development of new-age healthcare systems. A number of telemedicine systems are currently being developed a) to improve current healthcare systems and decrease the cost of medical facilities; b) to improve the reach of medical facilities, i.e. frequent remote assessment of patients with the help of communication devices; c) to provide automated solutions that deal with the shortage of expert medical professionals for these chronic diseases [11]. Over the years, researchers and doctors have developed key telemedicine systems to monitor diabetes [12]. However, very few intelligent systems have been developed for the assessment of diabetic foot pathologies. These can be categorized into non-automated and automated telemedicine systems.

II-A Telemedicine Systems for DFU

With the rapid growth in mobile telecommunications, remote communication is made possible with the help of standalone devices such as smart-phones and laptops and the Internet. Nowadays, a pocket-size smart-phone with an advanced mobile operating system has the capability of a personal computer: it can capture and send high-resolution pictures and supports audio and video communication over advanced mobile internet such as 4G. In the non-automated category, the common telemedicine systems based on these devices, mostly set up in remote locations for the assessment of patients, are a) video conferencing [13]; b) three-dimensional (3D) wound imaging [14]; c) digital photography [15]; d) optical scanners [16]. However, specialized medical professionals are still needed on the other side to complete the assessment of the patient. Although these systems provide promising results, there is an urgent need for intelligent systems that can automatically detect DFU pathologies remotely.

The use of automated telemedicine systems for DFU is still in its infancy. Notably, Liu et al. [17, 18] in 2015 developed an intelligent telemedicine system for the detection of diabetic foot complications with the help of spectral imaging, infra-red thermal images and 3D surface reconstruction. However, implementing this system requires several expensive devices and specialist training to use them. Wang et al. [19] used an image capture box to capture image data and determined the area of DFU using a cascaded two-stage Support Vector Machine based classification. They proposed the use of a super-pixel technique for segmentation and extracted a number of features to perform the two-stage classification. Although this system reported a promising result, it has not been validated on a large dataset. In addition, the image capture box is impractical for data collection, as it requires contact between the patient’s feet and the box surface, which would not be allowed in a healthcare setting because of concerns regarding infection control. In other significant work, Manu et al. [20] performed the segmentation of DFU and surrounding skin on full foot images.

Additionally, computer methods based on manually engineered features or image processing approaches have been implemented for tissue classification and segmentation of related skin lesions such as wounds. Conventional machine learning for the classification task was performed by extracting various features, such as texture descriptors and color descriptors, from small delineated patches of wound images, followed by machine learning algorithms to classify them into normal and abnormal skin patches [21, 22, 23, 24, 25]. As in many computer vision systems, the hand-crafted features are affected by lighting conditions and by skin color, which depends on the ethnic group of the patient. In general, virtually all skin lesions related to both wounds and ulcers are now termed wounds. From a medical perspective, however, wounds and ulcers are considered differently: wounds are caused by an external problem, whereas ulcers are caused by an internal problem. There are also differences between wounds and ulcers in the appearance of the skin lesion, the cause (aetiology), the way the body responds (physiology) and the disease processes (pathology) [26]. Hence, in the present study, only DFU are considered, to determine how they differ from normal healthy skin at the same place of appearance.

II-B Computer Vision and Deep Learning

In recent years, there has been rapid development in the area of computer vision, especially on difficult and important problems such as understanding images from different domains (e.g. spectral and medical images), object and face detection, and multi-class and multi-label classification [27, 9, 28, 29]. Conventional computer vision and machine learning algorithms were very limited in their ability to process large image data and to provide representations of data with multiple levels of abstraction, and they required a lot of manual tuning for each input image. Deep convolutional networks, a recent family of machine learning algorithms, have emerged as an important technique for solving these kinds of computer vision problems [30, 10]. Deep convolutional networks obtain multiple levels of representation through simple non-linear modules that transform a simple feature representation into more advanced, abstract representations for classification. They take images as input and start by learning features such as edges at specific orientations and positions from the array of pixel values. At higher levels, they combine these edges to learn more abstract features such as components of the desired objects, and finally these components are combined to form the final objects [30].

Supervised learning is one of the most common forms of machine learning. It is central to the training of the network, as the system learns the classification task from a large collection of images that are labelled with their category. Without training, the machine cannot assign the highest score to the desired category [31, 32, 33]. During the training stage, images are processed by the machine to produce an output vector of scores over all categories for each image; the error between the output scores and the expected scores is then measured, and the network is adjusted until the desired score for each category is obtained. After training, a validation set of images is used to tune the hyper-parameters of the network, such as the number of convolutional and pooling layers. Lastly, the system is tested on real-world test data, without any expected outcome provided, to check the performance of the system.

The major contributions of this paper are as follows: 1) to the best of our knowledge, this is the first time that CNNs have been used to develop a fully automatic method to classify DFU skin against normal skin; 2) the development of a novel CNN architecture called DFUNet, which is fine-tuned to process the input data more effectively and efficiently than other comparable state-of-the-art CNN architectures. The remainder of the paper is structured as follows. Section III describes the methodology that we used to design classifiers based on CML and CNNs and provides details of our proposed DFUNet. In Section IV, the performance of the various classifiers is tested with evaluation metrics such as Sensitivity, Specificity, Precision, F-Measure, and Area Under the receiver operating characteristic Curve (AUC). In Section V, the conclusion and future scope of our work are discussed.

III Methodology

This section describes the proposed dataset, which contains examples of DFU from various patients, including the expert labelling of the different regions as normal and abnormal skin patches. In addition, the feature descriptors used with CML are detailed, along with the CNN architectures LeNet, AlexNet and GoogLeNet. Finally, we propose our own CNN architecture, DFUNet, to improve the way DFU are classified.

III-A DFU Dataset

The first challenge was to collect a dataset of standardized color images of DFU from various patients to train the various deep learning models. We utilized an extensive database of 292 images of patients’ feet with DFU, collected over the previous five years at the Lancashire Teaching Hospitals, with ethical approval from all relevant bodies and the patients’ written informed consent. We also collected 105 images of healthy feet to obtain more cases for the normal (healthy) class. Approval was obtained from the NHS Research Ethics Committee to use these images for this research. The DFU images were captured with a Nikon D3300. Whenever possible, the images were acquired as close-ups of the full foot at a distance of around 30-40 cm, with the camera oriented parallel to the plane of the ulcer. The use of flash as the primary light source was avoided; instead, adequate room lighting was used to obtain consistent colors in the images. To ensure close-range focus and avoid blurriness at close distance, a Nikon AF-S DX Micro NIKKOR 40mm f/2.8G lens was used. We also included another test case, captured with an iPad with the help of the FootSnap application, to show the robustness of the algorithms over a heterogeneous capture setup [34]. This heterogeneous test case consists of 20 abnormal skin patches and 32 normal skin patches.

III-B Expert Labelling of Images

Using the annotator from Hewitt et al. [35], for each full image of a foot with ulcers (as illustrated in Fig. 1), the medical experts delineated the Region Of Interest (ROI), an important region around the ulcer that comprises significant tissues of both normal and abnormal skin. The ground truth labels were delineated by medical professionals in the form of both normal and abnormal skin patches from the ROI. The experts collected patches of both classes only from the ROI, which helped with more robust classification of the patches than involving the whole foot as a region. For each delineated abnormal region, the ground truth of the type of abnormality was labelled and exported to an Extensible Markup Language (XML) file. For the annotation of the 397 foot images with both ulcer and non-ulcer, there is a total of 292 ROIs (only for the foot images with ulcers). From these annotations, we produced a total of 1679 skin patches, of which 641 are normal and 1038 abnormal. Finally, we divided the dataset into a training set of 1423 patches, a validation set of 84 patches and a testing set of 172 patches. The annotator tool, which can delineate the image into different types of patches, is shown in Fig. 1.

Fig. 1: An example of delineating the different regions from the whole foot image

III-C Data Augmentation of Training Patches

Deep networks require a lot of training image data because of the enormous number of parameters, especially the weights associated with convolutional layers, that need to be tuned by learning algorithms. Hence, we used data augmentation to improve the performance of the deep learning methods. We used a combination of image processing techniques, namely rotation, flipping, contrast enhancement, different color spaces, and random scaling, to perform data augmentation. Rotation is performed by rotating the image by angles of 90°, 180° and 270°. Then, three types of flipping (horizontal flip, vertical flip and horizontal+vertical flip) are performed on the original patches. The four color spaces used for data augmentation are YCbCr, NTSC, HSV and L*a*b*. For contrast enhancement, we used three functions: adjusting image intensity values, contrast enhancement using histogram equalization, and contrast-limited adaptive histogram equalization. We also produced two cropped patches from each original skin patch using a random offset and random orientation. With these techniques, we increased the number of training and validation patches by 15 times, i.e. 21,345 patches for training and 1260 patches for validation.
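As a rough illustration of the geometric part of this augmentation, the following NumPy sketch generates the three rotations and three flips of a patch. The function name `rotations_and_flips` is hypothetical, and the breakdown 3 + 3 + 4 + 3 + 2 = 15 is only one way of reading the factor of 15 from the techniques listed above, not a confirmed account of the original pipeline.

```python
import numpy as np

def rotations_and_flips(patch):
    """Return the three rotations and three flips of an (H, W, C) patch."""
    # Rotations by 90, 180 and 270 degrees in the image plane.
    rotated = [np.rot90(patch, k=k, axes=(0, 1)) for k in (1, 2, 3)]
    flipped = [
        np.flip(patch, axis=1),       # horizontal flip (left-right)
        np.flip(patch, axis=0),       # vertical flip (up-down)
        np.flip(patch, axis=(0, 1)),  # horizontal + vertical flip
    ]
    return rotated + flipped

# Assumed accounting of variants per patch: rotations, flips, colour spaces,
# contrast enhancements and random crops, as enumerated in the text.
variants_per_patch = 3 + 3 + 4 + 3 + 2

patch = np.arange(2 * 2 * 3).reshape(2, 2, 3)  # tiny stand-in for a skin patch
augmented = rotations_and_flips(patch)
```

With this reading, 1423 training patches give 1423 × 15 = 21,345 patches and 84 validation patches give 1260. Note that flipping along both axes produces the same array as a 180° rotation, so those two variants coincide pixel for pixel.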

III-D Pre-processing of Training Patches

Since we obtained a large amount of training data with the help of data augmentation, it is very important to pre-process these patches. We used the zero-centre technique to pre-process the patches, and then normalized every pixel.
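A minimal sketch of this pre-processing step is given below. It assumes that zero-centring means subtracting the per-channel mean computed over the training patches and that normalization means dividing by the per-channel standard deviation; the text does not state the exact statistics used, so both choices (and the name `zero_centre_and_normalise`) are assumptions.

```python
import numpy as np

def zero_centre_and_normalise(patches, eps=1e-8):
    """Zero-centre and scale a float array of patches with shape (N, H, W, C)."""
    # Assumption: statistics are computed per colour channel over all patches.
    mean = patches.mean(axis=(0, 1, 2), keepdims=True)
    centred = patches - mean
    std = centred.std(axis=(0, 1, 2), keepdims=True)
    return centred / (std + eps)  # eps guards against division by zero

rng = np.random.default_rng(0)
patches = rng.uniform(0, 255, size=(8, 16, 16, 3))  # synthetic stand-in data
out = zero_centre_and_normalise(patches)
```

In practice the mean and standard deviation would be estimated on the training set only and then reused for the validation and test patches.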

III-E Conventional Machine Learning

We investigate the use of hand-crafted features with CML for DFU and healthy skin classification. From our observation of the differences between DFU and healthy skin, color and texture feature descriptors are the visual cues for classification. For this 2-class classification problem, sequential minimal optimization (SMO) [36] was selected as the SVM-based machine learning classifier.

III-E1 Feature Descriptors

We resize the patches of the whole dataset to 256×256 to extract uniform color and texture feature descriptors. The three color spaces that we used are RGB, HSV and L*u*v*.

Local Binary Patterns (LBP) [37] is one of the most popular texture descriptors for the classification. In our case, the LBP features are extracted to recognize the sudden change in texture in an abnormal region of the foot for detection of DFU.
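To make the descriptor concrete, here is a minimal sketch of the basic 3×3 LBP operator, in which each pixel is encoded by comparing its eight neighbours with the centre value. The neighbour ordering, the `>=` comparison and the 256-bin normalised histogram are illustrative assumptions; the exact LBP variant and parameters used in the experiments are not specified in the text.

```python
import numpy as np

def lbp_histogram(gray):
    """Basic 3x3 LBP codes of the interior pixels, as a normalised 256-bin histogram."""
    h, w = gray.shape
    centre = gray[1:-1, 1:-1]
    # Clockwise neighbour offsets (dy, dx), starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(centre.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = gray[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Set this bit wherever the neighbour is at least as bright as the centre.
        codes |= (neighbour >= centre).astype(np.int64) << bit
    hist = np.bincount(codes.ravel(), minlength=256)
    return hist / hist.sum()

flat = np.full((5, 5), 7)          # perfectly smooth patch
flat_hist = lbp_histogram(flat)    # all mass falls in code 255
```

A smooth region concentrates the histogram in a few codes, whereas an abrupt texture change spreads it over many codes, which is the cue the classifier can exploit.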

Histogram of Oriented Gradients (HOG) [38] is a manually designed feature that converts the pixel-based representation into a gradient-based one. In the context of this classification, HOG is useful because the image gradient at an abnormal location gives the intensity change at that location. As the gradient is a vector quantity, it has both magnitude and direction.
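The magnitude and direction mentioned above can be sketched as follows. This shows only the core step of HOG for a single cell, a magnitude-weighted histogram of unsigned gradient orientations; the cell and block layout, the normalisation, and the choice of 9 bins are assumptions, and the function name is hypothetical.

```python
import numpy as np

def gradient_orientation_histogram(gray, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations for one cell."""
    gy, gx = np.gradient(gray.astype(float))     # derivatives along rows and columns
    magnitude = np.hypot(gx, gy)                 # gradient strength
    orientation = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned direction in degrees
    hist, _ = np.histogram(orientation, bins=bins, range=(0.0, 180.0),
                           weights=magnitude)
    return hist

# A horizontal intensity ramp: every gradient points along the x axis.
ramp = np.tile(np.arange(8.0), (8, 1))
ramp_hist = gradient_orientation_histogram(ramp)
```

For the ramp, all of the gradient energy lands in the first orientation bin, while a patch containing an ulcer boundary would typically show strong responses in the bins matching the edge direction.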

III-F Convolutional Neural Networks

For comparison with the traditional features, deep learning, specifically convolutional neural networks, was used to classify between healthy foot skin and skin with diabetic ulcerations. The first architecture we used was LeNet [39], trained for 60 epochs with a learning rate of 0.01, a step-down policy with a step size of 33%, and gamma set to 0.1. This network was originally used for recognizing digits and zip codes. These simple structures are easily recognized, even in hand-written datasets such as MNIST. Diabetic ulcers stand out on the foot, as can be seen in Fig. 2 in an example from the diabetic ulceration dataset. LeNet represents these structures much better than traditional features, even on a relatively small training set of 1423 patches and a validation set of 84 patches.

Fig. 2: An example of the raw input (left) from the DFU dataset and the first activation from the LeNet architecture (right).

The input was 28×28 grayscale patches of skin, split into abnormal and normal skin samples. At the first convolution layer, shown in Fig. 3, the kernels and activations already show the effectiveness of CNNs in highlighting important features.

Fig. 3: The output of healthy and diabetic ulcer skin from the first convolution layer of LeNet highlight discriminative features.

We used the Caffe [41] framework to implement LeNet [39], and used the Adaptive Moment Estimation (Adam) [42] method for stochastic optimisation. This solver combines the advantages of AdaGrad [43], which works well with sparse gradients, and RMSProp [44], which works well in an online setting. Adam is intended for large datasets and high variability in parameters; however, the results in Table IV show that it works just as effectively on smaller datasets.
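For readers unfamiliar with Adam, a single update step can be sketched as below. This follows the commonly published form of the algorithm (exponential moving averages of the gradient and its square, with bias correction); the default values of `beta1`, `beta2` and `eps` are the usual ones and are assumptions here, since the text only reports the learning rate.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step counter."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Toy usage: minimise f(x) = x^2, whose gradient is 2x.
x, m, v = 1.0, 0.0, 0.0
for t in range(1, 2001):
    x, m, v = adam_step(x, 2.0 * x, m, v, t, lr=0.01)
```

Because the step is scaled by the running gradient magnitude, the very first update has size close to `lr` regardless of how large the raw gradient is.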

We also used the popular CNN model AlexNet for classification of the abnormal (DFU) and normal (healthy skin) classes. This network was originally used for classification of 1000 different object classes on the ImageNet dataset. It emerged as the winner of the ImageNet ILSVRC-2012 competition in the classification category with a top-5 test error of 15.3%. A few adjustments were made to the original network so that it works well for our 2-class classification problem. A pre-trained model was also used for better convergence of the weights and better results [10]. To train the model in the Caffe framework, we used the same parameters as for LeNet, i.e. 60 epochs, a learning rate of 0.01, and gamma of 0.1.

Another state-of-the-art CNN architecture that we used is GoogLeNet [9], a 22-layer deep network, with an experimental setting similar to that of LeNet and AlexNet. Szegedy et al. [9] introduced a new module called inception in GoogLeNet. It applies multiple convolution filters of different sizes to the same input and performs pooling at the same time. All the outcomes are then merged into a single feature layer. This layer allows the model to take advantage of multi-level feature extraction from each input. Again, we used a transfer learning approach with pre-trained models to improve the performance.

III-G Proposed Method - Diabetic Foot Ulcer Network

To improve the extraction of important features for DFU classification, we propose a new Diabetic Foot Ulcer Network (DFUNet) architecture, which combines two important aspects of CNN architectures: depth and parallel convolution layers. DFUNet combines two types of convolutional layers: traditional convolution layers at the start of the network, which use a single convolutional filter, followed by parallel convolutional layers, which use multiple convolutional filters to extract multiple features from the same input. Detecting changes in healthy skin is a clear computer vision problem, similar to detecting malignant skin lesions, so DFUNet is designed around convolutions to find discriminative features for learning.

Healthy skin tends to exhibit smooth textures, whereas DFU have many distinct features, including large edges, strong changes in intensity or color, and quick changes between the surrounding healthy skin and the DFU itself. DFUNet, summarised in Fig. 4, is split into three main sections: the initialisation layers inspired by GoogLeNet; parallel convolution layers to discriminate the DFU more effectively than previous network layers; and lastly, fully-connected layers and a softmax-based output classifier. The detailed layers of the general DFUNet architecture are provided in Table I.

Fig. 4: An overview of the proposed DFUNet architecture.
| Layer no. | Layer type | Filter size | Stride | No. of filters | FC units | Input | Output |
|---|---|---|---|---|---|---|---|
| Layer 1 | Conv. | 7×7 | 2×2 | 64 | - | 3×224×224 | 64×112×112 |
| Layer 2 | Max-pool. | 3×3 | 2×2 | - | - | 64×112×112 | 64×56×56 |
| Layer 3 | Conv. | 1×1 | 1×1 | 64 | - | 64×56×56 | 64×56×56 |
| Layer 4 | Conv. | 3×3 | 1×1 | 192 | - | 64×56×56 | 192×56×56 |
| Layer 5 | Max-pool. | 3×3 | 2×2 | - | - | 192×56×56 | 192×28×28 |
| Layer 6 | Parallel conv. | 1×1, 3×3, 5×5 | 1×1 | 32, 64, 128 | - | 192×28×28 | 224×28×28 |
| Layer 7 | Max-pool. | 3×3 | 2×2 | - | - | 224×28×28 | 224×14×14 |
| Layer 8 | Parallel conv. | 1×1, 3×3, 5×5 | 1×1 | 32, 64, 128 | - | 224×14×14 | 224×14×14 |
| Layer 9 | Parallel conv. | 1×1, 3×3, 5×5 | 1×1 | 32, 64, 128 | - | 224×14×14 | 224×14×14 |
| Layer 10 | Max-pool. | 3×3 | 2×2 | - | - | 224×14×14 | 224×7×7 |
| Layer 11 | Parallel conv. | 1×1, 3×3, 5×5 | 1×1 | 32, 64, 128 | - | 224×7×7 | 224×7×7 |
| Layer 12 | Max-pool. | 7×7 | 1×1 | - | - | 224×7×7 | 224×1×1 |
| Layer 13 | Fully conn. | - | - | - | 1000 | | |
| Layer 14 | Fully conn. | - | - | - | No. of Classes | | |

TABLE I: Network Architecture of DFUNet. Conv. refers to convolutional layer, Max-pool. refers to Max-Pooling layers
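The feature-map sizes in Table I can be checked with a short shape trace. The sketch below assumes that every convolution and 3×3 pooling layer is padded so that the spatial size becomes ceil(size / stride), and that the final 7×7 pooling uses no padding; the padding scheme is not stated in the table, so this is an assumption made only to reproduce the listed shapes.

```python
import math

# (layer description, stride, output channels; None keeps the input channel count)
LAYERS = [
    ("conv 7x7", 2, 64),
    ("max-pool 3x3", 2, None),
    ("conv 1x1", 1, 64),
    ("conv 3x3", 1, 192),
    ("max-pool 3x3", 2, None),
    ("parallel conv", 1, 32 + 64 + 128),   # concatenated branches: 224 channels
    ("max-pool 3x3", 2, None),
    ("parallel conv", 1, 224),
    ("parallel conv", 1, 224),
    ("max-pool 3x3", 2, None),
    ("parallel conv", 1, 224),
]

def trace_shapes(channels=3, size=224):
    """Return the (channels, height, width) shape after the input and each layer."""
    shapes = [(channels, size, size)]
    for _name, stride, out_channels in LAYERS:
        size = math.ceil(size / stride)    # assumed padding behaviour
        channels = out_channels or channels
        shapes.append((channels, size, size))
    size = size - 7 + 1                    # Layer 12: 7x7 pool, stride 1, no padding
    shapes.append((channels, size, size))
    return shapes

shapes = trace_shapes()
```

Here `shapes[i]` is the output of Layer i in Table I, so `shapes[6]` is `(224, 28, 28)` and `shapes[12]` is `(224, 1, 1)`.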

The parameters used for training with DFUNet are 40 epochs, a batch size of 8, the Adam solver with a learning rate of 0.001. A step-down policy is used where the learning rate reduces with a step of 33% and gamma is set to 0.1.
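One plausible reading of this step-down policy is sketched below: the learning rate is multiplied by gamma each time a step of 33% of the total epochs has elapsed, in the same spirit as a step learning-rate policy of the form base_lr × gamma^floor(epoch / step_size). Interpreting "33%" as a fraction of the total number of epochs, and rounding the step to a whole epoch, are assumptions.

```python
def step_down_lr(epoch, base_lr=0.001, gamma=0.1, total_epochs=40, step_fraction=0.33):
    """Learning rate at a given 0-based epoch under a step-down schedule."""
    # Assumed: the step size is 33% of the training length, rounded to whole epochs.
    step_size = max(1, round(total_epochs * step_fraction))
    return base_lr * gamma ** (epoch // step_size)

schedule = [step_down_lr(epoch) for epoch in range(40)]
```

With 40 epochs the step size rounds to 13 epochs, so under this reading the rate starts at 0.001 and drops by a factor of ten at epochs 13 and 26.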

III-G1 Input Data

The DFU training and validation images are input as 256×256 patches from areas of the feet containing DFU and healthy skin. An example of the regions cropped from a foot is shown in Fig. 5. We used a centre crop of size 224×224 and mirroring as data parameters.

Fig. 5: Healthy and ulcer patches taken from feet for training in the CNN.

Inspired by the GoogLeNet [9] input stem, the input to DFUNet, shown in Fig. 6, begins with initial convolution, pooling and normalisation layers in a traditional CNN structure. This step also ensures that the dimensionality of the larger raw input image is reduced before moving on to subsequent layers.

Fig. 6: The initial input layers, similar to traditional CNNs, to prepare the data for the parallel convolution layers.

III-G2 Parallel Convolutions

Traditional convolutional layers use only a single type of convolutional filter, commonly ranging from 1×1 to 5×5, on the input data. Each convolutional filter provides a different feature extraction on the same input. The idea behind the parallel convolutional layer is to concatenate the outputs of multiple convolution filters applied to the same input, allowing multi-level feature extraction and covering more spread-out clusters. The design of the convolutions is weighted towards creating features that are as discriminative as possible to highlight any DFUs in an image. Three sizes of convolution kernels are used in the parallel convolutional layers throughout DFUNet: 5×5, 3×3 and 1×1. These are processed in parallel and finally concatenated. The core of DFUNet is the four parallel convolutions, shown in Fig. 7. The parallel convolutional layers are the key innovation in the architecture of DFUNet. As this is the most significant innovation, we experimented with different variants of these parallel sections to find the optimal architecture. A total of 5 variants of DFUNet with different filter configurations were selected and evaluated on the DFU dataset; the configurations are listed in Table II.

| Layers No. | DFUNet Var. 1 | DFUNet Var. 2 | DFUNet Var. 3 | DFUNet Var. 4 | DFUNet Var. 5 |
|---|---|---|---|---|---|
| 1st Parallel Conv. | 128, 256, 512 | 192, 256, 512 | 128, 128, 128 | 192, 192, 192 | 256, 256, 256 |
| 2nd Parallel Conv. | 128, 256, 512 | 192, 256, 512 | 128, 128, 128 | 256, 256, 256 | 256, 256, 256 |
| 3rd Parallel Conv. | 128, 256, 512 | 192, 256, 512 | 256, 256, 256 | 256, 256, 256 | 512, 512, 512 |
| 4th Parallel Conv. | 128, 256, 512 | 192, 256, 512 | 256, 256, 256 | 512, 512, 512 | 512, 512, 512 |

TABLE II: The filter configurations of the parallel convolutional layers in the various variants of DFUNet evaluated on the DFU dataset. Conv. refers to convolutional layer and Var. refers to variant

Fig. 7: The structure of each parallel convolution layers.
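The idea of a parallel convolution layer can be illustrated with a small NumPy sketch: the same input is convolved with 1×1, 3×3 and 5×5 kernels, each branch is passed through a ReLU, and the outputs are concatenated along the channel axis. The helper names, the zero "same" padding, the random weights and the toy input size are all assumptions for illustration; this is not the trained DFUNet layer.

```python
import numpy as np

def conv2d_same(x, weights):
    """Stride-1 convolution (cross-correlation) with zero 'same' padding.

    x: (C_in, H, W); weights: (C_out, C_in, K, K) with odd K.
    """
    k = weights.shape[-1]
    pad = k // 2
    padded = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    # windows has shape (C_in, H, W, K, K): one KxK neighbourhood per pixel.
    windows = np.lib.stride_tricks.sliding_window_view(padded, (k, k), axis=(1, 2))
    return np.einsum("chwij,ocij->ohw", windows, weights)

def parallel_conv(x, branches):
    """Apply each branch to the same input and concatenate along channels."""
    outputs = [np.maximum(conv2d_same(x, w), 0.0) for w in branches]  # ReLU per branch
    return np.concatenate(outputs, axis=0)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))            # toy input: 4 channels, 8x8 pixels
branches = [
    rng.standard_normal((32, 4, 1, 1)),       # 32 filters of size 1x1
    rng.standard_normal((64, 4, 3, 3)),       # 64 filters of size 3x3
    rng.standard_normal((128, 4, 5, 5)),      # 128 filters of size 5x5
]
out = parallel_conv(x, branches)
print(out.shape)  # (224, 8, 8)
```

The 32 + 64 + 128 branch outputs stack to 224 channels while the spatial size is unchanged, which matches the 224-channel outputs of the parallel layers listed in Table I.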

Each convolution provides additional discriminative power. Lower activations are present in healthy skin samples shown in Fig. 8 due to the absence of skin abnormalities. Higher activations are present in skin with an ulcer as shown in Fig. 9 due to skin abnormality.

Fig. 8: The healthy skin raw input, convolution kernels and convolution activations of DFUNet.
Fig. 9: The diabetic ulcer skin raw input, convolution kernels and convolution activations of DFUNet.

Each convolution layer uses a Rectified Linear Unit (ReLU), which is defined as

$$f(x) = \max(0, x)$$

where the function thresholds the activations at zero. As we use a ReLU for each convolution, the activations are unbounded, so we use local response normalisation (LRN) to normalise these activations after each concatenation of convolutional layers. It has also proven helpful in avoiding the over-fitting problem faced by CNN methods. Let $a^{i}_{x,y}$ be the source output of kernel $i$ applied at position $(x, y)$. Then, the regularized output $b^{i}_{x,y}$ of kernel $i$ applied at position $(x, y)$ is computed by

$$b^{i}_{x,y} = a^{i}_{x,y} \Big/ \left(k + \alpha \sum_{j=\max(0,\, i-n/2)}^{\min(N-1,\, i+n/2)} \left(a^{j}_{x,y}\right)^{2}\right)^{\beta}$$

where $N$ is the total number of kernels, $n$ is the size of the normalization neighbourhood and $\alpha$, $\beta$, $k$ and $n$ are the hyper-parameters.
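A minimal NumPy sketch of ReLU followed by LRN is shown below. It assumes the across-channel LRN formulation introduced with AlexNet [10], in which each activation is divided by a power of the summed squares of the activations of neighbouring kernels at the same position. The default hyper-parameter values are those reported for AlexNet and are assumptions here, since the values used for DFUNet are not given.

```python
import numpy as np

def relu(a):
    """Threshold activations at zero."""
    return np.maximum(a, 0.0)

def local_response_norm(a, n=5, k=2.0, alpha=1e-4, beta=0.75):
    """Across-channel LRN for activations of shape (N, H, W), one map per kernel."""
    num_kernels = a.shape[0]
    squared = a ** 2
    b = np.empty_like(a, dtype=float)
    for i in range(num_kernels):
        lo = max(0, i - n // 2)                  # first neighbouring kernel
        hi = min(num_kernels - 1, i + n // 2)    # last neighbouring kernel
        denom = (k + alpha * squared[lo:hi + 1].sum(axis=0)) ** beta
        b[i] = a[i] / denom
    return b

activations = relu(np.array([[[-1.0, 2.0]], [[3.0, -4.0]], [[0.5, 0.5]]]))
normalised = local_response_norm(activations)
```

Large activations in a neighbourhood of kernels increase the denominator, so a strong response from one kernel damps the responses of adjacent kernels at the same pixel.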

Further, to reduce dimensionality, a max pooling layer is included after the first and the third parallel convolutions.

III-G3 Fully Connected Layers and Output Classifier

The final section is the softmax output of class probabilities, which is a measure of how close the parameters are with respect to the ground truth labels of the training and validation data. The two output classes are healthy skin and DFU. The section is formed from an average pooling layer followed by two fully connected (FC) layers with 100 outputs for the first and 2 for the second. It is worth mentioning that DFUNet is fine-tuned for the 2-class problem by using only 100 outputs rather than 1000 in the first FC layer and by setting the last FC layer to 2 outputs. This fine-tuning gives faster processing in both the training and testing phases of DFUNet. The softmax function (cross-entropy regime) is the final layer and is defined as

$$f_j(z) = \frac{e^{z_j}}{\sum_{k} e^{z_k}}$$

where $f_j$ is the $j$-th element of the vector of class scores $f$ and $z$ is a vector of arbitrary real-valued scores that are squashed to a vector of values between zero and one that sum to one. The loss function is defined so that having good predictions during training is equivalent to having a small loss. The output layers are summarized in Fig. 10.


Fig. 10: The final layers, including a softmax classifier, to predict normal skin and DFU.
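The softmax output and the associated cross-entropy loss can be sketched in a few lines of NumPy. Subtracting the maximum score before exponentiating is a standard numerical-stability measure added here; it does not change the result. The function names are illustrative.

```python
import numpy as np

def softmax(z):
    """Squash a vector of real-valued scores into probabilities that sum to one."""
    shifted = z - np.max(z, axis=-1, keepdims=True)  # stability: avoids overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def cross_entropy(z, label):
    """Negative log-probability assigned to the correct class."""
    return -np.log(softmax(z)[label])

scores = np.array([2.0, 0.0])        # hypothetical scores for (healthy skin, DFU)
probs = softmax(scores)
loss_if_healthy = cross_entropy(scores, 0)
loss_if_dfu = cross_entropy(scores, 1)
```

When the network assigns the higher score to the correct class the loss is small, and it grows as the probability of the correct class shrinks, which is the sense in which good predictions correspond to a small loss.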

IV Results and Discussion

The DFU dataset was split into 85% training, 5% validation and 10% testing sets, and we adopted the 10-fold cross-validation technique. Hence, for training and validation with the proposed DFUNet architecture, we used approximately 1423 patches (including 882 abnormal cases) and 84 patches (including 52 abnormal cases) respectively, drawn from the 397 original foot images. As mentioned previously, we used both CML models and CNN models for the classification task. LeNet was the only architecture that worked on 28×28 grayscale patches rather than the 256×256 RGB images used as input by GoogLeNet, AlexNet, DFUNet and CML. It was included to show how basic deep learning works on this new classification problem.

With the data augmentation technique, the number of patches was increased 15 times for both training and validation. However, when we used data augmentation in our experiments, the final results were found to be the same for all the models. Hence, we did not include the data augmentation datasets in Table III and Table IV, as they did not improve the results. The main reason for the failure of data augmentation was that the overall performance metrics recorded without data augmentation were already quite high, and there was only a small number of misclassified cases, which were not corrected even by models trained with data augmentation.

In Table IV, we report Sensitivity, Specificity, Precision, Accuracy, F-Measure and Area under curve of ROC (AUC) as our evaluation metrics. In medical imaging, Sensitivity and Specificity are considered reliable evaluation metrics for classifier completeness.
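For reference, the threshold-based metrics can be computed from the confusion-matrix counts as sketched below, using their standard definitions with DFU as the positive class. The counts in the usage example are hypothetical and are not taken from the experiments.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                 # true positive rate (recall)
    specificity = tn / (tn + fp)                 # true negative rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f_measure": f_measure,
    }

# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=90, fp=20, tn=80, fn=10)
```

As a consistency check on Table IV, the Sensitivity (0.934) and Precision (0.945) reported for DFUNet give an F-Measure of 2 × 0.934 × 0.945 / (0.934 + 0.945) ≈ 0.939, which matches the tabulated value.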


In Table III, we report the performance measures of the DFUNet variants with the different parameters explained in the description of the DFUNet architecture in the previous section. There was not much of a gap in performance between the models. Overall, however, DFUNet variant 5 performed best on every evaluation metric except Precision, on which DFUNet variant 1 performed best. It is clear that DFUNet variant 5, which uses much larger filter sizes than the other variants in the last two parallel convolutional layers, produced better results. Hence, as DFUNet variant 5 achieved the best results, we used it as the proposed DFUNet to compare performance against the other traditional machine learning and deep learning models. The ROC curves for all the variants are illustrated in Fig. 11.

| Model | Sensitivity | Specificity | Precision | Accuracy | F-Measure | AUC |
|---|---|---|---|---|---|---|
| DFUNet Var. 1 | 0.923 | 0.910 | 0.946 | 0.918 | 0.934 | 0.957 |
| DFUNet Var. 2 | 0.928 | 0.905 | 0.942 | 0.919 | 0.935 | 0.959 |
| DFUNet Var. 3 | 0.928 | 0.906 | 0.942 | 0.921 | 0.935 | 0.960 |
| DFUNet Var. 4 | 0.927 | 0.900 | 0.938 | 0.917 | 0.933 | 0.958 |
| DFUNet Var. 5 | 0.934 | 0.911 | 0.945 | 0.925 | 0.939 | 0.961 |

TABLE III: The performance measures of various variants of the DFUNet on DFU dataset without data augmentation
Fig. 11: The ROC curve for all DFUNet models in which DFUNet var. 5 performed best with an AUC score of 0.961. Var. refers to variant.

Three CML models and four CNN models were used for classification. For the CML models, we used combinations of LBP, HOG and colour descriptors (RGB, HSV and L*u*v) as feature vectors and then trained an SMO classifier for our classification problem. For the CNNs, LeNet, AlexNet, GoogLeNet and our proposed DFUNet were the chosen architectures. Each classifier performed well for Sensitivity, with a margin of 5.3 percentage points in Table IV between the highest result (DFUNet, 0.934) and the lowest result (LBP + HOG, 0.881). There is a larger gap of 8.1 percentage points in Specificity among the CML models, with results ranging from 0.764 to 0.845.

Among the CNN approaches, LeNet achieved the lowest Specificity score of 0.810, whereas AlexNet, GoogLeNet and DFUNet performed best in this category, with 0.886, 0.912 and 0.911 respectively (Table IV). AUC is considered a viable performance measure for comparing the different machine learning approaches for classification; DFUNet and GoogLeNet achieved the highest AUC scores, 0.961 and 0.960 respectively.

| Method | Sensitivity | Specificity | Precision | Accuracy | F-Measure | AUC |
|---|---|---|---|---|---|---|
| LBP | 0.919 | 0.764 | 0.878 | 0.865 | 0.898 | 0.932 |
| LBP + HOG | 0.881 | 0.841 | 0.906 | 0.866 | 0.893 | 0.931 |
| LBP + HOG + Colour Descriptors | 0.902 | 0.845 | 0.904 | 0.880 | 0.904 | 0.943 |
| LeNet (CNN) [39] | 0.912 | 0.810 | 0.871 | 0.872 | 0.893 | 0.929 |
| AlexNet (CNN) [10] | 0.895 | 0.886 | 0.933 | 0.893 | 0.914 | 0.950 |
| GoogLeNet (CNN) [9] | 0.905 | 0.912 | 0.949 | 0.907 | 0.927 | 0.960 |
| Proposed DFUNet | 0.934 | 0.911 | 0.945 | 0.925 | 0.939 | 0.961 |

TABLE IV: DFU classification results. Overall, our proposed DFUNet achieved the best results.

Overall, we showed that CNNs can outperform the more traditional CML features by a large margin. In most cases, the CNN architectures achieved higher results than any of the CML models. GoogLeNet and DFUNet were the best performers among all the classifiers on the various evaluation metrics. The ROC curves for all the models are shown in Fig. 12. The details of the AUC performance of each method are given in Table V.

| Method | AUC Score | Standard Error of the Area | 95% Confidence Interval of the AUC |
|---|---|---|---|
| LBP | 0.9322 | 0.0061 | 0.9202 - 0.9443 |
| LBP + HOG | 0.9308 | 0.0060 | 0.9190 - 0.9427 |
| LBP + HOG + Colour Descriptors | 0.9430 | 0.0054 | 0.9324 - 0.9537 |
| LeNet (CNN) [39] | 0.9292 | 0.0060 | 0.9173 - 0.9412 |
| AlexNet (CNN) [10] | 0.9504 | 0.0050 | 0.9405 - 0.9603 |
| GoogLeNet (CNN) [9] | 0.9604 | 0.0045 | 0.9514 - 0.9690 |
| Proposed DFUNet | 0.9608 | 0.0044 | 0.9520 - 0.9695 |

TABLE V: The performance measures of all methods on AUC curve
Fig. 12: ROC curve for all the models including CML and CNNs mentioned in Table IV in which our proposed DFUNet method achieved the best score.
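The confidence intervals in Table V are close to what a normal approximation gives, i.e. the AUC plus or minus 1.96 standard errors. The sketch below assumes that approximation; the paper does not state how the intervals were computed, and the function name `auc_confidence_interval` is introduced only for this example.

```python
def auc_confidence_interval(auc, standard_error, z=1.96):
    """Normal-approximation confidence interval for an AUC estimate.

    z=1.96 corresponds to a two-sided 95% interval. This is an assumed
    method, not necessarily the one used to produce Table V.
    """
    half_width = z * standard_error
    return auc - half_width, auc + half_width


# AUC and standard error reported for the proposed DFUNet in Table V.
lower, upper = auc_confidence_interval(0.9608, 0.0044)
print(f"{lower:.4f} - {upper:.4f}")  # 0.9522 - 0.9694
```

The result differs from the tabulated interval (0.9520 - 0.9695) only in the fourth decimal place, which may be explained by the standard error being rounded to four decimals in the table.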

DFUNet achieved better results than GoogLeNet on several evaluation metrics. The reasons for using DFUNet rather than a conventional CNN architecture, in particular GoogLeNet, are to reach the best results faster with fewer layers (a 14-layer architecture compared to the 22-layer architecture of GoogLeNet) and to tune the overall architecture to the 2-class problem, i.e. normal and abnormal skin patches. With 10-fold cross-validation, on the same machine configuration and input batch size in the Caffe framework, DFUNet took an average of 3 minutes 32 seconds to train a model, whereas GoogLeNet took an average of 16 minutes 27 seconds with the same amount of training and validation data. For testing, DFUNet took an average of 49 seconds, whereas GoogLeNet took an average of 72 seconds to classify the same test data. Therefore, we demonstrated how reducing the number of layers using the bespoke architecture of DFUNet markedly reduced processing time, while also achieving a higher sensitivity and specificity through the introduction of parallel convolution layers with an increased number of filter inputs.
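To put the reported timings in relative terms, the short calculation below converts them to seconds and computes the ratio of GoogLeNet time to DFUNet time. Only the four timings come from the text above; the helper `to_seconds` and the variable names are introduced for this example.

```python
def to_seconds(minutes, seconds):
    """Convert a duration given in minutes and seconds to seconds."""
    return 60 * minutes + seconds


# Average training times reported for one model (10-fold cross-validation).
train_dfunet = to_seconds(3, 32)      # 212 s
train_googlenet = to_seconds(16, 27)  # 987 s

# Average times reported for classifying the same test data.
test_dfunet = 49
test_googlenet = 72

train_speedup = train_googlenet / train_dfunet
test_speedup = test_googlenet / test_dfunet
print(f"training: {train_speedup:.2f}x, testing: {test_speedup:.2f}x")
# training: 4.66x, testing: 1.47x
```

Under these figures, training DFUNet is roughly 4.7 times faster than training GoogLeNet, while the gain at test time is closer to 1.5 times.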

Our proposed DFUNet has the highest performance measures in Sensitivity (0.934), F-measure (0.939) and AUC (0.961, Table IV), whereas GoogLeNet has the highest scores in Specificity and Precision, owing to the ability of its inception architecture to capture more subtle changes [9].

There is no evidence that factors such as lighting conditions and skin tone due to the patient's ethnicity influence DFU classification, as the ulcer and surrounding skin have texture and color features that are quite distinct from those of normal healthy skin, irrespective of these factors. In our experiments, these factors resulted in very few misclassified instances in the testing set, occurring when the skin had a very high red tone, as shown in Fig. 13.

IV-A Accurate and Inaccurate Cases of Classification by Proposed DFUNet

A few examples of correctly and incorrectly classified cases in both the abnormal and normal classes are illustrated in Fig. 13. DFUNet correctly classified most of the testing instances. It generally struggled to classify pre-ulcer skin and usually detected it as normal with a high probability, as illustrated by examples 1 and 2 of the misclassified cases of the abnormal class in Fig. 13. DFU that are very small in size were also misclassified as normal, as shown by examples 3 and 4 of the misclassified cases of the abnormal class in Fig. 13. For normal skin, patches containing toes, highly wrinkled skin or skin with a very high red tone were classified wrongly by the proposed method, as illustrated by the examples of misclassified cases of the normal class in Fig. 13.

Fig. 13: Correctly and wrongly classified cases for both abnormal and normal classes

V Performance Evaluation on Heterogeneous Test Case

The DFU dataset was captured with the same DSLR camera, as mentioned in the section above. For computer vision techniques, it is preferable to form a dataset from heterogeneous capture devices, but strict medical ethical approval did not allow us to use different cameras to capture the pictures of DFU. Hence, we collected another, heterogeneous dataset of standardized DFU images with the help of the FootSnap application. These images were captured with an iPad camera. We tested our algorithm on this heterogeneous dataset and obtained good performance, with a Sensitivity of 0.929, F-measure of 0.931, Specificity of 0.908, Precision of 0.942 and AUC of 0.950.

VI Performance Evaluation on Facial Skin Dataset

Since DFUNet performed well on the classification of DFU skin patches, we tested its robustness on another skin lesion dataset by running a 3-class classification experiment on facial skin patches, i.e. normal, spot and wrinkle. It is worth mentioning that there is no public skin lesion dataset available for research without prior written consent. In this derma dataset, we delineated an equal number of skin patches, i.e. 110 patches, for each class. We used only the two best-performing CNN architectures in Table IV, i.e. GoogLeNet and DFUNet, for this experiment. With the same experimental settings, DFUNet outperformed GoogLeNet on every evaluation metric under 10-fold cross-validation, as shown in Table VI. This is because deep learning models do not work well with smaller datasets, even with full training [33]. DFUNet, however, uses larger filter sizes in the later parallel convolution layers to extract multiple features, which helped it outperform GoogLeNet in this experiment.

| Method | Sensitivity | Specificity | Precision | Accuracy | F-Measure | MCC |
|---|---|---|---|---|---|---|
| GoogLeNet | 0.783 | 0.882 | 0.784 | 0.846 | 0.784 | 0.665 |
| Proposed DFUNet | 0.867 | 0.930 | 0.867 | 0.907 | 0.867 | 0.796 |

TABLE VI: Facial Skin classification results. Overall, our proposed DFUNet achieved the best results.

VII Conclusion

In this work, we trained various classifiers based on traditional machine learning algorithms and CNNs, and proposed a new CNN architecture, DFUNet, for DFU classification, which discriminates DFU skin from healthy skin. With high performance measures in classification, DFUNet allows the accurate automated detection of DFU in foot images, making it an innovative technique for DFU evaluation and medical treatment. For the detection of DFU, it is very important to understand the feature differences between DFU and healthy skin from a computer vision perspective. This work has potential for technology that may transform the detection and treatment of diabetic foot ulcers and lead to a paradigm shift in the clinical care of the diabetic foot. It forms the basis for future targets that include: 1) developing an automatic annotator that can delineate and classify foot images without the help of clinicians; 2) developing automatic ulcer detection, recognition and segmentation with the help of these classifiers; 3) implementing a method to determine the various pathologies of DFU as a multi-class classification, similar to the Texas classification and other grading scales; 4) implementing various user-friendly software tools, including mobile applications, for ulcer recognition [45]. Since DFUNet worked well for DFU classification, the proposed framework will likely be useful for classifying other skin lesions against normal skin, such as wounds, infections like chicken pox or shingles, and other lesions like moles, freckles, spots and pimples [46]. DFUNet is a light-weight CNN framework that is currently tuned for only two classes (ulcer and normal skin); it will be tested in the future on many more classes.
Therefore, we demonstrated how reducing the number of layers and fine-tuning using the bespoke architecture of DFUNet markedly reduced processing time, while also achieving a higher sensitivity and specificity.


  • [1] S. Wild, G. Roglic, A. Green, R. Sicree, and H. King, “Global prevalence of diabetes estimates for the year 2000 and projections for 2030,” Diabetes care, vol. 27, no. 5, pp. 1047–1053, 2004.
  • [2] W. H. Organization et al., “Global report on diabetes who geneva,” 2016.
  • [3] K. Bakker, J. Apelqvist, B. Lipsky, J. Van Netten, and N. Schaper, “The 2015 iwgdf guidance documents on prevention and management of foot problems in diabetes: development of an evidence-based global consensus,” Diabetes/metabolism research and reviews, vol. 32, no. S1, pp. 2–6, 2016.
  • [4] A. J. Boulton, L. Vileikyte, G. Ragnarson-Tennvall, and J. Apelqvist, “The global burden of diabetic foot disease,” The Lancet, vol. 366, no. 9498, pp. 1719–1724, 2005.
  • [5] F. Aguiree, A. Brown, N. H. Cho, G. Dahlquist, S. Dodd, T. Dunning, M. Hirst, C. Hwang, D. Magliano, C. Patterson et al., IDF Diabetes Atlas: sixth edition, 6th ed., L. Guariguata, T. Nolan, J. Beagley, U. Linnenkamp, and O. Jacqmain, Eds.   International Diabetes Federation, 2013.
  • [6] D. G. Armstrong, L. A. Lavery, and L. B. Harkless, “Validation of a diabetic wound classification system: the contribution of depth, infection, and ischemia to risk of amputation,” Diabetes care, vol. 21, no. 5, pp. 855–859, 1998.
  • [7] P. Cavanagh, C. Attinger, Z. Abbas, A. Bal, N. Rojas, and Z.-R. Xu, “Cost of treating diabetic foot ulcers in five different countries,” Diabetes/metabolism research and reviews, vol. 28, no. S1, pp. 107–111, 2012.
  • [8] L. A. Lavery, D. G. Armstrong, and L. B. Harkless, “Classification of diabetic foot wounds,” The Journal of Foot and Ankle Surgery, vol. 35, no. 6, pp. 528–531, 1996.
  • [9] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1–9.
  • [10] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in neural information processing systems, 2012, pp. 1097–1105.
  • [11] S. Franc, A. Daoudi, S. Mounier, B. Boucherie, H. Laroye, C. Peschard, D. Dardari, O. Juy, E. Requeda, L. Canipel et al., “Telemedicine: what more is needed for its integration in everyday life?” Diabetes & metabolism, vol. 37, pp. S71–S77, 2011.
  • [12] O. El-Gayar, P. Timsina, N. Nawar, and W. Eid, “A systematic review of it for diabetes self-management: are we there yet?” International journal of medical informatics, vol. 82, no. 8, pp. 637–652, 2013.
  • [13] J. Clemensen, S. B. Larsen, M. Kirkevold, and N. Ejskjaer, “Treatment of diabetic foot ulcers in the home: video consultations as an alternative to outpatient hospital care,” International journal of telemedicine and applications, vol. 2008, p. 1, 2008.
  • [14] F. L. Bowling, L. King, J. A. Paterson, J. Hu, B. A. Lipsky, D. R. Matthews, and A. J. Boulton, “Remote assessment of diabetic foot ulcers using a novel wound imaging system,” Wound Repair and Regeneration, vol. 19, no. 1, pp. 25–30, 2011.
  • [15] C. E. Hazenberg, J. J. van Netten, S. G. van Baal, and S. A. Bus, “Assessment of signs of foot infection in diabetes patients using photographic foot imaging and infrared thermography,” Diabetes technology & therapeutics, vol. 16, no. 6, pp. 370–377, 2014.
  • [16] P. Foltynski, J. M. Wojcicki, P. Ladyzynski, K. Migalska-Musial, G. Rosinski, J. Krzymien, and W. Karnafel, “Monitoring of diabetic foot syndrome treatment: some new perspectives,” Artificial organs, vol. 35, no. 2, pp. 176–182, 2011.
  • [17] C. Liu, J. J. van Netten, J. G. Van Baal, S. A. Bus, and F. van Der Heijden, “Automatic detection of diabetic foot complications with infrared thermography by asymmetric analysis,” Journal of biomedical optics, vol. 20, no. 2, pp. 026 003–026 003, 2015.
  • [18] J. J. van Netten, M. Prijs, J. G. van Baal, C. Liu, F. van Der Heijden, and S. A. Bus, “Diagnostic values for skin temperature assessment to detect diabetes-related foot complications,” Diabetes technology & therapeutics, vol. 16, no. 11, pp. 714–721, 2014.
  • [19] L. Wang, P. Pedersen, E. Agu, D. Strong, and B. Tulu, “Area determination of diabetic foot ulcer images using a cascaded two-stage svm based classification,” IEEE Transactions on Biomedical Engineering, 2016.
  • [20] M. Goyal, N. D. Reeves, S. Rajbhandari, J. Spragg, and M. H. Yap, “Fully convolutional networks for diabetic foot ulcer segmentation,” arXiv preprint arXiv:1708.01928, 2017.
  • [21] H. Wannous, Y. Lucas, and S. Treuillet, “Enhanced assessment of the wound-healing process by accurate multiview tissue classification,” IEEE transactions on Medical Imaging, vol. 30, no. 2, pp. 315–326, 2011.
  • [22] M. Kolesnik and A. Fexa, “Multi-dimensional color histograms for segmentation of wounds in images,” Image Analysis and Recognition, pp. 1014–1022, 2005.
  • [23] M. Kolesnik and A. Fexa, “How robust is the svm wound segmentation?” in Signal Processing Symposium, 2006. NORSIG 2006. Proceedings of the 7th Nordic.   IEEE, 2006, pp. 50–53.
  • [24] E. S. Papazoglou, L. Zubkov, X. Mao, M. Neidrauer, N. Rannou, and M. S. Weingarten, “Image analysis of chronic wounds for determining the surface area,” Wound repair and regeneration, vol. 18, no. 4, pp. 349–358, 2010.
  • [25] F. Veredas, H. Mesa, and L. Morente, “Binary tissue classification on wound images with neural networks and bayesian classifiers,” IEEE transactions on medical imaging, vol. 29, no. 2, pp. 410–427, 2010.
  • [26] M. H. Hermans, “Wounds and ulcers: back to the old nomenclature,” Wounds, vol. 22, no. 11, pp. 289–93, 2010.
  • [27] M. D. Zeiler and R. Fergus, “Visualizing and understanding convolutional networks,” in European Conference on Computer Vision.   Springer, 2014, pp. 818–833.
  • [28] M. Anthimopoulos, S. Christodoulidis, L. Ebner, A. Christe, and S. Mougiakakou, “Lung pattern classification for interstitial lung diseases using a deep convolutional neural network,” IEEE transactions on medical imaging, vol. 35, no. 5, pp. 1207–1216, 2016.
  • [29] H.-C. Shin, H. R. Roth, M. Gao, L. Lu, Z. Xu, I. Nogues, J. Yao, D. Mollura, and R. M. Summers, “Deep convolutional neural networks for computer-aided detection: Cnn architectures, dataset characteristics and transfer learning,” IEEE transactions on medical imaging, vol. 35, no. 5, pp. 1285–1298, 2016.
  • [30] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.
  • [31] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
  • [32] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431–3440.
  • [33] N. Tajbakhsh, J. Y. Shin, S. R. Gurudu, R. T. Hurst, C. B. Kendall, M. B. Gotway, and J. Liang, “Convolutional neural networks for medical image analysis: full training or fine tuning?” IEEE transactions on medical imaging, vol. 35, no. 5, pp. 1299–1312, 2016.
  • [34] M. H. Yap, C.-C. Ng, K. Chatwin, C. A. Abbott, F. L. Bowling, A. J. Boulton, and N. D. Reeves, “Computer vision algorithms in the detection of diabetic foot ulceration a new paradigm for diabetic foot care?” Journal of diabetes science and technology, p. 1932296815611425, 2015.
  • [35] B. Hewitt, M. H. Yap, and R. Grant, “Manual whisker annotator (mwa): A modular open-source tool,” Journal of Open Research Software, vol. 4, no. 1, 2016.
  • [36] J. C. Platt, “Fast training of support vector machines using sequential minimal optimization,” Advances in kernel methods, pp. 185–208, 1999.
  • [37] D.-C. He and L. Wang, “Texture unit, texture spectrum, and texture analysis,” IEEE transactions on Geoscience and Remote Sensing, vol. 28, no. 4, pp. 509–512, 1990.
  • [38] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), vol. 1.   IEEE, 2005, pp. 886–893.
  • [39] Y. LeCun, Y. Bengio et al., “Convolutional networks for images, speech, and time series,” The handbook of brain theory and neural networks, vol. 3361, no. 10, p. 1995, 1995.
  • [40] Y. LeCun, C. Cortes, and C. J. Burges, “The mnist database of handwritten digits,” 1998.
  • [41] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, “Caffe: Convolutional architecture for fast feature embedding,” in Proceedings of the 22nd ACM international conference on Multimedia.   ACM, 2014, pp. 675–678.
  • [42] D. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
  • [43] J. Duchi, E. Hazan, and Y. Singer, “Adaptive subgradient methods for online learning and stochastic optimization,” Journal of Machine Learning Research, vol. 12, no. Jul, pp. 2121–2159, 2011.
  • [44] T. Tieleman and G. Hinton, “Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude,” COURSERA: Neural Networks for Machine Learning, vol. 4, no. 2, 2012.
  • [45] M. H. Yap, K. E. Chatwin, C.-C. Ng, C. A. Abbott, F. L. Bowling, S. Rajbhandari, A. J. Boulton, and N. D. Reeves, “FootSnap: A new mobile application for standardizing diabetic foot images,” Journal of Diabetes Science and Technology, p. 1932296817713761, 2017.
  • [46] J. Alarifi, M. Goyal, A. Davison, D. Dancey, R. Khan, and M. H. Yap, “Facial skin classification using convolutional neural networks,” in Image Analysis and Recognition: 14th International Conference, ICIAR 2017, Montreal, QC, Canada, July 5–7, 2017, Proceedings, vol. 10317.   Springer, 2017, p. 479.