Recognition of Ischaemia and Infection in Diabetic Foot Ulcers: Dataset and Techniques

08/14/2019 ∙ by Manu Goyal, et al. ∙ Manchester Metropolitan University

Detection of Diabetic Foot Ulcers (DFU) using computerized methods is an emerging research area with the evolution of machine learning algorithms. However, existing research has focused on detecting and segmenting the ulcers. According to DFU medical classification systems such as the University of Texas Classification and the SINBAD Classification, the presence of infection (bacteria in the wound) and ischaemia (inadequate blood supply) has important clinical implications for DFU assessment and is used to predict the risk of amputation. In this work, we propose a new dataset and novel techniques to identify the presence of infection and ischaemia. We introduce a comprehensive DFU dataset with ground truth labels of ischaemia and infection cases. For the hand-crafted machine learning approach, we propose a new feature descriptor, namely the Superpixel Color Descriptor. We then propose a technique using an Ensemble Convolutional Neural Network (CNN) model for ischaemia and infection recognition. The novelty lies in our proposed natural data-augmentation method, which clearly identifies the region of interest on foot images and focuses on finding the salient features in this area. Finally, we evaluate the performance of our proposed techniques on binary classification, i.e. ischaemia versus non-ischaemia and infection versus non-infection. Overall, our proposed method performs better in the classification of ischaemia than of infection. We found that our proposed Ensemble CNN deep learning algorithms performed better for both classification tasks than hand-crafted machine learning algorithms, with 90% accuracy in ischaemia classification and 73% in infection classification.

I Introduction

Major progress in the field of machine learning allows us to make extensive use of medical imaging data to provide better diagnosis, treatment and prediction of diseases [1, 2]. Diabetic Foot Ulcers (DFUs) are a major complication of diabetes and can lead to amputation of the foot or limb. In previous studies, various researchers, predominantly in work from our laboratory, have achieved high accuracy in the recognition of DFUs with machine learning algorithms [3, 4, 5, 6, 7]. There are a number of DFU classification systems, such as the Wagner, University of Texas, and SINBAD classification systems, which include information on the site of the DFU, its area and depth, and the presence of neuropathy, ischaemia, and infection [8, 9, 10].

The Wagner classification system is the most widely accepted classification system. It is based on the depth of penetration, the presence of osteomyelitis or gangrene, and the extent of tissue necrosis, according to the following grades [8, 11]:

  • Grade 0: No open lesions; may have deformity or cellulitis

  • Grade 1: Superficial diabetic ulcer (partial or full thickness)

  • Grade 2: Ulcer extension to ligament, tendon, joint capsule, or deep fascia without abscess or osteomyelitis

  • Grade 3: Deep ulcer with abscess, osteomyelitis, or joint sepsis

  • Grade 4: Gangrene localised to a portion of the forefoot or heel

  • Grade 5: Extensive gangrenous involvement of the entire foot

The University of Texas classification system is frequently used by podiatrists and medical professionals to classify DFUs into different categories according to stage and grade [9], as described in Table I. SINBAD classification, on the other hand, is based on a cumulative score. SINBAD stands for S (Site), I (Ischaemia), N (Neuropathy), B (Bacterial infection), A (Area), D (Depth). For each DFU, the SINBAD score is calculated from the six conditions listed in Table II.

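Stage | Grade 0                 | Grade 1                 | Grade 2                 | Grade 3
A     | No open lesion          | Superficial wound       | Tendon                  | Bone/joint
B     | Infection               | Infection               | Infection               | Infection
C     | Ischaemia               | Ischaemia               | Ischaemia               | Ischaemia
D     | Infection and ischaemia | Infection and ischaemia | Infection and ischaemia | Infection and ischaemia
TABLE I: The University of Texas DFU classification system. Stage A gives the wound depth for each grade (no infection or ischaemia); stages B, C and D add infection, ischaemia, or both, respectively, to the same depth.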
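Because the stage letter in Table I depends only on whether infection and ischaemia are present, a classifier for those two conditions already fixes one axis of the University of Texas grid. The minimal Python sketch below makes this explicit. It assumes the depth category and both conditions are already known (for example from clinical assessment); the names `GRADES` and `ut_category`, and the depth keys, are illustrative and not part of any published tool.

```python
# Hypothetical depth keys mapped to the University of Texas grade (columns of Table I)
GRADES = {"no_open_lesion": 0, "superficial": 1, "tendon": 2, "bone_or_joint": 3}


def ut_category(depth: str, infected: bool, ischaemic: bool) -> str:
    """Return a University of Texas category such as "B2" (stage letter + grade)."""
    grade = GRADES[depth]
    # The stage depends only on the two conditions studied in this paper
    if infected and ischaemic:
        stage = "D"
    elif ischaemic:
        stage = "C"
    elif infected:
        stage = "B"
    else:
        stage = "A"
    return f"{stage}{grade}"


# Four stages times four grades gives the 16 categories of the system
ALL_CATEGORIES = [f"{s}{g}" for s in "ABCD" for g in range(4)]

print(ut_category("tendon", infected=True, ischaemic=False))  # B2
```

The grade axis, in contrast, needs depth information that 2D photographs do not provide, which is why only the stage-defining conditions are targeted here.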
Category            | Definition                                            | SINBAD score
Site                | Forefoot                                              | 0
                    | Midfoot and hindfoot                                  | 1
Ischaemia           | Pedal blood flow intact: at least one pulse palpable  | 0
                    | Clinical evidence of reduced pedal blood flow         | 1
Neuropathy          | Protective sensation intact                           | 0
                    | Protective sensation lost                             | 1
Bacterial infection | None                                                  | 0
                    | Present                                               | 1
Area                | Ulcer < 1 cm²                                         | 0
                    | Ulcer ≥ 1 cm²                                         | 1
Depth               | Ulcer confined to skin and subcutaneous tissue        | 0
                    | Ulcer reaching muscle, tendon or deeper               | 1
Total possible score|                                                       | 6
TABLE II: The SINBAD score for each condition
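Since every SINBAD item in Table II is binary, the total score is simply the number of items that score 1 point, from 0 to 6. A minimal Python sketch follows; the class and field names are chosen for illustration and do not come from any standard schema.

```python
from dataclasses import dataclass


@dataclass
class SinbadAssessment:
    """One flag per SINBAD item; True means the item scores 1 point (Table II)."""
    midfoot_or_hindfoot: bool        # Site: forefoot = 0, midfoot/hindfoot = 1
    reduced_pedal_flow: bool         # Ischaemia
    protective_sensation_lost: bool  # Neuropathy
    infection_present: bool          # Bacterial infection
    area_at_least_1cm2: bool         # Area: < 1 cm^2 = 0, >= 1 cm^2 = 1
    reaches_muscle_or_deeper: bool   # Depth

    def score(self) -> int:
        """Total SINBAD score in the range 0..6."""
        items = (
            self.midfoot_or_hindfoot,
            self.reduced_pedal_flow,
            self.protective_sensation_lost,
            self.infection_present,
            self.area_at_least_1cm2,
            self.reaches_muscle_or_deeper,
        )
        return sum(int(flag) for flag in items)


# Forefoot ulcer, ischaemic, neuropathic, infected, small and superficial
example = SinbadAssessment(False, True, True, True, False, False)
print(example.score())  # 3
```

Of the six items, only ischaemia and bacterial infection are predicted from images in this work; the others would have to come from clinical examination or additional sensing.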

The major challenges in assessing DFUs from foot images with machine learning methods are as follows: 1) data collection and expert labelling of DFU images impose a major time burden; 2) there is high inter-class similarity and intra-class variation across the different DFU classifications; 3) the DFU dataset is not standardized with respect to the distance of the camera from the foot, the orientation of the image, and lighting conditions; 4) meta-data such as patient ethnicity, age, sex and foot size are lacking. Applying machine learning techniques to determine these conditions or factors on the current dataset could be very difficult, as the DFU images were captured in hospitals without any standardization of, for example, the image acquisition distance or the orientation of the foot. The dataset we received, with ethical approval from the NHS, did not contain any records or meta-data about these conditions or any medical classification. Determining the presence of infection and/or ischaemia purely by visual inspection of a DFU is very difficult even for an experienced podiatrist (foot specialist) or medical professional, as physical and medical tests are needed to diagnose these conditions objectively. To establish the presence or absence of these conditions for specific DFUs, expert annotations from medical professionals specialized in DFUs are required. Each condition is briefly described below, together with a machine learning perspective.

  1. Site: The site of a DFU is its location on the foot. On the sole of the foot, DFUs occur predominantly on the forefoot, but also at the midfoot or hindfoot. The site is important for comparison against previous ulcerations (whether the DFU is a recurrence of a previous DFU or a newly developed one). It can easily be identified by a person even without prior medical knowledge.

  2. Ischaemia: Inadequate blood supply that could affect DFU healing. Ischaemia is diagnosed by palpating foot pulses and measuring blood pressure in the foot and toes. Visually, ischaemia might be indicated by poor reperfusion of the foot or by black gangrenous toes (death of tissue in part of the foot). From a computer vision perspective, these might be important cues for the presence of ischaemia in the DFU.

  3. Neuropathy: Neuropathy is the loss of sensation in the lower extremities, i.e. the foot region, due to damage to the peripheral nerves. Detecting neuropathy with machine learning techniques is infeasible, as there is unlikely to be any visual cue to neuropathy in the foot. Clinicians can determine the presence of peripheral neuropathy using physical tests such as the Ipswich Touch Test [12], the neuropathy disability score and the vibration perception threshold. Patients with DFU usually have underlying neuropathy.

  4. Bacterial Infection: Infection is defined as bacterial soft tissue or bone infection in the DFU, based on the presence of at least two classic findings of inflammation or purulence. It is very hard to determine the presence of diabetic foot infection from DFU images, although increased redness in and around the ulcer and coloured purulent discharge could provide indications. In clinical practice, blood testing is performed as the gold standard diagnostic test. Moreover, in the present dataset, the images were captured after debridement of necrotic and devitalized tissue, which removes an important indication of infection in the DFU.

  5. Area: The area of a DFU measures the extent of its two-dimensional (2D) shape on the foot. For DFU grading systems, it is important to know whether the area of the DFU is at least 1 cm² or not. As mentioned earlier, the images in the current dataset were captured in hospital with inconsistent distance, orientation and lighting, so DFUs appear at different magnifications and angles. In addition, there is no meta-data about the patient's ethnicity, age or sex, which makes measuring the area very difficult.

  6. Depth: The depth of a DFU is the distance from the surface of the skin to the bottom of the wound, resulting from tissue damage and loss. For DFU grading systems, depth is classified into two categories: whether the DFU is superficial, i.e. confined to the skin and subcutaneous tissue, or whether it penetrates deeper structures such as muscle, tendon and bone. Depth cannot be measured from 2D images of the wound, so the current dataset provides no information from which to determine this parameter. However, recent mobile phone cameras with built-in depth sensors could make it much easier to obtain depth information for DFUs.

The University of Texas system for classifying and grading DFUs consists of 16 categories, and the Wagner classification system consists of 6 categories. We did not use these systems in this work for two main reasons: (1) our dataset is limited, so dividing the images into many categories would lead to a very unbalanced dataset for this experiment; (2) many categories in these classification systems depend on the DFU area and on the extension of the DFU to ligament, tendon, joint capsule, or deep fascia (depth), which cannot be determined from our 2D DFU dataset. Hence, in this work, we investigate the identification of ischaemia and infection, which form a central part of all three of these DFU classification systems.

In related work, Netten et al. [13] found that clinicians achieved low validity and reliability in the remote assessment of DFUs from foot images. It is therefore clear that analysing these conditions from images is extremely difficult even for expert podiatrists. However, in various image recognition, medical imaging and natural language processing tasks, machine learning algorithms have performed better than skilled humans [14, 15, 16]. This experiment analyses the performance of machine learning algorithms in recognising ischaemia and infection on a non-standardized DFU dataset.

Recognition of infection and ischaemia is very important for determining factors that predict the healing progress of a DFU and the risk of amputation. Ischaemia develops due to a lack of blood circulation to the foot, which results in spontaneous necrosis of the most poorly perfused tissues (gangrene) and may ultimately require amputation of part of the foot or leg. Detailed knowledge of the vascular anatomy of the leg, and particularly of ischaemia, helps medical experts make better decisions when estimating the likelihood of DFU healing given the existing blood supply [17]. Previous studies estimate that patients with critical ischaemia have a three-year limb loss rate of about 40% [18]. Patients with an active DFU, and particularly those with ischaemia or gangrene, should be checked for the presence of infection. Approximately 56% of DFUs become infected, and 20% of DFU infections lead to amputation of the foot or limb [19, 20, 21]. One recent study of 785 million patients with diabetes in the US between 2007 and 2013 suggested that DFUs and associated infections constitute a powerful risk factor for emergency department visits and hospital admission [22]. Because infection and ischaemia in DFUs carry a high risk of hospital admission and amputation [23], recognising them with cost-effective machine learning methods is a very important step towards a complete computerized DFU assessment system for remote monitoring in the future.

II DFU Dataset and Expert Labelling

For binary classification of ischaemia and infection in DFUs, we introduce a dataset of 1459 images of patients' feet with DFUs, collected over the previous five years at Lancashire Teaching Hospitals with ethical approval from all relevant bodies and patients' written informed consent. Approval to use these images for this research was obtained from the NHS Research Ethics Committee. The images were captured with three different cameras: Kodak DX4530, Nikon D3300 and Nikon COOLPIX P100.

Fig. 1: The number of DFUs according to the ratio of the DFU area (bounding box) to the total image area in the DFU dataset

For this task, it is essential that each DFU in the dataset is labelled by experts according to the conditions used in popular medical classification systems. The ground truth was produced by two healthcare professionals (consultant physicians specialising in the diabetic foot) based on visual inspection of the DFU images. Where they disagreed, the final decision was made by the more senior physician. These ground truths are used for the binary classification of infection and ischaemia in DFUs. The number of cases for each condition is detailed in Table III. The dataset, together with its ground truth labels, will be made available upon acceptance of this article.

Category            | Definition | Cases | DFU patches | Augmented patches
Ischaemia           | Absent     | 1249  | 1431        | 4935
                    | Present    | 210   | 235         | 4935
                    | Total      | 1459  | 1666        | 9870
Bacterial infection | None       | 628   | 684         | 2946
                    | Present    | 831   | 982         | 2946
                    | Total      | 1459  | 1666        | 5892
TABLE III: The number of infection and ischaemia cases, DFU patches, and augmented patches (produced with natural data-augmentation) in the DFU dataset

As shown in Table III, the numbers of ischaemia and non-ischaemia cases are highly unbalanced, whereas the infection and non-infection cases are fairly balanced.

Fig. 2: Example of extracting red and black regions from a DFU patch with the proposed Superpixel Color Descriptor algorithm; these regions were then used to inform the identification of ischaemia and infection.

III Methodology

This section describes our proposed techniques for recognising ischaemia and infection as part of a DFU diagnosis system, including the preparation of a balanced dataset, the hand-crafted features, and the machine learning methods (hand-crafted machine learning and deep learning approaches) used for binary classification of ischaemia and infection.

III-A Hand-crafted Superpixel Color Descriptors

We investigated the use of human-designed features with traditional machine learning for the binary classification of infection and ischaemia. We propose novel Superpixel Color Descriptors (SPCD) to extract regions of particular colors of interest from DFU images, which could be important visual cues for identifying ischaemia and infection in DFUs. In the first step, we used the SLIC superpixel technique to over-segment DFU patches into superpixels based on pixel color and intensity values [24]. SLIC performs a localized k-means optimization in the 5-D space formed by the CIELAB color values and the pixel coordinates [l, a, b, x, y] to cluster pixels, as described by Equations (1)-(4):

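$$ g = \sqrt{N / k} \tag{1} $$

$$ d_{lab} = \sqrt{(l_j - l_i)^2 + (a_j - a_i)^2 + (b_j - b_i)^2} \tag{2} $$

$$ d_s = \sqrt{(x_j - x_i)^2 + (y_j - y_i)^2} \tag{3} $$

$$ D = d_{lab} + \frac{m}{g} \, d_s \tag{4} $$

where N is the number of pixels, k is the number of superpixels, g is the grid interval between the initial cluster centers, i indexes a pixel and j a cluster center, (l, a, b) are CIELAB color values, x and y represent the pixel position, d_lab is the color proximity, d_s is the spatial proximity, D is the combined distance that decides the closest center for every pixel, and m weighs the relative importance of color similarity and spatial proximity.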
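The SLIC distance defined above combines a CIELAB color distance with a spatial distance that is scaled by m and by the grid interval √(N/k). The NumPy sketch below computes it for one pixel and assigns the pixel to its nearest center. It assumes the additive form used in the original SLIC technical report; later SLIC versions combine the two terms in quadrature instead. The function names are illustrative. In practice, `skimage.segmentation.slic` (with its `n_segments` and `compactness` parameters) runs the full clustering, although we do not claim it is the implementation used here.

```python
import numpy as np


def slic_distance(pixel, center, m, n_pixels, k):
    """Combined SLIC distance D between a pixel and a cluster center.

    pixel, center: 5-vectors [l, a, b, x, y] (CIELAB color + image position)
    m: compactness weight; n_pixels: N; k: number of superpixels
    """
    pixel = np.asarray(pixel, dtype=float)
    center = np.asarray(center, dtype=float)
    g = np.sqrt(n_pixels / k)                       # grid interval between centers
    d_lab = np.linalg.norm(center[:3] - pixel[:3])  # color proximity
    d_s = np.linalg.norm(center[3:] - pixel[3:])    # spatial proximity
    return d_lab + (m / g) * d_s                    # additive combination


def nearest_center(pixel, centers, m, n_pixels, k):
    """Index of the center with the smallest D.

    SLIC only searches centers within a 2g x 2g window; here every center is checked.
    """
    dists = [slic_distance(pixel, c, m, n_pixels, k) for c in centers]
    return int(np.argmin(dists))
```

Raising m makes spatial distance dominate and yields compact, grid-like superpixels; lowering it lets superpixels follow color boundaries, such as the edge between wound bed and skin, more closely.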

In the second step, the mean RGB color value of each superpixel S is computed and assigned to every pixel in S:

$$ \bar{P}_S(R, G, B) = \frac{1}{n} \sum_{i=1}^{n} P_i(R, G, B) \tag{5} $$

where P_i(R, G, B) are the R, G and B values of the ith pixel in S and n is the number of pixels in S.
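The averaging step, in which each superpixel is replaced by its mean RGB color, takes only a few lines of NumPy once a superpixel label map is available from SLIC or any other over-segmentation. A minimal sketch; the function name is illustrative.

```python
import numpy as np


def superpixel_mean_colors(image, labels):
    """Mean RGB of each superpixel, and the image painted with those means.

    image:  H x W x 3 array (RGB)
    labels: H x W integer superpixel label map, e.g. produced by SLIC
    Returns (means, painted); means has one RGB row per superpixel, in sorted label order.
    """
    ids, inverse = np.unique(labels.ravel(), return_inverse=True)
    flat = image.reshape(-1, 3).astype(float)
    counts = np.bincount(inverse, minlength=len(ids)).astype(float)  # pixels per superpixel
    sums = np.stack(
        [np.bincount(inverse, weights=flat[:, c], minlength=len(ids)) for c in range(3)],
        axis=1,
    )
    means = sums / counts[:, None]                 # average color of each superpixel
    painted = means[inverse].reshape(image.shape)  # assign the mean to every member pixel
    return means, painted
```

Using `np.bincount` with per-channel weights avoids a Python loop over superpixels, which matters when patches are processed at several threshold settings.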

Finally, by varying the number of superpixels and the threshold values on each color channel, we extracted the regions of two colors of interest, red and black, from the DFU patches. For these classification tasks, we used k = 200 superpixels and the threshold values T1: 0.40, 0.45, 0.50, 0.55, 0.60 and T2: 0.15, 0.20, 0.25, 0.30, 0.35 to extract the color features from the DFU patches. The pseudocode for the SPCD algorithm is given in Algorithm 1.

Over-segment the DFU patch with SLIC superpixels;
Calculate the mean RGB value of each superpixel and apply it to the superpixel's pixels;
Initialize S_Red and S_Black to 0;
procedure RedAndBlackRegion
    for each superpixel S_i in the patch do
        if the mean color of S_i passes the red threshold test then S_Red ← S_Red + 1
        if the mean color of S_i passes the black threshold test then S_Black ← S_Black + 1
    end for
    return S_Red, S_Black
end procedure
Algorithm 1 Pseudocode for Superpixel Color Descriptors Extraction
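The exact red and black threshold tests of Algorithm 1 are not given in this text, so the NumPy sketch below fills them with simple, clearly hypothetical rules: red means a high R channel with low G and B, and black means all channels low. It also assumes that the T1 grid is the red threshold and the T2 grid the dark threshold, and that the feature vector concatenates the red and black counts over every (T1, T2) pair. The input is the per-superpixel mean colors, one RGB row per superpixel, scaled to [0, 1].

```python
import numpy as np

# Threshold grids from the text; which test uses T1 and which uses T2 is an ASSUMPTION
T1_VALUES = (0.40, 0.45, 0.50, 0.55, 0.60)
T2_VALUES = (0.15, 0.20, 0.25, 0.30, 0.35)


def count_red_black(means, t1, t2):
    """Count superpixels whose mean color passes the (hypothetical) red / black tests."""
    r, g, b = means[:, 0], means[:, 1], means[:, 2]
    is_red = (r > t1) & (g < t2) & (b < t2)    # hypothetical red rule
    is_black = (r < t2) & (g < t2) & (b < t2)  # hypothetical black rule
    return int(is_red.sum()), int(is_black.sum())


def spcd_features(means):
    """Red/black counts for every (T1, T2) pair: 25 pairs x 2 counts = 50 values."""
    features = []
    for t1 in T1_VALUES:
        for t2 in T2_VALUES:
            features.extend(count_red_black(means, t1, t2))
    return np.array(features)
```

Sweeping the thresholds rather than fixing one pair gives the downstream classifier (BayesNet, Random Forest or Multilayer Perceptron) some robustness to the uneven lighting in the dataset; dividing the counts by the number of superpixels would make them comparable across different values of k.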

An example of extracting color features with our SPCD algorithm is shown in Fig. 2.

For these classification problems, we tried a number of classifiers with standard hyper-parameters on these color features; BayesNet, Random Forest, and Multilayer Perceptron were selected because they achieved the highest accuracy among the classifiers tested.

III-B A Novel Deep Learning Technique based on Natural Data-Augmentation

This section describes our proposed deep learning technique. We detail the data-augmentation method and an ensemble CNN approach for the recognition of ischaemia and infection.

III-B1 Natural Data-Augmentation for DFU images

In the DFU dataset, image sizes vary between 1600×1200 and 3648×2736 pixels, depending on the professional camera used to capture the data. In deep learning, data augmentation is regarded as an important tool for improving the performance of algorithms. As shown in Fig. 1, about 92% of DFUs occupy between 0% and 20% of the area of the foot images. Common data-augmentation techniques include flipping, rotation, random scaling, random cropping, translation, and Gaussian noise. Since DFUs occupy a very small percentage of the total area of the foot images, techniques such as random scaling, cropping and translation risk missing the region of interest (ROI). Hence, natural data-augmentation is more suitable for DFU evaluation than common data-augmentation. This augmentation technique helps machine learning algorithms to pinpoint the ROI of the DFU on foot images and to focus on finding the strong features that exist in this area. We used a deep learning based localization method, Faster R-CNN with InceptionResNetV2, to obtain the ROI of the DFU on foot images [25, 26]. Natural data-augmentation at different magnifications, depending on the size of the DFU and of the image, is demonstrated in Fig. 3. Both the number of magnification factors (three in this classification) and the magnification distance are flexible parameters, so several patches can be produced from a single DFU image. After magnification, further data-augmentation is achieved with rotation by different angles, mirroring, Gaussian noise, contrast, sharpening, translation, and shearing, as shown in Fig. 4.
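The magnification step can be sketched as a set of nested crop boxes around the detected DFU bounding box. In this minimal Python sketch, `step` (the fractional margin added per level) stands in for the magnification distance; its default value and the linear growth rule are assumptions, not the authors' settings. Boxes are clamped to the image borders, so the ROI always stays inside every crop.

```python
def magnified_crops(bbox, image_size, n_mags=3, step=0.5):
    """Crop boxes around a detected DFU with increasing amounts of context.

    bbox: (x1, y1, x2, y2), e.g. from a Faster R-CNN detector
    image_size: (width, height) of the foot image
    n_mags: number of magnification levels (three in this work)
    step: HYPOTHETICAL margin added per level, as a fraction of the box size
    Returns clamped (x1, y1, x2, y2) boxes, tightest first.
    """
    x1, y1, x2, y2 = bbox
    w, h = x2 - x1, y2 - y1
    width, height = image_size
    crops = []
    for level in range(n_mags):
        mx, my = level * step * w, level * step * h  # extra margin on each side
        crops.append((
            max(0, int(x1 - mx)), max(0, int(y1 - my)),
            min(width, int(x2 + mx)), min(height, int(y2 + my)),
        ))
    return crops


print(magnified_crops((100, 100, 200, 200), (1000, 1000)))
# [(100, 100, 200, 200), (50, 50, 250, 250), (0, 0, 300, 300)]
```

Each box can then be cut out with NumPy slicing, `image[y1:y2, x1:x2]`, and resized to the network input size. Unlike random cropping, every patch produced this way is guaranteed to contain the whole ulcer.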

Fig. 3 panels: (a) Image, (b) 1st MAG, (c) 2nd MAG, (d) 3rd MAG
Fig. 3: Natural data-augmentation produced from the original image at different magnifications (three magnifications in this experiment). MAG refers to magnification.
Fig. 4 panels: (a) Image, (b) Mirror, (c) 45° rotation, (d) 90° rotation, (e) Gaussian noise, (f) Salt and pepper, (g) Translate, (h) Shear
Fig. 4: After magnification, different types of data-augmentation are applied by the proposed natural data-augmentation.
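Several of the post-magnification transforms are simple array operations. The NumPy sketch below implements a subset of them: mirroring, 90-degree rotation, Gaussian noise, and salt-and-pepper noise. Noise levels are illustrative assumptions, not the authors' settings. The 45-degree rotation, translation and shear need interpolation; `scipy.ndimage.rotate` and `scipy.ndimage.affine_transform` are one option, not shown here.

```python
import numpy as np


def augment_patch(patch, rng, noise_sigma=0.05, sp_fraction=0.02):
    """Augmented copies of one RGB patch with float values in [0, 1]."""
    out = {
        "mirror": patch[:, ::-1].copy(),                    # horizontal flip
        "rot90": np.rot90(patch, k=1, axes=(0, 1)).copy(),  # 90-degree rotation
        "gaussian": np.clip(patch + rng.normal(0.0, noise_sigma, patch.shape), 0.0, 1.0),
    }
    salt_pepper = patch.copy()
    u = rng.random(patch.shape[:2])             # one uniform draw per pixel
    salt_pepper[u < sp_fraction / 2] = 0.0      # pepper: pixel set to black
    salt_pepper[u > 1 - sp_fraction / 2] = 1.0  # salt: pixel set to white
    out["salt_pepper"] = salt_pepper
    return out


rng = np.random.default_rng(0)
variants = augment_patch(rng.random((64, 64, 3)), rng)
print(sorted(variants))  # ['gaussian', 'mirror', 'rot90', 'salt_pepper']
```

Because these transforms are applied after the DFU-centred crop, none of them can push the ulcer out of the patch.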

III-B2 Ensemble Convolutional Neural Networks

For comparison with the traditional features, we use deep learning algorithms to perform binary classification of (1) infection versus non-infection and (2) ischaemia versus non-ischaemia in DFU patches. For this work, we fine-tuned (transfer learning from pre-trained models) state-of-the-art CNN models, i.e. Inception-V3, ResNet50, and InceptionResNetV2 [27, 28, 29].

We also utilized an Ensemble CNN model, which combines the bottleneck features from multiple CNN models (Inception-V3, ResNet50, and InceptionResNetV2) and uses an SVM as the classifier to produce predictions, as shown in Fig. 5.

Fig. 5: Bottleneck features are extracted from the CNNs and fed into an SVM classifier to perform binary classification of ischaemia and infection, where C1-C5 are convolutional layers, P1-P5 are pooling layers and FC is a fully connected layer. Note: the CNNs in this figure represent a generic CNN architecture, not the actual architectures of Inception-V3, ResNet50, and InceptionResNetV2.
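The ensemble step concatenates bottleneck features from the three CNNs and classifies the combined vector with an SVM; the SVM settings are not given in this section. The NumPy sketch below assumes the bottleneck features have already been extracted (one array per CNN). For the authors' SVM it substitutes a simple linear SVM trained with the Pegasos sub-gradient method, so it is a stand-in rather than their implementation. In practice a library classifier such as scikit-learn's `sklearn.svm.SVC` would normally be used.

```python
import numpy as np


def concat_bottlenecks(feature_sets):
    """Concatenate per-CNN bottleneck features, each of shape (n_samples, d_model)."""
    return np.concatenate(feature_sets, axis=1)


def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Pegasos sub-gradient training of a linear SVM (hinge loss + L2 penalty).

    y must contain labels in {-1, +1}. A constant column is appended so the last
    weight acts as a bias (it is regularized too, for simplicity).
    """
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            t += 1
            eta = 1.0 / (lam * t)              # Pegasos step size
            violates = y[i] * (Xb[i] @ w) < 1  # margin check uses the old weights
            w *= 1.0 - eta * lam               # shrink from the L2 penalty
            if violates:
                w += eta * y[i] * Xb[i]        # hinge-loss sub-gradient step
    return w


def predict(w, X):
    """Predict labels in {-1, +1} from the sign of the decision function."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.where(Xb @ w >= 0, 1, -1)
```

Concatenating features lets the SVM weigh evidence from all three backbones jointly, whereas averaging the three networks' softmax outputs would fix their relative weights in advance. Standardizing each feature column before training is usually advisable when the backbones produce features on different scales.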

IV Results and Discussion

Both the infection and ischaemia datasets were split into 70% training, 10% validation and 20% testing sets, and we adopted 5-fold cross-validation. Hence, for the ischaemia dataset we used approximately 11,564, 1,652 and 3,304 patches in the training, validation and testing sets respectively, whereas for the infection dataset we used 7,136 patches (training), 1,019 patches (validation), and 2,038 patches (testing) from the 2611 original foot images. As mentioned previously, we used both hand-crafted traditional machine learning (henceforth TML) models and CNN models to perform the classification task, with 256×256 RGB images as input for TML and for InceptionV3, AlexNet, and ResNet50. For InceptionResNetV2, we resized the dataset to 299×299.
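The patch counts quoted above add up to 16,520 (ischaemia) and 10,193 (infection). One rounding rule that reproduces all six counts exactly is to round the validation and testing shares down and assign the remainder to training. The sketch below encodes that rule as an assumption; the rounding procedure is not stated in the paper.

```python
def split_sizes(n_patches, val_pct=10, test_pct=20):
    """Training/validation/testing sizes for a 70/10/20 split.

    Integer arithmetic rounds the validation and testing shares down;
    the remainder goes to training (an assumed rounding rule).
    """
    n_val = n_patches * val_pct // 100
    n_test = n_patches * test_pct // 100
    return n_patches - n_val - n_test, n_val, n_test


print(split_sizes(16520))  # (11564, 1652, 3304)
print(split_sizes(10193))  # (7136, 1019, 2038)
```

Integer arithmetic is used instead of multiplying by 0.1 and 0.2 so that floating-point error cannot shift a count by one.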

In Tables IV and V, we report Accuracy, Sensitivity, Precision, Specificity, F-Measure, Matthews Correlation Coefficient (MCC) and Area under the ROC curve (AUC) as our evaluation metrics.
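All of the threshold-based metrics in Tables IV and V follow from a 2×2 confusion matrix, with the condition (ischaemia or infection) as the positive class. The plain-Python sketch below uses the standard definitions; the function name is illustrative. AUC is omitted because it needs the classifier's scores, not a single confusion matrix.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary metrics from confusion-matrix counts (positive = condition present)."""
    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    specificity = tn / (tn + fp)  # true negative rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # Matthews correlation
    return {
        "accuracy": accuracy, "sensitivity": sensitivity, "precision": precision,
        "specificity": specificity, "f_measure": f_measure, "mcc": mcc,
    }
```

For example, tp=40, fp=10, tn=45, fn=5 gives an accuracy of 0.85 and an F-measure of 80/95 ≈ 0.842. MCC is reported alongside accuracy because it stays informative when the classes are unbalanced, as in the ischaemia data.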

Method                  | Accuracy    | Sensitivity | Precision   | Specificity | F-Measure   | MCC Score   | AUC Score
BayesNet                | 0.785±0.022 | 0.774±0.034 | 0.809±0.034 | 0.800±0.027 | 0.790±0.020 | 0.572±0.044 | 0.783
Random Forest           | 0.780±0.041 | 0.739±0.049 | 0.872±0.029 | 0.842±0.034 | 0.799±0.033 | 0.571±0.078 | 0.780
Multilayer Perceptron   | 0.804±0.022 | 0.817±0.040 | 0.787±0.046 | 0.795±0.031 | 0.800±0.023 | 0.610±0.045 | 0.804
InceptionV3 (CNN)       | 0.841±0.017 | 0.784±0.045 | 0.886±0.018 | 0.898±0.022 | 0.831±0.021 | 0.688±0.031 | 0.840
ResNet50 (CNN)          | 0.862±0.018 | 0.797±0.043 | 0.917±0.015 | 0.927±0.017 | 0.852±0.022 | 0.732±0.032 | 0.865
InceptionResNetV2 (CNN) | 0.853±0.021 | 0.789±0.054 | 0.906±0.017 | 0.917±0.019 | 0.842±0.027 | 0.714±0.039 | 0.851
Ensemble (CNN)          | 0.903±0.012 | 0.886±0.035 | 0.918±0.019 | 0.921±0.021 | 0.902±0.014 | 0.807±0.022 | 0.904
TABLE IV: The performance measures of binary classification of ischaemia by our proposed hand-crafted traditional machine learning and CNN approaches.
Method                  | Accuracy    | Sensitivity | Precision   | Specificity | F-Measure   | MCC Score   | AUC Score
BayesNet                | 0.639±0.036 | 0.619±0.018 | 0.653±0.039 | 0.660±0.015 | 0.622±0.079 | 0.290±0.070 | 0.643
Random Forest           | 0.605±0.025 | 0.608±0.025 | 0.607±0.037 | 0.601±0.069 | 0.606±0.012 | 0.211±0.051 | 0.601
Multilayer Perceptron   | 0.621±0.026 | 0.680±0.023 | 0.622±0.057 | 0.570±0.023 | 0.627±0.074 | 0.281±0.055 | 0.619
InceptionV3 (CNN)       | 0.662±0.014 | 0.693±0.038 | 0.653±0.015 | 0.631±0.034 | 0.672±0.019 | 0.325±0.029 | 0.662
ResNet50 (CNN)          | 0.673±0.013 | 0.692±0.051 | 0.668±0.023 | 0.654±0.051 | 0.679±0.019 | 0.348±0.028 | 0.673
InceptionResNetV2 (CNN) | 0.676±0.015 | 0.688±0.052 | 0.672±0.015 | 0.664±0.039 | 0.680±0.024 | 0.352±0.031 | 0.678
Ensemble (CNN)          | 0.727±0.025 | 0.709±0.044 | 0.735±0.036 | 0.744±0.050 | 0.722±0.028 | 0.454±0.052 | 0.731
TABLE V: The performance measures of binary classification of infection by our proposed hand-crafted traditional machine learning and CNN approaches.

Comparing the computerized methods, including our proposed techniques, all models performed better in the binary classification of ischaemia than of infection, despite the more unbalanced data in the ischaemia dataset (due to the larger number of non-ischaemia cases). The average accuracy of all models on the ischaemia dataset was 83.3%, notably better than the average accuracy of 65.8% on the infection dataset. MCC Score and AUC Score are also considered viable performance measures for comparing classification results. We obtained an average MCC Score and AUC Score for ischaemia classification of 67.1% and 83.2% respectively, compared with 32.3% and 65.8% respectively for infection classification. The ROC curves of all algorithms, including TML and CNNs, for binary classification of ischaemia and infection are shown in Figs. 6 and 7. In ischaemia classification, the CNNs (86.5% average accuracy) outperformed the TML models (79%). Similarly, in infection classification, the CNNs (68.4%) outperformed TML (62.1%) by a margin of 6.3%. Notably, the Ensemble CNN method achieved the highest score in almost all performance measures in both ischaemia and infection classification.
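The averages quoted in this paragraph can be re-derived from Tables IV and V. The short Python check below recomputes them from the tabulated values; it adds no new results, only the arithmetic. The grouping of the first three rows as TML models and the last four as CNNs follows the tables.

```python
from statistics import mean

# Values copied from Tables IV (ischaemia) and V (infection); rows in order:
# BayesNet, Random Forest, MLP, InceptionV3, ResNet50, InceptionResNetV2, Ensemble
TABLES = {
    "ischaemia": {
        "accuracy": [0.785, 0.780, 0.804, 0.841, 0.862, 0.853, 0.903],
        "mcc": [0.572, 0.571, 0.610, 0.688, 0.732, 0.714, 0.807],
        "auc": [0.783, 0.780, 0.804, 0.840, 0.865, 0.851, 0.904],
    },
    "infection": {
        "accuracy": [0.639, 0.605, 0.621, 0.662, 0.673, 0.676, 0.727],
        "mcc": [0.290, 0.211, 0.281, 0.325, 0.348, 0.352, 0.454],
        "auc": [0.643, 0.601, 0.619, 0.662, 0.673, 0.678, 0.731],
    },
}
N_TML = 3  # the first three rows are the hand-crafted (TML) models


def summarize(task):
    """Averages in percent over all models, plus accuracy split into TML vs CNN."""
    cols = TABLES[task]
    acc = cols["accuracy"]
    return {
        "mean_accuracy": 100 * mean(acc),
        "mean_mcc": 100 * mean(cols["mcc"]),
        "mean_auc": 100 * mean(cols["auc"]),
        "tml_accuracy": 100 * mean(acc[:N_TML]),
        "cnn_accuracy": 100 * mean(acc[N_TML:]),
    }


for task in TABLES:
    print(task, {k: round(v, 2) for k, v in summarize(task).items()})
```

The recomputed averages agree with the quoted values to within 0.1 percentage points; the small differences come from rounding.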

Sensitivity and Specificity are considered important performance measures in medical imaging. The ensemble method yielded the highest Sensitivity on the ischaemia dataset, with a margin of 6.9% over the second-best performing algorithm, Multilayer Perceptron. Interestingly, Multilayer Perceptron performed worst in Specificity, with a score of 79.5%. For Specificity on the ischaemia dataset, ResNet50 achieved the highest score (92.7%), marginally ahead of the ensemble method (92.1%), as reported in Table IV.

Fig. 6: ROC curve for all TML and CNN methods for ischaemia classification.
Fig. 7: ROC curve for all TML and CNN methods for Infection classification.
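The AUC values summarized by these ROC curves equal the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, which is the normalized Mann-Whitney U statistic. The plain-Python sketch below computes AUC this way; the O(n²) pairwise loop is for clarity only, and a library routine such as scikit-learn's `sklearn.metrics.roc_auc_score` is the usual choice in practice.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    scores: classifier scores (higher = more likely positive)
    labels: 1 for the condition (e.g. ischaemia), 0 otherwise
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))  # 0.75
```

Because it depends only on the ranking of scores, AUC is unaffected by the decision threshold that determines the other metrics in Tables IV and V.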

In infection classification, both TML and CNN methods obtained moderate scores. Again, the CNN methods performed better than the TML methods, achieving the highest score in all performance measures. The Ensemble CNN method performed better than the other CNN classifiers, especially in Specificity, with a score of 74.4% in infection classification, a notable margin of 8% over the second-best performing algorithm, InceptionResNetV2 (66.4%). For Sensitivity, all the CNNs performed similarly, with the Ensemble method achieving the highest score of 70.9%. Among the TML methods, Multilayer Perceptron performed best in Sensitivity (68.0%), whereas BayesNet performed best in Specificity (66%).

Fig. 8: Examples of cases correctly classified by the Ensemble CNN on the ischaemia dataset. (a) and (b) represent non-ischaemia cases; (c) and (d) represent ischaemia cases.
Fig. 9: Examples of cases misclassified by the Ensemble CNN on the ischaemia dataset. (a) and (b) represent non-ischaemia cases; (c) and (d) represent ischaemia cases.
Fig. 10: Examples of cases correctly classified by the Ensemble CNN on the infection dataset. (a) and (b) represent non-infection cases; (c) and (d) represent infection cases.
Fig. 11: Examples of cases misclassified by the Ensemble CNN on the infection dataset. (a) and (b) represent non-infection cases; (c) and (d) represent infection cases.

IV-A Experimental Analysis and Discussion

Assessment of DFUs with computerized methods is very important for supporting global healthcare systems by improving triage and monitoring procedures and by reducing hospital time for patients and clinicians. This preliminary experiment focuses on automatically identifying two important conditions of DFUs, ischaemia and infection, from images of the feet using machine learning. We have illustrated a few examples of correctly and incorrectly classified cases for the binary classification of both ischaemia (Figs. 8 and 9) and infection (Figs. 10 and 11). In the misclassified cases, there are large intra-class dissimilarities and inter-class similarities between (1) infection and non-infection and (2) ischaemia and non-ischaemia cases, which make it difficult for the classifiers to predict the correct class. Other factors also influence the classification of these conditions, such as lighting conditions, marks, and skin tone due to the patient's ethnicity. Among the misclassified non-ischaemia cases shown in Fig. 9, cases (a) and (b) are affected by lighting conditions (shadows), whereas in the misclassified ischaemia cases (c) and (d), the ischaemia features may be too subtle for the algorithm to recognise from the images, or a more sensitive objective measure of the ground truth from vascular assessments may be needed. We found that shadows are particularly problematic for ischaemia classification, because machine learning algorithms can be deceived by shadows when determining important conditions such as ischaemia. Among the misclassified non-infection cases in Fig. 11, case (a) shows the presence of blood, whereas case (b) is one of the rare cases in the dataset with ischaemia but no infection. In the misclassified infection cases, the visual indicators of infection were likely too subtle, or a more sensitive objective ground truth, provided through blood analysis, was needed.

In this work, we used the proposed natural data-augmentation, with the help of DFU localisation, to create DFU patches from full-size foot images. These patches help the classifiers focus on the visual indicators of important DFU factors such as infection and ischaemia. We then investigated the use of both TML and CNNs to determine these conditions as binary classification tasks. We obtained very good performance in correctly classifying ischaemia despite the unbalanced dataset. In the case of infection, however, the classifiers did not perform as well, as infection is very hard to recognise from foot images even for experienced medical experts specialized in DFUs; ground truth determined through objective blood tests for bacterial infection is therefore likely required.

The current research focuses on recognising ischaemia and infection, as defined in medical classification systems, which required the guidance of medical experts specialized in DFUs. To develop a computer-aided tool for medical experts in remote foot analysis, i.e. a remote DFU diagnosis system, the following future challenges remain.

  1. Recognition of ischaemia and infection with machine learning algorithms is an important proof-of-concept study for foot pathology classification. Further analysis of each pathology on foot images is required according to medical classification systems such as the University of Texas Classification of DFU [9] and the SINBAD Classification System [10]. This requires close collaboration with medical experts specialized in DFUs. The current dataset is not sufficient for predicting other foot conditions such as area and depth.

  2. Deep learning algorithms need substantial datasets to achieve very good accuracy, especially in medical imaging. This experiment included only 1459 DFU images; training these algorithms on a larger and more balanced dataset in future could improve the recognition of both ischaemia and infection.

  3. The current ground truth is based only on visual inspection by experts and is not supported by medical notes or clinical tests (vascular assessment for ischaemia and blood tests to identify the presence of any bacterial infection). Also, the DFUs were debrided before these images were captured, which removes important visual indicators of infection such as coloured exudate. Therefore, the sensitivity and specificity of these algorithms could be further improved in future by using ground truth from clinical tests such as vascular assessments (ischaemia) and blood tests (bacterial infection).

  4. In current clinical practice, foot photographs are captured with different camera models, poses and illumination. Predicting the depth and size of a wound from such non-standardized images is a great challenge for computer algorithms. A standardized dataset, for example one collected with the method proposed in [30], would help to increase the accuracy of the DFU diagnosis system.

  5. Dataset annotation is laborious, particularly when medical experts must label foot pathologies into the 16 classes of the University of Texas classification system. To reduce the delineation and annotation burden on medical experts, there is an urgent need to develop unsupervised or self-supervised machine learning techniques.

  6. Collecting a time-line (longitudinal) dataset is crucial for early detection of key pathologies. Such a dataset will enable monitoring of foot health and its changes over time, so that medical experts and computer algorithms can learn the early signs of DFU. In the longer term, the DFU diagnosis system may be able to predict the healing process of ulcers and help prevent DFU before it develops.

  7. A smartphone app could be developed for the remote triage and monitoring of DFU. To scale up the DFU diagnosis system, the application should run on multiple devices, regardless of platform or operating system.
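The 16 classes mentioned in challenge 5 arise because the University of Texas system [9] crosses four wound-depth grades (0 to 3) with four stages (A to D). The stage is determined by exactly the two conditions studied in this work: A is clean, B is infected, C is ischaemic and D is both infected and ischaemic. The short Python sketch below shows how the two binary labels could be combined with a depth grade. The names `ut_class` and `ALL_CLASSES` are illustrative only, and predicting the grade would need the depth information that, as noted in challenge 1, the current dataset does not support.

```python
from itertools import product

# University of Texas classification: depth grade 0-3 crossed with stage A-D.
GRADES = (0, 1, 2, 3)
# The stage depends only on the two binary labels (infected, ischaemic).
STAGES = {
    (False, False): "A",  # clean: no infection, no ischaemia
    (True, False): "B",   # infection
    (False, True): "C",   # ischaemia
    (True, True): "D",    # infection and ischaemia
}

def ut_class(grade, infected, ischaemic):
    """Combine a depth grade with the two binary labels into a UT class such as '2C'."""
    if grade not in GRADES:
        raise ValueError(f"grade must be one of {GRADES}")
    return f"{grade}{STAGES[(bool(infected), bool(ischaemic))]}"

# All 4 x 4 = 16 classes that an annotator would have to distinguish.
ALL_CLASSES = [f"{g}{s}" for g, s in product(GRADES, "ABCD")]

print(len(ALL_CLASSES), ut_class(2, infected=False, ischaemic=True))
```

Because the stage is determined entirely by the two binary decisions, ischaemia and infection recognizers of the kind evaluated here would supply half of the information needed for a full University of Texas label.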

V Conclusion

In this work, we trained various classifiers based on traditional machine learning algorithms and CNNs to discriminate (1) ischaemia from non-ischaemia and (2) infection from non-infection in a given DFU. The classifiers achieved high performance in the binary classification of ischaemia but only moderate performance in the classification of infection. It is vital to understand the features of both conditions (ischaemia and infection) in relation to the DFU from a computer vision perspective. Determining these conditions from non-standardized foot images is very challenging, especially for infection, for four reasons. First, there is high visual intra-class dissimilarity and inter-class similarity. Second, the visual indicators of infection and ischaemia may be too subtle in DFU. Third, the current ground truth lacks objective medical tests of vascular supply and bacterial infection, which are needed to provide more objective ground truth and further improve the classification of these conditions. Fourth, other factors such as lighting conditions, marks, and skin tone related to the patient’s ethnicity need to be incorporated into the prediction.

With a more balanced dataset and improved DFU image capture, the performance of these methods could be improved in future. Ground truth enhanced by clinical tests for ischaemia and infection may also provide further insight and improve the algorithms even where no visual indicator is apparent to the eye. For infection in particular, ground truth informed by blood tests may improve sensitivity and specificity even after debridement, when overt visual indicators are absent. This work has the potential to lead to technology that transforms the recognition and treatment of diabetic foot ulcers and brings about a paradigm shift in the clinical care of the diabetic foot.
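Both metrics are measured against the ground-truth labels, which is why better-grounded labels matter. The standard definitions are sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). A minimal Python sketch follows, where 1 means the condition (ischaemia or infection) is present and the function name is illustrative only.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 = condition present."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # condition found
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # condition missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarm
    # Return NaN when a class is absent, rather than dividing by zero.
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity

# Usage: 4 positive and 6 negative cases, with 1 missed positive and 2 false alarms.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
print(sensitivity_specificity(y_true, y_pred))
```

If a ground-truth label is itself wrong, for example an infected ulcer labelled as non-infected after debridement, both counts shift. This is why labels confirmed by clinical tests could change the reported figures even for an unchanged classifier.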

VI Acknowledgements

The authors express their gratitude to Lancashire Teaching Hospitals and the clinical experts for their extensive support and contribution in carrying out this research.

References

  • [1] M. H. Yap, M. Goyal, F. M. Osman, R. Martí, E. Denton, A. Juette et al., “Breast ultrasound lesions recognition: end-to-end deep learning approaches,” Journal of Medical Imaging, vol. 6, no. 1, p. 011007, 2018.
  • [2] E. Ahmad, M. Goyal, J. S. McPhee, H. Degens, and M. H. Yap, “Semantic segmentation of human thigh quadriceps muscle in magnetic resonance images,” arXiv preprint arXiv:1801.00415, 2018.
  • [3] M. Goyal, N. D. Reeves, A. K. Davison, S. Rajbhandari, J. Spragg, and M. H. Yap, “DFUNet: convolutional neural networks for diabetic foot ulcer classification,” IEEE Transactions on Emerging Topics in Computational Intelligence, pp. 1–12, 2018.
  • [4] M. Goyal, M. H. Yap, N. D. Reeves, S. Rajbhandari, and J. Spragg, “Fully convolutional networks for diabetic foot ulcer segmentation,” in 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Oct 2017, pp. 618–623.
  • [5] M. Goyal, N. D. Reeves, S. Rajbhandari, and M. H. Yap, “Robust methods for real-time diabetic foot ulcer detection and localization on mobile devices,” IEEE Journal of Biomedical and Health Informatics, vol. 23, no. 4, pp. 1730–1741, July 2019.
  • [6] C. Wang, X. Yan, M. Smith, K. Kochhar, M. Rubin, S. M. Warren et al., “A unified framework for automatic wound segmentation and analysis with deep convolutional neural networks,” in 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). IEEE, 2015, pp. 2415–2418.
  • [7] J. J. van Netten, J. G. van Baal, C. Liu, F. van Der Heijden, and S. A. Bus, “Infrared thermal imaging for automated detection of diabetic foot complications,” 2013.
  • [8] F. W. Wagner, “The diabetic foot,” Orthopedics, vol. 10, no. 1, pp. 163–172, 1987.
  • [9] L. A. Lavery, D. G. Armstrong, and L. B. Harkless, “Classification of diabetic foot wounds,” The Journal of Foot and Ankle Surgery, vol. 35, no. 6, pp. 528–531, 1996.
  • [10] P. Ince, Z. G. Abbas, J. K. Lutale, A. Basit, S. M. Ali, F. Chohan, S. Morbach et al., “Use of the SINBAD classification system and score in comparing outcome of foot ulcer management on three continents,” Diabetes Care, vol. 31, no. 5, pp. 964–967, 2008.
  • [11] R. G. Frykberg, “Diabetic foot ulcers: pathogenesis and management,” American Family Physician, vol. 66, no. 9, pp. 1655–1662, 2002.
  • [12] G. Rayman, P. R. Vas, N. Baker, C. G. Taylor, C. Gooday, A. I. Alder et al., “The Ipswich touch test: a simple and novel method to identify inpatients with diabetes at risk of foot ulceration,” Diabetes Care, vol. 34, no. 7, pp. 1517–1518, 2011.
  • [13] J. J. van Netten, D. Clark, P. A. Lazzarini, M. Janda, and L. F. Reed, “The validity and reliability of remote diabetic foot ulcer assessment using mobile phone images,” Scientific Reports, vol. 7, no. 1, p. 9480, 2017.
  • [14] V. Gulshan, L. Peng, M. Coram, M. C. Stumpe, D. Wu, A. Narayanaswamy et al., “Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs,” JAMA, vol. 316, no. 22, pp. 2402–2410, 2016.
  • [15] A. Esteva, B. Kuprel, R. A. Novoa, J. Ko, S. M. Swetter, H. M. Blau et al., “Dermatologist-level classification of skin cancer with deep neural networks,” Nature, vol. 542, no. 7639, pp. 115–118, 2017.
  • [16] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
  • [17] J. D. Santilli and S. M. Santilli, “Chronic critical limb ischemia: diagnosis, treatment and prognosis,” American Family Physician, vol. 59, no. 7, pp. 1899–1908, 1999.
  • [18] M. Albers, A. C. Fratezi, and N. De Luccia, “Assessment of quality of life of patients with severe ischemia as a result of infrainguinal arterial occlusive disease,” Journal of Vascular Surgery, vol. 16, no. 1, pp. 54–59, 1992.
  • [19] L. Prompers, M. Huijberts, J. Apelqvist, E. Jude, A. Piaggesi, K. Bakker et al., “High prevalence of ischaemia, infection and serious comorbidity in patients with diabetic foot disease in Europe. Baseline results from the Eurodiale study,” Diabetologia, vol. 50, no. 1, pp. 18–25, 2007.
  • [20] B. A. Lipsky, A. R. Berendt, P. B. Cornia, J. C. Pile, E. J. Peters, D. G. Armstrong et al., “2012 Infectious Diseases Society of America clinical practice guideline for the diagnosis and treatment of diabetic foot infections,” Clinical Infectious Diseases, vol. 54, no. 12, pp. e132–e173, 2012.
  • [21] L. A. Lavery, D. G. Armstrong, R. P. Wunderlich, J. Tredwell, and A. J. Boulton, “Diabetic foot syndrome: evaluating the prevalence and incidence of foot pathology in Mexican Americans and non-Hispanic whites from a diabetes disease management cohort,” Diabetes Care, vol. 26, no. 5, pp. 1435–1438, 2003.
  • [22] G. H. Skrepnek, J. L. Mills, L. A. Lavery, and D. G. Armstrong, “Health care service and outcomes among an estimated 6.7 million ambulatory care diabetic foot cases in the US,” Diabetes Care, vol. 40, no. 7, pp. 936–942, 2017.
  • [23] J. L. Mills Sr, M. S. Conte, D. G. Armstrong, F. B. Pomposelli, A. Schanzer, A. N. Sidawy et al., “The Society for Vascular Surgery lower extremity threatened limb classification system: risk stratification based on wound, ischemia, and foot infection (WIfI),” Journal of Vascular Surgery, vol. 59, no. 1, pp. 220–234, 2014.
  • [24] R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Süsstrunk, “SLIC superpixels,” Tech. Rep., 2010.
  • [25] M. Goyal and M. H. Yap, “Region of interest detection in dermoscopic images for natural data-augmentation,” arXiv preprint arXiv:1807.10711, 2018.
  • [26] J. Huang, V. Rathod, C. Sun, M. Zhu, A. Korattikara, A. Fathi et al., “Speed/accuracy trade-offs for modern convolutional object detectors,” arXiv preprint arXiv:1611.10012, 2016.
  • [27] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2818–2826.
  • [28] C. Szegedy, S. Ioffe, and V. Vanhoucke, “Inception-v4, Inception-ResNet and the impact of residual connections on learning,” CoRR, vol. abs/1602.07261, 2016. [Online]. Available: http://arxiv.org/abs/1602.07261
  • [29] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
  • [30] M. H. Yap, K. E. Chatwin, C.-C. Ng, C. A. Abbott, F. L. Bowling, S. Rajbhandari et al., “A new mobile application for standardizing diabetic foot images,” Journal of Diabetes Science and Technology, vol. 12, no. 1, pp. 169–173, 2018.