Identifying Pediatric Vascular Anomalies With Deep Learning

09/16/2019 ∙ Justin Chan et al. ∙ Seattle Children's Hospital and University of Washington

Vascular anomalies, more colloquially known as birthmarks, affect up to 1 in 10 infants. Though many of these lesions self-resolve, some types can result in medical complications or disfigurement without proper diagnosis or management. Accurately diagnosing vascular anomalies is challenging for pediatricians and primary care physicians due to subtle visual differences and similarity to other pediatric dermatologic conditions. This can result in delayed or incorrect referrals for treatment. To address this problem, we developed a convolutional neural network (CNN) to automatically classify images of vascular anomalies and other pediatric skin conditions to aid physicians with diagnosis. We constructed a dataset of 21,681 clinical images, including data collected between 2002 and 2018 at Seattle Children's Hospital as well as five dermatologist-curated online repositories, and built a taxonomy over vascular anomalies and other common pediatric skin lesions. The CNN achieved an average AUC of 0.9731 when ten-fold cross-validation was performed across a taxonomy of 12 classes. The classifier's average AUC and weighted F1 score were 0.9889 and 0.9732, respectively, when evaluated on a previously unseen test set of six of these classes. Further, when used as an aid by pediatricians (n = 7), the classifier increased their average visual diagnostic accuracy from 73.10% to 91.67%. These results suggest that such a system could improve diagnosis of these conditions, particularly in resource-limited areas.


Introduction

Vascular anomalies, colloquially known as “birthmarks”, affect between 5 and 13% [1, 2] of all infants. Most are associated with changes in the appearance of the overlying skin, and initial diagnosis is commonly made by history and physical exam, a large part of which is visual inspection.

Vascular anomalies encompass a large number of diagnoses which are classified by the International Society for the Study of Vascular Anomalies (ISSVA) into vascular tumors and vascular malformations. Some vascular anomalies, such as certain infantile hemangiomas, will fade with time and can be managed medically or with observation in the primary care setting [3, 4]. In contrast, venous malformations, congenital hemangiomas, and lymphatic malformations are best managed early by a multidisciplinary team of specialists including dermatologists, surgeons, and pediatricians. Without proper treatment, bleeding, infection, permanent disfigurement, or airway complications may occur. As a result, early and correct diagnosis is critical to prevent delays in management [5, 6].

Accurate diagnosis of vascular anomalies is challenging. Vascular anomalies can occur on any surface of the body, and the skin manifestations can include a wide range of sizes and hues. As a result, these lesions can be difficult to visually differentiate, and it has been shown that only 31–53% of vascular anomalies are correctly diagnosed at the time of referral [6, 7, 8]. Additionally, some of these anomalies may be confused with common pediatric dermatologic conditions. This can lead to misplaced expectations from patients and families, and in some cases, delays in the delivery of care. Thus, to assist pediatricians and other primary care physicians with accurate diagnosis, we develop an image classification system that uses a convolutional neural network (CNN) to automatically classify images of vascular anomalies.

A computer-aided system for classifying vascular anomalies currently does not exist. At most tertiary vascular anomaly centers, proper diagnosis and staging relies on a combination of modalities including clinical history and exam, imaging, angiography, tissue biopsy, and multidisciplinary consensus [6, 1]. However, these are not readily available in the primary care setting.

In this study, we construct a dataset comprising 21,681 labeled images of cutaneous skin lesions spanning 15 different pediatric dermatologic conditions, including nine vascular anomalies. Of these, 10,700 images were collected from Seattle Children's Hospital during 2002–2018, where clinical photographs were routinely obtained for all patients visiting the vascular anomaly clinic. Additionally, we include 10,981 images from five dermatologist-curated online repositories to supplement image classes that were sparse or absent in our clinical dataset. We developed a CNN[9], a type of deep learning system optimized for image classification, to visually identify and diagnose vascular anomalies. CNNs are able to automatically learn representations of input data and make predictions without the need for extensive pre-processing and feature engineering[10]. CNNs have shown specialist-level accuracy at diagnosing diseases such as melanoma[11], pneumonia[12], diabetic retinopathy[13] and cardiovascular risk[14]. We demonstrate that such a tool can improve diagnostic accuracy for vascular anomalies and other pediatric dermatologic conditions among a cohort of pediatricians.

Results

All images collected from Seattle Children’s Hospital were diagnosed by biopsy, computed tomography (CT), angiography, ultrasonography, or specialist consensus when possible. Demographic information was collected for images associated with a valid medical record number and date of photography. The female-to-male ratio was 2.01 and the median age was 4 months (interquartile range: 2 months). The images selected for inclusion were further curated by three vascular anomaly surgeons to exclude lesions without cutaneous manifestation. All images were de-identified and cropped to only include the area relevant for diagnosis. These images are organized into a taxonomy of 12 different vascular anomaly and pediatric dermatologic classes as shown in Fig. 1a. Vascular anomalies classified under the same taxonomy, such as venous and glomuvenous malformations, were grouped together. Fig. 1b shows example images for six of these classes, illustrating the visual similarity among these lesions.

We use a technique known as transfer learning[15] to classify the images of our dataset. We leverage the InceptionV3 CNN[16] that has been pre-trained on the 2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC)[17] containing 1000 classes, and then fine-tune the weights of the network on our dataset of images. To do this, we remove the final softmax layer that produces outputs for the 1000 ImageNet classes. We then add our own layers, as shown in Fig. 2, ending in a softmax output that produces probability outputs for our 12 image classes.

We first validate our classifier using ten-fold cross-validation[18]. Due to an unequal number of images in each skin lesion class, images within the dataset are augmented using label-preserving transformations[19, 20]. We augment each class used for cross-validation to 1000 images. This ensures the classifier is not overly biased towards one particular class and does not fail to learn a sparse class. This class size was chosen to align with the ImageNet dataset[21], which contains an average of 500–1000 images within each subcategory. Specifically, images are rotated at a random angle, and horizontal and vertical flips are applied at random. Shear up to an intensity of 0.2 and a zoom in the range 0.8 to 1.2 are also applied to each image[11, 9]. Classes that had more than 1000 images were randomly sampled for cross-validation. Images are then resized to 299×299 dimensions to be compatible with the input dimensions of our pre-trained CNN. The data split between the training and validation sets was such that images of the same lesion from multiple angles did not exist in both the training and validation set for any fold. This ensures that the CNN does not leverage patient-specific information when making a prediction. Sketches of this augmentation pipeline and of the grouped split are shown below.
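A minimal sketch of the augmentation pipeline, using Keras' ImageDataGenerator; the transformation parameters follow the text, while the directory layout and batch size are assumptions.

```python
# Label-preserving augmentation: random rotation, horizontal/vertical
# flips, shear up to 0.2, and zoom between 0.8x and 1.2x.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=360,      # rotate at a random angle
    horizontal_flip=True,    # random horizontal flip
    vertical_flip=True,      # random vertical flip
    shear_range=0.2,         # shear up to an intensity of 0.2
    zoom_range=[0.8, 1.2],   # zoom in the range 0.8 to 1.2
)

# Resize to 299x299 to match the CNN's input; "train/" (one subdirectory
# per class) and the batch size are illustrative assumptions.
flow = augmenter.flow_from_directory(
    "train/", target_size=(299, 299), batch_size=32,
    class_mode="categorical",
)
```

The grouped split can be expressed with scikit-learn's GroupKFold, assuming each image carries a lesion identifier (the arrays below are synthetic placeholders):

```python
# Ten-fold cross-validation in which all images of the same lesion land
# in the same fold, so no lesion appears in both training and validation.
import numpy as np
from sklearn.model_selection import GroupKFold

images = np.arange(100)                   # illustrative image indices
lesion_ids = np.repeat(np.arange(50), 2)  # two views per lesion (assumed)
labels = np.zeros(100, dtype=int)         # illustrative class labels

for train_idx, val_idx in GroupKFold(n_splits=10).split(images, labels, lesion_ids):
    assert not set(lesion_ids[train_idx]) & set(lesion_ids[val_idx])
```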

The cross-validation results in Table 1 show the AUCs, confidence intervals and F1 scores for each of the 12 classes. To calculate the F1 score, the probability threshold for each class is set to maximize the sum of the sensitivity and specificity on that class' ROC curve, and the score is then weighted in accordance with the number of positive and negative examples of that class; a sketch of this procedure is shown below. The average AUC and F1 score across all classes are 0.9731 and 0.9367, respectively.
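A brief sketch of this per-class thresholding, assuming binary labels and softmax probabilities for a single class; the function and variable names are illustrative.

```python
# Choose the cutoff on the ROC curve that maximizes sensitivity +
# specificity (equivalently, the Youden J statistic), then compute the
# weighted F1 score at that cutoff.
import numpy as np
from sklearn.metrics import roc_curve, f1_score

def youden_threshold(y_true, y_prob):
    """Probability cutoff maximizing TPR + TNR, i.e. argmax of TPR - FPR."""
    fpr, tpr, thresholds = roc_curve(y_true, y_prob)
    return thresholds[np.argmax(tpr - fpr)]

# y_true: 0/1 labels for one class; y_prob: the CNN's probability for it.
# thresh = youden_threshold(y_true, y_prob)
# weighted_f1 = f1_score(y_true, (y_prob >= thresh).astype(int),
#                        average="weighted")
```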

Next, we evaluate the test performance of our classifier on a held-out, previously unseen dataset. The criterion for a sufficient number of images in a test class is based on the composition of the ILSVRC, which has 50–100 images per test class. To obtain meaningful test performance values, we selected the first six classes in Fig. 1a that were sufficiently data-abundant and most commonly seen in practice, based on consensus by vascular anomaly specialists and dermatologists. For classes with more than 1000 images, images not sampled for cross-validation are used in the test set. For the remaining classes, 10% of the total number of images in that class were withheld from the prior cross-validation step and included in the test set. Using the same CNN architecture as before, we train and evaluate a classifier over these six classes. The cross-validation results over these six classes are shown in Supplementary Table 1; the average AUC and F1 score across the six classes are 0.98384 and 0.9485, respectively. We show the individual receiver-operating characteristic (ROC) curves for the classifier's performance on previously unseen test data from each of these six classes in Fig. 3. The average AUC across these six classes is 0.9889 and the average weighted F1 score is 0.9732 (Supplementary Table 2).

We generate saliency maps in Fig. 4 for an example image in each class of our 12-class taxonomy using integrated gradients[22]. The maps confirm that the CNN places more weight on the pixels representing the lesion than on the surrounding skin when making a prediction. Additionally, we visualize the features learned at the last layer of our CNN classifier using t-SNE for our 6-class taxonomy in Supplementary Fig. 1. The projection of the features onto a 2-D space shows that each class is clustered tightly and separated from the clusters of the other classes.

We next evaluate whether our CNN trained on six lesion classes can be used to aid pediatricians in diagnosing vascular anomalies more accurately, and thus in making appropriate referrals. We presented 60 images from our test set to seven pediatricians. Only clear and visible images were selected for inclusion in this subset. On this subset of 60 test images, our classifier had an accuracy of 93.33%. Each pediatrician was asked to classify each image into one of six classes. Pediatricians achieved an average accuracy of 73.10% on this task. The pediatricians were not informed of their accuracy or of the correct labels of the images during their initial pass. They were then presented with the same set of images in a different random order, each annotated with the classifier's predictions, and asked to classify each image again. When aided with the classifier's predictions, the average accuracy increased to 91.67%. We compare the confusion matrices across all pediatricians when they are unaided and when aided with our classifier (Fig. 5, Supplementary Fig. 2). The figure shows that pediatrician accuracy increased across all six classes. Additionally, when aided with the classifier, pediatricians are able to achieve higher accuracies for venous malformations and atopic dermatitis than when using the classifier alone. Specifically, 5 out of 7 pediatricians classified venous malformations with a higher accuracy than the classifier, achieving 96% average accuracy compared to the classifier's 80%; the remaining 2 pediatricians were on par with the classifier. 3 out of 7 pediatricians classified atopic dermatitis more accurately than the classifier, obtaining an average accuracy of 100% compared to the classifier's accuracy of 90%; the remaining 4 pediatricians matched the accuracy of the classifier. This suggests that for these classes, combining the computer-aided system with pediatrician expertise can potentially have an advantage over either method alone.

Finally, we evaluate whether our classifier can be deployed and executed on a smartphone in real time (Supplementary Fig. 3). On an iPhone 7, the classifier makes real-time predictions within 32 ms.

Discussion

Early and accurate diagnosis of vascular anomalies is essential to minimize complications and ensure appropriate treatment. For many primary care physicians, diagnosis relies on identifying subtle visual clues. We present data that a CNN trained on images of vascular anomalies and other common pediatric skin lesions can enhance the diagnostic accuracy of physicians, which may improve outcomes and optimize referral patterns.

A limitation of our dataset is the imbalance of images across different classes. This reflects the increased prevalence of certain vascular anomalies such as infantile hemangiomas. As a result, the number of images available in the test set for evaluation is relatively small for pyogenic granulomas (lobular hemangiomas) and venous malformations. Acquiring more images of sparse diagnostic classes would allow us to more accurately estimate the real-world performance of the classifier on these lesion types. We note, however, that a clinical deployment of this system may benefit from intentionally biasing the classifier to incorporate real-world prevalence rates of different vascular anomalies when making a prediction. Evaluating the classifier in a larger, prospectively obtained cohort may provide a more robust estimate of accuracy and potential clinical impact. Additionally, though the CNN can be deployed on a commodity smartphone, variations in the quality and setting of the photos may affect real-world classification accuracy. In particular, evaluating the classifier on images taken with multiple smartphone cameras would be needed to test how well it generalizes to images obtained in primary care clinical settings. Developing a CNN that identifies the location of a lesion in an uncropped image and tolerates nonideal lighting conditions could be useful in some clinical scenarios. Finally, while visual inspection is one of the most important diagnostic tools for identifying vascular anomalies, the overall context is needed to make a clinical decision.

Given the prevalence of vascular anomalies, computer-aided diagnosis has the potential to improve health care outcomes and reduce the cost associated with delayed or incorrect referrals. It may also have particular benefit in resource-limited regions, where tertiary expertise is unavailable but smartphones are increasingly ubiquitous. For future studies, we envision that computer-aided diagnosis of vascular anomalies could not only augment the capabilities of primary care physicians, but also guide specialists in treatment of these conditions. For example, a similar classifier could inform clinicians about outcomes related to infantile hemangioma and predict response to propranolol treatment.

Methods

Datasets.

This study was approved by the Seattle Children’s Institutional Review Board. All data were de-identified in accordance with HIPAA guidelines. Our dataset is composed of clinical data from Seattle Children’s Hospital from 2002–2018, as well as the dermatology repositories DermIS[23], DermNet[24], DermNetNZ[25], DermQuest[26] and the ISIC dermoscopic archive[27]. The images from Seattle Children’s Hospital consist of hemangiomas (infantile and congenital), pyogenic granuloma (lobular hemangioma), venous and glomuvenous malformations, capillary malformations, Sturge-Weber syndrome, spider angioma and lymphatic malformations. The images from the online repositories consist of pyogenic granuloma, atopic dermatitis, nevus, spider angioma, milia, impetigo, molluscum and tinea.

Training algorithm.

To train our classifier, we first removed the final 1000-node softmax layer of the InceptionV3 neural network and then fine-tuned it with our own layers (Fig. 2) using the Keras framework. We add a 256-node fully-connected layer with a ReLU activation, followed by Dropout regularization with a rate of 0.6, and finally a 6-way or 12-way softmax layer, depending on the taxonomy of vascular anomalies being classified. We used the RMSProp optimizer with a rho value of 0.9. We used the sklearn library for calculating performance measures including the AUC and F1 score.
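A minimal sketch of this architecture in Keras, assuming TensorFlow 2.x; the learning rate shown is a placeholder (the paper's exact value is not reproduced here), and the global average pooling before the dense head is an assumption.

```python
# Transfer learning: InceptionV3 pre-trained on ImageNet, with the
# 1000-way softmax replaced by a 256-node ReLU layer, dropout of 0.6,
# and a 6- or 12-way softmax, as described above.
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import RMSprop

NUM_CLASSES = 12  # or 6, depending on the taxonomy

base = InceptionV3(weights="imagenet", include_top=False,
                   input_shape=(299, 299, 3), pooling="avg")

x = Dense(256, activation="relu")(base.output)
x = Dropout(0.6)(x)
outputs = Dense(NUM_CLASSES, activation="softmax")(x)

model = Model(inputs=base.input, outputs=outputs)
model.compile(
    optimizer=RMSprop(learning_rate=1e-4, rho=0.9),  # lr is an assumption
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
```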

t-SNE algorithm.

The t-SNE plot was generated using an implementation of Barnes-Hut t-SNE [28, 29] with a perplexity value of five; the algorithm was run for 1,000 iterations.
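An equivalent sketch using scikit-learn's Barnes-Hut t-SNE (the paper used the Python-TSNE package[29]); the feature array is an illustrative placeholder.

```python
# Project the CNN's last-layer features into 2-D with Barnes-Hut t-SNE,
# using the perplexity and iteration count given above.
import numpy as np
from sklearn.manifold import TSNE

features = np.load("cnn_features.npy")  # assumed (n_images, n_features)
embedding = TSNE(n_components=2, perplexity=5, n_iter=1000,
                 method="barnes_hut").fit_transform(features)
```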

Saliency maps.

The saliency maps were generated with an implementation of integrated gradients [22, 30]. The output was smoothed using the SmoothGrad[31] algorithm to produce a sharper map.
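A compact sketch of integrated gradients for a Keras model, assuming TensorFlow 2.x and a black-image baseline; the paper used the deep-viz-keras implementation[30], and this simplified version omits the SmoothGrad smoothing step.

```python
# Integrated gradients: average the gradients of the target-class score
# along a straight-line path from a baseline image to the input, then
# scale by the input-baseline difference.
import tensorflow as tf

def integrated_gradients(model, image, target_class, steps=50):
    """image: float tensor of shape (1, H, W, 3), scaled as the model expects."""
    baseline = tf.zeros_like(image)  # black-image baseline
    # Interpolate between the baseline and the input along a straight line.
    alphas = tf.reshape(tf.linspace(0.0, 1.0, steps + 1), (-1, 1, 1, 1))
    interpolated = baseline + alphas * (image - baseline)
    with tf.GradientTape() as tape:
        tape.watch(interpolated)
        scores = model(interpolated)[:, target_class]
    grads = tape.gradient(scores, interpolated)
    # Trapezoidal approximation of the path integral of the gradients.
    avg_grads = tf.reduce_mean((grads[:-1] + grads[1:]) / 2.0, axis=0)
    return ((image - baseline) * avg_grads)[0]  # per-pixel attributions
```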

Run-time analysis.

We timed an implementation of the CNN running in real time on an iPhone 7. The CNN was ported to the iOS platform using Apple’s Core ML tools library, which converts the CNN to an iPhone-readable format.
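A hedged sketch of this conversion step, assuming the Keras converter from the coremltools 3.x era; the file names and label file are illustrative.

```python
# Convert the trained Keras model to Core ML for on-device inference.
import coremltools

mlmodel = coremltools.converters.keras.convert(
    "vascular_anomaly_cnn.h5",  # hypothetical saved Keras model
    input_names="image",
    image_input_names="image",  # expose the input as an image type
    class_labels="labels.txt",  # hypothetical one-label-per-line file
)
mlmodel.save("VascularAnomalyCNN.mlmodel")
```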

Data availability statement.

All data necessary for interpreting the manuscript have been included. The datasets used in the current study are not publicly available but may be available from the corresponding authors on reasonable request and with permission of Seattle Children’s Hospital and the University of Washington. Images from the online repositories were obtained from DermIS[23], DermNet[24], DermNetNZ[25], DermQuest[26] and the ISIC dermoscopic archive[27].

Use of human subjects.

All human subjects were practicing pediatricians and completed our tests after giving informed consent. This study was approved as exempt by the Seattle Children’s Institutional Review Board.

Supplementary Materials

Supplementary Fig. 1. t-SNE visualization of the CNN’s weights.
Supplementary Fig. 2. Individual confusion matrices for pediatricians taking the survey.
Supplementary Fig. 3. User interface of CNN running on a smartphone.
Supplementary Table 1. Cross-validation performance over six classes.
Supplementary Table 2. F1 score on test set of six classes.

Acknowledgments. The authors thank Jacob Sunshine and John Thickstun for feedback on the manuscript. The authors also thank Allegro Pediatrics–Issaquah Highlands Group for participation and Eden Palmer for photography and the vascular anomaly photographic archive.

Author contributions. JC designed the algorithms and conducted the analysis with technical supervision by SG; JC, SR and SG wrote the manuscript; RB and JP edited the manuscript; RB and JP recruited the pediatricians for the study; JP provided the data used in the analysis. SR conceptualized the study.

Competing interest statement. JC, SR, RB and SG have equity stakes in Edus Health, Inc., which is not related to the technology presented in this manuscript. SG is a co-founder of Jeeva Wireless, Inc. and Sound Life Sciences, Inc. RB is a consultant for SpiWay, LLC and a co-founder of EigenHealth, Inc.

References

  • [1] Nosher, J. L., Murillo, P. G., Liszewski, M., Gendel, V. & Gribbin, C. E. Vascular anomalies: a pictorial review of nomenclature, diagnosis and treatment. World journal of radiology 6, 677 (2014).
  • [2] Greene, A. K. et al. Risk of vascular anomalies with down syndrome. Pediatrics 121, e135–e140 (2008).
  • [3] Juern, A. M., Glick, Z. R., Drolet, B. A. & Frieden, I. J. Nevus simplex: a reconsideration of nomenclature, sites of involvement, and disease associations. Journal of the American Academy of Dermatology 63, 805–814 (2010).
  • [4] Haggstrom, A. N. et al. Prospective study of infantile hemangiomas: clinical characteristics predicting complications and treatment. Pediatrics 118, 882–887 (2006).
  • [5] Lee, J. W. & Chung, H. Y. Vascular anomalies of the head and neck: current overview. Archives of craniofacial surgery 19, 243 (2018).
  • [6] Greene, A. K., Liu, A. S., Mulliken, J. B., Chalache, K. & Fishman, S. J. Vascular anomalies in 5621 patients: guidelines for referral. Journal of pediatric surgery 46, 1784–1789 (2011).
  • [7] Levin, D. E. et al. Room for improvement: Patterns of referral misdiagnosis to a vascular anomalies center. Open Journal of Pediatrics 3, 331 (2013).
  • [8] MacFie, C. C. & Jeffery, S. L. Diagnosis of vascular skin lesions in children: an audit and review. Pediatric dermatology 25, 7–12 (2008).
  • [9] Krizhevsky, A., Sutskever, I. & Hinton, G. E. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, 1097–1105 (2012).
  • [10] Nixon, M. & Aguado, A. S. Feature extraction and image processing for computer vision (Academic Press, 2012).
  • [11] Esteva, A. et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature 542, 115 (2017).
  • [12] Rajpurkar, P. et al. Chexnet: Radiologist-level pneumonia detection on chest x-rays with deep learning. arXiv preprint arXiv:1711.05225 (2017).
  • [13] Gulshan, V. et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. Jama 316, 2402–2410 (2016).
  • [14] Poplin, R. et al. Prediction of cardiovascular risk factors from retinal fundus photographs via deep learning. Nature Biomedical Engineering 2, 158 (2018).
  • [15] Pan, S. J. & Yang, Q. A survey on transfer learning. IEEE Transactions on knowledge and data engineering 22, 1345–1359 (2010).
  • [16] Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J. & Wojna, Z. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2818–2826 (2016).
  • [17] Russakovsky, O. et al. Imagenet large scale visual recognition challenge. International journal of computer vision 115, 211–252 (2015).
  • [18] Kohavi, R. A study of cross-validation and bootstrap for accuracy estimation and model selection. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, 1137–1145 (1995).
  • [19] Cireşan, D., Meier, U. & Schmidhuber, J. Multi-column deep neural networks for image classification. arXiv preprint arXiv:1202.2745 (2012).
  • [20] Simard, P. Y., Steinkraus, D. & Platt, J. C. Best practices for convolutional neural networks applied to visual document analysis. In Proceedings of the Seventh International Conference on Document Analysis and Recognition, 958–963 (2003).
  • [21] Deng, J. et al. Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, 248–255 (IEEE, 2009).
  • [22] Sundararajan, M., Taly, A. & Yan, Q. Axiomatic attribution for deep networks. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, 3319–3328 (JMLR.org, 2017).
  • [23] DermIS.net. https://www.dermis.net/dermisroot/en/home/index.htm (2019).
  • [24] Dermnet. www.dermnet.com/ (2019).
  • [25] Dermnet NZ. https://www.dermnetnz.org/ (2019).
  • [26] DermQuest. http://dermquest.com (2019).
  • [27] ISIC. https://www.isic-archive.com (2019).
  • [28] Van Der Maaten, L. Barnes-Hut-SNE. arXiv preprint arXiv:1301.3342 (2013).
  • [29] Python-TSNE. https://github.com/danielfrg/tsne (2019).
  • [30] deep-viz-keras. https://github.com/experiencor/deep-viz-keras (2019).
  • [31] Smilkov, D., Thorat, N., Kim, B., Viégas, F. & Wattenberg, M. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825 (2017).