1 Introduction
In recent years, deep learning methods have received significant attention for computer-aided diagnosis in a variety of fields of medical imaging. They outperform previous methods in capability and accuracy. However, deep models trained for the diagnosis of specific conditions currently lack the ability to say "I don't know" for ambiguous or unknown cases. Standard models do not provide prediction uncertainty, which is indispensable for the acceptance of deep learning in medical practice.
Existing approaches for uncertainty estimation in deep learning try to approximate a Bayesian neural network (BNN), where distributions are placed over the weights
[Gal and Ghahramani(2016)]. BNNs provide the mathematical tools to model uncertainty, but are usually associated with a prohibitive computational cost. Gal et al. have shown that Monte Carlo sampling with dropout at training and test time acts as an approximation to a Bayesian model without increasing the complexity or degrading the performance of the model [Gal and Ghahramani(2016), Kendall et al.(2015)Kendall, Badrinarayanan, and Cipolla]. This idea has attracted attention in medical segmentation [Roy et al.(2019)Roy, Conjeti, Navab, and Wachinger, Laves et al.(2019)Laves, Bicker, Kahrs, and Ortmaier]. Another method for uncertainty estimation is the variational inference approach, where a deep model is trained to learn the parameters of a probability distribution from which the final prediction is sampled
[Kingma and Welling(2013)]. Variational inference has recently been used for uncertainty estimation in deformable registration of brain MRI [Dalca et al.(2019)Dalca, Balakrishnan, Guttag, and Sabuncu]. In this work, the aforementioned approaches for uncertainty estimation are integrated into diagnostic classifiers and compared in order to increase patient safety and the acceptance of deep models in medical imaging.
2 Methods
In the following, we briefly revise the Bayesian and the variational inference approach for uncertainty estimation. Given a set of training images $\mathbf{X}$ with corresponding labels $\mathbf{Y}$ from medical experts, we try to find a probabilistic function yielding the most likely label prediction $\hat{y}^{\ast}$ of a test image $x^{\ast}$ with probability

$$ p(\hat{y}^{\ast} \mid x^{\ast}, \mathbf{X}, \mathbf{Y}) = \int p(\hat{y}^{\ast} \mid x^{\ast}, \omega) \, p(\omega \mid \mathbf{X}, \mathbf{Y}) \, \mathrm{d}\omega \quad (1) $$

with parameters $\omega$ of the deep model. The posterior distribution $p(\omega \mid \mathbf{X}, \mathbf{Y})$ in (1) is generally intractable; the integral can therefore be approximated by averaging Monte Carlo samples obtained from $p(\hat{y}^{\ast} \mid x^{\ast}, \omega)$ with dropout at test time [Gal and Ghahramani(2016)]. The mean of these samples is used as the label prediction and their variance is interpreted as the uncertainty of the prediction. We train the ResNet-18 image classifier on a dataset of 84,484 optical coherence tomographies showing four different retinal conditions [He et al.(2016)He, Zhang, Ren, and Sun, Kermany et al.(2018)Kermany, Goldbaum, Cai, Valentim, et al.]. Dropout is added before the last fully connected layer (referred to as bayesian1) and before every building block of ResNet-18 (referred to as bayesian2), creating a Bayesian classifier. In Monte Carlo experiments, 100 forward passes are performed to approximate the posterior distribution of the class labels.
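The sampling procedure can be sketched as follows. This is a toy NumPy illustration of Monte Carlo dropout, not the ResNet-18 pipeline used in the paper; the two-layer network, the dropout rate of 0.5, and the input vector are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer network with fixed weights; dropout stays ACTIVE at test time.
W1 = rng.normal(size=(8, 16))
W2 = rng.normal(size=(16, 4))   # four output classes, as for the retinal conditions

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def forward_with_dropout(x, p=0.5):
    h = np.maximum(x @ W1, 0.0)      # ReLU hidden layer
    mask = rng.random(h.shape) > p   # Bernoulli dropout mask, drawn anew per pass
    h = h * mask / (1.0 - p)         # inverted dropout scaling
    return softmax(h @ W2)

x = rng.normal(size=8)               # stand-in for an input feature vector
samples = np.stack([forward_with_dropout(x) for _ in range(100)])  # 100 MC passes

mean_pred = samples.mean(axis=0)     # used as the label prediction
uncertainty = samples.var(axis=0)    # interpreted as the prediction uncertainty

print("predicted class:", mean_pred.argmax())
print("uncertainty of predicted class:", uncertainty[mean_pred.argmax()])
```

Because each forward pass draws a fresh dropout mask, the 100 outputs differ, and their spread approximates the posterior distribution over class labels.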
In the variational inference approach, we assume a normal distribution for the posterior and replace the last fully connected layer of ResNet-18 with two fully connected layers predicting the parameters $\mu$ and $\sigma^2$ of the posterior distribution (referred to as variational). The final prediction is sampled from this distribution using the reparameterization trick $\hat{y} = \mu + \sigma \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, I)$. As proposed in [Kingma and Welling(2013)], we additionally try to bring the estimated posterior closer to a standard normal distribution by adding the Kullback-Leibler divergence (KLD) to the overall loss function. In this case, the KLD can be solved analytically as

$$ D_{\mathrm{KL}}\big(\mathcal{N}(\mu, \sigma^2) \,\|\, \mathcal{N}(0, I)\big) = -\tfrac{1}{2} \sum_i \left( 1 + \log \sigma_i^2 - \mu_i^2 - \sigma_i^2 \right) . $$

After training the three approaches and a baseline ResNet-18 for comparison, we investigate the uncertainties of all true and false predictions for images from the test set.
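The reparameterization trick and the closed-form KLD can be sketched in a few lines. The values for the mean and log-variance heads below are hypothetical, standing in for the outputs of the two fully connected layers for a single test image.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical outputs of the two fully connected heads for one test image:
mu = np.array([2.0, 0.1, -0.5, 0.3])          # predicted mean per class
log_var = np.array([-1.0, -0.5, -2.0, -1.5])  # predicted log-variance per class

# Reparameterization trick: y = mu + sigma * eps, with eps ~ N(0, I).
# Sampling eps (not y directly) keeps the path from mu and log_var differentiable.
eps = rng.standard_normal(mu.shape)
y = mu + np.exp(0.5 * log_var) * eps

# Closed-form KL divergence between N(mu, sigma^2) and N(0, I)
kld = -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var))

print("sampled prediction:", y)
print("KLD penalty:", kld)
```

Predicting the log-variance rather than the variance itself is a common numerical-stability choice, since it keeps $\sigma^2 = \exp(\log \sigma^2)$ strictly positive without constraints.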
3 Results
Tab. 1: Test set precision, recall and F1 score of the baseline and the three uncertainty-aware models.

             baseline   bayesian1   bayesian2   variational
precision      0.96       0.96        0.93        0.94
recall         0.95       0.96        0.93        0.94
F1 score       0.95       0.96        0.93        0.94
Fig. 1 shows boxplots (top row) and relative frequencies (bottom row) of the uncertainties for correctly (true) and incorrectly (false) predicted cases from the test set. The results show that cases the network predicts incorrectly correlate with higher uncertainty. The mean uncertainty of incorrectly diagnosed cases was 4.6 (variational), 6.0 (bayesian2) and 8.7 (bayesian1) times higher than the mean uncertainty of correctly diagnosed cases. Test set accuracies compared to a baseline ResNet-18 are listed in Tab. 1.
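The reported factors follow from dividing the mean uncertainty of the incorrect predictions by that of the correct ones. A minimal sketch with made-up per-sample uncertainty values:

```python
import numpy as np

# Hypothetical per-sample uncertainties and correctness flags for a test set
uncert = np.array([0.01, 0.02, 0.15, 0.01, 0.20, 0.03, 0.18])
correct = np.array([True, True, False, True, False, True, False])

# Ratio of mean uncertainty for incorrect vs. correct predictions
ratio = uncert[~correct].mean() / uncert[correct].mean()
print(f"incorrect/correct mean uncertainty ratio: {ratio:.1f}")
```

A ratio well above 1, as observed for all three models, indicates that high uncertainty is a usable signal for flagging likely misdiagnoses.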
4 Conclusion
Modeling the prediction uncertainty in computer-aided diagnosis with deep learning yields more reliable results and is therefore anticipated to increase patient safety. This can help to transfer such systems into clinical routine and to increase the acceptance of physicians and patients for machine learning in diagnosis. In future work, the uncertainties can be used to further increase classification accuracy.
This research has received funding from the European Union as being part of the EFRE OPhonLas project.
References
 [Dalca et al.(2019)Dalca, Balakrishnan, Guttag, and Sabuncu] Adrian V Dalca, Guha Balakrishnan, John Guttag, and Mert R Sabuncu. Unsupervised Learning of Probabilistic Diffeomorphic Registration for Images and Surfaces. arXiv preprint arXiv:1903.03545, 2019.
 [Gal and Ghahramani(2016)] Yarin Gal and Zoubin Ghahramani. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. In International Conference on Machine Learning, pages 1050–1059, 2016.

 [He et al.(2016)He, Zhang, Ren, and Sun] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
 [Kendall et al.(2015)Kendall, Badrinarayanan, and Cipolla] Alex Kendall, Vijay Badrinarayanan, and Roberto Cipolla. Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding. arXiv preprint arXiv:1511.02680, 2015.
 [Kermany et al.(2018)Kermany, Goldbaum, Cai, Valentim, et al.] Daniel S. Kermany, Michael Goldbaum, Wenjia Cai, Carolina C.S. Valentim, et al. Identifying Medical Diagnoses and Treatable Diseases by Image-Based Deep Learning. Cell, 172(5):1122–1131, 2018. doi: 10.1016/j.cell.2018.02.010.
 [Kingma and Welling(2013)] Diederik P Kingma and Max Welling. Auto-Encoding Variational Bayes. arXiv preprint arXiv:1312.6114, 2013.

 [Laves et al.(2019)Laves, Bicker, Kahrs, and Ortmaier] Max-Heinrich Laves, Jens Bicker, Lüder A. Kahrs, and Tobias Ortmaier. A dataset of laryngeal endoscopic images with comparative study on convolution neural network-based semantic segmentation. International Journal of Computer Assisted Radiology and Surgery, 14(3):483–492, 2019. doi: 10.1007/s11548-018-01910-0.
 [Roy et al.(2019)Roy, Conjeti, Navab, and Wachinger] Abhijit Guha Roy, Sailesh Conjeti, Nassir Navab, and Christian Wachinger. QuickNAT: A fully convolutional network for quick and accurate segmentation of neuroanatomy. NeuroImage, 186:713–727, 2019.