Uncertainty-aware performance assessment of optical imaging modalities with invertible neural networks

03/08/2019 · by Tim J. Adler, et al.

Purpose: Optical imaging is evolving as a key technique for advanced sensing in the operating room. Recent research has shown that machine learning algorithms can be used to address the inverse problem of converting pixel-wise multispectral reflectance measurements to underlying tissue parameters, such as oxygenation. Assessment of the specific hardware used in conjunction with such algorithms, however, has not properly addressed the possibility that the problem may be ill-posed. Methods: We present a novel approach to the assessment of optical imaging modalities, which is sensitive to the different types of uncertainties that may occur when inferring tissue parameters. Based on the concept of invertible neural networks, our framework goes beyond point estimates and maps each multispectral measurement to a full posterior probability distribution which is capable of representing ambiguity in the solution via multiple modes. Performance metrics for a hardware setup can then be computed from the characteristics of the posteriors. Results: Application of the assessment framework to the specific use case of camera selection for physiological parameter estimation yields the following insights: (1) Estimation of tissue oxygenation from multispectral images is a well-posed problem, while (2) blood volume fraction may not be recovered without ambiguity. (3) In general, ambiguity may be reduced by increasing the number of spectral bands in the camera. Conclusion: Our method could help to optimize optical camera design in an application-specific manner.




1 Introduction

Many key challenges at the intersection of the natural sciences and the life sciences are related to solving inverse problems. Here, it is assumed that a forward process f maps the (hidden) parameters of interest x to observations y that can be measured. In the context of computer-assisted interventions (CAI), for example, x may refer to important physiological tissue parameters, such as tissue oxygenation (cf. Figure 1), while y may represent multispectral measurements of tissue. The problem is usually solved by regression, which gives a point estimate for the tissue parameter(s) of interest based on the camera measurements [3, 30, 29]. However, in most inverse problems the mapping between x and y is not injective, and two substantially different parameter configurations x1 and x2 can result in the same observation y. To recover a unique inverse, a regularizer can be added to the objective, but this approach, although commonly used, neglects the inherent ambiguity of the solution. For our application, an explicit analysis of the ambiguity is crucial for identifying the most suitable camera in terms of the number and characteristics of camera bands. To our knowledge, none of the existing parameter estimation methods incorporates a sufficiently powerful uncertainty quantification to do so.

Figure 1: Oxygenation map of a porcine brain generated by 8-band multispectral imaging. (a) The full brain. (b) Cropping of the region marked in (a).
Figure 2: (a) Example of a unimodal (blue) and a bimodal (orange) distribution with the same expectation value and variance. (b) Example of two posterior distributions as provided by our INN. The posterior of the 3-band camera (green) is multi-modal, and the MAP estimate of tissue oxygenation is associated with the wrong mode, leading to a poor estimation. The posterior of the 8-band camera (orange) is uni-modal with a narrow mode and a better MAP estimate.

Current approaches to uncertainty quantification in the field of deep learning, such as dropout sampling (cf. e.g. [7, 16, 17, 24]), probabilistic inference (cf. e.g. [6, 14, 31]) or ensembles of estimators (cf. e.g. [15, 23]), typically augment traditional point estimates with confidence intervals, but do not recover unrestricted full posteriors p(x|y). Consequently, these methods do not account for the possibility that the same observation y corresponds to fundamentally different parameter values x. In other words, these methods would always assume that x follows the blue (unimodal) distribution depicted in Figure 2(a), even if x followed the orange (multimodal) distribution. The following two cases illustrate that this is a serious shortcoming when we wish to recover a physiological parameter from observations (Figure 2(b)):

  1. The solution is unique but suffers from high uncertainty. This may be represented by a uni-modal posterior whose single mode has a large standard deviation.

  2. The problem is ill-posed in the sense that two substantially different parameter values x1 and x2 yield the same observation y. This must be represented by a multi-modal posterior whose individual modes may have low uncertainty.

Forcing a uni-modal representation onto the second case cannot work: it would either focus on one of the modes and miss the other, or cover both solutions under a single wide mode (similar to case 1) whose maximum is located at the average of x1 and x2 – a highly implausible value of x for the given y.

We therefore argue that an ideal method for comparative camera assessment should be able to deal with all possible types of uncertainty. We propose to move beyond point estimates by mapping an observation y to a full posterior distribution p(x|y) rather than a single point estimate x. To this end, we solve the resulting inverse problem using the recently proposed concept of invertible neural networks (INNs) [2]. Performance measures for a hardware setup can then be computed from the number and widths of the modes of the posteriors, as illustrated in Figure 2(b).

In the following sections, we describe our approach in detail and apply it to the comparative assessment of four different camera designs given the specific use case of physiological parameter estimation from multispectral imaging data.

2 Methods

In this section, we formalize the proposed approach to performance assessment in a generic manner and apply it to the specific use case of camera selection for multispectral image analysis.

Generally speaking, we assume that the method to be assessed involves a hardware setup (e.g. a multispectral camera) that is used to solve an inverse problem with a well-known forward process f, such as the mapping of tissue oxygenation to the pixel-wise measurement of a multispectral camera. We further assume that we have access to a data set D composed of tuples (x_i, y_i), with y_i = f(x_i). Typically, D can be generated by means of Monte Carlo simulation, as in [13, 29, 30], assuming the (virtual) hardware setup. Finally, we represent the regressor as an invertible neural network, as detailed in Section 2.2.

Our approach to performance assessment involves the following steps: (1) training the regressor on the training set, using a validation set for hyperparameter tuning; (2) applying the regressor to a test set to obtain a posterior distribution p(x|y) for each observation y in the test data; (3) extracting the modes of each posterior; (4) computing descriptive statistics over the number and widths of the modes to quantify the uncertainty of the regressor. Different hardware setups can then be compared using metrics that consider not only the accuracy but also the uncertainty characteristics of the regressor. The following paragraphs instantiate this approach in the specific context of camera selection for intra-operative physiological parameter estimation.

2.1 Data generation for performance assessment

We apply Monte Carlo methods to generate tuples of physiological parameters x and corresponding pixel-wise measurements y. The method is based on previous work [30] and is briefly revisited here.

Tissue is assumed to be composed of three infinitely wide layers. Each layer is defined by the following tissue parameters: blood volume fraction vhb, reduced scattering coefficient at 500 nm a_mie, scattering power b_mie, anisotropy g, refractive index n and layer thickness d. Based on literature values for the hemoglobin extinction coefficients [11], the absorption and scattering coefficients have been determined for use in the MC simulation framework. A Graphics Processing Unit (GPU) accelerated version [1] of the Monte Carlo Multi-Layered (MCML) simulation framework [26] was chosen to generate spectral reflectances. The spectral reflectance r(λ) as determined by the MC simulation can be transformed to the reflectance measurement r_j at band j of a given camera by:

r_j = ∫ F_j(λ) · I(λ) · α(λ) · r(λ) dλ

Here, the camera is characterized by F_j, the j-th filter response, I, the relative irradiance of the light source, and α, which represents other parameters of the optical system, such as the camera quantum efficiency or the transmission of the optical elements.
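On a discrete wavelength grid, this band integral reduces to a weighted average. The following numpy sketch illustrates this; the normalization by the integrated weight, and all function and variable names, are illustrative assumptions rather than the paper's code:

```python
import numpy as np

def band_reflectance(wavelengths, reflectance, filter_response,
                     irradiance=None, alpha=None):
    """Project a high-resolution reflectance spectrum onto one camera band by
    weighting it with the filter response F_j, the relative irradiance I and
    the catch-all optical factor alpha. On a uniform grid the integral reduces
    to a weighted sum (the grid spacing cancels in the normalized ratio); the
    normalization is an assumption made here so that a flat spectrum maps to
    its own value."""
    irradiance = np.ones_like(wavelengths) if irradiance is None else irradiance
    alpha = np.ones_like(wavelengths) if alpha is None else alpha
    weight = filter_response * irradiance * alpha
    return np.sum(weight * reflectance) / np.sum(weight)
```

Applied with a Gaussian filter response, a flat spectrum is reproduced exactly, while a tilted spectrum is pulled toward its value at the band center.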

2.2 Invertible Neural Networks (INNs) for physiological parameter estimation

Figure 3: Schematic view of the network architecture applied in this paper. Blue layers correspond to invertible layers and orange layers correspond to permutation layers. The L terms denote the loss functions used.

Basic principle

INNs have been proposed recently as a new method to recover a posterior distribution from an observation [2]. The network takes the form of a deterministic function f_θ mapping the parameters x to the observation y and additional latent variables z, where θ denotes the trainable parameters and z carries the uncertainty of the reconstruction of x given y. Sampling the latent variables according to a standard normal distribution and computing the inverse yields an approximation of the posterior p(x|y).
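The sampling procedure can be illustrated with a toy stand-in for the trained network: for a fixed observation y, latent samples drawn from N(0, I) are pushed through the inverse pass, and the resulting x-samples approximate the posterior. Here `toy_inverse` is a hypothetical inverse for the non-injective forward model y = x², not the actual INN:

```python
import numpy as np

def sample_posterior(inverse_net, y, n_samples=1000, dim_z=1, seed=None):
    """Approximate p(x|y): draw z ~ N(0, I) and push (y, z) through the
    inverse pass of the (trained) network. `inverse_net` is a placeholder
    for the real INN evaluated backwards."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_samples, dim_z))
    return np.array([inverse_net(y, zi) for zi in z])

def toy_inverse(y, z):
    # Hypothetical inverse for the non-injective forward model y = x**2:
    # the sign of x is exactly the ambiguity the latent variable must carry.
    return np.sign(z[0]) * np.sqrt(y)

samples = sample_posterior(toy_inverse, y=4.0, n_samples=2000, seed=0)
# The approximated posterior is bimodal, with modes near -2 and +2.
```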

Application to physiological parameter estimation

In the context of physiological parameter estimation, we hypothesize that observing a spectrum y is generally not sufficient to recover the underlying tissue parameter(s) x. Intuitively speaking, the purpose of the latent variables z is to capture the information necessary to recover x that is not already captured by y. To recover a physiological parameter from a previously unseen spectrum y, we repeatedly draw samples z_i from the latent space to obtain tuples (y, z_i) that we pass through the network in the inverse direction. The corresponding set of physiological parameters x_i yields the posterior p(x|y). Due to the invertible architecture, the network simultaneously learns (1) the forward model – i.e., how to convert tissue parameters to spectral reflectances as measured by a camera – and (2) how to recover a posterior distribution of tissue parameters corresponding to an observation y.

Network architecture

The network architecture applied in this work has been adapted from [2] and is shown in Figure 3. It relies on four invertible affine coupling blocks [5], each of which is followed by a permutation layer, leading to an eight-layer network in total. The purpose of the permutation layer is to improve the mixing of the different input and output channels; it adds no additional weights to the network. At initialization, a randomly chosen permutation between the input and output channels is fixed permanently. We assume that each physiological parameter has its own associated uncertainty. Hence, in this study we choose the dimension of the latent space to match the number of physiological parameters.
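A minimal sketch of one such coupling block with its fixed permutation, in the spirit of [2, 5] but with stand-in linear subnetworks instead of trained fully connected ones; stacking four blocks illustrates the eight-layer structure and the exact invertibility:

```python
import numpy as np

rng = np.random.default_rng(42)

class AffineCouplingBlock:
    """Minimal RealNVP-style affine coupling block followed by a fixed,
    randomly chosen permutation. The scale/translation subnetworks are
    stand-in linear maps with random weights; in the actual architecture
    they are small trained fully connected networks."""

    def __init__(self, dim):
        self.half = dim // 2
        self.Ws = 0.1 * rng.standard_normal((dim - self.half, self.half))
        self.Wt = 0.1 * rng.standard_normal((dim - self.half, self.half))
        self.perm = rng.permutation(dim)       # fixed at initialization
        self.inv_perm = np.argsort(self.perm)

    def forward(self, x):
        x1, x2 = x[:self.half], x[self.half:]
        # Affine transform of the second half, conditioned on the first half.
        y2 = x2 * np.exp(self.Ws @ x1) + self.Wt @ x1
        return np.concatenate([x1, y2])[self.perm]

    def inverse(self, y):
        y = y[self.inv_perm]
        y1, y2 = y[:self.half], y[self.half:]
        x2 = (y2 - self.Wt @ y1) * np.exp(-self.Ws @ y1)
        return np.concatenate([y1, x2])

# Four coupling blocks, each followed by its permutation: an 8-layer network.
blocks = [AffineCouplingBlock(4) for _ in range(4)]
x = rng.standard_normal(4)
y = x
for b in blocks:
    y = b.forward(y)
x_rec = y
for b in reversed(blocks):
    x_rec = b.inverse(x_rec)
# Invertibility holds by construction: x_rec recovers x up to float round-off.
```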

Loss functions

Four loss terms are used to train the network (cf. Figure 3):

L2 forward loss:

In the forward direction, we use an L2 loss between the predicted reflectances and the true reflectances to enforce good estimates of the forward process.

MMD forward loss:

We apply a Maximum Mean Discrepancy (MMD) loss to the latent space estimates. MMD losses quantify the discrepancy between distributions [8]. Here, we compare the distribution of the predicted latent variables z to latent variables sampled from the desired standard normal distribution.
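A minimal estimator of the (squared) MMD with a single Gaussian kernel may look as follows; the kernel choice and fixed bandwidth are illustrative simplifications of what is used in practice (multi-scale kernel mixtures are common):

```python
import numpy as np

def mmd2(x, y, bandwidth=1.0):
    """Biased estimator of the squared Maximum Mean Discrepancy between two
    sample sets of shape (n, d), using a single Gaussian kernel."""
    def k(a, b):
        d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
        return np.exp(-d2 / (2.0 * bandwidth ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()

rng = np.random.default_rng(0)
same = mmd2(rng.standard_normal((200, 2)), rng.standard_normal((200, 2)))
shifted = mmd2(rng.standard_normal((200, 2)), rng.standard_normal((200, 2)) + 3.0)
# `same` stays near zero; `shifted` is clearly larger. This gap is the signal
# the MMD losses use to pull network outputs toward a target distribution.
```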

L2 backward loss:

We use the estimates of y and z from the forward pass and perturb both quantities with additive Gaussian noise. The resulting output (including its zero padding) is compared to the original input x (including its zero padding) via an L2 loss. This serves as a form of regularization, smoothing the latent space and ensuring that no critical information is hidden in low-amplitude structures in the outputs.

MMD backward loss:

We compute a reverse pass through the net with reflectances y from the training set and latent variables z sampled from a standard normal distribution. The output is then passed to an MMD loss, which compares it to the distribution given by the training samples x. As previous work [2] indicates that only a subset of the tissue parameters can potentially be recovered from multispectral measurements, we decided to feed only the corresponding slices of the prediction into this loss instead of the whole output.

Hyperparameter optimization

We use the training data set to perform the parameter optimization of the network and the validation data set to prevent overfitting and for hyperparameter tuning. In particular, we use the validation data to calibrate the width of the posterior distributions. As suggested in [20], the purpose of the calibration is that, for every sample, the q-confidence interval of the posterior contains the ground truth value in q percent of the cases. In other words, for each confidence level q, exactly a fraction q of the ground truth values shall be inliers of the corresponding q-confidence interval. We optimize the hyperparameters using the validation set to enforce this behavior as closely as possible.

2.3 Performance assessment

We quantify the uncertainty of an inference based on two key properties: the presence of multiple modes and the width of the posterior.

Given samples following the posterior p(x|y), our approach to automatic mode detection relies on computing a kernel density estimate, which has the advantage of being easy to evaluate. This allows us to compute the corresponding relative maxima of the estimated density. A posterior is classified as multi-modal if its standard deviation is less than half of the prior's standard deviation, our algorithm finds more than one relative maximum, and these maxima are further than a certain threshold apart (with a separate threshold for each physiological parameter). Furthermore, maxima whose intensity is less than 80% of the main (i.e. highest) maximum are ignored. All remaining posteriors are classified as uni-modal.
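A sketch of this classification rule for 1-D posteriors given as sample arrays; the Silverman bandwidth rule and the concrete `min_separation` and `rel_height` defaults are illustrative stand-ins for the parameter-specific thresholds:

```python
import numpy as np

def detect_modes(samples, prior_std, min_separation=0.2, rel_height=0.8):
    """Classify a sampled 1-D posterior following the rules above: it must be
    narrower than half the prior, its kernel density estimate must show more
    than one relative maximum, the maxima must be sufficiently far apart, and
    maxima below 80% of the main maximum are ignored.
    Returns (is_multimodal, mode_locations)."""
    samples = np.asarray(samples, dtype=float)
    bandwidth = 1.06 * samples.std() * len(samples) ** (-1 / 5)  # Silverman's rule
    grid = np.linspace(samples.min(), samples.max(), 512)
    # Gaussian kernel density estimate evaluated on the grid (unnormalized).
    dens = np.exp(-0.5 * ((grid[:, None] - samples[None, :]) / bandwidth) ** 2).sum(axis=1)
    # Interior relative maxima of the estimated density.
    peaks = np.where((dens[1:-1] > dens[:-2]) & (dens[1:-1] > dens[2:]))[0] + 1
    if len(peaks) == 0:
        return False, np.array([])
    # Ignore minor maxima below rel_height of the main maximum.
    peaks = peaks[dens[peaks] >= rel_height * dens[peaks].max()]
    modes = np.sort(grid[peaks])
    is_multimodal = (
        samples.std() < 0.5 * prior_std
        and len(modes) > 1
        and np.max(np.diff(modes)) > min_separation
    )
    return is_multimodal, modes
```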

To assess the performance of a camera, the INN is applied to the test set, and the automatic mode detection described above is run on each posterior p(x|y). Next, the following metrics are computed:

  • Percentage of multiple modes (MM): The percentage of multi-modal posterior distributions. This serves as a measure of how well-posed the inversion is for the different cameras.

  • Root-mean-square error (RMSE): We utilize the maximum a posteriori (MAP) estimate as a predictor for the physiological parameters and report the root-mean-square error of these estimates against the ground truth.

  • 68% confidence interval width (W): We report the median width of the 68% confidence interval as a measure of the width of the posterior distributions.
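For posteriors given as sample arrays, RMSE and W can be computed as follows (MM additionally requires the mode detection of Section 2.3 and is omitted here); the histogram-based MAP estimate is an illustrative choice, not the paper's exact estimator:

```python
import numpy as np

def camera_metrics(posteriors, ground_truth):
    """RMSE of MAP estimates and median 68% confidence interval width W.
    posteriors: (n_cases, n_samples); ground_truth: (n_cases,)."""
    maps = []
    for s in posteriors:
        # Crude MAP estimate: center of the fullest histogram bin.
        counts, edges = np.histogram(s, bins=64)
        i = counts.argmax()
        maps.append(0.5 * (edges[i] + edges[i + 1]))
    maps = np.asarray(maps)
    rmse = np.sqrt(np.mean((maps - ground_truth) ** 2))
    lo, hi = np.percentile(posteriors, [16, 84], axis=1)  # symmetric 68% CI
    w = np.median(hi - lo)
    return rmse, w
```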

3 Experiments and Results

The purpose of the experiments was to confirm the realism of our simulation pipeline (Section 3.1) and to apply our setup to the task of comparative camera assessment (Section 3.2).

3.1 Realism of Simulation Pipeline

The simulation pipeline applied for comparative camera assessment features two potential sources of error: (1) errors in the conversion of the simulated high-resolution spectrum to multispectral measurements (Section 3.1.1) and (2) wrong model assumptions in the generic tissue model and hence errors in the simulated spectra (Section 3.1.2). We address both issues in this order in the following paragraphs.

3.1.1 Virtual Camera

The realism of the simulated data relies crucially on the validity of our virtual cameras. To explore this, we measured color tiles (X-Rite ColorChecker® Classic, Grand Rapids, MI, USA), which have well-defined spectra, using an HR2000+ spectrometer (Ocean Optics, Largo, FL, USA) and a Pixelteq SpectroCam™, an 8-band multispectral camera. Using the filter response functions of the SpectroCam, we transformed the high-resolution spectrum to a virtual SpectroCam spectrum. For this experiment we used three color tiles (blue, green and red) and averaged five SpectroCam measurements. The measured intensities were normalized. As shown in Figure 4, the simulated data is in very close agreement with the real measurements.

Figure 4: From left to right measurements of a blue, green and red color tile. The green line depicts the measurement by the actual camera (Pixelteq SpectroCam™). The blue line depicts the measurements of the virtual SpectroCam™ generated from the spectrometer measurements.

3.1.2 Tissue Model

Due to the lack of a reliable gold standard method for measuring optical tissue properties in vivo, validation of the tissue model is not straightforward. Previous work has addressed this issue by comparing real measurements of tissue with simulated spectra [30]. If the accuracy of the virtual camera used can be assumed to be acceptable, deviations between real and simulated data can primarily be attributed to differences in the tissue composition. The tissue model applied in this study has been validated in a previous publication [30] using multispectral data from several different porcine abdominal organs. To confirm these findings, we additionally acquired measurements from a porcine brain and a human kidney. The brain was measured using the same SpectroCam as in Section 3.1.1. For the human kidney, we used a 16-band camera. We performed a principal component analysis (PCA) on the data generated by our tissue model (adapted to the appropriate camera) and a kernel density estimation (KDE) on the first two principal components. Afterwards, we projected the measured data onto those same components. The result can be found in Figure 5. Clearly, all the organ data points lie within the distribution of the simulated data of our tissue model.
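The projection step can be sketched with a numpy-only PCA; the random arrays below merely stand in for the simulated and measured spectra, which cannot be reproduced here:

```python
import numpy as np

def fit_pca(X, n_components=2):
    """PCA via SVD of the mean-centered data; returns (mean, components)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(X, mean, components):
    return (X - mean) @ components.T

rng = np.random.default_rng(0)
simulated = rng.standard_normal((1000, 8))     # stand-in: simulated 8-band spectra
measured = 0.5 * rng.standard_normal((20, 8))  # stand-in: measured organ spectra

mean, pcs = fit_pca(simulated)
sim_2d = project(simulated, mean, pcs)
meas_2d = project(measured, mean, pcs)
# If the measured points fall inside the spread of the simulated cloud, this
# supports (but does not prove) that the tissue model covers the measurements.
```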

Figure 5: Projection of measured organ data on the first two principal components (pc) of the simulated data (contour plot). (a) Measurements of two porcine brains with an 8-band camera. (b) Measurement of a human kidney with a 16-band camera.

3.2 Comparative Camera Assessment

The main purpose of our experiments was to evaluate our assessment framework in the specific context of multispectral camera selection for physiological parameter estimation.

3.2.1 Experimental Data

We applied the Monte Carlo-based method described in Section 2.1 to generate 20,000 data points representing spectral reflectances, using the tissue parameter distributions summarized in Table 1. Parameters given as a range are sampled uniformly from that range; parameters given as a single value with a standard deviation are sampled from a normal distribution with that expectation value and standard deviation. For each camera setup investigated here, these reflectances were converted into (simulated) camera measurements considering the optical properties of the setup. For each setup, we reserved 70% of the data for training, 5% for hyperparameter tuning and 25% for performance assessment. To test our assessment framework, we assessed three camera designs that have been applied in previous work [12, 19, 29] in a comparative manner. To obtain a lower bound on the achievable uncertainty, we complemented these realistic cameras with a virtual camera of nearly optimal design. The cameras are characterized by the following filter responses (cf. Figure 6):

Layer | vhb | a_mie | b_mie | g | n | d
1 | 0-10 | 0-100 | 1.286 | 0.8-0.95 | 1.33 | 0.06-0.1
2 | 0-10 | 0-100 | 1.286 | 0.8-0.95 | 1.36 | 0.06-0.085
3 | 0-10 | 0-100 | 1.286 | 0.8-0.95 | 1.38 | 0.04-0.06
Framework: MCML [1], photons per simulation
Wavelength range: 450- (stepsize=
Table 1: The physiological parameter ranges used for simulating the desired tissue model (column order as in the parameter list of Section 2.1).
Figure 6: The filter response functions for the four cameras considered in this study.

3-med: 3-band camera optimized for medical imaging use, as described in [12].

3-nRGB: 3-band camera whose band centers coincide with the standard RGB bands, as described in [19].

8-med: Pixelteq SpectroCam, an 8-band camera optimized for medical imaging use, as described in [29]. This is the same camera as used in the experiments in Section 3.1.

27-equi: As a close-to-optimal camera, we used a camera with a filter response featuring an (unrealistically) narrow band at equidistant positions. As our experimental data is based on presimulated spectra over a fixed wavelength range, this leads to a '27-band camera'.

The remaining camera parameters, the relative irradiance I and the factor α, were set to 1 for all cameras.
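The close-to-optimal camera can be emulated as a bank of narrow Gaussian filters. The 450-720 nm grid, the 10 nm spacing (chosen so that 27 bands result) and the filter width are assumptions made purely for illustration:

```python
import numpy as np

def narrow_band_filters(wavelengths, centers, fwhm=2.0):
    """Gaussian filter responses, one (unrealistically) narrow band per
    center. Returns shape (n_bands, n_wavelengths). fwhm is illustrative."""
    sigma = fwhm / 2.355
    return np.exp(-0.5 * ((wavelengths[None, :] - np.asarray(centers)[:, None]) / sigma) ** 2)

def apply_camera(spectrum, filters):
    """Weighted band averages of a spectrum, with irradiance and alpha set
    to 1 as for all cameras in this study; on a uniform grid the band
    integral reduces to a weighted mean."""
    return (filters * spectrum[None, :]).sum(axis=1) / filters.sum(axis=1)

wl = np.linspace(450.0, 720.0, 541)      # assumed 0.5 nm simulation grid
centers = np.linspace(450.0, 710.0, 27)  # assumed: one band every 10 nm
F = narrow_band_filters(wl, centers)
bands = apply_camera(np.full_like(wl, 0.3), F)  # flat spectrum -> 0.3 per band
```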

3.2.2 Results

Figure 7 provides representative examples of the posteriors generated by our INN. The calibration errors for the four different cameras are presented in Figure 8 for the physiological parameters tissue oxygenation and blood volume fraction. We see that the calibration curves closely follow the identity. The 3-med camera is 'underconfident' for larger values, which would make estimations based on the confidence intervals in this range less reliable.

Figure 7: Examples of INN output. (a) Uni-modal posterior with high width of the 68% symmetric confidence interval (W), (b) uni-modal posterior with low W, (c) multi-modal posterior with two modes in close proximity.
Figure 8: Calibration curves for tissue oxygenation (left) and blood volume fraction (right).
         Tissue oxygenation              Blood volume fraction
Camera   MM [%]  RMSE [pp]  W [pp]       MM [%]  RMSE [pp]  W [pp]
27-equi  0.3     2.3        2.3 (0%)     0.1     1.6        3.4 (67%)
8-med    0.5     2.9        4.0 (0%)     0.0     1.7        3.5 (62%)
3-nRGB   9.3     4.8        5.8 (0%)     0.0     1.7        3.1 (64%)
3-med    3.6     5.7        8.6 (0%)     0.0     2.4        5.3 (99%)
Table 2: Results for the comparative camera assessment. MM: percentage of multi-modal posteriors; RMSE: root-mean-square error; W: median width of the 68% symmetric confidence interval; pp: percentage points. The value in brackets denotes the percentage of samples whose posterior width exceeds twice the standard deviation of the corresponding prior distribution. Note that a prerequisite for a distribution to be classified as multi-modal is a posterior standard deviation of less than half the prior's standard deviation (cf. Section 2.3).

Table 2 shows the performance of the four different cameras using the metrics presented in Section 2.3. All computations were performed on the test set . As expected, the scores generally improve with an increasing number of spectral bands. An interesting observation is that the 3-band camera designed for medical use (3-med) has a higher RMSE compared to the camera whose design was inspired by standard RGB cameras (3-nRGB), yet, it features a substantially reduced number of multiple mode posteriors (3.6% vs 9.3%).

For all cameras except the 3-nRGB, there are only few multi-modal posteriors for the tissue oxygenation reconstruction. Figure 9 shows the estimations of the 27-equi and the 3-med cameras, which exhibit generally good reconstructive performance and illustrate the possibility of outlier detection via the width of the posteriors.

In contrast, our results suggest that the blood volume fraction cannot be recovered from any of the cameras with high certainty. In fact, the percentage of samples whose posterior width exceeds twice the standard deviation of the corresponding prior is greater than 50% for all four cameras. The poor performance is illustrated in Figure 9. We see that the 27-equi camera still performs better than the 3-med camera, but neither shows good performance for high values. This general trend also holds for the other two cameras. Note that since most posteriors were even wider than the priors, they did not qualify as candidates for the mode detection algorithm (cf. Section 2.3), explaining the low MM values.

Furthermore, although W seems reasonable in absolute terms, comparing it to twice the standard deviation of the prior distribution reveals that the median width of the posteriors reaches as much as 93% of this reference for the 3-med camera, indicating that the blood volume fraction is effectively unrecoverable. For tissue oxygenation, the values range from 4% in the 27-equi case to 15% in the 3-med case.

Figure 9: Worst (3-med; left) and best (27-equi; right) cases for tissue oxygenation and blood volume fraction estimation according to the RMSE. The hue represents the width of the 68% confidence interval (cf. Table 2). We see that both cameras have difficulty predicting higher values, but encode this in the width of the posterior. The blue line depicts the identity mapping.

4 Discussion

Meaningful performance assessment and benchmarking are crucial for advancing research and practice. Several publications, however (cf. e. g. [18]), suggest that the metrics chosen are not always well-suited for a specific assessment goal. In the context of multispectral intra-operative imaging, for example, camera assessment has typically been restricted to determining descriptive statistics on error metrics that quantify the difference between the estimations of an algorithm and reference (gold standard) results [25, 29, 30]. An advantage of this approach is that the error metrics are straightforward to compute and interpret. On the other hand, such performance measures suffer from the fact that they do not reveal important insights with respect to why methods perform poorly. In particular, they do not account for the different types of uncertainties that may occur when recovering tissue parameters from camera measurements. An interesting practical example is the 3-band camera designed for medical use [12] and investigated here. While it features a higher RMSE compared to a 3-band camera based on the standard RGB design, recovery of tissue parameters is substantially less ambiguous, as indicated by the reduced number of multiple modes.

To address the issues related to commonly applied approaches to camera design, selection and performance assessment, we present a novel approach to camera assessment which provides the following key advantages compared to previously proposed methods:

  1. Extended scope: The topic of camera design is closely linked to that of band selection [28]. To our knowledge, however, none of the approaches proposed in this field addresses the potential inherent ambiguity associated with the recovery of physiological parameters. To overcome this bottleneck, we propose moving beyond point estimates and mapping measurements to a full posterior probability distribution. Analysis of the posteriors not only provides us with a means for quantifying the uncertainty related to a specific measurement but also allows for a fundamental theoretical analysis about which tissue properties can in principle be recovered with the present camera.

  2. No need for acquisition of real data: Many approaches to band/camera selection rely on acquisition of real data [9, 10, 21, 22, 27]. Yet, acquisition of real data for a given application is often impractical due to budget constraints (no money to purchase a whole range of cameras) or ethical issues. We address this issue by performing the comparative assessment in silico. Experiments with a whole range of porcine and human organs confirm the realism of our simulation framework.

The computations above show that these networks can provide the same error metrics as before (e.g. RMSE) while offering the potential for finer differentiation through additional metrics (e.g. the number of modes or the width of the posterior).

While it is straightforward to compute the widths of the posteriors, fully automatic multiple-mode detection is not trivial due to the many parameters involved. For example, the posteriors are only implicitly given by a number of samples generated from latent space samples. This fact alone introduces statistical fluctuations into the estimated posterior. A kernel density estimate can smooth out these effects, but at the cost of introducing a bandwidth parameter with a high impact on the number of resolved maxima. In addition, outliers must be handled in order to avoid spurious maxima at the boundary of the posterior.

The calibration of our models suggests that, while the confidence of our posteriors is already good, there is still room for improvement. In particular, the calibration of the 3-med camera is off for larger confidences. Future studies aiming at a finer differentiation between the observed cameras would have to remedy this; the convincing results for the other three cameras make us confident that this is achievable.

Another obstacle that learned methods have to sidestep is so-called out-of-distribution samples: the performance of our algorithm can only be guaranteed on data that is similar to the training data. In general, this problem is difficult to tackle. In our case, the PCA projections of the organ measurements show, by way of example, that the spectra of many objects of interest, such as internal organs, do in fact lie within our training distribution.

The color tile experiments together with the measured organ spectra suggest the validity of our simulation framework. A natural next step would be to test the performance of our method on real data. To achieve this, there remains a key challenge: real data will always be subject to noise which needs to be handled adequately by our algorithm. One approach would be to average the spectra either by using a higher integration time or by averaging multiple measurements. However, if there are time constraints, for example induced by organ movement, there are limits to the amount of averaging possible. Another approach would be to incorporate a realistic noise model in the simulation framework to account for it during training. This would circumvent the time constraint as the evaluation time of the network trained with this new data set would not change.

Another interesting direction for future work is to apply the framework to additional cameras that are widely used in a clinical context (e.g. RGB or narrow-band cameras). We expect some obstacles with regard to the extension to 2-band cameras, as very little information is left for a multi-parameter reconstruction. Additionally, these cameras would require a larger range of simulated wavelengths compared to the data set that we based our work on [29]. However, extending the framework to these ranges should be straightforward.

Additionally, while this study focused on intra-operative optical imaging, the concept of performance assessment using INNs could easily be transferred to other fields of research. Clearly, any imaging modality with pixel-wise spectral information is a prime candidate. For larger image contexts, INNs are still under active development: because of their particular structure, the hidden layer size equals the input and output dimension, leading to very large networks when images are to be processed as a whole. One example of an imaging modality where it might be fruitful to apply our INN method is quantitative photoacoustic imaging (qPAI). It has been shown that qPAI is an ill-posed inverse problem in theory [4]. However, to the best of our knowledge, an in silico or even in vivo analysis of the practical implications of this non-uniqueness has not been conducted. The ability to detect ambiguous reconstructions of physiological parameters makes our method a promising candidate to close this gap.

In conclusion, we have presented a novel method for performance assessment of optical cameras bearing the potential to measure the well-posedness of the inverse problem. Future work should focus on the evaluation steps necessary to fully harness the power of the computed posterior distributions. In particular, robust mode detection algorithms seem like a fruitful area for further investigation in order to quantify the uniqueness of the reconstruction.


  • [1] Alerstam, E., Lo, W.C.Y., Han, T.D., Rose, J., Andersson-Engels, S., Lilge, L.: Next-generation acceleration and code optimization for light transport in turbid media using GPUs. Biomedical Optics Express 1(2), 658–675 (2010)
  • [2] Ardizzone, L., Kruse, J., Rother, C., Köthe, U.: Analyzing inverse problems with invertible neural networks. In: International Conference on Learning Representations (2019). URL https://openreview.net/forum?id=rJed6j0cKX
  • [3] Clancy, N.T., Arya, S., Stoyanov, D., Singh, M., Hanna, G.B., Elson, D.S.: Intraoperative measurement of bowel oxygen saturation using a multispectral imaging laparoscope. Biomedical optics express 6(10), 4179–4190 (2015)
  • [4] Cox, B., Laufer, J., Beard, P.: The challenges for quantitative photoacoustic imaging. In: Photons Plus Ultrasound: Imaging and Sensing 2009, vol. 7177, p. 717713. International Society for Optics and Photonics (2009)
  • [5] Dinh, L., Sohl-Dickstein, J., Bengio, S.: Density estimation using Real NVP. arXiv preprint arXiv:1605.08803 (2016)
  • [6] Feindt, M.: A Neural Bayesian Estimator for Conditional Probability Densities. arXiv:physics/0402093 (2004)
  • [7] Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016)
  • [8] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. Journal of Machine Learning Research 13(Mar), 723–773 (2012)
  • [9] Gu, X., Han, Z., Yao, L., Zhong, Y., Shi, Q., Fu, Y., Liu, C., Wang, X., Xie, T.: Image enhancement based on in vivo hyperspectral gastroscopic images: a case study. Journal of Biomedical Optics 21(10), 101412 (2016). DOI 10.1117/1.JBO.21.10.101412. URL http://biomedicaloptics.spiedigitallibrary.org/article.aspx?doi=10.1117/1.JBO.21.10.101412
  • [10] Han, Z., Zhang, A., Wang, X., Sun, Z., Wang, M.D., Xie, T.: In vivo use of hyperspectral imaging to develop a noncontact endoscopic diagnosis support system for malignant colorectal tumors. Journal of Biomedical Optics 21(1), 016001–016001 (2016). URL http://biomedicaloptics.spiedigitallibrary.org/article.aspx?articleid=2481122
  • [11] Jacques, S.L.: Optical properties of biological tissues: a review. Physics in medicine and biology 58(11), R37 (2013)
  • [12] Kaneko, K., Yamaguchi, H., Saito, T., Yano, T., Oono, Y., Ikematsu, H., Nomura, S., Sato, A., Kojima, M., Esumi, H., Ochiai, A.: Hypoxia imaging endoscopy equipped with laser light source from preclinical live animal study to first-in-human subject research. PloS one 9(6), e99055 (2014)
  • [13] Kirchner, T., Gröhl, J., Maier-Hein, L.: Context encoding enables machine learning-based quantitative photoacoustics. Journal of Biomedical Optics 23(5), 056008 (2018). DOI 10.1117/1.JBO.23.5.056008
  • [14] Kohl, S.A., Romera-Paredes, B., Meyer, C., De Fauw, J., Ledsam, J.R., Maier-Hein, K.H., Eslami, S., Rezende, D.J., Ronneberger, O.: A probabilistic u-net for segmentation of ambiguous images. arXiv preprint arXiv:1806.05034 (2018)
  • [15] Lakshminarayanan, B., Pritzel, A., Blundell, C.: Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles. In: I. Guyon, U.V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, R. Garnett (eds.) Advances in Neural Information Processing Systems 30, pp. 6402–6413. Curran Associates, Inc. (2017)
  • [16] Leibig, C., Allken, V., Ayhan, M.S., Berens, P., Wahl, S.: Leveraging uncertainty information from deep neural networks for disease detection. Scientific Reports 7(1), 17816 (2017). DOI 10.1038/s41598-017-17876-z
  • [17] Li, Y., Gal, Y.: Dropout Inference in Bayesian Neural Networks with Alpha-divergences. arXiv:1703.02914 [cs, stat] (2017)
  • [18] Maier-Hein*, L., Eisenmann*, M., Reinke, A., Onogur, S., Stankovic, M., Scholz, P., Arbel, T., Bogunovic, H., Bradley, A.P., Carass, A., Feldmann, C., Frangi, A.F., Full, P.M., van Ginneken, B., Hanbury, A., Honauer, K., Kozubek, M., Landman, B.A., März, K., Maier, O., Maier-Hein, K., Menze, B.H., Müller, H., Neher, P.F., Niessen, W., Rajpoot, N., Sharp, G.C., Sirinukunwattana, K., Speidel, S., Stock, C., Stoyanov, D., Taha, A.A., van der Sommen, F., Wang, C.W., Weber, M.A., Zheng, G., Jannin*, P., Kopp-Schneider*, A.: Is the winner really the best? A critical analysis of common research practice in biomedical image analysis competitions (2018)
  • [19] Moccia, S., Wirkert, S.J., Kenngott, H., Vemuri, A.S., Apitz, M., Mayer, B., De Momi, E., Mattos, L.S., Maier-Hein, L.: Uncertainty-aware organ classification for surgical data science applications in laparoscopy. IEEE Transactions on Biomedical Engineering (2018)
  • [20] Niculescu-Mizil, A., Caruana, R.: Predicting good probabilities with supervised learning. In: Proceedings of the 22nd International Conference on Machine Learning, pp. 625–632. ACM (2005)
  • [21] Nouri, D., Lucas, Y., Treuillet, S.: Efficient tissue discrimination during surgical interventions using hyperspectral imaging. In: International Conference on Information Processing in Computer-Assisted Interventions, pp. 266–275. Springer (2014). URL http://link.springer.com/chapter/10.1007/978-3-319-07521-1_28
  • [22] Nouri, D., Lucas, Y., Treuillet, S.: Hyperspectral interventional imaging for enhanced tissue visualization and discrimination combining band selection methods. International Journal of Computer Assisted Radiology and Surgery 11(12), 2185–2197 (2016). DOI 10.1007/s11548-016-1449-5. URL http://link.springer.com/10.1007/s11548-016-1449-5
  • [23] Smith, L., Gal, Y.: Understanding Measures of Uncertainty for Adversarial Example Detection. arXiv:1803.08533 [cs, stat] (2018)
  • [24] Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research 15(1), 1929–1958 (2014)
  • [25] Waibel, D., Gröhl, J., Isensee, F., Kirchner, T., Maier-Hein, K., Maier-Hein, L.: Reconstruction of initial pressure from limited view photoacoustic images using deep learning. In: Photons Plus Ultrasound: Imaging and Sensing 2018, vol. 10494, p. 104942S. International Society for Optics and Photonics (2018)
  • [26] Wang, L., Jacques, S.L., Zheng, L.: MCML—Monte Carlo modeling of light transport in multi-layered tissues. Computer Methods and Programs in Biomedicine 47(2), 131–146 (1995)
  • [27] Wirkert, S.J., Clancy, N.T., Stoyanov, D., Arya, S., Hanna, G.B., Schlemmer, H.P., Sauer, P., Elson, D.S., Maier-Hein, L.: Endoscopic Sheffield Index for Unsupervised In Vivo Spectral Band Selection. In: X. Luo, T. Reichl, D. Mirota, T. Soper (eds.) Computer-Assisted and Robotic Endoscopy, vol. 8899, pp. 110–120. Springer International Publishing, Cham (2014). URL http://www.springerprofessional.de/011—endoscopic-sheffield-index-for-unsupervised-in-vivo-spectral-band-selection/5457688.html
  • [28] Wirkert, S.J., Isensee, F., Vemuri, A.S., Maier-Hein, K., Fei, B., Maier-Hein, L.: Domain and task specific multispectral band selection (conference presentation). In: Design and Quality for Biomedical Technologies XI (2018). DOI 10.1117/12.2287824. URL https://doi.org/10.1117/12.2287824
  • [29] Wirkert, S.J., Kenngott, H., Mayer, B., Mietkowski, P., Wagner, M., Sauer, P., Clancy, N.T., Elson, D.S., Maier-Hein, L.: Robust near real-time estimation of physiological parameters from megapixel multispectral images with inverse Monte Carlo and random forest regression. International Journal of Computer Assisted Radiology and Surgery 11(6), 909–917 (2016)
  • [30] Wirkert, S.J., Vemuri, A.S., Kenngott, H.G., Moccia, S., Götz, M., Mayer, B.F., Maier-Hein, K.H., Elson, D.S., Maier-Hein, L.: Physiological parameter estimation from multispectral images unleashed. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 134–141. Springer (2017)
  • [31] Zhu, Y., Zabaras, N.: Bayesian deep convolutional encoder-decoder networks for surrogate modeling and uncertainty quantification. Journal of Computational Physics 366, 415–447 (2018). DOI 10.1016/j.jcp.2018.04.018