Faster Uncertainty Quantification for Inverse Problems with Conditional Normalizing Flows

07/15/2020 ∙ by Ali Siahkoohi, et al. ∙ Georgia Institute of Technology

In inverse problems, we often have access to data consisting of paired samples (x,y)∼ p_X,Y(x,y), where y are partial observations of a physical system and x represents the unknowns of the problem. Under these circumstances, we can employ supervised training to learn a solution x and its uncertainty from the observations y. We refer to this problem as the "supervised" case. However, the data y∼ p_Y(y) collected at one point could be distributed differently than observations y'∼ p_Y'(y') relevant for a current set of problems. In the context of Bayesian inference, we propose a two-step scheme, which makes use of normalizing flows and joint data to train a conditional generator q_θ(x|y) to approximate the target posterior density p_X|Y(x|y). Additionally, this preliminary phase provides a density function q_θ(x|y), which can be recast as a prior for the "unsupervised" problem, e.g. when only the observations y'∼ p_Y'(y'), a likelihood model y'|x, and a prior on x are known. We then train another invertible generator with output density q'_ϕ(x|y') specifically for y', allowing us to sample from the posterior p_X|Y'(x|y'). We present some synthetic results that demonstrate considerable training speedup when reusing the pretrained network q_θ(x|y) as a warm start or preconditioning for approximating p_X|Y'(x|y'), instead of learning from scratch. This training modality can be interpreted as an instance of transfer learning. This result is particularly relevant for large-scale inverse problems that employ expensive numerical simulations.


1 Introduction

Deep learning techniques have recently benefited inverse problems where the unknowns defining the state of a physical system and related observations are jointly available as solution-data paired samples [see, for example, 1]. Throughout the text, we will refer to this problem as the "supervised" case. Supervised learning can be readily applied by training a deep network to map the observations to the respective solution, often leading to competitive alternatives to solvers that are purely based on a physical model for the data likelihood (e.g. PDEs) and prior (handcrafted) regularization. Unfortunately, for many inverse problems such as seismic or optoacoustic imaging, data is scarce due to acquisition costs, processing is computationally complex because of numerical simulation, and the physical parameters of interest cannot be directly verified. Furthermore, as in the seismic case, the vast diversity of geological scenarios is bound to impact the generalization capacity of the learned model. For this type of problem, supervised methods still have limited scope with respect to more traditional "unsupervised" approaches, e.g. where observations pertaining to a single unknown are available, a data model and prior are postulated, and generalization errors do not affect the results. Note that recent work has found an application for deep networks even in the unsupervised setting, as a reparameterization of the unknowns and an implicit regularizing prior [deep prior, 2, 17, 5, 14, 16, 15] obtained by constraining the solution to the network range. Unless the network has been adequately pretrained, however, the deep prior approach does not offer computational advantages.

In practice, as is often the case in seismic or medical imaging, some legacy joint data might be available for supervised learning, while we might be interested in solving a problem related to some new observations, which are expected to come from a moderate perturbation of the legacy (marginal) distribution. In this work, we are interested in combining the supervised and unsupervised settings, as described above, by exploiting the supervised result as a way to accelerate the computation of the solution for the unsupervised problem. Clearly, this is all the more relevant when we wish to quantify the uncertainty in the proposed solution.

This paper is based on exploiting conditional normalizing flows [7, 20] as a way to encapsulate the joint distribution of observations/solutions for an inverse problem, and the posterior distribution of the solutions given data. Recent advancements have made available invertible flows that allow analytic computation of such posterior densities. Therefore, we propose a general two-step scheme which consists of: (i) learning a generative model from many (data, solution) pairs; (ii) given some new observations, solving for the associated posterior distribution given a data likelihood model and a prior density (possibly including the one obtained in step (i)).

2 Related work

Normalizing flow generative models are the cornerstone of our proposal, due to their ability to be trained with likelihood-based objectives without being subject to mode collapse. Many invertible layers and architectures are described in Dinh et al. [4], Dinh et al. [3], Kingma and Dhariwal [6], and Kruse et al. [7]. A fundamental aspect for their application to large-scale imaging problems is constant memory complexity as a function of the network depth. Examples for seismic imaging can be found in Peters et al. [10] and Rizzuti et al. [12], and for medical imaging in Putzky and Welling [11]. In this paper, we focus on uncertainty quantification for inverse problems, and we are therefore interested in the conditional flows described in Kruse et al. [7] as a way to capture posterior probabilities [see also 20].

Bayesian inference cast as a variational problem is a computationally attractive alternative to sampling based on Markov chain Monte Carlo (MCMC) methods. With particular relevance for our work, Parno and Marzouk [8] formulate transport-based maps as non-Gaussian proposal distributions in the context of the Metropolis-Hastings algorithm. The aim is to accelerate MCMC by adaptively fine-tuning the proposals to the target density, as samples are iteratively produced by the chain. The idea of preconditioning MCMC in Parno and Marzouk [8] directly inspires the approach presented in this work. Another relevant work involving MCMC is Peherstorfer and Marzouk [9], where the transport maps are constructed from a low-fidelity version of the original problem, thus yielding computational advantages. The supervised step of our approach can also be replaced, in principle, by a low-fidelity problem. The method proposed in this paper, however, does not make use of MCMC.

3 Method

We start this section by summarizing the uncertainty quantification method presented in Kruse et al. [7], in the supervised scenario where paired samples (x, y) ∼ p_X,Y(x, y) (coming from the joint unknown/data distribution) are available. We assume that an underlying physical modeling operator A exists, which defines the likelihood model y|x, y = A(x) + ε, where ε is a random variable representing noise. The scope is to learn a conditional normalizing flow

(1) T_θ : (x, y) ↦ (z_x, z_y),

as a way to quantify the uncertainty of the unknown x of an inverse problem, given data y. Here, Z_x and Z_y, with z_x ∈ Z_x and z_y ∈ Z_y, are the respective latent spaces. This is achieved by minimizing the Kullback-Leibler divergence between the push-forward density T_θ♯ p_X,Y and the standard normal distribution N(0, I):

(2) min_θ E_(x,y)∼p_X,Y [ ‖T_θ(x, y)‖²/2 − log |det J_T_θ(x, y)| ],

where J_T_θ is the Jacobian of T_θ. When T_θ is a conditional flow, e.g. defined by the triangular structure

(3) T_θ(x, y) = (T_θ,1(x, y), T_θ,2(y)),

conditional sampling given y is tantamount to fixing the data seed z_y = T_θ,2(y), evaluating T_θ⁻¹(z_x, z_y) for a random Gaussian z_x, and selecting the x component. Moreover, we can analytically evaluate the approximated posterior density:

(4) q_θ(x|y) = p_Z_x(T_θ,1(x, y)) |det ∇_x T_θ,1(x, y)|.
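To make the conditional-sampling and density-evaluation mechanics of Equations (1)–(4) concrete, here is a minimal numerical sketch with a hand-made affine conditional flow; the map T1, the parameters a and s, and the observed value y_obs are all hypothetical, chosen only for illustration. Fixing the data seed and inverting the x-component produces conditional samples, and the change-of-variables formula of Equation (4) recovers their density.

```python
import numpy as np

# Toy affine conditional flow (hypothetical parameters, for illustration):
#   T_1(x, y) = (x - a*y) / s   -- x-component, conditioned on y
#   T_2(y)    = y               -- data component
# Its push-forward is standard normal exactly when x | y ~ N(a*y, s^2),
# so Equation (4) should recover that Gaussian density.
a, s = 0.7, 1.5

def T1(x, y):
    return (x - a * y) / s

def q_posterior(x, y):
    """Equation (4): q(x|y) = N(T_1(x, y); 0, 1) * |dT_1/dx|."""
    z = T1(x, y)
    return np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi) / s  # |dT_1/dx| = 1/s

def sample_posterior(y, n, rng):
    """Conditional sampling: fix z_y = T_2(y), draw z_x ~ N(0, 1), invert."""
    z_x = rng.standard_normal(n)
    return s * z_x + a * y  # inverse of T_1 in the x-component

rng = np.random.default_rng(0)
y_obs = 2.0
samples = sample_posterior(y_obs, 100_000, rng)
print(samples.mean(), samples.std())  # close to a*y_obs = 1.4 and s = 1.5
```

In a trained flow, T_θ,1 is a deep invertible network rather than an affine map, but the sampling and density formulas are used in exactly the same way.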

We now assume that a density q_θ̂(x|y) as in Equation (4) has been determined, and we are given new observations y', sampled from a marginal p_Y'(y') closely related to p_Y(y). Note that y' might be obtained with a different forward operator, a different noise distribution, or an out-of-prior-distribution unknown. In particular, we assume a different likelihood model

(5) y' = A'(x) + ε'.

We are interested in obtaining samples from the posterior

(6) p_X|Y'(x|y') ∝ p_Y'|X(y'|x) p_X(x),

with prior p_X(x), or even with the prior q_θ̂(·|y') as defined in Equation (4), which corresponds to reusing the supervised posterior as the new prior. Similarly to the previous step, we can set up a variational problem

(7) min_ϕ E_z∼N(0,I) [ −log p_Y'|X(y'|G_ϕ(z)) − log p_X(G_ϕ(z)) − log |det J_G_ϕ(z)| ],

where we minimize over the set of invertible maps

(8) G_ϕ : z ↦ x.

After training, samples from q'_ϕ̂(x|y') are obtained by evaluating G_ϕ̂(z) for z ∼ N(0, I).
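As a sanity check on the objective in Equation (7), the following sketch evaluates a Monte-Carlo estimate of it for a scalar problem with Gaussian likelihood and prior, using an affine invertible map G(z) = m + c·z; the numerical values a, sig2, and y_obs are hypothetical. Since the true posterior is Gaussian here, the affine family contains it, and the analytic posterior parameters should attain a lower objective value than perturbed ones.

```python
import numpy as np

# Scalar toy problem (hypothetical numbers):
#   likelihood  y' = a*x + eps,  eps ~ N(0, sig2);  prior  x ~ N(0, 1).
a, sig2, y_obs = 2.0, 0.5, 3.0
var_post = 1.0 / (1.0 + a**2 / sig2)        # analytic posterior variance
mu_post = var_post * a * y_obs / sig2       # analytic posterior mean

def objective(m, c, z):
    """Monte-Carlo estimate of Equation (7) for G(z) = m + c*z, up to constants:
    E_z[-log p(y'|G(z)) - log p_X(G(z))] - log |J_G|, with log |J_G| = log c."""
    x = m + c * z
    neg_log_lik = 0.5 * (y_obs - a * x) ** 2 / sig2
    neg_log_prior = 0.5 * x**2
    return np.mean(neg_log_lik + neg_log_prior) - np.log(c)

z = np.random.default_rng(1).standard_normal(200_000)
loss_opt = objective(mu_post, np.sqrt(var_post), z)     # analytic optimum
loss_off = objective(mu_post + 0.5, 2.0 * np.sqrt(var_post), z)
print(loss_opt < loss_off)  # the analytic posterior minimizes Equation (7)
```

The same objective carries over verbatim to the high-dimensional case, with log c replaced by the log-determinant of the Jacobian of G_ϕ.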

For the problem in Equation (7), we can initialize the network G_ϕ randomly. However, if we expect the supervised problem (2) and the unsupervised counterpart (7) to be related, we can reuse the supervised result θ̂ as a warm start for ϕ, e.g.

(9) ϕ_0 = P(θ̂),

where P is the projection of the supervised parameters onto the parameter space of G_ϕ. By doing so, the problem in Equation (7) with the initialization in Equation (9) can be interpreted as an instance of transfer learning [21, 13]. Alternatively, by analogy with the technique of preconditioning for linear systems, we can introduce the change of variables

(10) G_ϕ = T_θ̂⁻¹(·, z_y') ∘ G'_ϕ,

and solve for G'_ϕ instead of G_ϕ.
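The practical effect of the warm start is simply that optimization begins near the target posterior instead of at the prior. The sketch below illustrates this on the scalar toy problem above; the "pretrained" initialization is an invented, slightly perturbed version of the analytic posterior, standing in for the supervised result.

```python
import numpy as np

# Scalar toy posterior (hypothetical numbers): y' = a*x + eps, prior x ~ N(0, 1).
a, sig2, y_obs = 2.0, 0.5, 3.0
var_post = 1.0 / (1.0 + a**2 / sig2)
mu_post = var_post * a * y_obs / sig2

def objective(m, c, z):
    """Monte-Carlo estimate of Equation (7) for G(z) = m + c*z, up to constants."""
    x = m + c * z
    return np.mean(0.5 * (y_obs - a * x) ** 2 / sig2 + 0.5 * x**2) - np.log(c)

z = np.random.default_rng(1).standard_normal(100_000)

# Cold start: G initialized at the prior, i.e. m = 0, c = 1.
loss_cold = objective(0.0, 1.0, z)
# Warm start: G initialized from a (hypothetical) pretrained map that already
# approximates the posterior -- here a slightly perturbed version of it.
loss_warm = objective(mu_post + 0.05, 1.1 * np.sqrt(var_post), z)
print(loss_warm < loss_cold)  # warm start begins much closer to the optimum
```

The preconditioned formulation (10) achieves the same effect without touching the initialization of G'_ϕ: when G'_ϕ starts near the identity, the composed map already reproduces the supervised posterior.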

4 Numerical experiments

In this section, we present some synthetic examples aimed at verifying the speed-up anticipated from the two-step preconditioning scheme. The first example is a low-dimensional problem where the posterior density can be calculated analytically, against which we can verify our solution. The second example constitutes a preliminary assessment for the type of inverse problem applications we are mostly interested in, e.g. seismic or optoacoustic imaging.

4.1 Gaussian likelihood and prior

Here, we consider unknowns x ∼ p_X(x). The prior density is a normal distribution with mean μ_X and covariance Σ_X. Observations are y ∼ p_Y(y), and we consider the following likelihood model y|x:

(11) y = A x + ε, ε ∼ N(μ_E, Σ_E).

The noise mean and covariance are chosen to be μ_E and Σ_E = σ² I (I being the identity matrix). The forward operator A is a realization of a random matrix variable with independent, identically distributed entries. We trained a conditional invertible network to jointly sample from p_X,Y(x, y).

Let us assume now that new observations y' have been collected. We generated those observations according to

(12) y' = A' x + ε',

with a perturbed forward operator A', and the same noise distribution as before, ε' ∼ N(μ_E, Σ_E). The likelihood model for y', in conjunction with the same prior as in the supervised case, defines the unsupervised problem.
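Because both the prior and the likelihood in Equation (11) are Gaussian, the posterior is available in closed form via the conjugate-Gaussian update, which is what makes this example a useful reference. A sketch of that computation follows; the dimension and parameter values are illustrative placeholders, not the ones used in the experiment.

```python
import numpy as np

# Analytic posterior for the Gaussian problem of Equation (11), serving as
# the reference solution (placeholder dimensions and parameters).
rng = np.random.default_rng(42)
n = 4
mu_x, Sigma_x = np.zeros(n), np.eye(n)        # prior N(mu_x, Sigma_x)
mu_e, sigma2 = np.zeros(n), 0.1               # noise N(mu_e, sigma2 * I)
A = rng.standard_normal((n, n))               # random forward operator

x_true = rng.multivariate_normal(mu_x, Sigma_x)
y = A @ x_true + mu_e + np.sqrt(sigma2) * rng.standard_normal(n)

# Conjugate-Gaussian update:
#   Sigma_post = (Sigma_x^-1 + A^T A / sigma2)^-1
#   mu_post    = Sigma_post (Sigma_x^-1 mu_x + A^T (y - mu_e) / sigma2)
precision = np.linalg.inv(Sigma_x) + A.T @ A / sigma2
Sigma_post = np.linalg.inv(precision)
mu_post = Sigma_post @ (np.linalg.solve(Sigma_x, mu_x) + A.T @ (y - mu_e) / sigma2)
print(mu_post.round(3))
```

The learned conditional mean and covariance can then be compared entrywise against mu_post and Sigma_post, as done in Figure 1.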

The uncertainty quantification results for the supervised (11) and unsupervised problem (12) are compared in Figure 1.

Figure 1: Comparison of the posterior mean (a) and covariance (b–d) obtained from supervised and unsupervised training for the Gaussian problem. Note that the results are supposed to differ, due to different prior and observation models. The analytic mean and covariance here refer to the unsupervised problem.

We study the convergence history for unsupervised training with and without warm start, as described in Equation (9). The plot in Figure 2 makes clear the computational superiority of the warm start approach.

Figure 2: Loss decay for the unsupervised Gaussian problem as a function of iterations. The comparison highlights the efficiency of the warm start strategy compared to training from scratch.

4.2 Seismic images

Now we consider the denoising problem for 2D "seismic" images x, which are selected from the processed 3D seismic survey reported in Veritas [18] and WesternGeco [19]. The dataset has been obtained by selecting 2D patches from the original 3D volume, which are then subsampled to a fixed pixel size. The dataset is normalized.

Observations y are obtained simply by adding noise:

(13) y = x + ε, ε ∼ N(0, σ² I),

with a prescribed noise level σ. Examples of (x, y) pairs are collected in Figure 3. As in the previous example, we consider a preliminary stage for supervised training via conditional normalizing flows.

Figure 3: Unknown and observation pairs for the supervised seismic image problem (first row: x; second row: y).

We now turn to the unsupervised problem defined by the observation likelihood

(14) y' = A' x + ε', ε' ∼ N(0, σ'² I).

Note that a forward operator A' has been introduced, contrary to Equation (13). Here, A' is a compressive-sensing matrix with a fixed subsampling rate. The ground truth for the observations y' has been selected from a test set not contemplated during the supervised training phase. As a prior for x, we select the posterior density obtained from the supervised pretraining in the previous step, conditioned on y' (see Equation (4)).
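A compressive-sensing forward operator of the kind used in Equation (14) can be sketched as a restriction matrix that keeps a random subset of pixels; the image size (16×16) and subsampling rate (25%) below are illustrative assumptions, not the experiment's actual settings.

```python
import numpy as np

def restriction_matrix(n_pixels, rate, rng):
    """Rows of the identity corresponding to a random subset of pixels."""
    m = int(round(rate * n_pixels))
    keep = rng.choice(n_pixels, size=m, replace=False)
    return np.eye(n_pixels)[np.sort(keep)]

rng = np.random.default_rng(0)
M = restriction_matrix(n_pixels=16 * 16, rate=0.25, rng=rng)  # assumed sizes
x = rng.standard_normal(16 * 16)                 # flattened image patch
y_prime = M @ x + 0.05 * rng.standard_normal(M.shape[0])  # noisy subsampled data
print(M.shape)  # (64, 256)
```

Each row of M selects one retained pixel, so M Mᵀ is the identity on the subsampled space, which makes adjoints and least-squares terms cheap to evaluate.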

Again, comparing the loss decay during training for two different instances of the unsupervised problem in Figure 4 makes clear that a considerable speed-up is obtained with the warm start strategy relative to training a randomly initialized invertible network.

Figure 4: Loss decays for two different instances of the unsupervised problems related to seismic images. As in the previous example, training with a warm start strategy evidently requires fewer iterations than training from scratch in order to reach the same loss value.

Moreover, despite the relatively high number of iterations run during training, the network initialized from scratch does not produce a result comparable to the ground truth. This can be seen by comparing the ground truth in Figure 5(a) and the conditional mean in Figure 5(e), relative to the posterior distribution obtained from training a network from scratch. The comparison with the ground truth is much more favorable for the conditional mean obtained from warm-start training, shown in Figure 5(c). Pointwise standard deviations for these different training modalities can also be inspected in Figures 5(d) (warm start) and 5(f) (without warm start). The results discussed above are related to the loss function depicted in Figure 4(a). The same results for a different realization of the unsupervised problem, with loss function shown in Figure 4(b), can be seen in Figure 6.

Figure 5: Comparison of the posterior distribution obtained from training a network with warm start and from scratch for the unsupervised seismic image problem. Figures (a) and (b) depict the ground truth and related observations. Figures (c) and (e) represent the respective conditional means, while (d) and (f) refer to the pointwise standard deviation. Note how the result in (c) provides a better estimation of the ground truth (a) compared to (e).

Figure 6: Comparison of the posterior distribution obtained from training a network with warm start and from scratch for the unsupervised seismic image problem. Figures (a) and (b) depict the ground truth and related observations. Figures (c) and (e) represent the respective conditional means, while (d) and (f) refer to the pointwise standard deviation. Note how the result in (c) provides a better estimation of the ground truth (a) compared to (e).

5 Conclusions

We presented a preconditioning scheme for uncertainty quantification, particularly aimed at inverse problems characterized by computationally expensive numerical simulations based on PDEs (including, for example, seismic or optoacoustic imaging). We consider the problem where legacy supervised data is available, and we want to solve a new inverse problem given some out-of-distribution observations. The scheme takes advantage of a preliminary step where the joint distribution of solutions and related observations is learned via supervised learning. This joint distribution is then employed to precondition the unsupervised inverse problem. In both the supervised and unsupervised cases, we make use of conditional normalizing flows to ease computational complexity (fundamental for large 3D applications), and to be able to encode the approximated posterior density analytically. In this way, the posterior density obtained from the supervised problem can be reused as a new prior for the unsupervised problem.

The synthetic experiments confirm that the preconditioning scheme accelerates unsupervised training considerably. The examples considered here are encouraging for seismic or optoacoustic imaging applications, but additional challenges are expected at large scale, due to the high dimensionality of the solution and observation spaces and to expensive wave equation solvers.

References

  • [1] J. Adler and O. Öktem (2017) Solving ill-posed inverse problems using iterative deep neural networks. Inverse Problems 33 (12), pp. 124007.
  • [2] A. Bora, A. Jalal, E. Price, and A. G. Dimakis (2017) Compressed sensing using generative models. arXiv:1703.03208.
  • [3] L. Dinh, J. Sohl-Dickstein, and S. Bengio (2016) Density estimation using Real NVP. arXiv:1605.08803.
  • [4] L. Dinh, D. Krueger, and Y. Bengio (2014) NICE: non-linear independent components estimation. arXiv:1410.8516.
  • [5] F. J. Herrmann, A. Siahkoohi, and G. Rizzuti (2019) Learned imaging with constraints and uncertainty quantification. In Neural Information Processing Systems (NeurIPS) 2019 Deep Inverse Workshop.
  • [6] D. P. Kingma and P. Dhariwal (2018) Glow: generative flow with invertible 1x1 convolutions. arXiv:1807.03039.
  • [7] J. Kruse, G. Detommaso, R. Scheichl, and U. Köthe (2019) HINT: hierarchical invertible neural transport for density estimation and Bayesian inference. arXiv:1905.10687.
  • [8] M. D. Parno and Y. M. Marzouk (2018) Transport map accelerated Markov chain Monte Carlo. SIAM/ASA Journal on Uncertainty Quantification 6 (2), pp. 645–682.
  • [9] B. Peherstorfer and Y. Marzouk (2018) A transport-based multifidelity preconditioner for Markov chain Monte Carlo. arXiv:1808.09379.
  • [10] B. Peters, E. Haber, and K. Lensink (2020) Fully reversible neural networks for large-scale surface and sub-surface characterization via remote sensing. arXiv:2003.07474.
  • [11] P. Putzky and M. Welling (2019) Invert to learn to invert. arXiv:1911.10914.
  • [12] G. Rizzuti, A. Siahkoohi, P. A. Witte, and F. J. Herrmann (2020) Parameterizing uncertainty by deep invertible networks, an application to reservoir characterization. In SEG Technical Program Expanded Abstracts 2020.
  • [13] A. Siahkoohi, M. Louboutin, and F. J. Herrmann (2019) The importance of transfer learning in seismic modeling and imaging. Geophysics 84 (6), pp. A47–A52.
  • [14] A. Siahkoohi, G. Rizzuti, and F. J. Herrmann (2020) A deep-learning based Bayesian approach to seismic imaging and uncertainty quantification. In 82nd EAGE Conference and Exhibition 2020.
  • [15] A. Siahkoohi, G. Rizzuti, and F. J. Herrmann (2020) Uncertainty quantification in imaging and automatic horizon tracking—a Bayesian deep-prior based approach. In SEG Technical Program Expanded Abstracts 2020.
  • [16] A. Siahkoohi, G. Rizzuti, and F. J. Herrmann (2020) Weak deep priors for seismic imaging. In SEG Technical Program Expanded Abstracts 2020.
  • [17] D. Ulyanov, A. Vedaldi, and V. Lempitsky (2020) Deep image prior. International Journal of Computer Vision.
  • [18] Veritas (2005) Parihaka 3D Marine Seismic Survey - Acquisition and Processing Report. Technical Report, New Zealand Petroleum Report 3460, New Zealand Petroleum & Minerals, Wellington.
  • [19] WesternGeco (2012) Parihaka 3D PSTM Final Processing Report. Technical Report, New Zealand Petroleum Report 4582, New Zealand Petroleum & Minerals, Wellington.
  • [20] C. Winkler, D. Worrall, E. Hoogeboom, and M. Welling (2019) Learning likelihoods with conditional normalizing flows. arXiv:1912.00042.
  • [21] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson (2014) How transferable are features in deep neural networks? In Proceedings of the 27th International Conference on Neural Information Processing Systems, NIPS'14, pp. 3320–3328.