Deep learning techniques have recently benefited inverse problems where the unknowns defining the state of a physical system and the related observations are jointly available as solution-data paired samples [see, for example, 1]. Throughout the text, we will refer to this setting as the "supervised" case. Supervised learning can be readily applied by training a deep network to map observations to the respective solutions, often yielding competitive alternatives to solvers purely based on a physical model for the data likelihood (e.g., PDEs) and prior (handcrafted) regularization. Unfortunately, for many inverse problems, such as seismic or optoacoustic imaging, data is scarce due to acquisition costs, processing is computationally complex because of numerical simulation, and the physical parameters of interest cannot be directly verified. Furthermore, as in the seismic case, the vast diversity of geological scenarios is bound to impact the generalization capacity of the learned model. For this type of problem, supervised methods still have limited scope with respect to more traditional "unsupervised" approaches, where observations pertaining to a single unknown are available, a data model and prior are postulated, and generalization errors do not affect the results. Note that recent work has found an application for deep networks even in the unsupervised setting, as a reparameterization of the unknowns and an implicit regularizing prior [deep prior, 2, 17, 5, 14, 16, 15] obtained by constraining the solution to the network's range. Unless the network has been adequately pretrained, however, the deep prior approach does not offer computational advantages.
In practice, as is often the case in seismic or medical imaging, some legacy joint data might be available for supervised learning, while we might be interested in solving a problem related to some new observations, which are expected to come from a moderate perturbation of the legacy (marginal) distribution. In this work, we are interested in combining the supervised and unsupervised settings described above, by exploiting the supervised result as a way to accelerate the computation of the solution of the unsupervised problem. Clearly, this is all the more relevant when we wish to quantify the uncertainty in the proposed solution.
Our proposal relies on normalizing flows as a way to encapsulate the joint distribution of observations and solutions of an inverse problem, as well as the posterior distribution of the solutions given data. Recent advancements have made available invertible flows that allow analytic computation of such posterior densities. Therefore, we propose a general two-step scheme which consists of: (i) learning a generative model from many (data, solution) pairs; (ii) given some new observations, solving for the associated posterior distribution given a data likelihood model and a prior density (possibly the one obtained in step (i)).
2 Related work
Normalizing flow generative models are the cornerstone of our proposal, owing to their ability to be trained with likelihood-based objectives and to their immunity to mode collapse. Many invertible layers and architectures are described in Dinh et al. (2014), Dinh et al. (2016), Kingma and Dhariwal (2018), and Kruse et al. (2019). A fundamental aspect for their application to large-scale imaging problems is a constant memory complexity as a function of the network depth. Examples for seismic imaging can be found in Peters et al. (2020) and Rizzuti et al. (2020), and for medical imaging in Putzky and Welling (2019). In this paper, we will focus on uncertainty quantification for inverse problems, and we are therefore interested in the conditional flows described in Kruse et al. (2019), as a way to capture posterior probabilities [see also 20].
Bayesian inference cast as a variational problem is a computationally attractive alternative to sampling based on Markov chain Monte Carlo methods (MCMC). With particular relevance for our work, Parno and Marzouk (2018) formulate transport-based maps as non-Gaussian proposal distributions in the context of the Metropolis-Hastings algorithm. The aim is to accelerate MCMC by adaptively fine-tuning the proposals to the target density, as samples are iteratively produced by the chain. The idea of preconditioning MCMC in Parno and Marzouk (2018) directly inspires the approach presented in this work. Another relevant work involving MCMC is Peherstorfer and Marzouk (2018), where the transport maps are constructed from a low-fidelity version of the original problem, thus yielding computational advantages. The supervised step of our approach can also be replaced, in principle, by a low-fidelity problem. The method proposed in this paper, however, does not make use of MCMC.
3 Method
We start this section by summarizing the uncertainty quantification method presented in Kruse et al. (2019), in the supervised scenario where paired samples $(x, y)$, coming from the joint unknown/data distribution $p(x, y)$, are available. We assume that an underlying physical modeling operator $F$ exists, which defines the likelihood model $p(y|x)$,
$$y = F(x) + \epsilon, \qquad (1)$$
where $\epsilon$ is a random variable representing noise. The scope is to learn a conditional normalizing flow
$$T_w : X \times Y \to Z_x \times Z_y$$
as a way to quantify the uncertainty of the unknown $x$ of an inverse problem, given data $y$. Here, $x \in X$, $y \in Y$, and $Z_x$, $Z_y$ are the respective latent spaces. This is achieved by minimizing the Kullback-Leibler divergence between the push-forward density $T_{w\#}\,p(x, y)$ and the standard normal distribution:
$$\min_w \; \mathbb{E}_{(x,y)\sim p(x,y)} \left[ \tfrac{1}{2}\, \lVert T_w(x, y) \rVert^2 - \log \left| \det J_{T_w}(x, y) \right| \right], \qquad (2)$$
where $J_{T_w}$ is the Jacobian of $T_w$. When $T_w$ is a conditional flow, e.g. defined by the triangular structure
$$T_w(x, y) = \big( T_{w,y}(y),\; T_{w,x}(x; y) \big), \qquad (3)$$
conditional sampling given $y$ is tantamount to fixing the data seed $z_y = T_{w,y}(y)$, evaluating $T_w^{-1}(z_x, z_y)$ for a random Gaussian $z_x \sim \mathcal{N}(0, I)$, and selecting the $x$ component. Moreover, we can analytically evaluate the approximated posterior density:
$$\hat{p}_w(x \,|\, y) = \mathcal{N}\big( T_{w,x}(x; y) \,\big|\, 0, I \big)\, \left| \det J_{T_{w,x}}(x; y) \right|. \qquad (4)$$
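To make the conditional sampling recipe concrete, here is a minimal numerical sketch with a hypothetical scalar triangular flow; the coefficients `a`, `b`, `c` stand in for trained network weights and are illustrative assumptions, not the architecture used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical scalar triangular flow T(x, y) = (T_y(y), T_x(x; y)):
#   z_y = a * y            (data half, independent of x)
#   z_x = b * x + c * y    (solution half, conditioned on y)
a, b, c = 2.0, 0.5, -1.0

def T(x, y):
    return b * x + c * y, a * y          # (z_x, z_y)

def T_inv(z_x, z_y):
    y = z_y / a
    x = (z_x - c * y) / b
    return x, y

def sample_posterior(y_obs, n):
    """Conditional sampling: freeze the data seed z_y = T_y(y_obs),
    draw Gaussian z_x, invert, and keep the x component."""
    z_x = rng.standard_normal(n)
    x, _ = T_inv(z_x, np.full(n, a * y_obs))
    return x

def posterior_density(x, y_obs):
    """Analytic approximated posterior: the standard normal density
    evaluated at T_x(x; y), times the Jacobian factor |dT_x/dx| = |b|."""
    z_x = b * x + c * y_obs
    return np.exp(-0.5 * z_x**2) / np.sqrt(2 * np.pi) * abs(b)

samples = sample_posterior(y_obs=1.0, n=50_000)
```

Note that the density is normalized by construction: the Jacobian factor exactly compensates the change of variable, which is the property that makes the analytic evaluation in Equation (4) possible.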
We now assume that a map $T_w$ as in Equation (4) has been determined, and we are given new observations $\bar{y}$, sampled from a marginal $\bar{p}(y)$ closely related to $p(y)$. Note that $\bar{y}$ might be obtained with a different forward operator, a different noise distribution, or an out-of-prior-distribution unknown. In particular, we assume a different likelihood model
$$\bar{y} = \bar{F}(x) + \bar{\epsilon}. \qquad (5)$$
We are interested in obtaining samples from the posterior
$$p(x \,|\, \bar{y}) \propto p(\bar{y} \,|\, x)\, p(x), \qquad (6)$$
with prior $p(x)$, or even $\hat{p}_w(x \,|\, y)$ as defined in Equation (4), which corresponds to reusing the supervised posterior as the new prior. Similarly to the previous step, we can set up a variational problem
$$\min_\phi \; \mathbb{E}_{z \sim \mathcal{N}(0, I)} \left[ -\log p\big(\bar{y} \,\big|\, G_\phi(z)\big) - \log p\big(G_\phi(z)\big) - \log \left| \det J_{G_\phi}(z) \right| \right], \qquad (7)$$
where we minimize over the set of invertible maps $G_\phi : Z_x \to X$. After training, samples from $p(x \,|\, \bar{y})$ are obtained by evaluating $G_\phi(z)$ for $z \sim \mathcal{N}(0, I)$.
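The variational problem can be illustrated on a toy scalar instance where the true posterior is known in closed form; the Gaussian likelihood and prior, the affine map, and all numerical values below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy scalar instance of the variational problem, an assumed setup:
# likelihood  y_bar = x + eps,  eps ~ N(0, sigma^2);  prior  x ~ N(0, 1).
# The map G is affine, G(z) = m + exp(log_s) * z, so log|det J_G| = log_s.
sigma, y_bar = 0.5, 1.2
z = rng.standard_normal(10_000)   # fixed latent batch (common random numbers)

def loss(m, log_s):
    x = m + np.exp(log_s) * z                  # pushforward samples G(z)
    nll = 0.5 * (y_bar - x) ** 2 / sigma**2    # -log p(y_bar | x), up to a constant
    nlp = 0.5 * x ** 2                         # -log p(x), up to a constant
    return np.mean(nll + nlp) - log_s          # minus the log-determinant term

# Crude finite-difference gradient descent, enough for two parameters.
m, log_s, lr, h = 0.0, 0.0, 0.05, 1e-5
for _ in range(2000):
    gm = (loss(m + h, log_s) - loss(m - h, log_s)) / (2 * h)
    gs = (loss(m, log_s + h) - loss(m, log_s - h)) / (2 * h)
    m, log_s = m - lr * gm, log_s - lr * gs
```

For this conjugate case the true posterior is $\mathcal{N}(\bar{y}/(1+\sigma^2),\, \sigma^2/(1+\sigma^2))$, so the optimization should drive `m` toward $0.96$ and `exp(log_s)` toward $\approx 0.447$, up to Monte-Carlo error.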
For the problem in Equation (7), we can initialize the network randomly. However, if we expect the supervised problem (2) and the unsupervised counterpart (7) to be related, we can reuse the supervised result as a warm start for $G_\phi$, e.g.
$$G_{\phi_0}(z) = \Pi_X \, T_w^{-1}\big(z, T_{w,y}(\bar{y})\big), \qquad (9)$$
where $\Pi_X$ is the projection on $X$. By doing so, the problem in Equation (7), warm-started as in Equation (9), can be interpreted as an instance of transfer learning [21, 13]. Alternatively, by analogy with the technique of preconditioning for linear systems, we can introduce the change of variable
$$G_\phi(z) = \Pi_X \, T_w^{-1}\big( \tilde{G}_{\tilde{\phi}}(z), T_{w,y}(\bar{y}) \big), \qquad (10)$$
and solve for $\tilde{G}_{\tilde{\phi}}$ instead of $G_\phi$.
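The two strategies (warm start versus change of variable) can be contrasted in a short sketch; the affine maps below are hypothetical stand-ins for the pretrained conditional flow and for the new map, not the paper's networks.

```python
import numpy as np

# Pretrained conditional sampler for the new data, z -> x (assumed affine).
w_pre = {"scale": 1.5, "shift": 0.3}

def f_pretrained(z):
    return w_pre["scale"] * z + w_pre["shift"]

# Option 1 -- warm start: copy the pretrained weights into G,
# then keep training those weights on the unsupervised objective.
w = dict(w_pre)
def G(z):
    return w["scale"] * z + w["shift"]

# Option 2 -- preconditioning: change of variable, G = f_pretrained
# composed with G_tilde, where G_tilde starts as the identity and is
# the only map being trained.
w_tilde = {"scale": 1.0, "shift": 0.0}
def G_tilde(z):
    return w_tilde["scale"] * z + w_tilde["shift"]

def G_preconditioned(z):
    return f_pretrained(G_tilde(z))
```

At initialization the two options coincide (`G` and `G_preconditioned` are the same map); they differ in which parameters the subsequent optimization updates, mirroring preconditioning for linear systems.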
4 Numerical experiments
In this section, we present some synthetic examples aimed at verifying the speed-up anticipated from the two-step preconditioning. The first example is a low-dimensional problem where the posterior density can be calculated analytically, against which we can verify our solution. The second example constitutes a preliminary assessment for the type of inverse problem applications we are most interested in, e.g. seismic or optoacoustic imaging.
4.1 Gaussian likelihood and prior
Here, we consider unknowns $x \in \mathbb{R}^n$. The prior density is a normal distribution with mean $\mu_x$ and covariance $\Sigma_x$. Observations are $y \in \mathbb{R}^m$, and we consider the following likelihood model:
$$y = A x + \epsilon, \qquad \epsilon \sim \mathcal{N}(\mu_\epsilon, \Sigma_\epsilon). \qquad (11)$$
Mean and covariance of the noise are chosen to be $\mu_\epsilon = 0$ and $\Sigma_\epsilon = \sigma^2 I$ ($I$ being the identity matrix). The forward operator $A$ is a realization of a random matrix variable with independent entries. We trained a conditional invertible network to jointly sample from $p(x, y)$.
Let us assume now that new observations $\bar{y}$ have been collected. We generated those observations according to
$$\bar{y} = \bar{A} x + \bar{\epsilon}, \qquad (12)$$
with a perturbed forward operator $\bar{A}$ and the same noise distribution as before, $\bar{\epsilon} \sim \mathcal{N}(\mu_\epsilon, \Sigma_\epsilon)$. The likelihood model for $\bar{y}$, in conjunction with the same prior as in the supervised case, defines the unsupervised problem.
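Since this linear-Gaussian model is conjugate, the reference posterior can be computed in closed form. Below is a sketch of that computation; the sizes and parameter values are illustrative assumptions, not those of the actual experiment.

```python
import numpy as np

rng = np.random.default_rng(2)

# Linear-Gaussian benchmark with assumed (illustrative) sizes and parameters.
n, m = 4, 6
mu_x, Sigma_x = np.zeros(n), np.eye(n)            # prior N(mu_x, Sigma_x)
mu_e, Sigma_e = np.zeros(m), 0.1 * np.eye(m)      # noise N(mu_e, Sigma_e)
A = rng.standard_normal((m, n)) / np.sqrt(m)      # one draw of the random operator

x_true = rng.multivariate_normal(mu_x, Sigma_x)
y = A @ x_true + rng.multivariate_normal(mu_e, Sigma_e)

# Conjugate update: the posterior p(x | y) is Gaussian with
#   Sigma_post = (A^T Sigma_e^{-1} A + Sigma_x^{-1})^{-1}
#   mu_post    = Sigma_post (A^T Sigma_e^{-1} (y - mu_e) + Sigma_x^{-1} mu_x)
Sigma_e_inv = np.linalg.inv(Sigma_e)
Sigma_x_inv = np.linalg.inv(Sigma_x)
Sigma_post = np.linalg.inv(A.T @ Sigma_e_inv @ A + Sigma_x_inv)
mu_post = Sigma_post @ (A.T @ Sigma_e_inv @ (y - mu_e) + Sigma_x_inv @ mu_x)
```

The pair `(mu_post, Sigma_post)` is the analytic ground truth against which the samples and densities produced by the trained networks can be compared.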
4.2 Seismic images
Now we consider the denoising problem for 2D "seismic" images $x$, which are selected from the processed 3D seismic survey reported in Veritas (2005) and WesternGeco (2012). The dataset has been obtained by selecting 2D patches from the original 3D volume, which are then subsampled. The dataset is normalized.
Observations are obtained simply by adding noise:
$$y = x + \epsilon, \qquad \epsilon \sim \mathcal{N}(0, \sigma^2 I). \qquad (13)$$
Examples of $(x, y)$ pairs are collected in Figure 3. As in the previous examples, we consider a preliminary stage for supervised training via conditional normalizing flows.
We now turn to the unsupervised problem defined by the observation likelihood
$$\bar{y} = \bar{A} x + \bar{\epsilon}, \qquad \bar{\epsilon} \sim \mathcal{N}(0, \bar{\sigma}^2 I). \qquad (14)$$
Note that a forward operator $\bar{A}$ has been introduced, contrary to Equation (13). Here, $\bar{A}$ is a compressive sensing matrix with a fixed subsampling rate. The ground truth for the observations $\bar{y}$ has been selected from a test set not contemplated during the supervised training phase. As a prior for $x$, we select the posterior distribution given $y$ which has been pretrained with supervision in the previous step (see Equation (4)).
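A minimal sketch of such an observation model, where the compressive sensing matrix restricts to a random subset of pixels; the image size, subsampling rate, and noise level below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical compressive sensing operator: keep a random subset of pixels.
# The size, rate, and sigma are illustrative, not the experiment's values.
n_pixels, rate = 256, 0.5
kept = np.sort(rng.choice(n_pixels, size=int(rate * n_pixels), replace=False))

def forward(x, sigma=0.1):
    """Apply the sensing matrix (a row restriction of the identity)
    and add Gaussian noise, as in the unsupervised likelihood."""
    return x[kept] + sigma * rng.standard_normal(kept.size)

x_bar = rng.standard_normal(n_pixels)   # stand-in for a test seismic image
y_bar = forward(x_bar)
```

Restriction operators of this kind are cheap to apply and make the subsampling rate explicit, which is convenient when studying how the posterior spread grows as fewer pixels are observed.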
Again, comparing the loss decay during training for two different instances of the unsupervised problem in Figure 4 makes clear that a considerable speed-up is obtained with the warm-start strategy, relative to training a randomly initialized invertible network.
Moreover, despite the relatively high number of iterations run during training, the network initialized from scratch does not produce a result comparable to the ground truth. This can be seen by comparing the ground truth in Figure 5(a) and the conditional mean in Figure 5(e), relative to the posterior distribution obtained from training a network from scratch. The comparison with the ground truth is much more favorable for the conditional mean obtained from the warm-start training, in Figure 5(c). Pointwise standard deviations for these different training modalities can also be inspected in Figures 5(d) (warm start) and 5(f) (without warm start). The results discussed above are related to the loss function depicted in Figure 4(a). The same results for a different realization of the unsupervised problem, with loss function shown in Figure 4(b), can be seen in Figure 6.
Figure 5: Comparison of the posterior distribution obtained from training a network with warm start and from scratch for the unsupervised seismic image problem. Figures (a) and (b) depict the ground truth and related observations. Figures (c) and (e) represent the respective conditional means, while (d) and (f) refer to the pointwise standard deviation. Note how the result in (c) provides a better estimation of the ground truth (a) compared to (e).
5 Conclusions
We presented a preconditioning scheme for uncertainty quantification, particularly aimed at inverse problems characterized by computationally expensive numerical simulations based on PDEs (including, for example, seismic or optoacoustic imaging). We consider the problem where legacy supervised data is available, and we want to solve a new inverse problem given some out-of-distribution observations. The scheme takes advantage of a preliminary step where the joint distribution of solutions and related observations is learned via supervised training. This joint distribution is then employed to precondition the unsupervised inverse problem. In both the supervised and unsupervised cases, we make use of conditional normalizing flows to ease the computational complexity (fundamental for large 3D applications) and to encode the approximated posterior density analytically. In this way, the posterior density obtained from the supervised problem can be reused as a new prior for the unsupervised problem.
The synthetic experiments confirm that the preconditioning scheme accelerates unsupervised training considerably. The examples considered here are encouraging for seismic or optoacoustic imaging applications, but additional challenges are expected at large scale due to the high dimensionality of the solution and observation spaces, and to expensive wave-equation solvers.
- Adler, J. and Öktem, O. (2017) Solving ill-posed inverse problems using iterative deep neural networks. Inverse Problems 33 (12), pp. 124007.
- Bora, A., Jalal, A., Price, E., and Dimakis, A. G. (2017) Compressed sensing using generative models.
- Dinh, L., Sohl-Dickstein, J., and Bengio, S. (2016) Density estimation using Real NVP.
- Dinh, L., Krueger, D., and Bengio, Y. (2014) NICE: non-linear independent components estimation.
- Herrmann, F. J., Siahkoohi, A., and Rizzuti, G. (2019) Learned imaging with constraints and uncertainty quantification. In Neural Information Processing Systems (NeurIPS) 2019 Deep Inverse Workshop.
- Kingma, D. P. and Dhariwal, P. (2018) Glow: generative flow with invertible 1x1 convolutions.
- Kruse, J., Detommaso, G., Scheichl, R., and Köthe, U. (2019) HINT: hierarchical invertible neural transport for density estimation and Bayesian inference.
- Parno, M. D. and Marzouk, Y. M. (2018) Transport map accelerated Markov chain Monte Carlo. SIAM/ASA Journal on Uncertainty Quantification 6 (2), pp. 645–682.
- Peherstorfer, B. and Marzouk, Y. (2018) A transport-based multifidelity preconditioner for Markov chain Monte Carlo.
- Peters, B., Haber, E., and Lensink, K. (2020) Fully reversible neural networks for large-scale surface and sub-surface characterization via remote sensing. arXiv preprint arXiv:2003.07474.
- Putzky, P. and Welling, M. (2019) Invert to learn to invert.
- Rizzuti, G., Siahkoohi, A., Witte, P. A., and Herrmann, F. J. (2020) Parameterizing uncertainty by deep invertible networks, an application to reservoir characterization. In SEG Technical Program Expanded Abstracts 2020.
- Siahkoohi, A., Louboutin, M., and Herrmann, F. J. (2019) The importance of transfer learning in seismic modeling and imaging. Geophysics 84 (6), pp. A47–A52.
- Siahkoohi, A., Rizzuti, G., and Herrmann, F. J. (2020) A deep-learning based Bayesian approach to seismic imaging and uncertainty quantification. In 82nd EAGE Conference and Exhibition 2020.
- Siahkoohi, A., Rizzuti, G., and Herrmann, F. J. (2020) Uncertainty quantification in imaging and automatic horizon tracking, a Bayesian deep-prior based approach. In SEG Technical Program Expanded Abstracts 2020.
- Siahkoohi, A., Rizzuti, G., and Herrmann, F. J. (2020) Weak deep priors for seismic imaging. In SEG Technical Program Expanded Abstracts 2020.
- Ulyanov, D., Vedaldi, A., and Lempitsky, V. (2020) Deep image prior. International Journal of Computer Vision.
- Veritas (2005) Parihaka 3D Marine Seismic Survey - Acquisition and Processing Report. Technical Report, New Zealand Petroleum Report 3460, New Zealand Petroleum & Minerals, Wellington.
- WesternGeco (2012) Parihaka 3D PSTM Final Processing Report. Technical Report, New Zealand Petroleum Report 4582, New Zealand Petroleum & Minerals, Wellington.
- Winkler, C., Worrall, D., Hoogeboom, E., and Welling, M. (2019) Learning likelihoods with conditional normalizing flows.
- Yosinski, J., Clune, J., Bengio, Y., and Lipson, H. (2014) How transferable are features in deep neural networks? In Proceedings of the 27th International Conference on Neural Information Processing Systems, NIPS'14, pp. 3320–3328.