1 Introduction
A classical probabilistic setup for fullwaveform inversion [FWI, 30] starts from the following assumptions on the data likelihood:
(1) 
Here, represents seismic data, is the forward modeling map, collects the unknown medium parameters of interest, and is random noise. We are assuming that and are independent (knowing ), and
is normally distributed according to
. Equation (1) then defines the likelihood density functionIn the Bayesian framework, the second ingredient is a prior distribution on the model space
(2) 
Since for seismic experiments we do not have direct access to the Earth’s interior (except for localized information in the form of well logs), the prior is typically hand crafted based on, e.g., Thikhonov or total variation regularization [see, for example, 6, 24]. Alternatively, when this type of information is available, a prior can be implicitly encoded via a deep network [21].
The posterior distribution, given data , is readily obtained from the Bayes’ rule:
(3) 
where
. Sampling from the posterior probability (
3) is the goal of uncertainty quantification (UQ). Besides the usual maximumaposteriori estimator (MAP)
(4) 
being able to sample from the posterior allows to approximate the conditional mean and pointwise standard deviation:
(5)  
A great deal of research around UQ is devoted to Monte Carlo sampling [26]
, especially through Markov chains (MCMC) when dealing with highdimensional problems like FWI
[see, e.g., 18, 2, 19, 20, 27, 5, 23, 34, 11, 31, 33]. A Markov chain is a stochastic process that describes the evolution of the model distribution, and is designed to match the target distribution at steady state. A particularly egregious example is offered by the socalled Langevin dynamics [22]:(6) 
which is akin to stochastic gradient descent where the update direction is perturbed by random noise
. Under some technical assumptions on the steplength decaying [32], collecting ’s as in Equation (6) is equivalent to sample from the posterior distribution (after an initial “burnin” phase). While gradientbased sampling methods such as Langevin dynamics are gaining popularity in machine learning, we must face the computational challenge given by the evaluation and differentiation of the seismic modeling map involved in Equation (
6), which needs be repeated for multiple sources thus resulting in an expensive scheme. Nevertheless, source encoding and network reparameterization could alleviate these issues [10, 28, 29].In this paper, we take an alternative route to UQ by employing modelspace valued deep networks defined on a latent space endowed with an easytosample distribution (e.g. normal), as a way to mimic random sampling from the posterior (when properly trained). This is made possible by a special class of invertible networks for which the output density function can be computed analytically [4, 14, 17, 16]. As such, these networks encapsulate a new prior, which can be used for subsequent tasks (as, for example, a similar imaging problem).
2 Method
In this section, we discuss the theoretical foundations of the proposed method for UQ via generative networks. We start by discussing a wellknown notion of discrepancy between probabilities, the KullbackLeibler divergence. Our program is suggested in Kruse et al. [16], and entails the minimization of the posterior divergence with a known distribution pushed forward by an invertible mapping. The resulting optimization problem is further restricted over a special class of invertible networks, for which the logdeterminant of the Jacobian can be computed.
The KullbackLeibler (KL) divergence of two probabilities defined on the model space , with density and respectively, is given by
(7) 
Let us fix, now, a normally distributed random variable
(8) 
taking values on a latent space , and a deterministic map
(9) 
Selecting random samples and feeding it to generates a random variable with values in , which is referred to as the pushforward of through . The pushforward density will be denoted by
(10) 
Our goal is the solution of the following minimization problem:
(11) 
Note that the KL divergence is notoriously not symmetric, and the order of the probabilities appearing in Equation (11) is designed. A more convenient form of Equation (11) is
(12) 
where
(13) 
denotes the entropy of
. The loss function in Equation (
12) ensures that the outputs of adhere to the data without producing mode collapse (e.g. , a lowentropy configuration).While the first term in the KL divergence can be approximated by sample averaging, i.e. , calculating the entropy of a pushforward distribution is not trivial [see, for instance, 12]. In general, we typically can only evaluate without a direct way to explicitly compute its density, as in Equation (10), for a given . A popular—but computationally expensive—workaround is based on generative adversarial networks [GAN, 7], where the discriminator is in fact related to the generator pushforward density [3]
. Other approaches involve variational autoencoders
[15]and expectationmaximization techniques
[9]. Nevertheless, if we focus on the specialized class of invertible mappings (whose existence requires that as manifolds), we obtain(14) 
thanks to the changeofvariable formula. Here, is the Jacobian of with respect to the input .
Invertible networks are an enticing family of choice for
. This is especially true for sizable inverse problems, since they do not require storing the input before every activation functions (such as ReLU) and the memory overhead can be kept constant
[25]. Furthermore, recent developments has lead to architectures that allows the computation (and differentiation) of the logdeterminants that appears in Equation (14). As notable examples, we mention nonvolumepreserving networks [NVP, 4], generative flows with invertible 1by1 convolutions [Glow, 14], and hierarchically invertible neural transport [HINT, 16]. These examples share the usage of special bijective layers whose Jacobian has a triangular structure [KnotheRosenblatt transformations, 23]. Hyperbolic networks [17] are another instance of invertible maps, although volume preserving with null logdeterminants. An invertible version of classical residual networks (resnets) is also offered in Behrmann et al. [1], which however requires an implicit estimation of the logdeterminant. Yet another example of invertible architecture, though not allowing logdeterminant calculation, is discussed in Putzky and Welling [25] for inverse problem applications.By restricting the problem in Equation (11) to the family of suitable invertible networks just described, we end up with the following optimization:
(15) 
where the unknowns represent the parameters (such as weights and biases) of the invertible network
(16) 
We employ a stochastic gradient approach [ADAM, 13] with random selection of normally distributed minibatches of ’s. The expectation in Equation (15) is then replaced by MonteCarlo averaging.
The solution of problem (15) comes in the form of a network , whose evaluation over random ’s generates models ’s as if they were sampled from the posterior distribution in Equation (3). After training, we can easily compute conditional mean and pointwise standard deviation as in Equation (5), or even higherorder statistics. Since possesses a suitable architecture that explicitly encodes its output density, we can reuse this result as a new prior for adiacent regions, timelapse imaging, and so on.
3 Uncertainty quantification for reservoir characterization
We now apply the setup described in the previous section, Equation (15), to fullwaveform inversion for reservoir characterization under some simplifying assumptions. We postulate a purely acoustic model (constant density), with laterally homogeneous properties (e.g., 1D). Waves propagate in a 2D medium. The Radon transform yields the “1.5D” wave equation
(17) 
where is a wavefield, a source wavelet, and the ray parameter. Timedepth is indicated by . Data is collected at the surface .
We synthetize a compressional velocity welllog from the opensource Sleipner dataset in Figure
1 [refer to 8, for a previous AVP study]. See the acknowledgements for more info. The true model is obtained by smoothing and subsampling the welllog slowness on a m grid, and a background is obtained by severe smoothing. We select a source wavelet with corner frequencies Hz, and a angle range from to , where . Synthetic data are generated by finitedifferences, and random Gaussian noise is injected with dB. Data are shown in Figure 2.As of invertible architecture, we choose hyperbolic networks [17], due to their inherent stability, augmented with one activation normalization layer [similarly to 14] acting diagonally along the depth dimension. This modification is necessary to obtain a nonvolume preserving transformation, that is with a nonnull logdeterminant (otherwise the network entropy will remain constant, cf. Equation 14).
The inversion results are compared in Figure 3. We plot the conditional mean , as defined in Equation (5
), with confidence intervals determined by the pointwise conditional standard deviation
. The conditional and are calculated with sample average of the network outputs . Additionally we show multiple samples from the estimated posterior before and after training (Figure 4), which highlight the sharpening of the distribution around the mean once optimized. Note that was initialized as Gaussian perturbations of the MAP, cf. Figure 4(a). A structural comparison between true model and conditional statistics is made apparent in Figure 5. As expected, the standard deviation has generally higher values where the background model is further from the truth (the deeper portion), and displays peaks at the reflector positions. We also observe that uncertainties tend to grow with depth.4 Conclusions
We presented an uncertainty quantification framework alternative to more classical MCMC sampling, where an invertible neural network is initialized according to a prior distribution and trained to approximate samples from the posterior distribution, given some data. This can be achieve by minimization of a probability discrepancy loss (the KullbackLeibler divergence) between the posterior and the network output distribution. We discussed a fullwaveform inversion application for reservoir characterization, which demonstrates the ability of this network to qualitatively capture the uncertainties of the problem. Contrary to MCMC methods, the result can be reused as a new prior for related inverse problems since, by construction, we are able compute the density function of the output distribution. Despite the aim of this project being faithful posterior sampling, we remark that the network architecture do bias the uncertainty analysis
[sometimes favorably as in deep prior regularization, 29], and should be carefully factored in. On the computational side, each iteration of stochastic gradient descent requires the calculation of a seismic gradient for the models of a given batch. Therefore, a feasible extension to 2D would certainly need randomized simultaneous sources. Future work will involve elastics, incorporation of hard constraints (as opposed to priors used as a penalty term), and field data applications.5 Acknowledgments
We are grateful to Equinor and partners for providing open access to the Sleipner data through the CO storage data consortium (co2datashare.org).
6 Related materials
In order to facilitate the reproducibility of the results herein discussed, a Julia implementation of this work is made available on the SLIM github page:
Several invertible network architectures are implemented as a Julia package in:
References
 [1] (2018) Invertible Residual Networks. External Links: 1811.00995 Cited by: §2.
 [2] (2005) Fast Model Updates Using Wavelets. Multiscale Model. Simul. 3, pp. 106–130. Cited by: §1.

[3]
(2020)
Your GAN is Secretly an Energybased Model and You Should use Discriminator Driven Latent Sampling
. External Links: 2003.06060 Cited by: §2.  [4] (2016) Density estimation using Real NVP. External Links: 1605.08803 Cited by: §1, §2.
 [5] (2018) Assessing uncertainties in velocity models and images with a fast nonlinear uncertainty quantification method. Geophysics 83 (2), pp. R63–R75. External Links: Document Cited by: §1.
 [6] (2018) Totalvariation regularization strategies in fullwaveform inversion. SIAM Journal on Imaging Sciences 11 (1), pp. 376–406. Cited by: §1.
 [7] (2014) Generative Adversarial Networks. External Links: 1406.2661 Cited by: §2.
 [8] (2018) Quantitative Prediction of Injected CO2 At Sleipner Using WaveEquation Based AVO. 2018 (1), pp. 1–5. External Links: Document, Link, ISSN 22144609 Cited by: §3.

[9]
(2017)
Alternating backpropagation for generator network.
In
ThirtyFirst AAAI Conference on Artificial Intelligence
, Cited by: §2.  [10] (201912) Learned imaging with constraints and uncertainty quantification. In Neural Information Processing Systems (NeurIPS) 2019 Deep Inverse Workshop, External Links: Link Cited by: §1.
 [11] (201908) Bayesian uncertainty estimation for full waveform inversion: A numerical study. In SEG Technical Program Expanded Abstracts 2019, pp. 1685–1689. External Links: Document Cited by: §1.
 [12] (2016) Deep Directed Generative Models with EnergyBased Probability Estimation. External Links: 1606.03439 Cited by: §2.
 [13] (2014) Adam: A Method for Stochastic Optimization. External Links: 1412.6980 Cited by: §2.
 [14] (2018) Glow: Generative Flow with Invertible 1x1 Convolutions. External Links: 1807.03039 Cited by: §1, §2, §3.
 [15] (2013) AutoEncoding Variational Bayes. External Links: 1312.6114 Cited by: §2.

[16]
(2019)
HINT: Hierarchical Invertible Neural Transport for Density Estimation and Bayesian Inference
. External Links: 1905.10687 Cited by: §1, §2, §2. 
[17]
(2019)
Fully Hyperbolic Convolutional Neural Networks
. External Links: 1905.10484 Cited by: §1, §2, §3.  [18] (2004) Expanded uncertainty quantification in inverse problems: Hierarchical Bayes and empirical Bayes. Geophysics 69 (4), pp. 1005–1016. External Links: Document Cited by: §1.
 [19] (2006) Two ways to quantify uncertainty in geophysical inverse problems. Geophysics 71 (3), pp. W15–W27. External Links: Document Cited by: §1.
 [20] (2012) A stochastic Newton MCMC method for largescale statistical inverse problems with application to seismic inversion. SIAM Journal on Scientific Computing 34 (3), pp. A1460–A1487. Cited by: §1.
 [21] (2019) Stochastic Seismic Waveform Inversion Using Generative Adversarial Networks as a Geological Prior. Mathematical Geosciences 84 (1), pp. 53–79. External Links: Document Cited by: §1.
 [22] (2011) MCMC using Hamiltonian dynamics. Handbook of Markov chain Monte carlo 2 (11), pp. 2. Cited by: §1.
 [23] (2018) Transport Map Accelerated Markov Chain Monte Carlo. SIAM/ASA Journal on Uncertainty Quantification 6 (2), pp. 645–682. External Links: Document Cited by: §1, §2.
 [24] (2019) Projection methods and applications for seismic nonlinear inverse problems with multiple constraints. Geophysics 84 (2), pp. R251–R269. External Links: Document Cited by: §1.
 [25] (2019) Invert to Learn to Invert. External Links: 1911.10914 Cited by: §2.
 [26] (2004) Monte Carlo statistical methods. SpringerVerlag. Cited by: §1.
 [27] (2013) Transdimensional inference in the geosciences. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 371 (1984), pp. 20110547. Cited by: §1.

[28]
(202006)
A deeplearning based bayesian approach to seismic imaging and uncertainty quantification
. 82nd EAGE Conference and Exhibition 2020. External Links: Link Cited by: §1.  [29] (2020) Uncertainty quantification in imaging and automatic horizon tracking: a Bayesian deepprior based approach. External Links: 2004.00227 Cited by: §1, §4.
 [30] (2005) Inverse problem theory and methods for model parameter estimation. Vol. 89, SIAM. Cited by: §1.
 [31] (201910) Bayesian transdimensional seismic fullwaveform inversion with a dipping layer parameterization. Geophysics 84 (6), pp. R845–R858. Cited by: §1.
 [32] (2011) Bayesian Learning via Stochastic Gradient Langevin Dynamics. In Proceedings of the 28th International Conference on International Conference on Machine Learning, ICML’11, pp. 681–688. Cited by: §1.
 [33] (201908) A gradient based MCMC method for FWI and uncertainty analysis. In SEG Technical Program Expanded Abstracts 2019, pp. 1465–1469. External Links: Document Cited by: §1.
 [34] (2018) Seismic inversion and uncertainty quantification using transdimensional Markov chain Monte Carlo method. Geophysics 83 (4), pp. R321–R334. External Links: Document Cited by: §1.