Large-scale inverse problems with partial differential equation (PDE) constraints play a key role in many applications, including medical, electromagnetic, and seismic imaging, as well as DC resistivity and hydrogeology [28, 22, 17, 7, 8]. We focus on seismic data, which are used by both global seismologists and the oil and gas industry to obtain subsurface information about the Earth. In the exploratory setting of marine acquisition, seismic data are obtained by a ship towing a compressed air gun (i.e. a seismic source) and a streamer of receivers. The air gun produces acoustic waves that propagate down to the ocean floor and below; part of the wave field is reflected back to the surface, where it is recorded by the receivers. Given the observed data recorded at the receivers, we solve for subsurface properties of the Earth such as velocity, density, and conductivity.
Waveform inversion for the medium parameters can be formalized as an inverse problem with PDE constraints:
\[
\min_{m,\, u_1, \dots, u_{N_s}} \ \frac{1}{2}\sum_{i=1}^{N_s} \|P u_i - d_i\|^2 \quad \text{s.t.} \quad A(m) u_i = q_i, \quad i = 1, \dots, N_s, \tag{1}
\]
where the constraints are the discretized linear PDEs, $N_s$ is the number of sources or experiments in a given survey, $q_i$ represents the source that emits the field $u_i$, $P$ is the matrix that maps the discretized field to the locations where the data $d_i$ were collected, and the matrix $A(m)$ is a discretization of the PDE with appropriate boundary conditions. We assume it is possible to compute the field given $m$:
\[
u_i(m) = A(m)^{-1} q_i, \tag{2}
\]
and then (1) can be written in its reduced form:
\[
\min_{m} \ f(m) := \frac{1}{2}\sum_{i=1}^{N_s} \left\|\mathcal{F}_i(m) - d_i\right\|^2, \qquad \mathcal{F}_i(m) = P A(m)^{-1} q_i, \tag{3}
\]
where $\mathcal{F}_i$ is the forward operator that predicts the data set for source $i$. Prior information can also be included in this formulation using constraints or regularization for the model $m$, but we do not focus on this in the paper.
In a realistic setting, we have to solve a large number of PDEs to evaluate $f$, with $N_s$ potentially very large. Since each evaluation requires solving the linear systems in (3), these solves are the main computational bottleneck. Problem (3) is naturally written as a large sum, so stochastic techniques readily apply. In particular, one can sample shots and use these smaller samples to generate updates for $m$. However, problem (3) has additional structure that allows a specialized randomized approach.
The objective in (3) can be rewritten using the identity
\[
\left\|\mathcal{F}(m) - D\right\|_F^2 = \mathbb{E}_w \left\|(\mathcal{F}(m) - D)\,w\right\|^2 \tag{4}
\]
for any random vector $w$ satisfying $\mathbb{E}[ww^T] = I$, where $\mathcal{F}(m) = [\mathcal{F}_1(m), \dots, \mathcal{F}_{N_s}(m)]$ and $D = [d_1, \dots, d_{N_s}]$. This makes it possible to create so-called simultaneous shots during the optimization process, and to obtain updates for $m$ using these shots rather than the entire dataset $D$. Unfortunately, the identity (4) presupposes a common acquisition domain for all sources; it assumes that all sources see all receivers. This assumption is routinely violated by many types of data acquisition, and in particular by the marine acquisition scenario.
Contribution: This paper makes it possible to use simultaneous shots in complex geometries by treating uncollected data as missing from a full all-see-all acquisition scenario, and using interpolation techniques to fill them in. Given a dataset collected from a particular acquisition, we can use low-rank regularization to interpolate the unobserved data, and then proceed with simultaneous sources. However, we also go further: we propose and solve a joint inversion and interpolation problem, which iteratively refines the interpolated data as the model estimates improve.
Roadmap: The paper proceeds as follows. In Section 2, we review the simultaneous shots method and its relationship to trace estimation and stochastic optimization. We also formalize why the approach fails in complex acquisition geometries. In Section 3 we discuss low-rank interpolation, and show the efficacy and limitations of a two-stage approach: (1) interpolation, followed by (2) inversion with simultaneous shots. In Section 4, we develop a unified formulation that solves a single problem to accomplish both tasks simultaneously, and show that this approach significantly improves the results. We also develop an algorithm for this unified optimization problem. We illustrate the advantages of the approach by inverting the Marmousi model in a difficult acquisition scenario. The unified approach gives much better results than the simple stage-wise workflow.
Suppose that we have an all-see-all geometry, so that all sources are recorded by all receivers. Define the residual matrix
\[
R(m) = \mathcal{F}(m) - D,
\]
where $\mathcal{F}(m) = [\mathcal{F}_1(m), \dots, \mathcal{F}_{N_s}(m)]$ and $D = [d_1, \dots, d_{N_s}]$. Then the identity (4) is easily derived for a random vector $w$:
\[
\mathbb{E}_w \left\|R(m)\,w\right\|^2 = \operatorname{tr}\!\left(R(m)^T R(m)\, \mathbb{E}[ww^T]\right) = \operatorname{tr}\!\left(R(m)^T R(m)\right) = \left\|R(m)\right\|_F^2,
\]
where the second equality assumes $\mathbb{E}[ww^T] = I$, which is true e.g. for a standard Gaussian vector. Many other distributions, e.g. Rademacher, also satisfy this requirement [12, 10].
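This identity can be checked numerically with a quick Monte Carlo experiment. In the sketch below, a random matrix stands in for the residual $\mathcal{F}(m) - D$; all sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
R = rng.standard_normal((50, 40))        # stand-in for the residual F(m) - D
frob_sq = np.linalg.norm(R, "fro") ** 2  # ||R||_F^2, the full-data misfit

# Monte Carlo estimate of E ||R w||^2 with w standard Gaussian (E[w w^T] = I)
n_samples = 50_000
W = rng.standard_normal((40, n_samples))
estimate = np.mean(np.sum((R @ W) ** 2, axis=0))

rel_err = abs(estimate - frob_sq) / frob_sq
print(rel_err)  # shrinks as n_samples grows
```

The relative error decays like $1/\sqrt{n_{\text{samples}}}$, which is the usual Monte Carlo rate for this trace estimator.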
If we now consider the randomized function
\[
\widetilde{f}(m, w) = \frac{1}{2}\left\|(\mathcal{F}(m) - D)\,w\right\|^2,
\]
we have $\mathbb{E}_w[\widetilde{f}(m, w)] = f(m)$, and evaluating $\widetilde{f}$ requires solving only a single PDE, with the simultaneous source $Qw$ obtained by randomly combining the columns of $Q = [q_1, \dots, q_{N_s}]$. We can think of the original problem as a stochastic optimization problem
\[
\min_m \ \mathbb{E}_w\left[\widetilde{f}(m, w)\right]. \tag{8}
\]
This leads to two views: sample average approximation (SAA), and stochastic optimization. In the SAA approach, we draw $K$ independent samples $w_1, \dots, w_K$, where each $w_j$ is a random vector that satisfies $\mathbb{E}[w_j w_j^T] = I$. The solution to (8) is then approximated by the solution to the following problem:
\[
\min_m \ \frac{1}{2K}\sum_{j=1}^{K} \left\|(\mathcal{F}(m) - D)\,w_j\right\|^2.
\]
We work with $K$ PDEs per objective evaluation, reducing the computational cost substantially if $K \ll N_s$.
In the stochastic optimization view, at any iteration we can obtain an unbiased estimate of $f$ and its gradient. Assuming for simplicity that we do not have any regularizer, we get a family of algorithms of the form
\[
m_{k+1} = m_k - \alpha_k H_k^{-1} g_k, \qquad g_k = \nabla_m \frac{1}{2K}\sum_{j=1}^{K}\left\|(\mathcal{F}(m_k) - D)\,w_j^k\right\|^2,
\]
so in particular $\mathbb{E}[g_k]$ is the true gradient of $f$ at $m_k$, while $H_k$ is any desired Hessian approximation that depends on the same set of shots as $g_k$. To implement each iteration, we again work with $K$ PDEs; the vectors $w_j^k$ can vary between iterations.
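The iteration above can be sketched on a toy problem in which a *linear* forward operator stands in for the PDE solves. Everything here, including the operators, sizes, and step size, is an illustrative assumption, not the paper's Helmholtz setup:

```python
import numpy as np

rng = np.random.default_rng(1)
n_s, n_r, n_m = 40, 25, 10               # sources, receivers, model size (toy)
A = rng.standard_normal((n_s, n_r, n_m)) # hypothetical *linear* forward maps A_i
m_true = rng.standard_normal(n_m)

def forward(m):
    # columns of F(m): predicted data for each source (stand-in for PDE solves)
    return np.einsum("srm,m->rs", A, m)

D = forward(m_true)                      # fully sampled "observed" data

def stochastic_grad(m, w):
    # gradient of the one-shot objective 0.5 * ||(F(m) - D) w||^2
    rw = (forward(m) - D) @ w            # residual hit by one simultaneous shot
    return np.einsum("srm,r,s->m", A, rw, w)

m, alpha = np.zeros(n_m), 5e-5
for k in range(4000):
    w = rng.standard_normal(n_s)         # fresh random shot each iteration
    m -= alpha * stochastic_grad(m, w)

rel_err = np.linalg.norm(m - m_true) / np.linalg.norm(m_true)
```

Because the data are consistent (zero residual at the true model), the gradient noise vanishes at the solution and the plain stochastic iteration converges with a constant step size.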
These randomized accelerations have a significant impact in full acquisition geometries, where all sources are recorded by all receivers. In practical scenarios, this condition nearly always fails because of budgetary and physical constraints. For example, in marine seismic acquisition, both sources and receivers must move with the ship, so an all-see-all geometry is physically impossible.
We can model any acquisition geometry as a subsample of a hypothetical all-see-all acquisition. Call $D$ the fully sampled data and let $M$ denote the mask that extracts the available observations:
\[
M_{ij} = \begin{cases} 1 & (i, j) \in \Omega, \\ 0 & \text{otherwise}, \end{cases}
\]
where $\Omega$ is the observed subset of entries of the fully sampled data matrix $D$. We then have $D_{\text{obs}} = M \odot D$, where $\odot$ is the element-wise (Hadamard) product.
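In matrix terms, the mask and the Hadamard product act as follows. This is a small synthetic sketch; the sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)
n_r, n_s = 6, 8
D = rng.standard_normal((n_r, n_s))               # hypothetical fully sampled data
M = (rng.random((n_r, n_s)) < 0.5).astype(float)  # mask: 1 marks an observed entry
D_obs = M * D                                     # Hadamard product M ⊙ D

# the masked misfit touches only the observed entries:
X = rng.standard_normal((n_r, n_s))               # any candidate full data volume
misfit = np.linalg.norm(M * X - D_obs, "fro")
same = np.linalg.norm(M * (X - D), "fro")
```

Since $M \odot X - D_{\text{obs}} = M \odot (X - D)$, the masked misfit never needs the unobserved entries of $D$.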
The mask precludes a straightforward application of the simultaneous shot method. The masked version of (3) is given by
\[
\min_m \ \frac{1}{2}\left\| M \odot \left(\mathcal{F}(m) - D\right) \right\|_F^2.
\]
We no longer have the simultaneous source term, because we cannot simply use the matrix-vector product $\mathcal{F}(m)\,w$. Instead, we have to compute the full $M \odot \mathcal{F}(m)$ before we can multiply it with $w$. This negates the entire motivation of the simultaneous shot method, since we have to solve all $N_s$ PDEs at every iteration.
Our approach is to complete the data, so that we can work with $D$ even though we never observed all of it. We discuss data-driven low-rank interpolation techniques in the next section, and then present a unified interpolation and inversion approach.
3 Using Low-Rank Interpolation as a Pre-processing Step
In this section, we review optimization-based methods for interpolating missing traces to reconstruct the fully sampled data $D$ from the partial observations $D_{\text{obs}}$. Commonly used interpolation techniques promote a parsimonious representation of the data in a transform domain. For example, if the vectorized data are compressible in a particular domain, we interpolate by applying a sparsifying penalty (such as the 1-norm) to the coefficients of the data in that domain; Fourier, wavelet, and curvelet domains are frequently used. Analogously, if the full data can be organized into a 2-dimensional matrix with quickly decaying singular values, we can look for low-rank decompositions that match the available observations [15, 13, 1]. Here, we focus on low-rank approaches, which are computationally and memory efficient for large-scale seismic data problems.
To recover the fully sampled data from the subsampled data, we can solve the following rank-minimization problem
\[
\min_X \ \operatorname{rank}(X) \quad \text{s.t.} \quad \left\| M \odot X - D_{\text{obs}} \right\|_F \le \eta, \tag{12}
\]
where $\eta$ specifies how closely the entries of $X$ must match the actual observed entries $D_{\text{obs}}$. Problem (12) is NP-hard, and algorithms that provide exact solutions have complexity that is doubly exponential in the dimension of the matrix [6, 5]. Popular alternatives are (1) a convex relaxation using the nuclear norm, which replaces $\operatorname{rank}(X)$ by the sum of singular values of $X$ [6, 5], or (2) an explicit factorization $X = LR^T$, where the modeler selects an upper bound $k$ for the rank of the factors a priori [13, 1] and solves
\[
\min_{L, R} \ \frac{1}{2}\left(\|L\|_F^2 + \|R\|_F^2\right) \quad \text{s.t.} \quad \left\| M \odot (LR^T) - D_{\text{obs}} \right\|_F \le \eta. \tag{14}
\]
The regularizer $\frac{1}{2}\left(\|L\|_F^2 + \|R\|_F^2\right)$ is an upper bound on the nuclear norm (sum of singular values) of $X = LR^T$:
\[
\left\|LR^T\right\|_* \le \frac{1}{2}\left(\|L\|_F^2 + \|R\|_F^2\right).
\]
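The bound is easy to verify numerically for random factors; a minimal check, with arbitrary sizes:

```python
import numpy as np

rng = np.random.default_rng(3)
L = rng.standard_normal((30, 5))
R = rng.standard_normal((25, 5))
X = L @ R.T                                      # rank-5 matrix built from the factors

nuclear = np.linalg.svd(X, compute_uv=False).sum()  # nuclear norm ||X||_*
bound = 0.5 * (np.linalg.norm(L, "fro") ** 2 + np.linalg.norm(R, "fro") ** 2)
```

In fact the nuclear norm equals the minimum of the bound over all factorizations $X = LR^T$, which is why penalizing the factors acts as a rank surrogate.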
Once we recover $D$, we can then apply stochastic optimization or solve an SAA approximation, as described in the introduction, without concern for the acquisition-encoding mask $M$.
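A miniature instance of factorized completion can illustrate the idea. The paper solves a residual-constrained problem with an SPGL1 extension; the sketch below instead uses a simpler ridge-regularized alternating least squares on synthetic, exactly low-rank data, purely to show that fitting factors on the observed entries recovers the unobserved ones:

```python
import numpy as np

rng = np.random.default_rng(4)
n_r, n_s, k = 40, 40, 3
D = rng.standard_normal((n_r, k)) @ rng.standard_normal((k, n_s))  # exact rank-3 "data"
M = rng.random((n_r, n_s)) < 0.5                                   # observe ~50% of entries
D_obs = np.where(M, D, 0.0)

def als_half_step(D_obs, M, Fixed, lam=1e-8):
    # update one factor row-by-row: ridge regression over the observed entries
    out = np.zeros((D_obs.shape[0], Fixed.shape[1]))
    for i in range(D_obs.shape[0]):
        A = Fixed[M[i]]                  # rows of the fixed factor seen by row i
        out[i] = np.linalg.solve(A.T @ A + lam * np.eye(Fixed.shape[1]),
                                 A.T @ D_obs[i, M[i]])
    return out

R = rng.standard_normal((n_s, k))
for _ in range(30):
    L = als_half_step(D_obs, M, R)
    R = als_half_step(D_obs.T, M.T, L)

rel_err = np.linalg.norm(L @ R.T - D, "fro") / np.linalg.norm(D, "fro")
```

With noiseless rank-3 data and half the entries observed, a few dozen sweeps reconstruct the full matrix to high accuracy.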
3.1 Numerical Example
In this section, we apply the stage-wise approach (interpolation followed by inversion) to solve a full-waveform inversion problem over a complex acquisition geometry. Here, $A(m)$ represents the constant-density acoustic Helmholtz equation
\[
\left(\Delta + \omega^2 m\right) u = q,
\]
where $\omega$ encodes temporal frequency and $m$ represents the squared slowness. We simulate data using the full Marmousi velocity model, which has complex geological structure with steeply dipping events (Figure 1). We use a model grid spacing of 15 m, a fixed-spread acquisition configuration with 400 co-located sources and receivers at 12.5 m spacing, and a Ricker source wavelet with a peak frequency of 15 Hz. To generate a dataset with partial observations, we use a frequency spectrum of 3-30 Hz and interpolate each frequency slice independently in this range using the rank-minimization framework.
In order to recover the missing entries using low-rank optimization, we require that:
(1) the fully sampled data have quickly decaying singular values;
(2) subsampling the data destroys this fast decay.
When these conditions are met, we can penalize a rank proxy in order to recover the full data volume. For seismic data, monochromatic frequency slices satisfy these requirements in the midpoint-offset domain, where the midpoint $x_{\text{mid}}$ and offset $x_{\text{off}}$ are defined as
\[
x_{\text{mid}} = \frac{1}{2}(x_s + x_r), \qquad x_{\text{off}} = \frac{1}{2}(x_s - x_r),
\]
with $x_s$ and $x_r$ the row (source position) and column (receiver position) coordinates, respectively. The transformation rotates the data matrix by 45° clockwise, as illustrated in Figure 2. Figure 3 compares the singular value decay of a monochromatic seismic data slice at 4 Hz in the source-receiver and midpoint-offset domains; conditions (1) and (2) above are satisfied in the midpoint-offset domain, but not in the source-receiver domain. Therefore, we formulate the rank-minimization problem (14) in the midpoint-offset domain:
\[
\min_{L, R} \ \frac{1}{2}\left(\|L\|_F^2 + \|R\|_F^2\right) \quad \text{s.t.} \quad \left\| M \odot \mathcal{S}^H(LR^T) - D_{\text{obs}} \right\|_F \le \eta, \tag{16}
\]
where $\mathcal{S}$ denotes the transform from the source-receiver to the midpoint-offset domain.
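The effect of the coordinate change can be illustrated on a toy slice that depends only on offset, as for a horizontally layered medium. The construction below, with its integer index map, is purely illustrative; real seismic slices are only approximately low rank in the midpoint-offset domain:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 32
f = rng.standard_normal(2 * n - 1)   # toy reflection response, a function of offset only
S, R = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
D = f[S - R + n - 1]                 # source-receiver slice: D[s, r] = f(s - r)

# midpoint-offset indexing on the integer grid: rows <- s + r, columns <- s - r
MO = np.full((2 * n - 1, 2 * n - 1), np.nan)
MO[S + R, S - R + n - 1] = D

# in source-receiver coordinates the slice is a (generically full-rank) Toeplitz matrix,
rank_sr = np.linalg.matrix_rank(D)
# while every constant-offset diagonal becomes a constant column in midpoint-offset:
col = MO[:, 5 + n - 1]
col = col[~np.isnan(col)]
```

Offset-only data is thus rank one along the midpoint axis after the 45° rotation, even though the same values form a high-rank matrix in source-receiver coordinates.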
Once we solve (16) for $L, R$ in the midpoint-offset domain, we apply $\mathcal{S}^H$ to the recovered data to map it back to the source-receiver domain. We then solve the waveform inversion over an all-see-all geometry using simultaneous shots. We use a linear gradient model as the initial model, as shown in Figure 1. The inversions are carried out sequentially in ten overlapping frequency bands on the interval 3-30 Hz, each using 25 randomly selected simultaneous shots and six frequencies per band spaced 0.2 Hz apart. We use the L-BFGS method.
Figure 4 shows the inversion results with interpolated data for 50%, 75%, and 85% missing entries. We get excellent inversion results for 50% missing data, since the rank-minimization framework produces high-quality reconstructions at this level (see Figures 5 and 6). The inversion results begin to deteriorate at about 75% missing data. Finally, at 85% missing data, the inversion quality degrades noticeably, because the rank-minimization reconstruction deteriorates as the subsampling ratio increases (see Figures 7 and 8).
We can still recover a reasonable solution by solving the waveform inversion problem using only the 15% of observed data with classical (non-randomized) FWI techniques. However, this is costly, as we must solve all $N_s$ PDEs in each iteration to obtain the predicted data at the available locations. Simultaneous shots can obtain a solution quickly by solving very few PDEs, but they require fill-in for the unseen data, which introduces error and degrades the solution quality. There appears to be a clear speed vs. quality tradeoff. In the next section we show that we can get around this bottleneck with a unified approach that combines low-rank interpolation with inversion.
4 Simultaneous Data Completion and Inversion
From the previous section, it is clear that while filling in an all-see-all acquisition scenario makes it possible to use simultaneous shots, it also introduces error when the available data comprise 25% or less of the hypothetical full volume. To push past this boundary, we propose a unified data completion and waveform inversion approach, and develop a customized algorithm to solve it.
The unified problem is
\[
\min_{m, L, R} \ \frac{1}{2}\left(\|L\|_F^2 + \|R\|_F^2\right) \quad \text{s.t.} \quad \left\| M \odot \mathcal{S}^H(LR^T) - D_{\text{obs}} \right\|_F \le \eta, \quad \left\| \mathcal{S}^H(LR^T) - \mathcal{F}(m) \right\|_F \le \bar{\eta}, \tag{17}
\]
where $\bar{\eta}$ is a tradeoff parameter. The advantage of (17) over the stage-wise approach is that the model informs the interpolation: the data-fit term in the constraints prevents fill-in that is inconsistent with the physics. The challenge of (17) is the need to solve for all $N_s$ sources, which is prohibitively expensive for large-scale seismic data. To mitigate this, we design a stochastic block-coordinate descent algorithm for (17), where we need to solve only a few PDEs in each subproblem by using simultaneous shots. We encode the simultaneous shots explicitly using a random matrix $W \in \mathbb{R}^{N_s \times K}$, where each column encodes a simultaneous shot drawn from a standard i.i.d. Gaussian distribution. The shots are fixed throughout each iteration of the block-coordinate descent method described below; after each iteration is complete, $W$ is resampled.
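Concretely, forming the simultaneous data only requires $K$ matrix-vector products with the sequential data, one per column of $W$. A small sketch, with a random stand-in for the predicted data:

```python
import numpy as np

rng = np.random.default_rng(6)
n_r, n_s, K = 30, 100, 5
D_pred = rng.standard_normal((n_r, n_s))   # stand-in for the predicted data F(m)
W = rng.standard_normal((n_s, K))          # one Gaussian column per simultaneous shot
D_sim = D_pred @ W                         # K simultaneous shots instead of n_s sequential ones

# each simultaneous shot is a random superposition of all sequential shots
shot0 = sum(W[i, 0] * D_pred[:, i] for i in range(n_s))
```

In the inversion itself, the same combination is applied to the sources, so $\mathcal{F}(m)W$ costs $K$ PDE solves rather than $N_s$.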
Solving for $L, R$ with $m$ fixed. For fixed $m$, problem (17) is a residual-constrained matrix factorization problem, studied in [1]. The randomized subproblem using the simultaneous shots is given by
\[
\min_{L, R} \ \frac{1}{2}\left(\|L\|_F^2 + \|R\|_F^2\right) \quad \text{s.t.} \quad \left\| M \odot \mathcal{S}^H(LR^T) - D_{\text{obs}} \right\|_F \le \eta, \quad \left\| \mathcal{S}^H(LR^T)\,W - \bar{D} \right\|_F \le \bar{\eta},
\]
where $\bar{D} = \mathcal{F}(m)W$ is the forward-modeled simultaneous seismic data at the current $m$.
We solve this problem using an extension of the SPGL1 solver (https://github.com/UW-AMO/spgl1) that handles matrix factorization. The core idea is to solve a sequence of Lasso subproblems:
\[
\min_{L, R} \ \left\| M \odot \mathcal{S}^H(LR^T) - D_{\text{obs}} \right\|_F^2 + \left\| \mathcal{S}^H(LR^T)\,W - \bar{D} \right\|_F^2 \quad \text{s.t.} \quad \frac{1}{2}\left(\|L\|_F^2 + \|R\|_F^2\right) \le \tau, \tag{19}
\]
where the $\tau$ corresponding to the residual levels is found with a root-finding method. The subproblem (19) is solved using a fast projected gradient method. The gradient computations for (19) are straightforward. The term $\bar{D} = \mathcal{F}(m)W$ needs to be computed only once, at the beginning of the interpolation for the current $m$. We also reuse this precomputed $\bar{D}$ to evaluate the gradient at the first iteration of the subproblem described below.
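The projection required by the projected gradient method is a simple rescaling, since the constraint set is a Euclidean ball in the stacked factor variable. A sketch, with a hypothetical helper name of our choosing:

```python
import numpy as np

def project_factors(L, R, tau):
    """Euclidean projection onto {(L, R): 0.5*(||L||_F^2 + ||R||_F^2) <= tau}."""
    phi = 0.5 * (np.linalg.norm(L, "fro") ** 2 + np.linalg.norm(R, "fro") ** 2)
    if phi <= tau:
        return L, R
    c = np.sqrt(tau / phi)                 # radial scaling back onto the ball
    return c * L, c * R

rng = np.random.default_rng(7)
L, R = rng.standard_normal((10, 3)), rng.standard_normal((8, 3))
Lp, Rp = project_factors(L, R, tau=1.0)
phi_p = 0.5 * (np.linalg.norm(Lp, "fro") ** 2 + np.linalg.norm(Rp, "fro") ** 2)
```

Because both factors are scaled by the same constant, the projection changes the product $LR^T$ only by a uniform scaling.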
Solving for $m$ with $L, R$ fixed. When $L, R$ are fixed, problem (17) is essentially a feasibility problem in $m$. For a set of simultaneous shots $W$, we want to solve
\[
\min_m \ \frac{1}{2}\left\| \mathcal{F}(m)\,W - \mathcal{S}^H(LR^T)\,W \right\|_F^2.
\]
Since $LR^T$ is fixed for this subproblem, we simply use L-BFGS to update $m$. L-BFGS needs only gradient information, and the gradients are computed using the standard adjoint-state method.
The overall algorithm is summarized in Algorithm 1. The subproblems for $(L, R)$ and $m$ terminate when an iteration cap is reached.
5 Numerical Experiment
We test the joint inversion approach on the same dataset as the stage-wise inversion, and compare the results with those obtained previously. We initialize $L$ and $R$ following [1], using the SVD of the subsampled data in the midpoint-offset domain:
\[
U \Sigma V^H = \mathcal{S}(D_{\text{obs}}), \qquad L_0 = U_k \Sigma_k^{1/2}, \qquad R_0 = V_k \Sigma_k^{1/2},
\]
where $U_k, \Sigma_k, V_k$ are truncated to the leading $k$ singular triples, and we set $L = L_0$ and $R = R_0$ as the initial value. The models were recovered by performing 25 partial solves of the $(L, R)$- and $m$-subproblems. Instead of using the full 400 PDE solves per iteration to recover the model, we needed only 25 PDEs, just as in the stage-wise inversion experiment.
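The initialization can be sketched as follows, with a random stand-in for the subsampled midpoint-offset data and an arbitrary factor rank:

```python
import numpy as np

rng = np.random.default_rng(8)
D_obs = rng.standard_normal((60, 50))  # stand-in for S(D_obs) in the midpoint-offset domain
k = 10                                 # factor rank (chosen a priori)

U, s, Vt = np.linalg.svd(D_obs, full_matrices=False)
L0 = U[:, :k] * np.sqrt(s[:k])         # L0 = U_k * Sigma_k^{1/2}
R0 = Vt[:k].T * np.sqrt(s[:k])         # R0 = V_k * Sigma_k^{1/2}

# the product reproduces the best rank-k approximation of the input
best_rank_k = (U[:, :k] * s[:k]) @ Vt[:k]
```

Splitting the singular values evenly between the factors balances their norms, which keeps the regularizer $\frac12(\|L\|_F^2 + \|R\|_F^2)$ as small as possible at the starting point.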
Figure 9 shows the joint inversion results for 75% and 85% missing data. Stage-wise and unified inversion give results of the same quality for 50% missing data, so we do not plot the corresponding figure for the joint inversion. However, the results for the unified inversion at higher levels of missing data are far better. The recovery of the model is excellent even at 85% missing data, with good data reconstruction at high frequencies; see Figure 10.
6 Discussion and Conclusion
We have presented a joint interpolation and inversion framework that can recover high-fidelity seismic data and a subsurface velocity model under high-subsampling scenarios (e.g. 80% missing data). The proposed approach exploits the fast singular value decay of seismic data in the midpoint-offset domain, and brings fast stochastic methods to bear on the inverse problem despite the missing data. In particular, we recover the missing data and then use simultaneous shots drawn from the recovered volumes to drive the waveform inversion. Using carefully selected stylized examples, we showed that our method outperforms traditional seismic data reconstruction and inversion methods, is fast, and is highly scalable. To our knowledge, this work is the first to combine seismic data interpolation and waveform inversion in a joint framework, allowing us to efficiently and simultaneously reconstruct seismic data volumes and invert for artifact-free velocity models of the subsurface.
This paper opens several new avenues of research. First, so far we have assumed that the starting velocity model is not cycle-skipped; cycle skipping occurs when predicted and observed data differ by more than half a cycle. For future work, we plan to relax this assumption by using Wavefield Reconstruction Inversion to handle the cycle-skipping phenomenon. Second, here we consider only randomly subsampled data scenarios rather than structured scenarios such as towed-streamer marine acquisition. Future work includes extending the proposed approach to these complex geophysical acquisition scenarios. Third, seismic data are typically irregularly sampled along the spatial axes, and we also plan to adapt the approach to non-uniform sampling grids. Finally, we expect that these methods can be extended to perform joint interpolation and inversion for large-scale 3-D seismic acquisition, where the underlying model is 3-D and the observed seismic data are 5-D.
We would like to acknowledge the assistance of volunteers in putting together this example manuscript and supplement. R.K. would like to thank the member organizations of the SINBAD II project and the SINBAD Consortium for supporting this work. The work of Prof. Aravkin was supported by the Washington Research Foundation Data Science Professorship.
-  A. Aravkin, R. Kumar, H. Mansour, B. Recht, and F. J. Herrmann, Fast methods for denoising matrix completion formulations, with applications to robust seismic data interpolation, SIAM Journal on Scientific Computing, 36 (2014), pp. S237–S266.
-  A. Y. Aravkin, J. V. Burke, D. Drusvyatskiy, M. P. Friedlander, and S. Roy, Level-set methods for convex optimization, arXiv preprint arXiv:1602.01506, (2016).
-  A. Brougois, M. Bourget, P. Lailly, M. Poulet, P. Ricarte, and R. Versteeg, Marmousi, model and data, in EAEG Workshop-Practical Aspects of Seismic Data Inversion, 1990.
-  C. Bunks, F. M. Saleck, S. Zaleski, and G. Chavent, Multiscale seismic waveform inversion, Geophysics, 60 (1995), pp. 1457–1473.
-  E. J. Candes and Y. Plan, Matrix completion with noise, Proceedings of the IEEE, 98 (2010), pp. 925–936.
-  E. J. Candès and B. Recht, Exact matrix completion via convex optimization, Foundations of Computational Mathematics, 9 (2009), p. 717.
-  M. Cheney, D. Isaacson, and J. C. Newell, Electrical impedance tomography, SIAM review, 41 (1999), pp. 85–101.
-  E. Haber, U. M. Ascher, and D. W. Oldenburg, Inversion of 3d electromagnetic data in frequency and time domain using an inexact all-at-once approach, Geophysics, 69 (2004), pp. 1216–1228.
-  E. Haber, M. Chung, and F. Herrmann, An effective method for parameter estimation with pde constraints with multiple right-hand sides, SIAM Journal on Optimization, 22 (2012), pp. 739–757.
-  F. J. Herrmann and G. Hennenfent, Non-parametric seismic data recovery with curvelet frames, Geophysical Journal International, 173 (2008), pp. 233–248.
-  M. F. Hutchinson, A stochastic estimator of the trace of the influence matrix for Laplacian smoothing splines, Communications in Statistics-Simulation and Computation, 19 (1990), pp. 433–450.
-  R. Kumar, C. Da Silva, O. Akalin, A. Y. Aravkin, H. Mansour, B. Recht, and F. J. Herrmann, Efficient matrix completion for seismic data reconstruction, Geophysics, 80 (2015), pp. V97–V114.
-  X. Li, A. Y. Aravkin, T. van Leeuwen, and F. J. Herrmann, Fast randomized full-waveform inversion with compressive sensing, Geophysics, 77 (2012), pp. A13–A17.
-  V. Oropeza and M. Sacchi, Simultaneous seismic data denoising and reconstruction via multichannel singular spectrum analysis, Geophysics, 76 (2011), pp. V25–V32.
-  R.-E. Plessix, A review of the adjoint-state method for computing the gradient of a functional with geophysical applications, Geophysical Journal International, 167 (2006), pp. 495–503.
-  R. G. Pratt, Seismic waveform inversion in the frequency domain, part 1: Theory and verification in a physical scale model, Geophysics, 64 (1999), pp. 888–901.
-  F. Roosta-Khorasani, Randomized algorithms for solving large scale nonlinear least squares problems, PhD thesis, University of British Columbia, 2015.
-  M. D. Sacchi, T. J. Ulrych, and C. J. Walker, Interpolation and extrapolation using a high-resolution discrete Fourier transform, IEEE Transactions on Signal Processing, 46 (1998), pp. 31–38.
-  A. Shapiro, Monte carlo sampling methods, Handbooks in operations research and management science, 10 (2003), pp. 353–425.
-  A. Shapiro, D. Dentcheva, and A. Ruszczyński, Lectures on stochastic programming: modeling and theory, SIAM, 2009.
-  K. Steklova, Computational methods in hydrogeophysics, PhD thesis, University of British Columbia, 2017.
-  E. Van Den Berg and M. P. Friedlander, Probing the pareto frontier for basis pursuit solutions, SIAM Journal on Scientific Computing, 31 (2008), pp. 890–912.
-  T. van Leeuwen and F. J. Herrmann, Fast waveform inversion without source-encoding, Geophysical Prospecting, 61 (2013), pp. 10–19.
-  T. van Leeuwen and F. J. Herrmann, A penalty method for PDE-constrained optimization in inverse problems, Inverse Problems, 32 (2015), p. 015007.
-  J. Virieux and S. Operto, An overview of full-waveform inversion in exploration geophysics, Geophysics, 74 (2009), pp. WCC1–WCC26.
-  J. Nocedal and S. Wright, Numerical Optimization, Springer, 1999.
-  L. Yaoguo and D. W. Oldenburg, Inversion of 3-d dc resistivity data using an approximate inverse mapping, Geophysical Journal International, 116 (1994), pp. 527–537.