1 Introduction
Large scale inverse problems with partial differential equation (PDE) constraints play a key role in many applications, including medical, electromagnetic, seismic imaging, as well as DC resistivity and hydrogeology
[28, 22, 17, 7, 8]. We focus on seismic data, which are used by both global seismologists and oil and gas industries to get subsurface information of the Earth. In the exploratory setting of marine acquisition, seismic data are obtained by a ship towing a compressed air gun (i.e. a seismic source) and a stream of receivers. The air gun produces acoustic wave that propagate deep into the ocean floor, where a part of the wave is then get reflected to the surface where it gets recorded by the receivers. Given the observed data recorded at receiver, we solve for subsurface properties of earth such as velocity, density, and conductivity [26].Waveform inversion for the medium parameters can be formalized as an inverse problem with PDE constraints:
(1) 
where the constraints are the discretized linear PDEs, is the number of sources or experiments in a given survey, represents the source that emits the field , is the matrix that maps the discretized field to the location of where the data was collected, and the matrix is a discretization of the PDE with appropriate boundary conditions. We assume it is possible to compute the field given m:
(2) 
and then (1) can be written in its reduced form:
(3) 
where is the forward problem that predicts the data set for source. Prior information can also be included in this formulation using constraints or regularization for the model , but we do not focus on this in the paper.
In a realistic setting, we have to solve large number of PDEs to evaluate , with easily of the orders of . Since each evaluation requires solving the linear system of equations (3), these solves are the main computational bottleneck. Problem (3) is written naturally as a large sum, so stochastic techniques can readily apply. In particular, one can sample shots and use these smaller samples to generate updates for [14]. However, problem (3) has additional structure that allows a specialized randomized approach.
The simultaneous source method [24, 18, 9] uses the following simple fact:
(4) 
for any satisfying . This makes it possible to create so called simultaneous shots during the optimization process, and obtain updates in using these shots rather than the entire dataset . Unfortunately, the identity (4) presupposes a common acquisition domain for all sources; it assumes that all sources see all receivers. This assumption is routinely violated for many types of data acquisition, and in particular by the marine acquisition scenario.
Contribution: The current paper makes it possible to use simultaneous shots in complex geometries, by treating uncollected data as if missing from a full allseeall acquisition scenario, and using interpolation techniques to fill it in. Given a dataset collected from a particular acquisition, we can use low rank regularization to interpolate the unobserved data, and then proceed with simultaneous sources. However, we also go further: we propose and solve a joint inversion and interpolation
problem, which iteratively refines the interpolated data as the model estimates improve.
Roadmap: The paper proceeds as follows. In Section 2, we review the simultaneous shots method and its relationship to trace estimation and stochastic optimization. We also formalize why the approach fails in complex acquisition geometries. In Section 3 we discuss lowrank interpolation, and show the efficacy and limitations of a two stage approach; (1) interpolation followed by (2) inversion by simultaneous shots. In Section 4, we develop a unified formulation that solves a single problem to accomplish the tasks simultaneously, and show that this approach significantly improves the results. We also develop an algorithm for this unified optimization problem. We illustrate the advantages of the approach by inverting the Marmousi model [3] in a difficult acquisition scenario. The unified approach gives much better results than the simple stagewise workflow.
2 Background
Suppose that we have an allseeall geometry, so all sources are recorded by all the receivers, . Define
(5) 
where , and . Then the identity (4) is easily derived for :
(6) 
where the last equality assumes
, which is true e.g. for a standard Gaussian vector. Many other distributions, e.g. Rademacher, also satisfy this requirement
[12, 10].If we now consider the randomized function
we have , and evaluating requires solving a single PDE. We can think of the original problem as a stochastic optimization problem
(7) 
This leads to two views: stochastic average approximation, and stochastic optimization.
Stochastic Average Approximation (SAA)[21] [20]. We think of approximating the expectation by a small finite sample:
(8) 
where is any random vector that satisfies . The solution to (8) is then approximated by the solution to the following problem:
(9) 
We work with K PDEs, reducing the computational cost substantially if .
Stochastic optimization
. At any iteration, we can get an unbiased estimate of
and its gradient. Assuming for simplicity that we do not have any regularizer, we get a family of algorithms of the formwhere
so in particular is the true gradient of , while is any desired Hessian approximation that depends on the same set of shots as . To implement each iteration, we again work with PDEs; the variables can vary between iterations.
These randomized accelerations have a significant impact in full acquisition geometries, where all sources are recorded by the all receivers. In practical scenarios, this condition nearly always fails because of budgetary and physical constraints. For example, in marine seismic acquisition, both sources and receivers have to move with the ship, so allseeall is impossible by definition.
We can model any acquisition geometry as a subsample of a hypothetical allseeall acquisition. Call the fully sampled data and let denote the mask that extracts available observations:
(10) 
where is a observed subset of entries from the fully sampled data matrix . We then have , where is the elementwise (Hadamard) product.
The mask precludes a straightforward application of the simultaneous shot method. The masked version of (3) is given by
(11) 
We no longer have the simultaneous source term, because we cannot simply use the matrix vector product . Instead, we have to compute first before we can multiply it with . This negates the entire motivation of the simultaneous shot method, since we have to solve all PDEs at every iteration.
Our approach is to complete the data, so that we can work with even though we never observed it in the first place. We discuss datadriven lowrank interpolation techniques in the next section, and then present a unified interpolation and inversion approach.
3 Using LowRank Interpolation as a Preprocessing Step
In this section, we review optimizationbased methods for interpolating missing traces to reconstruct fully sampled data from the partial observations . Commonly used interpolation techniques promote a parsimonious representation of the data in a transform domain. For example, if the vectorized data is compressible in a particular domain, we interpolate by penalizing a sparsifying penalty (such as the 1norm) on the coefficients of in that domain; Fourier [19], Wavelet, and Curvelet [11]
domains are frequently used. Analogously, if the full data can be organized into a 2dimensional matrix with quickly decaying singular values, we can look for lowrank decompositions that match available observations
[15, 13, 1]. Here, we focus on lowrank approaches, which are computationally and memory efficient for largescale seismic data problems [13].To recover fully sampled data from the subsampled data, we can solve the following rankminimization problem
(12) 
where specifies how closely entries of must be to the actual observed entries . Problem (12) is NP hard, and algorithms that provide exact solutions have complexity that is doubly exponential in the dimension of the matrix [6, 5]. Popular alternatives are (1) a convex relaxation using the nuclearnorm, which replaces by the sum of singular values of [6, 5], or (2) an explicit factorization , where the modeler selects an upper bound for the rank of the factors a priori [13, 1] and solves
(13)  
(14) 
The regularizer is an upper bound to the nuclear norm (sum of singular values) of ([1]):
Once we recover , we can then apply stochastic optimization or solve an SAA approximation, as described in the introduction, without concern for the acquisitionencoding mask .
3.1 Numerical Example
In this section, we apply stagewise analysis (interpolation followed by inversion) to solve a fullwaveform inverse problem over a complex acquisition geometry. Here, represents the constantdensity acoustic Helmholtz wave equation
and encodes temporal frequencies and represents the squaredslowness. We simulate data using the full Marmousi velocity model [3], which has complex geological structure with steeply dipping events (Figure 1). We use a model grid spacing of 15 m and simulate a fixedspread acquisition configuration, 400 colocated sources and receivers with 12.5 m spacing, and a Ricker source wavelet with a peak frequency of 15 Hz. To generate a dataset with partial observations, we use a frequency spectrum of 330 Hz and interpolate each frequency slice independently in this range using rankminimization framework.
In order to recover missing entries using lowrank optimization, we require that:

The fully sampled data should have quickly decaying singular values.

Subsampling the data should destroy this fast decay.
When these conditions are met, we can penalize a rank proxy in order to recover the full data volume. For seismic data, monochromatic frequency slices satisfy these requirements [1] in the midpointoffset domain, where the midpoint () and offset () are defined as
(15) 
with and the row (source position) and column (receiver position) respectively. The transformation rotates the data matrix by 45 clockwise as illustrated in Figure 2. Figure 3 compares the singular value decay of a monochromatic seismic data slice at 4 Hz in the sourcereceiver and midpointoffset domain; conditions (1) and (2) above are satisfied in the midpointoffset domain, but not in the sourcereceiver domain (Figure 3). Therefore, we formulate a rankminimization problem (14) in the midpointoffset domain:
(16) 
Once we solve (16) for in the midpointoffset domain, we apply to the recovered data to map it back to the sourcereceiver domain. We then solve waveform inversion over an allseeall geometry using simultaneous shots. We use a linear gradient model to initialize, as shown in Figure 1. The inversions are carried out sequentially in ten overlapping frequency bands on the interval 3—30 Hz ([4]), each using 25 different randomly selected simultaneous shots and six selected frequencies in each band with an interval of 0.2 Hz. We use a limitedmemory LBFGS method [27].
Figure 4 shows the inversion results with interpolated data for 50%, 75% and 85% missing entries. We get excellent inversion results for 50% missing data, since we get good reconstruction quality data using rankminimization based framework (see Figures 5 and 6). We start to see deterioration of inversion results at about 75% missing data. Finally, as we move to 85% missing data, inversion quality deteriorates noticeably. This is due to the fact that the data reconstruction using rankminimization is poor as we move from low to high subsampling ratio (see Figures 7 and 8).
We can still recover a reasonable solution by solving the waveform inversion problem using the 15% of the observed data by applying classical approaches (nonrandomized) FWI techniques. However, this is a costly prospect, as we must solve all PDEs in each iteration to get the predicted data at the available locations. Simultaneous shots can get the solution quickly by solving very few PDEs using simultaneous sources, but require fillin for unseen data, which introduces error and degrades the solution quality. There appears to be a clear speed vs. quality tradeoff. In the next section we show that we can get around this bottleneck by using a unified approach, where we combine lowrank interpolation with inversion.
4 Simultaneous Data Completion and Inversion
From the previous section, it is clear that while filling in an allseeall acquisition scenario makes it possible to use simultaneous shots, it also introduces error when the available data comprise less than of the hypothetical full volume. To push past this boundary, we propose a unified data completion and waveform inversion approach, and develop a customized algorithm to solve it.
The approach is intuitive: we merge the lowrank data fillin problem (16) with the waveform inversion problem (3). The joint problem can be formulated as follows:
(17)  
subject to 
where is a tradeoff parameter. The advantage of (17) over the stagewise approach is that the model informs the interpolation. The datafit term in the constraints prevents fillin that is inconsistent with the physics. The challenge of (17) is the need to solve for all the sources, which is prohibitively expensive for largescale seismic data. To mitigate this, we design a stochastic blockcoordinate descent algorithm for (17
), where we need to solve only a few PDEs in each subproblem by using simultaneous shots. We can encode simultaneous shots explicitly using a random matrix
, where each column encodes a simultaneous shot by drawing a random vector from a standard i.i.d Gaussian distribution. The shots
are fixed throughout each iteration of the block coordinate descent method described below; after each iteration described below is complete, is resampled.Solving for for fixed . For fixed , we consider a problem (17) is a residualconstrained matrix factorization problem, studied by [1]. The randomized subproblem using the simultaneous shots is given by
(18)  
subject to 
where is the forwardmodeled simultaneous seismic data at the current .
We solve this problem using the extension of the SPG solver [23] developed by [1]^{5}^{5}5https://github.com/UWAMO/spgl1 to handle matrix factorization. The core idea is to solve a sequence of Lasso subproblems:
subject to  (19) 
where the corresponding to is found by solving with a root finding method [2]. The subproblem (19) is solved using a fast projected gradient method. The gradient computations for (19) are straightforward. The term needs to be computed only at the beginning of the interpolation for the current . We also reuse this precomputed to evaluate the gradient at the first iteration of the subproblem described below.
Solving for with fixed . When are fixed, problem (17) is essentially a feasibility problem in . For a set of simultaneous shots , we want to solve
(20) 
is fixed for this subproblem, and we simply use LBFGS to update . LBFGS only needs gradient information, and the gradients are computed using the standard adjointstate method [16].
The overall algorithm is summarized in Algorithm 1. The subproblems for and terminate when an iteration cap is reached.
5 Numerical Experiment
We test the joint inversion approach on the same dataset as the disjoint inversion, and compare the results with those obtained previously by the stagewise approach. We initialize L and R following [1] by using the SVD of the subsampled data in the midpoint offset domain:
and set and as the initial value. The models were recovered by performing 25 partial solves of the and subproblems. Instead of using the full 400 PDE solves to recover the model, we needed only 25 PDEs, just as for the disjoint inversion experiment.
Figure 9 shows the joint inversion results for 75% and 85% missing data. Stagewise and unified inversion give the same quality results for 50% missing data, hence, we do not plot the similar figure for joint inversion. However, the results for unified inversion at higher levels of missing data are far better. The recovery of the model is excellent even at 85% missing data, with good data reconstruction at high frequencies, see Figure 10.
6 Discussion and Conclusion
We have presented a joint interpolation and inversion framework, which can recover the highfidelity seismic data and subsurface velocity model under highsubsampling scenarios (e.g. 80% missing data). The proposed approach exploits the fast singular value decay of seismic data in the midpointoffset domain, and allows fast stochastic methods to bear on the inverse problem despite the missing data. In particular, we recover missing data and then use simultaneous shots from the recovered maps to inform waveform inversion. Using carefully selected stylized examples, we showed that our method outperforms the traditional seismic data reconstruction and inversion methods, is fast, and highly scalable. To our knowledge, this work is first of a kind in combining seismic data interpolation and waveforminversion in a joint framework, allowing us to efficiently and simultaneously reconstruct seismic data volumes and invert for artifactfree velocity models of the subsurface.
This paper opens several new avenues of research. First, so far we assume that the starting velocity model is not cycle skipped, which occurs when predicted and observed data differ by more than half a cycle. For future work, we plan to relax this assumption by using Wavefield Reconstruction Inversion [25] to handle the cycle skipping phenomenon. Second, here we consider only randomly subsampled data scenarios rather than structured scenarios such as towedstreamer marine seismic acquisition. Future work includes developing the proposed approach for these complex geophysical acquisition scenarios. Third, Seismic data are typically irregularly sampled along spatial axes, and we also plan to adapt the approach to nonuniform sampling grids. Finally, we expect that these methods can be extended to perform joint interpolation and inversion for largescale 3D seismic data acquisition, where the underlying model is 3D and the observed seismic data is 5D.
Acknowledgments
We would like to acknowledge the assistance of volunteers in putting together this example manuscript and supplement. R.K. would like to thank the member organizations of the SINBAD II project and the SINBAD Consortium for supporting this work. The work of Prof. Aravkin was supported by the Washington Research Foundation Data Science Professorship.
References
 [1] A. Aravkin, R. Kumar, H. Mansour, B. Recht, and F. J. Herrmann, Fast methods for denoising matrix completion formulations, with applications to robust seismic data interpolation, SIAM Journal on Scientific Computing, 36 (2014), pp. S237–S266.
 [2] A. Y. Aravkin, J. V. Burke, D. Drusvyatskiy, M. P. Friedlander, and S. Roy, Levelset methods for convex optimization, arXiv preprint arXiv:1602.01506, (2016).
 [3] A. Brougois, M. Bourget, P. Lailly, M. Poulet, P. Ricarte, and R. Versteeg, Marmousi, model and data, in EAEG WorkshopPractical Aspects of Seismic Data Inversion, 1990.
 [4] C. Bunks, F. M. Saleck, S. Zaleski, and G. Chavent, Multiscale seismic waveform inversion, Geophysics, 60 (1995), pp. 1457–1473.
 [5] E. J. Candes and Y. Plan, Matrix completion with noise, Proceedings of the IEEE, 98 (2010), pp. 925–936.
 [6] E. J. Candès and B. Recht, Exact matrix completion via convex optimization, Foundations of Computational mathematics, 9 (2009), p. 717.
 [7] M. Cheney, D. Isaacson, and J. C. Newell, Electrical impedance tomography, SIAM review, 41 (1999), pp. 85–101.
 [8] E. Haber, U. M. Ascher, and D. W. Oldenburg, Inversion of 3d electromagnetic data in frequency and time domain using an inexact allatonce approach, Geophysics, 69 (2004), pp. 1216–1228.
 [9] E. Haber, M. Chung, and F. Herrmann, An effective method for parameter estimation with pde constraints with multiple righthand sides, SIAM Journal on Optimization, 22 (2012), pp. 739–757.
 [10] E. Haber, M. Chung, and F. Herrmann, An effective method for parameter estimation with pde constraints with multiple righthand sides, SIAM Journal on Optimization, 22 (2012), pp. 739–757.
 [11] F. J. Herrmann and G. Hennenfent, Nonparametric seismic data recovery with curvelet frames, Geophysical Journal International, 173 (2008), pp. 233–248.
 [12] M. F. Hutchinson, A stochastic estimator of the trace of the influence matrix for laplacian smoothing splines, Communications in StatisticsSimulation and Computation, 19 (1990), pp. 433–450.
 [13] R. Kumar, C. Da Silva, O. Akalin, A. Y. Aravkin, H. Mansour, B. Recht, and F. J. Herrmann, Efficient matrix completion for seismic data reconstruction, Geophysics, 80 (2015), pp. V97–V114.
 [14] X. Li, A. Y. Aravkin, T. van Leeuwen, and F. J. Herrmann, Fast randomized fullwaveform inversion with compressive sensing, Geophysics, 77 (2012), pp. A13–A17.
 [15] V. Oropeza and M. Sacchi, Simultaneous seismic data denoising and reconstruction via multichannel singular spectrum analysis, Geophysics, 76 (2011), pp. V25–V32.
 [16] R.E. Plessix, A review of the adjointstate method for computing the gradient of a functional with geophysical applications, Geophysical Journal International, 167 (2006), pp. 495–503.
 [17] R. G. Pratt, Seismic waveform inversion in the frequency domain, part 1: Theory and verification in a physical scale model, Geophysics, 64 (1999), pp. 888–901.
 [18] F. RoostaKhorasani, Randomized algorithms for solving large scale nonlinear least squares problems, PhD thesis, University of British Columbia, 2015.

[19]
M. D. Sacchi, T. J. Ulrych, and C. J. Walker,
Interpolation and extrapolation using a highresolution discrete fourier transform
, IEEE Transactions on Signal Processing, 46 (1998), pp. 31–38.  [20] A. Shapiro, Monte carlo sampling methods, Handbooks in operations research and management science, 10 (2003), pp. 353–425.
 [21] A. Shapiro, D. Dentcheva, and A. Ruszczyński, Lectures on stochastic programming: modeling and theory, SIAM, 2009.
 [22] K. Steklova, Computational methods in hydrogeophysics, PhD thesis, University of British Columbia, 2017.
 [23] E. Van Den Berg and M. P. Friedlander, Probing the pareto frontier for basis pursuit solutions, SIAM Journal on Scientific Computing, 31 (2008), pp. 890–912.
 [24] T. van Leeuwen and F. J. Herrmann, Fast waveform inversion without sourceencoding, Geophysical Prospecting, 61 (2013), pp. 10–19.
 [25] T. van Leeuwen and F. J. Herrmann, A penalty method for PDEconstrained optimization in inverse problems, Inverse Problems, 32 (2015), p. 015007. (Inverse Problems).
 [26] J. Virieux and S. Operto, An overview of fullwaveform inversion in exploration geophysics, Geophysics, 74 (2009), pp. WCC1–WCC26.
 [27] S. Wright and J. Nocedal, Numerical optimization, Springer Science, 35 (1999), p. 7.
 [28] L. Yaoguo and D. W. Oldenburg, Inversion of 3d dc resistivity data using an approximate inverse mapping, Geophysical journal international, 116 (1994), pp. 527–537.
Comments
There are no comments yet.