1 Introduction
The computation of wave propagation is a critical ingredient in many important applications. For example, in inverse problems involving the wave equation as the forward model, efficient numerical simulation of wave propagation directly corresponds to a more precise and faster imaging pipeline.
In this work, we will focus on the model initial value problem:
(1)  
with periodic boundary conditions. Furthermore, we shall assume that (i) the wave speed is only piecewise smooth with nontrivial total variation, and (ii) the wavelengths in the initial wave field are at least one order of magnitude smaller than the domain size.
Performing largescale highfidelity simulations of wave propagation in heterogeneous media is a challenging task. Standard numerical methods typically require very fine spatiotemporal grids to fully resolve the smallest wavelength in the solutions, the geometry in the wave speed’s discontinuities, and to reduce excessive numerical dispersion errors. Furthermore, the stability and accuracy requirement imposes an additional restriction on the time step size, making longtime simulation even more difficult.
Many existing multiscale methods tackle the wave equation by assuming a separation of scales in the medium’s wave speed. The assumption on more global structures such as periodicity or stationary ergodicity is also common. Such assumptions allow a cleverly designed algorithm to achieve a computational complexity that is essentially independent of the need to resolve the smallest scale globally in spacetime. These are algorithms may employ either online or offline computations to approximate the effective wave propagation accurately. We mention the Heterogeneous Multiscale Methods (HMM), see e.g. [29][1][11], which typically solve the effective problem, free of small parameters, defined by localized online computations. The Multiscale Finite Element Method (MsFEM) style algorithms [10], including the Localized Orthogonal Decomposition (LOD) method, e.g. [2], compute specialized basis functions for each given multiscale medium. See also [12], and [21]. These methods do not require scale separation, but need to fully resolve the medium by the basis functions. As the basis functions are mediumspecific, these algorithms typically aim to solve the same equation repeatedly with different initial conditions. See also [12], and [21]. It is also possible to construct lowrank approximations of the numerical wave propagator using data. Druskin et. al. [9, 7] introduced a datadriven reduced order model (ROM) for the wave equation. Specifically, using wave field time snapshots the authors construct a wave propagator allowing very accurate inversion of the medium.
In [20], we proposed a multiscale algorithm in which the computation of a coarse, underresolving solver is corrected by online, using data computed, inparallel and in localized subdomains, by another accurate solver.
1.1 The proposed approach
Our approach departs from the above multiscale methodologies. We propose a supervised deep learning framework to enhance the accuracy of a lowfidelity and computationally cheaper solver for (1). After describing how neural networks are used in our approach, we will present our choice of neural network model (the architecture, the input, and output variables), and our approach to generating training data that consistently sample the causality in wave propagation in a class of media with piecewise smooth wave speeds. We will also discuss the rationales behind our approach and report several representative numerical studies.
We assume that a timereversible coarse solver, , is given. It advances a given wave field to time using the given wave speed profile . Additionally, an accurate fine solver, is also available. The fine solver is used to generate data for improving the coarse solver. As these two solvers operate on grids of different resolutions (a coarse and a fine grid), we will extend the coarse solver to one defined on the same grid as the fine solver for comparing the wave fields propagated by the two solvers. The extension is defined via a restriction (projection) operator
, which maps fine grid functions to the coarse grid. We shall also need a prolongation (e.g. interpolation) operator
, which maps coarse grid functions to the fine grid. and are typical elements in a standard multigrid algorithm.We reorganize the time stepping by the fine solver into
(2) 
In general, can be thought of as a lowpass filter, and is not identity; i.e. . The second term above, , should consist mostly the higher frequency components of As for the first term, since filters out the higher frequency parts in , it is reasonable to approximate by a computationally cheaper strategy using a coarser grid — we consider .
If we further introduce a fixedpoint iteration that allows the fine solver to be applied in parallel, we would derive a parareal scheme [17]:
(3) 
with
(4) 
The second term on the right hand side of (3) serves as an attempt to add back the missing high frequency components, using information from the previous iteration.
The iteration in (3) is often unstable when the solvers have too little dissipation. In such cases, the stability and convergence of (3) will rely heavily on being a good approximation of . However, in heterogeneous wave media, will likely produce higher frequency modes, even if have only low frequency modes. Consequently, we should not expect the approximation by to be globally sufficient.
One idea is to replace in by something more elaborate. Our objective is to construct neural networks that approximate the operation
(5) 
using the data , with and sampled from certain distributions of interests. We then use the resulting enhanced solver
(6) 
for wave propagation on the fine grid. Note that in most parts of this paper the solvers’ dependence on the wave speed, , is hidden for brevity of notation.
We will rely on the parareal iteration to add back the missing high frequency components. The enhanced parareal scheme take the same form as above
(7) 
with
(8) 
The wave solution operator, and most numerical solvers, are linear and reversible with respect to the initial data but nonlinear to the wave speed. So is However, the operator is a socalled ”nearidentity” operator for sufficiently smooth wave speeds and wave fields. Therefore, nearby a fixed reference speed function , the linear operator has potential to offer an adequate, though nonoptimal, approximation of .
It may be helpful to inspect the approximation of an individual Fourier mode of the wave field and consider errors in the phases and the amplitudes. The additive correction in the parareal iterations is very effective in reducing the amplitude errors. However, the phase errors generally have a detrimental effect on the stability of parareal iterations. See e.g. [3][20][26][16]. It is therefore imperative that corrects the numerical dispersion errors in compared to Furthermore, as discussed above, we also expect that to introduce some of the higher frequency modes that come out of propagating a smooth given wave field through the given nonconstant media.
Notice that for each instance of , is a linear operator, provided that the prolongation is linear. With the nonlinear activations and biases in the network removed, the optimization model defines a regression model to construct datadriven (factorized) linear approximations of the operator associated with . In which regime will this type of linear approximation work? We shall present some experiments and comparisons regarding this observation.
In Section 2, we present the proposed learning model and our approach in generating training data. In Section 3, we report results showing that a properly trained, simple, multilevel network can be an effective corrector of the coarse solver in a range of media. In Section 4, we apply the proposed networks to the parareal scheme in four wave speed models.
1.2 Other related work
train a convolutional neural network (CNN) for correction of lowfidelity solutions. As a supervised learning task, the data consists of lowfidelity and highfidelity pairs, computed using the secondorder and twentiethorder finitedifference, respectively. The data are time snapshots of the solutions for various media and sources. The experimental results in these papers indicate that the trained CNN can remove the numerical dispersion in complex velocity models. This result is encouraging since their setup is quite straightforward.
In [19], a neural network model is proposed to learn wave propagation on a fixed medium. The PhysicsInformed Neural Network (PINN) [22]
approach to setup the neural network’s optimization model. This means the use of the given PDE on a chosen setup of points in the domain as the penalty term. The fullyconnected neural network learns the linear relation between the initial wave source and the resulting wave field at later time. Each forward feed outputs the estimated wave field at the specified location. Applying the network for simulation on a different wave medium requires retraining the network.
Another approach to using neural networks in a parareal framework is discussed in [18]. The authors propose parareal physicsinformed neural network (PPINN). In particular, the sequence of fine solvers, mapping solutions from to ,
, are fully connected neural networks (PINNs) trained with loss functions that incorporate penalize the network outputs misfit to given differential equations. See
[22]. The key idea of [18] is using the the parareal scheme to train the sequence of PINNs in parallel. However, to the best of our current knowledge, the accuracy and stability of the PINN method for wave problems are not well understood.Our approach is similar to [27][23], but with additional key findings and further improvements. Most differently, our algorithm extensively uses the wave energy variables; i.e. the spatial gradient and the time derivative of weighted by the wave speed. We think that this strategy allows the algorithm to retain more physical properties of the wave system. This setup also leads to the input variables being dimensionless, which allows the algorithm to scale better. Such approach involving the energetic variables are used successfully in [28][20].
1.3 The numerical solvers
In this section, we describe the coarse and fine solvers used in this paper. Both solvers are defined by the standard central differencing for the partial derivatives, applied on uniform Cartesian grids for the spatial domain and uniform stepping in time. Starting with the discretization of the spatial derivatives:
(9) 
where
(10) 
and denote the standard basis of The central scheme in time (which is the Leap Frog scheme for (9)) can be written as the velocity Verlet time integrator for : (with approximating )
(11)  
where and This scheme scheme is widely use in practice for large scale computations, particularly for media with less smooth wave speeds.
We will denote the mapping defined in (11) by Both the fine and the coarse solvers will be defined by with different and .
The fine solver
used in this paper is defined by the same scheme (11) on a finer grid: . More precisely,
The coarse solvers
is defined by running (11) for time steps on the coarse grid ; i.e.
We assume that both the fine and the coarse solvers are stable on the respective grids. Furthermore, we assume that the fine solver is sufficiently accurate for wave speed , where is an upper bound for the wave speed in the class of wave media of interest.
If the two solvers are defined on different Cartesian grids (), we need to employ additional operators to map grid functions defined on the two grids. In this paper, denotes the restriction operator that maps a finer grid function defined on to one on . denotes the prolongation operator that maps a coarse grid function to fine grid.
With a slight abuse of notations, we will also use , , and to denote the corresponding discrete versions computed on the fine grid.
Wave energy components
The solvers approximately preserves the wave energy
(12) 
or its discretized version, which also defines a discrete seminorm on the grid:
(13) 
Due to this property, we shall also use a discretized version of the wave energy as a seminorm for comparing wave fields computed by the central scheme. In this paper, we will design algorithms that are defined as functions of the energy components of the wave field .
On the spatial grid with periodic boundary conditions, define the mapping into energy components as
(14) 
where is computed by numerically. denotes the pseudoinverse of and is defined in Appendix A.1.
The enhanced solvers
Our goal is to construct neural network functions , parameterized by , such that
For a class of piecewise smooth wave speed functions. In other words, corrects the energy components computed by the coarse solver , according to those computed by .
With , we define the enhanced solver as
(15) 
The time step size of the enhanced coarse solver  

u  The wave field vector: 
The correction operator  
The coarse solver defined on a coarse grid  
The fine solver  
The enhanced solver on the fine grid.  
The neural networks used for defining  
The discrete wave energy norm of .  
An elementary training data set  
The training data set derived from 
Correction of dispersion errors
Here we present a simple and classical dispersion property of the central scheme. The point is to identify a regime in which is an identity plus a minor perturbation. In such a regime, the enhanced solver should approximate the perturbation.
We analyze in a simplified setup involving plane waves: propagating in a medium with constant speed in one dimension. We assume that the fine solution operator to be exact, mapping the given wave field at time to :
which has the linear dispersion relation Here we exclusively use to denote the wave number (not the parareal iteration). On the other hand, we approximate the fullydiscrete coarse solver by a semidiscrete scheme. More precisely, in this scheme denoted as , central differencing with step size is used to approximate the spatial derivatives, and the resulting system is integrated in time exactly. We denote the coarse solution
The semidiscrete scheme gives us the numerical dispersion relation
(16) 
Define the difference between the positivesigned and as the numerical dispersion error
(17) 
We see the wellknown property that higher wave numbers lead to larger dispersion errors, and asymptotically, the need to decrease at a faster rate than .
We can thus approximate the energy components of the fine solution, by correcting the numerical dispersion errors in the coarse solutions . The correction can be written as a convolution:
(18) 
where
denotes the Fourier transform from the frequency
to . For details, readers refer to Appendix A.2.We notice that in Equation (16), for small , the leading order term in the correction is an identity map. This fact could serve as a theoretical motivation for adding a skip connection between every corresponding pair of layers in the multilevel network described below. The numerical dispersion relation (16) and (18) also predict that the main role of the neural network would be to overcome the challenges resulting from higher frequency wave components, to say the least.
In Section 3.2, we presents an example illustrating the numerical dispersion errors in a piecewise constant wave wave medium. There, we will compare the correction of such errors by the proposed networks.
2 The neural networks and training
In this section, we describe the network architecture, the optimization model, and the generation of training examples.
The network’s input variables are the energy components of a wave field on the coarse grid and the wave speed function — these are functions on teh coarse grid. The outputs are the energy components of the processed wave field, but on the fine grid. The network does not explicitly involve any approximation of the derivatives of the wave field.
2.1 The JNet architecture
Our chosen network architecture is a Unet architecture [25] with skip connections, though in our setup the downwardupward path in the network resembles more like the letter J.
The network contains a cascade of convolution layers encoding the input tensor into a socalled feature space. Then a cascade of convolution layers decodes the feature space to produce the final output image. The encoding pathway utilizes averagepooling layers to downsample the intermediate output, while the decoding pathway uses the bilinear interpolation. The network structure is also similar to the multigrid network MgNet
[15].The input tensor
has channels consisting of the energy components of the coarse solution and the wave speed functions defined on the coarse grid.
Figure 1 illustrates a 3level network. The networks defined on such structure is the focus of this paper. A network with downsampling processes are referred to as an level network. In particular, we shall report computational results obtained from trained 3level and 6level networks. The ones with ReLU activations will be referred to as JNets. The networks with ReLU replaced by the identity function and no biases will be referred to as the linear networks. In this paper, we will compare these networks after training with different databases.
Finally, as a reference, the forward feed of a 3level JNet
, implemented in PyTorch with a
input tensor, takes roughly seconds on the CPU of a moderately equipped laptop. On the same computer, the fine solver, implemented in NumPy, takes roughly seconds to evolve a given initial wave field of size to time2.2 Data sets and optimization model
To learn the coarsetofine solution map the training data should contain sufficiently representative examples, sampling the wave speed of interest and the wave fields of relevance — those propagated by the coarse and fine solvers, starting from a class of initial conditions of interest. In our setup,
The wave speed and the wave field are sampled from the distributions described below.
Both the coarse and the fine solvers use the standard second order scheme defined in Section 1.3. The discretization parameters of the solvers are tabulated in Table 2.
The data sets discussed below will be made available for download.
order of interp  
th order 
The wave speed and initial wave field samples and
The data set for the medium wave speed comprises of randomly selected subregions (of different areas and rotations) of the full Marmousi model [8] and the BP model [6]. The wave speed in each of the subregions will be mapped (subsampled and rescaled^{3}^{3}3The wave speed in each medium example is rescaled by an integer, allowing the coarse solver to be stable with the predetermined and .) to a array, which is regarded as a grid function defined on , . See Figure 2.
Figure 3 illustrates the statistics in the wave speeds sampled from these two data sets. Additionally, we remark that Wu et al. [30] developed a comprehensive methodology to synthesize the mediums with realistic earth layers and faults. Their approach may be use to further enrich the data set.
For each of the sampled subregions, an initial wave field, is sampled from the Gaussian pulses of the form:
The pair thus sampled is collected in
The example wave fields and
We recognize the need to sample the strong causality in the wave dynamics. Ergo we sample the trajectories passing through the wave speedinitial wave field pair in and Figure 4 demonstrate the scheme which we use to generate the training two training sets. The trajectories are discretized by a finite number of points computed from the fine solver (one could use the coarse solver for this as well). The points from the trajectories are paired with the associating wave speed function and collected into the training data sets.

We first collect the sequence of wave fields propagated by the fine solver:
Then consists of the pairs
(19) 
This data set consists of the pairs
(20)
We generate examples from and , and name the resulting databases and respectively. When the context is clear, we will use and for simplicity. Figure 5 shows the Fourier mode statistics of the wave fields in and . We observe that the wave fields in have more high frequency modes.
The optimization model and training
We use a simple stochastic gradient descent algorithm to minimize the mean squared error (MSE)
(21) 
where is the training data set and denotes the number of training examples. is either or defined above. Here the summands correspond to the approximation errors in the discrete wave energy (13). Similar strategies of using wave energy to compare wave fields for wave propagation purposes have been used successfully, for example in [24][28], and [20].
The neural networks are trained by the minibatch stochastic gradient descent (SGD) with momentum, starting with the initial conditions prescribed by PyTorch’s builtin routines. The training data is randomly divided into batches of 128 examples. We report the MSE in a typical training progress for each data set in Figure 6. Notice that the training errors on the right subplot appear approximately one order of magnitude smaller than those on the left. This is a subject of future investigation. It is possibly due to the Procrustes solutions’s being closer to each other in some suitable sense.
In Table 3, we further report a comparison of the final training errors and testing errors between the 3level and 6level JNet, using .
3level  6level  3level  6level  

Final training error  
Test error 
3 Propagation
In this section, we study the enhanced propagator in a few setups. We shall report the computed results using the relative energy error
where is the computed wave field and is the reference solution.
3.1 Comparison of propagation errors on constant media
We test the enhanced solvers on constant media of different wave speeds, using initial wave fields consisting of pulses of various widths. The networks are trained on data sets of different cardinalities, sampled from the same distribution.
The test initial wave field is defined as follows:
where is sampled uniformly . Larger valuers of correspond to wave fields whose Fourier modes for higher wave numbers have larger magnitudes (”sharper” wave fields).
Coarse and fine solutions are propagated in a constant medium . The errors in the energy is defined as
(22) 
We remark that the dependence of for and are hidden from the notation.
We report a set of comparisons of in Figures 7 and 8 for and respectively. The results are obtained from the 6level linear and ReLU JNets, with the parameters reported in Table 2. The networks trained on and . In Figure 7, from the two rows of error images, we observe that while linear networks produce roughly comparable errors to those from the ReLU networks in the region of lower wave speed and smaller . On the opposite regime (larger and ), the nonlinear ReLU networks perform better. As discussed in Section 1.3, sharper wave fields and faster wave speeds result in larger nummerical dispersion errors. The approximation power in the nonlinear networks seems to contribute the better performance. We also notice that the ReLU networks trained with 10,000 examples seems to achieve slightly larger minimal errors than those by the networks trained with 5,000 examples. We will see in Section 4 that a more significant number of training examples does result in better generalization capacity for the ReLU networks and perhaps not surprisingly have relatively little impact on the linear networks. Figure 8 presents a different situation. With , the difference between the coarse solver and the fine solver is larger. Here we see a clear advantage in using the nonlinear networks.
3.2 Numerical dispersion and refraction
In this section, we study the enhanced solvers in a wave medium described by piecewise constant wave speed of moderate contrast.
Figure 9 shows the model layered medium and the anisotropic initial Gaussian pulse. We apply different solvers to propagate that initial wave field. The wave fronts associated with different wavenumbers travel at different speeds and transmit through the interfaces (the wave speed discontinuities) at different times. Figure 10 shows the energy of the propagated wave fields. The networks used in these simulations are trained on The proposed networks can provide a certain level of nontrivial correction to the dispersion and refraction in the coarse solution.
3.3 More complex wave speeds
We demonstrate the effectiveness of the trained neural networks in proving the accuracy of coarse solver. We compute the evolution of the wave field
with the initial condition
We consider four test media:

Waveguide: ,

Inclusion: ,

BP model (rescaled): ,

Marmousi model (rescaled): .
These four models are illustrated in Figure 12. Note that these examples are not present in the training data set. The parameters used in the solvers are reported in Table 2.
Figure 13 presents a comparison of the coarse solver and a few enhanced solvers . Compared to the training and testing errors presented in Table 3, we see that the effect of the ”downsampling” operator dominates to a large degree. We encourage the readers to compare this result to the one presented in Figure 18 in the next section.
In Figure 14 we show the ”phantom” energy in the output of While this can be regarded as an undesirable effect, it does improve the amplitude of the propagated wave field. We also observe that the trained 6level network produces drastically reduced amount of ”phantom” energy.
4 Improvement by parareal iterations
One of the shortcomings of employing a current deep learning strategy for scientific computing tasks is the lack of rigorous error estimates. Our idea is to use the proposed deep learning strategy in a selfimproving loop, in which one can show that the computed errors will contract along the iterations. Here, we adopt the parareal framework, originally proposed in [17], as the potentially selfimproving looping mechanism.
Parareal methods can be regarded as a fixed point iteration that involves the coarse and the fine solvers:
(23) 
where the loop is enumerated in and the subscript denotes the time stepping. The initial conditions for the iterations are defined by
(24) 
In the context of this paper,
is defined on the fine grid, and
The convergence of parareal methods are analyzed for different contexts in several papers. See e.g. [5],[13],[4][26][16]. We adhere to a simple version, which states that at the discrete level with the vector norm, the error amplification factor for the parareal iteration is the product of and with . The iteration is convergent when the coarse solver is sufficiently accurate, depending on the magnitude of and the size of the iterative system, . It is a challenge to construct a stable and convergent parareal method for purely hyperbolic problems, especially when the coarse solver is underresolving the medium.
As we have seen in the serial computation examples in Section 2.2, our network reduces the errors in the coarse solutions by 90% in very challenging media. Hence, we expect the parareal iteration involving the enhanced coarse solver to be more stable and, in turn, be used to improve approximation accuracy.
Following the framework proposed as the parareal schemes [4][20], the focus of this section rest on schemes of the form:
(25) 
where the operator under consideration include

is defined by a JNet, trained offline.
The Procrustes parareal method
The Procrustes parareal method is defined as follows:
(26) 
where and for , solves the following optimization problem:
(27) 
The data matrices and are defined as follows:

The th column of corresponds to the vector ,

The th column of corresponds to the vector ,
For , are defined in (24). We call this method the Procrustes parareal method since is constructed by solving the Orthogonal Procrustes Problem [14].
The Procrustes parareal algorithm is introduced in[20]. In the original version, the fine and coarse solvers communicate on the coarse grid. This means that the data matrices records the wave fields on the coarse grid and is smaller. In contrast, in the version presented above, the the solvers communicate on the fine grid. This allows the resulting algorithm to correct for details corresponding to higher wave numbers.
Both versions of the Procrustes parareal method are remarkably stable for wave propagation. We point out again, that due to this stability, we are able to use this method to generate training examples that lead to more performant neural networks. See Section 2.2.
The neural network approach
Of course, the center of this paper is
and is a pretrained neural network discussed above. We initialized the parareal iterations by the coarse solutions:
(28) 
4.1 Numerical simulations and comparisons
First, we follow up the dispersion study in Section 3.2 and present the wave energy field computed in the first four parareal iterations in Figure 11. There we observe the effect of the additive correction of the parareal updates — with accurate phase, the amplitudes are restored effectively.
In Figures 1519, we observe that the initial errors () from different enhanced solvers are similar to the results reported in Figure 13. The initial errors are dominated by the effects of the restriction operator in and being repeatedly applied in the time stepping process. However, as increases, the effect of having better phase errors in the Jnets can be observed. In some situations, the advantage of having more dissipation can also be observed.
We present a comprehensive accuracy study in Figure 15, comparing the solvers that we have discussed so far. There, we observe the importance of the training data size for the nonlinear networks. We also notice that the linear networks perform surprisingly well in the studied setup (particularly the linear network trained with on 2000 examples). One possible explanation is that the operator to be approximated,
is closer to the identity matrix with a nonlinear perturbation (both
and are nonlinear functions of ); as least forand the class of wave speeds under consideration. Ergo, linear regression may yield reasonable approximations. However, for larger
, we have seen in Figure 8 that linear networks are inferior.Figure 16 shows a comparison of the networks trained on and We observe that the network trained on yields better performance after a few iterations. It is possible that the wave fields in are closer to the ones encountered in the simulations performed in this experiment — the wave fields are computed by parareal style updates.
In Figure 17 we compare the accuracy of the threelevel JNets trained for and We first observe from the errors at that the enhanced solver with the smaller is more accurate. Except for the BPmodel, the performance of the parareal iterations with the two networks seem comparable.
In Figure 18, we present a comparison of the performance of the proposed parareal correction, (25), using the threelevel and the sixlevel networks, both trained on . The iterations with the threelevel network appear to yield smaller errors because the threelevel network has larger numerical dissipation (See Figure 10 and Section 3.2).
In Figure 19, we compare the Procrustes approach defined in (26) and the coarse solver enhanced by a 6level JNet with
So what can we conclude from these examples? First, the phase accuracy is of upmost importance in the performance of the pararealstyle iterations. Second, the parareal scheme itself is sensitive to the amount of dissipation in the system. For systems with little or no dissipation, such as the hyperbolic system that we consider in this paper, the size of the iterative system, will have a critical role: the larger it is, the smaller the difference between and has to be in order to have stability and achieve the desired correction. Of course, the stability and convergence for such type of iterations is well understood, e.g. [31]. Our examples not only confirm such theories, but also demonstrate the feasibility of adopting a datadriven approach, e.g. the Procrustes approach [20] using online data or the deep learning approach using offline data.
Finally, we noticed that the BP model presents more difficulty for the proposed strategy. One possible cause may be the higher contrast in the BP model — particularly the large discontinuity across the periodically extended top and bottom boundaries — and that the training data should include more instances of wave propagating through such type of interfaces. This is a subject of future investigation.
5 Summary
In this paper, we presented a deep learning approach for improving the accuracy of wave propagation by an underresolving numerical solver. The data used for training the neural networks come from simulations involving the coarse solver to be corrected and a more accurate fine solver. We incorporated the developed deep learning framework into the parareal scheme, thereby improving its convergence property and the accuracy of the simulated wave propagation. Experimental results show that the presented approach dramatically enhances the stability and accuracy of the assimilated parareal scheme in a class of challenging media.
There is much room for generalization and improvement. We presented an approach to learn the correction operator for fixed . Would it be possible to learn as a function of ? The presented numerical examples comparing networks trained on and demonstrated the importance of sampling the computationally relevant, strongly causal wave fields. A more rigorous understanding of the wave fields manifold and devising efficient sampling algorithms would undoubtedly lead to more effective learning results.
Acknowledgment
Nguyen and Tsai were supported partially by National Science Foundation Grant DMS1913209. Tsai is also partially supported by National Science Foundation Grant DMS1720171, and by Army Research Office, under Cooperative Agreement Number W911NF1920333. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Office or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein.
The authors thank Louis Ly for stimulating conversations on the design and training of neural networks. The authors thank Texas Advanced Computing Center (TACC) for the computing resource.
References
 [1] (2011) Finite element heterogeneous multiscale method for the wave equation. Multiscale Modeling & Simulation 9 (2), pp. 766–792. Cited by: §1.
 [2] (2017) Localized orthogonal decomposition method for the wave equation with a continuum of scales. Mathematics of Computation 86 (304), pp. 549–587. Cited by: §1.
 [3] (2016) Parareal multiscale methods for highly oscillatory dynamical systems. SIAM J. Sci. Comput. 38 (6). Cited by: §1.1.
 [4] (2017) Thetaparareal scheme. arXiv:1704.06882 [math.NA]. External Links: Link Cited by: §4, §4.

[5]
(2005)
On the convergence and the stability of the parareal algorithm to solve partial differential equations
. In Domain Decomposition Methods in Science and Engineering, pp. 425–432. External Links: Document, ISBN 9783540268253, Link Cited by: §4.  [6] (2005) The 2004 bp velocity benchmark. In 67th EAGE Conference & Exhibition, pp. cp–1. Cited by: §2.2.
 [7] (2020) Reduced order model approach to inverse scattering. SIAM Journal on Imaging Sciences 13 (2), pp. 685–723. Cited by: §1.
 [8] (1990) Marmousi, model and data. In EAEG workshoppractical aspects of seismic data inversion, Cited by: §2.2.
 [9] (2018) A nonlinear method for imaging with acoustic waves via reduced order model backprojection. SIAM Journal on Imaging Sciences 11 (1), pp. 164–196. Cited by: §1.
 [10] (2009) Multiscale finite element methods: theory and applications. Vol. 4, Springer Science & Business Media. Cited by: §1.
 [11] (2012) Multiscale methods for wave propagation in heterogeneous media over long time. In Numerical analysis of multiscale computations, pp. 167–186. Cited by: §1.
 [12] (2019) A highorder multiscale finiteelement method for timedomain elastic wave modeling in strongly heterogeneous media. Journal of Applied Geophysics 170, pp. 103852. Cited by: §1.
 [13] (2007) Analysis of the parareal timeparallel timeintegration method. SIAM Journal on Scientific Computing 29 (2), pp. 556–578. Cited by: §4.
 [14] (2012) Matrix computations. Vol. 3, JHU Press. Cited by: §4.
 [15] (2019) MgNet: a unified framework of multigrid and convolutional neural network. Science china mathematics 62 (7), pp. 1331–1354. Cited by: §2.1.
 [16] (2018) Influence of the phase accuracy of the coarse solver calculation on the convergence of the parareal method iteration for hyperbolic pdes. Computing and Visualization in Science (English). External Links: ISSN 14329360 Cited by: §1.1, §4.
 [17] (2001) A ”parareal” in time discretization of pde’s. Comptes Rendus de l’Academie des Sciences 332, pp. 661–668. Cited by: §1.1, §4.
 [18] (2019) PPINN: parareal physicsinformed neural network for timedependent pdes. arXiv preprint arXiv:1909.10145. Cited by: §1.2.
 [19] (2020) Solving the wave equation with physicsinformed deep learning. arXiv preprint arXiv:2006.11894. Cited by: §1.2.
 [20] (2020) A stable parareallike method for the second order wave equation. Journal of Computational Physics 405, pp. 109156. Cited by: §1.1, §1.2, §1, §2.2, §4, §4.1, §4.
 [21] (2008) Numerical homogenization of the acoustic wave equations with a continuum of scales. Computer Methods in Applied Mechanics and Engineering 198 (34), pp. 397–406. Cited by: §1.
 [22] (2019) Physicsinformed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics 378, pp. 686–707. Cited by: §1.2, §1.2.
 [23] (2019) Learned iterative solvers for the helmholtz equation. In 81st EAGE Conference and Exhibition 2019, Vol. 2019, pp. 1–5. Cited by: §1.2, §1.2.
 [24] (2016) Acoustic wavefield imaging using the energy norm. Geophysics 81 (4), pp. S151–S163. Cited by: §2.2.
 [25] (2015) Unet: convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computerassisted intervention, pp. 234–241. Cited by: §2.1.
 [26] (201806) Wave propagation characteristics of parareal. Comput. Vis. Sci. 19 (12), pp. 1–17. External Links: ISSN 14329360, Link, Document Cited by: §1.1, §4.
 [27] (2019) Neural network augmented waveequation simulation. arXiv preprint arXiv:1910.00925. Cited by: §1.2, §1.2.
 [28] (2009) Gaussian beam decomposition of high frequency wave fields. Journal of Computational Physics 228 (23), pp. 8856 – 8871. External Links: Document, ISSN 00219991, Link Cited by: §1.2, §2.2.
 [29] (2003) The heterognous multiscale methods. Communications in Mathematical Sciences 1 (1), pp. 87–132. Cited by: §1.
 [30] (2020) Building realistic structure models to train convolutional neural networks for seismic structural interpretation. Geophysics 85 (4), pp. WA27–WA39. Cited by: §2.2.
 [31] (2017) Algebraic multigrid methods. Acta Numerica 26, pp. 591–721. Cited by: §4.1.
Appendix A Appendix
a.1 Defining
Given a vector and , and assuming that for some function periodic . The main task is to find . Of course, without any additional assumption, one can only recover a function from its partial derivatives up to a constant. So, we may define a pseudoinverse of in the Fourier domain as:
(29) 
In the setup of this paper, we need to evaluate
This means that comes from the first component of We take to be the discrete sum of the coarse wave field , defined by . More precisely,
Such choice of can be considered as a consistency condition, when is replaced by the identity operator, and the coarse grid is identical to the fine grid.
a.2 Analytic correction operator in constant media
Here we derive the formula for correcting the numerical dispersion error of the semidiscrete scheme in one space dimension and a constant medium. The precise context is discussed in discussed in Section 1.3.
We use the plane wave ansatz
where are functions to be determined from the initial conditions, and .
Plugging the Ansatz into the definition of energy components
(30)  
(31) 