Log In Sign Up

Numerical wave propagation aided by deep learning

by   Hieu Nguyen, et al.

We propose a deep learning approach for wave propagation in media with multiscale wave speed, using a second-order linear wave equation model. We use neural networks to enhance the accuracy of a given inaccurate coarse solver, which under-resolves a class of multiscale wave media and wave fields of interest. Our approach involves generating training data by the given computationally efficient coarse solver and another sufficiently accurate solver, applied to a class of wave media (described by their wave speed profiles) and initial wave fields. We find that the trained neural networks can approximate the nonlinear dependence of the propagation on the wave speed as long as the causality is appropriately sampled in training data. We combine the neural-network-enhanced coarse solver with the parareal algorithm and demonstrate that the coupled approach improves the stability of parareal algorithms for wave propagation and improves the accuracy of the enhanced coarse solvers.


page 13

page 14

page 20

page 21

page 22

page 23

page 27


Multigrid-augmented deep learning preconditioners for the Helmholtz equation

In this paper, we present a data-driven approach to iteratively solve th...

Simulation of the propagation of a cylindrical shear wave : non linear and dissipative modelling

The simulation of a wave propagation caused by seismic stimulation allow...

Deep Eikonal Solvers

A deep learning approach to numerically approximate the solution to the ...

Site-specific Deep Learning Path Loss Models based on the Method of Moments

This paper describes deep learning models based on convolutional neural ...

Fourier Neural Operator Networks: A Fast and General Solver for the Photoacoustic Wave Equation

Simulation tools for photoacoustic wave propagation have played a key ro...

1 Introduction

The computation of wave propagation is a critical ingredient in many important applications. For example, in inverse problems involving the wave equation as the forward model, efficient numerical simulation of wave propagation directly corresponds to a more precise and faster imaging pipeline.

In this work, we will focus on the model initial value problem:


with periodic boundary conditions. Furthermore, we shall assume that (i) the wave speed is only piecewise smooth with non-trivial total variation, and (ii) the wavelengths in the initial wave field are at least one order of magnitude smaller than the domain size.

Performing large-scale high-fidelity simulations of wave propagation in heterogeneous media is a challenging task. Standard numerical methods typically require very fine spatio-temporal grids to fully resolve the smallest wavelength in the solutions, the geometry in the wave speed’s discontinuities, and to reduce excessive numerical dispersion errors. Furthermore, the stability and accuracy requirement imposes an additional restriction on the time step size, making long-time simulation even more difficult.

Many existing multiscale methods tackle the wave equation by assuming a separation of scales in the medium’s wave speed. The assumption on more global structures such as periodicity or stationary ergodicity is also common. Such assumptions allow a cleverly designed algorithm to achieve a computational complexity that is essentially independent of the need to resolve the smallest scale globally in space-time. These are algorithms may employ either online or offline computations to approximate the effective wave propagation accurately. We mention the Heterogeneous Multiscale Methods (HMM), see e.g. [29][1][11], which typically solve the effective problem, free of small parameters, defined by localized online computations. The Multiscale Finite Element Method (MsFEM) style algorithms [10], including the Localized Orthogonal Decomposition (LOD) method, e.g. [2], compute specialized basis functions for each given multiscale medium. See also [12], and [21]. These methods do not require scale separation, but need to fully resolve the medium by the basis functions. As the basis functions are medium-specific, these algorithms typically aim to solve the same equation repeatedly with different initial conditions. See also [12], and [21]. It is also possible to construct low-rank approximations of the numerical wave propagator using data. Druskin et. al. [9, 7] introduced a data-driven reduced order model (ROM) for the wave equation. Specifically, using wave field time snapshots the authors construct a wave propagator allowing very accurate inversion of the medium.

In [20], we proposed a multiscale algorithm in which the computation of a coarse, under-resolving solver is corrected by online, using data computed, in-parallel and in localized subdomains, by another accurate solver.

1.1 The proposed approach

Our approach departs from the above multiscale methodologies. We propose a supervised deep learning framework to enhance the accuracy of a low-fidelity and computationally cheaper solver for (1). After describing how neural networks are used in our approach, we will present our choice of neural network model (the architecture, the input, and output variables), and our approach to generating training data that consistently sample the causality in wave propagation in a class of media with piecewise smooth wave speeds. We will also discuss the rationales behind our approach and report several representative numerical studies.

We assume that a time-reversible coarse solver, , is given. It advances a given wave field to time using the given wave speed profile . Additionally, an accurate fine solver, is also available. The fine solver is used to generate data for improving the coarse solver. As these two solvers operate on grids of different resolutions (a coarse and a fine grid), we will extend the coarse solver to one defined on the same grid as the fine solver for comparing the wave fields propagated by the two solvers. The extension is defined via a restriction (projection) operator

, which maps fine grid functions to the coarse grid. We shall also need a prolongation (e.g. interpolation) operator

, which maps coarse grid functions to the fine grid. and are typical elements in a standard multigrid algorithm.

We reorganize the time stepping by the fine solver into


In general, can be thought of as a low-pass filter, and is not identity; i.e. . The second term above, , should consist mostly the higher frequency components of As for the first term, since filters out the higher frequency parts in , it is reasonable to approximate by a computationally cheaper strategy using a coarser grid — we consider .

If we further introduce a fixed-point iteration that allows the fine solver to be applied in parallel, we would derive a parareal scheme [17]:




The second term on the right hand side of (3) serves as an attempt to add back the missing high frequency components, using information from the previous iteration.

The iteration in (3) is often unstable when the solvers have too little dissipation. In such cases, the stability and convergence of (3) will rely heavily on being a good approximation of . However, in heterogeneous wave media, will likely produce higher frequency modes, even if have only low frequency modes. Consequently, we should not expect the approximation by to be globally sufficient.

One idea is to replace in by something more elaborate. Our objective is to construct neural networks that approximate the operation


using the data , with and sampled from certain distributions of interests. We then use the resulting enhanced solver


for wave propagation on the fine grid. Note that in most parts of this paper the solvers’ dependence on the wave speed, , is hidden for brevity of notation.

We will rely on the parareal iteration to add back the missing high frequency components. The enhanced parareal scheme take the same form as above




The wave solution operator, and most numerical solvers, are linear and reversible with respect to the initial data but nonlinear to the wave speed. So is However, the operator is a so-called ”near-identity” operator for sufficiently smooth wave speeds and wave fields. Therefore, nearby a fixed reference speed function , the linear operator has potential to offer an adequate, though non-optimal, approximation of .

It may be helpful to inspect the approximation of an individual Fourier mode of the wave field and consider errors in the phases and the amplitudes. The additive correction in the parareal iterations is very effective in reducing the amplitude errors. However, the phase errors generally have a detrimental effect on the stability of parareal iterations. See e.g. [3][20][26][16]. It is therefore imperative that corrects the numerical dispersion errors in compared to Furthermore, as discussed above, we also expect that to introduce some of the higher frequency modes that come out of propagating a smooth given wave field through the given non-constant media.

Notice that for each instance of , is a linear operator, provided that the prolongation is linear. With the nonlinear activations and biases in the network removed, the optimization model defines a regression model to construct data-driven (factorized) linear approximations of the operator associated with . In which regime will this type of linear approximation work? We shall present some experiments and comparisons regarding this observation.

In Section 2, we present the proposed learning model and our approach in generating training data. In Section 3, we report results showing that a properly trained, simple, multi-level network can be an effective corrector of the coarse solver in a range of media. In Section 4, we apply the proposed networks to the parareal scheme in four wave speed models.

1.2 Other related work

The authors in [27][23]

train a convolutional neural network (CNN) for correction of low-fidelity solutions. As a supervised learning task, the data consists of low-fidelity and high-fidelity pairs, computed using the second-order and twentieth-order finite-difference, respectively. The data are time snapshots of the solutions for various media and sources. The experimental results in these papers indicate that the trained CNN can remove the numerical dispersion in complex velocity models. This result is encouraging since their setup is quite straightforward.

In [19], a neural network model is proposed to learn wave propagation on a fixed medium. The Physics-Informed Neural Network (PINN) [22]

approach to setup the neural network’s optimization model. This means the use of the given PDE on a chosen setup of points in the domain as the penalty term. The fully-connected neural network learns the linear relation between the initial wave source and the resulting wave field at later time. Each forward feed outputs the estimated wave field at the specified location. Applying the network for simulation on a different wave medium requires retraining the network.

Another approach to using neural networks in a parareal framework is discussed in [18]. The authors propose parareal physics-informed neural network (PPINN). In particular, the sequence of fine solvers, mapping solutions from to ,

, are fully connected neural networks (PINNs) trained with loss functions that incorporate penalize the network outputs misfit to given differential equations. See

[22]. The key idea of [18] is using the the parareal scheme to train the sequence of PINNs in parallel. However, to the best of our current knowledge, the accuracy and stability of the PINN method for wave problems are not well understood.

Our approach is similar to [27][23], but with additional key findings and further improvements. Most differently, our algorithm extensively uses the wave energy variables; i.e. the spatial gradient and the time derivative of weighted by the wave speed. We think that this strategy allows the algorithm to retain more physical properties of the wave system. This setup also leads to the input variables being dimensionless, which allows the algorithm to scale better. Such approach involving the energetic variables are used successfully in [28][20].

1.3 The numerical solvers

In this section, we describe the coarse and fine solvers used in this paper. Both solvers are defined by the standard central differencing for the partial derivatives, applied on uniform Cartesian grids for the spatial domain and uniform stepping in time. Starting with the discretization of the spatial derivatives:




and denote the standard basis of The central scheme in time (which is the Leap Frog scheme for (9)) can be written as the velocity Verlet time integrator for : (with approximating )


where and This scheme scheme is widely use in practice for large scale computations, particularly for media with less smooth wave speeds.

We will denote the mapping defined in (11) by Both the fine and the coarse solvers will be defined by with different and .

The fine solver

used in this paper is defined by the same scheme (11) on a finer grid: . More precisely,

The coarse solvers

is defined by running (11) for time steps on the coarse grid ; i.e.

We assume that both the fine and the coarse solvers are stable on the respective grids. Furthermore, we assume that the fine solver is sufficiently accurate for wave speed , where is an upper bound for the wave speed in the class of wave media of interest.

If the two solvers are defined on different Cartesian grids (), we need to employ additional operators to map grid functions defined on the two grids. In this paper, denotes the restriction operator that maps a finer grid function defined on to one on . denotes the prolongation operator that maps a coarse grid function to fine grid.

With a slight abuse of notations, we will also use , , and to denote the corresponding discrete versions computed on the fine grid.

Wave energy components

The solvers approximately preserves the wave energy


or its discretized version, which also defines a discrete semi-norm on the grid:


Due to this property, we shall also use a discretized version of the wave energy as a semi-norm for comparing wave fields computed by the central scheme. In this paper, we will design algorithms that are defined as functions of the energy components of the wave field .

On the spatial grid with periodic boundary conditions, define the mapping into energy components as


where is computed by numerically. denotes the pseudo-inverse of and is defined in Appendix A.1.

The enhanced solvers

Our goal is to construct neural network functions , parameterized by , such that

For a class of piecewise smooth wave speed functions. In other words, corrects the energy components computed by the coarse solver , according to those computed by .

With , we define the enhanced solver as

The time step size of the enhanced coarse solver

The wave field vector:

The correction operator
The coarse solver defined on a coarse grid
The fine solver
The enhanced solver on the fine grid.
The neural networks used for defining
The discrete wave energy norm of .
An elementary training data set
The training data set derived from
Table 1: A list of key notations.

Correction of dispersion errors

Here we present a simple and classical dispersion property of the central scheme. The point is to identify a regime in which is an identity plus a minor perturbation. In such a regime, the enhanced solver should approximate the perturbation.

We analyze in a simplified setup involving plane waves: propagating in a medium with constant speed in one dimension. We assume that the fine solution operator to be exact, mapping the given wave field at time to :

which has the linear dispersion relation Here we exclusively use to denote the wave number (not the parareal iteration). On the other hand, we approximate the fully-discrete coarse solver by a semi-discrete scheme. More precisely, in this scheme denoted as , central differencing with step size is used to approximate the spatial derivatives, and the resulting system is integrated in time exactly. We denote the coarse solution

The semi-discrete scheme gives us the numerical dispersion relation


Define the difference between the positive-signed and as the numerical dispersion error


We see the well-known property that higher wave numbers lead to larger dispersion errors, and asymptotically, the need to decrease at a faster rate than .

We can thus approximate the energy components of the fine solution, by correcting the numerical dispersion errors in the coarse solutions . The correction can be written as a convolution:



denotes the Fourier transform from the frequency

to . For details, readers refer to Appendix A.2.

We notice that in Equation (16), for small , the leading order term in the correction is an identity map. This fact could serve as a theoretical motivation for adding a skip connection between every corresponding pair of layers in the multi-level network described below. The numerical dispersion relation (16) and (18) also predict that the main role of the neural network would be to overcome the challenges resulting from higher frequency wave components, to say the least.

In Section 3.2, we presents an example illustrating the numerical dispersion errors in a piecewise constant wave wave medium. There, we will compare the correction of such errors by the proposed networks.

2 The neural networks and training

In this section, we describe the network architecture, the optimization model, and the generation of training examples.

The network’s input variables are the energy components of a wave field on the coarse grid and the wave speed function — these are functions on teh coarse grid. The outputs are the energy components of the processed wave field, but on the fine grid. The network does not explicitly involve any approximation of the derivatives of the wave field.

2.1 The JNet architecture

Our chosen network architecture is a U-net architecture [25] with skip connections, though in our setup the downward-upward path in the network resembles more like the letter J.

The network contains a cascade of convolution layers encoding the input tensor into a so-called feature space. Then a cascade of convolution layers decodes the feature space to produce the final output image. The encoding pathway utilizes average-pooling layers to downsample the intermediate output, while the decoding pathway uses the bilinear interpolation. The network structure is also similar to the multigrid network MgNet


Figure 1:

Diagram of a 3-level network. If the network is ReLU activations, then there are bias parameters in each conv filter, followed by a ReLU activation and a batch norm layer.

The input tensor

has channels consisting of the energy components of the coarse solution and the wave speed functions defined on the coarse grid.

Figure 1 illustrates a 3-level network. The networks defined on such structure is the focus of this paper. A network with down-sampling processes are referred to as an -level network. In particular, we shall report computational results obtained from trained 3-level and 6-level networks. The ones with ReLU activations will be referred to as JNets. The networks with ReLU replaced by the identity function and no biases will be referred to as the linear networks. In this paper, we will compare these networks after training with different databases.

Finally, as a reference, the forward feed of a 3-level JNet

, implemented in PyTorch with a

input tensor, takes roughly seconds on the CPU of a moderately equipped laptop. On the same computer, the fine solver, implemented in NumPy, takes roughly seconds to evolve a given initial wave field of size to time

2.2 Data sets and optimization model

To learn the coarse-to-fine solution map the training data should contain sufficiently representative examples, sampling the wave speed of interest and the wave fields of relevance — those propagated by the coarse and fine solvers, starting from a class of initial conditions of interest. In our setup,

The wave speed and the wave field are sampled from the distributions described below.

Both the coarse and the fine solvers use the standard second order scheme defined in Section 1.3. The discretization parameters of the solvers are tabulated in Table 2.

The data sets discussed below will be made available for download.

order of interp
-th order
Table 2: Discretization parameters for the fine and coarse solver.

The wave speed and initial wave field samples and

The data set for the medium wave speed comprises of randomly selected subregions (of different areas and rotations) of the full Marmousi model [8] and the BP model [6]. The wave speed in each of the subregions will be mapped (subsampled and rescaled333The wave speed in each medium example is rescaled by an integer, allowing the coarse solver to be stable with the predetermined and .) to a array, which is regarded as a grid function defined on , . See Figure 2.

Figure 3 illustrates the statistics in the wave speeds sampled from these two data sets. Additionally, we remark that Wu et al. [30] developed a comprehensive methodology to synthesize the mediums with realistic earth layers and faults. Their approach may be use to further enrich the data set.

For each of the sampled subregions, an initial wave field, is sampled from the Gaussian pulses of the form:

The pair thus sampled is collected in

Next, we generate a set of training examples that have more complex initial wave fields. For each in , we collect that are computed by the Procrustes Parareal scheme, as defined in (26)-(27), for and . We shall refer to this collection as .

The example wave fields and

We recognize the need to sample the strong causality in the wave dynamics. Ergo we sample the trajectories passing through the wave speed-initial wave field pair in and Figure 4 demonstrate the scheme which we use to generate the training two training sets. The trajectories are discretized by a finite number of points computed from the fine solver (one could use the coarse solver for this as well). The points from the trajectories are paired with the associating wave speed function and collected into the training data sets.

We first collect the sequence of wave fields propagated by the fine solver:

Then consists of the pairs


This data set consists of the pairs


We generate examples from and , and name the resulting databases and respectively. When the context is clear, we will use and for simplicity. Figure 5 shows the Fourier mode statistics of the wave fields in and . We observe that the wave fields in have more high frequency modes.

Figure 2: Square subregions of different sizes in the Marmousi profile are rotated and rescaled to a reference domain with the sample wave speed . and are applied to different initial pulses on it to generate training data.
Figure 3: Statistics of the wave speed in 100 sampled wave media.
Figure 4: The sampling of the solution manifold: On each random sample of medium wave speed, the coarse and fine solvers propagate initial wave fields consisting of random Gaussian pulses. Of each data point, corresponds to a blue box above, and a corresponding red box.
Figure 5: Sum of Fourier mode magnitude squared of the component of in the training wave field. The results are plotted in logarithmic scale.
The optimization model and training

We use a simple stochastic gradient descent algorithm to minimize the mean squared error (MSE)


where is the training data set and denotes the number of training examples. is either or defined above. Here the summands correspond to the approximation errors in the discrete wave energy (13). Similar strategies of using wave energy to compare wave fields for wave propagation purposes have been used successfully, for example in [24][28], and [20].

The neural networks are trained by the mini-batch stochastic gradient descent (SGD) with momentum, starting with the initial conditions prescribed by PyTorch’s built-in routines. The training data is randomly divided into batches of 128 examples. We report the MSE in a typical training progress for each data set in Figure 6. Notice that the training errors on the right subplot appear approximately one order of magnitude smaller than those on the left. This is a subject of future investigation. It is possibly due to the Procrustes solutions’s being closer to each other in some suitable sense.

In Table 3, we further report a comparison of the final training errors and testing errors between the 3-level and 6-level JNet, using .

3-level 6-level 3-level 6-level
Final training error
Test error
Table 3: Train and test errors of networks. The error is evaluated on the batch size of . We trained these networks on for epochs; the learning rate in the first 1000 epochs is set to , then reduced to .
Figure 6: The training progress of two different data sets. Left: data set . Right: data set containing snapshots of Procrustes parareal in crops of Marmousi and BP medium. Each data set has samples. In both training sessions, the batch size is 128, and the learning rate is .

3 Propagation

In this section, we study the enhanced propagator in a few setups. We shall report the computed results using the relative energy error

where is the computed wave field and is the reference solution.

3.1 Comparison of propagation errors on constant media

We test the enhanced solvers on constant media of different wave speeds, using initial wave fields consisting of pulses of various widths. The networks are trained on data sets of different cardinalities, sampled from the same distribution.

The test initial wave field is defined as follows:

where is sampled uniformly . Larger valuers of correspond to wave fields whose Fourier modes for higher wave numbers have larger magnitudes (”sharper” wave fields).

Coarse and fine solutions are propagated in a constant medium . The errors in the energy is defined as


We remark that the dependence of for and are hidden from the notation.

We report a set of comparisons of in Figures 7 and 8 for and respectively. The results are obtained from the 6-level linear and ReLU JNets, with the parameters reported in Table 2. The networks trained on and . In Figure 7, from the two rows of error images, we observe that while linear networks produce roughly comparable errors to those from the ReLU networks in the region of lower wave speed and smaller . On the opposite regime (larger and ), the nonlinear ReLU networks perform better. As discussed in Section 1.3, sharper wave fields and faster wave speeds result in larger nummerical dispersion errors. The approximation power in the nonlinear networks seems to contribute the better performance. We also notice that the ReLU networks trained with 10,000 examples seems to achieve slightly larger minimal errors than those by the networks trained with 5,000 examples. We will see in Section 4 that a more significant number of training examples does result in better generalization capacity for the ReLU networks and perhaps not surprisingly have relatively little impact on the linear networks. Figure 8 presents a different situation. With , the difference between the coarse solver and the fine solver is larger. Here we see a clear advantage in using the nonlinear networks.

Figure 7: A comparison of the of linear and ReLU nets, trained with different number of examples drawn from the same distribution. The results are obtained with networks trained on with .
Figure 8: A comparison of the of linear and ReLU nets, trained with different number of examples drawn from the same distribution. The ReLU JNets exhibit better accuracy in the regime of larger wave peed and higher wave numbers. Here, the results are obtained with 6-level networks trained on with .
Figure 9: Left: The wave speed profile of a model layered medium. Right: The initial wave field .

3.2 Numerical dispersion and refraction

In this section, we study the enhanced solvers in a wave medium described by piecewise constant wave speed of moderate contrast.

Figure 9 shows the model layered medium and the anisotropic initial Gaussian pulse. We apply different solvers to propagate that initial wave field. The wave fronts associated with different wavenumbers travel at different speeds and transmit through the interfaces (the wave speed discontinuities) at different times. Figure 10 shows the energy of the propagated wave fields. The networks used in these simulations are trained on The proposed networks can provide a certain level of non-trivial correction to the dispersion and refraction in the coarse solution.

Figure 10: The proposed models correct reflected and refracted waves.
Figure 11: Wave energy field of solutions obtained from parareal using the three-level Jnet. Red line helps compare the refracted wave fronts. Green line indicates the wave trajectory. The second green line segments satisfies Snell’s law.

3.3 More complex wave speeds

We demonstrate the effectiveness of the trained neural networks in proving the accuracy of coarse solver. We compute the evolution of the wave field

with the initial condition

We consider four test media:

  • Waveguide: ,

  • Inclusion: ,

  • BP model (rescaled): ,

  • Marmousi model (rescaled): .

These four models are illustrated in Figure 12. Note that these examples are not present in the training data set. The parameters used in the solvers are reported in Table 2.

Figure 12: Wave speed models used in the experiments.
Figure 13: Relative energy errors of the solutions computed by the coarse solver and the enhanced coarse solvers trained on . The initial pulse locates at the center of the domain.

Figure 13 presents a comparison of the coarse solver and a few enhanced solvers . Compared to the training and testing errors presented in Table 3, we see that the effect of the ”downsampling” operator dominates to a large degree. We encourage the readers to compare this result to the one presented in Figure 18 in the next section.

In Figure 14 we show the ”phantom” energy in the output of While this can be regarded as an undesirable effect, it does improve the amplitude of the propagated wave field. We also observe that the trained 6-level network produces drastically reduced amount of ”phantom” energy.

4 Improvement by parareal iterations

One of the shortcomings of employing a current deep learning strategy for scientific computing tasks is the lack of rigorous error estimates. Our idea is to use the proposed deep learning strategy in a self-improving loop, in which one can show that the computed errors will contract along the iterations. Here, we adopt the parareal framework, originally proposed in [17], as the potentially self-improving looping mechanism.

Parareal methods can be regarded as a fixed point iteration that involves the coarse and the fine solvers:


where the loop is enumerated in and the subscript denotes the time stepping. The initial conditions for the iterations are defined by


In the context of this paper,

is defined on the fine grid, and

The convergence of parareal methods are analyzed for different contexts in several papers. See e.g. [5],[13],[4][26][16]. We adhere to a simple version, which states that at the discrete level with the vector -norm, the error amplification factor for the parareal iteration is the product of and with . The iteration is convergent when the coarse solver is sufficiently accurate, depending on the magnitude of and the size of the iterative system, . It is a challenge to construct a stable and convergent parareal method for purely hyperbolic problems, especially when the coarse solver is under-resolving the medium.

As we have seen in the serial computation examples in Section 2.2, our network reduces the errors in the coarse solutions by 90% in very challenging media. Hence, we expect the parareal iteration involving the enhanced coarse solver to be more stable and, in turn, be used to improve approximation accuracy.

Following the framework proposed as the -parareal schemes [4][20], the focus of this section rest on schemes of the form:


where the operator under consideration include

  • where

    is a unitary matrix derived from the online data

  • is defined by a JNet, trained off-line.

The Procrustes parareal method

The Procrustes parareal method is defined as follows:


where and for , solves the following optimization problem:


The data matrices and are defined as follows:

  • The -th column of corresponds to the vector ,

  • The -th column of corresponds to the vector ,

For , are defined in (24). We call this method the Procrustes parareal method since is constructed by solving the Orthogonal Procrustes Problem [14].

The Procrustes parareal algorithm is introduced in[20]. In the original version, the fine and coarse solvers communicate on the coarse grid. This means that the data matrices records the wave fields on the coarse grid and is smaller. In contrast, in the version presented above, the the solvers communicate on the fine grid. This allows the resulting algorithm to correct for details corresponding to higher wave numbers.

Both versions of the Procrustes parareal method are remarkably stable for wave propagation. We point out again, that due to this stability, we are able to use this method to generate training examples that lead to more performant neural networks. See Section 2.2.

The neural network approach

Of course, the center of this paper is

and is a pretrained neural network discussed above. We initialized the parareal iterations by the coarse solutions:


4.1 Numerical simulations and comparisons

First, we follow up the dispersion study in Section 3.2 and present the wave energy field computed in the first four parareal iterations in Figure 11. There we observe the effect of the additive correction of the parareal updates — with accurate phase, the amplitudes are restored effectively.

In Figures 15-19, we observe that the initial errors () from different enhanced solvers are similar to the results reported in Figure 13. The initial errors are dominated by the effects of the restriction operator in and being repeatedly applied in the time stepping process. However, as increases, the effect of having better phase errors in the Jnets can be observed. In some situations, the advantage of having more dissipation can also be observed.

We present a comprehensive accuracy study in Figure 15, comparing the solvers that we have discussed so far. There, we observe the importance of the training data size for the nonlinear networks. We also notice that the linear networks perform surprisingly well in the studied setup (particularly the linear network trained with on 2000 examples). One possible explanation is that the operator to be approximated,

is closer to the identity matrix with a nonlinear perturbation (both

and are nonlinear functions of ); as least for

and the class of wave speeds under consideration. Ergo, linear regression may yield reasonable approximations. However, for larger

, we have seen in Figure 8 that linear networks are inferior.

Figure 16 shows a comparison of the networks trained on and We observe that the network trained on yields better performance after a few iterations. It is possible that the wave fields in are closer to the ones encountered in the simulations performed in this experiment — the wave fields are computed by parareal style updates.

In Figure 17 we compare the accuracy of the three-level JNets trained for and We first observe from the errors at that the enhanced solver with the smaller is more accurate. Except for the BP-model, the performance of the parareal iterations with the two networks seem comparable.

In Figure 18, we present a comparison of the performance of the proposed parareal correction, (25), using the three-level and the six-level networks, both trained on . The iterations with the three-level network appear to yield smaller errors because the three-level network has larger numerical dissipation (See Figure 10 and Section 3.2).

In Figure 19, we compare the Procrustes approach defined in (26) and the coarse solver enhanced by a 6-level JNet with

Figure 14: Phantom energy produced by the networks with zero input wave energy. Top row: . Bottom row: . The networks are trained on .

So what can we conclude from these examples? First, the phase accuracy is of upmost importance in the performance of the parareal-style iterations. Second, the parareal scheme itself is sensitive to the amount of dissipation in the system. For systems with little or no dissipation, such as the hyperbolic system that we consider in this paper, the size of the iterative system, will have a critical role: the larger it is, the smaller the difference between and has to be in order to have stability and achieve the desired correction. Of course, the stability and convergence for such type of iterations is well understood, e.g. [31]. Our examples not only confirm such theories, but also demonstrate the feasibility of adopting a data-driven approach, e.g. the Procrustes approach [20] using online data or the deep learning approach using off-line data.

Finally, we noticed that the BP model presents more difficulty for the proposed strategy. One possible cause may be the higher contrast in the BP model — particularly the large discontinuity across the periodically extended top and bottom boundaries — and that the training data should include more instances of wave propagating through such type of interfaces. This is a subject of future investigation.

Figure 15: A comparisons of parareal iterations applied to the JNets trained on data sets of different sizes. The training sets are uniformly chosen subsamples of The errors are evaluated at .
Figure 16: Effects of the training data sets. The errors are evaluated at .
Figure 17: Effect of the size of . Accuracy and convergence of the parareal iterations, using 3-level JNets trained on and . The final time of propagation is .
Figure 18: A comparison of parareal corrections for the 3-level and 6-level JNets. The networks are trained on The final time of propagation is .
Figure 19: Accuracy and convergence of the parareal iterations involving networks of different depths. The 3-level and 6-level JNets are trained on . The final propagation time is .

5 Summary

In this paper, we presented a deep learning approach for improving the accuracy of wave propagation by an under-resolving numerical solver. The data used for training the neural networks come from simulations involving the coarse solver to be corrected and a more accurate fine solver. We incorporated the developed deep learning framework into the parareal scheme, thereby improving its convergence property and the accuracy of the simulated wave propagation. Experimental results show that the presented approach dramatically enhances the stability and accuracy of the assimilated parareal scheme in a class of challenging media.

There is much room for generalization and improvement. We presented an approach to learn the correction operator for fixed . Would it be possible to learn as a function of ? The presented numerical examples comparing networks trained on and demonstrated the importance of sampling the computationally relevant, strongly causal wave fields. A more rigorous understanding of the wave fields manifold and devising efficient sampling algorithms would undoubtedly lead to more effective learning results.


Nguyen and Tsai were supported partially by National Science Foundation Grant DMS-1913209. Tsai is also partially supported by National Science Foundation Grant DMS-1720171, and by Army Research Office, under Cooperative Agreement Number W911NF-19-2-0333. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Office or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein.

The authors thank Louis Ly for stimulating conversations on the design and training of neural networks. The authors thank Texas Advanced Computing Center (TACC) for the computing resource.


  • [1] A. Abdulle and M. J. Grote (2011) Finite element heterogeneous multiscale method for the wave equation. Multiscale Modeling & Simulation 9 (2), pp. 766–792. Cited by: §1.
  • [2] A. Abdulle and P. Henning (2017) Localized orthogonal decomposition method for the wave equation with a continuum of scales. Mathematics of Computation 86 (304), pp. 549–587. Cited by: §1.
  • [3] G. Ariel, S. J. Kim, and R. Tsai (2016) Parareal multiscale methods for highly oscillatory dynamical systems. SIAM J. Sci. Comput. 38 (6). Cited by: §1.1.
  • [4] G. Ariel, H. Nguyen, and R. Tsai (2017) Theta-parareal scheme. arXiv:1704.06882 [math.NA]. External Links: Link Cited by: §4, §4.
  • [5] G. Bal (2005)

    On the convergence and the stability of the parareal algorithm to solve partial differential equations

    In Domain Decomposition Methods in Science and Engineering, pp. 425–432. External Links: Document, ISBN 978-3-540-26825-3, Link Cited by: §4.
  • [6] F. Billette and S. Brandsberg-Dahl (2005) The 2004 bp velocity benchmark. In 67th EAGE Conference & Exhibition, pp. cp–1. Cited by: §2.2.
  • [7] L. Borcea, V. Druskin, A. V. Mamonov, M. Zaslavsky, and J. Zimmerling (2020) Reduced order model approach to inverse scattering. SIAM Journal on Imaging Sciences 13 (2), pp. 685–723. Cited by: §1.
  • [8] A. Brougois, M. Bourget, P. Lailly, M. Poulet, P. Ricarte, and R. Versteeg (1990) Marmousi, model and data. In EAEG workshop-practical aspects of seismic data inversion, Cited by: §2.2.
  • [9] V. Druskin, A. V. Mamonov, and M. Zaslavsky (2018) A nonlinear method for imaging with acoustic waves via reduced order model backprojection. SIAM Journal on Imaging Sciences 11 (1), pp. 164–196. Cited by: §1.
  • [10] Y. Efendiev and T. Y. Hou (2009) Multiscale finite element methods: theory and applications. Vol. 4, Springer Science & Business Media. Cited by: §1.
  • [11] B. Engquist, H. Holst, and O. Runborg (2012) Multiscale methods for wave propagation in heterogeneous media over long time. In Numerical analysis of multiscale computations, pp. 167–186. Cited by: §1.
  • [12] S. Fu, K. Gao, and E. T. Chung (2019) A high-order multiscale finite-element method for time-domain elastic wave modeling in strongly heterogeneous media. Journal of Applied Geophysics 170, pp. 103852. Cited by: §1.
  • [13] M. J. Gander and S. Vandewalle (2007) Analysis of the parareal time-parallel time-integration method. SIAM Journal on Scientific Computing 29 (2), pp. 556–578. Cited by: §4.
  • [14] G. H. Golub and C. F. Van Loan (2012) Matrix computations. Vol. 3, JHU Press. Cited by: §4.
  • [15] J. He and J. Xu (2019) MgNet: a unified framework of multigrid and convolutional neural network. Science china mathematics 62 (7), pp. 1331–1354. Cited by: §2.1.
  • [16] M. Iizuka and K. Ono (2018) Influence of the phase accuracy of the coarse solver calculation on the convergence of the parareal method iteration for hyperbolic pdes. Computing and Visualization in Science (English). External Links: ISSN 1432-9360 Cited by: §1.1, §4.
  • [17] J.-L. Lions, Y. Maday, and G. Turinici (2001) A ”parareal” in time discretization of pde’s. Comptes Rendus de l’Academie des Sciences 332, pp. 661–668. Cited by: §1.1, §4.
  • [18] X. Meng, Z. Li, D. Zhang, and G. E. Karniadakis (2019) PPINN: parareal physics-informed neural network for time-dependent pdes. arXiv preprint arXiv:1909.10145. Cited by: §1.2.
  • [19] B. Moseley, A. Markham, and T. Nissen-Meyer (2020) Solving the wave equation with physics-informed deep learning. arXiv preprint arXiv:2006.11894. Cited by: §1.2.
  • [20] H. Nguyen and R. Tsai (2020) A stable parareal-like method for the second order wave equation. Journal of Computational Physics 405, pp. 109156. Cited by: §1.1, §1.2, §1, §2.2, §4, §4.1, §4.
  • [21] H. Owhadi and L. Zhang (2008) Numerical homogenization of the acoustic wave equations with a continuum of scales. Computer Methods in Applied Mechanics and Engineering 198 (3-4), pp. 397–406. Cited by: §1.
  • [22] M. Raissi, P. Perdikaris, and G. E. Karniadakis (2019) Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics 378, pp. 686–707. Cited by: §1.2, §1.2.
  • [23] G. Rizzuti, A. Siahkoohi, and F. J. Herrmann (2019) Learned iterative solvers for the helmholtz equation. In 81st EAGE Conference and Exhibition 2019, Vol. 2019, pp. 1–5. Cited by: §1.2, §1.2.
  • [24] D. Rocha, N. Tanushev, and P. Sava (2016) Acoustic wavefield imaging using the energy norm. Geophysics 81 (4), pp. S151–S163. Cited by: §2.2.
  • [25] O. Ronneberger, P. Fischer, and T. Brox (2015) U-net: convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention, pp. 234–241. Cited by: §2.1.
  • [26] D. Ruprecht (2018-06) Wave propagation characteristics of parareal. Comput. Vis. Sci. 19 (1-2), pp. 1–17. External Links: ISSN 1432-9360, Link, Document Cited by: §1.1, §4.
  • [27] A. Siahkoohi, M. Louboutin, and F. J. Herrmann (2019) Neural network augmented wave-equation simulation. arXiv preprint arXiv:1910.00925. Cited by: §1.2, §1.2.
  • [28] N. M. Tanushev, B. Engquist, and R. Tsai (2009) Gaussian beam decomposition of high frequency wave fields. Journal of Computational Physics 228 (23), pp. 8856 – 8871. External Links: Document, ISSN 0021-9991, Link Cited by: §1.2, §2.2.
  • [29] E. Weinan, B. Engquist, et al. (2003) The heterognous multiscale methods. Communications in Mathematical Sciences 1 (1), pp. 87–132. Cited by: §1.
  • [30] X. Wu, Z. Geng, Y. Shi, N. Pham, S. Fomel, and G. Caumon (2020) Building realistic structure models to train convolutional neural networks for seismic structural interpretation. Geophysics 85 (4), pp. WA27–WA39. Cited by: §2.2.
  • [31] J. Xu and L. Zikatanov (2017) Algebraic multigrid methods. Acta Numerica 26, pp. 591–721. Cited by: §4.1.

Appendix A Appendix

a.1 Defining

Given a vector and , and assuming that for some function periodic . The main task is to find . Of course, without any additional assumption, one can only recover a function from its partial derivatives up to a constant. So, we may define a pseudo-inverse of in the Fourier domain as:


In the setup of this paper, we need to evaluate

This means that comes from the first component of We take to be the discrete sum of the coarse wave field , defined by . More precisely,

Such choice of can be considered as a consistency condition, when is replaced by the identity operator, and the coarse grid is identical to the fine grid.

a.2 Analytic correction operator in constant media

Here we derive the formula for correcting the numerical dispersion error of the semi-discrete scheme in one space dimension and a constant medium. The precise context is discussed in discussed in Section 1.3.

We use the plane wave ansatz

where are functions to be determined from the initial conditions, and .

Plugging the Ansatz into the definition of energy components


Taking the Fourier transform and evaluate at , we have

To keep the notation clean, let

For , rearrange the initial conditions

Solving for the coefficients, we get

Then from (30),(31) we derive