Multilevel Monte Carlo Acceleration of Seismic Wave Propagation under Uncertainty

10/03/2018 · Marco Ballesio et al., King Abdullah University of Science and Technology

We interpret uncertainty in the parameters of a model for seismic wave propagation by treating the parameters as random variables, and we apply the Multilevel Monte Carlo (MLMC) method to reduce the cost of approximating expected values of selected, physically relevant, quantities of interest (QoI) with respect to the random variables. Aiming to solve source inversion problems, where we seek to infer the source of an earthquake from ground motion recordings on the Earth's surface, we consider two QoIs that measure the discrepancy between computed seismic signals and given reference signals: one QoI is defined in terms of the L2-misfit, which is directly related to maximum likelihood estimates of the source parameters; the other is based on the quadratic Wasserstein distance between probability distributions, and represents one possible choice in a class of such misfit functions which have become increasingly popular for seismic inversion. We use a publicly available code in widespread use, based on the spectral element method, to approximate seismic wave propagation, including seismic attenuation. With this code, using random coefficients and deterministic initial and boundary data, we have performed benchmark numerical experiments with synthetic data in a two-dimensional physical domain with a one-dimensional velocity model, where the assumed parameter uncertainty is motivated by realistic Earth models. In this test case, the MLMC method reduced the computational cost of the standard Monte Carlo (MC) method by up to 97% for the QoI based on the quadratic Wasserstein distance, for a relevant range of tolerances. Extending the method to three-dimensional domains is straightforward and will further increase the relative computational work reduction, allowing the solution of source inversion problems which would be infeasible using standard MC.

1 Introduction

Recent large earthquakes and their devastating effects on society and infrastructure (e.g., New Zealand, 2011; Japan, 2011; Nepal, 2015) emphasize the urgent need for reliable and robust earthquake-parameter estimations for subsequent risk assessment and mitigation. Seismic source inversion is a key component of seismic hazard assessments where the probabilities of future earthquake events in the region are of interest.

From ground motion recordings at the surface of the Earth, i.e. seismograms, we are interested in efficiently computing the likelihood of postulated parameters describing the unknown source of an earthquake. A sub-problem of seismic source inversion is to infer the location of the source (hypocenter) and the origin time. We take the expected value of the quantity of interest, which, in our case, is the misfit between the observed and predicted ground displacements for a given seismic source location and origin time, to find the location and time of highest likelihood from observed seismogram data. Several mathematical and computational approaches can be used to calculate the predicted ground motions in the source inversion problem. These techniques range from approximately calculating only some of the waveform attributes (e.g., peak ground acceleration or seismic phase arrival times), often using simple one-dimensional velocity models, to simulating the full wave propagation in a three-dimensionally varying structure.

In this work, the mathematical model and its output are random to account for the lack of precise knowledge about some of its parameters. In particular, to account for uncertainties in the material properties of the Earth, these are modeled by random variables. The most common approach used to compute expected values of random variables is to employ Monte Carlo (MC) sampling. MC is non-intrusive, in the sense that it does not require underlying deterministic computational codes to be modified, only to be called with randomly sampled parameters. Another striking advantage of MC is that no regularity assumptions are needed on the quantity of interest with respect to the uncertain parameters, other than that the variance has to be bounded in order to use the Central Limit Theorem to predict the convergence rate. However, in situations when generating individual samples from the computational model is highly expensive, often due to the need for a fine, high-resolution discretization of the physical system, MC can be too costly to use. To avoid a large number of evaluations of the computer model at high resolution, but still preserve the advantages of using MC, we apply Multilevel Monte Carlo (MLMC) sampling [15, 18, 16] to substantially reduce the computational cost by distributing the sampling over computations with different discretization sizes. Polynomial chaos surrogate models have been used to exploit the regularity of the waveform solution [9]; in cases when the waveform is analytic with respect to the random parameters, the asymptotic convergence can be super-algebraic. However, if the waveform is not analytic, one typically only achieves algebraic convergence asymptotically, as was shown in [32] to be the case with stochastic collocation for the second order wave equation with discontinuous random wave speed. The setup in [32] with a stratified medium is analogous to the situation we will treat in the numerical experiments for a viscoelastic seismic wave propagation problem in the present paper. The efficiency of polynomial chaos, or stochastic collocation, methods deteriorates as the number of "effective" random variables increases, whereas MC depends only weakly on this number. Here, MC-type methods have an advantage for the problems we are interested in.

We are motivated by eventually solving the inverse problem; therefore the Quantities of Interest (QoIs) considered in this paper are misfit functions quantifying the distance between observed and predicted displacement time series at a given set of seismographs on the Earth's surface. One QoI is based on the L2-norm of the differences; another makes use of the quadratic Wasserstein distance between probability densities, which requires a transformation of the displacement time series to be applicable. The advantages of the Wasserstein distance over the L2 difference for full-waveform inversion, for instance that the former circumvents the issue of cycle skipping, have been shown in [13] and further studied in [12, 39, 40]. A recent preprint [31] combines this type of QoI with a Bayesian framework for inverse problems.

In our demonstration of MLMC for the two QoIs, we consider seismic wave propagation in a semi-infinite two-dimensional domain with free surface boundary conditions on the Earth's surface, heterogeneous viscoelastic media, and a point-body time-varying force. We use SPECFEM2D [22, 35] for the numerical computation, where we consider an isotropic viscoelastic Earth model. The heterogeneous medium is divided into homogeneous horizontal layers with uncertain densities and velocities. The densities and shear wave velocities are treated as random, independent between the subdomains, and they are uniformly distributed over the assumed intervals of uncertainty in the respective layers. The compressional wave velocities of the subdomains follow a multivariate uniform distribution conditional on the shear wave speeds. Our choice of probability distributions to describe the uncertainties is motivated by results given in [2, 34].
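As a minimal sketch of this sampling step, the Python snippet below draws one realization of the layered material parameters. The layer intervals and the rule coupling the compressional to the shear wave speed are hypothetical placeholders, not the values or the conditional distribution used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical uncertainty intervals per horizontal layer (placeholders, not the
# values used in the paper): density [kg/m^3] and shear wave speed [m/s].
rho_bounds = [(2500.0, 2700.0), (2700.0, 2900.0), (2900.0, 3100.0)]
cs_bounds  = [(3200.0, 3500.0), (3500.0, 3800.0), (3800.0, 4100.0)]

def sample_layer_parameters(rng):
    """Draw one realization of (rho, c_s, c_p) for every layer.

    Densities and shear wave speeds are independent and uniform on their
    intervals; the compressional wave speed is drawn uniformly on an interval
    whose endpoints depend on the sampled shear speed, as a stand-in for the
    conditional uniform distribution described in the text."""
    rho = np.array([rng.uniform(lo, hi) for lo, hi in rho_bounds])
    cs  = np.array([rng.uniform(lo, hi) for lo, hi in cs_bounds])
    # Assumed conditional rule: c_p uniform in [1.6*c_s, 1.9*c_s] (placeholder ratios).
    cp  = np.array([rng.uniform(1.6 * c, 1.9 * c) for c in cs])
    return rho, cs, cp

rho, cs, cp = sample_layer_parameters(rng)
print(rho, cs, cp)
```

Each such draw defines one realization of the velocity model that is passed to the forward solver.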

The paper is outlined as follows: In Section 2, the seismic wave propagation problem is described for a viscoelastic medium with random Earth material properties. The two QoIs are described in Section 3. The computational techniques, including (i) the numerical approximation of the viscoelastic Earth material model and of the resulting initial boundary value problem, the combination of which is taken as a "black-box" solver provided by the widely used seismological software package SPECFEM2D [22], and (ii) the MLMC approximation of QoIs depending on the random Earth material properties, are described in Section 4. The configuration of the numerical tests is described in Section 5, together with the results, showing a considerable decrease in computational cost compared to the standard Monte Carlo approximation for the same accuracy.

2 Seismic Wave Propagation Model with Random Parameters

Here we describe the model we use for seismic wave propagation in a heterogeneous Earth medium, given by an initial boundary value problem (IBVP). We interpret the inherent uncertainty in the Earth material properties through random parameters which define the compressional and shear wave speed fields and the mass density.

First, we state the strong form of the IBVP in the case of a deterministic elastic Earth model, later to be extended to a particular anelastic model in the context of a weak form of the IBVP, suitable for the numerical approximation methods used in Sections 4 and 5. Finally, we state assumptions on the random material parameter fields.

2.1 Strong Form of Initial Boundary Value Problem

We consider a heterogeneous medium occupying a domain D modeling the Earth. We denote by u(x, t) the space-time displacement field induced by a seismic event in D. In the deterministic setting, u is assumed to satisfy

ρ ∂²u/∂t² = ∇·σ(u) + f,  in D × (0, T],   (1a)

for some finite time horizon, given by T > 0, with the initial conditions

u(·, 0) = u₀,  ∂u/∂t(·, 0) = v₀,  in D,   (1b)

and the free surface boundary condition on Earth's surface

σ(u)·n = 0  on Γ_s × (0, T],   (1c)

where σ(u) denotes the stress tensor and n denotes the unit outward normal to the boundary. Together with a constitutive relation between stress and strain, (1a)–(1c) form an IBVP for seismic wave propagation; two different constitutive relations will be considered below. In this paper, ∂/∂t denotes the time derivative, and ∇ and ∇· denote the spatial gradient and divergence operators, respectively.

With ρ denoting the density, f becomes a body force that includes the force causing the seismic event. In this study, we consider a simple point body force acting with time-varying magnitude at a fixed point, as described in [1, 10].

For an isotropic elastic Earth medium undergoing infinitesimal deformations, the constitutive stress-strain relation can be described by

σ(u) = λ tr(ε(u)) I + 2μ ε(u),   (2)

with tr(ε(u)) the trace of the symmetric infinitesimal strain tensor, ε(u) = (∇u + ∇uᵀ)/2, and I the identity tensor; see [1, 10, 6]. In the case of isotropic heterogeneous elastic media undergoing infinitesimal deformations, the first and second Lamé parameters, denoted λ and μ respectively, are functions of the spatial position, but for notational simplicity we often omit the dependence on x. These parameters together constitute a parametrization of the elastic moduli for homogeneous isotropic media and, together with ρ, also determine the compressional wave speed, c_p, and shear wave speed, c_s, by

c_p = sqrt((λ + 2μ)/ρ),  c_s = sqrt(μ/ρ).   (3)

Either one of the triplets (ρ, λ, μ) and (ρ, c_p, c_s) defines the Earth's material properties with varying spatial position for a general velocity model [36].

Simplification of the full Earth model

For the purpose of the numerical computations, we will later replace the whole Earth domain by a semi-infinite domain, which we will truncate with absorbing boundary conditions at the artificial boundaries introduced by the truncation. The domain boundary is ∂D = Γ_s ∪ Γ_a, with Γ_a denoting the artificial boundary. We will consider numerical examples with d = 2, and from now on we let d denote the dimension of the physical space.

2.2 Weak Form of Initial Boundary Value Problem

The numerical methods for simulating the seismic wave propagation used in this paper are based on an alternative formulation of IBVP (1) that uses the weak form in space. To obtain such a form, one multiplies (1a) at time t by a sufficiently regular test function w, and integrates over the physical domain D. Using integration by parts and imposing the traction-free boundary condition (1c), one derivative is shifted from the unknown displacement, u, to the test function w, i.e.,

∫_D ρ ∂²u/∂t² · w dx + ∫_D σ(u) : ∇w dx = ∫_D f · w dx,

where : denotes the double contraction. In this context, "sufficiently regular test function" means that w ∈ H¹(D)^d, where H¹(D) denotes the Sobolev space of square-integrable functions with square-integrable first derivatives, equipped with the usual inner product

(v, w)_{H¹(D)} = ∫_D ( v w + ℓ² ∇v · ∇w ) dx,

where ℓ is a characteristic length scale, and the corresponding induced norm ‖v‖_{H¹(D)} = (v, v)_{H¹(D)}^{1/2}.

The weak form of the IBVP then becomes:

Problem 1 (Weak form of isotropic elastic IBVP).

Find u ∈ U, which both satisfies the initial conditions (1b), and, for σ given by (2), satisfies

∫_D ρ ∂²u/∂t² · w dx + ∫_D σ(u) : ∇w dx = ∫_D f · w dx,  for all w ∈ H¹(D)^d,   (4)

almost everywhere in the time interval (0, T), where the trial space is

U = { v ∈ L²(0, T; H¹(D)^d) : ∂v/∂t ∈ L²(0, T; L²(D)^d), ∂²v/∂t² ∈ L²(0, T; (H¹(D)^d)′) },   (5)

using the spaces defined in (6).

Above, the time-dependent functions u, ∂u/∂t, and ∂²u/∂t² belong to Bochner spaces

L²(0, T; V) = { v : (0, T) → V : ∫₀ᵀ ‖v(t)‖²_V dt < ∞ },   (6)

where the appropriate choice of V depends on the number of spatial derivatives needed: V = H¹(D)^d, L²(D)^d, and (H¹(D)^d)′, respectively, with the latter space being the dual space of H¹(D)^d.

According to [26, 27], Problem 1 is well-posed under certain regularity assumptions. More precisely, assuming that the density ρ is bounded away from zero, the problem fits into the setting of Section 1, Chapter 5, of [27], after dividing through by ρ. Assuming that λ and μ in (2) are also sufficiently regular, it holds, according to Theorem 2.1 in that chapter, that if the force f, the initial data u₀ and v₀, and the boundary ∂D are infinitely differentiable, then there exists a unique solution to Problem 1, which depends continuously on the initial data.

In our numerical experiments, however, we instead use a problem with piecewise constant material parameters, violating the smoothness assumption, but only at material interfaces in the interior of the domain. Furthermore, a singular (point) source term f, common in seismic modeling, will be used.

2.3 Weak Form Including Seismic Attenuation

For a more realistic Earth model, we include seismic attenuation in the wave propagation. The main cause of seismic attenuation is the relatively small but not negligible anelasticity of the Earth. In the literature, e.g., Chapter 6 of [10] or Chapter 1 of [6], anelasticity of the Earth is modeled by combining the mechanical properties of elastic solids and viscous fluids. In a heterogeneous linear isotropic viscoelastic medium, the displacement field follows the IBVP (1), but the stress tensor depends linearly upon the entire history of the infinitesimal strain, and the constitutive relation (2) is replaced by

σ(x, t) = ∫_{−∞}^{t} Ψ(x, t − t′) : ∂ε(u)/∂t′ (x, t′) dt′,   (7)

where Ψ represents the anelastic (relaxation) fourth order tensor which accounts for the Earth's material properties, and which will be further discussed in Section 4.2.

With the constitutive relation (7) replacing (2), the IBVP becomes:

Problem 2 (Weak form of isotropic viscoelastic IBVP).

Find u ∈ U, with U defined in (5), which both satisfies the initial conditions (1b), and, for σ given by (7), satisfies

∫_D ρ ∂²u/∂t² · w dx + ∫_D σ : ∇w dx = ∫_D f · w dx,  for all w ∈ H¹(D)^d,   (8)

almost everywhere in the time interval (0, T).

Theoretical well-posedness results for Problem 2 are not known to the authors. However, in the related case where the boundary conditions in Problem 2 are replaced with homogeneous Dirichlet boundary conditions, well-posedness for the modified problem can, for example, be found in [14], again under the same assumptions on the material parameters and data that give well-posedness of Problem 1.

2.4 Statement in Stochastic Setting

Here we model the uncertain Earth material properties, (ρ, c_p, c_s), as time-independent random fields ρ(x, ω), c_p(x, ω), c_s(x, ω), where ω belongs to the sample space Ω of a complete probability space. We assume that the random fields are bounded from above and below, uniformly both in physical space and in sample space, and with the lower bounds strictly positive,

0 < ρ_min ≤ ρ(x, ω) ≤ ρ_max < ∞,   (9a)
0 < c_{p,min} ≤ c_p(x, ω) ≤ c_{p,max} < ∞,   (9b)
0 < c_{s,min} ≤ c_s(x, ω) ≤ c_{s,max} < ∞,   (9c)

for all x ∈ D and ω ∈ Ω. For any given sample ω ∈ Ω, the displacement field u(·, ·, ω) solves Problem 1 or Problem 2, for the respective case.

Any known well-posedness properties of the deterministic Problem 1 and Problem 2 are directly inherited in their stochastic form, assuming the same regularity of realizations of the random fields as of their deterministic counterparts.

3 Quantities of Interest

Two QoIs suitable for different approaches to seismic inversion will be described. The common feature is that they quantify the misfit between data, consisting of ground motion measured at the Earth's surface at fixed equidistant observation times t_k, k = 0, …, K, and model predictions, consisting of the corresponding model-predicted ground motion. The displacement data, d_r, and the model-predicted displacement are given for a finite number of receivers, N_r, at locations x_1, …, x_{N_r}. Let us ignore model errors and assume that the measured data are given by the model, depending on two parameters, denoted by s and y. Here s corresponds to the unknown source location, which can be modeled as deterministic or stochastic depending on the approach to the source inversion problem, and y is a random nuisance parameter, corresponding to the uncertain Earth material parameters. We assume that the data are given by the model up to some additive noise:

d_r(t_k) = u(x_r, t_k; s*, y*) + η_{r,k},   (10)

where the η_{r,k} are independent identically distributed (i.i.d.) zero-mean Gaussian random variables, and s* and y* denote some fixed values of s and y, respectively. We consider the random parameter, y, to consist of the material triplet (ρ, c_p, c_s), as a random variable or field, and all parameters other than s and y as given.

The additivity assumption on the noise, while naive, can easily be replaced by more complex, correlated, noise models without affecting the usefulness or implementation of the MLMC approach described in Section 4.4.

Let us denote a QoI by Q = Q(s, y). An example of a seismic source inversion approach is finding the location, s, that yields the lowest expected value of Q, i.e., the solution of

min_s E_y[ Q(s, y) ].   (11)

Both QoIs investigated in this work can be used to construct likelihood functions for statistical inversion; see e.g. [4]. For instance, one may find the source location, s, by maximizing the marginal likelihood, i.e., solving

max_s E_y[ L(d | s, y) ],   (12)

where L(d | s, y) is the likelihood function.
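As an illustration of how an approximation of (11) could be organized, the sketch below performs a grid search over candidate source locations, estimating the expected misfit at each candidate by plain Monte Carlo over material draws. The function simulate_misfit is a hypothetical stand-in for a full forward solve plus misfit evaluation, and the one-dimensional grid of candidates is purely illustrative.

```python
import numpy as np

def simulate_misfit(source_location, material_sample):
    """Hypothetical stand-in: run the forward wave solver with the given source
    location and material realization, then return the misfit Q against data."""
    return float((source_location - 1.0) ** 2 + 0.1 * material_sample.sum() ** 2)

def estimate_expected_misfit(source_location, n_samples, rng):
    """Plain MC estimate of E_y[Q(source_location, y)] over material samples y."""
    samples = [simulate_misfit(source_location, rng.standard_normal(3))
               for _ in range(n_samples)]
    return float(np.mean(samples))

rng = np.random.default_rng(0)
candidates = np.linspace(0.0, 2.0, 21)            # hypothetical 1D grid of source locations
expected_q = [estimate_expected_misfit(s, n_samples=64, rng=rng) for s in candidates]
best = candidates[int(np.argmin(expected_q))]      # approximate minimizer of (11)
print(best)
```

In practice, each call to simulate_misfit would be an expensive SPECFEM2D run, which is what motivates replacing the plain MC average by the MLMC estimator of Section 4.4.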

In the two QoIs below, we assume that the discrete time observations have been extended to a continuous function, e.g., by linear interpolation between data points.

L2-based QoI

The first QoI studied in this work, denoted by Q_{L2}, is based on the commonly used L2 misfit between the predicted data u and the measured data d:

Q_{L2}(s, y) = Σ_{r=1}^{N_r} ∫₀ᵀ | u(x_r, t; s, y) − d_r(t) |² dt,   (13)

where |·| is the Euclidean norm in R^d and T is the total simulation time. This quantity of interest is directly related to the seismic inversion problem through its connection to the likelihood for normally-distributed variables:

L(d | s, y) ∝ exp( − Q_{L2}(s, y) / (2σ_η²) ),   (14)

where σ_η² is the variance of the additive noise in (10).
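A minimal sketch of (13): the time integral is approximated with the trapezoidal rule and the squared differences are summed over receivers and components. The array layout and the synthetic traces below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def l2_misfit(t, u_sim, u_obs):
    """Q_L2 as in (13): sum over receivers and components of the time-integrated
    squared difference, with the integral approximated by the trapezoidal rule.

    t      : (K,) array of simulation time points
    u_sim  : (R, C, K) simulated displacements (receivers, components, time)
    u_obs  : (R, C, K) observed displacements interpolated to the same times
    """
    sq = (u_sim - u_obs) ** 2                  # pointwise squared differences
    return float(np.trapz(sq, t, axis=-1).sum())

# Tiny synthetic example: one receiver, one component, a slightly shifted copy.
t = np.linspace(0.0, 1.0, 201)
u_sim = np.sin(2 * np.pi * 5 * t)[None, None, :]
u_obs = np.sin(2 * np.pi * 5 * (t - 0.02))[None, None, :]
print(l2_misfit(t, u_sim, u_obs))
```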

A drawback of the L2 misfit function for full-waveform seismic inversion, see e.g. [39], is the well-known cycle-skipping issue, which typically leads to many local optima and poses a substantial challenge for subsequent tasks such as optimization and Bayesian inference.

W2-based QoI

An alternative QoI was introduced, in the setting of seismic inversion, and analyzed in [13, 12, 39, 40], where it is shown to have several desirable properties which Q_{L2} is lacking; in particular, if a waveform is compared to a shifted version of itself, this QoI is a convex function of the shift. This QoI is based on the quadratic Wasserstein distance between two probability density functions (PDFs), f and g, which is defined as

W₂²(f, g) = inf_{P ∈ M(f,g)} ∫ | x − P(x) |² f(x) dx,   (15)

where M(f, g) is the set of all maps P that rearrange the PDF f into g. When the support is an interval in R, an explicit form

W₂²(f, g) = ∫₀¹ | F⁻¹(y) − G⁻¹(y) |² dy   (16)

exists, where F and G are the cumulative distribution functions (CDFs) of f and g, respectively.

To eliminate the scaling due to the length of the time interval, we make a change of variable so that t ∈ [0, 1] below. Typically, the waveforms will not be PDFs, even in their component parts. If we assume that f and g are two more general one-dimensional functions, taking both positive and negative values in the interval, then the non-negative parts f⁺ and g⁺ and the non-positive parts f⁻ and g⁻ can be considered separately, and one can define

W̃₂²(f, g) = W₂²( f⁺/∫f⁺ , g⁺/∫g⁺ ) + W₂²( f⁻/∫f⁻ , g⁻/∫g⁻ ).   (17)

To define the QoI we sum W̃₂² applied to all spatial components of the vector-valued u and d at all receiver locations, i.e.

Q_{W2}(s, y) = Σ_{r=1}^{N_r} Σ_i W̃₂²( u_i(x_r, ·; s, y), d_{r,i} ),   (18)

where the inner sum runs over the spatial components.
Remark 1 (Alternating signs of u and d).

Note that the definition of Q_{W2} above requires all components of u and d at all receivers to attain both positive and negative values in the time interval for (17) to be well-defined. With Gaussian noise in (10), the probability of violating this assumption is always positive, though typically too small to observe in practice if the observation interval and the receivers are properly set up. To complete the definition of Q_{W2}, we may extend (17) by replacing the corresponding W₂² term by 1 whenever at least one of the two parts being compared is identically 0.

Remark 2 (Convexity of Q_{W2}).

The convexity of Q_{W2} with respect to time shifts in the signals is directly related to source inversion problems [12], since moving the source location will result in shifts of the arrival times at the receivers. As an illustration, consider the case where the data is synthetic data obtained from a computed approximation of the displacement field. Figure 1 shows results from a numerical example very similar to the one in Section 5. In Figure 1, E[Q_{L2}] and E[Q_{W2}] are approximated for 41 different synthetic data sets, obtained by shifting the source location while keeping the Earth material parameters fixed, as given in Table 5. In the figure, the shift is that of the source location in the synthetic data relative to the source location used when approximating Problem 2. The left column shows a larger region around the point of zero shift, marked with a red circle, and the right column gives a detailed view around that point. The left column clearly illustrates a situation where the large-scale behavior of E[Q_{W2}] is convex with respect to the shift, while that of E[Q_{L2}] is not. In the absence of noise, both QoIs are convex in a small neighborhood around the point of zero shift. Avoiding the non-convex behavior on the larger scale significantly simplifies the source inversion problem.

Figure 1: The expected values of the QoIs as functions of the shift between the source location in the MLMC simulation and in the simulation used to generate the synthetic data; see Remark 2.
The top and middle rows show the two QoIs computed with synthetic data with additive noise, and the bottom row shows one of them without noise added to the synthetic data.
The effect of adding or removing noise of this level is not visible in the other QoI, and the corresponding figures without added noise are therefore omitted.

4 Computational Techniques

In this section, we start with a concise description of how Problem 1 and Problem 2 are approximated numerically. The domain D is modified in two steps: first, the finite Earth model is replaced by a half-plane in two dimensions or a half-space in three dimensions, and second, this semi-infinite domain is truncated, introducing absorbing boundary conditions on the artificial boundaries. We also describe a simplification of the stress tensor model (7) that results in a viscoelastic stress tensor suitable for numerical implementation. We then proceed with the computational approximations of the QoIs given in Section 3. The section ends with a summary of the MLMC algorithm for computing the expected value of the QoI.

4.1 Numerical Approximation of Initial Boundary Value Problem

A numerical approximation of Problem 1 or 2 can be achieved either by (i) approximating the seismic wave propagation produced by a seismic event on the whole Earth, or by (ii) restricting the computational domain to a local region around the source and the receivers. In either case, there are purpose-built software packages based on the Spectral Element Method (SEM) [21, 24] that will be used in this paper. The MLMC method does not fundamentally depend on which of the alternatives, (i) or (ii), is used, or on the choice of SEM over other approximation methods. Indeed, an important advantage of MLMC, or more generally MC, methods is that they are non-intrusive in the sense that they can straightforwardly be applied by randomly sampling the Earth material parameters and then executing any such publicly available simulation code to compute the corresponding sample of the QoI.

In our numerical example, we choose alternative (ii), and proceed in two steps: first, we approximate the Earth locally by a half-plane, in a two-dimensional test case, or by a half-space, in the full three-dimensional problem; second, the half-plane or half-space is truncated to a finite domain, where absorbing boundary conditions (ABC) are introduced on the artificial boundaries to mimic the absorption of seismic energy as the waves leave the region around the receivers. The variational equations (4) and (8) now contain a non-vanishing boundary term, corresponding to the part of the boundary, Γ_a, where absorbing boundary conditions apply. We use a perfectly matched layer (PML) approximation of the ABC, introduced in [3] and used in many fields; see e.g. [37, 23] in the context of seismic wave propagation. However, since a true PML is not available, see [38], for Problem 2, which has attenuating Earth material properties, in practice we choose the truncation of the domain such that Γ_a is far enough from all receivers to guarantee that no reflected waves reach the receivers in the time interval (0, T), given the maximal wave speeds allowed by the range of uncertainties (9).

To apply SEM, first, a semi-discrete version of the variational equation is introduced by discretizing space and introducing a finite dimensional solution space, where the solution at time t can be represented by a finite vector, U(t). Then, the time evolution of the SEM approximation, U(t), of the seismic wavefield solves an initial value problem for the second order ordinary differential equation (ODE) in time

M d²U/dt² + C dU/dt + K U = F(t),   (19)

where M is the mass matrix, C is the global absorbing boundary matrix, K the global stiffness matrix, and F the source term.

To get the semi-discrete form, the domain is divided into non-overlapping elements of maximal size h, similarly to what is done when using a standard finite-element method. Quadrilateral elements are used in two space dimensions and hexahedral elements in three. Each element is defined in terms of a number of control points and an equal number of shape functions which describe the isomorphic mapping between the element and the reference square or cube. The shape functions are products of Lagrange polynomials of low degree. In the remainder, we assume that no error is introduced by the representation of the shape of the elements, which is justified by the very simple geometry of the test problem in Section 5. The displacement field on every element is approximated by Lagrange polynomials of higher degree, and the approximation of the variational form (4) or (8), including the artificial boundary term, over an element is based on the Gauss-Lobatto-Legendre integration rule on the same points used for Lagrange interpolation; this choice leads to a diagonal mass matrix, M, which is beneficial in the numerical time stepping scheme. More details on the construction of these matrices and the source term can be found in [24].

The initial value problem for the ODE (19) is approximately solved by introducing a discretization of the time interval and applying a time-stepping method. Among multiple available choices, this work uses the second-order accurate explicit Newmark-type scheme; see for example Chapter 9 in [20]. It is a conditionally stable scheme, and the associated condition on the time step leads to Δt proportional to h for uniform spatial discretizations. Thanks to the diagonal nature of M and the sparsity of C and K, the cost per time step of the Newmark scheme is proportional to the number of unknowns in U, and thus to h^{−d}, and since the number of time steps is inversely proportional to Δt, the total work is proportional to h^{−(d+1)}.

Determining Δt by the stability constraint, Δt proportional to h, we expect the second order accuracy of the Newmark scheme to asymptotically be the leading order error term as h → 0, assuming sufficient regularity of the true solution.
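For reference, the sketch below advances a second-order system of the form (19) one step with an explicit Newmark scheme (Newmark parameters β = 0, γ = 1/2), exploiting a diagonal mass matrix and, for brevity, omitting the absorbing-boundary damping term. The tiny system is a random stand-in, not a SEM discretization produced by SPECFEM2D.

```python
import numpy as np

def newmark_explicit_step(u, v, a, dt, M_diag, K, f_new):
    """One explicit Newmark step (beta=0, gamma=1/2) for M u'' + K u = f,
    with M diagonal so accelerations are obtained without solving a linear system."""
    u_new = u + dt * v + 0.5 * dt**2 * a          # displacement update
    a_new = (f_new - K @ u_new) / M_diag          # new acceleration from equilibrium
    v_new = v + 0.5 * dt * (a + a_new)            # trapezoidal velocity update
    return u_new, v_new, a_new

# Tiny stand-in system with 3 degrees of freedom.
rng = np.random.default_rng(0)
M_diag = np.ones(3)
A = rng.standard_normal((3, 3))
K = A @ A.T + 3 * np.eye(3)                       # symmetric positive definite stiffness
u = np.zeros(3)
v = np.zeros(3)
a = -K @ u / M_diag
for n in range(100):
    u, v, a = newmark_explicit_step(u, v, a, dt=1e-3, M_diag=M_diag, K=K,
                                    f_new=np.array([1.0, 0.0, 0.0]))
print(u)
```

The diagonal mass matrix is what makes each step cost proportional to the number of unknowns, which underlies the h^{−(d+1)} work model used in the MLMC analysis below.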

4.2 Computational Model of Seismic Attenuation

Approximately solving Problem 2, as stated in Section 2.3, is very difficult since the stress σ in (7) at time t depends on the entire solution history up to time t, or in practice on the history from time 0, since the displacement is assumed to be constant up to time 0. Even in a discretized form in an explicit time stepping scheme, approximately updating the stress according to (7) would require storing the strain history for all previous time steps at every discretization point where σ must be approximated, requiring infeasible amounts of computer memory and computational time. Therefore, the model of the viscoelastic properties of the medium is often simplified to a generalized Zener model using a series of "standard linear solid" (SLS) mechanisms. The present work uses the implementation of the generalized Zener model in SPECFEM2D. Below, we briefly sketch the simplification of (7). For a more detailed description, we refer readers to [28, 41, 21, 30, 6].

The integral in the stress-strain relation (7) can be expressed as a convolution in time by defining the relaxation tensor to be zero for negative times, i.e. replacing Ψ(t) by Ψ(t)H(t), where H is the Heaviside function. That is,

σ = (ΨH) ∗ ∂ε(u)/∂t,   (20)

which, as discussed in [11], can be formulated in the frequency domain as the multiplication

σ̂(ω) = M(ω) : ε̂(u)(ω),   (21)

where M(ω) denotes the resulting complex, frequency-dependent modulus. In seismology, it has been observed [7] that the so-called quality factor

Q(ω) = Re M(ω) / Im M(ω)   (22)

is approximately constant over a wide range of frequencies. This is an intrinsic property of the Earth material that describes the decay of amplitude in seismic waves due to the loss of energy to heat, and its impact on M is explicitly given in equation (5) of [11]. This observation allows modeling M in the frequency domain through a superposition of a number of SLS mechanisms. Then the stress-strain relation (20) can be approximated as

σ(t) ≈ C^U : ε(u(t)) − Σ_l R_l(t),   (23)

where C^U is the unrelaxed viscoelastic fourth order tensor, which for an isotropic Earth model is defined by the unrelaxed moduli λ and μ, or equivalently by c_p and c_s. The relaxation functions R_l for each SLS satisfy initial value problems for a damping ODE; by the non-linear optimization approach given in [5], implemented in SPECFEM2D and used in this work, the ODE for each SLS is determined by two parameters: the quality factor and the number of SLS mechanisms.
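To give a feel for the generalized Zener approximation, the sketch below evaluates a standard form of the complex modulus of a set of SLS mechanisms and the resulting quality factor Re M(ω) / Im M(ω) over a frequency band. The relaxation times are hypothetical placeholders, not the optimized values that SPECFEM2D computes with the approach of [5], and the formula is the common textbook expression, assumed rather than taken from the paper.

```python
import numpy as np

def complex_modulus(omega, tau_eps, tau_sig, M_relaxed=1.0):
    """Complex modulus of a generalized Zener body built from len(tau_eps)
    standard linear solid mechanisms (common textbook form)."""
    L = len(tau_eps)
    terms = sum((1.0 + 1j * omega * te) / (1.0 + 1j * omega * ts)
                for te, ts in zip(tau_eps, tau_sig))
    return M_relaxed * (1.0 - L + terms)

def quality_factor(omega, tau_eps, tau_sig):
    """Q(omega) = Re(M) / Im(M); approximately constant over the band of
    interest when the relaxation times are chosen well."""
    M = complex_modulus(omega, tau_eps, tau_sig)
    return M.real / M.imag

# Hypothetical relaxation times for 3 SLS mechanisms (placeholders only).
tau_sig = np.array([1.0, 0.1, 0.01])          # stress relaxation times [s]
tau_eps = tau_sig * 1.05                      # strain relaxation times, slightly larger
for f in np.logspace(-1, 2, 7):               # 0.1 Hz to 100 Hz
    print(f, quality_factor(2 * np.pi * f, tau_eps, tau_sig))
```

Choosing the relaxation times so that this quality factor is close to a prescribed constant over the seismic frequency band is exactly the non-linear optimization problem referred to above.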

4.3 Quantities of Interest

The two QoIs, Q_{L2} and Q_{W2}, defined in (13) and (18) respectively, are approximated from the discrete time series u(x_r, t_n) and d_r(t_k). The data observation times t_k are considered given and fixed, and with realistic measurement frequencies it is natural to take smaller time steps in the time discretization of the numerical approximation of u; we assume that the simulation time step is no larger than the interval between observations. Furthermore, the time discretization is assumed to be characterized by one parameter, Δt, e.g., the constant time step size of a uniform discretization. Since we defined both QoIs as sums over all receivers and all components of the vector-valued functions u and d at the receivers, it is sufficient to describe the approximation in the case of two scalar functions: u, representing the simulated displacement, and d, representing the observed data. Here, we define the function d as the piecewise linear interpolation in time of the data points.

Approximation of Q_{L2}

The time integral in (13) is approximated by the Trapezoidal rule on the time discretization of the numerical simulation. With the assumption that the simulation time step is no larger than the observation interval, and with the definition of d as the piecewise linear interpolation in time of the data points, only the discretization of u contributes to the error. The numerical approximation of Q_{L2} will then have an O(Δt²) asymptotic error, provided that u is sufficiently smooth.

Approximation of Q_{W2}

The approximation of (18), through (17), requires both positive and negative values to be attained in all components of u and d at all receivers; see Remark 1. Assuming that this holds, we approximate the W₂ distance between the normalized non-negative parts of u and d; the non-positive parts are treated analogously. To this end, the zeros of u and d are approximated by linear interpolation and included in the respective time discretizations, from which the non-negative parts u⁺ and d⁺ are obtained. Then the corresponding values of the CDFs, F and G, are approximated by the Trapezoidal rule, followed by normalization. Finally, the inverse of F on [0, 1], i.e. F⁻¹, is approximated by linear interpolation, and analogously for the inverse of G, before the integral in (16) is approximated by the Trapezoidal rule on the discretization of [0, 1]. These steps combined lead to an O(Δt²) asymptotic error, provided that u is sufficiently smooth.
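A sketch of this computation for two scalar traces is given below: split into non-negative and non-positive parts, normalize to unit mass, build CDFs with the trapezoidal rule, and integrate the squared difference of the inverse CDFs over [0, 1]. The quadrature details here (a fixed uniform quantile grid, simple linear interpolation, no insertion of the interpolated zeros into the grid) are simplifications of the procedure described above, and the test traces are synthetic placeholders.

```python
import numpy as np

def w2_squared_nonnegative(t, f, g, n_quantiles=512):
    """Squared quadratic Wasserstein distance (16) between two non-negative
    signals on the same time grid, after normalizing each to unit mass."""
    def cdf(h):
        # cumulative trapezoidal integral, normalized to end at 1
        c = np.concatenate(([0.0], np.cumsum(0.5 * (h[1:] + h[:-1]) * np.diff(t))))
        return c / c[-1]
    F, G = cdf(f), cdf(g)
    y = np.linspace(0.0, 1.0, n_quantiles)
    Finv = np.interp(y, F, t)          # pseudo-inverse of F by linear interpolation
    Ginv = np.interp(y, G, t)          # pseudo-inverse of G by linear interpolation
    return float(np.trapz((Finv - Ginv) ** 2, y))

def w2_misfit(t, u, d):
    """W2-based misfit (17) for scalar traces u (simulated) and d (observed):
    non-negative and non-positive parts are treated separately; a part that is
    identically zero in either trace contributes 1, as in Remark 1."""
    s = (t - t[0]) / (t[-1] - t[0])    # rescale time to [0, 1]
    total = 0.0
    for a, b in [(np.maximum(u, 0.0), np.maximum(d, 0.0)),
                 (np.maximum(-u, 0.0), np.maximum(-d, 0.0))]:
        if a.max() == 0.0 or b.max() == 0.0:
            total += 1.0
        else:
            total += w2_squared_nonnegative(s, a, b)
    return total

t = np.linspace(0.0, 1.0, 401)
u = np.sin(2 * np.pi * 3 * t) * np.exp(-2 * t)
d = np.sin(2 * np.pi * 3 * (t - 0.03)) * np.exp(-2 * (t - 0.03))
print(w2_misfit(t, u, d))
```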

Remark 3 (On expected weak and strong convergence rates).

The deterministic rate of convergence, as Δt → 0, of both approximations is two. Here, Δt is identical to the time step of the underlying approximation method for Problem 1 or 2. In the numerical approximation of u, with the second order Newmark scheme in time, the asymptotic convergence rate is also two at best, which holds if the solution is sufficiently regular. By our assumptions that the random fields satisfy the same regularity as in the deterministic case and are uniformly bounded, from above and away from zero from below, in the physical domain and with respect to random outcomes, we expect both the weak and the strong rates of convergence to be the same as the deterministic convergence rate of the numerical approximation, i.e. at best two, asymptotically as Δt goes to zero.

4.4 MLMC Algorithm

Here, we summarize the MLMC algorithm introduced by Giles ([15]) and independently, in a setting further from the one used here, by Heinrich in ([18]), which has since become widely used ([16]).

MLMC is a way of reducing the computational cost of standard MC, for achieving a given accuracy in the estimation of the expected value of some QoI, in situations when the samples in the MC method are obtained by numerical approximation methods characterized by a refinement parameter, , controlling both the accuracy and the cost.

Goal

We aim to approximate the expected value of some QoI, Q, by an estimator A, with the accuracy requirement that

|E[Q] − A| ≤ TOL, with probability at least 1 − α,   (24)

where TOL > 0 is a user-prescribed error tolerance and 0 < α < 1 a prescribed failure probability. To this end we require

|E[Q − A]| ≤ (1 − θ) TOL   (25a)

and

P( |A − E[A]| ≤ θ TOL ) ≥ 1 − α,   (25b)

for some θ ∈ (0, 1), which we are free to choose.

Assumptions on the Numerical Approximation Model

Consider a sequence of discretization-based approximations of Q characterized by a refinement parameter, h. Let Q_h denote the resulting approximation of Q, using the refinement parameter h for an outcome y of the random variable. In this work, we consider successive halvings of the refinement parameter, which in Section 4.1 corresponds to the spatial mesh size and the temporal step size, h and Δt, respectively. We then make the following assumptions on how the cost and accuracy of the numerical approximations depend on h. We assume that the work per sample of Q_h, denoted W(h), depends on h as

W(h) ∝ h^{−γ},   (26a)

and that the weak order of convergence is q₁, so that we can model

|E[Q − Q_h]| ≈ C₁ h^{q₁},   (26b)

and that the variance is independent of the refinement level,

Var[Q_h] ≈ V.   (26c)
Standard MC estimator

For M i.i.d. realizations y^(1), …, y^(M) of the parameter, the unbiased MC estimator of E[Q_h] is given by

A_MC = (1/M) Σ_{m=1}^{M} Q_h(y^(m)).   (27)

For A_MC to satisfy (25) we require

|E[Q − Q_h]| ≤ (1 − θ) TOL,   (28)

which according to the model (26b) becomes

C₁ h^{q₁} ≤ (1 − θ) TOL.   (29)

For any fixed h such that C₁ h^{q₁} < TOL, the value of the splitting parameter, θ, is implied by replacing the inequality in (29) by equality and solving for θ, giving

θ(h) = 1 − C₁ h^{q₁} / TOL.   (30)

Thus, the model for the bias tells us how large a statistical error we can afford for the desired tolerance, TOL. By the Central Limit Theorem, A_MC properly rescaled converges in distribution,

√M ( A_MC − E[Q_h] ) / √V → N(0, 1), as M → ∞,   (31)

where N(0, 1) denotes a standard normal random variable with CDF Φ. Hence, to satisfy the statistical error constraint (25b), asymptotically as TOL → 0, we require

M ≥ ( C_α √V / (θ TOL) )²,   (32)

where C_α is the confidence parameter corresponding to a 1 − α confidence interval, i.e. Φ(C_α) = 1 − α/2.

The computational work of generating A_MC is W_MC = M W(h).

For asymptotic analysis, assume that we can choose h by taking equality in (29) and M by taking equality in (32); then we get the asymptotic work estimate

W_MC ∝ ( C_α² V / (θ TOL)² ) ( (1 − θ) TOL / C₁ )^{−γ/q₁} ∝ TOL^{−(2 + γ/q₁)}.   (33)

For any fixed choice of θ ∈ (0, 1), the computational complexity of the MC method is O(TOL^{−(2 + γ/q₁)}). Minimizing the right hand side in (33) with respect to θ gives the asymptotically optimal choice

θ* = 2 / (2 + γ/q₁).   (34)
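The single-level estimator and the CLT-based sample size can be sketched as follows; the per-sample model sample_qoi is a hypothetical placeholder mimicking an O(h²) bias, the variance is estimated from a pilot set, and C_α = 1.96 corresponds to roughly 95% confidence. None of the numerical values are taken from the paper.

```python
import numpy as np

def sample_qoi(h, rng):
    """Hypothetical placeholder for one sample of Q_h: a forward solve with mesh
    size h and a random material draw, followed by a misfit evaluation."""
    bias = 0.5 * h**2                       # mimics O(h^q1) weak error with q1 = 2
    return 1.0 + bias + 0.2 * rng.standard_normal()

def mc_estimate(h, tol, theta=0.5, c_alpha=1.96, n_pilot=50, rng=None):
    """Plain MC estimator (27); M is chosen so that the statistical error is
    below theta*tol with the prescribed confidence, as in (32)."""
    rng = rng or np.random.default_rng(0)
    pilot = np.array([sample_qoi(h, rng) for _ in range(n_pilot)])
    var = pilot.var(ddof=1)
    M = max(n_pilot, int(np.ceil((c_alpha * np.sqrt(var) / (theta * tol)) ** 2)))
    extra = np.array([sample_qoi(h, rng) for _ in range(M - n_pilot)])
    return float(np.concatenate([pilot, extra]).mean()), M

estimate, M = mc_estimate(h=0.05, tol=0.02)
print(estimate, M)
```

With an expensive solver, the factor M W(h) in the work makes this approach costly, which is what the multilevel construction below addresses.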
MLMC estimator

The work required to meet a given accuracy by standard MC can be significantly improved by the systematic generation of control variates given by approximations corresponding to different mesh sizes. In the standard MLMC approach, we use a whole hierarchy of meshes defined by decreasing mesh sizes h_ℓ = 2^{−ℓ} h₀, ℓ = 0, 1, …, L, and the telescoping representation of the expected value of the finest approximation, Q_L := Q_{h_L},

E[Q_L] = E[Q₀] + Σ_{ℓ=1}^{L} E[Q_ℓ − Q_{ℓ−1}],

from which the MLMC estimator is obtained by approximating the expected values in the telescoping sum by sample averages as

A_MLMC = (1/M₀) Σ_{m=1}^{M₀} Q₀(y^(0,m)) + Σ_{ℓ=1}^{L} (1/M_ℓ) Σ_{m=1}^{M_ℓ} ( Q_ℓ(y^(ℓ,m)) − Q_{ℓ−1}(y^(ℓ,m)) ),   (35)

where the y^(ℓ,m) denote i.i.d. realizations of the mesh-independent random variables. Note that the correction terms

Q_ℓ(y^(ℓ,m)) − Q_{ℓ−1}(y^(ℓ,m))   (36)

are evaluated with the same outcome of y in both the coarse and the fine mesh approximation. This means that Var[Q_ℓ − Q_{ℓ−1}] → 0, as ℓ → ∞, provided that the numerical approximation converges strongly. Introducing the notation

ΔQ_ℓ := Q_ℓ − Q_{ℓ−1}, for ℓ ≥ 1, and ΔQ₀ := Q₀,   (37)

and assuming a strong convergence rate q₂, we model

Var[ΔQ_ℓ] ≈ C₂ h_ℓ^{2q₂}, for ℓ ≥ 1.   (38)

Note that while this holds asymptotically as h_ℓ → 0, by the definition of strong convergence, this model may be inaccurate for small ℓ, corresponding to coarse discretizations. However, it suffices for an asymptotic work estimate.

The computational work needed to generate A_MLMC is

W_MLMC = Σ_{ℓ=0}^{L} M_ℓ W_ℓ,   (39)

where we now assume that (26a) also holds for the cost W_ℓ of generating ΔQ_ℓ. In order for A_MLMC to satisfy (24), we fix θ and require h_L to satisfy the bias constraint (28) and, consequently, (29) on the finest discretization, leading to

h_L = ( (1 − θ) TOL / C₁ )^{1/q₁},   (40)

and we also require it to satisfy the statistical error constraint (25b). In the MLMC context, (25b) is approximated by the bound

Var[A_MLMC] = Σ_{ℓ=0}^{L} Var[ΔQ_ℓ] / M_ℓ ≤ ( θ TOL / C_α )²   (41)

on the variance of A_MLMC. Enforcing (25b) through this bound is justified asymptotically, as TOL converges to 0, by a Central Limit Theorem for MLMC estimators, which holds under suitable conditions; see Theorem 1.1 in [19]. Given θ and h_L, minimizing the work (39) subject to the constraint (41) leads to the optimal number of samples per level ℓ,

M_ℓ = ( C_α / (θ TOL) )² sqrt( Var[ΔQ_ℓ] / W_ℓ ) Σ_{k=0}^{L} sqrt( Var[ΔQ_k] W_k ).   (42)

Substituting this optimal M_ℓ in the total work (39) yields:

W_MLMC = ( C_α / (θ TOL) )² ( Σ_{ℓ=0}^{L} sqrt( Var[ΔQ_ℓ] W_ℓ ) )².   (43)

Finally, using the mesh parameter h_L given by (40), the work per sample (26a), and for simplicity assuming that (38) also holds for ℓ = 0, this computational work has the asymptotic behavior

W_MLMC ∝ TOL^{−2}, if 2q₂ > γ;  TOL^{−2} (log TOL^{−1})², if 2q₂ = γ;  TOL^{−(2 + (γ − 2q₂)/q₁)}, if 2q₂ < γ,   (44)

as TOL → 0, under appropriate assumptions relating q₁, q₂, and γ; see e.g. Theorem 3.1 in [15], or Corollary 2.1 and Corollary 2.2 in [17]. Similar to the standard MC case, it is possible to optimize the choice of θ in (25) for MLMC. In particular, if 2q₂ > γ, an asymptotic analysis gives θ → 1 as TOL → 0, indicating an aggressive refinement of the numerical discretization to reduce the bias. Again, the choice of θ does not change the rate of the complexity, but an optimal choice may reduce the work by a constant factor.

In all three cases in (44), the complexity is lower than the corresponding complexity, TOL^{−(2 + γ/q₁)}, for standard MC simulation of the same problem (33). This leads to very significant computational savings in complex models, and as a result some problems that are infeasible using the standard MC method are computationally tractable using MLMC.
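A compact sketch of the MLMC estimator (35) with the sample allocation (42) estimated from pilot runs is given below. The function sample_level_difference is a hypothetical stand-in that evaluates Q_ℓ − Q_{ℓ−1} (or Q₀ at the coarsest level) with the same random draw on both meshes, and the cost model W_ℓ ∝ h_ℓ^{−γ} uses γ = 3 as for a two-dimensional problem; all numerical values are illustrative only.

```python
import numpy as np

def sample_level_difference(level, h0, rng):
    """Hypothetical stand-in for Delta Q_l = Q_l - Q_{l-1} (Q_0 for level 0),
    evaluated with the same random draw y on the coarse and fine mesh."""
    y = rng.standard_normal()
    def q(h):                          # mimics Q_h = Q + O(h^2) bias + noise tied to y
        return 1.0 + 0.5 * h**2 + 0.2 * y + 0.05 * h**2 * rng.standard_normal()
    h = h0 * 2.0 ** (-level)
    return q(h) if level == 0 else q(h) - q(2 * h)

def mlmc_estimate(num_levels, tol, h0=0.2, gamma=3.0, theta=0.5, c_alpha=1.96,
                  n_pilot=50, seed=0):
    rng = np.random.default_rng(seed)
    levels = range(num_levels)
    work = np.array([(h0 * 2.0 ** (-l)) ** (-gamma) for l in levels])   # W_l model (26a)
    pilots = [np.array([sample_level_difference(l, h0, rng) for _ in range(n_pilot)])
              for l in levels]
    var = np.array([p.var(ddof=1) for p in pilots])
    # Optimal number of samples per level, cf. (42), using pilot variance estimates.
    coeff = (c_alpha / (theta * tol)) ** 2 * np.sum(np.sqrt(var * work))
    M = np.maximum(n_pilot, np.ceil(coeff * np.sqrt(var / work)).astype(int))
    estimate = 0.0
    for l in levels:
        extra = np.array([sample_level_difference(l, h0, rng)
                          for _ in range(M[l] - n_pilot)])
        estimate += np.concatenate([pilots[l], extra]).mean()
    return estimate, M

estimate, M = mlmc_estimate(num_levels=4, tol=0.02)
print(estimate, M)
```

Because the level variances decay while the per-sample work grows, most samples are taken on the coarse levels, which is the mechanism behind the complexity gains in (44).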

MLMC applied to the QoIs of Sections 4.1–4.3

The assumption on the work per sample (26a) holds with γ = d + 1, since the number of degrees of freedom in the uniform spatial discretization is proportional to h^{−d}, and the number of time steps is proportional to h^{−1}, where the work per time step of the explicit time stepping scheme is proportional to the number of degrees of freedom. In the setting described in Sections 2 and 3, the weak convergence rate, q₁, is identical to the rate of convergence in the approximation of the deterministic problem, and the strong convergence rate, q₂, equals the weak rate. The explicit Newmark time stepping scheme and the numerical approximations of Q_{L2} and Q_{W2} are both of order 2, so that asymptotically as Δt → 0 we expect q₁ = q₂ = 2, assuming a sufficiently regular exact solution. Based on these observations, summarized in Table 1, and the complexity estimates (33) and (44), we expect the asymptotic complexity to improve from TOL^{−7/2} to TOL^{−2}, for d = 2, and from TOL^{−4} to TOL^{−2} (log TOL^{−1})², for d = 3, as standard MC is replaced by MLMC.

Model parameters and asymptotic complexity:

d    q₁    q₂    γ = d + 1    MC complexity    MLMC complexity
2    2     2     3            TOL^{−7/2}       TOL^{−2}
3    2     2     4            TOL^{−4}         TOL^{−2} (log TOL^{−1})²

Table 1: Summary of the parameters in the work and convergence models (26) and (38), for the numerical approximation of Problem 1 or Problem 2, given in Sections 4.1–4.3, and the corresponding asymptotic complexity estimates, given in (33) and (44).

5 Numerical Tests

These numerical experiments make up an initial study of the validity of MLMC techniques as a means of accelerating the approximation of expected values of source inversion misfit functions, where we take the expectation with respect to random parameters modeling uncertainties in the Earth model. After this initial study, where the source is approximated by a point and only synthetic data are used, our ultimate goal is to integrate MLMC into the full source inversion problem, where the finite fault solution is to be inferred using real seismological data. While the final source inversion must be based on numerical simulations on a three-dimensional Earth model, these initial tests were made on a two-dimensional model described in the following. Furthermore, the misfit functions were chosen with the aim of identifying the source location, considering the source moment tensor as fixed.

We first describe the problem setup, including the source model, computational geometry, discretization, random Earth material parameters, and the synthetic data replacing actual measurements in the two-dimensional test. Finally, we describe the execution and results of MLMC computations on the given problem setup.

5.1 Problem Setup

For the numerical tests with d = 2, we create a geometry consistent with an actual network of receivers, belonging to a small seismic network in the Ngorongoro Conservation Area on the East Rift, in Tanzania. We do this by selecting three receivers that are approximately aligned with the estimated epicenter of a recorded seismic event. Figure 2 illustrates the physical configuration. The rough alignment of the source and the receiver locations in the actual seismic network makes this event a good candidate for tests in a two-dimensional domain. We describe the two-dimensional computational domains below, together with the source and Earth parameters.

Figure 2: Source–receiver geometry for the Tanzania case study, restricted to three receivers, marked by red triangles, which fall approximately along a straight line, also aligned with the estimated source location of a recorded seismic event (marked by a blue star).

5.1.1 Source model

We consider a point source with a symmetric moment tensor, modeled as a body force in the variational equation (8) of Problem 2, with a prescribed moment tensor and a Gaussian source-time function with a prescribed corner frequency.