# Solving Traveltime Tomography with Deep Learning

This paper introduces a neural network approach for solving two-dimensional traveltime tomography (TT) problems based on the eikonal equation. The mathematical problem of TT is to recover the slowness field of a medium from boundary measurements of the traveltimes of waves going through the medium. This inverse map is high-dimensional and nonlinear. For the circular tomography geometry, a perturbative analysis shows that the forward map can be approximated by a vectorized convolution operator in the angular direction. Motivated by this and by filtered back-projection, we propose an effective neural network architecture for the inverse map using the recently proposed BCR-Net, with weights learned from training datasets. Numerical results demonstrate the efficiency of the proposed neural networks.


## 1 Introduction

Traveltime tomography is a method to determine the internal properties of a medium by measuring the traveltimes of waves going through the medium. It was first motivated by global seismology, for determining the inner structure of the Earth by measuring at different seismic stations the traveltimes of seismic waves produced by earthquakes (Backus and Gilbert, 1968; Rawlinson et al., 2010). By now, it has found many applications, such as imaging the Sun’s interior (Kosovichev, 1996), ocean acoustics (Munk et al., 2009), and ultrasound tomography (Schomberg, 1978; Jin and Wang, 2006) in biomedical imaging.

#### Background.

The governing equation of first-arrival traveltime tomography (TT) is the eikonal equation (Born and Wolf, 1965), and for simplicity we consider the two-dimensional case in this paper. Let $\Omega\subset\mathbb{R}^2$ be an open bounded domain with Lipschitz boundary $\partial\Omega$. Suppose that the positive function $m(x)$ is the slowness field, i.e., the reciprocal of the velocity field, defined in $\Omega$. The traveltime $u(x)$ satisfies the eikonal equation $|\nabla u(x)| = m(x)$. Since it is a special case of the Hamilton-Jacobi equation, the solution can develop singularities and should be understood in the viscosity sense (Ishii, 1987).

A typical experimental setup of TT is as follows. For each point $x_s\in\partial\Omega$, one imposes the Soner boundary condition at $x_s$, i.e., the solution takes the value zero only at $x_s$, and solves the following eikonal equation:

$$|\nabla u^s(x)| = m(x),\quad x\in\Omega, \qquad u^s(x_s) = 0, \tag{1}$$

where the superscript $s$ indexes the source point. Recording the solution $u^s(x)$ at points $x_r\in\partial\Omega$ produces the whole data set $u^s(x_r)$. In practice $x_s$ and $x_r$ are samples from a discrete set of points on $\partial\Omega$. Here we assume for now that they are placed everywhere on $\partial\Omega$, for the simplicity of presentation and mathematical analysis.

The forward problem is to compute $u^s(x_r)$ for all $x_s, x_r\in\partial\Omega$ given the slowness field $m(x)$. On the other hand, the inverse problem, at the center of first-arrival TT, is to recover $m(x)$ given $u^s(x_r)$.

Both the forward and inverse problems are computationally challenging, and much effort has been devoted to their numerical solution. For the forward problem, the eikonal equation, as a special case of the Hamilton-Jacobi equation, can develop singular solutions. In order to compute the physically meaningful viscosity solution, special care such as upwinding is required. As the resulting discrete system is nonlinear, fast iterative methods such as the fast marching method (Popovici and Sethian, 1997; Sethian, 1999) and the fast sweeping method (Zhao, 2005; Kao et al., 2005; Qian et al., 2007) have been developed. Among them, fast sweeping methods have been successfully applied to many traveltime tomography problems (Leung et al., 2006). The inverse problem is often computationally more intensive due to its nonlinearity. Typical methods take an optimization approach with proper regularization (Chung et al., 2011) and require a significant number of iterations.
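The fast sweeping iteration cited above can be sketched in a few lines. The following is a minimal first-order sketch in the spirit of Zhao (2005), not the solver used for the experiments in this paper; it assumes a uniform grid with spacing `h`, a point source located exactly at a grid node, and no special treatment of the source singularity. The function name `fast_sweeping_eikonal` and its arguments are hypothetical.

```python
import numpy as np


def fast_sweeping_eikonal(m, h, src, tol=1e-12, max_rounds=50):
    """First-order fast sweeping sketch for |grad u| = m with u(src) = 0.

    m   : 2D array of positive slowness values at the grid nodes
    h   : uniform grid spacing in both directions
    src : (i, j) index of the point source (assumed to be a grid node)
    """
    src = tuple(src)
    n1, n2 = m.shape
    u = np.full((n1, n2), np.inf)
    u[src] = 0.0
    # The four alternating Gauss-Seidel sweep orderings.
    orders = [
        (range(n1), range(n2)),
        (range(n1 - 1, -1, -1), range(n2)),
        (range(n1 - 1, -1, -1), range(n2 - 1, -1, -1)),
        (range(n1), range(n2 - 1, -1, -1)),
    ]
    for _ in range(max_rounds):
        max_change = 0.0
        for irange, jrange in orders:
            for i in irange:
                for j in jrange:
                    if (i, j) == src:
                        continue
                    # Smaller neighbour value in each grid direction (inf off-grid).
                    a = min(u[i - 1, j] if i > 0 else np.inf,
                            u[i + 1, j] if i < n1 - 1 else np.inf)
                    b = min(u[i, j - 1] if j > 0 else np.inf,
                            u[i, j + 1] if j < n2 - 1 else np.inf)
                    if np.isinf(a) and np.isinf(b):
                        continue  # no upwind information has arrived yet
                    fh = m[i, j] * h
                    if abs(a - b) >= fh:
                        cand = min(a, b) + fh  # one-sided upwind update
                    else:
                        cand = 0.5 * (a + b + np.sqrt(2.0 * fh**2 - (a - b) ** 2))
                    if cand < u[i, j]:
                        max_change = max(max_change, u[i, j] - cand)
                        u[i, j] = cand
        if max_change <= tol:
            break  # a full round of four sweeps changed nothing significant
    return u
```

For a constant slowness and a source at the grid centre, this update reproduces the exact distance along the grid axes, while other directions carry the usual first-order error of the simple upwind scheme.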

#### A deep learning approach.

Over the past decade or so, deep learning (DL) has become the dominant approach in computer vision, image processing, speech recognition, and many other applications in machine learning and data science (Hinton et al., 2012; Krizhevsky et al., 2012; Goodfellow et al., 2016; Ma et al., 2015; Leung et al., 2014; Sutskever et al., 2014; LeCun et al., 2015; Schmidhuber, 2015). From a technical point of view, this success is a synergy of several key developments: neural networks (NNs) as a flexible framework for representing high-dimensional functions and maps, simple algorithms such as back-propagation (BP) and stochastic gradient descent (SGD) for tuning the model parameters, efficient general software packages such as TensorFlow and PyTorch, and unprecedented computing power provided by GPUs and TPUs.

In the past several years, deep neural networks (DNNs) have been increasingly used in scientific computing, particularly in solving PDE-related problems (Khoo et al., 2017; Berg and Nyström, 2018; Han et al., 2018a; Fan et al., 2018; Araya-Polo et al., 2018; Raissi and Karniadakis, 2018; Kutyniok et al., 2019; Feliu-Faba et al., 2019), along two directions. In the first direction, as NNs offer a powerful tool for approximating high-dimensional functions (Cybenko, 1989), it is natural to use them as an ansatz for high-dimensional PDEs (Rudd and Ferrari, 2015; Carleo and Troyer, 2017; Han et al., 2018a; Khoo et al., 2019; E and Yu, 2018). The second direction focuses on low-dimensional parameterized PDE problems, using DNNs to represent the nonlinear map from the high-dimensional parameters of the PDE to its solution (Long et al., 2018; Han et al., 2018b; Khoo et al., 2017; Fan et al., 2018, 2019b, 2019a; Li et al., 2019; Bar and Sochen, 2019).

As an extension of the second direction, DNNs have been widely applied to inverse problems (Khoo and Ying, 2018; Hoole, 1993; Kabir et al., 2008; Adler and Öktem, 2017; Lucas et al., 2018; Tan et al., 2018; Fan and Ying, 2019a, b; Raissi et al., 2019). For the forward problem, since applying neural networks to input data can be carried out rapidly on current software and hardware architectures, the solution of the forward problem can be significantly accelerated when the forward map is represented with a DNN. For the inverse problem, DNNs can help in two critical ways: (1) due to their flexibility in representing high-dimensional functions, DNNs can potentially be used to approximate the full inverse map, thus avoiding the iterative solution process; (2) recent work in machine learning shows that DNNs can often automatically extract features from the data and offer a data-driven regularization prior.

This paper applies the deep learning approach to first-arrival TT by representing the whole inverse map using a NN. The starting point is a perturbative analysis of the forward map, which reveals that, for the circular tomography geometry and after an appropriate reparameterization, the forward map contains a one-dimensional convolution with multiple channels. This observation motivates representing the forward map from the 2D coefficient to the 2D data by a one-dimensional convolutional neural network (with multiple channels). Further, the one-dimensional convolutional neural network can be implemented by the recently proposed multiscale neural network (Fan et al., 2018, 2019a). Following the idea of filtered back-projection (Schuster and Quintus-Bosz, 1993), the inverse map can be approximated by the adjoint map followed by a pseudo-differential filtering step. This suggests an architecture for the inverse map that reverses the architecture of the forward map and follows it with a simple two-dimensional convolutional neural network.

For the test problems being considered, thanks to the convolutional structure and the compact multiscale neural network, the resulting neural networks have far fewer parameters than a single fully-connected layer acting on data of the same size. This rather small number of parameters allows for rapid and accurate training, even on rather limited data sets.

#### Organization.

The rest of the paper is organized as follows. The mathematical background is given in Section 2. The design and architecture of the DNNs for the forward and inverse maps are discussed in Section 3. The numerical results in Section 4 demonstrate the numerical efficiency and the generalization ability of the proposed neural networks.

## 2 Mathematical analysis of traveltime tomography

### 2.1 Problem setup

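This section describes the necessary mathematical insights that motivate the NN architecture design. Let us consider the so-called differential imaging setting, where a background slowness field $m_0(x)$ is known, and denote by $u^s_0(x)$ the solution of the eikonal equation associated with the field $m_0(x)$:

$$|\nabla u^s_0(x)| = m_0(x),\quad x\in\Omega, \qquad u^s_0(x_s) = 0. \tag{2}$$

Then for a perturbation $\tilde m(x)$ to the slowness field, the difference in the traveltime $\tilde u^s(x) \equiv u^s(x) - u^s_0(x)$ naturally satisfies

$$|\nabla u^s_0(x) + \nabla\tilde u^s(x)| = m_0(x) + \tilde m(x),\quad x\in\Omega, \qquad \tilde u^s(x_s) = 0. \tag{3}$$

The imaging data consists of $\tilde u^s(x_r)$ over all $x_s$ and $x_r$: $d(x_s,x_r)\equiv\tilde u^s(x_r)$.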

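To better understand the dependence of $d$ on $\tilde m$, we assume $\tilde m$ to be sufficiently small and carry out a perturbative analysis. Squaring (3) and canceling the background terms using (2) results in

$$(\nabla\tilde u^s(x))^T\nabla\tilde u^s(x) + 2(\nabla u^s_0(x))^T\nabla\tilde u^s(x) = \tilde m(x)^2 + 2m_0(x)\tilde m(x). \tag{4}$$

Since $\tilde m$ is sufficiently small, $\tilde u^s$ is also a small quantity. Keeping only the terms linear in $\tilde m$ and $\tilde u^s$ and discarding the higher-order ones yields

$$\nabla u^s_0(x)^T\nabla\tilde u^s(x) \approx m_0(x)\tilde m(x), \tag{5}$$

which is an advection equation. Using $|\nabla u^s_0(x)| = m_0(x)$, one can further simplify this equation as

$$\hat\nabla u^s_0(x)^T\nabla\tilde u^s(x) \approx \tilde m(x), \tag{6}$$

where $\hat\nabla u^s_0(x)\equiv\nabla u^s_0(x)/|\nabla u^s_0(x)|$ stands for the unit vector in the direction of $\nabla u^s_0(x)$.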

For simplicity, assume that there is a unique characteristic $C_0(x_s,x_r)$ of (6) that connects $x_s$ and $x_r$. Then

$$d(x_s,x_r)\equiv\tilde u^s(x_r) \approx \int_{C_0(x_s,x_r)}\tilde m(x)\,dx \equiv d_1(x_s,x_r), \tag{7}$$

where $d_1$ is introduced to stand for the first-order approximation to $d$. In particular, if the background slowness field $m_0$ is a constant, then $C_0(x_s,x_r)$ is the line segment with start and end points $x_s$ and $x_r$, respectively, and

$$d_1(x_s,x_r) = |x_s - x_r|\int_0^1\tilde m\big(x_s + \tau(x_r - x_s)\big)\,d\tau.$$

Figure 1: Illustration of the problem setup. The domain $\Omega$ is a unit disk, and the sources and receivers are equidistantly placed on the boundary.
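The passage from (6) to (7) can be spelled out in one line, under the same first-order assumptions. Parameterize $C_0(x_s,x_r)$ by arc length as $x(\ell)$, $\ell\in[0,L]$, with $x(0)=x_s$, $x(L)=x_r$ and $x'(\ell)=\hat\nabla u^s_0(x(\ell))$, i.e., along the characteristic direction of the advection equation (6). By the chain rule and (6),

$$\frac{d}{d\ell}\,\tilde u^s(x(\ell)) = \hat\nabla u^s_0(x(\ell))^T\,\nabla\tilde u^s(x(\ell)) \approx \tilde m(x(\ell)),$$

and integrating over $[0,L]$ with $\tilde u^s(x_s) = u^s(x_s) - u^s_0(x_s) = 0$ gives

$$\tilde u^s(x_r) \approx \int_0^L \tilde m(x(\ell))\,d\ell = \int_{C_0(x_s,x_r)}\tilde m(x)\,dx,$$

which is (7). For constant $m_0$ the characteristic is the straight segment from $x_s$ to $x_r$, and the substitution $\ell = \tau|x_r - x_s|$ recovers the formula for $d_1$ above.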

The most relevant geometry in traveltime tomography, for both medicine and earth science, is the circular geometry where $\Omega$ is modeled as a unit disk (Chung et al., 2011; Deckelnick et al., 2011; Yeung et al., 2018). As illustrated in Fig. 1, the sources and receivers are placed on the boundary equidistantly. More precisely, $x_s = (\cos s, \sin s)$ with $s = 2\pi k/N_s$, $k = 0,\dots,N_s-1$, and $x_r = (\cos r, \sin r)$ with $r = 2\pi k/N_r$, $k = 0,\dots,N_r-1$, where $N_s = N_r$ in the current setup.

In many cases, the background slowness field is only radially dependent, or even constant (Deckelnick et al., 2011; Yeung et al., 2018). In what follows, $m_0$ is assumed to be radially dependent, i.e., $m_0(x) = m_0(|x|)$.
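When $m_0$ is constant (the simplest radially symmetric background), the first-order data $d_1$ is the straight-ray integral given above. The following NumPy sketch evaluates it with a composite midpoint rule; the function name `straight_ray_d1`, the quadrature size and the test inclusions are illustrative assumptions, not part of the paper's pipeline. For example, a ray through the centre of a disk inclusion of radius $0.3$ and amplitude $-0.2$ should give $-0.2\times 0.6 = -0.12$.

```python
import numpy as np


def straight_ray_d1(m_tilde, xs, xr, n_quad=2000):
    """First-order data d1(xs, xr) for a constant background slowness.

    Approximates |xs - xr| * int_0^1 m_tilde(xs + tau (xr - xs)) dtau
    with the composite midpoint rule on n_quad subintervals.
    m_tilde : callable mapping an (n, 2) array of points to n values
    """
    xs = np.asarray(xs, dtype=float)
    xr = np.asarray(xr, dtype=float)
    tau = (np.arange(n_quad) + 0.5) / n_quad  # midpoints of [0, 1]
    pts = xs[None, :] + tau[:, None] * (xr - xs)[None, :]
    return np.linalg.norm(xr - xs) * np.mean(m_tilde(pts))
```

Usage: `straight_ray_d1(lambda p: -0.2 * (np.hypot(p[:, 0], p[:, 1]) < 0.3), (-1, 0), (1, 0))` integrates a centred disk inclusion along a diameter.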

### 2.2 Mathematical analysis on the forward map

Since the domain is a disk, it is convenient to rewrite the problem in the polar coordinates. Let $x = (\rho\cos\theta, \rho\sin\theta)$, $x_s = (\cos s, \sin s)$ and $x_r = (\cos r, \sin r)$, where $\rho$ is the radial coordinate and $\theta, s, r$ are the angular ones.

Figure 2: Visualization of the slowness field and the measurement data. The upper figures are the perturbation of the slowness field $\tilde m(x)$ ($m_0 = 1$ and $\tilde m\le 0$ in this sample), the measurement data $u^s(x_r)$ and the difference $d(x_s,x_r)$ with respect to the background measurement data. The lower-left figure is $\tilde m(x)$ in the polar coordinates, and the lower-right two figures are the “sheared” versions of their corresponding upper figures.

Figure 2 presents an example of the slowness field and the measurement data. Notice that the main signal in $u^s(x_r)$ and $d(x_s,x_r)$ concentrates near the minor diagonal. Due to the circular tomography geometry, it is convenient to “shear” the measurement data by introducing a new angular variable $h = r - s$, where the difference is understood modulo $2\pi$. As we shall see in the next section, this shearing step significantly simplifies the architecture of the NNs. Under the new parameterization, the measurement data is

$$d(s,h)\equiv d(x_s, x_{s+h}). \tag{8}$$

The same convention applies to its first-order approximation: $d_1(s,h)\equiv d_1(x_s,x_{s+h})$. By writing $\tilde m$ in the polar coordinates, the linear dependence of $d_1$ on $\tilde m$ in (7) states that there exists a kernel distribution $K(s,h,\theta,\rho)$ such that

$$d_1(s,h) = \int_0^1\int_0^{2\pi} K(s,h,\theta,\rho)\,\tilde m(\theta,\rho)\,d\theta\,d\rho. \tag{9}$$
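With equidistant sources and receivers ($N_s = N_r = N$), the change of variables $h = r - s$ in (8) is a per-row circular shift of the $N\times N$ data matrix, with $h$ now measured in grid steps. A minimal NumPy sketch follows; the helper names `boundary_points`, `shear` and `unshear` are hypothetical. As a sanity check, for a constant background the chord lengths $|x_s - x_{s+h}| = 2|\sin(\pi h/N)|$ depend only on $h$, so every row of the sheared chord matrix is identical.

```python
import numpy as np


def boundary_points(n):
    """Equidistant points (cos(2 pi k / n), sin(2 pi k / n)), k = 0..n-1."""
    ang = 2.0 * np.pi * np.arange(n) / n
    return np.stack([np.cos(ang), np.sin(ang)], axis=1)


def shear(d):
    """Sheared data: out[s, h] = d[s, (s + h) mod N] for an N x N matrix d[s, r]."""
    n = d.shape[0]
    idx = (np.arange(n)[:, None] + np.arange(n)[None, :]) % n
    return np.take_along_axis(d, idx, axis=1)


def unshear(d_sheared):
    """Inverse of shear: out[s, r] = d_sheared[s, (r - s) mod N]."""
    n = d_sheared.shape[0]
    idx = (np.arange(n)[None, :] - np.arange(n)[:, None]) % n
    return np.take_along_axis(d_sheared, idx, axis=1)
```

Because the shear is a pure index permutation, it loses no information, and `unshear(shear(d))` returns `d` exactly.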

#### Convolution form of the map $\tilde m(\theta,\rho)\to d_1(s,h)$.

Since the domain is a disk and $m_0$ depends only on the radial variable, the whole problem is equivariant to rotation. In this case, the situation can be dramatically simplified. Precisely, we have the following proposition.

**Proposition 2.2.** There exists a function $\kappa(h,\rho,\cdot)$, periodic in the last parameter, such that

$$d_1(s,h) = \int_0^1\int_0^{2\pi}\kappa(h,\rho,s-\theta)\,\tilde m(\theta,\rho)\,d\theta\,d\rho. \tag{10}$$

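*Proof.* Let $C_0(s,h)\equiv C_0(x_s,x_{s+h})$ and parameterize the characteristic as $p_{s,h}(\tau) = \big(\rho_{s,h}(\tau)\cos\theta_{s,h}(\tau),\ \rho_{s,h}(\tau)\sin\theta_{s,h}(\tau)\big)$, $\tau\in[0,1]$, with $p_{s,h}(0) = x_s$ and $p_{s,h}(1) = x_{s+h}$. Then the relationship (7) between $d_1$ and $\tilde m$ can be written as

$$d_1(s,h) = \int_0^1\tilde m\big(\theta_{s,h}(\tau),\rho_{s,h}(\tau)\big)\,\|p'_{s,h}(\tau)\|\,d\tau.$$

Since the background slowness $m_0$ depends only on the radial variable, the characteristic is rotation invariant in the sense that for any angle $t$, if $p_{s,h}(\tau)$ is a parameterization of $C_0(s,h)$, then the curve with polar coordinates $\big(\theta_{s,h}(\tau)+t,\rho_{s,h}(\tau)\big)$ is a parameterization of $C_0(s+t,h)$ with the same speed $\|p'_{s,h}(\tau)\|$. Hence, for any $t$, if we rotate the system by an angle $t$, then

$$d_1(s+t,h) = \int_0^1\tilde m\big(\theta_{s,h}(\tau)+t,\rho_{s,h}(\tau)\big)\,\|p'_{s,h}(\tau)\|\,d\tau.$$

Writing both sides of this equation in the form of (9) and shifting $\theta$ by $t$ directly yields $K(s+t,h,\theta+t,\rho) = K(s,h,\theta,\rho)$ for any $t$. Hence, there is a $\kappa(h,\rho,\cdot)$, periodic in the last parameter, such that $K(s,h,\theta,\rho) = \kappa(h,\rho,s-\theta)$. This completes the proof.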

Proposition 2.2 shows that the kernel $K$ acts on $\tilde m$ in the angular direction as a convolution, which is, in fact, the motivation behind shearing the measurement data $d$. This property allows us to evaluate the map $\tilde m\to d_1$ by a family of 1D convolutions, parameterized by $h$ and $\rho$.

#### Discretization.

All the above analysis is in the continuous space. One can discretize the eikonal equation (1) by finite differences and solve it by the fast sweeping method or the fast marching method. Here we assume that $\tilde m(\theta,\rho)$ is discretized on a uniform mesh on $[0,2\pi)\times[0,1]$. More details of the discretization and the numerical solver are given in Section 4. With a slight abuse of notation, we use the same letters to denote the continuous kernels, variables and their discretizations. Then the discrete version of (9) and (10) is

$$d(s,h) \approx \sum_{\rho,\theta} K(s,h,\theta,\rho)\,\tilde m(\theta,\rho) = \sum_{\rho}\big(\kappa(h,\rho,\cdot) * \tilde m(\cdot,\rho)\big)(s). \tag{11}$$
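The discrete map (11) is a sum over $\rho$ of circular convolutions in the angle, so it can be evaluated with FFTs. The sketch below assumes a common angular grid of $N$ points for $\theta$ and $s$ (so that $s-\theta$ is an index difference modulo $N$), absorbs quadrature weights into the kernel, and stores a hypothetical kernel as an array `kappa[h, rho, t]`. The check compares the FFT result against the direct double sum and verifies the rotation equivariance behind Proposition 2.2: shifting $\tilde m$ by $q$ angular steps shifts $d$ by $q$ in $s$.

```python
import numpy as np


def forward_linear(kappa, m):
    """Discrete linearized forward map (11) via FFT-based circular convolution.

    kappa : (N_h, N_rho, N) array, kappa[h, rho, t] ~ kappa(h, rho, t)
    m     : (N, N_rho) array, m[theta, rho] ~ tilde m(theta, rho)
    returns d : (N, N_h) with
        d[s, h] = sum_rho sum_theta kappa[h, rho, (s - theta) mod N] * m[theta, rho]
    """
    k_hat = np.fft.fft(kappa, axis=2)  # (N_h, N_rho, N), transform in the angle
    m_hat = np.fft.fft(m, axis=0)      # (N, N_rho)
    # Convolution theorem for every (h, rho) pair, then sum over rho (channel mixing).
    d_hat = np.einsum("hrk,kr->kh", k_hat, m_hat)
    return np.real(np.fft.ifft(d_hat, axis=0))
```

In network terms, $\rho$ plays the role of input channels and $h$ of output channels, which is exactly the structure exploited in Section 3.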

## 3 Neural networks for TT

In this section, we describe the NN architecture for the inverse map based on the mathematical analysis in Section 2. To start, we first study the NN for the forward map and then the inverse map.

#### Forward map.

The perturbative analysis in Section 2.2 shows that, when $\tilde m$ is sufficiently small, the forward map $\tilde m\to d$ can be approximated by (11). In terms of the NN architecture, for small $\tilde m$, the forward map (11) can be approximated by a (non-local) convolution in the angular direction and a fully-connected operator in the $\rho$ direction. In the actual implementation, it can be represented by a convolution layer by taking $\rho$ and $h$ as the channel dimensions. For larger $\tilde m$, this linear approximation is no longer accurate. In order to extend the neural network for (11) to the nonlinear case, we propose to increase the number of convolution layers and add nonlinear activation functions.

**Algorithm (forward map).** Neural network architecture for the forward map $\tilde m\to d$.

1. Resample the data with a Conv1D layer to fit the BCR-Net, with $\rho$ as the channel direction.
2. Use the BCR-Net to implement the convolutional neural network.
3. Reconstruct the result $d$ from the output of the BCR-Net.

Denote the number of channels by $c$, whose value is problem-dependent and will be discussed in the numerical part. In the angular direction, since the convolution between $\kappa$ and $\tilde m$ is global, in order to represent global interactions the window size $w$ of the convolution must satisfy

$$wN_{\rm cnn}\ge N_\theta, \tag{12}$$

where $N_{\rm cnn}$ is the number of layers and $N_\theta$ is the number of discretization points in the angular direction. A simple calculation shows that the number of parameters of such a neural network is $O(wN_{\rm cnn}c^2)$, which by (12) is at least of order $N_\theta c^2$. The recently proposed BCR-Net (Fan et al., 2019a) has been demonstrated to require fewer parameters and to provide better efficiency for such global interactions. Therefore, in our architecture, we replace the convolution layers with the BCR-Net. The resulting neural network architecture for the forward map is summarized in the algorithm above, and its components are explained in the following.

• Conv1D, which maps $\tilde m$ (with $\rho$ as the channel direction) to the input of the BCR-Net, is the one-dimensional convolution layer with a given window size, channel number and activation function, and periodic padding in the first (angular) direction.

• BCR-Net is motivated by the data-sparse nonstandard wavelet representation of pseudo-differential operators (Beylkin et al., 1991). It processes the information at different scales separately, and each scale can be understood as a local convolutional neural network. The one-dimensional BCR-Net maps the output of the first Conv1D layer to the input of the reconstruction step; the number of channels and the number of layers of the local convolutional neural network at each scale are its hyper-parameters. The readers are referred to (Fan et al., 2019a) for more details on the BCR-Net.
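A Conv1D layer with periodic padding in the angular direction, as described above, can be sketched in NumPy; a Keras model would use its built-in layers instead, so this is only an illustration. The helper `conv1d_periodic` computes a cross-correlation, as deep learning libraries do, and `conv_stack_params` gives a rough weight count for the plain convolutional stack constrained by (12), ignoring biases and assuming $c$-to-$c$ inner layers. Both names, the default `tanh` activation and the numbers in the check are hypothetical.

```python
import numpy as np


def conv1d_periodic(x, weight, bias, activation=np.tanh):
    """1D convolution layer (cross-correlation) with periodic padding in axis 0.

    x      : (N, C_in) input; axis 0 is the periodic angular direction
    weight : (w, C_in, C_out) filters with odd window size w
    bias   : (C_out,)
    """
    w = weight.shape[0]
    half = w // 2
    n = x.shape[0]
    # Wrap-around padding makes the layer equivariant to circular shifts in angle.
    xp = np.pad(x, ((half, half), (0, 0)), mode="wrap")
    out = np.zeros((n, weight.shape[2]))
    for k in range(w):
        out += xp[k:k + n] @ weight[k]
    return activation(out + bias)


def conv_stack_params(n_theta, w, c):
    """Rough weight count of a plain stack of c-to-c Conv1D layers obeying (12).

    Uses the smallest n_layers with w * n_layers >= n_theta; biases and the
    input/output layers are ignored.
    """
    n_layers = -(-n_theta // w)  # ceiling division
    return n_layers, n_layers * w * c * c
```

The count grows linearly with $N_\theta$ for a plain stack, which is the cost the multiscale BCR-Net is designed to avoid.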

#### Inverse map.

The perturbative analysis in Section 2.2 shows that if $\tilde m$ is sufficiently small, the forward map can be approximated by $d\approx K\tilde m$, the operator notation of the discretization (11). Here $\tilde m$ is a vector indexed by $(\theta,\rho)$, $d$ is a vector indexed by $(s,h)$, and $K$ is a matrix with rows indexed by $(s,h)$ and columns indexed by $(\theta,\rho)$.

The filtered back-projection method (Schuster and Quintus-Bosz, 1993) suggests the following formula to recover :

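$$\tilde m \approx (K^TK+\epsilon I)^{-1}K^Td, \tag{13}$$

where $\epsilon$ is a regularization parameter. The first piece $K^Td$ can also be written as a family of convolutions, with the angularly reflected kernel $\bar\kappa(h,\rho,t)\equiv\kappa(h,\rho,-t)$ since $K^T$ is an adjoint:

$$(K^Td)(\theta,\rho) = \sum_h\big(\bar\kappa(h,\rho,\cdot) * d(\cdot,h)\big)(\theta). \tag{14}$$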

The application of $K^T$ to $d$ can be approximated with a neural network similar to the one for $K$ in the forward-map architecture. The second piece $(K^TK+\epsilon I)^{-1}$ is a pseudo-differential operator in the $(\theta,\rho)$ space, and it is implemented with several two-dimensional convolutional layers for simplicity. The resulting architecture for the inverse map is summarized in the algorithm below and illustrated in Fig. 3. The Conv2D used there is the two-dimensional convolution layer with a given window size, channel number and activation function, periodic padding in the first direction and zero padding in the second direction. The selection of the hyper-parameters will be discussed in Section 4.
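For small problem sizes, formulas (13) and (14) can be checked directly by assembling $K$ as a dense matrix. The sketch below uses a random hypothetical kernel `kappa[h, rho, t]` on a common angular grid of $N$ points; it verifies that $K^Td$ is the angular cross-correlation of (14) and that, for a full-column-rank $K$ and a tiny $\epsilon$, (13) recovers $\tilde m$ from noise-free linear data. It illustrates the linear model only; the neural network replaces both pieces with learned layers, and the helper names are hypothetical.

```python
import numpy as np


def dense_kernel(kappa):
    """K with rows indexed by (s, h) and columns by (theta, rho), row-major.

    K[(s, h), (theta, rho)] = kappa[h, rho, (s - theta) mod N]
    """
    n_h, n_rho, n = kappa.shape
    s = np.arange(n)[:, None, None, None]
    h = np.arange(n_h)[None, :, None, None]
    t = np.arange(n)[None, None, :, None]
    r = np.arange(n_rho)[None, None, None, :]
    K = kappa[h, r, (s - t) % n]  # shape (n, n_h, n, n_rho)
    return K.reshape(n * n_h, n * n_rho)


def adjoint_correlation(kappa, d):
    """(K^T d)[theta, rho] = sum_h sum_s kappa[h, rho, (s - theta) mod N] d[s, h]."""
    n_h, n_rho, n = kappa.shape
    out = np.zeros((n, n_rho))
    for t in range(n):
        shifted = kappa[:, :, (np.arange(n) - t) % n]  # index s -> kappa[..., s - t]
        out[t] = np.einsum("hrs,sh->r", shifted, d)
    return out


def regularized_inverse(K, d_vec, eps):
    """Eq. (13): (K^T K + eps I)^{-1} K^T d, solved densely (small sizes only)."""
    n_cols = K.shape[1]
    return np.linalg.solve(K.T @ K + eps * np.eye(n_cols), K.T @ d_vec)
```

In practice the dense solve is only a reference; its cost grows cubically with $N_\theta N_\rho$, which is one reason to learn the filtering step with convolutional layers instead.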

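**Algorithm (inverse map).** Neural network architecture for the inverse problem $d\to\tilde m$.

1. Apply $K^T$ to $d$, with $h$ as the channel direction.
2. Apply $(K^TK+\epsilon I)^{-1}$, implemented with Conv2D layers, to obtain $\tilde m$.

Figure 3: Neural network architecture for the inverse map of TT.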
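The Conv2D padding convention above (periodic in $\theta$, zero in $\rho$) maps directly onto two calls to `numpy.pad`. The following sketch, with hypothetical helper names and a ReLU default activation chosen only for illustration, applies one such layer as a cross-correlation on an $(N_\theta, N_\rho, C)$ array.

```python
import numpy as np


def pad_theta_periodic_rho_zero(x, half):
    """Pad an (N_theta, N_rho, C) array: wrap along theta, zeros along rho."""
    x = np.pad(x, ((half, half), (0, 0), (0, 0)), mode="wrap")
    return np.pad(x, ((0, 0), (half, half), (0, 0)), mode="constant")


def conv2d_mixed_padding(x, weight, bias, activation=lambda z: np.maximum(z, 0.0)):
    """2D convolution (cross-correlation) with the padding above.

    x      : (N_theta, N_rho, C_in)
    weight : (w, w, C_in, C_out) with odd w
    bias   : (C_out,)
    """
    w = weight.shape[0]
    half = w // 2
    xp = pad_theta_periodic_rho_zero(x, half)
    n1, n2 = x.shape[:2]
    out = np.zeros((n1, n2, weight.shape[3]))
    for a in range(w):
        for b in range(w):
            out += xp[a:a + n1, b:b + n2] @ weight[a, b]
    return activation(out + bias)
```

The wrap-around in $\theta$ respects the periodicity of the disk, while zero padding in $\rho$ reflects that the radial interval $[0,1]$ has genuine endpoints.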

## 4 Numerical tests

This section reports the numerical performance of the proposed neural network architecture in Section 3 for the inverse map $d\to\tilde m$.

### 4.1 Experimental setup

In order to solve the eikonal equation (1) on the unit disk $\Omega$, we embed $\Omega$ into a square domain by specifying sufficiently large slowness values outside $\Omega$. The square domain is discretized with a uniform Cartesian mesh using a finite difference scheme, and the fast sweeping method proposed in (Zhao, 2005) is used to solve the resulting nonlinear discrete system. In the polar coordinates, the domain $[0,2\pi)\times[0,1]$ is partitioned by a uniform Cartesian mesh with $N_\theta\times N_\rho$ points. As $\tilde m$ used in the NN is in the polar coordinates while the eikonal equation is solved in the Cartesian ones, the perturbation of the slowness field is treated as a piecewise linear function in the domain $\Omega$ and is interpolated onto the polar grid. The numbers of sources and receivers are equal, $N_s = N_r$.

The NN in Section 3 is implemented with Keras (Chollet et al., 2015) running on top of TensorFlow (Abadi et al., 2016). All the parameters of the network are initialized by Xavier initialization (Glorot and Bengio, 2010). The loss function is the mean squared error, and the optimizer is Nadam (Dozat, 2016). In the training process, the NN is first trained for a fixed number of epochs with an initial batch size and learning rate. One then increases the batch size by a constant factor up to a maximum value with the learning rate unchanged, and then decreases the learning rate by a constant factor down to a minimum value with the batch size fixed at that maximum. In each step, the NN is trained for the same fixed number of epochs. The remaining hyper-parameters of the inverse-map architecture are kept fixed, and the selection of the channel number $c$ is studied below.
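The batch-size-then-learning-rate schedule described above can be written as a small generator of training stages. The concrete values (initial batch size, learning rate, factors and limits) are not reproduced here, so all numbers below are hypothetical. Each returned stage would be trained for the same number of epochs, e.g., with Keras `model.fit(..., batch_size=b, epochs=E)` after setting the optimizer's learning rate.

```python
def training_schedule(batch0, lr0, batch_max, lr_min, batch_factor, lr_factor):
    """List of (batch_size, learning_rate) stages for the two-phase schedule.

    Phase 1 multiplies the batch size by batch_factor up to batch_max with the
    learning rate unchanged; phase 2 divides the learning rate by lr_factor
    down to lr_min with the batch size fixed at its final value.
    """
    stages = [(batch0, lr0)]
    batch, lr = batch0, lr0
    # Phase 1: grow the batch size, keep the learning rate.
    while batch * batch_factor <= batch_max:
        batch *= batch_factor
        stages.append((batch, lr))
    # Phase 2: shrink the learning rate, keep the (maximal) batch size.
    # The small relative slack guards against round-off in repeated division.
    while lr / lr_factor >= lr_min * (1.0 - 1e-9):
        lr /= lr_factor
        stages.append((batch, lr))
    return stages
```

With the hypothetical call `training_schedule(32, 1e-3, 256, 1e-5, 2, 10)`, the stages are batch sizes 32, 64, 128, 256 at learning rate $10^{-3}$, followed by learning rates $10^{-4}$ and $10^{-5}$ at batch size 256.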

### 4.2 Results

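For a fixed $\tilde m$, $d$ stands for the exact measurement data obtained by numerically solving the discretization of (1). The prediction of the NN from $d$ is denoted by $\tilde m^{\rm NN}$. The metric for the prediction is the peak signal-to-noise ratio (PSNR), defined as

$$\mathrm{PSNR} = 10\log_{10}\left(\frac{\mathrm{Max}^2}{\mathrm{MSE}}\right),\quad \mathrm{Max} = \max_{ij}(\tilde m_{ij}) - \min_{ij}(\tilde m_{ij}),\quad \mathrm{MSE} = \frac{1}{N_\theta N_\rho}\sum_{i,j}\big|\tilde m_{ij} - \tilde m^{\rm NN}_{ij}\big|^2. \tag{15}$$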

For each experiment, the test PSNR is then obtained by averaging creftype 15 over a given set of test samples. The numerical results presented below are obtained by repeating the training process five times, using different random seeds for the NN initialization.
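The metric (15) and the averaging over test samples can be sketched as follows; `psnr` and `mean_test_psnr` are hypothetical helper names. Note that (15) normalizes by the dynamic range of the reference $\tilde m$, so it is undefined for a sample with $\tilde m\equiv 0$.

```python
import numpy as np


def psnr(m_ref, m_pred):
    """PSNR of eq. (15); Max is the dynamic range of the reference m_ref."""
    m_ref = np.asarray(m_ref, dtype=float)
    m_pred = np.asarray(m_pred, dtype=float)
    peak = m_ref.max() - m_ref.min()        # Max in (15)
    mse = np.mean((m_ref - m_pred) ** 2)    # MSE in (15)
    return 10.0 * np.log10(peak**2 / mse)


def mean_test_psnr(refs, preds):
    """Average of (15) over a set of test samples."""
    return float(np.mean([psnr(a, b) for a, b in zip(refs, preds)]))
```

For a reference with dynamic range 1 and a uniform prediction error of 0.1, the MSE is 0.01 and the PSNR is 20 dB.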

The numerical experiments focus on the shape reconstruction setting (Üstündag, 2008; Deckelnick et al., 2011), where the perturbations $\tilde m$ are often piecewise constant inclusions. The background slowness field is set as $m_0\equiv 1$, and the slowness perturbation $\tilde m$ is assumed to be the sum of $N_e$ piecewise constant ellipses. As the slowness field is positive, it is required that $m_0 + \tilde m > 0$. For each ellipse, the orientation is uniformly random over the unit circle, the position is uniformly sampled in the disk, and the width and height depend on the dataset. It is also required that each ellipse lies in the disk and that no two ellipses intersect. Three types of data sets are generated to test the neural network.

• Negative inclusions. $\tilde m$, the perturbation of the slowness, takes a negative value in the ellipses and zero otherwise, and the width and height of each ellipse are sampled from uniform distributions.

• Positive inclusions. $\tilde m$ takes a positive value in the ellipses and zero otherwise, and the width and height of each ellipse are sampled from uniform distributions.

• Mixture inclusions. The setup of each ellipse is either a negative one in the negative inclusions or a positive one in the positive inclusions.
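A rejection sampler for such inclusion datasets can be sketched as follows. The amplitudes and axis ranges are not given above, so the values used in the check are hypothetical, and the acceptance tests are conservative simplifications (bounding circles instead of exact ellipse intersection tests). The field is rasterized on a Cartesian grid rather than interpolated onto the polar grid used by the network, and the function name `sample_inclusions` is hypothetical.

```python
import numpy as np


def sample_inclusions(n_e, amp, axis_range, n_grid=64, rng=None, max_tries=10000):
    """Rasterize n_e non-overlapping constant ellipses inside the unit disk.

    amp        : value of tilde m inside each ellipse (negative or positive)
    axis_range : (lo, hi) range for the uniformly sampled semi-axes
    Conservative acceptance tests (an assumption of this sketch):
      |center| + semi-major <= 1              -> ellipse lies in the disk
      |c_i - c_j| > semi-major_i + semi-major_j -> ellipses do not intersect
    Returns the (n_grid, n_grid) field and the number of accepted ellipses.
    """
    rng = np.random.default_rng() if rng is None else rng
    g = np.linspace(-1.0, 1.0, n_grid)
    X, Y = np.meshgrid(g, g, indexing="ij")
    m = np.zeros((n_grid, n_grid))
    accepted = []
    for _ in range(max_tries):
        if len(accepted) == n_e:
            break
        r = np.sqrt(rng.uniform())  # sqrt gives a uniform position in the disk
        phi = rng.uniform(0.0, 2.0 * np.pi)
        c = np.array([r * np.cos(phi), r * np.sin(phi)])
        a, b = rng.uniform(*axis_range, size=2)
        alpha = rng.uniform(0.0, 2.0 * np.pi)  # orientation of the ellipse
        big = max(a, b)
        if np.linalg.norm(c) + big > 1.0:
            continue
        if any(np.linalg.norm(c - c2) <= big + big2 for c2, big2 in accepted):
            continue
        accepted.append((c, big))
        # Rotate grid coordinates into the ellipse frame and test (u/a)^2 + (v/b)^2 <= 1.
        dx, dy = X - c[0], Y - c[1]
        u = np.cos(alpha) * dx + np.sin(alpha) * dy
        v = -np.sin(alpha) * dx + np.cos(alpha) * dy
        m[(u / a) ** 2 + (v / b) ** 2 <= 1.0] = amp
    return m, len(accepted)
```

A mixture dataset could, for example, draw the sign of `amp` independently for each ellipse; that variant is not shown here.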

For each type, we generate two datasets, with $N_e = 2$ and $N_e = 4$. For each test, a set of samples is generated, part of which is used for training and the remainder for testing.

Figure 4: The test PSNR for different channel numbers $c$ for the three types of data with $N_e = 4$.

Figure 5: NN prediction of a sample in the test data for negative (first row), positive (second row) and mixture (third row) inclusions with $N_e = 4$ for noise levels $\delta = 0$, 2% and 10%.

Figure 6: NN generalization test for the negative inclusions. Upper (or lower) figures: the NN is trained with data with $N_e = 2$ (or 4) ellipses at noise level $\delta = 0$, 1% or 2%, and tested on data with $N_e = 4$ (or 2) at the same noise level.

Figure 7: NN generalization test across data types. The first column is the reference solution. In each of the last three columns, the NN is trained with one data type (negative, positive, or mixed) and is tested on all three data types with $N_e = 4$ and without noise.

The first numerical study is concerned with the choice of the channel number $c$ in Section 3. Figure 4 presents the test PSNR and the number of parameters for different channel numbers $c$ for the three types of data sets with $N_e = 4$. As the channel number increases, the test PSNR first consistently increases and then saturates for all three types of data. Notice that the number of parameters of the neural network grows with $c$; a moderate value of $c$, beyond which the PSNR no longer improves noticeably, is a reasonable balance between accuracy and efficiency.

To model the uncertainty in the measurement data, we corrupt the measurement data with a Gaussian random variable $Z$ of zero mean and unit variance, scaled by a noise level $\delta$ that controls the signal-to-noise ratio. For each noise level $\delta$, an independent NN is trained and tested with the corresponding noisy data set.

Figure 5 collects, for different noise levels $\delta$, samples for all three data types: (1) negative inclusions with $N_e = 4$, (2) positive inclusions with $N_e = 4$, and (3) mixture inclusions with $N_e = 4$. The NN is trained with datasets generated in the same way as the test data. When there is no noise in the measurement data, the NN consistently gives accurate predictions of the slowness field $\tilde m$, in the position, shape, and orientation of the ellipses. For small noise levels, for example $\delta = 2\%$, the boundaries of the shapes slightly blur while the positions and orientations of the ellipses are still correct. As the noise level increases, the shapes become fuzzy, but the positions and number of shapes remain correct. This demonstrates that the proposed NN architecture is capable of learning the inverse problem.

The next test concerns the generalization of the proposed NN. We first train the NN with the data set of negative inclusions with $N_e = 2$ (or 4) at noise level $\delta = 0$, 1% or 2%, and test it on data of negative inclusions with $N_e = 4$ (or 2) at the same noise level. The results, presented in Fig. 6, indicate that the NN trained with data containing two inclusions is capable of recovering the slowness field from the measurement data of the case with four inclusions, and vice versa. This shows that the trained NN is capable of predicting beyond the training scenario.

The last test concerns the prediction power of the NN on one data type when trained with another. In Fig. 7, the first column is the reference solution. In each of the remaining three columns, the NN is trained with one data type (negative, positive, or mixed) and is tested on all three data types, with $N_e = 4$ and without noise. The second column shows that the NN trained with negative inclusions fails to capture the positive inclusions; conversely, the third column shows that the NN trained with positive inclusions fails for the negative inclusions. On the other hand, the NN trained with mixed inclusions predicts reasonably well for all three data types.

## 5 Discussions

This paper presents a neural network approach for the inverse problem of first-arrival traveltime tomography, using the NN to approximate the whole inverse map from the measurement data to the slowness field. The perturbative analysis, which indicates that the linearized forward map can be represented by a one-dimensional convolution with multiple channels, inspires the design of the whole NN architecture. The analysis in this paper can also be extended to three-dimensional TT problems by leveraging recent work such as (Cohen et al., 2018).

The work of Y.F. and L.Y. is partially supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Scientific Discovery through Advanced Computing (SciDAC) program. The work of L.Y. is also partially supported by the National Science Foundation under award DMS-1818449.

## References

• Abadi et al. (2016) Martín Abadi et al. Tensorflow: A system for large-scale machine learning. In OSDI, volume 16, pages 265–283, 2016.
• Adler and Öktem (2017) Jonas Adler and Ozan Öktem. Solving ill-posed inverse problems using iterative deep neural networks. Inverse Problems, 33(12):124007, 2017.
• Araya-Polo et al. (2018) M. Araya-Polo, J. Jennings, A. Adler, and T. Dahlke. Deep-learning tomography. The Leading Edge, 37(1):58–66, 2018.
• Backus and Gilbert (1968) George Backus and Freeman Gilbert. The resolving power of gross Earth data. Geophysical Journal International, 16(2):169–205, 1968.
• Bar and Sochen (2019) Leah Bar and Nir Sochen. Unsupervised deep learning algorithm for PDE-based forward and inverse problems. arXiv preprint arXiv:1904.05417, 2019.
• Berg and Nyström (2018) Jens Berg and Kaj Nyström. A unified deep artificial neural network approach to partial differential equations in complex geometries. Neurocomputing, 317:28–41, 2018.
• Beylkin et al. (1991) Gregory Beylkin, Ronald Coifman, and Vladimir Rokhlin. Fast wavelet transforms and numerical algorithms I. Communications on pure and applied mathematics, 44(2):141–183, 1991.
• Born and Wolf (1965) Max Born and Emil Wolf. Principles of optics: electromagnetic theory of propagation, interference and diffraction of light. Oxford: Pergamon, 3rd edition, 1965.
• Carleo and Troyer (2017) Giuseppe Carleo and Matthias Troyer. Solving the quantum many-body problem with artificial neural networks. Science, 355(6325):602–606, 2017.
• Chollet et al. (2015) François Chollet et al. Keras. https://keras.io, 2015.
• Chung et al. (2011) Eric Chung, Jianliang Qian, Gunther Uhlmann, and Hongkai Zhao. An adaptive phase space method with application to reflection traveltime tomography. Inverse Problems, 27(11):115002, 2011.
• Cohen et al. (2018) Taco S. Cohen, Mario Geiger, Jonas Köhler, and Max Welling. Spherical CNNs. In International Conference on Learning Representations, 2018.
• Cybenko (1989) George Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of control, signals and systems, 2(4):303–314, 1989.
• Deckelnick et al. (2011) Klaus Deckelnick, Charles M Elliott, and Vanessa Styles. Numerical analysis of an inverse problem for the eikonal equation. Numerische Mathematik, 119(2):245, 2011.
• Dozat (2016) Timothy Dozat. Incorporating Nesterov momentum into Adam. International Conference on Learning Representations, 2016.
• E and Yu (2018) Weinan E and Bing Yu. The deep Ritz method: A deep learning-based numerical algorithm for solving variational problems. Communications in Mathematics and Statistics, 6(1):1–12, 2018.
• Fan and Ying (2019a) Yuwei Fan and Lexing Ying. Solving electrical impedance tomography with deep learning. arXiv preprint arXiv:1906.03944, 2019a.
• Fan and Ying (2019b) Yuwei Fan and Lexing Ying. Solving optical tomography with deep learning. arXiv preprint arXiv:1910.04756, 2019b.
• Fan et al. (2018) Yuwei Fan, Lin Lin, Lexing Ying, and Leonardo Zepeda-Núñez. A multiscale neural network based on hierarchical matrices. arXiv preprint arXiv:1807.01883, 2018.
• Fan et al. (2019a) Yuwei Fan, Cindy Orozco Bohorquez, and Lexing Ying. BCR-Net: a neural network based on the nonstandard wavelet form. Journal of Computational Physics, 384:1–15, 2019a.
• Fan et al. (2019b) Yuwei Fan, Jordi Feliu-Fabà, Lin Lin, Lexing Ying, and Leonardo Zepeda-Núñez. A multiscale neural network based on hierarchical nested bases. Research in the Mathematical Sciences, 6(2):21, 2019b.
• Feliu-Faba et al. (2019) Jordi Feliu-Faba, Yuwei Fan, and Lexing Ying. Meta-learning pseudo-differential operators with deep neural networks. arXiv preprint arXiv:1906.06782, 2019.
• Glorot and Bengio (2010) Xavier Glorot and Yoshua Bengio. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the thirteenth international conference on artificial intelligence and statistics, pages 249–256, 2010.
• Goodfellow et al. (2016) Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep learning, volume 1. MIT press Cambridge, 2016.
• Han et al. (2018a) J. Han, A. Jentzen, and W. E. Solving high-dimensional partial differential equations using deep learning. Proceedings of the National Academy of Sciences, 115(34):8505–8510, 2018a.
• Han et al. (2018b) Jiequn Han, Linfeng Zhang, Roberto Car, and Weinan E. Deep potential: A general representation of a many-body potential energy surface. Communications in Computational Physics, 23(3):629–639, 2018b.
• Hinton et al. (2012) G. Hinton, L. Deng, D. Yu, G. E. Dahl, A. r. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath, and B. Kingsbury. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine, 29(6):82–97, 2012. ISSN 1053-5888.
• Hoole (1993) S Ratnajeevan H Hoole. Artificial neural networks in the solution of inverse electromagnetic field problems. IEEE transactions on Magnetics, 29(2):1931–1934, 1993.
• Ishii (1987) Hitoshi Ishii. A simple, direct proof of uniqueness for solutions of the Hamilton-Jacobi equations of eikonal type. Proceedings of the American Mathematical Society, pages 247–251, 1987.
• Jin and Wang (2006) Xing Jin and Lihong V Wang. Thermoacoustic tomography with correction for acoustic speed variations. Physics in Medicine & Biology, 51(24):6437, 2006.
• Kabir et al. (2008) Humayun Kabir, Ying Wang, Ming Yu, and Qi-Jun Zhang. Neural network inverse modeling and applications to microwave filter design. IEEE Transactions on Microwave Theory and Techniques, 56(4):867–879, 2008.
• Kao et al. (2005) Chiu-Yen Kao, Stanley Osher, and Yen-Hsi Tsai. Fast sweeping methods for static Hamilton–Jacobi equations. SIAM Journal on Numerical Analysis, 42(6):2612–2632, 2005.
• Khoo and Ying (2018) Yuehaw Khoo and Lexing Ying. SwitchNet: a neural network model for forward and inverse scattering problems. arXiv preprint arXiv:1810.09675, 2018.
• Khoo et al. (2017) Yuehaw Khoo, Jianfeng Lu, and Lexing Ying. Solving parametric PDE problems with artificial neural networks. arXiv preprint arXiv:1707.03351, 2017.
• Khoo et al. (2019) Yuehaw Khoo, Jianfeng Lu, and Lexing Ying. Solving for high-dimensional committor functions using artificial neural networks. Research in the Mathematical Sciences, 6(1):1, 2019.
• Kosovichev (1996) Alexander G Kosovichev. Tomographic imaging of the Sun’s interior. The Astrophysical Journal Letters, 461(1):L55, 1996.
• Krizhevsky et al. (2012) Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1, NIPS’12, pages 1097–1105, USA, 2012. Curran Associates Inc.
• Kutyniok et al. (2019) Gitta Kutyniok, Philipp Petersen, Mones Raslan, and Reinhold Schneider. A theoretical analysis of deep neural networks and parametric PDEs. arXiv preprint arXiv:1904.00377, 2019.
• LeCun et al. (2015) Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 2015.
• Leung et al. (2014) Michael K. K. Leung, Hui Yuan Xiong, Leo J. Lee, and Brendan J. Frey. Deep learning of the tissue-regulated splicing code. Bioinformatics, 30(12):i121–i129, 2014.
• Leung et al. (2006) Shingyu Leung, Jianliang Qian, et al. An adjoint state method for three-dimensional transmission traveltime tomography using first-arrivals. Communications in Mathematical Sciences, 4(1):249–266, 2006.
• Li et al. (2019) Yingzhou Li, Jianfeng Lu, and Anqi Mao. Variational training of neural network approximations of solution maps for physical models. arXiv preprint arXiv:1905.02789, 2019.
• Long et al. (2018) Zichao Long, Yiping Lu, Xianzhong Ma, and Bin Dong. PDE-net: Learning PDEs from data. In Jennifer Dy and Andreas Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 3208–3216, Stockholmsmässan, Stockholm Sweden, 10–15 Jul 2018. PMLR.
• Lucas et al. (2018) Alice Lucas, Michael Iliadis, Rafael Molina, and Aggelos K Katsaggelos. Using deep neural networks for inverse problems in imaging: beyond analytical methods. IEEE Signal Processing Magazine, 35(1):20–36, 2018.
• Ma et al. (2015) Junshui Ma, Robert P. Sheridan, Andy Liaw, George E. Dahl, and Vladimir Svetnik. Deep neural nets as a method for quantitative structure-activity relationships. Journal of Chemical Information and Modeling, 55(2):263–274, 2015. doi: 10.1021/ci500747n.
• Munk et al. (2009) Walter Munk, Peter Worcester, and Carl Wunsch. Ocean acoustic tomography. Cambridge University Press, 2009.
• Popovici and Sethian (1997) Alexander Mihai Popovici and James Sethian. Three dimensional traveltime computation using the fast marching method. In SEG Technical Program Expanded Abstracts 1997, pages 1778–1781. Society of Exploration Geophysicists, 1997.
• Qian et al. (2007) Jianliang Qian, Yong-Tao Zhang, and Hong-Kai Zhao. Fast sweeping methods for eikonal equations on triangular meshes. SIAM Journal on Numerical Analysis, 45(1):83–107, 2007.
• Raissi and Karniadakis (2018) Maziar Raissi and George Em Karniadakis. Hidden physics models: Machine learning of nonlinear partial differential equations. Journal of Computational Physics, 357:125–141, 2018. ISSN 0021-9991.
• Raissi et al. (2019) Maziar Raissi, Paris Perdikaris, and George E Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, 2019.
• Rawlinson et al. (2010) Nicholas Rawlinson, S Pozgay, and S Fishwick. Seismic tomography: a window into deep Earth. Physics of the Earth and Planetary Interiors, 178(3-4):101–135, 2010.
• Rudd and Ferrari (2015) Keith Rudd and Silvia Ferrari. A constrained integration (CINT) approach to solving partial differential equations using artificial neural networks. Neurocomputing, 155:277–285, 2015.
• Schmidhuber (2015) Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85–117, 2015. ISSN 0893-6080.
• Schomberg (1978) Hermann Schomberg. An improved approach to reconstructive ultrasound tomography. Journal of Physics D: Applied Physics, 11(15):L181, 1978.
• Schuster and Quintus-Bosz (1993) Gerard T Schuster and Aksel Quintus-Bosz. Wavepath eikonal traveltime inversion: Theory. Geophysics, 58(9):1314–1323, 1993.
• Sethian (1999) James A Sethian. Fast marching methods. SIAM Review, 41(2):199–235, 1999.
• Sutskever et al. (2014) Ilya Sutskever, Oriol Vinyals, and Quoc V Le. Sequence to sequence learning with neural networks. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 27, pages 3104–3112. Curran Associates, Inc., 2014.
• Tan et al. (2018) Chao Tan, Shuhua Lv, Feng Dong, and Masahiro Takei. Image reconstruction based on convolutional neural network for electrical resistance tomography. IEEE Sensors Journal, 19(1):196–204, 2018.
• Üstündag (2008) D Üstündag. Retrieving slowness distribution of a medium between two boreholes from first arrival traveltimes. Int. J. Geol, 2:1–8, 2008.
• Yeung et al. (2018) Tak Shing Au Yeung, Eric T Chung, and Gunther Uhlmann. Numerical inversion of 3D geodesic X-ray transform arising from traveltime tomography. arXiv preprint arXiv:1804.10006, 2018.
• Zhao (2005) Hongkai Zhao. A fast sweeping method for eikonal equations. Mathematics of Computation, 74(250):603–627, 2005.