Machine learning formulation for the Factored Eikonal Equation
The recent deep learning revolution has created an enormous opportunity for accelerating compute capabilities in the context of physics-based simulations. Here, we propose EikoNet, a deep learning approach to solving the Eikonal equation, which characterizes the first-arrival-time field in heterogeneous 3D velocity structures. Our grid-free approach allows for rapid determination of the travel time between any two points within a continuous 3D domain. These travel time solutions are allowed to violate the differential equation - which casts the problem as one of optimization - with the goal of finding network parameters that minimize the degree to which the equation is violated. In doing so, the method exploits the differentiability of neural networks to calculate the spatial gradients analytically, meaning the network can be trained on its own without ever needing solutions from a finite difference algorithm. EikoNet is rigorously tested on several velocity models and sampling methods to demonstrate robustness and versatility. Training and inference are highly parallelized, making the approach well-suited for GPUs. EikoNet has low memory overhead, and further avoids the need for travel-time lookup tables. The developed approach has important applications to earthquake hypocenter inversion, ray multi-pathing, and tomographic modeling, as well as to other fields beyond seismology where ray tracing is essential.READ FULL TEXT VIEW PDF
Seismic wave propagation forms the basis for most aspects of seismologic...
We introduce a scheme for probabilistic hypocenter inversion with Stein
We propose a novel deep learning approach to efficiently perform Bayesia...
Numerical solutions to the equation for advection are determined using
We develop a deep learning approach to extract ray directions at discret...
Accurate forward modeling is important for solving inverse problems. An
While the possibility of time travel in physics is still debated, the
Machine learning formulation for the Factored Eikonal Equation
Three-dimensional ray tracing is a fundamental component of modern seismology, having direct applications to earthquake hypocenter inversions , seismic tomography , and earthquake source properties 
. These derived products further form the basis for many downstream seismological applications. The Eikonal Equation is a well-known nonlinear partial differential Equation that characterizes the first-arrival-time field for a given source location in a 3D medium. The Eikonal formulation can be solved with several finite difference algorithms ,,, with varying computational demands and stabilities to the solutions.
Recent advances in deep learning have shown to be extremely promising in the context of physics-based simulations [5, 26]. This technology has started to be applied to geophysics as well, for example, to predict the acoustic wave response of a medium given a velocity model as input , forecast the next time step of a wavefield conditional on its history , and accelerate viscoelastic simulations 
. However, these techniques rely on inputs from pre-computed physics based models, which could themselves contain modeling-based artifacts and input bias. Instead, we wish to learn the underlying physics by incorporating the formulations into the neural network architecture and loss function. These physics-based procedures[19, 24, 16, 1, 11]
take advantage of the backpropagation procedure to compute the gradient of the neural network output relative to the input terms. Such physics-informed neural networks[19, 16] are mesh independent, giving a continuous function output related to the inputs. These techniques have been used to learn the parametrization for formulations of the Burgers, Schrodinger, and Navier-Stokes Equations, with comparisons made to the numerical derivatives .
In this paper, we propose EikoNet, an approach to solving the factored Eikonal Equation directly with deep neural networks. EikoNet can learn the travel-time between any two points in a truly continuous 3D medium, avoiding the use of grids. We leverage the differentiability of neural networks to analytically compute the spatial gradients of the travel-time field, and train the network to minimize the difference between the true and predicted velocity model for the factored Eikonal formulation. EikoNet is massively parallelized and therefore well-suited for GPUs, has low memory overhead, and avoids the need for travel-time lookup tables. Additionally, EikoNet has several novel advantages that are not currently offered with conventional finite differencing schemes.
The Eikonal Equation is a nonlinear first-order partial differential Equation representing a high-frequency approximation to the propagation of waves in heterogeneous media . The formulation takes the general form;
where is the Euclidean Norm, is the travel-time through the medium from a source location to a receiver location , is the velocity of the medium at the receiver location, is the slowness of the medium at the receiver location, and the gradient at the reciever location.
The value of travel-time is computed by minimizing the misfit of a travel-time field that satisfies the user imposed velocity model, with the additional boundary condition that the travel-time at the source location equals zero, . Solutions to Equation 1 have a strong singularity at the source location , leading to numerical errors close to the source. To mitigate such singularity effects, a factored formulation is often used, with solutions representing the travel-time deviation from a homogeneous medium with . The factored travel-time form can then be represented by:
where , representing the distance function from the source location, and the deviation of the travel-time field from a model travel-time with homogeneous unity velocity. Substituting the formulation of Equation 2 into Equation 1
and expanding using the chain rule, then the velocity can be represented by;
Our approach to solving the Eikonal Equation trains a deep neural network, , to predict the deviation of the travel-time field, , between an input pair of source-receiver coordinates, . The deviation of the travel-time field is then represented by,
If was known, a neural network could be trained for a catalog source-receiver pairs. However, itself is unknown, being the solution to the Equation that we want to solve. Instead, only the velocity model and a differential Equation specifying how relates to are known, but this can be used to train the neural network. Thus, we cast the problem as one where we aim to accurately predict the local velocity, assuming that the output of is indeed ; and use this value to compute from Equation 3, defining this predicted velocity as . Here, we exploit the differentiability of deep neural networks to analytically determine the spatial gradient of Equation 4 with respect to .
Solving the factored Eikonal Equation is therefore reduced to training a neural network with supervision on the velocity model, by iteratively updating the parameters to minimize some loss function. Once trained, the Eikonal Equation is no longer needed, as outputs directly (Figure 1.2). In the process, the details of the velocity model will be encoded in the network, requiring the network to be re-trained if the velocity model is to be changed. The model architecture used in this study is a feed-forward network consisting of a series of residual blocks with fully-connected layers  followed by nonlinear units (Figure 113].
The neural network learns the travel time between any source receiver pair and must contain an adequate sampling from across the 3D medium. The input features are organized into a vector of six components,
where , , are the Cartesian coordinates of the source or receiver. The features are paired with the seismic velocity at the receiver location,
A training dataset therefore consists of many samples, taken from across the 3D volume (Figure 1.3). We discuss how these datasets are constructed in the following section.
The misfit between the predicted , as determined from Equation 3, and observed, , velocities are then minimised using a mean-squared error loss function,
We use the Adam optimization algorithm for training with a learning rate of  (Figure 1.4). The batch and dataset size are set to and respectively, with their variability discussed further in Section 6.2. The dataset is sepearted into training and validation data, with the validation dataset at of the total size. An addition test dataset is created representing source reciever pairs which are blind to the user prior to loss calculation.
Once trained the network can be applied to a series of user defined source-receiver pairs to determine the travel-time and predicted velocity, as shown in Figure 1.5.
Our approach builds a training dataset by randomly sampling source-receiver points from the continuous 3D medium and labeling each with the velocity at the receiver location. Since the velocity model is gridded, we use linear interpolation to map these values to a continuous domain. In this study, we explore three different methods for sampling the velocity model.
The dataset is composed of a series of source locations selected randomly across the model space. Once selected , the receiver locations are selected at a random distance away from the source location along a random vector. This method inherently allows the source-receivers pairs to have a distance distribution that is uniform across the model space.
The dataset could also be composed of a series of randomly selected source and receiver locations, this allows for a more random distribution of points across the model space but inherently has a non-uniform distributions of distances between source-receiver points as expressed by the Bertrand Paradox.
In complex velocity contrasting models the training procedure could quickly learn areas of simple velocity variations, we act to improve the training speed of this procedure by allowing dynamic resampling of the training source-receiver pairs for values of greatest misfit. For the first epoch of training we minimise the misfit between the actual and predicted velocity estimates using a L2-norm, but also determine a importance weight parameter,, for each training sample defined by:
where is the neural-network predicted velocity value and is the actual imposed velocity model. The weight value is high for the samples with the greatest relative misfit. For subsequent training epochs training samples are selected based on the weight value normalized by the maximum weight in the training dataset. To mitigate stagnation in the extremes of the weight distribution we project the weights between a user defined minimum and maximum, represented by . This bound was chosen to mitigate the undersampling of regions with low misfit and oversampling of regions that could contain singularities.
The accuracy of the solution to the Eikonal Equation is given directly by the loss, which quantifies the degree to which the solution violates the PDE in a least squares sense. Thus, after training is finished, we can predict the travel time to all points desired within the 3D medium and use Equation 1 to calculate the learned velocity model. Comparing the learned velocity model to the actual velocity model provides a visual and rigorous quantitative approach to understanding the accuracy of the solution (Figure 1.5).
The Fast Marching Method (FMM) is a grid based numerical scheme to determine the solution to the Eikonal Equation . The FMM uses a finite difference approach to track the evolution of the minimum travel-time using a advancing interface scheme . This solution returns the minimum travel-time from any source location to a receiver node in a user-defined mesh or regular grid. This formulation relies on a solution to the Eikonal formulation without singularities and therefore can fail if sharp velocities or under sampling. Typically to mitigate these effects a smoothed velocity model is supplied with adequate receiver node sampling to reconcile any velocity contrasts. As this procedure computes the travel-time from a source location to a series of receiver locations this procedure can be repeated for a number of source locations, with the computational cost and memory/storage requirements increasing with the number of source and receiver locations.
Throughout this study we utilize the python fast marching method toolkit scikit-fmm (https://pythonhosted.org/scikit-fmm/) to formulate the travel-time from a source location on a receiver geometry at grid spacing in the dimensions.
Outlined are a series of experiments designed to demonstrate the versatility of our approach for learning travel time fields in complex 3D velocity models. We consider four different velocity models and examine the performance of the trained network. Each network is trained with a batch size of , using a dataset with randomly located source-receiver pairs. The effects of dynamic sampling and weighting are discussed further in Section 6.
The first model we consider is a homogeneous 3D velocity structure with a value of (Figure 2.1). For demonstration purposes, a source is placed at , with both and slices of the travel time field shown. At each receiver point, we plot the learned velocity using Equation 1, and use this to determine the percent error. For comparison, we also show the FMM solution calculated on a grid of receivers with grid spacing in the , and dimensions. The training loss goes to zero after training epochs, showing that the network has learned the travel time field perfectly.
Next, we consider a velocity model that increases linearly with depth. The model increases from at the surface, to at depth. Comparison with the conventional finite-difference scheme shows good agreement, with a maximum difference between imposed and recovered velocity model of .
The third test conducted is a 3D model containing a central cube embedded within a homogeneous background (Figure 2.3). The cube has a constant velocity of , while the background velocity is . Here, the neural network approach has excellent misfit between the actual and learned velocity, with only some disagreement close to sharp velocity gradients. The neural-network travel-time field is similar to that of the finite-difference scheme, with the effects of the sharp velocity contrast shown in the deflection of the travel-time fronts, demonstrating that the neural network approach is able to reconcile even difficult cases with sharp velocity changes of of the mean value.
The final velocity model experiment contains a checkerboard, which we use to demonstrate that the proposed algorithm is able to reconcile positive and negative velocity anomalies varying spatially within the domain. This velocity model is implemented using the following Equation:
where is the velocity at a point. We compare the solution with the finite-difference scheme and actual velocity model showing that the deep learning approach is able to reconcile the velocity model with a maximum velocity difference of .
Having demonstrated that deep neural networks can indeed learn to directly solve the Eikonal Equation, we now examine the effect of the sampling scheme on the learned solution. Here, we use the block velocity model from the previous section and train separate models using each of the three sampling techniques described in Section 3.2. We also separately test how the total number of training samples and batch size affect the performance.
Figure 3 shows the validation results for each of the three sampling methods. The weighted random sampling approach achieves the lowest loss of the three sampling methods, yet converges in a similar number of training epochs. However, the other two methods still perform well, as the differences in validation loss are relatively small. We expect that the weighted random sampling will be of increasing importance in very complex 3D models and recommend it for selection of source-receiver pairs.
We investigate how the number of training samples and batch size affect the network performance. We re-run the simulation with three dataset sizes (, and ) and three batch sizes (, and ), inspecting the recovery of the final solution and the validation loss for each simulation. Figure 4i shows the variation of the optimal recovered solution for the different batch and sample sizes, with columns representing the increasing number of samples and rows the increase batch size. Figure 4j shows the validation loss for each of the separate simulations with the line colour corresponding to the panel color.
From Figure 4, it is clear that the number of training samples has a profound influence on the network performance. The dataset with samples achieves a loss that is about an order of magnitude lower than the dataset with samples. In addition, through inspection of the recovered velocity model we can see that the low sample size is unable to reconcile the complex velocity structure of the sharp velocity contrast of the block model. Therefore, it is crucial that an adequate number of training samples number be used when constructing the dataset. Future work could investigate dynamic sampling of the velocity model space to reconcile regions of greatest misfit.
We have demonstrated that deep neural networks can solve the Eikonal Equation to learn the travel-time field in heterogeneous 3D velocity structures. This procedure has demonstrated consistent calculations with prior finite-difference approaches for travel-time formulation.
Finite-difference approaches require computing the travel time field separately for every source location of interest, without any ability to pass along knowledge about the wavefield or velocity structure between simulations. The computation cost for the finite-difference approach therefore increases based on the number of source locations and receiver nodes, with the storage of these travel-time tables increasing drastically with grid size. The deep learning approach instead is able to learn from and generalize the knowledge acquired between multiple source locations, as much of the structure of the problem is highly similar. For example, if two sources are placed very close to each other, the travel-time field will be generally quite similar between them, and the information learned from solving the Equation for the first source can be used to more rapidly learn the solution for the second source. This is not the case for the finite difference algorithms, which have no notion of context. Implicitly, this means that the neural network is learning the velocity model. The disk storage size required is equal to the size of the neural network () as only the weights of the network have to be retained. Figure 5
demonstrates a schematic representing the disk storage required for the finite-difference and machine learning approach.
One of the distinguishing features of EikoNet is that the travel time solutions are valid for any two points within the 3D continuous domain. This means it is never necessary to store a travel time grid and interpolate it to achieve the desired result away from the grid nodes. Here, EikoNet automatically learns an optimized interpolation scheme during the training process, drawing on context from across the entire dataset.
A second important aspect is that solutions to the Eikonal Equation, as learned by EikoNet, are guaranteed to be differentiable back through the network with respect to the source or receiver locations. This has a variety of important practical applications, such as in locating earthquakes, as the inverse problem can be formulated as one of gradient descent, by analytically calculating the gradients of an objective function with respect to the source locations.
The Eikonal Equation is not just used in seismology, but numerous other domains of wave physics such as optics , medical imagery  and video-game rendering . It is expected that EikoNet would be just as suitable in these fields.
The computational cost of predicting the travel-time from a source to receiver location is equal to the time required to pass across the network. The low computation cost of a forward prediction means that the neural-network model can be used to significantly increase the computation speed of typically used ray-based procedures. Our approach is massively parallel and very well suited for GPUs. Below, we outline several areas that could benefit from this deep learning approach.
In earthquake location theory a series of seismic instruments are used to record the arrival time of the incoming seismic wave. These instrument arrival times are then inverted using a velocity model to determine an earthquake location and location uncertainty. In recent years advances in seismological instrumentation have allowed for the incorporation of Distributed Accoustic Sensing (DAS), using optical fibres as a series of non-discretized station locations . Current travel-time finite-difference techniques are not tractable for handling the tens of thousands of virtual receivers that DAS arrays provide. This would require a large computational cost and disk storage space. In comparison our machine learning technique has a fixed disk storage space and the computation cost only that of the forward prediction from the network.
The Eikonal formulation represents the first arrival between source and receiver locations. However, in complex velocity models sharp gradients in the velocity structure can produce a multi-pathing effect with the energy partitioned between multiple ray paths. acted to mitigate this effect by employing FMM to track the evolution of the seismic wavefront for a narrow band of nodes, representing a interface of interest. Once the outgoing wave traverses one of these node locations an additional simulation is triggered and the combined wavefield used to determine a possible secondary pathway. The deep learning approach can be adapted to include additional secondary arrivals by integrating the travel-time along thousands of arbitrary pathways at a small additional computational cost. Each pathway can be inspected to determine its plausibility of being a secondary arrival, possible arrival time and corresponding amplitudes.
For tomographic inversions which undergo many iterations successively, new travel-time fields must be computed from scratch for each iteration. Our approach allows for the neural network model from the previous iteration to be used as the starting point for the next training procedure, which could rapidly converge to the new velocity model if the perturbations are relatively small. This would effectively remove ray tracing as a computational burden from this part of the tomography, as nearly all of the compute time would be spent on the very first tomography iteration.
The computation cost in the training procedure is in learning the complexities of the velocity model space. If the velocity model is an unknown but the user has some prior knowledge of possible arrival time differences, then this approach could be updated to do some form of tomographic inversion. This procedure instead would learn the velocity model to fit some known travel-time values. We perceive that this addition can be made in the loss function term itself, where an additional loss term can be used to update the velocity model to try and mitigate known observations. Prior finite-difference methods would have difficulty with this procedure as the travel-time field would have to be recalculated for each update to the velocity model.
This project was partly supported by a grant from the USGS. K. Azizzadenesheli gratefully acknowledge the financial support of Raytheon and Amazon Web Services. An outline for the code used throughout this article can be found at https://github.com/Ulvetanna/EikoNet.