HypoSVI
Hypocentral earthquake location using Steinvariational gradient descent and deep neural network traveltime formulations
view repo
We introduce a scheme for probabilistic hypocenter inversion with Stein variational inference. Our approach uses a differentiable forward model in the form of a physicsinformed neural network, which we train to solve the Eikonal equation. This allows for rapid approximation of the posterior by iteratively optimizing a collection of particles against a kernelized Stein discrepancy. We show that the method is wellequipped to handle highly nonconvex posterior distributions, which are common in hypocentral inverse problems. A suite of experiments is performed to examine the influence of the various hyperparameters. Once trained, the method is valid for any network geometry within the study area without the need to build travel time tables. We show that the computational demands scale efficiently with the number of differential times, making it ideal for largeN sensing technologies like Distributed Acoustic Sensing.
READ FULL TEXT VIEW PDF
This paper presents the potential of applying physicsinformed neural
ne...
read it
In this paper, we introduce a new form of amortized variational inferenc...
read it
The recent deep learning revolution has created an enormous opportunity ...
read it
We propose a Bayesian physicsinformed neural network (BPINN) to solve ...
read it
As the large amount of sequencing data accumulated in past decades and i...
read it
Numerical solutions to the equation for advection are determined using
d...
read it
We develop a distributed framework for the physicsinformed neural netwo...
read it
Hypocentral earthquake location using Steinvariational gradient descent and deep neural network traveltime formulations
Earthquake hypocenters represent the points in space and time at which earthquakes occur. They are a fundamental component of many downstream analyses in seismology, from seismic tomography to earthquake source properties. They are also used for realtime earthquake forecasting, such as during active sequences. Thus the ability to reliably estimate hypocenters and characterize their uncertainty is of major importance in seismology.
Determining a hypocenter from observations of seismic waves is a classic inverse problem in geophysics (Geiger, 1912; Thurber, 1985)
. More recently, Bayesian inference has been used for hypocenter inversion
(Tarantola, 2004; Lomax et al., 2000), in which prior information is combined with some observations to infer posterior distributions over the hypocentral parameters. In their most general form, the travel time solutions from ray theory are nonlinear in the hypocentral coordinates, which for this particular problem adds nonconvexity to the posterior. Furthermore, and perhaps a more serious issue, is that the observations of seismic wave arrival times often contain significant errors resulting from the widespread adoption of automated picking algorithms (Ross et al., 2018; Mousavi et al., 2020). One strategy for adding robustness to hypocenter inversions has been to incorporate nonstandard likelihood functions into the inverse problem, which has significantly improved the results but at the cost of creating highly nonconvex posteriors. This makes many common techniques for performing Bayesian inference, such as Markov chain Monte Carlo or variational inference, illsuited for this particular problem with the nonconvex nature of the posterior expected to be computational slow.
Recent advances in deep learning have led to the development of physicsinformed neural networks (PINNs), which are designed to learn solutions to partial differential equations (PDEs). Such approaches have a number of appealing properties that are not present with conventional approaches like finite difference methods, for example the solutions can be made differentiable, are often meshfree, and can be rapidly calculated upon demand. These properties make PINNs wellsuited as the forward model in an inverse problem, in particular since it often is desirable to take gradients of some objective function. Sampling and ensemble methods such as MCMC or variational inference typically require many evaluations of the forward model, so solving this with a PINN makes it more computationally tractable.
The rise of deep learning over the last decade has accelerated the development of novel approaches for performing Bayesian inference. One notable example is Stein variational inference (SVI), in which a collection of particles is iteratively optimized to approximate a target posterior (Qiang & Dilin, 2016). It is better suited than standard variational techniques at handling multimodal distributions because the number of modes does not need to be known a priori; this results from a kernelized objective function that creates a natural repulsive force between the particles. SVI requires evaluating many gradients and as such, benefits from the differentiability of PINNs.
Our contributions to this paper are as follows: (1) we develop a framework for earthquake hypocenter inversion using Stein variational inference; (2) we incorporate a PINN trained to solve the Eikonal equation as a forward model; (3) we perform experiments on the hyperparameters of the inverse problem to characterize their effect on the solution; and (4) we benchmark the method against a catalog of earthquakes from Southern California.
In this section, we provide background information on Stein variational inference as well as physicsinformed neural networks.
For two random variables
and , let denote the prior on , the likelihood function, and the posterior over after observing (conditioning on) . Using celebrated Bayes rule, these quantities are related as, , where , and the normalization constant .Let denote a reproducing kernel Hilbert space on the domain , with a positive definite reproducing kernel , endowed with the inter product and the norm . We further define , as a set of multivalued functions, with values, with the corresponding norm , where for any we have .
For a function , we define Stein’s operator endowed with and as,
(1) 
We further define the kernelized Stein’s discrepancy between two distributions and using is as follows,
(2) 
This discrepancy equates to zero when . Fortunately, the maximization in Eq. 2 has a closedform solution where is the maximizer.
Now consider the Kullback–Leibler between and , i.e., divergence. We aim to find a gradient direction , a function, such that maximally reduces the KL divergence. Using , we can use gradient descent with learning rate and update to reduce the , and make the closer to . It is known that for the kernelized Stein’s discrepancy, the direction that provides the direction of maximal change is (Qiang & Dilin, 2016). In the following, we provide an update rule to update and approximate the posterior given observed data.
We represent with a set of particles, i.e., an average of many delta Dirac measures. where q is the distribution of this particle. In the following, we update , and make it closer to by moving the particles. Therefore, for the update direction , at each point , we have,
with the updating rule given by,
(3) 
Here is the step size at the
th epoch. For the choice of kernel, we deploy the celebrated Radial Basis Function (RBF),
, with representing the width of kernel, for its empirical and universal approximation properties. As discussed above, the update in Eq. 3, updates (through updating the particles distribution) at each time step to make it closer to in the Stein’s discrepancy sense.In solving inverse problems for earthquake hypocenters, the most common approach is to use a ray theoretical forward model to calculate the expected travel times, , for seismic waves propagating from a given source location to a receiver location. In heterogeneous 3D Earth models, the Eikonal equation is often solved to determine (Rawlinson & Sambridge, 2005),
(4) 
where is the Euclidean norm, is the traveltime through the medium from a source location to a receiver location , is the velocity of the medium at the receiver location, is the slowness of the medium at the receiver location, and the gradient at the receiver location.
Smith et al. (2020) developed a PINN approach to solving the factored Eikonal equation (EikoNet), which trains a deep neural network to calculate the traveltime between any two points in a 3D medium for a given velocity model, satisfying the additional boundary condition that the traveltime at the source location equals zero, . We leverage a factored eikonal formulations to mitigate the strong singularity affects at the source location, representing the traveltime as a deviation from a homogeneous medium with (Treister et al., 2016). The factored traveltime form can then be represented by:
(5) 
where , representing the distance function from the source location, and the deviation of the traveltime field from a model traveltime with homogeneous unity velocity. Substituting the formulation of equation 5 into equation 4
and expanding using the chain rule, then the velocity can be represented by;
(6) 
They leveraged the analytical differentiability of neural networks to solve the factored Eikonal equation from scratch, without the use of finitedifference solutions during training. Once trained, a network describing the traveltime between any sourcereceiver pair can be represented by:
(7) 
where is the traveltime between the source location and receiver location , and is the neural network with weights and biases given by .
EikoNet has several properties that are mathematically advantageous in solving inverse problems. First, the solutions to the Eikonal equation are meshindependent, i.e. they are not discretized on a fixed grid and can be evaluated at truly any point within the 3D medium. Second, the network is a forward model that is analytically differentiable, which allows for gradientbased methods to be efficiently employed to calculate a downstream objective function. Third, by approximating the Eikonal equation with a deep neural network, the optimization part of the inverse problem is easily solved with graphical processing units (GPUs).
We now present an approach for probabilistic hypocenter inversion that uses a PINN as a forward model and SVI to approximate the posterior distribution. The method consists of several primary steps:
An EikoNet model is trained for a given Earth velocity model to solve the Eikonal equation. This is performed for both Pwaves and Swaves.
A collection of particles is randomly initialized throughout the geographic study area. These represent preliminary hypocenter locations.
Travel times are calculated with EikoNet from each particle to every receiver with an observation.
The synthetic travel times are used together with the data to calculate a kernelized Stein discrepancy (loss function).
The gradients of the loss are calculated with automatic differentiation and used to collectively update the particles’ locations.
Steps 35 are repeated until convergence. The final collection of particle positions will approximate the posterior distribution of the hypocenter.
Uncertainty estimates are extracted from the particles using kernel density methods.
Next, we provide a detailed discussion of each stage of the procedure, with the outline of the inversion given in Figure 1.
Throughout this study we train EikoNet traveltime models using a set of constant training parameters and network architecture as described in Smith et al. (2020) and supplied in Table 1. A model region is defined spanning our Longitude, Latitude, depth regional of interest, with xmin and xmax locations as [, , ] and [, , ] respectively. The grid is projected to a UTM coordinate system, with random sourcereciever locations selected within the UTM model space. These points represent the training locations, with different velocity models discussed below.
Parameter  Value  

Dataset Size  
Validation Fraction  
Batch Size  
Optimizer  ADAM (+ scheduler)  
Learning Rate  
Sampling Type  Weighted Random Distance  
Sampling Type Bounds  
Domain Normalization  Offset MinMax Normalization  
Network Architecture 

In many earthquake location procedures the complex geometry of the subsurface is poorly understood, with the assumption that lateral variations in velocity are negligible compared to velocity variations in depth. As such onedimensional velocity structure describing how the velocity changes with depth are specified. These models typically have independent velocity structure defined for both the Pwave and Swave arrivals, or a scaling relationship of Vp/Vs. It is important to understand how reliabiable these methods are for location procedures such as HypoSVI, as this would be a typical starting model for many use cases. In addition, understanding of the computational demand for training more simplistic traveltime models, informs the feasibility of the method on typical computational systems. We investigate these problems for our region of interest by training EikoNet traveltime models from the Vp and Vs velocity structure shown by the blue dots in Figure 2
a. We interpolate the velocity at the point locations as the linear interpolation of the observed velocity values. Two independent EikoNet neural networks are trained independently for the Vp and Vs velocity structure using the network parameters specified in Table
1. The training of each model took 10 epochs, with roughly a minutes training time on a Nvdia V100 GPU and minutes on a free Colab GPU (either a Nvidia K80,T4 or P100). Once trained the traveltime models can be validated by comparing the imposed observed velocity to predicted velocity, determined as the analytical gradient of the traveltime over the neural network, for a series of sourcereceiver pairs within the threedimensional domain. Figure 2outlines the comparison of the observed velocity structure and the predicted velocity, with the variance of the predicted velocity within
of the observed values. The consistent velocity structure and low computational overhead shows that this method is viable regardless of the available computational infrastructure.In more well studied regions prior geophysical datasets and analysis could have been leverage to gain a better insight into the complex subsurface and therefore the velocity structure. The velocity structure for our region of interest, that of Southern California, has been densely studied with a compilation of direct velocity model estimates (Shaw et al., 2015; Lee et al., 2014), used to construct detailed subsurface velocity models defined under the group project ’Southern California Earthquake Centre Community Velocity Model’ (SCECCVM, https://www.scec.org/research/cvm). The current versions of these velocity models encompass the SCECCVMH (Shaw et al., 2015, version 15.1.0)) and SCECCVMS (Lee et al., 2014, version 4.26). Incremental improvements are made to the CVMH, but the CVMS has seen no advances since the current version. The SCECCVMs are computed using numerous datasets (Süss & Shaw, 2003), encompassing a detailed subsurface threedimensional velocity structure from moho surface, basement surface and topological/bathymetric surfaces. Using the SCECCVMH model we train EikoNet models to determine the traveltime within the complex 3D velocity structures. The models are trained on randomly selected sourcereceiver points within the domain, with example slice at Longitude given for the Pwave and Swave velocity structure in Figure 3a and 3b respectively. The EikoNet models once trained represent the traveltime and predicted velocity between any points, as such we show the recovered velocity model colourmap and traveltime contours (at spacing) for a earthquake source at on a receiver grid as separation with Longitude. This example shows consistent agreement between the observed and predicted velocity models, able to reconcile the sharp velocity contrasts which create deflection in the traveltime fields. This example demonstrates the viability of this method in complex 3D velocity structures.
An earthquake hypocenter, , is composed of three spatial coordinates, , and the origin time, . Most commonly, the data used to locate earthquakes are measured times of seismic P and Swave arrivals ("phase picks") over a network seismic instruments. These phase picks define a set of absolute arrival time observations , where . In a Bayesian framework (Tarantola, 2004; Lomax et al., 2000), inference on is performed by combining prior knowledge together with the observations,
(8) 
where is the posterior distribution, is the prior distribution, is the likelihood, and is a constant.
A simple example of for hypocenter inversion is,
(9) 
where is an estimate of uncertainty and,
(10) 
is a nonlinear forward model, i.e. a solution to the Eikonal equation plus the origin time. Thus, the forward model in this problem is a physicsinformed neural network. Since is included as an input to the neural network, this allows for downstream gradients to be taken with respect to it.
More recently, a likelihood function based on the Equal Differential Time method (Lomax et al., 2000, EDT) has seen increasing usage. The EDT likelihood builds differential times from all pairs of phases, and in the process, decouples origin time, from the spatial coordinates of the hypocenter. The formulation is given by;
(11) 
(12) 
where and are different phase arrival time observations, is a phasedependent estimate of uncertainty, and
is the total number of differential times. In addition to reducing the number of latent variables by one, this formulation acts to minimise the effects of outliers, which are particularly common with automated picking algorithms. This robustness results from the fact that in the EDT likelihood, the errors are combined in an additive manner, rather than multiplicative as typical in Bayesian inference problems. Each term in Eq.
11 produces a hyperbolic error surface that decays like a Gaussian in the direction normal to each point on the hyperbola. Thus, Eq. 11 can be viewed as producing a stack of hyperbolas with relatively limited intersection, which creates robustness in the presence of strong outliers. However, the downside is that it results in posterior distributions that are highly nonconvex, making MCMC methods and standard variational inference schemes difficult to use for this problem (Lomax et al., 2000). The origin time is reintroduced by using the optimised earthquake location to determine the predicted origin times to each of the observational locations, determining the origin time as the median of the predicted origin times. The uncertainty is then defined by the median absolute deviation (MAD) from the predicted origin time. We use a uniform prior, , with samples selected within the model domain specified in the Eikonal physics informed neural network.The uncertainty in the posterior distribution is assigned as a combination of the observational, , and forward model uncertainty, , given as
(13) 
The observational uncertainty represents uncertainty in each of the observational times, with an expected standard deviation for each observation time supplied by the user. This value is then converted to a variance to define
for each observation. The forward model uncertainty is constructed as a function of the predicted traveltime for each of the observational locations (similar to that given in Lomax et al. 2000 for LOCGAU2), given by(14) 
where is the fraction of the travel time to use as uncertainty, bounded within the max and min uncertainties specified by and respectively. Throughout this work we use the , discussing the effects of these parameters on synthetic testing within Section 4.5.
A Stein variational gradient descent procedure is used to optimise for the EqualDifferential Time posterior. We use a Gaussian kernel for the SVI procedure, also known as radial basis function (RBF) kernel, for its practical and universal approximation properties. First, we initialize particles randomly using a uniform prior over the 3D study area . For each of these particle locations, we calculate corresponding travel times using EikoNet forward model (Section 1. of Figure 1), evaluating the posterior (to within the normalization constant ), and determine the kernelized Stein discrepancy (Section 2 of Figure 1). Then, we calculate the gradients of this loss function particlewise with respect to the hypocentral coordinates using automatic differentiation, which is possible due to the differentiablity of the PINN (Section 3 of Figure 1). We use these gradients together with the ADAM optimizer (Kingma & Ba, 2014) to update the particle locations until convergence, where the optimal hypocentral location is consistent across multiple epochs. Supplementary Video 1 demonstrates the convergence for the example outlined in Figure Figure 1.
The next step is to extract summary statistics from the posterior distribution. As mentioned previously, the posterior is typically strongly nonconvex due to the EDT likelihood function, although many of the local extrema are effectively negligible in amplitude. Therefore, we aim to determine the dominant cluster of particles representing the main peak of the posterior. This is achieved by using the DBSCAN clustering technique (Hahsler et al., 2019, Section 4 of Figure 1) to identify highdensity clusters of particles, with the cluster with the largest number of particles is used as the dominant cluster. Once the dominant cluster is identified, we apply kernel density methods using Gaussian kernels to estimate the MAP and quantify the location uncertainty from its covariance matrix. We discuss hyperparameter tuning for DBSCAN (Hahsler et al., 2019) in a later section.
In this section we first demonstrate the earthquake inversion scheme on a series of synthetic tests. We construct a catalogue of synthetic earthquake locations across the region, determining the traveltime to a grid of observation points at fixed elevation of , before applying a uncertainty in the synthetic phase arrival and inverting to determine the earthquake location and uncertainty. The earthquake locations are at a fixed latitude and depth of and respectively, with longitude varying from to at separations. The recovered optimal hypocentre and location uncertainty are then compared with the imposed earthquake locations and an expected 2std contour from a gridsearch approach. We vary the possible user defined parameters with the optimised parameters given in Table 2 and earthquake locations in Figure 4. However, we expect that these parameters will need to be varied somewhat depending on the exact application, for example if the error models or network geometry are changed significantly. As such we recommend that initial synthetic testing is undertaken before real data is inverted. Outlined below are discussions on how each hyperparameter affects the recovered locations for this study, with corresponding Supplementary Figures S9S11.
Parameter  Value 

Number of Particles  
Number of Epochs  
Observation Weights  
Radial Basis Function  
Location Uncertainty 
The number of particles used in the Stein Variational Gradient Descent is of great importance for the resolution of the resolved earthquake location. If the number of particles is too small then the particles density is unable to adequately represent the posterior distribution. However, a large number of particles would have increasing computational demand on the inversion procedure and is intractable for large earthquake catalogues. We specify a optimal number of samples equal to and find that an increase in the number of particles does not provide additional information on the the earthquake location, but reducing the number of particles greatly effects posterior. Additional plots for variations of number of particles, with remaining parameters set equal to Table 2, is given in Supplementary Figure S9.
The number of epochs effects how well the particles density represents the posterior for the earthquake location. Figure 1b demonstrates that the posterior drastically changes in the early epochs, but once it converges there is little to no change in the recovered posterior. However, an increase in number of epochs effects the computational time, which over a large number of earthquake locations could have a large effect. We optimise the number of epochs to determine when the earthquake locations and location uncertainty is consistent between epochs.
The RBF kernel can be represented by , where is the shape parameter and the pairwise particle difference. As this term acts as a repulsive force in the SVI procedure increasing the term has the effect of increasing the minimum distance between particles locations. Understanding the trade off for the shape parameter is important as larger values could effect on the recovered posterior. Qiang & Dilin (2016) defined a dynamic shape parameter with the value changing depending on , where is the median distance between pairwise particles, with the definition demonstrating that for each the contribution from its own gradient and the influence from other points balances out. We investigate the variation of the RBF shape parameters on the recovered synthetic earthquake locations finding that parameter has little effect the recovered optimal hypocentral location, with minor variations of the recovered posterior for static values between and that of the dynamic shape parameter (Supplementary Figure S10). We decided to use a static shape parameter of , to mitigate any difference that could occur to the posterior from mulitple run of the same observations for a dynamic shape parameter.
The total uncertainty assigned to the inverse problem is a combination of the picking uncertainty and the forward model uncertainty due to the velocity structure. As described previously, we follow Lomax et al. (2000) and characterize the uncertainty in the forward model as a fraction of the travel time. This is a reasonable choice as the uncertainty in the predicted travel times is expected grow in proportion to the travel time. In our hyperparameter investigation we found that a fraction of should be used, as lower values lead to significant mislocation of the recovered events (Supplementary Figure S11). The upper and lower bounds to the allowed error has less of an effect on our synthetic testing, which we attribute to the synthetic station locations being regularly spaced. For observational data that is clustered spatially the upper and lower bounds could be of great importance and should be investigated with synthetic examples for the specific network geometry.
The posterior for earthquake location is nonconvex due to the EDT likelihood function and we aim to determine the dominant cluster of particles representing the main peak of the posterior. This is achieved by using the DBSCAN clustering technique (Hahsler et al., 2019) to identify highdensity clusters of particles. We investigate the variation of the two dominant parameters defining the DSCAN clustering technique, the maximum distance between two samples for them to be considered as in the neighbourhood of the other and the minimum number of samples per cluster (Supplementary figure S12). The definition of the minimum distance is crucial for effectively sampling the dominant peak of the distribution, with a minimum distance too small possibly subsampling the peak as multiple clusters and a value too large including additional local peaks in the the posterior. We found that the minimum distance must be defined large enough to remove the effects of subsampling the dominant peak in the posterior, in this use case we found sufficient, although has little variation in the recovered kernal density function function until the mimum distance is expanded to at least . In addition, we find little effect of the kernel density function with changes in the minimum number of particles per cluster, something that is to be expected when the dominent cluster typically comprises of the particles for these synthetic inversions. We conclude that a minimum cluster separation of should be used for these large regional scale problems and is insenstive to the low values in the minimum number of samples per cluster, which we use a minimum cluster comprising particles.
The number of observations going into a inversion affects the compute time, as each observation requires predicted traveltime formulations from EikoNet and gradients to be computed . Here, we investigate the computational cost of the inversion procedure while increasing the number of observations. We replicate an increasing number of observations by copying the synthetic station deployment locations multiple times, labelling them as different station names but comprising the same arrival times. This synthetic testing was chosen to minimising the changing effect on the location estimate, which would occur if additional synthetic station locations are provided. All other location hyperparameters are fixed at values given in Table 2. The earthquake locations are then determined for the varying number of observations and the total number of pairwise differential times, with the average computational time for a Nvidia V100 shown in Table 3. The computational time even for the observations, differential times, only takes per event. These synthetic tests demonstrate that this approach is computationally scaleable with computational time increasing as a linearly in a loglog space of computational time vs number of observational differential times.
# of Observations  # of Differential Times  Time per Event(s) 
To further validate the developed method, we apply it to real earthquakes occurring within the Southern California region, with region defined in Section 3.2. This study area was chosen as it encompasses a large seismic network and complex 3D regional velocity structures (Allam & BenZion, 2012)
. We used the detections and phase picks from the open source Southern California Earthquake Data Centre (SCEDC) phase arrival observational catalogue, for the fist
events starting 20190101. The events and phase picks used have all been manually reviewed by analysts at the Southern California Seismic Network (Hutton et al., 2010).We infer hypocenters for the earthquakes using two different velocity models (1D and 3D cases, described in Section 3.2). The hyperparameters used for the inversions are outlined in Table 2 with detailed explanation of the reasoning behind the parameter definition outlined in Section 4. The catalogues are generated on a Nvidia V100 GPU with an average of per event, varying depending on the number of observations in the inversion procedure, with on average observations per event. Since the calculation of traveltimes from EikoNet is independent on the complexity of the velocity model (once the network has been trained), the processing takes equal time for both the 1D and 3D trained models. Example inversions for three events are shown in Figure 5.
To understand the validity of our location technique we compare our earthquake catalogue, with a catalogue determined using the conventional earthquake location software, NonLinLoc. NonLinLoc is a non linear earthquake technique leveraging finitedifference traveltime solutions; Gaussian or equaldifferential likelihood functions; and, likelihood estimations schemes using octtree, gridsearch or Markov Chain Monte Carlo (MCMC). Traveltimes are computed by solving the eikonal using a finitedifference approach outlined in Podvin & Lecomte (1991). For a 1D velocity structure, only varying in depth, the package computes the traveltimes as an radial 2D finitedifference traveltime model that depends on the radial distance from the observation point and the depth, saving these as independent traveltime lookup tables. In contrast, for complex threedimensional velocity structures the traveltimes are computed for a user defined gridded series of receiver locations, with each observation saved as a separate traveltime lookup table. Since the storage and computational requirements for a NonLinLoc using the complex 3D velocity for a very high resolution location grid, this method was intractable as it would return large gridding artifacts to the recovered earthquake locations and predicted location uncertainty, which are not directly comparable to the nongridded solutions of the HypoSVI. Instead we compare the HypoSVI and NonLinLoc locations using the onedimensional velocity structure, with the NonLinLoc traveltime and initial location grids resolved to km and
km receptively. The location is determined using a EqualDifferential TravelTime (EDT) likelihood function and octree sampling technique. The location uncertainty of the recovered NonLinLoc catalogue is determined as the standard error in X,Y,Z to 2std using the diagonal of the covariance matrix. The remaining NonLinLoc user parameters are given in the full control file in the Supplementary Material. The HypoSVI earthquake catalogues for the 1D and 3D velocity structures are given in Figure
6ab and 6cd respectively.For comparison we derive a NonLinLoc catalogue for subregion of [,] to [,]. This region comprises a total of events in the HypoSVI 1D catalogue (Figure 7ab), with the NonLinLoc comprising events (Figure 7cd). Manual inspection showed that the events present in the NonLinLoc catalogue but not HypoSVI catalogue, are events that are locate external to the subregion in the HypoSVI catalogue but are projected to the edge of NonLonLoc search grid, having large location uncertainties. For the remaining events we determine the relative location differences between the two catalogues by projecting both catalogues to a local universal transverse mercato (UTM) coordinate system and determining the distance between the events in in a local XYZ coordinates. The relative distance of the NonLinLoc locations minus the HypoSVI 1D locations are given in Figure 8ac. The relative locations demonstrate no consistent spatial bias, with the mean location difference given by , as shown by the red dot in Figure 8ac. In addition, we normalise the location difference by the location uncertainty from the NonLinLoc catalogue. Figure 8df gives the normalized location distances, with of the events having a relative distance less than that of the NonLinLoc location uncertainty, as shown by the points within the dashed box.
In this paper, we developed a new approach to performing Bayesian inference on earthquake hypocenters that combines a differentiable forward model (physicsinformed neural network) with Stein variational inference. Unlike with MCMC sampling methods, SVI approximates a posterior with a collection of particles, with the set of particle locations jointly optimized. In this paper we use an EikoNet forward model, but this could be replaced with any other differentiable forward model. Thus, HypoSVI is a general variational approach to hypocenter inversion. We validated the method with synthetic tests and compared the locations for events in Southern California with those produced by the Southern California Seismic Network. In particular, we focused on demonstrating the reliability of the method in the presence of nonconvex posterior distributions, which SVI is well suited for handling. This is all possible because of the differentiable forward model.
Another advantage of our approach is that it is computationally efficient and can make use of state of the art GPU architectures and modern deep learning APIs like PyTorch. This allows for rapid calculation of the gradients with automatic differentiation. As GPU hardware improves, such as increased memory, these performance gains will be passed on to the algorithm which will allow for even larger datasets to be worked with than currently possible. By combining SVI with EikoNet, we are able to evaluate observations at any point within the 3D volume without retraining, i.e. the forward model is valid for any array geometry. Due to the highlyparallelized nature of calculations with neural networks, our method scales well to very large networks, which may be important for emerging technologies like Distributed Acoustic Sensing (DAS). This was demonstrated herein by the ability to locate an earthquake with 2048 phase picks in 439 seconds. Thus, our HypoSVI approach is ideal for handling the enormous data volumes that are starting to emerge in seismology.
HypoSVI is avaliable at github https://github.com/Ulvetanna/HypoSVI, with additional runable Coalab code supplied at the Github.
This project was partly supported by a grant from the United States Geological Survey (USGS). We would like to thank Jack Wilding and Bing Q. Li for interesting discussions about the implementation and software usage.
Geiger L., 1912, Probability method for the determination of earthquake epicenters from the arrival time only,
St. Louis Univ. Bull.,8,6071
Comments
There are no comments yet.