1 Introduction
An earthquake tomography method is a tool to visualize seismic velocity structure in the Earth. The crust is an approximately 10–50kmthick layer that covers the Earth’s surface ([Bassin 2000]), and hosts intense shallow seismicity. Local earthquake tomography (LET) is often used to capture the highresolution, threedimensional (3D) crustal structure of a given region (e.g., [Aki & Lee 1976], [Thurber 1993]) and relocate earthquake hypocenters (e.g., [Thurber 1983]). Therefore, LET and its associated approaches, such as doubledifference tomography ([Zhang & Thurber 2003]), provide fundamental information for understanding earthquake generation mechanisms in and around the crust (e.g., [Alesssandrini et al. 2001], [Nugraha & Mori 2006]).
Seismic tomography works best when a large number of different ray paths are produced by welldistributed source and receiver arrays. However, seismic sources are often localized around fault zones, and most seismic stations are deployed near the Earth’s surface, resulting in an inhomogeneous distribution of seismic ray paths due to these uneven source and receiver distributions. Therefore, LET often suffers from the unstable estimation of structural parameters, or overfitting. Laplacian regularization has commonly been used in LET to mitigate overfitting and stabilize crustal seismic velocity structure estimations (e.g., [Lees & Crosson 1989], [Zhang et al. 1998], [Moran et al. 1999]). The penalty to dissimilar velocity at neighborhoods by the Laplacian regularization yields smooth fluctuations of seismic velocity. The smoothed estimates are reasonably acceptable in seismology because spatial pressure and temperature variations, which are key factors affecting the seismic velocity structure, are generally gradual. However, Laplacian regularization ignores an important component of crustal structures; i.e., a velocity discontinuity which is due to a rapid change in seismic velocity and often represents either a geological boundary or a solid–liquid contact. The Conrad and Mohorovičić (Moho) discontinuities are wellknown, and have been incorporated into onedimensional (1D) velocity models, such as the PREM ([Dziewonski & Anderson 1981]) and IASP91 ([Kennett & Engdahl 1991]) models. Furthermore, there may also be local discontinuities that are difficult to image in the crust, such as the boundary between a sedimentary basin and basement rocks. It is therefore desirable to obtain a stable estimate of both a sharp velocity jump and gradual change in seismic velocity. One way to overcome the pitfall of Laplacian regularization is to place grid points along a discontinuity ([Zhao et al. 1992], [Moran et al. 1999]), but this approach requires a prior knowledge of the discontinuity.
Here we develop a new regularization method for 3D LET that can handle both drastic and gradual changes in the seismic wave velocity structure of the crust. Specifically, we utilize a combination of the following two penalty terms in a geophysical inverse problem: (i) an type penalty on the norm of the secondorder differences between the horizontal units in the vertical direction; and (ii) an type penalty on the firstorder differences of the horizontal components. Penalty (i) is a regularization term that reflects our seismological knowledge of discontinuities, and is constructed by applying trend filtering ([Kim et al. 2009]), which is a sparse estimation technique. Sparse estimations with type penalties yield estimates with zero values, and work well in balancing the tradeoffs of mitigating overfitting and obtaining estimation accuracy when the estimand has sparse representation (e.g., [Tibshirani 1996], [Schmidt et al. 2007]). Sparse estimations have been utilized in a variety of research areas, including seismology (e.g., [Loris et al. 2007], [Yao et al. 2011], [Evans & Meade 2012], [Charlety et al. 2013], [Nakata et al. 2016], [Nakata et al. 2017], [Hirose et al. 2020]
). In our case, penalty (i) makes the distribution of resulting velocities averaged on the horizontal unit piecewise linear along the vertical direction, which enhances the depths at which drastic velocity changes occur. Penalty (ii) produces velocity distributions with smooth changes in the horizontal directions which fit the common understanding of velocity structures in the Earth’s interior, since horizontal velocity variations are generally mild compared with vertical (depth) velocity variations. These penalty terms allow the proposed method to elucidate both gradual velocity changes and velocity discontinuities. Our penalty term in the depth direction can be applied to automatically image unknown velocity discontinuities in the crust since it does not require a prior knowledge of any existing discontinuities. We determine all values of hyperparameters (regularization parameters) via crossvalidation.
This paper is organized as follows. We first outline the basis of LET and introduce the proposed approach in Section 2. We then conduct a synthetic test in Section 3 to demonstrate that our proposed method works better in estimating velocity structures with drastic changes than existing methods. We discuss the characteristics of conventional methods and our proposed method via several additional experiments in Section 4, and apply the proposed method to real seismic data in Section 5. We conclude by providing arguments for our proposed method in Section 6.
2 Mathematical formulation
Here we provide the LET mathematical formulation, including estimations of the 3D velocity structures and hypocentral parameters (earthquake origin times and locations). We only focus on cases using compressionalwave (Pwave) arrivals since the description does not depend on a specific seismic phase.
2.1 LET fundamental framework
We first design 3D grid points to model the subsurface velocity structures. Let be a seismic velocity parameter at grid point . In this paper, the axis indicates the depth, and the 
plane indicates the horizontal location. We assume that the grid points are arranged at regular intervals in the horizontal and vertical directions. We then calculate the velocity at an arbitrary point by interpolating the values of the velocity parameters at the nearest eight grid points, where
denotes the velocity structure at coordinate (not necessarily at a grid point).An arrival time contains information on the following factors: the origin time of earthquake , the hypocenter location of earthquake , the velocity parameter at grid point , and the ray path from to seismic station . These factors are combined using ray theory to provide the predicted travel time from earthquake to seismic station :
where denotes the element of the path length. We hereafter call the set of velocity parameters and the set of hypocentral parameters.
The ray paths and other factors are calculated in an alternate update for a given set of observed arrival times as follows. The ray path for each earthquake–station pair is calculated for given hypocentral and velocity parameter estimates. The hypocentral and velocity parameter estimates are then updated for a given ray path. This alternate update is conducted until the desired accuracy of the tomography is achieved. Here we focus on the velocity parameter estimations since the alternating update segregates the parameter estimation and raypath calculation.
The velocity (and hypocentral) parameter estimations are usually conducted by minimizing the sum of the squared residuals between the calculated arrival times and observed arrival times for all of the available earthquake–station pairs:
(1) 
where and are the observed earthquakes and available observation stations, respectively.
2.2 Structured regularization for 3D LET
Here we propose a structured regularization approach to accurately image two different velocity changes: drastic velocity changes in the vertical direction at discontinuities and gradual velocity changes in the horizontal directions. Our objective function, which is minimized to estimate the optimal velocity structure, incorporates three components (the loss function
in Eq. (1), vertical regularization term on the horizontal units in the vertical direction, and horizontal regularization term on the components in the horizontal direction) as follows:(2) 
where and are nonnegative regularization parameters. We multiply by for the convenience of computation. We obtain estimates by applying iterative calculations based on quadratic approximations and the alternating direction method of multipliers (ADMM; [Glowinski & Marroco 1975], [Gabay & Mercier 1976]) to this nonlinear and nonconvex problem (see the Appendix). The vertical regularization term takes the form:
(3)  
(4) 
Our verticaldirection penalty is the sum of , that is, the norm of the secondorder differences in the horizontal units. The value of becomes to be zero when gradient of the average velocities on among , , and th layers is constant, and thus the penalty forces the minimizer of Eq. (2) piecewise linearly in the vertical direction. The horizontal regularization term is given by:
(5) 
where the relation means that the two grid points are located on the same layer and are adjacent to each other. The horizontaldirection penalty builds upon the firstorder velocity differences among adjacent grids on the same layer and the norm, which allows the resultant velocity parameters to vary smoothly. Note that, the penalty terms in Eqs. (3) and (5) need to be divided by the corresponding grid intervals if the grid points are not arranged at regular intervals.
Figure 1 illustrates how the penalty terms work in the vertical and horizontal directions. The verticaldirection penalty term (Eq. (3)) is a version of trend filtering ([Kim et al. 2009]), which has been applied in various research fields (e.g., [Tibshirani 2014], [Selvin et al. 2016], [Guntuboyina et al. 2020]). This method is known to be suitable for capturing underlying piecewise linear trends. Trend filtering can reduce the penalized elements to zero, whereas Laplacianbased regularization does not (e.g., [Wang et al. 2016]). Our proposed verticaldirection penalty takes the form of the sum of , which reduces the change in the averagevelocity gradient in the vertical direction. Our proposed method can adapt to drastic velocity changes due to geological discontinuities at depth, and it does not require a prior information on the location of the discontinuity.
3 Numerical experiments
We evaluate the performances of our proposed regularization method and two existing methods, SIMUL and Laplacian regularization, to determine the effectiveness of our proposed method in reproducing the seismic velocity structure of a given region. SIMUL is a LETbased software package that simultaneously performs 3D ray tracing, and both velocity and hypocentral parameter estimations ([Thurber 1993]); we hereafter refer to the method without any structured regularization as “SIMUL” for simplicity. Laplacian regularization has often been used in LET, as mentioned in Section 1. The objective function that is minimized during the velocity and hypocentral parameter estimations takes the following form:
(6) 
where
(7)  
and and are regularization parameters. Both Laplacian regularization and our proposed method impose the same penalty in the horizontal direction because we mainly focus on the detection performance of velocity discontinuities at depth. We hereafter refer to this method as “Laplacian” for notational simplicity. A key difference between Laplacian and our proposed method is the employed norm: the former uses the sum of the norm, and the latter uses the sum of the norm (Eqs. (3) and (7)). We applied the algorithm in SIMUL to perform the 3D raytracing calculations through the resultant velocity model for each method.
We determined the Laplacian regularization parameters (Eq. (6)) and those for our proposed method (Eq. (2)) via crossvalidation. We first split the dataset into training and validation datasets. We then estimated the velocity parameters using given regularization parameter values that were in a set of prepared values. We finally selected the regularization parameter values within the set that minimized the root mean square error of the predicted arrival times for the validation dataset.
3.1 Synthetic data
We conducted synthetic tests using the Japan Meteorological Agency (JMA) unified earthquake catalog to investigate the performances of the three different approaches. Here we focus on the velocity model estimation by fixing the location and origin time of each earthquake to the JMA catalog values for simplicity. The location of the study area is shown in Figure 2. The dataset consists of 199 earthquakes that occurred in Central Japan. We obtained 3,954 Pwave arrival times from 68 seismic stations in the target area. The arrival times were divided into estimation and validation datasets, with 2,965 arrival times used to estimate the velocity parameters and 989 arrival times used to validate the method accuracy.
We constructed a 26layer model that extended from 0.0 km (surface) to 25.0 km depth at 1.0km intervals. We denote the surface layer as “Layer 0” and the layers with grid points at km depth as “Layer ”. We then placed 36 (66) horizontaldirected grids at an 8.0km horizontal interval on each layer, such that the center of the grids was at N, E. We set constant velocities for the outer points, which surrounded the main target region, because some of the hypocenters and stations were located outside the target region. We arranged the outer points as those that were 220 km from the end grid points of each layer in the horizontal direction. We set the “outer layer” at 200 km depth in the vertical direction.
We calculated the synthetic Pwave arrival times as follows. We first defined the baseline velocity of each layer, as shown in Figure 3(a). We assumed a 1D velocity model with a drastic increase in its Pwave velocity at around Layer 12. We then generated “true” velocities at the grid points by adding % anomalies to the baseline velocities to produce a checkerboard pattern for each layer (Fig. 3
(b)). We finally calculated synthetic arrival times for the available earthquake–station pairs using this assumed 3D velocity structure, and generated additional time by adding Gaussian noise with a standard deviation of 0.1 s.
3.2 Results
Figure 4 shows the averaged Pwave velocity–depth profiles for the three methods. The initial value of velocity parameter in each grid point is assumed to be 4.0 km/s in this synthetic test. The following regularization parameter values were determined via crossvalidation: in Laplacian (Eq. (6)) and in the proposed method (Eq. (2)). The averaged velocities obtained via the three methods all capture the drastic velocity increases around Layer 12, as shown in Figure 4(a). However, the SIMUL output shows obvious fluctuations in its estimated velocity structure. This may be due to the 1 km grid interval in the vertical direction being finer than the grid interval that LET studies have generally employed when using data from the nationwide seismic network in Japan (e.g., [Matsubara et al. 2017]). These unstable SIMUL estimations indicate that information on the spatial arrangement of the grid points is needed to avoid illposed estimations of the velocity structure in our case. Laplacian outperformed SIMUL due to the employed regularization, which reduced the fluctuations in the averaged velocities. However, the Laplacian estimates at the grid points in the layers near and below the velocity jump (Layers 13–25) were unable to obtain a constant velocity. Conversely, our proposed method relatively recovered the true average velocities, including the drastic increase in velocity around Layer 12 accurately. We quantitatively compare the estimation accuracies of the three methods by calculating the mean absolute error (MAE):
where is the number of grid points, and and are the estimated and true velocity parameters at each grid point, respectively. The MAE of SIMUL, Laplacian, and our proposed methods were , , and , respectively. We confirm that the proposed penalty can replicate regions where the average velocities are constant at depth based on the behavior in Figure 4(d) and the MAE values.
The norm of the penalty in the vertical direction is different in the Laplacian and proposed model, as shown in Eqs (3) and (7). Laplacian employs the sum of the norm, whereas the proposed method employs the sum of the norm as the verticaldirection penalty. The values (Eq. (4)), which were evaluated using the obtained velocity structure for each method, are illustrated in Figure 4(b). The values should be zero at most of the layers, except for those around Layer 12, where there is a sudden velocity change, because the true velocity structure was generated from only three baseline velocities (Fig. 3(a)): 4.0 km/s in Layers 0–11, 4.5 km/s in Layer 12, and 5.0 km/s in Layers 13–25. Most of the computed values for the SIMULestimated velocity structure were far from zero, as shown in Figure 4(b). Although the Laplacianestimated values were relatively small compared with the SIMULestimated values, the Laplacian regularization terms did not reduce to zero. In contrast, most of the values estimated by our proposed method were almost exactly zero, as the penalty terms in the proposed method produce a piecewise linear velocity structure.
We then focus on the spatial variations in the estimated velocities in the horizontal units. The imposed checkerboard anomalies on the true velocity structure and the estimated velocity perturbations, both of which are estimated from the baseline velocities at each grid point in Layers 1, 12, 20, and 25, are shown in Figure 5. SIMUL tends to estimate amplitude anomalies that are more than 5% smaller/larger than the assumed true structure in this experiment. Both the Laplacian and proposed method reproduced the checkerboard pattern in the shallower layers (Layers 0–10). However, Laplacian failed to reproduce the assumed checkerboard pattern in the deeper layers (Layers 18–25), whereas the proposed method successfully restored the true structure in most areas (Fig. 5(b)).
4 Discussion
4.1 Size of the velocity jump at a discontinuity
We examined the sensitivity of the three methods to the amplitude of the velocity jump in the vertical direction. It is expected that estimation accuracy of the velocity parameters will deteriorate as the size of the velocity jump becomes larger. The results of this sensitivity test are shown in Figures 6(a)–(c) and 4 (the case when the size of the velocity jump is km/s). Figure 6 shows the averages of the true and estimated velocities at each layer, and the MAEs for each tested velocity jump. The three methods yield comparable estimation accuracies when there is no velocity jump (constant velocity with depth; Fig. 6(a)). We find that the SIMUL performance degraded gradually as the size of the velocity jump increased. Laplacian performed better than SIMUL based on the MAEs, but it failed to reproduce the linear trend in Layers 13–25 (Figs 6(b) and (c)). This occurs because the penalty term of the Laplacian does not strictly hold the average velocity gradient constant since it is composed of the sum of the norm (; Eq. (4)), as discussed in Subsection 3.2. In contrast, the decrease in the precision of the proposed method is relatively suppressed, especially in the layers below the velocity jump (Layers 13–25), by expressing a piecewise linear trend of the true velocity structure via the penalty term, which consists of the sum of the norm. We confirm that the proposed method estimated velocity parameters more stably for each of the tested velocity jumps compared with the conventional methods (Fig. 6(d)). These results suggest that the proposed method can recover a range of velocity jumps (small to large amplitudes) that may be associated with a discontinuity.
4.2 Initial model dependence
We conducted additional experiments to verify the influence of the initial model on the estimated velocity structure for each of the three methods, which may be due to the nonlinearity of the objective function. The main experiment adopted an initial velocity structure of 4.0 km/s at all of the grid points (Fig. 4); our additional experiments tested initial velocities of 4.5 and 5.0 km/s. We configured all of the other settings to be the same as those in the main experiment. The averages of the true and estimated velocities in each layer, and the associated MAEs for different initial velocities are shown in Figure 7. Note that the proposed method yields the smallest MAEs among the three methods for each of the three initial velocities (Fig. 7(c)), indicating that our structured regularizations provide stable estimations of the velocity structure, regardless of the initial velocity model. It is generally more difficult to estimate the velocity parameters in the deeper layers than those in the shallow layers because of the sparsity of the seismic ray paths at depth. The estimated average velocity better reproduced the true velocity in all three cases when the initial velocity was set to 5.0 km/s, which is close to the true velocity of the deep layers.
4.3 The relationship between method accuracy and sample size
We verified the accuracy of each method for different sample sizes (the number of travel time data). The sample size was controlled by either decreasing or increasing the number of available seismic stations that were analyzed to extract the Pwave travel times. Figure 8 shows the averages of the true and estimated velocities in each layer, and the MAEs for each sample size. We confirmed that the proposed method performed the best in each of the tested settings based on its MAE values (Fig. 8(d)). The number of velocity parameters was as large as 936 () in our experiments, inevitably making it difficult to estimate the velocity structure without regularization when the sample size was small (Fig. 8(a)), as discussed in Subsection 3.1. Although all three methods performed well for a large sample size (Figs 8(b) and (c)), the methods with regularizations (Laplacian and proposed methods) yielded better accuracies than that without regularization (SIMUL). The Laplacian and proposed methods yielded relatively stable accuracies, even when the number of travel time data was small, since regularization methods are generally capable of avoiding overfitting and performing well when there are a lot of parameters to estimate (e.g., [Negahban et al. 2012], [Hastie et al. 2015]). Furthermore, the proposed method reproduced the sharp change in the average velocity structure the best among the three methods and outperformed the SIMUL and Laplacian methods in the layers below the velocity jump (Layers 13–25) in every setting, with this superior performance attributed to the penalty.
5 Application to real seismic data
We applied the proposed method to real seismic data, using the seismic waveforms from 211 earthquakes that were observed by the highsensitivity seismograph network in Japan (Hinet; [NIED 2019]) during the 2005–2014 period. We picked 2,042 Pwave travel times from the waveforms, and divided the travel times into 1,701 training data and 341 validation data for the crossvalidation. The target region of this experiment is shown in Figure 2. We employed the same grid points as those in the synthetic test. We set a velocity of 6.0 km/s at all of grid points in the study area for the initial velocity model, and fixed JMA2001 1D velocity model ([Ueno et al. 2002]) values, which have been commonly employed for routine hypocenter determinations throughout Japan, at the outer points (outside the target region).
The resultant Pwave velocity–depth profiles for the three methods are shown in Figure 9(a). Our proposed method estimated a notable velocity change in the target area (red line in Fig. 9(a)), with monotonically increasing velocities to approximately 16 km depth (Layer 16) and a nearly constant velocity at greater depths. The Conrad discontinuity in the target region has been imaged at approximately 15–20 km depth (e.g., [Iidaka et al. 2003], [Katsumata 2010]). Therefore, the depth at which the obtained average velocity gradient suddenly changes can be interpreted as the Conrad discontinuity. The obtained average velocities, which span the 5.0–6.5 km/s range above 16 km depth and are approximately 6.7 km/s at greater depths, are comparable with those retrieved by a reflection and wideangle refraction survey ([Iidaka et al. 2003]). Therefore, the applicability of the proposed method in elucidating sharp velocity discontinuities is validated by its success in detecting the Conrad discontinuity, which is defined by this sudden change in the velocity gradient within the target area.
There were large fluctuations in the SIMUL average velocities (green line in Fig. 9(a)), whereas the Laplacian vertical fluctuations were smoothed (blue line in Fig. 9(a)). The average velocity gradient of the Laplacian estimates changes in the Layer 13–16 range, which seems to be associated with the Conrad discontinuity. However, the velocities at greater depths are clearly smaller than those expected beneath the Conrad discontinuity in the target region ([Iidaka et al. 2003]). The small number of travel time data used here can cause underestimations in the Laplacian average velocities, as discussed in the numerical experiments (Subsection 4.3).
Furthermore, we illustrate the vertical variations in Pwave velocities along the two E–W profiles obtained by the proposed method in Figures 9(b) and (c). The Pwave velocities above 5 km depth decrease gradually toward the east, which may correspond to a local geological structure. This lateral velocity variation resembles one that has been imaged in previous studies (e.g., [Matsubara et al. 2019]). The regularization parameters for the proposed and Laplacian methods are and , respectively.
These results, which are obtained from real seismic data, suggest that the proposed method, with type regularization, can stably detect the true depth of the velocity discontinuity, even when the number of available observational data is small. Later reflected and/or converted waves have conventionally been used to investigate the depths of various velocity discontinuities, such as the Conrad and Moho discontinuities, and the subducting plate interface (e.g., [Matsuzawa et al. 1986], [Zhao et al. 1997]); however, such later waves are only identified in a limited number of earthquake–station pairs. Unlike reflected and converted waves, direct P and S waves are commonly and widely observed from numerous earthquakes in the Earth. Therefore, the proposed method will improve the detection of velocity discontinuities considerably and refine imaging in areas where later waves are not widely observed.
6 Conclusion
We introduced a nonlinear inversion method with structured regularization to image the crustal structure of the Earth. Our proposed LET method simultaneously estimated the smooth trends and drastic changes in the crustal velocity structure, both of which are expected in the Earth’s interior, by employing two types of penalty terms that are added to the vertical and horizontal directions of the model space, respectively. We employed a verticaldirection penalty term that consisted of the secondorder differences in the depthdependent velocity parameters to detect a velocity discontinuity, thereby highlighting the ability to image drastic velocity changes in the vertical direction. This verticaldirection penalty term works on the depthaveraged velocity values, and takes the form of the sum of the norm, which allows it to reproduce piecewise linear trends in the velocity changes at depth. We used a horizontaldirection penalty term that consisted of firstorder differences of the velocities that were based on the norms. This horizontaldirection penalty smooths velocity fluctuations.
We compared the imaging capability of the proposed method with a conventional LET approach (SIMUL) and Laplacian regularization via synthetic data experiments to verify the performance of the proposed method. The proposed method can adequately reproduce drastic velocity changes in the vertical direction and gradual velocity changes in the horizontal direction. Our results highlighted the importance of regularization to better estimate the subsurface velocity structure, and also demonstrated that the proposed method is stable against variations in the amplitude of velocity jump, initial velocity structure, and sample size. Furthermore, we applied the proposed method to real seismic data from Central Japan, and successfully imaged a distinct velocity gradient at approximately 15 km depth, which corresponded to the Conrad discontinuity. Therefore, the proposed method can improve the detectability of a velocity discontinuity using P and/or Swave travel time data.
Our proposed method automatically detects the existence (or nonexistence) of discontinuities because it does not require a prior information for the velocity discontinuity. We focused on an estimation of the velocity parameters, whereas LET usually includes updated hypocentral locations. Therefore, the accuracy of the velocity estimations and hypocenter determinations via LET will be further improved by replacing the LET penalty terms with our proposed terms from the conventional framework.
Data availability
The data underlying this article are available in Japan Meteorological Agency (JMA) at https://www.data.jma.go.jp/svd/eqev/data/bulletin/deck_e.html.
Acknowledgements
GMT software package [Wessel & Smith 1998] and R [R Core Team 2020] were used for creating figures. This work is supported by JST CREST Grant Number JPMJCR1763, Japan, and JSPS KAKENHI Grant Number JP20K19753.
References
 [Aki & Lee 1976] Aki, K. & Lee, W. H. K., 1976. Determination of threedimensional velocity anomalies under a seismic array using first P arrival times from local earthquakes: 1. A homogeneous initial model, Journal of Geophysical research, 81, 4381–4399.
 [Alesssandrini et al. 2001] Alesssandrini, B., Filippi, L. & Borgia, A., 2001. Uppercrust tomographic structure of the Central Apennines, Italy, from local earthquakes, Tectonophysics, 339, 479–494.
 [Bassin 2000] Bassin, C., 2000. The current limits of resolution for surface wave tomography in North America, Eos, Transactions American Geophysical Union, 81, S12A03.

[Boyd et al. 2011]
Boyd, S., Parikh, N., Chu, E., Peleato, B. & Eckstein, J., 2011. Distributed optimization and statistical learning via the alternating direction method of multipliers,
Foundations and Trends ® in Machine Learning
, 3, 1–122.  [Charlety et al. 2013] Charlety, J., Voronin, S., Nolet, G., Loris, I., Simons, F. J., Sigloch, K. & Daubechies, I. C., 2013. Global seismic tomography with sparsity constraints: Comparison with smoothing and damping regularization, Journal of Geophysical Research: Solid Earth, 118, 4887–4899.
 [Dziewonski & Anderson 1981] Dziewonski, A. M. & Anderson, D. L., 1981. Preliminary reference Earth model, Physics of the earth and planetary interiors, 25, 297–356.
 [Evans & Meade 2012] Evans, E. L. & Meade, B. J., 2012. Geodetic imaging of coseismic slip and postseismic afterslip: Sparsity promoting methods applied to the great Tohoku earthquake, Geophysical Research Letters, 39, L11314.
 [Gabay & Mercier 1976] Gabay, D. & Mercier, B., 1976. A dual algorithm for the solution of nonlinear variational problems via finite element approximation, Computers & mathematics with applications, 2, 17–40.
 [Gavin 2013] Gavin, H., 2013. The LevenbergMarquardt method for nonlinear least squares curvefitting problems, Technical Report, Department of Civil & Environmental Engineering, Duke University.
 [Glowinski & Marroco 1975] Glowinski, R. & Marroco, A., 1975. Sur l’approximation, par éléments finis d’ordre un, et la résolution, par pénalisationdualité d’une classe de problèmes de Dirichlet non linéaires, ESAIM: Mathematical Modelling and Numerical Analysis  Modélisation Mathématique et Analyse Numérique, 9, 41–76.
 [Guntuboyina et al. 2020] Guntuboyina, A., Lieu, D., Chatterjee, S. & Sen, B., 2020. Adaptive risk bounds in univariate total variation denoising and trend filtering, The Annals of Statistics, 48, 205–229.
 [Hastie et al. 2015] Hastie, T., Tibshirani, R. & Wainwright, M., 2015. Statistical learning with sparsity: The lasso and generalizations, CRC press.
 [Hirose et al. 2020] Hirose, T., Nakahara, H., Nishimura, T. & Campillo, M., 2020. Locating spatial changes of seismic scattering property by sparse modeling of seismic ambient noise crosscorrelation functions: Application to the 2008 IwateMiyagi Nairiku (Mw 6.9), Japan, earthquake, Journal of Geophysical Research: Solid Earth, 125, e2019JB019307.
 [Iidaka et al. 2003] Iidaka, T., Iwasaki, T., Takeda, T., Moriya, T., Kumakawa, I., Kurashimo, E., Kawamura, T., Yamazaki, F., Koike, K. & Aoki, G., 2003. Configuration of subducting Philippine Sea plate and crustal structure in the central Japan region, Geophysical research letters, 30, 1219.
 [Katsumata 2010] Katsumata, A., 2010. Depth of the Moho discontinuity beneath the Japanese islands estimated by traveltime analysis, Journal of Geophysical Research: Solid Earth, 115, B04303.
 [Kennett & Engdahl 1991] Kennett, B. L. N. & Engdahl, E. R., 1991. Traveltimes for global earthquake location and phase identification, Geophysical Journal International, 105, 429–465.
 [Kim et al. 2009] Kim, S.J., Koh, K. & Boyd, S. and Gorinevsky, D., 2009. L1 trend filtering, SIAM review, 51, 339–360.
 [Lees & Crosson 1989] Lees, J. M. & Crosson, R. S., 1989. Tomographic inversion for threedimensional velocity structure at Mount St. Helens using earthquake data, Journal of Geophysical Research: Solid Earth, 94, 5716–5728.
 [Levenberg 1944] Levenberg, K., 1944. A method for the solution of certain nonlinear problems in least squares, Quarterly of applied mathematics, 2, 164–168.
 [Li & Harris 2018] Li, D. & Harris, J. M., 2018. Full waveform inversion with nonlocal similarity and modelderivative domain adaptive sparsitypromoting regularization, Geophysical Journal International, 215, 1841–1864.
 [Loris et al. 2007] Loris, I., Nolet, G., Daubechies, I. & Dahlen, F. A., 2007. Tomographic inversion using l1norm regularization of wavelet coefficients, Geophysical Journal International, 170, 359–370.
 [Marquardt 1963] Marquardt, D. W., 1963. An algorithm for leastsquares estimation of nonlinear parameters, Journal of the society for Industrial and Applied Mathematics, 11, 431–441.
 [Matsubara et al. 2017] Matsubara, M., Sato, H., Uehira, K., Mochizuki, M. & Kanazawa, T., 2017. Threedimensional seismic velocity structure beneath Japanese Islands and surroundings based on NIED seismic networks using both inland and offshore events, Journal of Disaster Research, 12, 844–857.
 [Matsubara et al. 2019] Matsubara, M., Sato, H., Uehira, K., Mochizuki, M., Kanazawa, T., Takahashi, N., Suzuki, K. & Kamiya, S., 2019. Seismic velocity structure in and around the Japanese Island Arc derived from seismic tomography including NIED MOWLAS Hinet and Snet data, in Seismic Waves  Probing Earth System, IntechOpen.
 [Matsuzawa et al. 1986] Matsuzawa, T., Umino, N., Hasegawa, A. & Takagi, A., 1986. Upper mantle velocity structure estimated from PSconverted wave beneath the northeastern Japan Arc, Geophysical Journal International, 86, 767–787.
 [Moran et al. 1999] Moran, S. C., Lees, J. M. & Malone, S. D., 1999. P wave crustal velocity structure in the greater Mount Rainier area from local earthquake tomography, Journal of Geophysical Research: Solid Earth, 104, 10775–10786.
 [Nakata et al. 2017] Nakata, R., Hino, H., Kuwatani, T., Yoshioka, S., Okada, M. & Hori, T., 2017. Discontinuous boundaries of slow slip events beneath the Bungo Channel, southwest Japan, Scientific reports, 7, 6129.
 [Nakata et al. 2016] Nakata, R., Kuwatani, T., Okada, M. & Hori, T., 2016. Geodetic inversion for spatial distribution of slip under smoothness, discontinuity, and sparsity constraints, Earth, Planets and Space, 68, 20.
 [NIED 2019] National Research Institute for Earth Science and Disaster Resilience, 2019. NIED Hinet, National Research Institute for Earth Science and Disaster Resilience.
 [Negahban et al. 2012] Negahban, S. N., Ravikumar, P., Wainwright, M. J. & Yu, B., 2012. A unified framework for highdimensional analysis of Mestimators with decomposable regularizers, Statistical science, 27, 538–557.
 [Nugraha & Mori 2006] Nugraha, A. D. & Mori, J., 2006. Threedimensional velocity structure in the Bungo Channel and Shikoku area, Japan, and its relationship to lowfrequency earthquakes, Geophysical research letters, 33, L24307.
 [R Core Team 2020] R Core Team, 2020. R: A language and environment for statistical computing, available from: https://www.Rproject.org/.
 [Schmidt et al. 2007] Schmidt, M., NiculescuMizil, A. & Murphy, K., 2007. Learning graphical model structure using L1regularization paths, in AAAI, volume 7, 1278–1283.
 [Selvin et al. 2016] Selvin, S., Ajay, S. G., Gowri, B. G., Sowmya, V. & Somon, K. P., 2016. L1 trend filter for image denoising, Procedia Computer Science, 93, 495–502.
 [Thurber 1983] Thurber, C. H., 1983. Earthquake locations and threedimensional crustal structure in the Coyote Lake area, central California, Journal of Geophysical Research: Solid Earth, 88, 8226–8236.
 [Thurber 1993] Thurber, C. H., 1993. Seismic tomography: Theory and practice, in Local earthquake tomography: velocities and Vp/Vstheory, ed. Iyer, H. M. & Hirahara, K., Chapman and Hall.
 [Tibshirani 1996] Tibshirani, R., 1996. Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society: Series B (Methodological), 58, 267–288.
 [Tibshirani 2014] Tibshirani, R. J., 2014. Adaptive piecewise polynomial estimation via trend filtering, The Annals of Statistics, 42, 285–323.
 [Ueno et al. 2002] Ueno, H., Hatakeyama, S., Aketagawa, T., Funasaki, J. & Hamada, N., 2002. Improvement of hypocenter determination procedures in the Japan Meteorological Agency, Quarterly Journal of Seismology, 65, 123–134.
 [Wang et al. 2016] Wang, Y.X., Sharpnack, J., Smola, A. J. & Tibshirani, R. J., 2016. Trend filtering on graphs, The Journal of Machine Learning Research, 17, 3651–3691.
 [Wessel & Smith 1998] Wessel, P. & Smith, W. H. F., 1998. New, improved version of Generic Mapping Tools released, Eos, Transactions American Geophysical Union, 79, 579.
 [Yao et al. 2011] Yao, H., Gerstoft, P., Shearer, P. M. & Mecklenbräuker, C., 2011. Compressive sensing of the TohokuOki Mw 9.0 earthquake: Frequencydependent rupture modes, Geophysical Research Letters, 38, L20310.
 [Zhang & Thurber 2003] Zhang, H. & Thurber, C. H., 2003. Doubledifference tomography: The method and its application to the Hayward fault, California, Bulletin of the Seismological Society of America, 93, 1875–1889.
 [Zhang et al. 1998] Zhang, J., ten Brink, U. S. & Toksöz, M. N., 1998. Nonlinear refraction and reflection travel time tomography, Journal of Geophysical Research: Solid Earth, 103, 29743–29757.
 [Zhao et al. 1992] Zhao, D., Hasegawa, A. & Horiuchi, S., 1992. Tomographic imaging of P and S wave velocity structure beneath northeastern Japan, Journal of Geophysical Research: Solid Earth, 97, 19909–19928.
 [Zhao et al. 1997] Zhao, D., Matsuzawa, T. & Hasegawa, A., 1997. Morphology of the subducting slab boundary in the northeastern Japan arc, Physics of the Earth and Planetary Interiors, 102, 89–104.
Appendix: An optimization with ADMM
The objective function of our method, as outlined in Subsection 2.2, is:
Here we consider an optimization of the quadratic form (hereafter denoted by ), instead of the sum of the squared residuals . We then separate
into an additive form of two terms that depend on only the velocity parameters and hypocentral parameters via QR decomposition:
. We obtain the following optimization problem via the duality principle:where represents the norm,
is a matrix satisfying (), and , , and are the numbers of grid points in , , and directions, respectively.
We then apply the Lagrange multiplier method, and consider the following function:
We obtain estimates by applying the ADMM ([Glowinski & Marroco 1975], [Gabay & Mercier 1976]), a widely used algorithm that is well suited for solving distributed convex optimization problems (e.g., [Boyd et al. 2011], [Li & Harris 2018]):
where “” is the proximal operator. We note that the minimization function for is nonlinear and nonconvex, and that it takes an additive form of the sums of squares via iterative quadratic approximation. Therefore, we obtain the optimal value using the Levenberg–Marquardt method ([Levenberg 1944], [Marquardt 1963], [Gavin 2013]).