In this paper, we consider the well-known Helmholtz equation defined in () as follows:
imposed with the Sommerfeld radiation condition
where is the unknown function, is the source term and denotes the wave number with being the angular frequency and the wave speed. Solving the Helmholtz equation (1) efficiently, especially with large wave number, is crucial in many physics and engineering problems. For example, in exploration seismology, the Helmholtz equation (1) with the pre-given wave speed is needed to be solved for hundreds of different sources in reverse time migration, and even more in full wave inversion. However, since the discrete Helmholtz problem with high wave number is highly indefinite, finding the efficient solver is quite challenging Gander2012 , and many methods has been proposed and studied, including the direct method Frontal1 , the multigrid method Erlangga2006 and the domain decomposition method Despres1990 , all of them are briefly reviewed below.
The direct method such as the multifrontal method Frontal1 with nested dissection George1973 , was designed to solve linear systems arising from discretization of general PDE problem, and could be employed to solve the discrete Helmholtz problem. The multifrontal method is further coupled with the hierarchically semi-separable matrices (HSS) HSS , and the low rank proprieties are exploited to reduce the order of computational complexity for many problems including Helmholtz problem Xia2010 ; Wang2016 . However, the low-rank representation for the Helmholtz kernel in high frequency is missing Engquist2016 , which causes the HSS coupled multifrontal method less effective for the high frequency problem. On the other hand, some variants of multifrontal method are introduced in Gillman2015 ; Leng2015 for the Helmholtz problem. These methods focus on constructing the Dirichlet to Neumann (DtN) map for the subdomains in the nested dissection, which is more intuitive than manipulating the algebraic matrices in the multifrontal method while the order of computational complexity remains the same.
The multigrid method with the shifted Laplace was first introduced in Erlangga2006 , and then further developed in Erlangga2005 ; Erlangga2006b ; Erlangga2008 ; Umetani2009 ; Sheikh2013 ; Aruliah2002 . A complex shift is added to the Helmholtz operator in this method, resulting in an easier problem that could be solved with multigrid solver, which then can be used as an effective preconditioner for the original Helmholtz problem. The shifted Laplace method has been shown to be very effective, and followed by many research in literature, to name a few, Airaksinen2007 ; Calandra2013 ; Cools2014 ; Cools2015 ; Tsuji2015 ; Hu2016 ; Stolk2016 ; Reps2017 ; Gander2015 . The amount of the shift is a compromise, a larger shift leads to easier problem to solve in preconditioning but more iteration steps in the Krylov subspace solve while a smaller shift results in harder preconditioning but less iteration steps. For the high frequency problem, if the shifted problem in preconditioning is required to be solved efficiently, then the number of iterations in the Krylov subspace solve grows fast as the frequency grows, thus the high frequency problem is still a big challenge for the shifted Laplace method.
The domain decomposition method (DDM) was first applied to solve the Helmholtz problem in Despres1990 . A good approximation of the Dirichlet to Neumann (DtN) map is the key to maintain the effectiveness of DDM for the Helmholtz equation, and various transmission conditions are proposed to approximate the DtN map, leading to various DDM for the Helmholtz equation as in Collino2000 ; SchwarzBdry1 ; SchwarzBdry2 ; SchwarzBdry3 ; SchwarzBdry4 ; SchwarzBdry5 ; SchwarzBdry6 ; Douglas1998 ; Boubendir2012 ; Schadle2007 ; Toselli1999 . The additive nature of these DDM causes the number of iteration grows fast as the number of subdomains grows.
The sweeping type DDM for the Helmholtz problem was first introduced by Engquist and Ying in Engquist2011a ; Engquist2011b , and followed by many variants, such as single layer potential by Stolk Stolk2013 , the double sweeping preconditioner developed by Vion an Geuzaine Vion2014 , the polarized trace method by Zepeda Zepeda2014 , and the source transfer DDM (STDDM) by Chen and Xiang Chen2013a ; Chen2013b . These methods mainly differ at the the transmission conditions, and are uniformly formulated in the context of optimized Schwarz by Gander and Zhang GanderReview . These DDMs usually decomposes the domain into layers, and sweep forwards and backwards in the layers to obtain a good approximation of the solution.
The sweeping type DDMs could be interpreted as the or factorization and forward/backward substitutions, and they generally have two phases, the factorization phase and sweeping phase. In the factorization phase, the local discrete systems of subdomains are factorized, which could be done in parallel. In the sweeping phase, the local solves of subdomains are applied one by one to form the global solution, which is a sequential process. The factorization phase is the bottleneck for the sweeping type DDMs for the Helmholtz problem in , since the factorization of each 2D layered subdomain requires a scalable and efficient direct solver, which is hard to accomplish as mentioned previously. On the other hand, although the sweeping phase is sequential, it could be arranged in a pipeline for the case of multiple right-hand sides (RHSs), which is quite common in many practical applications such as seismic imaging and electromagnetic scattering.
The recursive sweeping DDMs were proposed by Liu and Ying Liu2015b and by Du and Wu Wu2015 based on the sweeping preconditioner and source transfer DDM, respectively. In these methods, each of the layered subdomains is further decomposed into smaller layers in the perpendicular direction, and again solved with the sweeping DDM. In such a way, the bottleneck of the factorization of subdomains no longer exists. However, the difficulty is shifted to the sweeping phase; the number of steps used for each sequential subdomain sweep is now proportional to the number of subdomains, that causes these methods not suitable for parallel computing in practice, for example, when solving the multiple RHSs problem with the pipeline processing, the construction of an efficient pipeline will require too many RHSs.
The success of using source or trace transfer in sweeping DDM with layered partition inspires the additive overlapping DDM for the Helmholtz equation proposed by Leng and Ju Leng2019 , which is completely based on structured subdomains along all spatial directions (i.e., checkerboard domain decomposition) in the context of the source transfer method, and it is proved that the method could produce the exact solution in finite steps for constant medium problem. The corner direction transfers are considered for the first time in this method. It is observed that for the case that the source lies in only one subdomain, the exact global solution can be constructed with the subdomain solution marching along four diagonal directions. Following this work, Taus et. al. Zepeda2019 recently proposed a sweeping-type method, called “L-sweeps”, which is based on the corner direction transfers but wisely utilizes the property of diagonal subdomain solution marching and arranges the solving order into sweeps of all directions. The L-sweeps method produce an outstanding algorithm with complexity where denotes the number of unknowns of the discrete system. Furthermore, the number of steps required by each sequential subdomain sweep in the method is only proportional to the -th root of the number of subdomains, thus the L-sweeps method is much more suitable for parallel computing compared to the recursive sweeping methods. When solving the multiple RHSs problem using the L-sweeps method with pipeline, the requirement on the number of RHSs is practical and could be easily satisfied.
In this paper, we propose a new diagonal sweeping method for solving the Helmholtz equation (1) based on checkerboard domain decompositions. Compared to the L-sweeps method Zepeda2019 , our algorithm has two major advantages in terms of efficiency and effectiveness:
The needed sweeps in each preconditioning solve are reduced from L-sweeps of directions (8 in and 26 in respectively) to diagonal sweeps of directions (4 in and 8 in respectively).
The reflections are treated more properly for the layered medium problems, increasing from one reflection to averagely two reflections per preconditioning solve.
The rest of the paper is organized as follows. We first review the perfect match layer (PML) problem associated with the Helmholtz equation and the corresponding additive overlapping DDM Leng2019 with source transfer in and in Section 2. Based on wisely re-arranging the solving order of the additive DDM, the diagonal sweeping DDM with source transfer in is proposed and analyzed in Section3 and its extension to in Section 4. In addition, we show that the resulted DDM solutions are exactly the solutions of the PML problems in the constant medium case. In Section 5, various numerical experiments in and are performed to verify the convergence of the proposed diagonal sweeping DDM method for the constant medium problems, and then to test the efficiency and effectiveness of the method as an preconditioner for layered medium and even more complicated problems. Some concluding remarks are finally drawn in Section 6.
2 Perfect match layer and additive overlapping DDM with source transfer
In this section, we first recall the Perfect Match Layer (PML) method and the source transfer technique, and then review the additive overlapping DDM with source transfer proposed in Leng2019 , which is the basis of the diagonal sweeping DDM proposed in this paper.
2.1 PML and source transfer
The Helmholtz equation (1) defined in the whole space with the Sommerfeld radiation condition (2) can be solved in a bounded domain such as a rectangular box using the so-called uniaxial PML method Berenger ; Chew1994 ; Kim2010 ; Bramble2013 ; Chen2013a , provided that the source lies inside the box domain. Suppose that a rectangular box in is defined as , with the center of the box denoted by where , for . Let , , with being piecewise smooth functions such that
where is a smooth medium profile function, then the complex coordinate stretching for is defined as
The PML equation is then defined under the complex coordinate stretching as follows:
where is the PML solution with
The well-posedness of the weak problem associated with equation (5) has been established in (Chen2013a, , Lemma 3.3), and the PML solution equals within the box and decays exponentially outside of the box. For convenience, we denote by the PML problem (5) associated with the rectangular box , and denote by the linear operator associated with .
Similarly in , the PML equation for the cuboidal box could be defined as (5) with
where and is defined in the same way as (3).
The source transfer technique is presented in the following. The case of is used for illustration and similar result also holds for . Suppose that a piecewise smooth curve divides into two parts and , and at the meantime, the curve also divides the rectangular box into two parts. Let be the extended domain of by a distance of , for instance, , where is a positive constant, and denote , as shown in Figure 1-(a). There always exists a smooth cutoff function with such that
where denotes certain generic positive constant. Then we have the following Lemma on source transfer Leng2019 :
Suppose that the support of is in . Let be the solution to the PML problem with the source (i.e, in ). Given as the restriction of on , such as , and let be solution to the PML problem with the source (i.e., in ). Then it holds that in and in .
The above Lemma is pretty straight forward based on the fact is the partial modification of and is the correction to the modification according to the residual . Lemma 2.1 is applied in the additive DDM Leng2019 for two types of boundaries , the straight line and the fold line, as shown in Figure 1-(b) and (c), which correspond to the horizontal/vertical directional transfer and the corner directional transfer, respectively. The overlapping region in Lemma 2.1 is handled with a shifted PML media profile function with
For simplicity, the above shifted medium profile is denoted by from now on, and when we refers to the PML problem , an extended region of width is always attached to the rectangular box , for possible overlapping with its neighbor regions.
The domain decomposition that we use in this paper is stated as follows. The rectangular domain in is uniformly partitioned into nonoverlapping rectangular subdomains. Let , for , and , for . Then we have nonoverlapping rectangular subdomains as
For convenience, we also define the box (, ), which consists of a set of rectangular subdomains as follows:
It is clear that the PML equation associated with each rectangular subdomain needs to be solved in the DDM method. The source , which is assumed to be compactly supported in , is decomposed to
Notice that the PML profile as (6) makes each subdomain has an overlapping region with its neighbor subdomains, thus we next define an overlapping domain decomposition of the two-dimensional space as
Similarly, for the cuboidal domain in , the partition in -direction is done with , for , then we have nonoverlapping subdomains , overlapping subdomains , and decomposed sources (, , ).
2.2 The additive overlapping DDM with source transfer
The additive DDM proposed in Leng2019 is based on checkerboard domain decomposition and source transfer between overlapping subdomains. Let us first illustrate it with the domain partition in . A few notations are first introduced below. Two truncation functions are defined as
and four one-dimensional cutoff functions are defined as
where is a monotone cutoff function in such that for , for , and for With the above one-dimensional cutoff functions, the corresponding two-dimensional cutoff functions associated with the subdomains () are defined as
Denote by the linear operator associated with the PML problem .
Let us first consider the simple case that the source lies in . At step 1, the subdomain PML problem is solved with the source and the solution is denoted by , as is shown in Figure 2-(a), the horizontal and vertical transferred sources are computed on each subdomain. At step 2, the local problem is solved with the right transferred source (see Figure 2-(b)) as the local source, and the solution is denoted by (see Figure 2 (c)). By using Lemma 2.1 for , the rightward source transfer is applied, and we have
as is shown in Figure 2-(d). Similarly, the local solution of the subdomain are obtained, and we have
as shown in Figure 2-(f) and (g).
Note that the additive DDM has an important property that the subdomain solving is not direction related, in the sense that, on each subdomain, the transferred sources coming from different directions are summed into one local source and then solved. At step 3, the subdomain solution has already been constructed in , and in previous steps, thus only the solution for needs to be constructed. However, in order to derive an algorithm that is not direction related as mentioned above, instead of directly using a corner directional source transfer, the horizontal and vertical source transfers are applied again on each subdomain, though the nonzero ones are only the upper transfer of in and right transfer of in . In addition to the horizontal and vertical sources and transferred to , the corner direction transferred source is also passed to . Using (7) and (8), we have that the summation of transferred sources for is in fact , where is the cutoff function for the L-shaped domain , then by using Lemma 2.1, the corner directional source transfer is applied, and we know that . Now that the local solutions in all subdomains are obtained, a formula of the global solution expressed as the combination of local solutions is then to be derived. From (7) and (8), it holds that
thus can be expressed as , and we have
or in a more symmetric form,
Now we are ready to state the additive DDM for domain partition in the case of general source . At step 1, solve the local problems with local sources and denote the solutions as . Then at step 2, solve the local problems with horizontal and vertical transferred sources calculated by using the solutions of step 1, and denote the solutions as . Finally at step 3, solve the local problems with horizontal, vertical and corner transferred sources calculated by using the solutions of step 1 and 2, and denote the solution as . Then the DDM solution is constructed to be
and is indeed the solution to with source in the constant medium case.
To illustrate the extension of the additive DDM from to partitions, let us define the following one-dimensional cutoff functions,
for , . Note that the symbols and are used to indicate the signs of the and components of a direction, respectively. With the above one-dimensional cutoff functions, two-dimensional ones are defined as
for with , and
Define the following truncation functions for the half spaces and the quarter spaces in :
Using the above cutoff functions, truncation functions and linear operators, we are able to define the source transfer operators in as:
for with Then the additive overlapping DDM with source transfer in Leng2019 can be stated as follows:
Algorithm 2.1 (Additive overlapping DDM with source transfer in Leng2019 ).
For the constant medium case, it is proved in Leng2019 that the DDM solution defined by (13) is the exactly the solution of the with the source . In the above Algorithm 2.1, all the sources , , are solved simultaneously. For a given source , depending on the relative position of the subdomain to subdomain , the subdomains can be divided into two types in the solving process: the first type consists of the ones with either or , of which the local solutions effected by source are obtained by applying horizontal or vertical source transfer, the other type consists of the ones with and , of which the local solutions effected by source are obtained by applying corner directional source transfer.
To illustrate the method in , the same notations as above for the and components are reused and we also add the notations for the component. The one-dimensional cutoff functions in the direction are defined as
and then the cutoff function for each subdomain are
where with , and
The truncation functions for the half spaces, the quarter spaces and the eighth spaces in are defined as
Then the corresponding transfer function in is defines as
for and . With these notations, the additive overlapping DDM with source transfer in Leng2019 can be stated as follows:
Algorithm 2.2 (Additive overlapping DDM with source transfer in Leng2019 ).
It is shown in Leng2019 that in the case of source lying in only one subdomain in (or in ), the subdomain (or ) performs nonzero local solving only at step (or ), and construct the exact solution in the subdomain at that very step. This results in subdomain solution marching in diagonal directions, and such diagonal marching suggests a sweeping type solver, which will be derived in the next section. We note that this property makes it possible to reduce the sweeping solve of all directions Zepeda2019 to only diagonal directions.
3 The diagonal sweeping DDM with source transfer in
In this section we will derive the diagonal sweeping DDM with source transfer in by starting with the source lying within only one subdomain. If the exact solution is constructed for the case of the source lying only one subdomain, and the solving procedure does not depend on that specific one subdomain, then the exact solution could also be constructed straightforwardly for the case of general source. Without loss of generality, we take a () domain partition and assume that the source lies only in () in our illustration. There are totally diagonal directions in : , , , , and the sweep along each of the directions contains a total of steps.
We perform the first sweep along the direction , i.e., from lower-left subdomains to upper-right subdomains, where the -th step of this sweep handles the group of subdomains with . In the first steps, the solution is always zero since the local source in with is zero. At step , the subdomain problems in is solved with the source , transferred sources are generated and passed to its neighbor subdomains correspondingly for later use, as shown in Figure 3-(a). At step 6, the subdomain problems in and are solved. Take as example, the horizontal source transfer is applied, in which the rightward transferred source from at step 5 is used as the local source for , the local subdomain problem is solved, and 5 new transferred sources are generated and passed to its corresponding neighbor subdomains, as shown in Figure 3-(b). At step 7, the subdomain problems in , and are solved. The cases in and are similar as step 6. As for , the corner directional source transfer is applied, in which the upward transferred source from at step 6, the rightward transferred source from at step 6, and the upper-right transferred source from at step 5 are summed as the local source for , the local subdomain problem is solved, and 3 new transferred sources are generated and passed to its corresponding neighbor subdomains, as shown in Figure 3-(c). The following steps in the sweeping continues and at step 9, the solution is constructed in the upper-right quadrant with respect to .
We note that the subdomains on which the upwards transfers are solved, namely and , are handled by this sweep of upper-right direction, while they are handled by the upwards sweep in the L-sweeps method. Similarly, the subdomains on which the rightwards transfers are solved, namely and , are handled by this sweep, while they are handled by the rightwards sweep in the L-sweeps method. This illustrates the major difference between the L-sweeps method and the proposed diagonal sweeping method, that is the horizontal and vertical sweeps in the former method are merged into the diagonal sweeps in the latter method.
It is clear that the directions of the sweeps and the source transfers are important in designing the sweeping algorithm. We define that two vectorsand in are in the similar direction if and only if . In the steps of the first sweep, it is found that only the transferred sources in the directions , and are used and they are in the similar directions of the current sweep , while the others are left for future sweeps. Thus the first rule on the transferred source in sweeps in is defined as:
(Similar directions in ) A transferred source which is not in the similar direction of one sweep, should not be used in that sweep.
In the second sweep, the direction of sweep is chosen to be , which aims at constructing the solution in the upper-left quadrant . Note that the -th step of this sweep handles the group of subdomains with . According to Rule 3.2, among all the transferred sources left from the previous sweep (i.e., the first sweep), the ones with directions , and will be used in the second sweep, since they are in the similar direction to the the current sweeping direction . There is nothing to solve at the first steps of the second sweep. At step 6, the subdomain problem in is solved with the leftward transferred source from , and 5 transferred sources are generated and passed to its neighbor subdomains as shown Figure 3-(e). At step 7 of the second sweep, the subdomain problems on and are solved as shown Figure 3-(f), and so on for the following steps, and after step 9 the solution is constructed in the upper-left quadrant , again leaving some transferred sources for future sweeps. It is found that the subdomains that need leftwards transfer solving are handled in the second sweep, while the subdomains that need upwards transfer solving have already been handled in the first sweep.
In the third sweep, the direction of sweep is choose to be , which aims at constructing the solution in the lower-right quadrant as shown in Figure 4-(a). Note that the -th step of this sweep handles the group of subdomains with . The transferred sources from the upper-right quadrant are needed, while the transferred sources from the upper-left quadrant should be excluded, thus we need to introduce one more rule for the source transfer in sweeps. Note that the new rule should not make decisions for transferred sources based on the relative position with respect to , otherwise the method is only valid for this special case of the source lying only in . The second rule on the transferred source in sweeps in is defined as follows:
(Opposite directions in ) The horizontal or vertical transferred source generated in one sweep in should not be used in a later sweep if these two sweeps have opposite directions.
Such a rule affects neither the transferred sources in the previous two sweeps, nor the transferred sources from the upper-right quadrant in the third sweep, but effectively prevent the transferred sources from the upper-left quadrant to enter the third sweep, since they are generated in the second sweep, which has the opposite direction to the third sweep. There is nothing to solve at the first steps of the third sweep. At step 6, the subdomain problem in is solved with the downward transferred source from , and 5 new transferred sources are generated and passed to its neighbor subdomains as shown Figure