# A fast Fourier transform based direct solver for the Helmholtz problem

This paper is devoted to the efficient numerical solution of the Helmholtz equation in a two- or three-dimensional rectangular domain with an absorbing boundary condition (ABC). The Helmholtz problem is discretized by standard bilinear and trilinear finite elements on an orthogonal mesh yielding a separable system of linear equations. The main key to high performance is to employ the Fast Fourier transform (FFT) within a fast direct solver to solve the large separable systems. Numerical results for both two- and three-dimensional problems are presented confirming the efficiency of the method discussed.

## Authors

• 1 publication
• 5 publications
• ### A fast algorithm for the electromagnetic scattering from a large rectangular cavity in three dimensions

The paper is concerned with the three-dimensional electromagnetic scatte...
08/05/2020 ∙ by Yanli Chen, et al. ∙ 0

• ### Solution of Wiener-Hopf and Fredholm integral equations by fast Hilbert and Fourier transforms

We present numerical methods based on the fast Fourier transform (FFT) t...
06/09/2021 ∙ by Guido Germano, et al. ∙ 0

• ### Constructing an orthonormal set of eigenvectors for DFT matrix using Gramians and determinants

The problem of constructing an orthogonal set of eigenvectors for a DFT ...
12/12/2017 ∙ by Vadim Zaliva, et al. ∙ 0

• ### Fast finite-difference convolution for 3D problems in layered media

We developed fast direct solver for 3D Helmholtz and Maxwell equations i...
09/03/2019 ∙ by Vladimir Druskin, et al. ∙ 0

• ### Scalable Algorithms for High Order Approximations on Compact Stencils

The recent development of parallel technologies on modern desktop comput...
12/07/2019 ∙ by Yury Gryazin, et al. ∙ 0

• ### An alternative extended linear system for boundary value problems on locally perturbed geometries

This manuscript presents a new extended linear system for integral equat...
07/16/2020 ∙ by Yabin Zhang, et al. ∙ 0

• ### An implementation of an efficient direct Fourier transform of polygonal areas and volumes

Calculations of the Fourier transform of a constant quantity over an are...
04/16/2021 ∙ by Brian B. Maranville, et al. ∙ 0

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1. Introduction

This work presents an efficient fast direct solver employing FFT for the Helmholtz equation

 (1) −△u−ω2u=f

in a rectangular domain with absorbing boundary conditions (ABCs). Here, denotes the wave number, which is assumed to be a constant. This condition is valid for homogeneous media. For example, the iterative solution techniques for acoustic scattering by objects in homogeneous and layered media considered in [10, 12] leads a sequence of such problems. In this work, we derive the fast solver for the case of having a first-order ABC imposed. However, the same method can be used for the second-order ABCs employed in [10]. These ABCs are the time-harmonic counterpart of the ABCs for the wave equation considered in [1]. Moreover, the solver can be used also for problems with Robin boundary conditions, which essentially broadens its applicability. Solving the Helmholtz equation is in general difficult or impossible to solve efficiently with most numerical methods. Difficulties related to the numerical solution of time-harmonic Helmholtz equations are discussed for instance in [19].

The numerical method proposed in this work is a fast direct solver, which is applicable for problems posed in rectangular domains and having suitable tensor product form matrices. This kind of diagonalization technique has already been proposed in

[14]. However, we use FFT and sparsity in order to implement it efficiently. For example, bilinear or trilinear finite element discretizations employed in this paper lead to matrices with such suitable tensor product form. Also the fourth-order accurate modified versions of these elements [7] are applicable with the solver. To improve the efficiency this method employs FFT instead of cyclic reduction. Several other fast direct solvers for elliptic problems in rectangular domains are discussed, e.g., in [17].

The fast direct solvers can be used as efficient preconditioners in the iterative solution of problems in more general domains, see [3, 11]. Other methods, which are based on an equivalent formulation of the original problem to enable preconditioning with fast direct methods, are referred to as fictitious domain, domain imbedding, or capacitance matrix methods, and they have been successfully applied also to acoustic scattering problems for instance in [6, 8, 9, 11] as well as to scattering problems with multiple right-hand sides, where a specific method for such problems is considered, for example, in [18].

A different fast direct solver for the Helmholtz equation (1) has already been presented in [10], where cyclic reduction techniques are used in the solution procedure. In [10], the application of a specific cyclic reduction-type fast direct method is presented, called the partial solution variant of the cyclic reduction (PSCR) method, to the solution of the Helmholtz equation [15, 16, 20] and its computational efficiency is demonstrated. For three-dimensional problems, a general fast direct solver like the PSCR has excellent parallel scalability, see [16]. However, the use of FFT over cyclic reduction techniques is preferable, since FFT is generally faster. That is partly due to very fast implementations of FFT.

The reason for not applying FFT previously was that the ABCs prevent the diagonalization using FFT. The following three steps describe the basic idea to solve this problem and are discussed in more detail in this work:

1. Some of the boundary conditions are changed on the ABC part of the boundary to be of periodic type. This modified problem can be solved now with an FFT solver.

2. Since the boundary conditions have been changed to periodic ones, the residual vector is computed due to these incorrect boundary conditions. This has nonzero components only on the boundary parts, where we have changed the boundary conditions. The correction is computed by solving a problem with the original matrix, where the right-hand side vector is the residual. Only the component of the correction which lie on the changed boundaries are computed. For this the so-called partial solution problem a special technique exists, see

[2, 10, 13, 16] for example. Due to sparsity of the vectors the solution requires only or operations in case of two or three dimensions, respectively, where is the total number of unknowns. After this step, the correct boundary values are known.

3. A similar problem to the one in the first step is solved, but now the right-hand side vector is adjusted so that the solution has correct values on the boundaries. From this it follows that the solution is also the solution of the original problem.

A similar method to the above one was already proposed in [5], but with the major difference that the problem in the second step was solved iteratively. The new solver employing the FFT method is efficient in terms of computational time and memory usage, especially in the three-dimensional case. For three-dimensional problems, a general fast direct solver like the PSCR method [16] requires operations, whereas the FFT based solver reduces the complexity to operations.

The paper is organized as follows: In Section 2, we present the classic and variational formulation of the Helmholtz problem as well as its discretization by bilinear and trilinear finite elements leading to a system of linear equations. The main idea of the solver for this problem and some preliminaries, which appear in the initialization process of the implemented fast solver for the two- and three-dimensional case, are discussed in Section 3. Sections 4 and 5 are devoted to the two- and three-dimensional problems, respectively. Here, some preliminaries as well as the fast solver steps are discussed. Finally, numerical results for both two- and three-dimensional problems are presented in Section 6, and conclusions are drawn in Section 7.

## 2. Problem formulation and discretization

Let , , be a -dimensional rectangular domain and its boundary. We consider the Helmholtz equation describing the linear propagation of time-harmonic acoustic waves given by

 (2) −△u−ω2u =fin Ω, (3) Bu =0on Γ,

where denotes the wave number. Here, equation (3) is an approximation for the Sommerfeld radiation condition

 (4) limr→∞r(d−1)/2(∂ru−iωu)=0

appearing for the general model problem in . The approximation is performed by truncating the unbounded domain at a finite distance, which provides the boundary condition at the truncation boundary. In order to reduce spurious reflections caused by this artificial boundary, we use local absorbing boundary conditions. In general, the boundary can be decomposed into parts where different boundary conditions are imposed. Let now be decomposed into an absorbing boundary condition part and a Neumann boundary condition part . The Neumann boundary conditions are given by

 (5) Bu=∇u⋅n=0on ΓN,

whereas in case of absorbing boundary conditions we have that

 (6) Bu=∇u⋅n−iωu=0% on ΓB,

where denotes the outward normal to the boundary. This type of absorbing boundary conditions (6) are called first-order absorbing boundary conditions.

###### Remark 1.

In practical applications, so-called second-order absorbing boundary conditions are also important and were considered in [10] as well. However, the choice of the absorbing boundary conditions does not have any impact on the performance of the proposed solver. Thus, the same method can be used with given second-order absorbing boundary conditions, see [10].

In order to obtain the discrete version of the Helmholtz problem (2)–(3), let us first state its weak formulation. For that we introduce the Hilbert space . Let the source term be given. Multiplying (2) with a test function as well as applying integration by parts and the boundary condition (3), yields the weak problem: Find such that

 (7) a(u,v)=∫Ωfvdx∀v∈V,

where

 (8) a(u,v)=∫Ω(∇u⋅∇v−ω2uv)dx−iω∫∂Ωuvds.

Let us denote the mesh points by , for every -direction with and the corresponding mesh size by . Thus, the mesh is equidistant in each direction . Moreover, we denote the full mesh size by , which is given by . Discretizing problem (7) by bilinear or trilinear finite elements on an orthogonal mesh leads to a system of linear equations given by

 (9) Au=f,

where the matrix has a separable tensor product form and denotes the discrete right-hand side. For the two-dimensional case, the matrix is given by

 (10) A=(K1−ω2M1)⊗M2+M1⊗K2,

whereas in three dimensions it is given by

 (11) A=(K1−ω2M1)⊗M2⊗M3+M1⊗(K2⊗M3+M2⊗K3).

Here, the -matrices and are one-dimensional stiffness and mass matrices, respectively, in the -direction with possible modifications on the boundaries due to the absorbing boundary conditions. They are computed by one-dimensional numerical quadrature on the unit interval and are given as follows

 (12) Kj=1hj⎛⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜⎝k1,1−1−12−1⋱⋱⋱−12−1−1knj,nj⎞⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟⎠

and

 (13) Mj=hj6⎛⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜⎝21141⋱⋱⋱14112⎞⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟⎠

where the first and last entries are including the corresponding boundary conditions. Absorbing boundary conditions (6) yield the entries , whereas Neumann boundary conditions lead to . The matrix is the same for both Neumann and (first-order) absorbing boundary conditions.

###### Remark 2.

Without loss of generality let us assume that the absorbing boundary conditions are posed in direction of for both (opposite) sides.

In the next two sections, an efficient fast solver for solving the large linear systems is proposed. We split the discussion into two sections corresponding to the two-dimensional and three-dimensional problems with the matrix given by (10) and (11), respectively.

## 3. Main idea and preliminaries

### 3.1. Basic steps

The idea for solving the problem is to consider an auxiliary problem

 (14) Bv=f,

where the system matrix is derived by changing some absorbing boundary conditions to periodic ones. The key is that we can solve the modified (periodic) problem now by using the FFT method, which is not possible for the original problem .

Before going into more details later, let us discuss the main steps of the solver.

Step 1: Solve problem (14) .

Step 2: Let . Solve

 (15) Aw=f−Av=Bv−Av=(B−A)v.

Step 3: Solve

 (16) Bu=f+(B−A)(v+w).

Note that after Step 2, we would have already had the identity , but we do not use this identity directly to compute , which will be explained in Subsections 4.2 and 5.2.

### 3.2. Preliminaries

We have assumed that the absorbing boundary conditions are given in -direction on both opposite boundaries, see Remark 2. Since we have to change the boundary conditions on the two opposite boundaries to be of periodic type for the auxiliary problem (14), we consider the one-dimensional stiffness and mass matrices and and change the entries corresponding to the boundary parts into periodic conditions for the auxiliary problem (14). We denote these matrices by and . The circulant matrices and are given as follows

 (17) KBj=1hj⎛⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜⎝2−1−1−12−1⋱⋱⋱−12−1−1−12⎞⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟⎠

and

 (18) MBj=hj6⎛⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜⎝411141⋱⋱⋱141114⎞⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟⎠

which means that we have changed the boundary conditions on two opposite boundaries to be of periodic type. Hence,

 (19) KBj,(1,2)=KBj,(1,nj)=KBj,(2,1)=KBj,(nj,1),MBj,(1,2)=MBj,(1,nj)=MBj,(2,1)=MBj,(nj,1).

Let us consider now the original problem (9) and the auxiliary problem (14) in more detail. After a suitable permutation and have the block forms

 (20) A=(AbbAbrArbArr)andB=(BbbAbrArbArr),

the subscripts and correspond to the nodes on the boundary and to the rest of the nodes, respectively. Note that the matrix has the structure

 (21) B−A=(Bbb−Abb000),

and only the matrix

 (22) Cbb=Bbb−Abb

has to be saved for the application of the fast solver. Steps of the fast solver for

include also the application of the so-called partial solution method, which is a special implementation of the method of separation of variables. This method involves the solution of the generalized eigenvalue problems

 (23) K1V1=M1V1ΛA1

and

 (24) KB1W1=MB1W1ΛB1,

where the matrices and contain the eigenvalues as diagonal entries and the matrices and

contain the corresponding eigenvectors as their columns. Let us define the more general problem

 (25) KBjWj=MBjWjΛBj

with circulant matrices and for the problem size corresponding to any -direction. The eigenvector matrix for the generalized eigenvalue problem (25) is given by

 (26) Wj=⎛⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜⎝111…111e−2πinje−22πinj…e−(nj−2)2πinje−(nj−1)2πinj1e−22πinje−42πinj…e−2(nj−2)2πinje−2(nj−1)2πinj⋮⋮⋮⋱⋮⋮1e−(nj−2)2πinje−(nj−2)22πinj…e−(nj−2)22πinje−(nj−1)(nj−2)2πinj1e−(nj−1)2πinje−2(nj−1)2πinj…e−(nj−2)(nj−1)2πinje−(nj−1)22πinj⎞⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟⎠

and the diagonal entries of the corresponding diagonal eigenvalue matrix are

 (27) ΛBj,l=KBj,(1,1)+KBj,(1,nj)e−(l−1)2πinj+KBj,(1,2)e−(l−1)(nj−1)2πinjMBj,(1,1)+MBj,(1,nj)e−(l−1)2πinj+MBj,(1,2)e−(l−1)(nj−1)2πinj

for . In order to apply the partial solution method the eigenvectors are normalized so that they satisfy the conditions:

 (28) VT1M1V1=I1 and VT1K1V1=ΛA1, (29) WT1MB1W1=I1 and WT1KB1W1=ΛB1.

This can be achieved by multiplying and with the vectors and of length , respectively. We denote the general vectors by and of length . The entries of are given by

 (30) sAj,l=1/∥Mj∥Vj,l

with the componentwise matrix norms

 (31) ∥Mj∥Vj,l=(VTj,lMjVj,l)1/2,

which are weighted by the -th eigenvectors of for . Similarly, the entries of are given by

 (32) sBj,l=1/∥MBj∥Wj,l

with the componentwise matrix norms

 (33) ∥MBj∥Wj,l=(WTj,lMBjWj,l)1/2,

which are weighted by the -th eigenvectors of for . In (28) and (29),

denotes the identity matrix of length

. In the following, and denote the identity matrices of lengths and , respectively. The eigenvector matrices and are used for solving the partial solution problems when needed in the steps of the solver. However, the generalized eigenvalue problems (23) and (24) have to be solved only once during the solution process – in the initialization. The conditions (28) and (29) also lead to a convenient representation for the inverses of the system matrices and , which is discussed in Subsections 4.1 and 5.1 for the respective two-dimensional and three-dimensional case. In the next two sections, we discuss the two-dimensional and three-dimensional problems in more detail separately, since the efficient implementation of the initialization process and the steps of the fast solver differs in both cases.

## 4. The two-dimensional case

### 4.1. Preliminaries for the two-dimensional problem

#### 4.1.1. Reformulation of problem matrices A and B

The matrix for the auxiliary problem (14) in the two-dimensional case is given by

 (34) B=(KB1−ω2MB1)⊗M2+MB1⊗K2.

Using the conditions (29) as follows

 (35) B=(W−T1ΛB1W−11−ω2W−T1I1W−11)⊗M2+(W−T1I1W−11)⊗K2=(W−T1(ΛB1−ω2I1)W−11)⊗M2+(W−T1I1W−11)⊗K2=(W−T1⊗I2)((ΛB1−ω2I1)⊗M2+I1⊗K2)(W−11⊗I2),

the inverse of can be represented by

 (36) B−1=(W1⊗I2)H−1B(WT1⊗I2),

where

 (37) HB=(ΛB1−ω2I1)⊗M2+I1⊗K2.

Similarly using the conditions (28), we obtain for the system matrix the following representation:

 (38) A−1=(V1⊗I2)H−1A(VT1⊗I2),

where

 (39) HA=(ΛA1−ω2I1)⊗M2+I1⊗K2.

#### 4.1.2. LU decomposition

Linear systems with the block diagonal matrices and are solved with a direct method. The matrices and consist of diagonal blocks each of size . In the following, let us discuss the method applied on the matrix (given by (37)), since it applies analogously for . First, the LU decomposition of is computed as follows

 (40) HB=LBUB.

Then the linear system

 (41) HBy=LBUBy=r

is solved by solving the respective two subproblems

 (42) LBz=r and UBy=z

consecutively in the application of the fast solver. Note that denotes now some right-hand side which is in the different steps of the solver also different. However, we will describe that in more detail in the next subsection. We denote by

 (43) HA=LAUA

the LU decomposition corresponding to . The structure of and is essential for the fast application of the direct solver. The diagonal blocks of these matrices are tridiagonal which makes the LU decomposition fast for them. The computational complexity is optimal .

### 4.2. Fast solver in the two-dimensional case

#### 4.2.1. Step 1

The auxiliary problem

 (44) Bv=B(vbvr)=f

is solved, but only and not is computed. For that, the FFT of the right-hand side is computed first, where its coefficients are given as follows

 (45) ^fk=N∑l=1e−2πi(l−1)(k−1)Nfl

for all and normalization might be needed, which means that is multiplied by the vector , where its components are defined in (32) with . Note that computing the FFT corresponds to the multiplication . The vector is saved, since it will be needed in Step 3 as well. Next, we apply the LU decomposition (41) with the right-hand side

 (46) LBz1=^f and UB~z1=z1.

Performing the inverse FFT on the resulting vector would provide both and , which would correspond to the multiplication . However, since we only need , instead of that, we multiply the vector by the matrix by taking advantage of the sparsity of the desired components . More precisely, we multiply from the left side by the eigenvectors of which correspond only to the boundary denoted by leading to

 (47) vb=(Wb1⊗I2)~z1,

which resembles the representation (36) for . The computational complexity of Step 1 is .

#### 4.2.2. Step 2

We introduce the additional vector defined as . Since

 (48) Aw=Au−Av=f−Av=Bv−Av

and (21), we obtain the following problem:

 (49) Aw=A(wbwr)=(B−A)v=(Cbbvb0),

where is defined by (22). We solve the problem (49) by using the representation (38), but compute again only and not , since the right-hand side as well as the desired components of the solution are both sparse. First, we compute

 (50) gb=(Vb1T⊗I2)Cbbvb,

then by using the LU decomposition (43) we solve

 (51) LAz2=gb and UA~z2=z2,

and, finally, we solve

 (52) wb=(Vb1⊗I2)~z2,

which resembles the representation (38) for but corresponds only to the boundary . The computational complexity of Step 2 is of optimal order .

#### 4.2.3. Step 3

Finally, in order to obtain the solution of the original problem (9), we solve now the problem

 (53) Bu=f+(B−A)(v+w)=f+(Cbb(vb+wb)0)

due to . Since we have already computed the Fourier transformation of in Step 1 by (45), we only need to compute the Fourier transformation of the second term of the right-hand side of equation (53) and due to sparsity again only the part corresponding to the boundary as follows

 (54) hb=(Wb1T⊗I2)Cbb(vb+wb)

leading to the Fourier transformation of the entire right-hand side of equation (53) denoted by . Next, we apply the LU decomposition (41) with this right-hand side yielding

 (55) LBz3=^f+^h and UB~z3=z3.

In the last step all resulting components are needed (not only the ones corresponding to ) to obtain the solution by applying the inverse Fourier transformation on , where its coefficients are given as follows

 (56) ^uk=1NN∑l=1e2πi(l−1)(k−1)N~z3,l

for all . Again the vector may need to be multiplied by the vector with entries defined in (32) for before applying the inverse Fourier transformation on it. The computation of the inverse Fourier transformation corresponds to the multiplication

 (57) u=(W1⊗I2)~z3.

The computational complexity of Step 3 is .

## 5. The three-dimensional case

### 5.1. Preliminaries for the three-dimensional problem

#### 5.1.1. Matrices A and B

The matrix in the three-dimensional case is given by

 (58) B=(KB1−ω2MB1)⊗M2⊗M3+MB1⊗(K2⊗M3+M2⊗K3).

Using the equations (28) and (29), the inverses of and can be represented analogously as in the two-dimensional case as follows

 (59) A−1=(V1⊗I23)H−1A(VT1⊗I23)

and

 (60) B−1=(W1⊗I23)H−1B(WT1⊗I23),

where

 (61) HA=((ΛA1−ω2I1)⊗M2+I1⊗K2)⊗M3+I1⊗M2⊗K3

and

 (62) HB=((ΛB1−ω2I1)⊗M2+I1⊗K2)⊗M3+I1⊗M2⊗K3,

respectively.

#### 5.1.2. Applying the two-dimensional fast solver

Linear systems with the block diagonal matrices and are again solved with a direct method. However, it is performed in a different way than for the two-dimensional case, since the LU decomposition is slow for large block tridiagonal problems. More precisely, the efficient implementation in three dimensions contains the application of the two-dimensional fast solver for subproblems of size including the computation of the partial solution method in -direction. For that, one needs to solve the following generalized eigenvalue problems during the initialization process:

 (63) K2V2=M2V2ΛA2 and KB2W2=MB2W2ΛB2

with the matrices and as defined in (17) and (18) but for the -direction now. Moreover, the diagonal eigenvalue matrices and and the eigenvector matrices and are formed analogously as for the -direction only the problem size changes to . For that, the generalized eigenvalue problem with circulant matrices for a general problem size was defined in (25) and the corresponding eigenvector matrix is represented in (26). The eigenvectors are normalized to satisfy the conditions

 (64) VT2M2V2=I2 and VT2K2V2=ΛA2, (65) WT2MB2W2=I2 and WT2KB2W2=ΛB2.

This can be achieved by multiplying and with the vectors and of length , respectively, which are introduced in (30)–(33). The fast solver includes the application of the partial solution method times for the following two-dimensional subproblems in -direction corresponding to the matrices (61) and (62):

 (66) AA,l=(K2−(ω2−ΛA1,l)=:pAM2)⊗M3+M2⊗K3

and

 (67) AB,l=(K2−(ω2−ΛB1,l)=:pBM2)⊗M3+M2⊗K3,

where . This yields the direct solution for and . Note that the subproblems (66) and (67) reflect the structure of (61) and (62), respectively, in the framework of the two-dimensional problem (10). Now the parameters and have been introduced in (66) and (67), respectively. The corresponding auxiliary problems are given by

 (68) BA,l=(KB2−pAMB2)⊗M3+MB2⊗K3,

and

 (69) BB,l=(KB2−pBMB2)⊗M3+MB2⊗K3,

which now reflect the structure of (61) and (62), respectively, in the framework of the two-dimensional problem (34). As defined in (27), is the th-diagonal entry of the diagonal eigenvalue matrix . Analogously, we have denoted by the th-diagonal entry of the diagonal eigenvalue matrix .

In summary, the two-dimensional problems (66) and (67) are solved for all by applying times the two-dimensional FFT based fast solver from Subsection 4.2 using the auxiliary two-dimensional problems (68) and (69). The computational complexity of this step is now .

### 5.2. Fast solver in the three-dimensional case

The three-dimensional solver has the same three main steps, but now instead of applying LU decomposition we apply the two-dimensional solver as described in the previous subsection. We present all the main steps of the solver similarly as for the two-dimensional case in Subsection 4.2 but in less detail. Hence, we refer the reader also to Subsection 4.2 for more details.

#### 5.2.1. Step 1

The auxiliary problem (44)

 Bv=B(vbvr)=f

is solved, but only and not is computed. For that, we need to compute first, which is equivalent to the computation of the FFT for the right-hand side , where its coefficients are given in (45) for all . The vector can be normalized by multiplying it with the vector of length with components defined in (32). The vector is saved, since it will be needed in Step 3 as well. Next, we apply times the two-dimensional solver for the problems (67) with the auxiliary problems (69) leading to the solution of the problem

 (70) HB~z1=^f.

Since we only need , we now multiply the vector by the matrix by taking advantage of the sparsity of the desired components . More precisely, we multiply from the left side by the components of the eigenvectors of which correspond only to the boundary denoted by leading to

 (71) vb=(Wb1⊗I23)~z1,

which resembles the representation (60) for .

#### 5.2.2. Step 2

We introduce the additional vector defined as . Since (48) and (21), we derive at problem (49)

 Aw=A(wbwr)=(B−A)v=(Cbbvb0).

We solve this problem by using the representation (59), but compute again only and not , since the right-hand side as well as the desired components of the solution are both sparse. First, we solve

 (72) gb=(Vb1T⊗I23)Cbbvb,

then by applying times the two-dimensional solver for the subproblems (66) with the auxiliary problems (68) leading to the solution of the problem

 (73) HA~z2=gb.

Finally, we compute

 (74) wb=(Vb1⊗I23)~z2,

which resembles the representation (59) for but corresponds only to the boundary .

#### 5.2.3. Step 3

The last step yields the solution of the original problem (9). First, we solve the problem (53)

 Bu=f+(B−A)(v+w)=f+(Cbb(vb+wb)0)

due to . Since we have already computed the Fourier transformation of in Step 1 by (45), we only need to compute the Fourier transformation of the second term of the right-hand side of equation (53) and due to sparsity again only the part corresponding to the boundary as follows

 (75) hb=(Wb1T⊗I23)Cbb(vb+wb)

leading to the Fourier transformation of the entire right-hand side of equation (53) denoted by . Next, we apply times the two-dimensional solver for the problems (67) with the auxiliary problems (69) leading to the solution of the problem

 (76) HB~z3=^f+^h.

In the last step all resulting components are needed (not only the ones corresponding to ) to obtain the solution by applying the inverse Fourier transformation on , where its coefficients are defined as in (56) for all . The vector may be normalized by multiplying it with the vector of length with the components defined in (32) before applying the inverse Fourier transformation which corresponds to the multiplication

 (77) u=(W1⊗I23)~z3.

finally leading to the solution .

In the next section, we present numerical experiments for both the two-dimensional and three-dimensional case.

## 6. Numerical results

For and , the computational domain is chose as the unit square and unit cube, respectively. The wave number is set for all numerical experiments. The discretization meshes are uniform with respect to each -direction, where the corresponding step sizes are denoted by . The right-hand side is chosen as 0.01 for the first entries and 1 for all the other entries. In this set of experiments, the efficiency of the fast direct solver is discussed. The numerical experiments have been computed using MATLAB 9.3, R2017b.

In the following, we compare the CPU times in seconds for computing the solution by applying Matlab’s backslash and the fast solver presented in this work. In our case, Matlab’s backslash uses the sparse direct solver UMFPACK for computing the solution of the sparse linear systems (10) and (11), see [4]. Besides the computational times of applying Matlab’s backslash and the FFT based direct solver also the computational times needed in the initialization process are presented. However, the initialization processes differ between the two- and three-dimensional problems, which has already been addressed in Sections 4 and 5. Summing up, in the two-dimensional case the LU decompositions are computed for the matrices and defined in (37) and (39), respectively, (see Subsection 4.1.2), whereas the three-dimensional problem is not solved using LU decomposition in three dimensions. In this case, the action of the inverses of and defined in (62) and (61), respectively, are computed by applying the fast solver in two dimensions times (see Subsection 5.1.2). This approach leads to a more efficient implementation in three dimensions.

The largest numerical experiments were computed in three dimensions with 135 005 697 unknowns. Note that this size is already too large regarding the computational memory in Matlab for setting up the whole matrix matrix as defined in (11), which would be needed in order to apply Matlab’s backslash. However, the fast solver presented in this work still computes the solution in a reasonable amount of time without using too much computational memory, since the fast solver does not need to form the whole matrix , but only of the left upper block (22) denoted by of the matrix defined in (21).

###### Remark 3.

All numerical experiments presented in this section were performed on a laptop with Intel(R) Core(TM) i5-6267U CPU @ 2.90GHz processor and 16 GB 2133 MHz LPDDR3 memory.

### 6.1. Numerical results for the two-dimensional case

For the first set of numerical experiments, we choose . The CPU times in seconds for the initialization process as well as the application of the fast solver and Matlab’s backslash for comparison are presented in Table 1 for different values of . The largest numerical experiments in two dimensions have been computed for with 4 198 401 unknowns. As can be observed in Table 1, the CPU times applying Matlab’s backslash are about the same size as for the initialization, whereas the CPU times of the FFT based fast solver grow with .

Table 2 presents the CPU times in seconds for various combinations of and in the two-dimensional case. More precisely, we compare here the computational times with respect to a large difference in the magnitude of and . Both cases are discussed and . As one can observe in Table 2 it does not influence the computational times of the solver whether or . For and , the fast solver’s CPU time was 0.10 seconds, and for and , it was 0.12 seconds. However, if is very large, it influences the computational times of the initialization process, more precisely, of solving the generalized eigenvalue problem (23) performed by applying Matlab’s function eig, which requires the full representations of the matrices and , which are otherwise stored as sparse matrices. These increased computational times can be observed in the last two columns of Table 2. We note here that the solution method described in [10] would be faster for solving this generalized eigenvalue problem.

### 6.2. Numerical results for the three-dimensional case

In the three-dimensional case, we chose for all for the first set of numerical experiments again. The CPU times in seconds for the initialization process as well as the application of the fast solver and Matlab’s backslash for comparison are presented in Table 3 for different values of . As can be observed from Table 3, the computational times for applying Matlab’s backslash rise very quickly. For , its CPU time was 608.12 seconds, whereas the fast solver’s CPU time was only 2.02 seconds. Then already for with 2 146 689 unknowns, Matlab runs out of memory when the matrix is formed which makes it impossible to solve the problem (9) by applying Matlab’s backslash. However, since the FFT based fast solver only needs the setting up of the left upper block of the matrix and not of the entire matrix , the solver can be applied solving the problem (9) up to size with 135 005 697 unknowns (Remark 3). Moreover, we can observe from Table 3 that the CPU times applying Matlab’s backslash are already much larger for than for the initialization process and the application of the fast solver. Table 3 shows that the CPU times of applying the FFT based fast solver are nearly optimal order of complexity .

Table 4 presents the CPU times in seconds for various combinations of , , in the three-dimensional case similar to Table 2 for two dimensions comparing computational times with respect to a large difference in the magnitudes of . Again we discuss all possible combinations, e.g., or . The third and fourth column of Table 4 comparing and show that the CPU times in applying Matlab’s backslash are similar (1.06 and 0.96 in the third and fourth column, respectively) and also applying the fast solver (0.22 and 0.25 in the third and fourth column, respectively). However, note that the FFT based direct solver is already four times faster than Matlab’s backslash for this set of values. Also, the sixth, seventh and eighth column present similar numerical results, but Matlab’s backslash cannot be applied anymore for these combinations, since Matlab runs out of memory when forming the matrix . Note that the computational times for the initialization process needed to apply the FFT based direct solver take only around 2 seconds for these sets of values.