# FMM-LU: A fast direct solver for multiscale boundary integral equations in three dimensions

We present a fast direct solver for boundary integral equations on complex surfaces in three dimensions, using an extension of the recently introduced strong recursive skeletonization scheme. For problems that are not highly oscillatory, our algorithm computes an LU-like hierarchical factorization of the dense system matrix, permitting application of the inverse in O(N) time, where N is the number of unknowns on the surface. The factorization itself also scales linearly with the system size, albeit with a somewhat larger constant. The scheme is built on a level-restricted, adaptive octree data structure and therefore it is compatible with highly nonuniform discretizations. Furthermore, the scheme is coupled with high-order accurate locally-corrected Nyström quadrature methods to integrate the singular and weakly-singular Green's functions used in the integral representations. Our method has immediate application to a variety of problems in computational physics. We concentrate here on studying its performance in acoustic scattering (governed by the Helmholtz equation) at low to moderate frequencies.

• 1 publication
• 8 publications
• 7 publications
• 12 publications
08/30/2019

### Solution of Stokes flow in complex nonsmooth 2D geometries via a linear-scaling high-order adaptive integral equation scheme

We present a fast, high-order accurate and adaptive boundary integral sc...
07/03/2020

09/29/2019

### A fast boundary integral method for high-order multiscale mesh generation

In this work we present an algorithm to construct an infinitely differen...
07/24/2020

### An accelerated, high-order accurate direct solver for the Lippmann-Schwinger equation for acoustic scattering in the plane

An efficient direct solver for solving the Lippmann-Schwinger integral e...
02/09/2022

### On the hyper-singular boundary integral equation methods for dynamic poroelasticity: three dimensional case

In our previous work [SIAM J. Sci. Comput. 43(3) (2021) B784-B810], an a...
07/16/2020

### An alternative extended linear system for boundary value problems on locally perturbed geometries

This manuscript presents a new extended linear system for integral equat...
07/27/2020

### Zeta Correction: A New Approach to Constructing Corrected Trapezoidal Quadrature Rules for Singular Integral Operators

We introduce a new quadrature method for the discretization of boundary ...

## 1 Introduction

Integral equation formulations lead to powerful methods for the solution of boundary value problems governed by the partial differential equations (PDEs) of classical mathematical physics. The corresponding free-space Green’s functions are well-known, and in the absence of source terms, boundary integral formulations are restricted to the surface of the domain, thereby reducing the dimensionality of the problem. There has been a resurgence of interest in these methods over the past few decades because of the availability of fast algorithms for applying the dense matrices which arise after discretization. These include hierarchical schemes such as fast multipole methods (FMMs), panel clustering methods,

-matrix methods and multigrid variants, as well as FFT-based schemes such as the method of local corrections, pre-corrected FFT methods, etc. The literature on such methods is vast and we point only to a few review articles and monographs [9, 36, 5, 50, 46]. Assuming

is the number of degrees of freedom used in sampling the surface and the corresponding charge and/or dipole densities, and

is the system matrix obtained after the application of a suitable quadrature rule to the chosen integral representation, these algorithms permit

to be applied to a vector in

or time. When the linear system is well-conditioned, this generally allows for the rapid iterative solution of large-scale problems in essentially optimal time.

Despite these advances, there are several tasks where iterative solvers are not satisfactory. The obvious case is in problems which they themselves are ill-conditioned (such as high-frequency scattering problems). An equally important area where direct methods are preferred is in inverse problems, or in any other setting where one needs to solve the same system matrix with multiple right-hand sides. Finally, fast direct solvers are very useful when exploring low-rank perturbations of the geometry and, hence, the system matrix. Updating the solution in such cases requires only a few applications of or a fast update of the inverse itself [31, 53].

In the last few years, several algorithmic ideas have emerged which permit the construction of a compressed approximation of at a cost of the order or , for modest . In this paper, we describe such a scheme, which we refer to as the FMM-LU method. It uses FMM-type hierarchical compression strategies to rapidly compute an -factorization of the system matrix. In this manuscript, we apply the method to adaptive, multiscale surface discretizations of boundary integral equations coupled to high-order accurate quadratures; this leads to efficient, high-fidelity solvers for geometrically intricate models. We will concentrate here boundary value problems for the Helmholtz equation at low to moderate frequencies, but the scheme is equally applicable to many other families of boundary value problems (e.g. Laplace, Stokes, Maxwell, etc.).

The paper is organized as follows: Section 2 provides a brief derivation of the boundary integral equations of interest, and an overview of the Nyström method for their discretization. in Section 3 we review the notion of strong skeletonization for matrices, i.e. the algebraic analog of FMM-based compression [23, 55, 33, 34]. This was used by Minden, Ho, Damle and Ying [54] to develop the strong recursive skeletonization factorization method (RS-S), of which our algorithm is a variant. We then present in Section 4 efficient quadrature coupling and our modifications of strong skeletonization, leading to the FMM-LU scheme. (Closely related are the inverse FMM approach [2, 25] and the matrix formalism [8, 7].) Section 5 contains several numerical examples demonstrating the asymptotic scaling of our algorithm, as well as the accuracy obtained in solving realistic boundary value problems. In Section 6, we provide some guidelines regarding the current generation of fast direct solvers, discuss the limitations of the present scheme, and speculate about potential avenues for future improvement.

###### Remark 1.

The algorithm described in the present paper has several new features. First, it was designed with complex, multiscale geometries in mind as well as high-order accurate surface discretization and quadrature. We make use of an adaptive octree data structure to keep track of analytical estimates concerning the maximal rank of well-separated blocks which could be at different levels in the tree hierarchy. This affects the manipulation of Schur complements in the

-factorization, and also requires the development of quadrature machinery which permits on-the-fly extraction of near field matrix elements when targets are on-or-close-to a surface triangle where accurate approximation of a layer potential requires some care. The key ingredient here is the use of generalized Gaussian quadrature [15, 14]

for self-interactions on surface panels and precomputed hierarchical interpolation matrices to accelerate adaptive integration in the near field.

### Related work

Fast direct solvers for integral equations have their origin in the solution of two-point boundary value problems in one dimension [35, 58, 1, 19, 64]. There is an extensive literature in this area and the methods are, by now, quite mature. In higher dimensions, a major distinction in fast algorithms for both hierarchically compressible matrices and their inverses concerns which blocks are left intact and which are subject to compression. For the sake of simplicity, let us assume an octree data structure has been imposed on a surface discretiztion and refined uniformly. Let us also assume that the unknowns are ordered so that the points in each (non-empty) leaf node, denoted by , are contiguous. If all block submatrices corresponding to interactions between leaf nodes and , with , are (hierarchically) compressed, using the method of [22, 45] leads to the recursive skeletonization methods of [29, 41, 51, 43, 49].

These methods are, more or less, optimal for boundary integral equations in two dimensions but not in three dimensions. In three dimensions, the interaction between neighboring leaf nodes leads to relatively high-rank block matrices. In the literature, this is referred to as weak admissibility, weak skeletonization, or -matrix compression [38, 40, 9]. To overcome this obstacle, borrowing from the language of fast multipole methods, one can instead choose to compress only those block matrices  corresponding to well-separated interactions, which are known a priori to be low-rank to any fixed precision (based on an analysis of the underlying PDE or integral equation). Well-separated here means that leaf nodes and are separated by a box of the same size. In the literature, this is referred to as strong admissibility, strong skeletonization, mosaic-skeletonization, or -matrix compression [2, 8, 39, 47, 54, 25].

Fast solvers for boundary integral equations in three dimensions using weak skeletonization without additional improvements have led to solvers whose compression/factorization costs are of the order , with subsequent applications of the inverse requiring only  work. Fortunately, the implicit constants associated with these asymptotics are relatively small [41]. A variety of improvements have been developed to reduce the factorization costs by introducing auxiliary variables in a geometrically-aware fashion [42, 24].

Strong skeletonization-based schemes are more complicated to handle in terms of data structures and linear algebraic manipulation, but have significant advantages. First, a large body of work by Hackbusch and collaborators has led to a rigorous algebraic theory for hierarchically compressible -matrices and fast solvers with linear or quasilinear complexity [6, 10, 7]. The constants implicit in the asymptotic scaling with this approach, however, are quite large. To improve performance, the inverse fast multipole method (IFMM) was introduced by Ambikasaran, Coulier, Darve, and Pouransari [2, 25], and the strong skeletonization method was introduced by Minden, Ho, Damle and Ying [54], which, as noted above, we will largely follow here.

Finally, we should note that an early fast solver for volume integral equations in two dimensions was presented in [20] and that related fast solvers have been developed using direct discretization of the PDE rather than integral equations [63, 28, 57]. Closely related to our work here is that of [59] which makes use of strong admissibility in the sparse matrix setting.

## 2 Problem setup

Let  be a bounded region in  with smooth boundary . Given a kernel , consider the following integral equation for ,

 ασ(x)+∫ΓK(x,y)σ(y)da(y)=f(x). (2.1)

Here is a given function on the boundary, is an unknown function to be determined, and a constant. Such integral equations (and their analogs for the vector-valued case) naturally arise in the solution of boundary value problems for the Laplace, Helmholtz, Yukawa, Maxwell, and Stokes equations, just to name a few. In these settings, the kernel is typically related to the Green’s function (or its derivatives) of the corresponding partial differential equation.

For example, consider the exterior Dirichlet boundary value problem for the Helmholtz equation with wave number and boundary data ; the Helmholtz potential defined in satisfies

 (Δ+k2)u =0 in R3∖Ω, (2.2) u =f on Γ, limr→∞r(∂u∂r−iku) =0.

Let denote the free-space Green’s function for the Helmholtz equation given by

 Gk(x,y)=eik|x−y|4π|x−y|, (2.3)

and let

 K(x,y)=(n(y)⋅∇yGk(x,y))−ikGk(x,y), (2.4)

where  is the outward normal at . The above kernel is the kernel of what is known as the combined field representation. Suppose that satisfies (2.1) with and as defined above, then the potential given by

 u(x)=∫ΓK(x,y)σ(y)da(y),x∈R3∖Ω, (2.5)

is the solution to the exterior Dirichlet problem for the Helmholtz equation in (2.2). We will often denote the above formula more succinctly as , where  is the layer potential operator along  with kernel . A more detailed derivation and associated proofs for this fact can be found in [44], for example.

The integral equation creftype 2.1 can be discretized with high-order accuracy using (for example) a suitable Nyström method [3] resulting in the following linear system

 ασi+N∑j=1,j≠iK(xi,xj)σjwij=f(xi). (2.6)

Here and are the quadrature nodes and weights, respectively, while is an approximation to the true value . The kernel is often singular at the origin and smooth away from the origin. When using high-order discretization methods, it is often possible to use quadrature weights independent of the target location, except possibly for a local neighborhood of targets close to each source, i.e. for all , for a suitably defined well-separated region . Such quadratures are often referred to as locally-corrected quadratures for obvious reasons. Many existing quadrature methods such as coordinate transformation methods, singularity subtraction methods, quadrature by expansion, Erichsen-Sauter rules, and generalized Gaussian methods combined with adaptive integration can be used as locally-corrected quadratures [18, 17, 48, 27, 56, 60, 61, 67, 13, 14, 12, 15, 27, 32, 62]. In this setting, the discrete linear system creftype 2.6 then takes the form

 ασi+N∑j=1xj∈Far(xi)K(xi,xj)σjwj+N∑j=1xj∉Far(xi)K(xi,xj)σjwij=f(xi). (2.7)

We further re-scale the above equation so that the unknowns are for which the discrete linear system becomes

 α~σi+N∑j=1xj∈Far(xi)√wiK(xi,xj)√wj~σj+N∑j=1xj∉Far(xi)√wiK(xi,xj)wij√wjσj=√wif(xi). (2.8)

The scaling by the square root of the weights in the above equation formally embeds the discrete solution  in and results in the discretized operators (including sub-blocks of the matrices) to have norms and condition numbers which are close to (and converge to) those of the continuous versions of the corresponding operators [11]. In the following work, we restrict our attention to the fast solution of linear systems of the form creftype 2.8.

###### Remark 2.

In many locally-corrected quadrature methods, in order to accurately compute far interactions the underlying discretization may require some additional oversampling to meet an a priori specified precision requirement. See [32] for a thorough discussion. In short, let , , , for , denote the set of oversampled discretization nodes, the corresponding quadrature weights for integrating smooth functions on the surface, and the interpolated density at the oversampled discretization nodes, respectively. For triangulated/quadrangulated surfaces, these can be obtained via local polynomial interpolation of the chart information and the density. The linear system (without the square root scaling) then takes the form

 ασi+M∑j=1sj∈Far(xi)K(xi,sj)^σj^wj+N∑j=1xj∉Far(xi)K(xi,xj)σjwij=f(xi). (2.9)

The need for oversampling often arises when using low-order discretizations. The extension of our approach, and other existing approaches, to discretizations which require oversampling for far interactions is currently being pursued and the results will be reported at a later date. The main ideas are similar, but constructing an efficient algorithm requires addressing several implementation details.

## 3 Strong skeletonization factorization

The basic structure of the FMM-LU factorization is closely related to the strong skeletonization factorization (RS-S) introduced in [54]. In this section, we briefly review key elements of RS-S. To the extent possible, we use the same notation as in [54] to clearly highlight our modifications to their approach.

Suppose that all the discretization points , are contained in a cube . We will superimpose on  a hierarchy of refinements as follows: the root of the tree is itself and defined as level 0. Level is obtained from level recursively by subdividing each cube at level into eight equal parts as the number of points in that cube at level is greater than some specified parameter . The eight cubes created by subdividing a larger cube are referred to as children of the parent cube. When the refinement has terminated, is covered by disjoint childless boxes at various levels of the hierarchy (depending on the local density of the given points). These childless boxes are referred to as leaf boxes. For any box in the hierarchy, the near field region of consists of other boxes at the same level that touch , and the far field region is the remainder of the domain.

For simplicity, we assume that the above described octree satisfies a standard restriction – namely, that two leaf nodes which share a boundary point must be no more than one refinement level apart. In creating the adaptive data structure as described above, it is very likely that this level-restriction criterion is not met. Fortunately, assuming that the tree constructed to this point has leaf boxes and that its depth is of the order , it is straightforward to enforce the level-restriction in a second step requiring  effort with only a modest amount of additional refinement [26].

###### Remark 3.

The near field region and far field region of a box in the octree hierarchy is almost always different from the near region and far regions of source and target locations associated with the locally-corrected quadrature methods in (2.8). In practice, for almost all targets, the near field region of the locally-corrected quadrature corrections is a subset of near field region of the leaf box containing the target. In the event that this condition is violated, the RS-S algorithm of [54] would require additional modifications to handle the associated quadrature corrections discussed in Section 4.3.

### 3.1 Strong skeletonization

The idea of strong skeletonization was recently introduced in [54] and extends the idea of using the interpolative decomposition to globally compress a low-rank matrix to the situation in which only a particular off-diagonal block is low-rank. In the context of solving a discretized boundary integral equation, the off-diagonal low-rank block is a result of far field interactions via the kernel (e.g. Green’s function) of the integral equation. We briefly recall the standard interpolative decomposition [45] and the form of strong skeletonization as presented in [54].

###### Lemma 1 (Interpolative decomposition).

Given a matrix with rows indexed by and columns indexed by , an -accurate interpolative decomposition (ID) of is a partitioning of into a set of so-called skeleton columns denoted by and redundant columns , and a construction of a corresponding interpolation matrix such that

 ∥AIR−AIST∥≤ε∥AIJ∥, (3.1)

i.e. the redundant columns are well-approximated, to the required tolerance , by linear combinations of the skeleton columns. Equivalently, after an application of a permutation matrix  such that , the interpolative decomposition results in the -accurate low-rank factorization with the error estimate,

 ∥AIJP−AIS[TI]∥≤ε∥AIJ∥. (3.2)

The norms above can be taken to be the standard induced spectral norm.

The ID is most robustly computed using the strong rank-revelaing QR factorization of Gu and Eisenstat [37]. However, in this work we use a standard greedy column-pivoted QR [52]. While both algorithms have similar computational complexity when , i.e. , the greedy column pivoted QR tends to have better computational performance.

Next, consider a three-by-three block matrix  and suppose that is a partition of the index set, with , such that and , i.e.

 A=⎡⎢⎣AIIAIJ0AJIAJJAJK0AKJAKK⎤⎥⎦. (3.3)

Assuming that is invertible, then using block Gaussian elimination the matrix can be decoupled from the rest of the matrix as follows

 L⋅A⋅U=⎡⎢⎣I00−AJIA−1III000I⎤⎥⎦⋅A⋅⎡⎢⎣I−A−1IIAIJ00I000I⎤⎥⎦=⎡⎢⎣AII000SJJAJK0AKJAKK⎤⎥⎦, (3.4)

where is the only non-zero block of the matrix that has been modified. The matrix is often referred to as the Schur complement update.

Suppose now that the matrix arises from the discretization of an integral operator. Further suppose that is a set of indices of points contained in a box in the octree, that are the indices of the set of points contained in the near field region of , and that are the indices of points contained in the far field region of . Using an appropriate permutation matrix , we can obtain the following block structure for :

 PTAP=⎡⎢⎣ABBABNABFANBANNANFAFBAFNAFF⎤⎥⎦. (3.5)

The blocks corresponding to interactions between points in and its far field, i.e. and , are assumed to be numerically low-rank and can therefore be compressed using interpolative decompositions. As before, we partition into a collection of redundant points and a set of skeleton points such that, up to an appropriate permutation of rows and columns (which can be absorbed into the permutation matrix above), we have

 [AFBATBF]=[AFRAFSATRFATRS]≈[AFSATSF]⋅[TI]. (3.6)

Note that in the relationship above the same interpolation matrix  is used to compress both  and . Clearly this is possible when the kernel of the associate integral equation is symmetric. If the kernel is not symmetric, empirically the same matrix  can be used at the cost of a small increase in . Using the same matrix  simplifies various subsequent linear algebra manipulations, but strictly speaking, is not necessary. Different interpolation matrices can be used for each of  and . Using different compression matrices will result in smaller skeleton sets at the cost of more linear algebraic bookkeeping.

Further splitting the indices in creftype 3.5, and combining with creftype 3.6, we get

 PTAP=⎡⎢ ⎢ ⎢ ⎢⎣ARRARSARNARFASRASSASNASFANRANSANNANFAFRAFSAFNAFF⎤⎥ ⎥ ⎥ ⎥⎦=⎡⎢ ⎢ ⎢ ⎢⎣ARRARSARNTTASFASRASSASNASFANRANSANNANFAFSTAFSAFNAFF⎤⎥ ⎥ ⎥ ⎥⎦. (3.7)

Since the redundant rows and columns of the interaction between points in and can be well-approximated by the corresponding rows and columns of the interactions between points in and , we can decouple points from the far field points as follows. Let , denote the elimination matrices defined on the partition as

 (3.8)

Then

 EPTAPF=⎡⎢ ⎢ ⎢ ⎢⎣XRRXRSXRN0XSRASSASNASFXNRANSANNANF0AFSAFNAFF⎤⎥ ⎥ ⎥ ⎥⎦, (3.9)

where the notation is used to indicate blocks of the above matrix whose entries are different from the entries of the original matrix in (3.7). We also note that the matrices and are, in fact, block diagonal when viewed over the partition , and therefore the above factorization can be considered a type of elimination.

Now, assuming again that the block is invertible, we can use it as a pivot block to completely decouple the redundant indices from the rest of the problem as follows. Let and denote the block upper and lower triangular elimination matrices given by

 L=⎡⎢ ⎢ ⎢ ⎢⎣I−XSRX−1RRI−XNRX−1RRII⎤⎥ ⎥ ⎥ ⎥⎦,U=⎡⎢ ⎢ ⎢ ⎢⎣I−X−1RRXRS−X−1RRXRNIII⎤⎥ ⎥ ⎥ ⎥⎦. (3.10)

Then

 Z(A;B) =LEPTAPFU (3.11) =⎡⎢ ⎢ ⎢ ⎢⎣XRR0000XSSXSNASF0XNSXNNANF0AFSAFNAFF⎤⎥ ⎥ ⎥ ⎥⎦.

The matrix in creftype 3.11 is of the form creftype 3.3. This process of decoupling the redundant degrees of freedom of from the rest of the problem is referred to as the strong skeletonization of  with respect to . The resulting matrix is denoted by , as above.

Since the matrices and are block diagonal with respect to the partition , equation (3.11) can be re-written in terms of a block -like factorization of the original matrix :

 (3.12)

For notational convenience, let and denote the left and right skeletonization operators defined by

 V=PE−1L−1,W=U−1F−1PT, (3.13)

with the understanding that these matrices will be stored and used in factored form for computational efficiency. Moreover, the matrices are block triangular matrices with identities on the diagonal and hence their inverses can be trivially computed by toggling the sign of the nonzero off-diagonal blocks. With this shorthand, we can obtain an even more compact representation of  given by

 Z(A;B)=V−1AW−1. (3.14)
###### Remark 4.

The elimination matrices and are referred to as and in [54]. However, for clarity of identifying the block lower and upper triangular structure in and with respect to the partition , we have renamed and .

### 3.2 Low-rank approximation using proxy surfaces

For any box (particularly on the finest level of the octree), there are typically points in its far field region. Due to this fact, any algorithm which requires dense assembly of all blocks  and  for all boxes  in order to compute the interpolative decompositions in creftype 3.6 would result in an  computational complexity for constructing a compressed representation of . In this section, we briefly discuss an indirect, efficient, and accurate approach for constructing the interpolative decomposition of  and . More specific implementation details on this accelerated compression is given in Section 4. In what follows, in a slight abuse of notation, we will associate points in the original discretization of the boundary integral equation  with their index value . We will therefore say, interchangeably for clarity, that  or  or .

In order to achieve the linear-time speedup in compression of these off-diagonal blocks, we must make two additional assumptions regarding the discrete linear system creftype 2.8:

1. For each box , there exists a partition of the far field index set , with , such that for all and , the corresponding matrix entries in are given by . We also assume that the converse holds: for all and , the corresponding matrix entries in are given by . This is equivalent to assuming that for these well-separated regions  and , a smooth source-dependent quadrature rule can be used to accurately discretize the underlying integral equation. One could, in principle, require this assumption to hold for all points in . However, due to Schur complement updates arising from the strong skeletonization procedure, as we will see, this condition will be violated for some points in . This issue is discussed in further detail in Section 3.3.

2. The kernel is a linear combination of the Green’s function of a homogeneous elliptic partial differential equation (denoted by ) and its derivatives. In particular, if  denotes a smooth surface embedded in  then the interaction kernel is then assumed to satisfy the following conditions:

• For , the integrals of the kernel along  should be interpreted in a principal value sense.

• For , satisfies the underlying PDE for all . This assumption is true for both the Helmholtz single and double layer potentials, for example.

The above assumptions will be used in constructing efficient compression using Green’s identities and what are known as proxy surfaces.

We now give a brief overview of this compression procedure for the block . This block of the discretized integral equation maps charges located in  to potentials located in . In particularly, charges located in  induce a potential – outside of  – which satisfies the underlying homogeneous elliptic PDE, and therefore, this potential can be represented by an equivalent charge density distributed along a proxy surface  which encloses  but does not include points in . This effectively means that the block  can be split and factored as

 AFB ≈[AQB√DPMFγKγB√DB] (3.15)

Specific details of computing the above factorization are contained in Section 4.1, as well as a justification of such a factorization. The implication of the above factorization is the following: since both  and  are , and as will be shown later on  can be discretized using a modest number of points proportional to the size of  in the oscillatory regime, the matrix on the right in (3.15) has dimensions which are . This means that an interpolative decomposition on the columns of this matrix can be computed with  work. Denoting this compression as

 (3.16)

we have effectively computed a low-rank approximation of  at a cost of only  flops. The matrix  above is a permutation matrix that appropriately re-orders the columns according to those which have been chosen as skeleton columns and those which are redundant columns.

The compression of the dual matrix  can be done in a nearly identical manner after observing that all potentials which satisfy the underlying PDE in the box  can be represented using a charge density lying on the same proxy surface , and therefore an interpolative decomposition on the rows of this matrix can be computed in  time. More details, and a justification, are found in Section 4.1. Lastly, these two compressions can be done simultaneously in order to generate the same interpolation “” matrix for both  and  at a modest increase in rank. While not strictly necessary, this somewhat simplifies the subsequent factorization procedure and associated storage costs.

###### Remark 5.

In [42], the authors discuss compression via the proxy method applied to integral equations of the form

 α(x)σ(x)+b(x)∫ΓK(x−y)c(y)σ(y)da(y)=f(x). (3.17)

In particular, when constructing , they emphasize the need for including both matrices  and , where and are diagonal matrices with entries and , for , respectively. Handling non-uniform quadrature weights is equivalent to compressing a discretized version of creftype 3.17 with , , and .

### 3.3 The RS-S Algorithm

In this section, we provide a brief summary of the RS-S algorithm. We refer the reader to the original manuscript for a detailed description [54]. The RS-S algorithm proceeds by sequentially applying the strong skeletonization procedure to each box in the level-restricted tree. The boxes in the tree hierarchy are traversed in an upward pass, i.e. boxes at the finest level will be processed first followed by boxes at subsequent coarser levels. After each application of the strong skeletonization procedure, only the skeleton points associated with each box are retained for further processing. These will be referred to as the active degrees of freedom. Even when constructing near field and far field index sets of a box, only the active degrees of freedom contained in the respective regions are retained (other degrees of freedom have been deemed redundant and decoupled from the system). For boxes at coarser levels, the active degrees of freedom for each box is the union of the active degrees of freedom of each of its children boxes. After regrouping the active indices from all the children of boxes at coarser levels, the process of strong skeletonization can be applied to those boxes as well. This process is continued until there are no remaining active degrees of freedom in the far field region of any box at a given level or the algorithm reaches level 1 in the tree structure (for which the statement is trivially true since there are no boxes in the far field region of any box).

Similar to before, let and denote the left and right skeltonization operators associated with box defined in creftype 3.13. Suppose that the multi-level RS-S algorithm of [54] terminates at box ; let  denote the remaining active degrees of freedom in the domain. Let denote the permutation which orders the points in in a contiguous manner. Then the RS-S factorization of the matrix takes the form

 (3.18)

Here is the block diagonal matrix given by

 D=⎡⎢ ⎢ ⎢ ⎢ ⎢⎣XR1R1⋱XRMRMABtBt⎤⎥ ⎥ ⎥ ⎥ ⎥⎦, (3.19)

where are the redundant indices in . An approximate factorization of is readily obtained from the formula above and is given by

 A−1≈(W−11⋯W−1M)PtD−1PTt(V−1M⋯V−11). (3.20)

In the event that the matrix is positive definite, one can also compute the generalized square-root and the log-determinant of the matrix using the factorization above.

One crucial detail still remains to be resolved for the construction of this factorization: how best to obtain a partition of the active far field indices for each box such that the first condition in Section 3.2 is satisfied. In particular, it must be shown that even after several recursive applications of the strong skeletonization procedure the matrix entries corresponding to are still given by and are unaffected by Schur complements introduced during the elimination procedure. That this is even possible is not immediately obvious since the Schur complement update obtained by applying the strong skeletonization procedure to one box might potentially affect the interaction between a different box and its far interactions. Recall that the Schur complement constructed during the strong skeletonization procedure applied to box updates the interaction between points in the near field region of . Referring to Figure 5 in [54], using strong skeletonization to compress the operator with respect to points in updates the entries of the matrix corresponding to interactions of points contained in and since both of them are in the near field region of . However, is in the far field region of , and thus the entries of  (where denotes the far field region of ) will not correspond to the original matrix entries. However, all such interactions can be included in the partition  of .

A systematic way of addressing this issue and constructing the partition was presented in [54]. Suppose that is a sphere enclosing the box with radius equal to , where is the side-length of box . Let denote the set of indices in such that if then . This choice ensures that all the matrix entries in and its transpose are always of the form and , respectively. Furthermore, this choice also ensures that after merging the active degrees of freedom from boxes at finer levels, the partition still satisfies the constraint. These result are proven in [54] (Theorem 3.1, and Corollary 3.2 respectively). The additional buffer of placing the proxy surface separated by two boxes at the same level is crucial in proving those results.

Under mild assumptions on the ranks of the interactions between a box and its far field, the cost of computing the RS-S factorization , the cost of applying the compressed operator and its inverse , and the memory required () in the algorithm all scale linearly in the number of discretization points independent of the ambient dimension of the data. In particular, if the ranks of the far field blocks are for boxes on level  where , then with the implicit constant being a polynomial power of , where is the requested tolerance. These conditions are typically satisfied for discretizations of boundary integral equations for non-oscillatory problems.

###### Remark 6.

A naive implementation of looping over all Schur complements for updating matrix entries could result in an complexity for constructing the RS-S factorization. Letting  denote the Schur complement obtained from eliminating the redundant degrees of freedom in box , one can precompute pairs corresponding to Schur complements which would impact the matrix entries  and its transpose. Using this list, one can avoid having to loop over all Schur complement blocks and therefore retain the complexity for constructing the RS-S factorization.

## 4 Quadrature coupling in multiscale geometries

The primary source of error in constructing the RS-S factorization is in the compression of the matrix corresponding to far interactions (and its dual) using the proxy method [66, 65]. Recall that after using the proxy method for compression as discussed in Section 3.2, the interpolative decomposition is computed for a matrix whose entries are kernel evaluations . In the following section, we present three modifications to the standard RS-S algorithm of [54]:

1. Properly formulating the proxy surface compression procedure when the kernel  is obtained using the combined field integral representation;

2. Determining a properly sampled discretization of the proxy surface  using  points which is based on the box size in the tree hierarchy, so as to sufficiently sample the operator when the kernel  is oscillatory without excessive oversampling; and

3. Finally, constructing a partition capable of handling near field quadrature corrections for multiscale geometries.

We first turn to the accelerated proxy compression of matrix blocks obtained when using a combined field representation for the solution to a Dirichlet problem.

### 4.1 Proxy compression for combined field representations

As discussed in the introduction, in order to solve the exterior Dirichlet scattering problem for the Helmholtz equation, we employ an integral equation formulation whose kernel is given by the combined field potential :

 K(x,y)=(n(y)⋅∇yGk(x,y))−ikGk(x,y). (4.1)

After inverting the resulting integral equation

 12σ+KΓσ=f,on Γ, (4.2)

the solution to the boundary value problem is given as

 (4.3)

Our goal in the following is to justify the method of proxy compression, i.e. to show that the appropriate row or column spaces of submatrices  and , respectively, are spanned by proxy kernel matrices which only involve the kernel  evaluated at proxy points and the relevant set of sources or targets.

#### 4.1.1 Outgoing skeletonization

To this end, for a given box , a region and its boundary  can be chosen such that and that . See Figure 1 for a geometric setup of the situation. We therefore, by our earlier assumptions, have that and that satisfies the underlying PDE (i.e. Helmholtz equation) in  for each . For this box , now consider what we will call the associated exterior proxy boundary value problem:

 (Δ+k2)v =0, in R3∖D, (4.4) v =KBτ, on γ, limr→∞r(∂v∂r−ikv) =0,

where  is Helmholtz potential due to any function  supported on the piece of  contained inside box , i.e.

 KBτ(x)=∫Γ∩BK(x,y)τ(y)da(y),for x∈γ. (4.5)

The above boundary value problem is clearly an exterior Dirichlet problem for the Helmholtz equation, and can be solved using an integral equation method by first representing  as  for some unknown density  defined along . (Note that in practice,  is usually taken to be the surface of a sphere but there is no mathematical reason why this must be the case.)

The solution to the boundary value problem (4.4) is unique and can be formally obtained, as was for  in the introduction, as

 v=Kγ(12I+Kγγ)−1KBτ, (4.6)

where it is understood that the inverse operator in the middle is a map from  to , i.e. that  is a map from  to  and is interpreted in the proper principal value sense. Finally, consider a discretization and  embedding [11] of the above form of  which maps sources at a collection of  with strengths  to their potentials  at target locations :

 √DPv (4.7) =√DPMPγKγB√DB(√DBt),

where  denotes the elementwise Hadamard product of the kernel matrix  with a matrix  of quadrature corrections and the matrices  contain smooth quadrature corrections along their diagonal. Upon further inspection, however, assuming that the discretization of the above integral equation along  was suitably accurate and that the source and target locations  and  were chosen to be the same as in the original discretization of , we have that it must be true that

 KPB=MPγKγB (4.8)

due to the uniqueness of the exterior Helmholtz Dirichlet problem (i.e. if the boundary data in (4.4) was chosen to agree with that induced by sources located at discretization points , then the solution  at  must agree with the potential generated via multiplication by ). Therefore, due to the partition of the far field , it must also be true that (after a suitable permutation of rows)

 (4.9)

A note on dimensions of the above matrices: recall that, by assumption,  and therefore the bulk of the discretization nodes in  are contained in , i.e. that . This implies that the matrix on the very right above has dimensions which are , where  denotes the number of points used to discretize the proxy surface. Choosing  is discussed in the following section. Next we detail how to use the above factorization to compute a column skeletonization of the original submatrix .

First, a column skeletonization of the matrix on the right in (4.9) is performed which yields a decomposition of  into redundant and skeleton points, i.e. :

 (4.10)

where  is a permutation of the matrix performing a reordering of columns according to the split , and in our continued abuse of notation, it is implied that this is an -accurate approximation. Inserting (4.10) into (4.9), we have

 AFB (4.11)

Again, due to the uniqueness of the exterior Helmholtz boundary value problem in (4.4), it must be true that , and therefore we have computed a low-rank approximation to the original matrix  since

 AFB =[AQS√DPKPS√DS][TSRI]PB (4.12) =[AQSAPS][TSRI]PB =AFS[TSRI]PB.
###### Remark 7.

In the publicly available software associated with [54], the authors use a slightly simpler method to obtain interpolative decompositions of matrices such as  via proxy compression. It appears that in practice, their method doesn’t seem to affect the accuracy of the computed solution on simple geometries for requested tolerances of  or greater. Our discussion above was meant to be a fully self-contained treatment of proxy compression, as was implemented in our solver.

#### 4.1.2 Incoming skeletonization

The above algorithm for compressing  can immediately be extended to the row compression of the dual block  by considering instead the associated interior proxy boundary value problem (analogous to (4.4)). We refer to this procedure as calculating the incoming skeleton nodes for a box . In this case, unfortunately, it may be the case that the wavenumber

is such that it is an interior Dirichlet eigenvalue of the Laplace operator inside

, and therefore (4.4) is not uniquely solvable. However, this doesn’t matter for our purposes since all we hope to capture in the row compression is an accurate approximation to the row space of the associated matrix block. Since only the existence of the matrix  is used – it is never explicitly formed or is any piece of it inverted – the calculation is nearly identical and the row skeletonization can be computed by only forming and approximating the matrices  and .

#### 4.1.3 Simultaneous skeletonization

Lastly, as mentioned in Section 3.2, it is sometimes useful to choose identical incoming and outgoing skeleton points for a particular box. If this is desired, then a single interpolative decomposition can be performed:

 ⎡⎢ ⎢ ⎢ ⎢ ⎢ ⎢⎣AQBATBQKγB√DBKTBγ√DTB⎤⎥ ⎥ ⎥ ⎥ ⎥ ⎥⎦=⎡⎢ ⎢ ⎢ ⎢ ⎢ ⎢⎣AQSATSQKγS√DSKTSγ√DTS⎤⎥ ⎥ ⎥ ⎥ ⎥ ⎥⎦[TSRI]PB. (4.13)

It is easily verified that the matrix  above compresses both  and . In practice, the rank used in the above low-rank approximation is slightly larger than if the blocks had been compressed separately (but, the subsequent bookkeeping and factorization is slightly simpler and uses less storage).

### 4.2 Proxy surface discretization

Recall that when using the proxy surface compression technique, the matrix  in (4.8) must accurately capture the subspace of potentials supported on the proxy surface, and likewise,  must accurately capture the subspace of potentials induced by densities on the proxy surface . We therefore must take some care when choosing how to select points  on .

For oscillatory problems, it turns out that the ranks of tend to grow with the size of the box measured in number of wavelengths. This growth in rank can be attributed to the oscillatory nature of the Green’s function , which, in turn, necessitates an increased number of points to sample the proxy for accurately computing the implied integrals and subsequent low-rank approximation. One can use standard multipole estimates for appropriately choosing  to enable accurate compression of the far field blocks [21, 36].

In particular, in the oscillatory regime, the rank of the far field blocks tend to scale quadratically with the size of the box measured in wavelengths. Note that this does not affect the overall complexity of the method. In the limit , when the size of the problem as measured in wavelengths is fixed, is still a constant and typically . However, in the regime where the boundary is sampled at fixed number of points per wavelength, then the cost of the factorization typically grows like since . This is the expected behavior of fast direct solvers which are based on low-rank approximation, as demonstrated in [12, 30, 50]. We numerically verify this behavior for our approach as well in Section 5.

###### Remark 8.

When finding the skeleton and redundant points in a box, the far interaction matrices and the blocks affected by Schur complements are simultaneously compressed. Due to presence of the near blocks which contain the Schur complements, in practice, for small wave numbers, it is possible to use fewer points than predicted by standard multipole estimates.

### 4.3 Far field partitioning

Recall that a necessary criterion for determining the split for a box is that the interactions corresponding to points and the points are of the form . There are two mechanisms by which the condition would be violated: (1) if from the quadrature perspective, then the original matrix entry has target dependent weights and is of the form , or (2) if the matrix entries have been updated due to Schur complements arising from a previous application of the strong skeletonization procedure to a different box.

Depending on the subdivision criterion used for generating the octree, the near interactions associated with the locally-corrected quadrature may spill over into the far field region of the box, i.e. for a point , there may exist , where is the far field region associated with box . In this situation, the partition can be modified to include all such near quadrature corrections of point that spill over into the far region of the corresponding box, i.e. if for any then even if it would not have been affected by previously computed Schur complement blocks. In practice, this situation is rarely encountered and typically happens for at most boxes even in multiscale geometries.

In the original RS-S factorization, the second issue is handled by defining the index set to consist of all points in all boxes  which are not adjacent to , but which are separated by exactly one box-width in the  sense. We propose a minor modification to include only boxes which have been affected by previous applications of the strong skeletonization procedure which can be precomputed. To allow for this modification, we also need to set the proxy surface to be the sphere of radius centered at the box center of as opposed to the in the original RS-S factorization.

## 5 Numerical experiments

In this section, we illustrate the performance of our approach. For examples in Sections 5.2 and 5.1, we consider a wiggly torus as the geometry where the boundary is parameterized by as

 X(u,v)=⎡⎢⎣1.2⋅(2+cos(v)+0.25cos(5u))cos(u)(2+cos(v)+0.25cos(5u))sin(u)1.7⋅sin(v)⎤⎥⎦, (5.1)

see Figure 2. In Section 5.3, the geometry is a multiscale plane where the ratio of the largest triangle to the smallest one is . For all three examples, we consider the solution of the exterior Dirichlet problem creftype 2.2 using the combined field representation

 u(x)=Dk[σ](x)−ikSk[σ](x),x∈R3∖Ω, (5.2)

which on imposing the Dirichlet boundary conditions results in the following integral equation for the unknown density ,

 σ(x)2+Dk[σ](x)−ikSk[σ](x)=f(x),x∈Γ. (5.3)

Here is the prescribed Dirichlet data,  is the Helmholtz wavenumber, and  are interpreted using their on-surface principal value senses. Let denote the corresponding wavelength.

Suppose that creftype 5.3 is discretized using the Nyström method where the layer potentials are evaluated using the locally corrected quadrature methods discussed in [32]. We assume that the surface is given by the disjoint union of patches , , where each patch is parametrized by a non-degenerate chart , with being the standard right simplex. We further assume that the charts , their derivative information, the density and the data are discretized using order Vioreanu-Rokhlin nodes on .

Let be the total number of discretization points. Let denote the tolerance used for computing the factorization (i.e. the tolerance requested in each interpolative decomposition performed). For a patch , let denote the diameter of the patch and let denote the effective sampling rate on measured in points per wavelength. Let