1 Introduction
Integral equation formulations lead to powerful methods for the solution of boundary value problems governed by the partial differential equations (PDEs) of classical mathematical physics. The corresponding freespace Green’s functions are wellknown, and in the absence of source terms, boundary integral formulations are restricted to the surface of the domain, thereby reducing the dimensionality of the problem. There has been a resurgence of interest in these methods over the past few decades because of the availability of fast algorithms for applying the dense matrices which arise after discretization. These include hierarchical schemes such as fast multipole methods (FMMs), panel clustering methods,
matrix methods and multigrid variants, as well as FFTbased schemes such as the method of local corrections, precorrected FFT methods, etc. The literature on such methods is vast and we point only to a few review articles and monographs [9, 36, 5, 50, 46]. Assumingis the number of degrees of freedom used in sampling the surface and the corresponding charge and/or dipole densities, and
is the system matrix obtained after the application of a suitable quadrature rule to the chosen integral representation, these algorithms permitto be applied to a vector in
or time. When the linear system is wellconditioned, this generally allows for the rapid iterative solution of largescale problems in essentially optimal time.Despite these advances, there are several tasks where iterative solvers are not satisfactory. The obvious case is in problems which they themselves are illconditioned (such as highfrequency scattering problems). An equally important area where direct methods are preferred is in inverse problems, or in any other setting where one needs to solve the same system matrix with multiple righthand sides. Finally, fast direct solvers are very useful when exploring lowrank perturbations of the geometry and, hence, the system matrix. Updating the solution in such cases requires only a few applications of or a fast update of the inverse itself [31, 53].
In the last few years, several algorithmic ideas have emerged which permit the construction of a compressed approximation of at a cost of the order or , for modest . In this paper, we describe such a scheme, which we refer to as the FMMLU method. It uses FMMtype hierarchical compression strategies to rapidly compute an factorization of the system matrix. In this manuscript, we apply the method to adaptive, multiscale surface discretizations of boundary integral equations coupled to highorder accurate quadratures; this leads to efficient, highfidelity solvers for geometrically intricate models. We will concentrate here boundary value problems for the Helmholtz equation at low to moderate frequencies, but the scheme is equally applicable to many other families of boundary value problems (e.g. Laplace, Stokes, Maxwell, etc.).
The paper is organized as follows: Section 2 provides a brief derivation of the boundary integral equations of interest, and an overview of the Nyström method for their discretization. in Section 3 we review the notion of strong skeletonization for matrices, i.e. the algebraic analog of FMMbased compression [23, 55, 33, 34]. This was used by Minden, Ho, Damle and Ying [54] to develop the strong recursive skeletonization factorization method (RSS), of which our algorithm is a variant. We then present in Section 4 efficient quadrature coupling and our modifications of strong skeletonization, leading to the FMMLU scheme. (Closely related are the inverse FMM approach [2, 25] and the matrix formalism [8, 7].) Section 5 contains several numerical examples demonstrating the asymptotic scaling of our algorithm, as well as the accuracy obtained in solving realistic boundary value problems. In Section 6, we provide some guidelines regarding the current generation of fast direct solvers, discuss the limitations of the present scheme, and speculate about potential avenues for future improvement.
Remark 1.
The algorithm described in the present paper has several new features. First, it was designed with complex, multiscale geometries in mind as well as highorder accurate surface discretization and quadrature. We make use of an adaptive octree data structure to keep track of analytical estimates concerning the maximal rank of wellseparated blocks which could be at different levels in the tree hierarchy. This affects the manipulation of Schur complements in the
factorization, and also requires the development of quadrature machinery which permits onthefly extraction of near field matrix elements when targets are onorcloseto a surface triangle where accurate approximation of a layer potential requires some care. The key ingredient here is the use of generalized Gaussian quadrature [15, 14]for selfinteractions on surface panels and precomputed hierarchical interpolation matrices to accelerate adaptive integration in the near field.
Related work
Fast direct solvers for integral equations have their origin in the solution of twopoint boundary value problems in one dimension [35, 58, 1, 19, 64]. There is an extensive literature in this area and the methods are, by now, quite mature. In higher dimensions, a major distinction in fast algorithms for both hierarchically compressible matrices and their inverses concerns which blocks are left intact and which are subject to compression. For the sake of simplicity, let us assume an octree data structure has been imposed on a surface discretiztion and refined uniformly. Let us also assume that the unknowns are ordered so that the points in each (nonempty) leaf node, denoted by , are contiguous. If all block submatrices corresponding to interactions between leaf nodes and , with , are (hierarchically) compressed, using the method of [22, 45] leads to the recursive skeletonization methods of [29, 41, 51, 43, 49].
These methods are, more or less, optimal for boundary integral equations in two dimensions but not in three dimensions. In three dimensions, the interaction between neighboring leaf nodes leads to relatively highrank block matrices. In the literature, this is referred to as weak admissibility, weak skeletonization, or matrix compression [38, 40, 9]. To overcome this obstacle, borrowing from the language of fast multipole methods, one can instead choose to compress only those block matrices corresponding to wellseparated interactions, which are known a priori to be lowrank to any fixed precision (based on an analysis of the underlying PDE or integral equation). Wellseparated here means that leaf nodes and are separated by a box of the same size. In the literature, this is referred to as strong admissibility, strong skeletonization, mosaicskeletonization, or matrix compression [2, 8, 39, 47, 54, 25].
Fast solvers for boundary integral equations in three dimensions using weak skeletonization without additional improvements have led to solvers whose compression/factorization costs are of the order , with subsequent applications of the inverse requiring only work. Fortunately, the implicit constants associated with these asymptotics are relatively small [41]. A variety of improvements have been developed to reduce the factorization costs by introducing auxiliary variables in a geometricallyaware fashion [42, 24].
Strong skeletonizationbased schemes are more complicated to handle in terms of data structures and linear algebraic manipulation, but have significant advantages. First, a large body of work by Hackbusch and collaborators has led to a rigorous algebraic theory for hierarchically compressible matrices and fast solvers with linear or quasilinear complexity [6, 10, 7]. The constants implicit in the asymptotic scaling with this approach, however, are quite large. To improve performance, the inverse fast multipole method (IFMM) was introduced by Ambikasaran, Coulier, Darve, and Pouransari [2, 25], and the strong skeletonization method was introduced by Minden, Ho, Damle and Ying [54], which, as noted above, we will largely follow here.
Finally, we should note that an early fast solver for volume integral equations in two dimensions was presented in [20] and that related fast solvers have been developed using direct discretization of the PDE rather than integral equations [63, 28, 57]. Closely related to our work here is that of [59] which makes use of strong admissibility in the sparse matrix setting.
2 Problem setup
Let be a bounded region in with smooth boundary . Given a kernel , consider the following integral equation for ,
(2.1) 
Here is a given function on the boundary, is an unknown function to be determined, and a constant. Such integral equations (and their analogs for the vectorvalued case) naturally arise in the solution of boundary value problems for the Laplace, Helmholtz, Yukawa, Maxwell, and Stokes equations, just to name a few. In these settings, the kernel is typically related to the Green’s function (or its derivatives) of the corresponding partial differential equation.
For example, consider the exterior Dirichlet boundary value problem for the Helmholtz equation with wave number and boundary data ; the Helmholtz potential defined in satisfies
(2.2)  
Let denote the freespace Green’s function for the Helmholtz equation given by
(2.3) 
and let
(2.4) 
where is the outward normal at . The above kernel is the kernel of what is known as the combined field representation. Suppose that satisfies (2.1) with and as defined above, then the potential given by
(2.5) 
is the solution to the exterior Dirichlet problem for the Helmholtz equation in (2.2). We will often denote the above formula more succinctly as , where is the layer potential operator along with kernel . A more detailed derivation and associated proofs for this fact can be found in [44], for example.
The integral equation creftype 2.1 can be discretized with highorder accuracy using (for example) a suitable Nyström method [3] resulting in the following linear system
(2.6) 
Here and are the quadrature nodes and weights, respectively, while is an approximation to the true value . The kernel is often singular at the origin and smooth away from the origin. When using highorder discretization methods, it is often possible to use quadrature weights independent of the target location, except possibly for a local neighborhood of targets close to each source, i.e. for all , for a suitably defined wellseparated region . Such quadratures are often referred to as locallycorrected quadratures for obvious reasons. Many existing quadrature methods such as coordinate transformation methods, singularity subtraction methods, quadrature by expansion, ErichsenSauter rules, and generalized Gaussian methods combined with adaptive integration can be used as locallycorrected quadratures [18, 17, 48, 27, 56, 60, 61, 67, 13, 14, 12, 15, 27, 32, 62]. In this setting, the discrete linear system creftype 2.6 then takes the form
(2.7) 
We further rescale the above equation so that the unknowns are for which the discrete linear system becomes
(2.8) 
The scaling by the square root of the weights in the above equation formally embeds the discrete solution in and results in the discretized operators (including subblocks of the matrices) to have norms and condition numbers which are close to (and converge to) those of the continuous versions of the corresponding operators [11]. In the following work, we restrict our attention to the fast solution of linear systems of the form creftype 2.8.
Remark 2.
In many locallycorrected quadrature methods, in order to accurately compute far interactions the underlying discretization may require some additional oversampling to meet an a priori specified precision requirement. See [32] for a thorough discussion. In short, let , , , for , denote the set of oversampled discretization nodes, the corresponding quadrature weights for integrating smooth functions on the surface, and the interpolated density at the oversampled discretization nodes, respectively. For triangulated/quadrangulated surfaces, these can be obtained via local polynomial interpolation of the chart information and the density. The linear system (without the square root scaling) then takes the form
(2.9) 
The need for oversampling often arises when using loworder discretizations. The extension of our approach, and other existing approaches, to discretizations which require oversampling for far interactions is currently being pursued and the results will be reported at a later date. The main ideas are similar, but constructing an efficient algorithm requires addressing several implementation details.
3 Strong skeletonization factorization
The basic structure of the FMMLU factorization is closely related to the strong skeletonization factorization (RSS) introduced in [54]. In this section, we briefly review key elements of RSS. To the extent possible, we use the same notation as in [54] to clearly highlight our modifications to their approach.
Suppose that all the discretization points , are contained in a cube . We will superimpose on a hierarchy of refinements as follows: the root of the tree is itself and defined as level 0. Level is obtained from level recursively by subdividing each cube at level into eight equal parts as the number of points in that cube at level is greater than some specified parameter . The eight cubes created by subdividing a larger cube are referred to as children of the parent cube. When the refinement has terminated, is covered by disjoint childless boxes at various levels of the hierarchy (depending on the local density of the given points). These childless boxes are referred to as leaf boxes. For any box in the hierarchy, the near field region of consists of other boxes at the same level that touch , and the far field region is the remainder of the domain.
For simplicity, we assume that the above described octree satisfies a standard restriction – namely, that two leaf nodes which share a boundary point must be no more than one refinement level apart. In creating the adaptive data structure as described above, it is very likely that this levelrestriction criterion is not met. Fortunately, assuming that the tree constructed to this point has leaf boxes and that its depth is of the order , it is straightforward to enforce the levelrestriction in a second step requiring effort with only a modest amount of additional refinement [26].
Remark 3.
The near field region and far field region of a box in the octree hierarchy is almost always different from the near region and far regions of source and target locations associated with the locallycorrected quadrature methods in (2.8). In practice, for almost all targets, the near field region of the locallycorrected quadrature corrections is a subset of near field region of the leaf box containing the target. In the event that this condition is violated, the RSS algorithm of [54] would require additional modifications to handle the associated quadrature corrections discussed in Section 4.3.
3.1 Strong skeletonization
The idea of strong skeletonization was recently introduced in [54] and extends the idea of using the interpolative decomposition to globally compress a lowrank matrix to the situation in which only a particular offdiagonal block is lowrank. In the context of solving a discretized boundary integral equation, the offdiagonal lowrank block is a result of far field interactions via the kernel (e.g. Green’s function) of the integral equation. We briefly recall the standard interpolative decomposition [45] and the form of strong skeletonization as presented in [54].
Lemma 1 (Interpolative decomposition).
Given a matrix with rows indexed by and columns indexed by , an accurate interpolative decomposition (ID) of is a partitioning of into a set of socalled skeleton columns denoted by and redundant columns , and a construction of a corresponding interpolation matrix such that
(3.1) 
i.e. the redundant columns are wellapproximated, to the required tolerance , by linear combinations of the skeleton columns. Equivalently, after an application of a permutation matrix such that , the interpolative decomposition results in the accurate lowrank factorization with the error estimate,
(3.2) 
The norms above can be taken to be the standard induced spectral norm.
The ID is most robustly computed using the strong rankrevelaing QR factorization of Gu and Eisenstat [37]. However, in this work we use a standard greedy columnpivoted QR [52]. While both algorithms have similar computational complexity when , i.e. , the greedy column pivoted QR tends to have better computational performance.
Next, consider a threebythree block matrix and suppose that is a partition of the index set, with , such that and , i.e.
(3.3) 
Assuming that is invertible, then using block Gaussian elimination the matrix can be decoupled from the rest of the matrix as follows
(3.4) 
where is the only nonzero block of the matrix that has been modified. The matrix is often referred to as the Schur complement update.
Suppose now that the matrix arises from the discretization of an integral operator. Further suppose that is a set of indices of points contained in a box in the octree, that are the indices of the set of points contained in the near field region of , and that are the indices of points contained in the far field region of . Using an appropriate permutation matrix , we can obtain the following block structure for :
(3.5) 
The blocks corresponding to interactions between points in and its far field, i.e. and , are assumed to be numerically lowrank and can therefore be compressed using interpolative decompositions. As before, we partition into a collection of redundant points and a set of skeleton points such that, up to an appropriate permutation of rows and columns (which can be absorbed into the permutation matrix above), we have
(3.6) 
Note that in the relationship above the same interpolation matrix is used to compress both and . Clearly this is possible when the kernel of the associate integral equation is symmetric. If the kernel is not symmetric, empirically the same matrix can be used at the cost of a small increase in . Using the same matrix simplifies various subsequent linear algebra manipulations, but strictly speaking, is not necessary. Different interpolation matrices can be used for each of and . Using different compression matrices will result in smaller skeleton sets at the cost of more linear algebraic bookkeeping.
Further splitting the indices in creftype 3.5, and combining with creftype 3.6, we get
(3.7) 
Since the redundant rows and columns of the interaction between points in and can be wellapproximated by the corresponding rows and columns of the interactions between points in and , we can decouple points from the far field points as follows. Let , denote the elimination matrices defined on the partition as
(3.8) 
Then
(3.9) 
where the notation is used to indicate blocks of the above matrix whose entries are different from the entries of the original matrix in (3.7). We also note that the matrices and are, in fact, block diagonal when viewed over the partition , and therefore the above factorization can be considered a type of elimination.
Now, assuming again that the block is invertible, we can use it as a pivot block to completely decouple the redundant indices from the rest of the problem as follows. Let and denote the block upper and lower triangular elimination matrices given by
(3.10) 
Then
(3.11)  
The matrix in creftype 3.11 is of the form creftype 3.3. This process of decoupling the redundant degrees of freedom of from the rest of the problem is referred to as the strong skeletonization of with respect to . The resulting matrix is denoted by , as above.
Since the matrices and are block diagonal with respect to the partition , equation (3.11) can be rewritten in terms of a block like factorization of the original matrix :
(3.12) 
For notational convenience, let and denote the left and right skeletonization operators defined by
(3.13) 
with the understanding that these matrices will be stored and used in factored form for computational efficiency. Moreover, the matrices are block triangular matrices with identities on the diagonal and hence their inverses can be trivially computed by toggling the sign of the nonzero offdiagonal blocks. With this shorthand, we can obtain an even more compact representation of given by
(3.14) 
Remark 4.
The elimination matrices and are referred to as and in [54]. However, for clarity of identifying the block lower and upper triangular structure in and with respect to the partition , we have renamed and .
3.2 Lowrank approximation using proxy surfaces
For any box (particularly on the finest level of the octree), there are typically points in its far field region. Due to this fact, any algorithm which requires dense assembly of all blocks and for all boxes in order to compute the interpolative decompositions in creftype 3.6 would result in an computational complexity for constructing a compressed representation of . In this section, we briefly discuss an indirect, efficient, and accurate approach for constructing the interpolative decomposition of and . More specific implementation details on this accelerated compression is given in Section 4. In what follows, in a slight abuse of notation, we will associate points in the original discretization of the boundary integral equation with their index value . We will therefore say, interchangeably for clarity, that or or .
In order to achieve the lineartime speedup in compression of these offdiagonal blocks, we must make two additional assumptions regarding the discrete linear system creftype 2.8:

For each box , there exists a partition of the far field index set , with , such that for all and , the corresponding matrix entries in are given by . We also assume that the converse holds: for all and , the corresponding matrix entries in are given by . This is equivalent to assuming that for these wellseparated regions and , a smooth sourcedependent quadrature rule can be used to accurately discretize the underlying integral equation. One could, in principle, require this assumption to hold for all points in . However, due to Schur complement updates arising from the strong skeletonization procedure, as we will see, this condition will be violated for some points in . This issue is discussed in further detail in Section 3.3.

The kernel is a linear combination of the Green’s function of a homogeneous elliptic partial differential equation (denoted by ) and its derivatives. In particular, if denotes a smooth surface embedded in then the interaction kernel is then assumed to satisfy the following conditions:

For , the integrals of the kernel along should be interpreted in a principal value sense.

For , satisfies the underlying PDE for all . This assumption is true for both the Helmholtz single and double layer potentials, for example.

The above assumptions will be used in constructing efficient compression using Green’s identities and what are known as proxy surfaces.
We now give a brief overview of this compression procedure for the block . This block of the discretized integral equation maps charges located in to potentials located in . In particularly, charges located in induce a potential – outside of – which satisfies the underlying homogeneous elliptic PDE, and therefore, this potential can be represented by an equivalent charge density distributed along a proxy surface which encloses but does not include points in . This effectively means that the block can be split and factored as
(3.15)  
Specific details of computing the above factorization are contained in Section 4.1, as well as a justification of such a factorization. The implication of the above factorization is the following: since both and are , and as will be shown later on can be discretized using a modest number of points proportional to the size of in the oscillatory regime, the matrix on the right in (3.15) has dimensions which are . This means that an interpolative decomposition on the columns of this matrix can be computed with work. Denoting this compression as
(3.16) 
we have effectively computed a lowrank approximation of at a cost of only flops. The matrix above is a permutation matrix that appropriately reorders the columns according to those which have been chosen as skeleton columns and those which are redundant columns.
The compression of the dual matrix can be done in a nearly identical manner after observing that all potentials which satisfy the underlying PDE in the box can be represented using a charge density lying on the same proxy surface , and therefore an interpolative decomposition on the rows of this matrix can be computed in time. More details, and a justification, are found in Section 4.1. Lastly, these two compressions can be done simultaneously in order to generate the same interpolation “” matrix for both and at a modest increase in rank. While not strictly necessary, this somewhat simplifies the subsequent factorization procedure and associated storage costs.
Remark 5.
In [42], the authors discuss compression via the proxy method applied to integral equations of the form
(3.17) 
In particular, when constructing , they emphasize the need for including both matrices and , where and are diagonal matrices with entries and , for , respectively. Handling nonuniform quadrature weights is equivalent to compressing a discretized version of creftype 3.17 with , , and .
3.3 The RSS Algorithm
In this section, we provide a brief summary of the RSS algorithm. We refer the reader to the original manuscript for a detailed description [54]. The RSS algorithm proceeds by sequentially applying the strong skeletonization procedure to each box in the levelrestricted tree. The boxes in the tree hierarchy are traversed in an upward pass, i.e. boxes at the finest level will be processed first followed by boxes at subsequent coarser levels. After each application of the strong skeletonization procedure, only the skeleton points associated with each box are retained for further processing. These will be referred to as the active degrees of freedom. Even when constructing near field and far field index sets of a box, only the active degrees of freedom contained in the respective regions are retained (other degrees of freedom have been deemed redundant and decoupled from the system). For boxes at coarser levels, the active degrees of freedom for each box is the union of the active degrees of freedom of each of its children boxes. After regrouping the active indices from all the children of boxes at coarser levels, the process of strong skeletonization can be applied to those boxes as well. This process is continued until there are no remaining active degrees of freedom in the far field region of any box at a given level or the algorithm reaches level 1 in the tree structure (for which the statement is trivially true since there are no boxes in the far field region of any box).
Similar to before, let and denote the left and right skeltonization operators associated with box defined in creftype 3.13. Suppose that the multilevel RSS algorithm of [54] terminates at box ; let denote the remaining active degrees of freedom in the domain. Let denote the permutation which orders the points in in a contiguous manner. Then the RSS factorization of the matrix takes the form
(3.18) 
Here is the block diagonal matrix given by
(3.19) 
where are the redundant indices in . An approximate factorization of is readily obtained from the formula above and is given by
(3.20) 
In the event that the matrix is positive definite, one can also compute the generalized squareroot and the logdeterminant of the matrix using the factorization above.
One crucial detail still remains to be resolved for the construction of this factorization: how best to obtain a partition of the active far field indices for each box such that the first condition in Section 3.2 is satisfied. In particular, it must be shown that even after several recursive applications of the strong skeletonization procedure the matrix entries corresponding to are still given by and are unaffected by Schur complements introduced during the elimination procedure. That this is even possible is not immediately obvious since the Schur complement update obtained by applying the strong skeletonization procedure to one box might potentially affect the interaction between a different box and its far interactions. Recall that the Schur complement constructed during the strong skeletonization procedure applied to box updates the interaction between points in the near field region of . Referring to Figure 5 in [54], using strong skeletonization to compress the operator with respect to points in updates the entries of the matrix corresponding to interactions of points contained in and since both of them are in the near field region of . However, is in the far field region of , and thus the entries of (where denotes the far field region of ) will not correspond to the original matrix entries. However, all such interactions can be included in the partition of .
A systematic way of addressing this issue and constructing the partition was presented in [54]. Suppose that is a sphere enclosing the box with radius equal to , where is the sidelength of box . Let denote the set of indices in such that if then . This choice ensures that all the matrix entries in and its transpose are always of the form and , respectively. Furthermore, this choice also ensures that after merging the active degrees of freedom from boxes at finer levels, the partition still satisfies the constraint. These result are proven in [54] (Theorem 3.1, and Corollary 3.2 respectively). The additional buffer of placing the proxy surface separated by two boxes at the same level is crucial in proving those results.
Under mild assumptions on the ranks of the interactions between a box and its far field, the cost of computing the RSS factorization , the cost of applying the compressed operator and its inverse , and the memory required () in the algorithm all scale linearly in the number of discretization points independent of the ambient dimension of the data. In particular, if the ranks of the far field blocks are for boxes on level where , then with the implicit constant being a polynomial power of , where is the requested tolerance. These conditions are typically satisfied for discretizations of boundary integral equations for nonoscillatory problems.
Remark 6.
A naive implementation of looping over all Schur complements for updating matrix entries could result in an complexity for constructing the RSS factorization. Letting denote the Schur complement obtained from eliminating the redundant degrees of freedom in box , one can precompute pairs corresponding to Schur complements which would impact the matrix entries and its transpose. Using this list, one can avoid having to loop over all Schur complement blocks and therefore retain the complexity for constructing the RSS factorization.
4 Quadrature coupling in multiscale geometries
The primary source of error in constructing the RSS factorization is in the compression of the matrix corresponding to far interactions (and its dual) using the proxy method [66, 65]. Recall that after using the proxy method for compression as discussed in Section 3.2, the interpolative decomposition is computed for a matrix whose entries are kernel evaluations . In the following section, we present three modifications to the standard RSS algorithm of [54]:

Properly formulating the proxy surface compression procedure when the kernel is obtained using the combined field integral representation;

Determining a properly sampled discretization of the proxy surface using points which is based on the box size in the tree hierarchy, so as to sufficiently sample the operator when the kernel is oscillatory without excessive oversampling; and

Finally, constructing a partition capable of handling near field quadrature corrections for multiscale geometries.
We first turn to the accelerated proxy compression of matrix blocks obtained when using a combined field representation for the solution to a Dirichlet problem.
4.1 Proxy compression for combined field representations
As discussed in the introduction, in order to solve the exterior Dirichlet scattering problem for the Helmholtz equation, we employ an integral equation formulation whose kernel is given by the combined field potential :
(4.1) 
After inverting the resulting integral equation
(4.2) 
the solution to the boundary value problem is given as
(4.3) 
Our goal in the following is to justify the method of proxy compression, i.e. to show that the appropriate row or column spaces of submatrices and , respectively, are spanned by proxy kernel matrices which only involve the kernel evaluated at proxy points and the relevant set of sources or targets.
4.1.1 Outgoing skeletonization
To this end, for a given box , a region and its boundary can be chosen such that and that . See Figure 1 for a geometric setup of the situation. We therefore, by our earlier assumptions, have that and that satisfies the underlying PDE (i.e. Helmholtz equation) in for each . For this box , now consider what we will call the associated exterior proxy boundary value problem:
(4.4)  
where is Helmholtz potential due to any function supported on the piece of contained inside box , i.e.
(4.5) 
The above boundary value problem is clearly an exterior Dirichlet problem for the Helmholtz equation, and can be solved using an integral equation method by first representing as for some unknown density defined along . (Note that in practice, is usually taken to be the surface of a sphere but there is no mathematical reason why this must be the case.)
The solution to the boundary value problem (4.4) is unique and can be formally obtained, as was for in the introduction, as
(4.6) 
where it is understood that the inverse operator in the middle is a map from to , i.e. that is a map from to and is interpreted in the proper principal value sense. Finally, consider a discretization and embedding [11] of the above form of which maps sources at a collection of with strengths to their potentials at target locations :
(4.7)  
where denotes the elementwise Hadamard product of the kernel matrix with a matrix of quadrature corrections and the matrices contain smooth quadrature corrections along their diagonal. Upon further inspection, however, assuming that the discretization of the above integral equation along was suitably accurate and that the source and target locations and were chosen to be the same as in the original discretization of , we have that it must be true that
(4.8) 
due to the uniqueness of the exterior Helmholtz Dirichlet problem (i.e. if the boundary data in (4.4) was chosen to agree with that induced by sources located at discretization points , then the solution at must agree with the potential generated via multiplication by ). Therefore, due to the partition of the far field , it must also be true that (after a suitable permutation of rows)
(4.9) 
A note on dimensions of the above matrices: recall that, by assumption, and therefore the bulk of the discretization nodes in are contained in , i.e. that . This implies that the matrix on the very right above has dimensions which are , where denotes the number of points used to discretize the proxy surface. Choosing is discussed in the following section. Next we detail how to use the above factorization to compute a column skeletonization of the original submatrix .
First, a column skeletonization of the matrix on the right in (4.9) is performed which yields a decomposition of into redundant and skeleton points, i.e. :
(4.10) 
where is a permutation of the matrix performing a reordering of columns according to the split , and in our continued abuse of notation, it is implied that this is an accurate approximation. Inserting (4.10) into (4.9), we have
(4.11)  
Again, due to the uniqueness of the exterior Helmholtz boundary value problem in (4.4), it must be true that , and therefore we have computed a lowrank approximation to the original matrix since
(4.12)  
Remark 7.
In the publicly available software associated with [54], the authors use a slightly simpler method to obtain interpolative decompositions of matrices such as via proxy compression. It appears that in practice, their method doesn’t seem to affect the accuracy of the computed solution on simple geometries for requested tolerances of or greater. Our discussion above was meant to be a fully selfcontained treatment of proxy compression, as was implemented in our solver.
4.1.2 Incoming skeletonization
The above algorithm for compressing can immediately be extended to the row compression of the dual block by considering instead the associated interior proxy boundary value problem (analogous to (4.4)). We refer to this procedure as calculating the incoming skeleton nodes for a box . In this case, unfortunately, it may be the case that the wavenumber
is such that it is an interior Dirichlet eigenvalue of the Laplace operator inside
, and therefore (4.4) is not uniquely solvable. However, this doesn’t matter for our purposes since all we hope to capture in the row compression is an accurate approximation to the row space of the associated matrix block. Since only the existence of the matrix is used – it is never explicitly formed or is any piece of it inverted – the calculation is nearly identical and the row skeletonization can be computed by only forming and approximating the matrices and .4.1.3 Simultaneous skeletonization
Lastly, as mentioned in Section 3.2, it is sometimes useful to choose identical incoming and outgoing skeleton points for a particular box. If this is desired, then a single interpolative decomposition can be performed:
(4.13) 
It is easily verified that the matrix above compresses both and . In practice, the rank used in the above lowrank approximation is slightly larger than if the blocks had been compressed separately (but, the subsequent bookkeeping and factorization is slightly simpler and uses less storage).
4.2 Proxy surface discretization
Recall that when using the proxy surface compression technique, the matrix in (4.8) must accurately capture the subspace of potentials supported on the proxy surface, and likewise, must accurately capture the subspace of potentials induced by densities on the proxy surface . We therefore must take some care when choosing how to select points on .
For oscillatory problems, it turns out that the ranks of tend to grow with the size of the box measured in number of wavelengths. This growth in rank can be attributed to the oscillatory nature of the Green’s function , which, in turn, necessitates an increased number of points to sample the proxy for accurately computing the implied integrals and subsequent lowrank approximation. One can use standard multipole estimates for appropriately choosing to enable accurate compression of the far field blocks [21, 36].
In particular, in the oscillatory regime, the rank of the far field blocks tend to scale quadratically with the size of the box measured in wavelengths. Note that this does not affect the overall complexity of the method. In the limit , when the size of the problem as measured in wavelengths is fixed, is still a constant and typically . However, in the regime where the boundary is sampled at fixed number of points per wavelength, then the cost of the factorization typically grows like since . This is the expected behavior of fast direct solvers which are based on lowrank approximation, as demonstrated in [12, 30, 50]. We numerically verify this behavior for our approach as well in Section 5.
Remark 8.
When finding the skeleton and redundant points in a box, the far interaction matrices and the blocks affected by Schur complements are simultaneously compressed. Due to presence of the near blocks which contain the Schur complements, in practice, for small wave numbers, it is possible to use fewer points than predicted by standard multipole estimates.
4.3 Far field partitioning
Recall that a necessary criterion for determining the split for a box is that the interactions corresponding to points and the points are of the form . There are two mechanisms by which the condition would be violated: (1) if from the quadrature perspective, then the original matrix entry has target dependent weights and is of the form , or (2) if the matrix entries have been updated due to Schur complements arising from a previous application of the strong skeletonization procedure to a different box.
Depending on the subdivision criterion used for generating the octree, the near interactions associated with the locallycorrected quadrature may spill over into the far field region of the box, i.e. for a point , there may exist , where is the far field region associated with box . In this situation, the partition can be modified to include all such near quadrature corrections of point that spill over into the far region of the corresponding box, i.e. if for any then even if it would not have been affected by previously computed Schur complement blocks. In practice, this situation is rarely encountered and typically happens for at most boxes even in multiscale geometries.
In the original RSS factorization, the second issue is handled by defining the index set to consist of all points in all boxes which are not adjacent to , but which are separated by exactly one boxwidth in the sense. We propose a minor modification to include only boxes which have been affected by previous applications of the strong skeletonization procedure which can be precomputed. To allow for this modification, we also need to set the proxy surface to be the sphere of radius centered at the box center of as opposed to the in the original RSS factorization.
5 Numerical experiments
In this section, we illustrate the performance of our approach. For examples in Sections 5.2 and 5.1, we consider a wiggly torus as the geometry where the boundary is parameterized by as
(5.1) 
see Figure 2. In Section 5.3, the geometry is a multiscale plane where the ratio of the largest triangle to the smallest one is . For all three examples, we consider the solution of the exterior Dirichlet problem creftype 2.2 using the combined field representation
(5.2) 
which on imposing the Dirichlet boundary conditions results in the following integral equation for the unknown density ,
(5.3) 
Here is the prescribed Dirichlet data, is the Helmholtz wavenumber, and , are interpreted using their onsurface principal value senses. Let denote the corresponding wavelength.
Suppose that creftype 5.3 is discretized using the Nyström method where the layer potentials are evaluated using the locally corrected quadrature methods discussed in [32]. We assume that the surface is given by the disjoint union of patches , , where each patch is parametrized by a nondegenerate chart , with being the standard right simplex. We further assume that the charts , their derivative information, the density and the data are discretized using order VioreanuRokhlin nodes on .
Let be the total number of discretization points. Let denote the tolerance used for computing the factorization (i.e. the tolerance requested in each interpolative decomposition performed). For a patch , let denote the diameter of the patch and let denote the effective sampling rate on measured in points per wavelength. Let