We consider Laplace’s equation
|where is a non-empty domain with a sufficiently smooth boundary . If we add the boundary condition|
|with a suitable function , we obtain the Dirichlet problem. With the boundary condition|
denotes the outward-pointing unit normal vector for the domain, we arrive at the Neumann problem. We can solve these problems by using Green’s representation formula (cf., e.g., [18, Theorem 2.2.2]) given by
denotes the free-space Green’s function for the Laplace operator. If we know Dirichlet and Neumann boundary conditions, we can use (2) to compute the solution in any point of the domain .
For boundary values , Green’s formula takes the form
and this boundary integral equation can be used to solve the Dirichlet problem: we can solve the integral equation to obtain the Neumann values from the Dirichlet values and then use (2) to find the solution in all of .
In order to solve the Neumann problem, we take the normal derivative of (3) and arrive at the boundary integral equation
that can be used to find Dirichlet values matching given Neumann values, so we can follow the same approach as for the Dirichlet problem.
In this paper, we will concentrate on solving the boundary integral equations (3) and (4) efficiently. We employ a Galerkin scheme using a finite-dimensional space spanned by basis functions for the Neumann values and another finite-dimensional space spanned by basis functions for the Dirichlet values. The discretization turns the boundary integral operators into matrices, i.e., the matrix corresponding to the single-layer operator given by
|the matrix corresponding to the double-layer operator given by|
|and the matrix corresponding to the hypersingular operator given by|
Together with the mixed mass matrix given by
we obtain the linear system
for the Dirichlet-to-Neumann problem (3), where contains the coefficients of the Neumann values and those of the given Dirichlet values, and the system
for the Neumann-to-Dirichlet problem (4), where now contains the coefficients of the Dirichlet values and those of the given Neumann values. and denote the transposed matrices of and , respectively.
Considered from the point of view of numerical mathematics, these linear systems pose three challenges: we have to evaluate singular integrals in order to compute diagonal and near-diagonal entries, the resulting matrices are typically dense and large, and the condition number grows with the matrix dimension, which leads to a deteriorating performance of standard Krylov solvers. The first challenge can be met by using suitable quadrature schemes like the Sauter-Schwab-Erichsen technique [26, 12, 27]. For the third challenge, various preconditioning algorithms have been proposed [28, 23], among which we choose -matrix coarsening and factorization [14, 2].
This leaves us with the second challenge, i.e., the efficient representation of the dense matrices , , and . This task is usually tackled by employing a data-sparse approximation, i.e., by finding a sufficiently accurate approximation of the matrices that requires only a small amount of data. One possibility to construct such an approximation is to use wavelet basis functions and neglect small matrix entries in order to obtain a sparse matrix [24, 29, 10, 22]. Since the construction of suitable wavelet spaces on general surfaces is complicated , we will not focus on this approach.
Instead, we consider low-rank approximation techniques that directly provide an approximation of the matrices for standard finite element basis functions. These techniques broadly fall into three categories: analytic methods approximate the kernel function
locally by sums of tensor products, and discretization of these sums leads to low-rank approximations of matrix blocks. The most prominent analytic methods are the fast multipole expansion[25, 16, 17]
and interpolation[13, 8]. Algebraic methods, on the other hand, directly approximate the matrix blocks, e.g., by computing a rank-revealing factorization like an adaptive cross approximation [1, 4, 3]
. The convergence and robustness of analytic methods can be proven rigorously, but they frequently require a larger than necessary amount of storage. Algebraic methods reach close to optimal compression, but involve a heuristic pivoting strategy that may fail in certain cases[9, Example 2.2]. Hybrid methods combine analytic and algebraic techniques in order to obtain the advantages of both without the respective disadvantages. In this paper, we will focus on the hybrid cross approximation  and the Green cross approximation  that both combine an analytic approximation with an algebraic improvement in order to obtain fast and reliable algorithms.
Both approximation schemes in this paper’s focus lead to -matrices [21, 6], a special case of hierarchical matrices [19, 15]. A given matrix with general finite row and column index sets and is split into submatrices that are approximated by factorized low-rank representations.
The submatrices are constructed hierarchically in order to make the matrix accessible for elegant and efficient recursive algorithms. The first step is to split the index sets into a tree structure of subsets.
Definition 1 (Cluster tree)
Let be a finite non-empty index set. A tree is called a cluster tree for if the following conditions hold:
Every node of the tree is a subset of .
The root is .
If a node has children, it is the union of these children:
The children of a node are disjoint:
Nodes of a cluster tree are called clusters.
Cluster trees can be constructed by recursively splitting index sets, e.g., ensuring that “geometrically close” indices are contained in the same cluster. We assume that cluster trees and for the index sets and are given.
Using the cluster trees, the index set corresponding to the matrix can now be split into a tree structure.
Definition 2 (Block tree)
A tree is called a block tree for the row index set and the column index set if the following conditions hold:
Every node of the tree is a subset with and .
The root is .
If a node has children, the children are given as follows:
for all .
Nodes of a block tree are called blocks.
Block trees are usually constructed recursively using an admissibility condition matching the intended approximation scheme: we start with the root and recursively subdivide blocks. Once the admissibility condition indicates for a block that we can approximate the submatrix , we stop subdividing and call the block a farfield block. We also stop if and have no children, then is a nearfield block. If we ensure that leaf clusters have only a small number of elements, nearfield blocks are small and we can afford to store them without compression. The key to the efficiency of hierarchical matrices is the data-sparse representation of the farfield blocks.
We assume that a block tree is given and that its farfield leaves are collected in a set , while the nearfield leaves are in a set .
The more efficient -matrices [21, 6] use a different factorized representation closely related to fast multipole methods : each cluster is associated with a matrix, and only the columns of this matrix are used to represent matrix blocks.
Definition 3 (Cluster basis)
Let . A family of matrices is called a cluster basis for with (maximal) rank if the following conditions hold:
We have for all .
For all with , there are transfer matrices for all such that
An important property of cluster bases is that they can be stored efficiently: under standard assumptions, only units of storage are requires, where is the cardinality of the index set .
Definition 4 (-matrix)
Let and be cluster bases for and , respectively. A matrix is called an -matrix with row basis and column basis if for every admissible leaf there is a coupling matrix such that
Under standard assumptions, we can represent an -matrix by coefficients, where and are the cardinalities of the index sets. The matrix-vector multiplication can be performed in operations for -matrices, and there are a number of other important operations that also only have linear complexity with respect to and , cf. .
3 Hybrid cross approximation
In order to construct an -matrix approximation of the matrices , , and required for the boundary element method, we first consider the hybrid cross approximation (HCA) method  originally developed for hierarchical matrices.
We associate each cluster with an axis-parallel bounding box such that the supports of all basis functions associated with indices in are contained in . In order to ensure that the kernel function is sufficiently smooth for a polynomial approximation, we introduce the admissibility condition
where and denote the Euclidean diameters of the bounding boxes and , while denotes their Euclidean distance. is a parameter that controls the storage complexity and the accuracy of the approximation.
If two clusters , satisfy this condition, the restriction can be approximated by polynomials. We use -th order tensor Chebyshev interpolation and denote the interpolation points for and by and , where . The corresponding Lagrange polynomials are denoted by and , and the tensor interpolation polynomial is given by
It is possible to prove
The rate of convergence depends only on the parameter , cf. [6, Chapter 4].
Unfortunately, the rank of this approximation is too large: potential theory suggests that a rank should be sufficient for an -th order approximation.
and use a rank-revealing pivoted LU factorization to obtain an approximation of the form
where and denote the first row and column pivots, respectively. Due to the properties of the kernel function , the rank can be expected to be significantly smaller than .
Given the pivot sets and , we can now “take back” the interpolation:
By controlling the interpolation order and the accuracy of the cross approximation, we can ensure that the approximation error is below any given tolerance .
In order to obtain an approximation of the submatrix , we replace by in (5a) to find
with matrices and . Due to , the matrix is an improved low-rank approximation of the submatrix .
This procedure alone only yields a hierarchical matrix, not the desired -matrix. In order to reduce the storage requirements, we apply hierarchical compression : the hybrid cross approximation technique already yields low-rank approximations of individual blocks, but these approximations do not share common row or column cluster bases. The hierarchical compression algorithm recursively merges independent submatrices into larger -matrices until the entire matrix is in the required form.
4 Green cross approximation
While hybrid cross approximation requires a subsequent compression step to obtain an -matrix, we now consider a related technique that directly yields an -matrix approximation of the matrix .
We once more consider an admissible pair , of clusters with bounding boxes . We construct an auxiliary axis-parallel box such that
We observe that the variables and are separated in both integrands. Due to the second and third assumptions in (12), the integrands are smooth, and we can approximate the integrals by a quadrature rule, e.g., a composite Gauss rule, with weights and quadrature points on the boundary in order to get
for all and all . The function is again a sum of tensor products, and, as in the case of the hybrid cross approximation, its discretization gives rise to a low-rank approximation of the submatrix , since we have
with and , therefore
Unfortunately, the rank of this approximation is quite high, higher than, e.g., for the hybrid cross approximation. Once again, we can use an algebraic technique, i.e., the adaptive cross approximation, to improve the analytically-motivated initial approximation : we define
and perform an adaptive cross approximation to find index sets and of cardinality such that
By applying this approximation and “taking back” the quadrature approximation in the last step, we arrive at
We define and and obtain
This is a rank- approximation of the matrix block, and experiments indicate that is frequently far smaller than . The equation (13) can be interpreted as “algebraic interpolation”: we (approximately) recover all entries of the matrix from a few rows , where the indices in play the role of interpolation points and the columns of the role of Lagrange polynomials. We call this approach Green cross approximation (GCA).
Concerning our goal of finding an -matrix, we observe that and depend only on and , but not on the cluster , therefore it is a good candidate for a cluster basis.
Applying the same procedure to the cluster instead of , we obtain and such that
and combining both approximations yields
the required representation (9) for an -matrix.
In order to make and proper cluster bases, we have to ensure the nesting property (8). We can achieve this goal by slightly modifying our construction: we assume that has two children and that the sets and and the matrices and have already been computed. We have
We apply the cross approximation to instead of and obtain and such that
ensures (8) if we let
This modification not only ensures that is a proper cluster basis, it also reduces the computational work required for the cross approximation, since only the submatrices have to be considered, and these submatrices are significantly smaller than .
The parallelization of this algorithm is straightforward: we can compute index sets and matrices for all leaves of the cluster tree in parallel. Once this task has been completed, we can treat all clusters whose children are leaves in parallel. Next are all clusters whose children have already been treated. Repeating this procedure until we reach the root of the tree yields a simple and efficient parallel version of the algorithm.
5 Numerical experiments
Now that we have two compression algorithms at our disposal, we have to investigate how well they perform in practice. While we can prove for both algorithms that they can reach any given accuracy (disregarding rounding errors), we have to see which accuracies are necessary in order to preserve the theoretical convergence rates of the Galerkin discretization.
Until now, we have only seen HCA and GCA applied to the matrix corresponding to the single-layer operator. For the other two operators, we simply take the appropriate derivatives of and and use them as approximations of the kernel functions.
Given a boundary element mesh, we choose discontinuous piecewise constant basis functions for the Neumann values and continuous piecewise linear basis functions for the Dirichlet values. For a meshwidth of , we expect theoretical convergence rates of for the Neumann values in the norm, for the Neumann values in the norm and the Dirichlet values in the norm, and for the Dirichlet values in the norm.
In a first experiment, we approximate the unit sphere by a sequence of triangular meshes constructed by splitting the eight sides of a double pyramid regularly into triangles and then projecting all vertices to the unit sphere. Since we expect the condition number of the matrices to be in and want to preserve the theoretical convergence rate of , we aim for an accuracy of for the matrix approximation. For the sake of simplicity, we use a slightly higher accuracy: if is halved, we reduce the error tolerance by a factor of instead of just .
Nearfield matrix entries are computed by Sauter-Schwab-Erichsen quadrature , and we have to increase the order of the nearfield quadrature occasionally to ensure the desired rate of convergence.
The parameters used for this experiment are summarized in Table 1, where denotes the number of triangles, the nearfield quadrature order, the resolution of the cluster tree, i.e., the maximal size of leaf clusters, the order of interpolation for HCA and the order of quadrature for GCA, the relative accuracy for the adaptive cross approximation, the accuracy for the hierarchical compression used for HCA, the relative accuracy of the Krylov solver, and the relative accuracy of the preconditioner constructed by coarsening  and -Cholesky decomposition .
Figure 2 shows the -norm error for the approximation of the Neumann data computed via the boundary integral equation (3). We use the functions , , and with and as test cases. We can see that the optimal convergence rate of is preserved despite the matrix compression and nearfield quadrature.
Since computing the exact -norm error is complicated, we rely on the -ellipticity of the single-layer operator: we compute the -projection of the Neumann values into the discrete space, which is expected to converge at a rate of to the exact solution, and then compare it to the Galerkin solution using the energy product corresponding to the matrix . Figure 4 indicates that the discrete -norm error even converges at a rate of , therefore the convergence compared to the continuous solution is also preserved.
Now that we have established that the compression algorithms do not hurt the optimal convergence rates of the Galerkin discretization, we can consider the corresponding complexity. The runtimes have been measured on a system with two Intel®Xeon®Platinum 8160 processors, each with 24 cores and running at a base clock of
GHz. The implementation is based on the open-source H2Lib software package, cf.http://www.h2lib.org.
Since we expect the runtime to grow like , where is the number of triangles and , we display the runtime per triangle in Figure 5, using a logarithmic scale for and a linear scale for the runtime. We can see that HCA and GCA behave differently if the problem size grows: the runtime for HCA shows jumps when the order of interpolation is increased, while the runtime for GCA shows jumps when the order of the nearfield quadrature is increased.
This observation underlines a fundamental difference between the two methods: HCA constructs the full coefficient matrix for every matrix block, and the matrix requires coefficients to be stored and operations for the adaptive cross approximation, where we expect . GCA, on the other hand, applies cross approximation only per cluster, not per block, and the recursive structure of the algorithm ensures that only indices used in a cluster’s children are considered in their parent. This explains why GCA is less vulnerable to increases in the order than HCA.
On the other hand, HCA computes the approximation of a submatrix by evaluating single integrals, cf. (11), that can be computed in operations, while GCA relies on double integrals, i.e., the entries of the matrix , that require operations. This explains why HCA is less vulnerable to increases in the nearfield quadrature order than GCA.
The setup of the matrices dominates the runtime, e.g., for triangles, the matrices , , and take , , and seconds to set up, respectively, while the preconditioner for takes only seconds for coarsening and seconds for the factorization, with and seconds for the preconditioner for . Solving the linear system with the preconditioned conjugate gradient method takes around seconds for the Dirichlet-to-Neumann problems and around seconds for the Neumann-to-Dirichlet problems.
Of course, the storage requirements of the algorithms may be even more important than the runtime, since they determine the size of a problem that “fits” into a given computer. We again expect a growth like and report the storage requirements per triangle in Figure 6.
Although theory leads us to expect the storage requirements to grow like , Figure 6 suggests a behaviour like in practice. We can see that HCA consistently requires less storage than GCA. This is not surprising, since the algebraic algorithm  employed to turn the hierarchical matrix provided by HCA into an -matrix essentially computes the best possible -matrix approximation.
We may conclude that a server with 2 processors and just processor cores equipped with GB of main memory can handle boundary element problems with more than 8 million triangles in a matter of hours without sacrificing accuracy.
Admittedly, the unit sphere considered so far is an academic example. In order to demonstrate that the techniques also work in more complicated settings, we consider the crank shaft geometry displayed in Figure 7 created by Joachim Schöberl’s netgen Software. We start with a mesh with triangles and refine these triangles regularly in order to obtain higher resolutions.
The boundary element mesh is far less “smooth” in this case, and this leads both to an increased condition number and the need to use significantly higher nearfield quadrature orders. Since we have already seen that HCA is far less susceptible to the nearfield quadrature than GCA, we only consider HCA in this example. Experiments indicate that the parameters given in Table 2 are sufficient to preserve the theoretically predicted convergence rates of the Galerkin method.
Figure 8 shows the -norm errors for different refinement levels of the crank shaft geometry. We can see that the optimal rate of convergence is again preserved despite the matrix compression.
Figure 9 shows the setup times per triangle for the three matrices. Due to the computationally expensive nearfield quadrature, the setup time for the matrices dominates the other parts of the program, e.g., for triangles the setup times for the three matrices are , , and seconds, respectively, while computing both preconditioners takes only seconds and each linear system is solved in unter seconds.
We conclude that using modern compression techniques like HCA and GCA in combination with efficient -matrix representations of the resulting matrices, large boundary element problems on meshes with several million triangles can be treated in few hours on moderately expensive servers.
-  Bebendorf, M.: Approximation of boundary element matrices. Numer. Math. 86(4), 565–589 (2000)
-  Bebendorf, M.: Hierarchical LU decomposition based preconditioners for BEM. Computing 74, 225–247 (2005)
-  Bebendorf, M., Kuske, C., Venn, R.: Wideband nested cross approximation for Helmholtz problems. Numer. Math. 130(1), 1–34 (2015)
-  Bebendorf, M., Rjasanow, S.: Adaptive low-rank approximation of collocation matrices. Computing 70(1), 1–24 (2003)
-  Börm, S.: Construction of data-sparse -matrices by hierarchical compression. SIAM J. Sci. Comp. 31(3), 1820–1839 (2009)
-  Börm, S.: Efficient Numerical Methods for Non-local Operators: -Matrix Compression, Algorithms and Analysis, EMS Tracts in Mathematics, vol. 14. EMS (2010)
-  Börm, S., Christophersen, S.: Approximation of integral operators by Green quadrature and nested cross approximation. Numer. Math. 133(3), 409–442 (2016)
-  Börm, S., Grasedyck, L.: Low-rank approximation of integral operators by interpolation. Computing 72, 325–332 (2004)
-  Börm, S., Grasedyck, L.: Hybrid cross approximation of integral operators. Numer. Math. 101, 221–249 (2005)
-  Cohen, A., Dahmen, W., DeVore, R.: Adaptive wavelet methods for elliptic operator equations — Convergence rates. Math. Comp. 70, 27–75 (2001)
-  Dahmen, W., Schneider, R.: Wavelets on manifolds I: Construction and domain decomposition. SIAM J. Math. Anal. 31, 184–230 (1999)
-  Erichsen, S., Sauter, S.A.: Efficient automatic quadrature in 3-d Galerkin BEM. Comput. Meth. Appl. Mech. Eng. 157, 215–224 (1998)
-  Giebermann, K.: Multilevel approximation of boundary integral operators. Computing 67, 183–207 (2001)
-  Grasedyck, L.: Adaptive recompression of -matrices for BEM. Computing 74(3), 205–223 (2004)
-  Grasedyck, L., Hackbusch, W.: Construction and arithmetics of -matrices. Computing 70, 295–334 (2003)
-  Greengard, L., Rokhlin, V.: A fast algorithm for particle simulations. J. Comp. Phys. 73, 325–348 (1987)
-  Greengard, L., Rokhlin, V.: A new version of the fast multipole method for the Laplace equation in three dimensions. In: Acta Numerica 1997, pp. 229–269. Cambridge University Press (1997)
-  Hackbusch, W.: Elliptic Differential Equations. Theory and Numerical Treatment. Springer-Verlag Berlin (1992)
-  Hackbusch, W.: A sparse matrix arithmetic based on -matrices. Part I: Introduction to -matrices. Computing 62(2), 89–108 (1999)
-  Hackbusch, W.: Hierarchical Matrices: Algorithms and Analysis. Springer (2015)
-  Hackbusch, W., Khoromskij, B.N., Sauter, S.A.: On -matrices. In: Bungartz, H., Hoppe, R., Zenger, C. (eds.) Lectures on Applied Mathematics. pp. 9–29. Springer-Verlag, Berlin (2000)
-  Harbrecht, H., Schneider, R.: Wavelet Galerkin schemes for boundary integral equations – Implementation and quadrature. SIAM J. Sci. Comput. 27, 1347–1370 (2006)
-  Langer, U., Pusch, D., Reitzinger, S.: Efficient preconditioners for boundary element matrices based on grey-box algebraic multigrid methods. Int. J. Numer. Meth. Eng. 58(13), 1937–1953 (2003)
-  von Petersdorff, T., Schwab, C.: Fully discretized multiscale Galerkin BEM. In: Dahmen, W., Kurdila, A., Oswald, P. (eds.) Multiscale wavelet methods for PDEs. pp. 287–346. Academic Press, San Diego (1997)
-  Rokhlin, V.: Rapid solution of integral equations of classical potential theory. J. Comp. Phys. 60, 187–207 (1985)
-  Sauter, S.A.: Cubature techniques for 3-d Galerkin BEM. In: Hackbusch, W., Wittum, G. (eds.) Boundary Elements: Implementation and Analysis of Advanced Algorithms. pp. 29–44. Vieweg-Verlag (1996)
-  Sauter, S.A., Schwab, C.: Boundary Element Methods. Springer (2011)
-  Steinbach, O., Wendland, W.L.: The construction of some efficient preconditioners in the boundary element method. Adv. in Comp. Math. 9, 191–216 (1998)
-  Tausch, J.: A variable order wavelet method for the sparse representation of layer potentials in the non-standard form. J. Numer. Math. 12(3), 233–254 (2004)