Fast large-scale boundary element algorithms

January 15, 2020 · Steffen Börm et al.

Boundary element methods (BEM) reduce a partial differential equation in a domain to an integral equation on the domain's boundary. They are particularly attractive for solving problems on unbounded domains, but handling the dense matrices corresponding to the integral operators requires efficient algorithms. This article describes two approaches that allow us to solve boundary element equations on surface meshes consisting of several million triangles while preserving the optimal convergence rates of the Galerkin discretization.




1 Introduction

We consider Laplace’s equation

  −Δu(x) = 0   for all x ∈ Ω,   (1)

where Ω ⊆ ℝ³ is a non-empty domain with a sufficiently smooth boundary Γ := ∂Ω. If we add the boundary condition

  u(x) = g_D(x)   for all x ∈ Γ

with a suitable function g_D, we obtain the Dirichlet problem. With the boundary condition

  ∂u/∂n(x) = g_N(x)   for all x ∈ Γ,

where n denotes the outward-pointing unit normal vector for the domain Ω, we arrive at the Neumann problem. We can solve these problems by using Green’s representation formula (cf., e.g., [18, Theorem 2.2.2]) given by

  u(x) = ∫_Γ g(x,y) ∂u/∂n(y) dy − ∫_Γ ∂g/∂n_y(x,y) u(y) dy   for all x ∈ Ω,   (2)

where

  g(x,y) = 1 / (4π ‖x−y‖)   for x ≠ y

denotes the free-space Green’s function for the three-dimensional Laplace operator. If we know Dirichlet and Neumann boundary conditions, we can use (2) to compute the solution u in any point of the domain Ω.

For boundary values x ∈ Γ, Green’s formula takes the form

  ∫_Γ g(x,y) ∂u/∂n(y) dy = ½ u(x) + ∫_Γ ∂g/∂n_y(x,y) u(y) dy   for almost all x ∈ Γ,   (3)

and this boundary integral equation can be used to solve the Dirichlet problem: we can solve the integral equation to obtain the Neumann values ∂u/∂n from the Dirichlet values u|_Γ and then use (2) to find the solution in all of Ω.

In order to solve the Neumann problem, we take the normal derivative of (3) and arrive at the boundary integral equation

  −∂/∂n_x ∫_Γ ∂g/∂n_y(x,y) u(y) dy = ½ ∂u/∂n(x) − ∫_Γ ∂g/∂n_x(x,y) ∂u/∂n(y) dy   for almost all x ∈ Γ   (4)

that can be used to find Dirichlet values matching given Neumann values, so we can follow the same approach as for the Dirichlet problem.

In this paper, we will concentrate on solving the boundary integral equations (3) and (4) efficiently. We employ a Galerkin scheme using a finite-dimensional space spanned by basis functions (φ_i)_{i∈I} for the Neumann values and another finite-dimensional space spanned by basis functions (ψ_j)_{j∈J} for the Dirichlet values. The discretization turns the boundary integral operators into matrices, i.e., the matrix G ∈ ℝ^{I×I} corresponding to the single-layer operator given by

  G_{ij} = ∫_Γ φ_i(x) ∫_Γ g(x,y) φ_j(y) dy dx   for all i, j ∈ I,   (5a)

the matrix K ∈ ℝ^{I×J} corresponding to the double-layer operator given by

  K_{ij} = ∫_Γ φ_i(x) ∫_Γ ∂g/∂n_y(x,y) ψ_j(y) dy dx   for all i ∈ I, j ∈ J,   (5b)

and the matrix W ∈ ℝ^{J×J} corresponding to the hypersingular operator given by

  W_{ij} = −∫_Γ ψ_i(x) ∂/∂n_x ∫_Γ ∂g/∂n_y(x,y) ψ_j(y) dy dx   for all i, j ∈ J.   (5c)

Together with the mixed mass matrix M ∈ ℝ^{I×J} given by

  M_{ij} = ∫_Γ φ_i(x) ψ_j(x) dx   for all i ∈ I, j ∈ J,   (6)

we obtain the linear system

  G x = (½ M + K) y   (7a)

for the Dirichlet-to-Neumann problem (3), where x contains the coefficients of the unknown Neumann values and y those of the given Dirichlet values, and the system

  W x = (½ Mᵀ − Kᵀ) y   (7b)

for the Neumann-to-Dirichlet problem (4), where now x contains the coefficients of the unknown Dirichlet values and y those of the given Neumann values. Mᵀ and Kᵀ denote the transposed matrices of M and K, respectively.

Considered from the point of view of numerical mathematics, these linear systems pose three challenges: we have to evaluate singular integrals in order to compute diagonal and near-diagonal entries, the resulting matrices are typically dense and large, and the condition number grows with the matrix dimension, which leads to a deteriorating performance of standard Krylov solvers. The first challenge can be met by using suitable quadrature schemes like the Sauter-Schwab-Erichsen technique [26, 12, 27]. For the third challenge, various preconditioning algorithms have been proposed [28, 23], among which we choose H-matrix coarsening and factorization [14, 2].

This leaves us with the second challenge, i.e., the efficient representation of the dense matrices G, K, and W. This task is usually tackled by employing a data-sparse approximation, i.e., by finding a sufficiently accurate approximation of the matrices that requires only a small amount of data. One possibility to construct such an approximation is to use wavelet basis functions and neglect small matrix entries in order to obtain a sparse matrix [24, 29, 10, 22]. Since the construction of suitable wavelet spaces on general surfaces is complicated [11], we will not focus on this approach.

Instead, we consider low-rank approximation techniques that directly provide an approximation of the matrices for standard finite element basis functions. These techniques broadly fall into three categories: analytic methods approximate the kernel function g locally by sums of tensor products, and discretization of these sums leads to low-rank approximations of matrix blocks. The most prominent analytic methods are the fast multipole expansion [25, 16, 17] and interpolation [13, 8]. Algebraic methods, on the other hand, directly approximate the matrix blocks, e.g., by computing a rank-revealing factorization like an adaptive cross approximation [1, 4, 3]. The convergence and robustness of analytic methods can be proven rigorously, but they frequently require a larger than necessary amount of storage. Algebraic methods reach close to optimal compression, but involve a heuristic pivoting strategy that may fail in certain cases [9, Example 2.2]. Hybrid methods combine analytic and algebraic techniques in order to obtain the advantages of both without the respective disadvantages. In this paper, we will focus on the hybrid cross approximation [9] and the Green cross approximation [7], both of which combine an analytic approximation with an algebraic improvement in order to obtain fast and reliable algorithms.

2 H²-matrices

Both approximation schemes in this paper’s focus lead to H²-matrices [21, 6], a special case of hierarchical matrices [19, 15]. A given matrix G ∈ ℝ^{I×J} with general finite row and column index sets I and J is split into submatrices that are approximated by factorized low-rank representations.

The submatrices are constructed hierarchically in order to make the matrix accessible for elegant and efficient recursive algorithms. The first step is to split the index sets into a tree structure of subsets.

Definition 1 (Cluster tree)

Let I be a finite non-empty index set. A tree T_I is called a cluster tree for I if the following conditions hold:

  • Every node of the tree is a subset of I.

  • The root is I itself.

  • If a node t has children, it is the union of these children: t = ⋃ { t′ : t′ ∈ children(t) }.

  • The children of a node are disjoint: t₁ ∩ t₂ = ∅ for all t₁, t₂ ∈ children(t) with t₁ ≠ t₂.

Nodes of a cluster tree are called clusters.

Cluster trees can be constructed by recursively splitting index sets, e.g., ensuring that “geometrically close” indices are contained in the same cluster. We assume that cluster trees T_I and T_J for the index sets I and J are given.
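The recursive construction just described can be sketched in a few lines of Python; the names (`Cluster`, `build_cluster_tree`) and the bisection strategy are illustrative, not taken from any particular library:

```python
# Minimal sketch of a geometric cluster tree: an index set is bisected
# recursively along the coordinate direction of largest extent, so that
# geometrically close indices end up in the same cluster.
import numpy as np

class Cluster:
    def __init__(self, indices, children=()):
        self.indices = indices      # the subset of the global index set
        self.children = children    # empty tuple for leaf clusters

def build_cluster_tree(points, indices=None, leaf_size=2):
    """points: (n, d) array of coordinates associated with the indices."""
    if indices is None:
        indices = np.arange(len(points))
    if len(indices) <= leaf_size:
        return Cluster(indices)
    coords = points[indices]
    d = np.argmax(coords.max(axis=0) - coords.min(axis=0))  # widest direction
    mid = 0.5 * (coords[:, d].min() + coords[:, d].max())
    left = indices[coords[:, d] <= mid]
    right = indices[coords[:, d] > mid]
    if len(left) == 0 or len(right) == 0:   # degenerate split: keep as leaf
        return Cluster(indices)
    return Cluster(indices, (build_cluster_tree(points, left, leaf_size),
                             build_cluster_tree(points, right, leaf_size)))

pts = np.linspace(0.0, 1.0, 8).reshape(-1, 1)   # eight points on a line
root = build_cluster_tree(pts, leaf_size=2)
```

By construction, every node is a subset of the root index set, the children of a node partition it, and splitting stops once a cluster holds at most `leaf_size` indices, matching the conditions of Definition 1.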

Using the cluster trees, the index set I × J corresponding to the matrix can now be split into a tree structure.

Definition 2 (Block tree)

A tree T_{I×J} is called a block tree for the row index set I and the column index set J if the following conditions hold:

  • Every node of the tree is a subset b = t × s with t ∈ T_I and s ∈ T_J.

  • The root is I × J.

  • If a node b = t × s has children, the children are given by

    children(b) = { t′ × s′ : t′ ∈ children(t), s′ ∈ children(s) }.

Nodes of a block tree are called blocks.

Block trees are usually constructed recursively using an admissibility condition matching the intended approximation scheme: we start with the root and recursively subdivide blocks. Once the admissibility condition indicates for a block b = t × s that we can approximate the submatrix G|_{t×s}, we stop subdividing and call b a farfield block. We also stop if t and s have no children; then b is a nearfield block. If we ensure that leaf clusters have only a small number of elements, nearfield blocks are small and we can afford to store them without compression. The key to the efficiency of hierarchical matrices is the data-sparse representation of the farfield blocks.
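The recursive subdivision can be sketched as follows; the one-dimensional interval "bounding boxes", the cluster names, and the admissibility condition max{diam, diam} ≤ 2η·dist are illustrative simplifications of the geometric setting used later in the paper:

```python
# Sketch of a recursive block tree construction driven by an admissibility
# condition: admissible blocks become farfield leaves, leaf-times-leaf blocks
# become nearfield blocks, everything else is subdivided.
def diam(box):
    lo, hi = box
    return hi - lo

def dist(a, b):
    return max(0.0, max(a[0], b[0]) - min(a[1], b[1]))

def build_block_tree(t, s, boxes, children, eta=1.0):
    """Return (farfield, nearfield) lists of blocks (t, s)."""
    bt, bs = boxes[t], boxes[s]
    if max(diam(bt), diam(bs)) <= 2.0 * eta * dist(bt, bs):
        return [(t, s)], []            # admissible: farfield leaf
    if not children[t] and not children[s]:
        return [], [(t, s)]            # both clusters are leaves: nearfield
    far, near = [], []
    for tc in (children[t] or [t]):    # a leaf cluster is kept as it is
        for sc in (children[s] or [s]):
            f, n = build_block_tree(tc, sc, boxes, children, eta)
            far += f
            near += n
    return far, near

# a tiny cluster tree over the interval [0, 1]
boxes = {'r': (0.0, 1.0), 'a': (0.0, 0.5), 'b': (0.5, 1.0),
         'a0': (0.0, 0.25), 'a1': (0.25, 0.5), 'b0': (0.5, 0.75), 'b1': (0.75, 1.0)}
children = {'r': ['a', 'b'], 'a': ['a0', 'a1'], 'b': ['b0', 'b1'],
            'a0': [], 'a1': [], 'b0': [], 'b1': []}
far, near = build_block_tree('r', 'r', boxes, children, eta=1.0)
```

On this toy tree the recursion produces six farfield blocks (well-separated leaf pairs) and ten nearfield blocks (touching or identical leaf pairs), illustrating how the farfield dominates for larger trees.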

We assume that a block tree T_{I×J} is given and that its farfield leaves are collected in a set L⁺, while the nearfield leaves are in a set L⁻.

For a hierarchical matrix [19, 15], we simply assume that farfield blocks have low rank k and can therefore be stored efficiently in factorized form G|_{t×s} = A_b B_bᵀ with A_b ∈ ℝ^{t×k}, B_b ∈ ℝ^{s×k}. Here we use A_b and B_b as shorthand for the factors belonging to the block b = t × s.

The more efficient H²-matrices [21, 6] use a different factorized representation closely related to fast multipole methods [25]: each cluster t is associated with a matrix V_t, and only the columns of this matrix are used to represent matrix blocks.

Definition 3 (Cluster basis)

Let k ∈ ℕ. A family (V_t)_{t∈T_I} of matrices is called a cluster basis for T_I with (maximal) rank k if the following conditions hold:

  • We have V_t ∈ ℝ^{t×k} for all t ∈ T_I.

  • For all t ∈ T_I with children, there are transfer matrices E_{t′} ∈ ℝ^{k×k} for all t′ ∈ children(t) such that

    V_t|_{t′×k} = V_{t′} E_{t′}.   (8)
An important property of cluster bases is that they can be stored efficiently: under standard assumptions, only O(n_I k) units of storage are required, where n_I = |I| is the cardinality of the index set I.

Definition 4 (H²-matrix)

Let (V_t)_{t∈T_I} and (W_s)_{s∈T_J} be cluster bases for T_I and T_J, respectively. A matrix G ∈ ℝ^{I×J} is called an H²-matrix with row basis (V_t) and column basis (W_s) if for every admissible leaf b = t × s ∈ L⁺ there is a coupling matrix S_b ∈ ℝ^{k×k} such that

  G|_{t×s} = V_t S_b W_sᵀ.   (9)
Under standard assumptions, we can represent an H²-matrix by O((n_I + n_J) k) coefficients, where n_I and n_J are the cardinalities of the index sets. The matrix-vector multiplication can be performed in O((n_I + n_J) k) operations for H²-matrices, and there are a number of other important operations that also have only linear complexity with respect to n_I and n_J, cf. [6].
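The benefit of the coupling-matrix representation from Definition 4 can be seen already for a single block. The following sketch uses random data and illustrative sizes: a block stored as two basis matrices and a small coupling matrix needs far fewer coefficients than the dense block, and the matrix-vector product can be evaluated factor by factor:

```python
# Minimal sketch of a factorized block  G|txs = V_t S_b W_s^T  with random
# data; the sizes n and k are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 5                       # cluster size and rank
Vt = rng.standard_normal((n, k))    # row cluster basis
Ws = rng.standard_normal((n, k))    # column cluster basis
Sb = rng.standard_normal((k, k))    # coupling matrix

block = Vt @ Sb @ Ws.T              # dense equivalent: n*n coefficients
x = rng.standard_normal(n)
y_fact = Vt @ (Sb @ (Ws.T @ x))     # factorized product: O(n*k) operations

storage_dense = n * n               # coefficients of the dense block
storage_h2 = 2 * n * k + k * k      # coefficients of the factorized form
```

Evaluating the product from right to left never forms the dense block, which is exactly what makes the linear-complexity matrix-vector multiplication possible.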

3 Hybrid cross approximation

In order to construct an H²-matrix approximation of the matrices G, K, and W required for the boundary element method, we first consider the hybrid cross approximation (HCA) method [9] originally developed for hierarchical matrices.

We associate each cluster t with an axis-parallel bounding box B_t such that the supports of all basis functions associated with indices in t are contained in B_t. In order to ensure that the kernel function is sufficiently smooth for a polynomial approximation, we introduce the admissibility condition

  max{diam(B_t), diam(B_s)} ≤ 2η dist(B_t, B_s),   (10)

where diam(B_t) and diam(B_s) denote the Euclidean diameters of the bounding boxes B_t and B_s, while dist(B_t, B_s) denotes their Euclidean distance. η > 0 is a parameter that controls the storage complexity and the accuracy of the approximation.

If two clusters t, s satisfy this condition, the restriction of the kernel function g to B_t × B_s can be approximated by polynomials. We use m-th order tensor Chebyshev interpolation and denote the interpolation points for B_t and B_s by (ξ_ν)_{ν∈M} and (ζ_μ)_{μ∈M}, where M := {0, …, m}³. The corresponding Lagrange polynomials are denoted by (ℓ_ν)_{ν∈M} and (ℓ_μ)_{μ∈M}, and the tensor interpolation polynomial is given by

  g̃(x,y) := Σ_{ν∈M} Σ_{μ∈M} ℓ_ν(x) g(ξ_ν, ζ_μ) ℓ_μ(y)   for all x ∈ B_t, y ∈ B_s.

It is possible to prove that g̃ converges exponentially to g with respect to the order m. The rate of convergence depends only on the parameter η, cf. [6, Chapter 4].

Unfortunately, the rank (m+1)³ of this approximation is too large: potential theory suggests that a rank in O((m+1)²) should be sufficient for an m-th order approximation.
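A one-dimensional analogue already exposes the low-rank structure produced by the interpolation. In the following sketch (intervals, order, and the 1D stand-in kernel are illustrative assumptions), the kernel is interpolated in the y-variable over a well-separated interval, yielding a rank-(m+1) approximation whose error decays rapidly:

```python
# One-dimensional analogue of the tensor Chebyshev interpolation of the
# Laplace kernel on a pair of well-separated "boxes".
import numpy as np

def cheb_points(a, b, m):
    """m+1 Chebyshev points transformed to the interval [a, b]."""
    j = np.arange(m + 1)
    return 0.5 * (a + b) + 0.5 * (b - a) * np.cos(np.pi * (2 * j + 1) / (2 * m + 2))

def lagrange(nodes, x):
    """Evaluate all Lagrange polynomials for the given nodes at the points x."""
    L = np.ones((len(x), len(nodes)))
    for k in range(len(nodes)):
        for l in range(len(nodes)):
            if l != k:
                L[:, k] *= (x - nodes[l]) / (nodes[k] - nodes[l])
    return L

g = lambda x, y: 1.0 / (4.0 * np.pi * np.abs(x - y))  # 1D stand-in kernel

m = 6
xs = np.linspace(0.0, 1.0, 40)        # "box" B_t
ys = np.linspace(3.0, 4.0, 40)        # well-separated "box" B_s
zeta = cheb_points(3.0, 4.0, m)       # interpolation points in B_s

approx = g(xs[:, None], zeta[None, :]) @ lagrange(zeta, ys).T  # rank m+1
exact = g(xs[:, None], ys[None, :])
err = np.max(np.abs(approx - exact)) / np.max(np.abs(exact))
```

Because the boxes are well separated, the kernel is analytic on the target interval and already a modest order yields a very accurate separable approximation.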

We reduce the rank by combining the interpolation with an algebraic procedure, i.e., with adaptive cross approximation [1, 4]: we introduce the matrix S ∈ ℝ^{M×M} by

  S_{νμ} := g(ξ_ν, ζ_μ)   for all ν, μ ∈ M,

and use a rank-revealing pivoted LU factorization to obtain an approximation of the form

  S ≈ S|_{M×P} (S|_{Q×P})⁻¹ S|_{Q×M},

where Q ⊆ M and P ⊆ M denote the first k row and column pivots, respectively. Due to the properties of the kernel function g, the rank k can be expected to be significantly smaller than (m+1)³.

Given the pivot sets P and Q, we can now “take back” the interpolation: replacing the Lagrange polynomials by the kernel function itself yields

  ĝ(x,y) := Σ_{μ∈P} Σ_{ν∈Q} g(x, ζ_μ) ((S|_{Q×P})⁻¹)_{μν} g(ξ_ν, y).

By controlling the interpolation order and the accuracy of the cross approximation, we can ensure that the approximation error is below any given tolerance [9].

In order to obtain an approximation of the submatrix G|_{t×s}, we replace g by ĝ in (5a) to find

  G_{ij} ≈ (A_b (S|_{Q×P})⁻¹ B_bᵀ)_{ij}   for all i ∈ t, j ∈ s,   (11)

with matrices A_b ∈ ℝ^{t×P} and B_b ∈ ℝ^{s×Q} given by

  (A_b)_{iμ} := ∫_Γ φ_i(x) g(x, ζ_μ) dx,   (B_b)_{jν} := ∫_Γ φ_j(y) g(ξ_ν, y) dy.

Due to k = |P| = |Q| ≪ (m+1)³, the matrix A_b (S|_{Q×P})⁻¹ B_bᵀ is an improved low-rank approximation of the submatrix G|_{t×s}.

This procedure alone only yields a hierarchical matrix, not the desired H²-matrix. In order to reduce the storage requirements, we apply hierarchical compression [5]: the hybrid cross approximation technique already yields low-rank approximations of individual blocks, but these approximations do not share common row or column cluster bases. The hierarchical compression algorithm recursively merges independent submatrices into larger H²-matrices until the entire matrix is in the required form.

4 Green cross approximation

While hybrid cross approximation requires a subsequent compression step to obtain an H²-matrix, we now consider a related technique that directly yields an H²-matrix approximation of the matrix G.


Figure 1: Possible choice of the auxiliary bounding box ω corresponding to B_t and B_s

We once more consider an admissible pair t, s of clusters with bounding boxes B_t and B_s. We construct an auxiliary axis-parallel box ω such that

  B_t ⊆ ω,   dist(B_t, ∂ω) > 0,   dist(ω, B_s) > 0,   (12)

cf. Figure 1 for an illustration. Let y ∈ B_s. The third assumption in (12) implies y ∉ ω̄, therefore the function x ↦ g(x,y) is harmonic in ω. This property allows us to apply Green’s equation (2) to the domain ω to obtain

  g(x,y) = ∫_{∂ω} g(x,z) ∂g/∂n_z(z,y) dz − ∫_{∂ω} ∂g/∂n_z(x,z) g(z,y) dz   for all x ∈ B_t, y ∈ B_s.

We observe that the variables x and y are separated in both integrands. Due to the second and third assumptions in (12), the integrands are smooth, and we can approximate the integrals by a quadrature rule, e.g., a composite Gauss rule, with weights w_1, …, w_q and quadrature points z_1, …, z_q on the boundary ∂ω in order to get

  g(x,y) ≈ g̃(x,y) := Σ_{ν=1}^{q} w_ν ( g(x,z_ν) ∂g/∂n_z(z_ν,y) − ∂g/∂n_z(x,z_ν) g(z_ν,y) )

for all x ∈ B_t and all y ∈ B_s. The function g̃ is again a sum of tensor products, and, as in the case of the hybrid cross approximation, its discretization gives rise to a low-rank approximation of the submatrix G|_{t×s}, since we have

  G_{ij} ≈ Σ_{ν=1}^{2q} (A_t)_{iν} (B_{ts})_{jν}   for all i ∈ t, j ∈ s,

with

  (A_t)_{iν} := w_ν ∫_Γ φ_i(x) g(x,z_ν) dx,   (A_t)_{i,q+ν} := −w_ν ∫_Γ φ_i(x) ∂g/∂n_z(x,z_ν) dx,
  (B_{ts})_{jν} := ∫_Γ φ_j(y) ∂g/∂n_z(z_ν,y) dy,   (B_{ts})_{j,q+ν} := ∫_Γ φ_j(y) g(z_ν,y) dy

for ν = 1, …, q, therefore G|_{t×s} ≈ A_t B_{ts}ᵀ.

Unfortunately, the rank 2q of this approximation is quite high, higher than, e.g., for the hybrid cross approximation. Once again, we can use an algebraic technique, i.e., the adaptive cross approximation, to improve the analytically-motivated initial approximation: we consider the matrix A_t ∈ ℝ^{t×2q} introduced above and perform an adaptive cross approximation to find index sets t̃ ⊆ t and P of cardinality k such that

  A_t ≈ A_t|_{t×P} (A_t|_{t̃×P})⁻¹ A_t|_{t̃}.

By applying this approximation and “taking back” the quadrature approximation in the last step, we arrive at

  G|_{t×s} ≈ A_t B_{ts}ᵀ ≈ A_t|_{t×P} (A_t|_{t̃×P})⁻¹ A_t|_{t̃} B_{ts}ᵀ ≈ A_t|_{t×P} (A_t|_{t̃×P})⁻¹ G|_{t̃×s}.

We define V_t := A_t|_{t×P} (A_t|_{t̃×P})⁻¹ and obtain

  G|_{t×s} ≈ V_t G|_{t̃×s}.   (13)

This is a rank-k approximation of the matrix block, and experiments indicate that k is frequently far smaller than 2q. The equation (13) can be interpreted as “algebraic interpolation”: we (approximately) recover all entries of the matrix from a few rows G|_{t̃×s}, where the indices in t̃ play the role of interpolation points and the columns of V_t the role of Lagrange polynomials. We call this approach Green cross approximation (GCA).
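The "algebraic interpolation" idea can be sketched numerically: the pivot rows and the basis matrix are computed from the analytic factor alone, and the whole block is then recovered from a few of its rows. All names, sizes, and the random stand-in for the quadrature factor are illustrative assumptions:

```python
# Sketch of algebraic interpolation: select pivot rows from the analytic
# factor A, build V from those rows, and reconstruct the block from the
# corresponding rows of G.
import numpy as np

def select_rows(A, k):
    """ACA-style greedy selection of k rows: repeatedly take the row with the
    largest residual norm and project it out of all remaining rows."""
    R = A.astype(float).copy()
    rows = []
    for _ in range(k):
        i = int(np.argmax(np.einsum('ij,ij->i', R, R)))
        rows.append(i)
        r = R[i].copy()
        R -= np.outer(R @ r, r) / (r @ r)
    return rows

rng = np.random.default_rng(2)
n, m, q = 80, 70, 6
A = rng.standard_normal((n, 2 * q))   # stand-in for the quadrature factor
B = rng.standard_normal((m, 2 * q))
G = A @ B.T                           # the block, here exactly of rank 2q

tilde = select_rows(A, 2 * q)         # pivot rows, computed from A alone
Vt = A @ np.linalg.inv(A[tilde])      # "algebraic Lagrange polynomials"
G_rec = Vt @ G[tilde]                 # recover the block from 2q of its rows
```

Since the pivot selection only touches the factor and never the full block, the expensive double-integral entries of G are needed only for the selected rows, which is the source of GCA's efficiency.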

Concerning our goal of finding an H²-matrix, we observe that V_t and t̃ depend only on the cluster t and the box ω, but not on the cluster s, therefore V_t is a good candidate for a cluster basis.

Applying the same procedure to the cluster s instead of t, we obtain s̃ and W_s such that

  G|_{t×s} ≈ G|_{t×s̃} W_sᵀ,

and combining both approximations yields

  G|_{t×s} ≈ V_t G|_{t̃×s̃} W_sᵀ,

the required representation (9) for an H²-matrix.

In order to make (V_t) and (W_s) proper cluster bases, we have to ensure the nesting property (8). We can achieve this goal by slightly modifying our construction: we assume that t has two children t₁ and t₂ and that the sets t̃₁ and t̃₂ and the matrices V_{t₁} and V_{t₂} have already been computed. We have

  A_t|_{t₁} ≈ V_{t₁} A_t|_{t̃₁},   A_t|_{t₂} ≈ V_{t₂} A_t|_{t̃₂}.

We apply the cross approximation to A_t|_{t̃₁∪t̃₂} instead of A_t and obtain t̃ ⊆ t̃₁ ∪ t̃₂ and a matrix V̂_t such that

  A_t|_{t̃₁∪t̃₂} ≈ V̂_t A_t|_{t̃},

and therefore

  A_t|_{t₁} ≈ V_{t₁} V̂_t|_{t̃₁} A_t|_{t̃},   A_t|_{t₂} ≈ V_{t₂} V̂_t|_{t̃₂} A_t|_{t̃},

so defining the transfer matrices

  E_{t₁} := V̂_t|_{t̃₁},   E_{t₂} := V̂_t|_{t̃₂}

ensures (8) if we let

  V_t|_{t₁} := V_{t₁} E_{t₁},   V_t|_{t₂} := V_{t₂} E_{t₂}.

This modification not only ensures that (V_t) is a proper cluster basis, it also reduces the computational work required for the cross approximation, since only the submatrices A_t|_{t̃₁∪t̃₂} have to be considered, and these submatrices are significantly smaller than A_t.

The parallelization of this algorithm is straightforward: we can compute index sets and matrices for all leaves of the cluster tree in parallel. Once this task has been completed, we can treat all clusters whose children are leaves in parallel. Next are all clusters whose children have already been treated. Repeating this procedure until we reach the root of the tree yields a simple and efficient parallel version of the algorithm.
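The level-by-level bottom-up sweep described above can be sketched as follows; the tiny cluster tree and the "work" done per cluster are illustrative stand-ins for computing the pivot sets and basis matrices:

```python
# Sketch of the bottom-up parallel sweep: all clusters of one level are
# processed concurrently, and a level starts only once the previous
# (deeper) level has been completed.
from concurrent.futures import ThreadPoolExecutor

def levels_bottom_up(children, root):
    """Group cluster names by depth, deepest level first."""
    depth, order = {root: 0}, [root]
    for t in order:
        for c in children[t]:
            depth[c] = depth[t] + 1
            order.append(c)
    by_level = {}
    for t, d in depth.items():
        by_level.setdefault(d, []).append(t)
    return [by_level[d] for d in sorted(by_level, reverse=True)]

children = {'r': ['a', 'b'], 'a': ['a0', 'a1'], 'b': [], 'a0': [], 'a1': []}
done = set()

def process(t):
    # stand-in for computing the pivot set and basis matrix of cluster t;
    # a cluster may only be treated after all of its children
    assert all(c in done for c in children[t])
    return t

with ThreadPoolExecutor() as pool:
    for level in levels_bottom_up(children, 'r'):
        done.update(pool.map(process, level))
```

Grouping clusters by level keeps the dependency handling trivial: within one level there are no dependencies at all, so a simple parallel map suffices.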

5 Numerical experiments

Now that we have two compression algorithms at our disposal, we have to investigate how well they perform in practice. While we can prove for both algorithms that they can reach any given accuracy (disregarding rounding errors), we have to see which accuracies are necessary in order to preserve the theoretical convergence rates of the Galerkin discretization.

Until now, we have only seen HCA and GCA applied to the matrix G corresponding to the single-layer operator. For the other two operators, we simply take the appropriate derivatives of the kernel approximations constructed for HCA and GCA and use them as approximations of the kernel functions.

Given a boundary element mesh, we choose discontinuous piecewise constant basis functions for the Neumann values and continuous piecewise linear basis functions for the Dirichlet values. For a meshwidth of h, we expect theoretical convergence rates of h for the Neumann values in the L² norm, h^{3/2} for the Neumann values in the H^{−1/2} norm and the Dirichlet values in the H^{1/2} norm, and h² for the Dirichlet values in the L² norm.

In a first experiment, we approximate the unit sphere by a sequence of triangular meshes constructed by splitting the eight sides of a double pyramid regularly into triangles and then projecting all vertices to the unit sphere. Since we expect the condition number of the matrices to grow under refinement and want to preserve the theoretical convergence rates, we couple the accuracy of the matrix approximation to the meshwidth h. For the sake of simplicity, we use a slightly higher accuracy: if h is halved, we reduce the error tolerance by a slightly larger factor than the theoretical rate would require.

Nearfield matrix entries are computed by Sauter-Schwab-Erichsen quadrature [27], and we have to increase the order of the nearfield quadrature occasionally to ensure the desired rate of convergence.

Table 1: Parameters chosen for the unit sphere

The parameters used for this experiment are summarized in Table 1: the number of triangles, the nearfield quadrature order, the resolution of the cluster tree, i.e., the maximal size of leaf clusters, the order of interpolation for HCA and the order of quadrature for GCA, the relative accuracy for the adaptive cross approximation, the accuracy of the hierarchical compression used for HCA, the relative accuracy of the Krylov solver, and the relative accuracy of the preconditioner constructed by coarsening [14] and H-Cholesky factorization [20].


Figure 2: L² error for the Dirichlet-to-Neumann problem

Figure 2 shows the L²-norm error for the approximation of the Neumann data computed via the boundary integral equation (3). We use three harmonic functions as test cases. We can see that the optimal convergence rate is preserved despite the matrix compression and nearfield quadrature.


Figure 3: L² error for the Neumann-to-Dirichlet problem

Figure 3 shows the L²-norm error for the approximation of the Dirichlet data computed via the boundary integral equation (4). Also in this case, the optimal convergence rate is preserved.

Since computing the exact H^{−1/2}-norm error is complicated, we rely on the H^{−1/2}-ellipticity of the single-layer operator: we compute the L²-projection of the Neumann values into the discrete space, which is expected to converge to the exact solution at the optimal rate, and then compare it to the Galerkin solution using the energy product corresponding to the matrix G. Figure 4 indicates that the discrete energy-norm error even converges at a higher rate, therefore the convergence compared to the continuous solution is also preserved.


Figure 4: Discrete error for the Dirichlet-to-Neumann problem

Now that we have established that the compression algorithms do not hurt the optimal convergence rates of the Galerkin discretization, we can consider the corresponding complexity. The runtimes have been measured on a system with two Intel® Xeon® Platinum 8160 processors, each with 24 cores running at a base clock of 2.1 GHz. The implementation is based on the open-source H2Lib software package.


Figure 5: Setup times for cluster bases and matrices

Since we expect the runtime to grow almost linearly with the number of triangles N, we display the runtime per triangle in Figure 5, using a logarithmic scale for N and a linear scale for the runtime. We can see that HCA and GCA behave differently if the problem size grows: the runtime for HCA shows jumps when the order of interpolation is increased, while the runtime for GCA shows jumps when the order of the nearfield quadrature is increased.

This observation underlines a fundamental difference between the two methods: HCA constructs the full coefficient matrix S for every matrix block, and this matrix requires (m+1)⁶ coefficients to be stored and O(k (m+1)⁶) operations for the adaptive cross approximation, where we expect k ≪ (m+1)³. GCA, on the other hand, applies cross approximation only per cluster, not per block, and the recursive structure of the algorithm ensures that only the indices used in a cluster’s children are considered in their parent. This explains why GCA is less vulnerable to increases in the order than HCA.

On the other hand, HCA computes the approximation of a submatrix by evaluating single integrals, cf. (11), that require only one quadrature over single triangles, while GCA relies on double integrals, i.e., the entries of the submatrices G|_{t̃×s̃}, that require a quadrature over pairs of triangles and therefore considerably more operations. This explains why HCA is less vulnerable to increases in the nearfield quadrature order than GCA.

The setup of the matrices dominates the runtime: setting up G, K, and W takes far longer than constructing the preconditioners by coarsening and factorization, and solving the linear systems with the preconditioned conjugate gradient method takes only a small fraction of the total time, both for the Dirichlet-to-Neumann and the Neumann-to-Dirichlet problems.

Of course, the storage requirements of the algorithms may be even more important than the runtime, since they determine the size of a problem that “fits” into a given computer. We again expect almost linear growth in the number of triangles and report the storage requirements per triangle in Figure 6.


Figure 6: Storage requirements for the matrices

Although theory leads us to expect slightly superlinear growth of the storage requirements, Figure 6 suggests an essentially linear behaviour in practice. We can see that HCA consistently requires less storage than GCA. This is not surprising, since the algebraic algorithm [5] employed to turn the hierarchical matrix provided by HCA into an H²-matrix essentially computes the best possible H²-matrix approximation.

We may conclude that a server with two 24-core processors and a moderate amount of main memory can handle boundary element problems with more than 8 million triangles in a matter of hours without sacrificing accuracy.

Admittedly, the unit sphere considered so far is an academic example. In order to demonstrate that the techniques also work in more complicated settings, we consider the crank shaft geometry displayed in Figure 7, created with Joachim Schöberl’s netgen software. We start with a coarse triangular mesh and refine its triangles regularly in order to obtain higher resolutions.


Figure 7: Crank shaft geometry

The boundary element mesh is far less “smooth” in this case, and this leads both to an increased condition number and the need to use significantly higher nearfield quadrature orders. Since we have already seen that HCA is far less susceptible to the nearfield quadrature than GCA, we only consider HCA in this example. Experiments indicate that the parameters given in Table 2 are sufficient to preserve the theoretically predicted convergence rates of the Galerkin method.

Table 2: Parameters chosen for the crank shaft geometry


Figure 8: L² error for the Neumann-to-Dirichlet problem for the crank shaft geometry

Figure 8 shows the L²-norm errors for different refinement levels of the crank shaft geometry. We can see that the optimal rate of convergence is again preserved despite the matrix compression.


Figure 9: Setup times for matrices for the crank shaft geometry

Figure 9 shows the setup times per triangle for the three matrices. Due to the computationally expensive nearfield quadrature, the setup of the matrices dominates the other parts of the program: computing both preconditioners and solving the linear systems takes only a small fraction of the time required to set up G, K, and W.

We conclude that, using modern compression techniques like HCA and GCA in combination with efficient H²-matrix representations of the resulting matrices, large boundary element problems on meshes with several million triangles can be treated in a few hours on moderately expensive servers.


  • [1] Bebendorf, M.: Approximation of boundary element matrices. Numer. Math. 86(4), 565–589 (2000)
  • [2] Bebendorf, M.: Hierarchical LU decomposition based preconditioners for BEM. Computing 74, 225–247 (2005)
  • [3] Bebendorf, M., Kuske, C., Venn, R.: Wideband nested cross approximation for Helmholtz problems. Numer. Math. 130(1), 1–34 (2015)
  • [4] Bebendorf, M., Rjasanow, S.: Adaptive low-rank approximation of collocation matrices. Computing 70(1), 1–24 (2003)
  • [5] Börm, S.: Construction of data-sparse H²-matrices by hierarchical compression. SIAM J. Sci. Comp. 31(3), 1820–1839 (2009)
  • [6] Börm, S.: Efficient Numerical Methods for Non-local Operators: H²-Matrix Compression, Algorithms and Analysis, EMS Tracts in Mathematics, vol. 14. EMS (2010)
  • [7] Börm, S., Christophersen, S.: Approximation of integral operators by Green quadrature and nested cross approximation. Numer. Math. 133(3), 409–442 (2016)
  • [8] Börm, S., Grasedyck, L.: Low-rank approximation of integral operators by interpolation. Computing 72, 325–332 (2004)
  • [9] Börm, S., Grasedyck, L.: Hybrid cross approximation of integral operators. Numer. Math. 101, 221–249 (2005)
  • [10] Cohen, A., Dahmen, W., DeVore, R.: Adaptive wavelet methods for elliptic operator equations — Convergence rates. Math. Comp. 70, 27–75 (2001)
  • [11] Dahmen, W., Schneider, R.: Wavelets on manifolds I: Construction and domain decomposition. SIAM J. Math. Anal. 31, 184–230 (1999)
  • [12] Erichsen, S., Sauter, S.A.: Efficient automatic quadrature in 3-d Galerkin BEM. Comput. Meth. Appl. Mech. Eng. 157, 215–224 (1998)
  • [13] Giebermann, K.: Multilevel approximation of boundary integral operators. Computing 67, 183–207 (2001)
  • [14] Grasedyck, L.: Adaptive recompression of H-matrices for BEM. Computing 74(3), 205–223 (2004)
  • [15] Grasedyck, L., Hackbusch, W.: Construction and arithmetics of H-matrices. Computing 70, 295–334 (2003)
  • [16] Greengard, L., Rokhlin, V.: A fast algorithm for particle simulations. J. Comp. Phys. 73, 325–348 (1987)
  • [17] Greengard, L., Rokhlin, V.: A new version of the fast multipole method for the Laplace equation in three dimensions. In: Acta Numerica 1997, pp. 229–269. Cambridge University Press (1997)
  • [18] Hackbusch, W.: Elliptic Differential Equations. Theory and Numerical Treatment. Springer-Verlag Berlin (1992)
  • [19] Hackbusch, W.: A sparse matrix arithmetic based on H-matrices. Part I: Introduction to H-matrices. Computing 62(2), 89–108 (1999)
  • [20] Hackbusch, W.: Hierarchical Matrices: Algorithms and Analysis. Springer (2015)
  • [21] Hackbusch, W., Khoromskij, B.N., Sauter, S.A.: On H²-matrices. In: Bungartz, H., Hoppe, R., Zenger, C. (eds.) Lectures on Applied Mathematics. pp. 9–29. Springer-Verlag, Berlin (2000)
  • [22] Harbrecht, H., Schneider, R.: Wavelet Galerkin schemes for boundary integral equations – Implementation and quadrature. SIAM J. Sci. Comput. 27, 1347–1370 (2006)
  • [23] Langer, U., Pusch, D., Reitzinger, S.: Efficient preconditioners for boundary element matrices based on grey-box algebraic multigrid methods. Int. J. Numer. Meth. Eng. 58(13), 1937–1953 (2003)
  • [24] von Petersdorff, T., Schwab, C.: Fully discretized multiscale Galerkin BEM. In: Dahmen, W., Kurdila, A., Oswald, P. (eds.) Multiscale wavelet methods for PDEs. pp. 287–346. Academic Press, San Diego (1997)
  • [25] Rokhlin, V.: Rapid solution of integral equations of classical potential theory. J. Comp. Phys. 60, 187–207 (1985)
  • [26] Sauter, S.A.: Cubature techniques for 3-d Galerkin BEM. In: Hackbusch, W., Wittum, G. (eds.) Boundary Elements: Implementation and Analysis of Advanced Algorithms. pp. 29–44. Vieweg-Verlag (1996)
  • [27] Sauter, S.A., Schwab, C.: Boundary Element Methods. Springer (2011)
  • [28] Steinbach, O., Wendland, W.L.: The construction of some efficient preconditioners in the boundary element method. Adv. in Comp. Math. 9, 191–216 (1998)
  • [29] Tausch, J.: A variable order wavelet method for the sparse representation of layer potentials in the non-standard form. J. Numer. Math. 12(3), 233–254 (2004)