## Method details

In Drzisga et al. (2019), we applied the surrogate matrix methodology to isogeometric analysis (IGA) in order to avoid *over-assembling* mass, stiffness, and divergence matrices.
While doing so, we also showed how this methodology could be applied to other system matrices arising in IGA computations.
In that article, large performance gains were demonstrated for a number of important PDE scenarios while still maintaining solution accuracy.
In the present article, we provide an accompanying reference implementation, together with explanations and examples.
This should be considered an extension to the main theoretical article Drzisga et al. (2019), which we strongly recommend to read beforehand.

Recall that Drzisga et al. (2019) focuses on the Galerkin form of IGA Cottrell et al. (2009); Hughes et al. (2005).
The resulting surrogate matrix methods perform quadrature for only a small fraction of the NURBS basis function interactions during assembly and then *approximate the rest by interpolation*.
This leads to large sparse linear system matrices where the majority of entries have been computed by interpolation.
Such interpolated matrices will generally not coincide with those which would otherwise be generated by performing quadrature for the complete basis, or on every element, but they can be interpreted as cost-effective surrogates for them.

This idea was first introduced in the context of first-order finite elements and massively parallel simulations by Bauer et al. in Bauer et al. (2017). There, the computational run time improvement results from a two-scale strategy after which the method was named. Thereafter, several subsequent investigations were initiated Bauer et al. (2019, 2018); Drzisga et al. (2019, 2019), of which this paper may be seen as a product. In particular, massively parallel applications in geodynamical simulations are presented in Bauer et al. (2019, 2018) and a theoretical analysis for first order finite element methods is given in Drzisga et al. (2019). For moderately sized problems a two-scale strategy will possibly not result in an optimal performance gain. Therefore, we exploit a mesh-size dependent macro-element approach where the macro-mesh is closer related to the fine mesh.

In this article, we restrict our attention to Poisson’s boundary value problem on a single patch geometry , where , is a fixed diffeomorphism of sufficient regularity, and . Represented on the reference domain , the bilinear form appearing in the standard weak form of this problem is defined

for arbitrary .
Adopting further notation from Drzisga et al. (2019), this reference domain description leads to the definition of a set of smooth *stencil functions* , .
In turn, these functions can be interpolated onto a finite-dimensional space of multivariate B-splines using their values sampled from a subset of a lattice .
Such interpolants are called *surrogate stencil functions* .
Taking on the convention , we define the surrogate stiffness matrix as follows:

This definition preserves the symmetry and the kernel of the true IGA stiffness matrix .

The reference implementation described in this paper is built on the GeoPDEs package for Isogeometric Analysis in Matlab and Octave de Falco et al. (2011); Vázquez (2016). This package provides a framework for implementing and testing new isogeometric methods for PDEs. Our reference implementation is available in the git repository git (2019), which is itself a fork of the GeoPDEs repository. It is important to note that similar modifications can be made to other present-day software libraries and so the implementation presented in this article should foremost serve as a template. For this reason, we tried to find a balance between comprehensibility and performance. Most notably, the performance of our implementation may be greatly improved by performing quadrature for individual basis function interactions instead of element-wise quadrature. However, for most IGA software libraries, this feature would require significant software restructuring.

Before continuing, let us establish some more mathematical notation which appears in the article, cf. Drzisga et al. (2019). Let be the space dimension, be the maximum degree of the discrete IGA B-spline spaces, and be the mesh size. By , we denote the skip parameter which defines the sampling length on which we perform interpolation of the stencil functions. Let be the spline degree of the interpolation space. We assume that the number of elements in each spatial dimension is always equal and we denote its value by

. Finally, recall that B-spline and NURBS bases are built from tensor products of univariate B-splines. For convenience, we assume that the associated number of knots and the B-spline degrees do not depend on the Cartesian coordinates.

## Implementation

Our implementation preserves the local element quadrature approach present in most standard IGA and finite element software.
This is not optimally efficient, but performance advantages can still be easily achieved because quadrature is usually not required on every element.
In order to avoid performing quadrature on specific elements, we made some minor modifications to the following core functions in GeoPDEs: msh_evaluate_col.m^{1}^{1}1
https://github.com/drzisga/geopdes/blob/v3.1-surrogate/geopdes/inst/msh/%40msh_cartesian/msh_evaluate_col.m
, @sp_scalar/sp_evaluate_col.m^{2}^{2}2
https://github.com/drzisga/geopdes/blob/v3.1-surrogate/geopdes/inst/space/%40sp_scalar/sp_evaluate_col.m
, and @sp_scalar/sp_evaluate_col_param.m^{3}^{3}3
https://github.com/drzisga/geopdes/blob/v3.1-surrogate/geopdes/inst/space/%40sp_scalar/sp_evaluate_col_param.m
.^{4}^{4}4The interested reader may refer to the diff between the surrogate and the GeoPDEs master branch in order to view the precise modifications.
In addition to these minor modifications, we added two functions which handle the assembly of the surrogate stiffness matrices in 2D and 3D, respectively.
Namely, we added op_gradu_gradv_surrogate_2d^{5}^{5}5
https://github.com/drzisga/geopdes/blob/v3.1-surrogate/geopdes/inst/space/%40sp_scalar/op_gradu_gradv_surrogate_2d.m
and op_gradu_gradv_surrogate_3d^{6}^{6}6
https://github.com/drzisga/geopdes/blob/v3.1-surrogate/geopdes/inst/space/%40sp_scalar/op_gradu_gradv_surrogate_3d.m
.
These functions are based on @sp_scalar/op_gradu_gradv_tp^{7}^{7}7
https://github.com/drzisga/geopdes/blob/v3.1-surrogate/geopdes/inst/space/%40sp_scalar/op_gradu_gradv_tp.m
wherein the global matrix is assembled in a column-wise fashion Vázquez (2016).
In this section, we outline the modifications made to these core routines and describe the new 2D function in detail by breaking it down into coherent pieces of code.

### Code modifications

In GeoPDEs, column-wise assembly which exploits the tensor product structure in IGA is used for performance Vázquez (2016).
For instance, in 2D, the functions listed above are called for each column of elements^{8}^{8}8
In 3D, each column in the first dimension corresponds to a plane in the remaining dimensions, thus multiple element masks are required to cover all the active elements.
in a patch during the column-wise assembly of the global stiffness matrix.
Here, the function msh_evaluate_col collects the quadrature rules for the physical domain in a column and the function sp_evaluate_col returns a struct variable representing the discrete function space in that column.

Let us denote the elements where quadrature in necessary as *active elements*, cf. Fig. 1.
We added the possibility to perform quadrature on only a subset of the elements in a given column by providing a list of indices called a *mask* for the active elements.
We implemented this feature by extending the core functions mentioned above by an additional optional argument named element_mask

which, as the name suggests, consists of a cell array with vectors of indices of active elements. Particularly, the

element_mask is a cell array with a mask for the second dimension in the first component and, if required, contains an additional mask for the third dimension in the second component.Some typical patterns of active elements are depicted in Fig. 1. Here, we have illustrated two different element masks corresponding to the cases , , and . Let us focus on the case . Here, the first and last four columns correspond to near-boundary elements. That is, each whole column of elements is made up of active elements; cf. the red column in Fig. 1. In this scenario, we should employ unmodified function calls to msh_evaluate_col and sp_evaluate_col.

Another scenario is when only the top and bottom near-boundary elements in a column are active; cf. the green elements in Fig. 1. Here, the element mask is given by the set , which yields the element mask for .

In the third and final scenario, both elements in the interior and near the top and bottom boundary are active; cf. the blue elements in Fig. 1. Each interior active element patch consists of elements and we start sampling at the element with an increment of . Since we want to avoid extrapolation in the subsequent spline interpolation, we also include the active elements near the interior boundary. With , the interior indices are thus given by . The final element mask is obtained by also including the near-boundary elements, i.e., . In our case , the element mask is described by the set .

### Code extensions

Below is the signature of op_gradu_gradv_surrogate_2d. This function has the following input arguments: a space of type sp_scalar, a mesh of type msh_cartesian, a function handle coeff of the coefficient, the skip paramater M and the surrogate interpolation degree q. The final surrogate matrix K_surr (in sparse format) is the sole output.

[firstline=36,lastline=36]src/op_gradu_gradv_surrogate_2d.m At the beginning of the function, some self-explanatory sanity checks are performed which verify that the correct input parameters have been passed. [firstline=38,lastline=52]src/op_gradu_gradv_surrogate_2d.m In the next few lines, some helper variables holding the number of degrees of freedom (dofs) in the univariate B-spline basis are defined. The total number may be expressed as

. Removing dofs from both the left and right boundary yields the number of interior dofs present in one Cartesian direction in the stencil function domain. [firstline=54,lastline=56]src/op_gradu_gradv_surrogate_2d.m The following lines compute the indices of the rows in the global stiffness matrix corresponding to the stencil function domain. This is done by collecting all available indices and removing the indices nearest the boundary in each dimension. In the 3D case, the array row_indices has 3 dimensions. [firstline=59,lastline=65]src/op_gradu_gradv_surrogate_2d.m The next step is computing the sample point coordinates in the interior of the stencil function domain described in Drzisga et al. (2019). For practical reasons, we map this domain to the unit domain and take every point in each direction. To avoid extrapolating any values later, we reinsert the sample points on boundary of which may have been skipped. [firstline=68,lastline=72]src/op_gradu_gradv_surrogate_2d.m Prior to computing the element mask, we save the indices of the matrix rows corresponding to the sample points row_indices_subset. These indices are required in order to determine the active elements with the GeoPDEs function sp_get_cells. In order to make the coming call to sp_get_cells faster, we filter the row_indices_subset to only include the rows up to index . The last line filtering the rows is optional and may be omitted. [firstline=75,lastline=77]src/op_gradu_gradv_surrogate_2d.m The following snippet sets up an element mask which skips the quadrature at all non-active elements. Here, we employ the sp_get_cells function which returns the indices of all elements within the support of the basis functions corresponding to the rows filtered in the previous snippet. Some additional filtering enforces that only the indices up to are included. Since quadrature needs to be performed on all of the remaining near-boundary elements, we ensure that the elements with first or second indices in are also active. [firstline=80,lastline=84]src/op_gradu_gradv_surrogate_2d.m Additionally, a second mask with only the near-boundary element indices is required. Clearly, this mask only includes the indices in. [firstline=87,lastline=88]src/op_gradu_gradv_surrogate_2d.m For performance reasons, we pre-allocate memory for the sparse matrix with an estimated number of non-zero entries per row. [firstline=91,lastline=91]src/op_gradu_gradv_surrogate_2d.m Below is the first main loop which only performs quadrature on the active elements; i.e., those required for the subsequent stencil interpolation. It is very similar to the original column-wise loop in

op_gradu_gradv_tp, but it employs the modified msh_evaluate_col and sp_evaluate_col (see previous subsection) with the previously determined masks. [firstline=94,lastline=112]src/op_gradu_gradv_surrogate_2d.m We define three vectors, which will hold the columns, rows, and values of the surrogate matrix. This format allows for faster construction of a sparse matrix in a compressed sparse column format. [firstline=114,lastline=116]src/op_gradu_gradv_surrogate_2d.m The following snippet contains the main loop of the surrogate assembly algorithm. In practice, there are surrogate stencil functions , so the statements in the inner-most loop are reached times. First, the position (i.e., shift) of the stencil function relative to the diagonal entries in the matrix is computed. This “shift” is in one-to-one correspondence with the translations described in Drzisga et al. (2019). Due to symmetry, we only need to compute the surrogates for the upper-diagonal matrix, thus we skip all cases where the shift is smaller or equal to zero by simply continuing to the next iteration. If the shift is larger than zero, we extract all the sample points from the associated rows of the partly assembled stiffness matrix. Having all the sample points at hand, we can easily perform the interpolation of the remaining matrix entries . In order to remove any dependence on external software, the built-in Matlab function interp2 is used here. Note that this function only supports the spline interpolation orders and . In Drzisga et al. (2019), we used the RectBivariateSpline function provided by the SciPy Python package Jones et al. (01 ), which supports any spline interpolation up to order . In the last three lines, the rows, columns, and values of the surrogate matrix are added to the vectors required for the sparse matrix creation at the end of the routine. Note that the values of the upper-diagonal are added to the lower-diagonal in order to enforce symmetry. [firstline=119,lastline=142]src/op_gradu_gradv_surrogate_2d.m For performance reasons, we pre-allocate the diagonal of the surrogate stiffness matrix with dummy values which are overwritten later. [firstline=145,lastline=147]src/op_gradu_gradv_surrogate_2d.m In the penultimate step, we combine the matrix components obtained through interpolation with those obtained through standard quadrature. This is done by creating a matrix K_interp with only the interpolated components. We then zero all of the entries in K_surr which have non-zero values in K_interp and finally sum the two matrices. This may seem complicated at first, but it yields good performance because the sum of both sparse matrices may be computed efficiently. [firstline=150,lastline=155]src/op_gradu_gradv_surrogate_2d.m In the final step, the zero row-sum property is enforced by setting the diagonal value to the negative sum of the off-diagonal values in each row. [firstline=158,lastline=158]src/op_gradu_gradv_surrogate_2d.m The op_gradu_gradv_surrogate_3d function is in many ways similar to its 2D counterpart. For instance, clearly some of the 2D arrays need to be changed to 3D arrays, the 3D interpolation function interp3 needs to be used, and the number of stencil functions increases. However, performing quadrature only on the active elements is more difficult, since in the column-wise iteration GeoPDEs uses, each column is now made up of a matrix of elements (instead of a vector). Therefore, a much more complicated set of element masks need to be used; see lines 98 to 115 in op_gradu_gradv_surrogate_3d.m^{9}

^{9}9 https://github.com/drzisga/geopdes/blob/v3.1-surrogate/geopdes/inst/space/%40sp_scalar/op_gradu_gradv_surrogate_3d.m .

## Verification

To facilitate easy verification, we added the demo script geopdes_surrogate_examples.m^{10}^{10}10
https://github.com/drzisga/geopdes/blob/v3.1-surrogate/geopdes/inst/examples/geopdes_surrogate_examples.m
, which provides the ability to run the problems considered in (Drzisga et al., 2019, Section 7.3).
The code for the 2D and 3D examples may be found in ex_surrogate_poisson_2d.m^{11}^{11}11
https://github.com/drzisga/geopdes/blob/v3.1-surrogate/geopdes/inst/examples/surrogate/ex_surrogate_poisson_2d.m
and ex_surrogate_poisson_3d.m^{12}^{12}12
https://github.com/drzisga/geopdes/blob/v3.1-surrogate/geopdes/inst/examples/surrogate/ex_surrogate_poisson_3d.m
, respectively.
Again, recall Poisson’s problem

on a domain .
The 2D and 3D domains we considered are depicted in Fig. 2 and their respective NURBS/B-spline descriptions are in GeomQuarterAnnulusWithBumps.m^{13}^{13}13
https://github.com/drzisga/geopdes/blob/v3.1-surrogate/geopdes/inst/examples/surrogate/data_files/GeomQuarterAnnulusWithBumps.m
and GeomBentTwistedBox.m^{14}^{14}14
https://github.com/drzisga/geopdes/blob/v3.1-surrogate/geopdes/inst/examples/surrogate/data_files/GeomBentTwistedBox.m
.

In order to test the quality of the discrete solutions, we employ the following manufactured solution and load in 2D

and the following in 3D

The Dirichlet datum is chosen as the restriction of the manufactured solution to the boundary .

Each example script assembles the standard IGA matrix first and solves the corresponding system. Afterwards, the surrogate matrix is assembled and the surrogate system is solved. Finally, the relative and errors for each case are computed and written to the console. In addition, both the maximum absolute value of the difference in the two stiffness matrices (i.e., ) and the achieved speed-up are displayed.

In the following, we present the output of running the ex_surrogate_poisson_2d script in Matlab 2018b.
The script ran on a workstation equipped with an Intel^{®} Xeon^{®} E5-1630 v3 processor with a nominal frequency of 3.7 GHz.
[fontsize=,frame=single,framerule=0.4pt]text
Initializing problem…
Assembling standard IGA matrix…
Standard assembly time: 5.022883 s
Solving standard IGA problem…
Standard solve time: 0.270706 s
Assembling surrogate IGA matrix…
Surrogate assembly time: 1.577257 s
Solving surrogate IGA problem…
Surrogate solve time: 0.273799 s
Computing errors…
Relative error in standard IGA
L2-norm: 1.335554e-03
H1-norm: 1.407778e-02

∥A-~A∥_max = 9.877796e-04

Relative error in surrogate IGA L2-norm: 1.335619e-03 H1-norm: 1.407779e-02

Assembly speed-up: 218.46

The default parameters in the demo script are , , and . The number of knots in each dimension is in 2D and in 3D. These parameters may be easily modified by resetting the appropriate variables at the beginning of each script.

## Acknowledgments

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 800898. This work was supported by the German Research Foundation (DFG) and the Technical University of Munich (TUM) within the Priority Programme 1648 “Software for Exascale Computing” (SPPEXA), by grant WO671/11-1, and in the framework of the Open Access Publishing Program.

## References

- Drzisga et al. (2019) D. Drzisga, B. Keith, B. Wohlmuth, The surrogate matrix methodology: Low-cost assembly for isogeometric analysis, arXiv preprint arXiv:1904.06971 (2019).
- Cottrell et al. (2009) J. A. Cottrell, T. J. Hughes, Y. Bazilevs, Isogeometric analysis: toward integration of CAD and FEA, John Wiley & Sons, 2009.
- Hughes et al. (2005) T. J. Hughes, J. A. Cottrell, Y. Bazilevs, Isogeometric analysis: CAD, finite elements, NURBS, exact geometry and mesh refinement, Comput. Methods Appl. Mech. Eng. 194 (2005) 4135–4195.
- Bauer et al. (2017) S. Bauer, M. Mohr, U. Rüde, J. Weismüller, M. Wittmann, B. Wohlmuth, A two-scale approach for efficient on-the-fly operator assembly in massively parallel high performance multigrid codes, Appl. Numer. Math. 122 (2017) 14–38.
- Bauer et al. (2019) S. Bauer, M. Huber, S. Ghelichkhan, M. Mohr, U. Rüde, B. Wohlmuth, Large-scale simulation of mantle convection based on a new matrix-free approach, J. of Comput. Science 31 (2019) 60–76.
- Bauer et al. (2018) S. Bauer, M. Huber, M. Mohr, U. Rüde, B. Wohlmuth, A new matrix-free approach for large-scale geodynamic simulations and its performance, in: Int. Conf. on Comput. Science, Springer, 2018, pp. 17–30.
- Drzisga et al. (2019) D. Drzisga, B. Keith, B. Wohlmuth, The surrogate matrix methodology: a priori error estimation, arXiv preprint arXiv:1902.07333 (2019).
- de Falco et al. (2011) C. de Falco, A. Reali, R. Vázquez, GeoPDEs: a research tool for isogeometric analysis of PDEs, Adv. Eng. Software 42 (2011) 1020–1034.
- Vázquez (2016) R. Vázquez, A new design for the implementation of isogeometric analysis in Octave and Matlab: GeoPDEs 3.0, Comput. & Math. with Appl. 72 (2016) 523–554.
- git (2019) Fork of the GeoPDEs git repository including the surrogate branch, https://dx.doi.org/10.5281/zenodo.3402341, 2019. doi:10.5281/zenodo.3402341, [Online; accessed September 7th 2019].
- Jones et al. (01 ) E. Jones, T. Oliphant, P. Peterson, et al., SciPy: Open source scientific tools for Python, http://www.scipy.org, 2001–. [Online; accessed February 11th 2019].