A Sketched Finite Element Method for Elliptic Models

by   Robert Lung, et al.
University of Oxford

We consider a sketched implementation of the finite element method for elliptic partial differential equations on high-dimensional models. Motivated by applications in real-time simulation and prediction we propose an algorithm that involves projecting the finite element solution onto a low-dimensional subspace and sketching the reduced equations using randomised sampling. We show that a sampling distribution based on the leverage scores of a tall matrix associated with the discrete Laplacian operator, can achieve nearly optimal performance and a significant speedup. We derive an expression of the complexity of the algorithm in terms of the number of samples that are necessary to meet an error tolerance specification with high probability, and an upper bound for the distance between the sketched and the high-dimensional solutions. Our analysis shows that the projection not only reduces the dimension of the problem but also regularises the reduced system against sketching error. Our numerical simulations suggest speed improvements of two orders of magnitude in exchange for a small loss in the accuracy of the prediction.



page 22


Symmetrized two-scale finite element discretizations for partial differential equations with symmetric solutions

In this paper, a symmetrized two-scale finite element method is proposed...

A Geometrical Method for Low-Dimensional Representations of Simulations

We propose a new data analysis approach for the efficient post-processin...

Application of Randomized Quadrature Formulas to the Finite Element Method for Elliptic Equations

The implementation of the finite element method for linear elliptic equa...

A nonconforming finite element method for an elliptic optimal control problem with constraint on the gradient

This article is concerned with the nonconforming finite element method f...

Adaptive finite element approximations for elliptic problems using regularized forcing data

We propose an adaptive finite element algorithm to approximate solutions...

Adaptive surrogate models for parametric studies

The computational effort for the evaluation of numerical simulations bas...

A practical algorithm to minimize the overall error in FEM computations

Using the standard finite element method (FEM) to solve general partial ...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1. Introduction

Motivated by applications in digital manufacturing twins and real-time simulation in robotics, we consider the implementation of the Finite Element Method (FEM) in high-dimensional discrete models associated with elliptic partial differential equations (PDE). In particular, we focus on the many-query context, where a stream of approximate solutions are sought for various PDE parameter fields [8], aiming to expedite computations in situations where speedy model prediction is critical. Realising real-time simulation with high-dimensional models is instrumental to enable digital economy functions and has been driving developments in model reduction over the last decade [12]. Reducing the computational complexity of models is also central to the practical performance of statistical inference and uncertainty quantification algorithms, where a multitude of model evaluations are necessary to achieve convergence [16]. When real-time prediction is coupled with noisy sensor data, as in the digital twins paradigm, a fast, somewhat inaccurate model prediction typically suffices [4].

Our approach is thus tailored to applications where some of the accuracy of the solution can be traded off with speed. In these circumstances the framework of randomised linear algebra presents a competitive alternative [23]. In the seminal work [6], Drineas and Mahoney propose an algorithm for computing the solution of the Laplacian of a graph, making the case for sampling the rows of the matrices involved based on their statistical leverage scores. Despite aimed explicitly for symmetric diagonally dominant systems arising, their approach provides inspiration for the numerical solution of PDEs on unstructured meshes. Apart from the algebraic resemblance to the Galerkin FEM systems, the authors introduced sampling based on leverage scores of matrices through the concept of ‘effective resistance’ of a graph derived by mimicking Ohmic relations in resistor networks. As it turns out the complexity of computing the leverage scores is similar to that of solving the high-dimensional problem deterministically, however efficient methods to approximate them have since been suggested [7]. More recently, Avron and Toledo have proposed an extension of [6] for preconditioning the FEM equations by introducing the ‘effective stiffness’ of an element in a finite element mesh [1]. Specifically, for sparse symmetric positive definite (SSPD) stiffness matrices, they derive an expression for the effective stiffness of an element and show its equivalence to the statistical leverage scores. Sampling elements leads to a sparser preconditioner.

In situations where a single, high-dimensional linear system is sought, randomised algorithms suited to SSPD systems are readily available. The methods of Gower and Richtarik for example randomises the row-action iterative methods by taking a sequence of random projections onto convex sets [9]

. This algorithm is equivalent to a stochastic gradient descent method with provable convergence, while their alternative approach in

[10] iteratively sketches the inverse of the matrix. In [2], Bertsekas and Yu present a Monte Carlo method for simulating approximate solutions to linear fixed-point equations, arising in evaluating the cost of stationary policies in Markovian decisions. Their algorithm is based on approximate dynamic programming and has subsequently led to [20], that extends some of the proposed importance sampling ideas in the context of linear ill-posed inverse problems.

Real-time FEM computing at the many query paradigm, is hindered by two fundamental challenges: the fast assembly of the stiffness matrix for each parameter field, and the efficient solution of the resulting system to the required accuracy. To mitigate these, is to compromise slightly on the accuracy in order to capitalise on speed. To achieve this we first transform the linear SSPD system into an overdetermined least squares problem, and then project its solution this onto a low-dimensional subspace. This mounts to inverting a low-dimensional, dense matrix whose entries are perturbed by random errors. Our emphasis and contributions are in developing the projected sketching algorithm, and in optimising the sampling process so that it is both efficient in the multi-query context and effective in suppressing the variance of the solution. We also analyse the complexity of our algorithm and derive, probabilistic error bounds for quality of the approximation.

Our paper is organised as follows: In section 2 we provide a concise introduction to the Galerkin formulation for elliptic boundary value problems, and subsequently derive the projected least squares formulation of the problem. We then describe the sampling distribution used in the sketching and provide the conditions under which the reduced sketched system has a unique solution. Section 4 contains a description of our algorithm, and our main result that describes the complexity of our algorithm in achieving an error tolerance in high probability. We then provide an error analysis addressing the various types of errors imparted on the solution through the various stages of the methodology, before concluding with some numerical experiments based on the steady-state diffusion equation.

1.1. Notation

Let denote the set of integers between 1 and inclusive. For a matrix , and denote its -th row and column respectively, and its -th entry. is the pseudo-inverse of and its condition number. If

we define the singular value decomposition

where , and

. Unless stated otherwise, singular values and eigenvalues are ordered in non-increasing order. Analogously, for a symmetric and positive definite matrix

, is the largest eigenvalue, and the smallest. By we denote the number of non-zero elements in . Further we write

for the Euclidean norm for a vector or the spectral norm of a matrix and

the Frobenius norm of a matrix. For matrices and with the same number of rows

is the augmented matrix formed by column concatenation. The identity matrix is expressed as

or to specify its dimension when important to the context. We write for the Kronecker product of vector with the ones vector in dimensions.

2. Galerkin finite element method preliminaries

Consider the elliptic partial differential equation


on a bounded, simply connected domain , with Dirichlet conditions


on a Lipschitz smooth boundary . Further let a bounded positive parameter function in the Banach space such that


for some finite constants and . Multiplying (1) by an appropriate test function , then integrating over the domain and invoking the divergence theorem yields


where denotes the -dimensional integration element. Using the standard definition of the Sobolev space on this domain as


where is the space of square-integrable functions on we define the solution and test function spaces respectively by


Let and , where the Sobolev space is to be understood in terms of a surjective trace operator from to . Then the weak form of the boundary value problem (1)-(2) is to find a function such that


The existence and uniqueness of the weak solution is guaranteed by the Lax-Milgram theorem [8].

To derive the Galerkin finite element approximation method from the weak form (7), we consider a mesh comprising elements, having interior and boundary vertices (nodes). Further let the conforming finite dimensional space associated with the chosen finite element basis defined on . Choosing

to comprise linear interpolation shape functions with local support over the elements in

then we can express the FEM approximation of in this basis for a set of coefficients as


We have made slight abuse of notation by using for the function in as well as its FEM approximation in . In effect, the finite element formulation of the boundary value problem is to find such that


We further define the element-average coefficients


and applying the Dirichlet boundary conditions on the boundary nodes we arrive at the Galerkin system of equations for the vector


The equations in (11) are expressed in a matrix form as


where is the symmetric, sparse and positive-definite stiffness matrix, whose dependence on the parameters is implicit and suppressed for clarity. The FEM construction guarantees the consistency of the system (12), thus is always in the column space of and consequently it admits a unique solution . As we focus to the efficient approximation of in the many query context we content with two challenges: the efficient assembly of the stiffness matrix, and the speedy solution of the resulted FEM system.

2.1. The stiffness matrix

Let is the index set of the vertices of the th element, and consider to be the sparse matrix holding the gradients of the linear shape functions where . In this is then a constant gradients vector associated with the th node of , and let the element of a vector such that and a row concatenation of matrices for all elements. If we define as and the concatenation of the matrices as


then the stiffness matrix takes the form of a high-dimensional sum or product of sparse matrices


which for large require efficient assembly using reference elements and geometry mappings [15]

. The above construction typically leads to a stiffness matrix that is well-conditioned for inversion with the exception of acute element skewness

[14] and parameter vectors with wild variation [22], which cause the the condition number to increase dramatically. Explicit bounds on the largest and smallest eigenvalues of , and respectively the singular values of , are given in [13].

3. A regularised sketched formulation

The sought solution can be alternatively obtained by solving the over-determined least squares problem



The fact that the above problem is over-determined implies, at least to some extent, robustness against noise, such as random perturbations on the elements of the matrix and vector . A similar error is induced by randomised sketching where we replace (15) with


and look for a random approximation of in the sense that . We note that and don’t have to be similar as such, e.g. have the same dimensions, as long as the problems are well defined and the optimisers remain similar. Following [6] and [19] we seek to approximate with some sketch by sampling and scaling rows according to probabilities that will be specified later. The number of rows in in that case equals the number of drawn samples. Clearly must have at least rows as otherwise the problem (16) will be under-determined and, due to the non-uniqueness of the solution, the error could become arbitrarily large. On the other hand, if around rows are sampled from a suitable distribution, then Drineas and Mahoney show that the resulting sketch is a good approximation with high probability. However, if substantially less than samples are drawn then the sketching induced error outweighs its computational benefits. In order to understand how this issue can be addressed we note that, if has full column-rank and thus the optimiser of (16) is unique, the solution of the sketched problem can be obtained by solving the linear system

which is equivalent to solving


From (17) it becomes clear that the sketching induced error can be regarded as an error on the right-hand side of the linear system (12) or the least squares problem (15). We can easily obtain a bound for the relative error given by

A standard way of dealing with noise as in (17) is regularisation [18]. Suppose that there exists a low-dimensional subspace


spanned by a basis of orthonormal functions arranged in the columns of matrix , and assume that is sufficient to approximate within some acceptable level of accuracy, in the sense of incurring a small subspace error . The orthogonal projection operator maps vectors from onto the subspace . Of course, such a subspace can’t accommodate all but rather only sufficiently regular . For that reason has to be constructed using prior information (e.g. degree of smoothness) about the solution. Orthogonality of ensures for any the existence of a unique, optimal low-dimensional vector satisfying


In these conditions we can pose a projected-regularised least-squares problem replacing (15) by


in order to improve the robustness of the solution against sketching-induced errors. The problem in (20) still involves high-dimensional quantities such as and , but the solution is unique as soon as and the null-space of have intersection. We start by introducing the low dimensional problem***We emphasise the contrast between the projected equations in (21) and the projected variable least squares problem

whose solution is
and incurs a subspace regression error term that is quadratic in . Moreover, note that the right hand side vector in the normal equations has dependence on the parameter through .


A solution of (21) yields a solution of (20) because the columns of form an ONB of . In addition, we have the following.

Lemma 3.1.

If has full column rank and the columns of form an ONB of so that is the projection onto , then


In particular, both problems have a unique solution.


Both problems have unique solutions because is convex and has (by assumption) full column rank. Therefore it suffices to show that there exists an element that solves both problems. The solution of (21) can be found explicitly by solving the linear system

We have used that has full column rank so that and is invertible. Similarly we may consider

which produces solutions such that is a solution of the right-hand side of (22). Since and has full column rank we can write as

We conclude that is a solution to both sides of (22) which completes the proof. ∎

The right hand side of (22) has a very natural interpretation and is obtained by embedding the rows of , the vector and the variable in using its low dimensional representation from the basis induced by the columns of . In view of Lemma 3.1 we may regularise the problem from (16) and obtain an embedded sketched counterpart to (20) as


We argue that (23) is much more robust to the noise imparted by the approximation and produces solutions with controlled errors even if substantially less than suitably drawn samples are used for the approximation. In order to see why, notice that the problem (23) can be expressed in terms of the low-dimensional vector of coefficients


so that . Recalling that , it is convenient to introduce


together with their sketched approximations

Lemma 3.2.

If has full column rank then the solution of the least-squares problem (24) is given by and we have


where and are the solutions of (20) and (23) respectively.


If has linearly independent columns then and the solution of (24) solves

Again is invertible because has linearly independent columns and the first claim follows. The matrix is positive definite which implies that is positive definite and . The matrix has orthonormal columns which implies . Since we can use the formula we have just shown and obtain

where the last identity is due to . ∎

In order to understand the effect of row sampling and why it can be a good approximation, we start by writing


as a sum of outer products of rows. Introduce for some sample size the iid random indices taking values in with distribution


for each and . Instead of (28) we may consider the sketch


If we define the random matrix

and the random diagonal matrix via


then can put and construct the sketch as


Lastly, we can write as well as for the sketches of and

. A simple computation together with an application of the strong law of large numbers shows the following.

Proposition 3.3 (Lemma 3 and 4 in [DrineasMahoneyKannan]).

Assume that the sampling probabilities satisfy the consistency condition


In this case we have for the matrix as defined in (30) that and . As a consequence, almost surely for .

Proposition 3.3 summarises the asymptotic properties of the used sketch. The condition (33) is very mild and holds for a wide range of distributions such as sampling from scaled row norms or uniform sampling. The convergence rate of cannot be improved although the constant depends on the chosen probabilities . In other words, as long as we sample all non-zero rows with positive probability we will obtain a sketch that has good asymptotic properties when considered as an approximation for . However, in order to find good sampling probabilities we have to consider the non-asymptotic behaviour of the sketch. In fact, the main purpose of the regularisation/dimensionality reduction was to avoid situations where sampling a large number of rows is necessary. If , then the regularised problem (21

) has substantially fewer degrees of freedom than the high dimensional formulation in (

15). Consequently, the dependence of on the rows of is a lot smoother than the dependence of on . In other words, approximating by row sampling has a much smaller effect on the regularised solution than an approximation of with the same sample size would have on the solution of the full system (12). For example, a much smaller number of rows needs to be sampled to obtain the correct null-space which results in a full-rank approximation of . Note that, conditional on being invertible, in combination with Lemma 3.2 implies


so the randomisation error of the regularised problem is entirely controlled by low dimensional structures. This property is the key to a small sketching error and thus to an overall accurate approximation when only few samples are drawn. Using the notation from before and letting be the singular value decomposition of , we can write the bound from (34) as

From the above formulation it becomes apparent that the error will be small if the sketch is constructed such that in spectral norm. We argue that this is essentially equivalent to . Indeed, we have the following.

Lemma 3.4.

If then


Under the condition of the lemma we know that is invertible and that

which implies the upper bound by considering the estimate

Denote by the -th eigenvalue of . Then we may write

where is the smallest eigenvalue. By assumption of the lemma

which implies the claim after dividing by and taking the inverse. ∎

An approximation of can be obtained by sampling with probabilities that are proportional to the statistical leverage scores


i.e. the row norms of the left singular vectors of [7]. At first sight it seems that taking sampling probabilities proportional to the leverage scores in (35) in order to obtain a sketch of (21) is very similar to using the leverage scores of to obtain (16) from (15) as was proposed by Drineas and Mahoney in [6] for a similar problem. A key difference is that is tall and dense while is sparse and thus is quite different to the initial stiffness matrix . Consequently, an interpretation of the leverage scores from (35) in terms of effective stiffness [1] is, to the best of our knowledge, not possible. The following Lemma will be useful for our further developments.

Lemma 3.5 ([21] section 6.4).

Assume that is constructed as before with sampling probabilities satisfying


for some . Then we have


An important corollary of the above lemma is that a sketch which is constructed by sampling from leverage score probabilities will virtually always be invertible and therefore the sketched problem (24) has a unique solution. The following result states that this property is preserved even when the rows are re-weighted, an operation which changes the leverage scores.

Proposition 3.6.

Let be a diagonal matrix with positive entries, i.e. for each . Assume that the sketching matrix is constructed with sampling probabilities . For the scaled sketch we have


It is sufficient to show that

because the probability bound follows immediately from

after applying (37) from Lemma 3.5. The above matrices are always positive semi-definite and therefore invertibility is equivalent to positive definiteness. For any diagonal matrix it holds that where is a random diagonal matrix with entries . Thus for any we have

Since has full column rank we know that corresponds to a change of basis and whenever . It follows that is positive definite if and only if is positive definite. As is a diagonal such that with probability , the latter is equivalent to being positive definite. The case of is covered by . ∎

Proposition 3.6 states that re-scaling of rows doesn’t affect the quality of the sketching matrix regarding its invertibility and after sampling rows the probability of the sketch being singular decays exponentially fast with each additional draw. In practice this makes knowledge of valuable because we only need to sample rows for some moderately large and obtain a sketch that is virtually never singular. On the other hand, we need at least samples so that there is any hope in obtaining a non-singular matrix. The remarkable thing about Proposition 3.6 is that the failure probability is independent of both, the inner dimension of the product as well as the scaling matrix and equivalent to the bound which could be obtained by sampling from . This suggests that a sketch which is constructed by drawing samples from is not too different compared to sampling from . This intuition is supported by the following result which describes the change in the leverage scores after re-weighting a single row.

Proposition 3.7 ([5] Lemma 5).

Let be a diagonal matrix with and for each . Then


and for


where are the cross leverage scores.

Since has orthogonal columns, we have for any and thus the cross leverage scores from the above Lemma satisfy


For a general diagonal matrix as in Proposition 3.6 we may without loss of generality assume that each entry lies in since we can divide the elements by their maximum. The re-weighting can thus be considered as a superposition of single row operations


where the are as in Proposition 3.7. Since the commute we can apply them in any order without changing the outcome. Considering Lemma 3.5, if we could ensure that isn’t substantially smaller than then sampling from will produce good sketches for .

Large leverage scores

Equation (39) shows that the relative change of the -th leverage score after a re-weighting of the -th row shrinks when . In the extreme case when the re-weighting has no effect. In addition to this stability property it trivially holds that which suggests that large leverage scores are fairly stable when rows are re-weighted.

Small leverage scores

From Equation (40) we know that the increase of after re-weighting of row is proportional to . If the entries of the scaling matrix don’t vary too much, then (41) suggests that we can expect the total increase, i.e. after applying for each to be roughly of order . On the other hand, small are fairly sensitive to re-weighting of row since in that case. Thus we can expect that the re-weighting of row will counterbalance the effects from re-weighting the other rows. In addition, we know that

Since large leverage scores will likely be quite stable and we would expect that not too many small leverage scores will become large.

So far we have discussed the projection of the high-dimensional system without providing explicit details on how the basis is selected. A desired property is to sustain a small projection error for all admissible parameter choices under the constraint . Suitable options include subsets of the right singular vectors of or orthogonalised Krylov-subspace bases [11], however these have to be computed for each individual parameter vector which can be detrimental to the speed of the solver. Alternatively, we opt for a generic basis exploiting the smoothness of

on domains with smooth Lipschitz boundaries. A simple choice is to select the basis among the eigenvectors of the discrete Laplacian operator


for . From and splitting the eigenvectors as

such that the columns of correspond to the last columns of , and respectively to the smallest eigenvalues . In effect, with constrained by the Dirichlet boundary conditions, the norm provides a measure of the smoothness of in the interior of . It is not difficult to see that this basis satisfies

We remark that the computation of the basis is computationally very expensive for large , as the eigen-decomposition of is necessary, however this is only computed once, prior to the beginning of the simulation (offline stage) in an offline stage. After the matrix has been obtained we can compute the leverage scores . The Laplacian differs from a general stiffness matrix only by different diagonal weights, i.e. is replaced by the diagonal matrix where the contain information about the parameter from (1). Propositions 3.6 and 3.7 along with the developments thereafter suggest that the Laplacian leverage scores can nonetheless be used to construct sketches of the projected matrix because the difference in the stiffness matrices is just a diagonal weighting.

4. Complexity and error analysis

Motivated by the developments from the previous sections we propose the following algorithm for computing solutions to a sequence of problem of the form (1). We assume that each problem is specified by its parameter vector for (see section 2.1).

input : Matrices , , data vector , and sampling probabilities (offline)
output : Parameter dependent solutions where
Online Simulation;
for  to  do
       input : Parameter vector , sample size
       draw row indices from ;
       get the sampled indices ;
       set and write ;
       compute the frequencies for ;
       find for and the diagonal matrix ;
       find for and the diagonal matrix ;
       assemble the matrix