# A note on energy contraction and optimal convergence of adaptive iterative linearized finite element methods

In this note, we revisit a unified methodology for the iterative solution of nonlinear equations in Hilbert spaces. Our key observation is that the general approach from [Heid Wihler, arXiv:1808.04990v2] satisfies an energy contraction property in the context of (abstract) strongly monotone problems. This property, in turn, is the crucial ingredient in the recent convergence analysis in [Gantner et al., arXiv:2003.10785]. In particular, we deduce that adaptive iterative linearized finite element methods (AILFEMs) lead to linear convergence with optimal algebraic rates with respect to the degrees of freedom as well as the total computational time.

## Authors

• 9 publications
• 25 publications
• 8 publications
01/27/2021

### Goal-oriented adaptive finite element methods with optimal computational complexity

We consider a linear symmetric and elliptic PDE and a linear goal functi...
03/24/2020

### Rate optimality of adaptive finite element methods with respect to the overall computational costs

We consider adaptive finite element methods for second-order elliptic PD...
10/20/2020

### Gradient flow finite element discretisations with energy-based adaptivity for excited states of Schrödingers equation

We present an effective numerical procedure, which is based on the compu...
05/06/2022

### Conditions for Digit Stability in Iterative Methods Using the Redundant Number Representation

Iterative methods play an important role in science and engineering appl...
03/24/2020

### On the threshold condition for Dörfler marking

It is an open question if the threshold condition θ < θ_ for the Dörfler...
02/24/2022

### Performance of an iterative wavelet reconstructor for the Multi-conjugate Adaptive Optics RelaY of ESO's ELT

The Multi-conjugate Adaptive Optics RelaY (MAORY) is one of the key Adap...
05/22/2020

### A short note on plain convergence of adaptive least-squares finite element methods

We show that adaptive least-squares finite element methods driven by the...
##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1. Introduction

On a real Hilbert space with inner product  and induced norm , we consider a nonlinear operator , where denotes the dual space of . We aim to generate optimally converging (with respect to the number of degrees of freedom and the total computational time) finite element approximations of the following nonlinear operator equation:

 F(u)=0in X⋆. (1)

In weak form, this problem reads:

 Find u∈X such that⟨F(u),v⟩=0∀v∈X; (2)

here, signifies the duality pairing on . For the purpose of this work, we suppose that satisfies the following conditions:

1. Lipschitz continuity: There exists a constant such that

 |⟨F(u)−F(v),w⟩|≤LF∥u−v∥X∥w∥X∀u,v,w∈X.
2. Strong monotonicity: There exists a constant such that

 ν∥u−v∥2X≤⟨F(u)−F(v),u−v⟩∀u,v∈X.
3. The operator possesses a potential, i.e., there exists a Gâteaux differentiable (energy) functional such that .

Assuming (F1) and (F2), the main theorem of strongly monotone operators guarantees that (1) (or equivalently (2)) has a unique solution ; see, e.g., [Neč86, §3.3] or [Zei90, Thm. 25.B].

### 1.1. Iterative linearization

Following the recent approach [HW20a], we apply a general fixed point iteration scheme for (1). For given , we consider a linear and invertible preconditioning operator . Then, the nonlinear problem (1) is equivalent to . For any suitable initial guess , this leads to the following iterative scheme:

 Find un+1∈X such thatA[un]un+1=f(un)in X⋆∀n≥0. (3)

Note that the above iteration is a linear equation for , thereby rendering (3) an iterative linearization scheme for (1). Given , the weak form of (3) is based on the bilinear form , for , and the solution  of (3) can be obtained from

 a(un;un+1,w)=⟨f(un),w⟩∀w∈X. (4)

Throughout, for any , we suppose that the bilinear form  is uniformly coercive and bounded, i.e., there are two constants  independent of such that

 a(u;v,v)≥α∥v∥2X∀v∈X, (5)

and

 a(u;v,w)≤β∥v∥X∥w∥X∀v,w∈X. (6)

For any given , owing to the Lax–Milgram theorem, these two properties imply that the linear equation (4) admits a unique solution .

### 1.2. Iterative linearized Galerkin approach (ILG)

In order to cast (4) into a computational framework, we consider closed (and mainly finite dimensional) subspaces , endowed with the inner product and norm on . Then, ILG [HW20a] is based on restricting the weak iteration scheme (4) to . Specifically, for a prescribed initial guess , define a sequence inductively by

 a(unY;un+1Y,w)=⟨f(unY),w⟩∀w∈Y. (7)

Note that (7) has a unique solution, since the conditions (5) and (6) above remain valid for the restriction to . For the purpose of the convergence results in this paper, we introduce a monotonicity condition on the functional from (F3).

1. There exists a (uniform) constant such that for any closed subspace , the sequence defined by (7) fulfils the bound

 E(unY)−E(un+1Y)≥CE∥∥unY−un+1Y∥∥2X∀n≥0, (8)

where is the potential of (restricted to ) introduced in (F3).

If satisfies (F1)–(F3) and if the ILG bilinear form from (4) is coercive (5) with coercivity constant , then (F4) is fulfilled; see [HW20b, Prop. 2.1]. Moreover, upon imposing alternative conditions, we may still be able to satisfy (8) even when . For instance, [HW20a, Rem. 2.8] proposed an a posteriori step size strategy that guarantees the bound (8) in the context of the damped Newton method. This argument can be generalized to other methods containing a damping parameter. Moreover, the Kačanov linearization approach (see (22) below) satisfies (8) with without requiring ; see, e.g., [HW20a, Eq. (18)].

### 1.3. Adaptive iterative linearized finite element method (AILFEM)

We restrict ILG (7) to a sequence of Galerkin subspaces with corresponding sequences for given by

 a(unN;un+1N,v)=⟨f(unN),v⟩∀v∈XN. (9)

The spaces will be conforming finite element spaces associated to admissible triangulations of an underlying bounded Lipschitz domain , with . AILFEM (Algorithm 1 below) exploits an interplay of adaptive mesh refinements and ILG (9) and chooses the initial guesses and for all and appropriate indices .

### 1.4. Goal of the paper

The work [GHPS18] has recently analyzed the AILFEM for the solution of (1) in the specific context of a Zarantonello linearization approach originally proposed in [CW17]. The results of [GHPS18] have been extended to contractive iterative solvers in [GHPS20]. In particular, it has been shown that an optimal convergence rate with respect to the number of degrees of freedom as well as with respect to the overall computational cost can be achieved. The purpose of the present note is to show that these optimality results from [GHPS20] also apply to AILFEM beyond the Zarantonello linearization approach (cf. [HW20b] for some convergence results in abstract spaces). Especially, we will exploit (F4) to establish an energy contraction property for the fixed point iteration (3), which constitutes the crucial ingredient in the analysis of [GHPS20]. In particular, we will revisit a few example AILFEM schemes from [HW20a], and verify our theoretical findings numerically.

## 2. Energy contraction

In this section, we will establish an energy contraction result for ILG (7). To this end, let be a closed linear subspace of as in §1.2. Moreover, denote by the unique solution of the equation

 ⟨F(u⋆Y),v⟩=0∀v∈Y, (10)

where existence and uniqueness of follows again from the main theorem of strongly monotone operators.

###### Proposition 2.1.

Consider the sequence  generated by the iteration (7). If (F1)–(F4) are satisfied and fulfils (5)–(6) for any , then there holds the energy contraction property

 0≤E(un+1Y)−E(u⋆Y)≤q2???[E(unY)−E(u⋆Y)]∀n≥0, (11)

with a contraction constant

 0≤q???:=(1−2CEν2β−2L−1F)\nicefrac12<1 (12)

independent of the subspace and of the iteration number . In addition, the sequence  converges to the unique solution of (10).

###### Proof.

With (F2) and since is the (unique) solution of (10), for , we first observe that

 ν∥∥u⋆Y−unY∥∥2X (F2)≤⟨F(u⋆Y)−F(unY),u⋆Y−unY⟩⟨F(unY),unY−u⋆Y⟩.

Recalling that , (7) and (6) prove that

 ⟨F(unY),unY−u⋆Y⟩a(unY;unY−un+1Y,unY−u⋆Y)β∥∥un+1Y−unY∥∥X∥∥unY−u⋆Y∥∥X.

Altogether, we derive the a posteriori

error estimate

 ∥∥u⋆Y−unY∥∥X≤βν−1∥∥unY−un+1Y∥∥X∀n≥0. (13)

Next, we exploit the structural assumptions (F1)–(F3) to obtain the well-known inequalities

 ν2∥∥u⋆Y−v∥∥2X≤E(v)−E(u⋆Y)≤LF2∥∥u⋆Y−v∥∥2X∀v∈Y; (14)

see, e.g., [HW20b, Lem. 2.3] or [GHPS18, Lem. 5.1]. For any , we thus infer that

 0E(un+1Y)−E(u⋆Y) =E(unY)−E(u⋆Y)−[E(unY)−E(un+1Y)] E(unY)−E(u⋆Y)−CEν2β2∥∥u⋆Y−unY∥∥2X E(unY)−E(u⋆Y)−2CEν2β2LF[E(unY)−E(u⋆Y)]=q2???[E(unY)−E(u⋆Y)].

This proves, in particular, that . Finally, iterating this inequality, we obtain that

 0≤ν2∥∥u⋆Y−unY∥∥2XE(unY)−E(u⋆Y)≤q2n???[E(u0Y)−E(u⋆Y)]∀n≥0.

In particular, we conclude that in as . ∎

## 3. Adaptive finite element discretization

In this section, we thoroughly formulate AILFEM. For each , the space in (9) will be a conforming finite element space that is associated to an admissible triangulation of an underlying bounded Lipschitz domain , with . Following the recent approach [GHPS20], we will present an adaptive iterative linearization finite element algorithm that exploits the interplay of adaptive mesh refinements and the iterative scheme (9). Throughout this section, we will always assume that satisfies (F1)–(F4) and that (5) and (6) hold true.

### 3.1. Mesh refinements

We adopt the framework from [GHPS20, §2.2–2.4], with slightly modified notation. Consider a shape-regular mesh refinement strategy such as, e.g., newest vertex bisection [Mit91]. For a subset of marked elements in a regular triangulation , let be the coarsest regular refinement of such that all elements have been refined. Specifically, we write for the set of all possible meshes that can be generated from by (repeated) use of . For a mesh , we assume the nestedness of the corresponding finite element spaces and , respectively, i.e., . In the sequel, starting from a given initial triangulation of , we let be the set of all possible refinements of .

With regards to the optimal convergence rate of the algorithm with respect to the overall computational cost (see the next section §4), a few assumptions on the mesh refinement strategy are required, cf. [GHPS20, §2.8]. These are satisfied, in particular, for newest vertex bisection.

1. Splitting property: Each refined element is split into at least two and at most many subelements, where is a generic constant. In particular, for all triangulations , and for any , the (one-level) refinement satisfies

 #(T∖T′)+#T≤#T′≤Cref#(T∖T′)+#(T∩T′).

Here, is the set of all elements in which have been refined in , and comprises all unrefined elements.

2. Overlay estimate: For all meshes , and refinements there exists a common refinement denoted by

 Tref1⊕Tref2∈refine(Tref1)∩refine(Tref2)⊆refine(T),

which satisfies .

3. Mesh-closure estimate: There exists a constant such that, for each sequence of successively refined meshes, i.e., for some , it holds that

 #TN−#T0≤CmeshN−1∑J=0#MJ∀N∈N.

### 3.2. Error estimators

For a mesh associated to a discrete space , suppose that there exists a computable local refinement indicator , with for all and . Then, for any and , let

 ηN(UN,v):=(∑T∈UNηN(T,v)2)\nicefrac12andηN(v):=ηN(TN,v). (15)

We recall the following axioms of adaptivity from [CFPP14] for the refinement indicators: There are fixed constants and such that, for all and , with associated refinement indicators and , the following properties hold, cf. [GHPS20, §2.8]:

1. Stability: , for all and all .

2. Reduction: , for all .

3. Reliability: For the error between the exact solution of (1) and the exact discrete solution of (10), we have the a posteriori error estimate .

4. Discrete reliability: , where is the solution of (10) on the discrete space associated with .

We emphasize that (A1)–(A4) are satisfied for the usual -weighted residual error estimators in the specific context of our application in section §5, cf. (23).

### 3.3. AILFEM algorithm

We recall the adaptive algorithm from [GHPS18] (and its generalization [GHPS20]), which was studied in the specific context of finite element discretizations of the Zarantonello iteration. These algorithms are closely related to the general adaptive ILG approach in [HW20a]. The key idea is the same in all algorithms: On a given discrete space, we iterate the linearization scheme (9) as long as the linearization error estimator dominates. Once the ratio of the linearization error estimator and the a posteriori error estimator falls below a prescribed tolerance, the discrete space is refined appropriately.

###### Remark 3.1.

Under the conditions (A1) and (A3), we notice some facts about Algorithm 1 from [GHPS20, Prop. 3] and [GHPS18, Prop. 4.5 & 4.6]: For any , there holds the a posteriori error estimate

 ∥∥u⋆−un(N)N∥∥X≤C???ηN(un(N)N), (16)

where depends only on , and . In particular, if the repeat loop terminates with for some , then , i.e., the exact solution is obtained. Moreover, in the non-generic case that the repeat loop in Algorithm 1 does not terminate after finitely many steps, for some , the generated sequence converges to (in particular, the solution is discrete).

## 4. Optimal convergence

We are now ready to outline the linear convergence of the sequence of approximations from Algorithm 1 to the unique solution of (1). In addition, the rate optimality with respect to the overall computational costs will be discussed.

### 4.1. Step counting

Following [GHPS20], we introduce an ordered index set

 Q:={(N,n)∈N20: index pair (N,n) % occurs in Algorithm~{}??? ∧ n

where counts the number of steps in the repeat loop for each . We exclude the pair from , since either and or even if the loop does not terminate after finitely many steps; see also Remark 3.1. Observing that Algorithm 1 is sequential, the index set is naturally ordered: For , we write if and only if appears earlier in Algorithm 1 than . With this order, we can define the total step counter

 |(N,n)|:=#{(N′,n′)∈Q:(N′,n′)<(N,n)}=n+N−1∑N′=0n(N′),

which provides the total number of solver steps up to the computation of . Finally, we introduce the notation .

### 4.2. Linear convergence

Based on the notation used in Algorithm 1, let us introduce the quasi-error

 ΔnN:=∥∥u⋆−unN∥∥X+ηN(unN)∀(N,n)∈¯¯¯¯¯Q:=Q∪{(N,n(N)):n(N)<∞}, (17)

and suppose that the estimator satisfies (A1)–(A3). Then, appealing to the energy contraction property from Proposition 2.1, we make the crucial observation that ILG (7) satisfies the contraction property (C1) from [GHPS20]. Consequently, [GHPS20, Thm. 4] directly applies to our setting:

###### Theorem 4.1 ([Ghps20, Thm. 4]).

Suppose (A1)–(A3). Then, for all and , there exist constants and such that the quasi-error (17) is linearly convergent in the sense of

 ΔnN≤Clinq|(N,n)|−|(N′,n′)|linΔn′N′∀(N,n),(N′,n′)∈Q with (N′,n′)<(N,n).

The constants and depend only on , and the adaptivity parameters and .

### 4.3. Optimal convergence rate and computational work

Furthermore, we address the optimal convergence rate of the quasi-error (17) with respect to the degrees of freedom as well as the computational work. As before, we can directly apply a result from [GHPS20] owing to the energy contraction from Proposition 2.1. For its statement, we need further notation: First, for , let be the set of all refinements of with . Next, for , define

 ∥u⋆∥As:=supL∈N(L+1)sinfTopt∈T(L)[∥∥u⋆−u⋆opt∥∥X+ηopt(u⋆opt)]∈R≥0∪{∞},

where is the discrete solution (10) on the finite element space related to an optimal (in terms of the above infimum) mesh . For , we note that if and only if the quasi-error converges at least with rate along a sequence of optimal meshes.

###### Theorem 4.2 ([Ghps20, Thm. 7]).

Suppose (R1)–(R3) and (A1)–(A4), and define

 λopt:=1−q???q???Cstb√\nicefracν2.

Let and such that

 0<θ′:=θ+\nicefracλλopt1−\nicefracλλopt<(1+C2stbC2rel)−\nicefrac12.

Then, for any , there exist positive constants such that

 c−1opt∥u⋆∥As≤sup(N′,n′)∈Q(#TN′−#T0+1)sΔn′N′≤sup(N′,n′)∈Q(∑(N,n)∈Q(N,n)≤(N′,n′)#TN)sΔn′N′≤Coptmax{∥u⋆∥As,Δ00}. (18)

Here, the constant depends only on , and , and additionally on and , respectively, if or for some , respectively; moreover, the constant depends only on , and .

The significance of (18) is that the quasi-error from (17) decays at rate (with respect to the number of elements or, equivalently, the number of degrees of freedom) if and only if rate is achievable for the discrete solutions on optimal meshes (with respect to the number of elements). If, in addition, all of the (single) steps in Algorithm 1 can be performed at linear cost, , then the quasi-error even decays with rate with respect to the overall computational cost if and only if rate is attainable with respect to the number of elements. We note that the total computational cost is proportional to the total computational time, which is therefore monitored in the subsequent numerical experiment.

## 5. Numerical experiment

In this section, we test Algorithm 1 with a numerical example.

### 5.1. Model problem

On an open, bounded, and polygonal domain with Lipschitz boundary , we consider the quasi-linear second-order elliptic boundary value problem:

 Find u∈X such thatF(u):=−∇⋅{μ(|∇u|2)∇u}−g=0in % X⋆. (19)

Here, is the standard Sobolev space of -functions on  with zero trace along . For , the inner product and norm on  are defined by  and , respectively. We suppose that in (19) is given, and the diffusion parameter fulfils the monotonicity property

 mμ(t−s)≤μ(t2)t−μ(s2)s≤Mμ(t−s)∀t≥s≥0, (20)

with constants . Under this condition, the nonlinear operator  from (19) satisfies (F1) and (F2) with and ; see [Zei90, Prop. 25.26]. Moreover, has a potential given by

 E(u):=∫Ωψ(|∇u|2)dx−⟨g,u⟩∀u∈X,

where , . The weak form of the boundary value problem (19) reads:

 Find u∈X such that∫Ωμ(|∇u|2)∇u⋅∇vdx=⟨g,v⟩∀v∈X. (21)

For the nonlinear boundary value problem (19), [HW20a, §5.1] proves the convergence of a few particular iterative linearization schemes, which can be cast into the general approach (4). These include the following:

1. Zarantonello (or Picard) iteration, for :

 −Δun+1=−Δun−δZF(un),n≥0;

cf. Zarantonello’s original report [Zar60] or the monographs [Neč86, §3.3] and [Zei90, §25.4]. We further point to [HW20b, Rem. 1] for the extended domain of the damping parameter .

2. Kačanov iteration:

 −∇⋅{μ(|∇un|2)∇un+1}−g=0,n≥0; (22)

cf. [Zei90, §25.13].

3. Newton iteration, for a damping parameter :

 F′(un)un+1=F′(un)un−δ(un)F(un),n≥0;

Here, for , the Gâteaux derivative of  is given through

 ⟨F′(u)v,w⟩=∫Ω2μ′(|∇u|2)(∇u⋅∇v)(∇u⋅∇w)dx+∫Ωμ(|∇u|2)∇v⋅∇wdx,v,w∈X.

### 5.2. Discretization and local refinement indicator

AILFEM for (21) is based on regular triangulations that partition the domain  into open and disjoint triangles  such that . Moreover, we consider associated finite element spaces , where we signify by the space of all affine functions on . The mesh refinement strategy in Algorithm 1 is given by newest vertex bisection [Mit91]. Moreover, for any and any , we define the local refinement indicator respectively the global error indicator from (15) by

 (23)

where signifies the normal jump across element faces, and is equivalent to the diameter of . This error estimator satisfies the assumptions (A1)–(A4) for the problem under consideration; see, e.g.,[GMZ12, §3.2] or [CFPP14, §10.1].

### 5.3. Computational example

Consider the L-shaped domain , and the nonlinear diffusion parameter , which satisfies (20) with and . Moreover, we choose such that the analytical solution of (19) is given by

 u⋆(r,φ)=r\nicefrac23sin(\nicefrac2φ3)(1−rcosφ)(1+rcosφ)(1−rsinφ)(1+rsinφ)cosφ,

where and are polar coordinates; this is the prototype singularity for (linear) second-order elliptic problems with homogeneous Dirichlet boundary conditions in the L-shaped domain; in particular, we note that the gradient of  is unbounded at the origin.

In all our experiments below we set the adaptive mesh refinement parameters to , and . The computations employ an initial mesh consisting of 192 uniform triangles, and with the starting guess . Then, the procedure is run until the number of elements exceeds . We always choose the damping parameter for the Newton iteration, and vary the damping parameter for the Zarantonello iteration, as well as the adaptivity parameter , cf. line 6 in Algorithm 1, throughout the experiments. Our implementation is based on the Matlab package [FPW11] with the necessary modifications.

In general, for the Newton scheme, we note that choosing the damping parameter to be  (potentially resulting in quadratic convergence of the iterative linearization close to the solution) might lead to a divergent iteration for the given boundary value problem (cf. [AW14]). Our numerical computations illustrate, however, that this is not of concern in the current experiments. Indeed, for , the bound (8) from (F4) remains satisfied in each iteration. Otherwise, a prediction and correction strategy which obeys the bound (8), could be employed (see [HW20a, Rem. 2.8]). This would guarantee the convergence of the (damped) Newton method.

1. and : In Figure 1, we display the performance of Algorithm 1 with respect to both the number of elements and the total computational time. We clearly see a convergence rate of for the Kačanov and Newton method, which is optimal for linear finite elements. Moreover, the Zarantonello iteration has a pre-asymptotic phase of reduced convergence, which becomes optimal for finer meshes. In Figure 2 (left) we observe that the energy contraction factor given by

 ϰN:=E(un(N)N)−E(u⋆)E(u0N)−E(u⋆) (24)

is inferior for the Zarantonello iteration in the initial phase (compared to the Kačanov and Newton methods), which might explain the reduced convergence. This contraction factor becomes better for an increased number of iterations, see Figure 2 (right), thereby leading to the asymptotically optimal convergence rate for the Zarantonello iteration. Finally, in Figure 3 (left), we display the quotient

 κN:=E(un(N)−1N)−E(un(N)N)∥∥un(N)N−un(N)−1N∥∥2X, (25)

which experimentally verifies the assumption (F4).

2. and : As before, in Figure 4, we display the performance of Algorithm 1 with respect to both the number of elements and the total computational time. We clearly observe the optimal convergence rate of for all of the three iteration schemes presented above from the initial mesh onwards. In contrast to the experiment before, the energy contraction factor  from (24) is now of comparable size for all the iteration schemes, as we can see from Figure 5 (left). Moreover, the number of iterations does not significantly differ for the three iterative methods. Again, we plot in Figure 3 (right) the quotient (25) for the numerical evidence of the assumption (F4).

3. and : Once more, we observe optimal convergence rate for all our three iteration schemes with respect to both the number of elements and the total computational time, see Figure 6. The total computational times obtained, however, differ noticeably, see Figure 6 (right), as a consequence of the varying number of iterative linearization steps of the three methods, see Figure 7 (right). In contrast, the energy contraction factor  from (24) almost coincides for the different iteration schemes, see Figure 7 (left), which is due to the small adaptivity parameter .