 # Convergence analysis of inexact two-grid methods: Multigrid cycles

Multigrid is a popular iterative solver for a large class of linear systems that arise from discretized partial differential equations. Typically, it applies a two-grid procedure recursively and hence can be treated as an inexact two-grid scheme. In this paper, we present a systematic convergence analysis of standard multigrid methods based on the inexact two-grid theory developed by Xu and Zhang (2020). Two alternating combinations of the V-cycle and W-cycle multigrid methods are also analyzed. More specifically, we establish new upper bounds for the convergence factor of multigrid methods in a purely algebraic manner. Moreover, our analysis allows the coarsest-grid problem to be solved approximately.


## 1. Introduction

In 1962–1964, Fedorenko [18, 19] proposed a multigrid idea for solving Poisson’s equation on a unit square. A more complicated case of variable coefficients was later considered by Bakhvalov . The actual efficiency of multigrid methods was recognized by Brandt [11, 12]. To analyze the convergence of multigrid methods, Hackbusch [20, 21] developed some fundamental elements for multigrid analysis. For other representative work on the early development of multigrid methods, we refer to [23, 41, 16, 32] and the references therein. Since the early 1980s, multigrid has been well developed and widely applied in scientific computing (see, e.g., [32, 34]).

The general convergence proofs (e.g., [22, 7]) for multigrid methods required regularity properties of the boundary value problem and quasi-uniformness of the underlying finite element or finite difference meshes. These requirements led to the further development of hierarchical basis methods [5, 1, 40, 6]. Convergence theories based on some algebraic approximation assumptions appeared in [26, 31, 25]. For second-order elliptic boundary value problems without full elliptic regularity, the convergence theory of the V-cycle multigrid with the Richardson-type smoother was studied by Brenner . A unified treatment for multigrid convergence is via the method of subspace corrections [9, 10, 35, 8]. Under such a framework, multigrid convergence can be established without the regularity and quasi-uniformness assumptions. An exact characterization for the convergence factor of the method of subspace corrections (as well as the method of alternating projections) in a Hilbert space setting was established by Xu and Zikatanov , which is the so-called XZ-identity. In 2010, Napov and Notay  presented a systematic comparison of the convergence bounds for the V-cycle multigrid methods. It is not possible to review all relevant literature on multigrid convergence here; for more details, we refer to the survey papers [41, 24, 30, 37] and the references therein.

Algebraic multigrid [14, 13, 31] constructs the coarsening process in a purely algebraic manner that requires no explicit knowledge of geometric properties, and it has been widely applied in scientific and engineering computing, especially in situations associated with complex domains, unstructured grids, problems with jump coefficients, etc. (see, e.g., [34, 37]). It is feasible to show optimal convergence properties (e.g., independent of the mesh size) of multigrid methods via the convergence theories mentioned above, e.g., the theory of subspace correction methods. However, the convergence estimates of such approaches do not, in general, give satisfactory predictions of actual multigrid convergence speed [32, Page 96]. Moreover, some required assumptions may be difficult to check for algebraic multigrid methods. Unlike the mature convergence theory developed for geometric multigrid methods, two-grid analysis is still a main strategy for motivating and analyzing algebraic ones [24, 30].

A common wisdom on multigrid convergence can be stated as follows: if exact two-grid methods converge sufficiently well (i.e., the convergence speed is fast enough), then the corresponding multigrid method with cycle index $\gamma\geq 2$ (see Algorithm 2) has convergence properties similar to those of the two-grid methods [32, Page 77]. In 2007, Notay [29, Theorem 3.1] proved that, if the convergence factor of exact two-grid methods is uniformly bounded by $\sigma<\frac{1}{2}$, then the convergence factor of the corresponding W-cycle multigrid method (corresponding to $\gamma=2$) is bounded by $\frac{\sigma}{1-\sigma}$. This analysis (as well as the standard one in [32, Theorem 3.2.1]) fails to deliver a level-independent upper bound for the convergence factor of the V-cycle multigrid (corresponding to $\gamma=1$). In 2010, Napov and Notay  further investigated the connection between two-grid convergence and V-cycle multigrid convergence, and showed that, besides the uniform two-grid convergence, additional conditions are required to derive a level-independent upper bound for the convergence factor of the V-cycle multigrid.

Since $\frac{\sigma}{1-\sigma}\geq 1$ for $\sigma\geq\frac{1}{2}$, Notay’s result [29, Theorem 3.1] is only applicable for $\sigma<\frac{1}{2}$. In practice, it is observed that the W-cycle multigrid methods may have convergence properties similar to those of two-grid ones even if $\sigma\geq\frac{1}{2}$. For example, we apply the classical algebraic multigrid method  to solve the 2D Poisson’s equation with homogeneous Dirichlet boundary condition on a unit square (using the P-finite element on a quasi-uniform grid with one million interior vertices). From Table 1, we observe that the W-cycle multigrid methods behave similarly to the corresponding two-grid ones even when the two-grid methods converge slowly.

Motivated by these observations, we revisit the convergence analysis of multigrid methods and establish a new convergence theory for multigrid methods based on the inexact two-grid theory developed in . Our main results can be divided into two parts.

• The first part includes three types of convergence estimates: (3.12), (3.18), and (3.20). These results are valid for any cycle index $\gamma$, from which one can readily get the convergence estimates for the V-cycle and W-cycle multigrid methods. The upper bounds in (3.12) and (3.18) are strictly decreasing with respect to $\gamma$, both of which tend to the maximum of two-grid convergence factors over all levels as $\gamma\to+\infty$. In particular, Notay’s result [29, Theorem 3.1] can be directly deduced from the estimate (3.18). The third estimate (3.20) involves the level index $k$. The upper bound in (3.20) is strictly increasing with respect to $k$, which tends to the bound in (3.18) as $k\to+\infty$.

• The second part is concerned with two alternating combinations of the V-cycle and W-cycle multigrid methods, which can be viewed as multigrid methods with a fractional cycle index . This part contains two types of convergence estimates. The first type consists of (4.4) and (4.6), which depend on the parity of level index and the extreme quantities defined by (3.8) and (3.10). As a corollary, if the convergence factor of exact two-grid methods is uniformly bounded by , then the convergence factor of the alternating multigrid methods is bounded by or (which depends on the parity of level index). The second type includes (4.19) and (4.20), which involve the level index . If , then the upper bounds in (4.19) and (4.20) tend to the bounds in (4.4) and (4.6), respectively.

It is worth mentioning that our estimates do not require the coarsest-grid problem to be solved exactly.

The rest of this paper is organized as follows. In Section 2, we first introduce the convergence estimates for inexact two-grid methods, and then give some basic assumptions and properties on multigrid methods. In Section 3, we present a new convergence analysis of standard multigrid methods, which contains three types of estimates. In Section 4, we establish a convergence theory for the alternating combinations of the V-cycle and W-cycle multigrid methods. In Section 5, we give some concluding remarks.

## 2. Preliminaries

In this section, we introduce two important convergence estimates for inexact two-grid methods, and give some general assumptions involved in the convergence analysis of multigrid methods. For convenience, we first list some basic notation used in the subsequent discussions.

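• $I_n$ denotes the $n\times n$ identity matrix (or $I$ when its size is clear from context).

• $\lambda_{\min}(\cdot)$ and $\lambda_{\max}(\cdot)$ denote the minimum and maximum eigenvalues of a matrix, respectively.

• $\lambda(\cdot)$ denotes the spectrum of a matrix.

• $\|\cdot\|_A$ denotes the energy norm induced by a symmetric and positive definite (SPD) matrix $A\in\mathbb{R}^{n\times n}$. That is, for any $v\in\mathbb{R}^n$, $\|v\|_A=\sqrt{v^TAv}$; for any $B\in\mathbb{R}^{n\times n}$, $\|B\|_A=\max_{v\in\mathbb{R}^n\setminus\{0\}}\frac{\|Bv\|_A}{\|v\|_A}$.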

### 2.1. Two-grid methods

Consider solving the linear system

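$$Au=f, \tag{2.1}$$

where $A\in\mathbb{R}^{n\times n}$ is SPD, $u\in\mathbb{R}^n$, and $f\in\mathbb{R}^n$. To describe two-grid methods, we need the following assumptions.

• $M\in\mathbb{R}^{n\times n}$ is a nonsingular smoother, and $M+M^T-A$ is SPD.

• $P\in\mathbb{R}^{n\times n_c}$ is a prolongation matrix of rank $n_c$, where $n_c$ is the number of coarse variables.

• $A_c:=P^TAP$ is the Galerkin coarse-grid matrix.

• $B_c\in\mathbb{R}^{n_c\times n_c}$ is a general SPD coarse-grid matrix.

Given an initial guess $u_0\in\mathbb{R}^n$, the standard two-grid scheme for solving (2.1) can be described as Algorithm 1.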

The iteration matrix of Algorithm 1 is

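$$\tilde E_{\mathrm{TG}}=(I-M^{-T}A)(I-PB_c^{-1}P^TA)(I-M^{-1}A), \tag{2.2}$$

which satisfies

$$u-u_{\mathrm{TG}}=\tilde E_{\mathrm{TG}}(u-u_0).$$

In particular, if $B_c=A_c$, then Algorithm 1 is called an exact two-grid method. In this case, the iteration matrix is denoted by

$$E_{\mathrm{TG}}=(I-M^{-T}A)(I-PA_c^{-1}P^TA)(I-M^{-1}A). \tag{2.3}$$

Define

$$\tilde M:=M^T(M+M^T-A)^{-1}M\quad\text{and}\quad\Pi_{\tilde M}:=P(P^T\tilde MP)^{-1}P^T\tilde M.$$

Then, the convergence factor can be characterized as

$$\|E_{\mathrm{TG}}\|_A=1-\frac{1}{K_{\mathrm{TG}}}, \tag{2.4}$$

where

$$K_{\mathrm{TG}}=\max_{v\in\mathbb{R}^n\setminus\{0\}}\frac{\|(I-\Pi_{\tilde M})v\|_{\tilde M}^2}{\|v\|_A^2}. \tag{2.5}$$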

The identity (2.4) is often called the two-level XZ-identity [17, Theorem 4.3] (see also [36, 42]).
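The identity (2.4) can be checked numerically on a small model problem. The following is a minimal sketch, not taken from the paper: the 1D Poisson matrix, the linear interpolation `P`, the Gauss-Seidel smoother `M = tril(A)`, and the helper `spd_sqrt` are all illustrative assumptions.

```python
import numpy as np

def spd_sqrt(A):
    """Return A^{1/2} and A^{-1/2} of an SPD matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(w)) @ V.T, (V / np.sqrt(w)) @ V.T

# Assumed test problem: 1D Poisson matrix on n = 2*nc + 1 interior points.
nc = 7
n = 2 * nc + 1
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
I = np.eye(n)

# Assumed transfer operator and smoother: linear interpolation, Gauss-Seidel.
P = np.zeros((n, nc))
for j in range(nc):
    P[2 * j:2 * j + 3, j] = [0.5, 1.0, 0.5]
M = np.tril(A)                    # M + M^T - A = diag(A), which is SPD
Ac = P.T @ A @ P                  # Galerkin coarse-grid matrix

# Exact two-grid iteration matrix (2.3).
M_inv = np.linalg.inv(M)
E_tg = (I - M_inv.T @ A) @ (I - P @ np.linalg.solve(Ac, P.T @ A)) @ (I - M_inv @ A)

# Symmetrized smoother and the M_tilde-orthogonal projection onto range(P).
M_tilde = M.T @ np.linalg.solve(M + M.T - A, M)
Pi = P @ np.linalg.solve(P.T @ M_tilde @ P, P.T @ M_tilde)

# K_TG in (2.5): largest eigenvalue of A^{-1/2} (I-Pi)^T M_tilde (I-Pi) A^{-1/2}.
A_half, A_half_inv = spd_sqrt(A)
X = (I - Pi).T @ M_tilde @ (I - Pi)
K_tg = np.linalg.eigvalsh(A_half_inv @ X @ A_half_inv).max()

# Energy norm ||E_TG||_A = ||A^{1/2} E_TG A^{-1/2}||_2.
sigma_tg = np.linalg.norm(A_half @ E_tg @ A_half_inv, 2)

print(f"||E_TG||_A = {sigma_tg:.12f}")
print(f"1 - 1/K_TG = {1.0 - 1.0 / K_tg:.12f}")
```

The two printed numbers should agree up to rounding error. Dense inverses are used only because the example is tiny.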

The following theorem presents more general estimates for the convergence factor of two-grid methods [39, Corollaries 3.10 and 3.18], from which one can readily get the identity (2.4).

###### Theorem 2.1.

Let $\alpha\geq 0$ and $\beta\geq 0$. For Algorithm 1, if the coarse-grid matrix $B_c$ satisfies

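$$0\leq v_c^T(B_c-A_c)v_c\leq\alpha\,v_c^TP^T\tilde MPv_c\quad\forall\,v_c\in\mathbb{R}^{n_c}, \tag{2.6}$$

then

$$1-\frac{1}{K_{\mathrm{TG}}}\leq\|\tilde E_{\mathrm{TG}}\|_A\leq 1-\frac{1+\alpha}{K_{\mathrm{TG}}+\alpha\lambda_{\max}(A^{-1}\tilde M)}. \tag{2.7}$$

Alternatively, if the coarse-grid matrix $B_c$ satisfies

$$0\leq v_c^T(B_c-A_c)v_c\leq\beta\,v_c^TA_cv_c\quad\forall\,v_c\in\mathbb{R}^{n_c}, \tag{2.8}$$

then

$$1-\frac{1}{K_{\mathrm{TG}}}\leq\|\tilde E_{\mathrm{TG}}\|_A\leq 1-\frac{1+\beta K_{\mathrm{TG}}\lambda_{\min}(\tilde M^{-1}A)}{(1+\beta)K_{\mathrm{TG}}}. \tag{2.9}$$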
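As a quick sanity check (a worked special case, not an additional result), take $B_c=A_c$. Then (2.6) and (2.8) hold with $\alpha=\beta=0$, and both upper bounds collapse onto the lower bound, which recovers the identity (2.4):

```latex
\begin{aligned}
\alpha=0:&\quad 1-\frac{1+0}{K_{\mathrm{TG}}+0\cdot\lambda_{\max}(A^{-1}\tilde M)}
          = 1-\frac{1}{K_{\mathrm{TG}}},\\
\beta=0:&\quad 1-\frac{1+0\cdot K_{\mathrm{TG}}\lambda_{\min}(\tilde M^{-1}A)}{(1+0)K_{\mathrm{TG}}}
          = 1-\frac{1}{K_{\mathrm{TG}}}.
\end{aligned}
```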

### 2.2. Multigrid methods

The fundamental module of multigrid methods is the two-grid procedure described by Algorithm 1. To design a two-grid method that converges well, it is not necessary to solve the coarse-grid problem exactly, especially when the problem size is still large. Instead, one can solve the coarse problem approximately without essential loss of convergence speed. A natural idea (i.e., the multigrid idea) is to apply the two-grid scheme recursively. Consequently, multigrid can be regarded as an inexact two-grid scheme, which enables us to analyze multigrid convergence via inexact two-grid theory.

By recursively applying Algorithm 1 in the coarse-grid correction steps, one can obtain a multigrid algorithm. To describe the algorithm concisely, we give some notation and assumptions.

• The algorithm involves $L+1$ levels with indices $0,\ldots,L$, where $0$ and $L$ correspond to the coarsest level and the finest level, respectively.

• For each , is the number of coarse variables at level , and .

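• For each $k=1,\ldots,L$, $P_k\in\mathbb{R}^{n_k\times n_{k-1}}$ denotes a prolongation matrix from level $k-1$ to level $k$, and $\operatorname{rank}(P_k)=n_{k-1}$.

• Let $A_L=A$. For each $k=1,\ldots,L$, $A_{k-1}:=P_k^TA_kP_k$ denotes the Galerkin coarse-grid matrix at level $k-1$.

• $\tilde A_0$ is an SPD approximation to $A_0$, and $\tilde A_0-A_0$ is a symmetric and positive semidefinite (SPSD) matrix.

• For each $k=1,\ldots,L$, $S_k$ denotes a nonsingular smoother at level $k$, and $S_k+S_k^T-A_k$ is SPD.

• At level $k$, the number of presmoothing steps is equal to the number of postsmoothing steps, which is denoted by $\nu_k$.

• The cycle index involved in the coarse-grid correction steps is denoted by $\gamma$, which is a positive integer.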

Given an initial guess $\tilde u_k$, the standard multigrid method for solving the linear system $A_ku_k=f_k$ can be described as Algorithm 2. The symbols and in Algorithm 2 mean that the corresponding schemes will be carried out for $\nu_k$ and $\gamma$ iterations, respectively. In particular, $\gamma=1$ and $\gamma=2$ correspond to the V- and W-cycles, respectively (see Figure 1).

The iteration matrix of Algorithm 2 is

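$$\tilde E^{(k)}_{\mathrm{MG}}=(I-S_k^{-T}A_k)^{\nu_k}\Big[I-P_k\big(I-(\tilde E^{(k-1)}_{\mathrm{MG}})^{\gamma}\big)A_{k-1}^{-1}P_k^TA_k\Big](I-S_k^{-1}A_k)^{\nu_k}, \tag{2.10}$$

which satisfies

$$u_k-u_{\mathrm{MG}}=\tilde E^{(k)}_{\mathrm{MG}}(u_k-\tilde u_k).$$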
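The recursion (2.10) can be evaluated directly for a small hierarchy. The sketch below is illustrative only: it assumes a 1D Poisson matrix, linear interpolation, Gauss-Seidel smoothers $S_k=\operatorname{tril}(A_k)$, and an exact coarsest-grid solve ($\tilde A_0=A_0$). The function names `build_hierarchy` and `convergence_factors` are hypothetical.

```python
import numpy as np

def energy_norm(E, A):
    """||E||_A = ||A^{1/2} E A^{-1/2}||_2 for an SPD matrix A."""
    w, V = np.linalg.eigh(A)
    A_half = (V * np.sqrt(w)) @ V.T
    A_half_inv = (V / np.sqrt(w)) @ V.T
    return np.linalg.norm(A_half @ E @ A_half_inv, 2)

def build_hierarchy(n_fine, L):
    """Galerkin hierarchy A[0..L], P[1..L] for the 1D Poisson matrix (assumed model problem)."""
    A = [None] * (L + 1)
    P = [None] * (L + 1)
    A[L] = 2.0 * np.eye(n_fine) - np.eye(n_fine, k=1) - np.eye(n_fine, k=-1)
    for k in range(L, 0, -1):
        nk = A[k].shape[0]
        nc = (nk - 1) // 2
        P[k] = np.zeros((nk, nc))
        for j in range(nc):
            P[k][2 * j:2 * j + 3, j] = [0.5, 1.0, 0.5]   # linear interpolation
        A[k - 1] = P[k].T @ A[k] @ P[k]                  # Galerkin coarse matrix
    return A, P

def convergence_factors(A, P, gamma, nu=1):
    """Return A_k-norms of the exact two-grid and multigrid iteration matrices for k = 1..L."""
    L = len(A) - 1
    sigma_tg, sigma_mg = [], []
    E_prev = None
    for k in range(1, L + 1):
        Ik, Ic = np.eye(A[k].shape[0]), np.eye(A[k - 1].shape[0])
        S_inv = np.linalg.inv(np.tril(A[k]))             # Gauss-Seidel smoother S_k
        pre = np.linalg.matrix_power(Ik - S_inv @ A[k], nu)
        post = np.linalg.matrix_power(Ik - S_inv.T @ A[k], nu)
        Ac_inv = np.linalg.inv(A[k - 1])
        if k == 1:
            approx_inv = Ac_inv                          # coarsest grid solved exactly here
        else:
            # gamma recursive cycles replace the exact coarse solve, as in (2.10)
            approx_inv = (Ic - np.linalg.matrix_power(E_prev, gamma)) @ Ac_inv
        E_tg = post @ (Ik - P[k] @ Ac_inv @ P[k].T @ A[k]) @ pre
        E_mg = post @ (Ik - P[k] @ approx_inv @ P[k].T @ A[k]) @ pre
        sigma_tg.append(energy_norm(E_tg, A[k]))
        sigma_mg.append(energy_norm(E_mg, A[k]))
        E_prev = E_mg
    return sigma_tg, sigma_mg

A, P = build_hierarchy(n_fine=31, L=3)
tg, v_cycle = convergence_factors(A, P, gamma=1)
_, w_cycle = convergence_factors(A, P, gamma=2)
for k in range(3):
    print(f"level {k + 1}: TG {tg[k]:.4f}  V {v_cycle[k]:.4f}  W {w_cycle[k]:.4f}")
```

On this toy hierarchy, the multigrid factor at each level should be no smaller than the exact two-grid factor at that level, and the W-cycle factor should be no larger than the V-cycle factor.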

For brevity, we define an equivalent smoother by the relation

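$$I-M_k^{-1}A_k=(I-S_k^{-1}A_k)^{\nu_k}. \tag{2.11}$$

Since $S_k+S_k^T-A_k$ is SPD, it follows that

$$\big\|I-S_k^{-1}A_k\big\|_{A_k}<1,$$

which, together with (2.11), yields

$$\big\|I-M_k^{-1}A_k\big\|_{A_k}<1.$$

This implies that $M_k+M_k^T-A_k$ is also SPD. In addition, from (2.11), we have

$$I-M_k^{-T}A_k=(I-S_k^{-T}A_k)^{\nu_k}.$$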

Thus,

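$$\tilde E^{(k)}_{\mathrm{MG}}=(I-M_k^{-T}A_k)\Big[I-P_k\big(I-(\tilde E^{(k-1)}_{\mathrm{MG}})^{\gamma}\big)A_{k-1}^{-1}P_k^TA_k\Big](I-M_k^{-1}A_k) \tag{2.12}$$

with

$$\tilde E^{(1)}_{\mathrm{MG}}=(I-M_1^{-T}A_1)(I-P_1\tilde A_0^{-1}P_1^TA_1)(I-M_1^{-1}A_1).$$

In view of (2.12), we have

$$A_k^{\frac12}\tilde E^{(k)}_{\mathrm{MG}}A_k^{-\frac12}=\hat M_k^T\Big[I-A_k^{\frac12}P_kA_{k-1}^{-\frac12}\Big(I-\big(A_{k-1}^{\frac12}\tilde E^{(k-1)}_{\mathrm{MG}}A_{k-1}^{-\frac12}\big)^{\gamma}\Big)A_{k-1}^{-\frac12}P_k^TA_k^{\frac12}\Big]\hat M_k,$$

where

$$\hat M_k=I-A_k^{\frac12}M_k^{-1}A_k^{\frac12}.$$

Applying mathematical induction, we can deduce that $A_k^{\frac12}\tilde E^{(k)}_{\mathrm{MG}}A_k^{-\frac12}$ is symmetric and

$$\lambda\big(\tilde E^{(k)}_{\mathrm{MG}}\big)=\lambda\big(A_k^{\frac12}\tilde E^{(k)}_{\mathrm{MG}}A_k^{-\frac12}\big)\subset[0,1)\quad\forall\,k=1,\ldots,L,$$

which leads to

$$\big\|\tilde E^{(k)}_{\mathrm{MG}}\big\|_{A_k}<1.$$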

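As a result, $\tilde E^{(k)}_{\mathrm{MG}}$ can be written as

$$\tilde E^{(k)}_{\mathrm{MG}}=I-\tilde B_k^{-1}A_k, \tag{2.13}$$

where $\tilde B_k$ is SPD and $\tilde B_k-A_k$ is SPSD. Combining (2.12) and (2.13), we obtain the recursive relation

$$\tilde B_k^{-1}=\bar M_k^{-1}+(I-M_k^{-T}A_k)P_k\Big(I-\big(I-\tilde B_{k-1}^{-1}A_{k-1}\big)^{\gamma}\Big)A_{k-1}^{-1}P_k^T(I-A_kM_k^{-1}),$$

where

$$\bar M_k:=M_k(M_k+M_k^T-A_k)^{-1}M_k^T. \tag{2.14}$$

Interchanging the roles of $M_k$ and $M_k^T$ in (2.14) yields another symmetrized smoother

$$\tilde M_k:=M_k^T(M_k+M_k^T-A_k)^{-1}M_k. \tag{2.15}$$

It is easy to check that both $\bar M_k-A_k$ and $\tilde M_k-A_k$ are SPSD matrices.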

## 3. Convergence analysis: Standard cycles

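Comparing (2.2) with (2.12), we see that Algorithm 2 is essentially an inexact two-grid method with $M=M_k$, $A=A_k$, $P=P_k$, and

$$B_c=A_{k-1}\big(I-(\tilde E^{(k-1)}_{\mathrm{MG}})^{\gamma}\big)^{-1}. \tag{3.1}$$

###### Remark 3.1.

Note that the coarse-grid matrix $B_c$ in Algorithm 1 has to be SPD. From (3.1), we have

$$\begin{aligned}
B_c&=A_{k-1}^{\frac12}\Big(A_{k-1}^{-\frac12}-(\tilde E^{(k-1)}_{\mathrm{MG}})^{\gamma}A_{k-1}^{-\frac12}\Big)^{-1}\\
&=A_{k-1}^{\frac12}\Big(A_{k-1}^{-\frac12}-A_{k-1}^{-\frac12}A_{k-1}^{\frac12}(\tilde E^{(k-1)}_{\mathrm{MG}})^{\gamma}A_{k-1}^{-\frac12}\Big)^{-1}\\
&=A_{k-1}^{\frac12}\Big(I-A_{k-1}^{\frac12}(\tilde E^{(k-1)}_{\mathrm{MG}})^{\gamma}A_{k-1}^{-\frac12}\Big)^{-1}A_{k-1}^{\frac12}\\
&=A_{k-1}^{\frac12}\Big[I-\big(A_{k-1}^{\frac12}\tilde E^{(k-1)}_{\mathrm{MG}}A_{k-1}^{-\frac12}\big)^{\gamma}\Big]^{-1}A_{k-1}^{\frac12}.
\end{aligned}$$

Since $A_{k-1}^{\frac12}\tilde E^{(k-1)}_{\mathrm{MG}}A_{k-1}^{-\frac12}$ is symmetric and $\lambda\big(\tilde E^{(k-1)}_{\mathrm{MG}}\big)\subset[0,1)$, it follows that $B_c$ given by (3.1) is an SPD matrix.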

Define

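$$\sigma^{(k)}_{\mathrm{TG}}:=\big\|E^{(k)}_{\mathrm{TG}}\big\|_{A_k}, \tag{3.2}$$

$$\tilde\sigma^{(k)}_{\mathrm{MG}}:=\big\|\tilde E^{(k)}_{\mathrm{MG}}\big\|_{A_k}. \tag{3.3}$$

The quantities $\sigma^{(k)}_{\mathrm{TG}}$ and $\tilde\sigma^{(k)}_{\mathrm{MG}}$ are referred to as the convergence factors of the exact two-grid method and the multigrid method at level $k$, respectively. According to the lower bound in (2.7) (or (2.9)), we deduce that

$$\tilde\sigma^{(k)}_{\mathrm{MG}}\geq 1-\frac{1}{K^{(k)}_{\mathrm{TG}}}=\sigma^{(k)}_{\mathrm{TG}}, \tag{3.4}$$

which reveals that a fast exact two-grid method is necessary for good convergence of the corresponding multigrid method.

Based on Theorem 2.1, we can derive the following upper bounds for $\tilde\sigma^{(k)}_{\mathrm{MG}}$.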

###### Lemma 3.2.

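For any $k=2,\ldots,L$, it holds that

$$\tilde\sigma^{(k)}_{\mathrm{MG}}\leq 1-\frac{1+\lambda_{\max}\big((P_k^T\tilde M_kP_k)^{-1}A_{k-1}\big)\dfrac{(\tilde\sigma^{(k-1)}_{\mathrm{MG}})^{\gamma}}{1-(\tilde\sigma^{(k-1)}_{\mathrm{MG}})^{\gamma}}}{\dfrac{1}{1-\sigma^{(k)}_{\mathrm{TG}}}+\lambda_{\max}(A_k^{-1}\tilde M_k)\,\lambda_{\max}\big((P_k^T\tilde M_kP_k)^{-1}A_{k-1}\big)\dfrac{(\tilde\sigma^{(k-1)}_{\mathrm{MG}})^{\gamma}}{1-(\tilde\sigma^{(k-1)}_{\mathrm{MG}})^{\gamma}}} \tag{3.5}$$

and

$$\tilde\sigma^{(k)}_{\mathrm{MG}}\leq\sigma^{(k)}_{\mathrm{TG}}+(\tilde\sigma^{(k-1)}_{\mathrm{MG}})^{\gamma}\big(1-\sigma^{(k)}_{\mathrm{TG}}-\lambda_{\min}(\tilde M_k^{-1}A_k)\big). \tag{3.6}$$

###### Proof.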

By (3.1), we have

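$$\begin{aligned}
B_c-A_c&=A_{k-1}\big(I-(\tilde E^{(k-1)}_{\mathrm{MG}})^{\gamma}\big)^{-1}-A_{k-1}\\
&=A_{k-1}^{\frac12}\Big[I-\big(A_{k-1}^{\frac12}\tilde E^{(k-1)}_{\mathrm{MG}}A_{k-1}^{-\frac12}\big)^{\gamma}\Big]^{-1}A_{k-1}^{\frac12}-A_{k-1}\\
&=A_{k-1}^{\frac12}\Big[I-\big(A_{k-1}^{\frac12}\tilde E^{(k-1)}_{\mathrm{MG}}A_{k-1}^{-\frac12}\big)^{\gamma}\Big]^{-1}\big(A_{k-1}^{\frac12}\tilde E^{(k-1)}_{\mathrm{MG}}A_{k-1}^{-\frac12}\big)^{\gamma}A_{k-1}^{\frac12}.
\end{aligned}$$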

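Then, for any $v_{k-1}\in\mathbb{R}^{n_{k-1}}\setminus\{0\}$, we have

$$\frac{v_{k-1}^T(B_c-A_c)v_{k-1}}{v_{k-1}^TP_k^T\tilde M_kP_kv_{k-1}}=\frac{v_{k-1}^T(B_c-A_c)v_{k-1}}{v_{k-1}^TA_{k-1}v_{k-1}}\cdot\frac{v_{k-1}^TA_{k-1}v_{k-1}}{v_{k-1}^TP_k^T\tilde M_kP_kv_{k-1}}\in\bigg[0,\ \lambda_{\max}\big((P_k^T\tilde M_kP_k)^{-1}A_{k-1}\big)\frac{(\tilde\sigma^{(k-1)}_{\mathrm{MG}})^{\gamma}}{1-(\tilde\sigma^{(k-1)}_{\mathrm{MG}})^{\gamma}}\bigg],$$

where we have used the facts that $\lambda\big(A_{k-1}^{\frac12}\tilde E^{(k-1)}_{\mathrm{MG}}A_{k-1}^{-\frac12}\big)\subset\big[0,\tilde\sigma^{(k-1)}_{\mathrm{MG}}\big]$ and

$$0<\frac{v_{k-1}^TA_{k-1}v_{k-1}}{v_{k-1}^TP_k^T\tilde M_kP_kv_{k-1}}\leq\lambda_{\max}\big((P_k^T\tilde M_kP_k)^{-1}A_{k-1}\big).$$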

Using (2.7), we obtain

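$$\tilde\sigma^{(k)}_{\mathrm{MG}}\leq 1-\frac{1+\lambda_{\max}\big((P_k^T\tilde M_kP_k)^{-1}A_{k-1}\big)\dfrac{(\tilde\sigma^{(k-1)}_{\mathrm{MG}})^{\gamma}}{1-(\tilde\sigma^{(k-1)}_{\mathrm{MG}})^{\gamma}}}{K^{(k)}_{\mathrm{TG}}+\lambda_{\max}(A_k^{-1}\tilde M_k)\,\lambda_{\max}\big((P_k^T\tilde M_kP_k)^{-1}A_{k-1}\big)\dfrac{(\tilde\sigma^{(k-1)}_{\mathrm{MG}})^{\gamma}}{1-(\tilde\sigma^{(k-1)}_{\mathrm{MG}})^{\gamma}}}.$$

The estimate (3.5) then follows from the relation

$$K^{(k)}_{\mathrm{TG}}=\frac{1}{1-\sigma^{(k)}_{\mathrm{TG}}}. \tag{3.7}$$

Similarly, we have

$$\frac{v_{k-1}^T(B_c-A_c)v_{k-1}}{v_{k-1}^TA_cv_{k-1}}\in\bigg[0,\ \frac{(\tilde\sigma^{(k-1)}_{\mathrm{MG}})^{\gamma}}{1-(\tilde\sigma^{(k-1)}_{\mathrm{MG}})^{\gamma}}\bigg].$$

An application of (2.9) yields

$$\tilde\sigma^{(k)}_{\mathrm{MG}}\leq 1-\frac{1+(\tilde\sigma^{(k-1)}_{\mathrm{MG}})^{\gamma}\big(K^{(k)}_{\mathrm{TG}}\lambda_{\min}(\tilde M_k^{-1}A_k)-1\big)}{K^{(k)}_{\mathrm{TG}}}.$$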

Using (3.7), we can arrive at the estimate (3.6) immediately. ∎
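For readers who want the last step spelled out, the following worked computation shows how (3.7) turns the bound obtained from (2.9) into (3.6). The shorthand $s=(\tilde\sigma^{(k-1)}_{\mathrm{MG}})^{\gamma}$, $\sigma=\sigma^{(k)}_{\mathrm{TG}}$, and $\lambda=\lambda_{\min}(\tilde M_k^{-1}A_k)$ is introduced here only for brevity.

```latex
\begin{aligned}
1-\frac{1+s\,(K^{(k)}_{\mathrm{TG}}\lambda-1)}{K^{(k)}_{\mathrm{TG}}}
 &= 1-\frac{1-s}{K^{(k)}_{\mathrm{TG}}}-s\lambda \\
 &= 1-(1-s)(1-\sigma)-s\lambda
    \qquad\text{by (3.7)} \\
 &= \sigma+s\,(1-\sigma-\lambda).
\end{aligned}
```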

In what follows, we establish three types of convergence estimates for Algorithm 2 based on Lemma 3.2. For convenience, we define

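$$\sigma_L:=\max_{1\leq k\leq L}\sigma^{(k)}_{\mathrm{TG}}, \tag{3.8}$$

$$\tau_L:=\max_{1\leq k\leq L}\lambda_{\max}\big((P_k^T\tilde M_kP_k)^{-1}A_{k-1}\big), \tag{3.9}$$

$$\varepsilon_L:=\min_{1\leq k\leq L}\lambda_{\min}(\tilde M_k^{-1}A_k). \tag{3.10}$$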

### 3.1. Estimate of the first kind

The definitions (3.8)–(3.10) imply that

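$$\begin{aligned}
\max_{v_k\in\mathbb{R}^{n_k}\setminus\{0\}}\frac{v_k^T\tilde M_k\big(I-P_k(P_k^T\tilde M_kP_k)^{-1}P_k^T\tilde M_k\big)v_k}{v_k^TA_kv_k}&\leq\frac{1}{1-\sigma_L},\\
\max_{v_k\in\operatorname{range}(P_k)\setminus\{0\}}\frac{v_k^TA_kv_k}{v_k^T\tilde M_kv_k}&\leq\tau_L,\\
\max_{v_k\in\mathbb{R}^{n_k}\setminus\{0\}}\frac{v_k^T\tilde M_kv_k}{v_k^TA_kv_k}&\leq\frac{1}{\varepsilon_L},
\end{aligned}$$

which, together with the positive semidefiniteness of $\tilde M_k-A_k$, lead to the assumptions in the following lemma.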

###### Lemma 3.3.

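Assume that , , and . Then, there exists a strictly decreasing sequence $\{x_\gamma\}_{\gamma=1}^{+\infty}$ with the limit $\sigma_L$ such that $x_\gamma$ is a root of the equation

$$\frac{\sigma_L\varepsilon_L(1-x^{\gamma})+\tau_L(1-\varepsilon_L)(1-\sigma_L)x^{\gamma}}{\varepsilon_L(1-x^{\gamma})+\tau_L(1-\sigma_L)x^{\gamma}}-x=0.$$

###### Proof.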

Define

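$$F_\gamma(x):=\frac{\sigma_L\varepsilon_L(1-x^{\gamma})+\tau_L(1-\varepsilon_L)(1-\sigma_L)x^{\gamma}}{\varepsilon_L(1-x^{\gamma})+\tau_L(1-\sigma_L)x^{\gamma}}-x.$$

Obviously, $F_\gamma(x)$ is a continuous function in $[\sigma_L,1-\varepsilon_L]$. Direct computations yield

$$\begin{aligned}
F_\gamma(\sigma_L)&=\frac{\tau_L(1-\sigma_L-\varepsilon_L)(1-\sigma_L)\sigma_L^{\gamma}}{\varepsilon_L(1-\sigma_L^{\gamma})+\tau_L(1-\sigma_L)\sigma_L^{\gamma}}>0,\\
F_\gamma(1-\varepsilon_L)&=\frac{\varepsilon_L(1-\sigma_L-\varepsilon_L)\big[(1-\varepsilon_L)^{\gamma}-1\big]}{\varepsilon_L-\varepsilon_L(1-\varepsilon_L)^{\gamma}+\tau_L(1-\sigma_L)(1-\varepsilon_L)^{\gamma}}<0.
\end{aligned}$$

Hence, $F_\gamma(x)=0$ has at least one root in $(\sigma_L,1-\varepsilon_L)$.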

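Let $x_1\in(\sigma_L,1-\varepsilon_L)$ be a root of $F_1(x)=0$. Note that

$$\frac{\sigma_L\varepsilon_L(1-t)+\tau_L(1-\varepsilon_L)(1-\sigma_L)t}{\varepsilon_L(1-t)+\tau_L(1-\sigma_L)t}$$

is a strictly increasing function with respect to $t\in(0,1)$. Since $0<x_1<1$ gives $x_1^2<x_1$, we then have

$$F_2(x_1)<F_1(x_1)=0.$$

Since $F_2(\sigma_L)>0$, there exists an $x_2\in(\sigma_L,x_1)$ such that $F_2(x_2)=0$. Repeating this process, one can get a strictly decreasing sequence $\{x_\gamma\}_{\gamma=1}^{+\infty}$.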

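Since $F_\gamma(x_\gamma)=0$, it follows that

$$x_\gamma=\frac{\sigma_L\varepsilon_L\big(1-(x_\gamma)^{\gamma}\big)+\tau_L(1-\varepsilon_L)(1-\sigma_L)(x_\gamma)^{\gamma}}{\varepsilon_L\big(1-(x_\gamma)^{\gamma}\big)+\tau_L(1-\sigma_L)(x_\gamma)^{\gamma}},$$

which yields

$$\lim_{\gamma\to+\infty}x_\gamma=\lim_{\gamma\to+\infty}\frac{\sigma_L\varepsilon_L\big(1-(x_\gamma)^{\gamma}\big)+\tau_L(1-\varepsilon_L)(1-\sigma_L)(x_\gamma)^{\gamma}}{\varepsilon_L\big(1-(x_\gamma)^{\gamma}\big)+\tau_L(1-\sigma_L)(x_\gamma)^{\gamma}}=\sigma_L.$$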

This completes the proof. ∎
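The construction in the proof can be mirrored numerically. The sketch below is an illustration under assumed sample values ($\sigma_L=0.3$, $\varepsilon_L=0.2$, $\tau_L=0.8$, chosen here only so that $\varepsilon_L<1-\sigma_L$ and $\varepsilon_L\leq\tau_L\leq 1$). It finds $x_1$ by bisection on $[\sigma_L,1-\varepsilon_L]$ and then each $x_{\gamma+1}$ by bisection on $[\sigma_L,x_\gamma]$. The helper names `F`, `bisect`, and `root_sequence` are hypothetical.

```python
def F(x, gamma, sigma, eps, tau):
    """Left-hand side F_gamma(x) of the equation in Lemma 3.3."""
    t = x ** gamma
    num = sigma * eps * (1.0 - t) + tau * (1.0 - eps) * (1.0 - sigma) * t
    den = eps * (1.0 - t) + tau * (1.0 - sigma) * t
    return num / den - x

def bisect(f, lo, hi, iters=200):
    """Bisection for a root of f in [lo, hi], assuming f(lo) > 0 > f(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def root_sequence(sigma, eps, tau, gamma_max):
    """Roots x_1, ..., x_gamma_max, each searched below the previous one as in the proof."""
    roots = []
    hi = 1.0 - eps
    for gamma in range(1, gamma_max + 1):
        x = bisect(lambda y: F(y, gamma, sigma, eps, tau), sigma, hi)
        roots.append(x)
        hi = x          # F_{gamma+1}(x_gamma) < 0, so the next root lies in (sigma, x_gamma)
    return roots

# Assumed sample values, for illustration only.
sigma_L, eps_L, tau_L = 0.3, 0.2, 0.8
roots = root_sequence(sigma_L, eps_L, tau_L, gamma_max=8)
for gamma, x in enumerate(roots, start=1):
    print(f"gamma = {gamma}: x_gamma = {x:.6f}")
```

The printed values should decrease monotonically toward $\sigma_L$, in line with the limit stated in the lemma.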

Using (3.5) and Lemma 3.3, we can derive the following convergence estimate.

###### Theorem 3.4.

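Let $x_\gamma$ be stated as in Lemma 3.3. If the coarsest-grid matrix $\tilde A_0$ in Algorithm 2 satisfies

$$0\leq v_0^T(\tilde A_0-A_0)v_0\leq\frac{\varepsilon_L(x_\gamma-\sigma_L)}{(1-\sigma_L)(1-\varepsilon_L-x_\gamma)}\,v_0^TP_1^T\tilde M_1P_1v_0\quad\forall\,v_0\in\mathbb{R}^{n_0}, \tag{3.11}$$

then

$$\tilde\sigma^{(k)}_{\mathrm{MG}}\leq x_\gamma\quad\forall\,k=1,\ldots,L. \tag{3.12}$$

###### Proof.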

By (2.7) and (3.11), we have

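$$\tilde\sigma^{(1)}_{\mathrm{MG}}\leq 1-\frac{1+\dfrac{\varepsilon_L(x_\gamma-\sigma_L)}{(1-\sigma_L)(1-\varepsilon_L-x_\gamma)}}{\dfrac{1}{1-\sigma^{(1)}_{\mathrm{TG}}}+\lambda_{\max}(A_1^{-1}\tilde M_1)\dfrac{\varepsilon_L(x_\gamma-\sigma_L)}{(1-\sigma_L)(1-\varepsilon_L-x_\gamma)}}.$$

The definitions (3.8) and (3.10) mean that

$$\frac{1}{1-\sigma^{(1)}_{\mathrm{TG}}}\leq\frac{1}{1-\sigma_L}\quad\text{and}\quad\lambda_{\max}(A_1^{-1}\tilde M_1)=\frac{1}{\lambda_{\min}(\tilde M_1^{-1}A_1)}\leq\frac{1}{\varepsilon_L}.$$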

Hence,

 ˜σ(1)MG≤1−1+εL(xγ−σL)(1−σL)(1−εL−xγ)11−σL+xγ−σ