 # Convergence analysis of inexact two-grid methods: A theoretical framework

Multigrid methods are among the most efficient iterative techniques for solving large-scale linear systems that arise from discretized partial differential equations. As a foundation for multigrid analysis, two-grid theory plays an important role in understanding and designing multigrid methods. Convergence analysis of exact two-grid methods (i.e., the Galerkin coarse-grid system is solved exactly) has been well developed: the convergence factor of exact two-grid methods can be characterized by an identity. However, convergence theory of inexact ones (i.e., the coarse-grid problem is solved approximately) is still less mature. In this paper, a theoretical framework for the convergence analysis of inexact two-grid methods is developed. More specifically, two-sided bounds for the energy norm of the error propagation matrix are established under different approximation conditions, from which one can readily get the identity for the convergence factor of exact two-grid methods.


## 1. Introduction

Multigrid is a popular and effective solver for systems of linear equations stemming from discretized partial differential equations. For a large class of linear systems, it has been proved to possess uniform convergence with (nearly) optimal complexity (i.e., it requires about $\mathcal{O}(n)$ work for a linear system with $n$ unknowns); see, e.g., [7, 23, 24]. The fundamental module of multigrid is a two-grid scheme, which alternates between two processes: the smoothing (or local relaxation) step and the coarse-grid correction step. Optimality is achieved when the smoothing and coarse-grid correction steps are complementary.

Typically, the smoothing step is a stationary iterative procedure, such as a (weighted) Jacobi-type or Gauss–Seidel-type iteration. These classical methods are generally effective at eliminating high-frequency (i.e., oscillatory) error components, whereas low-frequency (i.e., smooth) components cannot be eliminated effectively [7, 23]. To remedy this defect, the coarse-grid correction step is designed to reduce the low-frequency error components by solving a coarse problem with far fewer unknowns (the number of these unknowns is denoted by $n_c$). The coarse-grid correction step involves two intergrid operators that transfer information between the fine and coarse grids: a restriction matrix $R\in\mathbb{R}^{n_c\times n}$ that restricts the fine-grid residual to the coarse grid, and a prolongation (or interpolation) matrix $P\in\mathbb{R}^{n\times n_c}$ with full column rank that extends the correction computed on the coarse grid to the fine grid. Usually, $R$ is taken to be the transpose of $P$ (as considered in this paper). The Galerkin coarse-grid matrix is then defined as $A_c:=P^TAP$, which gives the coarse representation of the fine-grid matrix $A$.

Most of the existing literature on two-grid theory (see, e.g., [10, 35, 18, 30, 6]) focuses on exact two-grid methods, with some exceptions such as [17, 24]. A powerful identity has been established to characterize the convergence factor of exact two-grid methods [29, 10]. However, to design a two-grid method that converges well, it is not necessary to solve the coarse problem exactly, especially when the coarse problem is still large. Multigrid is typically a recursive call (e.g., the V- and W-cycles) of the two-grid scheme and hence can be treated as an inexact two-grid scheme. As is well known, two-grid convergence is often sufficient to assess the convergence of W-cycle multigrid methods; see, e.g., [11, 23]. With the aid of the hierarchical basis idea and the minimization property of the Schur complement (see, e.g., [1, Theorem 3.8]), Notay [17] derived a convergence estimate for inexact two-grid methods. Based on this estimate, he also showed that, if the convergence factor of exact two-grid methods is uniformly bounded by $\sigma<1/2$, then the convergence factor of the corresponding W-cycle multigrid method is bounded by $\sigma/(1-\sigma)$.

Besides theoretical considerations, two-grid theory can also guide the design of multigrid algorithms. Implementing multigrid schemes on large-scale parallel machines remains challenging, especially in the era of exascale computing. For instance, the stencil sizes (i.e., the numbers of nonzero entries per row) of the standard Galerkin coarse-grid matrices tend to grow further down the multilevel hierarchy of algebraic multigrid methods [5, 3, 19], which increases communication costs. As the problem size and the number of levels grow, the overall efficiency of parallel algebraic multigrid methods may decrease dramatically. To maintain multigrid convergence while improving parallel efficiency, several sparse approximation strategies for $A_c$ have been proposed; see, e.g., [4, 22, 21, 8]. Motivated by an existing convergence analysis, Falgout and Schroder proposed a non-Galerkin coarsening strategy to improve the parallel performance of algebraic multigrid algorithms.

In this paper, we present a systematic convergence analysis of inexact two-grid methods. Two-sided bounds for the energy norm of the error propagation matrix are established in a purely algebraic manner. Our main results include the following three types of estimates.

• The first one (3.3) is a general convergence estimate, which slightly improves the existing one in [17, Theorem 2.2]. This estimate is valid for any symmetric positive definite (SPD) coarse-grid matrix $B_c$. In practice, we are more interested in the situation where $B_c$ is a suitable approximation to $A_c$, which motivates the next two estimates.

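• The second one (3.10) is established under the approximation condition

$$-\alpha\,v_c^TP^T\widetilde{M}Pv_c\le v_c^T(B_c-A_c)v_c\le\beta\,v_c^TP^T\widetilde{M}Pv_c\quad\forall\,v_c\in\mathbb{R}^{n_c}, \tag{1.1}$$

where $\widetilde{M}$ is a symmetrized smoother defined by (2.6), $0\le\alpha<1/\lambda_{\max}(A_c^{-1}P^T\widetilde{M}P)$, and $\beta\ge0$. Clearly, the condition (1.1) measures how far $B_c$ deviates from $A_c$ by reference to the restricted smoother $P^T\widetilde{M}P$ (which can be viewed as an approximation to $A_c$).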

• The third one (3.44) is established under the “relative error” condition

$$-\gamma\,v_c^TA_cv_c\le v_c^T(B_c-A_c)v_c\le\delta\,v_c^TA_cv_c\quad\forall\,v_c\in\mathbb{R}^{n_c}, \tag{1.2}$$

where $0\le\gamma<1$ and $\delta\ge0$. A special case of the condition (1.2) appeared in [24, Page 145].

It is worth mentioning that our estimates generalize the identity for the convergence factor of exact two-grid methods: when $B_c=A_c$, they reduce to that identity.

The rest of this paper is organized as follows. In Section 2, we first introduce some fundamental matrices involved in the analysis of two-grid methods, and then present the identity for the convergence factor of exact two-grid methods. In Section 3, we establish the convergence theory of inexact two-grid methods, which mainly contains three types of estimates. In Section 4, we give some concluding remarks.

## 2. Preliminaries

In this section, we introduce some algebraic properties of two-grid methods, which play an important role in the convergence analysis of inexact two-grid methods. For convenience, we first list some notation used in the subsequent discussions.

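• $I_n$ denotes the $n\times n$ identity matrix (or $I$ when its size is clear from context).

• $\lambda_i(\cdot)$ denotes the $i$-th eigenvalue of a matrix (assuming that the eigenvalues are arranged algebraically in the same order throughout this paper).

• $\lambda_{\min}(\cdot)$, $\lambda_{\min}^{+}(\cdot)$, and $\lambda_{\max}(\cdot)$ stand for the smallest eigenvalue, the smallest positive eigenvalue, and the largest eigenvalue of a matrix, respectively.

• $\lambda(\cdot)$ denotes the spectrum of a matrix.

• $\rho(\cdot)$ denotes the spectral radius of a matrix.

• $\|\cdot\|_2$ denotes the spectral norm of a matrix.

• $\kappa(\cdot)$ denotes the spectral condition number of a matrix.

• $\|\cdot\|_A$ denotes the energy norm induced by an SPD matrix $A\in\mathbb{R}^{n\times n}$: for any $v\in\mathbb{R}^n$, $\|v\|_A=(v^TAv)^{1/2}$; for any $B\in\mathbb{R}^{n\times n}$, $\|B\|_A=\max_{v\in\mathbb{R}^n\setminus\{0\}}\|Bv\|_A/\|v\|_A$.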

### 2.1. Two-grid methods

Consider solving the linear system

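$$Au=f, \tag{2.1}$$

where $A\in\mathbb{R}^{n\times n}$ is SPD, $u\in\mathbb{R}^n$, and $f\in\mathbb{R}^n$. Given an initial guess $u_0\in\mathbb{R}^n$ and a nonsingular matrix $M\in\mathbb{R}^{n\times n}$, we perform the smoothing process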

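$$u_{k+1}=u_k+M^{-1}(f-Au_k),\quad k=0,1,\ldots, \tag{2.2}$$

where $M$ is called a smoother and $f-Au_k$ is the residual at the $k$-th iteration. Let $e_k:=u-u_k$. From (2.2), we have

$$e_{k+1}=(I-M^{-1}A)e_k,$$

which yields

$$e_k=(I-M^{-1}A)^ke_0.$$

Hence,

$$\|e_k\|_A\le\|(I-M^{-1}A)^k\|_A\|e_0\|_A\le\|I-M^{-1}A\|_A^k\|e_0\|_A. \tag{2.3}$$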

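If $\|I-M^{-1}A\|_A<1$, we deduce from (2.3) that, for any initial error $e_0$, the error vector $e_k$ tends to zero as $k\to+\infty$. Since

$$\|(I-M^{-1}A)v\|_A^2=\|v\|_A^2-\big\langle(M+M^T-A)M^{-1}Av,\,M^{-1}Av\big\rangle\quad\forall\,v\in\mathbb{R}^n,$$

a necessary and sufficient condition for the iteration (2.2) to be $A$-convergent (i.e., $\|I-M^{-1}A\|_A<1$) is that $M+M^T-A$ is SPD.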
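As a numerical illustration of this criterion, the following NumPy sketch applies the smoothing iteration (2.2) to an assumed model problem, the 1D Poisson matrix $\mathrm{tridiag}(-1,2,-1)$ with $n=15$, using a Gauss–Seidel smoother and an over-weighted Jacobi smoother ($\omega=1.5$). The helpers `energy_norm` and `is_spd` are hypothetical names; the energy norm of a matrix is computed as $\|E\|_A=\|L^TEL^{-T}\|_2$ from the Cholesky factorization $A=LL^T$.

```python
import numpy as np

def energy_norm(E, A):
    """||E||_A = ||L^T E L^{-T}||_2, where A = L L^T (Cholesky)."""
    L = np.linalg.cholesky(A)
    return np.linalg.norm(L.T @ E @ np.linalg.inv(L.T), 2)

def is_spd(S, tol=1e-12):
    """True if S is symmetric with all eigenvalues above tol."""
    return np.allclose(S, S.T) and np.linalg.eigvalsh(S).min() > tol

n = 15
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1D Poisson (hypothetical test matrix)
rng = np.random.default_rng(0)
u = rng.standard_normal(n)                               # exact solution
f = A @ u

smoothers = {
    "gauss_seidel": np.tril(A),                    # M = lower triangle of A
    "jacobi_w1.5": np.diag(np.diag(A)) / 1.5,      # M = D / omega with omega = 1.5
}
results = {}
for name, M in smoothers.items():
    uk = np.zeros(n)
    for _ in range(20):                            # smoothing iteration (2.2)
        uk = uk + np.linalg.solve(M, f - A @ uk)
    ek = u - uk
    results[name] = {
        "norm": energy_norm(np.eye(n) - np.linalg.solve(M, A), A),   # ||I - M^{-1}A||_A
        "spd": is_spd(M + M.T - A),                                  # the criterion above
        "err_ratio": np.sqrt(ek @ A @ ek) / np.sqrt(u @ A @ u),      # ||e_20||_A / ||e_0||_A
    }
    print(name, results[name])
```

For Gauss–Seidel, $M+M^T-A$ is the diagonal of $A$, so the criterion holds for any SPD $A$; the over-weighted Jacobi smoother violates it, and its error grows in the $A$-norm instead of decaying.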

For an $A$-convergent smoother $M$, we define

$$\overline{M}:=M(M+M^T-A)^{-1}M^T, \tag{2.4}$$

which is often referred to as a symmetrized smoother. It is easy to check that

$$I-\overline{M}^{-1}A=(I-M^{-T}A)(I-M^{-1}A). \tag{2.5}$$

Interchanging the roles of $M$ and $M^T$ in (2.4) yields another symmetrized smoother

$$\widetilde{M}:=M^T(M+M^T-A)^{-1}M, \tag{2.6}$$

which satisfies

$$I-\widetilde{M}^{-1}A=(I-M^{-1}A)(I-M^{-T}A). \tag{2.7}$$

According to (2.5) and (2.7), both $\overline{M}-A$ and $\widetilde{M}-A$ are symmetric and positive semidefinite (SPSD).
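The identities (2.5) and (2.7), and the semidefiniteness of $\overline{M}-A$ and $\widetilde{M}-A$, can be spot-checked numerically. The sketch below assumes the same hypothetical 1D Poisson matrix and a forward Gauss–Seidel smoother; explicit inverses are used only for readability.

```python
import numpy as np

n = 15
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # hypothetical SPD test matrix
M = np.tril(A)                                           # forward Gauss-Seidel smoother
I, inv = np.eye(n), np.linalg.inv

D = M + M.T - A                      # SPD, so M is A-convergent
M_bar = M @ inv(D) @ M.T             # symmetrized smoother (2.4)
M_tilde = M.T @ inv(D) @ M           # symmetrized smoother (2.6)

# (2.5): a sweep with M followed by a sweep with M^T
err_25 = np.abs((I - inv(M_bar) @ A) - (I - inv(M.T) @ A) @ (I - inv(M) @ A)).max()
# (2.7): the same two sweeps in the reverse order
err_27 = np.abs((I - inv(M_tilde) @ A) - (I - inv(M) @ A) @ (I - inv(M.T) @ A)).max()

# M_bar - A and M_tilde - A are SPSD: smallest eigenvalues are >= 0 up to round-off
min_eig_bar = np.linalg.eigvalsh(M_bar - A).min()
min_eig_tilde = np.linalg.eigvalsh(M_tilde - A).min()
print(err_25, err_27, min_eig_bar, min_eig_tilde)
```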

The following lemma provides two useful relations between the symmetrized smoothers $\overline{M}$ and $\widetilde{M}$ (see [18, Lemma 1]).

###### Lemma 2.1.

Let $\overline{M}$ and $\widetilde{M}$ be defined by (2.4) and (2.6), respectively. Then

$$\overline{M}(I-M^{-T}A)=(I-AM^{-T})\widetilde{M}, \tag{2.8}$$

$$(I-AM^{-1})\overline{M}(I-M^{-T}A)=\widetilde{M}-A. \tag{2.9}$$
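The two relations in Lemma 2.1 only use the nonsingularity of $M$ and of $M+M^T-A$, so they can be checked with a nonsymmetric smoother. The sketch below assumes a hypothetical perturbed Gauss–Seidel smoother (a random perturbation of size 0.05 with a fixed seed) on the 1D Poisson matrix.

```python
import numpy as np

n = 15
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
rng = np.random.default_rng(1)
M = np.tril(A) + 0.05 * rng.standard_normal((n, n))    # hypothetical nonsymmetric smoother
I, inv = np.eye(n), np.linalg.inv

D = M + M.T - A
print("M + M^T - A is SPD:", np.linalg.eigvalsh(D).min() > 0)

M_bar = M @ inv(D) @ M.T            # (2.4)
M_tilde = M.T @ inv(D) @ M          # (2.6)

lhs_28 = M_bar @ (I - inv(M.T) @ A)
rhs_28 = (I - A @ inv(M.T)) @ M_tilde
lhs_29 = (I - A @ inv(M)) @ M_bar @ (I - inv(M.T) @ A)
rhs_29 = M_tilde - A
err_28 = np.abs(lhs_28 - rhs_28).max()
err_29 = np.abs(lhs_29 - rhs_29).max()
print(err_28, err_29)
```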

Let $P\in\mathbb{R}^{n\times n_c}$ be a prolongation (or interpolation) matrix with full column rank, where $n_c$ $(<n)$ is the number of coarse variables. Let $P^T\in\mathbb{R}^{n_c\times n}$ be the restriction matrix. The Galerkin coarse-grid matrix is then denoted by $A_c:=P^TAP$. For a given initial guess $u_0\in\mathbb{R}^n$ and an SPD coarse-grid matrix $B_c\in\mathbb{R}^{n_c\times n_c}$, the standard two-grid scheme (in which the presmoothing and postsmoothing steps are performed in a symmetric way) for solving (2.1) can be described as Algorithm 1. If the coarse-grid matrix $B_c$ is chosen as $A_c$, then Algorithm 1 is called an exact two-grid method; otherwise, it is called an inexact two-grid method.
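Algorithm 1 can be sketched as follows. This NumPy rendering is one reading consistent with the error propagation matrix (2.10): presmoothing with $M$, coarse-grid correction with an SPD matrix $B_c$, and postsmoothing with $M^T$. The function name `two_grid`, the 1D Poisson test problem, linear interpolation, the Gauss–Seidel smoother, and the choice $B_c=1.2A_c$ are all assumptions made for illustration.

```python
import numpy as np

def two_grid(A, f, u0, M, P, Bc):
    """One iteration of the two-grid scheme; Bc = P^T A P gives the exact method."""
    u1 = u0 + np.linalg.solve(M, f - A @ u0)        # presmoothing with M
    rc = P.T @ (f - A @ u1)                         # restrict the residual
    u2 = u1 + P @ np.linalg.solve(Bc, rc)           # coarse-grid correction
    return u2 + np.linalg.solve(M.T, f - A @ u2)    # postsmoothing with M^T

nc = 7
n = 2 * nc + 1
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1D Poisson (hypothetical)
P = np.zeros((n, nc))
for j in range(nc):                                      # linear interpolation
    P[2 * j : 2 * j + 3, j] = [0.5, 1.0, 0.5]
M = np.tril(A)                                           # Gauss-Seidel smoother
Ac = P.T @ A @ P                                         # Galerkin coarse matrix
Bc = 1.2 * Ac                                            # an inexact (scaled) coarse matrix

rng = np.random.default_rng(0)
u = rng.standard_normal(n)
f = A @ u
u0 = np.zeros(n)
u_tg = two_grid(A, f, u0, M, P, Bc)

# Check u - u_TG = E_TG (u - u0) with E_TG from (2.10)
I = np.eye(n)
E = ((I - np.linalg.solve(M.T, A))
     @ (I - P @ np.linalg.solve(Bc, P.T @ A))
     @ (I - np.linalg.solve(M, A)))
print(np.abs((u - u_tg) - E @ (u - u0)).max())

# A few iterations reduce the energy norm of the error
uk = u0
for k in range(5):
    uk = two_grid(A, f, uk, M, P, Bc)
    print(k + 1, np.sqrt((u - uk) @ A @ (u - uk)))
```

Calling `two_grid` repeatedly, as in the last loop, is the stationary iteration whose error propagation matrix is (2.10).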

From Algorithm 1, we have

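$$u-u_{TG}=\widetilde{E}_{TG}(u-u_0),$$

where

$$\widetilde{E}_{TG}=(I-M^{-T}A)(I-PB_c^{-1}P^TA)(I-M^{-1}A) \tag{2.10}$$

is called the iteration matrix (or error propagation matrix) of Algorithm 1. It can be rewritten as

$$\widetilde{E}_{TG}=I-\widetilde{B}_{TG}^{-1}A, \tag{2.11}$$

where

$$\widetilde{B}_{TG}^{-1}=\overline{M}^{-1}+(I-M^{-T}A)PB_c^{-1}P^T(I-AM^{-1}). \tag{2.12}$$

Since $\overline{M}^{-1}$ is SPD and the second term in (2.12) is SPSD, $\widetilde{B}_{TG}$ is SPD; it is called the inexact two-grid preconditioner. From (2.11), we deduce that

$$\|\widetilde{E}_{TG}\|_A=\rho(\widetilde{E}_{TG})=\max\big\{\lambda_{\max}(\widetilde{B}_{TG}^{-1}A)-1,\;1-\lambda_{\min}(\widetilde{B}_{TG}^{-1}A)\big\}. \tag{2.13}$$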
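The relations (2.11)–(2.13) can be verified numerically in the same hypothetical setting. The sketch below forms $\widetilde{B}_{TG}^{-1}$ from (2.12) with the assumed choice $B_c=0.8A_c$ (which over-corrects on the coarse grid), and compares $\|\widetilde{E}_{TG}\|_A$, the spectral radius $\rho(\widetilde{E}_{TG})$, and the right-hand side of (2.13). Because $\widetilde{B}_{TG}^{-1}A$ is similar to the symmetric matrix $L^T\widetilde{B}_{TG}^{-1}L$ with $A=LL^T$, its eigenvalues are computed from the latter.

```python
import numpy as np

nc = 7
n = 2 * nc + 1
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # hypothetical test problem
P = np.zeros((n, nc))
for j in range(nc):
    P[2 * j : 2 * j + 3, j] = [0.5, 1.0, 0.5]            # linear interpolation
M = np.tril(A)                                           # Gauss-Seidel smoother
I, inv = np.eye(n), np.linalg.inv
Bc = 0.8 * (P.T @ A @ P)                                 # assumed inexact coarse matrix

M_bar_inv = inv(M.T) @ (M + M.T - A) @ inv(M)            # inverse of (2.4)
B_tg_inv = M_bar_inv + (I - inv(M.T) @ A) @ P @ inv(Bc) @ P.T @ (I - A @ inv(M))   # (2.12)
E = (I - inv(M.T) @ A) @ (I - P @ inv(Bc) @ P.T @ A) @ (I - inv(M) @ A)            # (2.10)

L = np.linalg.cholesky(A)                                # A = L L^T
norm_A = np.linalg.norm(L.T @ E @ inv(L.T), 2)           # ||E||_A
rho = np.abs(np.linalg.eigvals(E)).max()                 # spectral radius of E
lam = np.linalg.eigvalsh(L.T @ B_tg_inv @ L)             # eigenvalues of B_tg^{-1} A
bound = max(lam.max() - 1.0, 1.0 - lam.min())            # right-hand side of (2.13)
print(np.abs(E - (I - B_tg_inv @ A)).max(), norm_A, rho, bound)
```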

### 2.2. Convergence of exact two-grid methods

The convergence theory of exact two-grid methods has been well studied in the literature. For readers interested in its algebraic analysis, we refer to [24, 13, 18] and the references therein.

For the special case $B_c=A_c$, we denote the iteration matrix by

$$E_{TG}=(I-M^{-T}A)(I-PA_c^{-1}P^TA)(I-M^{-1}A), \tag{2.14}$$

which can be written as

$$E_{TG}=I-B_{TG}^{-1}A, \tag{2.15}$$

where

$$B_{TG}^{-1}=\overline{M}^{-1}+(I-M^{-T}A)PA_c^{-1}P^T(I-AM^{-1}). \tag{2.16}$$

It is easy to see that $B_{TG}$ is SPD; it is called the exact two-grid preconditioner.

The following theorem gives an identity for the convergence factor of Algorithm 1 with $B_c=A_c$ [10, Theorem 4.3], which is the so-called two-level XZ-identity [29, 35].

###### Theorem 2.2.

Let $\widetilde{M}$ be defined by (2.6), and define

$$\Pi_{\widetilde{M}}:=P(P^T\widetilde{M}P)^{-1}P^T\widetilde{M}. \tag{2.17}$$

The convergence factor of exact two-grid methods can be characterized as

$$\|E_{TG}\|_A=1-\frac{1}{K_{TG}}, \tag{2.18}$$

where

$$K_{TG}=\max_{v\in\mathbb{R}^n\setminus\{0\}}\frac{\|(I-\Pi_{\widetilde{M}})v\|_{\widetilde{M}}^2}{\|v\|_A^2}. \tag{2.19}$$
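The XZ-identity (2.18) can be checked numerically. The sketch below (hypothetical 1D Poisson problem, linear interpolation, Gauss–Seidel smoother) computes $K_{TG}$ from (2.19) as the largest eigenvalue of the symmetric pencil $(\widetilde{M}(I-\Pi_{\widetilde{M}}),A)$; this is valid because $\Pi_{\widetilde{M}}$ is $\widetilde{M}$-orthogonal, so $(I-\Pi_{\widetilde{M}})^T\widetilde{M}(I-\Pi_{\widetilde{M}})=\widetilde{M}(I-\Pi_{\widetilde{M}})$. It also checks that the spectrum of $B_{TG}^{-1}A$ lies in $[1/K_{TG},1]$ with both endpoints attained.

```python
import numpy as np

nc = 7
n = 2 * nc + 1
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # hypothetical test problem
P = np.zeros((n, nc))
for j in range(nc):
    P[2 * j : 2 * j + 3, j] = [0.5, 1.0, 0.5]
M = np.tril(A)
I, inv = np.eye(n), np.linalg.inv

Ac = P.T @ A @ P
M_tilde = M.T @ inv(M + M.T - A) @ M                     # (2.6)
Pi = P @ inv(P.T @ M_tilde @ P) @ P.T @ M_tilde          # (2.17)
E = (I - inv(M.T) @ A) @ (I - P @ inv(Ac) @ P.T @ A) @ (I - inv(M) @ A)   # (2.14)

L = np.linalg.cholesky(A)
Linv = inv(L)
S = M_tilde @ (I - Pi)                                   # symmetric in exact arithmetic
K_tg = np.linalg.eigvalsh(Linv @ ((S + S.T) / 2) @ Linv.T).max()   # (2.19)
norm_A = np.linalg.norm(L.T @ E @ inv(L.T), 2)

# Spectrum of B_TG^{-1} A from (2.16): largest eigenvalue 1, smallest 1/K_TG
B_tg_inv = (inv(M.T) @ (M + M.T - A) @ inv(M)
            + (I - inv(M.T) @ A) @ P @ inv(Ac) @ P.T @ (I - A @ inv(M)))
lam = np.linalg.eigvalsh(L.T @ B_tg_inv @ L)
print(norm_A, 1.0 - 1.0 / K_tg, lam.max(), lam.min() * K_tg)
```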

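The matrix $\Pi_{\widetilde{M}}$ defined by (2.17) is an $\widetilde{M}$-orthogonal projection (i.e., it is orthogonal with respect to the inner product $\langle\widetilde{M}\cdot,\cdot\rangle$) onto the coarse space $\operatorname{range}(P)$. Similarly, we define a useful $A$-orthogonal projection onto $\operatorname{range}(P)$:

$$\Pi_A:=PA_c^{-1}P^TA. \tag{2.20}$$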

For a fixed smoother (e.g., the weighted Jacobi or Gauss–Seidel smoother), an optimal interpolation can be obtained by minimizing $K_{TG}$. Unfortunately, the optimal interpolation is typically expensive to compute, because it requires explicit knowledge of the eigenvectors corresponding to the small eigenvalues of the generalized eigenvalue problem $Av=\lambda\widetilde{M}v$; see [30, 6] for details.
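To illustrate the role of the generalized eigenvalue problem $Av=\lambda\widetilde{M}v$, the sketch below builds a hypothetical "optimal" interpolation from the $\widetilde{M}$-orthonormal eigenvectors associated with the $n_c$ smallest eigenvalues and compares its $K_{TG}$ with that of linear interpolation. With eigenvalues $\lambda_1\le\cdots\le\lambda_n$, expanding $v$ in these eigenvectors shows that this choice gives $K_{TG}=1/\lambda_{n_c+1}$. The test problem and smoother are the same assumptions as before, and `K_TG` is an illustrative helper.

```python
import numpy as np

nc = 7
n = 2 * nc + 1
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # hypothetical test problem
M = np.tril(A)                                           # Gauss-Seidel smoother
I, inv = np.eye(n), np.linalg.inv
M_tilde = M.T @ inv(M + M.T - A) @ M                     # (2.6)
Linv = inv(np.linalg.cholesky(A))

def K_TG(P):
    """K_TG from (2.19) as the largest eigenvalue of the pencil (M_tilde (I - Pi), A)."""
    Pi = P @ inv(P.T @ M_tilde @ P) @ P.T @ M_tilde
    S = M_tilde @ (I - Pi)
    return np.linalg.eigvalsh(Linv @ ((S + S.T) / 2) @ Linv.T).max()

# A v = lambda M_tilde v, reduced to a symmetric problem via M_tilde = C C^T
Cinv = inv(np.linalg.cholesky(M_tilde))
lam, W = np.linalg.eigh(Cinv @ A @ Cinv.T)               # eigenvalues in ascending order
P_opt = (Cinv.T @ W)[:, :nc]                             # eigenvectors of the nc smallest

P_lin = np.zeros((n, nc))
for j in range(nc):
    P_lin[2 * j : 2 * j + 3, j] = [0.5, 1.0, 0.5]        # linear interpolation

print(K_TG(P_opt), 1.0 / lam[nc], K_TG(P_lin))
```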

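To maintain two-grid convergence while designing an interpolation with a simple structure, one can minimize an upper bound of $K_{TG}$. Let $Q:=PR$, where $R\in\mathbb{R}^{n_c\times n}$ and $RP=I_{n_c}$. Obviously, $Q$ is a projection onto $\operatorname{range}(P)$. Then

$$\begin{aligned}
K_{TG}&=\max_{v\in\mathbb{R}^n\setminus\{0\}}\frac{\|(I-\Pi_{\widetilde{M}})v\|_{\widetilde{M}}^2}{\|v\|_A^2}=\max_{v\in\mathbb{R}^n\setminus\{0\}}\min_{v_c\in\mathbb{R}^{n_c}}\frac{\|v-Pv_c\|_{\widetilde{M}}^2}{\|v\|_A^2}\\
&\le\max_{v\in\mathbb{R}^n\setminus\{0\}}\frac{\|(I-Q)v\|_{\widetilde{M}}^2}{\|v\|_A^2}=:K,
\end{aligned}$$

which, together with (2.18), yields

$$\|E_{TG}\|_A\le1-\frac{1}{K}.$$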
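The bound $K_{TG}\le K$ can be illustrated with a concrete $R$. The sketch below assumes injection at the coarse points as $R$, which satisfies $RP=I_{n_c}$ for linear interpolation in this 1D setting; this choice of $R$, the test problem, and the helper `pencil_max` are illustrative assumptions.

```python
import numpy as np

nc = 7
n = 2 * nc + 1
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # hypothetical test problem
P = np.zeros((n, nc))
for j in range(nc):
    P[2 * j : 2 * j + 3, j] = [0.5, 1.0, 0.5]            # linear interpolation
R = np.zeros((nc, n))
R[np.arange(nc), 2 * np.arange(nc) + 1] = 1.0            # injection at coarse points (assumed R)
M = np.tril(A)
I, inv = np.eye(n), np.linalg.inv
M_tilde = M.T @ inv(M + M.T - A) @ M
Linv = inv(np.linalg.cholesky(A))

def pencil_max(S):
    """max over v != 0 of v^T S v / v^T A v, for a symmetric matrix S."""
    return np.linalg.eigvalsh(Linv @ ((S + S.T) / 2) @ Linv.T).max()

Pi = P @ inv(P.T @ M_tilde @ P) @ P.T @ M_tilde          # (2.17)
Q = P @ R                                                # projection onto range(P), generally oblique
K_tg = pencil_max(M_tilde @ (I - Pi))                    # (2.19)
K = pencil_max((I - Q).T @ M_tilde @ (I - Q))            # upper bound K
print(K_tg, K, 1 - 1 / K_tg, 1 - 1 / K)
```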

By minimizing $K$ over all interpolations, one can obtain the so-called ideal interpolation [9, 33], which provides new insights for designing interpolations with sparse or simple structure (see, e.g., [15, 16, 33, 14]). In particular, if is taken to be , then . Hence, the ideal interpolation can be viewed as a “relaxation” of the optimal one. Furthermore, a quantitative relation between and can be found in .

## 3. Convergence analysis

In this section, we establish the convergence theory of inexact two-grid methods. More specifically, two-sided bounds for the energy norm of the iteration matrix are derived under different approximation conditions.

### 3.1. Convergence estimate of the first kind

The first estimate (see (3.3) below) is a general convergence result, which requires no additional conditions on $B_c$ except for its positive definiteness.

We first prove an important lemma, which gives two relations between the extreme eigenvalues of $\widetilde{B}_{TG}^{-1}A$ and $B_{TG}^{-1}A$.

###### Lemma 3.1.

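Define

$$\Delta_1:=\min\bigg\{1,\;\min_{v_c\in\mathcal{V}_c}\frac{v_c^TB_c^{-1}v_c}{v_c^TA_c^{-1}v_c}\bigg\}\quad\text{and}\quad\Delta_2:=\max\bigg\{1,\;\max_{v_c\in\mathcal{V}_c}\frac{v_c^TB_c^{-1}v_c}{v_c^TA_c^{-1}v_c}\bigg\},$$

where

$$\mathcal{V}_c=\operatorname{range}\big(P^T(I-AM^{-1})\big)\setminus\{0\}.$$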

Then

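$$\Delta_1\le\frac{\lambda_{\max}(\widetilde{B}_{TG}^{-1}A)}{\lambda_{\max}(B_{TG}^{-1}A)}\le\Delta_2, \tag{3.1}$$

$$\Delta_1\le\frac{\lambda_{\min}(\widetilde{B}_{TG}^{-1}A)}{\lambda_{\min}(B_{TG}^{-1}A)}\le\Delta_2. \tag{3.2}$$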
###### Proof.

From (2.12) and (2.16), we have

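$$\lambda_{\min}(\widetilde{B}_{TG}^{-1}B_{TG})=\min_{v\in\mathbb{R}^n\setminus\{0\}}\frac{v^T\overline{M}^{-1}v+v^T(I-M^{-T}A)PB_c^{-1}P^T(I-AM^{-1})v}{v^T\overline{M}^{-1}v+v^T(I-M^{-T}A)PA_c^{-1}P^T(I-AM^{-1})v}\ge\Delta_1.$$

Analogously, it holds that

$$\lambda_{\max}(\widetilde{B}_{TG}^{-1}B_{TG})\le\Delta_2.$$

Hence,

$$\Delta_1\le\lambda_{\min}(\widetilde{B}_{TG}^{-1}B_{TG})\le\lambda_{\max}(\widetilde{B}_{TG}^{-1}B_{TG})\le\Delta_2,$$

which yields

$$\begin{aligned}
\lambda_{\max}(\widetilde{B}_{TG}^{-1}A)&\ge\lambda_{\max}(B_{TG}^{-1}A)\,\lambda_{\min}(\widetilde{B}_{TG}^{-1}B_{TG})\ge\Delta_1\lambda_{\max}(B_{TG}^{-1}A),\\
\lambda_{\max}(\widetilde{B}_{TG}^{-1}A)&\le\lambda_{\max}(B_{TG}^{-1}A)\,\lambda_{\max}(\widetilde{B}_{TG}^{-1}B_{TG})\le\Delta_2\lambda_{\max}(B_{TG}^{-1}A).
\end{aligned}$$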

Thus, the inequality (3.1) holds. The inequality (3.2) can be proved similarly. ∎

###### Remark 3.2.

It is easy to see that

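$$\Delta_1\ge\min\{1,\lambda_{\min}(B_c^{-1}A_c)\}\quad\text{and}\quad\Delta_2\le\max\{1,\lambda_{\max}(B_c^{-1}A_c)\}.$$

From (3.1) and (3.2), we deduce that

$$\frac{\lambda_{\min}(\widetilde{B}_{TG}^{-1}A)}{\lambda_{\min}(B_{TG}^{-1}A)}\ge\min\{1,\lambda_{\min}(B_c^{-1}A_c)\},\qquad\frac{\lambda_{\max}(\widetilde{B}_{TG}^{-1}A)}{\lambda_{\max}(B_{TG}^{-1}A)}\le\max\{1,\lambda_{\max}(B_c^{-1}A_c)\},$$

which are the results derived by Notay [17, Theorem 2.2]. It is worth noting that the specific form of $A_c$ is not used in the proof of Lemma 3.1. As in Notay's results, $A_c$ here does not have to be of Galerkin type.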

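The expression (2.14) implies that $A^{\frac12}E_{TG}A^{-\frac12}$ is an SPSD matrix and

$$\lambda_{\min}(A^{\frac12}E_{TG}A^{-\frac12})=0.$$

Since $E_{TG}=I-B_{TG}^{-1}A$, the matrix $B_{TG}-A$ is also SPSD and

$$\lambda_{\max}(B_{TG}^{-1}A)=1.$$

Due to

$$1-\frac{1}{K_{TG}}=\|E_{TG}\|_A=\lambda_{\max}(E_{TG})=1-\lambda_{\min}(B_{TG}^{-1}A),$$

it follows that

$$\lambda_{\min}(B_{TG}^{-1}A)=\frac{1}{K_{TG}}.$$

Hence, the estimates (3.1) and (3.2) become

$$\Delta_1\le\lambda_{\max}(\widetilde{B}_{TG}^{-1}A)\le\Delta_2,\qquad\frac{\Delta_1}{K_{TG}}\le\lambda_{\min}(\widetilde{B}_{TG}^{-1}A)\le\frac{\Delta_2}{K_{TG}},$$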

which, together with (2.13), yield the following convergence estimate.

###### Theorem 3.3.

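The convergence factor of Algorithm 1 satisfies

$$1-\frac{\Delta_2}{K_{TG}}\le\|\widetilde{E}_{TG}\|_A\le\max\bigg\{1-\frac{\Delta_1}{K_{TG}},\;\Delta_2-1\bigg\}. \tag{3.3}$$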

The only assumption on the coarse-grid matrix $B_c$ is its positive definiteness. Hence, the estimate (3.3) is valid for any SPD matrix $B_c$. Nevertheless, to design a two-grid method that converges well, we are more interested in the situation where $B_c$ is a suitable approximation to $A_c$. In what follows, we focus on the convergence analysis of Algorithm 1 under general approximation conditions, which arise from measuring the difference between $B_c$ and $A_c$.
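The two-sided estimate of Theorem 3.3 can be spot-checked for several SPD coarse matrices. For simplicity, the sketch below replaces $\Delta_1$ and $\Delta_2$ by the cruder bounds of Remark 3.2, $\min\{1,\lambda_{\min}(B_c^{-1}A_c)\}$ and $\max\{1,\lambda_{\max}(B_c^{-1}A_c)\}$, and checks $1-\Delta_2/K_{TG}\le\|\widetilde{E}_{TG}\|_A\le\max\{1-\Delta_1/K_{TG},\Delta_2-1\}$, which stays valid after this replacement. The candidate matrices $B_c$ (scaled Galerkin matrices, a scaled diagonal of $A_c$, and a random SPD perturbation) and the 1D Poisson setting are hypothetical choices.

```python
import numpy as np

nc = 7
n = 2 * nc + 1
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # hypothetical test problem
P = np.zeros((n, nc))
for j in range(nc):
    P[2 * j : 2 * j + 3, j] = [0.5, 1.0, 0.5]
M = np.tril(A)
I, inv = np.eye(n), np.linalg.inv
L = np.linalg.cholesky(A)
Ac = P.T @ A @ P

def conv_factor(Bc):
    """||E_TG||_A, with E_TG the error propagation matrix (2.10)."""
    E = (I - inv(M.T) @ A) @ (I - P @ inv(Bc) @ P.T @ A) @ (I - inv(M) @ A)
    return np.linalg.norm(L.T @ E @ inv(L.T), 2)

K_tg = 1.0 / (1.0 - conv_factor(Ac))        # from the identity (2.18)

rng = np.random.default_rng(0)
G = rng.standard_normal((nc, nc))
candidates = [0.7 * Ac, 1.5 * Ac, 2.0 * np.diag(np.diag(Ac)), Ac + 0.1 * G @ G.T]
checks = []
for Bc in candidates:
    mu = np.linalg.eigvals(inv(Bc) @ Ac).real          # spectrum of Bc^{-1} Ac
    d1, d2 = min(1.0, mu.min()), max(1.0, mu.max())    # Remark 3.2 bounds on Delta_1, Delta_2
    lower, upper = 1.0 - d2 / K_tg, max(1.0 - d1 / K_tg, d2 - 1.0)
    val = conv_factor(Bc)
    checks.append(lower - 1e-10 <= val <= upper + 1e-10)
    print(f"{lower:.4f} <= {val:.4f} <= {upper:.4f}")
```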

### 3.2. Convergence estimate of the second kind

In light of (2.12), we can derive the following explicit expression for $\widetilde{B}_{TG}$.

###### Lemma 3.4.

The inexact two-grid preconditioner $\widetilde{B}_{TG}$ can be expressed as

$$\widetilde{B}_{TG}=A+(I-AM^{-T})\widetilde{M}\big[I-P(P^T\widetilde{M}P+B_c-A_c)^{-1}P^T\widetilde{M}\big](I-M^{-1}A). \tag{3.4}$$
###### Proof.

Using (2.12) and the Sherman–Morrison–Woodbury formula [20, 25, 31], we obtain

$$\widetilde{B}_{TG}=\overline{M}-\overline{M}(I-M^{-T}A)P\big[B_c+P^T(I-AM^{-1})\overline{M}(I-M^{-T}A)P\big]^{-1}P^T(I-AM^{-1})\overline{M}.$$

By (2.8) and (2.9), we have

$$\widetilde{B}_{TG}=\overline{M}-(I-AM^{-T})\widetilde{M}P(P^T\widetilde{M}P+B_c-A_c)^{-1}P^T\widetilde{M}(I-M^{-1}A). \tag{3.5}$$

The relation (2.9) implies that

$$\overline{M}=A+(I-AM^{-T})\widetilde{M}(I-M^{-1}A). \tag{3.6}$$

Combining (3.5) and (3.6), we can arrive at the expression (3.4) immediately. ∎

###### Remark 3.5.

In particular, if $B_c=A_c$, we get from (3.4) that

$$B_{TG}=A+(I-AM^{-T})\widetilde{M}(I-\Pi_{\widetilde{M}})(I-M^{-1}A), \tag{3.7}$$

from which one can easily see that $B_{TG}-A$ is SPSD.
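The explicit expression (3.4) can be compared with a direct inversion of (2.12). The sketch below does this for the assumed inexact choice $B_c=1.3A_c$ and also checks that, for $B_c=A_c$, the matrix $B_{TG}-A$ from (3.7) is SPSD; the test problem and the helper names `B_tg_direct` and `B_tg_explicit` are hypothetical.

```python
import numpy as np

nc = 7
n = 2 * nc + 1
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # hypothetical test problem
P = np.zeros((n, nc))
for j in range(nc):
    P[2 * j : 2 * j + 3, j] = [0.5, 1.0, 0.5]
M = np.tril(A)
I, inv = np.eye(n), np.linalg.inv
Ac = P.T @ A @ P
M_tilde = M.T @ inv(M + M.T - A) @ M                     # (2.6)
M_bar_inv = inv(M.T) @ (M + M.T - A) @ inv(M)            # inverse of (2.4)

def B_tg_direct(Bc):
    """Invert the two-grid preconditioner (2.12) directly."""
    return inv(M_bar_inv + (I - inv(M.T) @ A) @ P @ inv(Bc) @ P.T @ (I - A @ inv(M)))

def B_tg_explicit(Bc):
    """Explicit expression (3.4)."""
    inner = I - P @ inv(P.T @ M_tilde @ P + Bc - Ac) @ P.T @ M_tilde
    return A + (I - A @ inv(M.T)) @ M_tilde @ inner @ (I - inv(M) @ A)

Bc = 1.3 * Ac                                            # assumed inexact coarse matrix
err = np.abs(B_tg_direct(Bc) - B_tg_explicit(Bc)).max()

# Exact case (3.7): B_TG - A is SPSD
S = B_tg_explicit(Ac) - A
min_eig = np.linalg.eigvalsh((S + S.T) / 2).min()
print(err, min_eig)
```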

The following lemma provides some useful eigenvalue identities, which play an important role in the subsequent convergence analysis.

###### Lemma 3.6.

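The extreme eigenvalues of $(A^{-1}\widetilde{M}-I)(I-\Pi_{\widetilde{M}})$ and $(A^{-1}\widetilde{M}-I)\Pi_{\widetilde{M}}$ have the following properties:

$$\lambda_{\min}\big((A^{-1}\widetilde{M}-I)(I-\Pi_{\widetilde{M}})\big)=0, \tag{3.8a}$$

$$\lambda_{\max}\big((A^{-1}\widetilde{M}-I)(I-\Pi_{\widetilde{M}})\big)=K_{TG}-1, \tag{3.8b}$$

$$\lambda_{\min}\big((A^{-1}\widetilde{M}-I)\Pi_{\widetilde{M}}\big)=0, \tag{3.8c}$$

$$\lambda_{\max}\big((A^{-1}\widetilde{M}-I)\Pi_{\widetilde{M}}\big)=\lambda_{\max}(A^{-1}\widetilde{M}\Pi_{\widetilde{M}})-1. \tag{3.8d}$$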
###### Proof.

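Since $A^{-1}-\widetilde{M}^{-1}$ is an SPSD matrix and $\Pi_{\widetilde{M}}$ is an $\widetilde{M}$-orthogonal projection, we have

$$\lambda\big((A^{-1}\widetilde{M}-I)(I-\Pi_{\widetilde{M}})\big)=\lambda\big((A^{-1}-\widetilde{M}^{-1})^{\frac12}\widetilde{M}(I-\Pi_{\widetilde{M}})(A^{-1}-\widetilde{M}^{-1})^{\frac12}\big),$$

which yields

$$\lambda\big((A^{-1}\widetilde{M}-I)(I-\Pi_{\widetilde{M}})\big)\subset[0,+\infty).$$

Similarly, we have

$$\lambda\big((A^{-1}\widetilde{M}-I)\Pi_{\widetilde{M}}\big)\subset[0,+\infty)\quad\text{and}\quad\lambda(A^{-1}\widetilde{M}\Pi_{\widetilde{M}})\subset[0,+\infty).$$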

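Due to

$$\Pi_{\widetilde{M}}^2=\Pi_{\widetilde{M}}\quad\text{and}\quad\operatorname{rank}(\Pi_{\widetilde{M}})=\operatorname{rank}(P)=n_c,$$

there exists a nonsingular matrix $X\in\mathbb{R}^{n\times n}$ such that

$$X^{-1}\Pi_{\widetilde{M}}X=\begin{pmatrix}I_{n_c}&0\\0&0\end{pmatrix}.$$

Let $X^{-1}A^{-1}\widetilde{M}X$ be partitioned into the block form

$$X^{-1}A^{-1}\widetilde{M}X=\begin{pmatrix}\hat{X}_{11}&\hat{X}_{12}\\\hat{X}_{21}&\hat{X}_{22}\end{pmatrix},$$

where $\hat{X}_{11}\in\mathbb{R}^{n_c\times n_c}$, $\hat{X}_{12}\in\mathbb{R}^{n_c\times(n-n_c)}$, $\hat{X}_{21}\in\mathbb{R}^{(n-n_c)\times n_c}$, and $\hat{X}_{22}\in\mathbb{R}^{(n-n_c)\times(n-n_c)}$. Straightforward computations yield

$$X^{-1}A^{-1}\widetilde{M}\Pi_{\widetilde{M}}X=\begin{pmatrix}\hat{X}_{11}&0\\\hat{X}_{21}&0\end{pmatrix}.$$

Consequently, $\lambda(A^{-1}\widetilde{M}\Pi_{\widetilde{M}})=\lambda(\hat{X}_{11})\cup\{0\}$ and $\lambda\big((A^{-1}\widetilde{M}-I)\Pi_{\widetilde{M}}\big)=\lambda(\hat{X}_{11}-I_{n_c})\cup\{0\}$, while $X^{-1}(A^{-1}\widetilde{M}-I)(I-\Pi_{\widetilde{M}})X$ has a zero first block column, so $0$ is one of its eigenvalues. Combining these facts with the nonnegativity of the spectra established above, the identities (3.8a), (3.8c), and (3.8d) hold.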

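In addition, using (2.7), (3.7), and the relation $\lambda_{\min}(B_{TG}^{-1}A)=1/K_{TG}$, we obtain

$$\begin{aligned}
K_{TG}&=1+\lambda_{\max}\big(A^{-1}(I-AM^{-T})\widetilde{M}(I-\Pi_{\widetilde{M}})(I-M^{-1}A)\big)\\
&=1+\lambda_{\max}\big((I-M^{-1}A)(I-M^{-T}A)A^{-1}\widetilde{M}(I-\Pi_{\widetilde{M}})\big)\\
&=1+\lambda_{\max}\big((A^{-1}\widetilde{M}-I)(I-\Pi_{\widetilde{M}})\big),
\end{aligned}$$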

which yields the identity (3.8b). ∎
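The identities (3.8a)–(3.8d) can be checked numerically by following the similarity used in the proof: since $(A^{-1}\widetilde{M}-I)Y=(A^{-1}-\widetilde{M}^{-1})\widetilde{M}Y$, its eigenvalues are those of the symmetric matrix $T^{1/2}(\widetilde{M}Y)T^{1/2}$ with $T=A^{-1}-\widetilde{M}^{-1}$ whenever $\widetilde{M}Y$ is symmetric. The sketch below assumes the same hypothetical 1D Poisson setting; the helpers `sym` and `spectrum` are illustrative.

```python
import numpy as np

nc = 7
n = 2 * nc + 1
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # hypothetical test problem
P = np.zeros((n, nc))
for j in range(nc):
    P[2 * j : 2 * j + 3, j] = [0.5, 1.0, 0.5]
M = np.tril(A)
I, inv = np.eye(n), np.linalg.inv
M_tilde = M.T @ inv(M + M.T - A) @ M
Pi = P @ inv(P.T @ M_tilde @ P) @ P.T @ M_tilde
Linv = inv(np.linalg.cholesky(A))

def sym(S):
    return (S + S.T) / 2

# Square root of the SPSD matrix T = A^{-1} - M_tilde^{-1} (round-off negatives clipped)
t, U = np.linalg.eigh(sym(inv(A) - inv(M_tilde)))
T_half = U @ np.diag(np.sqrt(np.clip(t, 0.0, None))) @ U.T

def spectrum(S):
    """Eigenvalues of (A^{-1} M_tilde - I) Y, where S = M_tilde Y is symmetric."""
    return np.linalg.eigvalsh(T_half @ sym(S) @ T_half)

s_a = spectrum(M_tilde @ (I - Pi))                        # for (3.8a), (3.8b)
s_c = spectrum(M_tilde @ Pi)                              # for (3.8c), (3.8d)
K_tg = np.linalg.eigvalsh(Linv @ sym(M_tilde @ (I - Pi)) @ Linv.T).max()   # (2.19)
lam_MPi = np.linalg.eigvalsh(Linv @ sym(M_tilde @ Pi) @ Linv.T).max()      # lambda_max(A^{-1} M_tilde Pi)
print(s_a.min(), s_a.max() - (K_tg - 1))
print(s_c.min(), s_c.max() - (lam_MPi - 1))
```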

We are now in a position to present the convergence estimate of the second kind, which is based on measuring the difference $B_c-A_c$ by reference to $P^T\widetilde{M}P$.

###### Theorem 3.7.

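Let $0\le\alpha<1/\lambda_{\max}(A_c^{-1}P^T\widetilde{M}P)$ and $\beta\ge0$. If the coarse-grid matrix $B_c$ satisfies

$$-\alpha\,v_c^TP^T\widetilde{M}Pv_c\le v_c^T(B_c-A_c)v_c\le\beta\,v_c^TP^T\widetilde{M}Pv_c\quad\forall\,v_c\in\mathbb{R}^{n_c}, \tag{3.9}$$

then

$$L_1\le\|\widetilde{E}_{TG}\|_A\le\min\{U_{1,1},U_{1,2}\}, \tag{3.10}$$

where

$$\begin{aligned}
L_1&=1-\min\bigg\{1,\;\frac{1}{K_{TG}\big(1-\alpha\lambda_{\max}(A_c^{-1}P^T\widetilde{M}P)\big)},\;\frac{1-\alpha}{K_{TG}-\alpha\lambda_{\max}(A^{-1}\widetilde{M})}\bigg\},\\
U_{1,1}&=\max\bigg\{1-\frac{1}{K_{TG}\big(1+\beta\lambda_{\max}(A_c^{-1}P^T\widetilde{M}P)\big)},\;\frac{1}{1-\alpha\lambda_{\max}(A_c^{-1}P^T\widetilde{M}P)}-1\bigg\},\\
U_{1,2}&=\max\bigg\{1-\frac{1+\beta}{K_{TG}+\beta\lambda_{\max}(A^{-1}\widetilde{M})},\;\frac{1-\alpha}{1-\alpha\lambda_{\max}(A^{-1}\widetilde{M}\Pi_{\widetilde{M}})}-1\bigg\}.
\end{aligned}$$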
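The bounds in Theorem 3.7 can be spot-checked with coarse matrices of the hypothetical form $B_c=A_c+s\,P^T\widetilde{M}P$, which satisfy (3.9) with $\alpha=\max\{-s,0\}$ and $\beta=\max\{s,0\}$. In this sketch (same assumed 1D Poisson setting), the negative shift is additionally kept small enough that every denominator in $L_1$, $U_{1,1}$, and $U_{1,2}$ stays positive; the helper names are illustrative.

```python
import numpy as np

nc = 7
n = 2 * nc + 1
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # hypothetical test problem
P = np.zeros((n, nc))
for j in range(nc):
    P[2 * j : 2 * j + 3, j] = [0.5, 1.0, 0.5]            # linear interpolation
M = np.tril(A)                                           # Gauss-Seidel smoother
I, inv = np.eye(n), np.linalg.inv
L = np.linalg.cholesky(A)
Linv = inv(L)

Ac = P.T @ A @ P
M_tilde = M.T @ inv(M + M.T - A) @ M
Pc = P.T @ M_tilde @ P                                   # restricted smoother P^T M_tilde P
Pi = P @ inv(Pc) @ P.T @ M_tilde

def pencil_max(S):
    """Largest eigenvalue of A^{-1} S for a symmetric matrix S."""
    return np.linalg.eigvalsh(Linv @ ((S + S.T) / 2) @ Linv.T).max()

K = pencil_max(M_tilde @ (I - Pi))                       # K_TG from (2.19)
lam_c = np.linalg.eigvals(inv(Ac) @ Pc).real.max()      # lambda_max(A_c^{-1} P^T M_tilde P)
lam_M = pencil_max(M_tilde)                              # lambda_max(A^{-1} M_tilde)
lam_Pi = pencil_max(M_tilde @ Pi)                        # lambda_max(A^{-1} M_tilde Pi)

def bounds(alpha, beta):
    """Lower bound L_1 and upper bound min{U_11, U_12} from Theorem 3.7."""
    L1 = 1 - min(1, 1 / (K * (1 - alpha * lam_c)), (1 - alpha) / (K - alpha * lam_M))
    U11 = max(1 - 1 / (K * (1 + beta * lam_c)), 1 / (1 - alpha * lam_c) - 1)
    U12 = max(1 - (1 + beta) / (K + beta * lam_M), (1 - alpha) / (1 - alpha * lam_Pi) - 1)
    return L1, min(U11, U12)

def conv_factor(Bc):
    """||E_TG||_A for the error propagation matrix (2.10)."""
    E = (I - inv(M.T) @ A) @ (I - P @ inv(Bc) @ P.T @ A) @ (I - inv(M) @ A)
    return np.linalg.norm(L.T @ E @ inv(L.T), 2)

# B_c = A_c + s * P^T M_tilde P satisfies (3.9) with alpha = max(-s, 0), beta = max(s, 0).
s_neg = -0.5 * min(1 / lam_c, K / lam_M, 1 / lam_Pi)
results = []
for s in [s_neg, 0.0, 0.5, 2.0]:
    lo, up = bounds(max(-s, 0.0), max(s, 0.0))
    val = conv_factor(Ac + s * Pc)
    results.append((lo, val, up))
    print(f"s = {s:+.4f}: {lo:.4f} <= {val:.4f} <= {up:.4f}")
```

For $s=0$ (that is, $B_c=A_c$), both bounds collapse to $1-1/K_{TG}$, in agreement with the identity (2.18).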
###### Proof.

The proof is divided into two parts: the first part follows directly from (3.3); the second one is based on (2.13), (3.4), and Lemma 3.6.

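Part I: From (3.9), we deduce that $B_c-A_c+\alpha P^T\widetilde{M}P$ and $A_c+\beta P^T\widetilde{M}P-B_c$ are SPSD matrices. Hence,

$$\begin{aligned}
\Delta_1&\ge\min\bigg\{1,\frac{1}{\lambda_{\max}(A_c^{-1}B_c)}\bigg\}\ge\frac{1}{1+\beta\lambda_{\max}(A_c^{-1}P^T\widetilde{M}P)},\\
\Delta_2&\le\max\bigg\{1,\frac{1}{\lambda_{\min}(A_c^{-1}B_c)}\bigg\}\le\frac{1}{1-\alpha\lambda_{\max}(A_c^{-1}P^T\widetilde{M}P)}.
\end{aligned}$$

An application of (3.3) yields

$$1-\frac{1}{K_{TG}\big(1-\alpha\lambda_{\max}(A_c^{-1}P^T\widetilde{M}P)\big)}\le\|\widetilde{E}_{TG}\|_A\le U_{1,1}. \tag{3.11}$$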

Part II: The relation (2.13) can be rewritten as

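$$\|\widetilde{E}_{TG}\|_A=\max\bigg\{\frac{1}{\lambda_{\min}(A^{-1}\widetilde{B}_{TG})}-1,\;1-\frac{1}{\lambda_{\max}(A^{-1}\widetilde{B}_{TG})}\bigg\}. \tag{3.12}$$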

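In order to establish two-sided bounds for $\|\widetilde{E}_{TG}\|_A$, we need to estimate the extreme eigenvalues $\lambda_{\min}(A^{-1}\widetilde{B}_{TG})$ and $\lambda_{\max}(A^{-1}\widetilde{B}_{TG})$. By (3.4), we have

$$A^{-1}\widetilde{B}_{TG}=I+(I-M^{-T}A)A^{-1}\widetilde{M}\big[I-P(P^T\widetilde{M}P+B_c-A_c)^{-1}P^T\widetilde{M}\big](I-M^{-1}A),$$

so that, for $i=1,\ldots,n$,

$$\lambda_i(A^{-1}\widetilde{B}_{TG})=1+\lambda_i\Big((A^{-1}\widetilde{M}-I)\big[I-P(P^T\widetilde{M}P+B_c-A_c)^{-1}P^T\widetilde{M}\big]\Big), \tag{3.13}$$

where we have used the relation (2.7) and the fact that $XY$ and $YX$ have the same eigenvalues for square matrices $X$ and $Y$.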

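(i) The positive semidefiniteness of $B_c-A_c+\alpha P^T\widetilde{M}P$ implies that

$$\frac{1}{1-\alpha}(P^T\widetilde{M}P)^{-1}-(P^T\widetilde{M}P+B_c-A_c)^{-1}$$

is SPSD. Since $A^{-1}-\widetilde{M}^{-1}$ is also SPSD, the matrix

$$(A^{-1}-\widetilde{M}^{-1})^{\frac12}\widetilde{M}\Big[\big(I-P(P^T\widetilde{M}P+B_c-A_c)^{-1}P^T\widetilde{M}\big)-\Big(I-\frac{1}{1-\alpha}\Pi_{\widetilde{M}}\Big)\Big](A^{-1}-\widetilde{M}^{-1})^{\frac12}$$

is SPSD. This leads to, for any $i=1,\ldots,n$,

$$\lambda_i\Big((A^{-1}\widetilde{M}-I)\big[I-P(P^T\widetilde{M}P+B_c-A_c)^{-1}P^T\widetilde{M}\big]\Big)\ge\frac{\lambda_i\big((A^{-1}\widetilde{M}-I)[(1-\alpha)I-\Pi_{\widetilde{M}}]\big)}{1-\alpha}. \tag{3.14}$$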

In particular, we have

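$$\lambda_{\min}\Big((A^{-1}\widetilde{M}-I)\big[I-P(P^T\widetilde{M}P+B_c-A_c)^{-1}P^T\widetilde{M}\big]\Big)\ge\frac{\lambda_{\min}\big((A^{-1}\widetilde{M}-I)[(1-\alpha)I-\Pi_{\widetilde{M}}]\big)}{1-\alpha}. \tag{3.15}$$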

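Using Weyl's theorem (see, e.g., [12, Theorem 4.3.1]), (3.8a), and (3.8d), we obtain

$$\begin{aligned}
\lambda_{\min}\big((A^{-1}\widetilde{M}-I)[(1-\alpha)I-\Pi_{\widetilde{M}}]\big)
&=\lambda_{\min}\big((A^{-1}-\widetilde{M}^{-1})^{\frac12}\widetilde{M}[(1-\alpha)I-\Pi_{\widetilde{M}}](A^{-1}-\widetilde{M}^{-1})^{\frac12}\big)\\
&\ge(1-\alpha)\lambda_{\min}\big((A^{-1}\widetilde{M}-I)(I-\Pi_{\widetilde{M}})\big)-\alpha\lambda_{\max}\big((A^{-1}\widetilde{M}-I)\Pi_{\widetilde{M}}\big)\\
&=\alpha-\alpha\lambda_{\max}(A^{-1}\widetilde{M}\Pi_{\widetilde{M}}).
\end{aligned}\tag{3.16}$$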

In view of (3.13), (3.15), and (3.16), it holds that

 (3.17) λmin(A−1˜BTG) ≥1