Momentum Accelerated Multigrid Methods

06/30/2020 · Chunyan Niu, et al. · Tufts University, NetEase, Inc.

In this paper, we propose two momentum accelerated multigrid (MG) cycles. The main idea is to rewrite the linear systems as optimization problems and to apply momentum acceleration techniques, e.g., the heavy ball and Nesterov acceleration methods, to define the coarse-level solvers for MG methods. The resulting MG cycles, which we call the H-cycle (based on the heavy ball method) and the N-cycle (based on Nesterov acceleration), share the advantages of both the algebraic multilevel iteration (AMLI)-cycle and the K-cycle (a nonlinear version of the AMLI-cycle). Namely, similar to the K-cycle, our H- and N-cycle do not require the estimation of extreme eigenvalues, while their computational cost is the same as that of the AMLI-cycle. Theoretical analysis shows that the momentum accelerated cycles are essentially special AMLI-cycle methods and, thus, are uniformly convergent under standard assumptions. Finally, we present numerical experiments to verify the theoretical results and demonstrate the efficiency of the H- and N-cycle.


1 Introduction

Research on multigrid (MG) methods [1, 10, 11] has been very active in recent years. MG methods are efficient, scalable, and often computationally optimal for solving sparse linear systems of equations arising from discretizations of partial differential equations (PDEs). Therefore, they have been widely used in practical applications [15, 36, 6, 9, 33, 37, 35], especially the algebraic multigrid (AMG) methods [7, 29, 30, 8, 31, 38, 21]. However, the performance and efficiency of MG methods with the standard V- or W-cycle may degenerate when the physical and geometric properties of the PDEs become more and more complicated.

For symmetric positive definite (SPD) problems, more involved cycles have been proposed. Axelsson and Vassilevski introduced the algebraic multilevel iteration (AMLI)-cycle MG method [2, 3, 34], which uses Chebyshev polynomials to define the coarse-level solver. However, the AMLI-cycle MG method requires an accurate estimation of extreme eigenvalues on coarse levels to compute the coefficients of the Chebyshev polynomial, which may be difficult in practice. The K-cycle MG method [5, 18], which is a nonlinear version of the AMLI-cycle and does not need to estimate the extreme eigenvalues, was developed thanks to the introduction of nonlinear preconditioning methods [4, 14, 32]. In the K-cycle MG method, the coarse-level solver is defined by applying a few steps of the nonlinear preconditioned conjugate gradient (NPCG) method with the MG method on the coarser level as a preconditioner. Under the assumption that the convergence factor of the V-cycle MG with a bounded level difference is bounded, the uniform convergence of the K-cycle MG is shown in [18], provided the number of inner NPCG steps is chosen sufficiently large. In [16], a comparative analysis showed that the K-cycle method is always better (or no worse) than the corresponding $\nu$-fold V-cycle method. Although the K-cycle method does not need to estimate extreme eigenvalues, its nonlinear nature requires the use of the NPCG method, which increases the computational and memory cost due to the loss of the three-term recurrence relationship of the standard conjugate gradient (CG) method.

In this work, we propose momentum accelerated MG cycles that have the potential to overcome the drawbacks of the AMLI- and K-cycle MG methods. The idea is to rewrite the linear systems on coarse levels as optimization problems and to apply momentum acceleration techniques from optimization. Two types of acceleration are considered: one is the heavy ball (HB) method [28] and the other is Nesterov acceleration (NA) [23, 25, 26]. We use these momentum accelerations to define the coarse-level solvers, and the resulting MG cycles are referred to as the H-cycle (using the HB method) and the N-cycle (using the NA method), respectively. We show that the HB and NA methods, when applied to quadratic optimization problems, can be related to special polynomial approximations. For example, the polynomial associated with the HB method coincides with the polynomial of best approximation to $1/x$ proposed in [19, 20]. Thus, the H- and N-cycle are essentially special cases of the AMLI-cycle. Following the standard analysis of the AMLI-cycle, we show that both cycles converge uniformly, assuming the extreme eigenvalues are available. The theoretical results are verified numerically when accurate estimations of the extreme eigenvalues are provided.

From our preliminary numerical tests, the H- and N-cycle methods are efficient in practice even when the extreme eigenvalues (or accurate estimations of them) are not available. With a simple default choice of the parameters, the N-cycle MG method shows superior performance in practice and, surprisingly, is even better than the two-grid method in some cases. The N-cycle shares the advantages of both the AMLI- and K-cycle. Namely, similar to the K-cycle, the N-cycle does not require the estimation of the extreme eigenvalues, while its computational cost is the same as that of the AMLI-cycle since it is still a linear method. Additionally, since the N-cycle is derived from an optimization point of view, it has the potential to be generalized to problems beyond the SPD case.

The rest of this paper is organized as follows. In Section 2, we introduce the V-cycle MG algorithm and the HB and NA methods. In Section 3, we present the HB and NA methods for preconditioned linear systems and their relationships with polynomial approximations to $1/x$. Then, the momentum accelerated MG methods and their uniform convergence results are discussed in Section 4. In Section 5, we present numerical experiments that illustrate the efficiency of the momentum accelerated MG methods, especially the N-cycle. Finally, some conclusions are drawn in Section 6.

2 Preliminaries

Since our proposed momentum accelerated multigrid methods combine multigrid cycles with momentum acceleration, in this section we first recall the basic multigrid method for solving linear systems, including the V-cycle and AMLI-cycle methods. Then we review two classical momentum accelerated gradient descent methods, i.e., the HB and NA methods, for solving general unconstrained convex optimization problems.

2.1 Multigrid

We consider solving the following linear system

$A x = b, \qquad (1)$

where $A$ is SPD. Assume that we have constructed a hierarchy of matrices $A_l$ on levels $l = 1, \dots, J$, with $A$ on the finest level, together with prolongation operators $P_l$ and restriction operators $R_l$ between consecutive levels. Here, we assume that $R_l = P_l^{T}$ and that each coarse-level matrix is the Galerkin product of the next finer one. Furthermore, let $M_l$ denote the smoother on level $l$, such as the Jacobi or Gauss–Seidel method. Now we define the $\nu$-fold V-cycle preconditioner $B_l$ on level $l$ (more precisely, the action of $B_l$) recursively, as shown in Algorithm 1. Note that, when $\nu = 1$ or $\nu = 2$, the $\nu$-fold V-cycle MG becomes the classical V-cycle and W-cycle MG, respectively.

1:  if $l$ is the coarsest level then
2:     $x \leftarrow A_l^{-1} b$ (exact solve on the coarsest level)
3:  else
4:     Presmoothing: $x \leftarrow M_l b$
5:     Restriction: $r_c \leftarrow P_l^{T}(b - A_l x)$
6:     Coarse-grid correction: set $e \leftarrow 0$ and repeat $\nu$ times: $e \leftarrow e + B_{l-1}(r_c - A_{l-1} e)$
7:     Prolongation: $x \leftarrow x + P_l e$
8:     Postsmoothing: $x \leftarrow x + M_l^{T}(b - A_l x)$
9:  end if
10:  Output: $B_l b := x$
Algorithm 1 $\nu$-fold V-cycle MG: $x = B_l b$
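To make the recursion concrete, the following is a minimal Python sketch of the $\nu$-fold V-cycle preconditioner action. It assumes a prebuilt hierarchy of dense matrices A[l] (with level 0 the coarsest) and prolongations P[l], and uses one forward/backward Gauss–Seidel sweep for pre-/post-smoothing; all names and the storage layout are illustrative choices, not taken from the paper. The AMLI-, H-, and N-cycles discussed below only change the inner coarse-grid-correction loop.

```python
import numpy as np

def v_cycle(level, A, P, b, nu=1, n_smooth=1):
    """Action x = B_level(b) of the nu-fold V-cycle preconditioner.

    A[l] : dense system matrix on level l (level 0 is the coarsest),
    P[l] : prolongation from level l-1 to level l (restriction is P[l].T),
    nu   : number of recursive coarse-grid corrections (1 = V-cycle, 2 = W-cycle).
    """
    if level == 0:
        return np.linalg.solve(A[0], b)                 # exact coarsest-level solve
    x = np.zeros_like(b)
    for _ in range(n_smooth):                           # pre-smoothing: forward Gauss-Seidel
        x += np.linalg.solve(np.tril(A[level]), b - A[level] @ x)
    r_c = P[level].T @ (b - A[level] @ x)               # restrict the residual
    e = np.zeros(A[level - 1].shape[0])
    for _ in range(nu):                                 # nu recursive coarse-grid corrections
        e += v_cycle(level - 1, A, P, r_c - A[level - 1] @ e, nu, n_smooth)
    x += P[level] @ e                                   # prolongate and correct
    for _ in range(n_smooth):                           # post-smoothing: backward Gauss-Seidel
        x += np.linalg.solve(np.triu(A[level]), b - A[level] @ x)
    return x
```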

Next, we recall the AMLI-cycle. Several polynomials have been proposed to define the coarse-grid correction, which lead to different AMLI-cycle methods. Here we consider the classical choice, the Chebyshev polynomial. Therefore, we first recall the Chebyshev polynomials $T_k(t)$, $k \ge 0$, defined by the recurrence

$T_0(t) = 1, \quad T_1(t) = t, \quad T_{k+1}(t) = 2t\,T_k(t) - T_{k-1}(t), \quad k \ge 1. \qquad (2)$

Denoting the condition number of a matrix by $\kappa(\cdot)$, Algorithm 2 gives the classical Chebyshev semi-iterative method [13].

1:   is the given initial guess and 
2:  for  do
3:     , where  and B is the preconditioner
4:  end for
Algorithm 2 Chebyshev semi-iterative method for preconditioned linear system: 
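As a concrete reference, here is a sketch of one standard three-term form of the preconditioned Chebyshev semi-iteration, with the preconditioner supplied as a callable and lam_min, lam_max assumed bounds on the spectrum of $BA$; the exact variant used in [13] and Algorithm 2 may differ in details.

```python
import numpy as np

def chebyshev(apply_B, A, b, x0, n, lam_min, lam_max):
    """n steps of the Chebyshev semi-iterative method for B A x = B b.

    apply_B : callable returning the preconditioner action B(v),
    lam_min, lam_max : assumed bounds on the spectrum of B A (lam_min < lam_max).
    """
    theta = 0.5 * (lam_max + lam_min)        # center of the spectral interval
    delta = 0.5 * (lam_max - lam_min)        # half-width of the interval
    sigma = theta / delta
    rho = 1.0 / sigma
    x = x0.copy()
    r = b - A @ x
    d = apply_B(r) / theta                   # first (scaled Richardson) direction
    for _ in range(n):
        x = x + d
        r = b - A @ x
        rho_new = 1.0 / (2.0 * sigma - rho)
        d = rho_new * rho * d + (2.0 * rho_new / delta) * apply_B(r)
        rho = rho_new
    return x
```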

In the AMLI-cycle, instead of simply repeating the coarse-grid correction $\nu$ times as in Algorithm 1, Algorithm 2 is used to define the coarse-level solver. Algorithm 3 summarizes the AMLI-cycle method.

1:  if  then
2:     
3:  else
4:     Presmoothing:
5:     Coarse-grid correction: , where  is implemented as in Algorithm 2 with  as the preconditioner for steps
6:     Postsmoothing:
7:  end if
8:  
Algorithm 3 AMLI-cycle MG: 

To implement Algorithm 3 in practice, since Algorithm 2 uses the condition number $\kappa$ as a parameter, we need to compute it on each level. This means that we need to estimate the smallest eigenvalue $\lambda_{\min}$ and the largest eigenvalue $\lambda_{\max}$ of the preconditioned coarse-level operator, since $\kappa = \lambda_{\max}/\lambda_{\min}$. The overall performance of the AMLI-cycle Algorithm 3 depends on the quality of these estimates. In the SPD case, a good estimation of the largest eigenvalue is readily available; however, a good estimation of the smallest eigenvalue $\lambda_{\min}$ is not a straightforward task. This fact motivates the development of the nonlinear AMLI-cycle, i.e., the K-cycle MG method. However, the nonlinear feature makes the K-cycle less efficient than the AMLI-cycle in practice in terms of computation and storage. Therefore, in this work, we aim to develop linear MG cycles that can take advantage of both the AMLI- and K-cycles.

2.2 Momentum Acceleration Methods

In this section, we introduce the momentum acceleration techniques that are essential for our proposed momentum accelerated MG cycles. We consider the heavy ball and the Nesterov acceleration methods. Both are first-order momentum acceleration methods for solving the following unconstrained optimization problem,

$\min_{x \in \mathbb{R}^n} f(x), \qquad (3)$

where $f$ is a continuously differentiable, strongly convex function satisfying

$\frac{\mu}{2}\|y - x\|^2 \le f(y) - f(x) - \langle \nabla f(x),\, y - x \rangle \le \frac{L}{2}\|y - x\|^2 \quad \text{for all } x, y \in \mathbb{R}^n, \qquad (4)$

with $L$ being the Lipschitz constant of $\nabla f$ and $\mu$ being the (strong) convexity constant. Here, $\langle \cdot, \cdot \rangle$ denotes a generic inner product on $\mathbb{R}^n$ and $\|\cdot\|$ denotes the corresponding induced norm.

The optimization problem Eq. 3 is usually solved by the gradient descent (GD) method. Under the above assumptions, GD converges linearly with a rate that depends on the condition number $\kappa = L/\mu$ [26]. Many algorithms have been developed to accelerate the convergence, and the HB method is a classical one, obtained from GD by adding a momentum term at each iteration [28]:

1:   are given as initial iterates,  and  are given parameters
2:  for do
3:     
4:  end for
Algorithm 4 Heavy ball method
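A minimal sketch of the heavy ball iteration for a general differentiable f, using Polyak's classical parameter choice expressed in terms of assumed constants L and mu; the toy quadratic at the end is only an illustration and is not one of the paper's test problems.

```python
import numpy as np

def heavy_ball(grad_f, x0, L, mu, n_iter):
    """Heavy ball method: x_{k+1} = x_k - alpha*grad_f(x_k) + beta*(x_k - x_{k-1}),
    with Polyak's classical parameters built from the smoothness constant L
    and the strong convexity constant mu."""
    alpha = 4.0 / (np.sqrt(L) + np.sqrt(mu)) ** 2
    beta = ((np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))) ** 2
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(n_iter):
        x_new = x - alpha * grad_f(x) + beta * (x - x_prev)
        x_prev, x = x, x_new
    return x

# Example: minimize the quadratic f(x) = 0.5*x'Qx - c'x, with grad f(x) = Qx - c.
Q = np.diag([1.0, 10.0]); c = np.array([1.0, 1.0])
x_min = heavy_ball(lambda x: Q @ x - c, np.zeros(2), L=10.0, mu=1.0, n_iter=100)
```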

Let us denote the minimizer of the optimization problem Eq. 3 by $x^*$. The next theorem shows that the HB method indeed speeds up the convergence under proper assumptions.

Theorem 1 ([28])

If $f$ satisfies Eq. 4 and the parameters $\alpha$ and $\beta$ in Algorithm 4 are chosen appropriately in terms of $L$ and $\mu$, then the following convergence rate estimate holds,

Asymptotically, the convergence rate of the HB method is $\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}$ with $\kappa = L/\mu$, which improves on the convergence rate of the GD method, since the latter depends on $\kappa$ rather than $\sqrt{\kappa}$.

Another momentum acceleration technique is the so-called Nesterov acceleration [23]. The NA method uses a different momentum and is effective and robust for a wide range of problems [23, 27]. Following [23], the NA method for solving Eq. 3 is presented in Algorithm 5.

1:   are given as initial iterates and  is a given parameter
2:  for  do
3:     
4:     
5:  end for
Algorithm 5 Nesterov acceleration method
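A corresponding sketch of the constant-momentum Nesterov iteration under the same assumptions; the momentum coefficient below is the standard choice for an L-smooth, mu-strongly convex objective and is not necessarily the exact parameterization of Algorithm 5.

```python
import numpy as np

def nesterov(grad_f, x0, L, mu, n_iter):
    """Nesterov's accelerated gradient method for an L-smooth, mu-strongly
    convex f: a gradient step taken from an extrapolated (look-ahead) point,
    followed by momentum extrapolation for the next step."""
    q = np.sqrt(mu / L)
    momentum = (1.0 - q) / (1.0 + q)          # constant momentum coefficient
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(n_iter):
        y = x + momentum * (x - x_prev)       # extrapolated point
        x_prev, x = x, y - grad_f(y) / L      # gradient step with step size 1/L
    return x

# Same toy quadratic as in the heavy ball example above.
Q = np.diag([1.0, 10.0]); c = np.array([1.0, 1.0])
x_min = nesterov(lambda x: Q @ x - c, np.zeros(2), L=10.0, mu=1.0, n_iter=100)
```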

We recall the convergence results of the NA method Algorithm 5 in the following theorem.

Theorem 2 ([26])

If $f$ satisfies Eq. 4 and the parameter in Algorithm 5 is chosen appropriately in terms of $L$ and $\mu$, then Algorithm 5 generates a sequence of iterates $\{x_k\}$ such that

where .

Asymptotically, the NA method's convergence rate is $1 - \sqrt{\mu/L}$, which also improves on the convergence rate of the GD method. Here, we only present the NA method for strongly convex functions, which is sufficient for our purpose since we consider solving SPD linear systems. In fact, as discussed in [23], the NA method can be applied to convex functions (i.e., $\mu = 0$) or even nonconvex cases [24, 12]. This makes the NA method more attractive in practice.

3 Momentum Acceleration Methods for Preconditioned Linear Systems

Since we consider solving the linear system Eq. 1 with MG preconditioners, in this section we present how to apply the heavy ball and Nesterov acceleration methods to preconditioned linear systems; this will later be used to define the coarse-grid correction in our proposed cycles. To make the application of the HB and NA methods straightforward, we rewrite solving the linear system Eq. 1 with an SPD preconditioner $B$ as solving the following quadratic optimization problem,

$\min_{x \in \mathbb{R}^n} F(x), \qquad F(x) := \frac{1}{2}(Ax, x) - (b, x), \qquad (5)$

where $(\cdot,\cdot)$ denotes the Euclidean inner product. Measuring gradients in the inner product $\langle u, v \rangle := (B^{-1}u, v)$, we have $\nabla F(x) = B(Ax - b)$, and the constants in Eq. 4 become $L = \lambda_{\max}(BA)$ and $\mu = \lambda_{\min}(BA)$. Now we can derive the HB and NA methods for solving Eq. 5 and discuss their convergence behaviors, respectively. Moreover, similar to the semi-iterative method Algorithm 2, which uses the Chebyshev polynomials Eq. 2, we show that the HB and NA methods applied to Eq. 5 can be considered as semi-iterative methods based on different polynomials. Roughly speaking, the HB method is related to the polynomial of best uniform approximation to $1/x$ [20], and the NA method is related to another polynomial that also converges to $1/x$.

3.1 Heavy Ball Method for Preconditioned Linear Systems

We first consider the HB method. Applying the HB method (Algorithm 4) to Eq. 5 leads to Algorithm 6, the HB method for solving preconditioned linear systems.

1:   are given as initial iterates,  and  are given parameters
2:  for   do
3:     
4:  end for
Algorithm 6 Heavy ball method for preconditioned linear systems: 
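A sketch of the heavy ball iteration specialized to the preconditioned system, assuming the gradient of (5) is B(Ax - b) and that lam_min, lam_max are available bounds on the spectrum of BA; the parameter formulas are the classical heavy ball choices and may differ from the exact choices used in Algorithm 6.

```python
import numpy as np

def hb_linear(apply_B, A, b, x0, n, lam_min, lam_max):
    """n steps of the heavy ball method applied to (5):
    x_{k+1} = x_k + alpha*B(b - A x_k) + beta*(x_k - x_{k-1})."""
    s_max, s_min = np.sqrt(lam_max), np.sqrt(lam_min)
    alpha = 4.0 / (s_max + s_min) ** 2                       # step size
    beta = ((s_max - s_min) / (s_max + s_min)) ** 2          # momentum coefficient
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(n):
        x_new = x + alpha * apply_B(b - A @ x) + beta * (x - x_prev)
        x_prev, x = x, x_new
    return x
```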

Algorithm 6 is essentially a linear iterative method that uses the two previous iterates; therefore, it is a semi-iterative method. Moreover, since it is linear, after a fixed number of iterations we can associate it with a linear operator (a matrix in our case). The convergence of the HB method for preconditioned linear systems follows directly from the general convergence analysis of the HB method presented in Theorem 1. Note that in this case we have $L = \lambda_{\max}(BA)$ and $\mu = \lambda_{\min}(BA)$; the result is presented in the following theorem.

Theorem 3

Let the parameters $\alpha$ and $\beta$ in Algorithm 6 be chosen as in Theorem 1, with $L = \lambda_{\max}(BA)$ and $\mu = \lambda_{\min}(BA)$; then the following convergence estimate holds,

3.1.1 Relationship with the Polynomial Approximation of 1/x

To better understand Algorithm 6 as a semi-iterative method, we aim to find the underlying polynomial associated with it. To this end, we need to look at how the error propagates. From Algorithm 6, the error $e_k = x - x_k$ satisfies the following three-term recurrence relationship,

$e_{k+1} = (I - \alpha BA)\,e_k + \beta\,(e_k - e_{k-1}).$

This implies that $e_k = p_k(BA)\,e_0$, where $p_k$ is a polynomial of degree at most $k$ satisfying $p_k(0) = 1$,

(6)

Rewrite $p_k(t) = 1 - t\,q_{k-1}(t)$, where $q_{k-1}$ is a polynomial of degree at most $k-1$. Substituting this into Eq. 6, we obtain the following three-term recurrence relationship for $q_k$,

(7)

Actually, for a particular choice of the parameters $\alpha$ and $\beta$ (determined by $\lambda_{\min}(BA)$ and $\lambda_{\max}(BA)$), $q_{k-1}$ is the polynomial of best approximation to $1/t$ with respect to the $\infty$-norm on the interval $[\lambda_{\min}(BA), \lambda_{\max}(BA)]$; see [20] for details.

By relating the HB method to the polynomial of best uniform approximation to $1/x$, we give a different perspective on the acceleration mechanism of the HB method. In the next section, we show that the NA method is also related to a polynomial approximation to $1/x$, which emphasizes the strong connection between momentum acceleration and polynomial approximation of $1/x$.

3.2 Nesterov Acceleration Method for Preconditioned Linear Systems

Now we consider Nesterov acceleration. Applying the NA method (Algorithm 5) to Eq. 5 and eliminating the auxiliary sequence leads to Algorithm 7.

1:   are given as initial iterates and  is a given parameter
2:  for  do
3:     
4:  end for
Algorithm 7 NA method for preconditioned linear systems: 
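A sketch of the eliminated Nesterov iteration for the preconditioned system under the same assumptions (gradient B(Ax - b), spectral bounds lam_min and lam_max of BA); the constant step size 1/lam_max and the momentum coefficient below are standard choices, not necessarily the exact parameters of Algorithm 7.

```python
import numpy as np

def na_linear(apply_B, A, b, x0, n, lam_min, lam_max):
    """n steps of the Nesterov iteration applied to (5), with the auxiliary
    sequence eliminated: extrapolate, then take a preconditioned Richardson
    step with step size 1/lam_max."""
    q = np.sqrt(lam_min / lam_max)
    momentum = (1.0 - q) / (1.0 + q)                     # momentum -> 1 as lam_min -> 0
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(n):
        y = x + momentum * (x - x_prev)                  # extrapolated point
        x_prev, x = x, y + apply_B(b - A @ y) / lam_max  # gradient step on F
    return x
```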

Algorithm 7 shows that the NA method for solving preconditioned linear systems is also a linear iterative method that uses the previous two iterates: each update is a weighted combination of the two. Essentially, Algorithm 7 is another semi-iterative method and, due to its linearity, we can associate it with a linear operator (a matrix).

Similarly, based on Theorem 2, the convergence of the NA method for preconditioned linear systems follows directly. Noting that $L = \lambda_{\max}(BA)$ and $\mu = \lambda_{\min}(BA)$, the convergence result is presented in Theorem 4.

Theorem 4

Choose the parameter in Algorithm 7 as in Theorem 2; then the following convergence rate estimate holds,

3.2.1 Relationship with the Polynomial Approximation of 1/x

Next, we show that the NA method is also related to a polynomial approximation of $1/x$ when it is applied to preconditioned linear systems. To this end, denote the error at the $k$-th step of the NA method by $e_k = x - x_k$; from Algorithm 7, it satisfies the following three-term recurrence relationship,

This implies that $e_k = \bar{p}_k(BA)\,e_0$, where $\bar{p}_k$ is a polynomial of degree at most $k$ satisfying $\bar{p}_k(0) = 1$,

(8)

Rewrite $\bar{p}_k(t) = 1 - t\,\bar{q}_{k-1}(t)$, where $\bar{q}_{k-1}$ is a polynomial of degree at most $k-1$. Substituting this into (8), we obtain the following three-term recurrence relationship for $\bar{q}_k$,

(9)

From Eq. 9, we see that $\bar{q}_k(t) \to 1/t$ as $k \to \infty$. Compared with the polynomial of best uniform approximation to $1/x$ given in [20] (or the polynomial related to the HB method discussed in Section 3.1), the polynomial associated with the NA method is different, although it also converges to $1/x$ as we increase the polynomial degree $k$. In [22], this polynomial was derived explicitly.

3.3 Discussion on Polynomials Associated with Different Methods

To better understand the behavior of the polynomial associated with the NA method and to compare it with the Chebyshev polynomial used to define the standard AMLI-cycle Algorithm 3 (i.e., the scaled version of (2)) and the polynomial (6) associated with the HB method, we plot them together in Fig. 1 and Fig. 2 for different polynomial degrees $k$ and different choices of $\lambda_{\min}$, while keeping $\lambda_{\max}$ fixed. Here, to better illustrate how the error converges to zero, we look at the error polynomial $p(t) = 1 - t\,q(t)$ of each method, which converges to zero in all cases, instead of $q(t)$, which converges to $1/t$. We emphasize that the two are related by $p(t) = 1 - t\,q(t)$.

Figure 1: The polynomials associated with the Chebyshev, HB, and NA methods for different polynomial degrees (panels (a)–(f)).

Fig. 1 shows the case in which an accurate estimate of $\lambda_{\min}$ is used; all the polynomials converge to zero as the degree $k$ increases. We can also see that, on the interval of interest $[\lambda_{\min}, \lambda_{\max}]$, all three polynomials give comparable results. The polynomial associated with the HB method usually gives the best results near one endpoint of the interval and the polynomial associated with the NA method near the other, while the Chebyshev polynomial gives the best results overall due to its well-known min-max property. This suggests that, in this case, we can use the polynomials associated with the HB and NA methods to define the AMLI-cycle method, and the resulting MG method should have performance comparable to the standard AMLI-cycle method using the Chebyshev polynomial.

Figure 2: The polynomials associated with the Chebyshev and NA methods, for the simple parameter choice discussed in the text, for different polynomial degrees (panels (a)–(f)).

However, in practice, estimating $\lambda_{\min}$ might be difficult and expensive. Thus, let us look at the simple choice in which no accurate estimate of $\lambda_{\min}$ is used. In this case, the polynomial associated with the HB method is not well-defined, since its parameters are not well-defined; therefore, we only plot the other two polynomials in Fig. 2. As we can see, the polynomial associated with the NA method can provide a better approximation than the Chebyshev polynomial, especially as the polynomial degree $k$ increases. This observation suggests that, in practice, if we replace the Chebyshev polynomial with the polynomial associated with the NA method in the AMLI-cycle Algorithm 3, the resulting MG method should be more robust and efficient under this simple parameter choice. This is verified by the numerical results presented in Section 5.

4 Momentum Accelerated Multigrid Cycles

In this section, we present the momentum accelerated MG cycles. The basic idea is to use several steps of the HB method (Algorithm 6) or the NA method (Algorithm 7) to define the coarse-grid corrections. Since both momentum acceleration methods are semi-iterative methods based on certain polynomials, the resulting MG cycles are essentially special cases of the AMLI-cycle. However, to distinguish them from the AMLI-cycle based on Chebyshev polynomials, i.e., Algorithm 3, we call them the heavy ball MG cycle (H-cycle) and the Nesterov acceleration MG cycle (N-cycle), respectively.

Besides presenting the momentum accelerated MG cycles, we also study their convergence rates by combining the standard analysis of the AMLI-cycle with the convergence results in Theorem 3 and Theorem 4, respectively.

4.1 H-cycle MG Method

In this subsection, we present the H-cycle method, which uses several steps of the heavy ball method (Algorithm 6) to define the coarse-grid correction. Based on the notation introduced in Section 2.1, Algorithm 1, and Algorithm 6, we present the H-cycle MG method in Algorithm 8.

1:  if  then
2:     
3:  else
4:     Presmoothing:
5:     Coarse-grid correction: , where  is implemented as in Algorithm 6 with  as the preconditioner for  steps
6:     Postsmoothing:
7:  end if
8:  
Algorithm 8 H-cycle MG:
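The H-cycle can be sketched in the same style as the earlier V-cycle code: the $\nu$ repeated coarse corrections are replaced by $\nu$ heavy-ball steps that use the coarser-level H-cycle as the preconditioner. The bounds lam_min, lam_max on the spectra of the preconditioned coarse-level operators are assumed inputs, and the default values below are placeholders rather than values from the paper.

```python
import numpy as np

def h_cycle(level, A, P, b, nu=2, lam_min=0.25, lam_max=1.0, n_smooth=1):
    """Action of the H-cycle preconditioner B_level on a vector b.
    The coarse-grid correction applies nu heavy-ball steps (cf. Algorithm 6)
    with the coarser-level H-cycle as the preconditioner."""
    if level == 0:
        return np.linalg.solve(A[0], b)                      # exact coarsest-level solve
    x = np.zeros_like(b)
    for _ in range(n_smooth):                                # pre-smoothing (forward GS)
        x += np.linalg.solve(np.tril(A[level]), b - A[level] @ x)
    r_c = P[level].T @ (b - A[level] @ x)                    # restricted residual
    s_max, s_min = np.sqrt(lam_max), np.sqrt(lam_min)        # heavy-ball parameters
    alpha = 4.0 / (s_max + s_min) ** 2
    beta = ((s_max - s_min) / (s_max + s_min)) ** 2
    e_prev = np.zeros(A[level - 1].shape[0])
    e = e_prev.copy()
    for _ in range(nu):                                      # heavy-ball coarse solver
        g = h_cycle(level - 1, A, P, r_c - A[level - 1] @ e,
                    nu, lam_min, lam_max, n_smooth)          # g = B_{level-1}(residual)
        e, e_prev = e + alpha * g + beta * (e - e_prev), e
    x += P[level] @ e                                        # prolongate and correct
    for _ in range(n_smooth):                                # post-smoothing (backward GS)
        x += np.linalg.solve(np.triu(A[level]), b - A[level] @ x)
    return x
```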

As mentioned before, the H-cycle is essentially an AMLI-cycle MG method. Moreover, since the polynomial Eq. 7 associated with the HB method is the polynomial of best uniform approximation to $1/x$, our H-cycle is equivalent to the AMLI-cycle method proposed in [19], and the estimate of the condition number has been analyzed in [19] as well. Here, to give a more intuitive presentation, following the standard analysis of the AMLI-cycle method presented in [35] and utilizing the convergence theory of the HB method (Theorem 3), we show that the H-cycle MG method presented in Algorithm 8 provides a uniform preconditioner under the assumption that the condition number of the two-grid method is uniformly bounded. We summarize the results in Theorem 5 and note that the general case can be analyzed similarly when the condition number of the V-cycle MG method with a bounded level difference is uniformly bounded.

Theorem 5

Let the H-cycle preconditioner be defined by Algorithm 8, with the coarse-level solver implemented as in Algorithm 6 using the coarser-level H-cycle as the preconditioner. Assume that the two-grid method has a uniformly bounded condition number; then the condition number of the H-cycle preconditioner can be uniformly bounded. More specifically, we can choose parameters  and  satisfying

(10)

such that

(11)

Then we have the following uniform estimate for the H-cycle preconditioner  defined by Algorithm 8,

(12)

Proof

First of all, it is clear that, as , the left-hand side of Eq. 11 tends to , which implies that there exists a  satisfying Eq. 11.

In order to prove the inequality Eq. 12, assume by induction that , where the eigenvalues of  are contained in the interval  for some . Then we want to show that the eigenvalues of  are contained in an interval  for some . Since we use  steps of Algorithm 6 with  as a preconditioner to define  in Algorithm 8, we have that

where

and  is the polynomial defined by Eq. 6. According to Theorem 3, we have

then we have  for all  due to the choice of . Because  is an increasing function of , we have

On the other hand, using Corollary 5.11 in [35], we have . Therefore, in order to confirm the induction assumption, we need to choose  such that , which is

This inequality is equivalent to Eq. 11. On the other hand, under the assumption of Eq. 10, it is obvious that

This completes the proof of Eq. 12.

Remark 1

Theorem 5 shows that the H-cycle provides a uniform preconditioner under the assumption that the condition number of the two-grid method is bounded. When , we have  from Eq. 10. Noting that the convergence factor satisfies , inequality Eq. 11 reduces to

This implies that , which means that, if the two-grid method at any level (with exact solution on the coarse level) has a uniformly bounded convergence factor, then the corresponding H-cycle has a uniformly bounded convergence factor. Obviously, this is better than the W-cycle (the 2-fold V-cycle), because, for the W-cycle to be uniformly convergent, the convergence factor of the two-grid method should be bounded by  (see Corollary 5.30 in [35]).

4.2 N-cycle MG Method

Similarly, our proposed N-cycle MG method uses several steps of the NA method (Algorithm 7) to define the coarse-level solvers. Based on the notation introduced in Section 2.1, Algorithm 1, and Algorithm 7, Algorithm 9 presents the N-cycle algorithm recursively.

1:  if  then
2:     
3:  else
4:     Presmoothing:
5:     Coarse-grid correction: , where  is implemented as in Algorithm 7 with  as the preconditioner for  steps
6:     Postsmoothing:
7:  end if
8:  
Algorithm 9 N-cycle MG:
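The N-cycle sketch mirrors the H-cycle sketch above, with the heavy-ball inner iteration replaced by the eliminated Nesterov update; whether the simple default below (lam_max = 1, lam_min = 0, i.e., momentum coefficient 1) matches the paper's exact default parameter choice is an assumption.

```python
import numpy as np

def n_cycle(level, A, P, b, nu=2, lam_min=0.0, lam_max=1.0, n_smooth=1):
    """Action of the N-cycle preconditioner B_level on b: the coarse-grid
    correction applies nu Nesterov steps (cf. Algorithm 7) preconditioned by
    the coarser-level N-cycle.  With lam_min = 0 the momentum coefficient is
    1, so the code needs no estimate of the smallest eigenvalue."""
    if level == 0:
        return np.linalg.solve(A[0], b)                      # exact coarsest-level solve
    x = np.zeros_like(b)
    for _ in range(n_smooth):                                # pre-smoothing (forward GS)
        x += np.linalg.solve(np.tril(A[level]), b - A[level] @ x)
    r_c = P[level].T @ (b - A[level] @ x)                    # restricted residual
    q = np.sqrt(lam_min / lam_max)
    momentum = (1.0 - q) / (1.0 + q)
    e_prev = np.zeros(A[level - 1].shape[0])
    e = e_prev.copy()
    for _ in range(nu):                                      # Nesterov coarse solver
        y = e + momentum * (e - e_prev)                      # extrapolated iterate
        g = n_cycle(level - 1, A, P, r_c - A[level - 1] @ y,
                    nu, lam_min, lam_max, n_smooth)          # g = B_{level-1}(residual)
        e_prev, e = e, y + g / lam_max
    x += P[level] @ e                                        # prolongate and correct
    for _ in range(n_smooth):                                # post-smoothing (backward GS)
        x += np.linalg.solve(np.triu(A[level]), b - A[level] @ x)
    return x
```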

The N-cycle MG method can be viewed as an AMLI-cycle method using the polynomial defined in Eq. 9. Following the standard analysis of the AMLI-cycle method presented in [35] and utilizing the convergence theory of the NA method (Theorem 4), we can show that the N-cycle MG method presented in Algorithm 9 provides a uniform preconditioner under the assumption that the condition number of the two-grid method is uniformly bounded. We summarize the results in Theorem 6 and note that the general case can be analyzed similarly when the condition number of the V-cycle MG method with a bounded level difference is uniformly bounded.

Theorem 6

Let the N-cycle preconditioner be defined by Algorithm 9, with the coarse-level solver implemented as in Algorithm 7 using the coarser-level N-cycle as the preconditioner. Assume that the two-grid method has a uniformly bounded condition number; then the condition number of the N-cycle preconditioner can be uniformly bounded. More specifically, there exist parameters  and , satisfying , such that

(13)

Then we have the following uniform estimate for the N-cycle preconditioner defined by Algorithm 9,

(14)

Proof

The proof is essentially the same as the proof of Theorem 5. The only two differences are that the polynomial used here is the one defined by Eq. 8, and that the corresponding convergence property is the one presented in Theorem 4.

More precisely, since we use  steps of Algorithm 7 with  as a preconditioner to define  in  Algorithm 9, we have that the eigenvalues of  are contained in the interval , where

and  is the polynomial defined by Eq. 8. According to Theorem 4, we have

then  for all  due to the choice of . Because  is an increasing function of , we have

On the other hand, using Corollary 5.11 in [35], we have . Therefore, in order to confirm the induction assumption, we need to choose  such that , which is

This inequality is equivalent to Eq. 13. On the other hand, under the assumption that , it is obvious that

This completes the proof of Eq. 14.

Next, we provide several remarks to better interpret the theoretical results of Theorem 6, to compare the N-cycle with the standard AMLI-cycle, and to discuss the limitations of the theory in predicting the performance of the N-cycle method in practice.

Remark 2

When , we have  due to . Noting that the convergence factor of the two-grid method satisfies , inequality Eq. 13 reduces to

This implies that , which means that, if the two-grid method at any level (with exact solution on the coarse level) has a uniformly bounded convergence factor, then the corresponding N-cycle has a uniformly bounded convergence factor. This is better than the W-cycle (the 2-fold V-cycle) because, for the W-cycle to be uniformly convergent, the convergence factor of the two-grid method should be bounded by  (see Corollary 5.30 in [35]).

Remark 3

If we want the AMLI-cycle using the Chebyshev polynomial Eq. 2 (i.e., Algorithm 3) to be uniformly convergent, the required bound on the convergence factor of the two-grid method is slightly better than that for the N-cycle theoretically, due to the min-max property of the Chebyshev polynomials. However, this is a theoretical result based on the assumption that the extreme eigenvalues of the MG-preconditioned coarse-level problems are available, which could be quite expensive or even impossible to ensure in practice. As we will see from the numerical experiments in Section 5, we can simply use rough estimations of the extreme eigenvalues to define the N-cycle method in practice and still obtain good performance. Moreover, the N-cycle MG method can outperform the AMLI-cycle in practice with such a simple parameter choice.

5 Numerical Results

In this section, we present numerical experiments to illustrate the efficiency of the N-cycle MG method. In all experiments, we discretize the examples using the linear finite-element method on uniform triangulations of the domain with mesh size $h$; that is, the matrix $A$ in (1) is the stiffness matrix of the finite-element discretization. The true solution of the linear system is prescribed, and the right-hand side $b$ is computed accordingly for all the examples. We use the unsmoothed aggregation AMG (UA-AMG) method [17] in all the experiments, since it is well known that the V-cycle UA-AMG method does not converge uniformly in general, so more involved cycles are needed. For all the MG cycles, we use the Gauss–Seidel (GS) smoother (forward GS for pre-smoothing and backward GS for post-smoothing). In our implementations of Algorithm 6 (used in the H-cycle Algorithm 8) and Algorithm 7 (used in the N-cycle Algorithm 9), the iteration is initialized with one step of the steepest descent method for solving (5). We always use the same simple estimation of the largest eigenvalue (which is a good estimation for SPD problems in general) and test the performance for different estimations of the smallest eigenvalue. In all the numerical experiments, the MG methods are used as stand-alone iterative solvers. We use a zero initial guess, and the stopping criterion is that the relative residual is less than or equal to a prescribed tolerance. Besides the number of iterations to convergence, the average convergence factor of the last five iterations is reported to illustrate the performance of the MG methods.
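The reporting described above can be reproduced with a small driver such as the following sketch, where apply_cycle is any of the preconditioner actions sketched earlier; the tolerance 1e-6 and the iteration cap are assumed values, since the exact stopping tolerance is not stated above.

```python
import numpy as np

def run_solver(A, b, apply_cycle, tol=1e-6, max_iter=200):
    """Use an MG cycle as a stand-alone solver, x <- x + B(b - A x), starting
    from a zero initial guess.  Returns the approximate solution, the number
    of iterations, and the average convergence factor of the last five
    iterations (geometric mean of the residual reduction factors)."""
    x = np.zeros_like(b)
    res_hist = [np.linalg.norm(b)]                      # residual norms; r_0 = b
    for k in range(1, max_iter + 1):
        x = x + apply_cycle(b - A @ x)                  # one preconditioned iteration
        res_hist.append(np.linalg.norm(b - A @ x))
        if res_hist[-1] <= tol * res_hist[0]:           # relative residual test
            break
    last = min(5, len(res_hist) - 1)
    rho = (res_hist[-1] / res_hist[-1 - last]) ** (1.0 / last)
    return x, k, rho
```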

Example 1

Consider the model problem on .