Parameter Estimation in the Hermitian and Skew-Hermitian Splitting Method Using Gradient Iterations

09/03/2019
by   Qinmeng Zou, et al.

This paper presents enhancement strategies for the Hermitian and skew-Hermitian splitting method based on gradient iterations. The spectral properties are exploited for parameter estimation, often resulting in better convergence. In particular, steepest descent with early stopping can generate a rough estimate of the optimal parameter. This is better than an arbitrary choice, since the latter often causes stability problems or slow convergence. Additionally, lagged gradient methods are considered as inner solvers for the splitting method. Experiments show that they are competitive with conjugate gradient at low precision.


1 Introduction

We are interested in solving the linear system

$$A x = b, \tag{1}$$

where $A \in \mathbb{C}^{n \times n}$ is a non-Hermitian positive definite matrix of size $n$. It has been observed that splitting methods can be used with success. The traditional alternating direction implicit method [19] inspired the construction of two alternating splittings of $A$ based on its Hermitian and skew-Hermitian parts, and this leads to an iteration called Hermitian and skew-Hermitian splitting (HSS) [3], in which a shifted Hermitian system and a shifted skew-Hermitian system are solved alternately. HSS has received much attention [4, 2, 6, 23, 26, 27, 20], possibly due to its guaranteed convergence and mathematical beauty.

Let $H$ and $S$ denote the Hermitian and skew-Hermitian parts of $A$, respectively, and let $A^*$ be the conjugate transpose of $A$. It follows that

$$H = \frac{1}{2}(A + A^*), \qquad S = \frac{1}{2}(A - A^*).$$

Let $I$ be the identity matrix. In short, the HSS method is defined as follows:

$$(\alpha I + H)\,x^{(k+1/2)} = (\alpha I - S)\,x^{(k)} + b, \qquad (\alpha I + S)\,x^{(k+1)} = (\alpha I - H)\,x^{(k+1/2)} + b, \tag{2}$$

with $\alpha > 0$ and $k = 0, 1, 2, \ldots$ It can be regarded as a stationary iterative process

$$x^{(k+1)} = M(\alpha)\,x^{(k)} + G(\alpha)\,b,$$

where $x^{(0)}$ is a given vector. Following the notation of Bai et al. [3], let us set

$$B(\alpha) = \frac{1}{2\alpha}(\alpha I + H)(\alpha I + S), \qquad C(\alpha) = \frac{1}{2\alpha}(\alpha I - H)(\alpha I - S).$$

The operators $M(\alpha)$ and $G(\alpha)$ can be expressed as

$$M(\alpha) = B(\alpha)^{-1}C(\alpha) = (\alpha I + S)^{-1}(\alpha I - H)(\alpha I + H)^{-1}(\alpha I - S), \qquad G(\alpha) = B(\alpha)^{-1}.$$

Let $\sigma(\cdot)$ be the spectrum of a matrix and let $\rho(\cdot)$ be the spectral radius. A convergence result for (2) in the non-Hermitian positive definite case was established by Bai et al. [3]:

$$\rho(M(\alpha)) \le \big\|(\alpha I - H)(\alpha I + H)^{-1}\big\|_2 = \max_{\lambda \in \sigma(H)} \frac{|\alpha - \lambda|}{|\alpha + \lambda|} < 1, \tag{3}$$

where $\|\cdot\|_2$ denotes the 2-norm. This shows that the spectral radius of the iteration matrix $M(\alpha)$ is less than $1$. As a result, HSS has guaranteed convergence, with a speed that depends only on the Hermitian part $H$. Let $\lambda_i(\cdot)$ be the $i$th eigenvalue of a matrix, in ascending order. The key observation here is that choosing

$$\alpha^* = \arg\min_{\alpha > 0}\, \max_{\lambda \in [\lambda_1(H),\, \lambda_n(H)]} \frac{|\alpha - \lambda|}{|\alpha + \lambda|} = \sqrt{\lambda_1(H)\,\lambda_n(H)} \tag{4}$$

leads to the well-known upper bound

$$\rho(M(\alpha^*)) \le \frac{\sqrt{\kappa(H)} - 1}{\sqrt{\kappa(H)} + 1}, \tag{5}$$

where $\kappa(\cdot)$ denotes the spectral condition number. It is noteworthy that inequality (5) is similar to the convergence result for conjugate gradient (CG) [17, 25] in terms of the energy-norm error. As mentioned by Bai et al. [3], $\alpha^*$ minimizes the upper bound in (3) but not the spectral radius itself. In some cases the right-hand side of (3) may not be an accurate approximation to the spectral radius. Since very little theory is available on direct minimization, we nevertheless try to approximate the optimal parameter $\alpha^*$ indirectly.
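To make the iteration concrete, the following minimal Python sketch applies HSS as in (2) to a small random non-Hermitian positive definite system, solving the two shifted subsystems directly and choosing the parameter as in (4). The matrix construction, size, tolerance and iteration cap are illustrative assumptions rather than the settings used in the experiments of Section 4.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Small non-Hermitian test matrix whose Hermitian part is made
# strongly diagonally dominant, hence positive definite.
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = A + n * np.eye(n)
b = rng.standard_normal(n) + 1j * rng.standard_normal(n)

H = 0.5 * (A + A.conj().T)          # Hermitian part
S = 0.5 * (A - A.conj().T)          # skew-Hermitian part
I = np.eye(n)

# Parameter (4): alpha* = sqrt(lambda_1(H) * lambda_n(H)).
eigH = np.linalg.eigvalsh(H)
alpha = np.sqrt(eigH[0] * eigH[-1])

x = np.zeros(n, dtype=complex)
for k in range(500):
    # First half-step: shifted Hermitian solve.
    x_half = np.linalg.solve(alpha * I + H, (alpha * I - S) @ x + b)
    # Second half-step: shifted skew-Hermitian solve.
    x = np.linalg.solve(alpha * I + S, (alpha * I - H) @ x_half + b)
    if np.linalg.norm(b - A @ x) <= 1e-10 * np.linalg.norm(b):
        break

print(k + 1, np.linalg.norm(b - A @ x))
```

In practice the two shifted solves are replaced by inexact inner iterations, as discussed in Section 3.3.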

In this paper we exploit spectral properties of gradient iterations in order to make this estimation feasible. In Section 2, we focus on the asymptotic analysis of the steepest descent method. In Section 3, we discuss strategies for estimating the parameter in HSS based on gradient iterations, and we compare lagged gradient methods with CG for solving Hermitian positive definite systems at low precision. Numerical results are shown in Section 4 and concluding remarks are drawn in Section 5.

2 Asymptotic analysis of steepest descent

In this section we consider the Hermitian positive definite (HPD) linear system

$$A x = b \tag{6}$$

of size $n$, where $A$ is now assumed to be HPD. The solution is the unique global minimizer of the convex quadratic function

$$q(x) = \frac{1}{2}\, x^* A x - \mathrm{Re}(b^* x). \tag{7}$$

For $k = 0, 1, 2, \ldots$, the gradient method is of the form

$$x_{k+1} = x_k - \alpha_k g_k, \tag{8}$$

where $g_k = \nabla q(x_k) = A x_k - b$. This gives the updating formula

$$g_{k+1} = g_k - \alpha_k A g_k. \tag{9}$$

The steepest descent (SD) method proposed by Cauchy [8] defines the sequence of steplengths as follows:

$$\alpha_k^{SD} = \frac{g_k^* g_k}{g_k^* A g_k}, \tag{10}$$

which is the reciprocal of a Rayleigh quotient. It minimizes the quadratic function, or equivalently the energy-norm error of system (6), and theoretically gives an optimal result at each step,

$$q(x_{k+1}) = \min_{\alpha > 0}\, q(x_k - \alpha g_k).$$

This classical method is known to behave badly in practice: the search directions tend to alternate asymptotically between two orthogonal directions, resulting in slow convergence [1].
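For illustration, here is a minimal Python sketch of the scheme (8)-(10); the diagonal test matrix, tolerance and iteration cap are arbitrary choices for this example.

```python
import numpy as np

def steepest_descent(A, b, x0, tol=1e-8, max_iter=10_000):
    """Steepest descent (8)-(10) for an HPD system A x = b."""
    x = x0.astype(complex)
    g = A @ x - b                          # gradient of the quadratic (7)
    steps = []
    for _ in range(max_iter):
        Ag = A @ g
        alpha = np.vdot(g, g).real / np.vdot(g, Ag).real   # Cauchy steplength (10)
        x = x - alpha * g                  # update (8)
        g = g - alpha * Ag                 # gradient recurrence (9)
        steps.append(alpha)
        if np.linalg.norm(g) <= tol * np.linalg.norm(b):
            break
    return x, steps

# Small HPD example: diagonal matrix with spread eigenvalues.
n = 50
A = np.diag(np.logspace(0, 3, n))
b = np.ones(n)
x, steps = steepest_descent(A, b, np.zeros(n))
print(len(steps), steps[-2:])   # the final steplengths exhibit the asymptotic two-value alternation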

The motivation for this paper arose during the development of efficient gradient methods. We notice that SD generally converges much more slowly than CG for HPD systems. However, the spectral properties of the former can be beneficial for parameter estimation. Akaike [1] provided a probability distribution model for the asymptotic analysis of SD; it appears that standard techniques from linear algebra are not very helpful in this case. The so-called two-step invariance property led to the work of Nocedal et al. [18], in which further asymptotic results are presented. Let $v_i$ be the eigenvector corresponding to the eigenvalue $\lambda_i$. The relevant properties from Nocedal et al. [18] that will be exploited in the following are summarized in Lemma 1. Note that a symmetric positive definite real matrix was used by Nocedal et al. [18]; we therefore extend this result and present a new lemma, with its proof, for Hermitian positive definite systems.

Lemma 1.

Assume that . Assume that and . Consider the gradient method (8) with steplength (10) applied to (6), where the matrix is Hermitian positive definite. Then

(11)
(12)

and

(13)
(14)

for some constant .

Proof.

Let $\mathrm{Re}(\cdot)$ and $\mathrm{Im}(\cdot)$ denote the real and imaginary parts, respectively. The coefficients of system (6) can be written as

$$A = \mathrm{Re}(A) + \mathrm{i}\,\mathrm{Im}(A), \qquad b = \mathrm{Re}(b) + \mathrm{i}\,\mathrm{Im}(b),$$

where $\mathrm{i}$ denotes the imaginary unit. It is possible to rewrite system (6) in the equivalent real form

$$\begin{bmatrix} \mathrm{Re}(A) & -\mathrm{Im}(A) \\ \mathrm{Im}(A) & \mathrm{Re}(A) \end{bmatrix} \begin{bmatrix} \mathrm{Re}(x) \\ \mathrm{Im}(x) \end{bmatrix} = \begin{bmatrix} \mathrm{Re}(b) \\ \mathrm{Im}(b) \end{bmatrix}. \tag{15}$$

By Lemma 3.3 and Theorem 5.1 of Nocedal et al. [18], results (11) to (14) hold in the real case. To prove the desired result in the Hermitian case, it suffices to show that SD applied to (15) is equivalent to SD applied to (6), namely, that the two yield the same sequences of gradient vectors and steplengths. One finds that

(16)

where

Assume that the blocks in  are the same as the real and imaginary parts of , respectively. Then, from (15) one obtains

(17)

On the other hand, let

Combining (16) and (17) implies . Since , it follows that for all , from which one obtains that

Hence, the following result holds:

Along with (15), this implies that , according to which one finds that  when the blocks in  are equal to the real and imaginary parts of , respectively. Hence, the SD iteration for the Hermitian system (6) and that for the $2n$-by-$2n$ real form (15) yield exactly the same sequence of iterates. Since properties (11) to (14) have been proved in the real case by Nocedal et al. [18], we arrive at the desired conclusion. ∎

Concerning the assumption used in Lemma 1, if there exist repeated eigenvalues, then we can choose the eigenvectors so that the corresponding gradient components vanish [13]. If or , then the second condition can be replaced by inner eigenvectors with no effect on the theoretical results.
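The equivalence used in the proof of Lemma 1 can also be checked numerically. The sketch below, with a randomly generated test problem assumed purely for illustration, runs the same steepest descent routine on a Hermitian positive definite system and on its real equivalent form (15), and compares the resulting steplengths.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40

# Random Hermitian positive definite matrix and complex right-hand side.
G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = G @ G.conj().T + n * np.eye(n)
b = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# Real equivalent form (15): blocks built from Re(A) and Im(A).
B = np.block([[A.real, -A.imag],
              [A.imag,  A.real]])
c = np.concatenate([b.real, b.imag])

def sd_steplengths(M, rhs, x0, iters=30):
    """Return the first Cauchy steplengths of steepest descent on M x = rhs."""
    x = np.array(x0, dtype=complex)
    g = M @ x - rhs
    out = []
    for _ in range(iters):
        Mg = M @ g
        alpha = np.vdot(g, g).real / np.vdot(g, Mg).real
        out.append(alpha)
        x = x - alpha * g
        g = g - alpha * Mg
    return np.array(out)

s_complex = sd_steplengths(A, b, np.zeros(n))
s_real = sd_steplengths(B, c, np.zeros(2 * n))
print(np.max(np.abs(s_complex - s_real)))   # agreement up to rounding errors
```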

It took some time before the spectral properties described by Nocedal et al. [18] were applied to the solution of linear systems. De Asmundis et al. [12] proposed an auxiliary steplength

(18)

which can be used for efficient implementations of gradient methods. The main result is a direct consequence of (11) and (12). We state the lemma without proof; see De Asmundis et al. [12] for further discussion.

Lemma 2.

Under the assumptions of Lemma 1, the following result holds

(19)

Another direction of approach was based on a delicate derivation by Yuan [29]. Let us write and

(20)

Yuan [29] developed a new auxiliary steplength of the form

(21)

which leads to some finite-termination methods for solving system (6) [29]. Let us now introduce an alternative steplength

(22)

Let us write and . It follows that

The spectral properties of (20), (21) and (22) are given in Lemma 3. Note that equations (23) and (24) have appeared in De Asmundis et al. [11] for the real case. Below, we extend (23) and (24) to the Hermitian case and add one new equation.

Lemma 3.

Under the assumptions of Lemma 1, the following limits hold

(23)
(24)
(25)
Proof.

Combining (11) and (12) implies

(26)

Combining (11) to (14), one could deduce that

from which one finds

(27)

The first equation follows by combining (26) and (27). Along with (19), this implies that

which yields the desired limits (24) and (25). ∎

It is noteworthy that steplengths (21) and (22) could be expressed as the roots of a quadratic function

(28)

with

from which one could observe that and

As mentioned by Yuan [29], a slightly shortened steplength would improve the efficiency of steepest descent. This is one reason why the Yuan steplength could be fruitfully used in alternate gradient methods [10, 11].

As an example, assume that and

(29)

Assume that is constructed by where is a vector of all ones. We plot in Figure 1 the curves of (28) for a few representative iteration numbers.

Figure 1: Curves of  for a few representative iteration numbers. Steepest descent is used for solving system (6) where satisfies (29) and is a vector of all ones.

This figure shows that the curves of  corresponding to steepest descent converge to their limits, as proved in Lemmas 2 and 3.

3 Application to HSS iterations

3.1 Preliminary considerations

In this section we first try to compute estimates of the parameter  in the HSS method. One possible solution is to simply choose  without resorting to special techniques, but experience shows that this often leads to very slow convergence or even divergence, depending on the system being solved. Another approach is based on the observation that  was introduced to ensure the bounded convergence seen in (3), and that it can be expressed differently. As an example, consider a positive definite diagonal matrix  such that

(30)

As a result, the iteration matrix is of the form

Notice that (30) is a special case of preconditioned HSS [7] when choosing  and . In particular, the fact that Theorem 2.1 of Bertaccini et al. [7] holds for (30) implies , yielding guaranteed convergence.

By reasoning similar to that used for HSS [3], the spectral radius is bounded by

A natural idea is to seek  so that this upper bound is small. At first glance we may choose  as the diagonal elements of . Inspired by the diagonally weighted matrix of Freund [14], the Euclidean norms of the column vectors could also be exploited. However, common experience is that these strategies may lead to stagnation of convergence and sometimes perform much worse than choosing . We do not pursue them further in this paper.

3.2 Parameter estimation based on gradient iterations

It is observed that (23) leads to a straightforward estimation of the parameter  in (4). From Figure 1 we can deduce that the optimal parameter in HSS can actually be approximated by steepest descent iterations, as shown in the following theorem.

Theorem 4.

Assume that the matrix  in system (6) is the Hermitian part of  in system (1). If steepest descent is used for solving (6), then the following limit holds

(31)
Proof.

Combining (4), (23) and the fact that  observed from (28), the desired conclusion follows. ∎

Another approach is to compute the approximation by combining Lemmas 2 and 3, in which case could be estimated without explicit access to operator . This approach is shown in Theorem 5.

Theorem 5.

Assume that the matrix  in system (6) is the Hermitian part of  in system (1). If steepest descent is used for solving (6), then the following limit holds

(32)
Proof.

Recall that . Since

for , it follows that

Combining (19) and (23) implies

This completes the proof. ∎

Remark.

In practice, obtaining  by (32) requires a predetermined parameter . One could choose  and prescribe an integer  as the maximum number of iterations such that

in which case the HSS algorithm might be executed at reduced costs.
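The following Python sketch illustrates the preadaptive idea of this section: a few steepest descent iterations on the Hermitian part, with early stopping, produce a rough estimate of the optimal HSS parameter. Since the precise estimators (31)-(34) are not restated here, the formula used below, built from consecutive Cauchy steplengths and gradient norms in the spirit of the Yuan steplength [29], is an assumption of this sketch and only one possible realization; the test matrix and the number of iterations are likewise illustrative.

```python
import numpy as np

def estimate_hss_parameter(H, b, m=20):
    """Rough estimate of sqrt(lambda_1(H) * lambda_n(H)) from m steepest
    descent iterations on H x = b (early stopping).

    Assumed estimator: combine two consecutive Cauchy steplengths and the
    gradient norms as in the Yuan steplength [29]; the two resulting roots
    tend to 1/lambda_n(H) and 1/lambda_1(H), so their product yields the
    target value. This is one concrete realization, not the paper's exact formula."""
    x = np.zeros(len(b), dtype=complex)
    g = H @ x - b
    a_prev, g_prev_norm, alpha_est = None, None, None
    for _ in range(m):
        Hg = H @ g
        a = np.vdot(g, g).real / np.vdot(g, Hg).real          # Cauchy steplength
        if a_prev is not None:
            s = 1.0 / a_prev + 1.0 / a
            d = np.sqrt((1.0 / a_prev - 1.0 / a) ** 2
                        + 4.0 * np.vdot(g, g).real / (a_prev * g_prev_norm) ** 2)
            # (s + d)/2 -> lambda_n(H) and (s - d)/2 -> lambda_1(H) asymptotically.
            alpha_est = 0.5 * np.sqrt(max(s ** 2 - d ** 2, 0.0))
        a_prev, g_prev_norm = a, np.linalg.norm(g)
        x = x - a * g
        g = g - a * Hg
    return alpha_est

# Example: the estimate would then be passed to the HSS loop as its shift parameter.
rng = np.random.default_rng(2)
n = 200
Q = rng.standard_normal((n, n))
H = Q @ Q.T + np.eye(n)                 # real SPD stand-in for the Hermitian part
b = np.ones(n)
eig = np.linalg.eigvalsh(H)
print(estimate_hss_parameter(H, b), np.sqrt(eig[0] * eig[-1]))
```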

Another direction of approach is based on the minimal gradient (MG) steplength

$$\alpha_k^{MG} = \frac{g_k^* A g_k}{g_k^* A^2 g_k},$$

the spectral properties of which have been discussed by the present authors, along with several new gradient methods, in a separate paper. Let

Let us write .

Theorem 6.

Assume that the matrix  in system (6) is the Hermitian part of  in system (1). If minimal gradient is used for solving (6), then the following limit holds

(33)
Proof.

The proof is obtained in the same way as that of Theorem 4. ∎

Theorem 7.

Assume that the matrix  in system (6) is the Hermitian part of  in system (1). If minimal gradient is used for solving (6), then the following limit holds

(34)
Proof.

The proof is obtained in the same way as that of Theorem 5. ∎

3.3 Solution based on lagged gradient iterations

Although steepest descent has remarkable spectral properties, its popularity as an iterative method has been overshadowed by CG. Akaike [1] showed that the zigzag behavior nearly always leads to slow convergence, except when the initial gradient is close to an eigenvector. This drawback can be cured with a lagged strategy, first proposed by Barzilai and Borwein [5] and later called the Barzilai-Borwein (BB) method. The idea is to provide a two-point approximation to the secant equation underlying quasi-Newton methods, namely

$$\min_{\alpha} \big\| \alpha^{-1} s_{k-1} - y_{k-1} \big\|_2,$$

where $s_{k-1} = x_k - x_{k-1}$ and $y_{k-1} = g_k - g_{k-1}$, yielding

$$\alpha_k^{BB1} = \frac{s_{k-1}^* s_{k-1}}{s_{k-1}^* y_{k-1}}.$$

Notice that $\alpha_k^{BB1} = \alpha_{k-1}^{SD}$. The convergence analysis was given by Raydan [21] and by Dai and Liao [9]. A Q-linear convergence result, however, has never been proved, due to the nonmonotone behavior of the iteration. Overall, the effect of this irregular behavior seems to be beneficial.

For the HSS method, two inner iterative procedures are needed at each outer iteration. Since the solution of the subproblems in (2) is sometimes as difficult as that of the original system (1), inexact solvers with rather low precision are often considered, especially for ill-conditioned problems. In practice, the first equation of (2) is usually solved by CG, and the second equation of (2) can be solved by CGNE [22]. Friedlander et al. [15] observed that BB can often be competitive with CG when low precision is required. It is known that CG is sensitive to rounding errors, while lagged gradient methods can remedy this issue [13, 24] at a lower computational cost per iteration. Additionally, although BB sometimes suffers from the disadvantage of requiring an increasing number of iterations as the condition number grows, its low-precision behavior tends to be less sensitive to ill-conditioning.

A similar method, derived by symmetry [5], is of the form

$$\alpha_k^{BB2} = \frac{s_{k-1}^* y_{k-1}}{y_{k-1}^* y_{k-1}},$$

which likewise imposes a quasi-Newton property,

$$\min_{\alpha} \big\| s_{k-1} - \alpha\, y_{k-1} \big\|_2.$$

Notice that $\alpha_k^{BB2} = \alpha_{k-1}^{MG}$. In the last three decades, much effort has been devoted to developing new lagged gradient methods; see De Asmundis et al. [11] and the references therein.
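For completeness, here is a minimal Python sketch of the BB method with the first steplength (BB1); the Cauchy step used at the first iteration, the stopping rule and the test problem are implementation choices of this sketch, not prescriptions from [5].

```python
import numpy as np

def barzilai_borwein(A, b, x0, tol=1e-6, max_iter=10_000):
    """Barzilai-Borwein (BB1) gradient method for an HPD system A x = b."""
    x = np.array(x0, dtype=complex)
    g = A @ x - b
    # First step: plain Cauchy steplength, since no previous point exists yet.
    alpha = np.vdot(g, g).real / np.vdot(g, A @ g).real
    for k in range(max_iter):
        x_new = x - alpha * g
        g_new = A @ x_new - b
        if np.linalg.norm(g_new) <= tol * np.linalg.norm(b):
            return x_new, k + 1
        s = x_new - x                      # s_{k-1}
        y = g_new - g                      # y_{k-1}
        alpha = np.vdot(s, s).real / np.vdot(s, y).real    # BB1 steplength
        x, g = x_new, g_new
    return x, max_iter

# Usage on a diagonal test problem in the spirit of Figure 2 (size and spectrum assumed).
n = 100
A = np.diag(np.logspace(0, 3, n))
b = np.ones(n)
x, iters = barzilai_borwein(A, b, np.zeros(n))
print(iters, np.linalg.norm(A @ x - b))
```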

An example is illustrated in Figure 2.

Figure 2: Comparison of CG and BB for solving system (6) where is a diagonal matrix of size with and is a vector of all ones.

We solve (6) with different residual thresholds , where is chosen as a diagonal matrix of size  and is a vector of all ones. The diagonal entries have values logarithmically distributed between and in ascending order, with the first and the last entries equal to the limits, respectively, such that . The plot shows a fairly efficient behavior of BB.

4 Numerical experiments

In this section we perform some numerical tests. All iterative algorithms are started from the zero vector. The global stopping criterion in HSS is determined by the threshold , with a fixed convergence tolerance . The inner stopping thresholds  and  for the two half-steps of (2) are defined in the same way. For gradient iterations applied to system (6), the stopping criterion is similarly defined by the threshold  with the same tolerance. All tests are run in double precision.

4.1 Asymptotic results of gradient iterations

The goal of the first experiment is to illustrate how the spectral properties described earlier can be used to provide a rough estimate of the parameter . We implemented steepest descent and minimal gradient iterations for several real matrices of size  generated by the MATLAB routine sprandsym. The right-hand side is chosen to be the vector of all ones. In Figure 3, the parameter  is plotted versus the iteration number; a red dotted line marks the position of .

Figure 3: Parameter estimation with different matrix  generated randomly by MATLAB: (top), (middle), (bottom). Parameter  is computed by two approaches: Theorems 4 and 6 (left), Theorems 5 and 7 (right).

It is clear that  tends to  asymptotically, as expected. As can be seen, steepest descent with limits (31) and (32) turns out to be a better strategy than minimal gradient in all cases. The indirect approximations based on (32) and (34) yield faster convergence for both steepest descent and minimal gradient.

This test confirms Theorems 4 to 7. Recall that choosing  as the parameter leads to an upper bound on , for which an exact estimate is not necessary. Experience shows that this choice may sometimes cause overfitting, resulting in slow convergence or even divergence, especially when  is small. One simple remedy is to use early stopping in the gradient iterations. In the following, it is assumed that steepest descent is used for parameter estimation in HSS, referred to as preadaptive iterations, and we consider only the direct approach (31).

4.2 HSS with different parameters

In this test we generate some matrices obtained from a classical problem in order to understand the convergence behavior of HSS enhanced by steepest descent iterations.

Example 1.

Consider system (1) where

arises from the discretization of partial differential equation

(35)

on the unit cube, where  is a positive constant. Assume that  satisfies homogeneous Dirichlet boundary conditions. A finite difference discretization on a uniform grid with mesh size  is applied to the above model, yielding a linear system with .

In the following we use the centered difference scheme for the discretization. The right-hand side  is generated with random complex values ranging in . As thresholds for the inner iterations,  and  are chosen. CG is used for solving the Hermitian inner system, while CGNE is used for the skew-Hermitian part. Figure 4 shows the convergence behavior of HSS for different values of the parameter.

Figure 4: Solving problem (35) by HSS with  and . The optimal parameters are located by red lines.

Here, we set  and . The optimal parameters  with  are marked by red lines. Notice that a path zigzagging along the bottom of the valley corresponds to the best parameters. As already noted, the parameter estimates need not be accurate, and thus the red lines are good enough in practice.

Approximating  by inexact steepest descent iterations then yields the preadaptive HSS method (PAHSS). Let  denote the number of preadaptive iterations. The convergence behavior and total computing times are illustrated in Figure 5.

Figure 5: Parameter estimation for problem (35) with different mesh densities: (first), (second), (third), (fourth). Left: convergence curves for different . Right: average wall-clock times for different  including that of steepest descent iterations.

The left four plots show the residual curves for several typical choices of  when , namely . Two observations can be made for all dimensions: first, a larger  yields faster convergence of HSS; second,  does not lead to significant gains in efficiency compared with . The right four plots show the total wall-clock times of the PAHSS iterations, measured in seconds, for  ranging from  to . It can be seen that substantial gains are made at the beginning, followed by a long period of stagnation. Experience shows that a small number of steepest descent iterations is sufficient, and it is therefore appropriate to use early stopping.

4.3 CG and BB as low-precision inner solvers

In order to verify that BB can be an efficient alternative to CG as a low-precision inner solver for HSS, we run tests along the same lines as above, considering both CG and BB as inner solvers for the Hermitian part. The total numbers of iterations and the wall-clock times, measured in seconds, are shown in Table 1.

            HSS-CG    HSS-BB    ORTHODIR
# iters
time (s)
Table 1: Results of different methods for problem (35) with , and .

Since the optimal parameters for (35) with  are less than , which may lead to stability problems, we choose  for all tests. We conduct repeated experiments and report only the average computation times. We also include the results of ORTHODIR [28] applied to (1) for comparison. The comparison of costs is shown in Table 2.

method      dot products    vector updates    matrix-vector    storage
CG
BB
CGNE
HSS         solver          solver            solver           matrices
ORTHODIR
Table 2: Summary of operations per iteration and storage requirements. In the HSS row, the term “solver” represents an inner solver such as CG, BB or CGNE, while the storage comprises the Hermitian and skew-Hermitian parts of  and the residual vector.

As expected, BB is less efficient than CG in terms of computation time, but it shows a clear advantage in storage requirements and in resistance to perturbations [13, 24]. In addition, with either BB or CG as inner solver, HSS performs better than ORTHODIR. The major drawback of ORTHODIR is that the computational work and storage requirements per iteration grow linearly with the iteration number. This drawback can be reduced with a restarted version of ORTHODIR, but this increases the total number of iterations and in the end does not reduce the computation time. The choice between HSS and Krylov subspace methods depends on how expensive the sparse matrix-vector multiplications are in comparison with the vector updates, and on how much storage is available.

5 Conclusion

Gradient iterations provide a versatile tool in linear algebra. Apart from parameter estimates related to spectral properties, steepest descent variants have also recently been used with success as iterative methods [12, 11, 16]. This paper extends the spectral properties of gradient iterations and gives an application to the Hermitian and skew-Hermitian splitting method. Note that this approach can be extended without difficulty to other splitting methods in which a parameter needs to be estimated (see Section 1 and the references therein). Our experiments confirm that the gradient-enhanced HSS method can be an attractive alternative to the original one.

Acknowledgments

This work was supported by the French national programme LEFE/INSU and the project ADOM (Méthodes de décomposition de domaine asynchrones) of the French National Research Agency (ANR).

References

  • [1] H. Akaike. On a successive transformation of probability distribution and its application to the analysis of the optimum gradient method. Ann. Inst. Stat. Math., 11(1):1–16, 1959.
  • [2] Z.-Z. Bai. Several splittings for non-Hermitian linear systems. Sci. China Ser. A, 51(8):1339–1348, 2008.
  • [3] Z.-Z. Bai, G. H. Golub, and M. K. Ng. Hermitian and skew-Hermitian splitting methods for non-Hermitian positive definite linear systems. SIAM J. Matrix Anal. Appl., 24(3):603–626, 2003.
  • [4] Z.-Z. Bai, G. H. Golub, and M. K. Ng. On successive-overrelaxation acceleration of the Hermitian and skew-Hermitian splitting iterations. Numer. Linear Algebra Appl., 14(4):319–335, 2007.
  • [5] J. Barzilai and J. M. Borwein. Two-point step size gradient methods. IMA J. Numer. Anal., 8(1):141–148, 1988.
  • [6] M. Benzi. A generalization of the Hermitian and skew-Hermitian splitting iteration. SIAM J. Matrix Anal. Appl., 31(2):360–374, 2009.
  • [7] D. Bertaccini, G. H. Golub, S. S. Capizzano, and C. T. Possio. Preconditioned HSS methods for the solution of non-Hermitian positive definite linear systems and applications to the discrete convection-diffusion equation. Numer. Math., 99(3):441–484, 2005.
  • [8] A. L. Cauchy. Méthode générale pour la résolution des systèmes d’équations simultanées. Comp. Rend. Sci. Paris, 25(1):536–538, 1847. (in French).
  • [9] Y.-H. Dai and L.-Z. Liao. R-linear convergence of the Barzilai and Borwein gradient method. IMA J. Numer. Anal., 22(1):1–10, 2002.
  • [10] Y.-H. Dai and Y.-X. Yuan. Analysis of monotone gradient methods. J. Ind. Manag. Optim., 1(2):181–192, 2005.
  • [11] R. De Asmundis, D. di Serafino, W. W. Hager, G. Toraldo, and H. Zhang. An efficient gradient method using the Yuan steplength. Comput. Optim. Appl., 59(3):541–563, 2014.
  • [12] R. De Asmundis, D. di Serafino, F. Riccio, and G. Toraldo. On spectral properties of steepest descent methods. IMA J. Numer. Anal., 33(4):1416–1435, 2013.
  • [13] R. Fletcher. On the Barzilai-Borwein method. In L. Qi, K. Teo, and X. Yang, editors, Optimization and Control with Applications, pages 235–256. Springer, Boston, MA, 2005.
  • [14] R. W. Freund. Conjugate gradient-type methods for linear systems with complex symmetric coefficient matrices. SIAM J. Sci. Stat. Comput., 13(1):425–448, 1992.
  • [15] A. Friedlander, J. M. Martínez, B. Molina, and M. Raydan. Gradient method with retards and generalizations. SIAM J. Numer. Anal., 36(1):275–289, 1999.
  • [16] C. C. Gonzaga and R. M. Schneider. On the steepest descent algorithm for quadratic functions. Comput. Optim. Appl., 63(2):523–542, 2016.
  • [17] M. R. Hestenes and E. Stiefel. Methods of conjugate gradients for solving linear systems. J. Res. Natl. Bur. Stand., 49(6):409–436, 1952.
  • [18] J. Nocedal, A. Sartenaer, and C. Zhu. On the behavior of the gradient norm in the steepest descent method. Comput. Optim. Appl., 22(1):5–35, 2002.
  • [19] D. W. Peaceman and H. H. Rachford. The numerical solution of parabolic and elliptic differential equations. J. Soc. Indust. Appl. Math., 3(1):28–41, 1955.
  • [20] M. Pourbagher and D. K. Salkuyeh. On the solution of a class of complex symmetric linear systems. Appl. Math. Lett., 76:14–20, 2018.
  • [21] M. Raydan. On the Barzilai and Borwein choice of steplength for the gradient method. IMA J. Numer. Anal., 13(3):321–326, 1993.
  • [22] Y. Saad. Iterative Methods for Sparse Linear Systems. SIAM, Philadelphia, 2nd edition, 2003.
  • [23] D. K. Salkuyeh, D. Hezari, and V. Edalatpour. Generalized successive overrelaxation iterative method for a class of complex symmetric linear system of equations. Int. J. Comput. Math., 92(4):802–815, 2015.
  • [24] K. van den Doel and U. M. Ascher. The chaotic nature of faster gradient descent methods. J. Sci. Comput., 51(3):560–581, 2012.
  • [25] A. van der Sluis and H. A. van der Vorst. The rate of convergence of conjugate gradients. Numer. Math., 48(5):543–560, 1986.
  • [26] S.-L. Wu. Several variants of the Hermitian and skew-Hermitian splitting method for a class of complex symmetric linear systems. Numer. Linear Algebra Appl., 22(2):338–356, 2015.
  • [27] S.-L. Wu and C.-X. Li. Modified complex-symmetric and skew-Hermitian splitting iteration method for a class of complex-symmetric indefinite linear systems. Numer. Algorithms, 76(1):93–107, 2017.
  • [28] D. M. Young and K. C. Jea. Generalized conjugate-gradient acceleration of nonsymmetrizable iterative methods. Linear Algebra Appl., 34:159–194, 1980.
  • [29] Y.-X. Yuan. A new stepsize for the steepest descent method. J. Comput. Math., 24(2):149–156, 2006.