1 Introduction
Consider the linear system
(1) $Ax = b$,
where $A \in \mathbb{R}^{n \times n}$ is symmetric positive definite (SPD) and $b \in \mathbb{R}^n$. The solution $x^{*}$ is the unique global minimizer of the strictly convex quadratic function
(2) $f(x) = \tfrac{1}{2} x^{\top} A x - b^{\top} x$.
The gradient method is of the form
(3) $x_{k+1} = x_k - \alpha_k g_k$,
where $g_k = \nabla f(x_k) = A x_k - b$ and $\alpha_k > 0$ is the steplength. The steepest descent (SD) method, originally proposed in [4], defines the steplength by the reciprocal of a Rayleigh quotient of the Hessian matrix $A$,
(4) $\alpha_k^{SD} = \dfrac{g_k^{\top} g_k}{g_k^{\top} A g_k}$,
which is also called the Cauchy steplength. It minimizes the function value, or equivalently the error in the $A$-norm, and thus gives a theoretically optimal result in each step,
$\alpha_k^{SD} = \arg\min_{\alpha > 0} f(x_k - \alpha g_k) = \arg\min_{\alpha > 0} \|x_k - \alpha g_k - x^{*}\|_A,$
where $x^{*} = A^{-1} b$ and $\|v\|_A = \sqrt{v^{\top} A v}$. This classical method is known to behave badly in practice: the directions generated tend to asymptotically alternate between two orthogonal directions, leading to slow convergence [1].
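As a concrete illustration of iterations (3)–(4), the following is a minimal sketch (not taken from the paper; the test matrix and starting point are arbitrary) showing SD on an ill-conditioned quadratic:

```python
import numpy as np

def steepest_descent(A, b, x0, tol=1e-8, max_iter=10_000):
    """Gradient method (3) with the Cauchy steplength (4)."""
    x = x0.astype(float).copy()
    for k in range(max_iter):
        g = A @ x - b                    # gradient of f(x) = 0.5*x'Ax - b'x
        if np.linalg.norm(g) < tol:
            return x, k
        alpha = (g @ g) / (g @ (A @ g))  # Cauchy steplength: exact line search
        x -= alpha * g
    return x, max_iter

# Ill-conditioned 2x2 example: the iterates zigzag between two directions.
A = np.diag([1.0, 100.0])
b = np.array([1.0, 1.0])
x, iters = steepest_descent(A, b, np.array([5.0, 5.0]))
```

Running this on the $2 \times 2$ example above exhibits the slow zigzagging behavior: the iteration count grows with the condition number even though each individual step is optimal.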
The first gradient method with retards is the Barzilai–Borwein (BB) method, originally proposed in [3]. The BB method is of the form
$\alpha_k^{BB1} = \dfrac{s_{k-1}^{\top} s_{k-1}}{s_{k-1}^{\top} y_{k-1}},$
which remedies the convergence issue for ill-conditioned problems by using a nonmonotone steplength. The motivation arose from providing a two-point approximation to the quasi-Newton methods, namely
$\min_{\alpha} \left\| \alpha^{-1} s_{k-1} - y_{k-1} \right\|,$
where $s_{k-1} = x_k - x_{k-1}$ and $y_{k-1} = g_k - g_{k-1}$. Notice that $\alpha_k^{BB1} = \alpha_{k-1}^{SD}$. There exists a similar method developed by symmetry in [3],
$\alpha_k^{BB2} = \dfrac{s_{k-1}^{\top} y_{k-1}}{y_{k-1}^{\top} y_{k-1}},$
which imposes as well a quasi-Newton property
$\min_{\alpha} \left\| s_{k-1} - \alpha y_{k-1} \right\|.$
We remark that $\alpha_k^{BB2} = \alpha_{k-1}^{MG}$; see Section 2. Practical experience is generally in favor of BB1. The convergence analysis of these methods was given in [29] and [7]. The preconditioned version was established in [25]. A more recent chapter [15] discussed the efficiency of BB. In the years that followed, numerous generalizations have appeared, such as alternate methods [5, 9], cyclic methods [18, 5, 6], adaptive methods [36, 17], and some general frameworks [18, 5, 35].
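The BB update is easy to sketch in code (an illustrative implementation, not from the paper; the first iteration falls back to the Cauchy steplength since no previous step exists, and the test problem is arbitrary):

```python
import numpy as np

def bb_method(A, b, x0, variant=1, tol=1e-8, max_iter=10_000):
    """Barzilai-Borwein method: BB1 uses s's/s'y, BB2 uses s'y/y'y."""
    x = x0.astype(float).copy()
    g = A @ x - b
    alpha = (g @ g) / (g @ (A @ g))      # no previous step yet: use Cauchy
    for k in range(max_iter):
        if np.linalg.norm(g) < tol:
            return x, k
        x_new = x - alpha * g
        g_new = A @ x_new - b
        s, y = x_new - x, g_new - g      # two-point quasi-Newton data
        alpha = (s @ s) / (s @ y) if variant == 1 else (s @ y) / (y @ y)
        x, g = x_new, g_new
    return x, max_iter

A = np.diag([1.0, 10.0, 100.0])
b = np.ones(3)
x, iters = bb_method(A, b, np.zeros(3))
```

For SPD quadratics $s_{k-1}^{\top} y_{k-1} > 0$, so the steplengths are well defined; the gradient norms are typically nonmonotone along the iterations, in contrast with SD.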
There exist several auxiliary steplengths acting as accelerators of other methods. More precisely, occasionally performing auxiliary iterative steps can often improve the global convergence. For example, in order to find the unique minimizer of a two-dimensional problem in finitely many iterations, [34] proposed an ingenious steplength as follows
$\alpha_k^{Y} = \dfrac{2}{\dfrac{1}{\alpha_{k-1}^{SD}} + \dfrac{1}{\alpha_k^{SD}} + \sqrt{\left(\dfrac{1}{\alpha_{k-1}^{SD}} - \dfrac{1}{\alpha_k^{SD}}\right)^{2} + \dfrac{4 \|g_k\|^{2}}{\left(\alpha_{k-1}^{SD} \|g_{k-1}\|\right)^{2}}}},$
which is called the Yuan steplength. Recently, [14] proposed a new gradient method that also exploits the spectral properties of SD. The improvement resorts to a special steplength
$\tilde{\alpha}_k = \left( \dfrac{1}{\alpha_{k-1}^{SD}} + \dfrac{1}{\alpha_k^{SD}} \right)^{-1}.$
In one direction, these steplengths give rise to some efficient gradient methods. For example, [10] provided several alternate steps, of which we mention here the second variant, which alternates Cauchy steps and Yuan steps in a cyclic fashion and seems to be the most promising variant according to the experiments. As usual, it does not have a specific name; here we call it the Dai–Yuan (DY) method [17]. A closer examination of the Yuan variants revealed that they have a distinguished property called "decreasing together" [10]: the gradient components along all eigenvectors decrease at comparable rates. This means that DY does not sink into any lower subspace spanned by eigenvectors. Experiments have shown that BB also has this feature. An important difference comes from the fact that BB uses a nonmonotone steplength, whereas DY is monotone and thus more stable.
On the other hand, the auxiliary steps lead to gradient methods with alignment, such as
$\alpha_k = \begin{cases} \alpha_k^{SD}, & \text{if } \operatorname{mod}(k, h+m) < h, \\ \tilde{\alpha}_s, & \text{otherwise}, \end{cases}$
with $h \ge 2$ and $m \ge 1$, where $\tilde{\alpha}_s$ is computed at the last Cauchy step $s$ of the current cycle. This method is called steepest descent with alignment (SDA). Here, we choose the version described in [13] without using the switch condition illustrated in [14], and vary the form while leaving the alignment property unchanged. Shortly afterwards, a similar method based on the Yuan steplength was presented in [12], called steepest descent with constant steplength (SDC), which is of the form
$\alpha_k = \begin{cases} \alpha_k^{SD}, & \text{if } \operatorname{mod}(k, h+m) < h, \\ \alpha_s^{Y}, & \text{otherwise}, \end{cases}$
with $h \ge 2$ and $m \ge 1$. The main feature of this method is to selectively foster the reduction of gradient components along the eigenvectors of $A$, reducing the search space into smaller and smaller dimensions; the problem thereby tends to have a better and better condition number [12]. We note that the motivations of SDA and SDC are different according to [14] and [12]. Since their derivations both involve spectral analysis of the Cauchy step, we regard both of them here as alignment methods. These two methods seem to be the state of the art among gradient methods and tend to give the best performance. Recently, [19] introduced a general framework of Cauchy steplength with alignment, which breaks the Cauchy cycle by periodically applying some short steplengths.
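To make the cyclic structure concrete, here is a hedged sketch of an SDC-type method (my own illustrative code, not the authors' implementation: a Yuan-type constant steplength is built from the last two Cauchy steps of each cycle, and the test problem and the cycle lengths h, m are arbitrary choices):

```python
import numpy as np

def sdc_like(A, b, x0, h=4, m=3, tol=1e-8, max_iter=20_000):
    """Cycle of h Cauchy steps followed by m steps with a constant
    Yuan-type steplength built from the last two Cauchy steps."""
    x = x0.astype(float).copy()
    alpha_prev = gnorm_prev = alpha_const = None
    for k in range(max_iter):
        g = A @ x - b
        gnorm = np.linalg.norm(g)
        if gnorm < tol:
            return x, k
        if k % (h + m) < h:
            alpha = (g @ g) / (g @ (A @ g))            # Cauchy steplength
            if k % (h + m) == h - 1 and alpha_prev is not None:
                a, c = 1.0 / alpha_prev, 1.0 / alpha
                alpha_const = 2.0 / (a + c + np.sqrt(
                    (a - c) ** 2 + 4.0 * gnorm ** 2 / (alpha_prev * gnorm_prev) ** 2))
            alpha_prev, gnorm_prev = alpha, gnorm
        else:
            alpha = alpha_const                        # constant short step
        x -= alpha * g
    return x, max_iter

A = np.diag(np.arange(1.0, 11.0))
b = np.ones(10)
x, iters = sdc_like(A, b, np.zeros(10))
```

The short constant steps damp the gradient components with large eigenvalues more slowly than the Cauchy steps would, which is precisely what drives the alignment of the gradient towards a single eigenvector.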
Despite the good practical performance of alignment methods, all promising formulations are based on the Cauchy steplength in order to ensure the alignment feature. It is natural to relax this restriction and step outside the framework. In this paper, we address this issue and investigate gradient methods that possess the alignment property without the Cauchy steplength. In Section 2, we analyze the spectral properties of the minimal gradient step. In Section 3, we introduce some new gradient methods by virtue of the basic steplengths and discuss their alignment property. In Section 4, we focus on the convergence analysis of the new methods. A set of numerical experiments is presented in Section 5 and concluding remarks are drawn in Section 6.
2 Spectral analysis of minimal gradient
The minimal gradient (MG) method was proposed in [23] and is of the form
$\alpha_k^{MG} = \dfrac{g_k^{\top} A g_k}{g_k^{\top} A^{2} g_k}.$
It minimizes the norm of the gradient at each step,
$\alpha_k^{MG} = \arg\min_{\alpha > 0} \| g(x_k - \alpha g_k) \|,$
where $\|\cdot\|$ denotes the Euclidean norm of a vector. Traditionally it does not have a specific name. From [22] we know that it was originally called "minimal residues". However, this term might cause confusion since there exists a Krylov subspace method called MINRES [27] which minimizes the norm of the residual through the Lanczos process. On the other hand, MG is also a special case of the Orthomin($m$) method when $m = 1$ [20], and is thus sometimes called OM [2, 33]. Here, the name "minimal gradient" comes from [9] since it gives an optimal gradient result in each step. We can assume without loss of generality that
$A = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n), \qquad 0 < \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n,$
where $\{\lambda_1, \lambda_2, \ldots, \lambda_n\}$ is the set of eigenvalues of $A$, and $\{d_1, d_2, \ldots, d_n\}$ is the set of associated eigenvectors. Let $\kappa$ be the condition number of $A$ such that
(5) $\kappa = \dfrac{\lambda_n}{\lambda_1}.$
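As a concrete reference point, a minimal implementation of the MG iteration may help fix ideas (an illustrative sketch, not the paper's code; the test problem is arbitrary):

```python
import numpy as np

def minimal_gradient(A, b, x0, tol=1e-8, max_iter=10_000):
    """MG method: alpha_k = (g'Ag)/(g'A^2g) minimizes ||g_{k+1}|| over alpha."""
    x = x0.astype(float).copy()
    for k in range(max_iter):
        g = A @ x - b
        if np.linalg.norm(g) < tol:
            return x, k
        Ag = A @ g
        x -= ((g @ Ag) / (Ag @ Ag)) * g
    return x, max_iter

A = np.diag([2.0, 7.0, 30.0])
b = np.ones(3)
x, iters = minimal_gradient(A, b, np.zeros(3))
```

Unlike SD, the quantity that decreases monotonically here is the Euclidean norm of the gradient rather than the function value along the exact line search.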
From (3) we can deduce that
(6) $g_{k+1} = (I - \alpha_k A) g_k.$
There exist real numbers $\mu_k^1, \ldots, \mu_k^n$ such that
(7) $g_k = \sum_{i=1}^{n} \mu_k^i d_i.$
Then, substituting (7) into (6) implies
$\mu_{k+1}^i = (1 - \alpha_k \lambda_i)\, \mu_k^i, \qquad i = 1, \ldots, n.$
We know from [1] that the SD method asymptotically reduces to a search in the two-dimensional subspace generated by the two eigenvectors corresponding to the largest and the smallest eigenvalues of $A$. Eventually the directions generated tend to zigzag between two orthogonal directions, which gives rise to a slow convergence rate. This argument was demonstrated by using the following lemma; see [1] and [16] for more details.
Lemma 1.
Let $\nu_k$ be a probability measure attached to $\{\lambda_1, \ldots, \lambda_n\}$, where $\nu_0^1 \neq 0$ and $\nu_0^n \neq 0$. Consider a transformation $T$ such that $\nu_{k+1} = T(\nu_k)$. Then, the even and odd subsequences $\{\nu_{2k}\}$ and $\{\nu_{2k+1}\}$ converge to limit measures concentrated on $\{\lambda_1, \lambda_n\}$, with weights $1/(1+c^2)$ and $c^2/(1+c^2)$ exchanged between the two subsequences, for some $c \neq 0$.
We now give our main result on the spectral properties of MG. These arguments lead to the gradient methods with alignment which shall be described in Section 3.
Theorem 2.
Consider the linear system $Ax = b$ where $A$ is SPD and $b \in \mathbb{R}^n$. Assume that the sequence of solution vectors $\{x_k\}$ is generated by the MG method. If $\lambda_1 < \lambda_2 < \cdots < \lambda_n$ and the starting point $x_0$ is such that $\mu_0^1 \neq 0$ and $\mu_0^n \neq 0$, then for some constant $c \neq 0$, the following results hold:
(a) (8), (9)
(b) (10), (11)
(c) (12)
(d) (13), (14)
Proof.
We first prove (8) and (9). We have
Together with (7), this implies that
For any and , let us write , it follows that
(15) 
Moreover, we define a probability measure
(16) 
from which we notice that . Hence,
Notice that in Lemma 1 can be expressed as without loss of generality. Substituting (15) and applying again (16), it follows that
Along with Lemma 1 the desired result follows.
For the argument (b), notice that
Since argument (a) has been proved, relations (10) and (11) trivially follow by applying (8) and (9).
Then we prove argument (c). For any $k$, it follows from (6) that
Combining (8) and (10) implies
After some simplification, we can obtain (12) when the iteration number in the denominator is even. In an analogous fashion, combining (9) and (11) yields
One finds that the final result of the odd case converges also to the same limit, which is the desired conclusion.
Remark.
The assumption used in Theorem 2 is not restrictive: if there exist some repeated eigenvalues, then we can choose the corresponding eigenvectors so that the superfluous components vanish [15]. Moreover, if $\mu_0^1$ or $\mu_0^n$ equals zero, then the second condition can simply be replaced by the components involving inner indices without changing the results discussed later on.
Note that argument (a) in Theorem 2 has been proved in [28] for a framework called gradient algorithms, while results (b) to (d) for the MG method have not appeared in the literature. Result (b) shows that MG also exhibits the zigzag behavior, namely, it alternates between two directions. The implications of Theorem 2 will be seen later in Section 3. For now, we give the asymptotic behavior of the quadratic function for completeness.
Theorem 3.
Proof.
For any $k$, it follows from (2) that
Let us write as defined in (15) and (16), in which case we obtain
If $k$ is an even number, from (8), one finds that
Notice that
which yields the first equation. Similarly, if $k$ is an odd number, it follows that
The numerator can be merged as follows
which yields the second result. Finally, (19) follows immediately by combining (17), (18) and (12). This completes our proof. ∎
3 New alignment methods without Cauchy steplength
As far as we know, all existing gradient methods with alignment are based on the Cauchy steplength. After a further rearrangement of steps, [19] concluded that one could break the Cauchy cycle by periodically applying some short steplengths to accelerate the convergence of gradient methods. We show here that such a condition is not necessary and that several methods potentially possessing the same feature without the Cauchy step can be derived.
[14] observed that a constant steplength equal to $1/(\lambda_1 + \lambda_n)$ could lead to the alignment property. Here we extend it to a more general case.
Theorem 4.
Proof.
We have
By [30], it is easy to deduce that the sequence converges to with a steplength . Hence, the first statement holds. One finds that
Let
For (22) to be satisfied, one needs to impose the condition for all , which yields
The second one is obviously satisfied, while the first one leads to
If equality holds, then
It is clear that . Then the second statement trivially follows, which completes the proof. ∎
Note that leads to the trivial case , and thus the limit in both (21) and (22) equals . From Theorem 4 we find that condition (20) has a twofold effect: driving the alignment property when the strict partial order holds, as shown in (22), and forcing the search into a two-dimensional space in the equality case, as shown in (21). This means that if there exist some steps asymptotically making the equality of (21) attainable, then the method has a similar tendency to the SD method, namely, alternating between two orthogonal directions. On the other hand, we can add a fractional factor to periodically break the cycle. This asymptotically yields a constant steplength strictly smaller than , leading to an alignment process in the subsequent iterations according to (22).
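The alignment effect of a fixed short steplength can be seen directly from the recursion (6): with a constant step, each gradient component is damped by a constant factor, so the slowest-damped component eventually dominates and the gradient direction aligns with a single eigenvector. A small numerical illustration (arbitrary eigenvalues and steplength chosen by me for the demonstration):

```python
import numpy as np

# With a constant steplength alpha, (6) gives g_{k+1} = (I - alpha*A) g_k,
# so for diagonal A the i-th component is damped by the factor (1 - alpha*lam_i).
lam = np.array([1.0, 3.0, 10.0])   # eigenvalues of a diagonal A
alpha = 0.09                       # short constant step: |1 - alpha*lam_1| is largest
g = np.ones(3)
for _ in range(200):
    g = (1.0 - alpha * lam) * g    # componentwise damping
    g /= np.linalg.norm(g)         # normalize: we only track the direction
# g is now essentially aligned with the eigenvector of lam_1
```

Since $|1 - \alpha \lambda_1| > |1 - \alpha \lambda_i|$ for all $i > 1$ here, the normalized gradient converges to the eigenvector associated with the smallest eigenvalue.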
Recall that [8] proposed a gradient method of the form
$\alpha_k^{AO} = \dfrac{\|g_k\|}{\|A g_k\|}.$
It asymptotically converges to the optimal steplength
$\alpha^{*} = \dfrac{2}{\lambda_1 + \lambda_n},$
which minimizes the norm of the iteration matrix, $\|I - \alpha A\|_2$. Thus we call it the asymptotically optimal (AO) method. Notice that the following relationship holds
(23) $\alpha_k^{MG} \le \alpha_k^{AO} \le \alpha_k^{SD},$
which can easily be proved by the Cauchy–Schwarz inequality
$\left( g_k^{\top} A g_k \right)^{2} \le \left( g_k^{\top} g_k \right) \left( g_k^{\top} A^{2} g_k \right).$
It is known that AO generates a monotone function-value curve and often leads to slow convergence.
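The ordering of the three steplengths can be checked numerically (a quick sanity check of my own, with an arbitrary diagonal SPD test matrix and random gradients):

```python
import numpy as np

rng = np.random.default_rng(0)
lam = rng.uniform(1.0, 100.0, size=8)
A = np.diag(lam)                                   # SPD by construction
for _ in range(100):
    g = rng.standard_normal(8)
    Ag = A @ g
    a_mg = (g @ Ag) / (Ag @ Ag)                    # minimal gradient steplength
    a_ao = np.linalg.norm(g) / np.linalg.norm(Ag)  # AO steplength
    a_sd = (g @ g) / (g @ Ag)                      # Cauchy steplength
    assert a_mg <= a_ao <= a_sd                    # the claimed ordering
```

Both inequalities reduce to the same Cauchy–Schwarz bound $(g^{\top} A g)^2 \le (g^{\top} g)(g^{\top} A^2 g)$, which is why a single inequality proves the whole chain.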
We observe that the limit of AO satisfies condition (22) and may potentially be improved by a cyclic breaking. For example, we can choose a shorter steplength to constantly align the gradient vector to the one-dimensional space spanned by $d_1$. Let $\tilde{\alpha}_k = \gamma \alpha_k^{AO}$, where $\gamma \in (0, 1)$. It follows that
From Theorem 4, we observe that $\tilde{\alpha}_k$ can asymptotically trigger the alignment behavior. Hence, we can write a new gradient method called AO with alignment (AOA) as follows
(24) 
with .
Important differences between SDA and AOA come from the fact that the Cauchy step in SDA zigzags between two orthogonal directions by itself, while the AO step in AOA converges to a constant, and that constant subsequently leads to the same feature.
On the other hand, since the spectral properties of MG have been studied in Section 2, we are now prepared to propose new methods based on them. We first introduce some notation
Note that Y2 has been proposed in [10] as a component of the two-dimensional finite termination method.
Theorem 5.
Consider the linear system $Ax = b$ where $A$ is SPD and $b \in \mathbb{R}^n$. Assume that the sequence of solution vectors $\{x_k\}$ is generated by the MG method. If $\lambda_1 < \lambda_2 < \cdots < \lambda_n$ and the starting point $x_0$ is such that $\mu_0^1 \neq 0$ and $\mu_0^n \neq 0$, then the following results hold
(25) 
(26) 
and
(27) 
Proof.
The first conclusion follows immediately by combining (10) and (11). For the second argument, we have
By combining (10), (11), (13) and (14), it follows that
Hence, one can see that
which implies the second conclusion after some simplification. Further, along with (25), we have
This completes our proof. ∎
One may conclude from Theorem 5 that A2 and Y2 are similar to the auxiliary steplengths discussed in [14] and [12]. However, since MG has a shorter steplength than SD, we expect that the former might be smoother than the latter. After a substitution of labels, we are able to define MG with alignment (MGA) and MG with constant steplength (MGC) as follows
(28) 
(29) 
with . Recall that the motivation in