1 Introduction
The conjugate gradient (CG) method is an efficient numerical method for solving strongly convex quadratic programming (QP) problems of the form

(1.1) $\min_{x\in\mathbb{R}^n}\; q(x) := \tfrac{1}{2}x^\top Ax + b^\top x,$

or equivalently, the linear system $Ax = -b$, where $A \in \mathbb{R}^{n\times n}$ is a symmetric positive definite matrix and $b \in \mathbb{R}^n$. It terminates at the unique optimal solution of (1.1) in a finite number of iterations. Moreover, it is suitable for solving large-scale problems since it only requires matrix-vector multiplications per iteration (e.g., see [24] for details). The CG method has also been generalized to minimize a convex quadratic function over a box or a ball (e.g., see [11, 12, 25, 26, 27]). In this paper we are interested in generalizing the CG method to solve the regularized convex QP:

(1.2) $\min_{x\in\mathbb{R}^n}\; F(x) := \tfrac{1}{2}x^\top Ax + b^\top x + \lambda\|x\|_1,$

where $A \in \mathbb{R}^{n\times n}$ is a symmetric positive semidefinite matrix, $b \in \mathbb{R}^n$, and $\lambda > 0$ is a regularization parameter. Throughout this paper we make the following assumption for problem (1.2).
Assumption 1. Problem (1.2) has at least one optimal solution.
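For reference, the CG iteration for a strongly convex QP of the form (1.1) can be sketched in a few lines. This is a standard textbook implementation in pure Python; the function name and the list-of-rows matrix representation are our choices, not the paper's:

```python
def conjugate_gradient(A, b, tol=1e-12, max_iter=None):
    """Solve A x = b for a symmetric positive definite A by the CG method.

    A is a list of rows (each a list of floats), b a list of floats.
    Each iteration costs one matrix-vector product, which is what makes
    CG attractive for large-scale problems.
    """
    n = len(b)
    if max_iter is None:
        max_iter = n  # in exact arithmetic CG terminates in at most n steps
    x = [0.0] * n
    r = list(b)                 # residual r = b - A x (x = 0 initially)
    p = list(r)                 # initial search direction
    rs = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if rs <= tol:
            break
        Ap = [sum(aij * pj for aij, pj in zip(row, p)) for row in A]
        alpha = rs / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x
```

For instance, on the 2-by-2 system with A = [[4, 1], [1, 3]] and right-hand side [1, 2], CG recovers the exact solution in two iterations.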
Over the last decade, a great deal of attention has been focused on problem (1.2) due to numerous applications in image sciences, machine learning, signal processing and statistics (e.g., see [8, 16, 5, 14, 30, 29] and the references therein). Considerable effort has been devoted to developing efficient algorithms for solving (1.2) (e.g., see [2, 23, 30, 33, 15, 32, 31, 22]). These are iterative methods capable of producing an approximate solution to (1.2); nevertheless, they generally cannot terminate at an optimal solution of (1.2). Recently, Byrd et al. [6] proposed a method called iiCG to solve (1.2) that combines the iterative soft-thresholding algorithm (ISTA) [2, 10, 30] with the CG method. Under the assumption that the underlying matrix is symmetric positive definite, it was shown in [6] that the sequence generated by iiCG converges to the unique optimal solution of (1.2), and if additionally this solution satisfies strict complementarity, iiCG terminates in a finite number of iterations. Its convergence is, however, unknown when the matrix is positive semidefinite (but not definite), which is typical for many instances of (1.2) arising in applications.

In this paper we propose some generalized CG (GCG) methods for solving (1.2) that terminate at an optimal solution of (1.2) in a finite number of iterations with no additional assumption. At each iteration, our methods first identify a certain face of some orthant and then either perform an exact line search along the direction of the negative projected minimum-norm subgradient of the objective or execute a CG subroutine that conducts a sequence of CG iterations until a CG iteration crosses the boundary of this face or an approximate minimizer of the objective over this face or a subface is found. The purpose of the exact line search step is to release some zero components of the current iterate so that the objective value is sufficiently reduced. The aim of executing a CG subroutine is to update the nonzero components of the current iterate, which also results in a reduction of the objective. We determine which type of step should be taken by comparing the magnitude of some components of the minimum-norm subgradient of the objective to that of its remaining components.
Our methods are substantially different from the iiCG method [6]. In fact, at each iteration, iiCG either performs a proximal gradient step or executes a single CG iteration. It determines which type of step to take by comparing the magnitude of some components of a proximal gradient of the objective to that of its remaining components.
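For context, the ISTA building block that iiCG interleaves with CG iterations is a gradient step on the smooth quadratic part followed by componentwise soft-thresholding. A minimal sketch, with all names ours:

```python
def soft_threshold(v, t):
    """Proximal operator of t * |.|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def ista_step(x, grad, step, lam):
    """One ISTA iteration for an objective q(x) + lam * ||x||_1:
    take a gradient step on q, then soft-threshold componentwise
    to account for the l1 term."""
    return [soft_threshold(xi - step * gi, step * lam)
            for xi, gi in zip(x, grad)]
```

Note that soft-thresholding can zero out components exactly, which is why such methods identify sparsity patterns but, as remarked above, need not terminate finitely at an optimal solution.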
In order to analyze the convergence of our GCG methods, we establish some error bound results for problem (1.2). We also conduct a dedicated analysis of the aforementioned exact line search and the CG subroutine. Using these results, we show that the methods terminate at an optimal solution of (1.2) in a finite number of iterations. To the best of our knowledge, the GCG methods are the first methods for solving (1.2) with finite convergence. We also show that our methods are capable of finding an approximate solution of (1.2) by allowing some inexactness in the execution of the CG subroutine. The overall arithmetic operation cost of our GCG methods for finding an approximate optimal solution has a more favorable dependence on the accuracy tolerance than that of the accelerated proximal gradient method [2, 23]. In addition, it should be mentioned that these methods can be extended to solve the following box-constrained convex QP with finite convergence:
(1.3) $\min\{\tfrac{1}{2}x^\top Ax + b^\top x : l \le x \le u\},$

where $A$ is symmetric positive semidefinite, $b \in \mathbb{R}^n$, and $l, u \in \mathbb{R}^n$ with $l \le u$. As for finite convergence, the existing CG-type methods [11, 12] for (1.3), however, require that $A$ be symmetric positive definite. The extension of our methods to problem (1.3) is not included in this paper due to length limitations.
The rest of the paper is organized as follows. In Section 2, we establish some error bound results for problem (1.2). In Section 3, we propose several methods for solving problem (1.2) and establish their finite convergence. In Section 4, we discuss the application of our methods to regularized least-squares problems and develop a practical termination criterion for them. We conduct numerical experiments in Section 5 to compare the performance of our methods with some state-of-the-art algorithms for solving problem (1.2). In Section 6 we present some concluding remarks. Finally, in the appendix we study some convergence properties of the standard CG method for solving (possibly not strongly) convex QP.
1.1 Notation and terminology
For a nonzero symmetric positive semidefinite matrix $A$, we define a generalized condition number of $A$ as

(1.4) $\kappa(A) := \|A\|_2\,\|A^{\dagger}\|_2 = \lambda_{\max}(A)/\lambda^{+}_{\min}(A),$

where $A^{\dagger}$ is the Moore-Penrose pseudoinverse of $A$, $\lambda_{\max}(A)$ is the largest eigenvalue of $A$, and $\lambda^{+}_{\min}(A)$ is the smallest positive eigenvalue of $A$. Clearly, it reduces to the standard condition number when $A$ is symmetric positive definite. In addition, for any index set $J$, $|J|$ is the cardinality of $J$ and $A_{JJ}$ is the submatrix of $A$ formed by its rows and columns indexed by $J$. Analogously, $x_J$ is the subvector of $x$ formed by its components indexed by $J$. In addition, the range space and rank of a matrix are denoted by $\mathcal{R}(\cdot)$ and $\mathrm{rank}(\cdot)$, respectively. Let $\mathrm{sign}(\cdot)$ be the standard sign operator, which is conventionally defined as follows:

$\mathrm{sign}(t) = \begin{cases} 1 & \text{if } t > 0, \\ 0 & \text{if } t = 0, \\ -1 & \text{if } t < 0. \end{cases}$
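Given the eigenvalues of a symmetric positive semidefinite matrix, the generalized condition number above is simply the ratio of the largest eigenvalue to the smallest positive one. A small helper (the function name is ours) illustrates this:

```python
def generalized_condition_number(eigenvalues):
    """Generalized condition number of a nonzero symmetric PSD matrix:
    largest eigenvalue divided by the smallest *positive* eigenvalue.
    For a positive definite matrix this is the usual condition number."""
    positive = [ev for ev in eigenvalues if ev > 0.0]
    return max(eigenvalues) / min(positive)
```

For example, a diagonal PSD matrix with diagonal (4, 1, 0) has generalized condition number 4, since the zero eigenvalue is ignored.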
Let $F$ be defined in (1.2) and

(1.5) $q(x) := \tfrac{1}{2}x^\top Ax + b^\top x.$

Let $g(x)$ be the minimum-norm subgradient of $F$ at $x$, which is the projection of the zero vector onto the subdifferential of $F$ at $x$. It follows that

(1.6) $[g(x)]_i = \begin{cases} \nabla_i q(x) + \lambda\,\mathrm{sign}(x_i) & \text{if } x_i \neq 0, \\ \mathrm{sign}(\nabla_i q(x))\max\{|\nabla_i q(x)| - \lambda,\, 0\} & \text{if } x_i = 0, \end{cases}$

where $\nabla_i q(x)$ denotes the $i$th partial derivative of $q$ at $x$. It is known that $x$ is an optimal solution of problem (1.2) if and only if $0 \in \partial F(x)$, where $\partial F$ denotes the subdifferential of $F$. Since $0 \in \partial F(x)$ is equivalent to $g(x) = 0$, $x$ is an optimal solution of (1.2) if and only if $g(x) = 0$.
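Concretely, the minimum-norm subgradient of a function of the form q(x) + lam*||x||_1 can be computed componentwise by projecting zero onto each one-dimensional subdifferential. A sketch under our own naming:

```python
def min_norm_subgradient(grad_q, x, lam):
    """Minimum-norm element of the subdifferential of q(x) + lam * ||x||_1,
    computed componentwise: for x_i != 0 the subdifferential is the single
    point grad_i + lam * sign(x_i); for x_i = 0 we project 0 onto the
    interval [grad_i - lam, grad_i + lam]."""
    g = []
    for gi, xi in zip(grad_q, x):
        if xi != 0.0:
            g.append(gi + lam * (1.0 if xi > 0 else -1.0))
        elif gi > lam:
            g.append(gi - lam)       # interval lies to the right of 0
        elif gi < -lam:
            g.append(gi + lam)       # interval lies to the left of 0
        else:
            g.append(0.0)            # 0 lies inside the interval
    return g

def is_optimal(grad_q, x, lam, tol=1e-10):
    """x solves the regularized QP iff the minimum-norm subgradient vanishes."""
    return all(abs(gi) <= tol for gi in min_norm_subgradient(grad_q, x, lam))
```

The optimality test mirrors the equivalence stated above: the minimum-norm subgradient is zero exactly when zero belongs to the full subdifferential.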
For any , we define
(1.7) 
and also define
(1.8) 
In addition, given any closed set , denotes the distance from to , and denotes the projection of to . Finally, we define
(1.9) 
2 Error bound results
In this section we develop some error bound results for problem (1.2). To proceed, let for any , where and are defined in (1.2). We first bound the gap between and by for all .
Theorem 2.1
Proof. Let denote the set of optimal solutions of (1.2). Notice that the objective of (1.2) is a convex piecewise quadratic function. By [21, Theorem 2.7], there exists some such that
(2.1) 
Let be such that . By and the convexity of , one has
which together with (2.1) implies that the conclusion holds.
We next bound the gap between and by the magnitude of some components of for all .
Theorem 2.2
Proof. Let be arbitrarily chosen, and . If , it is clear that and hence . Also, by convention . These imply the conclusion holds. We now assume . Consider the problem
(2.2) 
In view of the definitions of , , , , and , one can observe that
This together with implies that . By (1.6), (2.2) and the definition of , we also observe that is the minimum-norm subgradient of the objective of (2.2) at . In addition, notice that problem (2.2) is of the same form as (1.2). By these facts and applying Theorem 2.1 to problem (2.2), there exists some (depending on and ) such that
(2.3) 
Let , which is finite due to the fact that all possible choices of are finite. The conclusion immediately follows from this and (2.3).
The error bound presented in Theorem 2.2 is a local error bound. In addition, Theorem 2.2 only ensures the existence of some parameter for the error bound, but its actual value is generally unknown. We next derive a global error bound with a known parameter for problem (1.2) when $A$ is symmetric positive definite. To proceed, we first establish the following lemma.
Lemma 2.1
Suppose and . Let be defined in (1.5) and . Then there holds:
Proof. Let be all eigenvalues of and
the corresponding orthonormal eigenvectors. In addition, let
be an optimal solution of the problem . Clearly, . Moreover, for any , we have for some . These imply

(2.4)
Let . It follows that for all . In view of this and (2.4), we have
This together with the fact yields
Using the definitions of and , (2.4), and for all , one can observe that
The conclusion then immediately follows from the last two relations and the fact that and .
Theorem 2.3
Proof. Let be arbitrarily chosen and let . If , it is clear that and hence . Also, by convention . These imply the conclusion holds. We now assume . Consider the problem
Since is positive definite, so is . It then follows that . By applying Lemma 2.1 to this problem, we obtain that
(2.5) 
In addition, by the definitions of , and , one can observe that for all , where is defined in (1.8). This together with the definitions of and implies . Also, we observe that and . Using these relations and (2.5), we have
and hence the conclusion holds.
3 Generalized conjugate gradient methods for (1.2)
In this section we propose several methods for solving problem (1.2) that terminate at an optimal solution in a finite number of iterations. A key ingredient of these methods is to apply a truncated projected CG (TPCG) method to a sequence of convex QPs over certain faces of some orthants in $\mathbb{R}^n$.
3.1 Truncated projected conjugate gradient methods
In this subsection we present two TPCG methods for finding a (possibly rather rough) approximate solution to a convex QP over a face of some orthant in $\mathbb{R}^n$, in the form of
(3.1) 
where is defined in (1.5), , and form a partition of . For convenience of presentation, we denote by the feasible region of (3.1).
For the first TPCG method, each iterate is obtained by applying the standard projected CG (PCG) method (for problem (3.2), this is equivalent to the CG method applied to the reduced problem obtained by fixing the components indexed by the complement set at zero) to the problem
(3.2) 
until an approximate solution of (3.2) is found or a PCG iterate crosses the boundary of . In the former case, the method outputs the resulting approximate solution. But in the latter case, it outputs the intersection point between the boundary of and the line segment joining the last two PCG iterates. Let be an arbitrary feasible point of problem (3.1) and be given. We now present the first TPCG method for problem (3.1).
Subroutine 1:
Input: , , , , , , , .
Set , , , , .
Repeat

, where

.

If , return and terminate.

.

. If , return and terminate.

.

.
Output: .
Remark 1: The iterations of the above TPCG method are almost identical to those of PCG applied to problem (3.2) except that the step length is chosen to be an intermediate one when an iterate of PCG crosses the boundary of . In addition, if holds at some , the output is on the boundary of . If holds at some , the output is an approximate optimal solution of problem (3.1).
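The truncation described in Remark 1 boils down to computing the largest step length that keeps the next iterate inside the face. A sketch, parameterizing the face by a sign vector (this parameterization and the function name are ours, not the paper's):

```python
def max_step_in_face(x, d, signs):
    """Largest alpha >= 0 such that x + alpha * d stays in the face
    {y : signs[i] * y[i] >= 0 for all i}. Returns float('inf') if the
    ray never leaves the face."""
    alpha = float('inf')
    for xi, di, si in zip(x, d, signs):
        if si * di < 0:                   # moving toward the wall y_i = 0
            alpha = min(alpha, -xi / di)  # step at which component i hits zero
    return alpha
```

The truncated step length of Remark 1 is then the minimum of the exact CG step length and this boundary step length; when the latter is smaller, the output lands on the boundary of the face.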
We next show that under a mild assumption the above method terminates in a finite number of iterations.
Theorem 3.1
Proof. (i) Assume that problem (3.2) is bounded. Suppose for contradiction that Subroutine 1 does not terminate in iterations. Then the iterates , of Subroutine 1 are identical to those generated by the PCG method applied to problem (3.2). Let denote the optimal value of (3.2). It follows from Theorem A.3 (iii) that for ,
By the definition of and Lemma 2.1, we have
Using these relations, we obtain that
In view of this and Theorem A.2 (i), one can easily conclude that the PCG method must terminate at satisfying for some . This contradicts the above supposition.
(ii) Assume that problem (3.2) is unbounded. Suppose for contradiction that Subroutine 1 does not terminate in iterations. Then the iterates , , of Subroutine 1 are identical to those generated by the PCG method applied to problem (3.2). By Theorem A.2 (ii), there must exist some such that as . Recall that is in and problem (3.1) has at least one optimal solution. Thus there exists a least such that lies on the boundary of , and Subroutine 1 thus terminates at iteration , which contradicts the above supposition.
Remark 2: It follows from Theorem 3.1 that when , executes at most (but possibly far fewer than) PCG iterations. On the other hand, when , the number of PCG iterations executed in depends on in .
As seen from step 3) of Subroutine 1, it terminates immediately once an iterate crosses the boundary of the feasible region. In this case, the output may be a rather poor approximate solution to problem (3.1). In order to improve its quality, we resort to an active-set approach that iteratively applies Subroutine 1 to minimize the objective over a decremental subset of the feasible region, formed by incorporating the active constraints of the iterate obtained from the immediately preceding execution of Subroutine 1. Let be an arbitrary feasible point of problem (3.1) and be given. We now present this improved TPCG method for problem (3.1) as follows.
Subroutine 2:
Input: , , , , , , , .
Set , , , , .
Repeat

If , return and terminate.

.

, , , .

.
Output: .
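The control flow of Subroutine 2 is a decremental active-set loop around an inner face solver: whenever the inner solver stops on the boundary of the current face, the newly zero components are moved into the fixed set and the solver is rerun on the smaller face. The skeleton below is our own abstraction (the inner solver is passed in as a callback that reports whether its output landed on the boundary), not the paper's exact subroutine:

```python
def active_set_driver(solve_on_face, x0, free0, max_rounds=None):
    """Sketch of a decremental active-set loop: repeatedly call an inner
    face solver; whenever its output lands on the boundary of the current
    face, move the newly zero components into the fixed set and re-solve
    on the smaller face. `solve_on_face(x, free)` must return a pair
    (new_x, hit_boundary)."""
    x, free = x0, set(free0)
    rounds = max_rounds if max_rounds is not None else len(free) + 1
    for _ in range(rounds):
        x, hit_boundary = solve_on_face(x, free)
        if not hit_boundary:
            return x                      # approximate minimizer on the face
        newly_zero = {i for i in free if x[i] == 0.0}
        if not newly_zero:
            return x
        free -= newly_zero                # shrink the face and continue
    return x
```

Since each boundary hit strictly shrinks the free set, the loop runs at most as many rounds as there are free variables, which mirrors the finite-termination argument of Theorem 3.2.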
We next show that under some suitable assumptions, Subroutine 2 terminates in a finite number of iterations.
Theorem 3.2
Proof. (i) Observe that in step 2) of Subroutine 2, Subroutine 1 (namely, ) is applied to the problem
(3.3) 
where is defined in (3.1). Let denote the feasible region of (3.3). In view of the updating scheme of Subroutine 2 and the definitions of , and , it is not hard to observe that . Since problem (3.1) has at least one optimal solution, so does (3.3). It then follows from Theorem 3.1 that will be successfully generated by Subroutine 1. Using this observation and an inductive argument, we conclude that Subroutine 2 is well defined.
(ii) Suppose for contradiction that Subroutine 2 does not terminate in iterations. Then for all . Since are generated by Subroutine 1, one can observe that and hence for every . It then follows from these and the definition of that for all ,
This implies that when Subroutine 1 is applied to (3.3), it terminates at a boundary point of the feasible region of (3.3). It then follows that
Thus is strictly increasing, which along with and leads to . This contradicts the trivial fact . Therefore, Subroutine 2 must terminate at some in at most iterations. Clearly, . We now prove by considering two separate cases as follows.
Case 1): . In this case, Subroutine 2 terminates at and outputs . By and the definition of , one can see that and hence
which together with implies .
Case 2): . In this case, Subroutine 2 must terminate at some iteration . It then follows that and . In addition, we observe from the definitions of and that for . It then immediately follows that .
(iii) We now prove statement (iii). Since , must be generated by calling the subroutine , whose first iteration performs a projected gradient step to find a point , where
and . By the assumption that for sufficiently small , one can see that and . We also observe that the value of is nonincreasing along the subsequent iterates of the subroutine . These observations and the definition of imply that . In addition, is nonincreasing along the iterates generated in Subroutine 1. Hence, for all . It then follows for all . Notice that for some . Hence, .
Remark 3: As seen from Theorem 3.2, the subroutine is executed at most (but possibly far fewer than) times. In view of this and Remark 2, one can see that when , the number of PCG iterations executed in is at most . On the other hand, when , its number of PCG iterations depends on in .
3.2 The first generalized conjugate gradient method for (1.2)
In this subsection we propose a method for solving problem (1.2). We show that this method terminates at an optimal solution of (1.2) in a finite number of iterations. Before proceeding, we introduce some notation that will be used throughout the next several subsections.
Given any , we define
(3.4) 
where is given in (1.7). Also, we define as follows:
(3.5) 
where and are defined in (1.7). It then follows from (1.6) and (3.5) that
(3.6) 
In addition, given any , we define
The main idea of our method is as follows. Given a current iterate , we check to see whether or not. If yes, then is an optimal solution of (