Compressed sensing is a recently developed paradigm which considers solving inverse problems by recovering sparse large unknwon vectors from small number of measurement vectors, linearly obtained form the unknown vector. Specifically, in compressed sensing, a system of linear equations, is provided, where is a dimensional measurement vector, is a dimensional measurement matrix and is the unknown dimensional vector. The goal is to estimate . In compressed sensing, , which makes it an undetermined set of linear equations having infinitely many solutions. However, if is assumed to be -sparse,i.e., if of the coordinates of are zero, such that , then it is possible to recover exactly with number of measurements . Candes and Tao  first showed that such a vector can be found exactly by solving the following optimization problem
With little effort, the above problem can be cast as a linear programming (LP) problem and can be solved using standard LP routines. However, if the number of unknownsis large, this approach becomes computationally expensive . An alternate path to solution is provided by greedy algorithms, which leverage the knowledge of sparsity of , and iteratively try to estimate . Orthogonal matching pursuit (OMP) [4, 5] and iterative hard thresholding (IHT) [6, 7] are two such classical algorithms with distinct estimation strategies. OMP iteratively searches for columns of the measurement matrix, one per each iteration, which are most suitable to correspond to the support of the unknown vector. Whereas, IHT minimizes a quadratic approximation of the least square cost function along with the -sparsity constraint . Both these algorithms are popular for their simplicity and ease of implementation. However, OMP has weaker theoretical recovery guarantee than IHT and it requires larger computational costs due to a projection step, which in turn ensures convergence in finite number of iterations. On the other hand, IHT is slower and may require infinite number of iterations while requiring much smaller computational cost.
Several other algorithms, like compressive sampling matching pursuit(CoSaMP) , subspace pursuit(SP) , hard thresholding pursuit  were proposed to partially overcome the drawbacks of OMP and IHT to result in better convergence behaviour. Of these, HTP is particularly impressive as it is derived by taking the best of both OMP and IHT, and has been shown to have superior recovery performance to CoSaMP and SP  and requires much smaller, finite number of iterations for recovery as well . The key to the improved behaviour of HTP is attributed to the fact that at each iteration, it uses a two stage strategy, where in the first identification stage it identifies a support with indices and then uses it in the following estimation stage to estimate a -sparse vector that minimizes the least squares cost function with the solution constrained to be on that support. Since the estimation stage ensures that the estimate is one of the many solutions corresponding to the possible supports of size , it plays the principal role in ensuring that the algorithm converges in finite number of steps. However, the fast recovery performance of HTP is also highly dependent upon the quality of the support selected by the identification step, as a sequence of “bad” support selection might lead to delayed recovery.
In this paper we propose a generalization of HTP where where we modify the identification stage by adding an extra gradient term of a regularization function to the gradient of the objective function. We call this method, the regularized HTP (RHTP). Note that if the regularizer function is takes as , it becomes the HTP algorithm. This modification was partly motivated by observing the equivalency of the following two problems
as the latter is formed from the former by adding a Lagrange multiplier to the cost function corresponding to the second constraint, where we have assumed that the function , is decomposable, i.e., , and that the Lagrange multipliers are . The presence of a regularizer generally corresponds to availability of prior information of the unknown vector to be estimated, and often improves the accuracy of the estimator. For example, the estimator produced by the elastic net (EN) regularizer by Zou and Hastie , which uses a convex combination of the Ridge()-regularizer and the -regularizer, posses a certain “grouping” effect [13, Theorem 1], where EN produces nearly equal estimates for a group of the nonzero entries in the support of the true vector if the corresponding columns of the measurement matrix are highly correlated. However, the -minimizer alone has been seen to select only one of these columns and discard the others. Thus, using gradient corresponding to a regularized cost function in the identification step, it is hoped that the modified support selection step of the HTP algorithm will be useful in “better” convergence of the algorithm in some way.
We provide a convergence analysis of the RHTP algorithm to find conditions on the measurement matrix as well the regularizer function and certain parameters of the algorithm, under which the algorithm converges. In the process of doing so it is revealed that if the regularizer function is decomposable and if the decomposed functions have certain regularity properties, the sequence generated by the RHTP algorithm is topologically conjugate , i.e. is dynamically equivalent, to a transformed sequence which has a HTP-like evolution with the gradient of the identification step premultiplied by a time varying diagonal matrix. Using this alternate HTP-like evoltuion, we show that if the common paramteres of HTP and RHTP are kept the same, the latter enjoys faster convergence speed compared to HTP. We also analyze the number of iterations it takes for RHTP to recover the true support with both noiseless and noisy measurements, and show that RHTP requires less number of iterations compared to HTP to do so. Finally, we corroborate our theoretical findings by performing several numerical experiments.
We first define the notion of equivalence between two dynamical systems, related by mappings, which will be helpful in the analysis later. For this purpose we use the notion of topological conjugacy between two maps generating two dynamics (See Devaney [14, §1.7]). We borrow the definition for topological conjugacy from Devaney  and state it as below:
Definition 2.1 (Topological Conjugacy).
Let be two topological spaces. Two mappings , and are topologically conjugate if there is a homeomorphism such that .
For two topological spaces equipped with topologies , a homeomorphism is a mapping such that is a bijection, and both and are continuous. Thus intuitively topological conjugacy between two mappings implies that same dynamics can be represented in two different forms using the maps . The importance of topological conjugacy is in the fact that if the maps are topologically conjugate, the properties of the sequence can be studied by studying only the properties of the sequence .
We use the following notations in the sequel:
For any subset , denotes the complement of the subset .
The superscript ‘’ is used to denote the transpose of a vector or matrix.
denotes the collection of all subsets of size of the set .
is such that .
are functions defined as .
The mapping is defined as , and the corresponding inverse is defined as .
, and denote the Jacobians of the transformations and , respectively, at the point , i.e., and .
is defined as and . Also, for any subset such that , , and .
is defined such that .
is defined such that
where is defined such that .
is defined such that
where is defined as .
is the nondecreasing arrangement of a vector , i.e., , and there exists a permutation of such that .
For any positive integer , , and where .
Furthermore, we require the matrix to satisfy the restricted isometry property (RIP) which, informally, stands for the near unitariness of the columns of the matrix . Till its inception from the seminal paper by Candes and Tao , RIP has been used as a standard tool for the analysis of algorithms in the literature of compressed sensing, and has been used predominantly in the analysis of algorithms like HTP, CoSaMP and SP [11, 9, 10].
Definition 2.2 (Restricted Isometry Property).
For any integer , the matrix is said to satisfy the restricted isometry property (RIP) of order with restricted isometry constant (RIC) , if ,
for all .
We will also require the following lemmas in the analysis:
Lemma 2.1 (Wielandt ).
For any integer , let be a real positive definite matrix with maximum and minimum eigenvalues
be a real positive definite matrix with maximum and minimum eigenvalues, and , respectively. Then, , such that , the following is satisfied:
Moreover, there exists orthonormal pair of vectors such that the the above inequality is satisfied with equality.
The following lemma will also be useful for our analysis. Similar result has also been obtained in Lemma in Wen et al  which used Lemma of Chang and Wu  to prove their result. However, we will be using Lemma 2.1 due to Wielandt to prove the following lemma.
If is a subset of , and , and if , and if the matrix satisfies the RIP of order with RIC , then,
The proof is provided in Appendix A-A. ∎
If is subset of such that , and if , and if the matrix satisfies the RIP of order with RIC , then, ,
The proof is provided in Appendix A-B. ∎
The map is a homeomorphism.
The proof is provided in Appendix A-C. ∎
For any , the Jacobians and satisfy and , respectively.
The proof is provided in Appendix A-D. ∎
Iii Proposed algorithm
The RHTP algorithm is described in Table I.
where the diagonal elements of the diagonal matrix are the constants , and the function is assumed to be decomposable , i.e., we assume that there are functions , such that .We also assume that the functions satisfy the following:
, i.e. is continuously double differentiable.
From the Table I it can be seen that the only difference between RHTP and HTP is that the former has a inside the hard thresholding operator in the identification stage. This modification induces the effect of availability of some prior information about the unknown vector in terms of the regularizer and the factors , and thus help select a support which is closer to the true support of the unknown vector, which in turn accelerates the convergence of the algorithm.
Iv Theoretical results
Iv-a Convergence analysis
This section analyzes convergence of the RHTP algorithm. The principle strategy used here is to find a sequence which is topologically conjugate to the dynamics of the sequence , and thus can be analyzed instead of the sequence to understand the properties of the convergence.
If the matrix satisfies RIP of order , then the mappings and are topologically conjugate.
Let us fix some . We need to find a homeomorphism such that . We claim that can be chosen to be the mapping .
First, that is a homeomorphism, follows from Lemma 2.4. We now show that . First observe that, while obtaining , in the intermediate step we calculate the support , which can be written as
where the step follows from the observation that . Thus,
where step follows from the observation that , since for all , and step follows from the definition of . Now, observe that, by definition, is -sparse with support and is also -sparse with support , as is -sparse, by definition, and and have the same support thanks to the relation . Also, we have proved that . Furthermore, as the matrix satisfies RIP of order , for any support of size , the minimizer is unique, as the matrix has minimum eigenvalue lower bounded by , and hence is invertible. Consequently, we have . This establishes the topological conjugacy of with . ∎
Theorem 4.1 gives us an alternate dynamics furnished by which we can analyze to understand the convergence behavior of the dynamics governed by . In the sequel we will analyze the sequence instead of the sequence produced by the RHTP algorithm, where . To further proceed we denote by the support of , and .
If the matrix satisfies RIP of order , then at any iteration , the iterates and are upper bounded in the following way:
where , and
The proof is supplied in Appendix A-E. ∎
Our first result on the convergence of the sequence is stated as below:
Let the matrix satisfy the RIP of order . Also, let the functions satisfy the following property:
where , with as defined in Lemma 4.1. Then the RHTP algorithm converges in finite number of iterations.
Writing , we get
We will now find upper bounds on .
To find an upper bound of , first note that . Now, we expess as
where step uses Lemma 2.5 and the identity along with the definition of . Consequently, can be expressed as: , where , and . Thus, . Since the maximum eigenvalue is a convex function over the set of matrices, we have, due to Jensen’s inequality, Now, for given , . Now, to find an upper bound on , for any , we note that . Since, , using Lemma 4.1, we obtain, , which implies that . Thus, we obtain that , where . hence,
To find an upper bound on , we proceed as below:
where step is due to the fact that . To further proceed, we utilize the structure of the vector , i.e., that , and . Consequently, . Thus, we obtain,
Now, by Lemma 4.1, . Consequently, , . Thus,
where Now, again using the structure of , we obtain,
Now the definition of implies that