We consider the following consistent linear systems
$$Ax = b, \qquad (1)$$
where $A \in \mathbb{C}^{m \times n}$, $b \in \mathbb{C}^{m}$, and $x$ is the
$n$-dimensional unknown vector. As we know, the Kaczmarz method [kaczmarz1] is a popular so-called row-action method for solving the systems (1). In 2009, Strohmer and Vershynin [Strohmer2009] proved the linear convergence of the randomized Kaczmarz (RK) method. Following that, Needell [Needell2010]
found that the RK method does not converge to the ordinary least squares solution when the system is inconsistent. To overcome this, Zouzias and Freris [Completion2013] extended the RK method to the randomized extended Kaczmarz (REK) method. Later, Ma, Needell, and Ramdas [Completion2015] provided a unified theory of these related iterative methods in all possible system settings. Recently, many works on Kaczmarz methods have been reported; see for example [Bai2018, Bai2018r, Gao2019, Dukui2019, Wu2020, Chen2020, Niu2020] and the references therein.
In 2018, Bai and Wu [Bai2018]
first constructed a greedy randomized Kaczmarz (GRK) method by introducing an efficient probability criterion for selecting the working rows from the coefficient matrix, which avoids a weakness of the criterion adopted in the RK method. As a result, the GRK method is faster than the RK method in terms of both the number of iterations and the computing time. Subsequently, based on the GRK method, a so-called relaxed greedy randomized Kaczmarz (RGRK) method was proposed in [Bai2018r] by introducing a relaxation parameter $\theta \in [0, 1]$, which makes the convergence factor of the RGRK method smaller than that of the GRK method when $\theta \in (\frac{1}{2}, 1]$, and the convergence factor reaches its minimum when $\theta = 1$. For the latter case, i.e., $\theta = 1$, Du and Gao [Gao2019] called the method the maximal weighted residual Kaczmarz method and carried out extensive experiments to test it. By the way, the greedy idea applied in [Bai2018, Bai2018r] also has wide applications; see for example [Griebel2012, Nguyen2017, Liu2019, Bai2019] and the references therein.
In the present paper, we propose a novel greedy Kaczmarz (GK) method. Unlike the GRK and RGRK methods, the new method adopts a quite different way to determine the working rows of the matrix and hence needs less computing time in each iteration; see the detailed analysis before Algorithm 3 below. Consequently, the GK method can outperform the GRK and RGRK methods in terms of computing time. This is confirmed by extensive numerical experiments, which show that, for the same accuracy, the GK method requires almost the same number of iterations as the GRK and RGRK methods but spends less computing time. In addition, we also prove the convergence of the GK method in theory.
2 Notation and preliminaries
For a vector $z$, $z^{(i)}$ represents its $i$th entry. For a matrix $A$, $A^{(i)}$, $\|A\|_2$, and $\|A\|_F$ denote its
$i$th row, spectral norm, and Frobenius norm, respectively. In addition, we denote the smallest positive eigenvalue of $A^{*}A$ by $\lambda_{\min}(A^{*}A)$, where $(\cdot)^{*}$ denotes the conjugate transpose of a vector or a matrix, and the number of elements of a set $\mathcal{W}$ by $|\mathcal{W}|$.
In what follows, we use $x_{\star} = A^{\dagger}b$, with $A^{\dagger}$ being the Moore-Penrose pseudoinverse, to denote the least-Euclidean-norm solution to the systems (1). For finding this solution, Bai and Wu [Bai2018] proposed the GRK method listed as follows, where $r_k = b - Ax_k$ denotes the residual vector.
Algorithm 1 (the GRK method)
Input: $A$, $b$, the maximum iteration number $\ell$, initial estimate $x_0$
Output: $x_\ell$
For $k = 0, 1, \ldots, \ell - 1$, do:
1. Compute $\epsilon_k = \frac{1}{2}\left(\frac{1}{\|b - Ax_k\|_2^2}\max_{1\le i\le m}\left\{\frac{|b^{(i)} - A^{(i)}x_k|^2}{\|A^{(i)}\|_2^2}\right\} + \frac{1}{\|A\|_F^2}\right)$.
2. Determine the index set $\mathcal{U}_k = \left\{i \;\Big|\; |b^{(i)} - A^{(i)}x_k|^2 \ge \epsilon_k \|b - Ax_k\|_2^2 \|A^{(i)}\|_2^2\right\}$.
3. Compute the vector $\tilde{r}_k$ with $\tilde{r}_k^{(i)} = b^{(i)} - A^{(i)}x_k$ if $i \in \mathcal{U}_k$ and $\tilde{r}_k^{(i)} = 0$ otherwise.
4. Select $i_k \in \mathcal{U}_k$ with probability $\Pr(\text{row} = i_k) = |\tilde{r}_k^{(i_k)}|^2 / \|\tilde{r}_k\|_2^2$.
5. Set $x_{k+1} = x_k + \frac{b^{(i_k)} - A^{(i_k)}x_k}{\|A^{(i_k)}\|_2^2}\,(A^{(i_k)})^{*}$.
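To make the per-iteration cost of the GRK method concrete, the following is a minimal NumPy sketch of Algorithm 1 for the real case. The function name and the stopping rule are our own choices, not from [Bai2018]; note in particular that all $m$ row norms enter the computation of $\epsilon_k$.

```python
import numpy as np

def grk(A, b, x0, max_iter=1000, tol=1e-8, rng=None):
    """Minimal sketch of the GRK method (Algorithm 1), real case.

    Assumes A has no zero rows; names and stopping rule are our own.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    row_norms2 = np.sum(A * A, axis=1)        # ||A^(i)||_2^2, all m rows needed
    fro2 = row_norms2.sum()                   # ||A||_F^2
    for _ in range(max_iter):
        r = b - A @ x                         # residual r_k
        r2 = r @ r                            # ||r_k||_2^2
        if r2 <= tol**2:
            break
        ratios = r**2 / row_norms2            # |r_k^(i)|^2 / ||A^(i)||_2^2
        eps = 0.5 * (ratios.max() / r2 + 1.0 / fro2)
        U = np.flatnonzero(r**2 >= eps * r2 * row_norms2)   # index set U_k
        p = r[U]**2 / np.sum(r[U]**2)         # selection probabilities over U_k
        i = rng.choice(U, p=p)
        x += (r[i] / row_norms2[i]) * A[i]    # project onto the i-th hyperplane
    return x
```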
From the definitions of $\epsilon_k$ and $\mathcal{U}_k$ in Algorithm 1, we have that if $i_k \in \mathcal{U}_k$, then
$$\frac{|b^{(i_k)} - A^{(i_k)}x_k|^2}{\|A^{(i_k)}\|_2^2} \ge \frac{1}{2}\left(\max_{1\le i\le m}\left\{\frac{|b^{(i)} - A^{(i)}x_k|^2}{\|A^{(i)}\|_2^2}\right\} + \frac{\|b - Ax_k\|_2^2}{\|A\|_F^2}\right). \qquad (2)$$
Thus, we cannot conclude that if $i_k \in \mathcal{U}_k$, then
$$\frac{|b^{(i_k)} - A^{(i_k)}x_k|^2}{\|A^{(i_k)}\|_2^2} = \max_{1\le i\le m}\left\{\frac{|b^{(i)} - A^{(i)}x_k|^2}{\|A^{(i)}\|_2^2}\right\}.$$
As a result, there may exist some $i_k \in \mathcal{U}_k$ such that
$$\frac{|b^{(i_k)} - A^{(i_k)}x_k|^2}{\|A^{(i_k)}\|_2^2} < \max_{1\le i\le m}\left\{\frac{|b^{(i)} - A^{(i)}x_k|^2}{\|A^{(i)}\|_2^2}\right\}.$$
Meanwhile, from the update formula, for any $i_k \in \mathcal{U}_k$, we have
$$\|x_{k+1} - x_k\|_2^2 = \frac{|b^{(i_k)} - A^{(i_k)}x_k|^2}{\|A^{(i_k)}\|_2^2}. \qquad (3)$$
Thus, combining (2) and (3), we find that we cannot guarantee that a row with index $i_k$ from the index set $\mathcal{U}_k$ makes the distance between $x_{k+1}$ and $x_k$ the largest when finding $x_{k+1}$. Moreover, to compute $\mathcal{U}_k$, we have to compute the norm of each row of the matrix $A$.
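To make this concrete, consider a small example of our own (it is not taken from [Bai2018]). Let $A = \mathrm{diag}(1, 2, 3)$ and suppose $r_k = b - Ax_k = (1, 1.96, 0)^{T}$, so that $\|r_k\|_2^2 = 4.8416$ and $\|A\|_F^2 = 14$. The membership test for $\mathcal{U}_k$ reads $\frac{|r_k^{(i)}|^2}{\|A^{(i)}\|_2^2} \ge \epsilon_k\|r_k\|_2^2 = \frac{1}{2}\left(1 + \frac{4.8416}{14}\right) \approx 0.6729$, and the three ratios are $1$, $0.9604$, and $0$, so $\mathcal{U}_k = \{1, 2\}$. The GRK method then selects row 2 with probability $3.8416/4.8416 \approx 0.79$, although row 1 yields the larger step.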
Based on the GRK method, Bai and Wu [Bai2018r] further designed the RGRK method by introducing a relaxation parameter $\theta \in [0, 1]$, which is listed in Algorithm 2. Algorithm 2 is identical to Algorithm 1 except that Step 1 computes
$$\epsilon_k = \frac{\theta}{\|b - Ax_k\|_2^2}\max_{1\le i\le m}\left\{\frac{|b^{(i)} - A^{(i)}x_k|^2}{\|A^{(i)}\|_2^2}\right\} + \frac{1-\theta}{\|A\|_F^2},$$
and the corresponding index set is denoted by $\mathcal{V}_k$.
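In code, the only change relative to the GRK sketch above is the greedy threshold; a minimal sketch (helper name ours):

```python
def rgrk_eps(r2, max_ratio, fro2, theta):
    """Greedy threshold of the RGRK method (our helper name).

    r2 = ||r_k||_2^2, max_ratio = max_i |r_k^(i)|^2 / ||A^(i)||_2^2,
    fro2 = ||A||_F^2, theta in [0, 1]; theta = 0.5 recovers GRK.
    """
    return theta * max_ratio / r2 + (1.0 - theta) / fro2
```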
It is easy to see that when $\theta = \frac{1}{2}$, the RGRK method is just the GRK method. Bai and Wu [Bai2018r] showed that the convergence factor of the RGRK method is smaller than that of the GRK method when $\theta \in (\frac{1}{2}, 1]$, and the convergence factor reaches its minimum when $\theta = 1$. For the latter case, we have that if $i_k \in \mathcal{V}_k$, then
$$\frac{|b^{(i_k)} - A^{(i_k)}x_k|^2}{\|A^{(i_k)}\|_2^2} = \max_{1\le i\le m}\left\{\frac{|b^{(i)} - A^{(i)}x_k|^2}{\|A^{(i)}\|_2^2}\right\}.$$
From the foregoing analysis, in this case the row with index $i_k$ from the index set $\mathcal{V}_k$ makes the distance between $x_{k+1}$ and $x_k$ the largest among all possible updates. However, we still need to compute the norm of each row of the matrix $A$ when computing $\epsilon_k$ and $\mathcal{V}_k$.
3 A novel greedy Kaczmarz method
On the basis of the analysis of the GRK and RGRK methods, and inspired by some recent works on selection strategies for the working index based on the maximum residual [Nutini2018, Haddock2019, Rebrova2019], we design a new method for solving consistent linear systems which consists of two main steps. In the first step, we use the maximum entries of the residual vector to determine an index set $\mathcal{N}_k$, whose specific definition is given in Algorithm 3. In the second step, we select an index from the set $\mathcal{N}_k$ with which the distance between $x_{k+1}$ and $x_k$ is the largest among all candidate indices in $\mathcal{N}_k$. At a high level, the new method seems merely to change the order of the two main steps of Algorithm 1 or Algorithm 2. However, compared with the GRK and RGRK methods, we no longer need to calculate the norm of each row of the matrix $A$ in Algorithm 3, and, like the RGRK method with $\theta = 1$, our method always makes the distance between $x_{k+1}$ and $x_k$ the largest within the selected index set when finding $x_{k+1}$. In fact, the new method combines the maximum residual rule and the maximum distance rule. These features reduce the computational cost of each iteration, so the method behaves better in terms of computing time, which is confirmed by the numerical experiments given in Section 4.
Based on the above introduction, we propose the following algorithm, i.e., Algorithm 3.

Algorithm 3 (the GK method)
Input: $A$, $b$, the maximum iteration number $\ell$, initial estimate $x_0$
Output: $x_\ell$
For $k = 0, 1, \ldots, \ell - 1$, do:
1. Compute the residual $r_k = b - Ax_k$ and determine the index set $\mathcal{N}_k = \left\{i \;\Big|\; |r_k^{(i)}| = \max_{1\le j\le m}|r_k^{(j)}|\right\}$.
2. Select $i_k \in \mathcal{N}_k$ such that $\|A^{(i_k)}\|_2 = \min_{i\in\mathcal{N}_k}\|A^{(i)}\|_2$, which maximizes $\frac{|r_k^{(i)}|^2}{\|A^{(i)}\|_2^2}$ over $i \in \mathcal{N}_k$.
3. Set $x_{k+1} = x_k + \frac{r_k^{(i_k)}}{\|A^{(i_k)}\|_2^2}\,(A^{(i_k)})^{*}$.
Note that if $r_k = b - Ax_k \neq 0$, then
$$\max_{1\le i\le m}|r_k^{(i)}| > 0.$$
So the index set $\mathcal{N}_k$ in Algorithm 3 is nonempty for every iteration index $k$ before the solution is reached.
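For comparison with the GRK sketch above, here is a minimal NumPy sketch of Algorithm 3 for the real case (function name and stopping rule are our own choices). Note that row norms are computed only for the rows indexed by $\mathcal{N}_k$:

```python
import numpy as np

def gk(A, b, x0, max_iter=1000, tol=1e-8):
    """Minimal sketch of the GK method (Algorithm 3), real case."""
    x = np.array(x0, dtype=float)
    for _ in range(max_iter):
        r = b - A @ x                              # residual r_k
        if r @ r <= tol**2:
            break
        # N_k: rows attaining the maximal residual entry (in floating point
        # the equality test could be relaxed to a small tolerance)
        N = np.flatnonzero(np.abs(r) == np.abs(r).max())
        norms2 = np.sum(A[N] * A[N], axis=1)       # row norms only for N_k
        j = np.argmin(norms2)
        i = N[j]                                   # maximizes |r^(i)|^2/||A^(i)||_2^2 over N_k
        x += (r[i] / norms2[j]) * A[i]             # Kaczmarz projection step
    return x
```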
Like Algorithm 1 or Algorithm 2, we can use the values of $\frac{|r_k^{(i)}|^2}{\|A^{(i)}\|_2^2}$ for $i \in \mathcal{N}_k$ as a probability selection criterion to devise a randomized version of Algorithm 3. In this case, the convergence factor may be a little worse than that of Algorithm 3 because, for the latter, the index $i_k$ is selected based on the largest value of $\frac{|r_k^{(i)}|^2}{\|A^{(i)}\|_2^2}$ for $i \in \mathcal{N}_k$, which makes the distance between $x_{k+1}$ and $x_k$ the largest among the candidates in $\mathcal{N}_k$.
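As a sketch of this randomized variant (reusing the notation of the GK sketch above; the helper name is ours):

```python
import numpy as np

def pick_randomized(A, r, N, rng):
    """Draw a working row from N_k with probability proportional to
    |r^(i)|^2 / ||A^(i)||_2^2; a sketch of the randomized variant,
    not the deterministic rule of Algorithm 3."""
    w = r[N]**2 / np.sum(A[N] * A[N], axis=1)
    return rng.choice(N, p=w / w.sum())
```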
To the best of our knowledge, the idea of Algorithm 3 is brand new in the field of designing greedy Kaczmarz-type algorithms, and we have not found it in any work on greedy Gauss-Seidel methods either. So it is interesting to apply this idea to Gauss-Seidel methods to devise new greedy Gauss-Seidel algorithms for solving other problems, such as large linear least squares problems. We will consider this topic in a subsequent paper.
In the following, we give the convergence theorem of the GK method.
Theorem 1. The iteration sequence $\{x_k\}_{k=0}^{\infty}$ generated by Algorithm 3, starting from an initial guess $x_0$ in the column space of $A^{*}$, converges linearly to the least-Euclidean-norm solution $x_{\star} = A^{\dagger}b$ and satisfies
$$\|x_{k+1} - x_{\star}\|_2^2 \le \left(1 - \frac{\lambda_{\min}(A^{*}A)}{m \max_{1\le i\le m}\|A^{(i)}\|_2^2}\right)\|x_k - x_{\star}\|_2^2. \qquad (4)$$
Moreover, let $\tau_k$ denote the number of nonzero entries of $r_k$, and set $\tau = \max_{k\ge 0}\tau_k$ and $\beta = \max_{k\ge 0}\|A^{(i_k)}\|_2^2$. Then,
$$\|x_{k+1} - x_{\star}\|_2^2 \le \left(1 - \frac{\lambda_{\min}(A^{*}A)}{\tau\beta}\right)\|x_k - x_{\star}\|_2^2. \qquad (5)$$
Proof. From the update formula in Algorithm 3, we have
$$x_{k+1} - x_k = \frac{b^{(i_k)} - A^{(i_k)}x_k}{\|A^{(i_k)}\|_2^2}\,(A^{(i_k)})^{*},$$
which implies that $x_{k+1} - x_k$ is parallel to $(A^{(i_k)})^{*}$. Meanwhile,
$$A^{(i_k)}x_{k+1} = A^{(i_k)}x_k + \frac{b^{(i_k)} - A^{(i_k)}x_k}{\|A^{(i_k)}\|_2^2}\,A^{(i_k)}(A^{(i_k)})^{*} = b^{(i_k)},$$
which together with the fact $A^{(i_k)}x_{\star} = b^{(i_k)}$ gives
$$A^{(i_k)}(x_{k+1} - x_{\star}) = 0.$$
Then $x_{k+1} - x_{\star}$ is orthogonal to $(A^{(i_k)})^{*}$. Thus, the vector $x_{k+1} - x_k$ is perpendicular to the vector $x_{k+1} - x_{\star}$. By the Pythagorean theorem, we get
$$\|x_{k+1} - x_{\star}\|_2^2 = \|x_k - x_{\star}\|_2^2 - \|x_{k+1} - x_k\|_2^2.$$
On the other hand, from Algorithm 3, we have
$$\|x_{k+1} - x_k\|_2^2 = \frac{|b^{(i_k)} - A^{(i_k)}x_k|^2}{\|A^{(i_k)}\|_2^2} = \frac{\max_{1\le i\le m}|r_k^{(i)}|^2}{\|A^{(i_k)}\|_2^2} \ge \frac{\max_{1\le i\le m}|r_k^{(i)}|^2}{\max_{1\le i\le m}\|A^{(i)}\|_2^2}.$$
For $\max_{1\le i\le m}|r_k^{(i)}|^2$, we have
$$\max_{1\le i\le m}|r_k^{(i)}|^2 \ge \frac{1}{m}\sum_{i=1}^{m}|r_k^{(i)}|^2 = \frac{\|r_k\|_2^2}{m} = \frac{\|A(x_{\star} - x_k)\|_2^2}{m},$$
which together with a result from [Bai2018], namely that
$$\|Az\|_2^2 \ge \lambda_{\min}(A^{*}A)\,\|z\|_2^2$$
is valid for any vector $z$ in the column space of $A^{*}$, implies
$$\|x_{k+1} - x_{\star}\|_2^2 \le \left(1 - \frac{\lambda_{\min}(A^{*}A)}{m \max_{1\le i\le m}\|A^{(i)}\|_2^2}\right)\|x_k - x_{\star}\|_2^2,$$
which is just the estimate (4). The estimate (5) follows in the same way by using $\max_{1\le i\le m}|r_k^{(i)}|^2 \ge \|r_k\|_2^2/\tau_k \ge \|r_k\|_2^2/\tau$ and $\|A^{(i_k)}\|_2^2 \le \beta$. $\Box$
Since $\tau \le m$ and $\beta \le \max_{1\le i\le m}\|A^{(i)}\|_2^2$, it holds that
$$1 - \frac{\lambda_{\min}(A^{*}A)}{\tau\beta} \le 1 - \frac{\lambda_{\min}(A^{*}A)}{m \max_{1\le i\le m}\|A^{(i)}\|_2^2},$$
that is, the estimate (5) is at least as sharp as (4). Hence, the convergence factor of the GK method is small when the parameters $\tau$ and $\beta$ are small. So, the smaller $\tau$ is, the better the convergence factor of the GK method is when $\beta$ is fixed. In addition, from the definitions of $\mathcal{N}_k$, $\mathcal{U}_k$, and $\mathcal{V}_k$, we can find that the size of $\mathcal{N}_k$ may be smaller than those of $\mathcal{U}_k$ and $\mathcal{V}_k$. This is one of the reasons that our algorithm behaves better in terms of computing time.
If $\tau\beta \le \gamma$ and $\gamma < \|A\|_F^2$, where $\gamma = \max_{1\le i\le m}\sum_{j\ne i}\|A^{(j)}\|_2^2$, the right side of (5) is smaller than
$$1 - \frac{1}{2}\left(\frac{1}{\gamma} + \frac{1}{\|A\|_F^2}\right)\lambda_{\min}(A^{*}A).$$
Note that the error estimate in expectation of the GRK method in [Bai2018] is
$$\mathbb{E}_k\|x_{k+1} - x_{\star}\|_2^2 \le \left(1 - \frac{1}{2}\left(\frac{1}{\gamma} + \frac{1}{\|A\|_F^2}\right)\lambda_{\min}(A^{*}A)\right)\|x_k - x_{\star}\|_2^2,$$
with the same $\gamma$ as above. So the convergence factor of the GK method is slightly better in the above case.
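The comparison can be checked numerically. The sketch below evaluates the two factors for a given matrix, with the run-dependent quantities $\tau$ and $\beta$ from the theorem supplied by the caller, e.g. recorded while running the GK sketch above:

```python
import numpy as np

def compare_factors(A, tau, beta):
    """Evaluate the GK factor in (5) and the GRK factor from [Bai2018].

    tau and beta must be supplied by the caller. Assumes A has full
    column rank, so the smallest eigenvalue of A^T A is its smallest
    positive one.
    """
    row2 = np.sum(A * A, axis=1)
    fro2 = row2.sum()                              # ||A||_F^2
    lam_min = np.linalg.eigvalsh(A.T @ A)[0]       # lambda_min(A^T A)
    gamma = fro2 - row2.min()                      # max_i sum_{j != i} ||A^(j)||_2^2
    gk_factor = 1.0 - lam_min / (tau * beta)
    grk_factor = 1.0 - 0.5 * (1.0 / gamma + 1.0 / fro2) * lam_min
    return gk_factor, grk_factor
```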
4 Experimental results
In this section, we compare the GRK, RGRK, and GK methods with matrices from two sets. One set is generated randomly by using the MATLAB function randn, and the other includes some full-rank sparse matrices (e.g., ch7-8-b1, ch8-8-b1, model1, Trec8, Stranke94, and mycielskian5) and some rank-deficient sparse matrices (e.g., flower_5_1, relat6, D_11, Sandi_sandi, GD01_c, and GD02_a) originating in different applications from [Davis2011]. They possess certain structures, such as square ($m = n$) (e.g., Stranke94, mycielskian5, GD01_c, and GD02_a), thin ($m > n$) (e.g., ch7-8-b1, ch8-8-b1, flower_5_1, and relat6) or fat ($m < n$) (e.g., model1, Trec8, D_11, and Sandi_sandi), and certain properties, such as being symmetric (e.g., Stranke94 and mycielskian5) or nonsymmetric (e.g., GD01_c and GD02_a).
We compare the three methods mainly in terms of the number of iterations (denoted as "IT") and the computing time in seconds (denoted as "CPU"). It should be pointed out here that the IT and CPU listed in our numerical results denote the arithmetical averages of the required iteration numbers and the elapsed CPU times with respect to 50 repeated runs of the corresponding methods, and we always set $\theta = 1$ in the RGRK method in our experiments since the convergence factor attains its minimum in this case. To give an intuitive comparison of the three methods, we also present the iteration number speed-up of GK against GRK, which is defined as
$$\text{IT speed-up of GK against GRK} = \frac{\text{IT of GRK}}{\text{IT of GK}},$$
the iteration number speed-up of GK against RGRK, which is defined as
$$\text{IT speed-up of GK against RGRK} = \frac{\text{IT of RGRK}}{\text{IT of GK}},$$
the computing time speed-up of GK against GRK, which is defined as
$$\text{CPU speed-up of GK against GRK} = \frac{\text{CPU of GRK}}{\text{CPU of GK}},$$
and the computing time speed-up of GK against RGRK, which is defined as
$$\text{CPU speed-up of GK against RGRK} = \frac{\text{CPU of RGRK}}{\text{CPU of GK}}.$$
In addition, for the sparse matrices from [Davis2011], we define the density as
$$\text{density} = \frac{\text{number of nonzeros of an } m \times n \text{ matrix}}{mn},$$
and use cond(A) to represent the Euclidean condition number of the matrix $A$.
In our specific experiments, the solution vector $x_{\star}$ is generated randomly by the MATLAB function randn and we set the right-hand side $b = Ax_{\star}$. All the test problems are started from an initial zero vector and terminated once the relative solution error (RES), defined by
$$\mathrm{RES} = \frac{\|x_k - x_{\star}\|_2^2}{\|x_{\star}\|_2^2},$$
falls below the prescribed tolerance or the number of iterations exceeds the prescribed maximum.
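A minimal driver reproducing this setup for one randomly generated problem might look as follows; the problem size, tolerance, and maximum iteration count here are illustrative assumptions, not the paper's exact values:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 1000, 100                     # illustrative size, not the paper's
A = rng.standard_normal((m, n))
x_star = rng.standard_normal(n)      # randn solution vector
b = A @ x_star                       # consistent right-hand side

x = np.zeros(n)                      # zero initial guess
tol, max_iter = 1e-6, 200_000        # assumed stopping values
for it in range(max_iter):
    if np.sum((x - x_star)**2) / np.sum(x_star**2) <= tol:   # RES
        break
    r = b - A @ x
    N = np.flatnonzero(np.abs(r) == np.abs(r).max())         # index set N_k
    norms2 = np.sum(A[N] * A[N], axis=1)
    j = np.argmin(norms2)
    x += (r[N[j]] / norms2[j]) * A[N[j]]                     # GK step
print("GK stopped after", it, "iterations")
```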
For the first class of matrices, that is, the randomly generated matrices, the numerical results on IT and CPU are listed in Tables 1 to 4 for one group of problem sizes and in Tables 5 to 8 for the other. From Tables 1 to 8, we see that the GK method requires almost the same number of iterations as the GRK and RGRK methods, but the GK method is more efficient in terms of computing time. The computing time speed-up of GK against GRK is at least 1.6951 (see Table 8) and at most 7.2667 (see Table 1), and the computing time speed-up of GK against RGRK is at least 1.6144 (see Table 8) and at most 4.5714 (see Table 1).
[Surviving row of the sparse-matrix property table: density = 1.42%, 2.21%, 3.79%, 0.54%, 12.40%, 16.45%.]