We consider the consistent linear system
$$Ax = b, \qquad (1)$$
where $A \in \mathbb{R}^{m\times n}$, $b \in \mathbb{R}^m$, and $x$ is the $n$-dimensional unknown vector. As is well known, the Kaczmarz method kaczmarz1 is a popular row-action method for solving system (1). In 2009, Strohmer and Vershynin Strohmer2009 proved the linear convergence of the randomized Kaczmarz (RK) method. Subsequently, many randomized Kaczmarz-type methods were proposed for different problem settings; see, for example, Needell2010 ; Eldar2011 ; Completion2013 ; Completion2015 ; Dukui2019 ; Wu2020 ; Chen2020 and the references therein.
Unlike the RK method, which selects the working row of $A$ according to a probability distribution, the Motzkin method Agamon54 ; Motzkin54 employs a greedy strategy, namely the maximum violation criterion, to select the working row in each iteration. Hence, the method is also known as the Kaczmarz method with the “most violated constraint control” or “maximal-residual control” Petra2016 ; Nutini2016 ; Nutini2018 . This greedy strategy makes the Motzkin method outperform the RK method in many cases, and many analyses and applications of the Motzkin method have been published recently; see, for example, haddock2019motzkin ; rebrova2019sketching ; li2020novel and the references therein.
The sampling Kaczmarz-Motzkin (SKM) method, proposed in de2017sampling for solving the linear feasibility problem, combines the RK and Motzkin methods. An accelerated version incorporating Nesterov’s acceleration scheme was presented in morshed2020accelerated , and an improved analysis of the SKM method was given in haddock2019greed . The SKM method overcomes some drawbacks of the RK and Motzkin methods. For example, the Motzkin method is expensive because, at each iteration, it selects the index by comparing the residuals of all the constraints, while the RK method may make slow progress because it does not employ a greedy strategy. In contrast, the SKM method selects an index by comparing the residuals of only a sampled subset of the constraints, and it still applies the maximum violation criterion within that subset. However, the SKM method may be slow because each iteration enforces only one constraint. Inspired by the block algorithms in needell2014paved ; needell2015randomized ; Niu2020 , which can accelerate their single-row counterparts, in this paper we consider block versions of the SKM method. Two greedy strategies are devised to determine the index sets for the block iterations.
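The SKM selection rule described above can be illustrated with a short Python/NumPy sketch. It assumes that the sample of $\beta$ rows is drawn uniformly without replacement, that squared residuals are compared without row normalization, and that no relaxation parameter is used; these choices and the helper name `skm_step` are illustrative assumptions, not a transcription of Algorithm 1. With $\beta = 1$ the rule is a uniformly sampled RK step, and with $\beta = m$ it is the Motzkin rule.

```python
import numpy as np

def skm_step(A, b, x, beta, rng):
    """One SKM-style step for a consistent system Ax = b (illustrative sketch)."""
    m = A.shape[0]
    tau = rng.choice(m, size=beta, replace=False)  # uniform sample of beta row indices
    r_tau = b[tau] - A[tau] @ x                    # residuals of the sampled rows only
    i = tau[np.argmax(r_tau ** 2)]                 # most violated row within the sample
    a_i = A[i]
    # Orthogonal projection of x onto the hyperplane {z : a_i^T z = b_i}
    return x + (b[i] - a_i @ x) / (a_i @ a_i) * a_i

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 20))
x_true = rng.standard_normal(20)
b = A @ x_true                                     # consistent right-hand side
x = np.zeros(20)
for _ in range(2000):
    # beta=1 gives a uniformly sampled RK step, beta=200 the Motzkin rule
    x = skm_step(A, b, x, beta=20, rng=rng)
err = np.linalg.norm(x - x_true) / np.linalg.norm(x_true)
print(f"relative error after 2000 SKM steps: {err:.2e}")
```

Only the $\beta$ sampled residuals are formed per step, which is the cost saving over the Motzkin method mentioned above.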
2 Notation and preliminaries
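Throughout the paper, for a matrix $A$, $A^{(i)}$ and $\mathcal{R}(A)$ denote its $i$-th row (or $i$-th entry in the case of a vector) and its column space, respectively. For an index set $\mathcal{J}$, $A_{\mathcal{J}}$ denotes the row submatrix of $A$ indexed by $\mathcal{J}$, and $A^\dagger$ denotes the Moore-Penrose pseudoinverse of $A$. We denote the number of elements of a set $\mathcal{J}$ by $|\mathcal{J}|$, and we always arrange the positive eigenvalues of $A^TA$, where $(\cdot)^T$ denotes the transpose of a vector or a matrix, in algebraically nonincreasing order:
$$\lambda_{\max}(A^TA) = \lambda_1(A^TA) \ge \lambda_2(A^TA) \ge \cdots \ge \lambda_{\min}(A^TA) > 0.$$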
To analyze the convergence of our new methods, the following fact will be used extensively later in the paper.
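Lemma 1 (horn2012matrix ). Let $H \in \mathbb{R}^{n\times n}$ be symmetric and let $H_r$ be an $r\times r$ principal submatrix of $H$, where the eigenvalues of both matrices are arranged in nonincreasing order. Then
$$\lambda_{i+n-r}(H) \le \lambda_i(H_r) \le \lambda_i(H), \qquad i = 1, \ldots, r.$$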
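Lemma 1 is easy to check numerically. The sketch below forms $H = AA^T$ for a small random $A$, takes the principal submatrix $A_{\mathcal{J}}A_{\mathcal{J}}^T$ for a hypothetical index set $\mathcal{J}$, and verifies the Cauchy interlacing inequalities $\lambda_{i+n-r}(H) \le \lambda_i(H_r) \le \lambda_i(H)$. The case $i = 1$ gives $\lambda_{\max}(A_{\mathcal{J}}^TA_{\mathcal{J}}) \le \lambda_{\max}(A^TA)$, because $A_{\mathcal{J}}A_{\mathcal{J}}^T$ and $A_{\mathcal{J}}^TA_{\mathcal{J}}$ share their nonzero eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 5))
H = A @ A.T                                   # symmetric 8 x 8 matrix
J = [0, 2, 3, 6]                              # hypothetical row index set
H_J = H[np.ix_(J, J)]                         # principal submatrix, equal to A_J A_J^T
lam = np.sort(np.linalg.eigvalsh(H))[::-1]    # eigenvalues of H, nonincreasing
mu = np.sort(np.linalg.eigvalsh(H_J))[::-1]   # eigenvalues of H_J, nonincreasing
n, r = len(lam), len(mu)
# Cauchy interlacing with 0-based indices: lam[i + n - r] <= mu[i] <= lam[i]
tol = 1e-10
interlaced = all(lam[i + n - r] - tol <= mu[i] <= lam[i] + tol for i in range(r))
print(interlaced, mu[0] <= lam[0] + tol)
```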
Algorithm 1 (SKM method). Input: $A$, $b$, and an initial estimate $x_0$.
3 The first block sampling Kaczmarz-Motzkin method
The first block sampling Kaczmarz-Motzkin (BSKM1) method is presented in Algorithm 2. Compared with the SKM method, the main difference, and the key ingredient, of the BSKM1 method is how the index set $\mathcal{J}_k$ used to update the approximation is constructed. We build this index set mainly from a threshold computed on the small random sample $\tau_k$, which is drawn in the same way as in Algorithm 1.
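Note that the index $i_k$ that the SKM method would select from the same sample $\tau_k$ always satisfies the threshold condition, so $i_k \in \mathcal{J}_k$. Hence the index set $\mathcal{J}_k$ in Algorithm 2 is never empty.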
Compared with the SKM method, the BSKM1 method can enforce several strongly violated constraints simultaneously in each iteration, so it can be expected to converge faster; see the detailed discussion following Theorem 3.1. In addition, if the sample $\tau_k$ contains all $m$ rows, the BSKM1 method reduces to the Motzkin method. This means that the BSKM1 method can also be regarded as a block version of the Motzkin method.
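A minimal sketch of a BSKM1-style iteration follows. It assumes that the threshold is the largest squared residual over the sample $\tau_k$ and that $\mathcal{J}_k$ collects every row whose squared residual reaches that threshold; the unnormalized residuals and the function name `bskm1_step` are hypothetical choices, not necessarily the exact rule of Algorithm 2. Forming $\mathcal{J}_k$ this way needs the full residual vector.

```python
import numpy as np

def bskm1_step(A, b, x, beta, rng):
    """BSKM1-style step (sketch): sample threshold, greedy block, pseudoinverse projection."""
    m = A.shape[0]
    r = b - A @ x                                   # full residual, needed to form J_k
    tau = rng.choice(m, size=beta, replace=False)   # random sample of beta rows
    eps = np.max(r[tau] ** 2)                       # threshold taken from the sample
    J = np.flatnonzero(r ** 2 >= eps)               # all rows at least this violated
    # Project x onto {z : A_J z = b_J} via the Moore-Penrose pseudoinverse
    return x + np.linalg.pinv(A[J]) @ r[J], J

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 20))
x_true = rng.standard_normal(20)
b = A @ x_true
x = np.zeros(20)
block_sizes = []
for _ in range(300):
    x, J = bskm1_step(A, b, x, beta=20, rng=rng)
    block_sizes.append(len(J))
err = np.linalg.norm(x - x_true) / np.linalg.norm(x_true)
print(f"relative error {err:.2e}, mean block size {np.mean(block_sizes):.1f}")
```

The sampled row attaining the threshold always lands in `J`, which mirrors the nonemptiness argument above.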
Now, we establish the convergence of Algorithm 2.
Theorem 3.1. From an initial guess $x_0 \in \mathcal{R}(A^T)$, the sequence $\{x_k\}_{k=0}^{\infty}$ generated by the BSKM1 method converges linearly in expectation to the least-Euclidean-norm solution $x_\star = A^\dagger b$ and
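From Algorithm 2, using the fact $A_{\mathcal{J}_k}x_\star = b_{\mathcal{J}_k}$, we have
$$x_{k+1} - x_\star = x_k - x_\star + A_{\mathcal{J}_k}^\dagger\big(b_{\mathcal{J}_k} - A_{\mathcal{J}_k}x_k\big) = \big(I - A_{\mathcal{J}_k}^\dagger A_{\mathcal{J}_k}\big)(x_k - x_\star).$$
Since $A_{\mathcal{J}_k}^\dagger A_{\mathcal{J}_k}$ is an orthogonal projector, taking the square of the Euclidean norm on both sides and using the Pythagorean theorem, we get
$$\|x_{k+1} - x_\star\|_2^2 = \|x_k - x_\star\|_2^2 - \big\|A_{\mathcal{J}_k}^\dagger A_{\mathcal{J}_k}(x_k - x_\star)\big\|_2^2,$$
which together with the Courant-Fisher theorem:
$$\big\|A_{\mathcal{J}_k}^\dagger u\big\|_2^2 \ge \lambda_{\min}\big((A_{\mathcal{J}_k}^\dagger)^T A_{\mathcal{J}_k}^\dagger\big)\|u\|_2^2 = \frac{\|u\|_2^2}{\lambda_{\max}(A_{\mathcal{J}_k}^TA_{\mathcal{J}_k})} \quad \text{for all } u \in \mathcal{R}(A_{\mathcal{J}_k}),$$
and the fact $A_{\mathcal{J}_k}(x_k - x_\star) = A_{\mathcal{J}_k}x_k - b_{\mathcal{J}_k} \in \mathcal{R}(A_{\mathcal{J}_k})$ yields
$$\|x_{k+1} - x_\star\|_2^2 \le \|x_k - x_\star\|_2^2 - \frac{\|b_{\mathcal{J}_k} - A_{\mathcal{J}_k}x_k\|_2^2}{\lambda_{\max}(A_{\mathcal{J}_k}^TA_{\mathcal{J}_k})}. \qquad (3)$$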
On the other hand, from Algorithm 2, if , we have
Now, taking the expectation of both sides with respect to the random sample $\tau_k$, we have
which together with defined in (2) leads to
Further, considering (3), we get
which is the desired result.
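The key per-step inequality of the proof, $\|x_{k+1}-x_\star\|_2^2 \le \|x_k-x_\star\|_2^2 - \|b_{\mathcal{J}_k}-A_{\mathcal{J}_k}x_k\|_2^2/\lambda_{\max}(A_{\mathcal{J}_k}^TA_{\mathcal{J}_k})$, which follows from the orthogonal projection property and the Courant-Fisher theorem, can be checked numerically. The sketch below takes one block projection from an arbitrary iterate, using a hypothetical block made of the six rows with the largest residuals, and compares both sides; it also confirms the Pythagorean identity used in the proof.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 100, 30
A = rng.standard_normal((m, n))
x_star = rng.standard_normal(n)          # A has full column rank, so x_star = A^+ b
b = A @ x_star
x_k = rng.standard_normal(n)             # an arbitrary current iterate
r = b - A @ x_k
J = np.argsort(-r ** 2)[:6]              # hypothetical block: the 6 largest residuals
A_J, r_J = A[J], r[J]
x_next = x_k + np.linalg.pinv(A_J) @ r_J # block projection step

lhs = np.linalg.norm(x_next - x_star) ** 2
lam_max = np.linalg.norm(A_J, 2) ** 2    # lambda_max(A_J^T A_J) = sigma_max(A_J)^2
rhs = np.linalg.norm(x_k - x_star) ** 2 - np.linalg.norm(r_J) ** 2 / lam_max

# Pythagorean identity: ||e_{k+1}||^2 = ||e_k||^2 - ||P_J e_k||^2 with P_J = A_J^+ A_J
P_J = np.linalg.pinv(A_J) @ A_J
e_k = x_k - x_star
pyth = np.linalg.norm(e_k) ** 2 - np.linalg.norm(P_J @ e_k) ** 2
print(lhs, rhs, lhs <= rhs + 1e-9, np.isclose(lhs, pyth))
```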
According to Lemma 1, it is easy to see that , which together with the fact yields
That is, the convergence factor of the BSKM1 method is indeed smaller than 1.
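Moreover, let $x_{k+1}^{\mathrm{BSKM1}}$ and $x_{k+1}^{\mathrm{SKM}}$ denote the next approximations generated from the same iterate $x_k$ and the same sample $\tau_k$ by the BSKM1 and SKM methods, respectively. Since the index $i_k$ selected by the SKM method belongs to $\mathcal{J}_k$, we have $\mathcal{R}\big((A^{(i_k)})^T\big) \subseteq \mathcal{R}(A_{\mathcal{J}_k}^T)$, and $A_{\mathcal{J}_k}^\dagger A_{\mathcal{J}_k}$ and $(A^{(i_k)})^\dagger A^{(i_k)}$ are the orthogonal projectors onto these two subspaces. Hence
$$\|x_{k+1}^{\mathrm{BSKM1}} - x_\star\|_2^2 = \big\|\big(I - A_{\mathcal{J}_k}^\dagger A_{\mathcal{J}_k}\big)(x_k - x_\star)\big\|_2^2 \le \big\|\big(I - (A^{(i_k)})^\dagger A^{(i_k)}\big)(x_k - x_\star)\big\|_2^2 = \|x_{k+1}^{\mathrm{SKM}} - x_\star\|_2^2,$$
that is, the BSKM1 method converges at least as quickly as the SKM method.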
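In addition, the threshold computed from the sample $\tau_k$ can never exceed the one computed from all $m$ rows, which is the choice that recovers the Motzkin method. Hence $\mathcal{J}_k$ always contains the index selected by the Motzkin method, and the same projection argument shows that the BSKM1 method converges at least as quickly as the Motzkin method.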
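The two comparisons can be checked on a single step. Starting from the same iterate and the same sample, the block projection never leaves a larger error than either single-row projection, because both single rows belong to the block. The sketch assumes a threshold equal to the largest squared residual in the sample; the helper `project` and the problem sizes are hypothetical.

```python
import numpy as np

def project(A_rows, b_rows, x):
    """Orthogonal projection of x onto {z : A_rows z = b_rows} (consistent rows)."""
    return x + np.linalg.pinv(A_rows) @ (b_rows - A_rows @ x)

def err(z, x_star):
    return np.linalg.norm(z - x_star) ** 2

rng = np.random.default_rng(3)
m, n, beta = 300, 40, 15
A = rng.standard_normal((m, n))
x_star = rng.standard_normal(n)
b = A @ x_star
x_k = np.zeros(n)
r = b - A @ x_k
tau = rng.choice(m, size=beta, replace=False)       # sample shared by SKM and BSKM1
i_skm = tau[np.argmax(r[tau] ** 2)]                 # row chosen by SKM
i_motzkin = int(np.argmax(r ** 2))                  # row chosen by Motzkin
J = np.flatnonzero(r ** 2 >= np.max(r[tau] ** 2))   # BSKM1-style greedy block

e_bskm1 = err(project(A[J], b[J], x_k), x_star)
e_skm = err(project(A[[i_skm]], b[[i_skm]], x_k), x_star)
e_motzkin = err(project(A[[i_motzkin]], b[[i_motzkin]], x_k), x_star)
print(i_skm in J, i_motzkin in J, e_bskm1 <= min(e_skm, e_motzkin) + 1e-9)
```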
4 The second block sampling Kaczmarz-Motzkin method
Since the size of the index set $\mathcal{J}_k$ in Algorithm 2 cannot be controlled, we design the second block sampling Kaczmarz-Motzkin (BSKM2) method, which is presented in Algorithm 3. The main difference between the BSKM2 and BSKM1 methods lies in how the index set is built: in Algorithm 3, the size of the index set can be controlled.
Note that the index selected by the SKM method from the same sample always belongs to the index set of Algorithm 3, so this index set is never empty.
Since the iteration index used by the SKM method belongs to the index set used by the BSKM2 method, the latter makes at least as much progress per iteration as the former. In addition, for suitable parameter choices in Algorithm 3, the BSKM2 method reduces to the Motzkin method.
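One way to control the block size that is consistent with the description above, though not necessarily the rule of Algorithm 3, is to keep the $\eta$ sampled rows with the largest residuals. The sketch below implements this hypothetical rule; the parameter `eta` and the function `bskm2_step` are our own names. With $\eta = 1$ it is exactly an SKM-style step, so the SKM index is always in the block.

```python
import numpy as np

def bskm2_step(A, b, x, beta, eta, rng):
    """Hypothetical BSKM2-style step: project onto the eta most violated sampled rows."""
    m = A.shape[0]
    tau = rng.choice(m, size=beta, replace=False)
    r_tau = b[tau] - A[tau] @ x
    # Most violated first; the stable sort puts SKM's argmax choice at order[0]
    order = np.argsort(-(r_tau ** 2), kind="stable")
    U = tau[order[:eta]]                 # block of fixed size eta (1 <= eta <= beta)
    r_U = r_tau[order[:eta]]
    return x + np.linalg.pinv(A[U]) @ r_U, U

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 20))
x_true = rng.standard_normal(20)
b = A @ x_true
x = np.zeros(20)
for _ in range(300):
    x, U = bskm2_step(A, b, x, beta=20, eta=5, rng=rng)
err = np.linalg.norm(x - x_true) / np.linalg.norm(x_true)
print(f"relative error {err:.2e}, block size {len(U)}")
```

Unlike the threshold-based block, the cost of the pseudoinverse here is bounded by the fixed block size `eta`.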
Now, we bound the expected rate of convergence for Algorithm 3.
From an initial guess $x_0 \in \mathcal{R}(A^T)$, the sequence $\{x_k\}_{k=0}^{\infty}$ generated by the BSKM2 method converges linearly in expectation to the least-Euclidean-norm solution $x_\star = A^\dagger b$ and
where satisfies .
According to Lemma 1, we have and , which together with the facts and yield
That is, the convergence factor of the BSKM2 method is indeed smaller than 1.
Note that $i_k$ belongs to the index set of Algorithm 3, where $i_k$ is the iteration index of the SKM method. Thus, similar to the analysis in Remark 4, we immediately obtain that the BSKM2 method converges at least as quickly as the SKM method. In addition, as , where , we also get that the BSKM2 method converges at least as quickly as the Motzkin method.
The update rules of both Algorithms 2 and 3 require the Moore-Penrose pseudoinverse of a row submatrix ($A_{\mathcal{J}_k}$ in Algorithm 2, and its counterpart in Algorithm 3) in each iteration, which may be expensive. To avoid computing the pseudoinverse, a pseudoinverse-free iteration format can be adopted instead.
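One common pseudoinverse-free choice in the literature, used in averaged block Kaczmarz methods such as those of Necoara and of Moorman et al., replaces the block projection by a weighted average of single-row projections. The sketch below applies such an update to a BSKM1-style greedy block; the uniform weights $1/|\mathcal{J}_k|$, the step size `alpha = 1`, and the function name `averaged_block_step` are assumptions, and this is not necessarily the exact format intended here.

```python
import numpy as np

def averaged_block_step(A, b, x, J, alpha=1.0):
    """Pseudoinverse-free block update: weighted average of single-row projections.

    Uniform weights 1/|J| and the step size alpha are assumptions of this sketch.
    """
    A_J = A[J]
    r_J = b[J] - A_J @ x
    row_norms_sq = np.einsum("ij,ij->i", A_J, A_J)   # ||A^{(i)}||^2 for i in J
    w = np.full(len(J), 1.0 / len(J))                # uniform averaging weights
    return x + alpha * (A_J.T @ (w * r_J / row_norms_sq))

rng = np.random.default_rng(4)
m, n, beta = 200, 20, 20
A = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)
b = A @ x_true
x = np.zeros(n)
errors = [np.linalg.norm(x - x_true)]
for _ in range(3000):
    r = b - A @ x
    tau = rng.choice(m, size=beta, replace=False)
    J = np.flatnonzero(r ** 2 >= np.max(r[tau] ** 2))   # BSKM1-style greedy block
    x = averaged_block_step(A, b, x, J)
    errors.append(np.linalg.norm(x - x_true))
print(f"relative error {errors[-1] / errors[0]:.2e}")
```

With `alpha = 1` the new iterate is a convex combination of hyperplane projections, so the error cannot increase; extrapolated steps (`alpha > 1`) can be faster but need a separate analysis.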
5 Numerical experiments
In this section, we compare our two block sampling Kaczmarz-Motzkin methods (BSKM1 and BSKM2) with the SKM method in terms of the number of iterations (denoted as “Iteration”) and the computing time in seconds (denoted as “CPU time(s)”). The test matrices $A$ come from two sets: one is generated randomly by the MATLAB function randn, and the other consists of the matrices in Table 1 from the University of Florida sparse matrix collection Davis2011 . To compare these methods more clearly, we set . In addition, for the sparse matrices, the density is defined as
$$\text{density} = \frac{\text{number of nonzeros of an } m\times n \text{ matrix}}{mn}.$$
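The density formula amounts to counting nonzero entries; a minimal NumPy sketch follows (the function name `density` is ours). For a `scipy.sparse` matrix, the `nnz` attribute gives the number of stored entries, which may include explicitly stored zeros.

```python
import numpy as np

def density(A):
    """Fraction of nonzero entries: nnz(A) / (m * n)."""
    m, n = A.shape
    return np.count_nonzero(A) / (m * n)

A = np.array([[1.0, 0.0, 0.0],
              [0.0, 2.0, 3.0]])
print(density(A))  # 3 nonzeros out of 6 entries -> 0.5
```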
In all the following experiments, we generate the solution vector $x_\star$ using the MATLAB function randn and set $b = Ax_\star$. All experiments start from an initial vector $x_0$ and terminate once the relative solution error (RES) falls below a prescribed tolerance, or once the number of iteration steps exceeds 200,000.
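The experimental protocol can be sketched as a small Python/NumPy driver, with `numpy.random.Generator.standard_normal` standing in for MATLAB's randn. The RES definition $\|x_k - x_\star\|_2^2/\|x_\star\|_2^2$, the tolerance $10^{-6}$, the starting point $x_0 = 0$, the sample size, and the problem dimensions are all assumptions of this sketch; the step functions are compact versions of hypothetical SKM and BSKM1-style rules.

```python
import numpy as np

def skm_step(A, b, x, rng, beta):
    tau = rng.choice(A.shape[0], size=beta, replace=False)
    i = tau[np.argmax((b[tau] - A[tau] @ x) ** 2)]        # most violated sampled row
    return x + (b[i] - A[i] @ x) / (A[i] @ A[i]) * A[i]

def bskm1_step(A, b, x, rng, beta):
    r = b - A @ x
    tau = rng.choice(A.shape[0], size=beta, replace=False)
    J = np.flatnonzero(r ** 2 >= np.max(r[tau] ** 2))     # greedy block from sample threshold
    return x + np.linalg.pinv(A[J]) @ r[J]

def run(step, A, b, x_star, beta, tol=1e-6, max_iter=200_000, seed=0):
    """Iterate from x_0 = 0 until RES <= tol or max_iter steps (RES form and tol assumed)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(A.shape[1])
    denom = np.linalg.norm(x_star) ** 2
    for k in range(1, max_iter + 1):
        x = step(A, b, x, rng, beta)
        res = np.linalg.norm(x - x_star) ** 2 / denom
        if res <= tol:
            return k, res
    return max_iter, res

rng = np.random.default_rng(5)
A = rng.standard_normal((500, 50))      # full column rank, so x_star is the least-norm solution
x_star = rng.standard_normal(50)
b = A @ x_star
results = {name: run(step, A, b, x_star, beta=50)
           for name, step in [("SKM", skm_step), ("BSKM1", bskm1_step)]}
for name, (iters, res) in results.items():
    print(f"{name}: {iters} iterations, RES = {res:.1e}")
```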
For the first class of matrices, i.e., the randomly generated ones, the numerical results of the three methods are presented in Figs. 1–6. Figs. 1–4 show that, for the different parameter values tested, our two methods require fewer iterations and less computing time than the SKM method. Figs. 5–6 show that the BSKM1 and BSKM2 methods vastly outperform the SKM method in both iterations and computing time when the problems are large-scale.
For the second class of matrices, i.e., the sparse matrices from Davis2011 , we plot Iteration and CPU time(s) in Figs. 7–12. These figures show results similar to those in Figs. 1–4: the BSKM1 and BSKM2 methods converge faster and need less runtime to reach the same accuracy.
Therefore, in all the tested cases, our block sampling Kaczmarz-Motzkin methods, i.e., the BSKM1 and BSKM2 methods, outperform the SKM method. This is mainly because the SKM method enforces only one constraint per iteration, while our methods enforce several greedily selected constraints simultaneously.
- (1) S. Kaczmarz, Angenäherte Auflösung von Systemen linearer Gleichungen, Bull. Int. Acad. Pol. Sci. Lett. A, 35, 355–357 (1937)
- (2) T. Strohmer, R. Vershynin, A randomized Kaczmarz algorithm with exponential convergence, J. Fourier Anal. Appl., 15, 262–278 (2009)
- (3) D. Needell, Randomized Kaczmarz solver for noisy linear systems, BIT Numer. Math., 50, 395–403 (2010)
- (4) Y. Eldar, D. Needell, Acceleration of randomized Kaczmarz method via the Johnson-Lindenstrauss lemma, Numer. Algor., 58, 163–177 (2011)
- (5) A. Zouzias, N. M. Freris, Randomized extended Kaczmarz for solving least squares, SIAM J. Matrix Anal. Appl., 34, 773–793 (2013)
- (6) A. Ma, D. Needell, A. Ramdas, Convergence properties of the randomized extended Gauss-Seidel and Kaczmarz methods, SIAM J. Matrix Anal. Appl., 36, 1590–1604 (2015)
- (7) K. Du, Tight upper bounds for the convergence of the randomized extended Kaczmarz and Gauss-Seidel algorithms, Numer. Linear Algebra Appl., 26, e2233 (2019)
- (8) N. C. Wu, H. Xiang, Projected randomized Kaczmarz methods, J. Comput. Appl. Math., 372, 112672 (2020)
- (9) J. Q. Chen, Z. D. Huang, On the error estimate of the randomized double block Kaczmarz method, Appl. Math. Comput., 370, 124907 (2020)
- (10) S. Agmon, The relaxation method for linear inequalities, Canad. J. Math., 6, 382–392 (1954)
- (11) T. S. Motzkin, I. J. Schoenberg, The relaxation method for linear inequalities, Canad. J. Math., 6, 393–404 (1954)
- (12) S. Petra, C. Popa, Single projection Kaczmarz extended algorithms, Numer. Algor., 73, 791–806 (2016)
- (13) J. Nutini, B. Sepehry, A. Virani, I. Laradji, M. Schmidt, H. Koepke, Convergence rates for greedy Kaczmarz algorithms, presented at UAI (2016)
- (14) J. Nutini, Greed is good: greedy optimization methods for large-scale structured problems, PhD thesis, University of British Columbia (2018)
- (15) J. Haddock, D. Needell, On Motzkin’s method for inconsistent linear systems, BIT Numer. Math., 59, 387–401 (2019)
- (16) E. Rebrova, D. Needell, Sketching for Motzkin’s iterative method for linear systems, Proc. 50th Asilomar Conf. on Signals, Systems and Computers (2019)
- (17) H. Y. Li, Y. J. Zhang, A novel greedy Kaczmarz method for solving consistent linear systems, arXiv preprint arXiv:2004.02062 (2020)
- (18) J. A. De Loera, J. Haddock, D. Needell, A sampling Kaczmarz-Motzkin algorithm for linear feasibility, SIAM J. Sci. Comput., 39, S66–S87 (2017)
- (19) M. S. Morshed, M. S. Islam, M. Noor-E-Alam, Accelerated sampling Kaczmarz Motzkin algorithm for the linear feasibility problem, J. Global Optim., 77, 361–382 (2020)
- (20) J. Haddock, A. Ma, Greed works: an improved analysis of sampling Kaczmarz-Motzkin, arXiv preprint arXiv:1912.03544 (2019)
- (21) D. Needell, J. A. Tropp, Paved with good intentions: analysis of a randomized block Kaczmarz method, Linear Algebra Appl., 441, 199–221 (2014)
- (22) D. Needell, R. Zhao, A. Zouzias, Randomized block Kaczmarz method with projection for solving least squares, Linear Algebra Appl., 484, 322–343 (2015)
- (23) Y. Q. Niu, B. Zheng, A greedy block Kaczmarz algorithm for solving large-scale linear systems, Appl. Math. Lett., 104, 106294 (2020)
- (24) R. A. Horn, C. R. Johnson, Matrix analysis, Cambridge Univ. Press (2012)
- (25) I. Necoara, Faster randomized block Kaczmarz algorithms, SIAM J. Matrix Anal. Appl., 40, 1425–1452 (2019)
- (26) K. Du, W. T. Si, X. H. Sun, Pseudoinverse-free randomized extended block Kaczmarz for solving least squares, arXiv preprint arXiv:2001.04179 (2020)
- (27) H. Y. Li, Y. J. Zhang, Greedy block Gauss-Seidel methods for solving large linear least squares problem, arXiv preprint arXiv:2004.02476 (2020)
- (28) J. D. Moorman, T. K. Tu, D. Molitor, D. Needell, Randomized Kaczmarz with averaging, BIT Numer. Math. (2020)
- (29) T. A. Davis, Y. F. Hu, The University of Florida sparse matrix collection, ACM Trans. Math. Softw., 38, 1–25 (2011)