Block sampling Kaczmarz-Motzkin methods for consistent linear systems

11/12/2020 · by Yanjun Zhang, et al. · Chongqing University

The sampling Kaczmarz-Motzkin (SKM) method is a generalization of the randomized Kaczmarz and Motzkin methods. It first samples a subset of rows of the coefficient matrix at random, then applies the maximum violation criterion within this subset to determine a single constraint, and finally makes progress by enforcing that constraint. In this paper, building on the framework of the SKM method and employing greedy strategies, we present two block sampling Kaczmarz-Motzkin methods for consistent linear systems. Specifically, we likewise first sample a subset of rows of the coefficient matrix and determine an index in this subset using the maximum violation criterion. Unlike the SKM method, however, we then devise different greedy strategies to build an index set, and the new methods make progress by enforcing the corresponding multiple constraints simultaneously. Theoretical analyses demonstrate that these block methods converge at least as quickly as the SKM method, and numerical experiments show that, for the same accuracy, our methods outperform the SKM method in terms of both the number of iterations and the computing time.


1 Introduction

We consider the following consistent linear system

$$Ax=b, \qquad (1)$$

where $A\in\mathbb{R}^{m\times n}$ with $m\ge n$, $b\in\mathbb{R}^{m}$, and $x$ is the $n$-dimensional unknown vector. As we know, the Kaczmarz method kaczmarz1 is a popular so-called row-action method for solving the system (1). In 2009, Strohmer and Vershynin Strohmer2009 proved the linear convergence of the randomized Kaczmarz (RK) method. Subsequently, many randomized Kaczmarz-type methods were proposed for different possible system settings; see for example Needell2010 ; Eldar2011 ; Completion2013 ; Completion2015 ; Dukui2019 ; Wu2020 ; Chen2020 and the references therein.
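For concreteness, one RK iteration can be sketched in a few lines of MATLAB; this is a minimal sketch, assuming the row-sampling probabilities $\|A^{(i)}\|_{2}^{2}/\|A\|_{F}^{2}$ of Strohmer2009, and the variable names are our own illustrative choices.

    % One randomized Kaczmarz (RK) iteration: sample row i with
    % probability ||A^(i)||_2^2 / ||A||_F^2, then project onto row i.
    rowNorms2 = sum(A.^2, 2);               % squared row norms
    p = cumsum(rowNorms2) / sum(rowNorms2); % cumulative distribution
    i = find(rand <= p, 1, 'first');        % sampled row index
    x = x + (b(i) - A(i,:)*x) / rowNorms2(i) * A(i,:)';  % projection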

Unlike the RK method, which selects the working row of $A$ according to some probability distribution, the Motzkin method Agamon54 ; Motzkin54 employs a greedy strategy, i.e., the maximum violation criterion, to select the working row in each iteration. Hence, the method is also known as the Kaczmarz method with the "most violated constraint control" or "maximal-residual control" Petra2016 ; Nutini2016 ; Nutini2018 . This greedy strategy makes the Motzkin method outperform the RK method in many cases, and many analyses and applications of the Motzkin method have been published recently; see for example haddock2019motzkin ; rebrova2019sketching ; li2020novel and the references therein.
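A single Motzkin iteration might be sketched as follows; this is a minimal sketch in which the unnormalized residual is used as the violation measure, and a row-norm-weighted variant would divide by $\|A^{(i)}\|_{2}$ before taking the maximum.

    % One Motzkin iteration: enforce the most violated constraint.
    r = b - A*x;                                % full residual vector
    [~, i] = max(abs(r));                       % maximum violation criterion
    x = x + r(i) / (A(i,:)*A(i,:)') * A(i,:)';  % project onto row i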

The sampling Kaczmarz-Motzkin (SKM) method proposed in de2017sampling for solving the linear feasibility problem is a combination of the RK and Motzkin methods. Its accelerated version, which introduces Nesterov's acceleration scheme, was presented in morshed2020accelerated . In addition, an improved analysis of the SKM method was given in haddock2019greed . The SKM method overcomes some drawbacks of the RK and Motzkin methods. For example, the Motzkin method is expensive since it selects the index at each iteration by comparing the residual errors of all the constraints, and the RK method may make progress slowly since it does not employ a greedy strategy. Instead, the SKM method selects an index by comparing the residual errors of only part of the constraints, and it employs the maximum violation criterion. However, the SKM method may still be slow because it makes progress by enforcing only one constraint. Inspired by the block algorithms given in needell2014paved ; needell2015randomized ; Niu2020 , which accelerate the original methods, in this paper we consider block versions of the SKM method. Two greedy strategies are devised to determine the index sets for the block iteration.

The rest of this paper is organized as follows. In Section 2, some notation and preliminaries are given. Our methods and their convergence analyses are presented in Section 3 and Section 4, respectively. Finally, we present the numerical results in Section 5.

2 Notation and preliminaries

Throughout the paper, for a matrix $M$, $M^{(i)}$ and $R(M)$ denote its $i$-th row (or $i$-th entry in the case of a vector) and column space, respectively, and $M^{\dagger}$ denotes its Moore-Penrose pseudoinverse. We denote the number of elements of a set $\mathcal{W}$ by $|\mathcal{W}|$, write $[m]=\{1,2,\ldots,m\}$, let $A_{\mathcal{W}}$ denote the row submatrix of $A$ indexed by $\mathcal{W}$, and let the positive eigenvalues of $A^{T}A$, where $(\cdot)^{T}$ denotes the transpose of a vector or a matrix, be always arranged in algebraically nonincreasing order:

$$\lambda_{1}(A^{T}A)\ge\lambda_{2}(A^{T}A)\ge\cdots\ge\lambda_{r}(A^{T}A)>0,\quad r=\operatorname{rank}(A).$$

In what follows, $\lambda_{\min}(A^{T}A)=\lambda_{r}(A^{T}A)$ denotes the smallest positive eigenvalue, $\|\cdot\|_{2}$ and $\|\cdot\|_{F}$ denote the Euclidean and Frobenius norms, and $\mathbb{E}_{k}$ denotes the conditional expectation with respect to the sampling performed at the $k$-th iteration.
To analyze the convergence of our new methods, the following fact will be used extensively later in the paper.

Lemma 1

(horn2012matrix ) Let $B\in\mathbb{R}^{n\times n}$ be symmetric and $B_{s}\in\mathbb{R}^{s\times s}$ be its principal submatrix. Then, for each $i=1,\ldots,s$,

$$\lambda_{i+n-s}(B)\le\lambda_{i}(B_{s})\le\lambda_{i}(B).$$

In particular, $\lambda_{\min}(B)\le\lambda_{\min}(B_{s})\le\lambda_{\max}(B_{s})\le\lambda_{\max}(B)$.
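The following MATLAB fragment illustrates the lemma numerically; it is a toy check with an arbitrary symmetric matrix and an index set of our own choosing.

    % Numerical illustration of Lemma 1 (eigenvalue interlacing).
    B  = randn(6);  B = (B + B')/2;    % a random symmetric matrix
    S  = [1 3 5];                      % indices of a principal submatrix
    Bs = B(S, S);                      % 3-by-3 principal submatrix
    disp([min(eig(B)) <= min(eig(Bs)), max(eig(Bs)) <= max(eig(B))])  % [1 1]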

In addition, to compare the SKM method and our new methods clearly, we list the SKM method from de2017sampling in Algorithm 1.

  Input:   Matrix $A$, vector $b$, sample size $\beta$, initial estimate $x_{0}$.
  Output:  Approximate solution $x$ of $Ax=b$.
  1. Choose a sample of $\beta$ constraints, $\tau_{k}$, uniformly at random from among the rows of $A$.
  2. Set $i_{k}=\arg\max_{i\in\tau_{k}}\frac{|b^{(i)}-A^{(i)}x_{k}|^{2}}{\|A^{(i)}\|_{2}^{2}}$.
  3. Update $x_{k+1}=x_{k}+\frac{b^{(i_{k})}-A^{(i_{k})}x_{k}}{\|A^{(i_{k})}\|_{2}^{2}}\,(A^{(i_{k})})^{T}$.
  4. Repeat.
Algorithm 1 The SKM method
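A minimal MATLAB sketch of Algorithm 1 may read as follows; the function name skm and its interface are our own illustrative choices, and the stopping rule is a simple residual test rather than the RES criterion of Section 5.

    function x = skm(A, b, beta, x0, maxit, tol)
    % Sampling Kaczmarz-Motzkin (sketch): sample beta rows uniformly,
    % then enforce the most violated constraint within the sample.
    m = size(A, 1);
    x = x0;
    rowNorms2 = sum(A.^2, 2);                     % squared row norms
    for k = 1:maxit
        tau = randperm(m, beta);                  % step 1: uniform sample
        rtau = b(tau) - A(tau,:)*x;               % residuals on the sample
        [~, t] = max(rtau.^2 ./ rowNorms2(tau));  % step 2: max violation
        i = tau(t);
        x = x + rtau(t) / rowNorms2(i) * A(i,:)'; % step 3: projection
        if norm(b - A*x) <= tol, break; end       % step 4: repeat
    end
    end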

3 The first block sampling Kaczmarz-Motzkin method

The first block sampling Kaczmarz-Motzkin (BSKM1) method is presented in Algorithm 2. Compared with the SKM method, the main difference, and the key of the BSKM1 method, is how the index set used for updating the approximation is devised. We mainly use a threshold value obtained from the small sampled set, which is the same as the one in Algorithm 1, to build the index set.

  Input:   Matrix $A$, vector $b$, sample size $\beta$, initial estimate $x_{0}$.
  Output:  Approximate solution $x$ of $Ax=b$.
  1. Choose a sample of $\beta$ constraints, $\tau_{k}$, uniformly at random from among the rows of $A$.
  2. Set $i_{k}=\arg\max_{i\in\tau_{k}}\frac{|b^{(i)}-A^{(i)}x_{k}|^{2}}{\|A^{(i)}\|_{2}^{2}}$ and $\varepsilon_{k}=\frac{|b^{(i_{k})}-A^{(i_{k})}x_{k}|^{2}}{\|A^{(i_{k})}\|_{2}^{2}}$.
  3. Determine the index set
     $$\mathcal{N}_{k}=\left\{\,i\;\middle|\;|b^{(i)}-A^{(i)}x_{k}|^{2}\ge\varepsilon_{k}\,\|A^{(i)}\|_{2}^{2}\,\right\}.$$
  4. Update $x_{k+1}=x_{k}+A_{\mathcal{N}_{k}}^{\dagger}\left(b_{\mathcal{N}_{k}}-A_{\mathcal{N}_{k}}x_{k}\right)$.
  5. Repeat.
Algorithm 2 The BSKM1 method
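A minimal MATLAB sketch of Algorithm 2 might look as follows; the function name bskm1 and its interface are illustrative, and pinv is used for clarity even though Remark 9 below discusses pseudoinverse-free alternatives.

    function x = bskm1(A, b, beta, x0, maxit, tol)
    % BSKM1 (sketch): threshold from the sampled maximum violation,
    % then one block projection onto all rows above the threshold.
    m = size(A, 1);
    x = x0;
    rowNorms2 = sum(A.^2, 2);                      % squared row norms
    for k = 1:maxit
        r = b - A*x;                               % full residual
        if norm(r) <= tol, break; end
        tau = randperm(m, beta);                   % step 1: uniform sample
        epsk = max(r(tau).^2 ./ rowNorms2(tau));   % step 2: threshold
        N = find(r.^2 >= epsk * rowNorms2);        % step 3: greedy index set
        x = x + pinv(A(N,:)) * r(N);               % step 4: block projection
    end
    end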
Remark 1

Note that if $i=i_{k}$, then $|b^{(i_{k})}-A^{(i_{k})}x_{k}|^{2}=\varepsilon_{k}\,\|A^{(i_{k})}\|_{2}^{2}$, so $i_{k}\in\mathcal{N}_{k}$. Hence the index set $\mathcal{N}_{k}$ in Algorithm 2 is always nonempty.

Remark 2

Compared with the SKM method, in each iteration the BSKM1 method can eliminate several of the most violated constraints simultaneously. So, the BSKM1 method converges faster; see the detailed discussion following Theorem 3.1. In addition, if $\beta=m$, the BSKM1 method reduces to the Motzkin method: the sampled maximum is then the global maximum, and $\mathcal{N}_{k}$ consists of the rows attaining it. This means that the BSKM1 method can also be regarded as a block version of the Motzkin method.

Now, we provide the convergence of Algorithm 2.

Theorem 3.1

From an initial guess $x_{0}\in R(A^{T})$, the sequence $\{x_{k}\}_{k=0}^{\infty}$ generated by the BSKM1 method converges linearly in expectation to the least-Euclidean-norm solution $x_{\star}=A^{\dagger}b$ and

$$\mathbb{E}_{k}\,\|x_{k+1}-x_{\star}\|_{2}^{2}\le\left(1-\rho\right)\|x_{k}-x_{\star}\|_{2}^{2},$$

where

$$\rho=\frac{\beta}{m}\cdot\min_{\emptyset\neq\mathcal{N}\subseteq[m]}\frac{\|A_{\mathcal{N}}\|_{F}^{2}}{\lambda_{\max}\!\left(A_{\mathcal{N}}^{T}A_{\mathcal{N}}\right)}\cdot\frac{\lambda_{\min}(A^{T}A)}{\displaystyle\max_{|\tau|=\beta}\|A_{\tau}\|_{F}^{2}}. \qquad (2)$$
Proof

From Algorithm 2, using the fact $b_{\mathcal{N}_{k}}=A_{\mathcal{N}_{k}}x_{\star}$, we have

$$x_{k+1}-x_{\star}=\left(I-A_{\mathcal{N}_{k}}^{\dagger}A_{\mathcal{N}_{k}}\right)(x_{k}-x_{\star}).$$

Since $A_{\mathcal{N}_{k}}^{\dagger}A_{\mathcal{N}_{k}}$ is an orthogonal projector, taking the square of the Euclidean norm on both sides and using the Pythagorean theorem, we get

$$\|x_{k+1}-x_{\star}\|_{2}^{2}=\|x_{k}-x_{\star}\|_{2}^{2}-\left\|A_{\mathcal{N}_{k}}^{\dagger}A_{\mathcal{N}_{k}}(x_{k}-x_{\star})\right\|_{2}^{2},$$

which together with the Courant-Fischer theorem:

$$\lambda_{\min}(B^{T}B)\,\|z\|_{2}^{2}\le\|Bz\|_{2}^{2}\le\lambda_{\max}(B^{T}B)\,\|z\|_{2}^{2}\quad\text{for all }z\in R(B^{T}), \qquad (3)$$

and the fact $A_{\mathcal{N}_{k}}(x_{k}-x_{\star})=A_{\mathcal{N}_{k}}\left(A_{\mathcal{N}_{k}}^{\dagger}A_{\mathcal{N}_{k}}(x_{k}-x_{\star})\right)$ yields

$$\|x_{k+1}-x_{\star}\|_{2}^{2}\le\|x_{k}-x_{\star}\|_{2}^{2}-\frac{\|b_{\mathcal{N}_{k}}-A_{\mathcal{N}_{k}}x_{k}\|_{2}^{2}}{\lambda_{\max}(A_{\mathcal{N}_{k}}^{T}A_{\mathcal{N}_{k}})}.$$

On the other hand, from Algorithm 2, if $i\in\mathcal{N}_{k}$, we have $|b^{(i)}-A^{(i)}x_{k}|^{2}\ge\varepsilon_{k}\,\|A^{(i)}\|_{2}^{2}$. Then

$$\|b_{\mathcal{N}_{k}}-A_{\mathcal{N}_{k}}x_{k}\|_{2}^{2}=\sum_{i\in\mathcal{N}_{k}}|b^{(i)}-A^{(i)}x_{k}|^{2}\ge\varepsilon_{k}\,\|A_{\mathcal{N}_{k}}\|_{F}^{2}\ge\frac{\|b_{\tau_{k}}-A_{\tau_{k}}x_{k}\|_{2}^{2}}{\displaystyle\max_{|\tau|=\beta}\|A_{\tau}\|_{F}^{2}}\,\|A_{\mathcal{N}_{k}}\|_{F}^{2},$$

where the last inequality follows from $\varepsilon_{k}=\max_{i\in\tau_{k}}\frac{|b^{(i)}-A^{(i)}x_{k}|^{2}}{\|A^{(i)}\|_{2}^{2}}\ge\frac{\sum_{i\in\tau_{k}}|b^{(i)}-A^{(i)}x_{k}|^{2}}{\sum_{i\in\tau_{k}}\|A^{(i)}\|_{2}^{2}}$. Now, taking expectation of both sides (with respect to the sampled $\tau_{k}$) and using $\mathbb{E}_{k}\,\|b_{\tau_{k}}-A_{\tau_{k}}x_{k}\|_{2}^{2}=\frac{\beta}{m}\|b-Ax_{k}\|_{2}^{2}$, we have

$$\mathbb{E}_{k}\,\|x_{k+1}-x_{\star}\|_{2}^{2}\le\|x_{k}-x_{\star}\|_{2}^{2}-\frac{\beta}{m}\cdot\min_{\emptyset\neq\mathcal{N}\subseteq[m]}\frac{\|A_{\mathcal{N}}\|_{F}^{2}}{\lambda_{\max}(A_{\mathcal{N}}^{T}A_{\mathcal{N}})}\cdot\frac{\|b-Ax_{k}\|_{2}^{2}}{\displaystyle\max_{|\tau|=\beta}\|A_{\tau}\|_{F}^{2}},$$

which together with $\rho$ defined in (2) and, considering (3), the fact $\|b-Ax_{k}\|_{2}^{2}=\|A(x_{k}-x_{\star})\|_{2}^{2}\ge\lambda_{\min}(A^{T}A)\,\|x_{k}-x_{\star}\|_{2}^{2}$ (note that $x_{k}-x_{\star}\in R(A^{T})$) leads to

$$\mathbb{E}_{k}\,\|x_{k+1}-x_{\star}\|_{2}^{2}\le\left(1-\rho\right)\|x_{k}-x_{\star}\|_{2}^{2},$$

which is the desired result.

Remark 3

According to Lemma 1, it is easy to see that $\lambda_{\max}(A_{\mathcal{N}}^{T}A_{\mathcal{N}})=\lambda_{\max}(A_{\mathcal{N}}A_{\mathcal{N}}^{T})\le\lambda_{1}(AA^{T})=\lambda_{1}(A^{T}A)$ for every nonempty $\mathcal{N}\subseteq[m]$, which together with the facts $\|A_{\mathcal{N}}\|_{F}^{2}\ge\lambda_{\max}(A_{\mathcal{N}}^{T}A_{\mathcal{N}})$ and $\lambda_{\min}(A^{T}A)>0$ yields $0<\rho\le1$, where the upper bound follows from the proof since the expected decrement never exceeds $\|x_{k}-x_{\star}\|_{2}^{2}$. That is, the convergence factor $1-\rho$ of the BSKM1 method is indeed smaller than 1.

Remark 4

From Algorithm 2, we know that $i_{k}\in\mathcal{N}_{k}$ since $\varepsilon_{k}=\frac{|b^{(i_{k})}-A^{(i_{k})}x_{k}|^{2}}{\|A^{(i_{k})}\|_{2}^{2}}$. Similar to the analysis in haddock2019greed , we can obtain

$$\left\|A_{\mathcal{N}_{k}}^{\dagger}A_{\mathcal{N}_{k}}(x_{k}-x_{\star})\right\|_{2}^{2}\ge\frac{|b^{(i_{k})}-A^{(i_{k})}x_{k}|^{2}}{\|A^{(i_{k})}\|_{2}^{2}},$$

which together with the fact

$$\|x_{k+1}-x_{\star}\|_{2}^{2}=\|x_{k}-x_{\star}\|_{2}^{2}-\left\|A_{\mathcal{N}_{k}}^{\dagger}A_{\mathcal{N}_{k}}(x_{k}-x_{\star})\right\|_{2}^{2}$$

leads to

$$\|\tilde{x}_{k+1}-x_{\star}\|_{2}^{2}\le\|\hat{x}_{k+1}-x_{\star}\|_{2}^{2}.$$

In the above expressions, $\tilde{x}_{k+1}$ and $\hat{x}_{k+1}$ denote the next approximations generated by the BSKM1 and SKM methods, respectively, starting from the same $x_{k}$ and the same sample $\tau_{k}$. Hence, the BSKM1 method converges at least as quickly as the SKM method.

In addition, setting $\beta=m$, from Algorithm 2 we can obtain $i_{k}^{\star}\in\mathcal{N}_{k}$, where $i_{k}^{\star}$ is the index selected by the Motzkin method, since the sampled maximum is then the global one. So, by the same argument, we also immediately get that the BSKM1 method converges at least as quickly as the Motzkin method.

4 The second block sampling Kaczmarz-Motzkin method

Considering that the criterion in Algorithm 2 may be satisfied for all $i\in[m]$ and hence the size of the index set $\mathcal{N}_{k}$ cannot be controlled, we design the second block sampling Kaczmarz-Motzkin (BSKM2) method, which is presented in Algorithm 3. The biggest difference between the BSKM2 and BSKM1 methods is the way the index set is built: for Algorithm 3, we can control the size of the index set $\mathcal{M}_{k}$.

  Input:   Matrix $A$, vector $b$, parameters $\beta$ and $l$ with $l\beta\le m$, initial estimate $x_{0}$.
  Output:  Approximate solution $x$ of $Ax=b$.
  1. For $t=1,2,\ldots,l$
  2.   Choose a sample of $\beta$ constraints, $\tau_{k}^{(t)}$, uniformly at random from among the rows of $A$ without replacement, so that $\tau_{k}^{(1)},\ldots,\tau_{k}^{(l)}$ are pairwise disjoint.
  3.   Compute $i_{k}^{(t)}=\arg\max_{i\in\tau_{k}^{(t)}}\frac{|b^{(i)}-A^{(i)}x_{k}|^{2}}{\|A^{(i)}\|_{2}^{2}}$.
  4. End for
  5. Determine the index set $\mathcal{M}_{k}=\{i_{k}^{(1)},i_{k}^{(2)},\ldots,i_{k}^{(l)}\}$.
  6. Update $x_{k+1}=x_{k}+A_{\mathcal{M}_{k}}^{\dagger}\left(b_{\mathcal{M}_{k}}-A_{\mathcal{M}_{k}}x_{k}\right)$.
  7. Repeat.
Algorithm 3 The BSKM2 method
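A minimal MATLAB sketch of Algorithm 3 might look as follows; the function name bskm2 and its interface are illustrative, and the $l$ samples are drawn as disjoint blocks of one random permutation, which is one way to realize sampling without replacement.

    function x = bskm2(A, b, beta, l, x0, maxit, tol)
    % BSKM2 (sketch): l disjoint samples of beta rows, one maximum
    % violation index per sample, then one block projection.
    m = size(A, 1);
    x = x0;
    rowNorms2 = sum(A.^2, 2);                    % squared row norms
    for k = 1:maxit
        r = b - A*x;
        if norm(r) <= tol, break; end
        perm = randperm(m, l*beta);              % l*beta distinct rows
        M = zeros(l, 1);
        for t = 1:l
            tau = perm((t-1)*beta+1 : t*beta);   % t-th disjoint sample
            [~, s] = max(r(tau).^2 ./ rowNorms2(tau));
            M(t) = tau(s);                       % most violated in sample t
        end
        x = x + pinv(A(M,:)) * r(M);             % block projection update
    end
    end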
Remark 5

Note that if $l\ge1$, then $i_{k}^{(1)}\in\mathcal{M}_{k}$. So the index set $\mathcal{M}_{k}$ in Algorithm 3 is always nonempty. Moreover, since the samples $\tau_{k}^{(1)},\ldots,\tau_{k}^{(l)}$ are pairwise disjoint, the selected indices are distinct and $|\mathcal{M}_{k}|=l$.

Remark 6

The iteration index used for updating in the SKM method, applied with the sample $\tau_{k}^{(1)}$, belongs to the index set $\mathcal{M}_{k}$ used in the BSKM2 method. So the latter makes progress faster than the former. In addition, if $\beta=m$ (in which case $l=1$), the BSKM2 method reduces to the Motzkin method.

Now, we bound the expected rate of convergence for Algorithm 3.

Theorem 4.1

From an initial guess $x_{0}\in R(A^{T})$, the sequence $\{x_{k}\}_{k=0}^{\infty}$ generated by the BSKM2 method converges linearly in expectation to the least-Euclidean-norm solution $x_{\star}=A^{\dagger}b$ and

$$\mathbb{E}_{k}\,\|x_{k+1}-x_{\star}\|_{2}^{2}\le\left(1-\frac{l\,\min_{1\le i\le m}\|A^{(i)}\|_{2}^{2}}{m\,\max_{1\le i\le m}\|A^{(i)}\|_{2}^{2}}\cdot\frac{\lambda_{\min}(A^{T}A)}{\delta}\right)\|x_{k}-x_{\star}\|_{2}^{2},$$

where $\delta=\max_{|\mathcal{M}|\le l}\lambda_{\max}(A_{\mathcal{M}}^{T}A_{\mathcal{M}})$ satisfies $\delta\le\lambda_{1}(A^{T}A)$.

Proof

Following an argument analogous to that of Theorem 3.1, we can obtain

$$\|x_{k+1}-x_{\star}\|_{2}^{2}\le\|x_{k}-x_{\star}\|_{2}^{2}-\frac{\|b_{\mathcal{M}_{k}}-A_{\mathcal{M}_{k}}x_{k}\|_{2}^{2}}{\lambda_{\max}(A_{\mathcal{M}_{k}}^{T}A_{\mathcal{M}_{k}})}\le\|x_{k}-x_{\star}\|_{2}^{2}-\frac{\|b_{\mathcal{M}_{k}}-A_{\mathcal{M}_{k}}x_{k}\|_{2}^{2}}{\delta}.$$

Now, taking expectation of both sides, we have

$$\mathbb{E}_{k}\,\|x_{k+1}-x_{\star}\|_{2}^{2}\le\|x_{k}-x_{\star}\|_{2}^{2}-\frac{1}{\delta}\,\mathbb{E}_{k}\,\|b_{\mathcal{M}_{k}}-A_{\mathcal{M}_{k}}x_{k}\|_{2}^{2}.$$

Note that

$$\|b_{\mathcal{M}_{k}}-A_{\mathcal{M}_{k}}x_{k}\|_{2}^{2}=\sum_{t=1}^{l}\left|b^{(i_{k}^{(t)})}-A^{(i_{k}^{(t)})}x_{k}\right|^{2}\ge\min_{1\le i\le m}\|A^{(i)}\|_{2}^{2}\,\sum_{t=1}^{l}\max_{i\in\tau_{k}^{(t)}}\frac{|b^{(i)}-A^{(i)}x_{k}|^{2}}{\|A^{(i)}\|_{2}^{2}},$$

which together with

$$\mathbb{E}_{k}\left[\sum_{t=1}^{l}\max_{i\in\tau_{k}^{(t)}}\frac{|b^{(i)}-A^{(i)}x_{k}|^{2}}{\|A^{(i)}\|_{2}^{2}}\right]\ge\mathbb{E}_{k}\left[\frac{1}{\beta}\sum_{i\in\tau_{k}^{(1)}\cup\cdots\cup\tau_{k}^{(l)}}\frac{|b^{(i)}-A^{(i)}x_{k}|^{2}}{\|A^{(i)}\|_{2}^{2}}\right]=\frac{l}{m}\sum_{i=1}^{m}\frac{|b^{(i)}-A^{(i)}x_{k}|^{2}}{\|A^{(i)}\|_{2}^{2}}$$

leads to

$$\mathbb{E}_{k}\,\|b_{\mathcal{M}_{k}}-A_{\mathcal{M}_{k}}x_{k}\|_{2}^{2}\ge\frac{l\,\min_{1\le i\le m}\|A^{(i)}\|_{2}^{2}}{m\,\max_{1\le i\le m}\|A^{(i)}\|_{2}^{2}}\,\|b-Ax_{k}\|_{2}^{2}.$$

Then, we get

$$\mathbb{E}_{k}\,\|x_{k+1}-x_{\star}\|_{2}^{2}\le\|x_{k}-x_{\star}\|_{2}^{2}-\frac{l\,\min_{1\le i\le m}\|A^{(i)}\|_{2}^{2}}{m\,\max_{1\le i\le m}\|A^{(i)}\|_{2}^{2}}\cdot\frac{\|b-Ax_{k}\|_{2}^{2}}{\delta},$$

which together with (3) and the fact $x_{k}-x_{\star}\in R(A^{T})$ leads to the stated bound, which is the desired result.

Remark 7

According to Lemma 1, we have $\lambda_{\max}(A_{\mathcal{M}}^{T}A_{\mathcal{M}})=\lambda_{\max}(A_{\mathcal{M}}A_{\mathcal{M}}^{T})\le\lambda_{1}(AA^{T})$ and $\lambda_{1}(AA^{T})=\lambda_{1}(A^{T}A)$, which together with the facts $\lambda_{\min}(A^{T}A)>0$ and $l\,\min_{i}\|A^{(i)}\|_{2}^{2}\le m\,\max_{i}\|A^{(i)}\|_{2}^{2}$ yield

$$0<\frac{l\,\min_{1\le i\le m}\|A^{(i)}\|_{2}^{2}}{m\,\max_{1\le i\le m}\|A^{(i)}\|_{2}^{2}}\cdot\frac{\lambda_{\min}(A^{T}A)}{\delta}\le1,$$

where the upper bound follows from the proof since the expected decrement never exceeds $\|x_{k}-x_{\star}\|_{2}^{2}$. That is, the convergence factor of the BSKM2 method is indeed smaller than 1.

Remark 8

Note that $i_{k}\in\mathcal{M}_{k}$, where $i_{k}$ is the iteration index of the SKM method applied with the sample $\tau_{k}^{(1)}$. Thus, similar to the analysis in Remark 4, we immediately obtain that the BSKM2 method converges at least as quickly as the SKM method. In addition, as $i_{k}^{\star}\in\mathcal{M}_{k}$ when $\beta=m$, where $i_{k}^{\star}$ is the index selected by the Motzkin method, we also get that the BSKM2 method converges at least as quickly as the Motzkin method.

Remark 9

From Algorithms 2 and 3, we can see that the update rules of both methods need to compute the Moore-Penrose pseudoinverse of the row submatrix $A_{\mathcal{N}_{k}}$ or $A_{\mathcal{M}_{k}}$ in each iteration, which may be expensive. To avoid computing the Moore-Penrose pseudoinverse, we can adopt the following pseudoinverse-free iteration format:

$$x_{k+1}=x_{k}+\sum_{i\in\mathcal{J}_{k}}w_{i}\,\frac{b^{(i)}-A^{(i)}x_{k}}{\|A^{(i)}\|_{2}^{2}}\,(A^{(i)})^{T},$$

where $\mathcal{J}_{k}$ is the current index set and $w_{i}$ represents the weight corresponding to the $i$th row. See Necoara2019 ; Du20202 ; li2020greedy ; moorman2020randomized for a detailed discussion on this topic.
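As one concrete instance, the following fragment applies the above format over a given index set J with uniform averaging weights $w_{i}=1/|\mathcal{J}_{k}|$; the uniform weighting is our own illustrative choice, and other weightings are analyzed in the cited works.

    % Pseudoinverse-free block update over an index set J with
    % uniform averaging weights w_i = 1/|J| (illustrative choice).
    rowNorms2 = sum(A.^2, 2);                 % squared row norms
    r = b - A*x;                              % current residual
    w = ones(numel(J), 1) / numel(J);         % uniform weights over J
    x = x + A(J,:)' * (w .* (r(J) ./ rowNorms2(J)));  % weighted update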

5 Numerical experiments

In this section, we mainly compare our two block sampling Kaczmarz-Motzkin methods (BSKM1, BSKM2) with the SKM method in terms of the number of iterations (denoted as "Iteration") and the computing time in seconds (denoted as "CPU time(s)"), using matrices from two sets. One set is generated randomly by using the MATLAB function randn, and the other contains the matrices in Table 1 from the University of Florida sparse matrix collection Davis2011 . To compare these methods more clearly, we run all three methods with the same sample size $\beta$. In addition, for the sparse matrices, the density is defined as follows:

$$\text{density}=\frac{\text{number of nonzeros of the matrix}}{\text{number of rows}\times\text{number of columns}}.$$

name              ch8-8-b2    ch7-8-b2    Franz7      ch7-9-b2    mk12-b2     relat7
Full rank         Yes         Yes         Yes         Yes         Yes         No
Density           0.19%       0.26%       0.23%       0.20%       0.20%       0.36%
Condition number  1.6326e+15  1.9439e+15  5.5318e+15  1.6077e+15  1.8340e+15  Inf
Table 1: The properties of different sparse matrices.

In all the following experiments, we generate the solution vector $x_{\star}$ using the MATLAB function randn and set the vector $b$ to be $Ax_{\star}$. All experiments start from the initial vector $x_{0}=0$ and terminate once the relative solution error (RES), defined by

$$\mathrm{RES}=\frac{\|x_{k}-x_{\star}\|_{2}^{2}}{\|x_{\star}\|_{2}^{2}},$$

falls below a prescribed tolerance, or the number of iteration steps exceeds 200,000.
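This setup can be reproduced with a few lines of MATLAB; this is a sketch, and the concrete problem size, sample size, and tolerance below are illustrative stand-ins since the original values are tied to each figure.

    % Sketch of the experimental setup for a randomly generated problem.
    m = 10000; n = 500;                  % illustrative problem size
    A = randn(m, n);                     % random coefficient matrix
    xstar = randn(n, 1);                 % solution vector
    b = A * xstar;                       % consistent right-hand side
    x0 = zeros(n, 1);                    % initial vector
    % Monitor RES = norm(x - xstar)^2 / norm(xstar)^2 per iteration and
    % stop once RES is small enough or after 200000 iterations, e.g.:
    % x = bskm1(A, b, 100, x0, 200000, 1e-6*norm(b));   % see Section 3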

Figure 1: Iteration and CPU time(s) versus the sample size $\beta$ for three methods with matrices generated randomly.
Figure 2: Iteration and CPU time(s) versus the sample size $\beta$ for three methods with matrices generated randomly.
Figure 3: Iteration and CPU time(s) versus the sample size $\beta$ for three methods with matrices generated randomly.
Figure 4: Iteration and CPU time(s) versus the sample size $\beta$ for three methods with matrices generated randomly.
Figure 5: Iteration and CPU time(s) versus m (from 10000 to 50000) for three methods with matrices generated randomly and with n and $\beta$ fixed.
Figure 6: Iteration and CPU time(s) versus n (from 500 to 5000) for three methods with matrices generated randomly and with m and $\beta$ fixed.
Figure 7: Iteration and CPU time(s) versus the sample size $\beta$ for three methods with the sparse matrix ch8-8-b2.
Figure 8: Iteration and CPU time(s) versus the sample size $\beta$ for three methods with the sparse matrix ch7-8-b2.
Figure 9: Iteration and CPU time(s) versus the sample size $\beta$ for three methods with the sparse matrix Franz7.
Figure 10: Iteration and CPU time(s) versus the sample size $\beta$ for three methods with the sparse matrix ch7-9-b2.
Figure 11: Iteration and CPU time(s) versus the sample size $\beta$ for three methods with the sparse matrix mk12-b2.
Figure 12: Iteration and CPU time(s) versus the sample size $\beta$ for three methods with the sparse matrix relat7.

For the first class of matrices, that is, the matrices generated randomly, the numerical results of the three methods are presented in Figs. 1-6. Figs. 1-4 show that, for different values of $\beta$, the number of iteration steps and the computing time of our two methods are smaller than those of the SKM method. From Figs. 5-6, we find that the BSKM1 and BSKM2 methods vastly outperform the SKM method in terms of iterations and computing time when the problems are large-scale.

For the second class of matrices, that is, the sparse matrices from Davis2011 , we plot the numerical results on Iteration and CPU time(s) versus $\beta$ in Figs. 7-12. From these figures, we find results similar to those shown in Figs. 1-4. That is, the BSKM1 and BSKM2 methods converge faster and need less runtime for the same accuracy.

Therefore, in all the cases, our block sampling Kaczmarz-Motzkin methods, i.e., the BSKM1 and BSKM2 methods, outperform the SKM method. This is mainly because the SKM method enforces only one constraint in each iteration, while our methods enforce multiple greedily selected constraints simultaneously.

References

  • (1) S. Kaczmarz, Angenäherte Auflösung von Systemen linearer Gleichungen, Bull. Int. Acad. Pol. Sci. Lett. A., 35, 355–357 (1937)
  • (2) T. Strohmer, R. Vershynin, A randomized Kaczmarz algorithm with exponential convergence, J. Fourier Anal. Appl., 15, 262–278 (2009)
  • (3) D. Needell, Randomized Kaczmarz solver for noisy linear systems, BIT Numer. Math., 50, 395–403 (2010)
  • (4) Y. Eldar, D. Needell, Acceleration of randomized Kaczmarz method via the Johnson-Lindenstrauss lemma, Numer. Algor., 58, 163–177 (2011)
  • (5) A. Zouzias, M. N. Freris, Randomized extended Kaczmarz for solving least squares, SIAM J. Matrix Anal. Appl., 34, 773–793 (2013)
  • (6) A. Ma, D. Needell, A. Ramdas, Convergence properties of the randomized extended Gauss-Seidel and Kaczmarz methods, SIAM J. Matrix Anal. Appl., 36, 1590–1604 (2015)
  • (7) K. Du, Tight upper bounds for the convergence of the randomized extended Kaczmarz and Gauss-Seidel algorithms, Numer. Linear Algebra Appl., 26, e2233 (2019)
  • (8) N. C. Wu, H. Xiang, Projected randomized Kaczmarz methods, J. Comput. Appl. Math., 372, 112672 (2020)
  • (9) J. Q. Chen, Z. D. Huang, On the error estimate of the randomized double block Kaczmarz method, Appl. Math. Comput., 370, 124907 (2020)
  • (10) S. Agmon, The relaxation method for linear inequalities, Canad. J. Math., 6, 382–392 (1954)
  • (11) T. S. Motzkin, I. J. Schoenberg, The relaxation method for linear inequalities, Canad. J. Math., 6, 393–404 (1954)
  • (12) S. Petra, C. Popa, Single projection Kaczmarz extended algorithms, Numer. Algor., 73, 791–806 (2016)
  • (13) J. Nutini, B. Sepehry, A. Virani, I. Laradji, M. Schmidt, H. Koepke, Convergence rates for greedy Kaczmarz algorithms, presented at UAI (2016)
  • (14) J. Nutini, Greed is good: greedy optimization methods for large-scale structured problems, PhD thesis, University of British Columbia (2018)
  • (15) J. Haddock, D. Needell, On Motzkin’s method for inconsistent linear systems, BIT Numer. Math., 59, 387–401 (2019)
  • (16) E. Rebrova, D. Needell, Sketching for Motzkin’s iterative method for linear systems, Proc. 50th Asilomar Conf. on Signals, Systems and Computers (2019)
  • (17) H. Y. Li, Y. J. Zhang, A novel greedy Kaczmarz method for solving consistent linear systems, arXiv preprint arXiv:2004.02062 (2020)
  • (18) J. A. De Loera, J. Haddock, D. Needell, A sampling Kaczmarz-Motzkin algorithm for linear feasibility, SIAM J. Sci. Comput., 39, S66–S87 (2017)
  • (19) M. S. Morshed, M. S. Islam, M. Noor-E-Alam, Accelerated sampling Kaczmarz Motzkin algorithm for the linear feasibility problem, J. Global Optim., 77, 361–382 (2020)
  • (20) J. Haddock, A. Ma, Greed works: an improved analysis of sampling Kaczmarz-Motzkin, arXiv preprint arXiv:1912.03544 (2019)
  • (21) D. Needell, J. A. Tropp, Paved with good intentions: analysis of a randomized block Kaczmarz method, Linear Algebra Appl., 441, 199–221 (2014)
  • (22) D. Needell, R. Zhao, A. Zouzias, Randomized block Kaczmarz method with projection for solving least squares, Linear Algebra Appl., 484, 322–343 (2015)
  • (23) Y. Q. Niu, B. Zheng, A greedy block Kaczmarz algorithm for solving large-scale linear systems, Appl. Math. Lett., 104, 106294 (2020)
  • (24) R. A. Horn, C. R. Johnson, Matrix analysis, Cambridge Univ. Press (2012)
  • (25) I. Necoara, Faster randomized block Kaczmarz algorithms, SIAM J. Matrix Anal. Appl., 40, 1425–1452 (2019)
  • (26) K. Du, W. T. Si, X. H. Sun, Pseudoinverse-free randomized extended block Kaczmarz for solving least squares, arXiv preprint arXiv:2001.04179 (2020)
  • (27) H. Y. Li, Y. J. Zhang, Greedy block Gauss-Seidel methods for solving large linear least squares problem, arXiv preprint arXiv:2004.02476 (2020)
  • (28) J. D. Moorman, T. K. Tu, D. Molitor, D. Needell, Randomized Kaczmarz with averaging, BIT Numer. Math. (2020)
  • (29) T. A. Davis, Y. F. Hu, The University of Florida sparse matrix collection, ACM Trans. Math. Softw., 38, 1–25 (2011)