A Novel Greedy Kaczmarz Method For Solving Consistent Linear Systems

04/05/2020 · by Hanyu Li, et al. · Chongqing University

With a quite different way to determine the working rows, we propose a novel greedy Kaczmarz method for solving consistent linear systems. Convergence analysis of the new method is provided. Numerical experiments show that, for the same accuracy, our method outperforms the greedy randomized Kaczmarz method and the relaxed greedy randomized Kaczmarz method introduced recently by Bai and Wu [Z.Z. Bai and W.T. Wu, On greedy randomized Kaczmarz method for solving large sparse linear systems, SIAM J. Sci. Comput., 40 (2018), pp. A592–A606; Z.Z. Bai and W.T. Wu, On relaxed greedy randomized Kaczmarz methods for solving large sparse linear systems, Appl. Math. Lett., 83 (2018), pp. 21–26] in terms of computing time.


1 Introduction

We consider the following consistent linear systems

(1)  $Ax = b$,

where $A \in \mathbb{C}^{m \times n}$, $b \in \mathbb{C}^{m}$, and $x$ is the $n$-dimensional unknown vector. As we know, the Kaczmarz method

[kaczmarz1] is a popular so-called row-action method for solving the systems (1). In 2009, Strohmer and Vershynin [Strohmer2009] proved the linear convergence of the randomized Kaczmarz (RK) method. Following that, Needell [Needell2010] found that the RK method does not converge to the ordinary least squares solution when the system is inconsistent. To overcome this, Zouzias and Freris [Completion2013] extended the RK method to the randomized extended Kaczmarz (REK) method. Later, Ma, Needell, and Ramdas [Completion2015] provided a unified theory of these related iterative methods in all possible system settings. Recently, many works on Kaczmarz methods have been reported; see for example [Bai2018, Bai2018r, Gao2019, Dukui2019, Wu2020, Chen2020, Niu2020] and references therein.

In 2018, Bai and Wu [Bai2018] first constructed a greedy randomized Kaczmarz (GRK) method by introducing an efficient probability criterion for selecting the working rows from the coefficient matrix $A$, which avoids a weakness of the criterion adopted in the RK method. As a result, the GRK method is faster than the RK method in terms of both the number of iterations and the computing time. Subsequently, based on the GRK method, a so-called relaxed greedy randomized Kaczmarz (RGRK) method was proposed in [Bai2018r] by introducing a relaxation parameter $\theta \in [0, 1]$, which makes the convergence factor of the RGRK method smaller than that of the GRK method when $\theta \in (\frac{1}{2}, 1]$, with the minimum attained at $\theta = 1$. For the latter case, i.e., $\theta = 1$, Du and Gao [Gao2019] called it the maximal weighted residual Kaczmarz method and carried out extensive experiments to test this method. By the way, the greedy idea applied in [Bai2018, Bai2018r] also has wide applications; see for example [Griebel2012, Nguyen2017, Liu2019, Bai2019] and references therein.

In the present paper, we propose a novel greedy Kaczmarz (GK) method. Unlike the GRK and RGRK methods, the new method adopts a quite different way to determine the working rows of the matrix $A$ and hence needs less computing time in each iteration; see the detailed analysis before Algorithm 3 below. Consequently, the GK method can outperform the GRK and RGRK methods in terms of computing time. This result is confirmed by extensive numerical experiments, which show that, for the same accuracy, the GK method requires almost the same number of iterations as the GRK and RGRK methods, but spends less computing time. In addition, we also prove the convergence of the GK method in theory.

The rest of this paper is organized as follows. In Section 2, notation and some preliminaries are provided. We present our novel GK method and its convergence properties in Section 3. Experimental results are given in Section 4.

2 Notation and preliminaries

For a vector $u$, $u_{(i)}$ represents its $i$th entry. For a matrix $A$, $A_{(i)}$, $\|A\|_2$, and $\|A\|_F$ denote its $i$th row, spectral norm, and Frobenius norm, respectively. In addition, we denote the smallest positive eigenvalue of $A^*A$ by $\lambda_{\min}(A^*A)$, where $(\cdot)^*$ denotes the conjugate transpose of a vector or a matrix, and the number of elements of a set $\mathcal{W}$ by $|\mathcal{W}|$.

In what follows, we use $x_\star = A^{\dagger}b$, with $A^{\dagger}$ being the Moore–Penrose pseudoinverse, to denote the least-Euclidean-norm solution to the systems (1). For finding this solution, Bai and Wu [Bai2018] proposed the GRK method, listed as Algorithm 1 below, where $r_k = b - Ax_k$ denotes the residual vector.

  Input: $A$, $b$, $\ell$, initial estimate $x_0$
  Output: $x_\ell$
  for $k = 0, 1, \ldots, \ell - 1$ do
     Compute
        $\varepsilon_k = \frac{1}{2}\left(\frac{1}{\|b - Ax_k\|_2^2}\max_{1\le i\le m}\frac{|b_{(i)} - A_{(i)}x_k|^2}{\|A_{(i)}\|_2^2} + \frac{1}{\|A\|_F^2}\right)$
     Determine the index set of positive integers
        $\mathcal{U}_k = \left\{ i \;\middle|\; |b_{(i)} - A_{(i)}x_k|^2 \ge \varepsilon_k \|b - Ax_k\|_2^2 \|A_{(i)}\|_2^2 \right\}$
     Compute the $i$th entry $\tilde{r}_k^{(i)}$ of the vector $\tilde{r}_k$ according to
        $\tilde{r}_k^{(i)} = b_{(i)} - A_{(i)}x_k$ if $i \in \mathcal{U}_k$, and $\tilde{r}_k^{(i)} = 0$ otherwise
     Select $i_k \in \mathcal{U}_k$ with probability $\Pr(\mathrm{row} = i_k) = |\tilde{r}_k^{(i_k)}|^2 / \|\tilde{r}_k\|_2^2$
     Set $x_{k+1} = x_k + \frac{b_{(i_k)} - A_{(i_k)}x_k}{\|A_{(i_k)}\|_2^2}\,(A_{(i_k)})^*$
  end for
Algorithm 1 The GRK method
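For illustration, the following MATLAB sketch gives one possible realization of Algorithm 1; the function name grk, the iteration cap maxit, and the variable names are our own illustrative choices, not from [Bai2018], and a practical code would also stop when the residual vanishes.

  function x = grk(A, b, maxit, x)
  % Minimal sketch of one possible implementation of the GRK method
  % (Algorithm 1); assumes the system A*x = b is consistent.
  rowNorms2 = sum(abs(A).^2, 2);     % squared norms of all m rows of A
  normAF2 = sum(rowNorms2);          % ||A||_F^2
  for k = 1:maxit
      r = b - A*x;                   % residual vector r_k
      w = abs(r).^2 ./ rowNorms2;    % weighted squared residuals
      eps_k = 0.5*(max(w)/norm(r)^2 + 1/normAF2);
      U = find(abs(r).^2 >= eps_k * norm(r)^2 .* rowNorms2);  % index set U_k
      p = abs(r(U)).^2;  p = p/sum(p);           % selection probabilities
      ik = U(find(rand <= cumsum(p), 1));        % sample i_k from U_k
      x = x + (r(ik)/rowNorms2(ik)) * A(ik,:)';  % project onto row i_k
  end
  end

Note that the row norms of the whole matrix are needed before the index set $\mathcal{U}_k$ can be formed, which is exactly the cost discussed next.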

From the definitions of $\varepsilon_k$ and $\mathcal{U}_k$ in Algorithm 1, we have that if $i \in \mathcal{U}_k$, then

$\frac{|b_{(i)} - A_{(i)}x_k|^2}{\|A_{(i)}\|_2^2} \ge \frac{1}{2}\left(\max_{1\le j\le m}\frac{|b_{(j)} - A_{(j)}x_k|^2}{\|A_{(j)}\|_2^2} + \frac{\|b - Ax_k\|_2^2}{\|A\|_F^2}\right).$

Note that

$\frac{\|b - Ax_k\|_2^2}{\|A\|_F^2} = \sum_{j=1}^{m}\frac{\|A_{(j)}\|_2^2}{\|A\|_F^2}\cdot\frac{|b_{(j)} - A_{(j)}x_k|^2}{\|A_{(j)}\|_2^2} \le \max_{1\le j\le m}\frac{|b_{(j)} - A_{(j)}x_k|^2}{\|A_{(j)}\|_2^2}.$

Thus, we cannot conclude that if $i \in \mathcal{U}_k$, then

$\frac{|b_{(i)} - A_{(i)}x_k|^2}{\|A_{(i)}\|_2^2} = \max_{1\le j\le m}\frac{|b_{(j)} - A_{(j)}x_k|^2}{\|A_{(j)}\|_2^2}.$

As a result, there may exist some $i \in \mathcal{U}_k$ such that

(2)  $\frac{|b_{(i)} - A_{(i)}x_k|^2}{\|A_{(i)}\|_2^2} < \max_{1\le j\le m}\frac{|b_{(j)} - A_{(j)}x_k|^2}{\|A_{(j)}\|_2^2}.$

Meanwhile, from the update formula, for any $i_k \in \mathcal{U}_k$, we have

(3)  $\|x_{k+1} - x_k\|_2^2 = \frac{|b_{(i_k)} - A_{(i_k)}x_k|^2}{\|A_{(i_k)}\|_2^2}.$

Thus, combining (2) and (3), we find that we cannot make sure that a row with index from the index set $\mathcal{U}_k$ makes the distance between $x_{k+1}$ and $x_k$ the largest when finding $x_{k+1}$. Moreover, to compute $\varepsilon_k$, we have to compute the norm of each row of the matrix $A$.

Based on the GRK method, Bai and Wu [Bai2018r] further designed the RGRK method by introducing a relaxation parameter, which is listed in Algorithm 2.

  Input: $A$, $b$, $\ell$, $\theta \in [0, 1]$, initial estimate $x_0$
  Output: $x_\ell$
  for $k = 0, 1, \ldots, \ell - 1$ do
     Compute
        $\varepsilon_k = \frac{\theta}{\|b - Ax_k\|_2^2}\max_{1\le i\le m}\frac{|b_{(i)} - A_{(i)}x_k|^2}{\|A_{(i)}\|_2^2} + \frac{1 - \theta}{\|A\|_F^2}$
     Determine the index set of positive integers
        $\mathcal{V}_k = \left\{ i \;\middle|\; |b_{(i)} - A_{(i)}x_k|^2 \ge \varepsilon_k \|b - Ax_k\|_2^2 \|A_{(i)}\|_2^2 \right\}$
     Compute the $i$th entry $\tilde{r}_k^{(i)}$ of the vector $\tilde{r}_k$ according to
        $\tilde{r}_k^{(i)} = b_{(i)} - A_{(i)}x_k$ if $i \in \mathcal{V}_k$, and $\tilde{r}_k^{(i)} = 0$ otherwise
     Select $i_k \in \mathcal{V}_k$ with probability $\Pr(\mathrm{row} = i_k) = |\tilde{r}_k^{(i_k)}|^2 / \|\tilde{r}_k\|_2^2$
     Set $x_{k+1} = x_k + \frac{b_{(i_k)} - A_{(i_k)}x_k}{\|A_{(i_k)}\|_2^2}\,(A_{(i_k)})^*$
  end for
Algorithm 2 The RGRK method
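A corresponding MATLAB sketch of Algorithm 2 differs from the grk sketch above only in the computation of $\varepsilon_k$; as before, the function name, the iteration cap maxit, and the variable names are illustrative assumptions.

  function x = rgrk(A, b, theta, maxit, x)
  % Minimal sketch of the RGRK method (Algorithm 2) with relaxation
  % parameter theta in [0,1]; theta = 1/2 recovers the GRK method.
  rowNorms2 = sum(abs(A).^2, 2);     % squared norms of all m rows of A
  normAF2 = sum(rowNorms2);          % ||A||_F^2
  for k = 1:maxit
      r = b - A*x;                   % residual vector r_k
      w = abs(r).^2 ./ rowNorms2;    % weighted squared residuals
      eps_k = theta*max(w)/norm(r)^2 + (1 - theta)/normAF2;
      V = find(abs(r).^2 >= eps_k * norm(r)^2 .* rowNorms2);  % index set V_k
      p = abs(r(V)).^2;  p = p/sum(p);           % selection probabilities
      ik = V(find(rand <= cumsum(p), 1));        % sample i_k from V_k
      x = x + (r(ik)/rowNorms2(ik)) * A(ik,:)';  % project onto row i_k
  end
  end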

It is easy to see that when $\theta = \frac{1}{2}$, the RGRK method is just the GRK method. Bai and Wu [Bai2018r] showed that the convergence factor of the RGRK method is smaller than that of the GRK method when $\theta \in (\frac{1}{2}, 1]$, and the convergence factor reaches its minimum when $\theta = 1$. For the latter case, we have that if $i \in \mathcal{V}_k$, then

$\frac{|b_{(i)} - A_{(i)}x_k|^2}{\|A_{(i)}\|_2^2} = \max_{1\le j\le m}\frac{|b_{(j)} - A_{(j)}x_k|^2}{\|A_{(j)}\|_2^2}.$

From the foregoing analysis, in this case, the row with index from the index set $\mathcal{V}_k$ makes the distance between $x_{k+1}$ and $x_k$ the largest for any possible $x_{k+1}$. However, we still need to compute the norm of each row of the matrix $A$ when computing $\varepsilon_k$.

3 A novel greedy Kaczmarz method

On the basis of the analysis of the GRK and RGRK methods, and inspired by some recent works on selection strategies for the working index based on the maximum residual [Nutini2018, Haddock2019, Rebrova2019], we design a new method for solving consistent linear systems which consists of two main steps. In the first step, we use the maximum entries of the residual vector to determine an index set $\mathcal{N}_k$, whose specific definition is given in Algorithm 3. In the second step, we capture an index from this set with which the distance between $x_{k+1}$ and $x_k$ is the largest for any possible $x_{k+1}$. At a high level, the new method seems merely to change the order of the two main steps of Algorithm 1 or Algorithm 2. However, compared with the GRK and RGRK methods, we no longer need to compute the norm of each row of the matrix $A$ in Algorithm 3, and, like the RGRK method with $\theta = 1$, our method always makes the distance between $x_{k+1}$ and $x_k$ the largest when finding $x_{k+1}$. In fact, the new method combines the maximum residual rule with the maximum distance rule. These features reduce the computational cost of each iteration, and hence the method behaves better in terms of computing time, which is confirmed by the numerical experiments given in Section 4.

Based on the above introduction, we propose the following algorithm, i.e., Algorithm 3.

  Input: $A$, $b$, $\ell$, initial estimate $x_0$
  Output: $x_\ell$
  for $k = 0, 1, \ldots, \ell - 1$ do
     Determine the index set of positive integers
        $\mathcal{N}_k = \left\{ i \;\middle|\; |b_{(i)} - A_{(i)}x_k|^2 = \max_{1\le j\le m}|b_{(j)} - A_{(j)}x_k|^2 \right\}$
     Compute
        $i_k = \arg\max_{i \in \mathcal{N}_k} \frac{|b_{(i)} - A_{(i)}x_k|^2}{\|A_{(i)}\|_2^2}$
     Set $x_{k+1} = x_k + \frac{b_{(i_k)} - A_{(i_k)}x_k}{\|A_{(i_k)}\|_2^2}\,(A_{(i_k)})^*$
  end for
Algorithm 3 The GK method
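A minimal MATLAB sketch of Algorithm 3 might look as follows; note that row norms are computed only for the rows indexed by $\mathcal{N}_k$. The function name gk and the iteration cap maxit are our own illustrative choices.

  function x = gk(A, b, maxit, x)
  % Minimal sketch of one possible implementation of the GK method
  % (Algorithm 3); assumes the system A*x = b is consistent.
  for k = 1:maxit
      r = b - A*x;                         % residual vector r_k
      Nk = find(abs(r) == max(abs(r)));    % N_k: rows with maximal residual
      nrm2 = sum(abs(A(Nk,:)).^2, 2);      % squared norms of rows in N_k only
      [~, j] = max(abs(r(Nk)).^2 ./ nrm2); % maximize |r_i|^2/||A_(i)||_2^2
      ik = Nk(j);
      x = x + (r(ik)/nrm2(j)) * A(ik,:)';  % orthogonal projection onto row i_k
  end
  end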
Remark 1

Note that if $b - Ax_k \neq 0$, then $\max_{1\le j\le m}|b_{(j)} - A_{(j)}x_k|^2 > 0$. So the index set $\mathcal{N}_k$ in Algorithm 3 is nonempty for all iteration indices $k$.

Remark 2

Like Algorithm 1 or Algorithm 2, we can use the values of $\frac{|b_{(i)} - A_{(i)}x_k|^2}{\|A_{(i)}\|_2^2}$ for $i \in \mathcal{N}_k$ as a probability selection criterion to devise a randomized version of Algorithm 3. In this case, the convergence factor may be a little worse than that of Algorithm 3 because, for the latter, the index $i_k$ is selected based on the largest value of $\frac{|b_{(i)} - A_{(i)}x_k|^2}{\|A_{(i)}\|_2^2}$ for $i \in \mathcal{N}_k$, which makes the distance between $x_{k+1}$ and $x_k$ the largest for any possible $x_{k+1}$.
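As a concrete sketch of the randomized variant just described, the argmax selection inside the loop of the gk function given after Algorithm 3 could be replaced by the following lines (variable names as in that sketch; this fragment is our illustration of the remark, not a separate algorithm from the paper):

  % Randomized variant sketched in Remark 2: replace the argmax over N_k
  % by sampling with probabilities proportional to |r_i|^2/||A_(i)||_2^2.
  w = abs(r(Nk)).^2 ./ nrm2;        % per-row distances for i in N_k
  p = w / sum(w);                   % probability selection criterion
  j = find(rand <= cumsum(p), 1);   % sample an index position in N_k
  ik = Nk(j);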

Remark 3

To the best of our knowledge, the idea of Algorithm 3 is new in the field of greedy Kaczmarz-type algorithms, and we have not found it in any work on greedy Gauss–Seidel methods either. So it would be interesting to apply this idea to Gauss–Seidel methods to devise new greedy Gauss–Seidel algorithms for solving other problems, such as large linear least squares problems. We will consider this topic in a subsequent paper.

In the following, we give the convergence theorem of the GK method.

Theorem 1

The iteration sequence $\{x_k\}_{k=0}^{\infty}$ generated by Algorithm 3, starting from an initial guess $x_0$ in the column space of $A^*$, converges linearly to the least-Euclidean-norm solution $x_\star = A^{\dagger}b$, and

(4)  $\|x_1 - x_\star\|_2^2 \le \left(1 - \frac{\lambda_{\min}(A^*A)}{m\max_{1\le i\le m}\|A_{(i)}\|_2^2}\right)\|x_0 - x_\star\|_2^2$

and, for $k = 1, 2, \ldots$,

(5)  $\|x_{k+1} - x_\star\|_2^2 \le \left(1 - \frac{\lambda_{\min}(A^*A)}{(m-1)\max_{1\le i\le m}\|A_{(i)}\|_2^2}\right)\|x_k - x_\star\|_2^2.$

Moreover, let $\rho = 1 - \frac{\lambda_{\min}(A^*A)}{(m-1)\max_{1\le i\le m}\|A_{(i)}\|_2^2}$. Then, for $k = 1, 2, \ldots$,

(6)  $\|x_{k+1} - x_\star\|_2^2 \le \rho^{\,k}\left(1 - \frac{\lambda_{\min}(A^*A)}{m\max_{1\le i\le m}\|A_{(i)}\|_2^2}\right)\|x_0 - x_\star\|_2^2.$

Proof

From the update formula in Algorithm 3, we have

$x_{k+1} - x_k = \frac{b_{(i_k)} - A_{(i_k)}x_k}{\|A_{(i_k)}\|_2^2}\,(A_{(i_k)})^*,$

which implies that $x_{k+1} - x_k$ is parallel to $(A_{(i_k)})^*$. Meanwhile,

$A_{(i_k)}x_{k+1} = A_{(i_k)}x_k + \frac{b_{(i_k)} - A_{(i_k)}x_k}{\|A_{(i_k)}\|_2^2}\,A_{(i_k)}(A_{(i_k)})^* = b_{(i_k)},$

which together with the fact $A_{(i_k)}x_\star = b_{(i_k)}$ gives

$A_{(i_k)}(x_{k+1} - x_\star) = 0.$

Then $x_{k+1} - x_\star$ is orthogonal to $(A_{(i_k)})^*$. Thus, the vector $x_{k+1} - x_k$ is perpendicular to the vector $x_{k+1} - x_\star$. By the Pythagorean theorem, we get

(7)  $\|x_{k+1} - x_\star\|_2^2 = \|x_k - x_\star\|_2^2 - \|x_{k+1} - x_k\|_2^2.$

On the other hand, from Algorithm 3, we have

$\|x_{k+1} - x_k\|_2^2 = \left\|\frac{b_{(i_k)} - A_{(i_k)}x_k}{\|A_{(i_k)}\|_2^2}\,(A_{(i_k)})^*\right\|_2^2.$

Then

(8)  $\|x_{k+1} - x_k\|_2^2 = \frac{|b_{(i_k)} - A_{(i_k)}x_k|^2}{\|A_{(i_k)}\|_2^2}.$

Thus, substituting (8) into (7), we obtain

(9)  $\|x_{k+1} - x_\star\|_2^2 = \|x_k - x_\star\|_2^2 - \frac{|b_{(i_k)} - A_{(i_k)}x_k|^2}{\|A_{(i_k)}\|_2^2}.$

For $k = 0$, we have

$\frac{|b_{(i_0)} - A_{(i_0)}x_0|^2}{\|A_{(i_0)}\|_2^2} \ge \frac{\max_{1\le i\le m}|b_{(i)} - A_{(i)}x_0|^2}{\max_{1\le i\le m}\|A_{(i)}\|_2^2} \ge \frac{\|b - Ax_0\|_2^2}{m\max_{1\le i\le m}\|A_{(i)}\|_2^2},$

which together with a result from [Bai2018]:

(10)  $\|Au\|_2^2 \ge \lambda_{\min}(A^*A)\|u\|_2^2,$

which is valid for any vector $u$ in the column space of $A^*$, implies

(11)  $\frac{|b_{(i_0)} - A_{(i_0)}x_0|^2}{\|A_{(i_0)}\|_2^2} \ge \frac{\lambda_{\min}(A^*A)}{m\max_{1\le i\le m}\|A_{(i)}\|_2^2}\|x_0 - x_\star\|_2^2.$

Thus, substituting (11) into (9), we obtain

$\|x_1 - x_\star\|_2^2 \le \left(1 - \frac{\lambda_{\min}(A^*A)}{m\max_{1\le i\le m}\|A_{(i)}\|_2^2}\right)\|x_0 - x_\star\|_2^2,$

which is just the estimate (4).

For $k \ge 1$, we have

$\frac{|b_{(i_k)} - A_{(i_k)}x_k|^2}{\|A_{(i_k)}\|_2^2} \ge \frac{\max_{1\le i\le m}|b_{(i)} - A_{(i)}x_k|^2}{\max_{1\le i\le m}\|A_{(i)}\|_2^2}.$

Note that, according to the update formula in Algorithm 3, it is easy to obtain

(12)  $b_{(i_{k-1})} - A_{(i_{k-1})}x_k = 0.$

Then

$\max_{1\le i\le m}|b_{(i)} - A_{(i)}x_k|^2 \ge \frac{1}{m-1}\sum_{i=1,\,i\neq i_{k-1}}^{m}|b_{(i)} - A_{(i)}x_k|^2 = \frac{\|b - Ax_k\|_2^2}{m-1},$

which together with (10) yields

(13)  $\frac{|b_{(i_k)} - A_{(i_k)}x_k|^2}{\|A_{(i_k)}\|_2^2} \ge \frac{\lambda_{\min}(A^*A)}{(m-1)\max_{1\le i\le m}\|A_{(i)}\|_2^2}\|x_k - x_\star\|_2^2.$

Thus, substituting (13) into (9), we get

(14)  $\|x_{k+1} - x_\star\|_2^2 \le \left(1 - \frac{\lambda_{\min}(A^*A)}{(m-1)\max_{1\le i\le m}\|A_{(i)}\|_2^2}\right)\|x_k - x_\star\|_2^2.$

So the estimate (5) is obtained. By induction on the iteration index $k$, we can get the estimate (6).

Remark 4

Since $\lambda_{\min}(A^*A) \le \frac{\|A\|_F^2}{\mathrm{rank}(A)}$ and $\|A\|_F^2 \le m\max_{1\le i\le m}\|A_{(i)}\|_2^2$, it holds that

$0 \le 1 - \frac{\lambda_{\min}(A^*A)}{(m-1)\max_{1\le i\le m}\|A_{(i)}\|_2^2} < 1$

whenever $\mathrm{rank}(A) \ge 2$. Hence, the convergence factor of the GK method is small when $m$ and $\max_{1\le i\le m}\|A_{(i)}\|_2^2$ are small. So, the smaller $\max_{1\le i\le m}\|A_{(i)}\|_2^2$ is, the better the convergence factor of the GK method is when $\lambda_{\min}(A^*A)$ is fixed. From the definitions of $\mathcal{N}_k$, $\mathcal{U}_k$, and $\mathcal{V}_k$, we can find that the size of $\mathcal{N}_k$ may be smaller than those of $\mathcal{U}_k$ and $\mathcal{V}_k$. This is one of the reasons that our algorithm behaves better in computing time.

Remark 5

If $\|A_{(1)}\|_2 = \|A_{(2)}\|_2 = \cdots = \|A_{(m)}\|_2$ and $k \ge 1$, the right side of (5) is smaller than the right side of the corresponding GRK estimate recalled below. Since

(15)  $\|A\|_F^2 = \sum_{i=1}^{m}\|A_{(i)}\|_2^2 = m\max_{1\le i\le m}\|A_{(i)}\|_2^2,$

which implies

$(m-1)\max_{1\le i\le m}\|A_{(i)}\|_2^2 = \|A\|_F^2 - \min_{1\le i\le m}\|A_{(i)}\|_2^2,$

we have

(16)  $\|x_{k+1} - x_\star\|_2^2 \le \left(1 - \frac{\lambda_{\min}(A^*A)}{\|A\|_F^2 - \min_{1\le i\le m}\|A_{(i)}\|_2^2}\right)\|x_k - x_\star\|_2^2.$

Note that the error estimate in expectation of the GRK method in [Bai2018] is

$\mathbb{E}\|x_{k+1} - x_\star\|_2^2 \le \left(1 - \frac{1}{2}\left(\frac{\|A\|_F^2}{\|A\|_F^2 - \min_{1\le i\le m}\|A_{(i)}\|_2^2} + 1\right)\frac{\lambda_{\min}(A^*A)}{\|A\|_F^2}\right)\|x_k - x_\star\|_2^2,$

where the expectation is taken conditionally on the first $k$ iterations. Since $\frac{1}{2}\left(\frac{1}{\|A\|_F^2 - \min_{1\le i\le m}\|A_{(i)}\|_2^2} + \frac{1}{\|A\|_F^2}\right) \le \frac{1}{\|A\|_F^2 - \min_{1\le i\le m}\|A_{(i)}\|_2^2}$, the convergence factor of the GK method is slightly better for the above case.

For the RGRK method, its error estimate in expectation given in [Bai2018r] is

$\mathbb{E}\|x_{k+1} - x_\star\|_2^2 \le \left(1 - \left(\theta\frac{\|A\|_F^2}{\|A\|_F^2 - \min_{1\le i\le m}\|A_{(i)}\|_2^2} + 1 - \theta\right)\frac{\lambda_{\min}(A^*A)}{\|A\|_F^2}\right)\|x_k - x_\star\|_2^2,$

where $\theta \in [0, 1]$. The estimate attains its minimum at $\theta = 1$, which is

$\left(1 - \frac{\lambda_{\min}(A^*A)}{\|A\|_F^2 - \min_{1\le i\le m}\|A_{(i)}\|_2^2}\right)\|x_k - x_\star\|_2^2.$

Considering (15) and arguing as in the derivation of (5), we can get that when $\theta \in [0, 1)$ and the rows of $A$ have equal norms, the convergence factor of the GK method is also slightly better than that of the RGRK method.
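To illustrate the comparison in this remark, the following MATLAB snippet evaluates the three one-step convergence factors as reconstructed above for a random matrix scaled to unit row norms, matching the equal-row-norm assumption; the matrix size is an arbitrary illustrative choice.

  % Numerical check of the one-step convergence factors for k >= 1.
  A = randn(500, 100);
  A = A ./ vecnorm(A, 2, 2);               % scale every row to unit norm
  [m, ~] = size(A);
  rowNorms2 = sum(A.^2, 2);
  lam = min(eig(A'*A));                    % smallest (positive) eigenvalue
  F2 = norm(A, 'fro')^2;
  rho_GK   = 1 - lam/((m-1)*max(rowNorms2));            % factor in (5)
  rho_GRK  = 1 - 0.5*(F2/(F2 - min(rowNorms2)) + 1)*lam/F2;
  rho_RGRK = 1 - lam/(F2 - min(rowNorms2));             % RGRK with theta = 1
  fprintf('GK %.6f, GRK %.6f, RGRK(theta=1) %.6f\n', rho_GK, rho_GRK, rho_RGRK);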

4 Experimental results

In this section, we compare the GRK, RGRK and GK methods with coefficient matrices from two sets. One set is generated randomly by using the MATLAB function randn, and the other includes some full-rank sparse matrices (e.g., ch7-8-b1, ch8-8-b1, model1, Trec8, Stranke94 and mycielskian5) and some rank-deficient sparse matrices (e.g., flower_5_1, relat6, D_11, Sandi_sandi, GD01_c and GD02_a) originating in different applications from [Davis2011]. They possess certain structures, such as square ($m = n$; e.g., Stranke94, mycielskian5, GD01_c and GD02_a), thin ($m > n$; e.g., ch7-8-b1, ch8-8-b1, flower_5_1 and relat6) or fat ($m < n$; e.g., model1, Trec8, D_11 and Sandi_sandi), and some properties, such as being symmetric (e.g., Stranke94 and mycielskian5) or nonsymmetric (e.g., GD01_c and GD02_a).

We compare the three methods mainly in terms of the iteration numbers (denoted as "IT") and the computing time in seconds (denoted as "CPU"). It should be pointed out here that the IT and CPU listed in our numerical results denote the arithmetic averages of the required iteration numbers and the elapsed CPU times over 50 repeated runs of the corresponding methods, and we always set $\theta = 1$ in the RGRK method in our experiments since its convergence factor attains the minimum in this case. To give an intuitive comparison of the three methods, we also present the iteration number speed-up of GK against GRK, which is defined as

speed-up_1 = (IT of GRK) / (IT of GK),

the iteration number speed-up of GK against RGRK, which is defined as

speed-up_2 = (IT of RGRK) / (IT of GK),

the computing time speed-up of GK against GRK, which is defined as

speed-up_1 = (CPU of GRK) / (CPU of GK),

and the computing time speed-up of GK against RGRK, which is defined as

speed-up_2 = (CPU of RGRK) / (CPU of GK).

In addition, for the sparse matrices from [Davis2011], we define the density as follows:

density = (number of nonzeros of an m-by-n matrix) / (m × n),

and use cond(A) to represent the Euclidean condition number of the matrix $A$.

In our specific experiments, the solution vector $x_\star$ is generated randomly by the MATLAB function randn, and we set the right-hand side $b = Ax_\star$. All the test problems are started from an initial zero vector and terminated once the relative solution error (RES), defined by

$\mathrm{RES} = \frac{\|x_k - x_\star\|_2^2}{\|x_\star\|_2^2},$

falls below a prescribed tolerance or the number of iterations exceeds a prescribed maximum.
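As an illustration of this experimental protocol, a driver along the following lines can be used; the tolerance tol and iteration cap maxit are stand-ins for the paper's prescribed stopping values, and gk is the sketch given after Algorithm 3. For a full-column-rank Gaussian matrix, the random x_star is the unique (hence least-norm) solution.

  % Illustrative driver for one randomly generated test problem.
  m = 1000; n = 100;                 % illustrative dimensions
  A = randn(m, n);                   % random Gaussian coefficient matrix
  x_star = randn(n, 1);              % random solution vector
  b = A*x_star;                      % consistent right-hand side
  x = zeros(n, 1);                   % initial zero vector
  tol = 1e-6; maxit = 200000;        % assumed tolerance and iteration cap
  tic;
  for k = 1:maxit
      x = gk(A, b, 1, x);            % one GK iteration via the earlier sketch
      RES = norm(x - x_star)^2 / norm(x_star)^2;  % relative solution error
      if RES <= tol, break; end
  end
  cpu = toc;
  fprintf('IT = %d, CPU = %.4f s\n', k, cpu);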

IT GRK 88.7600 79.3200 75.4200 74.1200 72.3000
RGRK 67.0000 57.0000 50.0000 51.0000 48
GK 77.0000 64.0000 58.0000 54.0000 52
speed-up_1 1.1527 1.2394 1.3003 1.3726 1.3904
speed-up_2 0.8701 0.8906 0.8621 0.9444 0.9231
CPU GRK 0.0475 0.0606 0.0681 0.1241 0.1416
RGRK 0.0300 0.0353 0.0394 0.0862 0.0928
GK 0.0066 0.0084 0.0094 0.0222 0.0278
speed-up_1 7.2381 7.1852 7.2667 5.5915 5.0899
speed-up_2 4.5714 4.1852 4.2000 3.8873 3.3371
Table 1: IT and CPU of GRK, RGRK and GK for m-by-n matrices with one dimension fixed and different values of the other.
IT GRK 205.0400 167.8400 157.2600 152 146.8600
RGRK 177 129 120 114 110
GK 183 137 122 122 113
speed-up_1 1.1204 1.2251 1.2890 1.2459 1.2996
speed-up_2 0.9672 0.9416 0.9836 0.9344 0.9735
CPU GRK 0.0959 0.0972 0.1163 0.2744 0.3409
RGRK 0.0844 0.0747 0.0912 0.2172 0.2512
GK 0.0187 0.0256 0.0291 0.0663 0.0791
speed-up_1 5.1167 3.7927 4 4.1415 4.3123
speed-up_2 4.5000 2.9146 3.1398 3.2783 3.1779
Table 2: IT and CPU of GRK, RGRK and GK for m-by-n matrices with one dimension fixed and different values of the other.
IT GRK 364.6800 276.0200 249.5800 233.6800 226.4000
RGRK 318 239 199 189 179
GK 321 245 202 192 183
speed-up_1 1.1361 1.1266 1.2355 1.2171 1.2372
speed-up_2 0.9907 0.9755 0.9851 0.9844 0.9781
CPU GRK 0.1906 0.1734 0.2712 0.6228 0.8194
RGRK 0.1675 0.1572 0.2081 0.5209 0.6547
GK 0.0462 0.0500 0.0737 0.1556 0.2391
speed-up_1 4.1216 3.4687 3.6780 4.0020 3.4275
speed-up_2 3.6216 3.1437 2.8220 3.3474 2.7386
Table 3: IT and CPU of GRK, RGRK and GK for m-by-n matrices with one dimension fixed and different values of the other.
IT GRK 557.7000 398.6800 351.0400 328.3800 312.2400
RGRK 517 341 294 277 257
GK 504 334 294 264 258
speed-up_1 1.1065 1.1937 1.1940 1.2439 1.2102
speed-up_2 1.0258 1.0210 1 1.0492 0.9961
CPU GRK 0.2706 0.2797 0.4300 1.1750 1.3834
RGRK 0.2566 0.2425 0.4034 1.0497 1.1747
GK 0.0741 0.0791 0.1197 0.3753 0.5031
speed-up_1 3.6540 3.5375 3.5927 3.1307 2.7497
speed-up_2 3.4641 3.0672 3.3708 2.7968 2.3348
Table 4: IT and CPU of GRK, RGRK and GK for m-by-n matrices with one dimension fixed and different values of the other.
IT GRK 127.6600 114.3800 100.9400 97.7000 95.2000
RGRK 127 118 99 91 91
GK 126 117 98 92 91
speed-up_1 1.0132 0.9776 1.0300 1.0620 1.0462
speed-up_2 1.0079 1.0085 1.0102 0.9891 1
CPU GRK 0.0594 0.0669 0.0650 0.1247 0.1563
RGRK 0.0581 0.0625 0.0638 0.1166 0.1412
GK 0.0172 0.0275 0.0313 0.0625 0.0766
speed-up_1 3.4545 2.4318 2.0800 1.9950 2.0408
speed-up_2 3.3818 2.2727 2.0400 1.8650 1.8449
Table 5: IT and CPU of GRK, RGRK and GK for m-by-n matrices with one dimension fixed and different values of the other.
IT GRK 285.1800 264.6200 232.9000 217.3200 212.3800
RGRK 276 255 232 215 208
GK 268 256 226 214 208
speed-up_1 1.0641 1.0337 1.0305 1.0155 1.0211
speed-up_2 1.0299 0.9961 1.0265 1.0047 1
CPU GRK 0.1412 0.1638 0.2197 0.4278 0.5150
RGRK 0.1375 0.1497 0.2172 0.4253 0.5031
GK 0.0431 0.0622 0.0788 0.1747 0.2122
speed-up_1 3.2754 2.6332 2.7897 2.4490 2.4271
speed-up_2 3.1884 2.4070 2.7579 2.4347 2.3711
Table 6: IT and CPU of GRK, RGRK and GK for m-by-n matrices with one dimension fixed and different values of the other.
IT GRK 589.7600 441.2800 364.5600 342.4400 340.4400
RGRK 580 432 355 331 330
GK 586 427 358 338 323
speed-up_1 1.0064 1.0334 1.0183 1.0131 1.0540
speed-up_2 0.9898 1.0117 0.9916 0.9793 1.0217
CPU GRK 0.3700 0.3922 0.4500 0.9569 1.2394
RGRK 0.3531 0.3503 0.4416 0.9291 1.1884
GK 0.0975 0.1291 0.2250 0.5281 0.7097
speed-up_1 3.7949 3.0387 2 1.8118 1.7464
speed-up_2 3.6218 2.7143 1.9625 1.7592 1.6746
Table 7: IT and CPU of GRK, RGRK and GK for m-by-n matrices with one dimension fixed and different values of the other.
IT GRK 946.4600 641.0400 540.6600 497.5000 471.0200
RGRK 932 636 514 492 455
GK 962 628 521 499 455
speed-up_1 0.9838 1.0208 1.0377 0.9970 1.0352
speed-up_2 0.9688 1.0127 0.9866 0.9860 1
CPU GRK 0.6241 0.6269 0.7834 1.7025 2.1916
RGRK 0.6238 0.5909 0.7381 1.6322 2.0866
GK 0.1766 0.2272 0.3756 1.0044 1.2925
speed-up_1 3.5345 2.7593 2.0857 1.6951 1.6956
speed-up_2 3.5327 2.6011 1.9651 1.6251 1.6144
Table 8: IT and CPU of GRK, RGRK and GK for m-by-n matrices with one dimension fixed and different values of the other.

For the first class of matrices, that is, the randomly generated matrices, the numerical results on IT and CPU are listed in Tables 1–4 and Tables 5–8, corresponding to the two groups of matrix sizes. From Tables 1–8, we see that the GK method requires almost the same number of iterations as the GRK and RGRK methods, but the GK method is more efficient in terms of computing time. The computing time speed-up of GK against GRK is at least 1.6951 (see Table 8) and at most 7.2667 (see Table 1), and the computing time speed-up of GK against RGRK is at least 1.6144 (see Table 8) and at most 4.5714 (see Table 1).

name ch7-8-b1 ch8-8-b1 model1 Trec8 Stranke94 mycielskian5
full rank Yes Yes Yes Yes Yes Yes
density 3.57% 3.13% 1.05% 28.42% 90.00% 26.84%
cond(A) 4.79e+14 3.48e+14 17.57 26.89 51.73 27.64
IT GRK 103.9800 113.0800 4.7484e+03 1.7655e+03 5.6706e+03 4.2651e+03
RGRK 87.0600 89 4504 1646 4158 4268
GK 87 89 4183 1681 3636 4169
speed-up_1 1.1952 1.2706 1.1352 1.0502 1.5596 1.0230
speed-up_2 1.0007 1 1.0767 0.9792 1.1436 1.0237
CPU GRK 0.0178 0.0200 0.5381 0.1619 0.4847 0.3816
RGRK 0.0116 0.0119 0.5128 0.1500 0.3497 0.3700
GK 0.0047 0.0047 0.1953 0.0441 0.0800 0.0872
speed-up_1 3.8000 4.2667 2.7552 3.6738 6.0586 4.3763
speed-up_2 2.4667 2.5333 2.6256 3.4043 4.3711 4.2437
Table 9: IT and CPU of GRK, RGRK and GK for m-by-n sparse matrices with different m and n.
name flower_5_1 relat6 D_11 Sandi_sandi GD01_c GD02_a
full rank No No No No No No
density 1.42 % 2.21 % 3.79 % 0.54 % 12.40% 16.45 %
cond(A) 2.00e+16 Inf 2.21e+17 1.47e+17 Inf Inf
IT GRK 9.8127e+03 1.6099e+03 682.8200 1756 1.9329e+03 1.3928e+03
RGRK 9.8616e+03 1.5199e+03 668 1.6864e+03 1819 1469
GK 10521 1510 690 1787 1823 1228
speed-up_1 0.9327 1.0661 0.9896 0.9827 1.0603 1.1342
speed-up_2 0.9373 1.0065 0.9681 0.9437 0.9978 1.1963
CPU GRK 1.0197 0.2550 0.0866 0.1812 0.1663 0.1241
RGRK 0.9653 0.2353 0.0853 0.1741 0.1538 0.1219
GK 0.3006 0.0766 0.0369 0.0600 0.0378 0.0303
speed-up_1 3.3919 3.3306 2.3475 3.0208 4.3967 4.0928
speed-up_2 3.2110 3.0735 2.3136 2.9010 4.0661 4.0206
Table 10: IT and CPU of GRK, RGRK and GK for m-by-n sparse matrices with different m and n.