    GMRES algorithms over 35 years

This paper is about GMRES algorithms for the solution of nonsingular linear systems. We first consider basic algorithms and study their convergence. We then focus on acceleration strategies and parallel algorithms that are useful for solving challenging systems. We also briefly discuss other problems, such as systems with multiple right-hand sides, shifted systems, and singular systems.

Authors

10/13/2017

On Parallel Solution of Sparse Triangular Linear Systems in CUDA

The acceleration of sparse matrix computations on modern many-core proce...
02/22/2021

An algorithm to determine regular singular Mahler systems

This paper is devoted to the study of the analytic properties of Mahler ...
10/12/2019

Structured condition number for multiple right-hand side linear systems with parameterized quasiseparable coefficient matrix

In this paper, we consider the structured perturbation analysis for mult...
07/30/2019

Revisiting Performance of BiCGStab Methods for Solving Systems with Multiple Right-Hand Sides

The paper discusses the efficiency of the classical BiCGStab method and ...
12/30/2018

A New Deflation Method For Verifying the Isolated Singular Zeros of Polynomial Systems

In this paper, we develop a new deflation technique for refining or veri...
01/28/2020

A survey of subspace recycling iterative methods

This survey concerns subspace recycling methods, a popular class of iter...
02/10/2015

Talk to the Hand: Generating a 3D Print from Photographs

This manuscript presents a linear algebra-based technique that only requ...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Generalized minimal residual (GMRES) algorithms are widely used for solving nonsymmetric linear systems arising from partial differential equations. They possess some optimality property and behave reasonably well in practice. The first and most well-known algorithm is that of Saad and Schultz, which was initially introduced in a technical report in 1983, and formally published in 1986

. Since then, numerous variants appeared, as well as studies of their convergence and accuracy. In this paper we concentrate on GMRES algorithms. We sketch the main developments and focus on the algorithmic innovations in the past 35 years.

Consider the nonsingular linear system

 Ax=b, (1)

where and . Given an initial approximation , a projection method constructs a sequence of approximate solutions , , such that

 xn∈x0+Sn,rn⊥Cn,

where

is the residual vector and

and are -dimensional subspaces, called respectively search space and constraint space. As is nonsingular, is uniquely defined. In general, we build a sequence of nested search spaces

 S1⊂S2⊂S3⊂… (2)

to ensure finite termination; see, e.g., . A minimal residual method obtains the optimal approximation by minimizing the residual norm over all candidate vectors in the search space

 ∥rn∥=∥b−Axn∥=minx∈x0+Sn∥b−Ax∥, (3)

where denotes the 2-norm or the corresponding induced matrix norm. This class of methods can be interpreted as a projection process with . In other words, they find an optimal correction in , such that is the orthogonal projection of  onto ; see [81, 219, 80] for further analysis. The Krylov subspace is the most broadly employed search space and is defined by

 Kn=Kn(A,r0)=span{r0,Ar0,…,An−1r0}. (4)

These subspaces are nested in the sense of (2) and can be built up gradually using only matrix-vector multiplication by . The residual  lies in the next Krylov subspace . If denotes the grade of  with respect to , then holds. Therefore, the sequence of Krylov subspaces will eventually become invariant. A projection method with  is called a Krylov subspace method. Practically a Krylov subspace method is terminated as soon as the approximate solution is good enough. Ways to formulate typical Krylov subspace methods and derivations of their properties have been extensively discussed in the literature; see, e.g., [155, 105, 121, 220, 266, 243, 167, 186] and references therein. A comparison of Krylov subspace methods can be found in a recent work ; see also [167, 222] for a historical perspective.

Classical GMRES algorithms realize the so-called minimal residual Krylov subspace method, or loosely called GMRES method, which is a projection process satisfying (3) and taking (4) as search space. In this case, the projection process can be rewritten as follows:

 xn∈x0+Kn, (5a) rn⊥AKn, (5b)

or equivalently,

 ∥rn∥=∥b−Axn∥=minx∈x0+Kn∥b−Ax∥. (6)

We shall consider both sequential and parallel GMRES algorithms. Some variants such as restarted or hybrid algorithms exhibit more complex behavior and may no longer possess the finite termination property. Other crucial aspects that can dramatically affect the performance of Krylov subspace algorithms include the finite precision effect and communication costs; see  for a general insight on the cost of the iterative computation and its role in Krylov subspace methods. In what follows these topics shall be discussed to some extent. We do not talk here about general-purpose preconditioning techniques like incomplete factorization and algebraic multigrid, since these topics deserve individual coverage by themselves, for which we refer the reader to [27, 281]; see also  for experiments. For the similar reason, problem-specific techniques will also not be covered, though they can be very successful when prior knowledge of the source problem is available; see, e.g., [201, 109, 208]. We only mention certain techniques that are closely related to GMRES iterations. Throughout most of the paper we restrict to the nonsingular linear system (1). In Section 5, however, we briefly mention some work where this restriction is lifted.

The paper is organized as follows. We begin, in Section 2, by formulating basic GMRES algorithms, giving some equivalent formulations, and presenting theoretical results in both exact and finite arithmetic. In Section 3 we summarize various strategies for improving classical algorithms. In Section 4 we discuss parallel techniques that can reduce or hide communication overhead. Sections 5 is devoted to a brief literature review of related topics. Finally, concluding remarks are draw in Section 6.

2 Basic algorithms and convergence

The techniques introduced in this section form the basis for subsequent discussions. In particular, the algorithm developed by Saad and Schultz  is still in common use today. Methodologies have been developed to partially characterize their convergence behavior. Note that the literature on Krylov subspace methods uses the term “GMRES” almost as a synonym for the specific algorithm developed in 

. For the sake of clarity, here the latter is referred to as “MGS-GMRES”, as the modified Gram-Schmidt process is applied in

 to orthogonalize successive basis vectors of (4). Later, the distinction between the two terms shall often be blurred because in the literature GMRES is almost exclusively implemented by means of a Gram-Schmidt-like process.

2.1 Mgs-Gmres

Saad and Schultz proposed the first GMRES algorithm in , which uses the Arnoldi process  to generate an orthonormal basis  of the Krylov subspace (4). The key relation used in the Arnoldi process is

 hj+1,jvj+1=Avj−j∑i=1hi,jvi, (7)

where and are selected such that  is normalized and orthogonal to all the previous basis vectors, that is,

 hi,j=(Avj,vi),i=1,…,j, (8) hj+1,j=∥Avj−j∑i=1hi,jvi∥, (9)

where denotes the Euclidean inner product. The above rules correspond directly to the classical Gram-Schmidt (CGS) orthogonalization process, leading to the so-called CGS-Arnoldi algorithm. In practice the computation of  by (7)–(9) undergoes a severe cancellation. Special precautions, such as the reorthogonalization [61, 138] and modified Gram-Schmidt (MGS) processes, have been recommended to ensure numerical stability. The former is costly and generally used in the context of parallel algorithms [141, 256]

, eigenvalue problems

, and mixed precision , while MGS has the same operation count as CGS, but with better numerical properties; see  for a thorough presentation of Gram-Schmidt orthogonalization. Arnoldi with MGS orthogonalization (MGS-Arnoldi)  can be viewed as the standard version of Arnoldi iteration.

In Algorithm 1, we can see that lines 3–6 are mathematically equivalent to (7)–(8). Let denote the MGS projector. Then, according to Algorithm 1, one finds . Recalling the grade  of  defined in the preceding section, we observe that (line 7) if and only if , that is, the maximal dimension of the Krylov subspace is attained. In what follows we shall assume .

Let us write

 Vn=[v1,…,vn]∈RN×n,Hn=[hi,j]∈Rn×n,¯¯¯¯Hn=[Hnhn+1,ne⊺n]∈R(n+1)×n,

where denotes the

th column of the identity matrix of appropriate order and

for . Therefore, and are upper Hessenberg matrices. It follows that

 AVn (10a) =VnHn+hn+1,nvn+1e⊺n, (10b) VTnAVn =Hn, (10c)

which can be interpreted as an orthogonal reduction of  to upper Hessenberg form. Let denote the coordinate vector of the correction  in the basis . Then it follows from the minimal residual process (5)–(6) that

 y (11) xn =x0+Vny,

where . Saad and Schultz  suggested using Givens rotations that reduce the upper Hessenberg matrix  recursively to the upper triangular matrix.

We show MGS-GMRES as Algorithm 2, where denotes the residual norm, denotes the th entry of vector  and refers to the subvector with elements indexed by  through . Lines 3–7 are the same as in Algorithm 1, followed by Givens rotations

 (hi,jhi+1,j)←(cisi−sici)(hi,jhi+1,j)

to maintain a QR factorization of the upper Hessenberg matrix , which involves all the previous rotations (lines 8–10) and a new rotation (lines 11–13) to annihilate the subdiagonal entry. In particular, and correspond to the sine and cosine of the th rotation angle such that ; we refer the reader to  for a more robust implementation of Givens rotations. An important by-product is that the residual norm  can be readily obtained at the end of each step (line 13). Note that the CGS-based GMRES (CGS-GMRES) algorithm can be easily derived by moving the vector update operations in line 5 out of the for-loop. The CGS projector can be written as .

In , it was proved that MGS-GMRES never breaks down. Besides, in a remarkable work by Paige et al. , it was proved that MGS-GMRES is backward stable. By modifying the constraint (5b) as , we can get the so-called FOM method . A detailed account of relations between GMRES and FOM can be found in [38, 59]; see also . FOM has received far less attention partly because the corresponding algorithms can break down. However, it was shown in  that for the cases FOM performs poorly, nor will GMRES offers much help. In this situation, the latter may still be preferred due to its minimal residual property. GMRES has also been compared with CMRH and Lanczos-based methods; see [227, 229, 140].

Algorithm 2 uses long recurrences to orthogonalize  against all the previous basis vectors, the cost of which grows quadratically with the number of steps. Saad and Schultz  considered restarting and truncation as remedies. The truncation strategy keeps only a small number of previous basis vectors in the orthogonalization phase, and computes the approximate solution in a progressive manner; see . In the restarted version, a parameter is introduced to limit the size of the Krylov subspaces. The resulting scheme outlined in Algorithm 3 is called restarted GMRES, denoted by GMRES(). Practically we use the restarted scheme on the top level, and a specific GMRES algorithm at the lower level; see  for an efficient implementation with variable orthogonalization schemes.

Unfortunately, the desired properties such as finite termination and residual minimization are lost. Both strategies may result in poor convergence or even stagnation; see, e.g., [99, 251, 65, 17, 115, 249]. Moreover, using a large  may not necessarily lead to fast convergence (see [81, 90]). We shall discuss other potential acceleration strategies in Section 3.

2.2 Hh-Gmres

The minimal residual Krylov subspace method can be completely described by (5) from a mathematical point of view. From a computational point of view, however, mathematically equivalent algorithms may have quite different numerical behavior. Since at that time the backward stability of MGS-GMRES  was still unknown, motivated by numerical stability concerns, Walker  in 1988 proposed an algorithm that uses Householder reflections to orthogonalize the basis vectors; see also .

Algorithm 4 depicts the Householder-based GMRES (HH-GMRES) algorithm. Here, returns  or  depending on the sign of  and denotes the th element of . The signs (lines 1 and 4) are chosen to reduce the risk of subtractive cancellation. Line 4 corresponds directly to Householder reflections; the details can be found in [118, 220]. HH-GMRES gradually generates a QR factorization

 [r0,Av1,…,Avn]=P1P2…Pn+1[βe1,h1,…,hn].

By discarding the zero rows of the upper triangular matrix and the first column of both sides, we can obtain the Arnoldi relation (10). The Householder vector  contains zeros in the first rows (line 3) such that the Householder transformation can leave the previous vectors

unchanged. For the solution of line 8 and the estimate of the residual norm at each iteration step, the same techniques used in Algorithm

2 can be applied. Note that we only need to store the ’s instead of the ’s. We also prefer not to explicitly store the ’s but use the Qin Jiushao’s scheme (Horner’s scheme) to evaluate line 9 in a recursive manner; see, e.g., .

Similar to MGS-GMRES, the cost of HH-GMRES increases dramatically as the number of steps increases. Therefore, the restarted scheme in Algorithm 3 is commonly employed as an outer loop. HH-GMRES was proved earlier to be backward stable (see ), but can be roughly two times more expensive than MGS-GMRES.

2.3 Simpler GMRES

In 1994, Walker and Zhou  proposed another variant of the GMRES method, called simpler GMRES (SGMRES), which does not require the QR factorization of an upper Hessenberg matrix. Although rarely used in practice due to stability issues, this algorithm can enhance our understanding of minimal residual iterations.

We present a generalized version as given in  that contains the essential features of SGMRES. Consider a nonorthogonal basis  of  and an orthonormal basis  of . We construct a matrix relation

 AZn=VnTn (12)

such that is an upper triangular matrix. According to the constraint (5b), one finds , yielding the following update:

 αn=(rn−1,vn),rn=rn−1−αnvn. (13)

In addition, since  with some n-dimensional vector , it follows that . Combining (5b) and (13), one obtains that

 Tny=VTnr0=[α1,…,αn]T.

The generalized simpler GMRES framework (see ) is listed in Algorithm 5. Choosing  results in SGMRES, which was formalized by virtue of both MGS- and Householder-based processes in . Note that in the Householder variant there is no need to store the ’s and thus line 4 should be modified accordingly; see  for details. The choice  was introduced in , leading to a slightly different algorithm called residual-based simpler GMRES (RB-SGMRES). Then, an adaptive variant combining the two choices was proposed in , following the rule

 (14)

with . Notice that and correspond to SGMRES and RB-SGMRES, respectively.

Theoretical aspects are collected in [280, 163, 164, 56, 150, 149], which provides bounds for the condition number of  and allows for a better understanding of the unstable behavior of Algorithm 5; see also . It’s interesting to observe that, along with the results in , SGMRES can be employed to describe GMRES convergence; see .

2.4 Equivalent formulations

We briefly review some iterative schemes that are in some sense mathematically equivalent to GMRES. Paige and Saunders  in 1975 proposed the MINRES algorithm satisfying (6) for solving symmetric indefinite systems, which motivated the development of MGS-GMRES. In 1982, Elman [88, 83] proposed the GCR algorithm which amounted to generalizing the conjugate residual algorithm (see, e.g., ) to the nonsymmetric case.

In Algorithm 6, the term  can be updated by means of the formula in line 6 and the value of  without an additional matrix-vector multiplication. GCR generates a set of -orthogonal basis vectors  spanning the Krylov subspace (4). These vectors can be alternatively updated by

 βi,j=−(A2qj,Aqi)/(Aqi,Aqi),qj+1=Aqj+j∑i=0βi,jqi, (15)

leading to the so-called ORTHODIR algorithm, which was proposed by Jea and Young [291, 147]. A truncated version of GCR was given earlier in , and both GCR and ORTHODIR can be formulated in a cyclic manner like Algorithm 3 to reduce the associated work and storage. Nonetheless, GCR can fail if the symmetric part of  is indefinite and the implementation of (15) can cause stability problems, while it is known that MGS-GMRES is a cheaper and more robust alternative; see, e.g., . Similar reasoning holds for another early variant proposed by Axelsson . It was mentioned in  that there is a close relationship between SGMRES and ORTHODIR; see also , in which it was also shown that RB-SGMRES is closely related to GCR.

An early algorithm that realized (6) was already described by Khabaza  in 1963, which used as the basis vectors and thus suffered from numerical instabilities. In , Eirola and Nevanlinna formalized a quasi-Newton method, later called EN method, that approximates  by rank-one updates. It was shown in [82, 275] that a variant of EN is mathematically equivalent to GMRES; see also  for additional comments. More recent examples of equivalence relations involving quasi-Newton methods can be found in [135, 136]. In 1988, on the other hand, Sidi gave a proof in  that an acceleration technique called reduced rank extrapolation is mathematically equivalent to GMRES; see  for a survey on vector extrapolation methods. Later, it was established by Walker and Ni  that another technique called Anderson acceleration is equivalent to GMRES in some sense; we refer the reader to [279, 37] for more details on Anderson acceleration and its relation to GMRES.

2.5 Convergence behavior

An important observation for studying GMRES convergence is that any vector in the Krylov subspace (4) can be written in polynomial form, which implies

 rn=pn(A)r0,pn∈Pn,

where is the set of all polynomials  of degree at most  such that . Then, (6) can be expressed as

 ∥rn∥=∥pn(A)r0∥=minp∈Pn∥p(A)r0∥. (16)

Let denote the eigenvalues of a matrix and . Assume that is diagonalizable so that  with . The residual norm of GMRES satisfies

 ∥rn∥∥r0∥≤κ(X)minp∈Pnmaxi|p(λi)|, (17)

where denotes the condition number of . Note that being normal implies , in which case the bound in (17) is sharp (see [124, 152]). This bound was presented in 1982 by Elman  in the context of the so-called GCR algorithm; see also [83, 223]. It was also proved in  that if is positive definite, then we have

 ∥rn∥∥r0∥≤(1−λmin(M)2λmax(ATA))n2,

where and are respectively the smallest and largest eigenvalues of a matrix in the absolute sense. This bound was improved two decades later in [25, 24]. Now, it follows from (16) that

 ∥rn∥∥r0∥≤ψn(A)≤φn(A),ψn(A)=max∥v∥=1minp∈Pn∥p(A)v∥,φn(A)=minp∈Pn∥p(A)∥. (18)

In general, is called worst-case GMRES value, as its purpose is to study the worst-case behavior of the GMRES method. It is known that is attainable by the residual norm, that is, for each  there exists an initial vector  such that (see, e.g., ). is called ideal GMRES value, which was introduced in  to totally exclude the influence of the initial vector. It was proved in [129, 170] that has a unique minimizer, while as discussed in [100, 169], the polynomial and unit-norm vector associated with  may not be uniquely determined. Moreover, if is normal, then  holds (see [124, 152]); otherwise, there exist examples for which  (see [99, 260, 100]). We refer the reader to [168, 169, 258, 100] and their references for further details on (18).

The field of values plays a beneficial role in convergence analysis, which is defined as

 F(A)={(Av,v):∥v∥=1,v∈CN}.

Assume that the origin is outside  and let

 μF(A)=minλ∈F(A)|λ|.

Then, the following bound holds:

 ∥rn∥∥r0∥≤(1−μF(A)μF(A−1))n2; (19)

see [253, 80] for proofs based on the worst-case GMRES approximation; see also  where the bound in (19) was proved to be hold for the ideal GMRES approximation. Besides, an early account of the field of the values applied to convergence analysis can be found in . We refer the reader to the most recent work  and the references therein for the use of the field of values for convergence analysis. The polynomial numerical hull introduced in  can be seen as a generalization of the field of values, which is defined as

 Hn(A)={λ∈C:∥p(A)∥≥|p(λ)|,deg(p)≤n},

from which a lower bound of the ideal GMRES approximation can be easily derived. However, the determination of  is very difficult in its full generality. We refer to [122, 98, 123, 258] for related works. Another group of results is obtained via the -pseudospectrum . Let denote the identity matrix of appropriate size. For a real , the -pseudospectrum of  is the set

 Λϵ={λ∈C:∥λI−A∥−1≥ϵ−1},

from which an upper bound of  can be obtained; see, e.g., [198, 262] for more details.

Linear contraction bounds for the residual norms can be somewhat misleading since a highly nonlinear convergence behavior can not be adequately described by a linear contraction. The book  written by Liesen and Strakoš is very useful for getting more insight in the cost of computations (see also ), which highlights the inherent nonlinear nature of Krylov subspace methods. In , an experimental comparison of GMRES convergence bounds based on eigenpairs, field of values and -pseudospectrum for non-normal matrices can be found, but none of these can be consistently better than the others. In order to get more descriptive results, Titley-Peloquin et al.  in 2014 derived several GMRES bounds that involve the initial residual vector; see also  for an analysis with respect to nonstandard inner products. More recently, Sacchi and Simoncini  gave a more descriptive convergence analysis specifically for the case of localized ill-conditioning.

It was confirmed in a series of papers by Arioli, Greenbaum, Pták and Strakoš [128, 126, 8] from 1994 to 1998 that convergence of GMRES for non-normal matrices could not be determined solely by the distribution of eigenvalues. In , Greenbaum and Strakoš studied the matrices  for which the spaces and are the same. As a result, the convergence history generated for the pair by GMRES is the same as that generated for the pair. The matrices  that satisfies this property are called GMRES-equivalent matrices (see, e.g., [163, 78]). Among other results, Greenbaum and Strakoš concisely state in  on GMRES convergence that

Any behavior that can be seen with the method can be seen with the method applied to a matrix having any nonzero eigenvalues.

This has been extended by Greenbaum et al.  along a different line, who conclude that

Any nonincreasing convergence curve can be obtained with GMRES applied to a matrix having any desired eigenvalues.

Finally, a complete parametrization of all  pairs with prescribed eigenvalues for which GMRES generates prescribed residual norms has been given in ; see also  for further discussion. GMRES-equivalent matrices have been employed by some subsequent studies [163, 78], while investigations of “any behavior is possible”, like that in [128, 126, 8], have been continued in the past ten years, concerning Ritz values [75, 76], harmonic Ritz values , restarted GMRES [271, 77], and block GMRES . Note that Ritz and harmonic Ritz values are respectively the roots of FOM and GMRES residual polynomials; see [119, 185]. We also mention  for a generalization to other Krylov subspace methods.

Eigenvalue clusters associated with a parameterized model were employed in  to explain GMRES convergence. Exact but complicated expressions for residual norms were derived in [144, 165, 228]; see also . Inspired by the work  on the conjugate gradient method, van der Vorst and Vuik  in 1993 investigated the superlinear convergence behavior of GMRES by means of Ritz values. Simoncini and Szyld  addressed the same problem using spectral projectors. Moret  discussed superlinear convergence of GMRES in a complex separable Hilbert space; see also  for a generalization of Moret’s result. Since any nonincreasing convergence curve is possible for GMRES , on the other hand, stagnation may occur even for the original GMRES method without restarting. This was first investigated by Brown  in 1991, and was then discussed in [293, 169, 244, 237, 182, 184]. Although most work on convergence analysis considered full GMRES, the restarted version is more practical, on which we refer the reader to, e.g., [151, 294, 295, 16, 270, 271, 77] for more details.

The above discussion assumes exact arithmetic. In finite precision arithmetic, it was proved by Drkošová et al.  in 1995 that HH-GMRES is backward stable. In 2006, it was proved by Paige et al.  that MGS-GMRES is also backward stable; previous observations can be found in [127, 206]. Due to the ill-conditioning of the matrix  in (12), SGMRES has serious stability problems (see, e.g, [164, 56]), which can be partially remedied by using RB-SGMRES  or the adaptive variant (14; see also the experiments in .

3 Acceleration strategies

The focus in this section is on the acceleration strategies to tackle the challenge that a basic GMRES algorithm might face. First, it is almost certain that the use of preconditioners will be beneficial for Krylov subspace methods when solving ill-conditioned problems. There are also numerous publications concerning deflation and augmentation, some of which are highly related to preconditioning. Then we briefly introduce weighted inner products and inexact matrix-vector products, and end up this section by discussing mixed-precision techniques.

3.1 Preconditioning

Preconditioning of the linear system is used to improve the convergence behavior of Krylov subspace methods. One needs to find a nonsingular matrix  close to  in some sense, commonly called preconditioning matrix or preconditioner, such that the new system involving  should be inexpensive to solve, that is,

 M−1Ax=M−1b, (20) AM−1u=b,Mx=u. (21)

The cases (20) and (21) correspond to left preconditioning and right preconditioning, respectively. Another form called split preconditioning can be employed when nearly symmetric systems are encountered; see, e.g., . For GMRES, right preconditioning is often preferred because the residuals in this case are identical to the true residuals, only the solution update formula being transformed to , while in another case left preconditioning leads to preconditioned residuals . In addition, from an algorithmic point of view, one only needs to transform line 3 in Algorithm 2 to  for the case (20) and  for (21). High-quality preconditioners are often constructed by incorporating problem-specific information; see, e.g., [201, 95, 96, 109, 208]. On the other hand, general-purpose preconditioners have also been broadly exploited, such as incomplete factorization, approximate inverse, and algebraic multigrid; see [27, 281] and references therein.

The flexible MGS-GMRES variant proposed by Saad  in 1993 allows the preconditioner to vary at each step, which shall be denoted by FGMRES as widely used in the literature.

As seen in Algorithm 7, should be explicitly stored in , and then reused in line 10. Therefore, the Arnoldi-like relation  holds. Unlike standard right preconditioning, which constructs an orthonormal basis of the Krylov subspace

 span{r0,AM−1r0,…,(AM−1)n−1r0},

the subspace of solution estimates in FGMRES is no longer a standard Krylov subspace. In addition, may be singular even if is nonsingular, and thus the assumption of nonsingularity of  must be made when deducing exact solution from ; see [217, 220] for more discussion. A very detailed analysis of FGMRES, in comparison with right preconditioned MGS-GMRES, can be found in . More recently, Greif et al.  developed the multi-preconditioned GMRES scheme in 2017, which shares some similarities with the block version of FGMRES [43, 42], but targets (1) instead of systems with multiple right-hand sides. In practice, a restarted procedure should be used in replacement of the simple version in Algorithm 7; see  for details on the implementation of FGMRES.

Another interesting but less used variable preconditioning technique proposed by van der Vorst and Vuik  in 1994 consists of the GCR algorithm as the outer iterations and a GMRES algorithm as the inner iterations. The inner algorithm constructs a preconditioner for the outer procedure. This new scheme is called GMRESR, which was motivated by the EN algorithm . Note that in GMRESR the residuals are preconditioned, while in FGMRES, as shown in Algorithm 7, the search directions are preconditioned. We refer to  for a comparison of FGMRES and GMRESR. Extensions of GMRESR have been developed in [64, 65] by requiring that the inner basis maintains orthogonality to the outer basis, possibly combining the truncation strategy to reduce storage costs.

The GMRES polynomial itself, defined as the polynomial in (16), can be used to derive preconditioners for Krylov subspace algorithms. Liu et al.  in 2015 developed a polynomial preconditioned GMRES variant using the GMRES polynomial. An improved approach was proposed in  by Loe and Morgan. The idea can be summarized as follows:

1. Run steps of MGS-GMRES;

2. Compute the harmonic Ritz values (see (24)) which are the roots of the GMRES polynomial;

3. Order the roots and construct the polynomial;

4. Apply a GMRES algorithm to the polynomial preconditioned system.

We refer the reader to [174, 178, 177, 176] for more details and experiments. Polynomial preconditioners can be combined with other preconditioners and were highlighted as effective tools for parallel computing. Early contributions of hybrid or polynomial preconditioned GMRES algorithms can be found in [199, 254, 152, 269]. The stability problems arising in high degree preconditioners were addressed in [91, 176, 290].

In order to overcome scaling difficulties for large number of processor cores, two general iterative schemes were introduced in 2014 by McInnes et al. . As an example, they presented a two-level hierarchical FGMRES algorithm. The outer level applies FGMRES to the global problem, while the inner level executes inexact GMRES within subgroups of cores. Therefore, the inner solver serves as a variable preconditioner for the outer solver and entails less communication overhead. The two-level scheme can be extended to multiple levels with multiple solvers, leading to the hierarchical Krylov scheme. Meanwhile, a nested version was also presented in . The inner solvers in the nested scheme apply inexact Krylov algorithms to the global system instead of local systems.

3.2 Deflation and augmentation

The convergence of GMRES() is often hampered by eigenvalues of small magnitude and loss of information at the end of each cycle. Deflation techniques can eliminate certain small eigenvalues or move certain small eigenvalues away from the origin. Note that the word “deflation” has multiple meanings in the literature which may lead to confusion. Augmentation amounts to considering an -dimensional subspace  with , adding information from previous cycles to the search space.

Let us write

 Zm=[z1,…,zm]=[v1,…,vm1,u1,…,um2], (22)

where the first vectors are the orthonormal Arnoldi vectors. In 1995, Morgan  presented an augmented GMRES algorithm, later called GMRES-E in , where are chosen as the harmonic Ritz vectors corresponding to the harmonic Ritz values of smallest magnitude; see also [55, 218] for more analysis of this approach. Given the relation

 AZm=Vm+1¯¯¯¯Hm, (23)

where

forms a set of orthonormal vectors, it is known that solving the harmonic eigenvalue problem

 ZTmATZmyi=1θiZTmATAZmyi

is equivalent to solving

 (Hm+h2m+1,mH−Tmeme⊺m)yi=θiyi, (24)

with . Morgan  in 2000 proposed a mathematically equivalent but more efficient algorithm, commonly called GMRES-IR, which uses the implicit restarting technique with unwanted harmonic Ritz values as shifts to generate augmented subspaces (see ). A particularly important observation is that the augmented subspace can be formulated as a Krylov subspace with a carefully chosen starting vector , that is,

 span{^r0,A^r0,…,Am−1^r0}=span{r0,Ar0,…,Am1−1r0,u1,…,um2}.

We mention that Le Calvez and Molina  developed independently an implicitly restarted GMRES algorithm, which, however, resorts to a different eigenvalue problem instead of (24). Further, Morgan  revisited GMRES-IR in 2002 and found that there was room for improvement. The so-called GMRES with deflated restarting (GMRES-DR) algorithm , based on the thick restarting approach (see ), can be simpler and more stable than GMRES-IR; see also  for a slightly modified variant. Besides, there are many situations where variable preconditioners are worth to be considered, and thus a flexible variant of GMRES-DR (FGMRES-DR) was suggested in . We point here to the more recent work of Liu et al.  who applied polynomial preconditioning to GMRES-DR. In addition, the effectiveness of GMRES-DR was studied by Morgan et al. 

. The main idea is that the approximate eigenvectors generated by GMRES-DR can be seen as pseudoeigenvectors. It was shown in

 by examples that GMRES-DR can also work for highly nonnormal matrices.

Some deflation strategies are highly related to preconditioning. Kharchenko and Yeremin  in 1995 suggested the use of low-rank transformations by means of left and right Ritz vectors to construct a right preconditioner. As a result, extremal eigenvalues would be translated into a vicinity of one. In 1996, Erhel et al.  employed a different approach to estimate the invariant subspace and constructed a right preconditioner that can move the small eigenvalues to a multiple large eigenvalue. A similar idea was later suggested by Baglama et al. . They described two algorithms in the context of left preconditioning, which exploit implicit restarting to improve flexibility. Moreover, an improved algorithm built on the previous work [191, 94, 12] was introduced in ; see also  for an adaptive strategy for determining the restart frequency.

In 1996, de Sturler  proposed an inner-outer algorithm with augmented basis, called GCRO, which was extended to the so-called GCROT algorithm in  by incorporating truncation strategies. A beautiful account of Morgan’s work, deflation by preconditioning, and GCROT can be found in the survey . It was observed by Baker et al.  in 2005 that GMRES() has alternating behavior, namely, the residual vector often alternates direction at the end of each cycle, leading to slow convergence and even stalling. They gave some remedies to overcome this problem by combining the ideas of augmented subspaces  and reserved error approximations , resulting in the LGMRES() algorithm.

Algorithm 8 satisfies (23) and generalizes GMRES-E to a GCRO-like vector selection strategy. Notice that  can be considered as an approximation to the exact error, which is the key idea involved in both GCRO and LGMRES; see [64, 17]. We refer the reader to  for further details and numerical experiments; see also  for a similar approach. On the other hand, combining the main features of GMRES-DR and GCRO, in 2006, an algorithm called GCRO-DR was derived in  for sequences of linear systems. When solving a single linear system, GCRO-DR is mathematically equivalent to GMRES-DR.

It was Gutknecht  who attempted to present deflated and augmented Krylov subspace methods in a common framework. Also, Gutknecht investigated the breakdown conditions of deflated GMRES; we also refer the interested reader to the companion paper . These two papers might provide good starting point for further investigations of deflation and augmentation.

3.3 Weighted inner products

In 1998, it was shown by Essai  that a suitably chosen inner product could improve the performance of restarted MGS-GMRES. Let  with . The -inner product is defined as for all , and then the associated -norm is . From an algorithmic viewpoint, for example in Algorithm 1, one only needs to replace lines 5 and 7 with the new inner product and norm. The resulting algorithm shall be denoted by WGMRES. In , the diagonal matrix  was chosen as  and updated at each restart.

Preconditioned variants were studied in . Later, a modified version motivated by the idea of  was proposed in , in which are chosen randomly and there is no need to use Givens rotations. In , WGMRES was combined with the augmentation technique, and then the new variant was compared with LGMRES as illustrated in Algorithm 8. On the other hand, Embree et al.  in 2017 applied weighting to GMRES-DR and observed that the mixed scheme could be better than using deflation alone or weighting alone. In 2014, Güttel and Pestana  proposed a new variant that applied the orthogonalization process to the transformed matrix  and starting vector  with respect to the Euclidean inner product, which is mathematically equivalent to WGMRES. For more insight in nonstandard inner products, see [209, 134, 92] and references therein.

3.4 Inexact matrix-vector products

An inexact process might be beneficial if the computation of the matrix-vector product  is time-consuming. From a mathematical point of view, an inexact matrix-vector product can be modeled by  where is a perturbation matrix at th iteration. From (10), one obtains that

The inexact GMRES algorithm was first proposed in a technical report by Bouras and Frayssé  in 2000 (later published in  in 2005), and then analyzed by Simoncini and Szyld [241, 242], van den Eshof and Sleijpen , Giraud et al. , and Sifuentes et al. . A relaxation strategy on the inner accuracy of GMRES can be found in . It was shown in  that the early matrix-vector products must be computed with high accuracy, but can be relaxed as the iteration progresses, which explains the term “relaxation”. An interesting review of inexact GMRES was given in ; see  for performance evaluation. In 2013, Dolgov 

employed relaxation techniques to improve the performance of tensor GMRES; see

[20, 72] for more details on tensor formats.

3.5 Mixed precision

On modern computer architectures, single precision arithmetic (32 bits) is cheaper than double precision arithmetic (64 bits) in the sense that communication and computation costs grow with the size of the floating point format . In the 2008 revision of the IEEE standard, half precision (16 bits) was defined as a storage format. Mixed-precision numerical algorithms can date back to more than five decades ago . Recent advances in hardware-level support (e.g., NVIDIA V100 GPU) for low-precision arithmetic and the increasing impact of communication costs have promoted the reuse of mixed-precision techniques; see  for a recent survey.

It is very natural to extend Algorithm 2 to the case of mixed precision. Here, we only consider the two-precision algorithm, which exploits single precision everywhere, except that

1. The original system must be stored in double precision;

2. Although could be computed in single precision, the update of  should be done in double precision;

3. For the restarted version, the computation of residual vector should also be done in double precision.

A recent paper by Gratton et al.  showed that using low precision in MGS-GMRES could achieve the same convergence rate and final accuracy as the full precision variant. They provided a theoretical analysis for the non-restarted algorithm by means of inexact inner products; see also . Then, Lindquist et al.  focused on the experimental aspects of restarted mixed-precision GMRES algorithms, including MGS-GMRES and a variant based on CGS with one reorthogonalization (CGS2); see [117, 116] for error analysis of CGS2.

In Algorithm 9, we illustrate the CGS2-Arnoldi procedure. Motivated by the idea in , a theoretical result for CGS2-GMRES was provided in , along with more experimental results.

Another interesting class of mixed-precision algorithms involves iterative refinement where low-precision GMRES is employed as inner solver. In 1992, Turner and Walker  showed that GMRES() can be regarded as an iterative refinement process. In 2009, Arioli and Duff  proved that the FGMRES algorithm preconditioned by low-precision factorization is backward stable.

We sketch GMRES-based iterative refinement (GMRES-IR) in Algorithm 10; see, e.g., [51, 175] for more details. Note that this is just a simple example illustrated using two precisions, for which three or more precisions could also be exploited by relaxing some of the operations. Carson and Higham  in 2017 gave a new forward analysis of iterative refinement and suggested the use of GMRES-IR. Then, in , they introduced and analyzed an iterative refinement scheme that can possibly exploit three different precisions. It was shown that GMRES-IR could provide the required relative accuracy, even though the LU factorization in single precision might be of low quality. The error analysis in [50, 51] was recently generalized in  to the case where the GMRES solver and matrix-vector multiplications in line 5 may proceed in independent precisions, yielding a possibility of using five precisions. On the other hand, the overflow and underflow problems associated with mixed-precision algorithms were investigated by Higham et al. .

The survey article  provides additional details and references on mixed-precision techniques. Moreover, although not directly connected to GMRES, the studies in  and  can deepen our insight into mixed-precision orthogonalization algorithms.

4 Parallel algorithms

On modern computer architectures, communication is expensive relative to computation. Sparse matrix-vector products (SpMVs) and dot products, especially the latter, often limit the possible speedups for parallel algorithms. Therefore, the high cost of global reduction operations in the orthogonalization process makes GMRES hard to parallelize. To alleviate performance bottleneck, much work has focused on ways to reduce or hide communication costs. Early work concerning the parallelization of GMRES can be found in, e.g., [277, 71, 232, 60]; see also  for a general discussion on parallel algorithms. Here we focused on the -step GMRES algorithm, and then present the idea of pipelining. These approaches have gained popularity in the last decade. Finally, some other potential strategies are included at the end of this section.

4.1 Matrix powers and polynomial basis

In  and some subsequent publications [70, 187, 141, 21], a Krylov subspace algorithm is thought to contain several “kernels”, each kernel being an important and time-consuming building block of that algorithm. In GMRES, matrix powers computation is thus considered as a kernel, which computes vectors . The most trivial implementation requires messages between each processor and its neighbors, whereas a well-designed matrix powers kernel (MPK) may require only message at the cost of redundant computation and storage consumption; see , in which data dependencies and performance models are discussed in terms of both distributed and shared memory systems. Nonetheless, this technique has several disadvantages. First, MPK introduces additional overheads. Second, it’s still difficult to apply effective preconditioners with MPK [141, 285]. Third, basis vectors will become more and more linearly dependent .

The last drawback can be partially corrected by using polynomial basis, that is,

 Ws=[w1,…,ws]=[ϕ0(A)w,ϕ1(A)w,…,ϕs−1(A)w],

where is a polynomial of degree . Choosing

 ϕj(z)=zj (25)

yields the monomial basis, which converges to the eigenvector corresponding to the dominant eigenvalue of , and thus, as mentioned above, will rapidly become numerically rank deficient. Two alternatives have been suggested. One is to use Newton polynomials

 ϕ0(z)=1,ϕj(z)=ϱj(z−θj)ϕj−1(z), (26)

as suggested by Bai et al.  in 1994, in which an -step GMRES algorithm was proposed. Here, are scaling factors, which are generally taken as , and are eigenvalue estimates computed by a few iterations of the Arnoldi process. The other is Chebyshev polynomials, which can be defined by

 ϕ0(z)=1,ϕ1(z)=12γ(z−ζ)ϕ0(z),ϕj(z)=1γ((z−ζ)ϕj−1(z)−τ24γϕj−2(z)), (27)

where the spectrum is assumed to be circumscribed by the rectangle  with the imaginary unit. Then, we choose and , so that are the foci of the ellipse for the spectrum. Chebyshev polynomials for GMRES were discussed by Joubert and Carey [153, 154] in 1992; more details of Krylov subspace bases can be found in [141, 210, 49].

4.2 s-step GMRES

The idea of performing iterations at once can date back to the early 1950s, as quoted in . In 1990, Chronopoulos and Kim  developed an -step GMRES algorithm using monomial basis. A Newton basis -step GMRES algorithm was considered by Bai et al. , in which modified Leja ordering and basis scaling were employed to enhance robustness. They also suggested the use of parallel QR factorization, and this was implemented by Erhel  in 1995 by means of the so-called RODDEC algorithm. On the other hand, Joubert and Carey [153, 154] in 1992 considered an -step GMRES algorithm using Chebyshev basis along with an MPK for 2D regular meshes; see also . In 2009, Mohiyuddin et al.  introduced an improved variant, for which a new kernel called tall skinny QR was derived to achieve the optimal reduction of communication while maintaining the accuracy; see . Then, the thesis by Hoemmen  gave a complete treatise of -step GMRES.

Let be an  matrix such that

 AWs=Ws+1¯¯¯Bs. (28)

It is called a basis conversion matrix, by which the three bases with respect to (25)–(27) can be exploited in an elegant manner. It was shown in  that the QR factorization of

could produce the same orthogonal matrix as that in (

10). Then, it follows that

 Ws=VsTs, (29)

where is an upper triangular matrix. Along with (10a) and (28), this implies

 ¯¯¯¯Hs=Ts+1¯¯¯BsT−1s.

The above formulation can potentially bring two benefits. First, one could compute with less communication than in the classical case. Second, the QR factorization could be performed in parallel.

The tall skinny QR (TSQR) algorithm studied in [141, 67] is a good choice for reducing communication when there are many more rows than columns. Let be vertically partitioned into  submatrices such that the number of rows of each submatrix is larger than . Then, we perform the QR factorization of each block entry, that is,

 Ws=⎛⎜ ⎜ ⎜ ⎜ ⎜⎝Ws,1Ws,2⋮Ws,ν0⎞⎟ ⎟ ⎟ ⎟ ⎟⎠=⎛⎜ ⎜ ⎜ ⎜ ⎜ ⎜⎝Q(0)1R(0)1Q(0)2R(0)2⋮Q(0)ν0R(0)ν0⎞⎟ ⎟ ⎟ ⎟ ⎟ ⎟⎠=⎛⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜⎝