1 Introduction
Generalized minimal residual (GMRES) algorithms are widely used for solving nonsymmetric linear systems arising from partial differential equations. They possess an optimality property and behave reasonably well in practice. The first and most well-known algorithm is that of Saad and Schultz, which was initially introduced in a technical report in 1983 and formally published in 1986 [223]. Since then, numerous variants have appeared, as well as studies of their convergence and accuracy. In this paper we concentrate on GMRES algorithms. We sketch the main developments and focus on the algorithmic innovations of the past 35 years.

Consider the nonsingular linear system
(1) $Ax = b$,
where $A \in \mathbb{R}^{n \times n}$ is nonsingular and $b \in \mathbb{R}^n$. Given an initial approximation $x_0$, a projection method constructs a sequence of approximate solutions $x_m \in x_0 + \mathcal{S}_m$, $m = 1, 2, \ldots$, such that
$r_m \perp \mathcal{C}_m$,
where
$r_m = b - Ax_m$
is the residual vector and $\mathcal{S}_m$ and $\mathcal{C}_m$ are $m$-dimensional subspaces, called respectively the search space and the constraint space. As $A$ is nonsingular, $x_m$ is uniquely defined. In general, we build a sequence of nested search spaces
(2) $\mathcal{S}_1 \subset \mathcal{S}_2 \subset \cdots$
to ensure finite termination; see, e.g., [167]. A minimal residual method obtains the optimal approximation by minimizing the residual norm over all candidate vectors in the search space
(3) $\|r_m\| = \min_{z \in \mathcal{S}_m} \|b - A(x_0 + z)\|$,
where $\|\cdot\|$ denotes the 2-norm or the corresponding induced matrix norm. This class of methods can be interpreted as a projection process with $\mathcal{C}_m = A\mathcal{S}_m$. In other words, they find an optimal correction $z_m$ in $\mathcal{S}_m$ such that $Az_m$ is the orthogonal projection of $r_0$ onto $A\mathcal{S}_m$; see [81, 219, 80] for further analysis. The Krylov subspace is the most broadly employed search space and is defined by
(4) $\mathcal{K}_m(A, r_0) = \operatorname{span}\{r_0, Ar_0, \ldots, A^{m-1}r_0\}$.
These subspaces are nested in the sense of (2) and can be built up gradually using only matrix-vector multiplication by $A$. The residual $r_m$ lies in the next Krylov subspace $\mathcal{K}_{m+1}(A, r_0)$. If $d$ denotes the grade of $r_0$ with respect to $A$, then $\mathcal{K}_d(A, r_0) = \mathcal{K}_{d+1}(A, r_0)$ holds. Therefore, the sequence of Krylov subspaces will eventually become invariant. A projection method with $\mathcal{S}_m = \mathcal{K}_m(A, r_0)$ is called a Krylov subspace method. In practice a Krylov subspace method is terminated as soon as the approximate solution is good enough. Ways to formulate typical Krylov subspace methods and derivations of their properties have been extensively discussed in the literature; see, e.g., [155, 105, 121, 220, 266, 243, 167, 186] and references therein. A comparison of Krylov subspace methods can be found in a recent work [112]; see also [167, 222] for a historical perspective.
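The nestedness property (2) and the notion of grade are easy to illustrate numerically. The following sketch is ours, not from the paper; the helper name `krylov_basis` and the toy matrix are arbitrary choices.

```python
import numpy as np

def krylov_basis(A, r0, m):
    """Return the Krylov matrix [r0, A r0, ..., A^{m-1} r0]."""
    cols = [r0]
    for _ in range(m - 1):
        cols.append(A @ cols[-1])
    return np.column_stack(cols)

# r0 has components in only two eigendirections, so its grade is d = 2:
# the Krylov dimension grows by one per step and then stagnates at d.
A = np.diag([1.0, 2.0, 3.0])
r0 = np.array([1.0, 1.0, 0.0])

ranks = [np.linalg.matrix_rank(krylov_basis(A, r0, m)) for m in (1, 2, 3, 4)]
print(ranks)   # the subspaces become invariant once the grade is reached
```

In exact arithmetic `ranks` is `[1, 2, 2, 2]`, matching the invariance described above.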
Classical GMRES algorithms realize the so-called minimal residual Krylov subspace method, loosely called the GMRES method, which is a projection process satisfying (3) and taking (4) as the search space. In this case, the projection process can be rewritten as follows:
(5a) $x_m \in x_0 + \mathcal{K}_m(A, r_0)$,
(5b) $r_m \perp A\mathcal{K}_m(A, r_0)$,
or equivalently,
(6) $\|r_m\| = \min_{z \in \mathcal{K}_m(A, r_0)} \|r_0 - Az\|$.
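Once a basis of the Krylov search space is fixed, the minimal residual condition (6) becomes an ordinary linear least-squares problem, and the orthogonality constraint (5b) can be checked directly. Below is a minimal numpy sketch on a random, well-conditioned test system; all names and sizes are our own choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 8, 3
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))   # nonsingular, near identity
b = rng.standard_normal(n)
x0 = np.zeros(n)
r0 = b - A @ x0

# columns spanning the Krylov subspace K_m(A, r0)
K = np.column_stack([np.linalg.matrix_power(A, j) @ r0 for j in range(m)])

# (6): minimize ||r0 - A z|| over z = K c -- an ordinary least-squares problem
c, *_ = np.linalg.lstsq(A @ K, r0, rcond=None)
x_m = x0 + K @ c
r_m = b - A @ x_m

print(np.linalg.norm(r_m) <= np.linalg.norm(r0))  # residual norm decreased
print(np.max(np.abs((A @ K).T @ r_m)))            # (5b): r_m is orthogonal to A K_m
```

This brute-force formulation is numerically poor for large m (the power basis is ill-conditioned), which is precisely why the Arnoldi-based algorithms of this section are used instead.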
We shall consider both sequential and parallel GMRES algorithms. Some variants such as restarted or hybrid algorithms exhibit more complex behavior and may no longer possess the finite termination property. Other crucial aspects that can dramatically affect the performance of Krylov subspace algorithms include finite precision effects and communication costs; see [54] for general insight into the cost of iterative computation and its role in Krylov subspace methods. In what follows these topics shall be discussed to some extent. We do not discuss general-purpose preconditioning techniques such as incomplete factorization and algebraic multigrid, since these topics deserve individual coverage by themselves, for which we refer the reader to [27, 281]; see also [112] for experiments. For a similar reason, problem-specific techniques will also not be covered, though they can be very successful when prior knowledge of the source problem is available; see, e.g., [201, 109, 208]. We only mention certain techniques that are closely related to GMRES iterations. Throughout most of the paper we restrict ourselves to the nonsingular linear system (1). In Section 5, however, we briefly mention some work where this restriction is lifted.
The paper is organized as follows. We begin, in Section 2, by formulating basic GMRES algorithms, giving some equivalent formulations, and presenting theoretical results in both exact and finite precision arithmetic. In Section 3 we summarize various strategies for improving classical algorithms. In Section 4 we discuss parallel techniques that can reduce or hide communication overhead. Section 5 is devoted to a brief literature review of related topics. Finally, concluding remarks are drawn in Section 6.
2 Basic algorithms and convergence
The techniques introduced in this section form the basis for subsequent discussions. In particular, the algorithm developed by Saad and Schultz [223] is still in common use today. Methodologies have been developed to partially characterize its convergence behavior. Note that the literature on Krylov subspace methods uses the term “GMRES” almost as a synonym for the specific algorithm developed in [223]. For the sake of clarity, here the latter is referred to as “MGS-GMRES”, as the modified Gram-Schmidt process is applied in [223] to orthogonalize successive basis vectors of (4). Later, the distinction between the two terms shall often be blurred because in the literature GMRES is almost exclusively implemented by means of a Gram-Schmidt-like process.

2.1 MGS-GMRES
Saad and Schultz proposed the first GMRES algorithm in [223], which uses the Arnoldi process [9] to generate an orthonormal basis of the Krylov subspace (4). The key relation used in the Arnoldi process is
(7) $h_{m+1,m}v_{m+1} = Av_m - \sum_{j=1}^{m} h_{j,m}v_j$,
where $h_{m+1,m}$ and $v_{m+1}$ are selected such that $v_{m+1}$ is normalized and orthogonal to all the previous basis vectors, that is,
(8) $h_{j,m} = \langle Av_m, v_j \rangle, \quad j = 1, \ldots, m$,
(9) $h_{m+1,m} = \big\|Av_m - \sum_{j=1}^{m} h_{j,m}v_j\big\|$,
where $\langle \cdot, \cdot \rangle$ denotes the Euclidean inner product. The above rules correspond directly to the classical Gram-Schmidt (CGS) orthogonalization process, leading to the so-called CGS-Arnoldi algorithm. In practice the computation of $v_{m+1}$ by (7)–(9) can suffer severe cancellation. Special precautions, such as reorthogonalization [61, 138] and the modified Gram-Schmidt (MGS) process, have been recommended to ensure numerical stability. The former is costly and generally used in the context of parallel algorithms [141, 256], eigenvalue problems [221], and mixed precision [172], while MGS has the same operation count as CGS, but with better numerical properties; see [162] for a thorough presentation of Gram-Schmidt orthogonalization. Arnoldi with MGS orthogonalization (MGS-Arnoldi) [215] can be viewed as the standard version of the Arnoldi iteration.

In Algorithm 1, we can see that lines 3–6 are mathematically equivalent to (7)–(8). According to Algorithm 1, the new basis vector is obtained by applying the MGS projector, a product of rank-one orthogonal projections, to $Av_m$ and then normalizing. Recalling the grade $d$ of $r_0$ defined in the preceding section, we observe that $h_{m+1,m} = 0$ (line 7) if and only if $m = d$, that is, the maximal dimension of the Krylov subspace is attained. In what follows we shall assume $m < d$.
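Since the paper's Algorithm 1 is not reproduced here, the following is our own sketch of the standard MGS-Arnoldi process; it returns an orthonormal basis $V$ and an extended Hessenberg matrix $\bar{H}$ satisfying $AV_m = V_{m+1}\bar{H}_m$.

```python
import numpy as np

def arnoldi_mgs(A, r0, m):
    """m steps of the Arnoldi process with modified Gram-Schmidt.
    Returns V (n x (m+1)) with orthonormal columns and Hbar ((m+1) x m)
    upper Hessenberg such that A @ V[:, :m] == V @ Hbar."""
    n = len(r0)
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = r0 / np.linalg.norm(r0)
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):          # MGS: project out one vector at a time
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] == 0.0:          # grade reached: lucky breakdown
            return V[:, :j + 2], H[:j + 2, :j + 1]
        V[:, j + 1] = w / H[j + 1, j]
    return V, H

rng = np.random.default_rng(1)
A = rng.standard_normal((10, 10))
r0 = rng.standard_normal(10)
V, H = arnoldi_mgs(A, r0, 5)
print(np.allclose(V.T @ V, np.eye(6), atol=1e-10))    # orthonormal basis
print(np.allclose(A @ V[:, :5], V @ H, atol=1e-10))   # Arnoldi relation
```

The inner loop is the only difference from CGS: each projection uses the already-updated vector w, which is what improves the numerical behavior.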
Let us write
$V_m = [v_1, \ldots, v_m], \quad \bar{H}_m = (h_{ij}) \in \mathbb{R}^{(m+1)\times m}, \quad H_m = (h_{ij}) \in \mathbb{R}^{m\times m}$,
where $e_j$ denotes the $j$th column of the identity matrix of appropriate order and $h_{ij} = 0$ for $i > j+1$. Therefore, $\bar{H}_m$ and $H_m$ are upper Hessenberg matrices. It follows that
(10a) $AV_m = V_{m+1}\bar{H}_m$
(10b) $\phantom{AV_m} = V_mH_m + h_{m+1,m}v_{m+1}e_m^T$,
(10c) $V_m^TAV_m = H_m$,
which can be interpreted as an orthogonal reduction of $A$ to upper Hessenberg form. Let $y_m$ denote the coordinate vector of the correction $z_m = x_m - x_0$ in the basis $V_m$. Then it follows from the minimal residual process (5)–(6) that
(11) $y_m = \arg\min_{y \in \mathbb{R}^m} \|\beta e_1 - \bar{H}_m y\|, \qquad x_m = x_0 + V_m y_m$,
where $\beta = \|r_0\|$. Saad and Schultz [223] suggested using Givens rotations to reduce the upper Hessenberg matrix $\bar{H}_m$ recursively to upper triangular form.
We show MGS-GMRES as Algorithm 2, where the current residual norm is carried as a scalar, entries of vectors are referred to by subscripts, and subvectors by index ranges. Lines 3–7 are the same as in Algorithm 1, followed by Givens rotations to maintain a QR factorization of the upper Hessenberg matrix $\bar{H}_m$, which involves all the previous rotations (lines 8–10) and a new rotation (lines 11–13) to annihilate the subdiagonal entry. In particular, $s_j$ and $c_j$ correspond to the sine and cosine of the $j$th rotation angle such that $s_j^2 + c_j^2 = 1$; we refer the reader to [31] for a more robust implementation of Givens rotations. An important byproduct is that the residual norm can be readily obtained at the end of each step (line 13). Note that the CGS-based GMRES (CGS-GMRES) algorithm can be easily derived by moving the vector update operations in line 5 out of the for-loop. The CGS projector can be written as $I - V_mV_m^T$, in contrast with the MGS projector, which is a product of rank-one projections.
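The progressive QR factorization by Givens rotations, including the residual-norm byproduct, can be sketched in isolation. This mimics lines 8–13 in spirit only; the rotation convention and all variable names below are our own, and the Hessenberg matrix is generated randomly rather than by Arnoldi.

```python
import numpy as np

# Progressive QR of an (m+1) x m Hessenberg matrix by Givens rotations:
# after step j, the residual norm of min ||beta e1 - Hbar y|| is |g[j+1]|.
rng = np.random.default_rng(2)
m = 5
Hbar = np.triu(rng.standard_normal((m + 1, m)), -1)   # upper Hessenberg
beta = 2.0

R = Hbar.copy()
g = np.zeros(m + 1); g[0] = beta
res_norms = []
for j in range(m):
    # rotations 0..j-1 were already applied to column j in-place;
    # compute and apply the new rotation annihilating R[j+1, j]
    a, bb = R[j, j], R[j + 1, j]
    rho = np.hypot(a, bb)
    c, s = a / rho, bb / rho
    R[[j, j + 1], j:] = [[c, s], [-s, c]] @ R[[j, j + 1], j:]
    g[[j, j + 1]] = [[c, s], [-s, c]] @ g[[j, j + 1]]
    res_norms.append(abs(g[j + 1]))     # residual norm comes for free

# cross-check the final residual norm against a dense least-squares solve
e1 = np.zeros(m + 1); e1[0] = beta
_, res, *_ = np.linalg.lstsq(Hbar, e1, rcond=None)
print(np.isclose(res_norms[-1] ** 2, res[0]))
```

Note that `np.hypot` avoids overflow in forming the rotation, which is the kind of robustness issue [31] addresses more thoroughly.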
In [223], it was proved that MGS-GMRES never breaks down. Besides, in a remarkable work by Paige et al. [204], it was proved that MGS-GMRES is backward stable. By modifying the constraint (5b) to require orthogonality of the residual to the Krylov subspace itself, $r_m \perp \mathcal{K}_m(A, r_0)$, we obtain the so-called FOM method [216]. A detailed account of the relations between GMRES and FOM can be found in [38, 59]; see also [220]. FOM has received far less attention, partly because the corresponding algorithms can break down. However, it was shown in [38] that in cases where FOM performs poorly, GMRES does not offer much help either. In this situation, the latter may still be preferred due to its minimal residual property. GMRES has also been compared with CMRH and Lanczos-based methods; see [227, 229, 140].
Algorithm 2 uses long recurrences to orthogonalize against all the previous basis vectors, the cost of which grows quadratically with the number of steps. Saad and Schultz [223] considered restarting and truncation as remedies. The truncation strategy keeps only a small number of previous basis vectors in the orthogonalization phase and computes the approximate solution in a progressive manner; see [224]. In the restarted version, a parameter $m$ is introduced to limit the size of the Krylov subspaces. The resulting scheme outlined in Algorithm 3 is called restarted GMRES, denoted by GMRES($m$). In practice we use the restarted scheme at the top level and a specific GMRES algorithm at the lower level; see [104] for an efficient implementation with variable orthogonalization schemes.
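A compact sketch of the restart loop may be helpful. This is not the paper's Algorithm 3: for brevity the inner least-squares problem (11) is solved densely rather than by Givens rotations, and all names, tolerances, and the test matrix are our own choices.

```python
import numpy as np

def gmres_restarted(A, b, x0, m, max_cycles=100, tol=1e-10):
    """GMRES(m) sketch: an outer restart loop around m MGS-Arnoldi steps."""
    x = x0.copy()
    for _ in range(max_cycles):
        r = b - A @ x
        beta = np.linalg.norm(r)
        if beta < tol:
            break
        n = len(b)
        V = np.zeros((n, m + 1)); H = np.zeros((m + 1, m))
        V[:, 0] = r / beta
        k = m
        for j in range(m):                 # inner MGS-Arnoldi loop
            w = A @ V[:, j]
            for i in range(j + 1):
                H[i, j] = V[:, i] @ w
                w -= H[i, j] * V[:, i]
            H[j + 1, j] = np.linalg.norm(w)
            if H[j + 1, j] < 1e-14:        # lucky breakdown
                k = j + 1
                break
            V[:, j + 1] = w / H[j + 1, j]
        e1 = np.zeros(k + 1); e1[0] = beta
        y, *_ = np.linalg.lstsq(H[:k + 1, :k], e1, rcond=None)
        x = x + V[:, :k] @ y               # restart from the updated iterate
    return x

rng = np.random.default_rng(3)
n = 30
A = np.eye(n) + 0.05 * rng.standard_normal((n, n))  # eigenvalues clustered near 1
b = rng.standard_normal(n)
x = gmres_restarted(A, b, np.zeros(n), m=5)
print(np.linalg.norm(b - A @ x) < 1e-8)
```

The clustered spectrum makes GMRES(5) converge quickly here; as discussed next, restarting offers no such guarantee in general.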
Unfortunately, the desired properties such as finite termination and residual minimization are lost. Both strategies may result in poor convergence or even stagnation; see, e.g., [99, 251, 65, 17, 115, 249]. Moreover, using a large restart parameter $m$ may not necessarily lead to fast convergence (see [81, 90]). We shall discuss other potential acceleration strategies in Section 3.
2.2 HH-GMRES
The minimal residual Krylov subspace method can be completely described by (5) from a mathematical point of view. From a computational point of view, however, mathematically equivalent algorithms may have quite different numerical behavior. Since the backward stability of MGS-GMRES [204] was still unknown at the time, Walker [277], motivated by numerical stability concerns, proposed in 1988 an algorithm that uses Householder reflections to orthogonalize the basis vectors; see also [278].
Algorithm 4 depicts the Householder-based GMRES (HH-GMRES) algorithm. Here, the sign function returns $\pm 1$ depending on the sign of its argument, and subscripts denote individual vector entries. The signs (lines 1 and 4) are chosen to reduce the risk of subtractive cancellation. Line 4 corresponds directly to Householder reflections; the details can be found in [118, 220]. HH-GMRES gradually generates a QR factorization of the matrix formed by the initial residual and the successive products with $A$. By discarding the zero rows of the upper triangular factor and the first column of both sides, we can obtain the Arnoldi relation (10). The Householder vector of each step contains zeros in its leading rows (line 3) so that the Householder transformation leaves the previous basis vectors unchanged. For the solution of line 8 and the estimate of the residual norm at each iteration step, the same techniques as in Algorithm 2 can be applied. Note that we only need to store the Householder vectors instead of the basis vectors. We also prefer not to store the basis vectors explicitly but to use Qin Jiushao's scheme (Horner's scheme) to evaluate line 9 in a recursive manner; see, e.g., [220].

Similar to MGS-GMRES, the cost of HH-GMRES increases dramatically as the number of steps increases. Therefore, the restarted scheme in Algorithm 3 is commonly employed as an outer loop. HH-GMRES was proved early on to be backward stable (see [73]), but can be roughly twice as expensive as MGS-GMRES.
2.3 Simpler GMRES
In 1994, Walker and Zhou [280] proposed another variant of the GMRES method, called simpler GMRES (SGMRES), which does not require the QR factorization of an upper Hessenberg matrix. Although rarely used in practice due to stability issues, this algorithm can enhance our understanding of minimal residual iterations.
We present a generalized version, as given in [150], that contains the essential features of SGMRES. Consider a (generally nonorthogonal) basis $Z_m = [z_1, \ldots, z_m]$ of $\mathcal{K}_m(A, r_0)$ and an orthonormal basis $V_m = [v_1, \ldots, v_m]$ of $A\mathcal{K}_m(A, r_0)$. We construct a matrix relation
(12) $AZ_m = V_mR_m$
such that $R_m$ is an upper triangular matrix. According to the constraint (5b), one finds $\alpha_m = \langle r_{m-1}, v_m \rangle$, yielding the following update:
(13) $r_m = r_{m-1} - \alpha_m v_m$.
In addition, since the correction can be written as $Z_m t_m$ with some coefficient vector $t_m$, it follows that $r_m = r_0 - V_mR_mt_m$. Combining (5b) and (13), one obtains that
$R_m t_m = [\alpha_1, \ldots, \alpha_m]^T$.
The generalized simpler GMRES framework (see [150]) is listed in Algorithm 5. Choosing $Z_m = [r_0/\|r_0\|, v_1, \ldots, v_{m-1}]$ results in SGMRES, which was formalized by means of both MGS and Householder-based processes in [280]. Note that in the Householder variant there is no need to store the $v_j$'s, and thus line 4 should be modified accordingly; see [280] for details. The choice $Z_m = [r_0/\|r_0\|, r_1/\|r_1\|, \ldots, r_{m-1}/\|r_{m-1}\|]$ was introduced in [150], leading to a slightly different algorithm called residual-based simpler GMRES (RB-SGMRES). Then, an adaptive variant combining the two choices was proposed in [149], following the rule
(14) $z_m = \begin{cases} r_{m-1}/\|r_{m-1}\| & \text{if } \|r_{m-1}\| \le \nu\|r_{m-2}\|, \\ v_{m-1} & \text{otherwise}, \end{cases}$
with $\nu \in [0, 1]$. Notice that $\nu = 0$ and $\nu = 1$ correspond to SGMRES and RB-SGMRES, respectively.
Theoretical aspects are collected in [280, 163, 164, 56, 150, 149], which provide bounds for the condition number of the basis matrix $Z_m$ and allow for a better understanding of the unstable behavior of Algorithm 5; see also [186]. It is interesting to observe that, along with the results in [128], SGMRES can be employed to describe GMRES convergence; see [163].
2.4 Equivalent formulations
We briefly review some iterative schemes that are in some sense mathematically equivalent to GMRES. Paige and Saunders [205] in 1975 proposed the MINRES algorithm satisfying (6) for solving symmetric indefinite systems, which motivated the development of MGS-GMRES. In 1982, Elman [88, 83] proposed the GCR algorithm, which generalizes the conjugate residual algorithm (see, e.g., [220]) to the nonsymmetric case.
In Algorithm 6, the product $Ap_m$ can be updated by means of the formula in line 6 without an additional matrix-vector multiplication. GCR generates a set of direction vectors spanning the Krylov subspace (4) whose images under $A$ are mutually orthogonal. These vectors can be alternatively updated by
(15) $p_{m+1} = Ap_m - \sum_{j=1}^{m} \beta_{j,m} p_j$,
where the coefficients $\beta_{j,m}$ are chosen to preserve the orthogonality of the vectors $Ap_j$,
leading to the so-called ORTHODIR algorithm, which was proposed by Jea and Young [291, 147]. A truncated version of GCR was given earlier in [272], and both GCR and ORTHODIR can be formulated in a cyclic manner like Algorithm 3 to reduce the associated work and storage. Nonetheless, GCR can fail if the symmetric part of $A$ is indefinite, and the implementation of (15) can cause stability problems, while MGS-GMRES is known to be a cheaper and more robust alternative; see, e.g., [223]. Similar reasoning holds for another early variant proposed by Axelsson [10]. It was mentioned in [280] that there is a close relationship between SGMRES and ORTHODIR; see also [150], in which it was also shown that RB-SGMRES is closely related to GCR.
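For concreteness, here is a GCR sketch in the spirit of Algorithm 6, though simplified: it is our own version, it recomputes $Ap$ instead of using the update formula of line 6, and, as noted above, it may break down when the symmetric part of $A$ is indefinite.

```python
import numpy as np

def gcr(A, b, x0, maxit=60, tol=1e-10):
    """GCR sketch: search directions p_j whose images A p_j are
    mutually orthogonal; each step minimizes the residual along p."""
    x = x0.copy()
    r = b - A @ x
    P, AP = [], []                       # stored p_j and A p_j
    for _ in range(maxit):
        if np.linalg.norm(r) < tol:
            break
        p = r.copy()
        Ap = A @ p
        for pj, Apj in zip(P, AP):       # orthogonalize A p against previous A p_j
            beta = (Ap @ Apj) / (Apj @ Apj)
            p -= beta * pj
            Ap -= beta * Apj
        alpha = (r @ Ap) / (Ap @ Ap)     # minimal residual step along p
        x += alpha * p
        r -= alpha * Ap
        P.append(p); AP.append(Ap)
    return x

rng = np.random.default_rng(4)
n = 20
A = np.eye(n) + 0.05 * rng.standard_normal((n, n))  # symmetric part positive definite
b = rng.standard_normal(n)
x = gcr(A, b, np.zeros(n))
print(np.linalg.norm(b - A @ x) < 1e-8)
```

Storing both the p's and the A p's doubles the memory cost relative to MGS-GMRES, which is one reason the latter is usually preferred.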
An early algorithm realizing (6) was already described by Khabaza [156] in 1963, which used the unorthogonalized power basis vectors and thus suffered from numerical instabilities. In [82], Eirola and Nevanlinna formalized a quasi-Newton method, later called the EN method, that approximates $A^{-1}$ by rank-one updates. It was shown in [82, 275] that a variant of EN is mathematically equivalent to GMRES; see also [268] for additional comments. More recent examples of equivalence relations involving quasi-Newton methods can be found in [135, 136]. In 1988, on the other hand, Sidi gave a proof in [233] that an acceleration technique called reduced rank extrapolation is mathematically equivalent to GMRES; see [245] for a survey of vector extrapolation methods. Later, it was established by Walker and Ni [279] that another technique called Anderson acceleration is equivalent to GMRES in some sense; we refer the reader to [279, 37] for more details on Anderson acceleration and its relation to GMRES.
2.5 Convergence behavior
An important observation for studying GMRES convergence is that any vector in the Krylov subspace (4) can be written in polynomial form, which implies
$r_m = p_m(A)r_0, \quad p_m \in \pi_m$,
where $\pi_m$ is the set of all polynomials $p$ of degree at most $m$ such that $p(0) = 1$. Then, (6) can be expressed as
(16) $\|r_m\| = \min_{p \in \pi_m} \|p(A)r_0\|$.
Let $\Lambda(A)$ denote the set of eigenvalues of a matrix $A$ and $\kappa(X) = \|X\|\,\|X^{-1}\|$. Assume that $A$ is diagonalizable so that $A = X\Lambda X^{-1}$ with $\Lambda$ diagonal. The residual norm of GMRES satisfies
(17) $\frac{\|r_m\|}{\|r_0\|} \le \kappa(X) \min_{p \in \pi_m} \max_{\lambda \in \Lambda(A)} |p(\lambda)|$,
where $\kappa(X)$ denotes the condition number of the eigenvector matrix $X$. Note that $A$ being normal implies $\kappa(X) = 1$, in which case the bound in (17) is sharp (see [124, 152]). This bound was presented in 1982 by Elman [88] in the context of the so-called GCR algorithm; see also [83, 223]. It was also proved in [88] that if $A$ is positive definite, then we have
$\frac{\|r_m\|}{\|r_0\|} \le \left(1 - \frac{\lambda_{\min}\big((A + A^T)/2\big)^2}{\lambda_{\max}(A^TA)}\right)^{m/2}$,
where $\lambda_{\min}(\cdot)$ and $\lambda_{\max}(\cdot)$ are respectively the smallest and largest eigenvalues of a matrix in the absolute sense. This bound was improved two decades later in [25, 24]. Now, it follows from (16) that
(18) $\frac{\|r_m\|}{\|r_0\|} \le \psi_m(A) := \max_{\|v\|=1} \min_{p \in \pi_m} \|p(A)v\| \le \varphi_m(A) := \min_{p \in \pi_m} \|p(A)\|$.
In general, $\psi_m(A)$ is called the worst-case GMRES value, as its purpose is to study the worst-case behavior of the GMRES method. It is known that $\psi_m(A)$ is attainable by the residual norm, that is, there exists an initial vector for which equality holds (see, e.g., [258]). $\varphi_m(A)$ is called the ideal GMRES value, which was introduced in [129] to totally exclude the influence of the initial vector. It was proved in [129, 170] that $\varphi_m(A)$ has a unique minimizer, while as discussed in [100, 169], the polynomial and unit-norm vector associated with $\psi_m(A)$ may not be uniquely determined. Moreover, if $A$ is normal, then $\psi_m(A) = \varphi_m(A)$ holds (see [124, 152]); otherwise, there exist examples for which the two values differ (see [99, 260, 100]). We refer the reader to [168, 169, 258, 100] and their references for further details on (18).
The field of values plays a beneficial role in convergence analysis; it is defined as
$W(A) = \left\{ \frac{\langle Ax, x \rangle}{\langle x, x \rangle} : x \in \mathbb{C}^n,\ x \ne 0 \right\}$.
Assume that the origin is outside $W(A)$ and let
$d = \operatorname{dist}(0, W(A))$.
Then, the following bound holds:
(19) $\frac{\|r_m\|}{\|r_0\|} \le \left(1 - \frac{d^2}{\|A\|^2}\right)^{m/2}$;
see [253, 80] for proofs based on the worst-case GMRES approximation; see also [171], where the bound in (19) was proved to hold for the ideal GMRES approximation. Besides, an early account of the field of values applied to convergence analysis can be found in [79]. We refer the reader to the most recent work [28] and the references therein for the use of the field of values in convergence analysis. The polynomial numerical hull introduced in [200] can be seen as a generalization of the field of values; it is defined as
$\mathcal{H}_m(A) = \{ z \in \mathbb{C} : |p(z)| \le \|p(A)\| \text{ for all polynomials } p \text{ of degree at most } m \}$,
from which a lower bound on the ideal GMRES approximation can be easily derived. However, the determination of $\mathcal{H}_m(A)$ is very difficult in its full generality. We refer to [122, 98, 123, 258] for related works. Another group of results is obtained via the pseudospectrum [261]. Let $I$ denote the identity matrix of appropriate size. For a real $\varepsilon > 0$, the $\varepsilon$-pseudospectrum of $A$ is the set
$\Lambda_\varepsilon(A) = \{ z \in \mathbb{C} : \|(A - zI)^{-1}\| \ge \varepsilon^{-1} \}$,
from which an upper bound on the GMRES residual norm can be obtained; see, e.g., [198, 262] for more details.
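The pseudospectrum can be probed numerically through the equivalent characterization via the smallest singular value: $z \in \Lambda_\varepsilon(A)$ if and only if $\sigma_{\min}(A - zI) \le \varepsilon$. A small sketch (the example matrix and sample points are our own):

```python
import numpy as np

# Nonnormal 2x2 example: benign eigenvalues, but large pseudospectra
A = np.array([[1.0, 10.0],
              [0.0,  1.2]])

def sigma_min(A, z):
    """Smallest singular value of A - z I; z is in the eps-pseudospectrum
    iff this value is at most eps."""
    return np.linalg.svd(A - z * np.eye(len(A)), compute_uv=False)[-1]

# at an eigenvalue the resolvent norm is infinite, i.e. sigma_min = 0
print(np.isclose(sigma_min(A, 1.0), 0.0))
# a point well away from both eigenvalues still lies in Lambda_eps for
# modest eps, signalling possible delays in GMRES convergence
print(sigma_min(A, 1.1 + 0.5j) < 0.1)
```

For serious use, contour plots of this quantity over a grid (as popularized by Trefethen) give the familiar pseudospectral pictures.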
Linear contraction bounds for the residual norms can be somewhat misleading, since a highly nonlinear convergence behavior cannot be adequately described by a linear contraction. The book [167] by Liesen and Strakoš is very useful for getting more insight into the cost of computations (see also [54]); it highlights the inherent nonlinear nature of Krylov subspace methods. In [89], an experimental comparison of GMRES convergence bounds based on eigenpairs, the field of values, and the pseudospectrum for nonnormal matrices can be found, but none of these is consistently better than the others. In order to get more descriptive results, Titley-Peloquin et al. [259] in 2014 derived several GMRES bounds that involve the initial residual vector; see also [209] for an analysis with respect to nonstandard inner products. More recently, Sacchi and Simoncini [226] gave a more descriptive convergence analysis specifically for the case of localized ill-conditioning.
It was shown in a series of papers by Arioli, Greenbaum, Pták and Strakoš [128, 126, 8] from 1994 to 1998 that the convergence of GMRES for nonnormal matrices cannot be determined solely by the distribution of eigenvalues. In [128], Greenbaum and Strakoš studied pairs of matrices and right-hand sides for which the associated sequences of Krylov residual spaces coincide. As a result, the convergence history generated by GMRES for one pair is the same as that generated for the other pair. Matrices that satisfy this property are called GMRES-equivalent matrices (see, e.g., [163, 78]). Among other results, Greenbaum and Strakoš concisely state in [128] on GMRES convergence that
Any behavior that can be seen with the GMRES method can be seen with the method applied to a matrix having any nonzero eigenvalues.
This has been extended by Greenbaum et al. [126] along a different line, who conclude that
Any nonincreasing convergence curve can be obtained with GMRES applied to a matrix having any desired eigenvalues.
Finally, a complete parametrization of all pairs of matrices and right-hand sides with prescribed eigenvalues for which GMRES generates prescribed residual norms was given in [8]; see also [183] for further discussion. GMRES-equivalent matrices have been employed in subsequent studies [163, 78], while investigations of “any behavior is possible”, like those in [128, 126, 8], have continued in the past ten years, concerning Ritz values [75, 76], harmonic Ritz values [74], restarted GMRES [271, 77], and block GMRES [158]. Note that Ritz and harmonic Ritz values are respectively the roots of the FOM and GMRES residual polynomials; see [119, 185]. We also mention [183] for a generalization to other Krylov subspace methods.
Eigenvalue clusters associated with a parameterized model were employed in [47] to explain GMRES convergence. Exact but complicated expressions for residual norms were derived in [144, 165, 228]; see also [186]. Inspired by the work [265] on the conjugate gradient method, van der Vorst and Vuik [267] in 1993 investigated the superlinear convergence behavior of GMRES by means of Ritz values. Simoncini and Szyld [242] addressed the same problem using spectral projectors. Moret [189] discussed superlinear convergence of GMRES in a complex separable Hilbert space; see also [33] for a generalization of Moret's result. On the other hand, since any nonincreasing convergence curve is possible for GMRES [126], stagnation may occur even for the original GMRES method without restarting. This was first investigated by Brown [38] in 1991 and was then discussed in [293, 169, 244, 237, 182, 184]. Although most work on convergence analysis considers full GMRES, the restarted version is more practical; we refer the reader to, e.g., [151, 294, 295, 16, 270, 271, 77] for more details.
The above discussion assumes exact arithmetic. In finite precision arithmetic, it was proved by Drkošová et al. [73] in 1995 that HH-GMRES is backward stable. In 2006, it was proved by Paige et al. [204] that MGS-GMRES is also backward stable; earlier observations can be found in [127, 206]. Due to the ill-conditioning of the basis matrix in (12), SGMRES has serious stability problems (see, e.g., [164, 56]), which can be partially remedied by using RB-SGMRES [150] or the adaptive variant (14) [149]; see also the experiments in [180].
3 Acceleration strategies
The focus in this section is on acceleration strategies that tackle the challenges a basic GMRES algorithm might face. First, the use of preconditioners is almost certain to benefit Krylov subspace methods when solving ill-conditioned problems. There are also numerous publications concerning deflation and augmentation, some of which are closely related to preconditioning. We then briefly introduce weighted inner products and inexact matrix-vector products, and end this section by discussing mixed-precision techniques.
3.1 Preconditioning
Preconditioning of the linear system is used to improve the convergence behavior of Krylov subspace methods. One needs to find a nonsingular matrix $M$ that is close to $A$ in some sense, commonly called the preconditioning matrix or preconditioner, such that systems involving $M$ are inexpensive to solve. The preconditioned system reads
(20) $M^{-1}Ax = M^{-1}b$,
(21) $AM^{-1}u = b, \quad x = M^{-1}u$.
The cases (20) and (21) correspond to left preconditioning and right preconditioning, respectively. Another form, called split preconditioning, can be employed when nearly symmetric systems are encountered; see, e.g., [220]. For GMRES, right preconditioning is often preferred because the residuals in this case are identical to the true residuals, only the solution update formula being transformed to $x_m = x_0 + M^{-1}V_my_m$, whereas left preconditioning works with the preconditioned residuals $M^{-1}(b - Ax_m)$. In addition, from an algorithmic point of view, one only needs to replace the matrix-vector product in line 3 of Algorithm 2 by a product with $M^{-1}A$ for the case (20) or with $AM^{-1}$ for (21). High-quality preconditioners are often constructed by incorporating problem-specific information; see, e.g., [201, 95, 96, 109, 208]. On the other hand, general-purpose preconditioners have also been broadly exploited, such as incomplete factorization, approximate inverse, and algebraic multigrid; see [27, 281] and references therein.
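The two formulations (20) and (21) deliver the same solution, which is easy to verify on a toy problem. The sketch below uses dense direct solves and a Jacobi-type diagonal preconditioner purely for illustration; all choices are our own.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 6
A = np.diag(np.arange(1.0, n + 1)) + 0.01 * rng.standard_normal((n, n))
b = rng.standard_normal(n)
M = np.diag(np.diag(A))          # Jacobi preconditioner: a crude stand-in
Minv = np.linalg.inv(M)

# (20) left preconditioning: solve M^{-1} A x = M^{-1} b
x_left = np.linalg.solve(Minv @ A, Minv @ b)

# (21) right preconditioning: solve A M^{-1} u = b, then recover x = M^{-1} u
u = np.linalg.solve(A @ Minv, b)
x_right = Minv @ u

print(np.allclose(x_left, x_right))   # same solution either way
print(np.allclose(A @ x_right, b))    # and it solves the original system
```

Within an iterative solver the difference shows up in which residual is monitored: (21) iterates on the true residual $b - Ax_m$, while (20) monitors $M^{-1}(b - Ax_m)$.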
The flexible MGS-GMRES variant proposed by Saad [217] in 1993 allows the preconditioner to vary at each step; it shall be denoted by FGMRES, as widely used in the literature.
As seen in Algorithm 7, the preconditioned vectors $z_j = M_j^{-1}v_j$ should be explicitly stored in $Z_m = [z_1, \ldots, z_m]$ and then reused in line 10. Therefore, the Arnoldi-like relation $AZ_m = V_{m+1}\bar{H}_m$ holds. Unlike standard right preconditioning, which constructs an orthonormal basis of the Krylov subspace
$\mathcal{K}_m(AM^{-1}, r_0)$,
the subspace of solution estimates in FGMRES is no longer a standard Krylov subspace. In addition, $H_m$ may be singular even if $A$ is nonsingular, and thus the assumption of nonsingularity of $H_m$ must be made when deducing the exact solution from $h_{m+1,m} = 0$; see [217, 220] for more discussion. A very detailed analysis of FGMRES, in comparison with right-preconditioned MGS-GMRES, can be found in [7]. More recently, Greif et al. [130] developed the multipreconditioned GMRES scheme in 2017, which shares some similarities with the block version of FGMRES [43, 42] but targets (1) instead of systems with multiple right-hand sides. In practice, a restarted procedure should be used in place of the simple version in Algorithm 7; see [103] for details on the implementation of FGMRES.
Another interesting but less used variable preconditioning technique, proposed by van der Vorst and Vuik [268] in 1994, consists of the GCR algorithm as the outer iteration and a GMRES algorithm as the inner iteration. The inner algorithm constructs a preconditioner for the outer procedure. This scheme is called GMRESR; it was motivated by the EN algorithm [82]. Note that in GMRESR the residuals are preconditioned, while in FGMRES, as shown in Algorithm 7, the search directions are preconditioned. We refer to [274] for a comparison of FGMRES and GMRESR. Extensions of GMRESR have been developed in [64, 65] by requiring that the inner basis maintain orthogonality to the outer basis, possibly combined with the truncation strategy to reduce storage costs.
The GMRES polynomial itself, defined as the minimizing polynomial in (16), can be used to derive preconditioners for Krylov subspace algorithms. Liu et al. [174] in 2015 developed a polynomial preconditioned GMRES variant using the GMRES polynomial. An improved approach was proposed in [176] by Loe and Morgan. The idea can be summarized as follows:
1. Run $m$ steps of MGS-GMRES;
2. Compute the harmonic Ritz values (see (24)), which are the roots of the GMRES polynomial;
3. Order the roots and construct the polynomial;
4. Apply a GMRES algorithm to the polynomial preconditioned system.
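The first two steps above can be illustrated numerically. The sketch below runs a few Arnoldi steps, extracts the harmonic Ritz values from the small generalized eigenproblem, and checks the stated root property: the polynomial with these roots and value one at the origin reproduces the GMRES residual. The matrix, sizes, and variable names are our own choices, and we use a well-conditioned SPD example so that the harmonic Ritz values are real; root ordering (step 3) is not addressed here.

```python
import numpy as np

rng = np.random.default_rng(6)
n, m = 12, 4
A = np.diag(np.linspace(1.0, 5.0, n))     # SPD toy matrix
r0 = rng.standard_normal(n)

# m steps of MGS-Arnoldi
V = np.zeros((n, m + 1)); H = np.zeros((m + 1, m))
V[:, 0] = r0 / np.linalg.norm(r0)
for j in range(m):
    w = A @ V[:, j]
    for i in range(j + 1):
        H[i, j] = V[:, i] @ w
        w -= H[i, j] * V[:, i]
    H[j + 1, j] = np.linalg.norm(w)
    V[:, j + 1] = w / H[j + 1, j]

# harmonic Ritz values: solve Hbar^T Hbar g = theta Hm^T g
theta = np.linalg.eigvals(np.linalg.solve(H[:m, :].T, H.T @ H))

# GMRES residual polynomial p(z) = prod_i (1 - z/theta_i), applied to r0
v = r0.copy()
for t in theta:
    v = v - (A @ v) / t

# compare with the minimal residual over K_m computed by brute force
K = np.column_stack([np.linalg.matrix_power(A, j) @ r0 for j in range(m)])
c, *_ = np.linalg.lstsq(A @ K, r0, rcond=None)
print(np.isclose(np.linalg.norm(v), np.linalg.norm(r0 - A @ (K @ c))))
```

Applying the factored polynomial as a preconditioner amounts to repeating the `v - (A @ v) / t` sweep for each input vector; the ordering of the roots matters for stability at higher degrees, as [176, 91, 290] discuss.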
We refer the reader to [174, 178, 177, 176] for more details and experiments. Polynomial preconditioners can be combined with other preconditioners and have been highlighted as effective tools for parallel computing. Early contributions on hybrid or polynomial preconditioned GMRES algorithms can be found in [199, 254, 152, 269]. The stability problems arising with high-degree preconditioners were addressed in [91, 176, 290].
In order to overcome scaling difficulties for large numbers of processor cores, two general iterative schemes were introduced in 2014 by McInnes et al. [181]. As an example, they presented a two-level hierarchical FGMRES algorithm. The outer level applies FGMRES to the global problem, while the inner level executes inexact GMRES within subgroups of cores. Therefore, the inner solver serves as a variable preconditioner for the outer solver and entails less communication overhead. The two-level scheme can be extended to multiple levels with multiple solvers, leading to the hierarchical Krylov scheme. Meanwhile, a nested version was also presented in [181]. The inner solvers in the nested scheme apply inexact Krylov algorithms to the global system instead of local systems.
3.2 Deflation and augmentation
The convergence of GMRES($m$) is often hampered by eigenvalues of small magnitude and by the loss of information at the end of each cycle. Deflation techniques can eliminate certain small eigenvalues or move them away from the origin. Note that the word “deflation” has multiple meanings in the literature, which may lead to confusion. Augmentation amounts to considering an $(m+k)$-dimensional search space with $k \ll m$, adding information from previous cycles to the search space.
Let us write
(22) $W_{m+k} = [v_1, \ldots, v_m, u_1, \ldots, u_k]$,
where the first $m$ vectors are the orthonormal Arnoldi vectors. In 1995, Morgan [191] presented an augmented GMRES algorithm, later called GMRES-E in [192], where the appended vectors $u_1, \ldots, u_k$ are chosen as the harmonic Ritz vectors corresponding to the harmonic Ritz values of smallest magnitude; see also [55, 218] for more analysis of this approach. Given the relation
(23) $AW_{m+k} = V_{m+k+1}\bar{H}_{m+k}$,
where
$V_{m+k+1}$
forms a set of orthonormal vectors, it is known that solving the harmonic eigenvalue problem
$\big(H_m + h_{m+1,m}^2 H_m^{-T}e_me_m^T\big)g = \theta g$
is equivalent to solving
(24) $\bar{H}_m^T\bar{H}_mg = \theta H_m^Tg$,
with the corresponding harmonic Ritz vector given by $V_mg$. Morgan [192] in 2000 proposed a mathematically equivalent but more efficient algorithm, commonly called GMRES-IR, which uses the implicit restarting technique with the unwanted harmonic Ritz values as shifts to generate augmented subspaces (see [160]). A particularly important observation is that the augmented subspace can be formulated as a Krylov subspace with a carefully chosen starting vector, that is,
$\operatorname{span}\{u_1, \ldots, u_k\} + \mathcal{K}_m(A, r_0) = \mathcal{K}_{m+k}(A, s)$ for some vector $s$.
We mention that Le Calvez and Molina [159] independently developed an implicitly restarted GMRES algorithm, which, however, resorts to a different eigenvalue problem than (24). Further, Morgan [193] revisited GMRES-IR in 2002 and found that there was room for improvement. The so-called GMRES with deflated restarting (GMRES-DR) algorithm [193], based on the thick restarting approach (see [283]), can be simpler and more stable than GMRES-IR; see also [214] for a slightly modified variant. Besides, there are many situations where variable preconditioners are worth considering, and thus a flexible variant of GMRES-DR (FGMRES-DR) was suggested in [115]. We point here to the more recent work of Liu et al. [174], who applied polynomial preconditioning to GMRES-DR. In addition, the effectiveness of GMRES-DR was studied by Morgan et al. [195]. The main idea is that the approximate eigenvectors generated by GMRES-DR can be seen as pseudo-eigenvectors. It was shown in [195] by examples that GMRES-DR can also work for highly nonnormal matrices.

Some deflation strategies are closely related to preconditioning. Kharchenko and Yeremin [157] in 1995 suggested the use of low-rank transformations by means of left and right Ritz vectors to construct a right preconditioner. As a result, extremal eigenvalues are translated into a vicinity of one. In 1996, Erhel et al. [94] employed a different approach that estimates an invariant subspace and constructs a right preconditioner moving the small eigenvalues to a single multiple large eigenvalue. A similar idea was later suggested by Baglama et al. [12], who described two algorithms in the context of left preconditioning that exploit implicit restarting to improve flexibility. Moreover, an improved algorithm built on the previous work [191, 94, 12] was introduced in [41]; see also [197] for an adaptive strategy for determining the restart frequency.
In 1996, de Sturler [64] proposed an inner-outer algorithm with an augmented basis, called GCRO, which was extended to the so-called GCROT algorithm in [65] by incorporating truncation strategies. A beautiful account of Morgan's work, deflation by preconditioning, and GCROT can be found in the survey [81]. It was observed by Baker et al. [17] in 2005 that restarted GMRES often exhibits alternating behavior: the residual vector tends to alternate direction at the end of each cycle, leading to slow convergence and even stalling. They gave some remedies for this problem by combining the ideas of augmented subspaces [191] and error approximations [64], resulting in the LGMRES algorithm.
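To make the augmentation idea concrete, the following sketch appends the previous cycle's correction, which approximates the error, to the search space of each restart, in the spirit of LGMRES. This is an illustration only and not Baker et al.'s implementation: the function name `lgmres_sketch` is hypothetical, and the small minimization is solved densely with a generic least-squares routine rather than the usual Hessenberg update.

```python
import numpy as np

def lgmres_sketch(A, b, m=8, cycles=15):
    """Restarted GMRES augmented with one error approximation per cycle
    (an LGMRES-style sketch; the small minimization is solved densely)."""
    x = np.zeros_like(b)
    z = None                              # previous correction x_k - x_{k-1}
    for _ in range(cycles):
        r = b - A @ x
        # Arnoldi (modified Gram-Schmidt) basis of the Krylov space of r
        V = [r / np.linalg.norm(r)]
        for j in range(m - 1):
            w = A @ V[j]
            for v in V:
                w -= (v @ w) * v
            V.append(w / np.linalg.norm(w))
        # augment the search space with the previous error approximation
        S = np.column_stack(V + ([z] if z is not None else []))
        # minimize ||r - A S c|| over all coefficient vectors c
        c = np.linalg.lstsq(A @ S, r, rcond=None)[0]
        dx = S @ c
        x, z = x + dx, dx                 # dx approximates the current error
    return x
```

Each cycle still minimizes the residual norm over its search space; only the space itself is enlarged by one vector, which is what suppresses the alternating behavior described above.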
Algorithm 8 satisfies (23) and generalizes GMRES-E to a GCRO-like vector selection strategy. Notice that the appended vector can be considered an approximation to the exact error, which is the key idea underlying both GCRO and LGMRES; see [64, 17]. We refer the reader to [17] for further details and numerical experiments; see also [142] for a similar approach. On the other hand, combining the main features of GMRES-DR and GCRO, an algorithm called GCRO-DR was derived in 2006 in [207] for sequences of linear systems. When solving a single linear system, GCRO-DR is mathematically equivalent to GMRES-DR.
It was Gutknecht [132] who attempted to present deflated and augmented Krylov subspace methods in a common framework. Gutknecht also investigated the breakdown conditions of deflated GMRES; we refer the interested reader to the companion paper [110] as well. These two papers might provide a good starting point for further investigations of deflation and augmentation.
3.3 Weighted inner products
In 1998, it was shown by Essai [97] that a suitably chosen inner product can improve the performance of restarted MGS-GMRES. Let $D = \mathrm{diag}(d_1, \ldots, d_n)$ with $d_i > 0$. The $D$-inner product is defined as $(x, y)_D = y^T D x$ for all vectors $x$ and $y$, and the associated norm is $\|x\|_D = (x, x)_D^{1/2}$. From an algorithmic viewpoint, for example in Algorithm 1, one only needs to replace lines 5 and 7 with the new inner product and norm. The resulting algorithm will be denoted by WGMRES. In [97], the diagonal entries of $D$ were chosen from the absolute values of the current residual components and updated at each restart.
Preconditioned variants were studied in [48]. Later, a modified version motivated by the idea of [11] was proposed in [225], in which the weights are chosen randomly and there is no need to use Givens rotations. In [202], WGMRES was combined with the augmentation technique, and the new variant was compared with LGMRES as illustrated in Algorithm 8. On the other hand, Embree et al. [92] in 2017 applied weighting to GMRES-DR and observed that the mixed scheme can be better than using deflation alone or weighting alone. In 2014, Güttel and Pestana [134] proposed a new variant that applies the orthogonalization process to a transformed matrix and starting vector with respect to the Euclidean inner product, and is mathematically equivalent to WGMRES. For more insight into nonstandard inner products, see [209, 134, 92] and references therein.
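As an illustration of how little the algorithm changes, the sketch below performs a weighted Arnoldi cycle with modified Gram-Schmidt, where only the inner products and norms differ from the standard process. The helper name `weighted_arnoldi` and the weight vector `d` holding the diagonal of $D$ are our notation, not Essai's.

```python
import numpy as np

def weighted_arnoldi(A, r0, d, m):
    """m Arnoldi steps in the D-inner product (x, y)_D = y^T D x with
    D = diag(d), d > 0; returns V with D-orthonormal columns and the
    (m+1) x m Hessenberg matrix H such that A V[:, :m] = V H."""
    n = r0.shape[0]
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = r0 / np.sqrt(r0 @ (d * r0))     # normalize in the D-norm
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):                # modified Gram-Schmidt
            H[i, j] = V[:, i] @ (d * w)       # D-inner product
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.sqrt(w @ (d * w))    # D-norm
        V[:, j + 1] = w / H[j + 1, j]
    return V, H
```

In essence, the equivalence observed by Güttel and Pestana amounts to running the standard Arnoldi process on a diagonally transformed matrix and starting vector, which produces the same iterates as the weighted process above.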
3.4 Inexact matrix-vector products
An inexact process might be beneficial if the computation of the matrix-vector product is time-consuming. From a mathematical point of view, an inexact matrix-vector product can be modeled by $(A + E_j)v$, where $E_j$ is a perturbation matrix at the $j$th iteration. From (10), one obtains that
The inexact GMRES algorithm was first proposed in a technical report by Bouras and Frayssé [34] in 2000 (later published as [35] in 2005), and then analyzed by Simoncini and Szyld [241, 242], van den Eshof and Sleijpen [264], Giraud et al. [114], and Sifuentes et al. [236]. A relaxation strategy for the inner accuracy of GMRES can be found in [35]. It was shown in [241] that the early matrix-vector products must be computed with high accuracy, whereas the accuracy requirements can be relaxed as the iteration progresses, which explains the term "relaxation". An interesting review of inexact GMRES was given in [243]; see [235] for a performance evaluation. In 2013, Dolgov [72]
employed relaxation techniques to improve the performance of tensor GMRES; see
[20, 72] for more details on tensor formats.

3.5 Mixed precision
On modern computer architectures, single precision arithmetic (32 bits) is cheaper than double precision arithmetic (64 bits) in the sense that communication and computation costs grow with the size of the floating-point format [1]. In the 2008 revision of the IEEE 754 standard, half precision (16 bits) was defined as a storage format. Mixed-precision numerical algorithms date back more than five decades [188]. Recent advances in hardware-level support for low-precision arithmetic (e.g., the NVIDIA V100 GPU) and the increasing impact of communication costs have revived interest in mixed-precision techniques; see [1] for a recent survey.
It is natural to extend Algorithm 2 to the case of mixed precision. Here, we only consider a two-precision algorithm, which exploits single precision everywhere, except that:

The original system must be stored in double precision;

Although the correction could be computed in single precision, the update of the approximate solution should be done in double precision;

For the restarted version, the computation of the residual vector should also be done in double precision.
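The three rules above can be sketched as follows. This is our illustration with hypothetical function names, and the small least-squares problem is solved by a generic routine rather than Givens rotations for brevity.

```python
import numpy as np

def gmres_cycle_f32(A32, r32, m):
    """One GMRES(m) cycle entirely in single precision: Arnoldi with
    modified Gram-Schmidt, then the small least-squares problem;
    returns the correction V_m y."""
    n = r32.shape[0]
    V = np.zeros((n, m + 1), dtype=np.float32)
    H = np.zeros((m + 1, m), dtype=np.float32)
    beta = np.linalg.norm(r32)
    V[:, 0] = r32 / beta
    for j in range(m):
        w = A32 @ V[:, j]
        for i in range(j + 1):
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / H[j + 1, j]
    e1 = np.zeros(m + 1, dtype=np.float32)
    e1[0] = beta
    y = np.linalg.lstsq(H, e1, rcond=None)[0]   # generic solve, not Givens rotations
    return V[:, :m] @ y

def two_precision_gmres(A, b, m=15, cycles=25, tol=1e-12):
    """Restarted GMRES following the rules above: the system, solution
    updates, and restart residuals stay in double precision; each inner
    cycle runs in single precision."""
    A32 = A.astype(np.float32)            # low-precision copy for the inner cycles
    x = np.zeros_like(b)
    for _ in range(cycles):
        r = b - A @ x                     # restart residual in double precision
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        dx = gmres_cycle_f32(A32, r.astype(np.float32), m)
        x = x + dx.astype(np.float64)     # solution update in double precision
    return x
```

Because the restart residual is recomputed in double precision, the scheme behaves like iterative refinement and can reach accuracy well beyond that of the single-precision inner cycles.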
A recent paper by Gratton et al. [120] showed that using low precision in MGS-GMRES can achieve the same convergence rate and final accuracy as the full-precision variant. They provided a theoretical analysis of the nonrestarted algorithm by means of inexact inner products; see also [4]. Then, Lindquist et al. [172] focused on the experimental aspects of restarted mixed-precision GMRES algorithms, including MGS-GMRES and a variant based on CGS with one reorthogonalization (CGS2); see [117, 116] for an error analysis of CGS2.
In Algorithm 9, we illustrate the CGS2-Arnoldi procedure. Motivated by the idea in [120], a theoretical result for CGS2-GMRES was provided in [173], along with more experimental results.
Another interesting class of mixed-precision algorithms involves iterative refinement, where low-precision GMRES is employed as the inner solver. In 1992, Turner and Walker [263] showed that restarted GMRES can be regarded as an iterative refinement process. In 2009, Arioli and Duff [6] proved that the FGMRES algorithm preconditioned by a low-precision factorization is backward stable.
We sketch GMRES-based iterative refinement (GMRES-IR) in Algorithm 10; see, e.g., [51, 175] for more details. Note that this is just a simple example using two precisions; three or more precisions could also be exploited by relaxing some of the operations. Carson and Higham [50] in 2017 gave a new forward error analysis of iterative refinement and suggested the use of GMRES-IR. Then, in [51], they introduced and analyzed an iterative refinement scheme that can exploit three different precisions. It was shown that GMRES-IR can provide the required relative accuracy, even though the LU factorization in single precision might be of low quality. The error analysis in [50, 51] was recently generalized in [5] to the case where the GMRES solver and the matrix-vector multiplications in line 5 may proceed in independent precisions, yielding the possibility of using five precisions. On the other hand, the overflow and underflow problems associated with mixed-precision algorithms were investigated by Higham et al. [139].
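The refinement structure can be sketched as below. This is a simplification under stated assumptions: `ir_skeleton` is a hypothetical name, and a dense single-precision solve stands in for the inner GMRES solver preconditioned by low-precision LU factors.

```python
import numpy as np

def ir_skeleton(A, b, inner_solve, iters=8):
    """Iterative refinement skeleton: residuals and solution updates in
    double precision, corrections from a low-precision inner solver
    supplied by the caller."""
    x = np.zeros_like(b, dtype=np.float64)
    for _ in range(iters):
        r = b - A @ x                     # residual in double precision
        d = inner_solve(r)                # low-precision correction solve
        x = x + d.astype(np.float64)      # update in double precision
    return x

# A dense single-precision solve stands in for the preconditioned GMRES solver.
rng = np.random.default_rng(3)
n = 50
A = np.eye(n) * 20 + 0.5 * rng.standard_normal((n, n))
b = rng.standard_normal(n)
A32 = A.astype(np.float32)
x = ir_skeleton(A, b, lambda r: np.linalg.solve(A32, r.astype(np.float32)))
```

Even though each correction carries only single-precision accuracy, the double-precision residual drives the error down by several digits per step, which is the mechanism the analyses of [50, 51] make rigorous.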
4 Parallel algorithms
On modern computer architectures, communication is expensive relative to computation. Sparse matrix-vector products (SpMVs) and dot products, especially the latter, often limit the achievable speedups of parallel algorithms. In particular, the high cost of global reduction operations in the orthogonalization process makes GMRES hard to parallelize. To alleviate this performance bottleneck, much work has focused on ways to reduce or hide communication costs. Early work concerning the parallelization of GMRES can be found in, e.g., [277, 71, 232, 60]; see also [68] for a general discussion of parallel algorithms. Here we first discuss the s-step GMRES algorithm, and then present the idea of pipelining. These approaches have gained popularity in the last decade. Finally, some other potential strategies are mentioned at the end of this section.
4.1 Matrix powers and polynomial basis
In [69] and some subsequent publications [70, 187, 141, 21], a Krylov subspace algorithm is viewed as consisting of several "kernels", each being an important and time-consuming building block of the algorithm. In GMRES, the matrix powers computation is considered such a kernel: it computes the vectors $v, Av, \ldots, A^s v$. The most straightforward implementation requires $s$ rounds of messages between each processor and its neighbors, whereas a well-designed matrix powers kernel (MPK) may require only one, at the cost of redundant computation and extra storage; see [69], in which data dependencies and performance models are discussed for both distributed and shared memory systems. Nonetheless, this technique has several disadvantages. First, the MPK introduces additional overhead. Second, it is still difficult to apply effective preconditioners with the MPK [141, 285]. Third, the basis vectors become more and more linearly dependent [63].
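The communication trade-off can be made concrete on a 1D example. The sketch below is our construction, not the implementation of [69]: a "processor" owning the first h entries of v for a 1D Laplacian fetches s ghost values once, and can then compute its part of $A^s v$ with no further exchanges, at the cost of redundantly applying the stencil to the shrinking ghost region.

```python
import numpy as np

def laplacian_1d(u):
    """One application of the 1D Laplacian stencil with zero Dirichlet
    boundaries: (A u)_i = 2 u_i - u_{i-1} - u_{i+1}."""
    w = 2.0 * u
    w[:-1] -= u[1:]
    w[1:] -= u[:-1]
    return w

def local_powers_with_ghosts(v, h, s):
    """Compute the first h entries of A^s v from the owned entries plus s
    ghost values fetched in a single message; each stencil application
    consumes one ghost cell, so no further communication is needed."""
    ext = v[:h + s].copy()          # owned block + s ghost cells (the one message)
    for _ in range(s):
        w = 2.0 * ext[:-1]          # the last entry cannot be updated: drop it
        w -= ext[1:]                # right neighbors
        w[1:] -= ext[:-2]           # left neighbors (entry 0 is a physical boundary)
        ext = w
    return ext                      # length h: the exact local part of A^s v
```

After s applications the extended array has shrunk back to exactly the h owned entries, and every value matches the globally computed result; the s extra ghost cells are the redundant storage mentioned above.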
The last drawback can be partially corrected by using a polynomial basis, that is, basis vectors of the form $p_0(A)v, p_1(A)v, \ldots, p_s(A)v$,
where $p_j$ is a polynomial of degree $j$. Choosing
(25) $p_j(z) = z^j$
yields the monomial basis, which converges to the eigenvector corresponding to the dominant eigenvalue of $A$, and thus, as mentioned above, rapidly becomes numerically rank deficient. Two alternatives have been suggested. One is to use Newton polynomials
(26) $p_0(z) = 1, \qquad p_{j+1}(z) = \sigma_j (z - \lambda_j)\, p_j(z), \qquad j = 0, 1, \ldots, s-1,$
as suggested by Bai et al. [14] in 1994, in which an s-step GMRES algorithm was proposed. Here, the $\sigma_j$ are scaling factors, generally chosen to normalize the basis vectors, and the $\lambda_j$ are eigenvalue estimates computed by a few iterations of the Arnoldi process. The other alternative is Chebyshev polynomials, which can be defined by
(27)
where the spectrum is assumed to be circumscribed by a rectangle in the complex plane, and the parameters of the recurrence are chosen so that the foci of the corresponding ellipse match the spectrum. Chebyshev polynomials for GMRES were discussed by Joubert and Carey [153, 154] in 1992; more details on Krylov subspace bases can be found in [141, 210, 49].
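The conditioning gap between the monomial and Newton bases is easy to observe numerically. The script below is our illustration: `leja_order` is a simple greedy Leja ordering, and the shifts are taken from the known spectrum rather than Arnoldi estimates.

```python
import numpy as np

def leja_order(pts):
    """Greedy Leja ordering: each shift maximizes the product of
    distances to the previously selected shifts."""
    pts = list(pts)
    out = [max(pts, key=abs)]
    pts.remove(out[0])
    while pts:
        nxt = max(pts, key=lambda z: np.prod([abs(z - w) for w in out]))
        out.append(nxt)
        pts.remove(nxt)
    return out

rng = np.random.default_rng(6)
n, s = 200, 12
lam = np.linspace(1.0, 100.0, n)
A = np.diag(lam)                         # diagonal test matrix, spectrum in [1, 100]
v = rng.standard_normal(n)

def basis(shifts):
    """Normalized basis vectors p_j(A) v with p_{j+1}(z) = (z - shift_j) p_j(z)."""
    V = [v / np.linalg.norm(v)]
    for sigma in shifts:
        w = A @ V[-1] - sigma * V[-1]
        V.append(w / np.linalg.norm(w))
    return np.column_stack(V)

cond_mono = np.linalg.cond(basis([0.0] * s))                        # (25): all shifts zero
cond_newt = np.linalg.cond(basis(leja_order(np.linspace(1.0, 100.0, s))))  # (26)
```

With Leja-ordered shifts spread over the spectrum, the Newton basis stays far better conditioned than the monomial basis, whose vectors all drift toward the dominant eigenvector.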
4.2 s-step GMRES
The idea of performing several iterations at once dates back to the early 1950s, as noted in [102]. In 1990, Chronopoulos and Kim [58] developed an s-step GMRES algorithm using the monomial basis. A Newton-basis s-step GMRES algorithm was considered by Bai et al. [14], in which modified Leja ordering and basis scaling were employed to enhance robustness. They also suggested the use of parallel QR factorization, which was implemented by Erhel [93] in 1995 by means of the so-called RODDEC algorithm. On the other hand, Joubert and Carey [153, 154] in 1992 considered an s-step GMRES algorithm using the Chebyshev basis along with an MPK for 2D regular meshes; see also [66]. In 2009, Mohiyuddin et al. [187] introduced an improved variant, for which a new kernel called tall skinny QR was derived to achieve the optimal reduction of communication while maintaining accuracy; see [67]. Then, the thesis of Hoemmen [141] gave a complete treatment of s-step GMRES.
Let $B$ be an $(s+1) \times s$ matrix such that
(28) 
It is called a basis conversion matrix, by which the three bases defined by (25)–(27) can be exploited in an elegant manner. It was shown in [141] that the QR factorization of the generated basis matrix could produce the same orthogonal matrix as that in (10). It then follows that
(29)
where the factor is an upper triangular matrix. Along with (10a) and (28), this implies
The above formulation can potentially bring two benefits. First, the orthonormal basis could be computed with less communication than in the classical case. Second, the QR factorization can be performed in parallel.
The tall skinny QR (TSQR) algorithm studied in [141, 67] is a good choice for reducing communication when there are many more rows than columns. Let the matrix be vertically partitioned into submatrices such that the number of rows of each submatrix is larger than the number of columns. Then, we perform the QR factorization of each block entry, that is,
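A two-level version of this scheme can be sketched in a few lines. This is a serial simulation of the parallel reduction; `tsqr_two_level` is our name, the p row blocks stand in for per-processor data, and the reduction tree is flattened into a single second-level QR.

```python
import numpy as np

def tsqr_two_level(W, p):
    """Two-level TSQR sketch: factor each of p row blocks independently
    (the local QRs), then factor the stacked R factors (the reduction
    step, flattened here instead of a binary tree)."""
    s = W.shape[1]
    blocks = np.split(W, p)                        # one block per "processor"
    local = [np.linalg.qr(B) for B in blocks]      # W_i = Q_i R_i
    R_stack = np.vstack([R for _, R in local])     # (p*s) x s stack of R factors
    Q2, R = np.linalg.qr(R_stack)                  # stacked R_i = Q2 R
    Q = np.vstack([Qi @ Q2[i*s:(i+1)*s]            # assemble the global Q
                   for i, (Qi, _) in enumerate(local)])
    return Q, R                                    # W = Q R, Q has orthonormal columns

Q, R = tsqr_two_level(np.random.default_rng(5).standard_normal((4000, 8)), p=8)
```

Only the small R factors travel between levels, which is what makes the scheme communication-efficient for tall, skinny matrices.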