 # Exploiting variable precision in GMRES

We describe how variable precision floating point arithmetic can be used in the iterative solver GMRES. We show how the precision of the inner products carried out in the algorithm can be reduced as the iterations proceed, without affecting the convergence rate or final accuracy achieved by the iterates. Our analysis explicitly takes into account the resulting loss of orthogonality in the Arnoldi vectors. We also show how inexact matrix-vector products can be incorporated into this setting.


## 1 Introduction

As highlighted in a recent SIAM News article, there is growing interest in the use of variable precision floating point arithmetic in numerical algorithms. In this paper, we describe how variable precision arithmetic can be exploited in the iterative solver GMRES. We show that the precision of some floating point operations carried out in the algorithm can be reduced as the iterations proceed, without affecting the convergence rate or final accuracy achieved by the iterates.

There is already a literature on the use of inexact matrix-vector products in GMRES and other Krylov subspace methods; see, e.g., [19, 6, 3, 7, 8] and the references therein. This work is not a simple extension of such results. To illustrate, when performing inexact matrix-vector products in GMRES, one obtains an inexact Arnoldi relation

$$AV_k + E_k = V_{k+1}H_k,\qquad V_k^TV_k = I.\tag{1}$$

On the other hand, if only inexact inner products are performed, the Arnoldi relation continues to hold exactly, but the orthogonality of the Arnoldi vectors is lost:

$$AV_k = V_{k+1}H_k,\qquad V_k^TV_k = I - F_k.\tag{2}$$

Thus, to understand the convergence behaviour and maximum attainable accuracy of GMRES implemented in variable precision arithmetic, it is essential to understand the resulting loss of orthogonality in the Arnoldi vectors. We adapt techniques used in the rounding-error analysis of the Modified Gram-Schmidt (MGS) algorithm (see, e.g., [1, 2]) and of the MGS-GMRES algorithm (see [5, 9, 14]). We also introduce some new analysis techniques. For example, we show that (2) is equivalent to an exact Arnoldi relation in a non-standard inner product, and we analyze the convergence of GMRES with variable precision arithmetic in terms of exact GMRES in this inner product. For more results relating to GMRES in non-standard inner products, see, e.g., [10, 18] and the references therein.

We focus on inexact inner products and matrix-vector products (as opposed to the saxpy operations also involved in the algorithm) because these are the two most time-consuming operations in parallel computations. The rest of the paper is organized as follows. We start with a brief discussion of GMRES in non-standard inner products in Section 2. Next, in Section 3, we analyze GMRES with inexact inner products. We then show how inexact matrix-vector products can be incorporated into this setting in Section 4. Some numerical illustrations are presented in Section 5.

## 2 GMRES in weighted inner products

Shown below is the Arnoldi algorithm, with $\langle\cdot,\cdot\rangle$ denoting the standard Euclidean inner product.

**Algorithm 1** (Arnoldi, modified Gram-Schmidt variant)

1. $\beta = \|b\|_2$
2. $v_1 = b/\beta$
3. **for** $j = 1, 2, \dots$ **do**
4. $\quad w_j = Av_j$
5. $\quad$ **for** $i = 1, \dots, j$ **do**
6. $\quad\quad h_{ij} = \langle v_i, w_j\rangle$
7. $\quad\quad w_j = w_j - h_{ij}v_i$
8. $\quad$ **end for**
9. $\quad h_{j+1,j} = \|w_j\|_2$, $\;v_{j+1} = w_j/h_{j+1,j}$
10. **end for**

After $k$ steps of the algorithm are performed in exact arithmetic, the output is $V_{k+1} = [v_1, \dots, v_{k+1}]$ and upper-Hessenberg $H_k \in \mathbb{R}^{(k+1)\times k}$ such that

$$v_1 = \frac{b}{\beta},\qquad AV_k = V_{k+1}H_k,\qquad V_k^TV_k = I_k.$$

The columns of $V_k$ form an orthonormal basis for the Krylov subspace $\mathcal{K}_k(A,b) = \mathrm{span}\{b, Ab, \dots, A^{k-1}b\}$. In GMRES, we restrict the iterate $x_k$ to this subspace: $x_k = V_ky_k$, where $y_k$ is the solution of

$$\min_y\|b - AV_ky\|_2 = \min_y\|V_{k+1}(\beta e_1 - H_ky)\|_2 = \min_y\|\beta e_1 - H_ky\|_2.$$

It follows that

$$x_k = V_ky_k = V_k(H_k^TH_k)^{-1}H_k^T(\beta e_1) = V_kH_k^\dagger(\beta e_1),\qquad r_k = b - Ax_k = V_{k+1}(\beta e_1 - H_ky_k) = V_{k+1}(I - H_kH_k^\dagger)\beta e_1.\tag{3}$$
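As a concrete illustration, the relations above can be sketched in a few lines of NumPy. This is our own toy code, not from the paper; the test matrix and right-hand side are arbitrary, and `arnoldi`/`gmres_iterate` are hypothetical helper names:

```python
import numpy as np

def arnoldi(A, b, k):
    """MGS Arnoldi: returns V (n x (k+1)), H ((k+1) x k) and beta = ||b||_2,
    satisfying A V[:, :k] = V @ H and, in exact arithmetic, V.T V = I."""
    n = b.size
    V = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))
    beta = np.linalg.norm(b)
    V[:, 0] = b / beta
    for j in range(k):
        w = A @ V[:, j]
        for i in range(j + 1):
            H[i, j] = V[:, i] @ w        # inner product h_{ij}
            w = w - H[i, j] * V[:, i]    # saxpy update
        H[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / H[j + 1, j]
    return V, H, beta

def gmres_iterate(A, b, k):
    """x_k = V_k H_k^+ (beta e1), as in (3)."""
    V, H, beta = arnoldi(A, b, k)
    rhs = np.zeros(k + 1)
    rhs[0] = beta
    y, *_ = np.linalg.lstsq(H, rhs, rcond=None)  # the small least squares subproblem
    return V[:, :k] @ y

rng = np.random.default_rng(0)
n = 30
A = rng.standard_normal((n, n)) + n * np.eye(n)  # well-conditioned test matrix
b = rng.standard_normal(n)
x = gmres_iterate(A, b, 20)
```

The least squares subproblem is solved here with `lstsq` for clarity; practical implementations update a QR factorization of $H_k$ with Givens rotations instead.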

Any given symmetric positive definite matrix $W \in \mathbb{R}^{n\times n}$ defines a weighted inner product $\langle u, v\rangle_W = u^TWv$ and associated norm $\|v\|_W = \sqrt{v^TWv}$. Suppose we use this inner product instead of the standard Euclidean inner product in the Arnoldi algorithm. We use tildes to denote the resulting computed quantities. After $k$ steps, the result is $\tilde V_{k+1}$ and upper-Hessenberg $\tilde H_k$ such that

$$\tilde v_1 = \frac{b}{\|b\|_W} = \frac{b}{\tilde\beta},\qquad A\tilde V_k = \tilde V_{k+1}\tilde H_k,\qquad \tilde V_k^TW\tilde V_k = I_k.$$

The columns of $\tilde V_k$ form a $W$-orthonormal basis for $\mathcal{K}_k(A,b)$. Let $\tilde x_k = \tilde V_k\tilde y_k$, where $\tilde y_k$ is the solution of

$$\min_y\|b - A\tilde V_ky\|_W = \min_y\|\tilde V_{k+1}(\tilde\beta e_1 - \tilde H_ky)\|_W = \min_y\|\tilde\beta e_1 - \tilde H_ky\|_2,$$

so that

$$\tilde x_k = \tilde V_k\tilde H_k^\dagger(\tilde\beta e_1),\qquad \tilde r_k = b - A\tilde x_k = \tilde V_{k+1}(I - \tilde H_k\tilde H_k^\dagger)\tilde\beta e_1.$$

We denote the above algorithm $W$-GMRES.

The following lemma shows that if $\kappa_2(W)$ is small, the Euclidean norm of the residual vector in $W$-GMRES converges at essentially the same rate as in standard GMRES. The result is known; we include a proof for completeness.

###### Lemma 1.

Let $x_k$ and $\tilde x_k$ denote the iterates computed by standard GMRES and $W$-GMRES, respectively, with corresponding residual vectors $r_k$ and $\tilde r_k$. Then

$$1 \le \frac{\|\tilde r_k\|_2}{\|r_k\|_2} \le \sqrt{\kappa_2(W)}.\tag{4}$$
###### Proof.

Both $x_k$ and $\tilde x_k$ lie in the same Krylov subspace, $\mathcal{K}_k(A,b)$. Because $x_k$ is chosen to minimize the Euclidean norm of the residual over $\mathcal{K}_k(A,b)$, while $\tilde x_k$ minimizes the $W$-norm of the residual over the same subspace,

$$\|r_k\|_2 \le \|\tilde r_k\|_2,\qquad \|\tilde r_k\|_W \le \|r_k\|_W.$$

Since for any nonzero $z \in \mathbb{R}^n$,

$$\sigma_{\min}(W) \le \frac{z^TWz}{z^Tz} = \frac{\|z\|_W^2}{\|z\|_2^2} \le \sigma_{\max}(W),$$

we have

$$\|r_k\|_2 \le \|\tilde r_k\|_2 \le \frac{\|\tilde r_k\|_W}{\sqrt{\sigma_{\min}(W)}} \le \frac{\|r_k\|_W}{\sqrt{\sigma_{\min}(W)}} \le \frac{\sqrt{\sigma_{\max}(W)}}{\sqrt{\sigma_{\min}(W)}}\|r_k\|_2,$$

from which (4) follows. ∎
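Lemma 1 is easy to check numerically. The sketch below (our own toy experiment, not from the paper; sizes, seed, and the particular SPD weight $W$ are arbitrary assumptions) runs Arnoldi in both the Euclidean and a $W$-inner product, forms the two GMRES residuals, and verifies the bound (4):

```python
import numpy as np

def w_arnoldi(A, b, k, W):
    """Arnoldi in the inner product <u, v>_W = u^T W v (W = I gives standard MGS)."""
    n = b.size
    V = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))
    beta = np.sqrt(b @ W @ b)
    V[:, 0] = b / beta
    for j in range(k):
        w = A @ V[:, j]
        for i in range(j + 1):
            H[i, j] = V[:, i] @ W @ w
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.sqrt(w @ W @ w)
        V[:, j + 1] = w / H[j + 1, j]
    return V, H, beta

def gmres_residual(A, b, k, W):
    """Residual b - A x_k of (W-)GMRES after k steps."""
    V, H, beta = w_arnoldi(A, b, k, W)
    rhs = np.zeros(k + 1)
    rhs[0] = beta
    y, *_ = np.linalg.lstsq(H, rhs, rcond=None)
    return b - A @ (V[:, :k] @ y)

rng = np.random.default_rng(1)
n, k = 40, 6
A = rng.standard_normal((n, n)) + n * np.eye(n)
b = rng.standard_normal(n)
C = rng.standard_normal((n, n))
W = C @ C.T + n * np.eye(n)              # symmetric positive definite weight

r = gmres_residual(A, b, k, np.eye(n))   # standard GMRES residual
r_w = gmres_residual(A, b, k, W)         # W-GMRES residual
ratio = np.linalg.norm(r_w) / np.linalg.norm(r)
```

The ratio lands in the interval $[1, \sqrt{\kappa_2(W)}]$, as the lemma predicts.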

## 3 GMRES with inexact inner products

### 3.1 Recovering orthogonality

We will show that the standard GMRES algorithm implemented with inexact inner products is equivalent to $W$-GMRES implemented exactly, for some well-conditioned matrix $W$. To this end, we need the following theorem.

###### Theorem 1.

Consider a given matrix $Q \in \mathbb{R}^{n\times k}$ of rank $k$ such that

$$Q^TQ = I_k - F.\tag{5}$$

If $\|F\|_2 \le \delta < 1$ for some $\delta$, then there exists a matrix $M \in \mathbb{R}^{n\times n}$ such that $I_n + M$ is symmetric positive definite and

$$Q^T(I_n + M)Q = I_k.\tag{6}$$

In other words, the columns of $Q$ are exactly orthonormal in an inner product defined by $I_n + M$. Furthermore,

$$\kappa_2(I_n + M) \le \frac{1+\delta}{1-\delta}.\tag{7}$$
###### Proof.

Note from (5) that the singular values of $Q$ satisfy

$$(\sigma_i(Q))^2 = \sigma_i(Q^TQ) = \sigma_i(I_k - F),\qquad i = 1,\dots,k.$$

Therefore,

$$\sqrt{1 - \|F\|_2} \le \sigma_i(Q) \le \sqrt{1 + \|F\|_2},\qquad i = 1,\dots,k.\tag{8}$$

Equation (6) is equivalent to the linear matrix equation

$$Q^TMQ = I_k - Q^TQ.$$

It is straightforward to verify that one matrix $M$ satisfying this equation is

$$M = (Q^\dagger)^T(I_k - Q^TQ)Q^\dagger = Q(Q^TQ)^{-1}(I_k - Q^TQ)(Q^TQ)^{-1}Q^T.$$

Notice that the above matrix $M$ is symmetric. It can also be verified using the singular value decomposition of $Q$ that the eigenvalues of $I_n + M$ are

$$\frac{1}{(\sigma_i(Q))^2},\quad i = 1,\dots,k,\qquad\text{together with the eigenvalue } 1 \text{ repeated } n-k \text{ times},$$

which implies that the matrix $I_n + M$ is positive definite. From the above and (8), provided $\|F\|_2 \le \delta < 1$,

$$\frac{1}{1+\delta} \le \frac{1}{(\sigma_{\max}(Q))^2} \le \sigma_i(I_n + M) \le \frac{1}{(\sigma_{\min}(Q))^2} \le \frac{1}{1-\delta},\tag{9}$$

from which (7) follows. ∎

Note that $\kappa_2(I_n + M)$ remains small even for values of $\|F\|_2$ close to $1$. For example, suppose $\|F\|_2 = 0.9$, indicating an extremely severe loss of orthogonality. Then $\kappa_2(I_n + M) \le 19$, so $Q$ still has exactly orthonormal columns in an inner product defined by a very well-conditioned matrix.
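The construction in the proof is directly computable. The following sketch (illustrative only; the sizes, seed, and $\delta$ are our arbitrary choices) builds a $Q$ with a prescribed loss of orthogonality, forms $M = (Q^\dagger)^T(I_k - Q^TQ)Q^\dagger$, and checks (6) and (7):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, delta = 20, 8, 0.8
# Build Q with singular values in [sqrt(1-delta), sqrt(1+delta)],
# so that Q^T Q = I_k - F with ||F||_2 <= delta < 1.
U = np.linalg.qr(rng.standard_normal((n, k)))[0]
Vt = np.linalg.qr(rng.standard_normal((k, k)))[0]
s = np.linspace(np.sqrt(1 - delta), np.sqrt(1 + delta), k)
Q = U @ np.diag(s) @ Vt

G = Q.T @ Q                                    # Gram matrix, equals I_k - F
Ginv = np.linalg.inv(G)
M = Q @ Ginv @ (np.eye(k) - G) @ Ginv @ Q.T    # the M of Theorem 1

IM = np.eye(n) + M                             # the recovered inner-product matrix
```

The eigenvalues of `IM` are $1/(\sigma_i(Q))^2$ plus the eigenvalue $1$, so its condition number meets the bound $(1+\delta)/(1-\delta)$ exactly for this construction.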

###### Remark 1.

Paige and his coauthors [2, 13, 17] have developed an alternative measure of loss of orthogonality. Given $V_k$ with normalized columns, the measure is $\|S_k\|_2$, where $S_k = (I_k + U_k)^{-1}U_k$ and $U_k$ is the strictly upper-triangular part of $V_k^TV_k$. Additionally, orthogonality can be recovered by augmentation: the matrix $\begin{bmatrix} S_k \\ V_k(I_k - S_k)\end{bmatrix}$ has orthonormal columns. This measure was used in the groundbreaking rounding error analysis of the MGS-GMRES algorithm. In the present paper, under the condition $\|F_k\|_2 \le \delta < 1$, we use the measure $\|F_k\|_2$ and recover orthogonality in the $(I_n + M)$-inner product. For future reference, Paige's approach is likely to be the most appropriate for analyzing the Lanczos and conjugate gradient algorithms, in which orthogonality is quickly lost, long before convergence.

### 3.2 Bounding the loss of orthogonality

Suppose the inner products in the Arnoldi algorithm are computed inexactly, i.e., line 6 in Algorithm 1 is replaced by

$$h_{ij} = v_i^Tw_j + \eta_{ij},\tag{10}$$

with $|\eta_{ij}|$ bounded by some tolerance. We use the same symbols to denote the resulting computed quantities. It is straightforward to show that despite the inexact inner products in (10), the Arnoldi relation continues to hold exactly (under the assumption that all other operations besides the inner products are performed exactly). On the other hand, the orthogonality of the Arnoldi vectors is lost. We have

$$[b, AV_k] = V_{k+1}[\beta e_1, H_k],\qquad V_{k+1}^TV_{k+1} = I_{k+1} + F_k.\tag{11}$$

The relation between each $\eta_{ij}$ and the overall loss of orthogonality is very difficult to understand. To simplify the analysis we suppose that each Arnoldi vector is normalized exactly. (This is not an uncommon assumption in the literature.) Under this simplification, we have

$$F_k = \bar U_k + \bar U_k^T,\qquad \bar U_k = \begin{bmatrix} 0_{k\times 1} & U_k \\ 0_{1\times 1} & 0_{1\times k}\end{bmatrix},\qquad U_k = \begin{bmatrix} v_1^Tv_2 & \dots & v_1^Tv_{k+1} \\ & \ddots & \vdots \\ & & v_k^Tv_{k+1}\end{bmatrix},\tag{12}$$

i.e., $\bar U_k$ contains the strictly upper-triangular part of $V_{k+1}^TV_{k+1}$. Define

$$N_k = \begin{bmatrix} \eta_{11} & \dots & \eta_{1k} \\ & \ddots & \vdots \\ & & \eta_{kk}\end{bmatrix},\qquad R_k = \begin{bmatrix} h_{21} & \dots & h_{2k} \\ & \ddots & \vdots \\ & & h_{k+1,k}\end{bmatrix}.\tag{13}$$

Following Björck's seminal rounding error analysis of MGS, it can be shown that

$$N_k = -[0, U_k]H_k = -U_kR_k.\tag{14}$$

For completeness, a proof of (14) is provided in the appendix. Note that, assuming GMRES has not terminated by step $k$, i.e., $h_{j+1,j} \neq 0$ for $j = 1,\dots,k$, the matrix $R_k$ must be invertible. Using (14), the following theorem shows how the convergence of GMRES with inexact inner products relates to that of exact GMRES. The idea is similar to [14, Section 5], in which a quantity of the form $\|E_kR_k^{-1}\|_2$ must be bounded, where $E_k$ is a matrix containing rounding errors.
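Relation (14) can be verified numerically. The sketch below (a toy experiment of ours, not from the paper) runs MGS Arnoldi in double precision, perturbs every inner product by a recorded $\eta_{ij}$ much larger than the unit roundoff, and compares $N_k$ with $-U_kR_k$; normalization is done in floating point, which approximates the exact-normalization assumption to $O(u)$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, tol = 50, 12, 1e-6
A = rng.standard_normal((n, n)) + n * np.eye(n)
b = rng.standard_normal(n)

V = np.zeros((n, k + 1))
H = np.zeros((k + 1, k))
N = np.zeros((k, k))                   # the recorded eta_ij, as in (13)
V[:, 0] = b / np.linalg.norm(b)
for j in range(k):
    w = A @ V[:, j]
    for i in range(j + 1):
        eta = tol * rng.uniform(-1, 1)
        N[i, j] = eta
        H[i, j] = V[:, i] @ w + eta    # inexact inner product, as in (10)
        w = w - H[i, j] * V[:, i]
    H[j + 1, j] = np.linalg.norm(w)
    V[:, j + 1] = w / H[j + 1, j]

G = V.T @ V
U = np.triu(G, 1)[:k, 1:]              # U_k: (U_k)_{ij} = v_i^T v_{j+1} for i <= j
R = H[1:, :]                           # R_k: rows 2..k+1 of H_k
```

With `tol = 1e-6`, the entries of `N + U @ R` are at rounding-error level, many orders of magnitude below `tol`, confirming that the injected inner-product errors fully account for the loss of orthogonality.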

###### Theorem 2.

Let $x_k^{(e)}$ denote the $k$-th iterate of standard GMRES, performed exactly, with residual vector $r_k^{(e)}$. Now suppose that the Arnoldi algorithm is run with inexact inner products as in (10), so that (11)–(14) hold, and let $x_k$ and $r_k$ denote the resulting GMRES iterate and residual vector. If

$$\|N_kR_k^{-1}\|_2 \le \frac{\delta}{2}\tag{15}$$

for some $\delta < 1$, then

$$1 \le \frac{\|r_k\|_2}{\|r_k^{(e)}\|_2} \le \sqrt{\frac{1+\delta}{1-\delta}}.\tag{16}$$
###### Proof.

Consider the matrix $F_k$ in (11). From (12) and (14), we have

$$\|F_k\|_2 \le 2\|U_k\|_2 = 2\|N_kR_k^{-1}\|_2.\tag{17}$$

Thus, if (15) holds, $\|F_k\|_2 \le \delta$, and we can apply Theorem 1 with $Q = V_{k+1}$. There exists a symmetric positive definite matrix $W = I_n + M$ such that

$$[b, AV_k] = V_{k+1}[\beta e_1, H_k],\qquad V_{k+1}^TWV_{k+1} = I_{k+1},\qquad \kappa_2(W) \le \frac{1+\delta}{1-\delta}.$$

The Arnoldi algorithm implemented with inexact inner products has computed a $W$-orthonormal basis for $\mathcal{K}_{k+1}(A,b)$. The iterate $x_k$ is the same as the iterate that would have been obtained by running $W$-GMRES exactly. The result follows from Lemma 1. ∎

### 3.3 A strategy for bounding the $\eta_{ij}$

The challenge in applying Theorem 2 is bounding the tolerances $\eta_j$ at step $j$ to ensure that (15) holds for all subsequent iterations $k \ge j$. Theorem 3 below leads to a practical strategy for bounding the $\eta_{ij}$. We will use

$$t_k = \beta e_1 - H_ky_k$$

to denote the residual computed in the GMRES subproblem at step $k$. We will use the known fact that for $j \le k$,

$$|e_j^Ty_k| \le \|H_k^\dagger\|_2\|t_{j-1}\|_2.\tag{18}$$

This follows from

$$e_j^TH_k^\dagger\underbrace{\begin{bmatrix} H_{j-1} & 0 \\ 0 & 0 \end{bmatrix}}_{\in\,\mathbb{R}^{(k+1)\times k}}\begin{bmatrix} y_{j-1} \\ 0_{k-j+1} \end{bmatrix} = e_j^TH_k^\dagger H_k\begin{bmatrix} y_{j-1} \\ 0 \end{bmatrix} = e_j^T\begin{bmatrix} y_{j-1} \\ 0 \end{bmatrix} = 0,$$

and thus

$$|e_j^Ty_k| = |e_j^TH_k^\dagger\beta e_1| = \left|e_j^TH_k^\dagger\begin{bmatrix}\beta e_1 - H_{j-1}y_{j-1} \\ 0\end{bmatrix}\right| \le \|H_k^\dagger\|_2\|t_{j-1}\|_2.$$

Additionally, in order to understand how large the tolerances can be allowed to grow as the residual norm decreases, we will need the following rather technical lemma, which is essentially a special case of [15, Theorem 4.1]. We defer its proof to the appendix.

###### Lemma 2.

Let $y_k$ and $t_k$ be the least squares solution and residual vector of

$$\min_y\|\beta e_1 - H_ky\|_2.$$

Given $\epsilon > 0$, let $D_k$ be any nonsingular matrix such that

$$\|D_k\|_2 \le \frac{\sigma_{\min}(H_k)\,\epsilon\,\|b\|_2}{\sqrt{2}\,\|t_k\|_2}.\tag{19}$$

Then

$$\frac{\|t_k\|_2}{\left(\epsilon^2\|b\|_2^2 + 2\|D_ky_k\|_2^2\right)^{1/2}} \le \sigma_{\min}\left([\epsilon^{-1}e_1,\ H_kD_k^{-1}]\right) \le \frac{\|t_k\|_2}{\epsilon\|b\|_2}.\tag{20}$$
###### Theorem 3.

In the notation of Theorem 2 and Lemma 2, if at all steps $j \le k$ of GMRES all inner products are performed inexactly as in (10) with tolerances bounded by

$$|\eta_{ij}| \le \eta_j \equiv \frac{\phi_j\,\epsilon\,\sigma_{\min}(H_k)\,\|b\|_2}{\sqrt{2}\,\|t_{j-1}\|_2}\tag{21}$$

for any $\epsilon > 0$ and any positive numbers $\phi_j$ such that $\sum_{j=1}^k\phi_j^2 = 1$, then at step $k$ either (16) holds with $\delta = \sqrt{2}/3$, or

$$\frac{\|t_k\|_2}{\|b\|_2} \le 6k\epsilon,\tag{22}$$

implying that GMRES has converged to a relative residual of $6k\epsilon$.

###### Proof.

If (21) holds, then in (13)

$$|N_k| \le \begin{bmatrix} \eta_1 & \eta_2 & \dots & \eta_k \\ & \eta_2 & \dots & \eta_k \\ & & \ddots & \vdots \\ & & & \eta_k \end{bmatrix} = E_kD_k,$$

where $E_k$ is an upper-triangular matrix containing only ones in its upper-triangular part, so that $\|E_k\|_2 \le k$, and $D_k = \mathrm{diag}(\eta_1,\dots,\eta_k)$. Then in (15),

$$\|N_kR_k^{-1}\|_2 \le \|N_kD_k^{-1}\|_2\,\|D_kR_k^{-1}\|_2 \le \|E_k\|_2\,\|D_kR_k^{-1}\|_2 \le k\,\|(R_kD_k^{-1})^{-1}\|_2.\tag{23}$$

Let $h_k^T$ denote the first row of $H_k$, so that $H_k = \begin{bmatrix} h_k^T \\ R_k \end{bmatrix}$. For any $\epsilon > 0$ we have

$$\sigma_{\min}(R_kD_k^{-1}) = \min_{\|u\|_2=1}\left\|\begin{bmatrix} 0 \\ (R_kD_k^{-1})^Tu \end{bmatrix}\right\|_2 = \min_{\|u\|_2=1}\left\|\begin{bmatrix}\epsilon^{-1} & h_k^TD_k^{-1} \\ 0 & R_kD_k^{-1}\end{bmatrix}^T\begin{bmatrix} 0 \\ u \end{bmatrix}\right\|_2 \ge \min_{\|w\|_2=1}\left\|\begin{bmatrix}\epsilon^{-1} & h_k^TD_k^{-1} \\ 0 & R_kD_k^{-1}\end{bmatrix}^Tw\right\|_2 = \sigma_{\min}\left([\epsilon^{-1}e_1,\ H_kD_k^{-1}]\right).$$

Therefore,

$$\|(R_kD_k^{-1})^{-1}\|_2 = \frac{1}{\sigma_{\min}(R_kD_k^{-1})} \le \frac{1}{\sigma_{\min}([\epsilon^{-1}e_1,\ H_kD_k^{-1}])}.$$

Notice that if the $\eta_j$ are chosen as in (21), then $D_k = \mathrm{diag}(\eta_1,\dots,\eta_k)$ automatically satisfies (19). Using the lower bound in Lemma 2, and then (18) and (21), we obtain

$$\|(R_kD_k^{-1})^{-1}\|_2 \le \frac{\left(\epsilon^2\|b\|_2^2 + 2\|D_ky_k\|_2^2\right)^{1/2}}{\|t_k\|_2} = \frac{\left(\epsilon^2\|b\|_2^2 + 2\sum_{j=1}^k\eta_j^2(e_j^Ty_k)^2\right)^{1/2}}{\|t_k\|_2} \le \frac{\left(\epsilon^2\|b\|_2^2 + \sum_{j=1}^k\phi_j^2\epsilon^2\|b\|_2^2\right)^{1/2}}{\|t_k\|_2} = \frac{\sqrt{2}\,\epsilon\,\|b\|_2}{\|t_k\|_2}.$$

Therefore, in (23),

$$\|N_kR_k^{-1}\|_2 \le \frac{\sqrt{2}\,k\,\epsilon\,\|b\|_2}{\|t_k\|_2} = \frac{6k\epsilon\|b\|_2}{\|t_k\|_2}\cdot\frac{\delta}{2}$$

with $\delta = \sqrt{2}/3$. If (22) does not hold, then $\|N_kR_k^{-1}\|_2 < \delta/2$, which from Theorem 2 implies (16). Therefore, if the $\eta_{ij}$ are bounded by tolerances chosen as in (21), either (16) holds with $\delta = \sqrt{2}/3$, or (22) holds. ∎

Theorem 3 can be interpreted as follows. If at all steps of GMRES the inner products are computed inexactly with tolerances bounded as in (21), then convergence at the same rate as exact GMRES is achieved until a relative residual of essentially $6k\epsilon$ is reached. Notice that $\eta_j$ is inversely proportional to the residual norm $\|t_{j-1}\|_2$. This allows the inner products to be computed more and more inaccurately as the iterations proceed.

If no more than $m$ iterations are to be performed, we can let $\phi_j = 1/\sqrt{m}$ (although more elaborate choices for the $\phi_j$ could be considered). Then the factor $1/\sqrt{m}$ in (21) can be absorbed into $\epsilon$, along with the factor $6k$ in (22).

One important difficulty with (21) is that $\eta_j$ is required to pick the tolerance at the start of step $j$, but $\sigma_{\min}(H_k)$ is not available until the final step $k$. A similar problem occurs in GMRES with inexact matrix-vector products; see [19, 6] and the comments in Section 4. In our experience, it is often possible to replace the factor $\phi_j\sigma_{\min}(H_k)/\sqrt{2}$ in (21) by $1$, without significantly affecting the convergence of GMRES. This leads to the following:

$$\text{Aggressive threshold:}\qquad \eta_j = \frac{\epsilon\,\|b\|_2}{\|t_{j-1}\|_2},\qquad j = 1, 2, \dots.\tag{24}$$

In exact arithmetic, $\sigma_{\min}(H_k)$ is bounded below by $\sigma_{\min}(A)$. If the smallest singular value of $A$ is known, one can use it in place of $\sigma_{\min}(H_k)$ in (21), leading to the following:

$$\text{Conservative threshold:}\qquad \eta_j = \frac{\epsilon\,\sigma_{\min}(A)\,\|b\|_2}{\|t_{j-1}\|_2},\qquad j = 1, 2, \dots.\tag{25}$$

This prevents potential early stagnation of the residual norm, but is often unnecessarily stringent. (It goes without saying that if the conservative threshold is less than the unit roundoff $u$, then the criterion is vacuous: according to this criterion, no inexact inner products can be carried out at iteration $j$.) Numerical examples are given in Section 5.
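In code, the two thresholds are one-liners. The sketch below (function names are ours, not from the paper) computes $\eta_j$ from the norm of the previous subproblem residual; note that both thresholds grow as $\|t_{j-1}\|_2$ shrinks, so later iterations tolerate larger inner-product errors:

```python
import numpy as np

def aggressive_threshold(eps, b_norm, t_prev_norm):
    """Aggressive choice (24): eta_j = eps * ||b||_2 / ||t_{j-1}||_2."""
    return eps * b_norm / t_prev_norm

def conservative_threshold(eps, sigma_min_A, b_norm, t_prev_norm):
    """Conservative choice (25): eta_j = eps * sigma_min(A) * ||b||_2 / ||t_{j-1}||_2."""
    return eps * sigma_min_A * b_norm / t_prev_norm

# As the relative subproblem residual drops from 1e-2 to 1e-8, the aggressive
# tolerance rises correspondingly (here eps = 1e-10 and ||b||_2 = 1).
etas = [aggressive_threshold(1e-10, 1.0, t) for t in (1e-2, 1e-5, 1e-8)]
```

For $\sigma_{\min}(A) < 1$, the conservative threshold is smaller than the aggressive one at every step, which is exactly the stringency discussed above.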

## 4 Incorporating inexact matrix-vector products

As mentioned in the introduction, there is already a literature on the use of inexact matrix-vector products in GMRES. These results are obtained by assuming that the Arnoldi vectors are orthonormal and analyzing the inexact Arnoldi relation

$$AV_k + E_k = V_{k+1}H_k,\qquad V_k^TV_k = I.$$

In practice, however, the computed Arnoldi vectors are very far from being orthonormal, even when all computations are performed in double precision arithmetic; see for example [5, 9, 14].

The purpose of this section is to show that the framework used in [19, 6] to analyze inexact matrix-vector products in GMRES is still valid when the orthogonality of the Arnoldi vectors is lost, i.e., under the inexact Arnoldi relation

$$AV_k + E_k = V_{k+1}H_k,\qquad V_k^TV_k = I - F_k.\tag{26}$$

This settles a question left open in [19, Section 6].

Throughout we assume that $\|F_k\|_2 \le \delta < 1$. Then from Theorem 1 there exists a symmetric positive definite matrix $W = I_n + M$ such that $V_k^TWV_k = I_k$, with the singular values of $W$ bounded as in (9).

### 4.1 Bounding the residual gap

As in previous sections, we use $x_k = V_ky_k$ to denote the computed GMRES iterate, with $r_k = b - Ax_k$ the actual residual vector and $V_{k+1}t_k$ the residual vector updated in the GMRES iterations. From

$$\|r_k\|_2 \le \|r_k - V_{k+1}t_k\|_2 + \|V_{k+1}t_k\|_2,$$

if

$$\max\{\|r_k - V_{k+1}t_k\|_2,\ \|V_{k+1}t_k\|_2\} \le \frac{\epsilon}{2}\|b\|_2,\tag{27}$$

then

$$\|r_k\|_2 \le \epsilon\|b\|_2.\tag{28}$$

From the fact that the columns of $W^{1/2}V_{k+1}$ are orthonormal, as well as (9), we obtain

$$\|V_{k+1}t_k\|_2 \le \|W^{-1/2}\|_2\,\|W^{1/2}V_{k+1}t_k\|_2 = \|W^{-1/2}\|_2\,\|t_k\|_2 \le \sqrt{1+\delta}\,\|t_k\|_2.$$

In GMRES, $\|t_k\|_2$ decreases with increasing $k$, which implies that $\|V_{k+1}t_k\|_2$ eventually satisfies the bound in (27) as well. Therefore, we focus on bounding the residual gap $\|r_k - V_{k+1}t_k\|_2$ in order to satisfy (27) and (28).

Suppose the matrix-vector products in the Arnoldi algorithm are computed inexactly, i.e., line 4 in Algorithm 1 is replaced by

$$w_j = (A + E_j)v_j,\tag{29}$$

where $\|E_j\|_2 \le \epsilon_j$ for some given tolerance $\epsilon_j$. Then in (26),

$$E_k = [E_1v_1, E_2v_2, \dots, E_kv_k].\tag{30}$$

The following theorem bounds the residual gap at step $k$ in terms of the tolerances $\epsilon_j$ and the subproblem residual norms $\|t_{j-1}\|_2$, for $j = 1,\dots,k$.

###### Theorem 4.

Suppose that the inexact Arnoldi relation (26) holds, where $E_k$ is given in (30) with $\|E_j\|_2 \le \epsilon_j$ for $j = 1,\dots,k$, and $\|v_j\|_2 = 1$. Then the resulting residual gap satisfies

$$\|r_k - V_{k+1}t_k\|_2 \le \|H_k^\dagger\|_2\sum_{j=1}^k\epsilon_j\|t_{j-1}\|_2.\tag{31}$$
###### Proof.

From (26) and (30),

$$\|r_k - V_{k+1}t_k\|_2 = \|b - AV_ky_k - V_{k+1}t_k\|_2 = \|b - V_{k+1}H_ky_k + E_ky_k - V_{k+1}t_k\|_2 = \|V_{k+1}(\beta e_1 - H_ky_k) + E_ky_k - V_{k+1}t_k\|_2 = \|E_ky_k\|_2 = \Big\|\sum_{j=1}^kE_jv_j\,e_j^Ty_k\Big\|_2 \le \sum_{j=1}^k\epsilon_j|e_j^Ty_k|.$$

The result then follows from (18). ∎

### 4.2 A strategy for picking the $\epsilon_j$

Theorem 4 suggests the following strategy for picking the tolerances $\epsilon_j$ that bound the level of inexactness in the matrix-vector products in (29). Similarly to Theorem 3, let $\phi_j$ be any positive numbers such that $\sum_{j=1}^k\phi_j = 1$. If for all steps $j \le k$,

$$\epsilon_j \le \frac{\phi_j\,\epsilon\,\sigma_{\min}(H_k)\,\|b\|_2}{2\,\|t_{j-1}\|_2},\tag{32}$$

then from (31) the residual gap in (27) satisfies

$$\|r_k - V_{k+1}t_k\|_2 \le \frac{\epsilon}{2}\|b\|_2.$$

Interestingly, this result is independent of the loss of orthogonality in the Arnoldi vectors. Similarly to (21), the criterion (32) for picking $\epsilon_j$ at step $j$ involves $\sigma_{\min}(H_k)$, which is only available at the final step $k$. A large number of numerical experiments [6, 3] indicate that this quantity can often be replaced by a constant of order one. Absorbing the factor $\phi_j/2$ into $\epsilon$ in (32) and replacing $\sigma_{\min}(H_k)$ by $1$ or by $\sigma_{\min}(A)$ leads, respectively, to the same aggressive and conservative thresholds for $\epsilon_j$ as we obtained for $\eta_j$ in (24) and (25). This suggests that matrix-vector products and inner products in GMRES can be computed with the same level of inexactness. We illustrate this with numerical examples in the next section.

## 5 Numerical examples

We illustrate our results with a few numerical examples. We run GMRES with different matrices $A$ and right-hand sides $b$, and compute the inner products and matrix-vector products inexactly as in (10) and (29). We pick each $\eta_{ij}$ randomly, uniformly distributed between $-\eta_j$ and $\eta_j$, and pick each $E_j$ to be a matrix of independent standard normal random variables, scaled to have norm $\epsilon_j$. Thus we have

$$|\eta_{ij}| \le \eta_j,\qquad \|E_j\|_2 \le \epsilon_j,$$

for chosen tolerances $\eta_j$ and $\epsilon_j$. Throughout we use the same level of inexactness for inner products and matrix-vector products, i.e., $\eta_j = \epsilon_j$.
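The perturbations just described can be generated as follows. This is a sketch of the experimental setup under our own naming; the helper functions are illustrative, not from the paper:

```python
import numpy as np

def inexact_dot(u, v, eta_j, rng):
    """Inner product u^T v perturbed by eta_ij ~ Uniform[-eta_j, eta_j], as in (10)."""
    return u @ v + rng.uniform(-eta_j, eta_j)

def inexact_matvec(A, v, eps_j, rng):
    """Matrix-vector product (A + E_j) v, where E_j is a standard normal
    random matrix rescaled so that ||E_j||_2 = eps_j, as in (29)."""
    n = A.shape[0]
    G = rng.standard_normal((n, n))
    E = (eps_j / np.linalg.norm(G, 2)) * G   # rescale to spectral norm eps_j
    return A @ v + E @ v

rng = np.random.default_rng(3)
n = 50
A = rng.standard_normal((n, n))
v = rng.standard_normal(n)
v /= np.linalg.norm(v)                       # Arnoldi vectors have unit norm
w = inexact_matvec(A, v, 1e-6, rng)
```

Rescaling by the spectral norm requires an SVD of the random matrix, which is acceptable for these small illustrative problems.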

In our first example, $A$ is a Grcar matrix, a highly non-normal Toeplitz matrix. Results are shown in Figure 1. The solid green curve is the relative residual $\|r_k\|_2/\|b\|_2$. For reference, the dashed blue curve is the relative residual when GMRES is run entirely in double precision. The full magenta curve corresponds to the loss of orthogonality $\|F_k\|_2$ in (11). The black dotted curve is the chosen tolerance $\eta_j = \epsilon_j$.

In Example 1(a),

$$\eta_j = \epsilon_j = \begin{cases} 10^{-8}\|A\|_2, & 20 \le j \le 30,\\ 10^{-4}\|A\|_2, & 40 \le j \le 50,\\ 2^{-52}\|A\|_2, & \text{otherwise.}\end{cases}$$

The large increase in the inexactness of the inner products at iterations $20$–$30$ and $40$–$50$ immediately leads to a large increase in $\|F_k\|_2$. This clearly illustrates the connection between the inexactness of the inner products and the loss of orthogonality in the Arnoldi vectors. As proven in Theorem 2, while $\|F_k\|_2$ remains bounded away from $1$, the residual norm is the same as it would have been had all computations been performed in double precision. Due to its large increases at iterations $20$–$30$ and $40$–$50$, however, $\|F_k\|_2$ approaches $1$, and the residual norm starts to stagnate, long before a backward stable solution is obtained.

In Example 1(b), the tolerances are chosen according to the aggressive criterion (24). With this judicious choice, $\|F_k\|_2$ does not approach $1$, and the residual norm does not stagnate, until a backward stable solution is obtained.

In our second example, $A$ is a more ill-conditioned matrix from the SuiteSparse matrix collection. The right-hand side is chosen as in the first example.

Results are shown in Figure 2. In Example 2(a), the tolerances are chosen according to the aggressive threshold (24). In this more ill-conditioned problem, the residual norm starts to stagnate before a backward stable solution is obtained. In Example 2(b), the tolerances are chosen according to the conservative threshold (25), and there is no longer any such stagnation. Because of these lower tolerances, the inner products and matrix-vector products have to be performed in double precision until about iteration 200. This example illustrates the tradeoff between the level of inexactness and the maximum attainable accuracy: if one requires a backward stable solution, the more ill-conditioned the matrix is, the less opportunity there is for performing floating-point operations inexactly in GMRES.

## 6 Conclusion

We have shown how inner products can be performed inexactly in MGS-GMRES without affecting the convergence or final achievable accuracy of the algorithm. We have also shown that a known framework for inexact matrix-vector products is still valid despite the loss of orthogonality in the Arnoldi vectors. It would be interesting to investigate the impact of scaling or preconditioning on these results. Additionally, in future work, we plan to address the question of how much computational savings can be achieved by this approach on massively parallel computer architectures.

## Appendix A Proof of (14)

In line 7 of Algorithm 1, in the $\ell$th pass of the inner loop at step $j$, we have

$$w_j^{(\ell)} = w_j^{(\ell-1)} - h_{\ell j}v_\ell\tag{33}$$

for $\ell = 1,\dots,j$, with $w_j^{(0)} = Av_j$. Writing this equation for $\ell = i+1$ to $j$, we have

$$w_j^{(i+1)}$$