# Fast Accurate Randomized Algorithms for Linear Systems and Eigenvalue Problems

This paper develops a new class of algorithms for general linear systems and eigenvalue problems. These algorithms apply fast randomized sketching to accelerate subspace projection methods, such as GMRES and Rayleigh–Ritz. This approach offers great flexibility in designing the basis for the approximation subspace, which can improve scalability in many computational environments. The resulting algorithms outperform the classic methods with minimal loss of accuracy. For model problems, numerical experiments show large advantages over MATLAB's optimized routines, including a 100 × speedup over gmres and a 10 × speedup over eigs.

## Authors

• 20 publications
• 17 publications
02/08/2021

### Infinite GMRES for parameterized linear systems

We consider linear parameter-dependent systems A(μ) x(μ) = b for many di...
11/15/2020

### On Relaxed Filtered Krylov Subspace Method for Non-Symmetric Eigenvalue Problems

In this paper, by introducing a class of relaxed filtered Krylov subspac...
10/14/2021

### A Theory of Quantum Subspace Diagonalization

Quantum subspace diagonalization methods are an exciting new class of al...
08/27/2019

### An Eigenwise Parallel Augmented Subspace Method for Eigenvalue Problems

A type of parallel augmented subspace scheme for eigenvalue problems is ...
12/01/2020

### Subspace method for multiparameter-eigenvalue problems based on tensor-train representations

In this paper we solve m-parameter eigenvalue problems (mEPs), with m an...
02/12/2021

### Smoothed-adaptive perturbed inverse iteration for elliptic eigenvalue problems

We present a perturbed subspace iteration algorithm to approximate the l...
08/20/2020

### Efficient and Accurate Algorithms for Solving the Bethe-Salpeter Eigenvalue Problem for Crystalline Systems

Optical properties of materials related to light absorption and scatteri...
##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

Arguably, the most exciting recent development in numerical linear algebra (NLA) is the advent of new randomized algorithms that are fast, scalable, robust, and reliable. For example, many practitioners have adopted the “randomized SVD” and its relatives [20, 26]

to compute truncated singular value decompositions of large matrices. Randomized preconditioning

[36, 3] allows us to solve highly overdetermined least-squares problems faster than any classical algorithm.

In spite of these successes, we have made less progress on other core challenges from NLA, especially problems involving nonsymmetric square matrices. This paper exposes a new class of algorithms for solving general linear systems and eigenvalue problems. Our framework enhances subspace projection methods [37, 38], such as GMRES and the Rayleigh–Ritz process, with the modern technique of randomized sketching [41, 52, 26]. This approach allows us to accelerate the existing methods by incorporating approximation subspaces that are easier to construct. The resulting algorithms are faster than their classical counterparts, without much loss of accuracy. In retrospect, the marriage of these ideas appears foreordained.

### 1.1 Sketching a least-squares problem

The sketch-and-solve paradigm [41, 52, 26]

is a basic tool for randomized matrix computations. The idea is to decrease the dimension of a large problem by projecting it onto a random subspace and to solve the smaller problem instead. The solution of this “sketched problem” sometimes serves in place of the solution to the original computational problem.

For a typical example, consider the overdetermined least-squares problem

 minimizey∈Cd∥My−f∥2, (1)

where is a tall matrix with . Draw a random sketching matrix with sketch dimension , say. Then solve the sketched problem

 minimizey∈Cd∥S(My−f)∥2. (2)

For a carefully designed sketching matrix , the whole sketch-and-solve process may be significantly faster than solving Eq. 1 directly. See Section 2 for details.

We can compare the residual norms of the solution to the sketched problem Eq. 2 and the solution to the original problem Eq. 1. The sketching method ensures that

 ∥My⋆−f∥2≤∥M^y−f∥2≤Const⋅∥My⋆−f∥2. (3)

Provided that the original problem has a tiny residual, the solution to the sketched problem also yields a tiny residual!

### 1.2 Solving linear systems by sketched GMRES

Now, suppose that we wish to solve the (nonsymmetric) linear system

 Find x∈Cn : Ax=f% where A∈Cn×n and f∈Cn. (4)

All algorithms in this paper access the matrix via products: . Our approach builds on a classic template, called a subspace projection method [37], which casts the linear system as a variational problem. We can treat this formulation by sketching. Let us summarize the ideas; a full exposition appears in Sections 4 and 3.

#### 1.2.1 Sketched GMRES

For the moment, suppose that we have acquired a tall matrix

, called a basis, with the property that contains a good approximate solution to the linear system Eq. 4. That is, for some . In addition, assume we have the reduced matrix at hand.

At its heart, the GMRES algorithm [40, 39, 37] is a subspace projection method that replaces the linear system Eq. 4 with the overdetermined least-squares problem

 minimizey∈Cd∥ABy−f∥2. (5)

The solution to Eq. 5 yields an approximate solution to the linear system Eq. 4. The residual norm reflects how well the basis captures a solution to the linear system.

The least-squares formulation Eq. 5 is a natural candidate for sketching. Draw a sketching matrix with , say, and sketch the problem:

 minimizey∈Cd∥S(ABy−f)∥2. (6)

The solution of the sketched problem Eq. 6 induces an approximate solution to the linear system Eq. 4. According to Eq. 3, the residual norm is within a constant factor of the original residual norm . In summary, the sketched formulation Eq. 6 is effective if and only if the subspace contains a good solution of the linear system.

We refer to Eq. 6 as the sketched GMRES problem (sGMRES). For an unstructured basis , the sGMRES approach is faster than solving the original least-squares problem Eq. 5, both in theory and in practice. With careful implementation, sGMRES is reliable and robust, even when the conditioning of the reduced matrix is poor. Indeed, it suffices that where is the unit roundoff.111The unit roundoff in standard IEEE double-precision arithmetic. As a consequence, we have an enormous amount of flexibility in choosing the basis .

#### 1.2.2 Krylov subspaces

To make sGMRES work well, we must construct a subspace that captures an approximate solution to the linear system. To that end, consider a Krylov subspace of the form

 Kp(A,f):=span{f,Af,A2f,…,Ap−1f}. (7)

The Krylov subspace often contains an excellent approximate solution to the linear system, even when the depth . See [37, Chaps. 6 and 7].

For computations, we need an explicit basis for the Krylov subspace. Although it is straightforward to form the monomial basis visible in Eq. 7, the condition number may grow exponentially [7, 17], rendering the basis useless for numerical purposes.

Instead, we pursue other fast basis constructions that offer better conditioning. For example, we may assemble a basis using the Arnoldi process with -partial orthogonalization. Define and . For each , we iteratively form

 bj=wj/∥wj∥2wherewj=(I−bj−1b∗j−1−⋯−bj−kb∗j−k)(Abj−1). (8)

It often suffices to take or . Note that we obtain the reduced matrix as a by-product of this computation.

#### 1.2.3 Comparison with GMRES

The standard version of GMRES [40] applies the expensive Arnoldi process (with full orthogonalization) to build an orthonormal basis for the Krylov subspace, and it exploits the structure of this basis to solve the least-squares problem Eq. 5 efficiently.

In contrast, we propose to use a quick-and-dirty construction, such as Arnoldi with -partial orthogonalization, to obtain a basis for the Krylov subspace. Then we solve the sGMRES least-squares problem Eq. 6 to produce an approximate solution of the linear system. The sGMRES approach has lower arithmetic costs than classic GMRES, while attaining similar accuracy:

GMRES: operations  vs.  sGMRES: operations.

This expression assumes that is a fixed constant.

As evidence of the benefits of sGMRES, Figure 1 depicts a 100 speedup for a sparse nonsymmetric linear system with dimension . In this case, sGMRES is comparable in speed to restarted GMRES with restarting frequency , whose convergence is significantly impaired. Section 8 provides more details on the experimental setup, as well as further illustrations. For example, when applied to a sparse positive-definite linear system, sGMRES can produce residual norms about smaller than the conjugate gradient (CG) method after the same running time. Thus, it can be argued that sGMRES combines the speed of CG with the generality and robustness of GMRES.

### 1.3 Solving eigenvalue problems by sketched Rayleigh–Ritz

Similar ideas apply to spectral computations. We pose the nonsymmetric eigenvalue problem

 Find nonzero x∈Cn and λ∈C :% Ax=λxwhere A∈Cn×n. (9)

As before, we access the matrix via products:

. Typically, we seek a family of eigenvectors associated with a particular class of eigenvalues (e.g., largest real part, closest to zero). Let us outline a sketched subspace projection method for the eigenvalue problem. Full details appear in

Sections 7 and 6.

#### 1.3.1 Sketched Rayleigh–Ritz

As in Section 1.2.1, suppose that we are equipped with a basis and the reduced matrix . The range of the basis should contain approximate eigenpairs for which . In this setting, the most commonly employed strategy is the Rayleigh–Ritz (RR) method.

We begin with the classic variational formulation [33, Thm. 11.4.2] of RR:

 minimizeM∈Cd×d∥AB−BM∥F. (10)

The solution is . At this point, RR frames the eigenvalue problem . Each solution yields an approximate eigenpair of the matrix .

Evidently, the least-squares problem Eq. 10 is ripe for sketching. Draw a sketching matrix with , say, and pass to the sketched RR problem:

 minimizeM∈Cd×d∥S(AB−BM)∥F. (11)

It is cheap to form a solution to the sketched problem Eq. 11. As before, we can frame an ordinary eigenvalue problem

 ^My=(SB)†(SAB)y=θy. (12)

For each solution , we obtain an approximate eigenpair of the original matrix . We will show—both theoretically and empirically—that the computed eigenpairs are competitive with the eigenpairs obtained from RR.

We refer to Eq. 11 as the sketched Rayleigh–Ritz (sRR) formulation. Although it demands a careful implementation, sRR is faster than the original least-squares method Eq. 10 for an unstructured basis . Moreover, sRR is robust, even when the basis has poor conditioning. Indeed, it suffices that .

#### 1.3.2 Comparison with Arnoldi + Rayleigh–Ritz

We can also deploy the Krylov subspace Eq. 7 for eigenvalue computations [33, 38]. In this case, we typically use a random starting vector to generate the subspace .

To solve a large nonsymmetric eigenvalue problem, one standard algorithm [38, Sec. 6.2] applies the Arnoldi process (with full orthogonalization) to form an orthonormal basis for the Krylov subspace, and it uses the structure of the basis to solve the RR eigenvalue problem efficiently.

Instead, we propose to combine a fast construction of a Krylov subspace basis, such as -partial orthogonalization Eq. 8, with the sRR eigenvalue problem Eq. 12. Asymptotically, this algorithm uses less arithmetic than the classic approach.

As evidence, Figure 2 highlights an eigenvalue problem from optimization where sRR runs over 10 faster than the MATLAB eigs command. Even so, both methods compute the desired eigenpair to the same accuracy. Section 8 describes the experimental setup and provides further illustrations. For example, when applied to a sparse symmetric eigenvalue problem, sRR can outperform standard implementations of the Lanczos method in both speed and reliability.

#### 1.3.3 Block Krylov subspaces

For eigenvalue problems, there is also a compelling opportunity to explore alternative subspace constructions. For example, consider the block Krylov subspace

 Kp(A,Ω):=span{Ω,AΩ,A2Ω,…,Ap−1Ω}where % Ω∈Cn×b. (13)

We commonly generate the Krylov subspace from a random matrix . The standard prescription recommends a small block size and a large depth , but recent research [26, Sec. 11] has shown the value of a large block size and a small depth .

We must take care in constructing the block Krylov subspace. Partial orthogonalization is only competitive when the block size is a small constant. For larger , the Chebyshev recurrence offers an elegant way to form a basis :

 B1=Ω;B2=AΩ;Bi=2ABi−1−Bi−2for i=3,…,p.

We obtain the reduced matrix as a by-product. In practice, the Chebyshev polynomials must be shifted and scaled to adapt to the spectrum of .

### 1.4 Discussion

Our idea to combine subspace projection methods with sketching offers compelling advantages over the classic algorithms, especially in modern computing environments. Nevertheless, it must be acknowledged that this approach suffers from some of the same weaknesses as GMRES and RR. For example, when the basis is a Krylov subspace, these methods are limited by the approximation power of Krylov subspaces.

With hindsight, our framework appears as an obvious application of the sketch-and-solve paradigm for overdetermined least-squares problems. A critical reader may even wonder whether this idea is actually novel. Let us respond to this concern.

First, the sketch-and-solve methodology is typically used to design fast, low-accuracy algorithms that have limited access to the problem data. Indeed, the ostensible purpose of sketching is to reduce the input data to a more manageable size. To the best of our knowledge, the literature has not evaluated sketching for data-adaptive problem formulations, such as the ones in this paper.

Second, we believe that we are the first authors to identify the natural connection between sketching and subspace projection methods for linear algebra problems. In the past, researchers in theoretical algorithms may have overlooked this opportunity because they did not recognize that subspace projection algorithms are variational methods. Meanwhile, numerical analysts who are familiar with the variational perspective may not have appreciated that sketching is compatible with accuracy.

Third, for our algorithms to be competitive, we must employ efficient constructions of Krylov subspace bases. These ideas have a long history, but they are not widespread. In the past, researchers presented these techniques as a way to postpone expensive orthogonalization steps in parallel computing environments [23, 34]. In contrast, sketching sometimes allows us to eliminate the orthogonalization steps. Thus, we can finally take full advantage of the potential of fast computational bases.

More broadly, orthogonal bases and transformations historically played an important role because they guarantee backward stability. In contrast, we will demonstrate that non-orthogonal bases can support fast and accurate matrix computations. We believe that there are many other settings where this insight will be valuable.

In Section 2, we give a rigorous treatment of sketching for least-squares problems. Sections 5, 4, and 3 develop and analyze the sGMRES method and associated basis constructions. Sections 7 and 6 contain the analogous developments for sRR. Computational experiments in Section 8 confirm that these algorithms are fast, robust, and reliable. Sections 10 and 9 describe extensions and prospects.

### 1.6 Notation

The symbol denotes the (conjugate) transpose of a vector or matrix. We write for the norm or the spectral norm, while is the Frobenius norm. The dagger denotes the pseudoinverse. For a matrix

, define the largest singular value

and the minimum singular value . The condition number .

## 2 Background: Subspace embeddings

A subspace embedding is a linear map, usually from a high-dimensional space to a low-dimensional space, that preserves the norm of every vector in a given subspace. This definition is due to Sarlós [41]; see also [52, 26].

###### Definition 1 (Subspace embedding)

Suppose that the columns of span the subspace . A matrix is called a subspace embedding for with distortion if

 (1−ε)⋅∥By∥2≤∥SBy∥2≤(1+ε)⋅∥By∥2for all y∈Cd. (14)

For matrix computations, we need to design subspace embeddings with several extra properties. First, the subspace embedding should be equipped with a fast matrix–vector multiply so that we can perform the data reduction process efficiently. Second, for a fixed (but unknown) subspace , we want to draw a subspace embedding at random to achieve Eq. 14

with high probability. Last, the embedding dimension

for a randomized subspace embedding of a -dimensional subspace should have the optimal scaling . Owing to this relation, subspace embeddings are only appropriate in settings where a moderate distortion, say , is enough for computational purposes.

Before turning to constructions in Section 2.3, let us outline the applications of subspace embeddings that we will use in this paper.

### 2.1 Sketching for least-squares problems

As discussed in Section 1.1, we can use a subspace embedding to reduce the dimension of an overdetermined least-squares problem. This idea is also due to Sarlós [41]; it serves as the foundation for a collection of methods called the sketch-and-solve paradigm [52, 26].

[Sketching for least-squares] Let be a matrix, and suppose that is a subspace embedding for with distortion . For every vector , we have the two-sided inequality

 (1−ε)⋅∥My−f∥2≤∥S(My−f)∥2≤(1+ε)⋅∥My−f∥2. (15)

In particular, the solution to the least-squares problem Eq. 1 and the solution to the sketched least-squares problem Eq. 2 satisfy residual norm bounds

 ∥My⋆−f∥2≤∥M^y−f∥2≤1+ε1−ε⋅∥My⋆−f∥2. (16)

Equation Eq. 16 justifies the claim Eq. 3.

###### Remark 1 (Accurate sketch-and-solve methods?)

The sketch-and-solve paradigm has a poor reputation among numerical analysts because of a perception that it yields results of deplorable accuracy. For least-squares problems, this criticism is valid when the residual norm is large, which is often the case in data-fitting applications. In contrast, we will apply the technique when the residual is small (say, ), so we can well afford a constant-factor loss.

### 2.2 Whitening the basis

Rokhlin & Tygert [36] observed that a subspace embedding yields an inexpensive way to precondition an iterative algorithm for the overdetermined least-squares problem. We can invoke the same idea to approximately orthogonalize, or whiten, a given basis.

[Whitening] Let be a basis with full column rank. Let be a subspace embedding for with distortion . Compute the QR factorization of the sketched basis, . Then the whitened basis satisfies

 κ2(¯B)=σmax(¯B)σmin(¯B)≤1+ε1−ε. (17)

Furthermore, we have the condition number diagnostic

 1−ε1+ε⋅κ2(T)≤κ2(B)≤1+ε1−ε⋅κ2(T). (18)

### 2.3 Constructing a subspace embedding

There are many performant constructions of fast randomized subspace embeddings that work for an unknown subspace of bounded dimension [26, Sec. 9]. Let us summarize two that are most relevant for our purposes. In each case, for a subspace with dimension , to obtain empirical distortion , we set the embedding dimension .

#### 2.3.1 SRFTs

First, we introduce the subsampled random Fourier transform (SRFT)

[2, 53, 47, 26]. This subspace embedding222For worst-case problems, a more elaborate SRFT construction may be needed [26, Sec. 9]. takes the form

 S=√nsDFE∈Cs×n. (19)

In this expression, is a diagonal projector onto coordinates, chosen independently at random, is the unitary discrete Fourier transform (DFT), and is a diagonal matrix whose entries are independent Steinhaus333A Steinhaus random variable is uniform on the complex unit circle .random variables. The cost of applying the matrix to an matrix is operations using the subsampled FFT algorithm [53].

#### 2.3.2 Sparse maps

Next, we describe the sparse dimension reduction map [27, 31, 10, 11, 26], which is useful for sparse data and may require less data movement. It takes the form

 S=1√s[s1,…,sn]∈Cs×n. (20)

The columns of are statistically independent. Each column has exactly nonzero entries, drawn from the Steinhaus distribution, placed in uniformly random coordinates. For reliability, we can choose the sparsity level . We can apply to a matrix with operations, but it may require a sparse arithmetic library to achieve the best performance.

## 3 Solving linear systems with sGMRES

 Find x∈Cn : Ax=f% where A∈Cn×n and f∈Cn. (21)

This section elaborates on the sGMRES method outlined in Section 1.2. Section 4 discusses methods for constructing the basis required by sGMRES. Section 5 combines these ideas to obtain complete sGMRES algorithms.

### 3.1 Derivation of GMRES

We begin with a basis and the reduced matrix . Suppose that is an initial guess for the solution of Eq. 21 with residual . Lacking prior information, we may take .

Consider the affine family of approximate solutions to Eq. 21 of the form where . Among this class, we may select a representative whose residual has the minimum norm:

 minimizey∈Cd∥ABy−r0∥2. (22)

With some imprecision, we refer to Eq. 22 as the GMRES problem [40]. By calculus, the least-squares problem Eq. 22 is equivalent to an orthogonality principle:

 Find y∈Cd :(AB)∗(ABy−r0)=0. (23)

We can stably solve Eq. 22 or Eq. 23 using a QR factorization of the reduced matrix [21, Ch. 20]. The cost is arithmetic operations, assuming that the reduced matrix is unstructured. Given a solution to either problem, we obtain a new approximate solution with residual .

The formulation Eq. 23 is called a Petrov–Galerkin method [37, Chap. 5] with approximation space and orthogonality space . The GMRES algorithm [39, 40] is a particular instance where is an orthonormal basis for a Krylov subspace generated by . GMRES forms the basis via the Arnoldi process (Section 4.2), which involves matvecs with plus arithmetic. This reduces Eq. 22 to a structured least-squares problem that can be solved in operations.

### 3.2 Derivation and analysis of sGMRES

To develop the sGMRES method, we just sketch the GMRES problem Eq. 22. Construct a subspace embedding for with distortion . The sketched GMRES problem is

 minimizey∈Cd∥S(ABy−r0)∥2. (24)

Let denote any solution of Eq. 24. Write and .

We have an a priori comparison of the GMRES Eq. 22 and sGMRES Eq. 24 residual norms because of the relation Eq. 16:

 ∥AxB−f∥2≤∥A^x−f∥2≤1+ε1−ε⋅∥AxB−f∥2. (25)

Thus, sGMRES produces good solutions to Eq. 21 precisely when GMRES does. A posteriori, we can diagnose the quality of the computed solution by examining the sketched residual norm:

 ^rest:=∥S(AB^y−r0)∥2=(1±ε)⋅∥A^x−f∥2. (26)

The last display is a consequence of Eq. 15.

For both GMRES Eq. 22 and sGMRES Eq. 24, the fundamental challenge is to produce a basis that captures an approximate solution to the linear system Eq. 21. We return to this matter in Section 4.

### 3.3 Implementation

Let us outline a numerically stable implementation of sGMRES and describe some of the issues that arise.

The algorithm operates with either an SRFT Eq. 19 or a sparse embedding Eq. 20, depending on which is more appropriate to the computational environment. We recommend the embedding dimension , which typically yields distortion . In view of Eq. 25, the sGMRES residual norm is less than the GMRES residual norm, although the discrepancy is often smaller in practice.

To obtain the data for the sGMRES problem Eq. 24, we sketch the reduced matrix () and the right-hand side () at a cost of operations. To solve Eq. 24, we compute a thin QR decomposition of the sketched matrix: . A minimizer of the sGMRES problem is

 ^y=(SAB)†(Sr0)=T†(U∗(Sr0)). (27)

The sketched residual norm Eq. 26 admits the simple expression

 ^rest=∥(I−UU∗)(Sr0)∥2. (28)

The two preceding displays require arithmetic since . Last, we explicitly form the approximate solution at a cost of operations.

In summary, given the basis , the cost of forming and solving the sGMRES problem Eq. 24 is arithmetic. In contrast, for an unstructured basis, the cost of solving the GMRES problem Eq. 22 is arithmetic. Section 5 contains pseudocode for sGMRES, along with a more detailed accounting of the costs of forming the basis and solving the least-squares problem.

### 3.4 Stability

The classical stability result [21, Thm. 20.3] shows that standard numerical methods for the least-squares problem (24) produce a solution with essentially optimal residual as long as . According to (18), this condition is equivalent to .

Our computational work (Section 8) confirms that sGMRES is reliable unless the reduced matrix is very badly conditioned. In our experience, it suffices that in double-precision arithmetic. Therefore, we have wide latitude to design bases that we can construct quickly; see Section 4. We will provide evidence that sGMRES with a fast basis construction is more efficient than GMRES with a structured basis.

### 3.5 Restarting

Standard implementations of GMRES periodically restart [37, Sec. 6.5.5]. That is, they use a basis to compute an approximate solution to the linear system Eq. 21 with the residual vector . If the residual norm exceeds an error tolerance, the residual vector is used to generate a new basis, which is fed back to GMRES to construct another approximate solution. This process is repeated until a solution of desired quality is obtained.

Restarting has a number of benefits for the process of basis construction. It allows us to work with bases that have fewer columns, which limits the cost of storing the basis. For orthogonal basis constructions, restarting reduces the cost of orthogonalization. For non-orthogonal basis constructions, the restarting process helps control the conditioning of the basis. On the other hand, restarted GMRES may not converge if the bases are not rich enough (see Fig. 1). As we will discuss in Section 5, sGMRES can help us manage all of these issues.

### 3.6 Preconditioning

For difficult linear systems where the matrix has badly distributed eigenvalues, we may need a preconditioner to solve it successfully with either GMRES or sGMRES. The preconditioned system has the form

 P−1Ax=P−1f. (29)

A good preconditioner has two features [37, Chaps. 9 and 10]. First, the matrix has a more “favorable” eigenvalue spectrum than . Second, we can solve efficiently. (Let us emphasize that we only interact with by solving linear systems!) Although preconditioning is critical in practice, it is heavily problem dependent, so we will not delve into examples.

We may derive sGMRES for the preconditioned system Eq. 29, following the same pattern as before. Note that we employ the preconditioned matrix when we construct the basis and the reduced matrix . The details are routine.

### 3.7 Variations

There are many other subspace projection methods for solving linear systems [37, Chap. 5]. We anticipate that some of these approaches may be accelerated by sketching. For example, we can sketch the flexible GMRES method [37, Chap. 9], which produces basis vectors that are not necessarily correlated.

## 4 Constructing a basis for sGMRES

As we have seen, the success of both GMRES Eq. 22 and sGMRES Eq. 24 hinges on the approximation power of the basis. Krylov subspaces are, perhaps, the most natural way to capture solutions to a linear system when we access the matrix via products [37, Chaps. 6 and 7]. In this section, we describe a number of ways to compute bases for Krylov subspaces. Although these strategies are decades old, they deserve a fresh look because sGMRES has a fundamentally different computational profile from GMRES.

### 4.1 The single-vector Krylov subspace

Many iterative methods for solving the linear system Eq. 21 implicitly search for solutions in the Krylov subspace

 Kp(A;r):=span{r,Ar,A2r,…,Ap−1r}=span{φ(A)r:deg(φ)≤p−1}.

In this context, the generating vector is often the normalized residual , defined by , where is an approximate solution to Eq. 21. The function ranges over polynomials with degree at most .

A basis for the Krylov subspace comprises a system of vectors that spans the subspace. We can write

 B=[b1,…,bd]wherebj=φj(A)rfor j=1,…,d.

The filter polynomials have degree at most , and they are usually linearly independent (so ). In most cases, the polynomials are also graded , and they are constructed sequentially by a recurrence. This process delivers the reduced matrix without any extra work.

For example, the monomial basis takes the form and for . The associated polynomials are for . For many matrices , the conditioning of the monomial basis for grows exponentially with , so it is inimical to numerical computation [17].

We will consider other constructions of Krylov subspace bases that are more suitable in practice. Our aim is to control the resources used to obtain the basis, including arithmetic, (working) storage, communication, synchronization, etc. We can advance these goals by relaxing the requirement that the basis be well-conditioned, which has historically been viewed as essential.

For theoretical analysis of the approximation power of Krylov subspaces in the context of linear system solvers, see [37, Sec. 6.11].

### 4.2 The Arnoldi process

It is supremely natural to build an orthonormal basis for the Krylov subspace sequentially. This is called the Arnoldi process [37, Sec. 6.3]. The initial vector . After steps, the method updates the partial basis by appending the vector

 qj+1=wj+1/∥wj+1∥2wherewj+1=(I−QjQ∗j)(Aqj).

The Arnoldi basis has the happy property that

 AQp=QpHp+wpe∗pwhere % Hp∈Cp×p is upper Hessenberg.

As a consequence, we can solve the least-squares problem Eq. 22 with in time and produce the approximate solution in operations. This is how the standard implementation of the GMRES algorithm operates [40].

The orthogonalization steps in the Arnoldi process are expensive. For iterations, they expend arithmetic, and they may also involve burdensome inner-products, communication, and synchronization. Robust implementations usually incorporate double Gram–Schmidt or Householder reflectors.

The literature contains many strategies for controlling the orthogonalization costs in the Arnoldi process [37, Chap. 6]. sGMRES motivates us to reevaluate techniques for building a nonorthogonal basis. For example, we can use -partial orthogonalization as in Eq. 8. Provided the reduced matrix is reasonably conditioned, we can still obtain accurate solutions to the linear system via sGMRES Eq. 24.

### 4.3 The Lanczos recurrence

For this subsection, assume is Hermitian. In this case, the Arnoldi process simplifies to a three-term recurrence [37, Sec. 6.6]:

 qj+1=wj+1/∥wj+1∥2wherewj+1=(I−qjq∗j−qj−1q∗j−1)(Aqj).

The Lanczos basis has the remarkable property that

 AQp=QpJp+wpe∗pwhere Jp∈Cp×p is tridiagonal. (30)

This allows us to solve the least-squares problem Eq. 22 with in time, and we construct the approximate solution with arithmetic. This is how the MINRES algorithm operates [32].

For iterations, the Lanczos recurrence costs just operations, but it has complicated behavior in finite-precision arithmetic. This issue is not devastating when Lanczos is used to solve linear systems, but it can present a more serious challenge when solving eigenvalue problems [33, Chap. 13].

Although it is very efficient to solve the least-squares problem Eq. 22 by passing to the tridiagonal matrix , it is more reliable to sketch and to solve the sketched problem Eq. 24 instead. The approach based on sketching is competitive with MINRES when .

The literature describes many approaches for maintaining the orthogonality of the Lanczos basis, such as selective orthogonalization [33, Chap. 13]

. Sketching allows us to develop faster alternatives, such as omitting the extra orthogonalization. We can also use the sketched basis vectors to obtain coarse estimates for inner products

[4]. Indeed, for all , so we can choose to orthogonalize only against the basis vectors where the inner product is nonnegligible.

### 4.4 The Chebyshev recurrence

In some settings, we may wish to avoid the orthogonalization steps entirely because they involve expensive inner products between basis vectors. We can achieve this goal by using other polynomial recurrences to construct a Krylov subspace basis. This idea is attributed to Joubert & Carey [23].

For simplicity, suppose that the spectrum of is contained in the axis-aligned rectangle , and set . Then we can assemble a shifted-and-scaled Chebyshev basis via the following recurrence [25, 23]. Let and . Then

 bj=1ϱ[(A−cI)bj−1−δ2x−δ2y4ϱbj−2]for j=3,…,p. (31)

In practice, we also rescale each basis vector to have unit norm after it has played its role in the recurrence. The key theoretical fact is that the Chebyshev basis tends to have a condition number that grows polynomially in , rather than exponentially. This claim depends on assumptions that the eigenvalues of the matrix are equidistributed over an ellipse [16, 25, 23, 34].

To implement this procedure, we may first apply a few iterations of the Arnoldi method (Section 6.2) to estimate the spectrum of . More generally, we find a (transformed) ellipse that contains the spectrum. Then we adapt the Chebyshev polynomials to this ellipse [34]. The overall cost of constructing a Chebyshev basis for is , and it involves no orthogonalization whatsoever.

### 4.5 Newton polynomials

The Newton polynomials provide another standard construction of a nonorthogonal basis for the Krylov subspace [34]. Suppose that are complex-valued shift parameters. Then we can build a basis for via the recurrence

 b1=r/∥r∥2andbj=(A−θj−1I)bj−1for j=2,…,p.

The shifts are often chosen to be estimated eigenvalues of , obtained from an invocation of the Arnoldi method (Section 6.2). The overall computational profile of constructing the Newton basis is similar to constructing a Chebyshev basis.

### 4.6 Local orthogonalization

We can improve the conditioning of a computed basis by local orthogonalization. Indeed, it is generally helpful to orthogonalize subcollections of basis vectors, even if it proves too expensive to orthogonalize all of the basis vectors. In particular, scaling each column to have unit norm is always appropriate. See Demmel’s paper [14] for an analysis.

## 5 sGMRES algorithms

This section presents complete algorithms for solving the linear system Eq. 21 via sGMRES, including options for adaptive basis generation.

### 5.1 Basic implementation

Algorithm 1 contains pseudocode for a basic implementation of sGMRES using the -partial Arnoldi basis Eq. 8. We recommend this version of the algorithm when the user lacks information about the spectrum of . Given bounds on the spectrum, one may replace the partial Arnoldi basis with a Chebyshev basis (Section 4.4). See Table 1 for a summary of the arithmetic costs.

### 5.2 Iterative sGMRES

As noted, most methods for producing the Krylov subspace basis are recursive. They generate the columns of and the reduced matrix in sequence. This observation suggests an iterative implementation of sGMRES. We sketch the columns of the reduced matrix as they arrive, incrementally solving the sGMRES problem Eq. 24 at each step.

Let be the maximum allowable dimension of the Krylov subspace. Draw and fix a randomized subspace embedding with embedding dimension . As we compute each column of the reduced matrix, we can immediately form the sketch and update the QR decomposition:

 S(ABj)=UjTjwhereBj=[b1,…,bj]. (32)

At each step, we obtain an approximate solution to the linear system:

 ^yj=T†j(U∗j(Sr0))with^rest,j=∥(I−UjUj)∗(Sr0)∥2. (33)

Repeat this process until the estimated residual norm is sufficiently small or we breach the threshold for the size of the Krylov space. After iterations, the arithmetic costs of Eqs. 33 and 32 match the non-sequential implementation (Section 3.3) with a basis of size .

There is a further opportunity to design an adaptive strategy for restarting. According to Eq. 18, the condition number is comparable with . When first , we recognize that it is time to restart. We generate the new Krylov subspace using the previous residual . Alternatively, instead of restarting, we could approximately orthogonalize by replacing it with , whose condition number is constant.

### 5.4 Storage-efficient versions

In situations where storage is at a premium, we can even avoid storing the reduced matrix by sketching its columns sequentially and discarding them immediately after sketching. Once the estimated residual norm is sufficiently small, we can construct the approximate solution

 ^xj=x0+ABj^yj=x0+∑ji=1(ABj)i(^yj)i

by iteratively regenerating the columns of the reduced matrix and linearly combining them on the fly. For some basis constructions (e.g., partial Arnoldi or Chebyshev), we only need to maintain a few columns of and the columns of . Unfortunately, this modification doubles the arithmetic cost associated with basis generation (matvecs plus orthogonalization). A similar technique was used in [54].

### 5.5 Obtaining a solution with full accuracy

While the constant-factor loss in sGMRES is unlikely to be an issue, we can obtain a solution with the same quality as GMRES by using as a preconditioner to solve Eq. 24 via an iterative method as in [36, 3]. This method still requires to operate reliably.

## 6 The sketched Rayleigh–Ritz method

Let us turn to the nonsymmetric eigenvalue problem

 Find nonzero x∈Cn and λ∈C : Ax=λxwhere A∈Cn×n. (34)

We will provide an implementation and analysis of the sRR method outlined in Section 1.3.1. Section 6.8 describes modifications for the symmetric eigenvalue problem. Section 7 covers techniques for constructing the basis for sRR.

### 6.1 The many faces of Rayleigh–Ritz

Let be a basis with full rank, and let be the reduced matrix. Rayleigh–Ritz is best understood as a Galerkin method for computing eigenvalues [38, Sec. 4.3]. From the family , we seek a residual orthogonal to . More precisely,

 Find nonzero y∈Cd and θ∈C :B∗(ABy−θBy)=0. (35)

Rearranging, we see that (35) can be posed as an ordinary eigenvalue problem:

 M⋆y:=B†(AB)y=θy% where y≠0 and θ∈C. (36)

Recall that eigenvalue problems are invariant under similarity transforms. In the present context, the computed eigenpairs only depend on the range of , so they are invariant under the map for a nonsingular . Therefore, if is an orthonormal basis for , then we may pass to

 Q∗(AQ)z=θzwhere z≠0. (37)

Given a solution to Eq. 37, we obtain an approximate eigenpair of the matrix . This is the most typical presentation of RR.

In contrast, consider the problem of minimizing the residual over the subspace:

 minimizey∈Cd,θ∈C∥ABy−θBy∥2subject to ∥By∥2=1. (38)

This formulation is sometimes called a rectangular eigenvalue problem [8, 22]. Let us emphasize that the RR method Eq. 36 does not solve the rectangular eigenvalue problem. Nevertheless, for any eigenpair of the matrix , it holds that

 ∥ABy⋆−θ⋆By⋆∥2=∥(AB−BM⋆)y⋆∥2. (39)

The matrix from Eq. 36 does solve a related variational problem [33, Thm. 11.4.2]:

 minimizeM∈Cd×d∥AB−BM∥F. (40)

These connections support the design and analysis of a sketched version of RR.

### 6.2 The Arnoldi method

The Arnoldi method is a classic algorithm [38, Sec. 6.2] for eigenvalue problems based on RR. First, it invokes the Arnoldi process (Section 4.2) to build an orthonormal basis for a Krylov subspace (generated by a random vector) at a cost of operations. This construction ensures that has upper Hessenberg form, so we can solve the eigenvalue problem Eq. 36 with operations by means of the QR algorithm [38, Chap. 7]. Each eigenpair of Eq. 36 induces an approximate eigenpair of , which we can form with operations.

### 6.3 Derivation of sRR

We can view the sRR method as a sketched version of the matrix optimization problem Eq. 40. Consider a subspace embedding for with distortion . The sketched problem is

 minimizeM∈Cd×d∥S(AB−BM)∥F. (41)

The sRR method finds a solution to this optimization problem. Then it poses the ordinary eigenvalue problem

 ^My=θywhere y≠0. (42)

This computation yields up to eigenpairs of the matrix . We obtain approximate eigenpairs of by the transformation .

Sketching allows us to obtain inexpensive a posteriori error bounds. For a computed eigenpair of , it is cheap to form the sketched residual:

 ^rest:=^rest(^y,^θ):=∥S(AB^y−^θB^y)∥2/∥SB^y∥2. (43)

By definition, the subspace embedding ensures that the true residual satisfies

 1−ε1+ε⋅^rest≤∥AB^y−^θB^y∥2∥B^y∥2≤1+ε1−ε⋅^rest. (44)

In other words, we can diagnose when the sRR method has (or has not) produced a high-quality approximate eigenpair of the original matrix .

### 6.4 Implementation of sRR

To implement sRR, we may use either an SRFT embedding Eq. 19 or a sparse embedding Eq. 20. We recommend the embedding dimension , which typically results in distortion for the range of .

We first sketch the reduced matrix () and the basis () at a cost of operations. Next, we compute a thin QR decomposition of the sketched basis. A minimizer of the sRR problem Eq. 41 is the matrix

 ^M:=(SB)†(SAB)=T−1(U∗(SAB))∈Cd×d. (45)

Afterward, we invoke the QR algorithm to solve the eigenvalue problem Eq. 42. Each of the last three steps costs operations.

Given a computed eigenpair , we can obtain the sketched residual value from Eq. 43 at a cost of operations. If the residual estimate is sufficiently small, we declare that is an approximate eigenpair of . For maximum efficiency, we present the approximate eigenvector in factored form. If we need the full vector , it costs operations. Ironically, if we extract a large number of explicit eigenvectors, this last step dominates the cost of the computation. Usually, the number of high-quality approximate eigenpairs is moderate.

In summary, given the basis , if we use sRR to solve