# Global Convergence of Hessenberg Shifted QR I: Dynamics

Rapid convergence of the shifted QR algorithm on symmetric matrices was shown more than fifty years ago. Since then, despite significant interest and its practical relevance, an understanding of the dynamics of the shifted QR algorithm on nonsymmetric matrices has remained elusive. We give a family of shifting strategies for the Hessenberg shifted QR algorithm with provably rapid global convergence on nonsymmetric matrices of bounded nonnormality, quantified in terms of the condition number of the eigenvector matrix. The convergence is linear with a constant rate, and for a well-chosen strategy from this family, the computational cost of each QR step scales nearly logarithmically in the eigenvector condition number. We perform our analysis in exact arithmetic. In the companion paper [Global Convergence of Hessenberg Shifted QR II: Numerical Stability and Deflation], we show that our shifting strategies can be implemented efficiently in finite arithmetic.

There are no comments yet.

## Authors

• 10 publications
• 6 publications
• 14 publications
05/13/2022

### Global Convergence of Hessenberg Shifted QR II: Numerical Stability

We develop a framework for proving rapid convergence of shifted QR algor...
11/25/2021

### A Letter on Convergence of In-Parameter-Linear Nonlinear Neural Architectures with Gradient Learnings

This letter summarizes and proves the concept of bounded-input bounded-s...
01/08/2020

### A quadratically convergent iterative scheme for locating conical degeneracies in the spectra of parametric self-adjoint matrices

A simple iterative scheme is proposed for locating the parameter values ...
05/14/2019

### On the Convergence Rate of Variants of the Conjugate Gradient Algorithm in Finite Precision Arithmetic

We consider three mathematically equivalent variants of the conjugate gr...
09/17/2020

### Analysis of the Convergence Speed of the Arimoto-Blahut Algorithm by the Second Order Recurrence Formula

In this paper, we investigate the convergence speed of the Arimoto-Blahu...
03/27/2019

### The Global Convergence Analysis of the Bat Algorithm Using a Markovian Framework and Dynamical System Theory

The bat algorithm (BA) has been shown to be effective to solve a wider r...
02/28/2022

### Markovian Analysis of Coordination Strategies in Tandem Polling Queues with Setups

We analyze a network of tandem polling queues with two stations operatin...
##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

The Hessenberg Shifted QR Algorithm, discovered in the late 1950s independently by Francis [Fra61, Fra62] and Kublanovskaya [Kub62], has been for several decades the most widely used method for approximately111In the sense of backward error, i.e., exactly computing the eigenvalues of a nearby matrix.

computing all of the eigenvalues of a dense matrix. It is implemented in all of the major software packages for numerical linear algebra and was listed as one of the “Top 10 algorithms of the twentieth century,” along with the Metropolis algorithm and the Simplex algorithm

[DS00, Par00]. The algorithm is specified by a shifting strategy, which is an efficiently computable222In this paper, we assume exact arithmetic with complex numbers and count arithmetic operations as a measure of complexity. function

 Sh:Hn×n→Pk,

where is the set of complex Hessenberg333A matrix is (upper) Hessenberg if whenever . Such matrices are “almost” upper triangular. matrices and is the set of monic complex univariate polynomials of degree , for some typically much smaller than . The word “shift” comes from the fact that when we have for some . The algorithm then consists of the following discrete-time isospectral nonlinear dynamical system on , given an initial condition :

 QtRt =pt(Ht) where pt=Sh(Ht) whenever pt(Ht) is invertible, (1) Ht+1 =Q∗tHtQt, t=0,1,2,….

The first step in (1) is a decomposition so that is unitary. It is not hard to see that each iteration preserves the Hessenberg structure; we ignore the case when is singular in this introduction (see Section 1.3 for a discussion).

The relevance of this iteration to the eigenvalue problem stems from two facts. First, every matrix is unitarily similar to a Hessenberg matrix , and in exact arithmetic such a similarity can be computed exactly in operations. Second, it was shown in [Fra61, Kub62] that for the trivial “unshifted” strategy , the iterates under some mild genericity conditions always converge to an upper triangular matrix ; this is because the unshifted QR iteration can be precisely related to the (inverse) power iteration (see e.g. [TBI97]). Combining the unitary similarities accumulated during the iteration, these two facts yield a Schur factorization of the original matrix, from which the eigenvalues of can be read off. The unshifted QR iteration does not give an efficient algorithm, however, as it is easy to see that convergence can be arbitrarily slow if the ratios of the magnitudes of the eigenvalues of are close to . The role of the shifting strategy is to adaptively improve these ratios and thereby accelerate convergence. The challenge is that this must be done efficiently without prior knowledge of the eigenvalues.

We quantify the rate of convergence of a sequence of iterates of (1) in terms of its -decoupling time , which is defined as the smallest at which some subdiagonal entry of satisfies

 |Ht(i+1,i)|≤δ∥Ht∥.

In this context, “rapid” convergence means that is a very slowly growing function of and , ideally logarithmic or polylogarithmic.

In a celebrated work, Wilkinson [Wil68] proved global convergence444i.e., from any initial condition . of shifted QR on all real symmetric tridiagonal555i.e., arising as the Hessenberg form of symmetric matrices. matrices using the shifting strategy that now carries his name. The linear convergence bound for this shifting strategy was then obtained by Dekker and Traub [DT71] (in the more general setting of Hermitian matrices), and reproven by Hoffman and Parlett [HP78] using different arguments. Other than these results for Hermitian matrices, there is no known bound on the worst-case decoupling time for any large class of matrices or any other shifting strategy.666For nonnormal matrices, it is not even known if there is a shifting strategy which yields global convergence regardless of an effective bound on the decoupling time. A thorough discussion of related work appears in Section 1.3.

Shifted QR is nonetheless the most commonly used algorithm for the nonsymmetric eigenproblem on dense matrices, though there is little theoretical foundation for this practice. The strategies implemented in standard software libraries heuristically converge very rapidly on “typical” inputs, but occasionally examples of nonconvergence are found

[Day96, Mol14] and dealt with in ad hoc ways.

Accordingly, the main theoretical question concerning shifted QR, which has remained open since the 1960s, is:

###### Question D.

Is there a shifting strategy for which the Hessenberg shifted QR iteration provably and rapidly decouples on nonsymmetric matrices?

Question D was asked in various forms e.g. by Parlett [Par74], Moler [Mol78, Mol14], Demmel [Dem97, Ch. 4], Higham [HDG15, IV.10], and Smale [Sma97] (who referred to it as a “great challenge”).

The main result of this article (Theorem 1.1) is a positive answer to Question D which is quantified in terms of the degree of nonnormality of the input matrix . Let be the set of diagonalizable Hessenberg matrices with eigenvector condition number777See Section 1.1 for a precise definition. . We exhibit a two parameter family of deterministic shifting strategies indexed by a degree parameter and a nonnormality parameter such that:

1. The strategy satisfies for every and .

2. has degree and can be computed in roughly arithmetic operations, which is simply for the judicious setting .

Thus, the computational cost of the shifting strategy required for convergence blows up as the input matrix becomes more and more nonnormal, but the dependence on nonnormality is very mild.

We remark that such a result was not previously known even in the case of normal matrices. Further, as we discuss in Remark 1.1, a tiny random perturbation of any is likely to be an element of for small (not depending on

). Thus, while our theorem does not give a single shifting strategy which works for all matrices, it does give a strategy which works for a tiny random perturbation of every matrix (with high probability, where “tiny” and “small” must be quantified appropriately). This may be viewed as a smoothed analysis result in the sense of

[ST04], with the notable feature that the running time of the algorithm depends very mildly on the size of the perturbation, or alternatively as a quantitative statement about the shifting strategy succeeding for “almost all” matrices.

[Arithmetic Complexity from Decoupling Time] The motivation for the particular measure of convergence above is that there is a procedure called deflation which zeroes out the smallest subdiagonal entry of a -decoupled Hessenberg matrix and obtains a nearby block upper triangular matrix, which allows one to pass to subproblems of smaller size incurring a backward error of . Repeating this procedure times (and exploiting the special structure of Hessenberg matrices to compute the efficiently) yields an algorithm for computing a triangular and unitary such that in a total of arithmetic operations [Wat07]. Thus, the interesting regime is to take .

### 1.1 Statement of Results

We need the following two notions to precisely state our main theorem.

Eigenvector Condition Number. The eigenvector condition number

is defined for a diagonalizable matrix

as

 κV(M):=infV:M=VDV−1∥V∥∥V−1∥.

Note that for a normal matrix and that for all iterates of the QR algorithm due to unitarily similarity. It is not hard to see that the infimum is achieved.

Approximate Ritz Values. Like most previously studied shifting strategies, we define in terms of the Ritz values of the current iterate . Recall that the Ritz values of order of a Hessenberg matrix are the eigenvalues of its bottom right corner ; they are also characterized variationally as the roots of the degree monic polynomial minimizing , where

is the last standard basis vector (see Lemma

2.1 for details). Since computing eigenvalues exactly is impossible when , we assume access to a method for computing approximate Ritz values, in the sense encapsulated in the following definition.

[Approximate Ritz values and Ritz value finders] Let . We call a set of -approximate Ritz values of a Hessenberg matrix if

 ∥∥ ∥∥e∗n∏i≤k(H−ri)∥∥ ∥∥1/k≤(1+θ)minp∈\calPk∥e∗np(H)∥1/k. (2)

A Ritz value finder is an algorithm that takes as inputs a Hessenberg matrix , a positive integer and an accuracy parameter , and outputs a set of -approximate Ritz values of whenever the right hand side of is nonzero. Let be the maximum number of arithmetic operations used by over all inputs such that the right hand side of satisfies888Such a lower bound is needed, since otherwise we could use to compute the eigenvalues of to arbitrary accuracy in finite time.

 minp∈\calPk∥e∗np(H)∥1/k≥δ∥H∥.

A Ritz value finder satisfying Definition 1.1 can be efficiently instantiated using polynomial root finders or other provable eigenvalue computation algorithms (e.g. [BGVKS20b]) with guarantees of type . We defer a detailed discussion of numerical precision issues surrounding this implementation to our companion paper [BGVS21]. The subtlety of not being able to compute Ritz values exactly is secondary to the dynamical phenomena which are the focus of this paper, so on first reading it is recommended to assume (i.e., Ritz values are computed exactly), even though this is unrealistic.

We now present our main theorem. All logarithms are base . There is a family of deterministic shifting strategies (described in Section 2) of degree and nonnormality parameter with the following properties.

1. (Rapid Decoupling) If , then for every , the QR iteration with strategy satisfies

 decδ(H0)≤4log(1/δ). (3)
2. (Cost Per Iteration Before Decoupling) Given a Ritz value finder with complexity , a parameter , and a Hessenberg matrix , computing given has a cost per iteration of at most

 12kn2⋅logk+Texc(k,B)+TAppRitz(k,0.0001,δ)+logk (4)

arithmetic operations for all iterations before (3) is satisfied, where

 Texc(k,B):=12kn2⋅131B8logk+8k−1. (5)

The quantity denotes the cost of performing certain “exceptional shifts” (see Section 1.3) used in the strategy and the number corresponds to an upper bound on the arithmetic cost of a degree implicit QR step (see Section 2). The tradeoff between the nonnormality of the input matrix and the efficiency of the shifting strategy appears in (5), where it is seen that setting

 k=Ω(logBloglogB) (6)

yields , for a total running time of operations per iteration. Note that the bound must be known in advance in order to determine how large a is needed to make small. In particular, in the special case of normal matrices, setting yields . One may also take to be a constant independent of , but then the arithmetic complexity of each iteration will depend polynomially rather than logarithmically on .

[Regularization of by Random Perturbation] The pseudospectral regularization guarantees from [BKMS21] (verifying the conjecture of [Dav08]) imply that every matrix is -close in operator norm to matrices with . Such a nearby, well-conditioned matrix can be produced (with high probability) by perturbing each entry of

with an independent complex Gaussian of variance

999See also [BGVKS20a, JSS20] for more general perturbations which have a similar effect, and [CES21] for improved guarantees when the perturbation is real Gaussian.. After perturbing we can thus (with high probability) take in Theorem 1.1 and set , incurring a backward error of and yielding a per iteration arithmetic cost of before -decoupling. This approach embraces the fact that the shifted QR algorithm can only in the first place guarantee backward accuracy of the eigenvalues it computes, so there is no harm in using an initial small random perturbation as a “preconditioning” step.

[Higher Degree Shifts] A QR step with a degree shift is identical to a sequence of steps with degree shifts (see e.g. [Wat07] for a proof), so any degree strategy can be simulated by a degree strategy while increasing the iteration count by a factor of 101010This also has some important advantages with regards to numerical stability, which are discussed in [BGVS21].. We choose to present our strategy as higher degree for conceptual clarity. The efficiency of using degrees as high as has been tested in the past [BBM02, Section 3] and is often used in practice [Kre21].

[Numerical Stability, Deflation, and Bit Complexity] The strategy in Theorem 1.1

can be implemented in floating point arithmetic using

bits of precision,111111Hence, when a random perturbation is used as a preconditioner, in view of Remark 1.1 the number of bits of precision required becomes . while preserving both correctness and rapid convergence, with the caveat that the numerical implementation requires using a small amount of randomization in order to be efficient. This is proved in the companion paper [BGVS21]

, along with a detailed analysis of deflation, yielding a complete algorithm for computing the eigenvalues of a matrix with good bit complexity estimates.

### 1.2 Techniques

There are two distinct phenomena which make analyzing the dynamics of shifted QR challenging.

1. Transient behavior due to nonnormality. In the nonnormal case, the iterates can behave chaotically on short time scales,121212 We measure time not as the number of QR steps, but as the number of QR steps of degree , so for example a QR step with a degree shift corresponds to time steps. lacking any kind of obvious algebraic or geometric monotonicity properties (which are present in the symmetric case). This lack of monotonicity makes it hard to reason about convergence.

2. Fixed points and periodic orbits due to symmetry. The most natural shifting strategies define as a simple function of the entries of , typically a function of the characteristic polynomial of the bottom right corner of (see Section 1.3 for more details). These strategies typically have attractive fixed points and cycles which are not upper triangular, leading to slow convergence or nonconvergence (e.g. see [Par66, Bat90, Day96]). The conceptual cause of these fixed points is symmetry — at a very high level, the dynamical system “cannot decide which invariant subspace to converge to.” This feature is seen even in normal matrices, and in fact its most severe manifestation occurs in the case of unitary matrices.

Both pathologies are seen in the instructive family of examples

 M=⎛⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜⎝βnβ1β2⋱βn−1⎞⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟⎠

where . Observe that, for , the characteristic polynomial of is just , so any naïve shifting strategy based on it will yield the trivial shift. One can verify that a QR step with the trivial strategy applied to cyclically permutes the , while leaving the zero pattern of intact. This means that for adversarially chosen , the bottom few subdiagonal entries of — the traditional place to look for monotonicity in order to prove convergence — exhibit arbitrary behavior. At very long time scales of steps, the behavior becomes periodic and predictable, but there is still no convergence.

Previous approaches to Question D have been essentially algebraic (relying on examining entries of the iterates, their resolvents, or characteristic polynomials of their submatrices) or geometric (viewing the iteration as a flow on a manifold), and have been unable to surmount these difficulties in the nonsymmetric case.

In contrast, we take an essentially analytic approach. The key idea is that when the eigenvector condition number is bounded, the dynamics of shifted QR can be understood in terms of certain measures, similar in spirit to the notion of spectral measure of a normal matrix, associated with the (not necessarily normal) iterates . These measures lack any kind of monotonicity on short time scales, but evolve in a predictable way, much like those of normal matrices, over time scales of (degree QR) steps when as in (6) (this is explained in detail in Section 2.2). Moreover, the behavior of these measures can be related to the geometric mean of the bottom subdiagonal entries of , which we use as a potential function to track convergence.

To see this phenomenon in action, if we impose a bound on in Example 1.2, it can be seen that the ratios of the cannot be arbitrary and the geometric mean of the bottom subdiagonal entries of behaves predictably rather than chaotically on intervals of time steps.

Guided by this insight, we carefully design a shifting strategy which satisfies the following dichotomy: either (i) a QR step of degree significantly decreases the potential defined above, or, (ii) the measure associated to the current iterate must have a special structure. In the second case (which corresponds to the symmetry case discussed above) we exploit the structure to design a simple exceptional shift which is guaranteed to significantly reduce the potential, giving linear convergence in either case. Thus, our proof articulates that transients and symmetry are the only obstacles to rapid convergence of the shifted QR iteration on nonsymmetric matrices.

### 1.3 History and Related Work

The literature on shifted QR is vast, so we mention only the most relevant works — in particular, we omit the large body of experimental work and do not discuss the many works on local convergence of shifted QR (i.e., starting from an which is already very close to decoupling). The reader is directed to the excellent surveys [Bat95, Sma97, Chu08] or [Par00, Wat08, GU09] for a dynamical or numerical viewpoint, respectively, or to the books [GVL96, TBI97, Dem97, Wat07] for a comprehensive treatment.

Most of the shifting strategies studied in the literature are a combination of the following three types. The motivation for considering shifts depending on is closely related to Krylov subspace methods, see e.g. [Wat07]. Below denotes the current Hessenberg iterate.

1. -Francis Shift. Take for some . The case is called Rayleigh shift.

2. Wilkinson Shift. Take where is the root of closer to .

3. Exceptional Shift. Let for some chosen randomly or arbitrarily, perhaps with a specified magnitude (e.g. for unitary matrices in [EH75, Wan01, WG02, WG03]).

Shifting strategies which combine more than one of these through some kind of case analysis are called “mixed” strategies.

Symmetric Matrices. Jiang [Erx92] showed that the geometric mean of the bottom subdiagonal entries is monotone for the -Francis strategy in the case of symmetric tridiagonal matrices. Aishima et al. [AMMS12] showed that this monotonicity continues to hold for a “Wilkinson-like” shift which chooses out of Ritz values. Both of these results yield global convergence on symmetric tridiagonal matrices (without a rate).

Rayleigh Quotient Iteration and Normal Matrices. The behavior of shifted QR is well known to be related to shifted inverse iteration (see e.g. [TBI97]). In particular, the Rayleigh shifting strategy corresponds to a vector iteration process known as Rayleigh Quotient Iteration (RQI). Parlett [Par74] (building on [Ost57, Buu58, PK68]) showed that RQI converges globally (but without giving a rate) on almost every normal matrix and investigated how to generalize this to the nonnormal case.

Batterson [Bat90] studied the convergence of -Francis shifted QR on normal matrices with a certain exceptional shift and showed that it always converges. The subsequent work [Bat94] showed that -Francis shifted QR converges globally on almost every real normal matrix (without a rate). In Theorem 6 of that paper, it was shown that the same potential that we consider is monotone-decreasing when the -Francis shift is run on normal matrices, which was an inspiration for our proof of almost-monotonicty for nonnormal matrices.

Nonnormal Matrices. Parlett [Par66] showed that an unshifted QR step applied to a singular matrix leads to immediate -decoupling, taking care of the singularity issue that was glossed over in the introduction, and further proved that all of the fixed points of an extension of the -Francis shifted QR step (for general matrices) are multiples of unitary matrices.

In a sequence of works, Batterson and coauthors investigated the behavior of RQI and -Francis on nonnormal matrices from a dynamical systems perspective. Batterson and Smillie [BS89, BS90] showed that there are real matrices such that RQI fails to converge for an open set of real starting vectors. The latter paper also established that RQI exhibits chaotic behavior on some instances, in the sense of having periodic points of infinitely many periods. Batterson and Day [BD92] showed that -Francis shifted QR converges globally and linearly on a certain conjugacy class of Hessenberg matrices.

In the realm of periodicity and symmetry breaking, Day [Day96], building on an example of Demmel, showed that there is an open set of matrices on which certain mixed shifting strategies used in the library EISPACK fail to converge rapidly; such an example was independently discovered by Moler [Mol14]. These examples are almost normal in the sense that they satisfy , so the reason for nonconvergence is symmetry, and our strategy with modest parameters is guaranteed to converge rapidly on them.

Using topological considerations, Leite et al. [LST13] proved that no continuous shifting strategy can decouple on every symmetric matrix. Accordingly (in retrospect), the most successful shifting strategy for symmetric matrices, the Wilkinson Shift, is discontinuous in the entries of the matrix and explicitly breaks symmetry when it occurs. Our strategy is also discontinuous in the entries of the matrix.

Mixed and Exceptional Shifts. Eberlein and Huang [EH75] showed global convergence (without a bound on the rate) of a certain mixed strategy for unitary Hessenberg matrices; more recently, the works [Wan01, WG02, WG03] exhibited mixed strategies which converge globally for unitary Hessenberg matrices with a bound on the rate, but this bound depends on the matrix in a complicated way and is not clearly bounded away from . Our strategy is also a mixed strategy which in a sense combines all three types above. Our choice of exceptional shift was in particular inspired by the work of [EH75, WG02] — the difference is that the size of the exceptional shift is naturally of order in the unitary case, but in the general case it must be chosen carefully at the correct spectral scale.

Higher Degree Shifts. The idea of using higher degree shifts was already present in [Fra61, DT71], but was popularized in by Bai and Demmel in [BD89], who observed that higher order shifts can sometimes be implemented more efficiently than a sequence of lower order ones; see [BD89, Section 3] for a discussion of various higher order shifting strategies which were considered in the 1980s.

Integrable Systems. The unshifted QR algorithm on Hermitian matrices is known to correspond to evaluations of an integrable dynamical system called the Toda flow at integer times [DNT83]; such a correspondence is not known for any nontrivial shifting scheme or for nonnormal matrices. See [Chu08] for a detailed survey of this connection. More recently, the line of work [PDM13, DMOT14, DT19] studied the universality properties of the decoupling time of unshifted QR on random matrices, and used the connection to Toda flow to prove universality in the symmetric case; it was experimentally observed that such universality continues to hold for shifted QR.

We defer a detailed discussion of the extensive related work on numerical issues related to shifted QR as well as a comparison to other algorithms for computing eigenvalues (in particular, [ABB18] and [BGVKS20b]) to our companion paper [BGVS21].

Organization. The remainder of the paper, Section 2, contains the proof of Theorem 1.1. Section 2.1 contains some elementary but important lemmas, as well as the notion of approximate functional calculus (see Lemma 2.1) which will be used repeatedly in the sequel. The notion of a promising Ritz value, around which our shifting strategy revolves, appears in Section 2.2 along with a discussion of its key properties. We describe the shifting strategy and prove Theorem 1.1 in Section 2.3, deferring some lemmas and details to the final sections.

## 2 Main Result

Throughout the remainder of the paper, will denote an upper Hessenberg matrix, an upper bound on its eigenvector condition number and and a power of two, which the reader may consider for concreteness to be on the order of ; all logarithms will be taken base two for simplicity. As above, we use and to denote the lower-right corner of and its characteristic polynomial respectively.

We will use the geometric mean of the last subdiagonal entries of the to track convergence of the Shifted QR iteration, since we are guaranteed -decoupling once this quantity is smaller than . More explicitly, define the potential of to be

 ψk(H):=|hn−k,n−k−1⋯hn,n−1|1k.

Fixing some , we will show that our shifting strategy guarantees potential reduction: the efficient computation of a Hessenberg matrix , unitarily equivalent to , with the property that

 ψk(ˆH)≤(1−γ)ψk(H). (7)

Since , it follows immediately that we can achieve -decoupling in iterations. Note that the relationship (6) between and is not required for the proof of potential reduction, but impacts the cost of performing each iteration. The table below collates several constants which will appear throughout the paper.

We assume black box access to a routine for efficiently performing a QR step in arithmetic operations rather than . [Implicit QR Algorithm] For , an efficient implicit QR algorithm takes as inputs a Hessenberg matrix and a polynomial and outputs a Hessenberg matrix satisfying

 ˆH=Q∗HQ,

where

is a unitary matrix such that

for some upper triangular matrix , as well as the number whenever is invertible. It runs in at most operations. See e..g [Wat08, Section 3] for a proof in exact arithmetic of the existence of an efficient implicit QR algorithm (we have simplified the stronger bound shown there to ). We defer a discussion of numerically stable implementations of to our companion paper [BGVS21].

### 2.1 Basic Lemmas and Approximate Functional Calculus

Before introducing and analyzing our shifting strategy, we pause to prove three simple and essential lemmas relating the potential , the Hessenberg structure of , its eigenvector condition number , and certain measures associated with . The first is well known and gives a variational characterization of the potential (see [TBI97, Theorem 34.1]).

[Variational Formula for ] Let be any Hessenberg matrix. Then, for any

 ψk(H)=minp∈\calPk∥e∗np(H)∥1k,

with the minimum attained for .

###### Proof.

Since is upper Hessenberg, for any polynomial we have

 p(H)n,n−j=⎧⎪⎨⎪⎩p(H(k))k,k−j+1j=0,…,k−1,hn−k,n−k−1⋯hn,n−1j=k,0j≥k+1.

Thus for every such ,

 minp∈\calPk∥e∗np(H)∥≥|hn−k,n−k−1⋯hn,n−1|=ψk(H)k,

and the bound will be tight for any polynomial whose application to zeroes out the last row; by Cayley-Hamilton, the matrix is identically zero. ∎

It will be useful to have a mechanism for proving upper bounds on the potential of produced from by an implicit QR step. To this end, let and define

 τp(H):=∥e∗np(H)−1∥−1k, (8)

when is invertible, and otherwise. The special case of this quantity has been used to great effect in previous work studying linear shifts (e.g. [HP78]), and our next lemma shows that it bounds the potential of for shift polynomials of arbitrary degree.

[Upper Bounds on ] Let be a Hessenberg matrix, a monic polynomial of degree and . Then

 ψk(ˆH)≤τp(H).
###### Proof.

Assume first that

is singular. In this case for any QR decomposition

, the entry , and because , the last row of is zero as well. In particular . When is invertible, applying Lemma 2.1 and using repeatedly that is unitary, is triangular, and ,

 ψk(ˆH)k≤∥e∗np(ˆH)∥=∥e∗nQ∗p(H)∥=∥e∗nR∥=∥e∗nR−1Q∗∥−1=∥e∗np(H)−1∥−1=τp(H)k.

Lemma 2.1 ensures that given , we can reduce the potential with an implicit QR step by producing a polynomial with

. To do so, we will require a final lemma relating quantities of this form to the moments of a certain measure associated to

which quantifies the overlap of the vector with the left eigenvectors of .

The following notation will be used extensively throughout the paper. Assume that is diagonalizable, with chosen so that , a diagonal matrix with , the eigenvalues of . Write

for the random variable

131313Our shifting strategy is deterministic, but we use random variables rather than measures for notational convenience. supported on the eigenvalues of , with distribution

 P[ZH=λi]=|e∗nVei|2∥e∗nV∥2

so that exactly when is a left eigenvector with eigenvalue .

When is normal, the distribution of is the spectral measure of associated to , and by the functional calculus we have , meaning that the (inverse) moments of are observable to us even without knowing the true eigenvectors and eigenvalues of . The following lemma generalizes this fact to the nonnormal case, at a multiplicative cost of .

[Approximate Functional Calculus] For any upper Hessenberg and complex function whose domain includes the eigenvalues of ,

 ∥e∗nf(H)∥κV(H)≤E[|f(ZH)|2]12≤κV(H)∥e∗nf(H)∥.
###### Proof.

By the definition of above,

 E[|f(ZH)|2]12=∥e∗nf(H)V∥∥e∗nV∥≤∥e∗nf(H)∥∥V∥∥V−1∥=∥e∗nf(H)∥κV(H),

and the left hand inequality is analogous. ∎

Using this lemma with some carefully chosen rational functions of degree , we are able to probe the distribution of for each iterate of the algorithm by examining the observable quantities — for appropriately large , these are related to by a multiplicative factor of , so we obtain accurate information about , which enables a precise understanding of convergence. Since the iterates are all unitarily similar, is preserved with each iteration, so the required is an invariant of the algorithm. Thus the use of a sufficiently high-degree shifting strategy is both an essential feature and unavoidable cost of our approach.

### 2.2 Promising Ritz Values and Almost Monotonicity of the Potential

In the same spirit as Wilkinson’s shift, which chooses a particular Ritz value (out of two), but using a different criterion, our shifting strategy will begin by choosing a Ritz value (out of ) that has the following property for some . [-promising Ritz value] Let , be a set of -approximate Ritz values for , and . We say that is -promising if

 E1|ZH−r|k≥1αkE1|p(ZH)|. (9)

Note that there is at least one -promising Ritz value in every set of approximate Ritz values, since

 1kk∑i=1E1|ZH−ri|k=E1kk∑i=11|ZH−ri|k≥E1|p(ZH)| (10)

by linearity of expectation and AM/GM. The notion of -promising Ritz value is a relaxation which can be computed efficiently from the entries of (in fact, as we will explain in Section 2.4, using a small number of implicit QR steps with Francis-like shifts of degree ).

As a warm-up for the analysis of the shifting strategy, we will first show that if and is a promising Ritz value, the potential is almost monotone under the shift . This justifies the intuition from Section 1.2 and suggests that promising Ritz values should give rise to good polynomial shifts, but is not actually used in the proof of our main theorem. Subsequent proofs will instead use the stronger property (11) established below.

[Almost-monotonicity and Moment Comparison] Let be a set of -approximate Ritz values and assume that is -promising. If then

 ψk(ˆH)≤κV(H)2kα(1+θ)ψk(H),

and moreover

 (11)
###### Proof.

Let . The claim follows from the following chain of inequalities:

 √E[|ZH−r|−2k] ≥E[|ZH−r|−k] Jensen, x↦x2 ≥1αkE[|p(ZH)|−1] r is α-promising ≥1αk1√E[|p(ZH)|2] Jensen, x↦x2 ≥1αk1∥e∗np(H)∥κV(H) Lemma 2.1 ≥1αk1(1+θ)k∥e∗nχk(H)∥κV(H) Definition 1.1 of θ-approximate =1αk1(1+θ)kψk(H)kκV(H) Lemma ???.

This already shows (11). For the other claim, rearrange both extremes of the above inequality to get

 α(1+θ)κV(H)1kψk(H) ≥E[|ZH−r|−2k]−12k ≥τ(z−r)k(H)κV(H)1k Lemma 2.1 ≥ψk(ˆH)κV(H)1k Lemma 2.1

which concludes the proof. ∎

In Section 2.3, we will see that when the shift associated with a promising Ritz value does not reduce the potential, Lemma 2.2 can be used to provide a two-sided bound on the quantities and . This is the main ingredient needed to obtain information about the distribution of when potential reduction is not achieved.

### 2.3 The Shifting Strategy

An important component of our shifting scheme, discussed in detail in Section 2.4, is a simple subroutine, “,” guaranteed to produce an -promising Ritz value with . Guarantees for this subroutine are stated in the lemma below and proved in Section 2.4.

[Guarantees for ] The subroutine specified in Section 2.4 produces a -promising Ritz value, using at most arithmetic operations.

Our strategy is then built around the following dichotomy, which crucially uses the -promising property: in the event that a degree implicit QR step with the -promising Ritz value output by does not achieve potential reduction, we show that there is a modestly sized set of exceptional shifts, one of which is guaranteed to achieve potential reduction. These exceptional shifts are constructed by the procedure “” described in Section 2.5. The overall strategy is specified below.

 Shk,B

Input: Hessenberg and a set of -approximate Ritz values of
Output: Hessenberg .
Requires: and
Ensures: and

1. If , output

2. Else,

3. For each , if , output

The failure of line (2) of to reduce the potential gives useful quantitative information about the distribution of , articulated in the following lemma. This will then be used to design the set of exceptional shifts produced by in line (3) and prove that at least one of them makes progress in line (4). [Stagnation Implies Support] Let and let be a set of -approximate Ritz values of . Suppose is -promising and assume

 ψk(iqr(H,(z−r)k))≥(1−γ)ψk(H)>0. (12)

Then is well-supported on an disk of radius approximately centered at in the following sense: for every :

 P⎡⎢⎣|ZH−r|≤(1+θ)α(κV(H)t)1kψk(H)⎤⎥⎦≥(1−t)2(1−γ)2kα2k(1+θ)2kκV(H)4. (13)
###### Proof.

Observe that is invertible since otherwise, for , we would have by Lemma 2.1. Our assumption implies that that:

 (1−γ)ψk(H) ≤ψk(ˆH) hypothesis ≤τ(z−r)k(H) Lemma 2.1 =∥e∗n(H−r)−k∥−1k definition ≤⎛⎜ ⎜ ⎜⎝κV(H)E[|ZH−r|−2k]12⎞⎟ ⎟ ⎟⎠1/k Lemma ???.

Rearranging and using (11) from Lemma 2.2 we get

 κV(H)2(1−γ)2kψk(H)2k≥E[|ZH−r|−2k]≥E[|ZH−r|−k]2≥1α2k(1+θ)2kψk(H)2kκV(H)2, (14)

which upon further rearrangement yields the “reverse Jensen” type bound:

 E[|ZH−r|−2k]E[|ZH−r|−k]2≤α2k(1+θ)2k(1−γ)2kκV(H)4. (15)

We now have

 P[|ZH−r|≤αt1/k(1+θ)ψk(H)κ1/kV] =P[|ZH−r|−k≥t1αk(1+θ)kψk(H)kκV] ≥P[|ZH−r|−k≥tE[|ZH−r|−k]] by (14) ≥(1−t)2E[|ZH−r|−k]2E[|ZH−r|−2k] Paley-Zygmund ≥(1−t)2(1−γ)2kα2k(1+θ)2kκV(H)4 by (???),

establishing (13), as desired. ∎

In Section 2.5, we will use Lemma 2.3 to prove the following guarantee on .

[Guarantees for ] The subroutine specified in Section 2.5 produces a set of exceptional shifts, one of which achieves potential reduction. If and , then

 |\calS|≤130B8logk+8k−1,

and requires at most arithmetic operations.

[Improving the Disk to an Annulus] Control on the other tail of can be achieved by using Markov’s inequality and the upper bound (15) on the inverse moment . Then, for , the control on both tails yields that the distribution of has significant mass on a thin annulus (the inner and outer radii are almost the same).141414 We note in passing (cf. [Par66]) that when is normal, , , and , the above arguments can be modified to show that, under the assumption of Lemma 2.3, is fully supported on a circle with center and radius , and hence is a unitary matrix. In this scenario one can take a net with fewer elements when calling the exceptional shift, which would reduce the running time of . However, following this path would complicate the analysis and for the sake of exposition we decided to not pursue it any further.

We are now ready to prove Theorem 1.1.

###### Proof of Theorem 1.1.

Rapid convergence. Assume that is the chosen -promising Ritz value in step (1), but that step (2) does not achieve potential reduction in the sense of (7). Combining Lemmas 2.3 and Lemma 2.3 we get there is an such that

 τ(z−s)k(H)−k≥1ψ(H)k(1−γ)k.

Then, if by Lemma 2.1 we have

 ψ(ˆH)−k≥τ(z−s)k(H)−k,

which combined with the above implies that that shifting by incurs potential reduction.

Since we have shown that each iteration decreases the potential by a factor of at least , and , we need at most

 log(1/δ)log(1/(1−γ))≤4log(1/δ)

iterations before , which in particular implies -decoupling.

Arithmetic Complexity. Computing a full set of -approximate Ritz values of has a cost . Then, using an efficient implicit QR algorithm (cf. Definition 2) each computation of has a cost of . By Lemma 2.3, we can produce a promising Ritz value in at most arithmetic operations. Then, in the event that the promising shift fails to reduce the potential the algorithm calls which produces a net in at most arithmetic operations. Checking potential reduction for each point in the net has a cost of operations. A loose bound on the cost of the last two steps is . ∎

### 2.4 Efficiently Finding a Promising Ritz Value

In this section we show how to efficiently find a promising Ritz value, in arithmetic operations. Note that it is trivial to find a -promising Ritz value in