Eigenvalue perturbation theory is an old topic dating originally to the work of Rayleigh in the 19th century. Broadly speaking, there are two main streams of research. The most classical is analytic perturbation theory (APT), where one considers the behavior of eigenvalues of a matrix or linear operator that is an analytic function of one or more parameters. Authors of well-known books describing this body of work include Kato [Kat66, Kat76, Kat82, Kat95] (see footnote 1), Rellich [Rel69], Chatelin [Cha11], Baumgärtel [Bau85] and, in textbook form, Lancaster and Tismenetsky [LT85]. Kato [Kat82, p. XII] and Baumgärtel [Bau85, p. 21] explain that it was Rellich who first established in the 1930s that when a Hermitian matrix or self-adjoint linear operator with an isolated eigenvalue $\lambda$ of multiplicity $m$ is subjected to a real analytic perturbation, that is, a convergent power series in a real parameter $\varepsilon$, then (1) it has exactly $m$ eigenvalues converging to $\lambda$ as $\varepsilon \to 0$, (2) these eigenvalues can also be expanded in convergent power series in $\varepsilon$ and (3) the corresponding eigenvectors can be chosen to be mutually orthogonal and may also be written as convergent power series. As Kato notes, these results are exactly what were anticipated by Rayleigh, Schrödinger and others, but to prove them is by no means trivial, even in the finite-dimensional case.
Footnote 1: The first edition of Kato’s masterpiece Perturbation Theory for Linear Operators was published in 1966 and a revised second edition appeared in 1976. The most recent edition is the 1995 reprinting of the second edition with minor corrections. Most of this book is concerned with linear operators, but the first two chapters treat the finite-dimensional case of matrices, and these appeared as a stand-alone short version in 1982. Since we are only concerned with matrices in this article, our references to Kato’s book are to the 1982 edition, although in any case the equation numbering is consistent across all editions.
The second stream of research is largely due to the numerical linear algebra (NLA) community. It is mostly restricted to matrices and generally concerns perturbation bounds rather than expansions, describing how to bound the change in the eigenvalues and associated eigenvectors or invariant subspaces when a given matrix is subjected to a perturbation with a given norm and structure. Here there is a wide variety of well-known results due to many of the founders of matrix analysis and numerical linear algebra: Gerschgorin, Hoffman and Wielandt, Mirsky, Lidskii, Ostrowski, Bauer and Fike, Henrici, Davis and Kahan, Varah, Ruhe, Stewart, Elsner, Demmel and others. These are discussed in many books, of which the most comprehensive include those by Wilkinson [Wil65], Stewart and Sun [SS90], Bhatia [Bha97, Bha07] and Stewart [Ste01], as well as Chatelin [Cha12], which actually covers both the APT and the NLA streams of research in some detail. See also the survey by Li [Li14]. An important branch of the NLA stream concerns the pseudospectra of a matrix; see the book by Trefethen and Embree [TE05] and the Pseudospectra Gateway web site [ET].
This paper is inspired by both the APT and the NLA streams of research, and its scope is limited to an important special case: first-order perturbation analysis of a simple eigenvalue and the corresponding right and left eigenvectors of a general square matrix, not assumed to be Hermitian or normal. The eigenvalue result is well known to a broad scientific community. The treatment of eigenvectors is more complicated, with a perturbation theory that is not so well known outside a community of specialists. We give two different proofs of the main eigenvector perturbation theorem. The first, inspired by the NLA research stream and based on the implicit function theorem, has apparently not appeared in the literature in this form. The second, based on complex function theory and on eigenprojectors, as is standard in APT, is largely a simplified version of results in the literature that are well known. The second derivation uses a convenient normalization of the right and left eigenvectors that depends on the perturbation parameter, but although this dates back to the 1950s, it is rarely discussed in the literature. We then show how the eigenvector perturbation theory is easily extended to handle other normalizations that are often used in practice. We also explain how to verify the perturbation results computationally. In the final section, we illustrate the difficulties introduced by multiple eigenvalues with two illuminating examples, and give references to work on perturbation of invariant subspaces corresponding to multiple or clustered eigenvalues.
2 First-order perturbation theory for a simple eigenvalue
Throughout the paper we use $\|\cdot\|$ to denote the vector or matrix 2-norm, $I_n$ to denote the identity matrix of order $n$, the superscript $T$ to denote transpose and $*$ to denote complex conjugate transpose. Greek lower case letters denote complex scalars. Latin lower case letters denote complex vectors, with the exception of $i$ for the imaginary unit and $j$, $k$, $\ell$, $m$, $n$ for integers. Upper case letters denote complex matrices or, in some cases, sets in the complex plane. We begin with an assumption that also serves to establish our notation.
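Assumption 1. Let $A_0 \in \mathbb{C}^{n\times n}$ have a simple eigenvalue $\lambda_0$ corresponding to right eigenvector $x_0$ (so $A_0 x_0 = \lambda_0 x_0$ with $x_0 \neq 0$) and left eigenvector $y_0$ (so $y_0^* A_0 = \lambda_0 y_0^*$ with $y_0 \neq 0$), normalized so that $y_0^* x_0 = 1$. Let $\tau_0 \in \mathbb{C}$ and let $A(\tau)$ be a complex-valued matrix function of a complex parameter $\tau$ that is analytic in a neighborhood of $\tau_0$, satisfying $A(\tau_0) = A_0$.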
The normalization $y_0^* x_0 = 1$ is always possible since the right and left eigenvectors corresponding to a simple eigenvalue cannot be orthogonal. Note that since $x_0$ and $y_0$ are unique only up to scalings, we may multiply $x_0$ by any nonzero complex scalar provided we also scale $y_0$ by the reciprocal of the conjugate of that scalar, so that $y_0^* x_0$ remains equal to one. The use of the complex conjugate transpose in $y_0^*$ instead of an ordinary transpose is purely a convention that is often, but not universally, followed. The statement that the matrix $A(\tau)$ is analytic means that each entry of $A(\tau)$ is analytic (equivalently, complex differentiable or holomorphic) in $\tau$ in a neighborhood of $\tau_0$.
The most basic result in eigenvalue perturbation theory follows.
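Theorem 1 (Eigenvalue Perturbation Theorem). Under Assumption 1, $A(\tau)$ has a unique eigenvalue $\lambda(\tau)$ that is analytic in a neighborhood of $\tau_0$, with $\lambda(\tau_0) = \lambda_0$ and with
$$\lambda'(\tau_0) = y_0^* A'(\tau_0)\, x_0, \tag{1}$$
where $\lambda'(\tau_0)$ and $A'(\tau_0)$ are respectively the derivatives of $\lambda(\tau)$ and $A(\tau)$ at $\tau = \tau_0$.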
The proof appears in the next section.
The quantity $\chi = \|x_0\|\,\|y_0\|$, introduced by [Wil65], is called the eigenvalue condition number for $\lambda_0$. We have $|\lambda'(\tau_0)| \le \chi\,\|A'(\tau_0)\|$. In the real case, $\chi$ is the reciprocal of the cosine of the angle between $x_0$ and $y_0$. In the special case that $A_0$ is Hermitian, its right and left eigenvectors coincide so $\chi = 1$, but in this article we are concerned with general square matrices.
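The first-order eigenvalue formula and the condition number bound can be checked numerically. The following is a minimal sketch, assuming the linear family $A(\tau) = A_0 + \tau E$ with $\tau_0 = 0$ (so that $A'(\tau_0) = E$); the test matrix, the names `E`, `eig_near` and `chi`, and the step size are illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
# Hypothetical test matrix with known simple eigenvalues 1, 2, 3, 4.
V = np.eye(n) + 0.1 * rng.standard_normal((n, n))
A0 = V @ np.diag([1.0, 2.0, 3.0, 4.0]) @ np.linalg.inv(V)
E = rng.standard_normal((n, n))      # plays the role of A'(tau0)

lam0 = 2.0
w, X = np.linalg.eig(A0)
k = int(np.argmin(np.abs(w - lam0)))
x0 = X[:, k]                          # right eigenvector
y0 = np.linalg.inv(X)[k, :].conj()    # left eigenvector, scaled so y0^* x0 = 1

def eig_near(A, target):
    """Eigenvalue of A closest to target."""
    vals = np.linalg.eigvals(A)
    return vals[np.argmin(np.abs(vals - target))]

# Central finite difference of lambda(tau) at tau0 = 0.
h = 1e-6
fd = (eig_near(A0 + h * E, lam0) - eig_near(A0 - h * E, lam0)) / (2 * h)

pred = y0.conj() @ E @ x0                       # y0^* A'(tau0) x0
chi = np.linalg.norm(x0) * np.linalg.norm(y0)   # eigenvalue condition number
print("finite difference vs formula:", abs(fd - pred))
print("condition number:", chi)
```

The finite-difference quotient should agree with $y_0^* E x_0$ up to discretization and rounding error, and $|y_0^* E x_0|$ should not exceed $\chi\,\|E\|$.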
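In the APT research stream, instead of eigenvectors, the focus is mostly on the eigenprojector (see footnote 2) corresponding to $\lambda_0$, which can be defined as
$$\Pi_0 = x_0 y_0^*, \tag{2}$$
and which satisfies
$$A_0 \Pi_0 = \lambda_0 \Pi_0 = \Pi_0 A_0 \quad\text{and}\quad \Pi_0^2 = \Pi_0.$$
Note that the eigenprojector does not depend on the normalization used for the eigenvectors $x_0$ and $y_0$ (assuming $y_0^* x_0 = 1$), which simplifies the associated perturbation theory, and note also that $\chi = \|\Pi_0\|$. Let $\mathrm{tr}$ denote trace and recall the property $\mathrm{tr}(XY) = \mathrm{tr}(YX)$. Clearly, equation (1) is equivalent to
$$\lambda'(\tau_0) = \mathrm{tr}\big(\Pi_0 A'(\tau_0)\big). \tag{3}$$
Footnote 2: In APT, the standard term is “eigenprojection”, while in NLA, “spectral projector” is often used. The somewhat nonstandard term “eigenprojector” is a compromise.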
Kato [Kat82, p. XIII] explains that the results of Rellich for analytic perturbations of self-adjoint linear operators were extended (by Sz-Nagy, Kato and others) to non-self-adjoint linear operators and therefore non-Hermitian matrices in the early 1950s using complex function theory, so (3), equivalently (1), was known at that time. However, it seems that these results were not well known until the publication of the first edition of Kato’s book in 1966 (although Kato did present a summary of these results for the linear case at a conference on matrix computations [Giv58, p. 104] in 1958). Eq. (1) was independently obtained for the analytic case by Lancaster [Lan64], and for the linear case by Wilkinson [Wil65, p. 68–69] and Lidskii [Lid66]. They all used the theory of algebraic functions to obtain their results, exploiting the property that eigenvalues are roots of the characteristic polynomial. A different technique is used by Stewart and Sun [SS90, p. 185] who show that the eigenvalue is differentiable w.r.t. its matrix argument using a proof depending on Gerschgorin circles; the result for a differentiable family $A(\tau)$ then follows from the ordinary chain rule.
We close this section with a brief discussion of multiple eigenvalues. The algebraic multiplicity of $\lambda_0$ is the multiplicity of the factor $(\lambda - \lambda_0)$ in the characteristic polynomial $\det(A_0 - \lambda I_n)$, while the geometric multiplicity (which is always less than or equal to the algebraic multiplicity) is the number of associated linearly independent right (equivalently, left) eigenvectors. A simple eigenvalue has both algebraic and geometric multiplicity equal to one. More generally, if the algebraic and geometric multiplicity are equal, the eigenvalue is said to be semisimple or nondefective. An eigenvalue whose geometric multiplicity is one is called nonderogatory.
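The terminology above can be illustrated on small matrices. The following is a minimal sketch; the helper `multiplicities` and its tolerance are hypothetical, and counting computed eigenvalues within a tolerance is only a rough numerical stand-in for the algebraic multiplicity, reliable here because the examples are exactly triangular.

```python
import numpy as np

def multiplicities(A, lam, tol=1e-6):
    """Return (algebraic, geometric) multiplicity of lam, estimated numerically."""
    n = A.shape[0]
    # Algebraic: number of computed eigenvalues within tol of lam.
    algebraic = int(np.sum(np.abs(np.linalg.eigvals(A) - lam) < tol))
    # Geometric: dimension of the null space of A - lam*I.
    geometric = int(n - np.linalg.matrix_rank(A - lam * np.eye(n)))
    return algebraic, geometric

D = np.diag([2.0, 2.0, 5.0])                   # 5 is simple, 2 is semisimple
J2 = np.array([[2.0, 1.0], [0.0, 2.0]])        # 2 is defective and nonderogatory
J3 = np.array([[2.0, 1.0, 0.0],
               [0.0, 2.0, 0.0],
               [0.0, 0.0, 2.0]])               # 2 is defective and derogatory

for name, A, lam in [("D", D, 5.0), ("D", D, 2.0), ("J2", J2, 2.0), ("J3", J3, 2.0)]:
    print(name, lam, multiplicities(A, lam))
```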
3 First-order perturbation theory for an eigenvector corresponding to a simple eigenvalue
We begin this section with a basic result from linear algebra; see [Ste01, Theorem 1.18 and eq. (3.10)] for a proof.
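Lemma 1. Suppose Assumption 1 holds. There exist matrices $X_1 \in \mathbb{C}^{n\times(n-1)}$, $Y_1 \in \mathbb{C}^{n\times(n-1)}$ and $B_1 \in \mathbb{C}^{(n-1)\times(n-1)}$ satisfying
$$X = [x_0,\ X_1], \quad Y = [y_0,\ Y_1], \quad Y^* X = I_n, \quad Y^* A_0 X = \begin{bmatrix} \lambda_0 & 0 \\ 0 & B_1 \end{bmatrix}. \tag{4}$$
Note that, from $Y^* X = I_n$, it is immediate that the columns of $X_1$ and $Y_1$ respectively span the null spaces of $y_0^*$ and $x_0^*$. Furthermore, we have
$$X Y^* = x_0 y_0^* + X_1 Y_1^* = I_n.$$
We also have $A_0 X_1 = X_1 B_1$ and $Y_1^* A_0 = B_1 Y_1^*$, so the columns of $X_1$ and $Y_1$ are respectively bases for right and left $(n-1)$-dimensional invariant subspaces of $A_0$, and $X_1 Y_1^* = I_n - \Pi_0$, where $I_n - \Pi_0$ is the complementary projector to $\Pi_0$. If we assume that $A_0$ is diagonalizable, i.e., with $n$ linearly independent eigenvectors, then we can take the columns of $X_1$ and of $Y_1$ to respectively be right and left eigenvectors corresponding to the eigenvalues of $A_0$ that differ from $\lambda_0$, which we may denote by $\lambda_1, \ldots, \lambda_{n-1}$ (some of which could coincide, as diagonalizability implies only that the eigenvalues are semisimple, not that they are simple). In this case, we can take $B_1$ to be the diagonal matrix $\mathrm{diag}(\lambda_1, \ldots, \lambda_{n-1})$. More generally, however, $X_1$ and $Y_1$ may be any matrices satisfying (4), ignoring the multiplicities and Jordan structure of the other eigenvalues.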
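Since $\lambda_0$ is a simple eigenvalue, it is not an eigenvalue of $B_1$, so $B_1 - \lambda_0 I_{n-1}$ is invertible and we can define
$$S = X_1 (B_1 - \lambda_0 I_{n-1})^{-1} Y_1^*. \tag{5}$$
It then follows that
$$\Pi_0 S = S \Pi_0 = 0 \quad\text{and}\quad (A_0 - \lambda_0 I_n) S = S (A_0 - \lambda_0 I_n) = I_n - \Pi_0.$$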
In the NLA stream of research, $S$ is called the group inverse of $A_0 - \lambda_0 I_n$ [SS90, p. 240–241], [MS88], [CM91], [GO11, Theorem 5.2]. In the APT research stream, it is called the reduced resolvent matrix of $A_0$ w.r.t. the eigenvalue $\lambda_0$ (see [Kat82, eqs. I.5.28 and II.2.11]; see also footnote 3).
Footnote 3: The notion of group inverse or reduced resolvent extends beyond the simple eigenvalue context to multiple eigenvalues. If $\lambda_0$ is a defective eigenvalue with a nontrivial Jordan structure, the reduced resolvent matrix of $A_0$ with respect to $\lambda_0$ must take account of “eigennilpotents”. It is the same as the Drazin inverse of $A_0 - \lambda_0 I_n$, a generalization of the group inverse (see [Cha12, p. 98], [CM91] and, for a method to compute the Drazin inverse, [GOS15]).
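The algebraic identities satisfied by the group inverse can be verified directly. The sketch below assumes the diagonalizable case and the definition $S = X_1 (B_1 - \lambda_0 I)^{-1} Y_1^*$ with $\Pi_0 = x_0 y_0^*$; the test matrix and variable names are illustrative. It also checks the closed form $S = (A_0 - \lambda_0 I + \Pi_0)^{-1} - \Pi_0$, which follows from those identities and avoids forming $X_1$, $Y_1$ and $B_1$ explicitly.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
# Hypothetical diagonalizable test matrix with eigenvalues 1, 2, 3, 4.
V = np.eye(n) + 0.1 * rng.standard_normal((n, n))
A0 = V @ np.diag([1.0, 2.0, 3.0, 4.0]) @ np.linalg.inv(V)

w, X = np.linalg.eig(A0)
Ystar = np.linalg.inv(X)                  # Y^*, so that Y^* X = I
k = int(np.argmin(np.abs(w - 2.0)))       # index of the simple eigenvalue lambda0
rest = [j for j in range(n) if j != k]

lam0 = w[k]
x0, y0star = X[:, k], Ystar[k, :]
X1, Y1star = X[:, rest], Ystar[rest, :]
B1 = np.diag(w[rest])                     # diagonalizable case: B1 is diagonal

I = np.eye(n)
Pi0 = np.outer(x0, y0star)                # eigenprojector x0 y0^*
S = X1 @ np.linalg.inv(B1 - lam0 * np.eye(n - 1)) @ Y1star
M = A0 - lam0 * I

checks = {
    "Pi0 S = 0": np.allclose(Pi0 @ S, 0),
    "S Pi0 = 0": np.allclose(S @ Pi0, 0),
    "M S = I - Pi0": np.allclose(M @ S, I - Pi0),
    "S M = I - Pi0": np.allclose(S @ M, I - Pi0),
    "group inverse: M S M = M": np.allclose(M @ S @ M, M),
    "group inverse: S M S = S": np.allclose(S @ M @ S, S),
    "closed form": np.allclose(S, np.linalg.inv(M + Pi0) - Pi0),
}
for name, ok in checks.items():
    print(name, ok)
```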
We now give a first-order perturbation theorem for right and left eigenvectors corresponding to a simple eigenvalue.
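Theorem 2 (Eigenvector Perturbation Theorem). Suppose that Assumption 1 holds and define $X_1$, $Y_1$ and $B_1$ as in Lemma 1 and $S$ as in (5). Then there exist vector-valued functions $x(\tau)$ and $y(\tau)^*$ that are analytic in a neighborhood of $\tau_0$ with $x(\tau_0) = x_0$, $y(\tau_0) = y_0$ and $y(\tau)^* x(\tau) = 1$, satisfying the right and left eigenvector equations
$$A(\tau)\, x(\tau) = \lambda(\tau)\, x(\tau), \qquad y(\tau)^* A(\tau) = \lambda(\tau)\, y(\tau)^*, \tag{6}$$
where $\lambda(\tau)$ is the analytic function from Theorem 1. Furthermore, these can be chosen so that their derivatives, $x'(\tau_0)$ and $(y^*)'(\tau_0)$, satisfy $y_0^* x'(\tau_0) = 0$ and $(y^*)'(\tau_0)\, x_0 = 0$, with (see footnote 4)
$$x'(\tau_0) = -S A'(\tau_0)\, x_0, \tag{7}$$
$$(y^*)'(\tau_0) = -y_0^* A'(\tau_0)\, S. \tag{8}$$
Footnote 4: We use the notation $(y^*)'(\tau_0)$ to mean $\frac{d}{d\tau}\big(y(\tau)^*\big)\big|_{\tau=\tau_0}$.
Note that it is $y(\tau)^*$, not $y(\tau)$, that is analytic with respect to the complex parameter $\tau$. However, $y(\tau)$ is differentiable w.r.t. the real and imaginary parts of $\tau$. Note also that we do not claim that $x(\tau)$ and $y(\tau)$ are unique, even when they are chosen to satisfy (7) and (8). Sometimes, other normalizations of the eigenvectors, not necessarily satisfying (7) and (8), are preferred, as we shall discuss in §3.4.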
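The eigenvector derivative formulas can be checked by finite differences. The sketch below assumes the linear family $A(\tau) = A_0 + \tau E$ with $\tau_0 = 0$, the formulas $x'(\tau_0) = -S E x_0$ and $(y^*)'(\tau_0) = -y_0^* E S$, and the closed form $S = (A_0 - \lambda_0 I + \Pi_0)^{-1} - \Pi_0$ for the group inverse. The normalizations used here, $y_0^* x(\tau) = 1$ and $y(\tau)^* x_0 = 1$, are one convenient choice (an assumption of this sketch) that enforces $y_0^* x'(\tau_0) = 0$ and $(y^*)'(\tau_0) x_0 = 0$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
# Hypothetical test matrix with simple eigenvalues 1, 2, 3, 4.
V = np.eye(n) + 0.1 * rng.standard_normal((n, n))
A0 = V @ np.diag([1.0, 2.0, 3.0, 4.0]) @ np.linalg.inv(V)
E = rng.standard_normal((n, n))           # plays the role of A'(tau0)

w, X = np.linalg.eig(A0)
k = int(np.argmin(np.abs(w - 2.0)))
lam0, x0, y0star = w[k], X[:, k], np.linalg.inv(X)[k, :]   # y0^* x0 = 1
Pi0 = np.outer(x0, y0star)
S = np.linalg.inv(A0 - lam0 * np.eye(n) + Pi0) - Pi0       # group inverse

def right_vec(tau):
    """Right eigenvector of A0 + tau*E near x0, scaled so y0^* x(tau) = 1."""
    wt, Xt = np.linalg.eig(A0 + tau * E)
    x = Xt[:, np.argmin(np.abs(wt - lam0))]
    return x / (y0star @ x)

def left_vec_star(tau):
    """Row vector y(tau)^* of A0 + tau*E near y0^*, scaled so y(tau)^* x0 = 1."""
    wt, Xt = np.linalg.eig(A0 + tau * E)
    ystar = np.linalg.inv(Xt)[np.argmin(np.abs(wt - lam0)), :]
    return ystar / (ystar @ x0)

h = 1e-6
dx_fd = (right_vec(h) - right_vec(-h)) / (2 * h)
dystar_fd = (left_vec_star(h) - left_vec_star(-h)) / (2 * h)

dx = -S @ E @ x0                # predicted derivative of x
dystar = -y0star @ E @ S        # predicted derivative of y^*
print(np.linalg.norm(dx_fd - dx), np.linalg.norm(dystar_fd - dystar))
```

Both printed differences should be at the level of the finite-difference error.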
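It follows from Theorem 2 that
$$\|x'(\tau_0)\| \le \|S\|\,\|A'(\tau_0)\|\,\|x_0\| \le \kappa(X)\,\|(B_1 - \lambda_0 I_{n-1})^{-1}\|\,\|A'(\tau_0)\|\,\|x_0\|,$$
where $\kappa(X) = \|X\|\,\|X^{-1}\|$, the ordinary matrix condition number of $X$, equivalently of $Y$ (as $Y^* = X^{-1}$), with the same bound also holding for $\|(y^*)'(\tau_0)\|$ when $\|x_0\|$ is replaced by $\|y_0\|$.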
In the diagonalizable case, as already noted above, we can take $B_1 = \mathrm{diag}(\lambda_1, \ldots, \lambda_{n-1})$, so
$$\|x'(\tau_0)\| \le \kappa(X)\, \frac{\|A'(\tau_0)\|\,\|x_0\|}{\min_{1 \le j \le n-1} |\lambda_j - \lambda_0|},$$
with the same bound also holding for $\|(y^*)'(\tau_0)\|$ when $\|x_0\|$ is replaced by $\|y_0\|$. In this case, the formula (7) for the eigenvector derivative was given by Wilkinson [Wil65, p. 70]. He remarked (p. 109) that although his derivation is essentially classical perturbation theory, a simple but rigorous treatment did not seem to be readily available in the literature. Lancaster [Lan64] and Lidskii [Lid66] both showed that the perturbed eigenvector corresponding to a simple eigenvalue may be defined to be differentiable at $\tau_0$, but they did not give the first-order perturbation term. The books by Stewart and Sun [SS90, sec. V.2] and Stewart [Ste01, sec. 1.3 and 4.2] give excellent discussions of the issues summarized above as well as many additional related results. The eigenvector derivative formula (7) in Theorem 2 above is succinctly stated just below [Ste01, eq. (3.14), p. 46], where on the same page Theorem 3.11 stating it more rigorously, and providing additional bounds, is also given; see also [Ste01, line 4, p. 48]. The reader is referred to [Ste71] and [Ste73] for a proof. Stewart [Ste71] introduced the idea of establishing the existence of a solution to an algebraic Riccati equation by a fixed point iteration, a technique that was followed up in [Ste73, eq. (1.5), p. 730] and [Dem86, eq. (7.2), p. 187]. Alternatively, proofs of Theorem 2 may be derived by various approaches based on the implicit function theorem; see [Mag85, Sun85] and [Sun98, Sec. 2.1]. A related argument appears in [Lax07, Theorem 9.8]. These approaches generally focus on obtaining results for the right eigenvector subject to some normalization; they can also be applied to obtain results for the left eigenvector, and these can be normalized further to obtain the condition $y(\tau)^* x(\tau) = 1$. The proof that we give in §3.2 is also based on the implicit function theorem, using a block-diagonalization approach that obtains the perturbation results for the right and left eigenvectors simultaneously, ensuring that $y(\tau)^* x(\tau) = 1$.
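Note, however, that a fundamental difficulty with eigenvectors is their lack of uniqueness. In contrast, the eigenprojector is uniquely defined, and satisfies the following perturbation theorem.
Theorem 3 (Eigenprojector Perturbation Theorem). Suppose that Assumption 1 holds and define $S$ as in (5). Then there exists a matrix-valued function $\Pi(\tau)$ that is analytic in a neighborhood of $\tau_0$ with $\Pi(\tau_0) = \Pi_0$, satisfying the eigenprojector equations
$$A(\tau)\,\Pi(\tau) = \lambda(\tau)\,\Pi(\tau) = \Pi(\tau)\,A(\tau) \quad\text{and}\quad \Pi(\tau)^2 = \Pi(\tau), \tag{9}$$
and with derivative given by
$$\Pi'(\tau_0) = -\Pi_0 A'(\tau_0)\, S - S A'(\tau_0)\, \Pi_0. \tag{10}$$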
This result is well known in the APT research stream [Kat82, eq. (II.2.13)], and, like the eigenvalue perturbation result, goes back to the 1950s. Furthermore, while it’s easy to see how Theorem 3 can be proved using Theorem 2, it is also the case that Theorem 2 can be proved using Theorem 3, by defining the eigenvectors appropriately in terms of the eigenprojector, as discussed below. This provides a convenient way to define eigenvectors uniquely.
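Because the eigenprojector is uniquely defined, its derivative can be checked by finite differences without choosing any eigenvector normalization. The sketch below assumes the linear family $A(\tau) = A_0 + \tau E$ with $\tau_0 = 0$ and the first-order formula $\Pi'(\tau_0) = -\Pi_0 E S - S E \Pi_0$ (the form given in [Kat82, eq. (II.2.13)]), with the group inverse computed as $S = (A_0 - \lambda_0 I + \Pi_0)^{-1} - \Pi_0$; the test matrix and function names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
# Hypothetical test matrix with simple eigenvalues 1, 2, 3, 4.
V = np.eye(n) + 0.1 * rng.standard_normal((n, n))
A0 = V @ np.diag([1.0, 2.0, 3.0, 4.0]) @ np.linalg.inv(V)
E = rng.standard_normal((n, n))           # plays the role of A'(tau0)
lam0 = 2.0

def projector(tau):
    """Eigenprojector of A0 + tau*E for the eigenvalue nearest lam0."""
    wt, Xt = np.linalg.eig(A0 + tau * E)
    j = int(np.argmin(np.abs(wt - lam0)))
    # Column j of Xt times row j of inv(Xt): independent of eigenvector scaling.
    return np.outer(Xt[:, j], np.linalg.inv(Xt)[j, :])

Pi0 = projector(0.0)
S = np.linalg.inv(A0 - lam0 * np.eye(n) + Pi0) - Pi0   # group inverse

h = 1e-6
dPi_fd = (projector(h) - projector(-h)) / (2 * h)
dPi = -Pi0 @ E @ S - S @ E @ Pi0
print(np.linalg.norm(dPi_fd - dPi))
```

The derivative $\Pi'(\tau_0)$ has zero trace, consistent with $\mathrm{tr}\,\Pi(\tau) = 1$ for all $\tau$ near $\tau_0$.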
These results simplify in the Hermitian case, because then the right and left eigenvectors coincide. The results for the Hermitian case lead naturally to perturbation theory for singular values and singular vectors of a general rectangular matrix; see [Ste01, Sec. 3.3.1] and [Sun98, Sec. 3.1].
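If we assume that for $\tau$ in some neighborhood of $\tau_0$, the matrix $A(\tau)$ has an eigenvalue $\lambda(\tau)$ and corresponding right and left eigenvectors $x(\tau)$ and $y(\tau)$ with $y(\tau)^* x(\tau) = 1$ such that $\lambda(\tau)$, $x(\tau)$ and $y(\tau)^*$ are all analytic functions of $\tau$ satisfying $\lambda(\tau_0) = \lambda_0$, $x(\tau_0) = x_0$, and $y(\tau_0) = y_0$, then differentiating the equation $A(\tau) x(\tau) = \lambda(\tau) x(\tau)$ and setting $\tau = \tau_0$, we find
$$A'(\tau_0)\, x_0 + A_0\, x'(\tau_0) = \lambda'(\tau_0)\, x_0 + \lambda_0\, x'(\tau_0). \tag{11}$$
Multiplying on the left by $y_0^*$ and using $y_0^* A_0 = \lambda_0 y_0^*$ and $y_0^* x_0 = 1$, we obtain the formula for $\lambda'(\tau_0)$:
$$\lambda'(\tau_0) = y_0^* A'(\tau_0)\, x_0.$$
Equation (11) can be written in the form
$$(A_0 - \lambda_0 I_n)\, x'(\tau_0) = -\big(A'(\tau_0) - \lambda'(\tau_0) I_n\big)\, x_0. \tag{12}$$
Using Lemma 1, we can write
$$A_0 - \lambda_0 I_n = X \begin{bmatrix} 0 & 0 \\ 0 & B_1 - \lambda_0 I_{n-1} \end{bmatrix} Y^*,$$
and substituting this into (12) and multiplying on the left by $Y^*$, we find
$$\begin{bmatrix} 0 & 0 \\ 0 & B_1 - \lambda_0 I_{n-1} \end{bmatrix} \begin{bmatrix} y_0^* \\ Y_1^* \end{bmatrix} x'(\tau_0) = -\begin{bmatrix} y_0^* \\ Y_1^* \end{bmatrix} \big(A'(\tau_0) - \lambda'(\tau_0) I_n\big)\, x_0.$$
The first row equation here is $y_0^* \big(A'(\tau_0) - \lambda'(\tau_0) I_n\big) x_0 = 0$, which is simply the formula for $\lambda'(\tau_0)$. The remaining $n-1$ equations are
$$(B_1 - \lambda_0 I_{n-1})\, Y_1^* x'(\tau_0) = -Y_1^* \big(A'(\tau_0) - \lambda'(\tau_0) I_n\big)\, x_0,$$
and since $Y_1^* x_0 = 0$ and $B_1 - \lambda_0 I_{n-1}$ is invertible, we obtain the following formula for $Y_1^* x'(\tau_0)$:
$$Y_1^* x'(\tau_0) = -(B_1 - \lambda_0 I_{n-1})^{-1}\, Y_1^* A'(\tau_0)\, x_0.$$
Note that $x'(\tau_0)$ is not completely determined by this formula because each eigenvector $x(\tau)$ is determined only up to a multiplicative constant. If we can choose the scale factor in such a way that $y_0^* x'(\tau_0) = 0$ then, multiplying on the left by $X_1$ and recalling that $X_1 Y_1^* = I_n - x_0 y_0^*$, we obtain the formula in (7) for $x'(\tau_0)$:
$$x'(\tau_0) = -X_1 (B_1 - \lambda_0 I_{n-1})^{-1}\, Y_1^* A'(\tau_0)\, x_0 = -S A'(\tau_0)\, x_0.$$
Similarly, the formula (8) for $(y^*)'(\tau_0)$ can be derived assuming that we can choose $y(\tau)$ so that $(y^*)'(\tau_0)\, x_0 = 0$.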
In the following subsections, we establish the assumptions used here when $\lambda_0$ is a simple eigenvalue of $A_0$, and thus obtain proofs of Theorems 1, 2, and 3, in two different ways. The first involves finding equations that a similarity transformation must satisfy if it is to take $A(\tau)$ (or, more specifically, $Y^* A(\tau) X$) to a block diagonal form like that in Lemma 1 for $A_0$. The implicit function theorem (see footnote 5) is then invoked to show that these equations have a unique solution, for $\tau$ in some neighborhood of $\tau_0$, and that the solution is analytic in $\tau$. The second uses the argument principle and the residue theorem from complex analysis to establish that, for $\tau$ in a neighborhood of $\tau_0$, each matrix $A(\tau)$ has a simple eigenvalue $\lambda(\tau)$ that is analytic in $\tau$ and satisfies $\lambda(\tau_0) = \lambda_0$. It then follows from Lemma 1 that there is a similarity transformation taking $A(\tau)$ to block diagonal form, but Lemma 1 says nothing about analyticity or even continuity of the associated matrices $X(\tau)$ and $Y(\tau)$. Instead, the similarity transformation is applied to the resolvent and integrated to obtain an expression for the eigenprojector $\Pi(\tau)$ that is shown to be analytic in $\tau$. Finally, left and right eigenvectors satisfying the analyticity conditions along with the derivative formulas (7) and (8) are defined in terms of the eigenprojector.
Footnote 5: Since the perturbation parameter $\tau$ and the matrix family $A(\tau)$ are complex, we need a version of the implicit function theorem from complex analysis, but in the special case that $\tau$ and $A(\tau)$ are real, we could use a more familiar version from real analysis. In that case, although some of the eigenvalues and eigenvectors of a real matrix may be complex, they occur in complex conjugate pairs and are easily represented using real quantities.
Note that the assumptions used here do not generally hold when $\lambda_0$ is not a simple eigenvalue of $A_0$, as discussed in §4.
The first proof that we give is inspired by the NLA research stream, but instead of Stewart’s fixed-point iteration technique mentioned previously, we rely on the implicit function theorem [Kra01, Theorem 1.4.11], which we now state in the form that we need.
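Theorem 4 (Implicit Function Theorem). Let $D \subset \mathbb{C} \times \mathbb{C}^m$ be an open set, $h = (h_1, \ldots, h_m) : D \to \mathbb{C}^m$ an analytic mapping, and $(\tau_0, w_0) \in D$ a point where $h(\tau_0, w_0) = 0$ and where the Jacobian matrix $\big(\partial h_j / \partial w_k\big)_{j,k=1}^{m}$ is nonsingular. Then the system of equations $h(\tau, w) = 0$ has a unique analytic solution $w = w(\tau)$ in a neighborhood of $\tau_0$ that satisfies $w(\tau_0) = w_0$.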
We now exploit this result in our proof of Theorem 2. The setting of the stage before applying the implicit function theorem follows Demmel’s variant of Stewart’s derivation mentioned above. We obtain a proof of Theorem 1 along the way, and then give a proof of Theorem 3 as an easy consequence.
Using Lemma 1, define
Here the scalar , the row and column vectors and and the matrix are analytic functions of near , since is. In what follows, we will transform this matrix into a block diagonal matrix by a similarity transformation. We will choose , so that
with and , and consequently , , and , analytic in a neighborhood of , with , and hence . This transformation idea traces back to [Ste73, p.730] who designed a with , but in the form given here, it is due to [Dem86, p.187].
We would like to have, for sufficiently close to , the similarity transformation
where and are also analytic, with and . Since is block diagonal by definition, we need to be block diagonal. Suppressing the dependence on , this last matrix is given by
For clarity, we introduce the notation for the analytic row vector function . We then seek column and row vector analytic functions and making the off-diagonal blocks of (15) zero, i.e., satisfying
Taking , , and equal to first and then with equal to and , respectively, in Theorem 4, we note that since , , and the Jacobian matrices
where, using the definition (13), we have
again suppressing dependence on on the right-hand sides. These functions are analytic in a neighborhood of , satisfying and , with
proving Theorem 1.
Let be the first column of the identity matrix . Multiplying (14) on the left by and on the right by we obtain, using and ,
is analytic with and satisfies the left eigenvector equation in (6). Furthermore, , as claimed. Finally, differentiating and we have
The eigenprojector equations (9) follow immediately. We have
In this proof, in contrast to the previous one, we focus on proving Theorem 3 first, obtaining the proof of Theorem 1 along the way, and finally obtaining Theorem 2 as a consequence. This proof of Theorem 3 is based on complex function theory, as is standard in APT. However, our derivation is simpler than most given in the literature, which usually prove more general results, such as giving complete analytic expansions for the eigenvalue and eigenprojector, while we are concerned only with the first order term. The key to the last part of the proof, yielding Theorem 2, is to use an appropriate eigenvector normalization.
The main tool here is the residue theorem [MH06, p. 293-294, Thm. 8.1 and 8.2]:
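Theorem 5 (Residue Theorem). Let $D$ be a simply connected domain in $\mathbb{C}$ and let $\Gamma$ be a simple closed positively oriented contour that lies in $D$. If $f$ is analytic inside $\Gamma$ and on $\Gamma$, except at the points $\zeta_1, \ldots, \zeta_m$ that lie inside $\Gamma$, then
$$\int_\Gamma f(\zeta)\, d\zeta = 2\pi i \sum_{j=1}^{m} \mathrm{Res}[f, \zeta_j],$$
where if $f$ has a simple pole at $\zeta_j$, then
$$\mathrm{Res}[f, \zeta_j] = \lim_{\zeta \to \zeta_j} (\zeta - \zeta_j)\, f(\zeta),$$
and if $f$ has a pole of order $k$ at $\zeta_j$, then
$$\mathrm{Res}[f, \zeta_j] = \frac{1}{(k-1)!} \lim_{\zeta \to \zeta_j} \frac{d^{k-1}}{d\zeta^{k-1}} \Big( (\zeta - \zeta_j)^k f(\zeta) \Big).$$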
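Let $\Gamma$ be the boundary of an open set $\Delta$ in the complex plane containing the simple eigenvalue $\lambda_0$, with no other eigenvalues of $A_0$ in $\Delta \cup \Gamma$. First note that since $p(\zeta; \tau_0) = \det(A_0 - \zeta I_n)$, the characteristic polynomial of $A_0$, does not vanish on $\Gamma$, the same will hold for all polynomials with coefficients sufficiently close to those of $p(\cdot\,; \tau_0)$; in particular, it will hold for $p(\zeta; \tau) = \det(A(\tau) - \zeta I_n)$, the characteristic polynomial of $A(\tau)$, if $\tau$ is sufficiently close to $\tau_0$; say, $|\tau - \tau_0| \le \epsilon$. From here on, we always assume that $|\tau - \tau_0| \le \epsilon$. By the argument principle [MH06, p. 328, Thm. 8.8], the number of zeros of $p(\cdot\,; \tau)$ inside $\Gamma$ is
$$\frac{1}{2\pi i} \int_\Gamma \frac{p'(\zeta; \tau)}{p(\zeta; \tau)}\, d\zeta,$$
where the prime denotes differentiation with respect to $\zeta$. For $\tau = \tau_0$, this value is $1$. Since for each $\zeta \in \Gamma$, the integrand is a continuous function of $\tau$, the integral above is as well. Since it is integer-valued, it must be the constant $1$. So, let $\lambda(\tau)$ denote the unique root of $p(\cdot\,; \tau)$ in the region $\Delta$, i.e., the unique eigenvalue of $A(\tau)$ in $\Delta$. Note that this means that $p(\zeta; \tau)$ can be written in the form $(\zeta - \lambda(\tau))\, q(\zeta; \tau)$, where $q(\cdot\,; \tau)$ has no roots in $\Delta$. It therefore follows from the residue theorem that
$$\frac{1}{2\pi i} \int_\Gamma \zeta\, \frac{p'(\zeta; \tau)}{p(\zeta; \tau)}\, d\zeta = \lambda(\tau). \tag{20}$$
Since the left-hand side of (20) is an analytic function of $\tau$, the right-hand side is as well. Thus $A(\tau)$ has a unique eigenvalue $\lambda(\tau)$ in $\Delta$ and $\lambda(\tau)$ is an analytic function of $\tau$.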
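The two contour integrals in this argument can be evaluated numerically. The sketch below is illustrative only: it takes $\Gamma$ to be a circle of radius $0.5$ around $\lambda_0 = 2$, uses the family $A(\tau) = A_0 + \tau E$ at the hypothetical value $\tau = 0.01$, and approximates the integrals by the trapezoid rule. It assumes the convention $p(\zeta) = \det(A - \zeta I)$, for which $p'(\zeta)/p(\zeta) = -\mathrm{tr}\big((A - \zeta I)^{-1}\big)$.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4
# Hypothetical test matrix with simple eigenvalues 1, 2, 3, 4.
V = np.eye(n) + 0.1 * rng.standard_normal((n, n))
A0 = V @ np.diag([1.0, 2.0, 3.0, 4.0]) @ np.linalg.inv(V)
E = rng.standard_normal((n, n))
A = A0 + 0.01 * E                          # A(tau) for a small tau

# Circle Gamma of radius r around lambda0 = 2, discretized with N points.
c, r, N = 2.0, 0.5, 64
theta = 2 * np.pi * np.arange(N) / N
zeta = c + r * np.exp(1j * theta)
dzeta = 1j * r * np.exp(1j * theta) * (2 * np.pi / N)   # zeta'(theta) * dtheta

def log_derivative(z):
    """p'(z)/p(z) for p(z) = det(A - z I), computed as -trace((A - z I)^{-1})."""
    return -np.trace(np.linalg.inv(A - z * np.eye(n)))

g = np.array([log_derivative(z) for z in zeta])
count = np.sum(g * dzeta) / (2j * np.pi)          # number of eigenvalues inside Gamma
lam = np.sum(zeta * g * dzeta) / (2j * np.pi)     # the eigenvalue inside Gamma

w = np.linalg.eigvals(A)
lam_ref = w[np.argmin(np.abs(w - c))]
print(count, lam, lam_ref)
```

For analytic periodic integrands the trapezoid rule converges geometrically, so a modest number of quadrature points suffices here.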
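For $\zeta$ not an eigenvalue of $A(\tau)$, define the resolvent of $A(\tau)$ by
$$R(\zeta; \tau) = \big(A(\tau) - \zeta I_n\big)^{-1}. \tag{21}$$
Lemma 1 states that there exist right and left eigenvectors $x(\tau)$ and $y(\tau)$ associated with $\lambda(\tau)$ and satisfying $y(\tau)^* x(\tau) = 1$, along with matrices $X_1(\tau)$, $Y_1(\tau)$ and $B_1(\tau)$, satisfying
$$X(\tau) = [x(\tau),\ X_1(\tau)], \quad Y(\tau) = [y(\tau),\ Y_1(\tau)], \quad Y(\tau)^* X(\tau) = I_n, \quad Y(\tau)^* A(\tau) X(\tau) = \begin{bmatrix} \lambda(\tau) & 0 \\ 0 & B_1(\tau) \end{bmatrix}.$$
Note that we do not claim that $X(\tau)$ and $Y(\tau)$ are analytic, or even continuous functions of $\tau$. It follows that the resolvent of $A(\tau)$ satisfies
$$R(\zeta; \tau) = \big(\lambda(\tau) - \zeta\big)^{-1}\, \Pi(\tau) + X_1(\tau) \big(B_1(\tau) - \zeta I_{n-1}\big)^{-1} Y_1(\tau)^*, \tag{22}$$
where $\Pi(\tau) = x(\tau)\, y(\tau)^*$. Now, $X_1(\tau) \big(B_1(\tau) - \zeta I_{n-1}\big)^{-1} Y_1(\tau)^*$ is a matrix-valued function of $\zeta$ with no poles in $\Delta$, so it follows from the residue theorem (applied to the functions associated with each entry of this matrix) that its integral over $\Gamma$ is zero, and therefore from (22) that
$$-\frac{1}{2\pi i} \int_\Gamma R(\zeta; \tau)\, d\zeta = \Pi(\tau).$$
For $\tau = \tau_0$, this is $\Pi_0$.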
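The contour-integral representation of the eigenprojector can also be checked numerically. The sketch below assumes the resolvent convention $R(\zeta) = (A - \zeta I)^{-1}$, under which the eigenprojector is $-\frac{1}{2\pi i}\int_\Gamma R(\zeta)\, d\zeta$; the circle $\Gamma$, the test matrix and the trapezoid discretization are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4
# Hypothetical test matrix with simple eigenvalues 1, 2, 3, 4.
V = np.eye(n) + 0.1 * rng.standard_normal((n, n))
A = V @ np.diag([1.0, 2.0, 3.0, 4.0]) @ np.linalg.inv(V)

# Circle Gamma of radius r around the simple eigenvalue 2.
c, r, N = 2.0, 0.5, 64
theta = 2 * np.pi * np.arange(N) / N
zeta = c + r * np.exp(1j * theta)
dzeta = 1j * r * np.exp(1j * theta) * (2 * np.pi / N)

# Trapezoid rule for -(1/(2 pi i)) * integral of (A - zeta I)^{-1} over Gamma.
Pi_contour = np.zeros((n, n), dtype=complex)
for z, dz in zip(zeta, dzeta):
    Pi_contour += np.linalg.inv(A - z * np.eye(n)) * dz
Pi_contour *= -1.0 / (2j * np.pi)

# Reference: x0 y0^* from the eigendecomposition, with y0^* x0 = 1.
w, X = np.linalg.eig(A)
k = int(np.argmin(np.abs(w - c)))
Pi_ref = np.outer(X[:, k], np.linalg.inv(X)[k, :])
print(np.linalg.norm(Pi_contour - Pi_ref))
```

The computed matrix should be idempotent with trace one, as expected for the eigenprojector of a simple eigenvalue.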
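From the definition of the resolvent (21), it follows that since $A(\tau)$ is an analytic function of $\tau$, the resolvent $R(\zeta; \tau)$ is as well, provided that $\zeta$ is not an eigenvalue of $A(\tau)$. Differentiating the equation
$$\big(A(\tau) - \zeta I_n\big)\, R(\zeta; \tau) = I_n$$
with respect to $\tau$ gives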