 # Wilkinson's bus: Weak condition numbers, with an application to singular polynomial eigenproblems

We propose a new approach to the theory of conditioning for numerical analysis problems for which both classical and stochastic perturbation theory fail to predict the observed accuracy of computed solutions. To motivate our ideas, we present examples of problems that are discontinuous at a given input and have infinite classical and stochastic condition number, but where the solution is still computed to machine precision without relying on structured algorithms. Stimulated by the failure of classical and stochastic perturbation theory in capturing such phenomena, we define and analyse a weak worst-case and a weak stochastic condition number. This new theory is a more powerful predictor of the accuracy of computations than existing tools, especially when the worst-case and the expected sensitivity of a problem to perturbations of the input is not finite. We apply our analysis to the computation of simple eigenvalues of matrix polynomials, including the more difficult case of singular matrix polynomials. In addition, we show how the weak condition numbers can be estimated in practice.

## Authors

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

The condition number of a computational problem measures the sensitivity of with respect to perturbations in the input. If the problem is differentiable, then the condition number is the norm of the derivative of . In the case of solving systems of linear equations, the idea of conditioning dates back at least to the work of von Neumann and Goldstine Neumann1947 and Turing Turing1948 , who coined the term. For an algorithm computing in finite precision arithmetic, the importance of the condition number stems from the “rule of thumb” popularized by N. J. Higham (Higham1996, , §1.6),

The backward error is small if the algorithm computes the exact value of at a nearby input, and a small condition number would certify that this is enough to get a small overall error. Higham’s rule of thumb comes from a first order expansion, and in practice it often holds as an approximate equality and is valuable for practitioners who wish to predict the accuracy of numerical computations. Suppose that a solution is computed with, say, a backward error equal to . If then one would trust the computed value to have (at least) meaningful decimal digits.

The condition number can formally still be defined when is not differentiable, though it may not be finite. In particular, if is discontinuous at an input, then the condition number is ; a situation clearly beyond the applicability of Higham’s rule. Inputs for which the condition number is not finite are usually referred to as ill-posed. Based on the worst-case sensitivity one would usually only expect a handful of correct digits when evaluating a function at such an input, and quite possibly none.111The number of accurate digits that can be expected when requires a careful discussion. It depends on the unit roundoff , on the exact nature of the pathology of , and on . For example, computing the eigenvalues of a matrix similar to an Jordan block has an infinite condition number. Usually this translates into expecting only about accuracy, up to constants, when working in finite precision arithmetic. For a more complete discussion, see GW1976 , where pathological examples of derogatory matrices are constructed, whose eigenvalues are not sensitive to finite precision computations (for fixed ). For discontinuous , however, these subtleties alone cannot justify any accurately computed decimal digits. On the other hand, a small condition number is not a necessary condition for a small forward-backward error ratio: it is not inconceivable that certain ill-posed problems can be solved accurately despite having large or infinite condition number. Consider, for example, the problem of computing an eigenvalue of the matrix pencil (linear matrix polynomial)

 L(x)=⎡⎢ ⎢ ⎢⎣−1142−23126131162274⎤⎥ ⎥ ⎥⎦x+⎡⎢ ⎢ ⎢⎣2−1−5−16−2−11−250−203131⎤⎥ ⎥ ⎥⎦; (1)

this is a singular matrix pencil (the determinant is identically zero) whose only finite eigenvalue is simple and equal to (see Section 4 for the definition of an eigenvalue of a singular matrix polynomial and other relevant terminology). The input is and the solution is . If the QZ algorithm QZ , which is the standard eigensolver for pencils, is called via MATLAB’s command eig 222MATLAB R2016a on Ubuntu 16.04, the output is:

>> eig(L0,-L1)
ans =
-138.1824366539536
-0.674131242894470
1.000000000000000
0.444114486065683

All but the third computed eigenvalues are complete rubbish. This is not surprising: singular pencils form a proper Zariski closed set in the space of matrix pencils of a fixed format, and it is unreasonable to expect that an unstructured algorithm would detect that the input is singular and return only one eigenvalue. Instead, being backward stable, QZ computes the eigenvalues of some nearby matrix pencil, and almost all nearby pencils have eigenvalues. On the other hand, the accuracy of the approximation of the genuine eigenvalue is quite remarkable. Indeed, the condition number of the problem that maps to the exact eigenvalue is infinite because the map from matrix pencils to their eigenvalues is discontinuous at any matrix pencil whose determinant is identically zero. To make matters worse, there exist plenty of matrix pencils arbitrarily close to and whose eigenvalues are all nowhere near . For example, for any ,

 ^L(x)=L(x)+ϵ⎛⎜ ⎜ ⎜⎝⎡⎢ ⎢ ⎢⎣0−1−4−11−3−13−30−2−8−2−1−1−3−1⎤⎥ ⎥ ⎥⎦x+A⎞⎟ ⎟ ⎟⎠,

where

 A=⎡⎢ ⎢ ⎢⎣−1−1−3−2−3−3−9−6−2−2−6−4−1−1−3−2⎤⎥ ⎥ ⎥⎦γ0+⎡⎢ ⎢ ⎢⎣1000300020001000⎤⎥ ⎥ ⎥⎦γ1+⎡⎢ ⎢ ⎢⎣0−1−4−10−3−12−30−2−8−20−1−4−1⎤⎥ ⎥ ⎥⎦γ2+⎡⎢ ⎢ ⎢⎣0000−1010000010−10⎤⎥ ⎥ ⎥⎦γ3,

has characteristic polynomial and therefore, by an arbitrary choice of the parameters , can have eigenvalues literally anywhere. Yet, unaware of this worrying caveat, the QZ algorithm computes an excellent approximation of the exact eigenvalue: correct digits! This example has not been carefully cherry-picked: readers are encouraged to experiment with any singular input in order to convince themselves that QZ often computes333Of course, if the exact solution is not known a priori, one faces the practical issue of deciding which of the computed eigenvalues are reliable. There are various ways in which this can be done in practice, such as artificially perturbing the problem; the focus of our work is on explaining why the correct solution has been shortlisted in the first place. accurately the (simple) eigenvalues of singular pencils, or singular matrix polynomials, in spite of having infinite condition number. Although the worst-case sensitivity to perturbations is indeed infinite, the raison d’être of the condition number, which is to predict the accuracy of computations on a computer, is not fulfilled.

Why does the QZ algorithm accurately compute the eigenvalue, when the map describing this computational problem is not even continuous? Two natural attempts at explaining this phenomenon would be to look at structured condition numbers and/or average-case (stochastic) perturbation theory.

1. An algorithm is structured if it computes the exact solution to a perturbed input, where the perturbations respect some special features of the input: for example singular, of rank , triangular, or with precisely one eigenvalue. The vanilla implementation of QZ used here is unstructured in the sense that it does not preserve any of the structures that would explain the strange case of the algorithm that computes an apparently uncomputable eigenvalue.444There exist algorithms able to detect and exploit the fact that a matrix pencil is singular, such as the staircase algorithm VanDooren1979 . It does, however, preserve the real structure. In other words, if the input is real, QZ computes the eigenvalues of a nearby real pencil. Yet, by taking real in the example above, it is clear that there are real pencils arbitrary close to whose eigenvalues are all arbitrarily far away from .

2. The classical condition number is based on the worst-case perturbation of an input; as discussed in (Higham1996, , §2.8), this approach tends to be overly pessimistic in practice. Numerical analysis pioneer James Wilkinson, in order to illustrate that Gaussian elimination is unstable in theory but in practice its instability is only observed by mathematicians looking for it, is reported to have said trefethen2012smart

 0−\heightofA\noindentto0.0ptAnyone that unlucky has already been run over by a bus.0−\heightofAto0.0pt''

In other words: in Wilkinson’s experience, the likelihood of seeing the admittedly terrifying worst case appeared to be very small, and therefore Wilkinson believed that being afraid of the potential catastrophic instability of Gaussian elimination is an irrational attitude. Based on this experience, Weiss et al. weiss1986average and Stewart stewart1990stochastic proposed to study the effect of perturbations on average, as opposed to worst-case; see (Higham1996, , §2.8) for more references on work addressing the stochastic analysis of roundoff errors. This idea was later formalized and developed further by by Armentano armentano2010stochastic . This approach gives some hope to explain the example above, because it is known that the set of perturbations responsible for the discontinuity of has measure zero dd09 . However, this does not imply that on average perturbations are not harmful. In fact, as we will see, the stochastic condition number for the example above (or for similar problems) is still infinite! Average-case perturbation analysis, at least in the form in which it has been used so far, is still unable to solve the puzzle.

While neither structured nor average-case perturbation theory can explain the phenomenon observed above, Wilkinson’s colourful quote does contain a hint on how to proceed. We will get back to the matrix pencil (1) in Example 1.3, where we show that our new theory does explain why this problem is solved to high accuracy using standard backward stable algorithms.

### 1.1 Main results

The main contributions of this paper are

1. a new species of ‘weak’ condition numbers, which we call the weak worst-case condition number and the weak stochastic condition number, that give a more accurate description of the perturbation behaviour of a computational map;

2. an illustration of the advantages of the new concept by demonstrating that, unlike both classical and stochastic condition numbers, the weak conditon numbers are able to explain why the apparently uncomputable eigenvalues of singular matrix polynomials, such as the eigenvalue in the example above, can be computed (and sometimes with remarkable accuracy!);

3. to show, by exhibiting a counterexample, that a theorem of Armentano on the relation between stochastic and classical condition numbers, valid under certain regularity assumptions, cannot in general be extended to situations in which the map to be computed is not differentiable;

4. a concrete method for the bounding the weak condition numbers for the eigenvalues of singular matrix polynomials.

To be able to state the main results without much delay, we first introduce the concepts involved informally, postponing the rigorous definitions to later sections. Loosely speaking, the stochastic condition number of a computation problem is the expected sensitivity of the output to random perturbations in the input. Practical limitations of this concept lead us to define the notion of weak worst-case and weak stochastic condition. A computation map at an input is said to have weak worst-case condition

if, with probability

, the infinitesimal sensitivity of the solution to random perturbations in the input is not greater than . The weak stochastic condition number is the expected sensitivity to perturbations, conditioned on a subset of perturbations of measure (see Definitions 3.2 and 3.3, respectively).

Our case study is the problem of computing an eigenvalue of a singular matrix polynomial. A matrix polynomial is a matrix , where is a field; here we take to be a subfield of . Alternatively, we can think of it as an expression of the form

 P(x)=P0+P1x+⋯+Pdxd,

with . If we require , then the integer in such an expression is called the degree of the matrix polynomial 555

By convention, the zero matrix polynomial has degree

.

. We denote the vector space of matrix polynomials over

of degree at most by . A matrix polynomial is called singular if and otherwise regular. For the definition of the rank and of simple eigenvalues we refer to Section 4. The main theorems below are expressed by means of a parameter that depends only on the matrix polynomial and on the eigenvalue in question; see (20) for the definition and Section 6 for details on how to estimate this parameter in practice. The condition numbers in the statements below refer to a function mapping to the eigenvalue of interest, as specified formally at the beginning of Section 4.3

. The norm used in the definition of the various condition numbers is the usual Euclidean norm on the vector space of matrix polynomials of a given degree, and by uniformly distributed perturbations we mean perturbations distributed uniformly on the unit sphere with respect to this norm (see Section

5 for details). We assume throughout rank (which implies ) and degree . The number of free parameters of the problem then satistifes . By we denote the gamma function.

###### Theorem 1.1

Let be a matrix polynomial of rank , and let be a simple eigenvalue of . Let and as defined in (20). Then

• the worst-case condition number is

 κ={∞ if r
• the stochastic condition number, with respect to uniformly distributed perturbations, is

 κs=1γPπ2Γ(N)Γ(n−r+1)Γ(N+1/2)Γ(n−r+1/2) (3)
• if and , then the -weak worst-case condition number, with respect to uniformly distributed perturbations, is bounded by

 κw(δ)≤1γPmax{1,√n−rδN} (4)

The expression for the stochastic condition number involves the quotient of gamma functions. For such quotients, we will frequently use the bounds

 √x≤Γ(x+1)Γ(x+1/2)≤√x+1/2, (5)

which hold for , see wendel1948 for a derivation. Using these bounds on the numerator and denominator of (3), we get the more interpretable

 κs≤1γPπ2√n−r+1/2N−1/2≤1γPπ2√n−r+1N. (6)

The expression for the worst-case condition in the regular case coincides, up to minor differences due to the use of different norms, with the expression found by Tisseur tisseur2000backward . The bound on the weak condition number (4) shows that

, which is the median of the same random variable of which

is the expected value, is bounded by , which is the expression of the worst-case condition number in the regular case .

The situation changes dramatically when considering real matrix polynomials with real perturbations, as in this case even the stochastic condition becomes infinite if the matrix polynomial is singular. In the statement (but not in the derivation and in later chapters) we denote the resulting condition number with respect to real perturbations by using the superscript .

###### Theorem 1.2

Let be a real matrix polynomial of rank , and let be a simple eigenvalue of . Let and as defined in (20). Then

• the worst-case condition number is

 κR={∞ if r
• the stochastic condition number, with respect to uniformly distributed real perturbations, is

 κRs={∞ if r
• if and , then the -weak worst-case condition, with respect to uniformly distributed real perturbations, is

 κRw(δ)≤1γPmax{1,√n−rN1δ} (9)
• if and , then the -weak stochastic condition number satisfies

 κRws(δ)≤1γP(11−δ)(1+√n−rNlog(√n−rNδ−1)) (10)

Note that the weak worst-case condition number bound for real perturbations is proportional to , while in the complex case it is proportional to . This is a consequence of the different probability tails of the distribution of the directional sensitivity for real and for complex perturbations, which also accounts for the fact that the stochastic condition number diverges when considering real perturbations, and is finite for complex perturbations.

It is instructive to compare the weak condition numbers in the singular case to the worst-case and stochastic condition number in the regular case. In the regular case (), when replacing the worst-case with the stochastic condition we get an improvement by a factor of , which is consistent with previous work armentano2010stochastic (see also Section 2) relating the worst-case to the stochastic condition. We will see in Section 6.1 that the expected value in the case captures the typical perturbation behaviour of the problem more accurately than the worst-case bound. Among the many possible interpretations of the weak worst-case condition, we highlight the following:

• Since the bounds are monotonically decreasing as the rank increases, we can get bounds independent of . Specifically, we can replace the quotient with . This is useful since, in applications, the rank is not always known.

• While the stochastic condition number (8), which measures the expected sensitivity of the problem of computing a singular eigenvalue, is infinite, for the median sensitivity is bounded by

 κRw(1/2)≤1γP.

The median is a more robust and arguably better summary parameter than the expectation.

• Choosing in (10), we get a weak stochastic condition bound of

 κRws(e−N)≤1γP(1+√N(n−r)1−e−N).

That is, the condition number improves from being unbounded to sublinear in , by just removing a set of inputs of exponentially small measure.

###### Example 1.3

Consider the matrix pencil from (1). This matrix pencil has rank , with only one simple eigenvalue . As we will see in Example 5.3, the constant appearing in the bounds is

 γ−1L=12.16.

In this example, , and , so that , , and

 √n−rN=0.1767.

For small enough we get the (not optimized) bound . If is of order and an algorithm delivers a solution with a backward error of, say, , then this means that with probability the computed solution will be correct to (approximately) decimal digits.

### 1.2 Estimating the weak condition number

It is of practical interest to be able to compute an approximation of the weak worst-case and weak stochastic condition numbers for singular polynomial eigenvalue problems. We show that a bound on these condition numbers can be computed from the output of a backward stable algorithm that computes an eigenvalue and a pair of eigenvectors of a nearby problem. More specifically, let

be a singular matrix polynomial of rank , a unit-norm regular matrix polynomial. Observe that, since , there can be at most finitely many solutions in to the polynomial equation (and not necessarily all constant in ); it immediately follows that is regular for almost all . Define

 ¯¯¯κ=limϵ→0κ(P(x)+ϵE(x)),¯¯¯κs=limϵ→0κs(P(x)+ϵE(x)). (11)

We will add the superscript when restricting to real problems with real perturbations. As we will see, these limits are well defined for all perturbations outside a proper Zariski closed set. The parameters defined in (11), in turn, can be used to bound the weak condition numbers of the singular problem.

###### Theorem 1.4

Let be a singular matrix polynomial of rank with simple eigenvalue . Then

 κw(δ)≤¯¯¯κ⋅max{√n−rδN,1},κRw(δ)≤¯¯¯κR⋅max{√n−rN1δ,1}. (12)

If and , then for uniformly distributed perturbations,

 P{δ−1/2¯¯¯κs≥η⋅κw(δ)}≥1−e−1/η2,P{δ−1¯¯¯κRs≥η⋅κRw(δ)}≥1−e−1/2η2.

The probabilistic bounds in Theorem 1.4 are most useful when is large (for example, large matrix polynomials of small rank), and is proportional to . If , for example, there is nothing to be gained over the deterministic bounds (12). On the other hand, if is large and for some , then Theorem 1.4 shows that as long as a set of measure at most is avoided, we get an improvement over the deterministic bounds (12) by a factor of .

We can approximate using the eigenvalue and eigenvectors computed by a backward stable algorithm: if such an algorithm produces the eigenvalue and corresponding eigenvectors and , then we set

 ~κ:=|~u∗P′(λ)~v|−1(d∑j=0|λ|2j)1/2,

where denotes the Hermitian (conjugate) transpose of a matrix or vector. Similarly, we get expressions for the stochastic condition by using (3) and (8) with . If the perturbation is small, then these computed condition numbers will be close to and . The proof of Theorem 1.4 is given in Section 6, where the context is explained in more detail.

### 1.3 Related work

Rounding errors, and hence the perturbations considered, are not random (Higham1996, , 1.17). Nevertheless, the observation that the computed bounds on rounding errors are overly pessimistic has lead to the study of statistical and probabilistic models for rounding errors. An early example of such a statistical analysis is Goldstine and von Neumann goldstine1951numerical , see  (Higham1996, , 2.8) and the references therein for more background. The idea of using an average, rather than a supremum, in the definition of conditioning was introduced by Weiss et.al. weiss1986average in the context of the (matrix) condition number of solving systems of linear equations, and a more comprehensive stochastic perturbation theory was developed by G. W. Stewart stewart1990stochastic . In armentano2010stochastic , Armentano introduced the concept of a smooth condition number, and showed that it can be related to the worst case condition. His work uses a geometric theory of conditioning and does not extend to singular problems. This line of work is not to be confused with the probabilistic analysis of condition numbers, where a condition number is given, and the distribution of this condition number is studied over the space of inputs (see BC2013 and the references therein). Nevertheless, our work is inspired by the idea of weak average-case analysis amelunxen2017average that was developed in this framework. Weak average-case analysis is based on the observation, which has its origins in the work of Smale smale1981fundamental and Kostlan kostlan1988complexity , that discarding a set of small measure from the input space can dramatically improve the expected value of a condition number. The core of this approach is a shift in focus away from the average case and towards bounding the probability of rare events. In a sense, our contribution is to apply this line of thought in the context of the stochastic perturbation theory of Weiss et.al., Stewart and Armentano. We would like to stress that in contrast to the probabilistic analysis of condition numbers, we do not seek to model the distribution of perturbations. The aim is to formally quantify statements such as “the set of bad perturbations is small compared to the set of good perturbations”. In other words, the (non-random) accumulation of rounding errors in a procedure would need a very good reason to give rise to a badly perturbed problem.

Our case study are polynomial eigenvalue problems. The conditioning of regular polynomial eigenvalue problems has been studied in detail by Tisseur tisseur2000backward and by Dedieu and Tisseur in a homogeneous setting dedieu2003perturbation . A probabilistic analysis of condition numbers for such problems was given by Armentano and Beltrán armentano2017polynomial over the complex numbers and by Beltrán and Kozhasov beltran2018real over the real numbers. Their work studies the distribution of the condition number on the whole space of inputs, and such an analysis only considers the condition number at regular matrix polynomials. A perturbation theory for singular polynomial eigenvalue problems was developed by de Terán and Dopico dd10 , and our work makes extensive use of their results.

### 1.4 Organization of the paper

The paper is organized as follows. In Section 2 we review the rigorous definitions of the worst-case (von Neumann-Turing) condition number and the stochastic framework (Weiss et.al., Stewart, Armentano), and comment on their advantages and limitations. In Section 3 we define the weak condition numbers, and derive general results; we argue in particular that, even when Wilkinson’s metaphorical bus hits von Neumann-Turing’s and Armentano-Stewart’s theories of conditioning, ours comes well endowed with powerful dodging skills. In Section 4 we introduce the perturbation theory of singular matrix polynomials, along with the definitions of simple eigenvalues and eigenvectors. We define the input-output map underlying our case study and introduce the directional sensitivity of such problems. In Section 5, we apply our weak theory of conditioning to the singular polynomial eigenvalue problems discussed in Section 4

, which amounts to a detailed study of the probability distribution of the directional sensitivity of such problems. In particular, we prove Theorem

1.1 and Theorem 1.2. In Section 6 we sketch how our new condition numbers can be estimated in practice and prove Theorem 1.4. Along the way we derive a simple concentration bound on the directional sensitivity of regular polynomial eigenvalue problems. Finally, in Section 7, we give some concluding remarks and discuss potential further applications.

## 2 Theories of conditioning

For our purposes, a computational problem is a map between normed vector spaces 666One can, more generally, allow and to be anything with a notion of distance, such as general metric spaces or Riemannian manifolds. All the definitions of condition can be adapted accordingly; in this paper we focus on the case of normed vector spaces.

 f:V→W,D↦S:=f(D),

and we will denote the (possibly different) norms in each of these spaces by . Following the remark on (High:FM, , p. 56), for simplicity of exposition in this paper we focus on absolute, as opposed to relative, condition numbers. The condition numbers considered depend on the map and an input ; they are maps

 cond:V×WV→[0,∞].

As we are only concerned with the condition of a fixed computational problem at a fixed input , in what follows we omit reference to and in the notation.

###### Definition 2.1 (Worst-case condition number)

The condition number of at is

 κ=limϵ→0sup∥E∥≤1∥f(D+ϵE)−f(D)∥ϵ∥E∥.

If is Fréchet differentiable at , then this definition is equivalent to the operator norm of the Fréchet derivative of . However, Definition 2.1 also applies (and can even be finite) when is not differentiable at ; we elaborate more on this point when discussing the directional sensitivity in Section 3.

In complexity theory BCSS1998 ; BC2013 , an elegant geometric definition of condition number is often used, which is essentially equivalent to Defintion 2.1 under certain assumptions. In this setting, the input-output relationship is modelled by means of a differentiable map and a solution set . Thus is considered a solution of our computational problem on input , if . The explicit input-output map is then provided by the implicit function theorem, and the condition number is defined as the operator norm of this map; see (BC2013, , Chapter 14) for details.

###### Definition 2.2 (Shub-Smale worst-case condition number)

Let be an open neighbourhood of . Suppose that is smooth and that the Jacobian ) has full rank in . Suppose further that the projection ) is a diffeomorphism with inverse . Then the condition number is the operator norm of the differential of .

Note that Definition 2.2 emphasizes the local nature of the condition number. This geometric approach, albeit useful in many other contexts, cannot have any more predictive power than its limsup-based counterpart. In particular, it cannot be applied to the the problem discussed in the introduction, since it assumes that is locally differentiable. As we will see in Section 4, however, for that problem can be shown to have bounded partial derivatives in almost all directions, and this fact suggests a probabilistic approach to the problem. The following definition is loosely derived from the work of Stewart stewart1990stochastic and Armentano armentano2010stochastic , based on earlier work by Weiss et. al. weiss1986average . In what follows, we use the terminology for a random variable with distribution , and for the expectation with respect to this distribution.

###### Definition 2.3 (Stochastic condition number)

Let be a -valued random variable with distribution and assume that and . Assume that the function is measurable. Then the stochastic condition number is

 κs=limϵ→0EE∼D∥f(D+ϵE)−f(D)∥ϵ∥E∥.
###### Remark 2.4

We note in passing that Definition 2.3 depends on the choice of a measure

. This measure is a parameter that the interested mathematician should choose as convenient for the analysis of a particular class of problems; this is of course not particularly different than the freedom one is given in picking a norm. In fact, it is often convenient to combine these two choices, using a distribution that is invariant with respect to a given norm. Typical choices that emphasize invariance are the uniform (on a sphere) or Gaussian distributions, and the Bombieri-Weyl inner product when dealing with homogeneous multivariate polynomials

(BC2013, , 16.1). Technically speaking, the distribution is on the space of perturbations, rather than the space of inputs.

While Definition 2.1 is inherently pessimistic in that it considers the worst-case scenario in picking the direction for the perturbation , Definition 2.3 averages over all directions and therefore describes the expected value of the effects of rounding errors on an algorithm. There is clearly a tradeoff between guaranteeing bounds on the propagation of errors, as the worst-case analysis does, and averaging over these, as the stochastic analysis does.

If is differentiable at and is finite dimensional, then it was observed by Armentano armentano2010stochastic that the stochastic condition number can be related to the worst-case one. We illustrate this relation in a simple but instructive special case. Consider the setting777

Armentano’s results apply to differentiable maps between Riemannian manifolds, and cover the moments of the directional derivative as well: they are stronger and are derived with a more comprehensive approach.

where () is differentiable at , so that is the operator norm of the differential. If

denote the singular values of

(with for ), then . If is the uniform distribution on the sphere, then one has

 1mκ(a)≤σ1EE∼D|E1|≤EE∼D[√∑iσ2iE2i](b)=EE∼D[∥df(D)E∥2]=κs, (13)

where for (a) we used the fact that

 EE∼D|E1|=1mEE∼D∥E∥1≥1mEE∼D∥E∥2=1m

and for (b) we used the orthogonal invariance of the uniform distribution on the sphere. As we will see in the case of singular polynomial eigenvalue problems with complex perturbations, the bound (13) does not hold in general, as the condition number can be infinite while the stochastic condition number is bounded. However, sometimes it can happen that the stochastic condition number is also infinite, because the “directional sensitivity” (see Definition 3.1) is not an integrable function. For example, for the problem of computing the eigenvalue of the singular pencil in the introduction, in spite of the fact that real perturbations are analytic for all but a proper Zariski closed set of perturbations dd09 , when restricting to real perturbations, we get

 κs=κ=∞.

So why does QZ compute the eigenvalue with digits of accuracy? A more sophisticated theory is needed to understand what is going on.

## 3 Weak condition numbers

The first concept that we need to build our theory of “weak condition” is that of directional sensitivity. Just as the classical worst-case condition corresponds to the norm of the derivative, the directional sensitivity corresponds to a directional derivative. And, just as a function can have some, or all, directional derivatives while still not being continuous, a computational problem can have well-defined directional sensitivities but have infinite condition number.

###### Definition 3.1 (Directional sensitivity)

The directional sensitivity of the computational problem at the input with respect to the perturbation is

 σE=limϵ→0∥f(D+ϵE)−f(D)∥ϵ∥E∥.

The directional sensitivity takes values in . In numerical analytic language, the directional sensitivity is the limit, for a particular direction of the backward error, of the ratio of forward and backward errors of the computational problem ; this limit is taken letting the backward error tend to zero (again having fixed its direction), which could also be thought of as letting the unit roundoff tend to zero. See e.g. (Higham1996, , §1.5) for more details on this terminology.

The directional sensitivity is, if it is finite, times the norm of the Gâteaux derivative of at in direction . If is Fréchet differentiable, then the Gâteaux derivative agrees with the Fréchet derivative, and we get

 κ=sup∥E∥≤1σE. (14)

We note in passing that the above relation does not necessarily hold if is not Fréchet differentiable.

If is a -valued random variable satisfying the conditions of Definition 2.3 and if is Gâteaux differentiable in almost all directions, then by the Fatou-Lebesgue Theorem we get

 κs=E[σE].

If is not integrable, then the stochastic condition number is infinite. This can happen even when the directional sensitivity is finite in almost all directions. When integrating, null sets can be safely ignored; however, depending on the exact nature of the divergence (or lack thereof) of the integrand when approaching those null sets, the value of the integral need not be finite. To overcome this problem and still give probabilistically meaningful statements, we propose to use instead the concept of numerical null sets, i.e., sets of finite but small (in a sense that can be made precise depending on, for example, the unit roundoff of the number system of choice, the confidence level required by the user, etc.) measure. This is, of course, completely analogous to the idea that the “numerical zero” is the unit roundoff.

We next define our main characters, two classes of weak condition numbers which generalize, respectively, the classical worst-case and stochastic condition numbers.

In the following, we fix a probability space and a random variable , where we consider endowed with the Borel -algebra. We further assume that

 E[E]=∫ΩE(ω) dP(ω)=0,E[∥E∥2]=∫Ω∥E(ω)∥2 dP(ω)=1.

We use the notation for the measure of a set if there is no ambiguity.

###### Definition 3.2 (Weak worst-case condition number)

Let . The -weak-worst-case condition number is

 κw(δ)=inf\SS∈Σ,|\SS|≥1−δsupω∈\SSσE(ω).

Note that the definition of weak worst-case condition number does not require to be a random variable, i.e., a measurable function defined -a.e. on . For Definition 3.3

and in the characterization of this condition as a quantile (Lemma

3.4), however, we need to be a random variable. This is the case, for example, if is measurable and the directional (Gâteaux) derivative exists -a.e.

###### Definition 3.3 (Weak stochastic condition number)

Let and assume that is measurable. The -weak-stochastic condition number is

 κws(δ)=inf\SS∈Σ,|\SS|≥1−δE[σE | S].

In Definitions 3.2 and 3.3, the probability measure can be seen as a parameter. We find this generality useful in that it induces maximal flexibility: a mathematician can appropriately make the definitions more concrete, adapting them to the particular problem of interest. In practice, a statement of the form “ is bounded above by ” has an easy and useful interpretation: “with probability , the directional sensitivity is bounded above by ”. In turn, the directional sensitivity has an interpretation as (the limit of) a ratio of forward and backward errors, and hence the new approach provides a potentially useful general framework to give probabilistic bounds on the forward accuracy of outputs of numerically stable algorithms. These probabilistic bounds can be more realistic than classical bounds, particularly when the latter are not finite. Moreover, as we will discuss in Section 6, upper bounds on the weak condition numbers can be computed in practice for natural distributions. One can therefore see as a parameter representing the confidence level that a user wants for the output, and any computable upper bound on becomes a practical reliability measure on the output, valid with probability . Although of course roundoff errors are not really random variables, we hope that modelling them as such can become, with this “weak theory”, a useful tool for numerical analysis problems whose traditional condition number is infinite.

Lemma 3.4 gives an alternative characterization of the weak worst-case condition number as the -quantile of the directional sensitivity. This equivalent formulation can be exploited to obtain closed formulae for in terms of probabilistic quantile functions.

###### Lemma 3.4

Assume that is -measurable. Then for ,

 κw(δ)=inf{y∈R:P{σE

If, in addition, is absolutely continuous, then

 κws(δ)=E[σE | σE≤κw(δ)].
###### Proof

For the first statement, set and . For any , we then have , and hence, setting ,

 μ≤supω∈S0σE(ω)≤λ+ϵ,

proving . If , then there exists and a set of measure such that . Hence, and as a consequence , contradicting the definition of . Hence, .

For the second statement, set

. By the right-continuity of the cumulative distribution function of

, . Therefore,

 κws(δ)≤E[σE | σE≤μ}.

The reverse inequality follows from (amelunxen2017average, , Proposition 2.1).

Lemma 3.4 has the immediate consequence that . Moreover, in the case of a differentiable , equality holds if and only if Sufficient conditions for the latter to hold are if the worst case happens on a set of positive measure and/or if the sensitivity is continuous as a map from to . It is clear, however, that equality does not always hold (let , and take ).

## 4 Eigenvalues of matrix polynomials and their directional sensitivity

Algebraically, the spectral theory of matrix polynomials is most naturally described over an algebraically closed field; however, the theory of condition is analytic in nature and it is sometimes of interest to restrict the coefficients, and their perturbations, to be real. In this section we give a unified treatment of both real and complex matrix polynomials. For conciseness we keep this overview very brief; interested readers can find further details in dd10 ; dd09 ; Dopico2018 ; GLR09 ; Noferini2012 and the references therein. For , denote by the vector space of square matrix polynomials of size and degree up to with coefficients in the field : , with for . An element is said to be a finite eigenvalue of if

 rankC(P(λ))

where is the field of fractions of , that is, the field of rational functions with coefficients in . The geometric multiplicity of the eigenvalue is the amount by which the rank decreases in the above definition,

 gλ=r−rankC(P(λ)).

There exist matrices with , , that transform into its Smith canonical form,

 U∗P(x)V=D:=diag(h1(x),…,hr(x),0,…,0), (15)

where the invariant factors are non-zero monic polynomials such that for . If one has the factorizations for some , with for and not dividing any of the , then the are called the partial multiplicities of the eigenvalue . The algebraic multiplicity is the sum of the partial multiplicities. Note that an immediate consequence of this definition is . If (i.e., all non-zero equal to ) then the eigenvalue is said to be semisimple, otherwise it is defective. If (i.e., for and zero otherwise), then we say that is simple, otherwise it is multiple.

A square matrix polynomial is regular if , i.e., if is not identically zero. A finite eigenvalue of a regular matrix polynomial is simply a root of the characteristic equation , and its algebraic multiplicity is equal to the multiplicity of the corresponding root. If a matrix polynomial is not regular it is said to be singular. More generally, a finite eigenvalue of a matrix polynomial (resp. its algebraic multiplicity) is a root (resp. the multiplicity as a root) of the equation , where is the monic greatest common divisor of all the minors of of order (note that ).

###### Remark 4.1

For completeness we recall that infinite eigenvalues, and their multiplicities, can be formally defined via zero eigenvalues, and their multiplicities, of the reversal matrix polynomial ; clearly, with this definition, whether infinite eigenvalues exist or not is not necessarily an intrinsic property of but depends also on the choice of ; the latter is typically dictated by the application, and known in the matrix polynomial literature as the grade. Alternatively, we can homogenize the polynomials and work in projective space dedieu2003perturbation .

### 4.1 Eigenvectors

To define the eigenvectors, let and be minimal bases Dopico2018 ; Forney1975 ; Noferini2012 of and (as vector spaces over ), respectively. For , it is not hard to see Dopico2018 ; Noferini2012 that and are vector spaces over of dimension .

Note that and for , and that the difference in dimension is the geometric multiplicity, . A right eigenvector corresponding to an eigenvalue is defined (Dopico2018, , Sec. 2.3) to be a nonzero element of the quotient space . A left eigenvector is similarly defined as an element of . In terms of the Smith canonical form (15), the last columns of , evaluated at , represent a basis of , while the last columns of , evaluated at , represent a basis of .

In the analysis we will be concerned with a quantity of the form , where are representatives of eigenvectors. It is known (Dopico2018, , Lemma 2.9) that is equivalent to the existence of a polynomial vector such that and . Then,

 0=ddxP(x)b(x)|x=λ=P′(λ)b(λ)+P(λ)b′(λ)

implies that for any representative of a left eigenvector we get . It follows that for an eigenvalue representative , depends only the component of orthogonal to , and an analogous argument also shows that this expression only depends on the component of orthogonal to . In practice, we will therefore choose representatives and for the left and right eigenvalues that are orthogonal to and , respectively, and have unit norm. If is a matrix polynomial with simple eigenvalue , then there is a unique (up to sign) way of choosing such representatives and .

### 4.2 Perturbations of singular matrix polynomials: the De Terán-Dopico formula

Assume that , where , is a matrix polynomial of rank , and let be a simple eigenvalue. Let be a matrix whose columns form a basis of , and such that the columns of form a basis of . Likewise, let be a matrix whose columns form a basis of , and such that the columns of form a basis of . In particular, and are representatives of, respectively, right and left eigenvectors of . The following explicit characterization of a simple eigenvalue is due to De Terán and Dopico (dd10, , Theorem 2 and Eqn. (20)). To avoid making a case distinction for the regular case , we agree that if and are empty.

###### Theorem 4.2

Let be matrix polynomial of rank with simple eigenvalue and as above. Let be such that is non-singular. Then for small enough , the perturbed matrix polynomial has exactly one eigenvalue of the form

 λ(ϵ)=λ−det(X∗E(λ)Y)u∗P′(λ)v⋅det(U∗E(λ)V)ϵ+O(ϵ2). (16)

Note that in the special case we recover the expression for regular matrix polynomials from (tisseur2000backward, , Theorem 5) and (dd10, , Corollary 1),

 λ(ϵ)=λ−u∗E(λ)vu∗P′(λ)vϵ+O(ϵ2), (17)

where are left and right eigenvectors corresponding to the eigenvalue .

### 4.3 The directional sensitivity of a singular polynomial eigenproblem

We can now describe the input-output map that underlies our analysis. By the local nature of our problem, we consider a fixed matrix polynomial of rank with simple eigenvalue , and define the input-output function

 f:Fn×nd[x]→C

that maps to , maps to for any and satisfying the conditions of Theorem 4.2, and maps any other matrix polynomial to an arbitrary number other than .

An immediate consequence of Theorem 4.2 and our definition of the input-output map is an explicit expression for the directional sensitivity of the problem. Here we write for the Euclidean norm of the vector of coefficients of as a vector in . From now on, when talking about the “directional sensitivity of an eigenvalue in direction ”, we implicitly refer to the input-output map defined above.

###### Corollary 4.3

Let be a simple eigenvalue of and let be a regular matrix polynomial. Then the directional sensitivity of the eigenvalue in direction is

 σE=1∥E∥∣∣∣det(X∗E(λ)Y)u∗P′(λ)v⋅det(U∗E(λ)V)∣∣∣. (18)

In the special case , we have

 σE=1∥E∥∣∣∣u∗E(λ)vu∗P′(λ)v∣∣∣. (19)

For the goals in this paper, these results suffice. However, we note that it is possible to obtain equivalent formulae for the expansion that, unlike the one by De Terán and Dopico, do not involve the eigenvectors of singular polynomial. As we find that such an alternative may be useful, we derive one in Appendix A.

Finally, we introduce a parameter that will enter all of our results, and coincides with the inverse of the worst-case condition number in the regular case . Choose representatives of the eigenvectors that satisfy and (if ) . For such a choice of eigenvectors, define

 γP:=|u∗P′(λ)v|⋅(d∑j=0|λ|2j)−1/2. (20)

We conclude with the following variation of (tisseur2000backward, , Theorem 5). Since our expression for the worst-case condition number for computing simple eigenvalues of regular matrix polynomials is slightly different to that from (tisseur2000backward, , Theorem 5), owing to the use of a different norm for perturbations, we include a proof.

###### Proposition 4.4

Let be a regular matrix polynomial and a simple eigenvalue. Then the worst-case condition number of the problem of computing is

 κ=γ−1P. (21)
###### Proof

If is regular with simple eigenvalue , then the map that assigns to is differentiable in a neighbourhood of (since is a differentiable map and is a simple root of the univariate polynomial equation ). The worst-case condition number is then given by (14),

 κ=sup∥E∥≤1σE=sup∥E∥≠01∥E∥|u∗E(λ)v||u∗P′(λ)v|,

where for the second equality we used (19) and the homogeneity of the expression. Applying the Cauchy-Schwarz inequality to each entry of , we get

 |u∗E(λ)v|2≤∥E(λ)∥2≤(d∑j=0|λ|2j)⋅∥E∥2,

and this implies the upper bound . To see that equality is attainable, take with

 Ej=(λ∗)juv∗,j∈{0,1,…,d}.

Then clearly,

 ∥E∥=(d∑j=0|λ|2j)1/2.

Moreover, it follows that , and hence

 ∥E(λ)∥=|u∗E(λ)v|=d∑j=0|λ|2j=(d∑j=0|λ|2j)1/2⋅∥E∥.

This shows that .

###### Remark 4.5

Note that, in practice, an algorithm such as QZ applied to will typically compute all the eigenvalues of a nearby matrix polynomial. Therefore, any conditioning results on the conditioning of our specific input-output map will explain why the correct eigenvalue is found among

the computed eigenvalues, but not tell us how to choose the right one in practice. For selecting the right eigenvalue one could use heuristics, such as computing the eigenvalues of an artificially perturbed problem. These practical considerations and their analysis are very important, but lead us beyond the scope of this paper.

## 5 The stochastic and weak condition numbers of simple eigenvalues of singular matrix polynomials

Let be a matrix polynomial of rank with simple eigenvalue . We consider perturbations , which we identify with the matrix