Structured backward errors in linearizations

12/09/2019
by Vanni Noferini, et al.

A standard approach to compute the roots of a univariate polynomial is to compute the eigenvalues of an associated confederate matrix instead, such as, for instance, the companion or comrade matrix. The eigenvalues of the confederate matrix can be computed by Francis's QR algorithm. Unfortunately, even though the QR algorithm is provably backward stable, mapping the errors back to the original polynomial coefficients can still lead to huge errors. However, the latter statement assumes the use of a non-structure-exploiting QR algorithm. In [J. Aurentz et al., Fast and backward stable computation of roots of polynomials, SIAM J. Matrix Anal. Appl., 36(3), 2015] it was shown that a structure-exploiting QR algorithm for companion matrices leads to a structured backward error on the companion matrix. The proof relied on decomposing the error into two parts: a part related to the recurrence coefficients of the basis (the monomial basis in that case) and a part linked to the coefficients of the original polynomial. In this article we prove that the analysis can be extended to other classes of comrade matrices. We first provide an alternative backward stability proof in the monomial basis using structured QR algorithms; our new point of view shows more explicitly how a structured, decoupled error on the confederate matrix gets mapped to the associated polynomial coefficients. This insight reveals which properties must be preserved by a structure-exploiting QR algorithm to end up with a backward stable algorithm. We will show that the previously formulated companion analysis fits into this framework, and we will analyze in more detail Jacobi polynomials (comrade matrices) and Chebyshev polynomials (colleague matrices).

1 Introduction

A standard approach to find the solutions of a univariate polynomial equation is to convert the problem into an equivalent one where the eigenvalues of a matrix are computed instead. The algebraic technique used to construct such a matrix is called a linearization and, albeit ultracentenarian, it is still the most popular initial step of modern rootfinding algorithms, at least if all the polynomial roots are sought. For example, this is what MATLAB’s roots function does [9] for polynomials expressed in the monomial basis, and it is at the heart of chebfun/roots for polynomials expressed in the Chebyshev basis [23].
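
To make the linearize-and-solve idea concrete, here is a minimal NumPy sketch (our own illustration, not code from any of the cited packages): the roots of a monic polynomial are obtained as the eigenvalues of its companion matrix, which is also what numpy.roots does internally.

```python
import numpy as np

# p(x) = x^3 - 6x^2 + 11x - 6 = (x - 1)(x - 2)(x - 3),
# stored as p(x) = x^n + a[n-1]*x^(n-1) + ... + a[0].
a = np.array([-6.0, 11.0, -6.0])
n = a.size

# Companion matrix: ones on the subdiagonal, -a in the last column.
C = np.zeros((n, n))
C[1:, :-1] = np.eye(n - 1)
C[:, -1] = -a

print(np.sort(np.linalg.eigvals(C)))               # ~ [1. 2. 3.]
print(np.sort(np.roots([1.0, -6.0, 11.0, -6.0])))  # numpy.roots uses the same idea
```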

In the landmark paper [9] Edelman and Murakami cast a shadow on this strategy. They showed that, even if the matrix eigenvalue problem is solved with a backward stable algorithm, such as QR [25], the whole approach can (depending on the specific linearized polynomial) be catastrophically unstable. More recently, De Terán, Dopico and Pérez [7] argued that, if Fiedler linearizations [11] are used instead of the classical companion linearization [9], the potential misfortunes can be even more pronounced. Fortunately, Van Dooren and Dewilde [24] showed that this problem could be circumvented by solving a generalized eigenproblem instead; the disadvantage, however, is that this is significantly less efficient.

While De Terán, Dopico, and Pérez [7] and Edelman and Murakami [9] focused only on polynomials expressed in the monomial basis, Nakatsukasa and Noferini [18] proved that analogous results on the dangers of always trusting the linearize-and-use-QR philosophy can be stated for any degree-graded basis. That is, the beautiful idea of constructing the so-called confederate matrix and then finding its eigenvalues by QR is potentially, depending on the polynomial input, unstable. Again, for many bases of practical importance, switching to a pencil and the QZ algorithm provably avoids any instabilities [16, 15, 18]. Less clear than in the monomial basis is under which conditions on the input polynomial the QR-based approach is stable; Noferini and Pérez [20] gave a complete answer for the Chebyshev basis, but we are not aware of any progress for other bases.

This story has recently seen a sudden twist towards positive news. All the aforementioned results rely on the assumption that a general eigensolver, e.g., unstructured QR, is applied to the linearizing matrix. However, confederate matrices are typically highly structured. Algorithms specifically designed to preserve and utilize this structure result in two advantages: reduced computational and storage costs, and a structured backward error. Several such algorithms can be found in the literature: see, for example, [5, 3, 14] for companion matrices and the references therein for the case of unitary (fellow) and symmetric plus low rank (comrade) matrices.

Consider, for instance, the companion algorithm presented by Aurentz et al. [5]. There the authors proved that the structured QR algorithm has a backward error on the companion matrix of the order of for the rank one part, and of the order of for the unitary part (with denoting the machine precision). This implies that, as an eigensolver, that particular algorithm is not stable, and a blind application of the results of Edelman and Murakami [9], merging both errors, would yield a backward error on the polynomial of size : an apparent disaster, as this is even worse than what unstructured QR obtains: . In the numerical experiments, however, only an error of the form was observed, suggesting that something peculiar was happening with the errors.

Two years later, Aurentz et al. [3, 2] were able to improve their companion code to get an error of the order of for the rank one part. According to the results of Edelman and Murakami, this should have implied an error of size about on the polynomial coefficients. However, by considering a mixed backward error analysis they demonstrated that the specific structure of the backward error on the companion matrix implies that, as a rootfinder, considering the backward error on the polynomial, the algorithm is backward stable, with a backward error bounded by ! This was the first time that a rootfinder based on linearization and (structured) QR was proved to be stable in this stronger sense.

In the current paper we extend the backward error results of Aurentz et al. [3] to other confederate matrices. As a first step, we present an alternative derivation of the same result of [3], which is less coupled with the underlying algorithm and thus easier to generalize to other bases. We examine how to cleverly map the structured backward error on the confederate matrix back to the polynomial coefficients. As an example of particular interest we analyze the case of Chebyshev polynomials (colleague matrices) in detail, see how the companion results [3] fit in, and later discuss the extension to more general Jacobi polynomials (comrade matrices).

More specifically, we address the following problem. We assume we are given a confederate pencil, that is, a structured plus rank one pencil that linearizes a polynomial expressed in a degree-graded basis. The pencil is of the form , where is independent of and links to the polynomial basis, and the rank-one addend encodes the coefficients of . This is precisely the scenario encountered for polynomials expressed in a broad class of orthogonal polynomial bases, including monomials, Chebyshev, Legendre, ultraspherical, and other Jacobi polynomials. Next, we assume that a structured eigensolver is used to compute the eigenvalues of the structured pencil, such that backward errors of different form can be attached to and to . The question of interest is to map the error back to and to characterize it, thereby assessing the overall stability of the rootfinding algorithm. We show that under minimal assumptions on the pencil (satisfied in practice by most linearization schemes), only the backward error on increases when mapping it back to the polynomial.

Our result thus clarifies the direction that should be followed in the development of stable structured QR algorithms for polynomial rootfinding: one has to ensure that the backward error on the “basis part” of the pencil, the addend , is small, and independent of the polynomial under consideration.

The paper is structured as follows. In Section 2 we introduce confederate matrices, prove properties essential for the article, and refine the discussion to comrade matrices. Section 3 discusses the basic principles of the mixed backward error analysis; it is shown how the structured error can be mapped back to the polynomials. In Section 4 we illustrate the main idea by reconsidering the companion matrix and providing an alternative, simpler derivation of the results of Aurentz et al. [5]. In Section 5 we provide specific bounds for polynomials in the Chebyshev basis (colleague matrices) and formulate a conjecture for Jacobi polynomials. We conclude with Section 6.

2 Confederate matrices

First we discuss general confederate matrices. Then we refine to companion and comrade matrices, and discuss the special case of colleague matrices.

2.1 Definition and properties of confederate matrices

Let be any degree-graded (i.e., ) polynomial basis, such that, for all , has leading coefficient when expressed in the monomial basis. Let be a polynomial of degree , monic in the basis . Denoting

we can write

for a unique coefficients vector

. Following [6, 18] we now introduce the confederate matrix of .

Definition 2.1.

The confederate matrix of is the unique matrix satisfying

where

In the following theorem, the second item is classical, [6]. The first item also dates back to [6], although in a weaker form; it was stated in this form (without proof) in [18]. The third item may be new in this general form, although some special cases can be deduced from other published results; for example, if is the monomial basis it is a consequence of [7], and for the Chebyshev basis it can be proved using the analysis of [20].

Theorem 2.2.

The following properties hold:

  1. is a (strong) linearization of (implying );

  2. can be written as

    where is Hessenberg and only depends on the basis ;

Proof.

We prove the three points separately.

  1. It can be easily verified that belongs to the vector space for the basis [17, 19]. Since it is manifestly a nonsingular pencil, it is a strong linearization for by [19, Theorem 2.1] (or [17, Theorem 4.3] for the monomial basis). Since is monic in the monomial basis, and is monic in the basis , the equality follows.

  2. Let be the matrix that satisfies Since has degree for all it follows that is Hessenberg, by the degree-gradedness of . Now,

    Since this relation holds over , a fortiori it is still true as a relation over after evaluating at any point. Thus, for any Vandermonde matrix we obtain

    as desired.

  3. By definition of we have

    As is regular, it is invertible over . Hence we can premultiply by its inverse (using ), to obtain

Remark 1.

The matrix is a square submatrix of the multiplication matrix in [19, Section 2] and it represents the multiplication-by- operator in the quotient space .

Example 2.3 (Companion matrix).

Consider the monomial basis , where . We have . As a consequence where is the downshift matrix, i.e., the matrix with ones on the subdiagonal and zeros elsewhere.
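
A small sketch of this example (our own, with randomly generated coefficients): the companion matrix is exactly the downshift matrix plus a rank-one term whose last column carries the coefficients of the polynomial.

```python
import numpy as np

n = 5
rng = np.random.default_rng(0)
a = rng.standard_normal(n)            # coefficients a_0, ..., a_{n-1} of a monic p

Z = np.diag(np.ones(n - 1), -1)       # downshift matrix: ones on the subdiagonal
e_n = np.zeros(n)
e_n[-1] = 1.0

# Companion matrix = downshift + rank-one term carrying the coefficients.
C = Z + np.outer(-a, e_n)

p_coeffs = np.concatenate(([1.0], a[::-1]))     # highest degree first, for numpy.roots
print(np.allclose(np.sort_complex(np.linalg.eigvals(C)),
                  np.sort_complex(np.roots(p_coeffs))))
```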

2.2 Comrade matrices

When are orthonormal on a closed interval and have positive leading coefficients, the three-term recurrence

holds for all and for some , [22, Theorem 3.2.1]. As a consequence, multiplication-by- is encoded by

which immediately implies that in this case

where is tridiagonal, and has positive subdiagonal/superdiagonal elements. In the case of an orthogonal basis, the confederate matrix is also known as the comrade matrix of . Note moreover that, as displayed above, in this setting , so that for it holds

The matrix has the following form:

We remark that for polynomials represented in the Chebyshev basis the matrix is called the colleague matrix.
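
As an illustration, the following sketch (our own; variable names and the particular unscaled layout are ours) builds the colleague matrix of a polynomial that is monic in the Chebyshev basis, i.e., the tridiagonal multiplication-by-x part plus a rank-one correction in the last row, and checks its eigenvalues against NumPy's Chebyshev root finder.

```python
import numpy as np
from numpy.polynomial import chebyshev as Ch

# p monic in the Chebyshev basis: p = c_0*T_0 + ... + c_{n-1}*T_{n-1} + T_n.
n = 6
rng = np.random.default_rng(1)
c = rng.standard_normal(n)

# Colleague matrix: tridiagonal part plus rank-one correction in the last row
# (unscaled variant; the paper also uses a diagonally scaled, symmetric form).
A = np.zeros((n, n))
A[0, 1] = 1.0                          # x*T_0 = T_1
for k in range(1, n - 1):              # x*T_k = (T_{k-1} + T_{k+1}) / 2
    A[k, k - 1] = A[k, k + 1] = 0.5
A[n - 1, n - 2] = 0.5                  # x*T_{n-1} = (T_{n-2} + T_n)/2, then reduce T_n mod p
A[n - 1, :] -= 0.5 * c

print(np.allclose(np.sort_complex(np.linalg.eigvals(A)),
                  np.sort_complex(Ch.chebroots(np.append(c, 1.0)))))
```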

In addition, we note that since and are positive, it is possible to perform a diagonal scaling to the matrix that makes it symmetric. Indeed, we can consider the matrix , where is any diagonal matrix with entries

(1)

Observe that, in particular, . This corresponds to choosing the orthogonal basis , having formally set . The scaled matrices are as follows:

where is the vector of the coefficients of expressed in the scaled basis and . From now on we work in this symmetrized setting, and we only consider . From the viewpoint of developing structured algorithms, this is particularly relevant: if , with real symmetric or Hermitian and of rank , then all the matrices obtained throughout the iterations of a QR method, which can be written as , with orthogonal or unitary, have the same property.

This observation is key in the development of fast algorithms; in the monomial case, the companion matrix can be similarly decomposed as the sum of a unitary and a rank-one part; this property is also preserved by QR iterations.
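
The following toy sketch (our own, with randomly generated data) illustrates the preservation claim for a single unshifted QR step: the new iterate is an orthogonal similarity of the previous matrix, so applying the same similarity to the symmetric part and to the rank-one part separately reproduces it.

```python
import numpy as np

n = 8
rng = np.random.default_rng(2)

# Real symmetric (tridiagonal) part plus a rank-one part.
d, e = rng.standard_normal(n), rng.standard_normal(n - 1)
S = np.diag(d) + np.diag(e, 1) + np.diag(e, -1)
u, v = rng.standard_normal(n), rng.standard_normal(n)
A = S + np.outer(u, v)

# One (unshifted) QR step: A = QR, A_next = RQ = Q^T A Q.
Q, R = np.linalg.qr(A)
A_next = R @ Q

# Transforming the two parts separately reproduces A_next, so the new
# iterate is again symmetric plus rank one.
S_next = Q.T @ S @ Q
print(np.allclose(A_next, S_next + np.outer(Q.T @ u, Q.T @ v)))
print(np.allclose(S_next, S_next.T))
```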

Fast algorithms for these classes of matrices often work on the structured (either Hermitian or unitary) and rank one part separately. Therefore, it is reasonable to assume that these parts might be contaminated, throughout the iterations, by backward errors of different magnitudes. Classical backward error analysis does not take this property into account, so we present a more general backward error formulation in the next section.

3 Mixed backward error analysis

We are now ready to study the behavior of the linearized polynomial under perturbations to the pencil . More precisely, we consider where has a mixed backward error of the following form:

(2)

By Theorem 2.2 the perturbed matrix linearizes the polynomial

(3)

Our aim is now to examine the size of , under the assumption that for the various actors in (2) a bound is known. In particular, we assume to know appropriate positive such that

(4)

In this section we will discuss the general setting, which holds for all confederate matrices. In Sections 4 and 5 we will specialize to companion and comrade matrices, that is either the monomial basis, or a basis of polynomials orthogonal on a real interval. Our analysis only holds for structured . In particular, we will sometimes explicitly assume in this section that is a normal matrix. Note that this includes both unitary and Hermitian matrices, which correspond to the monomial and orthogonal basis mentioned above, respectively.

Based on the expressions above, we can rewrite the perturbed polynomial.

Theorem 3.1.

With the notation of (2), (3), and (4), the following first order expansion in holds:

Proof.

By (3) we have , which by Theorem 2.2 is equal to

To obtain the statement, we first add and subtract its expansion obtained by Theorem 2.2. Next, we discard higher order terms, and use the equalities and . ∎

Theorem 3.1 reveals that, in order to provide bounds for the perturbation , it is essential to carry out a perturbation analysis of determinants and adjugates. To this aim, we provide a few results that will be useful in later proofs.

Lemma 3.2 (Jacobi’s formula).

Let be any square matrix, and a small perturbation. Then,
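
In its first-order form, Jacobi's formula reads (writing $A$ for the matrix and $\delta A$ for the perturbation; this notation is ours):

$\det(A + \delta A) \,=\, \det(A) + \operatorname{tr}\bigl(\operatorname{adj}(A)\,\delta A\bigr) + O(\|\delta A\|^2).$
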

A similar result can be given for the adjugate as well, and characterizes the effect of small perturbations.

Lemma 3.3.

Let be a small perturbation (). Then,

(5)
Proof.

Since , is invertible and therefore we can write

We shall make a first order approximation of both terms involved in the above equality. Concerning the first one, we have that . To bound the change in the determinant we use Lemma 3.2 and obtain

which provides the sought first-order expansion (5). ∎

Lemma 3.4.

Let be two matrices, and assume that is normal with eigenvalues . Then,

Proof.

Let be an eigendecomposition of , with unitary. Then, if we denote by the columns of we can write:

Since , the result follows. ∎

With these tools at hand, we are now able to bound the point-wise perturbation , i.e., the evaluation of the perturbation at any point . Later on, when going to comrade and companion matrices we will need to specify these points to retrieve tight bounds on the polynomials’ coefficients.

Lemma 3.5.

Let be the perturbed polynomial. Then, for every and for sufficiently small perturbations, we get

where

having defined (we omit the dependence of and on for cosmetic reasons)

In the expressions for and above, denote the roots of the orthogonal polynomial of degree .

Proof.

Let us first note that, since and since is normal we have

By Theorem 2.2 we have the following first order approximation for :

(6)
(7)
(8)

We bound all the terms separately.

  • We consider (6) first. By Lemma 3.2 we can write

    Since is a normal matrix, we can use Lemma 3.4 to obtain

  • To bound the second term in (7), we use

    Bounding the first term just requires taking the norms of all the factors involved.

  • To bound (8), assuming and using Lemma 3.3, we write

    Then, using the fact that , we have

Taking norms and combining all the results yields the desired bound. ∎

The results we have proved are valid for any class of polynomials under the assumption that is normal. We will use this idea to generalize the point-wise bound to a bound on the coefficients in the case of the monomial basis and of orthogonal polynomials on a real interval. These are the subjects of the next sections.

4 Companion matrix

In this section we reconsider the error analysis of Aurentz et al. [3], in view of this new theory. The derivation of [3] is based on running the Faddeev-LeVerrier algorithm to compute the coefficients of the adjugate, and uses them to provide bounds on its norm. This approach is not easily generalizable, despite the existence of a Faddeev-LeVerrier scheme for nonmonomial bases. Our new point of view yields a simple and clean derivation of the results therein, based instead on an interpolation argument.

To analyze the backward error of an algorithm running on the companion matrix, we have to rewrite the companion matrix slightly. Example 2.3 revealed that the Hessenberg matrix is the downshift matrix, and the eigenvalues can be retrieved from , i.e., the downshift matrix plus a rank one part. Structure-exploiting algorithms, however, rely on the unitary-plus-low rank structure, and rewriting , with and is clearly of unitary-plus-low rank form.

This has some impact on the backward error, since we are now working with the basis , instead of the classical monomial basis. Moreover, the trailing coefficient of our polynomial has also changed. For simplicity we will therefore, from now on, assume that we are working in the basis .
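
One standard way to realize such a rewriting, sketched below with our own notation (and not necessarily the exact splitting used in the structured QR codes), replaces the downshift matrix by the unitary cyclic shift and absorbs the difference into the rank-one term.

```python
import numpy as np

n = 5
rng = np.random.default_rng(3)
a = rng.standard_normal(n)              # monomial coefficients of a monic p

Z = np.diag(np.ones(n - 1), -1)         # downshift
e1, en = np.eye(n)[:, 0], np.eye(n)[:, -1]
C = Z + np.outer(-a, en)                # companion matrix

# The cyclic shift Z + e1*en^T is unitary (a permutation matrix); moving it
# out leaves a modified, but still rank-one, remainder. This splitting is an
# illustrative choice.
U = Z + np.outer(e1, en)
w = -a - e1
print(np.allclose(U @ U.T, np.eye(n)))  # U is orthogonal
print(np.allclose(C, U + np.outer(w, en)))
```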

Eventually we will use the fast Fourier transform to retrieve the coefficients of . To do so, we need to bound evaluated at the -th roots of unity , for . Lemma 3.5 provides

where we already have , , and . Since , we have that the quantities of Lemma 3.5 can be controlled with . To bound , we use

and the fact that for . As a result we obtain

Combining all of this leads to

(9)

As a result, for the Euclidean norm of the coefficients of , denoted as , we get

where , and is the matrix of the discrete Fourier transform. The last factor can be bounded by (9).
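
As a sketch of the interpolation step just described (our own illustration): evaluating a degree-n polynomial at the (n+1)-st roots of unity amounts, up to scaling, to an inverse DFT of its coefficient vector, so one forward FFT divided by n+1 recovers the coefficients.

```python
import numpy as np

n = 7                                    # degree
rng = np.random.default_rng(4)
coeffs = rng.standard_normal(n + 1)      # c_0, ..., c_n (lowest degree first)

# Evaluate p at the (n+1)-st roots of unity.
N = n + 1
omega = np.exp(2j * np.pi * np.arange(N) / N)
values = np.polynomial.polynomial.polyval(omega, coeffs)

# values[j] = sum_k c_k * omega_j**k is an inverse DFT of the coefficients
# (up to scaling), so a forward FFT divided by N recovers them.
print(np.allclose(np.fft.fft(values) / N, coeffs))
```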

Reconsidering the algorithm of Aurentz et al. [3], we have that , , and , where is the machine precision. Clearly we end up with the same bound as proposed by Aurentz et al., namely a linear dependency on .

Before moving to orthogonal bases on real intervals, and in particular Chebyshev and Jacobi polynomials, we emphasize that the main ingredients playing a role in the bound are related to the eigenvalues of the structured matrix , namely their separation, as measured by the constants of Lemma 3.5, and their good properties as interpolation points for the chosen basis. These two quantities will play an important role in the analysis of the following sections as well.

5 Orthogonal polynomials on a real interval

In this section, we consider a class of degree-graded polynomials , for , that are orthogonal on with respect to a positive measure .

Our aim is to leverage Lemma 3.5 to provide a bound on the coefficients of the perturbed polynomial . To this aim, we provide the following result, which holds for any polynomial family orthogonal on ; since this bound is not very explicit, we will then specialize it to a few particular families of polynomials for which we can be more precise, namely Chebyshev and later on all Jacobi polynomials.

Theorem 5.1.

In the notation of (2) and (4), let be a basis of orthogonal polynomials on , such that is real and symmetric. Let be distinct points in , and the roots of . Let be the Lagrange polynomials defined on the nodes , and consider the matrix such that contains the -th coefficient of with respect to the basis . Then, the norm of the vector of coefficients of can be bounded by

where is the matrix with the first rows of , and are defined as in Lemma 3.5.

Proof.

We note that is a degree polynomial. Its coefficients can be recovered by interpolation on the points . Notice that these are points, one more than actually required. Let be the generalized Vandermonde matrix interpolating on these nodes in the prescribed basis. Hence, we have

Note that . Indeed, the entries of the inverse of a Vandermonde matrix are the coefficients of the Lagrange polynomials with nodes . Therefore, we have, for ,

where with we denote the first rows of . The statement then follows by applying Lemma 3.5. ∎

5.1 Chebyshev polynomials

Chebyshev polynomials of the first kind play a special role among orthogonal polynomials on , in particular thanks to their nice approximation properties. For instance, they are the basis of the chebfun MATLAB toolbox [8], which aims at making computing with functions as accessible as computing with matrices and vectors.

Their orthogonality measure is defined by the weight function , and they can be obtained through the recurrence relations

We denote by the Chebyshev polynomials of the second kind, which can be obtained by replacing the degree polynomial with , and keeping the rest of the recursion unchanged. The latter are orthogonal with respect to the weight . Moreover, , and therefore the extrema of are the roots of .
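
A quick numerical check of these recurrences (the helper names are ours): we generate T_k and U_k by the three-term recurrence and verify the trigonometric identities T_m(cos t) = cos(mt) and U_{m-1}(cos t) = sin(mt)/sin(t).

```python
import numpy as np

def cheb_T(n, x):
    # First kind: T_0 = 1, T_1 = x, T_{k+1} = 2x*T_k - T_{k-1}.
    t_prev, t = np.ones_like(x), x
    if n == 0:
        return t_prev
    for _ in range(n - 1):
        t_prev, t = t, 2 * x * t - t_prev
    return t

def cheb_U(n, x):
    # Second kind: same recurrence, but with U_1 = 2x instead of x.
    u_prev, u = np.ones_like(x), 2 * x
    if n == 0:
        return u_prev
    for _ in range(n - 1):
        u_prev, u = u, 2 * x * u - u_prev
    return u

theta = np.linspace(0.1, 3.0, 50)
x, m = np.cos(theta), 6
print(np.allclose(cheb_T(m, x), np.cos(m * theta)))                      # T_m(cos t) = cos(mt)
print(np.allclose(cheb_U(m - 1, x), np.sin(m * theta) / np.sin(theta)))  # U_{m-1}(cos t) = sin(mt)/sin t
```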

Our aim in this section is to apply Theorem 5.1 to Chebyshev polynomials of the first kind, making all the involved constants explicit, or functions of the degree. To this aim, we need to choose the interpolation nodes, and in this case we select , for , which are the roots of (with, additionally, the points ) and therefore the extrema of on .

Lemma 5.2.

Let be the matrix defined as in Theorem 5.1 choosing as the Chebyshev polynomials of the first kind, and as nodes , for . Then, .

Proof.

We prove the result by showing that, for we have if , and for . It immediately follows that the row sums of are bounded by , and thus the claim holds.

For any , since is the Chebyshev coefficient corresponding to of , we can recover it by writing

Here denotes the norm induced by the scalar product considered above. We note that, if , then is divisible by , since it vanishes at . Therefore, for , we can define the degree polynomial and rewrite the formula as follows:

Since , because we are assuming , we can integrate the above exactly using a Chebyshev-Gauss quadrature formula with Chebyshev polynomials of the second kind of degree , which yields

For Chebyshev-Gauss quadrature of the second kind, the are known explicitly and are ; this, combined with and yields .

It remains to consider the case . Without loss of generality we can consider , which is associated with . Since has as roots the zeros of and , we can write it as up to a scaling factor . The latter can be determined by imposing , which yields since . Similarly, we can show that . In addition, we may write

where if , and if . These equalities can be easily verified using relation (22.5.8) from [1, page 778]. Hence, we can conclude that , and therefore . ∎

To apply Theorem 5.1 we need to obtain bounds for the constants and , which in turn requires bounding the quantities and as defined in Lemma 3.5.

Lemma 5.3.

For Chebyshev polynomials, with the notation of Lemma 3.5, and as defined in Theorem 3.1, we have

The above result is somewhat tedious to prove, so we delay the proof to Section 5.2; it allows us to state the following corollary for the case of Chebyshev polynomials. Recall that, given a monic polynomial , the (scaled) colleague matrix is given by:

(10)

as described in Section 2.2, since for Chebyshev polynomials of the first kind we have , with the only exception of .

Corollary 5.4.

Let the scaled linearization for a polynomial expressed in the Chebyshev basis be given by (10). Consider perturbations , , and . Then, the matrix linearizes the polynomial

where .

Proof.

This result follows by combining Lemma 3.5 with Theorem 3.1 and Lemma 5.3. More precisely, the bound is obtained on the coefficients of the polynomial

where and otherwise. Therefore, we have and , so in particular . ∎

The previous result tells us that a structured QR algorithm working on the Hermitian and rank one parts separately, and ensuring a low relative backward error on these two components, would give a backward stable rootfinding algorithm. Indeed, in that case we would have

where is used to denote an inequality holding up to a constant and a low-degree polynomial in the degree. Combining this fact with the result of Corollary 5.4 would guarantee that the backward error on the polynomial is bounded by .
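
The following toy experiment (our own, with illustrative sizes; the coefficients are recovered from computed eigenvalues, which is accurate enough at this scale) illustrates the asymmetry discussed above: an absolute perturbation confined to the coefficient-carrying rank-one row maps back to a comparable error on the Chebyshev coefficients, while a perturbation of the same norm on the tridiagonal "basis part" is amplified by roughly the norm of the polynomial.

```python
import numpy as np
from numpy.polynomial import chebyshev as Ch

def colleague(c):
    # Unscaled colleague matrix of p = T_n + c_{n-1}*T_{n-1} + ... + c_0*T_0.
    n = c.size
    A = np.diag(0.5 * np.ones(n - 1), 1) + np.diag(0.5 * np.ones(n - 1), -1)
    A[0, 1] = 1.0
    A[n - 1, :] -= 0.5 * c
    return A

def monic_cheb_coeffs(A):
    # Monic-in-Chebyshev coefficients of the polynomial linearized by A,
    # recovered here from its computed eigenvalues (fine for this toy scale).
    q = Ch.chebfromroots(np.linalg.eigvals(A))
    return q / q[-1]

n, eta = 8, 1e-6
rng = np.random.default_rng(5)
c = 1e4 * rng.standard_normal(n)          # a polynomial with large coefficients
p = np.append(c, 1.0)
A = colleague(c)

# Two perturbations of the same norm eta: one on the tridiagonal "basis part"
# (all rows but the last), one confined to the coefficient-carrying last row.
E_basis = rng.standard_normal((n, n)); E_basis[n - 1, :] = 0.0
E_basis *= eta / np.linalg.norm(E_basis)
E_coeff = np.zeros((n, n)); E_coeff[n - 1, :] = rng.standard_normal(n)
E_coeff *= eta / np.linalg.norm(E_coeff)

print(np.linalg.norm(monic_cheb_coeffs(A + E_basis) - p))  # roughly eta * ||p||
print(np.linalg.norm(monic_cheb_coeffs(A + E_coeff) - p))  # roughly eta
```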

5.2 Proof of Lemma 5.3

Bounding the constant of Lemma 5.3 requires giving a lower bound on the pairwise distance between the roots of the Chebyshev polynomial of the first kind of degree , denoted by , and the ones of the second kind of degree , denoted by