1 Introduction
A standard approach to find the solutions of a univariate polynomial equation is to convert the problem into an equivalent one where the eigenvalues of a matrix are computed instead. The algebraic technique used to construct such a matrix is called a linearization and, albeit ultracentenarian, it is still the most popular initial step of modern rootfinding algorithms, at least if all the polynomial roots are sought. For example, this is what MATLAB’s roots function does [9] for polynomials expressed in the monomial basis, and it is at the heart of chebfun/roots for polynomials expressed in the Chebyshev basis [23].
In the landmark paper [9] Edelman and Murakami cast a shadow on this strategy. They showed that, even if the matrix eigenvalue problem is solved with a backward stable algorithm, such as QR [25], the whole approach can (depending on the specific linearized polynomial) be catastrophically unstable. More recently, De Terán, Dopico and Pérez [7] argued that, if Fiedler linearizations [11] are used instead of the classical companion linearization [9], the potential misfortunes can be even more pronounced. Fortunately, Van Dooren and Dewilde [24] showed that this problem could be circumvented by solving a generalized eigenproblem instead; the disadvantage, however, is that this is significantly less efficient.
While De Terán, Dopico, and Pérez [7] and Edelman and Murakami [9] focused only on polynomials expressed in the monomial basis, Nakatsukasa and Noferini [18] proved that analogous results on the dangers of always trusting the linearizeanduseQR philosophy can be stated for any degreegraded basis. That is, the beautiful idea of constructing the socalled confederate matrix and then finding its eigenvalues by QR is potentially, depending on the polynomial input, unstable. Again, for many bases of practical importance, switching to a pencil and the QZ algorithm provably avoids any instabilities [16, 15, 18]. Less clear than in the monomial basis is under which conditions on the input polynomial the QRbased approach is stable; Noferini and Pérez [20] gave a complete answer for the Chebyshev basis, but we are not aware of any progress for other bases.
This story has recently seen a sudden twist towards positive news. All the aforementioned results rely on the assumption that a general eigensolver, e.g., unstructured QR, is applied to the linearizing matrix. However, confederate matrices are typically highly structured. Algorithms specifically designed to preserve and utilize this structure result in two advantages: reduced computational and storage costs, and a structured backward error. Several such algorithms can be found in the literature: see, for example, [5, 3, 14] for companion matrices and the references therein for the case of unitary (fellow) and symmetric plus low rank (comrade) matrices.
Consider, for instance, the companion algorithm presented by Aurentz et al. [5]. There the authors proved that the structured QR algorithm has a backward error on the companion matrix of the order of for the rank one part, and of the order of for the unitary part (with denoting the machine precision). This implies that as an eigensolver that particular algorithm is not stable, and a blind application of the results of Edelman and Murakami [9], merging both errors, would yield a backward error on the polynomial of the size : an apparent disaster, as this is even worse than what the unstructured QR obtains: . In the numerical experiments, however, only an error of the form was observed, insinuating that something peculiar was happening with the errors.
Two years later, Aurentz et al. [3, 2], were able to improve their companion code to get an error of the order of for the rank one part. According to the results of Edelman and Murakami, this should have implied an error of size about on the polynomial coefficients. However, by considering a mixed backward error analysis they demonstrated that the specific structure of the backward error on the companion matrix implies that as a rootfinder, considering the backward error on the polynomial, the algorithm is backward stable, with a backward error bounded by ! This was the first time that a rootfinder based on linearization and (structured) QR was proved to be stable in this stronger sense.
In the current paper we extend the backward error results of Aurentz et al. [3] to other confederate matrices. As a first step, we present an alternative derivation of the same result of [3], which is less coupled with the underlying algorithm and thus easier to generalize to other bases. We examine how to cleverly map the structured backward error on the confederate matrix back to the polynomial coefficients. As an example of particular interest we analyze the case of Chebyshev polynomials (colleague matrices) in detail, see how the companion results [3] fit in, and later discuss the extension to more general Jacobi polynomials (comrade matrices).
More specifically, we address the following problem. We assume we are given a confederate pencil, that is, a structured plus rank one pencil that linearizes a polynomial expressed in a degree graded basis. The pencil is of the form , where is independent of and links to the polynomial basis, and the rankone addend encodes the coefficients of . This is precisely the scenario encountered for polynomials expressed in a broad class of orthogonal polyomial bases, including monomials, Chebyshev, Legendre, ultraspherical, and other Jacobi polynomials. Next, we assume that a structured eigensolver is used to compute the eigenvalues of the structured pencil, such that backward errors of different form can be attached to and to . The question of interest is to map the error back to and to characterize it, thereby assessing the overall stability of the rootfinding algorithm. We show that under minimal assumptions on the pencil (satisfied in practice by most linearization schemes), only the backward error on increases when mapping it back to the polynomial.
Our result thus clarifies the direction that should be followed in the development of stable structured QR algorithms for polynomial rootfinding: one has to ensure that the backward error on the “basis part” of the pencil, the addend , is small, and independent of the polynomial under consideration.
The paper is structured as follows. In Section 2 we introduce confederate matrices, prove properties essential for the article and refine to comrade matrices. Section 3 discusses the basic principles for the mixed backward error analysis; it is shown how the structured error can be mapped back to the polynomials. In Section 4 we illustrate the main idea by reconsidering the companion matrix, and providing an alternative and simpler derivation of the results of Aurentz et al. [5]. In Section 5 we provide specific bounds for polynomials in the Chebyshev basis (colleague matrices) and come up with a conjecture for Jacobi polynomials. We conclude with Section 6.
2 Confederate matrices
First we discuss general confederate matrices. Then we refine to companion and comrade matrices, and discuss the special case of colleague matrices.
2.1 Definition and properties of confederate matrices
Let be any degreegraded (i.e., ) polynomial basis, such that, for all , has leading coefficient when expressed in the monomial basis. Let be a polynomial of degree , monic in the basis . Denoting
we can write
for a unique coefficients vector
. Following [6, 18] we now introduce the confederate matrix of .Definition 2.1.
The confederate matrix of is the unique matrix satisfying
where
In the following theorem, the second item is classical, [6]. The first item also dates back to [6], although in a weaker form; it was stated in this form (without proof) in [18]. The third item may be new in this general form, although some special cases can be deducted from other published results, for example, if is the monomial basis it is a consequence of [7] and for the Chebyshev basis it can be proved using the analysis of [20].
Theorem 2.2.
The following properties hold:

is a (strong) linearization of (implying );

can be written as
where is Hessenberg and only depends on the basis ;

Proof.
We prove the three points separately.

It can be easily verified that belongs to the vector space for the basis [17, 19]. Since it is manifestly a nonsingular pencil, it is a strong linearization for by [19, Theorem 2.1] (or [17, Theorem 4.3] for the monomial basis). Since is monic in the monomial basis, and is monic in the basis , the equality follows.

Let be the matrix that satisfies Since has degree for all it follows that is Hessenberg, by the degreegradedness of . Now,
Since this relation holds over , a fortiori it is still true as a relation over after evaluating at any point. Thus, for any Vandermonde matrix we obtain
as desired.

By definition of we have
As is regular, it is invertible over . Hence we can premultiply by its inverse (using ), to obtain
∎
Remark 1.
The matrix is a square submatrix of the multiplication matrix in [19, Section 2] and it represents the multiplicationby operator in the quotient space .
Example 2.3 (Companion matrix).
Consider the monomial basis , where . We have . As a consequence where is the downshift matrix, i.e. the matrix with only ones on the subdiagonal, and zeroes elsewhere.
2.2 Comrade matrices
When are orthonormal on a closed interval and have positive leading coefficients, the three terms recurrence
holds for all and for some , [22, Theorem 3.2.1]. As a consequence, multiplicationby is encoded by
which immediately implies that in this case
where is tridiagonal, and has positive subdiagonal/superdiagonal elements. In the case of an orthogonal basis, the confederate matrix is also known as the comrade matrix of . Note moreover that, as displayed above, in this setting , so that for it holds
The matrix has the following form:
We remark that for polynomials represented in the Chebyshev basis the matrix is called the colleague matrix.
In addition, we note that since and are positive, it is possible to perform a diagonal scaling to the matrix that makes it symmetric. Indeed, we can consider the matrix , where is any diagonal matrix with entries
(1) 
Observe that, in particular, . This corresponds to choosing the orthogonal basis , having formally set . The scaled matrices are as follows:
where is the vector of the coefficients of expressed in the scaled basis and . From now on we work in this symmetrized setting, and we only consider . From the viewpoint of developing structured algorithms, this is particularly relevant. If , with real symmetric or Hermitian and of rank , then all the matrices obtained through the iteration of a QR method, that can be written as , with orthogonal or unitary, have the same property.
This observation is key in the development of fast algorithms; in the monomial case, the companion matrix can be similarly decomposed as the sum of a unitary and a rank part; this property is also preserved by QR iterations.
Fast algorithms for these classes of matrices often work on the structured (either Hermitian or unitary) and rank one part separately. Therefore, it is reasonable to assume that these parts might be contaminated, throughout the iterations, by backward errors of different magnitudes. Classical backward error analysis does not take this property into account, so we present a more general backward error formulation in the next section.
3 Mixed backward error analysis
We are now ready to study the behavior of the linearized polynomial under perturbations to the pencil . More precisely, we consider where has a mixed backward error of the following form:
(2) 
By Theorem 2.2 the perturbed matrix linearizes the polynomial
(3) 
Our aim is now to examine the size of , under the assumption that for the various actors in (2) a bound is known. In particular, we assume to know appropriate positive such that
(4) 
In this section we will discuss the general setting, which holds for all confederate matrices. In Sections 4 and 5 we will specialize to companion and comrade matrices, that is either the monomial basis, or a basis of polynomials orthogonal on a real interval. Our analysis only holds for structured . In particular, we will sometimes explicitly assume in this section that is a normal matrix. Note that this includes both unitary and Hermitian matrices, which correspond to the monomial and orthogonal basis mentioned above, respectively.
Based on the expressions above, we can rewrite the perturbed polynomial.
Proof.
Theorem 3.1 reveals that, in order to provide bounds for the perturbation , it is essential to do a perturbation analysis related to determinants and adjugates. To this aim, we provide a few results that will be useful in later proofs.
Lemma 3.2 (Jacobi’s formula).
Let be any square matrix, and a small perturbation. Then,
A similar result can be given for the adjugate as well, and characterizes the effect of small perturbations.
Lemma 3.3.
Let be a small perturbation (). Then,
(5) 
Proof.
Lemma 3.4.
Let be two matrices, and assume that is normal with eigenvalues . Then,
Proof.
Let be an eigendecomposition of , with unitary. Then, if we denote by the columns of we can write:
Since , the result follows. ∎
With these tools at hand, we are now able to bound the pointwise perturbation , i.e., the evaluation of the perturbation at any point . Later on, when going to comrade and companion matrices we will need to specify these points to retrieve tight bounds on the polynomials’ coefficients.
Lemma 3.5.
Let be the perturbed polynomial. Then, for every and for sufficiently small perturbations, we get
where
having defined^{5}^{5}5We omitted the dependence of and on for cosmetic reasons.
In the expressions for and above, denote the roots of the orthogonal polynomial of degree .
Proof.
Let us first note that, since and since is normal we have
By Theorem 2.2 we have the following first order approximation for :
(6)  
(7)  
(8) 
We bound all the terms separately.

To bound the second term in (7), we use
Bounding the first term just requires to take the norms of all the factors involved.
Taking norms and combining all the results yields the desired bound. ∎
The results we have proved are valid for any class of polynomials under the assumption that is normal. We will use this idea to generalize the pointwise bound to a bound on the coefficients in the case of the monomial basis and of orthogonal polynomials on a real interval. These are the subjects of the next sections.
4 Companion matrix
In this section we reconsider the error analysis of Aurentz et al. [3], in view of this new theory. The derivation of [3]
is based on running the FaddevLeverrier algorithm to compute the coefficients of the adjugates, and uses it to provide bounds on its norm. This approach is not easily generalizable, despite the existence of a FaddevLeverrier scheme for nonmonomial bases. Our new point of view yields a simple and clean derivation of the results therein, based instead on an interpolation argument.
To analyze the backward error of an algorithm running on the companion matrix, we have to rewrite the companion matrix slightly. Example 2.3 revealed that the Hessenberg matrix is the downshift matrix, and the eigenvalues can be retrieved from , i.e. the downshift matrix plus a rank one part. Structure exploiting algorithms, however, rely on the unitarypluslow rank structure, and rewriting , with and is clearly of unitarypluslow rank form.
This has some impact on the backward error, since we are now working with the basis , instead of the classical monomial basis. Moreover, also the trailing coefficient of our polynomial has changed. For simplicity we will therefore, from now on, assume to be working in the basis .
Eventually we will use the fast Fourier transform to retrieve the coefficients of
. To do so, we need to bound evaluated in the th roots of unity , for . Lemma 3.5 provideswhere we already have , , and . Since , we have that the quantities of Lemma 3.5 can be controlled with . To bound , we use
and the fact that for . As a result we obtain
Combining all of this leads to
(9) 
As a result, we get as Euclidean norm on the coefficients of , denoted as ,
where , and is the matrix of the discrete Fourier transform. The last factor can be bounded by (9).
Reconsidering the algorithm of Aurentz et al. [3], we have that , , and , where is the machine precision. Clearly we end up with the same bound as proposed by Aurentz et al., namely a linear dependency on .
Before moving to orthogonal basis on real intervals, and in particular Chebyshev and Jacobi polynomials, we emphasize that the main ingredients playing a role in the bound are related to the eigenvalues of the structured matrix , namely their separation, as measured by the constants of Lemma 3.5, and their good properties as interpolation points for the chosen basis. These two quantities will play an important role in the analysis of the following sections as well.
5 Orthogonal polynomials on a real interval
In this section, we consider a class of degreegraded polynomials , for , that are orthogonal on with respect to a positive measure .
Our aim is to leverage Lemma 3.5 to provide a bound on the coefficients of the perturbed polynomial . To this aim, we provide the following result, which holds for any polynomial family orthogonal on ; since this bound is not very explicit, we will then specialize it to a few particular families of polynomials for which we can be more precise, namely Chebyshev and later on all Jacobi polynomials.
Theorem 5.1.
In the notation of (2) and (4), let be a basis of orthogonal polynomials on , such that is real and symmetric. Let be distinct points in , and the roots of . Let be the Lagrange polynomials defined on the nodes , and consider the matrix such that contains the th coefficient of with respect to the basis . Then, the norm of the vector of coefficients of can be bounded by
where is the matrix with the first rows of , and are defined as in Lemma 3.5.
Proof.
We note that is a degree polynomial. Its coefficients can be recovered by interpolation on the points . Notice that these are points, one more than actually required. Let be the generalized Vandermonde matrix interpolating on these nodes in the prescribed basis. Hence, we have
Note that . Indeed, the entries of the inverse of a Vandermonde matrix are the cofficients of the Lagrange polynomials with nodes . Therefore, we have, for ,
where with we denote the first rows of . The statement then follows by applying Lemma 3.5. ∎
5.1 Chebyshev polynomials
Chebyshev polynomials of the first kind play a special role among orthogonal polynomials on , in particular thanks to their nice approximation properties. For instance, they are the basis of the chebfun MATLAB toolbox [8], that aims at making computing with functions as accessible as computing with matrices and vectors.
Their orthogonality measure is defined by the weight function , and they can be obtained through the recursive relations
We denote by the Chebyshev polynomials of the second kind, which can be obtained replacing the degree polynomial with , and keeping the rest of the recursion unchanged. The latter are orthogonal with respect to the weight . Moreover, , and therefore the extrema of are the roots of .
Our aim in this section is to apply Theorem 5.1 to Chebyshev polynomials of the first kind, making all the involved constants explicit, or functions of the degree. To this aim, we need to choose the interpolation nodes, and in this case we select , for , which are the roots of (with, additionally, the points ) and therefore the extrema of on .
Lemma 5.2.
Let be the matrix defined as in Theorem 5.1 choosing as the Chebyshev polynomials of the first kind, and as nodes , for . Then, .
Proof.
We prove the result by showing that, for we have if , and for . It immediately follows that the row sums of are bounded by , and thus the claim holds.
For any , since is the Chebyshev coefficients corresponding to of , we can recover it by writing
Here denotes the norm induced by the scalar product considered above. We note that, if , then is divisible by , since it vanishes at . Therefore, for , we can define the degree polynomial and rewrite the formula as follows:
Since , because we are assuming , we can integrate the above exactly using a ChebyshevGauss quadrature formula with Chebyshev polynomials of the second kind of degree , which yields
For ChebyshevGauss quadrature of the second kind, the are known explicitly and are ; this, combined with and yields .
It remains to consider the case . Without loss of generality we can consider , which is associated with . Since has as roots the zeros of and , we can write it as up to a scaling factor . The latter can be determined imposing which yields since . Similarly, we can show that . In addition, we may write
where if , and if . These equalities can be easily verified using relation (22.5.8) from [1, page 778]. Hence, we can conclude that , and therefore . ∎
To apply Theorem 5.1 we need to obtain bounds for the constants and , which in turn requires to bounds the quantities and as defined in Lemma 3.5.
Lemma 5.3.
The above result is somewhat tedious to prove, so we delay the proof to Section 5.2; it allows to state the following corollary for the case of Chebyshev polynomials. Recall that, given a monic polynomial , the (scaled) colleague matrix is given by:
(10) 
as described in Section 2.2, since for Chebyshev polynomials of the first kind we have , with the only exception of .
Corollary 5.4.
Let the scaled linearization for a polynomial expressed in the Chebyshev basis given by (10). Consider perturbations , , and . Then, the matrix linearizes the polynomial
where .
Proof.
The previous result tells us that a structured QR algorithm working on the Hermitian and rank one part separately, and ensuring a low relative backward error on these two components, would give a backward stable rootfinding algorithm. Indeed, in that case we would have
where is used to denote the inequality up to a constant and a lowdegree polynomial in the degree. Combining this fact with the result Corollary 5.4 would guarantee that the backward error on the polynomial is bounded by .
5.2 Proof of Lemma 5.3
Bounding the constant of Lemma 5.3 requires to give a lower bound to the pairwise distance between the roots of the Chebyshev polynomial of the first kind of degree , denoted by , and the ones of the second kind of degree , denoted by extended with as and . In addition, bounding requires an upper bound to the sum of their inverses. To obtain such results, we exploit the fact that these quantities are explicitly known: