Finding polynomial roots by dynamical systems – a case study

04/07/2020 ∙ by Sergey Shemyakov, et al. ∙ Jacobs University Bremen

We investigate two well-known dynamical systems that are designed to find roots of univariate polynomials by iteration: the methods of Newton and of Ehrlich–Aberth. Both have a track record of finding all roots of polynomials of high degree with good complexity. Our goal is to determine in which cases which of the two algorithms is more efficient. We come to the conclusion that Newton is faster when the polynomials are given by recursion, so that they can be evaluated in time logarithmic in the degree, or when all the roots are near the boundary of their convex hull. Conversely, Ehrlich–Aberth has the advantage when no fast evaluation of the polynomials is available, and when some roots lie in the interior of the convex hull of the other roots.


1. Introduction

Among the most fundamental and classical results in mathematics is the fact that every univariate polynomial of degree $d \geq 1$ over $\mathbb{C}$ splits into $d$ linear factors (i.e., $\mathbb{C}$ is algebraically closed), and that for degree $5$ and higher there is no closed formula determining the roots in terms of radicals. Therefore, these roots have to be approximated numerically, usually in terms of an iterated process that can be viewed as a very natural dynamical system.

We investigate two such dynamical systems that are particularly prominent as root finders: Newton’s method and the Ehrlich–Aberth-method. Both are well known as efficient root finders with a good track record of finding all roots of polynomials of rather large degree, and the question arises which of them is more efficient. Of course, there are numerous other root finding procedures, including eigenvalue methods or Weierstrass’ iteration method. Our choice is motivated by the fact that the Ehrlich–Aberth-method underlies one of the best established practical root finding software packages,

MPSolve, while in our own work Newton’s method has turned out to be particularly successful. Moreover, these two root finders are conceptually related in an interesting way: Newton approximates one root at a time, while Ehrlich–Aberth may be seen as a system of orbits that try to approximate all the roots at the same time, and these orbits can be viewed as Newton orbits that are synchronized in the sense that each of them is aware of where the others are: they are “attracted” by the roots in the same sense as for Newton’s method, but repelled from each other, so that they avoid approximating the same (simple) root.

We feel it would be quite interesting to extend this comparison also to other root finding procedures.

It is clear that no continuous deterministic root finding system can converge to roots for every set of initial conditions: the domains of convergence to the various roots are open, and their complement must contain at least the boundaries of these basins. The second best desired property of a root finder is that it be “generally convergent”: this means it should converge to a root for an open and dense subset of all possible starting conditions (starting points for Newton, starting vectors for Ehrlich–Aberth). This property is difficult to establish: Newton’s method is known not to be generally convergent, and for Ehrlich–Aberth it is an open question.

The structure of this paper is as follows: in Section 2 we describe the iteration steps of Newton and of Ehrlich–Aberth, trying to bring out their conceptual similarities. Relevant and interesting properties of both methods are then described, respectively, in Sections 3 and 4. Our key results are presented in Section 5: we describe several families of polynomials with different properties, all covering a wide range of degrees, and measure the performance of Newton’s and Ehrlich–Aberth’s methods in systematic experiments. The final Section 6 contains a discussion of the results and the complexity of the methods, as well as possible conclusions and ideas for improvements that might inspire further research.

Acknowledgements

This project has been inspired by several of our friends and colleagues. It owes a lot to many interesting discussions with Dario Bini and Leonardo Robol in Pisa, with Victor Pan in New York, and with John Hubbard at Cornell. It builds on many ideas and discussions with the members of our research group, in particular Robin Stoll, Marvin Randig, and Bernhard Reinke. In addition, we are most grateful to the two referees for their helpful comments and suggestions.

We gratefully acknowledge support through the Advanced Grant HOLOGRAM of the European Research Council.

2. The root finding methods by Newton and Ehrlich–Aberth in comparison

The setting in all cases is the same: we investigate a polynomial $p$ of degree $d$, with roots $\alpha_1, \dots, \alpha_d$; nothing will be lost by assuming that $p$ is monic.

2.1. Newton’s method

Newton’s well known root finding method is not an algorithm but a heuristic: given a polynomial $p$ (or more generally a differentiable mapping) and a guess $z$ for a possible root, a “suggested improvement” for the guess is

(1)   $N_p(z) = z - \dfrac{p(z)}{p'(z)} \,.$

If $z$ is already a root, i.e. $p(z) = 0$, then this point remains fixed: roots of $p$ are fixed points of $N_p$. In general, $N_p$ is a rational map whose degree equals the number of distinct roots of $p$ (multiple roots of $p$ are accounted for in the degree of $p$ but not of $N_p$).

Lemma 1 (Local convergence of Newton’s method).

Every simple root $\alpha$ of $p$ has a neighborhood on which convergence of Newton’s method is quadratic: $|N_p(z) - \alpha| = O(|z - \alpha|^2)$. For multiple roots the convergence is only linear: if $\alpha$ is a root with multiplicity $m \geq 2$, then $N_p(z) - \alpha \approx \frac{m-1}{m}\,(z - \alpha)$ as $z \to \alpha$.

There is an electrostatic interpretation of Newton’s method: place protons at the locations of the roots of $p$; then the electrostatic force felt by an electron at a point $z$ is the sum of the attractive forces of all the roots, each pointing in the direction of the respective root with strength inversely proportional to the distance. Except for constants of physical units, the combined force equals $-\overline{p'(z)/p(z)} = -\overline{\sum_k 1/(z - \alpha_k)}$ (where the overbar denotes complex conjugation). For Newton’s method, the image of $z$ is the unique point $N_p(z)$ with $1/(z - N_p(z)) = \sum_k 1/(z - \alpha_k)$: it places the point so that a single charge at $N_p(z)$ has the same effect at $z$ as the superposition of all the charges at the roots. In other words, $z$ moves in the direction of the net force of all the roots, but a distance inversely proportional to the strength of this force: the smaller the net force, the further away all the roots, and the further the point has to move in the direction of the origin of the force.
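To make the iteration concrete, here is a minimal Python sketch of a single Newton orbit (Python is chosen for brevity; the implementations discussed later in this paper were written in Java, and all names, tolerances and budgets below are our own illustrative choices):

```python
def newton_orbit(p_over_dp, z0, tol=1e-14, max_iter=1000):
    """Iterate z -> z - p(z)/p'(z) from z0; return an approximate root or None.

    `p_over_dp` is any routine evaluating the quotient p(z)/p'(z).
    """
    z = z0
    for _ in range(max_iter):
        step = p_over_dp(z)
        z = z - step
        if abs(step) < tol:        # tiny Newton step: quadratic convergence regime
            return z
    return None                    # the orbit did not settle within the budget

# Example with p(z) = z^3 - 1, so p(z)/p'(z) = (z^3 - 1)/(3 z^2):
root = newton_orbit(lambda z: (z**3 - 1) / (3 * z**2), 1.0 + 0.5j)
```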

2.2. The Ehrlich–Aberth-method

This method does not approximate one root at a time, but tries to find all roots at once; hence it is an iteration on $\mathbb{C}^d$ (or rather an open subset thereof).

Consider a vector $\mathbf{z} = (z_1, \dots, z_d)$ of initial guesses. The Ehrlich–Aberth-iteration consists of the map on $\mathbb{C}^d$ whose coordinates are given by

(2)   $z_k \mapsto z_k - \dfrac{p(z_k)/p'(z_k)}{1 - \dfrac{p(z_k)}{p'(z_k)} \sum_{j \neq k} \dfrac{1}{z_k - z_j}} \,, \qquad k = 1, \dots, d.$

Observe that if some $z_k$ is equal to a root of $p$, i.e. $p(z_k) = 0$, then the correction term vanishes, so this coordinate remains fixed. There is a heuristic interpretation for this formula: each of the component variables $z_k$ “thinks” that all the other components are already equal to a root (“everyone else knows what they are doing, only I am wrong”), and tries to use the standard Newton update for the function $g_k(z) = p(z) / \prod_{j \neq k} (z - z_j)$, which would be a linear function if all the other approximations were correct. To see this, observe that

$\dfrac{g_k'(z)}{g_k(z)} = \dfrac{p'(z)}{p(z)} - \sum_{j \neq k} \dfrac{1}{z - z_j}$

and hence the Newton update for $g_k$ at $z_k$ is

$z_k - \dfrac{g_k(z_k)}{g_k'(z_k)} = z_k - \dfrac{p(z_k)/p'(z_k)}{1 - \dfrac{p(z_k)}{p'(z_k)} \sum_{j \neq k} \dfrac{1}{z_k - z_j}}$

as claimed above.
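For concreteness, here is a minimal Python sketch of one Ehrlich–Aberth update in “Jacobi style” (all new coordinates are computed from the old vector; see Section 3.2 for the alternative). The helper `p_over_dp`, evaluating $p/p'$, and all other names are our own, not the authors’ code:

```python
def ehrlich_aberth_step(p_over_dp, zs):
    """One simultaneous sweep of iteration (2) over the approximation vector."""
    new_zs = []
    for k, z in enumerate(zs):
        n = p_over_dp(z)                              # Newton term p(z)/p'(z)
        s = sum(1.0 / (z - w) for j, w in enumerate(zs) if j != k)
        new_zs.append(z - n / (1.0 - n * s))          # Ehrlich-Aberth correction
    return new_zs
```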

Lemma 2 (Local convergence of Ehrlich–Aberth method).

Every vector consisting of the $d$ roots of $p$ has a neighborhood in $\mathbb{C}^d$ on which the Ehrlich–Aberth method converges to this solution vector. The convergence is cubic when all roots of $p$ are simple, and linear otherwise.

This method has an electrostatic interpretation similar to that of Newton’s method: once again, we place positive unit charges at the positions of the roots of $p$. This time, there are $d$ approximations to the roots at the positions $z_1, \dots, z_d$, each carrying a negative unit charge. The approximations are as much attracted by the actual roots as they are repelled by each other, so they have a natural tendency to find different roots; in particular, if one approximation has already “found” a root, then its positive and negative charges cancel exactly.

Lemma 3 (Electrostatic interpretation of Ehrlich–Aberth-method).

In the Ehrlich–Aberth-method, each approximation moves in the direction of the net force resulting from all charges, by a distance that is inversely proportional to the strength of this force.

Proof.

The following computation starts with the Ehrlich–Aberth iteration formula and ends with the electrostatic interpretation; writing $z_k'$ for the updated coordinate,

$z_k - z_k' = \dfrac{p(z_k)/p'(z_k)}{1 - \dfrac{p(z_k)}{p'(z_k)} \sum_{j \neq k} \dfrac{1}{z_k - z_j}} = \dfrac{1}{\dfrac{p'(z_k)}{p(z_k)} - \sum_{j \neq k} \dfrac{1}{z_k - z_j}} = \dfrac{1}{\sum_i \dfrac{1}{z_k - \alpha_i} - \sum_{j \neq k} \dfrac{1}{z_k - z_j}} \,.$

The conjugate of the final denominator is, up to sign, exactly the net electrostatic force $F$ at $z_k$ exerted by the positive charges at the roots $\alpha_i$ and the negative charges at the other approximations $z_j$; hence $z_k' - z_k = F/|F|^2$, a step in the direction of $F$ of length inversely proportional to its strength.

So indeed, the formula is the same as Newton’s method, except that all the other components in the vector of guesses are added as “roots with negative charge”. ∎

If one runs Newton’s method on $d$ approximations separately, one can give it a similar electrostatic interpretation in the sense that the approximations carry infinitesimally small test charges, so that they see the roots but do not interact with each other. This can be seen as a conceptual advantage of Ehrlich–Aberth over Newton: the approximations are synchronized with each other. However, currently there does not seem to be any known theory that exploits this fact towards an understanding of the global dynamics of the Ehrlich–Aberth-method. In particular, it is not known whether it is generally convergent; see Section 3.1.

One key difference between the Newton and Ehrlich–Aberth-methods will be relevant below: in order to evaluate Newton’s method, it suffices to have a way of evaluating the quotient $p/p'$. In certain cases, for instance when $p$ is given by iteration or other efficient types of recursion, this may have much better complexity than $O(d)$: in the cases explored here, this complexity was $O(\log d)$. Linear recursions do not offer improvements here, but sparse polynomials might (compare for instance [BF1]).

Even though one can use the same efficient evaluation of $p/p'$ in Ehrlich–Aberth, this does not improve the overall complexity of the method, because each full update of the vector still requires on the order of $d^2$ operations for the correction terms. This explains why, specifically in such cases, the performance of Newton is much better.

We would like to mention that for both methods, improvements based on approximate evaluation are indeed possible and might lead to substantially faster computations, at least in appropriate cases: these are discussed in Section 6.5.

2.3. Machine precision and stopping criteria

Like for all numerical experiments, it is important to keep in mind the capabilities and limitations of the computing system. Our experiments were all performed with double precision floating point numbers. In practice, we made the “idealistic” assumption that all our calculations were exact. They are not, of course, but we have reason to believe that this is not a relevant issue in practice for our polynomials, even for large degrees: both numerical methods are known to be intrinsically stable with self-correcting errors, and one can often verify a posteriori that indeed all roots have been found (see for instance [SSt1, Section 2], where an explicit worst-case estimate on error bounds for Newton’s method is computed, and [RaSS] for such an a-posteriori verification for very large degrees). In fact, in many cases (including all our polynomials) the mutual distance between the roots is much larger than the available precision of computation, so all roots have numerically distinguishable domains of quadratic convergence. In each case it remains an a-posteriori verification that the conceivable problem of insufficient machine precision does not lead to missed roots.

A final remark concerns the stopping criterion: a frequent condition for deciding that a root has been found is that the value of the polynomial is smaller than a given threshold. A different criterion is that a particular approximation $a$ is $\varepsilon$-close to an actual root $\alpha$, with a precise bound on $\varepsilon$. This can be upgraded to the requirement that all roots of $p$ have been found: there must exist a vector of approximations $(a_1, \dots, a_d)$ with a guarantee that, up to permutation of the indices, $|a_k - \alpha_k| < \varepsilon$ for all $k$. That is the stopping criterion that we used for our Ehrlich–Aberth implementation.
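As an illustration (a possible realization in Python, not necessarily our exact procedure): when the true roots are known ahead of time, as in several of our experiment families, and are separated by much more than $2\varepsilon$, a greedy nearest-neighbor matching suffices in place of a search over all permutations:

```python
def all_roots_found(approximations, roots, eps):
    """Check that, up to permutation, every root has an eps-close approximation.

    Greedy matching is valid here because the roots are assumed well separated.
    """
    remaining = list(approximations)
    for root in roots:
        best = min(remaining, key=lambda a: abs(a - root), default=None)
        if best is None or abs(best - root) > eps:
            return False
        remaining.remove(best)     # each approximation may be matched only once
    return True
```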

The question of whether some root has higher multiplicity is irrelevant (and numerically usually not a well-posed question): what matters is that several $a_k$ are close to each other, whether the approximated roots are multiple or only nearby. As we mentioned before, all the polynomials considered in our experiments have well-separated roots.

The actual precision of the computed roots is not of great relevance because of quadratic convergence, provided the roots are simple: once the roots have been found in the sense that they are separated, any desired precision $\varepsilon$ can be achieved in on the order of $\log\log(1/\varepsilon)$ further iterations (within the limits of machine precision). For both methods, by far most of the computation time is spent on separating the roots. However, if one wants to estimate the error from the perspective of numerical analysis, the paper [Pr] gives some useful a priori and a posteriori bounds for Newton’s method.

3. The Ehrlich–Aberth-method

3.1. Properties of the method; general convergence

The key feature of this method is the iteration step, given in (2), that takes one vector of $d$ complex numbers, viewed as initial guesses, and computes a new vector of supposedly improved approximations of the roots of a given polynomial $p$ of degree $d$. In order to upgrade this iteration step to an algorithm, one needs to specify the vector of initial guesses (starting points), as well as a stopping criterion. Moreover, at least in principle there is the possibility that the iteration fails to find the roots of the polynomial, and this needs to be detected.

While the local convergence of the Ehrlich–Aberth-method in a neighborhood of the roots is understood (Lemma 2), we are not aware of any theory about global convergence properties. The method cannot converge in all cases. An obvious case where it fails is when $p$ is a real polynomial with non-real roots, and the vector of starting points is entirely real. Then by symmetry all subsequent iterates will be entirely real, so they cannot converge to the roots. There are similar symmetric situations that prevent convergence to roots.

More conceptually, convergence must fail on larger sets of starting points: the set of initial vectors from which the iteration converges to a vector of the roots in any particular order is open, and it is non-empty because it includes a neighborhood of that root vector. Since this is so for every ordering of the roots, the subset of $\mathbb{C}^d$ from which convergence occurs is a union of finitely many disjoint open sets, more than one, and this cannot be all of $\mathbb{C}^d$: convergence must therefore fail on all the boundaries, which must be large enough to separate open sets in $\mathbb{C}^d \cong \mathbb{R}^{2d}$, so it must fail on a set of topological dimension at least $2d - 1$.

In practice, failure to converge seems rare, and apparently has not been observed except in the case of symmetries, despite extensive experience with implementations such as MPSolve; see [Bi, Ro]. It may well be that, from a dynamical systems point of view, the locus of non-convergence is unstable: any small perturbation may lead to iteration away from this locus and convergence towards the roots; but for all we know this is an open question. There are periodic orbits: these are finitely many points in $\mathbb{C}^d$ that are permuted by the iteration. The way in which Newton’s method fails to be generally convergent (see Section 4) is that some of its periodic points may become attracting, so that they attract a whole open set of starting points, and this happens already in low degrees: there are polynomials whose Newton maps have attracting cycles of any period. By analogy, one may suspect that the Ehrlich–Aberth-method also has periodic cycles that are attracting. In fact, this has been the intuition of several people in dynamical systems; however, in numerous experiments, in particular using the MPSolve implementation, no attracting cycles have ever been observed.

At this point a digression to the Weierstrass method may be interesting, even though it is not the focus of the current paper. Like the Ehrlich–Aberth-method, it is an iteration in $\mathbb{C}^d$ (undefined at certain points where two or more coordinates coincide). Both methods are known as reliable root finders that seem to find roots generically and efficiently, but both lack a global theory.

For the Weierstrass method, there are recent results in [ReSS]. Most importantly, it has been discovered that the Weierstrass method is not generally convergent: there is an open subset of the space of polynomials (of degree 3 or higher) for which an open set of starting vectors fails to converge to any roots. More precisely, there are explicit cubic polynomials for which the Weierstrass method has attracting periodic points. This result seems to go against the expectation of people working in numerical analysis, but it is what had been suspected, in analogy to Newton’s method, by people in dynamical systems, in particular by Smale.

The result in [ReSS] comes out of a systematic investigation of periodic points of small degrees and periods. It is shown that for cubic polynomials the period found there is minimal: periodic points of smaller periods are non-attracting, because explicit equations are found for the sums of the eigenvalues at periodic points, and these equations are not compatible with the condition that all eigenvalues lie in the open unit disk, as required for attracting periodic points. Another unexpected property that was discovered is that there are starting points for which the Weierstrass iteration is defined at all times but for which the dynamics tends to $\infty$; this is so for (almost) every polynomial of degree 3 or higher.

This situation is less clear for the Ehrlich–Aberth-method. This method seems similar to the Weierstrass method, but it is more complicated: the equations themselves are algebraically more involved, and Weierstrass has the advantage of an invariant subspace of codimension 1 where the interesting dynamics takes place. Therefore, the analysis in [ReSS] (which even for Weierstrass very quickly develops high complexity that limits the cases that can be treated) can cover only basic cases for Ehrlich–Aberth, and it is not clear whether attracting cycles might exist, or whether the substantial numerical evidence points to general convergence.

3.2. Starting points; order of update of approximation vector

As mentioned earlier, there is no theory yet about where to start the Ehrlich–Aberth-iteration from. For our experiments, we felt it was natural to use those points that provide good starting points for Newton’s method: hence, we used a vector of points equidistributed on a large circle that surrounds all the roots. In contrast, Bini [Bi] discusses the use of Rouché’s theorem to find several circles that enclose different clusters of roots.
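In Python (the helper name is ours; a trivial sketch), the starting vector we used is simply:

```python
import cmath

def circle_starting_points(d, radius):
    """d points equidistributed on a circle of the given radius around 0."""
    return [radius * cmath.exp(2j * cmath.pi * k / d) for k in range(d)]
```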

The update from the “old” vector of possible roots to the “new” vector can be done in (at least) two ways: for the computation of the new coordinates one can either use only the old coordinates throughout (“Jacobi-style”), or, for each $k$, make use of the already computed new coordinates (“Gauss–Seidel-style”). The former approach is a natural iteration in $\mathbb{C}^d$, while the latter is not, but it is considered more efficient.
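The Gauss–Seidel variant differs from the Jacobi-style sketch in Section 2.2 only in that the vector is updated in place, so coordinates later in the sweep already see the new values (again a hedged sketch with our own names):

```python
def ehrlich_aberth_step_gauss_seidel(p_over_dp, zs):
    """One in-place sweep: each update immediately uses all newer coordinates."""
    zs = list(zs)
    for k in range(len(zs)):
        n = p_over_dp(zs[k])
        s = sum(1.0 / (zs[k] - zs[j]) for j in range(len(zs)) if j != k)
        zs[k] = zs[k] - n / (1.0 - n * s)
    return zs
```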

3.3. The MPSolve Implementation

There is a prominent software package that finds roots of univariate polynomials in practice, authored by Dario Bini, Giuseppe Fiorentino and Leonardo Robol and called MPSolve (with the last published version 3.1.8). This software package has an impressive track record on root finding. Experiments were made for polynomials of very large degrees, some of them with extremely large coefficients; see [Bi, Ro]. For all these experiments, usually no more than 20 Ehrlich–Aberth iteration cycles sufficed to find all roots (due to superlinear convergence, the required precision for root finding is not a decisive issue once all roots are found with sufficient precision to separate them).

In this implementation, the update of the new approximate root vectors is done in Gauss–Seidel style: every new coordinate is used immediately in all subsequent computations.

4. Newton’s method and iterated refinement: virtues and problems

Newton’s method in its original form is a heuristic, not an algorithm: it gives a rule to modify an initial guess, and it comes with the promise that initial guesses sufficiently close to a root will converge to this root. However, many starting points will not converge to a root, and even when they do, it is not clear where to start in order to find all roots rather than only a subset.

For a polynomial $p$, the Newton method is a rational map $N_p$ that very naturally forms a dynamical system on the Riemann sphere $\hat{\mathbb{C}}$. The Riemann sphere decomposes into the Julia set and the Fatou set, both of which are invariant under the dynamics. The Fatou set is open and dense and contains all the roots. The Julia set is nowhere dense, but it is a non-empty, even uncountable, compact set that contains finitely many periodic points of every period $n \geq 1$. Any starting point in the Julia set will remain in the Julia set forever, so these points fail to converge to any root. It is known that the Julia set may have positive 2-dimensional Lebesgue measure, so even a random starting point may be in the Julia set.

Worse yet, the Fatou set may contain open sets of points that converge to attracting cycles of some period $\geq 2$ and hence not to any root. The simplest such example occurs for the classical polynomial $p(z) = z^3 - 2z + 2$, for which the Newton map $N_p$ has a superattracting 2-cycle $0 \mapsto 1 \mapsto 0$; see Figure 1. This may happen for any degree and any period; a complete classification may be found in [LMS].

Figure 1. The Newton map for $z^3 - 2z + 2$ has a superattracting 2-cycle; the domain shown in black converges to this cycle.

Therefore, individual orbits may fail to converge to a root (even though one may argue that most orbits will converge to a root; see [HSS, Section 4]). If nothing is known about the positions of the roots (other than a normalization such as a circle enclosing all of them), then one needs more than $d$ starting points in order to have a chance that all roots will be found. A sufficient such set of starting points, of size $O(d \log^2 d)$, has been constructed explicitly in [HSS].

These starting points have to be placed on a circle, or on a small number of circles, that surrounds all the roots at a certain distance, at places where good control of Newton’s method can be assured. The disadvantage is that from there it takes a lot of iterations until the orbits reach the region where the roots are; typically, more than $d$ orbits each have to undergo more than $d$ iterations before interesting dynamics happens: so on the order of $d^2$ iterations are “wasted” moving from the domain with control of Newton’s method to the domain where the roots are. This prohibits any complexity of less than $d^2$ iterations.

In response to this problem, the “iterated refinement” Newton method was developed in [SSt1, RaSS]: the idea is that initially, while the orbits are still close to the initial circles, there is a lot of control on the Newton dynamics, so all orbits do just about the same thing. The iterated refinement approach thus starts with a small initial number of starting points on the original circles, perhaps 64 points, and iterates them as long as their orbits remain “parallel” in a sense to be made precise (for instance, in the sense of cross ratios between three adjacent orbits together with $\infty$). One may also think of such orbits as “boring”: they do not exhibit any unexpected behavior. Once two or three adjacent orbits fail to remain parallel, they are refined by creating additional orbits between them. We take this occurrence as an indication that the dynamics becomes more intricate and needs more orbits to be resolved. This way, for certain families of polynomials of very large degrees, all roots were found with log-linear complexity and log-linear computing time [RaSS]. For an illustration of the iterated refinement in action, see Figure 2; a schematic code sketch follows below.

Figure 2. Illustration of the “iterated refinement” idea: we start with a small number of points and refine when adjacent orbits no longer move in parallel. Note that most of the refinement happens once the orbits are close to the roots, which drastically reduces the necessary computations. Figure taken from [RaSS].
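The following Python fragment is a much-simplified sketch of one refinement step, under our own simplifying assumptions: the orbit list is ordered along the original circle, the deviation is measured by the cross ratio of three adjacent points together with $\infty$, i.e. $(a - c)/(b - c)$, and refinement inserts a midpoint orbit. The criterion actually used in [RaSS] is more careful:

```python
def refine_step(p_over_dp, orbits, threshold=0.1):
    """One Newton step for all orbits, inserting orbits where parallelism fails.

    Assumes the points in `orbits` are pairwise distinct and ordered along
    the original circle; a simplified illustration of the idea in [RaSS].
    """
    old = list(orbits)
    new = [z - p_over_dp(z) for z in orbits]          # one Newton step per orbit
    refined = []
    for i in range(len(new)):
        refined.append(new[i])
        if 1 <= i < len(new) - 1:
            # cross ratio of (z_{i-1}, z_i; z_{i+1}, infinity), before and after
            before = (old[i - 1] - old[i + 1]) / (old[i] - old[i + 1])
            after = (new[i - 1] - new[i + 1]) / (new[i] - new[i + 1])
            if abs(after - before) > threshold:       # orbits no longer "parallel"
                refined.append((new[i] + new[i + 1]) / 2)   # insert a new orbit
    return refined
```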

The problem is that this is experimental science, and the original guarantee that all roots will be found when starting at sufficiently many points (on the order of $d \log^2 d$) is given up in favor of speed of computation. Among the heuristic parameters are the exact way to measure the deviation of parallel orbits, as well as the quantitative refinement sensitivity with respect to this measure. Too insensitive a refinement will lead to missed roots, while too high a sensitivity will lead to slow computation. It is not clear how to fill in roots that are missed in the end. Deflation is usually not an option: it is in general very unstable, and it may destroy the simple structure of the polynomial (for instance, if the polynomial is given by iteration, then after deflation this structure is lost, and the evaluation becomes highly inefficient).

There is another possible optimization for Newton’s method: in those cases where polynomials are given in terms of coefficients, one has numerous parallel Newton orbits and hence has to evaluate the same polynomial at many different points. We performed these evaluations at all these points independently, not taking advantage of possibly more efficient simultaneous evaluation of polynomials at many points.

5. Experimental results of both methods in comparison

5.1. Overview of polynomial families considered

The numerical experiments were performed on various classes of polynomials of different degrees, defined in as many different ways as possible, and such that the geometry of the root distribution was as varied as possible, in order to explore different behaviors of the two root finders. We explored the following families of polynomials:

  1. iterated quadratic polynomials (or periodic points of quadratic polynomials), Section 5.4.1;

  2. polynomials that describe the centers of hyperbolic components of the Mandelbrot set, Section 5.4.2;

  3. Chebyshev polynomials, Section 5.4.3;

  4. Legendre polynomials, Section 5.4.4;

  5. polynomials with random roots on a circle, Section 5.4.5;

  6. polynomials with random roots on a disk, Section 5.4.6;

  7. polynomials with roots on a finite square grid, Section 5.4.7.

The motivation for these various cases was quite different: iterated polynomials were originally chosen because they can be evaluated easily, and our focus was on root finding, not polynomial evaluation. Chebyshev and Legendre polynomials were selected because the distribution of their roots is well known (along intervals). For polynomials whose coefficients are equidistributed [ET, Ar], the roots tend to accumulate at the unit circle with even distribution; this justifies some of the other choices.

5.2. Evaluation of polynomials

A polynomial of degree $d$ is a holomorphic mapping from $\mathbb{C}$ to $\mathbb{C}$ of degree $d$; we would like to stress that one should not assume by default that it is given in terms of coefficients: indeed, all we require is that $p$ and $p'$ can be evaluated in some way (or possibly only the quotient $p/p'$). Indeed, much of the difficulty in finding roots of polynomials lies in evaluating the polynomials, and this problem is especially pronounced for high degree polynomials that are given in terms of coefficients. Here is one of our favorite prominent examples: periodic points of period $n$ for $q_c(z) = z^2 + c$ are roots of the polynomial $p_n(z) = q_c^{\circ n}(z) - z$, where $q_c^{\circ n}$ denotes the $n$-th iterate of $q_c$. The constant coefficient of $p_n$ grows doubly exponentially in $n$, which is utterly impossible to handle in coefficient form even for moderate $n$, while $p_n$ and its derivative can easily be evaluated by iteration even for very large $n$ [RaSS], and all these roots have been found successfully and easily.

In our experiments, some polynomials are given by recursive formulas that allow for evaluation in logarithmic time with respect to the degree. Others are evaluated in terms of coefficients, and yet others are, for the purposes of these experiments, evaluated in terms of roots that are known ahead of time: in some experiments we wanted to use a prescribed distribution of the locations of the roots, so these roots were chosen first and the polynomials were evaluated in terms of them. This may be against the spirit of “finding” the roots, but it is quite interesting for checking properties of root finding algorithms. Our main focus is not on polynomial evaluation, but we do want to point out that evaluation is a serious and interesting issue, and often a serious bottleneck in the performance of root finders.

5.3. Remarks on implementation and performance

Our focus was on flexibility of experimentation, not on run-time optimization. We thus did most of our implementations in Java.

We measured complexity in terms of the number of arithmetic operations, which is independent of the implementation environment. More precisely, our implementation contains a counter that is updated each time a real addition or multiplication is performed.

In order to work with high-degree polynomials we had to address potential overflow problems. In particular, when we compute a fraction whose numerator or denominator can be unexpectedly large, we manually maintain the scientific representation of the numbers involved, storing the mantissa and the order of magnitude separately.
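To illustrate the idea, here is a minimal Python sketch (our own conventions; the actual implementation was in Java and differs in detail): a number is kept as a pair $(m, e)$ representing $m \cdot 2^e$ with a normalized mantissa, so long products neither overflow nor underflow:

```python
import math

def renorm(m, e=0):
    """Normalize so that 0.5 <= |m| < 1; works for real and complex m."""
    if m == 0:
        return 0.0, 0
    k = math.frexp(abs(m))[1]              # 2**(k-1) <= |m| < 2**k
    return m / math.ldexp(1.0, k), e + k   # divide by 2**k, adjust exponent

def mul(a, b):
    """Multiply two scaled numbers (m, e) and renormalize the result."""
    (ma, ea), (mb, eb) = a, b
    return renorm(ma * mb, ea + eb)

def eval_by_roots_scaled(roots, z):
    """p(z) = prod (z - r_k), returned as (mantissa, binary exponent)."""
    val = (1.0, 0)
    for r in roots:
        val = mul(val, renorm(z - r))
    return val                             # p(z) = val[0] * 2**val[1]
```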

In cases when a polynomial is given by its roots $r_1, \dots, r_d$, we computed the value of $p(z)$ with the formula $p(z) = \prod_{k=1}^{d} (z - r_k)$.
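For the Newton update this representation is particularly convenient, since the logarithmic derivative is a plain sum over the roots, $p'(z)/p(z) = \sum_k 1/(z - r_k)$, so $p(z)/p'(z)$ is its reciprocal and $p$ itself never needs to be formed at all. A small sketch (the helper name is ours), assuming $z$ is not itself one of the $r_k$:

```python
def p_over_dp_from_roots(roots):
    """Return a routine computing p(z)/p'(z) for p(z) = prod (z - r_k)."""
    def p_over_dp(z):
        return 1.0 / sum(1.0 / (z - r) for r in roots)
    return p_over_dp
```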

When performing the Ehrlich–Aberth iteration we update the components of the approximation vector in Gauss–Seidel style. However, all components are updated until the iteration stops, even if some components already approximate a root well enough. We admit that our implementation of the Ehrlich–Aberth iteration is quite down-to-earth. We believe, however, that the overall structure of the results is unaffected by such possible optimizations. A relevant method of improvement in the special case of polynomials with “fast” evaluation was recently pointed out to us by Dario Bini; this is discussed in Section 6.5.

5.4. Experimental results

We describe the results of our experiments comparing the performance of the two methods. The graphs show the number of operations (real additions and multiplications) needed to find all the roots, as a function of the degree of the polynomials. As mentioned earlier, some polynomials were evaluated in linear time with respect to the degree (“slow evaluation”) and others in logarithmic time (“fast evaluation”), depending on their definition.

5.4.1. Iterated quadratic polynomials

A class of polynomials that is particularly easy to evaluate are those that describe periodic points of period $n$ for quadratic polynomials $q_c(z) = z^2 + c$, with $c \in \mathbb{C}$: these polynomials have the form $p_n(z) = q_c^{\circ n}(z) - z$, where $q_c^{\circ n}$ denotes the $n$-th iterate of $q_c$. These polynomials have degree $2^n$ but can be evaluated in $O(n)$ operations, hence in logarithmic complexity with respect to the degree. For certain choices of $c$, all periodic points up to rather large periods (hence degrees) were computed successfully by Newton’s method in [SSt1, RaSS].
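A minimal Python sketch of this $O(n)$ evaluation (our own helper, not the code used for the experiments), differentiating the iteration by the chain rule:

```python
def p_over_dp_iterated_quadratic(c, n):
    """Evaluate p_n(z)/p_n'(z) for p_n(z) = q^n(z) - z with q(z) = z*z + c."""
    def p_over_dp(z):
        w, dw = z, 1.0                 # w = q^k(z), dw = (q^k)'(z), k = 0
        for _ in range(n):
            dw = 2.0 * w * dw          # chain rule, using the old w = q^k(z)
            w = w * w + c
        return (w - z) / (dw - 1.0)    # p_n = q^n(z) - z, p_n' = (q^n)'(z) - 1
    return p_over_dp
```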

Figure 3. Complexity (in terms of arithmetic operations) for finding periodic points of quadratic polynomials; log-log scale. Top and bottom: the two choices of the parameter $c$ described in the text. These graphs combine results for evaluating the polynomials in coefficient form (“slow evaluation”) and using the iteration (“fast evaluation”). Notice that the Ehrlich–Aberth-iteration does not accelerate significantly by taking advantage of fast evaluation.

In order to gauge the dependence of the complexity on the possibility to evaluate the polynomials efficiently, these experiments were run twice: once with evaluation in logarithmic time (“fast”) and once in linear time (“slow”). The results are shown in Figure 3. It turns out that Newton’s method is much faster, especially for high degrees, when taking advantage of the fast polynomial iteration, but not necessarily otherwise. Moreover, the graphs show clearly that most of the computing time for Ehrlich–Aberth is not spent on the evaluation of the polynomials (but compare the discussion in Section 6.5).

It is a well known and simple fact from complex dynamics that periodic points of $q_c$ are distributed along the Julia set (even equidistributed with respect to harmonic measure). For the later discussion it may be worth mentioning that the Julia sets are never convex, except when $c = 0$ (the Julia set is a circle) or $c = -2$ (the Julia set is an interval). Our choices of parameters are a value of $c$ for which the Julia set is a Cantor set and one for which it is a dendrite.

5.4.2. Centers of hyperbolic components of the Mandelbrot set

These polynomials in the variable $c$ are defined by the recursion $p_1(c) = c$ and $p_{n+1}(c) = p_n(c)^2 + c$, i.e. $p_1(c) = c$, $p_2(c) = c^2 + c$, $p_3(c) = (c^2 + c)^2 + c$, etc., so $p_n$ has degree $2^{n-1}$. Equivalently, for the iteration $q_c(z) = z^2 + c$, we have $p_n(c) = q_c^{\circ n}(0)$: the roots of $p_n$ are exactly the parameters $c$ for which the critical point $0$ of $q_c$ is periodic with period (dividing) $n$. Such parameters are well known to be the centers of hyperbolic components of period $n$ of the Mandelbrot set (see for instance [DH, Sch2]). These polynomials can be evaluated by recursion as efficiently as the iterated quadratic polynomials from Section 5.4.1. For large $n$, the roots accumulate at the boundary of the Mandelbrot set, so they form a very non-convex set. The results of the experiments are displayed in Figure 4.
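A hedged sketch of the analogous $O(n)$ evaluation, now in the parameter $c$, obtained by differentiating the recursion with respect to $c$ (names ours):

```python
def p_over_dp_mandelbrot_centers(n):
    """Evaluate p_n(c)/p_n'(c) for p_1(c) = c, p_{k+1}(c) = p_k(c)**2 + c."""
    def p_over_dp(c):
        z, dz = c, 1.0                 # z = p_1(c), dz = dp_1/dc
        for _ in range(n - 1):
            dz = 2.0 * z * dz + 1.0    # dp_{k+1}/dc = 2 p_k dp_k/dc + 1
            z = z * z + c
        return z / dz
    return p_over_dp
```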

Figure 4. Complexity for finding centers of hyperbolic components of the Mandelbrot set; log-log scale.

It may be interesting to note that these polynomials have been used as sample polynomials for evaluating the efficiency of the Ehrlich–Aberth-method in its implementation in MPSolve (see [BF2]), and independently also for our experiments on Newton’s method (see [SSt1, RaSS]). One has to keep in mind that, formally speaking, [BF2] works with polynomials of slightly different degree, but this difference is, of course, insignificant for root finding.

We recently learned that a new implementation of Ehrlich–Aberth’s method that focuses especially on polynomials given by iteration has yielded particularly efficient evaluation for high degrees; see the discussion in Section 6.5.

5.4.3. Chebyshev polynomials

These polynomials have the special property that they can be evaluated in logarithmic time with respect to the degree, based on the doubling formula $T_{2n}(x) = 2\,T_n(x)^2 - 1$. This evaluation works, of course, only for degrees that are powers of $2$, but we should mention that Chebyshev polynomials are defined for arbitrary degrees. Two sets of experiments were done for these polynomials as well, with evaluation in logarithmic and in linear time (“fast” and “slow”).
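A minimal sketch of this doubling evaluation, including the derivative via $T_{2n}' = 4\,T_n\,T_n'$ (our own helper, assuming degree $d = 2^k$):

```python
def chebyshev_p_over_dp(k):
    """Evaluate T_d(x)/T_d'(x) for d = 2**k via repeated doubling."""
    def p_over_dp(x):
        t, dt = x, 1.0                 # T_1(x) = x, T_1'(x) = 1
        for _ in range(k):
            dt = 4.0 * t * dt          # T_{2n}' = 4 T_n T_n', using the old T_n
            t = 2.0 * t * t - 1.0      # T_{2n}  = 2 T_n^2 - 1
        return t / dt
    return p_over_dp
```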

For these polynomials, Newton turned out to be significantly faster than Ehrlich–Aberth in both cases, and obviously more so with the recursive evaluation.

Figure 5. Complexity for finding roots of Chebyshev polynomials; log-log scale. Top: evaluation of the polynomials in coefficient form (“slow evaluation”); bottom: evaluation using the recursion (“fast evaluation”).

5.4.4. Legendre polynomials

Here the evaluation of the polynomials was always in linear time. The observed complexity seems comparable to the “slow” experiment for Chebyshev polynomials (Figure 5), which is consistent with the observation that the speed of evaluation is comparable for both polynomial families, and so is the location of the roots (on the unit interval).

Figure 6. Complexity for finding roots of Legendre polynomials; log-log scale. Evaluated “slowly”.

5.4.5. Polynomials with random roots on a circle

Here the structure of the experiment was that initially a given number of roots was distributed randomly on a circle, and then the polynomials were defined in terms of these roots (evaluation in linear complexity). The outcome is that Newton’s method seems to be faster for large degrees, but this effect is not very pronounced.

Figure 7. Complexity for finding roots distributed randomly on the unit circle; log-log scale.

5.4.6. Polynomials with random roots in a disk

For these polynomials, the roots were placed randomly in the unit disk, and the polynomials were again defined in terms of these roots; as a result, their evaluation is linear in the degree (“slow”). In this case, the Ehrlich–Aberth method is consistently faster.

We should mention that in our implementation the roots were distributed independently over the unit disk, but not with respect to planar Lebesgue measure: instead, we used the product of linear Lebesgue measures in polar coordinates, i.e. for points $r e^{i\varphi}$, the radius $r$ and the angle $\varphi$ were chosen independently and uniformly.

Figure 8. Complexity for finding roots distributed randomly in the unit disk; log-log scale.

5.4.7. Polynomials with roots on a square grid

In this set of experiments, the roots were placed on a square grid of points in the complex plane, for various grid sizes, and once again the polynomials were defined in terms of their roots, with evaluation in linear time. Consistently, the Ehrlich–Aberth method was faster than Newton’s method, but not as significantly as for random roots in the disk.

Figure 9. Complexity for finding roots on a square grid; log-log scale.

6. Conclusion

6.1. Analysis of the performance

The outcome of our experiments is that, among our two implementations, neither is universally better: in some cases one of the two methods outperforms the other, and in other cases the result is the opposite. What follows is a heuristic explanation for the observed behavior.

  1. Evaluation of polynomials. Some of the experiments were motivated by the choice of polynomials that can be evaluated in logarithmic time with respect to the degree. This makes each Newton step very cheap. Newton’s method generally outperforms Ehrlich–Aberth for those polynomials: this includes iterated quadratic polynomials (Section 5.4.1) and Chebyshev polynomials (Section 5.4.3), as well as the centers of hyperbolic components that are among the “demo polynomials” for the MPSolve implementation.

    This advantage plays an especially significant role when the degrees of the polynomials are large, and for these it has a decisive effect on the overall efficiency outcome.

  2. Location of roots. There are several additional cases where Newton outperforms Ehrlich–Aberth even though the evaluation of the polynomials was only linear with respect to the degree: among our experiments these were Chebyshev polynomials (Section 5.4.3), Legendre polynomials (Section 5.4.4), and those polynomials where the roots are randomly distributed on the unit circle (Section 5.4.5). These were exactly the experiments where the roots were “exposed” to infinity, that is, all roots were on the boundary of their convex hull.

    On the other hand, Ehrlich–Aberth was faster, sometimes very much so, when roots were “hidden” in the interior of the convex hull: this includes the case of iterated quadratics with linear (“slow”) evaluation (Section 5.4.1), where the roots are distributed along Julia sets that are almost never convex, as well as the cases of random roots in the disk (Section 5.4.6) and roots distributed on a grid (Section 5.4.7).

6.2. Complexity

We are not aware of any complete analysis of the complexity of both methods, but there are some partial results that shed some light on the overall efficiency.

For the Ehrlich–Aberth-method, the complexity of every iteration step of the full vector is of the order $d^2$, irrespective of how efficiently the polynomials can be evaluated. In the various experiments performed by Bini and Robol, no more than 20 iteration steps needed to be executed overall in order to find all roots (where the required precision for finding the roots is largely irrelevant, at least in the case of simple roots, because of the superlinear convergence once all roots are separated). This suggests an overall complexity of $O(d^2)$ for finding all roots. It is not certain that this method is generally convergent, even though failure to converge has never been observed experimentally except in cases with clear symmetry. If the method were not generally convergent, this might have a relevant impact on the complexity.

In the case of Newton’s method, the results are less clear, especially since the method is not generally convergent, and orbits that fail to converge can frequently be observed. One needs at least $d$ orbits to iterate, and can be certain to find all roots when starting at $O(d \log^2 d)$ initial points [HSS]. Each of these orbits starts away from the disk containing the roots, and in order to have good control the starting points need to be iterated on the order of $d$ times each before they land in the disk containing all roots; this gives a lower bound on the order of $d^2$ Newton iterations. It is shown in [BAS] that one can expect, under reasonable assumptions of equidistribution of the roots, that the same complexity should also suffice to find all roots. Of course, this complexity needs to be multiplied by the complexity of evaluating each Newton step, which may be $O(d)$ or $O(\log d)$ depending on how the polynomial is specified. However, parallel evaluation of all the Newton orbits can yield an overall logarithmic complexity per orbit for the Newton evaluation, so that we expect again an arithmetic complexity on the order of $d^2$, up to logarithmic factors (based on a machine model that has infinite precision in each operation); however, there may be issues with stability.

In [RaSS], the iterated refinement Newton method was introduced; this was also the procedure implemented here, as described earlier. The number of orbits that are iterated in parallel goes down from $O(d \log^2 d)$ to a much smaller number. The exact complexity depends on the sensitivity of refinement and is difficult to estimate theoretically (too sensitive a refinement leads to a large number of orbits iterated in parallel, hence no gain in complexity; too insensitive a refinement leads to missed roots that are hard to recover).

6.3. A new conjecture on starting points for Newton’s method

Neither of the two root finding methods discussed here has a complete theory about its global dynamics. Very little is known about the global properties of the Ehrlich–Aberth-method (the recent results in [ReSS] show that the related Weierstrass method is not generally convergent, but it is not clear what the conclusion about Ehrlich–Aberth should be). More is known about Newton’s method, beyond the fact that it is not generally convergent and may have attracting cycles of any period. The first quantitative result was published in [HSS], as mentioned earlier; it is usually presented for polynomials with all roots in the unit disk (this is not a restriction after an appropriate coordinate change).

Theorem 4 (Starting points for Newton’s method, [HSS]).

For every degree $d$, there is an explicit configuration of circles with explicitly placed points on them, of total cardinality $O(d \log^2 d)$, so that for every polynomial $p$ of degree $d$ with all roots in the unit disk $\mathbb{D}$ and for every root $\alpha$ of $p$, at least one of these points converges to $\alpha$ under iteration of the Newton map.

These circles and the points on them, as well as all the constants involved, are given explicitly. When one allows the circles to be far away from $\mathbb{D}$ (thus allowing many iterations until the points on the circles reach the disk containing the roots), one may get by with fewer circles, even a single one.

This result was improved in [BLS], using a probabilistic set of starting points, to a substantially smaller total number of starting points that finds all roots with arbitrary desired probability (where, of course, the constants depend on this probability).

In the experiments described in this paper, we observed that Newton’s method finds roots more easily when they lie on the boundary of the convex hull of all the roots. We believe that the complexity of finding all roots with this property is much better, and suggest the following.

Conjecture 1.

For every radius $r > 1$ there is a universal constant $c_r$ with the following property. If $p$ is a polynomial of degree $d$, rescaled so that all its roots are in the unit disk, then $c_r \cdot d$ equidistributed points on the circle of radius $r$ have the property that for each root on the boundary of the convex hull of all roots, at least one of the given points will converge to this root under iteration of Newton’s method.

In other words, there is a universal set of $c_r \cdot d$ points that finds all the roots of $p$ that are on the boundary of the convex hull. We believe that the constant $c_r$ becomes small for big $r$.

This conjecture goes back to the Bachelor thesis [Sh]. In order to explain the underlying idea, we need to review some background from [HSS]. For every root $\alpha$, the immediate basin of $\alpha$ is the connected component containing $\alpha$ of the set of points that converge to $\alpha$ under the Newton iteration. A channel is a homotopy class of curves in the immediate basin, with endpoints fixed, that connect the root to $\infty$. Every root has some number $m \geq 1$ of channels. Each of these channels has a certain “thickness”, measured in terms of a conformal invariant. The thicker a channel, the fewer starting points are needed in order to be sure that one of them is in the immediate basin. The more channels a root has, the thinner they may be, and the more starting points are required.

The idea behind the conjecture is that every root on the boundary of the convex hull has a “main channel” with the property that the root can be connected to $\infty$ along this main channel while avoiding the convex hull of the roots. While this is a theorem, the conjecture is based on the assumption that a main channel has at least the same thickness as the channels of a root with only two channels. The conjecture would then follow as in [HSS], in analogy to the case when all roots are real.

Of course, with the conjectured improvement of the number of starting points, the complexity of finding all the roots on the boundary of the convex hull, as described in [Sch1, BAS], would also improve.

6.4. Postprocessing; mixing both approaches

One notorious issue for the Iterated Refinement Newton Method is that it is optimized beyond the regime for which convergence can be guaranteed by theoretical bounds; it depends on some experimental parameters such as the refinement threshold. In the case of too optimistic, hence too insensitive, a refinement threshold, a certain number of roots may be missed in the process. One possible conclusion is to run a “postprocessing step” of Ehrlich–Aberth on the approximate roots found by Newton. If most roots were found by Newton initially, say only $k$ of the $d$ roots are still missing, then each Ehrlich–Aberth post-processing step still requires on the order of $k d$ operations, so this implies a nontrivial lower bound on the overall complexity. However, since the total number of Ehrlich–Aberth-steps is usually quite small (as mentioned in Section 3.3 and also confirmed by our experiments), this may become a realistic option when $k$ is small compared to $d$.

This post-processing was implemented in several of our experiments. The general observation was that the performance of Newton’s method was generally better when more sensitive refinement parameters were chosen, which avoided the need for any Ehrlich–Aberth post-processing, even though this led to earlier refinement and hence more parallel orbits overall.

Another rather radical approach is to run both methods in parallel, sharing execution time equally between both processes, and stopping once one of the two methods terminates successfully. This seems wasteful, as roughly half the operations are eventually not used at all. However, the overall complexity is then the minimum of the complexities of both methods, and if these differ significantly in ways that cannot be predicted efficiently ahead of time, this may lead to improved overall complexity. (After all, for each of the two methods our focus was on complexity, not on improvement by bounded factors, so factors of two were often ignored.)

6.5. Possible improvements

We take it as an encouraging sign that since the time of our original experiments, there have been inspiring discussions and suggestions for improvements for both methods.

Newton’s method has its most obvious bottleneck in the case of polynomials with “slow” evaluation, i.e. when the computation of $p/p'$ has complexity $O(d)$. Since the same polynomial has to be evaluated at many points, this can in principle be accelerated by fast multipoint evaluation, computing all orbits in parallel. We were aware of fast algorithms that were, however, too unstable to be used in practice. We are grateful to Dan Spielman for having pointed out to us that meanwhile there are fast stable methods (albeit apparently not easy to implement). These should have the potential to speed up Newton’s method even for general polynomials to a speed comparable to those with “fast” evaluation.

On the other hand, the Ehrlich–Aberth iteration step, especially in the form given in (2), has two major ingredients: an evaluation of the logarithmic derivative $p'/p$, and the correction term $\sum_{j \neq k} 1/(z_k - z_j)$. The logarithmic derivative benefits from the same advantages of fast computation as Newton’s method (especially in the case of particularly suitable polynomials like the ones considered in some of our experiments, but also for general polynomials as described above). However, since the exact computation of all the correction terms requires on the order of $d^2$ operations, it can far outweigh all computations depending on the polynomials (which is why we had not optimized polynomial evaluation for the Ehrlich–Aberth-method). Soon after the first version of this paper was submitted, we learned from Dario Bini and Victor Pan about the Fast Multipole Method (FMM; see [CGR]) that makes a very fast approximate computation of the correction terms possible. Since all described methods only compute (hopefully ever-improving) approximations, it is not a structural disadvantage to settle for “approximate approximations”, even more so as there is still no global theory for the Ehrlich–Aberth-method even in its precise form.

Dario Bini informed us that he has used this improved Ehrlich–Aberth-method very recently for certain polynomials that were defined by fast recursion (possibly inspired by the super-fast performance of Newton’s method for such polynomials) with almost linear CPU time up to very high degrees.

It would be an interesting project to perform a fresh set of experiments where both methods are optimized in the ways described, and to see whether the relative advantages, in terms of the recursive form of the polynomials and the shape of the root locus, persist. This would, however, be an entirely new project, inspired not least by the results of the experiments described here.

6.6. Conclusion

This project of comparing the efficiency of the Newton and Ehrlich–Aberth root finders is experimental, not an exact science. It is supposed to inspire, or possibly provoke, further investigations into the question of which root finders perform more efficiently under which circumstances, and how this depends on the properties of the polynomials, and possibly on specific clever implementations of the root finders; we are quite pleased to learn that some of this has already happened since the first version of this paper was submitted. Surprisingly, even today the fundamental question of finding all roots of univariate complex polynomials seems far from resolved and invites further progress!

References