Counting the real and complex roots of a univariate polynomial has always been a fundamental task in computer algebra and numerical analysis. For example, given a routine that counts the number of real roots of a polynomial within an interval, we can compute each real root to arbitrary precision through bisection, in which sense we have solved the polynomial equation. Another example would be the Routh-Hurwitz stability criterion (Arnold, 1992, Section 23)(Marden, 1949, Chapter 9), where the stability of a linear system can be tested by deciding the number of complex roots of the characteristic polynomial within the left half-plane (left of the imaginary axis).
Numerous methods have been invented in the symbolic and numerical computing community to efficiently count (or test) real and complex roots of a polynomial (Yap and Sagraloff, 2011; Wilf, 1978; Collins and Krandick, 1992). However, in the theorem proving community, where procedures are usually formally verified in a foundational proof assistant (e.g., Coq, Isabelle, HOL and PVS), our choices are typically limited to relying on Sturm’s theorem to count distinct roots within an interval through signed remainder sequences.
In this paper, we aim to reinforce our root-counting ability in the Isabelle theorem prover (Paulson, 1994). In particular, our main contributions are the following:
We have mechanised a proof of the Budan-Fourier theorem and a subsequent roots test based on Descartes’ rule of signs and Taylor shift. This roots test efficiently over-approximates the number of real roots within an open interval, counting multiplicity.
We have made a novel extension to our previous formalisation of Sturm’s theorem to count real roots with multiplicity.
Benefited from the developments above, we have extended our previous verified complex-root-counting procedures to more general cases: zeros on the border are allowed when counting roots in a half-plane; we can now additionally count roots within a ball.
All results of this paper have been formalised in Isabelle/HOL without using extra axioms, and the source code is available from the following URL:
This paper continues as follows: after introducing some frequently used notations (§2), we first present a formal proof of the Budan-Fourier theorem (§3), which eventually leads to the Descartes roots test. Next, we extend our previous formalisation of the classical Sturm theorem to count multiple real roots (§4). As an application, we apply those newly-formalised results to improve our previous complex-root-counting procedures (§5). After that, we discuss related work (§6) and some experiments (§7). Finally, we conclude the paper in §8.
Below, we will present definitions and proofs in both natural language and the formal language of Isabelle. Some common notations are as follows:
we often use p and q in our proof scripts to denote polynomials. However, when presenting them in mathematical formulas, we will switch to capitalised letters— and .
poly p a in Isabelle means : the value of the univariate polynomial evaluated at the point .
denotes the multiplicity/order of as a root of the polynomial . In Isabelle, this becomes order a p.
denotes the extended real numbers:
and denote the number of real and complex roots of counting multiplicity within the set , while in Isabelle they both correspond to proots_count p s—they are distinguished by the type of the polynomial p.
Similarly, and denote the number of distinct real and complex roots of within . In Isabelle, they correspond to card (proots_within p s) differentiated by the type of the polynomial p.
Many of the theorems presented in this paper will involve sign variations, so we give a definition here:
Definition 2.1 (Sign variations).
Given a list of real numbers , we use to denote the number of sign variations after dropping zeros. Additionally, we abuse the notation by letting
where is a sequence of univariate polynomials, and is interpreted as the number of sign variations of evaluated at . Finally, given , we also use to denote sign variations of the coefficient sequence of :
Example 2.2 ().
By Definition 2.1, we can have calculations like the following :
3. From the Budan-Fourier Theorem to the Descartes Roots Test
In this section, we first formalise the proof of the Budan-Fourier theorem. We then apply this to derive Descartes’ rule of signs, which effectively over-approximates the number of positive real roots (counting multiplicity) of a real polynomial by calculating the sign variations of its coefficient sequence. We then use this to show the Descartes roots test111 There does not seem to be a uniform name for this test (Akritas et al., 2008; Akritas, 2008; Collins et al., 2002)—here we follow the one used in Arno Eigenwillig’s PhD thesis (Eigenwillig, 2008) where he refers this test as ”the Descartes test for roots”.: given a polynomial of degree and a bounded interval , we can apply Descartes’ rule of signs to a base-transformed polynomial
Our formal proof of the Budan-Fourier theorem and Descartes’ rule of signs roughly follows the textbook by Basu et al. (Basu et al., 2006), while that of the Descartes roots test is inspired by various sources (Collins and Akritas, 1976; Eigenwillig, 2008; Kobel et al., 2016).
3.1. The Budan-Fourier Theorem
Definition 3.1 (Fourier sequence).
Let be a univariate polynomial of degree . The Fourier sequence of is generated through polynomial derivatives:
Theorem 3.2 (The Budan-Fourier theorem).
Let , be two extended real numbers (i.e., ) such that . Through Fourier sequences and sign variations, the Budan-Fourier theorem over-approximates and the difference is an even number:
and is even.
To prove Theorem 3.2, the key idea is to examine sign variations near a root of :
where is a possible root of and is a small real number.
Regarding , the property we have derived in Isabelle/HOL is the following:
Lemma 3.3 (budanfourierauxleft).
fixes c d1real and preal poly assumes d1 c and p 0 assumes nonzerox d1 x x c q set pders p poly q x 0 shows changesitvder d1 c p order c p even changesitvder d1 c p order c p
order c p is : the order/multiplicity of as a root of the polynomial p, as we described in §2;
changes_itv_der d1 c p stands for ;
the assumption non_zero asserts that (i.e., poly q x 0) for all and . Here, pders p is the Fourier sequence (i.e., ) and set (pders p) converts this sequence/list into a set of polynomials.
Proof of Lemma 3.3.
By induction on the degree of . For the base case (i.e., the degree of is zero), the proof is trivial since both and are equal to 0.
For the inductive case, through the induction hypothesis, we have
First, we consider the case when . In this case, we can derive
which concludes the proof.
Considering , we have an analogous proposition:
Lemma 3.4 (budanfourierauxright).
fixes c d2real and preal poly assumes c d2 and p 0 assumes x c x x d2 q set pders p poly q x 0 shows changesitvder c d2 p 0
which indicates that , since can be treated as .
Proof of Lemma 3.4.
Similar to that of Lemma 3.3: by induction on the degree of and case analysis. ∎
Lemma 3.5 (budanfourierauxleft).
fixes c d1real and preal poly assumes d1 c and p 0 assumes nonzerox d1 x x c q set pders p poly q x 0 shows changesitvder d1 c p order c p even changesitvder d1 c p order c p
Finally, we come to our mechanised statement of the bounded interval case of Theorem 3.2:
Theorem 3.6 (budanfourierinterval).
fixes a breal and preal poly assumes a b and p 0 shows changesitvder a b p prootscount p x a x x b even changesitvder a b p prootscount p x a x x b
where prootscount p x a x x b denotes , which is the number of real roots of (counting multiplicity) within the interval .
Proof of Theorem 3.6.
By induction on the number of roots of polynomials in within the interval . For the base case, we have
and then, by Lemma 3.5,
In addition, (12) also leads to
Regrading the inductive case, let be the largest root within the interval of the polynomials from :
With the induction hypothesis, we have
Also, considering there is no root of within (otherwise it will be larger than , contradicting (15)), we have
Finally, Lemma 3.5 yields
An interesting corollary of the Budan-Fourier theorem is that when all roots are real, the over-approxiamation (i.e., ) becomes exact:
Corollary 3.7 (budanfourierreal).
fixes a breal and preal poly assumes allrootsreal p shows prootscount p x x a changesleder a p a b prootscount p x a x x b changesitvder a b p prootscount p x b x changesgtder b p
allrootsreal p is formally defined as every complex root of having a zero imaginary part,
changesleder a p encodes ,
changesitvder a b p encodes ,
changesgtder b p encodes .
3.2. Descartes’ Rule of Signs
Given , and a polynomial , the Budan-Fourier theorem (Theorem 3.2) in the previous section grants us an effective way to over-approximate (by an even number) through calculating .
Nevertheless, the approximation still requires calculating a Fourier sequence () and a series of polynomial evaluations. When and , the approximation can be refined to counting the number of sign variations of the coefficient sequence of , which requires almost no calculation! Approximating using (rather than ) is the celebrated Descartes’ rule of signs:
Theorem 3.8 (descartessign).
fixes preal poly assumes p 0 shows changes coeffs p prootscount p x 0 x even changes coeffs p prootscount p x 0 x
where changes coeffs p encodes —sign variations of the coefficient sequence of .
3.3. Base Transformation and Taylor Shift
Given , with Descartes’ rule of signs (Theorem 3.8) in the previous section, we can efficiently approximate . However, in many cases, we are interested in roots within a bounded interval: , where and . Can we still exploit the efficiency from Descartes’ rule of signs? The answer is yes, via an operation to transform into such that
In order to develop this transformation operation, we first define the composition of a univariate polynomial and a rational function (i.e., a function of the form , where and are polynomials) in Isabelle/HOL: definition fcompose a field poly a poly a poly a poly where fcompose p q1 q2 fst foldcoeffs a r1r2 r2 a q1 r1q2 r2 p 01 where
q1 and q2 are respectively the numerator and denominator of a rational function,
fst gives the first part of a pair,
a is a constant polynomial lifted from the value .
Also, foldcoeffs is the classical foldr operation on a coefficient sequence: definition foldcoeffs azero b b a poly b b where foldcoeffs f p foldr f coeffs p Essentially, let , , and be three univariate polynomials over some field such that is of degree . Our composition operation over these three polynomials (i.e., fcompose p q1 q2) gives the following polynomial:
The idea of (27) can be illustrated by the following mechanised lemma:
Lemma 3.9 (polyfcompose).
fixes p q1 q2afield poly assumes poly q2 x 0 shows poly fcompose p q1 q2 x poly p poly q1 x poly q2 x poly q2 x degree p
where poly p x gives the value of the polynomial p when evaluated at x.
When and , (27) yields a transformation (i.e., Taylor shift):
with which we have achieved (26):
Lemma 3.10 (prootscountposinterval).
fixes a breal and preal poly assumes p 0 and a b shows prootscount p x a x x b prootscount fcompose p ba 11 x 0 x
ba encodes the polynomial ,
11 stands for the polynomial .
3.4. The Descartes Roots Test
Finally, we come to the Descartes roots test. Given , and , the Descartes roots test is : the number of sign variations on the coefficient sequence of the Taylor-shifted polynomial : definition descartesrootstest real real real poly nat where descartesrootstest a b p nat changes coeffs fcompose p ba 11 where
fcompose p ba 11 encodes Taylor shift as in (28),
coeffs converts a polynomial into its coefficient sequence,
changes calculates the number of sign variations (i.e., ),
nat converts an integer into a natural number.
Just like , whose root approximation property has been reflected in Theorem 3.6, has a similar theorem related to :
Theorem 3.11 (descartesrootstest).
fixes a breal and preal poly assumes p 0 and a b shows prootscount p x a x x b descartesrootstest a b p even descartesrootstest a b p prootscount p x a x x b
which claims that always exceeds by an even number.
As an approximation, it is natural to ask when the Descartes roots test () is exact. From Theorem 3.11, it is easy to see that it would be exact as least when and . Also, it is analogous to Corollary 3.7 that is exact when all roots are real:
Corollary 3.12 (descartesrootstestreal).
fixes a breal and preal poly assumes allrootsreal p and a b shows prootscount p x a x x b descartesrootstest a b p
Ever since the seminal paper by Collins and Akritas (Collins and Akritas, 1976), the Descartes roots test has been closely linked to modern real root isolation (Kobel et al., 2016; Eigenwillig, 2008; Rouillier and Zimmermann, 2004), where an effective method is needed for testing if an interval has zero or exactly one root. Although Sturm’s theorem (which has already been formalised in Isabelle (Li, 2014; Li and Paulson, 2016a; Eberl, 2015a)
) is also up to the task of root testing, it is considered too slow in modern computer algebra. Our mechanised version of the Descartes roots test is, by no means, state of the art; it is probably the most straightforward and naive implementation. Improvements over our current implementation are mainly about avoiding exact arithmetic, and the approaches include partial Taylor shift(Kobel et al., 2016) and bitstream arithmetic (Eigenwillig, 2008, Chapter 3).
4. Extending Sturm’s Theorem to Exactly Count Multiple Roots
With the Descartes roots test we obtained from the previous section, we have an effective method to over-approximate the number of roots (with multiplicity) within an interval. However, we may sometimes want to know the exact number, as we will describe below (§5). For now, we only have the classical Sturm theorem available (in Isabelle/HOL), which only counts distinct real roots. In this section, we extend our previous formalisation of Sturm’s theorem so that we will be able to count roots with multiplicity and exactly.
Our mechanised proof follows Rahman and Schmeisser (Rahman and Schmeisser, 2002, Theorem 10.5.6).
Theorem 4.1 (Sturm’s theorem).
Let , such that , , and . Sturm’s theorem claims
is the number of distinct roots of the polynomial within the interval ,
is the first derivative of ,
is as in Definition 2.1,
is the signed remainder sequence:
such that , , (), and .
The core idea of our extended Sturm’s theorem is to extend the remainder sequence ():
Definition 4.2 (Extended signed remainder sequence).
Let . The extended signed remainder sequence
is defined as , , and for :
until such that by (30).
In Isabelle/HOL, and are respectively mechanised as smods and smods_ext: function smods real poly real poly real poly list where smods p q if p 0 then else p smods q p mod q function smodsext real poly real poly real poly list where smodsext p q if p 0 then else if p mod q 0 then p smodsext q p mod q else p smodsext q pderiv q where  is an empty list and is the Cons operation on lists—adding one element at the start of a list.
As extends (from the back), it is natural to consider as a prefix of :
Lemma 4.3 (smodsextprefix).
fixes p qreal poly defines r last smods p q assumes p 0 and q 0 shows smodsext p q smods p q tl smodsext r pderiv r
last gives the last element of a list,
@ concatenates two lists,
tl removes the head of a list,
pderiv returns the first derivative of a polynomial.
Moreover, we may need to realise that the last element of is actually the greatest common divisor (gcd) of and up to some scalar:
Lemma 4.4 (lastsmodsgcd).
fixes p qreal poly defines r last smods p q assumes p 0 shows r smult leadcoeff r gcd p q
smult multiplies a polynomial with a scalar,
lead_coeff gives the lead coefficient of a polynomial.
Finally, we can state (the bounded version of) our extended Sturm’s theorem:
Theorem 4.5 (sturmextinterval).
fixes a breal and preal poly assumes a b and poly p a 0 and poly p b 0 shows prootscount p x a x x b changesitvsmodsext a b p pderiv p
where changesitvsmodsext a b p pderiv p encodes . Essentially, Theorem 4.5 claims that under some conditions
Proof of Theorem 4.5.
By induction on the length of , and case analysis on whether . When , the proof is trivial since both and provided .
When , we let be the last element of , and Lemma 4.4 gives us
where is the leading coefficient of .
The essential part of the proof is to relate and :
In particular, (33) has been derived by
Moreover, the induction hypothesis yields
and the classical Sturm theorem (Theorem 4.1) yields
Also, by joining Lemma 4.3 and definition of , we may have
Be aware that Lemma 4.5 only corresponds to the bounded version of the extended Sturm’s theorem. Our formal development also contains unbounded versions (i.e., when or ).
5. Applications to Counting Complex Roots
In the previous sections (§3 and §4), we have demonstrated our enhancements for counting real roots in Isabelle/HOL. In this section, we will further apply those enhancements to improve existing complex-root-counting procedures (Li and Paulson, 2018).
In particular, we will first review the idea of counting complex roots through Cauchy indices in §5.1. After that, we will apply the extended Sturm’s theorem (§4) to remove the constraint of forbidding roots on the border when counting complex roots in the upper half-plane (§5.2). In §5.3, we will combine the improved counting procedure (for roots in the upper half-plane) and the base transformation in §3.3 to build a verified procedure to count complex roots within a ball. Finally, we give some remarks about counting complex roots (§5.4).
5.1. Number of Complex Roots and the Cauchy Index
In this section we will briefly review the idea of counting complex roots through Cauchy indices, For a more detailed explanation, the reader can refer to our previous work (Li and Paulson, 2018).
Thanks to the argument principle, the number of complex roots can be counted by evaluating a contour integral:
where , is the first derivative of and is the number of complex roots of (counting multiplicity) inside the loop . Also, by the definition of winding numbers, we have
Now the question becomes how to evaluate the winding number . One of the solutions is to utilise the Cauchy index.
To define the Cauchy index, we need to first introduce the concept of jumps:
Definition 5.1 (Jump).
For and , we define
We can now proceed to define Cauchy indices by summing up these jumps over an interval and along a path.
Definition 5.2 (Cauchy index).
For and , the Cauchy index of over the interval is defined as
Definition 5.3 (Cauchy index along a path).
Given a path and a point , the Cauchy index along about is defined as
As the Cauchy index captures the way that crosses the line , we can evaluate the winding number through the Cauchy index:
Theorem 5.4 ().
Given a valid path and a point , such that is a loop and is not on the image of , we have
What is more interesting is that (or ) can be calculated through remainder sequences and sign variations when (or ) is a rational function. That is, the right-hand side of (44) becomes executable, and we have a procedure to count .
5.2. Resolving the Root-on-the-Border Issue when Counting Roots within a Half-Plane
Fundamentally, the complex-root-counting procedure in the previous section relies on the winding number and the argument principle, both of which disallow roots of on the border . As a result, both mechanised procedures — counting roots within a rectangle and within a half-plane — in our previous work will fail whenever there is a root on the border. In this section, we will utilise our newly mechanised extended Sturm’s theorem to resolve the root-on-the-border issue when counting roots within a half-plane. Note that the root-on-the-border issue for the rectangular case, unfortunately, remains: we leave this issue for future work.
Considering that any half-plane can be transformed into the upper half-plane through a linear-transformation, we only need to focus on the upper-half-plane case: definition prootsupper complex poly nat where prootsupper p prootscount p z Im z 0 where prootsupper p encodes— the number of complex roots of within the upper half-plane .
Previously, we relied on the following lemma to count :
Lemma 5.5 (prootsuppercindexeq).
fixes pcomplex poly assumes leadcoeff p 1 and norealroots xproots p Im x 0 shows prootsupper p degree p cindexpolyubd mappoly Im p mappoly Re p 2
leadcoeff p 1 asserts the polynomial to be monic,
the assumption no_real_roots asserts that does not have any root on the real axis (i.e., the border). This assumption is, as mentioned earlier, because the argument principle disallows roots on the border.
cindexpolyubd mappoly Im p mappoly Re p encodes the Cauchy index
which can be computed (through remainder sequences and sign variations) due to being a rational function.
To solve the root-on-the-border issue in Lemma 5.5, we observe the effect of removing roots from a horizontal border:
Lemma 5.6 (cindexErootsonhorizontalborder).
fixes p q r complex poly and stcomplex and a b sreal defines linepath st st ofreal s assumes p q r and leadcoeff r 1 and xproots r Im x Im st shows cindexE a b t Im poly p t Re poly p t cindexE a b t Im poly q t Re poly q t
where the polynomial is the result of after removing some roots on the horizontal border . Lemma 5.6 claims that
That is, the Cauchy index will remain the same if we only drop roots on a horizontal border.
We can now refine Lemma 5.5 by dropping the no_real_roots assumption:
Lemma 5.7 (prootsuppercindexeq).
fixes pcomplex poly assumes leadcoeff p 1 shows prootsupper p degree p prootscount p x Im x0 cindexpolyubd mappoly Im p mappoly Re p 2
where are respectively the real and complex part of a complex polynomial such that . The rationale behind (45) is that each root of on the real axis () is actually real and is also a root of both and . More importantly, the right-hand side of (45) is where we will apply our extended Sturm’s theorem in §4.
Proof of Lemma 5.7.
Let be the polynomial after removing all the roots on the border (i.e., the real axis) such that
Moreover, and are related by the fundamental theorem of algebra:
and Lemma 5.6 brings us the equivalence between two Cauchy indices:
Finally, we can have a code equation (i.e., executable procedure) from the refined Lemma 5.7 to compute :
Lemma 5.8 (prootsuppercode1code).
prootsupper p if p 0 then let pm smult inverse leadcoeff p p pI mappoly Im pm pR mappoly Re pm g gcd pI pR in nat degree p changesRsmodsext g pderiv g changesRsmods pR pI div 2 else Codeabort STR prootsupper fails when p0 prootsupper p
pm is a monic polynomial produced by divided by its leading coefficient, and this monic polynomial is required by the assumption leadcoeff p 1 in Lemma 5.7.
changesRsmods pR pI stands for
which computes in Lemma 5.7.
The command Codeabort raises an exception in the case of .
Overall, Lemma 5.8 asserts that
is equivalent to an executable expression: the right-hand side of Lemma 5.8. Because it is declared as a code equation, it implements a verified procedure to compute . We can now type the following command in Isabelle/HOL: value prootsupper 1 2 1 to compute , which was not possible in our previous work (Li and Paulson, 2018) since the polynomial has a root on the border (i.e., the real axis).
5.3. Counting Roots within a Ball
In this section, we will introduce a verified procedure to count , the number of complex roots of a polynomial within the ball . The core idea is to use the base transformation in §3.3 to convert the current case to the one of counting roots within the upper half-plane, and then make use of the procedure in §5.2 to finish counting.
Let prootsball denote : definition prootsball complex poly complex real nat where prootsball p z0 r prootscount p ball z0 r
With the transformation operation (fcompose) we developed in §3.3, we can derive the following equivalence relation in the number of roots:
Lemma 5.9 (prootsballplaneeq).
fixes pcomplex poly assumes p 0 shows prootscount p ball 0 1 prootscount fcompose p 1 1 z 0 Im z
where is the degree of .
Moreover, we can relate roots between different balls using normal polynomial composition:
Lemma 5.10 (prootsuballeq).
fixes pcomplex poly and z0complex and rreal assumes p 0 and r 0 shows prootscount p ball z0 r prootscount p p z0 ofreal r ball 0 1
where p encodes the composition operation between two polynomials. Overall, Lemma 5.10 claims