Local solution clustering for a triangular system of polynomials

06/26/2018 ∙ by Rémi Imbach, et al. ∙ NYU college 0

This report is about finding clusters of complex solutions of triangular systems of polynomial equations. We introduce the local solution clustering problem for a system of polynomial equations, that is grouping all its complex solutions lying in an initial complex domain in clusters smaller than a given real number ϵ>0, and counting the sum of multiplicities of the solutions in each clusters. For triangular systems, we propose a criterion based on the Pellet theorem to count the sum of the multiplicities of the solutions in a cluster. We also propose an algorithm for solving the local solution clustering problem for triangular systems, based on a recent near-optimal algorithm for clustering the complex roots of univariate polynomials. Our algorithm is numeric and certified. We implemented it and compared it with two homotopy solvers for randomly generated triangular systems. Our solver always give correct answers, is often faster than the homotopy solver that gives often correct answers, and sometimes faster than the one that gives sometimes correct results.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 1

page 2

page 3

page 4

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

This report considers the fundamental problem of finding the complex solutions of a system of polynomial equations in complex variables . The system is triangular in the sense that for

. Throughout this paper, we use boldface symbols to denote vectors and tuples; for instance

stands for .

We are interested in finding clusters of solutions of triangular systems and in counting the total multiplicity of solutions in clusters. Solving triangular systems is a fundamental task in polynomial equations solving, since many algebraic approaches (Gröbner basis, CAD, resultants,…) generally reduce the original system to triangular systems.

The problem of isolating the complex solutions of a polynomial system in an initial region-of-interest (ROI) is defined as follows: let denote the set of solutions of in , regarded111 A multiset is a pair where is an ordinary set called the underlying set and assigns a positive integer to each . Call the multiplicity of in , and the total multiplicity of . Also, let denote the cardinality of . If , then is called a singleton. We can form the union of two multisets with underlying set , and the multiplicities add up as expected. as a multiset.

Local Isolation Problem (LIP): Given: a polynomial map , a polybox , Output: a set of pairwise disjoint polydiscs of radius where Output: - . Output: - each is a singleton.

This is “local” because we restrict attention to roots in a ROI . There are two issues with (LIP) as formulated above: deciding if is a singleton, and deciding if such a singleton lies in , are two “zero problems” that require exact computation. Generally, this can only be decided if is algebraic. Even in the algebraic case, this may be very expensive. In [22, 4] these two issues are side-stepped by defining the local clustering problem which is described next.

Before proceeding, we fix some general notations for this paper. A polydisc is a vector of complex discs. The center of is the vector of the centers of its components and the radius of is the vector of the radii of its components. If is any positive real number, we denote by the polydisc that has the same center than and radius . We also say if each component of is . A (square complex) box is a complex interval where and ; the width of is and the center of is . A polybox is a vector of boxes. The center of is the vector of the centers of its components; the width of is the max of the widths of its components. If is any positive real number, we denote by the polybox that has the same center than and width . It is also convenient to identify as the subset of ; a similar remark applies to .

We introduce three notions to define the local solution clustering problem. Let be a solution of . The multiplicity of in , also called the intersection multiplicity of in is classically defined by localization of rings as in [23, Def. 1, p. 61], we denote it by . An equivalent definition uses dual spaces, see [11, Def.  1, p. 117]. For any set , we denote by the multiset of zeros (i.e., solutions) of in , and the total multiplicity of . If is a polydisc, we call a cluster if it is non-empty, and is an isolator of the cluster. If in addition, we have that , we call a natural cluster and call a natural isolator. In the context of numerical algorithm, the notion of cluster of solutions is more meaningful than that of solution with multiplicity since the perturbation of a multiple solution generates a cluster. We thus “soften” the problem of isolating the solutions of a triangular system of polynomial equations while counting their multiplicities by translating it into the local solution clustering problem defined as follows:

Local Clustering Problem (LCP): Given: a polynomial map , a polybox , Output: a set of pairs where: Output: - the s are pairwise disjoint polydiscs of radius , Output: - each Output: - .

In this (LCP) reformulation of (LIP), we have remove the two “zero problems” noted above: we output clusters to avoid the first problem, and we allow the output to contain zeroes outside the ROI to avoid the second one. We choose for simplicity; it is easy to replace the factor of by for any desired .

Overview.

In the remaining of this section we explain our contribution, summarize previous work and the local univariate clustering method of [4]. In Sec. 2, we define the notion of tower of clusters together with a recursive method to compute the sum of multiplicities of the solutions it contains. Sec. 3 analyzes the loss of precision induced by approximate specialization. Our algorithm for solving the local clustering problem for triangular systems is introduced in Sec. 4. The implementation and experimental results are presented in Sec. 5.

1.1 Our contributions

We propose an algorithm for solving the complex clustering problem for a triangular system with a zero-dimensional solution set. To this end, we propose a formula to count the sum of multiplicities of solutions in a cluster. Our formula is derived from a result of [23] that links the intersection multiplicity of a solution of a triangular system to multiplicities in fibers. We define towers of clusters to encode clusters of solutions of a triangular system in stacks (or towers) of clusters of roots of univariate polynomials.

Our algorithm leverages from the triangular form of : it computes first clusters of solutions of , then clusters of roots of on fibers over clusters previously found. The components of those fibers are clusters of roots of univariate polynomials that are advantageously represented by oracles; oracle means here a procedure providing approximations at any precision. The coefficients of specialized in fibers are thus also known by oracles.

To compute clusters of roots of a univariate polynomial given as an oracle, we rely on the recent algorithm described in [4], based on a predicate introduced in [5] that combines Pellet’s theorem and Graeffe iterations to determine the number of roots counted with multiplicities in a complex disc; this predicate is called soft because it only requires the polynomial to be known as approximations. It is used in a subdivision framework combined with Newton iterations to achieve a near optimal complexity.

We implemented and experimented our algorithm; we show that it compares advantageously with two homotopy solvers: HOM4PS-2.0 that is fast but not robust and Bertini that is more robust but slower.

1.2 Related work

There is a vast literature on solving polynomial systems and we can only refer to book surveys and references therein, see for instance [12, 20]. On the algebraic side, symbolic tools like Groebner basis, resultant, rational univariate parametrization, triangularization, enable to reduce the problem to the univariate case. These methods are global: they do not take advantage of solving in a predefined small domain. Being symbolic, these methods handle all input, in particular with solutions with multiplicities, and are certified but at the price of a high complexity that limits their use in practice.

On the numerical side, one can find subdivision and homotopy methods. The main advantage of subdivision methods is their locality: the practical complexity depends on the size of the solving domain and the number of solutions in this domain. Their main drawback is that they are only practical for low dimensional systems. On the other hand, homotopy methods are efficient for high dimensional systems, they are not local but solutions are computed independently from one another. Numerical methods only work for restricted classes of systems and the certification of the output remains a challenge. Multiprecision arithmetic, interval analysis, deflation and -theory are now classical tools to address this certification issue [19, 13, 6, 21].

In the univariate case, practical certified algorithms are now available for real and complex solving that match the best known complexity bounds together with efficient implementations [17, 15].

Only a few work address the specific problem of solving triangular polynomial systems. The solving can then be performed coordinate by coordinate by specialization and univariate solving in fibers. For real solving, see [7] for the regular case and [8, 23] for systems with solutions with multiplicities.

1.3 Definitions and Notation

Convention for Vectors. We introduce some general conventions for vectors that will simplify the following development. Vectors are indicated by bold fonts. If is an -vector, and , then the -th component is222 In general, since and are independent variables. So our bold font variables do not entail the existence of non-bold font counterparts such as . denoted and the -vector is denoted . Thus , and “” is an idiomatic way of saying that is an -vector. Because of the subscript convention, we will superscripts such as , etc, to distinguish among a set of related -vectors.

Normed Vector Spaces. In order to do error analysis, we need to treat and as normed vector spaces: for and , let and denote the infinity norm on polynomials and vectors, respectively. We use the following perturbation convention: let . Then we will write to denote some polynomial that satisfies . Similarly, denotes some vector that satisfies . If then and are called -bit approximations of and , respectively.

We define the degree sequence of to be where is the degree of in . If (), let denote the polynomial that results from the substitution (for ). The result is a polynomial called the specialization of by . Note that is a polynomial in at most variables. In particular, when , then is a constant (called the evaluation of at ). For instance, suppose , then is a polynomial in and .

If is a box with center and width , we denote by the disc with center and radius . Note that contains . If is a polybox, let be the polydisc where .

Oracle Computational Model. We use two kinds of numbers in our algorithms: an explicit kind which is standard in computing, and an implicit kind which we call “oracles”. Our explicit numbers are dyadic numbers (i.e., bigFloats), . A pair of integers represents the nominal value of . However, we also want this pair to represent the interval . To distinguish between them, we write for the nominal value, and for the interval of width . Call an -bit dyadic interval if (so the interval has width at most ). Note that and are different despite having the same nominal value. As another example, note that is the interval . When we say a box, disc, polybox, etc, is dyadic, it means that all its parameters are given by dyadic numbers. The set of closed intervals with dyadic endpoints is denoted . Also, let denote -dimensional dyadic boxes.

The implicit numbers in our algorithms are functions: for any real number , an oracle for is a function such that is an -bit dyadic interval containing . There is normally no confusion in identifying the real number with any oracle function for . Moreover, we write instead of . E.g., if is a real algebraic number with defining polynomial and isolating interval , we may define an oracle for in a fairly standard way. Next, an oracle for a complex number is a function such that where are oracles for and . Again, we may identify with any oracle , and write instead of . For polynomials in variables, we assume a sparse representation, with fixed support , with coefficients , and are power products. An oracle for amounts to having oracles for each coefficient of . Moreover may be written is the interval polynomial whose coefficients are . Call a dyadic interval polynomial.

1.4 Oracles for Root Cluster of Univariate Polynomials

The starting point for this paper is the fundamental result that the Local Clustering Problem (LCP) has been solved in the univariate setting:

Proposition 1 (See [4, 5])

There is an algorithm that solves the Local Clustering Problem when is a univariate oracle polynomial.

In other words, the output of is a set such that each is a natural -isolator, and

To make this result the basis of our multivariate clustering algorithm, we need to generalize this result. In particular, we need to be able to further refine each output of this algorithm. If represents the cluster of roots, we want to get better approximation of , i.e., we want to treat like number oracles. Fortunately, the algorithm in [4, 5] already contains the tools to do this. What is lacking is a conceptual framework to capture this.

Our goal is to extend the concept of number oracles to “cluster oracles”. To support the several modifications which are needed, we revise our previous view of “oracles as functions”. We now think of an oracle as a computational object with state information, and which can transforms itself in order to update its state information. For any , recall that is a dyadic object that is at least -bit accurate. E.g., if is the oracle for , is an interval containing of width . But now, we say that oracle is transformed to a new oracle which we shall denote by “” whose state information is . In general, let denote the state information in . Next, for a cluster of roots of a univariable polynomial , its oracle has state that is a pair where is a dyadic disc satisfying and is the total multiplicity of . Thus is automatically a natural cluster. We say is -bit accurate if the radius of is at most . Intuitively, we expect to be an oracle for that is -bit accurate. Unfortunately, this may be impossible unless is a singleton cluster. In general, we may have to split into two or more clusters. We therefore need one more extension: the map returns a set

of cluster oracles with the property that (union of multisets), and each is -bit accurate. We generalize Proposition 1 so that it outputs a collection of cluster oracles:

Proposition 2 (See [4, 5, 15])

Let be an oracle for a univariate polynomial . There is an algorithm that returns a set of cluster oracles such that

and each is -bit accurate.

2 Sum of multiplicities in clusters of solutions

We extend in Sec. 2.2 a result of [23] to an inductive formula giving the sum of multiplicities of solutions of a triangular system in a cluster. In Sec. 2.3, we introduce a representation of clusters of solutions of called tower representation, reflecting the triangular form of . Sec. 2.1 presents two illustrative examples.

2.1 Two examples

Let be an integer. We define the systems and as follows:

(1)
(2)

has 4 solutions: , , and . has 6 solutions: , , , , and . For , let . The solutions of both and are depicted in Fig. 1.


Figure 1: On the left (resp. right), the solutions of (resp. ) defined in Eq. 1 (resp. Eq. 2) with . (resp. ) is the polybox of with center (resp. ) and width . The boxes in dashed lines are the real parts of and . In the frame, the multiplicities of solutions of each system are computed with the formula of Zhang (Proposition 3) and Thm. 2.1.

2.2 Sum of multiplicities in a cluster

We recall a theorem of Zhang [23] for counting multiplicities of solutions of triangular systems, based multiplicities in fibers. We may rephrase it inductively:

Proposition 3 ([23])

Let and be a solution of the triangular system . The multiplicity of in is

We extend Proposition 3 to a formula giving the total multiplicity of a cluster .

Theorem 2.1

Let be a cluster of solutions of the triangular system . If there is an integer so that for any solution , one has , then

where when .

Let us apply Proposition 3 to compute the multiplicities of solutions of and (see Eq. 1 and Eq. 2). has multiplicity in : . has multiplicity in : . The multiplicities of other solutions are given in fig. 1.

Let be the polybox centered in having width . and . Since , applying Thm. 2.1 yields .

and . Again, one has
. Thus applying Thm. 2.1 yields .

Let be the polybox centered in having width . One can apply Thm. 2.1 to obtain and . The real parts of and are depicted in Fig. 1.

Proof (of Thm. 2.1.)

Remark that can be defined in an inductive way as . Using Proposition 3, we may write
. ∎

2.3 Tower Representation

Definition 1 (Tower Representations)

Let be a polydisc and a -vector of positive integers. The pair is a tower (relative to ) if it satisfies: if , then is a cluster representation relative to . Inductively, if then:

  1. is a tower relative to

  2. , is a cluster representation relative to .

The height of the tower is .

If we replace ‘cluster’ by ‘natural cluster’ in the above definition, then a natural tower. If is a polybox, and is a tower relative to , then we can also call a (polybox) tower relative to . Below, we will only consider natural towers and will omit the word natural.

Let be defined as in Eqs. 1 and 2 and , be as defined in 2.2. The pair is a tower relative to . Moreover, if , and are towers relative to . Consider the polynomial . If then and for any , has 3 roots counted with multiplicity in and in . Hence for any , is a tower relative to then is a tower relative to . Similarly, is a tower relative to . and are (polybox) towers relative to .

In contrast, although is a tower relative to , there exist no tower relative to having or as box: , 0 and are three points of ; consider the three polynomials , and . and have each 1 root of multiplicity 1 in while has 1 root of multiplicity in : there is no that satisfy condition of Def. 1. In the case of , and have both 1 root of multiplicity 1 in while has no root in .

An immediate consequence of the previous theorem is

Corollary 4

Let be a tower relative to of height . Then

Inductively, we have

Remark finally that if is a tower relative to and is an oracle for for any , one can use , as specified in Prop. 1, to compute clusters of for any in a box . If this returns a list , then for all , is a tower relative to , and from corollary 4, . Moreover, . In other words, is a solution for the clustering problem in .

We show in Sec. 4 how to setup an oracle for for any . This oracle may refine and split it into several clusters.

3 Error Analysis of Approximate Specializations

The proofs for this section is found in the Appendix.

Given and , our basic goal is to bound the evaluation error

in terms of and . This will be done by induction on . Our analysis aims not just to produce some error bound, but to express this error in terms that are easily understood, and which reveals the underlying inductive structure. Towards this end, we introduce the following -bound function: if is a positive integer and ,

(3)

Let , i.e., for each . The support of is where where . Here, . We assume that . Our induction variable is . For , let . E.g., if then . Thus . Next define . With this notation, we can write

(4)

where each . E.g., if then and so . Assume that we are given and . Also the degree sequences satisfies , that is the inequality holds componentwise. Then we may define these quantities for :

Note that is a operator that must attach to some function or vector to denote the “th perturbation” of or .

Lemma 5

Let and :

(i) .
(ii) .
(iii) .

We now have a recursive bound . But we need to convert the bound to only depend on the data