Probabilistic Condition Number Estimates for Real Polynomial Systems II: Structure and Smoothed Analysis

09/10/2018 ∙ by Alperen A. Ergür, et al. ∙ Berlin Institute of Technology (Technische Universität Berlin) Texas A&M University 0

We consider the sensitivity of real zeros of polynomial systems with respect to perturbation of the coefficients, and extend our earlier probabilistic estimates for the condition number in two directions: (1) We give refined bounds for the condition number of random structured polynomial systems, depending on a variant of sparsity and an intrinsic geometric quantity called dispersion. (2) Given any structured polynomial system P, we prove the existence of a nearby well-conditioned structured polynomial system Q, with explicit quantitative estimates. Our underlying notion of structure is to consider a linear subspace E_i of the space H_d_i of homogeneous n-variate polynomials of degree d_i, let our polynomial system P be an element of E:=E_1×...× E_n-1, and let (E):=(E_1) +...+(E_n-1) be our measure of sparsity. The dispersion σ(E) provides a rough measure of how suitable the tuple E is for numerical solving. Part I of this series studied how to extend probabilistic estimates of a condition number defined by Cucker to a family of measures going beyond the weighted Gaussians often considered in the current literature. We continue at this level of generality, using tools from geometric functional analysis.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 1

page 2

page 3

page 4

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1. Introduction

Numerous problems in mathematical modeling reduce to finding a distinguished optimum state of a continuously evolving dynamical system. These type of problems can often be translated into finding the roots of a system of polynomial equations in many variables. Just to name a few examples: The computation of steady states in chemical reaction networks [23], Nash equilibria in economics [McL05], and the determination of protein structures from metric data [1] all lie in this framework. Two frequent features of polynomial systems arising this way are (1) the polynomial systems are highly structured and (2) practitioners are mainly interested in the real solutions of these systems instead of the (non-real) complex solutions.

From a computational complexity point of view, deciding existence of a complex root of a system of complex polynomials is already NP-Hard. From the point of view of geometry, Bertini’s Theorem and Bezout’s Theorem show that a generic (square) system of homogenous polynomials with degrees has exactly many complex solutions. This brings us to a problem posed by Steve Smale: The 17th problem in Steve Smale’s list for 21st century mathematicians asks for an efficient numerical algorithm to find an approximation of a single complex root [43]. This problem is now solved after two decades of intensive research [3, 6, 10, 29]: there are now algorithms that can find a single complex approximate root of a system of homogenous polynomials in average-case polynomial time, and the notion of “approximate” implies a kind of guaranteed fast convergence with respect to Newton iteration. Also, the underlying input size is taken to be

(the underlying number of monomial terms), and average-case polynomial time means a polynomial upper bound on the expectation of the complexity of the algorithm, assuming a particular model of randomness: The random polynomial systems considered have independent Gaussian random coefficients with specially chosen variances. To be more precise, the state of the art is represented by Lairez’s recent article

[30], which proves an expected complexity bound of , under the preceding randomness model, where . The only drawback of these elegant results is that, in practice, the input size is often much smaller than .

In a different direction, there have been polyhedral homotopy algorithms specifically tailored for solving sparse polynomial systems since the early 90’s [46, 22]. The polyhedral homotopy method is implemented in PHCpack and Hom4ps-3 and has been tested on many practical problems (see, e.g., [45, 11]). However, to the best of our knowledge, general and explicit average-case complexity bounds do not yet exist for polyhedral homotopy. Recently, Malajovich has developed a mathematically rigorous toric homotopy iteration for sparse polynomial system solving [34], yielding a promising theoretical framework. Unfortunately, his theory does not yet include an algorithm that is both implementable and provably fast.

A central question behind the complexity of all the algorithms mentioned above is estimating the minimal distance between roots of the underlying polynomial system. This root spacing question is now known to be equivalent to the following question: Given a generic input polynomial system (i.e., a polynomial system with many isolated roots), what is the distance of this generic system to the closest degenerate polynomial system? Both of these questions are captured by the mathematically elegant notion of condition number [7].

The numerical methods discussed in the preceding paragraphs aim to solve generic polynomial systems over the field of complex numbers. In the case of real solutions, generic behavior is replaced by multiple possible typical behaviors. For instance, a small perturbation in the coefficients can change a system from having no real root at all to having many real roots. Luckily, a condition number theory that captures these subtleties was developed by Cucker in [12]. Later, this theory was applied in the design and analysis of a numerical algorithm for real root finding [13, 14, 15]. This condition number also appears in recent papers on numerically computing homology groups of semialgebraic sets [9, 16].

The papers that culminated in the solution of Smale’s 17th problem [3, 6, 10, 29], and the series of papers [13, 14, 15] on real root solving, analyzed the condition number of random polynomial systems defined by

independent Gaussian coefficients with specially chosen variances. The specially chosen variances induce a unitarily invariant measure on the coefficient space, and this invariance property is heavily used in the complexity analysis. Structured polynomial systems, however, often form a much smaller space which is not closed under a unitary action on the variables. So, to enable the analysis of numerical algorithms for structured polynomial systems one has to drop the unitary group invariance assumption on the underlying probability measure.

We use techniques coming from asymptotic geometric analysis and high-dimensional probability ([47], [49]) that have been applied very successfully to the non-asymptotic theory of random matrices — a “linear version” of our problem. These techniques have the advantage of allowing probability measures much more general than Gaussians: In Part I [20] of our present work, we analyzed the real condition number of Cucker for a broad family of measures, without any invariance assumptions. In this current paper, we apply our techniques to derive condition number estimates for structured random real polynomial systems. We then derive smoothed analysis type estimates analyzing the change in the condition number under structure preserving random perturbations.

1.1. The Real Condition Number and Analysis of Algorithms

In this section we present the real condition number of Cucker and comment on it’s relation to analysis of numerical algorithms in real algebraic geometry.

Definition 1.1.

Given and , let be a homogenous polynomial with , and let be the corresponding polynomial system. We set where , and let denote the coefficient of in . We define the Weyl-Bombieri norms of and to be, respectively,
and .
Let be the diagonal matrix with diagonal entries and let denote the linear map between tangent spaces induced by the Jacobian matrix of the polynomial system evaluated at the point . Finally, we define the (normalized) local condition number (for solving ) to be or , according as has full rank or not, where denotes the operator norm of a matrix .

Cucker’s condition number definition from [12] is the following:

Definition 1.2.

[12] Let and . We respectively call and the local and global condition numbers for real solving.

It turns out that the worst case analysis of the numerical algorithm for real root counting in [13], and the worst case analysis of numerical algorithms for computing homology groups in [9, 16], can be analyzed using only two parameters: the condition number of the input polynomial system and the evaluation complexity of the input polynomial system. For instance, in [13] the authors assume the input is a sparse polynomial system with total number of monomials , and they present an analysis of their algorithm based on (an earlier variant of ) and . Even though the analysis in [9, 16] is conducted for dense polynomials with evaluation complexity linear in , it can also be modified for sparse polynomial systems with lower evaluation complexity.

A first step toward going beyond worst-case complexity analysis is average-case analysis of algorithms. The authors of [13, 14, 15] and [9, 16] conducted an average-case analysis of their algorithms using a specially chosen Gaussian measure which is invariant under an orthogonal group action over the variables. As explained before, this specially chosen Gaussian measure creates an obstacle toward conducting average analysis of algorithms on structured polynomial systems. The main result of this article at last enables a more general average-case analysis of these numerical algorithms, including structured inputs.

Smoothed analysis of algorithms, as conceived by Spielman and Teng [44], can be considered as a common generalization of average-case and worst-case complexity analysis. The idea of smoothed analysis is to draw an -ball around every point in the input space, then conduct an average analysis inside all -balls, and then take the supremum over all these averages as a complexity measure. Our main results also include a smoothed analysis of for structured polynomial systems (i.e., the balls are drawn inside the structured linear subspace not in the space of dense polynomials). To the best of our knowledge our results are the first ones to address smoothed analysis of condition number for randomly perturbed structured polynomials.

1.2. Linear Structure in Spaces of Polynomial Systems

In this article we are concerned with polynomials that have a simple structure: polynomials that satisfy a given set of linear relations. More precisely, let

denote the vector space consisting of all real homogenous degree

polynomials with variables (along with the zero polynomial), and let be a linear subspace. When the number of variables is clear from the context, we will often write instead of . We call full (in ) if for every the pointwise evaluation map is not identically zero. For instance, if is the subspace of consisting of homogeneous polynomials lying in , then this particular is not full in (since each polynomial in this vanishes on ).

Fullness is a geometric property, and it can be checked by an optimization procedure once an orthonormal basis for the linear space is fixed. We discuss this further in Section 4.

We generalize to the case of polynomial systems in a straightforward manner: Let be a vector with positive integer coordinates, and let denote the vector space of homogenous real polynomial systems with . Let be linear subspaces and let be defined by . We say is full if the are full for all .

One may wonder if the fullness assumption on our linear structures is really necessary for probabilistic analysis. An easy example to consider is the following: Let us pick so that , and define the following subspaces:

where denotes the the gradient of evaluated at . Note that these particular are codimension linear subspaces of . Set . By construction, for any polynomial system with , has a singularity at . Hence, for all , we have that is infinite, and a probabilistic analysis of on this linear space is not meaningful. On the other hand, we have the following fact on full linear subspaces.

Lemma 1.3.

Let be a full linear subspace. Then for a random polynomial system with the general model of randomness described in Section 1.3, we have that with probability 1.

We emphasize that the randomness model we consider, detailed fully in the next section, is far more general than the restricted Gaussian models considered earlier, and allows many non-Gaussian probability distributions as well. Lemma

1.3 will in fact be an easy consequence of our main condition number bound: Theorem 1.5 of the next section.

The main results of this paper include two quantities related to the linear structure : and . The quantity replaces the dimension count in our earlier bounds from [20], and is a new quantity — the dispersion constant for (in Section 2 below) — related to the geometry of . We explore the basic properties of the dispersion constant in Section 4. For instance, our definition will immediately imply and . Moreover, we also show for linear subspaces and that, if one can find such that (using the obvious action) then . So is indeed a geometric quantity independent of the underlying basis representation.

Our estimates profit when is low-dimensional and suffer when is large. So we investigate how large can typically be as a function of and other parameters. Our first theorem below shows that for linear spaces with , the quantity typically admits a constant upper bound.

Theorem 1.4.

There are constants with the following property: Let and be integers, random linear dimensional subspaces drawn using Haar measure on the Grassmannian for , set , and let . Then for any and we have

with probability greater than .

One should first note that the linear spaces are more general than, say, polynomials with a fixed number of terms in the standard monomial basis: We do not restrict to any basis in Theorem 1.4, but rather study linear subspaces of geometrically. Theorem 1.5 is in fact a reflection of a concentration of measure phenomenon on the Grassmanian of linear spaces in (see, e.g., [31]):

is a function of random variables having the property that the graph of

clusters around the graph of a constant function with high probability.

For numerical algorithms operating over the sphere involving a structured input from a linear space , can be considered as the “condition number of the structure”. We discuss this in Section 4, and indicate ways to compute . Theorem 1.4 shows that a linear space with is typically well-conditioned for deploying numerical algorithms in a structure-preserving way. Simply put, some families of polynomial systems are better-suited for numerical algorithms than others, and Theorem 1.4 gives us a way to make this precise.

1.3. Randomness Assumptions

We say a random vector satisfies the Centering, Sub-Gaussian, and Small Ball properties, with constants and , if the following hold true:
         1. (Centering) For any we have .
         2. (Sub-Gaussian) There is a such that for every we have
                                        for all .
         3. (Small Ball) There is a such that for every vector we have
                                  for all .

We note that these three assumptions directly yield a relation between and : We in fact have (see [20, Inequality (1), just before Sec. 3.2]).

Random vectors that satisfy these three properties form a large family of distributions, including standard Gaussian vectors and uniform measures on a large family of convex bodies called -bodies, such as uniform measures on -balls for all . We refer the reader to the upcoming book of Vershynin [47]

for more details. Discrete sub-Gaussian distributions, such as the Bernoulli distribution, also satisfy an inequality similar to the small ball inequality in our assumptions. However, the small ball type inequality satisfied by such discrete distributions depends not only on the norm of the deterministic vector

but also on the arithmetic structure of . It is possible that our methods combined with the work of Rudelson and Veshynin in the Littlewood-Offord problem [38] can extend our main results to discrete distributions such as Bernoulli. In this work, we will content ourselves with continuous distributions.

The examples of random vectors from the preceding paragraph do not necessarily have independent coordinates and this provides important extra flexibility. There are also interesting examples of random vectors with independent coordinates. In particular, if are independent centered random variables that all satisfy the Sub-Gaussian inequality with constant and the Small Ball condition with , then the random vector also satisfies the Sub-Gaussian and Small Ball inequalities with constants and , where and are universal constants. This is a relatively new result of Rudelson and Vershynin [40]. The best possible universal constant is discussed in [33, 37]. To create a random variable satisfying the Small Ball and Sub-Gaussian properties one can, for instance, start by fixing any and then considering a random variable with density function for suitably chosen positive .

1.4. Main Results

We consider the linear structure to be given and assume that we can find a basis for orthonormal with respect to the Weyl-Bombieri inner product (defined in Section 2 below).

Theorem 1.5.

Let be a vector with positive integer coordinates, let be full linear subspaces, and let . Assume and . Let be independent random elements of that satisfy the Centering property, the Sub-Gaussian property with constant , and the Small Ball property with constant all with respect to the Bombieri-Weyl inner product. We set , and , where is a universal constant. Then for the random polynomial system , we have

Moreover, for , we have . In particular, .

Following this condition number estimate, our next goal is a smoothed-analysis type result for . For this we will need a slightly stronger assumption on the random input. This slightly stronger property is called the Anti-Concentration Property and it replaces the Small Ball assumption in our model of randomness. We will need a bit of terminology to define anti-concentration.

Definition 1.6 (Concentration Function).

For any real-valued random variable and , the concentration function, , is defined as . Let denote the standard inner product on . We then say a random vector satisfies the Anti-Concentration Property with constant if we have for all .

It is easy to check that if the random variable has bounded density then . Moreover, the Lebesque Differentiation theorem states that upper bounds for the function for all imply upper bounds for . See [39] for the details.

Theorem 1.7.

Let be a full linear subspace, and a fixed (deterministic) polynomial system. Assume , and . Now let be a random polynomial system given by the same model of randomness as in Theorem 1.5, but with the Small Ball Property replaced by the Anti-Concentration Property. Set , and , where is a universal constant. Then for the randomly perturbed polynomial system , we have

Moreover, for , we have . In particular, .

We prove a stronger version of Theorem 1.7 in Section 3.4: See Theorem 3.12 and Remark 3.14 there. As a corollary of this smoothed-analysis theorem we derive the following structural result.

Theorem 1.8.

Let be full linear subspaces, let , and let . Then, for every , there is a polynomial system with the following properties:

and

for a universal constant .

One can view this result as a metric entropy type statement as follows: Suppose we are given a bounded set with , and we would like to cover with balls of radius , i.e., . Moreover, suppose we want the ball-centers to have a controlled condition number. We can start with an arbitrary covering of , and use Theorem 1.8 with to find a with controlled condition number in each one of the balls . Then gives a -covering of where has controlled condition number.

Remark 1.9.

In the literature on random polynomials, it is customary to consider a model of randomness expressed in terms of a fixed basis and random coefficients. To create such a model of randomness one can consider the following: Let be a vector with positive integer coordinates, assume that are full linear subspaces with , and let . Suppose for each that is an orthonormal basis for the linear space with respect to the Weyl-Bombieri inner product. Let be independent random vectors satisfying the Centering, Sub-Gaussian, and Small Ball properties with constants and . Consider the random polynomials

where is the th coordinate of . Then, the random polynomial system satisfies the three assumptions in Theorem 1.5. Similarly, if we replace the Small Ball assumption with the Anti-Concentration Property then the assumptions of Theorem 1.7 are satisified by .

2. Preliminaries

We start by introducing the main inner product (on vector spaces of polynomials) that will be used throughout our paper. For -variate degree homogenous polynomials and in , their Weyl-Bombieri inner product is defined as

It is known (see, e.g., [28, Thm. 4.1]) that for any we have

We equip the vector space of -variate degree homogenous polynomials with the Weyl-Bombieri inner product and we consider pointwise evaluations . More precisely, for any , consider the . For the natural resulting Hilbert space structure on , the Riesz Representation Theorem tells us that for any there exists a corresponding unique such that, for all , we have

It is easy to show that for the norm on induced by , we have for all . In particular, for the Weyl-Bombieri inner product it is a simple exercise to verify that .

Now, let be a linear subspace, and let denote orthogonal projection onto . Then, for all , we have . Even though is fixed for all , can vary arbitrarily between to . This is due to the fact that is closed under -action, and all directions are “equal”. In particular, need not be closed under this action and every has weight .

We now state some basic probabilistic estimates for our structured setup.

Lemma 2.1.

Let be a full linear subspace, and let be a random element of that satisfy the Centering Property, the Sub-Gaussian Property with constant , and the Small Ball Property with constant . Then, for all , the following estimates hold:

Proof.

By definition, we have . So by the Cauchy-Schwartz Inequality and the assumed properties on the distribution for , we are done.

For notational convenience, we define two extremal quantities for a linear subspace :

The following is an immediate corollary of Lemma 2.1.

Corollary 2.2.

Assume the hypothesis of Lemma 2.1, then for all the following estimates hold:

We now define the Weyl-Bombieri inner product for polynomial systems and derive some basic probabilistic estimates. Let and let denote the space of (real) systems of homogenous -variate polynomials with respective degrees . Then for and we define their Weyl-Bombieri inner product to be . We also let .

Let be linear subspaces, and let denote a linear space of polynomial systems with for all . We define the following quantities for notational convenience: and . We are now ready to present our basic tool.

Lemma 2.3.

Let be a vector with positive integer coordinates, let be full linear subspaces, and let . Let be random elements of that satisfy the Centering Property, the Sub-Gaussian Property with constant , and the Small Ball Property with constant , all with respect to Bombieri-Weyl inner product. Then, for the random polynomial system and all , the following estimates hold:

where and are absolute constants.

The reader has perhaps already observed the contrast between the two inequalities in Lemma 2.3. The first inequality controls how large is, while the second inequality controls how small is. This contrast between these inequalities will be our main tool throughout this paper. So we will need to somehow use the ratio of and .

Definition 2.4.

Let be a linear subspace, and let and be as defined above. Then we call the dispersion constant of . Now, for and , we then define the dispersion constant (for an -tuple of linear spaces) to be .

For now, the reader should be aware of two things: will appear in main theorems in the following section, and the last section of this article is completely devoted to understanding . So after one finishes reading preliminaries, the remaining two sections can be read independently.

Now we prove Lemma 2.3

. For the proof we need to recall some theorems from probability theory and some basic tools developed in Part I

[20] of our present work. These basic lemmata will also be used later in other proofs. We start with a theorem which is reminiscent of Hoeffding’s classical inequality [21].

Theorem 2.5.

[49, Prop. 5.10] There is an absolute constant with the following property: If are centered, sub-Gaussian random variables with constant , and and , then

We will also need the following standard lemma (see, e.g., [38, Lemma 2.2]).

Lemma 2.6.

Assume are independent random variables that have the property that for all . Then , , where . Moreover, if are independent random variables such that, for every , we have , then there is a universal constant such that for every we have .

Now we have the basic tools to “tensorize” sub-Gaussian tail bounds and small ball inequalities. We proceed with deterministic inequalities for polynomial systems.

The following lemma was proved in our earlier paper [20], generalizing a classical Theorem of Kellog [24]. To state the lemma we need a bit of terminology. For any system of homogeneous polynomials define . Let denote the Jacobian matrix of the polynomial system at point , let denote the image of the vector under the linear operator , and set .

Lemma 2.7.

Let be a polynomial system with homogeneous of degree for each and set . Then:

  1. We have and, for any mutually orthogonal , we also have .

  2. If for all then we also have .

The last lemma we need is a discretization tool for homogenous polynomial systems that was developed in [20] based on Lemma 2.7.

Lemma 2.8.

Let be a system of homogenous polynomials with variables and . Let be a -net on . Let and . Similarly let us define,

  1. When for all we have and
    .

  2. .95[1]When we have and .

Proof of Lemma 2.3: We begin with the first claim. Using Lemma 2.1 and the fact that yield the following estimate for any and for all :

Now let with , and apply Lemma 2.5 to the sub-Gaussian random variables and the vector :

Observe that . For any fixed point and a free variable , we have that is a linear polynomial on . We then use Lemma 2.8 for this linear polynomial, which gives us the following estimate:

We need to control . So we set (i.e assuming ), , and we have the following estimate with some universal constant .

We continue with proof of the second claim. Using Lemma 2.1 and the fact that for all , we deduce the following estimate hold for all and for any :

Using Lemma 2.6 on random variables gives the following estimate:

3. Probabilistic Analysis of Condition Number for Structured Polynomial Systems

In this section we will prove our main theorem on probabilistic analysis of the condition number for structured polynomial systems. Recall that for a given , we defined to be the vector space of homogenous polynomial systems with . We call a system of homogenous polynomials overdetermined if , square if , and underdetermined if . Our techniques in this section work for probabilistic analysis of for square and overdetermined polynomial systems. However, for the sake of simplicity in presentation we will state and prove our theorems only for the square case. Deriving estimates for the overdetermined case based on the arguments presented in this article is routine. For technical details of the overdetermined case, the interested reader is invited to consult our earlier paper [20].

The definitions of local condition number and global condition number were given in the introduction as follows: and