1 Introduction
Background and related work.
In 2010, Guth and Katz [14] resolved the Erdős distinct distances problem in the plane. A major ingredient in their proof was a partitioning theorem for points in $\mathbb{R}^d$. Specifically, they proved that, given a set of $n$ points in $\mathbb{R}^d$ and an integer $D \ge 1$, there is a $d$-variate “partitioning polynomial” $P$ of degree at most $D$ so that each connected component of $\mathbb{R}^d \setminus Z(P)$ contains $O(n/D^d)$ points from the set. Their polynomial partitioning theorem has led to a flurry of new results in combinatorial and incidence geometry, harmonic analysis, and theoretical computer science.
The Guth-Katz result established the existence of a partitioning polynomial, but it did not give an effective way to compute such a polynomial given a set of points. In [3], Agarwal, Matoušek, and Sharir developed an efficient algorithm to compute partitioning polynomials, matching the degree bound obtained in [14] up to a constant factor. They used their algorithm to obtain a linear-size data structure for the problem of range searching with semialgebraic sets in the “low storage / sublinear query” regime.
In 2015, Guth [13] generalized the Guth-Katz partitioning polynomial result from points to semialgebraic sets. (Guth stated his result for the special case where the semialgebraic sets are real algebraic varieties, but his proof in fact holds in the more general setting of semialgebraic sets.) Recall that a semialgebraic set in $\mathbb{R}^d$ is the locus of points in $\mathbb{R}^d$ that satisfy a Boolean formula over a set of polynomial inequalities. Informally, he proved that given a collection of $n$ $g$-dimensional semialgebraic sets in $\mathbb{R}^d$ (we refer the reader to [9, Chapter 2] for a formal definition of the dimension of a semialgebraic set) and an integer $D \ge 1$, there is a $d$-variate partitioning polynomial $P$ of degree at most $D$ so that each connected component of $\mathbb{R}^d \setminus Z(P)$ intersects $O(n/D^{d-g})$ semialgebraic sets from the collection (the implicit constant in the $O(\cdot)$ notation depends on $d$ and on the degree and number of polynomials required to define each semialgebraic set). We refer to such a polynomial as a generalized partitioning polynomial.
Like the Guth-Katz result, Guth’s proof established the existence of a generalized partitioning polynomial, but it did not give an effective way to compute such a polynomial given a collection of semialgebraic sets. In [4], the last three authors developed a computationally efficient way to construct a partitioning polynomial for a set of algebraic curves in $\mathbb{R}^3$. For other settings, however, no effective method for computing a partitioning polynomial was known prior to the present work.
Our results.
Our main result is a computationally efficient implementation of Guth’s polynomial partitioning theorem for semialgebraic sets (Theorem 4). Given a set of semialgebraic sets in , our algorithm computes a polynomial partition of degree in expected running time linear in and singly exponential in .
Next, we present four applications of our algorithm in Section 4:

Let be a family of semialgebraic sets in , each of complexity at most for some constant (see Section 2 for the definition of the complexity of a semialgebraic set). Each set in is assigned a weight that belongs to a semigroup. We present a data structure of size , for any constant , that can compute, in time, the cumulative weight of the sets in containing a query point. The data structure can be constructed in randomized expected time. This is a significant improvement over the best known data structure by Koltun [16], for , that used space.

Let be a set of points in , each of which is assigned a weight, and let be a (possibly infinite) family of semialgebraic sets in . Suppose that there exists a positive integer and an injection so that for each , the set is a semialgebraic set in of complexity at most . We can construct in randomized expected time a data structure of size , for any constant , that can compute in time the cumulative weight of for a query range . The previous best known data structure used space.

Given a family of semialgebraic sets in , we present a data structure of size , for any constant , that can answer vertical ray shooting queries in time. The data structure can be constructed in randomized expected time.

Finally, we follow the technique of Sharir and Zahl [19] to cut algebraic planar curves into a collection of pseudosegments (that is, a collection of Jordan arcs, each pair of which intersects at most once), where the constant of proportionality depends on the degree of the curves. By exploiting Theorem 4, we show that this collection can be constructed within a comparable time bound.
2 Preliminaries
In what follows, the complexity of a semialgebraic set in is the minimum value so that can be represented as the locus of points satisfying a Boolean formula with at most atoms of the form or , with each being a variate polynomial of degree at most .
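To make the definition concrete, the following is a minimal sketch of a semialgebraic set stored as a Boolean formula over polynomial atoms. The class names `Atom` and `SemialgebraicSet` are hypothetical, and the sketch assumes that each atom is either a strict inequality or an equality of a single polynomial; the number of atoms and their degrees are what the complexity measures.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

Point = Sequence[float]


@dataclass(frozen=True)
class Atom:
    """One atomic predicate: poly(x) < 0 if strict, else poly(x) == 0."""
    poly: Callable[[Point], float]
    strict: bool = True

    def holds(self, x: Point) -> bool:
        value = self.poly(x)
        # Exact comparison with zero is only meaningful for exact inputs;
        # floating-point data would need a tolerance (not modeled here).
        return value < 0 if self.strict else value == 0


@dataclass(frozen=True)
class SemialgebraicSet:
    """Locus of points whose atom truth values satisfy a Boolean formula."""
    atoms: Tuple[Atom, ...]
    formula: Callable[[Sequence[bool]], bool]

    def contains(self, x: Point) -> bool:
        return self.formula([atom.holds(x) for atom in self.atoms])


# Example: the open unit disk intersected with the open half-plane x > 0.
# Two atoms, each a polynomial of degree at most 2.
half_disk = SemialgebraicSet(
    atoms=(
        Atom(lambda p: p[0] ** 2 + p[1] ** 2 - 1),  # x^2 + y^2 - 1 < 0
        Atom(lambda p: -p[0]),                      # -x < 0
    ),
    formula=lambda truth: truth[0] and truth[1],
)
```

In this toy instance the set is described by two atoms of degree at most two, so its complexity in the sense above is at most two.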
Hereafter we write to mean that there exists a constant depending only on so that , for all positive integers .
Our analysis makes extensive use of concepts and results from real algebraic geometry and random sampling. We review them below.
2.1 Polynomials, partitioning, and quantifier elimination
Sign conditions.
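Consider polynomials $P_1, \dots, P_k \in \mathbb{R}[x_1, \dots, x_d]$. A sign condition on $(P_1, \dots, P_k)$ is an element of $\{-1, 0, 1\}^k$. A strict sign condition on $(P_1, \dots, P_k)$ is an element of $\{-1, 1\}^k$. A sign condition $\sigma$ is realizable if the set
(1) $\{x \in \mathbb{R}^d : \operatorname{sign}(P_j(x)) = \sigma_j \text{ for each } j = 1, \dots, k\}$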
is nonempty. A realizable strict sign condition is defined analogously. The set (1) is called the realization of the sign condition. The set of realizations of sign conditions (resp., realizations of strict sign conditions) corresponding to the tuple is the collection of all nonempty sets of the above form. These sets are pairwise disjoint and partition , by definition.
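As an illustration of these definitions, the sketch below evaluates the sign vector of a tuple of polynomials at sample points and collects the sign conditions that are witnessed. The function names are hypothetical. Sampling on a grid only certifies that the conditions it finds are realizable; in general it may miss some, which is why the paper relies on Proposition 2 instead.

```python
from itertools import product


def sign(v: float) -> int:
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def sign_vector(polys, x):
    """Sign condition realized by the point x for the tuple polys."""
    return tuple(sign(p(x)) for p in polys)


def sampled_sign_conditions(polys, points):
    """Sign conditions witnessed by at least one of the given points."""
    return {sign_vector(polys, x) for x in points}


# Two polynomials in the plane: P1(x, y) = x and P2(x, y) = y.
polys = [lambda p: p[0], lambda p: p[1]]
grid = list(product((-1.0, 0.0, 1.0), repeat=2))

found = sampled_sign_conditions(polys, grid)
strict = {s for s in found if 0 not in s}
```

For the two coordinate axes, the grid witnesses all nine sign conditions, of which four (the open quadrants) are strict.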
Polynomials and partitioning.
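The set of polynomials in $\mathbb{R}[x_1, \dots, x_d]$ of degree at most $b$ is a real vector space of dimension $\binom{b+d}{d}$; we identify this vector space with $\mathbb{R}^{\binom{b+d}{d}}$. For a point $q \in \mathbb{R}^{\binom{b+d}{d}}$, let $P_q$ be the corresponding polynomial of degree at most $b$.
Remark 1.
Consider the polynomial given by $(q, x) \mapsto P_q(x)$. Since we can write $P_q(x) = \sum_I q_I x^I$, where each $x^I$ is a monomial of degree at most $b$, this polynomial has degree at most $b + 1$ in the variables $(q, x)$.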
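The identification of bounded-degree polynomials with coefficient vectors can be checked numerically. The sketch below enumerates the monomials in a fixed number of variables up to a degree bound and evaluates the polynomial attached to a coefficient vector. The names `d`, `b`, `monomial_exponents`, and `evaluate` are illustrative choices, not notation taken from the paper.

```python
from math import comb


def monomial_exponents(d: int, b: int):
    """Yield all exponent tuples (e_1, ..., e_d) with e_1 + ... + e_d <= b."""
    if d == 0:
        yield ()
        return
    for e in range(b + 1):
        for rest in monomial_exponents(d - 1, b - e):
            yield (e,) + rest


def evaluate(coeffs, exps, x):
    """Evaluate sum_I coeffs[I] * x^I; each term is linear in the coefficient."""
    total = 0.0
    for c, e in zip(coeffs, exps):
        term = c
        for xi, ei in zip(x, e):
            term *= xi ** ei
        total += term
    return total


# Bivariate polynomials of degree at most 1: basis 1, y, x (in this order).
exps = list(monomial_exponents(2, 1))
value = evaluate([1.0, 2.0, 3.0], exps, (10.0, 100.0))  # 1 + 2*y + 3*x
```

Each term is a coefficient times a monomial of bounded degree, which is the observation behind Remark 1: viewed as a polynomial in the coefficients and the point jointly, the degree increases by one.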
For each positive integer , let be the smallest positive integer so that ; we have . For each , pick a dimensional subspace of the vector space of polynomials in of degree at most . These subspaces will be fixed hereafter. For each positive integer , define the product space
(2) 
We identify each point with a tuple of polynomials where . For each , and thus .
Let , let be a collection of semialgebraic sets in , and let . We say that is a partitioning tuple for if has realizable strict sign conditions and the realization of each of them intersects at most sets from .
Guth [13] proved that, if is chosen appropriately, then a partitioning tuple is guaranteed to exist:
Proposition 1 (Generalized Polynomial Partitioning [13]).
Let be a family of semialgebraic sets in , each of dimension at most and complexity at most . For each , there exists a partitioning tuple for , with
We also recall Theorem 2.16 from [7]:
Proposition 2 (Point Location in Semialgebraic Sets).
Let be a set of at most polynomials in of degree at most . Then there is an algorithm that computes a set of points meeting every semialgebraically connected component of every realizable sign condition on in time . There is also an algorithm providing the list of signs of all the polynomials of at each of these points in time .
Singly exponential quantifier elimination.
Let and be nonnegative integers and let . Let be a first-order formula given by
(3) 
where is a block of free variables; is a block of variables, and is a quantifier-free Boolean formula with atomic predicates of the form , with
The Tarski-Seidenberg theorem states that the set of points satisfying the formula is semialgebraic. The next proposition is a quantitative version of this result that bounds the number and degree of the polynomial equalities and inequalities needed to describe the set of points satisfying . This proposition is known as “singly exponential quantifier elimination,” and its more general form (where may contain a mix of and quantifiers) can be found in [7, Theorem 2.27].
Proposition 3.
Let be a set of at most polynomials, each of degree at most in real variables. Given a formula of the form (3), there exists an equivalent quantifier-free formula
(4) 
where are polynomials in the variables , ,
(5)  
and the degrees of the polynomials are bounded by .
2.2 Range spaces, VC dimension, and samples
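We first recall several standard definitions and results from [15, Chapter 5]. A range space is a pair $\Sigma = (X, \mathcal{R})$, where $X$ is a set and $\mathcal{R}$ is a collection of subsets of $X$. Let $\Sigma = (X, \mathcal{R})$ be a range space and let $A \subseteq X$ be a set. We define the restriction of $\Sigma$ to $A$, denoted by $\Sigma_A$, to be $(A, \mathcal{R}_A)$, where $\mathcal{R}_A = \{R \cap A : R \in \mathcal{R}\}$. If $A$ is finite, then $|\mathcal{R}_A| \le 2^{|A|}$. If equality holds, then we say $A$ is shattered. We define the shatter function by $\pi_{\mathcal{R}}(z) = \max_{A \subseteq X,\ |A| = z} |\mathcal{R}_A|$. The VC dimension of $\Sigma$ is the largest cardinality of a set shattered by $\mathcal{R}$. If arbitrarily large (finite) subsets can be shattered, we say that the VC dimension of $\Sigma$ is infinite.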
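The notions of restriction and shattering can be tested by brute force on a small finite range space. The sketch below uses hypothetical helper names, and takes discrete intervals on a four-element ground set as the ranges; it computes the VC dimension by exhaustive search, which is only feasible for tiny examples.

```python
from itertools import combinations


def restriction(ranges, subset):
    """Distinct traces of the ranges on the given subset."""
    s = frozenset(subset)
    return {frozenset(r) & s for r in ranges}


def is_shattered(ranges, subset):
    """A subset is shattered if every one of its subsets arises as a trace."""
    return len(restriction(ranges, subset)) == 2 ** len(set(subset))


def vc_dimension(ground, ranges):
    """Largest size of a shattered subset (exhaustive, exponential time)."""
    best = 0
    for k in range(1, len(ground) + 1):
        if any(is_shattered(ranges, c) for c in combinations(ground, k)):
            best = k
        else:
            # Subsets of shattered sets are shattered, so we can stop here.
            break
    return best


ground = [1, 2, 3, 4]
# All discrete intervals {i, ..., j-1} inside the ground set, plus the empty set.
intervals = [set(range(i, j)) for i in range(1, 6) for j in range(i, 6)]
```

Intervals shatter any pair of points but no triple, since the two outer points cannot be selected without the middle one.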
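Let $\Sigma = (X, \mathcal{R})$ be a range space, $A$ a finite subset of $X$, and $0 \le \varepsilon \le 1$. A set $B \subseteq A$ is an $\varepsilon$-sample (also known as $\varepsilon$-approximation) of $A$ if, for every range $R \in \mathcal{R}$,
$$\left| \frac{|A \cap R|}{|A|} - \frac{|B \cap R|}{|B|} \right| \le \varepsilon.$$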
The following classical theorem of Vapnik and Chervonenkis [20] guarantees that if the VC dimension of is finite, then for each positive $\varepsilon$, a sufficiently large random sample of is likely to be an $\varepsilon$-sample. (The following bound is not the strongest possible, see, e.g., [15, Chapter 7] for an improved bound, but it is sufficient for our purposes.)
Proposition 4 ($\varepsilon$-Sample Theorem).
Let be a range space of VC dimension at most and let be finite. Let . Then a random subset of cardinality is an sample for
with probability at least .
Proposition 5 ([12, 15]).
Let be a range space whose shatter function satisfies the bound , for all positive integers , where is a real parameter. Then has VC dimension at most .
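The sample condition used in this subsection compares, for every range, the fraction of the ground set and the fraction of the sample that the range captures. The following sketch checks this directly on a finite range space; the function names and the example (prefix ranges on 100 integers with a regularly spaced sample) are assumptions made for illustration only.

```python
def sample_error(ground, sample, ranges):
    """Largest gap, over all ranges, between ground and sample fractions."""
    worst = 0.0
    for r in ranges:
        frac_ground = sum(1 for x in ground if x in r) / len(ground)
        frac_sample = sum(1 for x in sample if x in r) / len(sample)
        worst = max(worst, abs(frac_ground - frac_sample))
    return worst


def is_eps_sample(ground, sample, ranges, eps):
    """True if the sample approximates every range within additive error eps."""
    return sample_error(ground, sample, ranges) <= eps


ground = list(range(100))
prefixes = [set(range(t)) for t in range(101)]  # ranges {0, ..., t-1}
every_tenth = list(range(0, 100, 10))           # a deterministic 10-point sample
```

Here the ten-point sample is accurate to within 0.1 for every prefix range but not to within 0.05, which shows how the achievable error is tied to the sample size.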
We next closely follow the arguments in the proof of Corollary 2.3 from [12], and show the following theorem:
Theorem 1.
Let be a semialgebraic set of complexity . For each , define . Then the range space has VC dimension at most .
Proof.
By assumption, there are polynomials and a Boolean formula , so that, for , if and only if .
Put . Fix a positive integer and let . Our goal is to bound
For each define
Let and suppose that there exists with . This means for each and for each , i.e., the semialgebraic set consisting of those points satisfying the Boolean formula
is nonempty. Observe that if and are distinct subsets of , then and are disjoint and, in fact,
Each of the nonempty sets contains at least one realization of a sign condition of the polynomials
each of degree at most . By a result of Milnor and Thom stated in Section 2.1, these polynomials determine at most realizable sign conditions. Thus
(6) 
Since (6) holds for every choice of , we conclude that
By Proposition 5, has VC dimension at most . ∎
3 Computing Generalized Polynomial Partition
In this section we obtain the main result of the paper: given a collection of semialgebraic sets in , each of dimension at most and complexity at most , a partitioning tuple for can be computed efficiently. We obtain this result in several steps. First, we represent a semialgebraic set in of complexity at most as a point in a parameter space , where each point in corresponds to a tuple of polynomials in variables, each of degree at most . We use the set (defined in Section 2.1) to parameterize the space of sign conditions specified by tuples of polynomials. With these parameterizations in place, the condition that a semialgebraic set intersects a given sign condition is represented by a subset of pairs of points from .
In Theorem 2, we prove that is semialgebraic, and its complexity depends only on , , and . This means that for each semialgebraic set , the set of tuples whose realization intersects is semialgebraic. This in turn implies that if are semialgebraic sets and if , then the set of tuples whose realization intersects at most of the sets is semialgebraic. Unfortunately, however, the complexity of this subset of might be very large; in particular, it is likely to be exponential in .
To circumvent this problem, we use the theory of $\varepsilon$-samples. That is, we show that rather than considering a large number of semialgebraic sets , it suffices to select a small number of these sets at random. If a tuple has the property that each of its realizable sign conditions intersects few sets from the random sample, then with high probability each of the realizable sign conditions will intersect few sets from the original collection. This property is shown by applying Theorems 1 and 2.
3.1 The parameter space of semialgebraic sets
Fix positive integers , , and , and let . Hereafter we assume that , which can be enforced by choosing sufficiently large.
As above, we denote by a family of semialgebraic sets in , each of dimension at most and complexity at most . Let be a Boolean function. Let . We identify a point with the semialgebraic set
Observe that each semialgebraic set in is of the form for some choice of and a Boolean function . Let . For each , define , where is the tuple associated to . Define
Theorem 2.
The set is semialgebraic, defined by polynomials, each of degree .
Before proceeding with the proof of the theorem, we note that the complexity of is only singly exponential in , which we will exploit in Section 3.3.
Proof.
Define . The condition is a Boolean condition on polynomials. By Remark 1, each of these polynomials has degree at most . Similarly, the condition consists of polynomial inequalities, each of degree at most . This means that there exists a set of polynomials of degree in the variables , and a Boolean function so that
With the above definitions
We now apply Proposition 3. We have a set of polynomials, each of degree at most . The variables and from the hypothesis of Proposition 3 are set to and ; recall that is sufficiently larger than , and thus is a suitably chosen polynomial function of . With these assignments, Proposition 3 says that can be expressed as a quantifier-free formula of the form
(7) 
where are polynomials in the variables , ,
(8)  
where the degrees of the polynomials are bounded by .
Summarizing, the quantifierfree formula (7) for is a Boolean combination of polynomial inequalities, each of degree , as claimed. ∎
3.2 A singly exponential algorithm
In this section, we discuss how to compute a partitioning tuple (for an appropriate value of ) for a small number of semialgebraic sets.
Theorem 3.
Let be a family of semialgebraic sets in , each of dimension at most and complexity at most . Let and let . Then a partitioning tuple for can be computed in time.
Proof.
Set . As above, we identify points in with tuples of polynomials. The argument in Theorem 2, as well as the fact that the class of semialgebraic sets is closed under the operation of taking a projection, show that, for each and each ,
is a semialgebraic set in that can be expressed as a Boolean combination of polynomials, each of degree .
Let be a constant to be specified later (the constant will depend only on and ) and let ; observe that . For each and for each set of cardinality , the set is a semialgebraic set in that can be expressed as a Boolean combination of polynomials, each of degree , where . Therefore
(9) 
is a semialgebraic set in that can be expressed as a Boolean combination of
polynomials, each of degree . This and the fact that the class of semialgebraic sets is closed under the operation of taking complement imply that
is a semialgebraic set in that can be expressed as a Boolean combination of polynomials, each of degree . This means that the set
(10) 
is a semialgebraic set in that can be expressed as a Boolean combination of polynomials, each of degree . Recall that by assumption and . It thus follows that the degree is bounded by . Similarly, the dimension of the space is bounded by as well.
3.3 Speeding up the algorithm using sampling
In this section we first state and prove the following lemma:
Lemma 1.
For every choice of positive integers and , there is a constant so that the following holds. Let be a positive integer. Let be a finite collection of semialgebraic sets in , each of dimension at most and complexity at most . Let be a positive integer and let . Let be a randomly chosen subset of of cardinality at least and let be a partitioning tuple for . Then with probability at least , each of the realizable sign conditions of intersects elements from .
Note that Lemma 1 states that it is sufficient to consider a random subset of size polynomial in in order to obtain an appropriate partitioning tuple for the entire collection , with reasonable probability.
Proof.
Define and as above, and let . For each , define the range
Define . By Theorems 1 and 2, the range space has VC dimension . Define , where the union is taken over the Boolean functions . Since the shatter function grows by at most a multiplicative factor of , the VC dimension of the range space is as well (this is a standard fact, see, e.g., [15, Chapter 5]).
We are now ready to prove the statement of the lemma. Set . Suppose that is an $\varepsilon$-sample of and that is a partitioning tuple for . Then for each range , we have . Combining this with the $\varepsilon$-sample property, we obtain:
and by the choice of (with an appropriate constant of proportionality) we have:
and thus
The corresponding cardinality of is
We next proceed as follows. We select a random sample of of cardinality and use Theorem 3 to compute the corresponding partitioning tuple . This takes time. By Lemma 1, this tuple will be a partitioning tuple for with probability at least . We can verify whether the partitioning tuple works in time. If the tuple does not produce the appropriate partition, we discard it and try again.
The verification step is done as follows. For each semialgebraic set , we compute the subset of sign conditions of with which it has a nonempty intersection. To this end, we restrict each of the polynomials to and apply Proposition 2 to this restricted collection, thereby obtaining a set of points meeting each semialgebraically connected component of each of the realizable sign conditions, as well as the corresponding list of signs of the restricted polynomials at each of these points. This is done in time for a single semialgebraic set , and in overall time over all sets. We refer the reader to [6] for further details concerning the complexity of the restriction of to . We have thus shown:
Theorem 4.
Let be a finite collection of semialgebraic sets in , each of which has dimension at most and complexity at most . Let and let . Then a partitioning tuple for can be computed in expected time by a randomized algorithm.
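The control flow behind Theorem 4 (sample, compute on the sample, verify on the full input, retry on failure) can be illustrated on a one-dimensional toy problem. The sketch below is not the paper's algorithm: it partitions numbers on a line by thresholds taken from a random sample, whereas the paper computes a partitioning tuple via Theorem 3 and verifies it via Proposition 2. All names and the `slack` parameter are hypothetical.

```python
import random
from bisect import bisect_right


def thresholds_from_sample(sample, cells):
    """Cut points at the sample quantiles (the 'expensive' step, on few items)."""
    s = sorted(sample)
    return [s[(i * len(s)) // cells] for i in range(1, cells)]


def cell_sizes(values, cuts):
    """Number of input values falling into each cell defined by the cuts."""
    sizes = [0] * (len(cuts) + 1)
    for v in values:
        sizes[bisect_right(cuts, v)] += 1
    return sizes


def partition_by_sampling(values, cells, sample_size, slack=2.0,
                          rng=None, max_rounds=1000):
    """Las Vegas loop: retry until the sampled partition is good for all values."""
    rng = rng or random.Random()
    limit = slack * len(values) / cells
    for rounds in range(1, max_rounds + 1):
        sample = rng.sample(values, min(sample_size, len(values)))
        cuts = thresholds_from_sample(sample, cells)  # computed on the sample only
        if max(cell_sizes(values, cuts)) <= limit:    # verified on the full input
            return cuts, rounds
    raise RuntimeError("no valid partition found; increase sample_size or slack")


values = list(range(1000))
cuts, rounds = partition_by_sampling(values, cells=4, sample_size=200,
                                     rng=random.Random(0))
```

The returned partition is always correct because it is verified against the full input; only the number of rounds is random, which mirrors the expected-time guarantee in the theorem.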
4 Applications
In this section we describe a few applications of Theorem 4, namely, point location amid semialgebraic sets, semialgebraic range searching with logarithmic query time, vertical ray shooting amid semialgebraic sets, and cutting algebraic curves into pseudosegments.
4.1 Point location
Let be a set of semialgebraic sets in , each of complexity at most . Each set has a weight . We assume that the weights belong to a semigroup, i.e., subtractions are not allowed, and that the semigroup operation can be performed in constant time. We wish to preprocess into a data structure so that the cumulative weight of the sets in that contain a query point can be computed in time. Note that if the weight of each set is $1$ and the semigroup operation is Boolean $\vee$, then the point-location query becomes an instance of a union-membership query: determine whether the query point lies in the union of the sets in . We follow a standard hierarchical partitioning scheme of space, e.g., as in [10, 1], but use Theorem 4 at each stage. Using this hierarchical partition, we construct a tree data structure of depth, and a query is answered by following a path in .
More precisely, we fix sufficiently large positive constants and . If , consists of a single node that stores itself. So assume that . Using Theorem 4, we construct a tuple of variate polynomials of degree at most , which have realizable sign conditions, each of whose realizations meets the boundaries of at most sets of . For each realizable sign condition , let be the family of sets whose boundaries meet the realization of , and let be the family of sets that contain the realization of . We compute , , and , as follows: We first apply Proposition 2 to to compute, in time, a representative point in each realization of a sign condition.
Next, fix a set and mark all realizations that meet the boundary of . This step is similar to the one described in the proof of Theorem 4, that is, we restrict each of the polynomials to the algebraic varieties representing the boundary of and apply Proposition 2 to this restricted collection. Each remaining realization is either contained in or disjoint from it, which can be determined by testing, for each such realization, whether its representing point (computed earlier using Proposition 2 on the original collection ) is contained in . This task can be completed in overall time over all sets of .
We create the root of and store the tuple at . We then create a child for each realizable sign condition and store and at . We recursively construct the data structure for each and attach it to as its subtree.
Since each node of has degree at most and the size of the subproblem reduces by a factor of at each level of the recursion, a standard analysis shows that the total size of the data structure is , where is a constant that can be made arbitrarily small by choosing and to be sufficiently large. Similarly, the expected preprocessing time is also .
Given a query point , we compute the cumulative weight of the sets containing by traversing a path in the tree in a top-down manner: We start from the root and maintain a partial weight , which is initially set to . At each node , we find the sign condition of the polynomial tuple at whose realization contains , add to , and recursively query the child of . The total query time is , where the constant of proportionality depends on (and thus on ). Putting everything together, we obtain the following:
Theorem 5.
Let be a set of semialgebraic sets in , each of complexity at most for some constant , and let be the weight of each set that belongs to a semigroup. Assuming that the semigroup operation can be performed in constant time, can be preprocessed in randomized expected time into a data structure of size , for any constant , so that the cumulative weight of the sets that contain a query point can be computed in time.
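The query procedure of this subsection can be sketched on a hand-built tree. In the sketch below, each internal node stores a polynomial tuple, the precomputed weight of the sets that fully contain each cell, and one child per cell; leaves store the remaining sets explicitly. The class `Node`, its field names, and the use of an identity element for the running total are assumptions of this sketch, and the tree is written by hand rather than built with Theorem 4.

```python
import operator
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence, Tuple

Point = Sequence[float]
Sign = Tuple[int, ...]


def sign_vector(polys, q: Point) -> Sign:
    """Sign condition of the polynomial tuple whose realization contains q."""
    return tuple((p(q) > 0) - (p(q) < 0) for p in polys)


@dataclass
class Node:
    polys: List[Callable[[Point], float]] = field(default_factory=list)
    inside_weight: Dict[Sign, float] = field(default_factory=dict)
    children: Dict[Sign, "Node"] = field(default_factory=dict)
    leaf_sets: List[Tuple[Callable[[Point], bool], float]] = field(default_factory=list)


def query(root: Node, q: Point, combine=operator.add, identity=0):
    """Follow one root-to-leaf path, accumulating weights with `combine`."""
    total, node = identity, root
    while node is not None and node.polys:
        sigma = sign_vector(node.polys, q)
        if sigma in node.inside_weight:
            total = combine(total, node.inside_weight[sigma])
        node = node.children.get(sigma)
    if node is not None:
        for contains, weight in node.leaf_sets:  # brute force at a leaf
            if contains(q):
                total = combine(total, weight)
    return total


# Toy instance on the line, partitioned by the single polynomial P(x) = x.
# Sets: {x > 0} (weight 1), {1 < x < 2} (weight 10), {x < 3} (weight 100).
leaf = Node(leaf_sets=[(lambda q: 1 < q[0] < 2, 10), (lambda q: q[0] < 3, 100)])
root = Node(
    polys=[lambda q: q[0]],
    inside_weight={(1,): 1, (-1,): 100, (0,): 100},
    children={(1,): leaf},
)
```

Only additions of weights are performed along the path, never subtractions, which is what allows the weights to come from a semigroup.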
4.2 Range searching
Next, we consider range searching with semialgebraic sets: Let be a set of points in . Each point is assigned a weight that belongs to a semigroup. Again we assume that the semigroup operation takes constant time. We wish to preprocess so that for a query range , represented as a semialgebraic set in , the cumulative weight of can be computed in time. Here we assume that the query ranges (semialgebraic sets) are parameterized as described in Section 3.1. That is, we have a fixed variate Boolean function . A query range is represented as a point , for some , and the underlying semialgebraic set is . We refer to as the dimension of the query space, and to the range searching problem in which all query ranges are of the form as semialgebraic range searching.
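As a baseline for the problem just described, the sketch below fixes one family of ranges, closed disks in the plane parameterized by a point `y = (a, b, r)`, and answers a query by scanning all points. The disk family, the function names, and the identity element are assumptions chosen for illustration; the data structure of this section is designed to avoid exactly this linear scan.

```python
import operator


def disk_range(y):
    """Membership test for the range encoded by the parameter point y = (a, b, r)."""
    a, b, r = y
    return lambda p: (p[0] - a) ** 2 + (p[1] - b) ** 2 - r ** 2 <= 0


def cumulative_weight(points, weights, y, combine=operator.add, identity=0,
                      make_range=disk_range):
    """Combine the weights of all points inside the range; linear-time scan."""
    inside = make_range(y)
    total = identity
    for p, w in zip(points, weights):
        if inside(p):
            total = combine(total, w)
    return total


points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
weights = [1, 2, 4]
query_disk = (0.0, 0.0, 1.5)  # center (0, 0), radius 1.5

total_sum = cumulative_weight(points, weights, query_disk)
total_max = cumulative_weight(points, weights, query_disk,
                              combine=max, identity=float("-inf"))
```

In this example the query space has dimension three, since each disk is described by three real parameters.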
For a point , let denote the set of semialgebraic sets that contain , i.e., . It can be checked that is a semialgebraic set whose complexity depends only on , and . Let . For a query range , we now wish to compute the cumulative weight of the sets in that contain . This can be done using Theorem 5. Putting everything together, we obtain the following:
Theorem 6.
Let be a set of points in , let be the weight of that belongs to a semigroup, and let be a fixed variate Boolean function for some constant . Let