Since the initial presentation of this problem in 1968, there has not been significant progress from the algorithmic point of view. Although there were several attempts to develop algorithms, they either did not guarantee optimality [4, 6, 16] or had exponential running time. In SoCG 1991, Wilfong proved that the consistent subset problem is NP-complete if the input points are colored by at least three colors; the proof is based on the NP-completeness of the disc cover problem. He further presented a technically involved -time algorithm for a special case of two-colored input points where one point is red and all other points are blue; his elegant algorithm transforms the consistent subset problem to the problem of covering points with disks, which in turn is transformed to the problem of covering a circle with arcs. It has recently been proved, by Khodamoradi et al., that the consistent subset problem with two colors is also NP-complete; the proof is by a reduction from planar rectilinear monotone 3-SAT. Observe that the one-color version of the problem is trivial because every single point is a consistent subset. More recently, Banerjee et al. showed that the consistent subset problem on collinear points, i.e., points that lie on a straight line, can be solved optimally in quadratic time.
Recently, Gottlieb et al.  studied a two-colored version of the consistent subset problem — referred to as the nearest neighbor condensing problem — where the points come from a metric space. They prove a lower bound for the hardness of approximating a minimum consistent subset; this lower bound includes two parameters: the doubling dimension of the space and the ratio of the minimum distance between points of opposite colors to the diameter of the point set. Moreover, for this two-colored version of the problem, they give an approximation algorithm whose ratio almost matches the lower bound.
In a related problem, which is called the selective subset problem, the goal is to find the smallest subset of such that for every the nearest neighbor of in belongs to . Wilfong  showed that this problem is also NP-complete even with two colors. See  for some recent progress on this problem.
In this paper we study the consistent subset problem. We improve some previous results and present some new results. To obtain these results, we combine tools from planar separators, additively-weighted Voronoi diagrams with respect to a convex distance function, point location in farthest-point Voronoi diagrams, range trees, paraboloid lifting, minimum covering of a circle with arcs, and several geometric transformations. We present the first subexponential-time algorithm for this problem. We also present an -time algorithm that finds a consistent subset of size two in two-colored point sets (if such a subset exists); this is obtained by transforming the consistent subset problem into a point-cone incidence problem in dimension three. Towards our proof of this running time we present a deterministic -time algorithm for computing a variant of the compact Voronoi diagram; this improves the expected running time of the randomized algorithm of Bhattacharya et al. . We also revisit the case where one point is red and all other points are blue; we give an -time algorithm for this case, thereby improving the previous running time of . For collinear points, we present an -time algorithm; this improves the previous running time by a factor of . We also present a non-trivial -time dynamic programming algorithm for points arranged on two parallel lines.
2 A Subexponential Algorithm
The consistent subset problem can easily be solved in exponential time by simply checking all possible subsets of . In this section we present the first subexponential-time algorithm for this problem. We consider the decision version of this problem in which we are given a set of colored points in the plane and an integer , and we want to decide whether or not has a consistent subset of size . Moreover, if the answer is positive, then we want to find such a subset. This problem can be solved in time by checking all possible subsets of size . We show how to solve this problem in time ; we use a recursive separator-based technique that was introduced in 1993 by Hwang et al. for the Euclidean -center problem, and then extended by Marx and Pilipczuk for planar facility location problems. Although this technique was known before, its application in our setting is not straightforward and requires technical details, which we give in this section.
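The brute-force baseline mentioned above can be sketched as follows. This is a minimal illustration, not the algorithm of this section. It assumes that a subset is consistent when, for every input point, all of its nearest neighbors in the subset have the same color as the point (so ties count against consistency); the function names are hypothetical.

```python
from itertools import combinations

def sq_dist(p, q):
    """Squared Euclidean distance (exact for integer coordinates)."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def is_consistent(points, colors, subset):
    """Check that every point's nearest neighbours in `subset` share its colour."""
    for p, c in zip(points, colors):
        d_min = min(sq_dist(p, points[s]) for s in subset)
        for s in subset:
            if sq_dist(p, points[s]) == d_min and colors[s] != c:
                return False
    return True

def consistent_subset_of_size(points, colors, k):
    """Try all subsets of size k; return the first consistent one, or None."""
    for subset in combinations(range(len(points)), k):
        if is_consistent(points, colors, subset):
            return subset
    return None

points = [(0, 0), (1, 0), (4, 0), (5, 0)]
colors = ["red", "red", "blue", "blue"]
print(consistent_subset_of_size(points, colors, 1))  # no single point works
print(consistent_subset_of_size(points, colors, 2))
```

The number of candidate subsets grows as a binomial coefficient in the number of points, which is the cost that the separator-based approach of this section is designed to avoid.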
Consider an optimal solution of size . The Voronoi diagram of , say , is a partition of the plane into convex regions. We want to convert to a 2-connected 3-regular planar graph that has a balanced curve separator. Then we want to use this separator to split the problem into two subproblems that can be solved independently. To that end, first we introduce a small perturbation to the coordinates of points of to ensure that no four points lie on the boundary of a circle; this ensures that every vertex of has degree 3. The Voronoi diagram consists of finite segments and infinite rays. We want to have at most three infinite rays. To achieve this, we introduce three new points that lie on the vertices of a sufficiently large equilateral triangle that contains , and then we color them by three new colors; see the right figure. (Footnote 2: The triangle is large in the sense that for every point , the closest point to , among , is in .) Since these three points have distinct colors, they appear in any consistent subset of . Moreover, since they are far from the original points, by adding them to any consistent subset of we obtain a valid consistent subset for . Conversely, by removing these three points from any consistent subset of we obtain a valid consistent subset for . Therefore, in the rest of our description we assume, without loss of generality, that contains . Consequently, the optimal solution also contains those three points; this implies that has three infinite rays which are introduced by (see the above figure). We introduce a new vertex at infinity and connect these three rays to that vertex. In this way we obtain a 2-connected 3-regular planar graph, namely . Marx and Pilipczuk showed that such a graph has a polygonal separator of size (going through faces and vertices) that is face balanced, in the sense that there are at most faces of strictly inside and at most faces of strictly outside . The vertices of alternate between points of and the vertices of as depicted in Figure 1(a). See for an alternate way of computing a balanced curve separator.
We are going to use dynamic programming based on balanced curve separators of . The main idea is to use to split the problem into two smaller subproblems, one inside and one outside , and then solve each subproblem recursively. However, we do not know and hence we have no way of computing directly. Instead, we guess by trying all possible balanced curve separators of size .
Every vertex of is either a point of or a vertex of (and consequently a vertex of ) that is introduced by three points of . Therefore, every curve separator of size is defined by at most points of , and thus, the number of such separators is at most . To find these curve separators, we try every subset of at most points of . For every such subset we compute its Voronoi diagram, which has at most vertices. For the set that is the union of the points and the vertices, we check all subsets and choose every subset that forms a balanced curve separator (that alternates between points and vertices). Therefore, in a time proportional to we can compute all balanced curve separators.
By trying all balanced curve separators, we may assume that we have correctly guessed and the subset of , with , that defines . The solution of our main problem consists of and the solutions of the two separate subproblems, one inside and one outside . Solving these two subproblems recursively leads, in later steps, to subproblems of the following form. Throughout our description, we will assume that is fixed for all subproblems. The input of every subproblem consists of a positive integer , a subset of points of that are already chosen to be in the solution, and a polygonal domain , possibly with holes, of size , whose vertices alternate between the points of and the vertices of the Voronoi diagram of . The task is to select a subset of size such that:
is a polygon whose vertices alternate between the points of and the vertices of the Voronoi diagram of , and
is a consistent subset for .
See Figure 1(b) for an illustration of such a subproblem. The top-level subproblem has and . We stop the recursive calls as soon as we reach a subproblem with , in which case, we spend time to solve this subproblem; this is done by trying all subsets of that have size . For every subproblem, the number of points in (i.e., ) is at most three times the number of vertices on the boundary of the domain . The number of vertices on the boundary of —that are accumulated during recursive calls—is at most
Therefore, , and thus the Voronoi diagram of has a balanced curve separator of size . (Footnote 3: In fact, the 2-connected 3-regular planar graph obtained from the Voronoi diagram of has such a separator.) We try all possible such separators, and for each of them we recursively solve the two subproblems in its interior and exterior. For these two subproblems to be truly independent, we include the points defining the separator in the inputs of both subproblems. Therefore, the running time of our algorithm can be expressed by the following recurrence
which solves to . Notice that our algorithm solves the decision version of the consistent subset problem for a fixed .
To compute the consistent subset of minimum cardinality, whose size, say , is unknown at the start of the algorithm, we apply the following standard technique: Start with a constant value , for example . Run the decision algorithm with the value . If the answer is negative, then double the value of and repeat this process until the first time the decision algorithm gives a positive answer.
Consider the last value for . Note that . We perform a binary search for in the interval . In this way, we find the value of , as well as the consistent subset of minimum cardinality, by running the decision algorithm times. Thus, the total running time is , which is . We have proved the following theorem.
A minimum consistent subset of colored points in the plane can be computed in time, where is the size of the minimum consistent subset.
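The doubling-then-binary-search wrapper described above can be sketched as follows. The callable `decide` is a hypothetical stand-in for the decision algorithm; the sketch assumes it is monotone, i.e., once it answers positively for some value it also answers positively for every larger value.

```python
def smallest_feasible(decide, start=1):
    """Return the smallest k >= 1 with decide(k) true, assuming monotonicity.

    `decide` is a placeholder for the decision procedure; it is called
    O(log k*) times, where k* is the returned value.
    """
    hi = start
    while not decide(hi):          # doubling phase
        hi *= 2
    # The answer lies in (hi/2, hi] if we doubled at least once.
    lo = 1 if hi == start else hi // 2 + 1
    while lo < hi:                 # binary search phase
        mid = (lo + hi) // 2
        if decide(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo

# Toy oracle for illustration only: feasible exactly when k >= 13.
print(smallest_feasible(lambda k: k >= 13))
```

Because the doubling phase stops at a value less than twice the optimum, the most expensive calls to the decision procedure are made with a parameter within a constant factor of the optimum.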
3 Consistent Subset of Size Two
In this section we investigate the existence of a consistent subset of size two in a set of bichromatic points where every point is colored by one of the two colors, say red and blue. Before stating the problem formally we introduce some terminology. For a set of points in the plane, we denote the convex hull of by . For two points and in the plane, we denote the straight-line segment between and by , and the perpendicular bisector of by .
Let and be two disjoint sets of total points in the plane such that the points of are colored red and the points of are colored blue. We want to decide whether or not has a consistent subset of size two. Moreover, if the answer is positive, then we want to find such points, i.e., a red point and a blue point such that all red points are closer to than to , and all blue points are closer to than to . Alternatively, we want to find a pair of points such that separates and . This problem can be solved in time by trying all the pairs ; for each pair we can verify, in time, whether or not separates and . In this section we show how to solve this problem in time . To that end, we assume that and are disjoint, because otherwise there is no such pair .
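The straightforward approach of trying all red-blue pairs can be sketched as follows. This is only the brute-force baseline, not the faster algorithm developed in this section; function names are hypothetical, and separation is taken to be strict (no point lies on the bisector).

```python
def sq_dist(p, q):
    """Squared Euclidean distance (exact for integer coordinates)."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def bisector_separates(r, b, reds, blues):
    """True if every red point is strictly closer to r and every blue to b."""
    return (all(sq_dist(p, r) < sq_dist(p, b) for p in reds) and
            all(sq_dist(p, b) < sq_dist(p, r) for p in blues))

def consistent_pair(reds, blues):
    """Try all pairs (r, b); return the first valid pair, or None."""
    for r in reds:
        for b in blues:
            if bisector_separates(r, b, reds, blues):
                return r, b
    return None

print(consistent_pair([(0, 0), (1, 0)], [(4, 0), (5, 0)]))
print(consistent_pair([(0, 0), (2, 0)], [(1, 0)]))  # hulls intersect: no pair
```

Each verification scans all points once, so the cost is the number of pairs times the number of points.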
It might be tempting to believe that a solution of this problem contains points only from the boundaries of and . However, this is not necessarily the case; in the figure to the right, the only solution of this problem contains and which are in the interiors of and . Also, due to the close relation between Voronoi diagrams and Delaunay triangulations, one may believe that a solution is defined by the two endpoints of an edge in the Delaunay triangulation of . This is not necessarily the case either; the green edges in the figure to the right, which are the Delaunay edges between and , do not introduce any solution.
A separating common tangent of two disjoint convex polygons, and , is a line that is tangent to both and such that and lie on different sides of . Every two disjoint convex polygons have two separating common tangents; see Figure 2. Let and be the separating common tangents of and . Let and be the subsets of and on the boundaries of and , respectively, that are between and as depicted in Figure 2. For two points and in the plane, let be the closed disk that is centered at and has on its boundary.
Lemma 1. For every two points and , the bisector separates and if and only if
Proof. For the direct implication, since separates and , every red point (and in particular every point in ) is closer to than to ; this implies that does not contain and thus (i) holds. Also, every blue point (and in particular every point in ) is closer to than to ; this implies that contains and thus (ii) holds. See Figure 2.
Now we prove the converse implication by contradiction. Assume that both (i) and (ii) hold for some and some , but the bisector does not separate and . After a suitable rotation we may assume that is vertical, is to the left side of and is to the right side of . Since does not separate and , there exists either a point of to the right side of , or a point of to the left side of . If there is a point of to the right side of then there is also a point to the right side of . In this case is closer to than to , and thus the disk contains which contradicts (i). If there is a point of to the left side of then there is also a point to the left side of . In this case is closer to than to and thus the disk does not contain which contradicts (ii). ∎
Lemma 1 implies that for a pair to be a consistent subset of it is necessary and sufficient that every point of is closer to than to , and every point of is closer to than to . This lemma does not imply that and are necessarily in and . Observe that Lemma 1 holds even if we swap the roles of with in (i) and (ii). Also, observe that this lemma holds even if we take and as all red and blue points on boundaries of and .
For every red point we define a feasible region as follows:
Corollary 1. For every two points and , the bisector separates and if and only if .
Based on this corollary, our original decision problem reduces to the following question.
Question 1. Is there a blue point such that lies in the feasible region of some red point ?
If the answer to Question 1 is positive then is a consistent subset for , and if the answer is negative then does not have a consistent subset with two points. In the rest of this section we show how to answer Question 1. To that end, we lift the plane onto the paraboloid by projecting every point in onto the point in . Under this lift, every circle in is mapped into a plane in . Consider a disk in and let be the plane in that contains the projection of the boundary circle of . Let be the lower closed halfspace defined by , and let be the upper open halfspace defined by . For every point , its projection lies in if and only if , and lies in otherwise. Moreover, lies in if and only if is on the boundary circle of . For every point we define a polytope in as follows:
Based on the above discussion, Corollary 1 can be translated to the following corollary.
Corollary 2. For every two points and , the bisector separates and if and only if .
This corollary, in turn, translates Question 1 to the following question.
Question 2. Is there a blue point such that its projection lies in the polytope for some red point ?
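The lifting argument used above can be checked numerically with a short sketch. It assumes the standard lift of a point (x, y) to (x, y, x² + y²); under that assumption the circle with center (a, b) and radius ρ lifts into the plane z = 2ax + 2by + ρ² − a² − b², and a point lies in the closed disk exactly when its lift is on or below that plane. Function names are hypothetical.

```python
def lift(p):
    """Lift a planar point onto the paraboloid z = x^2 + y^2."""
    x, y = p
    return (x, y, x * x + y * y)

def plane_of_circle(center, radius):
    """Coefficients (A, B, C) of the plane z = A*x + B*y + C that contains
    the lift of the circle with the given center and radius."""
    a, b = center
    return (2 * a, 2 * b, radius * radius - a * a - b * b)

def on_or_below(q, plane):
    """True if the 3D point q is in the lower closed halfspace of `plane`."""
    x, y, z = q
    A, B, C = plane
    return z <= A * x + B * y + C

def in_closed_disk(p, center, radius):
    """Direct planar test, used here only to cross-check the lifted test."""
    return (p[0] - center[0]) ** 2 + (p[1] - center[1]) ** 2 <= radius ** 2

center, radius = (1, 2), 5
plane = plane_of_circle(center, radius)
for p in [(0, 0), (4, 6), (10, 10)]:
    print(p, in_closed_disk(p, center, radius), on_or_below(lift(p), plane))
```

The equivalence follows by expanding (x − a)² + (y − b)² ≤ ρ², which is why disk membership conditions turn into halfspace conditions in three dimensions.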
Now, we are going to answer Question 2. The polytope is the intersection of some halfspaces, each of which has on its boundary plane. Therefore, is a cone in with apex ; see Figure 3. Recall that ; however, for the purposes of worst-case running-time analysis and to simplify indexing, we will index the red points, and also the blue points, from 1 to . Let be the points of . For every point , let be the translation that brings to . Notice that is the identity transformation. In the rest of this section we will write for .
Lemma 2. For every point , the cone is the translation of with respect to .
Proof. For a circle in , let denote the plane in that translates onto. For every two concentric circles and in it holds that and are parallel; see the figure to the right. It follows that, if passes through the point , and passes through the point , then is obtained from by the translation that brings to , that is . A similar argument holds also for the halfspaces defined by and . Since for every the disks and are concentric and the boundary of passes through and the boundary of passes through , it follows that and . Since a translation of a polytope is obtained by translating each of the halfspaces defining it, we have as depicted in Figure 3. ∎
It follows from Lemma 2 that to answer Question 2 it suffices to solve the following problem: Given a cone defined by halfspaces, translations of , and a set of points, we want to decide whether or not there is a point in some cone (see Figure 3). This can be verified in time, using Theorem 7 that we will prove later in Section 6. This completes our constructive proof. The following theorem summarizes our result in this section.
Given a set of bichromatic points in the plane, in time, we can compute a consistent subset of size two if such a set exists.
4 One Red Point
In this section we revisit the consistent subset problem for the case where one input point is red and all other points are blue. Let be a set of points in the plane consisting of a red point and blue points. Observe that any consistent subset of contains the only red point and some blue points. In his seminal work in SoCG 1991, Wilfong  showed that has a consistent subset of size at most seven (including the red point); this implies an -time brute force algorithm for this problem. Wilfong showed how to solve this problem in -time; his elegant algorithm transforms the consistent subset problem to the problem of covering points with disks which in turn is transformed to the problem of covering a circle with arcs. The running time of his algorithm is dominated by the transformation to the circle covering problem which involves computation of arcs in time; all other transformations together with the solution of the circle covering problem take time ([16, Lemma 19 and Theorem 9]).
We first introduce the circle covering problem, then we give a summary of Wilfong’s transformation to this problem, and then we show how to perform this transformation in time which implies the same running time for the entire algorithm. We emphasize that the most involved part of the algorithm, which is the correctness proof of this transformation, is due to Wilfong.
Let be a circle and let be a set of arcs covering the entire . The circle covering problem asks for a subset of , with minimum cardinality, that covers the entire .
Wilfong’s algorithm starts by mapping input points to the projective plane, and then transforming (in two stages) the consistent subset problem to the circle covering problem. Let denote the set of points after the mapping, and let denote the only red point of . The transformation, which is depicted in Figure 4(a), proceeds as follows. Let be a circle centered at that does not contain any blue point. Let be the blue points in clockwise circular order around ( is the first clockwise point after , and is the first counterclockwise point after ). For each point , let be the disk of radius centered at . Define to be the first counterclockwise point (measured from ) that is not in , and similarly define to be the first clockwise point that is not in . Denote by the open arc of that is contained in the wedge with counterclockwise boundary ray from to and the clockwise boundary ray from to . (Footnote 4: Wilfong shrinks the endpoint of that corresponds to by half the clockwise angle from to the next point, and shrinks the endpoint of that corresponds to by half the counterclockwise angle from to the previous point.) Let be the set of all arcs ; since blue points are assumed to be in circular order, covers the entire . Wilfong proved that our instance of the consistent subset problem is equivalent to the problem of covering with . The running time of his algorithm is dominated by the computation of in time. We show how to compute in time.
In order to find each arc it suffices to find the points and . Having the clockwise ordering of points around , one can find these points in time for each , and consequently in time for all ’s. In the rest of this section we show how to find for all ’s in time; the points can be found in a similar fashion.
By the definition of , all points of the sequence , except , lie inside . Therefore among all points , the point is the farthest from . This implies that in the farthest-point Voronoi diagram of , the point lies in the cell of . To exploit this property of , we construct a 1-dimensional range tree on all blue points based on their clockwise order around ; blue points are stored at the leaves of as in Figure 4(b). At every internal node of we store the farthest-point Voronoi diagram of the blue points that are stored at the leaves of the subtree rooted at ; we refer to this diagram by FVD(). This data structure can be computed in time because has levels and in each level we compute farthest-point Voronoi diagrams of a total of points. To simplify our following description, at the moment we assume that is a linear order. At the end of this section, in Remark 1, we show how to deal with the circular order.
We use the above data structure to find each point in time. To that end, we walk up the tree from the leaf containing (first phase), and then walk down the tree (second phase) as described below; also see Figure 4(b). For every internal node , let and denote its left and right children, respectively. In the first phase, for every internal node in the walk, we locate the point in FVD() and find the point that is farthest from . If lies in then so does every point stored at the subtree of . In this case we continue walking up the tree and repeat the above point location process until we find, for the first time, the node for which does not lie in . At this point we know that is among the points stored at . Now we start the second phase and walk down the tree from . For every internal node in this walk, we locate in FVD() and find the point that is farthest from . If lies in , then so does every point stored at , and hence we go to ; otherwise we go to . At the end of this phase we arrive at a leaf of , which stores . The entire walk has nodes and at every node we spend time for locating . Thus the time to find is . Therefore, we can find all ’s in total time.
A minimum consistent subset of points in the plane, where one point is red and all other points are blue, can be computed in time.
Remark 1. To deal with the circular order , we build the range tree with leaves . For a given , the point can be any of the points . To find , we first follow the path from the root of to the leftmost leaf that stores , and then from that leaf we start looking for as described above.
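The search for the first point outside a disk can be sketched on a linear order as follows. This is a simplified illustration under stated assumptions: the tree is a plain balanced tree over index ranges, the farthest-point query at a node is done by brute force as a stand-in for point location in a stored farthest-point Voronoi diagram, and the disk center `c` and radius `rho` are passed in as parameters. The recursive descent plays the role of the walk-up and walk-down phases; all names are hypothetical.

```python
import math

def build(lo, hi):
    """Balanced tree over the index range [lo, hi); node = (lo, hi, left, right)."""
    if hi - lo == 1:
        return (lo, hi, None, None)
    mid = (lo + hi) // 2
    return (lo, hi, build(lo, mid), build(mid, hi))

def farthest_dist(points, node, c):
    """Distance from c to the farthest point stored under `node`.
    Brute force here; the text obtains this by point location in FVD(node)."""
    lo, hi, _, _ = node
    return max(math.dist(c, points[j]) for j in range(lo, hi))

def first_outside(points, node, start, c, rho):
    """Smallest index j >= start with dist(c, points[j]) > rho, or None."""
    lo, hi, left, right = node
    # Prune: subtree entirely before `start`, or entirely inside the disk.
    if hi <= start or farthest_dist(points, node, c) <= rho:
        return None
    if left is None:               # leaf with index >= start lying outside the disk
        return lo
    found = first_outside(points, left, start, c, rho)
    if found is None:
        found = first_outside(points, right, start, c, rho)
    return found

pts = [(0, 0), (1, 0), (2, 0), (5, 0), (3, 0)]
root = build(0, len(pts))
print(first_outside(pts, root, 1, (0, 0), 2.5))
```

The pruning test is the key step: if the farthest point of a subtree lies inside the disk, then every point of that subtree does, so the subtree can be skipped without inspecting its leaves.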
5 Restricted Point Sets
In this section we present polynomial-time algorithms for the consistent subset problem on three restricted classes of point sets. First we present an -time algorithm for collinear points; this improves the previous quadratic-time algorithm of Banerjee et al. . Then we present an involved non-trivial -time dynamic programming algorithm for points that are placed on two parallel lines. Finally we present an -time algorithm for two-colored points, namely red and blue, that are placed on two parallel lines such that all points on one line are red and all points on the other line are blue.
5.1 Collinear Points
Let be a set of colored points on the -axis, and let be the sequence of these points from left to right. We present a dynamic programming algorithm that solves the consistent subset problem on . To simplify the description of our algorithm we add a point very far (at distance at least ) to the right of . We set the color of to be different from that of . Observe that every solution for contains . Moreover, by removing from any optimal solution of we obtain an optimal solution for . Therefore, to compute an optimal solution for , we first compute an optimal solution for and then remove .
Our algorithm maintains a table with entries . Each table entry represents the number of points in a minimum consistent subset of provided that is in this subset. The number of points in an optimal solution for will be ; the optimal solution itself can be recovered from . In the rest of this section we show how to solve a subproblem with input provided that should be in the solution (thereby in the rest of this section the phrase “solution of ” refers to a solution that contains ). In fact, we show how to compute , by a bottom-up dynamic programming algorithm that scans the points from left to right. If is monochromatic, then the optimal solution contains only , and thus, we set . Hereafter assume that is not monochromatic. Consider the partition of into maximal blocks of consecutive points such that the points in each block have the same color. Let denote these blocks from left to right, and notice that is in . Assume that the points in are red and the points in are blue. Let be the leftmost point in ; see Figure 5(a). Any optimal solution for contains at least one point from ; let be the rightmost such point ( can be either red or blue). Then, . Since we do not know the index , we try all possible values in and select one that produces a valid solution, and that minimizes :
The index produces a valid solution (or is valid) if one of the following conditions holds:
is red, or
is blue, and for every it holds that if is blue then is closer to than to , and if is red then is closer to than to .
If holds then and have the same color. In this case the validity of our solution for is ensured by the validity of the solution of . If holds then and have distinct colors. In this case the validity of our solution for depends on the colors of points . To verify the validity in this case, it suffices to check the colors of only two points that are to the left and to the right of the mid-point of the segment . This can be done in time for all blue points in while scanning them from left to right. Thus, can be computed in time because . Therefore, the total running time of the above algorithm is .
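The quadratic-time dynamic program described above can be sketched as follows. This is an illustrative reading of the recurrence, not the authors' code: it assumes strictly increasing coordinates, treats a distance tie between differently colored chosen points as invalid, uses 0-based indices, and places the sentinel at a distance of more than twice the spread of the input. All names are hypothetical.

```python
def min_consistent_collinear(xs, colors):
    """Size of a minimum consistent subset of collinear points.

    xs: strictly increasing coordinates; colors[i] is the colour of xs[i].
    T[k] is the minimum size of a consistent subset of the first k+1 points
    that contains point k.
    """
    n = len(xs)
    # Sentinel far to the right with a colour that matches nothing else.
    xs = list(xs) + [xs[-1] + 2 * (xs[-1] - xs[0]) + 1]
    colors = list(colors) + [object()]

    # block_start[k]: index of the leftmost point in the colour block of k.
    block_start = [0] * (n + 1)
    for k in range(1, n + 1):
        block_start[k] = block_start[k - 1] if colors[k] == colors[k - 1] else k

    T = [0] * (n + 1)
    for k in range(n + 1):
        s = block_start[k]
        if s == 0:                      # prefix is monochromatic
            T[k] = 1
            continue
        e = s - 1                       # rightmost point of the previous block
        l = block_start[e]              # leftmost point of the previous block
        # Candidates of the same colour as point k are always valid.
        best = min((T[i] for i in range(s, k)), default=float("inf"))
        # Candidates in the previous block need the two midpoint checks.
        for i in range(l, e + 1):
            prev_ok = i == e or xs[e] - xs[i] < xs[k] - xs[e]
            cur_ok = s == k or xs[k] - xs[s] < xs[s] - xs[i]
            if prev_ok and cur_ok:
                best = min(best, T[i])
        T[k] = best + 1
    return T[n] - 1                     # drop the sentinel

print(min_consistent_collinear([0, 1, 2, 3], ["r", "r", "b", "b"]))
```

Only the two points adjacent to the block boundary are tested for each candidate, mirroring the observation that it suffices to check the points on either side of the midpoint.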
We are now going to show how to compute in constant time, which in turn improves the total running time to . To that end we first prove the following lemma.
Lemma 3. Let be an integer, be a sequence of points in , and be an index for which is minimum. Then, .
Proof. To verify this inequality, observe that by adding to the optimal solution of we obtain a valid solution (of size ) for . Therefore, any optimal solution of has at most points, and thus . ∎
At every point , in every block , we store the index of the first point to the left of where and is strictly smaller than ; if there is no such point then we store at . These indices can be maintained in linear time while scanning the points from left to right. We use these indices to compute in constant time as described below.
Notice that if the minimum, in the above calculation of , is obtained by a red point in then it always produces a valid solution, but if the minimum is obtained by a blue point then we need to verify its validity. In the former case, it follows from Lemma 3 that the smallest for red points in is obtained either by or by the point whose index is stored at . Therefore we can find the smallest in constant time. Now consider the latter case where the minimum is obtained by a blue point in . Let be the rightmost point of , and let be the leftmost endpoint of . Set and as depicted in Figure 5(b). Set and , where and are the -coordinates of and . Any point that is to the right of is invalid because otherwise would be closer to than to . Any point that is to the left of is also invalid because otherwise would be closer to than to . However, every point , that is in the range , is valid because it satisfies condition above. Thus, to compute it suffices to find a point of in range with the smallest . By slightly abusing notation, let be the rightmost point of in range . It follows from Lemma 3 that the smallest is obtained either by or by the point whose index is stored at . Thus, in this case also, we can find the smallest in constant time.
It only remains to identify, in constant time, the index that we should store at (to be used in next iterations). If is the leftmost point in , then we store at . Assume that is not the leftmost point in , and let be the index stored at . In this case, if is smaller than then we store at , otherwise we store . This assignment ensures that stores a correct index.
Based on the above discussion we can compute and identify the index at in constant time. Therefore, our algorithm computes all values of in total time. The following theorem summarizes our result in this section.
A minimum consistent subset of collinear colored points can be computed in time, provided that the points are given from left to right.
5.2 Points on Two Parallel Lines
In this section we study the consistent subset problem on points that are placed on two parallel lines. Let and be two disjoint sets of colored points of total size , such that the points of are on a straight line and points of are on a straight line that is parallel to . The goal is to find a minimum consistent subset for . We present a top-down dynamic programming algorithm that solves this problem in time. By a suitable rotation and reflection we may assume that and are horizontal and lies above . If any of the sets and is empty, then this problem reduces to the collinear version that is discussed in Section 5.1. Assume that none of and is empty. An optimal solution may contain points from only , only , or from both and . We consider these three cases and pick one that gives the minimum number of points:
The optimal solution contains points from only . Consider any solution . For every point , let be the vertical projection of on . Then, a point is the closest point to if and only if is the closest point to . This observation suggests the following algorithm for this case: First project all points of vertically on ; let be the resulting set of points. Then, solve the consistent subset problem for points in , which are collinear on , with the constraint that the points of should not be included in the solution but should be included in the validity check. This can be done in time by modifying the algorithm of Section 5.1.
The optimal solution contains points from only . The solution of this case is analogous to that of previous case.
The optimal solution contains points from both and . The description of this case is more involved. Add two dummy points and at and on , respectively. Analogously, add and on . Color these four points by four new colors that are different from the colors of points in . See Figure 6. Set . Observe that every solution for contains all points of . Moreover, by removing from any optimal solution of we obtain an optimal solution for . Therefore, to compute an optimal solution for , we first compute an optimal solution for and then remove . In the rest of this section we show how to compute an optimal solution for . Without loss of generality, from now on, we assume that and belong to , and and belong to . For a point let be the vertical line through .
In the following description the term “solution” refers to an optimal solution. Consider a solution for this problem with input pair , and let and be the closest pair in this solution such that and (for now assume that such a pair exists; later we deal with all different cases). These two points split the problem into two subproblems and where contains all points of that are to the left of (including ), contains all points of that are to the right of (including ), and are defined analogously. Our choice of and ensures that no point in the solution lies between the vertical lines and because otherwise that point would be part of the closest pair. See Figure 6. Thus, and are independent instances of the problem in the sense that for any point in (resp. ) its closest point in the solution belongs to (resp. ). Therefore, if and are given to us, we can solve as follows: First we recursively compute a solution for that contains and does not contain any point between and . We compute an analogous solution for recursively. Then, we take the union of these two solutions as our solution of . We do not know and , and thus we try all possible choices.
Let and be the points of and , respectively, from left to right, where and . In later steps in our recursive solution we get subproblems of type where the input to this subproblem is and we want to compute a minimum consistent subset that
contains , and , and
does not contain any point between and , nor any point between and .
To simplify our following description, we may also refer to as a four dimensional matrix where each of its entries stores the size of the solution for the corresponding subproblem; the solution itself can also be retrieved from . The solution of the original problem will be stored in . In the rest of this section we show how to solve by a top-down dynamic programming approach. Let and be the first points of and , respectively, that are to the right sides of both and , and let and be the first points of and , respectively, that are to the left sides of both and ; see Figure 7. Depending on whether or not the solution of contains points from and we consider the following three cases and pick one that minimizes .
The solution does not contain points from any of and . Thus, the solution contains only , , , and . To handle this case, we verify the validity of . If this set is a valid solution, then we assign , otherwise we assign .
The solution contains points from both and . Let and be two such points with minimum distance. Our choice of and ensures that no point of the solution lies between and ; see Figure 7. Therefore, the solution of is the union of the solutions of subproblems and . Since we do not know and , we try all possible pairs and pick one that minimizes , that is