 # A Practical Algorithm for Enumerating Collinear Points

This paper studies the problem of enumerating all maximal collinear subsets of size at least three in a given set of n points. An algorithm for this problem, besides solving degeneracy testing and the exact fitting problem, can also help with other problems, such as point line cover and general position subset selection. The classic topological sweeping algorithm of Edelsbrunner and Guibas can find these subsets in O(n^2) time in the dual plane. We present an alternative algorithm that, although asymptotically slower than theirs in the worst case, is simpler to implement and more amenable to parallelization. If the input points are decomposed into m convex polygons, our algorithm has time complexity O(n^2 log m) and space complexity O(n). Our algorithm can be parallelized on the CREW PRAM with time complexity O(n log m) using n processors.


## 1. Introduction

We study the problem of finding all maximal collinear subsets of size at least three in a given set of points in the plane. In this paper, we assume the real RAM model of computation, in which arithmetic operations on reals and comparisons of reals take constant time and the floor function is not allowed.

Our main motivation is that some of the algorithms for problems like point line cover (i.e., covering a set of points with the minimum number of lines) or general position subset selection (finding the largest subset of points in general position) need to identify maximal collinear subsets as a first step. A special case of this problem is the well-known degeneracy testing problem (i.e., testing whether any three of the points are collinear). The fastest known algorithm for this problem has time complexity O(n^2), which seems best possible, based on lower bounds in terms of either the number of sidedness queries or a reduction from 3-SUM. Another special case is the exact fitting problem, which asks for the line that covers the most points. Finding maximal collinear subsets trivially solves both of these problems.

As with degeneracy testing, the best known algorithm for finding collinear subsets uses topological sweeping, with time complexity O(n^2) and space complexity O(n).

In this paper, we present an alternative algorithm based on cyclically sorting the input points. The algorithm runs in O(n^2 log m) time and O(n) space if the input points can be decomposed into m convex polygons. Our algorithm is much easier to implement than those depending on arrangements, and should be almost as fast in practice. The techniques we use for sorting the points cyclically may be helpful in other algorithms, such as those for computing visibility graphs. Another advantage of our algorithm is that it can be executed in parallel on the CREW PRAM in O(n log m) time and O(nm) space using n processors.

The rest of this paper is organized as follows. We summarize related work in Section 2 and present the basis of our algorithm, with time complexity O(n^2 log n), in Section 3. In Section 4, we improve its time complexity to O(n^2 log m), provided the input points are decomposed into m convex polygons. We present our parallel algorithm in Section 5 and end with some concluding remarks in Section 6.

## 2. Related Work

The problem of finding collinear subsets can be restated in the dual plane, where each input point is mapped to a line. A subset of these lines intersect each other at a common point if and only if their corresponding points in the primal plane are collinear. Therefore, the problem of finding maximal sets of collinear points is equivalent to finding, for each intersection point in the dual plane, the set of lines that pass through it. Intersecting lines can be identified by plane sweeping with time complexity O((n + k) log n), where k is the number of intersections. An alternative for identifying intersecting lines is to use an arrangement of the lines, i.e., the partition of the plane induced by the set of lines into vertices, edges, and faces. An arrangement has space complexity O(n^2) and can be constructed in O(n^2) time.

The well-known topological sweeping algorithm of Edelsbrunner and Guibas can sweep an arrangement without constructing it, thus reducing the space complexity to O(n). Its O(n^2) time complexity matches the lower bound for degeneracy testing and is thus the best possible. The algorithm, however, is rather difficult to implement, especially when the input points are not in general position (see, for instance, Rafalin et al.). Note that reporting line intersections in the dual plane is not enough for finding collinear points: the intersections must also be ordered so that all lines passing through a common point can be reported at once efficiently, which makes intersection reporting algorithms (for instance, Balaban's) inefficient for this problem.

Because topological sweeping is inherently sequential and thus unsuitable for parallel execution, parallel algorithms for sweeping or constructing arrangements have appeared in the literature. Anderson et al. presented a CREW PRAM algorithm for constructing arrangements in O(log n) time with O(n^2 / log n) processors, and Goodrich presented a CREW PRAM algorithm with the same goal. Since these algorithms construct the arrangement, they have space complexity O(n^2), which is more than the space complexity of our parallel algorithm. Our algorithm also beats Goodrich's in time complexity, besides being easier to code, since it uses only simple data structures.

Other methods have been presented that partition the arrangement into smaller regions with fewer points and use the sequential algorithm to sweep these regions in parallel (e.g., Agarwal and Hagerup et al.). They are usually based on the assumption that the points are in general position. Such partitions yield poor performance when this assumption is violated, as is the case in the problem of finding collinear points. Similar parallel sweeping methods have been presented for specific applications (e.g., rectangle intersection and hidden surface elimination) in which there are fewer events than the Θ(n^2) intersections that arise in the worst case when finding collinear subsets. More recently, McKenney et al. presented a plane sweep algorithm that divides the plane into vertical slabs perpendicular to the sweep line; with the worst-case Θ(n^2) intersections of our problem, its cost is more than that of our algorithm.

## 3. The Base Algorithm

In what follows, we describe our main algorithm for enumerating the maximal collinear subsets of a set P of n points. It assumes an arbitrary ordering π on the input points. It processes each point p (in the order specified by π) and finds and reports all maximal sets of points collinear with p that have not already been found in a previous iteration: it cyclically sorts the remaining points around p into the sequences U and D (the points above and below the horizontal line l through p), merges them into a sequence M_p in which points collinear with p are contiguous, and scans M_p for maximal collinear runs, reporting a run only if p precedes all of its members in π.
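The base algorithm can be sketched as follows (a minimal Python reconstruction with our own function names, assuming distinct integer coordinates; it groups points by a normalized direction key, which serves the same purpose as the cyclic sort around p and the U/D merge: points collinear with p become contiguous):

```python
from math import gcd

def canonical_direction(p, q):
    """Direction from p to q, normalized so that collinear points on
    either side of p share the same key (upper half-plane convention)."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    g = gcd(abs(dx), abs(dy))
    dx, dy = dx // g, dy // g
    if dy < 0 or (dy == 0 and dx < 0):  # map lower half-plane to antipode
        dx, dy = -dx, -dy
    return (dx, dy)

def maximal_collinear_sets(points):
    """Report every maximal collinear subset of size >= 3 exactly once.

    points: list of distinct integer-coordinate tuples; the list order
    plays the role of the arbitrary ordering pi from the paper.
    """
    index = {pt: i for i, pt in enumerate(points)}
    result = []
    for i, p in enumerate(points):
        others = [q for q in points if q != p]
        # Sorting by the canonical direction makes points collinear
        # with p contiguous, as the cyclic sort does in the paper.
        others.sort(key=lambda q: canonical_direction(p, q))
        j = 0
        while j < len(others):
            k = j
            d = canonical_direction(p, others[j])
            while k < len(others) and canonical_direction(p, others[k]) == d:
                k += 1
            group = others[j:k]
            if len(group) >= 2:  # p plus >= 2 others: at least 3 collinear
                # Report only if p is pi-first in the set (exactly-once rule).
                if all(index[q] > i for q in group):
                    result.append(frozenset(group) | {p})
            j = k
    return result
```

For example, on the points (0,0), (1,1), (2,2), (0,2), (1,2), (3,3), the sketch reports the four points on the line y = x and the three points on the line y = 2, each exactly once.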

It is not difficult to see that every step of the algorithm except the sorting step has time complexity O(n). Sorting the points around p can be performed in O(n log n). Note that the sequences U and D are each cyclically sorted around p and can be merged into M_p in O(n). Since these steps are repeated for every point, the total time complexity of the algorithm is O(n^2 log n), and its space complexity is O(n). Theorem 3.1 shows its correctness.

###### Theorem 3.1.

The algorithm reports every maximal set of at least three collinear points exactly once.

###### Proof.

Let S be a maximal set of collinear points in P, and let p be the first point of S in π. When P is cyclically sorted around p, the points of S \ {p} appear contiguously in U and in D and, when these sequences are merged, in M_p. Thus, they are identified during the scan of M_p and reported, since p is the first point of S in π. For every other point p' of S, although the points of S follow each other in M_p', they are not reported, since p comes before p' in π. Note that the algorithm never outputs non-collinear points, thanks to the collinearity test performed during the scan. ∎

In what follows, we try to improve the time complexity of the sorting step (i.e., sorting the points to obtain M_p). A simple heuristic, which we shall not pursue here, is to adjust π so that consecutive points are as close as possible, so that fewer alterations are made to M_p. This would help if the partially sorted sequence computed for the previous point were reused when sorting for the current point (some sorting algorithms are much faster on partially sorted sequences). More formally, let p and p' be two consecutive points in π. A pair of points, q and r, is inverted if q appears before r in M_p but after r in M_p'. This can happen only if the line passing through q and r intersects the segment from p to p'. While moving from p to p', it can be shown that each such intersection swaps two adjacent points in M_p. Thus, the goal is to minimize the number of such intersections or, equivalently, of inverted pairs.
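To make the notion of inverted pairs concrete, a helper of our own (hypothetical, not part of the paper's algorithm) that counts them between two orderings of the same points might look like:

```python
def inverted_pairs(order_a, order_b):
    """Count pairs of items whose relative order differs between the
    two sequences (both must be permutations of the same item set)."""
    pos = {item: i for i, item in enumerate(order_b)}
    ranks = [pos[item] for item in order_a]
    # Quadratic count for clarity; a merge-sort based count runs in O(k log k).
    return sum(1 for i in range(len(ranks))
                 for j in range(i + 1, len(ranks))
                 if ranks[i] > ranks[j])
```
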

In the next section, we use another method for the same goal: finding sets of points whose order does not change substantially in M_p for different points p in P.

## 4. Using Convex Polygon Decompositions

We now concentrate on improving the complexity of computing M_p for every point p in P. For that, we try to identify sets of points in P that appear in the same order in M_p for every p in P. As we show in Lemma 4.1, convex polygons have this property to a large extent. A tangent from a point p to a convex polygon C is a line t that passes through p and another point v of C such that all of C lies on one side of t; v is called the tangent point from p to polygon C. If the tangent passes through more than one point of C, the one closest to p is considered the tangent point. Tangent points can be identified via a linear scan (or binary search) through the points of C.
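A minimal sketch of tangent-point identification by linear scan (our own code, not the paper's; it assumes the query point lies outside the polygon and uses exact integer arithmetic):

```python
def cross(o, a, b):
    """Signed area of triangle (o, a, b); > 0 iff the turn o->a->b is CCW."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def tangent_points(p, poly):
    """Return the vertices of the convex polygon `poly` (a vertex list)
    that are tangent points from the external point p: a vertex qualifies
    when both of its neighbours lie on the same side of the line p-v."""
    n = len(poly)
    tangents = []
    for i, v in enumerate(poly):
        prev, nxt = poly[i - 1], poly[(i + 1) % n]
        if cross(p, v, prev) * cross(p, v, nxt) >= 0:
            tangents.append(v)
    return tangents
```

Binary search brings the per-polygon cost from O(k) down to O(log k) on a k-vertex polygon, which is what the complexity bounds in this section assume.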

###### Lemma 4.1.

For any point p and any convex polygon C, C can be decomposed into at most two sequences such that the points in each sequence are cyclically sorted (either clockwise or counterclockwise) around p.

###### Proof.

If p is inside C, every point of C appears in the same order when cyclically ordered around p; thus, the whole of C is a single sorted sequence. Otherwise, let v_i and v_j be the tangent points from p to the polygon C = (v_1, v_2, ..., v_k). Then, the sequences (v_i, ..., v_j) and (v_j, ..., v_i) are cyclically sorted around p (indices are modulo k). ∎

###### Theorem 4.2.

If the set of points can be decomposed into m convex polygons, the algorithm introduced in Section 3 for identifying collinear points can be improved to achieve a time complexity of O(n^2 log m).

###### Proof.

We modify the sorting step of the algorithm for calculating M_p. Based on Lemma 4.1, we can obtain k sequences (k ≤ 2m), all of which are cyclically sorted around p. This can be done in O(n + m log n) (to identify the tangent points of the polygons and to create the new sequences). We then merge these sorted sequences to obtain M_p. This can be done in O(n log m) using a heap priority queue: initially, the first points of the sorted sequences are inserted into the heap with time complexity O(m log m). Then, the point with the smallest angle around p is extracted from the heap, and the point with the next smallest angle in the extracted point's sequence is inserted into the heap. This process is repeated n times to obtain M_p. Since extracting the minimum from and inserting a new value into the heap can each be done in O(log m), M_p can be constructed with time complexity O(n log m + m log n), which is equivalent to O(n log m), since m ≤ n. Repeating this for every point p yields the claimed O(n^2 log m) bound. ∎
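The merge used in the proof is a standard heap-based k-way merge; a self-contained sketch (ours; the angle around p would enter through the caller-supplied key function):

```python
import heapq

def merge_sorted(sequences, key=lambda x: x):
    """Merge m already-sorted sequences with a binary heap; each of the
    n extract/insert rounds costs O(log m), as in the proof above."""
    heap = []
    for s, seq in enumerate(sequences):
        if seq:  # seed the heap with the head of each non-empty sequence
            heapq.heappush(heap, (key(seq[0]), s, 0))
    merged = []
    while heap:
        _, s, i = heapq.heappop(heap)
        merged.append(sequences[s][i])
        if i + 1 < len(sequences[s]):
            heapq.heappush(heap, (key(sequences[s][i + 1]), s, i + 1))
    return merged
```

The tuple (key, sequence-index, position) breaks ties deterministically, so items with equal keys, e.g. points collinear with p, come out of the heap consecutively.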

One method for decomposing a set of points into convex polygons is convex hull peeling, i.e., repeatedly extracting convex hulls from the set (for applications and a survey, the reader may consult the cited literature). The number of resulting convex hulls is sometimes called the convex hull peeling depth. Common convex hull algorithms can be used for convex hull peeling by repeatedly extracting convex hulls in O(mn log n) total time, if the convex hull peeling depth is m (this seems adequate for our purpose; there are faster algorithms, however, such as Chazelle's O(n log n) algorithm for convex layers). This yields the following corollary. Note that most convex hull algorithms can be slightly modified to allow collinear points on the boundary of the hull.
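A sketch of convex hull peeling by repeated hull extraction (our own code, using Andrew's monotone chain; note that this variant simply defers collinear boundary points to deeper layers, whereas the paper suggests modifying the hull algorithm to keep them on the boundary):

```python
def cross(o, a, b):
    """Signed area of triangle (o, a, b); > 0 iff the turn o->a->b is CCW."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain, returning the strict hull in CCW order.
    Degenerate inputs (<= 2 points, or all collinear) yield a segment."""
    pts = sorted(points)
    if len(pts) <= 2:
        return pts
    def half(seq):
        h = []
        for q in seq:
            while len(h) >= 2 and cross(h[-2], h[-1], q) <= 0:
                h.pop()
            h.append(q)
        return h
    lower, upper = half(pts), half(pts[::-1])
    return lower[:-1] + upper[:-1]

def convex_layers(points):
    """Repeatedly peel the hull; the number of layers is the peeling depth m."""
    remaining, layers = list(points), []
    while remaining:
        hull = convex_hull(remaining)
        layers.append(hull)
        hull_set = set(hull)
        remaining = [q for q in remaining if q not in hull_set]
    return layers
```
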

###### Corollary 4.3.

All maximal collinear subsets of size at least three in a set of n points can be identified in O(n^2 log m) time, where m is the convex hull peeling depth of the points.

Parts (a)–(d) of Figure 1 demonstrate the steps of the algorithm described in this section: decomposing the set of input points into convex polygons (part (b)), splitting each polygon into at most two cyclically sorted sequences (part (c)), and merging these sequences to obtain M_p (part (d); the arrows show the order of processing the points in the sequences). The points on each of the three marked lines should be contiguous in M_p.

Figure 1. (a)–(b) An input point set P is decomposed into convex layers. The algorithm is currently processing point p. (c) The convex layers are decomposed into sorted subsequences, bookended by points tangent to p. (d) Subsequences are split up and added to U or D depending on whether they lie above or below the horizontal line l through p. (e) Points appearing in D are then mapped to their antipodes and merged with U to obtain M_p. Collinear points appear contiguous in M_p.

## 5. Parallel Algorithm

To obtain a parallel algorithm for finding sets of collinear points, both the algorithm presented in Section 3 and the convex polygon decomposition should be performed in parallel, as Theorem 5.2 shows. We shall use the following lemma in the proof of Theorem 5.2.

###### Lemma 5.1.

If 3 ≤ m ≤ n, then m log n = O(n log m).

###### Proof.

Suppose, for the sake of contradiction, that m log n > n log m for some 3 ≤ m ≤ n. This implies that m / log m > n / log n, a contradiction, since the function x / log x is monotonically increasing for x ≥ e. ∎
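As a quick numerical sanity check of the lemma (ours, not part of the paper), both the inequality and the monotonicity claim can be verified on a grid:

```python
from math import e, log

# Check m*log(n) <= n*log(m) for 3 <= m <= n on a grid of small values.
for n in range(3, 300):
    for m in range(3, n + 1):
        assert m * log(n) <= n * log(m) + 1e-9

# The function x/log(x) is indeed increasing past x = e.
xs = [e + 0.5 * i for i in range(1, 50)]
assert all(xs[i] / log(xs[i]) < xs[i + 1] / log(xs[i + 1])
           for i in range(len(xs) - 1))
```

The restriction m ≥ 3 matters: x / log x decreases on (1, e), so the inequality can fail for m = 2 (e.g., m = 2, n = 3), though it still holds asymptotically.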

We are now ready to prove Theorem 5.2.

###### Theorem 5.2.

With n processors, it is possible to identify all maximal collinear subsets of a set of n points on the CREW PRAM with time complexity O(n log m) and space complexity O(nm), where m is the convex hull peeling depth of the points.

###### Proof.

The parallel algorithm first decomposes the input points into convex polygons and then enumerates collinear subsets of the points. We discuss these two steps separately as follows.

For decomposing points into convex polygons, we use the parallel algorithm presented by Akl: it computes the convex hull of a set of n points in O(log n) time with n processors. Akl's algorithm assumes the input points to be sorted by their x-coordinates; this can be done on the CREW PRAM in O(log n) time with n processors using Cole's parallel merge sort. The convex hull algorithm should be repeated m times, yielding a time complexity of O(m log n). Thus, the total time complexity of the algorithm is O(m log n + n log m), which is equivalent to O(n log m). This follows trivially if m = Θ(n); otherwise, it follows from Lemma 5.1.

Given that there is no dependency between different iterations of the algorithm presented in Section 3, it can be parallelized by distributing the points among the n processors, each of which requires O(n log m) time to finish its task with space complexity O(n). Therefore, it remains to improve the overall space complexity of the parallel algorithm to O(nm).

The space complexity per processor is O(n), due to storing the sorted and merged sequences of points (U, D, and M_p, as defined in Section 3), storing the sequences of points obtained after splitting convex polygons, and storing consecutive collinear points during the scan of M_p. We reduce the space complexity of each processor to O(m) (plus O(n) for storing the convex polygons, which is shared among the processors). Instead of storing all elements of a subsequence of a sequence of points, we store two pointers to indicate its start and end positions. Therefore, we can store the subsequences resulting from splitting the polygons in O(m) (there are at most 2m such subsequences).

We now modify the base algorithm so as not to store U, D, and M_p. We use the symbols defined in Section 3: U and D are the points above and below the horizontal line l through the point p being processed. Let S be the set of cyclically sorted sequences obtained by splitting convex polygons, as explained in Theorem 4.2. We split each sequence in S into at most two sequences to obtain the set S': one subsequence containing the points that lie above l (contributing to U) and one containing the rest of the sequence (this still requires only O(m) words of memory, as the resulting subsequences are contiguous runs of the convex polygons). We insert the first point of each sequence in S' into a priority heap, as described in the proof of Theorem 4.2. Then, instead of merging these subsequences to obtain M_p, we detect collinear points, as in the base algorithm, while extracting points from the heap in order of their angles around p. This is demonstrated in part (e) of Figure 1: for points below l, the algorithm virtually considers their antipodal points (coloured red in the figure) when extracting the minimum from the heap; the arrows show the order of inserting points into the heap for each sequence. The three collinear points on each marked line are extracted successively from the heap, as the base algorithm requires. Given that there are O(m) items in the heap at any moment, its space complexity is also O(m). ∎

## 6. Conclusion

We have presented a simple sequential algorithm for finding collinear points in the real RAM model that initially ran in O(n^2 log n) time and linear space. We then improved its running time by first decomposing the points into m convex layers. Decomposition into convex layers by repeatedly invoking an optimal convex hull algorithm takes O(mn log n) time; note that this is also O(n^2 log m) by Lemma 5.1. By taking advantage of the ordering in the constituent convex polygons, we were able to avoid the sorting step and reduce the time complexity to O(n^2 log m) using linear space. Finally, we showed a parallel version of the algorithm on the CREW PRAM that runs in O(n log m) time using O(nm) space and n processors.

While we have used the convex layers decomposition to obtain the convex polygons used in our algorithm, we could have used any other decomposition. In particular, it would be interesting to explore the space of convex decompositions that minimize the number m of polygons. We have restricted our attention to point sets in the plane, but hope to explore generalizations to higher dimensions in future work. In Section 3, we hinted at the idea of finding orderings that minimize the number of inversions between iterations of the base algorithm; it would be interesting to explore this idea further. We have also worked within the confines of the real RAM model, and it would be interesting to study algorithms for the same problem in alternative models of computation.

## References

•  F. P. Preparata and M. I. Shamos. Computational Geometry: An Introduction. Springer Science & Business Media, 2012.
•  S. Kratsch, G. Philip, and S. Ray. Point line cover: The easy kernel is essentially tight. ACM Transactions on Algorithms, 12(3):40:1–40:16, 2016.
•  V. Froese, I. Kanj, A. Nichterlein, and R. Niedermeier. Finding points in general position. In The Canadian Conference on Computational Geometry, pages 7–14, 2016.
•  H. Edelsbrunner and L. J. Guibas. Topologically sweeping an arrangement. Journal of Computer and System Sciences, 38(1):165–194, 1989.
•  J. G. Erickson. Lower Bounds for Fundamental Geometric Problems. PhD Thesis, University of California at Berkeley, 1996.
•  A. Gajentaan and M. H. Overmars. On a class of O(n^2) problems in computational geometry. Computational Geometry, 45(4):140–152, 2012.
•  L. J. Guibas, M. H. Overmars, and J.-M. Robert. The exact fitting problem in higher dimensions. Computational Geometry, 6(4):215–230, 1996.
•  J. L. Bentley and T. Ottmann. Algorithms for reporting and counting geometric intersections. IEEE Transactions on Computers, 28(9):643–647, 1979.
•  M. de Berg, O. Cheong, M. van Kreveld, and M. Overmars. Computational Geometry: Algorithms and Applications. Springer, 2008.
•  E. Rafalin, D. Souvaine, and I. Streinu. Topological sweep in degenerate cases. In International Workshop on Algorithm Engineering and Experiments, pages 155–165. Springer, 2002.
•  I. J. Balaban. An optimal algorithm for finding segments intersections. In Symposium on Computational Geometry, pages 211–219, 1995.
•  R. Anderson, P. Beame, and E. Brisson. Parallel algorithms for arrangements. Algorithmica, 15(2):104–125, 1996.
•  M. T. Goodrich. Constructing arrangements optimally in parallel. In ACM Symposium on Parallel Algorithms and Architectures, pages 169–179. ACM, 1991.
•  P. K. Agarwal. Partitioning arrangements of lines i: An efficient deterministic algorithm. Discrete & Computational Geometry, 5(5):449–483, 1990.
•  T. Hagerup, H. Jung, and E. Welzl. Efficient parallel computation of arrangements of hyperplanes in d dimensions. In ACM Symposium on Parallel Algorithms and Architectures, pages 290–297. ACM, 1990.
•  A. B. Khlopotine, V. Jandhyala, and D. Kirkpatrick. A variant of parallel plane sweep algorithm for multicore systems. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 32(6):966–970, 2013.
•  M. T. Goodrich, M. R. Ghouse, and J. Bright. Sweep methods for parallel computational geometry. Algorithmica, 15(2):126–153, 1996.
•  M. McKenney, R. Frye, M. Dellamano, K. Anderson, and J. Harris. Multi-core parallelism for plane sweep algorithms as a foundation for gis operations. GeoInformatica, 21(1):151–174, 2017.
•  R. A. Rufai. Convex Hull Problems. PhD Thesis, George Mason University, 2015.
•  B. Chazelle. On the convex layers of a planar set. IEEE Transactions on Information Theory, 31(4):509–517, 1985.
•  S. G. Akl. Optimal parallel algorithms for computing convex hulls and for sorting. Computing, 33(1):1–11, 1984.
•  R. Cole. Parallel merge sort. SIAM Journal on Computing, 17(4):770–785, 1988.