Practical I/O-Efficient Multiway Separators

07/06/2021
by   Svend C. Svendsen, et al.
Aarhus Universitet
0

We revisit the fundamental problem of I/O-efficiently computing r-way separators on planar graphs. An r-way separator divides a planar graph with N vertices into O(r) regions of size O(N/r) and O(√(Nr)) boundary vertices in total, where boundary vertices are vertices that are adjacent to more than one region. Such separators are used in I/O-efficient solutions to many fundamental problems on planar graphs such as breadth-first search, finding single-source shortest paths, topological sorting, and finding strongly connected components. Our main result is an I/O-efficient sampling-based algorithm that, given a Koebe-embedding of a graph with N vertices and a parameter r, computes an r-way separator for the graph under certain assumptions on the size of internal memory. Computing a Koebe-embedding of a planar graph is difficult in practice and no known I/O-efficient algorithm currently exists. Therefore, we show how our algorithm can be generalized and applied directly to Delaunay triangulations without relying on a Koebe-embedding. This adaptation can produce many boundary vertices in the worst-case, however, to our knowledge our result is the first to be implemented in practice due to the many non-trivial and complex techniques used in previous results. Furthermore, we show that our algorithm performs well on real-world data and that the number of boundary vertices is small in practice. Motivated by applications in geometric information systems, we show how our algorithm for Delaunay triangulations can be applied to compute the flow accumulation over a terrain, which models how much water flows over the vertices of a terrain. When given an r-way separator, our implementation of the algorithm outperforms traditional sweep-line-based algorithms on the publicly available digital elevation model of Denmark.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 16

08/30/2021

BDD-Based Algorithm for SCC Decomposition of Edge-Coloured Graphs

Edge-coloured directed graphs provide an essential structure for modelli...
10/18/2021

Planar Median Graphs and Cubesquare-Graphs

Median graphs are connected graphs in which for all three vertices there...
02/19/2018

The Complexity of Drawing a Graph in a Polygonal Region

We prove that the following problem is complete for the existential theo...
10/28/2014

Throat Finding Algorithms based on Throat Types

The three-dimensional geometry and connectivity of pore space determines...
07/20/2016

Dappled tiling

We consider a certain tiling problem of a planar region in which there a...
02/11/2019

Flexibility of planar graphs of girth at least six

Let G be a planar graph with a list assignment L. Suppose a preferred co...
04/30/2018

The idemetric property: when most distances are (almost) the same

We introduce the idemetric property, which formalises the idea that most...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

Acknowledgements

We thank Pankaj K. Agarwal and Lars Arge for useful discussions on computing planar separators for Delaunay triangulations. We also thank Gerth S. Brodal for his input on this paper.

1 Introduction

In this paper, we revisit the fundamental problem of computing -way separators by presenting I/O-efficient algorithms and demonstrating how our results can be applied to I/O-efficiently compute flow accumulation on a terrain. We implement and evaluate our algorithms on real-world terrain data.

The -way separator is a generalization of the planar separator theorem by Lipton et al[16]. The planar separator theorem states that a planar graph with vertices can be partitioned into two unconnected sets each of size at most by removing vertices from the graph. Lipton and Tarjan [16] showed that such a partitioning can be computed in linear time in classical models of computation. Frederickson et al[11] described how the planar separator theorem can be generalized to the concept of an -way separator: Given a parameter , an -way separator is a division of the vertices of the graph into non-disjoint regions such that each vertex of the graph is contained in at least one region. A region contains two types of vertices: boundary vertices and interior vertices. An interior vertex is contained in exactly one region and is adjacent only to vertices in that region. A boundary vertex is shared among at least two regions and is adjacent to vertices in multiple regions. Each region contains vertices in total of which are boundary vertices. It follows that the total number of boundary vertices is .

The concept of -way separators is particularly interesting when handling planar graphs that exceed the capacity of the main memory since the computation of such separators can be used to divide the graph into memory-sized regions. In this situation, data is written and read in large blocks to disk, so it is important to design algorithms that minimize the movement of such blocks. This has led to the development of the so-called I/O model by Aggarwal and Vitter [2]. In this model, the computer is equipped with a two-level memory hierarchy consisting of an internal memory capable of holding data items and an external memory of unlimited size. All computation has to happen on data in internal memory. Data is transferred between internal and external memory in blocks of consecutive data items. Such a transfer is referred to as an I/O-operation or an I/O. The cost of an algorithm is the number of I/Os it performs. The number of I/Os required to read consecutive items from disk is and the number of I/Os required to sort items is  [2]. For all realistic values of , and we have . Maheshwari et al[17] showed that an -way separator can be computed in I/Os. This algorithm results in solutions to fundamental graph problems, such as breadth-first search, finding single-source shortest paths, topological sorting, and finding strongly connected components, that uses I/Os [4, 6]. Later, Arge et al[5] presented an I/O-efficient -way separator algorithm that uses I/Os and internal memory computation time. Furthermore, they showed that this result can be used to derive algorithms for finding single-source shortest paths, topological sorting, and finding strongly connected components using I/Os and internal memory computation time. This improves upon the result by Maheshwari et al. by upper bounding the internal memory computation time used.

To our knowledge, no algorithms for I/O-efficiently computing multiway separators have been implemented in practice yet due to the many non-trivial and complex techniques used to derive them. Therefore, we consider the problem of computing -way planar separators when given a Koebe-embedding of the graph. A Koebe-embedding of a planar graph is a set of disks in the plane with disjoint interiors where the center of each disk corresponds to a vertex in the graph and two disks are tangent if and only if the corresponding vertices in the graph are adjacent. Miller et al[19] showed that a Koebe-embedding can be used to partition the corresponding graph into two unconnected parts each of size at most by removing vertices from the graph. In this paper, we present a simple I/O-efficient algorithm that computes an -way separator for a planar graph when given a Koebe-embedding of the graph and having certain assumptions on the size of internal memory.

To our knowledge, the computation of Koebe-embeddings is not trivial and no I/O-efficient algorithms have been presented. Bannister et al[7] showed that computing exact Koebe-embedding requires computing the roots of polynomials of unbounded degree. Thus, the focus of the current state-of-the-art algorithms is to numerically approximate the Koebe-embedding. Orick et al[20] presented an algorithm that approximates a Koebe-embedding by alternating between adjusting radii and positions of vertices. Empirical results show that the algorithm runs in approximately linear time, however, no theoretical worst-case bounds are given. Recently, Dong et al[9] presented an algorithm based on convex optimization that computes an approximate Koebe-embedding in near-linear worst-case time. To our knowledge, these algorithms do not trivially extend to the I/O model.

Motivated by applications in geometric information systems, we show that our algorithm can be adapted to Delaunay triangulation without having to first compute a Koebe-embedding. Delaunay triangulations can be computed using I/Os [1] and are widely used to convert terrain point clouds into so-called triangulated irregular networks which represent a terrain as a triangulated surface. A triangulated irregular network (TIN) is computed by projecting the terrain point cloud in onto the -plane, computing the Delaunay triangulation of the projected points, and lifting the Delaunay triangulation back to . This adaptation can result in boundary vertices in the worst case [18]. However, we test our algorithm on the publicly available digital elevation model of Denmark [10] and show that the algorithm results in a small number of boundary vertices in practice.

Finally, we describe how -way separators can be used to compute the flow accumulation over a terrain, which models the flow of water over a terrain represented as a TIN. We consider a variant of the flow accumulation problem, where we are given a rain distribution function that fits in internal memory and assigns units of water to each vertex . The water in each vertex is then distributed by pushing water to a neighboring vertex according to a given flow direction of . The flow accumulation of a vertex is the total amount of water that flows through . This problem is traditionally solved in the I/O-model using I/Os by a sweep-line algorithm where the flow is propagated using a priority queue during a downward sweep of the terrain [14]. In this paper, we adapt the grid terrain algorithm by Haverkort et al[14] to speed up the computation of flow accumulation over a TIN when given an -way separator of the terrain. Furthermore, we show that this algorithm performs well in practice and outperforms the traditional sweep-line algorithm on the digital elevation model of Denmark when given an -way separator of the terrain.

2 Preliminaries

In this section, we state several preliminary definitions and introduce a more general definition of the -way separator.

2.1 k-ply Neighborhood Systems

We begin by presenting a more formal definition of a Koebe-embedding and then introduce the more general -ply neighborhood system. This generalization will be used later when applying our result to Delaunay triangulations. The definitions in this section follow Miller et al[19]. Let a disk packing be a set of disks in the plane that have disjoint interiors. Koebe [15] showed that every planar graph can be embedded as a disk packing such that the center of each disk corresponds to a vertex in the planar graph and two disks are tangent if and only if there is an edge connecting the two corresponding vertices in the graph. We refer to this as a Koebe-embedding of the planar graph. Note that a partitioning of a Koebe-embedding into disjoint subsets implies a partitioning of the vertices of the graph. Miller et al[19] used this idea to describe how a Koebe-embedding of a planar graph can be used to compute a planar separator. In order to describe this result, we first introduce the more general -ply neighborhood system:

Definition (-ply neighborhood system).

A -ply neighborhood system in dimensions is a set of closed balls in such that no point in is in the interior of more than of the balls.

In the following sections, we introduce the notion of a separator for -ply neighborhood systems in for general . Observe that a disk packing is a -ply neighborhood system and, thus, this separator will also be applicable to Koebe-embeddings.

We now state the planar separator result by Miller et al[19]. A -dimensional sphere partitions a -neighborhood system in into three subsets: the set of all balls of contained in the exterior of , the set of all balls of contained in the interior of , and the set of all balls of that intersect the boundary of . Correspondingly, we define the subsets and .

Theorem 2.1 (Sphere Separator [19]).

Suppose is a -ply neighborhood system in with size . Then there exists a sphere in such that

Additionally, Miller et al[19] presented a sampling-based algorithm for approximately computing sphere separators. We state their result with two additional properties that follow from their original proof; first, we state the result when applied to a subset . Note that does not have to be a proper subset. Secondly, we state the number of I/Os used.

Theorem 2.2 (Randomized Separator Algorithm [19]).

Suppose is a -ply neighborhood system in , is a subset of . Then for any constant we can compute a sphere 

such that with probability at least

Furthermore, the algorithm uses I/Os, where is a constant depending only on and .

We remark that the randomization in the algorithm is over random numbers chosen by the algorithm independent of the input. Therefore, the algorithm can be used to find a sphere satisfying the inequalities by applying the algorithm an expected constant number of times [19]. When describing our algorithm, we will use Theorem 2.2 as a black box and refer to the resulting sphere as a sphere separator.

2.2 Multiway Separator

We now present a generalization of the sphere separator result by Miller et al[19]. Given a -ply neighborhood system in and a parameter , an -way division of is a division of into non-disjoint regions such that each ball in is contained in at least one region. A region contains two types of balls: boundary balls and interior balls. An interior ball is contained in exactly one region and has non-empty intersection only with balls contained in the same region. A boundary ball is shared among at least two regions and has non-empty intersection with balls in multiple regions. Each region contains balls in total which are stored consecutively on disk. An -way separator is an -way division where each region contains boundary balls. It follows that the total number of boundary balls of an -way separator is . We use the term multiway separator and multiway division whenever is clear from the context.

2.3 Range Spaces, VC dimensions, and Samples

The main result of this paper is obtained by computing a multiway separator on a sample of a given -ply neighborhood system. In order to prove correctness of our algorithm, we show that the result generalizes to the entire neighborhood system with at least constant probability. This proof relies on the concepts of Vapnik–Chervonenkis dimension (VC dimension) [12] and relative -approximations [13]. Here, we will provide a quick summary of various definitions and theorems. For a more in-depth introduction to VC-dimension, we refer to Har-Peled et al[12].

Definition (Range Space).

A range space is a pair , where is the ground set (finite or infinite) and is a (finite or infinite) family of subsets of . The elements of are referred to as classifiers.

Definition (VC Dimension).

Let be a range space. Given , let the intersection of and be defined as

If contains all subsets of , then we say that is shattered by . The VC Dimension of , denoted by , is the maximum cardinality of a shattered subset of :

If there are arbitrarily large shattered subsets, then .

Lemma 2.3 (VC Dimension of Halfspaces [12, Chapter 5]).

Let be the range space where and is the set of halfspaces in . Then has VC dimension .

Lemma 2.4 (Mixing of Range Spaces [12, Chapter 5]).

Let be range spaces which share the same ground set and all have VC dimension at most . Consider the sets of classifiers and , where

Then the range spaces and have VC dimension .

Definition (Measure).

Let be a range space, and let be a finite subset of . The measure of a classifier in is the quantity

Definition (Relative Approximation).

Let be a range space, and let be a finite subset of . For given parameters , a subset is a relative -approximation for if, for each , we have

Lemma 2.5 (Relative Approximation Sampling [13]).

Let be a range space with VC dimension , and let be a finite subset of . Given parameters , a random sample of size at least

for an appropriate constant , is a relative -approximation for with probability at least .

3 Multiway Separator Algorithm for k-ply Neighborhood Systems

In this section, we state our main result for I/O-efficiently computing an -way separator of a -ply neighborhood system . The algorithm can be applied to -ply neighborhood systems in for any dimensions , however, we prove correctness only for .

We begin by presenting an algorithm that computes an -way division of under the assumption that . Given a -ply neighborhood system in the plane and a parameter , we let and compute an -way division by recursively computing -way divisions until is divided into regions of size . In order to compute an -way division on , we sample a subset of sufficiently large size. By recursively computing sphere separators using Theorem 2.2, we can compute an -way separator for . Let denote the sphere separators that are computed during the recursion. We refer to as a separator tree. We prove that with at least constant probability we obtain an -way division by recursively applying the sphere separators of on . It follows that we obtain an -way division for by repeating this sampling-based algorithm an expected constant number of times. This result provides guarantees on the number of boundary balls in the sample , however, we do not prove bounds for the total number of boundary balls in the -way division of . We expect the number of boundary balls to be small and confirm so by experimental evaluation in later sections. Additionally, by increasing the sample size and slightly modifying the algorithm, one can remove the assumption on and prove that the result is an -way separator for . This results in an I/O-efficient algorithm for computing -way separators when . In other words, we provide guarantees on the number of boundary balls by assuming is sufficiently large.

The rest of this section is structured as follows: in Section 3.1, given , we describe how to sample , recursively apply Theorem 2.2, and prove that the result can be used to divide into regions of size with at least constant probability. In Section 3.2, we bound the number of regions to and the total number of boundary balls in to . In Section 3.3, we bound the number of boundary balls in each region of to . Finally, in Section 3.4, we bound the expected number of I/Os used and state the final algorithm.

3.1 Recursively Computing Separators

In this subsection, we describe how to sample , recursively apply Theorem 2.2, and prove that the result can be used to divide into regions of size with at least constant probability, where and assuming .

First, sample of size at least , where is a constant we choose later. Letting , we recursively compute sphere separators on for at most levels; let denote the balls of that occur in a node of the recursion. In the root of the recursion, we let . At each node of the recursion, we compute a sphere separator such that and are smaller than by at least a constant factor. For now, assume that such a sphere separator is obtained. We then recurse on the two subproblems and . The recursion is continued until the problem size is at most , where is a sufficiently large constant. The separator tree is then formed from the sphere separators by letting the nodes in correspond to the recursively computed sphere separators.

We proceed by describing how to compute a sphere separator in a node of the recursion. Using Theorem 2.2 and setting , we compute a sphere separator that with probability at least satisfies

(1)
(2)

where is a constant. Note that this uses I/Os. We apply Theorem 2.2 an expected constant number of times until a separator that satisfies (2) and (1) is obtained. Since we divide into regions of size at most , it follows that . Using the assumption that and that (2) holds, we upper bound as follows:

(3)

Thus, for , it follows from (3) that . Combining this with (1), we obtain a separator that satisfies

(4)

Thus, the problem size becomes smaller by a constant factor and we can recursively compute separators until is divided into regions of size at most . This requires at most levels of recursion.

We proceed by showing that is divided into regions of size when is divided recursively using the sphere separators of . We proceed introducing the following two lemmas:

Lemma 3.1.

Let be the set of all circles in the plane and let be the set of all disks in the plane. Let be the set of classifiers defined as . Correspondingly, we define . The range spaces and have constant VC dimension.

The proof of Lemma 3.1 is included in Appendix A.

Lemma 3.2.

Let be the set of classifiers defined as

The range space has VC dimension .

Proof.

Observe that any finite subset shattered in the range space can also be shattered in the range space , where . Thus the VC-dimension of is upper bounded by the VC-dimension of . The proof now follows from Lemma 3.1 and Lemma 2.4. ∎

Observe that the separator tree defines a set of regions such that each region is defined by the intersection of at most classifiers in the set corresponding to the sphere separators in a path from the root to a leaf in . Thus, a region can be defined by a classifier in . Furthermore, each region contains at most balls of . It now follows from Lemma 3.2 and Lemma 2.5, that by sampling with size at least , is a relative -approximation of in the range space with at least constant probability. The constant is chosen according to Lemma 3.2 and Lemma 2.5. Thus, with at least constant probability, the regions of contains at most balls.

3.2 Bounding the Total Number of Boundary Balls

In the previous subsection, we bounded the size of each region. However, the number of regions may be large, since boundary balls occur in multiple regions. Recall that in a node of the recursion we obtain a sphere separator such that the number of intersected balls is . We proceed to upper bound the total number of boundary balls by bounding the total number of intersected balls during the recursion. We show how to bound the total number of intersections in by .

In Section 3.1, we argued that the recursion on produces regions of size at most , where is a constant. Furthermore, it follows from (4) that the size of the smallest region is at least . Similar to Frederickson [11], we let denote the number of intersections of balls in during the recursion. At each node of the recursion, is divided into two regions and by the sphere separator . Recall that is selected such that , where . Furthermore, it follows from (2) that . We upper bound as follows:

It can be shown by induction in the size of that . The proof of this is included in Appendix B. Using the assumption , it follows that the number of regions in and is . Thus, the separator tree can be used to divide into regions of size by recursively applying the sphere separators of to

3.3 Reducing the Number of Boundary Balls in a Region

In order to obtain an -way separator of from an -way division of , we reduce the number of boundary balls in each region. Let be the number of boundary balls in a region of . Letting be a constant, we describe how to ensure for each region . We do this by adapting the technique described by Arge et al[5, Section 3.3]. It follows from the previous subsection that . Let denote the total number of regions which contain more than boundary balls. For each region where , we recursively apply Theorem 2.2 on the boundary balls of . In other words, we recursively compute sphere separators that divide into regions and with at most boundary balls each, where is a constant. Recall that , where is a constant. It follows that for sufficiently large , the region is divided into two regions such that the number of boundary balls in each is . We recurse until the number of boundary balls is at most . Observe, that the total number of sphere separators required to divide all regions is

Thus, the total number of regions will be increased by only . Furthermore, since each sphere separator adds new boundary balls, the total number of boundary balls remains . Each separator can be computed using expected I/Os and, thus, we can reduce the number of boundary balls in each region by using an additional I/Os. Thus, the algorithm results in an -way separator for .

3.4 Bounding the Total I/O-Complexity

We now bound the total I/O-complexity of the algorithm described. Let be the separator tree computed during the levels of the recursion. Recall, at each node of the recursion we compute a sphere separator using Theorem 2.2 using expected I/Os. Thus, the expected number of I/Os used in a level of recursion is . Using the upper bound on and the assumption , we conclude that the total number of I/Os used during the levels of recursion is . The results can now be stated in the following lemma:

Lemma 3.3.

Given a -ply neighborhood system in the plane, a parameter , and a random sample such that and , where is a constant. Then an -way separator for can be computed using expected I/Os. Furthermore, with at least constant probability produces an -way division of when applied to .

In order to compute the -way division of , we apply Lemma 3.3 to obtain a separator tree such that the sphere separators of can be used to divide into regions of size with at least constant probability. Note that since , the number of leaves in is . We partition into a constant number of subtrees such that each subtree fits in memory along with one block per leaf of . We now divide by recursing over the subtrees using I/Os in total. We repeat the above an expected constant number of times until an -way division of is obtained.

We proceed by bounding the number of I/Os used by Lemma 3.3 to . We use a sample of size and assume . Under this assumption, the expected number of I/Os is bounded as follows:

Next, consider the case when . We observe that it is sufficient to divide into regions of size , since regions of size can be divided further by directly applying Theorem 2.2 on the regions. That is, each region of size fits in memory after a constant number of applications of Theorem 2.2, so directly applying Theorem 2.2 uses additional I/Os. Thus, using Lemma 3.3, we compute a separator tree such that can be used to divide into regions of size with at least constant probability, where

It follows that a sample of size is sufficient. Similar to before, we repeat the sampling and separator computation until an -way division of is found. This uses expected I/Os since the problem size fits in internal memory after a constant number of levels of recursion.

Lemma 3.4.

Given a -ply neighborhood system in the plane and a parameter such that , an -way division of can be computed using expected I/Os.

We now present the final algorithm for computing an -way division for and general . This algorithm follows immediately from Lemma 3.4. Let and recursively apply Lemma 3.4 for levels of recursion. This uses a total of I/Os.

Theorem 3.5.

Given a -ply neighborhood system in the plane, an -way division of can be computed using expected I/Os, assuming .

Furthermore, this implies an algorithm for Koebe-embeddings, since Koebe-embeddings are -ply neighborhood systems.

Theorem 3.6.

Given a planar graph and a Koebe-embedding of , an -way division of can be computed using expected I/Os.

Note that the above results in an -way division, but does not provide any bounds on the number of boundary balls when dividing . In Appendix D, we argue that the number of boundary balls in can be bounded by increasing the sample size. The result is stated in the theorem below.

Theorem 3.7.

Given a -ply neighborhood system in the plane, an -way separator of can be computed using expected I/Os, assuming and

4 Applications to Delaunay Triangulations and Terrain

Motivated by applications in geographic information systems, we turn our attention to Delaunay triangulations. Delaunay triangulations are often used in practice to compute TINs from point clouds and can be computed I/O-efficiently using I/Os [1]. In the previous section, we described how an -way separator for a planar graph can be computed I/O-efficiently, when given a Koebe-embedding of and having certain assumptions on the size of the internal memory. However, to our knowledge, there are no known I/O-efficient algorithms for the computation of Koebe-embeddings. Therefore, we describe how to adapt our algorithm to compute separators for triangulations without computing a Koebe-embedding. This adaptation can also be applied to any triangulation in the plane and, thus, any planar graph since a planar graph can be triangulated by trivially adding edges until every face of is a triangle.

Observe that the circumcircles of a triangulation in the plane form a -ply neighborhood system in the plane. However, is in the worst-case since many circumcircles may overlap in one point. Miller et al[18] showed that the circumcircles of a Delaunay triangulation form a -ply neighborhood with at most constant when the largest ratio of the circum-radius to the length of the smallest edge over all triangles is at most constant. Therefore, we describe how to adapt our algorithm to circumcircles of a Delaunay triangulation. We compute an -way division for and use to divide by mapping each region of to a region as follows: Let the boundary vertices of be the vertices that are contained in a triangle whose circumcircle is a boundary ball of . The internal vertices of are the vertices which are contained only in triangles whose circumcircles are internal balls of region .

We see that the number of vertices in is , where is the number of circumcircles of . Furthermore, the number of boundary vertices in regions is at most a constant factor larger than the number of boundary circumcircles of .

It follows that we can also divide a TIN into regions by applying the above construction to the Delaunay triangulation used to construct the TIN. This approach allows for division-based algorithms on TINs. Haverkort et al[14] showed that a separator of a grid graph can be used to I/O-efficiently compute the flow accumulation of a terrain. Their results can be applied to TINs and -way divisions:

Lemma 4.1 (Division-Based Flow Accumulation [14]).

Let be a TIN and let denote the vertices of the TIN. Given a rain distribution and an -way division of such that each region fits in internal memory along with the rain distribution. Letting denote the number of boundary vertices in a region of the -way division, the flow accumulation over can be computed using I/Os

The flow accumulation over a TIN can also be computed using an algorithm that performs a top-down sweep of the terrain [14]. The result is stated in the following lemma:

Lemma 4.2 (Sweep-Based Flow Accumulation [14]).

Let be a TIN and let denote the vertices of the TIN. Given a rain distribution that fits in internal memory, the flow accumulation of can be computed using I/Os.

5 Experiments

In this section, we present the experiments we have conducted on real terrain data to demonstrate the efficiency of our algorithms. We have implemented and tested the -way division algorithm for Delaunay triangulations as described in Section 4 and Theorem 3.5. Furthermore, we have implemented and tested the two flow accumulation algorithms stated in Lemma 4.1 and Lemma 4.2. The algorithms have been implemented in C++ and make heavy use of the TPIE library [3] which provides implementations of fundamental I/O-efficient algorithms such as sorting and priority queues. The experiments were run on an Intel i7-3770 CPU with of RAM and of storage running Linux. Each program was assigned of memory during testing.

For our tests, we used the Danish Elevation Model published by the Danish Agency for Data Supply and Efficiency [10]. The model consists of a highly detailed point cloud collected using LiDAR technology. Each point in the model is labeled as ground, vegetation, a building rooftop, and others. There is an average of 4.5 points per square meter, however, this varies for each area of the terrain since non-reflective surfaces, such as certain types of vegetation, can result in large holes in the point cloud. In this paper, we focus on modeling the flow of water and, thus, we filtered out points not labeled as ground or building points. We constructed a TIN from the resulting point cloud using a Delaunay triangulation.

We have implemented Theorem 2.2 to compute sphere separators as described by Miller et al[19] and Clarkson et al[8]. Whenever we sample sphere separators on input , we discard a separator if it does not satisfy that and , where is the number of triangles in , is the number of triangles inside or intersecting , and is the number of triangles outside or intersecting . We note that this differs from previous sections, where we computed the separator based on the circumcircles. However, since the number of intersected circumcircles upper bounds the number of intersected triangles, the previous proofs still apply to this adaptation. Furthermore, this simplifies the algorithm by avoiding the computation of circumcircles. In order to examine the size of , we sampled a large number of sphere separators on various areas of the terrain and computed the number of intersected triangles. The results are shown in Appendix C and demonstrate that is small in practice.

Next, we assessed our -way division algorithm (Theorem 3.5) and the flow accumulation algorithms described in Lemma 4.1 and Lemma 4.2. We evaluated our implementation on a TIN representing a area around Herning, Denmark. The TIN is represented as a list of triangles, such that each vertex of a triangle is annotated with its coordinates, an index, and the flow direction of the vertex. The flow directions have been computed by selecting the neighboring vertex with minimum height. This resulted in a TIN with billion points and a total size of . In order to measure the I/O throughput of the system, we implemented a simple program that reads the entire TIN and observed that the program took hours and minutes to run on the TIN.

We computed a multiway division on the input such that our implementation of division-based flow accumulation (Lemma 4.1) can fit each region in memory. The division was saved to disk by representing each region as a file containing a list of triangles. Each triangle is annotated with the same information as the input as well as a boolean variable indicating whether the triangle is a boundary triangle or not. This resulted in a division with regions and only boundary vertices in total. Furthermore, , where is the number of boundary vertices in region . Thus, we see that the number of boundary vertices is very small in practice despite having no theoretical bounds for this algorithm. The computation of this multiway division took hours and minutes. We implemented a program that reads the output division and writes a dummy division with the same number of regions and the same number of triangles in each region. This program took hours and minutes to read the division and hours and minutes to write the dummy division. In other words, we see that a large proportion of the running time is due to computation in internal memory.

Having computed the division, we applied it to compute the flow accumulation over the TIN using the division-based algorithm (Lemma 4.1). In our implementation, we used the uniform rain distribution that distributes one unit of water on each vertex. In this setup, our implementation of division-based flow accumulation took hours and minutes when given the division as input. We compared this to an implementation of Lemma 4.2 which took hours and minutes when running on the same input TIN. Thus, we see a significant improvement in running time when the terrain has been preprocessed by computing the multiway division. In other words, our approach improves performance when computing flow accumulation for different rain distributions on the same TIN. However, the division-based approach is slower when computing the flow accumulation for only a single rain distribution, and the time spent preprocessing is included.

References

  • [1] P. K. Agarwal, L. Arge, and K. Yi. I/O-efficient construction of constrained Delaunay triangulations. In Algorithms - ESA 2005, 13th Annual European Symposium, Proceedings, volume 3669 of Lecture Notes in Computer Science, pages 355–366. Springer, 2005. doi:10.1007/11561071_33.
  • [2] A. Aggarwal and J. S. Vitter. The input/output complexity of sorting and related problems. Communications of the ACM, 31(9):1116–1127, September 1988. doi:10.1145/48529.48535.
  • [3] L. Arge, M. Rav, S. C. Svendsen, and J. Truelsen. External memory pipelining made easy with TPIE. In 2017 IEEE International Conference on Big Data (Big Data), pages 319–324. IEEE, 2017. doi:10.1109/BigData.2017.8257940.
  • [4] L. Arge, L. Toma, and N. Zeh. I/O-efficient topological sorting of planar DAGs. In Proceedings of the Fifteenth Annual ACM Symposium on Parallel Algorithms and Architectures, pages 85–93. Association for Computing Machinery, 2003. doi:10.1145/777412.777427.
  • [5] L. Arge, F. van Walderveen, and N. Zeh. Multiway simple cycle separators and I/O-efficient algorithms for planar graphs. In Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 901–918. Society for Industrial and Applied Mathematics, 2013. doi:10.1137/1.9781611973105.65.
  • [6] L. Arge and N. Zeh. I/O-efficient strong connectivity and depth-first search for directed planar graphs. In Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science, pages 261–270. IEEE Computer Society, 2003. doi:10.1109/SFCS.2003.1238200.
  • [7] M. J. Bannister, W. E. Devanny, D. Eppstein, and M. T. Goodrich. The Galois complexity of graph drawing: Why numerical solutions are ubiquitous for force-directed, spectral, and circle packing drawings. In Graph Drawing, volume 8871 of Lecture Notes in Computer Science, pages 149–161. Springer, 2014. doi:10.1007/978-3-662-45803-7_13.
  • [8] K. L. Clarkson, D. Eppstein, G. L. Miller, C. Sturtivant, and S. Teng. Approximating center points with iterated radon points. In Proceedings of the Ninth Annual Symposium on Computational Geometry, pages 91–98. Association for Computing Machinery, 1993. doi:10.1145/160985.161004.
  • [9] S. Dong, Y. T. Lee, and K. Quanrud. Computing circle packing representations of planar graphs. In Proceedings of the Thirty-First Annual ACM-SIAM Symposium on Discrete Algorithms, pages 2860–2875. Society for Industrial and Applied Mathematics, 2020. doi:10.1137/1.9781611975994.174.
  • [10] The Danish Agency for Data Supply and Efficiency. Danmarks højdemodel (the Danish elevation model), 2021. URL: https://sdfe.dk/hent-data/danmarks-hoejdemodel/.
  • [11] G. N. Frederickson. Fast algorithms for shortest paths in planar graphs, with applications. SIAM Journal on Computing, 16(6):1004–1022, December 1987. doi:10.1137/0216064.
  • [12] S. Har-Peled. Geometric Approximation Algorithms. Mathematical surveys and monographs. American Mathematical Society, 2011.
  • [13] S. Har-Peled and M. Sharir. Relative (p, )-approximations in geometry. Discrete & Computational Geometry, 45(3):462–496, 2011. doi:10.1007/s00454-010-9248-1.
  • [14] H. J. Haverkort and J. Janssen. Simple I/O-efficient flow accumulation on grid terrains. CoRR, abs/1211.1857, 2012. URL: http://arxiv.org/abs/1211.1857, arXiv:1211.1857.
  • [15] P. Koebe. Kontaktprobleme der konformen abbildung. Ber. Sächs. Akad. Wiss. Leipzig, Math.-Phys. Kl., 88:141–164, 1936.
  • [16] R. Lipton and R. Tarjan. A separator theorem for planar graphs. SIAM Journal on Applied Mathematics, 36(2):177–189, 1979. doi:10.1137/0136016.
  • [17] A. Maheshwari and N. Zeh. I/O-efficient planar separators. SIAM Journal on Computing, 38(3):767–801, May 2008. doi:10.1137/S0097539705446925.
  • [18] G. L. Miller, D. Talmor, S. Teng, and N. Walkington. A Delaunay based numerical method for three dimensions: Generation, formulation, and partition. In

    Proceedings of the Twenty-Seventh Annual ACM Symposium on Theory of Computing

    , pages 683–692. Association for Computing Machinery, 1995.
    doi:10.1145/225058.225286.
  • [19] G. L. Miller, S. Teng, W. Thurston, and S. A. Vavasis. Separators for sphere-packings and nearest neighbor graphs. Journal of the ACM, 44(1):1–29, January 1997. doi:10.1145/256292.256294.
  • [20] G. L. Orick, K. Stephenson, and C. Collins. A linearized circle packing algorithm. Computational Geometry, 64:13–29, August 2017. doi:10.1016/j.comgeo.2017.03.002.

Appendix A Proof of Constant VC dimension

In this section, we present the proof of Lemma 3.1.

Proof.

We begin by proving that the VC dimension of is constant. Let denote a classifier corresponding to the separator sphere with center and radius . Let be a disk in the plane with center and radius . We can express as follows:

We now map to by mapping the monomials to variables as follows:

Assume there is a finite subset that is shattered by . It follows that maps to a subset where and is shattered by , where is the set of all halfspaces in . Thus, the VC dimension of is at most the VC dimension of . It follows from Lemma 2.3 and Lemma 2.4 that the VC dimension of is constant. The VC dimension of can be bounded correspondingly. ∎

Appendix B Bound on the Total Number of Boundary Balls

In this section, we provide an upper bound for the function introduced in Section 3.2. Recall the definition of :

We proceed to show that , where and . We prove this by induction in the size of . Letting , it follows that since . This proves the base case. We proceed with proving the induction step using the induction hypothesis:

Recall the assumption that and that . It follows that . Furthermore, since it follows that . We now rewrite as follows:

Recall that the recursion produces regions of size at most , where is a constant. By inserting and setting sufficiently large such that , we obtain

This completes the proof by induction. Thus, the total number of intersections when recursing is at most .

Appendix C Experimental Evaluation of Separator Size

In this section, we examine the number of intersections when computing sphere separators on a terrain represented as a TIN. As previously described in Section 5, we have implemented Theorem 2.2 as described by Miller et al[19] and Clarkson et al[8]. We apply the implementation to various areas of a TIN constructed from the Danish Elevation model including only ground and building points. Let denote the input TIN and let denote the number of triangles in . For each area we sample sphere separators such that each sphere separator satisfies and