 # On Optimal Polyline Simplification using the Hausdorff and Fréchet Distance

We revisit the classical polygonal line simplification problem and study it using the Hausdorff distance and Fréchet distance. Interestingly, no previous authors studied line simplification under these measures in its pure form, namely: for a given ϵ > 0, choose a minimum size subsequence of the vertices of the input such that the Hausdorff or Fréchet distance between the input and output polylines is at most ϵ. We analyze how the well-known Douglas-Peucker and Imai-Iri simplification algorithms perform compared to the optimum possible, also in the situation where the algorithms are given a considerably larger error threshold than ϵ. Furthermore, we show that computing an optimal simplification using the undirected Hausdorff distance is NP-hard. The same holds when using the directed Hausdorff distance from the input to the output polyline, whereas the reverse can be computed in polynomial time. Finally, to compute the optimal simplification from a polygonal line consisting of n vertices under the Fréchet distance, we give an O(kn^5) time algorithm that requires O(kn^2) space, where k is the output complexity of the simplification.


## 1 Introduction

Line simplification (a.k.a. polygonal approximation) is one of the oldest and best studied applied topics in computational geometry. It was and still is studied, for example, in the context of computer graphics (after image to vector conversion), in Geographic Information Science, and in shape analysis. Among the well-known algorithms, the ones by Douglas and Peucker and by Imai and Iri hold a special place and are frequently implemented and cited. Both algorithms start with a polygonal line (henceforth polyline) as the input, specified by a sequence of points ⟨p_1, …, p_n⟩, and compute a subsequence starting with p_1 and ending with p_n, representing a new, simplified polyline. Both algorithms take a constant ε > 0 and guarantee that the output is within distance ε from the input.

The Douglas-Peucker algorithm is a simple and effective recursive procedure that keeps on adding vertices from the input polyline until the computed polyline lies within a prespecified distance ε. The procedure is a heuristic in several ways: it does not minimize the number of vertices in the output (although it performs well in practice) and it runs in O(n²) time in the worst case (although in practice it behaves more like an O(n log n) time algorithm). Hershberger and Snoeyink overcame the worst-case running time bound by providing a worst-case O(n log n) time algorithm using techniques from computational geometry, in particular a type of dynamic convex hull.

The Imai-Iri algorithm takes a different approach. It computes for every link p_i p_j with i < j whether all vertices that lie in between in the input lie within distance ε of the segment p_i p_j. In this case p_i p_j is a valid link that may be used in the output. The graph that has all vertices as nodes and all valid links as edges can then be constructed, and a minimum link path from p_1 to p_n represents an optimal simplification. Brute-force, this algorithm runs in O(n³) time, but with the implementation of Chan and Chin or Melkman and O'Rourke it can be done in O(n²) time.
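
For concreteness, the brute-force O(n³) variant described above can be sketched as follows. This is a minimal Python sketch, not the O(n²) implementations of Chan and Chin or Melkman and O'Rourke; all function names are ours.

```python
import math
from collections import deque

def point_segment_dist(p, a, b):
    # Euclidean distance from point p to segment ab
    dx, dy = b[0] - a[0], b[1] - a[1]
    if dx == 0 and dy == 0:
        return math.dist(p, a)
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))

def imai_iri(points, eps):
    # A link (i, j) is valid when every vertex strictly between i and j
    # lies within eps of the segment points[i] points[j].
    n = len(points)
    valid = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if all(point_segment_dist(points[k], points[i], points[j]) <= eps
                   for k in range(i + 1, j)):
                valid[i].append(j)
    # BFS for a minimum-link path from vertex 0 to vertex n-1.
    prev = [None] * n
    seen = [False] * n
    seen[0] = True
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in valid[u]:
            if not seen[v]:
                seen[v] = True
                prev[v] = u
                queue.append(v)
    path = [n - 1]
    while prev[path[-1]] is not None:
        path.append(prev[path[-1]])
    return [points[i] for i in reversed(path)]
```

Since every edge of the input is itself a valid link, the last vertex is always reachable and the BFS returns a simplification with the minimum number of links over all valid-link paths.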

There are many more results in line simplification. Different error measures can be used, self-intersections may be avoided, line simplification can be studied in the streaming model, it can be studied for 3-dimensional polylines, angle constraints may be put on consecutive segments, there are versions that do not output a subset of the input points but other well-chosen points, it can be incorporated in subdivision simplification [12, 13, 16], and so on and so forth. Some optimization versions are NP-hard [12, 16]. It is beyond the scope of this paper to review the very extensive literature on line simplification.

Among the distance measures for two shapes that are used in computational geometry, the Hausdorff distance and the Fréchet distance are probably the most well-known. They are both bottleneck measures, meaning that the distance is typically determined by a small subset of the input, like a single pair of points (the distances are not aggregated over the whole shapes). The Fréchet distance is considered a better distance measure, but it is considerably more difficult to compute because it requires us to optimize over all parametrizations of the two shapes. The Hausdorff distance between two simple polylines with m and n vertices can be computed in O((m+n) log(m+n)) time. Their Fréchet distance can be computed in O(mn log(mn)) time.
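
For intuition about the Fréchet distance, the *discrete* Fréchet distance, which only matches vertices and never underestimates the continuous distance on the same polylines, can be computed with the simple Eiter-Mannila dynamic program. A sketch (function name ours; this is an illustration, not the continuous algorithm referenced above):

```python
import math
from functools import lru_cache

def discrete_frechet(P, Q):
    # Eiter-Mannila dynamic program over index pairs (i, j): the value
    # c(i, j) is the cheapest "leash length" needed to walk both vertex
    # sequences monotonically up to P[i] and Q[j].
    d = lambda i, j: math.dist(P[i], Q[j])

    @lru_cache(maxsize=None)
    def c(i, j):
        if i == 0 and j == 0:
            return d(0, 0)
        if i == 0:
            return max(c(0, j - 1), d(0, j))
        if j == 0:
            return max(c(i - 1, 0), d(i, 0))
        return max(min(c(i - 1, j), c(i - 1, j - 1), c(i, j - 1)), d(i, j))

    result = c(len(P) - 1, len(Q) - 1)
    c.cache_clear()
    return result
```

The table has O(mn) entries, each filled in constant time, matching the bottleneck character discussed above: the result is determined by a single worst pair of matched vertices.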

Now, the Imai-Iri algorithm is considered an optimal line simplification algorithm, because it minimizes the number of vertices in the output, given the restriction that the output must be a subsequence of the input. But for what measure? It is not optimal for the Hausdorff distance, because there are simple examples where a simplification with fewer vertices can be given that still has Hausdorff distance at most ε between input and output. This comes from the fact that the algorithm uses the Hausdorff distance between a link p_i p_j and the sub-polyline ⟨p_i, …, p_j⟩. This is more local than the Hausdorff distance requires, and is more a Fréchet-type of criterion. But the line simplification produced by the Imai-Iri algorithm is also not optimal for the Fréchet distance. In particular, the input and output do not necessarily lie within Fréchet distance ε, because links are evaluated on their Hausdorff distance only.

The latter issue can easily be remedied: to accept links, we require the Fréchet distance between any link p_i p_j and the sub-polyline ⟨p_i, …, p_j⟩ to be at most ε [2, 15]. This guarantees that the Fréchet distance between the input and the output is at most ε. However, it does not yield the optimal simplification within Fréchet distance ε. Because of the nature of the Imai-Iri algorithm, it requires us to match a vertex p_i in the input to the same vertex p_i in the output in the parametrizations, if p_i is used in the output. This restriction on the parametrizations considered limits the simplification in unnecessary ways. Agarwal et al. refer to a simplification that uses the normal (unrestricted) Fréchet distance with error threshold ε as a weak ε-simplification under the Fréchet distance ("weak" refers to the situation that the vertices of the simplification can lie anywhere). They show that the Imai-Iri algorithm using the Fréchet distance gives a simplification with no more vertices than an optimal weak simplification under the Fréchet distance, where the latter need not use the input vertices.

The discussion begs the following questions: How much worse do the known algorithms and their variations perform in theory, when compared to the optimal Hausdorff and Fréchet simplifications? What if the optimal Hausdorff and Fréchet simplifications use a smaller value than ε? As mentioned, Agarwal et al. give a partial answer. How efficiently can the optimal Hausdorff simplification and the optimal Fréchet simplification be computed (when using the input vertices)?

##### Organization and results.

In Section 2 we explain the Douglas-Peucker algorithm and its Fréchet variation; the Imai-Iri algorithm has been explained already. We also show with a small example that the optimal Hausdorff simplification has fewer vertices than the Douglas-Peucker output and the Imai-Iri output, and that the same holds true for the optimal Fréchet simplification with respect to the Fréchet variants.

In Section 3 we will analyze the four algorithms and their performance with respect to an optimal Hausdorff simplification or an optimal Fréchet simplification more extensively. In particular, we address the question how many more vertices the four algorithms need, and whether this remains the case when we use a larger value of ε but still compare to the optimization algorithms that use ε.

In Section 4 we consider both the directed and undirected Hausdorff distance to compute the optimal simplification. We show that only the simplification under the directed Hausdorff distance from the output to the input polyline can be computed in polynomial time, while the rest is NP-hard to compute. In Section 5 we show that the problem can be solved in polynomial time for the Fréchet distance.

## 2 Preliminaries

The line simplification problem takes a maximum allowed error ε and a polyline P defined by a sequence of points ⟨p_1, …, p_n⟩, and computes a polyline Q defined by a sequence of points whose error with respect to P is at most ε. Commonly the sequence of points defining Q is a subsequence of the points defining P, and furthermore, Q starts at p_1 and ends at p_n. There are many ways to measure the distance or error of a simplification. The most common measure is a distance, like the Hausdorff distance or the Fréchet distance (we assume these distance measures are known). Note that the Fréchet distance is symmetric, whereas the Hausdorff distance has a symmetric and an asymmetric version (the distance from the input to the simplification).
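
The asymmetry of the Hausdorff distance is easy to see on finite point sets. A minimal sketch (for point sets, not polylines; function names ours):

```python
import math

def directed_hausdorff(A, B):
    # Directed Hausdorff distance: the largest distance from a point of A
    # to its nearest point of B.
    return max(min(math.dist(a, b) for b in B) for a in A)

def hausdorff(A, B):
    # Symmetric (undirected) version: the larger of the two directions.
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))
```

For example, if A has a point far from B but every point of B is near A, the two directed distances differ, which is exactly the distinction the simplification problems below rely on.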

The Douglas-Peucker algorithm for polyline simplification is a simple recursive procedure that works as follows. Let the line segment p_1 p_n be the first simplification. If all points of P lie within distance ε from this line segment, then we have found our simplification. Otherwise, let p_f be the furthest point from p_1 p_n, add it to the simplification, and recursively simplify the polylines ⟨p_1, …, p_f⟩ and ⟨p_f, …, p_n⟩. Then merge their simplifications (remove the duplicate p_f). It is easy to see that the algorithm runs in O(n²) time, and also that one can expect a much better performance in practice. It is also straightforward to verify that polyline P has Hausdorff distance (symmetric and asymmetric) at most ε to the output. We denote this simplification by DPH(P, ε), and will leave out the arguments P and/or ε if they are understood.
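
A minimal sketch of this recursion (ours, not an optimized implementation):

```python
import math

def _seg_dist(p, a, b):
    # Euclidean distance from point p to segment ab
    dx, dy = b[0] - a[0], b[1] - a[1]
    if dx == 0 and dy == 0:
        return math.dist(p, a)
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))

def douglas_peucker(points, eps):
    # Keep the chord if all interior vertices are within eps of it;
    # otherwise split at the farthest vertex and recurse on both halves.
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    imax = max(range(1, len(points) - 1),
               key=lambda i: _seg_dist(points[i], a, b))
    if _seg_dist(points[imax], a, b) <= eps:
        return [a, b]
    left = douglas_peucker(points[:imax + 1], eps)
    right = douglas_peucker(points[imax:], eps)
    return left[:-1] + right  # drop the duplicated split vertex
```

Each recursion level costs linear time, so an unbalanced split sequence yields the quadratic worst case mentioned above.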

We can modify the algorithm to guarantee a Fréchet distance between P and its simplification of at most ε by testing whether the Fréchet distance between the sub-polyline and its simplification is at most ε. If not, we still choose the most distant point to be added to the simplification (other choices are possible). This modification does not change the efficiency of the Douglas-Peucker algorithm asymptotically, as the Fréchet distance between a line segment and a polyline can be determined in linear time. We denote this simplification by DPF.
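
A sketch of this Fréchet variant, using the discrete Fréchet distance as a stand-in for the continuous, linear-time segment-to-polyline test described above (function names ours; the discrete distance is a conservative surrogate, not the paper's test):

```python
import math
from functools import lru_cache

def discrete_frechet(P, Q):
    # Eiter-Mannila dynamic program; discrete surrogate for the
    # continuous Frechet distance.
    d = lambda i, j: math.dist(P[i], Q[j])

    @lru_cache(maxsize=None)
    def c(i, j):
        if i == 0 and j == 0:
            return d(0, 0)
        if i == 0:
            return max(c(0, j - 1), d(0, j))
        if j == 0:
            return max(c(i - 1, 0), d(i, 0))
        return max(min(c(i - 1, j), c(i - 1, j - 1), c(i, j - 1)), d(i, j))

    result = c(len(P) - 1, len(Q) - 1)
    c.cache_clear()
    return result

def _seg_dist(p, a, b):
    # Euclidean distance from point p to segment ab
    dx, dy = b[0] - a[0], b[1] - a[1]
    if dx == 0 and dy == 0:
        return math.dist(p, a)
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))

def dp_frechet(points, eps):
    # Douglas-Peucker recursion that accepts a chord only when the
    # (discrete) Frechet distance to the sub-polyline is at most eps;
    # otherwise it still splits at the farthest vertex, as in the text.
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    if discrete_frechet(points, [a, b]) <= eps:
        return [a, b]
    imax = max(range(1, len(points) - 1),
               key=lambda i: _seg_dist(points[i], a, b))
    return dp_frechet(points[:imax + 1], eps)[:-1] + dp_frechet(points[imax:], eps)
```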

We have already described the Imai-Iri algorithm in the previous section. We refer to the resulting simplification as IIH. It has a Hausdorff distance (symmetric and asymmetric) of at most ε and never has more vertices than DPH. Similar to the Douglas-Peucker algorithm, the Imai-Iri algorithm can be modified for the Fréchet distance, leading to a simplification denoted by IIF.

We will denote the optimal simplification using the Hausdorff distance by OPTH, and the optimal simplification using the Fréchet distance by OPTF. In the case of the Hausdorff distance, we require P to be within ε of its simplification, so we use the directed Hausdorff distance.

The example in Figure 1 shows that DPH and IIH—which are both equal to P itself—may use more vertices than OPTH. Similarly, the example in Figure 2 shows that DPF and IIF may use more vertices than OPTF. Figure 1: Simplifications IIH (same as input, left) and OPTH (in blue, right) for an example. Figure 2: Simplifications IIF (same as input, left) and OPTF (in blue, right) for an example.

## 3 Approximation quality of Douglas-Peucker and Imai-Iri simplification

The examples of the previous section not only show that DPH and IIH (and DPF and IIF) use more vertices than OPTH and OPTF, respectively; they show that this is still the case if we run II with a larger error threshold than ε. To let IIH use as few vertices as OPTH, we must use a threshold several times larger than ε when the example is stretched horizontally. For the Fréchet distance, the enlargement factor needed in the example grows as the leftmost vertex is placed further to the left. In this section we analyze how the approximation enlargement factor relates to the number of vertices in the Douglas-Peucker and Imai-Iri simplifications and the optimal ones. The interest in such results stems from the fact that the Douglas-Peucker and Imai-Iri algorithms are considerably more efficient than the computation of OPTH and OPTF.

### 3.1 Hausdorff distance

To show that IIH (and consequently DPH) may use many more vertices than OPTH, even if we enlarge ε, we give a construction where this occurs. Imagine three regions with diameter ε at the vertices of a sufficiently large equilateral triangle. We construct a polyline P whose vertices are distributed over the three regions in such a way that the polyline keeps alternating between the regions, see Figure 3. An optimal simplification needs only three vertices, one in each region. Since the only valid links are the ones connecting two consecutive vertices of P, IIH is P itself. If the triangle is large enough with respect to ε, this remains true even if we give the Imai-Iri algorithm a much larger error threshold than ε.

For any n, there exists a polyline P with n vertices and an ε such that IIH(P, ε) has n vertices and OPTH(P, ε) has three vertices. Figure 3: The Douglas-Peucker and Imai-Iri algorithms may not be able to simplify P at all, whereas the optimal simplification using the Hausdorff distance has just three vertices (in blue, right).

Note that the example applies both to the directed and the undirected Hausdorff distance.

### 3.2 Fréchet distance

Our results are somewhat different for the Fréchet distance; we need to make a distinction between DPF and IIF.

##### Douglas-Peucker

We construct an example that shows that DPF may have many more vertices than OPTF, even if we enlarge the error threshold. It is illustrated in Figure 4. One vertex is placed slightly higher than the others so that it will be added first by the Fréchet version of the Douglas-Peucker algorithm. Eventually all vertices will be chosen. OPTF has only four vertices. Since the horizontal zigzag can be arbitrarily much larger than the height of the vertical zigzag, the situation remains if we make the error threshold arbitrarily much larger. Figure 4: Left: a polyline on which the Fréchet version of the Douglas-Peucker algorithm performs poorly and the output polyline contains n vertices. Right: the optimal simplification contains four vertices (in blue).

For any n, there exists a polyline P with n vertices and an ε such that DPF(P, ε) has n vertices and OPTF(P, ε) has four vertices.

##### Remark

One could argue that the choice of adding the furthest vertex is not suitable when using the Fréchet distance, because we may not be adding the vertex (or vertices) to "blame" for the high Fréchet distance. However, finding the vertex that improves the Fréchet distance most is computationally expensive, defeating the purpose of this simple algorithm. Furthermore, we can observe that also in the Hausdorff version, the Douglas-Peucker algorithm does not choose the vertex that improves the Hausdorff distance most (the Hausdorff distance may even increase when adding an extra vertex).

##### Imai-Iri

Finally we compare the Fréchet version of the Imai-Iri algorithm to the optimal Fréchet distance simplification. Our main construction has ten vertices placed in such a way that IIF has all ten vertices, while OPTF has only eight of them, see Figures 5 and 6.

It is easy to see that under the Fréchet distance, IIF = OPTF for the previous construction in Figure 4. We give another input polyline in Figure 6 to show that IIF also does not approximate OPTF even if IIF is allowed to use an error threshold that is larger by a constant factor. Figure 5: The Imai-Iri simplification will have all vertices because the only valid links with a Fréchet distance at most ε are the ones connecting two consecutive vertices in the polyline. Figure 6: The optimal simplification can skip p2 and p3; in the parametrizations witnessing the Fréchet distance, OPTF "stays two vertices behind" on the input until the end. Right, the free space diagram of P and OPTF.

We can append multiple copies of this construction together with a suitable connection in between. This way we obtain:

There exist constants c₁ > 1 and c₂ > 1, a polyline P with n vertices, and an ε such that IIF(P, c₁ε) has more than c₂ times as many vertices as OPTF(P, ε).

By the aforementioned result of Agarwal et al., we know that the theorem does not remain true if the error threshold given to IIF is enlarged too much.

## 4 Algorithmic complexity of the Hausdorff distance

The results in the previous section show that both the Douglas-Peucker and the Imai-Iri algorithm do not produce an optimal polyline under the Hausdorff or Fréchet distance, or even approximate it within any constant factor. Naturally, this leads us to the following question: Is it possible to compute the optimal Hausdorff or Fréchet simplification in polynomial time?

In this section, we present a construction which proves that under the Hausdorff distance, computing the optimal simplified polyline is NP-hard.

### 4.1 Undirected Hausdorff distance

We first consider the undirected (or bidirectional) Hausdorff distance; that is, we require both the maximum distance from the initial polyline P to the simplified polyline Q and the maximum distance from Q to P to be at most ε.

Given a polyline P and a value ε, the problem of computing a minimum length subsequence Q of P such that the undirected Hausdorff distance between P and Q is at most ε is NP-hard.

We prove the theorem with a reduction from Hamiltonian cycle in segment intersection graphs. It is well known that Hamiltonian cycle is NP-complete in planar graphs, and by Chalopin and Gonçalves' proof of Scheinerman's conjecture that the planar graphs are included in the segment intersection graphs, it follows that Hamiltonian cycle in segment intersection graphs is NP-complete.

Let S be a set of n line segments in the plane, and assume all intersections are proper (if not, extend the segments slightly). Let G be its intersection graph (i.e., G has a vertex for every segment in S, and two vertices in G are connected by an edge when their corresponding segments intersect). We assume that G is connected; otherwise, clearly there is no Hamiltonian cycle in G.

We first construct an initial polyline P as follows. (Figure 7 illustrates the construction.) Let A be the arrangement of S, let v be some endpoint of a segment in S, and let π be any path on A that starts and finishes at v and visits all vertices and edges of A (clearly, π may reuse vertices and edges). Then P is simply a number of copies of π appended to each other. Consequently, the order of the vertices now must follow the order of these copies. We now set ε to a sufficiently small value.

Now, an output polyline Q with Hausdorff distance at most ε to P must also visit all vertices and edges of A, and stay close to A. If ε is sufficiently small, there will be no benefit for Q to ever leave A. Figure 7: The construction: A is the arrangement of a set of segments S. We build an input path P that "paints" over S completely, and we are looking for an output path Q that corresponds to a Hamiltonian cycle. In this case, there is no Hamiltonian cycle, and the path gets stuck.

A solution Q with the minimum possible number of vertices exists if and only if G admits a Hamiltonian cycle.

###### Proof.

Clearly, any simplification Q will need to visit the endpoints of the segments in S, and it starts and ends at the same point v. Furthermore, Q will need to have at least two internal vertices on every segment s ∈ S: one to enter the segment and one to leave it (note that we cannot enter or leave a segment at an endpoint since all intersections are proper intersections). Together this gives a lower bound on the number of vertices possible for Q.

Now, if G admits a Hamiltonian cycle ⟨s_1, …, s_n⟩, it is easy to construct a simplification Q attaining this bound as follows. We start at v and collect the other endpoint of the segment s_1 of which v is an endpoint. Then we follow the Hamiltonian cycle to segment s_2; by definition (s_1, s_2) is an edge in G, so their corresponding segments intersect, and we use the intersection point to leave s_1 and enter s_2. We proceed in this fashion until we reach s_n, which intersects s_1, and finally return to v.

On the other hand, any solution with this few vertices must necessarily be of this form and therefore implies a Hamiltonian cycle: in order to have only two internal vertices per segment, the vertex at which we leave a segment must coincide with the vertex at which we enter some other segment, and we must continue until we have visited all segments and return to v. ∎

### 4.2 Directed Hausdorff distance: P→Q

We now shift our attention to the directed Hausdorff distance from P to Q: we require the maximum distance from P to Q to be at most ε, but Q may have a larger distance to P. The previous reduction does not carry over to this measure, so we prove the NP-hardness differently.

The idea is to reduce from Covering Points By Lines, which is known to be both NP-hard and APX-hard: given a set S of n points in the plane, find the minimum number of lines needed to cover all points.

Let S be an instance of the Covering Points By Lines problem. We fix ε based on S and construct a polyline P connecting a sequence of points. The idea is to force the simplification to cover all points of the construction except those in S, such that in order for the final simplification to cover all points, we only need to collect the points in S using as few line segments as possible. To this end, we will place a number of forced points, where a point is forced whenever its distance to any line through any pair of other points of P is larger than ε. Since Q must be defined by a subset of the points of P, we will never cover a forced point unless we choose it to be a vertex of Q. Figure 8 shows this idea. On the other hand, we need to place points that allow us to freely draw every line through two or more points in S. We create two point sets L and R to the left and right of S, such that for every line through two or more points in S, there are a point in L and a point in R on that line. Finally, we need to build additional scaffolding around the construction to connect and cover the points in L and R. Figure 9 shows the idea.

We now treat the construction in detail, divided into three parts with different purposes:

1. a sub-polyline τ1 that contains S;

2. a sub-polyline τ2 that contains L and R; and

3. two disconnected sub-polylines τ3a and τ3b which share the same purpose: to guarantee that all vertices in the previous sub-polylines are themselves covered by Q.

#### Part 1: Placing S

First, we assume that every point in S has a unique x-coordinate; if this is not the case, we rotate S until it is. (Note that, by nature of the Covering Points By Lines problem, we cannot assume S is in general position; however, a rotation for which all x-coordinates are unique always exists.) We also assume that every line through at least two points of S has a bounded slope; if this is not the case, we vertically scale S until it is. Now, we fix ε to be smaller than half the minimum difference between any two x-coordinates of points in S, and smaller than the distance from any line through two points in S to any other point in S not on the line.

We place forced points f_1, f_2, … such that the x-coordinate of each s_i ∈ S lies between the x-coordinates of f_i and f_{i+1}, and the forced points lie alternatingly above and below S; we place them such that the distance of the line segment f_i f_{i+1} to s_i is ε and its distance to every other point of S is larger than ε. Next, we place two auxiliary points on each such segment such that the distance of each of them to s_i is ε; refer to Figure 8. Then let τ1 be a polyline connecting all points in this part of the construction; τ1 will be part of the input polyline P.

The idea here is that all forced points must appear on Q, and if only the forced points appear on Q, everything in the construction will be covered except the points in S (and some arbitrarily short stubs of the edges connecting them to the auxiliary points). Of course, we could choose to include more points of τ1 in Q to collect some points of S already. However, this would cost an additional three vertices per collected point (note that using fewer than three, we would miss an auxiliary point instead), and in the remainder of the construction we will make sure that it is cheaper to collect the points in S separately later. Figure 8: Example of τ1 where n=3. For a given ε, the (simplified) polyline f1,f2,f3,f4 covers the gray area but not the blue points s1,s2,s3.

#### Part 2: Placing and covering L and R

In the second part of the construction we create two sets of vertices, L and R, which can be used to make links that cover S. Consider the set of all unique lines that pass through at least two points in S. We create two sets of points L = {l_1, …} and R = {r_1, …} with the following properties:


• the line through l_i and r_i is one of the lines through at least two points of S,

• the line through l_i and r_j for i ≠ j has distance more than ε to any point in S, and

• the points in L (resp. R) all lie on a common vertical line.

Clearly, we can satisfy these properties by placing L and R sufficiently far from S. We create a vertical polyline for each set, which consists of non-overlapping line segments connecting consecutive vertices in their y-order from top to bottom; these two polylines contain the vertices of L and R, respectively.

Now, each line that covers a subset of S can become part of Q by selecting the correct pair of vertices from L and R. However, if we want Q to contain multiple such lines, this will not necessarily be possible anymore, since the order in which we visit L and R is fixed (and to create a line, we must skip all intermediate vertices). The solution is to make copies L1, L2, … of L and copies R1, R2, … of R and visit them alternatingly; the number of copies is the maximum number of lines necessary to cover all points in S in the Covering Points By Lines problem. (The copies are in exactly the same location. If the reader does not like that and feels that points ought to be distinct, she may imagine shifting each copy by a sufficiently small distance without impacting the construction.)

We create a polyline τ2 that contains the copies of L and R by connecting them with additional vertices. These connecting vertices should be located far enough from L and R such that a link between one of them and a vertex of L or R will not cover any point in S. To ensure that the construction ends at the last vertex of the last copy of L, we use two more vertices, see Figure 9. Let τ2 be the polyline connecting all these points; τ2 will also be part of the input P.

#### Part 3: Putting it together

All vertices in S can be covered by the simplification of τ1 and a suitable choice of links in τ2. Therefore, the last part consists of polylines that will definitely cover all vertices of τ1 and τ2, and at the same time serve as a proper connection between them. Consequently, all vertices in this part will also be forced and therefore be part of the final simplified polyline. Figure 10: Schematic views of connecting up different parts of the NP-hardness construction into a single polyline. The bold polylines show τ1 and τ2 and indicate multiple parts of P close together.

We divide this last part into two disconnected polylines: τ3a and τ3b. The main part of τ3a is a vertical line segment e parallel to the vertical polylines of Part 2. There is a restriction on e: the Hausdorff distance from each of the polylines on one side, and also from the line segments between them, to e should not be larger than ε. In order to force e to be part of the simplified polyline, we must place its endpoints away from the rest of the construction. Then, τ1 and τ2 can be connected by connecting the last forced point of τ1 and the first vertex of τ2 to different endpoints of e.

Next, the rest of the construction that has not been covered yet will be covered by τ3b. First, we have a vertical line segment similar to e, in order to cover the vertical polylines on the other side and all line segments between them. Then, a horizontal line segment is needed to cover all horizontal connecting segments. Similar to e, the endpoints of these two segments should be located far from the rest, implying that the horizontal segment intersects both vertical ones. This is shown in Figure 10, left. We complete the construction by connecting the upper endpoint of the vertical segment to the left endpoint of the horizontal segment, and the lower endpoint of the vertical segment to the last vertex of τ2.

We can show that even if the input is restricted to be non-self-intersecting, the simplification problem is still NP-hard. We modify the last part of the construction to remove the three intersections. Firstly, we shorten the horizontal segment on the right side and place its right endpoint very close to the rest of the construction. Since this right endpoint is an endpoint of the input, it will always be included in a simplification. Secondly, to remove the remaining intersections, we bring the upper endpoint of the vertical segment of τ3b to just below the horizontal segment, so very close to it. To make sure that we must include this vertical segment in the simplification, we connect its lower endpoint to the rest of the construction. This connecting segment is further from the vertical polylines, so it cannot help enough to cover their lower parts; only the vertical segment itself can do that. This is shown in Figure 10, right. Figure 11: The full construction showing that computing OPTH is NP-hard. τ3a is a line segment e = f6f7 and τ3b = ⟨f8, …, f11⟩. The endpoints of the construction are f11 and l′6 ∈ L2. The gray area is within ε from the sub-polyline consisting of all green vertices ⟨f11, …, f8, f1, …, f7⟩, which is part of the simplification. The rest of the simplification is the purple polyline ⟨f7, r6, l1, l4, r′3, l′6⟩ that covers all blue points S (r′3 ∈ R2 and l′6 ∈ L2). In order to show the red points clearly, the ε used in this figure is larger than it needs to be. Consequently, a link s1s4 can cover s2 and s3, which is not possible if ε is considerably smaller.

We present a full construction of P for n = 3 in Figure 11.

Given a polyline P and a value ε, the problem of computing a minimum length subsequence Q of P such that the directed Hausdorff distance from P to Q is at most ε is NP-hard.

###### Proof.

The construction contains a part whose simplification is forced: it contains all forced points and can cover all vertices in the construction except those in S. The other part of the simplified polyline depends on links to cover the points in S. These links alternate between going from left to right and from right to left. Between two such links, we will have exactly two vertices from some copy of L or two from some copy of R.

The only two ways a point s ∈ S can be covered are by including s explicitly in Q, or by using one of the links that cover s and at least one other point of S. If we include s explicitly then we must also include its auxiliary points, or else they are not covered. It is clearly more efficient (requiring fewer vertices in the simplification) to use a link that covers s and another point of S, even if that other point is covered by another such link too. The links of this type in an optimal simplified polyline correspond precisely to a minimum set of lines covering S. Therefore, the simplified polyline of the construction contains a solution to the Covering Points By Lines instance. Since P in the construction is simple, the theorem holds even for simple input. ∎

### 4.3 Directed Hausdorff distance: Q→P

Finally, we finish this section with a note on the reverse problem: we only bound the directed Hausdorff distance from Q to P (we want the output polyline to stay close to the input polyline, but Q does not need to be close to all parts of P). This problem seems more esoteric, but we include it for completeness. In this case, a polynomial-time algorithm reminiscent of Imai-Iri optimally solves the problem.

Given a polyline P and a value ε, the problem of computing a minimum length subsequence Q of P such that the directed Hausdorff distance from Q to P is at most ε can be solved in polynomial time.

###### Proof.

We compute the region within distance ε from P explicitly. For every link p_i p_j we test whether it lies within that region, and if so, add it as an edge to a graph on the vertices of P. Then we find a minimum link path from p_1 to p_n in this graph. For a possibly self-intersecting polyline as the input, a simple implementation of this algorithm runs in polynomial time (faster is possible). ∎
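
A sketch of this algorithm. An exact segment-inside-region test requires some geometric machinery, so the sketch below approximates it by sampling points along each candidate link; this discretization is our simplification for illustration, not part of the proof.

```python
import math
from collections import deque

def _seg_dist(p, a, b):
    # Euclidean distance from point p to segment ab
    dx, dy = b[0] - a[0], b[1] - a[1]
    if dx == 0 and dy == 0:
        return math.dist(p, a)
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))

def _link_inside(a, b, P, eps, samples=64):
    # Approximate containment test: every sample on link ab must lie
    # within eps of some edge of P. (An exact test would clip ab against
    # the boundary of the eps-neighborhood of P.)
    for s in range(samples + 1):
        t = s / samples
        q = (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
        if min(_seg_dist(q, P[k], P[k + 1]) for k in range(len(P) - 1)) > eps:
            return False
    return True

def min_link_q_to_p(P, eps):
    # Minimum-link path over the graph of links lying inside the
    # eps-neighborhood of P (directed Hausdorff from output to input).
    n = len(P)
    prev = [None] * n
    seen = [False] * n
    seen[0] = True
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in range(u + 1, n):
            if not seen[v] and _link_inside(P[u], P[v], P, eps):
                seen[v] = True
                prev[v] = u
                queue.append(v)
    path = [n - 1]
    while prev[path[-1]] is not None:
        path.append(prev[path[-1]])
    return [P[i] for i in reversed(path)]
```

Every edge of P trivially lies inside the region, so the last vertex is always reachable and the BFS returns a valid minimum-link output for this discretized test.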

## 5 Algorithmic complexity of the Fréchet distance

In this section, we show that for a given polyline P and an error ε, the optimal simplification OPTF can be computed in polynomial time using a dynamic programming approach.

### 5.1 Observations

Note that a link in OPTF is not necessarily within Fréchet distance ε of the sub-polyline of P that it replaces (for example, in Figure 2). Furthermore, a (sequence of) link(s) in OPTF could be mapped to an arbitrary subcurve of P, not necessarily starting or ending at a vertex of P. For example, in Figure 6, a sub-polyline of the simplification has Fréchet distance ε to a sub-polyline of P that starts at a vertex but ends somewhere in the interior of an edge. At this point, one might imagine a dynamic programming algorithm which stores, for each vertex p_i and each value k, the point on P which is the farthest along P such that there exists a simplification ending at p_i using k links that has Fréchet distance at most ε to the part of P up to that point. However, the following lemma shows that even this does not yield optimality; its proof is the example in Figure 12.

There exists a polyline P and an optimal Fréchet ε-simplification of P that has to use the vertex p4, reached using 2 links, with the following properties:

• There exists a partial 2-link simplification S of ⟨p1, p2, p3, p4⟩ and a point r on P such that the Fréchet distance between S and the subcurve of P up to r is at most ε, but

• there exists no partial simplification of the rest of P that is within Fréchet distance ε to the subcurve of P starting at r, so S cannot be completed to a simplification of P.

Figure 12: An example where the farthest-reaching simplification up to p4 using 2 links is not part of any solution that uses p4. Left: the input curve P in black, with circles of radius ε around all vertices in light gray. Middle: a 2-link simplification of ⟨p1, p2, p3, p4⟩ that reaches up to a point on p4p5 (in yellow), which can be extended to a 4-link simplification of P. Right: a 2-link simplification of ⟨p1, p2, p3, p4⟩ that reaches point r on p5p6 (in pink), which does not allow any completion.

### 5.2 A dynamic programming algorithm

Lemma 5.1 shows that storing a single data point for each vertex p_i and value of k is not sufficient to ensure that we find an optimal solution. Instead, we argue that if we maintain the set of all points on P that can be “reached” by a k-link simplification up to each vertex, then we can make dynamic programming work. We now make this precise and argue that the complexity of these sets of reachable points is never worse than linear.

First, we define π, a parameterization of P, as a continuous mapping π : [1, n] → ℝ² where π(i) = p_i and π(i + λ) = (1 − λ) p_i + λ p_{i+1} for 0 ≤ λ ≤ 1. We also write P[s, t] for the subcurve of P starting at π(s) and ending at π(t), also writing P[t] = P[1, t] for short.
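As a concrete illustration, the parameterization and subcurve notation can be implemented directly; the 1-based convention below (π(i) = p_i) is the one assumed above, and the function names are ours.

```python
import math

def pi(P, t):
    """Evaluate the parameterization: for P = [p1, ..., pn] and
    t in [1, n], pi(i) = p_i and pi(i + lam) interpolates along
    the edge from p_i to p_{i+1}."""
    n = len(P)
    i = min(int(math.floor(t)), n - 1)  # edge index, clamped so pi(n) = p_n
    lam = t - i
    (ax, ay), (bx, by) = P[i - 1], P[i]
    return ((1 - lam) * ax + lam * bx, (1 - lam) * ay + lam * by)

def subcurve(P, s, t):
    """Vertex list of the subcurve P[s, t]: pi(s), then every vertex
    of P whose parameter lies strictly between s and t, then pi(t)."""
    pts = [pi(P, s)]
    k = int(math.floor(s)) + 1
    while k < t:
        pts.append(tuple(map(float, P[k - 1])))
        k += 1
    pts.append(pi(P, t))
    return pts
```

For example, on P = [(0, 0), (2, 0), (2, 2)], the point π(2.5) is the midpoint (2, 1) of the second edge, and P[1.5, 2.5] passes through the vertex p2 = (2, 0).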

We say that a point π(t) can be reached by a (k, i)-simplification if there exists a simplification of ⟨p_1, …, p_i⟩ using k links which has Fréchet distance at most ε to P[t]. We let ρ(i, k, t) = true in this case, and ρ(i, k, t) = false otherwise. With slight abuse of notation we also say that t itself is reachable, and that an interval I is reachable if all t ∈ I are reachable (by a (k, i)-simplification).

###### Observation 1.

A point π(t) can be reached by a (k, i)-simplification if and only if there exist a j < i and an s ≤ t such that π(s) can be reached by a (k − 1, j)-simplification and the segment p_j p_i has Fréchet distance at most ε to P[s, t].

###### Proof.

This follows directly from the definition of the Fréchet distance. ∎

Observation 1 immediately suggests a dynamic programming algorithm: for every i and k we store a subdivision of [1, n] into intervals where ρ(i, k, ·) is true and intervals where it is false, and we calculate the subdivisions for increasing values of k. We simply iterate over all possible values of j, calculate which intervals can be reached using a simplification via p_j, and then take the union over all those intervals. For this, the only unclear part is how to calculate these intervals.

We argue that, for any given i and k, there are at most n reachable intervals on P, each contained in an edge of P. Indeed, every (k, i)-reachable point π(t) must have distance at most ε to p_i, and since the edge of P that π(t) lies on intersects the disk of radius ε centered at p_i in a line segment, the reachable points on each edge are confined to a single line segment. For each edge, we call the farthest point on it that is (k, i)-reachable its farthest reachable point.
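The geometric bound used here, the intersection of an edge with the disk of radius ε around p_i, is easy to compute by solving one quadratic. The helper below is our own illustration, not code from the paper: it returns the farthest parameter on an edge pq that still lies within distance ε of a point c (the true farthest reachable point can only lie closer).

```python
import math

def farthest_on_edge_in_disk(p, q, c, eps):
    """Largest u in [0, 1] such that p + u*(q - p) lies within
    distance eps of c, or None if the edge misses the disk.
    Solves |d + u*e|^2 <= eps^2 with d = p - c, e = q - p."""
    ex, ey = q[0] - p[0], q[1] - p[1]
    dx, dy = p[0] - c[0], p[1] - c[1]
    A = ex * ex + ey * ey
    B = 2 * (dx * ex + dy * ey)
    C = dx * dx + dy * dy - eps * eps
    if A == 0:                      # degenerate (zero-length) edge
        return 1.0 if C <= 0 else None
    disc = B * B - 4 * A * C
    if disc < 0:                    # edge's supporting line misses the disk
        return None
    r = math.sqrt(disc)
    lo, hi = (-B - r) / (2 * A), (-B + r) / (2 * A)
    if hi < 0 or lo > 1:            # intersection lies outside the edge
        return None
    return min(hi, 1.0)
```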

Furthermore, we argue that for each edge of P, we only need to take the farthest reachable point into account during our dynamic programming algorithm. If i, j, k, s, s′, and t exist such that s ≤ s′ with π(s) and π(s′) on the same edge of P, ρ(j, k, s) and ρ(j, k, s′) hold, and the segment p_j p_i has Fréchet distance at most ε to P[s, t], then p_j p_i also has Fréchet distance at most ε to P[s′, t].

###### Proof.

By the above argument, both π(s) and π(s′) lie within distance ε from p_j, so P[s, s′] is a line segment that lies completely within distance ε from p_j.

We are given that the Fréchet distance between p_j p_i and P[s, t] is at most ε; this means a matching μ exists in which every pair of matched points is within distance ε. Let x be the point on p_j p_i that μ matches to π(s′). Then |p_j − π(s′)| ≤ ε and |x − π(s′)| ≤ ε, so the line segment p_j x lies fully within distance ε from π(s′).

Therefore, we can define a new ε-matching between p_j p_i and P[s′, t] which maps the segment p_j x to the point π(s′) and the curve P[s′, t] to the segment x p_i (following the mapping given by μ). ∎

Now, we can compute the optimal simplification by maintaining a table that stores, for each pair (i, k), the farthest reachable point on every edge of P. We calculate each entry by looking up the entries for the previous value of k and testing in linear time, for each combination, whether the Fréchet distance between the new link and the corresponding subcurve of P is within ε or not.
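The linear-time test for one link against one subcurve can be sketched with the reachable-interval idea from Alt and Godau's free-space diagram, specialized to a single segment. This is our own illustrative implementation, not code from the paper: it represents the subcurve by its vertex sequence, computes for each vertex the interval of parameters on the link within distance ε, and greedily propagates the smallest feasible parameter; convexity of the free space within each cell makes this sufficient.

```python
import math

def _interval_on_segment(a, b, c, eps):
    """Parameters u in [0, 1] with |(a + u*(b-a)) - c| <= eps,
    obtained by solving the quadratic |d + u*e|^2 = eps^2."""
    ex, ey = b[0] - a[0], b[1] - a[1]
    dx, dy = a[0] - c[0], a[1] - c[1]
    A = ex * ex + ey * ey
    B = 2 * (dx * ex + dy * ey)
    C = dx * dx + dy * dy - eps * eps
    if A == 0:                              # degenerate link
        return (0.0, 1.0) if C <= 0 else None
    disc = B * B - 4 * A * C
    if disc < 0:
        return None
    r = math.sqrt(disc)
    lo, hi = (-B - r) / (2 * A), (-B + r) / (2 * A)
    lo, hi = max(lo, 0.0), min(hi, 1.0)
    return (lo, hi) if lo <= hi else None

def link_within_frechet(a, b, C, eps):
    """Decide whether the segment ab has Frechet distance <= eps to
    the polyline C (a list of points), in O(len(C)) time."""
    lo = 0.0
    for j, c in enumerate(C):
        iv = _interval_on_segment(a, b, c, eps)
        if iv is None:
            return False
        if j == 0 and iv[0] > 0.0:          # a must match the start of C
            return False
        lo = max(lo, iv[0])                 # monotone propagation
        if lo > iv[1]:
            return False
    # b must match the end of C, i.e. u = 1 must still be feasible.
    last = _interval_on_segment(a, b, C[-1], eps)
    return last[1] >= 1.0
```

The zigzag test below fails, as it should: a single segment cannot match a subcurve that doubles back on itself within distance ε.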

Given a polyline P with n vertices and a value ε, we can compute the optimal polyline simplification of P that has Fréchet distance at most ε to P in O(kn^5) time and O(kn^2) space, where k is the output complexity of the optimal simplification.

## 6 Conclusions

In this paper, we analyzed the well-known polyline simplification algorithms of Douglas-Peucker and Imai-Iri under both the Hausdorff and the Fréchet distance. Neither algorithm is optimal under these measures. We studied the relation between the number of vertices in the simplified polyline produced by each algorithm and the factor by which the error threshold must be enlarged to approximate the optimal solution. For the Hausdorff distance, we presented a polyline where the optimal simplification uses only a constant number of vertices while the output of both algorithms is the same as the input polyline, even if we enlarge the error threshold by any constant factor. We obtain the same result for the Douglas-Peucker algorithm under the Fréchet distance. For the Imai-Iri algorithm, such an extreme example does not exist, but we have shown that, for certain polylines, a constant factor more vertices are needed even if the error threshold is enlarged by some small constant factor.

Next, we investigated the algorithmic problem of computing the optimal simplification using the Hausdorff and the Fréchet distance. For the directed (input to output) and undirected Hausdorff distance, we gave NP-hardness proofs. Interestingly, the optimal simplification in the other direction (from output to input) is solvable in polynomial time. Finally, we showed how to compute the optimal simplification under the Fréchet distance in polynomial time. Our algorithm is based on dynamic programming, runs in O(kn^5) time, and requires O(kn^2) space.

A number of challenging open problems remain. First, we would like to show NP-hardness of computing an optimal simplification using the Hausdorff distance when the simplification may not have self-intersections. Second, we are interested in the computational status of the optimal simplification under the Hausdorff distance and the Fréchet distance when the simplification need not use the vertices of the input. Third, it may be possible to improve the efficiency of our algorithm for computing an optimal simplification with Fréchet distance at most ε. Fourth, we may consider optimal polyline simplification using the weak Fréchet distance.

## References

•  Mohammad Ali Abam, Mark de Berg, Peter Hachenberger, and Alireza Zarei. Streaming algorithms for line simplification. Discrete & Computational Geometry, 43(3):497–515, 2010.
•  Pankaj K. Agarwal, Sariel Har-Peled, Nabil H. Mustafa, and Yusu Wang. Near-linear time approximation algorithms for curve simplification. Algorithmica, 42(3):203–219, 2005.
•  Helmut Alt, Bernd Behrends, and Johannes Blömer. Approximate matching of polygonal shapes. Annals of Mathematics and Artificial Intelligence, 13(3):251–265, 1995.
•  Helmut Alt and Michael Godau. Computing the Fréchet distance between two polygonal curves. International Journal of Computational Geometry & Applications, 5(1-2):75–91, 1995.
•  Gill Barequet, Danny Z. Chen, Ovidiu Daescu, Michael T. Goodrich, and Jack Snoeyink. Efficiently approximating polygonal paths in three and higher dimensions. Algorithmica, 33(2):150–167, 2002.
•  Lilian Buzer. Optimal simplification of polygonal chain for rendering. In Proceedings 23rd Annual ACM Symposium on Computational Geometry, SCG ’07, pages 168–174, 2007.
•  Jérémie Chalopin and Daniel Gonçalves. Every planar graph is the intersection graph of segments in the plane: Extended abstract. In Proceedings 41st Annual ACM Symposium on Theory of Computing, STOC ’09, pages 631–638, 2009.
•  W.S. Chan and F. Chin. Approximation of polygonal curves with minimum number of line segments or minimum error. International Journal of Computational Geometry & Applications, 06(01):59–77, 1996.
•  Danny Z. Chen, Ovidiu Daescu, John Hershberger, Peter M. Kogge, Ningfang Mi, and Jack Snoeyink. Polygonal path simplification with angle constraints. Computational Geometry, 32(3):173–187, 2005.
•  Mark de Berg, Marc van Kreveld, and Stefan Schirra. Topologically correct subdivision simplification using the bandwidth criterion. Cartography and Geographic Information Systems, 25(4):243–257, 1998.
•  David H. Douglas and Thomas K. Peucker. Algorithms for the reduction of the number of points required to represent a digitized line or its caricature. Cartographica, 10(2):112–122, 1973.
•  Regina Estkowski and Joseph S. B. Mitchell. Simplifying a polygonal subdivision while keeping it simple. In Proceedings 17th Annual ACM Symposium on Computational Geometry, SCG ’01, pages 40–49, 2001.
•  Stefan Funke, Thomas Mendel, Alexander Miller, Sabine Storandt, and Maria Wiebe. Map simplification with topology constraints: Exactly and in practice. In Proc. 19th Workshop on Algorithm Engineering and Experiments (ALENEX), pages 185–196, 2017.
•  M.R. Garey, D.S. Johnson, and L. Stockmeyer. Some simplified NP-complete graph problems. Theoretical Computer Science, 1(3):237–267, 1976.
•  Michael Godau. A natural metric for curves - computing the distance for polygonal chains and approximation algorithms. In Proceedings 8th Annual Symposium on Theoretical Aspects of Computer Science, STACS 91, pages 127–136. Springer-Verlag, 1991.
•  Leonidas J. Guibas, John E. Hershberger, Joseph S.B. Mitchell, and Jack Scott Snoeyink. Approximating polygons and subdivisions with minimum-link paths. International Journal of Computational Geometry & Applications, 03(04):383–415, 1993.
•  John Hershberger and Jack Snoeyink. An implementation of the Douglas-Peucker algorithm for line simplification. In Proceedings 10th Annual ACM Symposium on Computational Geometry, SCG ’94, pages 383–384, 1994.
•  Hiroshi Imai and Masao Iri. Polygonal approximations of a curve - formulations and algorithms. In Godfried T. Toussaint, editor, Computational Morphology: A Computational Geometric Approach to the Analysis of Form. North-Holland, Amsterdam, 1988.
•  V. S. Anil Kumar, Sunil Arya, and H. Ramesh. Hardness of set cover with intersection 1. In Automata, Languages and Programming: 27th International Colloquium, ICALP 2000, pages 624–635. Springer, Berlin, Heidelberg, 2000.
•  Nimrod Megiddo and Arie Tamir. On the complexity of locating linear facilities in the plane. Operations Research Letters, 1(5):194–197, 1982.
•  Avraham Melkman and Joseph O’Rourke. On polygonal chain approximation. In Godfried T. Toussaint, editor, Computational Morphology: A Computational Geometric Approach to the Analysis of Form, pages 87–95. North-Holland, Amsterdam, 1988.
•  E. R. Scheinerman. Intersection Classes and Multiple Intersection Parameters of Graphs. PhD thesis, Princeton University, 1984.